Automatic information extraction in Vietnamese text
DOI: https://doi.org/10.15625/1813-9663/28/2/2493
Abstract
This paper presents semi-supervised approaches to constructing a Vietnamese information extraction system. Our approach to named entity extraction builds on the idea of Liao and extends it by using proper name coreference rules to find new entities. The new entities are added to the training set so that the extraction module can learn new context features. The experimental results show that our method achieves higher accuracy than Liao’s. For relation extraction, we improve the Shallow Linguistic Kernel (SLK) of Giuliano et al. by modifying the window size of the kernel and using additional features to represent sentences, including part of speech, other entity types, and a dictionary of compound verbs. Our experimental results show that the supervised method using our SLK achieves higher accuracy than the one used by Giuliano et al., and that the semi-supervised method using our SLK achieves higher accuracy than the supervised one.
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is the result of our own work, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.