Automatic information extraction in Vietnamese text

San Chanrathany; Lê Thanh Hương; Nguyễn Thanh Thủy; Nguyễn Hữu Thiệu

doi:10.15625/1813-9663/28/2/2493

Abstract

This paper presents semi-supervised approaches to building a Vietnamese information extraction system. For named entity extraction, our approach builds on the idea of Liao and extends it by using proper name coreference rules to find new entities. The new entities are added to the training set so that the extraction module can learn new context features. Experimental results show that our method achieves higher accuracy than Liao’s. For relation extraction, we improve the Shallow Linguistic Kernel (SLK) of Giuliano et al. by modifying the window size of the kernel and using additional features to represent sentences, including part of speech, other entity types, and a dictionary of compound verbs. Experimental results show that the supervised method using our SLK achieves higher accuracy than the one used by Giuliano et al., and that the semi-supervised method achieves higher accuracy than the supervised one.
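The entity-expansion step can be illustrated with a minimal sketch. The abstract does not list the coreference rules themselves, so the single rule below (a trailing part of a known full name, appearing on its own, refers to the same entity) is an assumption chosen for illustration, and the names `tokenize`, `suffix_aliases`, and `find_coreferent_mentions` are hypothetical rather than taken from the paper.

```python
import string


def tokenize(text):
    """Split on whitespace and strip surrounding ASCII punctuation."""
    stripped = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in stripped if tok]


def suffix_aliases(full_name):
    """Return every proper trailing token sequence of a full name."""
    tokens = full_name.split()
    return [tokens[i:] for i in range(1, len(tokens))]


def find_coreferent_mentions(seeds, text):
    """Propose new entity mentions from known ones.

    seeds maps a known full name to its entity type. Assumed rule
    (illustrative only): a trailing part of a known name that occurs
    on its own in the text refers to the same entity.
    """
    tokens = tokenize(text)
    new_entities = {}
    for full_name, entity_type in seeds.items():
        name_tokens = full_name.split()
        for alias in suffix_aliases(full_name):
            n = len(alias)
            # Token that precedes this alias inside the full name.
            preceding = name_tokens[len(name_tokens) - n - 1]
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != alias:
                    continue
                if i > 0 and tokens[i - 1] == preceding:
                    continue  # this match is part of a longer mention
                new_entities[" ".join(alias)] = entity_type
                break
    return new_entities


seeds = {"Nguyễn Thanh Thủy": "PERSON"}
text = "Nguyễn Thanh Thủy là giáo sư. Thủy làm việc tại Hà Nội."
print(find_coreferent_mentions(seeds, text))  # {'Thủy': 'PERSON'}
```

In a semi-supervised loop, mentions proposed this way would be added to the training set before the extractor is retrained, so that the contexts around the new mentions contribute additional features.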

How to Cite

[1] S. Chanrathany, L. T. Hương, N. T. Thủy, and N. H. Thiệu, “Automatic information extraction in Vietnamese text”, J. Comput. Sci. Cybern., vol. 28, no. 2, pp. 115–128, Oct. 2012.

Issue

Vol. 28 No. 2 (2012)

Section

Computer Science

License

1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our own work, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by us.
