Automatic information extraction in Vietnamese text

San Chanrathany, Lê Thanh Hương, Nguyễn Thanh Thủy, Nguyễn Hữu Thiệu

DOI:

https://doi.org/10.15625/1813-9663/28/2/2493

Abstract

This paper presents semi-supervised approaches for building a Vietnamese information extraction system. For named entity extraction, our approach builds on the idea of Liao and extends it with proper-name coreference rules that find new entities. The new entities are added to the training set so that the extraction module can learn new context features. Experimental results show that our method achieves higher accuracy than Liao’s. For relation extraction, we improve the Shallow Linguistic Kernel (SLK) of Giuliano et al. by modifying the kernel's window size and by using additional features to represent sentences, including part of speech, other entity types, and a dictionary of compound verbs. Experimental results show that the supervised method using our SLK achieves higher accuracy than the one used by Giuliano et al., and that the semi-supervised method achieves higher accuracy than the supervised one.
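
The entity-expansion step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual coreference rules, so the rule used here is an assumption: a mention inherits the type of a known entity when its tokens form a contiguous part of that entity's full name. The function names `is_alias`, `expand_entities`, and `self_train` are hypothetical, and the retraining of the extraction module is only indicated by a comment.

```python
from typing import Dict, List


def is_alias(mention: str, full_name: str) -> bool:
    """Assumed rule (not taken from the paper): a mention is an alias of a
    known name if its tokens are a shorter, contiguous run of that name."""
    m, f = mention.split(), full_name.split()
    if not m or len(m) >= len(f):
        return False
    return any(f[i:i + len(m)] == m for i in range(len(f) - len(m) + 1))


def expand_entities(known: Dict[str, str], candidates: List[str]) -> Dict[str, str]:
    """Return new {mention: entity_type} pairs found with the alias rule."""
    new: Dict[str, str] = {}
    for cand in candidates:
        if cand in known:
            continue
        for name, etype in known.items():
            if is_alias(cand, name):
                new[cand] = etype
                break
    return new


def self_train(known: Dict[str, str], candidates: List[str], rounds: int = 3) -> Dict[str, str]:
    """Repeat the expansion until no new entity is found or rounds run out."""
    labeled = dict(known)
    for _ in range(rounds):
        new = expand_entities(labeled, candidates)
        if not new:
            break
        labeled.update(new)
        # A full system would retrain the extractor on `labeled` here so that
        # it learns new context features; that step is omitted in this sketch.
    return labeled


seeds = {"Nguyễn Thanh Thủy": "PERSON", "Đại học Bách khoa Hà Nội": "ORG"}
mentions = ["Thanh Thủy", "Bách khoa Hà Nội", "Lê Thanh Hương"]
result = self_train(seeds, mentions)
```

In this example `result` gains "Thanh Thủy" as a PERSON and "Bách khoa Hà Nội" as an ORG, while "Lê Thanh Hương" stays unlabeled because it matches no seed. A rule this simple would also over-generate (for instance, a place name embedded in an organization name), which is one reason a real system would combine such rules with a learned extractor.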

How to Cite

[1]
S. Chanrathany, L. T. Hương, N. T. Thủy, and N. H. Thiệu, “Automatic information extraction in Vietnamese text”, JCC, vol. 28, no. 2, pp. 115–128, Oct. 2012.

Section

Computer Science
