A model for exploiting the target language characteristics to extract bilingual base noun phrases

Nguyen Chi Hieu
Authors

  • Nguyen Chi Hieu, Faculty of Information Technology, Industrial University of Ho Chi Minh City

DOI:

https://doi.org/10.15625/1813-9663/30/2/3591

Keywords:

BaseNP, classifiers, word order, NLP

Abstract

Bilingual base noun phrase (BaseNP) extraction is a key task in natural language processing (NLP). The task is especially challenging for the English-Vietnamese pair because Vietnamese lacks language resources such as treebanks, part-of-speech taggers, and parsers. In this paper, we propose a combined model that uses statistically derived language characteristics together with the projection method to extract BaseNP correspondences from a bilingual corpus. The language characteristics used in the model are word segmentation, word order, and word classification [1]. The model not only compensates for the lack of Vietnamese resources but also improves on existing methods in handling misalignment, null alignment, overlap, and conflicting projections. It can be easily applied to other language pairs. Experiments on 66,646 sentence pairs from an English-Vietnamese bilingual corpus show that the proposed model performs satisfactorily.
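As a rough illustration of the projection idea mentioned in the abstract, the following minimal sketch maps an English BaseNP span onto the Vietnamese side through word alignments. It is not the paper's model: the function name `project_span`, the sentence pair, and the alignment are all hypothetical, and the contiguous-span heuristic is an assumption chosen only to show how an unaligned classifier and a different word order can still be covered.

```python
from typing import Dict, List, Optional, Set, Tuple

# A span is (start, end) over token indices, with end exclusive.
Span = Tuple[int, int]


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Project a source-side span onto the target side via word alignments.

    Returns the smallest contiguous target span that covers every target
    token aligned to a source token inside `span`, or None when no token
    in the span is aligned (the null-alignment case).
    """
    targets: Set[int] = set()
    for i in range(span[0], span[1]):
        targets |= alignment.get(i, set())
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


# Hypothetical sentence pair and word alignment (illustration only).
english: List[str] = ["a", "new", "book", "is", "here"]
vietnamese: List[str] = ["một", "quyển", "sách", "mới", "ở", "đây"]
# English index -> set of Vietnamese indices. The classifier "quyển"
# and the English verb "is" are left unaligned on purpose.
alignment: Dict[int, Set[int]] = {0: {0}, 1: {3}, 2: {2}, 4: {4, 5}}

english_np: Span = (0, 3)  # "a new book"
projected = project_span(english_np, alignment)
if projected is not None:
    print(" ".join(vietnamese[projected[0]:projected[1]]))
```

In this toy case the adjective follows the noun on the Vietnamese side and the classifier has no English counterpart, yet taking the covering span still yields a contiguous candidate phrase. A span whose tokens are all unaligned returns `None`, which is the kind of case where language-specific characteristics would have to supply the missing evidence.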

Published

10-06-2014

How to Cite

[1]
N. C. Hieu, “A model for exploiting the target language characteristics to extract bilingual base noun phrases”, JCC, vol. 30, no. 2, pp. 177–188, Jun. 2014.

Section

Computer Science