A model for exploiting the target language characteristics to extract bilingual base noun phrases
DOI: https://doi.org/10.15625/1813-9663/30/2/3591
Keywords: Npbase, classifiers, word order, NLP
Abstract
Bilingual Base Noun Phrase (BaseNP) extraction is one of the key tasks of Natural Language Processing (NLP). The task is especially challenging for the English-Vietnamese pair because Vietnamese lacks readily available language resources such as treebanks, part-of-speech taggers, and parsers. In this paper, we propose a combined model that exploits statistically derived language characteristics together with the projection method to extract BaseNP correspondences from a bilingual corpus. The language characteristics used in the model are word segmentation, word order, and word classification [1]. Our model not only compensates for the lack of Vietnamese resources but also reduces the misalignment, null-alignment, overlap, and conflicting-projection errors of existing methods. The proposed model can easily be applied to other language pairs. Experiments on 66,646 sentence pairs from an English-Vietnamese bilingual corpus show that the proposed model gives very satisfactory results.
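The abstract refers to the projection method and to the misalignment, null-alignment, overlap, and conflict problems of existing methods without spelling out how projection works. The following is a minimal sketch of only the generic baseline projection step, not the author's combined model: given a word alignment between an English sentence and its Vietnamese translation, an English BaseNP span is mapped to the smallest contiguous Vietnamese span covering all of its aligned words. The function names project_span and is_consistent, the half-open span convention, and the hand-made sentence pair and alignment are hypothetical illustrations; a real system would obtain alignments from a statistical word aligner.

```python
from typing import Optional, Set, Tuple

# Hypothetical representation: a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]
Span = Tuple[int, int]  # half-open [start, end)


def project_span(src_span: Span, alignment: Alignment) -> Optional[Span]:
    """Project a source span onto the target side.

    Returns the smallest contiguous target span covering every target word
    aligned to a word inside src_span, or None when no word in the span is
    aligned (the null-alignment case).
    """
    start, end = src_span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    return min(targets), max(targets) + 1


def is_consistent(src_span: Span, tgt_span: Span, alignment: Alignment) -> bool:
    """Return False if some target word inside tgt_span is aligned to a
    source word outside src_span (an overlapping or conflicting projection)."""
    s0, s1 = src_span
    t0, t1 = tgt_span
    return all(s0 <= s < s1 for s, t in alignment if t0 <= t < t1)


# Hand-made illustrative pair (assumption, not data from the paper).
en = ["I", "bought", "the", "red", "book"]
vi = ["Tôi", "mua", "cuốn", "sách", "đỏ"]
# "the" is linked to the classifier "cuốn"; "red"/"book" cross over.
alignment: Alignment = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)}

np_span = (2, 5)  # English BaseNP "the red book"
tgt = project_span(np_span, alignment)
if tgt is not None and is_consistent(np_span, tgt, alignment):
    print(" ".join(en[np_span[0]:np_span[1]]), "->", " ".join(vi[tgt[0]:tgt[1]]))
# prints: the red book -> cuốn sách đỏ
```

Taking the minimum and maximum aligned positions lets the projected span follow Vietnamese word order, where the adjective đỏ follows the noun sách. A None result from project_span or a False result from is_consistent corresponds to the null-alignment and conflict cases that, according to the abstract, the proposed model is designed to handle better than plain projection.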
Published
10-06-2014
How to Cite
[1]
N. C. Hieu, “A model for exploiting the target language characteristics to extract bilingual base noun phrases”, JCC, vol. 30, no. 2, pp. 177–188, Jun. 2014.
Section
Computer Science
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute, and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish, or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is our own result, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams, or photographs, not owned by me/us.