French-Vietnamese statistical machine translation combining with chunk alignments
DOI:
https://doi.org/10.15625/1813-9663/29/4/4343
Keywords:
Bilingual corpus, statistical machine translation, chunk alignment
Abstract
Among statistical machine translation (SMT) models, phrase-based SMT is currently highly regarded. However, this model still lacks higher-level linguistic knowledge such as morphological, syntactic, and semantic information, so its results remain limited on long sentences. A promising approach is therefore to use linguistic information such as phrase chunking to reduce the length of the units being translated, which can improve translation quality and help disambiguate chunk alignment in long sentences. In this paper, we present a chunk alignment approach applied to French-Vietnamese SMT. We tested our system on a French-Vietnamese bilingual corpus of 10,000 pairs and evaluated it with translation quality metrics. The French-Vietnamese SMT model based on chunk alignment improves the BLEU score by almost 2% over the baseline model.
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our own and has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.