Retranslating number expression unknown words in Chinese-Vietnamese statistical machine translation
Keywords: Chinese-Vietnamese statistical machine translation, unknown word, named entity, number expression.
In Chinese and Vietnamese, word boundaries are not marked by spaces. Word segmentation is therefore always the first step in Chinese-Vietnamese natural language processing in general, and in Chinese-Vietnamese statistical machine translation (SMT) in particular. Segmentation improves the final translation quality, but many unknown words (UKW) still appear in the target translation. The most common type of unknown word in Chinese-Vietnamese translation systems is the named entity (NE). In this paper, we present a hybrid method that combines statistical and rule-based techniques to retranslate number expression NE unknown words (NumExp-NE-UKW). Experimental results from applying this method to Chinese-Vietnamese SMT show that it significantly improves translation performance.
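The abstract does not state the rules the method uses. As a rough illustration of what the rule-based half of such a hybrid might look like, the sketch below assumes that a number expression left untranslated by the SMT decoder survives as a single Chinese-numeral token in the Vietnamese output, and that rewriting it as Arabic digits is an acceptable Vietnamese rendering. The numeral tables and the names `chinese_to_int` and `retranslate_numbers` are hypothetical and do not come from the paper; the statistical component is not sketched here.

```python
import re

# Hypothetical rule tables for Chinese numerals (illustrative, not from the paper).
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}

# A token made only of Chinese digits and unit characters.
NUMERAL_RUN = re.compile("[" + "".join(DIGITS) + "".join(SMALL_UNITS) + "万亿]+")


def chinese_to_int(text: str) -> int:
    """Convert a Chinese numeral string such as '三百二十五' to an int."""
    units = set(SMALL_UNITS) | {"万", "亿"}
    if not any(ch in units for ch in text):
        # No unit characters: read positionally, e.g. the year '二〇二三' -> 2023.
        return int("".join(str(DIGITS[ch]) for ch in text))
    total = 0    # value already scaled by 万 / 亿
    section = 0  # value of the current group below 万
    digit = 0    # last digit not yet multiplied by a unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit, as in the leading 十 of '十五', means 1 x unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            # 万 scales only the current group below it.
            total += (section + digit) * 10_000
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything accumulated so far.
            total = (total + section + digit) * 100_000_000
            section = digit = 0
        else:
            raise ValueError(f"not a Chinese numeral character: {ch!r}")
    return total + section + digit


def retranslate_numbers(output_tokens: list[str]) -> list[str]:
    """Replace untranslated Chinese numeral tokens in SMT output with Arabic digits."""
    return [str(chinese_to_int(tok)) if NUMERAL_RUN.fullmatch(tok) else tok
            for tok in output_tokens]


print(retranslate_numbers(["tôi", "có", "三百二十五", "đồng"]))
```

Running it turns the leftover token `三百二十五` into `325`, giving `['tôi', 'có', '325', 'đồng']`. A real system would also need rules for ordinals, dates, percentages and measure words, and some way to decide when a statistical translation should be preferred over the rule output.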
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is our own original result, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.