Retranslating number expression unknown word in Chinese-Vietnamese statistical machine translation

Phước Thanh Trần, Điền Đinh
Authors

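  • Phước Thanh Trần, Faculty of Information Technology, Trường Đại Học Công nghiệp Thực Phẩm TPHCM (Ho Chi Minh City University of Food Industry)
  • Điền Đinh, Faculty of Information Technology, Trường Đại Học Khoa Học Tự Nhiên TPHCM (University of Science, Ho Chi Minh City)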

DOI:

https://doi.org/10.15625/1813-9663/30/2/2589

Keywords:

Chinese-Vietnamese statistical machine translation, unknown word, named entity, number expression.

Abstract

Word boundaries in Chinese and Vietnamese are not marked by spaces. Therefore, word segmentation of both languages is always performed first in Chinese-Vietnamese natural language processing tasks in general, and in Chinese-Vietnamese statistical machine translation (SMT) in particular. Word segmentation improves the final translation quality, but many unknown words (UKWs) still appear in the target translation. The most common type of unknown word in a Chinese-Vietnamese translation system is the named entity (NE). In this paper, we present a hybrid method that combines statistical and rule-based techniques to retranslate number-expression NE unknown words (NumExp-NE-UKW). When this method is applied to Chinese-Vietnamese SMT, the experimental results show that it significantly improves Chinese-Vietnamese SMT performance.
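The abstract describes the method only at a high level. As a rough illustration of what the rule-based half of such a pipeline might look like, here is a minimal Python sketch. It assumes the input is a single, already segmented Chinese token that the SMT decoder left untranslated. Everything in it is hypothetical and is not the authors' actual rule set: the names `chinese_to_int`, `to_int`, and `retranslate_number_token`, the regex patterns, and the three rules (dates, percentages, plain numerals). The statistical component, such as deciding which UKWs are number expressions, is not shown.

```python
import re
from typing import Optional

# Hypothetical lookup tables for Chinese numeral characters (illustrative only).
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10_000, "亿": 100_000_000}

# Hypothetical patterns: a number is either Arabic digits or Chinese numerals.
NUM = "(?:[0-9]+|[零〇一二两三四五六七八九十百千万亿]+)"
DATE_RE = re.compile(rf"({NUM})年(?:({NUM})月)?(?:({NUM})日)?")
PERCENT_RE = re.compile(rf"百分之({NUM})")
NUMBER_RE = re.compile(NUM)


def chinese_to_int(text: str) -> int:
    """Convert a Chinese numeral (e.g. 三千五百, 二〇一四) to an int.

    Simplified sketch: compound large units such as 万亿 are not handled.
    """
    # Digit-by-digit form, common for years: 二〇一四 -> 2014.
    if all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))
    total = 0    # value of completed 万/亿 groups
    section = 0  # value of the current group below 万/亿
    digit = 0    # digit still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 (as in 十二) means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"not a Chinese numeral character: {ch!r}")
    return total + section + digit


def to_int(s: str) -> int:
    """Parse either Arabic digits or a Chinese numeral."""
    if s.isascii() and s.isdecimal():
        return int(s)
    return chinese_to_int(s)


def retranslate_number_token(token: str) -> Optional[str]:
    """Hypothetical rule-based Vietnamese rendering of a number token, else None."""
    m = DATE_RE.fullmatch(token)
    if m:
        year, month, day = m.groups()
        # Vietnamese dates run day -> month -> year: "ngày 10 tháng 6 năm 2014".
        parts = []
        if day:
            parts.append(f"ngày {to_int(day)}")
        if month:
            parts.append(f"tháng {to_int(month)}")
        parts.append(f"năm {to_int(year)}")
        return " ".join(parts)
    m = PERCENT_RE.fullmatch(token)
    if m:
        return f"{to_int(m.group(1))}%"
    if NUMBER_RE.fullmatch(token):
        return str(to_int(token))
    return None  # not a number expression: leave it to other components


if __name__ == "__main__":
    # Hypothetical segmented tokens, some of which the decoder left untranslated.
    tokens = ["会议", "于", "2014年6月10日", "举行"]
    print([retranslate_number_token(t) for t in tokens])
    # → [None, None, 'ngày 10 tháng 6 năm 2014', None]
```

Normalizing to Arabic digits keeps the rules small, because Vietnamese text normally writes such quantities as digits. A real system would also need statistical evidence to decide which rule to apply when a token is ambiguous.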

Published

10-06-2014

How to Cite

[1]
P. T. Trần and Đ. Đinh, “Retranslating number expression unknown word in Chinese-Vietnamese statistical machine translation”, JCC, vol. 30, no. 2, pp. 127–138, Jun. 2014.

Issue

Vol. 30, No. 2 (2014)

Section

Computer Science