Statistical syntax-based machine translation approach to diacritization problem

Nguyen Minh Hai, Nguyễn Minh Tuấn

Authors

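  • Nguyen Minh Hai, Posts and Telecommunications Institute of Technology (Học viện Công nghệ Bưu chính-Viễn thông)
  • Nguyễn Minh Tuấn, Posts and Telecommunications Institute of Technology (Học viện Công nghệ Bưu chính-Viễn thông)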

DOI:

https://doi.org/10.15625/1813-9663/30/1/2839

Keywords:

Automatic diacritization, syntax-based machine translation, grammatical inference.

Abstract

In this paper, automatic diacritization is modeled as a statistical syntax-based machine translation problem in which the source is undiacritized text and the target is diacritized text in the same language. The grammatical inference technique ABL proposed in [2] is extended to learn a probabilistic synchronous context-free grammar from a training corpus that contains only plain diacritized sentences. Diacritization is then performed by parsing input sentences with the probabilistic CKY parsing algorithm under the learned grammar. The method is applied to Vietnamese with high-quality results. Because the construction is language-independent, it can also be applied to other languages.
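The decoding step mentioned in the abstract (probabilistic CKY parsing under a learned synchronous grammar) can be illustrated with a minimal sketch. It assumes a toy grammar already in Chomsky normal form with hand-picked probabilities. The rule tables `LEXICAL` and `BINARY` and the helpers `strip_diacritics` and `diacritize` are hypothetical illustrations, not the authors' grammar or implementation, and the grammar-learning step (the extended ABL) is not shown. Because source and target are the same language, each binary rule is assumed to keep its children in the same order on both sides, so every derivation pairs an undiacritized sentence with one diacritized rendering. The `strip_diacritics` helper shows one assumed way to derive the undiacritized source side from a corpus of diacritized sentences.

```python
import math
import unicodedata
from collections import defaultdict

# Hypothetical toy synchronous CFG in Chomsky normal form. The grammar learned
# by the paper's method need not have this shape; probabilities are made up.
# Lexical rules: (lhs, undiacritized source word, diacritized target word, prob)
LEXICAL = [
    ("P", "toi", "tôi", 1.0),
    ("N", "toi", "tội", 1.0),
    ("V", "di", "đi", 0.5),
    ("V", "hoc", "học", 0.4),
    ("V", "hoc", "hóc", 0.1),
]
# Binary rules: (lhs, left child, right child, prob). Assumed monotone:
# the target side keeps the source order, since no words are reordered.
BINARY = [
    ("S", "P", "VP", 1.0),
    ("VP", "V", "V", 1.0),
]


def strip_diacritics(text):
    """Derive the undiacritized source side from a diacritized sentence."""
    # Vietnamese đ/Đ are separate letters with no NFD decomposition.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritize(sentence, start="S"):
    """Viterbi CKY: return (best diacritized string, probability) or None."""
    words = sentence.split()
    n = len(words)
    # chart[(i, j)][A] = (log prob, target string) of the best A over words[i:j]
    chart = defaultdict(dict)

    def relax(cell, lhs, logp, target):
        # Keep only the highest-probability derivation per nonterminal.
        if lhs not in cell or logp > cell[lhs][0]:
            cell[lhs] = (logp, target)

    # Lexical rules fill the length-1 spans.
    for i, w in enumerate(words):
        for lhs, src, tgt, p in LEXICAL:
            if src == w:
                relax(chart[(i, i + 1)], lhs, math.log(p), tgt)

    # Binary rules combine adjacent spans, shortest spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for lhs, b, c, p in BINARY:
                    if b in left and c in right:
                        logp = math.log(p) + left[b][0] + right[c][0]
                        target = left[b][1] + " " + right[c][1]
                        relax(chart[(i, j)], lhs, logp, target)

    best = chart[(0, n)].get(start)
    if best is None:
        return None  # the sentence is not covered by the grammar
    return best[1], math.exp(best[0])


if __name__ == "__main__":
    source = strip_diacritics("tôi đi học")
    print(source)                  # toi di hoc
    print(diacritize(source)[0])   # tôi đi học
```

In this toy run the ambiguous word `hoc` (`học` vs. `hóc`) is resolved by keeping only the highest-probability derivation in each chart cell; log probabilities are summed instead of multiplying raw probabilities to avoid underflow on long sentences.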


Published

26-03-2014

How to Cite

[1]
N. M. Hai and N. M. Tuấn, “Statistical syntax-based machine translation approach to diacritization problem”, JCC, vol. 30, no. 1, pp. 39–48, Mar. 2014.


Section

Computer Science
