Statistical syntax-based machine translation approach to diacritization problem
DOI: https://doi.org/10.15625/1813-9663/30/1/2839
Keywords: Automatic diacritization, syntax-based machine translation, grammatical inference.
Abstract
In this paper, automatic diacritization of a language is modeled as a statistical syntax-based machine translation problem in which the source is undiacritized text and the target is diacritized text in the same language. The grammatical inference technique ABL proposed in [2] is extended to learn a probabilistic synchronous context-free grammar from a training corpus that contains only plain diacritized sentences. Diacritization is then performed by parsing input sentences with the probabilistic CKY parsing algorithm using the learned grammar. The method is applied to Vietnamese with high-quality results. Because the grammar is built in a language-independent way, the approach can also be applied to other languages.
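The decoding step described in the abstract can be illustrated with a minimal sketch. It assumes a tiny, hand-written grammar in binary (Chomsky-normal-like) form; the rules, probabilities, and the names `strip_diacritics`, `diacritize`, `LEXICAL`, and `BINARY` are hypothetical and are not taken from the paper, and the ABL-based grammar learning step is not shown. Since diacritization does not reorder words, each synchronous rule here keeps the same child order on the source and target sides, so a probabilistic CKY pass over the undiacritized tokens can carry the best diacritized string along with each chart entry. Stripping diacritics from a diacritized sentence also shows how the source side can be derived from a corpus that contains only diacritized text.

```python
import math
import unicodedata


def strip_diacritics(text):
    """Derive the undiacritized source side from a diacritized sentence."""
    # NFD splits accented letters into a base letter plus combining marks (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Vietnamese đ/Đ have no canonical decomposition, so map them explicitly.
    return unicodedata.normalize("NFC", base.replace("đ", "d").replace("Đ", "D"))


# Hypothetical toy synchronous grammar (illustration only, not learned from data).
LEXICAL = {  # source token -> [(nonterminal, target token, probability)]
    "toi": [("NP", "tôi", 1.0)],
    "di": [("V", "đi", 0.7), ("NP", "dì", 0.3)],
    "hoc": [("N", "học", 0.8), ("V", "hóc", 0.2)],
}
BINARY = [  # (parent, left child, right child, probability); same order on both sides
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "N", 0.6),
    ("VP", "V", "V", 0.4),
]


def diacritize(sentence, lexical=LEXICAL, binary=BINARY, start="S"):
    """Viterbi (probabilistic) CKY over source tokens; returns the best target string."""
    tokens = sentence.split()
    n = len(tokens)
    # chart[i][j] maps nonterminal -> (best log-probability, target string) for tokens[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        cell = chart[i][i + 1]
        for nt, target, p in lexical.get(tok, []):
            score = math.log(p)
            if nt not in cell or score > cell[nt][0]:
                cell[nt] = (score, target)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # every split point of the span
                left, right = chart[i][k], chart[k][j]
                for parent, lc, rc, p in binary:
                    if lc in left and rc in right:
                        score = math.log(p) + left[lc][0] + right[rc][0]
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, left[lc][1] + " " + right[rc][1])
    best = chart[0][n].get(start)
    return best[1] if best else None  # None when no full parse rooted at `start` exists


if __name__ == "__main__":
    source = strip_diacritics("tôi đi học")
    print(source, "->", diacritize(source))
```

With this toy grammar, the input `toi di hoc` is restored as `tôi đi học`, because the analysis `VP -> V N` with `đi` and `học` (0.6 × 0.7 × 0.8) outscores the competing `VP -> V V` reading with `hóc`. Keeping only the best entry per nonterminal in each cell is the usual Viterbi choice; a full system would also need a policy for tokens missing from the grammar, which this sketch does not attempt.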
Published
26-03-2014
How to Cite
[1]
N. M. Hai and N. M. Tuấn, “Statistical syntax-based machine translation approach to diacritization problem”, JCC, vol. 30, no. 1, pp. 39–48, Mar. 2014.
Section
Computer Science
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is the result of our own work, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.