Development of Vietnamese Speech Synthesis System using Deep Neural Networks
DOI: https://doi.org/10.15625/1813-9663/34/4/13172

Keywords: Text-to-speech, speech synthesis, deep neural network, hidden Markov model

Abstract
In this paper, we present our first Vietnamese speech synthesis system based on deep neural networks. To improve the quality of training data collected from the Internet, we propose a cleaning method. Experimental results indicate that deeper architectures achieve better TTS performance than shallow architectures such as hidden Markov models. We also examine the effect of training the TTS systems with different amounts of data. In the VLSP TTS challenge 2018, our DNN-based speech synthesis system ranked first in all three criteria: naturalness, intelligibility, and MOS.
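To illustrate the kind of model the abstract contrasts with an HMM, here is a minimal sketch (not the authors' code) of a deep feed-forward acoustic model for statistical parametric TTS: it maps frame-level linguistic features to acoustic features that a vocoder such as WORLD would render into a waveform. All dimensions and layer sizes below are hypothetical.

```python
# Illustrative sketch, assuming a standard DNN-based parametric TTS pipeline:
# frame-level linguistic features -> deep regression network -> acoustic
# features (e.g. mel-cepstra, F0) fed to a vocoder. Untrained weights;
# a real system would fit them with backpropagation on an MSE loss.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class AcousticDNN:
    """Deep feed-forward regression network: linguistic -> acoustic frames."""
    def __init__(self, in_dim, hidden_dims, out_dim):
        dims = [in_dim, *hidden_dims, out_dim]
        # Small random weights as placeholders for trained parameters.
        self.layers = [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
                       for a, b in zip(dims[:-1], dims[1:])]

    def forward(self, x):
        for i, (w, b) in enumerate(self.layers):
            x = x @ w + b
            if i < len(self.layers) - 1:   # linear output layer for regression
                x = relu(x)
        return x

# Hypothetical sizes: 300 linguistic features per frame, four hidden layers
# (the "deeper" architecture), 60 acoustic features per frame.
model = AcousticDNN(300, [512, 512, 512, 512], 60)
frames = rng.standard_normal((10, 300))    # 10 input frames
acoustic = model.forward(frames)
print(acoustic.shape)                      # (10, 60)
```

At synthesis time, each predicted acoustic frame would be passed to the vocoder to reconstruct the speech waveform.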
License
1. We hereby assign copyright of our article (the Work), in all forms of media whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work presents our own results, has not been published before in its current or a substantially similar form, is not under consideration by another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We further warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by us.