Development of Vietnamese Speech Synthesis System using Deep Neural Networks

Thinh Van Nguyen, Bao Quoc Nguyen, Kinh Huy Phan, Hai Van Do

DOI:

https://doi.org/10.15625/1813-9663/34/4/13172

Keywords:

Text-to-speech, speech synthesis, deep neural network, hidden Markov model

Abstract

In this paper, we present our first Vietnamese speech synthesis system based on deep neural networks (DNNs). We propose a cleaning method to improve the quality of training data collected from the Internet. The experimental results indicate that deeper architectures achieve better text-to-speech (TTS) performance than shallow architectures such as hidden Markov models. We also examine the effect of training the TTS systems on different amounts of data. In the VLSP TTS challenge 2018, our proposed DNN-based speech synthesis system won first place in all three categories: naturalness, intelligibility, and mean opinion score (MOS).
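
MOS in the abstract presumably refers to mean opinion score, which is conventionally the arithmetic mean of listener ratings, often on a five-point scale. The following is a minimal sketch of that calculation only; the function name and the sample ratings are hypothetical and are not taken from the paper or from the VLSP evaluation.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five listeners for one synthesized utterance
# (illustrative values, not data from the paper).
example_ratings = [4, 5, 3, 4, 4]
print(mean_opinion_score(example_ratings))
```

In practice, ratings are usually pooled over many listeners and utterances before systems are compared.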

Published

30-01-2019

How to Cite

[1] T. V. Nguyen, B. Q. Nguyen, K. H. Phan, and H. V. Do, “Development of Vietnamese Speech Synthesis System using Deep Neural Networks,” JCC, vol. 34, no. 4, pp. 349–363, Jan. 2019.

Section

Computer Science