Extract speech feature vectors for HMM-based Vietnamese speech synthesis system

Phan Thanh Sơn, Dương Tử Cường

Authors

  • Phan Thanh Sơn, Faculty of Information Technology, Military Technical Academy (Học viện Kỹ thuật Quân sự)
  • Dương Tử Cường, Military Technical Academy (Học viện Kỹ thuật Quân sự)

DOI:

https://doi.org/10.15625/1813-9663/29/1/2303

Keywords:

Vietnamese speech synthesis, context-dependent, speech parameterization, statistical parametric speech synthesis.

Abstract

In recent years, the statistical framework based on Hidden Markov Models (HMMs) has played an important role in speech synthesis. An HMM-based system can be built without a very large speech corpus for training. In this method, statistical modeling is applied to learn the distributions of context-dependent acoustic vectors extracted from speech signals, together with Vietnamese phonetic rules, in order to synthesize speech. Each vector contains a suitable parametric representation of one speech frame. The overall performance of such systems is often limited by the accuracy of the underlying speech parameterization and reconstruction method. The method proposed in this paper allows accurate extraction of MFCCs, F0, and tone, and high-quality reconstruction of speech signals assuming a Mel Log Spectrum Approximation (MLSA) filter. Its suitability for high-quality HMM-based speech synthesis is shown through subjective evaluations.
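
The idea of one parametric vector per speech frame can be illustrated with a minimal sketch. It is not the authors' extraction method: it only splits a signal into overlapping frames and estimates F0 per frame with a plain autocorrelation peak picker, and it leaves out MFCC and tone extraction entirely. The function names (`frame_signal`, `estimate_f0`), the 60 to 400 Hz search range, and the 0.3 voicing threshold are assumptions made for this illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames; a ragged tail is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, voicing=0.3):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Returns 0.0 when the frame is silent or looks unvoiced.
    The search range and voicing threshold are illustrative assumptions.
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]           # remove DC offset
    energy = sum(v * v for v in x)          # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0

    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(x) - 1)

    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r

    # Treat weakly periodic frames as unvoiced.
    if best_lag == 0 or best_r / energy < voicing:
        return 0.0
    return sample_rate / best_lag


# Synthetic check: a pure 200 Hz tone sampled at 8 kHz,
# cut into 50 ms frames with a 10 ms hop.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
frames = frame_signal(tone, frame_len=400, hop=80)
f0_track = [estimate_f0(f, sample_rate) for f in frames]
```

In a real system each frame's F0 value would be stacked with spectral parameters (such as the MFCCs named in the abstract) to form the acoustic vector; the sketch stops at the F0 track to stay short.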

Published

01-04-2013

How to Cite

[1] P. T. Sơn and D. T. Cường, “Extract speech feature vectors for HMM-based Vietnamese speech synthesis system”, JCC, vol. 29, no. 1, pp. 55–65, Apr. 2013.

Section

Computer Science
