Applying Bottle Neck Feature for Vietnamese speech recognition

Nguyễn Văn Huy, Lương Chi Mai, Vũ Tất Thắng

Authors

  • Nguyễn Văn Huy, Faculty of Electronics, Thai Nguyen University of Technology, Thai Nguyen, Vietnam
  • Lương Chi Mai, Institute of Information Technology, Vietnam Academy of Science and Technology
  • Vũ Tất Thắng, Institute of Information Technology, Vietnam Academy of Science and Technology

DOI:

https://doi.org/10.15625/1813-9663/29/4/4345

Keywords:

BNF, bottle neck feature, Vietnamese speech recognition, HMM-GMM

Abstract

This paper presents the basic idea of the Bottle Neck Feature (BNF) and the process for extracting it. We apply BNF to Vietnamese speech recognition using five-layer MLP networks with different sizes for the first hidden layer. The input features used to extract BNF are Perceptual Linear Prediction (PLP) and Mel Frequency Cepstral Coefficients (MFCC). The experiments are carried out on a VOV (Voice of Vietnam) data set. The results show that using BNF for Vietnamese speech recognition improves the word error rate (WER) by up to 6-7% compared with the baseline system, and that MFCC features give better results than PLP features.
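
Since the results above are reported as WER, a minimal sketch of how that metric is commonly computed may be useful: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample strings are hypothetical, and the abstract does not say which scoring tool the authors used, so this is only an illustration of the standard definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: whitespace tokenization is an assumption, not
    necessarily the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.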

Published

03-12-2013

How to Cite

[1]
N. V. Huy, L. C. Mai, and V. T. Thắng, “Applying Bottle Neck Feature for Vietnamese speech recognition”, JCC, vol. 29, no. 4, pp. 379–388, Dec. 2013.

Section

Computer Science