Applying Bottle Neck Feature for Vietnamese speech recognition

Nguyễn Văn Huy, Lương Chi Mai, Vũ Tất Thắng

Abstract


This paper presents the basic idea of the Bottle Neck Feature (BNF) and the process for extracting it. We apply BNF to Vietnamese speech recognition using a five-layer MLP network, trying several different sizes for the first hidden layer. The input features from which BNFs are extracted are Perceptual Linear Prediction (PLP) and Mel Frequency Cepstral Coefficients (MFCC). The experiments are carried out on a VOV (Voice of Vietnam) data set. The results show that using BNF for Vietnamese speech recognition improves the Word Error Rate (WER) by up to 6-7% compared to the baseline system, and that MFCC input features give better results than PLP.
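The extraction step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a conventional five-layer bottleneck MLP (input, first hidden layer, bottleneck, third hidden layer, output), sigmoid hidden units, a linear bottleneck, and a ±4-frame context window. The layer sizes, helper names (`stack_context`, `init_mlp`, `extract_bnf`) and random weights are hypothetical illustrations, not details taken from the paper. In practice the network would first be trained on phonetic targets, and only the layers up to the bottleneck are evaluated at extraction time.

```python
import numpy as np

def stack_context(feats, left=4, right=4):
    """Concatenate each frame with its neighbours (edge frames are repeated at the borders)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(layer_sizes, seed=0):
    """Random weights as a stand-in; a real BNF network is first trained on phonetic targets."""
    rng = np.random.default_rng(seed)
    return [(0.1 * rng.standard_normal((n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def extract_bnf(x, params, bottleneck=2):
    """Forward x up to the bottleneck layer and return its linear (pre-sigmoid) outputs.

    Layer 1 is the first hidden layer and layer 2 the bottleneck; the layers above it
    are only needed during training and are never evaluated here.
    """
    h = x
    for layer, (W, b) in enumerate(params, start=1):
        a = h @ W + b
        if layer == bottleneck:
            return a
        h = sigmoid(a)
    raise ValueError("bottleneck layer index exceeds network depth")

# Hypothetical sizes: 39-dim input frames, +/-4 frames of context, a first hidden
# layer of 1000 units (the size varied in the experiments), a 39-unit bottleneck,
# 1000 units in the third hidden layer and 120 output targets.
mfcc = np.random.default_rng(1).standard_normal((100, 39))  # stand-in for real MFCCs
x = stack_context(mfcc)                                      # shape (100, 351)
first_hidden = 1000
params = init_mlp([x.shape[1], first_hidden, 39, 1000, 120])
bnf = extract_bnf(x, params)                                 # shape (100, 39)
print(x.shape, bnf.shape)
```

Varying `first_hidden` mirrors the experiment of trying different first-hidden-layer sizes; swapping the stand-in `mfcc` array for PLP frames corresponds to the other input feature type. The resulting BNF vectors would then serve as observation features for the HMM-GMM recognizer named in the keywords.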

Keywords


BNF, bottle neck feature, Vietnamese speech recognition, HMM-GMM



DOI: https://doi.org/10.15625/1813-9663/29/4/4345

Journal of Computer Science and Cybernetics ISSN: 1813-9663

Published by Vietnam Academy of Science and Technology