ACCELERATION IN STATE-OF-THE-ART ASR APPLIED TO A VIETNAMESE TRANSCRIPTION SYSTEM
DOI: https://doi.org/10.15625/1813-9663/34/4/13181
Keywords: Vietnamese automatic speech recognition, transcription system
Abstract
This paper presents the adoption of state-of-the-art ASR techniques for Vietnamese. To assess these techniques more reliably, speech corpora from the research community are assembled and expanded into a unified evaluation set named VN-Corpus. On this corpus, three ASR systems are built using the conventional HMM-GMM recipe, SGMM, and DNN, respectively. In the experiments, the DNN system performs best, with an overall WER of 12.1%. In the best case, its error rate drops to 9.7%.
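The results above are reported as word error rate (WER), the word-level edit distance between the recognizer output and the reference transcript divided by the number of reference words: WER = (S + D + I) / N, for S substitutions, D deletions and I insertions. An overall WER of 12.1% therefore means roughly 12 errors per 100 reference words. The sketch below computes WER with a standard Levenshtein dynamic program. It assumes whitespace tokenization, which for Vietnamese text yields syllables; whether the paper scores syllables or multi-syllable words is not stated here, and the function name `word_error_rate` is illustrative, not taken from the paper or any toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: tokens are whitespace-separated units (syllables for
    Vietnamese text); this is an illustrative sketch, not the paper's scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the first i-1 reference
    # tokens and the first j hypothesis tokens (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr

    # Total edits divided by reference length; can exceed 1.0 with many insertions.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (học -> hát) and one deletion (nay) over 5 reference tokens.
    print(word_error_rate("tôi đi học hôm nay", "tôi đi hát hôm"))  # 0.4
```

Because the numerator counts insertions, WER is not bounded by 100%. Production scorers such as NIST sclite also report the S/D/I breakdown, which this sketch omits for brevity.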
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is the result of our own work, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.