I-MSV Challenge: language-adversarial training for Indic multilingual speaker verification
DOI: https://doi.org/10.15625/1813-9663/18320
Keywords: Speaker verification, adversarial training, multilingual
Abstract
Speaker verification now achieves a reasonable level of accuracy in voice-based biometric systems. Recent research on deep neural networks that predict speaker identity from speaker embeddings has achieved remarkable success. However, results remain limited when it comes to verifying multilingual speakers. In this paper, we propose an ensemble system submitted to the I-MSV Challenge 2022. The system is built upon the ECAPA-TDNN and RawNet models with additional adversarial training layers. Large Margin Cosine Loss training and Probabilistic Linear Discriminant Analysis (PLDA) back-end scoring are employed to further improve discrimination between speakers. Experimental results show that on the Constraint Private Test set of the task, our proposed model ranked third with an Equal Error Rate (EER) of 2.9734%.
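Two quantities named in the abstract can be made concrete in a few lines: the Large Margin Cosine Loss (CosFace) used during training and the Equal Error Rate used to rank systems. The Python sketch below is illustrative only and is not the authors' implementation. It assumes the cosine similarities between an L2-normalized embedding and each class weight vector are already computed, and the scale s = 30 and margin m = 0.2 are assumed example values, not settings reported in the paper. The EER routine sweeps thresholds over the observed scores and returns the mean of the false acceptance and false rejection rates where they are closest, a common approximation; the toy trial list is invented for illustration.

```python
import math
from typing import Sequence


def lmcl_loss(cosines: Sequence[float], target: int,
              s: float = 30.0, m: float = 0.2) -> float:
    """Large Margin Cosine Loss (CosFace) for one training example.

    cosines[j] is the cosine similarity between the L2-normalised speaker
    embedding and the L2-normalised weight vector of class j. The margin m
    is subtracted from the target-class cosine only, then all logits are
    scaled by s. s=30 and m=0.2 are assumed illustrative defaults.
    """
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Cross-entropy computed with a numerically stable log-sum-exp.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    labels[i] is 1 for a target (same-speaker) trial and 0 for a
    non-target trial; a trial is accepted when score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        false_accepts = sum(1 for sc, lb in zip(scores, labels)
                            if lb == 0 and sc >= threshold)
        false_rejects = sum(1 for sc, lb in zip(scores, labels)
                            if lb == 1 and sc < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy trial list (invented for illustration): 1 = same speaker.
    scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(f"EER = {equal_error_rate(scores, labels):.2%}")  # EER = 33.33%
    print(f"LMCL = {lmcl_loss([0.8, 0.1, -0.2], target=0):.4f}")
```

Subtracting m only from the target-class cosine means an embedding is penalized unless it beats every other class by at least that margin in cosine space, which is what pushes the learned speaker embeddings to be more discriminative. With m = 0 the loss reduces to ordinary softmax cross-entropy on scaled cosine logits.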
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work presents our own results, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.