A Comparison of Features for Multilingual Speaker Identification – A Review and Some Experimental Results
Pritam Limbaji Sale¹, Spoorti J. Jainar², B.G. Nagaraja³
¹Pritam Limbaji Sale, Research Scholar, VTU, Belagavi Belgaum (Karnataka), India.
²Spoorti J. Jainar, Research Scholar, VTU, Belagavi Belgaum (Karnataka), India.
³B.G. Nagaraja, Professor & Head, Department of E&CE, JIT, Davangere (Karnataka), India.
Manuscript received on 15 December 2018 | Revised Manuscript received on 27 December 2018 | Manuscript Published on 24 January 2019 | PP: 299-304 | Volume-7 Issue-4S2 December 2018 | Retrieval Number: Es2069017519/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Countries such as India, Canada and Malaysia are multilingual, and people in these countries are accustomed to using several languages. As applications of multilingual speaker identification systems have multiplied, interest in the area has grown notably in recent years. The accuracy of a speaker recognition system degrades severely when the training and testing speech are in different languages, and researchers have made many attempts to tackle this language mismatch. Choosing a suitable feature extraction method for obtaining the appropriate information from the speech signal is therefore an essential task. This paper reports a concise experimental review of ten feature extraction techniques for the multilingual scenario. Monolingual, crosslingual and multilingual speaker identification studies are carried out using 50 randomly selected speakers from the IITG multi-variability speaker recognition (IITG-MV) database. Comparative results indicate that subband centroid frequency coefficients (SCFC), linear frequency cepstral coefficients (LFCC) and multitaper Mel frequency cepstral coefficients (MFCC) features are considerably more useful in all the speaker identification studies. Further, drawing any conclusion about speaker identification performance in the language mismatch environment is difficult, as the distribution of speakers across languages is non-uniform.
Keywords: Speaker Identification, Monolingual, Cross Lingual, Multilingual, LPCC, MFCC, IMFCC, LFCC, RFCC, Multitaper MFCC, SCFC, SSFC, GMM-UBM.
Scope of the Article: Cross-Layer Optimization