Robust Speaker Diarization Based on Daubechies Wavelet, Nonlinear Energy Operator and Pyknogram
Sukhvinder Kaur¹, J.S. Sohal², Amit Gupta³
¹Sukhvinder Kaur, Research Scholar, Electronics and Communication Engineering, IKG Punjab Technical University, Punjab, India.
²J.S. Sohal, LCET, Ludhiana, Punjab, India.
³Amit Gupta, Electronics and Communication Engineering, IKGPTU, Punjab, India.
Manuscript received on November 12, 2019. | Revised Manuscript received on November 25, 2019. | Manuscript published on 30 November, 2019. | PP: 5653-5659 | Volume-8 Issue-4, November 2019. | Retrieval Number: D8535118419/2019©BEIESP | DOI: 10.35940/ijrte.D8535.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Two common disciplines of speech processing are speaker recognition (identifying and verifying a speaker) and speaker diarization (determining “who spoke when”). Motivated by applications such as automatic speaker recognition, speaker indexing, word counting, and audio transcription, speaker diarization (SD) has become a significant area of signal processing. The basic design steps of SD are feature extraction, voice activity detection (VAD), segmentation, and clustering. In the proposed system, VAD is performed with the Daubechies-40 discrete wavelet transform (DWT). The DWT is first used to compress, scale, and denoise the audio stream, which is then partitioned into short frames of 0.12 s. Next, features are extracted from each frame with a pyknogram based on the nonlinear energy operator (NEO). To measure the similarity between frames, a sliding window is applied with delta-BIC as the distance metric: a negative output indicates that the frames belong to the same segment, and a positive output indicates a change between segments. To improve the output of the segmentation stage, resegmentation is applied using the information change rate method. Finally, hierarchical clustering groups the homogeneous segments that correspond to a particular speaker, and the result is represented graphically by a dendrogram. The performance of SD was evaluated with the F-measure and the speaker diarization error rate (SER), and the results were compared with those of a traditional speaker diarization system that uses MFCC features and BIC for segmentation and clustering. The proposed diarization system reduces SER by 12.3%.
Keywords: Bayesian Information Criterion, Dendrogram, Diarization Error Rate, Pyknogram.
Scope of the Article: Security Technology and Information Assurance.
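The NEO-based pyknogram feature described in the abstract can be sketched as follows. This assumes the common formulation in which the Teager-Kaiser nonlinear energy operator drives the DESA-2 energy separation algorithm to estimate instantaneous frequency and amplitude, and each frame's feature is the amplitude-weighted mean frequency in each band. The 8 kHz sampling rate, the band edges, and the ideal FFT band-pass filters (a hypothetical stand-in for the Gabor filterbank usually used in pyknogram front ends) are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def teager_kaiser(x):
    """Nonlinear energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    The output is two samples shorter than x because the end points are dropped.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous frequency (rad/sample) and amplitude."""
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]               # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_x = teager_kaiser(x)[1:-1]   # trim so psi_x[k] and psi_y[k] refer to the same n
    psi_y = teager_kaiser(y)
    ratio = 1.0 - psi_y / (2.0 * psi_x + eps)
    omega = 0.5 * np.arccos(np.clip(ratio, -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return omega, amp


def pyknogram_frame(frame, fs, band_edges):
    """One pyknogram column: amplitude-weighted mean frequency (Hz) per band.

    Hypothetical simplification: ideal FFT band-pass filters replace the
    Gabor filterbank that pyknogram front ends normally use.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    column = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = np.where((freqs >= lo) & (freqs < hi), spectrum, 0.0)
        omega, amp = desa2(np.fft.irfft(band, n=len(frame)))
        weights = amp ** 2
        freq_hz = omega * fs / (2.0 * np.pi)
        column.append(np.sum(freq_hz * weights) / (np.sum(weights) + 1e-12))
    return np.array(column)


fs = 8000                            # assumed sampling rate
frame_len = int(round(0.12 * fs))    # 0.12 s frames, as in the abstract
n = np.arange(frame_len)
tone = np.cos(2 * np.pi * 1000 * n / fs)
# The middle band (500-1500 Hz) should come out close to 1000 Hz.
print(pyknogram_frame(tone, fs, [0, 500, 1500, 4001]))
```

For a pure tone the energy operator is constant (A² sin² Ω), which is why DESA-2 recovers the tone frequency exactly; on real speech, each band's weighted mean frequency tracks the dominant resonance in that band.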
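The sliding-window delta-BIC segmentation can be sketched as below, assuming the standard formulation in which a window of frames is modelled either by one full-covariance Gaussian or by two Gaussians split at a candidate frame, with a model-complexity penalty weighted by λ (`lam`). The sign convention matches the abstract: a negative value means the same segment, a positive value means a change. The window length, step, margin, and λ = 1 are hypothetical values, and the synthetic two-dimensional features only stand in for real frame features.

```python
import numpy as np


def log_det_cov(x, reg=1e-6):
    """Log-determinant of the full sample covariance of the rows of x (regularized)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False)) + reg * np.eye(x.shape[1])
    return np.linalg.slogdet(cov)[1]


def delta_bic(z, i, lam=1.0):
    """Delta-BIC for splitting window z (frames x dims) at frame i.

    Positive: two Gaussians (z[:i], z[i:]) fit better, i.e. a change point.
    Negative: one Gaussian suffices, i.e. the same segment.
    """
    n, d = z.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)  # params of a full-cov Gaussian
    return (0.5 * n * log_det_cov(z)
            - 0.5 * i * log_det_cov(z[:i])
            - 0.5 * (n - i) * log_det_cov(z[i:])
            - lam * penalty)


def detect_change_points(features, win=200, step=50, margin=20, lam=1.0):
    """Slide a window over the frames; report the best split whenever delta-BIC > 0."""
    assert 0 < margin < win // 2
    changes, start = [], 0
    while start + win <= len(features):
        z = features[start:start + win]
        scores = [delta_bic(z, i, lam) for i in range(margin, win - margin)]
        best = int(np.argmax(scores))
        if scores[best] > 0:
            changes.append(start + margin + best)
            start = changes[-1]      # restart the window at the detected change
        else:
            start += step
    return changes


n = np.arange(600)
feats = np.column_stack([np.cos(0.7 * n), np.sin(1.3 * n)])
feats[300:] += 5.0                   # simulated speaker change at frame 300
print(detect_change_points(feats))
```

The `margin` keeps each candidate split far enough from the window edges that both halves have enough frames for a covariance estimate; restarting the window at each detected change lets the next change be found relative to the new segment.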
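The hierarchical clustering stage can be sketched as agglomerative clustering over one vector per segment (for example, the mean feature vector of the segment). Average linkage on Euclidean distance and the stopping threshold are assumptions, since the abstract does not name a linkage or distance; the list of merges returned here is the information a dendrogram plots.

```python
import numpy as np


def agglomerative_cluster(segment_vectors, threshold):
    """Average-linkage agglomerative clustering with a distance stopping threshold.

    Returns (labels, merges); merges lists (cluster_a, cluster_b, distance)
    in merge order, i.e. the data a dendrogram draws.
    """
    x = np.asarray(segment_vectors, dtype=float)
    clusters = {i: [i] for i in range(len(x))}            # cluster id -> member segments
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # average linkage: mean distance over all cross-cluster pairs
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[2]:
                    best = (a, b, d)
        if best[2] > threshold:                            # closest pair too far apart: stop
            break
        a, b, d = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, float(d)))
    labels = np.empty(len(x), dtype=int)
    for label, key in enumerate(sorted(clusters)):
        labels[clusters[key]] = label
    return labels, merges


segments = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [0, 0.2]]
labels, merges = agglomerative_cluster(segments, threshold=1.0)
print(labels)  # segments 0, 1, 4 share one label; segments 2, 3 share another
```

Each remaining cluster is interpreted as one speaker; raising the threshold merges more segments and therefore yields fewer hypothesized speakers.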
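The error metric can be sketched at frame level using the standard diarization-error definition (missed speech + false alarm + speaker confusion, divided by total reference speech), assuming that the abstract's SER follows this definition. Labels of -1 mark non-speech frames; the best one-to-one mapping between hypothesis clusters and reference speakers is found by brute force, which is only practical for a handful of speakers, and no forgiveness collar is applied.

```python
import itertools

import numpy as np


def frame_der(ref, hyp):
    """Frame-level diarization error rate; label -1 means non-speech."""
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    ref_sp, hyp_sp = ref >= 0, hyp >= 0
    total = ref_sp.sum()
    missed = np.sum(ref_sp & ~hyp_sp)             # speech labelled as non-speech
    false_alarm = np.sum(~ref_sp & hyp_sp)        # non-speech labelled as speech
    both = ref_sp & hyp_sp
    ref_ids, hyp_ids = np.unique(ref[ref_sp]), np.unique(hyp[hyp_sp])
    # Try every one-to-one mapping of hypothesis clusters to reference speakers
    # (an index past the end of either list means "left unmapped").
    size = max(len(ref_ids), len(hyp_ids))
    best_correct = 0
    for perm in itertools.permutations(range(size)):
        correct = 0
        for h, r in enumerate(perm):
            if h < len(hyp_ids) and r < len(ref_ids):
                correct += np.sum(both & (hyp == hyp_ids[h]) & (ref == ref_ids[r]))
        best_correct = max(best_correct, correct)
    confusion = both.sum() - best_correct        # speech frames given the wrong speaker
    return (missed + false_alarm + confusion) / total


ref = [0, 0, 0, 1, 1, 1, -1, -1]
hyp = [5, 5, 7, 7, 7, 7, -1, 5]
print(frame_der(ref, hyp))  # 1 false-alarm frame + 1 confused frame over 6 speech frames
```

Cluster labels in the hypothesis are arbitrary, which is why the metric searches for the speaker mapping that agrees best with the reference before counting confusion errors.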