Audio-Visual Asynchrony in Malayalam Phonemes and Allophones
Bibish Kumar K T1, Sunil John2, Muraleedharan K M3, R K Sunil Kumar4
1Bibish Kumar K T, Computer Speech & Intelligence Research Centre, Department of Physics, Government College, Madappally, Vadakara, Calicut, Kerala, India.
2Sunil John, Computer Speech & Intelligence Research Centre, Department of Physics, Government College, Madappally, Vadakara, Calicut, Kerala, India.
3Muraleedharan K M, Computer Speech & Intelligence Research Centre, Department of Physics, Government College, Madappally, Vadakara, Calicut, Kerala, India.
4R K Sunil Kumar, School of Information Science and Technology, Kannur University, Kerala, India.
Manuscript received on 06 August 2019. | Revised Manuscript received on 12 August 2019. | Manuscript published on 30 September 2019. | PP: 8359-8362 | Volume-8 Issue-3 September 2019 | Retrieval Number: C6468098319/2019©BEIESP | DOI: 10.35940/ijrte.C6468.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Modeling the audio-visual asynchrony (AVA) is one of the essential problems to be investigated when the video signal is used alongside the audio signal in speech processing applications. AVA analysis deals with estimating the asynchrony between the audio and visual speech signals produced during the articulation of phonemes and allophones. Only a few works in the literature have addressed this problem, which indicates that more exploration is needed to tackle this open research issue. An audio-visual Malayalam speech database consisting of 50 phonemes and 106 allophones spoken by five native speakers has been created. The recorded visual information comprises the full facial region captured from a frontal view. Time annotation of the audio and video signals was performed manually. The durations of the audio and video signals of every phoneme and allophone are estimated from the time-annotated audio-visual database, and the asynchrony is then estimated as their difference. Asynchrony analysis was performed separately for phonemes and allophones to highlight the coarticulation effect. Multimodal speech recognition achieves greater accuracy than audio-only speech recognition, especially in noisy environments, and AVA plays a vital role in applications such as multimodal speech recognition and synthesis, automatic redubbing, etc.
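The core computation described in the abstract, asynchrony as the difference between the annotated audio and visual durations of each phone token, can be illustrated with a minimal sketch. The annotation fields and the example token below are hypothetical placeholders, assuming per-token start and end times in seconds taken from the manual time annotation:

```python
# Minimal sketch (hypothetical field names): estimating audio-visual
# asynchrony (AVA) as the difference between the annotated durations
# of the audio and visual realisations of one phoneme/allophone token.

from dataclasses import dataclass

@dataclass
class Annotation:
    """Manually annotated start/end times (seconds) of one phone token."""
    label: str          # phoneme or allophone label
    audio_start: float
    audio_end: float
    video_start: float
    video_end: float

def asynchrony(a: Annotation) -> float:
    """AVA = audio duration minus visual duration (positive: audio longer)."""
    audio_dur = a.audio_end - a.audio_start
    video_dur = a.video_end - a.video_start
    return audio_dur - video_dur

# Hypothetical example: a token whose visual gesture starts before and
# outlasts the acoustic signal, giving a negative AVA, as one would
# expect under anticipatory coarticulation.
token = Annotation("pa", 0.120, 0.245, 0.095, 0.260)
print(f"{token.label}: AVA = {asynchrony(token) * 1000:.1f} ms")
```

Computed over all tokens in the annotated database, such per-token differences can then be aggregated separately for phonemes and allophones, as the abstract describes, to expose the coarticulation effect.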
Keywords: Audio-Visual Asynchrony, Perseveratory Coarticulation, Anticipatory Coarticulation, Phonemes and Allophones.
Scope of the Article: Visual Analytics