Word N-Gram Based Approach for Word Sense Disambiguation in Telugu Natural Language Processing
Palanati Durga Prasad1, K.V.N. Sunitha2, B. Padmaja Rani3
1Palanati Durga Prasad, Academic Consultant, Department of CSE, UCET, MG University, Nalgonda (Telangana), India.
2Dr. K.V.N. Sunitha, Professor, Department of CSE, BVRIT, JNTUH, Hyderabad (Telangana), India.
3Dr. B. Padmaja Rani, Professor, Department of CSE, JNTUH, Hyderabad (Telangana), India.
Manuscript received on 29 April 2019 | Revised Manuscript received on 11 May 2019 | Manuscript Published on 17 May 2019 | PP: 686-690 | Volume-7 Issue-6S4 April 2019 | Retrieval Number: F11410476S419/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Telugu is a morphologically rich Dravidian language. Like other languages, it contains polysemous words whose meanings differ with context. Several language models exist to solve the word sense disambiguation problem for individual languages such as English, Chinese, Hindi and Kannada. The proposed method addresses word sense disambiguation using the n-gram technique, which has given good results in many other languages. The methodology finds the words that co-occur with a target polysemous word; we call these n-grams. A Telugu corpus is given as input to the training phase to compute n-gram joint probabilities. In the testing phase, these joint probabilities are used to assign the correct sense to the target polysemous word. We evaluate the proposed method on a set of polysemous Telugu nouns and verbs. The method achieves an F-measure of 0.94 when tested on a Telugu corpus collected from CIIL, various newspapers and story books. The method can give better results as the size of the training corpus increases, and in future we plan to evaluate it on all words, not only nouns and verbs.
Keywords: Joint Probabilities, Machine Translation, n-Grams, Word Sense Disambiguation.
Scope of the Article: Natural Language Processing
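As a rough illustration of the training and testing phases summarized in the abstract, the following minimal Python sketch counts the words that co-occur with a target word, estimates smoothed joint probabilities, and picks the highest-scoring sense. It is not the authors' implementation. The sense-tagged input format, the window size, the add-alpha smoothing, the log-sum scoring rule, and all names (`context_words`, `train`, `joint_probability`, `disambiguate`) are assumptions made for this example, and the placeholder tokens (`T`, `a1`, `b1`, ...) stand in for Telugu words.

```python
import math
from collections import Counter


def context_words(tokens, target_index, window=2):
    """Return the words within `window` positions of the target word."""
    start = max(0, target_index - window)
    left = tokens[start:target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right


def train(examples, window=2):
    """Count (sense, context word) pairs over sense-tagged examples.

    Each example is (tokens, target_index, sense); this format is an
    assumption of the sketch, not something the abstract specifies.
    """
    pair_counts = Counter()
    vocab, senses = set(), set()
    for tokens, target_index, sense in examples:
        senses.add(sense)
        for word in context_words(tokens, target_index, window):
            pair_counts[(sense, word)] += 1
            vocab.add(word)
    return pair_counts, vocab, senses


def joint_probability(model, sense, word, alpha=1.0):
    """Add-alpha smoothed estimate of P(sense, word) (assumed smoothing)."""
    pair_counts, vocab, senses = model
    total = sum(pair_counts.values())
    cells = len(senses) * (len(vocab) + 1)  # +1 reserves mass for unseen words
    return (pair_counts[(sense, word)] + alpha) / (total + alpha * cells)


def disambiguate(model, tokens, target_index, window=2):
    """Pick the sense whose context words have the highest joint log-score."""
    _, _, senses = model
    context = context_words(tokens, target_index, window)
    scores = {
        sense: sum(math.log(joint_probability(model, sense, w)) for w in context)
        for sense in senses
    }
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(scores), key=scores.get)


# Hypothetical toy data: "T" is the polysemous target word.
TRAIN = [
    (["a1", "T", "a2"], 1, "sense_A"),
    (["a2", "a3", "T"], 2, "sense_A"),
    (["b1", "T", "b2"], 1, "sense_B"),
    (["T", "b2", "b3"], 0, "sense_B"),
]
model = train(TRAIN)
predicted = disambiguate(model, ["a3", "T", "a1"], 1)
```

In this sketch a context word never seen in training still receives a small non-zero probability, so a single unseen word does not rule out a sense; whether the paper handles unseen words this way is not stated in the abstract.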