Efficient Technique for Word Identification and Recognition in Telugu Documents
Kesana Mohana Lakshmi1, Tummala Ranga Babu2
1Kesana Mohana Lakshmi, Department of Electronics and Communication Engineering, CMR Technical Campus, Hyderabad, Telangana, India.
2Tummala Ranga Babu, Department of Electronics and Communication Engineering, RVR&JC College of Engineering, Guntur, A.P, India.
Manuscript received on 17 March 2019 | Revised Manuscript received on 21 March 2019 | Manuscript published on 30 July 2019 | PP: 6053-6057 | Volume-8 Issue-2, July 2019 | Retrieval Number: B3793078219/19©BEIESP | DOI: 10.35940/ijrte.B3793.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Telugu is one of the most widely spoken Indian languages in the world. Because of its long heritage, Telugu literature and newspaper publications can be scanned so that the individual words in them can be identified. Identifying Telugu word images is difficult owing to the script's complex structure and large set of individual characters. This paper develops a novel methodology for this task that extracts SIFT (Scale-Invariant Feature Transform) features from Telugu word images and classifies them using a BoVW (Bag of Visual Words) representation. The features are clustered with k-means to create a dictionary of visual words. These visual words are used to build a visual codebook of the word images, and classification is performed with an SVM (Support Vector Machine).
Index Terms: Telugu, SIFT, SVM, BoVW
Scope of the Article: Pattern Recognition
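
The codebook stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, replaces real SIFT descriptors with small synthetic vectors, and stops at the histogram that an SVM would take as input. The function names (`kmeans`, `bovw_histogram`, `fake_descriptors`), the 8-dimensional vectors, and the choice of k = 2 are all assumptions made for illustration. In practice the descriptors would be 128-dimensional and could come from something like OpenCV's `cv2.SIFT_create().detectAndCompute(...)`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], vec))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the visual dictionary)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(descriptors, codebook):
    """Normalized count of descriptors assigned to each visual word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Synthetic stand-ins for the SIFT descriptors of two different word images
# (assumption: real descriptors would be 128-D vectors from a SIFT extractor).
rng = random.Random(1)


def fake_descriptors(center, n, dim=8):
    return [[center + rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n)]


word_a = fake_descriptors(0.0, 30)
word_b = fake_descriptors(5.0, 30)

# Cluster all descriptors into a codebook, then encode each word image.
codebook = kmeans(word_a + word_b, k=2)
hist_a = bovw_histogram(word_a, codebook)
hist_b = bovw_histogram(word_b, codebook)
print(hist_a, hist_b)
```

Each word image is reduced to a fixed-length histogram regardless of how many keypoints it contains, which is what lets a classifier such as an SVM (for example scikit-learn's `sklearn.svm.SVC`) be trained on labeled histograms in the final step.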