Language Identification for Multilingual Sentiment Examination
Deepali D. Londhe1, Aruna Kumari2, Emmanuel M.3
1Deepali Londhe, Research Scholar, Assistant Professor, Department of CSE, KL University, Vijayawada, PICT, Pune, SPPU, Pune (Maharashtra), India.
2Dr. Aruna Kumari, Ph.D. Guide, Professor, Department of CSE, VJIT, Hyderabad, K.L. University, Vijayawada (A.P), India.
3Dr. Emmanuel M, Professor, Department of IT, PICT, Pune, SPPU, Pune (Maharashtra), India.
Manuscript received on 19 October 2019 | Revised Manuscript received on 25 October 2019 | Manuscript Published on 02 November 2019 | PP: 3571-3576 | Volume-8 Issue-2S11 September 2019 | Retrieval Number: B14440982S1119/2019©BEIESP | DOI: 10.35940/ijrte.B1444.0982S1119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Social media is the most popular platform on which users share their views, reviews, and knowledge about topics, news, products, and more. Identifying users' sentiments or opinions is valuable to e-commerce companies, hotels, e-learning providers, and similar businesses, which can use this opinion analysis to improve their services and products. As the number of web users across the globe grows, users post their views freely over the internet. Because many different languages are spoken worldwide, social media is multilingual by nature, which makes analysis of such text difficult. Sentiment analysis can be conducted on video, images, or text; text sentiment analysis is the most popular form because content is freely available as blogs, reviews, comments, and so on. Since social media platforms let people post comments in any language, there is a need for multilingual sentiment analysis. The sentiment analysis task comprises phases such as data collection, pre-processing, sentiment classification, and polarity identification. Multilingual text additionally requires script identification: each word in the input text is labelled with the script in which it is written. The languages used in the text are then identified, and Hindi text written in Romanized script is transliterated to Devanagari script. The text is then translated completely into English, and POS (part-of-speech) tagging is performed on the result. The aim of this study is to survey techniques for multilingual sentiment analysis and for language identification of the source text, where the n-gram model outperforms the others.
Keywords: Language Processing, Sentiment Analysis, Machine Learning, Lexicon based Approach.
Scope of the Article: Natural Language Processing and Machine Translation
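
Illustrative sketch: The first two steps named in the abstract, script identification per word and n-gram language identification, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the method evaluated in the article. The script label is taken from the first word of each character's Unicode name, and the language identifier uses rank-order character trigram profiles (the "out-of-place" distance of Cavnar and Trenkle), which is one common n-gram approach and is assumed here. The function names, the language labels "en" and "hi-Latn", and the two toy training sentences are hypothetical.

```python
import unicodedata
from collections import Counter

# Assumption: fixed penalty for an n-gram absent from a language profile.
MAX_RANK = 300


def script_of(word):
    """Return the dominant script of a word, e.g. 'LATIN' or 'DEVANAGARI'."""
    counts = Counter()
    for ch in word:
        if ch.isalpha():
            # Unicode letter names begin with the script name,
            # e.g. "DEVANAGARI LETTER KA", "LATIN SMALL LETTER A".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "OTHER"


def label_scripts(text):
    """Label every whitespace-separated word with its script."""
    return [(word, script_of(word)) for word in text.split()]


def ngram_profile(text, n=3, top=MAX_RANK):
    """Return the most frequent character n-grams, most frequent first."""
    padded = " " + " ".join(text.lower().split()) + " "
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    return [gram for gram, _ in grams.most_common(top)]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between a document and a language profile."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(
        abs(r - rank[gram]) if gram in rank else MAX_RANK
        for r, gram in enumerate(doc_profile)
    )


def identify_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training data; a real system needs far larger corpora.
PROFILES = {
    "en": ngram_profile(
        "the movie was very good and the story is great this is a nice film"
    ),
    "hi-Latn": ngram_profile(
        "yeh film bahut acchi hai aur kahani bhi bahut badhiya hai "
        "mujhe pasand aayi"
    ),
}
```

Script labelling alone cannot separate Romanized Hindi from English, since both are written in Latin script. That is why a statistical step such as the trigram profile is needed before Romanized Hindi can be routed to transliteration. For example, label_scripts("yeh movie बहुत अच्छी hai") tags the two Devanagari words separately from the three Latin-script words, and identify_language("bahut acchi kahani hai", PROFILES) then distinguishes Romanized Hindi from English on the toy profiles.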