Issues in Urdu-Hindi NER Output of Google and Bing Translator: An Orthographic Perspective
Md. Tauseef Qamar1, Juhi Yasmeen2
1Md. Tauseef Qamar, Ph.D. Scholar, D/O Linguistics, AMU, Aligarh.
2Dr. Juhi Yasmeen, Ph.D. in Linguistics, AMU, Aligarh.
Manuscript received on November 17, 2019. | Revised Manuscript received on November 24, 2019. | Manuscript published on November 30, 2019. | PP: 12981-12985 | Volume-8 Issue-4, November 2019. | Retrieval Number: D8067118419/2019©BEIESP | DOI: 10.35940/ijrte.D8067.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Named Entity Recognition (NER) is a sub-task of information extraction in which names are extracted from text and linguistic corpora. It remains a hard problem for NLP researchers working with existing Machine Translation (MT) systems because of the long tail of named entities. NER has been an area of great interest in both MT and computational linguistics for decades, and several tools have been designed to handle named entities in different languages. This paper compares the end-user output of Google and Bing Translator with special reference to Urdu-Hindi NER, with the aim of providing insights for the development of intelligent language tools. The paper deals with the orthographic challenges pertaining to Urdu-Hindi NER in general and sheds light on transliteration issues in particular. We also investigate personal names and other named entities of Urdu, especially ezafat constructions. Based on the quality of the existing end-user output, the paper proposes handling NER from a language engineering point of view. The MT output of both Google and Bing is rated on a scale of 0 to 1, where 0 is assigned to correct output and 1 to wrong or inaccurate output.
Keywords: Named Entity Recognition, Urdu Orthographic Challenges, Ezafat, Google and Bing NER Urdu-Hindi Output.
Scope of the Article: Pattern Recognition.