Named Entity Recognition for a Low Resource Language
Abhijit Debbarma1, Paritosh Bhattacharya2, Bipul Shyam Purkayastha3
1Abhijit Debbarma, PhD Scholar, Department of Computer Sc & Engineering, NIT Agartala, Jirania, Tripura, India.
2Dr. Paritosh Bhattacharya, Associate Professor, Department of Computer Sc & Engineering, NIT Agartala, Jirania, Tripura, India.
3Prof. Bipul Shyam Purkayastha, Professor, Department of Computer Science, Assam University, Silchar, Assam, India.
Manuscript received on 9 August 2019. | Revised Manuscript received on 18 August 2019. | Manuscript published on 30 September 2019. | PP: 587-590 | Volume-8 Issue-3 September 2019 | Retrieval Number: B2085078219/19©BEIESP | DOI: 10.35940/ijrte.B2085.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This paper studies named entity recognition for Kokborok using a rule-based approach. Named entity recognition is an application of natural language processing and is considered a subtask of information extraction. It is the task of identifying named entities, such as the names of persons, organizations, and locations, for some specific purpose. Kokborok is an official language of the state of Tripura, situated in the northeastern part of India. It is also widely spoken in other northeastern states of India and in adjoining areas of Bangladesh. Named entity recognition has been studied using the machine learning approach, the rule-based approach, and a hybrid approach that combines the two. Rule-based named entity recognition is influenced by linguistic knowledge of the language. The machine learning approach requires a large amount of training data, and Kokborok, being a low-resource language, has very little. The rule-based approach requires linguistic rules, and its results do not depend on the size of the available data. We have framed heuristic rules for identifying named entities based on linguistic knowledge of the language. Testing our data with the rule-based approach gave an encouraging result. We also study and frame rules for the counting system in Kokborok. The rule-based approach to named entity recognition is found to be suitable for a low-resource language with limited digital work and no named-entity-tagged data. We have framed a suitable algorithm using these rules to solve the named entity recognition task and obtain a desirable result.
Index Terms: NER, Kokborok, Rule Base, NLP
Scope of the Article: Natural Language Processing
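As an illustration of the rule-based approach summarized in the abstract, the following is a minimal sketch of a tagger that combines context rules with a gazetteer lookup. It is not the authors' algorithm: the function name `tag_entities`, the word lists, and the three rules are hypothetical stand-ins, and the example uses English tokens drawn from this page rather than Kokborok linguistic rules, which the abstract does not list.

```python
from typing import List, Tuple

# Hypothetical resources for illustration only; not the rules used in the paper.
LOCATION_GAZETTEER = {"Tripura", "Agartala", "Silchar", "Assam", "India", "Bangladesh"}
PERSON_TITLES = {"Dr.", "Prof."}
ORG_HEAD_WORDS = {"University"}


def tag_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign PERSON, ORGANIZATION, LOCATION, or O to each token using hand-written rules."""
    tags = ["O"] * len(tokens)

    for i, tok in enumerate(tokens):
        # Rule 1 (assumed): a capitalized token right after a title is a person name.
        if tok in PERSON_TITLES and i + 1 < len(tokens) and tokens[i + 1][:1].isupper():
            tags[i + 1] = "PERSON"
        # Rule 2 (assumed): a capitalized token followed by an organization
        # head word forms an organization name together with that head word.
        if tok in ORG_HEAD_WORDS and i > 0 and tokens[i - 1][:1].isupper():
            tags[i - 1] = "ORGANIZATION"
            tags[i] = "ORGANIZATION"

    # Rule 3 (assumed): any still-untagged token found in the gazetteer is a location.
    for i, tok in enumerate(tokens):
        if tags[i] == "O" and tok in LOCATION_GAZETTEER:
            tags[i] = "LOCATION"

    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "Prof. Bipul visited Assam University in Agartala"
    for token, tag in tag_entities(sentence.split()):
        print(f"{token}\t{tag}")
```

Because the context rules run before the gazetteer lookup, "Assam" in "Assam University" is tagged as part of an organization and not as a location. A tagger of this kind needs no annotated training data, which is the property the abstract highlights for a low-resource language.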