Feature Selection using Normalized Weight Method for Tamil Text Classification
N. Rajkumar1, T.S. Subashini2, K. Rajan3, V. Ramalingam4
1N. Rajkumar*, Research Scholar, Department of Computer and Information Science, Annamalai University, India.
2T.S. Subashini, Department of Computer Science and Engineering, Annamalai University, Tamil Nadu, India.
3K. Rajan, Department of Computer Engineering, Muthiah Polytechnic College, Annamalainagar.
4V. Ramalingam, Department of Computer Science and Engineering, Annamalai University, Tamil Nadu, India.
Manuscript received on April 02, 2020. | Revised Manuscript received on April 15, 2020. | Manuscript published on May 30, 2020. | PP: 9-14 | Volume-9 Issue-1, May 2020. | Retrieval Number: F9068038620/2020©BEIESP | DOI: 10.35940/ijrte.F9068.059120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Feature selection simplifies Tamil text classification. In the current information age, applications on the World Wide Web have grown rapidly, and the volume of regional-language material in Tamil, such as web pages, e-mails, e-books, and other digital data, has grown enormously. As a result, retrieval of Tamil digital documents is in increasing demand among people searching for them. To retrieve the needed documents quickly from among millions of Tamil web documents, the documents should be classified by content into their classes. Tamil text classification is groundwork for many Tamil NLP applications, such as query response, information extraction, and information summarization, and text categorization is very important in the information retrieval field. Text categorization assigns a document to an appropriate category from a predefined set of categories; Tamil text classification does so based on the Tamil text in the document. Tamil is very rich in morphology and therefore has a very large set of word forms, so it is important to reduce the number of features extracted from Tamil text. This paper discusses feature selection using normalized weights from the large set of keywords in the preprocessed corpus. Feature selection with the normalized term weighting (TF*IDF) method reduces the size of the keyword list, which is very useful for training and testing Tamil text classification algorithms.
Keywords: Stop word, Stemming, Feature Selection, Text mining, Text Classification, NLP
Scope of the Article: Classification.
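To make the idea in the abstract concrete, the following is a minimal sketch of feature selection by normalized TF*IDF weight. It is not the authors' implementation: the abstract does not state the exact normalization or selection rule, so the sketch assumes cosine (L2) normalization per document and keeps the k terms with the highest normalized weight in any document. The function names `normalized_tfidf` and `select_features`, the parameter `k`, and the placeholder tokens are all hypothetical, and the input is assumed to be already tokenized, stop-word-filtered, and stemmed.

```python
import math
from collections import Counter


def normalized_tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumption: weights are raw TF * log(N / DF),
    then scaled so each document vector has unit L2 length.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        weights.append({t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()})
    return weights


def select_features(weights, k):
    """Keep the k terms whose best normalized weight over all documents is highest.

    Assumption: ranking by maximum weight; ties are broken alphabetically.
    """
    best = {}
    for doc_weights in weights:
        for term, w in doc_weights.items():
            best[term] = max(best.get(term, 0.0), w)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus with placeholder tokens standing in for stemmed Tamil keywords.
docs = [["a", "b", "common"], ["c", "common"], ["a", "d", "common"]]
weights = normalized_tfidf(docs)
selected = select_features(weights, 3)
```

In this toy corpus the token `common` occurs in every document, so its IDF is log(1) = 0 and it is never ranked above a term with a positive weight. This is the sense in which the weighting shrinks the keyword list before a classifier is trained.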