Document Classification Using KNN with Fuzzy Bags of Word Representation
P. Lakshmi Prasanna1, S. Manogni2, P. Tejaswini3, K. Tanmay Kumar4, K. Manasa5
1P. Lakshmi Prasanna, Assistant Professor, KL University, Guntur (Andhra Pradesh), India.
2S. Manogni, B.Tech Graduate, KL University, Guntur (Andhra Pradesh), India.
3P. Tejaswini, B.Tech Graduate, KL University, Guntur (Andhra Pradesh), India.
4K. Tanmay Kumar, B.Tech Graduate, KL University, Guntur (Andhra Pradesh), India.
5K. Manasa, B.Tech Graduate, KL University, Guntur (Andhra Pradesh), India.
Manuscript received on 25 March 2019 | Revised Manuscript received on 06 April 2019 | Manuscript Published on 18 April 2019 | PP: 631-634 | Volume-7 Issue-6S March 2019 | Retrieval Number: F03240376S19/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Text classification assigns documents to categories based on the words, phrases, and word combinations they contain, according to a declared syntax. It is used in many applications, such as artificial intelligence and organizing data by category. To classify a given document, a set of keywords, called Topics, is selected. These Topics identify the main idea of the document, so selecting them is an important step in classifying the document by category. In the proposed system, keywords are extracted from documents using TF-IDF and WordNet. The TF-IDF algorithm is used to select the important words by which a document can be classified, and WordNet is used to measure the similarity between these candidate words. The words with the maximum similarity are taken as the Topics (keywords). In this experiment we used the TF-IDF model to find similar words in order to classify the document. The decision tree algorithm gives better accuracy for text classification than the other algorithms compared. A fuzzy system is used to classify text written in natural language according to topic. A fuzzy classifier is necessary for this task because a given text can cover several topics to different degrees. In this context traditional classifiers are inappropriate, as they attempt to assign each text to a single class in a winner-takes-all fashion. The classifier we propose automatically learns its fuzzy rules from training examples. We applied it to classify news articles, and the results we obtained are promising. The dimensionality of the document vector is very important in text classification, and it can be reduced by clustering based on fuzzy logic. Documents can be classified according to their similarity and thus grouped into clusters by Topic. Once the clusters are formed, documents can be accessed and stored easily. In this way, similarity can be measured and the words called Topics can be summarized and used to classify the documents.
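To illustrate two ideas from the abstract, TF-IDF term weighting and graded (fuzzy) class membership, the following is a minimal Python sketch. It is not the authors' implementation: the helper names (`fit_idf`, `vectorize`, `cosine`, `fuzzy_knn`), the smoothed IDF formula, the similarity-weighted k-nearest-neighbour vote, and the toy documents are all assumptions made for illustration. The WordNet similarity step, the decision tree comparison, and the learned fuzzy rules are not covered.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a smoothed IDF weight per term from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def vectorize(doc, idf):
    """Build a sparse TF-IDF vector (dict) for one tokenized document."""
    tf = Counter(term for term in doc if term in idf)  # unseen terms are ignored
    total = sum(tf.values()) or 1
    return {term: (count / total) * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_knn(query_vec, train_vecs, train_labels, k=3):
    """Return a membership degree per class instead of a single winning label.

    Each of the k most similar training documents votes for its class with a
    weight proportional to its cosine similarity; memberships sum to 1.
    """
    scored = [(cosine(query_vec, v), y) for v, y in zip(train_vecs, train_labels)]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    classes = sorted(set(train_labels))
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        # No overlap with any neighbour: fall back to uniform memberships.
        return {c: 1.0 / len(classes) for c in classes}
    memberships = {c: 0.0 for c in classes}
    for sim, label in neighbours:
        memberships[label] += sim / total
    return memberships


# Toy corpus (hypothetical data, for illustration only).
train_docs = [
    "stock market shares fall".split(),
    "bank raises interest rates".split(),
    "team wins football match".split(),
    "player scores goal in match".split(),
]
train_labels = ["finance", "finance", "sport", "sport"]

idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf) for doc in train_docs]

query = "football club shares rise on stock market".split()
memberships = fuzzy_knn(vectorize(query, idf), train_vecs, train_labels, k=3)
print(memberships)
```

In this toy example the query touches both finance and sport vocabulary, so it receives a non-zero membership in each class rather than being forced into one, which is the behaviour the abstract contrasts with winner-takes-all classifiers.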
Keywords: Classification, Fuzzy, Representation, Language, Intelligence, Automatically.
Scope of the Article: Fuzzy Logics