The Impact of Feature Selection Methods for Classifying Arabic Textual Data
Mohammad Abu-Arqoub1, Ghassan F. Issa2, Wael M. Hadi3
1Mohammad Abu-Arqoub*, Faculty of Information Technology, University of Petra, Amman, Jordan.
2Ghassan F. Issa*, Faculty of Information Technology, University of Petra, Amman, Jordan.
3Wael M. Hadi, Faculty of Information Technology, University of Petra, Amman, Jordan.
Manuscript received on November 15, 2019. | Revised Manuscript received on November 23, 2019. | Manuscript published on November 30, 2019. | PP: 1333-1338 | Volume-8 Issue-4, November 2019. | Retrieval Number: D7163118419/2019©BEIESP | DOI: 10.35940/ijrte.D7163.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Text classification is a vital process due to the large volume of electronic articles. One of its main difficulties is the high dimensionality of the feature space. Scholars have developed several algorithms to choose relevant features from article text, such as Chi-square (χ²), Information Gain (IG), and Correlation (CFS). These algorithms have been investigated widely for English text, while studies on Arabic text are still limited. In this paper, we investigate four well-known classification algorithms, Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Decision Tree, on the benchmark Arabic textual dataset called Saudi Press Agency (SPA) to evaluate the impact of feature selection methods. Using the WEKA tool, we ran the four classification algorithms with and without the feature selection algorithms. The results provide clear evidence that the three feature selection methods often improve classification accuracy by eliminating irrelevant features.
Keywords: Feature Selection; Text Classification; Arabic Textual Data; Classical Algorithms.
Scope of the Article: Classification.
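As an illustration of the kind of feature selection the abstract refers to, the following is a minimal sketch of Chi-square term scoring in plain Python. It is not the authors' implementation (the experiments in the paper were run in WEKA). It assumes the common formulation χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), where A, B, C, and D count documents by whether they contain term t and whether they belong to class c. The function names, the toy documents, and the class labels are hypothetical and stand in for tokenized Arabic articles.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic between presence of `term` and membership in class `target`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    vocab = set().union(*docs)
    classes = set(labels)
    scores = {
        t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"match", "team", "win"},
    {"market", "bank", "team"},
    {"bank", "stock", "market"},
]
labels = ["sport", "sport", "economy", "economy"]

selected = select_top_k(docs, labels, 3)
print(selected)
```

In this toy corpus, a term such as "team" that appears in both classes scores lower than a term confined to one class, so it is among the first to be dropped. A classifier would then be trained only on the selected terms.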