TFIDF and Entropy for Sports, News and Gambling Web Content Classification
Muhammad Dawood1, Othman Bin Ibrahim2, Aliyu Mohammad Abali3
1Muhammad Dawood, Ph.D Student, Faculty of Computing, UTM, Malaysia.
2Othman Bin Ibrahim, Associate Professor, Faculty of Computing, UTM, Malaysia.
3Aliyu Mohammad Abali, Ph.D Student, Faculty of Computing, UTM, Malaysia.
Manuscript received on 22 April 2019 | Revised Manuscript received on 05 May 2019 | Manuscript Published on 17 May 2019 | PP: 25-30 | Volume-8 Issue-1S May 2019 | Retrieval Number: A10040581S19/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The exponential growth of online information has raised concerns about information security. Gambling web content is among the most harmful resources, polluting the minds of children and adolescents by disguising itself as sports and news web pages. Gambling can be taken up by people of any gender or age, and children are introduced to the Internet at an early age. Several web content-based analytical approaches have been proposed to prevent children from accessing such illegal web content. However, most of these approaches are weak at classifying web content of high similarity, such as gambling, sports and news web pages. In this paper, two existing term weighting schemes, namely TFIDF and Entropy, are used as the feature selection process in website filtering. We examine the performance of both techniques on datasets and compare the two term weighting schemes. Their suitability for feature selection is judged by the accuracy of the results obtained with a Support Vector Machine (SVM) classifier. The results show that TFIDF performed better than Entropy: on average, TFIDF achieved 97% accuracy and Entropy 91%. This study is hoped to give insight to other researchers, especially those who would like to work in the same area.
Keywords: URL, Term Weighting Schemes, TFIDF, Entropy, and SVM.
Scope of the Article: Web Technologies
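As a rough illustration of the two term weighting schemes named in the abstract, the following minimal Python sketch computes TFIDF and entropy weights for a toy corpus. The abstract does not give the formulas used, so the sketch assumes the common textbook forms: tf × log(N/df) for TFIDF, and the log-entropy scheme (local weight log(1 + tf) times a global entropy weight) for Entropy. The function names and the whitespace tokenizer are hypothetical, and the SVM classification step is omitted.

```python
import math
from collections import Counter


def term_counts(docs):
    """Tokenize each document on whitespace and count its terms (toy tokenizer, an assumption)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_weights(docs):
    """Assumed textbook TFIDF: weight = tf * log(N / df)."""
    counts = term_counts(docs)
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def entropy_weights(docs):
    """Assumed log-entropy scheme: weight = log(1 + tf) * (1 + sum_j p_j log p_j / log N)."""
    counts = term_counts(docs)
    n = len(docs)
    # Global frequency: total occurrences of each term across the corpus.
    gf = Counter()
    for c in counts:
        gf.update(c)
    global_w = {}
    for t, total in gf.items():
        h = 0.0
        for c in counts:
            tf = c.get(t, 0)
            if tf:
                p = tf / total
                h += p * math.log(p)
        # Terms spread evenly over all documents get a global weight near 0;
        # terms concentrated in one document get a global weight of 1.
        global_w[t] = 1.0 + h / math.log(n) if n > 1 else 1.0
    return [{t: math.log(1 + tf) * global_w[t] for t, tf in c.items()} for c in counts]
```

Under both schemes, a term that occurs equally often in every document receives a weight of (approximately) zero, while a term confined to a single document is weighted most heavily. The resulting per-document weight dictionaries could then be turned into feature vectors for a classifier such as an SVM.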