Impact of Tweet Features and Machine-Learning Classifiers for Twitter Spam Detection
S. Nitheesh Prabu1, Abhishek Pal2, S. Sundar Ram3, S. Karthika4
1S. Nitheesh Prabu, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam (Tamil Nadu), India.
2Abhishek Pal, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam (Tamil Nadu), India.
3S. Sundar Ram, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam (Tamil Nadu), India.
4S. Karthika, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam (Tamil Nadu), India.
Manuscript received on 22 April 2019 | Revised Manuscript received on 01 May 2019 | Manuscript Published on 07 May 2019 | PP: 31-35 | Volume-7 Issue-6S3 April 2019 | Retrieval Number: F1007376S19/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Twitter has changed how people get their everyday news and has given everyone a new platform for communication. This reach also attracts a large number of spammers to the platform. Twitter has an anti-spamming team and encourages its users to report tweets that they believe are spam. Although this helps in identifying spam, it does not guarantee real-time protection for the user. A number of mechanisms have been proposed to block spam and keep Twitter a safer place, and recent studies have focused on detecting Twitter spam by applying machine-learning algorithms. This paper is a study of such mechanisms. The dataset consists of 100,000 tweets extracted from a tweet dataset containing 600 million tweets, of which 6.5 million were spam. Each tweet was described using 12 features, and spam detection was then cast as a binary classification problem in this feature space. The importance of the features was analysed by studying the results of multiple classifiers and metrics.
Keywords: AdaBoost; Classification; Decision Tree; Logistic Regression; Machine Learning; Naïve Bayes; Random Forest; Social Networking; SVM; Twitter Spam.
Scope of the Article: Machine Learning
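The abstract describes comparing multiple classifiers with several metrics on a binary spam/non-spam task. The following is a minimal illustrative sketch of that kind of comparison, not the authors' evaluation code. The abstract does not name its metrics, so the choice of precision, recall, F1 and accuracy is an assumption, and the labels, the predictions, and the names `classifier_a` and `classifier_b` are made-up placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Score binary predictions, treating label 1 (spam) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-spam flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-spam left alone

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f1, accuracy)


# Hypothetical ground truth (1 = spam, 0 = non-spam) and predictions
# from two made-up classifiers; none of this is data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = {
    "classifier_a": [1, 1, 0, 0, 0, 0, 0, 1],
    "classifier_b": [1, 0, 0, 0, 0, 0, 0, 0],
}

for name, y_pred in predictions.items():
    m = evaluate(y_true, y_pred)
    print(f"{name}: precision={m.precision:.2f} recall={m.recall:.2f} "
          f"f1={m.f1:.2f} accuracy={m.accuracy:.2f}")
```

In this toy example both classifiers reach the same accuracy (0.75), yet `classifier_b` misses two of the three spam tweets and so has a much lower recall. This is one reason to report several metrics when classes are imbalanced, as they are in a tweet stream where spam is a small minority.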