K-Means Cluster Based Oversampling Algorithm for Imbalanced Data Classification
S. Santha Subbulaxmi1, G. Arumugam2
1Ms. S. Santha Subbulaxmi,* Research Scholar, Madurai Kamaraj University, Madurai, Tamil Nadu, India.
2Dr. G. Arumugam, Professor & Head of the Department (Retd.), Department of Computer Science, Madurai Kamaraj University, Madurai, Tamil Nadu, India.
Manuscript received on January 01, 2020. | Revised Manuscript received on January 20, 2020. | Manuscript published on January 30, 2020. | PP: 3436-3440 | Volume-8 Issue-5, January 2020. | Retrieval Number: E6535018520/2020©BEIESP | DOI: 10.35940/ijrte.E6535.018520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Imbalanced data classification problems aim to predict a dependent variable when the class distribution of the data is skewed. Such problems arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. In this paper, a K-Means cluster-based oversampling algorithm is proposed to address the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms existing oversampling algorithms from previous studies.
Keywords: Imbalanced Data, Classification, Oversampling, K-Means Clustering, Synthetic Instances.
Scope of the Article: Classification.
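
As a rough illustration of the general idea named in the abstract, cluster-based oversampling, the following minimal sketch groups the minority-class instances with K-Means and then creates synthetic instances by interpolating between two instances of the same cluster. It is not the authors' algorithm, whose steps are not given in the abstract. The function names `kmeans` and `oversample`, the interpolation rule, and the toy data are all assumptions made for this example.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def oversample(minority, n_new, k=2, seed=0):
    """Hypothetical helper: build n_new synthetic minority instances.

    Assumption for this sketch: a synthetic instance is a random point on
    the segment between two instances drawn from the same K-Means cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    # Interpolation needs at least two instances in a cluster.
    clusters = [c for c in clusters if len(c) >= 2]
    if not clusters:
        raise ValueError("no cluster has two or more instances")
    synthetic = []
    while len(synthetic) < n_new:
        a, b = rng.sample(rng.choice(clusters), 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class with two well-separated groups (made-up data).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
new_points = oversample(minority, n_new=4, k=2)
print(new_points)
```

Interpolating only within a cluster keeps each synthetic instance close to existing minority instances, so no new point falls in the empty region between the two groups. How many instances to generate per cluster and how to choose `k` are design decisions this sketch leaves open.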