Learning from Imbalanced Data in Classification
Seema S. Yadav1, Girish P. Bhole2
1Seema S. Yadav, Research Scholar, Computer Engineering, Veermata Jijabai Technological Institute, Matunga, Mumbai, India.
2Girish P. Bhole, Professor, Computer Engineering, Veermata Jijabai Technological Institute, Matunga, Mumbai, India.
Manuscript received on January 02, 2020. | Revised Manuscript received on January 15, 2020. | Manuscript published on January 30, 2020. | PP: 1907-1916 | Volume-8 Issue-5, January 2020. | Retrieval Number: E6286018520/2020©BEIESP | DOI: 10.35940/ijrte.E6286.018520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Learning from imbalanced data is an active and steadily developing research area, which motivates researchers to look for efficient and adaptive methods for real-world problems. In both machine learning and data mining, researchers are developing methods to address the problems posed by imbalanced datasets and the challenges they create in everyday applications. An uneven class distribution in the dataset degrades the performance of data mining and machine learning approaches. As machine learning and data mining continue to advance and are combined with big data, new challenges emerge, and a deep insight into the nature of learning from imbalanced data is required. Solutions fall into algorithm-level and data-level approaches, and the hybrid approach that combines them is the most popular. Classifiers are found to be biased toward the majority class, which affects the decision-making task and the overall accuracy of classification. The ensemble method is an efficient technique for dealing with an uneven distribution of data. The aim of this paper is to present an overview of the class imbalance problem, the solutions for handling it, and the open issues and challenges in learning from imbalanced datasets. An experiment conducted on one dataset shows that an ensemble technique combined with data-level methods gives good results. This hybrid method can be applied in many real-life applications, such as software defect prediction, behavior analysis, intrusion detection, and medical diagnosis. The paper further provides research directions for learning from imbalanced datasets.
Keywords: Class Imbalance, Classifier, Majority and Minority Class, Biasing, Sampling, Feature Selection, Imbalanced Learning, Machine Learning, Preprocessing.
Scope of the Article: Machine Learning.