Bug Severity Prediction using Class Imbalance Problem
Shubhra Goyal Jindal1, Arvinder Kaur2
1Shubhra Goyal Jindal*, University School of Information and Communication Technology, Guru Gobind Singh Indraprastha University, Delhi, India.
2Arvinder Kaur, University School of Information and Communication Technology, Guru Gobind Singh Indraprastha University, Delhi, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on November 28, 2019. | Manuscript published on November 30, 2019. | PP: 2687-2695 | Volume-8 Issue-4, November 2019. | Retrieval Number: D7297118419/2019©BEIESP | DOI: 10.35940/ijrte.D7297.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The class imbalance problem arises when instances of the majority class greatly outnumber instances of the minority class. Imbalanced data severely degrades the performance of machine learning techniques in several fields. With a skewed distribution, a classifier either predicts the majority class with a high error rate or fails to predict the minority class at all. To address imbalance in software bug data, the Synthetic Minority Oversampling Technique (SMOTE) is used to balance datasets from Apache projects. It is applied to bug summaries to balance the dataset, and severity is then predicted at the system and component levels. Several machine learning techniques are applied to both the imbalanced and the balanced datasets to predict the severity of software bugs from their textual descriptions. Test outcomes and statistical analysis show improved results on the balanced datasets, in terms of the Gmean and balance metrics, compared with the same techniques applied to imbalanced data. Gmean improves by 34% and balance by 11% at the system level, and by 42% and 62%, respectively, at the component level. Further, solving the class imbalance problem on textual data was observed to improve prediction performance.
Keywords: Class Imbalance Problem, Severity Prediction, Synthetic Minority Oversampling Technique, Software Bug Reports.
Scope of the Article: Knowledge Engineering Tools and Techniques.
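Illustrative sketch: The abstract names SMOTE and the Gmean and balance metrics without defining them here, so the following Python sketch shows one common reading of each. It is not the authors' implementation. It assumes that bug summaries have already been converted to numeric feature vectors (the vectorization step is not described in this section), that SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, and that the metrics follow their usual two-class definitions: Gmean as the square root of recall times specificity, and balance as one minus the normalized distance from the ideal point (pf = 0, pd = 1). The function names and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def gmean(tp, fn, tn, fp):
    """Geometric mean of recall and specificity (assumed definition)."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(recall * specificity)


def balance(tp, fn, tn, fp):
    """One minus the normalized distance from pf = 0, pd = 1 (assumed definition)."""
    pd = tp / (tp + fn)  # probability of detection
    pf = fp / (fp + tn)  # probability of false alarm
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


# Toy minority-class vectors standing in for vectorized bug summaries.
minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
new_points = smote(minority, n_new=4, k=2)

# A classifier that never predicts the minority class scores a Gmean of 0,
# even though 90 of its 100 predictions are correct.
always_majority = gmean(tp=0, fn=10, tn=90, fp=0)
```

Because both metrics depend on the minority-class detection rate, they penalize a model that ignores the minority class, which is consistent with the abstract reporting them in place of plain accuracy. Severity prediction usually involves more than two levels; how the two-class definitions above extend to that setting is not stated in this section.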