An Integrated Strategy for Data Mining Based on Identifying Important and Contradicting Variables for Breast Cancer Recurrence Research
Avijit Kumar Chaudhuri1, Deepankar Sinha2, Kousik Bhattacharya3, Anirban Das4
1Avijit Kumar Chaudhuri, Assistant Professor, Academic In-Charge, Sikkim Manipal University Learning Centre, Kolkata.
2Deepankar Sinha, PhD, Indian Institute of Technology (IIT), Kharagpur.
3Kousik Bhattacharya, Assistant Registrar (Exam.), DDE, Rabindra Bharati University, Kolkata, West Bengal.
4Anirban Das, HOD, Computer Science & Engineering, Amity University, Kolkata.
Manuscript received on February 02, 2020. | Revised Manuscript received on February 10, 2020. | Manuscript published on March 30, 2020. | PP: 1096-1106 | Volume-8 Issue-6, March 2020. | Retrieval Number: F7567038620/2020©BEIESP | DOI: 10.35940/ijrte.F7567.038620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Cancer is among the leading causes of death worldwide, and breast cancer is a leading cause of death among women. The disease is distinctive in that, once treated, it can recur in some cases. Individuals are often unable to identify their condition before it becomes dangerous. Extracting significant predictive features of breast cancer is an important but risky task for further study. Researchers have applied data mining techniques in medical science, and several authors suggest that a single method does not resolve diagnostic problems and that a hybrid model is desirable. In this paper, the authors propose an integrated approach to avoid Type 1 and Type 2 errors in predicting recurrence. They identify important and contradicting variables and consider them for inclusion and exclusion, respectively, to revise the dataset. Evaluating the findings of the key methods on both the original and revised datasets widens the choice when identifying the technique with higher accuracy. The results show that accuracy improves when the selection of variables is restricted to those identified as relatively significant and the dataset is revised by eliminating contradicting variables.
Keywords: Data Mining Techniques, Errors, Integrated Approach, Under-Estimation, Recurrence, Breast Cancer
Scope of the Article: Data Mining.
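
The Type 1 and Type 2 errors named in the abstract can be made concrete with a small sketch. It assumes binary labels in which 1 means recurrence and 0 means no recurrence, so a Type 1 error is a predicted recurrence that did not occur and a Type 2 error is a missed recurrence. The function name `error_summary` and the sample labels are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
def error_summary(actual, predicted):
    """Count Type 1 / Type 2 errors and accuracy for binary labels.

    Assumption: 1 = recurrence, 0 = no recurrence.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # recurrence correctly predicted
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-recurrence correctly predicted
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type 1 error (false positive)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type 2 error (false negative)
    return {
        "type1_errors": fp,
        "type2_errors": fn,
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up labels for illustration only; not data from the paper.
actual = [1, 0, 0, 1, 1, 0, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]

summary = error_summary(actual, predicted)
print(summary)
```

Computing the same summary for a model trained on the original dataset and for one trained on a revised dataset would be one simple way to compare the two, in the spirit of the evaluation the abstract describes.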