Classification of Gene Expression Data with Optimized Feature Selection
T. Ragunthar¹, S. Selvakumar²
¹T. Ragunthar, Assistant Professor, Department of Computer Science & Engineering, Sri Sairam Institute of Technology, Chennai, India.
²S. Selvakumar, Professor, Department of Computer Science & Engineering, GKM College of Engineering & Technology, Chennai, India.
Manuscript received on 04 March 2019 | Revised Manuscript received on 09 March 2019 | Manuscript published on 30 July 2019 | PP: 4763-4769 | Volume-8 Issue-2, July 2019 | Retrieval Number: B1845078219/19©BEIESP | DOI: 10.35940/ijrte.B1845.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Several fatal diseases can spread to various parts of the body, so it is essential to detect such anomalies early in order to limit their spread. Examining the characteristics of genes provides deep insight into disease classification, because genes play a vital role in determining how an organism appears, behaves and survives in its environment. Abnormal genes can be detected efficiently using statistical methods and machine learning approaches, and gene expression data derived from microarrays can support this statistical computation. Microarray technology, a recent advance in molecular biology, hybridizes DNA samples so that they can be read as values reflecting the expression level of each gene in the genome. We propose using the Boruta feature selection algorithm to select a subset of informative features from the very large number of features in the gene expression profiles. We then compare several supervised classification algorithms on this subset to classify genes as normal or deviant, in order to identify the algorithm best suited to classifying gene expression data. This could make identifying abnormal genes faster and easier in the future.
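The Boruta step can be illustrated with a minimal, standard-library Python sketch. It shows only the core idea: in each round every feature column is shuffled into a "shadow" copy, a real feature scores a hit when its importance beats the best shadow, and binomial tests on the hit count (in both directions) mark the feature as confirmed, rejected or tentative. Assumptions, not taken from the paper: absolute Pearson correlation is swapped in for the random forest importance that Boruta normally uses, rejected features are not dropped between rounds, no multiple-testing correction is applied, and the names `abs_corr`, `binom_upper_tail` and `boruta_sketch` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a STAND-IN importance score.

    Real Boruta uses random forest importance; this swap is an assumption
    made to keep the sketch dependency-free.
    """
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return abs(cov / (sx * sy))


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(columns, labels, n_iter=50, alpha=0.01, seed=0):
    """Label each feature column 'confirmed', 'rejected' or 'tentative'.

    columns: list of feature columns (one list of values per gene).
    labels:  class label per sample (e.g. 0 = normal, 1 = abnormal).
    """
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each column's values
        # but destroy any relationship with the labels.
        shadows = [rng.sample(col, len(col)) for col in columns]
        best_shadow = max(abs_corr(s, labels) for s in shadows)
        # A "hit" means the real feature beat the best shadow this round.
        for j, col in enumerate(columns):
            if abs_corr(col, labels) > best_shadow:
                hits[j] += 1
    decisions = []
    for h in hits:
        # Under the null hypothesis a feature wins each round with p = 0.5.
        if binom_upper_tail(h, n_iter) < alpha:
            decisions.append("confirmed")
        elif binom_upper_tail(n_iter - h, n_iter) < alpha:  # equals P(X <= h) by symmetry
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return decisions
```

For example, with 20 samples labelled 0 (normal) or 1 (abnormal), a column that tracks the labels closely is confirmed, while a column whose values are balanced across both classes is rejected. In practice one would compute random forest importances, as Boruta itself does, rather than a correlation score.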
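The comparative step can be sketched as scoring each candidate classifier with leave-one-out accuracy on the selected features and keeping the best-scoring one. This is a hedged illustration, not the paper's protocol: the keywords point to Random Forest and SVM, but to stay dependency-free this sketch swaps in two simple classifiers, nearest centroid and 1-nearest-neighbour, and the helper names are hypothetical.

```python
import math


def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def one_nn(train_X, train_y, x):
    """Predict the class of the single closest training sample."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def loo_accuracy(classify, X, y):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def compare_classifiers(classifiers, X, y):
    """Return (best_name, scores) for a dict of name -> classify function."""
    scores = {name: loo_accuracy(fn, X, y) for name, fn in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Leave-one-out is used here because microarray studies typically have few samples; with larger datasets, k-fold cross-validation is the cheaper choice.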
KEYWORDS: Boruta algorithm, DNA Samples, Feature Selection, Gene Expression data, Kernel, Machine Learning, Microarray, Random Forest, SVM.
Scope of the Article: Machine Learning