Handling Scarcity of Data in Autism Identification using Binary Imputation Method
Sushama Rani Dutta1, Sujoy Datta2, Monideepa Roy3
1Sushama Rani Dutta, SRF in ITRA Project in the School of Computer Engineering, KIIT Deemed to be University
2Sujoy Datta, Assistant Professor in the School of Computer Engineering, KIIT Deemed to be University
3Monideepa Roy, Associate Professor at KIIT Deemed to be University, Bhubaneswar.
Manuscript received on 15 April 2019 | Revised Manuscript received on 22 May 2019 | Manuscript published on 30 May 2019 | PP: 3087-3094 | Volume-8 Issue-1, May 2019 | Retrieval Number: A2469058119/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Autism is a neurodevelopmental disorder. Identifying the type of autism is a crucial task for a doctor, since each type calls for a different therapy. In rural areas, identifying suspected autistic children is difficult, mainly because parents are often uneducated and unable to notice and describe their children's symptoms properly. As a result, doctors are often left with incomplete datasets, which makes diagnosis difficult or error-prone. In our previous work, we proposed a framework that helps doctors and the parents of suspected patients in rural areas recall as many symptoms as possible: once the parent mentions a first symptom, the framework prompts for associated symptoms, drawn from the data of previous autistic children stored in Electronic Health Records (EHR). However, in surveys where this procedure has not been used, the complete set of symptoms for a patient may not be available. The resulting incomplete datasets, that is, datasets with missing symptoms, make the diagnosis of autism very difficult. In this paper, we propose a Binary Imputation Method (BIM) algorithm to handle such missing symptoms in the collected datasets, using the weight factors of the symptoms (the influence of each parameter on the disease diagnosis). The method imputes a binary “1” in place of some missing attributes, as selected by the proposed BIM. We use the Levenshtein distance (LD) formula to find the suspected child by imputing “1” in place of only one high-weight missing symptom in a dataset. The method has been tested on the collected Asperger syndrome (a type of autism) datasets for identification of autism.
We obtain better accuracy in diagnosing autism and in finding the suspected child than other methods for handling missing values, such as K-nearest-neighbour imputation, mean imputation, and case deletion. The method will help doctors diagnose from datasets with missing symptoms, because all the missing symptoms can be handled by the BIM algorithm.
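The abstract does not give the BIM procedure in detail, so the following is only a minimal sketch of the idea it describes: impute a binary 1 for the single highest-weight missing symptom, then compare the resulting symptom string with a reference profile using the Levenshtein distance. The function names, the encoding of still-missing symptoms as "0", the reference profile, and the distance threshold are all illustrative assumptions, not the authors' algorithm.

```python
from typing import List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def impute_highest_weight(record: Sequence[Optional[int]],
                          weights: Sequence[float]) -> List[Optional[int]]:
    """Set only the highest-weight missing symptom (None) to 1.

    Hypothetical rule inferred from the abstract, not the published BIM.
    """
    filled = list(record)
    missing = [i for i, value in enumerate(record) if value is None]
    if missing:
        top = max(missing, key=lambda i: weights[i])
        filled[top] = 1
    return filled


def to_symptom_string(record: Sequence[Optional[int]]) -> str:
    # Assumption: symptoms still missing after imputation are written as "0".
    return "".join("1" if value == 1 else "0" for value in record)


def is_suspected(record: Sequence[Optional[int]],
                 weights: Sequence[float],
                 reference: str,
                 max_distance: int) -> bool:
    """Flag a child whose imputed symptom string is close to the reference."""
    filled = impute_highest_weight(record, weights)
    return levenshtein(to_symptom_string(filled), reference) <= max_distance
```

For instance, with the record `[1, None, 0, None]` and hypothetical weights `[0.2, 0.5, 0.1, 0.9]`, only the fourth symptom is imputed, giving the string `"1001"`, which is then compared against whatever reference profile and threshold the practitioner chooses.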
Index Terms: Autism Identification; mRMR Rule; Machine Learning; Weight Factor; Missing Symptom; Asperger Syndrome; Binary Imputation Method.
Scope of the Article: Machine Learning