Data Cleaning Techniques for Large Data Sets
Yogita Bansal1, Ankita Chopra2

1Yogita Bansal, MCA Department, Jagan Institute of Management Studies, Delhi, India.
2Ankita Chopra*, MCA Department, Jagan Institute of Management Studies, Delhi, India.
Manuscript received on March 15, 2020. | Revised Manuscript received on March 24, 2020. | Manuscript published on March 30, 2020. | PP: 4453-4456 | Volume-8 Issue-6, March 2020. | Retrieval Number: E6938018520/2020©BEIESP | DOI: 10.35940/ijrte.E6938.038620

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In today’s emerging era of data science, where data plays a central role in accurate decision making, it is essential to work with clean, non-redundant data. Because data is gathered from multiple sources, it may contain anomalies, missing values, and other defects that need to be removed; this process is called data pre-processing. In this paper we perform data pre-processing on a news popularity data set, where extraction, transformation and loading (ETL) are carried out. The outcome of the process is a cleaned and refined news data set that can be used for further analysis and knowledge discovery on the popularity of news. Refined data gives more accurate predictions and can be better utilized in the decision-making process.
Keywords: Data Mining, Data Pre-Processing, Extraction, Transform, Load, Knowledge Discovery.
Scope of the Article: Data Mining.
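As a rough illustration of the pre-processing described in the abstract, the sketch below shows typical cleaning steps in Python with pandas: dropping duplicate records, removing rows with missing values, and filtering simple outliers. The file name news_popularity.csv and the column name shares are assumptions for illustration only, not the paper’s actual data set or schema.

```python
import pandas as pd

# Illustrative sketch only: the file name and column names are assumptions,
# not taken from the paper's actual news popularity data set.
df = pd.read_csv("news_popularity.csv")

# Remove exact duplicate records gathered from multiple sources
df = df.drop_duplicates()

# Drop rows containing missing values
# (alternatively, impute with column means or medians)
df = df.dropna()

# Remove simple outliers: keep rows within 3 standard deviations of the
# mean for a hypothetical numeric 'shares' column
if "shares" in df.columns:
    mean, std = df["shares"].mean(), df["shares"].std()
    df = df[(df["shares"] - mean).abs() <= 3 * std]

# Save the cleaned data set for further analysis
df.to_csv("news_popularity_clean.csv", index=False)
```

In practice, the choice between dropping and imputing missing values, and the exact outlier rule, would depend on the distribution of the data and the downstream analysis; this sketch only outlines the general cleaning workflow.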