Preprocessing Big Data for Efficient Storage and Research
Melbin J Reena1, A. Shajin Nargun2
1Melbin J Reena, Research Scholar, Noorul Islam Centre for Higher Education, (Tamil Nadu), India.
2Dr. A. Shajin Nargun, Director Academics, Noorul Islam Centre for Higher Education, (Tamil Nadu), India.
Manuscript received on 16 July 2019 | Revised Manuscript received on 01 August 2019 | Manuscript Published on 10 August 2019 | PP: 11-16 | Volume-8 Issue-2S3 July 2019 | Retrieval Number: B10030782S319/2019©BEIESP | DOI: 10.35940/ijrte.B1003.0782S319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big Data refers to datasets so large that commonly used software systems cannot store, manage, or analyze them. The emergence of smartphones, social networks, and online applications has led to the generation of massive amounts of structured, semi-structured, and unstructured data. Big data analytics has received considerable attention because it offers an opportunity to uncover valuable patterns in large volumes of data. Data preprocessing techniques, when applied prior to analytics, can substantially improve the overall quality of the patterns mined and/or reduce the time required for the actual mining. This paper presents an efficient method for preprocessing data and for partitioning a big dataset based on sensitivity parameters. Each partition can then be uploaded to a public or private cloud according to the importance of the data it contains, so the approach supports hybrid cloud storage and processing of big data. The experimental results show that the proposed method preprocesses and partitions data with high accuracy and reduced processing time.
Keywords: Big Data Analytics, Preprocessing, Partitioning, Hybrid Cloud, Cloud Storage.
Scope of the Article: Big Data Analytics
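Illustrative sketch: the abstract describes preprocessing a dataset and then partitioning it by sensitivity so that the parts can go to private or public cloud storage. The Python below is a minimal illustration of that idea, not the authors' method. It assumes that preprocessing means dropping incomplete and duplicate records, and that the sensitivity parameters are a set of field names; the functions `preprocess` and `partition`, the `id` key field, and the sample fields are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def preprocess(records: List[Record]) -> List[Record]:
    """Drop records with missing values and exact duplicates (assumed cleaning rules)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records that contain an empty or missing value.
        if any(v is None or v == "" for v in rec.values()):
            continue
        # Build a hashable fingerprint of the record to detect duplicates.
        key = tuple(sorted((k, str(v)) for k, v in rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def partition(
    records: List[Record],
    sensitive_fields: Set[str],
    key_field: str = "id",
) -> Tuple[List[Record], List[Record]]:
    """Split each record into a private part and a public part.

    Sensitive fields go to the private partition; everything else goes to
    the public partition. The key field is kept in both so that the two
    parts can be rejoined later.
    """
    private, public = [], []
    for rec in records:
        private.append(
            {k: v for k, v in rec.items() if k in sensitive_fields or k == key_field}
        )
        public.append({k: v for k, v in rec.items() if k not in sensitive_fields})
    return private, public


if __name__ == "__main__":
    # Hypothetical sample data: one duplicate and one incomplete record.
    records = [
        {"id": 1, "name": "A", "city": "X"},
        {"id": 1, "name": "A", "city": "X"},
        {"id": 2, "name": None, "city": "Y"},
        {"id": 3, "name": "C", "city": "Z"},
    ]
    cleaned = preprocess(records)
    private_part, public_part = partition(cleaned, {"name"})
    print(private_part)  # candidates for private cloud storage
    print(public_part)   # candidates for public cloud storage
```

Keeping the key field in both partitions is a design choice made here so that the private and public parts can be recombined for analysis; the paper may partition along different lines (for example, by whole records rather than by fields).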