Big Data and Machine Learning Integration: The Benefits and Research Issues in the Huge Data Processing
B Priyanka1, K. Uma Pavan Kumar2, Indivar Shaik3

1B Priyanka, Research Scholar, Department of Computer Science and Engineering, SSSUTMS, Sehore (Madhya Pradesh), India.
2Dr. K. Uma Pavan Kumar, Associate Professor, Department of Computer Science and Engineering, Malla Reddy Institute of Technology, Hyderabad (Telangana), India.
3Indivar Shaik, Research Scholar, Department of Computer Science and Engineering, SSSUTMS, Sehore (Madhya Pradesh), India.
Manuscript received on 15 October 2019 | Revised Manuscript received on 24 October 2019 | Manuscript Published on 02 November 2019 | PP: 2427-2429 | Volume-8 Issue-2S11 September 2019 | Retrieval Number: B12810982S1119/2019©BEIESP | DOI: 10.35940/ijrte.B1281.0982S1119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data generated by everyone from individual users to multinational corporations (MNCs) is placing an increasing burden on existing architectures. Existing storage and processing techniques may not be suited to the current requirements of storing and processing such huge volumes of data. The fundamental issue is the kind and volume of data produced every second on social media, which can reach petabytes of storage; processing this data is a further problem. This is where the concept of big data comes into the picture. Hadoop is a framework that stores huge amounts of data and processes it in a parallel and distributed manner. The framework combines the Hadoop Distributed File System (HDFS) and MapReduce (MR). HDFS is distributed storage whose large capacity addresses the problem of rapidly growing data volumes, while MapReduce handles processing by providing a versatile model for computing over huge amounts of data. The other dimension of the current work is the analysis of huge amounts of data, which is beyond the scope of Hadoop-based tools. Machine Learning (ML) is a class of algorithms that provides various techniques for analyzing huge data more effectively; classification techniques, clustering mechanisms, and recommender systems are a few examples. The aim of the current work is to integrate Hadoop and R, which in turn combines big data and ML. The work presents the key benefits of such an integration, its future scope, and the research constraints it may face in practice. We believe the work gives researchers a platform for identifying the future scope of the integration and the difficulties faced in the process.
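To make the MapReduce processing model concrete, the following is a minimal single-process sketch of its three phases (map, shuffle/group by key, reduce) applied to the classic word-count task. It is an illustration under stated assumptions, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `map_reduce` are hypothetical, and the phases run sequentially in one Python process. In Hadoop, the map and reduce tasks would instead run in parallel across nodes over input splits stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one input record (line).
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(records):
    # Hypothetical driver: runs the phases sequentially in one process.
    # A real cluster would distribute map and reduce tasks across nodes.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    # Two small "input splits" standing in for blocks of a large file.
    splits = ["big data needs big storage", "data processing in parallel"]
    print(map_reduce(splits))
```

Because each map call depends only on its own record and each reduce call only on the values for one key, both phases can be split across machines. This independence is what allows the model to scale to very large data sets.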
Keywords: Hadoop, Framework, R, Parallel Processing, Distributed Storage.
Scope of the Article: Machine Learning