Distributed Programming Frameworks in Cloud Platforms
Anitha Patil
Anitha Patil, Department of Computer Science Engineering.
Manuscript received on 23 March 2019 | Revised Manuscript received on 30 March 2019 | Manuscript published on 30 March 2019 | PP: 611-619 | Volume-7 Issue-6, March 2019 | Retrieval Number: F2429037619/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Cloud computing technology has enabled the storage and analysis of large volumes of data, known as big data. With the rise of cloud computing, a new discipline in computer science known as Data Science came into existence. Data Science is an interdisciplinary field that includes statistics, machine learning, predictive analytics and deep learning, and it aims to extract hidden patterns from big data. Because big data requires more storage space than traditional storage devices can accommodate, cloud Infrastructure as a Service (IaaS) resources are used to hold it. Therefore, big data and big data analytics cannot exist without cloud computing. Another important fact is that big data can be analysed to obtain Business Intelligence (BI). This process requires distributed programming frameworks such as Hadoop, Apache Spark, Apache Flink, Apache Storm and Apache Samza. Without a thorough understanding of these frameworks, which run on cloud platforms, it is difficult to use them appropriately. Therefore, this paper presents a comparative study of these frameworks and an empirical evaluation of Apache Flink and Apache Spark. The TeraSort benchmark is used for the experiments.
Keywords: Cloud computing, Big data, Big data analytics, Distributed programming frameworks.
Scope of the Article: Data Modelling, Mining and Data Analytics
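The abstract names TeraSort as the benchmark but does not describe how it works. As a rough illustration only, the following single-process Python sketch mimics the structure commonly associated with Hadoop's TeraSort. It samples the input keys and derives range split points from the sample. Each key is then routed to the partition that owns its range, each partition is sorted independently, and the results are concatenated. The function names (sample_split_points, partition, terasort_sketch), the sample size and the partition count are hypothetical. This is not the benchmark code used in the paper's experiments. On a cluster running Hadoop, Spark or Flink, each partition would be sorted by a separate parallel task rather than in a Python loop.

```python
import random
from bisect import bisect_right


def sample_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 boundaries from a sorted random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sample approximate the key distribution.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition(keys, split_points):
    """Map-side step: send each key to the range partition that owns it."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for k in keys:
        # Partition i holds keys in [split_points[i-1], split_points[i]).
        parts[bisect_right(split_points, k)].append(k)
    return parts


def terasort_sketch(keys, num_partitions=4, sample_size=100):
    """Hypothetical single-machine imitation of sample, partition, sort."""
    if not keys:
        return []
    split_points = sample_split_points(keys, num_partitions, sample_size)
    parts = partition(keys, split_points)
    # Reduce-side step: each partition is sorted on its own.
    sorted_parts = [sorted(p) for p in parts]
    # Partitions cover disjoint, ordered key ranges, so concatenation is globally sorted.
    return [k for p in sorted_parts for k in p]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.randrange(10**6) for _ in range(1000)]
    print(terasort_sketch(data) == sorted(data))
```

The design choice that makes TeraSort scale is that the sort is split across range partitions instead of being merged at the end. Because every key in partition i is smaller than every key in partition i + 1, no final merge step is needed; the quality of the sampled split points only affects how evenly the work is balanced.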