Processing Big Data with Apache Flink
N. Deshai1, B.V.D.S. Sekhar2, S. Venkataramana3
1N. Deshai, Department of Information Technology, Sagi Rama Krishnam Raju Engineering College, Bhimavaram (Andhra Pradesh), India.
2B.V.D.S. Sekhar, N. Deshai, Department of Information Technology, Sagi Rama Krishnam Raju Engineering College, Bhimavaram (Andhra Pradesh), India.
3S. Venkata Ramana, N. Deshai, Department of Information Technology, Sagi Ramakrishnam Raju Engineering College, Bhimavaram (Andhra Pradesh), India.
Manuscript received on 11 May 2019 | Revised Manuscript received on 05 June 2019 | Manuscript Published on 15 June 2019 | PP: 16-20 | Volume-8 Issue-1S3 June 2019 | Retrieval Number: A10040681S319/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Over the past decade, Big Data analytics has grown in popularity, and advanced tools are needed to store and process the world's large volumes of data for both on-demand and stream processing. Apache Flink is a recent data analytics framework hosted by the Apache Software Foundation: a distributed data processing tool, described as the fourth generation (4G) of Big Data analytics, that allows large datasets to be analyzed at any scale, anywhere. It is fully free and open source, offers fast and dynamic analysis of both traditional and real-time data, and supports building numerous data pipelines modeled as directed acyclic graphs. Flink can process both unbounded and bounded real-world datasets, and it was created to manage complex stateful streaming applications. Flink provides high-performance, low-latency streaming, and it combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency, out-of-core execution, and query optimization capabilities found in parallel databases. Working with this paradigm is challenging because the behavior of an execution depends heavily on the configuration of many parameters. The aim of this paper is to identify and demonstrate the influence of various architectural choices and parameter settings on observed end-to-end execution. We use this methodology to analyze the performance of Flink 1.5, which we find to be faster than Spark because of its underlying streaming engine, by repeatedly running batch and other workloads with various characteristics on up to 100 nodes. Every stream processing tool must also address major challenges such as low latency, high throughput, fault tolerance, and in-memory computation.
Keywords: Big Data, Apache Spark, Flink, Batch, Stream.
Scope of the Article: Big Data Analytics Application Systems