Text Mining with Apache Hadoop over different Hadoop Clusters Architectures
E. Laxmi Lydia1, Gorapalli Chandra Sekhar2, Madhu Babu Chevuru3, Dasari Ramya4, K. Vijaya Kumar5
1E. Laxmi Lydia, Associate Professor, Department of Computer Science Engineering, Vignan’s Institute of Information Technology, Andhra Pradesh, India.
2Gorapalli Chandra Sekhar, PG Scholar, Department of Computer Science Engineering, Vignan’s Institute of Information Technology, Andhra Pradesh, India.
3Madhu Babu Chevuru, Asst. Professor, Department of Computer Science Engineering, VFSTR Deemed to be University.
4Dasari Ramya, Junior Research Fellow, Department of Computer Science Engineering, Vignan’s Institute of Information Technology, Andhra Pradesh, India.
5K. Vijaya Kumar, Associate Professor, Department of Computer Science Engineering, Vignan’s Institute of Information Technology, Andhra Pradesh, India.
Manuscript received on 16 March 2019 | Revised Manuscript received on 22 March 2019 | Manuscript published on 30 July 2019 | PP: 1252-1256 | Volume-8 Issue-2, July 2019 | Retrieval Number: B1866078219/19©BEIESP | DOI: 10.35940/ijrte.B1866.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big Data techniques are well suited to real-time application systems, and many of the most widely used real-time applications worldwide operate on unstructured documents. Large numbers of documents are managed and maintained with Hadoop, a leading Big Data platform, which stores all information in blocks in the Hadoop Distributed File System (HDFS). Big Data makes it possible to store and analyze data irrespective of its size, but doing so is time-consuming. To overcome this, Hadoop provides cluster-based processing for computations over large volumes of unstructured data. Three cluster architectures are considered: standalone, single-node cluster, and multi-node cluster. This paper studies how these cluster architectures let the Hadoop platform boost processing speed over large datasets, analyzing them using text documents from the 20 Newsgroups (newsgroup20) dataset. It also identifies the challenges of text mining and its applications using Apache Hadoop.
Keywords: Big Data, Hadoop Cluster, Standalone Mode, Pseudo Cluster Mode, Fully Distributed Mode
Scope of the Article: Big Data