Similarity Measurement Technique for Measuring the Performance of Page Rank Algorithm Based On Hadoop
M. A. H. Wadud1, M. A. Jafor2, M. F. Mridha3, M. M. Rahman4
1M. A. H. Wadud, Department of CSE, Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh.
2M. A. Jafor, Department of CSE, Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh.
3M. F. Mridha, Department of CSE, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
4M. M. Rahman*, Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh.
Manuscript received on January 05, 2020. | Revised Manuscript received on January 25, 2020. | Manuscript published on January 30, 2020. | PP: 4712-4717 | Volume-8 Issue-5, January 2020. | Retrieval Number: E6843018520/2020©BEIESP | DOI: 10.35940/ijrte.E6843.018520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In this century, big data manipulation is a challenging task in the field of web mining because the volume of web content is increasing massively day by day. Retrieving efficient, relevant and meaningful information from this massive amount of web data through a search engine is extremely difficult, so different search engines use different ranking algorithms to retrieve relevant information. This paper presents a new page ranking algorithm, named the Similarity Measurement Technique (SMT), which is based on synonymous word counts and built on the Hadoop MapReduce framework. Hadoop MapReduce partitions big data and provides a scalable, economical and easier way to process it; it stores the intermediate results of iterative jobs on the local disk. SMT takes a query from the user, parses it using Hadoop, and calculates the rank of each web page. For the experiments, a wiki data file was used, and the PageRank algorithm (PR), the improvised PageRank algorithm (IPR) and the proposed SMT method were applied to calculate the page rank of all web pages and to compare the three methods. The proposed method provides better scoring accuracy than the other approaches and reduces the theme drift problem.
Keywords: PageRank, Hadoop, Iterative MapReduce, Link Analysis, Inlink, Outlink.
Scope of the Article: Cloud Computing,
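The abstract compares SMT against the classic PageRank (PR) baseline computed with Hadoop MapReduce. As a minimal sketch, assuming the original non-normalized PageRank formula PR(p) = (1 - d) + d * sum over inlinking pages q of PR(q) / C(q), where C(q) is the number of outlinks of q and d = 0.85, the Python code below simulates one map, shuffle and reduce round per iteration on a small in-memory graph. It does not use Hadoop itself, and the function names (map_phase, shuffle, reduce_phase, pagerank) are hypothetical. It illustrates only the PR baseline, not the SMT synonym-count scoring, whose formula the abstract does not give.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed conventional damping factor d; not taken from the paper


def map_phase(graph, ranks):
    """Map step: each page re-emits its outlink list (so every page keeps a key,
    as a Hadoop job must preserve the graph) and sends an equal share of its
    current rank to every page it links to."""
    for page, outlinks in graph.items():
        yield page, ("links", outlinks)
        for target in outlinks:
            yield target, ("rank", ranks[page] / len(outlinks))


def shuffle(pairs):
    """Group intermediate (key, value) pairs by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, damping=DAMPING):
    """Reduce step: PR(p) = (1 - d) + d * (sum of incoming rank contributions)."""
    new_ranks = {}
    for page, values in grouped.items():
        incoming = sum(value for tag, value in values if tag == "rank")
        new_ranks[page] = (1 - damping) + damping * incoming
    return new_ranks


def pagerank(graph, iterations=50, damping=DAMPING):
    """Run iterative map/shuffle/reduce rounds starting from PR = 1.0 for every page.
    Every page must appear as a key in `graph`; pages with no outlinks simply
    lose their rank (this sketch does not redistribute dangling-page rank)."""
    ranks = {page: 1.0 for page in graph}
    for _ in range(iterations):
        ranks = reduce_phase(shuffle(map_phase(graph, ranks)), damping)
    return ranks


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(web))
```

In a real Hadoop job each iteration would be a separate MapReduce job whose output is written to disk and read back as the next job's input, which is the iterative cost the abstract refers to. Because this toy graph has no dangling pages, the printed scores sum to the number of pages.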