Elimination of Noisy Information from Web Pages
Alpa K. Oza¹, Shailendra Mishra²
¹Alpa K. Oza, Department of Information Technology, Parul Institute of Engineering and Technology, Gujarat Technological University, Ahmedabad (Gujarat), India.
²Shailendra Mishra, Department of Computer Science Engineering, Parul Institute of Technology, Gujarat Technological University, Ahmedabad (Gujarat), India.
Manuscript received on 21 March 2013 | Revised Manuscript received on 28 March 2013 | Manuscript published on 30 March 2013 | PP: 115-117 | Volume-2 Issue-1, March 2013 | Retrieval Number: A0523032113/2013©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: A Web page typically contains many information blocks. Besides the main content blocks, it usually has blocks such as navigation panels, copyright and privacy notices, and advertisements. We call these blocks, which are not part of the page's main content, noisy blocks. We show that the information contained in noisy blocks can seriously harm Web data mining, so eliminating this noise is important. In this work we focus on identifying and removing local noise in web pages to improve mining performance. We propose a simple approach to detecting and removing noise based on a new DOM tree structure. The results show a marked increase in F-score and accuracy.
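The abstract does not describe the proposed DOM tree structure, so the following is only a minimal sketch of the general idea of DOM-based noise removal, under stated assumptions. The page is parsed into a simple tree with Python's standard `html.parser`, and a block is treated as noise if its tag is a typical navigation or footer element, its `class` or `id` contains a keyword such as "nav" or "copyright", or most of its text sits inside links. The keyword lists, the 0.5 link-density threshold, and the names `DomBuilder`, `is_noise` and `remove_noise` are hypothetical illustrations, not the authors' method.

```python
from html.parser import HTMLParser

# Illustrative heuristics only; these lists and the threshold are assumptions,
# not the rules used in the paper.
NOISE_TAGS = {"nav", "footer", "aside", "header", "script", "style"}
NOISE_HINTS = ("nav", "menu", "footer", "copyright", "privacy", "advert", "banner", "sidebar")
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "section"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
LINK_DENSITY_LIMIT = 0.5  # hypothetical threshold


class Node:
    """One element of a minimal DOM tree; children are Nodes or text strings."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = {k: (v or "") for k, v in attrs}  # valueless attrs become ""
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed and void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node, links_only=False, inside_link=False):
    """Concatenate the text under node; optionally only text inside <a> elements."""
    inside_link = inside_link or node.tag == "a"
    parts = []
    for child in node.children:
        if isinstance(child, str):
            if inside_link or not links_only:
                parts.append(child)
        else:
            parts.append(text_of(child, links_only, inside_link))
    return " ".join(p for p in parts if p)


def is_noise(node):
    """Flag a block as noise by tag name, class/id keywords, or link density."""
    if node.tag in NOISE_TAGS:
        return True
    label = (node.attrs.get("class", "") + " " + node.attrs.get("id", "")).lower()
    if any(hint in label for hint in NOISE_HINTS):
        return True
    if node.tag in BLOCK_TAGS:
        text = text_of(node)
        if text and len(text_of(node, links_only=True)) / len(text) > LINK_DENSITY_LIMIT:
            return True
    return False


def clean_text(node):
    """Collect text from the tree, pruning every subtree judged to be noise."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif not is_noise(child):
            parts.append(clean_text(child))
    return " ".join(p for p in parts if p)


def remove_noise(html):
    """Parse the page into a DOM tree and return the text of the non-noise blocks."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return clean_text(builder.root)
```

Given a page with a navigation bar, a list of links, an advertisement block and a footer around an article, `remove_noise` keeps only the article text. Pruning whole subtrees, rather than filtering individual text nodes, mirrors the block-level view of noise taken in the abstract; the keyword lists and the link-density threshold are the parts one would expect to tune or replace with richer block features.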
Keywords: Noise Elimination, DOM Tree, Web Page Cleaning
Scope of the Article: Web Mining