ECETR-Extended Content Extraction via Tag Ratios
KR Ashok Kumar1, Y Rama Devi2
1 R Ashok Kumar, Research Scholar, Computer Science and Engineering, Rayalaseema University, Kurnool (Andhra Pradesh), India.
2 Dr. Y Rama Devi, Professor, Department of CSE, CBIT, Gandipet, Hyderabad (Telangana), India.
Manuscript received on 13 March 2019 | Revised Manuscript received on 20 March 2019 | Manuscript published on 30 March 2019 | PP: 158-160 | Volume-7 Issue-6, March 2019 | Retrieval Number: F2156037619/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Common internet users typically search the contents of the World Wide Web through web query interfaces. Despite the enormous worldwide use of the Internet to find desired information, collecting important information from multiple web pages remains a difficult problem. Multiple web content extraction systems have been proposed to extract desired information from web pages, and many manually constructed, supervised, and semi-supervised systems have been developed in the field of web information extraction. Approaches to extracting content from web pages include Document Object Model (DOM) trees, text density, tag ratios, and algorithms based on visual information. This paper proposes a novel web content extraction method that uses tag ratios combined with clustering methods. The proposed system is able to extract 85%-90% of user-relevant information.
Keywords: Web mining, Web data extraction, Web content extraction, Tag ratios, HTML, Document Object Model
Scope of the Article: Probabilistic Models and Methods
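
The tag ratio idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' ECETR implementation, whose details are not given on this page. It assumes the common line-based definition of a tag ratio (non-tag characters on a line divided by the number of tags on that line) and uses a simple one-dimensional two-means split as a stand-in for the clustering step. All function names (`tag_ratios`, `split_two_clusters`, `extract_content`) are hypothetical.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def tag_ratios(html):
    """Return (line, ratio) pairs: non-tag characters divided by tag count."""
    # Assumption: scripts, styles and comments are dropped before scoring.
    cleaned = COMMENT_RE.sub("", SCRIPT_STYLE_RE.sub("", html))
    results = []
    for line in cleaned.splitlines():
        line = line.strip()
        if not line:
            continue
        tags = len(TAG_RE.findall(line))
        text_len = len(TAG_RE.sub("", line).strip())
        # A line with no tags keeps its raw text length as the ratio.
        results.append((line, text_len / tags if tags else float(text_len)))
    return results


def split_two_clusters(values, iterations=20):
    """1-D two-means: return a threshold separating low from high ratios."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        lows = [v for v in values if abs(v - low) <= abs(v - high)]
        highs = [v for v in values if abs(v - low) > abs(v - high)]
        if not highs:
            break
        low, high = sum(lows) / len(lows), sum(highs) / len(highs)
    return (low + high) / 2


def extract_content(html):
    """Keep the text of lines whose tag ratio falls in the high cluster."""
    scored = tag_ratios(html)
    if not scored:
        return []
    threshold = split_two_clusters([ratio for _, ratio in scored])
    return [TAG_RE.sub("", line).strip() for line, ratio in scored if ratio > threshold]
```

On a page where navigation lines carry many tags and little text, such lines receive low ratios and fall into the low cluster, while paragraph lines with long runs of text fall into the high cluster and are kept. A real system would likely smooth the ratios across neighbouring lines and cluster on more than one feature.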