Web Data Extraction Using Tree Structure Algorithms – A Comparison
Seema Kolkur1, K. Jayamalini2
1Ms. Seema Kolkur, Assistant Professor, Department of Computer, Thadomal Shahani College of Engineering, Mumbai (Maharashtra), India.
2Ms. K. Jayamalini, Assistant Professor, Department of Computer, L.R. Tiwari College of Engineering, Mumbai (Maharashtra), India.
Manuscript received on 21 July 2013 | Revised Manuscript received on 28 July 2013 | Manuscript published on 30 July 2013 | PP: 35-39 | Volume-2 Issue-3, July 2013 | Retrieval Number: C0696072313/2013©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Web pages today expose a large amount of structured data that many advanced applications depend on. Much of this data can be reached only through Web query interfaces, and the information retrieved this way is known as ‘deep’ or ‘hidden’ data. Deep data is embedded in Web pages as data records; these pages are generated dynamically and presented to users as HTML documents alongside other content. Mined effectively, such pages can be a valuable source of business information. Web Data Extraction systems, or web wrappers, are software applications that extract information from Web sources such as Web pages. A Web Data Extraction system typically interacts with a Web source, extracts the data stored in it, converts the extracted data into a convenient structured format, and stores it for later use. This paper first describes the development of such a wrapper, which takes search engine result pages as input and converts them into a structured format. Second, it proposes a new algorithm, the Improved Tree Matching algorithm, which is based on the efficient Simple Tree Matching (STM) algorithm. Finally, it compares the proposed approach with existing work. Experimental results show that this approach can extract Web data with lower complexity than existing approaches.
Keywords: Web Data Extraction, Document Object Model (DOM), Improved Tree Matching algorithm.
Scope of the Article: Web Mining
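
For readers unfamiliar with the Simple Tree Matching (STM) algorithm named in the abstract, the following is a minimal sketch of the classic STM recurrence over ordered, labelled trees such as DOM subtrees. It is not the Improved Tree Matching algorithm proposed in the paper, whose details are not given here. The `Node` class, the function names, and the sample records are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Classic STM: number of node pairs in a maximum matching of a and b."""
    # Roots with different labels cannot be matched, so nothing below them is.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j] holds the best matching between the first i child subtrees
    # of a and the first j child subtrees of b (sibling order is preserved).
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = table[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            table[i][j] = max(table[i - 1][j], table[i][j - 1], paired)
    # Add 1 for the matched pair of roots.
    return table[m][n] + 1


# Two hypothetical data records from a result page, as small tag trees.
record_a = Node("tr", [Node("td", [Node("a")]), Node("td")])
record_b = Node("tr", [Node("td", [Node("a")]), Node("td", [Node("b")])])

print(simple_tree_matching(record_a, record_b))
```

A high matching score relative to the sizes of the two trees suggests that they share a template, which is how tree matching is commonly used to group repeated data records on a page.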