A1463059120 - International Journal of Recent Technology and Engineering (IJRTE)

Identification of Web Site Reliability Through Data Scrapping at Web Crawler’s Navigation
S. Ponmaniraj¹, Tapas Kumar², Amit Kumar Goel³
¹S. Ponmaniraj*, Research Scholar, School of Computing Science and Engineering, Galgotias University, Uttar Pradesh, India.
²Dr. Tapas Kumar, Professor, School of Computing Science and Engineering, Galgotias University, Uttar Pradesh, India.
³Dr. Amit Kumar Goel, Professor, School of Computing Science and Engineering, Galgotias University, Uttar Pradesh, India.
Manuscript received on April 06, 2020. | Revised Manuscript received on April 14, 2020. | Manuscript published on May 30, 2020. | PP: 139-144 | Volume-9 Issue-1, May 2020. | Retrieval Number: A1463059120/2020©BEIESP | DOI: 10.35940/ijrte.A1463.059120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Searching for specific content on the web is like looking for a single character in a bunch of pages. When a user enters a keyword into a search engine, the engine passes it to a web server mining process that collects all the terms related to the entered key phrase. A few pages give the user the legitimate, authenticated content they actually wanted to access, whereas many others deliver unwanted, malicious code or virus activity that harms the user's activities and the system's functions. Generally, a web page attacks the targeted system with faulty instructions and malevolent programs delivered through an intrusion methodology known as phishing. In this method of attack, the user is led to unknown or illegal sites through links to unidentified websites embedded in legitimate site content. Once the victim's system has been compromised, the hackers begin their attack. To avoid this kind of harm, the user needs to understand the reliability of a web page's contents before continuing to browse. This paper presents a web crawler architecture, its design complexities, and an implementation for scraping content from visited web pages to identify their reliability and freshness.
Keywords: Intrusion Detection System, Parser, Scanner, Search Engine Optimization, Semantic Web, Unstructured Information Management Architecture, Web crawler, Web Robot.
Scope of the Article: Internet and Web Applications.
