Document Type
Thesis - University Access Only
Award Date
2010
Degree Name
Master of Science (MS)
Department / School
Electrical Engineering and Computer Science
Abstract
In recent years, the impact of Web crawlers has become increasingly significant. Web crawlers are widely used on both commercial and research institutional Web sites, bringing convenience to many people. Crawlers are, however, a double-edged sword. Alongside their beneficial uses, crawlers are deployed with many different intentions, and their operators are often unscrupulous when it comes to Web site integrity; traditional network security technology has limitations in dealing with them. It is therefore very important to find an effective approach to analyze and identify visits and to distinguish Web crawlers from other accesses. Common detection methods can detect Web crawlers but still cannot distinguish undesirable crawlers from welcome ones. This thesis proposes a trap-based approach to detecting Web crawlers and determines rules for classifying them accurately. Three methods are used in the detection system: a hidden link, robots.txt, and a submission button. Compared with a system that uses only a hidden link or behavior analysis, the detection system can distinguish undesirable crawlers from human users and welcome crawlers. Finally, the test results are evaluated and analyzed to show the improvements.
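The three traps named in the abstract could be combined into classification rules roughly as follows. This is a minimal Python sketch under stated assumptions, not the rule set from the thesis: the `VisitorSignals` fields, the `classify` function, the category labels, and the specific rules (including treating a pressed submission button as a sign of a human user) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VisitorSignals:
    """Per-visitor trap observations (hypothetical field names)."""
    followed_hidden_link: bool     # requested a link that is invisible to human users
    fetched_robots_txt: bool       # requested /robots.txt
    visited_disallowed_path: bool  # requested a path that robots.txt disallows
    pressed_submit_button: bool    # used the submission button (assumed human signal)


def classify(signals: VisitorSignals) -> str:
    """Apply illustrative rules; these are assumptions, not the thesis's rules."""
    # Ignoring robots.txt restrictions is treated as undesirable behavior.
    if signals.visited_disallowed_path:
        return "undesirable crawler"
    # Following an invisible link suggests automation; a crawler that also
    # consulted robots.txt (and obeyed it) is treated as welcome.
    if signals.followed_hidden_link:
        if signals.fetched_robots_txt:
            return "welcome crawler"
        return "undesirable crawler"
    # Human users rarely request robots.txt directly.
    if signals.fetched_robots_txt:
        return "welcome crawler"
    # Assumption: only a human presses the visible submission button.
    if signals.pressed_submit_button:
        return "human"
    return "unknown"


if __name__ == "__main__":
    visitor = VisitorSignals(
        followed_hidden_link=True,
        fetched_robots_txt=False,
        visited_disallowed_path=False,
        pressed_submit_button=False,
    )
    print(classify(visitor))
```

Combining several trap signals, rather than relying on the hidden link alone, is what lets a scheme like this separate welcome crawlers from undesirable ones instead of only separating crawlers from humans.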
Library of Congress Subject Headings
Malware (Computer software)
Data mining
Data protection
World Wide Web -- Security measures
Format
application/pdf
Number of Pages
67
Publisher
South Dakota State University
Recommended Citation
Zhong, Tianying, "An Enhanced Malicious Web Crawler Detection and Classification System" (2010). Electronic Theses and Dissertations. 1697.
https://openprairie.sdstate.edu/etd2/1697