Document Type

Thesis - University Access Only

Award Date

2010

Degree Name

Master of Science (MS)

Department / School

Electrical Engineering and Computer Science

Abstract

In recent years, Web crawlers have become increasingly significant. They are widely used on both commercial and research-institution Web sites and bring convenience to many people. However, the technology is a double-edged sword: alongside its beneficial uses, many crawlers are deployed with intentions that disregard Web site integrity, and traditional network security technology has limited ability to counter them. It is therefore important to find an effective approach for analyzing and identifying visits and for distinguishing Web crawlers from other accesses. Common detection methods can detect Web crawlers, but they still cannot distinguish undesirable crawlers from welcome ones. This thesis proposes a trap-based approach for detecting Web crawlers, together with rules for classifying them accurately. The detection system uses three methods: hidden links, robots.txt, and a submission button. Compared with systems that use only hidden links or behavior analysis, the proposed system can distinguish undesirable crawlers from both human users and welcome crawlers. Finally, the test results are evaluated and analyzed to show the improvements.
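The abstract names three traps but does not state how their signals combine into classification rules. The Python sketch below illustrates one possible way a trap-based classifier could work. The trap paths, the robots.txt contents, the `Visitor` record, and the decision rules in `classify` are hypothetical assumptions made for illustration; they are not the rules defined in the thesis. The sketch uses the standard-library `urllib.robotparser` module to check whether a requested path violates robots.txt.

```python
from dataclasses import dataclass, field
from urllib.robotparser import RobotFileParser

# Hypothetical trap paths (assumptions; the thesis does not specify them).
HIDDEN_LINK_TRAP = "/trap/hidden-link"  # reachable only via an invisible anchor
DISALLOWED_TRAP = "/trap/disallowed/"   # listed under Disallow in robots.txt
FORM_TRAP = "/trap/submit"              # target of a hidden submission button

# Hypothetical robots.txt served by the site.
ROBOTS_TXT = "User-agent: *\nDisallow: " + DISALLOWED_TRAP + "\n"


def make_robots_parser(text: str) -> RobotFileParser:
    """Build a robots.txt parser from in-memory text instead of a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


@dataclass
class Visitor:
    """Hypothetical per-visitor record: user agent plus the paths it requested."""
    user_agent: str
    requested_paths: list = field(default_factory=list)


def classify(visitor: Visitor, robots: RobotFileParser) -> str:
    """Apply illustrative (hypothetical) rules to the trap signals."""
    paths = visitor.requested_paths
    fetched_robots = "/robots.txt" in paths
    hit_hidden_link = any(p.startswith(HIDDEN_LINK_TRAP) for p in paths)
    violated_robots = any(not robots.can_fetch(visitor.user_agent, p) for p in paths)
    submitted_form = any(p.startswith(FORM_TRAP) for p in paths)

    # Ignoring robots.txt or posting the hidden form marks an unwanted crawler.
    if violated_robots or submitted_form:
        return "undesirable crawler"
    # Automated behavior (robots.txt fetch, invisible link) that stays within
    # the site's stated rules is treated as a welcome crawler.
    if fetched_robots or hit_hidden_link:
        return "welcome crawler"
    # No trap signal at all: assume a human using a browser.
    return "human user"
```

For example, with `robots = make_robots_parser(ROBOTS_TXT)`, a visitor that requests `/robots.txt` and then only allowed pages is labeled a welcome crawler, while one that requests `/trap/disallowed/page` is labeled undesirable. Separating "is this automated?" (hidden link, robots.txt fetch) from "does it respect the site?" (robots.txt compliance, form submission) is what lets a sketch like this tell welcome crawlers apart from undesirable ones rather than only flagging crawlers in general.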

Library of Congress Subject Headings

Malware (Computer software)

Data mining

Data protection

World Wide Web -- Security measures

Format

application/pdf

Number of Pages

Publisher

South Dakota State University

Recommended Citation

Zhong, Tianying, "An Enhanced Malicious Web Crawler Detection and Classification System" (2010). Electronic Theses and Dissertations. 1697.
https://openprairie.sdstate.edu/etd2/1697

Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange

Electronic Theses and Dissertations

An Enhanced Malicious Web Crawler Detection and Classification System
