Document Type

Thesis - University Access Only

Award Date

2010

Degree Name

Master of Science (MS)

Department / School

Electrical Engineering and Computer Science

Abstract

In recent years, the impact of Web crawlers has become increasingly significant. Web crawlers are widely used on both commercial and research-institution Web sites, which has brought convenience to many people. However, the sword has two edges. Alongside these benefits, Web crawlers are operated with many different intentions, and they are often unscrupulous when it comes to Web site integrity, while traditional network security technology has limitations. It is therefore very important to find an effective approach to analyze and identify visits and to distinguish Web crawlers from other accesses. Common detection methods can detect Web crawlers, but they still cannot distinguish undesirable crawlers from welcome crawlers. This thesis proposes a trap-based approach to detecting Web crawlers and determines rules for classifying them accurately. Three methods are used in the detection system: hidden link, robots.txt, and submission button. Compared with a system that uses only hidden links or behavior analysis, the detection system can distinguish undesirable crawlers from human users and welcome crawlers. Finally, the test results are evaluated and analyzed to show the improvements.
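
As a rough illustration of how the three traps named in the abstract might combine into classification rules, here is a minimal sketch. The abstract does not state the thesis's actual rules, so the `Visit` fields, the `classify` function, the rule order, and the label strings are all hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Trap signals observed for one visitor (hypothetical field names)."""
    followed_hidden_link: bool     # requested a link that human users cannot see
    fetched_robots_txt: bool       # requested /robots.txt
    entered_disallowed_path: bool  # requested a path that robots.txt disallows
    used_submit_button: bool       # submitted the form through the visible button


def classify(visit: Visit) -> str:
    """Assumed rule set; not the rules determined in the thesis."""
    # Ignoring robots.txt restrictions is treated as the strongest bad signal.
    if visit.entered_disallowed_path:
        return "undesirable crawler"
    # Following an invisible link implies automation; reading robots.txt
    # (and not violating it) is taken here as a sign of a polite crawler.
    if visit.followed_hidden_link:
        if visit.fetched_robots_txt:
            return "welcome crawler"
        return "undesirable crawler"
    # Browsers driven by humans do not normally request robots.txt.
    if visit.fetched_robots_txt:
        return "welcome crawler"
    # Only the visible submission button was used: assume a human user.
    if visit.used_submit_button:
        return "human"
    return "unknown"


if __name__ == "__main__":
    print(classify(Visit(True, False, False, False)))
    print(classify(Visit(False, False, False, True)))
```

Under these assumed rules, a visitor is judged by the combination of traps it triggers rather than by any single trap, which is the property the abstract attributes to the combined detection system.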

Library of Congress Subject Headings

Malware (Computer software)

Data mining

Data protection

World Wide Web -- Security measures

Format

application/pdf

Number of Pages

67

Publisher

South Dakota State University
