Document Type

Dissertation - Open Access

Award Date

2014

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics and Statistics

First Advisor

Xijin Ge

Abstract

The dynamic, decentralized World Wide Web has become an essential part of scientific research and communication, representing a relatively new medium for conveying scientific thought and discovery. Researchers create thousands of web sites every year to share software, data and services. Unlike those for books and journals, however, the preservation systems for these resources are not yet mature. This carries implications that go to the core of science: the ability to examine another's sources to understand and reproduce their work. These valuable resources have been documented as disappearing over time in several subject areas. This dissertation examines the problem by performing a cross-disciplinary investigation, testing the effectiveness of existing remedies and introducing new ones. As part of the investigation, 14,489 unique web pages found in the abstracts within Thomson Reuters’ Web of Science citation index were accessed. The median lifespan of these web pages was found to be 9.3 years, and 62% of them had been archived. Survival analysis and logistic regression identified significant predictors of URL lifespan, including the year a URL was published, the number of times it was cited, its depth, and its domain. Statistical analysis revealed biases in current static web-page solutions.

Library of Congress Subject Headings

Electronic information resources -- Management
Digital preservation
Web archiving
Web of Science

Description

Includes bibliographical references (pages 138-142)

Format

application/pdf

Number of Pages

152

Publisher

South Dakota State University

Rights

In Copyright - Non-Commercial Use Permitted
http://rightsstatements.org/vocab/InC-NC/1.0/

Comments

This is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0)
