Document Type

Dissertation - Open Access

Award Date


Degree Name

Doctor of Philosophy (PhD)

Department / School

Mathematics and Statistics

First Advisor

Xijin Ge


The dynamic, decentralized World Wide Web has become an essential part of scientific research and communication, representing a relatively new medium for the conveyance of scientific thought and discovery. Researchers create thousands of web sites every year to share software, data and services. Unlike those for books and journals, however, the preservation systems for web resources are not yet mature. This carries implications that go to the core of science: the ability to examine another's sources to understand and reproduce their work. These valuable resources have been documented as disappearing over time in several subject areas. This dissertation examines the problem by performing a cross-disciplinary investigation, testing the effectiveness of existing remedies and introducing new ones. As part of the investigation, 14,489 unique web pages found in the abstracts within Thomson Reuters’ Web of Science citation index were accessed. The median lifespan of these web pages was found to be 9.3 years, and 62% of them had been archived. Survival analysis and logistic regression identified significant predictors of URL lifespan, including the year a URL was published, the number of times it was cited, its depth, and its domain. Statistical analysis revealed biases in current static web-page solutions.
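To illustrate how a median lifespan can be estimated when some web pages are still alive at the time of checking, the sketch below uses the Kaplan-Meier estimator, a standard survival-analysis tool. The abstract does not say which estimator the dissertation used, so this choice is an assumption. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[float],
                 observed: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    durations: years each URL was followed.
    observed:  True if the URL was seen to die at that time; False if it was
               still reachable when last checked (right-censored).
    """
    # The curve only drops at times where at least one death was observed.
    event_times = sorted({t for t, d in zip(durations, observed) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        # URLs still under observation just before time t.
        at_risk = sum(1 for u in durations if u >= t)
        # URLs observed to die exactly at time t.
        deaths = sum(1 for u, d in zip(durations, observed) if d and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_lifespan(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # More than half the URLs outlived the observation window.


# Toy data (hypothetical): lifespans in years and whether death was observed.
toy_durations = [2.0, 3.0, 3.0, 5.0, 8.0, 8.0, 10.0, 12.0]
toy_observed = [True, True, False, True, False, True, True, False]
toy_curve = kaplan_meier(toy_durations, toy_observed)
toy_median = median_lifespan(toy_curve)
```

Censored pages count toward the at-risk total until they drop out of observation but never count as deaths, so the estimate does not treat a page that was still alive at the last check as if it had died then.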

Library of Congress Subject Headings

Electronic information resources -- Management
Digital preservation
Web archiving
Web of Science


Includes bibliographical references (pages 138-142)



Number of Pages



South Dakota State University


In Copyright - Non-Commercial Use Permitted


This is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0)