Web Preservation, URLs: "Analyzing the Persistence of Referenced Web Resources with Memento"
Authors: Robert Sanderson; Mark Edward Phillips; and Herbert Van de Sompel
In this paper, the authors present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT Digital Library, are studied to determine whether the nature of the repository, or of its content, has a bearing on the availability of the web resources cited by that content. Memento makes it possible to automate the discovery of archived resources and to consider the time between the publication of the research and the archiving of the referenced URLs. This automation allows us to process more than 160,000 URLs, the largest known such study, and the repository metadata allows consideration of the results by discipline. The results are startling: 45% (66,096) of the URLs referenced from arXiv still exist but are not preserved for future generations, and 28% of resources referenced by UNT papers have been lost. Moving forward, we provide some initial recommendations, including that repositories should publish URL lists extracted from papers that could be used as seeds for web archiving systems.
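As a rough illustration only, the Python sketch below shows two mechanical steps a study like this implies: pulling URLs out of a paper's text (the kind of list the authors suggest repositories publish as archiving seeds) and building a Memento TimeGate request whose Accept-Datetime header asks for the archived copy nearest the paper's publication date, then reading memento URIs out of a Link header. The TimeGate base URL, the function names, and the URL regex are hypothetical assumptions for illustration, not the authors' tooling; the request is constructed but never sent.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate endpoint (an assumption); a real study would use an
# archive or aggregator that implements Memento TimeGates.
TIMEGATE_BASE = "https://timegate.example.org/timegate/"

# Simplistic URL matcher (an assumption); stops at whitespace, quotes,
# angle brackets, closing parentheses and closing square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")
LINK_PATTERN = re.compile(r"<([^>]+)>\s*;\s*([^<]*)")
PARAM_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def extract_urls(text):
    """Return unique http(s) URLs from paper text, in order of appearance,
    with trailing sentence punctuation trimmed."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")
        if url not in urls:
            urls.append(url)
    return urls


def timegate_request(url, published):
    """Build (but do not send) a HEAD request to a TimeGate asking for the
    memento closest to `published`, which must be a UTC-aware datetime."""
    accept_datetime = format_datetime(published, usegmt=True)
    return Request(TIMEGATE_BASE + url,
                   headers={"Accept-Datetime": accept_datetime},
                   method="HEAD")


def memento_uris(link_header):
    """Return the URIs whose rel attribute includes "memento" from a
    Memento-style Link header value."""
    found = []
    for uri, params in LINK_PATTERN.findall(link_header):
        attrs = dict(PARAM_PATTERN.findall(params))
        if "memento" in attrs.get("rel", "").split():
            found.append(uri)
    return found
```

An empty result from `memento_uris` for a URL that still resolves would correspond to the "exists but is not preserved" case the abstract highlights; checking liveness itself would need a separate request to the original URL.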
Source: University of North Texas Digital Library
See Also: Learn More About Memento, Download Memento Tools
About Gary Price
Gary Price (email@example.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.