New Case Study: “Leveraging Heritrix and the Wayback Machine on a Corporate Intranet”
The following article appears in the January/February 2016 issue of D-Lib Magazine.
Full Title
Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives
Authors
Justin F. Brunelle
The MITRE Corporation and Old Dominion University
Krista Ferrante and Eliot Wilczek
The MITRE Corporation
Michele C. Weigle and Michael L. Nelson
Old Dominion University
Source
D-Lib Magazine
Vol 22, No. 1-2 (January/February 2016)
Abstract
In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open source tools are not well suited for the needs of the corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB and 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, areas with potential archival value require user credentials, or archival targets make extensive use of internally developed and customized web services. We elaborate on and recommend approaches for overcoming these challenges.
Direct to Full Text Article

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.