February 19, 2016 by Gary Price

New Case Study: “Leveraging Heritrix and the Wayback Machine on a Corporate Intranet”

The following article appears in the January/February 2016 issue of D-Lib Magazine.
Full Title
Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives
Authors
Justin F. Brunelle
The MITRE Corporation and Old Dominion University
Krista Ferrante and Eliot Wilczek
The MITRE Corporation
Michele C. Weigle and Michael L. Nelson
Old Dominion University
Source
D-Lib Magazine
Vol 22, No. 1-2 (January/February 2016)
Abstract

In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open source tools are not well suited for the needs of the corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB and 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, areas with potential archival value require user credentials, or archival targets make extensive use of internally developed and customized web services. We elaborate on and recommend approaches for overcoming these challenges.

Direct to Full Text Article

Filed under: Archives and Special Collections, Management and Leadership, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.

© 2023 Library Journal. All rights reserved.

