March 3, 2013 by Gary Price

GPO Publishes Update About Web Harvesting Pilot Project

The March 2013 issue of the Federal Depository Library Program (FDLP) newsletter includes an update about the Government Printing Office (GPO) Web harvesting pilot project.
From the Article:

In late 2011, Library Services and Content Management (LSCM) and OAM staff developed a pilot project to test an implementation of the Internet Archive’s Heritrix-based Archive-It, which is a subscription-based Web harvesting and archiving service. In developing the pilot project, the project team networked with Web harvesting teams from the Library of Congress, the National Archives and Records Administration, and the University of North Texas Library (a GPO library partner already well-known for establishing the CyberCemetery and its leadership in digital preservation initiatives).
While each of these GPO partners and more than 228 libraries and agencies had proven the basic concept and viability of Heritrix and Archive-It, the Web Harvesting Task Force was charged with determining whether Archive-It would work within LSCM’s operational budget and staffing parameters.
Test crawls were conducted on ten test Web sites, and the resulting facsimile harvested copies were reviewed for performance. MARC records were created in the CGP [Catalog of U.S. Government Publications] by performing a crosswalk from the Archive-It Dublin Core metadata to MARC. Links in the CGP MARC records were created to the archived content on the Internet Archive’s Wayback server for each harvested Web site.
Having successfully achieved the proof of concept, Laurie Hall, LSCM’s Director of Library Technical Information Services, charged the Task Force to:

  • Form a Web Archiving Team and develop a project plan toward a full implementation of a Web harvesting and archiving service.
  • Develop modifications needed to LSCM workflow for acquisition, cataloging, classification, archiving, and access, to include whole Web sites as well as individual publications.
  • Develop configurations on cost and staff resources for continuation and expansion of the project, including a budget for FY2013.
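
The crosswalk step in the excerpt (Archive-It Dublin Core metadata mapped to MARC, with a link to the archived copy on a Wayback server) can be illustrated with a minimal Python sketch. The article does not describe GPO's actual mapping, so the field choices here are assumptions loosely modeled on the Library of Congress Dublin Core-to-MARC crosswalk (title to 245, creator to 720, publisher and date to a single 260, description to 520, subject to 653, and an 856 field for the electronic location). The names `MarcField`, `DC_TO_MARC`, and `crosswalk`, and the example Wayback URL, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str
    indicators: str  # two characters; "#" stands for a blank indicator
    subfields: list = field(default_factory=list)  # list of (code, value) pairs

    def __str__(self) -> str:
        subs = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {subs}"


# Hypothetical mapping, loosely modeled on the LC Dublin Core-to-MARC
# crosswalk; NOT GPO's actual crosswalk rules.
# element -> (tag, indicators, subfield code, extra subfields)
DC_TO_MARC = {
    "title": ("245", "00", "a", []),
    "creator": ("720", "##", "a", [("e", "author")]),
    "publisher": ("260", "##", "b", []),
    "date": ("260", "##", "c", []),
    "description": ("520", "##", "a", []),
    "subject": ("653", "##", "a", []),
}

MERGED_TAGS = {"260"}  # publisher ($b) and date ($c) share one 260 field


def crosswalk(dc: dict, wayback_url: str) -> list:
    """Map a Dublin Core record to MARC fields plus an 856 link to the archived copy."""
    fields = []
    merged = {}
    for element, values in dc.items():
        if element not in DC_TO_MARC:
            continue  # unmapped elements are simply dropped in this sketch
        tag, ind, code, extra = DC_TO_MARC[element]
        for value in values:
            if tag in MERGED_TAGS:
                if tag not in merged:
                    merged[tag] = MarcField(tag, ind)
                    fields.append(merged[tag])
                merged[tag].subfields.append((code, value))
            else:
                fields.append(MarcField(tag, ind, [(code, value)] + extra))
    # 856 40 $u: electronic location (HTTP) of the harvested copy
    fields.append(MarcField("856", "40", [("u", wayback_url)]))
    return sorted(fields, key=lambda f: f.tag)  # stable sort keeps input order per tag


if __name__ == "__main__":
    record = {
        "title": ["Example Agency Home Page"],
        "publisher": ["Example Agency"],
        "date": ["2012"],
        "subject": ["Government information"],
    }
    url = "https://wayback.example.org/20120101000000/http://www.example.gov/"
    for f in crosswalk(record, url):
        print(f)
    # 245 00 $aExample Agency Home Page
    # 260 ## $bExample Agency$c2012
    # 653 ## $aGovernment information
    # 856 40 $uhttps://wayback.example.org/20120101000000/http://www.example.gov/
```

Merging publisher and date into one 260 mirrors how a crosswalk normally builds a single imprint field rather than repeating it; a production workflow would more likely use an established MARC library and the cataloging agency's own rules.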

Read the Complete Article and Meet the Web Harvesting Pilot Team
See Also: Slide Presentation About Web Harvest Pilot From Depository Library Conference (October 18, 2012; PDF)
Learn More About Archive-It (from the Internet Archive)

Filed under: Archives and Special Collections, Digital Preservation, Libraries, Management and Leadership, News, Preservation

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com.
© 2026 Library Journal. All rights reserved.