Digital Preservation: The Library of Congress Seeks Information About Web Harvesting, Formal RFI Published

December 1, 2014 by Gary Price

On November 26th, the Library of Congress published an RFI (request for information) regarding web harvesting.
From the Synopsis:

The Library of Congress, Office of Strategic Initiatives (OSI) is seeking information from potential contractors about how best to design a requirement related to saving and reviewing information from the Internet. The Library is seeking information, e.g., current/existing commercial solutions, design solutions, etc., on how to best meet this web harvesting requirement.
This RFI is to determine if potential offerors can meet the Library’s technical and production requirements for harvesting web content and to receive feedback on pricing models and reasonable quality assurance. The Library is actively seeking suggested solutions and alternatives that will meet our requirements.
From the RFI:
Many of the activities of the digital lifecycle for harvested web content occur at the Library of Congress, including seed URL nomination, permissions gathering, scoping and preparation of a seed list, quality review, and public access to researchers. The Library’s web harvesting curator tools and infrastructure have been developed for the inputs and outputs of open source tools (Heritrix for harvesting, and Wayback Machine for access). The potential requirements described here are to support the Library’s large-scale, ongoing harvesting efforts, plus storage for the life of any potential contract, indexing for access, restricted access to the content for processing by Library staff, and transfer to the Library for long-term storage.
Although the following provides a general description of the Library’s potential requirements, the Library is actively seeking suggested alternatives to the requirements discussed below, where appropriate.

Direct to Complete RFI (22 pages; PDF)
Full text is also embedded below.

From the Library of Congress: RFI Web Harvesting

See Also: Web Archiving In the United States: A 2013 Survey
October 2013. 25 pages; PDF.
Published by the National Digital Stewardship Alliance.

Filed under: Digital Preservation, Libraries, News, Preservation

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
