August 17, 2018

Library of Congress Announces Release of 4,240 New Web Archives on Loc.gov (Largest Single Release to Date)

From The Signal/Library of Congress:

The Library of Congress Digital Content Management Section is excited to announce the release of 4,240 new web archives across 43 event and thematic collections on loc.gov, our largest single release of web archives to date! Web archives such as Slate Magazine from 2002 to present, Elizabeth Mesa’s Iraq War blog, and Sri Lanka’s current president Maithripala Sirisena’s campaign website (no longer live on the web) are now waiting to be discovered alongside millions of other Library items. Keep watching The Signal for deeper dives into the unique collections with web archives now available on loc.gov. The Web Archiving Team sends its deepest gratitude to all involved in this significant achievement for the Library.

With over 20,000 web archives among 114 ongoing and finished collections, the scale of the Library’s web archive has grown significantly, presenting compelling new challenges for description along the way. To provide access at the same rate the archive continues to expand, the Web Archiving Team (WAT), representatives from Acquisitions and Bibliographic Access (ABA), and Web Services created an innovative new cataloging approach in the spirit of More Product, Less Process (MPLP). The approach, known internally as the minimal-record approach, combines the descriptive talents of cataloging librarians with the power of Python scripting to automatically create MODS records.
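To make the idea of scripted MODS creation concrete, here is a minimal sketch in Python. It is not the Library's actual script or schema: the function name `build_minimal_mods`, the choice of fields, and the placeholder URL are assumptions for illustration only. It uses just the standard library and standard MODS elements (`titleInfo`, `location`, `relatedItem`) to show how one record could name more than one host collection.

```python
import xml.etree.ElementTree as ET

# The MODS namespace is real; everything else in this sketch is illustrative.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def build_minimal_mods(title, url, collections):
    """Build a bare-bones MODS record (hypothetical helper, not the Library's)."""
    record = ET.Element(q("mods"))

    # Title of the archived website.
    title_info = ET.SubElement(record, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    # Address of the archived site.
    location = ET.SubElement(record, q("location"))
    ET.SubElement(location, q("url")).text = url

    # One host relatedItem per collection the web archive belongs to.
    for name in collections:
        host = ET.SubElement(record, q("relatedItem"), {"type": "host"})
        host_title = ET.SubElement(host, q("titleInfo"))
        ET.SubElement(host_title, q("title")).text = name

    return record


record = build_minimal_mods(
    "Hark! a vagrant",
    "https://example.org/",  # placeholder URL, not the real seed
    ["Webcomics Web Archive", "Small Press Expo Comic and Comic Art Web Archive"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In a workflow like the one described, librarians would presumably supply the descriptive values in bulk (for example, from a spreadsheet) and a script would loop over the rows; that detail is also an assumption here.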


The Library successfully implemented the minimal-record approach during its previous releases of the Federal Courts, International Tribunals, and Legislative Branch Web Archive collections. In planning subsequent releases, WAT saw that many web archives overlap between thematic collections; this is possible because of the way the Library collects and manages the collections when building them. For example, Hark! a vagrant appears in both the Webcomics Web Archive and the Small Press Expo Comic and Comic Art Web Archive. In the current release, there are even more complicated examples, such as Beliefnet, which appears in three different collections curated by four different library units.

Learn More About the Minimal-Record Schema, View Screenshots, in Complete Blog Post

Direct to the Library of Congress–Web Archives

Direct to LOC Web Archiving FAQs

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.
