January 20, 2022

"The First Decade of Web Archiving at the Library of Congress"

From a Guest Blog Post by Abbie Grotke, Web Archiving Team Lead at the Library of Congress (via The Signal)

Eleven years ago, the Library of Congress established a pilot web archiving project to study methods to evaluate, select, collect, catalog, provide access to and preserve at-risk born digital content for future generations. We could write a book (or at least a few blog posts!) about lessons learned since then, yet we continue to face a variety of challenges.


We’ve collected over 240 terabytes of content, in almost 40 event and thematic collections. Our strengths are in government, public policy and law: we archive U.S. national elections, House, Senate and committee sites, changes in the Supreme Court and legal blawgs.

We also build web archives with our special collection divisions – the Manuscript, Prints and Photographs and Music divisions are archiving sites related to their physical holdings. In recent years Library staff in overseas offices in Egypt, Brazil, Indonesia, India and Pakistan captured born digital content documenting elections and other events.

Read the Full Text of Abbie Grotke’s Post
More posts in this series will be available soon.

Direct to Library of Congress Web Archives




About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.