May 16, 2022

Library of Congress Releases Coronavirus Web Archive Collection Online

From the Library of Congress: 

After collecting a wide variety of web content documenting the COVID-19 pandemic over the past two years, the Library of Congress is now making its growing Coronavirus Web Archive available to the public.

The collection, which now includes 450 web archives, aims to balance government, science, business and policy content with human stories that will give future historians a sense of how the COVID-19 pandemic impacted the daily lives of individuals, families and communities.

The Library has been capturing coronavirus web content in many of its existing web collections since the start of the COVID-19 pandemic, well before establishing a formal collection plan in June 2020. Since the Library is a member of the International Internet Preservation Consortium, Library staff also nominated sites for that effort.

For the Coronavirus Web Archive, a core team of 10 recommending officers representing a variety of skills, perspectives and subject matter expertise from across the Library has worked together to build a well-rounded collection. Additionally, international collections librarians and overseas offices made contributions to ensure that the COVID-19 pandemic is represented in a truly global collection.

“We didn’t know anything about COVID-19 when the pandemic began, but at the Library of Congress, we did know how historical pandemics are researched,” said Jennifer Harbster, head of the Library’s Science Reference Section. “We may not know exactly what future historians will be looking for when they tell the story of these remarkable years, but by looking at our materials from the Influenza of 1918 and broadening our scope to include areas beyond science like policy, the arts, and social content, we hope to present a collection that will serve future researchers.”

The Library began building web archive collections in 2000 to gather web-based information that focused on specific themes or events as they unfolded. Over the past two decades, the Library’s web archive collections have grown to hold over 2.8 petabytes of data in over 21 billion files. With so much content published on the web, curators still cannot capture everything, so the Library has refined its collections process with a multidisciplinary, team-driven approach.

The Coronavirus Web Archive team continues to seek good examples of items that represent how Americans and people from across the globe are responding to the pandemic. The collection includes topics such as containment efforts, legal responses, human resource approaches, virtual education methods, unemployment trends, and artistic responses to the global challenge.

Library subject specialists are currently collecting content on vaccine rollouts, testing, virus variants, face mask guidance and developing subjects, such as guidance for students and teachers returning to the classroom. New content will continue to be released monthly, following a one-year embargo, as a part of this ongoing collection.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.