January 19, 2022

Internet Archive Preserves More than 200 Terabytes of US Government Data During “End of Term Web Archive” Project

Note From infoDOCKET Founder/Editor Gary Price:

As I’ve said many times over the years, both here on infoDOCKET and in presentations (and I’m always very happy to repeat it), the Internet Archive, under the leadership of Brewster Kahle, is an essential Internet research resource that keeps improving in content, services, ease of use, and more. I’m in constant appreciation of all that they do, and I know users around the globe feel the same. All hail the IA!!!

Here’s a bit of info from a new IA Blog post, with links to learn more about their work as one of several partners in the “End of Term Web Archive” project to collect and archive U.S. Government materials.

From the IA Blog:

In our December post, “Preserving U.S. Government Websites and Data as the Obama Term Ends,” we described our participation in the End of Term Web Archive project [End of Term Presidential Harvest 2016 Project] to preserve federal government websites and data at times of administration changes. We wanted to give a quick update on the project — we have archived a heck of a lot of data!

Between Fall 2016 and Spring 2017, the Internet Archive archived over 200 terabytes of government websites and data. This includes over 100TB of public websites and over 100TB of public data from federal FTP file servers totaling, together, over 350 million URLs/files. This includes over 70 million html pages, over 40 million PDFs and, towards the other end of the spectrum and for semantic web aficionados, 8 files of the text/turtle mime type. Other End of Term partners have also been vigorously preserving websites and data from the .gov/.mil web domains.

Every web page we have archived is accessible through the Wayback Machine and we are working to add the 2016 harvest to the main End of Term portal soon.

Read the Complete Blog Post

See Also: Did You Know That You Can Archive Web Pages, PDFs on Demand Using the Internet Archive? Learn more here and here.

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.