For the End of Term 2016 archive, the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office have joined in a collaborative project to preserve public United States Government websites at the end of the current presidential administration, which concludes on January 20, 2017. The partners will select, collect, and preserve these sites and make the resulting web archives available for research use.
This web harvest, like its predecessors in 2008 and 2012, is intended to document the federal government’s presence on the web during the transition between presidential administrations and to enhance the existing collections of the partner institutions. This broad, comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil and .com) and on social media.
And that’s where you come in. You can help the project immensely by nominating your favorite .gov websites, other federal government websites, or governmental social media accounts with the End of Term Nomination Tool. Please nominate as many sites as you want. Nominate early and often. Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity and long-term public access.
In this collaboration, the partners will structure and execute a comprehensive harvest of the federal government .gov domain. The Internet Archive will crawl broadly across the entire .gov domain. The University of North Texas and the California Digital Library will supplement and extend that broad crawl with focused, in-depth crawls based on prioritized lists of URLs, including social media. This two-pronged approach seeks to capture a comprehensive snapshot of the federal government on the web at the close of the current administration.
The project has two phases: a broad, comprehensive baseline crawl of .gov sites, and more selective, focused crawls based on priorities established by the partners. The focused crawls seek to capture selected sites in greater depth and to identify those at greater risk of rapid change or disappearance.
Comprehensive Crawl – The Internet Archive will undertake a comprehensive crawl of the .gov domain (all of the URLs identified for this project) beginning in mid-September 2016, and again in early 2017, after the inauguration.
Prioritized Crawl – The project team will assemble a list of related URLs and social media feeds. To that end, the project team is calling upon government information specialists, including librarians, political and social science researchers, and academics, to assist in selecting and prioritizing the websites to be included in the collection, as well as in identifying how often and how deeply each should be collected. The schedule for crawling the prioritized URLs is still to be determined; it will be announced on the project’s listserv and in other public communications as the project gets underway.
Participants will be asked to refine the existing URL list by browsing or searching .gov URLs in the Nomination Tool. Specialists will review the URLs to determine whether they are in scope or out of scope for the End of Term project. Participants may also add URLs throughout the crawling period.
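For anyone preparing a batch of nominations, a first-pass scope check can be sketched in a few lines of Python. This is only an illustration, not how the Nomination Tool works: the function name `triage`, the suffix list, and the two labels are all assumptions made up for this example. Because the .gov domain also hosts state and local government sites, even a "likely in scope" result still needs a specialist's review.

```python
from urllib.parse import urlsplit

# Assumption for illustration: hosts ending in these suffixes are *likely*
# federal content. This list is hypothetical, not the project's scope rule.
LIKELY_FEDERAL_SUFFIXES = (".gov", ".mil")


def triage(url):
    """Return a rough first-pass label for a nominated URL (hypothetical helper)."""
    # hostname is None when the URL has no scheme or host; treat that as empty.
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(LIKELY_FEDERAL_SUFFIXES):
        return "likely in scope"
    # Federal content on .com or social media cannot be recognized by domain
    # alone, so everything else is left for a human reviewer.
    return "needs review"


nominations = [
    "https://www.whitehouse.gov/briefing-room",
    "http://www.army.mil",
    "https://twitter.com/WhiteHouse",
]
for url in nominations:
    print(f"{triage(url):16} {url}")
```

The sketch deliberately never labels anything "out of scope": a federal agency's social media account would fail the domain test, so the final in-scope or out-of-scope decision is left to the reviewers described above.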