Web Archiving: Cornell Selects Archive-It to Capture and Preserve 8 Million University Web Pages

April 13, 2011 by Gary Price

Internet Archive will begin preserving Cornell’s online content starting this month after the University signed a contract with the Internet archiving company in March.

Internet Archive will create an archive of Cornell’s entire web space (approximately eight million documents) by capturing HTML code, images, PDFs and links to external pages, according to Dean Krafft, Cornell Library’s chief technology strategist, who is overseeing the project.

Cornell workers are beginning to use Internet Archive’s “Archive-It” function to make test scans, or “crawls,” of Cornell’s Internet domain, Krafft said. A complete crawl of the Cornell domain will occur two to three times a year, with the first one scheduled to take place within the next month, he said.

[Clip]

Cornell previously partnered with Archive-It in 2009 to provide nearly 80,000 free online books to the public, according to a press release by Cornell Libraries.

[Clip]

Kristine Hanna, Internet Archive’s director of archiving services, said that about 90 university libraries use the Archive-It service to collect and archive digital content.

Cornell’s archived web pages will be available publicly on Archive-It.org, giving people access to information that may no longer be available as a result of updates or removal of pages, Earle said.

We’ve been and continue to be major fans of the Archive-It service and the Internet Archive. Here are two reasons why.

1. As the article points out, the Cornell collection will be available on the web along with collections from many other organizations (not only those in higher ed).

2. A feature that Archive-It collections offer vs. the Wayback Machine (an essential tool also from Internet Archive) is that they’re full-text searchable. Nice!

See Also: New: Japan Earthquake 2011 Web Archive from the Internet Archive/Archive-It

Filed under: Associations and Organizations, Libraries, Resources

Archive-It Cornell University Digitized Archives & Libraries Internet Archive Web Archiving

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
