October 22, 2019

Data Visualization: One Hour of the UK Web Archive Crawl in One Minute

From the UK Web Archive Blog:

Each year we attempt to collect as much of the UK web space as we can. This typically involves millions of websites and billions of individual assets (images, PDFs, CSS files, etc.). We send out our robots across the interwebs looking for websites that we can archive. The bots follow links to pages that contain more links to follow, and the process keeps going until we have archived (almost) everything. But what does it look like to ‘crawl’ the web? Here we have condensed an hour of live web crawling into a one-minute video.

Every circle is a different website, and every line represents a link that was followed between websites. The size of the circle represents how many pages we visited from that site, and the width of the line represents the number of links we followed.
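The mapping described above (circles sized by pages visited per site, lines weighted by links followed between sites) amounts to aggregating a page-level crawl log into a site-level graph. The sketch below illustrates that idea under loudly stated assumptions: the input format (a list of `(from_url, to_url)` pairs, one per followed link), the use of the URL hostname as the "website" key, and the function names `site_of` and `aggregate` are all hypothetical and are not taken from the UK Web Archive's actual tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    # Hypothetical site key: the hostname (urlsplit returns it lowercased).
    return urlsplit(url).hostname or ""


def aggregate(
    followed_links: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Turn page-level link records into site-level circle sizes and line widths.

    followed_links: hypothetical log of (from_url, to_url) pairs, one per link followed.
    Returns (node_sizes, edge_widths):
      node_sizes  -> site: number of distinct pages seen on that site
      edge_widths -> (from_site, to_site): number of links followed between the two sites
    """
    pages_per_site: Dict[str, Set[str]] = {}
    edge_widths: Counter = Counter()

    for src, dst in followed_links:
        src_site, dst_site = site_of(src), site_of(dst)
        # Both ends of a followed link were fetched, so count both pages.
        pages_per_site.setdefault(src_site, set()).add(src)
        pages_per_site.setdefault(dst_site, set()).add(dst)
        # Only links that cross from one website to another become lines.
        if src_site != dst_site:
            edge_widths[(src_site, dst_site)] += 1

    node_sizes = {site: len(urls) for site, urls in pages_per_site.items()}
    return node_sizes, dict(edge_widths)


if __name__ == "__main__":
    # Made-up example log, for illustration only.
    log = [
        ("https://example.org/", "https://example.org/about"),
        ("https://example.org/about", "https://blog.example.com/post1"),
        ("https://example.org/", "https://blog.example.com/post2"),
        ("https://blog.example.com/post1", "https://example.org/"),
    ]
    nodes, edges = aggregate(log)
    print(nodes)
    print(edges)
```

Counting distinct URLs with a set keeps a page that is linked many times from inflating its site's circle, while the `Counter` on site pairs grows with every followed link, which matches the description of line width as "the number of links we followed".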

The blog post also notes another visualization that provides a real-time view of what the UK Web Archive is crawling (available only while the crawler is active).

Read the Complete Blog Post

See Also: Direct to UK Web Archive

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.
