Data Visualization: One Hour of the UK Web Archive Crawl in One Minute

October 9, 2019 by Gary Price

Each year we attempt to collect as much of the UK web space as we can. This typically involves millions of websites and billions of individual assets (images, PDFs, CSS files, etc.). We send out our robots across the interwebs looking for websites that we can archive. The bots follow links to pages that contain further links to follow, and they keep going until we have archived (almost) everything. But what does it look like to ‘crawl’ the web? Here we have condensed an hour of live web crawling into a one minute video:

Every circle is a different website, and every line represents a link that was followed between websites. The size of the circle represents how many pages we visited from that site, and the width of the line represents the number of links we followed.
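To make that mapping concrete, here is a minimal Python sketch of how a crawl log could be reduced to the two quantities the video draws: pages visited per site (circle size) and links followed between sites (line width). It is not the UK Web Archive's code. The event format, the `host_of` and `build_site_graph` names, and the `example` URLs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or "" if there is none."""
    if not url:
        return ""
    return (urlsplit(url).hostname or "").lower()


def build_site_graph(crawl_events):
    """Aggregate hypothetical (source_url, target_url) crawl events.

    Each event means "the crawler followed a link on source_url and
    visited target_url". A seed page has source_url set to None.
    Returns (pages_per_site, links_between_sites).
    """
    pages_per_site = Counter()       # circle size: pages visited per site
    links_between_sites = Counter()  # line width: links followed per site pair
    seen_pages = set()

    for source_url, target_url in crawl_events:
        target_host = host_of(target_url)
        if not target_host:
            continue  # skip events with no usable target
        if target_url not in seen_pages:
            seen_pages.add(target_url)
            pages_per_site[target_host] += 1
        source_host = host_of(source_url)
        # Only links that cross from one site to another become lines.
        if source_host and source_host != target_host:
            links_between_sites[(source_host, target_host)] += 1

    return pages_per_site, links_between_sites


# Made-up sample events, not real crawl data.
events = [
    (None, "https://a.example/"),
    ("https://a.example/", "https://a.example/about"),
    ("https://a.example/", "https://b.example/"),
    ("https://a.example/about", "https://b.example/news"),
    ("https://b.example/", "https://c.example/"),
]
pages, links = build_site_graph(events)
```

In this sketch, links within a single site add to that site's page count but do not produce a line, which matches the description of lines as links followed between websites.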

The blog post also notes another visualization that provides a real-time view of what the UK Web Archive is crawling (only available when the crawler is active).

Read the Complete Blog Post

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
