October 20, 2014

AP: “British Library Sets to Archive the Web”

From the Associated Press:

Capturing the unruly, ever-changing Internet is like trying to pin down a raging river.

But the British Library is going to try.

For centuries the library has kept a copy of every book, pamphlet, magazine and newspaper published in Britain. Starting Saturday, it will also be bound to record every British website, e-book, online newsletter and blog in a bid to preserve the nation’s “digital memory.”

[Clip]

“Stuff out there on the Web is ephemeral,” said Lucie Burgess, the library’s head of content strategy. “The average life of a web page is only 75 days, because websites change, the contents get taken down.”

[Clip]

Like reference collections around the world, the British Library has been attempting to archive the Web for years in a piecemeal way and has collected about 10,000 sites. Until now, though, it has had to get permission from website owners before taking a snapshot of their pages.

That began to change with a law passed in 2003, but it has taken a decade of legislative and technological preparation for the library to be ready to begin a vast trawl of all sites ending with the suffix .uk.

An automated web harvester will scan and record 4.8 million sites, a total of 1 billion web pages. Most will be captured once a year, but hundreds of thousands of fast-changing sites such as those of newspapers and magazines will be archived as often as once a day.

The library plans to make the content publicly available by the end of this year.

Read the Complete Article

See Also: Web Archiving (via The British Library)
Includes info about legal deposit of UK online publications.

See Also: Non-Print Legal Deposit regulations laid before parliament (February 7, 2013)

See Also: British Library Welcomes Public Consultation on Non-Print Legal Deposit (February 27, 2012)

See Also: UK Web Archiving and Preservation Task Force (via DPC)

See Also: UK Web Archive (Search)

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.