From The Washington Post:
The protests and pleas for more time were just starting when Jason Scott took to Twitter to register his utter lack of surprise over the fate of Yahoo’s sprawling chitchat of neighborhoods, businesses, addicts in recovery and birdwatchers.
The team of volunteers Scott founded — “rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage” — has spent a decade hopping from one online obliteration to the next, capturing whatever they can in a public repository called the Wayback Machine. The Archive Team, as his group is known, keeps a “Deathwatch” of websites in various stages of shutdown (“Likely to Die,” “Dying,” “Dead as a Doornail”); Yahoo discards feature prominently.
Mark Graham, director of the Wayback Machine, says the tool comprises nearly 400 billion Web pages and accounts for about half of the nonprofit Internet Archive’s 60 petabytes of stored content. That is a lot of data. A petabyte, equal to more than a million gigabytes, is sometimes equated to 10 million filing cabinets of text.
But in the grand scheme of the Internet, it is also small. Yahoo Groups could clock in at several petabytes, Graham guesses — though he compares the act of estimation to walking into a library you can’t see the end of and trying to guess how many words it contains.