For the first time in over a year, we have a new piece of information about the planned LC/Twitter archive. Unfortunately, it doesn’t provide a possible launch date, explain what’s needed to be a “qualified researcher”, or say whether the archive will be accessible outside of an LC building. However, at this point we’re happy to learn something new.
A company named Gnip (pronounced Guh´nip) will:
[Be] playing a key role in this project by delivering past and future Twitter data to the Library of Congress for historic preservation in the archives.
We’ve learned a lot from this project and it is clear that some of the outcomes from this project will benefit Gnip’s future product offerings. We look forward to sharing some of the technical details in the future at the appropriate time.
The archive of Twitter data which Gnip delivers to the Library of Congress will cater to the needs of researchers who wish to use limited amounts of public Twitter data for non-commercial purposes. Gnip will continue to serve those seeking realtime data, full-coverage data, and commercial use cases.
Gnip is based in Boulder, Colorado, and the local newspaper, The Daily Camera, has more.
Over the last six months, Gnip has been delivering 8 billion messages a week to the Library of Congress, which is working on search mechanisms and procedures on its own**. The archive will be available to qualified researchers for non-commercial use. No date has been set for when the archive will be available.
No protected tweets or direct messages are available to Gnip or the Library of Congress, and they won’t be part of the archive. Deleted tweets and linked data, such as websites and pictures (sorry, Anthony Weiner), won’t be part of the archive either.
Gnip is donating its services to the project. Dealing with data on that scale allowed Gnip to stretch its abilities and will contribute to future product offerings, Johnson said. Though he had no specifics today, he said commercial access to historical social media is likely.
What Does Gnip Do When They’re Not Working With LC?
Gnip is in the business of providing real-time social media access for enterprise applications. They offer some very interesting services.
Gnip has its own API and can provide access to data from a large number of social media services, including Facebook, Flickr, StumbleUpon, Tumblr, Twitter*, and YouTube. Here’s a list of some of the other social media services they work with.
In addition to being able to aggregate and normalize content from disparate sources, Gnip can also add metadata, remove duplicates, expand shortened URLs, provide translations, etc. More info here. If you like charts, here’s one that might be of interest. It illustrates how Gnip works at a VERY basic level.
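To make "aggregate, normalize, and remove duplicates" a little more concrete, here is a minimal Python sketch of the general idea. It does not use Gnip's API and is not a description of how Gnip actually works; the record shapes, field names, and the `Activity` type are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical common shape shared by every source after normalization."""
    source: str
    author: str
    text: str
    url: str


def normalize(source, raw):
    """Map a source-specific record onto the shared Activity shape.

    The raw field names below are made up for this example.
    """
    if source == "twitter":
        return Activity(source, raw["user"], raw["text"].strip(), raw["link"])
    if source == "flickr":
        return Activity(source, raw["owner"], raw["title"].strip(), raw["photo_url"])
    raise ValueError(f"unknown source: {source}")


def deduplicate(activities):
    """Keep the first occurrence of each (source, url) pair, preserving order."""
    seen = set()
    unique = []
    for activity in activities:
        key = (activity.source, activity.url)
        if key not in seen:
            seen.add(key)
            unique.append(activity)
    return unique


# Sample input from two differently shaped sources, with one repeated item.
raw_feed = [
    ("twitter", {"user": "alice", "text": " Hello LC ", "link": "http://example.com/t/1"}),
    ("flickr", {"owner": "bob", "title": "Boulder", "photo_url": "http://example.com/f/9"}),
    ("twitter", {"user": "alice", "text": "Hello LC", "link": "http://example.com/t/1"}),
]

stream = deduplicate([normalize(source, raw) for source, raw in raw_feed])
```

After this runs, `stream` holds two items in a single common format, one per source, which is the kind of uniform feed a downstream application could consume without caring where each item originated.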
Finally, you can take a look at what Gnip is up to by viewing this brief video. Their entire web site is worth a look. Many of the services they offer will be familiar from other database services but they’re working with real-time social media.
* Gnip was the first company to become an authorized reseller of Twitter data.
More from AllThingsD/WSJ (November 2010)
** When the Twitter archive was first announced, LC told American Prospect that it had formed a partnership with Stanford to assist in building the service. The LC/Stanford collaboration is named CADS (Computational Approaches to Digital Stewardship).