October 9, 2015

Flickr / Yahoo Labs Release One Hundred Million Creative Commons Flickr Images in New Research Dataset

Note: The following news was released about 10 days ago.

Massive amounts of Flickr metadata is now available for researchers to use.

From Yahoo (via Tumblr):

On Flickr, photos, their metadata, their social ecosystem, and the pixels themselves make for a vibrant environment for answering many research questions at scale. However, scientific efforts outside of industry have relied on various sized efforts of one-off datasets for research. At Flickr and at Yahoo Labs, we set out to provide something more substantial for researchers around the globe.

Today, we are announcing the Flickr Creative Commons dataset as part of Yahoo Webscope’s datasets for researchers. The dataset, we believe, is one of the largest public multimedia datasets that has ever been released—99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing.

The dataset (about 12GB) consists of a photo_id, a jpeg url or video url, and some corresponding metadata such as the title, description, title, camera type, title, and tags. Plus about 49 million of the photos are geotagged! What’s not there, like comments, favorites, and social network data, can be queried from the Flickr API.


The dataset can host a variety of research studies and challenges. One of the first challenges we are issuing is the MediaEval Placing Task, where the task is to build a system capable of accurately predicting where in the world the photos and videos were taken without using the longitude and latitude coordinates. This is just the start. We plan to create new challenges through expansion packs that will widen the scope of the dataset with new tasks like object localization, concept detection, and social semantics.

Read the Complete Tumblr Post
Includes info on how to access the dataset.

Gary Price About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.

Craft Exceptional Digital Experiences for Your Users
Digital UX LJ and ER&L present an exceptional roster of library and user experience (UX) experts for our newest online course, Digital UX Workshop: Crafting Exceptional Digital Experiences for the User-Centered Library. During this 5-week online workshop, you will explore why UX matters, and how to sell user-centered design (UCD) to leadership within your organization. Whether you want to redesign your website, revamp your user interface, create a new discovery tool, implement e-resources, or develop a mobile app—you’ll have a tangible product by the end of the course.