Data Bonanza: Harvard Library Making Nearly 100% of Their Catalog Records Open Access, More than 12 Million Records Released
Full Text of the Harvard announcement below. Below that, coverage from The NY Times.
The Harvard Library announced it would make more than 12 million catalog records from Harvard’s 73 libraries publicly available.
The records contain bibliographic information about books, videos, audio recordings, images, manuscripts, maps, and more. The Harvard Library is making these records available in accordance with its Open Metadata Policy and under a Creative Commons 0 public domain license. In addition, the Harvard Library announced its open distribution of metadata from its Digital Access to Scholarship at Harvard (DASH) scholarly article repository under a similar CC0 license.
“The Harvard Library is committed to collaboration and open access. We hope this contribution is one of many steps toward sharing the vital cultural knowledge held by libraries with all,” said Mary Lee Kennedy, Senior Associate Provost for the Harvard Library.
The catalog records are available for bulk download from Harvard and for programmatic access by software applications via APIs at the Digital Public Library of America (DPLA). The records are in the standard MARC21 format.
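For readers unfamiliar with MARC21, the following is a minimal sketch of how a single record in the binary exchange format could be read. It is not Harvard's or the DPLA's tooling. It assumes UTF-8 encoded records, and the function names (`build_sample_record`, `parse_marc_record`, `first_subfield`) and the sample data are hypothetical, made up for illustration. For real work an established library such as pymarc would likely be a better choice.

```python
# Minimal, illustrative reader for one MARC21 (ISO 2709) record.
# Assumptions: UTF-8 data, well-formed input. All names here are hypothetical.

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_sample_record() -> bytes:
    """Assemble a tiny made-up record so the parser has something to read."""
    fields = [
        ("001", b"000123"),  # control field: record identifier
        ("245", b"10" + SUBFIELD_DELIMITER + b"aMoby Dick /"
                + SUBFIELD_DELIMITER + b"cHerman Melville."),  # title statement
        ("650", b" 0" + SUBFIELD_DELIMITER + b"aWhaling"
                + SUBFIELD_DELIMITER + b"vFiction."),  # subject heading
    ]
    directory = b""
    body = b""
    for tag, payload in fields:
        field_bytes = payload + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(field_bytes):04d}{len(body):05d}".encode("ascii")
        body += field_bytes
    base_address = 24 + len(directory) + 1  # leader + directory + terminator
    total_length = base_address + len(body) + 1  # plus record terminator
    leader = f"{total_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


def parse_marc_record(raw: bytes) -> dict:
    """Return {tag: [value, ...]}; data fields become lists of (code, text)."""
    leader = raw[:24]
    base_address = int(leader[12:17])  # where field data begins
    directory = raw[24:base_address - 1]  # drop the directory's terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice the field and drop its trailing field terminator.
        data = raw[base_address + start:base_address + start + length - 1]
        if tag.startswith("00"):
            value = data.decode("utf-8")  # control fields have no subfields
        else:
            # Skip the two indicator bytes, then split into subfields.
            value = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in data[2:].split(SUBFIELD_DELIMITER)
                if chunk
            ]
        fields.setdefault(tag, []).append(value)
    return fields


def first_subfield(fields: dict, tag: str, code: str):
    """Return the first matching subfield text, or None if absent."""
    for subfields in fields.get(tag, []):
        for sub_code, text in subfields:
            if sub_code == code:
                return text
    return None


if __name__ == "__main__":
    record = parse_marc_record(build_sample_record())
    print(first_subfield(record, "245", "a"))
    print(first_subfield(record, "650", "a"))
```

The sketch reads the 24-byte leader to find where the field data starts, then walks the 12-byte directory entries to locate each field. A bulk download would contain many such records back to back, each ending with the record terminator byte.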
“By instituting a policy of open metadata, the Harvard Library has expressed its appreciation for the great potential that library metadata has for innovative uses. The two metadata releases today are prime examples,” said Stuart Shieber, Library Board Member, Director of the Office for Scholarly Communication and Professor of Computer Science at Harvard.
John Palfrey, chair of the DPLA, said, “With this major contribution, developers will be able to start experimenting with building innovative applications that put to use the vital national resource that consists of our local public and research libraries, museums, archives and cultural collections.” He added that he hoped that this would encourage other institutions to make their own collection metadata publicly available.
The records consist of information describing works—including creator, title, publisher, date, language, and subject headings—as well as other descriptors usually invisible to end users, such as the equalization system used in a recording. Harvard’s Kennedy noted, “The accessibility of the entire set of data for each item will, we hope, spur imaginative uses that will find new value in what libraries know.”
From The NY Times: “Harvard Releases Big Data for Books”
“This is Big Data for books,” said David Weinberger, co-director of Harvard’s Library Lab. “There might be 100 different attributes for a single object.” At a one-day test run with 15 hackers working with information on 600,000 items, he said, people created things like visual timelines of when ideas became broadly published, maps showing locations of different items, and a “virtual stack” of related volumes garnered from various locations.
Harvard plans also to eventually include circulation data on the items as well, said Stuart Shieber, director of Harvard’s Office of Scholarly Communication, who oversaw the project. “We have to be careful how we do that, to avoid releasing any personal information,” he said.
About Gary Price
Gary Price (email@example.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.