October 29, 2020

Metadata: Free Public Data File of 112+ Million Crossref Records Now Available

From a Crossref Blog Post by Jennifer Kemp:

A lot of people have been using our public, open APIs to collect data that might be related to COVID-19. This is great and we encourage it. We also want to make it easier. To that end we have made a free data file of the public elements from Crossref’s 112.5 million metadata records.

The file (65GB, in JSON format) is available via Academic Torrents here: https://doi.org/10.13003/83B2GP

It is important to note that Crossref metadata is always openly available. The difference here is that we’ve done the time-saving work of putting all of the records registered through March 2020 into one file for download.

The sheer number of records means that, though anyone can use these records anytime, downloading them all via our APIs can be quite time-consuming. We hope this saves the research community valuable time during this crisis.

  • All records are included. In other words, the data file has every DOI ever registered with Crossref through March 31st, 2020. This means it’s a large file, 65GB.
    • Metadata is supplied by our members and, as such, not all records have the same completeness (or quality) of metadata. Bibliographic metadata is generally required. All other metadata (e.g. license and funding information, ORCIDs, etc.) is optional, though very much encouraged.
    • References (i.e. authors’ cited sources) are also optional metadata. Nearly 50 million records include references and, of those, nearly 30 million have open references that are included in the data file. “Limited” and “Closed” references are not included in the data file.
    • If an error in the metadata is found, please report it directly to the publisher to correct.
  • The records are in JSON.
  • New and updated records can be added incrementally using our REST API, which includes a number of date filter options, e.g. index-date.
  • No registration is required to use our REST API but we do strongly encourage being a ‘polite’ (i.e. identified) user. It makes troubleshooting much easier and reduces the chance of negatively impacting other users.
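
As a rough illustration of the last two points, the sketch below builds a "polite" REST API request URL for records indexed since a given date. It is a minimal example under stated assumptions, not Crossref's recommended client: the filter name `from-index-date`, the `mailto`, `rows`, and `cursor` query parameters, and the example email address are not taken from the post above, and the helper name `build_incremental_url` is purely hypothetical. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public Crossref REST API endpoint for works (assumed here; not quoted in the post).
API_BASE = "https://api.crossref.org/works"


def build_incremental_url(from_index_date, mailto, rows=1000, cursor="*"):
    """Build a /works URL for records indexed on or after from_index_date.

    from_index_date: ISO date string, e.g. "2020-04-01" (the day after the
                     data file's March 31st, 2020 cutoff).
    mailto:          contact email that identifies the caller ("polite" use).
    rows, cursor:    paging parameters; "*" is assumed to start a new cursor.
    """
    params = {
        "filter": f"from-index-date:{from_index_date}",
        "rows": rows,
        "cursor": cursor,
        "mailto": mailto,
    }
    # urlencode percent-encodes ':', '*' and '@' in the values.
    return f"{API_BASE}?{urlencode(params)}"


# Placeholder email address; replace with a real contact before sending requests.
url = build_incremental_url("2020-04-01", "you@example.org")
print(url)
```

Fetching the resulting URL with any HTTP client, then repeating the request with the cursor value returned in each response, would be one way to top up a local copy of the data file with records added or changed after the cutoff.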
About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.
