Metadata: Free Public Data File of 112+ Million Crossref Records Now Available
A lot of people have been using our public, open APIs to collect data that might be related to COVID-19. This is great and we encourage it. We also want to make it easier. To that end we have made a free data file of the public elements from Crossref’s 112.5 million metadata records.
The file (65GB, in JSON format) is available via Academic Torrents here: https://doi.org/10.13003/83B2GP
It is important to note that Crossref metadata is always openly available. The difference here is that we’ve done the time-saving work of putting all of the records registered through March 2020 into one file for download.
The sheer number of records means that, though anyone can use these records anytime, downloading them all via our APIs can be quite time-consuming. We hope this saves the research community valuable time during this crisis.
- All records are included. In other words, the data file has every DOI ever registered with Crossref through March 31st, 2020. This means it’s a large file, 65GB.
- Metadata is supplied by our members and, as such, not all records have the same completeness (or quality) of metadata. Bibliographic metadata is generally required. All other metadata (e.g. license and funding information, ORCIDs) is optional, though very much encouraged.
- References (i.e. authors’ cited sources) are also optional metadata. Nearly 50 million records include references and, of those, nearly 30 million have open references that are included in the data file. “Limited” and “Closed” references are not included in the data file.
- If you find an error in the metadata, please report it directly to the publisher so that it can be corrected.
- The records are in JSON.
- New and updated records can be added incrementally using our REST API, which includes a number of date filter options, e.g. index-date.
- No registration is required to use our REST API but we do strongly encourage being a ‘polite’ (i.e. identified) user. It makes troubleshooting much easier and reduces the chance of negatively impacting other users.
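As a rough illustration of the last two notes, the sketch below builds a REST API request URL that asks only for records indexed since a given date and identifies the caller. It rests on several assumptions not stated above: that the works endpoint lives at `https://api.crossref.org/works`, that the index-date filter is written `from-index-date`, that large result sets are paged with `rows` and `cursor`, and that a "polite" user is identified with a `mailto` parameter. The function name `build_incremental_url` and the address `you@example.org` are hypothetical placeholders. Check the current REST API documentation before relying on any of these.

```python
from urllib.parse import urlencode

# Assumed base URL of the Crossref REST API works endpoint.
API_BASE = "https://api.crossref.org/works"


def build_incremental_url(since, mailto, rows=1000, cursor="*"):
    """Build a URL requesting records indexed on or after `since` (YYYY-MM-DD).

    `mailto` identifies the caller, which is what makes the request "polite".
    The parameter names used here are assumptions; see the lead-in text.
    """
    params = {
        "filter": f"from-index-date:{since}",  # assumed filter name
        "rows": rows,                          # records per page
        "cursor": cursor,                      # "*" starts a new paged walk
        "mailto": mailto,                      # contact address for the caller
    }
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # The data file covers records through March 31st, 2020, so an
    # incremental update would start from the following day.
    print(build_incremental_url("2020-04-01", "you@example.org"))
```

The sketch only constructs the URL and makes no network call. A real update job would fetch each page, store the records, and repeat the request with the cursor value returned by the previous response until no records remain.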
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.