Crossref: 2022 Public Data File of More Than 134 Million Metadata Records Now Available
In 2020 we released our first public data file, something we’ve turned into an annual affair supporting our commitment to the Principles of Open Scholarly Infrastructure (POSI). We’ve just posted the 2022 file, which can now be downloaded via torrent like in years past.
We aim to publish these in the first quarter of each year, though as you may notice, we’re a little behind our intended schedule. The reason for this delay was that we wanted to make critical new metadata fields available, including resource URLs and titles with markup.
Crossref metadata is always openly available via our API. We recommend you use this method to incrementally add new and updated records once you’re up and running with an annual public data file. If you’re interested in more frequent and regular “full-file” downloads, consider subscribing to our Metadata Plus program. Plus subscribers have access to monthly snapshots in JSON and XML formats.
Every year our metadata corpus grows. The 2020 file was 65GB and held 112 million records; 2021 came in at 102GB and 120 million records. This year the file weighs in at 160 GB and contains metadata for 134 million records, or all Crossref records registered up to and including April 30, 2022.
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.