May 20, 2022

Research Tools: “CORE Raises Repository Data Quality by Consolidating Information From External Datasets” (Crossref, MAG, Unpaywall, PubMed)

From JISC:

CORE has greatly increased the amount of content hosted directly in its database: last year the service provided access to approximately 12 million full texts, today it hosts 18 million, and it continues its efforts to enrich its data. Data from repositories often come without basic identifiers such as DOIs and ORCIDs. This makes linking papers in repositories to the published literature, and understanding the relations between them, a non-trivial task.

Over the last year, we have become really excited about being able to offer a unique dataset of full-text articles from repositories and journals, spanning pre-prints, reports, grey literature and theses as well as the best peer-reviewed research papers. This dataset is complementary to other major scholarly datasets, including Microsoft Academic Graph (MAG), Crossref (the majority of articles in CORE do not have an equivalent article in Crossref) and ORCID.

We are pleased to announce that all article metadata from Crossref are now linked to and integrated into the CORE data. Crossref is a consortium-led initiative that serves as a Digital Object Identifier (DOI) registration authority; it holds around 100M metadata documents submitted by more than 4,500 publishers and organisations. More specifically, using an internal project we called MUCC, we have processed and linked data not only from Crossref but also from MAG, Unpaywall, ORCID and PubMed.


About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors of ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and he is currently a contributing editor at Search Engine Land.