CORE has greatly increased the amount of content hosted directly in its database: last year the service provided access to approximately 12 million full texts, and it now hosts 18 million while continuing its efforts to enrich its data. Data from repositories often come without basic identifiers such as DOIs and ORCIDs, which makes linking papers in repositories to the published literature, and understanding the relations between them, a non-trivial task.
Over the last year, we have become excited about being able to offer a unique dataset: full-text articles spanning pre-prints, reports, grey literature and theses, as well as the best peer-reviewed research papers, drawn from repositories and journals. This dataset is complementary to other major scholarly datasets, including Microsoft Academic Graph (MAG), Crossref (the majority of articles in CORE do not have an equivalent article in Crossref) and ORCID.
We are pleased to announce that all article metadata from Crossref are now linked and integrated in the CORE data. Crossref is a consortium-led initiative that serves as a Digital Object Identifier (DOI) registration authority and contains around 100M metadata documents submitted by more than 4,500 publishers and organisations. More specifically, using an internal project we called MUCC, we have processed and linked data not only from Crossref, but also from MAG, Unpaywall, ORCID and PubMed.
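To illustrate why missing identifiers make linking non-trivial, here is a minimal Python sketch of one common approach: match on a normalised DOI when a repository record has one, and fall back to a normalised title plus publication year when it does not. This is not a description of how MUCC works. The function names (`normalize_doi`, `title_key`, `link_records`) and the record fields (`id`, `doi`, `title`, `year`) are hypothetical and chosen only for the example.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes; None if absent."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)\s*", "", doi)
    return doi or None


def title_key(title, year):
    """Build a crude fallback key: accent-free, lower-cased, punctuation-free title plus year."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (text, year)


def link_records(repo_records, registry_records):
    """Return (repository id, registry DOI, method) for every record that can be linked."""
    by_doi = {}
    by_title = {}
    for rec in registry_records:
        doi = normalize_doi(rec.get("doi"))
        if doi:
            by_doi[doi] = rec
        # Keep the first registry record seen for each title/year key.
        by_title.setdefault(title_key(rec["title"], rec.get("year")), rec)

    links = []
    for rec in repo_records:
        doi = normalize_doi(rec.get("doi"))
        match = by_doi.get(doi) if doi else None
        method = "doi"
        if match is None:
            # Assumption: an exact title + year match is good enough for this sketch.
            match = by_title.get(title_key(rec["title"], rec.get("year")))
            method = "title+year"
        if match is not None:
            links.append((rec["id"], match["doi"], method))
    return links
```

An exact title-and-year fallback like this would miss retitled versions and could confuse distinct papers that share a title, which is part of what makes linking at this scale hard in practice.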