DataCite Launches First Release of the Data Citation Corpus
From a Make Data Count Post:
DataCite, in partnership with the Chan Zuckerberg Initiative (CZI), is delighted to announce the first release of the Data Citation Corpus. A major milestone in the Make Data Count initiative, the release makes eight million data citations openly available and usable for the first time via an interactive dashboard and public data file. We invite the community to engage with the data and provide feedback on this collaborative effort.
As highlighted by Make Data Count, the lack of a centralized resource for citations to datasets has hindered the evaluation of how open data is being used. To address this gap, DataCite, with funding from the Wellcome Trust, has developed an innovative aggregation that brings together for the first time data citations from diverse sources into a comprehensive and publicly accessible resource for the global community.
[Clip]
The first release of the corpus includes data citations in DataCite and Crossref metadata as well as asserted data citations contributed by CZI, available to the community via a data citation store and dashboard developed by Coko. Leveraging accession numbers from Europe PMC, CZI applied a machine-learning model to a large set of full-text articles and preprints to extract mentions of datasets. This has enabled the first-ever aggregation of citations for datasets with DOIs and accession numbers into a single corpus, giving a more complete picture of data usage.
“As an organization that invests in research data and reference datasets, we believe it is critical to understand how data is shared and reused to enable new scientific discoveries,” said Patricia Brennan, Vice President of Science Technology at the Chan Zuckerberg Initiative. “DataCite has been a leader in this space, providing critical infrastructure for data citation and for tracking its reuse. We’re proud to support them in their vision to build a comprehensive global corpus of actionable data citations.”
[Clip]
The interactive dashboard of the corpus allows users to visualize and report on citations by a variety of facets, such as funder, data repository, or the journal where the article citing the data is published.
A complete data file of all of the citations is also available for additional analysis and evaluation. Request the data file via this form.
Forthcoming releases will focus on addressing existing metadata gaps, for example, related to the disciplinary information for the datasets, and on incorporating feedback from early adopters. DataCite will also pursue new collaborations with additional citation aggregators to expand the breadth and scale of data citations in the corpus.
Learn More, Read the Complete Post
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.