There are widespread efforts to promote data sharing in research, and most of these focus on making the datasets associated with an individual research article publicly available in a reliable repository. An increasing number of publishers, funders and institutions have data sharing policies that recommend or mandate that the data from an article be made available upon publication.
Compliance with these data sharing policies is still frustratingly low, even when sharing is compulsory: authors are unsure which datasets they should be sharing (and where), and stakeholders cannot easily tell when the authors have shared the right data.
To address this issue, the Alfred P. Sloan Foundation is supporting Coko to develop DataSeer, an online service that uses Natural Language Processing to identify the datasets associated with a particular article, including secondary or tertiary datasets that may not be obvious. The goal is a service that guides authors through the data sharing process for their article and produces reports for publishers, funders, and institutions, so that they can easily assess policy compliance by comparing what should be shared with what was actually shared. Our initial partners will be the University of California Curation Center (UC3), PLOS, and the University of California Press.
The project lead will be Dr Tim Vines, a peer review workflow expert with Origin Editorial. He conceived of DataSeer while working out how best to enforce the data sharing policy at the journal Molecular Ecology. DataSeer will be developed as an open-source project and will be made freely available to all potential users, both as a standalone online service and as a PubSweet component.