New Journal Article: “Wide-Open: Accelerating Public Data Release by Automating Detection of Overdue Datasets”
The following article was published today by PLOS Biology.
Wide-Open: Accelerating Public Data Release By Automating Detection Of Overdue Datasets
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
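The approach described in the abstract (mine articles for dataset references, then ask the repository whether each dataset is still private) can be illustrated with a minimal Python sketch. This is not the Wide-Open code. The accession patterns (`GSE` plus digits for GEO series, `SRP` plus digits for SRA studies) are assumptions about what to match, and `is_private` is a hypothetical callback standing in for whatever repository query and result parsing a real system would perform.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Assumed accession formats: GEO series ("GSE" + digits), SRA studies ("SRP" + digits).
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"\bGSE\d+\b"),
    "SRA": re.compile(r"\bSRP\d+\b"),
}


def find_accessions(text: str) -> Dict[str, Set[str]]:
    """Return the distinct accession IDs mentioned in an article, per repository."""
    return {repo: set(pattern.findall(text))
            for repo, pattern in ACCESSION_PATTERNS.items()}


def find_overdue(
    articles: Dict[str, str],
    is_private: Callable[[str, str], bool],
) -> List[Tuple[str, str, str]]:
    """List (article_id, repository, accession) for datasets cited in a
    published article that the repository still reports as private.

    `is_private` is a hypothetical hook: a real system would query the
    repository and parse the response to decide this.
    """
    overdue = []
    for article_id, text in articles.items():
        for repo, accessions in find_accessions(text).items():
            for accession in sorted(accessions):
                if is_private(repo, accession):
                    overdue.append((article_id, repo, accession))
    return overdue
```

Passing the privacy check in as a function keeps the text-mining step testable without network access; a stub such as `lambda repo, acc: acc == "GSE12345"` is enough to exercise the logic.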
Direct to Full Text Article
Text-Mining Tool Seeks Out ‘Hidden Data’ (via Nature)
Two popular repositories that offer researchers the option to keep genetics data hidden, for example, are the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), both run by the US National Center for Biotechnology Information. Both sites require data sets to be made open when papers are published. But in practice, scientists often forget to do this, says Maxim Grechkin, a computer scientist at the University of Washington in Seattle.
Read the Complete Article
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.