June 8, 2017 by Gary Price

New Journal Article: “Wide-Open: Accelerating Public Data Release by Automating Detection of Overdue Datasets”

The following article was published today by PLOS Biology.
Title
Wide-Open: Accelerating Public Data Release By Automating Detection Of Overdue Datasets
Authors

Maxim Grechkin
University of Washington

Hoifung Poon
Microsoft Research

Bill Howe
University of Washington
Source
PLOS Biology
PLoS Biol 15(6): e2002477
https://doi.org/10.1371/journal.pbio.2002477
Abstract

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.

Direct to Full Text Article

Coverage

Text-Mining Tool Seeks Out ‘Hidden’ Data (via Nature)

Two popular repositories that offer researchers the option to keep genetics data hidden, for example, are the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), both run by the US National Center for Biotechnology Information. Both sites require data sets to be made open when papers are published. But in practice, scientists often forget to do this, says Maxim Grechkin, a computer scientist at the University of Washington in Seattle.

So Grechkin and his collaborators developed Wide-Open to find non-open data, focusing on GEO and SRA. The tool scans papers for mentions of unique data-set identifier codes (called accession codes) that use the GEO’s or SRA’s code format. The tool could be tweaked to query other repositories as well, notes Grechkin.
Once it identifies a valid code, Wide-Open trawls the relevant repository to find out whether the data set is public. It notes as “overdue” any data set that isn’t available, but should be.
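
The scan-then-check workflow described above can be illustrated with a minimal sketch. This is not the Wide-Open implementation: the regular expression is an assumed approximation of GEO series and SRA accession formats, and `find_accessions`, `find_overdue`, and the `is_public` callback are hypothetical names introduced here. The repository lookup is passed in as a function so the example runs without network access.

```python
import re
from typing import Callable, List

# Assumed accession formats (illustrative only, not taken from the paper):
# GEO series such as "GSE12345"; SRA studies, experiments, runs, and samples
# such as "SRP012345", "SRX012345", "SRR012345", "SRS012345".
ACCESSION_RE = re.compile(r"\b(GSE\d{3,}|SR[PXRS]\d{6,})\b")


def find_accessions(text: str) -> List[str]:
    """Return unique accession-like codes in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(m.group(1) for m in ACCESSION_RE.finditer(text)))


def find_overdue(text: str, is_public: Callable[[str], bool]) -> List[str]:
    """Return accessions cited in a published text that are not yet public.

    `is_public` is a hypothetical stand-in for a repository query.
    """
    return [acc for acc in find_accessions(text) if not is_public(acc)]


# Usage with a hard-coded set standing in for the repository.
paper = "Data are available under GSE12345 and SRP012345; see also GSE12345."
public = {"GSE12345"}
print(find_overdue(paper, lambda acc: acc in public))
```

Injecting the lookup keeps the text-mining step separate from the repository query, which is also the point at which a tool like this could be adapted to other repositories.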

Read the Complete Article

 
 

Filed under: Data Files, Journal Articles, News, Open Access, PLOS

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.

© 2023 Library Journal. All rights reserved.

