August 15, 2020

A Group of Research Organizations, Including NLM, Launches COVID-19 Dataset (CORD-19)

From the Office of Science and Technology Policy:

Today, researchers and leaders from the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) at the National Institutes of Health released the COVID-19 Open Research Dataset (CORD-19) of scholarly literature about COVID-19, SARS-CoV-2, and the Coronavirus group.

Requested by The White House Office of Science and Technology Policy, the dataset represents the most extensive machine-readable Coronavirus literature collection available for data and text mining to date, with over 29,000 articles, more than 13,000 of which have full text.

Now, The White House joins these institutions in issuing a call to action to the Nation’s artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19.

The collection was constructed via a unique collaboration between Microsoft, NLM, CZI, and the Allen Institute for AI, coordinated by Georgetown University. Microsoft’s web-scale literature curation tools were used to identify and bring together worldwide scientific efforts and results, CZI provided access to pre-publication content, NLM provided access to literature content, and the Allen AI team transformed the content into machine-readable form, making the corpus ready for analysis and study.

The CORD-19 resource is available on the Allen Institute’s SemanticScholar.org website and will continue to be updated as new research is published in archival services and peer-reviewed publications. Researchers should submit the text and data mining tools and insights they develop in response to this call to action via the Kaggle platform. Through Kaggle, a machine learning and data science community owned by Google Cloud, these tools will be openly available for researchers around the world.

Read the Complete Call to Action Document

See Also: Meta.org

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009, he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.
