C3.ai Announces Public Availability of COVID-19 Data Lake
From a C3.ai News Release:
C3.ai, a leading enterprise AI software provider for accelerating digital transformation, today announced the general public release of the C3.ai COVID-19 Data Lake™.
[Clip]
The C3.ai COVID-19 Data Lake uniquely interconnects the elements of all the data sources into a single, unified federated data model that is immediately available for researchers to access through any utility that offers RESTful data access. Most importantly, the data lake pre-establishes essential links in those complex data sets so that researchers can easily navigate and explore all of the associations within and across the data sets through a knowledge graph and then apply advanced data science methods. By unifying the data sets, the C3.ai COVID-19 Data Lake helps researchers and developers generate insights faster and more easily than is possible with other data collections.
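As a rough illustration of what "RESTful data access" could look like from a researcher's side, the sketch below builds a JSON POST request for one record type. It is not taken from the announcement: the base URL, the `outbreaklocation` type name, the `fetch` action, and the payload shape are all hypothetical placeholders, so consult the C3.ai documentation for the real interface.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical base URL; the announcement does not give the API address.
BASE_URL = "https://example.org/api/1"


def build_fetch_request(type_name, filter_expr, limit=10):
    """Build (but do not send) a JSON POST request for one record type."""
    # The shape of this payload is an assumption made for illustration.
    payload = {"spec": {"filter": filter_expr, "limit": limit}}
    return Request(
        url=f"{BASE_URL}/{type_name}/fetch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "outbreaklocation" is a made-up type name used only for this sketch.
    req = build_fetch_request("outbreaklocation", "id == 'Italy'", limit=1)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the request is plain HTTP with a JSON body, the same call could be made from curl, R, or any other tool that speaks REST, which is the point the release makes about "any utility."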
Other COVID-19 data collections are limited in that they only provide lists of URLs that link to individual data sets in different locations and in different formats, requiring extensive data wrangling and integration efforts to be useful. In addition, a few providers offer digital libraries, collections of data sources that are stored in one place, but the data are neither pre-integrated nor federated.
[Clip]
“Having access to an integrated set of diverse COVID-19 data sources with a common data model can help accelerate analysis of critical supply chain issues in our work with FEMA and other agencies,” said Tim Russell, Research Engineer at the MIT Humanitarian Supply Chain Lab, MIT Center for Transportation & Logistics. “For example, as we look to understand the distribution and availability of COVID-19 testing equipment and materials – or the pandemic’s impact on freight flows throughout the country – the C3.ai COVID-19 Data Lake provides a valuable resource in unifying and simplifying access to the necessary data without having to waste time on finding, cleaning, and preparing the data for analysis.”
The C3.ai COVID-19 Data Lake, which includes data from a number of critical COVID data sources, is now publicly available at no cost to the global research community and is accessible at: https://c3.ai/covid.
Amazon Web Services (AWS) is a co-sponsor of the open data initiative and is providing cloud infrastructure services in support of it.
C3.ai COVID-19 Data Lake data sets include:
- Johns Hopkins University: COVID-19 Data Repository
- The COVID Tracking Project
- MOBS Lab: COVID-19 Situation Report
- nCoV-2019 Data Working Group: Epidemiology Data
- European Centre for Disease Prevention and Control: Worldwide Situation Updates
- COVID-19 Open Research Dataset (CORD-19)
- National Center for Biotechnology Information Virus Database
- World Health Organization: Daily Situation Reports
- Milken Institute COVID-19 Treatment and Vaccine Tracker
- World Health Organization COVID-19 R&D
- The New York Times: COVID-19 Data in the United States
Additional datasets, to be published May 15, 2020, will include:
- University of Montreal: COVID-19 Image Data Collection
- Carbon Health & Braid Health: COVID-19 Clinical Data Repository
- Kaiser Health News: US Hospital ICU Beds
- US Census Bureau: Population Data
- Apple: COVID-19 Mobility Trends
- Kaiser Family Foundation: Social Distancing Policies
- University of Washington – Institute for Health Metrics and Evaluation: COVID-19 Projections
- Data Science for COVID-19: South Korea Dataset
- Indian Ministry of Health & Family Welfare: COVID-19 India
- Sito del Dipartimento della Protezione Civile – Emergenza Coronavirus
- Environmental Protection Agency: US Air Quality
- The World Bank – Global Health Statistics
Read the Complete Announcement

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.