May 26, 2022 Announces Public Availability of COVID-19 Data Lake

From a News Release:, a leading enterprise AI software provider for accelerating digital transformation, today announced the general public release of the COVID-19 Data Lake™.


The COVID-19 Data Lake uniquely interconnects the elements of all the data sources into a single, unified federated data model that is immediately available for researchers to access through any utility that offers RESTful data access. Most importantly, the data lake pre-establishes essential links in those complex data sets so that researchers can easily navigate and explore all of the associations within and across the data sets through a knowledge graph and then apply advanced data science methods. By unifying the data sets, the COVID-19 Data Lake helps researchers and developers generate insights faster and more easily than is possible with other data collections.

Other COVID-19 data collections are limited in that they only provide lists of URLs that link to individual data sets in different locations and in different formats, requiring extensive data wrangling and integration efforts to be useful. In addition, a few providers offer digital libraries, collections of data sources that are stored in one place, but the data are neither pre-integrated nor federated.


“Having access to an integrated set of diverse COVID-19 data sources with a common data model can help accelerate analysis of critical supply chain issues in our work with FEMA and other agencies,” said Tim Russell, Research Engineer at the MIT Humanitarian Supply Chain Lab, MIT Center for Transportation & Logistics. “For example, as we look to understand the distribution and availability of COVID-19 testing equipment and materials – or the pandemic’s impact on freight flows throughout the country – the COVID-19 Data Lake provides a valuable resource in unifying and simplifying access to the necessary data without having to waste time on finding, cleaning, and preparing the data for analysis.”

The COVID-19 Data Lake, which includes data from a number of critical COVID data sources, is now publicly available at no cost to the global research community and is accessible at:

Amazon Web Services (AWS) is co-sponsor of the open data initiative and is providing cloud infrastructure services in support of this initiative. COVID-19 Data Lake data sets include:

Additional datasets, to be published May 15, 2020, will include:


About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.