January 20, 2022

Journal Article: “The Data Set Knowledge Graph: Creating a Linked Open Data Source for Data Sets”

The article linked below was recently published by Quantitative Science Studies.


The Data Set Knowledge Graph: Creating a Linked Open Data Source for Data Sets


Michael Färber
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

David Lamprecht
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany


Quantitative Science Studies 1–30.
DOI: 10.1162/qss_a_00161


Several scholarly knowledge graphs have been proposed to model and analyze the academic landscape. However, although the number of data sets has increased remarkably in recent years, these knowledge graphs do not primarily focus on data sets but rather associated entities such as publications. Moreover, publicly available data set knowledge graphs do not systematically contain links to the publications in which the data sets are mentioned. In this paper, we present an approach for constructing an RDF knowledge graph that fulfills these mentioned criteria. Our data set knowledge graph, DSKG, is publicly available at http://dskg.org and contains metadata of data sets for all scientific disciplines. To ensure high data quality of the DSKG, we first identify suitable raw data set collections for creating the DSKG. We then establish links between the data sets and publications modeled in the Microsoft Academic Knowledge Graph that mention these data sets. As the author names of data sets can be ambiguous, we develop and evaluate a method for author name disambiguation and enrich the knowledge graph with links to ORCID. Overall, our knowledge graph contains more than 2,000 data sets with associated properties, as well as 814,000 links to 635,000 scientific publications. It can be used for a variety of scenarios, facilitating advanced data set search systems and new ways of measuring and awarding the provisioning of data sets.

Source: 10.1162/qss_a_00161

Direct to Full Text Article

Direct to Data Set Knowledge Graph (DSKG)

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009, he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.