May 28, 2022

New Journal Article: “Discovery and Reuse of Open Datasets: An Exploratory Study”

The following article appears in the Journal of eScience Librarianship, published at the University of Massachusetts Medical School.


Discovery and Reuse of Open Datasets: An Exploratory Study


Sara Mannheimer
Montana State University-Bozeman

Leila Belle Sterman
Montana State University-Bozeman

Susan Borda
Montana State University-Bozeman


Journal of eScience Librarianship.
Vol. 5 (2016), Iss. 1


Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.

Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.

Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.

Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

Direct to Full Text Article (15 pages; PDF)

Direct to Additional Files

Figure 1: Median Citations by Repository

Table 1: Characteristics of Cited/Downloaded Datasets

Figure 2: Academic Disciplines

Figure 3: Data Repository Preservation Policies

Appendix A: Data Repository and Dataset Analysis Rubric

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was a Director of Online Information Services, and he is currently a contributing editor at Search Engine Land.