May 20, 2022

Research Article: “The NIH Open Citation Collection: A Public Access, Broad Coverage Resource”

The following article was recently published by PLOS Biology.


The NIH Open Citation Collection: A Public Access, Broad Coverage Resource


B. Ian Hutchins
National Institutes of Health

Kirk L. Baker
National Institutes of Health

Matthew T. Davis
National Institutes of Health

Mario A. Diwersy

Ehsanul Haque
National Institutes of Health

Robert M. Harriman
National Institutes of Health

Travis A. Hoppe
National Institutes of Health

Stephen A. Leicht

Payam Meyer
National Institutes of Health

George M. Santangelo
National Institutes of Health


PLoS Biol 17(10): e3000385
DOI: 10.1371/journal.pbio.3000385


Citation data have remained hidden behind proprietary, restrictive licensing agreements, which raises barriers to entry for analysts wishing to use the data, increases the expense of performing large-scale analyses, and reduces the robustness and reproducibility of the conclusions. For the past several years, the National Institutes of Health (NIH) Office of Portfolio Analysis (OPA) has been aggregating and enhancing citation data that can be shared publicly. Here, we describe the NIH Open Citation Collection (NIH-OCC), a public access database for biomedical research that is made freely available to the community. This dataset, which has been carefully generated from unrestricted data sources such as MedLine, PubMed Central (PMC), and CrossRef, now underlies the citation statistics delivered in the NIH iCite analytic platform. We have also included data from a machine learning pipeline that identifies, extracts, resolves, and disambiguates references from full-text articles available on the internet. Open citation links are available to the public in a major update of iCite.

Direct to Full Text Article

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services, and he is currently a contributing editor at Search Engine Land.