May 20, 2022

Digitization Projects: New Info About the National Library of Medicine PubMed Central Back Issue Scanning Project

Yesterday, the National Library of Medicine posted a “sources sought” request for companies that might be interested in digitizing journals as part of the National Library of Medicine PubMed Central Back Issue Scanning Project.

It’s likely the request is related to this news release about a memorandum of understanding signed between NLM and the Wellcome Trust in April 2014.

[The memorandum calls for the organizations] to work together to make thousands of complete back issues of historically significant biomedical journals freely available online. The terms of the MOU include a donation of £750,000 ($1.2 million) to the NLM that will support coordination of the three-year project to scan original materials from NLM’s collection at the article level, and Wellcome’s work to secure copyright clearances and permissions for electronic deposit from publishers. NLM will undertake conservation of the original material to ensure its preservation for future generations.

The Latest

The Sources Sought filing includes a number of documents (PDF) that may interest both readers following this specific project and others involved in planning large-scale digitization projects.


DRAFT Statement of Work (8 pages; PDF)

From it we learn:

The purpose of the project is to perform scanning of an identified set of biomedical journals from their earliest date of publication forward to a specified end date. These scanned journal images will be added to NLM’s PubMed Central Database and will initially comprise approximately 770,000 pages but may extend in phases to 1.5 million pages. The scope of the project is to scan journals from NLM’s own collections to the article level, capture bibliographic citation data (XML) for each article, produce full article text (OCR), create article level PDF files, and produce full page images of each page (TIFF).

Four additional documents provide a great deal of detail about the project:

A. Image Specifications and Functional Requirements for Citation Capture (57 pages; PDF)

B. List of Titles to Be Scanned and Guide to Reading the Title List

C. Style Guide For One Journal

Look for Updates to the Draft and Additional Materials Here

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.