January 22, 2022

The Library of Congress Posts Solicitation For a Machine Learning/Deep Learning Pilot Program to “Maximize the Use of its Digital Collection”

Note: The solicitation linked below was first released on May 24, 2019, and updated yesterday.

Direct to Complete Solicitation: 030ADV19Q0274

Pilot Summary

The Library has a need for expert services to perform research and development (R&D) for a machine learning/deep learning pilot program that will help maximize the use of its digital collections while supporting emerging styles of research and use.

Pilot Project Background

From the Statement of Work Document:

The Digital Innovation Labs Section (Labs) of the Digital Strategy Division of the Library of Congress (Library) in October 2018 published a five-year Digital Strategy for the agency. As outlined in that strategy, the Library seeks to maximize the value of its collections for research and to understand the technical capabilities and tools that are required to support the discovery and use of digital collections material.

The Library has hundreds of heterogeneous digital collections containing hundreds of millions of items. Image processing and machine learning provide promise in adding item-level information to digital collections material, which would aid in delivering relevant collections materials to users. The Library must conform to the highest ethical, transparency and accuracy standards when considering the use of automated methods to describe collections.

To adhere to these standards, the Library must test machine learning approaches across different materials, share results with community, and be able to “look inside the black box” of machine learning technologies in order to apply them appropriately in a Library setting. The Library wishes to engage in a test of machine learning technologies to preprocess text material in a way that would make that content more discoverable. The learning from this test would help determine how the Library can utilize this technology for a much larger scope of the collection.

By contributing to the Library’s capacity to facilitate research and access services for diverse users, this project will contribute to the Library’s strategic direction to be user-centered, digitally enabled, and data driven in addition to providing inputs to how the Library can maximize the use of its digital collections while supporting emerging styles of research and use. The project is designed to be transparent, to advance practice in the digital Library field and provide valuable information on options for processing digital materials at scale.

2. Scope

2.1. The Contractor shall provide all expert services, labor, management, travel, and equipment to perform research and development (R&D) for a machine learning/deep learning pilot program to explore emerging methods in applying digital materials using machine learning, including deep learning techniques, in a Library context that conforms to ethical and professional standards.

3. Requirements
The Contractor shall:

3.1.  Explore emergent approaches to digital collections processing research,

3.2.  Advance understanding in several key areas of the Library’s Digital Strategy (available at https://www.loc.gov/digital-strategy) including:

3.3.  The Contractor, along with the Labs and other Library staff, will explore a series of demands and options around digital collection pre-processing requirements at the Library and based on researcher needs. These are questions of both theoretical and practical importance to the Library and to the research and development of large-scale digital libraries of cultural heritage materials. The project team will apply their technical expertise in analyzing historic digital collections to the practical realities of managing and serving a very large heterogeneous digital collection for general and specialized users.

3.4.  The Contractor will have access to public domain Library of Congress textual collections and services for analysis, including the Chronicling America digitized newspaper data, and other public domain Library printed textual material (not handwritten): books, papers, periodicals, journals, etc. available at loc.gov/collections. The project team will also receive reference, logistic, and technical support and advice from Labs and other Library staff. All data created from analysis will be utilized at the Library’s discretion and after review.

4. Technical Requirements

4.1. Databases

4.1.1. The newspapers and technical details for the files being provided are available at:

https://chroniclingamerica.loc.gov/newspapers/
https://chroniclingamerica.loc.gov/about/api/
https://www.loc.gov/ndnp/guidelines/archive/guidelines1819.html

Direct to the Complete Statement of Work Document
6 pages; PDF. 

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.