May 16, 2022

Library of Congress Innovator in Residence Ben Lee Discusses His Newspaper Navigator Project That Uses Machine Learning to Extract Visual Content From Chronicling America & Announces Upcoming “Data Jam” to Preview Dataset

From an LC Blog Post by Innovator in Residence Ben Lee:

Looking through the historic newspapers in Chronicling America, I’m always struck by how visually rich the pages are: beautiful illustrations, fascinating political cartoons, and prominent headlines abound. During my time as an Innovator in Residence at the Library of Congress, I’m developing a project called Newspaper Navigator, the goal of which is to re-imagine how we can explore wonderfully rich visual content in Chronicling America.


The 16+ million historical newspapers within Chronicling America are fascinating to me on so many levels. They are a portal back in time and reveal the rich history of the United States in a way that is unique to historic newspapers, from local histories to fun advertisements. But what excites me most about Chronicling America is how it reaches such a wide range of the American public, including school groups, genealogists, journalists, local historians, researchers, and even people looking to recreate old cooking recipes!

With the Newspaper Navigator dataset, I hope to enable Chronicling America users to search the collection in entirely new ways. What are some of these ways?

That’s where the Newspaper Navigator “data jam” comes in! Before the public website for interactive browsing goes live, I am holding an event called a “data jam” to provide a sneak peek of the Newspaper Navigator dataset. The goal of the data jam is to release specific subsets of these images grouped by category, topic, and publication date – such as all of the Civil War maps shown above – in order to see what people can do with them. That goes for any skill or interest level – no coding necessary! I’m imagining some people may just want to browse, for example, all of the advertisements in New York newspapers from 1920 to 1930. Or there may be people with even more specific research questions looking for maps of the shifting front lines during the Civil War. At the other end of the spectrum, there may be programmers looking to study the headlines using emerging techniques from natural language processing or create visualizations of the extracted photographs. There will be something for everyone–the only requirement to join is an interest in the dynamic images contained within this treasure trove of newspapers past and present.

Learn More, Read the Complete Blog Post, View Images

Upcoming Data Jam

At 2 pm on May 7, Innovator in Residence Ben Lee is hosting a remote data jam at the Library of Congress to dig into the visual content contained in historical newspapers. Join us to explore this information and formulate research questions of your own!

Direct to Newspaper Navigator Website

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.