Library of Congress Labs Launches New Tool to Search Visual Content in Historical Newspapers
The public can now explore more than 1.5 million historical newspaper images online and free of charge. The latest machine learning experience from LC Labs, Newspaper Navigator allows users to search visual content in American newspapers dating from 1789 to 1963.
The user begins by entering a keyword that returns a selection of photos. Then the user can choose photos to search against, allowing the discovery of related images that were previously undetectable by search engines.
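The two-step search described above can be illustrated with a small sketch. This is not Newspaper Navigator's actual code: it assumes each image carries a caption and a numeric feature vector, and it ranks images by cosine similarity to the average of the vectors the user selected. The sample data and the function names (keyword_search, similar_images) are hypothetical.

```python
import math

# Hypothetical sample data: each image has a caption and a made-up feature vector.
IMAGES = {
    "a": {"caption": "Baseball team photo", "vector": [1.0, 0.0, 0.1]},
    "b": {"caption": "Pitcher on the mound", "vector": [0.9, 0.1, 0.0]},
    "c": {"caption": "Map of the harbor", "vector": [0.0, 1.0, 0.0]},
    "d": {"caption": "Crowd at the ballpark", "vector": [0.8, 0.0, 0.3]},
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_search(images, keyword):
    """Step 1: return ids of images whose caption contains the keyword."""
    kw = keyword.lower()
    return [img_id for img_id, img in images.items() if kw in img["caption"].lower()]

def similar_images(images, selected_ids, top_k=3):
    """Step 2: rank the remaining images by similarity to the selected ones."""
    vectors = [images[i]["vector"] for i in selected_ids]
    # Average the selected vectors into a single query vector.
    query = [sum(col) / len(vectors) for col in zip(*vectors)]
    scored = [
        (cosine(query, img["vector"]), img_id)
        for img_id, img in images.items()
        if img_id not in selected_ids
    ]
    scored.sort(reverse=True)
    return [img_id for _, img_id in scored[:top_k]]

seeds = keyword_search(IMAGES, "baseball")
related = similar_images(IMAGES, seeds, top_k=2)
print(seeds, related)
```

In this toy data, the keyword "baseball" matches only one caption, but the similarity step surfaces two other images whose captions never mention the word, which is the kind of discovery the paragraph above describes.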
For decades, partners across the United States have collaborated to digitize newspapers through the Library’s Chronicling America website, a database of historical U.S. newspapers. The text of the newspapers is made searchable by character recognition technology, but until now users looking for specific images had to page through individual issues.
Through the creative ingenuity of Innovator in Residence Benjamin Lee and advances in machine learning, Newspaper Navigator now makes images in the newspapers searchable by enabling users to search by visual similarity. To create Newspaper Navigator, Lee trained computer algorithms to sort through 16 million Chronicling America newspaper pages in search of photographs, illustrations, maps, cartoons, comics, headlines and advertisements.
The idea for Lee’s groundbreaking project began with a Library crowdsourcing experiment by 2017 Innovator in Residence Tong Wang called Beyond Words, which invited members of the public to help identify cartoons, illustrations, photographs and advertisements in World War I-era newspapers. Users could draw boxes around visual content on a page, transcribe captions or review other users’ transcriptions.
While image search techniques from tech companies are not new, Newspaper Navigator marries cultural heritage with computer science. Users encounter a real-time demonstration of how algorithms are trained to scan millions of pieces of data in seconds. All code used in the project is open source and placed in the public domain for unrestricted re-use. The dataset code can be accessed at github.com/LibraryOfCongress/newspaper-navigator.
Direct to Newspaper Navigator
Library of Congress Innovator in Residence Ben Lee Discusses His Newspaper Navigator Project That Uses Machine Learning to Extract Visual Content From Chronicling America & Announces Upcoming “Data Jam” to Preview Dataset (April 2020)
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.