The public can now explore more than 1.5 million historical newspaper images online and free of charge. The latest machine learning experience from LC Labs, Newspaper Navigator allows users to search visual content in American newspapers dating from 1789 to 1963.
How it Works
The user begins by entering a keyword that returns a selection of photos. Then the user can choose photos to search against, allowing the discovery of related images that were previously undetectable by search engines.
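The "search against selected photos" step can be pictured with a small sketch. This is an illustration only, not the project's actual code: it assumes each image has already been reduced to a numeric feature vector (an "embedding"), and the image IDs, vectors, and function names below are all hypothetical. The user's chosen photos are averaged into one query vector, and the remaining images are ranked by cosine similarity to it.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(selected_ids, embeddings, top_k=3):
    """Rank unselected images by similarity to the average of the selected ones."""
    if not selected_ids:
        raise ValueError("select at least one image to search against")
    dims = len(next(iter(embeddings.values())))
    # Average the selected images' vectors into a single query vector.
    query = [
        sum(embeddings[image_id][d] for image_id in selected_ids) / len(selected_ids)
        for d in range(dims)
    ]
    # Score every image the user has not already selected.
    scored = [
        (image_id, cosine_similarity(query, vector))
        for image_id, vector in embeddings.items()
        if image_id not in selected_ids
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [image_id for image_id, _ in scored[:top_k]]


# Hypothetical toy data: three-dimensional vectors standing in for real embeddings.
EMBEDDINGS = {
    "car_ad_1": [0.9, 0.1, 0.0],
    "car_photo_2": [0.8, 0.2, 0.1],
    "map_3": [0.0, 0.9, 0.3],
    "cartoon_4": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    print(rank_by_similarity(["car_ad_1"], EMBEDDINGS, top_k=2))
```

Selecting the first automobile advertisement in this toy data ranks the other automobile image first, which is the behavior the keyword-then-refine workflow relies on. Real embeddings would have hundreds or thousands of dimensions and would come from a trained image model.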
For decades, partners across the United States have collaborated to digitize newspapers through the Library’s Chronicling America website, a database of historical U.S. newspapers. Optical character recognition makes the text of the newspapers searchable, but until now users looking for specific images had to page through individual issues.
A Search Result For “Automobile”
The Developer
Through the creative ingenuity of Innovator in Residence Benjamin Lee and advances in machine learning, Newspaper Navigator now makes images in the newspapers searchable by enabling users to search by visual similarity. To create Newspaper Navigator, Lee trained computer algorithms to sort through 16 million Chronicling America newspaper pages in search of photographs, illustrations, maps, cartoons, comics, headlines and advertisements.
The idea for Lee’s groundbreaking project began with a Library crowdsourcing experiment by 2017 Innovator in Residence Tong Wang called Beyond Words, which invited members of the public to help identify cartoons, illustrations, photographs and advertisements in World War I-era newspapers. Users could draw boxes around visual content on a page, transcribe captions or review other users’ transcriptions.
Dataset Code
While image search techniques from tech companies are not new, Newspaper Navigator marries cultural heritage with computer science. Users encounter a real-time demonstration of how algorithms are trained to scan millions of pieces of data in seconds. All code used in the project is open source and placed in the public domain for unrestricted re-use. The dataset code can be accessed at github.com/LibraryOfCongress/newspaper-navigator.
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.