Library of Congress: LC Labs Releases Report on Humans-in-the-Loop Machine Learning Research Framework
From the Library of Congress:
Library of Congress innovation specialists examining the role of human expertise and experience in developing machine-powered research tools today released a report detailing their findings. The “Humans in the Loop” recommendation report from LC Labs details the potential and responsibility of the Library of Congress in its ongoing work to deepen access to its vast collections and share knowledge with other institutions.
The Library’s digital experiments have resulted in popular public initiatives such as By the People, the crowdsourcing platform powered by volunteer transcription; Citizen DJ, a music discovery and mixing app; and Newspaper Navigator, a machine learning algorithm that uncovered more than a million images in the Chronicling America newspaper collection. To discover the combined power of machine learning and crowdsourcing, the “Humans in the Loop” experiment investigated each step of creating a machine learning algorithm, building an engaging crowdsourcing program, and launching a prototype web experience for potential users. Together these approaches could transform access to and discovery of the Library’s vast resources by combining human expertise with machine learning outputs.
“As the cultural heritage community has used more digital approaches to help our users access and discover large collections, people have wondered about the role of real humans in the future study of humanities,” said Kate Zwaard, director of Digital Strategy at the Library of Congress. “We wanted to answer that question in a way that promises to engage people, remain mindful of ethical and privacy impacts, and make our collections useful. We want to offer this report as a resource for other scholars and institutions who share these goals.”
The Library’s popular U.S. Telephone Directory collection, with its consistent layouts and fonts and unique snapshots of American communities over time, provided the ideal test sample for “Humans in the Loop.” LC Labs staff, Library subject matter experts, and partners from AVP, a data solutions provider, designed an experiment based on machine learning and crowdsourcing processes that could be created with the telephone directory’s contents. Using bounding boxes drawn around business listings and addresses in the phone books, along with transcriptions of these segments, the experiment team created training data to teach an algorithm to continue drawing such boxes on its own. Wireframe mockups of sample web presentations were created for testing with potential users and for showcasing how volunteers might engage with and learn more about the collection.
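As a rough illustration of the kind of training data described above, the sketch below pairs a bounding box with its transcription and filters out unusable examples. It is only a minimal sketch under stated assumptions: the class names, field names, pixel coordinates, and sample listings are all hypothetical and are not taken from the report or from the LC Labs experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Pixel rectangle on a scanned page (hypothetical layout)."""
    left: int
    top: int
    width: int
    height: int

    def area(self) -> int:
        return self.width * self.height


@dataclass(frozen=True)
class ListingExample:
    """One human-labeled example: a box plus the text inside it."""
    page_id: str        # hypothetical page identifier
    box: BoundingBox
    transcription: str


def usable_examples(examples):
    """Keep only examples with a non-empty box and non-blank text."""
    return [
        ex for ex in examples
        if ex.box.area() > 0 and ex.transcription.strip()
    ]


# Made-up sample data, for illustration only.
examples = [
    ListingExample("page-001", BoundingBox(40, 120, 300, 18), "Acme Bakery, 12 Main St"),
    ListingExample("page-001", BoundingBox(40, 140, 0, 18), "unreadable"),  # zero-width box
    ListingExample("page-002", BoundingBox(55, 90, 280, 20), "   "),        # blank transcription
]
training_set = usable_examples(examples)
print(len(training_set))
```

In a structure like this, human review (checking that each box and transcription is valid) happens before anything reaches the model, which is one simple way to picture the "humans in the loop" idea.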
Though the telephone directories are organized alphabetically with businesses categorized by industry, the research team quickly found that machine learning catalyzed a workflow for identifying the specific business name and address data that can enable flexible searching, incorporating geographic data and other Library collections. The experiment revealed the value of human expertise from volunteers and staff alike at every step. Validating contributions and gathering feedback on workflow design set the stage not only to improve the discovery of related information and context, but also to return outsized dividends. People following manual workflows processed 119 directory listings; this careful work seeded the machine learning workflow, which generated 15,000 listings in just four days.
While the complete findings of the “Humans in the Loop” report are available in the final recommendations document, two major themes emerged: flexible, informed approaches to design and major investment in staffing and resources will enable sustained success. No two collections are exactly the same, so the processes outlined in “Humans in the Loop” are not a one-size-fits-all solution, and there is no substitute for human enthusiasm for problem solving.
Direct to Humans in the Loop Website/Resources
Direct to Final Recommendations Document (97 pages; PDF)
Learn More: Visit LC Labs
About Gary Price
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.