January 27, 2022

Transcribing Text: Library of Congress Launches Crowdsourcing Program to Aid in Discovery of Materials; Crowdsourcing Software Released as Open Source

From LC:

The Library of Congress today launched crowd.loc.gov, a crowdsourcing program that will connect the Library with virtual volunteers to transcribe text in digitized images from the Library’s historic collections.


Volunteers can work on selections from the papers of Abraham Lincoln and Mary Church Terrell, Clara Barton’s diaries, Branch Rickey’s baseball scouting reports or memoirs of Civil War veterans with disabilities from the William Oland Bourne Papers.

“Crowdsourcing demonstrates the passion of volunteers for history, learning and the power of technology to make those things more accessible,” said Librarian of Congress Carla Hayden. “The pages awaiting transcription at crowd.loc.gov represent some of the diversity of the Library’s treasure, and the metadata that will result from these transcriptions mean these digitized documents will have even greater use to classrooms, researchers or anyone who is curious about these historical figures.”


The transcripts developed and reviewed by volunteers will be made available on the Library’s website, loc.gov, making them keyword searchable for the first time. This will improve access to handwritten and typed documents from which computers cannot accurately extract text.

Users will also benefit from greater on-screen readability and compatibility with screen readers used by people with visual disabilities.

“Earlier this month, the Library released its first digital strategy, an ambitious vision to throw open our treasure chest, connect with users and invest in our future. Crowd.loc.gov is an enormous leap forward in connecting with all Americans by welcoming their contributions to their Library,” said Kate Zwaard, the Library’s director of digital strategy. “We are especially excited about this tool’s potential to serve the curious – people who want to learn something interesting and unexpected.”

The Library will continue to add new material, including documents from the Rosa Parks papers, the woman’s suffrage movement, Civil War veterans, American poets and the history of psychiatry.


Crowd.loc.gov launches with the Letters to Lincoln Challenge, which invites the public to transcribe 10,000 digital images from the Abraham Lincoln papers by the end of 2018.

Open Sourced

The software powering the crowdsourcing program has been released by the Library as open source for the benefit and use of other cultural heritage organizations considering similar efforts. View or contribute to the code repository at github.com/LibraryOfCongress/concordia.

Read the Complete Announcement

Direct to Crowd.Loc.gov

See Also: More Crowdsourcing: American Archive of Public Broadcasting (From LC and WGBH) Crowdsourcing Project
See Page 12 of PDF.

Other Libraries (A Few of Many Examples)

Learn About Crowdsourcing Projects at The British Library, Library and Archives Canada, and National Library of Australia


See Also: Transcription and Tagging: The Library of Congress Will Launch Crowdsourcing Program Next Week (October 17, 2018)

See Also: Library of Congress Labs Goes Live, a “Place to Encourage Innovation with LC Digital Collections” (September 17, 2018)

See Also: Journal Article: Crowdsourcing: How and Why Should Libraries Do It? (2010)



About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.