Professor Ted Underwood has received a $73,122 grant from the National Endowment for the Humanities to investigate the consequences of error in digital libraries. While digital libraries represent an immense storehouse of knowledge, the texts are full of errors because of the imperfect process by which they are transcribed optically.
“It isn’t unusual for five percent of the words in volumes to be mistranscribed, with the level of error much higher in some volumes,” said Underwood. “Simply measuring the fraction of mistranscribed words is easy. It’s harder to know how much difference those errors make for the methods and questions that actually interest researchers. Some forms of analysis are undisturbed by high levels of error; others may be quite sensitive, especially when errors are distributed unevenly across different historical periods and genres.”
Underwood will work with graduate students from the iSchool and English Department to construct parallel collections that pair each “clean” text with a realistically error-ridden version of the same book drawn from a digital library. The team will build collections of Chinese texts as well as English texts ranging from 1700 to the present, because different character sets and printing technologies produce different kinds of error. Then the team will apply a wide range of data-mining methods to both the clean and error-ridden collections and measure the distortion produced by transcription error and other common sources of noise. The project will provide tools that help other researchers estimate the level of uncertainty in their own conclusions.
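To make the paired-collection idea concrete, here is a minimal Python sketch of the two kinds of measurement described above: the fraction of mistranscribed words, and how far a simple downstream measure drifts between a clean text and its error-ridden counterpart. It is an illustration under stated assumptions, not the project's actual tooling. The function names, the whitespace tokenizer, and the choice of word-frequency cosine similarity as the "method" under test are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: lowercase whitespace tokens are good enough for a sketch.
    return text.lower().split()


def word_error_fraction(clean, noisy):
    """Fraction of words in the clean text with no aligned match in the noisy text."""
    a, b = tokenize(clean), tokenize(noisy)
    if not a:
        raise ValueError("clean text has no words")
    # autojunk=False keeps frequent words from being ignored on long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1 - matched / len(a)


def frequency_cosine(clean, noisy):
    """Cosine similarity of word-frequency vectors (1.0 means no distortion)."""
    ca, cb = Counter(tokenize(clean)), Counter(tokenize(noisy))
    dot = sum(count * cb[word] for word, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical pair: two words are mistranscribed in the noisy version.
    clean = "the quick brown fox jumps over the lazy dog"
    noisy = "the quick hrown fox jumps ovcr the lazy dog"
    print(f"word error fraction: {word_error_fraction(clean, noisy):.3f}")
    print(f"frequency cosine:    {frequency_cosine(clean, noisy):.3f}")
```

Comparing the two numbers across many paired volumes would show whether a given method is robust to error or sensitive to it, which is the distinction Underwood draws.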
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.