Quality of HathiTrust Digitization Discussed in New Blog Post
In a new HathiTrust blog post, Jeremy York (HathiTrust) and Kat Hagedorn (University of Michigan Library) take an in-depth look at the quality of scanned materials in HathiTrust. The post also includes an appendix of error examples.
York and Hagedorn write:
As reported in our monthly updates, we receive well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something we can do about it. This blog post is intended to shed some light on our thinking and practices about quality in HathiTrust. We hope it will also encourage you to report any problems you might find so that we might have the opportunity to fix it, and deliver the highest quality collections we can for educational and research needs.
We go to great lengths to ensure we have the highest possible quality volumes in HathiTrust. Our approach to quality at a broad level is outlined in our commitment to quality. On a day-to-day level, we strive to offer one of the best user support teams around, responding to reported issues and providing updates as we make progress on addressing them. Someone might reasonably wonder, however, why there are quality problems in HathiTrust at all? Shouldn’t libraries, or HathiTrust, have better quality control? Aren’t librarians primarily concerned about information quality?
About Gary Price
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.