May 20, 2022

Quality of HathiTrust Digitization Discussed in New Blog Post

In a new HathiTrust blog post, Jeremy York (HathiTrust) and Kat Hagedorn (University of Michigan Library) take an in-depth look at the quality of scanned materials in HathiTrust. The post includes an appendix of error examples.

York and Hagedorn write:

As reported in our monthly updates, we receive well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something we can do about it. This blog post is intended to shed some light on our thinking and practices about quality in HathiTrust. We hope it will also encourage you to report any problems you might find so that we might have the opportunity to fix it, and deliver the highest quality collections we can for educational and research needs.

We go to great lengths to ensure we have the highest possible quality volumes in HathiTrust. Our approach to quality at a broad level is outlined in our commitment to quality. On a day-to-day level, we strive to offer one of the best user support teams around, responding to reported issues and providing updates as we make progress on addressing them. Someone might reasonably wonder, however, why there are quality problems in HathiTrust at all? Shouldn’t libraries, or HathiTrust, have better quality control? Aren’t librarians primarily concerned about information quality?

Read the Complete Post (approx. 2000 words) and Appendix of Error Examples

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.