In a new HathiTrust blog post Jeremy York (HathiTrust) and Kat Hagedorn (University of Michigan Library) take an in-depth look at the quality of scanned materials in HathiTrust. An appendix of error examples is also included.
York and Hagedorn write:
As reported in our monthly updates, we receive well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something we can do about it. This blog post is intended to shed some light on our thinking and practices about quality in HathiTrust. We hope it will also encourage you to report any problems you might find so that we might have the opportunity to fix it, and deliver the highest quality collections we can for educational and research needs.
We go to great lengths to ensure we have the highest possible quality volumes in HathiTrust. Our approach to quality at a broad level is outlined in our commitment to quality. On a day-to-day level, we strive to offer one of the best user support teams around, responding to reported issues and providing updates as we make progress on addressing them. Someone might reasonably wonder, however, why there are quality problems in HathiTrust at all? Shouldn’t libraries, or HathiTrust, have better quality control? Aren’t librarians primarily concerned about information quality?