Highlights From HathiTrust Activities Update (June 2012)
The latest HathiTrust Update (June 2012) is now online. Here are a few highlights. You can access the complete issue here.
- HathiTrust has updated its bibliographic metadata specifications and minimum bibliographic metadata requirements in preparation for moving to Zephir (under development by California Digital Library) as the bibliographic metadata management system for HathiTrust. The requirements are in effect immediately for institutions that have not previously deposited content in HathiTrust.
- The University of Michigan made the first iteration of tools available to aid institutions in transforming, validating, and packaging digital content for deposit in HathiTrust. The tools can be downloaded at http://www.hathitrust.org/ingest_tools.
- Michigan staff completed most of the development needed to support a new rights status in HathiTrust Web applications. The status will apply to works whose U.S. copyright was restored by the General Agreement on Tariffs and Trade (GATT) but that are now in the public domain in the rest of the world. An increasing number of these volumes are being identified as part of CRMS-World, the IMLS-funded continuation of the CRMS project.
- HathiTrust continued working with Boston College and began working with Penn State and the University of Illinois on ingest of volumes digitized by the Internet Archive.
- California Digital Library refined the algorithm used to score spelling suggestions, based on queries extracted from HathiTrust log files, and improved how suggestions are made when a query contains stop words and inappropriately combined words. The next step will be to experiment with making suggestions in different languages.
- Michigan removed a long-standing bottleneck in the full-text indexing process, effectively doubling throughput. Under ideal conditions, staff believe it should now be possible to index approximately 100,000 documents per hour.
Database Growth and Overall Size
- 50,193 volumes were added to the database during June
- Overall database size is now 10,408,905 items
- Public domain materials make up ~29% or 3,105,587 volumes
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.