May 19, 2022

Text and Data Mining News: “HathiTrust Research Center (HTRC) Extends Non-Consumptive Research Tools to Entire Corpus (Including Copyrighted Materials)”

Major and exciting news from the HathiTrust Research Center (HTRC) today.

From HathiTrust:

Since 2011, HTRC has been developing services and tools to allow researchers to employ text and data mining methodologies using the HathiTrust collection.

To date, this service has been available only on the portion of the collection that is out of copyright.

With the development of a landmark HathiTrust policy and an updated release of HTRC Analytics, HTRC now provides access to the text of the complete 16.7-million-item HathiTrust corpus for non-consumptive research, such as data mining and computational analysis, including items protected by copyright.

This extraordinary opportunity to use copyrighted materials for non-consumptive research purposes expands research access to the entire HathiTrust digital collection, which is sustained by HathiTrust’s 140+ member libraries.

Researchers may access HTRC’s easy-to-use computational tools ideal for beginners, as well as more complex tools to meet advanced data analysis needs.


This work has been several years in the making. A primary goal of HathiTrust is to enable the widest possible lawful research and educational uses of the HathiTrust collection. In recent years, US courts have recognized the solid legal basis for non-consumptive research on copyrighted materials. In 2016, HathiTrust established a working group to develop the Non-Consumptive Use Research Policy to ensure the responsible research use of copyrighted items.

The policy is now enacted in an updated release of HTRC Analytics, which allows researchers to conduct computational text analysis on copyrighted items as permitted under US copyright law. Non-consumptive research use DOES NOT change the legal status of items protected under copyright.

Read the Complete Blog Post, Learn About Specific HTRC Tools and Policies

See Also: HathiTrust Non-Consumptive Use Research Policy

See Also: Chart on HTRC Analytics Tool Access

See Also: Getting Started with HTRC Guide

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.