Along with the new Library Copyright Alliance amicus brief in AG v. HT, two law professors and a digital humanities expert have filed an amicus brief with the court.
The brief is signed by a total of 42 scholars from a number of disciplines.
Matthew L. Jockers
Department of English, University of Nebraska
Loyola University Chicago School of Law
University of California, Berkeley – School of Law
The 37-page filing (PDF) is available on SSRN under the title “How Copyright Law Could Make or Break the Future for Digital Humanities.”
Last September, The Authors Guild, Inc. filed claims for copyright infringement against the officers of the universities of Michigan, California, Wisconsin, Indiana and Cornell University for participating in the Google Book Project, an effort to mass digitize millions of books from university libraries, including those both in and out of copyright. The lawsuit is part of the Guild’s ongoing legal campaign against these efforts; it accuses the universities of participating in “systematic, concerted, widespread and unauthorized reproduction and distribution of millions of copyrighted books.” The complaint with respect to the universities is, first, that they allowed Google to digitize their library collections, and second, that the universities accepted corresponding digital files from Google and have consolidated those files into a shared digital repository known as the HathiTrust. The HathiTrust service enables a large collection of universities and research libraries to store, secure and search their digital collections using a shared infrastructure.
The case raises many legal, technical, and epistemological issues related to the future of higher education, research, and scholarship – especially those efforts that seek to take advantage of “big data” analytics and methodologies. Advances in computer technology and the availability of digital texts give humanities scholars a chance to do what biologists, physicists and economists have been doing for decades – analyze massive amounts of data. Large-scale quantitative projects like those being undertaken at the Stanford Literary Lab are unearthing previously unknowable information about individual works and entire genres of literature.
Researchers working in Information Retrieval frequently use text mining and computer-aided classification to identify and retrieve relevant documents. Using similar techniques, researchers in the Digital Humanities are able to identify and retrieve relevant texts, often from unlikely places. Humanities researchers can thereby expand their traditional study of a few canonical works to a study of any one of the several million books in the larger archive of literary history—an archive that has hitherto remained hidden because of the limitations of humans’ reading capacity.
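To make the retrieval idea above concrete, here is a minimal Python sketch of one common approach: weighting words by TF-IDF and ranking texts by cosine similarity to a query. It is an illustration only, not the method of any particular project mentioned here. The function names and the three-sentence toy corpus are hypothetical, invented for this example. Note that the output is a list of scores and document positions, not the text of the works themselves.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
TOY_DOCS = [
    "The whale surfaced near the ship and the sailors shouted.",
    "She walked through the garden and spoke of marriage and fortune.",
    "The captain chased the white whale across the sea.",
]

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, doc_index) pairs, most similar document first."""
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the corpus are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

if __name__ == "__main__":
    for score, index in rank("whale ship", TOY_DOCS):
        print(f"doc {index}: similarity {score:.3f}")
```

Run against a real archive, the same kind of ranking is what lets a researcher surface relevant texts among millions of volumes no person could read in full.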
In an amicus brief filed on Friday, July 6, 2012, we joined 39 other scholars from disciplines including law, computer science, linguistics, history and literature to urge the court to consider the impact on this vital area of research when ruling on the legality of mass digitization. Specifically, the brief addresses whether United States copyright law should stand as an obstacle to statistical and computational analysis of the millions of books owned by the nation’s great university libraries. The Authors Guild argues that every digital copy of a book, made for any reason whatsoever, is an infringing copy and thus triggers liability for statutory damages, enhanced damages for willful conduct, attorneys’ fees, injunctive relief and destruction of the copies. The Guild has specifically asked the court to impound these digitized works and pull the plug on the HathiTrust’s power supply.
Our brief argues that, just as copyright law has long recognized the distinction between protection for an author’s original expression (e.g., the narrative prose describing the plot) and the public’s right to access the facts and ideas contained within that expression (e.g., a list of characters or the places they visit), the law must also recognize the distinction between copying books for expressive purposes (e.g., reading) and nonexpressive purposes, such as extracting metadata and conducting macroanalyses. We amici urge the court to follow established precedent with respect to Internet search engines, software reverse engineering, and plagiarism detection software and to hold that the digitization of books for text-mining purposes is a form of incidental or intermediate copying to be regarded as fair use as long as the end product is also nonexpressive or otherwise non-infringing.