Reports: “Anthropic Wins On Fair Use For Training Its LLMs; Loses On Building a ‘Central Library’ of Pirated Books”
Multiple items/resources follow.
From an Authors Alliance Post by Dave Hansen:
Yesterday, Judge Alsup released his decision on Anthropic’s motion for summary judgment in the fast-moving lawsuit it is defending, brought by three book authors on behalf of a class of millions objecting to Anthropic’s use of books for training its LLMs. We’ve recently posted about other aspects of the case related to the class action aspects, which are still pending, and the potential for settlement in this suit.
The decision represents a major win for Anthropic in that the decision found that its training AI on lawfully acquired copyrighted works was a fair use. Anthropic lost, however, on the issue of downloading pirated books to create a “central library” and more is still to come on the issue of Anthropic using those works for AI training.
Read the Complete Post (about 2070 words)
Direct to Full Text of the Opinion (via CourtListener)
Media Reports
From 404 Media:
This case, in which authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic, maker of the Claude family of large language models, is one of dozens of high-profile lawsuits brought against AI giants. The authors sued Anthropic because the company scraped full copies of their books for the purposes of training their AI models from a now-notorious dataset called Books3, as well as from the piracy websites LibGen and Pirate Library Mirror (PiLiMi). The suit also claims that Anthropic bought used physical copies of books and scanned them for the purposes of training AI.
“From the start, Anthropic ‘had many places from which’ it could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog,’ as cofounder and chief executive officer Dario Amodei put it. So, in January or February 2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books — that is, pirated,” William Alsup, a federal judge for the Northern District of California, wrote in his decision Monday. “Anthropic’s next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated.”
Read the Complete Article (about 1090 words)
From The Verge:
Judge Alsup says the court will hold a separate trial on the pirated content used by Anthropic, which will determine the resulting damages.
“We are pleased that the Court recognized that using ‘works to train LLMs was transformative — spectacularly so,’” Anthropic spokesperson Jennifer Martinez said in an emailed statement to The Verge. “Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”
Read the Complete Article (about 430 words)
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.



