Preprint: “The Files are in the Computer: Copyright, Memorization, and Generative AI”
The essay (preprint) linked below was recently shared on arXiv.
Title
The Files are in the Computer: Copyright, Memorization, and Generative AI
Authors
A. Feder Cooper
Cornell University
James Grimmelmann
Cornell Tech and Cornell Law School
Source
via arXiv
DOI: 10.48550/arXiv.2404.12590
April 19, 2024
Abstract
A central issue in copyright lawsuits against generative-AI companies is the degree to which a generative-AI model does or does not “memorize” the data it was trained on. Unfortunately, the debate has been clouded by ambiguity over what “memorization” is, leading to legal debates in which participants often talk past one another. In this essay, we attempt to bring clarity to the conversation over memorization.
From the Essay
We take no position on what the most appropriate copyright regimes for generative-AI systems should be, and we express no opinion on how pending copyright lawsuits should be decided. Our goal is merely to describe how these systems work so that copyright scholars can develop their theories of generative AI on a firm technical foundation. We seek clarity, precision, and technical accuracy.
Access to Full Text Essay
39 pages; PDF.
UPDATE (April 26, 2024): Advocates Urge Law Journal to Disclose Microsoft, Google Ties (via Bloomberg)
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com.