Report: Are AI Bots Knocking Cultural Heritage Offline?
From the GLAM-E Lab / Engelberg Center on Innovation Law & Policy, NYU School of Law:
In late 2024, isolated accounts began to emerge from individual online cultural heritage collections. Those stories described servers and collections straining – and sometimes breaking – under the load of swarming bots. The bots were reportedly scraping all of the data from collections to build datasets to train AI models.
Did these reports reflect the experience of most online collections? Were they outliers? Or early warning signs?
The GLAM-E Lab surveyed dozens of GLAM (Gallery, Library, Archive, and Museum) institutions to begin to answer those questions. This report, published in June of 2025, documents how institutions are straining under swarms of scraping bots, and how things may get worse before they get better.
[Clip]
In brief, we found:
- Bots are widespread, although not universal. Of 43 respondents, 39 had experienced a recent increase in traffic. Twenty-seven of the 39 respondents experiencing an increase in traffic attributed it to AI training data bots, with an additional seven believing that bots could be contributing to the traffic.
- This increase in traffic has been hard to anticipate because few respondents were actively tracking bot traffic prior to the bots triggering a crisis in their collection. Many respondents did not realize they were experiencing a growth in bot traffic until the traffic reached the point where it overwhelmed the service and knocked online collections offline.
- Some respondents have been seeing an increase in bot traffic since 2021, while others did not experience their first spike until 2025.
- Some bots clearly identify themselves, while others take a range of measures to hide their source.
- When bots come, they tend to swarm for relatively brief periods of time. The frequency of these swarms may be increasing.
- Robots.txt is not currently an effective way to prevent bots from overwhelming collections.
- Respondents are deploying a range of home-grown and third-party firewall-based countermeasures to try to screen out bots based on IP address, geography, domain, and user agent string. Some of these efforts appear to be effective, although few are confident that they will be sustainable in the long term.
- Respondents are reluctant to take more aggressive steps to move collections behind things like login screens for a variety of reasons, including concerns about how effective those measures will be in the medium term, that implementing those changes can have negative impacts on welcome users, and whether login-based restrictions run counter to their larger goal of making the collections easily available online.
- Respondents worry that swarms of AI training data bots will create an environment of unsustainably escalating costs for providing online access to collections.
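To make the countermeasures in the findings above concrete, here is a minimal Python sketch of screening requests by IP address and user agent string, plus a per-agent tally of the kind that could help spot a swarm. It is an illustration only, not the method of any respondent or of the report. The blocklist entries, the function names `is_blocked` and `count_by_agent`, and the sample requests are all hypothetical. As the findings note, some bots hide their source, so a user agent check would only catch bots that identify themselves.

```python
import ipaddress
from collections import Counter

# Hypothetical blocklists for illustration only; not taken from the report.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot", "sample-crawler")
# 203.0.113.0/24 is a range reserved for documentation examples.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(ip: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked user agent or network."""
    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return True
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def count_by_agent(requests):
    """Tally requests per user agent string so a sudden spike is visible."""
    return Counter(agent for _ip, agent in requests)


# Made-up (ip, user_agent) pairs standing in for parsed access-log entries.
requests = [
    ("203.0.113.7", "Mozilla/5.0"),
    ("198.51.100.4", "ExampleBot/1.0"),
    ("198.51.100.9", "Mozilla/5.0"),
]
blocked = [is_blocked(ip, agent) for ip, agent in requests]
```

In this sample, the first request is screened out by its network, the second by its user agent, and the third passes. In practice such rules would more likely live in a firewall or web server configuration than in application code.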
Direct to Full Text Report (by Michael Weinberg; 37 pages; PDF)
Media Coverage
“I’m confident in saying that this problem is widespread, and there are a lot of people and institutions who are worried about it and trying to think about what it means for the sustainability of these resources,” the author of the report, Michael Weinberg, told me. “A lot of people have invested a lot of time not only in making these resources available online, but building the community around institutions that do it. And this is a moment where that community feels collectively under threat and isn’t sure what the process is for solving the problem.”
Direct to Complete Article
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.



