Business Insider: “Here’s the List of Websites Gig Workers Used to Fine-Tune Anthropic’s AI Models. Its Contractor Left It Wide Open”
From Business Insider:
An internal spreadsheet obtained by Business Insider shows which websites Surge AI gig workers were told to mine — and which to avoid — while fine-tuning Anthropic’s AI to make it sound more “helpful, honest, and harmless.”
The spreadsheet allows sources like Bloomberg, Harvard University, and the New England Journal of Medicine while blacklisting others like The New York Times and Reddit.
Anthropic says it wasn’t aware of the spreadsheet and that it was created by a third-party vendor, the data-labeling startup Surge AI, which declined to comment on this point.
[Clip]
Many of the whitelisted sources copyright or otherwise restrict their content. The Mayo Clinic, Cornell University, and Morningstar, whose main websites were all listed as “sites you can use,” told BI they don’t have any agreements with Anthropic to use this data for training AI models.
[Clip]
The list includes over 120 permitted websites from a wide range of fields, including academia, healthcare, law, and finance. It includes 10 US universities, including Harvard, Yale, Northwestern, and the University of Chicago.
[Clip]
Medical information sources, such as the New England Journal of Medicine, and government sources, such as a list of UN treaties and the US National Archives, are also in the whitelist. So are university publishers like Cambridge University Press.
Learn More, Read the Complete Article (about 1000 words)
Direct to Lists
- Teaching AI – Example Sites You Can Use (Insider via DocumentCloud)
- Teaching AI – Not Approved (Insider via DocumentCloud)
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.


