Report: “AI Hallucinations Can’t Be Stopped — but These Techniques Can Limit Their Damage”
From Nature:
It’s well known that all kinds of generative AI, including the large language models (LLMs) behind AI chatbots, make things up. This is both a strength and a weakness. It’s the reason for their celebrated inventive capacity, but it also means they sometimes blur truth and fiction, inserting incorrect details into apparently factual sentences. “They sound like politicians,” says Santosh Vempala, a theoretical computer scientist at Georgia Institute of Technology in Atlanta. They tend to “make up stuff and be totally confident no matter what”.
The particular problem of false scientific references is rife. In one 2024 study, various chatbots made mistakes between about 30% and 90% of the time on references, getting at least two of the paper’s title, first author or year of publication wrong. Chatbots come with warning labels telling users to double-check anything important. But if chatbot responses are taken at face value, their hallucinations can lead to serious problems, as in the 2023 case of a US lawyer, Steven Schwartz, who cited non-existent legal cases in a court filing after using ChatGPT.
[Clip]
Fundamentally, LLMs aren’t designed to pump out facts. Rather, they compose responses that are statistically likely, based on patterns in their training data and on subsequent fine-tuning by techniques such as feedback from human testers. Although the process of training an LLM to predict the likely next words in a phrase is well understood, its precise internal workings are still mysterious, experts admit. Likewise, it isn’t always clear how hallucinations happen.
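
To make “statistically likely” concrete, here is a minimal toy sketch of next-token sampling in Python. The vocabulary, scores and temperature are invented for illustration and do not come from any real model.

import math
import random

def sample_next_token(logits, temperature=0.8):
    """Turn raw scores into probabilities and sample one token index."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]   # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = random.choices(range(len(logits)), weights=probs, k=1)[0]
    return idx, probs

# Hypothetical scores for the word following "The capital of Australia is":
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = [2.1, 1.9, 0.4]   # invented numbers, for illustration only

idx, probs = sample_next_token(logits)
print({w: round(p, 2) for w, p in zip(vocab, probs)}, "->", vocab[idx])
# The answer is drawn from a probability distribution, not looked up in a fact
# store, so a fluent but wrong continuation ("Sydney") can always be sampled.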
[Clip]
One root cause is that LLMs work by compressing data. During training, these models squeeze the relationships between tens of trillions of words into billions of parameters — that is, the variables that determine the strengths of connections between artificial neurons. So they are bound to lose some information when they construct responses — effectively, expanding those compressed statistical patterns back out again. “Amazingly, they’re still able to reconstruct almost 98% of what they have been trained on, but then in that remaining 2%, they might go completely off the bat and give you a completely bad answer,” says Amr Awadallah, co-founder of Vectara, a company in Palo Alto, California, that aims to minimize hallucinations in generative AI.
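
A back-of-the-envelope calculation shows why that compression has to be lossy. The figures below are round, assumed numbers chosen to match the “tens of trillions of words” and “billions of parameters” in the excerpt, not the actual sizes of any particular model.

# Assumed, round figures only; no specific model is described here.
training_words = 20e12   # "tens of trillions" of training words
parameters = 100e9       # "billions" of parameters

print(f"{training_words / parameters:.0f} training words per parameter")   # -> 200
# At roughly 200 words per parameter, verbatim storage is impossible: the model
# keeps statistical regularities and has to reconstruct specifics on demand,
# which is where the "remaining 2%" Awadallah describes can go wrong.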
[Clip]
There are a host of straightforward ways to reduce hallucinations. A model with more parameters that has been trained for longer tends to hallucinate less, but this is computationally expensive and involves trade-offs with other chatbot skills, such as an ability to generalize. Training on larger, cleaner data sets helps, but there are limits to what data are available.
Developers can also use an independent system, one that has not been trained in the same way as the AI, to fact-check a chatbot response against an Internet search. Google’s Gemini system, for example, has a user option called double-check response, which highlights parts of its answer in green (to show that it has been verified by an Internet search) or brown (for disputed or uncertain content). This, however, is computationally expensive and takes time, says Awadallah. And such systems still hallucinate, he says, because the Internet is full of bad facts.
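
As a rough sketch of how such search-based checking can work, the snippet below splits an answer into sentence-level claims and labels each one against retrieved text. The in-memory “search index,” the word-overlap score and the threshold are all simplifying assumptions for illustration; a production system such as Gemini’s double-check feature would query a live search engine and use much stronger verification models than word overlap.

import re

# Stand-in for a web search: a tiny hard-coded snippet store.
FAKE_INDEX = [
    "Canberra has been the capital of Australia since 1913.",
    "Sydney is the largest city in Australia by population.",
]

def search_snippets(query: str) -> list[str]:
    """Placeholder for a real search call: return every stored snippet."""
    return FAKE_INDEX

def overlap_score(claim: str, snippet: str) -> float:
    """Crude support signal: share of the claim's longer words found in the snippet."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]{4,}", claim)}
    if not words:
        return 0.0
    return sum(w in snippet.lower() for w in words) / len(words)

def check_answer(answer: str, threshold: float = 0.6) -> list[tuple[str, str]]:
    """Split an answer into sentences and label each 'supported' or 'unverified'."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    results = []
    for claim in claims:
        supported = any(overlap_score(claim, s) >= threshold
                        for s in search_snippets(claim))
        results.append((claim, "supported" if supported else "unverified"))
    return results

print(check_answer("Canberra is the capital of Australia. It was founded in 1850."))
# -> the first sentence is marked supported; the fabricated founding date is not.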
Learn More, Read the Complete Article (about 2580 words)
See Also: Hallucination Leaderboard
Filed under: Data Files, News, Patrons and Users

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.