
Confidently wrong answers from large language models have plagued both users and labs since AI's inception, but a new study from OpenAI seeks a solution.
LLMs are trained to predict the next word across huge datasets, they say, optimizing for a plausible continuation of the sequence rather than for factual accuracy.
Facts that cannot be predicted from these patterns, like arbitrary birthdays, will always produce some errors, they argue, no matter how advanced the algorithm is.
No wrongs or rights in training
Because training knows no «wrongs» or «rights», only better or worse fits, and offers no reward for expressing uncertainty or asking for clarification, you get hallucinations.
So OpenAI argues that the incentive structure in training and evaluation is the culprit: it rewards overall correctness as a statistical score, while inconclusive answers are simply marked «wrong.»
Models optimized for overall correctness are thus encouraged to make wild guesses on questions they don't know instead of admitting ignorance, they say.
Better to guess than admit failure
As an example, they suggest thinking about a multiple-choice test: if you encounter a question you don't know the answer to, it pays to take a guess rather than leave it blank.
A blind guess still gives you, say, a one-in-three chance of being correct, and over massive amounts of data that creates a consistent incentive to guess rather than abstain, since abstaining scores nothing.
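To make the arithmetic concrete, here is a small illustrative sketch (the scoring scheme and numbers are hypothetical, not taken from the study) of why guessing always beats silence under the usual right-or-wrong grading:

```python
# Expected score on a single question under binary grading, where an
# abstention ("I don't know") earns the same zero points as a wrong answer.
def expected_scores(p_correct: float) -> dict:
    return {
        "guess": p_correct * 1.0,  # right answers score 1, wrong ones 0
        "abstain": 0.0,            # saying "I don't know" also scores 0
    }

# A blind guess on a three-option question still has a 1-in-3 chance of a point,
# so over thousands of questions the guesser always outscores the abstainer.
print(expected_scores(p_correct=1 / 3))   # {'guess': 0.333..., 'abstain': 0.0}
```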
On the SimpleQA eval, incentivizing «don't know» answers and requests for clarification cut the hallucination rate from 75% on o4-mini to 26% on GPT-5.
This also shaved two percentage points off GPT-5's accuracy, but raised its «abstention rate» (declining to give a definitive answer) to 52%, compared with 1% for o4-mini.
Time to incentivize uncertainty
So what is the answer to this malaise? Even when fine-tuned to decline questions it doesn't know, GPT-5 still hallucinates at a rate of 26%, roughly one in four answers.
The solution, OpenAI says, is for benchmarks and training pipelines to penalize confident wrong answers more heavily than abstentions, and to give credit for appropriately expressed uncertainty.
Here they point to the benchmarks themselves, saying they need to be updated «so that their scoring discourages guessing.»
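What such a change might look like, as a rough illustration rather than OpenAI's actual scoring rule, is a grader that docks points for wrong answers, so that answering only pays off above a confidence threshold:

```python
# Hypothetical penalized grading: correct = +1, abstain = 0, wrong = -penalty.
def expected_score_if_answering(p_correct: float, penalty: float) -> float:
    return p_correct * 1.0 - (1 - p_correct) * penalty

def should_answer(p_correct: float, penalty: float) -> bool:
    # Answering beats abstaining only when confidence exceeds penalty / (1 + penalty).
    return expected_score_if_answering(p_correct, penalty) > 0.0

# With a penalty of 2, the model needs more than 2/3 confidence before a guess
# is worth the risk; below that, "I don't know" is the rational answer.
for confidence in (0.3, 0.5, 0.8):
    print(confidence, should_answer(confidence, penalty=2.0))
# 0.3 False, 0.5 False, 0.8 True
```

Under plain right-or-wrong grading that threshold drops to zero, which is exactly the always-guess incentive the study describes.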
The solution is nigh
If labs keep training on massive datasets and rewarding lucky guesses, OpenAI argues, hallucinations will keep eking out a slight advantage. But if evaluations instead reward uncertainty and asking for clarification, next-generation training of the kind applied to GPT-5 can win the day.
That's not an easy fix, as the prevailing evaluations still mark «don't know» answers as simply «wrong.»
So, in short, OpenAI thinks hallucinations are manageable and could become a thing of the past, not by chasing ever higher accuracy, but by accepting that sometimes no answer is better than a risky guess.
Read more: OpenAI’s paper and their blog post.