Why AI Chatbots Hallucinate, According to OpenAI Researchers

OpenAI researchers claim they’ve cracked one of the biggest obstacles to large language model performance: hallucinations.

Hallucinations occur when a large language model generates inaccurate information that it presents as fact. They plague the most popular LLMs, from OpenAI’s GPT-5 to Anthropic’s Claude.

OpenAI’s baseline finding, which it made public in a paper released on Thursday, is that large language models hallucinate because the way they are trained and evaluated rewards guessing over admitting uncertainty.

In other words, LLMs are being told to fake it till they make it. Some models handle uncertainty better than others, however. In a blog post last month, OpenAI said that Claude models are more “aware of their uncertainty and often avoid making statements that are inaccurate.” It also noted that Claude’s high refusal rates risked limiting its utility.

“Hallucinations persist due to the way most evaluations are graded — language models are optimized to be good test-takers, and guessing when uncertain improves test performance,” the researchers wrote in the paper.
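The arithmetic behind that claim can be sketched in a few lines. This is only an illustration, assuming a hypothetical accuracy-only benchmark that awards 1 point for a correct answer and 0 for either a wrong answer or an "I don't know"; the function name and the 10% confidence figure are made up for the example and do not come from the paper.

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected points on one question under accuracy-only grading.

    Assumed grading (illustrative): correct = 1, wrong = 0, abstain = 0.
    """
    if abstain:
        return 0.0
    # Answering earns 1 point with probability p_correct, otherwise 0.
    return p_correct * 1.0


# A model that is only 10% sure of its answer (hypothetical figure).
guess_score = expected_score(0.10, abstain=False)
abstain_score = expected_score(0.10, abstain=True)

print(f"guess: {guess_score:.2f}, abstain: {abstain_score:.2f}")
```

Under this kind of grading, a guess never scores lower than an abstention on average, however unsure the model is, so a model tuned to maximize the score has no reason to say it doesn't know.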

Large language models are essentially always in “test-taking mode,” answering questions as if everything in life were binary — right or wrong, black or white.

In many ways, they’re not equipped for the realities of life, where uncertainty is more common than certainty, and true accuracy is not a given.

“Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty,” the researchers wrote.

The good news, according to the researchers, is that there is a fix: redesigning evaluation metrics.

“The root problem is the abundance of evaluations that are not aligned,” they wrote. “The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain.”

In a blog post about the paper, OpenAI elaborated on what this type of adjustment would entail.

“The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” OpenAI said.
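One way to picture scoring that discourages guessing is a rule that subtracts points for wrong answers while leaving abstentions at zero. The sketch below is a hypothetical scheme for illustration, not the specific scoring OpenAI proposes; `wrong_penalty` and the function names are assumptions.

```python
def expected_score_if_answering(p_correct: float, wrong_penalty: float) -> float:
    """Expected points for answering: +1 if right, -wrong_penalty if wrong.

    The penalty scheme is an assumption made for this example.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if it beats abstaining, which is assumed to score 0."""
    return expected_score_if_answering(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining score the same.

    Solves p - (1 - p) * wrong_penalty = 0 for p.
    """
    return wrong_penalty / (1.0 + wrong_penalty)


# With no penalty, even a 10%-confident guess pays off.
print(should_answer(0.10, wrong_penalty=0.0))
# With a 1-point penalty, the same guess loses points on average.
print(should_answer(0.10, wrong_penalty=1.0))
# Confidence needed to justify answering when wrong answers cost 3 points.
print(break_even_confidence(3.0))
```

In this toy setup, raising the penalty raises the confidence a model needs before answering is worthwhile, which is the incentive shift the blog post describes.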

OpenAI did not immediately respond to a request for comment from Business Insider.
