Google Researchers Find the Best AI Model Is 69% Right

2025-12-12T21:26:30.691Z

We just got a sobering picture of how often AI models get their facts straight. This week, Google DeepMind introduced the FACTS Benchmark Suite, which measures how reliably AI models produce factually accurate answers.

It tests models in four areas: answering factoid questions from internal knowledge, using web search effectively, grounding responses in long documents, and interpreting images. The best model, Google’s Gemini 3 Pro, reached 69% accuracy, with other leading models falling well below that.

For context, if any of the reporters I manage filed stories that were 69% accurate, I would fire them.

Beyond journalism, this number should matter to businesses betting on AI. While models excel at speed and fluency, their factual reliability still lags far behind human expectations, especially in tasks involving niche knowledge, complex reasoning, or precise grounding in source material.

Even small factual errors can have outsized consequences in sectors such as finance, healthcare, and the law. This week, my talented colleague Melia Russell looked at how law firms are handling the rise of AI models as a source of legal truth. It’s messy: She recounts how one firm fired an employee because they filed a document riddled with fake cases after using ChatGPT to draft it.

The FACTS benchmark is both a warning and a roadmap: By quantifying where and how models fail, Google hopes to accelerate progress. For now, the takeaway is clear. AI is getting better, but it’s still wrong about one-third of the time.

Reach out to me via email at abarr@businessinsider.com.
