We just got a sobering picture of how often AI models get their facts straight. This week, Google DeepMind introduced the FACTS Benchmark Suite, which measures how reliably AI models produce factually accurate answers.
It tests models in four areas: answering factoid questions from internal knowledge, using web search effectively, grounding responses in long documents, and interpreting images. The best model, Google’s Gemini 3 Pro, reached 69% accuracy, with other leading models falling well below that.
For context, if any of the reporters I manage filed stories that were 69% accurate, I would fire them.
Beyond journalism, this number should matter to businesses betting on AI. While models excel at speed and fluency, their factual reliability still lags far behind human expectations, especially in tasks involving niche knowledge, complex reasoning, or precise grounding in source material.
Even small factual errors can have outsized consequences in sectors such as finance, healthcare, and the law. This week, my talented colleague Melia Russell looked at how law firms are handling the rise of AI models as a source of legal truth. It’s messy: She recounts how one firm fired an employee who filed a document riddled with fake cases after drafting it with ChatGPT.
The FACTS suite is a warning, but it’s also a roadmap: by quantifying where and how models fail, Google hopes to accelerate progress. For now, though, the takeaway is clear. AI is getting better, but even the best model is still wrong about a third of the time.
Sign up for BI’s Tech Memo newsletter here. Reach out to me via email at abarr@businessinsider.com.
