Google has released a new assessment of artificial intelligence chatbots, concluding that even the most advanced systems still produce incorrect information at a troubling rate.
According to a report published in December 2025, Google found that leading AI chatbots achieve no more than 69% factual accuracy, meaning roughly one in every three answers is wrong, even when the response is delivered with confidence.
The findings come from Google’s newly introduced FACTS Benchmark Suite, developed by the company’s FACTS research team in collaboration with Kaggle. The benchmark is designed to measure whether AI systems provide information that is factually correct, rather than simply completing tasks in a fluent or convincing manner.
“Most evaluations today focus on task completion,” Google researchers noted in the report. “FACTS focuses on whether the answer is true.”
Google’s Gemini 3 Pro model ranked highest, scoring 69% overall accuracy. Other leading systems performed worse: GPT-5 and Gemini 2.5 Pro scored close to 62%, Grok 4 roughly 54%, and Claude 4.5 Opus about 51%.
The benchmark tested four areas:
Parametric knowledge, or facts learned during training
Search accuracy, measuring how well models retrieve correct information from the web
Grounding, which tests whether models stick to provided documents without inventing details
Multimodal understanding, including reading charts, graphs, and images
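To make the grounding category concrete, here is a toy sketch of the kind of check it describes: does every claim in a model's answer appear in the document it was given? This is an illustrative heuristic only, not Google's actual FACTS methodology; the function name and the content-word overlap rule are invented for the example.

```python
# Toy grounding check (NOT Google's FACTS method): count the fraction of
# answer sentences whose content words all appear in the source document.

def grounded_fraction(document: str, answer: str) -> float:
    """Return the share of answer sentences fully supported by the document,
    using naive word overlap as a crude proxy for grounding."""
    doc_words = set(document.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 1.0
    supported = 0
    for sentence in sentences:
        words = [w.strip(",;:").lower() for w in sentence.split()]
        # Ignore short function words; check only longer content words.
        content = [w for w in words if len(w) > 3]
        if all(w in doc_words for w in content):
            supported += 1
    return supported / len(sentences)

doc = "Revenue grew 12 percent in 2024 while costs stayed flat"
good = "Revenue grew 12 percent. Costs stayed flat."
bad = "Profit doubled. Margins collapsed."
print(grounded_fraction(doc, good))  # 1.0 - every sentence is supported
print(grounded_fraction(doc, bad))   # 0.0 - invented details not in the document
```

A real benchmark would use far more robust methods (entailment models or human raters) to judge support, but the scoring idea is the same: answers that invent details not present in the provided document lose points.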
Multimodal tasks were the weakest category across all models, with accuracy often below 50%. Researchers warned that this weakness could lead to serious business errors, such as misreading financial charts or extracting incorrect data from documents.
Google’s report highlights risks for industries where accuracy is essential, including finance, healthcare, law, and human resources. In these fields, incorrect information can lead to financial losses, compliance violations, or harm to individuals.
The company stated that the problem is not only how often chatbots err but how they present their errors: incorrect answers are often delivered in a polished, confident tone, making them harder for users to detect.
“Blind trust is risky,” the report stated, adding that human oversight remains essential.
The findings come as companies continue to invest heavily in AI. A separate survey by advisory firm Teneo, cited by the Wall Street Journal, found that 68% of CEOs plan to increase AI spending in 2026, even though fewer than half of current AI projects have delivered returns exceeding their costs.
Executives reported the strongest results in marketing and customer service, while applications in legal, HR, and security were described as less successful.
Despite concerns about automation, 67% of CEOs said AI is likely to increase entry-level hiring, rather than reduce it.