A new accuracy study gives AI Overviews a passing grade. But the fine print is where the real story lives, and it has direct implications for how you think about getting cited.
The number that has been circulating since yesterday is 90%. That is how often Google's AI Overviews answer correctly, according to a study Oumi ran on behalf of The New York Times covering 4,326 searches. Accuracy was 85% when Gemini 2 powered the feature and 91% after Google switched to Gemini 3. At Google's scale, even a 9% error rate means hundreds of thousands of wrong answers going out every minute, which is the angle most coverage took.
That framing, though, skips the finding that should actually change how you think about your content strategy.
With Gemini 3, 56% of the accurate responses were what Oumi called "ungrounded." The linked source did not support the answer. The AI said something correct, cited a page as evidence, and the page either did not contain that information or actively conflicted with it. That figure was 37% with Gemini 2. Accuracy went up. The citation problem got measurably worse.
Facebook and Reddit were the second and fourth most common sources cited across the entire study. Facebook appeared in 5% of correct answers and 7% of wrong ones.
The citation is not evidence, it is a best guess
Google's AI Overviews are built to answer questions. The citation at the bottom is not how the answer was reached. It is Google matching the answer it already generated to whatever page it thinks is most associated with that topic. Those are very different operations, and the study shows the gap between them is growing.
With traditional search, you got ten links and you decided. With AI Overviews, you get one answer and a reference that signals authority without requiring you to verify it. Most people do not click through. The citation creates a feeling of sourcing that the data does not back up.
I wrote about this structural problem in the Info-Tech blueprint on AI-powered search last May. The gap between confidence and accuracy is baked into how these systems work. This study puts numbers on what was previously more of a qualitative concern.
The number worth remembering: With Gemini 3, 56% of accurate AI Overview responses linked to sources that did not support the answer, up from 37% with Gemini 2. Accuracy improved. Attribution got worse.
Reddit and Facebook are not anomalies, they are the web
Google's AI trained on the web as it exists. The web has a lot of Facebook posts and Reddit threads. They are high-traffic, constantly updated, and written in plain language that pattern-matches well to search queries. From the model's perspective, they look like useful signal.
A Reddit thread that got upvoted is not peer-reviewed. For casual queries that is often fine. For anything technical, health-related, legal, or financial, it is a real problem. And for B2B marketers whose buyers are researching categories and vendors, the question of what the AI is actually reading when it cites your industry deserves more attention than it usually gets.
The competitive opening here is real. Most companies are not producing content structured clearly enough for AI systems to parse. If yours is, you have a better shot at being cited, even in an environment where the citation logic is unreliable.
Three things to do differently because of this study
Check what AI Overviews say when they cite you. Being cited and being correctly represented are not the same thing. If you are tracking AI visibility and seeing your content appear, look at the claim the overview is making alongside your citation. If the AI is asserting something you never said, your brand is being attached to an answer you did not give. That is worth knowing.
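The study has no public tooling for this check, so here is a minimal sketch of the idea: given a claim an AI Overview makes while citing your page, and the text of that page, estimate whether the page actually supports the claim. The token-overlap heuristic and the 0.7 threshold are illustrative assumptions, not a standard method, and real grounding checks would need something far more semantic.

```python
# Toy "grounding check": does the cited page plausibly support the claim?
# Heuristic and threshold are illustrative choices, not an established method.
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def claim_supported(claim: str, page_text: str, threshold: float = 0.7) -> bool:
    """Return True if most of the claim's words also appear in the page."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & tokenize(page_text)) / len(claim_tokens)
    return overlap >= threshold

# Hypothetical page text, for illustration only.
page = "Our benchmark found AI Overviews were 91% accurate with Gemini 3."
print(claim_supported("AI Overviews were 91% accurate with Gemini 3", page))   # True
print(claim_supported("AI Overviews are fully reliable for legal advice", page))  # False
```

Even a crude check like this, run over the claims that appear next to your citations, tells you whether your brand is being attached to answers you never gave.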
Stop optimizing for length, start optimizing for findability. Google's AI is not working through your 2,500-word whitepaper to find your argument. It is finding the sentence that most directly answers a specific question. A clearly labeled FAQ block, a direct answer in the first paragraph of a section, a heading that states a conclusion rather than teasing one: these are more useful to AI citation logic than comprehensive coverage.
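In markup terms, that structure can look something like the sketch below. The headings and copy are placeholders, not a template Google has published; the point is that the conclusion sits in the heading and the direct answer opens the section.

```html
<!-- Illustrative structure only: headings and copy are hypothetical. -->
<section>
  <!-- The heading states the conclusion instead of teasing it -->
  <h2>AI Overviews cite unsupporting sources more than half the time</h2>
  <!-- The first sentence answers the question directly -->
  <p>With Gemini 3, 56% of accurate AI Overview answers linked to
     sources that did not support them, per Oumi's study.</p>
</section>

<section>
  <h2>FAQ</h2>
  <h3>Does being cited mean the AI represented me correctly?</h3>
  <p>No. Being cited and being correctly attributed are separate outcomes.</p>
</section>
```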
Schema markup is no longer optional. If your content does not tell Google what it is, who wrote it, and what it covers through structured data, you are making it harder to be attributed correctly. This has been true for a while. The ungrounding finding makes it more urgent.
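For reference, the baseline here is a JSON-LD block using schema.org's Article type. The values below are placeholders, not real pages or people; what matters is that the markup declares what the content is, who wrote it, and what it covers.

```html
<!-- Illustrative only: URL, name, and dates below are placeholder values. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How accurate are AI citations in your category?",
  "author": { "@type": "Person", "name": "Jane Example" },
  "datePublished": "2026-04-07",
  "about": "AI search citation accuracy",
  "mainEntityOfPage": "https://example.com/ai-citation-accuracy"
}
</script>
```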
Getting cited is not the goal. Getting cited for something you actually said is.
How Google and its critics will each misread this study
Google called the study flawed, noting that the SimpleQA benchmark was designed by OpenAI specifically to include tricky questions where AI systems tend to fail. That is a fair methodological point. The study does not reflect the full distribution of what people actually search for.
Critics will take the 9% error rate and extrapolate to scale, which is dramatic but not wrong, just incomplete. The more durable finding is the ungrounding trend: as Google's model gets more capable, it is growing more confident generating answers that its own source links do not support. That is not a benchmark artifact. It is a design characteristic.
For content strategists, the practical response stays the same regardless of how the accuracy debate resolves. Make your content easy for AI to find, easy to attribute, and structured so the most important claim is findable without reading everything around it. That was good writing before AI search existed. It just matters more now.
Sources
- Metz, Cade. "How Accurate Are Google's A.I. Overviews?" The New York Times, 7 Apr. 2026. nytimes.com
- Oumi. SimpleQA Benchmark Study of Google AI Overviews. Conducted Oct. 2025 (Gemini 2) and Feb. 2026 (Gemini 3). oumi.ai
- Search Engine Land. "Google AI Overviews: 90% Accurate, Yet Millions of Errors Remain." 7 Apr. 2026. searchengineland.com
- Bellamkonda, Shashi. "Stay Relevant in the Era of AI-Powered Search." Info-Tech Research Group, May 2025. infotech.com
misunderstoodmarketing.com · B2B Marketing Strategy
