Google’s AI Overview is wrong tens of millions of times per hour, NYT finds

Google gets it right more often than not, but 1 in 10 queries results in an error. (Picture: Adobe)
Sure, the accuracy measured by AI lab Oumi comes in at a decent 90%. But scaled up to Google's traffic of more than five trillion searches per year, that translates into mind-boggling numbers: hundreds of thousands of "inaccuracies" per minute, the New York Times writes.
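A quick back-of-the-envelope check of that scaling, assuming the round figures given in the article (five trillion searches per year, a 10% error rate) and that every search triggers an AI Overview:

```python
# Rough scaling of the article's figures; the inputs are the
# article's round numbers, not exact Google traffic data.
SEARCHES_PER_YEAR = 5_000_000_000_000  # "more than five trillion"
ERROR_RATE = 0.10                      # 1 in 10 queries

errors_per_year = SEARCHES_PER_YEAR * ERROR_RATE
errors_per_hour = errors_per_year / (365 * 24)
errors_per_minute = errors_per_hour / 60

print(f"{errors_per_hour:,.0f} per hour")    # tens of millions
print(f"{errors_per_minute:,.0f} per minute")  # hundreds of thousands
```

This lands at roughly 57 million errors per hour and about 950,000 per minute, consistent with the headline's "tens of millions per hour" and the "hundreds of thousands per minute" figure above.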

The test, conducted on the Gemini 3 generation of AI Overviews, used the SimpleQA dataset, a benchmark designed to probe chatbot accuracy. Created by OpenAI in 2024, it contains more than 4,000 questions with verifiable answers, Ars Technica reports.

Google’s AI answer machine pops up on every query these days, but it is difficult to tell precisely which model it uses for each task. For simple web searches, it might well opt for one of the faster Flash models rather than the more advanced Pro. It might also give different answers to the same question just milliseconds apart.

Google also doesn’t like the measurement being used, telling the NYT that "This study has serious holes. It doesn’t reflect what people are actually searching on Google."

Read more: New York Times, Ars Technica.