
The test, conducted on the Gemini 3 generation of AI Overviews, used the SimpleQA dataset, a benchmark created by OpenAI in 2024 to probe chatbot accuracy. It contains more than 4,000 questions with real, verifiable answers, Ars Technica reports.
Google’s AI answer machine pops up on nearly every query these days, but it is difficult to tell precisely which model handles each task. For simple web searches, it may well opt for one of the faster Flash models rather than the more advanced Pro. It can also give different answers to the same question just milliseconds apart.
Google also disputes the methodology, telling the NYT: “This study has serious holes. It doesn’t reflect what people are actually searching on Google.”
Read more: New York Times, Ars Technica.
