Grok 4 launches with stunning benchmarks, setting a new standard

Grok 4 is blazingly good at benchmarks, but also quite expensive.
The freshly released model deserves to be called state of the art, nearly doubling the previous best results in two key benchmarks.

In training since February 2025 on a cluster of 110,000 Nvidia B200 chips, one of the most capable private supercomputers, the new model impresses on x.ai’s own launch stream.

To the benches
It scores above 50% on the notoriously difficult Humanity’s Last Exam, where the previous best was 21.6%, held by Google’s Gemini 2.5 Pro, followed by OpenAI’s o3 at 20.3%.

These results haven’t been confirmed on the benchmark’s own leaderboard yet, but were touted by x.ai, the makers of the bot, at launch.

On ARC-AGI, which pits AIs against a human baseline, Grok 4 (Thinking) gets only marginally better results than o3 (High).

But on ARC-AGI-2, which is more evolved and difficult, Grok 4’s results peak at 16.2%. This is almost twice as high as the nearest competitor, Claude Opus 4, at 8.6%.
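As a rough check on the "almost twice as high" claim, the ratios can be computed from the figures quoted above. This is only a sketch: the names `scores` and `improvement_ratio` are made up for illustration, and because the Humanity’s Last Exam result is only given as "above 50%", the value 50.0 is an assumed lower bound rather than a reported score.

```python
# Scores in percent, as quoted in this article.
# Assumption: Grok 4's Humanity's Last Exam score is only stated as
# "above 50%", so 50.0 is used here as a lower bound.
scores = {
    "Humanity's Last Exam": {"grok4": 50.0, "previous_best": 21.6},
    "ARC-AGI-2": {"grok4": 16.2, "previous_best": 8.6},
}


def improvement_ratio(new: float, old: float) -> float:
    """Return how many times higher the new score is than the old one."""
    return new / old


ratios = {
    name: round(improvement_ratio(s["grok4"], s["previous_best"]), 2)
    for name, s in scores.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the previous best")
```

With these inputs, the ARC-AGI-2 ratio comes out just under 2, while the Humanity’s Last Exam ratio is above 2 even at the lower bound.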

Better than a PhD
“With respect to academic questions, Grok 4 is better than PhD level in every subject, no exceptions,” said Elon Musk at launch, according to TechCrunch. “At times, it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time.”

Coding and multimodality in Grok 4 come later.

X.ai is releasing only the base text-based general model today, opting for a phased rollout of a coding model in early August, a multimodal agent in September and video generation in October. (Hat tip to u/vasilenko93, who posted this slide in r/singularity)

The new model rightly achieves its «SOTA» label based on the benchmarks alone, but it is also quite expensive.

Subscription options for Grok 4 Heavy come at a steep price.
X.ai sure knows how to charge for the latest state of the art.

$3,000 per year
The Grok 4 Heavy model requires a new subscription tier called SuperGrok Heavy, which will cost you $3,000 yearly.

The «normal» Grok 4 is available to SuperGrok subscribers at $300 per year.
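To put the two tiers side by side, here is a small sketch that converts the yearly prices quoted above into monthly equivalents. The variable names are hypothetical, and the figures are simply the article's yearly prices divided by twelve, not published monthly rates.

```python
# Yearly prices in USD, as quoted in this article.
PRICES_PER_YEAR = {"SuperGrok": 300, "SuperGrok Heavy": 3000}

# Monthly equivalent: yearly price spread evenly over 12 months.
# Assumption: this is plain division, not an official monthly rate.
monthly_equivalent = {
    tier: price / 12 for tier, price in PRICES_PER_YEAR.items()
}

# How many times more expensive the Heavy tier is than the regular one.
heavy_multiple = PRICES_PER_YEAR["SuperGrok Heavy"] / PRICES_PER_YEAR["SuperGrok"]

for tier, price in monthly_equivalent.items():
    print(f"{tier}: ${price:.2f} per month")
print(f"SuperGrok Heavy costs {heavy_multiple:.0f}x as much as SuperGrok")
```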

Next-generation models are also expected from OpenAI later this summer, and rumor has it that a Gemini Pro 3.0 from Google is on the way as well.

Grok 4 places fourth on the widely regarded LMArena benchmark.

UPDATE (July 16): Grok 4 results just showed up on the highly regarded LMArena leaderboard, and they are not looking as rosy as the benchmarks x.ai showed in its launch post. The model ends up in fourth place, behind Gemini 2.5 Pro, ChatGPT’s o3, 4o and 4.5 preview. This follows user reports of underwhelming performance in the wild.

Read more: X.ai’s launch stream, more details on TechCrunch, and Tom’s Guide.