Teknotum

Grok 4 launches with stunning benchmarks, setting a new standard

Grok 4 is blazingly good at benchmarks, but also quite expensive.
The freshly released model deserves to be called state of the art, nearly doubling the previous best results in two key benchmarks.

Trained since February 2025 on a cluster of 110,000 Nvidia B200 chips, in one of the most capable private supercomputers, the new model impressed on x.ai’s own launch stream.

To the benches
It scores above 50% on the notoriously difficult Humanity’s Last Exam, where the previous best was 21.6%, held by Google’s Gemini 2.5 Pro, followed by OpenAI’s o3 at 20.3%.

These results haven’t been confirmed on the benchmark’s own leaderboard yet, but were touted by x.ai, the makers of the bot, at launch.

On ARC-AGI, which pits AIs against a human baseline, Grok 4 (Thinking) gets only marginally better results than o3 (High).

But on ARC-AGI-2, which is more evolved and difficult, Grok 4’s results peak at 16.2%. This is almost twice as high as the nearest competitor, Claude Opus 4, at 8.6%.

Better than a PhD
— With respect to academic questions, Grok 4 is better than PhD level in every subject, no exceptions, said Elon Musk at launch, according to TechCrunch. — At times, it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time.

Coding and multimodality in Grok 4 come later.

X.ai is releasing only the base text-based general model today, opting for a phased rollout of a coding model in early August, a multimodal agent in September and video generation in October. (Hat tip to u/vasilenko93, who posted this slide in r/singularity)

The new model rightly achieves its «SOTA» label based on the benchmarks alone, but it is also quite expensive.

Subscription options for Grok 4 Heavy come at a steep price.
X.ai sure knows how to charge for the latest state of the art.

$3,000 per year
The Grok 4 Heavy model requires a new subscription tier called SuperGrok Heavy, which will cost you $3,000 yearly.

The «normal» Grok 4 is available to SuperGrok subscribers at $300 per year.

And next generation models are expected from OpenAI later this summer, while rumor has it there is also an upcoming Gemini Pro 3.0 from Google.

Grok 4 is placed fourth on the widely regarded LMArena benchmark.

UPDATE (July 16): Grok 4 results just showed up on the highly regarded LMArena leaderboard, and it’s not looking as rosy as the benchmarks x.ai showed in its launch post. The model ends up at fourth place, behind Gemini 2.5 Pro, ChatGPT’s o3, 4o and 4.5 preview. This follows reports from users of underwhelming performance in the wild.

Read more: X.ai’s launch stream, more details on TechCrunch, and Tom’s Guide.

Author: Tor Fosheim. Posted on 10. July 2025, updated 16. July 2025. Tags: grok
