teknotum

Pitting humans against AI at FrontierMath yields mixed results

FrontierMath is notoriously difficult for machines to solve, but they are evolving quickly. (Picture: Epoch AI)
Epoch AI, the team behind the ridiculously difficult FrontierMath benchmark, decided to check how well humans do on it, and now predicts superhuman AI performance within a year.

FrontierMath is a synthetic benchmark that contains 300 questions ranging from upper-graduate level to Fields Medalist-level challenges, and the best machines score about 2% on it.

Undergrads and PhDs from MIT
For the human-versus-AI test, they went to MIT and chose 40 academics, ranging from exceptional undergrads to PhDs, and split them into teams of four to five.

The eight teams were given internet access and told to solve 23 of the «easier» questions from the benchmark. The same questions were given to the current top-performing AI on the benchmark, OpenAI's o4-mini-medium.

Humans do better in aggregate, but individual teams lost out to the OpenAI model.

Better than teams, worse in aggregate
The AI beat the average human team by a small margin, but lost by a large margin to the combined score of all the human teams.

Combined, the human teams scored 30-40%, while the AI scored a little over 20%.
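The gap between the average team and the combined result is easier to see with a small calculation. The sketch below is purely illustrative: the sets of solved questions are made up, not Epoch AI's data, and it assumes the combined score counts a question as solved when at least one team solved it, a definition the article does not spell out. The function names are hypothetical.

```python
def average_team_score(solved_by_team, n_questions):
    """Mean fraction of questions solved per team."""
    total_solved = sum(len(solved) for solved in solved_by_team)
    return total_solved / (len(solved_by_team) * n_questions)


def combined_score(solved_by_team, n_questions):
    """Fraction of questions solved by at least one team (assumed definition)."""
    solved_by_anyone = set().union(*solved_by_team)
    return len(solved_by_anyone) / n_questions


N_QUESTIONS = 23  # size of the question subset mentioned in the article

# HYPOTHETICAL data: indices of the questions each of eight teams solved.
teams = [
    {0, 1, 2, 3},
    {0, 1, 4, 5},
    {1, 2, 6},
    {0, 3, 7, 8},
    {2, 4, 5},
    {0, 1, 2, 6, 7},
    {3, 5},
    {0, 4, 8},
]

# HYPOTHETICAL AI result on the same questions.
ai_solved = {0, 1, 2, 4, 9}
ai_score = len(ai_solved) / N_QUESTIONS

print(f"average team: {average_team_score(teams, N_QUESTIONS):.1%}")
print(f"AI:           {ai_score:.1%}")
print(f"combined:     {combined_score(teams, N_QUESTIONS):.1%}")
```

With these invented numbers the AI lands above the average team but well below the combined result, because different teams solve different questions and the union grows faster than any single team's tally.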

So does this make humans better at solving math than AI, the researchers at Epoch AI ask.

The answer seems to be yes, but the questions in this test revolve more around reasoning than knowledge, chosen to offset AI's already huge advantage of having more knowledge than «even the most erudite human mathematicians.»

o4-mini finished substantially faster
o4-mini-medium also needed a mere 5-20 minutes per problem, far less than the roughly 40 minutes per problem the humans took.

The report concludes that human teams and the current state-of-the-art AI score in about the same ballpark, but given the pace of AI progress on the benchmark, the authors expect that to change.

Indeed, the report says they «think it’s likely that AIs will unambiguously exceed this threshold by the end of the year.»

— I think this is a useful human baseline that helps put FrontierMath evaluations into context, and I’m interested to see when AIs cross this threshold, the report concludes.

Read more: The report from Epoch AI and a handy Twitter thread. More on FrontierMath, discussion on r/singularity, and OpenAI’s o3, o4 release from teknotum.

Author: Tor Fosheim. Posted on 27 May 2025. Tags: AI, benchmarks
