OpenAI’s new ChatGPT-5.4 has native computer use and fewer hallucinations

The latest version of ChatGPT sees a marked jump in the benchmarks. (Picture: Adobe)
The new Thinking and Pro models are more "capable and efficient," and GPT‑5.4 is the first OpenAI model with native computer-use skills. It also improves on hallucinations and Office file creation, areas where Anthropic has been thriving.

"Together with advances in general reasoning, coding, and professional knowledge work, GPT‑5.4 enables more reliable agents, faster developer workflows, and higher-quality outputs across ChatGPT, the API, and Codex," OpenAI writes.

On hallucinations, GPT‑5.4 is 33% less likely to give an incorrect response, and its replies contain 18% fewer mistakes, compared with GPT‑5.2.

Lets you adjust thinking
You can adjust the model’s reasoning mid-flight. GPT‑5.4 Thinking outlines its work at the beginning of a complex query, so you can add instructions or "adjust its direction mid-response."

This way, you likely won’t have to go over the problem several times or start again to include new perspectives.

OpenAI is also going after Claude in this update, with improvements in native computer use, agent behavior, and Office file handling.

On GDPval, which tests agents’ abilities to do knowledge work across 44 occupations, GPT‑5.4 matches or surpasses industry professionals 83.0% of the time, up from 70.9% for GPT‑5.2. This could spook the markets.

There are also improvements in creating spreadsheets, presentations, and documents. Human raters scored GPT‑5.4’s presentations 68.0% higher than GPT‑5.2’s, thanks to better aesthetics, more visual variety, and better use of images. On tasks written for "a junior investment banking analyst," GPT‑5.4 scores 87.3%, compared to 68.4% for GPT‑5.2, another big step ahead.

At the same time, OpenAI is launching ChatGPT for Excel, which lets you use GPT inside your spreadsheets.

GPT-5.4 also scores 75.0% on the OSWorld-Verified benchmark, almost doubling GPT‑5.2’s 47.3% and beating the human performance threshold of 72.4%.

On agentic coding, 5.4 is only slightly better than GPT-5.3 Codex, but much better than 5.2 (if anyone still uses that), and it has lower latency. There is also a "/fast" toggle that improves token speed 1.5× with the same model.

OpenAI claims that ChatGPT-5.4-Codex is far better on "complex frontend tasks," producing noticeably better visuals and more functional results.

GPT‑5.4 is "gradually rolling out" to ChatGPT and Codex as of this writing, and should be available soon for Plus, Pro, and Team users.

In the API, it has a 1-million-token context window and costs $2.50 per 1M input tokens and $15 per 1M output tokens.
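Using the listed prices, the per-call cost is simple arithmetic. A minimal sketch (the helper name and example token counts are illustrative, not part of OpenAI’s API):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-5.4 API call's cost in USD from the listed prices."""
    INPUT_PRICE_PER_M = 2.50    # $ per 1M input tokens (from the article)
    OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (from the article)
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A prompt filling half the 1M-token context window plus a 2,000-token reply:
print(f"${estimate_cost(500_000, 2_000):.2f}")  # → $1.28
```

So even a request that fills half the context window stays well under two dollars; output tokens are six times as expensive per token, but replies are usually far shorter than prompts.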

Read more: OpenAI’s presentation, The Verge, Engadget, TechCrunch.