ChatGPT o3 and o4-mini are big steps toward AI agents

The models are inching ahead in benchmarks, but multimodality is where they truly shine. (Picture: OpenAI)
OpenAI’s latest model drop hints at a future where agents can do most of our work — and is proving the point with image processing.

The new reasoning models hold an ever so slight lead in many benchmarks and therefore earn the right to be called state of the art. Of particular note is that they improve on OpenAI’s o1 and o3-mini by almost 30% on the coding benchmark SWE-Bench Verified, OpenAI claims in its launch post.

— These are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers, says OpenAI.

Uses all the tools in the box
What steals the show, though, is their ability to agentically use all the tools in the ChatGPT toolbox, such as image generation, image analysis and multiple web searches, and that really shines through in image recognition.

The models can zoom, tilt and rotate any image you give them, and that, coupled with their ability to reason and search the web, has opened something of a Pandora’s box.

At first it was thought that this capability would be handy for making final images out of sketches, or for analyzing whiteboards, writes The Verge.

Social media goes geo guessing
But on social media, users are already embracing the new tech, and are using it to play geolocation games, writes TechCrunch.

This is a game where you try to stump the model by giving it increasingly difficult images and making it guess the location. The model seems to be surprisingly good at spotting landmarks and cities, and can even identify a restaurant’s location from its menu.

Analyzes and presents web search
The models are also flexible in how they use web search, and can «search the web multiple times with the help of search providers, look at results, and try new searches if they need more info,» says OpenAI.

If you ask about weather patterns, for instance, OpenAI says:

— The model can search the web for public utility data, write Python code to build a forecast, generate a graph or image, and explain the key factors behind the prediction, chaining together multiple tool calls.

This kind of ability to not just do a simple web search, but to analyze, reason and process the data for you hints at how powerful future models can be.
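
For readers who want to picture the search-inspect-retry behavior OpenAI describes, here is a minimal Python sketch. It is not OpenAI's implementation: the function `search_until_satisfied`, the stand-in tools `fake_search`, `has_data` and `add_keywords`, and the tiny `FAKE_INDEX` are all hypothetical names invented for illustration, and a real agent would call an actual search provider and let the model decide when it has enough.

```python
from typing import Callable, List


def search_until_satisfied(
    question: str,
    search: Callable[[str], List[str]],
    is_enough: Callable[[List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> List[str]:
    """Search, look at the results, and try a new query if more info is needed."""
    query = question
    collected: List[str] = []
    for _ in range(max_rounds):
        collected.extend(search(query))
        if is_enough(collected):
            break  # the results so far are judged sufficient
        query = refine(query, collected)  # otherwise try a new search
    return collected


# Hypothetical stand-in tools so the sketch runs offline.
FAKE_INDEX = {
    "weather": ["general weather article"],
    "weather utility data": ["public utility dataset", "regional demand table"],
}


def fake_search(query: str) -> List[str]:
    # Assumption: an exact-match lookup stands in for a real search provider.
    return FAKE_INDEX.get(query, [])


def has_data(results: List[str]) -> bool:
    # Assumption: finding any dataset counts as "enough" information.
    return any("dataset" in r for r in results)


def add_keywords(query: str, results: List[str]) -> str:
    # Assumption: refining a query simply means appending fixed keywords.
    return query + " utility data"


if __name__ == "__main__":
    print(search_until_satisfied("weather", fake_search, has_data, add_keywords))
```

In this toy run the first query finds only a general article, so the loop refines the query once and stops as soon as a dataset turns up. The `max_rounds` cap is a design choice of the sketch, so that a question with no good answer cannot keep the loop searching forever.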

Not free, yet
o3 and o4-mini have been available since yesterday to subscribers on the Pro and Plus tiers. There is no word yet on whether they will be opened up to free users.

And as for the now famously confusing lineup of GPT models, Sam Altman has teased that GPT-5 will arrive in the summer, thereby ending the naming confusion.

Read more: OpenAI’s launch post, Engadget, The Verge and TechCrunch.
