OpenAI updates its voice models with GPT-5-level reasoning in API

The new models are only available to developers so far. (Picture: Adobe)
The new models out today in the API include GPT-Realtime-2, which can handle requests and carry out tasks for you while remaining a natural conversation partner.

Realtime-Translate can handle live translations from 70+ input languages into 13 output languages and keep up with the speaker.

Realtime-Whisper, meanwhile, is a transcriber that turns speech into text as it happens.

Sam Altman notes that it is mostly young people who prefer voice interactions with ChatGPT, while older people like to type.

Realtime-2 is priced at a whopping $32 per 1 million input tokens and $64 per 1 million output tokens. The other models are priced cheaply.
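To put those rates in perspective, here is a minimal sketch of the arithmetic in Python. It assumes the two quoted rates are the only charges and apply uniformly to all tokens, which the announcement may not guarantee; the function name and the session token counts are hypothetical, chosen only for illustration.

```python
# Rates quoted in the article for Realtime-2, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 32.0
OUTPUT_USD_PER_MILLION = 64.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for the given token counts.

    Assumption: only the two flat rates above apply (no discounts,
    caching, or separate rates for different token types).
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical session: 50,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost_usd(50_000, 20_000):.2f}")  # prints $2.88
```

At these rates, a million tokens in each direction would come to $96.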

The models are only available in the API so far, so there is no general access within ChatGPT yet. They can, however, be added to apps in Codex.

Read more: OpenAI’s announcement, on X.com, 9to5Mac, and TechCrunch.

OpenAI reorganizing teams to create audio-first model, due in early 2026

The new model will handle interruptions and cross-talk better, reports say. (Picture: generated)
According to a new report in The Information, the company has unified engineering, product and research teams to make a better audio model.

They aim for an early 2026 release, in time for the still-under-wraps hardware device with Jony Ive, said to be voice-only and due for launch in "about a year," reports TechCrunch.

The new audio model reportedly sounds more natural and emotive, handles interruptions better, and can speak at the same time as a human, according to OpenAI watcher Tibor Blaho.

OpenAI is said to be planning a whole family of devices, possibly including glasses and smart speakers, that are meant to function more like companions than assistants.

Read more: Original reporting by The Information, writeups on TechCrunch and by Tibor Blaho.