OpenAI introduces Codex-Spark, a much faster coding model

Codex-Spark is small, fast, and almost as good as the real thing. (Image: OpenAI)
Thanks to a recent collaboration with Cerebras, the new model delivers “more than 1000 tokens per second while remaining highly capable for real-world coding tasks.”

The drawbacks: the model is text-only and limited to a 128K context window, and it is intended for use “where latency matters as much as intelligence.”

The model is not as capable as GPT-5.3-Codex, but it comes in close to the lower end of that model’s performance, and it thoroughly beats GPT-5.1-Codex on SWE-Bench Pro and Terminal-Bench 2.0.

To reach this speed, OpenAI reworked the WebSocket layer and the Responses API, cutting client/server time by 80%, per-token overhead by 40%, and time-to-first-token by 50%.
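To illustrate why a persistent WebSocket cuts per-token overhead compared with per-request HTTP, here is a minimal Python sketch of token streaming over one long-lived connection. OpenAI has not published Spark’s wire format, so the endpoint URL, the model string, and the message fields (`type`, `delta`, `"done"`) are all hypothetical placeholders.

```python
# Minimal sketch: streaming token deltas over a persistent WebSocket.
# Endpoint, model name, and message schema are assumptions, not
# OpenAI's published API.
import asyncio
import json

import websockets  # pip install websockets


async def stream_completion(prompt: str) -> str:
    # One long-lived connection pays the TCP/TLS handshake once,
    # instead of once per request as with plain HTTP streaming.
    uri = "wss://api.example.com/v1/responses"  # hypothetical endpoint
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({
            "model": "gpt-5.3-codex-spark",  # hypothetical model id
            "input": prompt,
        }))
        chunks = []
        async for message in ws:  # each frame carries one token delta
            event = json.loads(message)
            if event.get("type") == "done":
                break
            chunks.append(event.get("delta", ""))
        return "".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(stream_completion("Write a haiku about fast tokens.")))
```

With frames this small, the fixed cost per token is dominated by framing and parsing rather than connection setup, which is where the reported per-token and time-to-first-token gains would come from.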

These improvements will benefit other models “soon,” as the new stack becomes the default across OpenAI’s products.

GPT-5.3-Codex-Spark is available as a research preview on the $200/month Pro subscription; in the API, it is limited to a “small set of design partners,” with access expanding “over the coming weeks.”

It is the “first in a family of ultra-fast models,” OpenAI says.

Read more: OpenAI’s announcement, TechCrunch.