
It should be better at everyday tasks. Along with upgrades to Claude in Excel, Anthropic is also launching Claude in PowerPoint in beta with this release.
It also supports «agent teams,» letting you «spin up multiple agents that work in parallel as a team that coordinates autonomously.»
Opus 4.6 was also built by Claude. It seems to have become an industry standard for labs to use their own coding tools to build new models: GPT-5.3-Codex was built in a similar manner.
As for benchmarks, it beats most frontier models on almost every one of them. It scores 65.4% on the coding-focused Terminal-Bench 2.0, 68.8% on the difficult ARC-AGI-2, and 53% on Humanity’s Last Exam for general reasoning.
Also new with this model are «Adaptive thinking,» which lets Claude itself decide when to use deeper reasoning, and user-set «Effort» levels for each query, which could save a few tokens.
Read more: Anthropic’s introduction, TechCrunch, CNBC. Discussion on r/Singularity.