
Claude Opus 4.5 scores 80.9% on SWE-bench Verified, currently the preferred benchmark for coding. Gemini 3 scores 76.2% on the same benchmark, and GPT-5.1-Codex-Max registers 77.9%.
Continue reading “Anthropic launches Claude Opus 4.5, with top scores in most benchmarks”
