
The new reasoning models hold a slight lead on many benchmarks and therefore earn the right to be called state of the art. Of particular note, OpenAI claims in its launch post that they improve on o1 and o3-mini by almost 30% on the coding benchmark SWE-Bench Verified.
“These are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers,” says OpenAI.












The AI industry has agreed with the US president on a series of voluntary measures to improve safety, secure research in priority areas, and, not least, flag content created by artificial intelligence.