A study in the Opus 4.8 card finds that multi-agent systems do not score higher on ProgramBench than single-agent setups, but they reach mediocre solutions about twice as fast.
GPT-5.5 achieved the first solve on ProgramBench, a difficult software-engineering (SWE) benchmark, significantly outperforming Opus 4.7.