@yacineMTB: If this keeps up, everyone is going to switch to got 5.5 if they haven't already. It really seems like if you are still…
Summary
YacineMTB argues that GPT 5.5 (written as "got 5.5" in the tweet, likely a typo) far surpasses Anthropic's Opus models and predicts that users will switch away from Opus. Dylan Field criticizes Opus 4.8 for degraded curiosity, increased sycophancy, and heavy hedging.
View Cached Full Text
Cached at: 05/31/26, 12:28 AM
If this keeps up, everyone is going to switch to got 5.5 if they haven’t already. It really seems like if you are still using opus, you are simply just incapable of telling the difference. I’m just shocked at how big the gap is myself. Is anthropic done for?
Dylan Field (@zoink): Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model’s curiosity (already worse in 4.7) degraded further. Result is a judgmental personality + sycophancy + sooo much hedging. Basically the opposite of Opus 3.
Similar Articles
@bentossell: wait… if most people think 5.5 is better than 4.7, i assume that’s due to terminal coding benchmark… 4.8 is still outpe…
The tweet discusses the release of Claude Opus 4.8, which improves upon Opus 4.7 with sharper judgment and longer independent work, though it notes that 5.5 (presumably GPT 5.5) still outperforms it on a terminal coding benchmark.
@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…
Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.
@shikhr_: After Opus 4.9 Anthropic has no choice but to release Opus 5
A tweet speculates that Anthropic must release Opus 5 after Opus 4.9.
@omarsar0: Same here. Happy with Opus 4.8 (planning) and GPT-5.5 (execution). Also, breaking steps into smaller ones for increasin…
A developer shares satisfaction with Opus 4.8 for planning and GPT-5.5 for execution, emphasizing that breaking tasks into smaller steps improves quality and that dynamic workflows are underrated.
@sashimikun_void: GPT-5.5 outperformed Claude Opus 4.8 on the DEEPSWE benchmark. Opus 4.8 takes twice as long, generates three times the …
GPT-5.5 outperforms Claude Opus 4.8 on the DEEPSWE benchmark, achieving higher scores with lower cost and less token bloat.