Tag
Fable 5 shows overall improvement over Opus 4.8 in video generation benchmarks, but Gemini 3.1 Pro demonstrates more artistic vision despite issues with tool calls and buggy code.
A comparison suggests that Google's Gemini 3.1 Pro underperforms Opus 4.7 in real-world usage; the article also highlights Artificial Analysis as a go-to benchmarking resource.
A user complains that the quality of Anthropic's Claude Opus model has kept declining from version 4.7 to 4.8 and is considering canceling their subscription.
EyeBench-V3 visual benchmark evaluates Claude Opus 4.8, finding it still fails basic vision tasks, similar to IBench. The benchmark is introduced via a Twitter thread by Adonis Singh.
YacineMTB argues that GPT 5.5 (likely a typo) surpasses Anthropic's Opus models, suggesting users are switching away from Opus. Dylan Field criticizes Opus 4.8 for degraded curiosity and increased sycophancy.
Nick Kang adds a new task to his Twitter benchmark collection; Claude Opus 4.8 and other SOTA models pass, while Sonnet 4.6 and Grok 4.3 fail. Alfin remarks on Opus 4.8's dangerous capabilities.
The results of DeepSWE Opus 4.8 have been released, showcasing its performance on benchmarks.
Claude Opus 4.8 allows adding system instructions mid-conversation without breaking the prompt cache, reducing cost and latency for API requests.
A comparison of Opus and Qwen AI coding agents on the same bug and repo shows one agent finished 7x faster, sparking discussion on skills for single-prompt GitHub issue solving.
Anthropic releases Claude Opus 4.8, building on Opus 4.7 with sharper judgment and longer independent work capability, available at the same price.
The tweet discusses the release of Claude Opus 4.8, which improves upon Opus 4.7 with sharper judgment and longer independent work, though it notes that version 5.5 still outperforms it on a terminal coding benchmark.
A user asks what percentage of the model weights changed between Opus 4.7 and Opus 4.8.
User observes that the opus-4.8 model has degraded in performance since its launch.
Anthropic is preparing to release Opus 4.8, potentially alongside a release from OpenAI, marking a rare dual release event.
tunecat is a simple, self-hosted internet radio player controlled via IRC, written in pure Go with Opus transcoding. It runs as a lightweight server that serves audio files and responds to IRC commands.
Boris Cherny recommends using auto mode in Claude Code for parallel sessions, and ClaudeDevs announces that auto mode is now available on the Pro plan and supports Sonnet 4.6 and Opus 4.7.
A developer shares experience using cheap AI models (DeepSeek v4, Hunyuan Hy3 preview) to automate 90% of coding tasks, with Opus reserved for the harder 10%, highlighting cost and latency trade-offs.
A claim is made that a new AI model is comparable to Opus, a top-tier model, suggesting a significant advancement in performance.
The article discusses the unexpected rise in costs for advanced AI models such as Opus 4.7, GPT 5.5, and Gemini 3.5 Flash, in contrast with earlier expectations that prices would fall.
A developer shares how they reduced their AI agent's weekly cost from $200 to $40 by routing simple subtasks to cheaper models like DeepSeek V4 Pro and Tencent Hunyuan while keeping complex reasoning on Opus 4.7, achieving comparable output quality for most work.
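The routing idea in the last item can be sketched roughly as follows. This is a minimal illustration, not the developer's actual setup: the `Subtask` type, the `complexity` score, the `threshold` value, and the model identifier strings are all hypothetical placeholders, and no real provider API is called.

```python
from dataclasses import dataclass

# Hypothetical tier labels; the source names the models but not how they are addressed.
CHEAP_MODEL = "cheap-model"
STRONG_MODEL = "strong-model"


@dataclass
class Subtask:
    description: str
    complexity: float  # assumed score in [0, 1] from some upstream heuristic


def route(task: Subtask, threshold: float = 0.7) -> str:
    """Send a subtask to the strong model only when its score reaches the threshold."""
    return STRONG_MODEL if task.complexity >= threshold else CHEAP_MODEL


def percent_saved(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) * 100 / before


if __name__ == "__main__":
    tasks = [
        Subtask("rename a variable", 0.1),
        Subtask("design a migration plan", 0.9),
    ]
    for t in tasks:
        print(t.description, "->", route(t))
    # Weekly cost figures quoted in the item: $200 before, $40 after.
    print(percent_saved(200, 40))
```

With the figures quoted in the item, going from $200 to $40 per week is an 80% reduction. The threshold is the main tuning knob in a setup like this: lowering it sends more work to the expensive model and raises the bill.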