Opus 4.8 Thinking continues to deteriorate on the Hard Prompts English benchmark on LMArena, scoring 23 points lower than Opus 4.6 Thinking, which retains the top spot.
A discussion of the performance trade-offs of offloading large AI model weights from GPU VRAM to system RAM, comparing GPU configurations such as the RTX 5090 and RTX 6000 for models like DeepSeek V4 Pro.
swyx reflects on Sam Altman's idea of building businesses that improve as AI models improve, linking it to the emerging concept of Agent Labs, and notes a clear correlation with revenue spikes in Q4 2025.
Benchmark results for Gemini 3.5 Flash, likely covering its performance across a range of AI tasks.
A user reports that switching from the highly compressed IQ4_XS quant of Qwen 3.6 to the larger IQ4_NL_XL quant dramatically improves agentic-coding accuracy despite lower tok/s, and urges others to favor bigger quants when VRAM allows.