model-performance

#model-performance

Opus 4.8 Thinking keeps sliding (again) on LMArena's Hard Prompts English benchmark

Reddit r/singularity ↗ · Yesterday

Opus 4.8 Thinking keeps sliding on LMArena's Hard Prompts English benchmark, scoring 23 points below Opus 4.6 Thinking, which still holds the top spot.

#model-performance

Reddit r/LocalLLaMA ↗ · 2026-05-24

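Discusses the performance trade-offs of offloading large AI model weights from GPU VRAM to system RAM, comparing how different GPU configurations (such as the RTX 5090 and RTX 6000) perform when running models such as DeepSeek V4 Pro.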

#model-performance

X AI KOLs Following ↗ · 2026-05-20 (cached)

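swyx revisits Sam Altman's idea of building companies that get better as AI models get better, ties it to the emerging concept of Agent Labs, and points to a clear correlation with the revenue surge in Q4 2025.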

#model-performance

Reddit r/singularity ↗ · 2026-05-19

Discusses benchmark results for the Gemini 3.5 Flash model, possibly showing how it performs across a range of AI tasks.

#model-performance

Reddit r/LocalLLaMA ↗ · 2026-04-22

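A user reports that switching from the heavily compressed IQ4_XS quant to the larger IQ4_NL_XL greatly improved Qwen 3.6's accuracy on agentic coding. Throughput (tok/s) drops, but the user strongly recommends choosing the larger quant whenever VRAM allows.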
