Opus 4.7 scores lower on SimpleBench than 4.6 and 4.5

Reddit r/singularity 2026/04/22 13:03 Models

model-evaluation performance-regression benchmarking

Summary

Claude Opus 4.7 scored lower on the SimpleBench evaluation than versions 4.6 and 4.5.

Similar articles

Reddit r/singularity

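On the MineBench 3D block-structure benchmark, Opus 4.8 achieved higher build quality at lower cost than Opus 4.7, though its results were somewhat inconsistent. The model also showed more concise reasoning and greater inference efficiency.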

X AI KOLs Timeline

Anthropic's Opus 4.8 scored 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration features let Codex delegate UI tasks to Claude Code.

Reddit r/singularity

A detailed comparison of Claude Opus 4.8 and Claude Fable 5 on the MineBench benchmark, highlighting the trade-offs between reasoning time, cost, build quality, and prompt sensitivity.

Reddit r/singularity

ProgramBench results show Fable 5 scoring twice as high as Opus 4.8, even though it falls back to 4.8 in 99% of runs.

Reddit r/singularity

DeepSWE results for Opus 4.8 have been published, showing how the model performs on the benchmark.