Fable 5 below even Gemini 3.1 on LiveBench
Summary
A discussion of LiveBench results that show Fable 5 scoring below Gemini 3.1, asking whether the benchmark is flawed or whether Anthropic is optimizing for benchmarks.
Similar Articles
Fable 5 benchmark with remotion video
Fable 5 shows overall improvement over Opus 4.8 in video generation benchmarks, but Gemini 3.1 Pro demonstrates more artistic vision despite issues with tool calls and buggy code.
Gemini 3.5 Flash Benchmarks
A discussion of benchmark results for the Gemini 3.5 Flash model.
Gemini 3.5 flash scores, hasn’t even beat GPT 5.4 xhigh
Gemini 3.5 Flash has posted benchmark scores but has not yet surpassed GPT 5.4 xhigh.
Claude Fable 5 benchmarks
Anthropic released benchmarks for Claude Fable 5, a new AI model, showing significant performance improvements.
Gemini 3.5 Flash looks worse than it seems on Artificial Analysis
Comparison showing that Gemini 3.5 Flash scores slightly lower than Gemini 3.1 Pro in Artificial Analysis benchmarks and has a higher total benchmark cost despite lower per-token API pricing.