Artificial Analysis | Google's Go To Website for Benchmaxxing | Gemini 3.1 Pro is nowhere near Opus 4.7 in real life use
Summary
A post arguing that Google's Gemini 3.1 Pro falls well short of Opus 4.7 in real-world use, and characterizing Artificial Analysis as Google's go-to site for benchmark-maximizing ("benchmaxxing").
Similar Articles
Gemini 3.5 Flash Looks Good For How Fast It Is (8 minute read)
Google released Gemini 3.5 Flash, a hybrid speed model that rivals Opus 4.7 and GPT-5.5 in speed and cost while performing well on agentic and coding benchmarks.
Gemma 4 31B's competence surprised me
A user shares anecdotal findings that Gemma 4 31B outperforms Qwen 3.6 models and matches Opus 4.7 in understanding and refactoring messy academic code, highlighting a benchmark (SciCode) where Gemma excels.
Been picking frontier models on benchmarks that don't match our deployment conditions
The article highlights a performance rank-order flip between Claude Opus and Gemini Pro on a forecasting benchmark, depending on whether models perform their own web research or are given fixed evidence. This suggests that Opus excels at the research phase while Gemini is superior at judgment over fixed evidence, exposing a mismatch between standard benchmarks and actual deployment conditions.
Should we totally give up on Gemini for coding?
A user reports that Gemini 3.1 Pro significantly underperforms Codex and Claude for coding, likening it to an inexperienced junior developer, and doubts Google's ability to compete in frontier coding models.
Gemini 3.5 Flash looks worse than it seems on Artificial Analysis
A comparison showing that Gemini 3.5 Flash scores slightly lower than Gemini 3.1 Pro on Artificial Analysis benchmarks and costs more to run across the full benchmark suite despite its lower per-token API pricing.