Artificial Analysis | Google's Go To Website for Benchmaxxing | Gemini 3.1 Pro is nowhere near Opus 4.7 in real life use

Reddit r/singularity 06/07/26, 11:39 AM News

ai-models benchmarking google gemini opus performance-comparison opinion

Summary

A post arguing that Google's Gemini 3.1 Pro falls well short of Opus 4.7 in real-world use despite its benchmark results, and characterizing Artificial Analysis as the site Google targets for "benchmaxxing."

Similar Articles

Gemini 3.5 Flash Looks Good For How Fast It Is (8 minute read)

TLDR AI

Google released Gemini 3.5 Flash, a hybrid speed model that rivals Opus 4.7 and GPT-5.5 in speed and cost while performing well on agentic and coding benchmarks.

Gemma 4 31B's competence surprised me

Reddit r/LocalLLaMA

A user shares anecdotal findings that Gemma 4 31B outperforms Qwen 3.6 models and matches Opus 4.7 in understanding and refactoring messy academic code, highlighting a benchmark (SciCode) where Gemma excels.

Been picking frontier models on benchmarks that don't match our deployment conditions

Reddit r/AI_Agents

The post reports that Claude Opus and Gemini Pro swap rank order on a forecasting benchmark depending on whether the models do their own web research or are given fixed evidence. The author takes this to mean that Opus is stronger in the research phase while Gemini is stronger at judgment over fixed evidence, exposing a mismatch between standard benchmarks and actual deployment conditions.

Should we totally give up on Gemini for coding?

Reddit r/AI_Agents

A user reports that Gemini 3.1 Pro significantly underperforms Codex and Claude for coding, likening it to an inexperienced junior developer, and doubts Google's ability to compete in frontier coding models.

Gemini 3.5 Flash looks worse than it seems on Artificial Analysis

Reddit r/singularity

A comparison showing that Gemini 3.5 Flash scores slightly lower than Gemini 3.1 Pro on Artificial Analysis benchmarks and costs more to run across the full benchmark suite despite its lower per-token API pricing.