Ranked AI models by what people actually use instead of benchmark scores - the benchmark champion barely makes the top 20

Reddit r/singularity 05/25/26, 12:28 PM News

ai-models benchmarks usage ranking open-source llm adoption

Summary

A ranking of AI models by real usage, cost, and speed shows that benchmark leaders often trail in actual adoption: Gemini 3.1 Pro, the top benchmark scorer, sits at only about #17, while GPT-5 comes out #1 and Google's most-used model is the cheaper, faster Flash Lite.

Most model leaderboards are just benchmark scores. I've been building one that ranks by real usage instead - how much each model is actually being run and talked about, plus cost and speed - and the order comes out almost unrecognisable.

A few that stood out:

* Gemini 3.1 Pro has the best benchmark scores of any model right now. By real usage it's only about #17 - it's still a preview, so hardly anyone's actually using it yet.
* Google's most-used model isn't the Pro at all, it's the cheaper, faster Flash Lite. People reach for the cheap one, not the smartest one.
* GPT-5.5 would sit near the top on benchmarks alone, but by usage it's around #22 - it's new and expensive, so most people haven't switched to it.
* The model that comes out #1 overall isn't the benchmark leader either - it's GPT-5, which wins on sheer usage and how much it's talked about. OpenAI holds 6 of the top 7 the same way.

The pattern across all of it: the best model on paper and the one people actually use are rarely the same, and usage tends to lag the benchmarks by a few weeks while people try a new release and decide if it's worth switching.

Makes me wonder how much the benchmark race really matters to normal users versus price and availability. Do you actually use the top-benchmark model, or just whatever's cheap and fast enough?

*(From an open-source ranking I've been building: AgentTape - if anyone wants the raw data!)*
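For anyone curious what "ranking by usage, talk, cost and speed" could look like in practice, here is a minimal sketch. The post does not say how AgentTape actually combines its signals, so everything here is an assumption for illustration: the `ModelStats` fields, the min-max scaling, the `WEIGHTS` values, and the placeholder model names and numbers are all made up and are not AgentTape data.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    # All fields and units are hypothetical, chosen only for this sketch.
    name: str
    usage: float     # how much the model is run (e.g. requests served)
    mentions: float  # how much it is talked about
    cost: float      # price per unit of work; lower is better
    speed: float     # throughput; higher is better


def min_max(values, invert=False):
    """Scale values to [0, 1]; invert for 'lower is better' signals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread, so treat all as equal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


# Assumed weights: usage dominates, cost and speed act as tie-breakers.
WEIGHTS = {"usage": 0.5, "mentions": 0.2, "cost": 0.15, "speed": 0.15}


def rank_models(models, weights=WEIGHTS):
    """Return (name, score) pairs sorted from highest to lowest score."""
    usage = min_max([m.usage for m in models])
    mentions = min_max([m.mentions for m in models])
    cost = min_max([m.cost for m in models], invert=True)
    speed = min_max([m.speed for m in models])
    scored = []
    for i, m in enumerate(models):
        score = (
            weights["usage"] * usage[i]
            + weights["mentions"] * mentions[i]
            + weights["cost"] * cost[i]
            + weights["speed"] * speed[i]
        )
        scored.append((m.name, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder models with invented numbers, not real measurements.
models = [
    ModelStats("cheap-fast", usage=900, mentions=400, cost=0.5, speed=200),
    ModelStats("benchmark-leader", usage=50, mentions=300, cost=10.0, speed=60),
    ModelStats("mid-tier", usage=400, mentions=100, cost=2.0, speed=120),
]
ranking = rank_models(models)
for name, score in ranking:
    print(name, score)
```

With weights like these, a model that is barely used lands at the bottom no matter how strong it looks on paper, which is the same effect described above for the preview and newly released models. Different weights would of course give a different order.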


Similar Articles

The "One-Size-Fits-All" AI era is dead. I benchmarked GPT-5.5, Claude 4.7, Gemini 3.1 Pro, and DeepSeek V4 Pro here is the actual state of the frontier.

Here is the current "Free-Tier AI Stack" for 2026

Does anyone else feel like AI benchmarks are becoming less useful for predicting real-world performance?

@aaron_epstein: New model just released that beats sonnet 4.6, gemini 3 flash, and gpt 5.4 mini on OCR, vision, and STT tasks @interfaz…

Arena.ai is running possibly the most fraudulent benchmark thus far
