DeepSeek enters the fight for token volume, Anthropic continues to dominate spend (12 minute read)

TLDR AI 06/10/26, 12:00 AM News

ai-gateway token-volume cost-efficiency production-ai anthropic deepseek model-mix

Summary

AI Gateway's May 2026 data shows DeepSeek's token share surged to 17% with minimal spend, while Anthropic retained 65% of spend, indicating cost-conscious routing and growing overall usage.

DeepSeek's share of tokens on AI Gateway jumped from under 1% to 17% in a single month while its share of spend stayed near 1%.

Original Article


Cached at: 06/11/26, 12:12 AM


Every month, AI Gateway routes tens of trillions of tokens between production applications and AI labs, giving us visibility into what AI usage actually looks like, separate from leaderboards and benchmarks.

May 2026 production index summary

Total AI Gateway tokens grew 20% MoM; total spend grew 43% MoM. Customers paid almost 20% more per token on average than in April.
DeepSeek’s share of tokens jumped from under 1% to 17% in a single month, while its share of spend stayed near 1%.
Anthropic’s share of spend grew from 61% to 65% in May, and it held 70–80% of spend in every high-stakes use case (AI app generation, back office agents, and coding agents).
Cost-consciousness meant smarter routing between low-cost and frontier models. Customers got more deliberate about which model did which work, while overall usage kept climbing.
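
The per-token figure in the first point follows from the two growth rates. A minimal sketch of that arithmetic, assuming the published growth figures are exact (the report rounds them):

```python
# Month-over-month growth figures from the May 2026 summary.
token_growth = 0.20   # total tokens grew 20% MoM
spend_growth = 0.43   # total spend grew 43% MoM

# Average price per token = spend / tokens, so its growth factor is the
# ratio of the two growth factors.
price_per_token_change = (1 + spend_growth) / (1 + token_growth) - 1

print(f"Average price per token changed by {price_per_token_change:.1%}")
```

The result is a little over 19%, which matches the "almost 20%" in the summary.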

Last month, headlines about blown token budgets dominated tech news: Uber burned through its annual Claude Code budget shortly after Q1 and Amazon shut down KiroRank to curb unproductive tokenmaxxing. While runaway cost is a real problem, this month’s report shows that spend on production use cases still increased.

Two insights emerged from AI Gateway data in May:

Low-cost models entered production: New models shipped at price points that made the established labs look even more expensive, and they are capable enough to enter the mix in production.
Spend is increasing, but with smarter model mixes: Teams are still increasing token budgets, but they are implementing smarter routing strategies to get more value out of every dollar.

Low-cost models saw significant production volume for the first time

From February to April, volume distribution across labs on AI Gateway changed slowly, but in May, DeepSeek V4’s launch shifted token share sharply. DeepSeek, which barely registered in April, became AI Gateway’s third-largest provider by volume in May, without a significant impact on overall spend.

In April, DeepSeek accounted for less than 1% of AI Gateway tokens and less than 0.2% of spend. In May, its volume share jumped to 17% of tokens, putting it in third place, ahead of OpenAI. Almost all of the volume comes from two models: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro, both released in May.

In May 2026, DeepSeek held 17% of monthly tokens, putting it third on the gateway by token volume.

The spend picture tells the other half of the story. Even though DeepSeek’s token share grew to 17% in a single month, its cost share stayed near 1%.

DeepSeek V4 Flash launched at $0.14 input / $0.28 output per million tokens, roughly 20–50× cheaper than comparable Anthropic models and 8–12× cheaper than other value-tier flagships like Qwen 3.6 Plus and Kimi K2.6. With a savings gap that big, teams adopted V4 Flash quickly.
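
To make the price point concrete, here is a rough sketch of what a workload costs at the quoted V4 Flash list prices. The workload size and the `workload_cost_usd` helper are hypothetical, chosen only for illustration; real bills may differ through caching, discounts, or gateway fees that the article does not cover.

```python
# DeepSeek V4 Flash list prices quoted in the article, in USD per million tokens.
V4_FLASH_INPUT_PER_M = 0.14
V4_FLASH_OUTPUT_PER_M = 0.28


def workload_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_m: float, output_per_m: float) -> float:
    """Cost of a workload at per-million-token list prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical monthly workload: 500M input tokens and 100M output tokens.
cost = workload_cost_usd(500_000_000, 100_000_000,
                         V4_FLASH_INPUT_PER_M, V4_FLASH_OUTPUT_PER_M)
print(f"${cost:,.2f}")
```

This workload comes to about $98. If a model were 20× to 50× more expensive on both input and output (an assumption; the article does not break the multiple down), the same workload would cost roughly $1,960 to $4,900.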

DeepSeek was prominent in the previous token volume chart, but is nearly invisible in this spend chart.

Price alone wouldn’t have shifted DeepSeek’s volume that much in a month. The shift suggests that teams testing DeepSeek V4 against their existing evals found the output good enough to ship, not just cheap enough to try.

Value-tier models have always existed on AI Gateway, but none has captured share at this scale, which suggests DeepSeek V4 was the first model at its price point to clear the quality bar for production work.

Frontier labs continued to capture a majority of new spend

Even as the low-cost end of the market grew fastest in volume, the expensive end grew faster in dollars.

Anthropic’s token share grew from 26% to 32%, and its spend share from 61% to 65%. OpenAI’s token share held near 13%, but its spend share ticked up from 12% to 13% on a much larger total, so customers were paying more per OpenAI token in May.
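
The step from share changes to per-token price can be sketched as follows. This is a back-of-the-envelope estimate that combines each lab’s share shift with the gateway-wide growth figures (tokens up 20%, spend up 43%). It assumes the rounded shares are exact and that OpenAI’s token share was 13% in both months, so the outputs are indicative only. The function name is hypothetical.

```python
def per_token_price_change(tokens_apr: float, tokens_may: float,
                           spend_apr: float, spend_may: float,
                           total_token_growth: float = 0.20,
                           total_spend_growth: float = 0.43) -> float:
    """Estimated MoM change in a lab's average price per token.

    A lab's spend scales with (share change) x (total spend growth), and its
    tokens scale the same way; price per token is the ratio of the two.
    """
    spend_factor = (spend_may / spend_apr) * (1 + total_spend_growth)
    token_factor = (tokens_may / tokens_apr) * (1 + total_token_growth)
    return spend_factor / token_factor - 1


# Shares as quoted in the article (rounded percentages).
openai_change = per_token_price_change(0.13, 0.13, 0.12, 0.13)
anthropic_change = per_token_price_change(0.26, 0.32, 0.61, 0.65)

print(f"OpenAI: {openai_change:+.1%} per token")
print(f"Anthropic: {anthropic_change:+.1%} per token")
```

Under these assumptions, OpenAI’s average price per token rose by roughly 29% and Anthropic’s by about 3%. Because the inputs are rounded to whole percentages, small share differences move these estimates a lot.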

The average token got more expensive in May, even with DeepSeek pulling the average down. That increase happened because the work that demands frontier models grew faster than the work that doesn’t. The AI coding agent use case shows the low-cost/frontier split most clearly:

DeepSeek drove 49% of the segment’s token volume, but only 4% of the cost.
Anthropic drove 28% of tokens and 70% of the cost.
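
The two figures above imply a large gap in effective price per token within the segment. A minimal sketch of the comparison, using only the four percentages quoted (variable names are illustrative):

```python
# Coding agent segment shares in May 2026, as quoted above.
deepseek = {"tokens": 0.49, "cost": 0.04}
anthropic = {"tokens": 0.28, "cost": 0.70}

# Cost share divided by token share gives each lab's average price per token
# relative to the segment-wide average.
deepseek_rel = deepseek["cost"] / deepseek["tokens"]
anthropic_rel = anthropic["cost"] / anthropic["tokens"]

# Ratio of the two relative prices: how many times more an Anthropic token
# cost than a DeepSeek token in this segment.
ratio = anthropic_rel / deepseek_rel
print(f"Anthropic tokens cost about {ratio:.0f}x as much as DeepSeek tokens in this segment")
```

The ratio comes out at roughly 30×, which falls inside the 20–50× list-price gap cited for DeepSeek V4 Flash.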

Lower-cost models are now a meaningful part of production workflows, but frontier model use is still growing, driving the increase in overall spend.

In April 2026, xAI and MiniMax drove significant token volume in the coding agent use case.

In May 2026, DeepSeek took almost half of the coding agent use case, with xAI and MiniMax dropping off significantly. Back-office workloads stayed Anthropic-heavy across both months.

The frontier is getting more expensive per token, and customers are still paying. Anthropic continues to lead on spend, taking 65% of all gateway spend in May, and 70–80% of spend across every high-stakes use case.

In April 2026, Anthropic was the go-to frontier lab for high-stakes use cases like AI app generation, back office agents, and AI coding agents.

Anthropic continued to own high-stakes use cases in May 2026, even with DeepSeek V4’s significant gain in token volume.

Cost discipline became a routing strategy

Increased overall spend showed that demand for AI continued to grow in May, but teams applied more precision to their budgets through routing. They sent the cheap, high-volume work to lower-priced models and used frontier models where quality mattered most. Slow adoption of Google’s latest Flash model is a clear example.

Gemini 3.5 Flash launched in May at a higher price point than Gemini 3.0 Flash, but migration didn’t happen at scale. By month-end, 3.5 held only 7% of the Flash family’s tokens while 3.0 Flash held 90%.

When Gemini 3.5 Flash launched in May at a higher price than Gemini 3.0 Flash, migration didn’t happen at scale.

Compared to the rapid adoption of Gemini 3.1 Pro across February and March, slower migration to 3.5 Flash shows that teams happy with 3.0 Flash aren’t willing to pay the higher cost yet.

When Gemini 3.1 Pro launched in February, it gained 30% adoption immediately, and by the next month was the dominant model in the family.

Conclusion: Cost-effective, capable options mean smarter model mixes

This month’s report signals increased price sensitivity in the market, even as overall spend and token volume grow. That means developers are looking for ways to get more out of every dollar.

Data revealed two optimization strategies:

Using DeepSeek’s cheap but capable V4 family for lower-risk, high-volume tasks
Choosing to delay model family upgrades until the ROI makes sense

Routing gives teams the ability to adjust their model mix, and budget, in real time as the labs compete for different layers of production AI workloads.
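
As an illustration of this kind of routing, here is a minimal sketch of a rule that sends low-risk, eval-approved work to a low-cost model and everything else to a frontier model. It is not AI Gateway’s API. The `Task` fields, the `choose_model` function, and the frontier model ID are hypothetical; only the DeepSeek model ID comes from the article.

```python
from dataclasses import dataclass

# The DeepSeek ID appears in the article; the frontier ID is a hypothetical
# placeholder, not a real model name.
LOW_COST_MODEL = "deepseek/deepseek-v4-flash"
FRONTIER_MODEL = "example-lab/frontier-model"  # hypothetical


@dataclass(frozen=True)
class Task:
    name: str
    high_stakes: bool    # quality-critical work, by the team's own judgment
    passed_evals: bool   # low-cost model cleared the team's evals for this task


def choose_model(task: Task) -> str:
    """Use the low-cost model only for low-risk work it has passed evals on."""
    if not task.high_stakes and task.passed_evals:
        return LOW_COST_MODEL
    return FRONTIER_MODEL


tasks = [
    Task("summarize build logs", high_stakes=False, passed_evals=True),
    Task("generate production app code", high_stakes=True, passed_evals=True),
    Task("classify support tickets", high_stakes=False, passed_evals=False),
]
for task in tasks:
    print(f"{task.name} -> {choose_model(task)}")
```

Keeping the decision in one small function means the model mix can change by editing a rule or a constant rather than every call site.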

You can read the full report on the Vercel Blog.
