Measured token consumption across 4 agent runtimes doing the same tasks. Costs ranged from 1x to 4x depending on cache architecture

Reddit r/AI_Agents 05/27/26, 04:12 AM News

agent-runtimes token-consumption prompt-caching claude-code open-source ai-cost comparison

Summary

A comparison of token consumption across four agent runtimes (Claude Code, OpenClaw, Hermes, and OpenClacky) on the same tasks reveals costs ranging from 0.8x to 4x relative to Claude Code, driven by differences in cache architecture and tool schema design.

I've been digging into why some agent runtimes burn through tokens so much faster than others, even when using the same model. Ran a controlled comparison on three real tasks and the gap was bigger than I expected.

Setup: same model (Claude Sonnet), same tasks, measuring total input + output tokens. The agents tested were Claude Code, OpenClaw, Hermes, and ours (OpenClacky, open source).

Rough results, normalized to Claude Code as 1.0x:

- Hermes: ~3-4x. It ships 52 built-in tools. Every API call sends the full schema. That's 10-25k tokens of tool definitions per turn. If the schema shifts (dynamic tools), the whole thing is a cache miss.
- OpenClaw: ~1.5x. Solid runtime, but skill loading touches the system prompt, which breaks prefix matching on every skill invocation.
- Claude Code: 1.0x baseline. Good cache engineering, closed-source.
- OpenClacky: ~0.8x. 16 tools, frozen system prompt, double cache markers. Cache hit rate stays above 90%.

The underlying issue is pretty simple. On every turn, the API receives: system prompt + tool definitions + full conversation history. If prompt caching hits, you pay 1/10th price (Anthropic) or half price (OpenAI) for everything the model has already seen. If it misses, full price for all of it again.

Most runtimes break their own cache without realizing it. The common ways:

- Adding or removing tools mid-session changes the system prompt bytes
- Loading new context into the system prompt (skills, memory, rules)
- Compressing history at the wrong time rewrites what was already cached
- Model switches split the cache namespace

The fix isn't complicated in concept: freeze the prefix, put dynamic state elsewhere, use rolling cache markers so history growth doesn't invalidate prior turns. Took us two failed architectures and eight months to get the ordering right though.

If you're running local models through something like LiteLLM or a local OpenAI-compatible server, it works. Cache benefits depend on your provider though. Anthropic and OpenAI have the best caching infra right now. Local setups still benefit from the smaller prompts regardless.

Happy to go deeper on methodology if anyone wants.
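
To make the cache arithmetic concrete, here is a rough Python sketch of the billing model described in the post. It is a toy model, not any runtime's real accounting. The names (`Segment`, `cached_tokens`, `session_cost`, `build_session`) and the token counts (a 15k-token prefix, 1k tokens per turn, 10 turns) are made up for illustration. It assumes the provider bills the longest leading run of unchanged segments at the cached rate and everything after it at full price. It ignores output tokens, any cache-write surcharge, cache expiry, and minimum cacheable lengths.

```python
from typing import List, Tuple

# A request is modeled as an ordered list of (content id, token count) segments.
# Hypothetical representation: real APIs match on the actual prompt content.
Segment = Tuple[str, int]


def cached_tokens(prev: List[Segment], cur: List[Segment]) -> int:
    """Tokens in the longest run of leading segments shared with the previous request."""
    total = 0
    for old, new in zip(prev, cur):
        if old != new:
            break  # the first changed segment ends the cacheable prefix
        total += new[1]
    return total


def session_cost(requests: List[List[Segment]], cached_rate: float = 0.1) -> float:
    """Input cost of a session, in units of full-price tokens."""
    cost = 0.0
    prev: List[Segment] = []
    for req in requests:
        total = sum(count for _, count in req)
        hit = cached_tokens(prev, req)
        cost += hit * cached_rate + (total - hit)
        prev = req
    return cost


def build_session(turns: int, prefix_tokens: int, turn_tokens: int,
                  frozen_prefix: bool) -> List[List[Segment]]:
    """Build one request per turn: prefix (system prompt + tools) plus growing history."""
    requests: List[List[Segment]] = []
    history: List[Segment] = []
    for t in range(turns):
        # An unfrozen prefix gets a new id each turn, standing in for a tool or
        # skill change that alters the bytes at the front of the prompt.
        prefix_id = "prefix" if frozen_prefix else f"prefix-v{t}"
        history.append((f"turn-{t}", turn_tokens))
        requests.append([(prefix_id, prefix_tokens)] + history)
    return requests


frozen = session_cost(build_session(10, 15_000, 1_000, frozen_prefix=True))
unstable = session_cost(build_session(10, 15_000, 1_000, frozen_prefix=False))
print(f"frozen prefix:   {frozen:,.0f} full-price token units")
print(f"unstable prefix: {unstable:,.0f} full-price token units")
print(f"ratio:           {unstable / frozen:.2f}x")
```

With these invented inputs the unstable session costs several times more than the frozen one. That gap is a property of the toy parameters, not a reproduction of the measurements above. Passing `cached_rate=0.5` models the half-price case mentioned for OpenAI.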

Similar Articles

Subagents Account for Most Token Costs in Long Agent Runs: Fixes That Cut Usage 70 to 90 Percent in Practice

OpenClaw + Hermes users: how many agents are you actually running day to day?

@ClementDelangue: Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents! The popular t…

Claude Token Counter, now with model comparisons

@_avichawla: A smarter Claude model burns more tokens, not fewer! And it's not a minor 3-5% difference. But 54% higher token usage. …
