Ran 35 agent trials across 4 browser-snapshot formats - pass rate was identical, token cost wasn't

Reddit r/AI_Agents 07/01/26, 01:46 PM Tools

browser-agent token-efficiency opera mcp open-source dev-tool

Summary

Opera's team tested browser snapshot formats for AI agents, finding identical pass rates but significantly lower token cost with their compressed format (opera-compact), which eliminates redundant ARIA attributes and repeated URLs. They released it as an open-source MCP server.

Not going to pretend, I'm on the team at Opera that builds browser tooling for agents, so take the numbers with whatever grain of salt that deserves. Question we kept running into: when an agent drives a browser, how much of its context actually goes to the task vs. re-reading the same page structure every step? Wanted a real number instead of a guess. Setup: 7 browser tasks (adapted from AXI's bench-browser suite), gpt-5.5 medium reasoning, 5 runs per condition. | Format | Pass | Avg input tokens | Tool calls | |---------------------------------------|------|------------------|------------| | unprocessed MCP (chrome-devtools-mcp) | 100% | 179.2k | 2.1 | | AXI reference CLI | 100% | 102.2k | 1.5 | | our raw output, compression off | 100% | 107.5k | 1.6 | | our compressed format (opera-compact) | 100% | 36.3k | 1.4 | Same pass rate everywhere - the difference is entirely in what the agent pays to get there. The compression comes from cutting stuff that's genuinely redundant: ARIA attributes implicit for a given role anyway, echoed text nodes, repeated URLs collapsed into a lookup table instead of inlined every time. Ships as opera-browser-cli / opera-devtools-mcp, Apache-2.0, plugs in as an MCP server: npm install -g opera-browser-cli && opera-browser-cli setup If you're running multi-agent pipelines where a sub-agent re-snapshots constantly, curious whether this gap widens or shrinks at that scale - we only tested single-agent loops. (links in comment)

Original Article

Ran 35 agent trials across 4 browser-snapshot formats - pass rate was identical, token cost wasn't

Similar Articles

The "browser agents are expensive and still maturing" framing might be missing something architectural

Measured token consumption across 4 agent runtimes doing the same tasks. Costs ranged from 1x to 4x depending on cache architecture

Agent Execution Tax: new procurement metric for browser agent benchmarks?

I benchmarked how badly AI agents read raw HTML. The gap was bigger than I expected.

Cut my browser-agent cost 50x by NOT using an agent loop. Plan-then-execute + numbers.

Submit Feedback

Similar Articles

The "browser agents are expensive and still maturing" framing might be missing something architectural

Measured token consumption across 4 agent runtimes doing the same tasks. Costs ranged from 1x to 4x depending on cache architecture

Agent Execution Tax: new procurement metric for browser agent benchmarks?

I benchmarked how badly AI agents read raw HTML. The gap was bigger than I expected.

Cut my browser-agent cost 50x by NOT using an agent loop. Plan-then-execute + numbers.