40% of my browser agent's sessions were silently failing and the LLM wasn't the problem
Summary
A developer discovered that 40% of browser agent sessions silently failed due to browser fingerprinting and automation detection, not LLM reasoning. An open-source tool called Leakish identified the issues.
Similar Articles
Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces
This paper demonstrates that websites can identify which large language model powers a browsing agent by analyzing its behavioral patterns and timing data, achieving up to 96% F1 score across 14 frontier LLMs. It formalizes this attack surface and shows that random timing delays are insufficient to prevent identification.
Cut my browser-agent cost 50x by NOT using an agent loop. Plan-then-execute + numbers.
Describes a technique to reduce LLM costs in browser agent tasks by using a single planning call followed by deterministic execution, achieving 50x cost reduction compared to standard agent loops.
Agent Execution Tax: new procurement metric for browser agent benchmarks?
Fireworks AI and Notte introduce the 'Agent Execution Tax' metric after running 720 browser agent tasks across four LLMs, finding that execution reliability—not intelligence—is the primary bottleneck in agentic AI, with one model wasting 22.9% of inference calls on malformed JSON.
The "browser agents are expensive and still maturing" framing might be missing something architectural
Discusses architectural issues with current browser agents using headless Chrome + AI layer, and presents Opera Neon's CLI as an alternative where AI is integrated into the browser, reducing token overhead and improving understanding.
Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
This paper identifies a new class of injection attacks where payloads mimic the domain language to evade LLM injection detectors, showing detection rates drop dramatically (e.g., from 93.8% to 9.7% on Llama 3.1 8B). The vulnerability is systematic and extends to dedicated safety classifiers like Llama Guard 3, which detected zero camouflage payloads.