40% of my browser agent's sessions were silently failing and the LLM wasn't the problem

Reddit r/AI_Agents 05/21/26, 01:07 PM Tools

browser-agent puppeteer automation-detection fingerprinting llm production-debugging open-source

Summary

A developer discovered that 40% of browser agent sessions silently failed due to browser fingerprinting and automation detection, not LLM reasoning. An open-source tool called Leakish identified the issues.

I built a Puppeteer agent that passed every reasoning eval. In production, 40% of sessions returned degraded results with zero errors. The LLM was reasoning correctly over poisoned input. The browser was the blind spot. I verified this with an open source scanner whose full codebase is on GitHub and whose fingerprint checks execute locally, so I trusted the output before pointing it at my agent's sessions. The tool is called Leakish. My sessions were flagged on Canvas rendering, WebRTC, and automation detection surfaces I never thought to monitor. I still don't have a clean fix for making the browser layer invisible to these detection systems.

Original Article

Similar Articles

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

Hugging Face Daily Papers

This paper demonstrates that websites can identify which large language model powers a browsing agent by analyzing its behavioral patterns and timing data, achieving up to 96% F1 score across 14 frontier LLMs. It formalizes this attack surface and shows that random timing delays are insufficient to prevent identification.

Cut my browser-agent cost 50x by NOT using an agent loop. Plan-then-execute + numbers.

Reddit r/AI_Agents

Describes a technique to reduce LLM costs in browser agent tasks by using a single planning call followed by deterministic execution, achieving 50x cost reduction compared to standard agent loops.

Agent Execution Tax: new procurement metric for browser agent benchmarks?

Reddit r/LocalLLaMA

Fireworks AI and Notte introduce the 'Agent Execution Tax' metric after running 720 browser agent tasks across four LLMs, finding that execution reliability—not intelligence—is the primary bottleneck in agentic AI, with one model wasting 22.9% of inference calls on malformed JSON.

The "browser agents are expensive and still maturing" framing might be missing something architectural

Reddit r/AI_Agents

Discusses architectural issues with current browser agents using headless Chrome + AI layer, and presents Opera Neon's CLI as an alternative where AI is integrated into the browser, reducing token overhead and improving understanding.

Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Hacker News Top

This paper identifies a new class of injection attacks where payloads mimic the domain language to evade LLM injection detectors, showing detection rates drop dramatically (e.g., from 93.8% to 9.7% on Llama 3.1 8B). The vulnerability is systematic and extends to dedicated safety classifiers like Llama Guard 3, which detected zero camouflage payloads.