After testing AI agents on real browser tasks, I think the hype is ahead of the infrastructure

Reddit r/AI_Agents 06/01/26, 05:41 AM News

ai-agents browser-automation infrastructure reliability web-tasks testing

Summary

The author tested AI agents on real browser tasks and found them unreliable due to infrastructure limitations, arguing for a dedicated browser runtime for agents rather than relying on current browsers designed for humans.

For the last few weeks, I’ve been trying to get Claude Code / Codex-style agents to handle real web tasks. Not toy demos. Actual boring tasks people eventually want agents to do:

* scrape recent main posts from an X account, exclude reposts / replies / pinned posts, then rank by views and engagement
* search LinkedIn for remote full-time agent developer jobs, open a company’s career page, upload a resume, fill the form, and stop before submit
* search Redfin listings, apply filters, open a property, change the mortgage calculator, and extract the estimated monthly payment
* search Expedia flights, filter nonstop, choose a valid airline, fill passenger info, and stop before payment

At one point, AI Twitter made me believe this stuff was basically solved. Everyone was posting:

* “my agent books everything for me”
* “my agent applies to jobs while I sleep”
* “my agent can use any website”
* “the browser is just another tool now”

So I tried to push beyond simple browsing. But reality? The agents still:

* lose track of tabs
* break on logged-in pages
* get confused by dynamic UI
* turn multi-step flows into endless click / observe loops
* fail when a modal, redirect, or stale screenshot appears

And honestly, it started **feeling like we’re blaming the model** for problems that come from the browser layer.

Claude Code and Codex are already pretty useful inside codebases. But the web is different. A website is stateful, logged-in, asynchronous, visual, and full of weird edge cases. Current browsers were built for one human, one cursor, one active tab, not for an agent running a task in parallel.

That made me realize something important: **AI agents don’t just need better reasoning**. **They need a better browser environment.**

The interesting direction, IMO, is not “put a chatbot inside Chrome.” It’s giving agents their own browser runtime: isolated spaces, persistent logged-in sessions, parallel execution, and code-level orchestration instead of brittle click/type/screenshot commands.

This also made some of the newer projects in the space more interesting to me. ego lite seems to be treating the browser as infrastructure for agents rather than as a UI for humans. Whether that's the right approach remains to be seen, but it feels closer to solving the reliability problem than simply adding a stronger model.
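
To make “isolated spaces, persistent sessions, parallel execution, code-level orchestration” a bit less abstract, here is a minimal sketch in plain Python. It is not how ego lite or any real browser runtime works. `AgentSpace`, `login`, `run_task`, the step names, and the `.example` hostnames are all hypothetical stand-ins, and every `asyncio.sleep(0)` marks where real page work would go. It only illustrates the shape of the idea: each task gets its own state, and the whole flow is one program instead of a click / observe loop.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentSpace:
    """Hypothetical isolated browsing context: one per task, never shared."""
    name: str
    cookies: dict[str, str] = field(default_factory=dict)  # survives across steps
    log: list[str] = field(default_factory=list)


async def login(space: AgentSpace, site: str) -> None:
    # Stand-in for a real login; the session is stored in this space only.
    await asyncio.sleep(0)
    space.cookies[site] = f"session-for-{space.name}"
    space.log.append(f"login:{site}")


async def run_task(space: AgentSpace, site: str, steps: list[str]) -> AgentSpace:
    # Log in once per space, then reuse the session for every later step.
    if site not in space.cookies:
        await login(space, site)
    for step in steps:
        await asyncio.sleep(0)  # placeholder for real page work
        space.log.append(step)
    return space


async def main() -> list[AgentSpace]:
    jobs = [
        (AgentSpace("flights"), "expedia.example",
         ["search", "filter_nonstop", "stop_before_payment"]),
        (AgentSpace("homes"), "redfin.example",
         ["search", "open_property", "read_monthly_payment"]),
    ]
    # Parallel execution: tasks run concurrently, each in its own space.
    return await asyncio.gather(
        *(run_task(space, site, steps) for space, site, steps in jobs)
    )


if __name__ == "__main__":
    for space in asyncio.run(main()):
        print(space.name, space.log)
```

The point of the sketch is the boundary: nothing in one task can touch another task's tabs or cookies, so a modal or redirect in one flow cannot derail the other.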


Similar Articles

OpenAI Unveils Its First Custom AI Chip, Built for ChatGPT and Future AI Agents

AI is getting better at analysis. The problem is still the data.

@GoogleDeepMind: Watch → https://goo.gle/4w7S3LM Spotify → https://goo.gle/4eFgIA9 Apple Podcasts → https://goo.gle/3Sn4ZyM Or listen wh…

@GoogleDeepMind: What happens when millions of AI agents start negotiating, transacting, and delegating to one another? @weballergy join…

What is the best and affordable inference provider to run my AI agents?
