After testing AI agents on real browser tasks, I think the hype is ahead of the infrastructure

Reddit r/AI_Agents News

Summary

The author tested AI agents on real browser tasks and found them unreliable due to infrastructure limitations, arguing for a dedicated browser runtime for agents rather than relying on current browsers designed for humans.

For the last few weeks, I’ve been trying to get Claude Code / Codex-style agents to handle real web tasks. Not toy demos. Actual boring tasks people eventually want agents to do:

* scrape recent main posts from an X account, exclude reposts / replies / pinned posts, then rank by views and engagement
* search LinkedIn for remote full-time agent developer jobs, open a company’s career page, upload a resume, fill the form, and stop before submit
* search Redfin listings, apply filters, open a property, change the mortgage calculator, and extract the estimated monthly payment
* search Expedia flights, filter nonstop, choose a valid airline, fill passenger info, and stop before payment

At one point, AI Twitter made me believe this stuff was basically solved. Everyone was posting:

* “my agent books everything for me”
* “my agent applies to jobs while I sleep”
* “my agent can use any website”
* “the browser is just another tool now”

So I tried to push beyond simple browsing. But reality? The agents still:

* lose track of tabs
* break on logged-in pages
* get confused by dynamic UI
* turn multi-step flows into endless click / observe loops
* fail when a modal, redirect, or stale screenshot appears

And honestly, it started **feeling like we’re blaming the model** for problems that come from the browser layer.

Claude Code and Codex are already pretty useful inside codebases. But the web is different. A website is stateful, logged-in, asynchronous, visual, and full of weird edge cases. Current browsers were built for one human, one cursor, one active tab, not for an agent running tasks in parallel.

That made me realize something important: **AI agents don’t just need better reasoning**.
**They need a better browser environment.**

The interesting direction, IMO, is not “put a chatbot inside Chrome.” It’s giving agents their own browser runtime: isolated spaces, persistent logged-in sessions, parallel execution, and code-level orchestration instead of brittle click/type/screenshot commands.

This also made some of the newer projects in the space more interesting to me. ego lite seems to be treating the browser as infrastructure for agents rather than as a UI for humans. Whether that's the right approach remains to be seen, but it feels closer to solving the reliability problem than simply adding a stronger model.
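
To make "isolated sessions, parallel execution, and code-level orchestration" a bit more concrete, here is a rough Python sketch. It is not a real browser runtime and not how any particular project works. `Session`, `run_step`, `run_task`, the step names, and the cookie values are all hypothetical stand-ins; a real version would presumably wrap a browser automation library behind `run_step`. The point is only the shape: each task gets its own session state, tasks run concurrently, and the "stop before submit / payment" rule is enforced in code rather than left to a click / observe loop.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical isolated session: its own cookies and its own action log."""
    name: str
    cookies: dict = field(default_factory=dict)  # assumed to persist, so the session stays logged in
    log: list = field(default_factory=list)

    async def run_step(self, step: str) -> None:
        # Stand-in for a real browser action (navigate, fill, extract, ...).
        await asyncio.sleep(0)
        self.log.append(step)


async def run_task(session: Session, steps: list[str], stop_before: str) -> list[str]:
    """Run the steps in order, halting before the irreversible one."""
    for step in steps:
        if step == stop_before:
            break  # never reach "submit" or "pay" without a human
        await session.run_step(step)
    return session.log


async def main() -> dict[str, list[str]]:
    # Two independent sessions; neither shares tabs or cookies with the other.
    jobs = Session("linkedin", cookies={"auth": "saved"})
    flights = Session("expedia", cookies={"auth": "saved"})
    results = await asyncio.gather(
        run_task(jobs, ["search", "open_careers", "upload_resume", "fill_form", "submit"], "submit"),
        run_task(flights, ["search", "filter_nonstop", "choose_airline", "fill_passenger", "pay"], "pay"),
    )
    return {jobs.name: results[0], flights.name: results[1]}


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both tasks run side by side through `asyncio.gather`, and each one ends at the last safe step because the stop condition lives in the orchestration code, not in the model's judgment.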