The author tested AI agents on real browser tasks and found them unreliable due to infrastructure limitations, arguing for a dedicated browser runtime for agents rather than relying on current browsers designed for humans.
For the last few weeks, I’ve been trying to get Claude Code / Codex-style agents to handle real web tasks. Not toy demos. Actual boring tasks people eventually want agents to do:

* scrape recent main posts from an X account, exclude reposts / replies / pinned posts, then rank by views and engagement
* search LinkedIn for remote full-time agent developer jobs, open a company’s career page, upload a resume, fill the form, and stop before submit
* search Redfin listings, apply filters, open a property, change the mortgage calculator, and extract the estimated monthly payment
* search Expedia flights, filter nonstop, choose a valid airline, fill passenger info, and stop before payment

At one point, AI Twitter made me believe this stuff was basically solved. Everyone was posting:

* “my agent books everything for me”
* “my agent applies to jobs while I sleep”
* “my agent can use any website”
* “the browser is just another tool now”

So I tried to push beyond simple browsing. But reality? The agents still:

* lose track of tabs
* break on logged-in pages
* get confused by dynamic UI
* turn multi-step flows into endless click / observe loops
* fail when a modal, redirect, or stale screenshot appears

And honestly, it started **feeling like we’re blaming the model** for problems that come from the browser layer.

Claude Code and Codex are already pretty useful inside codebases. But the web is different. A website is stateful, logged-in, asynchronous, visual, and full of weird edge cases. Current browsers were built for one human, one cursor, one active tab, not for an agent running a task in parallel.

That made me realize something important: **AI agents don’t just need better reasoning. They need a better browser environment.**

The interesting direction, IMO, is not “put a chatbot inside Chrome.” It’s giving agents their own browser runtime: isolated spaces, persistent logged-in sessions, parallel execution, and code-level orchestration instead of brittle click/type/screenshot commands.

This also made some of the newer projects in the space more interesting to me. ego lite seems to be treating the browser as infrastructure for agents rather than as a UI for humans. Whether that's the right approach remains to be seen, but it feels closer to solving the reliability problem than simply adding a stronger model.
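
To make the "browser runtime" idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not how any real product works, and it does not drive an actual browser. `AgentBrowserRuntime`, `Session`, the task functions, and the `example.com` URLs are all hypothetical names made up for illustration. It only models three of the properties named above: each task gets an isolated session, session state persists between runs, and tasks execute in parallel from code rather than through click/screenshot loops.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical isolated space: its own cookies and its own tabs."""
    name: str
    cookies: dict = field(default_factory=dict)
    tabs: list = field(default_factory=list)

    async def open(self, url):
        await asyncio.sleep(0)  # stand-in for real navigation latency
        self.tabs.append(url)
        return url


class AgentBrowserRuntime:
    """Toy runtime that hands every task its own persistent session."""

    def __init__(self):
        self._sessions = {}

    def session(self, name):
        # Reuse an existing session so login state survives across runs.
        if name not in self._sessions:
            self._sessions[name] = Session(name)
        return self._sessions[name]

    async def run_all(self, tasks):
        # tasks maps a session name to a coroutine function taking a Session.
        names = list(tasks)
        results = await asyncio.gather(
            *(tasks[n](self.session(n)) for n in names)
        )
        return dict(zip(names, results))


async def find_jobs(s):
    s.cookies["auth"] = "placeholder-token"  # pretend we logged in
    await s.open("https://example.com/jobs")
    return len(s.tabs)


async def check_listing(s):
    await s.open("https://example.com/listings")
    await s.open("https://example.com/listings/1")
    return len(s.tabs)


if __name__ == "__main__":
    runtime = AgentBrowserRuntime()
    print(asyncio.run(runtime.run_all({"jobs": find_jobs, "homes": check_listing})))
```

The point of the sketch is the shape of the interface: the two tasks never share tabs or cookies, and running the "jobs" task a second time against the same runtime picks up the session it left behind instead of starting from a blank browser.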
OpenAI has announced its first custom AI chip designed specifically for running ChatGPT and future AI agents, marking a major step in reducing reliance on external hardware providers.
The author argues that AI analysis quality is limited more by data access and reliability than by reasoning, and that structured datasets dramatically improve outputs.
Google DeepMind promotes a podcast episode featuring Nenad Tomašev and Hannah Fry discussing AI agents, the future agentic economy, and related safety concerns.
Google DeepMind hosts a podcast discussing the rise of agentic economies, where millions of AI agents negotiate, transact, and delegate, and explores ways to diversify agent decision-making to avoid groupthink.