@larsencc: https://x.com/larsencc/status/2053862900289470765
Summary
This article details the production architecture behind the open-source browser-use library, explaining how to scale browser agents using AWS Lambda, SQS, and S3 for state management and retries.
Cached at: 05/11/26, 04:40 PM
A Production Architecture for the Browser Use Open-Source Library
We open-sourced browser-use so anyone can run a browser agent locally with a few lines of Python.
Running millions of those agents in production, with retries, timeouts, screenshots, audit trails, and billing, requires infrastructure that took us 4,000+ commits to get right. This post describes the architecture we use in production. If you want to run browser-use at scale in your own infrastructure, it is a good starting point.
What the open-source library gives you
The Agent class is the unit of work. You pass it a task, an LLM, and a BrowserSession. On each step it takes a screenshot and extracts the DOM in parallel, sends both to the LLM to decide the next action, executes that action, and repeats until the task is done.
In production, the only call we make into the library is Agent(...).run(). From our infrastructure's point of view, that one call is the entire open-source library. Everything else in this post (the queues, the workers, the state management, the retry logic) is infrastructure we built around it.
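The step loop described above can be sketched without the library. This is not browser-use's implementation: FakeBrowser, decide, and the action dictionaries are hypothetical stand-ins. Only the shape of the loop follows the text: capture a screenshot and the DOM concurrently, ask the model for the next action, execute it, and repeat until the model says the task is done.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser session; records executed actions."""
    actions: list = field(default_factory=list)

    async def screenshot(self) -> bytes:
        await asyncio.sleep(0)  # pretend to capture a PNG
        return b"png-bytes"

    async def dom(self) -> str:
        await asyncio.sleep(0)  # pretend to serialize the DOM
        return f"<page actions={len(self.actions)}>"

    async def execute(self, action: dict) -> None:
        self.actions.append(action)


async def decide(task: str, screenshot: bytes, dom: str, step: int) -> dict:
    """Hypothetical LLM policy: click twice, then declare the task done."""
    if step < 2:
        return {"type": "click", "target": f"button-{step}"}
    return {"type": "done", "result": f"finished: {task}"}


async def run_agent(task: str, browser: FakeBrowser, max_steps: int = 10) -> dict:
    for step in range(max_steps):
        # Screenshot and DOM extraction run concurrently.
        shot, dom = await asyncio.gather(browser.screenshot(), browser.dom())
        action = await decide(task, shot, dom, step)
        if action["type"] == "done":
            return {"steps": step + 1, "result": action["result"]}
        await browser.execute(action)
    return {"steps": max_steps, "result": None}  # step budget exhausted


browser = FakeBrowser()
outcome = asyncio.run(run_agent("find the pricing page", browser))
```

The max_steps budget is an assumption of the sketch; some bound on steps is needed so a confused model cannot loop forever.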
The architecture
The API is a FastAPI service on ECS Fargate. It accepts task creation requests, validates them, writes a row to the database, drops a message on SQS, and returns HTTP 202.
Behind it is a standard SQS queue with one message per agent run, carrying the task ID and execution config. There is no ordering, no deduplication, and no separate queue per workload type, because agent tasks are independent of each other.
The worker is an AWS Lambda function with "from browser_use import Agent" at the top. It pulls messages off SQS, instantiates the agent, runs it to completion, and writes results to S3.
The complexity is in how each piece handles failure.
API entry
The handler validates the payload, creates session and task rows in the database, commits, sends an SQS message, and returns HTTP 202 with the task ID in under 50 milliseconds. The work hasn’t started yet.
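The ordering of those steps can be sketched with the standard library alone. This is a hedged illustration, not the production FastAPI code: an in-memory SQLite database stands in for the real database, a list named outbox stands in for SQS, and the table names, message fields, and the 422 status for invalid input are assumptions.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, session_id TEXT, task TEXT, status TEXT)")
outbox: list = []  # hypothetical stand-in for an SQS send_message call


def create_task(payload: dict) -> tuple:
    # 1. Validate before touching any state.
    task = payload.get("task")
    if not isinstance(task, str) or not task.strip():
        return 422, {"error": "task must be a non-empty string"}

    # 2. Create the session and task rows; the with-block commits on success.
    session_id, task_id = str(uuid.uuid4()), str(uuid.uuid4())
    with db:
        db.execute("INSERT INTO sessions (id) VALUES (?)", (session_id,))
        db.execute(
            "INSERT INTO tasks (id, session_id, task, status) VALUES (?, ?, ?, 'queued')",
            (task_id, session_id, task),
        )

    # 3. Enqueue only after the commit, so a worker never receives an unknown task ID.
    outbox.append(json.dumps({"task_id": task_id, "continuation": 0}))

    # 4. Return 202 right away; the agent has not started yet.
    return 202, {"task_id": task_id}


status, body = create_task({"task": "find the pricing page"})
```

Committing before enqueueing matters because a Lambda worker can pick up the message within milliseconds; if the row were written afterwards, the worker could look up a task that does not exist yet.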
The queue layer
We use a single standard SQS queue for all agent runs. We tried per-customer queues and priority-based routing early on but none of it improved throughput and all of it added operational overhead we didn’t want to maintain. A single queue with independent messages turned out to be enough.
The message carries the task ID, the model and runtime settings, and a continuation counter that starts at zero.
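A message body along these lines could be modeled as a small dataclass serialized to JSON. The field names (task_id, model, runtime, continuation) and the example values are assumptions; the post lists what the message carries, not its schema.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class AgentRunMessage:
    """Hypothetical SQS message body for one agent run."""
    task_id: str
    model: str
    runtime: dict = field(default_factory=dict)  # e.g. timeouts, browser settings
    continuation: int = 0  # bumped each time the run moves to a new invocation

    def to_body(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_body(cls, body: str) -> "AgentRunMessage":
        return cls(**json.loads(body))

    def next_continuation(self) -> "AgentRunMessage":
        # Same task and settings, counter incremented by one.
        return replace(self, continuation=self.continuation + 1)


msg = AgentRunMessage(task_id="task-123", model="example-model", runtime={"max_steps": 100})
restored = AgentRunMessage.from_body(msg.to_body())
```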
The worker Lambda
The Lambda handler is wired to the queue via SQS event source mapping. For each message it parses the payload, provisions a browser session, constructs the LLM client, calls Agent(...).run(), writes step-by-step state to S3, and persists the final result to the database.
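The per-message pipeline can be sketched with every collaborator injected, which keeps the flow testable without AWS or a browser. All of the callables here (provision_browser, build_llm, run_agent, write_step, persist_result) are hypothetical; in production, the run_agent slot is where Agent(...).run() would be awaited. Because the SQS event source invokes an ordinary handler, a real Lambda would likely drive this coroutine with asyncio.run.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def process_message(
    body: str,
    provision_browser: Callable[[], Awaitable[Any]],
    build_llm: Callable[[dict], Any],
    run_agent: Callable[..., Awaitable[dict]],
    write_step: Callable[[str, int, dict], None],
    persist_result: Callable[[str, dict], None],
) -> dict:
    """Hypothetical per-message pipeline; collaborators are injected for testing."""
    payload = json.loads(body)           # 1. parse the SQS message body
    task_id = payload["task_id"]
    browser = await provision_browser()  # 2. provision a browser session
    llm = build_llm(payload)             # 3. construct the LLM client

    def on_step(step: int, state: dict) -> None:
        write_step(task_id, step, state)  # 4a. step-by-step state goes to S3

    # 4. Run the agent to completion (Agent(...).run() in production).
    result = await run_agent(task_id, llm, browser, on_step)
    persist_result(task_id, result)      # 5. the final result goes to the database
    return result


# Usage with in-memory fakes for the browser, the LLM, S3, and the database.
s3: dict = {}
db: dict = {}


async def fake_browser() -> str:
    return "browser-1"


async def fake_run(task_id: str, llm: str, browser: str, on_step) -> dict:
    for i in range(3):
        on_step(i, {"llm": llm, "browser": browser})
    return {"ok": True, "steps": 3}


result = asyncio.run(process_message(
    json.dumps({"task_id": "t1", "model": "example-model"}),
    provision_browser=fake_browser,
    build_llm=lambda payload: f"llm:{payload['model']}",
    run_agent=fake_run,
    write_step=lambda task, step, state: s3.__setitem__(f"{task}/steps/{step}.json", state),
    persist_result=db.__setitem__,
))
```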
State in S3
We store four kinds of state in S3. Agent checkpoints are serialized as JSON after every step, which is what makes resumption possible. Screenshots are captured on each step, both for the agent and for humans debugging failed runs. Execution logs are uploaded once, on task completion. Output files such as downloads or generated artifacts are served back through presigned URLs.
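One way to organize those four kinds of state is a per-task key prefix. The layout below (the tasks/ prefix, the subfolder names, the five-digit step padding) is an assumption, not the production scheme. Zero-padding the step number lets a plain lexical sort find the newest checkpoint, which is what resumption needs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskKeys:
    """Hypothetical S3 key layout: one prefix per task, one sub-prefix per kind of state."""
    task_id: str

    def checkpoint(self, step: int) -> str:
        # Zero-padded so lexical order matches step order.
        return f"tasks/{self.task_id}/checkpoints/step-{step:05d}.json"

    def screenshot(self, step: int) -> str:
        return f"tasks/{self.task_id}/screenshots/step-{step:05d}.png"

    def logs(self) -> str:
        return f"tasks/{self.task_id}/logs/execution.log"

    def output(self, filename: str) -> str:
        return f"tasks/{self.task_id}/outputs/{filename}"


def latest_checkpoint(listed_keys: List[str], task: TaskKeys) -> Optional[str]:
    """Pick the newest checkpoint from a listing of the task's keys."""
    prefix = f"tasks/{task.task_id}/checkpoints/"
    checkpoints = [k for k in listed_keys if k.startswith(prefix)]
    return max(checkpoints) if checkpoints else None


keys = TaskKeys("t1")
listed = [keys.checkpoint(i) for i in (2, 10, 9)] + [keys.screenshot(10), keys.logs()]
newest = latest_checkpoint(listed, keys)  # "tasks/t1/checkpoints/step-00010.json"
```

For the output files, boto3's S3 client offers generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=...) to produce time-limited download links; it is left out here to keep the sketch dependency-free.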
The uploads are fire-and-forget. If S3 is slow or returns a 503 on one upload, the agent run continues because we would rather lose a screenshot than fail a task.
When Lambda runs out of time
Each AWS Lambda invocation has a 15-minute hard limit, but browser agent tasks don't. A task that takes 20 minutes shouldn't fail because of a runtime constraint.
Two minutes before the Lambda deadline, the agent stops gracefully.
The handler checkpoints state to S3, sends a new SQS message with the continuation counter incremented, and returns success.
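The deadline check and hand-off can be sketched as follows. get_remaining_time_in_millis() is a real method on the Lambda context object; hand_off, its callables, and the "cap_reached" outcome are hypothetical, since the post does not say what happens when the cap is hit. Returning success matters: with an SQS event source, a successfully processed message is deleted, so only the new continuation message remains on the queue.

```python
import json
from typing import Callable

SAFETY_MARGIN_MS = 2 * 60 * 1000  # stop two minutes before the Lambda deadline
MAX_CONTINUATIONS = 12            # the cap mentioned in the post


def should_yield(context) -> bool:
    """True once the invocation is inside the safety margin."""
    return context.get_remaining_time_in_millis() <= SAFETY_MARGIN_MS


def hand_off(
    message: dict,
    checkpoint: dict,
    save_checkpoint: Callable[[str, dict], None],
    send_message: Callable[[str], None],
) -> str:
    """Hypothetical hand-off: checkpoint, re-queue with the counter incremented, return."""
    if message["continuation"] >= MAX_CONTINUATIONS:
        return "cap_reached"  # assumption: the caller marks the task as failed
    save_checkpoint(message["task_id"], checkpoint)  # 1. state to S3 before re-queueing
    next_message = {**message, "continuation": message["continuation"] + 1}
    send_message(json.dumps(next_message))           # 2. new SQS message for the next invocation
    return "continued"                               # 3. end this invocation successfully


# Usage with a fake context; a real Lambda passes its own context to the handler.
class FakeContext:
    def __init__(self, remaining_ms: int) -> None:
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self.remaining_ms


saved: dict = {}
sent: list = []
ctx = FakeContext(remaining_ms=90_000)  # 90 seconds left: inside the margin
outcome = (
    hand_off({"task_id": "t1", "continuation": 0}, {"step": 41}, saved.__setitem__, sent.append)
    if should_yield(ctx)
    else "keep_running"
)
```

Saving the checkpoint before sending the message is deliberate in this sketch: if the order were reversed, the next invocation could start before the state it needs exists in S3.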
A new Lambda invocation picks up that message, fetches the state from S3, restores the Agent, and resumes from the last completed step. The user sees one task while the system stitches together N Lambda invocations through S3.
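Resuming from the last completed step can be sketched with a JSON checkpoint. AgentCheckpoint and resume are hypothetical simplifications; browser-use's own agent state and restore mechanism are richer and are not shown. The point is only that the next invocation starts at last_completed_step + 1 rather than from the beginning.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentCheckpoint:
    """Hypothetical per-step checkpoint."""
    task_id: str
    last_completed_step: int
    history: list = field(default_factory=list)  # e.g. actions taken so far

    def dumps(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, raw: str) -> "AgentCheckpoint":
        return cls(**json.loads(raw))


def resume(raw_checkpoint: str, total_steps: int) -> list:
    """Restore state and continue from the step after the last completed one."""
    state = AgentCheckpoint.loads(raw_checkpoint)
    for step in range(state.last_completed_step + 1, total_steps):
        state.history.append(f"step-{step}")
        state.last_completed_step = step  # a real run would checkpoint to S3 here
    return state.history


# The first invocation completed steps 0-2 and wrote this before handing off.
first = AgentCheckpoint("t1", last_completed_step=2, history=["step-0", "step-1", "step-2"])
stored = first.dumps()
```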
We cap the counter at 12 right now, which gives us about three hours of wall-clock time. This isn’t a hard constraint. We could raise it, but in practice we found that agents aren’t reliable enough at that many steps to justify running longer.
Failure handling
Lambda already gives you retries and SQS already gives you a DLQ, so we didn’t need a custom retry framework. We just needed to know when to use which.
If processing a message throws, the handler reports that message as failed via ReportBatchItemFailures. The message becomes visible on the queue again when its visibility timeout expires, and a different Lambda invocation picks it up. After three failed attempts, SQS moves the message to the DLQ.
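A partial-batch handler can be sketched as below. The response shape ({"batchItemFailures": [{"itemIdentifier": messageId}]}) is the one Lambda expects when ReportBatchItemFailures is enabled on the event source mapping; the process callable and the sample event are hypothetical.

```python
import json
import logging
from typing import Callable

log = logging.getLogger("worker")


def handle_batch(event: dict, process: Callable[[dict], None]) -> dict:
    """Process each SQS record; report only the failed ones back to Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            log.exception("message %s failed", record["messageId"])
            # Listing the messageId leaves this message on the queue; it becomes
            # visible again once its visibility timeout expires.
            failures.append({"itemIdentifier": record["messageId"]})
    # Records not listed are treated as successes and deleted from the queue.
    return {"batchItemFailures": failures}


# Usage with a hand-built event in the shape Lambda delivers from SQS.
def process(payload: dict) -> None:
    if payload["task_id"] == "bad":
        raise RuntimeError("simulated agent crash")


event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"task_id": "ok"})},
    {"messageId": "m2", "body": json.dumps({"task_id": "bad"})},
]}
response = handle_batch(event, process)
```

The three-attempt limit itself lives on the queue, not in code: its RedrivePolicy attribute sets maxReceiveCount to 3 and deadLetterTargetArn to the DLQ.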
There is no retry counter in our database. The retry state lives entirely in SQS metadata, so if we need to know whether a task is on its second attempt we check the message attributes, not our own tables.
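Reading the attempt number from SQS metadata might look like this. It assumes the check uses the ApproximateReceiveCount system attribute, which Lambda includes in each record's attributes map as a string; the helper names are hypothetical. Because a continuation is sent as a brand-new message, its receive count starts over at 1, so under this approach the attempt number describes the current invocation segment, not the whole task.

```python
def attempt_number(record: dict) -> int:
    """Delivery attempt for this SQS record: 1 on first delivery, 2 on the first retry."""
    return int(record["attributes"]["ApproximateReceiveCount"])


def is_retry(record: dict) -> bool:
    return attempt_number(record) > 1


def is_last_attempt(record: dict, max_receive_count: int = 3) -> bool:
    # Assumes max_receive_count matches the queue's redrive policy.
    return attempt_number(record) >= max_receive_count


record = {"messageId": "m1", "body": "{}", "attributes": {"ApproximateReceiveCount": "2"}}
```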
The DLQ is a fire alarm. When messages land there it usually means we introduced a regression or there’s a deeper issue to investigate. It’s rare, and when it happens an engineer looks. We don’t auto-redrive.
Ephemeral storage cleanup
One thing we learned is that Lambda's ephemeral storage (/tmp) is not cleaned between invocations that reuse a warm execution environment. If a previous agent run left files in /tmp, the next invocation in the same environment will see them. This can leak data between sessions or fill the disk on long-lived environments.
We wipe the workspace at the start of every invocation and scope all writes to session-specific directories.
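A workspace reset plus per-session directories can be sketched with pathlib and shutil. The function names and the sessions/ subdirectory are hypothetical. The sketch deletes everything under whatever root it is given; in a real function you might point it at a dedicated subdirectory of /tmp rather than /tmp itself.

```python
import shutil
import tempfile
from pathlib import Path


def reset_workspace(root: Path) -> None:
    """Delete everything under root left behind by a previous invocation."""
    root.mkdir(parents=True, exist_ok=True)
    for entry in list(root.iterdir()):  # snapshot the listing before deleting
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)  # files and symlinks


def session_dir(root: Path, session_id: str) -> Path:
    """Create and return the directory that holds all writes for one session."""
    if not session_id or session_id in {".", ".."} or "/" in session_id:
        raise ValueError(f"unsafe session id: {session_id!r}")
    path = root / "sessions" / session_id
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage: a temporary directory stands in for /tmp so the demo is safe to run anywhere.
root = Path(tempfile.mkdtemp())
(root / "leftover.png").write_bytes(b"old screenshot")
(session_dir(root, "previous") / "cookies.json").write_text("{}")

reset_workspace(root)              # start of a new invocation
work = session_dir(root, "s-123")  # all writes for this run go under here
```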
Why this works
The Lambda holds nothing across invocations. Everything that needs to survive lives in the database or S3, so we can kill workers without losing work.
The API doesn’t wait for the agent. It accepts the task, drops it on the queue, and gets out of the way. Customers never wait on agent execution at the API layer.
Lambda’s 15-minute limit was the biggest open question early on. Solving it with “save state, re-queue, restore state” instead of switching to a different runtime let us keep using Lambda’s auto-scaling and SQS’s retry semantics without introducing new infrastructure.
What survived
The library turns a natural-language task into browser actions. The infrastructure keeps thousands of those tasks running concurrently.
Getting both to work together reliably took 4,000+ commits and a lot of mistakes. We’ve hit every failure mode SQS and Lambda can throw at you, from event loop deadlocks to ephemeral storage leaking between invocations to Lambda continuations silently dropping state. The architecture in this post is what survived all of that.
Browser Use Cloud runs the open-source library on this architecture, plus everything we’ve built on top.