[browser-use-wasm] I made a browser-use agent that runs in WASM at zero cost

Reddit r/LocalLLaMA 06/12/26, 09:21 AM Tools

browser-use wasm webgpu open-source agent self-contained zero-cost

Summary

A developer built a self-contained browser-use agent that runs entirely in WASM/WebGPU at zero server cost, enabling full webpage control via natural language prompts.

The only cost is electricity! I built this in a few weeks since I couldn't find anything else like it.

Demo: [https://pdufour.github.io/browser-use-wasm/](https://pdufour.github.io/browser-use-wasm/)

Source Code: [https://github.com/pdufour/browser-use-wasm](https://github.com/pdufour/browser-use-wasm)

One thing I've wanted to do for a while was add a widget to my page that let me control the complete webpage, just like any of the browser-use agents can. The key distinction is that I wanted it to be fully self-contained, with no server involved. After a few weeks of tinkering I have a fairly good browser-use model running entirely via Snapdom / WASM / WebGPU / Wllama / ShowUI-2B and a little JS to tie it all together.

**The browser use library I developed can handle all this:**

* Typing into fields
* Clicking links
* Multi-turn actions (click on input, type something into it, click submit button), all from one prompt (works 50% of the time)
* Changing dropdown options

**Some lessons I learned while making this that others might find helpful:**

1. Tests are your friend. Finding Mind2Web [https://github.com/OSU-NLP-Group/Mind2Web](https://github.com/OSU-NLP-Group/Mind2Web) and MiniWoB++ [https://github.com/Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) helped me continuously improve the accuracy of the browser-use actions.
2. Browser use is very, very hard. I've only supported a limited set of actions, and even getting to that point was quite hard. To handle complex queries you need some kind of interaction loop, but then you run into problems like figuring out when to end the loop.
3. Accuracy matters. For the longest time my click actions were off by a few px, and I was finally able to track the issue down to the snapdom library. When a click is off by a few px, that could mean it's clicking in blank space rather than on a button. I'm so glad this is fixed - [https://github.com/zumerlab/snapdom/issues/421](https://github.com/zumerlab/snapdom/issues/421).

This code is super super alpha and a lot of stuff is probably broken, but I thought I would share it with Reddit to ask for feedback and see if people had any ideas on how to develop this further. I'm open to any ideas!
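
On the "when to end the loop" problem from lesson 2, here is a minimal sketch of one common approach. It is not the code from this repo. Every name in it (`Action`, `checkStop`, `runLoop`, `predict`, `execute`) is hypothetical, and the three stop conditions (the model says it is done, a step budget runs out, the model repeats its last action) are assumptions about what a reasonable loop might check.

```typescript
// Hypothetical action shape; none of these names come from browser-use-wasm.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

type StopReason = "model_done" | "max_steps" | "repeated_action";

// Structural equality for two actions.
function sameAction(a: Action, b: Action): boolean {
  if (a.kind === "click" && b.kind === "click") return a.x === b.x && a.y === b.y;
  if (a.kind === "type" && b.kind === "type") return a.text === b.text;
  return a.kind === "done" && b.kind === "done";
}

// Decide whether to stop BEFORE executing `next`. Returns null to keep going.
function checkStop(history: Action[], next: Action, maxSteps: number): StopReason | null {
  if (next.kind === "done") return "model_done";          // model signals completion
  if (history.length >= maxSteps) return "max_steps";     // hard budget, always terminates
  const prev = history[history.length - 1];
  if (prev !== undefined && sameAction(prev, next)) return "repeated_action"; // likely stuck
  return null;
}

// `predict` stands in for the model call and `execute` for the DOM side effect;
// both are assumptions supplied by the caller.
async function runLoop(
  predict: (history: Action[]) => Promise<Action>,
  execute: (action: Action) => Promise<void>,
  maxSteps = 5,
): Promise<{ history: Action[]; reason: StopReason }> {
  const history: Action[] = [];
  for (;;) {
    const next = await predict(history);
    const reason = checkStop(history, next, maxSteps);
    if (reason !== null) return { history, reason };
    await execute(next);
    history.push(next);
  }
}
```

Keeping the stop decision in a pure function makes it easy to unit test without a model or a page. The repeated-action check is a blunt heuristic: it will also cut off legitimate repeats such as clicking a "next" button twice, so a real loop might compare page state instead.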


