WebHarbor - We "dock" real websites locally for web agents! [R]

Reddit r/MachineLearning Papers

Summary

WebHarbor packages 15 real websites (Amazon, GitHub, BBC, etc.) as self-contained Flask+SQLite apps in a single Docker image with sub-second reset, designed for reproducible web agent evaluation and training. The project invites community contributions to expand to 100+ sites, with co-authorship opportunities.

Hello! Excited to share our latest community-driven research project: [**WebHarbor: Docking Real Websites for Evolving GUI Agent Environments**](https://aiming-lab.github.io/webharbor.github.io)!

**TL;DR**: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) packaged as self-contained Flask + SQLite apps in a single Docker image, with a control plane that resets each site to a byte-identical state in <1 second. All of the mirrors were built with a human-in-the-loop coding agent (e.g., Claude Code or Codex). We support all 643 WebVoyager tasks out of the box.

**Call for contribution**: Our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. Two tracks:

* Contribute a new mirror site (use the coding-agent pipeline → human verify → open PR) → co-author on the final paper
* Review submitted PRs (5 reviews → co-author)

We also released useful skills for you (or your coding agent) to work with! Typically you can create a new mirror within 1 day. See more contribution details in the [Contribute Guide](https://aiming-lab.github.io/webharbor.github.io/#contribute).

**Why WebHarbor:** running web agent benchmarks on the live web is a nightmare: reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months. Plus you can't reset the live web, which rules out heavy RL training. **You need lightweight, easy-to-reset, task-driven, evolving environments for web agents, for both evaluation and training!**

**Related Resources:**

|Name|Link|
|:-|:-|
|🏠 WebHarbor Project Page|[WebHarbor](https://aiming-lab.github.io/webharbor.github.io/)|
|🤗 HuggingFace Dataset|[ChilleD/WebHarbor](https://huggingface.co/datasets/ChilleD/WebHarbor)|
|💻 WebHarbor GitHub|[Code Repo](https://github.com/aiming-lab/WebHarbor)|
|📊 Contribution Guide|[Guide Details](https://aiming-lab.github.io/webharbor.github.io/#contribute)|
|📝 Contribution Request Form|[Google Form](https://forms.gle/ngcD1rzAfUEphNmRA)|

Suggestions and discussion are welcome!
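
For anyone curious what a byte-identical reset can look like for a Flask + SQLite app, here is a minimal sketch. It is not WebHarbor's actual control plane: it assumes each site keeps its state in a single SQLite file and that a pristine snapshot copy is stored next to it. The names `make_snapshot`, `reset_site`, and `demo`, and the `cart` table, are hypothetical and only for illustration.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to check byte-identical state."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_snapshot(live_db: Path, snapshot: Path) -> None:
    """Copy the seeded database once; call this with no open connections."""
    shutil.copyfile(live_db, snapshot)


def reset_site(live_db: Path, snapshot: Path) -> None:
    """Restore the live database from the snapshot (hypothetical helper)."""
    # Drop SQLite sidecar files so stale journal/WAL data cannot be replayed.
    for suffix in ("-wal", "-shm", "-journal"):
        live_db.with_name(live_db.name + suffix).unlink(missing_ok=True)
    # Copy to a temp file first, then rename over the live file so readers
    # never observe a half-written database.
    tmp = live_db.with_name(live_db.name + ".tmp")
    shutil.copyfile(snapshot, tmp)
    tmp.replace(live_db)


def demo() -> bool:
    """Seed a DB, mutate it like an agent would, reset, and compare digests."""
    with tempfile.TemporaryDirectory() as d:
        live = Path(d) / "site.db"
        snap = Path(d) / "site.snapshot.db"

        con = sqlite3.connect(live)
        con.execute("CREATE TABLE cart (item TEXT)")
        con.execute("INSERT INTO cart VALUES ('seed item')")
        con.commit()
        con.close()
        make_snapshot(live, snap)
        baseline = file_digest(live)

        # Simulate an agent episode that changes site state.
        con = sqlite3.connect(live)
        con.execute("INSERT INTO cart VALUES ('added by agent')")
        con.commit()
        con.close()
        mutated = file_digest(live)

        reset_site(live, snap)
        return mutated != baseline and file_digest(live) == baseline


if __name__ == "__main__":
    print("reset restored byte-identical state:", demo())
```

Copying a file is one simple way to get an exact reset, since the restored database is the snapshot byte for byte. A real setup would also need the web app to close or reopen its database connections around the reset, which this sketch leaves out.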