@shao__meng: 微软发布终端原生 Web Agent 框架：Webwright https://github.com/microsoft/webwright… 核心设计：代码即动作传统网页智能体采用"观察→预测下一步点击→执行"的循环，每一步都依赖 L…

X AI KOLs Timeline 2026/05/27 00:44 工具

web-agent microsoft open-source playwright terminal automation framework

摘要

微软发布了终端原生的 Web Agent 框架 Webwright，通过让 LLM 编写 Playwright 脚本来实现网页操作自动化，具有极简架构和 SOTA 性能，并支持多种模型后端和产品集成。

微软发布终端原生 Web Agent 框架：Webwright https://github.com/microsoft/webwright… 核心设计：代码即动作传统网页智能体采用"观察→预测下一步点击→执行"的循环，每一步都依赖 LLM 判断。Webwright 的做法更贴近软件工程师思维： · 让 LLM 写 Playwright 脚本 —— 把网页操作变成可运行的 Python 程序 · 工作区即状态 —— 脚本、截图、日志保存在本地，浏览器会话可随时重建 · 终端优先 —— 核心循环只有三个模块约 1000 行代码，无隐藏编排层这种模式产生的"副产物"是可复用的自动化程序，而非一次性交互痕迹。性能表现：SOTA 水平 · Online-Mind2Web (300 任务)：86.7% (GPT-5.4)，同类开源框架中最高 · Odysseys (200 长程任务，平均 76.1 步)：60.1% (GPT-5.4)，较此前 SOTA (+15.6pt)，较基线 GPT-5.4 (+26.6pt) · Claude Opus 4.7：84.7% / 难例 80.5%，难例上超越 GPT-5.4 架构极简 Runner (150行) -> Model Endpoint (550行) -> Environment (300行) · 仅依赖 httpx、pydantic、playwright、typer · 无多智能体系统、无图引擎、无插件层 · 支持 OpenAI、Anthropic、OpenRouter 后端产品化与集成 · Claude Code：插件 /plugin install webwright@webwright，支持 /webwright:run 和 /webwright:craft · OpenAI Codex：插件市场安装，通过 @ webwright 调用 · OpenClaw / Hermes：共享 skills/webwright/ 目录，统一技能规范关键创新点 · Task2UI 模式 (2026-05-11 新增) —— 任务完成后自动渲染为 HTML 应用，结果可视化且可重用 · 脚本可复用性 —— 即使是 Qwen-3.5-9B 这样的小模型，在预置工具脚本辅助下也能达到 66.2% 的难例完成率 · 可审计性 —— 每次运行都保存轨迹、截图、报告，便于调试和回归

查看原文

查看缓存全文

缓存时间: 2026/05/27 09:21

微软发布终端原生 Web Agent 框架：Webwright https://github.com/microsoft/webwright…

核心设计：代码即动作传统网页智能体采用“观察→预测下一步点击→执行“的循环，每一步都依赖 LLM 判断。Webwright 的做法更贴近软件工程师思维： · 让 LLM 写 Playwright 脚本 —— 把网页操作变成可运行的 Python 程序 · 工作区即状态 —— 脚本、截图、日志保存在本地，浏览器会话可随时重建 · 终端优先 —— 核心循环只有三个模块约 1000 行代码，无隐藏编排层

这种模式产生的“副产物“是可复用的自动化程序，而非一次性交互痕迹。

性能表现：SOTA 水平 · Online-Mind2Web (300 任务)：86.7% (GPT-5.4)，同类开源框架中最高 · Odysseys (200 长程任务，平均 76.1 步)：60.1% (GPT-5.4)，较此前 SOTA (+15.6pt)，较基线 GPT-5.4 (+26.6pt) · Claude Opus 4.7：84.7% / 难例 80.5%，难例上超越 GPT-5.4

架构极简 Runner (150行) -> Model Endpoint (550行) -> Environment (300行)

· 仅依赖 httpx、pydantic、playwright、typer · 无多智能体系统、无图引擎、无插件层 · 支持 OpenAI、Anthropic、OpenRouter 后端

产品化与集成 · Claude Code：插件 /plugin install webwright@webwright，支持 /webwright:run 和 /webwright:craft · OpenAI Codex：插件市场安装，通过 @ webwright 调用 · OpenClaw / Hermes：共享 skills/webwright/ 目录，统一技能规范

关键创新点 · Task2UI 模式 (2026-05-11 新增) —— 任务完成后自动渲染为 HTML 应用，结果可视化且可重用 · 脚本可复用性 —— 即使是 Qwen-3.5-9B 这样的小模型，在预置工具脚本辅助下也能达到 66.2% 的难例完成率 · 可审计性 —— 每次运行都保存轨迹、截图、报告，便于调试和回归

microsoft/webwright

Source: https://github.com/microsoft/webwright

Webwright

Webwright logo

Turn Your Coding Models to Be State-of-the-art Browser Agents

📝 Blog: Webwright: A Terminal Is All You Need For Web Agents
🌐 Project Page: microsoft.github.io/Webwright

Webwright gives LLM a terminal where it can launch multiple browser sessions to inspect the page and complete a web task. It captures and inspects page screenshots/states only when needed. It enforces each web task to be completed end-to-end within a re-runnable Python script, i.e. your web agent browsing history is a single code file. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration — just a terminal, a browser, and a model.

Already got your favorite agents, and wonder how to make Claude Code, Codex, Hermes, OpenClaw more capable in browser tasks? Consider adding Webwright plugin/skills!

📰 News

2026-05-11 — Support Task2UI mode: Webwright completes the task and renders task results into an HTML-based web app you can easily view and reuse.
2026-05-06 — Codex and Claude Code plugin manifests added; install via /plugin install webwright@webwright. OpenClaw and Hermes Agent integrations shipped; the same skills/webwright/ folder now loads across Claude Code, Codex, OpenClaw, and Hermes.
2026-05-04 — Initial public release: ~1.5k LoC, OpenAI / Anthropic / OpenRouter backends, Playwright environment.

💡 Motivation: Beyond Step-by-Step Web Interaction in a Stateful Browser

Most web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation — a click, a type, a DOM selector, or a short tool call. Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.

Webwright takes a different stance: separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session — it’s the code and logs in the local workspace.

🧱 Robust, reusable interaction with web environments — instead of fragile pixel-level actions, a coding agent with a terminal queries elements, waits for conditions, and handles dynamic behaviors like lazy loading or re-rendering. The resulting scripts can be rerun, adapted, and shared across tasks rather than rediscovered from scratch.
⚡ Efficient composition of complex workflows — multi-step interactions like selecting a date or filling a form become a compact program. Loops, functions, and abstractions let the agent generalize across similar tasks (e.g. different dates) without re-predicting the same low-level sequences. Fewer interaction rounds, faster execution, less error accumulation on long horizons.
🧪 Workspace-as-state, not browser-as-state — the agent can write exploratory scripts, spawn fresh browser sessions, and decide for itself when to capture screenshots and inspect failures, much like a human engineer iterating on an RPA script.
🪄 Surprisingly effective despite being minimal — this stripped-down setup turns out to handle complex and especially long-horizon web tasks well (see Performance).

🌟 Why Webwright

Most web agent frameworks bury the actual agent loop under layers of abstractions. Webwright takes the opposite stance:

🪶 Lightweight by design — core agent loop in a single ~450-line file, Playwright environment in ~570 lines, CLI in ~150 lines.
🧩 Pluggable model backends — OpenAI, Anthropic, and OpenRouter, each ~150–200 lines.
🔍 Zero hidden frameworks — just httpx, pydantic, playwright, and typer.
🔁 Flat prompt → observe → execute script loop — readable end-to-end, easy to debug, easy to fork.
🧪 Run-artifact first — every run writes trajectories and screenshots to disk for inspection.

If you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.

🆚 How Webwright Differs From Other Browser-Agent Repos

How they differ at the architectural level:

	Stagehand (Browserbase)	agent-browser (Vercel)	browser-use	Webwright
Paradigm	Hybrid: code + NL primitives (`act` / `extract` / `agent`)	CLI tool that another agent (Claude Code, Codex, etc.) calls	Autonomous LLM agent loop over DOM/AX snapshots	Coding agent with a terminal; browser is just an environment it spawns
Action space	Playwright code, or NL → LLM-translated Playwright	Discrete subcommands (`open`, `click @e2`, `snapshot`, `eval`)	Indexed click/type actions selected by the LLM	Free-form Python (writes Playwright scripts itself)
What is “state”?	The browser session	The browser session (held by daemon across CLI calls)	The browser session	The local workspace — code, screenshots, logs. Browser is disposable.
Loop shape	Imperative; `agent()` does multi-step when needed	One CLI invocation per micro-step	observe → predict next action → execute → repeat	write code → execute → inspect screenshots → repair (code-as-action)

🎥 Demo

https://github.com/user-attachments/assets/4ed94cd5-11be-4daa-b2d7-1260a803baca

📊 Performance

State-of-the-art on two real-website benchmarks with a 100-step budget — see the blog post for full details.

🏆 Online-Mind2Web (300 tasks): 86.7% with GPT-5.4 — highest among open-sourced harnesses in the AutoEval category. Claude Opus 4.7 reaches 84.7%, and is stronger on the hard split (80.5% vs. 76.6% for GPT-5.4 at N=100).
🚀 Odysseys (200 long-horizon tasks): 60.1% with GPT-5.4 (avg. 76.1 steps) — +15.6 points over the prior SOTA (Opus 4.6 at 44.5%, using vision based approach and persistent browser) and +26.6 points over base GPT-5.4 (33.5% using xy-coordinate prediction and persistent browser).
🧠 Code-as-action beats coordinate prediction: Webwright substantially outperforms a reproduced GPT-5.4 screenshot+xy-coordinate baseline across all difficulty splits.
🧰 Small models + reusable tools: generated scripts can be packaged as parameterized CLI tools — even Qwen-3.5-9B completes tasks well on Online-Mind2Web sites with 5+ tools available.

Odysseys long-horizon eval @ 100 steps Online-Mind2Web AutoEval @ 100 steps

🗺️ Project Map

webwright/
├── pyproject.toml           # package: webwright
├── src/webwright/
│   ├── run/cli.py           # CLI entrypoint (`webwright`)
│   ├── agents/default.py    # core agent loop
│   ├── environments/        # Playwright browser workspace
│   ├── tools/               # image_qa, self_reflection
│   ├── models/              # openai_model, anthropic_model, base
│   ├── config/              # base.yaml, model_openai.yaml, model_claude.yaml
│   └── utils/
├── assets/
│   └── task_showcase/       # tiny Flask dashboard for repeatable runs
│       ├── app.py
│       ├── templates/       # dashboard.html, task.html
│       └── tasks/<short_id>/ # task.json + report.json per task
├── tests/
└── outputs/                 # run artifacts (trajectories, screenshots)

📰 Task Showcase (repeatable runs as a dashboard)

A tiny Flask app under assets/task_showcase/ consolidates Webwright runs for repeatable odyssey tasks (deals, inventory, listings, job boards, weather, etc.) into a single dashboard. Each task ships only two files — task.json (metadata) and report.json (curated, structured output: sources + result sections like tables, lists, summaries) — and the templates render them generically, so adding a new task is just dropping a new folder in assets/task_showcase/tasks/.

pip install flask
python assets/task_showcase/app.py    # http://127.0.0.1:5005

To have Webwright produce a renderer-ready task folder at runtime, stack the Task Showcase overlay:

python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml -c task_showcase.yaml \
    -t "<repeatable web task>" \
    --task-id my_repeatable_task \
    -o outputs/default

The run writes task_showcase/tasks/<short_id>/task.json and report.json inside the output workspace. Render those generated files without copying them back into the repo:

python assets/task_showcase/app.py \
    --tasks-dir outputs/default/<run>/task_showcase/tasks

🚀 Quick Start

Prerequisites

Python 3.10+
Chromium installed through Playwright
An API key for your chosen backend (OpenAI, Anthropic, or OpenRouter)

Install

pip install -e .
playwright install chromium

Run

Export credentials for the configured backend (for example, OPENAI_API_KEY with model_openai.yaml or ANTHROPIC_API_KEY with model_claude.yaml). The image_qa and self_reflection tools use the same configured model by default, so an Anthropic run does not require an OpenAI key. Then:

python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml \
    -t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
    --start-url https://www.google.com/flights \
    --task-id demo_openai \
    -o outputs/default

🚩 Flags

Flag	Description
`-c`	Config file(s) from `src/webwright/config/` (stackable).
`-t`	Task instruction.
`--start-url`	Initial page.
`--task-id`	Output subfolder name.
`-o`	Output directory.

🔌 Use as a Plugin

Webwright ships plugin manifests for both Claude Code (.claude-plugin/plugin.json) and OpenAI Codex (.codex-plugin/plugin.json), with the shared skill at skills/webwright/ and slash commands at skills/webwright/commands/. The host agent drives the Webwright loop natively — no extra LLM API key or cost beyond your host subscription. Hosts that read PNG screenshots natively skip the image_qa / self_reflection tools.

Common runtime deps (install once after either path):

pip install -e .
playwright install chromium

Claude Code

Install

Install through the bundled marketplace inside Claude Code:

# 1. Add this repo as a Claude Code plugin marketplace
/plugin marketplace add microsoft/Webwright

# 2. Install the plugin from that marketplace
/plugin install webwright@webwright

Prefer a local checkout? Point the marketplace command at the cloned repo instead:

/plugin marketplace add /absolute/path/to/Webwright
/plugin install webwright@webwright

Use

Start a new Claude Code session after installing — plugins are loaded at session start and won’t appear until you restart.

You can either ask Claude Code in plain English (the skill auto-activates from its description), or use one of the slash commands:

/webwright:run search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20
/webwright:craft search a ticket on Google Flights from LAX to SFO depart June 7 return June 14

/webwright:run (or any plain prompt) produces a one-shot final_script.py for the literal task values.
/webwright:craft produces a reusable CLI tool: final_script.py becomes one parameterized function with a Google-style Args: docstring and an argparse wrapper whose flags default to the concrete task values, so you can rerun it later with different arguments — e.g. python final_script.py --origin JFK --destination LAX --depart-date 2026-07-01.

In both modes Claude Code scaffolds a workspace with plan.md, runs instrumented Playwright scripts under final_runs/run_<id>/, and visually self-verifies each critical point against the saved screenshots.

OpenAI Codex

Install

Codex reads Claude-style marketplaces, so the same repo works as a Codex plugin marketplace. From the Codex CLI:

# 1. Add this repo as a Codex plugin marketplace
codex plugin marketplace add microsoft/Webwright

# 2. Open the plugin browser and install Webwright
codex
/plugins

Prefer a local checkout?

codex plugin marketplace add /absolute/path/to/Webwright

Then restart Codex so the new marketplace and plugin are picked up.

Use

In a new Codex thread, either ask in plain English (the skill auto-activates from its description) or invoke the bundled skill explicitly with @webwright:

@webwright search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20

Codex scaffolds a workspace with plan.md, runs instrumented Playwright scripts under final_runs/run_<id>/, and visually self-verifies each critical point against the saved screenshots.

To turn the plugin off without uninstalling, set its entry in ~/.codex/config.toml to enabled = false and restart Codex.

🦞 OpenClaw

Install

Install directly from a local checkout (path, archive, npm spec, git repo, or clawhub: spec all work):

openclaw plugins install /absolute/path/to/Webwright
openclaw gateway restart   # reload so the plugin and skill are picked up

Verify:

openclaw plugins list | grep webwright
openclaw skills  list | grep webwright   # should show "✓ ready"

Use

The webwright skill is now available to any OpenClaw agent surface (CLI, Telegram, etc.) — invoke it by asking the agent in natural language, or via the slash commands shipped under skills/webwright/commands/, e.g. /webwright run <task>.

To uninstall: openclaw plugins uninstall webwright.

Hermes Agent

Install

Hermes Agent is a skills-compatible client, so the same skills/webwright/ folder loads as a Hermes skill. Symlink it into your Hermes user-skills directory:

mkdir -p ~/.hermes/skills
ln -sfn /absolute/path/to/Webwright/skills/webwright ~/.hermes/skills/webwright

No Hermes-specific manifest is needed; only SKILL.md is loaded.

Use

Start Hermes (hermes) and ask it to drive a web task in natural language — the skill auto-activates from its description. You can also invoke it explicitly with /webwright.

Note: the named subcommands shipped under skills/webwright/commands/ (/webwright:run, /webwright:craft) are a Claude Code / Codex convention and are inert in Hermes; the skill itself still works end-to-end.

Credits

SWE-agent/mini-swe-agent — design inspiration for the minimal agent loop.
Playwright — browser automation.

Citation

If you use Webwright in your research or build on it, please cite this repository:

@misc{webwright2026,
  title        = {Webwright: A terminal is all you need for web agents},
  author       = {Lu, Yadong and Xu, Lingrui and Huang, Chao and Awadallah, Ahmed},
  year         = {2026},
  howpublished = {\url{https://github.com/microsoft/Webwright}},
  note         = {GitHub repository}
}

Omar Shahine (@OmarShahine): Need to try this. Hoping for massive boost over Playwright for browser automation.

相似文章

@axichuhai: 这个阿里的开源项目page-agent，能让你用自然语言控制网页界面，在 GitHub 已经斩获 18.7K star。它把 AI agent 直接塞进网页里，然后你用自然语言指挥它点按钮、填表单、跳流程都行。它不需要 headles…

X AI KOLs Timeline

阿里开源项目 Page-Agent 让你通过自然语言直接操控网页界面，无需 headless 浏览器或多模态模型，已在 GitHub 获得 18.7K star。

构建了一个让AI代理浏览网页的Playwright版本

Reddit r/AI_Agents

这是Playwright的一个分支，每次会话生成唯一的浏览器指纹，使AI代理能够在网上不被察觉地浏览。该项目完全开源，基于MIT许可证。

@GitTrend0x: 100% 本地桌面AI Agent 杀手级开源神器 https://github.com/bytedance/UI-TARS-desktop… 这就是 UI-TARS-desktop，字节跳动开源的 31k 星爆款多模态桌面自动化代理！ …

X AI KOLs Timeline

UI-TARS-desktop is a highly popular open-source tool by ByteDance that enables 100% local multimodal desktop automation, allowing users to control apps and browsers via natural language without cloud data leaks.

@QingQ77: 本地优先的多智能体协作桌面应用，把 AI 协作做成聊天一样的体验，支持多智能体任务分发、文件审查和人工审批。 https://github.com/lizyoko9/bitdance-agenthub… 一个基于 Next.js + El…

X AI KOLs Timeline

AgentHub 是一个本地优先的多智能体协作桌面应用，将 AI 协作做成聊天体验，支持任务分发、文件审查和人工审批，基于 Next.js 和 Electron 构建。

@jakevin7: OpenCLI 的APP 今天完全用一个组件库重新重构了，折腾了好久。 MakeAgent 我也准备让 Agent 把 UI 完全重构掉，这个 UI 可能是最后一面了。 https://github.com/jackwener/maka-…

X AI KOLs Following

Maka 是一个本地优先的桌面 AI 工作台，基于 Electron 构建，支持多种模型连接、工具调用、权限控制和隐私保护，并集成了机器人入口、本地记忆等功能。项目作者同时提到重构了 OpenCLI 的 APP 和计划重构 MakeAgent 的 UI。

microsoft/webwright

Webwright

📰 News

🎥 Demo

📊 Performance

🗺️ Project Map

📰 Task Showcase (repeatable runs as a dashboard)

🚀 Quick Start

Prerequisites

Install

Run

🚩 Flags

🔌 Use as a Plugin

Install

Use

Install

Use

Install

Use

Install

Use

Credits

Citation

相似文章

@axichuhai: 这个阿里的开源项目page-agent，能让你用自然语言控制网页界面，在 GitHub 已经斩获 18.7K star。 它把 AI agent 直接塞进网页里，然后你用自然语言指挥它点按钮、填表单、跳流程都行。 它不需要 headles…

构建了一个让AI代理浏览网页的Playwright版本

@GitTrend0x: 100% 本地桌面AI Agent 杀手级开源神器 https://github.com/bytedance/UI-TARS-desktop… 这就是 UI-TARS-desktop，字节跳动开源的 31k 星爆款多模态桌面自动化代理！ …

@QingQ77: 本地优先的多智能体协作桌面应用，把 AI 协作做成聊天一样的体验，支持多智能体任务分发、文件审查和人工审批。 https://github.com/lizyoko9/bitdance-agenthub… 一个基于 Next.js + El…

@jakevin7: OpenCLI 的APP 今天完全用一个组件库重新重构了，折腾了好久。 MakeAgent 我也准备让 Agent 把 UI 完全重构掉，这个 UI 可能是最后一面了。 https://github.com/jackwener/maka-…

提交意见反馈

@axichuhai: 这个阿里的开源项目page-agent，能让你用自然语言控制网页界面，在 GitHub 已经斩获 18.7K star。它把 AI agent 直接塞进网页里，然后你用自然语言指挥它点按钮、填表单、跳流程都行。它不需要 headles…