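@geekbb: An automated optimization tool for agent harnesses that takes over the grunt work of harness tuning. Give it a benchmark command and a target repo, and it automatically generates proposals, runs evals, records results, keeps the good candidates and discards the bad ones, improving the agent's prompts, config, and source code along the way. https…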
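Summary
autoharness is an automated agent harness optimization tool. Given a benchmark command, it generates proposals, runs evaluations, and improves an agent's prompts, config, and source code. It supports Codex and Claude.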
Cached at: 2026/05/11 12:42
https://t.co/2qhYImGjuP https://t.co/t9qGZMZjkP
kayba-ai/autoharness
Source: https://github.com/kayba-ai/autoharness
autoharness
Let autoharness run overnight and come back to an optimized agent harness, so your production agents never make mistakes again.
autoharness improves agent harnesses by proposing or applying prompt, config, middleware, and source changes, running evals, and keeping or discarding candidates based on benchmark results.
It is a control plane for an existing harness repo. You point it at a target root and a benchmark command; autoharness manages proposals, iterations, campaigns, and champion state under .autoharness/.
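The keep-or-discard rule described above can be sketched in a few lines of Python. This is a minimal illustration, not autoharness's code: the Candidate and Ledger types, the step function, the min_gain threshold, and the strict greater-than comparison are all assumptions, and the scores are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    name: str
    score: Optional[float] = None


@dataclass
class Ledger:
    # Hypothetical run state: the current champion plus a log of every decision.
    champion: Candidate
    history: list = field(default_factory=list)


def step(ledger: Ledger, candidate: Candidate,
         evaluate: Callable[[Candidate], float], min_gain: float = 0.0) -> bool:
    """Score one candidate and promote it only if it beats the champion by more than min_gain."""
    candidate.score = evaluate(candidate)
    kept = candidate.score > ledger.champion.score + min_gain
    ledger.history.append((candidate.name, candidate.score, "kept" if kept else "discarded"))
    if kept:
        ledger.champion = candidate  # the winner becomes the new champion
    return kept


# Invented benchmark scores for three harness variants.
scores = {"baseline": 0.50, "shorter-prompt": 0.48, "retry-middleware": 0.57}
ledger = Ledger(champion=Candidate("baseline", scores["baseline"]))
for name in ["shorter-prompt", "retry-middleware"]:
    step(ledger, Candidate(name), lambda c: scores[c.name])
print(ledger.champion.name)  # retry-middleware
```

Raising min_gain above zero would make the loop ignore improvements that fall within benchmark noise.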
Install
Fastest setup with Codex or Claude:
- pipx install "git+https://github.com/kayba-ai/autoharness.git"
- cd into your harness repo
- open Codex or Claude Code in that repo
- tell the assistant:
Run autoharness guide --assistant codex --print-next-prompt, then use the generated onboarding packet to finish setup.
For Claude Code, swap --assistant codex for --assistant claude.
Otherwise:
pipx install "git+https://github.com/kayba-ai/autoharness.git"
autoharness --help
If you do not use pipx:
python3 -m pip install --user "git+https://github.com/kayba-ai/autoharness.git"
How It Works
- guide inspects a repo, asks a few focused setup questions in a TTY, stays scriptable with flags in non-interactive use, writes a starter autoharness.yaml plus benchmark config, and runs a readiness check.
- doctor reruns config, generator, and benchmark validation when you want an explicit readiness gate.
- setup and init remain available when you want to manage bootstrap explicitly.
- run-benchmark executes one benchmark directly.
- generate-proposal previews one candidate change without running it.
- run-iteration executes one candidate; optimize runs a resumable search loop.
- promote or promote-from-compare moves a winner into champion state.
Mental Model
- target root: the harness repo or deployment tree to edit
- benchmark config: the command or adapter config that scores candidates
- workspace: the long-lived optimization effort
- track: one comparable lane inside a workspace
- campaign: a resumable search run over candidate proposals
- .autoharness/: persisted settings, proposals, records, iterations, and champions
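One way to picture how these concepts nest is as plain Python dataclasses. Everything here is hypothetical: the class and field names are illustrative, a track is assumed to pin a single benchmark config so its results stay comparable, and .autoharness/ is assumed to sit under the target root.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BenchmarkConfig:
    command: str  # the command (or adapter config) that scores candidates


@dataclass
class Campaign:
    campaign_id: str
    proposals: list = field(default_factory=list)  # candidates this resumable search has tried


@dataclass
class Track:
    name: str
    benchmark: BenchmarkConfig  # assumption: one benchmark per track keeps its results comparable
    campaigns: list = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    target_root: Path  # the harness repo or deployment tree to edit
    tracks: dict = field(default_factory=dict)

    @property
    def state_dir(self) -> Path:
        # Assumption: persisted settings, proposals, records, iterations, and champions live here.
        return self.target_root / ".autoharness"


ws = Workspace("airline", Path("my-harness"))
ws.tracks["main"] = Track("main", BenchmarkConfig("pytest -q"))
ws.tracks["main"].campaigns.append(Campaign("overnight-1"))
print(ws.state_dir)
```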
Batteries Included
- Adapters: generic_command, pytest, harbor, tau2_bench, hal, car_bench
- Proposal generators: manual, failure_summary, local_template, local_command, openai_responses, codex_cli, claude_code
- Extension model: Python plugins can add generators, preflight checks, and search strategies from .autoharness/plugins/ or AUTOHARNESS_PLUGIN_PATHS
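Plugin discovery across the two documented locations might look like the sketch below. Only .autoharness/plugins/ and AUTOHARNESS_PLUGIN_PATHS come from the README; treating the variable as an os.pathsep-separated list of directories and every top-level *.py file as a plugin are assumptions, and the plugin_files function and sample file names are made up. Actually importing the modules (for example with importlib) is left out.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def plugin_files(project_root: Path, env: Optional[Mapping[str, str]] = None) -> list[Path]:
    """List candidate plugin modules from .autoharness/plugins/ and AUTOHARNESS_PLUGIN_PATHS.

    Assumptions: the variable holds os.pathsep-separated directories, and any
    top-level *.py file in those directories counts as a plugin.
    """
    env = os.environ if env is None else env
    dirs = [project_root / ".autoharness" / "plugins"]
    dirs += [Path(p) for p in env.get("AUTOHARNESS_PLUGIN_PATHS", "").split(os.pathsep) if p]
    found = []
    for d in dirs:
        if d.is_dir():  # silently skip directories that do not exist
            found.extend(sorted(d.glob("*.py")))
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    local = root / ".autoharness" / "plugins"
    local.mkdir(parents=True)
    (local / "my_generator.py").write_text("# hypothetical generator plugin\n")
    shared = root / "shared_plugins"
    shared.mkdir()
    (shared / "my_search.py").write_text("# hypothetical search-strategy plugin\n")
    names = [p.name for p in plugin_files(root, {"AUTOHARNESS_PLUGIN_PATHS": str(shared)})]
print(names)  # ['my_generator.py', 'my_search.py']
```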
Quick Start
Let autoharness generate a starter project config:
autoharness guide
In a TTY, guide asks a few setup questions. In scripts or CI, use flags like --non-interactive, --benchmark-command, --generator, and --autonomy.
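In CI, the non-interactive call can be assembled as an argument list so that a multi-word benchmark command stays a single argument. This sketch uses only the flags named above; it assumes --benchmark-command takes the full command as one string and --generator takes one of the generator names listed under Batteries Included, and it omits --autonomy because its accepted values are not documented here. The guide_argv helper and the pytest command are illustrative.

```python
import shlex


def guide_argv(benchmark_command: str, generator: str) -> list[str]:
    """Assemble a non-interactive `autoharness guide` call from the flags named in the README."""
    # --autonomy is omitted on purpose: its accepted values are not listed here.
    return [
        "autoharness", "guide",
        "--non-interactive",
        "--benchmark-command", benchmark_command,  # whole command kept as one argument
        "--generator", generator,
    ]


argv = guide_argv("pytest -q tests/agent", "failure_summary")
print(shlex.join(argv))  # shell-safe rendering, e.g. for CI logs
```

Passing argv to subprocess.run(argv, check=True) would then execute it without any shell quoting concerns.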
If you want Codex or Claude to help you refine the setup, generate an assistant brief too:
autoharness guide --assistant codex --print-next-prompt
# or
autoharness guide --assistant claude --print-next-prompt
This writes autoharness.codex.md or autoharness.claude.md plus a structured autoharness.onboarding.json handoff next to autoharness.yaml, then prints a ready-to-paste assistant prompt. Assistant wrapper prompts live under contrib/agents/.
guide ends with a doctor pass. Run autoharness doctor again later if you want an explicit re-check or a repeated benchmark probe.
On a fresh install, guide prefers a local assistant backend when codex or claude is installed, otherwise uses openai_responses when OpenAI credentials are configured, and falls back to failure_summary only when no model-backed generator is available.
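The fallback order described above reduces to a short decision function. It is a sketch of the documented behavior, not autoharness's source: checking codex before claude, mapping those binaries to the codex_cli and claude_code generators, and treating a non-empty OPENAI_API_KEY as "OpenAI credentials are configured" are assumptions. The PATH lookup and environment are injectable so the logic can be tested without touching the real machine.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def pick_default_generator(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Mirror the documented fallback order (assumed precedence: codex, claude, OpenAI, none)."""
    if which("codex"):
        return "codex_cli"          # local Codex CLI backend
    if which("claude"):
        return "claude_code"        # local Claude Code backend
    if env.get("OPENAI_API_KEY"):
        return "openai_responses"   # hosted model via OpenAI credentials
    return "failure_summary"        # no model-backed generator available


only_claude = lambda name: "/usr/local/bin/claude" if name == "claude" else None
print(pick_default_generator(only_claude, {}))                                  # claude_code
print(pick_default_generator(lambda _: None, {"OPENAI_API_KEY": "sk-test"}))    # openai_responses
print(pick_default_generator(lambda _: None, {}))                               # failure_summary
```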
Then run the benchmark directly:
autoharness run-benchmark
If autoharness.yaml is present, autoharness will auto-bootstrap missing settings and workspace state on this common path. setup and init are still available when you want explicit control.
Generate a proposal against a target harness root:
autoharness generate-proposal
If you switch the project config to openai_responses, export an API key first:
export OPENAI_API_KEY=...
Run the outer loop:
autoharness optimize
autoharness report
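To illustrate what "resumable" means for optimize, here is a hypothetical campaign loop that checkpoints after every evaluation, so rerunning it skips work that is already done. The JSON state layout, the run_campaign name, and the per-run budget are assumptions; the format of autoharness's real records under .autoharness/ is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_campaign(state_file: Path, proposals: list[str],
                 evaluate: Callable[[str], float], budget: int) -> dict:
    """Evaluate up to `budget` new proposals, checkpointing after each so a rerun resumes."""
    # Hypothetical state layout: {"done": {proposal: score}, "best": proposal or None}.
    state = json.loads(state_file.read_text()) if state_file.exists() else {"done": {}, "best": None}
    evaluated_now = 0
    for proposal in proposals:
        if proposal in state["done"]:
            continue  # scored in an earlier run; skip on resume
        if evaluated_now >= budget:
            break
        state["done"][proposal] = evaluate(proposal)
        best = state["best"]
        if best is None or state["done"][proposal] > state["done"][best]:
            state["best"] = proposal
        state_file.write_text(json.dumps(state))  # checkpoint after every evaluation
        evaluated_now += 1
    return state


scores = {"p1": 0.52, "p2": 0.61, "p3": 0.58}  # invented scores
calls = []

def evaluate(p: str) -> float:
    calls.append(p)
    return scores[p]

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "campaign.json"
    run_campaign(f, ["p1", "p2", "p3"], evaluate, budget=2)          # scores p1, p2, then stops
    final = run_campaign(f, ["p1", "p2", "p3"], evaluate, budget=2)  # resumes with p3 only
print(calls)          # ['p1', 'p2', 'p3']
print(final["best"])  # p2
```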
Early Results
Example from one tau2 airline benchmark study. Relative deltas are measured against the baseline harness on the same workload. Results depend on the benchmark, harness, and evaluation setup, and some intervention combinations can regress.
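The README does not spell out the formula, but a relative delta against a baseline is conventionally (candidate − baseline) / baseline. The sketch below assumes that definition and uses made-up scores, not numbers from the tau2 study.

```python
def relative_delta(candidate: float, baseline: float) -> float:
    """Relative change of a candidate's score versus the baseline on the same workload."""
    if baseline == 0:
        raise ValueError("relative delta is undefined for a zero baseline")
    return (candidate - baseline) / baseline


# Made-up pass rates, not results from the tau2 study.
print(f"{relative_delta(0.60, 0.50):+.0%}")  # +20%
print(f"{relative_delta(0.45, 0.50):+.0%}")  # -10% (a regression)
```

Because the delta is relative, the same absolute gain looks larger on a weak baseline than on a strong one, which is one reason results depend on the benchmark and setup.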
Docs
For Power Users
- Background campaign workers plus queue and worker-state inspection
- Root-level memory, transfer suggestions, and portfolio scheduling
- Retention policies, pruning, and portable report and bundle exports
- Event logs, inspection commands, and operational reporting surfaces
- Python plugin hooks for generators, preflight checks, and search strategies
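As an example of what a retention policy might do, the sketch below prunes iteration records while always keeping the current champion. The Record type, the prune function, and the keep-the-last-N rule are hypothetical; the README does not describe autoharness's actual policies.

```python
from dataclasses import dataclass


@dataclass
class Record:
    iteration: int
    score: float


def prune(records: list[Record], champion_iteration: int, keep_last: int) -> list[Record]:
    """Hypothetical policy: keep the champion plus the `keep_last` most recent records."""
    # Guard keep_last == 0, since a [-0:] slice would keep everything.
    recent = sorted(records, key=lambda r: r.iteration)[-keep_last:] if keep_last > 0 else []
    keep = {r.iteration for r in recent} | {champion_iteration}
    return [r for r in records if r.iteration in keep]


records = [Record(i, s) for i, s in enumerate([0.50, 0.62, 0.55, 0.58, 0.57])]
kept = prune(records, champion_iteration=1, keep_last=2)
print([r.iteration for r in kept])  # [1, 3, 4]
```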
Want deeper analysis or a custom optimization workflow? Kayba offers managed harness optimization and agent-improvement support tailored to your stack.
Star this repo if you find it useful!
Built with ❤️ by Kayba and the open-source community.
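Similar articles
Claude Code boosted my agent framework's performance by 40% overnight
The author introduces "Autoharness", a tool that uses Claude Code to autonomously optimize agent frameworks by iterating on prompts and hyperparameters. On the tau2-airline benchmark, the tool improved performance by 40%.
Remote Agent Manager
A tool for remotely managing and controlling AI agents.
@GitHub_Daily: When you use Claude Code on complex projects, a single agent's capabilities are limited. You want multiple agents to collaborate and divide the work, but manually configuring the team structure and skill files is too tedious. I recently found Harness, a Claude Code plugin: describe your project in one sentence and it automatically generates a whole…
Harness is a Claude Code plugin that automatically generates a multi-agent team architecture from a one-sentence description. It ships with 6 built-in collaboration patterns and 100 ready-made configurations, helping Claude Code move from working solo to working as a team.
@QingQ77: Turn any GitHub repository into its own AI agent, with a dedicated CLI, MCP service, memory, and signature authentication, ready to publish directly to npm. https://github.com/ruvnet/agent-harness-generator… You…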
MetaHarness converts any GitHub repository into a custom AI agent harness with CLI, MCP service, memory, and signing, allowing deployment on multiple agent platforms.
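@XAMTO_AI: Want to build a production-grade agent harness from scratch yourself? Dream on. Almost everyone who thought they could just pick a framework and call it done has crashed and burned. The truth is that this thing can't be solved by "choosing a framework" at all; behind it sit 15 hard-core responsibilities you can't avoid, and every one of them has to be made installable, versionable, and runnable across languages as a w…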
The article argues that production agent harnesses should not be monolithic frameworks but rather a stack of independent, replaceable workers connected by a shared trigger primitive, outlining 15 core responsibilities and how the iii engine implements this approach.