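@BohuTANG: I used to want cross-model review, with models reviewing each other's work, but that approach was too slow for me. I have now found a new approach: /harden, in which the same model converges over two rounds. It works very well. If you are interested, give this skill a try.
Summary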
BohuTANG introduces /harden, a method for same-model two-round convergence, and highlights the evot agent engine which completes complex tasks with fewer tokens and lower cost than alternatives like Claude Code.
Cached at: 2026/06/22 05:37
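I used to want cross-model review, with models reviewing each other's work, but that approach was too slow for me. I have now found a new approach: /harden, in which the same model converges over two rounds. It works very well. If you are interested, give this skill a try. https://t.co/MRkeA9uMjl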
evotai/evot
Source: https://github.com/evotai/evot
Evot
An agent engine that completes complex, long-running work with minimal tokens and maximum quality.
Every gain measured under a rigorous trace + eval framework — earned through relentless iteration, never guessed at.
News · Benchmark · Why · Dashboard · Install · Quickstart · Dev
📢 News
- 2026-06-16 [REPL] Shift+Tab cycles reasoning effort; persisted per session.
- 2026-06-05 [Dashboard] Built-in web dashboard — server metrics, sessions, usage, and tool traces.
- 2026-05-30 [Engine] Major refactor — four-pass compaction, pi-aligned parallel tools, leaner core.
- 2026-05-11 [Skills] Built-in opencli: browser control, logged-in cookies, Feishu/Lark, Twitter/X.
Benchmark
Same task, same eval environment, different models. evot completes the work with fewer tokens, less time, and lower cost — on both frontier and open-source models.
Figure: benchmark charts for Claude Opus 4.6 and DeepSeek V4 Pro.
Task: Fix a real bug in serde_json (issue #979) — investigate root cause, apply fix, write regression test, verify all tests pass.
| Model | Metric | evot | claude-code | Difference |
|---|---|---|---|---|
| Opus 4.6 | Cost | $2.24 | $6.16 | 64% cheaper |
| Opus 4.6 | Time | 2m 56s | 3m 51s | 24% faster |
| Opus 4.6 | Input tokens | 574.8K | 1.5M | 62% fewer |
| DeepSeek V4 Pro | Cost | $0.02 | $0.07 | 67% cheaper |
| DeepSeek V4 Pro | Time | 6m 10s | 16m 34s | 63% faster |
| DeepSeek V4 Pro | Input tokens | 42.9K | 133.8K | 68% fewer |
All agents produce correct, passing code. The difference is how they manage context.
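The Difference column in the table above can be rechecked from the published figures. The sketch below is only an illustration: the helper names (`parse_duration`, `parse_tokens`, `percent_less`) are hypothetical, and it assumes each percentage is the relative reduction of the evot value against the claude-code value, rounded to a whole number.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '2m 56s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return minutes * 60 + seconds


def parse_tokens(text: str) -> float:
    """Convert a token count such as '574.8K' or '1.5M' into a number."""
    units = {"K": 1_000, "M": 1_000_000}
    return float(text[:-1]) * units[text[-1]]


def percent_less(evot: float, other: float) -> int:
    """Relative reduction of the evot value versus the other agent (assumed formula)."""
    return round((1 - evot / other) * 100)


# Values copied from the benchmark table.
rows = [
    ("Opus 4.6 cost", 2.24, 6.16),
    ("Opus 4.6 time", parse_duration("2m 56s"), parse_duration("3m 51s")),
    ("Opus 4.6 input tokens", parse_tokens("574.8K"), parse_tokens("1.5M")),
    ("DeepSeek V4 Pro time", parse_duration("6m 10s"), parse_duration("16m 34s")),
    ("DeepSeek V4 Pro input tokens", parse_tokens("42.9K"), parse_tokens("133.8K")),
]

for label, evot, other in rows:
    print(f"{label}: {percent_less(evot, other)}% less")
```

Under that assumption, five of the six rows reproduce the published percentages (64, 24, 62, 63, 68). The DeepSeek V4 Pro cost row does not: $0.02 against $0.07 gives about 71%, not 67%, which is presumably because the dollar amounts shown are rounded to cents.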
Why is evot faster and cheaper?
Give the LLM less context, but higher-quality context. Where other agents call the LLM to summarize when context overflows — burning extra tokens and time — evot uses zero LLM calls for context management:
- Algorithmic compaction — a four-pass Rust pipeline (Reclaim → Shrink → Collapse → Evict) runs in microseconds between turns. Images downgrade to path references; old turns collapse to one-line summaries.
- Spill to disk — large tool results write to disk with a short preview. The model re-reads on demand instead of carrying megabytes in context.
- Compaction markers — structured metadata (files modified, conclusions, environment state) survives compaction, so progress is never lost.
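The three ideas above can be illustrated with a toy model in Python. This is not evot's Rust implementation: the source names the four passes but only says that images downgrade to path references and old turns collapse to one-line summaries, so the assignment of work to each pass, the `Turn` type, and every threshold here are assumptions made for illustration.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str                           # "user", "assistant", or "tool"
    text: str
    image_path: Optional[str] = None    # where an attached image lives on disk
    image_data: Optional[bytes] = None  # inline image payload still held in context
    is_marker: bool = False             # compaction marker: never collapsed or evicted


def spill(text: str, directory: str, preview_chars: int = 80) -> str:
    """Write a large result to disk and return a short preview plus its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return f"{text[:preview_chars]}... [{len(text)} chars spilled to {path}]"


def reclaim(turns: List[Turn]) -> None:
    """Assumed pass 1: drop inline image bytes and keep only a path reference."""
    for turn in turns:
        if turn.image_data is not None and turn.image_path:
            turn.image_data = None
            turn.text += f" [image: {turn.image_path}]"


def shrink(turns: List[Turn], directory: str, max_chars: int) -> None:
    """Assumed pass 2: spill oversized tool results to disk."""
    for turn in turns:
        if turn.role == "tool" and len(turn.text) > max_chars:
            turn.text = spill(turn.text, directory)


def collapse(turns: List[Turn], keep_recent: int) -> None:
    """Assumed pass 3: cut old non-marker turns down to a truncated first line."""
    cutoff = max(len(turns) - keep_recent, 0)
    for turn in turns[:cutoff]:
        if not turn.is_marker:
            lines = turn.text.splitlines()
            turn.text = (lines[0] if lines else "")[:60]


def evict(turns: List[Turn], budget_chars: int) -> List[Turn]:
    """Assumed pass 4: drop the oldest non-marker turns until the budget fits."""
    kept = list(turns)
    while sum(len(t.text) for t in kept) > budget_chars:
        victim = next((t for t in kept if not t.is_marker), None)
        if victim is None:
            break  # only markers remain; nothing more can be dropped
        kept.remove(victim)
    return kept


def compact(turns: List[Turn], directory: str, max_chars: int = 200,
            keep_recent: int = 2, budget_chars: int = 2000) -> List[Turn]:
    """Run the four passes in the order named in the text. No LLM call is involved."""
    reclaim(turns)
    shrink(turns, directory, max_chars)
    collapse(turns, keep_recent)
    return evict(turns, budget_chars)
```

Every step is plain string and list manipulation, which is the property the text emphasizes. A turn flagged `is_marker` passes through all four passes unchanged, mirroring the claim that structured metadata survives compaction.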
Every gain is earned under a rigorous trace + eval framework, not guessed at. Each engine change is measured against live traces and a reproducible benchmark pipeline — the same real-world tasks run against Claude Code and Codex (latest versions) — before it ships. Token usage, cost, time, and success rate must improve or hold. Relentless trial and iteration, where the numbers decide what stays. Continuous improvement, no regression.
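The rule that token usage, cost, time, and success rate must improve or hold can be pictured as a simple gate. The sketch below is a hypothetical illustration, not evot's benchmark pipeline: `RunMetrics` and `regressions` are invented names, and the strict comparison with no noise tolerance is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunMetrics:
    input_tokens: int
    cost_usd: float
    seconds: float
    success_rate: float  # fraction of benchmark tasks solved, 0.0 to 1.0


def regressions(baseline: RunMetrics, candidate: RunMetrics) -> List[str]:
    """List every metric on which the candidate is worse than the baseline."""
    problems = []
    # Lower is better for these three metrics.
    for name in ("input_tokens", "cost_usd", "seconds"):
        before, after = getattr(baseline, name), getattr(candidate, name)
        if after > before:
            problems.append(f"{name} rose from {before} to {after}")
    # Higher is better for success rate.
    if candidate.success_rate < baseline.success_rate:
        problems.append(
            f"success_rate fell from {baseline.success_rate} to {candidate.success_rate}"
        )
    return problems


baseline = RunMetrics(input_tokens=600_000, cost_usd=2.40, seconds=190.0, success_rate=1.0)
candidate = RunMetrics(input_tokens=574_800, cost_usd=2.24, seconds=176.0, success_rate=1.0)
print("ship" if not regressions(baseline, candidate) else "hold back")
```

A real pipeline would likely average several runs or allow a small tolerance, since single-run timings and token counts vary; the strict check here keeps the idea visible.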
Dashboard
Evot ships with a built-in web dashboard for real-time observability: server resource usage, all connected sessions, and per-session detail — token usage, tool call sequences, and span-level traces.
| Overview — server metrics & sessions | Session detail — usage & tool traces |
![]() |
![]() |
Installation
One-liner (recommended)
curl -fsSL https://evot.ai/install | sh
From source
git clone https://github.com/evotai/evot.git
cd evot
make setup && make install
evot
Quickstart
1. Set your API key
Create ~/.evotai/evot.env:
# Anthropic (default)
EVOT_LLM_ANTHROPIC_API_KEY=sk-ant-...
EVOT_LLM_ANTHROPIC_BASE_URL=your-anthropic-base-url
EVOT_LLM_ANTHROPIC_MODEL=claude-opus-4-6
# Multiple models: EVOT_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6,claude-opus-4-6
# Or OpenAI
# EVOT_LLM_OPENAI_API_KEY=sk-...
# EVOT_LLM_OPENAI_BASE_URL=your-openai-base-url/v1
# EVOT_LLM_OPENAI_MODEL=gpt-5.5
# Or DeepSeek (Anthropic-compatible)
# EVOT_LLM_DEEPSEEK_API_KEY=sk-...
# EVOT_LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com/anthropic
# EVOT_LLM_DEEPSEEK_PROTOCOL=anthropic
# EVOT_LLM_DEEPSEEK_MODEL=deepseek-v4-pro
# Or Xiaomi MiMo-V2.5-Pro (Anthropic-compatible)
# EVOT_LLM_XIAOMI_API_KEY=tp-...
# EVOT_LLM_XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/anthropic
# EVOT_LLM_XIAOMI_PROTOCOL=anthropic
# EVOT_LLM_XIAOMI_MODEL=mimo-v2.5-pro
Use --model provider:model for one-off overrides.
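The variable names in evot.env follow a visible pattern, EVOT_LLM_<PROVIDER>_<FIELD>, and the --model override takes a provider:model pair. The Python sketch below only illustrates that naming scheme. It is not how evot reads its configuration, and the functions `parse_env`, `models_for`, and `split_model_spec` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# EVOT_LLM_<PROVIDER>_<FIELD>, e.g. EVOT_LLM_DEEPSEEK_BASE_URL
KEY_PATTERN = re.compile(r"^EVOT_LLM_([A-Z0-9]+)_([A-Z_]+)$")


def parse_env(text: str) -> Dict[str, Dict[str, str]]:
    """Group active (uncommented) settings by provider name."""
    providers: Dict[str, Dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        match = KEY_PATTERN.match(key.strip())
        if match:
            provider, field = match.groups()
            providers.setdefault(provider.lower(), {})[field.lower()] = value.strip()
    return providers


def models_for(providers: Dict[str, Dict[str, str]], provider: str) -> List[str]:
    """Split a comma-separated MODEL value into individual model names."""
    value = providers.get(provider, {}).get("model", "")
    return [name.strip() for name in value.split(",") if name.strip()]


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split a 'provider:model' string as used with --model."""
    provider, _, model = spec.partition(":")
    return provider, model


sample = """
# Anthropic (default)
EVOT_LLM_ANTHROPIC_API_KEY=sk-ant-...
EVOT_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6,claude-opus-4-6
# EVOT_LLM_DEEPSEEK_PROTOCOL=anthropic
"""
config = parse_env(sample)
print(sorted(config), models_for(config, "anthropic"))
print(split_model_spec("deepseek:deepseek-v4-pro"))
```

Commented-out lines are skipped, so switching provider in this reading means uncommenting one block of variables.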
2. Run
evot # interactive REPL
evot -p "summarize today's PRs" # one-shot task
evot -p "review this" -f ./src/main.rs # attach file context
evot -p "continue work" -c # continue latest session in cwd
evot -p "continue work" -r my-session # resume or create session
In the REPL: /help lists commands, Shift+Tab cycles the reasoning effort.
CLI flags & options
| Flag | Description |
|---|---|
| -p, --prompt | Run a single prompt and exit |
| -f, --file <path> | Attach file/directory context (repeatable) |
| -c, --continue | Continue the latest session in the current directory |
| -r, --resume <id> | Resume or create a session |
| --model <model> | Override the configured model |
| --env-file <path> | Path to a custom evot.env |
| --skills <dir> | Add a skills directory (repeatable) |
| --verbose | Enable info-level logging |
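For scripted use, the flags in the table can be assembled into an argument list. The Python helper below is a hypothetical convenience wrapper, not part of evot. It uses only the flags listed above, builds the command without running it, and makes no assumption about which flags evot allows together.

```python
import shlex
from typing import List, Optional, Sequence


def build_evot_command(
    prompt: str,
    files: Sequence[str] = (),
    continue_latest: bool = False,
    resume: Optional[str] = None,
    model: Optional[str] = None,
    env_file: Optional[str] = None,
    skills: Sequence[str] = (),
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for a one-shot evot run from the documented flags."""
    command = ["evot", "-p", prompt]
    for path in files:            # -f is repeatable
        command += ["-f", path]
    if continue_latest:
        command.append("-c")
    if resume is not None:
        command += ["-r", resume]
    if model is not None:
        command += ["--model", model]
    if env_file is not None:
        command += ["--env-file", env_file]
    for directory in skills:      # --skills is repeatable
        command += ["--skills", directory]
    if verbose:
        command.append("--verbose")
    return command


command = build_evot_command(
    "review this",
    files=["./src/main.rs"],
    model="deepseek:deepseek-v4-pro",
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute it, provided evot is installed and on PATH.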
Development
make setup # install Rust toolchain, git hooks
make test # all tests (engine + CLI)
make install # compile standalone binary to ~/.evotai/bin/evot
License
Apache-2.0
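Similar articles
@BohuTANG: While developing Evot, I found that to get the most out of Anthropic (Opus series) models, the official Claude Code approach is essentially the optimal solution and is hard to work around. After in-depth analysis and quantitative validation of the Claude Code prompt, I found that during the training stage they…
While developing Evot, the author found that to get the most out of Anthropic Opus models, the official Claude Code approach is optimal, because Agent Harness behavior patterns were built into the weights during training rather than achieved through pure prompt engineering; future Agent Harness competition will push behavior down into the model layer.
@xiaohu: The CLAUDE.md of Claude Code's own creator is now just two lines... The Claude Code team talks about "less is more" and shares how to communicate with models as their capabilities grow: "Don't fight the model by piling things on, because models get stronger with every generation, and whatever you painstakingly build today will soon be wasted effort." For…
The Claude Code team shared best practices: keep CLAUDE.md as short as possible and clear it out regularly; the reason for sticking with a CLI rather than a GUI is that models are improving too quickly; fixing bugs with AI is already very efficient; the core strategy is to subtract, configure lightly, and trust the model's capabilities.
@zhixianio: Since my new machine arrived a couple of days ago, I have started an ascetic practice of forcing myself to use local models for common tasks. I expected it to be very painful, but both speed and quality far exceeded my expectations: Model: Qwen3.6-35B-A3B-oQ6-fp16-mtp Runtime: oMLX, with N…
On a new local machine, the author used the Qwen3.6-35B-A3B model and the oMLX tool for everyday tasks and found that speed and results far exceeded expectations, even outperforming remote LLMs in PA and coding scenarios, which reflects a significant improvement in on-device AI capability.
@shao__meng: Claude Code, Cursor, Codex, Aider, and Cline may in some cases share exactly the same underlying model, yet the agents perform differently. Why? @addyosmani argues that it comes down to the "shell" layered on top of the model, the Harness, which includes "prompts, …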
The article discusses how Addy Osmani argues that the performance difference between AI coding agents like Claude Code, Cursor, and Cline stems from their 'Harness'—the layer of prompts, tools, and constraints around the model—rather than the underlying model itself. It details best practices for harness engineering, including hooks, sandboxing, and context management, to bridge the gap between model capability and actual agent performance.
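@Yuancheng: ➤ New Agent Harness ideas and practices keep appearing lately. Over the past couple of days I came across **OpenSquilla**, an open-source AI Agent that can be hosted locally. ① It has intelligent model routing: for the same task, the token cost compared with OpenClaw is reduced by …
OpenSquilla is an open-source, locally hostable AI Agent with intelligent model routing, which distributes tasks across different models to save token cost, and it introduces a MetaSkill mechanism that lets the Agent organize its skills automatically.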



