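@BohuTANG: I used to want different models to review each other's work, but that approach was too slow for me. I have now found a new one: /harden, where the same model converges over two rounds. It works very well; if you are interested, give this skill a try.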

X AI KOLs Timeline 2026/06/22 03:26 Tools

agent-engine open-source github benchmark efficiency context-management

Summary

BohuTANG introduces /harden, a method for same-model two-round convergence, and highlights the evot agent engine which completes complex tasks with fewer tokens and lower cost than alternatives like Claude Code.

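I used to want different models to review each other's work, but that approach was too slow for me. I have now found a new one: /harden, where the same model converges over two rounds. It works very well; if you are interested, give this skill a try. https://t.co/MRkeA9uMjl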

Cached at: 2026/06/22 05:37

evotai/evot

Source: https://github.com/evotai/evot

Evot

An agent engine that completes complex, long-running work with minimal tokens and maximum quality.

Every gain measured under a rigorous trace + eval framework — earned through relentless iteration, never guessed at.

📢 News

2026-06-16 [REPL] Shift+Tab cycles reasoning effort; persisted per session.
2026-06-05 [Dashboard] Built-in web dashboard — server metrics, sessions, usage, and tool traces.
2026-05-30 [Engine] Major refactor — four-pass compaction, pi-aligned parallel tools, leaner core.
2026-05-11 [Skills] Built-in opencli — browser control, logged-in cookies, Feishu/Lark, Twitter/X.

Benchmark

Same task, same eval environment, different models. evot completes the work with fewer tokens, less time, and lower cost — on both frontier and open-source models.

Task: Fix a real bug in serde_json (issue #979) — investigate root cause, apply fix, write regression test, verify all tests pass.

Model	Metric	evot	claude-code	Difference
Opus 4.6	Cost	$2.24	$6.16	64% cheaper
Opus 4.6	Time	2m 56s	3m 51s	24% faster
Opus 4.6	Input tokens	574.8K	1.5M	62% fewer
DeepSeek V4 Pro	Cost	$0.02	$0.07	67% cheaper
DeepSeek V4 Pro	Time	6m 10s	16m 34s	63% faster
DeepSeek V4 Pro	Input tokens	42.9K	133.8K	68% fewer

All agents produce correct, passing code. The difference is how they manage context.
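
The Difference column can be rechecked from the other two columns. The sketch below assumes the column is the relative reduction, 1 - evot / claude-code, rounded to a whole percent; that formula is an inference from the numbers, not something the README states, and the helper names are illustrative.

```python
def percent_reduction(evot: float, baseline: float) -> float:
    """Return how much smaller `evot` is than `baseline`, as a percentage."""
    return (1.0 - evot / baseline) * 100.0


def seconds(minutes: int, secs: int) -> int:
    """Convert a 'Xm Ys' duration to seconds."""
    return minutes * 60 + secs


# (label, evot value, claude-code value), copied from the benchmark table.
rows = [
    ("Opus 4.6 cost ($)", 2.24, 6.16),
    ("Opus 4.6 time (s)", seconds(2, 56), seconds(3, 51)),
    ("Opus 4.6 input tokens (K)", 574.8, 1500.0),
    ("DeepSeek V4 Pro cost ($)", 0.02, 0.07),
    ("DeepSeek V4 Pro time (s)", seconds(6, 10), seconds(16, 34)),
    ("DeepSeek V4 Pro input tokens (K)", 42.9, 133.8),
]

for label, evot_value, baseline_value in rows:
    reduction = percent_reduction(evot_value, baseline_value)
    print(f"{label}: {reduction:.0f}% lower")
```

Five of the six rows reproduce the published figures (64%, 24%, 62%, 63%, 68%). The DeepSeek V4 Pro cost row comes out at about 71% rather than 67% when computed from the two-decimal values shown; one plausible explanation is that the published figure was computed from unrounded costs.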

Why is evot faster and cheaper?

Give the LLM less context, but higher-quality context. Where other agents call the LLM to summarize when context overflows — burning extra tokens and time — evot uses zero LLM calls for context management:

Algorithmic compaction — a four-pass Rust pipeline (Reclaim → Shrink → Collapse → Evict) runs in microseconds between turns. Images downgrade to path references; old turns collapse to one-line summaries.
Spill to disk — large tool results write to disk with a short preview. The model re-reads on demand instead of carrying megabytes in context.
Compaction markers — structured metadata (files modified, conclusions, environment state) survives compaction, so progress is never lost.
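
To make the idea concrete, here is a toy sketch of LLM-free context management: large tool results are spilled to disk with a short preview, and older turns are collapsed to one line. It is not evot's Rust pipeline and does not model the four named passes; the `Turn` type, the thresholds, and every function name are hypothetical choices made for illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

PREVIEW_CHARS = 80      # hypothetical preview length
SPILL_THRESHOLD = 200   # hypothetical size above which a tool result is spilled
KEEP_RECENT_TURNS = 2   # hypothetical number of turns kept verbatim


@dataclass
class Turn:
    role: str                          # "user", "assistant" or "tool"
    text: str
    spilled_to: Optional[Path] = None  # where the full text lives, if spilled


def spill_large_result(turn: Turn, spill_dir: Path, index: int) -> Turn:
    """Write an oversized tool result to disk and keep only a short preview."""
    if turn.role != "tool" or len(turn.text) <= SPILL_THRESHOLD:
        return turn
    path = spill_dir / f"tool_result_{index}.txt"
    path.write_text(turn.text, encoding="utf-8")
    return Turn("tool", turn.text[:PREVIEW_CHARS], spilled_to=path)


def collapse_old_turn(turn: Turn) -> Turn:
    """Reduce an old turn to its truncated first line; no LLM call involved."""
    lines = turn.text.splitlines()
    first_line = lines[0] if lines else ""
    return Turn(turn.role, first_line[:PREVIEW_CHARS], turn.spilled_to)


def compact(history: list[Turn], spill_dir: Path) -> list[Turn]:
    """Spill large tool results, then collapse all but the most recent turns."""
    spilled = [spill_large_result(t, spill_dir, i) for i, t in enumerate(history)]
    cutoff = max(len(spilled) - KEEP_RECENT_TURNS, 0)
    return [collapse_old_turn(t) if i < cutoff else t for i, t in enumerate(spilled)]


def read_full(turn: Turn) -> str:
    """Re-read a spilled result on demand; otherwise return the inline text."""
    if turn.spilled_to is not None:
        return turn.spilled_to.read_text(encoding="utf-8")
    return turn.text


def context_size(history: list[Turn]) -> int:
    """Total characters the model would have to carry for this history."""
    return sum(len(t.text) for t in history)


history = [
    Turn("user", "Find the failing test.\nIt is somewhere in the parser module."),
    Turn("tool", "test log line\n" * 500),
    Turn("assistant", "The failure is in the number parser."),
    Turn("user", "Fix it and add a regression test."),
]
with tempfile.TemporaryDirectory() as tmp:
    compacted = compact(history, Path(tmp))
    full_log = read_full(compacted[1])  # still recoverable while the file exists
print(context_size(history), "->", context_size(compacted))
```

The point of the sketch is that every step is plain string and file handling, so it costs no tokens and runs between turns. Keeping `spilled_to` on the collapsed turn plays the role of a marker that survives compaction: the preview shrinks, but the pointer to the full output does not get lost.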

Every gain is earned under a rigorous trace + eval framework, not guessed at. Each engine change is measured against live traces and a reproducible benchmark pipeline — the same real-world tasks run against Claude Code and Codex (latest versions) — before it ships. Token usage, cost, time, and success rate must improve or hold. Relentless trial and iteration, where the numbers decide what stays. Continuous improvement, no regression.
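
The "must improve or hold" rule can be read as a simple acceptance gate. The sketch below is one possible reading, not evot's benchmark pipeline: the `RunMetrics` type and `improves_or_holds` function are hypothetical, and the numbers in the usage lines are made up purely to exercise the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    input_tokens: int
    cost_usd: float
    seconds: float
    success_rate: float  # fraction of benchmark tasks solved, 0.0 to 1.0


def improves_or_holds(baseline: RunMetrics, candidate: RunMetrics) -> bool:
    """Accept a change only if no metric gets worse.

    Tokens, cost and time must not go up; success rate must not go down.
    """
    return (
        candidate.input_tokens <= baseline.input_tokens
        and candidate.cost_usd <= baseline.cost_usd
        and candidate.seconds <= baseline.seconds
        and candidate.success_rate >= baseline.success_rate
    )


# Illustrative values only; they are not measurements from the project.
baseline = RunMetrics(input_tokens=100_000, cost_usd=1.00, seconds=120.0, success_rate=1.0)
cheaper = RunMetrics(input_tokens=80_000, cost_usd=0.80, seconds=120.0, success_rate=1.0)
cheaper_but_flaky = RunMetrics(input_tokens=80_000, cost_usd=0.80, seconds=120.0, success_rate=0.9)

print(improves_or_holds(baseline, cheaper))            # True
print(improves_or_holds(baseline, cheaper_but_flaky))  # False
```

A real gate would probably need a tolerance for run-to-run noise in time and token counts; the strict comparison here is kept only to mirror the wording of the rule.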

Dashboard

Evot ships with a built-in web dashboard for real-time observability: server resource usage, all connected sessions, and per-session detail — token usage, tool call sequences, and span-level traces.

Screenshots: Overview (server metrics and sessions); Session detail (usage and tool traces).

Installation

One-liner (recommended)

curl -fsSL https://evot.ai/install | sh

From source

git clone https://github.com/evotai/evot.git
cd evot
make setup && make install
evot

Quickstart

1. Set your API key

Create ~/.evotai/evot.env:

# Anthropic (default)
EVOT_LLM_ANTHROPIC_API_KEY=sk-ant-...
EVOT_LLM_ANTHROPIC_BASE_URL=your-anthropic-base-url
EVOT_LLM_ANTHROPIC_MODEL=claude-opus-4-6
# Multiple models: EVOT_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6,claude-opus-4-6

# Or OpenAI
# EVOT_LLM_OPENAI_API_KEY=sk-...
# EVOT_LLM_OPENAI_BASE_URL=your-openai-base-url/v1
# EVOT_LLM_OPENAI_MODEL=gpt-5.5

# Or DeepSeek (Anthropic-compatible)
# EVOT_LLM_DEEPSEEK_API_KEY=sk-...
# EVOT_LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com/anthropic
# EVOT_LLM_DEEPSEEK_PROTOCOL=anthropic
# EVOT_LLM_DEEPSEEK_MODEL=deepseek-v4-pro

# Or Xiaomi MiMo-V2.5-Pro (Anthropic-compatible)
# EVOT_LLM_XIAOMI_API_KEY=tp-...
# EVOT_LLM_XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/anthropic
# EVOT_LLM_XIAOMI_PROTOCOL=anthropic
# EVOT_LLM_XIAOMI_MODEL=mimo-v2.5-pro

Use --model provider:model for one-off overrides.
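
The variable names above follow an EVOT_LLM_<PROVIDER>_<FIELD> pattern. As a rough illustration of that layout, the sketch below groups such keys by provider and splits a `provider:model` override. It is not evot's loader: the parsing rules, the lowercase provider names, and all function names are assumptions, and the actual provider token that `--model` expects should be checked against the tool itself.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def provider_settings(env: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group EVOT_LLM_<PROVIDER>_<FIELD> keys by provider (lowercased)."""
    prefix = "EVOT_LLM_"
    fields = ("API_KEY", "BASE_URL", "PROTOCOL", "MODEL")
    grouped: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        for field in fields:
            if rest.endswith("_" + field):
                provider = rest[: -len(field) - 1].lower()
                grouped.setdefault(provider, {})[field.lower()] = value
                break
    return grouped


def split_model_override(arg: str) -> tuple[str, str]:
    """Split a `provider:model` string at the first colon."""
    provider, sep, model = arg.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {arg!r}")
    return provider, model


SAMPLE = """
# Anthropic (default)
EVOT_LLM_ANTHROPIC_API_KEY=sk-ant-...
EVOT_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6,claude-opus-4-6
# EVOT_LLM_OPENAI_API_KEY=sk-...
"""

settings = provider_settings(parse_env(SAMPLE))
print(settings["anthropic"]["model"].split(","))  # the comma-separated model list
print(split_model_override("deepseek:deepseek-v4-pro"))
```

Commented-out providers are ignored, which matches how the sample file is meant to be used: uncomment one provider block at a time.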

2. Run

evot                                          # interactive REPL
evot -p "summarize today's PRs"               # one-shot task
evot -p "review this" -f ./src/main.rs        # attach file context
evot -p "continue work" -c                    # continue latest session in cwd
evot -p "continue work" -r my-session         # resume or create session

In the REPL: /help lists commands, Shift+Tab cycles the reasoning effort.

CLI flags & options

Flag	Description
`-p, --prompt`	Run a single prompt and exit
`-f, --file <path>`	Attach file/directory context (repeatable)
`-c, --continue`	Continue the latest session in the current directory
`-r, --resume <id>`	Resume or create a session
`--model <model>`	Override the configured model
`--env-file <path>`	Path to a custom `evot.env`
`--skills <dir>`	Add a skills directory (repeatable)
`--verbose`	Enable info-level logging
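
For scripting, the flags in the table can be assembled into an argument list. The helper below is a hypothetical convenience wrapper, not part of evot; it uses only the flags listed above and does not assume anything about how they interact.

```python
import shlex
from typing import Optional, Sequence


def build_evot_command(
    prompt: str,
    files: Sequence[str] = (),
    continue_latest: bool = False,
    resume: Optional[str] = None,
    model: Optional[str] = None,
    env_file: Optional[str] = None,
    verbose: bool = False,
) -> list[str]:
    """Assemble an argv list for a one-shot `evot` run from the documented flags."""
    cmd = ["evot", "-p", prompt]
    for path in files:          # -f is repeatable
        cmd += ["-f", path]
    if continue_latest:
        cmd.append("-c")
    if resume is not None:
        cmd += ["-r", resume]
    if model is not None:
        cmd += ["--model", model]
    if env_file is not None:
        cmd += ["--env-file", env_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_evot_command("review this", files=["./src/main.rs"])
print(shlex.join(cmd))  # shell-quoted form, for logging
```

The list can be passed directly to `subprocess.run(cmd)`; building it as a list rather than a string avoids shell-quoting problems when the prompt contains spaces or quotes.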

Development

make setup        # install Rust toolchain, git hooks
make test         # all tests (engine + CLI)
make install      # compile standalone binary to ~/.evotai/bin/evot

License

Apache-2.0

Similar articles

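@BohuTANG: While developing Evot, I found that to get the most out of Anthropic (Opus series) models, the official Claude Code approach is essentially the optimal solution and is hard to get around. After a deep analysis and quantitative validation of the Claude Code prompt, I found that during training they had already…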

X AI KOLs Timeline

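While developing Evot, the author found that the official Claude Code approach is the optimal way to get the most out of Anthropic Opus models, because Agent Harness behavior patterns were built into the weights during training rather than achieved through prompt engineering alone; future Agent Harness competition will push behavior down into the model layer.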

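@xiaohu: The creator of Claude Code's own CLAUDE.md is now just two lines... The Claude Code team discusses "less is more" and shares how to communicate with models as their capabilities grow: "Don't fight the model by piling things on, because models get stronger every generation, and what you painstakingly build today will soon be wasted effort." …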

X AI KOLs Timeline

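The Claude Code team shared usage best practices: keep CLAUDE.md as short as possible and clear it out regularly; they stick with a CLI rather than a GUI because models are improving so quickly; fixing bugs with AI is already very efficient; the core strategy is to subtract, configure lightly, and trust the model's capabilities.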

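@zhixianio: Since my new machine arrived a couple of days ago, I have been on an "ascetic" regimen of forcing myself to use local models for common tasks. I expected it to be painful, but both speed and quality far exceeded my expectations. Model: Qwen3.6-35B-A3B-oQ6-fp16-mtp. Runtime: oMLX, with N… enabled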

X AI KOLs Timeline

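On a new local machine, the author used the Qwen3.6-35B-A3B model with the oMLX tool for everyday tasks and found that speed and quality far exceeded expectations, even outperforming remote LLMs in PA and coding scenarios, which reflects a significant improvement in on-device AI capability.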

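@shao__meng: Claude Code, Cursor, Codex, Aider, and Cline may in some cases run on exactly the same underlying model, yet the agents perform differently. Why? @addyosmani argues that it comes down to the "shell" layered on top of the model, the Harness, which includes "prompts, …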

X AI KOLs Timeline

The article discusses how Addy Osmani argues that the performance difference between AI coding agents like Claude Code, Cursor, and Cline stems from their 'Harness'—the layer of prompts, tools, and constraints around the model—rather than the underlying model itself. It details best practices for harness engineering, including hooks, sandboxing, and context management, to bridge the gap between model capability and actual agent performance.

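@Yuancheng: ➤ New Agent Harness ideas and practices keep emerging. Over the past couple of days I came across OpenSquilla, an open-source, locally hostable AI Agent. ① It has intelligent model routing: for the same task, it saves … in token cost compared with OpenClaw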

X AI KOLs Timeline

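OpenSquilla is an open-source, locally hostable AI Agent with intelligent model routing, which distributes tasks across different models to save token cost, and it introduces a MetaSkill mechanism that lets the Agent organize its skills automatically.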
