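@GitTrend0x: Open-source gem that compresses AI agent tokens by 60-95% https://github.com/chopratejas/headroom… This is Headroom, the 6.7k-star ultimate LLM token compressor! One line to crush all …
Summary
Headroom is an open-source tool that compresses the tool outputs, logs, RAG snippets, and other content that AI agents read by 60-95% while keeping answer quality unchanged. It supports reversible compression and memory shared across agents.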
Cached at: 2026/06/03 03:41
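Open-source gem that compresses AI agent tokens by 60-95%
https://github.com/chopratejas/headroom…
This is Headroom, the 6.7k-star ultimate LLM token compressor! One line to crush all your token anxiety: it compresses everything the agent reads (tool outputs, logs, RAG snippets, files, and conversation history) by 60-95% with answer quality completely unchanged. It also supports reversible compression plus memory shared across agents, driving the cost and context pressure of tools like Claude Code, Cursor, and Aider through the floor!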
chopratejas/headroom
Source: https://github.com/chopratejas/headroom
The context compression layer for AI agents
60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible
AI agents / LLMs: read /llms.txt, or fetch the live index or the full docs blob.
Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.
Live demo: 10,144 → 1,260 tokens, and the same FATAL entry is still found.
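The demo caption claims the compressed log still surfaces the same FATAL entry. As a rough intuition for how a log can shrink sharply without losing its key line, here is a minimal sketch of one naive heuristic: keep lines that match error keywords and replace runs of other lines with an omission marker. This is not Headroom's compressor; `compress_log` and the keyword regex are hypothetical.

```python
import re

# Hypothetical keyword list; a real compressor would be far more selective.
IMPORTANT = re.compile(r"\b(FATAL|ERROR|WARN|WARNING|Traceback)\b")


def compress_log(text: str) -> str:
    """Keep important lines and replace each run of other lines with a count."""
    out: list[str] = []
    omitted = 0
    for line in text.splitlines():
        if IMPORTANT.search(line):
            if omitted:
                out.append(f"[... {omitted} lines omitted ...]")
                omitted = 0
            out.append(line)
        else:
            omitted += 1
    if omitted:
        out.append(f"[... {omitted} lines omitted ...]")
    return "\n".join(out)


log = "\n".join(["INFO ok"] * 500 + ["FATAL disk full on /var"] + ["INFO ok"] * 10)
print(compress_log(log))
```

A keyword filter like this is lossy, which is why the original text should stay retrievable elsewhere.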
What it does
- Library: compress(messages) in Python or TypeScript, inline in any app
- Proxy: headroom proxy --port 8787, zero code changes, any language
- Agent wrap: headroom wrap claude|codex|cursor|aider|copilot in one command
- MCP server: headroom_compress, headroom_retrieve, headroom_stats for any MCP client
- Cross-agent memory: shared store across Claude, Codex, Gemini, auto-dedup
- headroom learn: mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md
- Reversible (CCR): originals never deleted; LLM retrieves on demand
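The Reversible (CCR) item means compression never destroys data: the original is kept locally and the model can ask for it later (the MCP server exposes headroom_retrieve for this). A minimal sketch of that idea, assuming a simple in-memory dict keyed by a truncated content hash; the `ReversibleStore` class, its methods, and the marker format are hypothetical and are not Headroom's API.

```python
import hashlib


class ReversibleStore:
    """Hypothetical CCR-style store: compressed text points back to a local original."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def compress(self, text: str, keep_chars: int = 80) -> str:
        """Keep a short preview plus a retrieval key; the full text stays in the store."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text  # originals are never deleted
        if len(text) <= keep_chars:
            return text
        return f"{text[:keep_chars]} [truncated; retrieve key={key}]"

    def retrieve(self, key: str) -> str:
        """Return the untouched original, as a retrieval tool would on demand."""
        return self._originals[key]


store = ReversibleStore()
log = "GET /health 200\n" * 200
preview = store.compress(log)
key = preview.rsplit("key=", 1)[1].rstrip("]")
assert store.retrieve(key) == log
```

Keying by content hash also gives deduplication for free: compressing the same text twice stores it once.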
How it works (30 seconds)
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
│ prompts · tool outputs · logs · RAG results · files
▼
┌────────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here) │
│ ──────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST) │
│ └─ Kompress-base (text, HF) │
│ │
│ Cross-agent memory · headroom learn · MCP │
└────────────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
- ContentRouter: detects content type, selects the right compressor
- SmartCrusher / CodeCompressor / Kompress-base: compress JSON, code (AST-aware), or prose
- CacheAligner: stabilizes prefixes so provider KV caches actually hit
- CCR: stores originals locally; the LLM calls headroom_retrieve if it needs them
→ Architecture · CCR reversible compression · Kompress-base model card
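The ContentRouter item describes detecting the content type and handing it to a specialized compressor. The sketch below shows only that dispatch shape. The detection rules (JSON via json.loads, Python source via ast.parse) and the three toy compressors are assumptions chosen for illustration; Headroom's real compressors are SmartCrusher, CodeCompressor, and the Kompress-base model, and CodeCompressor covers more languages than Python.

```python
import ast
import json

# Node types that suggest the text is real Python source rather than prose.
CODE_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Import, ast.ImportFrom)


def detect_content_type(text: str) -> str:
    """Classify text as 'json', 'code', or 'text' (hypothetical rules)."""
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "json"
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        pass
    try:
        tree = ast.parse(text)
    except (SyntaxError, ValueError):
        return "text"
    if any(isinstance(node, CODE_NODES) for node in ast.walk(tree)):
        return "code"
    return "text"


def compress_json(text: str) -> str:
    # Re-serialize without insignificant whitespace.
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_code(text: str) -> str:
    # Toy rule: drop blank and comment-only lines (ignores multi-line strings).
    return "\n".join(
        line for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def compress_text(text: str) -> str:
    # Collapse runs of whitespace in prose.
    return " ".join(text.split())


COMPRESSORS = {"json": compress_json, "code": compress_code, "text": compress_text}


def route(text: str) -> tuple[str, str]:
    """Detect the content type, then dispatch to the matching compressor."""
    kind = detect_content_type(text)
    return kind, COMPRESSORS[kind](text)
```

Trying the strictest parser first (JSON), then Python, and falling back to prose keeps misclassification cheap: anything unrecognized still takes the generic text path.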
Get started (60 seconds)
# 1 — Install
pip install "headroom-ai[all]" # Python
npm install headroom-ai # Node / TypeScript
# 2 — Pick your mode
headroom wrap claude # wrap a coding agent
headroom proxy --port 8787 # drop-in proxy, zero code changes
# or: from headroom import compress # inline library
# 3 — See the savings
headroom stats
Granular extras: [proxy], [mcp], [ml], [agno], [langchain], [evals]. Requires Python 3.10+.
Proof
Savings on real agent workloads:
| Workload | Before (tokens) | After (tokens) | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
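The Savings column follows from savings = 1 - after / before. A quick arithmetic check of the table's figures, with token counts copied from the rows above:

```python
# Token counts copied from the Proof table: (before, after).
WORKLOADS = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}


def savings_pct(before: int, after: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))


for name, (before, after) in WORKLOADS.items():
    print(f"{name}: {savings_pct(before, after)}%")
# Code search (100 results): 92%
# SRE incident debugging: 92%
# GitHub issue triage: 73%
# Codebase exploration: 47%
```

The same formula gives roughly 88% for the 10,144 → 1,260 live demo.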
Accuracy preserved on standard benchmarks:
| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | — | 97% | 19% compression |
| BFCL | Tools | 100 | — | 97% | 32% compression |
Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology
Agent compatibility matrix
| Agent | headroom wrap | Notes |
|---|---|---|
| Claude Code | ● | --memory · --code-graph |
| Codex | ● | shares memory with Claude |
| Cursor | ● | prints config — paste once |
| Aider | ● | starts proxy + launches |
| Copilot CLI | ● | starts proxy + launches |
| OpenClaw | ● | installs as ContextEngine plugin |
Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install.
When to use · When to skip
Great fit if you…
- run AI coding agents daily and want savings without changing your code
- work across multiple agents and want shared memory
- need reversible compression — originals always retrievable via CCR
Skip it if you…
- only use a single provider’s native compaction and don’t need cross-agent memory
- work in a sandboxed environment where local processes can’t run
Integrations — drop Headroom into any stack
| Your setup | Hook in with |
|---|---|
| Any Python app | compress(messages, model=…) |
| Any TypeScript app | await compress(messages, { model }) |
| Anthropic / OpenAI SDK | withHeadroom(new Anthropic()) · withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | Strands guide |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |
| MCP clients | headroom mcp install |
What's inside
- SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
- CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
- Kompress-base — our HuggingFace model, trained on agentic traces.
- Image compression — 40–90% reduction via trained ML router.
- CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- IntelligentContext — score-based context fitting with learned importance.
- CCR — reversible compression; LLM retrieves originals on demand.
- Cross-agent memory — shared store, agent provenance, auto-dedup.
- SharedContext — compressed context passing across multi-agent workflows.
- headroom learn: plugin-based failure mining for Claude, Codex, Gemini.
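SmartCrusher is described only as handling arrays of dicts, nested objects, and mixed types. One common trick for arrays of dicts is a columnar layout that writes each key once instead of once per row; the sketch below combines that with a simple row cap. `crush_records` and its `max_rows` parameter are hypothetical, and this is not SmartCrusher's documented algorithm.

```python
import json
from typing import Any


def crush_records(records: list[dict[str, Any]], max_rows: int = 20) -> dict[str, Any]:
    """Hypothetical columnar transform for an array of dicts.

    Keys are written once as a header instead of once per row, only the first
    max_rows rows are kept, and the number of omitted rows is recorded.
    """
    columns: list[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:  # preserve first-seen key order
                columns.append(key)
    rows = [[rec.get(col) for col in columns] for rec in records[:max_rows]]
    return {
        "columns": columns,
        "rows": rows,
        "omitted_rows": max(0, len(records) - max_rows),
    }


records = [{"id": i, "status": "ok", "latency_ms": 10 + i} for i in range(100)]
crushed = crush_records(records)
print(len(json.dumps(records)), "->", len(json.dumps(crushed)), "characters")
```

Missing keys become None, which blurs a missing field with an explicit null. In a reversible setup, the full array, including the omitted rows, would live in a CCR-style store so nothing is lost.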
Pipeline internals
Headroom exposes one stable request lifecycle across compress(), the SDK, and the proxy:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
- Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
- Pipeline extensions observe or customize lifecycle stages via on_pipeline_event(...).
- Compression hooks sit alongside the canonical lifecycle as an additional extension seam.
- Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.
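The lifecycle is a fixed, ordered list of stages that extensions can observe. A minimal sketch of that observer pattern, assuming listeners are plain callables invoked once per stage; the `Stage` enum mirrors the stage names listed, but `Pipeline`, `add_extension`, and the listener signature are hypothetical and do not reproduce the real on_pipeline_event(...) signature.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


Listener = Callable[[Stage, dict[str, object]], None]


class Pipeline:
    """Hypothetical pipeline that notifies every listener at every stage, in order."""

    def __init__(self) -> None:
        self._listeners: list[Listener] = []

    def add_extension(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def run(self, payload: dict[str, object]) -> dict[str, object]:
        for stage in Stage:  # Enum iteration follows definition order
            for listener in self._listeners:
                listener(stage, payload)
        return payload


trace: list[str] = []
pipeline = Pipeline()
pipeline.add_extension(lambda stage, payload: trace.append(stage.value))
pipeline.run({"messages": []})
```

Because every entry point (library, SDK, proxy) would drive the same ordered stages, an extension written once observes all of them.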
Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.
- CLI/tool slices: headroom/providers/claude, copilot, codex, openclaw
- Provider runtime slices: headroom/providers/claude, gemini, plus shared backend/runtime dispatch in headroom/providers/registry.py
- Core files stay orchestration-first: wrap.py, client.py, cli/proxy.py, and proxy/server.py delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.
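Keeping core files orchestration-first amounts to looking up provider-specific details in a shared registry instead of branching on provider names. A minimal sketch of that registry pattern, assuming each provider contributes a small record of runtime settings; `ProviderRuntime`, `register`, `resolve`, `build_target`, and the field names are hypothetical and are not the contents of headroom/providers/registry.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRuntime:
    """Hypothetical provider-specific settings that core code should not hard-code."""
    name: str
    default_base_url: str
    api_key_env: str


_REGISTRY: dict[str, ProviderRuntime] = {}


def register(runtime: ProviderRuntime) -> ProviderRuntime:
    """Add a provider's runtime settings to the shared registry."""
    _REGISTRY[runtime.name] = runtime
    return runtime


def resolve(name: str) -> ProviderRuntime:
    """Look up a provider by name so callers never branch on provider type."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


# Example values only: public API hosts and conventional API-key variables.
register(ProviderRuntime("claude", "https://api.anthropic.com", "ANTHROPIC_API_KEY"))
register(ProviderRuntime("gemini", "https://generativelanguage.googleapis.com", "GEMINI_API_KEY"))


def build_target(provider: str, path: str) -> str:
    """Core orchestration: normalize an API target without provider-specific ifs."""
    runtime = resolve(provider)
    return runtime.default_base_url.rstrip("/") + "/" + path.lstrip("/")
```

Adding a provider then means registering one more record, with no edits to the orchestration code.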
Install
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.
Using pipx? Choose a supported interpreter explicitly:
pipx install --python python3.13 "headroom-ai[all]"
→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.
headroom learn
headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md.
Documentation
| Start here | Go deeper |
|---|---|
| Quickstart | Architecture |
| Proxy | How compression works |
| MCP tools | CCR — reversible compression |
| Memory | Cache optimization |
| Failure learning | Benchmarks |
| Configuration | Limitations |
Compared to
Headroom runs locally, covers every content type, works with every major framework, and is reversible.
| Tool | Scope | Deploy | Local | Reversible |
|---|---|---|---|---|
| Headroom | All context — tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| RTK | CLI command outputs | CLI wrapper | Yes | No |
| lean-ctx | CLI commands, MCP tools, editor rules | CLI wrapper · MCP | Yes | No |
| Compresr, Token Co. | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |
Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting: git show --short, scoped ls, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use lean-ctx as the selected CLI context tool; set HEADROOM_CONTEXT_TOOL=lean-ctx before running headroom wrap ....
Contributing
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.
Community
- Discord — questions, feedback, war stories.
- Kompress-base on HuggingFace — the model behind our text compression.
License
Apache 2.0 — see LICENSE.
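GitTrend (@GitTrend0x): Killer open-source tool that has Claude Code auto-generate professional multi-agent teams
https://t.co/tkr2kJ2TmP
This is Harness, a 5.3k-star top-tier Claude Code meta-skill! One line to end the pain of building agents by hand: just describe a domain and it automatically designs a complete multi-agent team (including role definitions +
Similar articles
Headroom (GitHub repository)
Headroom is an open-source tool that compresses the context an AI agent reads (tool outputs, logs, RAG chunks, conversation history, and so on) before it reaches the LLM, cutting token counts by 60–95% while preserving answer quality. It supports several integration modes, including a library, a proxy, an agent wrapper, and an MCP server, and it offers reversible compression and cross-agent memory.
@Chenzeze777: Guys, I was stunned scrolling GitHub today. Headroom gained 14,000 stars in a single week, and the overseas developer scene has completely blown up. I assumed it was yet another slides-only open-source project, but then I looked closely at the benchmark data: code search compressed from 17,000 tokens to 1,400, with the answer unchanged word for word. …
Headroom is an open-source tool that compresses the token count of code search results and AI conversations by up to 92% (for example, from 17,000 down to 1,400) while keeping answer quality unchanged, and it runs locally for free on multiple platforms.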
@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2062553418460479577
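An open-source tool called Headroom uses a reversible Compress-Cache-Retrieve architecture to compress AI agent context by up to 90%, letting the model retrieve original details when needed instead of discarding them permanently.
@WY_mask: A persistent memory engine for all kinds of AI coding assistants http://github.com/rohitg00/agentmemory… It silently records code changes and context in the background, automatically extracts and compresses them into structured memory, saves the tokens that long contexts consume, links past information, and …
agentmemory is an open-source tool that gives AI coding assistants persistent memory. It silently records code changes and context, automatically extracts and compresses them into structured memory to reduce token consumption, and supports many mainstream platforms such as Claude Code and Codex.
@GitTrend0x: Killer open-source codebase brain for Claude Code that saves 27× tokens https://github.com/repowise-dev/repowise… This is Repowise, a codebase intelligence platform built for AI-assisted engineering teams! It takes…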
Repowise is an open-source tool that indexes codebases into four intelligence layers (dependency graph, git history, auto-documentation, architectural decisions) and exposes them via seven MCP tools to AI coding agents like Claude Code, achieving up to 27× token savings while maintaining answer quality.