@GitTrend0x: AI Agent Token 压缩 60-95% 开源神器 https://github.com/chopratejas/headroom… 这就是 Headroom，6.7k star LLM Token 终极压缩神器！一句话干翻所有 …

X AI KOLs Timeline 2026/06/03 02:33 工具

token-compression ai-agent open-source llm context-compression mcp proxy

摘要

Headroom 是一个开源工具，可将 AI Agent 读取的工具输出、日志、RAG 片段等压缩 60-95%，同时保持答案质量不变，支持可逆压缩和跨 Agent 共享记忆。

AI Agent Token 压缩 60-95% 开源神器 https://github.com/chopratejas/headroom… 这就是 Headroom，6.7k star LLM Token 终极压缩神器！一句话干翻所有 Token 焦虑：把 Agent 读取的工具输出、日志、RAG 片段、文件、历史对话全部压缩 60-95%，答案质量完全不变，还支持可逆压缩 + 跨 Agent 共享记忆，直接把 Claude Code、Cursor、Aider 等工具的成本和上下文压力干到地板！

查看原文

查看缓存全文

缓存时间: 2026/06/03 03:41

AI Agent Token 压缩 60-95% 开源神器

https://github.com/chopratejas/headroom…

这就是 Headroom，6.7k star LLM Token 终极压缩神器！一句话干翻所有 Token 焦虑：把 Agent 读取的工具输出、日志、RAG 片段、文件、历史对话全部压缩 60-95%，答案质量完全不变，还支持可逆压缩 + 跨 Agent 共享记忆，直接把 Claude Code、Cursor、Aider 等工具的成本和上下文压力干到地板！

chopratejas/headroom

Source: https://github.com/chopratejas/headroom

  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible

Docs · Install · Proof · Agents · Discord · llms.txt

_{AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.}

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Headroom in action
_{Live: 10,144 → 1,260 tokens — same FATAL found.}

What it does

Library — compress(messages) in Python or TypeScript, inline in any app
Proxy — headroom proxy --port 8787, zero code changes, any language
Agent wrap — headroom wrap claude|codex|cursor|aider|copilot in one command
MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client
Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md
Reversible (CCR) — originals never deleted; LLM retrieves on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter — detects content type, selects the right compressor
SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
CacheAligner — stabilizes prefixes so provider KV caches actually hit
CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — See the savings
headroom stats

Granular extras: [proxy], [mcp], [ml], [agno], [langchain], [evals]. Requires Python 3.10+.

Proof

Savings on real agent workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%	19% compression
BFCL	Tools	100	—	97%	32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Agent compatibility matrix

Agent	`headroom wrap`	Notes
Claude Code	●	`--memory` · `--code-graph`
Codex	●	shares memory with Claude
Cursor	●	prints config — paste once
Aider	●	starts proxy + launches
Copilot CLI	●	starts proxy + launches
OpenClaw	●	installs as ContextEngine plugin

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install.

When to use · When to skip

Great fit if you…

run AI coding agents daily and want savings without changing your code
work across multiple agents and want shared memory
need reversible compression — originals always retrievable via CCR

Skip it if you…

only use a single provider’s native compaction and don’t need cross-agent memory
work in a sandboxed environment where local processes can’t run

Integrations — drop Headroom into any stack

Your setup	Hook in with
Any Python app	`compress(messages, model=…)`
Any TypeScript app	`await compress(messages, { model })`
Anthropic / OpenAI SDK	`withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())`
Vercel AI SDK	`wrapLanguageModel({ model, middleware: headroomMiddleware() })`
LiteLLM	`litellm.callbacks = [HeadroomCallback()]`
LangChain	`HeadroomChatModel(your_llm)`
Agno	`HeadroomAgnoModel(your_model)`
Strands	Strands guide
ASGI apps	`app.add_middleware(CompressionMiddleware)`
Multi-agent	`SharedContext().put / .get`
MCP clients	`headroom mcp install`

What's inside

SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
Kompress-base — our HuggingFace model, trained on agentic traces.
Image compression — 40–90% reduction via trained ML router.
CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
IntelligentContext — score-based context fitting with learned importance.
CCR — reversible compression; LLM retrieves originals on demand.
Cross-agent memory — shared store, agent provenance, auto-dedup.
SharedContext — compressed context passing across multi-agent workflows.
headroom learn — plugin-based failure mining for Claude, Codex, Gemini.

Pipeline internals

Headroom exposes one stable request lifecycle across compress(), the SDK, and the proxy:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
Pipeline extensions observe or customize lifecycle stages via on_pipeline_event(...).
Compression hooks sit alongside the canonical lifecycle as an additional extension seam.
Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.

CLI/tool slices: headroom/providers/claude, copilot, codex, openclaw
Provider runtime slices: headroom/providers/claude, gemini, plus shared backend/runtime dispatch in headroom/providers/registry.py
Core files stay orchestration-first: wrap.py, client.py, cli/proxy.py, and proxy/server.py delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

Install

pip install "headroom-ai[all]"          # Python, everything
npm install headroom-ai                 # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.

Using pipx? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

headroom learn

headroom learn in action

headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md.

Documentation

Start here	Go deeper
Quickstart	Architecture
Proxy	How compression works
MCP tools	CCR — reversible compression
Memory	Cache optimization
Failure learning	Benchmarks
Configuration	Limitations

Compared to

Headroom runs locally, covers every content type, works with every major framework, and is reversible.

	Scope	Deploy	Local	Reversible
Headroom	All context — tools, RAG, logs, files, history	Proxy · library · middleware · MCP	Yes	Yes
RTK	CLI command outputs	CLI wrapper	Yes	No
lean-ctx	CLI commands, MCP tools, editor rules	CLI wrapper · MCP	Yes	No
Compresr, Token Co.	Text sent to their API	Hosted API call	No	No
OpenAI Compaction	Conversation history	Provider-native	No	No

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git show --short, scoped ls, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use lean-ctx as the selected CLI context tool; set HEADROOM_CONTEXT_TOOL=lean-ctx before running headroom wrap ....

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.

Community

Discord — questions, feedback, war stories.
Kompress-base on HuggingFace — the model behind our text compression.

License

Apache 2.0 — see LICENSE.

GitTrend (@GitTrend0x): Claude Code 自动生成专业多 Agent 团队杀手级开源神器

https://t.co/tkr2kJ2TmP

这就是 Harness，5.3k star Claude Code 顶级 meta-skill！一句话干翻所有手动搭 Agent 的痛苦：只要描述一个领域，它就能自动设计出完整的多 Agent 团队（包含角色定义 +

相似文章

@nini_incrypto_: Headroom，把大模型 Token 成本砍掉 95% ！ 1. 真·零代码更改：提供 Proxy 代理模式，任何编程语言只需改个端口就能直接无缝接入。 2. 全吞吐压缩：自动压缩工具输出、运行日志、RAG 知识库切片以及密密麻麻的聊天…

X AI KOLs Timeline

Headroom 是一个上下文压缩层，可以将 AI agent 读取的 Token 成本降低 60-95%，支持零代码更改的代理模式，且不降低模型回答质量。

Headroom (GitHub 仓库)

TLDR AI

Headroom 是一个开源工具，能在 AI 代理读取上下文（工具输出、日志、RAG 块、对话历史等）之前对其进行压缩，在到达 LLM 时可减少 60–95% 的令牌数量，同时保留答案质量。它支持多种集成模式，包括库、代理、代理包装和 MCP 服务器，并提供可逆压缩与跨代理记忆。

@hasantoxr: 所以我发现了一个GitHub仓库，它可以阻止AI代理无谓地消耗token。它叫Headroom。它是由一位……

X AI KOLs Timeline

Headroom是Netflix的Tejas Chopra开发的一个GitHub工具，它能在将输入（工具输出、日志、RAG块等）发送给LLM之前进行压缩，承诺在不改变答案的前提下减少60–95%的token。它支持Python/TypeScript库、本地代理、MCP服务器，以及针对流行编程代理的封装器。

@Chenzeze777: 兄弟们今天刷 GitHub 直接愣住了。 Headroom，一周涨了 1.4 万星，海外开发者圈彻底炸了。我本来以为是又一个 PPT 开源项目，结果仔细看了眼实测数据——代码搜索 1.7 万 token 压到 1400，答案一字没变。给…

X AI KOLs Timeline

Headroom 是一个开源工具，可将代码搜索结果和AI对话中的token数量压缩高达92%（如从1.7万压缩到1400），且保持答案质量不变，支持多平台本地免费运行。

@AYi_AInotes: Damn，这个开源工具直接减少了95%token消耗这可能是今年最狠的LLM降本神器， Netflix工程师开源的Headroom 把本地Agent套在Codex，Cursor，OpenClaw，Hermes或Claude code外面…

X AI KOLs Timeline

Netflix工程师开源了Headroom工具，在本地预处理阶段自动压缩LLM输入上下文，减少高达95%的token消耗，兼容Codex、Cursor等主流AI编码工具，无需修改代码即可生效。

chopratejas/headroom

What it does

How it works (30 seconds)

Get started (60 seconds)

Proof

Agent compatibility matrix

When to use · When to skip

Install

headroom learn

Documentation

Compared to

Contributing

Community

License

相似文章

@nini_incrypto_: Headroom，把大模型 Token 成本砍掉 95% ！ 1. 真·零代码更改：提供 Proxy 代理模式，任何编程语言只需改个端口就能直接无缝接入。 2. 全吞吐压缩：自动压缩工具输出、运行日志、RAG 知识库切片以及密密麻麻的聊天…

Headroom (GitHub 仓库)

@hasantoxr: 所以我发现了一个GitHub仓库，它可以阻止AI代理无谓地消耗token。它叫Headroom。它是由一位……

@Chenzeze777: 兄弟们今天刷 GitHub 直接愣住了。 Headroom，一周涨了 1.4 万星，海外开发者圈彻底炸了。我本来以为是又一个 PPT 开源项目，结果仔细看了眼实测数据——代码搜索 1.7 万 token 压到 1400，答案一字没变。 给…

@AYi_AInotes: Damn，这个开源工具直接减少了95%token消耗 这可能是今年最狠的LLM降本神器， Netflix工程师开源的Headroom 把本地Agent套在Codex，Cursor，OpenClaw，Hermes或Claude code外面…

提交意见反馈

@Chenzeze777: 兄弟们今天刷 GitHub 直接愣住了。 Headroom，一周涨了 1.4 万星，海外开发者圈彻底炸了。我本来以为是又一个 PPT 开源项目，结果仔细看了眼实测数据——代码搜索 1.7 万 token 压到 1400，答案一字没变。给…

@AYi_AInotes: Damn，这个开源工具直接减少了95%token消耗这可能是今年最狠的LLM降本神器， Netflix工程师开源的Headroom 把本地Agent套在Codex，Cursor，OpenClaw，Hermes或Claude code外面…