Tag
Headroom is a context compression layer that cuts AI agent token costs by 60–95%, supports a zero-code-change proxy mode, and does not degrade model response quality.
This paper proposes Telegraph English, a readable symbolic format for context compression that outperforms matched-budget baselines on multi-hop QA datasets while preserving entity content more densely.
An analysis of how six AI coding agents (Claude Code, Codex CLI, OpenCode, Cline, Cursor, Amp) converge on layered progressive compression for long contexts, differing in what they protect (user messages, stateful tool outputs) and whether they inform the model of compression, with tradeoffs between cost and accuracy.
This paper presents Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that efficiently handle long contexts through architectural search and large-scale pretraining, outperforming traditional KV cache methods in accuracy, speed, and memory usage.
An open-source tool called Headroom compresses AI agent context by up to 90% using a reversible Compress-Cache-Retrieve architecture, enabling models to retrieve original details on demand instead of discarding them permanently.
Headroom is an open-source tool that compresses the tool outputs, logs, RAG snippets, and other content read by AI agents by 60–95% while maintaining answer quality. It supports reversible compression and cross-agent shared memory.
LongAttnComp adapts AttnComp for long-context reasoning by fine-tuning lightweight cross-attention layers and introducing token-level chunking, a top-p algorithm, positional reordering, and a query parser. It achieves strong performance on long-context tasks like code debugging and transfers across multiple model families.
The Tencent Cloud database team has open-sourced TencentDB Agent Memory, a runtime system that addresses context degradation in long-running AI agent tasks. It compresses short-term context into the memory system through three-layer backtracking and dynamic compression, and it integrates a long-term memory pipeline. It is presented as a landmark attempt to move AI agent memory systems from 'database' to 'runtime'.
Headroom is an open-source tool that compresses context for AI agents (tool outputs, logs, RAG chunks, and conversation history) before it reaches the LLM, reducing tokens by 60–95% while preserving answer quality. It supports multiple integration modes, including library, proxy, agent wrapping, and MCP server, and offers reversible compression with cross-agent memory.
Tencent AI has open-sourced an Agent memory system that significantly improves token efficiency and agent consistency in long dialogues through three methods: real-time context compression, Mermaid task maps, and Persona memory. Token consumption is reduced by 61%, and persona consistency jumps from 48% to 76%.
lean-ctx is an open-source Rust-based context runtime that reduces token costs for AI coding agents like Claude Code, Cursor, Copilot, and others by 60–95% through file read compression and shell output optimization. It operates as a Shell Hook and MCP Server with 56 tools and multiple read modes.
TACO is a self-evolving framework that automatically discovers and refines context compression rules for long-horizon terminal agents.
TACO introduces a self-evolving compression framework that automatically learns to shrink redundant terminal interaction history, cutting token overhead by roughly 10% while boosting accuracy by 1–4% on TerminalBench and other code-agent benchmarks.
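Several entries above describe reversible compression, where the agent sees a shortened view of a long output and can fetch the original on demand instead of losing it. The sketch below is one minimal way to illustrate that compress, cache, and retrieve pattern. It is not Headroom's actual API or algorithm: the class name `ReversibleCompressor`, the `max_chars` parameter, the truncation strategy, and the marker format are all assumptions made for illustration.

```python
import hashlib


class ReversibleCompressor:
    """Toy compress-cache-retrieve store (hypothetical, for illustration only)."""

    def __init__(self, max_chars=80):
        self.max_chars = max_chars
        self._cache = {}  # handle -> original, uncompressed text

    def compress(self, text):
        """Return a shortened view of `text`, caching the original if truncated."""
        # Short content passes through unchanged and is not cached.
        if len(text) <= self.max_chars:
            return text
        # A content hash serves as a stable handle for later retrieval.
        handle = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._cache[handle] = text
        head = text[: self.max_chars]
        omitted = len(text) - self.max_chars
        return f"{head} ...[{omitted} chars omitted; retrieve id={handle}]"

    def retrieve(self, handle):
        """Return the original text for a handle emitted by `compress`."""
        return self._cache[handle]


compressor = ReversibleCompressor(max_chars=20)
long_log = "ERROR line\n" * 50
short_view = compressor.compress(long_log)
# The handle is embedded in the marker at the end of the shortened view.
handle = short_view.split("id=")[1].rstrip("]")
original = compressor.retrieve(handle)
```

A real system would presumably use a smarter compressor than plain truncation and expose retrieval to the model as a tool call. The point here is only that the compressed view carries a handle, so nothing is discarded permanently.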