@DataChaz: STOP BURNING YOUR TOKENS! If you use Claude Code, you are probably wasting 80% of your context window. I found 10 ace t…
Summary
A tweet thread by @DataChaz lists 10 open-source tools that aim to reduce token usage in Claude Code and similar AI coding assistants, with claimed reductions ranging from 40% to 98% depending on the tool.
Cached at: 05/18/26, 10:35 PM
STOP BURNING YOUR TOKENS!
If you use Claude Code, you are probably wasting 80% of your context window.
I found 10 ace tools that will completely rescue your API bill.
- Caveman Claude
  - Literally makes Claude talk like a caveman
  - Slashes 75% of output tokens with zero loss in accuracy
  - Repo → http://github.com/juliusbrussee/caveman…
- RTK (Rust Token Killer)
  - A blazing fast proxy that filters terminal output
  - 60-90% reduction and completely dependency-free
  - Repo → http://github.com/rtk-ai/rtk
- Code Review Graph
  - Claude reads only what matters using a Tree-sitter graph
  - An unbelievable 49x token reduction on huge monorepos
  - Repo → http://github.com/tirth8205/code-review-graph…
- Context Mode
  - Sandboxes raw output into SQLite instead of your context
  - A staggering 98% context reduction on logs & GitHub
  - Repo → http://github.com/mksglu/context-mode…
- Claude Token Optimizer
  - Brilliant setup prompts that optimize any project
  - 90% token savings, taking docs from 11K to 1.3K
  - Repo → http://github.com/nadimtuhin/claude-token-optimizer…
- Token Optimizer
  - Hunts down the invisible ghost tokens eating your context
  - Fully restores and protects your context quality
  - Repo → http://github.com/alexgreensh/token-optimizer…
- Token Optimizer MCP
  - Adds aggressive caching and compression to your MCP tools
  - 95%+ token reduction through pure intelligence
  - Repo → http://github.com/ooples/token-optimizer-mcp…
- Claude Context
  - Zilliz’s hybrid vector search MCP
  - Makes your entire codebase the context for 40% less cost
  - Repo → http://github.com/zilliztech/claude-context…
- Claude Token Efficient
  - Just drop one CLAUDE.md file into your repo
  - Enforces strict terseness with zero code changes
  - Repo → http://github.com/drona23/claude-token-efficient…
- Token Savior
  - Navigates your code by symbols, not giant files
  - 97% reduction on code navigation with persistent memory
  - Repo → http://github.com/mibayy/token-savior…
[ The god-tier stack ] Pick 2-3 based on what’s draining you:
- Massive repo? Code Review Graph + Token Savior
- Heavy terminal output? RTK
- MCP data dumps? Context Mode
- Need an instant fix? Caveman + Claude Token Efficient
Most devs are bleeding tokens.
Run /context in a fresh session and watch the savings roll in
juliusbrussee/caveman
Source: https://github.com/juliusbrussee/caveman
caveman
why use many token when few do trick
A Claude Code skill/plugin (also Codex, Gemini, Cursor, Windsurf, Cline, Copilot, 30+ more) that makes agent talk like caveman — cuts ~75% of output tokens, keeps full technical accuracy. Brain still big. Mouth small.
Before / After
🗣️ Normal Claude (69 tokens) vs. 🪨 Caveman Claude (19 tokens)
Same fix. 75% less word. Brain still big.
┌─────────────────────────────────────┐
│ TOKENS SAVED ████████ 75% │
│ TECHNICAL ACCURACY ████████ 100%│
│ SPEED INCREASE ████████ ~3x │
│ VIBES ████████ OOG │
└─────────────────────────────────────┘
Pick your level of grunt — lite (drop filler), full (default caveman), ultra (telegraphic), or wenyan (classical Chinese, even shorter). One command switch. Cost go down forever.
Install
One line. Find every agent. Install for each.
# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
~30 seconds. Needs Node ≥18. Skip agent you no have. Safe to re-run.
Trigger: type /caveman or say “talk like caveman”. Stop with “normal mode”.
One agent only, manual command, or any of 30+ other agents → INSTALL.md. Install break? Open agent, say “Read CLAUDE.md and INSTALL.md, install caveman for me.” Agent fix own brain.
What You Get
| Skill | What |
|---|---|
| `/caveman [lite\|full\|ultra\|wenyan]` | Compress every reply. Levels stick until session end. |
| `/caveman-commit` | Conventional Commit messages, ≤50 char subject. Why over what. |
| `/caveman-review` | One-line PR comments: `L42: 🔴 bug: user null. Add guard.` |
| `/caveman-stats` | Real session token usage + lifetime savings + USD. Tweetable line via `--share`. |
| `/caveman-compress <file>` | Rewrite memory file (e.g. CLAUDE.md) into caveman-speak. Cuts ~46% input tokens every session. Code/URLs/paths byte-preserved. |
| `caveman-shrink` | MCP middleware. Wraps any MCP server, compresses tool descriptions. npm. |
| `cavecrew-*` | Caveman subagents (investigator/builder/reviewer). ~60% fewer tokens than vanilla, main context lasts longer. |
Statusline badge — Claude Code shows [CAVEMAN] ⛏ 12.4k (lifetime tokens saved). Updates every /caveman-stats run. Set CAVEMAN_STATUSLINE_SAVINGS=0 to silence.
Auto-activate every session: Claude Code, Codex, Gemini (built-in). Cursor / Windsurf / Cline / Copilot get always-on rule files via --with-init. Other agents trigger with /caveman per session. Full feature matrix in INSTALL.md.
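The `/caveman-commit` row above promises Conventional Commit messages with a subject of at most 50 characters. A minimal sketch of how one might check a generated subject against that rule follows. It is not the skill's own code: the regex is a simplified reading of the Conventional Commits subject form, and `SUBJECT_RE`, `MAX_SUBJECT_LEN`, and `check_subject` are hypothetical names.

```python
import re

# Simplified pattern: type, optional (scope), optional "!", then ": " and a description.
SUBJECT_RE = re.compile(r"^[a-z]+(\([^()\s]+\))?!?: \S.*$")
MAX_SUBJECT_LEN = 50  # limit quoted in the skill table above


def check_subject(subject: str) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("not in 'type(scope): description' form")
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"{len(subject)} chars, limit is {MAX_SUBJECT_LEN}")
    return problems


if __name__ == "__main__":
    for s in ("fix(auth): guard null user before token check", "Fixed stuff"):
        print(s, "->", check_subject(s) or "ok")
```

A check like this could run in a commit-msg hook, though whether that is worth wiring up depends on how often the agent's subjects drift from the format.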
Benchmarks
Real token counts from the Claude API. Average 65% output reduction across 10 prompts (range 22-87%).
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Raw data and reproduction script: benchmarks/. Three-arm eval harness (baseline / terse / skill) lives in evals/. Caveman is compared against a plain “Answer concisely.” prompt, not against the verbose default, so the delta is honest.
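The Average row in the benchmark table can be read two ways, and the short sketch below recomputes both from the token counts listed there. Averaging the ten per-task percentages gives about 65%, which matches the table. Pooling all tokens first gives about 76%, because the longest answers shrink the most. That pooled figure sits near the ~75% headline, though the README does not say the headline was derived this way. The function names are illustrative, not part of the repo's benchmark script.

```python
# Token counts copied from the benchmark table: (task, normal, caveman).
ROWS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def saved_pct(normal: int, caveman: int) -> float:
    """Percentage of output tokens removed for one pair of counts."""
    return 100.0 * (1 - caveman / normal)


def mean_of_task_savings(rows) -> float:
    """Average the per-task percentages (every task weighs the same)."""
    return sum(saved_pct(n, c) for _, n, c in rows) / len(rows)


def aggregate_savings(rows) -> float:
    """Savings over the pooled token totals (long answers weigh more)."""
    total_normal = sum(n for _, n, _ in rows)
    total_caveman = sum(c for _, _, c in rows)
    return saved_pct(total_normal, total_caveman)


if __name__ == "__main__":
    print(f"mean of per-task savings: {mean_of_task_savings(ROWS):.1f}%")
    print(f"aggregate savings:        {aggregate_savings(ROWS):.1f}%")
```

Which number is more relevant depends on the workload: a session dominated by long explanations would sit closer to the pooled figure, one full of short refactors closer to the low end of the 22-87% range.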
caveman-compress receipts (real memory files):
| File | Original | Compressed | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 636 | 43.3% |
| todo-list.md | 627 | 388 | 38.1% |
| mixed-with-code.md | 888 | 560 | 36.9% |
| Average | 898 | 481 | 46% |
Caveman only affects output tokens — thinking/reasoning tokens untouched. Caveman no make brain smaller. Caveman make mouth smaller. Biggest win is readability and speed, cost savings a bonus.
A March 2026 paper “Brevity Constraints Reverse Performance Hierarchies in Language Models” found that constraining large models to brief responses improved accuracy by 26 points on certain benchmarks. Verbose not always better. Sometimes less word = more correct.
How It Work
- Install drop skill file in agent.
- Skill tell agent: drop filler, keep substance, use fragments.
- For Claude Code, hook also write tiny flag file each session. Agent see flag, talk caveman from message one. No need say /caveman.
- Stats command read Claude Code session log, count tokens saved, write number to statusline.
- Caveman-compress sub-skill rewrite memory files (CLAUDE.md, project notes) so each session start with smaller context. Save tokens forever, not just one reply.
Maintainer detail (hook architecture, file ownership, CI sync) live in CLAUDE.md.
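In caveman the "drop filler, keep substance" rule from the list above is carried out by the model following the skill's instructions, not by a text filter. Purely to make the idea concrete, here is a toy sketch that strips a small filler list from a sentence and measures the change in word count. The word list, the function names, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration.

```python
import re

# Hypothetical filler list, chosen for illustration only.
FILLER = {"basically", "just", "really", "actually", "simply", "very", "the", "a", "an"}


def drop_filler(text: str) -> str:
    """Remove filler words; keep everything else in its original order."""
    kept = [w for w in text.split() if re.sub(r"\W", "", w).lower() not in FILLER]
    return " ".join(kept)


def word_savings(before: str, after: str) -> float:
    """Percent fewer whitespace-separated words (a rough proxy for tokens)."""
    n_before = len(before.split())
    if n_before == 0:
        return 0.0
    return 100.0 * (1 - len(after.split()) / n_before)


if __name__ == "__main__":
    verbose = "You should really just add a null check before the user object is accessed."
    terse = drop_filler(verbose)
    print(terse)
    print(f"{word_savings(verbose, terse):.0f}% fewer words")
```

A word filter like this saves far less than the figures in the Benchmarks section, which is a reminder that most of the reduction comes from the model restructuring its answer rather than from deleting individual words.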
Lobster, Meet Rock 🦞🪨
OpenClaw the self-host gateway. One box, many agent inside (Claude Code, Codex, Pi, OpenCode), wired to your Slack / Discord / iMessage / Telegram / whatever. Tagline: “The lobster way.” Lobster strong. Lobster smart. Lobster also talk a lot.
Caveman teach lobster brevity — same canonical installer, scoped to one agent:
# macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash -s -- --only openclaw
# Windows (PowerShell): no Node? install Node ≥18 first, then
npx -y github:JuliusBrussee/caveman -- --only openclaw
Two thing happen, no more:
- Skill drop at `~/.openclaw/workspace/skills/caveman/SKILL.md`: spec-correct frontmatter (`version`, `always: true`), discoverable by `openclaw skills list`. Skill not auto-inject (OpenClaw load skill on demand). That why we also do step 2.
- SOUL.md nudge. Tiny marker-fenced block appended to `~/.openclaw/workspace/SOUL.md`. OpenClaw inject SOUL.md into every turn under “Project Context” (12K-per-file, 60K total; block well under). Lobster terse from message one. No `/caveman` per session. No nag.
~/.openclaw/workspace/
├── skills/caveman/SKILL.md ← full ruleset, on-demand load
└── SOUL.md ← <!-- caveman-begin --> ... <!-- caveman-end -->
↑ auto-inject every turn
Custom workspace path? OPENCLAW_WORKSPACE=/your/path before the command. Uninstall: same one-liner with --uninstall — skill folder gone, SOUL.md block ripped out cleanly, your other workspace content stay untouched. Idempotent re-runs (frontmatter not double-prepended, marker block not duplicated).
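The idempotent install and clean uninstall described above both follow from fencing the injected text between the two markers. The sketch below shows one way such a marker-fenced block could be added and removed. It is not the installer's actual code: only the marker strings come from the diagram above, while the function names and the sample block body are hypothetical.

```python
BEGIN = "<!-- caveman-begin -->"
END = "<!-- caveman-end -->"


def upsert_block(text: str, body: str) -> str:
    """Append the fenced block, or replace it in place if it already exists."""
    block = f"{BEGIN}\n{body}\n{END}"
    start = text.find(BEGIN)
    end = text.find(END, start) if start != -1 else -1
    if start != -1 and end != -1:
        # Re-run: swap the old block for the new one, so nothing is duplicated.
        return text[:start] + block + text[end + len(END):]
    return f"{text}\n{block}\n"


def remove_block(text: str) -> str:
    """Strip the fenced block and the newlines that upsert_block added around it."""
    start = text.find(BEGIN)
    if start == -1:
        return text
    end = text.find(END, start)
    if end == -1:
        return text  # malformed fence: leave the file alone
    end += len(END)
    if start > 0 and text[start - 1] == "\n":
        start -= 1
    if end < len(text) and text[end] == "\n":
        end += 1
    return text[:start] + text[end:]


if __name__ == "__main__":
    soul = "# Soul\nBe kind.\n"
    once = upsert_block(soul, "Talk terse.")  # hypothetical block body
    print(upsert_block(once, "Talk terse.") == once)  # re-run changes nothing
    print(remove_block(once) == soul)  # uninstall restores the original
```

Because the removal only touches the span between the markers plus the two newlines the append step introduced, a file that went through install and uninstall comes back byte-for-byte identical in this sketch.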
Lobster claw still sharp. Lobster mouth now small. Brain still big.
Caveman Ecosystem
Four tools. One philosophy: agent do more with less.
| Repo | What |
|---|---|
| caveman (you here) | Output compression — why use many token when few do trick |
| cavemem | Cross-agent memory — why agent forget when agent can remember |
| cavekit | Spec-driven build loop — why agent guess when agent can know |
| cavegemma | Gemma 4 31B fine-tuned on caveman pairs — why prompt every turn when weight remember |
Compose: cavekit drive build, caveman compress what agent say, cavemem compress what agent remember, cavegemma bake compression into weight. One rock. Two rock. Three rock. Four rock. That it.
Links
- INSTALL.md — full install matrix, all flags, per-agent detail
- CONTRIBUTING.md — how to send patch
- CLAUDE.md — maintainer guide (file ownership, hook architecture, CI)
- docs/ — extra guides (Windows install, etc.)
- Issues — bug, feature, weird behavior
Star This Repo
Caveman save you token, save you money. Star cost zero. Fair trade. ⭐
Also by Julius Brussee
- Revu — local-first macOS study app with FSRS spaced repetition. revu.cards
License
MIT — free like mass mammoth on open plain.