@neil_xbt: Someone turned 383 scattered files and 100 meeting transcripts into a compact wiki and cut their Claude token usage by …
Summary
A technique called the LLM Wiki pattern compiles raw documents into a structured wiki with an index. It reportedly cut one user's Claude token usage by 95% by paying the cost of structural understanding once, during compilation, instead of on every query.
Cached at: 06/25/26, 09:15 AM
Someone turned 383 scattered files and 100 meeting transcripts into a compact wiki and cut their Claude token usage by 95%!
The folder structure that did it got Andrej Karpathy 16 million views.
Every AI knowledge tool that processes raw documents on every query burns tokens on structural understanding it already paid for the last time you asked.
Roughly 75% of tokens in a typical query go toward understanding structure, not generating the answer. The LLM Wiki pattern pays that cost once, during compilation, and never again.
→ raw folder: immutable source documents the LLM reads but never modifies
→ wiki folder: structured markdown pages with cross-references the LLM builds and maintains
→ index file: a lightweight map the model loads first, pulling only the relevant pages
→ the 75% finding: LLMs spend 75% of tokens on structural understanding, not answers; the wiki eliminates that cost after the first build
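The index-first lookup described above can be sketched roughly as follows. This is a minimal illustration, not the layout from the original post: the index format (one `page.md: keyword, keyword` entry per line), the file name `index.md`, and the helper names `parse_index`, `select_pages`, and `build_context` are all assumptions. Plain keyword matching stands in for whatever the model actually does when it reads the index and picks pages.

```python
import re
from pathlib import Path


def parse_index(index_text: str) -> dict[str, list[str]]:
    """Parse a hypothetical index format: 'page.md: keyword, keyword' per line."""
    index: dict[str, list[str]] = {}
    for line in index_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment/heading lines, and lines without a separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        page, keywords = line.split(":", 1)
        index[page.strip()] = [
            k.strip().lower() for k in keywords.split(",") if k.strip()
        ]
    return index


def select_pages(index: dict[str, list[str]], query: str) -> list[str]:
    """Return pages whose keywords share at least one word with the query."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [page for page, kws in index.items() if words & set(kws)]


def build_context(wiki_dir: Path, query: str) -> str:
    """Load the small index first, then read only the pages it points to."""
    index_text = (wiki_dir / "index.md").read_text(encoding="utf-8")
    pages = select_pages(parse_index(index_text), query)
    return "\n\n".join(
        (wiki_dir / page).read_text(encoding="utf-8") for page in pages
    )
```

Calling `build_context(Path("wiki"), "When is the launch?")` would return only the text of pages whose index keywords match the question, so the raw folder is never re-read at query time.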
The gap between people re-processing their own documents on every query and people querying a compiled wiki at 95% lower cost is not technical complexity.
It is two folders and thirty minutes of setup.
Bookmark so you do not lose it!
Follow @neil_xbt for more AI engineering intelligence that shows you what the 16-million-view folder structure actually requires to build.
Similar Articles
@PawelHuryn: 187,000 people have starred one CLAUDE.md file on GitHub. A CLAUDE.md loads on every turn. Every line is rent you pay o…
A Twitter thread by Paweł Huryn shares insights on writing effective CLAUDE.md files for Claude Code, emphasizing project-specific rules and avoiding unnecessary bloat.
@Asteri_eth: Karpathy found a way to reduce token consumption by 90% The problem is that the LLM re-reads the same files over and ov…
Karpathy's 'Wiki Layer' method reduces LLM token usage by up to 90% by having the model clean, structure, and link data into a local Markdown knowledge base, eliminating repeated reading of raw files.
@DataChaz: STOP BURNING YOUR TOKENS! If you use Claude Code, you are probably wasting 80% of your context window. I found 10 ace t…
A tweet thread by @DataChaz lists 10 open-source tools to drastically reduce token usage in Claude Code and similar AI coding assistants, potentially cutting API bills by 75-98% through various optimizations.
@tom_doerr: Reduces Claude Code and Cursor token costs by 60-95% https://github.com/yvgude/lean-ctx
lean-ctx is an open-source Rust-based context runtime that reduces token costs for AI coding agents like Claude Code, Cursor, Copilot, and others by 60–95% through file read compression and shell output optimization. It operates as a Shell Hook and MCP Server with 56 tools and multiple read modes.
@_avichawla: Claude Code used 3x fewer tokens with one change: - Before: 10.4M tokens · 10 errors · $9.21 - After: 3.7M tokens · 0 e…
By swapping to Insforge Skills + CLI as the backend context layer, a user cut Claude Code token usage by 64%, eliminated all errors, and reduced cost from $9.21 to $2.81.