Use context profiler to optimize your LLM calls and reduce token use

Reddit r/LocalLLaMA 06/12/26, 12:51 PM Tools

llm token-optimization profiler context-window developer-tool open-source

Summary

ContextSpy is a local proxy tool that profiles how LLM applications use their context window, breaking down token usage by category to help developers optimize and reduce costs.

Hi all. After getting inspired at the local PyCon conference, I am working on a new tool for LLM applications and coding agents - a context window profiler: [https://github.com/RimantasZ/contextspy](https://github.com/RimantasZ/contextspy) All the talk now is how to reduce token usage (to either reduce API costs, or speed up local inference), and there are a myriad of tools aimed at solving this automatically - from caveman mode to various token compressors. ContextSpy is a profiler tool for analysing context usage of LLM applications. It is implemented as a local proxy that sits between your coding agent and the LLM API. It records every request and breaks down where the input tokens are going — system prompt, tool definitions, file contents, conversation history, and so on — so you can see how the context window is actually being used. This approach allows optimising token use from the other side - similar to how CPU or memory profiler is used to identify performance bottlenecks or memory leaks, ContexSpy allows reviewing what is in the context and making a decision if all that info is really necessary. It is still in the early stages of development, so any feedback is very welcome - be it someone testing it in their setup, registering some issues (of which there are still plenty), dropping a comment here, or placing a star to keep me going through those sleepless after-work hours :) https://preview.redd.it/kfpp1mryku6h1.png?width=4060&format=png&auto=webp&s=05b2afc5182559a4471860aed573f246e1ee4e82 https://preview.redd.it/lpvlnjmzku6h1.png?width=3254&format=png&auto=webp&s=a986915efb1bbdacbcc1105055e4f572b942783c

Original Article

Use context profiler to optimize your LLM calls and reduce token use

Similar Articles

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

@omarsar0: // The Efficiency Frontier // Cool paper on context management. As agents reuse the same documents and histories across…

@IntuitMachine: PEEK: The 1k-Token Map That Just Killed the Long-Context Tax Your LLM agent is reading the same 50k-token codebase for …

Why is every "context layer" tool lying about token savings?

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

Submit Feedback

Similar Articles

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

@omarsar0: // The Efficiency Frontier // Cool paper on context management. As agents reuse the same documents and histories across…

@IntuitMachine: PEEK: The 1k-Token Map That Just Killed the Long-Context Tax Your LLM agent is reading the same 50k-token codebase for …

Why is every "context layer" tool lying about token savings?

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management