Use context profiler to optimize your LLM calls and reduce token use

Reddit r/LocalLLaMA Tools

Summary

ContextSpy is a local proxy tool that profiles how LLM applications use their context window, breaking down token usage by category to help developers optimize and reduce costs.

Hi all. After getting inspired at the local PyCon conference, I am working on a new tool for LLM applications and coding agents - a context window profiler: [https://github.com/RimantasZ/contextspy](https://github.com/RimantasZ/contextspy) All the talk now is how to reduce token usage (to either reduce API costs, or speed up local inference), and there are a myriad of tools aimed at solving this automatically - from caveman mode to various token compressors. ContextSpy is a profiler tool for analysing context usage of LLM applications. It is implemented as a local proxy that sits between your coding agent and the LLM API. It records every request and breaks down where the input tokens are going — system prompt, tool definitions, file contents, conversation history, and so on — so you can see how the context window is actually being used. This approach allows optimising token use from the other side - similar to how CPU or memory profiler is used to identify performance bottlenecks or memory leaks, ContexSpy allows reviewing what is in the context and making a decision if all that info is really necessary. It is still in the early stages of development, so any feedback is very welcome - be it someone testing it in their setup, registering some issues (of which there are still plenty), dropping a comment here, or placing a star to keep me going through those sleepless after-work hours :) https://preview.redd.it/kfpp1mryku6h1.png?width=4060&format=png&auto=webp&s=05b2afc5182559a4471860aed573f246e1ee4e82 https://preview.redd.it/lpvlnjmzku6h1.png?width=3254&format=png&auto=webp&s=a986915efb1bbdacbcc1105055e4f572b942783c
Original Article

Similar Articles

@omarsar0: // The Efficiency Frontier // Cool paper on context management. As agents reuse the same documents and histories across…

X AI KOLs Following

This paper introduces The Efficiency Frontier, a unified framework for cost–performance optimization in LLM context management that models context strategy selection as a deployment-aware optimization problem, achieving 25% reduction in token usage and over 50% lower token cost with amortized memory compression compared to full-context prompting.

Why is every "context layer" tool lying about token savings?

Reddit r/AI_Agents

The author critiques the lack of transparent benchmarking in emerging context layer and MCP optimizer tools that promise drastic token savings, noting that real-world tests fail to replicate claimed efficiencies. They urge developers to demand open, reproducible benchmarks and ask for recommendations of tools that actually deliver measurable results.