I built a proxy to shrink agent LLM requests after my API bill stopped making sense

Reddit r/AI_Agents Products

Summary

A solo founder introduces Orqen, a proxy that sits between your SDK and LLM providers to optimize outbound requests by compressing tool results, managing history, and reducing token costs, without changing agent code.

I’ve been building agentic apps on OpenAI / Anthropic / Bedrock. Subscriptions felt capped until every loop resent full tool lists, fat tool results, and growing history. Input tokens were the real meter — not “one chat,” dozens of full payloads. I wanted frontier models, not “buy a GPU and run 27B,” but the cloud bill still hurt. So I built Orqen: sits between your SDK and the provider, optimises the whole outbound request each turn (tool routing, compressing tool results, long-session history/summarisation, schema cleanup, validation with fail-open). You change the API key + base URL; the agent code stays the same. It’s live now. I’m a solo founder, UK company. Still early — looking for people running tool-calling agents in prod to tell me what would make them trust a proxy in the path. Questions I’m trying to answer: - Where is most of your token bloat — tools, history, or tool results? - Would response headers + dashboard proof of saved tokens be enough? - What would stop you from trying it?
Original Article

Similar Articles

Proxy for LLMs to learn how Agents works?

Reddit r/AI_Agents

User seeks an open-source proxy to intercept and debug API calls from AI agents to understand their internal workings, after finding LiteLLM too enterprise-focused.

10 Ways To Reduce Your LLM API Costs

Reddit r/AI_Agents

A practical guide listing 10 strategies to reduce costs when using LLM APIs, including model selection, prompt caching, batch processing, and monitoring expenses.