$16 refactor, 400 steps, 95% routed to open MoE
Summary
A developer built a routing layer on vLLM that sends simple agent steps to a cheap open-source MoE model (21B active parameters) and hard steps to Opus, bringing the cost of a 400-step refactor down to $15.60 with a 93.4% success rate.
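The routing idea in the summary can be illustrated with a minimal sketch. This is not the developer's implementation: the model labels, the `Step` fields, the set of "simple" step kinds, the prompt-length threshold, and the escalate-on-retry rule are all assumptions made for illustration, and no vLLM calls are shown.

```python
from dataclasses import dataclass

# Hypothetical labels; the source only says "open MoE (21B active)" and "Opus".
CHEAP_MODEL = "open-moe-21b-active"
PREMIUM_MODEL = "opus"

# Assumed set of step kinds treated as simple; not taken from the source.
SIMPLE_KINDS = {"read_file", "list_dir", "rename_symbol", "run_tests"}


@dataclass
class Step:
    kind: str         # e.g. "read_file" or "design_change" (hypothetical names)
    prompt: str       # text that would be sent to the chosen model
    retries: int = 0  # times the cheap model has already failed this step


def choose_model(step: Step, max_prompt_chars: int = 4000) -> str:
    """Pick a model label for one agent step using an illustrative heuristic."""
    # Escalate anything the cheap model has already failed on.
    if step.retries > 0:
        return PREMIUM_MODEL
    # Short prompts for known-simple step kinds go to the cheap model.
    if step.kind in SIMPLE_KINDS and len(step.prompt) <= max_prompt_chars:
        return CHEAP_MODEL
    # Everything else is treated as hard.
    return PREMIUM_MODEL


def routed_share(steps: list[Step]) -> float:
    """Return the fraction of steps that would go to the cheap model."""
    if not steps:
        return 0.0
    cheap = sum(1 for s in steps if choose_model(s) == CHEAP_MODEL)
    return cheap / len(steps)
```

A router of this shape would sit in front of both endpoints, and `routed_share` gives the kind of routing percentage quoted in the title. In practice the classification rule matters far more than the plumbing, and the one above is only a placeholder.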
Similar Articles
Split my agent into a cheap router model and a premium synthesis model, bill dropped about 75%
A developer splits their AI agent's LLM calls between a cheap router model (GPT-OSS 120B) for tool selection and a premium model (gpt-5.4) for synthesis, cutting costs by ~78% while maintaining output quality.
6 weeks daily-driving an open-source desktop agent shell with a 3-model split (Haiku triager → Sonnet reviewer → Opus executor). Real cost numbers + what broke.
A 6-week real-world experiment using an open-source desktop agent shell with a three-model split (Haiku triager, Sonnet reviewer, Opus executor) reports a 64% cost reduction and details failure modes like context bloat and runaway sub-agents.
I built LEMoE: A stateless, lightweight Mixture of Experts (MoE) router for local LLMs. Open-source and looking for feedback!
LEMoE is an open-source, stateless Mixture of Experts (MoE) router that acts as an API proxy to route prompts to specialized LLMs, featuring cascading contextual routing and silent self-correction.
my agent bill went from $200 a week to $40 when I stopped running Opus on every subtask
A developer shares how they reduced their AI agent's weekly cost from $200 to $40 by routing simple subtasks to cheaper models like DeepSeek V4 Pro and Tencent Hunyuan while keeping complex reasoning on Opus 4.7, achieving comparable output quality for most work.
@hooeem: https://x.com/hooeem/status/2062266452921491934
A guide explaining how to make agentic workflows up to 462x cheaper by compiling fixed procedures into smaller fine-tuned models instead of repeatedly prompting frontier models.