$16 refactor, 400 steps, 95% routed to open MoE

Reddit r/LocalLLaMA Tools

Summary

A developer built a routing layer on vLLM that sends simple agent steps to a cheap open-source MoE model (21B active parameters) and hard steps to Opus, cutting the cost of a 400-step refactor to $15.60 with a 93.4% success rate.

Got tired of $160 Opus bills, so I spent a weekend wiring up a routing layer on vLLM 0.8 (2x A100, enable_auto_tool_choice). Getting the tool call parser to cooperate took longer than the actual routing logic. Once it worked, though, easy agent steps go to the 21B-active MoE and hard steps get Opus.

Hunyuan Hy3 preview handled 380 of 400 steps on a 12k-line Python repo at ~$0.02 each ($7.60). Opus covered the remaining 20 at $0.40 each ($8.00), so $15.60 all in. I set reasoning to no_think on routine steps, which cut token spend by roughly 30%. Final success rate was 93.4%. DeepSeek V4 hit similar accuracy but ran about 2x slower on search loop steps.

The 14-file circular import refactor is where it fell apart. It kept hallucinating module paths that didn't exist. Tencent reports 99.99% step success over 495-step workflows in production, and honestly that tracks for straightforward calls, but tangled dependency graphs still need Opus.
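For anyone who wants to sanity-check the numbers, here is a minimal sketch of the routing idea and the cost math. Only the per-step prices and step counts come from the post. The `Step` fields, the `route` heuristic, and the `max_easy_files` threshold are hypothetical stand-ins, since the post doesn't say how a step gets classified as easy or hard.

```python
from dataclasses import dataclass

# Per-step prices from the post, in cents to avoid float rounding.
COST_CENTS = {"moe": 2, "opus": 40}


@dataclass
class Step:
    # Hypothetical step descriptor; the post does not describe its features.
    description: str
    files_touched: int
    has_circular_imports: bool = False


def route(step: Step, max_easy_files: int = 3) -> str:
    """Pick a backend for one agent step.

    Assumed heuristic: anything touching many files or a circular
    import goes to Opus, everything else goes to the open MoE.
    """
    if step.has_circular_imports or step.files_touched > max_easy_files:
        return "opus"
    return "moe"


def total_cost_usd(counts: dict[str, int]) -> float:
    """Sum per-step costs for a {backend: step_count} tally, in dollars."""
    cents = sum(COST_CENTS[backend] * n for backend, n in counts.items())
    return cents / 100


if __name__ == "__main__":
    routed = total_cost_usd({"moe": 380, "opus": 20})
    all_opus = total_cost_usd({"opus": 400})
    print(f"routed: ${routed:.2f}, all Opus: ${all_opus:.2f}")
```

With the post's split (380 MoE steps, 20 Opus steps), the tally comes to $15.60, and sending all 400 steps to Opus at $0.40 gives the $160 mentioned at the top.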