Is anyone actually solving per-prompt model routing well yet, or are we all just eyeballing it?

Reddit r/AI_Agents 06/18/26, 12:00 PM News

ai-routing prompt-routing model-selection cost-optimization agent-workflows triage

Summary

The article explores the challenge of per-prompt model routing in AI agents, questioning whether anyone has effectively solved it. It points out that current practices rely on gut feeling, flat-rate plans reduce pressure to optimize, and a triage layer may introduce its own costs.

I run agents on real work every day. Content pipelines, code, the usual. And the thing I still can't do cleanly is decide which model handles which request. The standard advice is "be disciplined, use the cheap model for the cheap job, save the big one for hard stuff." Fine in theory. But I'm sitting there picking models by gut, and I'm meant to be a power user. If I can't route it reliably, the advice is quietly assuming the hard part is already solved. It isn't. It gets worse when you look closer. The unit isn't even task to model. One task contains cheap turns and expensive turns. A coding agent spends most of its turns reading files, running a command, summarising an error. Boring stuff a small local model like Qwen handles fine. Then one turn actually needs to reason about a tricky bug, and that's the turn you want the expensive model on. So the real granularity is prompt to model, evaluated per turn. Right now nobody routes at that level. You pick one model for the whole run and overpay on the easy turns or underperform on the hard ones. The obvious answer is a triage layer. A small model reads each prompt, scores how hard it is, forwards it to the cheapest model that'll clear the bar. Conceptually clean. I keep waiting for someone to nail it. Here's the bit I can't get past though. That triage model is itself a paid call on every single prompt. To route correctly it has to be good enough to understand the request, which means it isn't free and it isn't instant. So have you actually moved the cost, or just added a tollbooth in front of it? Maybe a tiny classifier is cheap enough that the savings dwarf it. Maybe the routing decision is genuinely harder than it looks and the cheap classifier sends hard prompts to the cheap model and you eat the quality hit. I don't know which way that math falls, and I haven't seen anyone show their working. My honest suspicion is the reason this layer doesn't properly exist yet is that flat-rate plans have removed the pressure. When you're on all-you-can-eat, nobody feels the per-prompt price, so nobody builds the thing that optimises it. The day those plans go metered, routing stops being a nice-to-have and becomes the product. So I'm asking the people who actually build this stuff. Is anyone routing per prompt in production and getting it right? What does your triage layer cost you, and does it earn its keep once you count its own calls? Or is per-prompt auto-routing a worse-is-better trap and we're all better off just picking a model and living with it?

Original Article

Is anyone actually solving per-prompt model routing well yet, or are we all just eyeballing it?

Similar Articles

@tomas_hk: yes it is have written our learnings here:

Would you rather tune one model’s reasoning depth or route across two models?

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

Running a 24/7 AI agent dev team: I route each role to a different LLM (Claude/Kimi/MiniMax/GPT) to dodge a ~$2k/mo API bill. Setup + what actually breaks.

Can prompting reduce AI sycophancy or is it mostly model behavior?

Submit Feedback

Similar Articles

@tomas_hk: yes it is have written our learnings here:

Would you rather tune one model’s reasoning depth or route across two models?

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

Running a 24/7 AI agent dev team: I route each role to a different LLM (Claude/Kimi/MiniMax/GPT) to dodge a ~$2k/mo API bill. Setup + what actually breaks.

Can prompting reduce AI sycophancy or is it mostly model behavior?