Show HN: Smart model routing directly in Claude, Codex and Cursor

Hacker News Top 06/26/26, 04:40 PM Tools

model-routing claude codex cursor open-source developer-tools

Summary

Workweave/router is a tool for smart model routing directly within Claude, Codex, and Cursor, enabling efficient selection of AI models.

We built a model router that plugs into coding agents (e.g. Claude Code, Codex, Cursor, etc.) and intelligently sends requests to the best model to serve them. Here's a quick demo of running it locally: <a href="https://www.youtube.com/watch?v=isKhAyivtfM" rel="nofollow">https://www.youtube.com/watch?v=isKhAyivtfM</a>.At Weave, we write ~all our code with AI, and it's been getting more expensive. This came to a head when Opus 4.7 was released and, thanks to its tokenizer changes, our costs shot up. We knew we didn't need Opus for everything but we didn't want to lose out on the intelligence for the cases where you really need it. So we decided to build a model router to handle this for us.The Weave Router acts as an Anthropic/OpenAI endpoint specifically for coding agents. It looks at every inference request and intelligently (more on that in a sec) decides what model to send it to, handling all the translations required along the way. So it can use faster/cheaper models (e.g. DeepSeek v4, GLM 5.2, Kimi K2.6) when possible, and frontier models (Opus 4.8 & GPT 5.5 (& Fable whenever it's back)) when necessary.How do we know what model to route to? We trained an RL model on tens of thousands (so far!) of agent traces. We reward the routing model when it selects an LLM that successfully completes the given task.Here's an example: if you ask the router to plan a complex change, it will (probably) route that request to Opus 4.8. Subagents exploring the codebase to gather context will be routed to more suitable models (e.g. DeepSeek V4 Flash). Then when you have the plan ready to implement, it will be (most likely) be handed to a quicker model (e.g. GLM 5.2) to carry it out.We've been using this internally for the last month or so. We've saved 40% on tokens vs. what we otherwise would have paid, with no noticeable differences in quality or velocity.The router is source-available under Elastic License 2.0, so you can self-host it. Or if you prefer, you can also use our hosted version: weaverouter.com.I'll be here to answer any questions you may have!

Original Article

View Cached Full Text

Cached at: 06/26/26, 05:19 PM

workweave/router

Source: https://github.com/workweave/router

One endpoint. Every model. Always the right one.

A drop-in proxy for Anthropic, OpenAI, and Gemini that picks the best model for every request: using a tiny on-box embedder, not a vibes-based prompt.

🥇 #1 on the RouterArena leaderboard ¹ — Acc-Cost Arena 76.09.

Built by Weave: The #1 engineering intelligence platform, loved by Robinhood, PostHog, Reducto, and hundreds of others.

What it does

Point Claude Code, Codex, Cursor, or your own app at localhost:8080. The router:

🎯 Routes per request. A cluster scorer derived from Avengers-Pro ² picks the right model from your enabled providers, every turn.
🔌 Speaks everyone’s API. Anthropic Messages, OpenAI Chat Completions, Gemini native. Streaming, tools, vision, the works.
🧠 Knows OSS too. DeepSeek, Kimi, GLM, Qwen, Llama, Mistral via OpenRouter (or any OpenAI-compatible endpoint).
🔒 BYOK by default. Provider keys stay on your box, encrypted at rest.
📊 Observable. OTLP traces out of the box. See your dashboard in the Weave dashboard (http://localhost:8080/ui/dashboard) or drop in Honeycomb, Datadog, Grafana, whatever.

30-second quickstart

The fastest way: point Claude Code, Codex, or opencode at the hosted Weave Router with one command. No clone, no Docker, no Postgres.

npx @workweave/router

That’s it. The installer asks which tool (Claude Code, Codex, or opencode), walks you through scope (user vs. project), grabs a router key, and wires the right config file. Other flavors:

npx @workweave/router --claude              # skip the picker, Claude Code
npx @workweave/router --codex               # skip the picker, OpenAI Codex CLI
npx @workweave/router --opencode            # skip the picker, opencode
npx @workweave/router --scope project       # per-repo, commits settings.json (or .codex/ / opencode.json)
npx @workweave/router --local               # self-hosted localhost:8080
npx @workweave/router --base-url https://router.acme.internal
npx @workweave/[email protected]                 # pin a version

Requires Node ≥ 18 (Claude Code and opencode paths also need jq). Full flag reference: install/npm/README.md.

Or: self-host the whole stack

If you want the router (and dashboard) running on your own box:

# 1. Drop a provider key in. OpenRouter is the recommended baseline.
echo "OPENROUTER_API_KEY=sk-or-v1-..." >> .env.local

# 2. Boot Postgres + router on :8080 and seed an rk_ key.
make full-setup

The router is up at http://localhost:8080, the dashboard at http://localhost:8080/ui/ (password: admin), and your rk_... key prints in the logs.

# Call it like Anthropic
curl -sS http://localhost:8080/v1/messages \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,
       "messages":[{"role":"user","content":"hi"}]}'

# ...or like OpenAI
curl -sS http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"gpt-4o-mini",
       "messages":[{"role":"user","content":"hi"}]}'

# Peek at the routing decision without proxying
curl -sS http://localhost:8080/v1/route -H "Authorization: Bearer rk_..." -d '...'

Wire it into your tools

Claude Code. Run make install-cc to wire Claude Code at the local self-hosted router (it’s also invoked automatically at the end of make full-setup). For the hosted router, use npx @workweave/router above.

Codex (OpenAI CLI). npx @workweave/router --codex patches ~/.codex/config.toml (or <repo>/.codex/config.toml with --scope project) with a managed [model_providers.weave] block and sets model_provider = "weave". Codex’s existing OPENAI_API_KEY flows through to api.openai.com for the plan-based passthrough; the router key rides in an X-Weave-Router-Key HTTP header. Re-install and --uninstall --codex rewrite/remove only the managed block, leaving the rest of your Codex config untouched.

opencode. npx @workweave/router --opencode merges a provider.weave entry into ~/.config/opencode/opencode.json (or <repo>/opencode.json with --scope project). It uses opencode’s bundled @ai-sdk/anthropic provider pointed at the router’s /v1 endpoint — the router speaks the Anthropic Messages API natively, so opencode works unmodified. The router key and identity headers ride alongside the provider config; re-install rewrites only the managed block and --uninstall --opencode strips it.

Cursor (early beta, performance may not be the best). Settings → Models → Override OpenAI Base URL → http://localhost:8080/v1, paste rk_... as the API key.

Switching on/off. After installing, npx @workweave/router off --claude (or --codex / --opencode) routes that client straight to its provider again without discarding the router config; on flips it back, and status reports which way it’s pointing. Claude Code also gets /router-off, /router-on, and /router-status slash commands. Cursor toggles via the same Settings → Models override above. See install/README.md.

Two keys, don’t mix them up:

sk-or-... / sk-ant-... / sk-... = your upstream provider key. Lives in .env.local.

rk_... = your router key. Clients send this as a Bearer token.

Endpoints

Endpoint	Format
`POST /v1/messages`	Anthropic Messages, routed
`POST /v1/chat/completions`	OpenAI Chat Completions, routed
`POST /v1beta/models/:action`	Gemini `generateContent`, routed
`POST /v1/route`	Returns the decision, no upstream call
`GET /v1/models` · `POST /v1/messages/count_tokens`	Anthropic passthrough
`GET /health` · `GET /validate`	liveness + key check

Deeper docs

📐 Configuration reference: every env var, BYOK encryption, OTel knobs, cluster routing.
🛠️ Contributing: layering rules, hot-reload dev, migrations, tests, the whole engineering loop.
🏗️ Architecture: package layout, import contracts, recipes for adding endpoints / providers / strategies.

Roadmap

Token-aware rate limiting (Redis sliding window per installation)
Sub-installations for tenant hierarchies
Speculative dispatch + hedging for tail latency

Zhang, Y. et al. Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing (Avengers-Pro). arXiv:2508.12631, 2025. https://arxiv.org/abs/2508.12631

Lu, Y., Liu, R., Yuan, J., Cui, X., Zhang, S., Liu, H., & Xing, J. RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers. arXiv:2510.00202, 2025. https://arxiv.org/abs/2510.00202

Show HN: Smart model routing directly in Claude, Codex and Cursor

workweave/router

What it does

30-second quickstart

Or: self-host the whole stack

Wire it into your tools

Endpoints

Deeper docs

Roadmap

Similar Articles

Switchcraft: AI Model Router for Agentic Tool Calling

Run Codex and Claude with any model including GLM 5.2. No settings file headaches.

@adambcohen93: Weave is launching the number 1 prompt router in the world. It enables you to get 70% more efficient use of your tokens…

@shao__meng: Why do Claude Code, Cursor, Codex, Aider, and Cline exhibit different agent behaviors despite potentially sharing the same underlying models? @addyosmani argues: It's due to the "shell" above the model — the Harness, which includes "prompts, ...

@DeRonin_: How I actually route between models : Tweet drafts : Sonnet 4.6 Long-form articles : Opus 4.6 Code work : Kimi 2.6 Agen…

Submit Feedback

Similar Articles

Switchcraft: AI Model Router for Agentic Tool Calling

Run Codex and Claude with any model including GLM 5.2. No settings file headaches.

@adambcohen93: Weave is launching the number 1 prompt router in the world. It enables you to get 70% more efficient use of your tokens…

@shao__meng: Why do Claude Code, Cursor, Codex, Aider, and Cline exhibit different agent behaviors despite potentially sharing the same underlying models? @addyosmani argues: It's due to the "shell" above the model — the Harness, which includes "prompts, ...

@DeRonin_: How I actually route between models : Tweet drafts : Sonnet 4.6 Long-form articles : Opus 4.6 Code work : Kimi 2.6 Agen…