Show HN: Smart model routing directly in Claude, Codex and Cursor
Summary
workweave/router is a drop-in proxy that routes each request from Claude Code, Codex, or Cursor to a model chosen from the providers you have enabled.
Cached at: 06/26/26, 05:19 PM
workweave/router
Source: https://github.com/workweave/router
One endpoint. Every model. Always the right one.
A drop-in proxy for Anthropic, OpenAI, and Gemini that picks the best model for every request using a tiny on-box embedder, not a vibes-based prompt.
🥇 #1 on the RouterArena leaderboard [1] (Acc-Cost Arena score: 76.09).
Built by Weave: The #1 engineering intelligence platform, loved by Robinhood, PostHog, Reducto, and hundreds of others.
What it does
Point Claude Code, Codex, Cursor, or your own app at localhost:8080. The router:
- 🎯 Routes per request. A cluster scorer derived from Avengers-Pro [2] picks the right model from your enabled providers, every turn.
- 🔌 Speaks everyone’s API. Anthropic Messages, OpenAI Chat Completions, Gemini native. Streaming, tools, vision, the works.
- 🧠 Knows OSS too. DeepSeek, Kimi, GLM, Qwen, Llama, Mistral via OpenRouter (or any OpenAI-compatible endpoint).
- 🔒 BYOK by default. Provider keys stay on your box, encrypted at rest.
- 📊 Observable. OTLP traces out of the box. View them in the Weave dashboard (http://localhost:8080/ui/dashboard) or drop in Honeycomb, Datadog, Grafana, whatever.
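The README does not show the scorer itself, so the following is only a simplified sketch of the Avengers-Pro-style idea behind per-request routing: once a prompt has been embedded and assigned to a cluster, each candidate model gets a score that trades accuracy against cost, and the highest score wins. The function name pick_model, the stats format, the weight alpha, and every model name and number below are assumptions made up for illustration; the embedding and clustering steps are omitted, and accuracy is used as given rather than normalised.

```shell
# pick_model CLUSTER ALPHA  (hypothetical helper, not part of the router)
# Reads lines of "cluster model accuracy cost" on stdin and prints the model
# with the best score for CLUSTER. ALPHA in [0,1] weights accuracy vs. cost.
pick_model() {
  awk -v cluster="$1" -v alpha="$2" '
    $1 == cluster {
      n++; model[n] = $2; acc[n] = $3; cost[n] = $4
      if (n == 1 || $4 < cmin) cmin = $4
      if (n == 1 || $4 > cmax) cmax = $4
    }
    END {
      best = ""
      for (i = 1; i <= n; i++) {
        # Min-max normalise cost within the cluster: 0 = cheapest, 1 = priciest.
        q = (cmax > cmin) ? (cost[i] - cmin) / (cmax - cmin) : 0
        s = alpha * acc[i] + (1 - alpha) * (1 - q)
        if (best == "" || s > bestscore) { best = model[i]; bestscore = s }
      }
      print best
    }'
}

# Made-up per-cluster statistics, purely for illustration.
stats='code big-model 0.90 15
code small-model 0.70 1
chat big-model 0.80 15
chat small-model 0.78 1'

printf '%s\n' "$stats" | pick_model code 0.9   # accuracy-heavy weighting
printf '%s\n' "$stats" | pick_model chat 0.5   # balanced weighting
```

With these invented numbers the accuracy-heavy call favours the larger model for the code cluster, while the balanced call favours the cheaper model for the chat cluster, which is the per-turn behaviour the bullet above describes.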
30-second quickstart
The fastest way: point Claude Code, Codex, or opencode at the hosted Weave Router with one command. No clone, no Docker, no Postgres.
npx @workweave/router
That’s it. The installer asks which tool (Claude Code, Codex, or opencode), walks you through scope (user vs. project), grabs a router key, and wires the right config file. Other flavors:
npx @workweave/router --claude # skip the picker, Claude Code
npx @workweave/router --codex # skip the picker, OpenAI Codex CLI
npx @workweave/router --opencode # skip the picker, opencode
npx @workweave/router --scope project # per-repo, commits settings.json (or .codex/ / opencode.json)
npx @workweave/router --local # self-hosted localhost:8080
npx @workweave/router --base-url https://router.acme.internal
npx @workweave/router@<version> # pin a version
Requires Node ≥ 18 (Claude Code and opencode paths also need jq). Full
flag reference: install/npm/README.md.
Or: self-host the whole stack
If you want the router (and dashboard) running on your own box:
# 1. Drop a provider key in. OpenRouter is the recommended baseline.
echo "OPENROUTER_API_KEY=sk-or-v1-..." >> .env.local
# 2. Boot Postgres + router on :8080 and seed an rk_ key.
make full-setup
The router is up at http://localhost:8080, the dashboard at
http://localhost:8080/ui/ (password: admin), and your rk_... key
prints in the logs.
# Call it like Anthropic
curl -sS http://localhost:8080/v1/messages \
-H "Authorization: Bearer rk_..." \
-d '{"model":"claude-sonnet-4-5","max_tokens":256,
"messages":[{"role":"user","content":"hi"}]}'
# ...or like OpenAI
curl -sS http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer rk_..." \
-d '{"model":"gpt-4o-mini",
"messages":[{"role":"user","content":"hi"}]}'
# Peek at the routing decision without proxying
curl -sS http://localhost:8080/v1/route -H "Authorization: Bearer rk_..." -d '...'
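The calls above share a base URL and a Bearer header, so a small wrapper can cut the repetition. This is a minimal sketch: ROUTER_URL, ROUTER_KEY, DRY_RUN, and router_post are names invented here, not something the router ships, and the dry-run output is meant for inspection rather than copy-paste.

```shell
# Assumed defaults; override them in your environment.
ROUTER_URL="${ROUTER_URL:-http://localhost:8080}"
ROUTER_KEY="${ROUTER_KEY:-rk_...}"

# router_post PATH JSON_BODY  (hypothetical helper)
# Sends JSON_BODY to PATH on the router with the router key as a Bearer token.
# With DRY_RUN=1 it prints the curl command instead of running it.
router_post() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf "curl -sS %s%s -H 'Authorization: Bearer %s' -d '%s'\n" \
      "$ROUTER_URL" "$1" "$ROUTER_KEY" "$2"
  else
    curl -sS "${ROUTER_URL}$1" -H "Authorization: Bearer ${ROUTER_KEY}" -d "$2"
  fi
}

# Usage, reusing the Anthropic-style body from above:
# router_post /v1/messages '{"model":"claude-sonnet-4-5","max_tokens":256,"messages":[{"role":"user","content":"hi"}]}'
```

Setting DRY_RUN=1 is a cheap way to confirm which URL and key a script would use before it talks to a real router.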
Wire it into your tools
Claude Code. Run make install-cc to point Claude Code at the local
self-hosted router (make full-setup also runs it automatically as its
final step). For the hosted router, use the npx @workweave/router command
above.
Codex (OpenAI CLI). npx @workweave/router --codex patches
~/.codex/config.toml (or <repo>/.codex/config.toml with --scope project)
with a managed [model_providers.weave] block and sets model_provider = "weave".
Codex’s existing OPENAI_API_KEY flows through to api.openai.com for the
plan-based passthrough; the router key rides in an X-Weave-Router-Key HTTP
header. Re-installing rewrites only the managed block, and
--uninstall --codex removes only that block, so the rest of your Codex
config stays untouched.
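To confirm that the Codex install took effect, you can look for the two markers described above. This is a rough sketch: codex_uses_weave is a hypothetical helper name, and the check assumes only that the file contains a model_provider = "weave" line and a [model_providers.weave] table header; it does not validate the rest of the managed block.

```shell
# codex_uses_weave [CONFIG_PATH]  (hypothetical helper)
# Succeeds if the Codex config selects the "weave" provider and defines
# its [model_providers.weave] table. Defaults to the user-scope config.
codex_uses_weave() {
  cfg="${1:-$HOME/.codex/config.toml}"
  [ -f "$cfg" ] || return 1
  grep -Eq '^[[:space:]]*model_provider[[:space:]]*=[[:space:]]*"weave"' "$cfg" &&
    grep -Fq '[model_providers.weave]' "$cfg"
}

# Usage:
# codex_uses_weave && echo "Codex is routed through weave"
# codex_uses_weave ./.codex/config.toml   # project scope
```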
opencode. npx @workweave/router --opencode merges a provider.weave
entry into ~/.config/opencode/opencode.json (or <repo>/opencode.json
with --scope project). It uses opencode’s bundled @ai-sdk/anthropic
provider pointed at the router’s /v1 endpoint — the router speaks the
Anthropic Messages API natively, so opencode works unmodified. The router
key and identity headers ride alongside the provider config; re-install
rewrites only the managed block and --uninstall --opencode strips it.
Cursor (early beta, performance may not be the best). Settings →
Models → Override OpenAI Base URL → http://localhost:8080/v1, paste
rk_... as the API key.
Switching on/off. After installing, npx @workweave/router off --claude
(or --codex / --opencode) routes that client straight to its provider
again without discarding the router config; on flips it back, and status
reports which way it’s pointing. Claude Code also gets /router-off,
/router-on, and /router-status slash commands. Cursor toggles via the same
Settings → Models override above. See install/README.md.
Two keys, don’t mix them up:
- sk-or-... / sk-ant-... / sk-... = your upstream provider key. Lives in .env.local.
- rk_... = your router key. Clients send this as a Bearer token.
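A script can catch a mixed-up key before sending anything by checking the prefix. This is a minimal sketch based only on the prefixes listed above; key_kind is a hypothetical helper, and a matching prefix says nothing about whether the key is valid.

```shell
# key_kind KEY  (hypothetical helper)
# Prints "router" for rk_ keys, "upstream" for sk- style provider keys,
# and "unknown" for anything else.
key_kind() {
  case "$1" in
    rk_*)                  echo "router" ;;
    sk-or-*|sk-ant-*|sk-*) echo "upstream" ;;
    *)                     echo "unknown" ;;
  esac
}

# Usage: refuse to send an upstream provider key as the Bearer token.
# [ "$(key_kind "$ROUTER_KEY")" = "router" ] || echo "expected an rk_ key" >&2
```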
Endpoints
| Endpoint | Format |
|---|---|
| POST /v1/messages | Anthropic Messages, routed |
| POST /v1/chat/completions | OpenAI Chat Completions, routed |
| POST /v1beta/models/:action | Gemini generateContent, routed |
| POST /v1/route | Returns the decision, no upstream call |
| GET /v1/models · POST /v1/messages/count_tokens | Anthropic passthrough |
| GET /health · GET /validate | Liveness + key check |
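The liveness endpoint is handy for scripts that need to wait until a self-hosted router is up before sending traffic. This is a sketch under assumptions: wait_for_router is a hypothetical helper, and it assumes GET /health answers with a success status once the router is ready, which the table implies but does not spell out.

```shell
# wait_for_router [BASE_URL] [MAX_TRIES]  (hypothetical helper)
# Polls GET /health once per second until it succeeds or MAX_TRIES is reached.
wait_for_router() {
  url="${1:-http://localhost:8080}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP error statuses.
    if curl -fsS "$url/health" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
# wait_for_router http://localhost:8080 60 && echo "router is up"
```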
Deeper docs
- 📐 Configuration reference: every env var, BYOK encryption, OTel knobs, cluster routing.
- 🛠️ Contributing: layering rules, hot-reload dev, migrations, tests, the whole engineering loop.
- 🏗️ Architecture: package layout, import contracts, recipes for adding endpoints / providers / strategies.
Roadmap
- Token-aware rate limiting (Redis sliding window per installation)
- Sub-installations for tenant hierarchies
- Speculative dispatch + hedging for tail latency
[1] Lu, Y., Liu, R., Yuan, J., Cui, X., Zhang, S., Liu, H., & Xing, J. RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers. arXiv:2510.00202, 2025. https://arxiv.org/abs/2510.00202
[2] Zhang, Y. et al. Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing (Avengers-Pro). arXiv:2508.12631, 2025. https://arxiv.org/abs/2508.12631