RouteProfile: Elucidating the Design Space of LLM Profiles for Routing
Summary
This paper introduces RouteProfile, a design space for LLM profiles in routing systems, demonstrating that structured profiles and query-level signals improve routing performance and generalization to new models.
Source: https://huggingface.co/papers/2605.00180
Abstract
LLM profiling design significantly impacts routing performance, with structured profiles and query-level signals demonstrating superior reliability and generalization compared to flat profiles and domain-level signals.
As the large language model (LLM) ecosystem expands, individual models exhibit varying capabilities across queries, benchmarks, and domains, motivating the development of LLM routing. While prior work has largely focused on router mechanism design, LLM profiles, which capture model capabilities, remain underexplored. In this work, we ask: How does LLM profile design affect routing performance across different routers? Addressing this question helps clarify the role of profiles in routing, disentangle profile design from router design, and enable fairer comparison and more principled development of routing systems. To this end, we view LLM profiling as a structured information integration problem over heterogeneous interaction histories. We develop a general design space of LLM profiles, named RouteProfile, along four key dimensions: organizational form, representation type, aggregation depth, and learning configuration. Through systematic evaluation across three representative routers under both standard and new-LLM generalization settings, we show that: (1) structured profiles consistently outperform flat ones; (2) query-level signals are more reliable than coarse domain-level signals; and (3) generalization to newly introduced models benefits most from structured profiles under trainable configurations. Overall, our work highlights LLM profile design as an important direction for future routing research.
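To make the four dimensions and the query-level versus domain-level distinction concrete, here is a minimal Python sketch. It is not the paper's implementation: the `ProfileConfig` field values, the toy `HISTORY` records, and both helper functions are hypothetical names chosen for illustration, and the scores are invented toy data rather than results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ProfileConfig:
    """One point in a profile design space (field values are assumptions)."""
    organizational_form: str     # e.g. "flat" or "structured"
    representation_type: str     # hypothetical label, e.g. "embedding"
    aggregation_depth: int       # hypothetical: number of aggregation steps
    learning_configuration: str  # e.g. "frozen" or "trainable"


# Toy interaction history: (model, domain, query_id, score). Invented data.
HISTORY = [
    ("model-a", "math", "q1", 1.0),
    ("model-a", "math", "q2", 0.0),
    ("model-a", "code", "q3", 1.0),
    ("model-b", "math", "q1", 0.0),
    ("model-b", "math", "q2", 1.0),
    ("model-b", "code", "q3", 1.0),
]


def domain_level_profile(history, model):
    """Average a model's scores per domain (a coarse signal)."""
    by_domain = defaultdict(list)
    for m, domain, _query, score in history:
        if m == model:
            by_domain[domain].append(score)
    return {domain: mean(scores) for domain, scores in by_domain.items()}


def query_level_profile(history, model):
    """Keep a model's score for each individual query (a fine signal)."""
    return {query: score for m, _domain, query, score in history if m == model}


config = ProfileConfig("structured", "embedding", 2, "trainable")
```

In this toy history the two models have identical domain-level profiles but different query-level profiles, which illustrates how domain averages can hide differences a router might use. It is an illustration of the distinction only, not evidence for the paper's findings.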
Similar Articles
The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers
This paper identifies a 'routing plateau' phenomenon where diverse LLM routing methods converge to similar accuracy, far below the oracle, due to a predictability bottleneck that limits query-specific routing. It then shows that larger datasets, stronger encoders, and fine-tuning can help break through this plateau.
@amitiitbhu: New article: LLM Routing Read here: https://outcomeschool.com/blog/llm-routing…
A tutorial blog post explaining LLM Routing — the practice of directing user queries to the most appropriate LLM based on cost, latency, and quality. Covers routing strategies, anatomy of an LLM router, and comparisons with Mixture of Experts.
Learning Agent Routing From Early Experience
This paper introduces BoundaryRouter, a training-free framework that optimizes LLM agent usage by routing queries to either lightweight inference or full agent execution based on early experience. It also presents RouteBench, a benchmark for evaluating routing performance, showing significant improvements in speed and accuracy.
Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents
This paper introduces LQM-ContextRoute, a contextual bandit router for selecting between functionally equivalent tool providers in LLM agents, balancing latency and answer quality. It outperforms baselines on web-search and retriever benchmarks.
From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing
This paper proposes DARS, a framework that constructs routing supervision from a distributional view of model behavior to address the unreliability of single-shot labels in LLM routing.