This paper formulates multi-agent routing as set-valued prediction, in which a router selects a set of agents for each prompt rather than a single agent. It introduces a WildChat-derived benchmark of 3,000 prompts over a 12-agent catalog and evaluates several methods, including supervised classifiers and cost-aware routing, to study accuracy-cost trade-offs.
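To make the set-valued framing and the accuracy-cost trade-off concrete, here is a minimal sketch. It is not the paper's method. It assumes some classifier has already produced an independent per-agent relevance probability for a prompt, and that each agent has a known per-call cost. The agent names, costs, probabilities, `threshold`, and `cost_weight` penalty are all hypothetical, as are the helper names `route`, `set_f1`, and `total_cost`. The router returns every agent whose cost-adjusted score clears a threshold. The prediction is scored with set-level F1 against a gold agent set and with the summed cost of the selected agents.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    cost: float  # hypothetical per-call cost (units are arbitrary)


def route(probs: dict[str, float], agents: list[Agent],
          threshold: float = 0.5, cost_weight: float = 0.0) -> set[str]:
    """Set-valued routing: keep every agent whose cost-penalized score clears the threshold.

    With cost_weight=0 this reduces to plain multi-label thresholding.
    A positive cost_weight is a hypothetical linear cost penalty.
    """
    return {
        a.name
        for a in agents
        if probs.get(a.name, 0.0) - cost_weight * a.cost >= threshold
    }


def set_f1(pred: set[str], gold: set[str]) -> float:
    """F1 between predicted and gold agent sets (1.0 when both are empty)."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def total_cost(pred: set[str], agents: list[Agent]) -> float:
    """Sum of per-call costs for the selected agents."""
    cost_of = {a.name: a.cost for a in agents}
    return sum(cost_of[name] for name in pred)


# Hypothetical 3-agent catalog and one prompt's classifier outputs.
agents = [Agent("search", 1.0), Agent("code", 3.0), Agent("math", 2.0)]
probs = {"search": 0.9, "code": 0.6, "math": 0.3}
gold = {"search", "code"}

plain = route(probs, agents)                   # accuracy-oriented routing
cheap = route(probs, agents, cost_weight=0.1)  # cost-aware routing

if __name__ == "__main__":
    for label, pred in (("plain", plain), ("cost-aware", cheap)):
        print(label, sorted(pred), round(set_f1(pred, gold), 3), total_cost(pred, agents))
```

In this toy case, the plain router selects `{"search", "code"}` (F1 1.0, cost 4.0). The cost-penalized router drops the expensive `code` agent and keeps only `{"search"}` (F1 about 0.667, cost 1.0). Sweeping `cost_weight` would trace one simple accuracy-cost curve. A linear penalty is only one illustrative choice, and other cost-aware formulations are possible.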