Tag
This paper formulates multi-agent routing as a set-valued prediction problem, introduces a WildChat-derived benchmark of 3,000 prompts over a 12-agent catalog, and evaluates methods, including supervised classifiers and cost-aware routing, to study accuracy-cost trade-offs.
This paper presents the first unified benchmark for pathway-guided therapy response modeling, evaluating three biologically informed architectures (BINN, GraphPath, PATH) across five cancer cohorts from The Cancer Genome Atlas for multi-label prediction of targeted therapy, radiation therapy, and survival outcomes.
This paper investigates how reasoning models perform zero-shot multi-label classification over millions of candidate labels. The authors characterize a two-phase process of shortlisting and fine-grained reasoning, and propose a mechanistic distillation method that outperforms standard distillation for transferring these capabilities to smaller models.