Tag
MOSAIC introduces a structured agentic framework for automated data science that uses memory-grounded model selection and workflow construction, validated on financial time-series tasks. It outperforms AutoML and agentic baselines.
PRAXIS is a new algorithm that efficiently approximates the Rashomon set of near-optimal decision trees, achieving orders of magnitude improvement in runtime and memory while maintaining near-perfect recall.
Svpino demonstrates how to use an inference router to match problem complexity with the appropriate AI model, suggesting users should stop interacting with models directly.
HyDRA is a hybrid dynamic routing architecture for heterogeneous LLM pools. It predicts fine-grained capability requirements per query and selects the cheapest capable model via shortfall matching, achieving up to 72.5% cost savings while maintaining quality. It is deployed in GitHub Copilot's VS Code Chat auto-mode and decouples routing from the model catalog, so no retraining is required when models change.
This paper introduces proxy metrics based on token-level statistics from expert-written solutions to forecast downstream LLM performance, significantly outperforming loss-based methods in model selection, pretraining data selection, and training-time forecasting.
This article argues that the narrative that production requires frontier AI models is driven by financing needs, not architectural reality. It highlights that smaller, efficient models such as Phi-4 and Claude Haiku, together with routing solutions such as RouteLLM, offer cost-effective alternatives, and that most enterprises waste tokens by defaulting to large models.
This paper introduces Layer-wise Representation Dynamics (LRD), a framework with three measurement families to analyze how hidden states change across layers in language models. Applied to 31 models on 30 MTEB tasks, LRD reveals architectural differences and enables label-free model selection and inference-time layer pruning.
The article highlights that agent harnesses account for a 30-50 point performance swing, compared with model selection, and argues that teams should focus on instance-level verification rather than on model names alone.
A tutorial blog post explaining LLM routing: the practice of directing user queries to the most appropriate LLM based on cost, latency, and quality. It covers routing strategies, the anatomy of an LLM router, and comparisons with Mixture of Experts.
Toto is a tool that routes context-rich tasks to the best AI model for the job.
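Several entries above describe routing a query to the cheapest model that is still capable of handling it. The following is a minimal Python sketch of that general idea, not the implementation of HyDRA, RouteLLM, Toto, or any other system listed here. The `Model` class, the capability names, the scores, the costs, and the `shortfall` and `route` functions are all hypothetical, and the per-query requirements are passed in by hand rather than predicted by a learned component.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical catalog entry: a name, a cost, and capability scores in [0, 1]."""
    name: str
    cost_per_1k_tokens: float
    capabilities: dict[str, float]


def shortfall(required: dict[str, float], model: Model) -> float:
    """Sum of how far the model falls below each required capability level."""
    return sum(
        max(0.0, need - model.capabilities.get(cap, 0.0))
        for cap, need in required.items()
    )


def route(required: dict[str, float], pool: list[Model], tolerance: float = 0.0) -> Model:
    """Pick the cheapest model whose shortfall is within tolerance.

    If no model qualifies, fall back to the one with the smallest shortfall,
    breaking ties by cost.
    """
    capable = [m for m in pool if shortfall(required, m) <= tolerance]
    if capable:
        return min(capable, key=lambda m: m.cost_per_1k_tokens)
    return min(pool, key=lambda m: (shortfall(required, m), m.cost_per_1k_tokens))


# Illustrative pool; every number here is made up for the example.
POOL = [
    Model("small", 0.1, {"reasoning": 0.4, "coding": 0.5}),
    Model("medium", 0.5, {"reasoning": 0.7, "coding": 0.8}),
    Model("large", 2.0, {"reasoning": 0.95, "coding": 0.95}),
]

easy = route({"reasoning": 0.3, "coding": 0.4}, POOL)
moderate = route({"reasoning": 0.6, "coding": 0.8}, POOL)
too_hard = route({"reasoning": 0.99}, POOL)
print(easy.name, moderate.name, too_hard.name)
```

Because the router only reads capability scores and costs from the pool, adding or removing a model in this sketch means editing the list rather than changing the routing logic.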