This article argues that the narrative that only frontier models are fit for production is driven by financing needs, not architectural reality. It highlights smaller, efficient models such as Phi-4 and Claude Haiku, along with routing solutions like RouteLLM, as cost-effective alternatives, and contends that most enterprises waste tokens by defaulting to large models.
This paper introduces Layer-wise Representation Dynamics (LRD), a framework with three measurement families to analyze how hidden states change across layers in language models. Applied to 31 models on 30 MTEB tasks, LRD reveals architectural differences and enables label-free model selection and inference-time layer pruning.
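The paper's exact metrics aren't reproduced here, but a minimal sketch of one plausible layer-wise measurement, the cosine similarity between mean-pooled hidden states at consecutive layers, illustrates the kind of dynamics LRD tracks. The model choice and the pooling scheme below are illustrative assumptions, not details from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative model; the paper evaluates 31 models, not necessarily this one.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

inputs = tokenizer("Layer-wise representation dynamics example.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, dim), starting from the embedding output.
pooled = [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

# How much does the pooled representation move from one layer to the next?
for i in range(1, len(pooled)):
    sim = torch.nn.functional.cosine_similarity(pooled[i - 1], pooled[i], dim=0)
    print(f"layer {i - 1} -> {i}: cosine similarity = {sim.item():.3f}")
```

Layers where the representation barely moves would be natural candidates for the inference-time layer pruning the paper describes.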
The article highlights that the choice of agent harness can swing performance by 30-50 points, an effect that rivals model selection itself, and argues that teams should focus on instance-level verification rather than model names alone.
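The article's own tooling isn't shown; as a hedged sketch of what instance-level verification might look like for a coding task (pytest assumed available, and the helper name is hypothetical):

```python
import pathlib
import subprocess
import tempfile

def verify_code_instance(generated_code: str, test_code: str) -> bool:
    """Run the task instance's own tests against the agent's output,
    rather than trusting aggregate scores attached to a model name."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "solution.py").write_text(generated_code)
        pathlib.Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["python", "-m", "pytest", "test_solution.py", "-q"],
            cwd=tmp, capture_output=True, text=True,
        )
        return result.returncode == 0

# Example: the check passes only if the generated function meets the spec.
ok = verify_code_instance(
    "def add(a, b):\n    return a + b\n",
    "from solution import add\n\ndef test_add():\n    assert add(2, 3) == 5\n",
)
print("instance verified:", ok)
```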
A tutorial blog post explaining LLM routing: the practice of directing each user query to the most appropriate LLM based on cost, latency, and quality. It covers routing strategies, the anatomy of an LLM router, and how routing compares with Mixture of Experts.
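As a rough illustration of the routing idea the post covers (not code from the post itself), here is a minimal cost-aware router. The model names, pricing fields, threshold, and difficulty heuristic are all hypothetical; learned routers such as RouteLLM train this decision on preference data instead.

```python
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    cost_per_1k_tokens: float  # illustrative field, not real pricing

# Hypothetical model tiers; substitute whatever your stack exposes.
CHEAP = ModelSpec("small-model", 0.0002)
STRONG = ModelSpec("frontier-model", 0.0100)

def difficulty(query: str) -> float:
    """Toy heuristic: longer, question-dense queries score as harder."""
    length_score = min(len(query) / 500, 1.0)
    question_score = min(query.count("?") / 3, 1.0)
    return 0.7 * length_score + 0.3 * question_score

def route(query: str, threshold: float = 0.5) -> ModelSpec:
    """Send easy queries to the cheap model, hard ones to the strong one."""
    return STRONG if difficulty(query) >= threshold else CHEAP

if __name__ == "__main__":
    for q in ["What's 2 + 2?",
              "Compare three consensus protocols. What are their failure "
              "modes, and how do they trade off latency against safety?"]:
        print(f"{route(q).name} <- {q[:40]}")
```

A production router would also track per-model latency and quality feedback, the kind of anatomy the post walks through.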
Toto is a tool that routes context-rich tasks to the best AI model for the job.