routing

#routing

@Modular: HTTP routing has been a solved problem for many years. Then came Large Language Models. Their backends aren't interchan…

X AI KOLs Following ↗ · 2d ago Cached

Modular published a blog post explaining why traditional HTTP routing doesn't work for LLM inference workloads. The article describes how their distributed inference framework handles stateful, heterogeneous GPU pods with KV caches, specialized prefill/decode backends, and conversation-level routing that traditional stateless routing algorithms cannot address.

0 favorites 0 likes

#routing

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

arXiv cs.CL ↗ · 2026-04-21 Cached

Researchers from National Taiwan University propose replacing fixed translation-based prompting strategies in multilingual LLMs with lightweight learned classifiers that route each instance to either native or translation-based prompting. Their analysis across 10 languages and 4 benchmarks shows no single strategy is universally optimal, with translation benefiting low-resource languages most, and the learned routing achieving statistically significant improvements over fixed strategies.

0 favorites 0 likes

routing

@Modular: HTTP routing has been a solved problem for many years. Then came Large Language Models. Their backends aren't interchan…

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

Submit Feedback