distribution-aware

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

arXiv cs.LG · 2026-06-18

This paper introduces a distribution-aware, prediction-free scheduling framework for LLM inference. Instead of predicting output lengths explicitly, it applies soft priority boosting driven by statistical signals. The method co-optimizes scheduling and cache-aware preemption to reduce tail latency, cutting P99 TTLT by up to 35-50% compared to SRPT (shortest remaining processing time) with perfect length knowledge.

From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing

arXiv cs.LG · 2026-06-08

This paper proposes DARS, a framework that constructs routing supervision from a distributional view of model behavior to address the unreliability of single-shot labels in LLM routing.

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

arXiv cs.LG · 2026-05-26

This paper introduces CurveRL, a principled distribution-aware prompt reweighting approach for reinforcement learning with verifiable rewards (RLVR). It improves LLM reasoning by weighting prompts according to the rank and density of their pass rates rather than the absolute values, and it consistently outperforms GRPO and other baselines.
