tail-latency

#tail-latency

Meet Alice. Alice is impatient

Lobsters Hottest · 3d ago

This blog post explains the inspection paradox in measuring system latency and recovery time. A customer who arrives at a random moment is more likely to land in a long interval than a short one, so customers experience longer average waits than per-interval service metrics suggest. The post includes an interactive simulation and emphasizes the importance of understanding the tail of the distribution.
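
The effect can be reproduced in a few lines. This is a minimal sketch with a hypothetical workload of nine 1-second intervals and one 20-second interval (these numbers are illustrative, not from the post). The mean interval is 2.9 s, but an arrival at a uniformly random time lands, on average, in an interval of length E[X²]/E[X] ≈ 14.1 s and waits about E[X²]/(2E[X]) ≈ 7.05 s for it to end, far more than the 1.45 s that half the mean interval would suggest. The function names are hypothetical.

```python
import bisect
import itertools
import random


def mean_interval(intervals):
    """Average interval length: what a per-interval service metric reports."""
    return sum(intervals) / len(intervals)


def length_biased_mean(intervals):
    """E[X^2] / E[X]: expected length of the interval a random arrival lands in."""
    return sum(x * x for x in intervals) / sum(intervals)


def simulated_residual_wait(intervals, n_arrivals=100_000, seed=0):
    """Monte Carlo estimate of the mean time from a uniformly random arrival
    to the end of the interval it lands in."""
    rng = random.Random(seed)
    ends = list(itertools.accumulate(intervals))  # end time of each interval
    total = ends[-1]
    waited = 0.0
    for _ in range(n_arrivals):
        t = rng.random() * total
        # Index of the interval containing t (clamped against float rounding).
        i = min(bisect.bisect_right(ends, t), len(ends) - 1)
        waited += ends[i] - t
    return waited / n_arrivals


# Hypothetical workload: nine short intervals and one long one.
intervals = [1.0] * 9 + [20.0]
print(mean_interval(intervals))                       # 2.9
print(round(length_biased_mean(intervals), 2))        # 14.1
print(simulated_residual_wait(intervals))             # close to 409 / 58, about 7.05
```

The single long interval covers 20 of the 29 seconds, so most random arrivals fall inside it; that length bias is what inflates the experienced wait.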


Beyond Prediction: Tail-Aware Scheduling for LLM Inference

arXiv cs.LG · 6d ago

This paper introduces a distribution-aware, prediction-free scheduling framework for LLM inference that replaces explicit length prediction with soft priority boosting driven by statistical signals. The method co-optimizes scheduling and cache-aware preemption to reduce tail latency, cutting P99 time-to-last-token (TTLT) by up to 35-50% relative to SRPT with perfect length knowledge.
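
The baseline named above, SRPT with perfect length knowledge, can be sketched as a single-server simulation. This is a minimal illustration under stated assumptions (one server running one job at a time, exact job sizes, a made-up job mix); it is not the paper's scheduler and ignores batching and cache behavior. SRPT is known to minimize mean response time on a single server, but a long job can be preempted by every shorter arrival, which is one way a tail forms. The function names are hypothetical.

```python
import heapq


def fcfs_completion_times(jobs):
    """First-come-first-served on one server. jobs: list of (arrival, size)."""
    done = [0.0] * len(jobs)
    t = 0.0
    for j in sorted(range(len(jobs)), key=lambda j: jobs[j][0]):
        arrival, size = jobs[j]
        t = max(t, arrival) + size
        done[j] = t
    return done


def srpt_completion_times(jobs):
    """Preemptive shortest-remaining-processing-time on one server,
    assuming every job's size is known exactly (the "perfect knowledge" case)."""
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    done = [0.0] * len(jobs)
    heap = []  # (remaining work, job index)
    t, k = 0.0, 0
    while k < len(order) or heap:
        if not heap:
            t = max(t, jobs[order[k]][0])  # server idle: jump to next arrival
        # Admit every job that has arrived by time t.
        while k < len(order) and jobs[order[k]][0] <= t:
            j = order[k]
            heapq.heappush(heap, (jobs[j][1], j))
            k += 1
        remaining, j = heapq.heappop(heap)
        next_arrival = jobs[order[k]][0] if k < len(order) else float("inf")
        if t + remaining <= next_arrival:
            t += remaining  # job finishes before anything new arrives
            done[j] = t
        else:
            # Run until the next arrival, then re-check which job is shortest.
            heapq.heappush(heap, (remaining - (next_arrival - t), j))
            t = next_arrival
    return done


# One long job followed by a stream of short ones (hypothetical numbers).
jobs = [(0.0, 10.0)] + [(float(i), 1.0) for i in range(1, 10)]
for name, sched in [("FCFS", fcfs_completion_times), ("SRPT", srpt_completion_times)]:
    latency = [d - a for d, (a, _) in zip(sched(jobs), jobs)]
    print(name, "mean", sum(latency) / len(latency), "max", max(latency))
# FCFS mean 10.0 max 10.0
# SRPT mean 2.8 max 19.0
```

With this mix, SRPT cuts mean latency from 10.0 to 2.8 but raises the worst case from 10.0 to 19.0, the kind of tail effect that tail-aware policies aim to limit.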
