goodput

#goodput

@kazukifujii: Sakura Internet's Michishita-san's article comprehensively summarizes LLM Inference and comes highly recommended. It fe…

X AI KOLs Timeline ↗ · 2d ago Cached

This article summarizes a presentation by Junda Chen on disaggregated inference for LLMs. It explains why goodput (the rate of requests served within their latency SLOs) matters more than raw throughput, and how separating the prefill and decode phases improves performance. It also highlights this approach's influence on NVIDIA Dynamo.
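To make the goodput definition concrete, here is a minimal sketch in Python. It assumes the SLOs are expressed as a time-to-first-token (TTFT) bound and a time-per-output-token (TPOT) bound, which is a common choice but is not spelled out in the summary above. The names `RequestStats`, `goodput`, and `throughput`, along with the sample numbers, are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # mean time per output token after the first, in seconds


def throughput(requests, window_s):
    """Completed requests per second, regardless of latency."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return len(requests) / window_s


def goodput(requests, window_s, ttft_slo_s, tpot_slo_s):
    """Completed requests per second that met BOTH latency SLOs (assumed form)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    within_slo = sum(
        1
        for r in requests
        if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s
    )
    return within_slo / window_s


# Hypothetical measurements collected over a 2-second window.
sample = [
    RequestStats(ttft_s=0.2, tpot_s=0.03),  # meets both SLOs
    RequestStats(ttft_s=0.8, tpot_s=0.03),  # misses the TTFT SLO
    RequestStats(ttft_s=0.3, tpot_s=0.09),  # misses the TPOT SLO
    RequestStats(ttft_s=0.4, tpot_s=0.05),  # meets both SLOs (bounds are inclusive)
]

raw = throughput(sample, window_s=2.0)
good = goodput(sample, window_s=2.0, ttft_slo_s=0.5, tpot_slo_s=0.05)
print(f"throughput={raw} req/s, goodput={good} req/s")
```

In this toy data the system completes 2 requests per second, but only half of them satisfy both bounds, so a serving setup tuned purely for raw throughput can look twice as good as it is for users.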


#goodput

@robertnishihara: Some intuition about PD disaggregation from the blog - PD doesn't speed up prefill and can actually hurt TTFT - PD's re…

X AI KOLs Following ↗ · 3d ago Cached

This blog post from Anyscale explains the intuition behind Prefill-Decode (PD) disaggregation for LLM serving. It shows how separating the prefill and decode phases onto dedicated GPUs can achieve up to 2.7x better goodput and 67% cost savings when using Ray and vLLM on AMD MI325X, and it discusses the cases where PD disaggregation does not help.

