nvidia-dynamo

#nvidia-dynamo

@kazukifujii: Sakura Internet's Michishita-san's article comprehensively summarizes LLM Inference and comes highly recommended. It fe…

X AI KOLs Timeline · yesterday

This article summarizes a presentation by Junda Chen on disaggregated inference for LLMs. It explains why goodput (throughput that meets latency SLOs) matters more than raw throughput, and how separating the prefill and decode phases improves performance. It also highlights how this approach influenced NVIDIA Dynamo.
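To make the goodput vs. throughput distinction concrete, here is a minimal sketch. It assumes each request is judged against two latency SLOs, time to first token (TTFT) and time per output token (TPOT), and that goodput counts only requests meeting both, divided by the measurement window. The `Request` type, the function names, and the SLO values are hypothetical illustrations, not taken from the article or from NVIDIA Dynamo. Some formulations instead define goodput as the highest request rate at which a target fraction of requests meets the SLOs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float  # time to first token, in seconds (hypothetical field)
    tpot_s: float  # mean time per output token, in seconds (hypothetical field)


def meets_slo(r: Request, ttft_slo_s: float, tpot_slo_s: float) -> bool:
    """A request counts toward goodput only if it meets BOTH latency SLOs."""
    return r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s


def slo_attainment(requests, ttft_slo_s, tpot_slo_s) -> float:
    """Fraction of requests that met both SLOs (0.0 for an empty list)."""
    if not requests:
        return 0.0
    ok = sum(meets_slo(r, ttft_slo_s, tpot_slo_s) for r in requests)
    return ok / len(requests)


def goodput(requests, window_s, ttft_slo_s, tpot_slo_s) -> float:
    """Requests per second that met both SLOs over the measurement window."""
    ok = sum(meets_slo(r, ttft_slo_s, tpot_slo_s) for r in requests)
    return ok / window_s


if __name__ == "__main__":
    reqs = [
        Request(0.2, 0.04),  # meets both SLOs
        Request(0.3, 0.05),  # meets both SLOs
        Request(1.5, 0.03),  # TTFT too slow (e.g. queued behind long prefills)
        Request(0.4, 0.12),  # TPOT too slow (e.g. decode interfered with)
    ]
    window = 2.0
    print(len(reqs) / window)                # raw throughput: 2.0 req/s
    print(goodput(reqs, window, 0.5, 0.1))   # goodput: 1.0 req/s
```

Raw throughput counts all four requests, while goodput counts only the two that were fast enough to be useful; this gap is the reason the talk argues for optimizing goodput.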



The Price of Anarchy in Disaggregated Inference

Hugging Face Daily Papers · 2026-06-11

This paper presents a game-theoretic analysis of disaggregated inference architectures that separate prefill and decode phases across GPU pools, characterizing how GPU saturation affects performance. The authors propose an adaptive controller that detects saturation transitions and adjusts routing parameters, reducing the Price of Anarchy significantly in experiments on NVIDIA B200 clusters.
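For reference, the Price of Anarchy is the standard game-theoretic ratio between the cost of the worst equilibrium reached by self-interested agents and the cost of the socially optimal outcome. The summary does not say which cost function or players the paper uses, so here $C$ is a generic social cost (for example, aggregate request latency), $S$ is the set of strategy profiles (for example, routing choices across prefill and decode pools), and $\mathrm{NE} \subseteq S$ is the set of Nash equilibria; these mappings are assumptions for illustration.

```latex
\mathrm{PoA} \;=\; \frac{\max_{s \in \mathrm{NE}} C(s)}{\min_{s \in S} C(s)} \;\ge\; 1
```

A value of 1 means uncoordinated routing is already optimal; larger values quantify how much performance is lost to uncoordinated decisions, which is what the proposed controller aims to reduce.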

