Tag
Discusses the nuanced reality of prefill-decode disaggregation in LLM serving at scale, based on customer patterns and validated on AMD with vLLM.
Describes a delta-compressed weight sync technique, now merged into slime, that enables lossless delta sync for Megatron ↔ SGLang disaggregation in reinforcement learning at scale.