kv-offload

#kv-offload

@vllm_project: vLLM v0.21.0 is out! 367 commits from 202 contributors (49 new). Highlights: KV Offload + HMA, spec decode with thinkin…

X AI KOLs Following ↗ · 2026-05-16 Cached

vLLM v0.21.0 has been released with KV Offload + HMA, speculative decoding with thinking budget for reasoning models, TOKENSPEED_MLA on Blackwell for DSR1/Kimi K2.5, Mooncake distributed KV, DeepSeek V4 pipeline parallelism, and a C++20 + Transformers v5 baseline.

0 favorites 0 likes

kv-offload

@vllm_project: vLLM v0.21.0 is out! 367 commits from 202 contributors (49 new). Highlights: KV Offload + HMA, spec decode with thinkin…

Submit Feedback