@vllm_project: vLLM v0.21.0 is out! 367 commits from 202 contributors (49 new). Highlights: KV Offload + HMA, spec decode with thinkin…

X AI KOLs Following 05/16/26, 01:52 AM Tools

vllm release llm-inference kv-offload speculative-decoding reasoning-models open-source

Summary

vLLM v0.21.0 has been released with KV Offload + HMA, speculative decoding with thinking budget for reasoning models, TOKENSPEED_MLA on Blackwell for DSR1/Kimi K2.5, Mooncake distributed KV, DeepSeek V4 pipeline parallelism, and a C++20 + Transformers v5 baseline.

vLLM v0.21.0 is out! 367 commits from 202 contributors (49 new). Highlights: KV Offload + HMA, spec decode with thinking budget (reasoning models), TOKENSPEED_MLA on Blackwell for DSR1 / Kimi K2.5, Mooncake distributed KV, DeepSeek V4 pipeline parallelism. C++20 + Transformers v5 baseline.
