Tag
vLLM integrates Mooncake Store for distributed KV cache reuse, enabling cross-node prefix caching to efficiently serve agentic workloads with high token reuse.
The Qwen inference team announces TokenSpeed, a high-performance LLM inference engine for agentic workloads that achieves 540 TPS; an open-source preview is available.
Lightseek releases TokenSpeed, a high-performance LLM inference engine optimized for agentic workloads, featuring compiler-backed parallelism and advanced kernel optimizations that have been adopted by vLLM.