Tag
vLLM integrates Mooncake Store for distributed KV cache reuse, enabling cross-node prefix caching to efficiently serve agentic workloads with high token reuse.
KV Packet proposes a recomputation-free cache reuse framework for LLMs that uses trainable soft-token adapters to bridge context discontinuities, eliminating overhead while maintaining performance comparable to full recomputation baselines on Llama-3.1 and Qwen2.5.