Enabling KV Caching of Shared Prefix for Diffusion Language Models

arXiv cs.LG

This paper proposes BiCache, a novel KV caching technique for shared prefixes in diffusion language models. BiCache avoids accuracy collapse by dynamically reusing cached keys and values in shallow layers, and it improves throughput by 36.3%–98.3%.
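The summary describes the idea only at a high level, so what follows is a minimal toy sketch under stated assumptions, not the paper's method. Diffusion language models such as LLaDA use bidirectional attention, so a prefix token's keys and values normally depend on the tokens after it. Caching them from the prefix alone is therefore an approximation. The sketch shows where that approximation enters when cached prefix K/V are reused in shallow layers and deeper layers are recomputed over the full sequence. Everything here is hypothetical: `ToyLayer`, `build_prefix_cache`, `forward_cached`, and the fixed cutoff `num_shallow` are illustrative names. The sketch uses a static layer cutoff rather than BiCache's dynamic reuse rule, which the summary does not specify. It models a single forward pass (one denoising step) through attention-only layers.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def attention(q, k, v):
    """Bidirectional (unmasked) scaled dot-product attention: every query sees every key."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qrow in q:
        scores = [scale * sum(x * y for x, y in zip(qrow, krow)) for krow in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vrow[d] for w, vrow in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


class ToyLayer:
    """Hypothetical attention-only block with random projections (stand-in for a DLM layer)."""

    def __init__(self, dim, rng):
        def rand():
            return [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
        self.wq, self.wk, self.wv = rand(), rand(), rand()

    def kv(self, h):
        return matmul(h, self.wk), matmul(h, self.wv)

    def forward(self, h, k, v):
        return add(h, attention(matmul(h, self.wq), k, v))


def forward_full(layers, prefix, suffix):
    """Reference path: recompute every layer over the whole sequence, no caching."""
    h = prefix + suffix
    for layer in layers:
        k, v = layer.kv(h)
        h = layer.forward(h, k, v)
    return h


def build_prefix_cache(layers, prefix, num_shallow):
    """Encode the shared prefix ALONE through the first `num_shallow` layers.

    Stores per-layer prefix K/V plus the prefix hidden state at the cutoff.
    Built once and reused for every request that shares this prefix.
    """
    kv_cache, h = [], prefix
    for layer in layers[:num_shallow]:
        k, v = layer.kv(h)
        kv_cache.append((k, v))
        h = layer.forward(h, k, v)
    return kv_cache, h


def forward_cached(layers, cache, suffix):
    """Reuse cached prefix K/V in shallow layers; recompute deep layers in full."""
    kv_cache, prefix_boundary = cache
    s = suffix
    for layer, (pk, pv) in zip(layers, kv_cache):
        sk, sv = layer.kv(s)
        # Only suffix rows are computed; they attend to [cached prefix ; fresh suffix].
        s = layer.forward(s, pk + sk, pv + sv)
    h = prefix_boundary + s  # the approximation: these prefix states never saw the suffix
    for layer in layers[len(kv_cache):]:
        k, v = layer.kv(h)
        h = layer.forward(h, k, v)
    return h


rng = random.Random(0)
dim, depth = 8, 4
layers = [ToyLayer(dim, rng) for _ in range(depth)]
prefix = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
suffix = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]

cache = build_prefix_cache(layers, prefix, num_shallow=2)
approx = forward_cached(layers, cache, suffix)
exact = forward_full(layers, prefix, suffix)
```

With `num_shallow=0` the cached path reduces exactly to full recomputation. Any positive cutoff skips the prefix rows in those layers at the cost of exactness, which is the accuracy/throughput trade-off that a shallow-layer reuse policy has to manage.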
