bidirectional-attention

[Talk] Text Diffusion — Google DeepMind's Brendan O’Donoghue

Reddit r/LocalLLaMA · 3h ago

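DeepMind researcher Brendan O'Donoghue gives an in-depth introduction to text diffusion models, which generate text through iterative denoising. Compared with autoregressive models, they offer lower latency but limited throughput, and the talk demonstrates distinctive advantages such as self-correction and dynamic compute.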

Enabling KV Caching of Shared Prefix for Diffusion Language Models

arXiv cs.LG · 2d ago

This paper proposes BiCache, a KV caching technique for shared prefixes in diffusion language models. It avoids accuracy collapse by dynamically reusing cached keys and values in shallow layers, and reports a 36.3%–98.3% throughput improvement.
