prefilling


Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models

arXiv cs.CL · yesterday

This paper proposes Prefilling-dLLM, a training-free framework that partitions the prefix into chunks and caches their key-value (KV) representations for reuse. The authors report state-of-the-art quality and up to 28x speedup for long-context inference in diffusion language models.
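To make the idea of chunked prefix caching concrete, here is a minimal toy sketch in pure Python. It is not the paper's algorithm: the names `project`, `chunk_prefix`, `build_prefix_cache`, `attend`, and `denoise_step` are hypothetical, the "projections" are fixed scalings standing in for learned weights, and the sketch assumes prefix keys and values can be computed once, chunk by chunk, and reused unchanged at every denoising step. In a real diffusion language model with bidirectional attention, prefix states can depend on the tokens being generated, so reusing cached KV may only be an approximation.

```python
import math
from typing import List, Tuple

Vector = List[float]
KVCache = Tuple[List[Vector], List[Vector]]


def project(x: Vector, scale: float) -> Vector:
    """Toy stand-in for a learned linear projection (assumption, not a real model)."""
    return [scale * v for v in x]


def chunk_prefix(prefix: List[Vector], chunk_size: int) -> List[List[Vector]]:
    """Split the prefix into consecutive chunks of at most chunk_size tokens."""
    return [prefix[i:i + chunk_size] for i in range(0, len(prefix), chunk_size)]


def build_prefix_cache(prefix: List[Vector], chunk_size: int) -> KVCache:
    """Compute keys and values chunk by chunk, once, and store them for reuse."""
    keys: List[Vector] = []
    values: List[Vector] = []
    for chunk in chunk_prefix(prefix, chunk_size):
        keys.extend(project(x, 0.5) for x in chunk)
        values.extend(project(x, 2.0) for x in chunk)
    return keys, values


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def denoise_step(block: List[Vector], cache: KVCache) -> List[Vector]:
    """One step: recompute K/V only for the block being denoised, reuse prefix K/V."""
    prefix_keys, prefix_values = cache
    keys = prefix_keys + [project(x, 0.5) for x in block]
    values = prefix_values + [project(x, 2.0) for x in block]
    return [attend(project(x, 1.0), keys, values) for x in block]


# Usage: build the cache once, then reuse it across several denoising steps.
prefix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
block = [[0.3, 0.7], [0.9, 0.1]]
cache = build_prefix_cache(prefix, chunk_size=2)
for _ in range(3):
    block = denoise_step(block, cache)
```

In this sketch, the per-step cost of computing keys and values scales with the block length rather than the prefix length, which is the general motivation for caching a long prefix. The attention itself still reads every cached entry.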


