prefilling

Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models

arXiv cs.CL · yesterday

This paper proposes Prefilling-dLLM, a training-free framework that partitions the prefix into chunks and caches KV representations, achieving state-of-the-art quality and up to 28x speedup for long-context inference in diffusion language models.
