diffusion-llm

#diffusion-llm

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Hugging Face Daily Papers · 2026-05-19

TIDE is a lossless inference system for mixture-of-experts (MoE) diffusion large language models that exploits the temporal stability of expert activations to reduce I/O overhead and computation, achieving throughput improvements of up to 1.4-1.5× on single GPU-CPU systems.

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

arXiv cs.CL · 2026-05-18

This paper introduces Parallel Speculative Decoding (PSD), a training-free framework that accelerates diffusion LLM inference by jointly improving spatial and temporal efficiency, achieving up to 5.5× tokens per forward pass with comparable quality to greedy decoding.

@DivyanshT91162: Autoregressive LLMs might already be getting replaced Someone built dLLM — an open-source library that can turn ANY aut…

X AI KOLs Timeline · 2026-05-16

dLLM is an open-source library that converts any autoregressive LLM into a diffusion LLM, enabling parallel decoding and faster text generation.

Why there isn't any top LLM providers investing on diffusion LLM?

Reddit r/singularity · 2026-05-11

This post asks why major LLM providers are not investing in diffusion LLMs despite recent advances such as Mercury 2, and explores whether fundamental issues or hardware bottlenecks are hindering broader adoption.

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

arXiv cs.CL · 2026-04-22

R²-dLLM introduces spatio-temporal redundancy reduction techniques that cut diffusion LLM decoding steps by up to 75% while preserving generation quality, addressing a key deployment bottleneck.
