block-diffusion

#block-diffusion

Latent Block-Diffusion Temporal Point Processes: A Semi-Autoregressive Framework for Asynchronous Event Sequence Generation

arXiv cs.LG ↗ · 19h ago Cached

Introduces a semi-autoregressive framework that combines latent block diffusion with temporal point processes for generating asynchronous event sequences, reducing error accumulation while enabling variable-length output.

@lmsysorg: New blog: The next generation of speculative decoding: DFlash and Spec V2 DFlash + Spec V2 hit >4.3X baseline throughpu…

X AI KOLs Following ↗ · 2026-06-15 Cached

DFlash and Spec V2, two speculative decoding methods, reach more than 4.3x baseline throughput for LLM inference and have been released as the default speculative decoding engine in SGLang.

@zhijianliu_: This is what DFlash was built for. Our block-diffusion drafter + KV injection, now running at frontier scale — thanks t…

X AI KOLs Following ↗ · 2026-06-15 Cached

DFlash, a block-diffusion drafter with KV injection, now runs at frontier scale and reaches up to 4.3x baseline throughput. It is integrated with Modal and SGLang for Qwen 397B.

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

arXiv cs.CL ↗ · 2026-05-25 Cached

Fast-dDrive is a block-diffusion VLA model for end-to-end autonomous driving. It achieves state-of-the-art trajectory accuracy with a throughput speedup of more than 12x over autoregressive baselines, addressing the trade-off between high-fidelity planning and efficient inference for edge deployment.

z-lab/gemma-4-31B-it-DFlash

Hugging Face Models Trending ↗ · 2026-04-30 Cached

z-lab released DFlash, a speculative decoding drafter model for Gemma-4-31B-it that uses lightweight block diffusion to draft multiple tokens in parallel, achieving up to 5.8x speedup over the autoregressive baseline.

z-lab/Qwen3.6-27B-DFlash

Hugging Face Models Trending ↗ · 2026-04-23 Cached

Qwen3.6-27B-DFlash is a specialized drafter model for DFlash, a speculative decoding method that uses block diffusion to accelerate inference. The page provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.

z-lab/Qwen3.6-35B-A3B-DFlash

Hugging Face Models Trending ↗ · 2026-04-17 Cached

z-lab releases DFlash, a speculative decoding drafter that uses a lightweight block-diffusion model to draft 15–16 tokens in parallel, yielding up to 2.9× speedup for Qwen3.6-35B-A3B inference.
