block-diffusion

#block-diffusion

@lmsysorg: New blog: The next generation of speculative decoding: DFlash and Spec V2. DFlash + Spec V2 hit >4.3X baseline throughpu…

X AI KOLs Following · 16h ago

DFlash and Spec V2, two new speculative decoding methods, reach more than 4.3X baseline throughput for LLM inference and have been released as the default speculative decoding engine in SGLang.

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

arXiv cs.CL · 2026-05-25

Fast-dDrive is a block-diffusion VLA model for end-to-end autonomous driving. It achieves state-of-the-art trajectory accuracy with more than a 12x throughput speedup relative to autoregressive baselines, addressing the trade-off between high-fidelity planning and efficient inference for edge deployment.

z-lab/gemma-4-31B-it-DFlash

Hugging Face Models Trending · 2026-04-30

z-lab released DFlash, a speculative decoding drafter model for Gemma-4-31B-it that uses lightweight block diffusion to draft multiple tokens in parallel, achieving up to a 5.8x speedup over the autoregressive baseline.

z-lab/Qwen3.6-27B-DFlash

Hugging Face Models Trending · 2026-04-23

Qwen3.6-27B-DFlash is a specialized drafter model for DFlash, a speculative decoding method that uses block diffusion to accelerate inference. The model page provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.

z-lab/Qwen3.6-35B-A3B-DFlash

Hugging Face Models Trending · 2026-04-17

z-lab releases DFlash, a speculative decoding drafter that uses a lightweight block-diffusion model to draft 15–16 tokens in parallel, yielding up to 2.9× speedup for Qwen3.6-35B-A3B inference.
