@LottoLabs: This is awesome work Dflash for qwen 3.5/6 series
Summary
Charles Frye announces that Modal and Z Lab are co-releasing six new DFlash speculators for Alibaba's Qwen 3.x models, reaching over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.
Cached at: 06/20/26, 10:25 PM
This is awesome work
Dflash for qwen 3.5/6 series
Charles 🎉 Frye (@charles_irl): Speculation Is All You Need.
In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFlash speculators for @Alibaba_Qwen 3.x.
Over 1k output tps for 3.5 122B-A10B on a B200.
Read the blog for why we’re all-in on spec dec.
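Speculative decoding, which these speculators plug into, has a cheap drafter propose several tokens that the large target model then checks in a single pass. The sketch below shows greedy (temperature-0) verification only, under stated assumptions: the DFlash block-diffusion drafter is replaced by a fixed draft list, the target model is a toy function, and `verify_block`/`toy_target` are hypothetical names, not part of DFlash, vLLM, or SGLang. Sampling-based verification uses a rejection-sampling rule instead.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for the target model: returns its greedy
# (argmax) next token for a given context.
TargetFn = Callable[[Sequence[int]], int]


def verify_block(prefix: List[int], draft: List[int], target_next: TargetFn) -> List[int]:
    """Greedy speculative verification of one drafted block.

    Keeps the longest prefix of `draft` that agrees with the target's own
    greedy choices, then appends one token from the target: the correction
    at the first mismatch, or a bonus token if the whole block matched.
    """
    emitted: List[int] = []
    context = list(prefix)
    # For clarity this loop queries the target position by position; a real
    # engine obtains all of these predictions from one batched forward pass
    # over prefix + draft, which is where the speedup comes from.
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            emitted.append(expected)  # target's correction replaces the bad draft token
            return emitted
        emitted.append(tok)
        context.append(tok)
    emitted.append(target_next(context))  # bonus token after a fully accepted block
    return emitted


# Toy target for illustration only: always predicts "last token + 1".
def toy_target(ctx: Sequence[int]) -> int:
    return ctx[-1] + 1


print(verify_block([1, 2, 3], [4, 5, 9, 10], toy_target))  # [4, 5, 6]
print(verify_block([1, 2, 3], [4, 5, 6], toy_target))      # [4, 5, 6, 7]
```

Because every emitted token is one the target would have chosen on its own, greedy output is unchanged; only the number of sequential target passes drops. A drafter that proposes long blocks pays off when long runs of its tokens are accepted.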
Similar Articles
z-lab/Qwen3.6-35B-A3B-DFlash
z-lab releases DFlash, a speculative decoding drafter that uses a lightweight block-diffusion model to draft 15–16 tokens in parallel, yielding up to 2.9× speedup for Qwen3.6-35B-A3B inference.
@charles_irl: Speculation Is All You Need. In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFla…
Modal and Z Lab release six new DFlash speculative decoding draft models for Qwen 3.x, achieving over 1,000 output tokens per second for Qwen 3.5 122B-A10B on a B200 and arguing that speculative decoding is the most impactful inference optimization.
z-lab/Qwen3.6-27B-DFlash
This article introduces Qwen3.6-27B-DFlash, a specialized drafter model for DFlash, a speculative decoding method that uses block diffusion to accelerate inference. It provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.
@zhijianliu_: DFlash for Qwen3.6-35B-A3B just dropped The community was running the day-1 preview before we even finished training. N…
Z Lab releases DFlash for Qwen3.6-35B-A3B, a block-diffusion speculative decoding drafter; training is complete and the weights are now available on GitHub and Hugging Face.
@RedHat_AI: Qwen3-8B now has a DFlash speculator! 82.2% first-token acceptance on math reasoning. 3.74 avg tokens accepted per step…
Red Hat AI released a DFlash speculator model for Qwen3-8B, achieving 82.2% first-token acceptance on math reasoning tasks. The model was trained using the Speculators library and vLLM to optimize inference speed.
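Figures such as "first-token acceptance" and "average tokens accepted per step" are per-step statistics over verification rounds. A minimal sketch of how such numbers could be computed, assuming a log in which each entry is the number of drafted tokens the target accepted in one step (bonus token excluded); the exact definitions used by Red Hat or the Speculators library are not given here and may differ, for example by counting the bonus token. `acceptance_stats` and `expected_emitted` are hypothetical helpers, and the second one relies on the simplifying assumption that each drafted token is accepted independently with the same probability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcceptanceStats:
    first_token_acceptance: float  # share of steps whose first drafted token was accepted
    mean_accepted_per_step: float  # mean number of drafted tokens accepted per step
    mean_emitted_per_step: float   # accepted tokens plus the one token the target always adds


def acceptance_stats(accepted_counts: List[int]) -> AcceptanceStats:
    """Summarise a log where entry i is the number of drafted tokens the
    target accepted in verification step i (bonus token NOT included)."""
    if not accepted_counts:
        raise ValueError("need at least one verification step")
    n = len(accepted_counts)
    first = sum(1 for c in accepted_counts if c >= 1) / n
    mean_acc = sum(accepted_counts) / n
    return AcceptanceStats(first, mean_acc, mean_acc + 1.0)


def expected_emitted(alpha: float, k: int) -> float:
    """Expected tokens emitted per step for a k-token draft, ASSUMING each
    drafted token is accepted independently with probability alpha:
    1 + alpha + ... + alpha**k."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


stats = acceptance_stats([0, 3, 5, 2])
print(stats.first_token_acceptance, stats.mean_accepted_per_step)  # 0.75 2.5
print(expected_emitted(0.5, 2))  # 1.75
```

Under the independence assumption, a fixed per-token acceptance rate caps the benefit of longer drafts: with alpha = 0.8, the expected output per step never exceeds 1/(1 - 0.8) = 5 tokens however large k is. Real acceptance varies by draft position, so this is only a rough guide.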