@LottoLabs: This is awesome work Dflash for qwen 3.5/6 series

X AI KOLs Timeline 06/20/26, 12:52 AM Models

speculative-decoding qwen dflash inference-acceleration open-source ai-model-optimization

Summary

Charles Frye announces the co-release with Z Lab of six new DFlash speculators for Alibaba Qwen 3.x models, achieving over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.


Original Article


Cached at: 06/20/26, 10:25 PM

This is awesome work

Dflash for qwen 3.5/6 series

Charles 🎉 Frye (@charles_irl): Speculation Is All You Need.

In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFlash speculators for @Alibaba_Qwen 3.x.

Over 1k output tps for 3.5 122B-A10B on a B200.

Read the blog for why we’re all-in on spec dec.

Similar Articles

z-lab/Qwen3.6-35B-A3B-DFlash

Hugging Face Models Trending

z-lab releases DFlash, a speculative decoding drafter that uses a lightweight block-diffusion model to draft 15–16 tokens in parallel, yielding up to 2.9× speedup for Qwen3.6-35B-A3B inference.

@charles_irl: Speculation Is All You Need. In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFla…

X AI KOLs Following

Modal and Z Lab release six new DFlash speculative decoding draft models for Qwen 3.x, achieving over 1000 tokens per second on a B200 and arguing that speculative decoding is the most impactful inference optimization.

z-lab/Qwen3.6-27B-DFlash

Hugging Face Models Trending

This article introduces Qwen3.6-27B-DFlash, a drafter model for DFlash, a speculative decoding method that uses block diffusion to accelerate inference. It provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.

@zhijianliu_: DFlash for Qwen3.6-35B-A3B just dropped The community was running the day-1 preview before we even finished training. N…

X AI KOLs Following

Z Lab releases the DFlash speculative decoding drafter for Qwen3.6-35B-A3B, with training complete and weights now available on GitHub and Hugging Face.

@RedHat_AI: Qwen3-8B now has a DFlash speculator! 82.2% first-token acceptance on math reasoning. 3.74 avg tokens accepted per step…

X AI KOLs Following

Red Hat AI released a DFlash speculator model for Qwen3-8B, achieving 82.2% first-token acceptance on math reasoning tasks. The model was trained using the Speculators library and vLLM to optimize inference speed.
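
The summaries above mention drafted blocks, first-token acceptance, and average tokens accepted per step without showing how those fit together. The following is a minimal sketch of one greedy draft-then-verify step, not DFlash's actual implementation: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names invented for illustration, tokens are plain integers, and the target is queried one position at a time for clarity (real systems verify the whole drafted block in a single forward pass).

```python
from typing import Callable, List


def speculative_step(
    prefix: List[int],
    draft_block: Callable[[List[int], int], List[int]],
    target_next: Callable[[List[int]], int],
    block_size: int = 16,
) -> List[int]:
    """Return the tokens committed by one greedy draft-then-verify step."""
    drafted = draft_block(prefix, block_size)
    committed: List[int] = []
    for tok in drafted:
        expected = target_next(prefix + committed)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest of the draft.
            committed.append(expected)
            return committed
        committed.append(tok)
    # Every drafted token matched, so the target contributes one extra token.
    committed.append(target_next(prefix + committed))
    return committed


# Hypothetical stand-ins: the "target" always counts upward by one (mod 100).
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 100


# The "drafter" agrees with the target except at the fourth drafted position.
def toy_draft(seq: List[int], n: int) -> List[int]:
    out = [(seq[-1] + 1 + i) % 100 for i in range(n)]
    if n > 3:
        out[3] = 0  # deliberately wrong guess
    return out


tokens = speculative_step([5], toy_draft, toy_target, block_size=16)
print(tokens)  # three accepted draft tokens plus the target's correction
```

In this toy run one step commits four tokens for a single verification round, which is the quantity that figures like "average tokens accepted per step" summarize; the output always matches what the target alone would have produced under greedy decoding.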

