@LottoLabs: This is awesome work: DFlash for the Qwen 3.5/3.6 series


Summary

Charles Frye announces the co-release with Z Lab of six new DFlash speculators for Alibaba Qwen 3.x models, achieving over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.


Cached at: 06/20/26, 10:25 PM

This is awesome work: DFlash for the Qwen 3.5/3.6 series.

Charles 🎉 Frye (@charles_irl): Speculation Is All You Need.

In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFlash speculators for @Alibaba_Qwen 3.x.

Over 1k output tps for 3.5 122B-A10B on a B200.

Read the blog for why we’re all-in on spec dec.

Similar Articles

z-lab/Qwen3.6-35B-A3B-DFlash

Hugging Face Models Trending

z-lab releases DFlash, a speculative decoding drafter that uses a lightweight block-diffusion model to draft 15–16 tokens in parallel, yielding up to 2.9× speedup for Qwen3.6-35B-A3B inference.

z-lab/Qwen3.6-27B-DFlash

Hugging Face Models Trending

This article introduces Qwen3.6-27B-DFlash, a drafter model for DFlash, a speculative decoding method that uses block diffusion to accelerate inference. It provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.
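
For readers new to speculative decoding, the draft-then-verify idea mentioned in the summaries above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DFlash's implementation: the "target" and "drafter" here are hypothetical stand-in functions (`toy_target_next`, `toy_draft_block`), decoding is greedy, and the target is called once per position for clarity, whereas a real engine would score the whole drafted block in a single batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]
DraftBlock = Callable[[List[Token], int], List[Token]]


def verify_block(target_next: NextToken, context: List[Token],
                 draft: List[Token]) -> List[Token]:
    """Accept drafted tokens until the first disagreement with the target."""
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def speculative_generate(target_next: NextToken, draft_block: DraftBlock,
                         prompt: List[Token], max_new_tokens: int,
                         block_size: int) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verify steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_block(out, block_size)
        out.extend(verify_block(target_next, out, draft))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


def greedy_generate(target_next: NextToken, prompt: List[Token],
                    max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


# Hypothetical toy models (assumptions for illustration only).
def toy_target_next(ctx: List[Token]) -> Token:
    # The "target" counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: List[Token], k: int) -> List[Token]:
    # The "drafter" proposes k tokens at once and guesses wrong at token 7.
    tokens: List[Token] = []
    last = ctx[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 7:
            nxt = 0  # deliberate drafting error
        tokens.append(nxt)
        last = nxt
    return tokens


if __name__ == "__main__":
    spec_out, steps = speculative_generate(
        toy_target_next, toy_draft_block, [0], 20, 4)
    print(spec_out == greedy_generate(toy_target_next, [0], 20), steps)
```

In this sketch the output always matches plain greedy decoding of the target, because any drafted token the target disagrees with is replaced by the target's own choice. The saving comes from needing fewer verification steps than generated tokens whenever the drafter guesses well.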