@charles_irl: Many are belatedly realizing that intelligence must be open. For open intelligence to succeed, developers must work tog…

X AI KOLs Following 06/15/26, 04:38 PM Tools

open-intelligence sglang inference-speed throughput collaboration speculative-decoding

Summary

A collaboration between Modal, SGLang, and Z Lab integrates DFlash speculative decoding into SGLang, achieving up to 4.3x greater throughput over baseline for Alibaba's Qwen 397B-A17B model and advancing open intelligence.

Many are belatedly realizing that intelligence must be open. For open intelligence to succeed, developers must work together across institutional lines. That's why I'm particularly excited about this collab across @modal, @sgl_project, and Z Lab:

Modal (@modal): We worked with @lmsysorg and https://t.co/Cg0JsVomui to

- integrate DFlash spec into @sgl_project
- make it faster with overlap
- train a DFlash drafter for @Alibaba_Qwen 397B-A17B

The result: up to 4.3x greater throughput over baseline and 1.5x over native MTP.
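As a rough reading of those two figures, the sketch below works out what they would imply about native MTP relative to the baseline. It assumes both "up to" numbers were measured at the same operating point, which the post does not state, so the result is only an illustration. The function name `implied_speedup` is hypothetical and not part of SGLang or Modal.

```python
def implied_speedup(target_over_baseline: float, target_over_reference: float) -> float:
    """Return the reference's speedup over baseline implied by two ratios.

    If target = a * baseline and target = b * reference,
    then reference = (a / b) * baseline.
    """
    if target_over_reference <= 0:
        raise ValueError("ratios must be positive")
    return target_over_baseline / target_over_reference


# Figures quoted in the post (both are "up to" values).
dflash_vs_baseline = 4.3
dflash_vs_mtp = 1.5

# Assumption: both ratios refer to the same workload and settings.
mtp_vs_baseline = implied_speedup(dflash_vs_baseline, dflash_vs_mtp)
print(f"Implied native MTP speedup over baseline: {mtp_vs_baseline:.2f}x")
```

Under that assumption, native MTP would sit at roughly 2.9x the baseline, with DFlash adding a further 1.5x on top.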

Similar Articles

@modal: We worked with @lmsysorg and http://z-lab.ai to - integrate DFlash spec into @sgl_project - make it faster with overlap…

X AI KOLs Following

Modal collaborated with LMSys and Z Lab to integrate DFlash speculative decoding into SGLang, achieving up to 4.3x throughput improvement over baseline and 1.5x over native multi-token prediction for large language models.

@lmsysorg: New blog: The next generation of speculative decoding: DFlash and Spec V2 DFlash + Spec V2 hit >4.3X baseline throughpu…

X AI KOLs Following

New research on DFlash and Spec V2 speculative decoding methods achieves >4.3X baseline throughput for LLM inference, released as the default speculative decoding engine in SGLang.

@zhijianliu_: This is what DFlash was built for. Our block-diffusion drafter + KV injection, now running at frontier scale — thanks t…

X AI KOLs Following

DFlash, a block-diffusion drafter with KV injection, is now running at frontier scale, achieving up to 4.3x greater throughput over baseline, integrated with Modal and SGLang for Qwen 397B.

DFlash and Spec V2 Decoding (14 minute read)

TLDR AI

Z Lab, SGLang, and Modal release DFlash, a new speculative decoding model for Qwen 3.5 397B-A17B that uses block diffusion and KV injection to achieve over 4x throughput improvement over baseline and 1.5x over native MTP.

@Ali_TongyiLab: We are pleased to highlight an excellent community model from developer : Qwen3.6-27B-MTP-pi-reasoning-GGUF. Built on o…

X AI KOLs Timeline

Alibaba's Tongyi Lab highlights a community model, Qwen3.6-27B-MTP-pi-reasoning-GGUF, built on Qwen3.6-27B, optimized for automated programming and debugging workflows for local coding agents.
