@charles_irl: Many are belatedly realizing that intelligence must be open. For open intelligence to succeed, developers must work tog…


Summary

A collaboration between Modal, SGLang, and Z Lab integrates DFlash speculative decoding into SGLang, achieving up to 4.3x higher throughput than baseline (and 1.5x over native MTP) for Alibaba's Qwen 397B-A17B model, advancing open intelligence.


Cached at: 06/16/26, 11:40 AM

Many are belatedly realizing that intelligence must be open.

For open intelligence to succeed, developers must work together across institutional lines.

That’s why I’m particularly excited about this collab across @modal, @sgl_project, and Z Lab:

Modal (@modal): We worked with @lmsysorg and https://t.co/Cg0JsVomui to

  • integrate DFlash spec into @sgl_project
  • make it faster with overlap
  • train a DFlash drafter for @Alibaba_Qwen 397B-A17B

The result: up to 4.3x greater throughput over baseline and 1.5x over native MTP.
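The post does not explain DFlash's internals. As general background, speculative decoding has a small drafter propose several tokens, and the large target model checks all of them in a single forward pass. The sketch below shows only the greedy (temperature 0) acceptance rule, assuming the target's per-position argmax tokens have already been computed. `verify_draft` is a hypothetical helper, not an SGLang or DFlash API, and sampling-based decoding uses a different, probabilistic acceptance rule.

```python
from typing import List, Tuple


def verify_draft(draft: List[int], target_argmax: List[int]) -> Tuple[List[int], int]:
    """Greedy verification of one speculative step (illustrative sketch).

    draft: k token ids proposed by the drafter.
    target_argmax: k + 1 token ids, the target model's greedy choice at each
        position after scoring the prefix plus the whole draft in one pass.
    Returns (tokens committed this step, number of draft tokens accepted).
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target must score every draft position plus one extra")

    # Accept the longest prefix where the drafter agrees with the target.
    accepted = 0
    for d, t in zip(draft, target_argmax):
        if d != t:
            break
        accepted += 1

    # Append the target's own token at the first mismatch, or a free
    # "bonus" token if every draft token matched.
    committed = draft[:accepted] + [target_argmax[accepted]]
    return committed, accepted


print(verify_draft([5, 9, 2, 7], [5, 9, 4, 1, 3]))  # ([5, 9, 4], 2)
```

Every committed token is one the target would have picked greedily anyway, so this rule leaves greedy output unchanged; the gain comes from committing several tokens per target forward pass instead of one.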

Similar Articles

DFlash and Spec V2 Decoding (14 minute read)

TLDR AI

Z Lab, SGLang, and Modal release DFlash, a new speculative decoding model for Qwen 3.5 397B-A17B that uses block diffusion and KV injection to achieve over 4x throughput improvement over baseline and 1.5x over native MTP.
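For intuition about why drafting a block of tokens can raise throughput, a common simplified model from the speculative decoding literature assumes each draft token is accepted independently with probability alpha. Under that assumption, a block of k draft tokens yields (1 - alpha^(k+1)) / (1 - alpha) committed tokens per target forward pass on average. The sketch below evaluates that formula. The function name and the alpha and k values are hypothetical illustrations, not DFlash measurements, and real throughput also depends on drafter cost, batch size, and scheduling such as the overlap mentioned above.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Mean tokens committed per target forward pass under the simplified
    model where each of k draft tokens is accepted independently with
    probability alpha (acceptance stops at the first rejection)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the target's bonus token.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha^k.
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# Illustrative sweep over hypothetical acceptance rates for a block of 8.
for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}, k=8 -> {expected_tokens_per_step(alpha, 8):.2f} tokens/step")
```

The formula shows why drafter quality matters: longer blocks only pay off when per-token acceptance stays high, since each extra draft token contributes alpha raised to its position.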