@mervenoyann: DiffusionGemma is out it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) also great on…

Summary

DiffusionGemma is out. According to the post, it is compute-bound, which makes it 4x faster than other Gemma-4 models (1k tok/s on an H100). It is also described as strong at coding: generating and iterating on code ranging from 3D generation to front-end work.

DiffusionGemma is out 🔥 it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) 💨 also great on coding, generate and iterate on any code from 3D generation to front-end ⤵️ https://t.co/NAjEaml6dV

Cached at: 06/10/26, 05:53 PM

Similar Articles

DiffusionGemma: 4x Faster Text Generation

Hacker News Top

Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.

Gemma 4 MTP vs DFlash on 1x H100: dense vs MoE results

Reddit r/LocalLLaMA

This benchmark compares Gemma 4's Multi-Token Prediction (MTP) and z-lab's DFlash speculative decoding methods on a single H100 GPU, showing MTP faster for dense models and DFlash faster for MoE models.

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

Reddit r/LocalLLaMA

A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.
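
As a rough sanity check on the figures quoted above, the baseline throughput implied by a speedup claim is the reported throughput divided by the speedup factor. A minimal Python sketch, assuming the quoted numbers are exact and that each baseline was measured on the same hardware as the accelerated run (the posts do not confirm either point); `implied_baseline` is a hypothetical helper name used only for illustration:

```python
def implied_baseline(tokens_per_second: float, speedup: float) -> float:
    """Return the baseline throughput implied by a reported speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return tokens_per_second / speedup


# Figures as quoted in the posts above; treated as exact for illustration only.
rtx5090_baseline = implied_baseline(578, 2.56)  # Gemma 4 26B with DFlash on an RTX 5090
h100_baseline = implied_baseline(1000, 4)       # DiffusionGemma claim on an H100

print(f"Implied RTX 5090 baseline: {rtx5090_baseline:.1f} tok/s")
print(f"Implied H100 baseline: {h100_baseline:.1f} tok/s")
```

Under these assumptions the RTX 5090 baseline comes out to roughly 226 tok/s and the H100 comparison point to 250 tok/s. Because the DiffusionGemma speedup is stated as "up to 4x", the H100 figure is best read as a lower bound on the baseline rather than a measured value.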