@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …
Summary
DiffusionGemma, a 26B MoE model built on Gemma 4 with 3.8B parameters active during inference, generates text by diffusion in parallel 256-token blocks at up to 1000+ tokens per second. It fits within 18 GB of VRAM when quantized and is released under Apache 2.0.
Cached at: 06/10/26, 05:53 PM
Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! 🌬️
- Built on Gemma 4 as a 26B MoE model.
- 3.8B active parameters during inference.
- Generates text in 256-token blocks in parallel.
- Fits within 18 GB VRAM limits when quantized.
- Apache 2.0 https://t.co/rnQsdRNoD0
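The memory and speed figures in the post can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate under stated assumptions: the post does not say which quantization format is used, so the bit widths (16, 8, 4) are illustrative, and the helper `weight_memory_gb` is a hypothetical name, not part of any library. It counts weight storage only and ignores activations, caches, and runtime overhead. It also assumes that all 26B weights stay resident in VRAM, as is typical for MoE models even when only 3.8B parameters are active per token.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes).

    Ignores activations, caches, and framework overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26e9        # total parameters, from the post
VRAM_BUDGET_GB = 18        # VRAM limit, from the post
BLOCK_TOKENS = 256         # tokens generated per block, from the post
PEAK_TOKENS_PER_S = 1000   # "up to 1000+" tokens per second, from the post

# Bit widths are assumptions for illustration; the post names no format.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:.1f} GB ({verdict} in {VRAM_BUDGET_GB} GB)")

# Time to produce one full block at the quoted peak rate.
seconds_per_block = BLOCK_TOKENS / PEAK_TOKENS_PER_S
print(f"One {BLOCK_TOKENS}-token block at {PEAK_TOKENS_PER_S} tok/s: "
      f"{seconds_per_block:.3f} s")
```

Under these assumptions, only the 4-bit case (about 13 GB of weights) lands under the 18 GB limit, which leaves some headroom for everything the estimate leaves out.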
Similar Articles
DiffusionGemma: 4x Faster Text Generation
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
@mervenoyann: DiffusionGemma is out it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) also great on…
DiffusionGemma is out; it's compute-bound and 4x faster than other Gemma-4 models with 1k tok/s on H100, and excels at coding tasks including 3D generation and front-end.
DiffusionGemma
Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.
@HuggingPapers: NVIDIA just released an NVFP4-quantized DiffusionGemma on Hugging Face A 26B MoE multimodal model generating text via p…
NVIDIA released a 26B MoE multimodal model called DiffusionGemma on Hugging Face, using NVFP4 quantization and achieving over 1,100 tokens per second on Hopper hardware.
DiffusionGemma: The Developer Guide - Google Developers Blog
DiffusionGemma is a new experimental model from Google DeepMind that uses parallel generation on a 256-token canvas, achieving up to 4x faster token generation on GPUs. This developer guide explains its architecture, bidirectional context, and includes a fine-tuning recipe for solving Sudoku.