Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.
# DiffusionGemma
Source: [https://simonwillison.net/2026/Jun/10/diffusiongemma/](https://simonwillison.net/2026/Jun/10/diffusiongemma/)
10th June 2026 \- Link Blog
**[DiffusionGemma](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/)** \([via](https://news.ycombinator.com/item?id=48478471)\) Last May Google briefly released an experimental Gemini Diffusion model\. I [tried the preview at the time](https://simonwillison.net/2025/May/21/gemini-diffusion/) and recorded it running at 857 tokens/second\. It was an exciting model, but Google made no further announcements about it\.
That research has returned in the best possible way: as a new open\-weight \(Apache 2 licensed\) Gemma model, [google/diffusiongemma\-26B\-A4B\-it](https://huggingface.co/google/diffusiongemma-26B-A4B-it)\.
NVIDIA are currently [hosting the model for free](https://build.nvidia.com/google/diffusiongemma-26b-a4b-it) on their NIM cloud API\. I used that API to [generate this pelican](https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2Fe5e234a6dc6eef61e209ce1629620042), which took 4\.4s \(according to `time uv run generate.py`\) to return 2,409 tokens \- so at least 500 tokens/second\.
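The "at least 500 tokens/second" figure can be checked with quick arithmetic. This is a minimal sketch, not the contents of `generate.py`: the `tokens_per_second` helper is a hypothetical name introduced here for illustration, and it uses only the two numbers quoted above. Because the 4\.4s came from `time`, it presumably includes process startup and network round\-trip as well as generation, so the result is best read as a lower bound on the model's actual speed\.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput over a wall-clock interval (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the post: 2,409 tokens returned in 4.4s of wall-clock time.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

That works out to roughly 547 tokens/second, consistent with the conservative "at least 500" claim\.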

Posted [10th June 2026](https://simonwillison.net/2026/Jun/10/) at 8 pm
This is a **link post** by Simon Willison, posted on [10th June 2026](https://simonwillison.net/2026/Jun/10/)\.
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
Google released DiffusionGemma, an experimental open-source diffusion model for text generation that achieves a 4x speed boost over autoregressive models and is optimized for local processing.
Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
DiffusionGemma is a new experimental model from Google DeepMind that uses parallel generation on a 256-token canvas, achieving up to 4x faster token generation on GPUs. This developer guide explains its architecture, bidirectional context, and includes a fine-tuning recipe for solving Sudoku.
DiffusionGemma, a 26B MoE model based on Gemma 4, achieves over 1000 tokens per second using diffusion for text generation in 256-token blocks, fitting in 18GB VRAM with quantization, released under Apache 2.0.