DiffusionGemma

Simon Willison's Blog 06/10/26, 08:00 PM Models

gemma diffusion open-weights text-generation google nvidia

Summary

Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2 license. A test through NVIDIA's free NIM cloud API returned 2,409 tokens in 4.4s, or at least 500 tokens/second.

Original Article

Cached at: 06/10/26, 09:45 PM

# DiffusionGemma

Source: [https://simonwillison.net/2026/Jun/10/diffusiongemma/](https://simonwillison.net/2026/Jun/10/diffusiongemma/)

10th June 2026 - Link Blog

**[DiffusionGemma](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/)** ([via](https://news.ycombinator.com/item?id=48478471)) Last May Google briefly released an experimental Gemini Diffusion model. I [tried the preview at the time](https://simonwillison.net/2025/May/21/gemini-diffusion/) and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it.

That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, [google/diffusiongemma-26B-A4B-it](https://huggingface.co/google/diffusiongemma-26B-A4B-it).

NVIDIA are currently [hosting the model for free](https://build.nvidia.com/google/diffusiongemma-26b-a4b-it) on their NIM cloud API. I used that API to [generate this pelican](https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2Fe5e234a6dc6eef61e209ce1629620042), which took 4.4s (according to `time uv run generate.py`) to return 2,409 tokens - so at least 500 tokens/second.
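The throughput figure can be checked with a quick calculation. This is a minimal sketch using only the two numbers quoted in the post (2,409 tokens and 4.4s of wall-clock time); the helper name `tokens_per_second` is made up for illustration and is not part of the original `generate.py` script, whose contents are not shown.

```python
def tokens_per_second(tokens: int, wall_seconds: float) -> float:
    """Return average throughput given a token count and elapsed wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return tokens / wall_seconds


# Figures quoted in the post: 2,409 tokens returned in 4.4s of wall-clock time.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

Because `time` measures the whole command, the 4.4s presumably also covers process startup and the network round trip, so the computed rate is a lower bound on the model's generation speed, which fits the "at least 500 tokens/second" phrasing.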
![Flat minimalist illustration of a white pelican with a large orange beak riding a red bicycle with black wheels, against a pale blue background with a green line representing the ground](https://static.simonwillison.net/static/2026/diffusiongemma-pelican.png)

Posted [10th June 2026](https://simonwillison.net/2026/Jun/10/) at 8 pm

This is a **link post** by Simon Willison, posted on [10th June 2026](https://simonwillison.net/2026/Jun/10/).

Similar Articles

DiffusionGemma: 4x Faster Text Generation

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

google/diffusiongemma-26B-A4B-it

DiffusionGemma: The Developer Guide- Google Developers Blog

@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …
