@omarsar0: This is awesome! I am spending a lot of time on diffusion LLMs these days, so this is perfect timing. I feel like there…
Summary
Google DeepMind released DiffusionGemma, an open experimental model that generates text in blocks rather than word-by-word, enabling self-correction and faster output.
Cached at: 06/10/26, 05:53 PM
This is awesome!
I am spending a lot of time on diffusion LLMs these days, so this is perfect timing.
I feel like there are so many underexplored research questions around text diffusion.
Weights available on HF. https://t.co/BpZM7Vxwvm
Google DeepMind (@GoogleDeepMind): DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs.
Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the model self-correct and format complex markdown in real time.
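To make the block-wise idea concrete, here is a toy sketch of parallel block decoding with revision. It is not DiffusionGemma's actual algorithm: `toy_propose` is a hypothetical stand-in that guesses tokens at random, and the schedule (commit the most confident masked positions each step, overwrite a committed token when a later guess is more confident) is an assumption chosen only to illustrate "fill a whole block, then self-correct".

```python
import random

MASK = "<mask>"

def toy_propose(block, vocab, rng):
    """Hypothetical stand-in for a model: returns a (token, confidence)
    guess for every position at once. A real model would condition on
    the whole block; this one just guesses at random."""
    return [(rng.choice(vocab), rng.random()) for _ in block]

def generate_block(block_size, vocab, steps, seed=0):
    """Fill a fully masked block over a fixed number of parallel steps."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    conf = [0.0] * block_size
    per_step = -(-block_size // steps)  # ceiling division
    for _ in range(steps):
        proposals = toy_propose(block, vocab, rng)
        # Self-correction: replace an already committed token when the
        # new guess for that position is more confident.
        for i, (tok, c) in enumerate(proposals):
            if block[i] != MASK and c > conf[i]:
                block[i], conf[i] = tok, c
        # Commit the most confident guesses among still-masked positions.
        masked = [i for i in range(block_size) if block[i] == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            block[i], conf[i] = proposals[i]
    return block

if __name__ == "__main__":
    print(generate_block(8, ["a", "b", "c"], steps=4))
```

An autoregressive decoder would need 8 sequential steps for 8 tokens; here the block is filled in 4 steps, and earlier choices can still change along the way.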
Similar Articles
google/diffusiongemma-26B-A4B-it
Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
DiffusionGemma: 4x Faster Text Generation
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
@vllm_project: Congrats to @GoogleDeepMind on DiffusionGemma A 26B diffusion language model on the Gemma4 backbone, and the first dLLM…
vLLM announces native support for Google DeepMind's DiffusionGemma, a 26B discrete diffusion language model that generates 256-token blocks in parallel, enabling low-latency inference at 1200+ tok/s on a single H200.
Google's latest DiffusionGemma open AI model comes with a 4x speed boost
Google released DiffusionGemma, an experimental open-source diffusion model for text generation that achieves a 4x speed boost over autoregressive models and is optimized for local processing.
DiffusionGemma
Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.