@HuggingPapers: NVIDIA just released an NVFP4-quantized DiffusionGemma on Hugging Face A 26B MoE multimodal model generating text via p…


Summary

NVIDIA released an NVFP4-quantized version of DiffusionGemma, Google's 26B-parameter MoE multimodal model, on Hugging Face. NVIDIA reports generation speeds of over 1,100 tokens per second on Hopper GPUs.
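To see roughly what NVFP4 saves on a 26B-parameter model, here is some back-of-the-envelope arithmetic for weight storage. It assumes NVFP4's publicly described layout: 4-bit values plus one 8-bit scale per 16-element block, or about 4.5 bits per weight. It ignores the per-tensor scale, activations, KV cache, and any layers kept in higher precision, so treat the numbers as rough.

```python
# Rough weight-memory estimate for a 26B-parameter model.
# Assumption: NVFP4 stores 4-bit values plus one 8-bit (FP8 E4M3) scale per
# 16-element block, i.e. 4 + 8/16 = 4.5 bits per weight. The per-tensor FP32
# scale and any layers left in higher precision are ignored.
TOTAL_PARAMS = 26e9


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, 16)    # 52.0
nvfp4_gb = weight_gb(TOTAL_PARAMS, 4.5)  # 14.625
print(f"BF16:  {bf16_gb:.1f} GB")
print(f"NVFP4: {nvfp4_gb:.1f} GB (~{bf16_gb / nvfp4_gb:.1f}x smaller)")
```

DiffusionGemma is a Mixture-of-Experts model, so all 26B parameters must stay resident in memory, even though only about 4B are active per token (per the Simon Willison summary under Similar Articles). The active count mainly affects compute per token, not weight memory.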

NVIDIA just released an NVFP4-quantized DiffusionGemma on Hugging Face. A 26B MoE multimodal model generating text via parallel diffusion, with 256K context and 1,100+ tokens/sec on Hopper. https://t.co/xxd93AKmga
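NVIDIA has publicly described NVFP4 as FP4 (E2M1) values grouped into 16-element blocks, with each block sharing a scale. The sketch below illustrates that block-scaling idea in plain Python under stated simplifications. It keeps each block scale as a Python float instead of FP8 E4M3, and it omits the second-level per-tensor FP32 scale. The function names are hypothetical and are not part of any NVIDIA library.

```python
from typing import List, Tuple

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # one shared scale per 16-element block

Block = Tuple[float, List[float]]  # (scale, signed E2M1 codes)


def quantize_block(block: List[float]) -> Block:
    """Scale a block so its largest magnitude maps to 6.0, then snap to E2M1."""
    amax = max(abs(x) for x in block)
    # Simplification: real NVFP4 stores this scale in FP8 E4M3.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        # Nearest grid point; ties go to the smaller magnitude because
        # min() keeps the first minimum and the grid is ascending.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def quantize(values: List[float]) -> List[Block]:
    """Split values into 16-element blocks and quantize each independently."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate values as scale * code."""
    return [scale * c for scale, codes in blocks for c in codes]


weights = [6.0, -3.0, 0.9, 0.2] + [0.0] * 12
blocks = quantize(weights)
print(blocks[0][1][:4])  # [6.0, -3.0, 1.0, 0.0]
```

Because each small block has its own scale, an outlier only degrades precision within its own 16 values rather than across the whole tensor.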

Cached at: 06/10/26, 09:55 PM


Similar Articles

DiffusionGemma: 4x Faster Text Generation

Hacker News Top

Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.

DiffusionGemma

Simon Willison's Blog

Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.

google/diffusiongemma-26B-A4B-it

Hugging Face Models Trending

Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
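The summaries say DiffusionGemma generates text via parallel, discrete diffusion, but they do not describe its sampler. As an illustration only, the sketch below shows a generic masked-diffusion decoding loop, a common scheme in the masked-diffusion literature. It starts from an all-masked sequence and, at each step, commits the most confident predictions in parallel. The `predict` interface, `toy_predict`, and the even-split unmasking schedule are hypothetical and are not DiffusionGemma's actual API or algorithm.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a not-yet-decoded position

# Hypothetical model interface: given the partially masked sequence,
# return a (token, confidence) guess for every position.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def diffusion_decode(predict: Predictor, length: int, steps: int) -> List[Optional[str]]:
    seq: List[Optional[str]] = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = predict(seq)  # one model call scores every position at once
        # Commit an even share of the remaining masks; the last step commits all.
        k = -(-len(masked) // (steps - step))  # ceiling division
        # Unmask the k most confident masked positions in parallel.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = guesses[i][0]
    return seq


# Toy stand-in for a model: it "knows" the answer, with confidence
# decreasing from left to right.
TARGET = ["diffusion", "models", "decode", "tokens", "in", "parallel"]


def toy_predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    return [(TARGET[i], 1.0 - i / len(TARGET)) for i in range(len(seq))]


result = diffusion_decode(toy_predict, length=6, steps=3)
print(" ".join(tok for tok in result if tok is not None))
```

Here six tokens are produced in three model calls instead of six. Fewer sequential passes per token is the general source of the speedups the summaries describe, though real samplers tune the schedule and may remask low-confidence tokens.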