@HuggingPapers: NVIDIA just released an NVFP4-quantized DiffusionGemma on Hugging Face A 26B MoE multimodal model generating text via p…
Summary
NVIDIA released an NVFP4-quantized version of DiffusionGemma on Hugging Face. The 26B MoE multimodal model generates text via parallel diffusion, supports a 256K context, and exceeds 1,100 tokens per second on Hopper hardware.
Cached at: 06/10/26, 09:55 PM
NVIDIA just released an NVFP4-quantized DiffusionGemma on Hugging Face
A 26B MoE multimodal model generating text via parallel diffusion,
with 256K context and 1,100+ tokens/sec speed on Hopper. https://t.co/xxd93AKmga
Similar Articles
@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …
DiffusionGemma, a 26B MoE model based on Gemma 4, generates text by diffusion in 256-token blocks at over 1,000 tokens per second. It fits in 18GB of VRAM with quantization and is released under Apache 2.0.
DiffusionGemma: 4x Faster Text Generation
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
DiffusionGemma
Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.
google/diffusiongemma-26B-A4B-it
Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
@HuggingPapers: NVIDIA just released AnyFlow on Hugging Face The first any-step video diffusion model that generates high-quality text-…
NVIDIA released AnyFlow, the first any-step video diffusion model for text-to-video generation, allowing smooth quality scaling across inference budgets (4 to 50 steps).