DiffusionGemma under real workloads feels very different from benchmark demos
Summary
Internal testing of DiffusionGemma shows large performance differences between H100 and A100 GPUs under real-world workloads. H100s scale much better under concurrency, and efficiency varies widely with workload type, which raises questions about how reliable published benchmarks are.
Similar Articles
@mervenoyann: DiffusionGemma is out it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) also great on…
DiffusionGemma is out; it's compute-bound and 4x faster than other Gemma-4 models, reaching 1k tok/s on an H100, and it excels at coding tasks, including 3D generation and front-end work.
DiffusionGemma 26B A4B results on my 5090
This post presents benchmark results and tuning parameters for running DiffusionGemma 26B A4B GGUF models on an RTX 5090 GPU, showing up to a 44% speedup from tuned temperature settings and quantization choices.
DiffusionGemma 4 on 4x7900xtx
Reports running DiffusionGemma 26B on four AMD 7900 XTX GPUs with vLLM, reaching 100 tok/s during generation and 45-60 tok/s overall, and shares performance metrics and setup commands.
DiffusionGemma: 4x Faster Text Generation
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
Diffusion Gemma is 4x faster, but makes 6x more mistakes!
A benchmark shows Diffusion Gemma is 4x faster than Gemma 4 but makes 6x more factual mistakes, especially on obscure topics, trading factual accuracy for fluent text generation.