parallel-inference


@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …


DiffusionGemma is a 26B MoE model built on Gemma 4 that generates text by diffusion in 256-token blocks, reaching speeds of over 1000 tokens per second. With quantization it fits in 18GB of VRAM, and it is released under Apache 2.0.
