Tag
A user shares their experience running DiffusionGemma 26B on a 4090 GPU via vLLM. They reached up to 475 tokens/s but note several drawbacks: it serves only a single user, is less accurate, and has a short context. They conclude it is not worth using over the regular 26B model.