df-lash


Gemma 4 26B Hits 600 Tok/s on One RTX 5090

Reddit r/LocalLLaMA · 5d ago

A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.
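As a rough sanity check on those two figures, the sketch below derives the baseline throughput they imply. The baseline number is an inference from the quoted ~578 tok/s and 2.56x, not a value reported in the summary, and the variable names are illustrative only.

```python
# Figures quoted in the summary above.
boosted_tok_s = 578.0  # approximate throughput with DFlash speculative decoding
speedup = 2.56         # reported speedup over baseline

# Implied baseline throughput (derived here, not reported in the summary).
implied_baseline_tok_s = boosted_tok_s / speedup

# Average time per generated token, in milliseconds.
boosted_ms_per_tok = 1000.0 / boosted_tok_s
baseline_ms_per_tok = 1000.0 / implied_baseline_tok_s

print(f"implied baseline: {implied_baseline_tok_s:.1f} tok/s")
print(f"ms per token: {baseline_ms_per_tok:.2f} -> {boosted_ms_per_tok:.2f}")
```

Under these numbers the implied baseline is roughly 226 tok/s, so the headline's 600 tok/s is a rounding-up of the ~578 tok/s quoted in the summary.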
