@mweinbach: Who said TPUs can't be fast! This is roughly Groq speeds out of TPU 8i, but with Gemini Flash model so far more intelli…

X AI KOLs Timeline News

Summary

Google demonstrated a Gemini Flash model generating 600-1400 tokens per second on TPU 8i (peaking around 1480 tok/s and averaging around 800 tok/s), roughly matching Groq's inference speeds.

Who said TPUs can't be fast! This is roughly Groq speeds out of TPU 8i, but with Gemini Flash model so far more intelligent https://t.co/J39FF2yCm6

Cached at: 05/20/26, 04:35 PM


Max Weinbach (@mweinbach): Google just showed a demo, Gemini Flash model running between 600-1400 tokens per second on TPU 8i

It peaked out around 1480 tok/s, with average around 800 tok/s
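To put the quoted figures in perspective, the sketch below converts decode throughput into time per generated token and total decode time for a response. It is plain arithmetic under stated assumptions: the 1,000-token response length is hypothetical, and the calculation assumes a steady decode rate while ignoring prefill and time to first token.

```python
# Convert decode throughput (tokens/second) into per-token latency and
# wall-clock decode time. Rates are the figures quoted in the post;
# the 1,000-token response length is a hypothetical example.

RATES_TOK_PER_S = {
    "low end of range": 600,
    "average": 800,
    "upper end of range": 1400,
    "peak": 1480,
}


def ms_per_token(tok_per_s: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tok_per_s


def seconds_for(n_tokens: int, tok_per_s: float) -> float:
    """Decode time for n_tokens at a steady rate (ignores prefill / time to first token)."""
    return n_tokens / tok_per_s


if __name__ == "__main__":
    response_len = 1000  # hypothetical response length in tokens
    for label, rate in RATES_TOK_PER_S.items():
        print(f"{label:>18}: {rate:5d} tok/s -> "
              f"{ms_per_token(rate):.3f} ms/token, "
              f"{seconds_for(response_len, rate):.2f} s for {response_len} tokens")
```

At the quoted 800 tok/s average, for example, a 1,000-token answer needs about 1.25 s of decode time, or 1.25 ms per token.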

Similar Articles

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

Reddit r/LocalLLaMA

A benchmark shows that running Gemma 4 26B in vLLM with DFlash speculative decoding reaches ~578 tokens per second on a single RTX 5090, a 2.56x speedup over the baseline.
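The general idea behind speculative decoding can be sketched as follows: a cheap draft model proposes several tokens, the target model checks them, and the longest agreeing prefix is kept along with the target's own next token. This is a minimal greedy sketch with toy stand-in "models" (`toy_target` and `toy_draft` are hypothetical functions over integer tokens); it illustrates the generic technique and does not reproduce DFlash or vLLM's implementation, whose details the summary does not give. `implied_baseline` only backs out the un-accelerated rate from the two reported numbers.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps the token sequence so far to the
# next token chosen greedily. Real models return logits over a vocabulary.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding. Produces exactly the same tokens as
    greedy_decode(target, ...), but each round verifies up to k drafted tokens."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(seq + drafted))
        # 2) The target verifies the drafts in order. (A real system scores all
        #    k positions in one batched forward pass; this loop only shows the logic.)
        accepted: List[int] = []
        for tok in drafted:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:
                break  # first mismatch: discard the remaining drafts
        else:
            # Every draft matched, so the verification pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]


def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 17


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every fifth position (hypothetical).
    guess = toy_target(seq)
    return (guess + 1) % 17 if len(seq) % 5 == 0 else guess


def implied_baseline(accelerated_tok_s: float, speedup: float) -> float:
    """Back out the un-accelerated rate from a reported rate and speedup."""
    return accelerated_tok_s / speedup


if __name__ == "__main__":
    prompt = [1]
    assert speculative_decode(toy_target, toy_draft, prompt, 20) == greedy_decode(toy_target, prompt, 20)
    print(f"implied baseline: {implied_baseline(578, 2.56):.1f} tok/s")
```

Because every kept token is the target's own greedy choice, the output matches plain greedy decoding exactly; the real-world speedup comes from verifying several positions in one forward pass, which this loop does not model. The reported ~578 tok/s at 2.56x implies a baseline of roughly 226 tok/s.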

120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP

Reddit r/LocalLLaMA

Google's Gemma 4 12B QAT model reaches 120 tok/s on a 12GB GPU using Multi-Token Prediction (MTP) with llama.cpp. The post includes a step-by-step guide and a benchmark comparison against running without MTP, which shows a 2x speedup.
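Multi-token prediction is often used as a form of self-speculative decoding, where extra prediction heads draft upcoming tokens and the main model verifies them; the summary does not say exactly how the llama.cpp setup uses MTP, so treat that as an assumption. Under the common simplifying assumption that each of `k` drafted tokens is accepted independently with probability `alpha`, the expected tokens per verification pass is 1 + alpha + ... + alpha^k. The sketch below evaluates that closed form for a few illustrative (hypothetical) `alpha` and `k` values and backs out the baseline implied by the reported 2x speedup.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target verification pass when k tokens are
    drafted per step and each is accepted independently with probability alpha.
    Closed form of 1 + alpha + ... + alpha**k (a simplifying assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def implied_baseline(accelerated_tok_s: float, speedup: float) -> float:
    """Back out the un-accelerated rate from a reported rate and speedup."""
    return accelerated_tok_s / speedup


if __name__ == "__main__":
    # Illustrative values only; the post does not report acceptance rates.
    for k in (1, 2, 3):
        for alpha in (0.6, 0.8, 0.9):
            print(f"k={k} alpha={alpha}: {expected_tokens_per_step(alpha, k):.2f} tokens/step")
    print(f"implied baseline: {implied_baseline(120, 2.0):.0f} tok/s")
```

This figure is an upper bound on speedup, since it ignores the cost of producing the drafts. The reported 120 tok/s at 2x implies roughly 60 tok/s without MTP.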