@mweinbach: Who said TPUs can't be fast! This is roughly Groq speeds out of TPU 8i, but with Gemini Flash model so far more intelli…
Summary
Google demonstrated a Gemini Flash model running at 600-1400 tokens per second on TPU 8i, peaking near 1480 tok/s and averaging about 800 tok/s, roughly matching Groq's inference speeds.
Cached at: 05/20/26, 04:35 PM
Who said TPUs can’t be fast!
This is roughly Groq speeds out of TPU 8i, but with a Gemini Flash model, so far more intelligent https://t.co/J39FF2yCm6
Max Weinbach (@mweinbach): Google just showed a demo of a Gemini Flash model running at 600-1400 tokens per second on TPU 8i
It peaked out around 1480 tok/s, with average around 800 tok/s
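To put those rates in perspective, here is a minimal Python sketch that converts the throughput figures quoted in the post into wall-clock time for a single response. The 2000-token response length and the function name `seconds_to_generate` are illustrative assumptions, not details from the demo, and the calculation assumes a constant decode rate with no prefill or network latency.

```python
# Throughput figures quoted in the post, in tokens per second.
REPORTED_TOK_PER_S = {"low": 600, "average": 800, "high": 1400, "peak": 1480}


def seconds_to_generate(num_tokens: int, tok_per_s: float) -> float:
    """Return the time to emit num_tokens at a constant decode rate."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


if __name__ == "__main__":
    # Hypothetical response length, chosen only for illustration.
    response_tokens = 2000
    for label, rate in REPORTED_TOK_PER_S.items():
        elapsed = seconds_to_generate(response_tokens, rate)
        print(f"{label:>7}: {rate} tok/s -> {elapsed:.2f} s for {response_tokens} tokens")
```

At the reported average of about 800 tok/s, a response of that assumed length would take roughly 2.5 seconds to stream.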
Similar Articles
@analogalok: Gemma 4 12B QAT (dense) achieves 1000+ tokens/sec prefill on 8GB VRAM with 120k context Gemma 4 12B QAT (dense), TurboQ…
Gemma 4 12B QAT (dense) achieves over 1000 tokens per second prefill on an 8GB RTX 4060 with 120k context using TurboQuant, enabling full GPU layer offloading. This represents a 42% increase in prefill speed over previous methods.
Gemma 4 26B Hits 600 Tok/s on One RTX 5090
A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.
120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP
Google's Gemma 4 12B QAT model achieves 120 tok/s on a 12GB GPU using Multi-Token Prediction (MTP) with llama.cpp. A step-by-step guide and a benchmark comparison against running without MTP show a 2x speedup.
@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 1 DGX Spark with 128 GB unified memo…
@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.
Our eighth generation TPUs: two chips for the agentic era
Google unveils 8th-gen TPUs: TPU 8t for training and TPU 8i for inference, purpose-built for power-efficient, large-scale AI agent workloads and arriving later this year.
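Several of the related items above quote a speedup or a per-stream rate alongside a headline number. As a rough cross-check, the following Python sketch derives the implied figures from the quoted values only. The helper names are hypothetical, and the results are back-of-the-envelope estimates, not measurements reported by the original authors.

```python
def baseline_from_speedup(boosted_tok_per_s: float, speedup: float) -> float:
    """Infer the baseline rate implied by a boosted rate and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return boosted_tok_per_s / speedup


def aggregate_throughput(streams: int, per_stream_tok_per_s: float) -> float:
    """Total tokens per second across identical parallel streams."""
    return streams * per_stream_tok_per_s


if __name__ == "__main__":
    # Gemma 4 26B on an RTX 5090: ~578 tok/s at a 2.56x speedup.
    print(f"implied 26B baseline: {baseline_from_speedup(578, 2.56):.1f} tok/s")
    # Gemma 4 12B QAT with MTP: 120 tok/s at a 2x speedup.
    print(f"implied 12B baseline: {baseline_from_speedup(120, 2):.1f} tok/s")
    # DGX Spark: 16 parallel streams at 18 output tok/s each.
    print(f"DGX Spark aggregate: {aggregate_throughput(16, 18)} tok/s")
```

The DGX Spark product, 16 x 18 = 288 tok/s, is consistent with the rounded 300 tok/s aggregate quoted in that item.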