@JeffDean: My @Google colleagues @NormJouppi, Sridhar Lakshmanamurthy, Cliff Young, and David Patterson recently wrote a paper tha…
Summary
Google researchers published a paper summarizing the evolution of TPU supercomputers from TPU v2 to Ironwood, detailing architectural stability, scale, resilience, power efficiency, and a 3600x performance increase over eight years.
My @Google colleagues @NormJouppi, Sridhar Lakshmanamurthy, Cliff Young, and David Patterson recently wrote a paper that will appear in the July/August 2026 edition of @ieeemicro titled “Google’s Training Supercomputers from TPU v2 to Ironwood: Architectural Stability, Scale, Resilience, Power Efficiency, and Sustainability Across Five Generations”. It’s chock full of interesting data about the evolution of TPU chip generations, as well as how workloads at Google have transformed over time (hint: lots more transformer-based models!), and how the generations have gotten ~30X more energy efficient per flop.
Lots of changes over these generations:
- Air cooling in TPUv2 to water cooling in TPUv3 onwards
- 2D to 3D torus-based interconnects
- 30X improvement TFLOPS/Watt
- 256 chips (TPUv2) to 9216 chips (Ironwood) per pod
Read the full paper: https://arxiv.org/abs/2606.15870
Google’s Training Supercomputers from TPU v2 to Ironwood: Architectural Stability, Scale, Resilience, Power Efficiency, and Sustainability Across Five Generations
Source: https://arxiv.org/abs/2606.15870
Abstract: This paper (to appear in the July/August 2026 issue of IEEE Micro magazine) summarizes five generations of Google’s TPUs, from TPU v2 to Ironwood, highlighting their evolution as scalable, resilient, power-efficient, sustainable supercomputers for AI training. It details the TPU’s stable architecture, which has surprisingly easily accommodated the rapidly changing deep neural network workloads, such as the rise of Transformers. Key advancements over eight years include 10x increase in HBM capacity and bandwidth per node, a 100x increase in peak node performance, and a 3600x increase in supercomputer performance. The paper also discusses the role of optical circuit switches, built-in self test, and hardware replay in enhancing resilience and how TPU’s environmental impact is reduced with substantial improvements in performance per Watt and in carbon emissions per floating point operation. It concludes by identifying six features that may well characterize successful training accelerators of this decade.
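The headline figures quoted in the post and the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that a "node" corresponds to one chip and that supercomputer performance scales roughly as per-node gain times chips per pod. Neither assumption is stated in the source, and the function names are hypothetical.

```python
# Figures quoted in the post and abstract.
CHIPS_PER_POD_TPU_V2 = 256     # "256 chips (TPUv2)"
CHIPS_PER_POD_IRONWOOD = 9216  # "9216 chips (Ironwood) per pod"
PEAK_NODE_GAIN = 100           # "100x increase in peak node performance"
SUPERCOMPUTER_GAIN = 3600      # "3600x increase in supercomputer performance"
YEARS = 8                      # "over eight years"


def pod_scale_factor(old_chips: int, new_chips: int) -> float:
    """Ratio of chips per pod between two generations."""
    return new_chips / old_chips


def annualized_growth(total_gain: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_gain over `years`."""
    return total_gain ** (1 / years)


scale = pod_scale_factor(CHIPS_PER_POD_TPU_V2, CHIPS_PER_POD_IRONWOOD)

# Assumption (not from the source): system gain ~= node gain x chip-count gain.
implied_gain = PEAK_NODE_GAIN * scale

yearly = annualized_growth(SUPERCOMPUTER_GAIN, YEARS)

print(f"pod scale factor:      {scale:.0f}x")
print(f"implied system gain:   {implied_gain:.0f}x")
print(f"annualized multiplier: {yearly:.2f}x per year")
```

Under these assumptions the pod grows 36x, and 36 x 100 matches the quoted 3600x, which works out to a bit under 2.8x per year compounded over eight years. The paper itself may decompose the gain differently.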
Submission history
From: Cliff Young
[v1] Sun, 14 Jun 2026 15:44:31 UTC (2,943 KB)
Similar Articles
Here’s how our TPUs power increasingly demanding AI workloads.
Google explains how its custom Tensor Processing Units (TPUs) are designed to handle massive AI workloads, highlighting the latest generation's ability to deliver 121 exaflops of compute.
The eighth-generation TPU: An architecture deep dive
Google unveils eighth-generation TPU 8t and TPU 8i, purpose-built for massive pre-training and inference with SparseCore, native FP4, and 9,600-chip superpods to power world models and agentic AI.
Our eighth generation TPUs: two chips for the agentic era
Google unveils 8th-gen TPUs: TPU 8t for training and TPU 8i for inference, purpose-built for power-efficient, large-scale AI agent workloads and arriving later this year.
Google just unveiled its newest AI chips
Google unveiled eighth-gen TPUs (8t/8i) and a new Gemini Enterprise Agent Platform at Cloud Next, while revealing 75% of new Google code is now AI-generated.
We're launching two specialized TPUs for the agentic era.
Google announces the launch of two new specialized TPU chips, TPU 8i and TPU 8t, designed to optimize AI agent reasoning and large model training respectively.