Tag
A deep dive into Google's TPU architecture, explaining the design philosophy of systolic arrays, pipelining, and ahead-of-time compilation that enables high throughput and energy efficiency.
Google's TPU uses the systolic array architecture, first proposed in 1978, to accelerate matrix multiplication with less memory movement. The post links to the original paper and to the TPU design, and suggests building a small-scale version on an FPGA.
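The systolic-array idea in the entries above can be illustrated with a small simulation. The sketch below is a hypothetical, cycle-level Python model of an output-stationary array. The function name `systolic_matmul` is illustrative, and the model is not Google's design: the TPU's matrix unit may use a different dataflow, such as weight-stationary. Rows of A enter from the left and columns of B enter from the top. Each row and column is delayed by one cycle per position, so matching operands meet at the processing element (PE) that owns each output. The model also counts operand fetches at the input edges to show why the approach reduces memory traffic: each input element is read once and then reused as it moves through the grid.

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array computing A @ B.

    Simplified model for illustration only, not the TPU's actual dataflow.
    A is n x k and B is k x m (lists of lists). PE (i, j) accumulates C[i][j].
    Returns (C, fetches), where fetches counts operand reads at the array edges.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    assert len(B) == k, "inner dimensions must match"

    acc = [[0] * m for _ in range(n)]     # per-PE accumulators (the outputs)
    a_reg = [[0] * m for _ in range(n)]   # value each PE passes to its right
    b_reg = [[0] * m for _ in range(n)]   # value each PE passes downward
    fetches = 0

    # At cycle t, PE (i, j) sees A[i][s] and B[s][j] with s = t - i - j,
    # so the last pair reaches PE (n-1, m-1) at cycle k + n + m - 3.
    cycles = k + n + m - 2
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge reads row i of A, skewed by i cycles.
                if j == 0:
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                    fetches += 1 if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]  # operand from the left neighbour
                # Top edge reads column j of B, skewed by j cycles.
                if i == 0:
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                    fetches += 1 if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]  # operand from the neighbour above
                acc[i][j] += a_in * b_in    # multiply-accumulate in place
                new_a[i][j] = a_in          # forward operands on the next cycle
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc, fetches


C, fetches = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, fetches)
```

For an n×k by k×m product, the edge-fetch count is n·k + k·m. A naive loop that reloads both operands for every multiply performs 2·n·m·k reads instead. In this model, the saving comes entirely from passing operands between neighbouring PEs rather than going back to memory.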
Google is adopting Nvidia's playbook to build a competitive AI chip business. It is renting TPU computing power to Anthropic and boosting inference performance to challenge Nvidia's dominance.
A blog post from LMSYS Org details optimizing Ling-2.6-1T, a 1 trillion parameter hybrid MoE model, on TPU v7x using SGLang-JAX, achieving efficient inference by hiding MoE data movement behind computation with a single Pallas kernel.
Google is in talks with Samsung to manufacture a component of its next-generation AI chip, codenamed Icefish, using 2-nanometer technology, while the main part will be made by TSMC. The chip aims to offer an alternative to Nvidia's GPUs and is expected to enter mass production as soon as 2028.
A new CLI tool for Google Colab enables GPU/TPU provisioning, remote script execution, and interactive REPL access from the terminal, with built-in agent skills for automated tasks like fine-tuning models.
This article is the first part of the AI Engineering Panorama series. Taking a historical perspective, it traces the evolution of GPUs from gaming graphics cards to AI accelerators, NVIDIA's bold bet on CUDA, the independent path of Google's TPU, and why NVIDIA ultimately prevailed. It also analyzes in detail the underlying logic of AI infrastructure: chips, the supply chain, networking, and power.
Midjourney stated that using Google TPUs set their research back by a year, expressing regret for not sticking exclusively with Nvidia hardware.
Google demonstrated Gemini Flash model achieving 600-1400 tokens per second on TPU 8i, rivaling Groq's inference speeds.
Google released a roundup of major AI updates from April 2026, including the Gemma 4 model, Gemini Enterprise Agent Platform, and eighth-generation TPUs announced at Cloud Next '26.
Google explains how its custom Tensor Processing Units (TPUs) are designed to handle massive AI workloads, highlighting that the latest generation delivers 121 exaflops of compute.
Google unveiled eighth-gen TPUs (8t/8i) and a new Gemini Enterprise Agent Platform at Cloud Next, and revealed that 75% of new Google code is now AI-generated.
Google unveils eighth-generation TPU 8t and TPU 8i, purpose-built for massive pre-training and inference with SparseCore, native FP4, and 9,600-chip superpods to power world models and agentic AI.
Google unveils 8th-gen TPUs: TPU 8t for training and TPU 8i for inference, purpose-built for power-efficient, large-scale AI agent workloads and arriving later this year.
Google announces the launch of two new specialized TPU chips, TPU 8i and TPU 8t, designed to optimize AI agent reasoning and large model training respectively.
Sundar Pichai opened Google I/O 2026 with highlights including monthly AI token processing reaching 3.2 quintillion, the new TPU 8t/8i chips, the Gemini Omni world model, and multiple product updates, emphasizing full-stack AI innovation.
The Google I/O '26 keynote showcased AI acceleration across the board: 32 quadrillion tokens processed per month, more than 900 million monthly active Gemini users, and the unveiling of new-generation TPU chips and the Gemini Omni world model, along with conversational AI features such as Ask YouTube and Docs Live.
A creative, humorous short film about a TPU "Training Day" ahead of Google I/O. Through dialogue and motivational slogans, it shows the TPU team's spirit as it takes on trillions of tasks.