@chessMan786: Fundamentals of GPU Architecture
Summary
A tweet shares a link to an article about the fundamentals of GPU architecture.
Cached at: 06/27/26, 03:57 PM
Fundamentals of GPU Architecture https://t.co/MYtgFKmdLq
Similar Articles
@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…
A tweet shares a structured reference of 450 papers on GPU optimization spanning 14 years, noting that while some techniques evolve, the mental models remain useful. It also references a lecture on GPU architectures by Onur Mutlu.
@DanKornas: GPU engineering is too broad to learn from random tabs. Awesome GPU Engineering is a curated GitHub list of resources f…
A curated GitHub list of resources for learning GPU engineering, covering architecture, kernel programming, optimization, distributed systems, and AI acceleration with books, frameworks, profilers, and interview prep.
@snowboat84: https://x.com/snowboat84/status/2061962883651731602
This article is the first part of the AI Engineering Panorama series. From a historical perspective, it reviews the evolution of GPUs from gaming graphics cards to AI accelerators, the bold bet of CUDA, the independent path of Google's TPU, and why NVIDIA ultimately prevailed. It also provides a detailed analysis of the underlying logic of AI infrastructure such as chips, supply chain, networking, and power.
@goyal__pramod: Software is evolving, so should you! These are the best blogs I read to understand GPUs and CUDA!
Tweet recommending a collection of blogs to understand GPUs and CUDA, encouraging developers to improve their skills.
@ZhihuFrontier: GPU programming changed because Tensor Cores became too fast to feed. Zhihu contributor THU-PACMAN Lab shared a sharp bre…
A detailed analysis of how NVIDIA GPU programming evolved from Volta to Blackwell, highlighting the shift from synchronous thread models to asynchronous dataflow and the challenges of feeding Tensor Cores. The article discusses new hardware features like TMA, TMEM, and tcgen05 MMA, and shows how modern kernels like FlashAttention-3 and FlashMLA exploit these changes for higher utilization.