@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…
Summary
A tweet shares a structured reference of 450 papers on GPU optimization spanning 14 years of research, noting that although some techniques have since evolved, the mental models still hold up. It also references a lecture on GPU architectures by Onur Mutlu.
Cached at: 05/21/26, 10:22 AM
Best structured reference I’ve found for GPU optimization - 450 papers, 14 years of research. Some techniques will have evolved, but the mental models hold up.
https://t.co/2kFfsEq31h https://t.co/0jmUz9OUsX
Vivek Galatage (@vivekgalatage): Yesterday’s lecture on GPU Architectures by @onurmutlu
Similar Articles
@pauliusztin_: I just found one of the most useful resources for understanding GPUs. No more jumping between random docs, PDFs, and fo…
Modal Labs has released an open-source, interlinked GPU glossary that consolidates fragmented NVIDIA documentation, CUDA details, and compiler flags into a single navigable resource for engineers optimizing LLM training and inference.
@Suryanshti777: NVIDIA just revealed the hidden tricks they’re using to make LLM fine-tuning dramatically faster. Not new GPUs. Not big…
NVIDIA and Unsloth have published a technical guide detailing three low-level optimizations that can accelerate LLM fine-tuning by up to 25%, including packed-sequence caching, double-buffered checkpointing, and optimized MoE routing. The guide provides deep systems-level explanations and benchmarks aimed at ML engineers and developers.
@levidiamode: Day 138/365 of GPU Programming One of my favorite lectures I've watched this year is Stanford's CS336 lecture 7 on GPU …
A learner shares enthusiasm for Stanford CS336 lecture 7 on GPU parallelism, which covers fundamental operations and connects them to multi-GPU setups and parallelism techniques like tensor, data, and pipeline parallelism.
@_akhaliq: GPU Forecasters Language Models as Selective Surrogates for Kernel Runtime Optimization
This paper proposes using language models as selective surrogates to optimize GPU kernel runtime, demonstrating a novel approach to performance forecasting.
@rohanpaul_ai: Good GPU performance summaries - in 6 mints.
A link to concise GPU performance summaries, claimed to take 6 minutes to read.