@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…

X AI KOLs Timeline 05/20/26, 10:00 PM News

Summary

A tweet shares a structured reference of 450 papers on GPU optimization spanning 14 years, noting that while some techniques evolve, the mental models remain useful. It also references a lecture on GPU architectures by Onur Mutlu.

Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have evolved, but the mental models hold up. https://t.co/2kFfsEq31h https://t.co/0jmUz9OUsX

Original Article

Cached at: 05/21/26, 10:22 AM

Vivek Galatage (@vivekgalatage): Yesterday’s lecture on GPU Architectures by @onurmutlu

Similar Articles

@pauliusztin_: I just found one of the most useful resources for understanding GPUs. No more jumping between random docs, PDFs, and fo…

X AI KOLs Following

Modal Labs has released an open-source, interlinked GPU glossary that consolidates fragmented NVIDIA documentation, CUDA details, and compiler flags into a single navigable resource for engineers optimizing LLM training and inference.

@Suryanshti777: NVIDIA just revealed the hidden tricks they’re using to make LLM fine-tuning dramatically faster. Not new GPUs. Not big…

X AI KOLs Timeline

NVIDIA and Unsloth have published a technical guide detailing three low-level optimizations that can accelerate LLM fine-tuning by up to 25%, including packed-sequence caching, double-buffered checkpointing, and optimized MoE routing. The guide provides deep systems-level explanations and benchmarks aimed at ML engineers and developers.

@levidiamode: Day 138/365 of GPU Programming One of my favorite lectures I've watched this year is Stanford's CS336 lecture 7 on GPU …

X AI KOLs Timeline

A learner shares enthusiasm for Stanford CS336 lecture 7 on GPU parallelism, which covers fundamental operations and connects them to multi-GPU setups and parallelism techniques like tensor, data, and pipeline parallelism.

@_akhaliq: GPU Forecasters Language Models as Selective Surrogates for Kernel Runtime Optimization

X AI KOLs Following

This paper proposes using language models as selective surrogates to optimize GPU kernel runtime, demonstrating a novel approach to performance forecasting.

@rohanpaul_ai: Good GPU performance summaries - in 6 mints.

X AI KOLs Following

A link to concise GPU performance summaries, claimed to take 6 minutes to read.
