@levidiamode: 163/365 of GPU Programming Looking at a few different agentic GPU kernel optimization systems today. The two I'm most i…

X AI KOLs Timeline 06/15/26, 09:33 PM News

gpu-programming kernel-optimization agentic-systems machine-learning competition research

Summary

A tweet discussing two agentic GPU kernel optimization systems: Auto GPU Kernel by @dogacel0, which won the DeepSeek Sparse Attention FlashInfer challenge at MLSys, and Kernel Design Agents from @songhan_mit's lab, which took 1st place in the MoE track of the same competition. The tweet highlights how the two systems use subagents and Claude skills, such as the Kernel Wiki, in their agentic loops for GPU programming.

Cached at: 06/16/26, 01:13 AM

163/365 of GPU Programming

Looking at a few different agentic GPU kernel optimization systems today. The two I’m most interested in atm:

- @dogacel0’s Auto GPU Kernel which he used to win the DeepSeek Sparse Attention FlashInfer challenge at MLSys this year
- Kernel Design Agents out of @songhan_mit’s lab which won 1st place in the MoE track of the same competition

Really interesting to see the different uses of subagents and Claude skills like the Kernel Wiki to optimize these agentic loops for GPU programming. Some great inspiration in both for my own workflows

Links to repos:

https://github.com/Dogacel/auto-gpu-kernel…
https://github.com/mit-han-lab/kernel-design-agents…
