memory-bound

#memory-bound

@_avichawla: Prefill & decode in LLM inference. Have you ever noticed that the first token from an LLM always takes a moment to appe…

X AI KOLs Timeline ↗ · yesterday Cached

Explains the two phases of LLM inference, prefill and decode: the GPU bottleneck shifts from compute-bound during prefill to memory-bound during decode, which is why KV caching matters.
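The shift from compute-bound to memory-bound can be illustrated with a rough arithmetic-intensity estimate (FLOPs per byte moved) for a single weight matmul. This is a minimal sketch, not taken from the linked post: the function names, the model width `D_MODEL`, the prompt length, and the `RIDGE` threshold are all hypothetical values chosen for illustration, and the byte count ignores caches, attention, and the KV cache itself.

```python
def arithmetic_intensity(tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for one (tokens x d_model) @ (d_model x d_model) matmul.

    Assumes the weights, the input activations, and the output activations
    each cross the memory bus exactly once (a simplification).
    """
    flops = 2 * tokens * d_model * d_model            # one multiply + one add per term
    weight_elems = d_model * d_model                  # weight matrix read once
    activation_elems = 2 * tokens * d_model           # input read + output written
    bytes_moved = bytes_per_elem * (weight_elems + activation_elems)
    return flops / bytes_moved


def classify(tokens: int, d_model: int, ridge_flops_per_byte: float) -> str:
    """Label the matmul by comparing its intensity to the hardware ridge point."""
    ai = arithmetic_intensity(tokens, d_model)
    return "compute-bound" if ai >= ridge_flops_per_byte else "memory-bound"


# Hypothetical numbers, for illustration only.
RIDGE = 150.0      # assumed FLOPs-per-byte ridge point of the accelerator
D_MODEL = 4096     # assumed hidden size

prefill = classify(tokens=2048, d_model=D_MODEL, ridge_flops_per_byte=RIDGE)
decode = classify(tokens=1, d_model=D_MODEL, ridge_flops_per_byte=RIDGE)

print("prefill (2048 tokens at once):", prefill)
print("decode  (1 token per step):   ", decode)
```

Under these assumptions, prefill processes the whole prompt against each weight matrix in one pass, so the weights are reused across many tokens. Decode reads the same weights to produce a single token, so memory traffic dominates.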


#memory-bound

@HanGuo97: LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes…

X AI KOLs Following ↗ · 2026-05-21 Cached

CODA reparameterizes memory-bound operations in LLM training to fuse them into the matmul epilogue, achieving near state-of-the-art performance with LLM-generated kernels.


#memory-bound

@polydao: This Stanford lecture on AI inference will teach you more about how LLMs work in production than most ML courses > Clau…

X AI KOLs Timeline ↗ · 2026-05-13

A Stanford lecture on AI inference emphasizes practical bottlenecks like KV-cache and techniques like speculative decoding and continuous batching, offering more real-world insight than typical ML courses.

