flash-attention-4

#flash-attention-4

@charles_irl: Last fall, we shared our deep dive on FA4 internals. But we didn't stop at grokking the kernel. Since then, we've been …

X AI KOLs Following ↗ · 5d ago Cached

A blog post details contributions to FlashAttention-4 to improve its performance for large language model inference, especially for decode-heavy workloads, by adjusting parallelism strategies and supporting irregular memory accesses.

0 favorites 0 likes

#flash-attention-4

@levidiamode: 157/365 of GPU Programming Another FlashAttention4 resource that's been really helpful for me is the talk @charles_irl …

X AI KOLs Following ↗ · 2026-06-09 Cached

A daily GPU programming thread highlights a talk by Charles_irl that reverse-engineers FlashAttention4 code before the paper release, praising the Modal team's deep code dissection and inferences about the forward pass.

0 favorites 0 likes

flash-attention-4

@charles_irl: Last fall, we shared our deep dive on FA4 internals. But we didn't stop at grokking the kernel. Since then, we've been …

@levidiamode: 157/365 of GPU Programming Another FlashAttention4 resource that's been really helpful for me is the talk @charles_irl …

Submit Feedback