kernel-programming


@charles_irl: Rewriting parallelism is a big move and it'd be nice to make it even faster than we can do with CuTe DSL. FA4 is a very…

X AI KOLs Following · 2d ago

A discussion of rewriting the parallelism in the FA4 (FlashAttention 4) kernel to improve its performance, touching on CuTe DSL and tile programming models.
