@Kimi_Moonshot: We're open-sourcing FlashKDA — our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. Achie…

Summary

Moonshot AI releases FlashKDA, an open-source CUTLASS-based implementation of Kimi Delta Attention kernels that delivers a 1.72×–2.22× prefill speedup over the flash-linear-attention baseline on H20 GPUs.

We're open-sourcing FlashKDA, our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. It achieves a 1.72×–2.22× prefill speedup over the flash-linear-attention baseline on H20, and works as a drop-in backend for flash-linear-attention. Explore on GitHub: