flash-attention-2

Anyone using Flash Attention 2 (ai-bond) on their V100s? How is the performance?

Reddit r/LocalLLaMA ↗ · 2026-05-29

A user benchmarks a V100-compatible port of Flash Attention 2, reporting 3x-17x speedups and up to 94% memory reduction over default PyTorch attention.
