tensor-core

#tensor-core

Exploring FlashAttention-3/4 optimizations on RTX GPUs

Reddit r/LocalLLaMA · 2026-07-09

This article explores whether FlashAttention-3/4 optimizations benefit RTX GPUs, concluding that FA-2 is the ceiling due to hardware limitations on consumer cards.

#tensor-core

@elliotarledge: Claude Fable 5 [max] on KernelBench-Hard. The main kernel that impressed me was a B200 fp8 GEMM: it HAND WROTE raw SM10…

X AI KOLs Timeline · 2026-07-03

Claude Fable 5 achieves top results on KernelBench-Hard by hand-writing PTX code for a B200 FP8 GEMM, outperforming other models and reaching 44-59% of peak performance on compute-bound shapes.

#tensor-core

@charles_irl: Rewriting parallelism is a big move and it'd be nice to make it even faster than we can do with CuTe DSL. FA4 is a very…

X AI KOLs Following · 2026-06-11

Discussion about rewriting parallelism to improve kernel performance using CuTe DSL and tile programming models for the FA4 (FlashAttention 4) kernel.

#tensor-core

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

arXiv cs.AI · 2026-06-08

This paper argues that using FP8 tensor cores with Ozaki Scheme II can replace native FP64 hardware for high-performance scientific computing on AI-optimized GPUs like NVIDIA's B300, achieving full double-precision accuracy at much higher throughput. The authors present a Tensor-Memory Equilibrium model and show that emulated FP64 performance can exceed native FP64 by orders of magnitude across all workloads.
