tensor-core

#tensor-core

@charles_irl: Rewriting parallelism is a big move and it'd be nice to make it even faster than we can do with CuTe DSL. FA4 is a very…

X AI KOLs Following · 4d ago

Discussion of rewriting the parallelism in the FA4 (FlashAttention 4) kernel to improve performance, covering CuTe DSL and tile programming models.

#tensor-core

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

arXiv cs.AI · 2026-06-08

This paper argues that using FP8 tensor cores with Ozaki Scheme II can replace native FP64 hardware for high-performance scientific computing on AI-optimized GPUs like NVIDIA's B300, achieving full double-precision accuracy at much higher throughput. The authors present a Tensor-Memory Equilibrium model and show that emulated FP64 performance can exceed native FP64 by orders of magnitude across all workloads.
