gpu-architecture

#gpu-architecture

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

arXiv cs.AI · 2026-06-08

This paper argues that using FP8 tensor cores with Ozaki Scheme II can replace native FP64 hardware for high-performance scientific computing on AI-optimized GPUs like NVIDIA's B300, achieving full double-precision accuracy at much higher throughput. The authors present a Tensor-Memory Equilibrium model and show that emulated FP64 performance can exceed native FP64 by orders of magnitude across all workloads.

@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…

X AI KOLs Timeline · 2026-05-20

A tweet shares a structured reference of 450 papers on GPU optimization spanning 14 years, noting that while some techniques evolve, the mental models remain useful. It also references a lecture on GPU architectures by Onur Mutlu.

Anyone else following Q.ANT's photonic GPU advancements? Tech shifting point

Reddit r/LocalLLaMA · 2026-05-13

Q.ANT has opened a US facility in Austin and appointed Bruno Spruth as CTO, highlighting production of photonic GPUs at the Leibniz Supercomputing Centre that claim significant performance and energy efficiency gains over traditional transistor-based chips.

@pauliusztin_: I just found one of the most useful resources for understanding GPUs. No more jumping between random docs, PDFs, and fo…

X AI KOLs Following · 2026-05-08

Modal Labs has released an open-source, interlinked GPU glossary that consolidates fragmented NVIDIA documentation, CUDA details, and compiler flags into a single navigable resource for engineers optimizing LLM training and inference.

https://www.youtube.com/watch?v=aE0onltJlOo

YouTube AI Channels · 2026-05-21

This lecture presents GPU architecture as an evolution of the SIMD (vector/array) processor. It covers data parallelism, memory banking and bank conflicts, serial bottlenecks, and the history of SIMD instruction extensions such as MMX, with emphasis on how GPUs exploit data parallelism and cope with serial bottlenecks.
