Tag
Rigel is an empirical characterization of Apple's Metal 4.1 tensor compute path on the M4 Max GPU. It shows that fp8 matmul2d is emulated rather than accelerated and that the operation executes entirely on GPU shader cores without a dedicated matrix datapath, and it reconstructs the opaque cooperative tensor fragment layout.