microbenchmarking

Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU

arXiv cs.CL · 2026-06-12

Rigel is an empirical characterization of Apple's Metal 4.1 tensor compute path on the M4 Max GPU. It finds that fp8 matmul2d is emulated rather than hardware-accelerated, and that the operation executes entirely on the GPU shader cores, with no dedicated matrix datapath. It also reconstructs the opaque cooperative tensor fragment layout.
