skinny-matmuls

#skinny-matmuls

@shreyansh_26: https://x.com/shreyansh_26/status/2069125463860302212


This post explains the Decompose-K technique for accelerating skinny, large-K matrix multiplications (small M and N, very large K): split the K dimension into chunks, run the chunks as one batched matmul, and sum the partial products. It provides a PyTorch implementation and benchmarks that report significant speedups over standard torch.compile on these poorly shaped matmuls.
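The split, batch, and sum steps described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the post's own implementation: the function name `decompose_k_matmul` and the parameter `k_splits` are hypothetical, and the sketch assumes K divides evenly by the number of splits.

```python
import torch


def decompose_k_matmul(a: torch.Tensor, b: torch.Tensor, k_splits: int) -> torch.Tensor:
    """Compute a @ b by splitting the shared K dimension into k_splits chunks.

    Hypothetical helper for illustration; assumes K % k_splits == 0.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    if k % k_splits != 0:
        raise ValueError(f"K={k} is not divisible by k_splits={k_splits}")
    chunk = k // k_splits

    # (M, K) -> (M, S, chunk) -> (S, M, chunk): one slice of A per K-chunk.
    a_chunks = a.reshape(m, k_splits, chunk).permute(1, 0, 2)
    # (K, N) -> (S, chunk, N): the matching slice of B per K-chunk.
    b_chunks = b.reshape(k_splits, chunk, n)

    # Batched matmul yields S partial products, each of shape (M, N).
    partials = torch.bmm(a_chunks, b_chunks)
    # Summing over the split dimension recovers the full product.
    return partials.sum(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    a = torch.randn(8, 4096, dtype=torch.float64)  # skinny: small M, large K
    b = torch.randn(4096, 16, dtype=torch.float64)
    out = decompose_k_matmul(a, b, k_splits=32)
    print(torch.allclose(out, a @ b))
```

The result matches `a @ b` up to floating-point rounding, since summing the per-chunk products only reorders the accumulation over K. Whether this is actually faster depends on the shapes, the hardware, and how the batched matmul and reduction are compiled, so the sketch makes no performance claim of its own.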


