small-batch-decode


@shreyansh_26: How do you make a matmul fast when M and N are tiny but K is enormous? (MoE routers, small-batch decode.) Decompose-K: …

X AI KOLs Timeline · 3d ago

A technique for accelerating matrix multiplication when M and N are small but K is large, as in MoE routers and small-batch decoding: split the K dimension into chunks, run the partial GEMMs in parallel, and sum the partial products. Implemented as a custom Triton kernel, the approach beats PyTorch Inductor on most shapes.
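The idea can be illustrated with a minimal NumPy sketch. This is not the Triton kernel from the post; it only shows the arithmetic, assuming K divides evenly by the number of splits. The function name `decompose_k_matmul` and the parameter `k_splits` are hypothetical names chosen for illustration.

```python
import numpy as np


def decompose_k_matmul(a, b, k_splits):
    """Compute a @ b by splitting K into k_splits chunks (illustrative sketch).

    a: (M, K), b: (K, N). Assumes K is divisible by k_splits.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    if k % k_splits != 0:
        raise ValueError("K must be divisible by k_splits in this sketch")
    chunk = k // k_splits

    # (M, K) -> (k_splits, M, chunk): one slab of columns of `a` per split.
    a_parts = a.reshape(m, k_splits, chunk).transpose(1, 0, 2)
    # (K, N) -> (k_splits, chunk, N): the matching slab of rows of `b`.
    b_parts = b.reshape(k_splits, chunk, n)

    # Batched matmul: k_splits independent small GEMMs, each (M, chunk) @ (chunk, N).
    partials = np.matmul(a_parts, b_parts)  # shape (k_splits, M, N)

    # Reduce the partial products over the split axis to recover a @ b.
    return partials.sum(axis=0)


# Example with a tiny M and N and a large K.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4096))
b = rng.standard_normal((4096, 8))
out = decompose_k_matmul(a, b, k_splits=32)
```

With tiny M and N, a single GEMM produces too few output tiles to occupy a GPU; the batched form exposes `k_splits` independent units of work, at the cost of an extra reduction over the partial results. The best split count depends on the shape and hardware, so it is something to tune rather than fix.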
