gemm

#gemm

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Hugging Face Daily Papers · 2026-05-26

RT-Lynx proposes exploiting activation sparsity rather than weight sparsity to accelerate diffusion models, achieving up to a 1.55× linear-layer speedup while maintaining generation quality. The paper has been accepted at ICML 2026.
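To make the activation-sparsity idea concrete, here is a minimal sketch of the general technique, not RT-Lynx's method (the summary does not describe its kernels). It assumes a plain linear layer y = x·W. When an activation x[i][k] is zero, the whole row W[k] can be skipped for that output row. The function names are hypothetical, and the scalar loops only mirror the arithmetic. On a GPU, the gains come from skipping whole tiles of work, which this sketch does not model.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_matmul(x: Matrix, w: Matrix) -> Matrix:
    """Reference dense GEMM: y[i][j] = sum over k of x[i][k] * w[k][j]."""
    n_out = len(w[0])
    y: Matrix = []
    for row in x:
        acc = [0.0] * n_out
        for k, a in enumerate(row):
            for j in range(n_out):
                acc[j] += a * w[k][j]
        y.append(acc)
    return y


def activation_sparse_matmul(x: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Hypothetical activation-sparse GEMM: skip W rows selected by zero activations.

    Returns the output and the number of W-row reads that were skipped.
    """
    n_out = len(w[0])
    y: Matrix = []
    skipped = 0
    for row in x:
        acc = [0.0] * n_out
        for k, a in enumerate(row):
            if a == 0.0:
                skipped += 1  # a zero activation contributes nothing to any output column
                continue
            w_row = w[k]
            for j in range(n_out):
                acc[j] += a * w_row[j]
        y.append(acc)
    return y, skipped


if __name__ == "__main__":
    x = [[0.0, 2.0, 0.0], [1.0, 0.0, 3.0]]  # 3 of 6 activations are zero
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    y, skipped = activation_sparse_matmul(x, w)
    print(y, skipped)  # [[6.0, 8.0], [16.0, 20.0]] 3
```

Weight sparsity would instead prune entries of W once, offline. The sparsity pattern here depends on the input, so it has to be discovered at run time, row by row.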


#gemm

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Hacker News Top · 2026-05-22

Introduces CODA, a GPU kernel abstraction that expresses Transformer operations as GEMM-plus-epilogue programs to reduce data movement. The abstraction covers nearly all non-attention computation in a Transformer block.
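To illustrate what "GEMM-plus-epilogue" means, here is a minimal sketch of the general fusion pattern. It does not reproduce CODA's actual abstraction or API, which the summary does not show. The names `gemm_with_epilogue`, `bias_relu`, and the `(value, i, j)` epilogue signature are hypothetical, and bias-add plus ReLU is only an illustrative epilogue. Both versions perform the same arithmetic. The difference that matters on a GPU is that the unfused version writes and re-reads a full intermediate matrix for each step, while the fused version applies the epilogue to each accumulator before the single output write.

```python
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical epilogue signature: (accumulator value, row i, column j) -> output value.
Epilogue = Callable[[float, int, int], float]


def _dot(row: List[float], w: Matrix, j: int) -> float:
    """Inner product of one activation row with column j of w (the GEMM mainloop)."""
    acc = 0.0
    for k, a in enumerate(row):
        acc += a * w[k][j]
    return acc


def unfused_linear_bias_relu(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """Three separate passes, each materializing a full intermediate matrix."""
    n_out = len(w[0])
    y = [[_dot(row, w, j) for j in range(n_out)] for row in x]  # pass 1: GEMM
    y = [[v + bias[j] for j, v in enumerate(r)] for r in y]      # pass 2: bias add
    return [[max(v, 0.0) for v in r] for r in y]                 # pass 3: ReLU


def gemm_with_epilogue(x: Matrix, w: Matrix, epilogue: Epilogue) -> Matrix:
    """Single pass: the epilogue runs on each accumulator before the output is written."""
    n_out = len(w[0])
    return [
        [epilogue(_dot(row, w, j), i, j) for j in range(n_out)]
        for i, row in enumerate(x)
    ]


def bias_relu(bias: List[float]) -> Epilogue:
    """Hypothetical epilogue factory: add a per-column bias, then apply ReLU."""
    return lambda v, i, j: max(v + bias[j], 0.0)


if __name__ == "__main__":
    x = [[1.0, -2.0], [0.5, 1.0]]
    w = [[1.0, 2.0], [3.0, 4.0]]
    bias = [0.5, -10.0]
    print(gemm_with_epilogue(x, w, bias_relu(bias)))  # [[0.0, 0.0], [4.0, 0.0]]
```

Passing the epilogue as a callable keeps the GEMM mainloop generic. Any elementwise post-processing that depends only on the accumulator and its output coordinates can be attached without a separate memory pass.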


#gemm

@HanGuo97: Finally, huge thanks to the incredible team: @jcz42, Arjun, Driss, @tensorcore, @yoonrkim, and @tri_dao! PDF: https://a…

X AI KOLs Following · 2026-05-21

CODA introduces a GPU kernel abstraction that rewrites Transformer computations as GEMM-plus-epilogue programs, reducing memory-bound operations and improving training efficiency.

