Tag
RT-Lynx proposes using activation sparsity instead of weight sparsity to accelerate diffusion models, achieving up to 1.55× linear-layer speedup while maintaining generation quality, and is accepted at ICML 2026.
Introduces CODA, a GPU kernel abstraction that expresses Transformer operations as GEMM-plus-epilogue programs to reduce data movement, covering nearly all non-attention computation in a Transformer block.
CODA introduces a GPU kernel abstraction that rewrites transformer computations as GEMM-plus-epilogue programs, reducing memory-bound operations and improving efficiency in training.