Tag
CODA introduces a GPU kernel abstraction that rewrites transformer computations as GEMM-plus-epilogue programs, reducing memory-bound operations and improving efficiency in training.