Tag
The author developed a tensor-program superoptimizer that translates programs from Lean 4 to TileLang. It automatically generates optimized accelerator kernels, derives hyperparameter scaling laws, and achieves a 1.8× speedup on NVIDIA A100 GPUs.