kernel-generation

#kernel-generation

Hawk: Harnessing Hardware-Aware Knowledge for High-Performance NPU Kernel Generation

arXiv cs.AI · 2026-07-03

Hawk is a training-free framework that applies hardware-aware knowledge to LLM-based NPU kernel generation, raising generation accuracy from 49.4% to 80.0% and achieving up to 2.2× execution speedup over state-of-the-art baselines.

Toward Better HIP Kernel Generation for AMD GPUs: Synthetic Data, Multi-Agent Search, and Reinforcement Learning

Reddit r/LocalLLaMA · 2026-07-02

Explores synthetic data generation, multi-agent optimization, and reinforcement learning to improve language models' ability to generate high-performance HIP kernels for AMD GPUs, demonstrating improvements in compilation and correctness rates on MI350X.

KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators

arXiv cs.LG · 2026-06-03

KForge is a cross-platform framework that uses two collaborating LLM-based agents to automatically generate and optimize high-performance compute kernels for diverse AI accelerators, achieving significant speedups on NVIDIA B200 and Intel Arc B580 hardware.

@leloykun: I lost track of time again >.< I'm really sorry if you DMed me lately. I promise to go over my DMs! --- This sprint, I …

X AI KOLs Following · 2026-05-12

The author developed a Lean4-to-TileLang tensor program superoptimizer that automatically generates optimized accelerator kernels and derives hyperparameter scaling laws, achieving a 1.8× speedup on A100 GPUs.

AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation

arXiv cs.CL · 2026-04-21

Researchers from Carnegie Mellon, University of Washington, and Arm propose AdaExplore, an LLM agent framework for GPU kernel code generation that achieves 3.12× and 1.72× speedups on KernelBench Level-2 and Level-3 benchmarks through failure-driven adaptation and diversity-preserving search, without additional fine-tuning.
