llm-code-generation

Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

Hugging Face Daily Papers · 2026-05-10

Metal-Sci introduces a 10-task benchmark for optimizing scientific computing kernels on Apple Silicon, paired with an evolutionary search framework driven by large language models. The study evaluates models including Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.5, reports significant speedups, and uses out-of-distribution testing to catch silent performance regressions.

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

Hugging Face Daily Papers · 2026-05-06

KernelBench-X is a new benchmark for evaluating LLM-generated GPU kernels, revealing that task structure impacts correctness more than method design and that correctness does not guarantee hardware efficiency.
