@_akhaliq: GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

X AI KOLs Following Papers

Summary

This paper proposes using language models as selective surrogates that forecast GPU kernel runtime during kernel optimization.

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization https://t.co/s2r0lFWz9r

Cached at: 06/02/26, 07:38 PM

Similar Articles

A hackable compiler to generate efficient fused GPU kernels for AI models [P]

Reddit r/MachineLearning

The author presents a custom, hackable ML compiler written in Python that lowers LLMs to optimized CUDA kernels through a multi-stage IR pipeline, achieving performance competitive with or superior to PyTorch on specific operations. The article details the compiler's optimization passes, lowering rules, and CLI usage for generating efficient fused GPU kernels.

Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

arXiv cs.AI

This paper presents an empirical study on scheduling multiple LLMs on shared heterogeneous hardware, focusing on performance implications of CPU-GPU offloading and preemption. It finds that offloading causes non-linear decode degradation, especially for smaller models, and preemption overhead is dominated by model state reload, providing design guidance for future multi-model schedulers.

Extensions and limitations of the neural GPU

OpenAI Blog

This paper explores extensions and limitations of the Neural GPU model, demonstrating improvements through curriculum design and scaling, enabling it to learn arithmetic operations on decimal numbers and long expressions while identifying failure modes on symmetric inputs analogous to adversarial examples.