@PyTorch: More details about the tutorial https://pldi26.sigplan.org/details/pldi-2026-tutorials/1/Writing-Performance-Portable-K…


Summary

Helion is a Python DSL that compiles to optimized Triton code for performance-portable GPU kernels. This tutorial at PLDI 2026 covers Helion's architecture, autotuning, and CuteDSL backend.

More details about the tutorial https://pldi26.sigplan.org/details/pldi-2026-tutorials/1/Writing-Performance-Portable-Kernels-Simplified-with-Helion…

Cached at: 06/05/26, 09:20 PM


Writing Performance-Portable Kernels Simplified with Helion (PLDI 2026 - PLDI Tutorials) - PLDI 2026

Source: https://pldi26.sigplan.org/details/pldi-2026-tutorials/1/Writing-Performance-Portable-Kernels-Simplified-with-Helion

This program is tentative and subject to change.

Abstract

Modern machine learning relies heavily on custom kernels for performance. These kernels are often written in hardware-specific languages and create technical debt. Helion addresses this by compiling a high-level Python Domain Specific Language (DSL) into optimized Triton code, automating low-level details and hardware-specific tuning. With its PyTorch-like syntax and autotuning engine, Helion delivers fast, portable performance while significantly reducing development effort. Helion is open-source at https://github.com/pytorch/helion.

This 3-hour tutorial will describe Helion through a series of talks, demonstrations, and hands-on experiments:

  1. Introduction to Helion (35 mins): We will provide an overview of Helion, including its underlying motivation, programming model, overall design architecture, and various use cases.
  2. Compiler Architecture and Integration with TorchInductor (35 mins): The Helion compiler architecture progressively lowers Python functions into highly optimized Triton code, utilizing TorchInductor as its backend. The key stages of this compilation pipeline are Python AST parsing, Type Propagation, Device IR lowering, a series of compiler passes, and finally, code generation. We will detail the integration between Helion and TorchInductor, explaining how this interface enables Helion to target both GPU and non-GPU hardware and how users can incorporate their own custom backends.
  3. Break (30 mins): Time to set up compute for the hands-on experiments in the following session.
  4. Autotuning in Helion (50 mins): A key feature of Helion is its scalable autotuning framework that explores a vast configuration space, where one Helion kernel can map to thousands of Triton kernels. In this session, we detail the configuration space that Helion explores, illustrate how different configs map to Triton code, and examine the various search strategies that Helion utilizes, such as Likelihood-Free Bayesian Optimization and LLM-guided autotuning. Attendees will also have the opportunity to gain hands-on experience with autotuning Helion kernels.
  5. CuteDSL backend for SOTA NVIDIA performance (30 mins): In this session, we will present the cutting-edge performance we are achieving on NVIDIA GPUs, driven by our ongoing efforts to build the CuteDSL backend in Helion. We will also showcase the agentic development workflow that facilitates these advancements.
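
As a rough illustration of the first pipeline stage named in session 2 (Python AST parsing), the sketch below uses only Python's standard `ast` module to inspect a small loop-based function. It is not Helion's compiler code: the `SOURCE` function and the `summarize` helper are hypothetical, and the later stages (type propagation, Device IR lowering, compiler passes, code generation) are not modeled.

```python
import ast

# Hypothetical kernel-like function, kept as source text so it can be parsed.
SOURCE = '''
def add(x, y, out):
    for i in range(len(out)):
        out[i] = x[i] + y[i]
    return out
'''

def summarize(source: str) -> dict:
    """Parse Python source and report what a later lowering stage would need to see."""
    tree = ast.parse(source)
    # Take the first function definition found in the module.
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    loops = [n for n in ast.walk(func) if isinstance(n, ast.For)]
    binops = sorted(
        {type(n.op).__name__ for n in ast.walk(func) if isinstance(n, ast.BinOp)}
    )
    return {
        "name": func.name,
        "args": [a.arg for a in func.args.args],
        "loops": len(loops),
        "binops": binops,
    }

summary = summarize(SOURCE)
print(summary)
```

A real compiler would attach types and device information to these nodes before lowering them; the summary here only shows that loops and operators are recoverable from the parsed tree.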
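
The idea from session 4, that one kernel expands into many candidate configurations that a search strategy must rank, can be sketched in plain Python. Everything here is an assumption made for illustration: the knob names in `SPACE`, the `synthetic_cost` function (a stand-in for a measured runtime), and the use of simple random sampling in place of the Likelihood-Free Bayesian Optimization and LLM-guided strategies the tutorial mentions.

```python
import itertools
import random

# Hypothetical tuning knobs; names and values are illustrative only.
SPACE = {
    "block_m": [16, 32, 64, 128],
    "block_n": [16, 32, 64, 128],
    "num_warps": [1, 2, 4, 8],
}

def synthetic_cost(cfg: dict) -> float:
    """Stand-in for a benchmarked kernel runtime (lower is better)."""
    tile = cfg["block_m"] * cfg["block_n"]
    # Penalize tiles far from 4096 elements and warp counts far from 4.
    return abs(tile - 4096) / 4096 + abs(cfg["num_warps"] - 4) / 4

def all_configs(space: dict):
    """Yield every combination of knob values as a config dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_search(space: dict, cost, trials: int, seed: int = 0) -> dict:
    """Evaluate a random subset of configs and return the cheapest one seen."""
    rng = random.Random(seed)
    candidates = list(all_configs(space))
    sampled = rng.sample(candidates, min(trials, len(candidates)))
    return min(sampled, key=cost)

total = sum(1 for _ in all_configs(SPACE))
best_exhaustive = min(all_configs(SPACE), key=synthetic_cost)
best_random = random_search(SPACE, synthetic_cost, trials=16)
print(total, best_exhaustive, best_random)
```

Even three knobs with four values each produce 64 candidates, so exhaustive benchmarking stops scaling quickly as knobs are added. That is the motivation for guided search: a sampled strategy evaluates fewer configurations at the risk of missing the best one.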

