ExecuTorch now has an MLX delegate that enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs, supporting LLMs, speech-to-text, and MoE models with quantization via TorchAO.
ExecuTorch, PyTorch's on-device AI deployment framework, won the Best Industry Paper Award at MLSys 2026. The paper introduces a unified solution for running models on diverse hardware, from microcontrollers to SoCs.
This article introduces ExecuTorch, a unified PyTorch-native deployment framework designed to run AI models on diverse edge devices without requiring model conversion or reimplementation.