MAX models can now run on Apple silicon GPUs
Summary
MAX models have been updated to run on Apple silicon GPUs, enabling faster inference on Macs.
Similar Articles
Apple announced a new on-device inference engine for Apple Silicon
At WWDC, Apple announced Core AI, a new on-device inference engine for Apple Silicon that replaces Core ML and supports models of up to 20B parameters through optimized inference, with a focus on phones and tablets.
@akshay_pachaar: Apple finally did it. Its new framework, Core AI, runs models entirely on Apple silicon, so inference happens on the us…
Apple released Core AI, a new framework that runs AI models entirely on Apple silicon devices (iPhone, iPad, Mac, Vision Pro) with zero server calls. It includes a memory-safe Swift API, model export recipes for PyTorch, an optimizer, and debugging tools, supporting models like Qwen, Mistral, and SAM3.
@neural_avb: I am working on porting SAM models and harness into Apple silicon. Already seeing 1.25x inference speed increase on mlx…
The author is porting SAM 2.1 models to Apple silicon with MLX and reports a 1.25x inference speedup on the small model so far, with quantized versions planned.
@PyTorch: ExecuTorch now has an MLX delegate that runs PyTorch models on Apple Silicon GPUs. It supports LLMs, speech-to-text, an…
ExecuTorch now has an MLX delegate that enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs, supporting LLMs, speech-to-text, and MoE models with quantization via TorchAO.
@HuggingModels: Gemma 4 is here, and it's optimized for Apple Silicon. This 4-bit quantized model runs fast on your Mac, not just in th…
Gemma 4 is a 4-bit quantized model optimized for Apple Silicon that enables fast local inference on Macs and reduces reliance on cloud computing.