Apple MLX 团队推出 MLX LM Server,一个在 Mac 上完全本地运行 AI 智能体工作流的工具,支持连续批处理、分布式推理和 M5 神经加速,无需云端或 API 密钥。
**Key Technical Advantages:** * **Performance:** The *M5* chip's neural accelerators significantly boost prompt processing * **Concurrency:** *MLX LM Server* utilizes **continuous batching** to handle multiple sub-agent requests simultaneously without stalling * **Scaling:** For massive models that exceed local memory, *MLX* supports **distributed inference** across multiple Macs using *Thunderbolt RDMA* To get started, developers can install *MLX LM* via pip and point their preferred agent tool to the local server address Pretty cool over all!
Three MLX videos from WWDC demonstrate running AI agents entirely locally on Apple Silicon using the MLX stack, including local inference, tool calling, and distributed inference across Macs, enabling no-cloud, offline AI workflows.
SwiftLM is a Swift-native LLM inference server for Apple Silicon that runs large models without Python, using SSD streaming to load MoE weights and enabling 122B models on 64 GB Macs.