@AlexJonesax: Two open-source MLX inference servers worth knowing about if you run LLMs on Mac: MTPLX (@youssofal) Uses a model's own…
Summary
This post highlights two open-source MLX inference servers for Mac: MTPLX, which raises token generation speed using speculative decoding without a separate draft model, and oMLX, which speeds up coding-agent workflows with persistent KV caches.
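As a rough illustration of why speculative decoding can stay exact, here is a minimal sketch of the greedy verification step. It is not MTPLX's code: `verify_draft`, `toy_target`, and `greedy_decode` are hypothetical names, the "model" is a toy function, and a real engine would check all drafted tokens in one batched forward pass rather than one call per position.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(target_next: NextToken, context: List[int], draft: Sequence[int]) -> List[int]:
    """Return the tokens to append: the accepted draft prefix plus one target token."""
    accepted: List[int] = []
    for proposed in draft:
        expected = target_next(context + accepted)
        if proposed != expected:
            # First mismatch: discard the rest of the draft, keep the target's token.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(tokens: Sequence[int]) -> int:
    """Toy deterministic 'model': the next token is the last token plus one, mod 10."""
    return (tokens[-1] + 1) % 10


def greedy_decode(target_next: NextToken, context: List[int], n: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference output."""
    out = list(context)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(context):]


if __name__ == "__main__":
    print(verify_draft(toy_target, [3], [4, 5, 9]))  # draft is wrong at the third token
    print(verify_draft(toy_target, [3], [4, 5, 6]))  # draft is fully correct
    print(greedy_decode(toy_target, [3], 4))
```

Because a drafted token is kept only when it equals the token the target would have chosen anyway, the appended tokens always match plain greedy decoding; the draft changes how many tokens are produced per step, not which tokens.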
Similar Articles
jundot/omlx
oMLX is a new open-source tool for optimized LLM inference on Apple Silicon Macs, featuring continuous batching and tiered KV caching managed via a menu bar app.
MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B)
MTPLX V1 is a native Mac app that bundles the MTP speculative decoding engine for MLX models, with model conversion via Forge, built-in chat, benchmarking, and support for smaller models. It reports a speedup of over 2x while keeping output mathematically exact.
MLX engine comparison… and oMLX is the top choice.
A blog post comparing MLX inference engines that concludes oMLX is the top choice, based on benchmarks run on an M5 Max (64GB) with Qwen3.6-35B-A3B-4bit.
New MLX LM Server From Apple
Apple's MLX team introduces MLX LM Server, a tool for running AI agent workflows fully locally on Mac, supporting continuous batching, distributed inference, and M5 neural acceleration, with no need for cloud or API keys.
@jundotkim: oMLX 0.3.9rc1 released. Highlights: - Low-memory Macs stay stable instead of getting killed by the OS - DFlash bumped t…
oMLX 0.3.9rc1, an LLM inference server optimized for Apple Silicon Macs, adds low-memory stability, chunked prefill, multi-tasking admin chat, and more.