mlx

#mlx

Build an LLM from Scratch using MLX

Reddit r/LocalLLaMA ↗ · 15h ago

A guide on building a large language model from scratch using Apple's MLX framework.

#mlx

650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Reddit r/LocalLLaMA ↗ · yesterday

A collection of 650+ Apache-2.0 licensed biomedical NER and de-identification models that run on-device via MLX, achieving 30-40x faster inference than PyTorch-CPU on an M3 Max with identical outputs.

#mlx

@cevenif: For those running local LLMs on Macs, here's a tool worth watching — Rapid-MLX. It delivers 2-4x faster inference on M-series chips than Ollama, thanks to being built directly on Apple's MLX framework for more thorough utilization of the chip architecture. Key highlights: KV cache pruning plus…

X AI KOLs Timeline ↗ · 2026-06-18 Cached

Rapid-MLX is a local LLM inference tool optimized for Apple M-series chips. Built on the MLX framework, it achieves 2 to 4 times faster inference than Ollama, supports multiple models, tool calling, and an OpenAI API-compatible interface.

#mlx

@pcuenq: GLM 5.2 has just been released Here it's already running with MLX on two Mac Studios (M3 Ultra). This is comparable to …

X AI KOLs Timeline ↗ · 2026-06-16 Cached

GLM 5.2, an open-weight AI model comparable to top closed models, has been released and is now running on MLX on two Mac Studios (M3 Ultra).

#mlx

@no_stp_on_snek: Config-I quant of MiniMax-M3 is up on MLX. 2-bit experts, 4-bit attention, 8-bit boundaries + embeddings, f16 router. ~…

X AI KOLs Following ↗ · 2026-06-16 Cached

Announces the release of a Config-I quantization of MiniMax-M3 on MLX, using 2-bit experts and 4-bit attention to reduce the 427B MoE model from 869GB to ~167GB, though the quant is untested and requires a patch for mlx_lm.

#mlx

React Native ExecuTorch now runs Gemma 4 (Vulkan and MLX accelerated)

Reddit r/LocalLLaMA ↗ · 2026-06-15

The react-native-executorch library now integrates Google's Gemma 4 model, enabling fully offline, GPU-accelerated inference in React Native apps using Vulkan on Android and MLX on Apple Silicon.

#mlx

@ActuallyIsaak: Here is a real-life run, end-to-end from training to using the trained LLM in LM Studio by @lmstudio MLX-LoRA-Studio gi…

X AI KOLs Following ↗ · 2026-06-14 Cached

MLX-LoRA-Studio is a native macOS app for fine-tuning LLMs on Apple Silicon, offering a user-friendly interface and support for various training algorithms including SFT, DPO, and QAT. It is fully open-source and allows local, private fine-tuning without cloud dependency.

#mlx

@julien_c: This is awesome news: oMLX, by @jundotkim, now supports the standard HF cache model directory Great MLX server for Loca…

X AI KOLs Following ↗ · 2026-06-12 Cached

oMLX, an MLX server for local AI, now supports the standard Hugging Face cache model directory, simplifying model loading.

#mlx

@awnihannun: The video from @angeloskath on local agentic AI with MLX is excellent. I also hear it's one of the most viewed videos i…

X AI KOLs Following ↗ · 2026-06-12 Cached

A tweet highlights an excellent WWDC video by Angelos Katharopoulos on building local agentic AI with MLX, noting rapid progress in open-weight models and hardware capabilities.

#mlx

MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B)

Reddit r/LocalLLaMA ↗ · 2026-06-12

MTPLX V1 is a native Mac app that bundles the MTP speculative decoding engine for MLX models, offering features like model conversion via Forge, built-in chat, benchmarking, and support for smaller models. It achieves over 2x speedup with mathematical exactness.

#mlx

@yagilb: I had the huge privilege of presenting at WWDC this year, demoing @lmstudio's upcoming clustering feature live on stage…

X AI KOLs Following ↗ · 2026-06-10 Cached

Yagil Burowski presented at WWDC, demoing LM Studio's upcoming clustering feature live on stage and crediting the MLX team for their work.

#mlx

Releasing Cohere North Mini Code

Reddit r/LocalLLaMA ↗ · 2026-06-09

Cohere officially launches North Mini Code, a coding model, with weights available on Hugging Face and deployment support for vLLM and MLX.

#mlx

@awnihannun: Three MLX videos dropped at WWDC: Running agents locally by @angeloskath https://youtube.com/watch?v=wykPErJ8M-8… Distr…

X AI KOLs Following ↗ · 2026-06-09 Cached

Three MLX videos from WWDC demonstrate running AI agents entirely locally on Apple Silicon using the MLX stack, including local inference, tool calling, and distributed inference across Macs, enabling no-cloud, offline AI workflows.

#mlx

New MLX LM Server From Apple

Reddit r/LocalLLaMA ↗ · 2026-06-09 Cached

Apple's MLX team introduces MLX LM Server, a tool for running AI agent workflows fully locally on Mac, supporting continuous batching, distributed inference, and M5 neural acceleration, with no need for cloud or API keys.

#mlx

@RayFernando1337: Extreme Alpha RN: We got a special guest from Google for our event to chat about the next Gen Foundation models. Plus w…

X AI KOLs Following ↗ · 2026-06-08 Cached

A special guest from Google will discuss next-generation foundation models at the Extreme Alpha RN event, alongside speaker Awni Hannun, co-creator of MLX.

#mlx

@jundotkim: I just shipped oMLX v0.4.0, the first official release with the new native Swift macOS app. https://github.com/jundot/o…

X AI KOLs Timeline ↗ · 2026-06-02 Cached

oMLX v0.4.0 ships a native Swift macOS app with redesigned onboarding, settings UI, Hugging Face cache discovery, and improved model management for running local AI on Macs.

#mlx

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama)

Reddit r/LocalLLaMA ↗ · 2026-05-31

A CS student built mlx-Chronos, an open-source CLI tool that standardizes benchmarking of MLX inference engines on Apple Silicon by measuring TTFT, throughput, memory usage, and thermal state, with a community leaderboard for sharing results.

#mlx

mlx-code — local LLM coding agent for Apple Silicon

Reddit r/artificial ↗ · 2026-05-31 Cached

mlx-code is a Python package that provides a local-first LLM coding agent for Apple Silicon, bundling an MLX inference server, multi-protocol API support, git worktree isolation, and composable multi-agent primitives.

#mlx

@badlogicgames: pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM …

X AI KOLs Following ↗ · 2026-05-29 Cached

pibot is now fully local, using Parakeet for STT, Qwen3-tts for TTS, and Qwen 3.6 as the local multimodal LLM via llama.cpp, with Rust/mlx-c based inference engines, achieving zero Python dependencies.

#mlx

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

Reddit r/LocalLLaMA ↗ · 2026-05-25

Mininglamp AI released Cider, a small SDK that adds W8A8 activation quantization to Apple's MLX framework, achieving up to 1.84x speedup on prefill for large language models on M5 Pro via custom Metal kernels. The tool works with any MLX model, with INT8 TensorOps support for M5 and above.
