mlx

#mlx

@dealignai: DeepSeek-V4-Flash CRACK'd (ablated/uncensored) - Mac's Only (Osaurus/vMLX) https://huggingface.co/dealignai/DeepSeek-V4…

X AI KOLs Timeline ↗ · 2026-05-24 Cached

An abliterated (uncensored) version of DeepSeek-V4-Flash, optimized for Apple Macs with MLX, removing refusal behaviors while preserving knowledge and reasoning.

#mlx

Command A+ (218B MoE) running on Apple Silicon — MLX port, PR open

Reddit r/LocalLLaMA ↗ · 2026-05-23

A PR for mlx-lm adds support for Cohere's Command A+ (218B MoE) model on Apple Silicon, with architecture details for the implementation.

#mlx

I added native MTP to exo for Qwen3.6 MLX models; here are the exactness and speed results

Reddit r/LocalLLaMA ↗ · 2026-05-23

The author added native multi-token prediction (MTP) support to the exo local inference tool for Qwen3.6 MLX models, reporting up to a 2x speedup on 27B models on an M5 Max laptop while maintaining exactness.

#mlx

HF flagged safetensors as unsafe? wtf?

Reddit r/LocalLLaMA ↗ · 2026-05-21

Hugging Face flagged a safetensors file as unsafe, leaving users confused and questioning the flagging policy.

#mlx

@DanKornas: Fine-tuning local LLMs shouldn’t require renting a cloud GPU. Silicon Studio is an open-source desktop app for local LL…

X AI KOLs Following ↗ · 2026-05-21 Cached

Silicon Studio is an open-source desktop app that enables local LLM fine-tuning and inference on Apple Silicon Macs using MLX, with features for data preparation, model management, and visual configuration.

#mlx

@adrgrondin: Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs …

X AI KOLs Following ↗ · 2026-05-20 Cached

A demonstration of subagents running locally and simultaneously on a MacBook Pro M5, using Codex CLI and LM Studio with Qwen 3.6 and MLX batching for code review and bug detection.

#mlx

@ivanfioravanti: Writing Fast MLX skill by @awnihannun is a must have for anyone working with the Apple mlx framework.

X AI KOLs Timeline ↗ · 2026-05-20 Cached

A tweet recommending the 'Writing Fast MLX' skill by Awni Hannun to developers working with Apple's MLX framework.

#mlx

@Michaelzsguo: https://x.com/Michaelzsguo/status/2056842405815447684

X AI KOLs Timeline ↗ · 2026-05-19 Cached

A practical guide to organizing local LLM experiments by using a layered wrapper system and a consistent directory structure to avoid model location drift, flag amnesia, and harness coupling.

#mlx

@sabeshbharathi: Imagine a future where you have truly personal and proactive assistants on the best personal AI devices ever - your Mac…

X AI KOLs Following ↗ · 2026-05-19 Cached

Sabesh Bharathi envisions personal proactive AI assistants on MacBooks using MLX, and announces the first MLX India community meet-up held on May 3rd.

#mlx

@jundotkim: oMLX 0.3.9rc1 released. Highlights: - Low-memory Macs stay stable instead of getting killed by the OS - DFlash bumped t…

X AI KOLs Timeline ↗ · 2026-05-19 Cached

oMLX 0.3.9rc1, an LLM inference server optimized for Apple Silicon Macs, adds low-memory stability, chunked prefill, multi-tasking admin chat, and more.

#mlx

@Prince_Canuma: Quick update on the water situation M3 Ultra and Titan (RTX6000 Pro) seem to have recovered with little to no visible d…

X AI KOLs Timeline ↗ · 2026-05-18 Cached

A personal update on hardware recovering from water damage, which also showcases MLX-VLM serving Qwen3-4B-Instruct locally on an RTX6000 Pro at ~300 tok/s for autocomplete and git commit generation via Zed IDE.

#mlx

MLX engine comparison… and oMLX is the top choice.

Reddit r/LocalLLaMA ↗ · 2026-05-18

A blog post comparing MLX inference engines that concludes oMLX is the top choice, based on benchmarks run on an M5 Max 64GB with Qwen3.6-35B-A3B-4bit.

#mlx

@PyTorch: ExecuTorch now has an MLX delegate that runs PyTorch models on Apple Silicon GPUs. It supports LLMs, speech-to-text, an…

X AI KOLs Following ↗ · 2026-05-18 Cached

ExecuTorch now has an MLX delegate that enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs, supporting LLMs, speech-to-text, and MoE models with quantization via TorchAO.

#mlx

@neural_avb: I am working on porting SAM models and harness into Apple silicon. Already seeing 1.25x inference speed increase on mlx…

X AI KOLs Following ↗ · 2026-05-17 Cached

The author is porting SAM 2.1 models to Apple silicon with MLX, reporting a 1.25x inference speedup on the small model, with quantized versions planned.

#mlx

@rohanpaul_ai: So much possibilities for on-device small models. Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro. ~4…

X AI KOLs Following ↗ · 2026-05-17 Cached

Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.

#mlx

@Michaelzsguo: So you bought the 128GB MacBook Pro. Now the question is not, “Which local model gets the highest TPS?” It is: which se…

X AI KOLs Timeline ↗ · 2026-05-17 Cached

This thread recommends a local AI coding stack for the 128GB MacBook Pro: a Qwen 3.6 model served through an MLX server, with specific configurations for reliable coding assistance.

#mlx

@sitinme: There's a pretty interesting open-source project called Cider, specifically designed to accelerate local AI inference on Macs with Apple Silicon chips. Many people buy a Mac mini or MacBook Pro and want to run models locally, but often encounter issues like insufficient speed and high memory usage. Actually...

X AI KOLs Timeline ↗ · 2026-05-17 Cached

Cider is an open-source project designed for Apple Silicon Macs, accelerating local AI inference by fully leveraging the computing power of M-series chips. It is compatible with the MLX ecosystem, supports models like Qwen and Llama, and is easy to install.

#mlx

Yesterday I saw a new research paper about δ-mem and integrated with openclaw

Reddit r/openclaw ↗ · 2026-05-17

δ-mem, described in a new research paper, reportedly improves agent response quality by 7-32% when integrated with openclaw. The integration currently works only with mlx and Qwen3:4b, but adapters for other models are expected.

#mlx

I fitted the new δ-mem research for apple silicon using mlx and openclaw integration! My findings

Reddit r/LocalLLaMA ↗ · 2026-05-16

The author implements the δ-mem research paper on Apple Silicon using MLX and OpenClaw. Local AI agent tests show memory and attention improvements, though results are mixed compared to the CUDA benchmarks.

#mlx

@nash_su: Mac inference speed doubled. MTPLX is an integrated solution combining MLX and MTP, specifically optimized for model inference on Apple Silicon. By using models with a custom MTP head, it can deliver doubled inference speed. I tested it with Qwen3.6-27…

X AI KOLs Timeline ↗ · 2026-05-16 Cached

MTPLX is an integrated solution combining MLX and MTP, optimized for model inference speed on Apple Silicon. In the author's tests, Qwen3.6-27B ran at double the inference speed it reached in LM Studio. MTPLX also integrates fan management.
