Tag
An abliterated (uncensored) version of DeepSeek-V4-Flash, optimized for Apple Macs with MLX, removing refusal behaviors while preserving knowledge and reasoning.
A PR for mlx-lm adds support for Cohere's Command A+ (218B MoE) model on Apple Silicon, with architecture details for the implementation.
Native multi-token prediction (MTP) support has been added to the exo local inference tool for Qwen3.6 MLX models, achieving up to a 2x speedup on 27B models on an M5 Max laptop while keeping outputs exact.
Hugging Face flagged a safetensors file as unsafe, confusing users who question the policy.
Silicon Studio is an open-source desktop app that enables local LLM fine-tuning and inference on Apple Silicon Macs using MLX, with features for data preparation, model management, and visual configuration.
A demonstration of running subagents locally on a MacBook Pro M5, using Codex CLI and LM Studio with Qwen 3.6 and MLX batching for code review and bug detection.
Tweet recommending the 'Writing Fast MLX' skill by Awni Hannun for developers working with Apple's MLX framework.
A practical guide to organizing local LLM experiments by using a layered wrapper system and a consistent directory structure to avoid model location drift, flag amnesia, and harness coupling.
Sabesh Bharathi envisions personal proactive AI assistants on MacBooks using MLX, and announces the first MLX India community meet-up held on May 3rd.
oMLX 0.3.9rc1, an LLM inference server optimized for Apple Silicon Macs, adds low-memory stability, chunked prefill, multi-tasking admin chat, and more.
A personal update on recovering hardware from water damage, showcasing MLX-VLM serving Qwen3-4B-Instruct locally on an RTX6000 Pro at ~300 tok/s for autocomplete and git commit generation in the Zed IDE.
A blog post comparing MLX inference engines, concluding that oMLX is the top choice, with benchmarks on an M5 Max 64GB using Qwen3.6-35B-A3B-4bit.
ExecuTorch now has an MLX delegate that enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs, supporting LLMs, speech-to-text, and MoE models with quantization via TorchAO.
A port of SAM 2.1 models to Apple Silicon with MLX achieves a 1.25x inference speedup on the small model, with quantized versions planned.
Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.
This thread recommends a local AI coding stack for the 128GB MacBook Pro, using a Qwen 3.6 model with an MLX server and specific configurations for reliable coding assistance.
Cider is an open-source project designed for Apple Silicon Macs, accelerating local AI inference by fully leveraging the computing power of M-series chips. It is compatible with the MLX ecosystem, supports models like Qwen and Llama, and is easy to install.
A new research paper on δ-mem reports a 7-32% improvement in agent response quality when integrated with OpenClaw. The project currently works only with MLX and Qwen3:4b, but adapters for other models are expected.
The author implements the δ-mem research paper on Apple Silicon using MLX and OpenClaw, showing memory and attention improvements in local AI agent tests, though with mixed results compared to CUDA benchmarks.
MTPLX is an integrated solution that combines MLX and MTP to speed up model inference on Apple Silicon. Tests show it running Qwen3.6-27B at twice the inference speed of LM Studio, and it also integrates fan management.