@Michaelzsguo: So you bought the 128GB MacBook Pro. Now the question is not, “Which local model gets the highest TPS?” It is: which se…

X AI KOLs Timeline Tools

Summary

This thread recommends a local AI coding stack for the 128GB MacBook Pro, using Qwen 3.6 model with MLX server and specific configurations for reliable coding assistance.

So you bought the 128GB MacBook Pro. Now the question is not, “Which local model gets the highest TPS?” It is: which setup can I actually trust to get the job done? This is the local coding stack I’d start with: Qwen 3.6, dense 27B, Q6 quant, MLX server, 8192 output tokens, 20GB prompt cache, and deterministic decoding. If Anthropic’s success story tells us anything, it is that once you figure out coding, you can expand into almost anything else. Local models stop being a hobby when they can finish the patch.
Original Article
View Cached Full Text

Cached at: 05/17/26, 08:24 PM

So you bought the 128GB MacBook Pro.

Now the question is not, “Which local model gets the highest TPS?”

It is: which setup can I actually trust to get the job done?

This is the local coding stack I’d start with: Qwen 3.6, dense 27B, Q6 quant, MLX server, 8192 output tokens, 20GB prompt cache, and deterministic decoding.

If Anthropic’s success story tells us anything, it is that once you figure out coding, you can expand into almost anything else.

Local models stop being a hobby when they can finish the patch.

spot on! it’s very subjective thing. even I can do a benchmark like this (I did that one on A100, maybe I should do it on my mac too)

totally. my codex-qwen works fine, though a little slower. I really think we will only get better from here.

try the same but probably slightly smaller prompt cache say 8GB. please let us know how it works.

Similar Articles

Localmaxxing (3 minute read)

TLDR AI

The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.

2x 512gb ram M3 Ultra mac studios

Reddit r/LocalLLaMA

A user shares their $25k hardware setup of two 512GB RAM M3 Ultra Mac Studios for running large language models locally, having tested DeepSeek V3 Q8 and GLM 5.1 Q4 via the exo distributed inference backend, while awaiting Kimi 2.6 MLX optimization.

Running local models on an M4 with 24GB memory

Hacker News Top

A guide on running local AI models like Qwen 3.5-9B on an M4 MacBook with 24GB RAM using tools like LM Studio, Ollama, and pi, including specific configuration tips for optimal performance.