@jun_song: Best mid-range local LLM hardware: DGX Spark vs Mac Studio M5 Max 128GB (upcoming)
Summary
A comparison of the DGX Spark and the upcoming Mac Studio M5 Max for running local LLMs, covering decode speed, prefill performance, RAM, power consumption, and cost. The Mac wins on decode bandwidth, while the DGX is faster at prefill and supports batching.
Best mid-range local LLM hardware:
DGX Spark vs Mac Studio M5 Max 128GB (upcoming)
Price: $4.7k (cheaper if used or OEM) vs ~$5k (est)
Decode: 273 GB/s vs 614 GB/s (Mac wins by 2.2x)
Prefill: DGX is ~2x faster + supports batching
RAM: 128GB unified on both
Power: 240W vs 200W (insanely efficient)
Thermals: Both quiet, but the DGX runs hot
Perks: CUDA vs MLX optimization allows Deepseek V4 Flash on your desk.
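The decode figures above are memory-bandwidth numbers, and single-stream decode on a dense model is essentially bandwidth-bound: the tokens/sec ceiling is roughly bandwidth divided by the bytes of weights read per generated token. A back-of-the-envelope sketch using the 273 GB/s and 614 GB/s figures from the post and an assumed ~40 GB weight footprint (roughly a 70B-class dense model at 4-bit quantization); real throughput lands below these ceilings:

```python
# Roofline-style ceiling for single-stream decode: tokens/sec is capped at
# memory bandwidth divided by the bytes of weights read per generated token.
# The 40 GB footprint is an assumption (~70B dense params at ~4-bit); MoE
# models only read their active experts per token, so they decode faster.

BANDWIDTH_GB_S = {"DGX Spark": 273, "Mac Studio M5 Max": 614}  # from the post
WEIGHTS_GB = 40  # assumed model footprint

for device, bw in BANDWIDTH_GB_S.items():
    print(f"{device}: <= {bw / WEIGHTS_GB:.1f} tok/s decode ceiling")
```

Prefill, by contrast, is compute-bound rather than bandwidth-bound, which is why the DGX pulls ahead there (and benefits from batching) despite its lower memory bandwidth.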
Similar Articles
@Michaelzsguo: Two days ago, I asked whether I should buy a Mac Studio for local LLMs. I was genuinely humbled by how much great feedb…
The author shares a synthesized buying guide for hardware suitable for running local LLMs, comparing Mac Studio, NVIDIA, and AMD options based on community feedback.
@songjunkr: Sharing my local LLM setup for personal use: Equipment: MacStudio M2 Ultra 64gb Model on load - SuperQwen3.6 35b mlx 4b…
A user shared their personal local LLM stack running on a MacStudio M2 Ultra 64 GB, combining SuperQwen3.6-35b-mlx-4bit, Ernie Image Turbo, and multiple helper models for coding and chat.
2x 512GB RAM M3 Ultra Mac Studios
A user shares their $25k hardware setup of two 512GB RAM M3 Ultra Mac Studios for running large language models locally, having tested DeepSeek V3 Q8 and GLM 5.1 Q4 via the exo distributed inference backend, while awaiting Kimi 2.6 MLX optimization.
I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed
A developer benchmarked 21 local LLMs on MacBook Air M5 using HumanEval+ and found Qwen 3.6 35B-A3B (MoE) leads at 89.6% with 16.9 tok/s, while Qwen 2.5 Coder 7B offers the best RAM-to-performance ratio at 84.2% in 4.5 GB. Notably, Gemma 4 models significantly underperformed expectations (31.1% for 31B), possibly due to Q4_K_M quantization effects.
Choosing a Mac Mini for local LLMs — what would YOU actually buy?
A community discussion post seeking advice on which Mac Mini configuration (M4, M2 Pro, or M1 Max) to purchase for running local LLMs with Ollama and coding assistants, with the decision complicated by rumored M5 releases and current supply shortages.
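If you already run Ollama (as in the Mac Mini thread above), the prefill/decode split from the top post is easy to measure on your own machine: the /api/generate response reports prompt-eval and eval token counts and durations separately. A minimal sketch; the model tag is only an example, substitute whatever you have pulled locally:

```python
# Measure prefill (prompt eval) vs decode (generation) throughput against a
# local Ollama server. The model tag below is an assumed example.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:7b"  # example tag; replace with a model you have pulled

payload = json.dumps({
    "model": MODEL,
    # Repeat the prompt so prefill has enough tokens to time meaningfully.
    "prompt": "Explain the difference between prefill and decode. " * 20,
    "stream": False,
}).encode()

req = urllib.request.Request(OLLAMA_URL, data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    r = json.load(resp)

# Ollama reports durations in nanoseconds.
prefill_tps = r["prompt_eval_count"] / (r["prompt_eval_duration"] / 1e9)
decode_tps = r["eval_count"] / (r["eval_duration"] / 1e9)
print(f"prefill: {prefill_tps:.1f} tok/s, decode: {decode_tps:.1f} tok/s")
```

Running this with a long prompt versus a short one makes the bandwidth-vs-compute trade-off in the top post concrete on whatever hardware you already own.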