@antirez: I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit qua…

X AI KOLs Timeline 05/17/26, 08:21 AM News

deepseek mac-studio m3-ultra llm local-inference quantization performance

Summary

Antirez reports that DeepSeek v4 PRO runs well on a Mac Studio M3 Ultra with 512GB RAM using 2-bit quantization, achieving 130 t/s prefill and 13 t/s generation.

I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because small prompt. https://t.co/ciyx0XCSh7

Original Article

View Cached Full Text

Cached at: 05/17/26, 11:32 AM

I didn’t expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because small prompt. https://t.co/ciyx0XCSh7

Similar Articles

You can run Deepseek 4 flash on mac (M3 Max, 96gb)

Reddit r/LocalLLaMA

A guide on running DeepSeek 4 flash on a Mac M3 Max with 96GB RAM using Antirez's ds4 engine and SSD streaming, achieving ~12 tokens/second inference speed.

@antirez: DeepSeek v4 PRO running via SSD streaming on my 128GB MacBook m5 max. 1.6 trillion parameters.

X AI KOLs Timeline

DeepSeek v4 PRO, a 1.6 trillion parameter model, is running via SSD streaming on a 128GB MacBook m5 max, demonstrating local inference of a massive model.

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?

X AI KOLs Following

DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.

@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…

X AI KOLs Timeline

DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.

Follow-up: DeepSeek V4 Flash on 2x RTX PRO 6000 finishes real coding tasks faster than Sonnet and Opus, at about Sonnet quality