@antirez: I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit qua…
Summary
Antirez reports that DeepSeek v4 PRO runs well on a Mac Studio M3 Ultra with 512GB RAM using 2-bit quantization, achieving 130 t/s prefill and 13 t/s generation.
Cached at: 05/17/26, 11:32 AM
I didn’t expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2-bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because of the small prompt. https://t.co/ciyx0XCSh7
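The 13 t/s figure is roughly what a memory-bandwidth-bound estimate predicts: on unified-memory Macs, decode speed is approximately bandwidth divided by the bytes of weights touched per token. A back-of-envelope sketch (the ~60 GB active-weight figure is an assumption for illustration, since the tweet doesn't state the model's active parameter count):

```python
# Back-of-envelope: token generation on unified-memory Macs is usually
# memory-bandwidth bound, so tokens/sec ~= bandwidth / bytes read per token.
def decode_tps(bandwidth_gbs: float, active_bytes_gb: float) -> float:
    """Estimate decode tokens/sec from memory bandwidth (GB/s) and the
    gigabytes of weights touched per generated token (active experts
    plus shared layers of the MoE, at the quantized size)."""
    return bandwidth_gbs / active_bytes_gb

# The M3 Ultra advertises ~819 GB/s of memory bandwidth.  If the 2-bit
# quant touches roughly 60 GB of weights per token (an assumed value,
# not from the tweet), the estimate lands near the reported 13 t/s:
print(round(decode_tps(819, 60), 1))
```

This is only a sanity check: real throughput also depends on KV-cache reads, compute overhead, and how much of the advertised bandwidth the GPU actually sustains.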
Similar Articles
@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?
DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.
@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…
DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.
2× 512GB RAM M3 Ultra Mac Studios
A user shares their $25k hardware setup of two 512GB RAM M3 Ultra Mac Studios for running large language models locally. They have tested DeepSeek V3 Q8 and GLM 5.1 Q4 via the exo distributed inference backend and are awaiting MLX optimization for Kimi 2.6.
DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation: 85 tok/s @ 524k on 2× RTX PRO 6000 Max-Q
The article details a customized quantized version of DeepSeek-V4-Flash with MTP self-speculation enabled, achieving significant speedups on dual RTX PRO 6000 Max-Q GPUs using a patched vLLM setup.
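Self-speculation with an MTP head works like generic speculative decoding: the model drafts several tokens cheaply, then verifies them in one full forward pass, accepting the longest correct prefix. A simplified expected-speedup model (the acceptance rate and relative draft cost below are illustrative assumptions, not figures from the linked setup):

```python
def expected_speedup(k: int, a: float, c: float = 0.05) -> float:
    """Idealized speculative-decoding speedup.

    k: number of tokens drafted per verification step
    a: per-token acceptance probability (modeled as geometric)
    c: cost of drafting one token relative to a full forward pass
    """
    # Expected tokens produced per step: the accepted draft prefix plus
    # the one token the verification pass always yields.
    tokens_per_step = (1 - a ** (k + 1)) / (1 - a)
    # Cost per step: one full verification pass plus k cheap draft tokens.
    cost_per_step = 1 + k * c
    return tokens_per_step / cost_per_step

# e.g. drafting 3 tokens with 80% acceptance and a 5%-cost MTP head:
print(round(expected_speedup(3, 0.8, 0.05), 2))
```

The real gain depends on how well the MTP head predicts the target distribution and on batch-level scheduling in the patched vLLM, so treat this as intuition for why "85 tok/s at 524k context" is plausible, not a reproduction of it.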
antirez/deepseek-v4-gguf
Antirez released GGUF quantizations of DeepSeek V4 Flash specifically tailored for the DS4 inference engine, providing optimized configurations for different RAM sizes and enabling local execution of the large MoE model.