@antirez: I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit qua…


Summary

Antirez reports that DeepSeek v4 PRO, 2-bit quantized into a 433GB GGUF file, runs well on a Mac Studio M3 Ultra with 512GB of RAM, reaching 130 tokens/s prefill and 13 tokens/s generation.

I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because small prompt. https://t.co/ciyx0XCSh7
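
To put the reported numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the two throughput figures stay constant regardless of prompt length, which is a simplification (the post itself notes that prefill speed drops with a small prompt). It also treats the 433GB file size and the 512GB of RAM as directly comparable, ignoring KV cache and OS overhead. The function names are hypothetical and not part of any tool mentioned in the post.

```python
# Figures taken from the post; everything else is an illustrative assumption.
PREFILL_TPS = 130.0     # reported prefill throughput, tokens/second
GENERATION_TPS = 13.0   # reported generation throughput, tokens/second
MODEL_FILE_GB = 433     # reported GGUF file size
RAM_GB = 512            # Mac Studio M3 Ultra memory


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate wall-clock time, assuming constant throughput in both phases."""
    return prompt_tokens / PREFILL_TPS + output_tokens / GENERATION_TPS


def memory_headroom_gb() -> int:
    """RAM left after loading the file, ignoring KV cache and OS overhead."""
    return RAM_GB - MODEL_FILE_GB


if __name__ == "__main__":
    # Example: a 1300-token prompt followed by a 130-token answer.
    print(f"estimated time: {estimate_seconds(1300, 130):.1f} s")
    print(f"memory headroom: {memory_headroom_gb()} GB")
```

Under these assumptions, generation dominates: at 13 tokens/s, each output token costs as much time as ten prompt tokens.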

Cached at: 05/17/26, 11:32 AM


Similar Articles

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?


DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.