DS4
Summary
Salvatore Sanfilippo released DS4, a project that runs DeepSeek V4 Flash with a 1M-token context window on Macs via Metal, with potential for DGX and AMD support.
Similar Articles
@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…
DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.
A few words on DS4
Antirez announces DwarfStar 4 (DS4), a local AI tool that runs DeepSeek V4 Flash with asymmetric 2/8-bit quantization on high-end consumer hardware, achieving near-frontier performance. He discusses the project's rapid popularity, future plans for model updates and distributed inference, and the significance of local AI for serious tasks.
DeepSeek 4 Flash local inference engine for Metal
ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.
@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?
DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.
You can run Deepseek 4 flash on mac (M3 Max, 96gb)
A guide to running DeepSeek V4 Flash on a Mac M3 Max with 96GB RAM using antirez's ds4 engine and SSD streaming, achieving roughly 12 tokens/second.