DS4

Reddit r/LocalLLaMA 05/10/26, 12:25 PM Tools

Summary

Salvatore Sanfilippo released DS4, a project enabling DeepSeek V3 (referred to as V4 in text) Flash to run with a 1M context window on Mac Metal hardware, with potential for DGX and AMD support.

The developer that created Redis, Salvatore Sanfilippo, has released a new project on GitHub named DS4. [https://github.com/antirez/ds4/](https://github.com/antirez/ds4/) The TL;DR on this one is getting DeepSeek V4 Flash running with a 1M context windows on Mac Metal hardware. Some novel techniques going on. A few hours ago he posted a video of it running on a DGX: [https://x.com/antirez/status/2053381973226184749](https://x.com/antirez/status/2053381973226184749) So if they can get it running on a DGX, maybe a Pro 6000 at a slightly smaller context window at a high speed. I also think that they could figure out the AMD chips as well in the future. The server already has an OpenAI and Anthropic endpoints for use with Agentic code tools. I know the people on this sub-reddit have AMAZING hardware. I would encourage people to check out this project and see if there is a contribution that they can make.

Original Article

Similar Articles

@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…

X AI KOLs Timeline

DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.

A few words on DS4

Hacker News Top

Antirez announces DwarfStar 4 (DS4), a local AI tool that runs DeepSeek v4 Flash with asymmetric 2/8 bit quantization on high-end consumer hardware, achieving near-frontier performance. He discusses the project's rapid popularity, future plans for model updates and distributed inference, and the significance of local AI for serious tasks.

DeepSeek 4 Flash local inference engine for Metal

Hacker News Top

ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?

X AI KOLs Following

DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.

You can run Deepseek 4 flash on mac (M3 Max, 96gb)

Reddit r/LocalLLaMA

A guide on running DeepSeek 4 flash on a Mac M3 Max with 96GB RAM using Antirez's ds4 engine and SSD streaming, achieving ~12 tokens/second inference speed.