DS4

Reddit r/LocalLLaMA Tools

Summary

Salvatore Sanfilippo released DS4, a project enabling DeepSeek V3 (referred to as V4 in text) Flash to run with a 1M context window on Mac Metal hardware, with potential for DGX and AMD support.

The developer that created Redis, Salvatore Sanfilippo, has released a new project on GitHub named DS4. [https://github.com/antirez/ds4/](https://github.com/antirez/ds4/) The TL;DR on this one is getting DeepSeek V4 Flash running with a 1M context windows on Mac Metal hardware. Some novel techniques going on. A few hours ago he posted a video of it running on a DGX: [https://x.com/antirez/status/2053381973226184749](https://x.com/antirez/status/2053381973226184749) So if they can get it running on a DGX, maybe a Pro 6000 at a slightly smaller context window at a high speed. I also think that they could figure out the AMD chips as well in the future. The server already has an OpenAI and Anthropic endpoints for use with Agentic code tools. I know the people on this sub-reddit have AMAZING hardware. I would encourage people to check out this project and see if there is a contribution that they can make.
Original Article

Similar Articles

A few words on DS4

Hacker News Top

Antirez announces DwarfStar 4 (DS4), a local AI tool that runs DeepSeek v4 Flash with asymmetric 2/8 bit quantization on high-end consumer hardware, achieving near-frontier performance. He discusses the project's rapid popularity, future plans for model updates and distributed inference, and the significance of local AI for serious tasks.

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?

X AI KOLs Following

DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.