ds4 webui
Summary
A minimal open-source web UI for the DS4 inference server has been released. It is designed for Apple Silicon Macs with at least 128GB of RAM.
Similar Articles
DS4
Salvatore Sanfilippo released DS4, a project that runs DeepSeek V3 Flash (referred to as V4 in the text) with a 1M context window on Mac Metal hardware, with potential DGX and AMD support.
@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…
DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.
@mitsuhiko: Nice! @antirez merged my tool parameter streaming changes into ds4. Means you can now just install the pi extension and…
Developer mitsuhiko released an open-source Pi extension that integrates with ds4 to streamline running DeepSeek V4 Flash locally on macOS. The tool automates model downloads, quantization selection based on RAM, and server lifecycle management for a seamless local LLM experience.
DeepSeek 4 Flash local inference engine for Metal
ds4 is a native local inference engine for DeepSeek V4 Flash optimized for Apple Silicon, featuring disk-based KV cache persistence and Metal acceleration.
@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …
Antirez reports benchmarking DS4 inference on the DGX Spark (GB10), noting a generation speed of 12 tokens/sec and high prefill performance, with plans to merge the code once it is mature.