@antirez: I just pushed a big refactoring of DS4 backends with CUDA support and single direction activation steering. The Metal p…
Summary
antirez pushed a major refactoring of DS4 backends, adding CUDA support and single direction activation steering while preserving the Metal path. Only M3 and DGX Spark hardware are supported for now.
Similar Articles
@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …
Antirez reports benchmarking DS4 inference on the DGX Spark (GB10): generation runs at 12 tokens/sec, limited by memory bandwidth, while prefill performance is high. The code lives in a private branch for now, with plans to merge it once it matures.
A few words on DS4
Antirez announces DwarfStar 4 (DS4), a local AI tool that runs DeepSeek V4 Flash with asymmetric 2/8-bit quantization on high-end consumer hardware, achieving near-frontier performance. He discusses the project's rapid popularity, future plans for model updates and distributed inference, and the significance of local AI for serious tasks.
DS4
Salvatore Sanfilippo released DS4, a project that runs DeepSeek V4 Flash with a 1M context window on Mac hardware via Metal, with potential for DGX and AMD support.
@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…
DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.
@antirez: For the DGX Spark owners. This is what you get with DS4 in your hardware. I want to post this to show how with fast pre…
antirez shares a demonstration of using DS4 on the DGX Spark, showing that despite slow generation, fast prefill keeps the system usable.