@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …

X AI KOLs Timeline 05/10/26, 07:49 AM News

Summary

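antirez reports running DS4 inference on the DGX Spark (GB10, CUDA) from a private branch. Generation reaches 12 tokens/sec, held back by the system's 270 GB/s memory bandwidth, while prefill is much closer to the M3 Max, at ~200 t/s. He plans to release the code once it is more mature and says it will almost certainly be merged.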

DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec; memory bandwidth is limited on this system, at 270 GB/s. But prefill is much more aligned with the M3 Max, at ~200 t/s. I'll release it when it's more mature, but it is almost certain that it will get merged. https://t.co/LVYSDQ4Hnp
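
As a rough illustration of why memory bandwidth caps generation speed, the sketch below assumes decoding is purely bandwidth-bound: each generated token requires streaming a fixed number of bytes from memory, so tokens/sec is at most bandwidth divided by bytes per token. The function names are hypothetical, and the only figures taken from the post are 270 GB/s and 12 tokens/sec. Real systems rarely sustain peak bandwidth, so the result is an upper bound on the data read per token, not a measurement of DS4.

```python
def bandwidth_bound_tokens_per_s(bandwidth_gb_s: float, gb_per_token: float) -> float:
    """Upper bound on decode speed if every token must stream gb_per_token from memory."""
    return bandwidth_gb_s / gb_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Largest amount of data (GB) that could be read per token at the observed speed."""
    return bandwidth_gb_s / tokens_per_s


# Figures from the post: 270 GB/s memory bandwidth, 12 tokens/sec generation.
BANDWIDTH_GB_S = 270.0
OBSERVED_TOKENS_PER_S = 12.0

# Assumption: decode is limited only by memory bandwidth (no compute or latency limits).
gb_per_token = implied_gb_per_token(BANDWIDTH_GB_S, OBSERVED_TOKENS_PER_S)
print(f"At most {gb_per_token:.1f} GB read per generated token")

# Under the same assumption, doubling bandwidth would double the speed ceiling.
print(bandwidth_bound_tokens_per_s(2 * BANDWIDTH_GB_S, gb_per_token))
```

Prefill is not bound in the same way, because many prompt tokens are processed per pass over the weights, which is consistent with the much higher ~200 t/s figure quoted in the post.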

Similar Articles

@antirez: For the DGX Spark owners. This is what you get with DS4 in your hardware. I want to post this to show how with fast pre…

X AI KOLs Timeline

antirez shares a demonstration of using DS4 on the DGX Spark, showing that despite slow generation, fast prefill keeps the system usable.

@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 🫪 1 DGX Spark with 128 GB unified memo…

X AI KOLs Timeline

@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.

Deepseek V4 flash performance on DGX Spark

Reddit r/LocalLLaMA

A Reddit user shares their experience running DeepSeek V4 Flash on a dual-ASUS GX10 DGX Spark setup, detailing performance metrics, configuration, and power consumption, with throughput benchmarks across various context lengths.

@antirez: I just pushed a big refactoring of DS4 backends with CUDA support and single direction activation steering. The Metal p…

X AI KOLs Timeline

antirez pushed a major refactoring of DS4 backends, adding CUDA support and single direction activation steering while preserving the Metal path. Only M3 and DGX Spark hardware are supported for now.

Dual dgx spark (Asus GX10) MiniMax M2.7 results

Reddit r/LocalLLaMA

User benchmarks dual Asus GX10 (DGX Spark) running MiniMax-M2.7-AWQ-4bit, achieving 30–40 tokens/s while drawing only ~100 W each, replacing noisy multi-GPU rigs.
