I just bought an Asus Ascent (Nvidia GB10, DGX) and it is slower than my Ryzen AI Max
Summary
A user reports that their Asus Ascent with Nvidia GB10 (DGX) is slower than their Ryzen AI Max when running LLMs such as Gemma4-31B, despite an expected 2-4x speedup, and shares their llama.cpp configuration for debugging.
Similar Articles
@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …
A new toolset (DFlash + PFlash) achieves 2.5x faster inference than llama.cpp on AMD Ryzen AI MAX+ 395 iGPU, demonstrating significant speedups for Qwen3.6-27B with 128 GiB unified memory.
Dual DGX Spark (Asus GX10) MiniMax M2.7 results
User benchmarks dual Asus GX10 (DGX Spark) running MiniMax-M2.7-AWQ-4bit, achieving 30–40 tokens/s while drawing only ~100 W each, replacing noisy multi-GPU rigs.
Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks
Community benchmark shows Intel Arc Pro B70 averages ~71% slower prompt processing and ~54% slower token generation than RTX 3090 under llama.cpp, with SYCL backend sometimes beating Vulkan on the same card.
Gemma 4 26B Hits 600 Tok/s on One RTX 5090
A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.
@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …
Antirez reports benchmarking DS4 inference on the DGX Spark (GB10): 12 tokens/sec generation, which he attributes to limited memory bandwidth, alongside high prefill performance, with plans to merge the codebase once mature.
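The 12 tokens/sec figure is consistent with a back-of-the-envelope bandwidth check: on memory-bound hardware, each decoded token must stream the model's active weights through memory once, so tokens/s is bounded by bandwidth divided by bytes read per token. A minimal sketch, assuming illustrative numbers (the ~273 GB/s often quoted for the GB10's LPDDR5X, and a hypothetical ~20 GB of weights touched per token; neither figure comes from the post):

```python
def max_tokens_per_sec(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed for a memory-bandwidth-bound model:
    every generated token requires streaming the active weights once."""
    return bandwidth_gb_s / gb_read_per_token

# Illustrative assumptions, not figures from the post:
# ~273 GB/s memory bandwidth, ~20 GB of weights read per token.
print(f"{max_tokens_per_sec(273, 20):.1f} tok/s upper bound")
```

With these assumed inputs the bound lands in the low teens of tokens/sec, which is the right order of magnitude for the reported 12 tok/s; prefill is compute-bound rather than bandwidth-bound, which is why it can remain fast on the same hardware.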