I just bought an Asus Ascent: Nvidia GB10 (DGX), but it's slower than my Ryzen AI Max

Reddit r/LocalLLaMA 2026/05/15 18:24 News

performance-comparison llama-cpp inference-speed dgx ryzen-ai-max gemma-4

Summary

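The user reports that their Asus Ascent with an Nvidia GB10 (DGX) runs large language models such as Gemma4-31B more slowly than their Ryzen AI Max, even though they expected a 2-4x speedup, and shares their llama-cpp configuration for debugging.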
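As a rough sanity check on the 2-4x expectation, here is a back-of-envelope sketch that estimates a decode ceiling for a dense model, assuming token generation is memory-bandwidth-bound (each generated token reads every weight once). The bandwidth figures (273 GB/s for GB10, 256 GB/s for a Ryzen AI Max+ 395) and the ~8.5 bits per weight for a Q8-class quant are assumptions not stated in the post; the function and constant names are hypothetical.

```python
# Back-of-envelope decode ceiling, ASSUMING token generation is
# memory-bandwidth-bound: every generated token streams all dense weights once.
# ASSUMPTIONS (not from the post): bandwidths are vendor peak figures, and
# 8.5 bits/weight is a rough guess for a Q8-class GGUF quant.

PARAMS = 31e9          # Gemma4-31B, dense parameter count taken from the model name
BITS_PER_WEIGHT = 8.5  # hypothetical average bits per weight for a Q8-class quant

BANDWIDTH_GBPS = {     # assumed peak memory bandwidth in GB/s
    "GB10 (DGX Spark / Asus Ascent)": 273,
    "Ryzen AI Max+ 395": 256,
}


def decode_ceiling_tps(bandwidth_gbps: float, params: float, bpw: float) -> float:
    """Upper bound on tokens/s if each token reads every weight exactly once."""
    bytes_per_token = params * bpw / 8
    return bandwidth_gbps * 1e9 / bytes_per_token


if __name__ == "__main__":
    ceilings = {
        name: decode_ceiling_tps(bw, PARAMS, BITS_PER_WEIGHT)
        for name, bw in BANDWIDTH_GBPS.items()
    }
    for name, tps in ceilings.items():
        print(f"{name}: <= {tps:.1f} tk/s")
    ratio = ceilings["GB10 (DGX Spark / Asus Ascent)"] / ceilings["Ryzen AI Max+ 395"]
    print(f"bandwidth-bound speedup: {ratio:.2f}x")
```

Under these assumptions it prints ceilings of roughly 8.3 and 7.8 tk/s and a ratio of about 1.07x. If decode really is bandwidth-bound on both machines, single-stream generation on a 31B dense model would track the memory-bandwidth ratio rather than compute throughput; prompt processing, which is generally compute-bound, is not modeled here.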

It's supposed to be 2-4x faster, but I'm only getting 6 tk/s with Gemma4-31B. What am I doing wrong?

- Inference engine: latest llama-cpp (May 15, 2026), self-compiled via https://ggml.ai/dgx-spark.sh
- Models tested:
  - Step3.5-Apex-I-Quality: DGX 27 tk/s, AI-Max 30 tk/s
  - gemma-4-31B-it-UD-Q8_K_XL: DGX 6.19 tk/s, AI-Max 7.10 tk/s

Command:

```
llama-server --models-preset /home/dgx/models/models.ini --models-dir /home/dgx/models/ --host 0.0.0.0 --port 8080 --models-max 1 --parallel 1
```

The models.ini file:

```
[*]
threads = 12
flash-attn = on
mlock = off
mmap = off
fit = on
warmup = on
; batch-size = 4096
; ubatch-size = 512
cache-type-k = q8_0
cache-type-v = q8_0
jinja = true
direct-io = on
cache-prompt = true
cache-reuse = 256
cache-ram = 32768
reasoning-format = auto
n-gpu-layers = 999
```
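To put the reported numbers side by side, a small sketch that divides each GB10 figure by the matching Ryzen AI Max figure. The tk/s values are copied from the post; the dictionary and function names are illustrative only.

```python
# Relative decode speed computed from the tk/s figures reported in the post.
# A ratio below 1.0 means the GB10 box is slower than the Ryzen AI Max.

REPORTED_TPS = {
    # model: (GB10 / DGX tk/s, Ryzen AI Max tk/s)
    "Step3.5-Apex-I-Quality": (27.0, 30.0),
    "gemma-4-31B-it-UD-Q8_K_XL": (6.19, 7.10),
}


def speed_ratio(dgx_tps: float, ai_max_tps: float) -> float:
    """GB10 throughput divided by Ryzen AI Max throughput."""
    return dgx_tps / ai_max_tps


if __name__ == "__main__":
    for model, (dgx, ai_max) in REPORTED_TPS.items():
        r = speed_ratio(dgx, ai_max)
        print(f"{model}: {r:.2f}x ({(1 - r) * 100:.0f}% slower)")
```

It prints about 0.90x for Step3.5-Apex-I-Quality and 0.87x for gemma-4-31B-it-UD-Q8_K_XL, i.e. the GB10 is roughly 10-13% slower in these runs rather than the 2-4x faster the poster expected.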

