ExLlamaV3 Major Updates!

Reddit r/LocalLLaMA 2026/05/11 07:05 Tools

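Summary

ExLlamaV3 has shipped a series of major updates, including Gemma 4 support, improved cache efficiency, and new DFlash support that significantly raises inference speed across a range of workloads.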

Turboderp has been [on fire](https://github.com/turboderp-org/exllamav3/commits/dev) lately in the never-ending fight to fit new models onto smaller and faster hardware. Last month we led the way with [Gemma 4 support](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.29), followed by [cache efficiency improvements](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.30). Two weeks ago, [DFlash support](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.31) landed with impressive test results:

| Category | Baseline | N-gram/suffix | DFlash |
| :- | :- | :- | :- |
| Agentic, code | 55.98 t/s | 89.58 t/s (1.60x) | 140.61 t/s (2.51x) |
| Agentic, curl | 54.03 t/s | 74.62 t/s (1.38x) | 125.94 t/s (2.33x) |
| Coding | 59.21 t/s | 75.34 t/s (1.27x) | 177.67 t/s (3.00x) |
| Creative | 59.10 t/s | 67.26 t/s (1.13x) | 89.19 t/s (1.50x) |
| Creative (reasoning) | 59.03 t/s | 64.25 t/s (1.09x) | 93.54 t/s (1.58x) |
| Translation | 58.11 t/s | 55.39 t/s (0.95x) | 75.73 t/s (1.30x) |
| Translation (reasoning) | 58.08 t/s | 80.21 t/s (1.38x) | 119.43 t/s (2.06x) |

Last week brought [more model optimizations](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.32), with the following speed gains:

| Model | 3090¹ | 4090¹ | 5090¹ | 6000 Pro¹ | 5090² | 6000 Pro² |
| :- | :- | :- | :- | :- | :- | :- |
| Qwen3.5-35B-A3B 4.00bpw | 5.3% | 5.8% | 8.6% | 10.3% | 21.0% | 23.5% |
| Qwen3.5-27B 4.00bpw | 0.0% | 1.9% | 8.1% | 11.7% | 13.1% | 15.0% |
| Trinity-Nano 4.15bpw | 29.5% | 48.6% | 52.3% | 52.9% | 70.5% | 72.4% |
| Gemma4-26B-A4B 4.10bpw | 3.1% | 2.9% | 7.8% | 9.6% | 16.4% | 19.2% |
| Gemma4-31B 4.00bpw | 4.0% | 4.9% | 10.0% | 8.0% | 16.0% | 12.0% |

The past two days added [DFlash model quantization](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.33), along with more bug fixes and efficiency gains, and work continues on the dev branch! Come say hi on the [exllama Discord](https://discord.gg/AD2mVhZzf).
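For anyone wanting to compare their own runs against the DFlash table, the multipliers in parentheses are the method's throughput divided by the baseline throughput for the same category. Here is a minimal Python sketch of that arithmetic; the `speedup` helper and the `rows` dictionary are illustrative names, not part of ExLlamaV3, and the numbers are copied from three rows of the table.

```python
def speedup(baseline_tps: float, method_tps: float) -> float:
    """Return the throughput ratio method / baseline (1.0 means no change)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return method_tps / baseline_tps


# (baseline, n-gram/suffix, DFlash) in tokens per second, taken from the table.
rows = {
    "Agentic, code": (55.98, 89.58, 140.61),
    "Coding": (59.21, 75.34, 177.67),
    "Translation": (58.11, 55.39, 75.73),
}

for name, (base, ngram, dflash) in rows.items():
    print(
        f"{name}: n-gram/suffix {speedup(base, ngram):.2f}x, "
        f"DFlash {speedup(base, dflash):.2f}x"
    )
```

A ratio below 1.0, as in the n-gram/suffix Translation row, means that method was slower than the baseline for that workload.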
