How do I stop llama.cpp from offloading to swap?

Reddit r/LocalLLaMA 2026/06/11 11:22 Tools

llama-cpp swap memory-management kv-cache troubleshooting gguf

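Summary

A user asks for advice on how to stop llama.cpp from offloading the KV cache to swap before RAM is fully exhausted, and shares their setup: an M2 Max with 96GB of RAM running a large Qwen model.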

I have tried to prevent this with llama.cpp flags, but the problem persists: whenever my RAM usage gets close to 96GB, llama-server / llama.cpp decides to offload the KV cache to swap. This usually happens once RAM usage reaches 91-92GB, while I still have 4GB left. Is there a more aggressive way to make llama.cpp offload only once RAM reaches 95GB?

Specs: M2 Max 96GB, Qwen 3.5 122b q4, latest llama.cpp

llama-server --port ${PORT} --model /Users/user/.lmstudio/models/unsloth/Qwen3.5-122B-A10B-MTP-GGUF/Qwen3.5-122B-A10B-UD-Q4_K_XL-00001-of-00003.gguf --spec-type draft-mtp --spec-draft-n-max 2 --ctx-size 150000 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --mlock --parallel 1 --no-warmup --jinja --threads 8 -ngl 99 --ctx-checkpoints 32 --presence-penalty 0.0 --repeat-penalty 1.0 --no-context-shift --cache-ram 6000 -fa on
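To get a feel for how much of that 96GB the KV cache alone might claim at a given --ctx-size, a back-of-the-envelope estimate can help. The sketch below is only an approximation under stated assumptions: it assumes plain full attention in every layer and an unquantized f16 cache, and the layer count, KV head count, and head dimension in the example call are hypothetical placeholders, not the real values for this Qwen model (read those from the GGUF metadata). The function name kv_cache_mib is made up for illustration and is not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Rough KV cache size estimate in MiB.
# ASSUMPTION: every layer uses standard full attention; models with other
# layer types will deviate from this figure.
# Arguments:
#   $1 n_layers    number of attention layers
#   $2 n_kv_heads  number of KV heads per layer
#   $3 head_dim    dimension of each head
#   $4 ctx_size    context length in tokens (the --ctx-size value)
#   $5 bytes       bytes per cache element (2 for f16)
kv_cache_mib() {
  local n_layers=$1 n_kv_heads=$2 head_dim=$3 ctx=$4 bytes=$5
  # Factor 2: one K tensor and one V tensor per layer.
  echo $(( 2 * n_layers * n_kv_heads * head_dim * ctx * bytes / 1024 / 1024 ))
}

# Example with HYPOTHETICAL architecture values (32 layers, 8 KV heads,
# head_dim 128) and a 4096-token context in f16:
kv_cache_mib 32 8 128 4096 2   # prints 512
```

With the same placeholder values and the 150000-token context from the command above, the call kv_cache_mib 32 8 128 150000 2 gives 18750 MiB. The real figure depends on the model's actual architecture, so treat this only as a way to see how the cache scales linearly with context length.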


