Qwen3.6 27b / llama.cpp / opencode best configs

Reddit r/LocalLLaMA 2026/04/22 15:44 Tools

Summary

A community discussion thread sharing optimized llama.cpp launch commands for running the 27B Qwen3.6 GGUF model in multi-GPU setups with long contexts of 100K to 512K tokens.

Please share your best configs <3

Windows, dual 3080s, 20GB VRAM, 256GB DDR4 RAM, llama.cpp. With 100K of context filled I get 400/11 pp/tg (prompt processing / token generation, in tokens per second). My config:

"A:/0_llama_server/llama-server.exe" -m "a:\0_LM_Studio\unsloth\Qwen3.6-27B-GGUF\Qwen3.6-27B-UD-Q5_K_XL.gguf" --port 8080 --alias qwen3.5:27b -ngl 999 --threads 22 --flash-attn on --host 0.0.0.0 --no-mmap -mg 1 --batch-size 1024 --ubatch-size 512 --ctx-checkpoints 128 --ctx-size 196610 --reasoning on --jinja --draft-max 128 --spec-ngram-size-n 48 --draft-min 2 --spec-type ngram-mod --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --repeat_penalty 1.0 --presence_penalty 0.0 --chat-template-kwargs "{\"preserve_thinking\":true}" --tensor-split 0.46,0.54

DGX (user Impossible_Art9151):

llama-server -hf unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL --host 0.0.0.0 --port 8095 --ctx-size 512000 --no-mmap --parallel 2 --flash-attn on --n-gpu-layers 999 --chat-template-kwargs '{"preserve_thinking":true}' --temp 0.7 --top-p 0.95 --top-k 20 --min-p 0.00 --repeat_penalty 1.0 --presence_penalty 0.0
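The `--chat-template-kwargs` value is where shell quoting most often goes wrong, because the JSON's inner double quotes collide with the shell's own quotes. A minimal sketch, assuming Python 3.9+, that builds the Windows command above as an argument list and lets `json.dumps` produce the JSON, so neither cmd.exe nor bash has to parse the inner quotes. The helper names `build_llama_server_args` and `launch` are hypothetical; the flags are copied from the post and have not been checked against any particular llama.cpp build.

```python
import json
import shlex
import subprocess


def build_llama_server_args(server_exe: str, model_path: str) -> list[str]:
    """Build the Windows llama-server command from the post as an argv list.

    Hypothetical helper: paths are supplied by the caller; flags are copied
    verbatim from the post.
    """
    # json.dumps yields {"preserve_thinking":true} with real double quotes;
    # an argv list is never re-parsed by a shell, so no escaping is needed.
    template_kwargs = json.dumps({"preserve_thinking": True}, separators=(",", ":"))
    return [
        server_exe,
        "-m", model_path,
        "--port", "8080", "--host", "0.0.0.0", "--alias", "qwen3.5:27b",
        # Offload all layers, split them 46/54 across two GPUs, GPU 1 as main GPU
        "-ngl", "999", "-mg", "1", "--tensor-split", "0.46,0.54",
        "--threads", "22", "--flash-attn", "on", "--no-mmap",
        "--batch-size", "1024", "--ubatch-size", "512",
        "--ctx-checkpoints", "128", "--ctx-size", "196610",
        "--reasoning", "on", "--jinja",
        # N-gram speculative decoding settings from the post
        "--draft-max", "128", "--draft-min", "2",
        "--spec-type", "ngram-mod", "--spec-ngram-size-n", "48",
        # Sampling parameters from the post
        "--temp", "0.6", "--top-p", "0.95", "--top-k", "20", "--min-p", "0.00",
        "--repeat_penalty", "1.0", "--presence_penalty", "0.0",
        "--chat-template-kwargs", template_kwargs,
    ]


def launch(args: list[str]) -> subprocess.Popen:
    # Starts the server without a shell. On Windows, subprocess turns the list
    # into a command line escaped the way the Microsoft C runtime parses it.
    return subprocess.Popen(args)


if __name__ == "__main__":
    args = build_llama_server_args(
        "A:/0_llama_server/llama-server.exe",
        r"a:\0_LM_Studio\unsloth\Qwen3.6-27B-GGUF\Qwen3.6-27B-UD-Q5_K_XL.gguf",
    )
    # Preview only (POSIX-style quoting); call launch(args) to start the server.
    print(shlex.join(args))
```

Keeping the flags in a list also makes it easy to diff two configs, for example the Windows and DGX commands, flag by flag.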

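To put the 400/11 pp/tg figures in perspective, a back-of-envelope sketch that converts them into wall-clock time. It assumes both rates stay constant for the whole request, which is a simplification: the numbers were reported at about 100K of filled context, and real speed also depends on context depth, batch settings, and how often the n-gram speculative drafts are accepted. The function name `estimate_seconds` and the 1,000-token reply length are hypothetical.

```python
# Back-of-envelope latency from the post's 400/11 pp/tg figures (tokens/s),
# assuming both rates stay constant for the whole request.
PP_TOKENS_PER_S = 400.0  # prompt processing, reported at ~100K filled context
TG_TOKENS_PER_S = 11.0   # token generation, reported at the same depth


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp: float = PP_TOKENS_PER_S,
                     tg: float = TG_TOKENS_PER_S) -> tuple[float, float]:
    """Return (prefill_seconds, generation_seconds) at constant rates."""
    return prompt_tokens / pp, output_tokens / tg


if __name__ == "__main__":
    # Full 100K prompt processed from scratch, then a 1,000-token reply
    prefill, gen = estimate_seconds(100_000, 1_000)
    print(f"prefill ~{prefill:.0f} s, generation ~{gen:.0f} s")  # prefill ~250 s, generation ~91 s
```

The 250 s prefill only applies when the whole 100K prompt has to be processed from scratch. If llama-server can reuse a cached prompt prefix between turns, only the newly added tokens run at the pp rate, so under the same assumption a turn that appends 4,000 tokens would cost about 10 s of prefill.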


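Similar articles

Best settings for 48GB VRAM + Qwen 3.6 27B

Running Qwen3.6 35b a3b on 8GB VRAM and 32GB RAM with ~190k context

Configs for running Qwen 3.6 27B on 24GB VRAM: backend comparison, quantization choice, and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm)

What speeds is everyone getting with Qwen3.6 27b?

Qwen3.6 27B performs worse in vLLM than in llama.cpp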
