havenoammo/Qwen3.6-27B-MTP-UD-GGUF

Hugging Face Models Trending 2026/05/06 10:29 Model

qwen multi-token-prediction gguf unsloth speculative-decoding llama-cpp

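Summary

This Hugging Face repository provides GGUF files for Qwen3.6-27B in which Multi-Token Prediction (MTP) layers have been grafted onto the Unsloth UD XL quantizations. It also includes instructions for building llama.cpp with MTP support to enable speculative decoding.

Task: image-text-to-text. Tags: transformers, gguf, unsloth, qwen, qwen3_5, image-text-to-text, base_model:Qwen/Qwen3.6-27B, base_model:quantized:Qwen/Qwen3.6-27B, license:apache-2.0, endpoints_compatible, region:us, conversational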


Cached: 2026/05/10 18:35

Source: https://huggingface.co/havenoammo/Qwen3.6-27B-MTP-UD-GGUF

MTP Layers Grafted on Unsloth UD XL Quantizations

The GGUF files in this repository are Unsloth Dynamic 2.0 XL (UD XL) quantizations of Qwen3.6-27B with Multi-Token Prediction (MTP) layers grafted on top.

Base quantization: Unsloth UD XL. See Unsloth Dynamic 2.0 GGUFs (https://unsloth.ai/docs/basics/unsloth-dynamic-v2.0-gguf) for benchmarks.
MTP layers: taken from Radamanthys11/Qwen3.6-27B-MTP-Q8_0-GGUF (https://huggingface.co/Radamanthys11/Qwen3.6-27B-MTP-Q8_0-GGUF), stored at Q8_0 precision, then merged into the UD XL GGUFs.
Why Q8 for MTP? The draft heads are small relative to the base model, so Q8_0 keeps them nearly lossless while avoiding the cost of re-quantizing the whole model stack.
convert.py: the script used to graft the MTP layers onto the UD XL GGUFs. Adapted from this gist (https://gist.github.com/buzz/1c439684d5e3f36492ae9f64ef7e3f67).
27B_MTP.gguf: the raw Q8_0 MTP layer source file used as input to convert.py.
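
To make the "nearly lossless" claim concrete, here is a minimal sketch of Q8_0-style block quantization: each block of 32 weights shares one scale and stores signed 8-bit integers. It is a simplified illustration, not the code used by convert.py or llama.cpp. The function names are hypothetical, the sample weights are random, and the scale is kept as a Python float here whereas the real format stores it in half precision.

```python
import random

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_block(weights):
    """Quantize one block to (scale, int8 values). Hypothetical helper."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, quants


def dequantize_block(scale, quants):
    """Reconstruct approximate float weights from a quantized block."""
    return [scale * q for q in quants]


random.seed(0)
block = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]  # made-up weights
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)

# Rounding to the nearest step bounds the error by half a step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.6g}, max abs error={max_error:.3g}")

# Storage: 32 one-byte values plus a 2-byte scale per block.
bits_per_weight = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE
print(f"bits per weight: {bits_per_weight}")
```

The per-weight error is at most half of one quantization step, which is why 8-bit storage is usually considered safe for small auxiliary layers.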

To run these files with MTP, you need a custom build of llama.cpp that includes the MTP / speculative decoding support from PR #22673 (https://github.com/ggml-org/llama.cpp/pull/22673). Follow the steps below.

Building llama.cpp with MTP Support: Step by Step

1. Clone and enter the repo

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

2. Fetch the latest remote changes

git fetch origin

This pulls all new refs from the upstream repository, so that you are working against the latest tip of master.

3. Fetch PR #22673 as a local branch

git fetch origin pull/22673/head:pr-22673

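PR #22673 (https://github.com/ggml-org/llama.cpp/pull/22673) ("llama + spec: MTP Support") adds the speculative decoding / MTP infrastructure that lets llama-server consume Multi-Token Prediction heads. We pull it directly rather than waiting for it to be merged upstream.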

4. Checkout master and reset to latest remote

git checkout master
git reset --hard 5207d120e

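This gives you a clean starting point on upstream master and discards any local divergence. Note that the command as written pins master to commit 5207d120e rather than to wherever origin/master currently points.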

5. Merge the PR on top (non-fast-forward)

git merge --no-ff pr-22673 -m "Merge PR #22673 (https://github.com/ggml-org/llama.cpp/pull/22673): llama + spec: MTP Support"

The --no-ff flag preserves a merge commit, so that if the PR is later merged officially and changes, you can cleanly cherry-pick or revert.
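
Steps 1 to 5 can be collected into a small script. The sketch below only assembles the git commands from the steps above and prints them. The helper names are hypothetical, the merge message is shortened, and nothing is executed unless `run_steps` is called with `dry_run=False`.

```python
import subprocess

REPO_URL = "https://github.com/ggml-org/llama.cpp.git"
PR_NUMBER = 22673
BASE_COMMIT = "5207d120e"  # commit used in step 4 of this guide


def clone_step(repo_url=REPO_URL):
    """Step 1: the clone command (run from the parent directory)."""
    return ["git", "clone", repo_url]


def merge_steps(pr_number=PR_NUMBER, base_commit=BASE_COMMIT):
    """Steps 2-5: commands to run inside the llama.cpp checkout."""
    branch = f"pr-{pr_number}"
    message = f"Merge PR #{pr_number}: llama + spec: MTP Support"
    return [
        ["git", "fetch", "origin"],
        ["git", "fetch", "origin", f"pull/{pr_number}/head:{branch}"],
        ["git", "checkout", "master"],
        ["git", "reset", "--hard", base_commit],
        ["git", "merge", "--no-ff", branch, "-m", message],
    ]


def run_steps(steps, cwd=None, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Dry run: prints the commands without touching the network or the disk.
run_steps([clone_step()])
run_steps(merge_steps(), cwd="llama.cpp")
```

Keeping the PR number and base commit as parameters makes it easy to rebuild later against a newer master or an updated revision of the PR.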

6. Build `llama-server`

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release --target llama-server

This produces build/bin/llama-server.

7. Run the server with MTP enabled

./build/bin/llama-server \
-m path/to/qwen3.6-27b-ud-xl-mtp.gguf \
--spec-type mtp \
--spec-draft-n-max 3

`--spec-type mtp` tells llama.cpp to use the MTP heads baked into the GGUF. `--spec-draft-n-max 3` sets the maximum number of draft tokens per step (matching the model's 3 MTP layers).
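
For readers new to speculative decoding, the toy loop below illustrates the draft-and-verify idea behind these flags under greedy decoding: a cheap drafter proposes up to `n_max` tokens, the target model checks them, and the longest agreeing prefix is kept along with one token from the target. Both "models" are stand-in Python functions invented for this sketch. It does not reflect how llama.cpp or PR #22673 implement MTP internally, and a real engine verifies all drafts in one batched forward pass rather than one call per token.

```python
def target_next(tokens):
    """Stand-in target model: the next token is a fixed function of the last one."""
    return (tokens[-1] * 3 + 1) % 17


def draft_next(tokens):
    """Stand-in drafter: agrees with the target unless the last token is a multiple of 5."""
    guess = target_next(tokens)
    return guess if tokens[-1] % 5 else (guess + 1) % 17


def speculative_step(tokens, n_max):
    """Run one draft-and-verify step and return the tokens accepted in it."""
    # 1. Draft up to n_max tokens autoregressively with the cheap model.
    draft, context = [], list(tokens)
    for _ in range(n_max):
        token = draft_next(context)
        draft.append(token)
        context.append(token)

    # 2. Verify: keep drafted tokens while they match the target's own choice.
    accepted, context = [], list(tokens)
    for token in draft:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. All drafts accepted: the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


def generate(prompt, n_new, n_max):
    """Generate n_new tokens speculatively; returns (tokens, number of steps)."""
    tokens, steps = list(prompt), 0
    while len(tokens) - len(prompt) < n_new:
        tokens.extend(speculative_step(tokens, n_max))
        steps += 1
    return tokens[: len(prompt) + n_new], steps


def generate_plain(prompt, n_new):
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


spec_tokens, spec_steps = generate([1], n_new=12, n_max=3)
print("same output as plain decoding:", spec_tokens == generate_plain([1], 12))
print("verification steps used:", spec_steps)
```

The output is identical to plain greedy decoding by construction; the benefit is that fewer target-model steps are needed whenever the drafter guesses correctly.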

Read our How to Run Qwen3.6 Guide! (https://unsloth.ai/docs/models/qwen3.6)

See Unsloth Dynamic 2.0 GGUFs (https://unsloth.ai/docs/basics/unsloth-dynamic-v2.0-gguf) for our quantization benchmarks.

Developer role support, so Qwen3.6 works in Codex, OpenCode and more!
Qwen3.6 can now be run and fine-tuned in Unsloth Studio (https://unsloth.ai/docs/new/studio). Read our guide (https://unsloth.ai/docs/models/qwen3.6).
Tool-calling improvements: nested objects are parsed more reliably, so tool calls succeed more often.
Example of Qwen3.6 35B-A3B (4-bit GGUF) running tool calls in Unsloth Studio (image: qwen3.6 in unsloth studio)

Qwen3.6-27B

Qwen Chat (https://chat.qwen.ai/)

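This repository contains the model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we are pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.

Qwen3.6 Highlights

This release delivers substantial upgrades, particularly in:

Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
Thinking Preservation: we have introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.

Benchmark results (https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png)

For more details, please refer to our blog post Qwen3.6-27B (https://qwen.ai/blog?id=qwen3.6-27b).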

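Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model
- Number of Parameters: 27B
- Hidden Dimension: 5120
- Token Embedding: 248320 (Padded)
- Number of Layers: 64
- Hidden Layout: 16 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
- Gated DeltaNet:
  - Number of Linear Attention Heads: 48 for V and 16 for QK
  - Head Dimension: 128
- Gated Attention:
  - Number of Attention Heads: 24 for Q and 4 for KV
  - Head Dimension: 256
- Rotary Position Embedding Dimension: 64
- Feed Forward Network:
  - Intermediate Dimension: 17408
- LM Output: 248320 (Padded)
MTP: trained with multi-steps
Context Length: 262,144 tokens natively, extensible up to 1,010,000 tokens.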
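
The hidden layout can be unrolled to check that it accounts for all 64 layers. The short sketch below does that arithmetic and also derives the projection widths implied by the head counts; the layer labels are hypothetical names chosen for illustration, not identifiers from the model's configuration.

```python
# Layout from the model overview:
# 16 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN))
N_GROUPS = 16
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1

layers = []
for _ in range(N_GROUPS):
    layers += ["gated_deltanet"] * DELTANET_PER_GROUP
    layers += ["gated_attention"] * ATTENTION_PER_GROUP

n_deltanet = layers.count("gated_deltanet")
n_attention = layers.count("gated_attention")
print(len(layers), n_deltanet, n_attention)

# 0-based positions of the full-attention layers: every fourth layer.
attention_indices = [i for i, kind in enumerate(layers) if kind == "gated_attention"]
print(attention_indices[:4])

# Widths implied by head count x head dimension.
attn_q_width = 24 * 256    # Gated Attention query heads
attn_kv_width = 4 * 256    # Gated Attention key/value heads
delta_v_width = 48 * 128   # Gated DeltaNet value heads
delta_qk_width = 16 * 128  # Gated DeltaNet query/key heads
print(attn_q_width, attn_kv_width, delta_v_width, delta_qk_width)
```

Only one layer in four uses full attention, which is what keeps the per-token cache small relative to a 64-layer model built entirely from attention blocks.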

Benchmark Results

Language

Vision Language

Quickstart

For streamlined integration, we recommend using Qwen3.6 via APIs. Below is a guide to using Qwen3.6 via an OpenAI-compatible API.

Serving Qwen3.6

Qwen3.6 can be served via APIs with popular inference frameworks. The following shows example commands to launch OpenAI-compatible API servers for Qwen3.6 models.

Inference efficiency and throughput vary significantly across frameworks. We recommend using the latest framework versions to ensure optimal performance and compatibility. For production workloads or high-throughput scenarios, dedicated serving engines such as SGLang, KTransformers or vLLM are strongly recommended. The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen3.6 leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
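
To gauge how much the context window affects memory, here is a rough estimate of the KV cache for the 16 Gated Attention layers, using the head counts from the model overview. It assumes a 16-bit KV cache and that only the Gated Attention layers store per-token keys and values (the Gated DeltaNet layers are assumed to keep a fixed-size state, which is ignored here). Real servers add further overhead, so treat the numbers as a ballpark, not a measurement.

```python
N_ATTENTION_LAYERS = 16  # one Gated Attention layer per group of four layers
N_KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2      # assumption: fp16/bf16 KV cache


def kv_cache_bytes(n_tokens):
    """Bytes needed to cache keys and values for n_tokens of context."""
    # Factor 2 covers one key vector and one value vector per head.
    per_token = N_ATTENTION_LAYERS * 2 * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * n_tokens


for n_tokens in (262_144, 131_072, 32_768):
    gib = kv_cache_bytes(n_tokens) / 2**30
    print(f"{n_tokens:>7} tokens -> {gib:.1f} GiB per sequence")
```

Under these assumptions the cache grows linearly with context length, so halving the window halves this part of the memory budget.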

SGLang

SGLang (https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models. sglang>=0.5.10 is recommended for Qwen3.6 and can be installed in a fresh environment with the following command:

uv pip install sglang[all]

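See its documentation (https://docs.sglang.ai/get_started/install.html) for more details.

The following will create API endpoints at http://localhost:8000/v1:

Standard Version: the following command can be used to create an API endpoint with a maximum context length of 262,144 tokens using tensor parallelism on 8 GPUs.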

python -m sglang.launch_server --model-path Qwen/Qwen3.6-27B --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3

Tool Use: to support tool use, you can use the following command.

python -m sglang.launch_server --model-path Qwen/Qwen3.6-27B --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --tool-call-parser qwen3_coder

Multi-Token Prediction (MTP): the following command is recommended for MTP:

python -m sglang.launch_server --model-path Qwen/Qwen3.6-27B --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

For a detailed deployment guide, see the SGLang Qwen3.5 Cookbook (https://lmsysorg.mintlify.app/cookbook/llm/Qwen/Qwen3.5).

vLLM

vLLM (https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. vllm>=0.19.0 is recommended for Qwen3.6 and can be installed in a fresh environment with the following command:

uv pip install vllm --torch-backend=auto

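See its documentation (https://docs.vllm.ai/en/stable/getting_started/installation/index.html) for more details.

The following will create API endpoints at http://localhost:8000/v1:

Standard Version: the following command can be used to create an API endpoint with a maximum context length of 262,144 tokens using tensor parallelism on 8 GPUs.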
```
vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3
```

Tool Call: to support tool use, you can use the following command.

vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder

Multi-Token Prediction (MTP): the following command is recommended for MTP:

vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
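
The `--speculative-config` value is a JSON object, which is easy to mis-quote by hand. The sketch below builds the same command line programmatically, using only the flags shown above; the helper name and its parameters are hypothetical.

```python
import json
import shlex


def vllm_mtp_command(model="Qwen/Qwen3.6-27B", num_speculative_tokens=2, port=8000):
    """Assemble the vLLM MTP serve command from this section as one shell string."""
    spec_config = {
        "method": "qwen3_next_mtp",
        "num_speculative_tokens": num_speculative_tokens,
    }
    argv = [
        "vllm", "serve", model,
        "--port", str(port),
        "--tensor-parallel-size", "8",
        "--max-model-len", "262144",
        "--reasoning-parser", "qwen3",
        # Compact separators reproduce the JSON exactly as written in the command above.
        "--speculative-config", json.dumps(spec_config, separators=(",", ":")),
    ]
    # shlex.join quotes the JSON argument so the shell passes it through intact.
    return shlex.join(argv)


command = vllm_mtp_command()
print(command)
```

Generating the string this way keeps the JSON valid when `num_speculative_tokens` is changed for experimentation.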

Text-Only: the following command skips the vision encoder and multimodal configuration to free up memory for additional KV cache:

vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3 --language-model-only

For a detailed deployment guide, see the vLLM Qwen3.5 Recipe (https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html).

KTransformers

KTransformers (https://github.com/kvcache-ai/ktransformers) is a flexible framework for experiencing cutting-edge LLM inference optimizations with CPU-GPU heterogeneous computing. For running Qwen3.6 with KTransformers, see the KTransformers Deployment Guide (https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/Qwen3.5.md).

Hugging Face Transformers

Hugging Face Transformers includes a lightweight server that can be used for quick testing and moderate-load deployment. Qwen3.6 requires the latest transformers:

pip install "transformers[serving]"

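See its documentation (https://huggingface.co/docs/transformers/main/serving) for more details.

Please also make sure torchvision and pillow are installed. Then run transformers serve to launch a server with API endpoints at http://localhost:8000/v1; it will place the model on accelerators if they are available: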

transformers serve Qwen/Qwen3.6-27B --port 8000 --continuous-batching

Using Qwen3.6 via the Chat Completions API

The chat completions API is accessible via standard HTTP requests or OpenAI SDKs. Here, we show examples using the OpenAI Python SDK. Before starting, make sure the SDK is installed and that the API key and the API base URL are configured, e.g.:

pip install -U openai
# Set the following accordingly
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
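
Since the API is also reachable over plain HTTP, the following sketch builds the equivalent request with the Python standard library. It assumes the server exposes the usual OpenAI-style `/chat/completions` route under the base URL; the prompt and `max_tokens` value are placeholders. The request is only constructed here, and sending it (the commented-out lines) requires a running server.

```python
import json
import os
import urllib.request

# Same environment variables as above, with local defaults as a fallback.
base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:8000/v1")
api_key = os.environ.get("OPENAI_API_KEY", "EMPTY")

payload = {
    "model": "Qwen/Qwen3.6-27B",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,
}

request = urllib.request.Request(
    url=base_url.rstrip("/") + "/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)
print(request.get_method(), request.full_url)

# With a server running, send it like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```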

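We recommend the following sets of sampling parameters for generation:

Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Thinking mode for precise coding tasks (e.g., WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Instruct (or non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Please note that support for sampling parameters varies by inference framework.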
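
These presets can be kept in one place and expanded into keyword arguments for `client.chat.completions.create`. The sketch below is one possible arrangement, with hypothetical names. Passing `top_k` through `extra_body` follows the examples in this document; passing `min_p` and `repetition_penalty` the same way is an assumption that depends on the serving framework.

```python
# Values copied from the recommendations above.
SAMPLING_PRESETS = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                             presence_penalty=0.0, repetition_penalty=1.0),
    "thinking_coding": dict(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
                            presence_penalty=0.0, repetition_penalty=1.0),
    "instruct": dict(temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
                     presence_penalty=1.5, repetition_penalty=1.0),
}

# Parameters that are part of the standard Chat Completions request.
STANDARD_KEYS = {"temperature", "top_p", "presence_penalty"}


def sampling_kwargs(preset):
    """Split a preset into standard arguments plus an extra_body dict."""
    params = SAMPLING_PRESETS[preset]
    kwargs = {k: v for k, v in params.items() if k in STANDARD_KEYS}
    # Non-standard parameters travel in extra_body (server support is an assumption).
    kwargs["extra_body"] = {k: v for k, v in params.items() if k not in STANDARD_KEYS}
    return kwargs


print(sampling_kwargs("instruct"))
```

A call would then look like `client.chat.completions.create(model=..., messages=..., **sampling_kwargs("thinking_coding"))`.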

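Qwen3.6 models operate in thinking mode by default, generating thinking content delimited by `<think>\n...</think>\n\n` before producing the final response. To disable thinking content and obtain a direct response, refer to the examples here (https://huggingface.co/havenoammo/Qwen3.6-27B-MTP-UD-GGUF#instruct-or-non-thinking-mode).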
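
When the raw text is returned with the thinking content inline, it can be separated from the final answer with a few lines of string handling. The sketch below assumes the thinking content is wrapped in `<think>` and `</think>` tags; the helper name and the sample string are made up for illustration. Servers started with a reasoning parser may already return the reasoning in a separate field, in which case this step is unnecessary.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Return (thinking, answer) from raw model output.

    If no closing tag is present, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    thinking = head.replace(open_tag, "", 1).strip()
    return thinking, tail.strip()


# Made-up sample output in the assumed format.
raw = "<think>\nReverse the string character by character.\n</think>\n\n6.3newQ evol I"
thinking, answer = split_thinking(raw)
print(repr(thinking))
print(repr(answer))
```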

Text-Only Input

from openai import OpenAI

# Configured by environment variables
client = OpenAI()

messages = [
    {"role": "user", "content": "Type \"I love Qwen3.6\" backwards"},
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.6-27B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=0.0,
    extra_body={
        "top_k": 20,
    },
)

print("Chat response:", chat_response)
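
The expected answer to this prompt is easy to compute locally, which gives a quick sanity check on the model's reply. The snippet below is an illustrative addition with a hypothetical helper name, not part of the original example.

```python
prompt_text = "I love Qwen3.6"
expected = prompt_text[::-1]  # reverse the string character by character
print(expected)


def reply_is_correct(reply):
    """True if the reversed string appears anywhere in the model's reply."""
    return expected in reply
```

For the response object above, the text to check would be `chat_response.choices[0].message.content`.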

Image Input

from openai import OpenAI

# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/CI_Demo/mathv-1327.jpg"
                }
            },
            {
                "type": "text",
                "text": "The centres of the four illustrated circles are in the corners of the square. The two big circles touch each other and also the two little circles. With which factor do you have to multiply the radii of the little circles to obtain the radius of the big circles?\nChoices:\n(A) $\\frac{2}{9}$\n(B) $\\sqrt{5}$\n(C) $0.8 \\cdot \\pi$\n(D) 2.5\n(E) $1+\\sqrt{2}$"
            }
        ]
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.6-27B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=0.0,
    extra_body={
