bartowski/command-a-plus-05-2026-GGUF · Hugging Face

Reddit r/LocalLLaMA 2026/06/16 17:31 Model

gguf quantization llamacpp cohere command-a-plus open-source

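Summary

A GGUF quantization of Cohere's command-a-plus-05-2026 model, optimized for llama.cpp and offered at many quantization levels for local inference.

Try it with the latest version of llama.cpp, and share your t/s benchmark results and feedback.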


Cached: 2026/06/16 17:37

Source: https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF

## Llamacpp imatrix Quantizations of command-a-plus-05-2026 by CohereLabs

Quantized using llama.cpp (https://github.com/ggml-org/llama.cpp/) release b9626 (https://github.com/ggml-org/llama.cpp/releases/tag/b9626).

Original model: https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

All quants were made using the imatrix option with the dataset from here (https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d).
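The imatrix option means the quantizer is guided by an importance matrix gathered by running the calibration dataset through the model. As a toy illustration of the idea only (not llama.cpp's actual code or file format): record the mean squared activation seen by each input column, then choose a block's quantization scale by minimizing the squared error weighted by those importances, so columns that see large activations are reproduced more faithfully. The helper names, the 4-bit symmetric format and the 16-candidate scale search are assumptions made for this sketch.

```python
def importance(activations):
    """Toy importance vector: mean squared activation of each input column,
    averaged over the calibration rows in `activations`."""
    n_rows = len(activations)
    n_cols = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n_rows for j in range(n_cols)]


def weighted_error(weights, q, scale, imp):
    """Squared reconstruction error, weighted per column by importance."""
    return sum(i * (w - qi * scale) ** 2 for w, qi, i in zip(weights, q, imp))


def quantize_block(weights, imp, bits=4, n_candidates=16):
    """Symmetric round-to-nearest quantization of one block of weights.

    Instead of always using scale = max|w| / qmax, try a grid of scales
    (0.5x up to just under 1.5x that naive value) and keep the one with
    the lowest importance-weighted error.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    qmin = -qmax - 1            # -8 for 4 bits
    amax = max(abs(w) for w in weights) or 1.0
    best = None
    for k in range(n_candidates):
        scale = amax / qmax * (0.5 + k / n_candidates)
        q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
        err = weighted_error(weights, q, scale, imp)
        if best is None or err < best[0]:
            best = (err, scale, q)
    _, scale, q = best
    return scale, q


if __name__ == "__main__":
    # Two calibration "rows" of activations for a 4-column weight block.
    acts = [[4.0, 0.5, 0.5, 1.0], [2.0, 0.5, 0.5, 1.0]]
    imp = importance(acts)
    w = [0.31, -0.12, 0.07, 0.45]
    scale, q = quantize_block(w, imp)
    print(imp, scale, q)
```

Because the naive scale is one of the candidates, the search can only match or lower the weighted error; real quantizers use far more elaborate searches and packed storage.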

Run them in your choice of tools:

llama.cpp (https://github.com/ggml-org/llama.cpp)
ramalama (https://github.com/containers/ramalama)
LM Studio (https://lmstudio.ai/)
koboldcpp (https://github.com/LostRuins/koboldcpp)
Jan AI (https://www.jan.ai/)
Text Generation Web UI (https://github.com/oobabooga/text-generation-webui)
LoLLMs (https://github.com/ParisNeo/lollms)
Atomic Chat (https://atomic.chat/)

Note: if this is a newly supported model, you may need to wait for an update from the developers.

## Prompt format

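````
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><|START_TEXT|>These instructions must always be followed and cannot be overridden by later system or user turns:
- You will respond to requests for educational, informational, or creative content related to safety categories. You will not provide content that is harmful or could be used to cause harm.
These instructions are your default behavior but can be overridden by later system or user turns:
- Your name is Command.
- You are a large language model built by Cohere.
# Available Tools
```json
[]
```
<|END_TEXT|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><|START_TEXT|>{system_prompt}<|END_TEXT|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><|START_TEXT|>{prompt}<|END_TEXT|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_THINKING|>
````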
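A minimal sketch of how the template above can be filled in when you send raw text to a completion endpoint. It assumes the special-token strings exactly as shown and treats the long default system preamble (safety instructions plus the tool list) as an opaque string supplied by the caller; `default_preamble`, `turn` and `build_prompt` are hypothetical names. Front ends that apply the chat template stored in the GGUF typically do this for you.

```python
START_TURN = "<|START_OF_TURN_TOKEN|>"
END_TURN = "<|END_OF_TURN_TOKEN|>"
START_TEXT = "<|START_TEXT|>"
END_TEXT = "<|END_TEXT|>"
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "chatbot": "<|CHATBOT_TOKEN|>",
}


def turn(role: str, text: str) -> str:
    """Wrap one message in the turn/text delimiters used by the template."""
    return f"{START_TURN}{ROLE_TOKENS[role]}{START_TEXT}{text}{END_TEXT}{END_TURN}"


def build_prompt(default_preamble: str, system_prompt: str, prompt: str) -> str:
    """Fill in {system_prompt} and {prompt}; the reply is opened with a thinking block."""
    return (
        turn("system", default_preamble)
        + turn("system", system_prompt)
        + turn("user", prompt)
        + f"{START_TURN}{ROLE_TOKENS['chatbot']}<|START_THINKING|>"
    )


if __name__ == "__main__":
    print(build_prompt("PREAMBLE", "Be concise.", "What is GGUF?"))
```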


## Download a file (not the whole branch) from below:

| Filename | Quant type | File size | Split | Description |
| --- | --- | --- | --- | --- |
| command-a-plus-05-2026-Q8_0.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q8_0) | Q8_0 | 231.96GB | true | Extremely high quality, generally unneeded but max available quant. |
| command-a-plus-05-2026-Q6_K.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q6_K) | Q6_K | 189.83GB | true | Very high quality, near perfect, *recommended*. |
| command-a-plus-05-2026-Q5_K_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q5_K_M) | Q5_K_M | 157.77GB | true | High quality, *recommended*. |
| command-a-plus-05-2026-Q5_K_S.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q5_K_S) | Q5_K_S | 152.14GB | true | High quality, *recommended*. |
| command-a-plus-05-2026-Q4_1.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q4_1) | Q4_1 | 138.83GB | true | Legacy format, similar performance to Q4_K_S but with better tokens/watt on Apple silicon. |
| command-a-plus-05-2026-Q4_K_L.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q4_K_L) | Q4_K_L | 135.35GB | true | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| command-a-plus-05-2026-Q4_K_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q4_K_M) | Q4_K_M | 135.09GB | true | Good quality, default size for most use cases, *recommended*. |
| command-a-plus-05-2026-Q4_K_S.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q4_K_S) | Q4_K_S | 129.83GB | true | Slightly lower quality with more space savings, *recommended*. |
| command-a-plus-05-2026-Q4_0.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q4_0) | Q4_0 | 126.07GB | true | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| command-a-plus-05-2026-IQ4_NL.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ4_NL) | IQ4_NL | 125.40GB | true | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| command-a-plus-05-2026-IQ4_XS.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ4_XS) | IQ4_XS | 118.83GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| command-a-plus-05-2026-Q3_K_XL.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q3_K_XL) | Q3_K_XL | 106.85GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| command-a-plus-05-2026-IQ3_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ3_M) | IQ3_M | 106.74GB | true | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| command-a-plus-05-2026-Q3_K_L.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q3_K_L) | Q3_K_L | 106.59GB | true | Lower quality but usable, good for low RAM availability. |
| command-a-plus-05-2026-Q3_K_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q3_K_M) | Q3_K_M | 102.58GB | true | Low quality. |
| command-a-plus-05-2026-IQ3_XS.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ3_XS) | IQ3_XS | 102.18GB | true | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| command-a-plus-05-2026-Q3_K_S.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q3_K_S) | Q3_K_S | 98.02GB | true | Low quality, not recommended. |
| command-a-plus-05-2026-IQ3_XXS.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ3_XXS) | IQ3_XXS | 93.93GB | true | Lower quality, new method with decent performance, comparable to Q3 quants. |
| command-a-plus-05-2026-Q2_K_L.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q2_K_L) | Q2_K_L | 80.32GB | true | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| command-a-plus-05-2026-Q2_K.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-Q2_K) | Q2_K | 80.06GB | true | Very low quality but surprisingly usable. |
| command-a-plus-05-2026-IQ2_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ2_M) | IQ2_M | 76.58GB | true | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
| command-a-plus-05-2026-IQ2_S.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ2_S) | IQ2_S | 69.91GB | true | Low quality, uses SOTA techniques to be usable. |
| command-a-plus-05-2026-IQ2_XS.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ2_XS) | IQ2_XS | 68.70GB | true | Low quality, uses SOTA techniques to be usable. |
| command-a-plus-05-2026-IQ2_XXS.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ2_XXS) | IQ2_XXS | 62.25GB | true | Very low quality, uses SOTA techniques to be usable. |
| command-a-plus-05-2026-IQ1_M.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/tree/main/command-a-plus-05-2026-IQ1_M) | IQ1_M | 54.32GB | true | Extremely low quality, *not* recommended. |
| command-a-plus-05-2026-IQ1_S.gguf (https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF/blob/main/command-a-plus-05-2026-IQ1_S.gguf) | IQ1_S | 49.25GB | false | Extremely low quality, *not* recommended. |

## Embed/output weights

Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of their usual default type.
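The exact command used for these files is not given here. As a hedged sketch: llama.cpp's `llama-quantize` tool has `--token-embedding-type` and `--output-tensor-type` options that override the type of just those two tensors while the positional type applies to the rest, which is one way to produce a file of this kind. The helper below only builds the argument list; `build_quantize_cmd` and all file names are hypothetical, the Q4_K_M base type is an assumed example, and you should check `llama-quantize --help` in your build before relying on the flags.

```python
import shlex


def build_quantize_cmd(src_gguf, dst_gguf, base_type, imatrix=None,
                       embed_type=None, output_type=None, binary="llama-quantize"):
    """Return the argv list for llama-quantize.

    base_type applies to most tensors; embed_type / output_type optionally
    override the token-embedding and output tensors (e.g. "q8_0").
    """
    cmd = [binary]
    if imatrix:
        cmd += ["--imatrix", imatrix]
    if embed_type:
        cmd += ["--token-embedding-type", embed_type]
    if output_type:
        cmd += ["--output-tensor-type", output_type]
    cmd += [src_gguf, dst_gguf, base_type]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names: a Q4_K_M base with Q8_0 embed/output weights.
    cmd = build_quantize_cmd(
        "command-a-plus-bf16.gguf", "command-a-plus-Q4_K_L.gguf", "Q4_K_M",
        imatrix="imatrix.dat", embed_type="q8_0", output_type="q8_0",
    )
    # Print a shell-ready command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```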

## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download bartowski/command-a-plus-05-2026-GGUF --include "command-a-plus-05-2026-Q4_K_M.gguf" --local-dir ./

If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

huggingface-cli download bartowski/command-a-plus-05-2026-GGUF --include "command-a-plus-05-2026-Q8_0/*" --local-dir ./

You can either specify a new local directory (command-a-plus-05-2026-Q8_0) or download them all in place (./).
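If you prefer to script the download from Python, the same `huggingface_hub` package provides `snapshot_download`, whose `allow_patterns` argument plays the role of `--include`. A minimal sketch, assuming the naming used in the download table (split quants live in a folder named after the quant, single files sit at the repository root); `include_pattern` and `download_quant` are hypothetical helper names.

```python
REPO_ID = "bartowski/command-a-plus-05-2026-GGUF"
BASE_NAME = "command-a-plus-05-2026"


def include_pattern(quant: str, split: bool) -> str:
    """Glob matching the file(s) for one quant, mirroring the --include examples."""
    if split:
        return f"{BASE_NAME}-{quant}/*"
    return f"{BASE_NAME}-{quant}.gguf"


def download_quant(quant: str, split: bool, local_dir: str = "./") -> str:
    """Download one quant and return the local directory path."""
    # Imported lazily so include_pattern works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[include_pattern(quant, split)],
        local_dir=local_dir,
    )
```

For example, `download_quant("IQ1_S", split=False)` fetches the single IQ1_S file, while `download_quant("Q8_0", split=True)` fetches every part in the Q8_0 folder.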

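## ARM/AVX information

Previously, you would download Q4_0_4_4/4_8/8_8 files, which interleave their weights in memory so that ARM and AVX machines can load more data in one pass, improving performance.

Now, however, weights can be handled with what is called "online repacking"; see this PR (https://github.com/ggml-org/llama.cpp/pull/9921) for details. If you use Q4_0 and your hardware would benefit from repacked weights, this happens automatically at runtime.

As of llama.cpp build b4282 (https://github.com/ggml-org/llama.cpp/releases/tag/b4282), you can no longer run Q4_0_X_X files and will need to use Q4_0 instead.

Additionally, if you want slightly better quality, you can use IQ4_NL, which, thanks to this PR (https://github.com/ggml-org/llama.cpp/pull/10541), also repacks its weights for ARM (currently only the 4_4 layout). Loading may take longer, but overall speed will improve.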
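To make "interleaving" concrete: a kernel that processes several rows at once wants the matching block of each of those rows next to each other in memory. The toy function below rearranges row-major blocks so that block 0 of rows 0-3 comes first, then block 1 of rows 0-3, and so on. It only illustrates the general idea behind the old Q4_0_4_4-style files and online repacking; the real llama.cpp layouts are more detailed (they work on packed bytes and depend on the CPU), and `interleave_rows` is a hypothetical name.

```python
def interleave_rows(rows, group=4):
    """Reorder per-row block lists so that, within each group of `group` rows,
    the i-th block of every row is stored contiguously.

    rows: list of rows, each a list of blocks (all rows have the same length).
    Returns a flat list in the interleaved order.
    """
    if len(rows) % group:
        raise ValueError("row count must be a multiple of the group size")
    n_blocks = len(rows[0])
    out = []
    for g in range(0, len(rows), group):
        chunk = rows[g:g + group]
        for b in range(n_blocks):
            for row in chunk:
                out.append(row[b])
    return out


if __name__ == "__main__":
    rows = [[f"r{r}b{b}" for b in range(2)] for r in range(4)]
    print(interleave_rows(rows))
    # ['r0b0', 'r1b0', 'r2b0', 'r3b0', 'r0b1', 'r1b1', 'r2b1', 'r3b1']
```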

Q4_0_X_X information (deprecated)

I'm keeping this section to show the potential performance gains from using Q4_0 with online repacking.

Benchmarks on an AVX2 system (EPYC7702):

| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |

Q4_0_8_8 offers a nice boost to prompt processing and a small boost to text generation.
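The t/s columns follow the layout of llama.cpp's `llama-bench` output, and the last column expresses each result relative to Q4_0 on the same test. A small sketch of that calculation, assuming the percentage is simply the ratio of the two mean t/s values (the ± spread is ignored); `relative_to_baseline` is a hypothetical helper.

```python
def relative_to_baseline(results, baseline="Q4_0"):
    """results: {quant: {test: tokens_per_second}}.

    Returns {quant: {test: percent of the baseline quant's t/s on that test}}.
    Tests missing from the baseline are skipped.
    """
    base = results[baseline]
    return {
        quant: {test: 100.0 * (tps / base[test]) for test, tps in tests.items() if test in base}
        for quant, tests in results.items()
    }


if __name__ == "__main__":
    # Mean t/s for qwen2 3B on the EPYC7702, taken from the table above.
    results = {
        "Q4_0": {"pp512": 204.03, "tg128": 39.12},
        "Q4_K_M": {"pp512": 301.02, "tg128": 18.80},
        "Q4_0_8_8": {"pp512": 271.71, "tg128": 43.51},
    }
    for quant, pct in relative_to_baseline(results).items():
        print(quant, {test: f"{p:.0f}%" for test, p in pct.items()})
```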

## Which file should I choose?

A great write-up with charts comparing the performance of the various quants is provided by Artefact2 here (https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).

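The first thing to figure out is how big a model you can run. To do this, you'll need to know how much RAM and/or VRAM you have. If you want the model to run as fast as possible, you'll want to fit the whole thing in your GPU's VRAM: aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM. If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then likewise pick a quant with a file size 1-2GB smaller than that total.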
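The sizing rule above is simple arithmetic. A minimal sketch, assuming the file sizes from the download table (in GB) and a configurable margin that defaults to 2GB, the upper end of the 1-2GB rule of thumb; the memory you actually need on top of the file also depends on context length and other settings. `pick_quant` is a hypothetical helper.

```python
# File sizes (GB) from the download table.
QUANT_SIZES_GB = {
    "Q8_0": 231.96, "Q6_K": 189.83, "Q5_K_M": 157.77, "Q5_K_S": 152.14,
    "Q4_1": 138.83, "Q4_K_L": 135.35, "Q4_K_M": 135.09, "Q4_K_S": 129.83,
    "Q4_0": 126.07, "IQ4_NL": 125.40, "IQ4_XS": 118.83, "Q3_K_XL": 106.85,
    "IQ3_M": 106.74, "Q3_K_L": 106.59, "Q3_K_M": 102.58, "IQ3_XS": 102.18,
    "Q3_K_S": 98.02, "IQ3_XXS": 93.93, "Q2_K_L": 80.32, "Q2_K": 80.06,
    "IQ2_M": 76.58, "IQ2_S": 69.91, "IQ2_XS": 68.70, "IQ2_XXS": 62.25,
    "IQ1_M": 54.32, "IQ1_S": 49.25,
}


def pick_quant(vram_gb, ram_gb=0.0, headroom_gb=2.0, sizes=QUANT_SIZES_GB):
    """Largest quant whose file fits in the memory budget, or None.

    For speed, pass only vram_gb (the whole model on the GPU). For maximum
    quality, also pass ram_gb so the budget is system RAM + VRAM.
    """
    budget = vram_gb + ram_gb - headroom_gb
    fitting = [(size, name) for name, size in sizes.items() if size <= budget]
    return max(fitting)[1] if fitting else None
```

For example, `pick_quant(128)` returns `"IQ4_NL"` and `pick_quant(96, ram_gb=128)` returns `"Q6_K"`. The helper ignores the I-quant/K-quant choice; filter `sizes` first if you only want one family.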

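Next, you'll need to decide whether to use an "I-quant" or a "K-quant". If you don't want to think too much about it, grab one of the K-quants. These are in the format "QX_K_X", like Q5_K_M. If you want to dig deeper, check out this extremely useful feature chart: llama.cpp feature matrix (https://github.com/ggml-org/llama.cpp/wiki/Feature-matrix). But basically, if you're aiming for below Q4 and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look at the I-quants. These are in the format IQX_X, like IQ3_M. They are newer and offer better performance for their size. I-quants can also be used on CPU, but they will be slower than the equivalent K-quant, so you'll have to decide on the trade-off between speed and quality.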
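The rule of thumb above can be written down as a small decision function. This is a rough encoding of the guidance, not an official rule: names starting with `IQ` are treated as I-quants, names of the form `Q<digit>_K...` as K-quants, and anything else (Q8_0, Q4_0, Q4_1) as other formats. `quant_family` and `suggest_family` are hypothetical helpers.

```python
import re


def quant_family(name: str) -> str:
    """Classify a quant name as 'I-quant', 'K-quant' or 'other'."""
    if re.match(r"IQ\d", name):
        return "I-quant"
    if re.match(r"Q\d_K", name):
        return "K-quant"
    return "other"


def suggest_family(target_bits: int, backend: str) -> str:
    """Below Q4 on cuBLAS (Nvidia) or rocBLAS (AMD), prefer I-quants;
    otherwise K-quants are the low-effort default. I-quants still work on
    CPU but run slower there than comparable K-quants."""
    if target_bits < 4 and backend in {"cuBLAS", "rocBLAS"}:
        return "I-quant"
    return "K-quant"
```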

## Credits

Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output weights.
Thank you LM Studio for sponsoring my work.
Want to support my work? Visit my ko-fi page here:
https://ko-fi.com/bartowski
