Command A Plus GGUFs Released

Reddit r/LocalLLaMA 2026/06/15 03:11 Models

command-a-plus gguf cohere open-source quantization agentic multilingual

Summary

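GGUF quantizations of Cohere's Command A+ model (25B active / 218B total parameters, Apache 2.0) have been published for local inference. The model is optimized for agentic and multilingual tasks.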

This weekend, llama.cpp added support for Command A Plus and North Mini Code. Unsloth has GGUFs for North Mini Code, but I couldn't find anyone offering GGUFs of the latest Command A Plus, so I converted and quantized it myself!
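As a rough sanity check on the quantized files, the sizes listed in the repository's file table can be converted into an approximate bits-per-weight figure. The sketch below assumes the 218B total-parameter count stated in the model card and ignores GGUF metadata and any tensors kept at higher precision, so the results are estimates only. The helper name `bits_per_weight` is illustrative and not part of any library.

```python
GIB = 2**30            # bytes per GiB
TOTAL_PARAMS = 218e9   # total parameter count, taken from the model card

# File sizes in GiB as listed in the GGUF repository table.
FILE_SIZES_GIB = {
    "bf16": 407,
    "q4_k_m": 124,
    "q4_k_s": 116,
    "iq4_xs": 110,
    "q3_k_m": 98,
}


def bits_per_weight(size_gib, n_params=TOTAL_PARAMS):
    """Approximate average bits stored per parameter for a file of size_gib."""
    return size_gib * GIB * 8 / n_params


for name, size in FILE_SIZES_GIB.items():
    print(f"{name:8s} {size:4d} GiB  ~{bits_per_weight(size):.2f} bits/weight")
```

The bf16 file comes out at roughly 16 bits per weight, which is what an unquantized bf16 export should give, so the listed sizes are consistent with the stated parameter count.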


Cached: 2026/06/15 09:06

coder543/command-a-plus-05-2026-gguf · Hugging Face
Source: https://huggingface.co/coder543/command-a-plus-05-2026-gguf

## Command A+ GGUFs

| Filename | Size (GiB) |
|---|---|
| command-a-plus-05-2026-bf16.gguf | 407 |
| command-a-plus-05-2026-q4_k_m.gguf | 124 |
| command-a-plus-05-2026-q4_k_s.gguf | 116 |
| command-a-plus-05-2026-iq4_xs.gguf | 110 |
| command-a-plus-05-2026-q3_k_m.gguf | 98 |

> Note: These GGUF files are text-only and do not support image input.

## Model Card for Command A+

## Model Summary

Command A+ is an open-source model with 25B active parameters and 218B total parameters. It is optimized for agentic, multilingual, and reasoning-heavy tasks, with a focus on enterprise-grade performance, and it supports vision input for processing images.

- Developed by: Cohere (https://cohere.com/) and Cohere Labs (https://cohere.com/research)
- Point of contact: Cohere Labs (https://cohere.com/research)
- License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
- Model: command-a-plus-05-2026
- Model size: 25B active parameters, 218B total parameters
- Context length: 128K input

For more details about the model, see our blog post (http://cohere.com/blog/command-a-plus). You can try Command A+ in our hosted Hugging Face Space (https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026) before downloading the weights.

### Available Quantizations

The following quantizations are available, along with example minimum GPU requirements. All three quantizations differ only minimally in benchmark quality and performance. We recommend the W4A4 (https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4) quantization for most users: it performs well on speed and latency while needing a smaller hardware footprint. See our blog post (http://cohere.com/blog/command-a-plus) for more details.

### Usage

Transformers

Install transformers from the source repository that includes the changes required for this model.

```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForImageTextToText

model_id = "CohereLabs/command-a-plus-05-2026-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

# Format the message with the command-a-plus-05-2026-bf16 chat template
messages = [{"role": "user", "content": "What has keys but can't open locks?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```

After running this, you should get output similar to the following, where the thinking is generated between `<|START_THINKING|>` and `<|END_THINKING|>`:

`<|START_THINKING|>The user asks a riddle: "What has keys but can't open locks?" The answer is a piano (or keyboard). So respond with the answer.<|END_THINKING|>`

You can also call the model directly through the transformers pipeline abstraction:

```python
from transformers import AutoTokenizer, pipeline

model_id = "CohereLabs/command-a-plus-05-2026-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model_id,
    dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the Transformer architecture"},
]

# Rendered prompt, useful for inspection only; the pipeline applies the
# chat template itself when it is given a list of messages.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = pipe(
    messages,
    max_new_tokens=300,
)
# The last entry of generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])
```

vLLM

You can also run the model with vLLM. Command A+ support requires `vllm>=0.21.0`, and parsing responses accurately also requires Cohere's `melody` library (https://pypi.org/project/cohere-melody/).

```shell
# Quote the version specifiers so the shell does not treat ">" as a redirect
uv pip install "vllm>=0.21.0"
uv pip install transformers
uv pip install "cohere_melody>=0.9.0"
```

Then start the vLLM server with the following command:

```shell
# For B200; adjust tp to match your hardware
vllm serve CohereLabs/command-a-plus-05-2026-bf16 -tp 4 --tool-call-parser cohere_command4 --reasoning-parser cohere_command4 --enable-auto-tool-choice
```

## Model Details

Input: text and images.

Output: the model generates text.

Model architecture: Command A+ is a decoder-only sparse mixture-of-experts (MoE) Transformer. It has 25B active parameters and 218B total parameters, with 128 experts of which 8 are activated per token, plus one shared expert used by all tokens. The attention layers interleave sliding-window attention layers with rotary position embeddings and global attention layers without position embeddings at a 3:1 ratio, a design first introduced in Command A. The sparse MoE layers are trained fully dropless and use a token-choice router. We use additive-bias-based load balancing to encourage an even token load across experts, and we replace the softmax router activation with a normalized sigmoid over each token's top-k expert logits.

Supported languages: the model was trained on 48 languages: English, Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Greek, Spanish, Estonian, Persian, Finnish, Filipino, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Icelandic, Italian, Japanese, Korean, Lithuanian, Latvian, Malay, Maltese, Dutch, Norwegian, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Chinese.

Context length: Command A+ supports an input context length of 128K and an output length of 64K.

### Tool Use Capabilities

Command A+ has been specifically trained for conversational tool use. This lets the model interact with external tools such as APIs, databases, or search engines.

In Transformers, tool use with Command A+ is supported through chat templates (https://huggingface.co/docs/transformers/main/en/chat_templating#advanced-tool-use--function-calling). We recommend providing tool descriptions as JSON schema.

Tool use example:

```python
from transformers import AutoTokenizer

model_id = "CohereLabs/command-a-plus-05-2026-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "query_daily_sales_report",
        "description": "Connects to a database to retrieve overall sales volumes and sales information for a given day.",
        "parameters": {
            "type": "object",
            "properties": {
                "day": {
                    "description": "Retrieves sales data for this day, formatted as YYYY-MM-DD.",
                    "type": "string",
                }
            },
            "required": ["day"],
        },
    },
}]

# Define conversation input
conversation = [
    {"role": "user", "content": "Can you provide a sales summary for 29th September 2023?"}
]

# Tokenize the tool-use prompt directly
input_ids = tokenizer.apply_chat_template(
    conversation=conversation,
    tools=tools,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
```

You can then generate from this input as usual. If the model generates a plan and tool calls, add them to the chat history like this:

```python
tool_call = {"name": "query_daily_sales_report", "arguments": {"day": "2023-09-29"}}
thinking = "I will use the query_daily_sales_report tool to find the sales summary for 29th September 2023."
conversation.append({
    "role": "assistant",
    "tool_calls": [{"id": "0", "type": "function", "function": tool_call}],
    "thinking": thinking,
})
```

Then call the tool and append the result as a dictionary under the tool role:

```python
# This must be a dictionary!
api_response_query_daily_sales_report = {
    "date": "2023-09-29",
    "summary": "Total Sales Amount: 10000, Total Units Sold: 250",
}

# Append the tool result
conversation.append({
    "role": "tool",
    "tool_call_id": "0",
    "content": api_response_query_daily_sales_report,
})
```

After that, you can call generate() again so the model uses the tool result in the chat. Note that this is only a brief introduction to tool calling; for more information, see the Transformers tool use documentation (https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling).

Tool use with citations:

Optionally, you can ask the model to include citation spans in its response, indicating where information came from, by passing `enable_citations=True` to `tokenizer.apply_chat_template()`. The generation then looks like this:

`On 29th September 2023, the total sales amount was 10000 and the total units sold were 250.`

When citations are enabled, the model associates pieces of text (called "spans") with the specific tool results that support them (called "sources"). Command A+ uses a pair of `<cs>` and `<ce>` tags to indicate that a span can be grounded on a list of sources, and lists those sources in the closing tag. For example, `<cs><ce>0:0,1:0</ce></cs>` span means that "span" is supported by results 1 and 2 from `tool_call_id=0` and by result 0 from `tool_call_id=1`. Sources from the same tool call are grouped together and listed as `{tool_call_id}:[{list of result indices}]`, and the groups are then joined with ",".

## Model Card Contact

To report errors in this model card or ask questions about its details, contact [[email protected]].

Try it now: you can try Command A+ in the Playground (https://dashboard.cohere.com/playground/chat?model=command-a-plus-05-2026). You can also use it in our dedicated Hugging Face Space (https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026).
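The router described under Model Details (8 of 128 experts per token, additive load-balancing bias, normalized sigmoid over the top-k logits) can be illustrated with a small pure-Python sketch. This is not Cohere's implementation. In particular, it assumes the bias only influences which experts are selected and not the gate weights, which the model card does not state, and it leaves out the shared expert. The function name `route_token` is hypothetical.

```python
import math
import random


def route_token(logits, bias, k=8):
    """Pick k experts for one token and return {expert_index: gate_weight}."""
    # Assumption: the load-balancing bias is added only when ranking experts,
    # so it changes which experts are chosen but not how they are weighted.
    ranked = sorted(range(len(logits)), key=lambda e: logits[e] + bias[e], reverse=True)
    chosen = ranked[:k]
    # Gate weights: sigmoid of the raw logits, normalized over the chosen experts.
    scores = {e: 1.0 / (1.0 + math.exp(-logits[e])) for e in chosen}
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # one token's router logits
bias = [0.0] * 128                                      # no load-balancing correction yet
gates = route_token(logits, bias)
print(len(gates), sum(gates.values()))
```

In a setup like this, raising the bias of an under-used expert makes it more likely to be selected on later tokens without changing the normalized weights of whichever experts are chosen.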
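The grouped source list described in the citation section, `{tool_call_id}:[{list of result indices}]` joined with ",", is simple to parse once it has been extracted from the closing tag. The sketch below assumes that exact serialization, for example `0:[1,2],1:[0]`. The real output format should be checked against actual model generations, and `parse_sources` is a hypothetical helper rather than part of transformers or Cohere's `melody` library.

```python
import re

# One group looks like "0:[1,2]": a tool_call_id, a colon, then bracketed indices.
_GROUP = re.compile(r"(\w+):\[([\d,\s]*)\]")


def parse_sources(source_list):
    """Map each tool_call_id in the source list to its list of result indices."""
    sources = {}
    for tool_call_id, indices in _GROUP.findall(source_list):
        sources[tool_call_id] = [int(i) for i in indices.split(",") if i.strip()]
    return sources


print(parse_sources("0:[1,2],1:[0]"))  # {'0': [1, 2], '1': [0]}
```

Tool call IDs are kept as strings because the tool use example above uses string IDs such as `"0"`.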

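Similar Articles

bartowski/command-a-plus-05-2026-GGUF · Hugging Face

Reddit r/LocalLLaMA

GGUF quantizations of Cohere's command-a-plus-05-2026 model, optimized for llama.cpp and offered at multiple quantization levels for local inference.

CohereLabs/command-a-plus-05-2026-bf16 · Hugging Face

Reddit r/LocalLLaMA

Cohere released Command A+, an open-source model with 25B active parameters (218B total), optimized for agentic, multilingual, and reasoning-heavy tasks, with vision input support, a 128K context, and an Apache 2.0 license.

CohereLabs/command-a-plus-05-2026-w4a4

Hugging Face Models Trending

CohereLabs released Command A+, an open-source model with 25B active parameters, optimized for agentic, multilingual, and reasoning tasks, with vision support and an Apache 2.0 license.

@aisearchio: GLM 5.2 GGUF is here! The 8-bit version is about half the size of the full model. Smaller versions coming soon https://huggingfa…

X AI KOLs Timeline

GLM 5.2 GGUF quantized models have been released; the 8-bit version is about half the size of the full model, and smaller versions are coming soon.

Calibrated 2-bit GGUF quantizations for agentic coding tasks (<10Gb)

Reddit r/LocalLLaMA

This post presents calibrated 2-bit GGUF quantizations of the Qwopus3.6-27B-Coder model for agentic coding tasks. Experiments show that the IQ2_M quantization (9.74 GiB) reaches a 63% pass rate on the SWE-rebench benchmark, on par with the Q5_K_M quantization at only half the model size.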
