Gemma 4 Chat Template Now Supports Preserved Thinking

Reddit r/LocalLLaMA 2026/06/08 13:35 Models

Summary

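Google's Gemma 4 31B IT model has an updated chat template that supports preserving the thinking process and improves null handling, reasoning preservation, and input validation.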

Cached: 2026/06/08 15:19

google/gemma-4-31B-it · Fix: chat template (null handling, reasoning preservation, turn-tag balancing, input validation)

Source: https://huggingface.co/google/gemma-4-31B-it/discussions/118
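
The four fixes named in the discussion title can be pictured with a small renderer. This is only a sketch in plain Python, not the actual Jinja template from the discussion: the turn and thinking markers, the `reasoning` message field, and the `render_chat` function are all hypothetical names chosen for illustration.

```python
# Hypothetical markers for illustration only; the real Gemma 4 template
# may use different tokens and a different message schema.
TURN_OPEN = "<turn:{role}>"
TURN_CLOSE = "</turn>"
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

ALLOWED_ROLES = {"system", "user", "assistant"}


def render_chat(messages, preserve_thinking=True, add_generation_prompt=False):
    """Render a message list into one prompt string (illustrative sketch)."""
    # Input validation: reject anything that is not a non-empty list of dicts.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")

    parts = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"message {i}: expected a dict")
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unsupported role {role!r}")

        # Null handling: a missing or None field becomes an empty string.
        content = msg.get("content") or ""
        reasoning = msg.get("reasoning") or ""
        if not isinstance(content, str) or not isinstance(reasoning, str):
            raise ValueError(f"message {i}: content and reasoning must be strings")

        parts.append(TURN_OPEN.format(role=role))
        # Reasoning preservation: keep earlier assistant thinking when enabled.
        if role == "assistant" and preserve_thinking and reasoning:
            parts.append(THINK_OPEN + reasoning + THINK_CLOSE)
        parts.append(content)
        # Turn-tag balancing: every opened turn is closed exactly once.
        parts.append(TURN_CLOSE)

    if add_generation_prompt:
        # Left open on purpose: the model writes the next assistant turn.
        parts.append(TURN_OPEN.format(role="assistant"))
    return "".join(parts)
```

With `preserve_thinking=False` the same history renders without the earlier reasoning, which is the behavior a template would typically fall back to when thinking is not kept across turns.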

Guide to using google/gemma-4-31B-it (supported libraries, inference engines, notebooks, and local apps)

Use the links below to get started quickly.

Libraries
- Transformers (https://huggingface.co/google/gemma-4-31B-it?library=transformers) How to use google/gemma-4-31B-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-31B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-4-31B-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-4-31B-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
- HuggingChat (https://huggingface.co/chat/models/google/gemma-4-31B-it)
Notebooks
- Google Colab (https://huggingface.co/google/gemma-4-31B-it/colab)
- Kaggle (https://huggingface.co/google/gemma-4-31B-it/kaggle)
- AMD Developer Cloud (https://huggingface.co/google/gemma-4-31B-it/amd)
Local Apps
- Settings (https://huggingface.co/settings/local-apps)
- vLLM (https://huggingface.co/google/gemma-4-31B-it?local-app=vllm) How to use google/gemma-4-31B-it with vLLM:

Install from pip and serve the model

# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-4-31B-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'
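
The same request can be sent from Python instead of curl. The sketch below uses only the standard library and mirrors the endpoint and payload shown above; the helper names `build_payload` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_payload(prompt, image_url, model="google/gemma-4-31B-it"):
    """Build the chat-completions body used in the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload, base_url="http://localhost:8000", timeout=120):
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI-style response shape.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a server already running, e.g. started with `vllm serve`.
    payload = build_payload(
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
    print(chat(payload))
```

Passing a different `base_url` (for example one ending in port 30000) would point the same helper at another OpenAI-compatible server.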

Use Docker

docker model run hf.co/google/gemma-4-31B-it

- SGLang (https://huggingface.co/google/gemma-4-31B-it?local-app=sglang) How to use google/gemma-4-31B-it with SGLang:

Install from pip and serve the model

# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-4-31B-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'

Use Docker images

docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<token>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-4-31B-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'

- Docker Model Runner (https://huggingface.co/google/gemma-4-31B-it?local-app=docker-model-runner) How to use google/gemma-4-31B-it with Docker Model Runner:

docker model run hf.co/google/gemma-4-31B-it

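Similar articles

Experimental "Preserve Thinking" Jinja template for Gemma4 31B in llama.cpp

Reddit r/LocalLLaMA

An experimental Jinja template for Gemma4 31B in llama.cpp that improves the stability of multi-turn tool calling by fixing common thinking-tag issues. Community feedback is welcome, but Google does not recommend using it.

PSA: Gemma 4 12B is not completely unusable for coding and tool calling, you need a special chat template

Reddit r/LocalLLaMA

Gemma 4 12B has known issues with tool calling and coding, but a custom chat template in llama.cpp resolves these errors. Users should build llama.cpp from source and apply this fix before evaluating the model's coding ability.

Google AI Edge Gallery v1.0.13 and v1.0.14 updates: Gemma 4 multi-token prediction, Pixel TPU support, experimental MCP, new skills, and chat history saving

Reddit r/LocalLLaMA

The Google AI Edge Gallery v1.0.13 and v1.0.14 updates add multi-token prediction support for Gemma 4, Pixel TPU optimizations, experimental MCP, new skills, and chat history saving, improving on-device generative AI capabilities.

Gemma 4 2B correctly handles structured JSON output, tool calling, and reasoning traces through Spring AI / LM Studio, including spotting a real Java bug in a code review

Reddit r/LocalLLaMA

A user tested Gemma 4 2B running locally through LM Studio and Spring AI for structured JSON output, tool calling, and reasoning traces. They found that it correctly identified a Java bug in a code review and performed comparably to larger models.

google/gemma-4-E4B-it-assistant

Hugging Face Models Trending

Google DeepMind released the Gemma 4 E4B instruction-tuned assistant model, which offers multimodal capabilities, reasoning improvements, and speculative decoding optimized for low-latency on-device applications.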
