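Gemma 4 Chat Template Now Supports Preserving Thinking

Reddit r/LocalLLaMA Models

Summary

Google's Gemma 4 31B IT model has an updated chat template that supports preserving the model's thinking and improves null handling, reasoning preservation, and input validation.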


Cached: 2026/06/08 15:19

google/gemma-4-31B-it · Fix: chat template (null handling, reasoning preservation, turn-tag balancing, input validation)

Source: https://huggingface.co/google/gemma-4-31B-it/discussions/118
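
The discussion title names four concerns: null handling, reasoning preservation, balanced turn tags, and input validation. The sketch below is a toy, pure-Python illustration of what those terms usually mean in a chat renderer. It is not the Gemma 4 template: the tag strings (`[turn:...]`, `[/turn]`, `[think]`), the `reasoning` field name, and the allowed role set are all hypothetical placeholders chosen for this example, and it handles plain-string content only.

```python
# Toy renderer, NOT the actual Gemma 4 chat template.
# Assumptions (hypothetical): tag strings, the "reasoning" key, the role set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def render_chat(messages, keep_reasoning=True):
    """Render a list of message dicts into one prompt string."""
    # Input validation: reject obviously malformed input up front.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")

    parts = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            # Fail loudly rather than emit a malformed prompt.
            raise ValueError(f"message {i}: unsupported role {role!r}")

        # Null handling: a missing or None field becomes an empty string.
        content = msg.get("content") or ""
        reasoning = msg.get("reasoning") or ""

        parts.append(f"[turn:{role}]\n")
        if role == "assistant" and reasoning and keep_reasoning:
            # Reasoning preservation: keep earlier thinking in the history.
            parts.append(f"[think]{reasoning}[/think]\n")
        parts.append(f"{content}\n")
        # Tag balance: every opening tag gets exactly one closing tag.
        parts.append("[/turn]\n")
    return "".join(parts)


history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4", "reasoning": "Add the two numbers."},
    {"role": "user", "content": None},  # None content must not render as "None"
]
prompt = render_chat(history)
print(prompt)
```

Calling `render_chat(history, keep_reasoning=False)` drops the `[think]` span while leaving the turn tags balanced, which is the kind of switch a template with optional reasoning preservation would need.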

Instructions for using google/gemma-4-31B-it with libraries, inference providers, notebooks, and local apps

Follow the links below to get started quickly.

    • Transformers (https://huggingface.co/google/gemma-4-31B-it?library=transformers) How to use google/gemma-4-31B-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-31B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-4-31B-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-4-31B-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
  • Inference

    • HuggingChat (https://huggingface.co/chat/models/google/gemma-4-31B-it)
  • Notebooks

    • Google Colab (https://huggingface.co/google/gemma-4-31B-it/colab)
    • Kaggle (https://huggingface.co/google/gemma-4-31B-it/kaggle)
    • AMD Developer Cloud (https://huggingface.co/google/gemma-4-31B-it/amd)
  • Local Apps

    • Settings (https://huggingface.co/settings/local-apps)
    • vLLM (https://huggingface.co/google/gemma-4-31B-it?local-app=vllm) How to use google/gemma-4-31B-it with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-4-31B-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'
Use Docker
docker model run hf.co/google/gemma-4-31B-it
  • SGLang (https://huggingface.co/google/gemma-4-31B-it?local-app=sglang) How to use google/gemma-4-31B-it with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-4-31B-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<token>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-4-31B-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in one sentence."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ]
}'
  • Docker Model Runner (https://huggingface.co/google/gemma-4-31B-it?local-app=docker-model-runner) How to use google/gemma-4-31B-it with Docker Model Runner:
docker model run hf.co/google/gemma-4-31B-it
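
The curl calls in the vLLM and SGLang entries above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `post_chat` are made up for this example, and the response shape noted in the comments assumes the server follows the usual OpenAI-compatible chat completions format.

```python
import json
import urllib.request


def build_chat_request(model, text, image_url):
    """Build the same JSON body that the curl examples send."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=60.0):
    """POST the payload to <base_url>/v1/chat/completions and parse the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


payload = build_chat_request(
    "google/gemma-4-31B-it",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# With a server running locally (port 8000 for vLLM, 30000 for SGLang in the
# commands above), the request could be sent like this:
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])  # assumes OpenAI-style reply
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a running server.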
