Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp

Reddit r/LocalLLaMA 05/23/26, 05:21 AM Tools

jinja-template llama-cpp gemma4 multi-turn-tool-calls experimental community-contribution

Summary

An experimental Jinja template for Gemma4 31B in llama.cpp that improves stability for multi-turn tool calls by fixing common thinking tag issues. Community feedback is welcome, but this is not recommended by Google.

[https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF/blob/main/gemma4-improved.jinja](https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF/blob/main/gemma4-improved.jinja)

Y'all are more than welcome to try it out and provide feedback. In my own testing in Pi-coding-agent, I no longer hit the "forgot to close thinking tag", "forgot to open thinking", or "closed thinking too early" problems. It's also more stable for multi-turn tool calls across multiple prompt turns. Disclaimer: this is NOT recommended by Google.
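If you want to check your own transcripts for the tag problems described above, something like the following rough sketch might help. It is not part of the linked template and not anything from llama.cpp. The marker strings `<think>` and `</think>` are placeholders assumed purely for illustration (swap in whatever markers your template actually emits), and `check_thinking_tags` is a hypothetical helper name.

```python
from typing import List

# ASSUMPTION: placeholder markers for illustration only. The real strings
# depend on the chat template in use; replace them before relying on this.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def check_thinking_tags(
    text: str, open_tag: str = OPEN_TAG, close_tag: str = CLOSE_TAG
) -> List[str]:
    """Return a list of structural problems with thinking markers in one turn."""
    problems: List[str] = []
    inside = False  # True while we are between an open and a close marker
    pos = 0
    while pos < len(text):
        next_open = text.find(open_tag, pos)
        next_close = text.find(close_tag, pos)
        if next_open == -1 and next_close == -1:
            break  # no more markers in the rest of the text
        # Handle whichever marker appears first.
        if next_close == -1 or (next_open != -1 and next_open < next_close):
            if inside:
                problems.append("opened thinking twice without closing")
            inside = True
            pos = next_open + len(open_tag)
        else:
            if not inside:
                problems.append("closed thinking without opening it")
            inside = False
            pos = next_close + len(close_tag)
    if inside:
        problems.append("opened thinking but never closed it")
    return problems


if __name__ == "__main__":
    for sample in ("<think>plan</think>answer", "<think>plan answer", "plan</think>answer"):
        print(repr(sample), check_thinking_tags(sample))
```

This only catches the structural cases (a missing open or a missing close). A "closed thinking too early" turn is still balanced, so spotting that one means reading the output yourself.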

