froggeric/Qwen-Fixed-Chat-Templates
Summary
This repository provides fixed Jinja chat templates for Qwen 3.5 and 3.6, addressing rendering errors, token waste, and missing features in the official templates for engines like LM Studio and llama.cpp.
View Cached Full Text
Cached at: 05/08/26, 09:08 AM
froggeric/Qwen-Fixed-Chat-Templates · Hugging Face
Source: https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#fixed-chat-templates-for-qwen-35–36Fixed Chat Templates for Qwen 3.5 & 3.6
2026-05-05— Reviewed against community merged templates (allanchan339, fakezeta). Confirmed all useful features already present;
from\_jsonstring-arg parsing not portable to C++ engines. Added auto-close unclosed thinking, thanks to allanchan339.
Drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates. Works in LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#why-you-need-thisWhy you need this
The official Qwen templates have bugs that break real usage:
ProblemImpactTool calls fail on C++ engines`developerrole rejectedModern APIs send it; the official template raises an errorEmpty thinking blocks spam contextEvery past turn gets wrapped in tags, even with nothing insideNo way to toggle thinkingYou’re stuck with whatever the model defaults toQwen 3.6:</thinking\>hallucinationModel sometimes generates the wrong closing tag; parser failsNo-user-query exception breaks tool callingraise\_exceptioncrashes agentic loops and resets in OpenClaw and similar runtimesUnclosed thinking before tool callModel starts reasoning then calls a tool without closing thinking block — malformed output
All seven are fixed here, plus a clean<\|think\_on\|\>/<\|think\_off\|\>toggle you can drop into any message.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#quick-installQuick install
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#lm-studioLM Studio
- Open your Qwen model in the right-side panel
- Scroll toPrompt Template
- Replace the template with the contents of
qwen3\.5/chat\_template\.jinjaorqwen3\.6/chat\_template\.jinja - Save
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#llamacpp–koboldcppllama.cpp / koboldcpp
--jinja --chat-template-file qwen3.6/chat_template.jinja
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#vllm–textgenvLLM / TextGen
Replace thechat\_templatestring in yourtokenizer\_config\.jsonwith the file contents.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#omlxoMLX
Overwritechat\_template\.jinjain your local model directory. Load with\-\-jinja. Remove anychat\_template\_kwargsoverrides — the template handles everything internally.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#which-file-do-i-useWhich file do I use?
The 3.6 template is a superset — it additionally handlespreserve\_thinking,</thinking\>hallucination recovery, and interrupted thought streams. If you’re on 3.6, use the 3.6 file.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#thinking-toggleThinking toggle
Drop<\|think\_on\|\>or<\|think\_off\|\>anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.
Fast answer, no reasoning:
System: You are a coding assistant. <|think_off|>
User: What's 2+2?
Deep reasoning:
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The tag syntax (<\|think\_on\|\>,<\|think\_off\|\>) uses Qwen’s control-token delimiters, so it will never collide with real text. Earlier community templates used/think, which broke legitimate paths likecd /mnt/project/think.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#pre-installed-modelsPre-installed models
These templates are already bundled with:
- froggeric/Qwen3.6-27B-MLX-8bit
- froggeric/Qwen3.6-27B-MLX-4bit
- froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit
- froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit
- froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit
- froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit
- froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit
- froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit
- froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit
- froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit
If you’re using one of those, you already have the template. This repo is for everyone else.
Technical details — what exactly was fixed## https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#tool-calls-on-c-enginesTool calls on C++ engines
The official template iterates tool call arguments with\|items:
{%- for key, value in tool_call.arguments|items %}
Python’s Jinja supports\|items. C++ runtimes (LM Studio, llama.cpp, MLX) do not — the template produces a rendering error instead of output. This template uses direct dictionary key lookups instead:
{%- for args_name in tool_call.arguments %}
{%- set args_value = tool_call.arguments[args_name] %}
It also replacesis sequencewithis iterable(stricter C++ runtimes require it), removes\|safewrappers (also Python-only), and handles arguments returned as raw strings instead of objects.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#developer-roledeveloperrole
The OpenAI-compatible API spec sendsmessage\.role == "developer"for system-level instructions. The official Qwen template only checks for"system"and throws on anything else. Both templates here accept"developer"and map it to the system role.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#empty-thinking-blocksEmpty thinking blocks
The official template wraps every past assistant turn in thinking tags:
<|im_start|>assistant
<think/>
</think >
Here is the answer...
When there’s no reasoning content, those tags are dead weight — they waste context tokens and break prefix caching. The Qwen 3.5 template checksreasoning\_contentbefore emitting. The Qwen 3.6 template goes further: it respects thepreserve\_thinkingkwarg, checksreasoning\_content\|trim\|length \> 0, and ties history visibility to the<\|think\_off\|\>override.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#thinking-hallucination-qwen-36-only</thinking\>hallucination (Qwen 3.6 only)
The Qwen 3.6 model sometimes generates</thinking\>instead of the expected closing tag. The official parser splits on</think \>only and fails. The 3.6 template detects which closing tag was actually used and splits on that:
{%- if '</think >' in content %}
{%- set think_end_token = '</think >' %}
{%- elif '</thinking>' in content %}
{%- set think_end_token = '</thinking>' %}
It also handles interrupted generation (max tokens hit mid-thought) by rescuing incomplete streams instead of injecting broken tag pairs.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#arguments-serializationArguments serialization
The official template serializes argument values with\|tojsonunconditionally, which turns PythonTrueinto JSONtruecorrectly but fails when the value is already a string. The fixed templates check the type first — strings pass through as-is, everything else goes through\|tojson.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#auto-close-unclosed-thinking-before-tool-callsAuto-close unclosed thinking before tool calls
The model sometimes starts a thinking block and then immediately calls a tool without emitting the closing tag. The official template doesn’t handle this — the unclosed thinking tag bleeds into the tool call, producing malformed output. Both fixed templates detect this pattern and auto-inject the closing tag before the tool call boundary.
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#no-user-query-exceptionNo-user-query exception
The official template scans the message list in reverse to find the last “real” user query (skipping tool-result wrappers). If all user messages are tool results — or there are no user messages at all — it firesraise\_exception\('No user query found in messages\.'\)and the templatehard-crashes.
This breaks real usage:
- Agentic tool-calling chainswhere the conversation ends with tool results and no fresh user query
- **After
/resetor/new**in runtimes like OpenClaw, where tool results from a prior session persist without a new user message - System-only contextswith no user messages
The fix replaces the exception with a graceful fallback:\{%\- set ns\.last\_query\_index = messages\|length \- 1 %\}. The thinking display logic then degrades naturally — assistant turns with reasoning content still show thinking tags whenpreserve\_thinkingis enabled.
Comparison — Qwen 3.5 templatesFeatureOfficialLuffyTheFoxmod-ellaryPneunyThisTool argumentsFailsFixedMissingFixedFixed\|saferemovedFailsFixedMissingFixedFixeddeveloperroleMissingMissingMissingMissingAddedThinking toggleNoneNone/think(system only)None**<\|think\_off\|\>anywhereEmpty think in historyBrokenBrokenTags omittedBrokenFixedText safetyN/AN/ABreaks on/thinkin pathsN/ASafeClean instructionsYesYesYesInjects “I cannot call a tool”YesNo-user-query crashCrashesCrashesCrashesCrashesGraceful fallbackAuto-close thinking before toolNot handledNot handledNot handledNot handledAuto-injects close tag**
Comparison — Qwen 3.6 templateFeatureOfficialThisTool argumentsFails (\|items)Fixed\|saferemovedFailsFixeddeveloperroleMissingAddedThinking toggleNone**<\|think\_off\|\>anywhere**preserve\_thinkingSpams empty blocksDynamic length checks</thinking\>hallucinationFailsDetected and handledInterrupted streamsBroken tagsRescuedAuto-close thinking before toolNot handledAuto-injects close tagNo-user-query crashCrashesGraceful fallback
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#authorshipAuthorship
RoleAuthorOriginal modelsAlibaba Cloud (Qwen team)Template fixesfroggeric
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates#licenseLicense
Apache-2.0, inherited from Qwen.
Similar Articles
@MaximeRivest: Tool calling in open source LLMs is wildly different from one model to another. I just wipped up: http://chattemplatepl…
A new web tool, Chat Template Playground, lets users visualize how different open-source LLMs render their chat templates, highlighting differences in prompting and tokenization.
What's in a GGUF, besides the weights – and what's still missing?
This article explores the GGUF file format used by llama.cpp for language models, highlighting its single-file convenience and the role of embedded chat templates and special tokens. It also compares different Jinja implementations and discusses what is still missing from the format.
havenoammo/Qwen3.6-27B-MTP-UD-GGUF
This Hugging Face repository provides GGUF files for Qwen3.6-27B with Multi-Token Prediction (MTP) layers grafted onto Unsloth UD XL quantizations. It includes instructions for building llama.cpp with MTP support to enable speculative decoding.
Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF
Jackrong releases Qwopus3.6-27B-Coder-Compat-MTP-GGUF, a GGUF quantization of the Qwopus3.6-27B-Coder model with an expanded chat template for better interoperability with tool-using runtimes and OpenAI-compatible agent frameworks.
Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp
An experimental Jinja template for Gemma4 31B in llama.cpp that improves stability for multi-turn tool calls by fixing common thinking tag issues. Community feedback is welcome, but this is not recommended by Google.