Gryphe/Pantheon-Reasoning-27B · Hugging Face
Summary
Gryphe releases Pantheon-Reasoning-27B, an uncensored dense Qwen 3.6 27B model fine-tuned with reasoning traces for enhanced roleplay and narrative generation. It combines roleplay data with full thinking traces to improve character immersion and narrative planning.
View Cached Full Text
Cached at: 05/30/26, 11:18 AM
Gryphe/Pantheon-Reasoning-27B · Hugging Face
Source: https://huggingface.co/Gryphe/Pantheon-Reasoning-27B
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#pantheon-reasoning-27bPantheon-Reasoning-27B
An experiment in bringing reasoning capability to the Pantheon roleplay series in the form of an uncensored dense Qwen 3.6 27B. This specific model can be thought of as a successor to both the Pantheon series and the one-time Codex release since I used such a large variety of data this time around.
Yet another theory being tested this time around: take the data that Pantheon is built on, pair it with full thinking traces, and let the model reason its way through character work — weighing tone, planning narrative beats, considering how a character would actually respond before committing to a line. Whether that meaningfully improves roleplay quality over a non-reasoning model is a question you’ll hopefully be able to help me answer.
GGUF quantsare available here.
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#model-detailsModel details
Base model isllmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved, and from what I can tell this worked out very, very nicely in regards to refusal reduction and writing capabilities.
I considered Gemma 4 31B but that model has been an absolute pain to train. Something something special snowflake architectures. (grumble, grumble)
All training sources include full reasoning traces, with thinking active across every assistant turn:
- Pantheon data(~28%) - the core Pantheon roleplay corpus with reasoning traces back-generated using the method described below
- Opus-4.6-Reasoning-24k(~21%) - a cleaned and deduplicated aggregation of Claude Opus 4.6 reasoning traces covering general instruction-following, STEM, and coding; provides the broad reasoning backbone
- WorldSim data(~16%) - long-form Opus 4.6 narrative roleplay with native reasoning traces, focusing on extended storytelling, character immersion, and emergent world logic, cobbled together through various experiments - mainly third person present tense but has a bit of everything + cliché cleaned, of course!
- Text adventure data(~16%) - high stakes interactive fiction and text adventure content with reasoning back-generated, lending the model a more grounded, prose-forward writing style
- General roleplay data(~16%) - a broad collection of highly varied roleplay transcripts with reasoning back-generated, helping the model generalise well to arbitrary character setups
- Tiamat data(~3%) - character and roleplay dataset originally built forTiamat-24B-Magistral, featuring a multi-step generation/extension/improvement pipeline with critic-improver rewrites to reduce AI clichés, with reasoning back-generated for each exchange
The model was trained withpreserve\_thinking: true, so thinking tags remain active across all assistant turns in multi-turn conversations, not just the first.
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#reasoning-back-generationReasoning back-generation
For the Pantheon, text adventure, Tiamat, and general roleplay data, thinking traces were generated using DeepSeek 3.2 after the fact rather than being native to the source material. I tried V4 Flash as well but it proved to be terrible at this specific task. The approach prompts the model to thinkas a writer planning their next response— before writing — rather than annotating a response that already exists. This distinction matters: the goal is genuine forward planning (considering character psychology, tone, and narrative direction), not post-hoc explanation.
Each generated trace was validated by a judge model before being kept. Traces that slipped into character voice, produced pure restatement, or read as analysis rather than planning were rejected and retried. The result is thinking that reflects real craft decisions rather than a summary of what the response contains.
The theory is that this reasoning ties semi-seamlessly into Qwen 3.6 27B’s native training and therefore enhances, rather than blatantly overwrites.
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#what-is-pantheonWhat is Pantheon?
Pantheon is my ongoing series of roleplay-focused finetunes built around a collection of diverse personas — characters with distinct personalities, voices, accents and mannerisms. Though I made sure to mention exactly which personas these were in the past in reality I’m generally the only one bothering to actually use them (lol) so I’m not going to bother with a huge list this time around.
TLDR: Ten personas put through hundreds of scenarios, from good to bad and anything in-between.
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#inferenceInference
These settings have been working well for me:
"temperature": 1.0,
"repetition_penalty": 1.0,
"min_p": 0.05
Reasoning models seem to work better without a repetition penalty — likely because it also affects the thinking traces, even though those aren’t visible in the output.
I obviously recommend leaving thinking enabled, and ideally withpreserve\_thinkingturned on. Having said that, I’m also very curious about non-reasoning performance!
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#prompt-formatPrompt Format
The model was trained using ChatML via Qwen3.6’s chat template, which should be applied automatically.
Since reasoning doesn’t tend to play nice with character name prefixes enabled I’m inclined to recommend against using them.
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#notesNotes
This is, like most of my releases nowadays, a research release and hasn’t gone through extensive quality testing beyond basic sanity checks. The core question — does reasoning actually help roleplay, or does it just add latency? — is one I’m genuinely curious about, and your feedback will be far more informative than my own bias here. Let me know what you find!
https://huggingface.co/Gryphe/Pantheon-Reasoning-27B#creditsCredits
- Everyone fromAnthracite! Hi, guys!
- Latitude, for which I am still producing finetunes on a regular basis, helping me keep my skills sharp and up-to-date!
- All the original dataset authors behind the Opus 4.6 reasoning data — full credits in thedataset card
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!
Similar Articles
Qwen/Qwen3.6-27B-FP8
Alibaba releases Qwen3.6-27B-FP8, a 27B FP8-quantized model with strong agentic coding and reasoning benchmarks, now available on Hugging Face.
Qwen/Qwen3.6-27B
Qwen releases the open-weight Qwen3.6-27B model on Hugging Face, featuring improved stability, agentic coding capabilities, and thinking preservation for better developer productivity.
Qwen3.6-27B-GGUF is here!
Community GGUF release of Qwen’s 27B hybrid-architecture model with 262k context, multimodal inputs, tool calling and "Thinking Preservation" for agentic coding.
hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
A 35B-parameter Qwen3.6 model fine-tuned with Claude-Opus-style chain-of-thought distillation data and released in GGUF quantized formats for efficient local inference.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Jackrong releases Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, a fine-tuned 27B parameter model with improved reasoning capabilities and stability, along with comprehensive training guides and code on GitHub using the Unsloth framework.
