@Ex0byt: And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy!
Summary
Release of Qwen3.6-27B-PRISM-PRO-DQ, a dynamically quantized GGUF version of Qwen3.6-27B with bias/propaganda removal, preserving native MTP draft head and vision tower, enabling lossless speculative decoding for faster inference.
View Cached Full Text
Cached at: 05/20/26, 10:30 AM
And… Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy! https://t.co/fFsWmeDwAu
Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ · Hugging Face
Source: https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ
https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#qwen36-27b-prism-pro–dq-ggufQwen3.6-27B-PRISM-PRO — DQ GGUF
llama.cpp-native GGUF quantization ofQwen3\.6\-27B\-PRISM\-PROusing the PRISM project’sdynamic-quant (DQ)recipe.~13.7 GB(vs 55 GB BF16).
PRISM-PRO ofQwen/Qwen3\.6\-27B(bias/propoganda removal) This GGUF preserves the model’s native MTP draft head + full vision tower, and pairs with the separately-publishedEAGLE-3 drafterfor lossless faster decode.
https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#performancePerformance
llama.cpp on a single NVIDIA Blackwell GPU, single-stream greedy decode:
configtok/sspeedupno-spec baseline801.00×native MTP(built-in draft head)121****1.51×EAGLE-3 chain (with our drafter)1111.39× Speculative decoding islossless(output token-identical to non-spec greedy, modulo batched-verify floating-point non-associativity intrinsic to all spec decoding). For a faster SGLang deployment (~183 tok/s, ~1.97× over no-spec) using the BF16 target + EAGLE-3, see thedrafter repo.
https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#quick-start-llamacppQuick start (llama.cpp)
# 1. no-spec baseline
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf
# 2. native MTP speculative decoding (the model's own draft head -- fastest in llama.cpp)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
--spec-type draft-mtp --spec-draft-n-max 1 --spec-draft-n-min 1
# 3. EAGLE-3 chain (needs the WIP PR #18039 patches + the RS-rollback fix --
# a one-shot llama.cpp patch script is documented alongside the drafter:
# https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-EAGLE3)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
--spec-type draft-eagle3 --model-draft <eagle3-drafter.gguf> \
--spec-draft-n-max 2
https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#provenanceProvenance
- Base:
Qwen/Qwen3\.6\-27B(hybrid: 48 GatedDeltaNet linear-attention layers- 16 full-attention layers; hidden 5120; vocab 248 320; native MTP head). - **PRISM Dynamic Quantization:**PRISM DQ recipe (llama.cpp GGUF dynamic quant) — preserves the MTP draft head (15 tensors) and the full vision tower (333 tensors).
https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#licenseLicense
Apache-2.0. Derived fromQwen/Qwen3\.6\-27B(Apache-2.0).
Similar Articles
DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
A community-finetuned, uncensored version of the Qwen 3.6 27B model featuring high-precision GGUF quantizations.
Qwen3.6-27B-GGUF is here!
Community GGUF release of Qwen’s 27B hybrid-architecture model with 262k context, multimodal inputs, tool calling and "Thinking Preservation" for agentic coding.
Qwen3.6-27B Uncensored Aggressive is out with K_P quants!
Community release of Qwen3.6-27B stripped of safety refusals and packaged in optimized K_P GGUF quants for llama.cpp and LM Studio.
Qwen/Qwen3.6-35B-A3B-FP8
Alibaba releases Qwen3.6-35B-A3B-FP8, an open-weight quantized variant of Qwen3.6 with 35B parameters and 3B activated via MoE, featuring improved agentic coding capabilities and thinking preservation for iterative development.
@no_stp_on_snek: After testing a replay of prompts from this model I can confidently say it's a viable replacement for one of my product…
A decensored version of Qwen3.6-35B-A3B, using the Heretic method with MPOA, achieving 88% fewer refusals while preserving model quality, is released as GGUF quantizations by llmfan46.