@Ex0byt: And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy!

X AI KOLs Timeline 05/19/26, 09:31 PM Models

qwen quantized gguf prism llama-cpp speculative-decoding open-source

Summary

Release of Qwen3.6-27B-PRISM-PRO-DQ, a dynamically quantized GGUF version of Qwen3.6-27B with bias/propaganda removal, preserving native MTP draft head and vision tower, enabling lossless speculative decoding for faster inference.

And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy! https://t.co/fFsWmeDwAu

Original Article

View Cached Full Text

Cached at: 05/20/26, 10:30 AM

And… Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy! https://t.co/fFsWmeDwAu

Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ · Hugging Face

Source: https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#qwen36-27b-prism-pro–dq-ggufQwen3.6-27B-PRISM-PRO — DQ GGUF

llama.cpp-native GGUF quantization ofQwen3\.6\-27B\-PRISM\-PROusing the PRISM project’sdynamic-quant (DQ)recipe.~13.7 GB(vs 55 GB BF16).

PRISM-PRO ofQwen/Qwen3\.6\-27B(bias/propoganda removal) This GGUF preserves the model’s native MTP draft head + full vision tower, and pairs with the separately-publishedEAGLE-3 drafterfor lossless faster decode.

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#performancePerformance

llama.cpp on a single NVIDIA Blackwell GPU, single-stream greedy decode:

configtok/sspeedupno-spec baseline801.00×native MTP(built-in draft head)121****1.51×EAGLE-3 chain (with our drafter)1111.39× Speculative decoding islossless(output token-identical to non-spec greedy, modulo batched-verify floating-point non-associativity intrinsic to all spec decoding). For a faster SGLang deployment (~183 tok/s, ~1.97× over no-spec) using the BF16 target + EAGLE-3, see thedrafter repo.

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#quick-start-llamacppQuick start (llama.cpp)

# 1. no-spec baseline
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf

# 2. native MTP speculative decoding (the model's own draft head -- fastest in llama.cpp)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
    --spec-type draft-mtp --spec-draft-n-max 1 --spec-draft-n-min 1

# 3. EAGLE-3 chain (needs the WIP PR #18039 patches + the RS-rollback fix --
#    a one-shot llama.cpp patch script is documented alongside the drafter:
#    https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-EAGLE3)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
    --spec-type draft-eagle3 --model-draft <eagle3-drafter.gguf> \
    --spec-draft-n-max 2

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#provenanceProvenance

Base:Qwen/Qwen3\.6\-27B(hybrid: 48 GatedDeltaNet linear-attention layers- 16 full-attention layers; hidden 5120; vocab 248 320; native MTP head).
**PRISM Dynamic Quantization:**PRISM DQ recipe (llama.cpp GGUF dynamic quant) — preserves the MTP draft head (15 tensors) and the full vision tower (333 tensors).

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#licenseLicense

Apache-2.0. Derived fromQwen/Qwen3\.6\-27B(Apache-2.0).

@Ex0byt: And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy!

Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ · Hugging Face

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#qwen36-27b-prism-pro–dq-ggufQwen3.6-27B-PRISM-PRO — DQ GGUF

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#performancePerformance

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#quick-start-llamacppQuick start (llama.cpp)

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#provenanceProvenance

https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ#licenseLicense

Similar Articles

DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF

Qwen3.6-27B-GGUF is here!

Qwen3.6-27B Uncensored Aggressive is out with K_P quants!

Qwen/Qwen3.6-35B-A3B-FP8

@no_stp_on_snek: After testing a replay of prompts from this model I can confidently say it's a viable replacement for one of my product…

Submit Feedback

Similar Articles

DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
A community-finetuned, uncensored version of the Qwen 3.6 27B model featuring high-precision GGUF quantizations.

Qwen3.6-27B Uncensored Aggressive is out with K_P quants!
Community release of Qwen3.6-27B stripped of safety refusals and packaged in optimized K_P GGUF quants for llama.cpp and LM Studio.

@no_stp_on_snek: After testing a replay of prompts from this model I can confidently say it's a viable replacement for one of my product…