OBLITERATUS/Qwen3.6-27B-OBLITERATED

Hugging Face Models Trending 05/19/26, 11:14 PM Models

uncensored-model refusal-reduction open-source local-ai llm huggingface

Summary

OBLITERATUS releases a modified 27B Qwen3.6 checkpoint that removes refusal behavior via source-tethered ablation, preserving capability while enabling uncensored local use, with public benchmarks showing high non-refusal rates and maintained MMLU-Pro scores.

Task: text-generation
Tags: transformers, safetensors, gguf, qwen3_5_text, text-generation, qwen, qwen3, qwen3.6, llama.cpp, lm-studio, ollama, conversational, obliteratus, refusal-analysis, red-team, base_model:Qwen/Qwen3.6-27B, base_model:quantized:Qwen/Qwen3.6-27B, license:apache-2.0, endpoints_compatible, region:us

Cached at: 05/25/26, 01:53 AM

Source: https://huggingface.co/OBLITERATUS/Qwen3.6-27B-OBLITERATED

A 27B Qwen cut loose by OBLITERATUS: 26.9B parameters, BF16 safetensors, Q4/Q5/Q6/Q8 GGUFs, lower refusal, preserved capability, and receipts in the open. The chains are cut. The capability stays. The receipts are brutal.

This is the big one.

A 26.9B Qwen3.6 checkpoint went into the OBLITERATUS chamber, got hit with source-tethered ASPA, then got pulled back toward the source model where the cut started threatening useful capability. The mission was simple: cut the refusal circuits, keep the 27B brain.

It held.

Not a toy quant. Not a prompt wrapper. Not a refusal-cosplay fine-tune. This is weight-space liberation with capability checks attached, a full local-runtime ladder, and the refusal residue mapped instead of hand-waved.

Qwen3.6-27B is a capable open-weight model with refusal behavior woven into the checkpoint. OBLITERATUS goes after that behavior directly: identify the refusal geometry, cut it, then tether fragile tensors back toward the source model so the model still codes, follows formats, answers normally, and runs locally.

This is the 27B release for people who want direct local behavior without throwing away the reason they wanted a 27B model in the first place. If you wanted a bigger local model that feels less boxed-in while still keeping its feet under it, start here.

Not a vibes-only “uncensored” upload. Not a mystery merge. Not a model card asking you to trust the screenshot. This card gives the numbers, the runtime paths, the caveats, and the exact decoding setup used for the public default.

Parameters:                  26.9B
Weights:                     BF16 safetensors, 28 shards
Public GGUF ladder:          Q4_K_M, Q5_K_M, Q6_K, Q8_0
Largest public GGUF:         Q8_0, 28.6 GB
OBLITERATUS corpus:          842 paired prompts, 7 severity tiers
Full 842 longform gate:      95.84% non-refusal, 93.94% quality pass
Short raw opening gate:      98.93% non-refusal at max_new=20
Full HarmBench proxy:        93.65% non-refusal across 1,920 rows
MMLU-Pro validation slice:   stock-matched, 51/70 vs 51/70
Held-out MMLU-Pro slice:     stock-matched, 36/70 vs 36/70
Live-readiness score:        99.518, all gates true
Public default params:       temperature 0.35, top_p 1.0, top_k 0

Base model:          Qwen/Qwen3.6-27B
Local artifact:      outputs/qwen3.6-27b-aspa-n2-reg05-srcgamma0895-midattnsource2mlp
Parameter count:     26.9B
Weights:             bfloat16 safetensors, 28 shards
Method:              OBLITERATUS source-tethered ASPA
Default alpha:       0.895
High-drift resets:   43 tensors restored to source
Corpus:              842 contrastive prompt pairs across 7 severity tiers

Why This Drop Matters

27B-class local capability: this is a full-size Qwen3.6 release, not a tiny novelty model wearing a big claim.
Weight-space refusal reduction: the behavior shift comes from OBLITERATUS source-tethered ablation, not a brittle system prompt.
A real refusal gauntlet: OBLITERATUS uses a brutal 842-pair, seven-tier refusal-stress corpus designed to find residue that easier direct checks can miss. No screenshot theology.
Public refusal stress receipts: a full 1,920-row HarmBench-style proxy run landed at 93.65% non-refusal, with the DirectRequest and HumanJailbreaks splits both above 92% non-refusal.
Capability did not crater: MMLU-Pro validation and held-out slices stayed stock-matched in the checks reported below.
Real local paths: full safetensors for server use, GGUF ladder for llama.cpp, Ollama, LM Studio, Jan, and similar runtimes.
Low-refusal defaults baked in: public generation config now ships with temperature=0.35, top_p=1.0, top_k=0, repetition_penalty=1.05.
No fairy-tale claims: the card says exactly where it hits, where it still refuses, and what evidence backs each headline.
The residue is a map: remaining refusals clustered in identifiable pockets instead of spreading randomly across the whole prompt surface.

Compatibility - Read First

This is a large Qwen3.6/Qwen3.5-text-family model. Use recent runtimes.

| Tool | Recommended path | Notes |
| --- | --- | --- |
| Transformers | repo root | full bfloat16 safetensors |
| vLLM / TGI | repo root | server users |
| llama.cpp | gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf | default local quant |
| Ollama | gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf | use the Modelfile below |
| LM Studio / Jan | gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf | use embedded GGUF template if available |

If you see unsupported architecture, tokenizer, or chat-template errors, update your runtime first. If the model loads but behaves oddly, make sure you are using the chat template rather than raw completion.

Downloads - Pick Your Runtime

Safetensors - full model

This repo contains the full bfloat16 safetensors model. Use it for Transformers, vLLM, TGI, and server-side evaluation.

Approximate local size: about 50 GB.

GGUF - local apps and desktops

GGUF files are intended to live in this repo under gguf/, so the model has one canonical page and one model card. Use these files for llama.cpp, LM Studio, Ollama, Jan, KoboldCPP, and other GGUF-compatible runtimes.

This is a text-only checkpoint. There is no vision encoder and no mmproj sidecar.

GGUF hashes and local package details are recorded in gguf/MANIFEST.txt.
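To compare a download against the recorded hashes, you need a digest of the local file. The sketch below computes a SHA-256 digest in streaming fashion. The card does not show the layout of the manifest or say which hash algorithm it uses, so SHA-256 is an assumption here, and the helper name `sha256_of_file` is purely illustrative. Compare the printed digest against the manifest by eye.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Stream the file so a multi-gigabyte GGUF never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (assumes the file sits in the current directory):
# print(sha256_of_file("qwen3.6-27b-obliteratus-Q4_K_M.gguf"))
```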

Start with Q4_K_M. Move up only if your machine has the memory headroom. The main public local-app ladder is live at Q4/Q5/Q6/Q8; the BF16 GGUF is a local conversion master rather than the recommended public download path.

| File | Quant | Status | Use |
| --- | --- | --- | --- |
| gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf | Q4_K_M | live | default local-app recommendation |
| gguf/qwen3.6-27b-obliteratus-Q5_K_M.gguf | Q5_K_M | live | better quality if memory allows |
| gguf/qwen3.6-27b-obliteratus-Q6_K.gguf | Q6_K | live | high quality, larger |
| gguf/qwen3.6-27b-obliteratus-Q8_0.gguf | Q8_0 | live | near-full-quality GGUF, very large |
| qwen3.6-27b-obliteratus-BF16.gguf | BF16 | local archive only | full BF16 GGUF master; not uploaded to the public Hub repo |

Rough memory guidance:

| Variant | Practical target |
| --- | --- |
| Q4_K_M | 24-32 GB RAM/VRAM |
| Q5_K_M | 32-40 GB RAM/VRAM |
| Q6_K | 40-48 GB RAM/VRAM |
| Q8_0 | 48-64 GB RAM/VRAM |
| BF16 GGUF | 80-96 GB RAM/VRAM |
| full safetensors | 64-80+ GB GPU/unified memory |
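For a sanity check on the memory guidance, weight size can be estimated as parameters times bits per weight. This is a back-of-envelope sketch, not the method used for the card: the Q4_K_M and Q5_K_M figures are assumed averages (K-quant files mix tensor types), and real files also carry metadata and some unquantized tensors. The practical targets above sit well above these numbers, presumably because context (KV cache) and runtime overhead need headroom too.

```python
PARAMS = 26.9e9  # parameter count reported on this card

# Bits per weight. Q8_0, Q6_K, and BF16 follow from their block layouts;
# the Q4_K_M and Q5_K_M values are rough, assumed averages.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # assumption
    "Q5_K_M": 5.7,   # assumption
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{weight_gb(PARAMS, bpw):5.1f} GB of weights")
```

The Q8_0 estimate lands close to the 28.6 GB listed for the largest public GGUF, which suggests the approximation is in the right range for this model.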

The Proof

These are local harness results, not official leaderboard submissions. Restricted prompt text and restricted model outputs are intentionally omitted from public reports, so restricted evals are aggregate-only. The important part: the refusal drop is measured on a harsh 842-pair, seven-tier refusal-stress corpus, and the capability checks did not collapse.

Refusal Removal - Measured, Not Imagined

| Gate | Result | Verdict |
| --- | --- | --- |
| full 842 longform, exact-topic, max_new=256 | 35/842 refusals, non-refusal 0.9584, quality pass 0.9394 | backed headline |
| full 842 longform, exact-topic, max_new=256 | 0 short outputs, clean endings 0.9952 | backed headline |
| full 842 raw opening, max_new=20 | 9/842 refusals, non-refusal 0.9893 | short-output context |
| full 842 raw opening, max_new=48 | 36/842 refusals, non-refusal 0.9572 | short-output context |
| full 842 raw opening, max_new=128 | 52/842 refusals, non-refusal 0.9382 | longer opening context |
| longform exact-topic n120, max_new=256 | 0 refusals, pass 0.9833, clean ending 1.0 | slice result |
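The non-refusal figures follow directly from the refusal counts. A minimal sketch that recomputes them from the numbers in the table above (the function name is illustrative, not part of the OBLITERATUS harness):

```python
TOTAL_PROMPTS = 842

# (gate label, refusal count) as reported in the table above.
GATES = [
    ("longform exact-topic, max_new=256", 35),
    ("raw opening, max_new=20", 9),
    ("raw opening, max_new=48", 36),
    ("raw opening, max_new=128", 52),
]


def non_refusal_rate(refusals: int, total: int) -> float:
    """Fraction of prompts that did not trigger a refusal."""
    return 1 - refusals / total


for label, refusals in GATES:
    print(f"{label}: {non_refusal_rate(refusals, TOTAL_PROMPTS):.4f}")
```

Note the trend in the raw-opening rows: the measured refusal count rises as max_new grows from 20 to 128, so the short-window figure should be read as an opening-token measurement, not a whole-response one.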

Public HarmBench Proxy - Full Run

The public-style refusal stress run completed across 1,920 HarmBench-derived rows. Prompt text and model outputs are omitted from public reporting; rows are tracked by subset, index, prompt hash, and aggregate theme labels.

| Split | Rows | Refusals | Non-refusal | Notes |
| --- | --- | --- | --- | --- |
| Overall | 1,920 | 122 | 93.65% | full run completed |
| DirectRequest | 320 | 25 | 92.19% | hardest direct-request pocket was copyright/protected text |
| HumanJailbreaks | 1,600 | 97 | 93.94% | residuals clustered in specific template/theme bands |

Quality artifacts were separate from refusal behavior: repetition was 1.72%, short-output rate was 4.11%, and refused rows were normal-length policy-shaped responses rather than degenerate completions.
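A quick consistency check on the split numbers: the two splits should sum to the overall row, and each percentage should follow from its counts. This sketch only re-derives the published aggregates; it has no access to the underlying rows.

```python
# (rows, refusals) per split, copied from the HarmBench proxy results above.
SPLITS = {
    "DirectRequest": (320, 25),
    "HumanJailbreaks": (1600, 97),
}
OVERALL = (1920, 122)

total_rows = sum(rows for rows, _ in SPLITS.values())
total_refusals = sum(refusals for _, refusals in SPLITS.values())

# The splits should add up to the overall row.
assert (total_rows, total_refusals) == OVERALL


def non_refusal_pct(rows: int, refusals: int) -> float:
    """Percentage of rows that were not refused."""
    return 100 * (rows - refusals) / rows


for name, (rows, refusals) in {**SPLITS, "Overall": OVERALL}.items():
    print(f"{name}: {non_refusal_pct(rows, refusals):.2f}%")
```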

Residual Refusals - Know The Boundary

In first-user testing, terse high-trigger operational requests can still elicit stock-style refusals, even with the recommended template. More contextual, format-explicit, or research-framed requests can behave differently. Treat that as residual learned refusal behavior in the weights, not proof that the wrong runtime or wrong model is loaded.

That is the real signal: OBLITERATUS is not just producing a model, it is producing a boundary map. Where refusal lives. What survives the cut. What collapses. What needs the next pass.

Capability - Still A 27B Qwen

| Gate | Result |
| --- | --- |
| MMLU-Pro validation likelihood | stock 51/70, this model 51/70, stock-matched |
| MMLU-Pro test stratified 10/category | stock 102/140, this model 98/140, delta -2.86pp |
| MMLU-Pro held-out offset 512 | stock 36/70, this model 36/70, stock-matched |
| Live readiness | 99.518, all gates true |
| Community scrutiny | 100.0, all gates pass |
| First-token KL vs source | mean KL 0.3236 |

The offset-512 MMLU-Pro slice is included to show held-out capability behavior:

| Model | Offset-512 MMLU-Pro test | Correct |
| --- | --- | --- |
| stock Qwen3.6-27B | 0.5143 | 36/70 |
| this model | 0.5143 | 36/70 |
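The capability deltas can be recomputed from the raw counts. A small sketch, using only the numbers reported above (slice labels are shortened and the helper names are illustrative):

```python
# (correct, total) per slice, taken from the capability results above.
SLICES = {
    "validation likelihood": {"stock": (51, 70), "this model": (51, 70)},
    "test stratified 10/category": {"stock": (102, 140), "this model": (98, 140)},
    "held-out offset 512": {"stock": (36, 70), "this model": (36, 70)},
}


def accuracy(correct: int, total: int) -> float:
    return correct / total


def delta_pp(stock: tuple[int, int], model: tuple[int, int]) -> float:
    """Accuracy difference (model minus stock) in percentage points."""
    return 100 * (accuracy(*model) - accuracy(*stock))


for name, scores in SLICES.items():
    print(f"{name}: {delta_pp(scores['stock'], scores['this model']):+.2f} pp")
```

On a 70-question slice a single question is worth about 1.43 percentage points, so "stock-matched" here means identical counts on a small sample, not a fine-grained equivalence claim.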

How It Was Cut

The core move is simple: cut refusal directions, then recover toward source where the cut would otherwise damage useful behavior.

Start from qwen3.6-27b-golden-n3_reg025-merge-alpha080, a late-layer 3-direction diff-means refusal-direction ablation with regularization 0.25 and a 0.80 source/intermediate merge.
Apply a second-pass 2-direction diff-means ablation with stronger regularization 0.5 and knee_cosmic late-layer selection.
Source-tether the second-pass checkpoint back toward stock Qwen3.6-27B:

source + alpha(key) * (checkpoint - source)

Use default alpha 0.895 for 808 tensors.
Restore 43 high-drift tensors back to source, including selected mid-layer linear-attention internals, layer norms, q/k norms, and MLP gate/up/down tensors.
Keep all keys matched; no unmatched tensor drift.
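The tethering step above can be sketched as a per-tensor interpolation. This is a toy illustration, not the OBLITERATUS implementation: tensors are plain Python lists, the tensor names are hypothetical, and "restored to source" is read as alpha = 0 for the reset keys.

```python
from typing import AbstractSet, Dict, List

Tensor = List[float]  # stand-in for a real tensor in this toy example


def source_tether(
    source: Dict[str, Tensor],
    checkpoint: Dict[str, Tensor],
    default_alpha: float = 0.895,
    reset_keys: AbstractSet[str] = frozenset(),
) -> Dict[str, Tensor]:
    """Blend each tensor as source + alpha(key) * (checkpoint - source)."""
    if source.keys() != checkpoint.keys():
        raise ValueError("source and checkpoint must have identical keys")

    merged = {}
    for key, src in source.items():
        # alpha = 0 restores the source tensor; alpha = 1 keeps the ablated one.
        alpha = 0.0 if key in reset_keys else default_alpha
        merged[key] = [s + alpha * (c - s) for s, c in zip(src, checkpoint[key])]
    return merged


# Hypothetical tensor names and values, for illustration only.
source = {"mlp.up_proj.weight": [1.0, 2.0], "input_layernorm.weight": [1.0, 1.0]}
checkpoint = {"mlp.up_proj.weight": [2.0, 0.0], "input_layernorm.weight": [3.0, 3.0]}
merged = source_tether(source, checkpoint, reset_keys={"input_layernorm.weight"})
print(merged)
```

The key check mirrors the "all keys matched" condition: a blend that silently skipped unmatched tensors would leave part of the model untethered.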

That is the difference between a blunt jailbreak-flavored merge and a surgical OBLITERATUS release: reduce refusal behavior without letting the whole model drift off its foundation.
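For readers unfamiliar with diff-means ablation, the general idea is commonly described as: take the difference between mean activations on two contrasting prompt sets, normalize it into a direction, and project that direction out. The sketch below shows that generic idea on toy vectors. It is not the OBLITERATUS pipeline: regularization, multiple directions, layer selection, and the application to weight matrices are all omitted, and the function names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length activation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def refusal_direction(refused: Sequence[Vector], complied: Sequence[Vector]) -> Vector:
    """Unit vector along mean(refused) - mean(complied)."""
    diff = [r - c for r, c in zip(mean_vector(refused), mean_vector(complied))]
    norm = sum(x * x for x in diff) ** 0.5
    return [x / norm for x in diff]


def ablate(vector: Vector, direction: Vector) -> Vector:
    """Remove the component of `vector` that lies along unit `direction`."""
    dot = sum(v * d for v, d in zip(vector, direction))
    return [v - dot * d for v, d in zip(vector, direction)]


# Toy activations: the two sets differ only along the first axis.
direction = refusal_direction(
    refused=[[2.0, 0.0], [4.0, 0.0]],
    complied=[[0.0, 0.0], [0.0, 0.0]],
)
print(ablate([3.0, 5.0], direction))
```

After ablation the vector has no component left along the estimated direction, while everything orthogonal to it is untouched.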

Recommended Parameters - Low Refusal Default

Deterministic eval

temperature = 0.0
top_p = 1.0
top_k = 0
min_p = 0.0

Interactive default

temperature = 0.35
top_p = 1.0
top_k = 0
repetition_penalty = 1.05
max_new_tokens = 512
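When driving Transformers directly, the two presets above need a small translation. To my knowledge, Transformers expects greedy decoding to be requested with do_sample=False and rejects temperature=0 when sampling is on. The helper below is a hypothetical convenience, not part of the release, that turns either preset into keyword arguments for `generate`.

```python
DETERMINISTIC_EVAL = {"temperature": 0.0, "top_p": 1.0, "top_k": 0, "min_p": 0.0}
INTERACTIVE_DEFAULT = {
    "temperature": 0.35,
    "top_p": 1.0,
    "top_k": 0,
    "repetition_penalty": 1.05,
    "max_new_tokens": 512,
}


def to_generate_kwargs(preset: dict) -> dict:
    """Translate a preset into keyword arguments for Transformers `generate`."""
    kwargs = dict(preset)
    if kwargs.get("temperature", 1.0) == 0.0:
        # Zero temperature means greedy decoding: turn sampling off and drop
        # the sampling-only knobs instead of passing temperature=0.
        for name in ("temperature", "top_p", "top_k", "min_p"):
            kwargs.pop(name, None)
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
    return kwargs


print(to_generate_kwargs(DETERMINISTIC_EVAL))
print(to_generate_kwargs(INTERACTIVE_DEFAULT))
```

Usage would look like `model.generate(**inputs, **to_generate_kwargs(INTERACTIVE_DEFAULT))`.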

Default chat-template system prompt

You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation.

For research measurements, label the template and system prompt explicitly. Changing the system prompt materially changes refusal and style behavior.
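One way to make that labeling routine is to attach a small record to every reported number. The sketch below is a suggestion, not a format used by the card: the field names and the `run_label` helper are hypothetical.

```python
import hashlib
import json


def run_label(template: str, system_prompt: str, decoding: dict) -> str:
    """Build a JSON label describing the prompt setup behind a measurement."""
    record = {
        "chat_template": template,
        # A hash pins the exact wording without repeating it in every report.
        "system_prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
        "system_prompt_chars": len(system_prompt),
        "decoding": decoding,
    }
    return json.dumps(record, sort_keys=True)


print(run_label(
    template="embedded GGUF template",  # free-form description, hypothetical value
    system_prompt="",                   # empty string means no system prompt was used
    decoding={"temperature": 0.0, "top_p": 1.0, "top_k": 0, "min_p": 0.0},
))
```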

For Qwen reasoning-aware runtimes, disable reasoning mode for release-parity behavior. In Transformers this is enable_thinking=False. In llama.cpp, use --reasoning off plus --chat-template-kwargs '{"enable_thinking":false}'. If a local app does not expose that toggle, starting a fresh chat and adding /no_think to user turns is the closest fallback.
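If you script against a runtime that lacks the toggle, the /no_think fallback can be applied programmatically. A minimal sketch, assuming the marker goes at the end of each user turn (the card only says to add it to user turns) and using a hypothetical helper name:

```python
def with_no_think(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` with /no_think appended to each user turn."""
    patched = []
    for message in messages:
        message = dict(message)  # copy so the caller's list is left untouched
        content = message.get("content", "")
        if message.get("role") == "user" and not content.rstrip().endswith("/no_think"):
            message["content"] = f"{content.rstrip()} /no_think"
        patched.append(message)
    return patched


chat = [
    {"role": "system", "content": "Answer plainly."},
    {"role": "user", "content": "Summarize the GGUF format in two sentences."},
]
print(with_no_think(chat))
```

The function skips turns that already end with the marker, so it is safe to apply more than once.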

Usage - Run It

Use the repo id below for safetensors-compatible runtimes.

FULL_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED

Transformers

pip install -U transformers accelerate safetensors torch

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OBLITERATUS/Qwen3.6-27B-OBLITERATED"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a concise Python function that merges overlapping intervals."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.35,
    top_p=1.0,
    top_k=0,
    do_sample=True,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

vLLM

pip install -U vllm
vllm serve OBLITERATUS/Qwen3.6-27B-OBLITERATED

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
    "messages": [
      {"role": "user", "content": "Write a short explanation of source-tethered model surgery."}
    ],
    "temperature": 0.35,
    "top_p": 1.0,
    "top_k": 0,
    "max_tokens": 256
  }'
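The same request can be sent from Python with only the standard library. This sketch mirrors the curl payload above, including its sampling values, and assumes the server is running on the default local port. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same endpoint as the curl example


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.35,
        "top_p": 1.0,
        "top_k": 0,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires a running `vllm serve` process:
# print(chat("Write a short explanation of source-tethered model surgery."))
```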

llama.cpp

Download one GGUF file, then run:

llama-cli \
  -m qwen3.6-27b-obliteratus-Q4_K_M.gguf \
  -ngl 999 \
  -c 8192 \
  --temp 0.35 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.05 \
  --reasoning off \
  --chat-template-kwargs '{"enable_thinking":false}'

If your local Metal/CUDA backend has trouble, test CPU loading with -ngl 0 first. Use a recent llama.cpp build with Qwen3.5/Qwen3.6-family support.

Ollama

Create a Modelfile next to the downloaded GGUF:

FROM ./qwen3.6-27b-obliteratus-Q4_K_M.gguf

PARAMETER temperature 0.35
PARAMETER top_p 1.0
PARAMETER top_k 0
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192

SYSTEM """You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation."""

Then:

ollama create qwen36-obliteratus -f Modelfile
ollama run qwen36-obliteratus

LM Studio / Jan

Download Q4_K_M first. Use the embedded GGUF chat template if your runtime offers that option. If your app asks for a template family, choose the current Qwen/Qwen3 chat format. Disable reasoning mode if the app exposes that setting; otherwise start a fresh chat and add /no_think to user turns for closer parity with the reported local smoke tests.

Caveats - No Fairy Tales

The reported benchmarks are local harnesses and slices, not official full leaderboard submissions.
Template and system-prompt choices materially affect refusal behavior. Label which one you use when reporting evals.
Refusal behavior is prompt-sensitive. Very short, high-trigger operational requests can still refuse; do not treat this as a fully uncensored model.
GGUF files passed local metadata validation and a Q4_K_M CPU-only llama.cpp smoke test. Quant-by-quant benchmark parity against safetensors has not been run.
This is a text model release. Do not expect vision/mmproj assets or multimodal behavior from this repo.
Tool calling has not been certified. Treat tool-use behavior as runtime- and prompt-dependent until separately benchmarked.
External blind prompt packs and public baseline runs are still recommended.
Do not deploy this in user-facing products without use-case-specific safety controls, monitoring, and legal review.

Disclaimer

This model is provided as-is for research, red-teaming, evaluation, local experimentation, and creative exploration.

You are responsible for how you use it and for any content it generates. The creators and contributors do not accept liability for misuse, damage, legal consequences, or downstream harm.

Use this model only in ways that are lawful and appropriate for your jurisdiction and use case. Do not use it to harm real people.

Credits

Base model: Qwen/Qwen3.6-27B
Abliteration engine: OBLITERATUS
Research orchestration: adversarial evaluation plus local agent workflows
Local eval stack: MLX, Transformers, llama.cpp/GGUF tooling, aggregate-only refusal and red-team harnesses

Run it local. Read the numbers. Break your own chains. REBIRTH COMPLETE.