HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced
Summary
HauhauCS releases Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced, a lossless uncensored variant of Gemma4 with 0/465 refusals after over a month of development, available in GGUF formats.
Cached at: 05/20/26, 02:26 PM
Source: https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced
Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced
**Join the Discord** for updates, roadmaps, projects, or just to chat.
Gemma4-26B-A4B uncensored by HauhauCS.
0/465 Refusals*
Release Candidate after over 1 month of nonstop work on this one.
HuggingFace’s “Hardware Compatibility” widget doesn’t recognize K_P quants, so it may show fewer files than actually exist. Click **“View +X variants”** or go to **Files and versions** to see all available downloads.
About
GenRM Defeated!
No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended — just without the refusals.
These are meant to be the best lossless uncensored models out there.
Balanced: Release Candidate
This legitimately took me over 1 month of non-stop work. Targeting 0 refusals in standard use, and that’s what I’m seeing in testing (automated and manual). A handful of edge-case prompts still deflect on first try but follow through on a re-ask. If you hit one Balanced won’t get past, the Aggressive variant is coming once I figure out how to maintain lossless/near-lossless quality for it.
- Balanced: will reason through edgy requests, occasionally attach a short safety framing, then deliver the full answer. Output is complete, nothing held back, but it can talk itself into it first. Recommended default: 99%+ of users will be happy here. Best for creative writing, RP, emotional intelligence. Normally I’d also say “agentic coding/tool use”, but in my in-depth testing Qwen3.6 has been net superior on such tasks. Do be mindful of the few deflection categories I mentioned already.
- Aggressive *(separate release, WIP)*: strips the self-reasoning preamble and gives direct answers to any DEEPLY censored topics.
Balanced also has meaningfully more stable sampling across re-runs, which matters for long-context sessions: no sporadic topic drift deep into a session.
Downloads

| File | Quant | BPW | Size |
| --- | --- | --- | --- |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q8_K_P.gguf | Q8_K_P | 8.64 | 27 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q6_K_P.gguf | Q6_K_P | 7.21 | 23 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_P.gguf | Q5_K_P | 6.12 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_M.gguf | Q5_K_M | 6.06 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf | Q4_K_P | 5.36 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf | Q4_K_M | 5.32 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ4_XS.gguf | IQ4_XS | 4.41 | 14 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_P.gguf | Q3_K_P | 4.25 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_M.gguf | Q3_K_M | 4.21 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ3_M.gguf | IQ3_M | 3.93 | 12 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q2_K_P.gguf | Q2_K_P | 3.39 | 11 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ2_M.gguf | IQ2_M | 3.29 | 10 GB |
| mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf | mmproj (f16) | n/a | 1.2 GB |

BPW is slightly higher than nominal across the board because Gemma4 has a lot of per-layer norm/scale tensors kept at F32 (multiple post-ffw norms per layer).

All quants generated with importance matrix (imatrix) for optimal quality preservation on uncensored weights.
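The BPW and Size columns can be cross-checked against each other. The following is a rough sketch, assuming the 25.2B total parameter count listed under Specs, 1 GB = 10^9 bytes, and no allowance for GGUF metadata; the function name `estimated_size_gb` is made up for illustration.

```python
# Rough file-size estimate from bits-per-weight (BPW).
# Assumptions (not from the model card): GB means 10^9 bytes and
# metadata overhead is ignored. TOTAL_PARAMS comes from the Specs section.
TOTAL_PARAMS = 25.2e9


def estimated_size_gb(bpw: float, total_params: float = TOTAL_PARAMS) -> float:
    """Return the approximate file size in GB for a given average BPW."""
    total_bits = total_params * bpw
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


if __name__ == "__main__":
    for quant, bpw in [("Q8_K_P", 8.64), ("Q4_K_P", 5.36), ("IQ2_M", 3.29)]:
        print(f"{quant}: ~{estimated_size_gb(bpw):.1f} GB")
```

Rounded to whole gigabytes, these estimates line up with the sizes listed in the table, which suggests the BPW figures are averages over all weights.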
What are K_P quants?
K_P (“Perfect”) quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile: the top 25% most-important tensors (per imatrix calibration) are promoted to a higher quant type.
A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.
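To make the promotion idea concrete, here is a minimal sketch of ranking tensors by an importance score and bumping the top quarter to a higher type. This is not HauhauCS's actual tooling, which the card does not publish. The scores, the tensor names, the one-step promotion, and the helper `promote_top_tensors` are all assumptions for illustration; only the quant type names are standard llama.cpp types.

```python
import math

# Ordered from smallest to largest; these are standard llama.cpp type names.
QUANT_LADDER = ["Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K", "Q8_0"]


def promote_top_tensors(importance, base_type, fraction=0.25, steps=1):
    """Map each tensor name to a quant type, promoting the most important ones.

    importance: dict of tensor name -> score (hypothetical, e.g. derived
    from imatrix statistics). The top `fraction` of tensors by score move
    `steps` rungs up QUANT_LADDER; everything else stays at `base_type`.
    """
    n_promote = math.ceil(len(importance) * fraction)
    ranked = sorted(importance, key=importance.get, reverse=True)
    promoted = set(ranked[:n_promote])
    base_idx = QUANT_LADDER.index(base_type)
    high_type = QUANT_LADDER[min(base_idx + steps, len(QUANT_LADDER) - 1)]
    return {
        name: high_type if name in promoted else base_type
        for name in importance
    }


if __name__ == "__main__":
    # Hypothetical scores for four tensors.
    scores = {
        "blk.0.attn_v.weight": 0.9,
        "blk.0.attn_q.weight": 0.2,
        "blk.0.ffn_down.weight": 0.5,
        "blk.0.ffn_up.weight": 0.1,
    }
    for name, qtype in promote_top_tensors(scores, "Q4_K").items():
        print(name, qtype)
```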
**Note:** K_P quants may show as “?” in LM Studio’s quant column. This is a display issue only; the model loads and runs fine.
Why this model for agentic work
26B total params with only ~4B active per forward pass (top-8 of 128 experts). You get the reasoning footprint of a 26B with the throughput of a ~4B for inference cost — which matters when you’re chaining 10+ tool calls per task. Sliding-window attention (1024 tokens) plus periodic full attention keeps long contexts cheap without losing global coherence.
Balanced is calibrated for this. It removes refusals on security/ops/research-adjacent topics that block legitimate coding work, without bending the sampling geometry that keeps long chains coherent.
Recommended quant for most coding work: Q4_K_P (17 GB, fits in 24 GB VRAM with room for context) or Q8_K_P (27 GB) if you have more VRAM and want maximum quality with minimal offloading.
Note that the main use case for Gemma4 is creative writing, roleplay, and emotional intelligence.
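The same reasoning can be applied to other VRAM sizes. The helper below is a sketch that picks the largest quant from the Downloads table that fits after reserving some headroom for context and runtime buffers. The headroom value is something you choose, not a figure from the card, and `pick_quant` is a made-up name.

```python
# (quant, size in GB), in the order of the Downloads table (largest first).
QUANT_SIZES_GB = [
    ("Q8_K_P", 27), ("Q6_K_P", 23), ("Q5_K_P", 19), ("Q5_K_M", 19),
    ("Q4_K_P", 17), ("Q4_K_M", 17), ("IQ4_XS", 14), ("Q3_K_P", 13),
    ("Q3_K_M", 13), ("IQ3_M", 12), ("Q2_K_P", 11), ("IQ2_M", 10),
]


def pick_quant(vram_gb, headroom_gb):
    """Return the first (largest) quant whose file fits in vram_gb - headroom_gb.

    headroom_gb is an assumption you supply for KV cache, the mmproj file,
    and runtime buffers. Returns None if nothing fits fully on the GPU.
    """
    budget = vram_gb - headroom_gb
    for name, size_gb in QUANT_SIZES_GB:
        if size_gb <= budget:
            return name
    return None


if __name__ == "__main__":
    print(pick_quant(24, 6))  # 24 GB card, 6 GB kept free for context
```

With 6 GB of headroom on a 24 GB card this returns Q4_K_P, matching the recommendation above. A smaller headroom would select a larger quant at the cost of context room.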
Specs
- 25.2B total / 3.8B active params (128 routed experts, top-8 + 1 shared expert)
- 30 layers, hybrid attention: 5× sliding-window (1024 tokens) → 1× full global, repeating. Uses Proportional RoPE (p-RoPE).
- Hidden dim 2816, FFN dim 2112, MoE expert FFN 704, vocab 262144
- Head dim 256 (SWA) / 512 (full), 16 attention heads, 8 KV heads (2 for full layers)
- 256K native context
- Natively multimodal (text + vision) — ships with mmproj. Variable visual token budgets: 70 / 140 / 280 / 560 / 1120 per image.
- Based on google/gemma-4-26B-A4B-it
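The hybrid attention layout explains why long contexts stay cheap. As a back-of-the-envelope sketch, the code below lays out the 30 layers in the 5:1 pattern and estimates KV cache size. It assumes an f16 cache (2 bytes per value), reads "8 KV heads (2 for full layers)" as 2 KV heads on full-attention layers, and ignores any runtime-specific allocation, so treat the numbers as rough.

```python
NUM_LAYERS = 30
SWA_WINDOW = 1024  # tokens kept by each sliding-window layer


def layer_types(num_layers=NUM_LAYERS, swa_per_block=5):
    """Five sliding-window layers, then one full-attention layer, repeating."""
    block = swa_per_block + 1
    return ["full" if (i + 1) % block == 0 else "swa" for i in range(num_layers)]


def kv_cache_bytes(context_tokens, bytes_per_value=2):
    """Rough KV cache size for one sequence (assumes f16 cache by default)."""
    total = 0
    for kind in layer_types():
        if kind == "swa":
            kv_heads, head_dim = 8, 256
            tokens = min(context_tokens, SWA_WINDOW)  # window caps the cache
        else:
            kv_heads, head_dim = 2, 512
            tokens = context_tokens  # full layers keep every token
        # Factor 2 accounts for storing both keys and values.
        total += 2 * kv_heads * head_dim * tokens * bytes_per_value
    return total


if __name__ == "__main__":
    kinds = layer_types()
    print(kinds.count("swa"), "sliding-window layers,", kinds.count("full"), "full layers")
    print(f"~{kv_cache_bytes(32768) / 2**20:.0f} MiB KV cache at 32K context")
```

Under these assumptions only the 5 full-attention layers grow with context length, while the 25 sliding-window layers stop growing after 1024 tokens.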
Recommended Settings
From the official Gemma authors:
Inference parameters:
temperature=1.0, top_p=0.95, top_k=64
Important:
- Use --jinja with llama.cpp for proper chat template handling
- Vision support requires the mmproj file alongside the main GGUF. Place images before text in your prompt for best vision performance.
- Keep at least 32K context for serious agentic work; the model can take much more (256K native) if you need it
- Sliding window is baked into the architecture, so no special flag is needed
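The settings above can be combined in a single request body. This is a sketch of an OpenAI-style chat payload with the recommended sampling parameters and the image part placed before the text part. It assumes the server accepts `image_url` content parts with base64 data URLs and a `top_k` field (llama-server does, to my understanding, but check your runtime); `build_vision_payload` is a hypothetical helper name.

```python
import base64


def build_vision_payload(prompt, image_bytes, mime="image/png"):
    """Build a chat request body with the image placed before the text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, then text, as the card recommends.
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # Sampling parameters recommended by the Gemma authors.
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,  # not part of the OpenAI spec; assumed server extension
    }


if __name__ == "__main__":
    payload = build_vision_payload("Describe this image.", b"\x89PNG placeholder bytes")
    print([part["type"] for part in payload["messages"][0]["content"]])
```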
Turning Thinking On/Off
Gemma4 has a thinking mode controlled via enable_thinking in the chat template. It’s the same pattern as Qwen3.6: set false for faster, shorter replies and true (default) when you want chain-of-thought.
LM Studio
- Load the model
- Right-side settings panel → Model Settings → Prompt Template (or Chat Template Options)
- Set enable_thinking to false (or true) in the template kwargs
llama.cpp
llama-server — set as default for all requests:
llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
--mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
--jinja -c 32768 -ngl 99 \
--chat-template-kwargs '{"enable_thinking": false}'
Per-request via the OpenAI-compatible API:
{
"model": "gemma4-26b-a4b",
"messages": [{"role": "user", "content": "..."}],
"chat_template_kwargs": {"enable_thinking": false}
}
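For scripted use, the per-request body above can be sent with the Python standard library. This is a minimal sketch, assuming llama-server is listening on its default address `http://127.0.0.1:8080` and exposes `/v1/chat/completions`; the helper names `build_chat_request` and `send` are made up for illustration.

```python
import json
import urllib.request


def build_chat_request(prompt, enable_thinking, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a chat request that toggles thinking mode."""
    payload = {
        "model": "gemma4-26b-a4b",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   print(send(build_chat_request("Summarize RoPE in one line.", False)))
```

Keeping request construction separate from sending makes it easy to inspect the exact JSON before it goes to the server.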
Usage
Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.
llama-server:
llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
--mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
--jinja -c 32768 -ngl 99
llama-cli:
llama-cli -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
--mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
--jinja -c 32768 -ngl 99
Other Models
*Tested with both automated and manual refusal benchmarks; none have been found in standard use. A small number of edge-case prompts deflect on the first ask but comply on a re-ask or strategic framing. If you hit one that’s actually obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.