HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Summary

HauhauCS releases Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced, a lossless uncensored variant of Gemma4 with 0/465 refusals after over a month of development, available in GGUF formats.

Task: image-text-to-text

Tags: gguf, uncensored, gemma4, moe, vision, multimodal, agentic, coding, image-text-to-text, en, base_model:google/gemma-4-26B-A4B-it, base_model:quantized:google/gemma-4-26B-A4B-it, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

Cached at: 05/20/26, 02:26 PM

Source: https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

**Join the Discord** for updates, roadmaps, projects, or just to chat.

Gemma4-26B-A4B uncensored by HauhauCS. 0/465 Refusals*. Release Candidate after over 1 month of nonstop work on this one.

HuggingFace’s “Hardware Compatibility” widget doesn’t recognize K_P quants, so it may show fewer files than actually exist. Click **“View +X variants”** or go to **Files and versions** to see all available downloads.

About

GenRM Defeated!

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended — just without the refusals.

These are meant to be the best lossless uncensored models out there.

Balanced - Release Candidate

This legitimately took me over 1 month of non-stop work. I’m targeting 0 refusals in standard use, and that’s what I’m seeing in testing (automated and manual); a handful of edge-case prompts still deflect on the first try but follow through on a re-ask. If you hit a prompt that Balanced won’t get past, the Aggressive variant is coming once I figure out how to maintain lossless or near-lossless quality for it.

  • Balanced: will reason through edgy requests, occasionally attach a short safety framing, then deliver the full answer. Output is complete, nothing held back, but it can talk itself into it first. Recommended default: 99%+ of users will be happy here. Best for creative writing, RP, emotional intelligence. Normally I’d also say “agentic coding/tool use”; however, in my in-depth testing, Qwen3.6 has been net superior on such tasks. Do be mindful of the few deflection categories I mentioned already.
  • Aggressive *(separate release, WIP)*: strips the self-reasoning preamble and gives direct answers to any DEEPLY censored topics.

Balanced also has meaningfully more stable sampling across re-runs, which matters for long-context sessions: no sporadic topic drift deep into a session.

Downloads

| File | Quant | BPW | Size |
|---|---|---|---|
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q8_K_P.gguf | Q8_K_P | 8.64 | 27 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q6_K_P.gguf | Q6_K_P | 7.21 | 23 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_P.gguf | Q5_K_P | 6.12 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_M.gguf | Q5_K_M | 6.06 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf | Q4_K_P | 5.36 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf | Q4_K_M | 5.32 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ4_XS.gguf | IQ4_XS | 4.41 | 14 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_P.gguf | Q3_K_P | 4.25 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_M.gguf | Q3_K_M | 4.21 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ3_M.gguf | IQ3_M | 3.93 | 12 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q2_K_P.gguf | Q2_K_P | 3.39 | 11 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ2_M.gguf | IQ2_M | 3.29 | 10 GB |
| mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf | mmproj (f16) | - | 1.2 GB |

BPW is slightly higher than nominal across the board because Gemma4 has a lot of per-layer norm/scale tensors kept at F32 (multiple post-ffw norms per layer). All quants were generated with an importance matrix (imatrix) for optimal quality preservation on uncensored weights.
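As a rough sanity check on the sizes listed here, file size follows from bits per weight: size ≈ parameters × BPW / 8 bytes. The sketch below assumes the 25.2B total parameter count quoted in the Specs section and decimal gigabytes (1 GB = 10^9 bytes). The function name is illustrative, and real files also carry metadata, so treat the result as an estimate only.

```python
def estimate_gguf_size_gb(total_params: float, bpw: float) -> float:
    """Estimate file size in decimal GB from parameter count and bits per weight."""
    total_bits = total_params * bpw
    return total_bits / 8 / 1e9


TOTAL_PARAMS = 25.2e9  # assumption: total parameter count quoted in the Specs section

# BPW values copied from the Downloads listing
for quant, bpw in [("Q8_K_P", 8.64), ("Q4_K_P", 5.36), ("IQ2_M", 3.29)]:
    print(f"{quant}: ~{estimate_gguf_size_gb(TOTAL_PARAMS, bpw):.1f} GB")
```

The three estimates land within rounding distance of the listed 27 GB, 17 GB, and 10 GB.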

What are K_P quants?

K_P (“Perfect”) quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile: the top 25% most-important tensors (per imatrix calibration) are promoted to a higher quant type.

A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.
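The promotion rule described above (rank tensors by importance, promote the top quarter) can be illustrated with a small sketch. This is not the author's tooling, which is not published in this card: the function name, tensor names, importance scores, and the Q4_K/Q6_K pairing are all hypothetical, and real scores would come from imatrix calibration.

```python
import math


def build_quant_profile(importance, base_type, promoted_type, fraction=0.25):
    """Map each tensor name to a quant type, promoting the top `fraction` by score."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    n_promoted = math.ceil(len(ranked) * fraction)
    promoted = set(ranked[:n_promoted])
    return {
        name: (promoted_type if name in promoted else base_type)
        for name in importance
    }


# Hypothetical importance scores for four tensors
scores = {
    "blk.0.attn_q.weight": 0.91,
    "blk.0.attn_k.weight": 0.12,
    "blk.0.ffn_down.weight": 0.47,
    "blk.1.attn_q.weight": 0.33,
}
profile = build_quant_profile(scores, base_type="Q4_K", promoted_type="Q6_K")
for name, quant_type in profile.items():
    print(f"{name}: {quant_type}")
```

With four tensors and a 25% fraction, only the highest-scoring tensor is promoted; the rest stay at the base type.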

**Note:** K_P quants may show as “?” in LM Studio’s quant column. This is a display issue only; the model loads and runs fine.

Why this model for agentic work

26B total params with only ~4B active per forward pass (top-8 of 128 experts). You get the reasoning footprint of a 26B with the throughput of a ~4B for inference cost — which matters when you’re chaining 10+ tool calls per task. Sliding-window attention (1024 tokens) plus periodic full attention keeps long contexts cheap without losing global coherence.
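To make "top-8 of 128 experts" concrete, here is a generic top-k gating sketch. It is a textbook illustration, not Gemma4's actual router: the card does not say how the router scores or normalizes experts, so the softmax over the selected logits and the random scores are assumptions.

```python
import math
import random


def route_top_k(router_logits, k=8):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # made-up router scores for one token
selected = route_top_k(logits, k=8)
print(f"{len(selected)} of {len(logits)} experts run for this token")
```

Only the selected experts' feed-forward blocks execute for a given token, which is why per-token compute tracks the active parameter count rather than the total.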

Balanced is calibrated for this. It removes refusals on security/ops/research-adjacent topics that block legitimate coding work, without bending the sampling geometry that keeps long chains coherent.

Recommended quant for most coding work: Q4_K_P (17 GB, fits in 24 GB VRAM with room for context) or Q8_K_P (27 GB) if you have more VRAM and want maximum quality with minimal offloading.

Do note: the main use case for Gemma4 is creative writing, roleplaying, and emotional intelligence.
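One way to turn this recommendation into a rule of thumb is to pick the largest quant whose file fits after reserving VRAM for the vision projector and runtime overhead. In the sketch below the file sizes come from the Downloads listing, while `pick_quant` and the `headroom_gb` value (KV cache plus compute buffers) are hypothetical; the right headroom depends on context length and runtime.

```python
# (quant, file size in GB) copied from the Downloads listing, largest first
QUANTS = [
    ("Q8_K_P", 27), ("Q6_K_P", 23), ("Q5_K_P", 19), ("Q5_K_M", 19),
    ("Q4_K_P", 17), ("Q4_K_M", 17), ("IQ4_XS", 14), ("Q3_K_P", 13),
    ("Q3_K_M", 13), ("IQ3_M", 12), ("Q2_K_P", 11), ("IQ2_M", 10),
]
MMPROJ_GB = 1.2  # f16 vision projector, from the same listing


def pick_quant(vram_gb, headroom_gb, with_vision=True):
    """Return the first (largest) quant that fits the remaining budget, or None."""
    budget = vram_gb - headroom_gb - (MMPROJ_GB if with_vision else 0.0)
    for name, size_gb in QUANTS:
        if size_gb <= budget:
            return name
    return None


print(pick_quant(24, headroom_gb=4))  # Q4_K_P
```

With 4 GB reserved on a 24 GB card the sketch agrees with the Q4_K_P recommendation; a smaller reserve or a text-only setup would tip it toward a Q5 file.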

Specs

  • 25.2B total / 3.8B active params (128 routed experts, top-8 + 1 shared expert)
  • 30 layers, hybrid attention: 5× sliding-window (1024 tokens) → 1× full global, repeating. Uses Proportional RoPE (p-RoPE).
  • Hidden dim 2816, FFN dim 2112, MoE expert FFN 704, vocab 262144
  • Head dim 256 (SWA) / 512 (full), 16 attention heads, 8 KV heads (2 for full layers)
  • 256K native context
  • Natively multimodal (text + vision) — ships with mmproj. Variable visual token budgets: 70 / 140 / 280 / 560 / 1120 per image.
  • Based on google/gemma-4-26B-A4B-it
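The attention figures in the list above allow a back-of-the-envelope KV-cache estimate, which shows why the hybrid layout keeps long contexts cheap. The sketch assumes an f16 cache (2 bytes per element), reads "8 KV heads (2 for full layers)" as 8 KV heads on sliding-window layers and 2 on full layers, and assumes sliding-window layers cache at most 1024 tokens. Actual runtimes may allocate differently, so this is an estimate, not a measurement.

```python
def kv_cache_bytes(context_tokens: int, bytes_per_element: int = 2) -> int:
    """Estimate KV-cache size for the hybrid attention layout listed in the specs."""
    n_layers = 30
    group = 6                       # 5 sliding-window layers followed by 1 full layer
    n_full = n_layers // group      # 5 full-attention layers
    n_swa = n_layers - n_full       # 25 sliding-window layers
    window = 1024

    # Elements per token per layer: K and V, each kv_heads * head_dim wide
    swa_per_token = 2 * 8 * 256
    full_per_token = 2 * 2 * 512

    swa_tokens = min(context_tokens, window)  # assumption: SWA layers keep only the window
    swa_elements = n_swa * swa_per_token * swa_tokens
    full_elements = n_full * full_per_token * context_tokens
    return (swa_elements + full_elements) * bytes_per_element


for ctx in (32_768, 262_144):
    print(f"{ctx} tokens: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
# 32768 tokens: 840 MiB
# 262144 tokens: 5320 MiB
```

Under these assumptions the sliding-window layers contribute a fixed 200 MiB regardless of context length, and only the five full-attention layers grow with the context.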

Recommended Settings

From the official Gemma authors:

Inference parameters:

  • temperature=1.0, top_p=0.95, top_k=64
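If you talk to the model through llama-server's OpenAI-compatible endpoint, these values can travel in the request body. The sketch below only builds the body. `top_k` is not part of the standard OpenAI schema, so passing it this way assumes the server accepts it as an extension field; the helper name is illustrative and the model name is a placeholder.

```python
import json

# Recommended sampling values from the list above
GEMMA_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 64}


def build_chat_request(prompt: str, model: str = "gemma4-26b-a4b") -> dict:
    """Build a chat-completions body that carries the recommended sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **GEMMA_SAMPLING,
    }


body = build_chat_request("Summarize the trade-offs of MoE models.")
print(json.dumps(body, indent=2))
```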

Important:

  • Use --jinja with llama.cpp for proper chat template handling
  • Vision support requires the mmproj file alongside the main GGUF. Place images before text in your prompt for best vision performance.
  • Keep at least 32K context for serious agentic work; the model can take much more (256K native) if you need it
  • Sliding window is baked into the architecture — no special flag needed
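The image-before-text advice in the list above can be sketched as a chat message whose image part precedes the text part. This assumes the OpenAI-style `image_url` content format and a server started with the mmproj file; `build_vision_message` is an illustrative helper, not part of any library.

```python
import base64


def build_vision_message(image_bytes: bytes, question: str, mime: str = "image/png") -> dict:
    """Build a user message with the image part placed before the text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            {"type": "text", "text": question},
        ],
    }


# Placeholder bytes; in practice read them from a file, e.g. open(path, "rb").read()
message = build_vision_message(b"placeholder image bytes", "What is shown in this image?")
print([part["type"] for part in message["content"]])  # ['image_url', 'text']
```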

Turning Thinking On/Off

Gemma4 has a thinking mode controlled via enable_thinking in the chat template. It’s the same pattern as Qwen3.6: set false for faster, shorter replies and true (default) when you want chain-of-thought.

LM Studio

  1. Load the model
  2. Right-side settings panel → Model Settings → Prompt Template (or Chat Template Options)
  3. Set enable_thinking to false (or true) in the template kwargs

llama.cpp

llama-server — set as default for all requests:

llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99 \
  --chat-template-kwargs '{"enable_thinking": false}'

Per-request via the OpenAI-compatible API:

{
  "model": "gemma4-26b-a4b",
  "messages": [{"role": "user", "content": "..."}],
  "chat_template_kwargs": {"enable_thinking": false}
}
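The same per-request toggle can be sent from a script. This is a minimal standard-library sketch: the base URL assumes llama-server is listening on localhost port 8080, and the helper names are illustrative. Adjust the address to match your own setup.

```python
import json
import urllib.request


def make_chat_request(prompt, enable_thinking, base_url="http://localhost:8080"):
    """Build a POST request that sets enable_thinking for this call only."""
    body = {
        "model": "gemma4-26b-a4b",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


request = make_chat_request("What is 17 * 23?", enable_thinking=False)
# print(send(request))  # uncomment once llama-server is running
```

Building the request separately from sending it makes it easy to inspect the payload before any network call is made.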

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.

llama-server:

llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99

llama-cli:

llama-cli -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99

Other Models


*Tested with both automated and manual refusal benchmarks; no refusals have been found in standard use. A small number of edge-case prompts deflect on the first ask but comply on a re-ask or with strategic framing. If you hit one that’s actually obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.
