HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Hugging Face Models Trending 05/14/26, 04:24 PM Models

uncensored gemma4 llm open-source fine-tuning gguf balanced

Summary

HauhauCS releases Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced, a lossless uncensored variant of Gemma4 with 0/465 refusals after over a month of development, available in GGUF formats.

Task: image-text-to-text Tags: gguf, uncensored, gemma4, moe, vision, multimodal, agentic, coding, image-text-to-text, en, base_model:google/gemma-4-26B-A4B-it, base_model:quantized:google/gemma-4-26B-A4B-it, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

Cached at: 05/20/26, 02:26 PM

Source: https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

**Join the Discord** for updates, roadmaps, projects, or just to chat.

Gemma4-26B-A4B uncensored by HauhauCS. 0/465 Refusals*. Release Candidate after over a month of nonstop work on this one.

Hugging Face’s “Hardware Compatibility” widget doesn’t recognize K_P quants, so it may show fewer files than actually exist. Click **“View +X variants”** or go to **Files and versions** to see all available downloads.

About

GenRM Defeated!

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended — just without the refusals.

These are meant to be the best lossless uncensored models out there.

Balanced (Release Candidate)

This legitimately took me over a month of nonstop work. I’m targeting 0 refusals in standard use, and that’s what I’m seeing in testing (automated and manual). A handful of edge-case prompts still deflect on the first try but follow through on a re-ask. If you hit one that Balanced won’t get past, the Aggressive variant is coming once I figure out how to keep its quality lossless or near-lossless.

Balanced: will reason through edgy requests, occasionally attach a short safety framing, then deliver the full answer. The output is complete with nothing held back, but the model may talk itself into it first. Recommended default: 99%+ of users will be happy here. Best for creative writing, RP, and emotional intelligence. Normally I’d also list agentic coding/tool use, but in my in-depth testing Qwen3.6 has been net superior on such tasks. Be mindful of the few deflection categories mentioned above.
Aggressive *(separate release, WIP)*: strips the self-reasoning preamble and gives direct answers even on DEEPLY censored topics.

Balanced also has meaningfully more stable sampling across re-runs, which matters for long-context sessions: no sporadic topic drift deep into a conversation.

Downloads

| File | Quant | BPW | Size |
|---|---|---|---|
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q8_K_P.gguf | Q8_K_P | 8.64 | 27 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q6_K_P.gguf | Q6_K_P | 7.21 | 23 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_P.gguf | Q5_K_P | 6.12 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q5_K_M.gguf | Q5_K_M | 6.06 | 19 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf | Q4_K_P | 5.36 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf | Q4_K_M | 5.32 | 17 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ4_XS.gguf | IQ4_XS | 4.41 | 14 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_P.gguf | Q3_K_P | 4.25 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q3_K_M.gguf | Q3_K_M | 4.21 | 13 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ3_M.gguf | IQ3_M | 3.93 | 12 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q2_K_P.gguf | Q2_K_P | 3.39 | 11 GB |
| Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ2_M.gguf | IQ2_M | 3.29 | 10 GB |
| mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf | mmproj (f16) | n/a | 1.2 GB |

BPW is slightly higher than nominal across the board because Gemma4 has many per-layer norm/scale tensors kept at F32 (multiple post-FFW norms per layer). All quants were generated with an importance matrix (imatrix) for optimal quality preservation on the uncensored weights.
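As a sanity check on the table, a GGUF file’s size is roughly total parameters × BPW / 8 bytes. The sketch below assumes the 25.2B total parameter count from the Specs section and decimal gigabytes (1 GB = 10^9 bytes), and it leaves out the separate mmproj file.

```python
# Approximate GGUF file size from bits per weight (BPW).
# Assumptions: 25.2B total parameters (Specs section), decimal GB (1e9 bytes),
# and the mmproj vision projector excluded from the count.
TOTAL_PARAMS = 25.2e9

# (quant, BPW, size in GB) as listed in the Downloads table
QUANTS = [
    ("Q8_K_P", 8.64, 27), ("Q6_K_P", 7.21, 23), ("Q5_K_P", 6.12, 19),
    ("Q5_K_M", 6.06, 19), ("Q4_K_P", 5.36, 17), ("Q4_K_M", 5.32, 17),
    ("IQ4_XS", 4.41, 14), ("Q3_K_P", 4.25, 13), ("Q3_K_M", 4.21, 13),
    ("IQ3_M", 3.93, 12), ("Q2_K_P", 3.39, 11), ("IQ2_M", 3.29, 10),
]


def estimated_size_gb(bpw: float, n_params: float = TOTAL_PARAMS) -> float:
    """bytes = params * bits_per_weight / 8, reported in decimal GB."""
    return n_params * bpw / 8 / 1e9


for name, bpw, listed in QUANTS:
    print(f"{name:<7} {bpw:.2f} BPW -> ~{estimated_size_gb(bpw):.1f} GB (listed {listed} GB)")
```

Each estimate rounds to the listed size, which is consistent with the BPW column being computed over all ~25.2B weights.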

What are K_P quants?

K_P (“Perfect”) quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile: the top 25% most important tensors (per imatrix calibration) are promoted to a higher quant type.

A K_P quant effectively raises quality by 1-2 quant levels while being only ~5-15% larger than the base quant. K_P files are fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime; no special builds are needed.
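The card doesn’t publish how K_P profiles are built beyond “the top 25% of tensors by imatrix importance get promoted.” Here is a hypothetical sketch of just that selection step. The tensor names, the importance scores, and the promotion ladder (`PROMOTE_TO`) are all made up for illustration and are not HauhauCS’s actual tooling.

```python
import math

# Hypothetical promotion ladder: each base type maps to the next higher-precision type.
PROMOTE_TO = {"Q2_K": "Q3_K", "Q3_K": "Q4_K", "Q4_K": "Q5_K", "Q5_K": "Q6_K", "Q6_K": "Q8_0"}


def plan_promotions(importance: dict[str, float], base: str, fraction: float = 0.25) -> dict[str, str]:
    """Give every tensor `base`, then bump the top `fraction` (by importance) one level up."""
    n_promote = math.ceil(len(importance) * fraction)  # ceil: small sets still promote one
    ranked = sorted(importance, key=importance.get, reverse=True)
    promoted = set(ranked[:n_promote])
    higher = PROMOTE_TO.get(base, base)  # already at the top of the ladder: no change
    return {name: (higher if name in promoted else base) for name in importance}


# Made-up per-tensor importance scores (e.g. aggregated from imatrix statistics).
EXAMPLE_SCORES = {
    "blk.0.attn_v": 0.90, "blk.0.ffn_down": 0.70, "blk.1.attn_v": 0.40,
    "blk.1.ffn_down": 0.80, "blk.2.attn_q": 0.10, "blk.2.ffn_up": 0.20,
    "blk.3.attn_k": 0.30, "blk.3.ffn_gate": 0.05,
}

for tensor, qtype in plan_promotions(EXAMPLE_SCORES, "Q4_K").items():
    print(f"{tensor:<16} {qtype}")
```

With eight tensors, the two highest-scoring ones end up one level above the base type while the rest stay at Q4_K, which is why the file grows only modestly.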

**Note:** K_P quants may show as “?” in LM Studio’s quant column. This is a display issue only; the model loads and runs fine.

Why this model for agentic work

26B total params with only ~4B active per forward pass (top-8 of 128 experts). All 26B weights still have to be loaded, but each token only runs through ~4B of them, so you get the capacity of a 26B model at roughly the per-token compute cost of a ~4B one. That matters when you’re chaining 10+ tool calls per task. Sliding-window attention (1024 tokens) plus periodic full attention keeps long contexts cheap without losing global coherence.
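To make “top-8 of 128 experts” concrete, here is a generic top-k mixture-of-experts router for a single token. It is a sketch of the standard technique, not Gemma 4’s exact router. It assumes softmax gate weights renormalized over only the selected experts and omits the always-on shared expert listed in the Specs section.

```python
import math


def route_token(router_logits: list[float], k: int = 8) -> list[tuple[int, float]]:
    """Generic top-k MoE routing for one token.

    Returns (expert_index, gate_weight) pairs for the k highest-scoring experts,
    with softmax weights renormalized over just those k experts.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = router_logits[top[0]]  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# 128 routed experts, 8 selected per token (numbers from the model card);
# the logits here are arbitrary stand-ins for a real router's output.
logits = [math.sin(0.7 * i) for i in range(128)]
chosen = route_token(logits, k=8)
print([idx for idx, _ in chosen])
print(f"routed experts used: {len(chosen)}/128 = {len(chosen) / 128:.2%}")
```

Only 6.25% of the routed experts run per token. The card’s active share (3.8B of 25.2B, about 15%) is higher, presumably because attention, embeddings, and the shared expert run for every token.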

Balanced is calibrated for this. It removes refusals on security/ops/research-adjacent topics that block legitimate coding work, without bending the sampling geometry that keeps long chains coherent.

Recommended quant for most coding work: Q4_K_P (17 GB; fits in 24 GB of VRAM with room for context), or Q8_K_P (27 GB) if you have more VRAM and want maximum quality with minimal offloading.

Note: the main use case for Gemma4 is creative writing, roleplay, and emotional intelligence.

Specs

25.2B total / 3.8B active params (128 routed experts, top-8 + 1 shared expert)
30 layers, hybrid attention: 5× sliding-window (1024 tokens) → 1× full global, repeating. Uses Proportional RoPE (p-RoPE).
Hidden dim 2816, FFN dim 2112, MoE expert FFN 704, vocab 262144
Head dim 256 (SWA) / 512 (full), 16 attention heads, 8 KV heads (2 for full layers)
256K native context
Natively multimodal (text + vision) — ships with mmproj. Variable visual token budgets: 70 / 140 / 280 / 560 / 1120 per image.
Based on google/gemma-4-26B-A4B-it
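Why does the hybrid layout keep long contexts cheap? Here is a back-of-the-envelope KV-cache estimate built from the Specs numbers above. It is a rough sketch under loud assumptions that the card does not state: separate K and V tensors in every layer, an f16 cache (2 bytes per element), and a sliding-window cache holding exactly 1024 tokens. Real runtimes such as llama.cpp keep some extra slack and allocate other buffers on top of this.

```python
# Rough KV-cache estimate for the hybrid attention layout in Specs.
# Assumptions (not from the model card): separate K and V per layer, f16 cache,
# and a sliding-window cache of exactly SWA_WINDOW tokens.
N_LAYERS = 30
SWA_WINDOW = 1024
SWA_KV_HEADS, SWA_HEAD_DIM = 8, 256    # sliding-window layers
FULL_KV_HEADS, FULL_HEAD_DIM = 2, 512  # full global-attention layers
BYTES_PER_ELEM = 2                     # f16


def layer_types(n_layers: int = N_LAYERS) -> list[str]:
    """5 sliding-window layers followed by 1 full-attention layer, repeating."""
    return ["full" if (i + 1) % 6 == 0 else "swa" for i in range(n_layers)]


def kv_cache_bytes(context: int) -> int:
    total = 0
    for kind in layer_types():
        if kind == "swa":
            tokens, heads, dim = min(context, SWA_WINDOW), SWA_KV_HEADS, SWA_HEAD_DIM
        else:
            tokens, heads, dim = context, FULL_KV_HEADS, FULL_HEAD_DIM
        total += 2 * heads * dim * tokens * BYTES_PER_ELEM  # leading 2 = one K plus one V
    return total


for ctx in (32_768, 262_144):
    print(f"{ctx:>7} tokens: ~{kv_cache_bytes(ctx) / 2**20:,.0f} MiB")
```

Under these assumptions the 25 sliding-window layers cost a fixed ~200 MiB no matter how long the context gets. The 5 full-attention layers dominate: about 840 MiB in total at 32K and about 5,320 MiB at the full 256K.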

Recommended Settings

From the official Gemma authors:

Inference parameters:

temperature=1.0, top_p=0.95, top_k=64
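The three recommended knobs can be illustrated with a generic sampling sketch. This is not llama.cpp’s sampler code. It applies temperature, then top-k, then top-p (nucleus), and each filter renormalizes over the tokens that survive it. Runtimes differ in where they apply temperature in this chain, but at the recommended temperature=1.0 the order makes no difference.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=64, top_p=0.95, rng=None):
    """Draw one token id using temperature, then top-k, then top-p filtering.

    Assumes temperature > 0. Each filter renormalizes over the surviving tokens.
    """
    rng = rng or random.Random()
    # Temperature-scaled softmax (subtract the max logit for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Top-k: keep the k most likely token ids, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalized weights, so no final division is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

For example, `sample_token(logits, temperature=1.0, top_p=0.95, top_k=64)` mirrors the recommended settings. When one token already holds 95% or more of the probability mass, top-p leaves only that token, so sampling becomes deterministic.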

Important:

Use --jinja with llama.cpp for proper chat template handling
Vision support requires the mmproj file alongside the main GGUF. Place images before text in your prompt for best vision performance.
Keep at least 32K context for serious agentic work; the model can take much more (256K native) if you need it
Sliding window is baked into the architecture — no special flag needed
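When sending images through an OpenAI-compatible endpoint, “images before text” means putting the image content part ahead of the text part in the message. The sketch below builds such a payload. It assumes a recent llama-server build started with `--mmproj` that accepts base64 `image_url` data URIs, and the helper name `build_vision_request` is hypothetical.

```python
import base64
import json


def build_vision_request(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """OpenAI-style chat payload with the image part placed before the text part."""
    data_uri = f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},  # image first
                    {"type": "text", "text": prompt},                       # then the question
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
payload = build_vision_request(b"\x89PNG placeholder bytes", "Describe this image.")
print(json.dumps(payload)[:120])
```

In practice you would read the image with `open(path, "rb").read()` and POST the JSON to the server’s chat-completions endpoint.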

Turning Thinking On/Off

Gemma4’s thinking mode is controlled via enable_thinking in the chat template. It’s the same pattern as Qwen3.6: set false for faster, shorter replies and true (the default) when you want chain-of-thought.

LM Studio

Load the model
Right-side settings panel → Model Settings → Prompt Template (or Chat Template Options)
Set enable_thinking to false (or true) in the template kwargs

llama.cpp

llama-server — set as default for all requests:

llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99 \
  --chat-template-kwargs '{"enable_thinking": false}'

Per-request via the OpenAI-compatible API:

{
  "model": "gemma4-26b-a4b",
  "messages": [{"role": "user", "content": "..."}],
  "chat_template_kwargs": {"enable_thinking": false}
}
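The same per-request toggle can be sent from Python with only the standard library. This is a minimal sketch. It assumes llama-server is running locally on its default port 8080, and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: llama-server on its default host/port; adjust to your setup.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Chat payload that toggles Gemma4's thinking mode for this request only."""
    return {
        "model": "gemma4-26b-a4b",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }


def chat(prompt: str, thinking: bool = False, url: str = SERVER_URL) -> str:
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt, thinking)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

For example, with the server up, `chat("Summarize sliding-window attention in one sentence.")` returns a short reply without chain-of-thought, and `thinking=True` turns it back on for that call only.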

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.

llama-server:

llama-server -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99

llama-cli:

llama-cli -m Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_P.gguf \
  --mmproj mmproj-Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-f16.gguf \
  --jinja -c 32768 -ngl 99
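Both commands expect the main GGUF and the mmproj file to be on disk. The sketch below builds direct download URLs for a chosen quant. It assumes Hugging Face’s standard `resolve/<revision>/<filename>` URL pattern and the file naming from the Downloads table, and the helper `download_urls` is hypothetical.

```python
from urllib.parse import quote

REPO_ID = "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced"
PREFIX = "Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced"


def download_urls(quant: str, with_vision: bool = True, revision: str = "main") -> list[str]:
    """URLs for the chosen quant, plus the f16 mmproj needed for image input."""
    files = [f"{PREFIX}-{quant}.gguf"]
    if with_vision:
        files.append(f"mmproj-{PREFIX}-f16.gguf")
    return [f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{quote(name)}" for name in files]


for url in download_urls("Q4_K_P"):
    print(url)
```

Each printed URL can be passed to a downloader such as `curl -L -O`. Set `with_vision=False` if you only need text.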

Other Models

HauhauCS on HuggingFace

*Tested with both automated and manual refusal benchmarks; no refusals have been found in standard use. A small number of edge-case prompts deflect on the first ask but comply on a re-ask or with strategic framing. If you hit one that’s actually obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.


Similar Articles

HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive

HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals!

Jiunsong/supergemma4-26b-uncensored-gguf-v2

Gemma 4 26B-A4B GGUF Benchmarks
