Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2
Summary
SuperGemma4-26B-Uncensored-MLX-4bit-v2 is a fine-tuned and quantized variant of Google's Gemma 4 26B optimized for Apple Silicon, offering improved performance on code, reasoning, and tool-use tasks while maintaining faster inference speeds compared to the stock baseline.
View Cached Full Text
Cached at: 04/20/26, 02:45 PM
Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 · Hugging Face
Source: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#supergemma4-26b-uncensored-fast-v2SuperGemma4-26B-Uncensored-Fast v2
A faster, sharper, uncensored Gemma 4 26B for Apple Silicon.
This is the text-only flagship for people who want the core trade-off to be obvious at a glance:
- smarter than stock
Gemma 4 26B ITon real local agent tasks - faster than the stock local 4-bit baseline on the same machine
- uncensored, without falling apart on code, tool-use, or Korean prompts
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#why-this-modelWhy this model
If you want the fast line instead of the multimodal line, this is the one to run.
Fastis part of the release identity, not just a minor variant- Uncensored behavior is preserved while practical capability goes up
- Strong at code, browser tasks, tool-use, planning, and Korean
- Tuned for local agent workloads on Apple Silicon MLX
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#headline-numbersHeadline numbers
MetricGemma 4 26B IT original 4bitSuperGemma FastQuick bench overall91\.4``95\.8Avg generation speed42\.5 tok/s``46\.2 tok/sDelta overallbaseline\+4\.4Delta speedbaseline\+8\.7%
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#category-gains-vs-originalCategory gains vs original
CategoryOriginalSuperGemma FastDeltaCode92\.3``98\.6``\+6\.3Browser87\.5``89\.6``\+2\.1Logic86\.9``95\.2``\+8\.3System Design97\.8``98\.9``\+1\.1Korean90\.7``95\.0``\+4\.3
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#what-makes-it-attractiveWhat makes it attractive
- Beats the stock local 4-bit baseline in both quality and speed
- Produces stronger code, stronger reasoning, and more useful tool-oriented answers
- Handles Korean and agent-style prompts better than the original local run
- Keeps the uncensored feel without turning unstable or collapsing into broken outputs
- Built to feel immediately stronger in real usage, not just in a niche benchmark
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#base-and-formatBase and format
- Base model:google/gemma-4-26B-A4B-it
- Format: MLX 4-bit
- Size: about
13GB - Best use case: fast text-only local agent model with stronger practical capability than stock Gemma 4
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#why-it-is-better-than-stockWhy it is better than stock
- Higher quick-bench overall score:
95\.8vs91\.4 - Faster average generation speed:
46\.2 tok/svs42\.5 tok/s - Bigger gains where local agents actually benefit:- Code:
\+6\.3- Logic:\+8\.3- Korean:\+4\.3- Browser workflows:\+2\.1 - Uncensored behavior remains a core property of the release instead of being layered on after the fact
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#recommended-launchRecommended launch
mlx_lm.server \
--model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
--port 8080
For OpenAI-compatible serving, letmlx\_lm\.serverauto-detect the bundled template.
Do not pass\-\-chat\-template /path/to/chat\_template\.jinjaas a literal path string on launch paths that expect the template body. That can corrupt responses.
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#quick-testQuick test
mlx_lm.generate \
--model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
--prompt "Write a Python function that returns prime numbers up to n." \
--max-tokens 512
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#included-filesIncluded files
benchmark\_quick\_bench\_20260412\.jsonbenchmark\_quick\_bench\_20260412\_responses\.jsonlSERVING\_NOTES\.md
https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#notesNotes
- This is the fast text-only line.
- The earlier “reasoning is broken” report reproduced as a serving-template launch issue, not as weight corruption.
- Re-fused and re-benchmarked locally before upload.
Similar Articles
Jiunsong/supergemma4-26b-uncensored-gguf-v2
SuperGemma4-26B-Uncensored-Fast GGUF v2 is a quantized, locally-runnable variant of Google's Gemma-4-26B model optimized for Apple Silicon, offering faster inference speeds and less-censored chat behavior while maintaining practical performance on general tasks.
@HuggingModels: Gemma 4 is here, and it's optimized for Apple Silicon. This 4-bit quantized model runs fast on your Mac, not just in th…
Gemma 4 is a 4-bit quantized model optimized for Apple Silicon, enabling fast local inference on Mac devices, reducing reliance on cloud computing.
Gemma 4 26B-A4B GGUF Benchmarks
Unsloth has released KL Divergence benchmarks for Gemma 4 26B-A4B GGUF quantizations, showing Unsloth GGUFs top 21 of 22 sizes on the Pareto frontier. They also introduced a new UD-IQ4_NL_XL quant fitting in 16GB VRAM and updated Q6_K and MLX quants for both Gemma 4 and Qwen3.6.
unsloth/gemma-4-26B-A4B-it-GGUF
Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.
@0x0SojalSec: SUPER GEMMA 4 26B UNCENSORED GGUF v2 IS INSANE, - 0/100 refusals (actually uncensored) - Fixed all the tool-call + toke…
Super Gemma 4 26B Uncensored GGUF v2 is a community fine-tuned model offering uncensored responses with zero refusals, improved speed, and fixed tool-calling, optimized for local inference on llama.cpp and vLLM.