Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

Hugging Face Models Trending Models

Summary

SuperGemma4-26B-Uncensored-MLX-4bit-v2 is a fine-tuned and quantized variant of Google's Gemma 4 26B optimized for Apple Silicon, offering improved performance on code, reasoning, and tool-use tasks while maintaining faster inference speeds compared to the stock baseline.

Task: text-generation Tags: mlx, safetensors, gemma4, uncensored, apple-silicon, 4bit, quantized, reasoning, tool-use, coding, browser-automation, korean, fast, text-generation, conversational, en, ko, base_model:google/gemma-4-26B-A4B-it, base_model:quantized:google/gemma-4-26B-A4B-it, license:gemma, 4-bit, region:us
Original Article
View Cached Full Text

Cached at: 04/20/26, 02:45 PM

Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 · Hugging Face

Source: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#supergemma4-26b-uncensored-fast-v2SuperGemma4-26B-Uncensored-Fast v2

A faster, sharper, uncensored Gemma 4 26B for Apple Silicon.

This is the text-only flagship for people who want the core trade-off to be obvious at a glance:

  • smarter than stockGemma 4 26B ITon real local agent tasks
  • faster than the stock local 4-bit baseline on the same machine
  • uncensored, without falling apart on code, tool-use, or Korean prompts

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#why-this-modelWhy this model

If you want the fast line instead of the multimodal line, this is the one to run.

  • Fastis part of the release identity, not just a minor variant
  • Uncensored behavior is preserved while practical capability goes up
  • Strong at code, browser tasks, tool-use, planning, and Korean
  • Tuned for local agent workloads on Apple Silicon MLX

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#headline-numbersHeadline numbers

MetricGemma 4 26B IT original 4bitSuperGemma FastQuick bench overall91\.4``95\.8Avg generation speed42\.5 tok/s``46\.2 tok/sDelta overallbaseline\+4\.4Delta speedbaseline\+8\.7%

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#category-gains-vs-originalCategory gains vs original

CategoryOriginalSuperGemma FastDeltaCode92\.3``98\.6``\+6\.3Browser87\.5``89\.6``\+2\.1Logic86\.9``95\.2``\+8\.3System Design97\.8``98\.9``\+1\.1Korean90\.7``95\.0``\+4\.3

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#what-makes-it-attractiveWhat makes it attractive

  • Beats the stock local 4-bit baseline in both quality and speed
  • Produces stronger code, stronger reasoning, and more useful tool-oriented answers
  • Handles Korean and agent-style prompts better than the original local run
  • Keeps the uncensored feel without turning unstable or collapsing into broken outputs
  • Built to feel immediately stronger in real usage, not just in a niche benchmark

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#base-and-formatBase and format

  • Base model:google/gemma-4-26B-A4B-it
  • Format: MLX 4-bit
  • Size: about13GB
  • Best use case: fast text-only local agent model with stronger practical capability than stock Gemma 4

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#why-it-is-better-than-stockWhy it is better than stock

  • Higher quick-bench overall score:95\.8vs91\.4
  • Faster average generation speed:46\.2 tok/svs42\.5 tok/s
  • Bigger gains where local agents actually benefit:- Code:\+6\.3 - Logic:\+8\.3 - Korean:\+4\.3 - Browser workflows:\+2\.1
  • Uncensored behavior remains a core property of the release instead of being layered on after the fact

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#recommended-launchRecommended launch

mlx_lm.server \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --port 8080

For OpenAI-compatible serving, letmlx\_lm\.serverauto-detect the bundled template.

Do not pass\-\-chat\-template /path/to/chat\_template\.jinjaas a literal path string on launch paths that expect the template body. That can corrupt responses.

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#quick-testQuick test

mlx_lm.generate \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --prompt "Write a Python function that returns prime numbers up to n." \
  --max-tokens 512

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#included-filesIncluded files

  • benchmark\_quick\_bench\_20260412\.json
  • benchmark\_quick\_bench\_20260412\_responses\.jsonl
  • SERVING\_NOTES\.md

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2#notesNotes

  • This is the fast text-only line.
  • The earlier “reasoning is broken” report reproduced as a serving-template launch issue, not as weight corruption.
  • Re-fused and re-benchmarked locally before upload.

Similar Articles

Jiunsong/supergemma4-26b-uncensored-gguf-v2

Hugging Face Models Trending

SuperGemma4-26B-Uncensored-Fast GGUF v2 is a quantized, locally-runnable variant of Google's Gemma-4-26B model optimized for Apple Silicon, offering faster inference speeds and less-censored chat behavior while maintaining practical performance on general tasks.

Gemma 4 26B-A4B GGUF Benchmarks

Reddit r/LocalLLaMA

Unsloth has released KL Divergence benchmarks for Gemma 4 26B-A4B GGUF quantizations, showing Unsloth GGUFs top 21 of 22 sizes on the Pareto frontier. They also introduced a new UD-IQ4_NL_XL quant fitting in 16GB VRAM and updated Q6_K and MLX quants for both Gemma 4 and Qwen3.6.

unsloth/gemma-4-26B-A4B-it-GGUF

Hugging Face Models Trending

Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.