@0x0SojalSec: SUPER GEMMA 4 26B UNCENSORED GGUF v2 IS INSANE, - 0/100 refusals (actually uncensored) - Fixed all the tool-call + toke…

X AI KOLs Following 06/07/26, 04:33 PM Models

uncensored gemma gguf local-ai open-source 26b fine-tune

Summary

Super Gemma 4 26B Uncensored GGUF v2 is a community fine-tuned model offering uncensored responses with zero refusals, improved speed, and fixed tool-calling, optimized for local inference on llama.cpp and vLLM.



Cached at: 06/08/26, 09:23 AM

SUPER GEMMA 4 26B UNCENSORED GGUF v2 IS INSANE,

0/100 refusals (actually uncensored)
Fixed all the tool-call + tokenizer jank
90% faster prompt processing
Sharper, smarter, way more capable responses
Perfect local beast for llama.cpp

Runs on 16, 18, or 22 GB of VRAM (16.8 GB Q4_K_M file)

http://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2…
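The tweet does not explain how a 16.8 GB file runs on a 16 GB card. One plausible reading is partial GPU offload, with the remaining layers kept in system RAM. The sketch below is a rough estimate only: it assumes all layers are the same size, and the layer count of 30 and the 1.5 GB reserve for KV cache and buffers are hypothetical placeholders, not figures from the model card.

```python
def gpu_layers_that_fit(file_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers.

    reserve_gb is a guessed allowance for KV cache and scratch buffers.
    """
    per_layer_gb = file_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# 16.8 GB is the Q4_K_M file size quoted in the tweet; 30 layers is hypothetical.
for vram in (16, 18, 22):
    print(vram, "GB VRAM ->", gpu_layers_that_fit(16.8, 30, vram), "of 30 layers")
```

If the estimate is below the full layer count, the result could be passed as `n_gpu_layers` when constructing `Llama` in llama-cpp-python; actual memory use depends on context length and should be measured.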

Jiunsong/supergemma4-26b-uncensored-gguf-v2 · Hugging Face

Source: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2

Libraries

llama-cpp-python

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Jiunsong/supergemma4-26b-uncensored-gguf-v2",
	filename="supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
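The snippet above calls `create_chat_completion` but discards the result. In llama-cpp-python the non-streaming call returns an OpenAI-style dictionary, so the reply text can be pulled out as sketched here. The `reply_text` helper and the `sample` dictionary are illustrative stand-ins, not part of the model card.

```python
def reply_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in with the same shape as a real response;
# with a loaded model you would pass llm.create_chat_completion(...) instead.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(reply_text(sample))
```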

Local Apps

llama.cpp

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M
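With `-hf`, the first `llama-server` start has to download the model before it can answer, so a client may need to wait. A minimal polling sketch follows. It assumes the server listens on port 8080 (the port used in the Pi and Hermes settings on this page) and that it exposes a `/health` endpoint that returns HTTP 200 once the model is loaded; check both against your llama.cpp build. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def server_ready(base_url: str = "http://localhost:8080") -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503.
        return False


def wait_until_ready(probe=server_ready, attempts: int = 60,
                     delay_s: float = 5.0) -> bool:
    """Call probe() up to `attempts` times, sleeping delay_s between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay_s)
    return False


# Usage, after starting llama-server in another terminal:
#   if wait_until_ready():
#       print("server is up")
```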

Use Docker

docker model run hf.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

vLLM

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jiunsong/supergemma4-26b-uncensored-gguf-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jiunsong/supergemma4-26b-uncensored-gguf-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
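The same request can be sent from Python using only the standard library. This is a sketch of the curl call above, using the same URL, header, and JSON body; the `build_chat_request` and `ask` helpers are hypothetical names, and `ask` assumes the server answers in the usual OpenAI chat-completion shape.

```python
import json
import urllib.request

MODEL = "Jiunsong/supergemma4-26b-uncensored-gguf-v2"


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with `vllm serve` already running:
#   print(ask("What is the capital of France?"))
```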

Use Docker

docker model run hf.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Ollama

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Ollama:

ollama run hf.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Unsloth Studio

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jiunsong/supergemma4-26b-uncensored-gguf-v2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jiunsong/supergemma4-26b-uncensored-gguf-v2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jiunsong/supergemma4-26b-uncensored-gguf-v2 to start chatting

Pi

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M"
        }
      ]
    }
  }
}
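If `~/.pi/agent/models.json` already lists other providers, pasting the JSON above over the file would drop them. The sketch below merges the `llama-cpp` entry into whatever is already there. The provider values are copied from the snippet above; the `add_provider` helper is hypothetical, and it assumes the file, if present, is valid JSON with an optional top-level `providers` object.

```python
import json
from pathlib import Path

# Provider entry copied from the models.json snippet above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M"}],
}


def add_provider(config_path: Path, name: str = "llama-cpp",
                 provider: dict = PROVIDER) -> dict:
    """Insert or replace one provider, keeping any others already present."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("providers", {})[name] = provider
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Usage:
#   add_provider(Path.home() / ".pi" / "agent" / "models.json")
```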

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Run Hermes

hermes

Docker Model Runner

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Docker Model Runner:

docker model run hf.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Lemonade

How to use Jiunsong/supergemma4-26b-uncensored-gguf-v2 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M

Run and chat with the model

lemonade run user.supergemma4-26b-uncensored-gguf-v2-Q4_K_M

List all available models

lemonade list
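The pull and run commands above use different names for the same model: the pull takes the Hugging Face checkpoint with a `:Q4_K_M` suffix, while the run uses a `user.`-prefixed local name. The sketch below reproduces that mapping. The rule is inferred only from the two commands on this page and is not a documented Lemonade guarantee, so confirm the real name with `lemonade list`; the function name is hypothetical.

```python
def lemonade_local_name(checkpoint: str) -> str:
    """Map 'owner/repo:VARIANT' to 'user.repo-VARIANT' (inferred pattern)."""
    repo, sep, variant = checkpoint.partition(":")
    if not sep or not variant:
        raise ValueError("expected a checkpoint of the form owner/repo:VARIANT")
    model = repo.split("/")[-1]  # drop the Hugging Face owner prefix
    return f"user.{model}-{variant}"


print(lemonade_local_name("Jiunsong/supergemma4-26b-uncensored-gguf-v2:Q4_K_M"))
```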


Similar Articles

Jiunsong/supergemma4-26b-uncensored-gguf-v2

Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals!

@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…
