@Tono_Ken3: Added Q3 series to gemma-4-12B-coder-fable5-composer2.5-GGUF You might be able to try out the essence of Fable5 (as a t…

X AI KOLs Timeline 06/16/26, 04:16 AM Models

gemma-4 gguf quantization code-generation open-source hugging-face fine-tuning

Summary

New Q3 quantizations added to the gemma-4-12B-coder-fable5-composer2.5 GGUF model, enabling the coding-focused fine-tune to run on GPUs with around 6GB VRAM using importance-matrix quantized versions.

Added Q3 series to gemma-4-12B-coder-fable5-composer2.5-GGUF You might be able to try out the essence of Fable5 (as a teacher role) in coding even on a GPU with around 6GB VRAM

Original Article

View Cached Full Text

Cached at: 06/16/26, 11:49 AM

Added Q3 series to gemma-4-12B-coder-fable5-composer2.5-GGUF You might be able to try out the essence of Fable5 (as a teacher role) in coding even on a GPU with around 6GB VRAM

sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF · Hugging Face

Source: https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%92%BB-gemma-4-12b-coder-fable5-%C3%97-composer25–imatrix-gguf-%E2%9C%A8💻 Gemma-4-12B-Coder (fable5 × composer2.5) —imatrix GGUF✨

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#runs-anywhere-llamacpp-runs–amdvulkan-cpu-apple-nvidia-no-blackwell-no-mtp-just-gguf-%F0%9F%90%A7%F0%9F%8D%8E%F0%9F%AA%9FRuns anywhere llama.cpp runs —AMD/Vulkan, CPU, Apple, NVIDIA. No Blackwell, no MTP, just GGUF. 🐧🍎🪟

Importance-matrix (imatrix) quants ofyuxinlu1’s coding model, calibrated onreal Python coding dataso the low-bit builds keep their coding smarts. Text-only (a coding model — no vision baggage). 💚

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%99%8F-credit🙏 Credit

Quants of**yuxinlu1/gemma\-4\-12B\-coder\-fable5\-composer2\.5\-v1— all thanks to@yuxinlu1for the model. ⭐ the original and watch it for a v2! The author’s recipe: a fine-tune ofgoogle/gemma\-4\-12B\-itonexecution-verifiedPython coding chains-of-thought (Composer 2.5 real CoT + a Fable 5 “second-attempt” set for the hard cases). Itthinks in Gemma’s native channel**, then writes clean, runnable code. De-refused; Python/algorithmic focus; English-centric.

Why this repo:the originals are static GGUF. These add animportance matrix(code-calibrated) soIQ4_XS / Q4_Kkeep more quality at low VRAM — the builds that fly forAMD/Vulkan and CPUfolks.

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%A6-pick-your-quant-all-imatrix📦 Pick your quant (all imatrix)

QuantSizeVibe🟢Q3_K_S5.53 GBsmallest that works— for8 GB / 6 GBcards (leaves room for context). ~**91.7%**HumanEval[:12]🟢Q3_K_M****6.09 GBtinyandsharp —**100%**HumanEval[:15]🔵IQ4_XS****6.64 GBthe imatrix 4-bit sweet spot —**100%**HumanEval[:15]🔵Q4_K_M****7.38 GBbalanced (embeddings/output at Q6_K)⚪Q5_K_M****8.55 GBquality-first if you have the RAM/VRAM

💡8 GB VRAM (or 6 GB):grabQ3_K_S(5.5 GB) — it leaves headroom for context and still codes well. On theVulkanbackend (AMD) all of these fly. ⚠️Avoid IQ3 (i-quant 3-bit) for this model—IQ3\_XXS/IQ3\_Scollapseto gibberish here (gemma-4’s special attention layers don’t survive 3-bit i-quants). The**Q3\_K\_\*K-quants**stay coherent at the same size — that’s why the small tiers are Q3_K, not IQ3.

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%9A%80-run-it-llamacpp–any-backend🚀 Run it (llama.cpp — any backend)

# build llama.cpp with your backend (Vulkan for AMD):  cmake -B build -DGGML_VULKAN=ON && cmake --build build
# grab one quant:
hf download sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF gemma-4-12B-coder-fable5-composer2.5-IQ4_XS.gguf --local-dir .

# chat server (OpenAI-compatible at http://localhost:8080)
./llama-server -m gemma-4-12B-coder-fable5-composer2.5-IQ4_XS.gguf \
  -ngl 99 --ctx-size 16384 -fa on --jinja \
  --temp 1.0 --top-p 0.95 --top-k 64 --host 0.0.0.0 --port 8080

⚠️ Needs arecent llama.cpp— this is thegemma4architecture (older builds won’t load it). 🧠Thinking is on by defaultvia the chat template (\-\-jinja). The model reasons through edge cases, then writes the code. For deterministic coding use\-\-temp 0.

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%A6%99-ollama-one-line-straight-from-this-repo🦙 Ollama (one line, straight from this repo)

ollama run hf.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF:Q4_K_M

Pick any tag:Q3\_K\_S``Q3\_K\_M``IQ4\_XS``Q4\_K\_M``Q5\_K\_M.

❗“manifest not found”?You must includeboththehf\.co/prefixandan explicit quant tag. Without a tag, Ollama looks for:latest(which doesn’t exist here); withouthf\.co/, it searches Ollama’s own registry instead of this repo. The fix is just…\-GGUF:Q4\_K\_M.

Also works inLM Studio / Jan / KoboldCpp— import the GGUF, pick a quant, go. 🐾

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%8A-how-good-is-it-greedy-pass1📊 How good is it? (greedy pass@1)

BenchmarkScoreHumanEval****90.2%(148/164)MBPP****85.7%(366/427) Strong at hard algorithms,bug-fixing & refactoring, and faithful open reasoning. Japanese prompts cause no measurable Python-quality drop.

⚠️One honest caveat:ontime-series / quant-financecode it can introduce alook-ahead bias(and its reasoning may state the right rule while the code does the opposite). Great algorithm/debug helper — butreview its pandas/numpy back-test codebefore trusting it.

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%94%A7-quant-details🔧 Quant details

imatrixcomputed on acode-heavycalibration set (HumanEval + MBPP problems & solutions) so the importance matrix reflects real coding activations.
Source: the author’sQ8\_0GGUF (≈lossless). Text-onlygemma4(no vision/audio).
Higher tiers keep token-embeddings & output tensors atQ6\_K(K-quant default) for fidelity where it matters most; the Q3_K tiers trade a little there for size.
K-quants over i-quants here:gemma-4’s heterogeneous attention (head_dim 256 / 512 layers) survivesQ3\_K\_\*butcollapses underIQ3\_\*— verified, so the small tiers ship as Q3_K.

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%9A-license–use📚 License & use

Gemma Terms of Use(derivatives must comply). De-refused / not safety-aligned — add your own guardrails. Best on Python/algorithmic tasks; double-check general facts and time-series code. Shared as-is.Quants & eval byLna-Lab; thanks to @yuxinlu1.🐾✨

@Tono_Ken3: Added Q3 series to gemma-4-12B-coder-fable5-composer2.5-GGUF You might be able to try out the essence of Fable5 (as a t…

sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF · Hugging Face

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%92%BB-gemma-4-12b-coder-fable5-%C3%97-composer25–imatrix-gguf-%E2%9C%A8💻 Gemma-4-12B-Coder (fable5 × composer2.5) —imatrix GGUF✨

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%99%8F-credit🙏 Credit

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%A6-pick-your-quant-all-imatrix📦 Pick your quant (all imatrix)

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%9A%80-run-it-llamacpp–any-backend🚀 Run it (llama.cpp — any backend)

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%A6%99-ollama-one-line-straight-from-this-repo🦙 Ollama (one line, straight from this repo)

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%8A-how-good-is-it-greedy-pass1📊 How good is it? (greedy pass@1)

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%94%A7-quant-details🔧 Quant details

https://huggingface.co/sakamakismile/gemma-4-12B-coder-fable5-composer2.5-GGUF#%F0%9F%93%9A-license–use📚 License & use

Similar Articles

yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

@Tono_Ken3: I noticed that there might be another person who realized that gemma-4-12b could rival qwen3.6-35b in practical work Ye…

@analogalok: gemma-4-12B-agentic-fable5-composer2.5 V2 is out. the agentic upgrade to the model trained on Fable 5's reasoning. Runn…

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Gemma 4 26B-A4B GGUF Benchmarks

Submit Feedback

Similar Articles

yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

@Tono_Ken3: I noticed that there might be another person who realized that gemma-4-12b could rival qwen3.6-35b in practical work Ye…

@analogalok: gemma-4-12B-agentic-fable5-composer2.5 V2 is out. the agentic upgrade to the model trained on Fable 5's reasoning. Runn…

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Gemma 4 26B-A4B GGUF Benchmarks