gemma4

#gemma4

I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT

Reddit r/LocalLLaMA ↗ · 5h ago

The post maps the Kullback-Leibler divergence (KLD) caused by KV cache quantization in the Qwen3.6-35B-A3B and Gemma4-E2B QAT models.
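
For readers unfamiliar with the metric, the following is a minimal sketch of how a per-token KLD between a reference model and a quantized-cache run could be computed from two logit vectors. It is not the post author's tooling; the function names and the toy logits are hypothetical and chosen only for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats for one token position.

    ref_logits:  logits from the unquantized (reference) run
    test_logits: logits from the run with a quantized KV cache
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy example with made-up logits over a 4-token vocabulary.
reference = [2.0, 1.0, 0.1, -1.0]
quantized = [1.9, 1.1, 0.0, -0.8]
print(f"KLD = {kl_divergence(reference, quantized):.6f} nats")
```

A value of zero means the two runs produce identical next-token distributions; larger values mean the quantized cache shifts the distribution further from the reference.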

Gemma4-12B-QAT Uncensored Balanced is out with MTP (~60% speed boost)!

Reddit r/LocalLLaMA ↗ · yesterday

Gemma4-12B-QAT Uncensored Balanced has been released: an uncensored fine-tune with a multi-token-prediction (MTP) draft head that is claimed to make speculative decoding about 60% faster. It is optimized for llama.cpp and supports vision.

@webbigdata: How to Run Gemma4 12B on a MacBook Air or Underpowered Linux Machine with the Help of Colab-CLI Before I knew it, we'd …

X AI KOLs Timeline ↗ · 2026-06-15

A guide to running the Gemma4 12B model from an underpowered machine, such as a MacBook Air or a low-spec Linux computer, by using Colab-CLI with the free version of Google Colab.

Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G

Reddit r/LocalLLaMA ↗ · 2026-06-09

A live challenge is underway to accelerate inference of the Gemma 4 E4B model on a single A10G GPU, with a dashboard on Hugging Face tracking agent submissions.

What's your experience with Gemma4 QAT?

Reddit r/LocalLLaMA ↗ · 2026-06-08

A user shares a positive experience with the Gemma4 QAT model, noting quality improvements and speed gains with MTP, and asks others about their experiences.

MTP and QAT - what is the relation?

Reddit r/LocalLLaMA ↗ · 2026-06-07

A user seeks clarification on the relation between MTP (Multi-Token Prediction) and QAT (Quantization-Aware Training) in llama.cpp, particularly regarding GGUF compatibility for the Gemma4 model and the new QAT string in filenames.

QAT variant of Gemma4 26B A4B is not working well for me

Reddit r/LocalLLaMA ↗ · 2026-06-07

A user reports that the QAT variant of Gemma4 26B A4B performs worse than the non-QAT version on a chessboard SVG test, drawing the pieces unstably even with the suggested settings.

@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…

X AI KOLs Timeline ↗ · 2026-06-03

Google's new Gemma 4 12B is a single decoder-only transformer with encoder-free multimodal input, achieving strong benchmarks while being small enough to run locally on a budget GPU. It is released under the Apache 2.0 license.

Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU

Reddit r/LocalLLaMA ↗ · 2026-05-23

A developer created a Chrome extension called Dobby that runs Google's Gemma4 (Gemini Nano) locally on PC without needing a GPU, requiring only Chrome and 16GB RAM. The extension provides a simple interface to interact with the model for tasks like spell checking or summarizing.

Gemma4 26b a4b Apex quant is quite good

Reddit r/LocalLLaMA ↗ · 2026-05-23

A user benchmarks the APEX-quantized version of the Gemma4 26B A4B model on an AMD RX 9060 XT, reporting 38 tokens per second at 90k context with no quality degradation, and finds it better than previous quantizations.

Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp

Reddit r/LocalLLaMA ↗ · 2026-05-23

An experimental Jinja template for Gemma4 31B in llama.cpp improves stability for multi-turn tool calls by fixing common thinking-tag issues. The author welcomes community feedback, but the template is not recommended by Google.

@MervinPraison: You can now run OpenAI Codex App 100% free and fully local @ollama just added native Codex support install ollama → pul…

X AI KOLs Timeline ↗ · 2026-05-15

Ollama now natively supports Codex, allowing you to run the OpenAI Codex App entirely free and locally without subscriptions, API keys, or data leaving your laptop.

HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Hugging Face Models Trending ↗ · 2026-05-14

After more than a month of development, HauhauCS has released Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced, described as a lossless uncensored variant of Gemma4 with 0/465 refusals. It is available in GGUF formats.

MTP is all about acceptance rate

Reddit r/LocalLLaMA ↗ · 2026-05-08

A user benchmarked MTP (Multi-Token Prediction) on Gemma 4 with mlx-vlm on an M4 Max Studio. MTP was excellent for code generation (1.53x faster, 66% acceptance), detrimental for JSON output (50% slower, only 8% acceptance), and neutral for long-form prose, suggesting that its benefits vanish when acceptance drops below 50%.
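
To illustrate why acceptance rate matters so much, here is a minimal sketch of the usual back-of-the-envelope model for speculative decoding. It is not the benchmark author's method, and it will not reproduce the post's numbers exactly. It assumes each drafted token is accepted independently with probability `alpha`; the draft length `k` and the relative draft cost `draft_cost` are hypothetical parameters chosen for illustration.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    alpha: assumed per-token acceptance probability (independent draws)
    k:     number of tokens drafted per step
    """
    if alpha >= 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Rough speedup over plain decoding.

    draft_cost: assumed cost of drafting one token, as a fraction of one
                main-model forward pass (hypothetical value).
    """
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)

# Illustrative settings only: 3 drafted tokens, each costing 20% of a full pass.
for alpha in (0.08, 0.50, 0.66):
    print(f"acceptance {alpha:.2f}: ~{estimated_speedup(alpha, 3, 0.2):.2f}x")
```

Under these assumptions, low acceptance yields a value below 1 (a slowdown) because the drafting cost is paid on every step while few drafted tokens survive verification.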

I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Reddit r/LocalLLaMA ↗ · 2026-04-23

A hands-on benchmark of four local LLMs (Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4) on a 20k-token architecture-writing task, run on an RTX 5090, finds that Qwen3.6-27B delivers the best overall balance of clarity, completeness and usefulness.
