qat

#qat

Unsloth Gemma 4 QAT MTP assistant models now available

Reddit r/LocalLLaMA · 17h ago

Unsloth released Gemma 4 QAT MTP assistant models as GGUF files on Hugging Face, available in q8_0 and larger quantization formats.

Gemma 4 26B A4B IT QAT Comparison

Reddit r/LocalLLaMA · yesterday

A user benchmarks three quantized versions of Gemma 4 26B IT (4-bit, 6-bit, and 8-bit QAT) on MMLU_PRO and HumanEval. The 8-bit QAT model scores below the 6-bit quant on HumanEval and is not clearly better than the 4-bit one, which leads the user to question whether QAT is actually superior for this model.

@_philschmid: Weights: https://huggingface.co/collections/google/gemma-4-qat-q4-0… Blog: https://blog.google/innovation-and-ai/techno…

X AI KOLs Following · yesterday

Google released Gemma 4 models with quantization-aware training (QAT) at Q4_0 precision on Hugging Face, offering efficient variants from 5B to 33B parameters.

@_philschmid: More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory! It comes with a new mob…

X AI KOLs Following · yesterday

New QAT Gemma 4 checkpoints offer similar performance with ~4x less memory, enabling a 1GB memory footprint for Gemma 4 E2B via a new mobile quantization format.

[3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]

Reddit r/LocalLLaMA · yesterday

Benchmark results showing 1.2-1.8x tokens-per-second speedups on Gemma 4 models (12B and 26B) using QAT and MTP on a 24GB RTX 3090 GPU.

Gemma 4 12b QAT is a regression for my use case, despite all the hype.. Not my main Squeeze

Reddit r/LocalLLaMA · 2d ago

The author reports that the Gemma 4 12b QAT model suffers from a regression in tool calling and coding tasks compared to the standard Q5_K_L version, due to a bug involving control token misconfiguration. Despite high token speed, the model's inconsistent outputs make it unsuitable for agent workflows.

QATs Q4_0 from Google have more precision than Q4_K_XL from Unsloth (at least some)

Reddit r/LocalLLaMA · 2d ago

A technical comparison reveals that Google's Q4_0 quantized Gemma-4 models have higher precision and more high-precision tensors than Unsloth's Q4_K_XL versions, resulting in larger file sizes.

What's your experience with Gemma4 QAT?

Reddit r/LocalLLaMA · 2d ago

A user shares a positive experience with the Gemma4 QAT model, noting quality improvements and speed gains with MTP, and asks others about their experiences.

2-bit QAT model releases

Reddit r/LocalLLaMA · 2d ago

A discussion of the potential of 2-bit Quantization-Aware Training (QAT) for larger MoE models, comparing their performance to 4-bit QAT and ternary LLMs, and considering feasibility on consumer hardware.

MTP and QTA - what is the relation?

Reddit r/LocalLLaMA · 2d ago

A user seeks clarification on the relation between MTP (Multi-Token Prediction) and QAT (Quantization-Aware Training) in llama.cpp, particularly regarding GGUF compatibility for the Gemma4 model and the new QAT string in filenames.

QAT variant of Gemma4 26B A4B is not working well for me

Reddit r/LocalLLaMA · 2d ago

A user reports that the QAT variant of Gemma4 26B A4B performs worse than the non-QAT version on a chessboard SVG test, drawing pieces unstably even with the suggested settings.

120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP

Reddit r/LocalLLaMA · 3d ago

Google's Gemma 4 12B QAT model achieves 120 tok/s on a 12GB GPU using Multi-Token Prediction (MTP) with llama.cpp. The post includes a step-by-step guide, and its benchmark comparison against running without MTP shows a 2x speedup.

Does it make sense to use alternative quantizations of QAT models? [D]

Reddit r/MachineLearning · 3d ago

A discussion of whether it makes sense to apply alternative quantization methods to quantization-aware trained (QAT) models like Gemma-4, questioning whether Unsloth's benchmarks showing closer performance to QAT fine-tunes are beneficial or counterproductive.

Gemma 4 QAT benchmark results (AMD 7900 XTX): faster, less VRAM, no quality loss

Reddit r/LocalLLaMA · 4d ago

A user benchmarks Google's Gemma 4 QAT models on an AMD 7900 XTX, reporting up to 45% faster generation, 83% higher throughput, and significant VRAM savings (e.g., 5.7GB for the 12B QAT model) with no quality loss compared to standard weights.

unsloth/gemma-4-12B-it-qat-GGUF

Hugging Face Models Trending · 4d ago

Unsloth releases GGUF quantized versions of Google DeepMind's Gemma 4 models, optimized with Quantization-Aware Training (QAT) to reduce memory requirements while preserving quality, supporting multiple formats and sizes for diverse deployment.

Gemma 4 QAT confirmed to release soon!

Reddit r/LocalLLaMA · 6d ago

A Google Gemma team member has confirmed that Gemma 4 QAT (Quantization-Aware Training) models will be releasing soon, suggesting users wait before testing their own quantizations.

@spiritbuun: My first ever quant is going to drop in the next week. I've been working on this for over a month now. The recipe is lo…

X AI KOLs Following · 2026-06-02

Announcement of an upcoming release of a quantized version of the B27 model using quantization-aware training (QAT), described as the smartest B27 yet.
