Google's QAT Q4_0 GGUFs have higher precision than Unsloth's Q4_K_XL (at least in part)

Reddit r/LocalLLaMA 2026/06/08 04:26 News

quantization gguf model-comparison precision gemma-4 qat machine-learning

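Summary

A tensor-level comparison of the Gemma 4 E4B QAT GGUFs shows that Google's Q4_0 file keeps two tensors in q6_k and one in f16, while Unsloth's Q4_K_XL file stores everything except the f32 tensors as q4_0. That is why Google's file is the larger of the two (5.15 GB vs 4.22 GB).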

I wanted to try the new QATs, so I opened two collections on HF (the ones HF found for me):

- [https://huggingface.co/collections/google/gemma-4-qat-q4-0](https://huggingface.co/collections/google/gemma-4-qat-q4-0)
- [https://huggingface.co/collections/unsloth/gemma-4-qat](https://huggingface.co/collections/unsloth/gemma-4-qat)

Something odd caught my eye. Take E4B, for example:

- [https://huggingface.co/google/gemma-4-E4B-it-qat-q4\_0-gguf/resolve/main/gemma-4-E4B\_q4\_0-it.gguf](https://huggingface.co/google/gemma-4-E4B-it-qat-q4_0-gguf/resolve/main/gemma-4-E4B_q4_0-it.gguf) 5.15 GB
- [https://huggingface.co/unsloth/gemma-4-E4B-it-qat-GGUF/resolve/main/gemma-4-E4B-it-qat-UD-Q4\_K\_XL.gguf](https://huggingface.co/unsloth/gemma-4-E4B-it-qat-GGUF/resolve/main/gemma-4-E4B-it-qat-UD-Q4_K_XL.gguf) 4.22 GB

I wondered how a `_0` could be bigger than a `_K_XL`, so I looked inside both files (the method is at the end of the post\*).

From Google:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|---|---|---|---|---|
| q6_k | 0.75 | 2 | 3,489,660,928 | 2.44 GiB |
| q4_0 | 0.5 | 342 | 3,945,267,200 | 1.84 GiB |
| f16 | 2.0 | 1 | 27,525,120 | 52.50 MiB |
| f32 | 4.0 | 321 | 560,426 | 2.14 MiB |

From unsloth:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|---|---|---|---|---|
| q4_0 | 0.5 | 345 | 7,462,453,248 | 3.47 GiB |
| f32 | 4.0 | 321 | 560,426 | 2.14 MiB |

I also checked Google's other GGUFs. E2B:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|---|---|---|---|---|
| q6_k | 0.75 | 2 | 2,751,463,424 | 1.92 GiB |
| q4_0 | 0.5 | 275 | 1,863,057,408 | 888.38 MiB |
| f16 | 2.0 | 1 | 13,762,560 | 26.25 MiB |
| f32 | 4.0 | 263 | 286,243 | 1.09 MiB |

That looks like a `_K_XL`-style mix. The larger models (12B, for example) are just Q4_0, apart from one q6_k tensor:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|---|---|---|---|---|
| q4_0 | 0.5 | 328 | 10,899,947,520 | 5.08 GiB |
| q6_k | 0.75 | 1 | 1,006,632,960 | 720.00 MiB |
| f32 | 4.0 | 338 | 770,096 | 2.94 MiB |

What I don't know, and would like someone to explain, is why the E2B and E4B GGUFs contain these extra tensors that the larger models lack:

| # | Dtype | Tensor | Shape |
|---|---|---|---|
| 1 | f16 | per_layer_model_proj.weight | [1536, 8960] |
| 2 | f32 | per_layer_proj_norm.weight | [256] |
| 3 | q6_k | per_layer_token_embd.weight | [8960, 262144] |

\* Produced with `koboldcpp --analyze model.GGUF | vibe_coded.py`. If you know how to summarize the tensor data in a GGUF with the llama bundle, please tell me and I will compare it against the output of my vibe-coded tool. I considered putting the tool on GitHub, but I'm still not sure how to properly disclose the AI use.
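One way to sanity-check the tables against the download sizes is to redo the arithmetic with ggml's on-disk block sizes. The "Size Used" column appears to be a nominal bytes-per-element figure (0.5 for q4_0, 0.75 for q6_k), whereas ggml stores q4_0 as 18-byte blocks of 32 weights and q6_k as 210-byte blocks of 256 weights. The sketch below is not the tool used in the post. The names `BLOCK_LAYOUT`, `tensor_bytes` and `file_bytes` are made up for illustration, and it ignores GGUF metadata, the tokenizer and alignment padding.

```python
# ggml storage layout per dtype: (elements per block, bytes per block).
BLOCK_LAYOUT = {
    "q4_0": (32, 18),    # 16 bytes of packed 4-bit values + one fp16 scale
    "q6_k": (256, 210),  # 128 + 64 bytes of 6-bit values, 16 int8 scales, one fp16 scale
    "f16": (1, 2),
    "f32": (1, 4),
}

# Element totals copied from the E4B tables in the post.
GOOGLE_E4B = {
    "q6_k": 3_489_660_928,
    "q4_0": 3_945_267_200,
    "f16": 27_525_120,
    "f32": 560_426,
}
UNSLOTH_E4B = {
    "q4_0": 7_462_453_248,
    "f32": 560_426,
}


def tensor_bytes(dtype: str, n_elements: int) -> int:
    """Bytes needed to store n_elements of the given dtype, scales included."""
    per_block, block_bytes = BLOCK_LAYOUT[dtype]
    return n_elements // per_block * block_bytes


def file_bytes(elements_by_dtype: dict) -> int:
    """Total tensor payload in bytes (no metadata, no padding)."""
    return sum(tensor_bytes(d, n) for d, n in elements_by_dtype.items())


for label, table in (("google", GOOGLE_E4B), ("unsloth", UNSLOTH_E4B)):
    print(f"{label}: {file_bytes(table) / 1e9:.2f} GB")
```

Under these assumptions the payloads come out at roughly 5.14 GB and 4.20 GB, close to the 5.15 GB and 4.22 GB listed on HF, so the size gap is accounted for by the q6_k and f16 tensors in Google's file. The element totals also line up: Unsloth's 7,462,453,248 q4_0 elements equal the sum of Google's q6_k, q4_0 and f16 elements.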
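For a second opinion on the per-dtype totals, one option is the `gguf` Python package published from the llama.cpp repository (`pip install gguf`). The sketch below assumes that its `GGUFReader` exposes a `tensors` list whose entries carry `tensor_type`, `n_elements` and `n_bytes`. That matches the package as I remember it, but check it against the version you have installed. The helper names `summarize`, `read_gguf` and `print_summary` are hypothetical.

```python
from collections import defaultdict


def summarize(records):
    """Group (dtype, n_elements, n_bytes) records by dtype.

    Returns (dtype, tensor_count, total_elements, total_bytes) tuples,
    largest total_bytes first.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for dtype, n_elements, n_bytes in records:
        row = totals[dtype]
        row[0] += 1
        row[1] += n_elements
        row[2] += n_bytes
    rows = [(dtype, *vals) for dtype, vals in totals.items()]
    return sorted(rows, key=lambda r: r[3], reverse=True)


def read_gguf(path):
    """Read per-tensor records from a GGUF file (assumed gguf-py API)."""
    from gguf import GGUFReader  # imported lazily so summarize() works without it

    reader = GGUFReader(path)
    return [
        (t.tensor_type.name.lower(), int(t.n_elements), int(t.n_bytes))
        for t in reader.tensors
    ]


def print_summary(path):
    """Print one line per dtype: count, elements, size in GiB."""
    for dtype, count, elements, size in summarize(read_gguf(path)):
        print(f"{dtype:>6} | {count:>4} | {elements:>15,} | {size / 2**30:.2f} GiB")
```

Calling `print_summary("gemma-4-E4B_q4_0-it.gguf")` on a downloaded file should give a table comparable to the ones in the post. The byte totals will be somewhat higher than the "Bytes Total" column, since `n_bytes` reflects the actual stored size including block scales.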

