Introducing cyankiwi AWQ 4-bit quantization: the 26.05 update

Reddit r/LocalLLaMA 2026/05/14 18:49 Models

quantization awq 4-bit llama open-source model-optimization benchmark

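Summary

Cyankiwi has released an updated version of its AWQ 4-bit quantization method. The update optimizes the scaling factors and the quantization range jointly, and on Llama-3 models it reaches a lower KL divergence than the other 4-bit quantizations it was benchmarked against.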

In standard AWQ, the per-channel scaling factors and the quantization range are chosen in two steps: first the scaling factors, then the quantization parameters. But the two are not independent: the rounding error left by one depends on the choice of the other, so optimizing them sequentially costs quality.

Our cyankiwi AWQ 26.05 update fits the scaling factors and the quantization range jointly against a reconstruction objective.

Using Llama-3 as the example, we benchmarked the cyankiwi AWQ 26.05 update against every mainstream 4-bit method, measuring KL divergence (KLD) relative to the BF16 baseline on GPQA Diamond responses.

Result: cyankiwi achieves the lowest KLD on all three base models. Lower is better.

# Llama-3.2-3B-Instruct

|Quantized model|Method|KLD|
|:-|:-|:-|
|**cyankiwi/Llama-3.2-3B-Instruct-AWQ-INT4**|**cyankiwi AWQ INT4**|**0.00510**|
|unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit|unsloth BNB NF4|0.00785|
|unsloth/Llama-3.2-3B-Instruct-bnb-4bit|BNB NF4|0.00896|
|nvidia/Meta-Llama-3.2-3B-Instruct-ONNX-INT4|AWQ INT4|0.01494|
|casperhansen/llama-3.2-3b-instruct-awq|AWQ INT4|0.02437|

# Llama-3.1-8B-Instruct

|Quantized model|Method|KLD|
|:-|:-|:-|
|**cyankiwi/Llama-3.1-8B-Instruct-AWQ-INT4**|**cyankiwi AWQ INT4**|**0.00478**|
|RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16|GPTQ INT4|0.00729|
|unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit|unsloth BNB NF4|0.00769|
|unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit|BNB NF4|0.00835|
|RedHatAI/Llama-3.1-8B-Instruct-NVFP4|SmoothQuant NVFP4|0.01059|
|nvidia/Llama-3.1-8B-Instruct-NVFP4|NVFP4|0.01190|

# Llama-3.3-70B-Instruct

|Quantized model|Method|KLD|
|:-|:-|:-|
|**cyankiwi/Llama-3.3-70B-Instruct-AWQ-INT4**|**cyankiwi AWQ INT4**|**0.02826**|
|unsloth/Llama-3.3-70B-Instruct-unsloth-bnb-4bit|unsloth BNB NF4|0.04444|
|casperhansen/llama-3.3-70b-instruct-awq|AWQ INT4|0.04859|
|unsloth/Llama-3.3-70B-Instruct-bnb-4bit|BNB NF4|0.06879|
|nvidia/Llama-3.3-70B-Instruct-NVFP4|NVFP4|0.08307|
|RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16|GPTQ INT4|0.09272|

https://preview.redd.it/uicubbg6951h1.png?width=6400&format=png&auto=webp&s=2f7f1d4e46c9953f00c68518b3c2aa058fc34e32
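To make the sequential-versus-joint point concrete, here is a toy sketch in plain Python. It is not cyankiwi's algorithm, which the post does not spell out. It assumes an AWQ-style per-channel scale of the form `act_mag ** alpha`, a symmetric INT4 grid with one range per output row, and a brute-force grid search over `alpha` and a clip ratio. All names (`quantize_row`, `recon_error`, `alphas`, `clips`) and the random data are made up for illustration.

```python
import random

def quantize_row(row, clip, bits=4):
    """Symmetric round-to-nearest quantization of one weight row, range shrunk by `clip`."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed INT4
    max_abs = max(abs(w) for w in row) * clip
    step = max_abs / qmax if max_abs > 0 else 1.0
    # Values outside the clipped range saturate at the grid edges.
    return [max(-qmax - 1, min(qmax, round(w / step))) * step for w in row]

def recon_error(W, X, act_mag, alpha, clip):
    """Squared output error of X @ W.T after scale -> quantize -> unscale."""
    s = [m ** alpha for m in act_mag]               # per-input-channel scale
    err = 0.0
    for row in W:
        scaled = [w * sj for w, sj in zip(row, s)]
        deq = [q / sj for q, sj in zip(quantize_row(scaled, clip), s)]
        for x in X:
            ref = sum(xi * wi for xi, wi in zip(x, row))
            out = sum(xi * wi for xi, wi in zip(x, deq))
            err += (ref - out) ** 2
    return err

# Toy data (assumption): a few input channels carry much larger activations.
rng = random.Random(0)
n_in, n_out, n_samples = 16, 8, 32
gain = [8.0 if j % 8 == 0 else 1.0 for j in range(n_in)]
X = [[rng.gauss(0, 1) * gain[j] for j in range(n_in)] for _ in range(n_samples)]
W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
act_mag = [sum(abs(x[j]) for x in X) / n_samples for j in range(n_in)]

alphas = [i / 10 for i in range(11)]                # 0.0 .. 1.0
clips = [1.0 - 0.05 * i for i in range(11)]         # 1.0 down to 0.5

# Sequential: pick alpha with no clipping, then pick the clip ratio for that alpha.
seq_alpha = min(alphas, key=lambda a: recon_error(W, X, act_mag, a, 1.0))
seq_clip = min(clips, key=lambda c: recon_error(W, X, act_mag, seq_alpha, c))
seq_err = recon_error(W, X, act_mag, seq_alpha, seq_clip)

# Joint: score every (alpha, clip) pair against the same reconstruction error.
joint_err, joint_alpha, joint_clip = min(
    (recon_error(W, X, act_mag, a, c), a, c) for a in alphas for c in clips
)

print(f"sequential: alpha={seq_alpha:.1f} clip={seq_clip:.2f} err={seq_err:.4f}")
print(f"joint:      alpha={joint_alpha:.1f} clip={joint_clip:.2f} err={joint_err:.4f}")
```

Because the sequential choice is itself one point on the joint grid, the joint search can never do worse on this objective. How much better it does depends on the data, and a real implementation would need something cheaper than exhaustive search.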

