We built a calibration-aware Q4_K_M quant of Qwen3.5 0.8B that recovers 96.5% of the BF16 gap vs pure llama.cpp Q4_K_M (SpectralQuant)

Reddit r/LocalLLaMA 06/27/26, 11:29 AM Tools

quantization calibration qwen model-compression efficiency spectralquant llama-cpp

Summary

A calibration-aware Q4_K_M quantization of Qwen3.5 0.8B, built with SpectralQuant, closes 96.5% of the quality gap between the standard llama.cpp Q4_K_M quant and the BF16 baseline.
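
As a rough illustration of how a "gap recovered" figure like this is typically computed, here is a minimal sketch. The summary does not say which metric was used, so the function name `gap_recovered` and the numbers in the example are hypothetical and are not taken from the post; only the ratio itself is the point.

```python
def gap_recovered(bf16: float, baseline_quant: float, new_quant: float) -> float:
    """Fraction of the gap between a baseline quant and BF16 closed by a new quant.

    The sign of the gap cancels in the ratio, so this works for both
    lower-is-better metrics (perplexity, KLD) and higher-is-better ones.
    """
    gap = baseline_quant - bf16
    if gap == 0:
        raise ValueError("baseline quant already matches BF16; no gap to recover")
    return (baseline_quant - new_quant) / gap


# Hypothetical perplexity-style values, chosen only to illustrate the arithmetic.
bf16_score = 10.0        # assumed full-precision reference
stock_q4_score = 11.0    # assumed standard Q4_K_M result
tuned_q4_score = 10.035  # assumed calibration-aware Q4_K_M result

fraction = gap_recovered(bf16_score, stock_q4_score, tuned_q4_score)
print(f"{fraction:.1%} of the gap recovered")
```

A value of 1.0 would mean the new quant matches BF16 on that metric, and 0.0 would mean no improvement over the baseline quant.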


Similar Articles

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm)

Reddit r/LocalLLaMA

The article compares inference backends (llama.cpp, ik_llama.cpp, BeeLlama, vllm) for running Qwen 3.6 27B on an RTX 3090 24GB, finding that ik_llama.cpp with IQ4_KS quantization yields the best performance (1261 tok/s prefill, 72.9 tok/s decode).

Qwen3.6-27B Quantization Benchmark

Reddit r/LocalLLaMA

This article benchmarks various Qwen3.6-27B quantizations (Q8 to Q2) using KLD and Same Top P metrics, comparing providers like Unsloth and mradermacher, and offers recommendations for quality-size trade-offs.

@populartourist: Having worked consistently with Qwen3.6 27B NVFP4 on repos - it's clear that this quant is not reliable, at least for c…

X AI KOLs Timeline

The user reports that the Qwen3.6 27B NVFP4 quantization is unreliable for coding, with inconsistent quality despite high throughput, and suggests that Q4_K_M may be more consistent.

Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 %

Reddit r/LocalLLaMA

A community researcher shares a custom quantization recipe for Qwen3.6-27B that produces a smaller 30GB Q8 GGUF by keeping high-outlier sublayers in BF16, achieving better KLD and top-p metrics than Unsloth's 33GB Q8_K_XL variant.

Qwen3.5-122B-Q5-MTP - Qwen3.5-122B-Q6-MTP

Reddit r/LocalLLaMA

Benchmark comparison of Qwen3.5-122B Q5 and Q6 quantized models using llama.cpp with multi-token prediction on Strix Halo, showing throughput of 20.24 t/s and 17.17 t/s respectively.
