We built a calibration-aware Q4_K_M quant of Qwen3.5 0.8B that recovers 96.5% of the BF16 gap vs pure llama.cpp Q4_K_M (SpectralQuant)

Reddit r/LocalLLaMA Tools

Summary

A calibration-aware Q4_K_M quantization of Qwen3.5 0.8B, built with SpectralQuant, recovers 96.5% of the quality gap between the standard llama.cpp Q4_K_M quant and the BF16 baseline.
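
The summary does not say which metric the 96.5% figure is computed on, so the following is only a minimal sketch of how a "gap recovered" number is commonly derived, assuming one scalar quality score per model (for example perplexity or mean KL divergence). The function name `gap_recovered` and the numbers in the usage lines are illustrative placeholders, not values from the post.

```python
def gap_recovered(reference: float, baseline: float, candidate: float) -> float:
    """Fraction of the baseline-to-reference gap that the candidate closes.

    reference: score of the unquantized model (here, BF16)
    baseline:  score of the standard quant (here, llama.cpp Q4_K_M)
    candidate: score of the new quant (here, the calibration-aware Q4_K_M)

    The sign of the gap cancels out, so this works for lower-is-better
    metrics (perplexity, KLD) and for higher-is-better ones (accuracy).
    """
    gap = baseline - reference
    if gap == 0:
        raise ValueError("baseline equals reference; the gap is undefined")
    return (baseline - candidate) / gap


# Placeholder scores, NOT taken from the post (lower is better here).
print(f"{gap_recovered(reference=10.0, baseline=12.0, candidate=10.5):.1%}")  # 75.0%
```

A result of 1.0 would mean the new quant matches BF16 on that metric, and 0.0 would mean it is no better than the standard quant.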


Similar Articles

Qwen3.6-27B Quantization Benchmark

Reddit r/LocalLLaMA

This article benchmarks various Qwen3.6-27B quantizations (Q8 to Q2) using KLD and Same Top P metrics, comparing providers like Unsloth and mradermacher, and offers recommendations for quality-size trade-offs.

Qwen3.5-122B-Q5-MTP - Qwen3.5-122B-Q6-MTP

Reddit r/LocalLLaMA

Benchmark comparison of Qwen3.5-122B Q5 and Q6 quantized models using llama.cpp with multi-token prediction on Strix Halo, showing throughput of 20.24 t/s and 17.17 t/s respectively.