I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT

Reddit r/LocalLLaMA Papers

Summary

The author maps the Kullback-Leibler divergence of KV cache quantization for the Qwen3.6-35B-A3B and Gemma4-E2B QAT models.

No content available
Original Article

Similar Articles

KVarN: Native vLLM backend for KV-cache quantization by Huawei

Hacker News Top

Huawei CSL releases KVarN, a native vLLM attention backend for KV-cache quantization that delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, with no calibration required. It claims up to ~2.4x the throughput of TurboQuant while maintaining FP16-level accuracy on models like Qwen3-32B.