Tag
Tested Multi-Token Prediction on a llama.cpp fork with a Qwen-based MoE model, achieving a +0.41% PPL improvement over the fp16 baseline.
A detailed benchmark comparing KV cache quantization methods (TurboQuant, TCQ, q4, q5, q8) using PPL and KLD metrics on Qwen 3.6 27B. Findings: TCQ improves low-bit quantization, asymmetric KV beats symmetric at the same size, and q8 is often overkill. Analysis and data are in the linked article.
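For readers unfamiliar with the two metrics named in this summary, here is a minimal sketch of how PPL (perplexity) and KLD (mean KL divergence against a baseline) are commonly computed from per-token logits. It follows the textbook definitions only; the linked article's actual measurement pipeline is not shown here, and the function names (`log_softmax`, `perplexity`, `mean_kld`) and the toy logits are illustrative assumptions, not taken from the benchmark.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def perplexity(logits_per_token, target_ids):
    """PPL = exp(mean negative log-likelihood of the true next tokens)."""
    nll = 0.0
    for logits, target in zip(logits_per_token, target_ids):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(target_ids))


def mean_kld(base_logits, test_logits):
    """Mean KL(base || test) per token position.

    `base_logits` would come from the reference run (e.g. an fp16 KV cache),
    `test_logits` from the quantized run on the same text.
    """
    total = 0.0
    for base, test in zip(base_logits, test_logits):
        lp = log_softmax(base)
        lq = log_softmax(test)
        total += sum(math.exp(p) * (p - q) for p, q in zip(lp, lq))
    return total / len(base_logits)


if __name__ == "__main__":
    # Toy data (hypothetical): two positions, vocabulary of three tokens.
    base = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    quant = [[1.9, 0.6, -1.1], [0.2, 1.4, 0.4]]
    targets = [0, 1]
    print("PPL base :", perplexity(base, targets))
    print("PPL quant:", perplexity(quant, targets))
    print("mean KLD :", mean_kld(base, quant))
```

PPL depends only on the probability assigned to the correct token, so it can hide changes elsewhere in the distribution. KLD compares the full output distribution against the baseline, which is why benchmarks of this kind often report both.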