ppl

#ppl

@no_stp_on_snek: Tested out MTP for the first time on my llamacpp fork last night with turbo4 sym. GX10 hardware. using MoE model: llmfa…

X AI KOLs Following ↗ · 2026-05-22 Cached

Tested Multi-Token Prediction on a llamacpp fork with a Qwen-based MoE model, achieving +0.41% PPL improvement over fp16 baseline.

0 favorites 0 likes

#ppl

Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

Reddit r/LocalLLaMA ↗ · 2026-05-19

A detailed benchmark comparing KV cache quantization methods (TurboQuant, TCQ, q4, q5, q8) using PPL and KLD metrics on Qwen 3.6 27B, finding that TCQ improves low-bit quantization, asymmetric KV beats symmetric at same size, and q8 is often overkill. Includes analysis and data in linked article.

0 favorites 0 likes

ppl

@no_stp_on_snek: Tested out MTP for the first time on my llamacpp fork last night with turbo4 sym. GX10 hardware. using MoE model: llmfa…

Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

Submit Feedback