Shard - getting to 10× KV cache compression

Reddit r/LocalLLaMA · 2026-05-26

Shard is a drop-in Hugging Face Cache that achieves 10× KV cache compression for Llama-3.1-8B. It compresses keys (K) with PCA followed by int4 quantization, and values (V) with a Hadamard rotation followed by vector quantization, without accuracy loss on benchmarks.
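To make the two pipelines concrete, here is a minimal NumPy sketch of the general techniques named above. It is not Shard's code: every function name, the per-row symmetric int4 scheme, and the brute-force nearest-neighbour codebook lookup are assumptions chosen for illustration, and a real implementation would operate on per-head GPU tensors with a trained codebook.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int4(x):
    """Symmetric per-row quantization to integer codes in [-7, 7] (hypothetical scheme)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    codes = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return codes, scale

def compress_keys(K, rank):
    """PCA-project keys (tokens x dim) to `rank` dims, then int4-quantize the coefficients."""
    mean = K.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(K - mean, full_matrices=False)
    basis = Vt[:rank]  # top principal directions, shape (rank, dim)
    codes, scale = quantize_int4((K - mean) @ basis.T)
    return codes, scale, basis, mean

def decompress_keys(codes, scale, basis, mean):
    """Dequantize the coefficients and map them back to the original key space."""
    return (codes * scale) @ basis + mean

def compress_values(V, codebook, H):
    """Rotate values with H, then keep only the index of the nearest codebook row."""
    R = V @ H
    dist2 = ((R[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist2.argmin(axis=1)

def decompress_values(idx, codebook, H):
    """Look up codebook rows and undo the rotation (H is orthonormal, so H.T inverts it)."""
    return codebook[idx] @ H.T
```

The rotation is lossless on its own; its role in schemes like this is to spread outlier channels across all dimensions so that the quantizer that follows sees a more uniform distribution. The int8 arrays here stand in for packed 4-bit storage, which would need two codes per byte to realize the memory saving.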
