decoding-time

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression

arXiv cs.LG · 3d ago

HARD-KV introduces a Cascade Cache hierarchy and a Logits Calibration mechanism to resolve the static-dynamic mismatch in head-adaptive KV cache compression, achieving up to 2x higher throughput in long-context LLM inference.
