decoding-time

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression

arXiv cs.LG · 3d ago

HARD-KV introduces a Cascade Cache hierarchy and a Logits Calibration mechanism to resolve the static-dynamic mismatch in head-adaptive KV cache compression, achieving up to 2x higher throughput in long-context LLM inference.
