confidence-aware

#confidence-aware

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Hugging Face Daily Papers ↗ · 2026-05-24 Cached

CONF-KV is a KV-cache management system that uses model uncertainty to dynamically adjust cache retention, improving memory efficiency for long-context LLM inference while maintaining accuracy within 1.5-2.1 perplexity points.

0 favorites 0 likes

confidence-aware

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Submit Feedback