Tag
CONF-KV is a KV-cache management system that uses model uncertainty to dynamically adjust cache retention, improving memory efficiency for long-context LLM inference while maintaining accuracy within 1.5-2.1 perplexity points.