cache-sharing


RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

arXiv cs.LG · 2d ago

Introduces RKSC, a training-free inference framework for multi-branch LLM reasoning that reduces KV cache redundancy via similarity-based sharing and early exit, achieving up to 3x speedup with minimal error.
