cache-sharing

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

arXiv cs.LG · 2d ago

Introduces RKSC, a training-free inference framework for multi-branch LLM reasoning. It combines similarity-based KV cache sharing, which reduces cache redundancy, with confident early exit, achieving up to a 3x speedup with minimal error.
