Tag
Introduces RKSC, a training-free inference framework for multi-branch LLM reasoning that reduces KV cache redundancy through similarity-based sharing and early exit, achieving up to a 3× speedup with minimal error.