Tag
Introduces LEDE, a framework that uses offline reinforcement learning to dynamically select exit layers and speculation lengths for self-speculative decoding in LLMs, achieving a speedup of up to 2.7x over autoregressive decoding.