llm-inference-acceleration


Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

arXiv cs.CL · 2d ago

Introduces LEDE, a framework using offline reinforcement learning to dynamically select exit layers and speculation lengths for self-speculative decoding in LLMs, achieving up to 2.7x speedup over autoregressive decoding.
