This paper introduces Program-of-Layers (PoLar), a framework that learns an input-specific execution program over frozen, pretrained transformer layers, allowing each layer to be skipped, kept, or repeated. Compared with fixed-depth inference, it improves accuracy and reduces inference overhead.
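
To make the idea of skipping, keeping, or repeating layers concrete, here is a minimal sketch of how a per-input execution program might be applied to a stack of frozen layers. It is not the paper's implementation: the `run_program` helper, the encoding of a program as one repeat count per layer, and the toy scalar "layers" are all assumptions made for illustration, and the sketch does not cover how PoLar learns the program.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function from a hidden state to a hidden state.
# Real transformer layers act on tensors; floats keep the sketch dependency-free.
Layer = Callable[[float], float]


def run_program(layers: Sequence[Layer], program: Sequence[int], x: float) -> float:
    """Apply layer i exactly program[i] times (hypothetical encoding).

    0 skips the layer, 1 keeps it, and n > 1 repeats it n times.
    The layers themselves are never modified, mirroring frozen weights.
    """
    if len(layers) != len(program):
        raise ValueError("program needs exactly one entry per layer")
    for layer, count in zip(layers, program):
        if count < 0:
            raise ValueError("repeat counts must be non-negative")
        for _ in range(count):
            x = layer(x)
    return x


# Toy stand-ins for three frozen layers.
layers: List[Layer] = [
    lambda h: h + 1.0,
    lambda h: h * 2.0,
    lambda h: h - 3.0,
]

# Fixed-depth inference: every layer runs exactly once.
fixed = run_program(layers, [1, 1, 1], 1.0)

# Input-specific program: keep layer 0, repeat layer 1 twice, skip layer 2.
custom = run_program(layers, [1, 2, 0], 1.0)
```

In this encoding, fixed-depth inference is the special case where every count is 1, so a program can make a forward pass shorter (by skipping) or longer (by repeating) without touching the layer weights.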