LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Hugging Face Daily Papers 06/01/26, 12:00 AM Papers

Summary

LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on the type of input step, saving compute while maintaining or improving model quality through gated routing and LoRA adaptation. On Qwen2.5-0.5B-Instruct it reports a 12.91% skip differential: tool calls skip 15.25% of FLOPs, while planning steps skip only 2.34%.
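The skip differential can be read as the gap between the skip rates of the two step types. The following is a minimal sketch, assuming every block costs the same FLOPs so that the skip rate equals the fraction of closed gates. The function names and the toy gate patterns are hypothetical; the 15.25% and 2.34% figures are the ones reported in the abstract.

```python
def skip_rate(gates):
    """Percentage of block executions skipped.

    gates[i][l] is 1 if block l ran on input i and 0 if it was skipped.
    Assumes equal FLOPs per block (an illustrative simplification).
    """
    total = sum(len(row) for row in gates)
    skipped = sum(row.count(0) for row in gates)
    return 100.0 * skipped / total


def skip_differential(tool_rate, plan_rate):
    """Gap in percentage points between the tool-call and planning skip rates."""
    return round(tool_rate - plan_rate, 2)


# Toy gate patterns (hypothetical): 2 inputs per step type, 4 blocks each.
toy_tool = [[1, 0, 1, 1], [1, 0, 0, 1]]   # 3 of 8 block executions skipped
toy_plan = [[1, 1, 1, 1], [1, 1, 0, 1]]   # 1 of 8 block executions skipped
toy_diff = skip_differential(skip_rate(toy_tool), skip_rate(toy_plan))

# Figures reported in the abstract: 15.25% vs 2.34%.
reported = skip_differential(15.25, 2.34)
print(toy_diff, reported)
```

With the reported rates, the subtraction reproduces the 12.91 percentage-point differential quoted in the summary.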

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity delta of -1.29 on tool calls and -1.30 on planning.
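The parameter counts quoted above can be checked with a short calculation. This is a sketch, not the authors' accounting: the hidden size, layer count, and LoRA rank come from the abstract, while the attention head layout (14 query heads, 2 key/value heads) is an assumption taken from the public Qwen2.5-0.5B-Instruct configuration and is not stated in the abstract.

```python
HIDDEN = 896        # router input width, from Linear(896, 1) in the abstract
NUM_LAYERS = 24     # from the abstract
RANK = 8            # LoRA rank, from the abstract

# Assumed from the public Qwen2.5-0.5B-Instruct config (not in the abstract).
NUM_HEADS = 14
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN // NUM_HEADS          # 64


def lora_params(d_in, d_out, rank):
    """A LoRA pair A (rank x d_in) and B (d_out x rank) adds rank * (d_in + d_out) weights."""
    return rank * (d_in + d_out)


q_dim = NUM_HEADS * HEAD_DIM            # query/output projection width
kv_dim = NUM_KV_HEADS * HEAD_DIM        # key/value projection width under grouped-query attention

router_per_layer = HIDDEN + 1           # 896 weights plus 1 bias
lora_per_layer = (
    lora_params(HIDDEN, q_dim, RANK)        # Q
    + 2 * lora_params(HIDDEN, kv_dim, RANK)  # K and V
    + lora_params(q_dim, HIDDEN, RANK)      # O
)

router_total = NUM_LAYERS * router_per_layer
lora_total = NUM_LAYERS * lora_per_layer
trainable = router_total + lora_total
pct_of_backbone = 100.0 * trainable / 494e6   # 494M backbone, from the abstract

print(router_per_layer, lora_total, trainable, round(pct_of_backbone, 2))
```

Under these assumptions the router has 897 parameters per layer, the LoRA adapters total 1,081,344 parameters, and the combined 1,102,872 trainable parameters are about 0.22% of the backbone, consistent with the rounded figures in the abstract.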

Original Article

Cached at: 06/08/26, 03:16 PM

Paper page - LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Source: https://huggingface.co/papers/2606.01838

Abstract

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity delta of -1.29 on tool calls and -1.30 on planning.
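The hard gate with a straight-through estimator can be illustrated with a hand-written forward and backward pass. This is only a sketch under several assumptions that the abstract does not confirm: the router reads a single pooled hidden-state vector, the gate is a sigmoid thresholded at 0.5, and a closed gate leaves only the residual path. All function names are hypothetical, and a 3-dimensional vector stands in for the 896-dimensional hidden state.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def router_forward(x, w, b):
    """Linear(d, 1) router: returns (hard gate in {0, 1}, soft probability)."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(logit)
    return (1.0 if p > 0.5 else 0.0), p


def gated_block(x, block_fn, w, b):
    """Inference-style gated residual block: bypass block_fn when the gate is 0."""
    g, p = router_forward(x, w, b)
    if g == 0.0:
        return list(x), g, p                  # block skipped, identity path only
    delta = block_fn(x)
    return [xi + di for xi, di in zip(x, delta)], g, p


def router_grad_ste(x, p, grad_gate):
    """Straight-through backward pass.

    The threshold has zero gradient almost everywhere, so the estimator
    treats d(hard gate)/d(p) as 1 and chains through the sigmoid instead.
    Returns (dL/dw, dL/db) for an upstream gradient grad_gate = dL/d(gate).
    """
    dlogit = grad_gate * p * (1.0 - p)
    return [dlogit * xi for xi in x], dlogit


# Toy usage with made-up numbers.
x = [0.5, -1.0, 2.0]
w = [0.2, 0.1, 0.3]
toy_block = lambda v: [0.1 * vi for vi in v]   # stand-in for a transformer block

out_open, g_open, p_open = gated_block(x, toy_block, w, b=-0.1)   # logit 0.5, gate opens
out_skip, g_skip, p_skip = gated_block(x, toy_block, w, b=-2.0)   # logit -1.4, gate closes
dw, db = router_grad_ste(x, p_open, grad_gate=1.0)
print(g_open, g_skip, dw, db)
```

In an autograd framework the same estimator is commonly written as the hard gate plus the soft probability minus a gradient-detached copy of the soft probability. Note that the bypass shown here is the inference-time behaviour; during training the block output would presumably still be computed so that the gate receives a gradient.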

Similar Articles

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Parameter-Efficient Fine-Tuning with Learnable Rank

MoE$^2$-LoRA: When MoE Models Meet MoE-style Low-Rank Adaptation
