Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms
Summary
Introduces Queryable LoRA, a data-adaptive method for efficient fine-tuning that uses a shared memory of low-rank update atoms with attention-based routing and instruction regularization to enable dynamic, context-sensitive parameter updates while maintaining scalability.
Cached at: 05/12/26, 02:52 PM
Source: https://huggingface.co/papers/2605.08423
Abstract
A data-adaptive method for efficient fine-tuning of large neural networks uses a shared memory of low-rank update atoms with attention-based routing to enable dynamic, context-sensitive parameter updates while maintaining scalability.
We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.
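The routing step described in the abstract can be made concrete with a small sketch. The abstract gives no equations, so everything below is an illustrative assumption rather than the paper's implementation: the names `routed_update`, `W_q`, `keys`, `atoms`, `prior`, and `lam` are hypothetical, the atoms are taken to be r-by-r matrices acting inside the bottleneck, the query projection is shared across blocks, the running summary is a simple moving average, and the frozen base layer is omitted so that only the adapter path is shown. Plain Python lists are used so the block runs without third-party packages.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix; stands in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def routed_update(x, summary, A, B, W_q, keys, atoms, prior=None, lam=1.0):
    """One block's routed low-rank update (hypothetical form).

    Returns the update delta, the attention weights over atoms, and the
    routed bottleneck state h.
    """
    z = matvec(A, x)                      # low-rank state, length r
    query = matvec(W_q, z + summary)      # query from [z ; running summary]
    scale = math.sqrt(len(query))
    logits = [sum(k * q for k, q in zip(key, query)) / scale for key in keys]
    if prior is not None:
        # Assumed form of instruction regularization: add a scaled prior
        # over atoms to the routing logits before the softmax.
        logits = [l + lam * p for l, p in zip(logits, prior)]
    weights = softmax(logits)
    r = len(z)
    # Routed operator: attention-weighted mix of the shared r-by-r atoms.
    R = [[sum(w * atom[i][j] for w, atom in zip(weights, atoms))
          for j in range(r)] for i in range(r)]
    h = matvec(R, z)                      # apply inside the bottleneck
    return matvec(B, h), weights, h       # project back to model width

rng = random.Random(0)
d, r, n_atoms, d_q, n_blocks = 6, 3, 4, 4, 2

# Shared across blocks in this sketch: query projection, atom keys, atoms.
W_q = rand_matrix(d_q, 2 * r, rng)
keys = rand_matrix(n_atoms, d_q, rng)
atoms = [rand_matrix(r, r, rng) for _ in range(n_atoms)]
# Per block: down-projection A (r x d) and up-projection B (d x r).
blocks = [(rand_matrix(r, d, rng), rand_matrix(d, r, rng))
          for _ in range(n_blocks)]

x0 = [rng.gauss(0.0, 1.0) for _ in range(d)]
x, summary, history = list(x0), [0.0] * r, []
for A, B in blocks:
    delta, weights, h = routed_update(x, summary, A, B, W_q, keys, atoms)
    x = [xi + di for xi, di in zip(x, delta)]   # frozen base layer omitted
    # Assumed summary rule: moving average of routed bottleneck states.
    summary = [0.5 * s + 0.5 * hi for s, hi in zip(summary, h)]
    history.append(weights)

# Effect of a prior that favours atom 2 (values are made up).
prior = [0.0, 0.0, 5.0, 0.0]
A0, B0 = blocks[0]
_, w_plain, _ = routed_update(x0, [0.0] * r, A0, B0, W_q, keys, atoms)
_, w_prior, _ = routed_update(x0, [0.0] * r, A0, B0, W_q, keys, atoms,
                              prior=prior)
```

In this sketch the prior only shifts the softmax over a fixed set of atoms, so the update always stays within the span of the shared components. That is one way to read the abstract's point about biasing selection without generating unconstrained parameter updates.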
Similar Articles
Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation
This paper proposes a Mixture of LoRA and Full (MoLF) fine-tuning framework that uses gradient-guided optimizer routing to adaptively switch between LoRA and full fine-tuning. It aims to overcome the structural limitations of relying solely on static adaptation methods by combining the plasticity of full tuning with the regularization of LoRA.
Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training
Hybrid-LoRA proposes a framework that selectively applies full fine-tuning to a small subset of modules while using LoRA for the rest, achieving performance near full fine-tuning with significantly lower computational cost. Experiments show improvements of up to 5.65% over existing parameter-efficient baselines.
AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation
AdaPreLoRA is a novel LoRA optimizer that uses Adafactor diagonal Kronecker preconditioning to improve factor-space updates while maintaining low memory usage, demonstrating competitive performance across various LLMs and tasks.
Parameter-Efficient Fine-Tuning with Learnable Rank
Researchers from Adelaide University introduce LR-LoRA (Learnable Rank LoRA), a parameter-efficient fine-tuning method that dynamically learns the adapter rank for each transformer layer during training rather than using a fixed global rank. LR-LoRA achieves state-of-the-art performance on language understanding and commonsense reasoning benchmarks, outperforming fixed-rank LoRA baselines.
@jbhuang0604: LoRA, low-rank adaptation, is arguably the most popular parameter-efficient fine-tuning method for LLMs. But how does i…
LoRA (low-rank adaptation) is the most popular parameter-efficient fine-tuning method for LLMs. This video introduces how LoRA and its variants (LoRA+, QLoRA, VeRA, DoRA) work.