Introduces the Hankel Reduced order Model (HRM) adapter, an SSM-based residual module initialized via Balanced Truncation for parameter-efficient fine-tuning, which outperforms LoRA on long-context tasks.
This article explains the principles of LoRA and its variants (QLoRA, VeRA, DoRA), showing how low-rank decomposition reduces the number of trainable parameters to enable efficient fine-tuning of large models.
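To make the parameter reduction concrete, here is a minimal sketch of the arithmetic behind a LoRA update. It assumes the standard formulation in which a frozen weight matrix W of shape d_out × d_in receives an update B·A, with B of shape d_out × r and A of shape r × d_in. The layer size (4096 × 4096) and rank (8) are illustrative values, not figures from the article, and the function names are hypothetical.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters updated by a rank-`rank` LoRA pair: B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from the article).
d_out, d_in, rank = 4096, 4096, 8

full = full_trainable_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
fraction = lora / full

print(f"full fine-tuning: {full} parameters")
print(f"LoRA (r={rank}):  {lora} parameters")
print(f"fraction trained: {fraction:.4%}")
```

For these assumed sizes the low-rank pair trains well under 1% of the parameters that full fine-tuning of the same layer would touch; the saving grows as the layer gets larger or the rank gets smaller.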
Proposes ARIADNE, a training-free, adapter-agnostic routing framework that selects the optimal PEFT adapter at inference time by measuring input proximity to adapter-specific centroids in embedding space, recovering 97.44% of upper-bound performance on 23 tasks.
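The centroid-based routing idea can be illustrated with a small nearest-centroid sketch. This is not ARIADNE's implementation: the summary only states that routing measures input proximity to adapter-specific centroids in embedding space, so the use of cosine similarity, the choice of the mean embedding as the centroid, the adapter names, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors (assumed centroid definition)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed proximity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def route(query_embedding, centroids):
    """Return the name of the adapter whose centroid is closest to the query."""
    return max(centroids, key=lambda name: cosine_similarity(query_embedding, centroids[name]))


# Hypothetical adapters with toy 2-D embeddings of their task examples.
task_embeddings = {
    "math_adapter": [[1.0, 0.0], [0.9, 0.1]],
    "code_adapter": [[0.0, 1.0], [0.1, 0.9]],
}
centroids = {name: mean_vector(vecs) for name, vecs in task_embeddings.items()}

print(route([0.8, 0.2], centroids))
print(route([0.1, 0.7], centroids))
```

Because the router only compares embeddings against precomputed centroids, nothing is trained at routing time, which matches the training-free, adapter-agnostic framing in the summary.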
This paper proposes sparsity-induced adaptations to LoRA, including Cheap LoRA (cLA) and a chained circulant variant (c³LA), and provides theoretical generalization bounds along with empirical evaluations showing up to 10% training time reduction and 15% peak GPU memory savings while maintaining competitive performance.
This paper proposes SDBN, a framework combining adversarial training with parameter-efficient fine-tuning to improve robustness of foundation models under noise and limited data, demonstrating substantial improvements in low-resource settings.
This paper empirically compares several LoRA variants for multilingual instruction tuning and finds no significant advantage of complex variants over basic LoRA in balancing cross-lingual transfer and knowledge retention.
Presents a systematic study of parameter-efficient fine-tuning using LoRA on Qwen2.5-3B for telecommunications customer support, comparing 16 LoRA configurations using both traditional metrics and energy consumption analysis. Finds a divergence between quantitative and qualitative performance.
ReLoRA is a knowledge-reusing adaptation framework that efficiently restores service-ready LoRA adapters for evolving LLM services, reducing time-to-readiness by up to 8.9× and improving accuracy by up to 4.6% through adaptive initialization and scheduled regularization.
This paper explores using parameter-efficient fine-tuning (PEFT) as a compact substrate for persistent personal models, studying scaling up, down, and out, and introduces MinT for managing adapters.
FoRA introduces a parameter-efficient fine-tuning method that selects task-informative layers via Fisher scores and trains LoRA down-projections on the Stiefel manifold, reducing parameters while preserving accuracy.
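The layer-selection step can be sketched as scoring each layer and keeping the top-scoring ones. The summary does not say how FoRA computes its Fisher scores, so this sketch assumes a common proxy, the empirical diagonal Fisher (the mean squared gradient entry over a few samples); the layer names and gradient values are made up, and the Stiefel-manifold training of the down-projections is not shown.

```python
def fisher_score(grad_samples):
    """Mean squared gradient entry across samples (empirical diagonal Fisher proxy, assumed)."""
    total = 0.0
    count = 0
    for grad in grad_samples:
        for value in grad:
            total += value * value
            count += 1
    return total / count


def select_layers(layer_grads, k):
    """Return the names of the k layers with the highest scores, best first."""
    scores = {name: fisher_score(grads) for name, grads in layer_grads.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-layer gradients: two samples per layer, two entries per gradient.
layer_grads = {
    "layer0": [[0.1, -0.1], [0.2, 0.0]],
    "layer1": [[1.0, -0.5], [0.8, 0.9]],
    "layer2": [[0.3, 0.3], [-0.4, 0.2]],
}

selected = select_layers(layer_grads, k=2)
print(selected)
```

Under this assumption, adapters would be attached only to the selected layers, which is one way a method of this kind can cut trainable parameters without touching every layer.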
Hybrid-LoRA proposes a framework that selectively applies full fine-tuning to a small subset of modules while using LoRA for the rest, achieving performance near full fine-tuning with significantly lower computational cost. Experiments show improvements of up to 5.65% over existing parameter-efficient baselines.
This paper proposes LayerTracer, an interpretable framework for layer allocation in continued pre-training, demonstrating that freezing deep layers while training shallow ones outperforms full-parameter fine-tuning. It offers a low-cost, actionable strategy for resource-constrained teams optimizing Large Language Models.
The article introduces Echo-LoRA, a new parameter-efficient fine-tuning method that injects cross-layer representations from deeper source layers into shallow LoRA modules to improve performance without adding inference-time overhead.
The paper introduces CERSA, a parameter-efficient fine-tuning method that uses singular value decomposition to retain principal components, significantly reducing memory usage while outperforming existing methods such as LoRA.
This study empirically demonstrates that gradient-based LoRA rank allocation, effective in supervised fine-tuning, degrades performance in GRPO-based reinforcement learning due to flatter gradient landscapes and a gradient amplification effect.
This paper introduces GLoRA, a gauge-aware server representation for Federated LoRA that addresses the semantic mismatch in factor aggregation by estimating a consensus update subspace. Experiments show GLoRA outperforms baselines in performance and efficiency across heterogeneous client scenarios.
This paper proposes Badit, a method that decomposes large language model parameters into orthogonal high-singular-value LoRA experts to mitigate cross-task interference during multi-task instruction tuning.
Introduces Queryable LoRA, a data-adaptive method for efficient fine-tuning that uses a shared memory of low-rank update atoms with attention-based routing and instruction regularization to enable dynamic, context-sensitive parameter updates while maintaining scalability.
SAMoRA introduces a semantic-aware router and task-adaptive scaling to improve expert specialization and dynamic weighting in MoE-LoRA fine-tuning, outperforming prior methods on multi-task benchmarks.
Aletheia introduces a gradient-guided layer selection method for efficient LoRA fine-tuning that identifies task-relevant transformer layers via lightweight gradient probes and applies adapters selectively, achieving 15-28% training speedup across 14 models while maintaining downstream performance on MMLU, GSM8K, and HumanEval benchmarks.