parameter-free

#parameter-free

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv cs.LG · 2026-06-16

This paper introduces AdaNAGED, a method that combines zero-order optimization, parameter-free adaptation, and non-Euclidean update geometry for memory-efficient fine-tuning of large language models, with theoretical convergence guarantees and validation on the OPT-1.3B model.

#parameter-free

Simply Stabilizing the Loop via Fully Looped Transformer

arXiv cs.LG · 2026-05-20

This paper identifies gradient oscillation and residual explosion as causes of training instability in Looped Transformers. It proposes the Fully Looped Transformer, which adds two parameter-free modifications (Fully Looped Architecture and Attention Injection) that stabilize training for up to 12 loop iterations and improve downstream performance by up to 13.2%.
