training-efficiency

Aurora: A Leverage-Aware Optimizer for Rectangular Matrices

Lobsters Hottest · 2026-05-10

Tilde Research introduces Aurora, a new optimizer designed to prevent neuron death in MLP layers while maintaining orthogonality, achieving state-of-the-art results on nanoGPT benchmarks and 100x data efficiency on 1B models.

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Hugging Face Daily Papers · 2026-05-09

AdaPreLoRA is a novel LoRA optimizer that uses Adafactor diagonal Kronecker preconditioning to improve factor-space updates while maintaining low memory usage, demonstrating competitive performance across various LLMs and tasks.

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

arXiv cs.CL · 2026-04-21

This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.

New technique makes AI models leaner and faster while they’re still learning

MIT News — Artificial Intelligence · 2026-04-09

Researchers from MIT CSAIL and other institutions introduced CompreSSM, a technique that compresses state-space AI models during training by removing unnecessary components early, resulting in faster training and smaller models without sacrificing performance.

Efficient training of language models to fill in the middle

OpenAI Blog · 2022-07-28

OpenAI presents a simple data augmentation technique that enables autoregressive language models to perform fill-in-the-middle (FIM) text generation without harming left-to-right performance, with extensive ablations and best practices provided for training such models.

AI and efficiency

OpenAI Blog · 2020-05-05

OpenAI analyzes trends in AI algorithmic efficiency, showing that the compute required to reach AlexNet-level performance has halved roughly every 16 months since 2012, outpacing hardware gains. To put that progress in context, the study compares it with efficiency trends in other domains, such as DNA sequencing and transistor density.
