#training-efficiency

Cards List

XPERT: Expert Knowledge Transfer for Effective Training of Language Models

arXiv cs.CL · yesterday

The paper introduces XPERT, a framework that extracts and reuses expert knowledge from pre-trained Mixture-of-Experts (MoE) language models to improve training efficiency and performance in downstream models.
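
The card gives no details of the transfer procedure itself. As a rough illustration, one generic way to reuse MoE expert knowledge is to initialize a dense FFN from a usage-weighted average of the experts; the sketch below uses hypothetical names and is not necessarily XPERT's actual method.

    import torch

    def densify_moe_ffn(expert_weights, router_counts):
        """Initialize one dense FFN weight matrix as the usage-weighted average
        of an MoE layer's expert weights (a generic transfer heuristic, not
        necessarily XPERT's procedure, which the card does not describe)."""
        usage = router_counts.float() / router_counts.sum()   # expert firing frequencies
        stacked = torch.stack(list(expert_weights))           # (n_experts, d_out, d_in)
        return (usage[:, None, None] * stacked).sum(dim=0)    # dense (d_out, d_in) init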


SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization

arXiv cs.CL · yesterday

This paper introduces SimReg, a regularization technique for LLM pretraining that uses embedding similarity to speed up training convergence by over 30% and boost zero-shot performance.
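
The card does not say which similarity the regularizer targets. One plausible instantiation penalizes off-diagonal cosine similarity among token embeddings; a minimal sketch (the paper's actual loss may differ):

    import torch
    import torch.nn.functional as F

    def similarity_regularizer(emb):
        """Penalize off-diagonal cosine similarity between rows of the token
        embedding matrix; one plausible form of embedding similarity
        regularization, not necessarily SimReg's exact loss."""
        normed = F.normalize(emb, dim=-1)
        sim = normed @ normed.T                              # pairwise cosine similarity
        off_diag = sim - torch.eye(sim.size(0), device=sim.device)
        return off_diag.pow(2).mean()

    # total_loss = lm_loss + reg_weight * similarity_regularizer(model.embed.weight)

For a realistic vocabulary the full pairwise matrix is large, so in practice one would subsample embedding rows each step.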


NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

arXiv cs.LG · yesterday

This paper introduces NoiseRater, a meta-learning framework that assigns importance scores to individual noise samples during diffusion model training to improve efficiency and generation quality.
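
A minimal sketch of the general pattern, per-sample reweighting of the standard epsilon-prediction diffusion loss, with a small scorer network standing in for the meta-learned valuer; the scorer architecture and the meta-learning outer loop are assumptions, since the card gives no details:

    import torch
    import torch.nn as nn

    class NoiseScorer(nn.Module):
        """Hypothetical scorer mapping (noise, timestep) to an importance weight."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, noise, t):
            feats = torch.cat([noise, t[:, None].float()], dim=-1)
            return torch.sigmoid(self.net(feats)).squeeze(-1)   # (batch,)

    def weighted_diffusion_loss(model, scorer, x0, t, noise, alphas_cumprod):
        """Standard epsilon-prediction loss, reweighted per sample by the scorer.
        x0 and noise are (batch, dim); t is a (batch,) tensor of timesteps."""
        a = alphas_cumprod[t][:, None]                          # (batch, 1)
        x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise            # forward diffusion
        per_sample = (model(x_t, t) - noise).pow(2).mean(dim=-1)
        return (scorer(noise, t) * per_sample).mean()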


Gradient Extrapolation-Based Policy Optimization

arXiv cs.LG · 2d ago

The paper introduces Gradient Extrapolation-Based Policy Optimization (GXPO), a method that approximates multi-step lookahead in RL training for LLMs using only three backward passes, and demonstrates improved reasoning performance on math benchmarks over standard GRPO while keeping active-phase costs fixed.
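
The exact extrapolation scheme is not described in the card. A minimal two-evaluation linear extrapolation conveys the idea (a sketch under assumed names, not the paper's algorithm):

    import torch

    def extrapolated_gradient_step(params, loss_fn, lr=1e-3, probe=1e-3, k=4.0):
        """Take one step with a gradient linearly extrapolated toward a k-step
        lookahead, using one extra gradient evaluation at a probe point."""
        g1 = torch.autograd.grad(loss_fn(params), params)
        trial = [(p - probe * g).detach().requires_grad_(True)
                 for p, g in zip(params, g1)]
        g2 = torch.autograd.grad(loss_fn(trial), trial)
        with torch.no_grad():
            for p, a, b in zip(params, g1, g2):
                p -= lr * (a + k * (b - a))   # extrapolate the gradient trend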


Aurora: A Leverage-Aware Optimizer for Rectangular Matrices

Lobsters Hottest · 3d ago

Tilde Research introduces Aurora, an optimizer for rectangular weight matrices designed to prevent neuron death in MLP layers while maintaining orthogonality; it reports state-of-the-art results on nanoGPT benchmarks and 100x data efficiency on 1B-parameter models.
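
"Leverage" presumably refers to statistical leverage scores of the rows of a rectangular weight matrix; the actual Aurora update is not described in the card. A sketch of the quantity itself:

    import torch

    def row_leverage_scores(W):
        """Leverage score of row i = squared norm of row i of the left singular
        vectors; rows with near-zero leverage are candidates for 'dead' neurons.
        A leverage-aware rule might upweight updates to such rows (assumption)."""
        U, _, _ = torch.linalg.svd(W, full_matrices=False)
        return U.pow(2).sum(dim=1)   # each in [0, 1], summing to rank(W)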


AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Hugging Face Daily Papers · 4d ago

AdaPreLoRA is a LoRA optimizer that applies Adafactor-style diagonal Kronecker preconditioning to factor-space updates while keeping memory usage low, demonstrating competitive performance across various LLMs and tasks.
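
Adafactor's factored second moment keeps only row and column running means of the squared gradient and preconditions with their rank-1 (Kronecker) product. A sketch of that preconditioner applied to a single LoRA factor's gradient; how AdaPreLoRA combines it with the LoRA update is in the paper, not the card:

    import torch

    def adafactor_precondition(grad, row_v, col_v, beta=0.999, eps=1e-8):
        """Adafactor-style factored preconditioning of a matrix gradient.
        row_v (m,) and col_v (n,) hold running means of grad**2 over columns
        and rows respectively; their outer product approximates the full
        second moment at O(m + n) memory."""
        row_v.mul_(beta).add_(grad.pow(2).mean(dim=1), alpha=1 - beta)
        col_v.mul_(beta).add_(grad.pow(2).mean(dim=0), alpha=1 - beta)
        v_hat = torch.outer(row_v, col_v) / (row_v.mean() + eps)   # rank-1 2nd moment
        return grad / (v_hat.sqrt() + eps)

    # e.g. precondition the gradient of LoRA factor A before the update:
    # A.grad = adafactor_precondition(A.grad, row_state, col_state)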


Data Mixing for Large Language Models Pretraining: A Survey and Outlook

arXiv cs.CL · 2026-04-21

This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.
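
In the bilevel framing, domain weights on the simplex are chosen to minimize validation loss of the model trained under those weights (notation mine, following the card's description):

    \min_{w \in \Delta^{K-1}} \; \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(w)\bigr)
    \quad \text{s.t.} \quad
    \theta^{*}(w) = \arg\min_{\theta} \sum_{k=1}^{K} w_k \, \mathcal{L}_k(\theta)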


New technique makes AI models leaner and faster while they’re still learning

MIT News — Artificial Intelligence · 2026-04-09

Researchers from MIT CSAIL and other institutions introduced CompreSSM, a technique that compresses state-space AI models during training by removing unnecessary components early, resulting in faster training and smaller models without sacrificing performance.
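
The pruning criterion is not given in the card. A sketch of removing low-contribution states from a diagonal state-space layer mid-training, using a crude energy proxy (all names and the criterion are assumptions):

    import torch

    def prune_ssm_states(log_decay, B, C, keep_ratio=0.5):
        """Keep the top states of a diagonal SSM by a rough contribution proxy:
        output-projection energy scaled by the state's decay rate. Assumes a
        diagonal SSM with per-state log-decays and B, C of shape (d, n_states)."""
        decay = torch.exp(log_decay)                 # per-state decay in (0, 1]
        energy = C.pow(2).sum(dim=0) * decay         # (n_states,) importance proxy
        k = max(1, int(keep_ratio * energy.numel()))
        keep = energy.topk(k).indices
        return log_decay[keep], B[:, keep], C[:, keep]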


Efficient training of language models to fill in the middle

OpenAI Blog · 2022-07-28

OpenAI presents a simple data augmentation technique that enables autoregressive language models to perform fill-in-the-middle (FIM) text generation without harming left-to-right performance, with extensive ablations and best practices provided for training such models.
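
The augmentation is simple: cut each document into (prefix, middle, suffix) at random and move the middle to the end behind sentinel tokens, so a left-to-right model learns to infill. A sketch of the prefix-suffix-middle reordering the paper describes (sentinel names are placeholders):

    import random

    def fim_transform(tokens, p_fim=0.9):
        """Reorder a document as <PRE> prefix <SUF> suffix <MID> middle so the
        middle is predicted last; leave a fraction of data in normal order."""
        if random.random() > p_fim or len(tokens) < 2:
            return tokens
        i, j = sorted(random.sample(range(len(tokens) + 1), 2))
        prefix, middle, suffix = tokens[:i], tokens[i:j], tokens[j:]
        return ["<PRE>", *prefix, "<SUF>", *suffix, "<MID>", *middle]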


AI and efficiency

OpenAI Blog · 2020-05-05

OpenAI analyzes trends in AI algorithmic efficiency, showing that compute required to reach AlexNet-level performance has halved roughly every 16 months since 2012, outpacing hardware gains. The study draws comparisons across domains like DNA sequencing and transistor density to contextualize AI progress.
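
As a worked illustration, a 16-month halving time compounds over the roughly seven-year (84-month) window the study covers to:

    2^{84/16} = 2^{5.25} \approx 38\times \text{ less compute for equivalent performance}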
