Tag
This paper introduces the log-alignment ratio (LAR), a training-time metric that measures parameter-activation alignment and predicts generalization by capturing the spread of weight and activation spectra. Experiments on grokking and a 3B-parameter language model show LAR tracks the transition from memorization to generalization and flags overfitting without held-out data.
This paper proposes R2R2, a regularization method for self-predictive learning in reinforcement learning that mitigates overfitting under high update-to-data ratios and achieves significant improvements on continuous control tasks.
A user found that reducing the learning rate from 2e-4 to 1e-4 significantly improved QLoRA fine-tuning of Llama 3.1 8B on a small dataset (8k samples), preventing overfitting and leading to better evaluation results.
This paper studies the trade-off between scarce target data and abundant generic data in mixture pretraining, finding that repetition is a key driver of performance and that mixture training tolerates 15-20 repetitions of target data. It introduces a repetition-aware scaling law to optimize mixture configurations under data constraints.
A modified scaling law that accounts for data repetition yields compute-optimal training strategies for data-constrained scenarios. It shows that, beyond a certain point, further repetition is counterproductive and compute is better spent on model capacity.
OpenAI trained 9 agents on the CoinRun environment with varying numbers of training levels to quantify generalization in reinforcement learning, finding substantial overfitting even with 16,000 training levels and that IMPALA-CNN architectures generalize significantly better than Nature-CNN baselines.
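The CoinRun entry above quantifies generalization by comparing performance on training levels with performance on unseen levels. A minimal sketch of that comparison follows, assuming the gap is simply the training score minus the test score. The function names and the numbers are hypothetical placeholders for illustration and are not results from the study.

```python
from typing import Dict, Tuple


def generalization_gap(train_score: float, test_score: float) -> float:
    """Return train minus test score; a larger positive gap suggests more overfitting."""
    return train_score - test_score


def gaps_by_level_count(results: Dict[int, Tuple[float, float]]) -> Dict[int, float]:
    """Map each training-set size (number of levels) to its generalization gap."""
    return {
        n_levels: generalization_gap(train, test)
        for n_levels, (train, test) in sorted(results.items())
    }


# Hypothetical placeholder scores (train, test) keyed by number of training levels.
# These are NOT measurements from the CoinRun experiments.
example_results = {100: (9.5, 5.0), 1000: (9.0, 7.0), 10000: (8.5, 8.0)}
gaps = gaps_by_level_count(example_results)

for n_levels, gap in gaps.items():
    print(f"{n_levels} levels: gap = {gap}")
```

A gap that shrinks as the number of training levels grows is the pattern such a study looks for. A gap that stays large indicates the agent is still memorizing its training levels.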