Tag
This paper studies data-constrained language model pretraining. It proposes two contributions: masked-input regularization (MIR), which aims to lower validation loss and improve downstream performance, and SoftQ, a scaling law that better captures model-data interaction under repeated data.