masked-input

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

arXiv cs.LG · 2026-06-08

This paper studies data-constrained language model pretraining, proposing masked-input regularization (MIR) to improve validation loss and downstream performance, and SoftQ, a scaling law that better captures model-data interaction under repeated data.
