Tag
This paper studies a staged promotion protocol for micro-pretraining that filters configurations under escalating training budgets, from minutes to hours. It finds that early screens are useful but unstable, and that a staged approach can retain a long-horizon reference configuration while identifying alternatives that fail continuation thresholds.
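A minimal sketch of a staged promotion loop, assuming that each stage scores every surviving configuration by loss (lower is better) and that an alternative continues only if its loss stays within a fixed tolerance of the reference's loss at the same budget. The `evaluate` callable, the toy loss curves, the configuration names, the budgets, and the tolerance are hypothetical placeholders, not the paper's actual settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class PromotionResult:
    reference: str
    promoted: List[str]
    # config name -> budget (hypothetically, minutes) at which it failed the threshold
    eliminated: Dict[str, float] = field(default_factory=dict)
    # one {config: loss} dict per stage
    history: List[Dict[str, float]] = field(default_factory=list)


def staged_promotion(
    configs: Sequence[str],
    reference: str,
    budgets: Sequence[float],
    evaluate: Callable[[str, float], float],
    tolerance: float,
) -> PromotionResult:
    """Run configs through escalating budgets; the reference always continues."""
    result = PromotionResult(reference=reference, promoted=[])
    survivors = [c for c in configs if c != reference]
    for budget in budgets:
        ref_loss = evaluate(reference, budget)
        scores = {reference: ref_loss}
        next_round = []
        for cfg in survivors:
            loss = evaluate(cfg, budget)
            scores[cfg] = loss
            # Hypothetical continuation threshold: stay within `tolerance` of the reference.
            if loss <= ref_loss + tolerance:
                next_round.append(cfg)
            else:
                result.eliminated[cfg] = budget
        result.history.append(scores)
        survivors = next_round
    result.promoted = survivors
    return result


# Toy loss curves: loss(budget) = floor + slope / budget (purely illustrative).
TOY_CURVES = {
    "ref": (2.00, 1.0),
    "fast_start": (2.15, 0.1),  # looks good early, plateaus high
    "steady": (1.95, 1.2),
    "bad": (3.00, 0.0),
}


def toy_evaluate(cfg: str, budget: float) -> float:
    floor, slope = TOY_CURVES[cfg]
    return floor + slope / budget


if __name__ == "__main__":
    res = staged_promotion(list(TOY_CURVES), "ref", [5, 30, 120], toy_evaluate, tolerance=0.05)
    print("promoted:", res.promoted)      # alternatives that passed every stage
    print("eliminated:", res.eliminated)  # config -> budget where it failed
```

In this toy run, `fast_start` passes the 5-unit screen but fails at 30, while `steady` is promoted alongside the retained reference; this mirrors, under the stated assumptions, how an early screen can be informative yet unstable.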
This paper proposes a staged factorial screening workflow for budget-constrained micro-pretraining, demonstrating that short designed experiments can identify stable hyperparameter penalty directions and support a screen-then-refine strategy.
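A minimal sketch of a factorial screening step, assuming a two-level full factorial design with coded levels -1/+1 and reading a "penalty direction" as the sign of a factor's main effect on loss that stays consistent across short screens. The factor names (`lr`, `warmup`, `wd`), the linear toy loss surfaces, and the `min_abs` cutoff are hypothetical; the paper's actual factors and analysis may differ.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Run = Dict[str, int]  # factor name -> coded level (-1 low, +1 high)


def full_factorial(factors: Sequence[str]) -> List[Run]:
    """All 2^k runs of a two-level full factorial design."""
    return [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]


def main_effects(runs: List[Run], losses: Sequence[float]) -> Dict[str, float]:
    """Main effect of each factor: mean loss at +1 minus mean loss at -1."""
    effects = {}
    for f in runs[0]:
        high = [y for r, y in zip(runs, losses) if r[f] == 1]
        low = [y for r, y in zip(runs, losses) if r[f] == -1]
        effects[f] = sum(high) / len(high) - sum(low) / len(low)
    return effects


def stable_directions(effect_sets: Sequence[Dict[str, float]], min_abs: float) -> Dict[str, int]:
    """Factors whose effect keeps one sign, with magnitude >= min_abs, in every screen.

    +1 means the high level is penalized (raises loss); -1 means the low level is.
    """
    stable = {}
    for f in effect_sets[0]:
        vals = [e[f] for e in effect_sets]
        if all(v >= min_abs for v in vals):
            stable[f] = 1
        elif all(v <= -min_abs for v in vals):
            stable[f] = -1
    return stable


def screen(runs: List[Run], loss_fn: Callable[[Run], float]) -> Dict[str, float]:
    """Evaluate every run with a (hypothetical) loss function and estimate main effects."""
    return main_effects(runs, [loss_fn(r) for r in runs])


# Hypothetical factors and toy linear loss surfaces for two short screens.
FACTORS = ["lr", "warmup", "wd"]
RUNS = full_factorial(FACTORS)
screen_a = screen(RUNS, lambda r: 3.0 + 0.20 * r["lr"] - 0.10 * r["warmup"] + 0.02 * r["wd"])
screen_b = screen(RUNS, lambda r: 2.8 + 0.15 * r["lr"] - 0.08 * r["warmup"] - 0.03 * r["wd"])
print(stable_directions([screen_a, screen_b], min_abs=0.1))  # {'lr': 1, 'warmup': -1}
```

Factors returned by `stable_directions` would be the natural candidates to fix or narrow before a longer refinement stage; here `wd` drops out because its small effect flips sign between the two screens.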