hyperparameters

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

arXiv cs.CL · 2d ago

This paper reports predictable scaling laws for the optimal hyperparameters (learning rate and batch size) in LLM continued pre-training and proposes a two-stage framework that reduces hyperparameter-search overhead by up to 90% while maintaining performance.
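
To illustrate what "predictable" means in practice, here is a minimal sketch of fitting a scaling law for the optimal learning rate from a few cheap small-scale sweeps and extrapolating it to a larger run. The functional form (a power law in training tokens, lr = a * tokens^b), the helper name fit_power_law, and all numbers are illustrative assumptions, not the paper's formulation or results; the "sweep results" are generated from a known law purely so the fit can be checked.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept in log space -> prefactor
    return a, b

# Hypothetical small-scale sweep results: (training tokens, best learning rate).
# Assumption: generated from lr = 0.5 * tokens**-0.25 for illustration only;
# these are not measurements from the paper.
tokens = [1e8, 3e8, 1e9, 3e9]
best_lr = [0.5 * t ** -0.25 for t in tokens]

a, b = fit_power_law(tokens, best_lr)

# Extrapolate the fitted law to a much larger (hypothetical) run instead of sweeping there.
target_tokens = 1e11
predicted_lr = a * target_tokens ** b
print(f"a={a:.3g}, b={b:.3g}, predicted lr at {target_tokens:.0e} tokens={predicted_lr:.3g}")
```

The search cost then sits in the small runs used for the fit, while the large run uses the extrapolated value; whether a single power law in tokens is adequate for continued pre-training is exactly the kind of question the paper addresses, so treat this form as a placeholder.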

@vikhyatk: too much time is being spent making optimizers marginally faster. what we really need is hparam-free optimizers

X AI KOLs Timeline · 2026-05-25

Argues that too much effort goes into making optimizers marginally faster, and that what is really needed is hyperparameter-free optimizers.