token-learning

#token-learning

@rosinality: https://arxiv.org/abs/2606.29858 Why does power-law scaling occur? Loss of individual tokens follows a sigmoidal curve,…

X AI KOLs Timeline · 2d ago

This paper presents a token-level framework showing that power-law scaling in language model loss arises from the aggregation of sigmoidal learning curves of individual tokens, and demonstrates that reshaping training distributions based on token learning times can accelerate validation loss reduction by 11%.
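The aggregation claim can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's method: each token's loss is modeled as a sigmoid in log-step (`token_loss`), and token learning times are assumed to follow a Pareto distribution with tail exponent `alpha`. The function names, the `sharpness` parameter, and the Pareto choice are all hypothetical and chosen only for illustration. Under these assumptions, the averaged loss should fall roughly as a power law in the step count, with a log-log slope near `-alpha`.

```python
import math
import random


def token_loss(step, learn_time, sharpness=4.0):
    """Toy per-token loss: sigmoidal in log(step), near 1 before learn_time, near 0 after.

    Assumption for illustration only; the paper's exact curve may differ.
    """
    return 1.0 / (1.0 + (step / learn_time) ** sharpness)


def aggregate_loss(step, learn_times):
    """Mean loss over all tokens at a given training step."""
    return sum(token_loss(step, t) for t in learn_times) / len(learn_times)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable
alpha = 0.5  # assumed tail exponent of the learning-time distribution

# Heavy-tailed learning times: P(learn_time > t) = t ** -alpha for t >= 1.
learn_times = [rng.paretovariate(alpha) for _ in range(20000)]

# Evaluate the aggregate loss on a log-spaced grid of steps from 10 to 1000.
steps = [10 ** (1 + 0.25 * i) for i in range(9)]
losses = [aggregate_loss(s, learn_times) for s in steps]

slope = loglog_slope(steps, losses)
print(f"fitted log-log slope: {slope:.3f} (assumed alpha: {alpha})")
```

Each individual curve here is sigmoidal, yet the average is close to a straight line on log-log axes because, at any step, the loss is dominated by the fraction of tokens whose learning time has not yet been reached. Changing `alpha` changes the fitted slope accordingly, which is the sense in which the distribution of learning times, rather than the shape of any single curve, sets the exponent in this toy model.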
