capacity-allocation


Generalization Dynamics of LM Pre-training (17 minute read)

TLDR AI · 2026-05-19

This paper shows that during pre-training, language models switch frequently and abruptly between pattern-matching and generalization behaviors, a phenomenon called mode-hopping, and presents a toy evaluation suite for studying it.
