Tag
This paper reveals that during pre-training, language models frequently and suddenly switch between pattern-matching and generalization behaviors, a phenomenon called mode-hopping, and presents a toy evaluation suite to study it.