Tag
This paper introduces the concept of alignment pretraining, showing that discourse about AI in pretraining corpora can create self-fulfilling (mis)alignment in LLMs, and that upsampling aligned discourse significantly reduces misalignment.