self-fulfilling

Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

Hacker News Top · 2026-05-18

This paper introduces the concept of alignment pretraining, showing that discourse about AI in pretraining corpora can create self-fulfilling (mis)alignment in LLMs, and that upsampling aligned discourse significantly reduces misalignment.
