Tag
This paper introduces Emergent Alignment, a self-supervised method that endows LLMs with a conscience step to review their own outputs and uses Direct Preference Optimization to steer away from unethical behavior, enabling online alignment without external judges.