emergent-alignment

Emergent Alignment

arXiv cs.AI · 5d ago

This paper introduces Emergent Alignment, a self-supervised method that gives LLMs a "conscience" step in which the model reviews its own outputs, then applies Direct Preference Optimization (DPO) to steer the model away from unethical behavior. This enables online alignment without external judges.
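
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the paper. The function names `dpo_loss` and `softplus`, the default `beta=0.1`, and the idea that the "chosen" and "rejected" responses would be those the model's own review step approved or flagged are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the summary does not state one
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or a frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # DPO maximizes log(sigmoid(beta * (chosen_margin - rejected_margin))).
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)
```

When the policy and the reference agree on both responses, the loss is log 2. It falls below that as the policy shifts probability toward the chosen response relative to the reference. In a self-supervised setup like the one summarized here, the preference labels would presumably come from the model's own review rather than from an external judge, though the summary does not say how the pairs are built.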
