This paper introduces Emergent Alignment, a self-supervised method that gives LLMs a "conscience" step in which they review their own outputs, then uses Direct Preference Optimization (DPO) to steer the model away from unethical behavior, enabling online alignment without external judges.
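As a rough illustration of the preference-optimization step, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is not the paper's implementation: the function names and the pairing of a "chosen" response (for example, one the conscience step approves) against a "rejected" one are assumptions made for illustration, and `beta=0.1` is an arbitrary choice.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls deviation from the reference model
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under a frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical log-probabilities, not taken from any experiment.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is why no external judge is needed once the preference pairs exist.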
LC-ERD is a framework that mines latent logic from LLM-generated reasoning chains to decompose a global reward into step-level signals, enabling self-evolving reasoning without human annotation. It addresses label noise, coarse supervision, and distributional collapse via a variational logic potential and multi-agent value decomposition.
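To make the idea of step-level credit assignment concrete, the sketch below splits one global reward across the steps of a reasoning chain in proportion to softmax-normalized per-step scores, so the step rewards sum back to the global reward. This is a generic stand-in, not LC-ERD's variational logic potential or its multi-agent value decomposition; `decompose_reward` and `step_scores` are hypothetical names, and where the scores come from is left open.

```python
import math
from typing import List


def decompose_reward(global_reward: float, step_scores: List[float]) -> List[float]:
    """Split a chain-level reward into step-level rewards.

    step_scores are assumed per-step importance scores (hypothetical input);
    softmax weights guarantee the step rewards sum to global_reward.
    """
    if not step_scores:
        return []
    # Subtract the max before exponentiating for numerical stability.
    peak = max(step_scores)
    exps = [math.exp(s - peak) for s in step_scores]
    total = sum(exps)
    return [global_reward * e / total for e in exps]


if __name__ == "__main__":
    # Three reasoning steps with made-up scores; the middle step gets most credit.
    print(decompose_reward(1.0, [0.2, 1.5, 0.1]))
```

A sum-preserving split is only one possible design choice; it keeps the step-level signals consistent with the original chain-level supervision.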