self-alignment

#self-alignment

Emergent Alignment

arXiv cs.AI · 4d ago

This paper introduces Emergent Alignment, a self-supervised method that gives an LLM a "conscience" step in which it reviews its own outputs, then applies Direct Preference Optimization (DPO) to steer the model away from unethical behavior. This enables online alignment without external judges.
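For reference, here is a minimal sketch of the standard per-pair DPO loss, written with plain floats rather than a training framework. The card does not say how the conscience step builds preference pairs. Treating the output the model approves on review as "chosen" and the output it flags as "rejected" is an assumption made only for illustration, and the function name `dpo_loss` and the default `beta=0.1` are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    Assumption: "chosen" is the response the conscience step approves and
    "rejected" is the one it flags as unethical.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    return softplus(-margin)


# When policy and reference agree, the loss is log 2; it falls as the
# policy shifts probability toward the approved response.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))            # equals math.log(2)
print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))  # margin 2, smaller loss
```

Because the loss depends only on log-probability ratios against a reference model, no separate reward model or external judge is required, which is consistent with the card's claim of alignment without external judges.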

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

arXiv cs.AI · 2026-05-26

LC-ERD is a framework that mines latent logic from LLM-generated reasoning chains and uses it to decompose global rewards into step-level signals, enabling self-evolving reasoning without human annotation. It addresses three problems (label noise, coarse supervision, and distributional collapse) through a variational logic potential and multi-agent value decomposition.
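To make "decompose global rewards into step-level signals" concrete, here is a minimal sketch of one generic technique, potential-based decomposition. This is not LC-ERD's method: the card does not specify how its variational logic potential is computed, and the consistency regulation and multi-agent value decomposition are not modeled here. The intermediate potentials are hypothetical inputs, for example a learned estimate of progress after each reasoning step, and `decompose_reward` is a hypothetical name.

```python
from typing import List, Sequence


def decompose_reward(intermediate_potentials: Sequence[float],
                     global_reward: float) -> List[float]:
    """Split one trajectory-level reward into per-step rewards (hypothetical helper).

    intermediate_potentials[i] is an assumed estimate of progress after
    step i + 1, for every step except the last. The potential before the
    first step is anchored to 0 and the potential after the last step is
    anchored to the global reward.
    """
    phi = [0.0] + list(intermediate_potentials) + [global_reward]
    # Each step is credited with the change in potential it produces.
    return [phi[i + 1] - phi[i] for i in range(len(phi) - 1)]


# Three reasoning steps whose estimated progress is 0.2, then 0.5,
# in a chain that ends with a global reward of 1.0.
steps = decompose_reward([0.2, 0.5], 1.0)
print(steps)       # approximately [0.2, 0.3, 0.5]
print(sum(steps))  # approximately 1.0
```

The differences telescope, so the step-level rewards always sum to the original global reward. Denser credit assignment therefore does not change the total signal a chain receives, only how it is distributed across steps.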
