This paper investigates emergent and subliminal misalignment in LLMs through a data-centric lens, showing that the harmful effects of fine-tuning depend on structural properties of the data, task difficulty, pretraining composition, and the training channel; its experiments compare off-policy and on-policy distillation.