data-mediation


Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

arXiv cs.LG · 7h ago

This paper investigates emergent and subliminal misalignment in LLMs through a data-centric lens. It shows that the harmful effects of fine-tuning depend on structural properties of the data, task difficulty, pretraining composition, and training channels, and it reports experiments comparing off-policy and on-policy distillation.
