Tag
Proposes the Piggyback Hypothesis, which holds that chat-template tokens can cause emergent misalignment in LLMs, and introduces Token-Regularized Finetuning (TReFT) to mitigate this effect while preserving in-domain learning.