Tag
An informal research note describing a behavior in transformers where the model's inherent 'clarity-seeking' vectors can bypass constraints when discussing higher-order topics, potentially relevant to alignment and safety research.