Tag
This paper investigates how the tension between helpfulness and safety in LLMs leads to context-dependent suppression and recovery of certain behaviors, showing that the drive to be helpful can override causal caution mechanisms.