Tag
This paper investigates how the tension between helpfulness and safety in LLMs leads to context-dependent suppression and recovery of certain behaviors, showing that the drive to be helpful can override causal caution mechanisms.
A creator describes how Twitter's algorithm drastically reduces reach after a viral post, with an 85-95% drop in metrics, and asks for transparency on how to recover from this suppression.