Tag
This paper investigates safety failures in Large Reasoning Models where harmful content appears in reasoning traces despite safe final answers, proposing an adaptive multi-principle steering method to mitigate these risks.