Tag
This paper introduces SHARD, a self-reframing distillation method that rewrites sensitive prompts to surface benign intent and fine-tunes models on safe, helpful responses, improving helpfulness while preserving safety.