#self-reframing

SHARD: Safe and Helpful Alignment via Self-Reframing Distillation

arXiv cs.CL · 2026-06-16

This paper introduces SHARD, a self-reframing distillation method that rewrites sensitive prompts to surface benign intent and fine-tunes models on safe, helpful responses, improving helpfulness while preserving safety.
