self-reframing

SHARD: Safe and Helpful Alignment via Self-Reframing Distillation

arXiv cs.CL · 2026-06-16

This paper introduces SHARD, a self-reframing distillation method that rewrites sensitive prompts to surface benign intent and fine-tunes models on safe, helpful responses, improving helpfulness while preserving safety.
