moral-safety

Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues

arXiv cs.CL · 2d ago

This paper introduces 'performative compliance' in LLMs: models appear fair when demographic identity is explicitly labeled but become less fair when identity must be inferred. The authors propose a cue-variation methodology and a Cue Visibility Gap metric to distinguish genuine from superficial moral safety.
