intent-calibrated

OpenSafeIntent: Evaluating Intent-Calibrated Safe Completion Across Dual-Use Prompt Sets

arXiv cs.CL · yesterday

OpenSafeIntent introduces a benchmark of controlled prompt sets that vary intent while holding tasks fixed, enabling evaluation of whether models calibrate assistance across benign, dual-use, and malicious variants rather than appearing safe on average.
