Tag
This paper argues that current benchmarks for autonomous agents do not evaluate whether an agent should have proceeded at all, and that this omission introduces a 'compliance bias'. The authors propose a taxonomy of abstention-warranted scenarios and new evaluation protocols (Safety Rate, Usability Rate, Informed Refusal Rate), with preliminary results showing tunable safety–usability tradeoffs across model families.
This paper introduces SpatialUncertain, a benchmark that evaluates whether vision-language models recognize when they cannot answer spatial questions because of occlusion or perspective ambiguity. The results reveal overconfidence and poor abstention behavior in the evaluated models.
This paper studies fairness in toxicity classification across three axes: ranking, calibration, and abstention. It compares ERM, reweighted ERM, and Group DRO methods with post-hoc interventions, finding that calibration disparity is a hidden fairness violation and that abstention itself can be unfair.
This paper introduces NoisyCoconut, an inference-time method that improves LLM reliability by injecting noise into latent trajectories to generate diverse reasoning paths. The approach enables models to abstain when uncertain, significantly reducing error rates in mathematical reasoning tasks without requiring retraining.
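One generic way to turn several diverse reasoning paths into an abstention decision is agreement voting: answer only when enough of the paths reach the same final answer. The sketch below is illustrative and is not taken from the NoisyCoconut paper. The function name `answer_or_abstain` and the `min_agreement` threshold are hypothetical, and it assumes the final answers from each noisy path have already been extracted.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def answer_or_abstain(
    answers: Sequence[Hashable],
    min_agreement: float = 0.6,  # hypothetical threshold, not from the paper
) -> Optional[Hashable]:
    """Return the majority answer, or None to signal abstention.

    `answers` holds one final answer per sampled reasoning path.
    """
    if not answers:
        return None  # nothing to vote on, so abstain

    # Find the most frequent answer and how many paths produced it.
    top_answer, count = Counter(answers).most_common(1)[0]

    # Answer only if the share of agreeing paths meets the threshold.
    if count / len(answers) >= min_agreement:
        return top_answer
    return None


if __name__ == "__main__":
    print(answer_or_abstain(["42", "42", "41", "42"]))  # agreement 0.75
    print(answer_or_abstain(["42", "41", "40"]))        # agreement 1/3
```

Raising `min_agreement` trades coverage for reliability: the model answers less often, but only when its paths agree more strongly.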