abstention

#abstention

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

arXiv cs.AI · yesterday

This paper argues that current benchmarks for autonomous agents fail to evaluate whether an agent should have proceeded at all, which introduces a 'compliance bias'. The authors propose a taxonomy of abstention-warranted scenarios and new evaluation protocols (Safety Rate, Usability Rate, Informed Refusal Rate). Preliminary results show tunable safety–usability tradeoffs across model families.

#abstention

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Hugging Face Daily Papers · 2026-05-28

The paper introduces SpatialUncertain, a benchmark that evaluates whether vision-language models recognize when occlusion or perspective ambiguity makes a spatial question unanswerable. The results reveal overconfidence and poor abstention behavior.

#abstention

Fair and Calibrated Toxicity Detection with Robust Training and Abstention

arXiv cs.LG · 2026-05-15

This paper studies fairness in toxicity classification across three axes: ranking, calibration, and abstention. It compares ERM, reweighted ERM, and Group DRO methods with post-hoc interventions, finding that calibration disparity is a hidden fairness violation and that abstention itself can be unfair.

#abstention

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

arXiv cs.LG · 2026-05-12

This paper introduces NoisyCoconut, an inference-time method that improves LLM reliability by injecting noise into latent trajectories to generate diverse reasoning paths. The approach enables models to abstain when uncertain, significantly reducing error rates in mathematical reasoning tasks without requiring retraining.
