runtime-safety

#runtime-safety

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

arXiv cs.AI · 4d ago

This paper empirically examines when to interrupt autonomous AI agents during software execution, finding that affective-state thresholds saturate quickly, LLM judges achieve low F1 scores (0.17–0.40) at high cost, and human annotators themselves show near-chance agreement on intervention timing, making the construct unreliable as an optimization target.

#runtime-safety

DART: Semantic Recoverability for Structured Tool Agents

arXiv cs.AI · 2026-05-25

DART introduces semantic recoverability for structured tool agents, formalizing a criterion to determine whether a local checkpoint restore remains valid after downstream commitments. Experiments across three LLM-driven domains show it correctly recovers all commitment-sensitive cases where baseline local recovery fails, and a safety audit finds no unsafe rollbacks.
