This paper introduces OpenClawBench, a large-scale dataset for benchmarking process-side anomalies in real-world AI agent execution trajectories. The dataset shows that task success can mask process failures: 9.33% of executions that pass the oracle check still contain anomalies. It also provides structured supervision through a novel taxonomy.
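The 9.33% figure is a conditional rate: among executions that pass the oracle, the share that still contain at least one process-side anomaly. The sketch below illustrates that calculation on a few toy records. It assumes a hypothetical record layout (`Trajectory`, `oracle_passed`, `anomaly_labels`) and placeholder label names; none of these are taken from OpenClawBench's actual schema or taxonomy, and the toy data does not reproduce the paper's numbers.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record layout for illustration only; not OpenClawBench's schema.
    oracle_passed: bool          # whether the final outcome passed the task oracle
    anomaly_labels: list[str]    # process-side anomaly labels; empty if none were found


def hidden_failure_rate(trajectories: list[Trajectory]) -> float:
    """Fraction of oracle-passing trajectories that contain at least one anomaly."""
    # Condition on success: failing runs are excluded from the denominator.
    passing = [t for t in trajectories if t.oracle_passed]
    if not passing:
        return 0.0
    # A passing run counts as a hidden failure if it carries any anomaly label.
    anomalous = sum(1 for t in passing if t.anomaly_labels)
    return anomalous / len(passing)


# Toy data with placeholder labels: four passing runs (one anomalous) and one failing run.
runs = [
    Trajectory(True, []),
    Trajectory(True, ["label_a"]),
    Trajectory(True, []),
    Trajectory(True, []),
    Trajectory(False, ["label_b"]),
]

print(f"{hidden_failure_rate(runs):.2%}")  # prints "25.00%"
```

Conditioning on `oracle_passed` before counting anomalies is what makes the metric capture failures that an outcome-only check would miss; the failing run is ignored even though it is anomalous.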