Tag
SentinelBench is a new benchmark for testing AI agents in time-evolving web environments. It finds that agents using a specialized change-detection tool outperform those using sleep-and-poll loops, reducing cost by 9.7x.