SentinelBench is a new benchmark for testing AI agents in time-evolving web environments. It finds that agents equipped with a specialized change-detection tool outperform those that rely on sleep-and-poll loops and reduce cost by 9.7x.
ChangeFlow is a generative framework for remote sensing change detection that synthesizes change masks in latent space using rectified flow. Sampling-based prediction ensembling improves its accuracy and robustness, and it reaches an average F1 of 80.4% across four benchmarks.
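
To make the idea of sampling-based prediction ensembling concrete, here is a minimal sketch in Python. It assumes the generative model has already produced several binary change masks for the same image pair, and it combines them by pixel-wise majority vote before scoring with F1. The summary does not say how ChangeFlow combines its samples, so the voting rule, the 0.5 threshold, and the names `ensemble_masks` and `f1_score` are illustrative assumptions, not the paper's method.

```python
from typing import List

# A binary change mask: 1 = changed pixel, 0 = unchanged pixel.
Mask = List[List[int]]


def ensemble_masks(samples: List[Mask], threshold: float = 0.5) -> Mask:
    """Combine sampled masks by pixel-wise averaging (assumed rule, not from the paper).

    A pixel is marked as changed when the fraction of samples that flag it
    is strictly greater than `threshold`.
    """
    k = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [1 if sum(s[r][c] for s in samples) / k > threshold else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def f1_score(pred: Mask, truth: Mask) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), computed over all pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: an empty prediction on an empty truth is perfect.
    return 2 * tp / denom if denom else 1.0


# Toy data: three hypothetical samples for one 2x3 image pair.
truth = [[1, 0, 0], [1, 1, 0]]
samples = [
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [1, 1, 0]],
]

combined = ensemble_masks(samples)
single_f1 = f1_score(samples[1], truth)
ensemble_f1 = f1_score(combined, truth)
```

In this toy case the vote removes the isolated errors that appear in only one sample, so the combined mask scores at least as well as the noisier individual sample. That is the intuition behind ensembling, not evidence about ChangeFlow's reported numbers.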