A University of Texas paper introduces AgingBench, a benchmark showing that AI agents can become less reliable after deployment because of memory and maintenance decay, even when the underlying model remains unchanged.