Tag
This paper investigates the stability-plasticity dilemma in reinforcement learning under gradual non-stationarity and finds that stabilizing successor features through synaptic consolidation across multiple timescales outperforms plasticity-focused methods.
The paper introduces CXR-MAX, a large-scale benchmark for evaluating reasoning alignment in non-stationary environments using X-ray data from multiple MLLMs.