The paper introduces CXR-MAX, a large-scale benchmark for evaluating reasoning alignment in non-stationary environments. It is built on X-ray data paired with outputs from multiple multimodal large language models (MLLMs).