Tag
This paper introduces a benchmark of ten complex systems for validating causal abstraction metrics, evaluates over thirty candidate metrics, and proposes the Causal Abstraction Error (CAE) as a general-purpose validity metric that reliably discriminates valid from invalid explanations.