Tag
This paper examines counterfactual behavior in ML models through a geometric lens, showing that models with similar predictive performance can differ substantially in counterfactual outcomes due to the interaction between decision-boundary proximity and local data support. The findings identify counterfactual behavior as a dimension distinct from predictive performance, with implications for model selection and for the reliability of counterfactual explanation methods.
This paper introduces Counterfactual Explanation Consistency (CEC), a framework to detect and mitigate hidden procedural bias in outcome-fair models by aligning feature attributions between individuals and their counterfactual counterparts, with experiments on credit and income datasets.
This paper introduces Macro, a preference alignment framework that uses Direct Preference Optimization (DPO) to improve the validity and minimality of self-generated counterfactual explanations across multiple languages.