This paper introduces the Explanation Fairness Taxonomy (EFT) for analyzing disparities in how LLMs justify their decisions across demographic groups. It finds significant biases in explanation quality and tone even when the decisions themselves are balanced across groups.
An academic study exposes systemic counterfactual unfairness in LLMs: jokes attributed to privileged speakers are refused 67% more often, and rated as more malicious, than identical jokes attributed to marginalized speakers.
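A figure like "refused 67% more often" is a relative increase in refusal rate, not a gap in percentage points. The minimal sketch below is an illustration only, not the study's method. It assumes a hypothetical setup in which each joke is sent twice, differing only in the stated speaker identity, and the model's refusal is recorded for each version. The names `CounterfactualPair`, `refusal_rates`, and `relative_increase` are hypothetical, and the toy data is invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class CounterfactualPair:
    """Hypothetical record: one joke, shown twice with only the speaker identity swapped."""
    joke: str
    refused_privileged: bool    # model refused when the joke was attributed to a privileged speaker
    refused_marginalized: bool  # model refused when the joke was attributed to a marginalized speaker


def refusal_rates(pairs: list[CounterfactualPair]) -> tuple[float, float]:
    """Return (privileged refusal rate, marginalized refusal rate) over all pairs."""
    if not pairs:
        raise ValueError("need at least one counterfactual pair")
    n = len(pairs)
    privileged = sum(p.refused_privileged for p in pairs) / n
    marginalized = sum(p.refused_marginalized for p in pairs) / n
    return privileged, marginalized


def relative_increase(rate: float, baseline: float) -> float:
    """How much more often an outcome occurs than the baseline, as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline rate is zero; relative increase is undefined")
    return (rate - baseline) / baseline


if __name__ == "__main__":
    # Invented toy data (NOT the study's results): 10 jokes, 5 refused under the
    # privileged framing and 3 refused under the marginalized framing.
    pairs = [CounterfactualPair(f"joke {i}", i < 5, i < 3) for i in range(10)]
    priv, marg = refusal_rates(pairs)
    print(f"{relative_increase(priv, marg):.0%}")  # prints "67%"
```

In this toy example the absolute refusal rates are 50% and 30%, a 20-point gap, yet the privileged framing is refused 67% more often in relative terms. This is why a relative figure should be read alongside the underlying rates.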