Rishi Bommasani announces the publication of a four-year research study on the real-world impacts of AI hiring tools, based on outcomes for 3.3 million people.
This research paper finds that language models exhibit increased dialect bias when comparing Standard American English and African-American Vernacular English side-by-side, even after safety fine-tuning. Counterfactual fairness fine-tuning can reduce some biases in isolation but not consistently in contrastive settings.
This paper proposes EquiSumm, a gender-bias-aware framework for inclusive tweet summarization that ensures opinions from different gender groups are represented, addressing demographic fairness in automated summarization.
This paper introduces Computable Fair Division (CFD), a framework using Boltzmann-Softmax control to balance efficiency and fairness in AI resource allocation, with real-time adaptation via AHC++.
The article argues that dependency cooldowns unfairly burden developers in earlier time zones and proposes using deterministic phased rollouts based on project identifiers to distribute adoption more equitably.
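A deterministic phased rollout keyed on a project identifier could be sketched roughly as follows. This is only an illustration of the general idea, not the article's implementation: the function names, the SHA-256 hash, and the 72-hour window are all assumptions made for the example.

```python
import hashlib


def rollout_delay_hours(project_id: str, window_hours: int = 72) -> int:
    """Map a project identifier to a fixed delay in [0, window_hours).

    Hypothetical helper: the hash choice and window length are assumptions.
    The same project_id always yields the same delay, regardless of the
    time zone or the moment at which the check runs.
    """
    digest = hashlib.sha256(project_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer bucket key.
    return int.from_bytes(digest[:8], "big") % window_hours


def may_adopt(project_id: str, hours_since_release: float,
              window_hours: int = 72) -> bool:
    """Return True once this project's deterministic delay has elapsed."""
    return hours_since_release >= rollout_delay_hours(project_id, window_hours)
```

Because the delay depends only on the identifier, which projects adopt a new release first is decided by the hash rather than by who happens to be awake when the release lands.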
This paper identifies structural failure modes in tabular fair semi-supervised learning under confidence gating and proposes Online Primal-Dual Allocation (OPDA) to mitigate them without per-dataset tuning.
This paper proposes GESD, a procedure-oriented fairness metric that measures disparities in explanation stability across subgroups, and integrates it into a multi-objective optimization framework that jointly optimizes utility, outcome fairness, and explanation fairness.
This paper studies how post-training quantization introduces new biases in instruction-tuned LLMs, finding that 3-bit precision causes 6–21% of previously unbiased items to develop stereotypes, while standard metrics like perplexity fail to detect this degradation.
DebiasRAG is a tuning-free, query-specific debiasing framework that uses retrieval-augmented generation to reduce social biases in LLMs without degrading their original capabilities.
This paper proposes a three-level taxonomy for evaluating AI cultural capabilities (Cultural Awareness, Sensitivity, and Competence), grounded in intercultural communication theory, aiming to improve the validity and interpretability of AI evaluations in multicultural settings.
This paper studies how instruction-tuned LLMs can exhibit fair outputs while retaining biased internal representations in high-stakes decisions like mortgage underwriting, showing that these hidden biases are causally potent, asymmetric, and exploitable through activation steering.
This paper studies fairness in toxicity classification across three axes: ranking, calibration, and abstention. It compares ERM, reweighted ERM, and Group DRO methods with post-hoc interventions, finding that calibration disparity is a hidden fairness violation and that abstention itself can be unfair.
This paper introduces Counterfactual Explanation Consistency (CEC), a framework to detect and mitigate hidden procedural bias in outcome-fair models by aligning feature attributions between individuals and their counterfactual counterparts, with experiments on credit and income datasets.
This paper presents a systematic evaluation of how differential privacy impacts social bias in large language models, finding that while it reduces bias in sentence scoring, the effect does not generalize across all tasks.
FairHealth is an open-source Python library designed for trustworthy healthcare AI in low-resource settings, offering modules for fairness auditing, privacy-preserving federated learning, and explainability.
This study reveals a 'Smart Pruning Paradox' where activation-aware pruning methods like Wanda preserve perplexity but significantly amplify bias in Large Language Models deployed on edge devices.
This paper introduces Pareto UCB1 Gossip and Simulated NSW UCB Gossip for multi-objective multi-agent multi-armed bandits, addressing both learning efficiency and fairness in stochastic environments.
This paper critiques the use of single-reference ground truth in ASR evaluation, arguing it causes epistemic injustice for speakers with aphasia. It proposes a new metric, Epistemic Injustice Distance, and advocates for WER-Range to account for diverse transcription conventions.
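One plausible reading of a WER range is to score a hypothesis against several acceptable reference transcripts and report the lowest and highest word error rate, rather than a single number. The sketch below follows that reading. It is an assumption for illustration and may differ from the paper's definition, and the names `wer` and `wer_range` are hypothetical.

```python
from typing import Iterable, Tuple


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard Levenshtein distance over words, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def wer_range(references: Iterable[str], hypothesis: str) -> Tuple[float, float]:
    """Return (min WER, max WER) over all reference transcripts.

    Assumed interpretation: each reference reflects a different but valid
    transcription convention, e.g. keeping or dropping filled pauses.
    """
    scores = [wer(ref, hypothesis) for ref in references]
    return min(scores), max(scores)


# Two conventions for the same utterance: one keeps the filler, one drops it.
low, high = wer_range(["i want um water", "i want water"], "i want water")
```

Here the same ASR output scores 0.0 against the reference that omits the filler and 0.25 against the one that keeps it, which is the kind of spread a single-reference score hides.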
MIT researchers release the first multilingual negation benchmark covering seven languages and show VLMs like CLIP struggle with non-Latin scripts, while MultiCLIP and SpaceVLM offer uneven improvements across languages.
DART (Distill-Audit-Repair Training) is a new training framework that addresses 'harm drift' in safety-aligned LLMs, where fine-tuning for demographic difference-awareness causes harmful content to appear in model explanations. On eight benchmarks, DART improves Llama-3-8B-Instruct accuracy from 39.0% to 68.8% while reducing harm drift cases by 72.6%.