Tag
Polar is a 4,026-instance multiple-choice benchmark for evaluating political bias in LLMs across U.S. and South Korean political contexts, measuring bias through option-level likelihoods. Experiments on 38 LLMs show systematic bias patterns varying by political context, issue category, and presentation language.
Introduces Face-Fairness (FF), a plug-and-play framework for bias mitigation in deepfake detection. Its Face-Feature Tuning (FFT) component is presented as the first fairness method that requires no demographic labels; it improves per-group accuracy and narrows performance gaps across demographics.
Investigates how self-supervised speech recognition models encode speaker group information (gender, age, dialect, ethnicity, native speaker status) across layers, and how finetuning for tasks like ASR or speaker identification affects this encoding.
This paper introduces a Pareto-guided teacher alignment method for fair personalized text generation, aiming to balance multiple objectives in language model outputs.
This paper proposes PAFO, a Pareto fairness optimization framework to mitigate personalized reward bias in reward models for LLMs, improving accuracy for minority user groups without harming majority groups.
This paper introduces AI-MASLD, a stress-audit framework for medical LLMs that reveals how benchmark accuracy can hide serious safety failures, and demonstrates that open-weight models can match or exceed proprietary ones on safety dimensions.
Aquifer is an MCP runtime that provides bounded queues, fairness controls, and dynamic pacing to handle rate limits and traffic spikes in AI agent systems. It also introduces the Aqueduct Protocol for dynamic flow state communication.
The paper proposes treating fairness as a symmetry operation in machine learning classifiers, implementing loss-based regularization to enforce invariance under swapping of sensitive attributes while holding merit features fixed. The framework achieves over 90% bias reduction with minimal accuracy loss and requires no causal graph knowledge.
This large-scale study of 3.4 million job applicants across 156 employers finds that algorithmic monoculture, in which employers rely on hiring algorithms from a single vendor, produces racial disparities and systemic rejections, with 25.87% of Black applicants and 14.74% of Asian applicants adversely impacted.
Researchers from the University of Amsterdam propose a tabular reinforcement learning approach to the Metro Network Expansion Problem, showing it achieves comparable performance to Deep RL while reducing training episodes by 18x and carbon emissions by 12x on average. The method also incorporates social equity criteria and is evaluated on real-world metro networks in Xi'an and Amsterdam.
This paper investigates the impact of demographic bias (sex and age) on skin lesion classification using ResNet models, finding that sex biases stem from data imbalances while age biases consistently favor younger groups, and evaluating multi-task and adversarial learning mitigation strategies.
This paper investigates how LLMs produce different outcomes based on conversational context, finding that topic, rather than explicit user demographics, is the primary driver of disparities in high-stakes scenarios like salary advice.
This paper presents a multi-domain red teaming framework for evaluating safety, robustness, and fairness of medical LLMs across 690 clinically grounded scenarios. Results show that high aggregate accuracy can mask critical failures, and hybrid evaluation with clinician oversight is necessary for credible safety assessment.
Introduces TrustLDM, a comprehensive benchmark for evaluating safety, privacy, and fairness of Language Diffusion Models, revealing that their alignment degrades with malicious post contexts. Proposes an automatic evaluation framework, TrustLDM-Auto, to identify vulnerable configurations.
This paper proposes a neuron-level intervention method to identify gender-specific neurons in language models (feminine, masculine, gender-neutral) and steer sentence generation toward a target gender form while preserving meaning, with experiments showing precise control and bias mitigation.
COFT is a training-free decoding method that applies token-level fairness control and conformal calibration to reduce bias in chain-of-thought reasoning of large language models, achieving 30-55% bias reduction with minimal computational overhead.
This paper presents the first bias evaluation of multimodal speech recognition models, finding significant accuracy differences across gender and ethnicity when pairing faces with audio, with implications for fairness in AI systems.
This paper introduces GPF-LiveNews, a streaming evaluation protocol for auditing how large language models frame live news events differently for various demographic groups, using semantic sensitivity and sentiment disparity measures across 42 identity labels and seven prompt families.
This paper analyzes how algorithmic monoculture in hiring, where many employers use the same vendor's screening algorithms, leads to systematic rejection of the same individuals and racial groups, using a dataset of 3 million applicants.
This paper presents a two-level autoresearch framework where an outer-loop AI agent autonomously optimizes inner-loop LLM policy-synthesis pipelines for multi-agent sequential social dilemmas, achieving superior performance and discovering objective-specific mechanisms like fairness under a maximin welfare objective.