Tag
Proposes a multimodal framework for fair Mild Cognitive Impairment detection from speech, using unlearning via gradient reversal to reduce demographic bias and improve performance across subgroups.
Introduces Face-Fairness (FF), a plug-and-play framework for bias mitigation in deepfake detection, featuring Face-Feature Tuning (FFT) as the first demographic label-free fairness method that improves group accuracy and reduces performance gaps across demographics.
The paper proposes treating fairness as a symmetry operation in machine learning classifiers, implementing loss-based regularization to enforce invariance when sensitive attributes are swapped while merit features are held fixed. The framework achieves over 90% bias reduction with minimal accuracy loss and requires no causal graph knowledge.
BiasGRPO proposes a framework using Group Relative Policy Optimization (GRPO) to stabilize social bias mitigation in LLMs by normalizing rewards across sampled completions, outperforming DPO and PPO on multiple benchmarks. The authors also release a compute-efficient bias reward model designed for integration into multi-objective RLHF pipelines.
This paper introduces a causal framework to quantify rationalization bias in LLM judges, where verdicts and explanations are influenced by non-evidential cues rather than underlying texts. It proposes cue interventions, anchoring metrics, and the Proof-Before-Preference mitigation protocol, demonstrating improved cue invariance.
This paper investigates whether positional bias in dense retrievers originates from architecture or training data, finding that training data distribution strongly influences bias and that balanced training can reduce sensitivity by up to 87% while maintaining retrieval performance.
This paper proposes a framework for parallel chunk-level processing of long documents with LLMs to reduce cumulative bias and improve evidence traceability, achieving significant reductions in omission errors and unsupported claims.
DebiasRAG proposes a tuning-free, query-specific debiasing framework using retrieval-augmented generation to reduce social biases in LLMs without degrading their original capabilities.
This paper introduces the Explanation Fairness Taxonomy (EFT) to analyze disparities in how LLMs justify decisions across demographic groups, finding significant biases in explanation quality and tone despite balanced decisions.
This paper proposes CAP-TTA, a test-time adaptation framework that uses preconditioned LoRA updates triggered by bias-risk scores to mitigate toxicity and bias in large language models during narrative generation, achieving faster optimization and better fluency than standard baselines.
This paper investigates how LLMs handle knowledge conflicts in retrieval-augmented generation by studying their preferences for different information sources. The authors find that LLMs prefer institutionally corroborated sources but that these preferences can be reversed by repetition, and they propose a method to reduce repetition bias while maintaining consistent source preferences.
A systematic study evaluating training-free methods for improving trustworthiness in large language models, categorizing approaches into input, internal, and output-level interventions while analyzing trade-offs between trustworthiness, utility, and robustness.
OpenAI publishes a blog post outlining its commitment to intellectual freedom in ChatGPT design, emphasizing objectivity by default, user controls, and transparent principles through its Model Spec framework. The company highlights new personalization settings and ongoing efforts to evaluate and reduce political bias through stakeholder feedback.
Lilian Weng from OpenAI discusses her work on applied AI research, including robotics projects, language model safety, content moderation, and addressing social bias in deep learning models. She emphasizes the importance of safe deployment of cutting-edge AI techniques alongside their powerful real-world applications.
OpenAI describes the pre-training data filtering and active learning techniques used to reduce harmful content in DALL·E 2, while also addressing unintended bias amplification caused by data filtering—particularly demographic biases in generated images.