bias-mitigation

Fair Cognitive Impairment Detection Through Unlearning

arXiv cs.LG · 2026-06-18

Proposes a multimodal framework for fair Mild Cognitive Impairment detection from speech, using unlearning via gradient reversal to reduce demographic bias and improve performance across subgroups.

Toward Calibrated, Fair, and Accurate Deepfake Detection

arXiv cs.LG · 2026-06-10

Introduces Face-Fairness (FF), a plug-and-play framework for bias mitigation in deepfake detection, featuring Face-Feature Tuning (FFT) as the first demographic label-free fairness method that improves group accuracy and reduces performance gaps across demographics.

Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation

arXiv cs.AI · 2026-06-08

The paper proposes treating fairness as a symmetry operation in machine learning classifiers, implementing loss-based regularization to enforce invariance under swapping of sensitive attributes while holding merit features fixed. The framework achieves over 90% bias reduction with minimal accuracy loss and requires no causal graph knowledge.
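A swap-invariance regularizer of the kind this summary describes might look like the sketch below. It is an illustration only, not the paper's implementation: the logistic model, the binary sensitive attribute, the squared-difference penalty, and the names `predict`, `symmetry_penalty`, `regularized_loss`, and `lam` are all assumptions.

```python
import math

def predict(weights, bias, features):
    """Logistic model: probability of the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def symmetry_penalty(weights, bias, merit, sensitive):
    """Squared change in the prediction when a binary sensitive
    attribute (assumed to be 0 or 1) is swapped and merit is held fixed."""
    original = predict(weights, bias, merit + [sensitive])
    swapped = predict(weights, bias, merit + [1 - sensitive])
    return (original - swapped) ** 2

def regularized_loss(weights, bias, merit, sensitive, label, lam=1.0):
    """Binary cross-entropy plus lam times the swap-invariance penalty."""
    p = predict(weights, bias, merit + [sensitive])
    bce = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return bce + lam * symmetry_penalty(weights, bias, merit, sensitive)

# A model that ignores the sensitive attribute incurs no penalty;
# one that relies on it does.
invariant = symmetry_penalty([0.5, 0.0], 0.0, [1.0], 1)
dependent = symmetry_penalty([0.5, 2.0], 0.0, [1.0], 1)
```

Under these assumptions the penalty is zero exactly when the prediction does not change under the swap, so minimizing the combined loss pushes the classifier toward invariance without any causal graph.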

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

arXiv cs.AI · 2026-06-04

BiasGRPO proposes a framework using Group Relative Policy Optimization (GRPO) to stabilize social bias mitigation in LLMs by normalizing rewards across sampled completions, outperforming DPO and PPO on multiple benchmarks. The authors also release a compute-efficient bias reward model designed for integration into multi-objective RLHF pipelines.
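The reward normalization mentioned here can be sketched as the usual group-relative advantage: each completion's reward is standardized against the other completions sampled for the same prompt. This is a minimal sketch of that general GRPO step, not BiasGRPO's code; the function name, the `eps` term, and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of completions
    sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical bias-reward scores for four completions of one prompt.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered and scaled per group, a prompt whose rewards are uniformly high or highly dispersed does not dominate the policy update, which is the stabilizing effect the summary refers to.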

Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges

arXiv cs.CL · 2026-05-26

This paper introduces a causal framework to quantify rationalization bias in LLM judges, where verdicts and explanations are influenced by non-evidential cues rather than underlying texts. It proposes cue interventions, anchoring metrics, and the Proof-Before-Preference mitigation protocol, demonstrating improved cue invariance.

Is Position Bias in Dense Retrievers Built In or Learned from Data?

Hugging Face Daily Papers · 2026-05-26

This paper investigates whether positional bias in dense retrievers originates from architecture or training data, finding that training data distribution strongly influences bias and that balanced training can reduce sensitivity by up to 87% while maintaining retrieval performance.

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

arXiv cs.CL · 2026-05-21

This paper proposes a framework for parallel chunk-level processing of long documents with LLMs to reduce cumulative bias and improve evidence traceability, achieving significant reductions in omission errors and unsupported claims.

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

arXiv cs.CL · 2026-05-18

DebiasRAG proposes a tuning-free, query-specific debiasing framework using retrieval-augmented generation to reduce social biases in LLMs without degrading their original capabilities.

Explanation Fairness in Large Language Models: An Empirical Analysis of Disparities in How LLMs Justify Decisions Across Demographic Groups

arXiv cs.CL · 2026-05-12

This paper introduces the Explanation Fairness Taxonomy (EFT) to analyze disparities in how LLMs justify decisions across demographic groups, finding significant biases in explanation quality and tone despite balanced decisions.

Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

arXiv cs.CL · 2026-04-20

This paper proposes CAP-TTA, a test-time adaptation framework that uses preconditioned LoRA updates triggered by bias-risk scores to mitigate toxicity and bias in large language models during narrative generation, achieving faster optimization and better fluency than standard baselines.

Whose Facts Win? LLM Source Preferences under Knowledge Conflicts

arXiv cs.CL · 2026-04-20

This paper investigates how LLMs handle knowledge conflicts in retrieval-augmented generation by studying their preferences for different information sources. The authors find that LLMs prefer institutionally corroborated sources, but that repetition can reverse these preferences. They propose a method that reduces repetition bias while keeping source preferences consistent.

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

arXiv cs.CL · 2026-04-20

This systematic study evaluates training-free methods for improving the trustworthiness of large language models. It categorizes approaches into input-level, internal, and output-level interventions and analyzes the trade-offs between trustworthiness, utility, and robustness.

Intellectual freedom by design

OpenAI Blog · 2025-07-15

OpenAI publishes a blog post outlining its commitment to intellectual freedom in ChatGPT design, emphasizing objectivity by default, user controls, and transparent principles through its Model Spec framework. The company highlights new personalization settings and ongoing efforts to evaluate and reduce political bias through stakeholder feedback.

The power of continuous learning

OpenAI Blog · 2022-12-23

Lilian Weng from OpenAI discusses her work on applied AI research, including robotics projects, language model safety, content moderation, and addressing social bias in deep learning models. She emphasizes the importance of safe deployment of cutting-edge AI techniques alongside their powerful real-world applications.

DALL·E 2 pre-training mitigations

OpenAI Blog · 2022-06-28

OpenAI describes the pre-training data filtering and active learning techniques used to reduce harmful content in DALL·E 2, while also addressing unintended bias amplification caused by data filtering—particularly demographic biases in generated images.
