#bias-mitigation

Explanation Fairness in Large Language Models: An Empirical Analysis of Disparities in How LLMs Justify Decisions Across Demographic Groups

arXiv cs.CL · yesterday

This paper introduces the Explanation Fairness Taxonomy (EFT) to analyze disparities in how LLMs justify decisions across demographic groups, and finds significant biases in explanation quality and tone even when the decisions themselves are balanced.


Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

arXiv cs.CL · 2026-04-20

This paper proposes CAP-TTA, a test-time adaptation framework for mitigating toxicity and bias in large language models during narrative generation. It applies preconditioned LoRA updates that are triggered only when a bias-risk score is exceeded, achieving faster optimization and better fluency than standard baselines.
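The summary above describes a general pattern worth illustrating: low-rank (LoRA) factors updated at test time, with the update gated by a risk score and shaped by a preconditioner. The paper's actual risk scorer, preconditioner, and hyperparameters are not given here, so the sketch below is only a generic stand-in: a hypothetical `bias_risk_score`, a diagonal Adagrad-style preconditioner, and all names and thresholds are assumptions, not the authors' method.

```python
import numpy as np

def bias_risk_score(logits, flagged_token_ids):
    """Hypothetical risk score: softmax mass assigned to flagged tokens."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[flagged_token_ids].sum())

class LoRALayer:
    """Frozen base weight W plus a trainable low-rank update B @ A (rank r)."""
    def __init__(self, d_out, d_in, r=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))    # frozen at test time
        self.A = np.zeros((r, d_in))               # LoRA factor, starts at zero
        self.B = rng.normal(scale=0.01, size=(d_out, r))

    def forward(self, x):
        return (self.W + self.B @ self.A) @ x

def tta_step(layer, x, grad_out, risk, threshold=0.5, lr=1e-2, eps=1e-6):
    """One gated, preconditioned test-time step on the LoRA factors only.

    Skips adaptation entirely when the risk score is below the threshold,
    so low-risk generations pay no optimization cost. Returns True if a
    step was taken.
    """
    if risk < threshold:
        return False
    # Gradients w.r.t. A and B for y = (W + B A) x, given upstream grad_out:
    grad_A = layer.B.T @ np.outer(grad_out, x)     # dL/dA = B^T g x^T
    grad_B = np.outer(grad_out, layer.A @ x)       # dL/dB = g (A x)^T
    # Diagonal preconditioning (Adagrad-style) as a stand-in preconditioner.
    layer.A -= lr * grad_A / np.sqrt(grad_A**2 + eps)
    layer.B -= lr * grad_B / np.sqrt(grad_B**2 + eps)
    return True
```

Because only the small `A` and `B` factors are touched, each gated step is cheap relative to full fine-tuning, which is the usual motivation for LoRA-based test-time adaptation.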


Whose Facts Win? LLM Source Preferences under Knowledge Conflicts

arXiv cs.CL · 2026-04-20

This paper investigates how LLMs handle knowledge conflicts in retrieval-augmented generation by studying their preferences for different information sources. The authors find that LLMs prefer institutionally corroborated sources, but that these preferences can be reversed by repetition; they propose a method to reduce repetition bias while preserving consistent source preferences.


A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

arXiv cs.CL · 2026-04-20

This paper systematically evaluates training-free methods for improving the trustworthiness of large language models, categorizing approaches into input-, internal-, and output-level interventions and analyzing trade-offs among trustworthiness, utility, and robustness.


Intellectual freedom by design

OpenAI Blog · 2025-07-15

OpenAI publishes a blog post outlining its commitment to intellectual freedom in ChatGPT design, emphasizing objectivity by default, user controls, and transparent principles through its Model Spec framework. The company highlights new personalization settings and ongoing efforts to evaluate and reduce political bias through stakeholder feedback.


The power of continuous learning

OpenAI Blog · 2022-12-23

Lilian Weng from OpenAI discusses her work on applied AI research, including robotics projects, language model safety, content moderation, and addressing social bias in deep learning models. She emphasizes the importance of safe deployment of cutting-edge AI techniques alongside their powerful real-world applications.


DALL·E 2 pre-training mitigations

OpenAI Blog · 2022-06-28

OpenAI describes the pre-training data filtering and active learning techniques used to reduce harmful content in DALL·E 2, and addresses the unintended bias amplification caused by that filtering, particularly demographic biases in generated images.
