social-bias


Evaluating Second-Order Bias of LLMs Through Epistemic Entitlement

arXiv cs.CL · 13h ago

This paper introduces 'second-order bias', the bias LLMs exhibit when judging biased content, and proposes a reasoning task grounded in epistemic entitlement to evaluate it. Experiments show that the task evades safety guardrails and reveals systematic demographic biases in LLM judges.


BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

arXiv cs.AI · 2026-06-04

BiasGRPO proposes a framework using Group Relative Policy Optimization (GRPO) to stabilize social bias mitigation in LLMs by normalizing rewards across sampled completions, outperforming DPO and PPO on multiple benchmarks. The authors also release a compute-efficient bias reward model designed for integration into multi-objective RLHF pipelines.
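The summary does not give the paper's exact formulation, so the following is only a minimal sketch of the group-relative normalization commonly used in GRPO: each completion's reward is standardized against the mean and standard deviation of the other completions sampled for the same prompt. The function name, the `eps` constant, and the example scores are hypothetical and are not taken from BiasGRPO.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of completions for the same prompt.

    Assumption: this is the generic GRPO-style normalization, not
    necessarily the exact variant used in the BiasGRPO paper.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical bias-reward scores for four completions of a single prompt.
scores = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(scores)
```

Because advantages are computed relative to the group, a noisy reward scale matters less than the ranking and spread within each group, which is one plausible reading of the stabilization the summary describes.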


How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation

arXiv cs.CL · 2026-05-13

This paper presents a systematic evaluation of how differential privacy impacts social bias in large language models, finding that while it reduces bias in sentence scoring, the effect does not generalize across all tasks.


Evaluating fairness in ChatGPT

OpenAI Blog · 2024-10-15

OpenAI published a study examining how subtle identity cues, such as a user's name, can influence ChatGPT's responses. It introduces the concept of 'first-person fairness' to evaluate whether name-based biases lead to harmful stereotypes in direct user interactions. The study notes several limitations: it covers only English-language interactions, binary gender, and four racial/ethnic categories.
