bias-detection

#bias-detection

The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction

arXiv cs.CL · 2026-06-03

The Ghost Annotator framework combines conformal prediction with collaborative filtering to model LLM behavior and human label variation in content moderation, revealing structural demographic biases in larger models.

I analyzed 25,500 LLM resume screenings to measure hiring bias. The results are a wake-up call.

Reddit r/artificial · 2026-06-01

A study analyzing 25,500 LLM resume evaluations across 10 models found a 45% bias rate driven by 'silent bias', with models inventing professional-sounding excuses to penalize candidates. It highlights significant variability in fairness and stability, with Claude, Mistral-Large, and Llama 4 being most stable, while Qwen and older Gemini models were volatile.

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

arXiv cs.CL · 2026-05-29

This paper introduces GPF-LiveNews, a streaming evaluation protocol for auditing how large language models frame live news events differently for various demographic groups, using semantic sensitivity and sentiment disparity measures across 42 identity labels and seven prompt families.

Do Fair Models Reason Fairly? Counterfactual Explanation Consistency for Procedural Fairness in Credit Decisions

arXiv cs.LG · 2026-05-14

This paper introduces Counterfactual Explanation Consistency (CEC), a framework to detect and mitigate hidden procedural bias in outcome-fair models by aligning feature attributions between individuals and their counterfactual counterparts, with experiments on credit and income datasets.

Surrogate modeling for interpreting black-box LLMs in medical predictions

arXiv cs.CL · 2026-04-23

Researchers propose a surrogate modeling framework to quantify and interpret latent medical knowledge encoded in black-box LLMs, revealing both valid associations and persistent racial biases.

Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives

arXiv cs.CL · 2026-04-23

Columbia and Northwestern researchers propose a pipeline to surface race and gender bias in LLM abstractive summaries of life-story interviews, showing representational harm risks.

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

arXiv cs.CL · 2026-04-23

Introduces a framework to quantify how LLMs overstate certainty through rhetorical devices, revealing model-agnostic patterns of epistemic-rhetorical miscalibration.

Can We Locate and Prevent Stereotypes in LLMs?

arXiv cs.CL · 2026-04-23

This arXiv preprint maps stereotype-encoding neurons and attention heads in GPT-2 Small and Llama 3.2, showing that biases cluster in small neuron subsets, yet ablating those subsets barely reduces biased text generation.

Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs

arXiv cs.CL · 2026-04-22

Google Research introduces LocQA, a 12-language dataset revealing that multilingual LLMs exhibit strong US-centric and population-based locale biases when answering ambiguous locale-dependent questions.

Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

arXiv cs.CL · 2026-04-22

Academic study exposes systemic counterfactual unfairness in LLMs: jokes from privileged speakers are refused 67% more often and rated as more malicious than identical jokes from marginalized speakers.

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

arXiv cs.CL · 2026-04-21

Researchers introduce BIASEDTALES-ML, a large-scale multilingual dataset of ~350,000 LLM-generated children's stories across eight languages, designed to analyze narrative attribute distributions and cross-lingual bias patterns in language model outputs. The work reveals significant cross-lingual variability, highlighting limitations of English-centric bias evaluations.
