Tag
This paper introduces 'second-order bias', the bias LLMs exhibit when judging biased content, and proposes a reasoning task grounded in epistemic entitlement to evaluate it. Experiments show that the task evades safety guardrails and reveals systematic demographic biases in LLM judges.
This paper introduces a controlled content-overlap setup using parallel Bible translations to evaluate how much style classifiers rely on content cues rather than on genuine stylistic features. Results show that low-overlap models degrade when content cues are removed, while high-overlap models transfer more robustly.