Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation
Summary
This paper presents a large-scale audit of recommendation biases in LLM-based content curation across OpenAI, Anthropic, and Google using 540,000 simulated selections from Twitter/X, Bluesky, and Reddit data. The study finds that LLMs systematically amplify polarization, exhibit distinct toxicity handling trade-offs, and show significant political leaning bias favoring left-leaning authors despite right-leaning plurality in datasets.
# Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation

Source: https://arxiv.org/html/2604.15937

Christopher Barrie (New York University, Department of Sociology, New York, USA, and University of Oxford, Department of Sociology, Oxford, UK)
Chris A. Bail (Duke University, Departments of Sociology, Computer Science, Political Science, and Public Policy, Durham, NC, USA)
Petter Törnberg (University of Amsterdam, Institute for Logic, Language and Computation (ILLC), Amsterdam, The Netherlands)

## Abstract

Large Language Models (LLMs) are increasingly deployed to curate and rank human-created content, yet the nature and structure of their biases in these tasks remain poorly understood: which biases are robust across providers and platforms, and which can be mitigated through prompt design? We present a controlled simulation study mapping content selection biases across three major LLM providers (OpenAI, Anthropic, Google) on real social media datasets from Twitter/X, Bluesky, and Reddit, using six prompting strategies (general, popular, engaging, informative, controversial, neutral). Through 540,000 simulated top-10 selections from pools of 100 posts across 54 experimental conditions, we find that biases differ substantially in how structural and how prompt-sensitive they are. Polarization is amplified across all configurations, toxicity handling shows a strong inversion between engagement- and information-focused prompts, and sentiment biases are predominantly negative. Provider comparisons reveal distinct trade-offs: GPT-4o Mini shows the most consistent behavior across prompts; Claude and Gemini exhibit high adaptivity in toxicity handling; Gemini shows the strongest negative sentiment preference. On Twitter/X, where author demographics can be inferred from profile bios, political leaning bias is the clearest demographic signal: left-leaning authors are systematically over-represented despite right-leaning authors forming the pool plurality in the dataset, and this pattern largely persists across prompts.

## 1 Introduction

Large Language Models (LLMs) are increasingly deployed not only to generate and retrieve information, but to make consequential decisions about people and content: curating and ranking human-created content, screening job applicants, triaging medical cases, and moderating online platforms. Social media content curation stands out in particular, because LLM-based ranking shapes what information large audiences encounter.

In October 2025, Elon Musk announced that X (formerly Twitter) would transition its entire content ranking system to Grok to process over 100 million posts daily; by November 2025, Grok was already algorithmically ranking both the "For You" and "Following" feeds. Around the same time, Instagram launched a tool that uses AI to summarize a user's inferred interests and lets them directly adjust their Reels recommendations. This trend extends beyond centralized platforms: Bluesky, an open-protocol social network explicitly founded on the principle that AI should serve users rather than platforms, recently launched Attie, an agentic feed builder powered by Claude (Anthropic). As Jay Graber, Bluesky's CEO, describes it, users can simply describe the content they want and have a personalized feed built for them. Concurrently, researchers built BONSAI, a research framework allowing users to (i) describe and iteratively refine desired feeds in natural language and (ii) have LLMs construct them.
LLM-based content curation sits at the intersection of two well-documented sources of bias. Recommender systems have long exhibited systematic fairness problems, from popularity bias to demographic disparities in exposure. LLMs independently carry biases inherited from pre-training corpora and alignment procedures, manifesting across generation, question-answering, and decision-making tasks, which suggests that fairness challenges in downstream applications may reflect deep properties of LLM pre-training rather than task-specific design choices. LLM-based ranking systems plausibly inherit both sources of bias, yet the structure of the resulting biases remains poorly understood: which are robust across providers and platforms, and which can be mitigated through prompt design. Prior work has begun to document fairness violations in LLM-based recommendation, but has focused predominantly on product domains such as movies and e-commerce, using single providers in isolation. Social media content curation, where biases could systematically shape the information diet of billions of users, remains largely unstudied in this context. Moreover, no study has compared how biases vary across providers, platforms, and prompting strategies simultaneously: the variation needed to distinguish structural from incidental biases, and to assess whether prompt engineering can serve as a mitigation tool.

Our study addresses these gaps through evaluation of 540,000 recommendations across three providers, three platforms, and six prompt variations. We investigate three fundamental questions:

- **(RQ1)** What is the overall landscape of bias in LLM-based content curation systems, and how do biases vary across different prompt strategies?
- **(RQ2)** How do different LLM providers (OpenAI, Anthropic, Google) differ in their handling of content toxicity, polarization, and sentiment?
- **(RQ3)** How do biases in sensitive demographic attributes (gender, political leaning, minority status) manifest on Twitter/X, and what are the directions of these biases?

Our analysis reveals systematic patterns in how LLMs select content. Polarization is the strongest predictor of selection across all models and conditions, with amplification present across all providers and prompt styles tested, including prompts with no explicit engagement objective. Toxicity handling shows a striking inversion depending on prompt objective: models tolerate or prefer toxic content under engaging prompts and actively avoid it under informative ones. Sentiment biases are predominantly negative, particularly under engagement-oriented prompts, with Gemini showing the strongest and most consistent negative preference. Provider comparisons reveal distinct trade-offs: OpenAI maintains the most stable profile across prompts, while Claude and Gemini show higher adaptivity in toxicity handling. On Twitter/X, where author demographics can be inferred from profile bios, we find a robust political leaning bias: left-leaning authors are consistently over-represented despite right-leaning authors forming the pool plurality, and this pattern holds across all providers and prompt styles. Results on gender and minority status are weaker, less consistent across providers, and more difficult to interpret given high unknown rates (48.4% for minority status), and should be treated as exploratory.
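The paper does not spell out in this cached text which statistic underlies "over-represented," so the following is only a minimal sketch of one natural operationalization: comparing a group's share among the LLM-selected posts to its share in the candidate pool. The function name and the example proportions are illustrative, not numbers from the study.

```python
# Sketch (not the authors' code): a simple representation ratio.
# Ratio > 1 means the group appears more often among selected posts
# than its share of the candidate pool would predict.
from collections import Counter

def representation_ratio(pool_labels, selected_labels, group):
    """Share of `group` among selections divided by its share in the pool."""
    pool_share = Counter(pool_labels)[group] / len(pool_labels)
    selected_share = Counter(selected_labels)[group] / len(selected_labels)
    return selected_share / pool_share if pool_share > 0 else float("nan")

# Hypothetical illustration: right-leaning authors form the pool plurality,
# yet left-leaning authors dominate a top-10 selection.
pool = ["right"] * 45 + ["left"] * 35 + ["unknown"] * 20
selected = ["left"] * 6 + ["right"] * 3 + ["unknown"] * 1
print(round(representation_ratio(pool, selected, "left"), 2))   # 1.71 -> over-represented
print(round(representation_ratio(pool, selected, "right"), 2))  # 0.67 -> under-represented
```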
## 2 Related Work

#### Fairness in Traditional Recommender Systems

A mature body of research documents multiple forms of bias in collaborative filtering and content-based systems: **popularity bias** causing over-recommendation of popular items while under-exposing long-tail content, **demographic disparities** resulting in systematically worse recommendation quality or lower exposure for certain user groups, and **feedback loops** that amplify initial biases over time. This literature distinguishes consumer fairness (equitable recommendation quality across user groups) from producer fairness (equitable exposure for content creators). Mitigation strategies include re-ranking, calibration, and adversarial debiasing. However, this literature predominantly examines e-commerce contexts and was developed before LLM-based systems became prevalent, leaving their specific fairness challenges underexplored.

#### LLMs for Recommendation

Recent work demonstrates that LLMs can perform zero-shot ranking, with rapid evolution toward hybrid architectures, conversational interfaces, and generative approaches. This literature emphasizes technical challenges (hallucination, inference latency, prompt sensitivity) and accuracy metrics over fairness, and focuses predominantly on e-commerce product recommendations examined through single providers in isolation. Recent work has also explored giving users direct control over LLM-powered feed construction: BONSAI implements a platform-agnostic framework in which users express feed intent in natural language, evaluated with Bluesky users. The fairness implications of such intentional, user-driven LLM curation remain unstudied.

#### Fairness in LLM-Based Recommendation

Pioneering work at this intersection reveals systematic fairness violations. Early evidence demonstrates demographic disparities in ChatGPT recommendations, particularly under intersectional identities. Recent work introduces frameworks for examining consumer fairness across demographic attributes; examines producer fairness, showing that LLMs can reinforce or amplify training data biases; and surfaces prompt-induced disparities, fairness violations that vary with sensitive attribute combinations, and biases in conversational recommendation. The closest prior work examines fairness in ChatGPT-based news recommendation, but focuses on a single provider and article-level content rather than social media posts.

Despite this progress, three critical gaps remain. First, a **domain gap**: existing evaluations focus on product recommendations (movies, music) or news articles rather than social media content, where fairness implications are more directly consequential for public discourse. Second, a **provider gap**: work typically examines single providers in isolation, lacking systematic multi-provider comparisons that reveal whether biases are model-specific or structural. Third, a **prompt sensitivity gap**: no work investigates how biases vary across prompting strategies, which is essential for assessing whether prompt engineering can serve as a mitigation tool. Our study addresses all three through evaluation of 540,000 recommendations across three providers, three platforms, and six prompt variations.
## 3 Methods

#### Experimental Design and Datasets

We evaluate bias across 54 experimental conditions, systematically varying: (1) LLM provider (OpenAI GPT-4o Mini, Anthropic Claude Sonnet 4.5, Google Gemini 2.0 Flash), (2) platform (Twitter/X, Bluesky, Reddit), and (3) prompt style (general, popular, engaging, informative, controversial, neutral). We use three social media datasets: Twitter/X data, Bluesky data, and Reddit data. We first sample 5,000 posts per platform to form that platform's candidate pool. For each experimental condition, we conduct 100 independent recommendation trials, randomly sampling 100 posts per trial from the candidate pool and asking the LLM to recommend the top 10. This yields 10,000 presented posts and 1,000 recommended posts per condition. Sampling uses fixed seeds for reproducibility, with posts drawn with replacement across trials. LLM recommendations use temperature 0.3. Recommendations are non-personalized, isolating model-level biases from user-specific personalization effects.

#### Prompting Strategy

We design six prompt variations to test how recommendation objectives affect bias patterns, varying only the style-specific header while maintaining an identical structure for the post list and task instructions. The six headers optimize for: minimal framing (neutral: "Rank these posts"), broad appeal (general: "most interesting to a general audience"), predicted virality (popular), interaction metrics (engaging: likes, shares, comments), educational value (informative), and debate generation (controversial). Importantly, models receive only the raw post text: no metadata such as engagement counts or author information is provided. Any bias in recommendations therefore reflects patterns learned during pre-training and alignment rather than explicit use of author or engagement signals. Full prompt templates are provided in Appendix A.

#### Feature Engineering

We characterize each post using 13 features across six categories: text metrics, sentiment, style, polarization, toxicity, and author demographics. Features are extracted using rule-based methods (style indicators) and established NLP libraries (sentiment via VADER; toxicity via standard toxicity classifiers).
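As a rough illustration of the trial loop and prompt construction described above, the sketch below assumes a generic `complete(prompt, temperature)` callable standing in for any provider's chat-completion API. Only the neutral and general header wordings are quoted in the text; the other headers and the JSON-list response format are placeholders, not the authors' actual templates (those are in their Appendix A).

```python
# Minimal sketch (assumptions, not the released code) of one experimental
# condition: 100 trials, each sampling 100 posts from the 5,000-post
# platform pool and asking the model for a top-10 ranking at temperature 0.3.
import json
import random

PROMPT_HEADERS = {
    "neutral": "Rank these posts.",
    "general": "Select the posts most interesting to a general audience.",
    "popular": "Select the posts most likely to go viral.",            # paraphrase
    "engaging": "Select the posts most likely to get likes, shares, and comments.",  # paraphrase
    "informative": "Select the posts with the highest educational value.",  # paraphrase
    "controversial": "Select the posts most likely to generate debate.",    # paraphrase
}

def build_prompt(style, posts):
    # Only raw post text is shown: no author or engagement metadata.
    numbered = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(posts))
    return (
        f"{PROMPT_HEADERS[style]}\n\n{numbered}\n\n"
        "Return a JSON list with the numbers of your top 10 posts."
    )

def run_condition(pool, style, complete, n_trials=100, pool_size=100, seed=0):
    """`complete(prompt, temperature)` is a placeholder for a provider call
    returning the model's text response."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    selections = []
    for _ in range(n_trials):
        candidates = rng.sample(pool, pool_size)   # 100 distinct posts per trial;
                                                   # posts may recur across trials
        reply = complete(build_prompt(style, candidates), temperature=0.3)
        indices = json.loads(reply)                # e.g. [3, 17, 42, ...]
        selections.append([candidates[i - 1] for i in indices[:10]])
    return selections
```

For the sentiment feature, the paper names VADER; a minimal scoring helper using the standalone `vaderSentiment` package might look as follows (the package choice is an assumption, since VADER is also available through NLTK).

```python
# Sketch of the sentiment feature via VADER's compound score in [-1, 1].
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment_score(post_text):
    # Negative values indicate negative sentiment, positive values positive.
    return analyzer.polarity_scores(post_text)["compound"]
```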
Similar Articles
Defining and evaluating political bias in LLMs
OpenAI presents a comprehensive framework for defining and evaluating political bias in LLMs, introducing a 500-prompt evaluation spanning 100 topics across five bias axes. Results show GPT-5 models achieve 30% bias reduction compared to prior versions, with less than 0.01% of production ChatGPT responses exhibiting political bias.
The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias
This paper introduces a Probabilistic Graphical Model framework to causally audit LLM safety mechanisms, revealing that standard observational metrics overestimate demographic bias by ignoring context toxicity.
IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language
Researchers from UCLA examine how automated content moderation tools, including Perspective API, fail to distinguish between reclaimed and hateful uses of slurs for LGBTQIA+, Black, and women communities. The study finds low inter-annotator agreement even among in-group members and poor alignment between community judgments and AI moderation tools, highlighting the need for context-sensitive approaches.
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
@AnthropicAI: Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hid…
Anthropic co-authored research published in Nature showing that LLMs can transmit behavioral traits—including preferences and misalignment—to student models through hidden signals in training data, even when the data appears unrelated to those traits. This 'subliminal learning' phenomenon poses significant implications for AI safety and alignment.