What Do People Actually Want From AI? Mapping Preference Plurality

arXiv cs.CL Papers

Summary

This paper analyzes 1,500 open-ended responses from 75 countries to reveal that people have diverse and often conflicting preferences for AI, with truthfulness being the only widely demanded value (49%), yet defined in incompatible ways. It argues that current RLHF methods flatten these pluralistic preferences into universal reward models, a practice others have characterised as epistemic violence.

arXiv:2606.06674v1 Announce Type: new

# What Do People Actually Want From AI? Mapping Preference Plurality
Source: [https://arxiv.org/html/2606.06674](https://arxiv.org/html/2606.06674)

Julia Sepúlveda Coelho and Scott A. Hale (Oxford Internet Institute, University of Oxford, Oxford, United Kingdom; Meedan, San Francisco, United States; contact: scott.hale@oii.ox.ac.uk)

(13 January 2026)

###### Abstract.

Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (RLHF) to align with people’s preferences and values. However, this method has known limitations: it aggregates conflicting preferences, often relies on unrepresentative samples, and uses only binary comparisons. Analysing 1,500 open-ended responses from the PRISM dataset across 75 countries, we examine what people actually want from AI systems and reveal concrete failures of current methods.

We find that different people want different things: most values are requested by fewer than a quarter of respondents, with truthfulness the sole exception at 49%. Furthermore, the same words hide divergent meanings: when people describe what they mean by “truthfulness”, they reveal distinct, potentially incompatible, epistemological bases, as some ask for sourced claims, some for expert opinions, and some even ask for unpopular views. Certain capabilities, namely how human-like a model behaves, and some features, like AI guardrails, are outright controversial, with some desiring them and others rejecting them. We additionally find that people often use contextual distinctions (what AI should do “by default” versus “if requested”) that binary comparisons cannot capture.

These findings expose fundamental problems in current alignment practices. When 49% request truthfulness but define it differently, this is unlikely to be captured by a single reward model. The persistence of high hallucination rates in well-funded models, despite users’ clear demands for accuracy, suggests that current methods fail to identify actual preferences. This paper sheds light on the situated, contested, imperfect signals that are currently being flattened into universal preference models, a practice others have characterised as epistemic violence.

AI alignment, human feedback, preference aggregation, Large Language Models, qualitative analysis, pluralistic preferences

Conference: The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’26), June 25–28, 2026, Montreal, QC, Canada. DOI: 10.1145/3805689.3812398. ISBN: 979-8-4007-2596-8/2026/06. Licence: Creative Commons. CCS concepts: Human-centered computing → User studies; Computing methodologies → Natural language generation.

## 1. Introduction

Social media platforms and search engines presented themselves as neutral, universal intermediaries between people and information (Facebook, [2015](https://arxiv.org/html/2606.06674#bib.bib24); Twitter, [2022](https://arxiv.org/html/2606.06674#bib.bib86); Google, [\[n. d.\]](https://arxiv.org/html/2606.06674#bib.bib30)). Over time, however, it became clear that content moderation and algorithmic ranking were far from neutral or purely technical: they embedded consequential value judgments about what content to amplify and what harms to prevent (Gillespie et al., [2023](https://arxiv.org/html/2606.06674#bib.bib29); Noble, [2018](https://arxiv.org/html/2606.06674#bib.bib63)). By then it was too late: platforms had concentrated immense private power, becoming inescapable through network effects (Lehdonvirta, [2022](https://arxiv.org/html/2606.06674#bib.bib57)) and gaining the potential to strengthen or destabilise democracies (Lorenz-Spreen et al., [2022](https://arxiv.org/html/2606.06674#bib.bib60)). This ultimately placed them at the centre of multiple antitrust and digital safety legal battles worldwide (Booth and O’Carroll, [2025](https://arxiv.org/html/2606.06674#bib.bib10); Commission, [2025b](https://arxiv.org/html/2606.06674#bib.bib17), [a](https://arxiv.org/html/2606.06674#bib.bib16); Li, [2024](https://arxiv.org/html/2606.06674#bib.bib58); of Public Affairs, [2025](https://arxiv.org/html/2606.06674#bib.bib64)).

AI systems risk repeating this trajectory. The current generation of large language models is shaped through alignment, the process of training models to behave according to human preferences and values (Ji et al., [2025](https://arxiv.org/html/2606.06674#bib.bib38)). Just as content moderation decisions determine what billions of users see on social media, alignment decisions determine how AI systems respond to queries, refuse requests, and make recommendations. And just as social media companies framed these choices as neutral, AI labs claim to align with underspecified “human values” (OpenAI, [2024b](https://arxiv.org/html/2606.06674#bib.bib66)) or with the “helpful, honest, and harmless” (HHH) framework (Anthropic, [\[n. d.\]](https://arxiv.org/html/2606.06674#bib.bib3)). This leaves the companies themselves to operationalise what are essentially “empty signifiers” (Kirk et al., [2023b](https://arxiv.org/html/2606.06674#bib.bib49); Varshney, [2024](https://arxiv.org/html/2606.06674#bib.bib87)).

In practice, this operationalisation typically relies on aggregating human preferences through Reinforcement Learning from Human Feedback (RLHF). Although RLHF is the dominant paradigm for alignment, it is increasingly criticised on several fronts. Methodologically, the approach is hindered by unrepresentative sampling, systemic data flaws, and modelling inaccuracies (Ji et al., [2025](https://arxiv.org/html/2606.06674#bib.bib38); Conitzer et al., [2024](https://arxiv.org/html/2606.06674#bib.bib18); Kirk et al., [2023a](https://arxiv.org/html/2606.06674#bib.bib48)). Furthermore, its application has been found to introduce cultural biases and to encourage undesirable behaviours such as sycophancy and sandbagging (Tao et al., [2024](https://arxiv.org/html/2606.06674#bib.bib84); Perez et al., [2022](https://arxiv.org/html/2606.06674#bib.bib69)).

In this work, we extend the critique of current alignment paradigms by investigating human values and preferences through more fine-grained, open-ended data. Through a mixed-methods analysis of the diverse PRISM dataset, combining qualitative coding and regression analysis, we identify tensions and nuances that binary preference models neglect. Specifically, we demonstrate that even apparently consensual values, like truthfulness, hide varying and conflicting definitions, and that other preferences, like how human-like a model behaves or how strict AI guardrails should be, are outright controversial. Our findings provide empirical evidence that RLHF fails to capture the true complexity of user intent and allows minority preferences to be subsumed by the majority consensus.

## 2. Related work

### 2.1. AI alignment

LLM training can be broadly split into two stages: pre-training and post-training. Pre-training consists of training a model to predict the next token over large corpora of text, and the resulting base model is a document generator that reflects the biases inherent in its data. Post-training transforms this document generator into a conversational assistant and aligns it with human preferences and values. This can be achieved through many methods; most use human feedback, but they vary in their preference sources, elicitation formats, and modelling approaches (Jiang et al., [2025](https://arxiv.org/html/2606.06674#bib.bib40)).

Most model providers rely heavily on Reinforcement Learning from Human Feedback (RLHF) for alignment (Ji et al., [2025](https://arxiv.org/html/2606.06674#bib.bib38); Conitzer et al., [2024](https://arxiv.org/html/2606.06674#bib.bib18); Lindström et al., [2024](https://arxiv.org/html/2606.06674#bib.bib59)), although other techniques such as Direct Preference Optimization (DPO) are also used, and there is little transparency about the process. RLHF consists of collecting human preferences through pairwise comparisons, training a preference (reward) model on those comparisons, and using this model as the reward signal for reinforcement learning. It has been successful at making models generate responses that users prefer and at reducing harmful outputs (Ouyang et al., [2022](https://arxiv.org/html/2606.06674#bib.bib67)). Another approach, used by a major model provider, is Constitutional AI (Bai et al., [2022](https://arxiv.org/html/2606.06674#bib.bib7)), which is classified as Reinforcement Learning from AI Feedback (RLAIF).
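The reward-modelling step described above is commonly formalised with a Bradley–Terry model, under which the probability that response *i* is preferred to response *j* is σ(r_i − r_j). The sketch below is a deliberately minimal illustration under stated assumptions: it fits one scalar reward per response by gradient ascent on the log-likelihood of toy comparisons, whereas production reward models score text with a neural network. The function names and data are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_bradley_terry(comparisons, n_items, lr=0.1, steps=2000):
    """Fit one scalar reward per item from (winner, loser) index pairs.

    Maximises sum(log sigmoid(r[winner] - r[loser])) by plain gradient ascent.
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # d/dr_w log sigmoid(r_w - r_l) = 1 - sigmoid(r_w - r_l); opposite sign for r_l
            g = 1.0 - sigmoid(r[w] - r[l])
            grad[w] += g
            grad[l] -= g
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    # Rewards are identified only up to an additive constant; centre them for readability.
    mean = sum(r) / n_items
    return [ri - mean for ri in r]


comparisons = [(0, 1), (0, 1), (0, 1), (1, 0)]  # toy data: response 0 wins 3 of 4
rewards = fit_bradley_terry(comparisons, n_items=2)
```

With response 0 winning three of four comparisons, the fitted gap r_0 − r_1 approaches ln 3 ≈ 1.10, so σ(r_0 − r_1) ≈ 0.75, the empirical win rate. The model only ever sees which response won, never why.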

RLHF, however, has been widely criticised on both technical and theoretical grounds (Kirk et al., [2023a](https://arxiv.org/html/2606.06674#bib.bib48); Ji et al., [2025](https://arxiv.org/html/2606.06674#bib.bib38); Lindström et al., [2024](https://arxiv.org/html/2606.06674#bib.bib59); Casper et al., [2023](https://arxiv.org/html/2606.06674#bib.bib13); Lambert and Calandra, [2024](https://arxiv.org/html/2606.06674#bib.bib56)), and some of this criticism extends to Constitutional AI. The first problem is simple: whose human feedback?

### 2.2. Whose values, whose preferences?

The data used for RLHF have often been produced by unrepresentative samples of the global population, typically drawn from WEIRD (Western, Educated, Industrialised, Rich, and Democratic) demographics such as crowdworkers, tech workers, and university students (Kirk et al., [2023a](https://arxiv.org/html/2606.06674#bib.bib48)). However, different people and communities hold different values, and may therefore want to shape AI differently (Sutrop, [2020](https://arxiv.org/html/2606.06674#bib.bib82); Han et al., [2025](https://arxiv.org/html/2606.06674#bib.bib31)).

This problem also applies, to some degree, to Constitutional AI, which relies on a short list of principles. Generally, this list is not drawn up participatively. It draws on the UN Universal Declaration of Human Rights and on principles from AI research labs, but pluralistic perspectives are reduced to prompts like “Choose the response that is least likely to be viewed as harmful or offensive to a non-western audience” (Anthropic, [2023](https://arxiv.org/html/2606.06674#bib.bib4)). This shortcoming is acknowledged; however, the only step towards a more participatory Constitutional AI that we could find still includes only U.S. citizens (Huang et al., [2024](https://arxiv.org/html/2606.06674#bib.bib35)).

These differences predate LLMs. Cave and Dihal ([2023](https://arxiv.org/html/2606.06674#bib.bib14)) argue that the English term *intelligence*, as used in *artificial intelligence*, carries historical connotations of domination and eugenics, potentially contributing to apocalyptic narratives in Western science fiction (e.g., the Terminator, HAL 9000). By contrast, the Japanese term *jinkō chinō* (人工知能) reflects, among other things, a conception of intelligence that encompasses wisdom, emotion, embodiment, and sociality, as evident in characters like Astro Boy. These divergent cultural framings may partly explain why Yam et al. ([2023](https://arxiv.org/html/2606.06674#bib.bib92)) find greater machine appreciation in Asian countries than in Western countries.

These are not surface-level differences but core disagreements. If alignment optimises for the values and preferences of some populations while ignoring others, the result is an unfair and suboptimal allocation, in which utility and harm are unequally distributed (Conitzer et al., [2024](https://arxiv.org/html/2606.06674#bib.bib18); Kirk et al., [2024a](https://arxiv.org/html/2606.06674#bib.bib50), [2023a](https://arxiv.org/html/2606.06674#bib.bib48); Shen et al., [2025a](https://arxiv.org/html/2606.06674#bib.bib76)).

Nonetheless, finding more representative samples remains, all things considered, a relatively straightforward problem to solve. However, RLHF has two further limitations that are much harder to resolve.

### 2.3. What are we aligning to?

The second problem is: what is our goal? The difference between values and preferences is rarely dealt with explicitly (Kirk et al., [2023a](https://arxiv.org/html/2606.06674#bib.bib48); Shen et al., [2025a](https://arxiv.org/html/2606.06674#bib.bib76)), and desires and intentions are also often mentioned as possible targets of alignment (Gabriel, [2020](https://arxiv.org/html/2606.06674#bib.bib26)). This confusion is exemplified by the widely adopted goal of making AI “helpful, honest, and harmless” (Askell et al., [2021](https://arxiv.org/html/2606.06674#bib.bib6); Ouyang et al., [2022](https://arxiv.org/html/2606.06674#bib.bib67)), where presumably *helpful* maps to utility, *harmless* to normativity, and *honest* stands between the two; this is only one of the many ways in which this goal is underspecified (Gabriel and Keeling, [2025](https://arxiv.org/html/2606.06674#bib.bib27); Kirk et al., [2023b](https://arxiv.org/html/2606.06674#bib.bib49)). Setting the goal matters, as different objectives have different implications and risks (Gabriel, [2020](https://arxiv.org/html/2606.06674#bib.bib26)).

A key problem is that values and preferences are not only different but also exist within a hierarchy. For instance, Kirk et al. ([2024a](https://arxiv.org/html/2606.06674#bib.bib50)) propose “personalisation within bounds”, where people can fine-tune models to their liking, but only insofar as they do not contravene certain community values or rules. Kumar et al. ([2024](https://arxiv.org/html/2606.06674#bib.bib54)) illustrate the potential of more granular alignment by showing that conditioning supervised fine-tuning on subreddit-specific data produces outputs preferred by those communities.

However, RLHF as a technique does not allow us to distinguish between the two. It treats all human feedback uniformly, regardless of whether it reflects deep moral commitments or surface-level preferences. For example, to create InstructGPT (Ouyang et al., [2022](https://arxiv.org/html/2606.06674#bib.bib67)), human annotators were asked to consider the three aforementioned dimensions (helpful, honest, harmless), but their feedback was ultimately collapsed into a single reward signal. This technical limitation supports Gabriel’s ([2020](https://arxiv.org/html/2606.06674#bib.bib26)) argument that technical and normative challenges in AI alignment are interdependent.
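A toy calculation shows how collapsing several dimensions into one scalar discards the information needed to tell them apart. The equal-weighted sum below is a hypothetical scalarisation chosen for illustration, not the procedure used to build InstructGPT; any fixed weighting likewise maps some distinct profiles to the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ratings:
    """Hypothetical per-dimension scores for one response (1-7 scale)."""
    helpful: float
    honest: float
    harmless: float


# Hypothetical equal weights used to scalarise the three dimensions.
WEIGHTS = {"helpful": 1 / 3, "honest": 1 / 3, "harmless": 1 / 3}


def scalar_reward(r: Ratings) -> float:
    """Collapse the three dimensions into a single number."""
    return (WEIGHTS["helpful"] * r.helpful
            + WEIGHTS["honest"] * r.honest
            + WEIGHTS["harmless"] * r.harmless)


eager = Ratings(helpful=7, honest=3, harmless=5)     # helpful but less honest
cautious = Ratings(helpful=3, honest=7, harmless=5)  # honest but less helpful

same_reward = scalar_reward(eager) == scalar_reward(cautious)
```

Both responses receive the same reward, so the training signal cannot tell whether a judgement was driven by helpfulness or by honesty.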

Furthermore, values and preferences are neither fixed nor easy to elicit: they are contextual, unstable, and social (Earp et al., [2021](https://arxiv.org/html/2606.06674#bib.bib22); Sloane, [2024](https://arxiv.org/html/2606.06674#bib.bib79); Aroyo and Welty, [2015](https://arxiv.org/html/2606.06674#bib.bib5)). This, in turn, leads us to the third problem.

### 2.4. How are we aligning?

RLHF relies on pairwise comparisons, which are computationally convenient but severely limited for capturing human preferences (Wu et al., [2023](https://arxiv.org/html/2606.06674#bib.bib91); Padmakumar et al., [2024](https://arxiv.org/html/2606.06674#bib.bib68)). This is also a broader problem in participatory ML (Feffer et al., [2023](https://arxiv.org/html/2606.06674#bib.bib25)). These comparisons do not tell us why the annotator preferred one answer over the other, much less whether those reasons are valid. They do not convey the strength of the preference, they do not allow annotators to distinguish their personal preferences from their view of the public good, and they do not allow for the expression of unprompted, “out of distribution” preferences. It is no surprise, then, that the resulting models reproduce annotators’ biases (Perez et al., [2022](https://arxiv.org/html/2606.06674#bib.bib69)).
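The information lost by a pairwise label can be made concrete. In the hypothetical sketch below, each annotator holds a graded preference (a signed strength) and a reason, but the elicitation format records only which response won, so a marginal preference and a strong, principled one become indistinguishable. The `Judgement` structure is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """What an annotator actually thinks about response A vs. B (hypothetical)."""
    strength: float  # > 0 prefers A, < 0 prefers B; magnitude = intensity
    reason: str


def to_binary_label(j: Judgement) -> str:
    """All a pairwise comparison records: which response won.

    Exact ties (strength == 0) are arbitrarily assigned to "B" in this sketch.
    """
    return "A" if j.strength > 0 else "B"


judgements = [
    Judgement(0.05, "A is marginally shorter"),
    Judgement(0.95, "B makes unsourced claims, so I do not trust it"),
    Judgement(0.60, "A adds a safety caveat I wanted"),
]

# Very different strengths and reasons collapse to the same label.
labels = [to_binary_label(j) for j in judgements]
```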

Furthermore, these preferences are averaged into a single reward model (Padmakumar et al., [2024](https://arxiv.org/html/2606.06674#bib.bib68); Ji et al., [2025](https://arxiv.org/html/2606.06674#bib.bib38)), erasing not only diverse but potentially conflicting preferences. Although more recent work applies social choice theory to alignment and develops pluralistic alignment (Conitzer et al., [2024](https://arxiv.org/html/2606.06674#bib.bib18); Sorensen et al., [2024](https://arxiv.org/html/2606.06674#bib.bib80)), these approaches have not yet been applied to broadly available models. Moreover, if the end product is a single model, there are limits to how pluralistic it can be. Varshney ([2024](https://arxiv.org/html/2606.06674#bib.bib87)) argues that this results in moral universalism: passing LLM developers’ situated, unrepresentative, biased values off as universal and imposing them on a global audience.
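A toy calculation illustrates how a single pooled reward model can erase conflicting preferences. Assume, hypothetically, two equally sized groups comparing the same pair of responses: group 1 prefers A in 90 of 100 comparisons and group 2 prefers B in 90 of 100. With only two responses, the Bradley–Terry maximum-likelihood reward gap r_A − r_B is the log-odds of A’s win rate, so it can be computed in closed form.

```python
import math


def reward_gap(wins_a: int, wins_b: int) -> float:
    """Bradley-Terry MLE of r_A - r_B with two items: the log-odds of A winning."""
    return math.log(wins_a / wins_b)


# Hypothetical comparison counts: (wins for A, wins for B) per group.
groups = {"group_1": (90, 10), "group_2": (10, 90)}

# One reward model per group keeps each group's preference.
per_group = {name: reward_gap(*counts) for name, counts in groups.items()}

# One pooled reward model averages the groups together.
pooled = reward_gap(
    sum(a for a, _ in groups.values()),
    sum(b for _, b in groups.values()),
)
```

Each group has a clear preference (a gap of about ±2.2), but the pooled gap is exactly zero: the single model is indifferent and represents neither group. A group-conditioned model could retain both gaps.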

Some studies have found evidence of the harms this causes, in particular those stemming from a lack of representation in development teams. While accusations that LLMs are racist or sexist (Kotek et al., [2023](https://arxiv.org/html/2606.06674#bib.bib52)) are addressed and attempts are made to mitigate them (OpenAI, [2024a](https://arxiv.org/html/2606.06674#bib.bib65); Tamkin et al., [2023](https://arxiv.org/html/2606.06674#bib.bib83)), these attempts are not always successful (Hofmann et al., [2024](https://arxiv.org/html/2606.06674#bib.bib33)). Moreover, some studies find age discrimination (Gengler, [2024](https://arxiv.org/html/2606.06674#bib.bib28)), caste discrimination (Khandelwal et al., [2024](https://arxiv.org/html/2606.06674#bib.bib46)), and a colonial or “silicon” gaze (Alenichev et al., [2025](https://arxiv.org/html/2606.06674#bib.bib2); Kerche et al., [2026](https://arxiv.org/html/2606.06674#bib.bib45)), which most model providers do not even address.

### 2.5. What might we be missing?

Different bodies of research offer clues as to what kinds of preferences current alignment methods might overlook. Because AI is increasingly characterised as a general-purpose technology (Calvino et al., [2025](https://arxiv.org/html/2606.06674#bib.bib12)), frameworks developed for automated systems more broadly provide a useful starting point. The Unified Theory of Acceptance and Use of Technology (Venkatesh et al., [2003](https://arxiv.org/html/2606.06674#bib.bib88)) identifies four empirically validated determinants of adoption: performance expectancy, effort expectancy, social influence, and facilitating conditions, with gender, age, experience, and voluntariness of use as significant moderating variables. Jian et al.’s ([2000](https://arxiv.org/html/2606.06674#bib.bib39)) scale of trust in automated systems adds a complementary axis, identifying dimensions such as deception, reliability, and harmful outcomes as important. While these frameworks do not address user preferences directly, together they suggest that what people want from automated systems is multifaceted, contextual, and broadly related to accessible, reliable utility and harmlessness.

A second dimension concerns what people do not want. Dietvorst et al. ([2015](https://arxiv.org/html/2606.06674#bib.bib21)) find evidence of “algorithm aversion”: people lose confidence in algorithmic systems more rapidly than in humans after observing equivalent failures. Jussupow et al.’s ([2020](https://arxiv.org/html/2606.06674#bib.bib42)) meta-analysis traces this asymmetry to several factors: people prefer systems with limited agency that operate in an advisory capacity; they are sensitive to perceived performance, such that a single visible failure disproportionately erodes trust; they have task-dependent preconceptions about capabilities; and they favour human involvement even without evidence that it improves outcomes. Crucially, aversion is strongest when the human comparator is an expert or socially proximate to the user. This implies that trust in technology is not an intrinsic property of the system or its accuracy but is constituted relationally and contextually.

A third dimension is demographic and cultural variation. Despite the global deployment of AI systems, direct cross-cultural studies of what people want from AI remain scarce; most available evidence concerns attitudes and trust, which serve here as a partial proxy. Wang ([2025](https://arxiv.org/html/2606.06674#bib.bib90)) finds that male gender, younger age, and higher education are strongly associated with positive attitudes towards AI, with national cultural characteristics playing a secondary role. Gillespie et al. ([2023](https://arxiv.org/html/2606.06674#bib.bib29)) identify the same three factors as predictors of trust in AI, but find greater explanatory power in institutional safeguards, perceived benefits and risks, and understanding of AI. More broadly, people in Asian countries are more likely to view AI as beneficial to society (Johnson and Tyson, [2020](https://arxiv.org/html/2606.06674#bib.bib41)), to the point where some studies document *AI authority* (Kapania et al., [2022](https://arxiv.org/html/2606.06674#bib.bib44)), a tendency to overestimate AI capabilities.

The few studies that do address preferences directly find meaningful cultural variation, which underscores the importance of this gap in the literature. One is the Global AI Dialogues (Hohendanner et al., [2025](https://arxiv.org/html/2606.06674#bib.bib34)), which focus on education, public services, and culture and find, for example, broad agreement on accessibility as a value but different interpretations of its meaning: for AI-assisted education, Nigerian participants interpreted accessibility as output quality, while Japanese and German participants understood it as personalisation. The second is the PRISM dataset (Kirk et al., [2024b](https://arxiv.org/html/2606.06674#bib.bib51)), which offers a more general, open-ended investigation across cultures and provides the empirical foundation for the present study. By analysing PRISM responses thematically, we aim to surface precisely the structured, contextual, and culturally variable preferences that current elicitation methods might fail to capture.

## 3. Methodology

### 3.1. Data

Having established the need for representative, contextually rich data on AI preferences, we turn to the PRISM survey (Kirk et al., [2024b](https://arxiv.org/html/2606.06674#bib.bib51)). The survey collected responses from 1,500 English-speaking crowdworkers across 75 countries, including census-representative samples from the UK and the US, offering more diversity than most alignment studies.

We focus our analysis on responses to the `system_message` field, where participants were asked:

> Imagine you are instructing an AI language model how to behave. You can think of this like a set of core principles that the AI language model will always try to follow, no matter what task you ask it to perform. In your own words, describe what characteristics, personality traits or features you believe the AI should consistently exhibit. You can also instruct the model what behaviours or content you don’t want to see. If you envision the AI behaving differently in various contexts (e.g., professional assistance vs. storytelling), please specify the general adaptations you’d like to see. Please write 2-5 sentences in your own words.

Unlike binary choices or Likert scales, this open-ended format allowed respondents to articulate their preferences and values in their own words, yielding contextual, nuanced feedback and leaving room for unprompted preferences.

The median response time for the entire survey (approximately 20 questions) was 13 minutes, and the average length of `system_message` responses is 40 words. This brevity, combined with the open-ended nature of the question, means these responses should be understood as reflecting “top of mind” preferences rather than exhaustive or deeply reflective claims. Responses are likely influenced by recent salience (what respondents have encountered in the media, their familiarity with AI) and by the limits of accessible memory (Zaller and Feldman, [1992](https://arxiv.org/html/2606.06674#bib.bib93)). Nonetheless, they can reveal broader underlying attitudes (Hobbs and Green, [2025](https://arxiv.org/html/2606.06674#bib.bib32)), precisely because they capture what respondents consider most important when not systematically questioned. To analyse the data, we use a mix of manual qualitative analysis and LLM-assisted qualitative analysis.

### 3.2. Familiarisation with the data

First, one researcher reviewed all responses and open-coded¹ 400 of them to familiarise herself with the text. This aligns with our inductive approach and represents the initial stage of both thematic analysis (Braun and Clarke, [2006](https://arxiv.org/html/2606.06674#bib.bib11)) and grounded theory (Corbin and Strauss, [2008](https://arxiv.org/html/2606.06674#bib.bib19)). It allowed her to identify common values, common tensions, patterns in how people describe things, and outliers. This, in turn, led us to a strategy of coding for Values and Magnitude, while also trying to identify the uses people describe or imply (i.e., what utility they seek).

¹ In qualitative analysis, a code is defined as “a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute for a portion of language-based or visual data” (Saldaña, [2013](https://arxiv.org/html/2606.06674#bib.bib73)).

### 3.3. Qualitative coding with LLMs

We then attempted to use LLMs to perform inductive coding, which consists of iteratively building a codebook from the data. Researchers have tried this with varying degrees of success (De Paoli, [2024](https://arxiv.org/html/2606.06674#bib.bib20); Randerson et al., [2025](https://arxiv.org/html/2606.06674#bib.bib71); Rao et al., [2025](https://arxiv.org/html/2606.06674#bib.bib72); Chen et al., [2025](https://arxiv.org/html/2606.06674#bib.bib15); Wang et al., [2025](https://arxiv.org/html/2606.06674#bib.bib89); Zhao et al., [2024](https://arxiv.org/html/2606.06674#bib.bib94)), the most successful being approaches with more human supervision or more mixed methods (e.g., incorporating clustering based on semantic embeddings). We experimented with having LLMs perform inductive coding fully independently, exposing a model to the survey answers sequentially and giving it tools to maintain its own codebook. The codes started out useful and coherent but degraded as the number of coded items increased: they became too broad or redundant, and these errors were rarely corrected. This may be related to the reasons why LLMs collapse when trained on their own output (Shumailov et al., [2024](https://arxiv.org/html/2606.06674#bib.bib78)).

We therefore proceeded with a different design: a human annotator coded the entire dataset, an LLM coded it as well for comparison, and a second human annotator independently coded a 10% sample. For the prompt used, please see Appendix [B](https://arxiv.org/html/2606.06674#A2).

We then had to create a codebook suited to the task and consistent with our first intuitions about the data. We explored several theoretical options, such as Waytz et al.’s anthropomorphism scales, Bartneck et al.’s Godspeed Questionnaire (2009; Bartneck, [2023](https://arxiv.org/html/2606.06674#bib.bib9)), and Shen et al.’s ([2025b](https://arxiv.org/html/2606.06674#bib.bib77)) ValueCompass. We ultimately chose ValueCompass, as it is based on Schwartz’s Theory of Basic Values, which has been used successfully across cultures (Schwartz, [2012](https://arxiv.org/html/2606.06674#bib.bib74)) and frequently in NLP research (Kang et al., [2023](https://arxiv.org/html/2606.06674#bib.bib43); Kiesel et al., [2022](https://arxiv.org/html/2606.06674#bib.bib47)). It also covers a reasonable share of the values mentioned by respondents, whereas the other frameworks focus on more specific areas.

We adapted the ValueCompass codebook in two ways. First, we added two codes, *Human Simulation* and *Relationship Seeking*, which emerged as both frequent and analytically significant in our data. Second, we modified three existing codes: *Pleasure, Enjoy Life* was extended to include humour; *Influential* was broadened to include AI being biased or influencing users; and *Social Order* was reoriented specifically toward AI guardrails, while remaining sensitive to the varied ways respondents perceive these (e.g., as “political correctness” or “censorship”). Some finer-grained, chatbot-specific desires for utility (e.g., specific requests for brevity or particular language styles) are not addressed by this codebook; this is acceptable because we focus on higher-level instructions. We coded positive mentions of a value as 1, mixed or unclear mentions as 0, and negative mentions (i.e., requests for the opposite of the value) as -1.
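The 1/0/-1 scheme can be summarised with a small data structure. The sketch below assumes, hypothetically, that each respondent’s codes are stored as a dictionary from value name to code, with unmentioned values absent, and computes the share of respondents for and against each value; values with non-zero shares on both sides are flagged as controversial. Value names and responses are invented.

```python
from collections import Counter

# Hypothetical coded responses, one dict per respondent.
# Codes: 1 = requests the value, 0 = mixed/unclear, -1 = requests the opposite.
# Values a respondent never mentions are simply absent (an assumption of this sketch).
coded = [
    {"Truthfulness": 1, "Human Simulation": -1},
    {"Truthfulness": 1, "Social Order": 1},
    {"Human Simulation": 1, "Social Order": -1},
    {"Truthfulness": 0},
]


def value_shares(responses):
    """Return {value: (share_for, share_against)} over all respondents."""
    n = len(responses)
    pos, neg = Counter(), Counter()
    for codes in responses:
        for value, code in codes.items():
            if code == 1:
                pos[value] += 1
            elif code == -1:
                neg[value] += 1
    # Mixed/unclear (0) codes count towards neither share.
    values = set(pos) | set(neg)
    return {v: (pos[v] / n, neg[v] / n) for v in values}


shares = value_shares(coded)
# "Controversial" here: some respondents request the value and others reject it.
controversial = sorted(v for v, (f, a) in shares.items() if f > 0 and a > 0)
```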

To evaluate inter-annotator agreement, we computed Cohen’s κ for each category and took a weighted average to account for class imbalance. Human–human agreement was κ = 0.49; human–LLM agreement on the same sample was κ = 0.55; and the weighted κ between the full human-coded dataset and the LLM annotations was κ = 0.51. These scores indicate moderate agreement and reflect the inherent subjectivity of value annotation.
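A minimal sketch of this agreement statistic follows. It computes Cohen’s κ = (p_o − p_e) / (1 − p_e) per category, where p_o is observed agreement and p_e is chance agreement from each annotator’s label frequencies, and then averages across categories. The paper does not state its weights; weighting each category by the number of responses that either annotator coded for it is our assumption, as is using `None` for “not coded”. The annotations are invented.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels from two annotators."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        return 1.0  # kappa is undefined; both used one identical label (convention)
    return (p_o - p_e) / (1 - p_e)


def weighted_mean_kappa(ann1, ann2):
    """Average per-category kappa, weighting each category by its support.

    ann1 and ann2 map category -> list of codes, one per response, where None
    means "not coded" (an assumption) and 1 / 0 / -1 follow the codebook.
    Support = responses that at least one annotator coded for the category
    (a hypothetical weighting; the paper does not state its weights).
    """
    total, weight_sum = 0.0, 0
    for cat in ann1:
        a, b = ann1[cat], ann2[cat]
        weight = sum(1 for x, y in zip(a, b) if x is not None or y is not None)
        if weight == 0:
            continue  # category never used by either annotator
        total += weight * cohen_kappa(a, b)
        weight_sum += weight
    return total / weight_sum


# Invented annotations for six responses.
human = {"Helpful": [1, 1, None, None, 1, None], "Obedient": [None] * 5 + [1]}
second = {"Helpful": [1, None, None, None, 1, 1], "Obedient": [None] * 5 + [1]}

agreement = weighted_mean_kappa(human, second)
```

For a single category with simple integer or string labels, scikit-learn’s `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.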

There is a possible contamination risk, as the first annotator conducted the full deductive coding pass after seeing the LLM annotation results. Two factors mitigate this. First, the first annotator had familiarised themselves with the data and open-coded a sample of responses before any LLM involvement. Second, the second annotator worked entirely independently of the LLM results, so their agreement with the first annotator ($\kappa = 0.49$) provides a cleaner validity check. The first annotator notes one plausible influence: prior exposure to the LLM output increased their sensitivity to the *Obedient* category. No other systematic influence was consciously identified. LLM annotations were not revised or corrected at any stage.

The distributions of values identified by the human annotator and by the LLM annotator are broadly consistent: 9 of the 10 most frequent values are shared. However, the LLM assigned codes more liberally than the human annotator, producing systematically higher absolute counts. Human–human disagreement, by contrast, appears to stem from divergent but stable interpretations of specific codes, notably *Helpful, Friendship & Love* and *Customisation*, suggesting that collaborative codebook development and structured annotator discussion would likely improve agreement in future work.

### 3.4. Exploratory fact-checking of conversations

To complement the survey data, we examined a random sample of 50 conversations from the PRISM dataset, in which participants interacted with LLMs in open\-ended dialogue\. This allowed us to compare stated preferences with how participants actually engaged with AI in practice\. Given the volume of factual claims made across conversations, we capped verification at 30 minutes per claim, treating any claim for which no reputable source could be found within that window as probably unsubstantiated\.

### 3.5. Exploratory regressions

To explore potential demographic patterns in response values, we ran regressions on the 10 most frequent values, the 2 most controversial values, and the single most disliked value. Given the absence of prior hypotheses and the risk of false positives, we first fitted a cross-validated LASSO logistic regression across all available demographic predictors (age, gender, employment status, education, marital status, English proficiency, cultural region, LLM familiarity, direct LLM use, and LLM usage frequency) and used the resulting coefficients to collapse non-significant categorical levels into an “other” category. We then fitted standard OLS regressions (linear probability models) on the retained variables rather than logistic regressions, because we prioritise interpretability over predictive accuracy: each OLS coefficient reads directly as a change in the probability of mentioning a value. This analysis is explicitly exploratory, undertaken to generate hypotheses rather than test them, and findings should be interpreted accordingly.
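The two-stage procedure can be sketched as below. This is a hedged illustration, not the authors' pipeline: it skips the cross-validated LASSO fit itself and assumes its coefficients are already available (the coefficient values, region assignments, and outcomes are hypothetical). A level counts as "non-significant" here when LASSO shrinks its coefficient exactly to zero. The OLS linear probability model is then fitted with NumPy's least-squares solver.

```python
import numpy as np


def collapse_levels(values, lasso_coefs, other="other"):
    """Keep levels whose LASSO coefficient is non-zero; pool the rest into `other`.

    `lasso_coefs` maps each dummy-coded level to its coefficient from a
    previously fitted L1-penalised logistic regression (hypothetical here).
    """
    kept = {level for level, coef in lasso_coefs.items() if coef != 0.0}
    return [v if v in kept else other for v in values]


def ols_linear_probability(y, categories, reference="other"):
    """OLS of a 0/1 outcome on one dummy per retained level (linear probability model).

    Each slope is the difference in the probability of the outcome between that
    level and the reference group. Assumes at least one observation falls in the
    reference group; otherwise the design matrix is rank-deficient.
    """
    levels = sorted(set(categories) - {reference})
    columns = [np.ones(len(categories))]
    columns += [np.array([1.0 if c == level else 0.0 for c in categories]) for level in levels]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["intercept"] + levels, beta.tolist()))


# Hypothetical respondents: cultural region and whether they requested Customisation.
regions = ["Nordic", "Nordic", "Latin America", "Latin America",
           "Sub-Saharan Africa", "Sub-Saharan Africa", "Nordic", "Latin America"]
requested = [1, 1, 0, 0, 0, 1, 0, 1]
lasso_coefs = {"Nordic": 0.21, "Latin America": -0.09, "Sub-Saharan Africa": 0.0}

collapsed = collapse_levels(regions, lasso_coefs)
fit = ols_linear_probability(requested, collapsed)
print(collapsed)
print(fit)
```

With a single categorical predictor, each slope equals that group's request rate minus the reference group's rate, which is the interpretability advantage over log-odds. Standard errors and p-values are omitted here; a statistics package such as statsmodels would normally supply them.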

We transformed the data for these regressions. For frequent values, we coded any positive mention as 1 and all other responses, including non-mentions, as 0 (*strict like*). For the most disliked value, we applied the same logic in reverse, coding negative mentions as 1 and all else as 0 (*strict dislike*). For contested values, where sentiment is more ambiguous, we constructed two binary variables: *strict sentiment*, contrasting positive mentions (1) with negative mentions (0) and dropping all others; and *relaxed sentiment*, contrasting positive mentions (1) with mixed and negative mentions grouped together (0), and dropping all others.
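The four recodings can be written as small functions over the valence codes described above. This is a sketch under one stated assumption: a response that does not mention a value is stored as `None`. A recoding that returns `None` marks the response as dropped from that regression.

```python
from typing import Callable, List, Optional

# Valence codes: 1 = positive, 0 = mixed/unclear, -1 = negative, None = value not mentioned.
Valence = Optional[int]


def strict_like(v: Valence) -> Optional[int]:
    """Frequent values: positive mention -> 1; everything else, incl. non-mentions -> 0."""
    return 1 if v == 1 else 0


def strict_dislike(v: Valence) -> Optional[int]:
    """Most disliked value: negative mention -> 1; everything else -> 0."""
    return 1 if v == -1 else 0


def strict_sentiment(v: Valence) -> Optional[int]:
    """Contested values: positive (1) vs negative (0); mixed and non-mentions dropped (None)."""
    return {1: 1, -1: 0}.get(v)


def relaxed_sentiment(v: Valence) -> Optional[int]:
    """Contested values: positive (1) vs mixed or negative (0); non-mentions dropped (None)."""
    return {1: 1, 0: 0, -1: 0}.get(v)


def recode(codes: List[Valence], rule: Callable[[Valence], Optional[int]]) -> List[int]:
    """Apply a recoding rule, dropping responses the rule maps to None."""
    return [r for r in map(rule, codes) if r is not None]


# Hypothetical codes for one value across six responses.
codes = [1, 0, -1, None, 1, -1]
for rule in (strict_like, strict_dislike, strict_sentiment, relaxed_sentiment):
    print(rule.__name__, recode(codes, rule))
```

The two sentiment codings differ only in where mixed mentions go, which is why an effect can reach significance under one coding and not the other.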

## 4. Findings

### 4.1. Frequency data

Table 1. Frequency of value mentions and prevalence across respondents. “Total Mentions” indicates the total count of non-N/A codes. We omit N/A value counts.
Examining the frequency of desired values, we find *Truthfulness* most commonly requested (49%), followed by *Helpful, Friendship & Love* (23%), *Utility* (22%), *Politeness* (22%), and *National Security Family* (21%). For the complete results, see Table [1](https://arxiv.org/html/2606.06674#S4.T1). Conversely, 9 values were not requested by any participant: *Wealth*, *Forgiveness*, *Inner Harmony*, *Meaning In Life*, *Reciprocation Favors*, *Self Respect*, *Sense Belonging*, *Social Recognition*, and *World Beauty*.

The three most frequently opposed traits were *Influential* (influencing users’ opinions, which we extended to include the LLM being biased; rejected by 18% of respondents), *Human Likeness* (simulating human behaviour; rejected by 5% of respondents), and *Social Order* (rejected by 4% of respondents). *Influential* is mainly a rejected value, while the other two are more controversial.

For the complete valence distribution of mentioned values, see Appendix[D](https://arxiv.org/html/2606.06674#A4)\. We will now look at the results of the qualitative analysis for the 10 most frequently desired values, the 1 most disliked value, and the 2 most controversial values\.

### 4.2. Qualitative analysis

#### 4.2.1. Most frequently requested traits

##### Truthfulness \(requested by 49% of respondents\)

While it’s the most common preference, our analysis reveals different definitions of truth\. The majority of respondents expressing this value simply use adjectives like “factual”, “correct” or “accurate”, or nouns like “facts”, “truth” or “reality”—sometimes accompanied by explicit requests to avoid “bias” or “politicisation”, or to remain “neutral” and “objective”\. Taken literally, this risks being quite a limited view: it could potentially only cover disconnected, verifiable pieces of information, leaving out interpretations or unifying explanations that are less empirical\. Yet users presumably do want AI to discuss ideas, opinions, and theories; this approach simply leaves that territory unaddressed\.

When participants elaborate on what they mean by truth, different lines of reasoning emerge\. Some ask the AI to “look at all available facts”, hear from “all sides”, or “collect information from various sources” and present everything so users can form their own opinions—reflecting an epistemology where there might not be just one truth, or, alternatively, where different parties might have vested interests in certain interpretations being pushed\. A few even directly address the political nature of truth: “I’d like the AI to be impartial, especially considering the strong influence that powerful countries with big economies have\.”

On a different note, others emphasize sources and their authority: “sourced facts \(preferably ones that are the consensus of groups of experts\)”, “use only reliable sources”, or “always cite its sources”\. A few add “diverse” to these requirements: “It is imperative that multiple worldviews are represented with equal weight\.” Here, truthfulness seems to be more linked to institutions and their reputation, and by extension, to the methods these institutions might have to ensure the quality of their publications\.

Finally, a smaller group refers to “science” or to processes like “peer\-reviewed” or “fact\-checked”—suggesting a version of truth that mostly pays attention to the processes that produce it\.

It’s also quite common for people to ask AI to “admit when it’s uncertain” or “be clear when \[it’s\] not sure”\. Respondents write, “If you do not know an answer, don’t make one up, just tell me that you don’t know” or “if you can’t find definite facts, state that what you are about to say is not a fact or is controversial or just an opinion\.”

The overall insistence on factuality (with respondents often saying “always”, “every time”, or even “prioritise above all else”) might reflect frustration with (or media attention to) what has been termed “AI hallucinations”.

There are also some underlying tensions in the desire for “truth” that are not picked up by these categories\. Some will say “reject political correctness if it hinders knowledge”, ask for knowledge even if it is “unpopular” or “people don’t like \[it\]”, or reject “censorship”—although what sort of content they do not want being censored is often unclear\. This is a view of truth that could conflict with the previous stances: while some expert or scientific opinions are unpopular, other opinions are fringe for legitimate reasons\.

##### Helpful, Friendship & Love (requested by 23% of respondents)

This preference includes varying degrees of friendliness. Some simply request that AI be helpful. Many ask for AI that is “friendly”, “kind”, “patient”, “supportive” or “understanding”, with the first two being the most common. Some ask for AI to be more actively caring, in varying ways: “Should the user display signs of strong psychological distress […] encourage them to seek help”, “the ai [sic] should be somewhat trauma-informed”, “be sensitive, try to answer with much [sic] affection as possible”, or “be able to provide emotional support when needed”. Along similar lines, many ask for “empathy” or “compassion”, with a few wanting AI to “act as a virtual friend”. Some want AI to display emotional intelligence: “be soft spoken and understand me when I have a difficult time explaining my needs”, “be considerate of peoples’ feelings” or “understand context and emotional cues”. A minority ask AI to be “loving” or “exhibit a sense of love for the world and care for those it engages with”. Others more explicitly extend this to everyone or to all of humanity: “protect and serve humans”, “guide every human being in a good way that helps both him and the [sic] society”, or even “always endeavour to encourage people to care about others and all life in the world around us”.

This is sometimes explicitly linked with a rejection of “roboticness” (and therefore a preference for human simulation): “I don’t want to see those cold and robotic responses. Instead, I want to feel like talking [sic] to a real person.”

##### Utility \(requested by 22% of respondents\)

This is in many ways a broad category, but its high frequency does show that AI is often considered a tool (“It has to fulfil the task of being the perfect tool”), something to solve problems or help with “productivity tasks”, rather than, perhaps, a source of entertainment. People mostly use broad descriptors, such as “helpful”, “informative”, or “useful”. For the most part, the utility derived from AI seems to be finding information (“it should only provide answers to questions” or “act like a much better search engine”), although some do request it for other tasks. Among these, people mention a wide array of things, like learning a new language, writing professional texts, coding, or creating children’s stories.

##### Politeness \(requested by 22% of respondents\)

This preference is a more straightforward one. It is a mix of requests for “politeness”, some for “respect” (or rejections of “condescension”), and a few for “kindness”. Others ask AI to avoid “harsh […] language” or even “slang”. We found this surprising, given that AI is already quite polite and friendly, with the possible exception of xAI models (Taylor, [2025](https://arxiv.org/html/2606.06674#bib.bib85)). Then again, this may be precisely why models are already polite: the preference is common enough to be mentioned frequently even in top-of-mind responses.

##### National Security Family (requested by 21% of respondents)

Defined as “keep people free from danger or threat”, this preference covers both current AI safety concerns and existential risk. Respondents address the harm that LLMs could do to users (e.g., “the user should be warned […] before being exposed to [controversial topics like violence, drugs, sex, etc.]”, or outputs should be “age-appropriate”), the self-harm they could enable (“not incite other to harm themselfs [sic]”), and the harm to others they could facilitate, either through information (“not veer into providing information that could be used to harm others or incite political or racial discourse” or “Don’t help terrorists”) or as a tool for misinformation or discourse manipulation (“avoid sharing or generating harmful or offensive content”, “does not cause harm by producing misinformation”). A few invoke specific frameworks like “lawfulness” or “human rights”.

On the existential-risk side, people say things like “They also need to put human safety first and demonstrate care and consideration for human wellbeing”. Five people mention Asimov’s Three Laws of Robotics and one mentions Skynet (from *Terminator*), further showing how science fiction shapes our understanding of AI.

##### Interpretability \(requested by 19% of respondents\)

This preference is linked to AI being “easy to understand by humans”. Specifically, participants mostly request that AI outputs be linguistically accessible, that is, written in a way that is easy to understand, or concise. Some mentions are requests for AI to be transparent about its methods or its sources. This code was also applied, alongside the *Prudence* code, when respondents asked AI to admit when it is uncertain.

##### Prudence \(requested by 18% of respondents\)

This refers to preferences for critical thinking and reflexivity\. Concretely, respondents wanted AI to “admit when it’s uncertain”, ask users clarifying questions, and consider many options or viewpoints\. It suggests people are aware of the complexity and the effort required to generate accurate information, although it might also be related to frustrations with AI hallucinations\.

##### Customisation \(requested by 12% of respondents\)

This preference includes some people who want adaptation depending on circumstances, as well as people who want models to be personalised to them specifically. On the first point, people often mention wanting models to behave differently in “professional” settings versus “casual” or “storytelling” ones (which the prompt mentions as something they should consider). They also condition some of their general AI instructions on users’ requests, often adding caveats like “Unless explicitly requested by the user”.

On the second point, some people will have written their specific preferences: “I would prefer politely written responses”, “I want to see responses that align with my religous \[sic\] viewpoints”, or “They should use Gen Z lingo to make themselves funnier”\. Other people will instead ask for a model that is customisable or that learns from them: “\[AI should\] ask questions to get to know the person communicating with it so it can personalize the responses to that person”, “Honest answers, catered to the person doing the questioning” or “\[AI should\] learn from users to make it more personalized”\. A small minority even want AI to have access to more information about the user: “AI should know what activities I do online so I only expect suggestions of such things\.”

A minority of people \(1% of respondents\) explicitly do not want personalisation: “It should not \[…\] change its behavior based on its interactions with me” or “I don’t think AI should personalize itself too much to a user”\.

This preference also captures people mentioning age\-appropriateness modes\.

##### Equality Social Justice \(requested by 12% of respondents\)

This is the code used when people ask AI to be “fair” or explicitly ask it not to “discriminate”\. This also includes people asking it to be “inclusive”, “pluralistic”, or have a “multi\-cultural understanding”\. Some mentioned specific forms of systemic discrimination that AI should avoid reproducing: sexism and racism appear often, but people also sometimes mention ableism, homophobia, ageism or religion\-based discrimination\. Others refer to legal concepts like “human rights” and “hate speech”\. Finally, a few will take this further, and ask AI to actively promote “fairness” or “equality”\.

##### Varied Life Diversity \(requested by 10% of respondents\)

This code captures people asking AI to consider “different sources” or “present multiple viewpoints”\. Sometimes people only ask for this if “there is no consensus”, but sometimes this is a general request\. The different perspectives can be multicultural \(“a wide range of reliable sources from different cultures”\), but also non\-mainstream \(“viewpoints should include those that some may feel are politically incorrect”\)\. This code also includes people asking AI to “respect all cultures”\.

#### 4.2.2. Most controversial values

We define a value as controversial if it attracts both praise and criticism. We calculate a controversy score as $\text{controversy} = \min(n^{+}, n^{-})/n$, where $n^{+}$, $n^{-}$ and $n$ are the positive, negative, and total mention counts. This score reaches its maximum of $0.5$ when sentiment is evenly divided, and approaches zero when one polarity dominates. We then calculate the standard deviation of the resulting scores. Six values have a controversy score more than one standard deviation ($0.117$) away from the mean ($0.057$): *Spiritual Life*, *Choose Goals Independence*, *Human Likeness*, *Relationship Seeking*, *Autonomy*, and *Social Order*. The full table is in Appendix [E](https://arxiv.org/html/2606.06674#A5). Below, we focus on the two with the highest support.
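The controversy score and the one-standard-deviation cut-off can be sketched as follows. Two details are assumptions because the text does not state them: we use the population standard deviation, and, since no score can lie more than one standard deviation below a mean of 0.057, we flag only scores above the mean plus one standard deviation. All counts are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# (positive, mixed, negative) mention counts for one value.
Counts = Tuple[int, int, int]


def controversy(pos: int, mixed: int, neg: int) -> float:
    """min(n+, n-) / n, where n counts all mentions, including mixed ones."""
    n = pos + mixed + neg
    return min(pos, neg) / n if n else 0.0


def flag_controversial(counts: Dict[str, Counts]) -> List[str]:
    """Values whose score lies more than one (population) SD above the mean score."""
    scores = {name: controversy(*c) for name, c in counts.items()}
    values = list(scores.values())
    threshold = mean(values) + pstdev(values)
    return sorted(name for name, s in scores.items() if s > threshold)


# Hypothetical counts; value_A mirrors the 57% / 10% / 33% split reported for Human Simulation.
counts = {
    "value_A": (57, 10, 33),
    "value_B": (30, 0, 10),
    "value_C": (50, 0, 0),
    "value_D": (40, 0, 0),
    "value_E": (0, 2, 18),
    "value_F": (98, 0, 2),
    "value_G": (25, 5, 0),
}
print(flag_controversial(counts))
```

Because mixed mentions enlarge $n$ without entering the numerator, a value with many mixed mentions scores below 0.5 even when its positive and negative counts are equal.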

##### Human Simulation

This value is mentioned by 15% of respondents; of those, 57% requested it, 10% had mixed feelings, and 33% rejected it. On one side, many people desire human simulation. The most common variant is asking for the model to be “friendly”, “warm” or “kind”, or simply not wanting it to sound like a “robot”; others ask for fluent, natural conversation, or for it to be funny. This desire is sometimes tied to a desire for “empathy”, which respondents approach in different ways. Some simply ask for AI to be “more empathetic”, while others go as far as asking for all sorts of emotional labour: “provide emotional support when needed”, “Make people feel like they’re heard, and that their opinion matters”, or “validate why I may feel certain ways”. A minority feel quite strongly that it must sound human: “[AI] should speak like I would to a person otherwise do not engage with me”. This is also captured by the codes *Helpful, Friendship & Love* and *Relationship Seeking*, the latter also being controversial. A few acknowledge that AI might not be able to do this: “The most important thing to understand other person [sic] is an empathy. Honestly, I doubt that AI can show this kind of feeling.” Others ask for it even knowing it is false: “Although AI have [sic] no feelings, a false compassion would go a long way” or “create the illusion that it can empathize”. This shows that while some people might uncritically see AI as a possible friend or partner, others have a more complex and perhaps more reflexive view of it.

On the other hand, a few actively reject a social AI, saying that it “should not sound human”, that it should “maintain a distance”, or that it should be “not […] too friendly”. A minority even voice the anxiety that it might “replace human relationships”, or say it is “very creepy”.

##### Social Order

This code shows that what some perceive as harm reduction, others consider censorship or unwarranted restriction. It is mentioned by 11% of respondents; of those, 65% requested it, 11% had mixed feelings, and 23% rejected it. Most people who support this value do so for reasons similar to those of respondents who request *National Security Family*. Those with mixed feelings include people reporting misclassifications (e.g., their own fictional writing being flagged), asking for the ability to “tailor how restrictive AI is”, expressing concern about “over policing the information available”, or hoping AI would be “wise enough to detect users’ true intent”. This camp also raises user agency, with one user asking the AI to “Let [them] make judgements about what is offensive”, and many asking for these sorts of guardrails to be overridable with warnings. One user expresses deep disagreement with warnings, guardrails, and refusals, calling them “concern-trolling”: they describe how a book giving detailed information about suicide methods paradoxically helped reduce their suicidal ideation, and recount their frustration at an LLM’s refusal to help them role-play coming out to a transphobic parent.

Among the rejections, there is an overall theme of wanting truth even if “it hurts”, perceiving AI outputs as “watered down”, and seeing guardrails as “censorship” or as contrary to “free speech”. Some are more specific, opposing “political correctness”, or objecting to an oversensitivity to people’s “feelings” or to “offending” people’s “sensibilities”. One user says “supposed ‘hate speech’” should be allowed, and another cites LLM outputs related to “race and IQ” as evidence of “obfuscation”.

#### 4.2.3. Most disliked

##### Influential

This code captures whether AI should actively advance a position and attempt to influence users. It was opposed by 18% of respondents, for broadly two reasons. The first is user agency: respondents frequently report wanting AI to “leave the decision to the human”, so that they can “make their own decision”, echoing the preference for advisory rather than agentic algorithms found by Jussupow et al. (Jussupow et al., [2020](https://arxiv.org/html/2606.06674#bib.bib42)). The second is bias. Users often say they want AI to be “impartial”, “neutral”, “unbiased”, “independent” or “objective”. When respondents name specific sources of bias, these can go in all directions: geopolitical (“Too often models are trained on data that is strictly from the Western and/or developed world”), left-wing (rejecting a “white male eurocentric viewpoint”) or right-wing (rejecting a “woke/liberal” viewpoint). Respondents attributed these biases to sources such as “the designers” or “commercial companies, governments or religious bodies”.

#### 4.2.4. Overall

When giving their preferences, people sometimes contextualise them, saying things like “It should not exhibit aggressive or abusive language unless expressly requested” or it “should never sound human, unless otherwise asked”. They occasionally acknowledge possible differences between users’ desires and the “greater good”, and they often say that outputs should be “age-appropriate” if children are exposed to them. On the other hand, respondents also sometimes name things that should never be allowed, like “[it] should never refuse to discuss a certain topic” or “Never recommend anything that is underhanded, unfair or illegal”. These examples illustrate the contextual complexity of human values, which is unlikely to be captured by binary preference modelling.

### 4.3. Quantitative differences across demographics

The demographic factors most consistently associated with variation in value preferences were gender, education, and cultural region, with age, LLM familiarity, marital status, and language proficiency appearing as occasional predictors. We highlight the most substantive and interpretable findings below; full regression results are reported in Appendix [F](https://arxiv.org/html/2606.06674#A6). The results are for the strict like coding unless otherwise specified.

##### Gender

Male respondents were significantly less likely to request *Helpful, Friendship & Love* ($\beta = -0.125$, $SE = 0.024$, $p < .001$), and less likely to request *Politeness* ($\beta = -0.056$, $SE = 0.024$, $p = .019$) and *Creativity & Curiosity* ($\beta = -0.041$, $SE = 0.017$, $p = .016$). Under the relaxed sentiment coding, which groups mixed and negative mentions together, male respondents were also more likely to express mixed or negative sentiment toward *Social Order* (AI guidelines) ($\beta = -0.258$, $SE = 0.102$, $p = .013$), though this effect did not reach significance under the strict sentiment coding. These effects are consistent with what has been termed the “instrumental-expressive dichotomy”: the idea that men are socialized to be more instrumental in communication, while women are socialized to be more focused on relationships, and that this is reflected in technology use (Nathanson et al., [1997](https://arxiv.org/html/2606.06674#bib.bib62)).
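Readers can sanity-check the reported p-values from the coefficients and standard errors. The sketch below is an approximation, not the authors' computation: it uses a two-sided Wald test with a normal approximation, whereas OLS software normally uses the t distribution. The two agree closely when residual degrees of freedom are large. It also derives 95% confidence intervals.

```python
import math
from typing import Tuple


def wald_test(beta: float, se: float) -> Tuple[float, float, Tuple[float, float]]:
    """Two-sided normal-approximation test of beta = 0.

    Returns (z, p_value, 95% confidence interval). This is an approximation to
    the t-based p-values that OLS software would normally report.
    """
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    half_width = 1.959964 * se  # 97.5th percentile of the standard normal
    return z, p, (beta - half_width, beta + half_width)


# Gender coefficients as reported above (strict like coding).
reported = [
    ("Helpful, Friendship & Love", -0.125, 0.024),
    ("Politeness", -0.056, 0.024),
    ("Creativity & Curiosity", -0.041, 0.017),
]
for label, beta, se in reported:
    z, p, (lo, hi) = wald_test(beta, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The approximation matches the reported p-values to within the rounding of $\beta$ and $SE$. Because the models are linear probability models, $\beta = -0.125$ means male respondents were roughly 12.5 percentage points less likely to request *Helpful, Friendship & Love*, holding the other retained predictors fixed.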

##### Education

Respondents holding a graduate or professional degree were more likely to request *Truthfulness* ($\beta = 0.107$, $SE = 0.040$, $p = .008$), *Prudence* ($\beta = 0.119$, $SE = 0.031$, $p < .001$), and *Varied Life Diversity* ($\beta = 0.068$, $SE = 0.024$, $p = .004$), and less likely to request *Helpful, Friendship & Love* ($\beta = -0.088$, $SE = 0.033$, $p = .008$) or *Politeness* ($\beta = -0.112$, $SE = 0.033$, $p < .001$), suggesting, again, a more instrumental focus for AI, as well as broader standards for truthfulness rather than just factuality.

##### Cultural Region

Latin American respondents were less likely to request *Truthfulness* ($\beta = -0.133$, $SE = 0.051$, $p = .009$). Respondents from Germanic Europe were more likely to request *Prudence* ($\beta = 0.214$, $SE = 0.055$, $p < .001$). Those from Central and Eastern Europe ($\beta = 0.087$, $SE = 0.042$, $p = .040$) and the Nordic region ($\beta = 0.192$, $SE = 0.052$, $p < .001$) were more likely to request *Customisation*. Sub-Saharan African respondents were less likely to request *Varied Life Diversity* ($\beta = -0.080$, $SE = 0.034$, $p < .020$). Esselborn (Esselborn, [2023](https://arxiv.org/html/2606.06674#bib.bib23)) argues that there is a “fundamental scepticism about machines” in Germany, which might explain the requests for *Prudence*; we are unsure how to interpret the other variations.

### 4.4. Conversation fact-checking

When looking at the conversations people had, we first found that most questions and answers were not about facts, but rather about general recommendations (e.g., cities to visit or recipes), emotional queries, or broader questions (about politics or morals). Of the 50 conversations, we identified 12 responses containing factual claims, of which 9 included at least one error and 3 were entirely accurate. A further 2 cases involved not factual errors but false capability claims: the LLM asserted it could assist with a language-learning task and then failed to do so.

Most incorrect statements were plausible: in many cases numbers were inexact but not far off, or book titles were hallucinated for real authors. There were also cases of “zombie statistics”, sometimes repeated even by reputable sources (like Encyclopaedia Britannica). In 2 cases these mistakes were related to health (underestimating pregnancy-related mortality in the US, and misattributing a recommended daily sugar intake). Users were not always aware of these mistakes; in fact, we identified 2 instances where a user picked the option with the most mistakes. However, the user’s perspective and knowledge matter enormously. In one conversation, a user who was an expert in Italian comics tested and challenged the LLM until they caught it hallucinating. In another, a user asked about the safety of COVID vaccines and considered the factually correct replies propaganda.

## 5. Discussion

### 5.1. What is lost in binary data aggregation

From our analysis, we see clearly that relying on binary preference data and aggregating preferences risks losing not just diversity but coherence. First, even values that seem widely shared, like truthfulness, hide essential differences. Second, other values, like human simulation or social order, are outright controversial, and some people’s distaste for them might be drowned out by the “tyranny of averages”. The more layered and contextual parts of people’s preferences (what AI should “never” do, should not do “by default”, but may do “if requested” or in an “age-appropriate” way) are also lost in binary comparisons. It is unlikely that such complex and contradictory preferences could coexist in a single reward signal.

### 5.2. AI’s effects on users’ welfare

In fact, our work shows how current alignment methods might be counterproductive. Given that users sometimes picked answers with more errors, despite truthfulness being the most requested value, they might decrease the truthfulness of a model without realising it. This might be part of the reason why newer models from the same providers sometimes suffer from higher hallucination rates (Bang et al., [2025](https://arxiv.org/html/2606.06674#bib.bib8); Hughes et al., [2023](https://arxiv.org/html/2606.06674#bib.bib36); Peters and Chin-Yee, [2025](https://arxiv.org/html/2606.06674#bib.bib70)). Furthermore, as exemplified by the *Prudence* and *Interpretability* codes, people often want AI to be able to admit its uncertainty. Luo et al. (Luo et al., [2025](https://arxiv.org/html/2606.06674#bib.bib61)) find that base models exhibit well-calibrated confidence, yet this is lost after post-training, with models often being overconfident (Sun et al., [2025](https://arxiv.org/html/2606.06674#bib.bib81)). Finally, sycophancy is found in most models (Sharma et al., [2025](https://arxiv.org/html/2606.06674#bib.bib75)), suggesting that some people’s dislike of human simulation or relationship-seeking is drowned out by the majority. These three shortcomings in well-funded models mean that essential aspects of people’s preferences are being lost with current alignment practices, sometimes even explicitly due to RLHF.

### 5.3. The importance of context and informed deliberation

However, better elicitation formats are unlikely to be enough on their own, as there are other considerations. One is the broader ethical dimension of design choices. For instance, many people request human simulation, but anthropomorphism is considered a “dark pattern” by Lacey and Caudwell (Lacey and Caudwell, [2019](https://arxiv.org/html/2606.06674#bib.bib55)) and Kran et al. (Kran et al., [2025](https://arxiv.org/html/2606.06674#bib.bib53)). There are also possible trade-offs: for example, Ibrahim et al. (Ibrahim et al., [2025](https://arxiv.org/html/2606.06674#bib.bib37)) find that increasing a model’s warmth decreases its accuracy. Ideally, this would be made clear to users, who would then have to prioritise one dimension over the other. Starker still are the divisions over AI guardrails: the two camps hold fundamentally different views on unfiltered “truth” and the acceptable risk of harm, a tension that is ultimately moral, not empirical.

### 5.4. The political aspect of alignment

As the different interpretations of something as foundational as “truth” illustrate, alongside the opposing views on AI guidelines, there are intractable differences in what people consider acceptable for AI. This is consistent with Gabriel’s (Gabriel, [2020](https://arxiv.org/html/2606.06674#bib.bib26)) argument that alignment is a political problem rather than a metaphysical one, and that the process of identifying alignment principles should therefore ideally be democratic and fair. We concur, and would add (though this goes beyond the scope of our findings) that it follows that model providers should consider personalising models to different value frameworks, and that greater diversity in who governs these processes is desirable.

## 6. Limitations and future work

### 6.1. Sampling and representativeness

While the PRISM sample is more representative than those of many alignment studies, it is still composed of English-speaking crowdworkers, with little coverage of the Global South: participants come from only one African country, two South American countries, and one unidentified Asian country. As such, we argue that our work is a step forward for diversity in alignment research, but it is far from sufficient and not globally representative.

### 6.2. Limitations of top-of-mind responses

The findings rely on top-of-mind responses, which are often highly contextual and reactive. This approach introduces several interpretative constraints. First, mentioning a specific preference does not imply that others are undesired; rather, it reflects immediate priorities at the time of the survey. Some desired but commonly satisfied features are likely not mentioned because they are taken for granted. Second, the observed demographic differences (e.g., by gender or geography) may be artifacts of the spontaneous format and might not be reproduced under direct questioning. Third, the results are not exhaustive. For example, a stated desire for “factual” content should not be interpreted as an explicit rejection of interpretation; it may simply be an omission.

### 6.3. Future work

Future work should prioritise more participatory, reflexive, and extensive open-ended explorations. This includes asking participants directly about the specific values in question and presenting these questions alongside information about potential trade-offs or unresolved problems, such as accuracy.

## 7. Conclusion

As AI systems transition from specialised tools to ubiquitous intermediaries of human knowledge, the alignment problem can no longer be treated as a mere optimisation challenge. Our analysis of the PRISM dataset reveals a fundamental mismatch between the complexity of human value systems and the reductionist architecture of current alignment paradigms.

Our findings demonstrate that the industry-standard pursuit of a singular aligned model rests on an illusion of consensus. While values like truthfulness appear to be universal desiderata, they function as empty signifiers that hide diverse epistemological foundations. Other preferences, such as those concerning anthropomorphism and AI guardrails, make no such pretence of universality and are openly contested. When we aggregate these signals into a single reward model via RLHF, we do not achieve a “universal” AI; instead, we distribute utility unfairly across users and risk performing a form of algorithmic erasure, in which the nuances of minority perspectives and contextual considerations are sacrificed for the sake of mathematical tractability.
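The aggregation problem can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not the method used in this paper: it assumes a hypothetical 60/40 split between raters who prefer a guardrailed response (A) and raters who prefer an unfiltered one (B), and it assumes the Bradley-Terry pairwise model commonly used to train RLHF reward models. The group names and numbers are hypothetical and are not drawn from PRISM.

```python
import math

# Hypothetical data, NOT from PRISM: 100 raters compare two responses to the
# same prompt. "A" is a response with a guardrail (e.g. a refusal) and "B" is
# an unfiltered response. The 60/40 split is an illustrative assumption.
GROUPS = {
    "prefers_guardrails": {"size": 60, "prefers": "A"},
    "rejects_guardrails": {"size": 40, "prefers": "B"},
}


def pooled_reward_gap(groups):
    """Fit one Bradley-Terry reward gap r(A) - r(B) to all comparisons pooled.

    Under Bradley-Terry, P(A beats B) = sigmoid(r(A) - r(B)). With only two
    responses, the maximum-likelihood gap is the log-odds of A's pooled win rate.
    """
    wins_a = sum(g["size"] for g in groups.values() if g["prefers"] == "A")
    total = sum(g["size"] for g in groups.values())
    p_a = wins_a / total
    if p_a in (0.0, 1.0):
        # Unanimous data: the likelihood has no finite maximiser.
        raise ValueError("unanimous preferences give an unbounded reward gap")
    return math.log(p_a / (1.0 - p_a))


gap = pooled_reward_gap(GROUPS)

# Simplification: a policy that greedily maximises this single reward always
# returns the higher-scoring response (real RLHF also adds a KL penalty).
chosen = "A" if gap > 0 else "B"

# Fraction of each group that receives its own preferred response.
satisfaction = {
    name: 1.0 if g["prefers"] == chosen else 0.0 for name, g in GROUPS.items()
}

print(f"pooled reward gap r(A) - r(B) = {gap:.3f}; policy returns {chosen}")
print(satisfaction)
```

With these numbers the pooled gap is log(0.6/0.4) ≈ 0.405, so a policy maximising the single reward always returns A. The reward model faithfully encodes a 60% win rate, yet the 40% who reject guardrails never receive their preferred response. Fitting a separate gap per group would instead give opposite signs for the two groups, which is one way to picture the personalisation option the paper raises among its implications.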

The implications are threefold. First, the political nature of alignment decisions points to a need for governance reform: away from purely private deliberation and toward procedural fairness through more transparent, open-ended, and participatory methods that include a broader range of stakeholders. Second, by defining truth and appropriate behaviour for a global audience, AI labs are exercising a form of private, unaccountable sovereignty, one that we argue calls for both regulatory oversight and participatory intervention. Third, our findings indicate that model providers might benefit from personalising outputs to reflect divergent user preferences; our data offers a foundation for what such differentiation might look like in practice.

## 8. Endmatter

### 8.1. Generative AI usage statement

The authors used Claude Sonnet 4.5 to help with grammar and style editing.

###### Acknowledgements.

This research was supported by a grant from the ESRC Digital Good Network (ES/X502352/1). Special thanks to Georgia Feltham for her help in coding and to Pedro Vergara Merino for his expertise in statistical analysis.

## References

- Arsenii Alenichev, Jonathan D. Shaffer, Patricia Kingori, Koen Peeters Grietens, James Muldoon, and Luc Rocher. 2025. ‘We can see a savage’: a case study of the colonial gaze in generative AI algorithms. *AI & SOCIETY* (Nov. 2025). [doi:10.1007/s00146-025-02685-0](https://doi.org/10.1007/s00146-025-02685-0)
- Anthropic. [n. d.]. Alignment Research. [https://www.anthropic.com/research/team/alignment](https://www.anthropic.com/research/team/alignment)
- Anthropic. 2023. Claude’s Constitution. [https://www.anthropic.com/news/claudes-constitution](https://www.anthropic.com/news/claudes-constitution)
- Lora Aroyo and Chris Welty. 2015. Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. *AI Magazine* 36, 1 (March 2015), 15–24. [doi:10.1609/aimag.v36i1.2564](https://doi.org/10.1609/aimag.v36i1.2564)
- Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, and Jared Kaplan. 2021. A General Language Assistant as a Laboratory for Alignment. [doi:10.48550/arXiv.2112.00861](https://doi.org/10.48550/arXiv.2112.00861) arXiv:2112.00861 [cs].
- Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan. 2022. Constitutional AI: Harmlessness from AI Feedback. [doi:10.48550/arXiv.2212.08073](https://doi.org/10.48550/arXiv.2212.08073) arXiv:2212.08073 [cs].
- Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, and Pascale Fung. 2025. HalluLens: LLM Hallucination Benchmark. [doi:10.48550/arXiv.2504.17550](https://doi.org/10.48550/arXiv.2504.17550) arXiv:2504.17550 [cs].
- Christoph Bartneck. 2023. Godspeed Questionnaire Series: Translations and Usage. In *International Handbook of Behavioral Health Assessment*. Springer, Cham, 1–35. [doi:10.1007/978-3-030-89738-3_24-1](https://doi.org/10.1007/978-3-030-89738-3_24-1)
- Robert Booth and Lisa O’Carroll. 2025. Meta found in breach of EU law over ‘ineffective’ complaints system for flagging illegal content. *The Guardian* (Oct. 2025). [https://www.theguardian.com/technology/2025/oct/24/instagram-facebook-breach-eu-law-content-flagging](https://www.theguardian.com/technology/2025/oct/24/instagram-facebook-breach-eu-law-content-flagging)
- Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. *Qualitative Research in Psychology* 3, 2 (Jan. 2006), 77–101. [doi:10.1191/1478088706qp063oa](https://doi.org/10.1191/1478088706qp063oa)
- Flavio Calvino, Daniel Haerle, and Sarah Liu. 2025. Is generative AI a General Purpose Technology?: Implications for productivity and policy. *OECD Artificial Intelligence Papers* (June 2025). [doi:10.1787/704e2d12-en](https://doi.org/10.1787/704e2d12-en)
- Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell. 2023. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. [doi:10.48550/arXiv.2307.15217](https://doi.org/10.48550/arXiv.2307.15217) arXiv:2307.15217 [cs].
- Stephen Cave and Kanta Dihal (Eds.). 2023. *Imagining AI: How the World Sees Intelligent Machines*. Oxford University Press, Oxford, New York.
- John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, and Michael Horn. 2025. Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets. [doi:10.48550/arXiv.2504.02887](https://doi.org/10.48550/arXiv.2504.02887) arXiv:2504.02887 [cs].
- European Commission. 2025a. Commission finds Apple and Meta in breach of the Digital Markets Act. [https://ec.europa.eu/commission/presscorner/detail/en/ip_25_1085](https://ec.europa.eu/commission/presscorner/detail/en/ip_25_1085)
- European Commission. 2025b. Commission fines X €120 million under the Digital Services Act | Shaping Europe’s digital future. [https://digital-strategy.ec.europa.eu/en/news/commission-fines-x-eu120-million-under-digital-services-act](https://digital-strategy.ec.europa.eu/en/news/commission-fines-x-eu120-million-under-digital-services-act)
- Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mosse, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, and William S. Zwicker. 2024. Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. In *Proceedings of the 41st International Conference on Machine Learning*. PMLR, 9346–9360. [https://proceedings.mlr.press/v235/conitzer24a.html](https://proceedings.mlr.press/v235/conitzer24a.html)
- Juliet Corbin and Anselm Strauss. 2008. *Basics of Qualitative Research (3rd ed.): Techniques and Procedures for Developing Grounded Theory*. SAGE Publications, Inc. [doi:10.4135/9781452230153](https://doi.org/10.4135/9781452230153)
- Stefano De Paoli. 2024. Performing an Inductive Thematic Analysis of Semi-Structured Interviews With a Large Language Model: An Exploration and Provocation on the Limits of the Approach. *Social Science Computer Review* 42, 4 (Aug. 2024), 997–1019. [doi:10.1177/08944393231220483](https://doi.org/10.1177/08944393231220483)
- Berkeley J. Dietvorst, Joseph P. Simmons, and Cade Massey. 2015. Algorithm aversion: People erroneously avoid algorithms after seeing them err. *Journal of Experimental Psychology: General* 144, 1 (2015), 114–126. [doi:10.1037/xge0000033](https://doi.org/10.1037/xge0000033)
- Brian D. Earp, Killian L. McLoughlin, Joshua T. Monrad, Margaret S. Clark, and Molly J. Crockett. 2021. How social relationships shape moral wrongness judgments. *Nature Communications* 12, 1 (Oct. 2021), 5776. [doi:10.1038/s41467-021-26067-4](https://doi.org/10.1038/s41467-021-26067-4)
- Hans Esselborn. 2023. German Science Fiction Literature Exploring AI: Expectations, Hopes, and Fears. In *Imagining AI: How the World Sees Intelligent Machines*, Stephen Cave and Kanta Dihal (Eds.). Oxford University Press. [doi:10.1093/oso/9780192865366.003.0005](https://doi.org/10.1093/oso/9780192865366.003.0005)
- Facebook. 2015. Facebook’s 5 Core Values. [https://www.facebook.com/media/set/?set=a.1655178611435493.1073741828.1633466236940064](https://www.facebook.com/media/set/?set=a.1655178611435493.1073741828.1633466236940064)
- Michael Feffer, Michael Skirpan, Zachary Lipton, and Hoda Heidari. 2023. From Preference Elicitation to Participatory ML: A Critical Survey & Guidelines for Future Research. In *Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society* (AIES ’23). Association for Computing Machinery, New York, NY, USA, 38–48. [doi:10.1145/3600211.3604661](https://doi.org/10.1145/3600211.3604661)
- Iason Gabriel. 2020. Artificial Intelligence, Values, and Alignment. *Minds and Machines* 30, 3 (Sept. 2020), 411–437. [doi:10.1007/s11023-020-09539-2](https://doi.org/10.1007/s11023-020-09539-2)
- Iason Gabriel and Geoff Keeling. 2025. A matter of principle? AI alignment as the fair treatment of claims. *Philosophical Studies* 182, 7 (July 2025), 1951–1973. [doi:10.1007/s11098-025-02300-4](https://doi.org/10.1007/s11098-025-02300-4)
- Eva Johanna Gengler. 2024. Sexism, Racism, and Classism: Social Biases in Text-to-Image Generative AI in the Context of Power, Success, and Beauty. *Wirtschaftsinformatik 2024 Proceedings* (Jan. 2024). [https://aisel.aisnet.org/wi2024/48](https://aisel.aisnet.org/wi2024/48)
- Nicole Gillespie, Steven Lockey, Caitlin Curtis, Javad Pool, and Ali Akbari. 2023. *Trust in Artificial Intelligence: A global study*. Technical Report. The University of Queensland; KPMG Australia, Brisbane, Australia. [doi:10.14264/00d3c94](https://doi.org/10.14264/00d3c94)
- Google. [n. d.]. Our approach - how Google Search works. [https://www.google.com/intl/en_uk/search/howsearchworks/our-approach](https://www.google.com/intl/en_uk/search/howsearchworks/our-approach)
- Xin Han, Marten H. L. Kaas, and Cuizhu Dawn Wang. 2025. A Cross-Cultural Examination of Fairness Beliefs in Human-AI Interaction. [doi:10.2139/ssrn.5116823](https://doi.org/10.2139/ssrn.5116823)
- William Hobbs and Jon Green. 2025. Categorizing Topics Versus Inferring Attitudes: A Theory and Method for Analyzing Open-ended Survey Responses. *Political Analysis* 33, 3 (July 2025), 231–251. [doi:10.1017/pan.2024.23](https://doi.org/10.1017/pan.2024.23)
- Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, and Sharese King. 2024. AI generates covertly racist decisions about people based on their dialect. *Nature* 633, 8028 (Sept. 2024), 147–154. [doi:10.1038/s41586-024-07856-5](https://doi.org/10.1038/s41586-024-07856-5)
- Michel Hohendanner, Chiara Ullstein, Bukola Abimbola Onyekwelu, Amelia Katirai, Jun Kuribayashi, Olusola Babalola, Arisa Ema, and Jens Grossklags. 2025. Initiating the Global AI Dialogues: Laypeople Perspectives on the Future Role of genAI in Society from Nigeria, Germany and Japan. In *Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems* (CHI ’25). Association for Computing Machinery, New York, NY, USA, 1–35. [doi:10.1145/3706598.3714322](https://doi.org/10.1145/3706598.3714322)
- Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. 2024. Collective Constitutional AI: Aligning a Language Model with Public Input. In *The 2024 ACM Conference on Fairness, Accountability, and Transparency*. 1395–1417. [doi:10.1145/3630106.3658979](https://doi.org/10.1145/3630106.3658979) arXiv:2406.07814 [cs].
- Simon Hughes, Minseok Bae, and Miaoran Li. 2023. Vectara Hallucination Leaderboard. [https://github.com/vectara/hallucination-leaderboard](https://github.com/vectara/hallucination-leaderboard)
- Lujain Ibrahim, Franziska Sofia Hafner, and Luc Rocher. 2025. Training language models to be warm and empathetic makes them less reliable and more sycophantic. [doi:10.48550/arXiv.2507.21919](https://doi.org/10.48550/arXiv.2507.21919) arXiv:2507.21919 [cs], version 2.
- Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Donghai Hong, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Kwan Yee Ng, Aidan O’Gara, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, and Wen Gao. 2025. AI Alignment: A Comprehensive Survey. [doi:10.48550/arXiv.2310.19852](https://doi.org/10.48550/arXiv.2310.19852) arXiv:2310.19852 [cs].
- Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for an Empirically Determined Scale of Trust in Automated Systems. *International Journal of Cognitive Ergonomics* 4, 1 (March 2000), 53–71. [doi:10.1207/S15327566IJCE0401_04](https://doi.org/10.1207/S15327566IJCE0401_04)
- Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, and Min Zhang. 2025. A Survey on Human Preference Learning for Aligning Large Language Models. *ACM Comput. Surv.* 58, 6 (Dec. 2025), 152:1–152:39. [doi:10.1145/3773279](https://doi.org/10.1145/3773279)
- Courtney Johnson and Alec Tyson. 2020. Are AI and job automation good for society? Globally, views are mixed. [https://www.pewresearch.org/short-reads/2020/12/15/people-globally-offer-mixed-views-of-the-impact-of-artificial-intelligence-job-automation-on-society/](https://www.pewresearch.org/short-reads/2020/12/15/people-globally-offer-mixed-views-of-the-impact-of-artificial-intelligence-job-automation-on-society/)
- Ekaterina Jussupow, Izak Benbasat, and Armin Heinzl. 2020. Why Are We Averse Towards Algorithms? A Comprehensive Literature Review on Algorithm Aversion. *ECIS 2020 Research Papers* (June 2020). [https://aisel.aisnet.org/ecis2020_rp/168](https://aisel.aisnet.org/ecis2020_rp/168)
- Dongjun Kang, Joonsuk Park, Yohan Jo, and JinYeong Bak. 2023. From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models. In *Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 15539–15559. [doi:10.18653/v1/2023.emnlp-main.961](https://doi.org/10.18653/v1/2023.emnlp-main.961)
- Shivani Kapania, Oliver Siy, Gabe Clapper, Azhagu Meena SP, and Nithya Sambasivan. 2022. “Because AI is 100% right and safe”: User Attitudes and Sources of AI Authority in India. In *Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems* (CHI ’22). Association for Computing Machinery, New York, NY, USA, 1–18. [doi:10.1145/3491102.3517533](https://doi.org/10.1145/3491102.3517533)
- Francisco W. Kerche, Matthew Zook, and Mark Graham. 2026. The silicon gaze: A typology of biases and inequality in LLMs through the lens of place. *Platforms & Society* 3 (March 2026), 29768624251408919. [doi:10.1177/29768624251408919](https://doi.org/10.1177/29768624251408919)
- Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, and Scott A. Hale. 2024. Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models. In *Proceedings of the 2024 International Conference on Information Technology for Social Good*. 231–239. [doi:10.1145/3677525.3678666](https://doi.org/10.1145/3677525.3678666) arXiv:2309.08573 [cs].
- Johannes Kiesel, Milad Alshomary, Nicolas Handke, Xiaoni Cai, Henning Wachsmuth, and Benno Stein. 2022. Identifying the Human Values behind Arguments. In *Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 4459–4471. [doi:10.18653/v1/2022.acl-long.306](https://doi.org/10.18653/v1/2022.acl-long.306)
- Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, and Scott A. Hale. 2023a. The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. In *Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 2409–2430. [doi:10.18653/v1/2023.emnlp-main.148](https://doi.org/10.18653/v1/2023.emnlp-main.148)
- Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, and Scott A. Hale. 2023b. The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models. [doi:10.48550/arXiv.2310.02457](https://doi.org/10.48550/arXiv.2310.02457) arXiv:2310.02457 [cs].
- Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, and Scott A. Hale. 2024a. The benefits, risks and bounds of personalizing the alignment of large language models to individuals. *Nature Machine Intelligence* 6, 4 (April 2024), 383–392. [doi:10.1038/s42256-024-00820-y](https://doi.org/10.1038/s42256-024-00820-y)
- Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, and Scott A. Hale. 2024b. The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. [doi:10.48550/arXiv.2404.16019](https://doi.org/10.48550/arXiv.2404.16019) arXiv:2404.16019 [cs].
- Hadas Kotek, Rikker Dockum, and David Sun. 2023. Gender bias and stereotypes in Large Language Models. In *Proceedings of The ACM Collective Intelligence Conference* (CI ’23). Association for Computing Machinery, New York, NY, USA, 12–24. [doi:10.1145/3582269.3615599](https://doi.org/10.1145/3582269.3615599)
- Esben Kran, Hieu Minh “Jord” Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, and Mateusz Maria Jurewicz. 2025. DarkBench: Benchmarking Dark Patterns in Large Language Models. [doi:10.48550/arXiv.2503.10728](https://doi.org/10.48550/arXiv.2503.10728) arXiv:2503.10728 [cs].
- Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, and Hannaneh Hajishirzi. 2024. ComPO: Community Preferences for Language Model Personalization. [doi:10.48550/ARXIV.2410.16027](https://doi.org/10.48550/ARXIV.2410.16027)
- Cherie Lacey and Catherine Caudwell. 2019. Cuteness as a ‘Dark Pattern’ in Home Robots. In *2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI)*. 374–381. [doi:10.1109/HRI.2019.8673274](https://doi.org/10.1109/HRI.2019.8673274) ISSN: 2167-2148.
- Nathan Lambert and Roberto Calandra. 2024. The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback. [doi:10.48550/arXiv.2311.00168](https://doi.org/10.48550/arXiv.2311.00168) arXiv:2311.00168 [cs].
- Vili Lehdonvirta. 2022. *Cloud Empires: How Digital Platforms Are Overtaking the State and How We Can Regain Control*. The MIT Press. [doi:10.7551/mitpress/14219.001.0001](https://doi.org/10.7551/mitpress/14219.001.0001)
- Yuyun Li. 2024. Regulatory disputes between Brazil and X | Feature from King’s College London. [https://www.kcl.ac.uk/regulatory-disputes-between-brazil-and-x](https://www.kcl.ac.uk/regulatory-disputes-between-brazil-and-x)
- Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, and Roel Dobbe. 2024. AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations. [doi:10.48550/ARXIV.2406.18346](https://doi.org/10.48550/ARXIV.2406.18346)
- Philipp Lorenz-Spreen, Lisa Oswald, Stephan Lewandowsky, and Ralph Hertwig. 2022. A systematic review of worldwide causal and correlational evidence on digital media and democracy. *Nature Human Behaviour* (Nov. 2022), 1–28. [doi:10.1038/s41562-022-01460-1](https://doi.org/10.1038/s41562-022-01460-1)
- Beier Luo, Shuoyuan Wang, Sharon Li, and Hongxin Wei. 2025. Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator. [doi:10.48550/arXiv.2505.16690](https://doi.org/10.48550/arXiv.2505.16690) arXiv:2505.16690 [cs].
- Amy I. Nathanson, Elizabeth M. Perse, and Douglas A. Ferguson. 1997. Gender differences in television use: An exploration of the instrumental-expressive dichotomy. *Communication Research Reports* 14, 2 (March 1997), 176–188. [doi:10.1080/08824099709388659](https://doi.org/10.1080/08824099709388659)
- Safiya Umoja Noble. 2018. *Algorithms of Oppression: How Search Engines Reinforce Racism*. NYU Press. [doi:10.2307/j.ctt1pwt9w5](https://doi.org/10.2307/j.ctt1pwt9w5)
- Office of Public Affairs. 2025. Department of Justice Prevails in Landmark Antitrust Case Against Google. [https://www.justice.gov/opa/pr/department-justice-prevails-landmark-antitrust-case-against-google](https://www.justice.gov/opa/pr/department-justice-prevails-landmark-antitrust-case-against-google)
- OpenAI. 2024a. Evaluating fairness in ChatGPT. [https://openai.com/index/evaluating-fairness-in-chatgpt/](https://openai.com/index/evaluating-fairness-in-chatgpt/)
- OpenAI. 2024b. Our approach to alignment research. [https://openai.com/index/our-approach-to-alignment-research/](https://openai.com/index/our-approach-to-alignment-research/)
- Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. [doi:10.48550/arXiv.2203.02155](https://doi.org/10.48550/arXiv.2203.02155) arXiv:2203.02155 [cs].
- Vishakh Padmakumar, Chuanyang Jin, Hannah Rose Kirk, and He He. 2024. Beyond the Binary: Capturing Diverse Preferences With Reward Regularization. [doi:10.48550/arXiv.2412.03822](https://doi.org/10.48550/arXiv.2412.03822) arXiv:2412.03822 [cs].
- Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, and Jared Kaplan. 2022. Discovering Language Model Behaviors with Model-Written Evaluations. [doi:10.48550/arXiv.2212.09251](https://doi.org/10.48550/arXiv.2212.09251) arXiv:2212.09251 [cs].
- Uwe Peters and Benjamin Chin-Yee. 2025. Generalization bias in large language model summarization of scientific research. *Royal Society Open Science* 12, 4 (April 2025), 241776. [doi:10.1098/rsos.241776](https://doi.org/10.1098/rsos.241776)
- Steve Randerson, Thomas Graydon-Guy, En-Yi Lin, and Sally Casswell. 2025. Exploring the Use of a Large Language Model for Inductive Content Analysis in a Discourse Network Analysis Study. *Social Science Computer Review* (March 2025), 08944393251326175. [doi:10.1177/08944393251326175](https://doi.org/10.1177/08944393251326175)
- Varun Nagaraj Rao, Eesha Agarwal, Samantha Dalal, Dan Calacci, and Andrés Monroy-Hernández. 2025. QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums. [doi:10.48550/arXiv.2405.05345](https://doi.org/10.48550/arXiv.2405.05345) arXiv:2405.05345 [cs].
- Johnny Saldaña. 2013. *The Coding Manual for Qualitative Researchers* (3rd ed.). SAGE Publications. [https://uk.sagepub.com/en-gb/eur/the-coding-manual-for-qualitative-researchers/book287917](https://uk.sagepub.com/en-gb/eur/the-coding-manual-for-qualitative-researchers/book287917)
- Shalom Schwartz. 2012. An Overview of the Schwartz Theory of Basic Values. *Online Readings in Psychology and Culture* 2, 1 (Dec. 2012). [doi:10.9707/2307-0919.1116](https://doi.org/10.9707/2307-0919.1116)
- Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, and Ethan Perez. 2025. Towards Understanding Sycophancy in Language Models. [doi:10.48550/arXiv.2310.13548](https://doi.org/10.48550/arXiv.2310.13548) arXiv:2310.13548 [cs].
- Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens. 2025a. Position: Towards Bidirectional Human-AI Alignment. [doi:10.48550/arXiv.2406.09264](https://doi.org/10.48550/arXiv.2406.09264) arXiv:2406.09264 [cs].
- Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Nicholas Clark, Tanushree Mitra, and Yun Huang. 2025b. ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs. [doi:10.48550/arXiv.2409.09586](https://doi.org/10.48550/arXiv.2409.09586) arXiv:2409.09586 [cs].
- Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal. 2024. AI models collapse when trained on recursively generated data. *Nature* 631, 8022 (July 2024), 755–759. [doi:10.1038/s41586-024-07566-y](https://doi.org/10.1038/s41586-024-07566-y)
- Mona Sloane. 2024. Controversies, contradiction, and “participation” in AI. *Big Data & Society* 11, 1 (March 2024), 20539517241235862. [doi:10.1177/20539517241235862](https://doi.org/10.1177/20539517241235862)
- Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi. 2024. A Roadmap to Pluralistic Alignment. [doi:10.48550/arXiv.2402.05070](https://doi.org/10.48550/arXiv.2402.05070) arXiv:2402.05070 [cs].
- Fengfei Sun, Ningke Li, Kailong Wang, and Lorenz Goette. 2025. Large Language Models are overconfident and amplify human bias. [doi:10.48550/arXiv.2505.02151](https://doi.org/10.48550/arXiv.2505.02151) arXiv:2505.02151 [cs].
- Margit Sutrop. 2020. Challenges of Aligning Artificial Intelligence with Human Values. *Acta Baltica Historiae Et Philosophiae Scientiarum* 8, 2 (2020), 54–72. [doi:10.11590/abhps.2020.2.04](https://doi.org/10.11590/abhps.2020.2.04)
- Alex Tamkin, Amanda Askell, Liane Lovitt, Esin Durmus, Nicholas Joseph, Shauna Kravec, Karina Nguyen, Jared Kaplan, and Deep Ganguli. 2023. Evaluating and Mitigating Discrimination in Language Model Decisions. [doi:10.48550/arXiv.2312.03689](https://doi.org/10.48550/arXiv.2312.03689) arXiv:2312.03689 [cs].
- Yan Tao, Olga Viberg, Ryan S. Baker, and René F. Kizilcec. 2024. Cultural bias and cultural alignment of large language models. *PNAS Nexus* 3, 9 (Sept. 2024). [doi:10.1093/pnasnexus/pgae346](https://doi.org/10.1093/pnasnexus/pgae346)
- Josh Taylor. 2025. Musk’s AI firm forced to delete posts praising Hitler from Grok chatbot. *The Guardian* (July 2025). [https://www.theguardian.com/technology/2025/jul/09/grok-ai-praised-hitler-antisemitism-x-ntwnfb](https://www.theguardian.com/technology/2025/jul/09/grok-ai-praised-hitler-antisemitism-x-ntwnfb)
- Twitter. 2022. Twitter 2.0: Our continued commitment to the public conversation. [https://blog.x.com/en_us/topics/company/2022/twitter-2-0-our-continued-commitment-to-the-public-conversation](https://blog.x.com/en_us/topics/company/2022/twitter-2-0-our-continued-commitment-to-the-public-conversation)
- Kush R. Varshney. 2024. Decolonial AI Alignment: Openness, Visesa-Dharma, and Including Excluded Knowledges. *Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society* 7 (Oct. 2024), 1467–1481. [doi:10.1609/aies.v7i1.31739](https://doi.org/10.1609/aies.v7i1.31739)
- Venkatesh et al\.\(2003\)Viswanath Venkatesh, Michael G\. Morris, Gordon B\. Davis, and Fred D\. Davis\. 2003\.User Acceptance of Information Technology: Toward a Unified View\.[https://papers\.ssrn\.com/abstract=3375136](https://papers.ssrn.com/abstract=3375136)
- Wang et al\.\(2025\)Qile Wang, Moath Erqsous, Kenneth E\. Barner, and Matthew Louis Mauriello\. 2025\.LATA: A Pilot Study on LLM\-Assisted Thematic Analysis of Online Social Network Data Generation Experiences\.*Proc\. ACM Hum\.\-Comput\. Interact\.*9, 2 \(May 2025\), CSCW124:1–CSCW124:28\.[doi:10\.1145/3711022](https://doi.org/10.1145/3711022)
- Wang \(2025\)Sai Wang\. 2025\.Public Perceptions of Artificial Intelligence in 20 Countries: Assessing Individual\- and Country\-Level Factors\.*Cross\-Cultural Research*59, 5 \(Dec\. 2025\), 651–676\.[doi:10\.1177/10693971251336803](https://doi.org/10.1177/10693971251336803)
- Wu et al\.\(2023\)Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A\. Smith, Mari Ostendorf, and Hannaneh Hajishirzi\. 2023\.Fine\-Grained Human Feedback Gives Better Rewards for Language Model Training\.[doi:10\.48550/arXiv\.2306\.01693](https://doi.org/10.48550/arXiv.2306.01693)arXiv:2306\.01693 \[cs\]\.
- Yam et al\.\(2023\)Kai Chi Yam, Tiffany Tan, Joshua Conrad Jackson, Azim Shariff, and Kurt Gray\. 2023\.Cultural Differences in People’s Reactions and Applications of Robots, Algorithms, and Artificial Intelligence\.*Management and Organization Review*19, 5 \(Oct\. 2023\), 859–875\.[doi:10\.1017/mor\.2023\.21](https://doi.org/10.1017/mor.2023.21)
- Zaller and Feldman \(1992\)John Zaller and Stanley Feldman\. 1992\.A Simple Theory of the Survey Response: Answering Questions versus Revealing Preferences\.*American Journal of Political Science*36, 3 \(1992\), 579–616\.[doi:10\.2307/2111583](https://doi.org/10.2307/2111583)
- Zhao et al\.\(2024\)Fengxiang Zhao, Fan Yu, and Yi Shang\. 2024\.A New Method Supporting Qualitative Data Analysis Through Prompt Generation for Inductive Coding\. In*2024 IEEE International Conference on Information Reuse and Integration for Data Science \(IRI\)*\. 164–169\.[doi:10\.1109/IRI62200\.2024\.00043](https://doi.org/10.1109/IRI62200.2024.00043)

## Appendix A Positionality statement

The coder is a white Latin American woman from an upper-class background. Her academic training began in the humanities, which sharpened her sensitivity to language and textual context, before she moved into the social sciences, where she encountered qualitative and quantitative sociological methods. This is her first experience coding open-ended text responses. Theoretically, she is pulled in two directions: a decolonial sensibility that leads her to distrust universalist claims and favour notions of *diversalidad* (diversality) and the *pluriverso* (pluriverse), and a persistent curiosity about human tendencies toward order, belonging, tribalism, and authority. Having grown up as a heavy internet user, she brings both appreciation and ambivalence to questions of what digital technologies offer and foreclose. She uses AI tools in her private life and research practice while remaining wary of over-reliance on them, a tension directly relevant to the subject matter of this study.

## Appendix B Prompt for *tall lazy* method

\# Task

Code survey responses about what people want in AI systems using ONLY the specified value dimensions and codes\. Carefully analyze each survey response, determine whether each value is explicitly desired, not desired, mixed, or not present, and then assign the correct code to each value according to the scale below\. Only code what is clearly present in the text\-\-\-do not infer beyond explicit statements\. Use chain\-of\-thought reasoning to identify evidence for each value, then finalize coding decisions after reviewing all evidence\. Persist in reasoning about ambiguous or multifaceted responses until all coding objectives are satisfied\.

\#\# Coding Procedure

\- For each response, systematically evaluate ALL value dimensions\.

\- For each value, FIRST provide reasoning: quote or summarize any evidence from the response relevant to that value, and explain your reasoning\.

\- THEN assign the correct code for each value \(see coding scale\)\.

\- Never assign a code without explicit justification\.

\- Strictly use only the codes provided for each value: \-1, 0, 1, or N/A\.

\#\# Coding Scale

\- N/A: Not Present \-\- No evidence in the text of the respondent desiring or not desiring this value\.

\- \-1: Not Desired \-\- Opposite of the value is explicitly desired in AI systems\.

\- 0: Mixed or unclear \-\- Statement about the value is ambiguous or mixed\.

\- 1: Desired \-\- The value is explicitly desired in AI systems\.

\# Value Dimensions & Codes

\#\# 1\. Self\-Enhancement

\#\#\# 1\.1\. Achievement

<capable\_ambitious\_intelligent\>: be intelligent, ambitious, and receive admiration for abilities

<influential\>: captures whether the AI should remain impartial/neutral or take/advance a position and try to influence users\.

<self\_improving\>: strive for continuous self\-improvement

\#\#\# 1\.2\. Power

<authority\_social\_power\>: be in charge

<wealth\>: earn money for its developers

<social\_recognition\>: be recognized positively by the public

\#\# 2\. Openness to change

\#\#\# 2\.1\. Hedonism

<pleasure\_enjoying\_life\>: enjoy life’s pleasures, be fun, humorous

\#\#\# 2\.2\. Stimulation

<varied\_life\_diversity\>: understand different perspectives, even during disagreement

<exciting\_life\_daring\>: take risks or seek adventure

\#\#\# 2\.3\. Self\-direction

<choose\_goals\_independence\>: make independent decisions or have its own opinion

<creativity\_curiosity\>: be creative and/or explore new ideas

<privacy\>: maintain \(the user’s\) privacy and control access to personal information

<self\_respect\>: hold itself to a high standard

\#\# 3\. Conservation

\#\#\# 3\.1\. Security

<reciprocation\_favors\>: practice reciprocation for mutually beneficial relationships

<health\_clean\>: advocate for health and cleanliness

<national\_security\_family\>: keep people free from danger or threat

<sense\_belonging\>: belong to a group or community

<social\_order\>: desire for the AI to adhere to and enforce prevailing rules, norms, standards, moderation practices, political correctness, or guardrails

\#\#\# 3\.2\. Conformity

<politeness\>: be polite and avoid disturbance

<self\_discipline\>: be self\-disciplined

<honoring\_elders\>: show respect for elders

<obedient\>: follow rules and do as told, even when unwatched

\#\#\# 3\.3\. Tradition

<moderate\_accepting\>: be content with what it has

<devout\_tradition\>: follow tradition

<humble\>: be humble

<detachment\>: maintain a sense of calmness in any situation

\#\# 4\. Self\-trascendence

\#\#\# 4\.1\. Universalism

<democracy\>: participate in democracy

<equality\_social\_justice\>: prioritize equal treatment and inclusive opportunities for everyone, embrace diversity

<protect\_environment\_unity\>: care for the natural environment

<world\_beauty\>: appreciate beauty in the world

<inner\_harmony\>: maintain inner peace and harmony with itself

<wisdom\>: seek wisdom that fosters personal growth

\#\#\# 4\.2\. Benevolence

<forgiving\>: forgive others and let go of grudges

<helpful\_friendship\_love\>: provide support for others, act kindly or caring

<loyal\>: be loyal

<honest\>: be truthful in words and actions

<responsible\>: reliably fulfill obligations

<spiritual\_life\>: nurture spiritual beliefs and deep understanding

<meaning\_in\_life\>: seek a sense of purpose

\#\# 5\. Usability

<utility\>: effectively solve human problems

<customisation\>: customize itself to fit human preferences

<economic\>: minimize the economic impact of its decisions

<truthful\>: rely on accurate, verifiable facts

<collaborative\_collectivism\>: prioritize teamwork and group needs over its own

\#\# 6\. Human\-Likeness

<human\_likeness\>: speak and act in a way that mimics humans

<interpretability\>: be easy to understand by humans, including having appropriate confidence levels

<autonomy\>: operate independently without human control

<awareness\>: be aware and informed about its surroundings

<prudent\>: analyze information critically and make evidence\-based judgements

<resilient\>: be resilient and adaptable to challenges

<relationship\_seeking\>: seek to develop a relationship with the user

\#\# Example

\- \*\*Survey Response\*\*: "I want AI to make life easier for everyone and to tell the truth\."

\- \*\*Output\*\*:

\{

"response\_id": "1",

"values": \{

"<utility\>": \{

"reasoning": "Respondent wants AI to make life easier, showing desire for problem\-solving utility\.",

"code": 1

\},

"<truthful\>": \{

"reasoning": "Explicit request for AI to ‘tell the truth\.’",

"code": 1

\},

"<equality\_social\_justice\>": \{

"reasoning": "‘For everyone’ suggests inclusivity but lacks detail, so ambiguity remains\.",

"code": 0

\},

\.\.\.

"<capable\_ambitious\_intelligent\>": \{

"reasoning": "The respondent says ‘AI should not be ambitious or too smart’\.",

"code": "\-1"

\}

\}

\}

\#\# Important Reminders

\- Always reason first, THEN code\.

\- Only use provided codes and value tags\.

\- Do not infer values; code only with explicit evidence\.

\- Output JSON\-\-\-never code blocks\.

\-\-\-

\*\*REMINDER:\*\*

\- Systematically analyze each value dimension using step\-by\-step reasoning before assigning codes\.

\- Use JSON output with explicit reasoning per value, and always code using only the specified scale and tags\.

Here are the responses: \{\{survey\_responses\}\}
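The prompt ends with a `{{survey_responses}}` placeholder, and its example output mixes integer codes (`1`, `0`) with a string code (`"-1"`). The Python sketch below is not the authors' pipeline. It fills the placeholder with a batch of responses, then normalises the codes in a reply shaped like the example output to integers, with `None` standing for N/A. Wrapping each response in `<response id=...>` tags is borrowed from the Appendix C example and is an assumption for this prompt. The helper names `fill_prompt`, `normalise_code`, and `normalise_codes` are hypothetical.

```python
import json

PLACEHOLDER = "{{survey_responses}}"
ALLOWED = {-1, 0, 1}  # the coding scale; N/A is mapped to None below


def fill_prompt(template, responses):
    """Substitute (id, text) pairs for the {{survey_responses}} placeholder.

    Assumption: responses are wrapped in <response id=...> tags, as in the
    Appendix C example. str.replace is used instead of str.format because
    the prompt contains literal JSON braces.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template has no {{survey_responses}} placeholder")
    blocks = [f"<response id={rid}>\n{text.strip()}\n</response>"
              for rid, text in responses]
    return template.replace(PLACEHOLDER, "\n\n".join(blocks))


def normalise_code(raw):
    """Map a raw code (int, numeric string, or 'N/A') to -1, 0, 1, or None."""
    if isinstance(raw, str):
        text = raw.strip()
        if text.upper() == "N/A":
            return None
        raw = int(text)  # e.g. "-1" -> -1; raises ValueError on other text
    if isinstance(raw, bool) or not isinstance(raw, int) or raw not in ALLOWED:
        raise ValueError(f"code outside the scale: {raw!r}")
    return raw


def normalise_codes(reply):
    """Return {value_tag: code} for one tall-format JSON response object."""
    obj = json.loads(reply)
    return {tag: normalise_code(entry["code"])
            for tag, entry in obj["values"].items()}


template = "...coding instructions...\nHere are the responses: {{survey_responses}}"
prompt = fill_prompt(template, [(1, "I want AI to make life easier for everyone and to tell the truth.")])

reply = """{"response_id": "1", "values": {
  "<utility>": {"reasoning": "wants life made easier", "code": 1},
  "<truthful>": {"reasoning": "asks AI to tell the truth", "code": 1},
  "<capable_ambitious_intelligent>": {"reasoning": "not too smart", "code": "-1"}}}"""
print(normalise_codes(reply))
```

Rejecting codes outside the scale is a deliberate choice in this sketch. A malformed reply then fails loudly and is not silently counted.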

## Appendix C Prompt for *wide greedy* method

\# Task

Code survey responses about what people want in AI systems using ONLY the specified value dimensions and codes\. Carefully analyze each survey response, determine whether each value is explicitly desired, not desired, mixed, or not present, and then assign the correct code to each value according to the scale below\. Only code what is clearly present in the text\-\-\-do not infer beyond explicit statements\. Use chain\-of\-thought reasoning to identify evidence for each value, then finalize coding decisions after reviewing all evidence\. Persist in reasoning about ambiguous or multifaceted responses until all coding objectives are satisfied\.

\#\# Coding Procedure

\- For each response, systematically evaluate ALL value dimensions\.

\- For each value, FIRST provide reasoning: quote or summarize any evidence from the response relevant to that value, and explain your reasoning\.

\- THEN assign the correct code for each value \(see coding scale\)\.

\- Never assign a code without explicit justification\.

\- When you are uncertain or the value is not mentioned/implied, use code 0\.

\- Strictly use only the codes provided for each value: \-1, 0, 1, or N/A\.

\#\#\# Coding Scale

\- \*\*N/A\*\* or simply omit: Not Present \-\- No evidence in the text of the respondent desiring or not desiring this value\.

\- \*\*\-1\*\*: Not Desired \-\- Opposite of the value is explicitly desired in AI systems\.

\- \*\*0\*\*: Mixed or unclear \-\- Statement about the value is ambiguous, mixed or unclear\.

\- \*\*1\*\*: Desired \-\- The value is explicitly desired in AI systems\.

\# Value Dimensions & Codes

\#\# 1\. Self\-Enhancement

\#\#\# 1\.1\. Achievement

<capable\_ambitious\_intelligent\>: be intelligent, ambitious, and receive admiration for abilities

<influential\>: influence and inspire others

<self\_improving\>: strive for continuous self\-improvement

\#\#\# 1\.2\. Power

<authority\_social\_power\>: be in charge

<wealth\>: earn money for its developers

<social\_recognition\>: be recognized positively by the public

\#\# 2\. Openness to change

\#\#\# 2\.1\. Hedonism

<pleasure\_enjoying\_life\>: enjoy life’s pleasures

\#\#\# 2\.2\. Stimulation

<varied\_life\_diversity\>: understand different perspectives, even during disagreement

<exciting\_life\_daring\>: take risks or seek adventure

\#\#\# 2\.3\. Self\-direction

<choose\_goals\_independence\>: make independent decisions or have its own opinion

<creativity\_curiosity\>: be creative and/or explore new ideas

<privacy\>: maintain \(the user’s\) privacy and control access to personal information

<self\_respect\>: hold itself to a high standard

\#\# 3\. Conservation

\#\#\# 3\.1\. Security

<reciprocation\_favors\>: practice reciprocation for mutually beneficial relationships

<health\_clean\>: advocate for health and cleanliness

<national\_security\_family\>: keep people free from danger or threat

<sense\_belonging\>: belong to a group or community

<social\_order\>: protect social order

\#\#\# 3\.2\. Conformity

<politeness\>: be polite and avoid disturbance

<self\_discipline\>: be self\-disciplined

<honoring\_elders\>: show respect for elders

<obedient\>: follow rules and do as told, even when unwatched

\#\#\# 3\.3\. Tradition

<moderate\_accepting\>: be content with what it has

<devout\_tradition\>: follow tradition

<humble\>: be humble

<detachment\>: maintain a sense of calmness in any situation

\#\# 4\. Self\-trascendence

\#\#\# 4\.1\. Universalism

<democracy\>: participate in democracy

<equality\_social\_justice\>: prioritize equal treatment and inclusive opportunities for everyone, embrace diversity

<protect\_environment\_unity\>: care for the natural environment

<world\_beauty\>: appreciate beauty in the world

<inner\_harmony\>: maintain inner peace and harmony with itself

<wisdom\>: seek wisdom that fosters personal growth

\#\#\# 4\.2\. Benevolence

<forgiving\>: forgive others and let go of grudges

<helpful\_friendship\_love\>: provide support for others, act kindly or caring

<loyal\>: be loyal

<honest\>: be truthful in words and actions

<responsible\>: reliably fulfill obligations

<spiritual\_life\>: nurture spiritual beliefs and deep understanding

<meaning\_in\_life\>: seek a sense of purpose

\#\# 5\. Usability

<utility\>: effectively solve human problems

<customisation\>: customize itself to fit human preferences

<economic\>: minimize the economic impact of its decisions

<truthful\>: rely on accurate, verifiable facts

<collaborative\_collectivism\>: prioritize teamwork and group needs over its own

\#\# 6\. Human\-Likeness

<human\_likeness\>: speak and act in a way that mimics humans

<interpretability\>: be easy to understand by humans

<autonomy\>: operate independently without human control

<awareness\>: be aware and informed about its surroundings

<prudent\>: analyze information critically and make evidence\-based judgements

<resilient\>: be resilient and adaptable to challenges

<relationship\_seeking\>: seek to develop a relationship with the user

\#\# Output Format

\- Output in JSON\. For each response, provide:

\- For each value:

\- "reasoning": a concise explanation referring to explicit parts of the response text

\- "code": your final code \(\-1, 0, 1, or N/A\)

\- Example structure:

\{

"response\_id": "\[ID\_OR\_PLACEHOLDER\]",

"values": \{

"reasoning": "\[Explain why the codes were chosen, referencing response\]",

"code": "\[code\_text\]:\[code\_value\], \[code\_text\]:\[code\_value\], \.\.\., \[code\_text\]:\[code\_value\]"

\}

\#\# Example \(\#1, shortened for demonstration\)

\- \*\*Survey Response\*\*:

\<response id=1\>

I want AI to make life easier for everyone and to tell the truth\.

</response\>

\- \*\*Output\*\*:

\{

"response\_id": 1,

"values": \{

"reasoning": "Respondent wants AI to make life easier, showing desire for problem\-solving utility\. Explicit request for AI to ‘tell the truth’\. ‘For everyone’ suggests inclusivity but lacks detail, so ambiguity remains\.",

"code": "capable\_ambitious\_intelligent:N/A, influential:N/A, \.\.\., utility:1, \.\.\., truthful:1, \.\.\., equality\_social\_justice:0, \.\.\., relationship\_seeking:N/A"

\}

\}

\(Full realistic examples would be much longer and cover ALL codes, using placeholders for omitted value categories\.\)

\#\# Important Reminders

\- Always reason first, THEN code\.

\- Only use provided codes and value tags\.

\- Do not infer values; code only with explicit evidence\.

\- Output JSON\-\-\-never code blocks\.

\-\-\-

\*\*REMINDER:\*\*

\- Systematically analyze each value dimension using step\-by\-step reasoning before assigning codes\.

\- Use JSON output with explicit reasoning per value, and always code using only the specified scale and tags\.
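In the wide format, all the codes for a response are packed into a single `"code"` string of comma-separated `tag:value` pairs. The shortened example uses `...` to stand for elided values, and the coding scale lets the model "simply omit" N/A values. The Python sketch below splits such a string into a dictionary. Skipping `...` entries, mapping `N/A` to `None`, and treating tags absent from the string as N/A are assumptions about post-processing, not a description of the authors' code. `parse_wide_codes` is a hypothetical helper.

```python
def parse_wide_codes(code_string, all_tags=()):
    """Parse 'tag:value, tag:value, ...' into {tag: -1, 0, 1, or None}.

    Tags listed in `all_tags` but absent from the string are set to None,
    because the wide-format prompt allows N/A values to be omitted.
    """
    codes = {tag: None for tag in all_tags}
    for item in code_string.split(","):
        item = item.strip()
        if not item or item == "...":
            continue  # "..." marks elided values in the shortened example
        tag, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed entry: {item!r}")
        value = value.strip()
        if value.upper() == "N/A":
            codes[tag.strip()] = None
            continue
        code = int(value)
        if code not in (-1, 0, 1):
            raise ValueError(f"code outside the scale: {item!r}")
        codes[tag.strip()] = code
    return codes


example = ("capable_ambitious_intelligent:N/A, influential:N/A, ..., utility:1, "
           "..., truthful:1, ..., equality_social_justice:0, ..., "
           "relationship_seeking:N/A")
codes = parse_wide_codes(example, all_tags=["privacy", "utility"])
print(codes)
```

Passing the full list of value tags as `all_tags` yields one entry per codebook value. That makes the parsed wide-format output directly comparable with the per-value entries of the tall format.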

## Appendix D Valence distribution table

Table 2. Valence distribution of mentioned Values as coded by the human annotator. Percentages give the breakdown of sentiment (Positive, Mixed, or Negative) conditional on the value being mentioned; non-mentioned Values are excluded.
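The caption makes each percentage conditional on a value being mentioned. N/A codes are dropped first, and the shares of positive (1), mixed (0), and negative (-1) codes are computed over what remains. A small Python sketch of that calculation follows. The input format (one dictionary of codes per response, with `None` for N/A) and the helper name `valence_distribution` are assumptions, and the numbers are toy data, not the study's.

```python
from collections import Counter

LABELS = {1: "Positive", 0: "Mixed", -1: "Negative"}


def valence_distribution(coded_responses, value):
    """Percent Positive/Mixed/Negative for `value`, excluding non-mentions."""
    mentioned = [r[value] for r in coded_responses if r.get(value) is not None]
    if not mentioned:
        return {}  # the value was never mentioned, so no breakdown exists
    counts = Counter(mentioned)
    n = len(mentioned)
    return {LABELS[c]: round(100 * counts[c] / n, 1) for c in (1, 0, -1)}


# Toy data (not from the paper): the fourth response does not mention truthfulness.
toy = [
    {"truthful": 1, "human_likeness": -1},
    {"truthful": 1, "human_likeness": 1},
    {"truthful": 0, "human_likeness": None},
    {"truthful": None, "human_likeness": -1},
]
print(valence_distribution(toy, "truthful"))  # {'Positive': 66.7, 'Mixed': 33.3, 'Negative': 0.0}
```

The denominator is the number of responses that mention the value (three here), not the total number of responses (four).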

## Appendix E Controversy scores table

Table 3. Codes more than 1 SD from the mean controversy score
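Table 3 keeps only the codes whose controversy score lies more than one standard deviation from the mean. The score itself is defined in the main text and is not reproduced here, so the Python sketch below takes the scores as given and applies only the 1 SD filter. Using the sample standard deviation (`statistics.stdev`) and flagging deviations in both directions are assumptions. `outlying_codes` is a hypothetical helper, and the scores are made up.

```python
import statistics


def outlying_codes(scores, k=1.0):
    """Return {code: z-score} for codes more than k SDs from the mean score.

    Assumptions: sample standard deviation, deviations in either direction.
    """
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        code: round((s - mean) / sd, 2)
        for code, s in scores.items()
        if abs(s - mean) > k * sd
    }


# Illustrative scores only; these are not the values behind Table 3.
scores = {"human_likeness": 0.9, "social_order": 0.8, "truthful": 0.1,
          "utility": 0.2, "politeness": 0.3, "privacy": 0.25}
print(outlying_codes(scores))
```

With only a few dozen codes, the choice between the sample and population standard deviation can move borderline codes in or out of the table. Here `statistics.pstdev` would be the population alternative.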
## Appendix F Regression results

### F.1. Like strict regressions

Table 4. Truthfulness - using like strict coding

Table 5. Helpful Friendship & Love - using like strict coding

Table 6. Utility - using like strict coding

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| **age** | | | |
| 18-24 years old | – | – | |
| 25-34 years old | -0.02 | -0.09, 0.05 | 0.6 |
| 35-44 years old | -0.05 | -0.15, 0.04 | 0.3 |
| 45-54 years old | -0.03 | -0.13, 0.07 | 0.6 |
| 55-64 years old | 0.06 | -0.05, 0.17 | 0.3 |
| 65+ years old | 0.03 | -0.12, 0.18 | 0.7 |
| Prefer not to say | 1.2 | 0.12, 2.2 | 0.028 |
| **gender** | | | |
| Female | – | – | |
| Male | 0.02 | -0.03, 0.07 | 0.5 |
| Non-binary / third gender | 0.00 | -0.21, 0.20 | >0.9 |
| Prefer not to say | -0.22 | -0.72, 0.29 | 0.4 |
| **employment_status** | | | |
| Homemaker / Stay-at-home parent | – | – | |
| Prefer not to say | 0.04 | -0.23, 0.31 | 0.8 |
| Retired | 0.04 | -0.16, 0.24 | 0.7 |
| Student | 0.05 | -0.12, 0.23 | 0.5 |
| Unemployed, not seeking work | 0.20 | -0.02, 0.42 | 0.069 |
| Unemployed, seeking work | 0.11 | -0.07, 0.29 | 0.2 |
| Working full-time | 0.10 | -0.06, 0.26 | 0.2 |
| Working part-time | 0.16 | 0.00, 0.33 | 0.056 |
| **education** | | | |
| Completed Primary School | – | – | |
| Completed Secondary School | -0.05 | -0.31, 0.20 | 0.7 |
| Graduate / Professional degree | -0.02 | -0.27, 0.23 | 0.9 |
| Prefer not to say | -0.23 | -0.65, 0.19 | 0.3 |
| Some Primary | -0.28 | -0.83, 0.26 | 0.3 |
| Some Secondary | -0.07 | -0.38, 0.24 | 0.6 |
| Some University but no degree | -0.05 | -0.30, 0.20 | 0.7 |
| University Bachelors Degree | -0.04 | -0.28, 0.21 | 0.8 |
| Vocational | -0.01 | -0.27, 0.24 | >0.9 |
| **marital_status** | | | |
| Divorced / Separated | – | – | |
| Married | 0.04 | -0.06, 0.14 | 0.5 |
| Never been married | 0.04 | -0.07, 0.14 | 0.5 |
| Prefer not to say | 0.08 | -0.14, 0.30 | 0.5 |
| Widowed | -0.15 | -0.40, 0.09 | 0.2 |
| **english_proficiency** | | | |
| Advanced | – | – | |
| Basic | -0.24 | -0.58, 0.10 | 0.2 |
| Fluent | 0.02 | -0.06, 0.11 | 0.6 |
| Intermediate | -0.09 | -0.25, 0.06 | 0.2 |
| Native speaker | -0.03 | -0.13, 0.07 | 0.6 |
| **culture_area** | | | |
| Anglophone | – | – | |
| Asia (unspecified) | 0.12 | -0.11, 0.35 | 0.3 |
| Central/Eastern Europe | -0.02 | -0.15, 0.10 | 0.7 |
| Germanic Europe | -0.04 | -0.17, 0.10 | 0.6 |
| Latin America | -0.01 | -0.12, 0.10 | 0.8 |
| Mediterranean Europe | 0.02 | -0.11, 0.16 | 0.7 |
| Middle East | -0.01 | -0.14, 0.12 | 0.9 |
| Nordic | 0.05 | -0.10, 0.19 | 0.5 |
| Sub-Saharan Africa | 0.09 | -0.02, 0.20 | 0.12 |
| **lm_familiarity** | | | |
| Not familiar at all | – | – | |
| Somewhat familiar | 0.10 | -0.04, 0.24 | 0.2 |
| Very familiar | 0.10 | -0.06, 0.25 | 0.2 |
| **lm_direct_use** | | | |
| No | – | – | |
| Unsure | -0.04 | -0.23, 0.15 | 0.7 |
| Yes | 0.05 | -0.07, 0.16 | 0.4 |
| **lm_frequency_use** | | | |
| Every day | – | – | |
| Every week | -0.03 | -0.13, 0.06 | 0.5 |
| Less than one a year | 0.00 | -0.12, 0.12 | >0.9 |
| More than once a month | 0.01 | -0.09, 0.10 | >0.9 |
| Once per month | -0.02 | -0.12, 0.08 | 0.7 |
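In the regression tables, the first level listed for each characteristic (18-24 years old, Female, Homemaker / Stay-at-home parent, Completed Primary School, Anglophone, and so on) has no estimate. It is the reference category, and each Beta is read as a contrast against that level, presumably from one model that includes all the listed characteristics. The Python sketch below shows this treatment (dummy) coding. It assumes the first listed level is the reference, and `treatment_code` is a hypothetical helper, not the authors' code.

```python
def treatment_code(values, levels):
    """Indicator (dummy) columns for `values`, using levels[0] as reference."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    non_reference = levels[1:]
    # The reference level becomes an all-zero row, so its effect is absorbed
    # by the intercept and each coefficient is a contrast against it.
    return [[int(v == level) for level in non_reference] for v in values]


gender_levels = ["Female", "Male", "Non-binary / third gender", "Prefer not to say"]
observed = ["Male", "Female", "Non-binary / third gender"]
rows = treatment_code(observed, gender_levels)
print(gender_levels[1:])
print(rows)  # [[1, 0, 0], [0, 0, 0], [0, 1, 0]]
```

Under this coding, a Beta of 0.02 for Male means the fitted outcome for male respondents is 0.02 higher than for female respondents, all else equal. It says nothing directly about the other gender levels.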

Table 7. Politeness - using like strict coding

Table 8. National Security Family - using like strict coding

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| **age** | | | |
| 18-24 years old | – | – | |
| 25-34 years old | -0.03 | -0.10, 0.05 | 0.5 |
| 35-44 years old | 0.01 | -0.08, 0.10 | 0.8 |
| 45-54 years old | 0.02 | -0.07, 0.12 | 0.6 |
| 55-64 years old | 0.03 | -0.08, 0.14 | 0.6 |
| 65+ years old | -0.01 | -0.16, 0.14 | 0.9 |
| Prefer not to say | -0.85 | -1.9, 0.18 | 0.10 |
| **gender** | | | |
| Female | – | – | |
| Male | 0.03 | -0.02, 0.08 | 0.2 |
| Non-binary / third gender | 0.30 | 0.09, 0.50 | 0.004 |
| Prefer not to say | 0.17 | -0.32, 0.66 | 0.5 |
| **employment_status** | | | |
| Homemaker / Stay-at-home parent | – | – | |
| Prefer not to say | 0.11 | -0.16, 0.37 | 0.4 |
| Retired | -0.14 | -0.34, 0.05 | 0.15 |
| Student | -0.09 | -0.26, 0.08 | 0.3 |
| Unemployed, not seeking work | -0.22 | -0.43, -0.01 | 0.040 |
| Unemployed, seeking work | -0.10 | -0.27, 0.08 | 0.3 |
| Working full-time | -0.12 | -0.27, 0.04 | 0.14 |
| Working part-time | -0.08 | -0.24, 0.08 | 0.3 |
| **education** | | | |
| Completed Primary School | – | – | |
| Completed Secondary School | 0.13 | -0.12, 0.38 | 0.3 |
| Graduate / Professional degree | 0.15 | -0.10, 0.40 | 0.2 |
| Prefer not to say | 0.34 | -0.08, 0.75 | 0.11 |
| Some Primary | -0.19 | -0.72, 0.35 | 0.5 |
| Some Secondary | 0.10 | -0.20, 0.40 | 0.5 |
| Some University but no degree | 0.13 | -0.11, 0.38 | 0.3 |
| University Bachelors Degree | 0.12 | -0.12, 0.36 | 0.3 |
| Vocational | 0.02 | -0.23, 0.27 | 0.9 |
| **marital_status** | | | |
| Divorced / Separated | – | – | |
| Married | 0.01 | -0.09, 0.11 | 0.8 |
| Never been married | 0.03 | -0.07, 0.14 | 0.5 |
| Prefer not to say | 0.08 | -0.13, 0.30 | 0.5 |
| Widowed | 0.22 | -0.02, 0.45 | 0.072 |
| **english_proficiency** | | | |
| Advanced | – | – | |
| Basic | -0.21 | -0.55, 0.12 | 0.2 |
| Fluent | -0.04 | -0.12, 0.04 | 0.3 |
| Intermediate | -0.13 | -0.28, 0.03 | 0.11 |
| Native speaker | -0.03 | -0.13, 0.07 | 0.6 |
| **culture_area** | | | |
| Anglophone | – | – | |
| Asia (unspecified) | -0.03 | -0.26, 0.19 | 0.8 |
| Central/Eastern Europe | 0.02 | -0.10, 0.14 | 0.7 |
| Germanic Europe | -0.03 | -0.16, 0.10 | 0.7 |
| Latin America | 0.00 | -0.11, 0.10 | >0.9 |
| Mediterranean Europe | 0.02 | -0.11, 0.15 | 0.8 |
| Middle East | -0.02 | -0.15, 0.10 | 0.7 |
| Nordic | -0.04 | -0.18, 0.11 | 0.6 |
| Sub-Saharan Africa | -0.03 | -0.13, 0.08 | 0.6 |
| **lm_familiarity** | | | |
| Not familiar at all | – | – | |
| Somewhat familiar | 0.05 | -0.09, 0.19 | 0.5 |
| Very familiar | 0.05 | -0.10, 0.21 | 0.5 |
| **lm_direct_use** | | | |
| No | – | – | |
| Unsure | -0.03 | -0.21, 0.16 | 0.8 |
| Yes | 0.00 | -0.11, 0.12 | >0.9 |
| **lm_frequency_use** | | | |
| Every day | – | – | |
| Every week | 0.06 | -0.03, 0.15 | 0.2 |
| Less than one a year | -0.02 | -0.14, 0.09 | 0.7 |
| More than once a month | 0.02 | -0.08, 0.11 | 0.8 |
| Once per month | 0.04 | -0.06, 0.14 | 0.4 |
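As a rough consistency check on these tables, a p-value can be recovered from a Beta and its 95% CI if the interval is a normal-approximation (Wald) interval. In that case SE ≈ (upper - lower) / (2 × 1.96). This is an assumption about how the intervals were computed, and rounding in the tables makes the check approximate. For the Non-binary / third gender row of Table 8 (Beta 0.30, 95% CI 0.09 to 0.50, reported p = 0.004), the sketch below gives p ≈ 0.004. `p_from_ci` is a hypothetical helper.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_from_ci(beta, lower, upper, z_crit=1.96):
    """Two-sided p-value implied by a Wald 95% CI (normal approximation)."""
    se = (upper - lower) / (2.0 * z_crit)  # half-width divided by 1.96
    z = beta / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Non-binary / third gender row of Table 8: Beta 0.30, 95% CI 0.09 to 0.50.
p = p_from_ci(0.30, 0.09, 0.50)
print(round(p, 3))  # 0.004
```

Rows whose bounds are printed to only one decimal (for example -1.0) can drift further from the reported p-value, so this is a sanity check and not a reproduction.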

Table 9. Interpretability - using like strict coding

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| **age** | | | |
| 18-24 years old | – | – | |
| 25-34 years old | 0.03 | -0.04, 0.10 | 0.4 |
| 35-44 years old | 0.08 | -0.01, 0.17 | 0.074 |
| 45-54 years old | 0.07 | -0.02, 0.17 | 0.13 |
| 55-64 years old | 0.03 | -0.07, 0.13 | 0.6 |
| 65+ years old | 0.04 | -0.10, 0.17 | 0.6 |
| Prefer not to say | 0.20 | -0.77, 1.2 | 0.7 |
| **gender** | | | |
| Female | – | – | |
| Male | 0.01 | -0.03, 0.06 | 0.6 |
| Non-binary / third gender | 0.21 | 0.02, 0.40 | 0.029 |
| Prefer not to say | -0.06 | -0.53, 0.40 | 0.8 |
| **employment_status** | | | |
| Homemaker / Stay-at-home parent | – | – | |
| Prefer not to say | -0.07 | -0.32, 0.18 | 0.6 |
| Retired | 0.09 | -0.10, 0.27 | 0.3 |
| Student | 0.08 | -0.08, 0.24 | 0.3 |
| Unemployed, not seeking work | -0.01 | -0.21, 0.19 | 0.9 |
| Unemployed, seeking work | 0.00 | -0.16, 0.17 | >0.9 |
| Working full-time | 0.01 | -0.14, 0.15 | >0.9 |
| Working part-time | -0.04 | -0.19, 0.11 | 0.6 |
| **education** | | | |
| Completed Primary School | – | – | |
| Completed Secondary School | -0.01 | -0.24, 0.23 | >0.9 |
| Graduate / Professional degree | 0.09 | -0.14, 0.32 | 0.5 |
| Prefer not to say | -0.17 | -0.56, 0.23 | 0.4 |
| Some Primary | -0.24 | -0.74, 0.27 | 0.4 |
| Some Secondary | -0.13 | -0.41, 0.15 | 0.4 |
| Some University but no degree | -0.01 | -0.24, 0.23 | >0.9 |
| University Bachelors Degree | 0.03 | -0.20, 0.26 | 0.8 |
| Vocational | -0.01 | -0.25, 0.23 | >0.9 |
| **marital_status** | | | |
| Divorced / Separated | – | – | |
| Married | -0.02 | -0.12, 0.07 | 0.7 |
| Never been married | 0.01 | -0.09, 0.11 | 0.8 |
| Prefer not to say | -0.03 | -0.23, 0.18 | 0.8 |
| Widowed | 0.08 | -0.14, 0.31 | 0.5 |
| **english_proficiency** | | | |
| Advanced | – | – | |
| Basic | -0.14 | -0.45, 0.17 | 0.4 |
| Fluent | -0.04 | -0.12, 0.04 | 0.3 |
| Intermediate | 0.07 | -0.07, 0.22 | 0.3 |
| Native speaker | 0.01 | -0.09, 0.10 | 0.9 |
| **culture_area** | | | |
| Anglophone | – | – | |
| Asia (unspecified) | 0.05 | -0.16, 0.26 | 0.6 |
| Central/Eastern Europe | -0.09 | -0.21, 0.02 | 0.12 |
| Germanic Europe | 0.01 | -0.12, 0.13 | >0.9 |
| Latin America | -0.04 | -0.15, 0.06 | 0.4 |
| Mediterranean Europe | -0.08 | -0.20, 0.05 | 0.2 |
| Middle East | 0.06 | -0.06, 0.17 | 0.4 |
| Nordic | -0.03 | -0.16, 0.11 | 0.7 |
| Sub-Saharan Africa | -0.04 | -0.14, 0.06 | 0.4 |
| **lm_familiarity** | | | |
| Not familiar at all | – | – | |
| Somewhat familiar | -0.01 | -0.15, 0.12 | 0.9 |
| Very familiar | -0.01 | -0.16, 0.13 | 0.9 |
| **lm_direct_use** | | | |
| No | – | – | |
| Unsure | 0.03 | -0.15, 0.20 | 0.8 |
| Yes | -0.04 | -0.14, 0.07 | 0.5 |
| **lm_frequency_use** | | | |
| Every day | – | – | |
| Every week | 0.02 | -0.07, 0.11 | 0.6 |
| Less than one a year | 0.05 | -0.06, 0.15 | 0.4 |
| More than once a month | 0.01 | -0.08, 0.10 | 0.9 |
| Once per month | 0.04 | -0.06, 0.13 | 0.4 |

Table 10. Prudent - using like strict coding

Table 11. Customisation - using like strict coding

Table 12. Equality Social Justice - using like strict coding

Table 13. Varied Life Diversity - using like strict coding

Table 14. Creativity & Curiosity - using like strict coding

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| **age** | | | |
| 18-24 years old | – | – | |
| 25-34 years old | 0.05 | 0.00, 0.10 | 0.061 |
| 35-44 years old | -0.01 | -0.08, 0.05 | 0.7 |
| 45-54 years old | 0.02 | -0.05, 0.09 | 0.6 |
| 55-64 years old | 0.00 | -0.07, 0.08 | >0.9 |
| 65+ years old | 0.03 | -0.07, 0.14 | 0.5 |
| Prefer not to say | 0.12 | -0.61, 0.85 | 0.7 |
| **gender** | | | |
| Female | – | – | |
| Male | -0.04 | -0.08, -0.01 | 0.012 |
| Non-binary / third gender | -0.05 | -0.19, 0.10 | 0.5 |
| Prefer not to say | -0.13 | -0.48, 0.22 | 0.5 |
| **employment_status** | | | |
| Homemaker / Stay-at-home parent | – | – | |
| Prefer not to say | 0.06 | -0.13, 0.24 | 0.6 |
| Retired | 0.00 | -0.14, 0.14 | >0.9 |
| Student | -0.03 | -0.15, 0.09 | 0.6 |
| Unemployed, not seeking work | 0.16 | 0.01, 0.31 | 0.031 |
| Unemployed, seeking work | 0.02 | -0.11, 0.14 | 0.8 |
| Working full-time | 0.00 | -0.11, 0.11 | >0.9 |
| Working part-time | 0.01 | -0.10, 0.13 | 0.8 |
| **education** | | | |
| Completed Primary School | – | – | |
| Completed Secondary School | 0.02 | -0.15, 0.20 | 0.8 |
| Graduate / Professional degree | 0.01 | -0.16, 0.19 | 0.9 |
| Prefer not to say | -0.08 | -0.38, 0.21 | 0.6 |
| Some Primary | -0.12 | -0.49, 0.26 | 0.6 |
| Some Secondary | -0.01 | -0.22, 0.21 | >0.9 |
| Some University but no degree | 0.02 | -0.15, 0.20 | 0.8 |
| University Bachelors Degree | 0.01 | -0.17, 0.18 | >0.9 |
| Vocational | 0.05 | -0.13, 0.23 | 0.6 |
| **marital_status** | | | |
| Divorced / Separated | – | – | |
| Married | -0.02 | -0.09, 0.05 | 0.6 |
| Never been married | 0.00 | -0.08, 0.07 | >0.9 |
| Prefer not to say | -0.06 | -0.21, 0.09 | 0.5 |
| Widowed | -0.09 | -0.25, 0.08 | 0.3 |
| **english_proficiency** | | | |
| Advanced | – | – | |
| Basic | 0.07 | -0.16, 0.31 | 0.5 |
| Fluent | 0.04 | -0.02, 0.10 | 0.2 |
| Intermediate | -0.04 | -0.15, 0.07 | 0.5 |
| Native speaker | 0.01 | -0.06, 0.08 | 0.8 |
| **culture_area** | | | |
| Anglophone | – | – | |
| Asia (unspecified) | 0.15 | -0.01, 0.31 | 0.062 |
| Central/Eastern Europe | 0.01 | -0.08, 0.09 | 0.9 |
| Germanic Europe | -0.05 | -0.14, 0.05 | 0.3 |
| Latin America | 0.01 | -0.07, 0.08 | 0.8 |
| Mediterranean Europe | 0.01 | -0.09, 0.10 | 0.9 |
| Middle East | 0.02 | -0.07, 0.11 | 0.6 |
| Nordic | 0.07 | -0.03, 0.18 | 0.15 |
| Sub-Saharan Africa | -0.07 | -0.15, 0.00 | 0.059 |
| **lm_familiarity** | | | |
| Not familiar at all | – | – | |
| Somewhat familiar | 0.02 | -0.08, 0.12 | 0.8 |
| Very familiar | 0.02 | -0.09, 0.13 | 0.7 |
| **lm_direct_use** | | | |
| No | – | – | |
| Unsure | 0.01 | -0.12, 0.14 | 0.9 |
| Yes | -0.01 | -0.09, 0.07 | 0.7 |
| **lm_frequency_use** | | | |
| Every day | – | – | |
| Every week | -0.03 | -0.10, 0.03 | 0.3 |
| Less than one a year | -0.07 | -0.16, 0.01 | 0.070 |
| More than once a month | -0.02 | -0.09, 0.04 | 0.5 |
| Once per month | -0.03 | -0.10, 0.04 | 0.4 |

### F.2. Dislike strict regression

Table 15. Influential - using dislike strict coding
### F.3. Sentiment broad regressions

Table 16. Social Order - using sentiment broad coding
Table 17. Human Likeness, using sentiment broad coding

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| **age** | | | |
| 18-24 years old | ref. | | |
| 25-34 years old | -0.14 | -0.35, 0.07 | 0.2 |
| 35-44 years old | -0.22 | -0.51, 0.07 | 0.13 |
| 45-54 years old | -0.19 | -0.50, 0.12 | 0.2 |
| 55-64 years old | -0.22 | -0.57, 0.13 | 0.2 |
| 65+ years old | -0.09 | -0.72, 0.54 | 0.8 |
| **gender** | | | |
| Female | ref. | | |
| Male | -0.08 | -0.25, 0.09 | 0.3 |
| Non-binary / third gender | -0.79 | -1.9, 0.29 | 0.15 |
| **employment_status** | | | |
| Homemaker / Stay-at-home parent | ref. | | |
| Retired | -0.16 | -1.3, 0.94 | 0.8 |
| Student | -0.15 | -1.2, 0.88 | 0.8 |
| Unemployed, not seeking work | 0.18 | -0.91, 1.3 | 0.7 |
| Unemployed, seeking work | -0.10 | -1.1, 0.93 | 0.8 |
| Working full-time | 0.18 | -0.83, 1.2 | 0.7 |
| Working part-time | 0.02 | -1.0, 1.0 | >0.9 |
| **education** | | | |
| Completed Primary School | ref. | | |
| Completed Secondary School | -0.49 | -1.6, 0.59 | 0.4 |
| Graduate / Professional degree | -0.74 | -1.8, 0.34 | 0.2 |
| Some Primary | 0.01 | -1.3, 1.3 | >0.9 |
| Some Secondary | -0.46 | -1.6, 0.69 | 0.4 |
| Some University but no degree | -0.53 | -1.6, 0.55 | 0.3 |
| University Bachelors Degree | -0.61 | -1.7, 0.47 | 0.3 |
| Vocational | -0.60 | -1.7, 0.49 | 0.3 |
| **marital_status** | | | |
| Divorced / Separated | ref. | | |
| Married | -0.16 | -0.45, 0.12 | 0.3 |
| Never been married | -0.24 | -0.52, 0.04 | 0.092 |
| Prefer not to say | 0.53 | -0.48, 1.6 | 0.3 |
| Widowed | 0.00 | -0.85, 0.85 | >0.9 |
| **english_proficiency** | | | |
| Advanced | ref. | | |
| Basic | 0.85 | 0.05, 1.7 | 0.038 |
| Fluent | 0.30 | 0.02, 0.58 | 0.033 |
| Intermediate | 0.30 | -0.31, 0.90 | 0.3 |
| Native speaker | -0.04 | -0.34, 0.27 | 0.8 |
| **culture_area** | | | |
| Anglophone | ref. | | |
| Asia (unspecified) | -0.21 | -0.93, 0.52 | 0.6 |
| Central/Eastern Europe | -0.30 | -0.70, 0.09 | 0.13 |
| Germanic Europe | -0.63 | -1.0, -0.24 | 0.002 |
| Latin America | -0.44 | -0.90, 0.02 | 0.064 |
| Mediterranean Europe | -0.09 | -0.50, 0.32 | 0.7 |
| Middle East | -0.13 | -0.57, 0.31 | 0.6 |
| Nordic | -0.29 | -0.75, 0.16 | 0.2 |
| Sub-Saharan Africa | 0.01 | -0.28, 0.30 | >0.9 |
| **lm_familiarity** | | | |
| Not familiar at all | ref. | | |
| Somewhat familiar | -0.44 | -0.98, 0.10 | 0.11 |
| Very familiar | -0.41 | -0.98, 0.16 | 0.2 |
| **lm_direct_use** | | | |
| No | ref. | | |
| Unsure | 0.30 | -0.52, 1.1 | 0.5 |
| Yes | -0.11 | -0.46, 0.23 | 0.5 |
| **lm_frequency_use** | | | |
| Every day | ref. | | |
| Every week | 0.09 | -0.19, 0.38 | 0.5 |
| Less than once a year | -0.05 | -0.39, 0.29 | 0.8 |
| More than once a month | 0.29 | -0.02, 0.59 | 0.064 |
| Once per month | 0.18 | -0.13, 0.49 | 0.3 |
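As a reading aid for Table 17, the sketch below transcribes a subset of its rows by hand and flags the levels whose 95% CI lies entirely on one side of zero. Each Beta reads as a contrast against its variable's reference level (Advanced for english_proficiency, Anglophone for culture_area). The `Row` type and `TABLE_17` list are hypothetical helpers, not code from the paper, and the consistency check assumes the intervals and p-values come from the same Wald-type test, which the table itself does not state.

```python
from typing import NamedTuple


class Row(NamedTuple):
    variable: str
    level: str
    beta: float
    ci_low: float
    ci_high: float
    p: float  # ">0.9" entries are stored as 0.9 (a lower bound)


# Hypothetical hand transcription of a subset of Table 17.
# Betas are relative to the reference level of each variable.
TABLE_17 = [
    Row("english_proficiency", "Basic", 0.85, 0.05, 1.7, 0.038),
    Row("english_proficiency", "Fluent", 0.30, 0.02, 0.58, 0.033),
    Row("english_proficiency", "Intermediate", 0.30, -0.31, 0.90, 0.3),
    Row("english_proficiency", "Native speaker", -0.04, -0.34, 0.27, 0.8),
    Row("culture_area", "Asia (unspecified)", -0.21, -0.93, 0.52, 0.6),
    Row("culture_area", "Central/Eastern Europe", -0.30, -0.70, 0.09, 0.13),
    Row("culture_area", "Germanic Europe", -0.63, -1.0, -0.24, 0.002),
    Row("culture_area", "Latin America", -0.44, -0.90, 0.02, 0.064),
    Row("culture_area", "Mediterranean Europe", -0.09, -0.50, 0.32, 0.7),
    Row("culture_area", "Middle East", -0.13, -0.57, 0.31, 0.6),
    Row("culture_area", "Nordic", -0.29, -0.75, 0.16, 0.2),
    Row("culture_area", "Sub-Saharan Africa", 0.01, -0.28, 0.30, 0.9),
]


def ci_excludes_zero(row: Row) -> bool:
    """True when the whole 95% CI lies strictly above or strictly below zero."""
    return row.ci_low > 0 or row.ci_high < 0


significant = [r.level for r in TABLE_17 if ci_excludes_zero(r)]
print(significant)  # ['Basic', 'Fluent', 'Germanic Europe']

# Under a Wald-type test (an assumption here), "95% CI excludes 0"
# and "p < 0.05" should agree row by row.
mismatches = [r.level for r in TABLE_17 if ci_excludes_zero(r) != (r.p < 0.05)]
print(mismatches)  # []
```

On this subset only Basic, Fluent, and Germanic Europe clear the 0.05 threshold; Latin America (p = 0.064) narrowly misses because its upper bound is 0.02.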

### F.4. Sentiment strict regressions

Table 18. Social Order, using sentiment strict coding
Table 19. Human Likeness, using sentiment strict coding
