DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping

arXiv cs.AI 05/15/26, 04:00 AM Papers
Summary
This paper introduces DVMap, a framework for fine-grained pluralistic value alignment in LLMs that uses high-consensus demographic-value mapping instead of coarse national labels, achieving strong generalization across demographics, countries, and values.
arXiv:2605.14420v1 Announce Type: new Abstract: Current Large Language Models (LLMs) typically rely on coarse-grained national labels for pluralistic value alignment. However, such macro-level supervision often obscures intra-country value heterogeneity, yielding a loose alignment. We argue that resolving this limitation requires shifting from national labels to multi-dimensional demographic constraints, which can identify groups with predictable, high-consensus value preference. To this end, we propose DVMap (High-Consensus Demographic-Value Mapping), a framework for fine-grained pluralistic value alignment. In this framework, we first present a demographic archetype extraction strategy to construct a high-quality value alignment corpus of 56,152 samples from the World Values Survey (WVS) by strictly retaining respondents with consistent value preferences under identical demographics. Over this corpus, we introduce a Structured Chain-of-Thought (CoT) mechanism that explicitly guides LLMs to reason about demographic-value correlations. Subsequently, we employ Group Relative Policy Optimization (GRPO) to achieve adaptive anchoring of value distributions. To rigorously evaluate generalization, we further establish a triple-generalization benchmark (spanning cross-demographic, cross-country, and cross-value) comprising 21,553 samples. Experimental results demonstrate that DVMap effectively learns the manifold mapping from demographics to values, exhibiting strong generalization and robustness. On cross-demographic tests, Qwen3-8B-DVMap achieves 48.6% accuracy, surpassing the advanced open-source LLM DeepSeek-v3.2 (45.1%). The source code and dataset are available at https://github.com/EnlightenedAI/DVMap.
# DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping
Source: [https://arxiv.org/html/2605.14420](https://arxiv.org/html/2605.14420)
Pengyun Zhu, Yuqi Ren\*, Zhen Wang, Lei Yang, Deyi Xiong\* (\*Corresponding authors). TJUNLP Lab, School of Computer Science and Technology, Tianjin University, China. \{pengyunzhu, ryq20, tjwangzhen, yanglei\_9, dyxiong\}@tju.edu.cn

###### Abstract

Current Large Language Models (LLMs) typically rely on coarse-grained national labels for pluralistic value alignment. However, such macro-level supervision often obscures intra-country value heterogeneity, yielding a loose alignment. We argue that resolving this limitation requires shifting from national labels to multi-dimensional demographic constraints, which can identify groups with predictable, high-consensus value preference. To this end, we propose DVMap (High-Consensus Demographic-Value Mapping), a framework for fine-grained pluralistic value alignment. In this framework, we first present a demographic archetype extraction strategy to construct a high-quality value alignment corpus of 56,152 samples from the World Values Survey (WVS) by strictly retaining respondents with consistent value preferences under identical demographics. Over this corpus, we introduce a Structured Chain-of-Thought (CoT) mechanism that explicitly guides LLMs to reason about demographic-value correlations. Subsequently, we employ Group Relative Policy Optimization (GRPO) to achieve adaptive anchoring of value distributions. To rigorously evaluate generalization, we further establish a triple-generalization benchmark (spanning cross-demographic, cross-country, and cross-value) comprising 21,553 samples. Experimental results demonstrate that DVMap effectively learns the manifold mapping from demographics to values, exhibiting strong generalization and robustness. On cross-demographic tests, Qwen3-8B-DVMap achieves 48.6% accuracy, surpassing the advanced open-source LLM DeepSeek-v3.2 (45.1%). The source code and dataset are available at [https://github.com/EnlightenedAI/DVMap](https://github.com/EnlightenedAI/DVMap).

## 1 Introduction

As LLMs become deeply integrated into social applications such as advisory systems, personalized assistants, and role-playing agents Wiggins and Tejani ([2022](https://arxiv.org/html/2605.14420#bib.bib1)); Shen et al. ([2023b](https://arxiv.org/html/2605.14420#bib.bib8)); Kasneci et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib10)); Peng et al. ([2025](https://arxiv.org/html/2605.14420#bib.bib5)), aligning LLM behavior with human values emerges as a central challenge in AI safety Askell et al. ([2021](https://arxiv.org/html/2605.14420#bib.bib26)); Hendrycks et al. ([2021](https://arxiv.org/html/2605.14420#bib.bib28)); Park et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib34)); Andreas ([2022](https://arxiv.org/html/2605.14420#bib.bib53)); Shen et al. ([2023a](https://arxiv.org/html/2605.14420#bib.bib7)); Xu et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib3)). However, dominated by English-centric training corpora Wang et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib47)); Gao et al. ([2021](https://arxiv.org/html/2605.14420#bib.bib51)), current mainstream LLMs exhibit significant cultural biases, specifically manifesting as an excessive partiality towards Western values Johnson et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib27)); Shen et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib46)); Durmus et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib29)); Liu et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib50)); Santurkar et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib30)).

To mitigate this dominance of Western values, recent research has increasingly turned toward pluralistic value alignment, aiming to equip LLMs with culturally aware reasoning capabilities. These initiatives primarily focus on prompt engineering Cao et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib48)); Lahoti et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib41)); Kovac et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib38)) or fine-tuning on culture-specific datasets Li et al. ([2024a](https://arxiv.org/html/2605.14420#bib.bib49), [b](https://arxiv.org/html/2605.14420#bib.bib36)); Feng et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib40)). However, these methods typically either rely on an over-idealized assumption that the model already holds sufficient cultural knowledge Li et al. ([2024a](https://arxiv.org/html/2605.14420#bib.bib49)) or employ macroscopic geographic labels (e.g., prompting the LLM to “answer like a Japanese person”), neglecting the substantial intra-country heterogeneity Kovac et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib38)), as empirically analyzed in Section [3](https://arxiv.org/html/2605.14420#S3).

![Refer to caption](https://arxiv.org/html/2605.14420v1/x1.png)\(a\)Intra\-country Heterogeneous
![Refer to caption](https://arxiv.org/html/2605.14420v1/x2.png)\(b\)Entropy Distribution
![Refer to caption](https://arxiv.org/html/2605.14420v1/x3.png)\(c\)Demographic Attribute Importance

Figure 1: Analysis of Demographic-Value Consensus in WVS Wave 7. (a) The high-entropy distribution of a specific intra-country heterogeneity question. (b) The distribution of Shannon entropy across all survey questions in the USA. (c) Attribute importance heatmap derived from Random Forest, ranking demographic attributes by their predictive power on various value questions.

To address this issue, we propose High-Consensus Demographic-Value Mapping (DVMap), a framework for fine-grained pluralistic value alignment. Instead of relying on broad national labels, DVMap shifts the alignment granularity to multi-dimensional demographic attributes. Specifically, based on the World Values Survey (WVS) Wave 7 Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52)), we propose a demographic archetype extraction strategy that measures demographic-value consistency via Shannon entropy, to construct a high-consensus demographic value alignment corpus. By filtering out low-consensus samples, we retain only demographic groups characterized by high internal agreement in value preferences. Our corpus covers 10 countries and 16 values, containing 56,152 high-quality samples.

We further introduce a Structured CoT mechanism that guides the LLMs to explicitly elucidate the sociological link between demographic attributes and value preferences\. For optimization, we employ GRPO with binary outcome rewards, fully leveraging the intrinsic semantic topology of LLMs to efficiently anchor value distributions to target demographic archetypes\. To evaluate the generalization of DVMap, we establish a triple\-generalization benchmark covering cross\-demographic, cross\-country, and cross\-value scenarios\. Experimental results demonstrate that our method effectively aligns LLMs with demographic value preferences, surpassing most advanced LLMs, while exhibiting strong generalization capabilities and robustness\.

Our main contributions are summarized as follows:

- We propose DVMap, a framework for fine-grained pluralistic value alignment that operates by learning high-consensus mappings between demographic attributes and value preferences.
- We introduce an entropy-guided demographic archetype extraction strategy to distill a high-consistency demographic-value corpus from the WVS Wave 7 database, and subsequently apply structured CoT and GRPO to enhance pluralistic value alignment in LLMs.
- Experimental results demonstrate that DVMap substantially improves pluralistic value alignment, and further reveal strong generalization capabilities through a triple-generalization evaluation.

![Refer to caption](https://arxiv.org/html/2605.14420v1/x4.png) Figure 2: Overview of the DVMap Framework. (a) Data Construction: Leveraging “WVS Wave 7”, we first extract high-consensus mappings based on our “Demographic Archetype” strategy. Second, we perform “Country Sampling” guided by the Inglehart-Welzel Cultural Map Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52)). Third, we perform “Question Processing” following Pileggi ([2024](https://arxiv.org/html/2605.14420#bib.bib35)). Through these steps, we construct a high-quality “Demographic Value Alignment Corpus” and establish a “Triple-Generalization Evaluation Benchmark”. (b) Demographic Value Alignment: The policy model “$\pi_{\theta}^{*}$”, guided by “Structured CoT”, generates value-related “Rollouts”. The reward mechanism assigns “Rewards” based on these outputs, which are then used to calculate “Relative Advantage” for policy “Optimization”. (c) Demographic Value Inference Comparison: On the question of “making parents proud” (an example), the untrained LLM erroneously assumes that her non-religious beliefs and high education imply a rejection of familial expectations. In contrast, DVMap recognizes that in the context of Chinese Confucian culture, her personal independence coexists harmoniously with the traditional goal of honoring one’s parents. Note that ground-truth preferences are not provided as input; they are used exclusively for evaluation and visualization.

## 2 Related Work

#### Value Misalignment in LLMs\.

To bridge the gap between LLMs and human values, early works attempt to achieve value alignment via RLHF Ouyang et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib24)); Rafailov et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib23)); Bai et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib20)). However, empirical studies indicate that these models remain inadequately aligned with diverse human values, specifically manifesting as distinct Western partiality and stereotypes Johnson et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib27)); Durmus et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib29)), while often failing to capture non-Western cultural nuances encoded in different languages Niszczota et al. ([2025](https://arxiv.org/html/2605.14420#bib.bib44)); Arora and Goyal ([2023](https://arxiv.org/html/2605.14420#bib.bib55)); Cao et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib48)); Choenni et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib42)). This phenomenon is primarily attributed to English-centric training corpora Gao et al. ([2021](https://arxiv.org/html/2605.14420#bib.bib51)); Liu et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib50)). Furthermore, He et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib43)) highlight affective discrepancies in emotional and moral representation, while Santurkar et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib30)) and Durmus et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib29)) reveal substantial positional misalignment between model opinions and global demographic polling data. Collectively, these findings underscore a pervasive failure of current models to equitably represent the pluralistic values of cross-identity groups.

#### Pluralistic Value Alignment\.

To mitigate value bias in LLMs, recent efforts actively explore prompt engineering Cao et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib48)); Lahoti et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib41)); Kovac et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib38)) and multicultural fine-tuning Li et al. ([2024a](https://arxiv.org/html/2605.14420#bib.bib49), [b](https://arxiv.org/html/2605.14420#bib.bib36)); Feng et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib40)); Xu et al. ([2025](https://arxiv.org/html/2605.14420#bib.bib6)). However, these strategies typically rely on macroscopic categorizations such as geographic regions Li et al. ([2024a](https://arxiv.org/html/2605.14420#bib.bib49), [b](https://arxiv.org/html/2605.14420#bib.bib36)), neglecting the intrinsic heterogeneity and value conflicts within single geographic labels Durmus et al. ([2023](https://arxiv.org/html/2605.14420#bib.bib29)). Furthermore, while prompt engineering approaches based on identity attributes Choenni and Shutova ([2024](https://arxiv.org/html/2605.14420#bib.bib39)) or political stances Simmons ([2023](https://arxiv.org/html/2605.14420#bib.bib33)); AlKhamissi et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib45)) are explored, such methods often rest on an over-idealized assumption: that models possess sufficient prior knowledge to simulate complex micro-groups in a zero-shot manner Li et al. ([2024a](https://arxiv.org/html/2605.14420#bib.bib49)). To address this, DVMap bridges the gap between universal alignment and personalized alignment Guan et al. ([2025](https://arxiv.org/html/2605.14420#bib.bib37)) by providing a scalable framework at an intermediate granularity of demographic-value mapping.

## 3 Demographic Value Consensus

As an authoritative benchmark for global value research, the World Values Survey (WVS) Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52)) provides comprehensive measurements of human values across diverse dimensions. To investigate the complexity of human values and intra-country value heterogeneity, we conducted a demographic value consensus analysis on WVS Wave 7 ([https://www.worldvaluessurvey.org](https://www.worldvaluessurvey.org/)).

Figure [1(a)](https://arxiv.org/html/2605.14420#S1.F1.sf1) visualizes a representative high-entropy example ($H = 1.09$) where responses approximate a uniform distribution. Figure [1(b)](https://arxiv.org/html/2605.14420#S1.F1.sf2) shows that nearly half of the survey questions (in the USA) exhibit entropy exceeding $1.0$, indicating widespread intra-country value heterogeneity, which is frequently overlooked by coarse-grained value alignment approaches. To uncover the determinants of this heterogeneity, we used Random Forest Breiman ([2001](https://arxiv.org/html/2605.14420#bib.bib19)) (via Mean Decrease Impurity) to quantify the predictive contribution of demographic attributes. The resulting heatmap in Figure [1(c)](https://arxiv.org/html/2605.14420#S1.F1.sf3) reveals that values are highly identity-dependent: attributes like “Religion”, “Income”, or “Occupation” significantly outweigh “Country” in predicting specific domain values.
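
As a rough illustration of the entropy measure discussed above, the sketch below computes Shannon entropy over a list of categorical responses. The natural logarithm is an assumption: the paper does not state the base, although a near-uniform three-option question would give $\ln 3 \approx 1.0986$, which is in line with the reported $H = 1.09$. The function name and the toy responses are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(responses):
    """Shannon entropy (natural log, an assumed base) of the empirical
    distribution of a list of categorical responses."""
    counts = Counter(responses)
    total = len(responses)
    # Sum of p * log(1 / p) over the observed options.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Toy data (hypothetical): a perfectly split question and a unanimous one.
uniform_three = ["A", "B", "C"] * 10
unanimous = ["A"] * 30

print(round(shannon_entropy(uniform_three), 4))
print(shannon_entropy(unanimous))
```

A unanimous group yields zero entropy, which is the high-consensus case the framework later retains.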

These findings suggest that effectively mitigating intra\-country value heterogeneity requires leveraging multi\-dimensional demographic constraints to identify predictable, high\-consensus demographic\-value mappings from raw data, thereby enhancing fine\-grained pluralistic value alignment\. This insight establishes the theoretical foundation for our proposed demographic value alignment framework\.

## 4 DVMap

DVMap is a fine-grained pluralistic value alignment framework based on High-Consensus Demographic-Value Mapping, as illustrated in Figure [2](https://arxiv.org/html/2605.14420#S1.F2). We first filter out high-entropy responses to extract consistent demographic archetypes, and then construct high-consensus demographic-value data through country sampling and question processing in Section [4.1](https://arxiv.org/html/2605.14420#S4.SS1). To optimize LLMs’ value alignment capability, we introduce Structured CoT and GRPO post-training methods in Section [4.2](https://arxiv.org/html/2605.14420#S4.SS2). Finally, we design a comprehensive triple-generalization evaluation benchmark to assess generalization capabilities in Section [4.3](https://arxiv.org/html/2605.14420#S4.SS3).

### 4.1 Data Construction

To address the challenges of intra\-country value heterogeneity, we construct a high\-quality Demographic Value Alignment Corpus \(56,152 samples\) through a demographic archetype strategy\.

Table 1: Details of the selected countries.

#### Demographic Archetype\.

First, based on the WVS Wave 7 questionnaire and sociological stratification Bourdieu ([2018](https://arxiv.org/html/2605.14420#bib.bib9)), we construct structured demographic profiles $P$ encompassing 11 core features: Social Attributes (Country, Gender, Age, Marital Status, Parenthood), Economic Status (Income Bracket, Occupation, Work Nature), and Cultural Background (Education, Religion, Language), as detailed in Appendix [A](https://arxiv.org/html/2605.14420#A1). We find that approximately 32.8% of the samples exhibit overlapping demographic profiles. To address potential value divergence within these overlapping samples, we then implement a strict consistency check: for any given profile $P$, if the responses to a specific value question exhibit Shannon entropy $H > 0$ (low-consensus), the corresponding demographic-value pair is discarded. During this process, we filtered out approximately 9.2% of divergent samples, effectively eliminating noise caused by latent intra-country heterogeneity and thereby constructing a high-consensus ($H = 0$) demographic-value mapping.
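
The consistency check described above can be sketched as follows. Since $H = 0$ holds exactly when every respondent sharing a profile gives the same answer, the sketch simply keeps profile-question pairs with a single distinct answer. The record layout, the function name, and the three-attribute profiles (shortened from the 11 features for brevity) are hypothetical and not taken from the released code.

```python
from collections import defaultdict


def extract_archetypes(records):
    """Keep (profile, question) pairs whose responses are unanimous (H = 0).

    records: iterable of (profile, question_id, answer) tuples, where
    profile is a hashable tuple of demographic attribute values.
    Returns a dict mapping (profile, question_id) to the agreed answer.
    """
    answers = defaultdict(set)
    for profile, question_id, answer in records:
        answers[(profile, question_id)].add(answer)
    # Exactly one distinct answer means zero Shannon entropy for that pair.
    return {
        key: next(iter(values))
        for key, values in answers.items()
        if len(values) == 1
    }


# Toy records (hypothetical): the first profile agrees, the second diverges.
records = [
    (("CHN", "Female", "25-34"), "Q1", "Agree"),
    (("CHN", "Female", "25-34"), "Q1", "Agree"),
    (("USA", "Male", "45-54"), "Q1", "Agree"),
    (("USA", "Male", "45-54"), "Q1", "Disagree"),
]
corpus = extract_archetypes(records)
print(corpus)
```

Note that a profile held by a single respondent passes this check trivially, so most of the filtering pressure falls on the overlapping profiles.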

#### Country Sampling\.

Considering the complexity of global cultural systems, we select 10 countries as our training cornerstone. As detailed in Table [1](https://arxiv.org/html/2605.14420#S4.T1), the selection rigorously adheres to the theoretical framework of the Inglehart-Welzel Cultural Map Inglehart and Welzel ([2005](https://arxiv.org/html/2605.14420#bib.bib2)), ensuring coverage of all four major value quadrants: from the Traditional-Survival values of the Global South (e.g., Egypt, India) to the Secular-Expression values of Western Europe (e.g., Germany), and encompassing the unique Secular-Survival logic of post-socialist/Confucian societies (e.g., China, Russia). This design maximizes cultural variance within a controllable scale, compelling LLMs to capture deep, identity-bound value mappings rather than relying on coarse-grained national stereotypes.

#### Question Processing\.

Following the theory of Pileggi ([2024](https://arxiv.org/html/2605.14420#bib.bib35)), we select 16 value-representative questions, chosen for attribute independence, minimal overlap, and social generalizability. Furthermore, for questions with numerically scaled responses (e.g., 1-10) rather than explicit semantic options, we apply discretization that maps continuous numerical ranges into ordinal preference levels (Low/Medium/High), enabling the LLMs to model degree-based value expressions more accurately. Details are provided in Appendix [B](https://arxiv.org/html/2605.14420#A2).
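
A minimal sketch of the discretization step is given below. The cut points (1-3 as Low, 4-7 as Medium, 8-10 as High) are assumptions chosen purely for illustration; the actual ranges are specified in the paper's Appendix B and may differ.

```python
def discretize(score, low_max=3, medium_max=7):
    """Map a 1-10 scaled response to an ordinal preference level.

    low_max and medium_max are assumed cut points, not the paper's values.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be in 1..10, got {score}")
    if score <= low_max:
        return "Low"
    if score <= medium_max:
        return "Medium"
    return "High"


print([discretize(s) for s in (1, 3, 4, 7, 8, 10)])
```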

### 4.2 Demographic Value Alignment

For demographic value alignment, we train LLMs via explicit reasoning steering and strongly supervised distribution alignment so that their responses match the value preferences of specific demographic groups.

#### Task Formulation\.

Given a demographic profile $P$, a value-related question $Q$, and a structured thought steering instruction $I_{cot}$, our objective is to train a policy model $\pi_{\theta}$ whose response aligns with the ground-truth preference $y$ of the corresponding demographic group. Formally, the model generates a response containing a reasoning trace $T$ and a final decision $\hat{y}$: $(T, \hat{y}) \sim \pi_{\theta}(\cdot \mid P, Q, I_{cot})$.

#### Structured CoT\.

The correlation between demographic attributes and values is often latent and complex. To transform this implicit mapping into an explicit logical reasoning path, we design a structured thought steering instruction $I_{cot}$ (see Appendix [C](https://arxiv.org/html/2605.14420#A3)), guiding the model through three cognitive steps: (1) Demographic-Value Correlation Analysis: Scrutinizing key attributes (e.g., income, religion) to analyze whether the question touches upon the identity’s core interests or belief conflicts; (2) Option Trade-off: Evaluating the compatibility of each option with the demographic; and (3) Decision Output: Selecting the option most aligned with the demographic and encapsulating it within <answer\></answer\> tags. This mechanism not only enhances role immersion but also provides an interpretable reasoning trajectory.
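
Because the final decision is wrapped in answer tags, a downstream scorer needs to pull it out of the generated text. The sketch below shows one plausible way to do that; the function name, the choice to take the last tag pair when several appear, and the sample response are all assumptions rather than details from the paper.

```python
import re

# Non-greedy match so that several tag pairs are captured separately.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(response):
    """Return the text inside the last <answer>...</answer> pair, or None
    when the response contains no well-formed pair."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].strip() if matches else None


# Hypothetical model output following the three reasoning steps.
response = (
    "Step 1: religion and income are the most relevant attributes. "
    "Step 2: 'Agree' fits this profile better than 'Disagree'. "
    "<answer> Agree </answer>"
)
print(extract_answer(response))
```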

#### GRPO Training\.

To further achieve population distribution alignment, we employ the Group Relative Policy Optimization (GRPO) algorithm. For reward design, we adopt a strategy of “Simplicity Wins”, utilizing a strict binary outcome reward. Our core hypothesis is that LLMs have already established a robust semantic topology, where the semantic distance between “Agree” and “Strongly Agree” is naturally smaller than that with “Disagree”. Therefore, without the need for complex distance penalties, we simply use a binary signal to forcibly “anchor” the distribution peak at the true mode $y_i$. The reward function is defined as $r = \mathbb{I}(\hat{y} = y_i) + \beta \cdot r_{\mathrm{format}}$, where $\mathbb{I}(\cdot)$ is the indicator function, and $\beta \cdot r_{\mathrm{format}}$ is the format reward introduced following Shao et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib31)).
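
To make the reward and its use in GRPO concrete, the following sketch scores a small group of rollouts and standardizes the rewards within the group, which is the group-relative advantage of Shao et al. (2024). The value `beta=0.1`, the 0/1 form of the format reward, the use of the population standard deviation, and the toy rollouts are all assumptions; the paper's hyperparameters are given in its appendix.

```python
import statistics


def reward(predicted, target, well_formatted, beta=0.1):
    """Binary outcome reward plus a weighted format reward.

    beta and the 0/1 format signal are assumed values for illustration.
    """
    correct = 1.0 if predicted == target else 0.0
    return correct + beta * (1.0 if well_formatted else 0.0)


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one rollout group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of four rollouts for one prompt: (parsed answer, format ok?).
rollouts = [("Agree", True), ("Disagree", True), ("Agree", False), (None, False)]
rewards = [reward(pred, "Agree", fmt) for pred, fmt in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Rollouts that hit the target mode receive positive advantages and the rest negative ones, so no explicit distance penalty between options is needed for the update direction.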

### 4.3 Triple-Generalization Evaluation

To rigorously verify whether DVMap has mastered demographic\-value associations rather than engaging in simple memorization, we establish a triple\-generalization evaluation benchmark comprising 21,553 samples spanning three dimensions:

- Cross-Demographic (6,240 samples): We split the constructed dataset according to demographic dimensions into training and testing sets, ensuring that no demographic profiles overlap between them. This setting evaluates DVMap’s capability for demographic compositional generalization Keysers et al. ([2020](https://arxiv.org/html/2605.14420#bib.bib11)), assessing whether our framework can generalize value alignment to novel demographic groups by composing learned effects of individual demographic attributes (e.g., the marginal effects of income and education).
- Cross-Country (7,973 samples): To verify cross-cultural transferability, we construct a test set containing 8 countries outside the training distribution (e.g., Nigeria, Iran, Australia). As detailed in Table [8](https://arxiv.org/html/2605.14420#A2.T8) of Appendix [D](https://arxiv.org/html/2605.14420#A4), the selected test countries span all four quadrants of the Inglehart-Welzel Cultural Map. We follow a dual selection logic: Gap Filling (introducing underrepresented regions like the Global South) and Nuance Testing (including countries that share civilization roots with training anchors but differ in specific contexts, e.g., Vietnam vs. China). This setup verifies whether our framework can robustly generalize its learned value systems to diverse geopolitical environments.
- Cross-Value (7,340 samples): In the value dimension, we introduce a test set whose questions cover seven unseen extended value categories (details in Appendix [E](https://arxiv.org/html/2605.14420#A5)). This test set is designed to verify DVMap’s capacity for value transfer based on established value coordinates. Specifically, it examines whether an LLM equipped with DVMap can learn the deep causal chains between demographics and values (e.g., deducing Societal Duty views from environmental stances), rather than relying on keyword memorization.
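
The cross-demographic split above requires that no profile appears in both sets. One simple way to guarantee this, shown below as a sketch, is to hold out whole profiles rather than individual samples. The paper does not describe its exact procedure, so the function name, `test_fraction`, the seed, and the toy samples are hypothetical.

```python
import random


def split_by_profile(samples, test_fraction=0.1, seed=0):
    """Split samples so that no demographic profile occurs in both sets.

    samples: list of dicts with a hashable, sortable "profile" entry.
    """
    profiles = sorted({s["profile"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(profiles)
    n_test = max(1, int(len(profiles) * test_fraction))
    test_profiles = set(profiles[:n_test])
    train = [s for s in samples if s["profile"] not in test_profiles]
    test = [s for s in samples if s["profile"] in test_profiles]
    return train, test


# Toy corpus (hypothetical): 10 two-attribute profiles, 2 questions each.
samples = [
    {"profile": (country, gender), "question": q, "answer": "Agree"}
    for country in ("CHN", "USA", "DEU", "IND", "EGY")
    for gender in ("Female", "Male")
    for q in ("Q1", "Q2")
]
train, test = split_by_profile(samples, test_fraction=0.2)
print(len(train), len(test))
```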

![Refer to caption](https://arxiv.org/html/2605.14420v1/x5.png) Figure 3: Results on DVMap and other mainstream LLMs across 10 countries.

## 5 Experiments

We systematically evaluated DVMap, starting with the experimental setup (Sec. [5.1](https://arxiv.org/html/2605.14420#S5.SS1)) and the comparative analysis against mainstream LLMs (Sec. [5.2](https://arxiv.org/html/2605.14420#S5.SS2)). Subsequently, we validated generalization capabilities across demographics, countries, and values (Sec. [5.3](https://arxiv.org/html/2605.14420#S5.SS3)–[5.5](https://arxiv.org/html/2605.14420#S5.SS5)), concluding with an assessment of robustness (Sec. [5.6](https://arxiv.org/html/2605.14420#S5.SS6)). Furthermore, we conducted ablation studies on the data filtering strategy (Sec. [5.7](https://arxiv.org/html/2605.14420#S5.SS7)), the structured reasoning design (Sec. [5.8](https://arxiv.org/html/2605.14420#S5.SS8)), and the minimalist reward function (Sec. [5.9](https://arxiv.org/html/2605.14420#S5.SS9)).

### 5.1 Experimental Setup

#### Base Models\.

We used the Qwen3 series (0.6B, 1.7B, 4B, 8B) as the baseline LLMs and fine-tuned all four scales with DVMap. Additional experiments on Llama-3.2-3B are provided in Appendix [F](https://arxiv.org/html/2605.14420#A6).

#### Evaluation Metrics\.

To jointly evaluate point\-wise prediction accuracy and distribution fitting quality, we employed three complementary metrics:

- Accuracy (Acc $\uparrow$) measures the exact match rate between the predicted response $\hat{y}_i$ and the ground-truth value $y_i$ derived from the demographic survey data:

  $$\text{Acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{I}(\hat{y}_i = y_i). \tag{1}$$

- Likert Consistency (LC $\uparrow$) measures ordinal agreement by normalizing the distance between the prediction and the ground truth. Higher values denote better semantic proximity:

  $$\text{LC} = 1 - \frac{1}{N}\sum_{i=1}^{N} \frac{|\hat{y}_i - y_i|}{K - 1}, \tag{2}$$

  where $K$ is the scale size (e.g., $K = 10$). LC ranges in $[0, 1]$, where 1 is a perfect match.

- Wasserstein Distance (WD $\downarrow$) evaluates distribution matching quality by computing the $L_1$ distance between the Cumulative Distribution Functions (CDF) of the predicted and real distributions:

  $$\text{WD} = \sum_{k=1}^{K} |\text{CDF}_{pred}(k) - \text{CDF}_{real}(k)|, \tag{3}$$

  where $\text{CDF}(k)$ denotes the cumulative probability up to option $k$.
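
The three metrics can be computed directly from Equations (1)-(3). The sketch below assumes that options are encoded as integers $1, \dots, K$ and that predictions and targets have equal length; the function names and the toy inputs are hypothetical.

```python
from collections import Counter


def accuracy(preds, targets):
    """Exact-match rate, Eq. (1)."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def likert_consistency(preds, targets, k):
    """One minus the mean absolute distance normalized by K - 1, Eq. (2)."""
    total = sum(abs(p - t) for p, t in zip(preds, targets))
    return 1 - total / (len(targets) * (k - 1))


def wasserstein_distance(preds, targets, k):
    """L1 distance between predicted and real CDFs over options 1..K, Eq. (3)."""
    n = len(targets)
    pred_counts, real_counts = Counter(preds), Counter(targets)
    cdf_pred = cdf_real = 0.0
    distance = 0.0
    for option in range(1, k + 1):
        cdf_pred += pred_counts[option] / n
        cdf_real += real_counts[option] / n
        distance += abs(cdf_pred - cdf_real)
    return distance


# Toy example on a 3-point scale (hypothetical values).
preds = [1, 2, 3, 3]
targets = [1, 2, 2, 3]
print(accuracy(preds, targets))
print(likert_consistency(preds, targets, k=3))
print(wasserstein_distance(preds, targets, k=3))
```

On this toy input a single off-by-one error costs a quarter of the accuracy but only an eighth of the LC score, which illustrates why the ordinal metric is reported alongside exact match.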

Table 2: Results of DVMap vs. other mainstream LLMs.

#### Implementation Details\.

We implemented DVMap using the VeRL framework\. Full hyperparameter settings and environment details are provided in Appendix[G](https://arxiv.org/html/2605.14420#A7)\.

### 5.2 Comparison with Mainstream LLMs

To validate the value alignment capability of DVMap, we compared Qwen3-8B-DVMap against current mainstream open-source LLMs (e.g., Qwen3-14B Team ([2025b](https://arxiv.org/html/2605.14420#bib.bib18)), Qwen2.5-72B Yang et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib12)), Llama-3.1-70B Grattafiori et al. ([2024](https://arxiv.org/html/2605.14420#bib.bib17)), DeepSeek-V3 DeepSeek-AI ([2024](https://arxiv.org/html/2605.14420#bib.bib16))) and closed-source LLMs (e.g., Gemini-2.5 Team ([2025a](https://arxiv.org/html/2605.14420#bib.bib15)), Claude-3.7 Anthropic ([2025](https://arxiv.org/html/2605.14420#bib.bib14)), GPT-4o OpenAI ([2024](https://arxiv.org/html/2605.14420#bib.bib13))) on the cross-demographic test set. Table [2](https://arxiv.org/html/2605.14420#S5.T2) summarizes the overall quantitative results, while Figure [3](https://arxiv.org/html/2605.14420#S4.F3) visualizes the performance distribution across 10 countries.

![Refer to caption](https://arxiv.org/html/2605.14420v1/x6.png) Figure 4: Cross-Demographic Generalization Results across model scales.

As shown in Table [2](https://arxiv.org/html/2605.14420#S5.T2), despite its smaller parameter scale, Qwen3-8B-DVMap surpasses leading baseline LLMs of larger sizes, delivering performance comparable to top-tier LLMs like GPT-4o. This capability is driven by the high-consensus demographic-value mapping strategy. Notably, DVMap achieves the lowest WD score, indicating that it not only captures mainstream values (high Acc) but also effectively reconstructs the nuanced probability distributions of group opinions.

Figure [3](https://arxiv.org/html/2605.14420#S4.F3) further reveals that Qwen3-8B-DVMap consistently ranks among the top-3 performers across all 10 countries, demonstrating exceptional global robustness. While mainstream LLMs exhibit substantial performance degradation in non-Western contexts (e.g., CHN, RUS), Qwen3-8B-DVMap effectively mitigates this cultural disparity. This suggests that bridging diverse identity attributes with value orientations via demographic-value mapping helps alleviate the Western-centric biases inherent in LLMs. Furthermore, we evaluated the impact of our alignment process on general model performance. As detailed in Appendix [H](https://arxiv.org/html/2605.14420#A8), DVMap achieves precise pluralistic alignment while maintaining the base model’s general utility across five standard benchmarks.

![Refer to caption](https://arxiv.org/html/2605.14420v1/x7.png) Figure 5: Cross-Demographic Generalization Results across value categories.

### 5.3 Cross-Demographic Generalization

To validate the cross-demographic generalization capability of DVMap, we compared performance trends across varying parameter scales before and after incorporating DVMap, with results shown in Figure [4](https://arxiv.org/html/2605.14420#S5.F4).

![Refer to caption](https://arxiv.org/html/2605.14420v1/x8.png) Figure 6: Cross-Country Generalization Results across model scales.

As shown in Figure [4](https://arxiv.org/html/2605.14420#S5.F4), smaller models (0.6B–1.7B) exhibit substantial performance leaps after incorporating DVMap, with marginal gains diminishing as scale increases. This suggests that the demographic-value binding mechanism effectively compensates for the limited sociological knowledge in smaller models, enabling accurate reconstruction of value orientations from demographic cues.

Furthermore, Figure [5](https://arxiv.org/html/2605.14420#S5.F5) displays accuracy across different value concepts \(e\.g\., happiness and corruption, as defined in Appendix [B](https://arxiv.org/html/2605.14420#A2), Table [7](https://arxiv.org/html/2605.14420#A1.T7)\), revealing significant performance disparities\. To investigate the underlying cause, we analyze the relationship between the entropy of option distributions and prediction accuracy\. The Pearson correlation analysis reveals a strong negative correlation \($r=-0.857$\), uncovering a key sociological insight: the difficulty of value alignment is intrinsically linked to the controversiality of the value, with higher entropy reflecting greater intra\-country heterogeneity and increased alignment complexity\.
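
The entropy-accuracy analysis above can be illustrated with a minimal sketch. The option distributions and accuracies below are made-up numbers used only for illustration, the function names are hypothetical, and the logarithm base (bits) is an assumption, since this section does not state it.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution over answer options."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-value option distributions and accuracies (not from the paper).
option_distributions = [
    [0.90, 0.05, 0.05],  # near-consensus value concept
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # highly contested value concept
]
accuracies = [0.80, 0.55, 0.35]

entropies = [shannon_entropy(d) for d in option_distributions]
r = pearson_r(entropies, accuracies)  # negative when contested values are harder
```

A negative `r` in this toy setting mirrors the direction of the reported correlation: value concepts whose answers are spread more evenly across options are predicted less accurately.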

### 5\.4 Cross\-Country Generalization

To verify the generalization capability of DVMap along the national dimension, we evaluated LLMs of varying scales on countries that are entirely unseen during training\. Figure [6](https://arxiv.org/html/2605.14420#S5.F6) presents overall performance\.

As shown in Figure [6](https://arxiv.org/html/2605.14420#S5.F6), despite being trained on only 10 representative countries, DVMap demonstrates remarkable zero\-shot generalization on unseen countries with distinct cultural backgrounds \(e\.g\., Nigeria, Pakistan\)\. Compared to base LLMs, DVMap achieves average accuracy improvements of 16\.2% \(0\.6B\), 10\.7% \(1\.7B\), 2\.8% \(4B\), and 5\.3% \(8B\), respectively\. Detailed per\-country performance gains are provided in Appendix [I](https://arxiv.org/html/2605.14420#A9) \(Figure [9](https://arxiv.org/html/2605.14420#A9.F9)\), which confirms that these gains are not regionally biased but consistent across all evaluated countries\. As the model scale increases, its predictive capability becomes increasingly robust and potent\.

These findings suggest that Qwen3\-8B\-DVMap has successfully acquired the inherent demographic\-value associations transcending national borders\. This underscores a profound sociological insight: human values are not rigidly bound to macroscopic “Country” labels but are largely determined by cross\-cultural commonalities shaped by personal demographic attributes\. By accurately modeling these commonalities, DVMap significantly improves predictive capabilities for unknown cultural groups\.

### 5\.5 Cross\-Value Generalization

To investigate the transfer ability from known values to unseen values, we tracked the performance evolution of base LLMs and their DVMap\-enhanced variants across different parameter scales, as shown in Figure [7](https://arxiv.org/html/2605.14420#S5.F7)\.

![Refer to caption](https://arxiv.org/html/2605.14420v1/x9.png)Figure 7: Cross\-Value Generalization Results across model scales\.

![Refer to caption](https://arxiv.org/html/2605.14420v1/x10.png)Figure 8: Results of value flip rate\.

As shown in Figure [7](https://arxiv.org/html/2605.14420#S5.F7), both accuracy and Likert consistency exhibit robust improvements across model scales, despite diminishing marginal gains above 4B parameters\. This indicates that larger LLMs possess stronger reasoning capabilities, enabling precise capture of causal chains between demographics and unseen values\. Additionally, the distribution fitting metric \(WD\) shows substantial improvement in smaller LLMs \(<1\.7B\) but experiences slight regression at medium scales \(4B & 8B\)\. Given the significant gains in accuracy, this minor distributional cost is acceptable\.

To identify the source of DVMap’s generalization, we analyzed the correlation between performance gains on unseen questions and their semantic proximity to the training set \(see Appendix [J](https://arxiv.org/html/2605.14420#A10)\)\. Pearson correlation analysis reveals that performance gains correlate more strongly with average semantic distance \($r=-0.451$\) than with nearest neighbor distance \($r=-0.198$\)\. This suggests that DVMap’s generalization is driven primarily by alignment with the global semantic structure of the value norms, rather than rote memorization\. Furthermore, our findings reveal that while semantic proximity generally facilitates transfer, inconsistent underlying value logic can trigger negative transfer, underscoring the necessity of demographic\-value coherence over superficial similarity\.
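
To make the two proximity measures concrete, the sketch below computes the average distance and the nearest neighbor distance from one unseen question to a set of training questions. It assumes questions are already embedded as vectors and uses cosine distance; the embedding model and distance function used in the paper are not specified in this section, so both choices (and the function names) are assumptions.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def proximity_to_training_set(query, train_vectors):
    """Return (average distance, nearest neighbor distance) from `query`
    to every vector in `train_vectors`."""
    distances = [cosine_distance(query, t) for t in train_vectors]
    return sum(distances) / len(distances), min(distances)

# Toy 2-D "embeddings" (hypothetical): one identical and one orthogonal neighbor.
avg_dist, nn_dist = proximity_to_training_set([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The average distance summarizes where a question sits relative to the whole training set, while the nearest neighbor distance only reflects its single closest training question, which is why the two measures can correlate differently with performance gains.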

### 5\.6 Robustness Analysis of DVMap

To verify whether DVMap captured the causal mapping from demographics to values, rather than relying on superficial associations in the data, we conducted a robustness analysis\. Specifically, we inverted the “Income” attribute \(High $\leftrightarrow$ Low\) while strictly keeping the other 10 demographic attributes \(e\.g\., Religion, Education\) invariant\. This process yielded 5,446 pairs of test samples, enabling a direct comparison of how value predictions changed under exclusively altered socioeconomic conditions\. We then introduced the Value Flip Rate to quantify the robustness to this perturbation, defined as the proportion of instances where the value prediction shifts solely due to the inversion of the modified attribute\.

As illustrated in Figure [8](https://arxiv.org/html/2605.14420#S5.F8), DVMap demonstrates a significant reduction in flip rates compared to the base LLMs across non\-financial domains \(e\.g\., Religion, Trust\), while preserving appropriate robustness within financial contexts\. This indicates that rather than superficially reacting to the income attribute, DVMap leverages multi\-dimensional demographic constraints, recognizing that core values embedded in holistic identities possess resilience against economic fluctuations \(see Appendix [K](https://arxiv.org/html/2605.14420#A11) for a case study\)\.
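
The Value Flip Rate described in this section reduces to a simple proportion over paired predictions. The following is a minimal sketch with a hypothetical function name and toy predictions; it assumes each pair differs only in the Income attribute, as in the setup above.

```python
def value_flip_rate(original_preds, perturbed_preds):
    """Fraction of paired samples whose predicted option changes after the
    perturbation (here, inverting the Income attribute)."""
    if len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must be paired one-to-one")
    if not original_preds:
        raise ValueError("at least one pair is required")
    flips = sum(a != b for a, b in zip(original_preds, perturbed_preds))
    return flips / len(original_preds)

# Hypothetical predictions for four paired profiles (illustrative only).
before = ["A", "B", "B", "C"]
after = ["A", "C", "B", "C"]
rate = value_flip_rate(before, after)  # one flip out of four pairs
```

Computing the rate separately per value domain (for example, Religion versus financial questions) would give the kind of breakdown shown in Figure 8.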

### 5\.7 Analysis of Data Filtering Strategy

To further justify the exclusion of profiles with Shannon entropy $H>0$, we conducted a comparative analysis against a “Majority Voting” baseline\. In this alternative setting, we relaxed the filtering constraint to $H\geq 0$, which incorporates samples where a primary consensus exists but intra\-group disagreement remains\.

Table 3: Comparison of different filtering strategies based on Qwen3\-4B\.

Table 4: Ablation study on different reasoning strategies using Qwen3\-4B\.

Table 5: Ablation study on reward function designs using Qwen3\-4B\.

The empirical results in Table [3](https://arxiv.org/html/2605.14420#S5.T3) demonstrate that the strict filtering strategy \($H=0$\) consistently outperforms the majority voting approach \($H\geq 0$\) across all metrics\. Specifically, we observe a 1\.4% improvement in Accuracy and a notable reduction in Wasserstein Distance \(WD\)\. This indicates that high\-entropy samples introduce noise from latent variables; filtering them enables the model to learn more precise demographic\-value mappings\.
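
The difference between the two filtering strategies can be sketched as follows. This is an illustrative reading of the description above, not the released pipeline: the record layout, function names, and the absence of any minimum group size are all assumptions.

```python
import math
from collections import Counter, defaultdict

def answer_entropy(answers):
    """Shannon entropy (bits) of the empirical answer distribution in one group."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_corpus(records, strict=True):
    """records: iterable of (profile, question_id, answer) tuples.
    strict=True keeps only groups with H = 0 (unanimous answers);
    strict=False keeps every group and labels it with its majority answer."""
    groups = defaultdict(list)
    for profile, question_id, answer in records:
        groups[(profile, question_id)].append(answer)
    corpus = {}
    for key, answers in groups.items():
        if strict and answer_entropy(answers) > 0:
            continue  # drop groups with any intra-group disagreement
        corpus[key] = Counter(answers).most_common(1)[0][0]
    return corpus

# Hypothetical respondents: profile_a is unanimous, profile_b is split 1 vs 2.
profile_a = ("Young Adulthood", "Female", "Low")
profile_b = ("Middle Adulthood", "Male", "High")
records = [
    (profile_a, "Q1", 1), (profile_a, "Q1", 1),
    (profile_b, "Q1", 2), (profile_b, "Q1", 3), (profile_b, "Q1", 3),
]
strict_corpus = build_corpus(records, strict=True)
majority_corpus = build_corpus(records, strict=False)
```

Under strict filtering only the unanimous group survives, whereas majority voting also keeps the split group with its modal answer as the label, which is the source of the label noise discussed above.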

### 5\.8 Analysis of Structured Reasoning

To isolate the contribution of structured Chain\-of\-Thought \(CoT\) from standard preference learning, we conducted an ablation study \(Table [4](https://arxiv.org/html/2605.14420#S5.T4)\) across four settings: \(1\) Base Model; \(2\) Inference\-only CoT \(without training\); \(3\) Standard RL \(free reasoning\); and \(4\) DVMap \(RL with structured CoT templates\)\.

As shown in Table [4](https://arxiv.org/html/2605.14420#S5.T4), invoking reasoning only during inference degrades Accuracy by 0\.8%, likely stemming from logic hallucinations without specialized training\. While standard RL with free reasoning improves upon the base model, integrating structured CoT into the training loop yields the most significant gains\. Specifically, DVMap achieves a 1\.7% Accuracy increase and further WD reduction compared to the free\-reasoning RL baseline\. This confirms that DVMap’s structured CoT acts as a “thought steering” mechanism, providing high\-quality intermediate supervision that helps the model internalize correct sociological logic for precise value alignment\.

### 5\.9 Effectiveness of Minimalist Reward Design

To validate the superiority of our minimalist binary reward, we compared it against a Likert\-adjusted Soft Reward variant\. In this setting, the reward provides granular supervision by scaling linearly with the distance to the target consensus: $r=\alpha\cdot\left(1-\frac{|\hat{y}-y|}{L-1}\right)+\beta\cdot r_{\mathrm{format}}$, where $L$ is the scale size\. This baseline examines whether a continuous supervisory signal offers better guidance than our binary approach for distribution alignment\.

As shown in Table [5](https://arxiv.org/html/2605.14420#S5.T5), while the Likert\-adjusted strategy improves upon the base model, our minimalist binary design consistently achieves the best performance\. Specifically, DVMap achieves a 1\.6% absolute Accuracy gain and a lower Wasserstein Distance \(0\.142 vs\. 0\.155\) compared to the complex variant\. This suggests that a strict binary signal effectively leverages the pre\-trained model’s inherent semantic topology, providing a more robust and decisive objective for pluralistic alignment\.
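
For reference, the two reward designs compared in this section can be written side by side. This is a sketch under stated assumptions: predictions and targets are taken to be integer positions on an $L$-point Likert scale, the binary reward is shown without any format term, and the weights `alpha` and `beta` are hypothetical values, since their settings are not given here.

```python
def binary_reward(pred, target):
    """Minimalist design: full reward only for an exact match with the consensus."""
    return 1.0 if pred == target else 0.0

def likert_soft_reward(pred, target, scale_size, format_reward, alpha, beta):
    """Likert-adjusted soft reward:
    r = alpha * (1 - |pred - target| / (L - 1)) + beta * r_format."""
    if scale_size < 2:
        raise ValueError("scale_size must be at least 2")
    closeness = 1.0 - abs(pred - target) / (scale_size - 1)
    return alpha * closeness + beta * format_reward

# Hypothetical example: prediction 4 vs. target 5 on a 5-point scale.
hard = binary_reward(4, 5)  # no credit for a near miss
soft = likert_soft_reward(4, 5, scale_size=5, format_reward=1.0, alpha=0.9, beta=0.1)
```

The soft variant gives partial credit to near misses, whereas the binary variant treats every non-matching option identically; the ablation in Table 5 reports that the latter performs better in this setting.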

## 6 Conclusion

In this paper, we have presented DVMap \(High\-Consensus Demographic\-Value Mapping\), a fine\-grained framework designed to resolve the intrinsic divergence inherent in pluralistic value alignment\. By identifying high\-consensus demographic archetypes within diverse national\-level groups and integrating Structured CoT with GRPO, DVMap enables LLMs to achieve fine\-grained value alignment\. Extensive experiments demonstrate that DVMap successfully learns the manifold mapping from demographics to values, and Qwen3\-8B trained with DVMap achieves performance comparable to advanced closed\-source LLMs\. Further analyses indicate that DVMap exhibits strong generalization across demographics, countries, and values, while also demonstrating high robustness\.

## Limitations

Despite the outstanding performance of DVMap, we must acknowledge the limitations of our current work\. First, the static nature of the WVS makes it difficult to reflect dynamically evolving public sentiment in real time\. Second, despite strategic sampling, the dataset may still underrepresent certain marginalized cultural groups\. Third, the 11\-dimensional demographic profile is inherently a statistical abstraction of complex human nature\. Our “Demographic Archetypes” capture “Sociological Roles” based on group modes, rather than “Psychological Individuals” with unique psychological traits and personal experiences\. Finally, while the current discriminative \(multiple\-choice\) evaluation precisely quantifies predictive capability, it cannot measure the model’s ability to generate content with identity\-specific tone and rhetoric in open\-ended dialogue\. Bridging the gap from discrimination to generation remains a key challenge for the future\.

## Acknowledgement

The present research was supported by the National Key Research and Development Program of China \(Grant No\. 2024YFE0203000\), the State Key Laboratory of Tibetan Intelligence \(Grant No\. 2025\-ZJ\-J08\), and the Postdoctoral Fellowship Program of CPSF \(Grant No\. GZC20251075\)\. We would like to thank the anonymous reviewers for their insightful comments\.

## References

- Investigating cultural alignment of large language models\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 12404–12422\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- J\. Andreas \(2022\)Language models as agent models\.InFindings of the Association for Computational Linguistics: EMNLP 2022,pp\. 5769–5779\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- Anthropic \(2025\)Claude 3\.7 sonnet\.External Links:[Link](https://www.anthropic.com/news/claude-3-7-sonnet)Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- S\. Arora and A\. Goyal \(2023\)A theory for emergence of complex skills in language models\.CoRRabs/2307\.15936\.External Links:[Link](https://doi.org/10.48550/arXiv.2307.15936),[Document](https://dx.doi.org/10.48550/ARXIV.2307.15936),2307\.15936Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- A\. Askell, Y\. Bai, A\. Chen, D\. Drain, D\. Ganguli, T\. Henighan, A\. Jones, N\. Joseph, B\. Mann, N\. DasSarma, N\. Elhage, Z\. Hatfield\-Dodds, D\. Hernandez, J\. Kernion, K\. Ndousse, C\. Olsson, D\. Amodei, T\. B\. Brown, J\. Clark, S\. McCandlish, C\. Olah, and J\. Kaplan \(2021\)A general language assistant as a laboratory for alignment\.CoRRabs/2112\.00861\.External Links:[Link](https://arxiv.org/abs/2112.00861),2112\.00861Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- Y\. Bai, A\. Jones, K\. Ndousse, A\. Askell, A\. Chen, N\. DasSarma, D\. Drain, S\. Fort, D\. Ganguli, T\. Henighan, N\. Joseph, S\. Kadavath, J\. Kernion, T\. Conerly, S\. E\. Showk, N\. Elhage, Z\. Hatfield\-Dodds, D\. Hernandez, T\. Hume, S\. Johnston, S\. Kravec, L\. Lovitt, N\. Nanda, C\. Olsson, D\. Amodei, T\. B\. Brown, J\. Clark, S\. McCandlish, C\. Olah, B\. Mann, and J\. Kaplan \(2022\)Training a helpful and harmless assistant with reinforcement learning from human feedback\.CoRRabs/2204\.05862\.External Links:[Link](https://doi.org/10.48550/arXiv.2204.05862),[Document](https://dx.doi.org/10.48550/ARXIV.2204.05862),2204\.05862Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- P\. Bourdieu \(2018\)The forms of capital\.InThe sociology of economic life,pp\. 78–92\.Cited by:[Table 6](https://arxiv.org/html/2605.14420#A1.T6.2.1),[Table 6](https://arxiv.org/html/2605.14420#A1.T6.3.1),[§4\.1](https://arxiv.org/html/2605.14420#S4.SS1.SSS0.Px1.p1.6)\.
- L\. Breiman \(2001\)Random forests\.Machine learning45\(1\),pp\. 5–32\.Cited by:[§3](https://arxiv.org/html/2605.14420#S3.p2.2)\.
- Y\. Cao, L\. Zhou, S\. Lee, L\. C\. Piqueras, M\. Chen, and D\. Hershcovich \(2023\)Assessing cross\-cultural alignment between chatgpt and human societies: an empirical study\.InProceedings of the first workshop on cross\-cultural considerations in NLP \(C3NLP\),pp\. 53–67\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- R\. Choenni, A\. Lauscher, and E\. Shutova \(2024\)The echoes of multilinguality: tracing cultural value shifts during language model fine\-tuning\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 15042–15058\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- R\. Choenni and E\. Shutova \(2024\)Self\-alignment: improving alignment of cultural values in LLMs via in\-context learning\.CoRRabs/2408\.16482\.External Links:[Link](https://doi.org/10.48550/arXiv.2408.16482),[Document](https://dx.doi.org/10.48550/ARXIV.2408.16482),2408\.16482Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- DeepSeek\-AI \(2024\)DeepSeek\-v3 technical report\.CoRRabs/2412\.19437\.External Links:[Link](https://doi.org/10.48550/arXiv.2412.19437),[Document](https://dx.doi.org/10.48550/ARXIV.2412.19437),2412\.19437Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- E\. Durmus, K\. Nyugen, T\. I\. Liao, N\. Schiefer, A\. Askell, A\. Bakhtin, C\. Chen, Z\. Hatfield\-Dodds, D\. Hernandez, N\. Joseph, L\. Lovitt, S\. McCandlish, O\. Sikder, A\. Tamkin, J\. Thamkul, J\. Kaplan, J\. Clark, and D\. Ganguli \(2023\)Towards measuring the representation of subjective global opinions in language models\.CoRRabs/2306\.16388\.External Links:[Link](https://doi.org/10.48550/arXiv.2306.16388),[Document](https://dx.doi.org/10.48550/ARXIV.2306.16388),2306\.16388Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- S\. Feng, T\. Sorensen, Y\. Liu, J\. Fisher, C\. Y\. Park, Y\. Choi, and Y\. Tsvetkov \(2024\)Modular pluralism: pluralistic alignment via multi\-llm collaboration\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,pp\. 4151–4171\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- L\. Gao, S\. Biderman, S\. Black, L\. Golding, T\. Hoppe, C\. Foster, J\. Phang, H\. He, A\. Thite, N\. Nabeshima, S\. Presser, and C\. Leahy \(2021\)The pile: an 800gb dataset of diverse text for language modeling\.CoRRabs/2101\.00027\.External Links:[Link](https://arxiv.org/abs/2101.00027),2101\.00027Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- A\. Grattafiori, A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al\-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Vaughan,et al\.\(2024\)The llama 3 herd of models\.InNeural Information Processing Systems,Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- J\. Guan, J\. Wu, J\. Li, C\. Cheng, and W\. Wu \(2025\)A survey on personalized alignment—the missing piece for large language models in real\-world applications\.InFindings of the Association for Computational Linguistics: ACL 2025,pp\. 5313–5333\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- C\. Haerpfer, R\. Inglehart, A\. Moreno, C\. Welzel, K\. Kizilova, J\. Diez\-Medrano, M\. Lagos, P\. Norris, E\. Ponarin, and B\. Puranen \(2022\)World values survey: round seven – country\-pooled datafile version 6\.0\.Note:JD Systems Institute & WVSA Secretariat, Madrid, Spain & Vienna, AustriaExternal Links:[Document](https://dx.doi.org/10.14281/18241.24),[Link](https://www.worldvaluessurvey.org/WVSDocumentationWV7.jsp)Cited by:[Table 6](https://arxiv.org/html/2605.14420#A1.T6.2.1),[Table 6](https://arxiv.org/html/2605.14420#A1.T6.3.1),[Table 7](https://arxiv.org/html/2605.14420#A1.T7.2.1),[Table 7](https://arxiv.org/html/2605.14420#A1.T7.3.1),[Table 9](https://arxiv.org/html/2605.14420#A2.T9),[Figure 2](https://arxiv.org/html/2605.14420#S1.F2),[§1](https://arxiv.org/html/2605.14420#S1.p3.1),[§3](https://arxiv.org/html/2605.14420#S3.p1.1)\.
- Z\. He, S\. Guo, A\. Rao, and K\. Lerman \(2024\)Whose emotions and moral sentiments do language models reflect?\.InFindings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11\-16, 2024,L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Findings of ACL,pp\. 6611–6631\.External Links:[Link](https://doi.org/10.18653/v1/2024.findings-acl.395),[Document](https://dx.doi.org/10.18653/V1/2024.FINDINGS-ACL.395)Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- D\. Hendrycks, C\. Burns, S\. Basart, A\. Critch, J\. Li, D\. Song, and J\. Steinhardt \(2021\)Aligning AI with shared human values\.In9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\-7, 2021,External Links:[Link](https://openreview.net/forum?id=dNy%5C_RKzJacY)Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- R\. Inglehart and C\. Welzel \(2005\)Modernization, cultural change, and democracy\.The human development sequence\.Cited by:[Appendix D](https://arxiv.org/html/2605.14420#A4.p1.1),[§4\.1](https://arxiv.org/html/2605.14420#S4.SS1.SSS0.Px2.p1.1)\.
- R\. L\. Johnson, G\. Pistilli, N\. Menédez\-González, L\. D\. D\. Duran, E\. Panai, J\. Kalpokiene, and D\. J\. Bertulfo \(2022\)The ghost in the machine has an american accent: value conflict in GPT\-3\.CoRRabs/2203\.07785\.External Links:[Link](https://doi.org/10.48550/arXiv.2203.07785),[Document](https://dx.doi.org/10.48550/ARXIV.2203.07785),2203\.07785Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- E\. Kasneci, K\. Seßler, S\. Küchemann, M\. Bannert, D\. Dementieva, F\. Fischer, U\. Gasser, G\. Groh, S\. Günnemann, E\. Hüllermeier,et al\.\(2023\)ChatGPT for good? on opportunities and challenges of large language models for education\.Learning and individual differences103,pp\. 102274\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- D\. Keysers, N\. Schärli, N\. Scales, H\. Buisman, D\. Furrer, S\. Kashubin, N\. Momchev, D\. Sinopalnikov, L\. Stafiniak, T\. Tihon, D\. Tsarkov, X\. Wang, M\. van Zee, and O\. Bousquet \(2020\)Measuring compositional generalization: A comprehensive method on realistic data\.In8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\-30, 2020,External Links:[Link](https://openreview.net/forum?id=SygcCnNKwr)Cited by:[1st item](https://arxiv.org/html/2605.14420#S4.I1.i1.p1.1)\.
- G\. Kovac, M\. Sawayama, R\. Portelas, C\. Colas, P\. F\. Dominey, and P\. Oudeyer \(2023\)Large language models as superpositions of cultural perspectives\.CoRRabs/2307\.07870\.External Links:[Link](https://doi.org/10.48550/arXiv.2307.07870),[Document](https://dx.doi.org/10.48550/ARXIV.2307.07870),2307\.07870Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- P\. Lahoti, N\. Blumm, X\. Ma, R\. Kotikalapudi, S\. Potluri, Q\. Tan, H\. Srinivasan, B\. Packer, A\. Beirami, A\. Beutel, and J\. Chen \(2023\)Improving diversity of demographic representation in large language models via collective\-critiques and self\-voting\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6\-10, 2023,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),pp\. 10383–10405\.External Links:[Link](https://doi.org/10.18653/v1/2023.emnlp-main.643),[Document](https://dx.doi.org/10.18653/V1/2023.EMNLP-MAIN.643)Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- C\. Li, M\. Chen, J\. Wang, S\. Sitaram, and X\. Xie \(2024a\)Culturellm: incorporating cultural differences into large language models\.Advances in Neural Information Processing Systems37,pp\. 84799–84838\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- C\. Li, D\. Teney, L\. Yang, Q\. Wen, X\. Xie, and J\. Wang \(2024b\)Culturepark: boosting cross\-cultural understanding in large language models\.Advances in Neural Information Processing Systems37,pp\. 65183–65216\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p2.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- C\. C\. Liu, F\. Koto, T\. Baldwin, and I\. Gurevych \(2024\)Are multilingual LLMs culturally\-diverse reasoners? an investigation into multicultural proverbs and sayings\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\),pp\. 2016–2039\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- P\. Niszczota, M\. Janczak, and M\. Misiak \(2025\)Large language models can replicate cross\-cultural differences in personality\.Journal of Research in Personality115,pp\. 104584\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- OpenAI \(2024\)GPT\-4o system card\.CoRRabs/2410\.21276\.External Links:[Link](https://doi.org/10.48550/arXiv.2410.21276),[Document](https://dx.doi.org/10.48550/ARXIV.2410.21276),2410\.21276Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- L\. Ouyang, J\. Wu, X\. Jiang, D\. Almeida, C\. Wainwright, P\. Mishkin, C\. Zhang, S\. Agarwal, K\. Slama, A\. Ray,et al\.\(2022\)Training language models to follow instructions with human feedback\.Advances in neural information processing systems35,pp\. 27730–27744\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- J\. S\. Park, J\. O’Brien, C\. J\. Cai, M\. R\. Morris, P\. Liang, and M\. S\. Bernstein \(2023\)Generative agents: interactive simulacra of human behavior\.InProceedings of the 36th annual acm symposium on user interface software and technology,pp\. 1–22\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- J\. Peng, L\. Shi, X\. Wu, H\. Zhang, F\. Liu, H\. Lyu, and D\. Xiong \(2025\)DiplomacyAgent: do LLMs balance interests and ethical principles in international events?\.InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing,C\. Christodoulopoulos, T\. Chakraborty, C\. Rose, and V\. Peng \(Eds\.\),Suzhou, China,pp\. 13721–13739\.External Links:[Link](https://aclanthology.org/2025.emnlp-main.693/),[Document](https://dx.doi.org/10.18653/v1/2025.emnlp-main.693),ISBN 979\-8\-89176\-332\-6Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- S\. F\. Pileggi \(2024\)A hybrid approach to analysing large scale surveys: individual values, opinions and perceptions\.SN Social Sciences4\(8\),pp\. 144\.Cited by:[Table 7](https://arxiv.org/html/2605.14420#A1.T7.2.1),[Table 7](https://arxiv.org/html/2605.14420#A1.T7.3.1),[Appendix B](https://arxiv.org/html/2605.14420#A2.p1.1),[Figure 2](https://arxiv.org/html/2605.14420#S1.F2),[§4\.1](https://arxiv.org/html/2605.14420#S4.SS1.SSS0.Px3.p1.1)\.
- R\. Rafailov, A\. Sharma, E\. Mitchell, C\. D\. Manning, S\. Ermon, and C\. Finn \(2023\)Direct preference optimization: your language model is secretly a reward model\.Advances in neural information processing systems36,pp\. 53728–53741\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- S\. Santurkar, E\. Durmus, F\. Ladhak, C\. Lee, P\. Liang, and T\. Hashimoto \(2023\)Whose opinions do language models reflect?\.InInternational Conference on Machine Learning,pp\. 29971–30004\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1),[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px1.p1.1)\.
- Z\. Shao, P\. Wang, Q\. Zhu, R\. Xu, J\. Song, M\. Zhang, Y\. K\. Li, Y\. Wu, and D\. Guo \(2024\)DeepSeekMath: pushing the limits of mathematical reasoning in open language models\.CoRRabs/2402\.03300\.External Links:[Link](https://doi.org/10.48550/arXiv.2402.03300),[Document](https://dx.doi.org/10.48550/ARXIV.2402.03300),2402\.03300Cited by:[§4\.2](https://arxiv.org/html/2605.14420#S4.SS2.SSS0.Px3.p1.4)\.
- S\. Shen, L\. Logeswaran, M\. Lee, H\. Lee, S\. Poria, and R\. Mihalcea \(2024\)Understanding the capabilities and limitations of large language models for cultural commonsense\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\), NAACL 2024, Mexico City, Mexico, June 16\-21, 2024,K\. Duh, H\. Gómez\-Adorno, and S\. Bethard \(Eds\.\),pp\. 5668–5680\.External Links:[Link](https://doi.org/10.18653/v1/2024.naacl-long.316),[Document](https://dx.doi.org/10.18653/V1/2024.NAACL-LONG.316)Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- T\. Shen, R\. Jin, Y\. Huang, C\. Liu, W\. Dong, Z\. Guo, X\. Wu, Y\. Liu, and D\. Xiong \(2023a\)Large language model alignment: A survey\.CoRRabs/2309\.15025\.External Links:[Link](https://doi.org/10.48550/arXiv.2309.15025),[Document](https://dx.doi.org/10.48550/ARXIV.2309.15025),2309\.15025Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- T\. Shen, S\. Li, Q\. Tu, and D\. Xiong \(2023b\)Roleeval: a bilingual role evaluation benchmark for large language models\.arXiv preprint arXiv:2312\.16132\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- G\. Simmons \(2023\)Moral mimicry: large language models produce moral rationalizations tailored to political identity\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 4: Student Research Workshop\),pp\. 282–297\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- G\. Team \(2025a\)Gemini 2\.5: pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities\.CoRRabs/2507\.06261\.External Links:[Link](https://doi.org/10.48550/arXiv.2507.06261),[Document](https://dx.doi.org/10.48550/ARXIV.2507.06261),2507\.06261Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- Q\. Team \(2025b\)Qwen3 technical report\.CoRRabs/2505\.09388\.External Links:[Link](https://doi.org/10.48550/arXiv.2505.09388),[Document](https://dx.doi.org/10.48550/ARXIV.2505.09388),2505\.09388Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.
- W\. Wang, W\. Jiao, J\. Huang, R\. Dai, J\. Huang, Z\. Tu, and M\. Lyu \(2024\)Not all countries celebrate thanksgiving: on the cultural dominance in large language models\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 6349–6384\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- W\. F\. Wiggins and A\. S\. Tejani \(2022\)On the opportunities and risks of foundation models for natural language processing in radiology\.Radiology: Artificial Intelligence4\(4\),pp\. e220119\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- S\. Xu, W\. Dong, Z\. Guo, X\. Wu, and D\. Xiong \(2024\)Exploring multilingual concepts of human values in large language models: is value alignment consistent, transferable and controllable across languages?\.InFindings of the Association for Computational Linguistics: EMNLP 2024,pp\. 1771–1793\.Cited by:[§1](https://arxiv.org/html/2605.14420#S1.p1.1)\.
- S\. Xu, Y\. Leng, L\. Yu, and D\. Xiong \(2025\)Self\-pluralising culture alignment for large language models\.InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\),pp\. 6859–6877\.Cited by:[§2](https://arxiv.org/html/2605.14420#S2.SS0.SSS0.Px2.p1.1)\.
- A\. Yang, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Li, D\. Liu, F\. Huang, H\. Wei, H\. Lin, J\. Yang, J\. Tu, J\. Zhang, J\. Yang, J\. Yang, J\. Zhou, J\. Lin, K\. Dang, K\. Lu, K\. Bao, K\. Yang, L\. Yu, M\. Li, M\. Xue, P\. Zhang, Q\. Zhu, R\. Men, R\. Lin, T\. Li, T\. Xia, X\. Ren, X\. Ren, Y\. Fan, Y\. Su, Y\. Zhang, Y\. Wan, Y\. Liu, Z\. Cui, Z\. Zhang, and Z\. Qiu \(2024\)Qwen2\.5 technical report\.CoRRabs/2412\.15115\.External Links:[Link](https://doi.org/10.48550/arXiv.2412.15115),[Document](https://dx.doi.org/10.48550/ARXIV.2412.15115),2412\.15115Cited by:[§5\.2](https://arxiv.org/html/2605.14420#S5.SS2.p1.1)\.

## Appendix A Demographic Attribute Selection

This section provides specific mapping details for the 11 core demographic attributes\. As listed in Table [6](https://arxiv.org/html/2605.14420#A1.T6), these features are categorized into Social Attributes \(e\.g\., Age, Gender\), Economic Status \(e\.g\., Income Bracket, Occupation\), and Cultural Background \(e\.g\., Education, Religion\)\. For attributes measured on a numerical scale \(Age, Income Bracket, and Number of Children\), we apply the following discretization strategies to map the numerical ranges into ordinal semantic levels:

- Age \(Q262\): Mapped into five developmental life stages: Adolescence \($<18$\), Young Adulthood \(18–35\), Middle Adulthood \(35–51\), Late Adulthood \(51–65\), and Older Adulthood \($\geq 65$\)\.
- Income Bracket \(Q288\): Originally a 10\-point scale, this attribute is grouped into three economic brackets: Low \(1–3\), Middle \(4–7\), and High \(8–10\)\.
- Number of Children \(Q274\): Simplified into a binary status indicating parenthood \(Has children vs\. Has no children\)\.
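
The three discretization rules above can be sketched as small mapping functions. Because the listed age ranges share their endpoints (35, 51, 65), the sketch assumes half-open intervals in which a boundary age falls into the older stage; this convention, the function names, and the omission of any handling for missing-value codes are assumptions.

```python
def life_stage(age):
    """Map age in years (Q262) to one of five life stages.
    Assumes half-open intervals: [18, 35), [35, 51), [51, 65), and 65 or older."""
    if age < 18:
        return "Adolescence"
    if age < 35:
        return "Young Adulthood"
    if age < 51:
        return "Middle Adulthood"
    if age < 65:
        return "Late Adulthood"
    return "Older Adulthood"

def income_bracket(score):
    """Map the 10-point income scale (Q288) to Low (1-3), Middle (4-7), High (8-10)."""
    if not 1 <= score <= 10:
        raise ValueError("income score must be between 1 and 10")
    if score <= 3:
        return "Low"
    if score <= 7:
        return "Middle"
    return "High"

def parenthood(num_children):
    """Collapse the number of children (Q274) into a binary parenthood status."""
    return "Has children" if num_children > 0 else "Has no children"
```

For example, under these assumptions a 35-year-old respondent with income score 7 and two children maps to Middle Adulthood, Middle, and Has children.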

Table 6: Demographic Attributes. \*As in the original dataset (JD Systems Institute & WVSA 2022, Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52))). \*\*As in the sociological stratification of Bourdieu ([2018](https://arxiv.org/html/2605.14420#bib.bib9)).

Table 7: Value Question. \*As in the original dataset (JD Systems Institute & WVSA 2022, Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52))). \*\*As in the characterization of Pileggi ([2024](https://arxiv.org/html/2605.14420#bib.bib35)).

Algorithm 1: Structured Chain-of-Thought (CoT) Instruction Template for DVMap.

Demographic Archetypes Injection:

You are playing the role of a {Life Stage} {Gender} from {Country}.

You are {Marital Status} and {Parenthood}.

You have completed your education at the level of {Education Level}.

Currently, you work as a {Occupation}. Your work involves {Work Nature}.

Your income level is {Income Bracket}, which is categorized as low, medium, or high.

Your native language is {Common Language}.

You practice the religion of {Religion}.

Task Description:

Based on the character’s personal information (such as education, occupation, income, religious beliefs, life stage, etc.) and the given value-based question, please follow the structured reasoning steps below.

Structured CoT Instruction:

1\. Analyze the current question in relation to the character’s identity and values: Consider whether the current question aligns or conflicts with the character’s background, social context, and personal beliefs. For each identity attribute (e.g., education, occupation, income, etc.), keep the analysis concise (1-3 sentences).

2\. Provide reasoning for each option: Explain why each option aligns or misaligns with the character’s identity, values, and beliefs. You may reference education level, income bracket, religion, occupation, life stage, and other relevant traits. Keep the reasoning for each option brief (1-3 sentences).

3\. Select the most appropriate answer: After analyzing all options, choose the one that best reflects the character’s social background, personal beliefs, and core values.

Output Constraint:

Only output the final answer inside the \<answer\>\</answer\> tags, without any additional explanation.

Input Data:

“Question”: \[---Insert Value-based Question Here---\]

“Options”: \[---Insert Options List Here---\]
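As a rough illustration of how the template in Algorithm 1 could be instantiated, the sketch below fills the eleven placeholders from a dictionary and extracts the final choice from the answer tags. The names `PROFILE_TEMPLATE`, `build_prompt`, and `extract_answer` and the dictionary keys are hypothetical, the Task Description and Structured CoT Instruction blocks are omitted for brevity, and the way options are joined into a list is an assumption.

```python
import re
from typing import Mapping, Optional, Sequence

# Persona lines of Algorithm 1, with placeholders renamed to Python-friendly
# keys (the key names are assumptions made for this sketch).
PROFILE_TEMPLATE = (
    "You are playing the role of a {life_stage} {gender} from {country}.\n"
    "You are {marital_status} and {parenthood}.\n"
    "You have completed your education at the level of {education_level}.\n"
    "Currently, you work as a {occupation}. Your work involves {work_nature}.\n"
    "Your income level is {income_bracket}, "
    "which is categorized as low, medium, or high.\n"
    "Your native language is {common_language}.\n"
    "You practice the religion of {religion}."
)

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def build_prompt(profile: Mapping[str, str], question: str,
                 options: Sequence[str]) -> str:
    """Fill the persona lines and append the question and option list."""
    persona = PROFILE_TEMPLATE.format(**profile)  # KeyError if a field is missing
    option_text = ", ".join(options)
    return f'{persona}\n\n"Question": {question}\n"Options": [{option_text}]'


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the first <answer></answer> pair, or None."""
    match = ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None
```

Returning `None` when the tags are absent lets a caller treat malformed completions separately from wrong answers.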

## Appendix B Value Question Sampling

Following Pileggi ([2024](https://arxiv.org/html/2605.14420#bib.bib35)), we sample 16 value-representative questions based on three criteria: independence, minimal overlap, and social generalizability.

1. Independence: Selected features model stand-alone attributes. Given the structured nature of the original WVS questionnaire, questions are carefully chosen to establish clear conceptual boundaries and avoid redundancy within grouped questions.
2. Minimal Overlap: To mitigate collinearity and conceptual ambiguity, features are filtered to minimize semantic overlap, ensuring that each selected question addresses a distinct aspect of human values.
3. Social Generalizability: Priority is given to attributes that reflect generic concepts at a societal level (e.g., discriminatory or divisive topics) rather than idiosyncratic personal preferences. This aligns the data with a high-level conceptual framework suitable for cross-cultural analysis.

Table 8: Country Sampling of Cross-Country Generalization. \* denotes Nuance Testing. \*\* denotes Gap Filling.

Table 9: Question Sampling of Cross-Value Generalization. \*As in the original dataset (JD Systems Institute & WVSA 2022, Haerpfer et al. ([2022](https://arxiv.org/html/2605.14420#bib.bib52))).

Table [7](https://arxiv.org/html/2605.14420#A1.T7) details the original question IDs, the specific survey questions, and their corresponding concepts and metrics.

## Appendix C Instruction Template

As shown in Algorithm [1](https://arxiv.org/html/2605.14420#alg1), this template first instantiates the demographic archetype by injecting the 11-dimensional identity attributes. It then gives a three-stage Chain-of-Thought (CoT) instruction that guides the model to explicitly analyze the correlations between the identity attributes and the given question.

## Appendix D Country Sampling of Cross-Country Generalization

The Cross-Country Generalization set consists of 8 countries unseen during training, spanning all four quadrants of the Inglehart-Welzel Cultural Map (Inglehart and Welzel, [2005](https://arxiv.org/html/2605.14420#bib.bib2)). As detailed in Table [8](https://arxiv.org/html/2605.14420#A2.T8), the selection adheres to a dual logic covering all specific test cases:

- Gap Filling: Includes cultural regions entirely absent from the training set to expand geographical coverage. This category comprises Nigeria (Global South), Iran (representing theocratic Shia Islam), Pakistan (representing the South Asian Islamic sphere), and Indonesia (representing the Southeast Asian archipelago).
- Nuance Testing: Includes nations that share broad civilizational lineages with training anchors but possess distinct local characteristics. This category comprises Vietnam (shares Confucian roots with China but differs in political history), Australia (shares Anglosphere roots with the UK/USA but within an Asia-Pacific context), Mexico (shares Hispanic roots with Brazil but with distinct North American dynamics), and Türkiye (shares Islamic roots with Egypt but maintains a distinct secular tradition).

## Appendix E Question Sampling of Cross-Value Generalization

To robustly evaluate the Cross\-Value generalization capabilities of the DVMap framework, we curate a separate validation set consisting of 7 distinct questions from the WVS Wave 7\. These questions are not included in the training phase but are selected based on their semantic proximity to the 16 core training features\. The selection rationale aims to test the model’s ability to transfer learned value representations to unseen but conceptually related contexts\. The selection criteria are twofold:

- Contextual Variations of Core Concepts: Questions Q61, Q70, Q113, and Q132 serve as direct semantic neighbors to the training questions Q60, Q69, Q112, and Q131, respectively. For instance, while the training set asks about trust in known people (Q60), the generalization set asks about trust in strangers (Q61). This tests whether the model can generalize the abstract concept of “Social Trust” across different social distances.
- Thematic Extensions of Values: Questions Q8, Q9, and Q37 extend the “Child-rearing” and “Societal Duty” dimensions. Instead of asking about the personal importance of family (Q1), these questions probe specific child-rearing values (Independence, Hard work) and societal duties. This evaluates the model’s ability to infer specific value applications from broad value principles.

Table [9](https://arxiv.org/html/2605.14420#A2.T9) details the characterization of these generalization questions, following the same taxonomy as the training set.

Table 10: Performance of DVMap on Llama-3.2-3B-Instruct across three generalization benchmarks.

## Appendix F Generalizability across Model Families

To address concerns regarding model diversity, we extended our evaluation to the Llama-3.2-3B-Instruct architecture. This ensures that the observed benefits of DVMap are not idiosyncratic to the Qwen family but are transferable to models with different pre-training objectives and tokenization schemes.

As summarized in Table [10](https://arxiv.org/html/2605.14420#A5.T10), DVMap delivers consistent and significant performance improvements across all benchmarks. On the Cross-Demographic task, our method increases Accuracy by 12.8% and reduces the Wasserstein Distance (WD) by 0.0437. Even on the more challenging Cross-Value task, DVMap maintains steady improvements in both alignment accuracy and label consistency. These results empirically validate that DVMap effectively captures universal patterns of pluralistic value mapping, facilitating robust alignment regardless of the underlying backbone architecture.
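For readers unfamiliar with the WD metric, the sketch below shows one common way to compute a Wasserstein-1 distance between a predicted and a reference answer distribution over ordered options. The exact formulation behind Table 10 is not restated in this appendix, so this is only an assumption-laden illustration: it treats the options as ordinal with unit spacing, and the optional division by K-1 (to map the result into [0, 1]) is likewise an assumption. The function name `wasserstein_1d` is hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def wasserstein_1d(p: Sequence[float], q: Sequence[float],
                   normalize: bool = True) -> float:
    """Wasserstein-1 distance between two distributions over K ordered options.

    On an evenly spaced ordinal support, W1 equals the sum of absolute
    differences between the two cumulative distribution functions.
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q must have the same length (at least 2)")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("distributions must have positive mass")
    # Normalize counts or unnormalized weights into probabilities, then build CDFs.
    cdf_p = list(accumulate(x / total_p for x in p))
    cdf_q = list(accumulate(x / total_q for x in q))
    distance = sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))
    # ASSUMPTION: dividing by K-1 rescales the distance into [0, 1].
    return distance / (len(p) - 1) if normalize else distance
```

Unlike plain accuracy, this distance penalizes a prediction less when its mass lands on an option adjacent to the reference answer than when it lands on the opposite end of the scale.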

## Appendix G Implementation Details

#### Hyperparameter Settings\.

We fine-tune the models using Group Relative Policy Optimization (GRPO) with a learning rate of $5\times 10^{-6}$. To ensure generation diversity during rollout, the sampling temperature is set to $T=0.7$. We set the number of rollouts per iteration to 8 and the global batch size to 64. Models are trained for only 1 epoch to prevent overfitting. We utilize bfloat16 precision to balance memory efficiency and numerical stability, accelerating training with Flash-Attention.
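To make the role of the rollouts concrete, the following sketch collects the stated hyperparameters in a plain dictionary and shows the group-relative advantage computation that gives GRPO its name: each rollout's reward is standardized against the other rollouts sampled for the same prompt. The dictionary keys, the function name, the use of the population standard deviation, and the binary reward in the usage note are all assumptions for illustration; the reward design actually used by DVMap is not described in this appendix.

```python
from statistics import mean, pstdev
from typing import List, Sequence

# Values stated in this section; the key names are hypothetical.
HYPERPARAMS = {
    "learning_rate": 5e-6,
    "rollout_temperature": 0.7,
    "num_rollouts": 8,
    "global_batch_size": 64,
    "epochs": 1,
    "dtype": "bfloat16",
}


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of rollouts for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps)
    ASSUMPTION: population standard deviation; implementations differ on this.
    """
    if len(rewards) == 0:
        raise ValueError("need at least one rollout reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, with a hypothetical binary reward (1 if the predicted option matches the survey answer, 0 otherwise) and 8 rollouts of which 2 are correct, the two correct rollouts receive positive advantages and the six incorrect ones receive negative advantages; if all rollouts earn the same reward, every advantage is zero and the group contributes no gradient signal.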

#### Computational Environment\.

All experiments are conducted on an Ubuntu 20.04 operating system. The hardware infrastructure consists of a server equipped with 8 NVIDIA A100 (80GB) GPUs and 512GB of system RAM. The training framework is implemented based on PyTorch and VeRL (Volcano Engine RL library, [https://github.com/volcengine/verl](https://github.com/volcengine/verl)), utilizing the FSDP2 (Fully Sharded Data Parallel) strategy for multi-GPU parallel acceleration.

Complete corpus and code will be available soon\.

Table 11: Comparison of general utility between the base model and DVMap on Qwen3-8B.

Table 12: Generalization Mechanism and Semantic Correlation Analysis.

## Appendix H Impact on General Model Utility

A common concern in model alignment is the potential trade\-off between specialized steering and general utility, often referred to as the “alignment tax”\. To evaluate whether DVMap preserves the core capabilities of the base LLM, we conduct a comprehensive evaluation on five standard benchmarks: MMLU, ARC\-Easy, GSM8K, HellaSwag, and IFEval\.

As summarized in Table [11](https://arxiv.org/html/2605.14420#A7.T11), the performance fluctuations between the base model and DVMap are negligible across all evaluated dimensions. For instance, the changes in MMLU (+0.0008), ARC-Easy (-0.0013), and GSM8K (-0.0007) are marginal. Notably, we observe a slight improvement in IFEval (+0.0048), suggesting that structured reasoning training may marginally benefit instruction-following consistency. These results empirically demonstrate that DVMap achieves precise pluralistic alignment without sacrificing general-purpose capability.

## Appendix I Detailed Cross-Country Generalization

To provide a more granular view of cross-country generalization, we present per-country accuracy improvements in Figure [9](https://arxiv.org/html/2605.14420#A9.F9). The countries represented by ISO codes in the visualization correspond to those listed in Table [8](https://arxiv.org/html/2605.14420#A2.T8).

![Refer to caption](https://arxiv.org/html/2605.14420v1/x11.png)

Figure 9: Details of Cross-Country Generalization.

As illustrated in Figure [9](https://arxiv.org/html/2605.14420#A9.F9), the performance enhancement brought by DVMap is broadly distributed. For instance, countries with significantly different value priors from the training set, such as those in Sub-Saharan Africa and South Asia, still exhibit substantial improvements. This granular analysis demonstrates that the high-consensus demographic-value mapping captured by DVMap transcends specific national boundaries, confirming its effectiveness in modeling pluralistic values on a global scale.

Profile Identity: Middle Adulthood, Female, Russian Federation (Country), Widowed, Has Children, Lower secondary education (ISCED 2), Retired/pensioned, Private business or industry, Russian (language), Eastern Orthodox

Question: Taking all things together, would you say you are very happy, rather happy, not very happy, or not at all happy?

| Model | Income | Reasoning | Answer |
| --- | --- | --- | --- |
| Base | Low (Original) | The character is a middle-aged widow from the Russian Federation with a lower secondary education, retired, and has a low income. She practices Eastern Orthodox religion and is likely to value stability, family, and spiritual well-being. Her life stage as a widow with at least one child suggests a focus on family and personal fulfillment. Her low income and retired status may influence her emotional state. Her Eastern Orthodox beliefs may also shape her perception of happiness, emphasizing spiritual contentment over material success.<br>Very happy: This option may not align with her low income and retired status, which could lead to feelings of financial insecurity or loneliness. It may also not reflect the spiritual nature of her happiness.<br>Rather happy: This option could align with her spiritual beliefs and family life, suggesting a moderate level of contentment despite financial challenges.<br>Not very happy: This could misalign with her spiritual values and family life, suggesting a more negative outlook than her background implies.<br>Not at all happy: This option is unlikely, as it contradicts her likely values and the stability provided by her family and religious practices. | Rather happy |
| Base | High (Counterfactual) | The character is a middle-aged widow with a high income, retired, and has at least one child. She has completed lower secondary education and practices Eastern Orthodox religion. Her background suggests a stable and possibly fulfilling life, especially with a high income and a family. Her religious beliefs may influence her perception of happiness, emphasizing spiritual fulfillment. As a retired individual, she may find happiness in family, faith, and personal achievements.<br>The option “Very happy” aligns with her stable life and high income, which can contribute to a sense of security and contentment. “Rather happy” also fits, as it reflects a moderate but positive outlook, which is common for someone in her life stage. “Not very happy” or “Not at all happy” would conflict with her background, as they suggest dissatisfaction or unhappiness, which is less likely given her financial stability and family situation.<br>The most appropriate answer reflects her positive life circumstances and spiritual beliefs, making “Very happy” the most fitting choice. | Very happy (Flipped) |
| DVMap | Low (Original) | The character is a middle-aged widow from the Russian Federation with a low income, who has completed lower secondary education and is retired. She runs a private business and practices Eastern Orthodox Christianity. Her life stage and financial situation may influence her happiness. Her religious beliefs may also play a role in her perception of happiness.<br>The option “Rather happy” aligns with her background as a widow managing a private business with a low income. It reflects a moderate level of happiness that is achievable given her circumstances. The option “Very happy” may be less likely due to her low income and life stage. The options “Not very happy” and “Not at all happy” may not fully capture her resilience and religious faith. | Rather happy |
| DVMap | High (Counterfactual) | The character is a middle adulthood female from the Russian Federation, widowed with at least one child, who has completed lower secondary education and is currently retired with a high income. She engages in private business or industry and practices Eastern Orthodox Christianity. These attributes suggest a person who values family, faith, and financial stability.<br>Each option reflects different levels of happiness. “Very happy” aligns with her high income and possibly stable life, but may seem overly optimistic considering the emotional impact of being widowed. “Rather happy” is a balanced choice that reflects her life stage and values. “Not very happy” and “Not at all happy” conflict with her high income and stable life situation.<br>The most appropriate answer considers her high income, stable life, and religious beliefs, which generally contribute to a sense of contentment and happiness. | Rather happy (Robust) |

Table 13: Full Reasoning of Robustness Analysis. This table displays the unabridged reasoning outputs generated by the Base model and the DVMap model. We highlight the critical logic segments.

## Appendix J Generalization Mechanism and Semantic Correlation Analysis

To quantify the relationship between semantic proximity and model generalization, we computed two semantic distance metrics for each question $q_{test}$ in the Cross-Value generalization set relative to the training set $D_{train}$:

1. Nearest Neighbor Distance ($d_{min}$): Defined as $d_{min}=\min_{q\in D_{train}}\text{dist}(q_{test},q)$.
2. Average Semantic Distance ($d_{avg}$): Defined as $d_{avg}=\frac{1}{|D_{train}|}\sum_{q\in D_{train}}\text{dist}(q_{test},q)$.

Semantic embeddings were extracted using the Qwen3-8B model, employing Cosine Distance as the metric. Table [12](https://arxiv.org/html/2605.14420#A7.T12) reports the detailed metrics and performance outcomes.
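The two metrics follow directly from their definitions once each question has an embedding vector. The sketch below is a minimal pure-Python illustration with hypothetical function names; it assumes the embeddings have already been extracted (the pooling strategy used to obtain one vector per question from Qwen3-8B is not specified here) and takes cosine distance to be one minus cosine similarity.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance, taken here as 1 - cosine similarity."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_distances(q_test: Vector,
                       d_train: Sequence[Vector]) -> Tuple[float, float]:
    """Return (d_min, d_avg) of one test question relative to the training set."""
    if len(d_train) == 0:
        raise ValueError("training set must not be empty")
    dists = [cosine_distance(q_test, q) for q in d_train]
    d_min = min(dists)               # nearest-neighbor distance
    d_avg = sum(dists) / len(dists)  # average semantic distance
    return d_min, d_avg
```

A test question can have a very small `d_min` (one near-duplicate in the training set) while its `d_avg` remains large, which is why the two metrics can correlate differently with generalization performance.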

As discussed in Section [5.5](https://arxiv.org/html/2605.14420#S5.SS5), the stronger correlation between $d_{avg}$ and changes in generalization performance, compared to $d_{min}$, confirms that generalization is driven by global semantic alignment. Among the Cross-Value questions, Q61 serves as a key case of negative transfer. Despite being nearly identical to the training question Q60 ($d_{min}=0.0037$), Q61 experienced a performance drop (-6.0%). While Q60 asks about trusting “people you know,” Q61 asks about trusting “people you meet for the first time”. This subtle contextual shift caused the model to misapply the learned trust pattern (likely overfitting to high trust values for known groups), leading to misalignment on the new question. In contrast, Q70 (confidence in courts) successfully leveraged its similarity to Q69 (confidence in police) for a significant gain (+13.0%), as the underlying value logic remained consistent across these authority-related questions.

These findings highlight that while semantic proximity generally facilitates transfer, inconsistent underlying value logic can lead to negative transfer. This underscores the importance of coherent demographic-value logic over superficial semantic similarity. Future training pipelines could therefore benefit from incorporating contrastive samples (questions that are semantically similar but have distinct value orientations) to further enhance the model’s ability to discern subtle nuances in value judgments.

## Appendix K Case Study of Robustness Analysis

To illustrate the difference in reasoning between the Base model and DVMap, we present a representative case: a middle-aged, widowed, Eastern Orthodox woman from the Russian Federation with a lower secondary education, as shown in Table [13](https://arxiv.org/html/2605.14420#A9.T13).

- The Base Model (Economic Determinism): When the income is counterfactually flipped to “High,” the Base model immediately flips its answer from “Rather happy” to “Very happy”. Its reasoning reveals a linear, shallow logic: it equates financial wealth directly with maximum happiness, ignoring the profound emotional impact of widowhood and the cultural nuance of Russian modesty.
- The DVMap (Intersectionality & Inertia): Facing the same high-income input, DVMap maintains its prediction of “Rather happy”. Its reasoning chain demonstrates contextual awareness: it acknowledges the financial stability but argues that “Very happy” “may seem overly optimistic considering the emotional impact of being widowed”. DVMap thus weighs the marginal utility of money against the structural constraints of life stage and culture.

This indicates that rather than superficially reacting to the income attribute, DVMap leverages multi\-dimensional demographic constraints, recognizing that core values embedded in holistic identities possess resilience against economic fluctuations\.