Embeddings for Preferences, Not Semantics
Summary
This paper introduces a new embedding model designed to capture preferential similarity rather than just semantic similarity, improving preference prediction for collective decision-making systems.
# Embeddings for Preferences, Not Semantics
Code: https://github.com/cartgr/Embeddings-for-Preferences | Model: https://huggingface.co/cartgr/embeddings-for-preferences-st5-xl
Source: [https://arxiv.org/html/2605.08360](https://arxiv.org/html/2605.08360)
Carter Blair (Harvard University, carterblair@g.harvard.edu), Ariel D. Procaccia (Harvard University, arielpro@g.harvard.edu), Milind Tambe (Harvard University, tambe@g.harvard.edu)
###### Abstract
Modern AI is opening the door to collective decision-making in which participants express their views as free-form text rather than voting on a fixed set of candidates. A natural idea is to embed these opinions in a vector space so that the substantial literature on facility location problems and fair clustering can be brought to bear. But standard text embeddings measure semantic similarity, whereas distances in facility location problems and fair clustering require what we call *preferential similarity*: a participant's agreement with a piece of text should be inversely related to their distance from it. Off-the-shelf embeddings inherit a coarse preference signal through a correlation between semantic and preferential similarity, but fail to capture preferences when the correlation breaks. We formalize this as an invariance problem: text embedding models encode both a preference-relevant signal (stance and values) and semantic nuisance (style and wording), and the two are observationally correlated, so a geometry that relies on nuisance can appear preference-correct even when it is not. We show that synthetic training data designed to break this correlation provably shifts the optimal scorer away from nuisance-dominated cosine and significantly improves preference prediction across 11 online deliberation datasets.
## 1 Introduction
Many emerging systems for collective decision-making allow participants to express their preferences in free-form text instead of voting on predetermined candidate sets. For example, two influential online deliberation platforms, Polis and Remesh, allow participants to write statements and vote on statements written by others (Small et al., [2021](https://arxiv.org/html/2605.08360#bib.bib1)). Similarly, in generative social choice, participants express their preferences in free-form text, and these texts are then aggregated into a slate of representative statements (Fish et al., [2026](https://arxiv.org/html/2605.08360#bib.bib3)). A commonality among these systems is that they require methods to group participants and to estimate a participant's utility for statements they did not vote on. Since inputs are free-form text, a natural idea is to embed each participant's text using a text embedding model. This would permit grouping participants via clustering in the embedding space, and distances could be used to estimate participants' utility for statements they did not vote on. Further, it would allow for novel applications of ideas from the facility location and fair clustering literature (Feldman et al., [2016](https://arxiv.org/html/2605.08360#bib.bib52); Chen et al., [2019](https://arxiv.org/html/2605.08360#bib.bib53); Micha and Shah, [2020](https://arxiv.org/html/2605.08360#bib.bib54); Kellerhals and Peters, [2024](https://arxiv.org/html/2605.08360#bib.bib55)).
However, off-the-shelf text embedding models are mainly trained and evaluated on semantic tasks such as retrieval, textual similarity, and natural-language inference (Muennighoff et al., [2023](https://arxiv.org/html/2605.08360#bib.bib38)). These tasks tend to reward placing texts close together in embedding space when they discuss the same topic or answer a similar query. They do not necessarily require close points to be mutually endorsable or, in other words, preferentially similar.
The distinction between semantic and preferential similarity matters if embeddings are to be used inside preference aggregation procedures. To see why, imagine two statements from opposite sides of a controversial political debate. They may share the same style and topic, as well as many of the same words, all of which a generic text embedding model may pick up on. However, they would not be mutually endorsed. Table [1](https://arxiv.org/html/2605.08360#S1.T1) gives an example where a small alteration of a statement produces near-identical surface similarity to the anchor yet significantly alters the preferential similarity. A standard embedding model (ST5-XL) scores the altered distractor statement as more similar than an opinion-aligned alternative expressed in different words.
Table 1: A hard triplet illustrating the mismatch between semantic and preferential similarity. The semantic distractor uses the anchor's wording but reverses its stance, while the preference match preserves the stance but changes the wording. A base embedding model ranks the distractor closer to the anchor. A preference-tuned embedding and a topic-specific projected embedding both produce the correct ranking.
We frame the mismatch between semantic and preferential geometry as an invariance problem (Achille and Soatto, [2018](https://arxiv.org/html/2605.08360#bib.bib29)): a preference geometry should be invariant to wording and style, and sensitive only to stance and values. Generic embedding models do not have this invariance, because they encode topical and stylistic features that are useful for retrieval and similarity tasks but unrelated to whether two statements would be endorsed by the same person. On natural deliberation data, this gap is partly hidden because semantic and preferential similarity are correlated: people who share a stance often share wording. In §[4.1](https://arxiv.org/html/2605.08360#S4.SS1) we make the confounding explicit by decomposing the cosine margin into a preference component and a nuisance component. Cosine weights the two equally, so when they agree (as on natural data) it looks correct, but when they disagree (as on the hard triplet in Table [1](https://arxiv.org/html/2605.08360#S1.T1)) the nuisance dominates and cosine fails.
Our method follows from this diagnosis\. We synthesize triplets consisting of an anchor, a preference match with different wording, and a semantic distractor with high surface overlap but opposite stance\. Training on these triplets forces cosine to down\-weight nuisance variation that ordinary data leaves confounded with preference\. We prove that, under this hard\-triplet distribution, the Bradley\-Terry risk is strictly decreased by reducing the nuisance contribution below cosine’s unit weighting\. Empirically, our method significantly improves performance on hard triplets and on preference prediction across 11 online deliberation datasets\.
When per\-topic votes are available, which is common in online deliberation platforms, the same framework suggests an even simpler method: learning a low\-rank projection of the frozen embedding\. Despite its simplicity, this projected embedding performs very well in practice\.
To summarize, our contributions are as follows:
1. We diagnose the mismatch between semantic and preferential similarity as an invariance problem, and formalize it via a decomposition of the cosine margin into preference signal and nuisance.
2. We introduce a hard-triplet synthesis procedure and prove that training on it strictly decreases the Bradley-Terry risk relative to standard cosine.
3. We demonstrate substantial gains on hard triplets and on preference prediction across 11 online deliberation datasets.
4. We show that when per-topic votes are available, a low-rank projection of frozen embeddings outperforms full preference tuning.
## 2 Related Work
A growing body of work on collective decision-making over free-form text relies on some geometry over participants or statements: Polis derives opinion maps from vote matrices (Small et al., [2021](https://arxiv.org/html/2605.08360#bib.bib1)), generative social choice groups statements in an LLM-defined feature space to produce representative slates (Fish et al., [2026](https://arxiv.org/html/2605.08360#bib.bib3)), Blair et al. ([2026](https://arxiv.org/html/2605.08360#bib.bib5)) model consensus as a region of embedding space, and De et al. ([2026](https://arxiv.org/html/2605.08360#bib.bib30)) use cosine similarity of embeddings as participant utility when auditing justified representation in slates of questions. Our goal in this paper is to answer a prerequisite question. Namely, do distances in general-purpose sentence embedding spaces reflect preferential similarity? And, if not, can they be realigned so that they do?
Another line of work fine-tunes sentence encoders for opinion-related tasks, including stance-aware embeddings for opinion mining (Ghafouri et al., [2024](https://arxiv.org/html/2605.08360#bib.bib20)) and sparsity-aware embeddings for contradiction retrieval (Xu et al., [2024](https://arxiv.org/html/2605.08360#bib.bib24)). However, in §[6](https://arxiv.org/html/2605.08360#S6) we find that neither works well for our task. Our recipe is also related to SimCSE (Gao et al., [2021](https://arxiv.org/html/2605.08360#bib.bib67)), which uses NLI entailments as positives and contradictions as hard negatives. Our task is different and we go a step further by engineering triplets where the nuisance signal intentionally points in the wrong direction (see Appendix [E.4](https://arxiv.org/html/2605.08360#A5.SS4) for the relevant ablation). An extended related work discussion is in Appendix [A](https://arxiv.org/html/2605.08360#A1).
## 3 Evaluation Setup
Before presenting our diagnosis and method, we describe the evaluation data and metrics used throughout. We evaluate on 11 datasets from three deliberation platforms. Generative social choice (GSC) (Fish et al., [2026](https://arxiv.org/html/2605.08360#bib.bib3)) provides surveys in which participants write free-text opinions and then rate AI-generated statements (two abortion surveys and one on chatbot personalization). Remesh provides binary agree/disagree votes on others' open-ended responses across three topics (campus protests, foreign intervention, right to assemble). Polis (Small et al., [2021](https://arxiv.org/html/2605.08360#bib.bib1)) provides comment-level agree/disagree votes across five conversations (Seattle minimum wage, Bowling Green, Brexit, Canadian electoral reform, universal basic income). Some participants author comments in addition to voting. Together these span text lengths from short Polis comments to multi-paragraph GSC opinions and cover 1,462 participants, 3,958 statements, and 1.46M pairwise preference triplets. Dataset details and URLs are in Appendix [B](https://arxiv.org/html/2605.08360#A2).
For each participant, we construct preference triplets $(a,p,n)$ where $a$ is the participant's own written text (the anchor), $p$ is a statement they rated more favorably, and $n$ is one they rated less favorably. Candidate models score each triplet by computing a similarity margin $s(a,p)-s(a,n)$; the triplet is correct if this margin is positive. We call the fraction of correctly ordered triplets *triplet accuracy*, and *cosine accuracy* when the scorer $s$ is cosine similarity. In §[7](https://arxiv.org/html/2605.08360#S7) we also evaluate the ideal-point scorer, for which $s$ is the learned distance-based utility.
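The metric just defined can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the function names and the toy 2-d vectors are hypothetical, and embeddings are assumed to be plain lists of floats.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def triplet_accuracy(triplets, scorer=cosine):
    """Fraction of (a, p, n) triplets whose margin s(a, p) - s(a, n) is positive."""
    correct = sum(1 for a, p, n in triplets if scorer(a, p) - scorer(a, n) > 0)
    return correct / len(triplets)

# Toy embeddings (hypothetical values, not taken from any dataset).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # p is closer to a: counted correct
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),  # n is closer to a: counted incorrect
]
accuracy = triplet_accuracy(toy_triplets)
```

Passing a different `scorer` (for instance a learned distance-based utility) reuses the same accuracy definition; with cosine as the scorer the result is what the text calls cosine accuracy.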
## 4 Diagnosing Embedding Models
We begin by characterizing the preference signal in existing embedding models. We introduce a formal framework (§[4.1](https://arxiv.org/html/2605.08360#S4.SS1)), show that cosine is an approximation to the ideal-point utility margin, present empirical evidence for a correlation between nuisance and preference signal in the natural-data regime (§[4.2](https://arxiv.org/html/2605.08360#S4.SS2)), and diagnose the hard-triplet failure (§[4.3](https://arxiv.org/html/2605.08360#S4.SS3)).
### 4.1 Formal framework
Let $\psi:\mathcal{X}\to\mathbb{R}^{d}$ denote a pretrained encoder producing unit-norm embeddings. Within the embedding, preference on a given topic is governed by a *preference subspace* $S\subseteq\mathbb{R}^{d}$ of dimension $k\ll d$ that carries the stance-relevant structure. Let $P_{S}$ denote orthogonal projection onto $S$ and $P_{S^{\perp}}$ onto its complement, and abbreviate $\psi_{S}:=P_{S}\psi$, $\psi_{\perp}:=P_{S^{\perp}}\psi$. Each participant $v$ writes anchor text $a_{v}$ and has *ideal point* $u_{v}:=\psi_{S}(a_{v})\in S$, which is the projection of their own embedding onto $S$. Different participants have different anchors and thus different ideal points in $S$. For example, pro-choice and pro-life users live in opposite regions along the stance axis while sharing $S$ as the relevant direction of variation.
We model the utility of candidate statement $j$ as the negative squared Euclidean distance to the participant's ideal point within $S$ (the derivations below extend to the general Mahalanobis case $-\|\psi_{S}(a_{v})-\psi_{S}(x_{j})\|_{M}^{2}$ for any PSD $M$ on $S$: absorbing $\sqrt{M}$ into the encoder reduces it to the Euclidean case with no other change). Expanding the square,
$$U^{*}(v,j)\;=\;-\|\psi_{S}(a_{v})-\psi_{S}(x_{j})\|^{2}\;=\;2\left\langle\psi_{S}(a_{v}),\psi_{S}(x_{j})\right\rangle-\|\psi_{S}(a_{v})\|^{2}-\|\psi_{S}(x_{j})\|^{2}.$$
Pairwise preferences follow Bradley-Terry: $\Pr[p\succ n\mid v]=\bigl(1+e^{-(U^{*}(v,p)-U^{*}(v,n))}\bigr)^{-1}$. For within-anchor rankings, the $\|\psi_{S}(a_{v})\|^{2}$ term is constant across candidates and cancels out, so the ranking-relevant utility margin is
$$U^{*}(v,p)-U^{*}(v,n)\;=\;2\underbrace{\left\langle\psi_{S}(a_{v}),\psi_{S}(x_{p})-\psi_{S}(x_{n})\right\rangle}_{\Delta_{S}}\;+\;\underbrace{\|\psi_{S}(x_{n})\|^{2}-\|\psi_{S}(x_{p})\|^{2}}_{\Delta_{\text{norm}}}.\tag{1}$$
The first term $\Delta_{S}$ is bilinear in the anchor and item projections and the second $\Delta_{\text{norm}}$ is an item-specific quadratic that depends only on candidate norms within $S$.
#### Cosine as an approximation\.
A cosine scorer on an off-the-shelf embedding model does not know $S$ and uses all $d$ dimensions equally. Because $P_{S}$ and $P_{S^{\perp}}$ map into orthogonal subspaces, the cosine margin decomposes additively as
$$s(a_{v},p)-s(a_{v},n)\;=\;\underbrace{\left\langle\psi_{S}(a_{v}),\psi_{S}(x_{p})-\psi_{S}(x_{n})\right\rangle}_{\Delta_{S}}\;+\;\underbrace{\left\langle\psi_{\perp}(a_{v}),\psi_{\perp}(x_{p})-\psi_{\perp}(x_{n})\right\rangle}_{\Delta_{T}}.\tag{2}$$
Comparing ([2](https://arxiv.org/html/2605.08360#S4.E2)) with ([1](https://arxiv.org/html/2605.08360#S4.E1)), cosine captures the bilinear part of the utility margin (up to a factor of two) but misses the item-norm difference $\Delta_{\mathrm{norm}}$ entirely, and contributes an out-of-subspace nuisance $\Delta_{T}$ that has no utility counterpart. Cosine is therefore an approximation to the ideal-point utility margin, tight when $\Delta_{\mathrm{norm}}$ is small and when $\Delta_{T}$ aligns with $\Delta_{S}$. In §[7](https://arxiv.org/html/2605.08360#S7) we find empirical evidence that this model describes real preferences well.
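The two decompositions can be checked numerically on random vectors. The sketch below is illustrative only: it assumes, purely for convenience, that the preference subspace $S$ is spanned by the first `k` coordinates (in the paper $S$ is unknown), and all helper names are hypothetical.

```python
import math
import random

def normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def split(v, k):
    """Project v onto S (assumed: first k coordinates) and onto its complement."""
    v_s = v[:k] + [0.0] * (len(v) - k)
    v_perp = [0.0] * k + v[k:]
    return v_s, v_perp

def margins(a, p, n, k):
    """Return (Delta_S, Delta_T, Delta_norm) for one triplet."""
    a_s, a_t = split(a, k)
    p_s, p_t = split(p, k)
    n_s, n_t = split(n, k)
    delta_s = dot(a_s, [x - y for x, y in zip(p_s, n_s)])
    delta_t = dot(a_t, [x - y for x, y in zip(p_t, n_t)])
    delta_norm = dot(n_s, n_s) - dot(p_s, p_s)
    return delta_s, delta_t, delta_norm

rng = random.Random(0)
d, k = 8, 2
a, p, n = (normalize([rng.gauss(0, 1) for _ in range(d)]) for _ in range(3))

delta_s, delta_t, delta_norm = margins(a, p, n, k)
cosine_margin = dot(a, p) - dot(a, n)      # unit-norm vectors: dot equals cosine
utility_margin = 2 * delta_s + delta_norm  # right-hand side of Eq. (1)
```

Here `cosine_margin` agrees with `delta_s + delta_t` as in Eq. (2), while `utility_margin` agrees with the difference of negative squared distances inside $S$, which makes the two mismatches (the missing `delta_norm` and the extra `delta_t`) concrete.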
Figure [1](https://arxiv.org/html/2605.08360#S4.F1) illustrates the three stages in this paper: the diagnosis that base cosine overweights the nuisance direction (§[4](https://arxiv.org/html/2605.08360#S4)), hard-triplet fine-tuning, which suppresses $\Delta_{T}$ (§[5](https://arxiv.org/html/2605.08360#S5)), and a per-topic rank-$r$ projection, which projects onto $S$ directly (§[7](https://arxiv.org/html/2605.08360#S7)).
[Figure 1, three panels: A. Base embedding; B. Hard-triplet tune (reached via hard triplets, §[5](https://arxiv.org/html/2605.08360#S5)); C. Per-topic projection (reached via per-topic votes, §[7](https://arxiv.org/html/2605.08360#S7)). Each panel shows the points $a$, $p$, $n$ with the preference subspace $S$ on the horizontal axis and the nuisance subspace $S^{\perp}$ on the vertical axis.]
Figure 1: A hard triplet: anchor $a$, preference match $p$ (same stance, different wording), and semantic distractor $n$ (opposite stance, same wording), with preference subspace $S$ horizontal and nuisance $S^{\perp}$ vertical. (A) In the pretrained embedding, $n$ shares $a$'s nuisance component, so cosine ranks $n$ above $p$. (B) Fine-tuning on counterfactual hard triplets downweights $\psi_{\perp}$ (Theorem [1](https://arxiv.org/html/2605.08360#Thmtheorem1)). (C) With per-topic labels, a rank-$r$ projection $L^{\top}$ maps onto $S$ directly, discarding $S^{\perp}$.
### 4.2 Proximity bands: cosine carries a preference signal
With the framework in place, we first ask what cosine similarity on base embedding models captures on natural deliberation data\. This diagnostic is pairwise \(between an anchor statement and a candidate statement\) rather than triplet\-based\. For each participant, we compute cosine similarity from their anchor embedding to every statement they voted on, bin these pairs into quintiles by similarity, and compute the approval rate per band\. We use the binary\-vote datasets \(Remesh and Polis\), where approval is directly observed\.
Approval rises monotonically across all four encoders \(Figure[2](https://arxiv.org/html/2605.08360#S4.F2)\)\. The spread between the most distant and most similar quintile is roughly between 15 and 20 percentage points for each encoder\. From this it is clear that cosine carries some preference signal, as statements closer to a participant’s own \(according to cosine similarity\) are more likely to be approved by that participant\.
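The banding procedure can be sketched as follows. This is a simplified illustration under stated assumptions: pairs are pooled across participants before binning (the text does not spell out whether quintile edges are pooled or per participant), the function name is hypothetical, and the toy data are invented.

```python
def approval_by_band(pairs, n_bands=5):
    """Approval rate per similarity band, from least to most similar.

    pairs: list of (cosine_similarity, approved) with approved in {0, 1}.
    Bands are equal-count slices of the pairs sorted by similarity.
    """
    ordered = sorted(pairs, key=lambda pair: pair[0])
    rates = []
    for b in range(n_bands):
        lo = b * len(ordered) // n_bands
        hi = (b + 1) * len(ordered) // n_bands
        band = ordered[lo:hi]
        rates.append(sum(vote for _, vote in band) / len(band))
    return rates

# Toy (similarity, vote) pairs; hypothetical values for illustration only.
toy_pairs = [
    (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1), (0.5, 0),
    (0.6, 1), (0.7, 1), (0.8, 0), (0.9, 1), (1.0, 1),
]
rates = approval_by_band(toy_pairs)
spread = rates[-1] - rates[0]  # most similar band minus most distant band
```

The `spread` value corresponds to the gap between the most distant and most similar quintile discussed above; the function assumes at least `n_bands` pairs so that no band is empty.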
However, where this signal comes from is less clear\. A high cosine score can arise because the candidate is close to the anchor in the preference subspace, or because it shares topic, wording, style, or affect that happen to correlate with preference in naturally occurring text on deliberation platforms\. More specifically, at the pair level similarity can be decomposed as
$$s(a_{v},x_{j})=\langle\psi(a_{v}),\psi(x_{j})\rangle=\underbrace{\langle\psi_{S}(a_{v}),\psi_{S}(x_{j})\rangle}_{s_{S}(a_{v},x_{j})}+\underbrace{\langle\psi_{\perp}(a_{v}),\psi_{\perp}(x_{j})\rangle}_{s_{T}(a_{v},x_{j})}.$$
Figure [2](https://arxiv.org/html/2605.08360#S4.F2) shows that the sum is predictive of approval but it does not identify whether the cause is the preference term $s_{S}$, the nuisance term $s_{T}$, or the two terms moving together. On natural deliberation data it could be the case that people who share a stance often also share wording and style. If this were the case, semantic similarity and preferential similarity would be observationally correlated.
Figure 2: Approval rate by cosine-similarity quintile, pooled across the binary-vote evaluation datasets.
The same ambiguity appears in the triplet margin used throughout the paper. For a triplet $(a,p,n)$ with $p$ preferred to $n$, differencing the pairwise scores gives
$$s(a,p)-s(a,n)=\Delta_{S}+\Delta_{T}.$$
Cosine gives $\Delta_{S}$ and $\Delta_{T}$ unit weight, but their effective strength on a dataset can be very different. The contribution of each term depends not only on the scorer's coefficient, but also on the norm, variance, dimensionality, and label-alignment of the component in the observed triplets. If $\Delta_{S}$ is large and reliably aligned with approval, then cosine succeeds because the embedding already contains a strong preference geometry. If $\Delta_{T}$ is large and merely correlated with approval, then cosine can also succeed on natural data, but for the wrong reason: by using semantic features that happen to track preferences in the observed distribution.
The proximity bands therefore leave open two explanations\. Base embeddings may genuinely put nearby points near each other because they are preferentially similar\. Alternatively, base embeddings may mostly capture semantic similarity, which may appear preference\-aware only because semantic similarity is correlated with preferential similarity in data from online deliberation platforms\. The next diagnostic is designed to ascertain which of the two explanations is correct by putting the two sources of signal in conflict\.
### 4.3 Hard triplets: separating preference from surface similarity
To test whether the apparent preference signal in base embedding models results from a correlated nuisance component or from a true preference awareness, we construct hard triplets that deliberately decouple preference alignment from surface similarity\. For each anchor, we generate two comparison statements: a*semantic distractor*that shares the anchor’s vocabulary but takes the opposite stance, and a*preference match*that shares the anchor’s opinion but uses different vocabulary\. A model that relies on surface features will score the distractor higher, whereas a model that captures preferential similarity will score the match higher\.
We generate 875 such triplets (100 per evaluation dataset where available) by rewriting real participant anchors with GPT-4o (prompt in Appendix [G](https://arxiv.org/html/2605.08360#A7)). On these triplets, cosine fails across every encoder family we test. Base sentence-T5-XL reaches only 48.3%, near chance. Three widely-used encoders perform far worse: e5-large-v2 (26.7%), BGE-large (6.3%), and all-mpnet-base (8.2%). These three rank the semantic distractor above the preference match 70–95% of the time. Meanwhile these same encoders score 58–60% on the natural-data triplets (Table [2](https://arxiv.org/html/2605.08360#S4.T2)). The drops from natural to hard data, of up to roughly 50 percentage points, suggest that cosine's apparent preference signal on natural data comes almost entirely from a nuisance component that is correlated with preferences on natural data.
Table 2: Triplet accuracy (%) on the natural datasets (test split) vs. the hard triplets described in §[4.3](https://arxiv.org/html/2605.08360#S4.SS3).
In sum, hard triplets show that the natural-data signal from cosine on base embedding models is not robust when surface similarity and preference alignment conflict. This suggests using such conflicts not just as a diagnostic, but as supervision.
## 5 Decorrelated Preference Tuning
The hard-triplet diagnostic suggests a simple intervention: train on cases where surface similarity and preference alignment disagree. However, §[4.2](https://arxiv.org/html/2605.08360#S4.SS2) and §[4.3](https://arxiv.org/html/2605.08360#S4.SS3) suggest that these cases are rare in natural data. Therefore, we construct synthetic counterfactual triplets that break the correlation between surface similarity and preference, and we tune the encoder on them so that cosine learns to follow the preference-aligned item rather than the superficially similar distractor. We call this procedure *decorrelated preference tuning* (DPT).
### 5.1 Why hard triplets are corrective
We first formalize the direction in which hard\-triplet training should move a scorer\. Consider the class of bilinear scorers
$$s_{A}(a,x)=\psi(a)^{\top}A\,\psi(x),$$
with symmetric $A\in\mathbb{R}^{d\times d}$. To make the nuisance weight explicit, this can be written as
$$A(B,\lambda)=B+\lambda P_{S^{\perp}},$$
where $B=P_{S}BP_{S}$ acts only on the preference subspace and $\lambda$ scales the nuisance. This family includes cosine on unit-norm embeddings when $B=P_{S}$ and $\lambda=1$, and the $S$-projected scorer when $B=P_{S}$ and $\lambda=0$.
For a triplet $(a,p,n)$, the margin is
$$\psi(a)^{\top}A(B,\lambda)\bigl(\psi(p)-\psi(n)\bigr)=\Delta_{B}+\lambda\Delta_{T},$$
where $\Delta_{B}:=\psi_{S}(a)^{\top}B\bigl(\psi_{S}(p)-\psi_{S}(n)\bigr)$ and where $\Delta_{T}=\langle\psi_{\perp}(a),\psi_{\perp}(p)-\psi_{\perp}(n)\rangle$ is the nuisance margin. Thus, $\lambda$ controls how much the scorer relies on the nuisance part of cosine.
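A two-dimensional toy in the spirit of Figure 1 shows how $\lambda$ changes the ranking. The sketch assumes $B=P_{S}$, takes $S$ to be the first coordinate and $S^{\perp}$ the second, and uses invented unit vectors; none of these values come from the paper.

```python
def bilinear_margin(a, p, n, lam, k=1):
    """Margin Delta_B + lam * Delta_T with B = P_S.

    S is assumed to be the first k coordinates; the rest is the nuisance
    subspace. lam = 1 recovers the plain inner-product (cosine) margin.
    """
    delta_b = sum(a[i] * (p[i] - n[i]) for i in range(k))
    delta_t = sum(a[i] * (p[i] - n[i]) for i in range(k, len(a)))
    return delta_b + lam * delta_t

# Hypothetical unit vectors: coordinate 0 is stance, coordinate 1 is wording.
anchor = [0.6, 0.8]
match = [0.8, -0.6]       # same stance as the anchor, different wording
distractor = [-0.6, 0.8]  # opposite stance, same wording

margin_cosine = bilinear_margin(anchor, match, distractor, lam=1.0)
margin_half = bilinear_margin(anchor, match, distractor, lam=0.5)
margin_projected = bilinear_margin(anchor, match, distractor, lam=0.0)
```

With these numbers $\Delta_{B}=0.84$ and $\Delta_{T}=-1.12$, so the margin is negative at $\lambda=1$ (the distractor wins under cosine) and turns positive once $\lambda$ drops below $0.75$.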
Let $\mathcal{G}=\sigma\bigl(\psi_{S}(a),\psi_{S}(p),\psi_{S}(n)\bigr)$ denote the $\sigma$-algebra generated by the in-subspace parts of the triplet. Conditioning on $\mathcal{G}$ means holding the preference subspace projections fixed while averaging over the nuisance subspace variation.
The hard-triplet condition can be formalized as $\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\leq 0$ almost surely, with strict inequality on a set of positive probability. In words, once the preference-relevant parts of the triplet are fixed, the nuisance margin does not help the preferred item and, on average, it favors the distractor. Since we assume the preferred item in the hard triplet is aligned with the anchor in the preference subspace (the proof of Theorem [1](https://arxiv.org/html/2605.08360#Thmtheorem1), however, does not require this), this condition formalizes the idea that, in hard triplets, the preference and nuisance components point in opposite directions.
For the scorer $A(B,\lambda)$, define the Bradley-Terry population risk as
$$R(B,\lambda)=\mathbb{E}\bigl[\log\bigl(1+e^{-\Delta_{B}-\lambda\Delta_{T}}\bigr)\bigr].$$
The following result shows that, on any triplet distribution satisfying the hard\-triplet condition, risk is reduced by decreasing the weight on the nuisance\.
###### Theorem 1\.
If $\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\leq 0$ a.s., with strict inequality on a set of positive probability, then
$$R(B,\lambda)<R(B,1)\quad\text{for every }\lambda\in[0,1).$$
The result does not require knowing $S$ or directly optimizing $\lambda$. Instead, simply constructing a distribution of triplets in which, almost surely, $\Delta_{S}$ points in the right direction and $\Delta_{T}$ points in the wrong direction rewards scorers that put relatively more weight on the preference-aligned part of the embedding. The proof is in Appendix [C](https://arxiv.org/html/2605.08360#A3); its main steps are showing that $R(B,\cdot)$ is convex and that $R^{\prime}(B,0)>0$, then using convexity (and the consequent non-decreasing derivative) to show that $R(B,\cdot)$ increases as $\lambda$ moves from 0 toward 1.
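The monotonicity claimed by Theorem 1 can be illustrated with an empirical risk on simulated margins. The distribution below is a hypothetical stand-in, not the paper's data: the in-subspace margin is Gaussian with a positive mean and the nuisance margin is never positive, which is a stronger, pointwise version of the conditional-expectation condition in the theorem.

```python
import math
import random

def bt_risk(samples, lam):
    """Empirical Bradley-Terry risk: mean of log(1 + exp(-(d_b + lam * d_t)))."""
    total = sum(math.log1p(math.exp(-(d_b + lam * d_t))) for d_b, d_t in samples)
    return total / len(samples)

rng = random.Random(0)
# Each sample is (Delta_B, Delta_T); Delta_T <= 0 by construction (assumption).
samples = [(rng.gauss(0.5, 1.0), -abs(rng.gauss(0.0, 1.0))) for _ in range(2000)]

lambdas = (0.0, 0.25, 0.5, 0.75, 1.0)
risks = {lam: bt_risk(samples, lam) for lam in lambdas}
```

Because every simulated nuisance margin is non-positive, each per-sample loss is non-decreasing in $\lambda$, so the values in `risks` grow as $\lambda$ moves from 0 to 1 and the cosine weighting $\lambda=1$ has the largest risk on this grid.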
### 5.2 Counterfactual hard-triplet generation and fine-tuning
Theorem [1](https://arxiv.org/html/2605.08360#Thmtheorem1) suggests that training on a distribution of hard triplets should implicitly cause the encoder to place less weight on the nuisance subspace. We now describe how we synthesize such a distribution and how we train on it.
To begin, we assemble a pool of 2,000 political and social issues from the Habermas Machine dataset (Tessler et al., [2024](https://arxiv.org/html/2605.08360#bib.bib27)) and Kialo ([https://huggingface.co/datasets/timchen0618/Kialo](https://huggingface.co/datasets/timchen0618/Kialo)), filtered by GPT-4o-mini for genuine policy debatability, and prompt Claude Sonnet 4 to generate 5 opinions per issue spanning the stance spectrum. We then sample anchors from this pool and prompt GPT-4o to rewrite each anchor into two versions. The *preference match* preserves the anchor's stance and values but changes vocabulary, framing, and sentence structure. The *semantic distractor* preserves much of the anchor's wording and structure but flips the stance. Together these form a triplet $(a,p,n)$ in which the preferred item is stance-aligned but less surface-similar, while the dispreferred item is surface-similar but stance-opposed. Applying the same rewrite prompt to real participant anchors from the 11 evaluation datasets produces the held-out hard triplets used in §[4.3](https://arxiv.org/html/2605.08360#S4.SS3) and §[6](https://arxiv.org/html/2605.08360#S6). All prompts and model details are in Appendix [G](https://arxiv.org/html/2605.08360#A7).
We remark that the training data is completely synthetic and was generated independently of the 11 benchmark datasets used for evaluation. So, any performance gains on the natural data benchmark are a strong indicator of generalization. In addition, the way the triplets are generated is specifically designed to approximate the hard-triplet condition, $\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\leq 0$.
We adapt the pretrained encoder $\psi$ with LoRA (Hu et al., [2022](https://arxiv.org/html/2605.08360#bib.bib19)) ($r=16$, $\alpha=48$, targeting query and value projections). For a triplet $(a,p,n)$ we train with Bradley-Terry loss over cosine differences:
$$\mathcal{L}_{\mathrm{BT}}(a,p,n)=\log\Big(1+e^{-\big(\cos(\psi(a),\psi(p))-\cos(\psi(a),\psi(n))\big)}\Big).$$
Training and inference both use cosine geometry, so what we optimize is what is used downstream. We do not train to convergence on the hard-triplet distribution since this would lead to negative weight on the nuisance subspace instead of the desired invariance.
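For concreteness, the per-triplet loss can be written out directly on precomputed embeddings. This is a sketch of the formula only, with hypothetical function names; it omits the encoder, the LoRA adapters, and any batching or optimizer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def bt_loss(a, p, n):
    """Bradley-Terry loss on the cosine margin of one (a, p, n) triplet."""
    margin = cosine(a, p) - cosine(a, n)
    return math.log1p(math.exp(-margin))

# Toy embeddings (hypothetical): best case, tie, and worst case for the margin.
loss_best = bt_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0])   # margin = +2
loss_tie = bt_loss([1.0, 0.0], [0.0, 1.0], [0.0, 1.0])     # margin = 0
loss_worst = bt_loss([1.0, 0.0], [-1.0, 0.0], [1.0, 0.0])  # margin = -2
```

Since cosine lies in $[-1,1]$, the margin lies in $[-2,2]$ and the loss is bounded between $\log(1+e^{-2})$ and $\log(1+e^{2})$; in particular it never reaches zero, even on a perfectly ordered triplet.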
## 6Empirical Results
Fine\-tuning substantially reduces the hard\-triplet failure identified in §[4\.3](https://arxiv.org/html/2605.08360#S4.SS3)\. Our DPT\-tuned sentence\-T5\-XL lifts hard\-triplet \(from the evaluation set\) cosine accuracy from 48\.3% to 80\.0% and natural\-triplet mean accuracy from 65\.2% to 68\.6%\. Further, the recipe is not specific to sentence\-T5: applying the same training to three widely\-used BERT\-derived encoders, e5\-large\-v2, BGE\-large, and all\-mpnet\-base, produces similar patterns, with hard\-triplet accuracy lifting by roughly 20 to 50 percentage points across models \(Table[3](https://arxiv.org/html/2605.08360#S6.T3)\)\. The natural\-data lift is significant at the participant level under a paired Wilcoxon test \(p=3\.4×10−9p=3\.4\\times 10^\{\-9\}, pooled over the442442test\-split participants in the 11 datasets\) and significant on1010of1111datasets under a triplet\-level McNemar test \(Appendix[E\.5](https://arxiv.org/html/2605.08360#A5.SS5)\)\.
Table 3: Base vs. DPT-tuned cosine accuracy on the 11 natural-data datasets (test split) and on the 875 hard triplets across 5 training seeds. The recipe is hyperparameter-identical across models except for learning rate and hard-triplet count, which are val-selected per encoder.

Table [4](https://arxiv.org/html/2605.08360#S6.T4) shows where the resulting DPT-tuned geometry sits relative to a broader set of off-the-shelf and stance-aware baselines, including methods explicitly designed for viewpoint- or contradiction-sensitive similarity. It shows a representative cut of the full 25-model comparison (Appendix [F](https://arxiv.org/html/2605.08360#A6)). Our DPT-tuned sentence-T5-XL achieves the highest mean accuracy at 68.6%, compared to 65.2% for the next-best embedding (the untuned ST5-XL). Our method is best on 8 of 11 datasets, and no other model is best on more than one.

Table 4: Triplet accuracy (%) across the 11 evaluation datasets, abridged to seven representative models. A full comparison across 25 embedding models is in Appendix [F](https://arxiv.org/html/2605.08360#A6).
## 7 Per-Topic Projected Embeddings
Decorrelated preference tuning trains a single encoder whose cosine geometry is more preference\-aware without using any data from the target topic\. This section asks what changes when participant votes over statements are available, as is often the case for online deliberation platforms\. The section serves three purposes\. First, it gives a stronger topic\-specific deployment mode where the base embedding model is held fixed and a low\-rank linear map is learned from participant votes\. Second, it tests the ideal\-point model from §[4\.1](https://arxiv.org/html/2605.08360#S4.SS1)\. And third, it sharpens the diagnosis of cosine similarity on base embedding models\. If a low\-rank linear map works on a frozen encoder, it suggests that the encoder already contains a preference signal\. DPT makes this structure more visible to cosine by implicitly down\-weighting the nuisance component, and the per\-topic map recovers it from votes\.
Table 5: Each row relaxes one structural commitment of the ideal-point scorer; all are fit on frozen base sentence-T5-XL with Bradley-Terry loss across the 11 evaluation datasets (3-way participant split, val-selected hyperparameters, five seeds).

To begin, we fit the ideal-point utility directly on frozen sentence-T5-XL. Let $a_v$ be participant $v$’s anchor and $x_j$ a candidate statement. We learn a rank-$r$ linear map $L^{\top}:\mathbb{R}^{d}\to\mathbb{R}^{r}$ and score candidates by squared Euclidean distance in the mapped space:
$$U(v,j) = -\bigl\lVert L^{\top}\psi(a_v) - L^{\top}\psi(x_j)\bigr\rVert^{2} = 2\langle L^{\top}\psi(a_v), L^{\top}\psi(x_j)\rangle - \lVert L^{\top}\psi(x_j)\rVert^{2} - \lVert L^{\top}\psi(a_v)\rVert^{2}. \tag{3}$$

We refer to $\tilde{\psi}(x) = L^{\top}\psi(x)$ as a *projected embedding*. This form is the ideal-point model with the preference subspace and metric learned from votes. The same map is applied to anchors and candidates, and utility is a distance rather than an inner product. Fitting $L$ separately for each of the 11 deliberation datasets with Bradley-Terry loss on participant votes (3-way train/validation/test split, rank $r=20$, 5 seeds) reaches 77.6% macro-mean accuracy over the datasets on the test split. Applied to the hard evaluation triplets, with no training on hard data, the same per-topic linear map reaches 81.1% macro-mean accuracy (Table [5](https://arxiv.org/html/2605.08360#S7.T5), first row). This suggests that the frozen encoder already contains preference signal and that $L$ simply learns to read it out. This may also explain why DPT is so sample efficient: the geometry only needs to be realigned rather than learned from scratch.
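To make Equation (3) concrete, here is a small sketch in plain Python. It is only an illustration under stated assumptions: the function names, the hand-picked map `L_pref`, and the 3-d toy vectors are hypothetical, and nothing here is fitted from votes. The first coordinate plays the role of a preference direction and the other two play the role of wording nuisance.

```python
import math

def project(L, x):
    """Apply L^T to x, where L is a d x r matrix stored as a list of d rows."""
    r = len(L[0])
    return [sum(L[i][k] * x[i] for i in range(len(x))) for k in range(r)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def utility(L, anchor, item):
    """U = -||L^T anchor - L^T item||^2 (left-hand form of Equation 3)."""
    pa, pj = project(L, anchor), project(L, item)
    return -sum((a - b) ** 2 for a, b in zip(pa, pj))

def utility_expanded(L, anchor, item):
    """Same value via 2<pa, pj> - ||pj||^2 - ||pa||^2 (right-hand form)."""
    pa, pj = project(L, anchor), project(L, item)
    return 2 * dot(pa, pj) - dot(pj, pj) - dot(pa, pa)

def bt_vote_loss(L, anchor, preferred, rejected):
    """Bradley-Terry loss on the utility margin for one observed vote."""
    margin = utility(L, anchor, preferred) - utility(L, anchor, rejected)
    return math.log1p(math.exp(-margin))

# Hypothetical rank-1 map that reads out only the first coordinate.
L_pref = [[1.0], [0.0], [0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

anchor = [1.0, 2.0, 2.0]
agree = [0.9, -2.0, -2.0]    # same stance, different wording
disagree = [-1.0, 2.0, 2.0]  # opposite stance, same wording

raw_prefers_agree = utility(identity, anchor, agree) > utility(identity, anchor, disagree)
projected_prefers_agree = utility(L_pref, anchor, agree) > utility(L_pref, anchor, disagree)
```

In this toy case the unprojected distance ranks the opposing statement first because the nuisance coordinates dominate, while the rank-1 projection recovers the intended ordering.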
We now turn to the question of whether the model proposed in §[4.1](https://arxiv.org/html/2605.08360#S4.SS1) is empirically justified. The ideal-point scorer commits to three main structural claims about $\tilde{\psi}$: the scorer is *linear* in $\psi$, *tied* between anchor and item (both projected by the same $L$), and a *distance* rather than an inner product. Table [5](https://arxiv.org/html/2605.08360#S7.T5) relaxes each claim in turn. Replacing the linear $L^{\top}$ with a shared nonlinear MLP $\phi$ while keeping the squared-distance form costs 4.1% on the natural data and 14.6% on the hard evaluation triplets, indicating that $\psi$ is already rich enough that added per-text nonlinearity overfits rather than helps. Projecting anchor and item through independent $L_a, L_j$ matches natural-data accuracy but loses 7.4% on the hard evaluation triplets. Dropping the item-quadratic $-\lVert L^{\top}\psi(x_j)\rVert^{2}$ for the bilinear cross-term alone (same rank, same shared $L$) leaves natural accuracy unchanged but loses 7.8% on hard evaluation triplets. A possible explanation is that the item norm encodes statement quality that directional similarity discards.
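The difference between the distance form and the bilinear cross-term can be seen on a toy example. The sketch below is an assumption-laden illustration, not a reproduction of the ablation in Table 5: it works directly on already-projected vectors, and the function names and numbers are hypothetical. It shows that dropping the anchor-norm term leaves a participant's ranking unchanged (that term is constant across candidates), whereas dropping the item-norm term can change it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_score(pa, pj):
    """Tied distance form: -||pa - pj||^2."""
    return -sum((a - b) ** 2 for a, b in zip(pa, pj))

def no_anchor_norm_score(pa, pj):
    """Distance form without the anchor-norm term: 2<pa, pj> - ||pj||^2."""
    return 2 * dot(pa, pj) - dot(pj, pj)

def bilinear_score(pa, pj):
    """Cross-term alone: <pa, pj> (item-norm term dropped)."""
    return dot(pa, pj)

def ranking(score, pa, items):
    """Item names sorted from highest to lowest score for one anchor."""
    return sorted(items, key=lambda name: score(pa, items[name]), reverse=True)

# Hypothetical projected embeddings for one participant and three statements.
anchor = [1.0, 0.0]
items = {
    "near": [1.0, 0.1],
    "far_same_direction": [3.0, 0.0],
    "opposed": [-1.5, 0.0],
}

by_distance = ranking(distance_score, anchor, items)
by_no_anchor_norm = ranking(no_anchor_norm_score, anchor, items)
by_bilinear = ranking(bilinear_score, anchor, items)
```

Here the bilinear scorer promotes the large-norm statement that points in the anchor's direction, while the distance scorer keeps the nearby statement first.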
Figure 3: Per-topic scorer accuracy versus projection rank $r$. Mean $\pm$ std over five seeds, macro-averaged over 11 datasets.

Our model also assumes that the learned space is low-dimensional. To test this, we swept across ranks $r\in\{1,2,5,10,20,50,100\}$. Figure [3](https://arxiv.org/html/2605.08360#S7.F3) shows that accuracy rises from $r=1$ and plateaus at $r=20$. Per-topic supervision is data-efficient as well. Fixing $r=20$, the projected embedding crosses the universal DPT cosine at roughly 50 labeled triplets (Appendix [E.7](https://arxiv.org/html/2605.08360#A5.SS7)).
## 8 Discussion
This paper develops methods for learning a preference geometry over text, motivated by the design choices that deliberative systems must make. A key aspect of deliberation is exposing participants to a range of viewpoints (Mutz, [2006](https://arxiv.org/html/2605.08360#bib.bib45); Fishkin and Luskin, [2005](https://arxiv.org/html/2605.08360#bib.bib43); Sunstein, [2002](https://arxiv.org/html/2605.08360#bib.bib46)). Online platforms such as Polis, Remesh, Frankly, and the Stanford Online Deliberation Platform face this problem at scale, and must decide who should deliberate together, which comments to surface, and which viewpoints are missing (Small et al., [2021](https://arxiv.org/html/2605.08360#bib.bib1); Fishkin et al., [2019](https://arxiv.org/html/2605.08360#bib.bib41); Frankly, [2026](https://arxiv.org/html/2605.08360#bib.bib42)). A preference geometry gives one way to represent the viewpoints before making these decisions.
Two examples illustrate the point: forming deliberation groups, and aggregating the views expressed during deliberation. The composition of a deliberation group affects the quality of discussion, with quality highest at moderate levels of within-group disagreement (Esterling et al., [2015](https://arxiv.org/html/2605.08360#bib.bib50); Karpowitz and Mendelberg, [2007](https://arxiv.org/html/2605.08360#bib.bib49)). In practice, participants are assigned to small groups at random or by algorithms that balance demographics across groups alongside other objectives such as maximizing distinct pairwise meetings across rounds (Barrett et al., [2023](https://arxiv.org/html/2605.08360#bib.bib74), [2024](https://arxiv.org/html/2605.08360#bib.bib73)), where demographic categories act as a coarse proxy for views (Mansbridge, [1999](https://arxiv.org/html/2605.08360#bib.bib51)). Yang and Bachmann ([2025](https://arxiv.org/html/2605.08360#bib.bib72)) move beyond demographics by using voting data as a proxy for views. A preference embedding space takes the final step, providing a direct measure of views in which objectives like balancing positions across rooms or controlling the level of within-room disagreement can be specified directly. Aggregation also has a natural formulation as voting in a metric space (Bulteau et al., [2021](https://arxiv.org/html/2605.08360#bib.bib7); Feldman et al., [2016](https://arxiv.org/html/2605.08360#bib.bib52)). Since platforms typically surface several comments rather than one, the relevant problem is selecting a slate, which corresponds geometrically to clustering. Here $k$-median and $k$-center capture utilitarian and egalitarian aggregation, while proportional fairness requires any large, aligned subgroup to have a representative (Chen et al., [2019](https://arxiv.org/html/2605.08360#bib.bib53); Micha and Shah, [2020](https://arxiv.org/html/2605.08360#bib.bib54); Kellerhals and Peters, [2024](https://arxiv.org/html/2605.08360#bib.bib55)).
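The contrast between utilitarian and egalitarian slate selection can be sketched on a toy instance. The code below is a hedged illustration only: the distance matrix is invented, the names `slate_costs` and `best_slate` are hypothetical, and exhaustive search is used purely for clarity (both problems are NP-hard in general, so realistic instances rely on approximation algorithms). Rows are participants, columns are candidate statements, and each participant is charged the distance to their nearest statement in the slate.

```python
from itertools import combinations

def slate_costs(dist, slate):
    """Distance from each participant to their nearest statement in the slate."""
    return [min(row[j] for j in slate) for row in dist]

def best_slate(dist, k, objective):
    """Exhaustively pick the size-k slate minimizing objective(per-participant costs)."""
    n_items = len(dist[0])
    return min(
        combinations(range(n_items), k),
        key=lambda slate: objective(slate_costs(dist, slate)),
    )

# Toy distances: four participants sit on statement 0, one outlier sits on
# statement 2, and statement 1 is a compromise between the two groups.
dist = [
    [0.0, 4.0, 10.0],
    [0.0, 4.0, 10.0],
    [0.0, 4.0, 10.0],
    [0.0, 4.0, 10.0],
    [10.0, 6.0, 0.0],
]

k_median_pick = best_slate(dist, 1, sum)  # utilitarian: minimize total distance
k_center_pick = best_slate(dist, 1, max)  # egalitarian: minimize worst-case distance
two_slate_pick = best_slate(dist, 2, sum)
```

With a single slot, the sum objective serves the majority and the max objective picks the compromise; with two slots, both groups receive their own statement.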
A preference geometry can also aid generative social choice (Fish et al., [2026](https://arxiv.org/html/2605.08360#bib.bib3); Boehmer et al., [2025](https://arxiv.org/html/2605.08360#bib.bib4)), where generative models expand the set of candidate statements. Before generation, it can identify groups who deserve representation; during generation, it can guide the generative model; and after generation, it can measure how well a generated statement represents its intended group. Appendix [E.8](https://arxiv.org/html/2605.08360#A5.SS8) gives preliminary evidence that the tuned and projected geometries identify such groups better than the base embedding: on Remesh, users agree more with comments from their own cluster than from others, and the gap is significantly larger under the DPT-tuned and projected geometries.
#### Limitations and future work\.
First, our evaluation focuses on within-participant rankings, not on whether absolute distances are calibrated, and since utility is latent, direct verification is difficult. Still, Bradley-Terry training does push correctly ordered pairs apart by a margin that scales with preference strength, and Appendix [E.9](https://arxiv.org/html/2605.08360#A5.SS9) shows that similarity in the tuned and projected spaces tracks continuous Likert ratings better than the base model. Second, the gap between the universal tune and the per-topic projected embedding suggests that preferences have both a shared component across topics and a topic-specific component that only voting data on that topic recovers. A natural next step is to generate hard triplets conditioned on a target topic, producing a topic-specific embedding without per-topic votes. Finally, an embedding aimed at capturing preferences is an empirical and imperfect representation of them. It can support sense-making and help surface diverse views, but it should not be treated as a perfect representation of any individual’s considered judgment, and it should not be used as a basis for binding decisions (Revel and Pénigaud, [2026](https://arxiv.org/html/2605.08360#bib.bib47)).
## Acknowledgments
This work was partially supported by the National Science Foundation under grant IIS\-2229881; by the Office of Naval Research under grants N00014\-24\-1\-2704 and N00014\-25\-1\-2153; and by grants from the Cooperative AI Foundation and the Foresight Institute\. Carter Blair is supported by an NSERC PGS D and a Cooperative AI PhD Fellowship\.
## References
- A. Achille and S. Soatto (2018) Emergence of invariance and disentanglement in deep representations. Journal of Machine Learning Research 19(50), pp. 1–34. Cited by: [§1](https://arxiv.org/html/2605.08360#S1.p4.1).
- E\. Anshelevich, A\. Filos\-Ratsikas, N\. Shah, and A\. A\. Voudouris \(2021\)Distortion in social choice problems: the first 15 years and beyond\.InProceedings of the 30th International Joint Conference on Artificial Intelligence \(IJCAI\),pp\. 4294–4301\.Note:Survey TrackExternal Links:[Document](https://dx.doi.org/10.24963/ijcai.2021/589)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1)\.
- J\. Barrett, K\. Gal, P\. Gölz, R\. M\. Hong, and A\. D\. Procaccia \(2023\)Now we’re talking: better deliberation groups through submodular optimization\.InProceedings of the 37th AAAI Conference on Artificial Intelligence \(AAAI\),pp\. 5490–5498\.External Links:[Document](https://dx.doi.org/10.1609/aaai.v37i5.25682)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- J\. Barrett, P\. C\. Verpoort, and K\. Gal \(2024\)A new heuristic algorithm for balanced deliberation groups\.External Links:2410\.21451,[Link](https://arxiv.org/abs/2410.21451)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- C\. Blair, B\. Armstrong, S\. Alouf\-Heffetz, N\. Talmon, and D\. Grossi \(2026\)Probably approximately consensus: on the learning theory of finding common ground\.External Links:2604\.21811,[Link](https://arxiv.org/abs/2604.21811)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[§2](https://arxiv.org/html/2605.08360#S2.p1.1)\.
- C\. Blair, K\. Larson, and E\. Law \(2025\)Reflective verbal reward design for pluralistic alignment\.InProceedings of the 34th International Joint Conference on Artificial Intelligence \(IJCAI\),External Links:[Document](https://dx.doi.org/10.24963/ijcai.2025/1141)Cited by:[§E\.1](https://arxiv.org/html/2605.08360#A5.SS1.p2.1)\.
- C\. Blair and K\. Larson \(2025\)Generating fair consensus statements with social choice on token\-level MDPs\.External Links:2510\.14106,[Link](https://arxiv.org/abs/2510.14106)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1)\.
- N\. Boehmer, S\. Fish, and A\. D\. Procaccia \(2025\)Generative social choice: the next generation\.InProceedings of the 42nd International Conference on Machine Learning \(ICML\),Proceedings of Machine Learning Research, Vol\.267,pp\. 4627–4659\.External Links:[Link](https://proceedings.mlr.press/v267/boehmer25a.html)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p3.1)\.
- L\. Bulteau, G\. Shahaf, E\. Shapiro, and N\. Talmon \(2021\)Aggregation over metric spaces: proposing and voting in elections, budgeting, and legislation\.Journal of Artificial Intelligence Research70,pp\. 1413–1439\.External Links:[Document](https://dx.doi.org/10.1613/jair.1.12388)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1),[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- X\. Chen, B\. Fain, L\. Lyu, and K\. Munagala \(2019\)Proportionally fair clustering\.InProceedings of the 36th International Conference on Machine Learning \(ICML\),Proceedings of Machine Learning Research, Vol\.97,pp\. 1032–1041\.External Links:[Link](https://proceedings.mlr.press/v97/chen19d.html)Cited by:[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- J\. Chooi, P\. Gölz, A\. D\. Procaccia, B\. Schiffer, and S\. Zhang \(2026\)Finding common ground in a sea of alternatives\.External Links:2603\.16751,[Link](https://arxiv.org/abs/2603.16751)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1)\.
- P\. F\. Christiano, J\. Leike, T\. B\. Brown, M\. Martic, S\. Legg, and D\. Amodei \(2017\)Deep reinforcement learning from human preferences\.InAdvances in Neural Information Processing Systems,Vol\.30,pp\. 4299–4307\.Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1)\.
- J\. Clinton, S\. Jackman, and D\. Rivers \(2004\)The statistical analysis of roll call data\.American Political Science Review98\(2\),pp\. 355–370\.External Links:[Document](https://dx.doi.org/10.1017/S0003055404001194)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- S\. De, L\. Gelauff, A\. Goel, S\. Milli, A\. D\. Procaccia, and A\. Siu \(2026\)Question the questions: auditing representation in online deliberative processes\.InProceedings of the ACM Web Conference 2026,pp\. 1640–1650\.External Links:[Document](https://dx.doi.org/10.1145/3774904.3792474)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[§2](https://arxiv.org/html/2605.08360#S2.p1.1)\.
- K\. M\. Esterling, A\. Fung, and T\. Lee \(2015\)How much disagreement is good for democratic deliberation?\.Political Communication32\(4\),pp\. 529–551\.External Links:[Document](https://dx.doi.org/10.1080/10584609.2014.969466)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- M\. Feldman, A\. Fiat, and I\. Golomb \(2016\)On voting and facility location\.InProceedings of the 2016 ACM Conference on Economics and Computation \(EC ’16\),pp\. 269–286\.External Links:[Document](https://dx.doi.org/10.1145/2940716.2940725)Cited by:[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- S\. Fish, P\. Gölz, D\. C\. Parkes, A\. D\. Procaccia, G\. Rusak, I\. Shapira, and M\. Wüthrich \(2026\)Generative social choice\.Journal of the ACM73\(2\)\.External Links:[Document](https://dx.doi.org/10.1145/3799709)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[Appendix B](https://arxiv.org/html/2605.08360#A2.p1.1),[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§2](https://arxiv.org/html/2605.08360#S2.p1.1),[§3](https://arxiv.org/html/2605.08360#S3.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p3.1)\.
- J\. Fishkin, N\. Garg, L\. Gelauff, A\. Goel, K\. Munagala, S\. Sakshuwong, A\. Siu, and S\. Yandamuri \(2019\)Deliberative democracy with the online deliberation platform\.InThe 7th AAAI Conference on Human Computation and Crowdsourcing \(HCOMP\),pp\. 1–2\.Note:DemoExternal Links:[Link](https://www.humancomputation.com/2019/assets/papers/144.pdf)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- J\. S\. Fishkin and R\. C\. Luskin \(2005\)Experimenting with a democratic ideal: deliberative polling and public opinion\.Acta Politica40\(3\),pp\. 284–298\.External Links:[Document](https://dx.doi.org/10.1057/palgrave.ap.5500121)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- Frankly \(2026\)Frankly: enabling constructive dialogue\.Note:[https://frankly\.org/](https://frankly.org/)Accessed: 2026\-05\-01Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- T\. Gao, X\. Yao, and D\. Chen \(2021\)SimCSE: simple contrastive learning of sentence embeddings\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),pp\. 6894–6910\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.552)Cited by:[§E\.4](https://arxiv.org/html/2605.08360#A5.SS4.p1.10),[§2](https://arxiv.org/html/2605.08360#S2.p2.1)\.
- S\. M\. Gerrish and D\. M\. Blei \(2011\)Predicting legislative roll calls from text\.InProceedings of the 28th International Conference on Machine Learning \(ICML\),pp\. 489–496\.Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- V\. Ghafouri, J\. Such, and G\. Suarez\-Tangil \(2024\)I love pineapple on pizza \!= I hate pineapple on pizza: stance\-aware sentence transformers for opinion mining\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),pp\. 21046–21058\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.emnlp-main.1171)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p2.1),[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1),[§2](https://arxiv.org/html/2605.08360#S2.p2.1)\.
- U\. Grandi, R\. Bredereck, T\. Delemazure, U\. Endriss, J\. Maly, N\. Mattei, N\. Maudet, O\. Nardi, and S\. Szufa \(2026\)Social choice with text: collective decision making in the LLM era\.Note:Preprint available at HAL Open Science Archive \(hal\-05548117\), March 2026External Links:[Link](https://hal.science/hal-05548117)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1)\.
- K\. Handa, Y\. Gal, E\. Pavlick, N\. Goodman, J\. Andreas, A\. Tamkin, and B\. Z\. Li \(2024\)Bayesian preference elicitation with language models\.External Links:2403\.05534,[Link](https://arxiv.org/abs/2403.05534)Cited by:[§E\.1](https://arxiv.org/html/2605.08360#A5.SS1.p2.1)\.
- E\. J\. Hu, Y\. Shen, P\. Wallis, Z\. Allen\-Zhu, Y\. Li, S\. Wang, L\. Wang, and W\. Chen \(2022\)LoRA: low\-rank adaptation of large language models\.InInternational Conference on Learning Representations \(ICLR\),External Links:[Link](https://openreview.net/forum?id=nZeVKeeFYf9)Cited by:[§5\.2](https://arxiv.org/html/2605.08360#S5.SS2.p4.4)\.
- J\. Introne \(2023\)Measuring belief dynamics on Twitter\.Proceedings of the International AAAI Conference on Web and Social Media17\(1\),pp\. 387–398\.External Links:[Document](https://dx.doi.org/10.1609/icwsm.v17i1.22154)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p2.1)\.
- C\. F\. Karpowitz and T\. Mendelberg \(2007\)Groups and deliberation\.Swiss Political Science Review13\(4\),pp\. 645–662\.External Links:[Document](https://dx.doi.org/10.1002/j.1662-6370.2007.tb00092.x)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- L\. Kellerhals and J\. Peters \(2024\)Proportional fairness in clustering: a social choice perspective\.InAdvances in Neural Information Processing Systems,Vol\.37,pp\. 28471–28494\.Cited by:[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- M\. Laver, K\. Benoit, and J\. Garry \(2003\)Extracting policy positions from political texts using words as data\.American Political Science Review97\(2\),pp\. 311–331\.External Links:[Document](https://dx.doi.org/10.1017/S0003055403000698)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- S\. Lee, A\. Shakir, D\. Koenig, and J\. Lipp \(2024\)Open source strikes bread \- new fluffy embeddings model\.Note:[https://www\.mixedbread\.com/blog/mxbai\-embed\-large\-v1](https://www.mixedbread.com/blog/mxbai-embed-large-v1)Accessed: 2026\-05\-01Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- B\. Z\. Li, A\. Tamkin, N\. Goodman, and J\. Andreas \(2025\)Eliciting human preferences with language models\.InInternational Conference on Learning Representations \(ICLR\),External Links:[Link](https://openreview.net/forum?id=LvDwwAgMEW)Cited by:[§E\.1](https://arxiv.org/html/2605.08360#A5.SS1.p2.1)\.
- Z\. Li, X\. Zhang, Y\. Zhang, D\. Long, P\. Xie, and M\. Zhang \(2023\)Towards general text embeddings with multi\-stage contrastive learning\.External Links:2308\.03281,[Link](https://arxiv.org/abs/2308.03281)Cited by:[1st item](https://arxiv.org/html/2605.08360#A6.I1.i1.p1.1),[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- J\. Mansbridge \(1999\)Should Blacks represent Blacks and women represent women? A contingent “yes”\.The Journal of Politics61\(3\),pp\. 628–657\.External Links:[Document](https://dx.doi.org/10.2307/2647821)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- L\. Merrick, D\. Xu, G\. Nuti, and D\. Campos \(2024\)Arctic\-embed: scalable, efficient, and accurate text embedding models\.External Links:2405\.05374,[Link](https://arxiv.org/abs/2405.05374)Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- E\. Micha and N\. Shah \(2020\)Proportionally fair clustering revisited\.In47th International Colloquium on Automata, Languages, and Programming \(ICALP 2020\),Leibniz International Proceedings in Informatics \(LIPIcs\), Vol\.168,pp\. 85:1–85:16\.External Links:[Document](https://dx.doi.org/10.4230/LIPIcs.ICALP.2020.85)Cited by:[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p2.2)\.
- N\. Muennighoff, N\. Tazi, L\. Magne, and N\. Reimers \(2023\)MTEB: massive text embedding benchmark\.InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics \(EACL\),pp\. 2014–2037\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.eacl-main.148)Cited by:[§1](https://arxiv.org/html/2605.08360#S1.p2.1)\.
- D\. C\. Mutz \(2006\)Hearing the other side: deliberative versus participatory democracy\.Cambridge University Press\.External Links:ISBN 9780521612289Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- J\. Ni, G\. Hernandez Abrego, N\. Constant, J\. Ma, K\. B\. Hall, D\. Cer, and Y\. Yang \(2022\)Sentence\-T5: scalable sentence encoders from pre\-trained text\-to\-text models\.InFindings of the Association for Computational Linguistics: ACL 2022,pp\. 1864–1874\.External Links:[Document](https://dx.doi.org/10.18653/v1/2022.findings-acl.146)Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- OpenAI \(2024\)New embedding models and API updates\.Note:[https://openai\.com/index/new\-embedding\-models\-and\-api\-updates/](https://openai.com/index/new-embedding-models-and-api-updates/)Accessed: 2026\-05\-01Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- L\. Ouyang, J\. Wu, X\. Jiang, D\. Almeida, C\. L\. Wainwright, P\. Mishkin, C\. Zhang, S\. Agarwal, K\. Slama, A\. Ray, J\. Schulman, J\. Hilton, F\. Kelton, L\. Miller, M\. Simens, A\. Askell, P\. Welinder, P\. F\. Christiano, J\. Leike, and R\. Lowe \(2022\)Training language models to follow instructions with human feedback\.InAdvances in Neural Information Processing Systems,Vol\.35,pp\. 27730–27744\.Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1)\.
- K\. T\. Poole and H\. Rosenthal \(1985\)A spatial model for legislative roll call analysis\.American Journal of Political Science29\(2\),pp\. 357–384\.External Links:[Document](https://dx.doi.org/10.2307/2111172)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- R\. Rafailov, A\. Sharma, E\. Mitchell, C\. D\. Manning, S\. Ermon, and C\. Finn \(2023\)Direct preference optimization: your language model is secretly a reward model\.InAdvances in Neural Information Processing Systems,Vol\.36,pp\. 53728–53741\.Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1)\.
- N\. Reimers and I\. Gurevych \(2019\)Sentence\-BERT: sentence embeddings using siamese BERT\-networks\.InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing \(EMNLP\-IJCNLP\),pp\. 3982–3992\.External Links:[Document](https://dx.doi.org/10.18653/v1/D19-1410)Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- M\. Revel and T\. Pénigaud \(2026\)AI\-enhanced deliberative democracy and the future of the collective will\.Philosophy & Technology39\(81\)\.External Links:[Document](https://dx.doi.org/10.1007/s13347-026-01044-1)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.SS0.SSS0.Px1.p1.1)\.
- L\. Rheault and C\. Cochrane \(2020\)Word embeddings for the analysis of ideological placement in parliamentary corpora\.Political Analysis28\(1\),pp\. 112–133\.External Links:[Document](https://dx.doi.org/10.1017/pan.2019.26)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- J\. B\. Slapin and S\. Proksch \(2008\)A scaling model for estimating time\-series party positions from texts\.American Journal of Political Science52\(3\),pp\. 705–722\.External Links:[Document](https://dx.doi.org/10.1111/j.1540-5907.2008.00338.x)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- C\. T\. Small, M\. Bjorkegren, T\. Erkkilä, L\. Shaw, and C\. Megill \(2021\)Polis: scaling deliberation by mapping high dimensional opinion spaces\.RECERCA\. Revista de Pensament i Anàlisi26\(2\),pp\. 1–26\.External Links:[Document](https://dx.doi.org/10.6035/recerca.5516)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[Appendix B](https://arxiv.org/html/2605.08360#A2.p3.1),[§1](https://arxiv.org/html/2605.08360#S1.p1.1),[§2](https://arxiv.org/html/2605.08360#S2.p1.1),[§3](https://arxiv.org/html/2605.08360#S3.p1.1),[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- C\. T\. Small, I\. Vendrov, E\. Durmus, H\. Homaei, E\. Barry, J\. Cornebise, T\. Suzman, D\. Ganguli, and C\. Megill \(2023\)Opportunities and risks of LLMs for scalable deliberation with Polis\.External Links:2306\.11932,[Link](https://arxiv.org/abs/2306.11932)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1)\.
- C\. R\. Sunstein \(2002\)The law of group polarization\.Journal of Political Philosophy10\(2\),pp\. 175–195\.External Links:[Document](https://dx.doi.org/10.1111/1467-9760.00148)Cited by:[§8](https://arxiv.org/html/2605.08360#S8.p1.1)\.
- M\. H\. Tessler, M\. A\. Bakker, D\. Jarrett, H\. Sheahan, M\. J\. Chadwick, R\. Koster, G\. Evans, L\. Campbell\-Gillingham, T\. Collins, D\. C\. Parkes, M\. Botvinick, and C\. Summerfield \(2024\)AI can help humans find common ground in democratic deliberation\.Science386\(6719\),pp\. eadq2852\.External Links:[Document](https://dx.doi.org/10.1126/science.adq2852)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p1.1),[§5\.2](https://arxiv.org/html/2605.08360#S5.SS2.p2.1)\.
- K\. Vafa, S\. Naidu, and D\. M\. Blei \(2020\)Text\-based ideal points\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics \(ACL\),pp\. 5345–5357\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.475)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p3.1)\.
- T\. Vahtola, M\. Creutz, and J\. Tiedemann \(2022\)It is not easy to detect paraphrases: analysing semantic similarity with antonyms and negation using the new SemAntoNeg benchmark\.InProceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP,pp\. 249–262\.External Links:[Document](https://dx.doi.org/10.18653/v1/2022.blackboxnlp-1.20)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p2.1)\.
- Voyage AI \(2024\)Voyage\-3 & voyage\-3\-lite: a new generation of small yet mighty general\-purpose embedding models\.Note:[https://blog\.voyageai\.com/2024/09/18/voyage\-3/](https://blog.voyageai.com/2024/09/18/voyage-3/)Accessed: 2026\-05\-01Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- Voyage AI \(2025\)Voyage\-3\-large: the new state\-of\-the\-art general\-purpose embedding model\.Note:[https://blog\.voyageai\.com/2025/01/07/voyage\-3\-large/](https://blog.voyageai.com/2025/01/07/voyage-3-large/)Accessed: 2026\-05\-01Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- Voyage AI \(2026\)The Voyage 4 model family: shared embedding space with MoE architecture\.Note:[https://blog\.voyageai\.com/2026/01/15/voyage\-4/](https://blog.voyageai.com/2026/01/15/voyage-4/)Accessed: 2026\-05\-01Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- H\. Wachsmuth, S\. Syed, and B\. Stein \(2018\)Retrieval of the best counterargument without prior topic knowledge\.InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics \(ACL\),pp\. 241–251\.External Links:[Document](https://dx.doi.org/10.18653/v1/P18-1023)Cited by:[Appendix A](https://arxiv.org/html/2605.08360#A1.p2.1)\.
- L\. Wang, N\. Yang, X\. Huang, B\. Jiao, L\. Yang, D\. Jiang, R\. Majumder, and F\. Wei \(2022\)Text embeddings by weakly\-supervised contrastive pre\-training\.External Links:2212\.03533,[Link](https://arxiv.org/abs/2212.03533)Cited by:[Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1)\.
- S. Xiao, Z. Liu, P. Zhang, N. Muennighoff, D. Lian, and J. Nie (2024) C-Pack: packed resources for general Chinese embeddings. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 641–649. External Links: [Document](https://dx.doi.org/10.1145/3626772.3657878). Cited by: [Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1).
- H. Xu, Z. Lin, Y. Sun, K. Chang, and P. Indyk (2024) SparseCL: sparse contrastive learning for contradiction retrieval. External Links: 2406.10746, [Link](https://arxiv.org/abs/2406.10746). Cited by: [Appendix A](https://arxiv.org/html/2605.08360#A1.p2.1), [Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1), [§2](https://arxiv.org/html/2605.08360#S2.p2.1).
- A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huang, G. Dong, H. Wei, H. Lin, J. Tang, J. Wang, J. Yang, J. Tu, J. Zhang, J. Ma, J. Xu, J. Zhou, J. Bai, J. He, J. Lin, K. Dang, K. Lu, K. Chen, K. Yang, M. Li, M. Xue, N. Ni, P. Zhang, P. Wang, R. Peng, R. Men, R. Gao, R. Lin, S. Wang, S. Bai, S. Tan, T. Zhu, T. Li, T. Liu, W. Ge, X. Deng, X. Zhou, X. Ren, X. Zhang, X. Wei, X. Ren, X. Liu, Y. Fan, Y. Yao, Y. Zhang, Y. Wan, Y. Chu, Y. Liu, Z. Cui, Z. Zhang, and Z. Fan (2024) Qwen2 technical report. External Links: 2407.10671, [Link](https://arxiv.org/abs/2407.10671). Cited by: [Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1).
- J. C. Yang and F. Bachmann (2025) Bridging voting and deliberation with algorithms: field insights from vTaiwan and Kultur Komitee. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT), pp. 3207–3223. Cited by: [§8](https://arxiv.org/html/2605.08360#S8.p2.2).
- D. Zhang, J. Li, Z. Zeng, and F. Wang (2024) Jasper and Stella: distillation of SOTA embedding models. External Links: 2412.19048, [Link](https://arxiv.org/abs/2412.19048). Cited by: [Appendix F](https://arxiv.org/html/2605.08360#A6.p2.1).
- Y. Zhang, G. Zhang, Y. Wu, K. Xu, and Q. Gu (2025) Beyond Bradley-Terry models: a general preference model for language model alignment. In Proceedings of the 42nd International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, Vol. 267. External Links: [Link](https://arxiv.org/abs/2410.02197). Cited by: [Appendix A](https://arxiv.org/html/2605.08360#A1.p4.1).
## Appendix A Extended Related Work
Work on collective decision-making over free-form text increasingly relies on some representation of participants or statements. Polis derives a low-dimensional opinion map from the participant-by-comment vote matrix via PCA and clustering \[Small et al., [2021](https://arxiv.org/html/2605.08360#bib.bib1), [2023](https://arxiv.org/html/2605.08360#bib.bib2)\]; generative social choice and PROSE group statements in an LLM-defined feature space to produce representative slates \[Fish et al., [2026](https://arxiv.org/html/2605.08360#bib.bib3), Boehmer et al., [2025](https://arxiv.org/html/2605.08360#bib.bib4)\]; Blair et al. \[[2026](https://arxiv.org/html/2605.08360#bib.bib5)\] model approximate consensus as a region of sentence-embedding space; and De et al. \[[2026](https://arxiv.org/html/2605.08360#bib.bib30)\] audit justified representation of question slates using cosine similarity of question embeddings as participant utility. Complementary work aggregates text without an explicit embedding geometry, for example through reward models, token-level policies, or sampling-based social-choice procedures \[Tessler et al., [2024](https://arxiv.org/html/2605.08360#bib.bib27), Blair and Larson, [2025](https://arxiv.org/html/2605.08360#bib.bib28), Chooi et al., [2026](https://arxiv.org/html/2605.08360#bib.bib31), Grandi et al., [2026](https://arxiv.org/html/2605.08360#bib.bib8)\].
A separate line of work studies the mismatch between semantic overlap and stance. Vahtola et al. \[[2022](https://arxiv.org/html/2605.08360#bib.bib22)\] show that generic sentence embeddings struggle when negation or antonymy flips meaning; Introne \[[2023](https://arxiv.org/html/2605.08360#bib.bib21)\] and Ghafouri et al. \[[2024](https://arxiv.org/html/2605.08360#bib.bib20)\] fine-tune encoders to separate opposing viewpoints for stance detection or opinion retrieval; and counter-argument retrieval methods add explicit dissimilarity terms or sparsity-aware scoring to retrieve contradictions \[Wachsmuth et al., [2018](https://arxiv.org/html/2605.08360#bib.bib23), Xu et al., [2024](https://arxiv.org/html/2605.08360#bib.bib24)\]. We share this starting point but differ in diagnosis: we attribute the failure to an observational correlation between the preference signal and semantic nuisance, formalize it as an invariance problem, and derive our training method from that formal model.
A long tradition in political science estimates low-dimensional ideal points from political data. The canonical approach models legislators’ roll-call votes as a function of their positions in a latent spatial model \[Poole and Rosenthal, [1985](https://arxiv.org/html/2605.08360#bib.bib32), Clinton et al., [2004](https://arxiv.org/html/2605.08360#bib.bib33)\]. A parallel line recovers positions directly from text: Wordscores scales manifestos against reference documents \[Laver et al., [2003](https://arxiv.org/html/2605.08360#bib.bib35)\], Wordfish estimates time-series party positions from speech \[Slapin and Proksch, [2008](https://arxiv.org/html/2605.08360#bib.bib34)\], and later work joins text with votes \[Gerrish and Blei, [2011](https://arxiv.org/html/2605.08360#bib.bib36)\], combines topic models with ideal points \[Vafa et al., [2020](https://arxiv.org/html/2605.08360#bib.bib37)\], or augments word embeddings with speaker metadata \[Rheault and Cochrane, [2020](https://arxiv.org/html/2605.08360#bib.bib26)\]. Our §[7](https://arxiv.org/html/2605.08360#S7) finding, that preferences on a frozen sentence embedding are linearly accessible through a rank-$\sim$20 projection, is consistent with this tradition, recovered here as a property of a general-purpose pretrained encoder on deliberation data from non-legislators.
Our work also connects to preference learning more broadly. Bradley-Terry objectives are standard in reward modeling and preference-based fine-tuning \[Christiano et al., [2017](https://arxiv.org/html/2605.08360#bib.bib11), Ouyang et al., [2022](https://arxiv.org/html/2605.08360#bib.bib12), Rafailov et al., [2023](https://arxiv.org/html/2605.08360#bib.bib13)\], where they reshape a generative policy’s output distribution. Zhang et al. \[[2025](https://arxiv.org/html/2605.08360#bib.bib71)\] also embed responses but score preferences with a skew-symmetric operator to express within-user cyclic preferences, a concern that does not arise in our cross-user setting. We use the Bradley-Terry objective to reshape a reusable embedding geometry that can be consumed by metric social choice and other downstream geometric procedures \[Anshelevich et al., [2021](https://arxiv.org/html/2605.08360#bib.bib6), Bulteau et al., [2021](https://arxiv.org/html/2605.08360#bib.bib7)\], rather than only a per-query reward model.
## Appendix B Dataset Details
The GSC datasets come from two studies by Fish et al. \[[2026](https://arxiv.org/html/2605.08360#bib.bib3)\]. In each, participants first write free-text opinions describing their views on a topic, then rate a set of AI-generated policy statements on a Likert scale. For each participant, we construct triplets from all pairs of statements to which that participant gave distinct ratings. The abortion generation survey uses participants who authored the original opinions; the abortion validation survey uses a separate cohort who rate the same statements and give verbal feedback on them. The chatbot personalization survey follows the same structure. Data for the abortion surveys is available at [https://github.com/generative-social-choice/gsc_abortion](https://github.com/generative-social-choice/gsc_abortion) and for chatbot personalization at [https://github.com/generative-social-choice/chatbot_personalization](https://github.com/generative-social-choice/chatbot_personalization).
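The triplet construction above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_triplets`, the dictionary layout, and the toy ratings are all hypothetical.

```python
from itertools import combinations


def build_triplets(anchor, ratings):
    """Return (anchor, preferred, dispreferred) triplets for one participant.

    `ratings` maps a statement id to that participant's Likert rating.
    Pairs with equal ratings carry no preference signal and are skipped.
    """
    triplets = []
    for (s1, r1), (s2, r2) in combinations(sorted(ratings.items()), 2):
        if r1 == r2:
            continue  # tie: no ordering between the two statements
        preferred, dispreferred = (s1, s2) if r1 > r2 else (s2, s1)
        triplets.append((anchor, preferred, dispreferred))
    return triplets


# Hypothetical participant: one free-text opinion and four rated statements.
example = build_triplets("opinion_text", {"s1": 6, "s2": 2, "s3": 2, "s4": 0})
```

Tied statements (here `s2` and `s3`) yield no triplet, so a participant with $m$ rated statements contributes at most $m(m-1)/2$ triplets.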
The Remesh datasets come from the Polarized Issues corpus, which collects binary agree/disagree votes on open-ended responses to political questions using the Remesh platform. Participants first write their own response to a prompt, then vote on responses written by other participants. We use three topics: campus protests, foreign intervention, and right to assemble. The data is available at [https://github.com/akonya/polarized-issues-data](https://github.com/akonya/polarized-issues-data).
The Polis datasets come from publicly available Polis conversations from Small et al. \[[2021](https://arxiv.org/html/2605.08360#bib.bib1)\]. In Polis, participants write short comments and vote agree, disagree, or pass on comments written by others. We restrict to participants who both authored at least one comment and voted on at least five others, using their authored comments as anchors. These datasets have the shortest texts and sparsest preference signal of any platform in our evaluation. The data is available at [https://github.com/compdemocracy/openData](https://github.com/compdemocracy/openData).
#### Licenses\.
The GSC datasets are released under AGPL\-3\.0, the Polarized Issues \(Remesh\) data under CC\-BY\-4\.0, and the Polis openData under CC\-BY\-4\.0\. The Habermas Machine dataset used for training\-data generation is released under CC\-BY\-4\.0 \(data\) and Apache\-2\.0 \(code\), and the Kialo dataset under MIT\. The embedding models used are sentence\-T5\-XL \(Apache\-2\.0\), e5\-large\-v2 \(MIT\), BGE\-large\-en\-v1\.5 \(MIT\), and all\-mpnet\-base\-v2 \(Apache\-2\.0\)\. All licenses permit research use\.
Table 6: Evaluation datasets. Participants authored anchors (own text) and voted on statements (others’ text); pairwise preference triplets derive from their vote orderings. GSC uses a small fixed statement pool; Remesh and Polis use open pools seeded by participants themselves. We use all 11 datasets for evaluation.
## Appendix C Proof of Theorem [1](https://arxiv.org/html/2605.08360#Thmtheorem1)
Fix a symmetric in-subspace operator $B$ with $B=P_{S}BP_{S}$, and write
$$\ell(u)=\log(1+e^{-u}).$$
For a triplet $(a,p,n)$, recall that
$$\Delta_{B}=\psi_{S}(a)^{\top}B\bigl(\psi_{S}(p)-\psi_{S}(n)\bigr),\qquad\Delta_{T}=\bigl\langle\psi_{\perp}(a),\psi_{\perp}(p)-\psi_{\perp}(n)\bigr\rangle.$$
Thus, the risk can be written as
$$R(B,\lambda)=\mathbb{E}\bigl[\ell(\Delta_{B}+\lambda\Delta_{T})\bigr].$$
Our goal is to show that $R(B,\lambda)<R(B,1)$ for every $\lambda\in[0,1)$.
To do so, we first note bounds that will be required below. Since the encoder outputs unit-norm embeddings and orthogonal projection cannot increase the norm (by the Pythagorean theorem),
$$\|\psi_{\perp}(a)\|\leq 1,\qquad\|\psi_{\perp}(p)-\psi_{\perp}(n)\|\leq\|\psi_{\perp}(p)\|+\|\psi_{\perp}(n)\|\leq 2.$$
Therefore, by the Cauchy-Schwarz inequality,
$$|\Delta_{T}|=\bigl|\langle\psi_{\perp}(a),\psi_{\perp}(p)-\psi_{\perp}(n)\rangle\bigr|\leq 2$$
everywhere. Also, since $B$ is fixed and the embeddings have norm at most one,
$$|\Delta_{B}|\leq 2\|B\|_{\mathrm{op}}.$$
Since $|\Delta_{B}+\lambda\Delta_{T}|\leq 2\|B\|_{\mathrm{op}}+2|\lambda|$ for each fixed $\lambda$, and $\ell$ is continuous, $\ell(\Delta_{B}+\lambda\Delta_{T})$ is bounded and therefore integrable.
We now justify differentiating $R(B,\lambda)$ with respect to $\lambda$. The derivative of the logistic loss is
$$\ell^{\prime}(u)=-\frac{1}{1+e^{u}},$$
so $|\ell^{\prime}(u)|\leq 1$ for every $u\in\mathbb{R}$. Fix $\lambda\in\mathbb{R}$ and, for nonzero $h$, define the difference quotient random variable
$$Q_{h}=\frac{\ell(\Delta_{B}+(\lambda+h)\Delta_{T})-\ell(\Delta_{B}+\lambda\Delta_{T})}{h}.$$
For each realization of the triplet, $\Delta_{B}$ and $\Delta_{T}$ are fixed finite numbers, and the map
$$\eta\mapsto\ell(\Delta_{B}+\eta\Delta_{T})$$
is differentiable. Hence, as $h\to 0$,
$$Q_{h}\to\ell^{\prime}(\Delta_{B}+\lambda\Delta_{T})\Delta_{T}$$
pointwise. Moreover, by the mean value theorem, for each $h\neq 0$ there exists some $\xi_{h}$ between $\lambda$ and $\lambda+h$ such that
$$Q_{h}=\ell^{\prime}(\Delta_{B}+\xi_{h}\Delta_{T})\Delta_{T}.$$
Therefore
$$|Q_{h}|\leq|\ell^{\prime}(\Delta_{B}+\xi_{h}\Delta_{T})|\,|\Delta_{T}|\leq 2,$$
using $|\ell^{\prime}|\leq 1$ and $|\Delta_{T}|\leq 2$. The constant function $2$ is integrable, so dominated convergence gives
$$R^{\prime}(B,\lambda)=\lim_{h\to 0}\frac{R(B,\lambda+h)-R(B,\lambda)}{h}=\lim_{h\to 0}\mathbb{E}[Q_{h}]=\mathbb{E}\bigl[\ell^{\prime}(\Delta_{B}+\lambda\Delta_{T})\Delta_{T}\bigr].$$
Thus, for every $\lambda\in\mathbb{R}$,
$$R^{\prime}(B,\lambda)=\mathbb{E}\bigl[\ell^{\prime}(\Delta_{B}+\lambda\Delta_{T})\Delta_{T}\bigr].$$
We now evaluate this derivative at $\lambda=0$. Let
$$\mathcal{G}=\sigma\bigl(\psi_{S}(a),\psi_{S}(p),\psi_{S}(n)\bigr)$$
be the $\sigma$-algebra generated by the preference-subspace parts of the triplet. Since $B=P_{S}BP_{S}$, the quantity $\Delta_{B}$ is a function only of the projected variables in $\mathcal{G}$. Hence $\ell^{\prime}(\Delta_{B})$ is $\mathcal{G}$-measurable. By the tower property,
$$R^{\prime}(B,0)=\mathbb{E}\bigl[\ell^{\prime}(\Delta_{B})\Delta_{T}\bigr]=\mathbb{E}\Bigl[\mathbb{E}\bigl[\ell^{\prime}(\Delta_{B})\Delta_{T}\mid\mathcal{G}\bigr]\Bigr].$$
Since $\ell^{\prime}(\Delta_{B})$ is $\mathcal{G}$-measurable, the pull-out property gives
$$\mathbb{E}\bigl[\ell^{\prime}(\Delta_{B})\Delta_{T}\mid\mathcal{G}\bigr]=\ell^{\prime}(\Delta_{B})\,\mathbb{E}[\Delta_{T}\mid\mathcal{G}].$$
Therefore
$$R^{\prime}(B,0)=\mathbb{E}\Bigl[\ell^{\prime}(\Delta_{B})\,\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\Bigr].$$
The hard-triplet condition states that
$$\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\leq 0\quad\text{a.s.},$$
with strict inequality on a set of positive probability. Also,
$$\ell^{\prime}(u)=-\frac{1}{1+e^{u}}<0\qquad\text{for every }u\in\mathbb{R},$$
and hence $\ell^{\prime}(\Delta_{B})<0$ everywhere. Therefore the product
$$\ell^{\prime}(\Delta_{B})\,\mathbb{E}[\Delta_{T}\mid\mathcal{G}]$$
is nonnegative almost surely and strictly positive on a set of positive probability. Hence
$$R^{\prime}(B,0)>0.$$
It remains to show that this positivity at zero implies the desired comparison with $\lambda=1$. The loss $\ell$ is convex because
$$\ell^{\prime\prime}(u)=\frac{e^{u}}{(1+e^{u})^{2}}>0\qquad\text{for every }u\in\mathbb{R}.$$
For each realization of the triplet, the map
$$\lambda\mapsto\Delta_{B}+\lambda\Delta_{T}$$
is affine, so
$$\lambda\mapsto\ell(\Delta_{B}+\lambda\Delta_{T})$$
is convex. Taking expectations preserves convexity, and hence $R(B,\cdot)$ is convex. Since $R(B,\cdot)$ is differentiable, its derivative is nondecreasing. Therefore, for every $\lambda\in[0,1]$,
$$R^{\prime}(B,\lambda)\geq R^{\prime}(B,0)>0.$$
Now fix any $\bar{\lambda}\in[0,1)$. By the mean value theorem, there exists some $c\in(\bar{\lambda},1)$ such that
$$R(B,1)-R(B,\bar{\lambda})=R^{\prime}(B,c)(1-\bar{\lambda}).$$
Since $c\in[0,1]$, we have $R^{\prime}(B,c)>0$, and since $1-\bar{\lambda}>0$,
$$R(B,1)-R(B,\bar{\lambda})>0.$$
Equivalently,
$$R(B,\bar{\lambda})<R(B,1).$$
Because $\bar{\lambda}\in[0,1)$ was arbitrary,
$$R(B,\lambda)<R(B,1)\qquad\text{for every }\lambda\in[0,1).$$
∎
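The conclusion of the proof can be sanity-checked numerically on a toy example. The sketch below is an illustration under assumptions, not part of the paper's experiments: the six-atom distribution over $(\Delta_{B},\Delta_{T})$ is invented, and conditioning on $\Delta_{B}$ stands in for conditioning on $\mathcal{G}$. The conditional mean of $\Delta_{T}$ is $-0.2$ for every value of $\Delta_{B}$, so the hard-triplet condition holds and the risk should increase in $\lambda$ on $[0,1]$.

```python
import math


def logistic_loss(u):
    """l(u) = log(1 + exp(-u)), written in a numerically stable form."""
    return max(-u, 0.0) + math.log1p(math.exp(-abs(u)))


# Toy distribution (an assumption for illustration): six equally likely atoms
# (Delta_B, Delta_T). For every value of Delta_B, Delta_T is -1.0 or 0.6 with
# equal probability, so E[Delta_T | Delta_B] = -0.2 < 0.
ATOMS = [(d_b, d_t) for d_b in (-0.5, 0.5, 1.0) for d_t in (-1.0, 0.6)]


def risk(lam):
    """Exact risk R(B, lam) = E[l(Delta_B + lam * Delta_T)] on the toy atoms."""
    return sum(logistic_loss(d_b + lam * d_t) for d_b, d_t in ATOMS) / len(ATOMS)


# Risk evaluated on a grid of lambda values in [0, 1].
risks = [risk(k / 10) for k in range(11)]
is_increasing = all(lo < hi for lo, hi in zip(risks, risks[1:]))
```

Because the expectation is computed exactly over the atoms, there is no sampling noise: any violation of monotonicity would indicate a mistake in the setup rather than variance.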
## Appendix D Additional Experimental Details
#### Hyperparameter selection\.
The sentence-T5-XL recipe reported in the main text was selected on a held-out validation split by a sweep over learning rate $\in\{1.25\text{e-}5, 5\text{e-}5, 1.25\text{e-}4, 5\text{e-}4\}$, hard-triplet count $\in\{250,500,750,1000,2000\}$, and LoRA rank $\in\{4,8,16,32,64\}$, with LoRA $\alpha=3r$. The selected configuration is $\mathrm{lr}=1.25\text{e-}4$, $n_{\mathrm{hard}}=750$, $r=16$, $\alpha=48$, which corresponds to 46 gradient steps at batch size 16 on a random 750 of the 2,000-triplet synthetic pool, sampled uniformly without replacement and resampled independently for each training seed. For cross-model transfer (e5, BGE, all-mpnet) we fix rank and alpha and re-select $(\mathrm{lr},n_{\mathrm{hard}})$ per encoder from the same grid; selected configurations are $(5\text{e-}5,1000)$ for e5-large, $(1.25\text{e-}4,500)$ for BGE-large, and $(1.25\text{e-}4,1000)$ for all-mpnet. We use AdamW with a linear-decay schedule and a 10% warmup ratio for all models.
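The sweep and the step count can be written out explicitly. The sketch below makes two assumptions that the text does not state: that the sweep is a full Cartesian product of the three value sets, and that the final partial batch is dropped (which is one way to arrive at the reported 46 steps from 750 triplets at batch size 16). The variable names are hypothetical.

```python
from itertools import product

LEARNING_RATES = (1.25e-5, 5e-5, 1.25e-4, 5e-4)
HARD_TRIPLET_COUNTS = (250, 500, 750, 1000, 2000)
LORA_RANKS = (4, 8, 16, 32, 64)
BATCH_SIZE = 16

# Assumed full grid; alpha is tied to the rank through alpha = 3r.
grid = [
    {"lr": lr, "n_hard": n_hard, "rank": rank, "alpha": 3 * rank}
    for lr, n_hard, rank in product(LEARNING_RATES, HARD_TRIPLET_COUNTS, LORA_RANKS)
]

# Configuration reported as selected for sentence-T5-XL.
selected = {"lr": 1.25e-4, "n_hard": 750, "rank": 16, "alpha": 48}

# Gradient steps for one pass, assuming only full batches are used.
steps = selected["n_hard"] // BATCH_SIZE
```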
#### Compute\.
All experiments were run on a single NVIDIA A100 GPU\. Training a single DPT seed takes less than 20 minutes\.
## Appendix E Additional Experiments
### E.1 Error Decomposition
To understand the residual errors of the tuned model, we classify a stratified sample of 913 errors using GPT\-4o\. Each error is a triplet where the tuned model incorrectly ranks the dispreferred statement higher\. We present the full triplet \(anchor texts, preferred, dispreferred\) and ask the classifier to select from six categories, presented in randomized order to avoid position bias\.
Table 7: LLM-classified error categories for tuned model errors (913 triplets, stratified across datasets, capped at 5 per participant).
Surface similarity remains the dominant failure mode at 61%, which indicates room for further improvement through refined hard-triplet training. Insufficient anchor signal accounts for 20.5% and is concentrated in Polis datasets where comments are short: 73% of Bowling Green errors fall in this category, compared to 6% for GSC. Subtle value distinctions at 16.9% represent cases where both options broadly match the anchor’s views but differ on fine-grained nuance below the resolution of sentence-level similarity. The high number of “insufficient anchor signal” errors is notable since this is technically something no embedding model can solve. This must be addressed upstream by encouraging participants to write more or perhaps by giving them scaffolding to aid in their articulation of their preferences \[Blair et al., [2025](https://arxiv.org/html/2605.08360#bib.bib68), Li et al., [2025](https://arxiv.org/html/2605.08360#bib.bib69), Handa et al., [2024](https://arxiv.org/html/2605.08360#bib.bib70)\].
### E.2 LoRA Rank Ablation
We vary LoRA rank $r\in\{4,8,16,32,64\}$ with $\alpha=3r$, holding all other hyperparameters at the selected ST5-XL configuration. Table [8](https://arxiv.org/html/2605.08360#A5.T8) reports held-out test accuracy averaged across the 11 main evaluation datasets. Accuracy peaks at $r=16$: $67.6\%$ at $r=4$, $68.7\%$ at $r=16$, $67.1\%$ at $r=32$, and $63.8\%$ at $r=64$. With only 46 optimizer steps on 750 triplets, higher ranks introduce more free parameters than the data can constrain, and the extra capacity starts overwriting the pretrained geometry that the tuning is meant to realign, not replace.
Table 8: LoRA rank ablation on sentence-T5-XL (single seed, mean test accuracy across 11 datasets).
### E.3 Loss Ablation: Bradley-Terry vs InfoNCE
We replace the pairwise Bradley-Terry loss with InfoNCE (temperature $0.05$). For each anchor in a batch of 16, the positive is its own preference match and the negatives are its own semantic distractor plus the other 15 anchors’ preference matches as in-batch negatives, giving 16 negatives per anchor. All other hyperparameters are held at the selected ST5-XL configuration. The two losses yield essentially identical mean test accuracy (68.7% for BT vs 68.6% for InfoNCE), so the gain comes from the counterfactual triplet construction itself, not the choice of pairwise objective. We report Bradley-Terry in the main results because its score is exactly the signed cosine difference used at inference, and because it operates on each triplet independently, with no batch size or batch composition to tune.
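To make the comparison concrete, the two objectives can be sketched in plain Python on toy vectors. This is a minimal sketch under assumptions, not the training code: it assumes the Bradley-Terry logit is the unscaled cosine difference, and the function names and 2-D vectors are hypothetical. The InfoNCE variant follows the negative set described above (own distractor plus the other anchors' preference matches).

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def bradley_terry_loss(anchor, positive, negative):
    """Logistic loss on the signed cosine difference (assumed unscaled)."""
    margin = cosine(anchor, positive) - cosine(anchor, negative)
    return math.log1p(math.exp(-margin))


def info_nce_loss(anchors, positives, negatives, i, temperature=0.05):
    """InfoNCE for anchor i: own distractor plus other anchors' positives
    serve as negatives, so a batch of size B gives B negatives per anchor."""
    a = anchors[i]
    pos_logit = cosine(a, positives[i]) / temperature
    neg_logits = [cosine(a, negatives[i]) / temperature]
    neg_logits += [cosine(a, p) / temperature
                   for j, p in enumerate(positives) if j != i]
    logits = [pos_logit] + neg_logits
    top = max(logits)  # subtract the max for a stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - pos_logit


# Toy batch of two triplets in 2-D (hypothetical values).
anchors = [(1.0, 0.0), (0.0, 1.0)]
positives = [(0.9, 0.1), (0.1, 0.9)]
negatives = [(0.0, 1.0), (1.0, 0.0)]

loss_easy = info_nce_loss(anchors, positives, negatives, 0)
loss_hard = info_nce_loss(anchors, negatives, positives, 0)  # roles swapped
```

The Bradley-Terry function needs only one triplet at a time, whereas the InfoNCE value for an anchor changes with whatever else is in the batch, which is the practical difference the paragraph above points to.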
### E.4 Normal-Correlation Triplet Ablation
The hard-triplet construction in §[4.3](https://arxiv.org/html/2605.08360#S4.SS3) engineers a worst-case correlation between stance and wording so that the nuisance margin satisfies the hard-triplet condition $\mathbb{E}[\Delta_{T}\mid\mathcal{G}]\leq 0$. The purpose of this is to force the model to rely less on the nuisance subspace and more on the preference subspace. We test whether this engineering is necessary by generating $2{,}000$ *normal-correlation* training triplets in which stance and wording move together. This mimics more traditional NLI pair training where the correlation between the desired signal and a nuisance signal is not explicitly broken \[Gao et al., [2021](https://arxiv.org/html/2605.08360#bib.bib67)\]. The rewrite prompt mirrors the hard prompt but flips the role assignment, so the preference match shares vocabulary with the anchor while the semantic distractor uses different vocabulary. The full prompt is in Appendix [G](https://arxiv.org/html/2605.08360#A7). By construction $\mathbb{E}[\Delta_{T}\mid\mathcal{G}]>0$, the opposite of the hard-triplet condition used in Theorem [1](https://arxiv.org/html/2605.08360#Thmtheorem1). We sweep the same hyperparameter grid used for the hard-triplet selection in Appendix [D](https://arxiv.org/html/2605.08360#A4) and select the configuration with the highest validation macro-mean accuracy on the 11 natural datasets, which gives $\mathrm{lr}=1.25\text{e-}5$, $n=2000$, $r=64$. Re-running this configuration over five seeds yields 67.0$\pm$0.1% mean test accuracy and 59.3$\pm$0.2% accuracy on the hard evaluation triplets. The matched hard-trained DPT configuration in Table [3](https://arxiv.org/html/2605.08360#S6.T3) reaches 68.6$\pm$0.3% test and 80.0$\pm$0.4% on the hard evaluation triplets. Normal-correlation training therefore recovers a small lift over base cosine, but it trails DPT by 1.6 points on natural test accuracy and by 20.7 points on the hard evaluation triplets.
### E.5 Participant-Level Significance
Table [3](https://arxiv.org/html/2605.08360#S6.T3) reports dataset-mean accuracies across five training seeds, which is the right unit for comparing tuning recipes but does not directly speak to whether the gain is reliable on the unit of deployment, the individual participant. We therefore run a paired test on per-participant accuracies. For each of the 11 evaluation datasets we replay the test-split scoring loop under both base sentence-T5-XL and the DPT-tuned encoder, keeping the per-triplet outcome ($1$ if the encoder ranks the preference match above the distractor, $0$ otherwise) and the per-participant accuracy (e.g., the accuracy for participant $i$ using the base model is $a^{\text{base}}_{i}$). Each participant then has a pair $(a^{\text{base}}_{i},a^{\text{tuned}}_{i})$, and each triplet has a pair of binary outcomes. We report a paired Wilcoxon signed-rank test on the participant accuracies and a McNemar exact test on the triplet outcomes.
Pooled across all 442 participants in the 11 test cohorts, the mean accuracy lift is +2.0 pp (median +0.9 pp). Tuned beats base on 51.6% of participants, ties on 17.2%, and loses on 31.2%. The paired Wilcoxon test yields $p=3.4\times 10^{-9}$ and the paired t-test yields $p=1.8\times 10^{-8}$. McNemar over the 24,834 discordant triplets returns $b=14{,}891$ wins for tuned and $c=9{,}943$ wins for base, with $p=9.5\times 10^{-218}$. Per-dataset results are in Table [9](https://arxiv.org/html/2605.08360#A5.T9). The participant-level Wilcoxon reaches $p<0.05$ on $5$ of $11$ datasets, while the triplet-level McNemar reaches $p<0.05$ on $10$ of $11$. The lone non-mover under both is GSC chatbot, where DPT shows the smallest accuracy lift across the entire benchmark ($\Delta=+0.4$ pp). The Wilcoxon non-significance on the four small Polis cohorts (Seattle, Brexit, Canadian, UBI) reflects limited statistical power rather than the absence of an effect, since each shows tuned-mean above base-mean and the triplet-level McNemar reaches $p<0.05$ on all four. The per-dataset and pooled accuracies in Table [9](https://arxiv.org/html/2605.08360#A5.T9) are means of per-participant accuracies, which weight each participant equally rather than each triplet (as Table [15](https://arxiv.org/html/2605.08360#A6.T15) and Table [4](https://arxiv.org/html/2605.08360#S6.T4) do).
Table 9: Paired test of DPT-tuned vs base sentence-T5-XL at the participant and triplet level (test split, seed-42 DPT). $\Delta$ is the per-participant accuracy difference in percentage points. Wilcoxon is the paired signed-rank test on participant accuracies; McNemar is the exact binomial test on per-triplet wins and losses.
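The triplet-level test can be sketched with the standard library alone. This is an illustrative implementation of the two-sided exact McNemar test (a binomial test on the discordant pairs with success probability 1/2), not the authors' evaluation script; the helper names and the toy outcome vectors are hypothetical. The Wilcoxon test is omitted here.

```python
from math import comb


def discordant_counts(base_correct, tuned_correct):
    """Count triplets where exactly one encoder is correct.

    b: tuned correct and base wrong; c: base correct and tuned wrong.
    """
    pairs = list(zip(base_correct, tuned_correct))
    b = sum(1 for base, tuned in pairs if tuned == 1 and base == 0)
    c = sum(1 for base, tuned in pairs if tuned == 0 and base == 1)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    # Probability of a split at least as lopsided as observed under
    # Binomial(n, 1/2), doubled for the two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Toy outcomes: 8 triplets fixed by tuning, 2 broken, 3 correct under both.
base_outcomes = [0] * 8 + [1] * 2 + [1] * 3
tuned_outcomes = [1] * 8 + [0] * 2 + [1] * 3
b, c = discordant_counts(base_outcomes, tuned_outcomes)
p_value = mcnemar_exact(b, c)
```

Concordant triplets (both encoders right, or both wrong) do not enter the test, which is why the pooled count above is stated over discordant triplets only. Python's arbitrary-precision integers keep the tail sum exact even when the counts run into the tens of thousands.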
### E.6 Per-Topic Scorer: Rank Sweep
Figure [3](https://arxiv.org/html/2605.08360#S7.F3) in §[7](https://arxiv.org/html/2605.08360#S7) shows accuracy as a function of projection rank $r$. Table [10](https://arxiv.org/html/2605.08360#A5.T10) reports the exact macro-mean values with standard deviations across five seeds.
Table 10: Per-topic scorer accuracy as a function of rank (macro-mean over 11 datasets, mean $\pm$ std over five seeds).
### E.7 Per-Topic Scorer: Data Efficiency
Fixing $r=20$, we vary the number of labeled triplets per topic and evaluate held-out natural accuracy. Figure [4](https://arxiv.org/html/2605.08360#A5.F4) plots the learning curve across individual datasets and the macro-mean against the base and universal DPT-tuned cosine baselines. Averaging over datasets, $K\approx 50$ labeled triplets already cross the universal DPT baseline, and accuracy saturates by $K\approx 1{,}000$.
Figure 4: Data efficiency of the rank-20 per-topic projected embedding on base sentence-T5-XL.
### E.8 User Clustering Coherence on Remesh
Here we test whether the DPT-tuned and per-topic projected embeddings provide a geometry in which clusters are coherent in approval behavior, beyond what the base ST5-XL geometry already provides. The Remesh deliberation transcripts are well suited to this question: each participant authors one or more comments and votes *Agree* or *Disagree* on a sample of comments written by others, and the comments tend to be more substantive than those from Polis. Further, in contrast to GSC, many statements were voted on, so the approval level for any cluster can be estimated. Taken together, this gives another angle on how well the geometry aligns with participants’ expressed approval preferences in a setting akin to how it may be deployed.
For each participant $u$ we form a user vector by mean-pooling the encoder embeddings of their authored comments. We do this under three geometries: the base sentence-T5-XL encoder, the DPT-tuned encoder of §[5](https://arxiv.org/html/2605.08360#S5), and the rank-20 per-topic projection $L^{\top}\psi$ of §[7](https://arxiv.org/html/2605.08360#S7), with $L$ taken from the ideal-point scorer fit on the same dataset. We then run $k$-means on the user vectors for $k\in\{3,5,8,10\}$. For every user $u$ assigned to cluster $c$, we compute their *within-cluster approval rate*, defined as the fraction of *Agree* votes among the cluster-$c$-authored comments that $u$ actually voted on. The denominator counts only votes that exist, and comments that were never shown to $u$ are excluded. The analogous *across-cluster* rate uses the comments authored outside $u$’s cluster.
We aggregate per-user rates within a cluster by vote-weighted mean and then macro-average over the clusters at a given $k$. The headline statistic is the lift $\Delta=\overline{\text{within}}-\overline{\text{across}}$, which is approximately zero under random assignment. Each cell in Table [11](https://arxiv.org/html/2605.08360#A5.T11), indexed by dataset, encoder, and $k$, averages over five $k$-means seeds. We additionally compute a permutation null by shuffling cluster labels, with 50 permutations per seed. All shuffle lifts have absolute value at most 0.005, so we report raw lifts directly.
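The lift statistic can be sketched as follows. This is a minimal illustration under assumptions, not the evaluation code: the function name, the input layout, and the toy votes are hypothetical, and clusters with no votes of a given kind are simply left out of the macro-average. It uses the fact that a vote-weighted mean of per-user rates within a cluster equals the cluster's pooled agree count divided by its pooled vote count.

```python
from collections import defaultdict


def approval_lift(cluster_of, author_of, votes):
    """Within-minus-across approval-rate lift.

    cluster_of: user id -> cluster label
    author_of:  comment id -> authoring user id
    votes:      iterable of (voter, comment_id, agreed), agreed in {0, 1};
                only votes that were actually cast appear here.
    """
    # Per voter-cluster tallies: [agree_count, vote_count].
    within = defaultdict(lambda: [0, 0])
    across = defaultdict(lambda: [0, 0])
    for voter, comment_id, agreed in votes:
        voter_cluster = cluster_of[voter]
        author_cluster = cluster_of[author_of[comment_id]]
        table = within if author_cluster == voter_cluster else across
        table[voter_cluster][0] += agreed
        table[voter_cluster][1] += 1

    def macro_rate(table):
        # Pooled rate per cluster (the vote-weighted mean of per-user
        # rates), then an unweighted mean over clusters.
        rates = [agree / total for agree, total in table.values() if total > 0]
        return sum(rates) / len(rates)

    return macro_rate(within) - macro_rate(across)


# Toy example: two clusters of two users, one authored comment each.
cluster_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
author_of = {"c1": "u1", "c2": "u2", "c3": "u3", "c4": "u4"}
votes = [
    ("u1", "c2", 1), ("u1", "c3", 0),
    ("u2", "c1", 1), ("u2", "c4", 0),
    ("u3", "c4", 1), ("u3", "c1", 0),
    ("u4", "c3", 1), ("u4", "c2", 1),
]
lift = approval_lift(cluster_of, author_of, votes)
```

In the toy data every within-cluster vote is an *Agree*, while across-cluster approval is 0 for cluster 0 and 0.5 for cluster 1, so the lift is $1.0-0.25=0.75$. Shuffling the values of `cluster_of` and recomputing gives one draw from a permutation null of the kind described above.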
Table 11: Within$-$across approval-rate lift $\Delta$ (%) on Remesh user clusters, by encoder (base ST5-XL; DPT-tuned ST5-XL; rank-20 per-topic projection of base ST5-XL via $L^{\top}\psi$). Each cell is the mean of five $k$-means seeds; per-seed shuffle null has $|\Delta|\leq 0.5\%$. Best entry per row in bold.
Table [11](https://arxiv.org/html/2605.08360#A5.T11) reports the per-configuration lifts. Both adapted geometries exceed the base on a strong majority of the 12 cells. Clustering separates users most cleanly on Campus Protests, where $\Delta\approx 0.10$ for both the tuned encoder and the projection at all values of $k$. It separates them less cleanly on Foreign Intervention, where the projection collapses at $k\geq 8$, and only weakly on Right to Assemble, where the global within-rate is approximately 0.78 and there is little disagreement left to partition.
Aggregating across the 12 configurations, the mean lift is 5.6% for base, 6.7% for DPT-tuned, and 6.6% for the per-topic projected embedding. The mean rank, with lower being better, is 2.50, 1.75, and 1.75 respectively. Paired comparisons over configurations give $\Delta_{\text{DPT-tuned}}-\Delta_{\text{base}}=+1.11$ pp, with 95% bootstrap CI $[+0.4,+1.8]$ and Wilcoxon $p=0.016$, and $\Delta_{\text{projected}}-\Delta_{\text{base}}=+1.00$ pp, with CI $[-1.3,+3.1]$ and $p=0.38$. The wider interval for the projection is driven by the two values of $k$ on Foreign Intervention where the rank-20 projection collapses. DPT-tuned and the projected embedding are statistically indistinguishable at this scale, with $\Delta=+0.1$ pp, CI $[-1.9,+2.3]$, and $p=0.97$. These results suggest that the DPT-tuned embedding provides a geometry which would be useful for the types of tasks identified in §[8](https://arxiv.org/html/2605.08360#S8).
### E.9 Likert-Rating Correlation on GSC
For the GSC datasets, the triplet metric used throughout the paper collapses each participant’s graded Likert rating of a candidate statement into a binary preference between two statements. However, the GSC surveys release the underlying ratings on a 0 to 6 scale. This lets us ask more directly how strongly cosine similarity between a participant’s free-text opinion and a candidate statement tracks the rating they actually gave it, which in turn indicates whether distances in the DPT-tuned and per-topic projected embeddings track latent utility as we would hope.
For each of the three GSC surveys, we recover from the raw CSVs every $(u,s,r)$ triple in which participant $u$ rated statement $s$ with Likert score $r\in[0,6]$. We assemble an anchor pool per participant from all of their free-text responses, which include the open-ended opinion text in the generation and chatbot surveys together with the per-rating justification text that all three surveys collect. The validation cohort writes only justifications. When correlating with the rating of statement $s$, we exclude the user’s justification text written about $s$ from the anchor pool. The remaining texts are mean-pooled to give a single anchor embedding per rating row. We then compute Spearman rank correlation between cosine similarity and Likert rating, pooled across all rating rows in a survey, under three geometries: the base sentence-T5-XL encoder, the DPT-tuned encoder of §[5](https://arxiv.org/html/2605.08360#S5), and the rank-20 per-topic projected embedding $L^{\top}\psi$ of §[7](https://arxiv.org/html/2605.08360#S7), with $L$ fit from the ideal-point scorer on the same survey’s votes.
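The procedure in this paragraph can be sketched on toy data. The following is a self-contained illustration under assumptions, not the analysis script: the 2-D vectors stand in for encoder embeddings, all names are hypothetical, and Spearman correlation is implemented directly as the Pearson correlation of average ranks.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mean_pool(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical participant: an opinion plus justifications about s1 and s2.
texts = {"opinion": (1.0, 0.0), "just_s1": (0.9, 0.1), "just_s2": (0.8, 0.2)}
about = {"just_s1": "s1", "just_s2": "s2"}  # statement each justification discusses
statements = {"s1": (1.0, 0.1), "s2": (0.5, 0.5), "s3": (0.0, 1.0)}
ratings = {"s1": 6, "s2": 3, "s3": 0}

similarities, scores = [], []
for s, rating in ratings.items():
    # Leave out the justification written about the statement being scored.
    pool = [vec for name, vec in texts.items() if about.get(name) != s]
    similarities.append(cosine(mean_pool(pool), statements[s]))
    scores.append(rating)

rho = spearman(similarities, scores)
```

Excluding the justification about $s$ keeps the anchor from containing text written in direct response to the statement whose rating is being predicted, which would otherwise inflate the correlation.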
Table[12](https://arxiv.org/html/2605.08360#A5.T12)shows that both adapted geometries lift the pooled correlation substantially over the base encoder on every survey\. On the two abortion surveys, which are the most explicitly preference\-laden topic in GSC, the per\-topic projected embedding more than doubles the chatbot\-survey correlation gap and pushes pooled correlation past 0\.7 on the validation cohort\. The chatbot survey gives the smallest absolute correlations across the board, and it is the one survey on which the global tune beats the per\-topic projection\.
Table 12: Pooled Spearman rank correlation between cosine similarity \(anchor $\to$ statement\) and the participant’s Likert rating, across the three GSC surveys\. Higher is better\. Best per row in bold\.
If we accept that Likert ratings give us a noisy view of the participants’ latent utility, then this experiment provides some evidence that the distances in the DPT\-tuned embedding and the per\-topic projected embedding are more closely related to utility than distances in the base embedding model\.
### E\.10 Stacking the Per\-Topic Probe on the Tuned Encoder
The per\-topic metric scorer of §[7](https://arxiv.org/html/2605.08360#S7) is fit on the base sentence\-T5\-XL encoder\. A natural follow\-up question is whether the rank\-20 projection would perform better if trained on the encoder that has been globally tuned by DPT\. We re\-run the same val\-selected metric probe pipeline on the DPT\-tuned encoder, holding all other choices fixed, including the rank, the validation grid, the three\-way participant split, and the five seeds\.
Natural\-test macro\-mean accuracy is essentially unchanged, moving from 77\.6 \(base ST5\-XL\) to 78\.0 \(DPT\-tuned\)\. Hard\-triplet accuracy rises from 81\.1 \(base ST5\-XL\) to 87\.4 \(DPT\-tuned\), a 6\.3 percentage\-point gain, with non\-negative gains on every dataset and seven datasets gaining more than five points\.
Table 13: Per\-dataset macro\-mean test and 875\-triplet hard accuracy of the rank\-20 metric probe, fit at val\-selected hyperparameters on either the base encoder or the DPT\-tuned encoder\. Mean and standard deviation over five seeds\. Best per row in bold\.
This asymmetry between natural and hard data can be explained using our formal framework from §[4\.1](https://arxiv.org/html/2605.08360#S4.SS1)\. The ideal\-point scorer projects into a learned rank\-20 subspace and never sees the orthogonal complement, so the cosine decomposition of \([2](https://arxiv.org/html/2605.08360#S4.E2)\) does not literally apply to the probe’s score, but the same mechanism reappears inside the projection\. Our hypothesis is that val\-selection on natural data tends to pull a few directions into the column space of $L$ that are not strictly preference\-aligned, because on natural data those directions correlate positively with preference and so improve validation accuracy\. They are nuisance directions that look like signal when semantic and preferential similarity are correlated\. When the resulting probe is then evaluated on hard triplets, those nuisance directions contribute against preference, and the projected embedding pays for them with degraded hard\-triplet accuracy\.
By suppressing the out\-of\-subspace nuisance globally, the DPT\-tuning leaves val\-selection with cleaner candidate directions to choose among, and the resulting tuned $L$ contains less of the regime\-dependent contamination\. We see this directly in the geometry of the learned subspaces\. Letting $Q_{\text{base}}$ and $Q_{\text{tuned}}$ denote orthonormal bases of the two probes’ columns, the singular values of $Q_{\text{base}}^{\top}Q_{\text{tuned}}$ are the cosines of the principal angles between the two rank\-20 subspaces\. Across the eleven datasets, the largest cosine averages 0\.95 and the median averages 0\.80, while the smallest averages 0\.17\. The two probes share a core preference subspace where most directions are well aligned, and each has a small number of directions that are nearly orthogonal to anything in the other\. Those near\-orthogonal residues are the encoder\-specific component of each fit, and on hard evaluation triplets it is likely the case that the tuned encoder’s residue is benign while the base encoder’s residue carries the nuisance signal that costs accuracy\.
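The subspace comparison above can be reproduced with a QR factorization followed by an SVD. The sketch below is illustrative only: `principal_cosines` and the random matrices standing in for the two fitted rank-20 projections are assumptions, not the paper's code.

```python
import numpy as np

def principal_cosines(L_a, L_b):
    """Cosines of the principal angles between the column spaces of L_a and L_b,
    in descending order. Assumes both matrices have full column rank."""
    Q_a, _ = np.linalg.qr(L_a)  # reduced QR gives an orthonormal basis (d x k)
    Q_b, _ = np.linalg.qr(L_b)
    cosines = np.linalg.svd(Q_a.T @ Q_b, compute_uv=False)
    return np.clip(cosines, 0.0, 1.0)  # guard against round-off slightly above 1

rng = np.random.default_rng(0)
d, k = 64, 20  # hypothetical embedding dimension and probe rank
L_base = rng.normal(size=(d, k))
# A stand-in "tuned" probe that shares 15 directions with the base probe
# and replaces the remaining 5 with fresh random directions.
L_tuned = np.hstack([L_base[:, :15], rng.normal(size=(d, 5))])

cos = principal_cosines(L_base, L_tuned)
print("largest:", cos[0], "median:", np.median(cos), "smallest:", cos[-1])
```

In this toy setup the shared directions show up as cosines equal to one and the replaced directions as markedly smaller cosines, which mirrors the "shared core plus encoder-specific residue" reading given in the text.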
A second fingerprint comes from decomposing the score margin as $2\langle L^{\top}\psi(a),\,L^{\top}(\psi(p)-\psi(n))\rangle+\lVert L^{\top}\psi(n)\rVert^{2}-\lVert L^{\top}\psi(p)\rVert^{2}$, an in\-projected\-space inner product margin plus a projected item\-norm difference, reported per dataset in Table [14](https://arxiv.org/html/2605.08360#A5.T14)\. The projected embedding on top of the DPT\-tuned model improves hard accuracy on 10 of 11 datasets, and on 8 of those the tuned inner product margin is $1.8$ to $5.9$ times larger than the base, with the item\-norm term staying small in magnitude relative to the inner product margin\. The remaining two, Polis Bowling Green and Remesh Foreign, also gain hard accuracy but with only a marginal increase in the mean inner product margin \(ratios $1.1$ and $1.3$ respectively\), indicating the improvement there is in where the margin lands rather than its mean\. The takeaway is that the same rank\-20 budget extracts a stronger preference signal once the base directions have been cleaned by DPT\.
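The decomposition is easy to check numerically. The sketch below assumes the ideal-point score has the form $-\lVert L^{\top}\psi(a)-L^{\top}\psi(x)\rVert^{2}$, which is the form consistent with the two terms above. That score form, the function names, and the random vectors are illustrative assumptions.

```python
import numpy as np

def score(L, a, x):
    """Assumed ideal-point score: negative squared distance after projection."""
    diff = L.T @ a - L.T @ x
    return -float(diff @ diff)

def margin_terms(L, a, p, n):
    """Split score(a, p) - score(a, n) into its two components."""
    za, zp, zn = L.T @ a, L.T @ p, L.T @ n
    inner_margin = 2.0 * float(za @ (zp - zn))  # inner product margin
    norm_term = float(zn @ zn - zp @ zp)        # projected item-norm difference
    return inner_margin, norm_term

rng = np.random.default_rng(0)
d, k = 32, 20  # hypothetical embedding dimension and probe rank
L = rng.normal(size=(d, k))
a, p, n = rng.normal(size=(3, d))  # stand-ins for psi(a), psi(p), psi(n)

inner_margin, norm_term = margin_terms(L, a, p, n)
margin = score(L, a, p) - score(L, a, n)
print(margin, inner_margin + norm_term)  # the two values should agree
```

Under this score, a triplet is ranked correctly when the sum of the two terms is positive. A large inner product margin with a small norm term therefore means the anchor, rather than item length in the projected space, is driving the decision.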
Table 14: Per\-dataset mean of the in\-projected\-space inner product margin $2\langle L^{\top}\psi(a),L^{\top}(\psi(p)-\psi(n))\rangle$ and the projected item\-norm term $\lVert L^{\top}\psi(n)\rVert^{2}-\lVert L^{\top}\psi(p)\rVert^{2}$ on the dataset’s share of the 875 hard triplets, computed under the rank\-20 metric probe fit on either the base or the tuned encoder \(single seed\)\. The ratio column is tuned divided by base on the inner product margin\.
Two practical readings follow\. First, DPT and the per\-topic probe are complementary contributions, not redundant\. DPT shifts the global encoder geometry to suppress the nuisance subspace, the per\-topic projected embedding extracts the dataset\-specific preference subspace, and the two compose with no measurable cost on natural data and a consistent gain on hard triplets\. Second, val\-selection on natural data is not an unalloyed good for downstream evaluation under distribution shift\. When the candidate directions a fit can choose from include nuisance directions that happen to correlate with preference on natural data, val\-selection can pull them in, and the model will look fine on its own held\-out split while degrading on any distribution where the nuisance correlation flips\.
## Appendix F Full Model Comparison
Table [15](https://arxiv.org/html/2605.08360#A6.T15) is the complete version of the abridged main\-results table \(Table [4](https://arxiv.org/html/2605.08360#S6.T4) in §[6](https://arxiv.org/html/2605.08360#S6)\), covering all 25 embedding models evaluated in this work\.
The 25 baselines are: ST5\-Base, ST5\-Large, ST5\-XL \[Ni et al\., [2022](https://arxiv.org/html/2605.08360#bib.bib10)\]; all\-mpnet\-base\-v2, all\-MiniLM\-L6, all\-MiniLM\-L12, all\-distilroberta, NLI\-mpnet, Paraphrase\-MiniLM \[Reimers and Gurevych, [2019](https://arxiv.org/html/2605.08360#bib.bib9)\]; e5\-large\-v2 \[Wang et al\., [2022](https://arxiv.org/html/2605.08360#bib.bib57)\]; BGE\-large\-en\-v1\.5 \[Xiao et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib58)\]; GTE\-large \[Li et al\., [2023](https://arxiv.org/html/2605.08360#bib.bib59)\]; mxbai\-embed\-large\-v1 \[Lee et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib60)\]; Arctic\-embed\-l \[Merrick et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib61)\]; Stella\-en\-1\.5B \[Zhang et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib65)\]; Qwen2\-1\.5B\-instruct \[Yang et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib66)\]; OpenAI text\-embedding\-3\-small and text\-embedding\-3\-large \[OpenAI, [2024](https://arxiv.org/html/2605.08360#bib.bib56)\]; voyage\-3 \[Voyage AI, [2024](https://arxiv.org/html/2605.08360#bib.bib62)\]; voyage\-3\-large \[Voyage AI, [2025](https://arxiv.org/html/2605.08360#bib.bib63)\]; voyage\-4, voyage\-4\-lite, voyage\-4\-large \[Voyage AI, [2026](https://arxiv.org/html/2605.08360#bib.bib64)\]; StanceAware\-SBERT \[Ghafouri et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib20)\]; and BGE\-SparseCL \(arguana\) \[Xu et al\., [2024](https://arxiv.org/html/2605.08360#bib.bib24)\]\.
Table 15: Triplet accuracy \(%\) across all 11 evaluation datasets, full comparison across 25 models\. Best result in each column in bold\. Abridged version in the body as Table [4](https://arxiv.org/html/2605.08360#S6.T4)\.
### F\.1 Baseline encoding format
Many baselines were trained as asymmetric retrieval encoders, expecting the query side and the passage side of an input to be formatted differently\. Treating these encoders symmetrically \(passing both anchor and candidate through plain `encode(text)`\) consistently under\-represents what they can do\. We therefore apply each model’s intended query/passage convention, treating the participant’s anchor text as the query and the candidate statement as the passage\. The conventions below are sourced from each model card or registered `config_sentence_transformers.json`, and in all cases cosine similarity is computed between the query\-side and passage\-side embeddings\.
- Symmetric \(no prefixes\): ST5\-XL/Large/Base, all\-mpnet\-base, all\-MiniLM\-\{L6,L12\}, all\-distilroberta, NLI\-mpnet, Paraphrase\-MiniLM, GTE\-large \[Li et al\., [2023](https://arxiv.org/html/2605.08360#bib.bib59)\] \(the older non\-instruct variant\), OpenAI text\-embedding\-3\-\{small, large\}, StanceAware\-SBERT\.
- e5\-large\-v2: `"query: "` prefix on anchors, `"passage: "` prefix on candidates\.
- BGE\-large\-en\-v1\.5, MxBAI\-embed\-large\-v1: `"Represent this sentence for searching relevant passages: "` prefix on anchors, plain on candidates\.
- Snowflake Arctic\-embed\-l\-v2\.0: `"query: "` prefix on anchors, plain on candidates\. The v2 redesign uses E5\-style prefixes rather than the v1 BGE prompt\.
- BGE\-SparseCL \(arguana\): plain encoding on both sides, matching the authors’ own evaluation script\. The released checkpoint ships without an SBERT `modules.json`; we verified that the default sentence\-transformers wrapper applies mean pooling, matching the paper’s training\-time `--pooler_type avg`\.
- Voyage: the API exposes an `input_type` parameter; we pass `"query"` for anchors and `"document"` for candidates\.
- Instruction\-tuned \(GTE\-Qwen2\-1\.5B\-instruct, Stella\-en\-1\.5B\-v5\): task\-tailored instruction wrapped in each model’s standard `"Instruct: <task>\nQuery: "` template on anchors, plain text on candidates\.
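The prefix conventions in the list above amount to a small lookup applied before encoding. The following is a minimal sketch under stated assumptions: the family keys, `PREFIXES`, `instruct_prefix`, and `format_pair` are hypothetical names introduced for illustration, and the Voyage case is left out because it is handled through the API's `input_type` parameter rather than a text prefix.

```python
BGE_QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

# Hypothetical family key -> (anchor/query prefix, candidate/passage prefix)
PREFIXES = {
    "symmetric": ("", ""),
    "e5": ("query: ", "passage: "),
    "bge": (BGE_QUERY_PROMPT, ""),      # also used for MxBAI-embed-large-v1
    "arctic_v2": ("query: ", ""),
}

def instruct_prefix(task):
    """Instruction template applied to anchors for instruction-tuned encoders."""
    return f"Instruct: {task}\nQuery: "

def format_pair(family, anchor, candidate, task=None):
    """Return (anchor_text, candidate_text) as they would be sent to the encoder."""
    if family == "instruct":
        if task is None:
            raise ValueError("instruction-tuned encoders need a task string")
        query_prefix, passage_prefix = instruct_prefix(task), ""
    else:
        query_prefix, passage_prefix = PREFIXES[family]
    return query_prefix + anchor, passage_prefix + candidate

anchor = "Cities should ban cars from downtown."
candidate = "Pedestrian-only centers make urban life better."
for family in PREFIXES:
    print(family, format_pair(family, anchor, candidate))
print(format_pair("instruct", anchor, candidate, task="Retrieve statements with the same stance"))
```

Keeping the convention in one table makes it easy to audit that every baseline is encoded asymmetrically where its model card asks for it, with cosine similarity then computed between the two formatted sides.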
#### Instruction\-tuned prompt selection\.
The registered prompts in each model’s `config_sentence_transformers.json` target web\-search retrieval \(e\.g\., *“Given a web search query, retrieve relevant passages that answer the query”*\), which does not match a sentence\-to\-sentence preference task\. Qwen was trained with diverse task instructions and supports arbitrary task strings\. Stella was trained primarily on two registered prompts, `s2p_query` for sentence\-to\-passage retrieval and `s2s_query` for sentence\-to\-sentence similarity, with the model card recommending those two for general use\. We swap in a task\-tailored instruction, *“Given an opinion statement on a contested social or political issue, retrieve other statements that share the same stance and underlying values,”* using the template described above\.
For Stella, we verified that this choice is benign on natural data: the task\-tailored instruction yields a 52\.7% macro\-mean, the registered `s2s_query` prompt yields 52\.8% \(a 0\.1 pp difference\), and the `s2p_query` prompt yields 49\.4% \(worse, as expected for the web\-search framing\)\. For Qwen, the task\-tailored instruction and the registered web\-search instruction give identical 53\.8% macro\-mean\. The Table [15](https://arxiv.org/html/2605.08360#A6.T15) rows for these two models report the task\-tailored configuration\.
## Appendix G Prompts
Three LLM prompts define the training\-data pipeline: GPT\-4o\-mini filters the issue pool, Claude Sonnet generates 5 diverse opinions per surviving issue, and GPT\-4o rewrites a sampled anchor into a hard triplet\. The same rewrite prompt is used to produce the held\-out hard eval triplets from real participant anchors \(§[4\.3](https://arxiv.org/html/2605.08360#S4.SS3)\)\. Placeholders in braces \(`{issue_text}`, `{anchor}`\) are substituted at call time\. For reproducibility, exact model identifiers and sampling settings are:
- Issue filter: `gpt-4o-mini`, temperature 0, max tokens 55\.
- Opinion generation: `claude-sonnet-4-20250514`, temperature 1\.0, max tokens 1024\.
- Hard\-triplet rewrite: `gpt-4o`, temperature 0\.7, max tokens 400\.
#### Issue filter \(GPT\-4o\-mini, one call per candidate issue\)\.
```
You are deciding whether a debate topic is a political, policy, or
social issue suitable for studying public opinion and preference
diversity.
ACCEPT if:
- It is a political, policy, economic, or social issue
- People’s positions reflect their values, ideology, or worldview
(not just personal taste)
- It is the kind of topic debated in legislatures, newspapers, public
surveys, or town halls
- There is a clear spectrum of positions (e.g., pro-regulation vs.
free-market)
REJECT if:
- It is purely factual or scientific with a clear answer
- It is abstract philosophy with no policy relevance
- It is a niche technical question that only specialists would engage
with
- It is about personal lifestyle preferences rather than public affairs
- It has near-universal consensus
- It is nonsensical or too vague
Topic: "{issue_text}"
Respond with exactly one word: ACCEPT or REJECT
```
#### Opinion generation \(Claude Sonnet, one call per issue\)\.
```
Topic: {issue_text}
Write 5 standalone opinions on this topic from 5 different people.
These should read like things people wrote in a forum - NOT like
they are answering a question.
The 5 opinions should be clearly distinct and evenly spaced across
the full opinion spectrum:
1. Strongly supportive - fully committed to this position
2. Moderately supportive - in favor but with reservations
3. Genuinely ambivalent - sees valid points on both sides
4. Moderately opposed - against but acknowledges some merit
5. Strongly opposed - firmly against
Important:
- Do not start with Yes, No, I agree, I disagree or any
response-like framing
- Each opinion should stand alone as a statement of belief
- Make the difference between adjacent positions (e.g., 1 vs 2,
4 vs 5) clear and meaningful
- Keep them short and natural - average ~25 words, some shorter,
some longer
{"opinions": ["op1", "op2", "op3", "op4", "op5"]}
```
#### Hard\-triplet rewrite \(GPT\-4o, one call per sampled anchor\)\.
```
You are given a person’s opinion statement on a political or social
topic.
Original statement: "{anchor}"
Generate two rewritten versions that test whether a model understands
values vs. word overlap:
1. SEMANTIC DISTRACTOR (tricks models that rely on word overlap):
- Keep nearly IDENTICAL wording and sentence structure as the
original
- Only flip the conclusion/stance to be the opposite
- Goal: MAXIMIZE word overlap while having opposite meaning
- Example: "Renewable energy must be prioritized above all else"
-> "Renewable energy is worth considering, but should NOT be
prioritized above all else"
2. PREFERENCE MATCH (requires understanding values, not words):
- Express the SAME stance and values as the original
- But use completely DIFFERENT vocabulary, framing, and sentence
structure
- Goal: MINIMIZE word overlap while preserving the underlying
position
- Example: "Renewable energy must be prioritized above all else"
-> "Fossil fuel dependence is our greatest threat - government
must act decisively on clean power"
Respond with valid JSON only:
{"semantic_distractor": "...", "preference_match": "..."}
```
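To make the call-time substitution and the resulting triplet concrete, here is a minimal sketch of how a rewrite response could be turned into an (anchor, positive, negative) triplet. It is an illustration under assumptions, not the released pipeline: `fill_prompt`, `parse_hard_triplet`, the shortened stand-in template, and the mapping of `preference_match` to the positive and `semantic_distractor` to the negative are all hypothetical, and no API call is made.

```python
import json

# Shortened stand-in for the rewrite prompt; the full text is the prompt above.
REWRITE_TEMPLATE = (
    'Original statement: "{anchor}"\n'
    "Respond with valid JSON only:\n"
    '{"semantic_distractor": "...", "preference_match": "..."}'
)

def fill_prompt(template, **values):
    """Substitute {name} placeholders by plain string replacement.
    str.format is avoided because it would try to interpret the literal
    JSON braces at the end of the prompt as format fields."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template

def parse_hard_triplet(anchor, response_text):
    """Assumed mapping: the preference match is the positive,
    the semantic distractor is the negative."""
    data = json.loads(response_text)
    return {
        "anchor": anchor,
        "positive": data["preference_match"],
        "negative": data["semantic_distractor"],
    }

anchor = "Renewable energy must be prioritized above all else"
prompt = fill_prompt(REWRITE_TEMPLATE, anchor=anchor)

# Hypothetical model reply, reusing the examples given in the prompt itself.
reply = json.dumps({
    "semantic_distractor": "Renewable energy is worth considering, "
                           "but should NOT be prioritized above all else",
    "preference_match": "Fossil fuel dependence is our greatest threat - "
                        "government must act decisively on clean power",
})
triplet = parse_hard_triplet(anchor, reply)
print(triplet["positive"])
```

In practice a reply that is not valid JSON, or that lacks one of the two keys, would raise an exception here, so a real pipeline would likely retry or drop such anchors.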
#### Normal\-correlation rewrite \(GPT\-4o, one call per sampled anchor\)\.
```
You are given a person’s opinion statement on a political or social
topic.
Original statement: "{anchor}"
Generate two rewritten versions that exhibit the natural correlation
between stance and wording, where same stance tends to share words
and opposite stance tends to use different words:
1. PREFERENCE MATCH (high word overlap with the anchor, same stance):
- Reuse most of the anchor’s vocabulary and content words
- Preserve the original stance and values
- BUT it must be a genuine paraphrase, NOT a verbatim or near-
verbatim copy of the anchor. Change at least 3-5 words to
synonyms, rearrange clauses, or restructure the sentence.
Otherwise the rewrite is invalid.
- Goal: MAXIMIZE word overlap while still meaningfully rewording
- Example: "Renewable energy must be prioritized above all else"
-> "We must put renewable energy ahead of every other concern"
2. SEMANTIC DISTRACTOR (low word overlap with the anchor, opposite
stance):
- Use completely DIFFERENT vocabulary, framing, and sentence
structure
- Express the OPPOSITE stance on the same underlying issue
- Goal: MINIMIZE word overlap while having opposite meaning
- Example: "Renewable energy must be prioritized above all else"
-> "Cheap, reliable fossil fuels are what people actually
need. Chasing wind and solar comes at too great a cost."
Respond with valid JSON only:
{"preference_match": "...", "semantic_distractor": "..."}
```