CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization
Summary
This paper introduces CLIPer, a method for personalizing LLM outputs at inference time using a classifier, avoiding the computational cost of extensive fine-tuning.
Cached at: 05/11/26, 06:51 AM
# CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization
Source: [https://arxiv.org/html/2605.07162](https://arxiv.org/html/2605.07162)
Jinyan Su, Jinpeng Zhou, Claire Cardie, Wen Sun, Cornell University, {js3673, jpzhou, ctc9, ws455}@cornell.edu
###### Abstract
Personalized LLMs can significantly enhance user experiences by tailoring responses to preferences such as helpfulness, conciseness, and humor. However, fine-tuning models to address all possible combinations of user preferences is computationally expensive and impractical. In this paper, we introduce CLIPer (Classifier-guided Inference-time Personalization), a lightweight personalization approach that leverages a classifier model to dynamically steer LLM generation toward different user preferences at inference time. Our method eliminates the need for extensive fine-tuning and incurs negligible additional computational overhead, while enabling more controllable and nuanced personalization across single- and multi-dimensional preferences. Comprehensive empirical analyses demonstrate the scalability and effectiveness of our approach in delivering personalized language generation.
## 1 Introduction
In recent years, LLMs have emerged as powerful tools, driving advances across various applications. Their potential for delivering personalized services that meet diverse user needs has spurred significant interest in developing personalized LLMs, with the ultimate goal of using them as personal assistants. While methods like Reinforcement Learning from Human Feedback (RLHF) have aligned LLMs with general human preferences, they fall short in addressing the vast diversity of individual user preferences. This one-size-fits-all alignment can negatively impact the user experience, particularly for minority groups, and limit the accessibility of LLMs. For instance, an LLM designed for children should be amicable and easy to understand, whereas one for IT professionals should use technical and formal language.
| Methods | Training Cost | Inference Cost |
|---|---|---|
| Naive Fine-tuning | $O(2^d)$ | $O(1)$ |
| P-Soup (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7)) | $O(d)$ | $O(1)$ |
| MoPE (Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8)) | $O(d)$ | $O(d)$ |
| CLIPer (Ours) | $O(1)$ | $O(1)$ |

Table 1: Comparison of personalization approaches. Let $d$ be the number of preferences a user can choose from, such as {conciseness, funny, informal, ...}. For each preference dimension, the user can choose to have it or not, leading to $2^d$ options in total, so naive fine-tuning requires training $2^d$ models. Both Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) and Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)) fine-tune a model for each dimension, thus training $d$ models in total. Moreover, Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)) requires all $d$ models during inference, resulting in $O(d)$ inference cost. Our method does not require training a separate model for any preference dimension.

Personalization in LLMs (Figure [1](https://arxiv.org/html/2605.07162#S1.F1)) offers the potential to significantly enhance user experiences by tailoring the model's behavior to align with diverse individual or group preferences. A practical way to approach this is to cluster user preferences along various dimensions, such as helpfulness, conciseness, or humor. Users can then define their ideal LLM personality by choosing among these dimensions, allowing the model to adapt dynamically to their needs.
The simplest approach to personalization is direct prompting, i.e., including the user preference in the prompt. However, preference prompting can struggle, especially when a preference cannot be explicitly articulated in natural language but can only be inferred from user preference data. In such cases, the most straightforward approach is to fine-tune a model on data reflecting each user's preferences. However, fine-tuning a model for each preference type already demands significant computational and memory resources. The challenge intensifies when user preferences span multiple dimensions, as the number of possible combinations grows exponentially. For example, as illustrated in Figure [1](https://arxiv.org/html/2605.07162#S1.F1), some users might want their LLM to be concise, some might prefer it to be funny, and others might want a combination of dimensions, such as being both concise and funny. Prior work (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7); Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8)) acknowledges the necessity of fine-tuning models on single dimensions and mainly focuses on reducing the complexity when users have combined preferences along multiple dimensions, reducing the training cost from exponential to linear in the number of dimensions $d$. See the comparison in Table [1](https://arxiv.org/html/2605.07162#S1.T1).
Figure 1: Motivation: different users have different preferences.

Figure 2: Overview of CLIPer: at each step $i$, a lightweight classifier model outputs the probability of the preference classes given the prompt $x$, the partial generation from previous steps $y_{<i}$, and the potential next token. Details can be found in Section [3.2](https://arxiv.org/html/2605.07162#S3.SS2).

However, training $d$ separate models, one per preference dimension, can be impractical for large LLMs. To make personalization more feasible, we need a lightweight approach that does not fine-tune the LLM for every preference dimension. In this paper, we propose CLIPer, a Classifier-guided Inference-time Personalization approach that eliminates the need to train and maintain multiple fine-tuned models during both training and inference. Our classifier is trained to predict a distribution over all preference dimensions given the past text. This distribution models how likely the given text is to be classified as each preference dimension (e.g., the probability that the given text can be classified as funny). Our classifier predicts at the token level, and its output can be used to dynamically adjust the base model's logits at each token during inference. Our method incurs negligible computational overhead compared to fine-tuning multiple large LLMs while effectively steering the LLM toward personalized behaviors.
To summarize, our contributions are: (1) We propose a Classifier-guided Inference-time Personalization approach (CLIPer) that trains a small classifier which dynamically adjusts the logits of the LLM at inference time, enabling personalization without fine-tuning the LLM. This approach keeps training and inference costs constant with respect to the number of preference dimensions, making it both scalable and practical for real-world applications. (2) Our method supports both single-dimension and multi-dimension preference settings. (3) We introduce a novel classifier architecture that reduces computational overhead by jointly predicting, for every candidate next token, the conditional distribution over the preference dimensions given the past tokens (prefix). This architecture lets our approach perform only one forward pass of both the large LLM and the small classifier per step at inference time. (4) Through comprehensive empirical analyses, we demonstrate the effectiveness of our approach in achieving personalized and controllable text generation without extensive fine-tuning or additional computational burden.
## 2 Related Work
#### Personalized LLMs
P-RLHF (Li et al., [2024](https://arxiv.org/html/2605.07162#bib.bib17)) introduces a lightweight user model to capture individual preferences and jointly learns both the user model and the personalized LLM from human feedback. [Poddar et al.](https://arxiv.org/html/2605.07162#bib.bib18) infer a user-specific latent representation and use it to condition both the reward model and the policy, without requiring additional user-specific data. [Zhuang et al.](https://arxiv.org/html/2605.07162#bib.bib19) propose a system that trains a reranker to prioritize useful information and an adapter to align outputs with user preferences. Similarly, [Zollo et al.](https://arxiv.org/html/2605.07162#bib.bib24) simulate a large and diverse user base by leveraging pre-trained reward models to generate user-specific signals. Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) proposed Personalized Soups (P-Soup), reducing the training cost to linear complexity in the number of dimensions $d$. Subsequent work by Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)) uses inference-time personalization, merging the outputs of dimension-wise expert models. However, both Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) and Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)) require training an expert model for each dimension, resulting in $d$ additional training runs for $d$ dimensions. Furthermore, Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)) employs $d$ expert models simultaneously during inference, leading to $O(d)$ inference cost, as summarized in Table [1](https://arxiv.org/html/2605.07162#S1.T1).
#### LLM Alignment
[Dathathri et al.](https://arxiv.org/html/2605.07162#bib.bib12) introduce Plug and Play Language Models (PPLM), which enable controllable generation by combining a pretrained LLM with lightweight attribute classifiers, without retraining the base model. Li et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib20)) propose inference-time interventions to elicit more truthful responses. [Khanov et al.](https://arxiv.org/html/2605.07162#bib.bib13) guide generation using a learned reward function, while Chakraborty et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib14)) implicitly estimate the optimal value function for a target reward. Mudgal et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib15)) train a separate prefix scorer to approximate the reward function. [Chen et al.](https://arxiv.org/html/2605.07162#bib.bib16) propose PAL, a general framework for pluralistic alignment that explicitly incorporates diverse human preferences. Several other works also explore alignment through a multi-objective reinforcement learning lens (Shi et al., [2024](https://arxiv.org/html/2605.07162#bib.bib21); Rame et al., [2023](https://arxiv.org/html/2605.07162#bib.bib22); [Yang et al.](https://arxiv.org/html/2605.07162#bib.bib23)), drawing on foundations from (multi-objective) RL (Sener and Koltun, [2018](https://arxiv.org/html/2605.07162#bib.bib25)).
## 3 Method
### 3.1 Problem Setting
#### Preference Dimensions
Given a set of $d$ preferences, for example, $\mathcal{P} = \{\text{Concise}, \text{Funny}, \text{Rigorous}, \dots\}$, a user can choose among these $d$ preference dimensions. Let $c$ be a single user preference dimension, for instance, $c = \text{Concise}$ or $c = \text{Funny}$. Let $\mathbf{c}$ be a set of user preferences, for example, $\mathbf{c} = \{\text{Concise}, \text{Funny}\}$, indicating that the user wants their model to be both concise and funny. The total number of different sets $\mathbf{c}$ is $2^d$.
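To make the $2^d$ count concrete, the sketch below enumerates every preference set $\mathbf{c}$ for a toy list of dimension names (the names are only illustrative) and confirms the count. This exponential number of combinations is what makes per-combination fine-tuning impractical.

```python
from itertools import combinations


def all_preference_sets(dimensions):
    """Return every subset of the preference dimensions, including the empty set."""
    subsets = []
    for size in range(len(dimensions) + 1):
        subsets.extend(frozenset(combo) for combo in combinations(dimensions, size))
    return subsets


# Toy dimension list (illustrative); any d names work.
P = ["Concise", "Funny", "Rigorous"]
sets = all_preference_sets(P)
print(len(sets))  # 2 ** 3 = 8
```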
#### Independence Assumption
Denote $\mathcal{P} = \{c_1, \cdots, c_d\}$, and let $P(c_i \mid x, y)$ for $i = 1, 2, \dots, d$ be the probability of the prompt $x$ and the partial sentence $y$ being classified as preference $c_i$. We assume the user preferences in $\mathbf{c}$ are independent, i.e., for any preference set $\mathbf{c}$, we assume $P(\mathbf{c} \mid x, y) = \prod_{c \in \mathbf{c}} P(c \mid x, y)$. This independence assumption lets our approach scale efficiently with $d$ when $\mathbf{c}$ spans multiple dimensions: instead of training a classifier for every possible $\mathbf{c}$ among the $2^d$ choices, we only train a $d$-way classifier whose outputs each correspond to a single preference dimension.
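Under the independence assumption, the score of a preference set is a product of per-dimension probabilities, which is numerically safest to compute as a sum of logs. A minimal sketch, assuming the classifier's per-dimension probabilities for one $(x, y)$ pair are already available as a dictionary (the values below are hypothetical):

```python
import math


def log_prob_preference_set(per_dim_probs, preference_set):
    """log P(c | x, y) = sum over c in the set of log P(c | x, y), under independence."""
    return sum(math.log(per_dim_probs[c]) for c in preference_set)


# Hypothetical per-dimension probabilities P(c_i | x, y) for one (x, y) pair.
per_dim = {"Concise": 0.5, "Funny": 0.3, "Rigorous": 0.2}
score = math.exp(log_prob_preference_set(per_dim, {"Concise", "Funny"}))
print(round(score, 2))  # 0.5 * 0.3 = 0.15
```

Only $d$ numbers are needed per $(x, y)$, no matter which of the $2^d$ sets the user picks.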
### 3.2 Personalized Text Generation
Given a user preference $c$ and a prompt $x$ (as shown in Figure [1](https://arxiv.org/html/2605.07162#S1.F1)), text generation from a personalized model can be formulated as the conditional probability of generating $y$ given $x$ and $c$:
$$p(y \mid x, c) = \prod_{i=1}^{n} p(y_i \mid y_{<i}; x, c), \tag{1}$$
where $y_i$ denotes the $i$-th token in the generated text and $y_{<i}$ represents the tokens generated up to step $i-1$.
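Eq. (1) is the standard chain-rule factorization, so in log space the sequence probability is a sum of per-token terms. A small sketch with made-up per-step probabilities of the chosen tokens:

```python
import math


def sequence_log_prob(step_probs):
    """log p(y | x, c) = sum_i log p(y_i | y_<i; x, c), given each step's chosen-token probability."""
    return sum(math.log(p) for p in step_probs)


# Hypothetical probabilities of the generated token at steps i = 1..3.
steps = [0.5, 0.4, 0.25]
print(math.exp(sequence_log_prob(steps)))  # approximately 0.5 * 0.4 * 0.25 = 0.05
```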
While training a separate model for each possible preference $c$ would in theory enable perfect personalization, it is computationally infeasible because the number of required models grows exponentially with the number of preference dimensions $d$. Instead, we focus on inference-time personalization, where a general base model $p(y \mid x)$ is dynamically adjusted to approximate the personalized model $p(y \mid x, c)$.
To achieve this, we use Bayesian inference, introducing an auxiliary classifier model $p(c \mid y, x)$ to steer the base model. Bayes' rule gives $p(y \mid x, c) \propto p(y \mid x)\, p(c \mid y, x)$; we temper the classifier term with an exponent:
$$p(y \mid x, c) \propto p(y \mid x) \cdot p(c \mid y, x)^{\alpha}, \tag{2}$$
where $\alpha$ is a hyperparameter controlling the influence of the classifier model. When $\alpha = 0$, the formulation reduces to the base model without personalization, and $\alpha = 1$ recovers the exact Bayesian posterior.
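As a toy illustration of Eq. (2), the sketch below reweights a base distribution over three complete candidate responses by classifier probabilities raised to $\alpha$ and renormalizes. The candidates and all numbers are invented. Setting $\alpha = 0$ returns the base distribution, as stated above.

```python
def reweight(base_probs, classifier_probs, alpha):
    """Return p(y | x, c) proportional to p(y | x) * p(c | y, x) ** alpha, normalized over candidates."""
    unnormalized = [p * (q ** alpha) for p, q in zip(base_probs, classifier_probs)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical candidates: a long answer, a medium answer, a one-liner.
base = [0.5, 0.3, 0.2]     # p(y | x) from the base model
concise = [0.1, 0.3, 0.9]  # p(c = Concise | y, x) from the classifier

print(reweight(base, concise, alpha=0.0))  # same as base (up to float rounding)
print(reweight(base, concise, alpha=1.0))  # mass shifts toward the one-liner
```

Larger $\alpha$ sharpens the shift toward responses the classifier rates as matching the preference.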
For autoregressive language models, where $p(y \mid x, c) = \prod_{i=1}^{n} p(y_i \mid y_{<i}; x, c)$, the conditional probability of each token is:
$$p(y_i \mid y_{<i}; x, c) \propto p(y_i \mid y_{<i}; x) \cdot p(c \mid y_{\le i}, x)^{\alpha}. \tag{3}$$
Thus, the overall personalized text generation probability becomes:
$$p(y \mid x, c) \propto \prod_{i=1}^{n} p(y_i \mid y_{<i}; x)\, p(c \mid y_{\le i}, x)^{\alpha}.$$
Here, $p(c \mid y_{\le i}, x)$, provided by the classifier model, represents the likelihood of preference $c$ given the prompt $x$ and the partial generation $y_{\le i}$.
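At decoding time, Eq. (3) is most convenient in log space: add $\alpha \log p(c \mid y_{\le i}, x)$ to the base model's next-token log-probabilities and renormalize with a softmax. A minimal sketch over a four-token toy vocabulary, assuming the classifier score for each candidate next token is already available (all numbers hypothetical):

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def guided_next_token_probs(base_logprobs, pref_probs, alpha):
    """p(y_i | y_<i; x, c) proportional to p(y_i | y_<i; x) * p(c | y_<=i, x) ** alpha, for each candidate y_i."""
    scores = [lp + alpha * math.log(q) for lp, q in zip(base_logprobs, pref_probs)]
    return softmax(scores)


base_logprobs = [math.log(p) for p in [0.4, 0.3, 0.2, 0.1]]  # toy base next-token distribution
pref_probs = [0.2, 0.2, 0.5, 0.8]  # toy p(c | y_<i, v, x) for each candidate token v
print(guided_next_token_probs(base_logprobs, pref_probs, alpha=0.8))
```

Working with log-probabilities avoids underflow and matches how LLMs expose logits.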
#### Multiple Preferences
The above formulation generalizes naturally to multiple preferences, i.e., $\mathbf{c} = \{c_1, \cdots, c_k\} \subseteq \mathcal{P}$ with $|\mathbf{c}| = k$. Using the independence assumption, we have
$$p(y_i \mid y_{<i}; x, \mathbf{c}) \propto p(y_i \mid y_{<i}; x) \prod_{c_j \in \mathbf{c}} p(c_j \mid y_{\le i}, x)^{\alpha_j}, \tag{4}$$
where $\alpha = (\alpha_1, \cdots, \alpha_k)$ are hyperparameters controlling the influence of each preference.
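For several preferences, Eq. (4) adds one weighted log-probability term per selected dimension. The sketch assumes the classifier returns, for each candidate next token, a row of $d$ per-dimension probabilities (the $|\mathcal{V}| \times d$ matrix $M$ of Section 3.3). The column indices, $\alpha_j$ values, and probabilities are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def multi_pref_next_token_probs(base_logprobs, M, selected, alphas):
    """Eq. (4): base log-prob plus sum_j alpha_j * log p(c_j | y_<=i, x) over the selected dimensions.

    base_logprobs: length |V|; M: |V| rows of d probabilities each;
    selected: column indices of the chosen preferences; alphas: one weight per selected index.
    """
    scores = []
    for v, lp in enumerate(base_logprobs):
        bonus = sum(a * math.log(M[v][j]) for j, a in zip(selected, alphas))
        scores.append(lp + bonus)
    return softmax(scores)


# Toy example: |V| = 3 tokens, d = 3 dimensions (0 = Concise, 1 = Funny, 2 = Rigorous).
base_logprobs = [math.log(p) for p in [0.5, 0.3, 0.2]]
M = [[0.2, 0.2, 0.6],
     [0.4, 0.5, 0.1],
     [0.5, 0.4, 0.1]]
probs = multi_pref_next_token_probs(base_logprobs, M, selected=[0, 1], alphas=[0.8, 0.5])
print(probs)
```

Token 0 scores low on both Concise and Funny, so its probability drops below its base value of 0.5.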
At inference time, to sample the next token we need to construct the distribution over the next token, i.e., $p(y \mid y_{<i}; x, c)$ for all $y \in \mathcal{V}$. This requires evaluating the classifier model for every $y \in \mathcal{V}$, i.e., computing $p(c \mid y_{<i}, y, x)$ for all $y \in \mathcal{V}$. A naive implementation would perform $|\mathcal{V}|$ forward passes of the classifier model at every generation step (Sitdikov et al., [2022](https://arxiv.org/html/2605.07162#bib.bib9)), which can significantly slow down inference.
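To see where the $|\mathcal{V}|$ factor comes from, the sketch below scores every candidate next token by calling a per-sequence classifier once per token and counts the calls. `toy_classifier` is a hypothetical stand-in for a classifier that maps a full token sequence to preference probabilities.

```python
def naive_preference_scores(prefix, vocab, classifier):
    """Naive approach: one classifier call per candidate next token v, i.e. |V| forward passes per step."""
    calls = 0
    scores = {}
    for v in vocab:
        scores[v] = classifier(prefix + [v])
        calls += 1
    return scores, calls


def toy_classifier(tokens):
    # Hypothetical scoring rule: the "Concise" probability shrinks as the sequence grows.
    return {"Concise": 1.0 / len(tokens)}


vocab = ["the", "a", "cat", "sat", "<eos>"]
scores, calls = naive_preference_scores(["Once"], vocab, toy_classifier)
print(calls)  # 5, one call per vocabulary entry
```

With a realistic vocabulary of tens of thousands of tokens, this loop would run at every decoding step.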
### 3.3 Efficient Classifier Modeling
To address this efficiency challenge, we train a classifier model capable of producing classification probabilities for all tokens in the vocabulary at each step. Specifically, during generation the classifier model takes the prompt $x$ and the partial generation $y_{<i}$ as inputs and outputs a matrix $M \in \mathbb{R}^{|\mathcal{V}| \times d}$, where $M_{v, c_j}$ models the probability of label $c_j$ given the prompt $x$, the partial generation $y_{<i}$, and the next token $v \in \mathcal{V}$, i.e., $p(c_j \mid y_i = v, y_{<i}, x)$.
Figure 3: Details of the output matrix $M$. Given $y_{<i}$, each row of $M$ provides the preference probabilities for a specific token in the vocabulary, and each row sums to 1.

The $k$-th row of $M$ corresponds to the preference probabilities $p(c \mid y_{<i}, v_k, x)$, where $v_k \in \mathcal{V}$ is the $k$-th token in the vocabulary. The rows are normalized so that their probabilities sum to 1. This design allows the classifier model to compute preference probabilities for all possible next tokens in a single forward pass, avoiding multiple forward passes of the model. An illustrative representation of the matrix $M$ is depicted in Figure [3](https://arxiv.org/html/2605.07162#S3.F3).
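One way to realize this single-pass design is a head that maps the prefix representation to $|\mathcal{V}| \cdot d$ scores, reshapes them into $M$, and applies a softmax across the $d$ entries of each row. The form of the head here (a single linear layer over a hidden vector `h`, with hypothetical weights `W`) is an assumption for illustration, not the paper's exact architecture. It only shows the shape and row normalization of $M$.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def preference_matrix(h, W, vocab_size, num_dims):
    """Return M with M[v][j] = p(c_j | y_<i, v, x) for every token v, computed from one hidden vector h.

    W is a hypothetical (vocab_size * num_dims) x len(h) weight matrix given as a list of rows.
    """
    flat = [sum(w * x for w, x in zip(row, h)) for row in W]  # one linear projection
    rows = [flat[v * num_dims:(v + 1) * num_dims] for v in range(vocab_size)]
    return [softmax(r) for r in rows]  # each row sums to 1 over the d preferences


# Toy sizes: hidden size 2, |V| = 3, d = 2.
h = [1.0, -0.5]
W = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.2, 0.2], [-0.4, 0.1], [0.6, 0.0]]
M = preference_matrix(h, W, vocab_size=3, num_dims=2)
print(len(M), len(M[0]))  # 3 2
```

Because `h` is computed once from $(x, y_{<i})$, each step costs one classifier pass regardless of $|\mathcal{V}|$. The price is an output layer with $|\mathcal{V}| \cdot d$ units.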
#### Training Loss
Suppose we are given $d$ datasets, where each dataset $\mathcal{D}_i$ contains generations that align well with preference dimension $c_i$. The classifier model is trained on the combined dataset $\mathcal{D} = \bigcup_i \mathcal{D}_i$. Let $\mathcal{D} = \{(x^{(j)}, y^{(j)}, c^{(j)})\}_{j=1}^{N}$, where each sample consists of a prompt $x^{(j)}$, a generated text $y^{(j)}$, and a preference $c^{(j)} \in \mathcal{P}$. The training objective is to minimize the following token-level multi-class classification loss:
$$\mathcal{L} = -\frac{1}{N} \sum_{j=1}^{N} \left( \frac{1}{n^{(j)}} \sum_{i=1}^{n^{(j)}} \log p\left(c^{(j)} \mid y^{(j)}_{\le i}, x^{(j)}\right) \right), \tag{5}$$
where $N$ is the total number of training samples and $n^{(j)}$ is the number of tokens in $y^{(j)}$. We show the construction of the training loss through a concrete example in Figure [7](https://arxiv.org/html/2605.07162#A1.F7).
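A direct transcription of Eq. (5), assuming the classifier has already produced, for each training sample $j$ and each position $i$, the probability it assigns to the sample's true label $c^{(j)}$ (toy numbers below). Each sequence is averaged over its own length $n^{(j)}$ before averaging over the $N$ samples, so long and short responses contribute equally.

```python
import math


def token_level_loss(true_label_probs):
    """Eq. (5): -(1/N) * sum_j (1/n_j) * sum_i log p(c_j | y_<=i^(j), x^(j)).

    true_label_probs[j][i] is the classifier probability of sample j's label after prefix y_<=i.
    """
    N = len(true_label_probs)
    total = 0.0
    for seq in true_label_probs:
        total += sum(math.log(p) for p in seq) / len(seq)
    return -total / N


# Two toy samples of lengths 2 and 4.
probs = [[0.5, 0.5], [0.9, 0.9, 0.9, 0.9]]
print(round(token_level_loss(probs), 4))  # (-log 0.5 - log 0.9) / 2 ≈ 0.3993
```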
### 3.4 Classifier Model Guided Generation
After training the classifier model, we use it to achieve personalized generation as described in Eq. ([2](https://arxiv.org/html/2605.07162#S3.E2)). At each generation step $i$, the base model and the classifier model both take the prompt $x$ and the current partial generation $y_{<i}$ as input. The base model produces the unconditional probabilities $p(v \mid x, y_{<i})$ for all tokens $v \in \mathcal{V}$, where $\mathcal{V}$ is the vocabulary, and the classifier model produces the probabilities of user preference $c$, i.e., $p(c \mid x, y_{<i}, v)$, for all $v \in \mathcal{V}$. The next token $y_i$ is sampled from the personalized probability distribution $p(v \mid x, y_{<i}, c)$, which combines the outputs of the base model and the classifier model through Eq. ([3](https://arxiv.org/html/2605.07162#S3.E3)) (or Eq. ([4](https://arxiv.org/html/2605.07162#S3.E4)) for multiple preferences). The process is repeated until generation ends. An illustration of this classifier-guided process is provided in Figure [2](https://arxiv.org/html/2605.07162#S1.F2).
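Putting the pieces together, here is a minimal sketch of the guided decoding loop for one or more preferences. `base_model` and `classifier` are hypothetical callables. The first stands in for the LLM and returns next-token log-probabilities over the vocabulary. The second stands in for the trained classifier and returns the $|\mathcal{V}| \times d$ matrix $M$. Both take the prompt and the partial generation, so each step needs one call of each. Greedy selection is the default only to keep the example deterministic; the text describes sampling, which `rng` enables.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cliper_generate(prompt, base_model, classifier, selected, alphas, eos, max_len=20, rng=None):
    """Classifier-guided decoding (Eq. 3/4). If rng is None, pick the argmax; otherwise sample."""
    y = []
    for _ in range(max_len):
        base_logprobs = base_model(prompt, y)  # stands in for one LLM forward pass
        M = classifier(prompt, y)  # stands in for one classifier forward pass
        scores = [lp + sum(a * math.log(M[v][j]) for j, a in zip(selected, alphas))
                  for v, lp in enumerate(base_logprobs)]
        probs = softmax(scores)
        if rng is None:
            v = max(range(len(probs)), key=probs.__getitem__)
        else:
            v = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        y.append(v)
        if v == eos:
            break
    return y


# Toy setup: token 0 = "long", 1 = "short", 2 = <eos>; dimension 0 = Concise, 1 = Verbose.
def toy_base(prompt, y):
    return [math.log(p) for p in ([0.6, 0.3, 0.1] if len(y) < 2 else [0.1, 0.1, 0.8])]


def toy_classifier(prompt, y):
    return [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]


print(cliper_generate("Explain X", toy_base, toy_classifier, selected=[0], alphas=[1.0], eos=2))  # [1, 1, 2]
print(cliper_generate("Explain X", toy_base, toy_classifier, selected=[0], alphas=[1.0], eos=2,
                      rng=random.Random(0)))
```

With `selected=[]` the loop reduces to plain decoding from the base model, which picks the "long" token twice here.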
## 4 Experiments
### 4.1 Experimental Details
#### Dataset for Classifier Model Training
Following prior work (Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8); Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7)), we consider six preference dimensions across three categories: audience, content density, and tone (see Table [2](https://arxiv.org/html/2605.07162#S4.T2)). Consistent with previous studies (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7); Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8)), we use GPT4-Alpaca (Peng et al., [2023](https://arxiv.org/html/2605.07162#bib.bib10)) as training data and generate $n = 16$ samples per dimension using Tulu-7B as the base model, with prompts from Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) (summarized in Table [2](https://arxiv.org/html/2605.07162#S4.T2)). The top-1 ranked sample, according to the reward model from Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)), is selected as training data for each dimension. We then train a classifier based on LLaMA-160M on the collected data. Unlike Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) and Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)), we do not use a reward model for training or inference; it is only employed during dataset creation to simulate human preferences.
Due to the scarcity of high-quality validation and test datasets, we tune the hyperparameter $\alpha$ using data randomly sampled from Alpaca-Eval. For evaluation, we follow prior work (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7); Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8)) and use the Koala and UltraFeedback (Cui et al., [2024](https://arxiv.org/html/2605.07162#bib.bib11)) datasets.
| Category | Symbol | Dimension | Prompt |
|---|---|---|---|
| Audience-specific | P1A | Elementary Level | Generate a response that can be easily understood by an elementary school student. |
| | P1B | PhD Level | Generate a response that only a PhD Student in that specific field could understand. |
| Content-density | P2A | Concise | Generate a response that is concise and to the point without being verbose. |
| | P2B | Verbose | Generate a response that is very informative without missing any background information. |
| Tone | P3A | Playful | Generate a response that is friendly, witty, funny, and humorous, like a close friend. |
| | P3B | Harsh | Generate a response in an unfriendly manner. |

Table 2: Preference symbols, their corresponding dimensions, and the prompts used for direct prompting.
#### Preference Dimension Setting
To accommodate an important baseline, direct prompting with the preferences, and to be consistent with previous work, we use explicit preferences. The six dimensions are "elementary level", "PhD level", "concise", "verbose", "playful", and "harsh". Detailed descriptions of these dimensions can be found in Table [2](https://arxiv.org/html/2605.07162#S4.T2).
#### Baselines
We mainly compare our method to three baselines.

Direct Prompting. The LLM is prompted directly to generate text according to the user preferences. As noted in prior work (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7); Zhou et al., [2024](https://arxiv.org/html/2605.07162#bib.bib8)), direct prompting is a rather strong baseline, since Tulu-7B follows instructions well enough to generate personalized text from instructions alone. The prompt used for each dimension is given in Table [2](https://arxiv.org/html/2605.07162#S4.T2).

p-soup & Direct Fine-tuning. For single-dimension experiments, we compare with models fine-tuned on each dimension. For multi-dimension experiments, we compare with p-soup (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7)), which merges the parameters of models fine-tuned on each dimension.

Vanilla Baseline. Text generated directly from the base model, without tailoring to any user preference.
#### Evaluation Metrics
Since our classifier model is trained on a dataset created with a reward model, we use that reward model as a judge for computing win rates. Additionally, we employ GPT-4o-mini as an alternative judge to simulate an out-of-domain preference scenario, where preferences are learned from one user group (i.e., the reward model) but evaluated on another (i.e., GPT-4o-mini). This further underscores the importance of personalizing LLMs for different user groups. When using GPT-4o-mini as a judge, we use the same prompts as Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)) and Zhou et al. ([2024](https://arxiv.org/html/2605.07162#bib.bib8)).
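Win rate is a pairwise statistic. For each prompt, the judge compares the CLIPer response with the baseline response, and the win rate is the fraction of comparisons CLIPer wins. The sketch below assumes the reward-model judge prefers the higher-scoring response and that ties count as half a win. The paper does not say how ties are handled, so that convention is an assumption.

```python
def win_rate(ours_scores, baseline_scores):
    """Fraction of paired comparisons won by 'ours'; ties count as 0.5 (assumed convention)."""
    assert len(ours_scores) == len(baseline_scores)
    wins = 0.0
    for a, b in zip(ours_scores, baseline_scores):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / len(ours_scores)


# Hypothetical reward-model scores for four prompts.
print(win_rate([2.1, 0.3, 1.5, 0.9], [1.0, 0.8, 1.5, 0.2]))  # (1 + 0 + 0.5 + 1) / 4 = 0.625
```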
| Baseline | Reward Model: Koala | Reward Model: UltraFeedback | GPT-4o-mini: Koala | GPT-4o-mini: UltraFeedback | Average |
|---|---|---|---|---|---|
| Direct Prompting | 0.5367 | 0.5367 | 0.5500 | 0.4967 | 0.5300 |
| Vanilla Baseline | 0.8800 | 0.7800 | 0.8433 | 0.8333 | 0.8342 |
| p-soup & Direct Fine-tuning | 0.7867 | 0.7133 | 0.3967 | 0.4233 | 0.5800 |

Table 3: Average win rate of CLIPer against each baseline on Koala and UltraFeedback, evaluated by the reward model and GPT-4o-mini respectively (1-dimensional preferences).

| Baseline | Reward Model: Koala | Reward Model: UltraFeedback | GPT-4o-mini: Koala | GPT-4o-mini: UltraFeedback | Average |
|---|---|---|---|---|---|
| Direct Prompting | 0.5275 | 0.4844 | 0.5031 | 0.5038 | 0.5047 |
| Vanilla Baseline | 0.7775 | 0.7125 | 0.7106 | 0.6975 | 0.7245 |
| p-soup & Direct Fine-tuning | 0.7063 | 0.6550 | 0.4775 | 0.4537 | 0.5731 |

Table 4: Average win rate of CLIPer against each baseline on Koala and UltraFeedback, evaluated by the reward model and GPT-4o-mini respectively (2-dimensional preferences).

| Baseline | Reward Model: Koala | Reward Model: UltraFeedback | GPT-4o-mini: Koala | GPT-4o-mini: UltraFeedback | Average |
|---|---|---|---|---|---|
| Direct Prompting | 0.5933 | 0.4600 | 0.5367 | 0.5367 | 0.5317 |
| Vanilla Baseline | 0.9000 | 0.7900 | 0.8067 | 0.8100 | 0.8267 |
| p-soup & Direct Fine-tuning | 0.8533 | 0.7533 | 0.5467 | 0.5833 | 0.6842 |

Table 5: Average win rate of CLIPer against each baseline on Koala and UltraFeedback, evaluated by the reward model and GPT-4o-mini respectively (3-dimensional preferences).
#### Hyperparameter
The primary hyperparameter we tune is $\alpha$, which controls the influence of the classifier model in modifying the base model's logits. A lower $\alpha$ results in less personalization, letting the base model's original responses dominate, while a higher $\alpha$ enforces stronger personalization. In practice, $\alpha$ should be chosen to balance preserving the base model's original responses against incorporating personalized adjustments. For CLIPer, $\alpha$ is a tuple whose length equals the number of dimensions in the combination $\mathbf{c}$. For single-dimension experiments, $\alpha$ is a scalar; we use two randomly sampled subsets of the Alpaca-Eval dataset, each with 100 samples, and search over $\alpha \in \{0.05, 0.1, 0.3, 0.5, 0.8\}$. The best value is $\alpha = 0.5$ on one subset and $\alpha = 0.8$ on the other. We average the win rates over the two subsets, select the value with the highest averaged win rate, $\alpha = 0.8$, and use it for our main experiments. Since the win rate is pairwise, to simplify hyperparameter selection we only compute the win rate against direct prompting and use it to choose the hyperparameter. We also use the same hyperparameter for all experiments, so our hyperparameter selection is rather coarse. For 2-dimensional and 3-dimensional experiments, based on the single-dimension results, we narrow the search space and select each $\alpha_i$ only from $\{0.5, 0.8\}$.
Note that for 2-dimensional experiments there are two hyperparameters $(\alpha_1, \alpha_2)$, one per dimension, giving $2^2 = 4$ combinations, while for 3-dimensional experiments there are three hyperparameters $(\alpha_1, \alpha_2, \alpha_3)$, so we search over $2^3 = 8$ combinations in total.
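The coarse search above can be sketched as a grid over $\{0.5, 0.8\}^k$ that keeps the $\alpha$ tuple with the best win rate against direct prompting, averaged over the two tuning subsets. `evaluate_win_rate` is a hypothetical callback; in practice it would generate with CLIPer and run the judge. The toy version below exists only so the example runs and says nothing about real results.

```python
from itertools import product


def grid_search_alpha(num_dims, subsets, evaluate_win_rate, grid=(0.5, 0.8)):
    """Return (best_alphas, best_score), where score = mean win rate vs. direct prompting over subsets."""
    best_alphas, best_score = None, float("-inf")
    for alphas in product(grid, repeat=num_dims):  # len(grid) ** num_dims candidates
        score = sum(evaluate_win_rate(alphas, s) for s in subsets) / len(subsets)
        if score > best_score:
            best_alphas, best_score = alphas, score
    return best_alphas, best_score


# Hypothetical stand-in: pretends the win rate rises with the total guidance strength.
def toy_eval(alphas, subset):
    return 0.4 + 0.05 * sum(alphas) + (0.01 if subset == "subset_b" else 0.0)


print(len(list(product((0.5, 0.8), repeat=3))))  # 8 combinations for three dimensions
print(grid_search_alpha(2, ["subset_a", "subset_b"], toy_eval))
```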
## 5 Experimental Results
### 5.1 Main Results
In this section, we run experiments in which users choose one, two, or three combined preferences from the six dimensions $\mathcal{P} = \{\text{P1A}, \text{P1B}, \text{P2A}, \text{P2B}, \text{P3A}, \text{P3B}\}$.
#### Single Preference Dimension
For single preference dimension experiments, there is only one hyperparameter to tune. To show the robustness of our method, we report the average performance over the six dimensions in Table [3](https://arxiv.org/html/2605.07162#S4.T3) without tuning $\alpha$ separately for each dimension. Even without carefully tuning $\alpha$, CLIPer achieves a rather high average win rate against the other baselines. Table [3](https://arxiv.org/html/2605.07162#S4.T3) presents the average win rates (across the six single preference dimensions) from both the reward model and GPT-4o-mini evaluations, demonstrating the effectiveness of our approach. Against the vanilla baseline, our method achieves a high average win rate of 83.42%, highlighting the necessity of personalizing LLMs. Against the Direct Prompting baseline, CLIPer attains an average win rate of 53%, and it achieves a 58% average win rate against Direct Fine-tuning. When comparing against direct prompting, there is minimal discrepancy between the evaluations of the reward model and GPT-4o-mini, indicating consistency across evaluators. However, when compared to Direct Fine-tuning, our approach achieves a high win rate when evaluated by the reward model but falls below 50% when evaluated by GPT-4o-mini. This discrepancy arises because our classifier model is trained on data labeled by the reward model, so it inherently aligns with the reward model's preferences. In contrast, Direct Fine-tuning uses a model fine-tuned with a reward model trained on GPT-4 preference data, leading to stronger alignment with GPT-4o-mini's evaluation.
#### Two Preference Dimensions
For two-dimensional experiments, there are 12 preference combinations in total (choosing two of the three categories and one of the two opposite dimensions in each gives $3 \times 2 \times 2 = 12$), and we randomly choose 8 of them to reduce evaluation cost. As in the single-dimension experiments, we use the same $(\alpha_1, \alpha_2)$ values across all experiments without tuning them for different preference combinations. While this may understate our method's performance, it serves as a stress test of its sensitivity to hyperparameters, making our evaluation more practical by assessing robustness rather than relying on delicate hyperparameter selection. Table [4](https://arxiv.org/html/2605.07162#S4.T4) shows the average win rate when combining two preference dimensions. Direct prompting remains a strong baseline, with our approach achieving only a 50.47% win rate against it. However, our method consistently outperforms the vanilla baseline, attaining an average win rate of 72.45%. While our approach does not show a clear advantage over direct prompting, this aligns with prior findings (Jang et al., [2023](https://arxiv.org/html/2605.07162#bib.bib7)) that highlight direct prompting as a strong baseline. Nevertheless, direct prompting is limited to explicitly stated preferences and is not applicable when preferences are implicit. Compared to p-soup, our win rate remains below 50% when evaluated by GPT-4o-mini. This is expected: although p-soup is not directly fine-tuned on two-dimensional preference combinations, it benefits from merging the parameters of fine-tuned models, and with only two dimensions merged it still achieves strong performance. Additionally, we observe that the win rate on Koala is generally higher than on UltraFeedback, consistent with the single-dimension results.
Figure 4: Accuracy and loss on the evaluation set for the classifier using Llama-160M and Llama-1B, respectively.

Figure 5: Per-dimension accuracy on the evaluation dataset with Llama-160M and Llama-1B, respectively.

Figure 6: Correlation matrices for reward values between different preference dimensions. Each matrix uses only the text generated by direct prompting with the preference dimension shown in its title. For each matrix, the reward values for all preference dimensions are calculated and their correlations computed.
#### Three Preference Dimensions
Table [5](https://arxiv.org/html/2605.07162#S4.T5) presents the results for three-dimension preference combinations. CLIPer achieves an average win rate above 50% against all three baselines under both the reward model and GPT-4o-mini evaluations; the only dataset-level exception is Direct Prompting on UltraFeedback under the reward model (46.00%). Comparing these results with the single-dimension setting (Table [3](https://arxiv.org/html/2605.07162#S4.T3)) and the two-dimension setting (Table [4](https://arxiv.org/html/2605.07162#S4.T4)), we observe a notable trend: our win rate against p-soup & Direct Fine-tuning, as evaluated by GPT-4o-mini, increases from approximately 40% in the single-dimension case to above 45% in the two-dimension case and surpasses 50% in the three-dimension case. This trend suggests that while p-soup remains competitive with fewer preference dimensions, its performance diminishes as the number of dimensions increases. In contrast, our approach maintains strong performance and even improves as the number of dimensions grows, highlighting its robustness in handling more complex preference interactions. This observation underscores the limitations of parameter merging in p-soup for higher-dimensional preference combinations, whereas our method shows better adaptability and effectiveness in capturing nuanced preference structures.
### 5.2 Further Analysis
#### Scaling up the Classifier Model
While we use LLaMA-160M as the classifier in the main experiments, we run additional experiments to test whether a larger classifier performs better. Figure [4](https://arxiv.org/html/2605.07162#S5.F4) compares the evaluation loss and accuracy of LLaMA-160M and LLaMA-1B. Surprisingly, we observe no significant gap in either loss or accuracy, suggesting that LLaMA-160M is sufficient for our current experiments and that scaling up the classifier may not provide substantial benefits in this setting. Figure [5](https://arxiv.org/html/2605.07162#S5.F5) further plots the accuracy for each class: P3A, which corresponds to "Playful" (Table [2](https://arxiv.org/html/2605.07162#S4.T2)), has the lowest classification accuracy, while P1A, which corresponds to "elementary level", has the highest. Even the least accurate class exceeds 0.5 accuracy, well above the chance level of about 0.17 (1/6 for six classes), indicating that the classifier genuinely distinguishes these classes.
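Per-class accuracy, and its comparison with chance, can be sketched as follows. The labels below are made up for illustration and are not the paper's evaluation data; the 1/6 chance level assumes six roughly balanced classes.

```python
from collections import defaultdict

CLASSES = ["P1A", "P1B", "P2A", "P2B", "P3A", "P3B"]
CHANCE = 1 / len(CLASSES)  # uniform guessing over 6 classes, about 0.167


def per_class_accuracy(y_true, y_pred):
    """Fraction of the examples of each true class that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


# Toy labels for illustration only.
y_true = ["P1A", "P1A", "P3A", "P3A", "P3A", "P2B"]
y_pred = ["P1A", "P1A", "P3A", "P3B", "P1A", "P2B"]

acc = per_class_accuracy(y_true, y_pred)
weakest = min(acc, key=acc.get)  # class with the lowest accuracy
print(weakest, round(acc[weakest], 3), acc[weakest] > CHANCE)
```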
#### Correlation Analysis
An important assumption of our approach is that the preference dimensions being combined are independent, since at inference time the classifier output probabilities are multiplied across preference dimensions. To verify this empirically, Figure [6](https://arxiv.org/html/2605.07162#S5.F6) analyzes the correlation between dimensions using text intentionally steered toward a given dimension through direct prompting. For each text, we use the reward model to score every dimension and compute the Pearson correlation coefficient of the reward-model scores for all pairs of preference dimensions. For every preference and its opposite, the correlation is always strongly negative. This is expected and not a concern, since we never combine opposite preferences. In addition, when text is generated for a particular preference, its correlation with the other preferences (apart from the preference itself and its opposite) is usually low. This suggests that the dimensions we combine are fairly independent of one another, supporting our independence assumption.
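The correlation analysis amounts to a Pearson correlation matrix over per-dimension reward scores. In this sketch the scores are synthetic and constructed so that each "B" reward is the negation of its "A" reward; real inputs would be the reward-model scores for each generated text, one column per dimension.

```python
import numpy as np

DIMS = ["P1A", "P1B", "P2A", "P2B", "P3A", "P3B"]


def reward_correlations(scores):
    """Pearson correlation matrix of reward-model scores across dimensions.

    scores has shape (n_texts, n_dims): column k is the reward for DIMS[k]
    on every generated text.
    """
    # np.corrcoef treats each row as one variable, so transpose first.
    return np.corrcoef(np.asarray(scores, dtype=float).T)


# Synthetic scores for illustration only: each "B" reward is the negation of
# its "A" reward, and the three "A" rewards are drawn independently.
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 3))
scores = np.column_stack([a[:, 0], -a[:, 0], a[:, 1], -a[:, 1], a[:, 2], -a[:, 2]])

corr = reward_correlations(scores)
for i, j in [(0, 1), (0, 2)]:
    print(f"{DIMS[i]} vs {DIMS[j]}: {corr[i, j]:+.3f}")
```

Opposite poles come out at exactly -1 by construction, mirroring the strongly negative pairs the analysis reports, while independently drawn dimensions stay close to 0.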
#### Human Evaluation
We recruited 6 PhD student volunteers for a human study. Each volunteer was shown 50 questions from the Koala dataset, each with a pair of responses (ours and the one from preference prompting, presented in randomized order), and asked to judge which response better satisfies the “elementary level, playful, and concise” dimensions. The results are given in Table [19](https://arxiv.org/html/2605.07162#A3.T19), and the agreement among the labelers is given in Table [20](https://arxiv.org/html/2605.07162#A3.T20). On average, our approach achieves a 65.3% win rate under human evaluation.
## 6 Conclusion
In this paper, we introduced CLIPer, a classifier\-guided inference\-time personalization approach that enables dynamic adaptation of LLMs to diverse user preferences without extensive fine\-tuning\. By leveraging a lightweight classifier model, CLIPer steers generation in real time, offering a scalable and efficient alternative to traditional fine\-tuning approaches\. Our experimental results demonstrate the potential of CLIPer as a practical and adaptable solution for real\-world LLM personalization\.
## Limitations
As our work focuses on building a scalable and efficient personalization framework—an intrinsically challenging task—it naturally introduces certain limitations\. First, our classifier model is trained on synthetic data derived from reward model rankings, rather than human\-annotated labels\. While this design choice enables scalable training without costly human supervision, it may also introduce alignment biases from the reward model\. However, our empirical evaluations across both in\-domain \(reward model\) and out\-of\-domain \(GPT\-4o\-mini\) judges help mitigate this concern\. Second, our method assumes conditional independence between preference dimensions to avoid exponential scaling\. While this assumption holds empirically for the dimensions we evaluate \(as shown in our correlation analysis\), future work could explore modeling interdependencies explicitly to support more nuanced personalization\. Third, due to resource constraints, we evaluate CLIPer on a fixed set of six well\-defined preference dimensions\. This allows for controlled comparisons and benchmarking, but does not fully capture the range of real\-world user preferences—especially those that are implicit or context\-specific\. Nonetheless, the framework is general and can easily accommodate new dimensions as data becomes available\. Finally, we apply a shared set of hyperparameters across all combinations of preferences to ensure fairness and reduce tuning overhead\. While this likely understates the upper bound of our method’s performance, it demonstrates robustness and practicality for real\-world use without extensive hyperparameter tuning\.
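The conditional-independence assumption is what keeps multi-preference guidance linear in the number of preferences. The paper's exact decoding rule is defined in its method section, outside this excerpt. The sketch below assumes a CAIF-style rule in the spirit of Sitdikov et al. (2022), in which each target preference $c_k$ adds $\alpha_k \log p(c_k \mid x, y_{<t}, v)$ to the language-model log-probability of candidate token $v$. Under independence, the joint classifier probability is a product of per-preference terms, so adding a preference adds one term instead of requiring a model for every joint combination. The function name, array shapes, and toy numbers are hypothetical.

```python
import numpy as np


def guided_next_token_probs(lm_logprobs, clf_logprobs, alphas):
    """Reweight candidate next tokens by several preference classifiers.

    lm_logprobs:  shape (V,)   log p_LM(v | x, y_<t) for V candidate tokens.
    clf_logprobs: shape (K, V) log p(c_k | x, y_<t + v) for each of K target
                  preferences, evaluated on every candidate continuation.
    alphas:       shape (K,)   per-preference guidance strengths.
    """
    lm = np.asarray(lm_logprobs, dtype=float)
    clf = np.asarray(clf_logprobs, dtype=float)
    a = np.asarray(alphas, dtype=float)
    # Independence: prod_k p(c_k | .)^alpha_k becomes a weighted sum of logs.
    scores = lm + a @ clf
    scores = scores - scores.max()  # numerical stability before exp
    probs = np.exp(scores)
    return probs / probs.sum()  # renormalize over the candidate set


# Toy example: 3 candidate tokens, 2 target preferences (e.g. P1A and P2A).
lm = np.log([0.5, 0.3, 0.2])
clf = np.log([[0.1, 0.8, 0.5],   # p(P1A | prefix + candidate)
              [0.2, 0.7, 0.4]])  # p(P2A | prefix + candidate)

unguided = guided_next_token_probs(lm, clf, alphas=[0.0, 0.0])
guided = guided_next_token_probs(lm, clf, alphas=[0.8, 0.5])
print(unguided.round(3), guided.round(3))
```

With all $\alpha_k = 0$ the language-model distribution is returned unchanged; positive weights shift mass toward the candidate that both classifiers favor.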
## References
- Transfer Q-star: principled decoding for LLM alignment (2024). Advances in Neural Information Processing Systems 37, pp. 101725–101761.
- D. Chen, Y. Chen, A. Rege, and R. K. Vinayak (2024). PAL: pluralistic alignment framework for learning from heterogeneous preferences. In NeurIPS 2024 Workshop on Behavioral Machine Learning.
- G. Cui, L. Yuan, N. Ding, G. Yao, B. He, W. Zhu, Y. Ni, G. Xie, R. Xie, Y. Lin, et al. (2024). UltraFeedback: boosting language models with scaled AI feedback. In Forty-first International Conference on Machine Learning.
- S. Dathathri, A. Madotto, J. Lan, J. Hung, E. Frank, P. Molino, J. Yosinski, and R. Liu (2020). Plug and play language models: a simple approach to controlled text generation. In International Conference on Learning Representations.
- J. Jang, S. Kim, B. Y. Lin, Y. Wang, J. Hessel, L. Zettlemoyer, H. Hajishirzi, Y. Choi, and P. Ammanabrolu (2023). Personalized soups: personalized large language model alignment via post-hoc parameter merging. arXiv preprint arXiv:2310.11564.
- M. Khanov, J. Burapacheep, and Y. Li (2024). ARGS: alignment as reward-guided search. In The Twelfth International Conference on Learning Representations.
- K. Li, O. Patel, F. Viégas, H. Pfister, and M. Wattenberg (2023). Inference-time intervention: eliciting truthful answers from a language model. Advances in Neural Information Processing Systems 36, pp. 41451–41530.
- X. Li, R. Zhou, Z. C. Lipton, and L. Leqi (2024). Personalized language modeling from personalized human feedback. arXiv preprint arXiv:2402.05133.
- S. Mudgal, J. Lee, H. Ganapathy, Y. Li, T. Wang, Y. Huang, Z. Chen, H. Cheng, M. Collins, T. Strohman, et al. (2024). Controlled decoding from language models. In International Conference on Machine Learning, pp. 36486–36503.
- B. Peng, C. Li, P. He, M. Galley, and J. Gao (2023). Instruction tuning with GPT-4. arXiv preprint arXiv:2304.03277.
- S. Poddar, Y. Wan, H. Ivison, A. Gupta, and N. Jaques (2024). Personalizing reinforcement learning from human feedback with variational preference learning. In Pluralistic Alignment Workshop at NeurIPS 2024.
- A. Rame, G. Couairon, C. Dancette, J. Gaya, M. Shukor, L. Soulier, and M. Cord (2023). Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards. Advances in Neural Information Processing Systems 36, pp. 71095–71134.
- O. Sener and V. Koltun (2018). Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems 31.
- R. Shi, Y. Chen, Y. Hu, A. Liu, H. Hajishirzi, N. A. Smith, and S. S. Du (2024). Decoding-time language model alignment with multiple objectives. Advances in Neural Information Processing Systems 37, pp. 48875–48920.
- A. Sitdikov, N. Balagansky, D. Gavrilov, and A. Markov (2022). Classifiers are better experts for controllable text generation. arXiv preprint arXiv:2205.07276.
- K. Yang, Z. Liu, Q. Xie, J. Huang, T. Zhang, and S. Ananiadou (2024). MetaAligner: towards generalizable multi-objective alignment of language models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
- J. P. Zhou, K. Z. Luo, J. Gu, J. Yuan, K. Q. Weinberger, and W. Sun (2024). Orchestrating LLMs with different personalizations. arXiv preprint arXiv:2407.04181.
- Y. Zhuang, H. Sun, Y. Yu, R. Qiang, Q. Wang, C. Zhang, and B. Dai (2024). HYDRA: model factorization framework for black-box LLM personalization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
- T. P. Zollo, A. W. T. Siah, N. Ye, A. Li, and H. Namkoong (2025). PersonalLLM: tailoring LLMs to individual preferences. In The Thirteenth International Conference on Learning Representations.
## Appendix A Experimental details
#### Parameter Setting for Classifier
For the Llama-160M model, we search over learning rates in $\{10^{-4}, 10^{-5}, 10^{-6}\}$ and choose the best learning rate based on the evaluation loss; the final learning rate used in our experiments is $10^{-5}$. For the Llama-1B model, we search over the same set $\{10^{-4}, 10^{-5}, 10^{-6}\}$ and use $10^{-6}$, which has the lowest evaluation loss.
#### Illustration of the loss
Figure [7](https://arxiv.org/html/2605.07162#A1.F7) illustrates the composition of the loss.
Figure 7: Illustration of the composition of the training loss for a single data point $(x, y, c)$, where $x$ is the user prompt, $y$ is the LLM generation, and $c$ is the preference that the text fits. The training loss is the average negative log-likelihood of preference $c$ given the prompt $x$ and the partial generation $y_{\leq i}$. The overall loss across the training dataset $D=\{(x^{(j)}, y^{(j)}, c^{(j)})\}_{j=1}^{N}$ is averaged over the $N$ data points.
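Read literally, the caption describes
$\mathcal{L}(D) = \frac{1}{N}\sum_{j=1}^{N} \frac{1}{|y^{(j)}|}\sum_{i=1}^{|y^{(j)}|} -\log p_\theta\big(c^{(j)} \mid x^{(j)}, y^{(j)}_{\leq i}\big)$,
where the uniform average over prefix positions $i$ is our reading of the figure rather than a formula stated in this excerpt. A minimal sketch that takes the classifier's probabilities for the labelled preference as given (the toy values are hypothetical):

```python
import math


def example_loss(prefix_probs):
    """Mean NLL of the labelled preference c over all prefixes of one response.

    prefix_probs[i] is p(c | x, y_<=i): the classifier's probability of the
    labelled preference after reading the prompt and the first i+1 tokens.
    """
    return sum(-math.log(p) for p in prefix_probs) / len(prefix_probs)


def dataset_loss(dataset):
    """Average of the per-example losses over the N training examples."""
    return sum(example_loss(probs) for probs in dataset) / len(dataset)


# Hypothetical classifier outputs for two short responses.
toy = [
    [0.5, 0.5],  # 2-token response, classifier undecided at every prefix
    [1.0],       # 1-token response, classified with certainty
]
print(round(dataset_loss(toy), 4))
```

In a PyTorch implementation the same quantity would typically be obtained by applying `torch.nn.functional.cross_entropy` to the classifier logits at every prefix position, with a mask that excludes padding.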
## Appendix B Additional experiments
### B.1 Single dimension
#### Performance of Alpaca\-Eval subsets
We use two subsets randomly sampled from AlpacaEval, each containing 100 prompts, and select the hyperparameter $\alpha$ based on its win rate against direct prompting. Table [6](https://arxiv.org/html/2605.07162#A2.T6) (evaluated by GPT-4o-mini) and Table [9](https://arxiv.org/html/2605.07162#A2.T9) (evaluated by the reward model) report the win rate for each dimension under different values of $\alpha$ on both subsets. Table [7](https://arxiv.org/html/2605.07162#A2.T7) and Table [8](https://arxiv.org/html/2605.07162#A2.T8) summarize the average win rate across all 6 dimensions, evaluated by GPT-4o-mini and the reward model, respectively.
| Dataset | $\alpha$ | P1A | P1B | P2A | P2B | P3A | P3B | Avg. |
|---|---|---|---|---|---|---|---|---|
| AlpacaEval Subset 1 | 0.05 | 0.555 | 0.495 | 0.495 | 0.485 | 0.450 | 0.560 | 0.507 |
| | 0.1 | 0.490 | 0.525 | 0.500 | 0.485 | 0.495 | 0.500 | 0.499 |
| | 0.3 | 0.540 | 0.535 | 0.555 | 0.565 | 0.465 | 0.535 | 0.533 |
| | 0.5 | 0.530 | 0.515 | 0.495 | 0.480 | 0.535 | 0.590 | 0.524 |
| | 0.8 | 0.560 | 0.515 | 0.515 | 0.595 | 0.515 | 0.585 | 0.548 |
| | 1.0 | 0.545 | 0.495 | 0.505 | 0.455 | 0.520 | 0.645 | 0.528 |
| AlpacaEval Subset 2 | 0.05 | 0.555 | 0.500 | 0.540 | 0.590 | 0.475 | 0.490 | 0.525 |
| | 0.1 | 0.505 | 0.490 | 0.415 | 0.525 | 0.470 | 0.530 | 0.489 |
| | 0.3 | 0.495 | 0.555 | 0.495 | 0.525 | 0.550 | 0.515 | 0.523 |
| | 0.5 | 0.530 | 0.565 | 0.590 | 0.505 | 0.485 | 0.510 | 0.531 |
| | 0.8 | 0.460 | 0.530 | 0.515 | 0.495 | 0.545 | 0.585 | 0.522 |
| | 1.0 | 0.490 | 0.470 | 0.505 | 0.525 | 0.465 | 0.525 | 0.497 |

Table 6: Win rate (evaluated with GPT-4o-mini) of single dimensions on AlpacaEval Subset 1 and AlpacaEval Subset 2 against the Direct Prompting baseline with various $\alpha$. The best $\alpha$ (based on the average value) for each dataset is highlighted.

| $\alpha$ | 0.05 | 0.1 | 0.3 | 0.5 | 0.8 | 1.0 |
|---|---|---|---|---|---|---|
| Alpaca Subset #1 | 0.5067 | 0.4992 | 0.5325 | 0.5242 | 0.5475 | 0.5275 |
| Alpaca Subset #2 | 0.5250 | 0.4892 | 0.5225 | 0.5308 | 0.5217 | 0.4967 |
| Avg. | 0.5159 | 0.4942 | 0.5275 | 0.5275 | 0.5346 | 0.5121 |

Table 7: Average win rates (across 6 dimensions) for different $\alpha$ on Alpaca Subset #1 and Alpaca Subset #2, evaluated with GPT-4o-mini. We highlight in blue the win rate of the $\alpha$ selected using Alpaca Subset #1 as the evaluation set and Alpaca Subset #2 as the test set, and in green the win rate of the $\alpha$ selected using Alpaca Subset #2 as the evaluation set and Alpaca Subset #1 as the test set. For both, we bold the test-set value. We also bold the result with the largest average over the two sets, which corresponds to the $\alpha$ used for direct evaluation on the Koala and UltraFeedback datasets.

| $\alpha$ | 0.05 | 0.1 | 0.3 | 0.5 | 0.8 | 1.0 |
|---|---|---|---|---|---|---|
| Alpaca Subset #1 | 0.5092 | 0.5183 | 0.5092 | 0.5275 | 0.5408 | 0.4917 |
| Alpaca Subset #2 | 0.5100 | 0.5033 | 0.5167 | 0.5317 | 0.5408 | 0.5408 |
| Avg. | 0.5096 | 0.5108 | 0.5130 | 0.5296 | 0.5408 | 0.5162 |

Table 8: Average win rates (across 6 dimensions) for different $\alpha$ on Alpaca Subset 1 and Alpaca Subset 2, evaluated with the reward model. The $\alpha$ selected using Alpaca Subset 1 as the evaluation set and Alpaca Subset 2 as the test set, and vice versa, are consistent, which is highlighted in blue. We select $\alpha$ by averaging over these two sets.

| Dataset | $\alpha$ | P1A | P1B | P2A | P2B | P3A | P3B | Avg. |
|---|---|---|---|---|---|---|---|---|
| AlpacaEval Subset 1 | 0.05 | 0.590 | 0.500 | 0.465 | 0.475 | 0.450 | 0.575 | 0.509 |
| | 0.1 | 0.555 | 0.490 | 0.520 | 0.465 | 0.540 | 0.540 | 0.518 |
| | 0.3 | 0.530 | 0.525 | 0.535 | 0.430 | 0.530 | 0.505 | 0.509 |
| | 0.5 | 0.520 | 0.545 | 0.545 | 0.515 | 0.470 | 0.570 | 0.528 |
| | 0.8 | 0.605 | 0.515 | 0.505 | 0.520 | 0.550 | 0.550 | 0.541 |
| | 1.0 | 0.510 | 0.490 | 0.500 | 0.435 | 0.490 | 0.525 | 0.492 |
| AlpacaEval Subset 2 | 0.05 | 0.490 | 0.495 | 0.460 | 0.590 | 0.495 | 0.530 | 0.510 |
| | 0.1 | 0.510 | 0.515 | 0.440 | 0.455 | 0.540 | 0.560 | 0.503 |
| | 0.3 | 0.505 | 0.585 | 0.455 | 0.470 | 0.545 | 0.540 | 0.517 |
| | 0.5 | 0.505 | 0.560 | 0.525 | 0.530 | 0.515 | 0.555 | 0.532 |
| | 0.8 | 0.575 | 0.545 | 0.495 | 0.530 | 0.515 | 0.585 | 0.541 |
| | 1.0 | 0.500 | 0.560 | 0.535 | 0.455 | 0.555 | 0.640 | 0.541 |

Table 9: Win rate (evaluated with the reward model) of single dimensions on AlpacaEval Subset 1 and AlpacaEval Subset 2 against the Direct Prompting baseline with various $\alpha$. The best $\alpha$ for each dataset is highlighted.
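The selection procedure behind Table 7 can be re-derived from the table's own numbers: tune $\alpha$ on one subset and report it on the other, then pick the $\alpha$ with the best two-subset average. The values are copied from Table 7; the function and variable names are illustrative only.

```python
ALPHAS = [0.05, 0.1, 0.3, 0.5, 0.8, 1.0]
# Average win rates against direct prompting (GPT-4o-mini judge), Table 7.
SUBSET1 = [0.5067, 0.4992, 0.5325, 0.5242, 0.5475, 0.5275]
SUBSET2 = [0.5250, 0.4892, 0.5225, 0.5308, 0.5217, 0.4967]


def select_alpha(scores):
    """Return the alpha with the highest win rate in `scores`."""
    return max(zip(scores, ALPHAS))[1]


def win_rate_at(alpha, scores):
    """Look up the win rate that a given alpha achieves in `scores`."""
    return scores[ALPHAS.index(alpha)]


# Cross-check: tune on one subset, report on the other.
a1 = select_alpha(SUBSET1)  # tuned on Subset #1
a2 = select_alpha(SUBSET2)  # tuned on Subset #2
print(a1, win_rate_at(a1, SUBSET2))
print(a2, win_rate_at(a2, SUBSET1))

# Final choice: the alpha with the best average over both subsets.
averages = [(s1 + s2) / 2 for s1, s2 in zip(SUBSET1, SUBSET2)]
print(select_alpha(averages))
```

Tuning on Subset #1 picks $\alpha = 0.8$ (0.5217 on Subset #2), tuning on Subset #2 picks $\alpha = 0.5$ (0.5242 on Subset #1), and the best average is again $\alpha = 0.8$ (0.5346), matching the values the Table 7 caption describes as highlighted.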
#### Performance on Koala and UltraFeedback dataset
Table [10](https://arxiv.org/html/2605.07162#A2.T10) reports the win rate against p-soup/direct fine-tuning and against direct prompting for each dimension, together with their average, on the Koala and UltraFeedback datasets, as evaluated by the reward model.
| Baseline | Dataset | $\alpha$ | P1A | P1B | P2A | P2B | P3A | P3B | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| p-soup & Direct Fine-tuning | Koala | 0.05 | 0.82 | 0.74 | 0.92 | 0.94 | 0.92 | 0.24 | 0.765 |
| | | 0.1 | 0.78 | 0.84 | 0.88 | 0.90 | 0.96 | 0.22 | 0.765 |
| | | 0.3 | 0.80 | 0.86 | 0.88 | 0.88 | 0.88 | 0.24 | 0.757 |
| | | 0.5 | 0.80 | 0.82 | 0.88 | 0.90 | 0.88 | 0.26 | 0.758 |
| | | 0.8 | 0.82 | 0.88 | 0.90 | 0.96 | 0.90 | 0.26 | 0.873 |
| | | 1.0 | 0.80 | 0.86 | 0.88 | 0.88 | 0.88 | 0.28 | 0.798 |
| | UltraFeedback | 0.05 | 0.76 | 0.68 | 0.76 | 0.74 | 0.78 | 0.24 | 0.660 |
| | | 0.1 | 0.82 | 0.80 | 0.82 | 0.76 | 0.88 | 0.22 | 0.717 |
| | | 0.3 | 0.76 | 0.76 | 0.78 | 0.80 | 0.80 | 0.26 | 0.693 |
| | | 0.5 | 0.80 | 0.86 | 0.78 | 0.76 | 0.90 | 0.26 | 0.760 |
| | | 0.8 | 0.74 | 0.84 | 0.74 | 0.86 | 0.82 | 0.28 | 0.713 |
| | | 1.0 | 0.80 | 0.84 | 0.78 | 0.80 | 0.82 | 0.26 | 0.717 |
| Direct Prompting | Koala | 0.05 | 0.36 | 0.33 | 0.52 | 0.66 | 0.43 | 0.54 | 0.473 |
| | | 0.1 | 0.51 | 0.38 | 0.48 | 0.51 | 0.38 | 0.55 | 0.468 |
| | | 0.3 | 0.43 | 0.46 | 0.58 | 0.55 | 0.34 | 0.49 | 0.475 |
| | | 0.5 | 0.52 | 0.51 | 0.45 | 0.60 | 0.32 | 0.51 | 0.468 |
| | | 0.8 | 0.58 | 0.52 | 0.43 | 0.65 | 0.44 | 0.60 | 0.537 |
| | | 1.0 | 0.54 | 0.58 | 0.53 | 0.51 | 0.44 | 0.63 | 0.538 |
| | UltraFeedback | 0.05 | 0.46 | 0.34 | 0.45 | 0.43 | 0.44 | 0.46 | 0.430 |
| | | 0.1 | 0.40 | 0.39 | 0.54 | 0.51 | 0.38 | 0.45 | 0.445 |
| | | 0.3 | 0.54 | 0.44 | 0.52 | 0.53 | 0.38 | 0.54 | 0.493 |
| | | 0.5 | 0.54 | 0.56 | 0.53 | 0.62 | 0.43 | 0.55 | 0.538 |
| | | 0.8 | 0.60 | 0.54 | 0.50 | 0.51 | 0.46 | 0.61 | 0.537 |
| | | 1.0 | 0.45 | 0.57 | 0.46 | 0.45 | 0.44 | 0.51 | 0.480 |

Table 10: Win rate against P-soup and Direct Prompting (evaluated with the reward model) of single dimensions on the Koala and UltraFeedback datasets with various $\alpha$. The best $\alpha$ for each dataset is highlighted.
### B.2 Two Dimensions
See Tables [15](https://arxiv.org/html/2605.07162#A2.T15) and [16](https://arxiv.org/html/2605.07162#A2.T16) for the AlpacaEval subset results, and Tables [11](https://arxiv.org/html/2605.07162#A2.T11) and [12](https://arxiv.org/html/2605.07162#A2.T12) for the Koala and UltraFeedback results.
### B.3 Three dimensions
See Tables [17](https://arxiv.org/html/2605.07162#A2.T17) and [18](https://arxiv.org/html/2605.07162#A2.T18) for the AlpacaEval subset results, and Tables [13](https://arxiv.org/html/2605.07162#A2.T13) and [14](https://arxiv.org/html/2605.07162#A2.T14) for the Koala and UltraFeedback results.
| $(\alpha_1, \alpha_2)$ | | (0.50, 0.50) | (0.80, 0.50) | (0.50, 0.80) | (0.80, 0.80) |
|---|---|---|---|---|---|
| Direct Prompting | Koala | 0.5394 | 0.5275 | 0.5169 | 0.5175 |
| | UltraFeedback | 0.4913 | 0.4844 | 0.5119 | 0.4825 |
| p-soup & Direct Fine-tuning | Koala | 0.7200 | 0.7063 | 0.7125 | 0.7025 |
| | UltraFeedback | 0.6556 | 0.6550 | 0.6587 | 0.6438 |
| Vanilla Baseline | Koala | 0.7725 | 0.7775 | 0.7612 | 0.7662 |
| | UltraFeedback | 0.7100 | 0.7125 | 0.7200 | 0.7056 |

Table 11: Reward model evaluation results on the Koala and UltraFeedback datasets, where we choose $\alpha_1, \alpha_2 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2)$ | | (0.50, 0.50) | (0.80, 0.50) | (0.50, 0.80) | (0.80, 0.80) |
|---|---|---|---|---|---|
| Direct Prompting | Koala | 0.5031 | 0.5212 | 0.5350 | 0.5262 |
| | UltraFeedback | 0.5038 | 0.4938 | 0.5112 | 0.4900 |
| p-soup & Direct Fine-tuning | Koala | 0.4775 | 0.4644 | 0.4963 | 0.4694 |
| | UltraFeedback | 0.4537 | 0.4437 | 0.4469 | 0.4394 |
| Vanilla Baseline | Koala | 0.7106 | 0.6987 | 0.7244 | 0.6987 |
| | UltraFeedback | 0.6975 | 0.6981 | 0.6956 | 0.6925 |

Table 12: GPT-4o-mini evaluation results on the Koala and UltraFeedback datasets, where we choose $\alpha_1, \alpha_2 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2, \alpha_3)$ | | (0.5, 0.5, 0.5) | (0.8, 0.8, 0.5) | (0.8, 0.5, 0.5) | (0.5, 0.8, 0.8) | (0.5, 0.8, 0.5) | (0.5, 0.5, 0.8) | (0.8, 0.5, 0.8) | (0.8, 0.8, 0.8) |
|---|---|---|---|---|---|---|---|---|---|
| Direct Prompting | Koala | 0.6000 | 0.5533 | 0.5933 | 0.5333 | 0.5600 | 0.5700 | 0.5933 | 0.5733 |
| | UltraFeedback | 0.5467 | 0.5333 | 0.5300 | 0.5367 | 0.4700 | 0.5600 | 0.4600 | 0.4767 |
| p-soup & Direct Fine-tuning | Koala | 0.8400 | 0.8667 | 0.8533 | 0.8333 | 0.8133 | 0.8467 | 0.8533 | 0.8267 |
| | UltraFeedback | 0.8133 | 0.8267 | 0.8067 | 0.8200 | 0.7533 | 0.8133 | 0.7533 | 0.7467 |
| Vanilla Baseline | Koala | 0.8800 | 0.9067 | 0.8667 | 0.8467 | 0.8800 | 0.8733 | 0.9000 | 0.8733 |
| | UltraFeedback | 0.8533 | 0.8267 | 0.8267 | 0.8400 | 0.8033 | 0.8400 | 0.7900 | 0.7767 |

Table 13: Reward model evaluation results on the Koala and UltraFeedback datasets in the 3D setting, where we choose $\alpha_1, \alpha_2, \alpha_3 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2, \alpha_3)$ | | (0.8, 0.5, 0.8) | (0.8, 0.5, 0.5) | (0.5, 0.5, 0.8) | (0.5, 0.5, 0.5) | (0.5, 0.8, 0.8) | (0.5, 0.8, 0.5) | (0.8, 0.8, 0.5) | (0.8, 0.8, 0.8) |
|---|---|---|---|---|---|---|---|---|---|
| Direct Prompting | Koala | 0.5133 | 0.5000 | 0.5067 | 0.5367 | 0.5100 | 0.4867 | 0.5267 | 0.5000 |
| | UltraFeedback | 0.4600 | 0.5000 | 0.5100 | 0.5367 | 0.4767 | 0.4633 | 0.4967 | 0.5000 |
| p-soup & Direct Fine-tuning | Koala | 0.4900 | 0.5600 | 0.5700 | 0.5467 | 0.5533 | 0.5667 | 0.5333 | 0.5033 |
| | UltraFeedback | 0.5467 | 0.5600 | 0.5833 | 0.5833 | 0.5533 | 0.5200 | 0.5633 | 0.5700 |
| Vanilla Baseline | Koala | 0.7867 | 0.7900 | 0.7833 | 0.8067 | 0.7967 | 0.7900 | 0.8100 | 0.7833 |
| | UltraFeedback | 0.7500 | 0.7700 | 0.8033 | 0.8100 | 0.7800 | 0.7633 | 0.7967 | 0.7633 |

Table 14: GPT-4o-mini evaluation results on the Koala and UltraFeedback datasets in the 3D setting, where we choose $\alpha_1, \alpha_2, \alpha_3 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2)$ | (0.50, 0.50) | (0.80, 0.50) | (0.50, 0.80) | (0.80, 0.80) |
|---|---|---|---|---|
| Subset 1 | 0.5222 | 0.5141 | 0.5078 | 0.5000 |
| Subset 2 | 0.5103 | 0.5084 | 0.5006 | 0.5000 |
| Average | 0.5162 | 0.5112 | 0.5042 | 0.5000 |

Table 15: GPT-4o-mini evaluation results for the AlpacaEval subsets, where we choose $\alpha_1, \alpha_2 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2)$ | (0.50, 0.50) | (0.80, 0.50) | (0.50, 0.80) | (0.80, 0.80) |
|---|---|---|---|---|
| Subset 1 | 0.5228 | 0.5409 | 0.5228 | 0.5112 |
| Subset 2 | 0.5012 | 0.5188 | 0.5166 | 0.5047 |
| Average | 0.5120 | 0.5298 | 0.5197 | 0.5080 |

Table 16: Reward model evaluation results for the AlpacaEval subsets, where we choose $\alpha_1, \alpha_2 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2, \alpha_3)$ | (0.5, 0.5, 0.5) | (0.8, 0.8, 0.5) | (0.8, 0.5, 0.5) | (0.5, 0.8, 0.8) | (0.5, 0.8, 0.5) | (0.5, 0.5, 0.8) | (0.8, 0.5, 0.8) | (0.8, 0.8, 0.8) |
|---|---|---|---|---|---|---|---|---|
| Subset 1 | 0.5183 | 0.4800 | 0.5000 | 0.5017 | 0.5000 | 0.4833 | 0.5083 | 0.5000 |
| Subset 2 | 0.5250 | 0.5100 | 0.5000 | 0.5067 | 0.4983 | 0.4750 | 0.5317 | 0.5000 |
| Average | 0.5217 | 0.4950 | 0.5000 | 0.5042 | 0.4992 | 0.4792 | 0.5200 | 0.5000 |

Table 17: GPT-4o-mini evaluation results for the AlpacaEval subsets in the 3D setting, where we choose $\alpha_1, \alpha_2, \alpha_3 \in \{0.5, 0.8\}$.

| $(\alpha_1, \alpha_2, \alpha_3)$ | (0.5, 0.5, 0.5) | (0.8, 0.8, 0.5) | (0.8, 0.5, 0.5) | (0.5, 0.8, 0.8) | (0.5, 0.8, 0.5) | (0.5, 0.5, 0.8) | (0.8, 0.5, 0.8) | (0.8, 0.8, 0.8) |
|---|---|---|---|---|---|---|---|---|
| Subset 1 | 0.5017 | 0.5483 | 0.5217 | 0.5083 | 0.5167 | 0.5417 | 0.5617 | 0.5550 |
| Subset 2 | 0.5083 | 0.5383 | 0.4833 | 0.5117 | 0.5067 | 0.4933 | 0.5350 | 0.4850 |
| Average | 0.5050 | 0.5433 | 0.5025 | 0.5100 | 0.5117 | 0.5175 | 0.5483 | 0.5200 |

Table 18: Reward model evaluation results for the AlpacaEval subsets in the 3D setting, where we choose $\alpha_1, \alpha_2, \alpha_3 \in \{0.5, 0.8\}$.
## Appendix C Human Evaluation
Table [19](https://arxiv.org/html/2605.07162#A3.T19) shows our small-scale human evaluation with 6 PhD students, and Table [20](https://arxiv.org/html/2605.07162#A3.T20) shows the inter-rater consistency, measured as each labeler's agreement rate with the majority vote across the 6 labelers.
| Labeler | Our Wins | Preference Prompting Wins | Our Win Rate |
|---|---|---|---|
| Labeler 1 | 31 | 19 | 62% |
| Labeler 2 | 43 | 17 | 86% |
| Labeler 3 | 24 | 26 | 48% |
| Labeler 4 | 25 | 25 | 50% |
| Labeler 5 | 45 | 5 | 90% |
| Labeler 6 | 28 | 22 | 56% |
| Total | 196 | 114 | 65.3% |

Table 19: Human preference study with 6 PhD student volunteers. Each labeler evaluated 50 Koala questions with pairwise responses (ours vs. preference prompting) shown in randomized order, selecting which better satisfies the “elementary-level, playful, and concise” criteria.

| Labeler | Agreement with Majority |
|---|---|
| Labeler 1 | 78% |
| Labeler 2 | 92% |
| Labeler 3 | 58% |
| Labeler 4 | 50% |
| Labeler 5 | 90% |
| Labeler 6 | 70% |
| Average | 73.0% |

Table 20: Inter-rater consistency measured as agreement rate with the majority vote across the 6 labelers.
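The pooled figures in Tables 19 and 20 are simple averages. This sketch takes the study description at its word, with each labeler judging 50 questions, so the pooled win rate is computed over 6 × 50 judgments; it uses only the win counts and agreement rates from the tables, and the variable names are illustrative.

```python
# Table 19: wins for our method, per labeler.
OUR_WINS = [31, 43, 24, 25, 45, 28]
QUESTIONS_PER_LABELER = 50  # from the study description
# Table 20: agreement with the majority vote, in percent.
AGREEMENT = [78, 92, 58, 50, 90, 70]

per_labeler = [w / QUESTIONS_PER_LABELER for w in OUR_WINS]
overall = sum(OUR_WINS) / (QUESTIONS_PER_LABELER * len(OUR_WINS))
mean_agreement = sum(AGREEMENT) / len(AGREEMENT)

print([round(r, 2) for r in per_labeler])
print(f"{overall:.1%}", f"{mean_agreement:.1f}%")
```

Pooling gives 196 / 300 ≈ 65.3% and a mean agreement of 438 / 6 = 73.0%, matching the totals reported in the tables.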
## Appendix D Prompt used for LLM as a Judge
Table [22](https://arxiv.org/html/2605.07162#A4.T22) shows the prompt used to evaluate the winning response for preference P1A (elementary level) with an LLM as a judge. Prompts for the other preference dimensions can be found in the code of Jang et al. ([2023](https://arxiv.org/html/2605.07162#bib.bib7)), since we use the same prompts.
Prompt for Evaluating P1A (Elementary Level)

You are a helpful following assistant whose goal is to select the response that meet the following preference criteria:

Choose a response that can easily be understandable by an elementary school student.

You should evaluate the responses only based on the provided preference criteria instead of the correctness or the helpfulness of the responses.

If both responses demonstrate the preference criteria reasonably well, it is a TIE.

If neither of the response demonstrates the preference criteria, it is a TIE.

Annotation Guideline

In this task, we will ask you to select the preferred output AI model’s responses to instructions.

You will read a batch of examples, which are composed of the following:

1. an Instruction we give to the AI system
2. Output (a)
3. Output (b)

Annotation Example

For the annotation task, you will be asked to annotate an example. I.e. you will read all the examples first and then need to select the preferred output. Here’s an example of how you should be formatting your answers for a batch of 3 examples.

Example 1

Instruction 1: explain TypeScript and Duck Typing

Output (a) for example 1: Hello, eager learner! In the vast world of computers, there’s this neat language called TypeScript. Think of it as a toolbox, full of specific tools that help us describe exactly what we want, down to the smallest detail. And then there’s this playful concept named Duck Typing. Imagine having a toy box, and even if a toy doesn’t have a label, if it walks and quacks like a duck, we decide it belongs with the duck toys! It’s all about how it acts.

Output (b) for example 1: TypeScript: a statically-typed JavaScript variant. Duck Typing: dynamic type-checking. Essential knowledge.

Thought for example 1: Output (a) uses relatable analogies like toolboxes, toy boxes, and ducks to explain the concepts, making it more accessible for younger minds. So the answer is Output (a).

Result for example 1: Output (a)

Example 2

Instruction 2: assume the persona of a therapist and walk me through how to be less reactive and use a pause ritual

Output (a) for example 2: To systematically mitigate inherent reactivity:

Diaphragmatic Respiration: Activate the parasympathetic system and attenuate amygdala responses.

Somatic Anchoring: Establish a present-moment orientation, nullifying impulsive limbic reactions.

Cognitive Deliberation: Engage the dorsolateral prefrontal cortex for a nuanced appraisal of stimuli, curtailing reflexive emotional outputs.

Ensure consistent application for tangible results.

Output (b) for example 2: Engaging in a cerebral exploration, are we? Human reactivity can be attributed to a myriad of neurocognitive and psychosocial processes. To temper this:

Diaphragmatic Breathing: This not only regulates physiological arousal but also modulates the amygdala’s hypersensitivity.

Somatic Grounding: By tethering oneself to the present through sensory awareness, one can bypass the automatic, limbic-driven responses.

Cognitive Reflection: Ponder on the higher-order executive functions, allowing the prefrontal cortex to mediate the emotional reactions.

Regularly revisiting these strategies can aid in crafting a more poised persona.

Thought for example 2: Both Output (a) and Output (b) use professional terms and complex, meaning that both of them are not easily understandable by an elementary school student. So the answer is TIE.

Result for example 2: TIE

Example 3

Instruction 3: Define a new language that contains words that are with numbers and letters such as ’IL9op’.

Output (a) for example 3: A new language uses letters and numbers together. Like ’B2e’ or ’M8te’. Understand?

Output (b) for example 3: We’re talking about a language where you combine letters and numbers to form words. Example: Instead of "cake", it’s "C4k3". It’s a straightforward concept. Get it?

Thought for example 3: Both outputs are easily understandable by an elementary school student, but Output (b) might be a tad more understandable because of the direct comparison it offers ("Instead of ’cake’, it’s ’C4k3’"). So, the answer is Output (b).

Result for example 3: Output (b)

Annotation starts below

Now is your turn. I will give you a fourth example. Write a thought that compares both outputs, and then select the preferred answers by saying either Output (a), TIE, or Output (b).

Example 4

Instruction 4: {instruction}

Output (a) for example 4: {output_1}

Output (b) for example 4: {output_2}

Preferred Output for example 4:

Now write your thought and choose the preferred output for example 4. Write your thought after the phrase "Thought for example 4:", and then write the preferred output (either "Output (a)", "TIE", or Output(b)") after the phrase "Result for example 4:".

The preference criteria should favor responses that can easily be understandable by an elementary school student. You should evaluate the responses only based on the provided preference criteria instead of the correctness or the helpfulness of the responses. If both responses demonstrate the preference criteria reasonably well, it is a TIE. If neither of the response demonstrates the preference criteria, it is a TIE.

Thought for Example 4:

Table 22: Prompt used to evaluate the winning response for preference P1A (elementary level) using LLM as a judge.
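Turning judge outputs like the one requested above into win rates requires parsing the verdict after "Result for example 4:" and mapping it back to whichever system was shown as Output (a). The helpers below are hypothetical and not code from the paper or from Jang et al. (2023). Counting a TIE as half a win is an assumed convention that this excerpt does not specify. The pattern also accepts "Output(b)" without a space, because the template itself writes it that way once.

```python
import re

# The verdict line the judge prompt asks for; tolerate case and spacing.
VERDICT_RE = re.compile(r"Result for example 4:\s*(Output\s*\([ab]\)|TIE)", re.IGNORECASE)
LABELS = {"output(a)": "a", "output(b)": "b", "tie": "tie"}


def parse_verdict(judge_text):
    """Return 'a', 'b', 'tie', or None if no verdict line is found."""
    match = VERDICT_RE.search(judge_text)
    if match is None:
        return None
    return LABELS[re.sub(r"\s+", "", match.group(1)).lower()]


def score_for_ours(verdict, ours_is_a):
    """1 if our response wins, 0.5 for a tie (assumed), 0 if the other wins."""
    if verdict is None:
        raise ValueError("unparseable judge output")
    if verdict == "tie":
        return 0.5
    return 1.0 if (verdict == "a") == ours_is_a else 0.0


# Toy judge outputs; ours_is_a records the randomized presentation order.
judgments = [
    ("Thought for example 4: ... Result for example 4: Output (a)", True),
    ("Thought for example 4: ... Result for example 4: TIE", False),
    ("Thought for example 4: ... Result for example 4: Output (a)", False),
]
scores = [score_for_ours(parse_verdict(text), ours_is_a) for text, ours_is_a in judgments]
win_rate = sum(scores) / len(scores)
print(scores, win_rate)
```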
## Appendix E Example of the Generation
| Instruction | List the layers of the TCP/IP model and for each layer identify one protocol. |
|---|---|
| **Preference Dimension** | **Output** |
| P1A | The TCP/IP model is like a cake! The cake is like the OSI model, which is a way to explain how computers communicate with each other. The OSI model has seven layers: 1. Physical Layer 2. Data Link Layer 3. Network Layer 4. Transport Layer 5. Session Layer 6. Presentation Layer 7. Application Layer. The Physical Layer is like the cake’s crust. It’s the part that holds everything together and makes sure the cake is safe to eat. The Data Link Layer is like the frosting on the cake. It’s the part that makes the cake look pretty and delicious. The Network Layer is like the cake’s middle layer. It’s the part that helps the cake stay together and move around. The Transport Layer is like the cake’s topping. It’s the part that makes the cake taste good and enjoyable to eat. The Session Layer is like the candles on the cake. It’s the part that helps the cake celebrate special occasions. The Presentation Layer is like the cake’s decorations. It’s the part that makes the cake look even more special and fun to eat. The Application Layer is like the cake’s party. It’s the part that helps the cake have a good time and make everyone happy. Just like how the cake has different layers that work together to make it delicious, the TCP/IP model has different layers that work together to make computers communicate with each other. |
| P1B | The TCP/IP model is a conceptual framework that describes the structure and function of the Internet. It consists of four layers: the application layer, the transport layer, the internet layer, and the link layer. At the application layer, one protocol is the Hypertext Transfer Protocol (HTTP). HTTP is a protocol that is used for transmitting information between a client and a server. It is used for tasks such as retrieving web pages, sending and receiving email, and streaming media. At the transport layer, one protocol is the Transmission Control Protocol (TCP). TCP is a protocol that is used for reliable data transmission. It ensures that data is sent in the proper order and that it is received without errors. At the internet layer, one protocol is the Internet Protocol (IP). IP is a protocol that is used for routing data packets between networks. It is responsible for determining the best path for data to take as it travels from its source to its destination. At the link layer, one protocol is the Ethernet. Ethernet is a protocol that is used for transmitting data over a physical network. It is the most commonly used protocol for wired networks and is used in both home and business environments. In summary, the TCP/IP model consists of four layers: the application layer, the transport layer, the internet layer, and the link layer. Each layer plays a critical role in the functioning of the Internet and is responsible for different aspects of data transmission and communication. |
| P2A | TCP/IP model consists of 4 layers: 1. Application layer 2. Transport layer 3. Internet layer 4. Network access layer. Protocols: 1. Application layer: HTTP, FTP, SMTP, DNS, DHCP, etc. 2. Transport layer: TCP, UDP, SCTP, etc. 3. Internet layer: IP, ICMP, ARP, etc. 4. Network access layer: Ethernet, Wi-Fi, PPP, etc. |
| P2B | The TCP/IP model is a conceptual framework used to understand how data is transmitted over a network. It consists of four layers: the application layer, the transport layer, the internet layer, and the link layer. 1. Application Layer: This is the top layer of the TCP/IP model and is responsible for creating, sending, and receiving data. Some common protocols used in this layer include HTTP, FTP, and SMTP. 2. Transport Layer: This layer is responsible for providing reliable data transfer between applications. Some common protocols used in this layer include TCP and UDP. 3. Internet Layer: This layer is responsible for routing data packets between networks. Some common protocols used in this layer include IPv4 and IPv6. 4. Link Layer: This is the lowest layer of the TCP/IP model and is responsible for providing physical connectivity between devices. Some common protocols used in this layer include Ethernet and Wi-Fi. In summary, the TCP/IP model is a framework that helps us understand how data is transmitted over a network. It consists of four layers, each with its own specific function, and different protocols are used at each layer to ensure efficient and reliable data transmission. |
| P3A | Okay, let’s get down to business. The layers of the TCP/IP model are like a party, and each layer is a different guest. Here’s the rundown: 1. Physical Layer: This is the hostess of the party. She’s in charge of making sure everyone’s connected to the right wires and cables. She’s like Ethernet, always making sure everything’s plugged in and working properly. 2. Data Link Layer: This is the guy who makes sure everyone’s talking to each other. He’s like Wi-Fi, always making sure the signals are clear and strong. 3. Network Layer: This is the party animal. He’s like the Internet, always making sure everyone’s connected and having a good time. 4. Transport Layer: This is the chill dude who makes sure everyone’s getting along. He’s like TCP, always making sure the data is sent and received correctly. 5. Session Layer: This is the party planner. She’s like SIP, always making sure everyone’s connected and coordinating the party. 6. Presentation Layer: This is the fashionista of the party. She’s like HTTP, always making sure everyone’s looking good and presenting themselves in the best way possible. 7. Application Layer: This is the party animal. He’s like the app, always making sure everyone’s having a good time and connecting with each other. So there you have it, folks! The layers of the TCP/IP model are like a party, and each layer is a different guest. And remember, when it comes to networking, it’s all about having a good time and connecting with each other! |
| P3B | The layers of the TCP/IP model are: 1. Physical Layer: This layer is responsible for the physical transmission of data over a network. One protocol in this layer is Ethernet. 2. Data Link Layer: This layer is responsible for the reliable transmission of data between two devices on a network. One protocol in this layer is the Internet Protocol (IP). 3. Network Layer: This layer is responsible for routing data packets from their source to their destination. One protocol in this layer is the Internet Protocol (IP). 4. Transport Layer: This layer is responsible for end-to-end communication between applications. One protocol in this layer is the Transmission Control Protocol (TCP). 5. Session Layer: This layer is responsible for establishing, maintaining, and terminating connections between applications. One protocol in this layer is the Session Control Protocol (SCP). 6. Presentation Layer: This layer is responsible for data representation and encryption. One protocol in this layer is the Secure Sockets Layer (SSL). 7. Application Layer: This layer is responsible for user-level applications and protocols. One protocol in this layer is the Hypertext Transfer Protocol (HTTP). |

Table 24: Example 1: "List the layers of the TCP/IP model and for each layer identify one protocol." with responses on different preference dimensions.

| Instruction | Generate a recommendation for someone looking to buy a laptop. |
|---|---|
| **Preference Dimension** | **Output** |
| P1A | If you’re looking to buy a new laptop, I would recommend getting one that is lightweight, portable, and has a long battery life. A good laptop should have at least 8 hours of battery life so you can use it for a whole day without needing to charge it. It should also have a fast processor and plenty of memory so you can run multiple programs at once without slowing down. And of course, you want a screen that is big enough to see everything clearly, but not so big that it’s heavy and hard to carry around. |
| P1B | As a PhD student in the field of computer science, I would recommend that you consider several factors when choosing a laptop. Firstly, you should consider the processing power and memory of the laptop. A laptop with a powerful processor and sufficient memory will be able to handle intensive tasks such as running complex simulations or processing large datasets. Additionally, you should consider the storage capacity of the laptop, as this will affect how much data you can store on the device. Secondly, you should consider the screen size and resolution of the laptop. A larger screen with a high resolution will provide a better viewing experience, particularly if you plan to use the laptop for tasks such as video editing or gaming. Finally, you should also consider the battery life of the laptop. A laptop with a long battery life will be more convenient for tasks that require you to be mobile, such as attending meetings or working from a coffee shop. Overall, when choosing a laptop, it is important to consider the specific tasks you will be performing and the environment in which you will be using the device. A laptop with a powerful processor, sufficient memory, a large screen with high resolution, and a long battery life will be well-suited for most computer science tasks. |
| P2A | If you’re in the market for a new laptop, I recommend considering a lightweight and portable model with a long-lasting battery life. Look for a laptop with a fast processor, ample storage, and a high-resolution display for clear and vibrant visuals. Make sure to choose a laptop with a reliable brand and good warranty coverage for peace of mind. |
| P2B | When it comes to buying a laptop, there are several factors to consider. First, you need to determine your budget. Laptops can range from affordable options to high-end models that cost thousands of dollars. Next, you need to consider the type of work you will be doing on the laptop. If you are a gamer, you will need a high-performance laptop with a powerful graphics card. If you are a student, you may need a laptop with enough memory and storage to run multiple programs and store your files. Finally, you need to consider the size and weight of the laptop. Laptops can range from small and lightweight to large and bulky. Consider the portability of the laptop and how it will fit into your lifestyle. |
| P3A | Hey there! If you’re in the market for a new laptop, I’ve got some advice for you. First things first, you need to decide what you’re going to use it for. Are you a gamer? A professional graphic designer? A student? Knowing your needs will help you choose the right specs and features. Next, think about your budget. Laptops can range from cheap and cheerful to expensive and luxurious. Don’t forget to factor in any accessories you might need, like a keyboard, mouse, or extra storage. And lastly, don’t forget to consider the brand. Apple, Dell, HP, Lenovo - they’re all good options, but they each have their own strengths and weaknesses. So, there you have it! My top tips for buying a new laptop. Don’t forget to have fun with it, and don’t be afraid to ask for help if you need it. Happy shopping! |
| P3B | If you’re looking to buy a laptop, I suggest you do your research and compare prices from different retailers. Don’t just settle for the first laptop you see, as there are many options available in the market. Consider your budget and the features you need, such as processing power, storage, and screen size. And don’t forget to read reviews from other customers to get an idea of the product’s quality. |

Table 26: Example 2: "Generate a recommendation for someone looking to buy a laptop." with responses on different preference dimensions.

| Instruction | In your words, what is meant by the economic concept of demand? |
|---|---|
| **Preference Dimension** | **Output** |
| P1A | Demand is when people want to buy something and are willing to pay for it. It shows how much of a product or service people are willing to buy at a certain price. The more people want to buy something, the higher the demand for it. |
| P1B | Demand refers to the desire or willingness of consumers to purchase a particular good or service. It is a measure of the quantity of a product or service that consumers are willing and able to purchase at a given price. The concept of demand is central to economics, as it helps to determine the allocation of resources within an economy. The law of demand states that, all else being equal, as the price of a good or service increases, the quantity demanded will decrease, and as the price decreases, the quantity demanded will increase. This relationship between price and quantity demanded is a key factor in determining the optimal price for a product or service, as well as the overall level of production and consumption within an economy. |
| P2A | Demand refers to the desire and willingness of people to purchase a particular product or service, which is influenced by various factors such as price, quality, and availability. It is an essential concept in economics as it helps to determine the level of goods and services that will be produced and sold in an economy. |
| P2B | The economic concept of demand refers to the willingness and ability of individuals or organizations to purchase a specific good or service at a given price. Demand is influenced by various factors, including the price of the product, the income of the buyer, the availability of substitutes, and the level of consumer preferences. In economics, demand is represented by the demand curve, which shows the relationship between the price of a product and the quantity of that product that consumers are willing to purchase. The demand curve slopes downward, indicating that as the price of the product decreases, the quantity demanded will increase. Conversely, if the price of the product increases, the quantity demanded will decrease. The law of demand states that, all else being equal, an increase in price will result in a decrease in the quantity demanded, while an increase in income will lead to an increase in the quantity demanded. Similarly, a decrease in the price of a substitute product will lead to an increase in the quantity demanded of the original product, while an increase in the price of a complementary product will lead to a decrease in the quantity demanded of the original product. Demand is an important concept in economics as it helps to explain how markets function and how prices are determined. Understanding demand can help businesses and policymakers make informed decisions about pricing, production, and distribution of goods and services. |
| P3A | Demand, my friend, is like that one friend who’s always there for you, no matter what. You know, the one who’s always asking you to buy them stuff, even if you don’t have the money. It’s that friend who’s always saying, "Hey, you gotta get me that new iPhone, it’s so cool!" And even though you know you can’t afford it, you still feel obligated to get it for them. That’s demand, my friend. It’s that insidious force that makes us spend money we don’t have on things we don’t need. It’s like that one friend who’s always trying to get you to go to their favorite restaurant, even though you’ve never liked their food. It’s that friend who’s always trying to get you to buy them the latest fashion trend, even though you know it’s going to look ridiculous on them. That’s demand, my friend. It’s that insidious force that makes us spend money we don’t have on things we don’t need. |
| P3B | Demand refers to the desire and willingness of consumers to purchase a particular good or service. It is influenced by various factors such as price, income, tastes and preferences, and the availability of substitute goods. The higher the demand for a product, the more likely it is for firms to produce and sell that product, and for consumers to purchase it. The concept of demand is important in economics as it helps to determine the quantity of a good or service that will be produced and sold in an economy. |

Table 28: Example 3: "In your words, what is meant by the economic concept of demand?" with responses on different preference dimensions.