Right-Sizing Communication and Recommendation Set Size in AI-Assisted Search
Summary
This paper models the interaction between a user and an AI-driven recommendation system, analyzing optimal communication and recommendation set sizes under different sampling schemes to maximize expected utility.
# Right-Sizing Communication and Recommendation Set Size in AI-Assisted Search
Source: [https://arxiv.org/html/2605.23944](https://arxiv.org/html/2605.23944)
Prakirt Raj Jhunjhunwala (Amazon.com Inc., prakirt2203@gmail.com), Yash Kanoria (Columbia Business School, Columbia University, ykanoria@gmail.com)
###### Abstract
We model the interaction between a user and an AI-driven recommendation system. The user initiates the process by conveying preference information through a costly and noisy message. The AI assistant, acting as a Bayesian agent, interprets the user’s message to form a posterior belief about their true preferences and makes product recommendations. In particular, it determines how many recommendations to present so as to maximize the user’s expected utility from their final choice, while accounting for the search cost induced by the size of the recommendation set. We use mutual-information-based cost functions to model the two distinct costs incurred by the user during the interaction: (i) a *communication cost*, which increases with the precision of their preference message, and (ii) a *search cost*, which increases with the size of the recommendation set provided by the AI assistant.
We study products and preferences that live in $d$-dimensional space, and ask how the user’s expected payoff can be maximized. For large $d$, we characterize how the optimal message precision and recommendation set size depend on the cost parameters, under two distinct distributions from which recommendations can be sampled from the product universe: (i) the *Bayes posterior belief*, and (ii) an optimized *tilted distribution*. Under the posterior sampling scheme (i), we identify a *hybrid regime*, in which an efficient interaction policy requires jointly optimizing the amount of information (in bits) conveyed by the user and the number of recommendations provided by the AI assistant. Under the tilted sampling scheme (ii), our results show that the optimal interaction policy uses only one of communication and search, favoring whichever of them is less costly.
Keywords: product recommendations, communication cost, search cost, sampling, choice overload
## 1 Introduction
AI-powered recommendation systems are increasingly embedded in customer-facing platforms across domains such as e-commerce (Bansal et al. [2025](https://arxiv.org/html/2605.23944#bib.bib10), Cui et al. [2017](https://arxiv.org/html/2605.23944#bib.bib9)), healthcare (Moor et al. [2023](https://arxiv.org/html/2605.23944#bib.bib13), Jiang et al. [2017](https://arxiv.org/html/2605.23944#bib.bib16), Yu et al. [2018](https://arxiv.org/html/2605.23944#bib.bib15)), and education (Kasneci et al. [2023](https://arxiv.org/html/2605.23944#bib.bib14), Atchley et al. [2024](https://arxiv.org/html/2605.23944#bib.bib12)). As these systems proliferate, the nature of human decision-making is shifting from direct search to interactive delegation. This shift is particularly salient in e-commerce environments, where customers routinely face thousands of options for a single product category; for example, a search for headphones on Amazon can return an overwhelming number of results. Carefully reading product specifications and reviews to identify the option that best matches one’s preferences can therefore be extremely time-consuming and cognitively demanding (Schwartz [2015](https://arxiv.org/html/2605.23944#bib.bib8), Scheibehenne et al. [2010](https://arxiv.org/html/2605.23944#bib.bib7)). AI shopping assistants can alleviate this burden by aggregating product information and generating recommendations based on a posterior belief over the customer’s latent preferences. In this work, we use the term *agent* to refer to an AI shopping assistant whose objective is to optimally assist the customer. (In particular, our shopping “agents” will neither be strategic nor enjoy much autonomy.) We also refer to customers as *users*.
In modern recommendation settings, agents excel at processing large, complex product spaces and narrowing down feasible options, but they typically lack direct access to users’ true preferences. Eliciting this information requires communication from the user, which is inherently costly for the user because of the cognitive effort and time involved. As a result, agents rely on limited user interactions to infer preferences and tailor recommendations. Our work studies how to optimally structure this user–agent interaction in a product recommendation setting. Here, *interaction* refers specifically to the exchange of information between the user and the agent, namely the user’s communication of preference information and the agent’s provision of product recommendations. The key mechanism governing this interaction is that user preference communication improves the relevance of the recommendation set, while the agent’s provision of multiple recommendations addresses residual uncertainty in those preferences.
Research Question. *How should one jointly choose the amount of preference information communicated by the user and the number of recommendations provided to the user, to balance communication costs, search costs, and product utility?*
This paper introduces a theoretical framework to study AI-assisted decision-making, where the user obtains utility from the selected item and incurs a *communication cost* for specifying preferences and a *search cost* for evaluating the recommendation set. Upon receiving a user message, the agent constructs a sampling distribution and generates a recommendation menu consisting of independent samples from this distribution. The menu size is optimized to balance two competing objectives: maximizing the expected utility of the best option in the set (i.e., ensuring the menu contains a highly preferred item), while keeping the set small enough to limit the user’s search costs. Anticipating the agent’s response, the user can carefully choose the precision of their message, trading off the benefits of a more targeted recommendation set against the cost of communication. Our analysis uses a high-dimensional approximation in terms of the number of features and reveals a sharp distinction between two practically motivated recommendation sampling schemes. The default approach (for a Gen AI-based agent) is posterior sampling, whereby each recommendation is an independent and identically distributed (i.i.d.) draw from the posterior distribution over the user’s preferred product. Under posterior sampling, the best performance generally requires a blend of user preference communication and multiple product recommendations. In contrast, under optimally designed importance sampling of product recommendations (which we call sampling from a “tilted” distribution), the optimal policy turns out to be “pure”, leveraging either only communication or only search, depending on their relative costs.
### 1.1 Main Contributions
This paper advances the theoretical understanding of AI\-powered recommendation systems driven by user–agent interaction\. We make three primary contributions\. First, we develop a novel model of user–agent interaction that accounts for the communication cost of preference elicitation and the search cost of evaluating recommendations, with costs measured information\-theoretically \(via KL divergence and set size entropy\)\.
Second, we derive a high-dimensional approximation of the problem under an asymptotic regime where the number of features ($d$) becomes large, i.e., $d \rightarrow \infty$. By establishing a Large Deviation Principle (LDP) for the maximum utility induced as a function of recommendation set size, we reduce the complex stochastic optimization problem to a deterministic optimization problem, yielding an explicit characterization of the optimal system design. We note that the large-$d$ regime is reminiscent of the architecture of modern recommendation systems, where user preferences and product features are represented as high-dimensional embeddings.
Third, using the analytic solution for the asymptotic limit, we characterize the system’s performance under the optimal interaction policy and identify distinct operational regimes that arise from the relative magnitudes of the communication and search costs\. Next, we provide a brief overview of our model, followed by the key insights derived from our work\.
#### 1.1.1 Model:
We introduce a stylized model that formalizes the user–agent interaction in a $d$-dimensional spherical feature space, where both user preferences and products are represented as unit vectors. The utility derived from a product is its alignment with the user’s true preference, measured by the dot product. User preferences are uniformly distributed, and every point on the surface of the $d$-dimensional sphere is a product. The interaction unfolds in three stages that model the costly exchange of information between the user and the agent, with information-theory-inspired quantification of costs. First, the user transmits a noisy message with precision $\kappa$, incurring a communication cost equal to the KL-divergence between the agent’s prior and posterior beliefs about the user’s preference vector, multiplied by a communication cost parameter $\lambda_c$. Second, the agent generates a recommendation menu of size $n$, modeled as independent draws from a sampling distribution. Finally, the user identifies the utility-maximizing item from the menu, incurring a search cost equal to the logarithm of the menu size multiplied by a search cost parameter $\lambda_s$. We frame the interaction as a collaborative process where the shared goal is to maximize the user’s expected payoff (utility from the chosen item minus the search and communication costs) by jointly optimizing the precision $\kappa$ of the user’s message and the size $n$ of the recommendation set provided by the agent. We consider two recommendation set sampling strategies: (i) posterior sampling, where the agent samples items from the posterior over the user’s preference vector, and (ii) optimized “tilted” sampling, where recommendations are drawn from a modified distribution that places additional, optimally chosen weight (a “tilt”) on the user’s communicated preference.
#### 1.1.2 Results and Insights:
In this section, we describe our analytic results characterizing the optimal message precision ($\kappa$) and recommendation set size ($n$). More extensive discussion is provided in Section [3.2.1](https://arxiv.org/html/2605.23944#S3.SS2.SSS1) and Section [3.3.1](https://arxiv.org/html/2605.23944#S3.SS3.SSS1). We first discuss the setting where recommendations are sampled from the posterior distribution over user preferences, mirroring what one may expect a Gen AI model to do by default, in the absence of any intentional tuning.
##### Joint optimization and hybrid communication\-search regime:
Figure 1: Qualitative illustration of the optimal interaction regimes under posterior sampling.
A central insight from our analysis is that optimal performance requires balancing message precision and recommendation set size according to the relative magnitudes of communication and search costs. We identify a single switching curve in the cost-parameter space.
On one side of this curve, the optimal policy jointly optimizes communication and search; on the other side, it relies solely on search. In particular, when the search and communication costs scale as $\lambda_s \sim 1/d$ and $\lambda_c \sim 1/d$, there exists a nontrivial region, bounded by this switching curve, in which both costs are economically significant (see the Hybrid region in Fig. [1](https://arxiv.org/html/2605.23944#S1.F1)). Within this hybrid regime, optimal performance requires *joint optimization*: nontrivial communication (positive message precision) together with non-degenerate recommendation sets (more than a single item).
Intuition behind the scaling laws under the hybrid regime: The high-dimensional approximation also allows us to characterize the scaling laws for message precision $\kappa$ and recommendation set size $n$ with respect to the feature dimension $d$.
- Scaling of message precision: Our model assumes that the user preferences are uniformly distributed on a $d$-dimensional unit hypersphere. In this setting, the total uncertainty (entropy) about a user’s preference scales linearly with the dimension $d$. For communication to be meaningful, the information a user provides must overcome an $\Omega(1)$ fraction of this uncertainty. Our results confirm that in the hybrid regime, the user communicates a fraction of their total preference information, i.e., provides an amount of information that scales linearly with $d$. Intuitively, this is similar to the user perfectly specifying a subset of their preference “features” while leaving the rest unspecified.
- Scaling of recommendation set size: When a user provides partial information (or no information), a sizeable product subspace remains where their preferred item might reside. To account for these unspecified features, the agent must provide multiple recommendations to ensure adequate coverage. Intuitively, if each unspecified attribute can vary over a few qualitatively distinct levels (e.g., low/medium/high price, sporty/casual/professional style), then the agent would need to provide a recommended product for each possible combination of these levels for unspecified features, leading to an optimal recommendation set size that grows exponentially in $d$. We note that the optimal recommendation set size scales as $n = \exp(\alpha d)$, typically with a small exponent $\alpha$. As a result, the number of recommended items is much smaller than the size of the product space. As a concrete example, for $d = 15$, we find that $\alpha = 0.15$ for plausible communication and search costs, yielding an optimal recommendation set size of $n = 15$, which is realistic. Note that the dimension $d$ in our model should be interpreted as capturing the effective preference complexity relevant for the interaction, so $d = 15$ may be reasonable for many product categories. We further study a model extension that allows different features to carry different weights (see Appendix [C](https://arxiv.org/html/2605.23944#A3)).
##### Recommendations from tilted distribution:
Next, we consider a more sophisticated recommendation sampling policy: importance sampling of product recommendations, which we call sampling from a “tilted” distribution, loosely inspired by the possibility of creating intentionally tuned Gen AI tools such as ChatGPT Shopping Research mode. The core insight is that the agent can optimally manage (in a high-dimensional setting) the trade-off between exploiting the user’s message and diversifying across uncommunicated features by adjusting a single, deterministic “tilt” parameter; as a result, we call this approach *tilting*. Here, the tilt parameter acts as a design lever that the agent must calibrate based on both the message precision and the recommendation set size.
Figure 2: Qualitative illustration of the optimal interaction regimes under tilted sampling.
Our analysis demonstrates that optimal tilting outperforms direct posterior sampling, and increases the return to communication by placing greater weight on the user’s message. Furthermore, when all control parameters (message precision, recommendation set size, and the tilt parameter) are jointly optimized, perhaps surprisingly, a sharp phase transition emerges for large $d$. The optimal policy is pure, relying entirely on either communication or search, whichever is more cost-effective, with no hybrid regime arising. This behavior is qualitatively illustrated by the diagonal switching curve in Fig. [2](https://arxiv.org/html/2605.23944#S1.F2): on one side of a diagonal boundary in the cost-parameter space the policy relies entirely on communication, while on the other it relies entirely on search.
Two natural questions emerge. First, why does optimal tilting place greater weight on the user’s message than posterior sampling? At one extreme, when there is only a single recommendation ($n = 1$), the optimal action is to recommend the item corresponding exactly to the user’s message (the maximum possible tilting); there is no room to compensate for the uncertainty in preferences. At the other extreme, when the number of recommendations is very large ($n \to \infty$), sampling from the posterior distribution (no tilting) works well, since with high probability at least one sampled item lies very close to the user’s true type, and additional tilting would only distort this distribution and reduce expected utility. For finite $n > 1$, we are in an intermediate situation, and appropriate tilting towards the user’s message helps by acting as a form of regularization (avoiding overfitting to the posterior distribution).
Second, why does the optimal policy under large $d$ effectively use only one of the two modes of interaction when tilting is allowed? The key intuition is that tilting provides a lever for controlling how the user’s message is translated into recommendations. By tuning this lever, in our model for large $d$, the system is able to fully exploit the user’s communication, and also make the best use of product recommendations. As a result, the optimal policy favors a single mode, specifically the mode that costs less per unit (recall that our model quantifies both costs along information-theoretic lines).
We note that posterior sampling and tilting represent fundamentally different agent capabilities, not simply alternative methods with different performance. Posterior sampling captures a *constrained* agent architecture, typical of modular or black-box generative systems, where the agent cannot modify its underlying posterior distribution. In this setting, balancing communication and search is essential to improve performance. In contrast, the tilted distribution represents an *optimized* agent with the architectural flexibility to modify its internal sampling distribution to directly target the system objective. This additional design freedom enables both higher performance and qualitatively different interaction regimes. At the same time, realizing these gains requires a more sophisticated training and tuning pipeline, although such complexity may be justified in high-stakes applications such as e-commerce. Importantly, only a single-dimensional tilting parameter is needed, making the implementation analogous to choosing a context-dependent temperature parameter in modern generative systems.
### 1.2 Related Work
##### AI\-Assisted Decision Making:
With the advent of chatbots and AI agents, there is considerable interest in work examining how human decision-making interacts with outputs from algorithmic agents. Kleinberg et al. ([2018](https://arxiv.org/html/2605.23944#bib.bib46)) show that machine learning predictions can augment human judgment in socially sensitive contexts, while Angelova et al. ([2023](https://arxiv.org/html/2605.23944#bib.bib43)) study settings where individuals retain discretion over whether to accept or reject algorithmic recommendations. In the medical domain, Agarwal et al. ([2023](https://arxiv.org/html/2605.23944#bib.bib40)) provide experimental evidence that combining human expertise with AI predictions improves diagnostic accuracy, though the extent of improvement depends critically on how information is communicated and the cognitive effort required from experts.
Another strand of research investigates the cognitive underpinnings of human–AI interaction. Vasconcelos et al. ([2023](https://arxiv.org/html/2605.23944#bib.bib17)) show that overreliance on AI persists even when predictions are accompanied by explanations, and argue that this is not simply a cognitive bias but rather a strategic choice rooted in cost–benefit reasoning: decision-makers implicitly weigh the cognitive effort of verification against the ease of acceptance. In a similar spirit, Boyacı et al. ([2024](https://arxiv.org/html/2605.23944#bib.bib44)) use a rational inattention framework (similar to our work) to study how machine predictions affect human decision-makers with limited cognitive capacity. They find that while AI assistance improves accuracy, it can also increase error rates and induce users to exert more cognitive effort.
Closest to our work is Castro et al. ([2024](https://arxiv.org/html/2605.23944#bib.bib19)), who study the trade-off between output fidelity and communication cost under single-output recommendations. They show that preference heterogeneity can introduce substantial bias when only one recommendation is presented. In contrast, we argue that offering multiple recommendations can reduce this bias, as it allows users to choose from a broader set and better align outcomes with their preferences. Similarly, Liang ([2025](https://arxiv.org/html/2605.23944#bib.bib45)) analyzes delegation to AI “clones” and finds that in high-dimensional settings, noise degrades performance and humans may outperform. Our framework shows that when multiple outputs are available, users can mitigate this risk by selecting the best fit, and further, that in high dimensions it is optimal for users to communicate richer information to ensure credible recommendations.
##### Costly Information Acquisition:
From an economic and information-theoretic perspective, our model builds on the rational inattention (RI) framework (Sims [2003](https://arxiv.org/html/2605.23944#bib.bib53), [2005](https://arxiv.org/html/2605.23944#bib.bib52)), which has become a foundational approach to modeling bounded rationality under costly information processing. The central insight of RI is that decision-makers optimally allocate limited attention across signals, trading off informativeness against the cost of acquiring information. This framework has been applied extensively in macroeconomics and finance, including monetary policy design (Sims [2006](https://arxiv.org/html/2605.23944#bib.bib49), Maćkowiak and Wiederholt [2009](https://arxiv.org/html/2605.23944#bib.bib31)), portfolio choice (Zhong [2022](https://arxiv.org/html/2605.23944#bib.bib47)), and multivariate information acquisition problems (Miao et al. [2022](https://arxiv.org/html/2605.23944#bib.bib48)). Closely related are approaches in information theory, such as rate–distortion methods (Tishby and Polani [2010](https://arxiv.org/html/2605.23944#bib.bib20)), which also study trade-offs between informativeness and cost. We also refer the reader to the review (Maćkowiak et al. [2023](https://arxiv.org/html/2605.23944#bib.bib51)) for more details. While most prior work applies RI to single-agent decision problems, only a few works (Castro et al. [2024](https://arxiv.org/html/2605.23944#bib.bib19), Boyacı et al. [2024](https://arxiv.org/html/2605.23944#bib.bib44)), alongside our own, have used the RI framework to study the cognitive costs and trade-offs in human–AI interaction.
##### Product Recommendation:
Research on product recommendation has traditionally centered on collaborative filtering (Koren et al. [2021](https://arxiv.org/html/2605.23944#bib.bib38), Su and Khoshgoftaar [2009](https://arxiv.org/html/2605.23944#bib.bib37)) and content-based methods (Thorat et al. [2015](https://arxiv.org/html/2605.23944#bib.bib39)), with hybrid approaches combining user-item interaction histories, product features, and contextual signals (Sarwar et al. [2001](https://arxiv.org/html/2605.23944#bib.bib36), Linden et al. [2003](https://arxiv.org/html/2605.23944#bib.bib35)). These foundational methods, widely deployed in e-commerce platforms like Amazon and Netflix, rely on structured data to exploit similarities across users or products. A parallel stream of research in human-computer interaction and information retrieval has focused on the concept of the *selection set*, the curated subset of options from which a user makes their final choice. This literature recognizes that the value of a recommendation system lies not only in the relevance of individual items but also in the composition and size of the presented set. In particular, diversity can improve coverage under preference uncertainty (Ziegler et al. [2005](https://arxiv.org/html/2605.23944#bib.bib2)), while overly large sets risk cognitive overload and degrade decision quality (Iyengar and Lepper [2000](https://arxiv.org/html/2605.23944#bib.bib5)). This perspective, that the recommendation task fundamentally involves managing a user’s attention budget, directly motivates our theoretical treatment of communication and search costs.
The recent advent of Large Language Models (LLMs) has marked a new phase in recommendation research, introducing powerful new capabilities. LLMs can act as natural-language interfaces, allowing users to express complex preferences conversationally rather than through simple clicks or ratings (Zhang et al. [2023](https://arxiv.org/html/2605.23944#bib.bib32)). Furthermore, pre-trained on vast corpora, they serve as knowledge-rich priors that encode deep semantic relationships, enabling effective zero-shot and few-shot recommendation (Bao et al. [2024](https://arxiv.org/html/2605.23944#bib.bib22), Wu et al. [2024](https://arxiv.org/html/2605.23944#bib.bib21)). Architectures such as retrieval-augmented generation (RAG) (Lewis et al. [2020](https://arxiv.org/html/2605.23944#bib.bib6)) now explicitly separate the retrieval of a candidate set from the generation of the final recommendation, mirroring the two-stage structure of our model. Our contribution complements these developments by offering a theoretical framework that characterizes how users optimally trade off communication effort with reliance on the AI-generated recommendation set, an aspect largely absent from algorithmic studies of LLM-based recommenders.
## 2 Model Description
We consider a market consisting of a continuous space of products, each represented by a feature vector $\bm{\theta} \in \mathcal{S}^{d-1}$, where $\mathcal{S}^{d-1}$ is the surface of the unit sphere in $\mathbb{R}^{d}$, that is, $\mathcal{S}^{d-1} = \{\bm{\theta} \in \mathbb{R}^{d} : \|\bm{\theta}\|_{2} = 1\}$. Users are similarly characterized by a preference vector $\mathbf{h} \in \mathcal{S}^{d-1}$, which encodes their most preferred product profile, or *true preferences*. We assume user preferences follow a uniform distribution over $\mathcal{S}^{d-1}$, giving us $\mathbf{h} \sim p(\mathbf{h}) = C_{d}(0)$, where $1/C_{d}(0)$ is the surface area of $\mathcal{S}^{d-1}$.
The utility that a user derives from selecting a product with feature vector $\bm{\theta}$ depends on how well the product matches their preference vector $\mathbf{h}$, and is given by the dot product between the two vectors, i.e., $u(\bm{\theta}, \mathbf{h}) = \langle \mathbf{h}, \bm{\theta} \rangle$. Note that the utility of a product with feature $\bm{\theta}$ can also be expressed as a decreasing affine function of the squared $\ell_{2}$-distance from the user’s preferences $\mathbf{h}$, i.e., $u(\bm{\theta}, \mathbf{h}) = 1 - \frac{1}{2}\|\bm{\theta} - \mathbf{h}\|_{2}^{2}$.
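The equivalence of the two utility expressions is a one-line consequence of both vectors having unit norm. As a brief worked step (using only the definitions above):

$$
\|\bm{\theta} - \mathbf{h}\|_{2}^{2}
= \|\bm{\theta}\|_{2}^{2} + \|\mathbf{h}\|_{2}^{2} - 2\langle \mathbf{h}, \bm{\theta} \rangle
= 2 - 2\langle \mathbf{h}, \bm{\theta} \rangle
\quad\Longrightarrow\quad
\langle \mathbf{h}, \bm{\theta} \rangle = 1 - \tfrac{1}{2}\|\bm{\theta} - \mathbf{h}\|_{2}^{2}.
$$

In particular, the utility lies in $[-1, 1]$ and attains its maximum value of $1$ exactly when $\bm{\theta} = \mathbf{h}$.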
Motivation behind the feature space model: In our model, the space of products is represented through a set of underlying features (such as color or size) that together determine user preferences. We assume that this feature space is characterized by a Pareto frontier, where each point corresponds to a product that is most preferred by some subset of users. Intuitively, any product that is strictly dominated, i.e., less preferred by all users compared to another, is redundant and can be excluded from the space. For a tractable stylized representation, we model this Pareto frontier as a high-dimensional isotropic space $\mathcal{S}^{d-1}$.
A complementary motivation for modeling the feature space as $\mathcal{S}^{d-1}$ arises from a tokenized representation of products and preferences, commonly employed in modern recommendation and language models. In such systems, product descriptions and, similarly, users’ preferences, comprising textual, visual, and numerical attributes, are embedded as high-dimensional vectors in a shared semantic space. The relevance of a product to a user’s preferences is then primarily determined by the alignment between their embeddings, typically measured by cosine similarity. Such embedding-based representations form the foundation of large-scale recommendation systems and language models (see, e.g., Barkan and Koenigstein [2016](https://arxiv.org/html/2605.23944#bib.bib23)).
User–Agent interaction: We model the user–agent interaction as a three-stage process.
User \(𝐡\\mathbf\{h\}\)AgentUser𝜽∗\\bm\{\\theta\}^\{\*\}𝐦∼pκ\(𝐦\|𝐡\)\\mathbf\{m\}\\sim p\_\{\\kappa\}\(\\mathbf\{m\}\|\\mathbf\{h\}\)Message𝜽1,…,𝜽n∼qκ\(𝐡\|𝐦\),i\.i\.d\.\\bm\{\\theta\}\_\{1\},\\dots,\\bm\{\\theta\}\_\{n\}\\sim q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\),\\text\{ i\.i\.d\.\}RecommendationsSearch
- (i) Communication: The interaction starts with the user transmitting a message (or context) $\mathbf{m}$ to the agent, where the message is generated stochastically given their feature $\mathbf{h}$ via a conditional distribution $p_{\kappa}(\mathbf{m} \mid \mathbf{h})$. This formulation captures the practical reality that users often find it cumbersome to articulate their complete preferences; as such, the message $\mathbf{m}$ may be incomplete, noisy, or only partially informative about the underlying preference vector $\mathbf{h}$. The stochasticity in $p_{\kappa}(\cdot \mid \mathbf{h})$ thus reflects both communication noise and inherent ambiguity in self-reported preferences. In this work, we assume that the message provided by the user follows a von Mises-Fisher (vMF) distribution (Mardia and Jupp [2009](https://arxiv.org/html/2605.23944#bib.bib18)), given by

  $$p_{\kappa}(\mathbf{m} \mid \mathbf{h}) = C_{d}(\kappa) \exp\big(\kappa \langle \mathbf{h}, \mathbf{m} \rangle\big), \tag{1}$$

  where $C_{d}(\kappa)$ is the normalization constant, with a known closed-form expression (Mardia and Jupp [2009](https://arxiv.org/html/2605.23944#bib.bib18), Chapter 9). The parameter $\kappa$ is typically known as the *concentration parameter*, as it governs how closely the user’s message aligns with the true preference vector $\mathbf{h}$: a larger $\kappa$ implies that $\mathbf{m}$ is concentrated closer to $\mathbf{h}$. As such, we refer to $\kappa$ as the *message precision*. When $\kappa = 0$, the distribution $p_{0}(\cdot \mid \mathbf{h})$ matches the uniform distribution $p(\cdot)$.
- (ii) Recommendation: The agent interprets the message provided by the user, and uses Bayes’ rule to obtain a posterior $q_{\kappa}(\mathbf{h} \mid \mathbf{m})$ using $p(\mathbf{h})$ as the prior, which results in $q_{\kappa}(\mathbf{h} \mid \mathbf{m}) = C_{d}(\kappa) \exp\big(\kappa \langle \mathbf{h}, \mathbf{m} \rangle\big)$. Using the inferred posterior $q_{\kappa}(\mathbf{h} \mid \mathbf{m})$, the agent generates a menu of $n$ product recommendations $\{\bm{\theta}_{1}, \dots, \bm{\theta}_{n}\}$. In Section [3](https://arxiv.org/html/2605.23944#S3), we study the setting where $\bm{\theta}_{i} \sim q_{\kappa}(\cdot \mid \mathbf{m})$ for all $i \in [n]$, i.i.d. This is a stylized but natural modeling assumption motivated by modern Gen AI–based recommender systems: the agent produces recommendations by sampling from its posterior belief over the user’s preferences. Conceptually, this reflects the fact that autoregressively trained models act as implicit simulators of complex conditional distributions: having internalized rich latent structure during training, they can generate samples from the conditional distribution of “what the user might like” given the prompt, without explicitly constructing or optimizing a posterior.

  Recommendations from tilted distribution: In a later part (Section [4](https://arxiv.org/html/2605.23944#S4)), we consider a more sophisticated agent that optimally tilts (modifies) the posterior distribution to maximize the expected utility of the best-performing item in the recommendation menu. This approach explicitly accounts for the fact that the user’s utility is governed by the maximum of the $n < \infty$ recommended items. Consequently, the tilted distribution and the recommendation set size $n$ are jointly optimized to maximize the user’s expected utility.
- (iii) Search: After receiving the recommendations $\{\bm{\theta}_{1}, \dots, \bm{\theta}_{n}\}$, the user evaluates each product to identify the best match based on their true preferences $\mathbf{h}$, i.e., the user chooses the item

  $$\bm{\theta}^{*} = \arg\max_{\bm{\theta}_{i}} \ \langle \mathbf{h}, \bm{\theta}_{i} \rangle.$$

  We assume that the user perfectly identifies the item that maximizes this alignment among the $n$ recommendations.
Objective:We now formalize the utility the user derives from a chosen product and the costs they incur during the interaction process\. Throughout the remainder of the paper, we refer to the pair\(κ,n\)\(\\kappa,n\)as theinteraction policy\.
- \(i\)Utility from chosen recommendation:As mentioned before, given the set of recommendations\{𝜽1,…,𝜽n\}\\\{\\bm\{\\theta\}\_\{1\},\\dots,\\bm\{\\theta\}\_\{n\}\\\}, the user chooses the product that best matches their preferences\. As such, the product utility received by the user is given bymaxi⟨𝐡,𝜽i⟩\\max\_\{i\}\\langle\\mathbf\{h\},\\bm\{\\theta\}\_\{i\}\\rangle\.
- \(ii\) Communication cost: We assume that a communication cost accrues to the user when the agent acquires information (the message $\mathbf{m}$) about the user's features $\mathbf{h}$ and updates its initial belief (the prior $p(\mathbf{h})$) to a more informed one (the posterior $q_{\kappa}(\mathbf{h}|\mathbf{m})$). We take this cost to be linearly proportional to the information gain, quantified by the KL-divergence of the posterior from the prior. As such, the communication cost is $\lambda_{c}D_{\mathrm{KL}}\big(q_{\kappa}(\mathbf{h}|\mathbf{m})\,\|\,p(\mathbf{h})\big)$, where $\lambda_{c}$ is the communication cost parameter.
- \(iii\)Search cost:The search cost represents the effort a user expends to find the best product among a set ofnnrecommendations\. Similar to the communication cost, we assume the cost is proportional to the information gain\. In particular, we assume that at the start of the search process, a user views allnnproduct recommendations as equally likely to be the best, i\.e\., the user’s prior on the index of the best item follows a uniform distribution over the set of indices\[n\]\[n\]\. Once the user determines the best product, their belief becomes a certainty \(𝜽i\\bm\{\\theta\}\_\{i\}is the best with probability either0or11\), and the total information gain during the search process equals the entropy of the uniform distribution over\[n\]\[n\], i\.e\.,logn\\log n\. Motivated by this, we assume the user’s search cost isλslogn\\lambda\_\{s\}\\log n, whereλs\\lambda\_\{s\}is the search cost parameter\.
Overall, the objective of an average user is given by
$$\mathcal{P}_{d}(\kappa,n):=\mathbb{E}_{\mathbf{h}\sim p(\mathbf{h})}\,\mathbb{E}_{\mathbf{m}\sim p_{\kappa}(\mathbf{m}|\mathbf{h})}\,\mathbb{E}_{\bm{\theta}_{i}\sim q_{\kappa}(\mathbf{h}|\mathbf{m}),\ \text{i.i.d.}}\Big[\max_{i}\ \langle\mathbf{h},\bm{\theta}_{i}\rangle-\lambda_{s}\log n-\lambda_{c}D_{\mathrm{KL}}\big(q_{\kappa}(\mathbf{h}|\mathbf{m})\,\|\,p(\mathbf{h})\big)\Big].$$

Our objective in this work is to characterize the interaction policy $(\kappa,n)$ that maximizes $\mathcal{P}_{d}(\kappa,n)$.
### 2\.1Decomposition of Preference and Recommendation for a given Message
Any vector𝐱∈𝒮d−1\\mathbf\{x\}\\in\\mathcal\{S\}^\{d\-1\}admits a unique decomposition into components parallel and orthogonal to a given unit vector𝐦∈𝒮d−1\\mathbf\{m\}\\in\\mathcal\{S\}^\{d\-1\}\. This decomposition is expressed as𝐱=⟨𝐱,𝐦⟩𝐦\+1−⟨𝐱,𝐦⟩2𝐱⟂\\mathbf\{x\}=\\langle\\mathbf\{x\},\\mathbf\{m\}\\rangle\\mathbf\{m\}\+\\sqrt\{1\-\\langle\\mathbf\{x\},\\mathbf\{m\}\\rangle^\{2\}\}\\mathbf\{x\}\_\{\\perp\}, where𝐱⟂∈𝒮d−1\\mathbf\{x\}\_\{\\perp\}\\in\\mathcal\{S\}^\{d\-1\}is orthogonal to𝐦\\mathbf\{m\}\(i\.e\.,⟨𝐱⟂,𝐦⟩=0\\langle\\mathbf\{x\}\_\{\\perp\},\\mathbf\{m\}\\rangle=0\)\. We leverage this to decompose a random variable distributed as per the posterior over user preferences𝐡¯\\overline\{\\mathbf\{h\}\}, and the set of recommendations\{𝜽1,…,𝜽n\}\\\{\\bm\{\\theta\}\_\{1\},\\dots,\\bm\{\\theta\}\_\{n\}\\\}conditioned on the message𝐦\\mathbf\{m\}\. For notational simplicity \(since preferences𝐡¯\\overline\{\\mathbf\{h\}\}lead to the same expected user utility\), we drop the bar and denote the posterior preference vector as𝐡\\mathbf\{h\}\.
Decomposition of posterior on user preferences:Define themessage fidelityW=⟨𝐡,𝐦⟩W=\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle, which quantifies the alignment between the preference𝐡∼qκ\(⋅\|𝐦\)\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)and the message𝐦\\mathbf\{m\}\. We can write
$$\mathbf{h}=W\mathbf{m}+\sqrt{1-W^{2}}\,\mathbf{Y}. \tag{2}$$

The vector $\mathbf{Y}$ represents the uncommunicated component of $\mathbf{h}$ (the part not captured by the message $\mathbf{m}$), and is orthogonal to $\mathbf{m}$, i.e., $\langle\mathbf{Y},\mathbf{m}\rangle=0$. Under the vMF posterior, the random variables $W$ and $\mathbf{Y}$ are independent. Note that a larger message precision $\kappa$ induces stronger concentration of $W$ near $1$. More details on properties of $W$ and $\mathbf{Y}$ are provided in Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1).
Decomposition of recommendations:Each recommendation𝜽i\\bm\{\\theta\}\_\{i\}is drawn independently from the same posterior distributionqκ\(𝐡\|𝐦\)q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)\. Consequently, we have an analogous decomposition for it
$$\bm{\theta}_{i}=W_{i}\mathbf{m}+\sqrt{1-W_{i}^{2}}\,\mathbf{Y}_{i},\ \text{ and }\ \langle\mathbf{h},\bm{\theta}_{i}\rangle=WW_{i}+\sqrt{1-W^{2}}\sqrt{1-W_{i}^{2}}\,X_{i}, \tag{3}$$

where $(W_{i},\mathbf{Y}_{i})$ has the same distribution as $(W,\mathbf{Y})$, $X_{i}:=\langle\mathbf{Y},\mathbf{Y}_{i}\rangle$, and we use $\langle\mathbf{m},\mathbf{Y}\rangle=\langle\mathbf{m},\mathbf{Y}_{i}\rangle=0$.
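The inner-product identity in Eq. (3) is purely geometric, so it can be checked on arbitrary unit vectors. The sketch below is an illustration only: the helper names `unit` and `decompose` are hypothetical, and the vectors are generic random directions rather than vMF draws.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)

def decompose(x, m):
    """Split unit vector x into (w, y) with x = w*m + sqrt(1 - w^2)*y and <y, m> = 0."""
    w = float(x @ m)
    residual = x - w * m                      # component of x orthogonal to m
    y = residual / np.linalg.norm(residual)   # its norm equals sqrt(1 - w^2)
    return w, y

rng = np.random.default_rng(0)
d = 8
m = unit(rng.standard_normal(d))       # message direction
h = unit(rng.standard_normal(d))       # preference vector
theta = unit(rng.standard_normal(d))   # one recommendation

W, Y = decompose(h, m)
W_i, Y_i = decompose(theta, m)
X_i = float(Y @ Y_i)

lhs = float(h @ theta)
rhs = W * W_i + np.sqrt(1 - W**2) * np.sqrt(1 - W_i**2) * X_i
print(abs(lhs - rhs))  # difference is at floating-point rounding level
```

The cross terms vanish because both orthogonal components are perpendicular to the message, which is the only fact the identity relies on.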
###### Lemma 1\.
For the message and product distribution, we have the following results:
1. \(i\) *Distribution of $W$ and $\mathbf{Y}$:* The density of $W$ is given by $$p_{\kappa,d}(w)=\tilde{C}_{d}(\kappa)\,e^{\kappa w}\left(1-w^{2}\right)^{\frac{d-3}{2}},\ \text{ for } w\in[-1,1], \tag{4}$$ where $\tilde{C}_{d}(\kappa)$ is the normalization constant. Further, the uncommunicated component $\mathbf{Y}$ is uniformly distributed over the space $\mathcal{S}(\mathbf{m})=\{\mathbf{y}\in\mathcal{S}^{d-1}:\langle\mathbf{m},\mathbf{y}\rangle=0\}$.
2. \(ii\) *Independence of orthogonal components:* The random variables $\{X_{1},\cdots,X_{n}\}$ are mutually independent and follow the distribution $p_{0,d-1}(x)$, where $p_{\kappa,d}(\cdot)$ is as given in Eq. ([4](https://arxiv.org/html/2605.23944#S2.E4)).
3. \(iii\) *Communication cost simplification:* The expected KL-divergence of the posterior $q_{\kappa}(\mathbf{h}|\mathbf{m})$ from the prior $p(\mathbf{h})$ is $$\mathbb{E}_{\mathbf{h},\mathbf{m}}\big[D_{\mathrm{KL}}\big(q_{\kappa}(\cdot|\mathbf{m})\,\|\,p(\cdot)\big)\big]=\kappa\,\mathbb{E}[W]-\log\frac{C_{d}(0)}{C_{d}(\kappa)},$$ where the distribution of $W$ is itself a function of $\kappa$.
The results in Lemma[1](https://arxiv.org/html/2605.23944#Thmtheorem1)characterize the distributional and information\-theoretic properties of the underlying random variables\. The proof utilizes standard results in directional statistics \(see\(Mardia and Jupp[2009](https://arxiv.org/html/2605.23944#bib.bib18)\)\) regarding the marginal distribution of the vMF distribution, and is provided in Appendix[D\.1](https://arxiv.org/html/2605.23944#A4.SS1)\. Next, using Eq\. \([3](https://arxiv.org/html/2605.23944#S2.E3)\) and Lemma[1](https://arxiv.org/html/2605.23944#Thmtheorem1)[\(iii\)](https://arxiv.org/html/2605.23944#S2.I3.i3), we have
$$\mathcal{P}_{d}(\kappa,n)=\mathbb{E}\Big[\max_{i}\big\{WW_{i}+\sqrt{1-W^{2}}\sqrt{1-W_{i}^{2}}\,X_{i}\big\}\Big]-\lambda_{s}\log n-\lambda_{c}\Big(\kappa\,\mathbb{E}[W]-\log\frac{C_{d}(0)}{C_{d}(\kappa)}\Big).$$
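Because the utility term depends only on the scalars $W$, $W_i$ and $X_i$, it can be estimated by Monte Carlo without sampling on the sphere. The following is a minimal sketch under stated assumptions: the one-dimensional densities of Lemma 1 are sampled by a grid-based inverse CDF (an approximation chosen here for simplicity, not the authors' method), $W$, $W_i$ and $X_i$ are drawn independently as Lemma 1 permits, and the function names are hypothetical. It requires $d\geq 4$.

```python
import numpy as np

def sample_w(kappa, d, size, rng, grid=20001):
    """Approximate draws from p_{kappa,d}(w) ∝ exp(kappa*w) * (1 - w^2)^((d-3)/2)."""
    w = np.linspace(-1.0, 1.0, grid)[1:-1]            # interior grid on (-1, 1)
    log_p = kappa * w + 0.5 * (d - 3) * np.log1p(-w**2)
    p = np.exp(log_p - log_p.max())                   # unnormalized, overflow-safe
    cdf = np.cumsum(p)
    cdf /= cdf[-1]
    return np.interp(rng.random(size), cdf, w)        # inverse CDF by interpolation

def expected_best_alignment(kappa, d, n, trials, rng):
    """Monte Carlo estimate of E[max_i { W W_i + sqrt(1-W^2) sqrt(1-W_i^2) X_i }]."""
    W = sample_w(kappa, d, trials, rng)                           # one user per trial
    W_i = sample_w(kappa, d, trials * n, rng).reshape(trials, n)  # menu fidelities
    X_i = sample_w(0.0, d - 1, trials * n, rng).reshape(trials, n)  # X_i ~ p_{0,d-1}
    W = W[:, None]
    align = W * W_i + np.sqrt(1 - W**2) * np.sqrt(1 - W_i**2) * X_i
    return float(align.max(axis=1).mean())

rng = np.random.default_rng(0)
d, rho = 200, 0.5
kappa = rho / (1 - rho**2) * (d - 3)
for n in (1, 10, 100):
    print(n, expected_best_alignment(kappa, d, n, trials=2000, rng=rng))
```

Subtracting $\lambda_{s}\log n$ and the communication cost from this estimate would give a finite-$d$ estimate of $\mathcal{P}_{d}(\kappa,n)$.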
Alternative Policies:To study if there are gains from jointly optimizing communication and search, we compare against policies that use only one of them\.
1. Pure Search: Under this policy, the user does not communicate their preferences and instead jumps straight to choosing among the recommendations received. Mathematically, this amounts to setting $\kappa=0$ in the joint optimization problem. The agent's posterior collapses to the prior, recommendations are samples from $p(\cdot)$, and the expected payoff is $\mathcal{P}_{d}(0,n)$.
2. Pure Communication: In this case, we consider a setting where the agent is restricted to recommending a single item, i.e., $n=1$. Under this policy, the user relies on providing an informative message to ensure that the agent delivers a single high-quality recommendation. Formally, when $n=1$, the objective simplifies to $\mathcal{P}_{d}(\kappa,1)$.
## 3Main Results and Insights under Posterior Sampling Scheme
While the optimization problemmaxκ,n𝒫d\(κ,n\)\\max\_\{\\kappa,n\}\\mathcal\{P\}\_\{d\}\(\\kappa,n\)can be solved numerically, such computation offers limited structural insight\. To gain structural insight, we turn to a high\-dimensional asymptotic approximation, which serves as a tractable surrogate that captures the dominant interactions between key parameters\. It reveals the fundamental scaling laws and operational regimes governing the optimal user–agent interaction\.
### 3\.1Intuitive Explanation behind High\-Dimensional Approximation
In modern recommendation systems, both users and products are described by high\-dimensional embeddings\. Analyzing the regime where the feature dimensionddis large not only mirrors practical settings but also enables powerful asymptotic simplifications to reveal key tradeoffs between communication and search costs\. Asddgrows, the alignments⟨𝐡,𝜽i⟩\\langle\\mathbf\{h\},\\bm\{\\theta\}\_\{i\}\\rangle’s concentrate sharply around their mean\. Deviations of constant order away from this typical value occur with probabilities that decay exponentially indd\. These rare but utility\-relevant deviations are naturally characterized by a Large Deviation Principle \(LDP\) with an associated rate function that quantifies the exponential cost of achieving a given alignment level relative to its typical value\. This perspective allows us to replace the stochastic maximization over a finite set of randomly generated recommendations with a deterministic optimization problem defined by the LDP rate function\. In particular, the maximum utility is governed by the most likely extreme alignment achievable at exponential scale, rather than by typical fluctuations\. The resulting approximation captures the asymptotically dominant alignment outcomes that govern the maximum utility in high dimensions\.
##### Intuition behind scaling ofκ\\kappaandnn:
In order to conduct the high\-dimensional approximation for the optimization problemmaxκ,n𝒫d\(κ,n\)\\max\_\{\\kappa,n\}\\mathcal\{P\}\_\{d\}\(\\kappa,n\), we first examine the product utility received by the user asd→∞d\\rightarrow\\infty\. Consider the alignment of a recommendation𝜽1∼qκ\(⋅\|𝐦\)\\bm\{\\theta\}\_\{1\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)with respect to the message𝐦\\mathbf\{m\}, denoted byW1=⟨𝜽1,𝐦⟩W\_\{1\}=\\langle\\bm\{\\theta\}\_\{1\},\\mathbf\{m\}\\rangle\. The induced one\-dimensional density ofW1W\_\{1\}ispκ,d\(w\)∝eκw\(1−w2\)d−32p\_\{\\kappa,d\}\(w\)\\propto e^\{\\kappa w\}\\left\(1\-w^\{2\}\\right\)^\{\\frac\{d\-3\}\{2\}\}\. The exponential termeκwe^\{\\kappa w\}pulls mass towardsw=1w=1whenκ\\kappais large, while the spherical term\(1−w2\)d−32\\left\(1\-w^\{2\}\\right\)^\{\\frac\{d\-3\}\{2\}\}pulls mass towardw=0w=0whenddis large\. Ifκ\\kappais fixed whiled→∞d\\rightarrow\\infty, the spherical term dominates andwwtends to0\. On the other hand, ifκ\\kappagrows faster thandd, the exponential termeκwe^\{\\kappa w\}dominates andwwconcentrates near11\. In contrast to the two extremes, the exponential and the spherical are in balance whenκ\\kappascales linearly withdd\. To capture this, we setκ=ρ1−ρ2\(d−3\)\\kappa=\\frac\{\\rho\}\{1\-\\rho^\{2\}\}\(d\-3\)in the analysis \(see Proposition[2](https://arxiv.org/html/2605.23944#Thmtheorem2)\)\. Hereρ\\rhoturns out to be the mode of the distributionpκ,d\(w\)p\_\{\\kappa,d\}\(w\), andW,Wi∼pκ,d\(w\)W,W\_\{i\}\\sim p\_\{\\kappa,d\}\(w\)concentrate atρ\\rhoasd→∞d\\to\\infty\.
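The balance between the exponential and spherical terms can be seen numerically by locating the mode of $p_{\kappa,d}(w)$ on a grid. This is a small sketch with a hypothetical helper name; it only evaluates the unnormalized log-density stated above.

```python
import numpy as np

def fidelity_mode(kappa, d, grid=200001):
    """Grid argmax of log p_{kappa,d}(w) = kappa*w + (d-3)/2 * log(1 - w^2) + const."""
    w = np.linspace(-1.0, 1.0, grid)[1:-1]
    log_p = kappa * w + 0.5 * (d - 3) * np.log1p(-w**2)
    return float(w[np.argmax(log_p)])

# Fixed kappa: the spherical term wins and the mode drifts toward 0 as d grows.
for d in (10, 100, 1000):
    print("fixed kappa", d, fidelity_mode(5.0, d))

# Linear scaling kappa = rho/(1-rho^2) * (d-3): the mode stays at rho for every d.
rho = 0.6
for d in (10, 100, 1000):
    print("scaled kappa", d, fidelity_mode(rho / (1 - rho**2) * (d - 3), d))
```

Setting the derivative of the log-density to zero gives $\kappa=(d-3)\,w/(1-w^{2})$, which is why the scaled choice of $\kappa$ pins the mode at $w=\rho$.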
Next, considernncandidate products\. The LDP implies that the probability of observing an unusually high alignment for any single product decays exponentially indd, saye−dαe^\{\-d\\alpha\}for someα\>0\\alpha\>0for a given degree of alignment\. When searching over many products, the most preferred one will correspond to such a rare alignment, but to have a meaningful chance of observing it, the number of candidates must grow fast enough to offset this exponential decay\. In particular, we requirene−dαne^\{\-d\\alpha\}to remain non\-trivial, for someα\>0\\alpha\>0, that isn∝exp\(dα\)n\\propto\\exp\(d\\alpha\)\. Intuitively, exponential inΘ\(d\)\\Theta\(d\)recommendations are needed to address preferences which are unknown inΘ\(d\)\\Theta\(d\)dimensions\.
##### Intuition behind scaling of cost parameters:
We next analyze the communication and search costs\. Forκ=ρ1−ρ2\(d−3\)\\kappa=\\frac\{\\rho\}\{1\-\\rho^\{2\}\}\(d\-3\), the expected information gain, measured by the KL\-divergence, grows linearly withdd\(see Proposition[3](https://arxiv.org/html/2605.23944#Thmtheorem3)\)\. Similarly, forn∝exp\(dα\)n\\propto\\exp\(d\\alpha\), the search cost, proportional tologn\\log n, also scales linearly indd\. To keep the overall optimization meaningful in this regime, the communication and search cost coefficientsλc\\lambda\_\{c\}andλs\\lambda\_\{s\}must scale inversely withdd, i\.e\.,∝1/d\\propto 1/d\. Note that if the cost structure changes, the scaling must adjust accordingly\. For example, if search cost were proportional tonn, thenλs\\lambda\_\{s\}would need to decay exponentially, i\.e\.,∝e−αd\\propto e^\{\-\\alpha d\}\.
### 3\.2Theoretical Results and Insights ford→∞d\\rightarrow\\infty
We start by providing a high\-dimensional approximation for𝒫d\(κ,n\)\\mathcal\{P\}\_\{d\}\(\\kappa,n\)\.
###### Proposition 2\(Product utility\)\.
Supposeρ∈\[0,1\)\\rho\\in\[0,1\)andα≥0\\alpha\\geq 0are fixed and define:
$$f(\rho,\alpha):=\max_{w,x\in(-1,1)}\ \rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x\ \ \text{ such that }\ I_{\rho}(w,x)\leq\alpha,$$

where $I_{\rho}(w,x)$ is the large deviations rate function, given by

$$I_{\rho}(w,x)=-\frac{\rho(w-\rho)}{1-\rho^{2}}-\frac{1}{2}\log(1-w^{2})-\frac{1}{2}\log(1-x^{2})+\frac{1}{2}\log(1-\rho^{2}). \tag{5}$$

Suppose the message precision is $\kappa=\frac{\rho}{1-\rho^{2}}(d-3)$ and the recommendation set size is $n=\lfloor e^{d\alpha}\rfloor$. Then the corresponding expected utility received by the user satisfies

$$\mathbb{E}\left[\max_{i}\Big\{WW_{i}+\sqrt{1-W^{2}}\sqrt{1-W_{i}^{2}}\,X_{i}\Big\}\right]=f(\rho,\alpha)+\mathcal{E}_{2},\ \text{ where }\ |\mathcal{E}_{2}|\leq K_{2}\sqrt{\frac{\log d}{d}},$$

where $W$, $W_{i}$ and the $X_{i}$'s are as in Eq. ([3](https://arxiv.org/html/2605.23944#S2.E3)), and the constant $K_{2}$ depends on $\rho$ and $\alpha$.
Proposition[2](https://arxiv.org/html/2605.23944#Thmtheorem2)provides a high\-dimensional approximation of the expected utility an average user receives, and shows that the approximated expected utility can be obtained by solving a deterministic optimization problem\. The parameterρ\\rhodenotes the mode of the distribution ofWW, and as the dimensionddincreases, the distribution ofWWconcentrates sharply around this mode\. The result in Proposition[2](https://arxiv.org/html/2605.23944#Thmtheorem2)is derived using a LDP for the set of sample pairs\{\(Wi,Xi\)\}i∈\[n\]\\\{\(W\_\{i\},X\_\{i\}\)\\\}\_\{i\\in\[n\]\}\. The LDP implies that as the dimension grows, the set of points\{\(Wi,Xi\)\}i∈\[n\]\\\{\(W\_\{i\},X\_\{i\}\)\\\}\_\{i\\in\[n\]\}lies, with high probability, within the effective support of the joint distribution of\(Wi,Xi\)\(W\_\{i\},X\_\{i\}\)\. This effective support is characterized by sub\-level sets of the associated rate functionIρ\(w,x\)I\_\{\\rho\}\(w,x\)\(the rate function corresponding to the distribution of\(Wi,Xi\)\(W\_\{i\},X\_\{i\}\)\), and its boundary determines the extreme realizations that govern the maximum utility\. The approximation error ofO~\(d−12\)\\tilde\{O\}\\big\(d^\{\-\\frac\{1\}\{2\}\}\\big\)arises from residual stochastic fluctuations in the random variableWW, where we use the fact that the standard deviation ofWWis of orderd−12d^\{\-\\frac\{1\}\{2\}\}\. The proof of Proposition[2](https://arxiv.org/html/2605.23944#Thmtheorem2)is provided in Appendix[E\.1](https://arxiv.org/html/2605.23944#A5.SS1)\.
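The deterministic problem defining $f(\rho,\alpha)$ can be evaluated with a one-dimensional grid search. The sketch below rests on a reduction that is not stated in the text and is introduced here as an assumption-free algebraic step: since the objective is non-decreasing in $x$, for each $w$ the best $x$ is the largest one allowed by the constraint, $x=\sqrt{1-e^{-2(\alpha-J_{\rho}(w))}}$, where $J_{\rho}(w)$ denotes the $w$-dependent part of $I_{\rho}(w,x)$. The names `rate_w` and `f` (as a Python function) are hypothetical.

```python
import numpy as np

def rate_w(rho, w):
    """w-dependent part of I_rho(w, x); the x-part is -0.5 * log(1 - x^2)."""
    return (-rho * (w - rho) / (1 - rho**2)
            - 0.5 * np.log1p(-w**2) + 0.5 * np.log1p(-rho**2))

def f(rho, alpha, grid=4001):
    """Grid approximation of f(rho, alpha) from Proposition 2."""
    # Append w = rho so the zero-rate point is always a candidate.
    w = np.append(np.linspace(-1.0, 1.0, grid)[1:-1], rho)
    slack = alpha - rate_w(rho, w)               # rate budget left for x
    feasible = slack >= -1e-12
    # Largest x with I_rho(w, x) <= alpha, i.e. 1 - x^2 = exp(-2 * slack).
    x = np.sqrt(-np.expm1(-2.0 * np.clip(slack, 0.0, None)))
    value = rho * w + np.sqrt(1 - rho**2) * np.sqrt(1 - w**2) * x
    return float(np.max(value[feasible]))

print(f(0.5, 0.0))   # no search: only w = rho, x = 0 is feasible, giving rho^2
print(f(0.0, 0.3))   # no communication: reduces to sqrt(1 - exp(-2 * alpha))
print(f(0.5, 0.3))   # both channels active
```

Two sanity checks follow directly from Eq. (5): with $\alpha=0$ the only feasible point is $(w,x)=(\rho,0)$, so $f(\rho,0)=\rho^{2}$; with $\rho=0$ the optimum is at $w=0$, so $f(0,\alpha)=\sqrt{1-e^{-2\alpha}}$.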
###### Proposition 3\(Communication cost\)\.
Supposeρ\\rhois fixed and letκ=ρ1−ρ2\(d−3\)\\kappa=\\frac\{\\rho\}\{1\-\\rho^\{2\}\}\(d\-3\)\. We have,
$$\mathbb{E}_{\mathbf{h},\mathbf{m}}\big[D_{\mathrm{KL}}\big(q_{\kappa}(\cdot|\mathbf{m})\,\|\,p(\cdot)\big)\big]=\frac{d-2}{2}\log\frac{1}{1-\rho^{2}}+\mathcal{E}_{3},$$

where $|\mathcal{E}_{3}(\rho)|\leq K_{3}\sqrt{\frac{d}{(1-\rho^{2})^{3}}}$, and $K_{3}$ is a universal constant.
Proposition[3](https://arxiv.org/html/2605.23944#Thmtheorem3)provides an approximation of the KL divergence between the prior on the user’s preference and the posterior given the message\. It is important to note that the KL divergence scales linearly in the dimensiondd\. This makes sense: Recall that the entropy of prior preference distribution \(i\.e\., uniform over𝒮d−1\\mathcal\{S\}^\{d\-1\}\) scales linearly with the dimensiondd\. As such, the amount of information that the user needs to provide to meaningfully reduce the agent’s uncertainty must also grow proportionally\. Conceptually, this linear scaling implies that the user effectively pays a constant price for each dimension of preference they clarify for the agent\.
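The linear growth of the expected KL divergence can be checked numerically from Lemma 1(iii). The sketch below is an illustration under stated assumptions: it uses the fact that, with a uniform prior, $C_{d}(0)/C_{d}(\kappa)=\mathbb{E}_{W\sim p_{0,d}}[e^{\kappa W}]$, and it evaluates the one-dimensional integrals by a simple Riemann sum on a grid. The function names are hypothetical.

```python
import numpy as np

def log_integral(log_f, dw):
    """log of the Riemann sum of exp(log_f), stabilized by subtracting the max."""
    top = log_f.max()
    return top + np.log(np.sum(np.exp(log_f - top)) * dw)

def expected_kl(rho, d, grid=200001):
    """kappa * E[W] - log(C_d(0) / C_d(kappa)) with kappa = rho/(1-rho^2) * (d-3)."""
    kappa = rho / (1 - rho**2) * (d - 3)
    w = np.linspace(-1.0, 1.0, grid)[1:-1]
    dw = w[1] - w[0]
    log_uniform = 0.5 * (d - 3) * np.log1p(-w**2)     # unnormalized log p_{0,d}
    log_tilted = kappa * w + log_uniform              # unnormalized log p_{kappa,d}
    log_z_uniform = log_integral(log_uniform, dw)
    log_z_tilted = log_integral(log_tilted, dw)
    mean_w = np.sum(w * np.exp(log_tilted - log_z_tilted)) * dw
    # log(C_d(0)/C_d(kappa)) = log E_{p_{0,d}}[exp(kappa W)] = log_z_tilted - log_z_uniform
    return float(kappa * mean_w - (log_z_tilted - log_z_uniform))

def leading_term(rho, d):
    """Leading term of Proposition 3: (d-2)/2 * log(1 / (1 - rho^2))."""
    return 0.5 * (d - 2) * np.log(1.0 / (1 - rho**2))

for d in (50, 200, 800):
    print(d, expected_kl(0.5, d), leading_term(0.5, d))
```

Dividing either column by $d$ gives a roughly constant value, which is the per-dimension price of communication that motivates the $\lambda_{c}\propto 1/d$ scaling.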
Combining Propositions[2](https://arxiv.org/html/2605.23944#Thmtheorem2)and[3](https://arxiv.org/html/2605.23944#Thmtheorem3), we define the following optimization problem, which allows us to state our first theorem:
$$\text{OPT}_{\text{Joint}}:=\max_{\rho\in[0,1),\,\alpha\geq 0}\ \Big\{f(\rho,\alpha)-c_{s}\alpha+\frac{1}{2}c_{c}\log(1-\rho^{2})\Big\}, \tag{6}$$

where $c_{s}$ and $c_{c}$ are constants, and the terms $\rho$, $\alpha$ and $f(\rho,\alpha)$ are as defined in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2).
###### Theorem 4\(Asymptotics of Interaction Policy\)\.
Letcs,cc\>0c\_\{s\},c\_\{c\}\>0be fixed constants such that the cost coefficients scale asλs=csd\+o\(d−1\)\\lambda\_\{s\}=\\frac\{c\_\{s\}\}\{d\}\+o\(d^\{\-1\}\)andλc=ccd\+o\(d−1\)\\lambda\_\{c\}=\\frac\{c\_\{c\}\}\{d\}\+o\(d^\{\-1\}\)\. Letρ∗\(cs,cc\)\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)andα∗\(cs,cc\)\\alpha^\{\*\}\(c\_\{s\},c\_\{c\}\)denote the optimal solution toOPTJoint\\mathrm\{OPT\}\_\{\\mathrm\{Joint\}\}given\(cs,cc\)\(c\_\{s\},c\_\{c\}\), and construct an interaction policy\(κ∞∗,n∞∗\)\(\\kappa^\{\*\}\_\{\\infty\},n^\{\*\}\_\{\\infty\}\)as:
$$\kappa^{*}_{\infty}=\frac{\rho^{*}(c_{s},c_{c})}{1-(\rho^{*}(c_{s},c_{c}))^{2}}(d-3),\quad\text{ and }\quad n^{*}_{\infty}=\lfloor e^{d\alpha^{*}(c_{s},c_{c})}\rfloor. \tag{7}$$

We have the following results:

1. *Convergence:* The expected payoff satisfies $$\lim_{d\rightarrow\infty}\max_{\kappa,n}\mathcal{P}_{d}(\kappa,n)=\lim_{d\rightarrow\infty}\mathcal{P}_{d}(\kappa^{*}_{\infty},n^{*}_{\infty})=\mathrm{OPT}_{\mathrm{Joint}}.$$
2. *Monotonicity:* The optimal message precision $\rho^{*}(c_{s},c_{c})$ and set size exponent $\alpha^{*}(c_{s},c_{c})$ satisfy $$\frac{\partial\rho^{*}(c_{s},c_{c})}{\partial c_{c}}\leq 0,\qquad\frac{\partial\rho^{*}(c_{s},c_{c})}{\partial c_{s}}\geq 0,\qquad\frac{\partial\alpha^{*}(c_{s},c_{c})}{\partial c_{c}}\geq 0,\qquad\frac{\partial\alpha^{*}(c_{s},c_{c})}{\partial c_{s}}\leq 0.$$
3. *Search-Only Regime:* For every fixed $c_{s}>0$, there exists a threshold $\bar{c}_{c}(c_{s})>0$ such that for all $c_{c}>\bar{c}_{c}(c_{s})$, the optimal policy involves no communication, that is, $\rho^{*}(c_{s},c_{c})=0$.
Theorem[4](https://arxiv.org/html/2605.23944#Thmtheorem4)delivers three key insights into the structure of optimal user–agent interaction in high dimensions\. First, the convergence ofmaxκ,n𝒫d\(κ,n\)\\max\_\{\\kappa,n\}\\mathcal\{P\}\_\{d\}\(\\kappa,n\)shows that, despite the inherent stochasticity of the recommendation process, the complex finite\-dimensional problem admits a deterministic characterization in the limit\. This validates the joint optimization problemOPTJoint\\mathrm\{OPT\}\_\{\\mathrm\{Joint\}\}as a tractable proxy for system design when the number of features is large, and allows us to analytically understand the fundamental trade\-offs in communication and search\. Second, the monotonicity properties reveal a clear and intuitive substitution pattern between communication and search\. As communication becomes more expensive, users optimally reduce message precision and compensate by relying on larger recommendation sets, whereas increases in search costs lead to more informative communication and smaller recommendation set size\. Finally, the existence of a search\-only regime underscores the limits of communication in high\-dimensional settings\. When communication is sufficiently costly relative to search, the user optimally withholds information and delegates all effort to exploration\. The proof of Theorem[4](https://arxiv.org/html/2605.23944#Thmtheorem4)is provided in Appendix[E\.3](https://arxiv.org/html/2605.23944#A5.SS3)\.
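The substitution pattern and the search-only threshold can be explored by solving $\mathrm{OPT}_{\mathrm{Joint}}$ on a coarse grid. This is a rough sketch, not the authors' numerical procedure: the grid ranges and resolutions are arbitrary assumptions, $f$ is evaluated with the same one-dimensional reduction over $w$ (largest feasible $x$ for each $w$), and all function names are hypothetical.

```python
import numpy as np

def f_grid(rho, alphas, w):
    """f(rho, alpha) for a vector of alphas, using the largest feasible x for each w."""
    rate = (-rho * (w - rho) / (1 - rho**2)
            - 0.5 * np.log1p(-w**2) + 0.5 * np.log1p(-rho**2))
    slack = alphas[:, None] - rate[None, :]           # shape (len(alphas), len(w))
    x = np.sqrt(-np.expm1(-2.0 * np.clip(slack, 0.0, None)))
    value = rho * w + np.sqrt(1 - rho**2) * np.sqrt(1 - w**2) * x
    value = np.where(slack >= -1e-12, value, -np.inf)  # mask infeasible w
    return value.max(axis=1)

def solve_opt_joint(c_s, c_c, rhos=None, alphas=None):
    """Grid search for (value, rho_star, alpha_star) of OPT_Joint in Eq. (6)."""
    rhos = np.linspace(0.0, 0.98, 50) if rhos is None else rhos
    alphas = np.linspace(0.0, 3.0, 151) if alphas is None else alphas
    best = (-np.inf, 0.0, 0.0)
    for rho in rhos:
        w = np.append(np.linspace(-1.0, 1.0, 2001)[1:-1], rho)
        payoff = f_grid(rho, alphas, w) - c_s * alphas + 0.5 * c_c * np.log1p(-rho**2)
        k = int(np.argmax(payoff))
        if payoff[k] > best[0]:                        # ties keep the smaller rho
            best = (float(payoff[k]), float(rho), float(alphas[k]))
    return best

for c_c in (0.05, 0.5, 50.0):
    value, rho_star, alpha_star = solve_opt_joint(c_s=1.0, c_c=c_c)
    print(c_c, rho_star, alpha_star, value)
```

Sweeping `c_c` for a fixed `c_s` gives a rough view of where the grid solution for $\rho^{*}$ drops to zero; a finer grid or a continuous optimizer would be needed to locate the threshold $\bar{c}_{c}(c_{s})$ accurately.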
#### 3\.2\.1Asymptotic Regimes of Operation
Our analysis characterizes the asymptotic structure of optimal user–agent interaction under communication and search costs that scale asλs=cs/d\\lambda\_\{s\}=c\_\{s\}/dandλc=cc/d\\lambda\_\{c\}=c\_\{c\}/dwith fixed constants\(cs,cc\)\(c\_\{s\},c\_\{c\}\)\. Under this specification, the interaction admits a well\-defined asymptotic limit characterized by the optimization problem OPTJoint\{\}\_\{\\text\{Joint\}\}\. Varying\(cs,cc\)\(c\_\{s\},c\_\{c\}\)induces qualitatively distinct regimes of operation with a phase transition between them\.
##### Hybrid Regime \(and Joint Optimization\)
The hybrid regime, and with it the need for joint optimization, arises when both scaled costs $c_{s}$ and $c_{c}$ are finite and of comparable magnitude. More specifically, when $c_{c}<\bar{c}_{c}(c_{s})$, where $\bar{c}_{c}(c_{s})$ is a threshold that depends on the search cost parameter $c_{s}$, the optimal solution to OPT$_{\text{Joint}}$ involves strictly positive values of both the message precision parameter $\rho^{*}(c_{s},c_{c})$ and the recommendation set exponent $\alpha^{*}(c_{s},c_{c})$. This corresponds to a hybrid interaction policy in which the user provides partial but informative communication, while the agent supplies a recommendation set whose size grows exponentially with $d$. This hybrid regime highlights that in complex environments, optimal performance requires jointly leveraging communication and search, rather than relying exclusively on either mechanism. The optimal values $\rho^{*}(c_{s},c_{c})$ and $\alpha^{*}(c_{s},c_{c})$ characterize the asymptotic scaling of the optimal message precision $\kappa$ and recommendation set size $n$, respectively. The underlying intuition for these scaling laws was developed in Section [3.1](https://arxiv.org/html/2605.23944#S3.SS1).
A natural illustration of the hybrid regime arises in everydayonline shoppingfor differentiated products such as electronics, apparel, or home goods, on e\-commerce websites such as Amazon\. In these settings, customers often find it cognitively costly to articulate all relevant preferences \(e\.g\., color, brand affinity, or design features\) that jointly determine the best product fit\. At the same time, the sheer scale of available options renders exhaustive search impractical\. Consequently, effective interaction requires combining both mechanisms: users convey coarse but informative preference signals, while the agent presents a carefully sized set of recommendations that spans the remaining uncertainty, leading to a hybrid policy\.
##### Search\-Only Regime
When communication cost exceeds a certain thresholdc¯c\(cs\)\\bar\{c\}\_\{c\}\(c\_\{s\}\), the optimal solution is a pure search policy\. In this regime, the optimal message precision collapses toρ∗\(cs,cc\)=0\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)=0, and the user refrains from providing any informative communication\. In this case, the user’s payoff is derived entirely from search\. The user’s decision to not provide preference information in the search\-only regime is not due to a lack of informative potential\. Under the same cost structure, the user might optimally provide partial information if the agent were restricted to a single recommendation \(n=1n=1\)\. However, due to high communication cost, the ability to search cheaply renders costly communication economically inefficient\.
A representative example of this regime iswindow shopping, in which users face high cognitive costs in articulating precise preferences, as they themselves may be uncertain about what they seek\. This corresponds to a setting with high communication costs, as transmitting informative signals is either infeasible or too cognitively demanding\. Consequently, the informational burden shifts entirely to search, and the agent must compensate by offering a large recommendation set to cover the space of plausible user preferences\. This interaction closely resembles a brute\-force browsing experience on a generic e\-commerce platform, where identifying a suitable product relies almost exclusively on the user’s capacity to search through many alternatives\. We defer further discussion on the phase transition between the two regimes and the switching curvec¯c\(cs\)\\bar\{c\}\_\{c\}\(c\_\{s\}\)to Section[4\.0\.1](https://arxiv.org/html/2605.23944#S4.SS0.SSS1)\.
##### Frictionless Extremes
Apart from the above mentioned regimes, there exist additional extremes outside the1/d1/dcost scaling\. If eitherλs≪1/d\\lambda\_\{s\}\\ll 1/dorλc≪1/d\\lambda\_\{c\}\\ll 1/d, which is analogous tocs=0c\_\{s\}=0orcc=0c\_\{c\}=0, respectively, the interaction becomes effectively frictionless\. In the former case, the user can search at negligible cost; in the latter, the user can communicate preferences with essentially perfect precision\. Under either scenario, frictions vanish asymptotically, and the resulting expected payoff converges to the maximal attainable value11\.
##### Absence of Communication\-Only Regime
Under the high-dimensional approximation, a communication-only policy corresponds to choosing a singleton recommendation set, i.e., $\alpha=0$, so that the user relies entirely on the message and performs no search. Unlike the search-only regime identified in Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4)(3), this outcome does not arise over any nondegenerate range of parameters. In particular, $\alpha=0$ is optimal only in two boundary scenarios. First, when $c_{s}\rightarrow\infty$, search becomes prohibitively costly, forcing the user to rely exclusively on communication. Second, when communication is frictionless, i.e., $c_{c}=0$, the user can transmit preferences without cost, eliminating the need for search. In contrast, whenever communication carries any strictly positive cost (resulting in optimal $\rho<1$), the user strictly benefits from allowing at least some exploration, i.e., $\alpha>0$. A singleton recommendation set, generated through posterior sampling, leaves no room to resolve residual preference uncertainty through search, making pure communication suboptimal away from these boundary cases. Consequently, while $\rho^{*}(c_{s},c_{c})$ exhibits a genuine threshold behavior in $c_{c}$, the optimal set-size exponent $\alpha^{*}(c_{s},c_{c})$ does not display an analogous phase transition; the communication-only policy appears only as a boundary solution rather than as an interior regime.
#### 3\.2\.2Numerical results ford→∞d\\rightarrow\\inftycase
The plots in Figure[3](https://arxiv.org/html/2605.23944#S3.F3)present the optimal values ofρ∗\(cs,cc\)\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)\(left\) andα∗\(cs,cc\)\\alpha^\{\*\}\(c\_\{s\},c\_\{c\}\)\(middle\) that solve the optimization problem OPTJoint\{\}\_\{\\text\{Joint\}\}for given values ofcsc\_\{s\}andccc\_\{c\}\. The left panel of Figure[3](https://arxiv.org/html/2605.23944#S3.F3)reveals that, for any fixed value ofcsc\_\{s\}, there exists a thresholdc¯c\(cs\)\\bar\{c\}\_\{c\}\(c\_\{s\}\)at which the interaction policy undergoes a sharp transition from a hybrid communication–search regime to a purely search\-only regime, as shown in Theorem[4](https://arxiv.org/html/2605.23944#Thmtheorem4)\. For values ofcc<c¯c\(cs\)c\_\{c\}<\\bar\{c\}\_\{c\}\(c\_\{s\}\), the system operates in a hybrid communication\-search regime, where the optimal interaction policy is to jointly optimize over the message precision and recommendation set size\. It can also be observed that whenccc\_\{c\}is small relative tocsc\_\{s\}, the optimal solution favors higher values ofρ∗\(cs,cc\)\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)andα∗\(cs,cc\)\\alpha^\{\*\}\(c\_\{s\},c\_\{c\}\)is close to zero, even thoughα∗\(cs,cc\)=0\\alpha^\{\*\}\(c\_\{s\},c\_\{c\}\)=0only at the boundarycc=0c\_\{c\}=0\.
Figure 3: Heatmap for $\rho^*(c_s,c_c)$ (left) and $\alpha^*(c_s,c_c)$ (middle) with respect to $c_s$ and $c_c$; and (right) incremental payoff of the joint policy relative to the best pure policy.

The right panel in Figure [3](https://arxiv.org/html/2605.23944#S3.F3) presents the heatmap of $\big[\mathrm{OPT}_{\mathrm{Joint}}-\max\{\mathrm{OPT}_{\mathrm{Comm}},\mathrm{OPT}_{\mathrm{Search}}\}\big]$, where $\mathrm{OPT}_{\mathrm{Comm}}$ and $\mathrm{OPT}_{\mathrm{Search}}$ are obtained by setting $\alpha=0$ and $\rho=0$, respectively, in the optimization problem for $\mathrm{OPT}_{\mathrm{Joint}}$, and serve as asymptotic characterizations of the pure communication and pure search policies. As such, the right panel presents the incremental gain of using joint optimization compared to the better of the two pure benchmarks. The results show that performance gains are most significant in a narrow band of intermediate cost regimes, where $c_c$ is smaller than but comparable to $c_s$. In these settings, jointly optimizing message precision and recommendation set size yields meaningful improvements over either pure strategy, whereas outside this region the incremental benefit of joint optimization is limited.
### 3\.3Theoretical Results and Insights for Finitedd
We now turn to the finite-dimensional problem and assess the accuracy of the asymptotic prescriptions derived in the previous section. The next theorem bounds the performance gap, at finite $d$, between the true optimal policy and the asymptotically optimal policy.
###### Theorem 5\(Performance Gap\)\.
Suppose $c_c>0$ and $c_s>0$. Let $(\rho^*(c_s,c_c),\alpha^*(c_s,c_c))$ denote the optimal solution to $\mathrm{OPT}_{\mathrm{Joint}}$ with scaled cost parameters $c_s=d\lambda_s$ and $c_c=d\lambda_c$. Then, the performance gap between the true optimal policy $(\kappa^*_d,n^*_d)$ and the asymptotically optimal policy $(\kappa^*_\infty,n^*_\infty)$ (see Eq. ([7](https://arxiv.org/html/2605.23944#S3.E7))) satisfies

$$0\ \leq\ \mathcal{P}_d\big(\kappa^*_d,n^*_d\big)-\mathcal{P}_d\big(\kappa^*_\infty,n^*_\infty\big)\ \leq\ K_{5}\sqrt{\frac{\log d}{d}},\tag{8}$$

where the constant $K_{5}$ depends on the cost parameters $c_c$ and $c_s$.
Theorem [5](https://arxiv.org/html/2605.23944#Thmtheorem5) establishes the accuracy of the high-dimensional approximation by quantifying the performance gap, which decays at a rate of $\tilde{O}(d^{-1/2})$. The proof of Theorem [5](https://arxiv.org/html/2605.23944#Thmtheorem5) is provided in Appendix [F.1](https://arxiv.org/html/2605.23944#A6.SS1). A crucial distinction from Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4) is that this result does not rely on asymptotic scaling assumptions for the cost parameters. Instead, for any finite $d$, the theorem effectively defines the scaled parameters $c_s=d\lambda_s$ and $c_c=d\lambda_c$ and solves $\mathrm{OPT}_{\mathrm{Joint}}$ to derive the optimal $\rho^*(c_s,c_c)$ and $\alpha^*(c_s,c_c)$, which are then mapped to the control parameters $\kappa^*_\infty$ and $n^*_\infty$. From a practical standpoint, this implies that system designers can rely on the simpler asymptotic optimization to select message precision and recommendation set size without incurring significant utility loss, particularly in modern environments where embeddings are inherently high-dimensional.
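To get a feel for the rate in Theorem 5, the short sketch below tabulates the factor $\sqrt{\log d/d}$ for a few dimensions. It deliberately omits the constant $K_{5}$, whose value is not stated here, so the numbers indicate only how the bound scales with $d$. The function name `gap_rate` and the chosen dimensions are illustrative assumptions, not part of the paper.

```python
import math


def gap_rate(d: int) -> float:
    """Return sqrt(log d / d), the d-dependent factor in the bound of Eq. (8).

    The multiplicative constant K_5 is omitted (assumption: we only want the
    scaling in d, not the absolute size of the gap).
    """
    if d < 2:
        raise ValueError("d must be at least 2 so that log d > 0")
    return math.sqrt(math.log(d) / d)


if __name__ == "__main__":
    # Hypothetical embedding dimensions, chosen only for illustration.
    for d in (16, 64, 256, 1024, 4096):
        print(f"d = {d:5d}   sqrt(log d / d) = {gap_rate(d):.4f}")
```

The factor shrinks steadily as $d$ grows, which is the sense in which the asymptotic prescription becomes more reliable for high-dimensional embeddings.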
#### 3\.3\.1Discretization and the Emergence of Communication\-Only Regime
A key distinction between the finite-dimensional setting and its high-dimensional approximation lies in the granularity of the recommendation set size. In the asymptotic limit ($d\to\infty$), the recommendation set size is governed by a continuous exponent $\alpha$, allowing for smooth transitions between regimes. However, when $d$ is finite, $n$ is constrained to the set of positive integers ($n\in\{1,2,\dots\}$). This introduces a “discretization gap”, particularly in the transition from a single recommendation ($n=1$) to multiple recommendations ($n\geq 2$). The marginal search cost of adding a second item, $\lambda_s\log 2$, is significant, creating a barrier that prevents the smooth adoption of a hybrid policy. Consequently, the system is more prone to “stick” to pure policies at the extremes of the cost spectrum, as formalized below.
###### Proposition 6\(Phase Transitions in Finite\-Dimensions\)\.
Fix the dimension $d$. We have,

1. *Pure Communication regime:* For all $\lambda_s>0$, there exists $\underline{\lambda}_c(\lambda_s)$ such that for all $\lambda_c<\underline{\lambda}_c(\lambda_s)$ we have $n^*_d=1$.
2. *Pure Search regime:* For all $\lambda_s>0$, there exists $\bar{\lambda}_c(\lambda_s)$ such that for all $\lambda_c>\bar{\lambda}_c(\lambda_s)$ we have $\kappa^*_d=0$.
The result in Proposition [6](https://arxiv.org/html/2605.23944#Thmtheorem6) highlights that while the joint policy is dominant in the hybrid regime, the integer constraints on the recommendation set size give rise to a *communication-only* regime. Specifically, when the communication cost parameter is small ($\lambda_c<\underline{\lambda}_c(\lambda_s)$), the discrete jump in search cost required to move from $n=1$ to $n=2$ outweighs the marginal benefit of multiple recommendations, locking the system into a *pure communication* policy. Conversely, when communication is expensive ($\lambda_c>\bar{\lambda}_c(\lambda_s)$), the system enters a *search-only* regime, mirroring the asymptotic result in Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4).
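As a small worked illustration of the discrete jump, assume the search cost enters the finite-$d$ objective as $\lambda_s\log n$ (the form that appears in the tilted objective $\mathcal{T}_d$ of Section 4). Under that assumption, the cost of adding one more recommendation to a set of size $n$ is:

```latex
\lambda_s\log(n+1)-\lambda_s\log n \;=\; \lambda_s\log\!\Big(1+\frac{1}{n}\Big),
\qquad\text{so}\qquad
n=1:\ \lambda_s\log 2\approx 0.693\,\lambda_s,
\qquad
n=2:\ \lambda_s\log\tfrac{3}{2}\approx 0.405\,\lambda_s .
```

The increment is largest at the very first step and shrinks toward zero as $n$ grows, which is why the integer constraint bites hardest at the boundary between one and two recommendations.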
A natural example of the communication\-only regime is thedeep researchsetting, where the desired output is a long, detailed report or analysis\. Such outputs are cognitively costly for the user to evaluate, making it infeasible to provide or review multiple alternatives\. As a result, the interaction effectively collapses to a single recommendation and the overall performance is highly sensitive to the quality of the user’s initial message, shifting the burden entirely to communication\. Anticipating that only one output will be produced, it is worthwhile for the users to invest effort in providing a high\-fidelity, well\-specified query so that the agent can tailor a single precise response that closely aligns with the user’s preferences\.
#### 3\.3\.2Empirical Results for Finitedd
In this section, we present empirical results for the solution of $\mathcal{P}_d(\cdot,\cdot)$ for finite $d$. Figure [4](https://arxiv.org/html/2605.23944#S3.F4) illustrates sharp phase transitions in the optimal interaction policy for finite dimensions, as reflected in the optimal values of message precision $\rho$ and recommendation exponent $\alpha$ across different feature dimensions $d$. The figure shows how the policy varies with the scaled communication cost $c_c$ (with $\lambda_c=c_c/d$). As $c_c$ increases, the optimal $\rho$ decreases and eventually reaches zero at a dimension-dependent threshold. By contrast, the optimal $\alpha$ increases with $c_c$, rising from zero to a strictly positive value. Notably, there are parameter regions in which both $\rho$ and $\alpha$ are simultaneously nonzero. This pattern is consistent with Proposition [6](https://arxiv.org/html/2605.23944#Thmtheorem6) and provides a clear illustration of the finite-dimensional phase transition in $\rho$. Moreover, the discrete jumps in $\alpha$ highlight the role of the integer constraint on the recommendation set size, which induces sharp regime switches that are absent in the continuous asymptotic approximation.
Figure 4:Optimal control variables in finite dimensions with respect to the scaled communication cost parameter\. \(Results are estimated via Monte Carlo simulations and therefore exhibit sampling variability\.\)
## 4Recommendations from Tilted Distribution
In previous sections, our analysis has focused on recommendation sets generated through direct sampling from the posterior distribution. While natural and easy to implement using generative AI, this approach does not take advantage of the possibility of systematically adjusting the weight placed on the user’s message. In this section, we consider an alternative mechanism. Given a message $\mathbf{m}\in[-1,1]^d$, we define the class $\mathcal{Q}$ as the collection of distributions over $\bm{\theta}$ induced by the following stochastic construction: a scalar random variable $V$ is drawn from some distribution $P\in\mathcal{P}([-1,1])$, and independently, a random vector $\mathbf{Y}$ is drawn uniformly from the set of unit vectors orthogonal to $\mathbf{m}$, denoted by $\mathcal{S}(\mathbf{m})$. The recommendation vector is then generated as $\bm{\theta}=V\mathbf{m}+\sqrt{1-V^2}\,\mathbf{Y}$. Formally,

$$\mathcal{Q}:=\left\{q(\bm{\theta}|\mathbf{m})\mid\bm{\theta}=V\mathbf{m}+\sqrt{1-V^2}\,\mathbf{Y},\quad V\sim P\in\mathcal{P}([-1,1]),\quad\mathbf{Y}\sim\mathrm{Unif}(\mathcal{S}(\mathbf{m})),\quad\mathbf{Y}\perp V\right\}.$$

We also define the class of *tilted distributions*, in which the weight $V$ is deterministic. That is, for a fixed $v\in[-1,1]$, the recommendation is given by $\bm{\theta}=v\mathbf{m}+\sqrt{1-v^2}\,\mathbf{Y}$, with $\mathbf{Y}\sim\mathrm{Unif}(\mathcal{S}(\mathbf{m}))$. Formally,

$$\mathcal{Q}^{\mathrm{Tilt}}:=\left\{q(\bm{\theta}|\mathbf{m})\mid\bm{\theta}=v\mathbf{m}+\sqrt{1-v^2}\,\mathbf{Y},\quad v\in[-1,1],\quad\mathbf{Y}\sim\mathrm{Unif}(\mathcal{S}(\mathbf{m}))\right\}.$$
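The tilted construction can be simulated directly. The following is a minimal NumPy sketch, not the authors' implementation: it assumes the message is normalised to unit length, draws $\mathbf{Y}$ by projecting an isotropic Gaussian onto the orthogonal complement of $\mathbf{m}$ and renormalising, and uses the hypothetical helper names `sample_tilted` and `best_utility`.

```python
import numpy as np


def sample_tilted(m, v, n, rng):
    """Draw n recommendations theta_i = v*m + sqrt(1 - v^2) * Y_i.

    Y_i is uniform on the unit vectors orthogonal to m. Assumption: m is
    rescaled to unit norm here, so every theta_i is also a unit vector.
    """
    if not -1.0 <= v <= 1.0:
        raise ValueError("tilt v must lie in [-1, 1]")
    m = np.asarray(m, dtype=float)
    if m.size < 2:
        raise ValueError("need dimension d >= 2")
    m = m / np.linalg.norm(m)
    g = rng.standard_normal((n, m.size))   # isotropic Gaussian directions
    g -= np.outer(g @ m, m)                # remove the component along m
    y = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on S(m)
    return v * m + np.sqrt(1.0 - v**2) * y


def best_utility(h, thetas):
    """Utility of the user's final choice: max_i <h, theta_i>."""
    return float(np.max(thetas @ np.asarray(h, dtype=float)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 32
    m = np.zeros(d)
    m[0] = 1.0                              # hypothetical message direction
    h = rng.standard_normal(d)
    h /= np.linalg.norm(h)                  # hypothetical true preference
    for v in (0.0, 0.5, 1.0):
        thetas = sample_tilted(m, v, n=8, rng=rng)
        print(f"v = {v:.1f}  utility of best item = {best_utility(h, thetas):.3f}")
```

Setting `v = 1` returns the message direction itself for every draw, while `v = 0` ignores the message and samples only from the orthogonal directions; intermediate values interpolate between the two.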
The following theorem compares the performance of the general class $\mathcal{Q}$ with that of the tilted subclass $\mathcal{Q}^{\mathrm{Tilt}}$. It shows that, in high dimensions, randomizing the weight placed on the user’s message provides no asymptotic advantage over an optimally chosen deterministic tilt.
###### Theorem 7\(Optimality gap of sampling from a tilted distribution\)\.
For a recommendation set of size $n$, the expected utility of a sampling scheme $q\in\mathcal{Q}$ is given by

$$U_n(q):=\mathbb{E}_{\bm{\theta}_1,\dots,\bm{\theta}_n\sim q(\cdot|\mathbf{m}),\text{ i.i.d.}}\left[\max_{i\in[n]}\langle\mathbf{h},\bm{\theta}_i\rangle\right].$$

Then, there exists a universal constant $K_{7}$ such that, for $d\geq 4$, the maximum expected utility achievable by any distribution in the class $\mathcal{Q}$ satisfies

$$\max_{q\in\mathcal{Q}}\mathbb{E}_{\mathbf{h},\mathbf{m}}[U_n(q)]\leq\max_{q\in\mathcal{Q}^{\mathrm{Tilt}}}\mathbb{E}_{\mathbf{h},\mathbf{m}}[U_n(q)]+\frac{K_{7}}{\sqrt{d-3}\,(1-\rho^2)},$$

where the expectation is taken over the joint distribution of preferences $\mathbf{h}$ and messages $\mathbf{m}\sim p_\kappa(\cdot|\mathbf{h})$, with $\rho\in[0,1)$ and $\kappa=\frac{\rho}{1-\rho^2}(d-3)$.
Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7) implies that the agent can deterministically *tilt* the recommendations toward the observed message to achieve near-optimal expected utility, effectively managing the crucial trade-off between exploiting communicated preferences and maintaining diversity across unobserved feature dimensions. The tilt parameter $v$ thus becomes a design lever: when communication is reliable, the agent can bias recommendations more heavily toward the message; when communication is noisy, the agent can diversify recommendations to hedge against uncertainty. Mathematically, the optimal tilt is the solution of the following optimization problem:

$$v^*=\arg\max_v\ \ v\,\mathbb{E}[W]+\sqrt{1-v^2}\,\mathbb{E}\left[\sqrt{1-W^2}\right]\mathbb{E}\Big[\max_{i\in[n]}X_i\Big],$$

where $W$ and the $X_i$’s satisfy the characterization in Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1). Note that the optimal tilt parameter $v^*$ depends on the value of $\mathbb{E}\left[\max_i X_i\right]$, which in turn depends on the recommendation set size $n$. The optimal tilt therefore depends on $n$, unlike the weight placed on the message under posterior sampling. Exploiting Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7), we restrict subsequent analysis to the class of tilted distributions.
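The maximization over $v$ has a simple closed form, sketched here as a worked step. The shorthand $A$ and $B$ is introduced only for this illustration, and the derivation assumes $B\geq 0$ and $(A,B)\neq(0,0)$:

```latex
A:=\mathbb{E}[W],\qquad
B:=\mathbb{E}\big[\sqrt{1-W^2}\big]\;\mathbb{E}\Big[\max_{i\in[n]}X_i\Big],
\\[4pt]
vA+\sqrt{1-v^2}\,B\;\le\;\sqrt{v^2+(1-v^2)}\,\sqrt{A^2+B^2}\;=\;\sqrt{A^2+B^2}
\quad\text{for all } v\in[-1,1]\quad\text{(Cauchy--Schwarz)},
\\[4pt]
\text{with equality at}\quad v^*=\frac{A}{\sqrt{A^2+B^2}} .
```

Because $\mathbb{E}[\max_{i\in[n]}X_i]$ is nondecreasing in $n$, so is $B$; when $A>0$ a larger recommendation set therefore lowers $v^*$, shifting weight from the message toward diversification.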
Allowing the user to also choose the message precision $\kappa$, the joint objective becomes

$$\mathcal{T}_d(\kappa,n,v)=v\,\mathbb{E}[W]+\sqrt{1-v^2}\,\mathbb{E}\left[\sqrt{1-W^2}\right]\mathbb{E}\left[\max_i X_i\right]-\lambda_s\log n-\lambda_c\Big(\kappa\,\mathbb{E}[W]-\log\frac{C_d(0)}{C_d(\kappa)}\Big).$$

We also define the high-dimensional approximation of the objective as

$$\mathcal{T}_\infty(\rho,\alpha,v)=\max_x\ \ \rho v+\sqrt{1-\rho^2}\sqrt{1-v^2}\,x-c_s\alpha+\frac{1}{2}c_c\log(1-\rho^2)\quad\text{such that}\quad I(x)\leq\alpha,\tag{9}$$

where $I(x)=-\frac{1}{2}\log(1-x^2)$, implying that the optimal $x$ is given by $x^*=\sqrt{1-e^{-2\alpha}}$. The relation between $\mathcal{T}_d(\kappa,n,v)$ and its high-dimensional counterpart $\mathcal{T}_\infty(\rho,\alpha,v)$ is captured in Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8).
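For completeness, the inner maximization over $x$ can be worked out in one line. The coefficient multiplying $x$ is nonnegative, so the objective is nondecreasing in $x$, and $I$ is increasing on $[0,1)$; the constraint therefore binds:

```latex
\sqrt{1-\rho^2}\,\sqrt{1-v^2}\;\ge\;0
\quad\Longrightarrow\quad
I(x^*)=\alpha
\quad\Longleftrightarrow\quad
-\tfrac{1}{2}\log\big(1-(x^*)^2\big)=\alpha
\quad\Longleftrightarrow\quad
x^*=\sqrt{1-e^{-2\alpha}} .
```

As sanity checks, $\alpha=0$ (a single recommendation) gives $x^*=0$, so search contributes nothing, while $\alpha\to\infty$ gives $x^*\to 1$.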
###### Theorem 8\(Asymptotically optimal tilt parameter and Phase Transitions\)\.
Suppose $\rho$ and $\alpha$ are fixed, and let, for any $d$, $\kappa_d=\frac{\rho}{1-\rho^2}(d-3)$ and $n_d=\lfloor e^{\alpha d}\rfloor$. Then, for any choice of tilt parameter $v$, we have

$$\lim_{d\rightarrow\infty}\mathcal{T}_d(\kappa_d,n_d,v)=\mathcal{T}_\infty(\rho,\alpha,v).$$

Furthermore,

1. *Optimal Tilt:* For any fixed $(\rho,\alpha)$, the optimal tilt parameter $v^*$ that optimizes the high-dimensional approximation of the objective, i.e., $\mathcal{T}_\infty(\rho,\alpha,v)$, is given by

   $$v^*(\rho,\alpha)=\frac{\rho}{\sqrt{1-(1-\rho^2)e^{-2\alpha}}}.\tag{10}$$

2. *Regime Phase Transitions:* Let $(\rho^*,\alpha^*)$ maximize the asymptotic utility $\mathcal{T}_\infty$ subject to search cost $c_s$ and communication cost $c_c$. The optimal policy exhibits a sharp phase transition:
   - *Pure Communication Regime:* If $c_s>c_c$, the optimal policy collapses to $\alpha^*=0$ and $v^*=1$.
   - *Pure Search Regime:* If $c_s<c_c$, the optimal policy collapses to $\rho^*=0$ and $v^*=0$.

   Further, we have that

   $$\big(1-(\rho^*)^2\big)e^{-2\alpha^*}=\frac{1}{2}\Big(\sqrt{c_{\min}^4+4c_{\min}^2}-c_{\min}^2\Big),\tag{11}$$

   where $c_{\min}=\min\{c_s,c_c\}$, and either $\rho^*=0$ (if $c_c>c_s$) or $\alpha^*=0$ (if $c_c<c_s$).
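The closed-form prescriptions of Theorem 8 are easy to evaluate numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it evaluates $\mathcal{T}_\infty$ from Eq. (9) with $x^*=\sqrt{1-e^{-2\alpha}}$, the optimal tilt from Eq. (10), and the optimum implied by Eq. (11). The function names are hypothetical, both costs are assumed strictly positive, and the tie $c_s=c_c$ is rejected because the theorem as stated does not pin down the split in that case.

```python
import math


def t_inf(rho, alpha, v, c_s, c_c):
    """Asymptotic objective of Eq. (9), evaluated at x* = sqrt(1 - exp(-2*alpha)).

    Requires 0 <= rho < 1, alpha >= 0 and -1 <= v <= 1.
    """
    x = math.sqrt(1.0 - math.exp(-2.0 * alpha))
    return (rho * v
            + math.sqrt(1.0 - rho**2) * math.sqrt(1.0 - v**2) * x
            - c_s * alpha
            + 0.5 * c_c * math.log(1.0 - rho**2))


def optimal_tilt(rho, alpha):
    """Optimal tilt of Eq. (10); undefined at (rho, alpha) = (0, 0)."""
    return rho / math.sqrt(1.0 - (1.0 - rho**2) * math.exp(-2.0 * alpha))


def tilted_optimum(c_s, c_c):
    """Return (rho*, alpha*, v*) implied by Theorem 8 for c_s != c_c."""
    if c_s <= 0 or c_c <= 0:
        raise ValueError("both cost parameters must be strictly positive")
    if c_s == c_c:
        raise ValueError("tie c_s == c_c is not covered by this sketch")
    c = min(c_s, c_c)
    # Right-hand side of Eq. (11): the value of (1 - rho*^2) * exp(-2*alpha*).
    s = 0.5 * (math.sqrt(c**4 + 4.0 * c**2) - c**2)
    if c_c < c_s:
        # Pure communication: alpha* = 0, so 1 - rho*^2 = s and v* = 1.
        return math.sqrt(1.0 - s), 0.0, 1.0
    # Pure search: rho* = 0, so exp(-2*alpha*) = s and v* = 0.
    return 0.0, -0.5 * math.log(s), 0.0


if __name__ == "__main__":
    # Cost values borrowed from the Figure 5 caption purely as an example.
    c_s, c_c = 1.55, 0.58
    rho, alpha, v = tilted_optimum(c_s, c_c)
    print(f"rho* = {rho:.4f}, alpha* = {alpha:.4f}, v* = {v:.1f}")
    print(f"T_inf at optimum = {t_inf(rho, alpha, v, c_s, c_c):.4f}")
```

Swapping the two arguments moves the solution into the pure search regime, where `rho*` and `v*` are zero and only `alpha*` is positive.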
Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8) characterizes the optimal tilt $v^*$ under the high-dimensional approximation of the objective $\mathcal{T}_d(\kappa,n,v)$ and identifies the sharp phase transition that emerges when both the tilt $v$ and the parameters $(\rho,\alpha)$ are chosen optimally. We see that the optimal tilt $v^*$ depends on both $\rho$ and $\alpha$, i.e., the optimal tilt must adjust simultaneously to the message precision and to the size of the recommendation set. Several key structural observations emerge from Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8). First, whenever $\rho>0$ and $\alpha<\infty$ (i.e., $n<\infty$), the optimal tilt $v^*$ is strictly greater than $\rho$. This indicates that an optimized agent should place more weight on the communicated signal than the typical weight under posterior sampling, which can be seen as a form of regularization. Second, the availability of tilting induces a specialization in the optimal policy: the policy either extracts information and uses it to provide a single recommendation ($\alpha^*=0$ and $v^*=1$) or avoids communication entirely to rely on search ($\rho^*=0$ and $v^*=0$). *There is no joint communication and search regime.* Third, as can be deduced from Eq. ([11](https://arxiv.org/html/2605.23944#S4.E11)), the optimal precision $\rho^*$ and set-size exponent $\alpha^*$ are determined by the minimum of the two cost parameters, $c_{\min}$, and are monotone: $\rho^*$ decreases as $c_c$ increases and $\alpha^*$ decreases as $c_s$ increases.
We avoid pursuing a more elaborate convergence result (cf. Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4)) and performance gap bound (cf. Theorem [5](https://arxiv.org/html/2605.23944#Thmtheorem5)) in the interest of space. The proofs of Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7) and Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8) are provided in Appendix [G](https://arxiv.org/html/2605.23944#A7). Figure [5](https://arxiv.org/html/2605.23944#S4.F5) (left) compares the payoff achieved under tilting with that obtained from posterior sampling and reveals substantial gains from tilting when $c_s>c_c$.
Figure 5: Performance comparison between the tilted-distribution and posterior-sampling strategies, along with the comparison of $\rho^*$ w.r.t. $c_c$ (with $c_s=1.55$) and $\alpha^*$ w.r.t. $c_s$ (with $c_c=0.58$).

Practically, the tilt parameter corresponds to a tunable control in modern LLM-based recommender systems, analogous to temperature or retrieval-weighting in retrieval-augmented generation (RAG) pipelines (Lewis et al. [2020](https://arxiv.org/html/2605.23944#bib.bib6), Gao et al. [2023](https://arxiv.org/html/2605.23944#bib.bib11)). Crucially, the structure of the optimal tilt parameter reveals that real-world tuning must account for both the quality of the user’s message and the desired diversity in the output. This mirrors emerging practices in commercial AI assistants, which increasingly adjust weighting or temperature dynamically based on query quality and uncertainty (Wu et al. [2024](https://arxiv.org/html/2605.23944#bib.bib21), Bao et al. [2024](https://arxiv.org/html/2605.23944#bib.bib22)).
#### 4\.0\.1Comparison of Posterior Sampling and Tilted Distribution Schemes
This section compares tilted sampling and posterior sampling, and the optimal interaction policy under each\.
##### Structure of optimal interaction policy:
A striking implication of Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8) is that, when all the parameters $(\rho,n,v)$ are chosen optimally, the interaction policy is governed by the smaller of the two cost parameters. Unlike the posterior-sampling approach, where we have to jointly optimize communication and search, the tilted mechanism adapts primarily to the cheaper source of user effort. When communication is cheaper, the system relies on precise messaging; when search is cheaper, it relies on exploration. Further, as seen in Figure [5](https://arxiv.org/html/2605.23944#S4.F5), the optimal message precision under tilting declines more gradually as communication costs increase, indicating that users have reason to provide more preference information under tilting. Also, tilted sampling generally relies on smaller recommendation sets, with the optimal recommendation set size exponent consistently no larger than that under posterior sampling. Together, these patterns show that tilting makes more effective use of both communication and search than direct posterior sampling.
##### Switching curves:
Under posterior sampling, the optimal policy switches from a hybrid policy to a search-only regime at a threshold $c_c=\bar{c}_c(c_s)<c_s$, whereas under tilted sampling, there is a sharp phase transition at the threshold $c_c=c_s$. The shift from the asymmetric switching curve under posterior sampling to the symmetric $c_c=c_s$ line under tilting results from a move from a passive, “black-box” agent to a highly optimized one. Under posterior sampling, the agent’s recommendations merely mirror the user’s noisy input, which causes the marginal benefit of communication to diminish rapidly as the user’s message becomes coarser (as communication cost increases). This makes search a cheaper lever for utility, so users abandon communication even when it is nominally cheaper than search, which yields $\bar{c}_c(c_s)<c_s$.
In contrast, tilted sampling allows the agent to act as an active optimizer that makes best use of the user’s signal by adjusting a deterministic tilt parameter. This capability restores the efficiency of the user’s communication; specifically, for $\bar{c}_c(c_s)<c_c<c_s$, tilting facilitates a shift where the user’s effort is reallocated from search (under posterior sampling) back to communication (under tilting). This effectively restores the marginal parity between communication and search, causing the switching curve to coincide exactly with $c_c=c_s$.
## 5Conclusion
This work establishes a unified theoretical framework for understanding the collaborative interplay between human communication and AI\-driven recommendation\. By viewing this interaction through an information\-theoretically inspired lens, we capture how users’ cognitive costs for communication and search shape optimal system design and overall performance\. Our findings may help inform the design of efficient AI shopping assistants which use the right mix of communication and search for a given situation, and if possible, are tuned to sample recommendations from an appropriately “tilted” version of the distribution of what the user may like\.
## References
- N\. Agarwal, A\. Moehring, P\. Rajpurkar, and T\. Salz \(2023\)Combining human expertise with artificial intelligence: experimental evidence from radiology\.Technical reportNational Bureau of Economic Research\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p1.1)\.
- V\. Angelova, W\. S\. Dobbie, and C\. Yang \(2023\)Algorithmic recommendations and human discretion\.Technical reportNational Bureau of Economic Research\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p1.1)\.
- P\. Atchley, H\. Pannell, K\. Wofford, M\. Hopkins, and R\. A\. Atchley \(2024\)Human and ai collaboration in the higher education environment: opportunities and concerns\.Cognitive research: principles and implications9\(1\),pp\. 20\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- D\. Bakry, I\. Gentil, and M\. Ledoux \(2013\)Analysis and geometry of markov diffusion operators\.Vol\.348,Springer Science & Business Media,Cham, Switzerland\.Cited by:[item 1](https://arxiv.org/html/2605.23944#A4.I2.i1.p1.11),[§G\.1](https://arxiv.org/html/2605.23944#A7.SS1.5.p5.6)\.
- G\. Bansal, W\. Hua, Z\. Huang, A\. Fourney, A\. Swearngin, W\. Epperson, T\. Payne, J\. M\. Hofman, B\. Lucier, C\. Singh, M\. Mobius, A\. Nambi, A\. Yadav, K\. Gao, D\. M\. Rothschild, A\. Slivkins, D\. G\. Goldstein, H\. Mozannar, N\. Immorlica, M\. Murad, M\. Vogel, S\. Kambhampati, E\. Horvitz, and S\. Amershi \(2025\)Magentic marketplace: an open\-source environment for studying agentic markets\.External Links:2510\.25779Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- K\. Bao, J\. Zhang, X\. Lin, Y\. Zhang, W\. Wang, and F\. Feng \(2024\)Large language models for recommendation: past, present, and future\.InProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval,New York, NY, USA,pp\. 2993–2996\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p2.1),[§4](https://arxiv.org/html/2605.23944#S4.p7.1)\.
- O\. Barkan and N\. Koenigstein \(2016\)Item2vec: Neural Item Embedding for Collaborative Filtering\.In2016 IEEE 26th International Workshop on Machine Learning for Signal Processing,Salerno, Italy,pp\. 1–6\.Cited by:[§2](https://arxiv.org/html/2605.23944#S2.p4.1)\.
- T\. Boyacı, C\. Canyakmaz, and F\. De Véricourt \(2024\)Human and machine: the impact of machine input on decision making under cognitive limitations\.Management Science70\(2\),pp\. 1258–1275\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p2.1),[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- F\. Castro, J\. Gao, and S\. Martin \(2024\)Human\-ai interactions and societal pitfalls\.InProceedings of the 25th ACM Conference on Economics and Computation,New York, NY, USA,pp\. 205\.External Links:ISBN 9798400707049Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p3.1),[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- L\. Cui, S\. Huang, F\. Wei, C\. Tan, C\. Duan, and M\. Zhou \(2017\)Superagent: a customer service chatbot for e\-commerce websites\.InProceedings of the 55th Annual Meeting of the Association for Computational Linguistics: System Demonstrations \(ACL 2017\),Vancouver, Canada,pp\. 97–102\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- Y\. Gao, Y\. Xiong, X\. Gao, K\. Jia, J\. Pan, Y\. Bi, Y\. Dai, J\. Sun, Q\. Wang, and H\. Wang \(2023\)Retrieval\-Augmented Generation for Large Language Models: A Survey\.Note:arXiv preprint arXiv:2312\.10997Cited by:[§4](https://arxiv.org/html/2605.23944#S4.p7.1)\.
- S\. S\. Iyengar and M\. R\. Lepper \(2000\)When choice is demotivating: can one desire too much of a good thing?\.Journal of personality and social psychology79\(6\),pp\. 995\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- F\. Jiang, Y\. Jiang, H\. Zhi, Y\. Dong, H\. Li, S\. Ma, Y\. Wang, Q\. Dong, H\. Shen, and Y\. Wang \(2017\)Artificial Intelligence in Healthcare: Past, Present and Future\.Stroke and Vascular Neurology2\(4\),pp\. 230–243\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- E\. Kasneci, K\. Seßler, S\. Küchemann, M\. Bannert, D\. Dementieva, F\. Fischer, U\. Gasser, G\. Groh, S\. Günnemann, E\. Hüllermeier,et al\.\(2023\)ChatGPT for good? on opportunities and challenges of large language models for education\.Learning and individual differences103,pp\. 102274\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- J\. Kleinberg, H\. Lakkaraju, J\. Leskovec, J\. Ludwig, and S\. Mullainathan \(2018\)Human decisions and machine predictions\.The quarterly journal of economics133\(1\),pp\. 237–293\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p1.1)\.
- Y\. Koren, S\. Rendle, and R\. Bell \(2021\)Advances in Collaborative Filtering\.InRecommender Systems Handbook,pp\. 91–142\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- M\. Ledoux \(2001\)The concentration of measure phenomenon\.Mathematical Surveys and Monographs,American Mathematical Society,Providence, RI\.Cited by:[item 1](https://arxiv.org/html/2605.23944#A4.I2.i1.p1.11),[§G\.1](https://arxiv.org/html/2605.23944#A7.SS1.5.p5.6)\.
- P\. Lewis, E\. Perez, A\. Piktus, F\. Petroni, V\. Karpukhin, N\. Goyal, H\. Küttler, M\. Lewis, W\. Yih, T\. Rocktäschel,et al\.\(2020\)Retrieval\-augmented generation for knowledge\-intensive nlp tasks\.Advances in neural information processing systems33,pp\. 9459–9474\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p2.1),[§4](https://arxiv.org/html/2605.23944#S4.p7.1)\.
- A\. Liang \(2025\)Artificial intelligence clones\.InProceedings of the 26th ACM Conference on Economics and Computation,pp\. 387–388\.External Links:ISBN 9798400719431Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p3.1)\.
- G\. Linden, B\. Smith, and J\. York \(2003\)Amazon\.com recommendations: item\-to\-item collaborative filtering\.IEEE Internet computing7\(1\),pp\. 76–80\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- B\. Maćkowiak, F\. Matějka, and M\. Wiederholt \(2023\)Rational inattention: a review\.Journal of Economic Literature61\(1\),pp\. 226–273\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- B\. Maćkowiak and M\. Wiederholt \(2009\)Optimal sticky prices under rational inattention\.American Economic Review99\(3\),pp\. 769–803\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- K\. V\. Mardia and P\. E\. Jupp \(2009\)Directional Statistics\.John Wiley & Sons,Chichester, UK\.External Links:ISBN 978\-0471953333Cited by:[item \(i\)](https://arxiv.org/html/2605.23944#A4.I1.ix1.p1.2),[§E\.2](https://arxiv.org/html/2605.23944#A5.SS2.1.p1.13),[item \(i\)](https://arxiv.org/html/2605.23944#S2.I1.ix1.p1.16),[item \(i\)](https://arxiv.org/html/2605.23944#S2.I1.ix1.p1.6),[§2\.1](https://arxiv.org/html/2605.23944#S2.SS1.p4.1)\.
- J\. Miao, J\. Wu, and E\. R\. Young \(2022\)Multivariate rational inattention\.Econometrica90\(2\),pp\. 907–945\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- M\. Moor, O\. Banerjee, Z\. S\. H\. Abad, H\. M\. Krumholz, J\. Leskovec, E\. J\. Topol, and P\. Rajpurkar \(2023\)Foundation models for generalist medical artificial intelligence\.Nature616\(7956\),pp\. 259–265\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- B\. Sarwar, G\. Karypis, J\. Konstan, and J\. Riedl \(2001\)Item\-based collaborative filtering recommendation algorithms\.InProceedings of the 10th International Conference on World Wide Web,New York, NY, USA,pp\. 285–295\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- B\. Scheibehenne, R\. Greifeneder, and P\. M\. Todd \(2010\)Can there ever be too many options? a meta\-analytic review of choice overload\.Journal of consumer research37\(3\),pp\. 409–425\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- B\. Schwartz \(2015\)The Paradox of Choice\.InPositive Psychology in Practice: Promoting Human Flourishing in Work, Health, Education, and Everyday Life,pp\. 121–138\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- C\. A\. Sims \(2003\)Implications of rational inattention\.Journal of monetary Economics50\(3\),pp\. 665–690\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- C\. A\. Sims \(2005\)Rational inattention: a research agenda\.Technical Report2005,34,Discussion Paper Series 1,Deutsche Bundesbank,Frankfurt am Main, Germany\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- C\. A\. Sims \(2006\)Rational inattention: beyond the linear\-quadratic case\.American Economic Review96\(2\),pp\. 158–163\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- X\. Su and T\. M\. Khoshgoftaar \(2009\)A survey of collaborative filtering techniques\.Advances in artificial intelligence2009\(1\),pp\. 421425\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- P\. B\. Thorat, R\. M\. Goudar, and S\. Barve \(2015\)Survey on collaborative filtering, content\-based filtering and hybrid recommendation system\.International Journal of Computer Applications110\(4\),pp\. 31–36\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
- N\. Tishby and D\. Polani \(2010\)Information theory of decisions and actions\.InPerception\-action cycle: Models, architectures, and hardware,pp\. 601–636\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- H\. Vasconcelos, M\. Jörke, M\. Grunde\-McLaughlin, T\. Gerstenberg, M\. S\. Bernstein, and R\. Krishna \(2023\)Explanations can reduce overreliance on ai systems during decision\-making\.Proceedings of the ACM on Human\-Computer Interaction7\(CSCW1\),pp\. 1–38\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px1.p2.1)\.
- L\. Wu, Z\. Zheng, Z\. Qiu, H\. Wang, H\. Gu, T\. Shen, C\. Qin, C\. Zhu, H\. Zhu, Q\. Liu,et al\.\(2024\)A survey on large language models for recommendation\.World Wide Web27\(5\),pp\. 60\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p2.1),[§4](https://arxiv.org/html/2605.23944#S4.p7.1)\.
- K\. Yu, A\. L\. Beam, and I\. S\. Kohane \(2018\)Artificial intelligence in healthcare\.Nature biomedical engineering2\(10\),pp\. 719–731\.Cited by:[§1](https://arxiv.org/html/2605.23944#S1.p1.1)\.
- J\. Zhang, K\. Bao, Y\. Zhang, W\. Wang, F\. Feng, and X\. He \(2023\)Is ChatGPT fair for recommendation? evaluating fairness in large language model recommendation\.InProceedings of the 17th ACM Conference on Recommender Systems,New York, NY, USA,pp\. 993–999\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p2.1)\.
- W\. Zhong \(2022\)Optimal dynamic information acquisition\.Econometrica90\(4\),pp\. 1537–1582\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px2.p1.1)\.
- C\. Ziegler, S\. M\. McNee, J\. A\. Konstan, and G\. Lausen \(2005\)Improving recommendation lists through topic diversification\.InProceedings of the 14th International Conference on World Wide Web,New York, NY, USA,pp\. 22–32\.Cited by:[§1\.2](https://arxiv.org/html/2605.23944#S1.SS2.SSS0.Px3.p1.1)\.
## Appendix A Pure Policies under Posterior Sampling
To highlight the benefits of jointly optimizing message precision and recommendation set size under posterior sampling, we compare our main formulation with two benchmark policies in which only one lever is active\. Specifically, we consider a pure search benchmark, where only the recommendation set size is optimized, and a pure communication benchmark, where only message precision is optimized\. These restricted scenarios isolate the individual contributions of search and communication and allow us to quantify the performance gains from optimizing both levers simultaneously\.
### A.1 Brute-Force Search (No Communication)
In this scenario, the user provides no preference information to the agent, which corresponds to setting the message precision $\kappa=0$. Mathematically, the high-dimensional approximation of the user's payoff, $\mathcal{P}_{d}(0,n)$, is found by substituting $\rho=w=0$ into the joint optimization problem $\text{OPT}_{\text{Joint}}$, presented in Eq. ([6](https://arxiv.org/html/2605.23944#S3.E6)). The problem then simplifies to
$$\text{OPT}_{\text{Search}}=\max_{\alpha,x}\left\{x-c_{s}\alpha\ \ \ \text{s.t.}\ \ \ \frac{1}{2}\log(1-x^{2})=-\alpha\right\}.$$
This constrained optimization problem can be solved analytically, yielding the optimal values
$$\alpha^{*}(c_{s}):=\frac{1}{2}\log\left(\frac{1}{2}+\sqrt{\frac{1}{4}+\frac{1}{c_{s}^{2}}}\right),\quad\text{and}\quad\text{OPT}_{\text{Search}}=\sqrt{1-e^{-2\alpha^{*}(c_{s})}}-c_{s}\alpha^{*}(c_{s}).\tag{12}$$
From Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4), we also know that there exists $\bar{c}_{c}$ such that for all $c_{c}>\bar{c}_{c}$, the optimal solution of $\text{OPT}_{\text{Joint}}$ matches that of $\text{OPT}_{\text{Search}}$. The following corollary bounds the gap between the optimal performance of the high-dimensional approximation and that of the original finite-dimensional problem.
###### Corollary 9\.
Letα∗\(cs\)\\alpha^\{\*\}\(c\_\{s\}\)denote the solution ofOPTSearch\\text\{OPT\}\_\{\\text\{Search\}\}and letn∞∗=⌊edα∗\(cs\)⌋n^\{\*\}\_\{\\infty\}=\\lfloor e^\{d\\alpha^\{\*\}\(c\_\{s\}\)\}\\rfloor\. We have,
\|maxn𝒫d\(0,n\)−𝒫d\(0,n∞∗\)\|≤O~\(d−12\)\.\\displaystyle\\left\|\\max\_\{n\}\\mathcal\{P\}\_\{d\}\(0,n\)\-\\mathcal\{P\}\_\{d\}\(0,n^\{\*\}\_\{\\infty\}\)\\right\|\\leq\\tilde\{O\}\\big\(d^\{\-\\frac\{1\}\{2\}\}\\big\)\.
Mathematically, Corollary[9](https://arxiv.org/html/2605.23944#Thmtheorem9)follows by arguments analogous to those used in the proof of Theorem[5](https://arxiv.org/html/2605.23944#Thmtheorem5)\. As shown in Proposition[6](https://arxiv.org/html/2605.23944#Thmtheorem6), the pure search policy is not merely a theoretical baseline but becomes the*optimal*interaction policy when communication costs are prohibitively high\. In this regime, the preceding analysis yields a closed\-form characterization of the optimal recommendation set size\.
The analytic expression for $\alpha^{*}(c_{s})$ also allows us to characterize system performance under high search costs. A second-order expansion gives $\alpha^{*}(c_{s})\approx\frac{1}{2c_{s}^{2}}$, which implies that the optimal payoff satisfies $\text{OPT}_{\text{Search}}\approx\frac{1}{2c_{s}}$. Thus, the attainable payoff decays on the order of $1/c_{s}$, indicating a severe deterioration in performance as search becomes more costly for the user.
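As a numerical sanity check on Eq. (12), the following Python sketch evaluates the closed-form exponent, compares it with a brute-force grid maximization of the same objective, and lets one inspect the large-$c_s$ approximation. The function names (`alpha_star`, `search_payoff`, `grid_argmax`, `n_infinity`) and the grid settings are illustrative assumptions and do not come from the paper.

```python
import math


def alpha_star(c_s: float) -> float:
    """Closed-form search exponent alpha*(c_s) from Eq. (12); assumes c_s > 0."""
    return 0.5 * math.log(0.5 + math.sqrt(0.25 + 1.0 / c_s**2))


def search_payoff(alpha: float, c_s: float) -> float:
    """Objective x - c_s * alpha, with x = sqrt(1 - exp(-2 alpha)) from the constraint."""
    # -expm1(-2a) equals 1 - exp(-2a) but is more accurate for small alpha.
    return math.sqrt(-math.expm1(-2.0 * alpha)) - c_s * alpha


def opt_search(c_s: float) -> float:
    """Optimal pure-search payoff obtained by plugging alpha* into the objective."""
    return search_payoff(alpha_star(c_s), c_s)


def grid_argmax(c_s: float, upper: float = 5.0, steps: int = 200_000):
    """Brute-force maximizer over a uniform grid on [0, upper] (illustrative only)."""
    best_alpha, best_val = 0.0, search_payoff(0.0, c_s)
    for k in range(1, steps + 1):
        a = upper * k / steps
        v = search_payoff(a, c_s)
        if v > best_val:
            best_alpha, best_val = a, v
    return best_alpha, best_val


def n_infinity(d: int, c_s: float) -> int:
    """Recommendation set size floor(exp(d * alpha*)) as used in Corollary 9."""
    return math.floor(math.exp(d * alpha_star(c_s)))


if __name__ == "__main__":
    for c_s in (0.5, 1.0, 5.0, 50.0):
        print(c_s, alpha_star(c_s), opt_search(c_s), 1.0 / (2.0 * c_s))
```

For large $c_s$ the last two printed columns should nearly coincide, which is the $1/(2c_s)$ decay described above.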
### A.2 Single recommendation with quality communication
In this scenario, the agent is restricted to providing a single recommendation \(n=1n=1\)\. This is analogous to settingα=0\\alpha=0in the high\-dimensional approximation, and the feasibility constraint in theOPTJoint\\text\{OPT\}\_\{\\text\{Joint\}\}problem,Iρ\(w,x\)=αI\_\{\\rho\}\(w,x\)=\\alpha, simplifies toIρ\(w,x\)=0I\_\{\\rho\}\(w,x\)=0\. This equality holds only when the recommendation perfectly aligns with the communicated message \(w=ρw=\\rho\), and there is no orthogonal component to explore \(x=0x=0\)\. The optimization problem thus reduces to
OPTComm=maxρ\{ρ2\+cc2log\(1−ρ2\)\}\.\\displaystyle\\text\{OPT\}\_\{\\text\{Comm\}\}=\\max\_\{\\rho\}\\left\\\{\\rho^\{2\}\+\\frac\{c\_\{c\}\}\{2\}\\log\(1\-\\rho^\{2\}\)\\right\\\}\.The optimal solution exhibits a \(soft\) threshold depending onccc\_\{c\}, given by
ρ∗\(cc\):=\{1−cc2,ifcc<2,0,otherwise,andOPTComm=\{1−cc2log2ecc,ifcc<2,0,otherwise\.\\displaystyle\\rho^\{\*\}\(c\_\{c\}\):=\\begin\{cases\}\\sqrt\{1\-\\tfrac\{c\_\{c\}\}\{2\}\},&\\text\{if \}c\_\{c\}<2,\\\\\[6\.0pt\] 0,&\\text\{otherwise\},\\end\{cases\}\\qquad\\text\{and\}\\qquad\\text\{OPT\}\_\{\\text\{Comm\}\}=\\begin\{cases\}1\-\\tfrac\{c\_\{c\}\}\{2\}\\log\\tfrac\{2e\}\{c\_\{c\}\},&\\text\{if \}c\_\{c\}<2,\\\\\[6\.0pt\] 0,&\\text\{otherwise\}\.\\end\{cases\}This result reveals that meaningful communication is only viable if the scaled costccc\_\{c\}is below a threshold of22\. If the cost exceeds this point, the optimal policy for the user is to provide no information \(ρ∗\(cc\)=0\\rho^\{\*\}\(c\_\{c\}\)=0\), resulting in a payoff of zero, which is equivalent to the agent making a random guess\. Note that, for any value ofccc\_\{c\}, the result ofOPTJoint\\text\{OPT\}\_\{\\text\{Joint\}\}matches that ofOPTComm\\text\{OPT\}\_\{\\text\{Comm\}\}only whencsc\_\{s\}goes to infinity\. As before, the following corollary ensures that this simplified model accurately reflects the behavior of the finite\-dimensional system\.
###### Corollary 10\.
Letρ∗\(cc\)\\rho^\{\*\}\(c\_\{c\}\)andρ∗\(cs,cc\)\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)denote the solution ofOPTComm\\text\{OPT\}\_\{\\text\{Comm\}\}andOPTJoint\\text\{OPT\}\_\{\\text\{Joint\}\}respectively\. Then, for anycsc\_\{s\},ρ∗\(cc\)=ρ∗\(∞,cc\)≥ρ∗\(cs,cc\)\\rho^\{\*\}\(c\_\{c\}\)=\\rho^\{\*\}\(\\infty,c\_\{c\}\)\\geq\\rho^\{\*\}\(c\_\{s\},c\_\{c\}\)\. Letκ∞∗=ρ∗\(cc\)1−\(ρ∗\(cc\)\)2d\\kappa^\{\*\}\_\{\\infty\}=\\frac\{\\rho^\{\*\}\(c\_\{c\}\)\}\{1\-\(\\rho^\{\*\}\(c\_\{c\}\)\)^\{2\}\}d\. We have,
\|maxκ𝒫d\(κ,1\)−𝒫d\(κ∞∗,1\)\|≤O~\(d−12\)\.\\displaystyle\\big\|\\max\_\{\\kappa\}\\mathcal\{P\}\_\{d\}\(\\kappa,1\)\-\\mathcal\{P\}\_\{d\}\(\\kappa^\{\*\}\_\{\\infty\},1\)\\big\|\\leq\\tilde\{O\}\\big\(d^\{\-\\frac\{1\}\{2\}\}\\big\)\.
Similar to Corollary[9](https://arxiv.org/html/2605.23944#Thmtheorem9), Corollary[10](https://arxiv.org/html/2605.23944#Thmtheorem10)follows by arguments analogous to those used in the proof of Theorem[5](https://arxiv.org/html/2605.23944#Thmtheorem5)\. Corollary[10](https://arxiv.org/html/2605.23944#Thmtheorem10)shows that under a pure communication regime, the user is incentivized to invest more effort in providing preference information\. Moreover, the user’s achievable payoff declines rapidly with the communication cost parameter, eventually collapsing to zero at a finite threshold\.
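The closed forms for the communication-only benchmark can be checked in the same spirit. The sketch below evaluates $\rho^{*}(c_c)$, the optimal value $\text{OPT}_{\text{Comm}}$, and the precision $\kappa^{*}_{\infty}$ from Corollary 10, and compares the closed-form value with a direct grid search over $\rho$. All function names and the grid resolution are hypothetical choices made for illustration, and $c_c>0$ is assumed.

```python
import math


def rho_star(c_c: float) -> float:
    """Optimal message fidelity: sqrt(1 - c_c/2) below the threshold c_c = 2, else 0."""
    return math.sqrt(1.0 - c_c / 2.0) if c_c < 2.0 else 0.0


def comm_objective(rho: float, c_c: float) -> float:
    """Objective rho^2 + (c_c/2) * log(1 - rho^2) of OPT_Comm, for rho in [0, 1)."""
    return rho**2 + 0.5 * c_c * math.log(1.0 - rho**2)


def opt_comm(c_c: float) -> float:
    """Closed-form optimal value: 1 - (c_c/2) * log(2e / c_c) if c_c < 2, else 0."""
    if c_c >= 2.0:
        return 0.0
    return 1.0 - 0.5 * c_c * math.log(2.0 * math.e / c_c)


def kappa_infinity(d: int, c_c: float) -> float:
    """Precision kappa = rho* / (1 - rho*^2) * d, as defined in Corollary 10."""
    r = rho_star(c_c)
    return r / (1.0 - r**2) * d


def grid_max(c_c: float, steps: int = 100_000) -> float:
    """Brute-force maximum of the objective over a uniform grid on [0, 1)."""
    return max(comm_objective(k / steps, c_c) for k in range(steps))


if __name__ == "__main__":
    for c_c in (0.25, 1.0, 1.9, 2.5):
        print(c_c, rho_star(c_c), opt_comm(c_c), grid_max(c_c))
```

Because $\kappa^{*}_{\infty}$ is proportional to $d$ for fixed $c_c$, the same helper also illustrates the linear scaling of message precision with dimension.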
The preceding analysis highlights the inherent fragility of relying on pure policies\. In both the search\-only and communication\-only benchmarks, the user’s attainable payoff declines rapidly as the corresponding cost parameters increase, rendering these single\-lever approaches ineffective even under moderate cognitive burdens\. This limitation reinforces our central insight: effective system design requires jointly optimizing message precision and recommendation set size in accordance with the underlying cost structure\.
## Appendix B Additional Numerical Results
To validate the accuracy of our analytical approach, Figure[6](https://arxiv.org/html/2605.23944#A2.F6)compares the optimal objective values and solutions of the finite\-dimensional problem against high\-dimensional asymptotic approximations, which are evaluated through simulation\. The leftmost panel reports the overall payoff \(objective value\) and shows that the user’s payoff \(dotted\) converges to the theoretical limit \(dashed\) as the feature dimensionddincreases, consistent with the convergence predicted by Theorem[5](https://arxiv.org/html/2605.23944#Thmtheorem5)\. The center and right panels depict the behavior of the optimal control variables\. We observe that both the message precision parameterκ\\kappaand the recommendation set sizennare increasing functions of the feature dimensiondd\. Specifically,κ\\kappascales linearly withdd, whilennscales exponentially withdd\.
Figure 6: Comparison of finite-$d$ simulations (markers) and high-dimensional asymptotics (dashed lines). Panels display the optimal expected payoff (Left), message precision $\rho$ (Center), and set size exponent $\alpha$ (Right) as a function of $d$.
## Appendix C Weighted Preferences Extension
In the baseline model, user preferences were assumed to be uniformly distributed across feature dimensions, so that each attribute contributes equally to overall utility and incurs the same communication cost\. In many practical recommendation settings, however, feature dimensions may differ both in how much they contribute to utility and in how costly it is to communicate preferences about them\. In this section, we extend the framework to capture such structured heterogeneity by allowing different subsets of features to receive different weights in the utility function and to be associated with different communication costs\. This formulation enables us to study how communication and search decisions are allocated across feature subsets when their contributions to utility and their communication costs are asymmetric\.
We model the user’s preference vector as a composition of two orthogonal components:
$$\mathbf{h}=\mu\mathbf{h}_{1}+\sqrt{1-\mu^{2}}\,\mathbf{h}_{2},$$
where, for simplicity, we assume that the set of features is divided into two subsets with $d=d_{1}+d_{2}$, that $\mathbf{h}_{1}$ is supported on the first $d_{1}$ features (the last $d_{2}$ elements of $\mathbf{h}_{1}$ are all zero), and that, similarly, $\mathbf{h}_{2}$ is supported on the last $d_{2}$ features. The scalar $\mu\in[0,1]$ determines the relative importance of the two feature groups.
The user’s message follows a similar decomposition,𝐦=μ𝐦1\+1−μ2𝐦2\\mathbf\{m\}=\\mu\\mathbf\{m\}\_\{1\}\+\\sqrt\{1\-\\mu^\{2\}\}\\mathbf\{m\}\_\{2\}, where𝐦1\\mathbf\{m\}\_\{1\}and𝐦2\\mathbf\{m\}\_\{2\}are generated according to vMF distributions on𝒮d1−1\\mathcal\{S\}^\{d\_\{1\}\-1\}and𝒮d2−1\\mathcal\{S\}^\{d\_\{2\}\-1\}with precision parametersκ1\\kappa\_\{1\}andκ2\\kappa\_\{2\}, respectively\. To allow for differential ease of communication across feature groups, we also permit group\-specific communication costs, denoted byλ1,c\\lambda\_\{1,c\}andλ2,c\\lambda\_\{2,c\}\.
The agent, upon receiving the message, constructs recommendation sets\{𝜽1,1,…,𝜽1,n1\}\\\{\\bm\{\\theta\}\_\{1,1\},\\dots,\\bm\{\\theta\}\_\{1,n\_\{1\}\}\\\}and\{𝜽2,1,…,𝜽2,n2\}\\\{\\bm\{\\theta\}\_\{2,1\},\\dots,\\bm\{\\theta\}\_\{2,n\_\{2\}\}\\\}over the two feature subsets, combining them multiplicatively to form the full recommendation menu\. Thus, the total number of recommendations is given byn=n1×n2n=n\_\{1\}\\times n\_\{2\}, where each product feature vector takes the form𝜽ij=μ𝜽1,i\+1−μ2𝜽2,j\\bm\{\\theta\}\_\{ij\}=\\mu\\bm\{\\theta\}\_\{1,i\}\+\\sqrt\{1\-\\mu^\{2\}\}\\bm\{\\theta\}\_\{2,j\}\. The utility that the user receives from recommendation𝜽ij\\bm\{\\theta\}\_\{ij\}is given by
u\(𝐡,𝜽ij\)=μ2⟨𝐡1,𝜽1,i⟩\+\(1−μ2\)⟨𝐡2,𝜽2,j⟩\.\\displaystyle u\(\\mathbf\{h\},\\bm\{\\theta\}\_\{ij\}\)=\\mu^\{2\}\\langle\\mathbf\{h\}\_\{1\},\\bm\{\\theta\}\_\{1,i\}\\rangle\+\(1\-\\mu^\{2\}\)\\langle\\mathbf\{h\}\_\{2\},\\bm\{\\theta\}\_\{2,j\}\\rangle\.
Under this modeling, the overall optimization problem decomposes into two independent subproblems, one for each feature subset, with search and communication costs effectively scaled by their respective utility weights\. The overall payoff of the user is given by
$$\mathcal{N}_{d_{1},d_{2}}(\kappa_{1},\kappa_{2},n_{1},n_{2}):=\mu^{2}\mathcal{P}_{d_{1}}(\kappa_{1},n_{1})+(1-\mu^{2})\mathcal{P}_{d_{2}}(\kappa_{2},n_{2}),$$
where the term $\mathcal{P}_{d_{1}}(\kappa_{1},n_{1})$ follows the same structure as in the baseline model, with cost parameters $\big(\frac{\lambda_{s}}{\mu^{2}},\frac{\lambda_{1,c}}{\mu^{2}}\big)$, and similarly for $\mathcal{P}_{d_{2}}(\kappa_{2},n_{2})$ with cost parameters $\big(\frac{\lambda_{s}}{1-\mu^{2}},\frac{\lambda_{2,c}}{1-\mu^{2}}\big)$.
This formulation highlights the interaction between communication and search across heterogeneous feature groups\. For instance, whenλ2,c≫λs≫λ1,c\\lambda\_\{2,c\}\\gg\\lambda\_\{s\}\\gg\\lambda\_\{1,c\}, the user rationally concentrates communication effort on the more easily articulated feature subset \(the firstd1d\_\{1\}dimensions\), while relying on search to explore the harder\-to\-communicate features \(d2d\_\{2\}dimensions\)\. The agent, in turn, optimally setsn1=O\(1\)n\_\{1\}=O\(1\)to avoid redundancy in the well\-communicated subspace while diversifying over the poorly communicated subspace by choosingn2=O\(eαd2\)n\_\{2\}=O\(e^\{\\alpha d\_\{2\}\}\)for someα\>0\\alpha\>0\. This extension provides a natural and intuitive framework for modeling asymmetric feature importance: communication effort is directed toward salient, describable attributes, while search diversity compensates for uncertainty in latent dimensions\.
##### Discussion on $\mu$
Although the optimization problem mathematically decouples into two independent subproblems, the weighting parameter $\mu$ serves as a critical lever for allocating cognitive effort across the two feature subspaces. The parameter $\mu$ influences the system through two reinforcing channels: marginal utility and effective cost. First, the weights determine each subspace's contribution to the user's total utility. A larger $\mu$ increases the value of alignment in the first feature subspace, thereby strengthening the incentive to optimize along those dimensions. Second, and more subtly, $\mu$ effectively rescales the cost parameters. As shown in the decomposition, the optimization for the first subspace is governed by the effective cost parameters $(\lambda_{s}/\mu^{2},\lambda_{1,c}/\mu^{2})$. This creates an inverse relationship between importance and cost: as a subspace becomes more important (larger weight), the effective "price" of acquiring information and searching within that subspace decreases. Consequently, when $\mu$ is large, the system optimally invests heavily in the first subspace by demanding higher message precision and, when beneficial, greater search effort. As $\mu$ decreases, the effective costs for the first subspace rise while those for the second subspace fall, and attention shifts accordingly.
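The rescaling of costs by the utility weights can be made concrete with a small Python sketch. It computes the effective cost pair for each feature group and the weighted combination of the two sub-payoffs. The function names, the dictionary layout, and the example numbers are assumptions introduced here for illustration, and the sub-payoffs are treated as given inputs rather than solved for.

```python
def effective_costs(mu: float, lam_s: float, lam_1c: float, lam_2c: float) -> dict:
    """Effective (search, communication) costs per feature group.

    Group 1 has utility weight mu^2 and group 2 has weight 1 - mu^2; each
    group's costs are divided by its weight. Requires 0 < mu < 1 so that
    both weights are strictly positive.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    w1 = mu**2
    w2 = 1.0 - mu**2
    return {
        "group1": (lam_s / w1, lam_1c / w1),
        "group2": (lam_s / w2, lam_2c / w2),
    }


def weighted_payoff(mu: float, payoff_1: float, payoff_2: float) -> float:
    """Overall payoff mu^2 * P_1 + (1 - mu^2) * P_2 for given sub-payoffs."""
    return mu**2 * payoff_1 + (1.0 - mu**2) * payoff_2


if __name__ == "__main__":
    # Hypothetical cost values chosen only to show the direction of the effect.
    for mu in (0.3, 0.6, 0.9):
        print(mu, effective_costs(mu, lam_s=1.0, lam_1c=0.5, lam_2c=4.0))
```

As $\mu$ grows, the printed costs for the first group fall and those for the second group rise, which is the inverse relationship between importance and effective cost described above.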
## Appendix D Proofs of Essential Lemmas
### D.1 Proof of Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1)
###### Proof of Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1).
We provide the proof of each part separately\.
- \(i\)The density function of the message fidelityWWand the uncommunicated component𝐘\\mathbf\{Y\}is provided in\[Mardia and Jupp,[2009](https://arxiv.org/html/2605.23944#bib.bib18), Chapter 9\.3\]\.
- \(ii\)To prove mutual independence, it is sufficient to show that for any pair of distinct indicesi≠ji\\neq j, the random variablesXiX\_\{i\}andXjX\_\{j\}are independent\. Fix𝐦∈𝒮d−1\\mathbf\{m\}\\in\\mathcal\{S\}^\{d\-1\}and let𝒮\(𝐦\)=\{𝐯∈𝒮d−1:⟨𝐦,𝐯⟩=0\}\\mathcal\{S\}\(\\mathbf\{m\}\)=\\\{\\mathbf\{v\}\\in\\mathcal\{S\}^\{d\-1\}:\\langle\\mathbf\{m\},\\mathbf\{v\}\\rangle=0\\\}be the equatorial\(d−2\)\(d\-2\)\-sphere orthogonal to𝐦\\mathbf\{m\}\. We condition on the event𝐘=𝐲∈𝒮\(𝐦\)\\mathbf\{Y\}=\\mathbf\{y\}\\in\\mathcal\{S\}\(\\mathbf\{m\}\)\. Under this conditioning, the random variables\[Xi\|𝐘=𝐲\]=⟨𝐲,𝐘i⟩\[X\_\{i\}\|\\mathbf\{Y\}=\\mathbf\{y\}\]=\\langle\\mathbf\{y\},\\mathbf\{Y\}\_\{i\}\\rangleand\[Xj\|𝐘=𝐲\]=⟨𝐲,𝐘j⟩\[X\_\{j\}\|\\mathbf\{Y\}=\\mathbf\{y\}\]=\\langle\\mathbf\{y\},\\mathbf\{Y\}\_\{j\}\\rangleare functions of𝐘i\\mathbf\{Y\}\_\{i\}and𝐘j\\mathbf\{Y\}\_\{j\}respectively\. Since𝐘i\\mathbf\{Y\}\_\{i\}and𝐘j\\mathbf\{Y\}\_\{j\}are i\.i\.d\. samples fromUnif\(𝒮\(𝐦\)\)\\text\{Unif\}\(\\mathcal\{S\}\(\\mathbf\{m\}\)\), the conditional variablesXiX\_\{i\}andXjX\_\{j\}are also conditionally independent: P\(Xi≤a,Xj≤b\|𝐘=𝐲\)=P\(Xi≤a\|𝐘=𝐲\)P\(Xj≤b\|𝐘=𝐲\)\.\\displaystyle P\(X\_\{i\}\\leq a,X\_\{j\}\\leq b\|\\mathbf\{Y\}=\\mathbf\{y\}\)=P\(X\_\{i\}\\leq a\|\\mathbf\{Y\}=\\mathbf\{y\}\)P\(X\_\{j\}\\leq b\|\\mathbf\{Y\}=\\mathbf\{y\}\)\.This gives us P\(Xi≤a,Xj≤b\)=∫𝐲∈𝒮d−1P\(Xi≤a\|𝐘=𝐲\)P\(Xj≤b\|𝐘=𝐲\)pd\(𝐲\)𝑑y\.\\displaystyle P\(X\_\{i\}\\leq a,X\_\{j\}\\leq b\)=\\int\_\{\\mathbf\{y\}\\in\\mathcal\{S\}^\{d\-1\}\}P\(X\_\{i\}\\leq a\|\\mathbf\{Y\}=\\mathbf\{y\}\)P\(X\_\{j\}\\leq b\|\\mathbf\{Y\}=\\mathbf\{y\}\)p\_\{d\}\(\\mathbf\{y\}\)dy\.Next, we invoke rotational invariance\. Let𝒪\(𝒮\(𝐦\)\)\\mathcal\{O\}\(\\mathcal\{S\}\(\\mathbf\{m\}\)\)be the group of orthogonal transformations \(rotations and reflections\) that map the subspacespan\(𝒮\(𝐦\)\)\\text\{span\}\(\\mathcal\{S\}\(\\mathbf\{m\}\)\)onto itself\. 
For any two vectors𝐲1,𝐲2∈𝒮\(𝐦\)\\mathbf\{y\}\_\{1\},\\mathbf\{y\}\_\{2\}\\in\\mathcal\{S\}\(\\mathbf\{m\}\), there exists a transformation𝐑∈𝒪\(𝒮\(𝐦\)\)\\mathbf\{R\}\\in\\mathcal\{O\}\(\\mathcal\{S\}\(\\mathbf\{m\}\)\)such that𝐲2=𝐑𝐲1\\mathbf\{y\}\_\{2\}=\\mathbf\{R\}\\mathbf\{y\}\_\{1\}\. Because𝐘i\\mathbf\{Y\}\_\{i\}is uniformly distributed over𝒮\(𝐦\)\\mathcal\{S\}\(\\mathbf\{m\}\), its distribution is invariant under𝐑\\mathbf\{R\}\(i\.e\.,𝐑𝐘i=𝑑𝐘i\\mathbf\{R\}\\mathbf\{Y\}\_\{i\}\\overset\{d\}\{=\}\\mathbf\{Y\}\_\{i\}\)\. Therefore: ⟨𝐲2,𝐘i⟩=⟨𝐑𝐲1,𝐘i⟩=⟨𝐲1,𝐑T𝐘i⟩=𝑑⟨𝐲1,𝐘i⟩\\langle\\mathbf\{y\}\_\{2\},\\mathbf\{Y\}\_\{i\}\\rangle=\\langle\\mathbf\{R\}\\mathbf\{y\}\_\{1\},\\mathbf\{Y\}\_\{i\}\\rangle=\\langle\\mathbf\{y\}\_\{1\},\\mathbf\{R\}^\{T\}\\mathbf\{Y\}\_\{i\}\\rangle\\overset\{d\}\{=\}\\langle\\mathbf\{y\}\_\{1\},\\mathbf\{Y\}\_\{i\}\\rangleThis demonstrates that the conditional probabilityP\(Xi≤a\|𝐘=𝐲\)P\(X\_\{i\}\\leq a\|\\mathbf\{Y\}=\\mathbf\{y\}\)is identical for all𝐲∈𝒮\(𝐦\)\\mathbf\{y\}\\in\\mathcal\{S\}\(\\mathbf\{m\}\)\. Consequently, the conditional distribution is equal to the marginal distribution:P\(Xi≤a\|𝐘=𝐲\)=P\(Xi≤a\)P\(X\_\{i\}\\leq a\|\\mathbf\{Y\}=\\mathbf\{y\}\)=P\(X\_\{i\}\\leq a\)\. Similarly, we haveP\(Xj≤b\|𝐘=𝐲\)=P\(Xj≤b\)P\(X\_\{j\}\\leq b\|\\mathbf\{Y\}=\\mathbf\{y\}\)=P\(X\_\{j\}\\leq b\)\. By substituting this back in the integral, we have P\(Xi≤a,Xj≤b\)=P\(Xi≤a\)P\(Xj≤b\)\.\\displaystyle P\(X\_\{i\}\\leq a,X\_\{j\}\\leq b\)=P\(X\_\{i\}\\leq a\)P\(X\_\{j\}\\leq b\)\.This holds for any pairi≠ji\\neq j, and therefore the set of random variables\{X1,…,Xn\}\\\{X\_\{1\},\\dots,X\_\{n\}\\\}are mutually independent\. Next, we characterize the distribution ofXiX\_\{i\}’s\. 
Since the uniform distribution on𝒮d−1\\mathcal\{S\}^\{d\-1\}is invariant under orthogonal transformations, the distribution of the inner productXi=⟨𝐘,𝐘i⟩X\_\{i\}=\\langle\\mathbf\{Y\},\\mathbf\{Y\}\_\{i\}\\rangleis independent of the orientation of the reference vector𝐦\\mathbf\{m\}\. Therefore, without loss of generality, we can choose𝐦=𝐞d=\(0,0,…,0,1\)\\mathbf\{m\}=\\mathbf\{e\}\_\{d\}=\(0,0,\\dots,0,1\)\. For this choice of𝐦\\mathbf\{m\}, we can generate samples fromUnif\(𝒮\(𝐞d\)\)\\text\{Unif\}\\big\(\\mathcal\{S\}\(\\mathbf\{e\}\_\{d\}\)\\big\)as follows: first sample𝐘~∼Unif\(𝒮d−2\)\\tilde\{\\mathbf\{Y\}\}\\sim\\text\{Unif\}\\big\(\\mathcal\{S\}^\{d\-2\}\\big\), then set𝐘=\(𝐘~,0\)\\mathbf\{Y\}=\(\\tilde\{\\mathbf\{Y\}\},0\)\. Similarly, we can write𝐘i=\(𝐘~i,0\)\\mathbf\{Y\}\_\{i\}=\(\\tilde\{\\mathbf\{Y\}\}\_\{i\},0\), where𝐘~i\\tilde\{\\mathbf\{Y\}\}\_\{i\}’s are independent samples fromUnif\(𝒮d−2\)\\text\{Unif\}\\big\(\\mathcal\{S\}^\{d\-2\}\\big\)\. Consequently,Xi=⟨𝐘,𝐘i⟩=⟨𝐘~,𝐘~i⟩X\_\{i\}=\\langle\\mathbf\{Y\},\\mathbf\{Y\}\_\{i\}\\rangle=\\langle\\tilde\{\\mathbf\{Y\}\},\\tilde\{\\mathbf\{Y\}\}\_\{i\}\\rangle\. Note that𝐘~i∼Unif\(𝒮d−2\)\\tilde\{\\mathbf\{Y\}\}\_\{i\}\\sim\\text\{Unif\}\\big\(\\mathcal\{S\}^\{d\-2\}\\big\)is equivalent to𝐘~i\\tilde\{\\mathbf\{Y\}\}\_\{i\}following the vMF distributionp0\(⋅\|𝐘~\)p\_\{0\}\(\\cdot\|\\tilde\{\\mathbf\{Y\}\}\)withκ=0\\kappa=0, and dimensiond−1d\-1\. Therefore, by Lemma[1](https://arxiv.org/html/2605.23944#Thmtheorem1)[\(i\)](https://arxiv.org/html/2605.23944#S2.I3.i1), the probability density function ofXiX\_\{i\}is given byp0,d−1\(x\)=C~d−1\(0\)\(1−x2\)d−42p\_\{0,d\-1\}\(x\)=\\tilde\{C\}\_\{d\-1\}\(0\)\(1\-x^\{2\}\)^\{\\frac\{d\-4\}\{2\}\}\.
- \(iii\)We first show that the posterior distributionqκ\(𝐡\|𝐦\)q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)matches the vMF distributionpκ\(𝐡\|𝐦\)p\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)\(as given in Eq\. \([1](https://arxiv.org/html/2605.23944#S2.E1)\)\)\. Using Bayes’ rule: qκ\(𝐡\|𝐦\)∝pκ\(𝐦\|𝐡\)p\(𝐡\)∝eκ⟨𝐡,𝐦⟩\.\\displaystyle q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)\\propto p\_\{\\kappa\}\(\\mathbf\{m\}\|\\mathbf\{h\}\)p\(\\mathbf\{h\}\)\\propto e^\{\\kappa\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\}\.According to Eq\. \([1](https://arxiv.org/html/2605.23944#S2.E1)\), the normalization constant in the above equation is given byCd\(κ\)C\_\{d\}\(\\kappa\)which gives us, qκ\(𝐡\|𝐦\)=Cd\(κ\)eκ⟨𝐡,𝐦⟩\.\\displaystyle q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)=C\_\{d\}\(\\kappa\)e^\{\\kappa\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\}\. Next, we show the simplification for the KL\-divergence of the conditional distributionqκ\(⋅\|𝐦\)q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)fromp\(⋅\)p\(\\cdot\)\. 
We have DKL\(qκ\(⋅\|𝐦\)∥p\(⋅\)\)\\displaystyle D\_\{\\mathrm\{KL\}\}\\big\(q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\\\|p\(\\cdot\)\\big\)=𝔼𝐡∼qκ\(⋅\|𝐦\)\[logqκ\(𝐡\|𝐦\)p\(𝐡\)\]\\displaystyle=\\mathbb\{E\}\_\{\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\}\\left\[\\log\\frac\{q\_\{\\kappa\}\(\\mathbf\{h\}\|\\mathbf\{m\}\)\}\{p\(\\mathbf\{h\}\)\}\\right\]=𝔼𝐡∼qκ\(⋅\|𝐦\)\[log\(Cd\(κ\)eκ⟨𝐡,𝐦⟩Cd\(0\)\)\]\\displaystyle=\\mathbb\{E\}\_\{\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\}\\left\[\\log\\left\(\\frac\{C\_\{d\}\(\\kappa\)e^\{\\kappa\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\}\}\{C\_\{d\}\(0\)\}\\right\)\\right\]=𝔼𝐡∼qκ\(⋅\|𝐦\)\[logCd\(κ\)−logCd\(0\)\+κ⟨𝐡,𝐦⟩\]\\displaystyle=\\mathbb\{E\}\_\{\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\}\\left\[\\log C\_\{d\}\(\\kappa\)\-\\log C\_\{d\}\(0\)\+\\kappa\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\\right\]=logCd\(κ\)−logCd\(0\)\+κ⋅𝔼𝐡∼qκ\(⋅\|𝐦\)\[⟨𝐡,𝐦⟩\]\.\\displaystyle=\\log C\_\{d\}\(\\kappa\)\-\\log C\_\{d\}\(0\)\+\\kappa\\cdot\\mathbb\{E\}\_\{\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\}\[\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\]\.Further, by Law of Iterated Expectation, 𝔼𝐦\[𝔼𝐡∼qκ\(⋅\|𝐦\)\[⟨𝐡,𝐦⟩\]\]=𝔼𝐡,𝐦\[⟨𝐡,𝐦⟩\]=𝔼\[W\],\\displaystyle\\mathbb\{E\}\_\{\\mathbf\{m\}\}\\left\[\\mathbb\{E\}\_\{\\mathbf\{h\}\\sim q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\}\[\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\]\\right\]=\\mathbb\{E\}\_\{\\mathbf\{h\},\\mathbf\{m\}\}\[\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle\]=\\mathbb\{E\}\[W\],whereWWis a random variable as defined in Lemma[1](https://arxiv.org/html/2605.23944#Thmtheorem1)[\(i\)](https://arxiv.org/html/2605.23944#S2.I3.i1)\. 
Substituting this back in the calculation of KL\-divergence, we have 𝔼𝐦\[DKL\(qκ\(⋅\|𝐦\)∥p\(⋅\)\)\]=κ𝔼\[W\]−logCd\(0\)Cd\(κ\)\.\\displaystyle\\mathbb\{E\}\_\{\\mathbf\{m\}\}\\big\[D\_\{\\mathrm\{KL\}\}\\big\(q\_\{\\kappa\}\(\\cdot\|\\mathbf\{m\}\)\\\|p\(\\cdot\)\\big\)\\big\]=\\kappa\\mathbb\{E\}\[W\]\-\\log\\frac\{C\_\{d\}\(0\)\}\{C\_\{d\}\(\\kappa\)\}\.
∎
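The distributional claims in part (ii) can be probed by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: it takes $\mathbf{m}=\mathbf{e}_d$, so that the equatorial sphere is identified with $\mathcal{S}^{d-2}$, draws $\mathbf{Y},\mathbf{Y}_i,\mathbf{Y}_j$ uniformly on it, and estimates two moments. The density proportional to $(1-x^{2})^{\frac{d-4}{2}}$ implies $\mathbb{E}[X_i^{2}]=\frac{1}{d-1}$, and independence implies $\mathbb{E}[X_iX_j]=0$. A vanishing cross moment is weaker than independence, so this is only a consistency check. The helper names, sample size, and seed are hypothetical.

```python
import math
import random


def uniform_sphere(k: int, rng: random.Random) -> list:
    """Draw a point uniformly on S^{k-1} by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def dot(a: list, b: list) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def moment_check(d: int, trials: int, seed: int = 0):
    """Estimate E[X_i^2] and E[X_i X_j] for X_i = <Y, Y_i>, X_j = <Y, Y_j>.

    Y, Y_i, Y_j are independent uniform draws on S^{d-2}, i.e. on the
    equator orthogonal to m = e_d with the zero last coordinate dropped.
    """
    rng = random.Random(seed)
    sum_sq, sum_cross = 0.0, 0.0
    for _ in range(trials):
        y = uniform_sphere(d - 1, rng)
        x_i = dot(y, uniform_sphere(d - 1, rng))
        x_j = dot(y, uniform_sphere(d - 1, rng))
        sum_sq += x_i * x_i
        sum_cross += x_i * x_j
    return sum_sq / trials, sum_cross / trials


if __name__ == "__main__":
    d = 10
    second_moment, cross_moment = moment_check(d, trials=20_000)
    print(second_moment, 1.0 / (d - 1), cross_moment)
```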
### D.2 Concentration of Message Fidelity
###### Lemma 11\(Concentration ofWW\)\.
Let $W\in[-1,1]$ follow the density function $p_{\kappa,d}(w)$ (see Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1)). Suppose $\rho\in[0,1)$ is fixed and $\kappa=\frac{\rho}{1-\rho^{2}}d^{\star}$ with $d^{\star}=d-3$. Then, for all $d^{\star}>0$,
$$\mathbb{E}\Big[\exp\big(t(W-\mathbb{E}[W])\big)\Big]\leq\exp\left(\frac{t^{2}}{2d^{\star}}\right),\qquad\mathbb{P}\big(|W-\mathbb{E}[W]|>\varepsilon\big)\leq 2\exp\left(-\frac{d^{\star}\varepsilon^{2}}{2}\right).\tag{13}$$
Further, there exists a universal constant $K$ such that
$$\frac{K(1-\rho^{2})}{\sqrt{d^{\star}}}\exp\left(\frac{d^{\star}\rho^{2}}{1-\rho^{2}}+\frac{d^{\star}}{2}\log(1-\rho^{2})\right)\leq\int_{-1}^{1}\exp\left(\frac{d^{\star}\rho w}{1-\rho^{2}}+\frac{d^{\star}}{2}\log(1-w^{2})\right)dw$$
and
$$\int_{-1}^{1}\exp\left(\frac{d^{\star}\rho w}{1-\rho^{2}}+\frac{d^{\star}}{2}\log(1-w^{2})\right)dw\leq\sqrt{\frac{2\pi}{d^{\star}}}\exp\left(\frac{d^{\star}\rho^{2}}{1-\rho^{2}}+\frac{d^{\star}}{2}\log(1-\rho^{2})\right).$$
Finally, there exists a universal constant $\bar{K}$ such that
$$\mathbb{E}[|W-\rho|]\leq\frac{\bar{K}}{\sqrt{d^{\star}(1-\rho^{2})}}.$$
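The upper bound on the integral in Lemma 11 can be checked numerically. Writing $I(w)=-\kappa w-\frac{d^{\star}}{2}\log(1-w^{2})$ with $\kappa=\frac{\rho}{1-\rho^{2}}d^{\star}$, the bound is equivalent to $\int_{-1}^{1}e^{-(I(w)-I(\rho))}dw\leq\sqrt{2\pi/d^{\star}}$. The sketch below evaluates the left-hand side with a midpoint rule, working with $I(w)-I(\rho)$ to avoid overflow. The function names, the quadrature resolution, and the test values of $(\rho,d^{\star})$ are assumptions made for illustration.

```python
import math


def potential(w: float, rho: float, d_star: float) -> float:
    """I(w) = -kappa * w - (d*/2) * log(1 - w^2), with kappa = rho / (1 - rho^2) * d*."""
    kappa = rho / (1.0 - rho**2) * d_star
    return -kappa * w - 0.5 * d_star * math.log(1.0 - w * w)


def normalized_integral(rho: float, d_star: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of exp(-(I(w) - I(rho))) over (-1, 1)."""
    i_rho = potential(rho, rho, d_star)
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        w = -1.0 + (k + 0.5) * h  # midpoints stay strictly inside (-1, 1)
        total += math.exp(-(potential(w, rho, d_star) - i_rho))
    return total * h


def gaussian_upper_bound(d_star: float) -> float:
    """Right-hand side sqrt(2 * pi / d*) after dividing both sides by exp(-I(rho))."""
    return math.sqrt(2.0 * math.pi / d_star)


if __name__ == "__main__":
    for rho, d_star in ((0.0, 50.0), (0.5, 50.0), (0.9, 20.0)):
        print(rho, d_star, normalized_integral(rho, d_star), gaussian_upper_bound(d_star))
```

The bound is tightest near $\rho=0$, where the curvature of $I$ at its minimizer equals $d^{\star}$, and becomes looser as $\rho$ grows because the curvature at $w=\rho$ exceeds $d^{\star}$.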
###### Proof\.
We prove the result in three parts\.
1. Concentration of $W$ around the mean: Consider the potential function $I(w)$ such that the density is expressed as $p_{\kappa,d}(w)\propto e^{-I(w)}$, where $I(w):=-\kappa w-\frac{d^{\star}}{2}\log(1-w^{2})$. We first compute the second derivative of $I(w)$ on the support $(-1,1)$: $I^{\prime}(w)=-\kappa+\frac{d^{\star}w}{1-w^{2}}$ and $I^{\prime\prime}(w)=d^{\star}\left(\frac{1+w^{2}}{(1-w^{2})^{2}}\right)$. As can be easily observed, $I^{\prime\prime}$ is minimized at $w=0$, which gives the uniform lower bound $I^{\prime\prime}(w)\geq I^{\prime\prime}(0)=d^{\star}$. By the Bakry-Émery theorem, the distribution of $W$ satisfies a Log-Sobolev Inequality with constant $1/d^{\star}$. It follows from Herbst's argument that $W$ is sub-Gaussian with parameter $1/d^{\star}$, i.e., $\mathbb{E}\Big[\exp\big(t(W-\mathbb{E}[W])\big)\Big]\leq\exp\left(\frac{t^{2}}{2d^{\star}}\right)$. We refer the reader to \[Ledoux, [2001](https://arxiv.org/html/2605.23944#bib.bib42), Theorem 5.2\] and \[Bakry et al., [2013](https://arxiv.org/html/2605.23944#bib.bib41)\] for more details on the Bakry-Émery theorem and Herbst's argument. Finally, the tail bound follows from a standard Chernoff bound.
2. Bound on the integral: As already shown, $I^{\prime\prime}(w)\geq d^{\star}$, which gives us that
   $$I(w)\geq I(\rho)+I^{\prime}(\rho)(w-\rho)+\frac{1}{2}d^{\star}(w-\rho)^{2},\quad\forall w\in(-1,1).$$
   Under the condition that $\kappa=\frac{\rho}{1-\rho^{2}}d^{\star}$, we get that $I^{\prime}(\rho)=0$. As such, for all $w\in(-1,1)$,
   $$I(w)\geq I(\rho)+\frac{1}{2}d^{\star}(w-\rho)^{2}.\tag{14}$$
   This in turn implies that
   $$\int_{-1}^{1}e^{-I(w)}dw\leq e^{-I(\rho)}\int_{-\infty}^{\infty}\exp\left(-\frac{1}{2}d^{\star}(w-\rho)^{2}\right)dw=\sqrt{\frac{2\pi}{d^{\star}}}e^{-I(\rho)}.$$
   We define
   $$\hat{\rho}=\max\Big\{\rho,\frac{1}{\sqrt{d^{\star}}}\Big\},$$
   and we consider the neighborhood $(-\hat{\rho},\hat{\rho})$. We also define
   $$H_{\max}:=\frac{1}{d^{\star}}\max_{w\in(-\hat{\rho},\hat{\rho})}I^{\prime\prime}(w)=\frac{1}{d^{\star}}I^{\prime\prime}(\hat{\rho})=\frac{1+\hat{\rho}^{2}}{(1-\hat{\rho}^{2})^{2}}.$$
   Then, by Taylor's theorem, for all $w\in(-\hat{\rho},\hat{\rho})$, there exists a $\xi\in[w,\rho]$ such that
   $$\begin{aligned}
   I(w)&=I(\rho)+I^{\prime}(\rho)(w-\rho)+\frac{1}{2}I^{\prime\prime}(\xi)(w-\rho)^{2}\\
   &\stackrel{(a)}{=}I(\rho)+\frac{1}{2}I^{\prime\prime}(\xi)(w-\rho)^{2}\\
   &\leq I(\rho)+\frac{1}{2}\Big(\max_{\xi\in[w,\rho]}|I^{\prime\prime}(\xi)|\Big)\times(w-\rho)^{2}\\
   &\leq I(\rho)+\frac{1}{2}\Big(\max_{\xi\in(-\hat{\rho},\hat{\rho})}|I^{\prime\prime}(\xi)|\Big)\times(w-\rho)^{2}\\
   &=I(\rho)+\frac{1}{2}d^{\star}H_{\max}(w-\rho)^{2},
   \end{aligned}$$
   where (a) uses $I^{\prime}(\rho)=0$. Substituting this into the integral yields:
   $$\begin{aligned}
   \int_{-1}^{1}e^{-I(w)}dw&\geq\int_{-\hat{\rho}}^{\hat{\rho}}e^{-I(w)}dw\\
   &\geq\int_{-\hat{\rho}}^{\hat{\rho}}\exp\left(-I(\rho)-\frac{1}{2}d^{\star}H_{\max}(w-\rho)^{2}\right)dw\\
   &=e^{-I(\rho)}\times\frac{1}{\sqrt{d^{\star}H_{\max}}}\int_{-(\hat{\rho}+\rho)\sqrt{d^{\star}H_{\max}}}^{(\hat{\rho}-\rho)\sqrt{d^{\star}H_{\max}}}\exp\left(-\frac{1}{2}u^{2}\right)du,
   \end{aligned}$$
   where we used the substitution $u=\sqrt{d^{\star}H_{\max}}(w-\rho)$. Here, we have that
   $$(\hat{\rho}-\rho)\sqrt{d^{\star}H_{\max}}=\Big(\max\Big\{\rho,\frac{1}{\sqrt{d^{\star}}}\Big\}-\rho\Big)\sqrt{d^{\star}H_{\max}}\geq 0,$$
   and
   $$(\hat{\rho}+\rho)\sqrt{d^{\star}H_{\max}}=\Big(\max\Big\{\rho,\frac{1}{\sqrt{d^{\star}}}\Big\}+\rho\Big)\sqrt{d^{\star}H_{\max}}\geq\Big(1+\rho\sqrt{d^{\star}}\Big)\sqrt{\frac{1+\rho^{2}}{(1-\rho^{2})^{2}}}\geq 1.$$
   Substituting these in the previous equation gives us that
   $$\begin{aligned}
   \int_{-1}^{1}e^{-I(w)}dw&\geq e^{-I(\rho)}\times\frac{1}{\sqrt{d^{\star}H_{\max}}}\int_{-1}^{0}\exp\left(-\frac{1}{2}u^{2}\right)du\\
   &=e^{-I(\rho)}\frac{1-\rho^{2}}{\sqrt{d^{\star}(1+\rho^{2})}}\int_{-1}^{0}\exp\left(-\frac{1}{2}u^{2}\right)du\geq\frac{K(1-\rho^{2})}{\sqrt{d^{\star}}}e^{-I(\rho)},
   \end{aligned}$$
   where we choose $K=\frac{1}{\sqrt{2}}\int_{-1}^{0}\exp\left(-\frac{1}{2}u^{2}\right)du$.
3. Second moment around $\rho$: We have
   $$\begin{aligned}
   \mathbb{E}[(W-\rho)^{2}]&=\Big[\int_{-1}^{1}\exp\left(-I(w)\right)dw\Big]^{-1}\int_{-1}^{1}(w-\rho)^{2}\exp\left(-I(w)\right)dw\\
   &\stackrel{(a)}{\leq}\frac{\sqrt{d^{\star}}}{K(1-\rho^{2})}e^{I(\rho)}\int_{-1}^{1}(w-\rho)^{2}\exp\left(-I(w)\right)dw\\
   &\stackrel{(b)}{\leq}\frac{\sqrt{d^{\star}}}{K(1-\rho^{2})}e^{I(\rho)}\int_{-1}^{1}(w-\rho)^{2}\exp\left(-I(\rho)-\frac{1}{2}d^{\star}(w-\rho)^{2}\right)dw\\
   &\leq\frac{\sqrt{d^{\star}}}{K(1-\rho^{2})}\int_{-\infty}^{\infty}(w-\rho)^{2}\exp\left(-\frac{1}{2}d^{\star}(w-\rho)^{2}\right)dw\\
   &=\frac{\sqrt{d^{\star}}}{K(1-\rho^{2})}\sqrt{\frac{2\pi}{(d^{\star})^{3}}}=\frac{\sqrt{2\pi}}{Kd^{\star}(1-\rho^{2})},
   \end{aligned}$$
   where (a) follows from the lower bound on $\int_{-1}^{1}\exp\left(-I(w)\right)dw$ established in the second part, and (b) follows from Eq. ([14](https://arxiv.org/html/2605.23944#A4.E14)). The result now follows by using $\mathbb{E}|W-\rho|\leq\sqrt{\mathbb{E}[(W-\rho)^{2}]}$ and choosing $\bar{K}=\big(\sqrt{2\pi}/K\big)^{1/2}$.
∎
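As a numerical sanity check on the second part of the lemma, the following sketch evaluates $e^{I(\rho)}\int_{-1}^{1}e^{-I(w)}dw$ with a midpoint rule and compares it against the two bounds $K(1-\rho^{2})/\sqrt{d^{\star}}$ and $\sqrt{2\pi/d^{\star}}$. The function names, the grid size, and the sample values of $\rho$ and $d^{\star}$ are illustrative assumptions and do not come from the paper.

```python
import math

def potential(w, kappa, d_star):
    """I(w) = -kappa*w - (d_star/2)*log(1 - w^2), defined on (-1, 1)."""
    return -kappa * w - 0.5 * d_star * math.log1p(-w * w)

def scaled_integral(rho, d_star, n_grid=100_000):
    """Midpoint-rule estimate of exp(I(rho)) * integral of exp(-I(w)) over (-1, 1).

    kappa is tied to rho through kappa = rho / (1 - rho^2) * d_star, so rho
    minimizes I and the exponents below are non-positive up to rounding.
    """
    kappa = rho / (1.0 - rho * rho) * d_star
    i_rho = potential(rho, kappa, d_star)
    step = 2.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        w = -1.0 + (k + 0.5) * step  # midpoints never touch the endpoints +-1
        total += math.exp(i_rho - potential(w, kappa, d_star))
    return total * step

def lemma_constant():
    """K = (1/sqrt(2)) * integral_{-1}^{0} exp(-u^2/2) du, via the error function."""
    half_gaussian = math.sqrt(math.pi / 2.0) * math.erf(1.0 / math.sqrt(2.0))
    return half_gaussian / math.sqrt(2.0)

def bounds(rho, d_star):
    """Lower and upper bounds of part 2, both multiplied by exp(I(rho))."""
    lower = lemma_constant() * (1.0 - rho * rho) / math.sqrt(d_star)
    upper = math.sqrt(2.0 * math.pi / d_star)
    return lower, upper
```

For instance, `scaled_integral(0.5, 50)` can be compared with the pair returned by `bounds(0.5, 50)`; the upper bound is nearly tight at $\rho=0$ and becomes looser as $\rho$ grows.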
## Appendix EProofs of Results in Section[3\.2](https://arxiv.org/html/2605.23944#S3.SS2)
### E\.1Proof of Proposition[2](https://arxiv.org/html/2605.23944#Thmtheorem2)
###### Proof\.
Recall the representation \(Eq\. \([3](https://arxiv.org/html/2605.23944#S2.E3)\)\),⟨𝐡,𝜽i⟩=WWi\+1−W21−Wi2Xi,\\langle\\mathbf\{h\},\\bm\{\\theta\}\_\{i\}\\rangle=WW\_\{i\}\+\\sqrt\{1\-W^\{2\}\}\\sqrt\{1\-W\_\{i\}^\{2\}\}X\_\{i\},where, conditional on the message𝐦\\mathbf\{m\},
W=⟨𝐡,𝐦⟩,Wi=⟨𝜽i,𝐦⟩,Xi=⟨𝐘,𝐘i⟩,⟨𝐘,𝐦⟩=⟨𝐘i,𝐦⟩=0,\\displaystyle W=\\langle\\mathbf\{h\},\\mathbf\{m\}\\rangle,\\qquad W\_\{i\}=\\langle\\bm\{\\theta\}\_\{i\},\\mathbf\{m\}\\rangle,\\qquad X\_\{i\}=\\langle\\mathbf\{Y\},\\mathbf\{Y\}\_\{i\}\\rangle,\\quad\\langle\\mathbf\{Y\},\\mathbf\{m\}\\rangle=\\langle\\mathbf\{Y\}\_\{i\},\\mathbf\{m\}\\rangle=0,
Let $\rho\in[0,1)$ and $\alpha\geq 0$ be fixed and suppose $\kappa=\frac{\rho}{1-\rho^{2}}d^{\star}$, where $d^{\star}=d-3$. We first bound the difference between the actual utility and the utility evaluated at the concentration point $\rho$. For notational simplicity, we define $U_{i}(W):=WW_{i}+\sqrt{1-W^{2}}\sqrt{1-W_{i}^{2}}X_{i}$. Using the fact that $W_{i},X_{i}\in[-1,1]$ and the triangle inequality, we have
$$\begin{aligned}
\Big|U_{i}(W)-U_{i}(\rho)\Big|&\leq|W_{i}||W-\rho|+\sqrt{1-W_{i}^{2}}\,|X_{i}|\left|\sqrt{1-W^{2}}-\sqrt{1-\rho^{2}}\right|\\
&\leq|W-\rho|+\left|\sqrt{1-W^{2}}-\sqrt{1-\rho^{2}}\right|\\
&\leq|W-\rho|\left(1+\frac{2}{\sqrt{1-\rho^{2}}}\right)\leq\frac{3|W-\rho|}{\sqrt{1-\rho^{2}}},
\end{aligned}$$
where we use the identity for the difference of square roots:
\|1−W2−1−ρ2\|=\|W2−ρ2\|1−W2\+1−ρ2≤\|W\+ρ\|\|W−ρ\|1−ρ2≤2\|W−ρ\|1−ρ2\.\\displaystyle\\left\|\\sqrt\{1\-W^\{2\}\}\-\\sqrt\{1\-\\rho^\{2\}\}\\right\|=\\frac\{\|W^\{2\}\-\\rho^\{2\}\|\}\{\\sqrt\{1\-W^\{2\}\}\+\\sqrt\{1\-\\rho^\{2\}\}\}\\leq\\frac\{\|W\+\\rho\|\|W\-\\rho\|\}\{\\sqrt\{1\-\\rho^\{2\}\}\}\\leq\\frac\{2\|W\-\\rho\|\}\{\\sqrt\{1\-\\rho^\{2\}\}\}\.\(15\)From Lemma[11](https://arxiv.org/html/2605.23944#Thmtheorem11), we have that
$$\mathbb{E}[|W-\rho|]\leq\frac{\bar{K}}{\sqrt{d^{\star}(1-\rho^{2})}}.$$
Combining these results, and noting that $|\max_{i}a_{i}-\max_{i}b_{i}|\leq\max_{i}|a_{i}-b_{i}|$ for any two finite collections of real numbers, the expected maximum utility satisfies:
𝔼\[\|maxiUi\(W\)−maxiUi\(ρ\)\|\]≤3K¯d⋆\(1−ρ2\)\.\\displaystyle\\mathbb\{E\}\\Big\[\\big\|\\max\_\{i\}U\_\{i\}\(W\)\-\\max\_\{i\}U\_\{i\}\(\\rho\)\\big\|\\Big\]\\leq\\frac\{3\\bar\{K\}\}\{\\sqrt\{d^\{\\star\}\}\(1\-\\rho^\{2\}\)\}\.\(16\)
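The pointwise bound $|U_{i}(W)-U_{i}(\rho)|\leq 3|W-\rho|/\sqrt{1-\rho^{2}}$ that drives Eq. (16) can be spot-checked on random inputs. The sketch below is only an illustration: the helper names, the sampling ranges, and the seed are assumptions, and a random search does not replace the proof.

```python
import math
import random

def utility(w, w_i, x_i):
    """U_i(w) = w*w_i + sqrt(1 - w^2) * sqrt(1 - w_i^2) * x_i."""
    return w * w_i + math.sqrt(1.0 - w * w) * math.sqrt(1.0 - w_i * w_i) * x_i

def lipschitz_bound(w, rho):
    """Right-hand side 3|w - rho| / sqrt(1 - rho^2) of the displayed bound."""
    return 3.0 * abs(w - rho) / math.sqrt(1.0 - rho * rho)

def worst_slack(num_trials=20_000, seed=0):
    """Smallest value of bound - |U_i(W) - U_i(rho)| over random draws.

    A non-negative result (up to rounding) is consistent with the inequality.
    """
    rng = random.Random(seed)
    slack = math.inf
    for _ in range(num_trials):
        rho = rng.uniform(0.0, 0.99)  # keep rho away from 1 to avoid dividing by 0
        w, w_i, x_i = (rng.uniform(-1.0, 1.0) for _ in range(3))
        gap = abs(utility(w, w_i, x_i) - utility(rho, w_i, x_i))
        slack = min(slack, lipschitz_bound(w, rho) - gap)
    return slack
```

Because the bound holds uniformly in $i$, it carries over to the difference of the two maxima, which is how the expectation bound is obtained.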
We define the simplified utility function as,
$$V(w,x):=\rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x.\tag{17}$$
We define the tail probability $G(t)$ as the probability that a single sample $(W_{1},X_{1})$ yields a utility of at least $t$,
G\(t\):=ℙ\(V\(W,X\)≥t\)=∬𝒜tpκ,d\(w\)×p0,d−1\(x\)𝑑w𝑑x,\\displaystyle G\(t\):=\\mathbb\{P\}\\left\(V\(W,X\)\\geq t\\right\)=\\iint\_\{\\mathcal\{A\}\_\{t\}\}p\_\{\\kappa,d\}\(w\)\\times p\_\{0,d\-1\}\(x\)dwdx,\(18\)where the set𝒜t\\mathcal\{A\}\_\{t\}is the region of the domain satisfying the utility threshold, i\.e\.,𝒜t:=\{\(w,x\)∈\(−1,1\)2:V\(w,x\)≥t\}\.\\mathcal\{A\}\_\{t\}:=\\\{\(w,x\)\\in\(\-1,1\)^\{2\}:V\(w,x\)\\geq t\\\}\.Finally, we denote the minimum rate required to achieve utilityttasJ\(t\)J\(t\), which corresponds to the optimization problem:
$$J(t):=\inf_{w,x\in(-1,1)}\big\{I_{\rho}(w,x)\ \text{ such that }V(w,x)\geq t\big\}.\tag{19}$$
Using the closed-form expressions for the marginal densities, we can express their product in terms of the large deviations rate function:
pκ,d\(w\)p0,d−1\(x\)=Aρ,d1−x2exp\(−d⋆Iρ\(w,x\)\),\\displaystyle p\_\{\\kappa,d\}\(w\)p\_\{0,d\-1\}\(x\)=\\frac\{A\_\{\\rho,d\}\}\{\\sqrt\{1\-x^\{2\}\}\}\\exp\\left\(\-d^\{\\star\}I\_\{\\rho\}\(w,x\)\\right\),\(20\)where the normalization constant is given by
Aρ,d−1=∫−11∫−1111−x2exp\(−d⋆Iρ\(w,x\)\)𝑑w𝑑x\.\\displaystyle A\_\{\\rho,d\}^\{\-1\}=\\int\_\{\-1\}^\{1\}\\int\_\{\-1\}^\{1\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}\\exp\\left\(\-d^\{\\star\}I\_\{\\rho\}\(w,x\)\\right\)dwdx\.We prove the main result in multiple parts\. The first part is the following claim that uses the result in Lemma[11](https://arxiv.org/html/2605.23944#Thmtheorem11)\.
###### Claim 12\.
SupposeKKis a universal constant as mentioned in Lemma[11](https://arxiv.org/html/2605.23944#Thmtheorem11)\. Then, we have
d−42π≤Aρ,d≤d−3K2\.\\displaystyle\\frac\{d\-4\}\{2\\pi\}\\leq A\_\{\\rho,d\}\\leq\\frac\{d\-3\}\{K^\{2\}\}\.
###### Proof of Claim[12](https://arxiv.org/html/2605.23944#Thmtheorem12)\.
We have that
1Aρ,d\\displaystyle\\frac\{1\}\{A\_\{\\rho,d\}\}=∫−11∫−1111−x2exp\(−d⋆Iρ\(w,x\)\)𝑑w𝑑x\\displaystyle=\\int\_\{\-1\}^\{1\}\\int\_\{\-1\}^\{1\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}\\exp\\left\(\-d^\{\\star\}I\_\{\\rho\}\(w,x\)\\right\)dwdx=∫−11∫−11exp\(d⋆ρ\(w−ρ\)1−ρ2\+d⋆2log\(1−w2\)\(1−ρ2\)\+d⋆−12log\(1−x2\)\)𝑑w𝑑x\\displaystyle=\\int\_\{\-1\}^\{1\}\\int\_\{\-1\}^\{1\}\\exp\\left\(\\frac\{d^\{\\star\}\\rho\(w\-\\rho\)\}\{1\-\\rho^\{2\}\}\+\\frac\{d^\{\\star\}\}\{2\}\\log\\frac\{\(1\-w^\{2\}\)\}\{\(1\-\\rho^\{2\}\)\}\+\\frac\{d^\{\\star\}\-1\}\{2\}\\log\(1\-x^\{2\}\)\\right\)dwdx=∫−11exp\(d⋆ρ\(w−ρ\)1−ρ2\+d⋆2log\(1−w2\)\(1−ρ2\)\)𝑑w∫−11exp\(d⋆−12log\(1−x2\)\)𝑑x\.\\displaystyle=\\int\_\{\-1\}^\{1\}\\exp\\left\(\\frac\{d^\{\\star\}\\rho\(w\-\\rho\)\}\{1\-\\rho^\{2\}\}\+\\frac\{d^\{\\star\}\}\{2\}\\log\\frac\{\(1\-w^\{2\}\)\}\{\(1\-\\rho^\{2\}\)\}\\right\)dw\\int\_\{\-1\}^\{1\}\\exp\\left\(\\frac\{d^\{\\star\}\-1\}\{2\}\\log\(1\-x^\{2\}\)\\right\)dx\.Then, from Lemma[11](https://arxiv.org/html/2605.23944#Thmtheorem11), we know that
K1−ρ2d⋆≤∫−11exp\(d⋆ρ\(w−ρ\)1−ρ2\+d⋆2log\(1−w2\)\(1−ρ2\)\)𝑑w≤2πd⋆\\displaystyle K\\frac\{1\-\\rho^\{2\}\}\{\\sqrt\{d^\{\\star\}\}\}\\leq\\int\_\{\-1\}^\{1\}\\exp\\left\(\\frac\{d^\{\\star\}\\rho\(w\-\\rho\)\}\{1\-\\rho^\{2\}\}\+\\frac\{d^\{\\star\}\}\{2\}\\log\\frac\{\(1\-w^\{2\}\)\}\{\(1\-\\rho^\{2\}\)\}\\right\)dw\\leq\\sqrt\{\\frac\{2\\pi\}\{d^\{\\star\}\}\}Next, by simply substitutingρ=0\\rho=0andd⋆→d⋆−1d^\{\\star\}\\rightarrow d^\{\\star\}\-1in the previous equation,
Kd⋆−1≤∫−11exp\(d⋆−12log\(1−x2\)\)𝑑x≤2πd⋆−1\.\\displaystyle\\frac\{K\}\{\\sqrt\{d^\{\\star\}\-1\}\}\\leq\\int\_\{\-1\}^\{1\}\\exp\\left\(\\frac\{d^\{\\star\}\-1\}\{2\}\\log\(1\-x^\{2\}\)\\right\)dx\\leq\\sqrt\{\\frac\{2\\pi\}\{d^\{\\star\}\-1\}\}\.Thus,
K2d⋆≤1Aρ,d≤2πd⋆−1\.\\displaystyle\\frac\{K^\{2\}\}\{d^\{\\star\}\}\\leq\\frac\{1\}\{A\_\{\\rho,d\}\}\\leq\\frac\{2\\pi\}\{d^\{\\star\}\-1\}\.This completes the proof of the claim\. ∎
Next we make the claim presenting an upper bound on the probabilityG\(t\)G\(t\)\.
###### Claim 13\.
There exists a universal constantK1K\_\{1\}such that,
G\(t\)≤K1d⋆exp\(−d⋆J\(t\)\)\.\\displaystyle G\(t\)\\leq K\_\{1\}d^\{\\star\}\\exp\\left\(\-d^\{\\star\}J\(t\)\\right\)\.
###### Proof of Claim[13](https://arxiv.org/html/2605.23944#Thmtheorem13)\.
From the definition ofJ\(t\)J\(t\), we have thatIρ\(w,x\)≥J\(t\)I\_\{\\rho\}\(w,x\)\\geq J\(t\)for all\(w,x\)∈𝒜t\(w,x\)\\in\\mathcal\{A\}\_\{t\}\. This gives us that
G\(t\)\\displaystyle G\(t\)≤exp\(−d⋆J\(t\)\)⋅Aρ,d∬𝒜t11−x2𝑑w𝑑x≤2πAρ,dexp\(−d⋆J\(t\)\),\\displaystyle\\leq\\exp\\left\(\-d^\{\\star\}J\(t\)\\right\)\\cdot A\_\{\\rho,d\}\\iint\_\{\\mathcal\{A\}\_\{t\}\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}dwdx\\leq 2\\pi A\_\{\\rho,d\}\\exp\\left\(\-d^\{\\star\}J\(t\)\\right\),where we use that
∬𝒜t11−x2𝑑w𝑑x≤∫−11∫−1111−x2𝑑w𝑑x=2π\.\\displaystyle\\iint\_\{\\mathcal\{A\}\_\{t\}\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}dwdx\\leq\\int\_\{\-1\}^\{1\}\\int\_\{\-1\}^\{1\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}dwdx=2\\pi\.Afterwards, the result follows by using thatAρ,d≤d−3K2=d⋆K2A\_\{\\rho,d\}\\leq\\frac\{d\-3\}\{K^\{2\}\}=\\frac\{d^\{\\star\}\}\{K^\{2\}\}, and choosing the constantK1=2πK2K\_\{1\}=\\frac\{2\\pi\}\{K^\{2\}\}\. This completes the proof of the claim\. ∎
Next, we prove the lower bound on the probabilityG\(t\)G\(t\)\.
###### Claim 14\.
There exists a constantK2K\_\{2\}such that,
G\(t\)≥K2d⋆exp\(−d⋆J\(t\)\)\.\\displaystyle G\(t\)\\geq\\frac\{K\_\{2\}\}\{\\sqrt\{d^\{\\star\}\}\}\\exp\\left\(\-d^\{\\star\}J\(t\)\\right\)\.
###### Proof of Claim[14](https://arxiv.org/html/2605.23944#Thmtheorem14)\.
Let𝐳t=\(wt,xt\)\\mathbf\{z\}\_\{t\}=\(w\_\{t\},x\_\{t\}\)be the point on the boundaryV\(w,x\)=tV\(w,x\)=tthat minimizes the rate functionIρ\(w,x\)I\_\{\\rho\}\(w,x\)\. By the first\-order optimality conditions, we have that∇Iρ\(𝐳t\)\\nabla I\_\{\\rho\}\(\\mathbf\{z\}\_\{t\}\)must be parallel to∇V\(𝐳t\)\\nabla V\(\\mathbf\{z\}\_\{t\}\)\. We define an orthonormal basis \(𝐞⟂,𝐞∥\)\\mathbf\{e\}\_\{\\perp\},\\mathbf\{e\}\_\{\\parallel\}\)centered at𝐳t\\mathbf\{z\}\_\{t\}as follows,
$$\mathbf{e}_{\perp}:=\frac{\nabla V(w_{t},x_{t})}{\|\nabla V(w_{t},x_{t})\|},\quad\mathbf{e}_{\parallel}:=\begin{pmatrix}-e_{\perp,x}\\ e_{\perp,w}\end{pmatrix}.\tag{21}$$
Here,𝐞⟂\\mathbf\{e\}\_\{\\perp\}is the unit vector normal to the level set\{V\(w,x\)=t\}\\\{V\(w,x\)=t\\\}pointing into the set𝒜t\\mathcal\{A\}\_\{t\}, and𝐞∥\\mathbf\{e\}\_\{\\parallel\}is the corresponding unit tangent vector\. Any point𝐳=\(w,x\)\\mathbf\{z\}=\(w,x\)in a neighborhood of𝐳t\\mathbf\{z\}\_\{t\}can be represented in these local coordinates \(r,sr,s\) as
𝐳=𝐳t\+r𝐞⟂\+s𝐞∥,\\displaystyle\\mathbf\{z\}=\\mathbf\{z\}\_\{t\}\+r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\},\(22\)Since the transformation is a rotation followed by a translation, the Jacobian determinant is\|det\(𝐞⟂,𝐞∥\)\|=1\|\\text\{det\}\(\\mathbf\{e\}\_\{\\perp\},\\mathbf\{e\}\_\{\\parallel\}\)\|=1, ensuring that the area element satisfiesdwdx=drdsdwdx=drds\.
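The change of variables can be made concrete with a short sketch that builds $\mathbf{e}_{\perp}$ and $\mathbf{e}_{\parallel}$ from the gradient of $V$ and confirms that the map $(r,s)\mapsto\mathbf{z}$ has unit Jacobian determinant. The function names and the sample point are hypothetical; the gradient formula is obtained by differentiating $V(w,x)=\rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x$ directly.

```python
import math

def grad_v(w, x, rho):
    """Gradient of V(w, x) = rho*w + sqrt(1 - rho^2) * sqrt(1 - w^2) * x."""
    c = math.sqrt(1.0 - rho * rho)
    root = math.sqrt(1.0 - w * w)
    return (rho - c * w * x / root, c * root)

def local_basis(w, x, rho):
    """Unit normal e_perp = grad V / |grad V| and tangent e_par = (-e_perp_x, e_perp_w)."""
    g_w, g_x = grad_v(w, x, rho)
    norm = math.hypot(g_w, g_x)  # strictly positive because dV/dx > 0 on the open square
    e_perp = (g_w / norm, g_x / norm)
    e_par = (-e_perp[1], e_perp[0])
    return e_perp, e_par

def to_global(z_t, r, s, e_perp, e_par):
    """Map local coordinates (r, s) to z = z_t + r*e_perp + s*e_par."""
    return (z_t[0] + r * e_perp[0] + s * e_par[0],
            z_t[1] + r * e_perp[1] + s * e_par[1])

def jacobian_det(e_perp, e_par):
    """Determinant of the 2x2 matrix whose columns are e_perp and e_par."""
    return e_perp[0] * e_par[1] - e_par[0] * e_perp[1]
```

Since $\mathbf{e}_{\parallel}$ is $\mathbf{e}_{\perp}$ rotated by a quarter turn, the determinant equals $e_{\perp,w}^{2}+e_{\perp,x}^{2}=1$ at every point, not only at the optimizer $\mathbf{z}_{t}$.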
Consider a neighborhood𝒩d\\mathcal\{N\}\_\{d\}around𝐳t\\mathbf\{z\}\_\{t\}defined in local coordinates by
𝒩d\(𝐳t\)=\{𝐳=𝐳t\+r𝐞⟂\+s𝐞∥:\|r\|≤d−1/2,\|s\|≤d−1/2\}\\displaystyle\\mathcal\{N\}\_\{d\}\(\\mathbf\{z\}\_\{t\}\)=\\big\\\{\\mathbf\{z\}=\\mathbf\{z\}\_\{t\}\+r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\}:\|r\|\\leq d^\{\-1/2\},\\ \|s\|\\leq d^\{\-1/2\}\\big\\\}Since𝐳t\\mathbf\{z\}\_\{t\}lies in the interior of the domain\(−1,1\)2\(\-1,1\)^\{2\}, the utility functionVVis twice\-continuously differentiable on𝒩d\(𝐳t\)\\mathcal\{N\}\_\{d\}\(\\mathbf\{z\}\_\{t\}\)for sufficiently largedd\. Because the Hessian of functionVV,𝐇V\\mathbf\{H\}\_\{V\}is continuous, its eigenvalues are bounded on this compact neighborhood\. Thus, there exists a constantKt\>0K\_\{t\}\>0, independent ofdd, such that the minimum eigenvalue of𝐇V\\mathbf\{H\}\_\{V\}satisfies
λmin\(𝐇V\(𝝃\)\)≥−Ktfor all𝝃∈𝒩d\.\\displaystyle\\lambda\_\{\\min\}\(\\mathbf\{H\}\_\{V\}\(\\bm\{\\xi\}\)\)\\geq\-K\_\{t\}\\quad\\text\{for all \}\\bm\{\\xi\}\\in\\mathcal\{N\}\_\{d\}\.
Applying Taylor’s theorem at𝐳t\\mathbf\{z\}\_\{t\}, for any point𝐳∈𝒩d\(𝐳t\)\\mathbf\{z\}\\in\\mathcal\{N\}\_\{d\}\(\\mathbf\{z\}\_\{t\}\), there exists some𝝃\\bm\{\\xi\}such that
V\(r,s\)\\displaystyle V\(r,s\)=V\(𝐳t\)\+∇V\(𝐳t\)⊤\(r𝐞⟂\+s𝐞∥\)\+12\(r𝐞⟂\+s𝐞∥\)⊤𝐇V\(𝝃\)\(r𝐞⟂\+s𝐞∥\)\\displaystyle=V\(\\mathbf\{z\}\_\{t\}\)\+\\nabla V\(\\mathbf\{z\}\_\{t\}\)^\{\\top\}\(r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\}\)\+\\frac\{1\}\{2\}\(r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\}\)^\{\\top\}\\mathbf\{H\}\_\{V\}\(\\bm\{\\xi\}\)\(r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\}\)≥t\+‖∇V\(𝐳t\)‖r−Kt2\(r2\+s2\),\\displaystyle\\geq t\+\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|r\-\\frac\{K\_\{t\}\}\{2\}\(r^\{2\}\+s^\{2\}\),where we use the fact thatV\(𝐳t\)=tV\(\\mathbf\{z\}\_\{t\}\)=t,∇V\(𝐳t\)=‖∇V\(𝐳t\)‖𝐞⟂\\nabla V\(\\mathbf\{z\}\_\{t\}\)=\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|\\mathbf\{e\}\_\{\\perp\}, and using the lower bound on the Hessian eigenvalues\. To ensure that a point𝐳=𝐳t\+r𝐞⟂\+s𝐞∥\\mathbf\{z\}=\\mathbf\{z\}\_\{t\}\+r\\mathbf\{e\}\_\{\\perp\}\+s\\mathbf\{e\}\_\{\\parallel\}is contained in the set𝒜t=\{𝐳:V\(𝐳\)≥t\}\\mathcal\{A\}\_\{t\}=\\\{\\mathbf\{z\}:V\(\\mathbf\{z\}\)\\geq t\\\}, it is sufficient to satisfy
‖∇V\(𝐳t\)‖r≥Kt2\(r2\+s2\)\.\\displaystyle\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|r\\geq\\frac\{K\_\{t\}\}\{2\}\(r^\{2\}\+s^\{2\}\)\.As such, we have that for all sufficiently largedd
$$\mathcal{R}_{t}:=\Big\{\mathbf{z}=\mathbf{z}_{t}+r\mathbf{e}_{\perp}+s\mathbf{e}_{\parallel}:(r,s)\in\mathcal{N}_{t}\Big\}\subseteq\mathcal{A}_{t},$$
where
𝒩t=\{\(r,s\):\|r\|≤d−1/2,\|s\|≤d−1/2,‖∇V\(𝐳t\)‖r≥Kt2\(r2\+s2\)\}\.\\displaystyle\\mathcal\{N\}\_\{t\}=\\Big\\\{\(r,s\):\|r\|\\leq d^\{\-1/2\},\\ \|s\|\\leq d^\{\-1/2\},\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|r\\geq\\frac\{K\_\{t\}\}\{2\}\(r^\{2\}\+s^\{2\}\)\\Big\\\}\.
SinceIρ\(w,x\)I\_\{\\rho\}\(w,x\)is twice\-continuously differentiable on the interior of the domain, we expand it in the local\(r,s\)\(r,s\)coordinates around the optimizer𝐳t\\mathbf\{z\}\_\{t\}\. Recall that, by the first\-order optimality conditions,∇Iρ\(𝐳t\)\\nabla I\_\{\\rho\}\(\\mathbf\{z\}\_\{t\}\)is parallel to∇V\(𝐳t\)\\nabla V\(\\mathbf\{z\}\_\{t\}\), and thus∇Iρ\(𝐳t\)⊤𝐞∥=0\\nabla I\_\{\\rho\}\(\\mathbf\{z\}\_\{t\}\)^\{\\top\}\\mathbf\{e\}\_\{\\parallel\}=0\. For any𝐳∈ℛt\\mathbf\{z\}\\in\\mathcal\{R\}\_\{t\}, we have:
$$\begin{aligned}
I_{\rho}(\mathbf{z})&\leq I_{\rho}(\mathbf{z}_{t})+\big|\nabla I_{\rho}(\mathbf{z}_{t})^{\top}(r\mathbf{e}_{\perp}+s\mathbf{e}_{\parallel})\big|+\frac{1}{2}M_{t}(r^{2}+s^{2})\\
&\leq J(t)+\Big(\big|\nabla I_{\rho}(\mathbf{z}_{t})^{\top}\mathbf{e}_{\perp}\big|+\frac{M_{t}\|\nabla V(\mathbf{z}_{t})\|}{K_{t}}\Big)r,
\end{aligned}$$
where $M_{t}$ denotes an upper bound on the largest eigenvalue of the Hessian of $I_{\rho}$ over the neighborhood $\mathcal{N}_{d}(\mathbf{z}_{t})$, and we use that $I_{\rho}(\mathbf{z}_{t})=J(t)$, that $r\geq 0$, and that $r^{2}+s^{2}\leq\frac{2\|\nabla V(\mathbf{z}_{t})\|}{K_{t}}r$ for any $\mathbf{z}\in\mathcal{R}_{t}$. This gives us that
exp\(−d⋆Iρ\(𝐳\)\)≥exp\(−d⋆J\(t\)−d⋆Ctr\),∀𝐳∈ℛt,\\displaystyle\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(\\mathbf\{z\}\)\\big\)\\geq\\exp\\big\(\-d^\{\\star\}J\(t\)\-d^\{\\star\}C\_\{t\}r\\big\),\\ \\ \\forall\\mathbf\{z\}\\in\\mathcal\{R\}\_\{t\},\(23\)whereCt:=\(\|∇Iρ\(𝐳t\)⊤𝐞⟂\|\+Mt‖∇V\(𝐳t\)‖Kt\)C\_\{t\}:=\\Big\(\\big\|\\nabla I\_\{\\rho\}\(\\mathbf\{z\}\_\{t\}\)^\{\\top\}\\mathbf\{e\}\_\{\\perp\}\\big\|\+\\frac\{M\_\{t\}\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|\}\{K\_\{t\}\}\\Big\)\. Now, using Eq\. \([18](https://arxiv.org/html/2605.23944#A5.E18)\) and \([20](https://arxiv.org/html/2605.23944#A5.E20)\), we have that
G\(t\)\\displaystyle G\(t\)=Aρ,d∬𝒜t11−x2exp\(−d⋆Iρ\(w,x\)\)𝑑w𝑑x\\displaystyle=A\_\{\\rho,d\}\\iint\_\{\\mathcal\{A\}\_\{t\}\}\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(w,x\)\\big\)dwdx≥\(a\)Aρ,d∬𝒜texp\(−d⋆Iρ\(w,x\)\)𝑑w𝑑x\\displaystyle\\stackrel\{\{\\scriptstyle\(a\)\}\}\{\{\\geq\}\}A\_\{\\rho,d\}\\iint\_\{\\mathcal\{A\}\_\{t\}\}\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(w,x\)\\big\)dwdx≥\(b\)Aρ,d∬𝒜texp\(−d⋆Iρ\(𝐳\)\)𝑑r𝑑s\\displaystyle\\stackrel\{\{\\scriptstyle\(b\)\}\}\{\{\\geq\}\}A\_\{\\rho,d\}\\iint\_\{\\mathcal\{A\}\_\{t\}\}\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(\\mathbf\{z\}\)\\big\)drds≥\(c\)Aρ,d∬ℛtexp\(−d⋆Iρ\(𝐳\)\)𝑑r𝑑s\.\\displaystyle\\stackrel\{\{\\scriptstyle\(c\)\}\}\{\{\\geq\}\}A\_\{\\rho,d\}\\iint\_\{\\mathcal\{R\}\_\{t\}\}\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(\\mathbf\{z\}\)\\big\)drds\.where \(a\) follows by using11−x2≥1\\frac\{1\}\{\\sqrt\{1\-x^\{2\}\}\}\\geq 1, \(b\) follows by using the earlier argument\|det\(𝐞⟂,𝐞∥\)\|=1\|\\text\{det\}\(\\mathbf\{e\}\_\{\\perp\},\\mathbf\{e\}\_\{\\parallel\}\)\|=1implying thatdwdx=drdsdwdx=drds, \(c\) follows asℛt⊆𝒜t\\mathcal\{R\}\_\{t\}\\subseteq\\mathcal\{A\}\_\{t\}\. Using the results in Eq\. \([23](https://arxiv.org/html/2605.23944#A5.E23)\), we have that
∬ℛtexp\(−d⋆Iρ\(𝐳\)\)𝑑r𝑑s≥e−d⋆J\(t\)∬𝒩texp\(−d⋆Ctr\)𝑑r𝑑s\.\\displaystyle\\iint\_\{\\mathcal\{R\}\_\{t\}\}\\exp\\big\(\-d^\{\\star\}I\_\{\\rho\}\(\\mathbf\{z\}\)\\big\)drds\\geq e^\{\-d^\{\\star\}J\(t\)\}\\iint\_\{\\mathcal\{N\}\_\{t\}\}\\exp\\left\(\-d^\{\\star\}C\_\{t\}r\\right\)drds\.Next, we establish a lower bound for the integral over𝒩t\\mathcal\{N\}\_\{t\}\. The neighborhood𝒩t\\mathcal\{N\}\_\{t\}is defined by the quadratic constraint‖∇V\(𝐳t\)‖r≥Kt2\(r2\+s2\)\\\|\\nabla V\(\\mathbf\{z\}\_\{t\}\)\\\|r\\geq\\frac\{K\_\{t\}\}\{2\}\(r^\{2\}\+s^\{2\}\)\. We get,
$$\Big\{(r,s):0\leq r\leq d^{-1/2},\ |s|\leq d^{-1/2},\ r\leq\frac{\|\nabla V(\mathbf{z}_{t})\|}{K_{t}},\ \frac{\|\nabla V(\mathbf{z}_{t})\|}{K_{t}}r\geq s^{2}\Big\}\subseteq\mathcal{N}_{t}.$$
Thus, writing $\gamma:=\frac{\|\nabla V(\mathbf{z}_{t})\|}{K_{t}}$, so that the last constraint reads $|s|\leq\sqrt{\gamma r}$, we get
∬𝒩texp\(−d⋆Ctr\)𝑑r𝑑s\\displaystyle\\iint\_\{\\mathcal\{N\}\_\{t\}\}\\exp\(\-d^\{\\star\}C\_\{t\}r\)drds≥∫0d−12exp\(−d⋆Ctr\)\(∫−γrγr𝑑s\)𝑑r\\displaystyle\\geq\\int\_\{0\}^\{d^\{\-\\frac\{1\}\{2\}\}\}\\exp\(\-d^\{\\star\}C\_\{t\}r\)\\left\(\\int\_\{\-\\sqrt\{\\gamma r\}\}^\{\\sqrt\{\\gamma r\}\}ds\\right\)dr=2γ∫0d−12rexp\(−d⋆Ctr\)𝑑r\\displaystyle=2\\sqrt\{\\gamma\}\\int\_\{0\}^\{d^\{\-\\frac\{1\}\{2\}\}\}\\sqrt\{r\}\\exp\(\-d^\{\\star\}C\_\{t\}r\)dr=2γ\(d⋆Ct\)3/2∫0d⋆Ctue−u𝑑u\.\\displaystyle=\\frac\{2\\sqrt\{\\gamma\}\}\{\(d^\{\\star\}C\_\{t\}\)^\{3/2\}\}\\int\_\{0\}^\{\\sqrt\{d^\{\\star\}\}C\_\{t\}\}\\sqrt\{u\}e^\{\-u\}du\.Asd→∞d\\to\\infty, the upper limitd⋆Ct\\sqrt\{d^\{\\star\}\}C\_\{t\}tends to infinity\. The integral converges to the Gamma function valueΓ\(3/2\)=π2\\Gamma\(3/2\)=\\frac\{\\sqrt\{\\pi\}\}\{2\}\. As such, ford⋆d^\{\\star\}large enough, we have that there exists a constantC1C\_\{1\}such that∫0d⋆Ctue−u𝑑u≥C1\\int\_\{0\}^\{\\sqrt\{d^\{\\star\}\}C\_\{t\}\}\\sqrt\{u\}e^\{\-u\}du\\geq C\_\{1\}\. Thus, we get
∬𝒩texp\(−d⋆Ctr\)𝑑r𝑑s≥C2\(d⋆\)3/2,\\displaystyle\\iint\_\{\\mathcal\{N\}\_\{t\}\}\\exp\(\-d^\{\\star\}C\_\{t\}r\)drds\\geq\\frac\{C\_\{2\}\}\{\(d^\{\\star\}\)^\{3/2\}\},where we chooseC2=2γ\(Ct\)3/2C1C\_\{2\}=\\frac\{2\\sqrt\{\\gamma\}\}\{\(C\_\{t\}\)^\{3/2\}\}C\_\{1\}\. Further, from Claim[12](https://arxiv.org/html/2605.23944#Thmtheorem12), we have thatAρ,d≥d⋆−12π≥d⋆4πA\_\{\\rho,d\}\\geq\\frac\{d^\{\\star\}\-1\}\{2\\pi\}\\geq\\frac\{d^\{\\star\}\}\{4\\pi\}\. Thus,
G\(t\)≥K2d⋆e−d⋆J\(t\),\\displaystyle G\(t\)\\geq\\frac\{K\_\{2\}\}\{\\sqrt\{d^\{\\star\}\}\}e^\{\-d^\{\\star\}J\(t\)\},by choosingK2=C24πK\_\{2\}=\\frac\{C\_\{2\}\}\{4\\pi\}\. This completes the proof of Claim[14](https://arxiv.org/html/2605.23944#Thmtheorem14)\. ∎
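The one-dimensional integral used in the proof of Claim 14 has a closed form, which makes the convergence to $\Gamma(3/2)=\sqrt{\pi}/2$ easy to inspect. The sketch below relies on the identity $\int_{0}^{U}\sqrt{u}\,e^{-u}du=\frac{\sqrt{\pi}}{2}\operatorname{erf}(\sqrt{U})-\sqrt{U}e^{-U}$, which can be verified by differentiating in $U$; the function names and the sample arguments are assumptions made for illustration.

```python
import math

def truncated_gamma_three_halves(upper):
    """Closed form of integral_0^upper sqrt(u) * exp(-u) du.

    Equals (sqrt(pi)/2) * erf(sqrt(upper)) - sqrt(upper) * exp(-upper).
    """
    root = math.sqrt(upper)
    return 0.5 * math.sqrt(math.pi) * math.erf(root) - root * math.exp(-upper)

def radial_integral(rate, r_max):
    """integral_0^r_max sqrt(r) * exp(-rate*r) dr via the substitution u = rate*r."""
    return truncated_gamma_three_halves(rate * r_max) / rate ** 1.5

def midpoint_radial_integral(rate, r_max, n_grid=200_000):
    """Direct midpoint-rule evaluation, used only to cross-check the closed form."""
    step = r_max / n_grid
    return step * sum(
        math.sqrt((k + 0.5) * step) * math.exp(-rate * (k + 0.5) * step)
        for k in range(n_grid)
    )
```

In the proof, `rate` plays the role of $d^{\star}C_{t}$ and `r_max` the role of $d^{-1/2}$, so the truncated integral is bounded below by a positive constant once $d^{\star}$ is large, which is the constant $C_{1}$.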
Next, we present the final steps of the proof\. Recall thatα\\alphadenotes the exponent of the recommendation set size\. We useO~\(⋅\)\\tilde\{O\}\(\\cdot\)to denoteO\(⋅\)O\(\\cdot\)up to polylogarithmic factors indd\.
##### Caseα=0\\alpha=0\(single recommendation\)\.
Whenα=0\\alpha=0, we haven=⌊e0⌋=1n=\\lfloor e^\{0\}\\rfloor=1, so the recommendation set consists of a single item and there is no maximum over multiple samples\. The expected utility is simply𝔼\[V\(W1,X1\)\]\\mathbb\{E\}\[V\(W\_\{1\},X\_\{1\}\)\]\. SinceV\(w,x\)=ρw\+1−ρ21−w2xV\(w,x\)=\\rho w\+\\sqrt\{1\-\\rho^\{2\}\}\\sqrt\{1\-w^\{2\}\}\\,xandW1,X1W\_\{1\},X\_\{1\}are independent with𝔼\[X1\]=0\\mathbb\{E\}\[X\_\{1\}\]=0, we have
𝔼\[V\(W1,X1\)\]=ρ𝔼\[W1\]\.\\displaystyle\\mathbb\{E\}\[V\(W\_\{1\},X\_\{1\}\)\]=\\rho\\,\\mathbb\{E\}\[W\_\{1\}\]\.From Lemma[11](https://arxiv.org/html/2605.23944#Thmtheorem11),\|𝔼\[W1\]−ρ\|≤𝔼\|W1−ρ\|≤K¯d⋆\(1−ρ2\)\|\\mathbb\{E\}\[W\_\{1\}\]\-\\rho\|\\leq\\mathbb\{E\}\|W\_\{1\}\-\\rho\|\\leq\\frac\{\\bar\{K\}\}\{\\sqrt\{d^\{\\star\}\(1\-\\rho^\{2\}\)\}\}, so
𝔼\[V\(W1,X1\)\]=ρ2\+O\(d−1/2\)\.\\displaystyle\\mathbb\{E\}\[V\(W\_\{1\},X\_\{1\}\)\]=\\rho^\{2\}\+O\(d^\{\-1/2\}\)\.Meanwhile,f\(ρ,0\)=max\{V\(w,x\):Iρ\(w,x\)≤0\}f\(\\rho,0\)=\\max\\\{V\(w,x\):I\_\{\\rho\}\(w,x\)\\leq 0\\\}\. SinceIρ\(w,x\)≥0I\_\{\\rho\}\(w,x\)\\geq 0with equality if and only if\(w,x\)=\(ρ,0\)\(w,x\)=\(\\rho,0\), we havef\(ρ,0\)=V\(ρ,0\)=ρ2f\(\\rho,0\)=V\(\\rho,0\)=\\rho^\{2\}\. Combining with Eq\. \([16](https://arxiv.org/html/2605.23944#A5.E16)\), the total approximation error isO\(d−1/2\)O\(d^\{\-1/2\}\), which yields the stated bound whenα=0\\alpha=0\.
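The single-recommendation case can be illustrated numerically by computing $\rho\,\mathbb{E}[W_{1}]$ under the density $p_{\kappa,d}(w)\propto e^{\kappa w}(1-w^{2})^{d^{\star}/2}$ and comparing it with $\rho^{2}$. This is a minimal sketch under stated assumptions: the midpoint rule, the grid size, and all function names are illustrative choices, and the error bound is the one quoted from the lemma above.

```python
import math

def mean_of_w(rho, d_star, n_grid=100_000):
    """Midpoint-rule estimate of E[W] for the density proportional to
    exp(kappa*w) * (1 - w^2)^(d_star/2) on (-1, 1), with
    kappa = rho / (1 - rho^2) * d_star."""
    kappa = rho / (1.0 - rho * rho) * d_star
    # Log-density at w = rho, subtracted for numerical stability.
    log_peak = kappa * rho + 0.5 * d_star * math.log1p(-rho * rho)
    step = 2.0 / n_grid
    mass = 0.0
    first_moment = 0.0
    for k in range(n_grid):
        w = -1.0 + (k + 0.5) * step
        weight = math.exp(kappa * w + 0.5 * d_star * math.log1p(-w * w) - log_peak)
        mass += weight
        first_moment += w * weight
    return first_moment / mass

def single_recommendation_utility(rho, d_star):
    """E[V(W_1, X_1)] = rho * E[W_1], using independence and E[X_1] = 0."""
    return rho * mean_of_w(rho, d_star)

def lemma_error_bound(rho, d_star):
    """rho * K_bar / sqrt(d_star * (1 - rho^2)), with K_bar = (sqrt(2*pi)/K)^(1/2)."""
    k_const = math.sqrt(math.pi / 2.0) * math.erf(1.0 / math.sqrt(2.0)) / math.sqrt(2.0)
    k_bar = math.sqrt(math.sqrt(2.0 * math.pi) / k_const)
    return rho * k_bar / math.sqrt(d_star * (1.0 - rho * rho))
```

Comparing `single_recommendation_utility(rho, d_star)` with `rho ** 2` for increasing `d_star` shows the gap shrinking, consistent with the $O(d^{-1/2})$ error stated in the text.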
##### Caseα\>0\\alpha\>0\.
Next, we assume thatα\>0\\alpha\>0\. Supposet∗t^\{\*\},tHight\_\{\\text\{High\}\}andtLowt\_\{\\text\{Low\}\}are such that
J\(t∗\)=α,J\(tHigh\)=α\+2logd⋆d⋆,J\(tLow\)=α−logd⋆d⋆\.\\displaystyle J\(t^\{\*\}\)=\\alpha,\\ \\ J\(t\_\{\\text\{High\}\}\)=\\alpha\+\\frac\{2\\log d^\{\\star\}\}\{d^\{\\star\}\},\\ \\ J\(t\_\{\\text\{Low\}\}\)=\\alpha\-\\frac\{\\log d^\{\\star\}\}\{d^\{\\star\}\}\.Then, asJ\(t\)J\(t\)is a strictly increasing function, from Claims[14](https://arxiv.org/html/2605.23944#Thmtheorem14)and[13](https://arxiv.org/html/2605.23944#Thmtheorem13), we have
G\(t\)≥K2d⋆exp\(−d⋆α\),∀t≤tLow,andG\(t\)≤K1d⋆exp\(−d⋆α\),∀t≥tHigh\.\\displaystyle G\(t\)\\geq K\_\{2\}\\sqrt\{d^\{\\star\}\}\\exp\\left\(\-d^\{\\star\}\\alpha\\right\),\\ \\ \\forall t\\leq t\_\{\\text\{Low\}\},\\ \\ \\text\{ and \}\\ \\ G\(t\)\\leq\\frac\{K\_\{1\}\}\{d^\{\\star\}\}\\exp\\left\(\-d^\{\\star\}\\alpha\\right\),\\ \\ \\forall t\\geq t\_\{\\text\{High\}\}\.This in turn implies that, for allt≥tHight\\geq t\_\{\\text\{High\}\},
ℙ\(maxi≤nV\(Wi,Xi\)≥t\)=1−\(1−G\(t\)\)n≤nG\(t\)≤e\(d⋆\+3\)α×K1d⋆exp\(−d⋆α\)=K1d⋆exp\(3α\)\.\\displaystyle\\mathbb\{P\}\\Big\(\\max\_\{i\\leq n\}V\(W\_\{i\},X\_\{i\}\)\\geq t\\Big\)=1\-\\big\(1\-G\(t\)\\big\)^\{n\}\\leq nG\(t\)\\leq e^\{\(d^\{\\star\}\+3\)\\alpha\}\\times\\frac\{K\_\{1\}\}\{d^\{\\star\}\}\\exp\\left\(\-d^\{\\star\}\\alpha\\right\)=\\frac\{K\_\{1\}\}\{d^\{\\star\}\}\\exp\(3\\alpha\)\.Similarly, for allt≤tLowt\\leq t\_\{\\text\{Low\}\},
ℙ\(maxi≤nV\(Wi,Xi\)≥t\)≥1−exp\(−nG\(t\)\)≥1−exp\(−K2d⋆exp\(3α\)\)\.\\displaystyle\\mathbb\{P\}\\Big\(\\max\_\{i\\leq n\}V\(W\_\{i\},X\_\{i\}\)\\geq t\\Big\)\\geq 1\-\\exp\\left\(\-nG\(t\)\\right\)\\geq 1\-\\exp\\left\(\-K\_\{2\}\\sqrt\{d^\{\\star\}\}\\exp\(3\\alpha\)\\right\)\.We now bound𝔼\[maxiV\(Wi,Xi\)\]\\mathbb\{E\}\[\\max\_\{i\}V\(W\_\{i\},X\_\{i\}\)\]using a truncation argument\. Since\|V\(w,x\)\|≤1\|V\(w,x\)\|\\leq 1and denotingVi:=V\(Wi,Xi\)V\_\{i\}:=V\(W\_\{i\},X\_\{i\}\), we have
𝔼\[maxiVi\]\\displaystyle\\mathbb\{E\}\\Big\[\\max\_\{i\}V\_\{i\}\\Big\]≤tHigh\+ℙ\(maxiVi\>tHigh\)≤tHigh\+K1d⋆exp\(3α\),\\displaystyle\\leq t\_\{\\text\{High\}\}\+\\mathbb\{P\}\\Big\(\\max\_\{i\}V\_\{i\}\>t\_\{\\text\{High\}\}\\Big\)\\leq t\_\{\\text\{High\}\}\+\\frac\{K\_\{1\}\}\{d^\{\\star\}\}\\exp\(3\\alpha\),𝔼\[maxiVi\]\\displaystyle\\mathbb\{E\}\\Big\[\\max\_\{i\}V\_\{i\}\\Big\]≥tLow⋅ℙ\(maxiVi≥tLow\)−ℙ\(maxiVi<tLow\)≥tLow−2exp\(−K2d⋆e3α\)\.\\displaystyle\\geq t\_\{\\text\{Low\}\}\\cdot\\mathbb\{P\}\\Big\(\\max\_\{i\}V\_\{i\}\\geq t\_\{\\text\{Low\}\}\\Big\)\-\\mathbb\{P\}\\Big\(\\max\_\{i\}V\_\{i\}<t\_\{\\text\{Low\}\}\\Big\)\\geq t\_\{\\text\{Low\}\}\-2\\exp\\big\(\-K\_\{2\}\\sqrt\{d^\{\\star\}\}\\,e^\{3\\alpha\}\\big\)\.Since both correction terms areO\(d−1\)O\(d^\{\-1\}\), we obtain
tLow−O\(d−1\)≤𝔼\[maxi≤nV\(Wi,Xi\)\]≤tHigh\+O\(d−1\)\.\\displaystyle t\_\{\\text\{Low\}\}\-O\(d^\{\-1\}\)\\leq\\mathbb\{E\}\\Big\[\\max\_\{i\\leq n\}V\(W\_\{i\},X\_\{i\}\)\\Big\]\\leq t\_\{\\text\{High\}\}\+O\(d^\{\-1\}\)\.Now, to complete the argument, we need thattHigh−tLow=O~\(d−1\)t\_\{\\text\{High\}\}\-t\_\{\\text\{Low\}\}=\\tilde\{O\}\(d^\{\-1\}\)\. This holds because for any givenρ∈\[0,1\)\\rho\\in\[0,1\),J\(t\)J\(t\)is a smooth function, and so for anyα≥0\\alpha\\geq 0, there exists a constantK4K\_\{4\}such that\|tHigh−t∗\|≤1K4\|J\(tHigh\)−J\(t∗\)\|=O~\(d−1\)\|t\_\{\\text\{High\}\}\-t^\{\*\}\|\\leq\\frac\{1\}\{K\_\{4\}\}\|J\(t\_\{\\text\{High\}\}\)\-J\(t^\{\*\}\)\|=\\tilde\{O\}\(d^\{\-1\}\)\. Similarly,\|tLow−t∗\|=O~\(d−1\)\|t\_\{\\text\{Low\}\}\-t^\{\*\}\|=\\tilde\{O\}\(d^\{\-1\}\)\. Thus,
𝔼\[maxi≤nV\(Wi,Xi\)\]=t∗\+O~\(d−1\)\.\\displaystyle\\mathbb\{E\}\\Big\[\\max\_\{i\\leq n\}V\(W\_\{i\},X\_\{i\}\)\\Big\]=t^\{\*\}\+\\tilde\{O\}\(d^\{\-1\}\)\.Finally, recall thatf\(ρ,α\)=supw,x∈\(−1,1\)\{V\(w,x\):Iρ\(w,x\)≤α\}f\(\\rho,\\alpha\)=\\sup\_\{w,x\\in\(\-1,1\)\}\\\{V\(w,x\):I\_\{\\rho\}\(w,x\)\\leq\\alpha\\\}andJ\(t\)=infw,x∈\(−1,1\)\{Iρ\(w,x\):V\(w,x\)≥t\}J\(t\)=\\inf\_\{w,x\\in\(\-1,1\)\}\\\{I\_\{\\rho\}\(w,x\):V\(w,x\)\\geq t\\\}\. Note thatf\(ρ,⋅\)f\(\\rho,\\cdot\)andJ\(⋅\)J\(\\cdot\)are functional inverses of each other\. SinceIρI\_\{\\rho\}is strictly convex andVVis monotonic in the region of interest, the set\{𝐳:Iρ\(𝐳\)≤α\}\\\{\\mathbf\{z\}:I\_\{\\rho\}\(\\mathbf\{z\}\)\\leq\\alpha\\\}is a convex set, and its boundary corresponds exactly to the level sets ofVVwhere the maximum is achieved\. Thus,J\(t∗\)=αJ\(t^\{\*\}\)=\\alphaimpliest∗=f\(ρ,α\)t^\{\*\}=f\(\\rho,\\alpha\), and so
𝔼\[maxi≤nV\(Wi,Xi\)\]=f\(ρ,α\)\+O~\(d−1\)\.\\displaystyle\\mathbb\{E\}\\Big\[\\max\_\{i\\leq n\}V\(W\_\{i\},X\_\{i\}\)\\Big\]=f\(\\rho,\\alpha\)\+\\tilde\{O\}\(d^\{\-1\}\)\.Combining with Eq\. \([16](https://arxiv.org/html/2605.23944#A5.E16)\), which gives
𝔼\[maxi≤nWWi\+1−W21−Wi2Xi\]=𝔼\[maxiV\(Wi,Xi\)\]\+O\(d−1/2\),\\displaystyle\\mathbb\{E\}\\Big\[\\max\_\{i\\leq n\}WW\_\{i\}\+\\sqrt\{1\-W^\{2\}\}\\sqrt\{1\-W\_\{i\}^\{2\}\}X\_\{i\}\\Big\]=\\mathbb\{E\}\[\\max\_\{i\}V\(W\_\{i\},X\_\{i\}\)\]\+O\(d^\{\-1/2\}\),the total error isO~\(d−1\)\+O\(d−1/2\)=O\(d−1/2\)\\tilde\{O\}\(d^\{\-1\}\)\+O\(d^\{\-1/2\}\)=O\(d^\{\-1/2\}\), yielding theKlogd/dK\\sqrt\{\\log d/d\}bound stated in the proposition\. This completes the proof\.
∎
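The inverse relationship between $f(\rho,\cdot)$ and $J(\cdot)$ used at the end of the proof can be explored with a small grid search. The sketch assumes the closed form $I_{\rho}(w,x)=-\frac{\rho(w-\rho)}{1-\rho^{2}}-\frac{1}{2}\log\frac{1-w^{2}}{1-\rho^{2}}-\frac{1}{2}\log(1-x^{2})$, which is read off from the integrand in the proof of Claim 12 rather than quoted from a definition; the function names and the grid resolution are likewise illustrative.

```python
import math

def w_rate(w, rho):
    """w-dependent part of the assumed rate function I_rho(w, x)."""
    one_m = 1.0 - rho * rho
    return -rho * (w - rho) / one_m - 0.5 * math.log((1.0 - w * w) / one_m)

def w_grid(rho, n):
    """Uniform grid on (-1, 1), with rho appended so the minimizer is included."""
    step = 2.0 / n
    points = [-1.0 + k * step for k in range(1, n)]
    points.append(rho)
    return points

def f_value(rho, alpha, n=20_000):
    """f(rho, alpha) = sup{V(w, x) : I_rho(w, x) <= alpha}, by a search over w.

    For each w the largest feasible x is used, since V is increasing in x.
    """
    c = math.sqrt(1.0 - rho * rho)
    best = -math.inf
    for w in w_grid(rho, n):
        slack = alpha - w_rate(w, rho)
        if slack < 0.0:
            continue  # no x makes this w feasible
        x_max = math.sqrt(max(0.0, -math.expm1(-2.0 * slack)))
        best = max(best, rho * w + c * math.sqrt(1.0 - w * w) * x_max)
    return best

def j_value(rho, t, n=20_000):
    """J(t) = inf{I_rho(w, x) : V(w, x) >= t}, by a search over w.

    For each w the smallest non-negative x reaching utility t is used.
    """
    c = math.sqrt(1.0 - rho * rho)
    best = math.inf
    for w in w_grid(rho, n):
        x_req = (t - rho * w) / (c * math.sqrt(1.0 - w * w))
        if x_req >= 1.0:
            continue  # utility t is unreachable at this w
        x = max(0.0, x_req)
        best = min(best, w_rate(w, rho) - 0.5 * math.log1p(-x * x))
    return best
```

For example, `j_value(0.5, f_value(0.5, 0.2))` returns a value close to `0.2`, and `f_value(0.5, 0.0)` is close to $\rho^{2}=0.25$, matching the two facts used in the proof.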
### E\.2Proof of Proposition[3](https://arxiv.org/html/2605.23944#Thmtheorem3)
###### Proof\.
From Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1), we have that
$$\mathbb{E}_{\mathbf{h},\mathbf{m}}\left[D_{\mathrm{KL}}\big(q_{\kappa}(\cdot|\mathbf{m})\,\|\,p(\cdot)\big)\right]=-\log\frac{C_{d}(0)}{C_{d}(\kappa)}+\kappa\mathbb{E}[W],$$
where $W\sim p_{\kappa,d}(w)$. Further, from Lemma [11](https://arxiv.org/html/2605.23944#Thmtheorem11), we have that
$$\kappa\mathbb{E}|W-\rho|\leq\frac{\kappa\bar{K}}{\sqrt{d^{\star}(1-\rho^{2})}}=\frac{\rho\bar{K}\sqrt{d^{\star}}}{\sqrt{(1-\rho^{2})^{3}}}.\tag{24}$$
Note that the normalization constant $C_{d}(\kappa)$ is defined as
$$\frac{1}{C_{d}(\kappa)}=\int_{\mathbf{h}\in\mathcal{S}^{d-1}}\exp(\kappa\mathbf{h}^{\top}\mathbf{m})\,d\mathbf{h}.$$
It can be shown that (see [Mardia and Jupp, [2009](https://arxiv.org/html/2605.23944#bib.bib18)]),
$$\begin{aligned}\frac{1}{C_{d}(\kappa)}&=\frac{1}{C_{d-1}(0)}\int_{-1}^{1}(1-w^{2})^{\frac{d^{\star}}{2}}\exp\left(\kappa w\right)dw\\&=\frac{1}{C_{d-1}(0)}\int_{-1}^{1}\exp\left(\frac{d^{\star}\rho w}{1-\rho^{2}}+\frac{d^{\star}}{2}\log(1-w^{2})\right)dw,\end{aligned}$$
where $\frac{1}{C_{d-1}(0)}$ is the surface area of $\mathcal{S}^{d-2}$ and the second equality follows as $\kappa=\frac{d^{\star}\rho}{1-\rho^{2}}$. As such, by using Lemma [11](https://arxiv.org/html/2605.23944#Thmtheorem11), for $\kappa=\frac{\rho d^{\star}}{1-\rho^{2}}$, we have
$$\frac{1}{C_{d-1}(0)}\frac{K(1-\rho^{2})}{\sqrt{d^{\star}}}\leq\frac{1}{C_{d}(\kappa)}\exp\left(-\frac{d^{\star}\rho^{2}}{1-\rho^{2}}-\frac{d^{\star}}{2}\log(1-\rho^{2})\right)\leq\frac{1}{C_{d-1}(0)}\sqrt{\frac{2\pi}{d^{\star}}}.$$
By substituting $\rho=0$ in the above expression,
$$\frac{1}{C_{d-1}(0)}\frac{K}{\sqrt{d^{\star}}}\leq\frac{1}{C_{d}(0)}\leq\frac{1}{C_{d-1}(0)}\sqrt{\frac{2\pi}{d^{\star}}}.$$
Thus,
$$-K_{0}\leq-\log\frac{C_{d}(0)}{C_{d}(\kappa)}+d^{\star}\left(\frac{\rho^{2}}{1-\rho^{2}}+\frac{1}{2}\log(1-\rho^{2})\right)\leq K_{0}-\log(1-\rho^{2}),$$
where $K_{0}=\max\big\{0,\log\frac{\sqrt{2\pi}}{K}\big\}$. This, in turn, implies that
$$-K_{0}\leq-\log\frac{C_{d}(0)}{C_{d}(\kappa)}+d^{\star}\frac{\rho^{2}}{1-\rho^{2}}+\frac{d^{\star}+1}{2}\log(1-\rho^{2})\leq K_{0}.$$
Finally, by using Eq. ([24](https://arxiv.org/html/2605.23944#A5.E24)) and the fact that $\kappa=\frac{\rho d^{\star}}{1-\rho^{2}}$, we have
$$\Big|-\log\frac{C_{d}(0)}{C_{d}(\kappa)}+\kappa\mathbb{E}W+\frac{d^{\star}+1}{2}\log(1-\rho^{2})\Big|\leq K_{0}+\frac{\rho\bar{K}\sqrt{d^{\star}}}{\sqrt{(1-\rho^{2})^{3}}}.$$
Now, the proof is complete by choosing $K_{3}=K_{0}+\bar{K}$ and using $d^{\star}\leq d$. ∎
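The approximation in Proposition 3 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $d^{\star}=d-3$ (inferred from the scaling $\kappa=\frac{\rho}{1-\rho^{2}}(d-3)$ used in the proof of Theorem 4), treats $W$ as having density proportional to $(1-w^{2})^{d^{\star}/2}e^{\kappa w}$ on $(-1,1)$, and evaluates the one-dimensional integrals with a midpoint rule. The function names and the number of quadrature nodes are hypothetical.

```python
import math

def log_integral(d, kappa, m=20000):
    """Return (log Z, E[W]) for the density proportional to
    (1 - w^2)^((d-3)/2) * exp(kappa * w) on (-1, 1), via a midpoint rule."""
    h = 2.0 / m
    ws = [-1.0 + (k + 0.5) * h for k in range(m)]
    logs = [0.5 * (d - 3) * math.log(1 - w * w) + kappa * w for w in ws]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    mean_w = sum(w * p for w, p in zip(ws, weights)) / total
    return top + math.log(total * h), mean_w

def kl_vmf_vs_uniform(d, rho):
    # Assumed scaling: kappa = rho / (1 - rho^2) * (d - 3)
    kappa = rho / (1 - rho**2) * (d - 3)
    log_z_kappa, mean_w = log_integral(d, kappa)
    log_z_zero, _ = log_integral(d, 0.0)
    # -log(C_d(0)/C_d(kappa)) equals -(log Z(kappa) - log Z(0))
    return kappa * mean_w - (log_z_kappa - log_z_zero)

def leading_term(d, rho):
    # (d - 2)/2 * log(1 / (1 - rho^2)), the main term in Proposition 3
    return 0.5 * (d - 2) * math.log(1 / (1 - rho**2))
```

Comparing `kl_vmf_vs_uniform(d, rho)` with `leading_term(d, rho)` for growing `d` gives a feel for how small the remainder is relative to the leading term.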
### E.3 Proof of Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4)
###### Proof\.
We prove each part of the theorem sequentially\.
##### Part 1: Convergence
We analyze the asymptotic behavior of each term as $d\to\infty$ under the scaling assumptions $\lambda_{s}=c_{s}/d+o(d^{-1})$ and $\lambda_{c}=c_{c}/d+o(d^{-1})$. Let $n=\lfloor e^{\alpha d}\rfloor$ and $\kappa=\frac{\rho}{1-\rho^{2}}(d-3)$ for fixed $\alpha,\rho$. From Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2), the expected utility of the best recommendation converges as
$$\lim_{d\rightarrow\infty}\mathbb{E}\left[\max_{i\in[n]}\langle\mathbf{h},\boldsymbol{\theta}_{i}\rangle\right]=f(\rho,\alpha),$$
where $f(\rho,\alpha)$ is the value of the deterministic optimization problem defined in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2). Next, using the scaling of $\lambda_{s}$, the search cost becomes
$$\lambda_{s}\log n=\left(\frac{c_{s}}{d}+o(d^{-1})\right)(\alpha d)=c_{s}\alpha+o(1).$$
Finally, from Proposition [3](https://arxiv.org/html/2605.23944#Thmtheorem3),
$$D_{\text{KL}}=\frac{d-2}{2}\log\left(\frac{1}{1-\rho^{2}}\right)+\mathcal{E}_{3}(\rho),$$
where we use $D_{\text{KL}}$ as a shorthand notation for the KL-divergence. Multiplying by the communication cost parameter $\lambda_{c}$ and taking the limit,
$$\lim_{d\rightarrow\infty}\lambda_{c}D_{\text{KL}}=\lim_{d\rightarrow\infty}\left(\frac{c_{c}}{d}+o(d^{-1})\right)\left(\frac{d-2}{2}\log\left(\frac{1}{1-\rho^{2}}\right)+\mathcal{E}_{3}(\rho)\right)=-\frac{c_{c}}{2}\log(1-\rho^{2}),$$
where we use the fact that $|\mathcal{E}_{3}(\rho)|\leq\frac{K_{3}\sqrt{d}}{\sqrt{(1-\rho^{2})^{3}}}$, and $K_{3}$ is a universal constant. Combining these terms, the objective function converges to
$$\lim_{d\to\infty}\mathcal{P}_{d}(\kappa,n)=f(\rho,\alpha)-c_{s}\alpha+\frac{c_{c}}{2}\log(1-\rho^{2}).$$
This matches the definition of the objective function in $\mathrm{OPT}_{\mathrm{Joint}}$ (Eq. ([6](https://arxiv.org/html/2605.23944#S3.E6))). This shows that the finite-dimensional objective converges uniformly to the asymptotic objective. Now, under the assumption that the optimizer $(\rho^{*},\alpha^{*})$ lies on a compact set, the maximum also converges, proving the first statement.
As such, to complete the argument, we show that for $c_{s}>0$ and $c_{c}>0$, any optimizer of $\mathcal{P}_{\infty}$ must lie in a compact subset of $[0,1)\times[0,\infty)$. Since $f(\rho,\alpha)\leq 1$, the objective satisfies
$$\mathcal{P}_{\infty}(\rho,\alpha)\leq 1-c_{s}\alpha+\frac{c_{c}}{2}\log(1-\rho^{2}).$$
Note that $\mathcal{P}_{\infty}(0,0)=f(0,0)=0$ provides a baseline value. For any $(\rho,\alpha)$ to be optimal, we need $\mathcal{P}_{\infty}(\rho,\alpha)\geq 0$, which requires both $c_{s}\alpha\leq 1$ and $\frac{c_{c}}{2}\log(1-\rho^{2})>-1$. The first condition gives $\alpha\leq 1/c_{s}$. The second gives $1-\rho^{2}>e^{-2/c_{c}}$, i.e., $\rho<\sqrt{1-e^{-2/c_{c}}}<1$. Since the optimizer is confined to the compact set $[0,\sqrt{1-e^{-2/c_{c}}}]\times[0,1/c_{s}]$, the uniform convergence of $\mathcal{P}_{d}$ to $\mathcal{P}_{\infty}$ on this set implies convergence of the optima.
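The compactness argument can be made concrete with a few lines of code. The sketch below computes the two box limits $\sqrt{1-e^{-2/c_{c}}}$ and $1/c_{s}$ and evaluates the upper bound $1-c_{s}\alpha+\frac{c_{c}}{2}\log(1-\rho^{2})$. The function names and the example cost values are hypothetical.

```python
import math

def compact_box(c_s, c_c):
    """Upper limits (rho_max, alpha_max) of the box that must contain any optimizer."""
    rho_max = math.sqrt(1 - math.exp(-2 / c_c))
    alpha_max = 1 / c_s
    return rho_max, alpha_max

def payoff_upper_bound(rho, alpha, c_s, c_c):
    # Uses f(rho, alpha) <= 1, so this is an upper bound on P_infinity(rho, alpha)
    return 1 - c_s * alpha + 0.5 * c_c * math.log(1 - rho**2)

# Illustrative (hypothetical) cost parameters
rho_max, alpha_max = compact_box(c_s=0.5, c_c=1.0)
```

At the corners `(rho_max, 0)` and `(0, alpha_max)` the bound is zero, matching the baseline $\mathcal{P}_{\infty}(0,0)=0$. Any point with a larger $\rho$ or a larger $\alpha$ has a strictly negative bound and therefore cannot be optimal.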
##### Part 2: Monotonicity
We denote the asymptotic objective function by $\mathcal{P}_{\infty}(\rho,\alpha)$, which is given by
$$\mathcal{P}_{\infty}(\rho,\alpha)=f(\rho,\alpha)-c_{s}\alpha+\frac{c_{c}}{2}\log(1-\rho^{2}).$$
For simplicity of the argument, we assume that the optimal solution $(\rho^{*},\alpha^{*})$ lies in the interior of the feasible region $(0,1)\times(0,\infty)$. From the first-order necessary conditions for a local maximum, we have
$$\frac{\partial}{\partial\rho}\mathcal{P}_{\infty}=\frac{\partial f}{\partial\rho}-\frac{c_{c}\rho}{1-\rho^{2}}=0,\qquad\frac{\partial}{\partial\alpha}\mathcal{P}_{\infty}=\frac{\partial f}{\partial\alpha}-c_{s}=0.\tag{25}$$
For the solution to be a local maximum, the second-order sufficient conditions require that
$$\frac{\partial^{2}}{\partial\rho^{2}}\mathcal{P}_{\infty}<0,\qquad\frac{\partial^{2}}{\partial\alpha^{2}}\mathcal{P}_{\infty}<0,$$
and the Hessian matrix $\mathbf{H}_{\infty}$ of the objective function $\mathcal{P}_{\infty}$ with respect to $\rho$ and $\alpha$ be negative definite, where the Hessian is given by
$$\mathbf{H}_{\infty}=\begin{pmatrix}\frac{\partial^{2}f}{\partial\rho^{2}}-\frac{c_{c}(1+\rho^{2})}{(1-\rho^{2})^{2}}&\frac{\partial^{2}f}{\partial\rho\partial\alpha}\\ \frac{\partial^{2}f}{\partial\alpha\partial\rho}&\frac{\partial^{2}f}{\partial\alpha^{2}}\end{pmatrix},$$
where the second-order conditions imply that $|\mathbf{H}_{\infty}|>0$. Furthermore, we invoke the property that communication precision and search set size act as substitutes in the production of utility: as $\rho$ increases, the alignment between the agent's recommendation and the user's preference increases, which reduces the need for a large search set exponent $\alpha$ to find a high-utility outcome. As such, we have $\frac{\partial^{2}}{\partial\alpha\partial\rho}\mathcal{P}_{\infty}=\frac{\partial^{2}f}{\partial\rho\partial\alpha}<0$.
Since the first-order conditions must hold even as $c_{c}$ changes, we can take the total derivative of both equations in Eq. ([25](https://arxiv.org/html/2605.23944#A5.E25)) with respect to $c_{c}$ to get that
$$\mathbf{H}_{\infty}\begin{pmatrix}\frac{\partial\rho^{*}}{\partial c_{c}}\\ \frac{\partial\alpha^{*}}{\partial c_{c}}\end{pmatrix}+\begin{pmatrix}\frac{\partial^{2}}{\partial\rho\partial c_{c}}\mathcal{P}_{\infty}\\ \frac{\partial^{2}}{\partial\alpha\partial c_{c}}\mathcal{P}_{\infty}\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix}.$$
Further, we have that
$$\frac{\partial^{2}}{\partial\rho\partial c_{c}}\mathcal{P}_{\infty}=\frac{\partial}{\partial c_{c}}\left(\frac{\partial f}{\partial\rho}-\frac{c_{c}\rho}{1-\rho^{2}}\right)=-\frac{\rho}{1-\rho^{2}},\qquad\frac{\partial^{2}}{\partial\alpha\partial c_{c}}\mathcal{P}_{\infty}=\frac{\partial}{\partial c_{c}}\left(\frac{\partial f}{\partial\alpha}-c_{s}\right)=0.$$
Next, we apply Cramer's rule to solve the linear system, giving us
$$\frac{\partial\rho^{*}}{\partial c_{c}}=-\frac{1}{|\mathbf{H}_{\infty}|}\det\begin{pmatrix}\frac{-\rho}{1-\rho^{2}}&\frac{\partial^{2}}{\partial\alpha\partial\rho}\mathcal{P}_{\infty}\\ 0&\frac{\partial^{2}}{\partial\alpha^{2}}\mathcal{P}_{\infty}\end{pmatrix}=\frac{1}{|\mathbf{H}_{\infty}|}\left(\frac{\rho}{1-\rho^{2}}\frac{\partial^{2}}{\partial\alpha^{2}}\mathcal{P}_{\infty}\right)<0,$$
where we use the fact that $|\mathbf{H}_{\infty}|>0$ and $\frac{\partial^{2}}{\partial\alpha^{2}}\mathcal{P}_{\infty}<0$. Similarly, we get that
$$\frac{\partial\alpha^{*}}{\partial c_{c}}=-\frac{1}{|\mathbf{H}_{\infty}|}\det\begin{pmatrix}\frac{\partial^{2}}{\partial\rho^{2}}\mathcal{P}_{\infty}&\frac{-\rho}{1-\rho^{2}}\\ \frac{\partial^{2}}{\partial\alpha\partial\rho}\mathcal{P}_{\infty}&0\end{pmatrix}=\frac{1}{|\mathbf{H}_{\infty}|}\left(-\frac{\rho}{1-\rho^{2}}\frac{\partial^{2}}{\partial\alpha\partial\rho}\mathcal{P}_{\infty}\right)>0.$$
Next, we repeat the procedure in terms of the search cost parameter $c_{s}$. We have that
$$\frac{\partial^{2}}{\partial\rho\partial c_{s}}\mathcal{P}_{\infty}=\frac{\partial}{\partial c_{s}}\left(\frac{\partial f}{\partial\rho}-\frac{c_{c}\rho}{1-\rho^{2}}\right)=0,\qquad\frac{\partial^{2}}{\partial\alpha\partial c_{s}}\mathcal{P}_{\infty}=\frac{\partial}{\partial c_{s}}\left(\frac{\partial f}{\partial\alpha}-c_{s}\right)=-1.$$
This gives us that
$$\frac{\partial\rho^{*}}{\partial c_{s}}=\frac{1}{|\mathbf{H}_{\infty}|}\left(-1\times\frac{\partial^{2}}{\partial\alpha\partial\rho}\mathcal{P}_{\infty}\right)>0,$$
and also,
$$\frac{\partial\alpha^{*}}{\partial c_{s}}=\frac{1}{|\mathbf{H}_{\infty}|}\left(1\times\frac{\partial^{2}}{\partial\rho^{2}}\mathcal{P}_{\infty}\right)<0.$$
This completes the proof of the monotonicity properties.
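The sign pattern obtained from Cramer's rule can be reproduced on a small numerical instance. The sketch below solves the two $2\times 2$ linear systems for a Hessian supplied by the caller. The entries used in the example are hypothetical numbers chosen only to satisfy the assumptions of the proof (negative diagonal, negative cross partial, positive determinant), and are not derived from the actual $f$.

```python
def comparative_statics(h11, h12, h22, rho):
    """Solve H x = -b by Cramer's rule for b = (-rho/(1-rho^2), 0) (effect of c_c)
    and b = (0, -1) (effect of c_s), where H = [[h11, h12], [h12, h22]]."""
    det = h11 * h22 - h12 * h12
    if not (h11 < 0 and h22 < 0 and det > 0):
        raise ValueError("Hessian must be negative definite")
    r = rho / (1 - rho**2)
    # Effect of c_c: right-hand side is -b = (r, 0)
    drho_dcc = (r * h22) / det
    dalpha_dcc = (-r * h12) / det
    # Effect of c_s: right-hand side is -b = (0, 1)
    drho_dcs = (-h12) / det
    dalpha_dcs = h11 / det
    return {"drho_dcc": drho_dcc, "dalpha_dcc": dalpha_dcc,
            "drho_dcs": drho_dcs, "dalpha_dcs": dalpha_dcs}

# Hypothetical Hessian entries satisfying the assumptions of the proof
signs = comparative_statics(h11=-3.0, h12=-1.0, h22=-2.0, rho=0.5)
```

With a negative cross partial `h12`, the own-cost effects are negative and the cross-cost effects are positive, which is the substitution pattern stated in the theorem.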
##### Part 3: Search\-Only Regime
To prove this part, we first show that for $c_{c}>c_{s}$, the optimal policy for $\mathcal{P}_{\infty}(\rho,\alpha)$ has optimal message precision $\rho^{*}=0$. From the expression for $f(\rho,\alpha)$ in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2), we know that
$$f(\rho,\alpha):=\max_{w,x\in(-1,1)}\ \rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x\quad\text{such that }I_{\rho}(w,x)\leq\alpha,$$
where $I_{\rho}(w,x)$ is the large deviations rate function, given by
$$I_{\rho}(w,x)=-\frac{\rho(w-\rho)}{1-\rho^{2}}-\frac{1}{2}\log(1-w^{2})+\frac{1}{2}\log(1-\rho^{2})-\frac{1}{2}\log(1-x^{2}).$$
Using the fact that $-\frac{\rho(w-\rho)}{1-\rho^{2}}-\frac{1}{2}\log(1-w^{2})+\frac{1}{2}\log(1-\rho^{2})\geq 0$ for any $w\in(-1,1)$, we get that
$$\big\{(w,x):|w|<1,|x|<1,I_{\rho}(w,x)\leq\alpha\big\}\subseteq\Big\{(w,x):|w|<1,|x|<1,-\frac{1}{2}\log(1-x^{2})\leq\alpha\Big\}.$$
This implies that
$$f(\rho,\alpha)\leq\max_{w,x\in(-1,1)}\ \rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x\quad\text{such that }-\frac{1}{2}\log(1-x^{2})\leq\alpha.$$
This in turn gives us that
$$\begin{aligned}\mathcal{P}_{\infty}(\rho,\alpha)&\leq\max_{w,x\in(-1,1)}\ \rho w+\sqrt{1-\rho^{2}}\sqrt{1-w^{2}}\,x-c_{s}\alpha-\frac{c_{c}}{2}\log\frac{1}{1-\rho^{2}}\quad\text{s.t. }-\frac{1}{2}\log(1-x^{2})\leq\alpha\\&=\sqrt{\rho^{2}+(1-\rho^{2})(1-e^{-2\alpha})}-c_{s}\alpha-\frac{c_{c}}{2}\log\frac{1}{1-\rho^{2}},\end{aligned}$$
where the equality in the second line is obtained by maximizing over $w$ and $x$. It can be shown that when $c_{c}>c_{s}$, the $(\rho,\alpha)$ that maximizes the RHS in the above expression satisfies $\rho=0$. A similar and more detailed argument is presented in Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7) and its proof. As such, for $c_{c}>c_{s}$, we have
$$\max_{\rho\in[0,1),\alpha\geq 0}\mathcal{P}_{\infty}(\rho,\alpha)\leq\max_{\alpha\geq 0}\sqrt{1-e^{-2\alpha}}-c_{s}\alpha.$$
Further, we also have that
$$\begin{aligned}\max_{\rho\in[0,1),\alpha\geq 0}\mathcal{P}_{\infty}(\rho,\alpha)&\geq\max_{\alpha\geq 0}\mathcal{P}_{\infty}(0,\alpha)\\&=\max_{\alpha\geq 0}\max_{w,x\in(-1,1)}\Big[\sqrt{1-w^{2}}\,x-c_{s}\alpha\ \text{ s.t. }-\frac{1}{2}\log(1-w^{2})-\frac{1}{2}\log(1-x^{2})\leq\alpha\Big]\\&=\max_{\alpha\geq 0}\max_{x\in(-1,1)}\Big[x-c_{s}\alpha\ \text{ s.t. }-\frac{1}{2}\log(1-x^{2})\leq\alpha\Big]\\&=\max_{\alpha\geq 0}\sqrt{1-e^{-2\alpha}}-c_{s}\alpha.\end{aligned}$$
Combining the above two inequalities implies that, for $c_{c}>c_{s}$,
$$\max_{\rho\in[0,1),\alpha\geq 0}\mathcal{P}_{\infty}(\rho,\alpha)=\max_{\alpha\geq 0}\sqrt{1-e^{-2\alpha}}-c_{s}\alpha,$$
and so $\rho^{*}=0$ whenever $c_{c}>c_{s}$. Thus, there exists $\bar{c}_{c}(c_{s})$ such that for $c_{c}>\bar{c}_{c}(c_{s})$, we have $\rho^{*}(c_{c},c_{s})=0$. ∎
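The claim that the upper bound is maximized at $\rho=0$ when $c_{c}>c_{s}$ can be explored numerically. The sketch below is a brute-force grid search over the closed-form bound $\sqrt{\rho^{2}+(1-\rho^{2})(1-e^{-2\alpha})}-c_{s}\alpha-\frac{c_{c}}{2}\log\frac{1}{1-\rho^{2}}$. It is an illustration rather than a proof, and the grid limits, step counts, and example costs are hypothetical choices.

```python
import math

def upper_bound(rho, alpha, c_s, c_c):
    """Closed-form upper bound on P_infinity(rho, alpha) from the proof."""
    gain = math.sqrt(rho**2 + (1 - rho**2) * (1 - math.exp(-2 * alpha)))
    return gain - c_s * alpha - 0.5 * c_c * math.log(1 / (1 - rho**2))

def argmax_on_grid(c_s, c_c, steps=200, alpha_max=2.0, rho_max=0.99):
    """Return (value, rho, alpha) maximizing the bound over a rectangular grid."""
    best = (-math.inf, None, None)
    for i in range(steps + 1):
        rho = rho_max * i / steps
        for j in range(steps + 1):
            alpha = alpha_max * j / steps
            value = upper_bound(rho, alpha, c_s, c_c)
            if value > best[0]:
                best = (value, rho, alpha)
    return best

# Hypothetical costs with c_c > c_s (communication more expensive than search)
value, rho_star, alpha_star = argmax_on_grid(c_s=0.5, c_c=1.0)
```

One way to see why the maximizer sits at $\rho=0$: writing $\beta=\alpha+\frac{1}{2}\log\frac{1}{1-\rho^{2}}$, the bound equals $\sqrt{1-e^{-2\beta}}-c_{c}\beta+(c_{c}-c_{s})\alpha$, which for fixed $\beta$ and $c_{c}>c_{s}$ increases in $\alpha\in[0,\beta]$ and is therefore largest at $\alpha=\beta$, that is, at $\rho=0$.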
## Appendix F Proofs of Results in Section [3.3](https://arxiv.org/html/2605.23944#S3.SS3)
### F.1 Proof of Theorem [5](https://arxiv.org/html/2605.23944#Thmtheorem5)
###### Proof\.
We begin by defining the asymptotic objective function derived from the joint optimization problem $\mathrm{OPT}_{\mathrm{Joint}}$. For any $\rho\in[0,1)$ and $\alpha\geq 0$, with slight abuse of notation, we define
$$\mathcal{P}_{\infty}(\rho,\alpha)=f(\rho,\alpha)-c_{s}\alpha+\frac{c_{c}}{2}\log(1-\rho^{2}),$$
where $f(\rho,\alpha)$ is the limit of the expected maximum utility defined in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2). For any fixed $(\rho,\alpha)$ we construct a policy using the mapping $\kappa(\rho)=\frac{\rho}{1-\rho^{2}}(d-3)$ and $n(\alpha)=e^{d\alpha}$, where we ignore the integrality of the recommendation set size $n$ for simplicity. The finite-dimensional objective $\mathcal{P}_{d}\big(\kappa(\rho),n(\alpha)\big)$ can be decomposed into three terms: product utility, search cost, and communication cost.
First, invoking Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2), the expected product utility satisfies
$$\mathbb{E}\left[\max_{i\in[n]}\langle\mathbf{h},\boldsymbol{\theta}_{i}\rangle\right]=f(\rho,\alpha)+\mathcal{E}_{2}(\rho,\alpha),$$
where $|\mathcal{E}_{2}(\rho,\alpha)|\leq K_{2}(\rho,\alpha)\sqrt{\frac{\log d}{d}}$. Here, we write $\mathcal{E}_{2}(\rho,\alpha)$ instead of simply $\mathcal{E}_{2}$ to show the dependency on $\rho$ and $\alpha$.
Second, utilizing the definition of scaled costs $c_{s}=d\lambda_{s}$, the search cost is exactly
$$\lambda_{s}\log n=\frac{c_{s}}{d}(d\alpha)=c_{s}\alpha.$$
Third, using Proposition [3](https://arxiv.org/html/2605.23944#Thmtheorem3) and the scaled cost $c_{c}=d\lambda_{c}$, the communication cost is
$$\lambda_{c}D_{\text{KL}}=\frac{c_{c}}{d}\left[\frac{d-2}{2}\log\left(\frac{1}{1-\rho^{2}}\right)+\mathcal{E}_{3}(\rho)\right]=-\frac{c_{c}}{2}\log(1-\rho^{2})+\frac{c_{c}}{d}\mathcal{E}_{3}(\rho)-\frac{c_{c}}{d}\log\left(\frac{1}{1-\rho^{2}}\right),$$
where $|\mathcal{E}_{3}(\rho)|\leq K_{3}\frac{\sqrt{d}}{(1-\rho^{2})^{3}}$, and $K_{3}$ is a universal constant.
Combining these terms, we can relate the finite objective $\mathcal{P}_{d}$ to the asymptotic objective $\mathcal{P}_{\infty}$ for any valid $(\rho,\alpha)$ pair:
$$\mathcal{P}_{d}(\kappa(\rho),n(\alpha))=\mathcal{P}_{\infty}(\rho,\alpha)+\mathcal{E}_{\text{total}}(\rho,\alpha),$$
where the total error is bounded by
$$|\mathcal{E}_{\text{total}}(\rho,\alpha)|\leq K_{2}(\rho,\alpha)\sqrt{\frac{\log d}{d}}+\frac{c_{c}}{d}\left(K_{3}\frac{\sqrt{d}}{(1-\rho^{2})^{3}}\right)+\frac{c_{c}}{d}\log\left(\frac{1}{1-\rho^{2}}\right),$$
so that $|\mathcal{E}_{\text{total}}(\rho,\alpha)|$ is dominated by the utility approximation error. Hence, there exists a constant $K_{\text{total}}(\rho,\alpha)$ such that
$$|\mathcal{E}_{\text{total}}(\rho,\alpha)|\leq K_{\text{total}}(\rho,\alpha)\sqrt{\frac{\log d}{d}}.$$
Now, let $(\kappa^{*}_{d},n^{*}_{d})$ be the true optimal solution to $\mathcal{P}_{d}$. We define the implicit asymptotic parameters $(\tilde{\rho},\tilde{\alpha})$ by inverting the mapping relations: $\tilde{\alpha}=\frac{1}{d}\log n^{*}_{d}$ and $\tilde{\rho}$ such that $\kappa^{*}_{d}=\frac{\tilde{\rho}}{1-\tilde{\rho}^{2}}(d-3)$. The performance gap is given by $\Delta_{d}:=\mathcal{P}_{d}(\kappa^{*}_{d},n^{*}_{d})-\mathcal{P}_{d}(\kappa^{*}_{\infty},n^{*}_{\infty})$, where $(\rho^{*},\alpha^{*})$ is the optimizer of $\mathcal{P}_{\infty}$ and $(\kappa^{*}_{\infty},n^{*}_{\infty})$ is constructed from $(\rho^{*},\alpha^{*})$ using the mappings $\kappa(\rho)$ and $n(\alpha)$.
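The forward mapping $(\rho,\alpha)\mapsto(\kappa,n)$ and its inverse can be written out explicitly. The sketch below is a minimal illustration. The function names are hypothetical, integrality of $n$ is ignored as in the proof, and the inverse for $\rho$ takes the root in $[0,1)$ of the quadratic $\kappa\rho^{2}+(d-3)\rho-\kappa=0$, written in a numerically stable form and assuming $d>3$.

```python
import math

def kappa_of_rho(rho, d):
    # Forward map: kappa(rho) = rho / (1 - rho^2) * (d - 3)
    return rho / (1 - rho**2) * (d - 3)

def n_of_alpha(alpha, d):
    # Forward map: n(alpha) = exp(d * alpha), integrality ignored
    return math.exp(d * alpha)

def rho_of_kappa(kappa, d):
    """Invert kappa = rho/(1-rho^2) * (d-3) for rho in [0, 1), assuming d > 3."""
    a = d - 3
    # Root of kappa*rho^2 + a*rho - kappa = 0 lying in [0, 1)
    return 2 * kappa / (a + math.sqrt(a * a + 4 * kappa * kappa))

def alpha_of_n(n, d):
    # Inverse map: alpha = log(n) / d
    return math.log(n) / d
```

Applying `rho_of_kappa(kappa_of_rho(rho, d), d)` recovers `rho`, and `alpha_of_n(n_of_alpha(alpha, d), d)` recovers `alpha`, which is the sense in which $(\tilde{\rho},\tilde{\alpha})$ are defined by inverting the mapping.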
Since $(\kappa^{*}_{d},n^{*}_{d})$ is the maximizer of $\mathcal{P}_{d}$, we immediately have the lower bound $\Delta_{d}\geq 0$. Next, for the upper bound, we expand both terms using the approximation relation derived above:
$$\begin{aligned}\big|\mathcal{P}_{d}(\kappa^{*}_{d},n^{*}_{d})-\mathcal{P}_{\infty}(\tilde{\rho},\tilde{\alpha})\big|&\leq K_{\text{total}}(\tilde{\rho},\tilde{\alpha})\sqrt{\frac{\log d}{d}},\\ \big|\mathcal{P}_{d}(\kappa^{*}_{\infty},n^{*}_{\infty})-\mathcal{P}_{\infty}(\rho^{*},\alpha^{*})\big|&\leq K_{\text{total}}(\rho^{*},\alpha^{*})\sqrt{\frac{\log d}{d}}.\end{aligned}$$
Substituting these into the gap expression yields
$$\begin{aligned}\big|\Delta_{d}\big|&\leq\big|\mathcal{P}_{d}(\kappa^{*}_{d},n^{*}_{d})-\mathcal{P}_{\infty}(\tilde{\rho},\tilde{\alpha})\big|+\big|\mathcal{P}_{d}(\kappa^{*}_{\infty},n^{*}_{\infty})-\mathcal{P}_{\infty}(\rho^{*},\alpha^{*})\big|\\&\leq\big(K_{\text{total}}(\tilde{\rho},\tilde{\alpha})+K_{\text{total}}(\rho^{*},\alpha^{*})\big)\sqrt{\frac{\log d}{d}},\end{aligned}$$
where we use the fact that $\mathcal{P}_{\infty}(\tilde{\rho},\tilde{\alpha})\leq\mathcal{P}_{\infty}(\rho^{*},\alpha^{*})$.
It remains to show that $K_{\text{total}}(\tilde{\rho},\tilde{\alpha})$ is uniformly bounded in $d$. We use a similar argument as in the proof of Theorem [4](https://arxiv.org/html/2605.23944#Thmtheorem4). Since $\mathcal{P}_{d}(0,1)\geq 0$ (zero communication, single recommendation), any optimizer must satisfy $\mathcal{P}_{d}(\kappa^{*}_{d},n^{*}_{d})\geq 0$. Using the bound $\mathbb{E}[\max_{i}\langle\mathbf{h},\boldsymbol{\theta}_{i}\rangle]\leq 1$, we have
$$0\leq\mathcal{P}_{d}(\kappa^{*}_{d},n^{*}_{d})\leq 1-\lambda_{s}\log n^{*}_{d}-\lambda_{c}D_{\mathrm{KL}}.$$
This gives $\lambda_{s}\log n^{*}_{d}\leq 1$, i.e., $\tilde{\alpha}=\frac{\log n^{*}_{d}}{d}\leq\frac{1}{c_{s}}$. Similarly, $\lambda_{c}D_{\mathrm{KL}}\leq 1$. From Proposition [3](https://arxiv.org/html/2605.23944#Thmtheorem3), $D_{\mathrm{KL}}\geq\frac{d-2}{2}\log\frac{1}{1-\tilde{\rho}^{2}}-|\mathcal{E}_{3}|$. For $d$ sufficiently large, the error term is lower order (in terms of $d$), giving $\frac{c_{c}}{2}\log\frac{1}{1-\tilde{\rho}^{2}}\lesssim 1$, which yields $\tilde{\rho}\leq\sqrt{1-e^{-2/c_{c}}}$. Thus $(\tilde{\rho},\tilde{\alpha})$ lies in the compact set $[0,\sqrt{1-e^{-2/c_{c}}}]\times[0,1/c_{s}]$ for all large $d$, and $K_{\text{total}}(\tilde{\rho},\tilde{\alpha})$ is uniformly bounded.
This establishes the upper bound and completes the proof\. ∎
### F.2 Proof of Proposition [6](https://arxiv.org/html/2605.23944#Thmtheorem6)
###### Proof\.
From the expression of the objective $\mathcal{P}_{d}(\kappa,n)$, we have that for any fixed interaction policy $(\kappa,n)$,
$$\frac{\partial}{\partial\lambda_{s}}\mathcal{P}_{d}(\kappa,n)=-\log n,\qquad\frac{\partial}{\partial\lambda_{c}}\mathcal{P}_{d}(\kappa,n)=-D_{\text{KL}}\big(q_{\kappa}(\cdot|\cdot)\,\|\,p(\cdot)\big).$$
By the Envelope Theorem, the derivative of the maximized payoff $\max_{\kappa,n}\mathcal{P}_{d}(\kappa,n)$ with respect to a cost parameter is the derivative of the objective evaluated at the optimal policy $(\kappa^{*},n^{*})$. Since $\log n\geq 0$ for all $n\geq 1$ and the KL divergence is non-negative, it follows that:
$$\frac{\partial}{\partial\lambda_{s}}\Big[\max_{\kappa,n}\mathcal{P}_{d}(\kappa,n)\Big]=-\log n^{*}\leq 0,\quad\text{and}\quad\frac{\partial}{\partial\lambda_{c}}\Big[\max_{\kappa,n}\mathcal{P}_{d}(\kappa,n)\Big]=-D_{\text{KL}}(q_{\kappa^{*}}\,\|\,p)\leq 0.$$
Next, we use the fact that the expected utility is bounded above by $1$. For any policy with $n\geq 2$, the payoff is constrained by $\mathcal{P}_{d}(\kappa,n)\leq 1-\lambda_{s}\log 2$. We can define a threshold $\bar{\lambda}_{s}=1/\log 2$ such that for any $\lambda_{s}>\bar{\lambda}_{s}$, the payoff for any $n\geq 2$ is strictly less than zero. Since the system can always achieve a payoff of at least $0$ by selecting $n=1$ and $\kappa=0$, any policy with $n\geq 2$ becomes strictly suboptimal. Combined with the fact that the optimal payoff is a decreasing function in terms of $\lambda_{s}$, there exists $\bar{\lambda}_{s}(\lambda_{c})$ such that for all $\lambda_{s}\geq\bar{\lambda}_{s}(\lambda_{c})$, we have $n^{*}_{d}=1$. This implies that for $\lambda_{c}<\underline{\lambda}_{c}(\lambda_{s})=\bar{\lambda}_{s}^{-1}(\lambda_{s})$, we have that $n^{*}_{d}=1$.
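The threshold argument is simple enough to check by direct evaluation. The sketch below computes the cap $1-\lambda_{s}\log n$ on the payoff of any policy that shows $n$ items, which follows from the expected utility being at most $1$ and the communication cost being non-negative. The function name and the example value of $\lambda_{s}$ are hypothetical.

```python
import math

def payoff_cap(lam_s, n):
    """Upper bound 1 - lambda_s * log(n) on the payoff of any policy showing n items."""
    return 1 - lam_s * math.log(n)

# Threshold above which every policy with n >= 2 has a negative payoff cap
threshold = 1 / math.log(2)

# Hypothetical search cost slightly above the threshold
caps = [payoff_cap(1.5, n) for n in (1, 2, 3, 10)]
```

For $\lambda_{s}$ above the threshold, every entry of `caps` with $n\geq 2$ is negative, while $n=1$ leaves the cap at $1$, so the fallback policy $n=1$, $\kappa=0$ with payoff at least $0$ beats any larger recommendation set.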
A similar argument applies to the Pure Search regime. Note that the KL divergence $D_{\text{KL}}(q_{\kappa}\,\|\,p)$ is a strictly increasing and unbounded function of $\kappa$ for the vMF distribution. As such, there exists a threshold $\bar{\lambda}_{c}$ such that for all $\lambda_{c}>\bar{\lambda}_{c}$ and for any $n$,
$$\frac{\partial}{\partial\kappa}\Big[\mathcal{P}_{d}(\kappa,n)\Big]\Bigg|_{\kappa=0}<0.$$
Combined with the fact that the optimal payoff is a decreasing function in terms of $\lambda_{c}$, there exists $\bar{\lambda}_{c}(\lambda_{s})$ such that for all $\lambda_{c}\geq\bar{\lambda}_{c}(\lambda_{s})$, we have $\kappa^{*}_{d}=0$. ∎
## Appendix G Proofs of Results in Section [4](https://arxiv.org/html/2605.23944#S4)
### G.1 Proof of Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7)
###### Proof\.
Using the decomposition presented in Section [2.1](https://arxiv.org/html/2605.23944#S2.SS1) for $\mathbf{h}$, and the analogous decomposition of $\bm{\theta}_{i}$ (as $\bm{\theta}_{i} \sim q(\cdot|\mathbf{m}) \in \mathcal{Q}$), we have

$$\mathbf{h} = W\mathbf{m} + \sqrt{1-W^{2}}\,\mathbf{Y}, \quad \bm{\theta}_{i} = V_{i}\mathbf{m} + \sqrt{1-V_{i}^{2}}\,\mathbf{Y}_{i},$$

where $\mathbf{Y}$ and the $\mathbf{Y}_{i}$'s represent the uncommunicated components (see Section [2.1](https://arxiv.org/html/2605.23944#S2.SS1)). Using the above decomposition, we can write the utility as

$$\langle\mathbf{h},\bm{\theta}_{i}\rangle = WV_{i} + \sqrt{1-W^{2}}\sqrt{1-V_{i}^{2}}\,X_{i},$$

where $X_{i} = \langle\mathbf{Y},\mathbf{Y}_{i}\rangle$. From Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1), we know that the $X_{i}$'s are independent and identically distributed. The expected utility of the recommendation set is then

$$U_{n}(q) = \mathbb{E}\Big[\max_{i\in[n]}\big\{WV_{i} + \sqrt{1-W^{2}}\sqrt{1-V_{i}^{2}}\,X_{i}\big\}\Big].$$

Using the bound on the MGF (moment generating function) of $W$ from Lemma [11](https://arxiv.org/html/2605.23944#Thmtheorem11), we have that

$$\mathbb{E}\big[|W-\mathbb{E}[W]|\big] \leq \frac{1}{\sqrt{d^{\star}}}.$$

Further, as $|W| \leq 1$, we have

$$\mathbb{E}\left|\sqrt{1-W^{2}}-\mathbb{E}[\sqrt{1-W^{2}}]\right| \leq 2\,\mathbb{E}\left|\sqrt{1-W^{2}}-\sqrt{1-\rho^{2}}\right| \leq \frac{2}{\sqrt{1-\rho^{2}}}\,\mathbb{E}|W-\rho| \leq \frac{2\rho\bar{K}}{\sqrt{d^{\star}}(1-\rho^{2})},$$

where the second inequality follows by using Eq. ([15](https://arxiv.org/html/2605.23944#A5.E15)), and the third inequality follows by Lemma [11](https://arxiv.org/html/2605.23944#Thmtheorem11). Further, as $|V_{i}| \leq 1$ and $|X_{i}| \leq 1$, we get

$$\Big|U_{n}(q)-\mathbb{E}\Big[\max_{i\in[n]}\big\{\mathbb{E}[W]V_{i} + \mathbb{E}\big[\sqrt{1-W^{2}}\big]\sqrt{1-V_{i}^{2}}\,X_{i}\big\}\Big]\Big| \leq \frac{1+2\rho\bar{K}}{\sqrt{d^{\star}}(1-\rho^{2})}. \tag{26}$$
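The decomposition of $\langle\mathbf{h},\bm{\theta}_{i}\rangle$ used at the start of this proof can be sanity-checked numerically. The following is a minimal sketch, not part of the paper: the helper names (`unit`, `dot`, `orthogonal_unit`, `compose`) and the values of `W`, `V_i`, and `d` are illustrative assumptions. It builds $\mathbf{h}$ and $\bm{\theta}_{i}$ from a random message direction and two unit vectors orthogonal to it, then compares both sides of the identity.

```python
import math
import random

def unit(v):
    """Normalize a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonal_unit(m, rng):
    """Random unit vector orthogonal to the unit vector m (project out m, then normalize)."""
    g = [rng.gauss(0.0, 1.0) for _ in m]
    proj = dot(g, m)
    return unit([gi - proj * mi for gi, mi in zip(g, m)])

def compose(w, m, y):
    """Return w*m + sqrt(1 - w^2)*y, which has unit norm when m and y are orthonormal."""
    s = math.sqrt(1.0 - w * w)
    return [w * mi + s * yi for mi, yi in zip(m, y)]

rng = random.Random(0)
d = 8                      # illustrative dimension
m = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
Y = orthogonal_unit(m, rng)
Y_i = orthogonal_unit(m, rng)
W, V_i = 0.6, -0.3         # arbitrary illustrative values in (-1, 1)

h = compose(W, m, Y)
theta_i = compose(V_i, m, Y_i)
X_i = dot(Y, Y_i)

lhs = dot(h, theta_i)
rhs = W * V_i + math.sqrt(1 - W**2) * math.sqrt(1 - V_i**2) * X_i
print(abs(lhs - rhs))      # difference at floating-point precision
```

The cross terms vanish because $\mathbf{Y}$ and $\mathbf{Y}_{i}$ are orthogonal to $\mathbf{m}$, which is why only $WV_{i}$ and the $X_{i}$ term survive.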
To simplify this expectation, we define $M_{n} = \max_{i\in[n]}X_{i}$. Then, we can upper bound each individual item's utility using

$$\begin{aligned}
\max_{i\in[n]}\,&\big\{\mathbb{E}[W]V_{i} + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}}\,X_{i}\big\} \\
&\leq \max_{i\in[n]}\big\{\mathbb{E}[W]V_{i} + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}}\,M_{n}\big\} \\
&\leq \max_{i\in[n]}\big\{\mathbb{E}[W]V_{i} + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}}\,\mathbb{E}[M_{n}]\big\} + |M_{n}-\mathbb{E}[M_{n}]| \\
&\leq \max_{v}\Big\{\mathbb{E}[W]v + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-v^{2}}\,\mathbb{E}[M_{n}]\Big\} + |M_{n}-\mathbb{E}[M_{n}]|,
\end{aligned} \tag{27}$$

where we use that $\mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}} \leq 1$, and the last inequality holds simply because

$$\mathbb{E}[W]V_{i} + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}}\,\mathbb{E}[M_{n}] \leq \max_{v}\Big\{\mathbb{E}[W]v + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-v^{2}}\,\mathbb{E}[M_{n}]\Big\}.$$
Next, we provide a bound on $\mathbb{E}|M_{n}-\mathbb{E}[M_{n}]|$. We use an argument similar to that in the proof of Lemma [1](https://arxiv.org/html/2605.23944#Thmtheorem1) to argue that the distribution of $M_{n}$ is independent of the message $\mathbf{m}$. Recall that $\mathbf{Y}$ and the $\mathbf{Y}_{i}$'s are independent samples from the uniform distribution over $\mathcal{S}(\mathbf{m}) = \{\mathbf{y}\in\mathcal{S}^{d-1} : \langle\mathbf{y},\mathbf{m}\rangle = 0\}$. Due to the rotational symmetry of $\mathcal{S}^{d-1}$, the distribution of $X_{i} = \langle\mathbf{Y},\mathbf{Y}_{i}\rangle$ is independent of $\mathbf{m}$. Therefore, without loss of generality, we can choose $\mathbf{m} = \mathbf{e}_{d} = (0,0,\dots,0,1)$. For this choice of $\mathbf{m}$, we can generate samples from $\text{Unif}\big(\mathcal{S}(\mathbf{e}_{d})\big)$ as follows: first sample $\tilde{\mathbf{Y}} \sim \text{Unif}\big(\mathcal{S}^{d-2}\big)$, then set $\mathbf{Y} = (\tilde{\mathbf{Y}},0)$. Similarly, we can write $\mathbf{Y}_{i} = (\tilde{\mathbf{Y}}_{i},0)$, where the $\tilde{\mathbf{Y}}_{i}$ are independent samples from $\text{Unif}\big(\mathcal{S}^{d-2}\big)$. Consequently, $X_{i} = \langle\mathbf{Y},\mathbf{Y}_{i}\rangle = \langle\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{i}\rangle$. This implies that $M_{n}$ follows the same distribution as $\max_{i}\langle\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{i}\rangle$, where $\tilde{\mathbf{Y}}$ and the $\tilde{\mathbf{Y}}_{i}$'s are independent samples from $\text{Unif}\big(\mathcal{S}^{d-2}\big)$.
As such, without loss of generality, we define $M_{n}$ as a function of $\tilde{\mathbf{Y}}$ and the $\tilde{\mathbf{Y}}_{i}$'s. That is, we write $M_{n}(\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{1},\dots,\tilde{\mathbf{Y}}_{n}) := \max_{i\in[n]}\langle\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{i}\rangle$, where $M_{n}(\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{1},\dots,\tilde{\mathbf{Y}}_{n})$ is $1$-Lipschitz in each of its arguments separately.
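The separate $1$-Lipschitz property of $M_{n}$ can be illustrated numerically. The sketch below is an assumption-laden illustration rather than anything from the paper: the function names `random_unit` and `M_n`, the dimension, and the sample counts are all hypothetical choices. It perturbs one argument at a time and records the largest ratio of the change in $M_{n}$ to the Euclidean distance moved, which should not exceed $1$ by the Cauchy-Schwarz inequality.

```python
import math
import random

def random_unit(dim, rng):
    """Uniform point on the unit sphere in R^dim (normalized Gaussian draw)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def M_n(y, ys):
    """max_i <y, y_i> over the list of vectors ys."""
    return max(sum(a * b for a, b in zip(y, yi)) for yi in ys)

rng = random.Random(1)
dim, n = 10, 25            # illustrative sizes
y = random_unit(dim, rng)
ys = [random_unit(dim, rng) for _ in range(n)]
base = M_n(y, ys)

worst_ratio = 0.0
for _ in range(500):
    # Perturb the first argument only.
    y_new = random_unit(dim, rng)
    ratio = abs(M_n(y_new, ys) - base) / math.dist(y_new, y)
    worst_ratio = max(worst_ratio, ratio)
    # Perturb a single one of the remaining n arguments.
    k = rng.randrange(n)
    ys_new = ys[:k] + [random_unit(dim, rng)] + ys[k + 1:]
    ratio = abs(M_n(y, ys_new) - base) / math.dist(ys_new[k], ys[k])
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio)         # largest observed Lipschitz ratio
```

A random search like this cannot prove the Lipschitz bound; it only fails to find a counterexample, which is consistent with the claim.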
As presented in [Ledoux, [2001](https://arxiv.org/html/2605.23944#bib.bib42), Bakry et al., [2013](https://arxiv.org/html/2605.23944#bib.bib41)], the uniform distribution over $\mathcal{S}^{d-2}$ satisfies a Log-Sobolev Inequality (LSI) with constant $1/(d-2)$. By the tensorization property of the LSI [Ledoux, [2001](https://arxiv.org/html/2605.23944#bib.bib42), Chapter 5], the product measure on $\mathcal{S}^{d-2}\times(\mathcal{S}^{d-2})^{n}$ also satisfies an LSI with the same constant $1/(d-2)$. Consequently, for any function $g:\mathcal{S}^{d-2}\times(\mathcal{S}^{d-2})^{n}\rightarrow\mathbb{R}$ that is $1$-Lipschitz separately in each of its arguments, we have

$$\mathbb{P}\big(|g(\tilde{\mathbf{Y}})-\mathbb{E}[g(\tilde{\mathbf{Y}})]| > \varepsilon\big) \leq 2\exp\left(-\frac{(d-2)\varepsilon^{2}}{2}\right).$$

Since $M_{n}(\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{1},\dots,\tilde{\mathbf{Y}}_{n}) = \max_{i\in[n]}\langle\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{i}\rangle$ is $1$-Lipschitz in each of its arguments separately, we obtain

$$\mathbb{P}\big(|M_{n}-\mathbb{E}[M_{n}]| > \varepsilon\big) \leq 2\exp\left(-\frac{(d-2)\varepsilon^{2}}{2}\right).$$

By integrating this tail, the expected deviation is bounded as

$$\mathbb{E}\big[|M_{n}-\mathbb{E}[M_{n}]|\big] \leq \sqrt{\frac{\pi}{2(d-2)}} \leq \sqrt{\frac{\pi}{2d^{\star}}}.$$

Combining this with the arguments in Eq. ([26](https://arxiv.org/html/2605.23944#A7.E26)) and ([27](https://arxiv.org/html/2605.23944#A7.Ex171)), and using the triangle inequality, we get

$$\Big|U_{n}(q)-\mathbb{E}\Big[\max_{i\in[n]}\big\{\mathbb{E}[W]V_{i} + \mathbb{E}[\sqrt{1-W^{2}}]\sqrt{1-V_{i}^{2}}\,\mathbb{E}[M_{n}]\big\}\Big]\Big| \leq \frac{1}{\sqrt{d^{\star}}}\Big(\sqrt{\frac{\pi}{2}} + \frac{2\rho\bar{K}}{(1-\rho^{2})}\Big).$$

Now, the result follows by taking $K_{7} = \sqrt{\frac{\pi}{2}} + 2\bar{K}$, where the subscript refers to Theorem [7](https://arxiv.org/html/2605.23944#Thmtheorem7). This completes the proof. ∎
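The concentration of $M_{n}$ around its mean can be illustrated by simulation. The following is a rough sketch under stated assumptions: the function names, the seed, and the values of `d`, `n`, and `trials` are hypothetical, and the empirical mean stands in for $\mathbb{E}[M_{n}]$. It uses the rotational-symmetry argument from the proof to fix $\tilde{\mathbf{Y}} = \mathbf{e}_{1}$, so each $X_{i}$ is the first coordinate of a uniform point on $\mathcal{S}^{d-2}$, and compares the empirical mean absolute deviation with the stated bound $\sqrt{\pi/(2(d-2))}$.

```python
import math
import random

def sphere_first_coordinate(dim, rng):
    """First coordinate of a uniform point on the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return g[0] / math.sqrt(sum(x * x for x in g))

def sample_M_n(d, n, rng):
    """One draw of M_n = max_i <Y~, Y~_i> with Y~, Y~_i uniform on S^{d-2}.

    By rotational symmetry we may take Y~ = e_1, so <Y~, Y~_i> is the first
    coordinate of Y~_i, a uniform point on the unit sphere in R^{d-1}.
    """
    return max(sphere_first_coordinate(d - 1, rng) for _ in range(n))

rng = random.Random(2)
d, n, trials = 40, 30, 400   # illustrative values, not from the paper
draws = [sample_M_n(d, n, rng) for _ in range(trials)]

mean = sum(draws) / trials
mad = sum(abs(x - mean) for x in draws) / trials   # empirical E|M_n - E[M_n]|
bound = math.sqrt(math.pi / (2 * (d - 2)))         # bound stated in the proof

print(mad, bound)
```

In this setup the empirical deviation should sit well below the bound, which reflects that the Log-Sobolev argument is dimension-driven and does not exploit the structure of the maximum.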
### G.2 Proof of Theorem [8](https://arxiv.org/html/2605.23944#Thmtheorem8)
###### Proof\.
To find the expected value of $M_{n}$, we first consider the special case $\rho = 0$ in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2). From the utility approximation result in Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2), given a message $\mathbf{m}$ and a recommendation set size $n = \lfloor e^{d\alpha}\rfloor$, the expected utility for $\rho = 0$ is

$$\lim_{d\rightarrow\infty}\mathbb{E}\Big[\max_{i\in[n]}\langle\mathbf{h},\bm{\theta}_{i}\rangle\Big] = f(0,\alpha),$$

where $f(0,\alpha)$ is the asymptotic utility defined by the large deviations rate function $I_{0}(w,x)$:

$$f(0,\alpha) := \max_{w,x\in(-1,1)}\left\{\sqrt{1-w^{2}}\,x\right\}\quad\text{subject to}\quad I_{0}(w,x) \leq \alpha,$$

where $I_{0}(w,x) = -\frac{1}{2}\log(1-w^{2})-\frac{1}{2}\log(1-x^{2})$. The maximum is attained at $w = 0$ and $x = \sqrt{1-e^{-2\alpha}}$, giving us

$$f(0,\alpha) = \sqrt{1-e^{-2\alpha}}.$$

This result characterizes the expected maximum of inner products for vectors drawn uniformly from $\mathcal{S}^{d-1}$. Specifically, for $\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{1},\dots,\tilde{\mathbf{Y}}_{n}$ i.i.d. and uniform over $\mathcal{S}^{d-1}$,

$$\lim_{d\rightarrow\infty}\mathbb{E}\left[\max_{i\in[n]}\langle\tilde{\mathbf{Y}},\tilde{\mathbf{Y}}_{i}\rangle\right] = \sqrt{1-e^{-2\alpha}}.$$

Finally, we apply this to the maximum of $X_{i} = \langle\mathbf{Y},\mathbf{Y}_{i}\rangle$. Although $\mathbf{Y}$ and $\mathbf{Y}_{i}$ are uniform over the $(d-2)$-dimensional hypersphere $\mathcal{S}^{d-2}$ rather than $\mathcal{S}^{d-1}$, the asymptotic behavior of the rate function remains identical for large $d$. Thus, the expected maximum of the search components is

$$\lim_{d\rightarrow\infty}\mathbb{E}[M_{n}] = \lim_{d\rightarrow\infty}\mathbb{E}\left[\max_{i\in[n]}X_{i}\right] = \sqrt{1-e^{-2\alpha}}.$$

From Lemma [11](https://arxiv.org/html/2605.23944#Thmtheorem11) (and as argued in the proof of Proposition [2](https://arxiv.org/html/2605.23944#Thmtheorem2)), we have
$$\lim_{d\rightarrow\infty}\mathbb{E}[W] = \rho\quad\text{and}\quad\lim_{d\rightarrow\infty}\mathbb{E}[\sqrt{1-W^{2}}] = \sqrt{1-\rho^{2}}.$$

By setting the concentration parameter $\kappa_{d}(\rho) = \frac{\rho}{1-\rho^{2}}(d-3)$ and the set size $n_{d}(\alpha) = \lfloor e^{\alpha d}\rfloor$ for fixed $\rho$ and $\alpha$, for any choice of tilt parameter $v$, we have

$$\lim_{d\to\infty}\mathcal{T}_{d}(\kappa_{d},n_{d},v) = \mathcal{T}_{\infty}(\rho,\alpha,v),$$

where the limiting utility is given by

$$\mathcal{T}_{\infty}(\rho,\alpha,v) = \rho v + \sqrt{1-\rho^{2}}\sqrt{1-v^{2}}\sqrt{1-e^{-2\alpha}}.$$

To find the optimal tilt parameter $v^{*}$ corresponding to $\mathcal{T}_{\infty}$, we simply optimize $\mathcal{T}_{\infty}(\rho,\alpha,v)$ over $v$ for fixed $\rho$ and $\alpha$, which gives

$$v^{*}(\rho,\alpha) = \frac{\rho}{\sqrt{\rho^{2}+(1-\rho^{2})(1-e^{-2\alpha})}}.$$

Substituting $v^{*}(\rho,\alpha)$ back into $\mathcal{T}_{\infty}(\rho,\alpha,v)$, the maximized utility (at the optimal tilt) simplifies to

$$\max_{v}\mathcal{T}_{\infty}(\rho,\alpha,v) = \sqrt{\rho^{2}+(1-\rho^{2})(1-e^{-2\alpha})}.$$

Next, we aim to solve for $\rho^{*}$ and $\alpha^{*}$, where

$$\begin{aligned}
(\rho^{*},\alpha^{*}) &= \arg\max_{\rho\in[0,1),\,\alpha\geq 0}\left\{\sqrt{\rho^{2}+(1-\rho^{2})(1-e^{-2\alpha})} - c_{s}\alpha + \frac{1}{2}c_{c}\log(1-\rho^{2})\right\} \\
&= \arg\max_{\rho\in[0,1),\,\alpha\geq 0}\left\{\sqrt{1-(1-\rho^{2})e^{-2\alpha}} + \frac{1}{2}c_{s}\log e^{-2\alpha} + \frac{1}{2}c_{c}\log(1-\rho^{2})\right\}.
\end{aligned}$$

As can be easily observed: (i) when $c_{c} > c_{s}$, it is optimal to set $\rho = 0$, i.e., $\rho^{*} = 0$, which also gives $v^{*} = 0$, as $v^{*}(0,\alpha) = 0$ for any $\alpha > 0$; (ii) conversely, when $c_{c} < c_{s}$, it is optimal to set $\alpha = 0$, i.e., $\alpha^{*} = 0$, which also gives $v^{*} = 1$, as $v^{*}(\rho,0) = 1$ for any $\rho > 0$.
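The closed-form optimal tilt and the maximized limiting utility used above can be cross-checked against a brute-force search over $v$. This is a minimal sketch, assuming illustrative values $\rho = 0.5$ and $\alpha = 0.4$; the function names `T_inf`, `v_star`, and `T_max` are hypothetical labels for $\mathcal{T}_{\infty}$, $v^{*}$, and $\max_{v}\mathcal{T}_{\infty}$.

```python
import math

def T_inf(rho, alpha, v):
    """Limiting utility T_infinity(rho, alpha, v)."""
    search = math.sqrt(1.0 - math.exp(-2.0 * alpha))
    return rho * v + math.sqrt(1.0 - rho**2) * math.sqrt(1.0 - v**2) * search

def v_star(rho, alpha):
    """Closed-form optimal tilt."""
    return rho / math.sqrt(rho**2 + (1.0 - rho**2) * (1.0 - math.exp(-2.0 * alpha)))

def T_max(rho, alpha):
    """Closed-form utility at the optimal tilt."""
    return math.sqrt(rho**2 + (1.0 - rho**2) * (1.0 - math.exp(-2.0 * alpha)))

rho, alpha = 0.5, 0.4                       # illustrative values
grid = [i / 10000 for i in range(10001)]    # candidate tilts v in [0, 1]
v_grid = max(grid, key=lambda v: T_inf(rho, alpha, v))

print(v_grid, v_star(rho, alpha))                            # grid vs closed-form tilt
print(T_inf(rho, alpha, v_grid), T_max(rho, alpha))          # grid vs closed-form utility
print(v_star(0.0, alpha), v_star(rho, 0.0))                  # the two corner cases
```

The last line evaluates the two corner cases discussed in the text: with $\rho = 0$ the tilt is $0$, and with $\alpha = 0$ the tilt is $1$.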
Further, writing $c_{\min} = \min\{c_{s},c_{c}\}$ and $z = (1-\rho^{2})e^{-2\alpha}$, we have

$$\begin{aligned}
\max_{\rho\in[0,1),\,\alpha\geq 0}\,&\left\{\sqrt{1-(1-\rho^{2})e^{-2\alpha}} + \frac{1}{2}c_{s}\log e^{-2\alpha} + \frac{1}{2}c_{c}\log(1-\rho^{2})\right\} \\
&\leq \max_{\rho\in[0,1),\,\alpha\geq 0}\left\{\sqrt{1-(1-\rho^{2})e^{-2\alpha}} + \frac{1}{2}c_{\min}\log e^{-2\alpha} + \frac{1}{2}c_{\min}\log(1-\rho^{2})\right\} \\
&= \max_{z\in(0,1]}\left\{\sqrt{1-z} + \frac{1}{2}c_{\min}\log z\right\}.
\end{aligned} \tag{28}$$

By solving the above optimization problem, we get

$$z^{*} = \frac{1}{2}\Big(\sqrt{c_{\min}^{4}+4c_{\min}^{2}} - c_{\min}^{2}\Big),$$

and equality in Eq. ([28](https://arxiv.org/html/2605.23944#A7.Ex191)) holds when either $\rho^{*} = 0$ (if $c_{c} > c_{s}$) or $\alpha^{*} = 0$ (if $c_{c} < c_{s}$), and

$$\big(1-(\rho^{*})^{2}\big)e^{-2\alpha^{*}} = z^{*} = \frac{1}{2}\Big(\sqrt{c_{\min}^{4}+4c_{\min}^{2}} - c_{\min}^{2}\Big).$$

This completes the proof. ∎
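The conclusion of this proof can be checked numerically with a coarse grid search over $(\rho,\alpha)$. The sketch below is illustrative only: the cost values, the grid resolution, the cap `alpha_max`, and the function names are assumptions, not quantities from the paper. It verifies that the grid maximizer uses only the cheaper channel and that $(1-\rho^{2})e^{-2\alpha}$ at the maximizer is close to the closed-form $z^{*}$.

```python
import math

def z_star(c_min):
    """Closed-form maximizer of sqrt(1 - z) + 0.5 * c_min * log(z)."""
    return 0.5 * (math.sqrt(c_min**4 + 4.0 * c_min**2) - c_min**2)

def objective(rho, alpha, c_s, c_c):
    """Limiting payoff at the optimal tilt, net of search and communication costs."""
    utility = math.sqrt(1.0 - (1.0 - rho**2) * math.exp(-2.0 * alpha))
    return utility - c_s * alpha + 0.5 * c_c * math.log(1.0 - rho**2)

def grid_argmax(c_s, c_c, steps=400, alpha_max=4.0):
    """Brute-force maximizer over rho in [0, 1) and alpha in [0, alpha_max]."""
    best_val, best_rho, best_alpha = -math.inf, 0.0, 0.0
    for i in range(steps):                  # rho = 0, 1/steps, ..., (steps-1)/steps
        rho = i / steps
        for j in range(steps + 1):          # alpha = 0, ..., alpha_max
            alpha = alpha_max * j / steps
            val = objective(rho, alpha, c_s, c_c)
            if val > best_val:
                best_val, best_rho, best_alpha = val, rho, alpha
    return best_rho, best_alpha

# Case (i): communication is costlier, so the grid optimum should have rho near 0.
rho1, alpha1 = grid_argmax(c_s=0.3, c_c=0.6)
# Case (ii): search is costlier, so the grid optimum should have alpha near 0.
rho2, alpha2 = grid_argmax(c_s=0.6, c_c=0.3)

z1 = (1.0 - rho1**2) * math.exp(-2.0 * alpha1)
z2 = (1.0 - rho2**2) * math.exp(-2.0 * alpha2)
print(rho1, alpha1, z1, z_star(0.3))
print(rho2, alpha2, z2, z_star(0.3))
```

Both cases share $c_{\min} = 0.3$, so they should land on approximately the same $z^{*}$ while reaching it through different channels, which matches the "use only one of communication and search" conclusion.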