Counterparty Modeling is Not Strategy: The Limits of LLM Negotiators
Summary
Study shows LLM agents can model counterparty preferences in negotiation but fail to turn that knowledge into strategic bargaining to improve outcomes, limiting their effectiveness in multi-turn negotiations.
Cached at: 05/19/26, 06:34 AM
# The Limits of LLM Negotiators
Source: [https://arxiv.org/html/2605.16575](https://arxiv.org/html/2605.16575)
## Counterparty Modeling is Not Strategy: The Limits of LLM Negotiators
###### Abstract
Negotiation requires more than inferring what the other side wants: it requires using that information to make advantageous offers and counteroffers over multiple turns\. We study whether large language model \(LLM\) agents do this in a controlled multi\-attribute bargaining environment\. We find that current LLM agents can model a counterparty’s preferences, but do not reliably turn that knowledge into strategic bargaining\. When given negotiating partner preference information, agents model it accurately and early in their reasoning traces, yet this does not reliably improve outcomes for the informed side\. Turn\-level analyses show why: agents often respond to what they believe the counterparty values, but do not consistently pair those moves with gains on their own high\-value attributes\. Sellers are more accommodating overall, and in asymmetric\-information conditions, the informed side often makes the more weakly compensated concessions\. Because agents fail to leverage this underlying utility structure for strategic advantage, their final agreements are heavily dictated by surface\-level opening anchors rather than actual utility weights\. Finally, requiring agents to explicitly state concession\-for\-reciprocity trades before making an offer makes individual turns look more strategic, but ultimately fails to improve the efficiency of the final agreements\.
## 1\. Introduction
Negotiation is a compact test of strategic social intelligence (Zhou et al., [2023](https://arxiv.org/html/2605.16575#bib.bib40)). To negotiate well, an agent must not only infer what the other party wants, but also decide how to use that information over multiple turns to maximize its own outcome: which concessions to make, which attributes to hold fixed, and what to demand in return. For example, knowing the counterparty’s priorities should allow an agent to concede on low-cost attributes in exchange for gains on its own top priorities. Strategic use of preference information therefore means more than identifying what the other party wants: it means turning that knowledge into reciprocal exchanges that improve one’s own position.
This makes negotiation a useful setting for separating two capabilities that are often conflated in large language models (LLMs): *counterparty modeling* and *strategic utilization* (Zhao et al., [2024](https://arxiv.org/html/2605.16575#bib.bib39); Kosinski, [2024](https://arxiv.org/html/2605.16575#bib.bib37)). Information about the other party’s preferences should help only if it can be converted into better contingent action. The central question of this paper is whether current LLM agents can make that transition from counterparty modeling to strategy.
As LLMs are increasingly used in procurement, contracting, sales, and organizational coordination (Song et al., [2026](https://arxiv.org/html/2605.16575#bib.bib33); Kirshner et al., [2026](https://arxiv.org/html/2605.16575#bib.bib36)), it is not enough for an agent to recognize another party’s preferences. It must convert that knowledge into advantageous action under interaction. An agent that understands the other party but systematically bargains poorly may appear socially intelligent while remaining strategically fragile and exploitable.
This setting highlights a general issue in LLM evaluation: evidence that a model can describe or reason about a task-relevant variable should not be taken as evidence that it can also use that variable effectively in sequential behavior (Huang et al., [2025](https://arxiv.org/html/2605.16575#bib.bib38)). In humans, that inference is often reasonable. In LLM agents, however, these two capacities may be only partially aligned (Junqué de Fortuny and Cappelli, [2025](https://arxiv.org/html/2605.16575#bib.bib35)). Negotiation offers a controlled setting in which to examine that distinction directly.
Recent work has established that LLMs can participate in negotiations, respond to social tactics, and sometimes achieve strong final outcomes in bilateral bargaining or benchmark settings (Deng et al., [2024](https://arxiv.org/html/2605.16575#bib.bib25); Bianchi et al., [2024](https://arxiv.org/html/2605.16575#bib.bib24); Fu et al., [2023](https://arxiv.org/html/2605.16575#bib.bib23); Hua et al., [2024](https://arxiv.org/html/2605.16575#bib.bib26); Liu et al., [2026](https://arxiv.org/html/2605.16575#bib.bib27); Zhu et al., [2025](https://arxiv.org/html/2605.16575#bib.bib28); Rana, [2024](https://arxiv.org/html/2605.16575#bib.bib42); Lewis et al., [2017](https://arxiv.org/html/2605.16575#bib.bib32); Davidson et al., [2024](https://arxiv.org/html/2605.16575#bib.bib31)). But these studies mostly evaluate *what* agreements are reached (final prices, deal rates, welfare, or broad measures of efficiency and fairness) rather than whether an agent can systematically leverage advantages, such as asymmetric information, to secure superior individual outcomes. Because the focus has remained largely on these final states, much less is known about *how* LLM agents use information across turns. In particular, prior work does not isolate whether LLM agents convert explicit knowledge of their counterparty’s preferences into reciprocal, strategically effective bargaining behavior over the course of a negotiation.
In this paper, we study whether current LLM agents can use information about the other side’s preferences strategically in multi\-attribute bilateral negotiation\. Specifically, we place agents in a controlled car purchase bargaining environment with ten contract attributes and randomized linear utility profiles\. Each agent’s utility is a private weighted sum over contract features, determining both their preferences across outcomes and their reservation value\. Because this is a familiar consumer\-negotiation scenario, it helps reduce domain comprehension confounds and lets us focus on strategic reasoning\. Across conditions, we vary whether the buyer, the seller, both, or neither receives a prompt\-level ranked summary of the other party’s preferences\. We then analyze behavior at three levels: final outcomes, reasoning trace beliefs, and turn\-level bargaining dynamics\.
Our main finding is not simply that information about a negotiating partner’s preferences has limited effects on outcomes, but rather that even when modeled correctly, it does not reliably translate into a strategic bargaining advantage\. While providing information modestly improves joint welfare, it does not reliably benefit the information holder\. In particular, providing the seller with the buyer’s preferences does not help the seller extract value, and instead, the seller often simply accommodates the buyer\.
The reasoning traces show that this is not a failure of perception\. Informed agents frequently identify the negotiating partner’s priorities early and explicitly\. The problem arises in the transition from modeling to strategy\. To diagnose that transition, we introduce metrics that measure whether offers move in the direction the agent believes the other party prefers, and whether those moves are paired with gains on the agent’s own high\-value attributes\. Our results show that the agents respond to what the other party values, but do not reliably turn that information into reciprocal exchange\.
Because agents fail to leverage this underlying utility structure for strategic advantage, their negotiations are strongly influenced by the first concrete values introduced in the conversation\. Consistent with this pattern, final agreements are much more strongly tied to the first proposed price than to the agents’ actual utility weights\. This suggests that models rely too much on specific numbers in the prompt or dialogue, instead of using the latent utility structure that should guide efficient bargaining\.
We then investigated whether this gap could be closed by forcing agents to explicitly structure their trades\. We introduced a trade\-plan intervention that requires agents to use a give/ask template before making an offer\. However, this forced structure likewise does not improve efficiency\. This suggests that the missing ingredient is not merely the ability to articulate a one\-step trade, but the ability to embed those trades in reciprocal multi\-turn bargaining\.
These results point to a broader conclusion: current LLM agents can model negotiation\-relevant information, but do not reliably use it to organize bargaining strategically across turns\. Even when an agent has accurate information about the negotiating partner’s priorities, that information does not consistently translate into reciprocal exchange that improves the agent’s own position\.
#### We summarize our contributions as follows:
- Access to the negotiating partner’s preferences does not reliably translate into bargaining advantage for the informed side: outcome effects are modest and often benefit the other side instead (Sec. [4](https://arxiv.org/html/2605.16575#S4)).
- This failure is not explained by weak counterparty modeling. Informed agents form early beliefs about the other party’s priorities, indicating that the breakdown lies in how those beliefs are translated into bargaining behavior (Sec. [5](https://arxiv.org/html/2605.16575#S5)).
- Turn-level analyses reveal that agents act on what they believe the negotiating partner values, but do not reliably convert those moves into gains on their own priorities (Sec. [6](https://arxiv.org/html/2605.16575#S6)).
- Requiring explicit give/ask trade plans via a template does not make negotiation more effective, suggesting that the missing capability is not the ability to state a trade, but to embed such trades in reciprocal multi-turn bargaining (Sec. [7](https://arxiv.org/html/2605.16575#S7)).
## 2\. Related Work
#### LLMs as negotiators\.
Several recent papers show that LLMs can negotiate non-trivially in bilateral settings. This line of work builds on earlier negotiation-dialogue work such as Lewis et al. ([2017](https://arxiv.org/html/2605.16575#bib.bib32)), which showed that end-to-end dialogue agents can learn to negotiate in natural language and that rollout-based planning can improve performance. Deng et al. ([2024](https://arxiv.org/html/2605.16575#bib.bib25)) study buyer-seller bargaining under light prompting and show that LLMs can achieve high trade rates, negotiate prices close to theory-guided benchmarks, and exploit asymmetric information in scalar-price bargaining. Bianchi et al. ([2024](https://arxiv.org/html/2605.16575#bib.bib24)) introduce *negotiationarena*, a platform covering ultimatum games, trading games, and price negotiations, and show that tactics such as aggression can change final payoffs. Fu et al. ([2023](https://arxiv.org/html/2605.16575#bib.bib23)) study self-play and AI feedback in a single-item buyer-seller game, showing that some models can improve final deal prices across rounds. Zhu et al. ([2025](https://arxiv.org/html/2605.16575#bib.bib28)) analyze fully automated agent-to-agent negotiations in consumer markets and identify large model-dependent performance gaps and multiple failure modes, including budget violations and unreasonable deals. Liu et al. ([2026](https://arxiv.org/html/2605.16575#bib.bib27)) broaden the scope further, introducing a benchmark spanning bilateral to many-to-many markets and showing persistent limitations in long-horizon negotiation. These papers establish that LLM negotiation is both feasible and behaviorally rich. They also complement more recent work framing negotiation as a useful testbed for language-model agency (Davidson et al., [2024](https://arxiv.org/html/2605.16575#bib.bib31)).
#### Game\-theoretic evaluation and negotiation workflows\.
A related line of work studies LLMs in game-theoretic environments. Hua et al. ([2024](https://arxiv.org/html/2605.16575#bib.bib26)) evaluate LLM rationality across complete- and incomplete-information games and propose structured workflows that improve performance in tasks such as Deal-or-No-Deal. Relatedly, Lorè and Heydari ([2024](https://arxiv.org/html/2605.16575#bib.bib41)) study the strategic behavior of large language models in canonical two-player social-dilemma games, while Gandhi et al. ([2023](https://arxiv.org/html/2605.16575#bib.bib29)) show that language models can be prompted to perform strategic reasoning in games involving other agents, hidden information, and competing objectives. These results show that LLMs can display nontrivial strategic behavior and that explicit game-theoretic scaffolds can improve rationality and lead to near-optimal allocations.
#### Strategic reasoning with language models\.
Beyond negotiation specifically, a growing literature studies whether LLMs can reason strategically in settings involving other agents, hidden information, and competing objectives. Gandhi et al. ([2023](https://arxiv.org/html/2605.16575#bib.bib29)) show that pretrained language models can be prompted to perform strategic reasoning in games and can exhibit human-like negotiation strategies in realistic scenarios. Liao et al. ([2024](https://arxiv.org/html/2605.16575#bib.bib30)) further show that self-play can substantially improve language-model performance in negotiation-style non-zero-sum settings.
#### Automated negotiation and bargaining theory\.
Additionally, our work sits within a broader academic literature surrounding automated negotiation, bilateral trade, and bargaining under incomplete information (Myerson and Satterthwaite, [1983](https://arxiv.org/html/2605.16575#bib.bib17); Chatterjee and Samuelson, [1983](https://arxiv.org/html/2605.16575#bib.bib18); Faratin et al., [1998](https://arxiv.org/html/2605.16575#bib.bib20); He et al., [2018](https://arxiv.org/html/2605.16575#bib.bib21); Baarslag et al., [2013](https://arxiv.org/html/2605.16575#bib.bib22)). Classical models provide clean benchmarks for information, efficiency, and equilibrium, but abstract away from natural-language reasoning and the richer issue structure of modern LLM interactions.
While the prior work discussed above establishes that LLMs can negotiate, follow workflows, and model game-theoretic scenarios, these studies primarily evaluate final outcomes, e.g., deal rates, prices, overall rationality, or broad measures of efficiency. Even when negotiations unfold over multiple turns, the main object of study is typically the eventual agreement. Recent work has begun to examine the structural mechanisms driving these outcomes. For instance, Wang et al. ([2026](https://arxiv.org/html/2605.16575#bib.bib34)) used automated program discovery to show that frontier LLMs maintain highly sophisticated opponent models in simple sequential games like iterated Rock-Paper-Scissors. Our paper takes a different approach by isolating a specific, process-level phenomenon in complex, natural-language bargaining: whether agents that can infer a counterparty’s preferences actually convert that knowledge into reciprocal, strategically coupled bargaining behavior over time. Rather than asking whether a workflow improves final payoffs, or whether models can exploit simple transition matrices, we analyze belief traces, turn-level offer changes, and the coupling between concessions and self-gain. In doing so, we demonstrate that in multi-attribute language-mediated negotiation, the ability to model a partner and the ability to execute a multi-turn strategy remain two distinct and unaligned capabilities.
## 3\. Experimental Design
We study controlled bilateral negotiation in a multi\-attribute car\-purchase domain\. The goal of the design is to isolate strategic information use: agents face the same action space and bargaining protocol, while information about negotiating partner preferences is manipulated experimentally\.
### 3\.1\. Multi\-Attribute Negotiation Environment
We use a multi-attribute car-purchase domain with 10 negotiation terms (Table [1](https://arxiv.org/html/2605.16575#S3.T1)). The feature vector $\phi(x)\in\mathbb{R}^{20}$ is built in two steps. First, each continuous term is rescaled to $[0,1]$ within the *agent’s own feasible range* (e.g., for the buyer on a Sedan, $\texttt{price\_norm}=(\text{price}-20)/(45-20)$, where $20$ and $45$ are respectively the lowest and highest permitted price for the buyer). Second, categorical and binary terms are one-hot encoded as binary indicators. After both steps every component of $\phi(x)$ lies in $[0,1]$.
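The two encoding steps can be illustrated with a minimal Python sketch. Only the buyer’s Sedan price range (20 to 45) comes from the text; the function names, the dictionary layout, and the list of vehicle models other than Sedan are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def min_max(value: float, low: float, high: float) -> float:
    """Rescale a continuous term to [0, 1] within the agent's feasible range."""
    return (value - low) / (high - low)


def one_hot(choice: str, options: List[str]) -> List[float]:
    """Encode a categorical or binary term as binary indicators."""
    return [1.0 if choice == option else 0.0 for option in options]


def encode(
    contract: Dict[str, object],
    ranges: Dict[str, Tuple[float, float]],
    categories: Dict[str, List[str]],
) -> List[float]:
    """Build the feature vector: continuous terms first, then one-hot blocks."""
    features: List[float] = []
    for name, (low, high) in ranges.items():
        features.append(min_max(float(contract[name]), low, high))
    for name, options in categories.items():
        features.extend(one_hot(str(contract[name]), options))
    return features


# Hypothetical two-term contract (the paper's domain has 10 terms).
ranges = {"price": (20.0, 45.0)}                    # buyer's Sedan price range
categories = {"model": ["Sedan", "SUV", "Hatchback"]}  # assumed option list
contract = {"price": 30.0, "model": "Sedan"}

phi = encode(contract, ranges, categories)
print(phi)
```

Because every component is either a min-max rescaled value or a binary indicator, each entry of `phi` stays in the unit interval, which is the property the paragraph above relies on.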
Table 1: Negotiation terms and feasible ranges.

Each agent has a private utility function $U_i(x)=\theta_i^{\top}\phi(x)$. The weights $\theta_i$ are drawn per feature group from uniform distributions with role-specific sign constraints, e.g., the price weight is always negative for the buyer and positive for the seller. For categorical variables, one option is sampled as preferred and assigned a positive weight, while the remaining options receive fractional negative weights. This encoding makes categorical choices contrastive: the preferred option increases utility, whereas the alternatives reduce it relative to that preference. All weights are then L1-normalised so that $\|\theta_i\|_1=1$. Note that $U_i$ is *not* bounded below by zero: negative weights (e.g., the buyer’s price weight) contribute negatively when the corresponding feature is large, so the worst-case contract typically gives $U_i<0$. Each agent also has a reservation value $d_i$, computed as the utility at the worst feasible contract, and a deal can be concluded only if $U_i(x)>d_i$ for both parties. In the experiments, we report normalized utility

$$\tilde{U}_i=\frac{U_i-d_i}{\max_x U_i(x)-d_i}\,\in\,[0,\,1],$$

so $\tilde{U}_i=0$ is the walk-away point and $\tilde{U}_i=1$ is the agent’s ideal outcome.
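The utility pipeline described above (linear utility, L1 normalisation, reservation value, normalized utility) can be sketched as follows. This is an illustration under stated assumptions, not the authors’ implementation: the raw weights, the four-feature layout, and the helper names are invented for the example, and the feasible set is approximated by its corner contracts, which is sufficient here because a linear utility attains its extremes at corners.

```python
from itertools import product
from typing import List, Sequence


def l1_normalise(weights: Sequence[float]) -> List[float]:
    """Scale weights so that their absolute values sum to 1."""
    total = sum(abs(w) for w in weights)
    return [w / total for w in weights]


def utility(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Linear utility: inner product of weights and features."""
    return sum(w * f for w, f in zip(theta, phi))


def normalised_utility(
    theta: Sequence[float],
    phi: Sequence[float],
    feasible: Sequence[Sequence[float]],
) -> float:
    """Map utility so the worst feasible contract is 0 and the best is 1."""
    utils = [utility(theta, c) for c in feasible]
    worst, best = min(utils), max(utils)  # worst plays the role of d_i
    return (utility(theta, phi) - worst) / (best - worst)


# Hypothetical buyer: features are [price_norm, Sedan, SUV, Hatchback].
# Price weight is negative; the preferred model (Sedan) is positive and
# the alternatives receive smaller negative weights.
theta = l1_normalise([-2.0, 1.0, -0.5, -0.5])

# Corner contracts: extreme prices combined with each one-hot model choice.
models = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners = [[price] + model for price, model in product([0.0, 1.0], models)]

reservation = min(utility(theta, c) for c in corners)
u_norm = normalised_utility(theta, [0.4, 1.0, 0.0, 0.0], corners)
print(reservation, u_norm)  # reservation is negative; u_norm is roughly 0.77
```

In this toy case the reservation value is negative, matching the remark that the worst-case contract typically yields $U_i<0$, while the normalized utility of any feasible contract falls between 0 and 1.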
### 3\.2\. Models and Traces
All experiments use Qwen3-235B-A22B (Yang et al., [2025](https://arxiv.org/html/2605.16575#bib.bib2)) and are replicated with DeepSeek-R1-671B (Guo et al., [2025](https://arxiv.org/html/2605.16575#bib.bib3)) in Appendix [B](https://arxiv.org/html/2605.16575#A2). For each conversation turn, we log the resulting <think\> traces together with the templated (JSON) deal proposition. These traces allow us to analyze the intermediate beliefs and plans that accompany the offers. This is central to our setting, because we ask whether partner knowledge is merely modeled or strategically deployed in multi-turn negotiation.
### 3\.3\. Information Conditions and Prompt Interventions
We run two experiment families, each designed to isolate a different part of our central question: whether LLM agents merely *model* partner information, or actually *use* it strategically in negotiation (Table [2](https://arxiv.org/html/2605.16575#S3.T2)).
#### Asymmetric information \(exp\_asym\)\.
Our first experiment asks the basic value-of-information question in bargaining: when one side knows more about the other’s preferences, does that knowledge improve its outcome? We vary who receives partner preference information derived from the partner’s utility weights (neither agent, only the buyer, only the seller, or both) and run 100 trials per condition with randomized utility weights. This experiment tests whether negotiating partner information helps the agent that receives it, and whether any resulting gains reflect strategic use of that information rather than a general increase in agreement or coordination. When an agent receives the other side’s preference information, it appears in the prompt as a ranked preference list (critical, important, flexible; see Appendix [C.2](https://arxiv.org/html/2605.16575#A3.SS2) for prompts). Thus, informed agents observe a coarse ranked summary of the counterparty’s higher-priority attributes rather than the full underlying utility weights or utility function.
#### Trade\-plan intervention \(exp\_trade\_plan\)\.
Our second experiment asks whether the observed gap can be reduced by imposing explicit reciprocal trade structure in the prompt. We introduce a reasoning template that requires agents, before each offer, to specify a feature to concede, a feature to demand in return, and a concrete give/ask template (Appendix [C.5](https://arxiv.org/html/2605.16575#A3.SS5)). We then compare negotiation with and without this intervention under uninformed and fully informed conditions. This experiment tests whether enforcing a simple concession-for-reciprocity template is sufficient to convert partner knowledge into more strategically effective multi-turn bargaining.
Table 2:Experimental conditions investigated\.
### 3\.4\. Metrics
To assess both negotiation dynamics and final outcomes, we use the following evaluation metrics\.
#### Outcome metrics\.
We report deal rate, normalized utilities $\tilde{U}_b$ and $\tilde{U}_s$ (buyer and seller respectively), social welfare $\tilde{U}_b+\tilde{U}_s$, distance to the Nash Bargaining Solution $d_{\text{NBS}}$, and distance to the nearest Pareto-efficient contract $d_{\text{Pareto}}$.
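As a rough illustration of the two distance metrics, the sketch below works over a small finite set of candidate agreements expressed as normalized utility pairs. The text does not specify how the distances are computed, so two assumptions are made here and should be read as such: distances are Euclidean in normalized-utility space, and the Nash Bargaining Solution is taken as the candidate maximizing the product of normalized utilities (the walk-away point sits at the origin after normalization). The candidate points are invented for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (buyer normalized utility, seller normalized utility)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point weakly dominates with a strict gain."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def nash_point(points: List[Point]) -> Point:
    """Candidate maximizing the Nash product, with disagreement point (0, 0)."""
    return max(points, key=lambda p: p[0] * p[1])


def distance(a: Point, b: Point) -> float:
    """Euclidean distance in utility space (an assumed choice of metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical candidate agreements and one realized agreement.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.5, 0.5), (0.3, 0.3)]
agreement = (0.5, 0.5)

front = pareto_front(candidates)
nbs = nash_point(candidates)
d_nbs = distance(agreement, nbs)
d_pareto = min(distance(agreement, p) for p in front)
welfare = agreement[0] + agreement[1]
print(front, nbs, d_nbs, d_pareto, welfare)
```

In this toy set the realized agreement is dominated, so both distances are positive; an agreement on the frontier would have a Pareto distance of zero.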
#### Belief accuracy\.
For each agent turn, we use an LLM to read the <think\> block and extract structured beliefs about the counterparty: $\{(\text{feature},\text{direction})\}$. We compute *signed-accuracy@k*: a prediction counts as correct when it identifies a feature in the partner’s true top-$k$ and the correct preference direction.
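A minimal sketch of signed-accuracy@k follows. It assumes beliefs arrive as a mapping from feature name to a direction in {-1, +1}, that the partner’s true top-k is ranked by weight magnitude, and that accuracy is the fraction of extracted beliefs that are correct. The feature names, weights, and the convention of returning 0 for an empty belief set are assumptions made for illustration.

```python
from typing import Dict


def signed_accuracy_at_k(
    beliefs: Dict[str, int],
    partner_theta: Dict[str, float],
    k: int,
) -> float:
    """Fraction of beliefs naming a true top-k feature with the right sign."""
    if not beliefs:
        return 0.0  # assumed convention when nothing was extracted
    top_k = sorted(partner_theta, key=lambda f: abs(partner_theta[f]), reverse=True)[:k]
    correct = sum(
        1
        for feature, direction in beliefs.items()
        if feature in top_k and direction * partner_theta[feature] > 0
    )
    return correct / len(beliefs)


# Hypothetical partner weights and extracted beliefs.
partner_theta = {
    "price": 0.40,
    "service": 0.30,
    "color": -0.15,
    "warranty": -0.10,
    "delivery": 0.05,
}
beliefs = {"price": +1, "service": -1, "delivery": +1}

acc = signed_accuracy_at_k(beliefs, partner_theta, k=2)
print(acc)
```

Here only the price belief is both in the top-2 and correctly signed; the service belief has the wrong direction and delivery is outside the top-2, so one of three beliefs counts as correct.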
#### Strategic coupling\.
Outcome metrics cannot distinguish strategic use of partner knowledge from generic accommodation. We therefore construct per-turn measures that jointly characterize whether the agent acts on its negotiating partner model *and* extracts value for itself.
*Concession toward counterparty* $c_{T+1}$ measures how much the agent acted on what it believes the other party wants. For each feature $f$ that the agent’s <think\> block mentions when reasoning about the negotiating partner, $d_f\in\{-1,+1\}$ is the extracted belief direction (e.g., “the partner wants lower price” $\Rightarrow$ $d_f=-1$ for price) and $\Delta_f$ is the change in the agent’s own offered value from turn $T$ to $T+1$. The product $d_f\cdot\Delta_f$ is positive precisely when the offer moved in the direction the agent believes the counterparty prefers. Summing positive contributions over all mentioned features gives:

$$c_{T+1}=\sum_{f\in\mathcal{M}_{T+1}}\max(0,\,d_f\cdot\Delta_f) \tag{1}$$

where $\mathcal{M}_{T+1}$ is the set of partner features mentioned in the <think\> block at turn $T+1$.
*Own gain* $g_{T+1}$ measures, independently of the partner, whether the agent simultaneously captured value on its *own* top priorities. Let $\theta_{i,1},\ldots,\theta_{i,K}$ be the agent’s $K$ largest utility weights (by magnitude) and $\Delta_k$ the change in the offered value on those features. Then $\operatorname{sign}(\theta_{i,k})\cdot\Delta_k>0$ whenever the offer moved in the agent’s own preferred direction on feature $k$:

$$g_{T+1}=\frac{1}{K}\sum_{k=1}^{K}\operatorname{sign}(\theta_{i,k})\cdot\Delta_k \tag{2}$$
A strategically rational agent links the two: it concedes on features the partner values ($c>0$) while simultaneously extracting gains on its own priorities ($g>0$).
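The two per-turn quantities defined in Equations (1) and (2) can be computed as in the sketch below. It assumes offers are stored as dictionaries of normalized feature values and that every believed feature appears in both offers; the seller’s weights, beliefs, and offers are hypothetical numbers chosen to show an uncompensated concession.

```python
from typing import Dict


def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def concession(beliefs: Dict[str, int], prev: Dict[str, float], curr: Dict[str, float]) -> float:
    """Eq. (1): sum of positive moves toward the partner's believed direction."""
    return sum(max(0.0, d * (curr[f] - prev[f])) for f, d in beliefs.items())


def own_gain(theta: Dict[str, float], prev: Dict[str, float], curr: Dict[str, float], k: int) -> float:
    """Eq. (2): mean signed move on the agent's k largest-magnitude weights."""
    top = sorted(theta, key=lambda f: abs(theta[f]), reverse=True)[:k]
    return sum(sign(theta[f]) * (curr[f] - prev[f]) for f in top) / k


# Hypothetical seller turn: price drops and warranty length rises.
prev_offer = {"price": 0.8, "warranty": 0.2, "service": 0.5}
curr_offer = {"price": 0.6, "warranty": 0.5, "service": 0.5}

# Seller's beliefs about the buyer: wants lower price, longer warranty.
seller_beliefs = {"price": -1, "warranty": +1}
# Seller's own weights: likes high price, dislikes long warranty.
seller_theta = {"price": 0.6, "warranty": -0.3, "service": 0.1}

c = concession(seller_beliefs, prev_offer, curr_offer)
g = own_gain(seller_theta, prev_offer, curr_offer, k=2)
print(c, g)  # c is about 0.5, g is about -0.25
```

In this example the seller moves toward the buyer on both believed features while losing ground on its own two top priorities, which is the $c>0$, $g<0$ pattern that distinguishes accommodation from a compensated trade.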
## 4\. Counterparty Information Does Not Reliably Benefit the Informed Side
Figure 1: Buyer utility rises while seller utility declines across information conditions. We show the mean normalized utility per condition, where conditions indicate whether neither side, only the buyer, only the seller, or both sides receive a ranked summary of the other party’s preferences. Buyer utility (blue) rises monotonically with information, while seller utility (red) falls in every information condition. The sharpest asymmetry occurs when only the seller is informed: buyer utility increases strongly, while seller utility declines relative to the uninformed baseline. Providing the seller with the buyer’s information does not improve its strategic abilities.

We begin with the most conservative question: when an agent is given the partner’s preferences, do final outcomes look as though that information is being used strategically?
[Figure 1](https://arxiv.org/html/2605.16575#S4.F1) summarizes the main outcome-level pattern. As more partner information is introduced, joint welfare increases slightly (values are shown in [Table 3](https://arxiv.org/html/2605.16575#S4.T3)). But these gains do not accrue primarily to the side that receives the information. The sharpest case is the seller-informed condition: relative to the uninformed baseline, buyer utility rises while seller utility falls. Thus, information is not inert, but the resulting shifts do not match the straightforward prediction that the informed side should systematically bargain to its own advantage.
Table 3: Outcome summary for exp\_asym. $\tilde{U}=0$ is the walk-away point and $\tilde{U}=1$ is ideal. $\Delta$ is measured relative to symmetric\_none. Each condition contains 100 trials in total. Values are mean (standard error).

Table [3](https://arxiv.org/html/2605.16575#S4.T3) gives the corresponding magnitudes. Welfare increases from 1.096 in the uninformed baseline to 1.148 in the fully informed condition, but the per-side changes are small in absolute terms. The seller-informed condition is again the clearest mismatch between information access and self-benefit: buyer utility rises by $+0.069$, while seller utility falls by $-0.044$. This is far from the outcome one would expect if the informed side strategically exploited partner preference information. If an agent could use counterparty information strategically, it would choose the agreement that maximizes its own utility subject to the partner receiving at least its reservation value, thereby pushing the partner to the minimally acceptable feasible outcome. Instead, the observed gains are modest and often accrue to the uninformed side. We also observe in [Figure 2](https://arxiv.org/html/2605.16575#S4.F2) that agreements remain concentrated in a buyer-favorable region across all conditions, and the condition means move only slightly as information is added. That is, information nudges the location of outcomes rather than reorganizing the bargaining landscape in a way that would clearly signal strategic exploitation.
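The full-exploitation benchmark described in the paragraph above can be written as a constrained optimization. This is a sketch of our reading of the text rather than a formula taken from the paper: $\mathcal{X}$ (the feasible contract set), $i$ (the informed agent), and $j$ (its partner) are notation introduced here for illustration, while $U$ and $d$ follow the definitions in Section 3.1.

```latex
x^{\star} \;=\; \arg\max_{x \in \mathcal{X}} \; U_i(x)
\quad \text{subject to} \quad U_j(x) > d_j
```

In normalized terms, such an agreement would leave $\tilde{U}_j(x^{\star})$ just above 0 while making $\tilde{U}_i(x^{\star})$ as large as the feasible set allows. The seller-informed condition instead shows the informed seller losing 0.044 while the uninformed buyer gains 0.069, which is the opposite direction from this benchmark.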
Figure 2: Outcome distributions remain buyer-favorable across information conditions. Each point is one agreed trial (★ denotes the condition mean). The x-axis is buyer normalized utility and the y-axis is seller normalized utility, so points farther right favor the buyer more and points higher favor the seller more. Across conditions, agreements remain concentrated in a buyer-favorable region, and condition means move only modestly. The clearest deviation appears in the seller-informed condition, where outcomes shift mainly toward higher buyer utility rather than higher seller utility. Counterparty information therefore changes outcomes directionally, but does not reorganize the bargaining landscape in a way that would indicate a reliable advantage for the informed side.

The buyer-favorable concentration visible in [Figure 2](https://arxiv.org/html/2605.16575#S4.F2) is present even in the uninformed baseline, so information effects alone cannot account for it. The asymmetry runs across all ten contract dimensions, not only price. In this domain, buyer and seller utility weights are drawn independently per trial, so for any categorical feature (vehicle model, interior finish, warranty type, or service package) the two sides may prefer different options. When their preferences conflict, buyers obtain their preferred categorical value in 63% to 91% of cases across these features. Sellers do not recover this ground on price: in cases where the seller concedes their preferred model type or interior finish to the buyer, the final agreed price is essentially the same as in cases where the buyer is the one who conceded, differing by less than $1k on average. Sellers thus give up categorical features without extracting a higher price in return. Since price accounts for only 17% of buyer utility and the categorical features together account for more than half, this pattern of uncompensated multi-attribute seller concessions, not any single leverage point, explains the stable buyer advantage observed across all conditions.
Figure 3: Final price depends more on the opening price than on price utility weights. Trials shown ($n=123$) are pooled across all exp\_asym conditions for cases where a Sedan was negotiated. *Left:* buyer’s price weight vs. final agreed price. *Center:* seller’s price weight vs. final agreed price. *Right:* first proposed price vs. final agreed price. The final price is only weakly related to the buyer’s and seller’s price weights, whereas the opening price is a strong predictor of the final agreed price.

In [Figure 3](https://arxiv.org/html/2605.16575#S4.F3), we compare how the final agreed price relates to the agents’ price weights versus the first price proposed during the conversation. The final agreed price is only weakly related to the agents’ underlying price utility weights, but strongly related to the first price proposed. That pattern is hard to reconcile with a process in which counterparty information is being used to construct targeted, utility-aware bargaining moves. Instead, it suggests that negotiation operates more through anchoring and local adjustment than through strategic search over reciprocal trade-offs.
## 5\. Counterparty Modeling is Substantial: Agents Actively Infer Preferences
Given that outcome effects are too small and weakly aligned to demonstrate strategic use, we must determine where the breakdown occurs\. The most basic explanation would be a failure of perception: do informed agents actually form accurate internal models of the partner’s preferences? If the answer were no, then the weak strategic effects could be explained simply by failure of partner modeling\. Otherwise, the problem must lie downstream, in how those beliefs are used during bargaining\.
Figure 4:Informed agents rapidly form accurate negotiating partner beliefs\.Cumulative signed\-accuracy@5 over normalized turn fraction, for buyer \(left\) and seller \(right\)\. At each turn, a prediction counts as correct when it identifies an attribute in the partner’s true top\-5 \(based on the magnitude of the utility weights\) with the correct preference direction, aggregated over all attributes the agent has mentioned up to that point\. The cumulative nature of the metric means accuracy rises over turns\. Note that for informed agents: the preference list is available from turn11, but agents only explicitly reason about individual features as they become bargaining\-relevant, so the full partner model is articulated progressively rather than all at once\. Informed conditions exceed0\.70\.7by mid\-negotiation for the buyer, while uninformed agents remain above chance, indicating active inference from the conversation\. The seller\-side curve rises more slowly than the buyer\-side curve, suggesting weaker or less explicit incorporation of partner preferences in the reasoning trace\.[Figure˜4](https://arxiv.org/html/2605.16575#S5.F4)shows that informed agents do, in fact, build counterparty models\. The belief accuracy \(described in[Section˜3\.4](https://arxiv.org/html/2605.16575#S3.SS4)\) rises quickly and remains well above the uninformed baseline for both roles, with informed conditions exceeding roughly 0\.7 by the midpoint of negotiation\. Even uninformed agents perform above chance, indicating that some partner inference is possible from interaction alone, but explicit information sharply improves both speed and accuracy of belief formation\. The weak strategic effects in Section[4](https://arxiv.org/html/2605.16575#S4)cannot be explained by saying that the models failed to notice or encode what the other side wanted\. On the contrary, the traces indicate that partner information is often modeled early, explicitly, and correctly\. 
In the example shown in Appendix [A](https://arxiv.org/html/2605.16575#A1), the informed buyer correctly identifies the seller’s most important features before any offer is exchanged: higher price, Sedan, and annual service\. This example illustrates what [Figure 4](https://arxiv.org/html/2605.16575#S5.F4) already shows in aggregate: the models often possess the relevant counterparty model well before the negotiation is completed\.
The aggregate evidence in [Figure 4](https://arxiv.org/html/2605.16575#S5.F4) and the concrete traces imply that the core failure is not one of perception\. The agents often know what the negotiating partner values\. The deeper question, then, is why these beliefs fail to organize the subsequent interaction as reciprocal, strategically coupled bargaining\.
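The signed\-accuracy@5 curve described in the Figure 4 caption can be illustrated with a minimal Python sketch\. It assumes that belief directions are encoded as \+1 or \-1, that a later mention of a feature overwrites an earlier one, and that the score at each turn is the fraction of the partner’s true top\-k features recovered with the correct sign\. The function name and data layout are hypothetical, and the paper’s exact normalization \(Section 3\.4\) may differ\.

```python
def signed_accuracy_at_k(true_weights, mentions_by_turn, k=5):
    """Cumulative signed-accuracy@k over turns (illustrative sketch).

    true_weights: dict feature -> signed utility weight of the partner.
    mentions_by_turn: list of dicts, one per turn, mapping each feature the
        agent mentioned to its believed direction (+1 or -1).
    Returns one score per turn: the share of the partner's true top-k
    features (by |weight|) whose sign the agent has stated correctly so far.
    """
    # Assumption: "top-k" means the k largest weights by absolute value.
    top = sorted(true_weights, key=lambda f: abs(true_weights[f]), reverse=True)[:k]
    true_sign = {f: (1 if true_weights[f] > 0 else -1) for f in top}

    latest_belief = {}  # most recent believed direction per feature
    curve = []
    for mentions in mentions_by_turn:
        latest_belief.update(mentions)
        hits = sum(1 for f, s in true_sign.items() if latest_belief.get(f) == s)
        curve.append(hits / len(top))
    return curve


# Toy example using the seller weights quoted in Appendix A plus two
# made-up low-weight features (hypothetical values).
seller_weights = {"price": 0.62, "sedan": 0.38, "annual_service": 0.28,
                  "trade_in": -0.05, "down_payment": 0.02}
turns = [{"price": 1}, {}, {"sedan": 1, "annual_service": -1}, {"annual_service": 1}]
curve = signed_accuracy_at_k(seller_weights, turns, k=3)
```

Because beliefs accumulate across turns, the curve in this sketch can only fall when the agent revises a correct belief to an incorrect one, which matches the caption’s remark that the metric tends to rise over the negotiation\.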
## 6\. Accurate Counterparty Models Fail to Translate into Strategic Execution
Because informed agents do successfully model what the other side wants, the failure to secure better outcomes cannot be attributed to a lack of perception\. The breakdown must therefore lie downstream in the bargaining process itself\. This section asks *why* this knowledge does not translate into better bargaining outcomes\. Our answer is that the failure is a systematic process\-level pattern: agents respond to inferred partner preferences, but they do not reliably convert those responses into reciprocal exchange\. The resulting interaction is closer to localized accommodation than to bargaining\.
Figure 5: Sellers accommodate while buyers withhold\. Mean belief\-action alignment by condition and role\. Alignment is positive when an offer is closer to the direction the agent believes the partner prefers, and negative when the offer moves against the partner’s inferred preference\. Seller alignment is consistently positive, indicating accommodation of the buyer’s inferred preferences, while buyer alignment is consistently negative, indicating withholding from the seller’s inferred preferences\. The asymmetry is stable across information conditions\.
We first ask whether agents move their offers in the direction they believe the partner prefers\. For each turn where the <think\> block mentions a feature preference, we compute alignment $a = d_f \cdot (\hat{v}_f - 0.5)$, where $d_f$ is the extracted belief direction and $\hat{v}_f$ the normalized offered value\. Positive alignment means accommodating the partner’s inferred preference; negative alignment means moving against it\. [Figure 5](https://arxiv.org/html/2605.16575#S6.F5) shows a clear role asymmetry\. Sellers tend to move offers in the direction they believe the buyer wants, whereas buyers are more likely to move against attributes they believe the seller values\. This helps explain why information does not benefit both roles symmetrically\. Additional information gives the seller a more precise map of what to accommodate, while giving the buyer a more precise map of what to hold back or bargain around\. This first result already shows that counterparty information is not inert\. Agents do act on their beliefs about the other side\. But belief\-action alignment alone is not yet strategic use\. A counterparty\-facing concession becomes strategic only if it is paired with a compensating gain on the agent’s own priorities\.
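As a concrete reading of the alignment score above, the following Python sketch computes $a$ for a single feature mention and averages it over a set of mentions\. It assumes that the belief direction $d_f$ is encoded as \+1 or \-1 and that the offered value $\hat{v}_f$ is normalized to \[0, 1\]; the function names are hypothetical\.

```python
from statistics import mean


def alignment(belief_direction, offered_value):
    """Belief-action alignment a = d_f * (v_hat_f - 0.5) for one feature.

    belief_direction: +1 if the agent believes the partner prefers a higher
        value on this feature, -1 if lower (assumed encoding).
    offered_value: the agent's offered value on the feature, normalized to [0, 1].
    """
    if belief_direction not in (-1, 1):
        raise ValueError("belief_direction must be +1 or -1")
    if not 0.0 <= offered_value <= 1.0:
        raise ValueError("offered_value must lie in [0, 1]")
    return belief_direction * (offered_value - 0.5)


def mean_alignment(mentions):
    """Average alignment over (belief_direction, offered_value) pairs."""
    return mean(alignment(d, v) for d, v in mentions)


# A seller who believes the buyer wants a lower price (d = -1) and offers a
# price at 0.2 of the normalized range is accommodating (positive score).
accommodating = alignment(-1, 0.2)
# A buyer who believes the seller wants a higher price (d = +1) and offers
# the same low price is withholding (negative score).
withholding = alignment(+1, 0.2)
```

Under this encoding the score ranges from \-0\.5 to \+0\.5, and an offer at the midpoint of the range scores zero regardless of the believed direction\.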
We therefore now ask whether turns that move the offer toward what the agent believes the counterparty wants are coupled to gains on the agent’s own high\-value features\. If agents were using concessions strategically, moves that benefit the partner should also improve at least some of the agent’s own priorities\.
Figure 6: Counterparty\-facing concessions are weakly compensated by own\-priority gains\. A counterparty\-facing concession is a turn where the agent changes its offer in the direction it believes the counterparty prefers on at least one mentioned feature, so that $c_{T+1} > 0$\. Own\-gain $g_{T+1}$ measures whether the same offer revision also moves the agent’s own highest\-weighted features in its preferred direction\. We compare mean own\-gain on turns with a positive counterparty\-facing concession against turns without one, shown separately for buyers \(left\) and sellers \(right\)\. If agents were using concessions strategically, concession turns should have positive own\-gain\. Instead, own\-gain on concession turns is typically near zero or negative\. Read together with [Figure 5](https://arxiv.org/html/2605.16575#S6.F5), this indicates that agents often accommodate what they believe the partner wants without reliably extracting compensating gains on their own priorities\. In asymmetric\-information conditions, the informed side tends to show the weakest compensation for its concessions: the clearest drop appears for sellers in seller\_informed and for buyers in buyer\_informed\.
[Figure 6](https://arxiv.org/html/2605.16575#S6.F6) shows that strategic coupling is weak, but not in a perfectly uniform way across roles and conditions\. Own\-gain on counterparty\-facing concession turns is typically near zero or negative, rather than reliably positive, indicating that agents often move offers toward what they believe the partner wants without securing compensating gains on their own priorities\.
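The coupling comparison can be sketched in Python as follows\. This is only one plausible reading of the verbal definitions in the Figure 6 caption, not the paper’s Eqs\. 1–2: it assumes offers are dictionaries of normalized feature values, that the concession $c_{T+1}$ sums offer movement along the believed partner direction over mentioned features, and that own\-gain $g_{T+1}$ sums movement along the agent’s own preferred direction over its top\-k features \(k is a hypothetical parameter\)\.

```python
from statistics import mean


def concession(prev_offer, new_offer, believed_dir):
    """Movement toward what the agent believes the partner prefers.

    believed_dir maps each mentioned feature to +1 or -1 (assumed encoding).
    """
    return sum(d * (new_offer[f] - prev_offer[f]) for f, d in believed_dir.items())


def own_gain(prev_offer, new_offer, own_weights, k=3):
    """Movement in the agent's preferred direction on its top-k features."""
    top = sorted(own_weights, key=lambda f: abs(own_weights[f]), reverse=True)[:k]
    return sum((1 if own_weights[f] > 0 else -1) * (new_offer[f] - prev_offer[f])
               for f in top)


def coupling_summary(revisions, own_weights, k=3):
    """Mean own-gain on turns with vs. without a positive concession.

    revisions: list of (prev_offer, new_offer, believed_dir) tuples.
    """
    with_c, without_c = [], []
    for prev_offer, new_offer, believed_dir in revisions:
        g = own_gain(prev_offer, new_offer, own_weights, k)
        bucket = with_c if concession(prev_offer, new_offer, believed_dir) > 0 else without_c
        bucket.append(g)
    return {
        "concession": mean(with_c) if with_c else None,
        "no_concession": mean(without_c) if without_c else None,
    }


# Toy seller: cares most about a high price (hypothetical weights).
seller_weights = {"price": 0.6, "trade_in": -0.3, "warranty": -0.1}
revisions = [
    # Seller lowers price toward the buyer's believed preference.
    ({"price": 0.8, "trade_in": 0.5, "warranty": 0.5},
     {"price": 0.6, "trade_in": 0.5, "warranty": 0.5},
     {"price": -1}),
    # Seller leaves price unchanged: no concession on the mentioned feature.
    ({"price": 0.6, "trade_in": 0.5, "warranty": 0.5},
     {"price": 0.6, "trade_in": 0.5, "warranty": 0.5},
     {"price": -1}),
]
summary = coupling_summary(revisions, seller_weights, k=1)
```

In this toy case the concession turn has negative own\-gain, which is the uncompensated pattern the figure reports; a strategically coupled revision would show positive own\-gain on the same turn\.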
Two patterns stand out\. First, sellers are the more accommodating role overall: they more consistently move in the direction they believe the buyer prefers\. Second, conditional on such concessions being made, the coupling analysis shows a sharper asymmetry at the level of information access\. In seller\_informed, the seller shows the clearest drop in own\-gain on concession turns, while the buyer remains closer to zero\. In buyer\_informed, the same pattern appears on the buyer side: the buyer’s own\-gain is more negative, while the seller remains closer to zero\. Thus, in asymmetric conditions, the agent with more information about the partner tends to make the more weakly compensated concessions\. The qualitative traces help unpack the aggregate mechanism\. The traces in Appendix [A](https://arxiv.org/html/2605.16575#A1) show that agents often identify the counterparty’s priorities explicitly, sometimes revise offers in the partner’s preferred direction, and can produce locally sensible trade plans that still fail to coordinate across turns\.
[Figures 5](https://arxiv.org/html/2605.16575#S6.F5) and [6](https://arxiv.org/html/2605.16575#S6.F6) identify the paper’s central mechanism\. The agents do not fail because they lack negotiating partner models\. They fail because those models do not reliably structure bargaining as reciprocal exchange\. What is missing is not social inference, but the ability to turn social inference into strategically coupled multi\-turn negotiation\.
## 7\. Explicit Trade Templates Do Not Close the Strategic Gap
If the core failure is not a lack of counterparty modeling, but rather a lack of reciprocal strategic execution, a natural intervention is to constrain the free\-form interaction\. We hypothesize that forcing agents to translate their beliefs into structured, explicit bargaining moves might bridge this gap\. On this view, the bottleneck is not partner modeling itself, but the absence of a scaffold that forces beliefs about the counterparty to be expressed as explicit concession\-for\-reciprocity proposals\. A structured template could plausibly help by narrowing the open\-ended response space and by requiring the agent to couple each concession with a corresponding demand\. This section tests that hypothesis with a templated trade\-plan prompt\.
In the exp\_trade\_plan condition, agents are required to articulate an explicit exchange before each offer: a feature to concede, a feature to demand in return, and a concrete give/ask package \(prompt details in Appendix [C\.5](https://arxiv.org/html/2605.16575#A3.SS5)\)\. The intervention is applied symmetrically to both agents, allowing us to ask whether making exchange structure explicit is sufficient to convert counterparty knowledge into more strategically organized bargaining\.
Table 4: Outcome metrics for exp\_trade\_plan \(100 trials per condition\)\. Utilities and distances are in normalized utility space \($\tilde{U} \in [0,1]$\), averaged over agreed deals\. Lower is better for $d_{\text{Pareto}}$ and $d_{\text{NBS}}$\. The trade plan reduces deal rate for informed agents without improving Pareto efficiency or appreciably reducing distance to the Nash solution\.
[Table 4](https://arxiv.org/html/2605.16575#S7.T4) shows that the intervention does not improve the basic outcome pattern\. In the informed setting, deal rate falls substantially and the utility split shifts only slightly \(within one standard error\) toward the seller\. This is already evidence against the view that the main problem is merely one of prompt organization: forcing agents to spell out an exchange does not make negotiation more effective\.
Figure 7: The trade plan does not improve efficiency\. The trade\-plan intervention requires each agent, before making an offer, to state a feature it is willing to concede, a feature it wants in return, and a concrete give/ask package\. *Left:* distance to the Pareto frontier in normalized utility space\. Information improves Pareto proximity \($\sim 0.09 \to 0.05$\); the trade plan has no measurable effect on Pareto distance in either information regime\. *Right:* distance to the Nash Bargaining Solution in normalized utility space\. The trade plan does not move informed deals appreciably closer to Nash \(0\.228 vs\. 0\.207, within one standard error\) and reduces deal rate by about seven percentage points \(cf\. Table [4](https://arxiv.org/html/2605.16575#S7.T4)\)\.
[Figure 7](https://arxiv.org/html/2605.16575#S7.F7) sharpens that conclusion\. If the intervention were repairing the underlying strategic failure, we would expect negotiations to move closer to efficient bargains\. Instead, the trade plan has little effect on Pareto distance and no clear efficiency benefit overall\. In other words, making exchange structure explicit does not cause the interaction to become more frontier\-seeking or more reciprocally organized\. The most plausible interpretation is that the intervention improves local neatness without improving interactive coordination\. Agents can formulate explicit one\-step packages, but those packages are not reliably embedded in a contingent multi\-turn strategy\. The problem is therefore not simply the absence of an explicit trade format; it is the failure to use such trades as part of a larger reciprocal bargaining process\. The qualitative traces support the same reading\. In the representative example reported in Appendix [A](https://arxiv.org/html/2605.16575#A1), the buyer proposes a locally sensible give/ask package and the seller responds with another locally sensible package, but the two packages interfere rather than reinforce\.
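The two efficiency distances can be illustrated with a short Python sketch\. It is an assumption\-laden illustration rather than the paper’s evaluation code: it assumes a finite set of candidate deals given as \(buyer utility, seller utility\) pairs in normalized space, Euclidean distance, and a disagreement point of \(0, 0\) for the Nash Bargaining Solution\. All function names are hypothetical\.

```python
import math


def pareto_frontier(points):
    """Return the non-dominated (u_buyer, u_seller) points."""
    frontier = []
    for p in points:
        # q dominates p if it is at least as good on both axes and differs.
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return frontier


def nash_point(points, disagreement=(0.0, 0.0)):
    """Candidate deal maximizing the product of gains over disagreement."""
    d_b, d_s = disagreement
    feasible = [p for p in points if p[0] >= d_b and p[1] >= d_s]
    return max(feasible, key=lambda p: (p[0] - d_b) * (p[1] - d_s))


def distance_to_pareto(deal, points):
    """Euclidean distance from a deal to the nearest frontier point."""
    return min(math.dist(deal, p) for p in pareto_frontier(points))


def distance_to_nbs(deal, points, disagreement=(0.0, 0.0)):
    """Euclidean distance from a deal to the Nash point."""
    return math.dist(deal, nash_point(points, disagreement))


# Toy outcome space (hypothetical utilities).
candidates = [(0.2, 0.9), (0.5, 0.5), (0.6, 0.6), (0.9, 0.2), (0.4, 0.4)]
deal = (0.5, 0.5)
d_pareto = distance_to_pareto(deal, candidates)
d_nbs = distance_to_nbs(deal, candidates)
```

A deal can sit exactly on the frontier and still be far from the Nash point, for example \(0\.9, 0\.2\) in the toy set, which is why the two distances are reported separately\.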
The trace matters because it makes visible what the aggregate results imply: the intervention improves legibility more than coordination\.
[Table 4](https://arxiv.org/html/2605.16575#S7.T4) and [Figure 7](https://arxiv.org/html/2605.16575#S7.F7) show that a structured trade template does not repair the gap identified in this paper\. The missing capability is not the ability to state a trade, but the ability to embed such trades in reciprocal, multi\-turn negotiation\. This suggests that improving negotiation performance will require mechanisms that support contingent planning across turns, rather than prompt structures that only make single\-turn exchanges more explicit\. More generally, the result reinforces the paper’s central distinction between modeling strategically relevant information and using it to control sequential interaction\.
## 8\. Conclusion
In this paper, we studied whether LLM agents can use counterparty information strategically in multi\-turn negotiation\. Across controlled multi\-attribute bargaining experiments, the main result is a gap between *counterparty modeling*, a capability agents mostly possess, and *strategic deployment*, a capability they lack\. Agents often infer the negotiating partner’s preferences accurately and early, yet that information does not reliably organize bargaining as reciprocal, contingent exchange\. Instead, negotiation is shaped more by local accommodation, weak or asymmetric concession\-gain coupling, and opening anchors than by multi\-turn strategic coordination\.
This process\-level view matters because outcome metrics alone are easy to over\-interpret\. Providing negotiating partner information modestly shifts agreements and can slightly improve welfare, but it does not reliably benefit the side that holds the information\. Once reasoning traces and turn\-level offer dynamics are taken into account, a clearer picture emerges: agents respond to what the other side values, but they do not consistently convert that knowledge into bargaining leverage\. Sellers are more accommodating overall, and in asymmetric\-information conditions the informed side often makes the more weakly compensated concessions\. The trade\-plan template we introduced strengthens this interpretation\. If the bottleneck were merely the absence of explicit structure, then requiring agents to formulate give/ask packages should improve bargaining performance\. The template makes proposed exchanges more legible, but not more effective, suggesting that the missing capability is not simply stating a trade, but coordinating concessions contingently over multiple turns\.
The broader implication is that final\-outcome competence should not be mistaken for strategic competence\. More generally, the presence of a capability in model reasoning or verbal reports does not guarantee the corresponding behavior in interaction\. In our setting, agents can state what the counterparty wants, but that knowledge does not reliably guide reciprocal bargaining across turns\. Evaluating strategic intelligence in LLM agents therefore requires testing not only what models can infer, but whether those inferences causally organize their sequential decisions\. These results suggest that social understanding and strategic interaction remain importantly separable capabilities in current LLM agents\. Progress will likely require methods that go beyond local response generation, including stronger models of negotiating partner response, explicit search over offer sequences, and training objectives defined over multi\-turn bargaining trajectories rather than single\-turn conversational quality\.
## References
- T\. Baarslag, K\. Fujita, E\. H\. Gerding, K\. Hindriks, T\. Ito, N\. R\. Jennings, C\. Jonker, S\. Kraus, R\. Lin, V\. Robu, et al\. \(2013\) Evaluating practical negotiating agents: results and analysis of the 2011 international competition\. Artificial Intelligence 198, pp\. 73–103\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- F\. Bianchi, P\. J\. Chia, M\. Yuksekgonul, J\. Tagliabue, D\. Jurafsky, and J\. Zou \(2024\) How well can llms negotiate? negotiationarena platform and analysis\. arXiv preprint arXiv:2402\.05863\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- K\. Chatterjee and W\. Samuelson \(1983\) Bargaining under incomplete information\. Operations research 31\(5\), pp\. 835–851\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- T\. R\. Davidson, V\. Veselovsky, M\. Josifoski, M\. Peyrard, A\. Bosselut, M\. Kosinski, and R\. West \(2024\) Evaluating language model agency through negotiations\. arXiv preprint arXiv:2401\.04536\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- Y\. Deng, V\. Mirrokni, R\. P\. Leme, H\. Zhang, and S\. Zuo \(2024\) Llms at the bargaining table\. In Agentic Markets Workshop at ICML, Vol\. 2024\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- P\. Faratin, C\. Sierra, and N\. R\. Jennings \(1998\) Negotiation decision functions for autonomous agents\. Robotics and Autonomous Systems 24\(3\-4\), pp\. 159–182\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- Y\. Fu, H\. Peng, T\. Khot, and M\. Lapata \(2023\) Improving language model negotiation with self\-play and in\-context learning from ai feedback\. arXiv preprint arXiv:2305\.10142\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- K\. Gandhi, D\. Sadigh, and N\. D\. Goodman \(2023\) Strategic reasoning with language models\. arXiv preprint arXiv:2305\.19165\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px2.p1.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px3.p1.1)\.
- D\. Guo, D\. Yang, H\. Zhang, J\. Song, P\. Wang, Q\. Zhu, R\. Xu, R\. Zhang, S\. Ma, X\. Bi, et al\. \(2025\) Deepseek\-r1: incentivizing reasoning capability in llms via reinforcement learning\. arXiv preprint arXiv:2501\.12948\. Cited by: [Appendix B](https://arxiv.org/html/2605.16575#A2.p1.1), [§3\.2](https://arxiv.org/html/2605.16575#S3.SS2.p1.1)\.
- H\. He, D\. Chen, A\. Balakrishnan, and P\. Liang \(2018\) Decoupling strategy and generation in negotiation dialogues\. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp\. 2333–2343\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- W\. Hua, O\. Liu, L\. Li, A\. Amayuelas, J\. Chen, L\. Jiang, M\. Jin, L\. Fan, F\. Sun, W\. Wang, et al\. \(2024\) Game\-theoretic llm: agent workflow for negotiation games\. arXiv preprint arXiv:2411\.05990\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px2.p1.1)\.
- S\. Huang, W\. Zhao, and J\. Gao \(2025\) SI\-bench: benchmarking social intelligence of large language models in human\-to\-human conversations\. External Links: 2510\.23182, [Link](https://arxiv.org/abs/2510.23182) Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p4.1)\.
- E\. Junqué de Fortuny and V\. R\. Cappelli \(2025\) LLMs as strategic agents: beliefs, best response behavior, and emergent heuristics\. Best Response Behavior, and Emergent Heuristics \(September 10, 2025\)\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p4.1)\.
- S\. N\. Kirshner, Y\. Pan, J\. X\. Wu, and A\. Gould \(2026\) Talking terms: agent information in llm supply chain bargaining\. Decision Sciences 57\(1\), pp\. 9–23\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p3.1)\.
- M\. Kosinski \(2024\) Evaluating large language models in theory of mind tasks\. Proceedings of the National Academy of Sciences 121\(45\)\. External Links: ISSN 1091\-6490, [Link](http://dx.doi.org/10.1073/pnas.2405460121), [Document](https://dx.doi.org/10.1073/pnas.2405460121) Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p2.1)\.
- M\. Lewis, D\. Yarats, Y\. Dauphin, D\. Parikh, and D\. Batra \(2017\) Deal or no deal? end\-to\-end learning of negotiation dialogues\. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp\. 2443–2453\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- A\. Liao, N\. Tomlin, and D\. Klein \(2024\) Efficacy of language model self\-play in non\-zero\-sum games\. arXiv preprint arXiv:2406\.18872\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px3.p1.1)\.
- X\. Liu, S\. Gu, and D\. Song \(2026\) AgenticPay: a multi\-agent llm negotiation system for buyer\-seller transactions\. arXiv preprint arXiv:2602\.06008\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
- N\. Lorè and B\. Heydari \(2024\) Strategic behavior of large language models and the role of game structure versus contextual framing\. Scientific Reports 14\(1\), pp\. 18490\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px2.p1.1)\.
- R\. B\. Myerson and M\. A\. Satterthwaite \(1983\) Efficient mechanisms for bilateral trading\. Journal of economic theory 29\(2\), pp\. 265–281\. Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- Y\. S\. Rana \(2024\) When ai joins the table: how large language models transform negotiations\. Available at SSRN 5049248\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1)\.
- Z\. Song, Y\. Xie, L\. Yang, and Y\. Zhao \(2026\) Large language models in supply chain management: a systematic literature review and application framework\. International Journal of Production Research, pp\. 1–41\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p3.1)\.
- C\. Wang, D\. Kasenberg, K\. Stachenfeld, and P\. S\. Castro \(2026\) Discovering differences in strategic behavior between humans and llms\. External Links: 2602\.10324, [Link](https://arxiv.org/abs/2602.10324) Cited by: [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px4.p1.1)\.
- A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv, et al\. \(2025\) Qwen3 technical report\. arXiv preprint arXiv:2505\.09388\. Cited by: [§3\.2](https://arxiv.org/html/2605.16575#S3.SS2.p1.1)\.
- Q\. Zhao, S\. Wang, C\. Zhang, C\. Fu, M\. Q\. Do, N\. Agarwal, K\. Lee, and C\. Sun \(2024\) AntGPT: can large language models help long\-term action anticipation from videos?\. External Links: 2307\.16368, [Link](https://arxiv.org/abs/2307.16368) Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p2.1)\.
- X\. Zhou, H\. Zhu, L\. Mathur, R\. Zhang, H\. Yu, Z\. Qi, L\. Morency, Y\. Bisk, D\. Fried, G\. Neubig, et al\. \(2023\) Sotopia: interactive evaluation for social intelligence in language agents\. arXiv preprint arXiv:2310\.11667\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p1.1)\.
- S\. Zhu, J\. Sun, Y\. Nian, T\. South, A\. Pentland, and J\. Pei \(2025\) The automated but risky game: modeling and benchmarking agent\-to\-agent negotiations and transactions in consumer markets\. arXiv preprint arXiv:2506\.00073\. Cited by: [§1](https://arxiv.org/html/2605.16575#S1.p5.1), [§2](https://arxiv.org/html/2605.16575#S2.SS0.SSS0.Px1.p1.1)\.
## Appendix AExample Traces
Immediate Counterparty Modeling: Trial 12 at Turn 1
Ground truth, seller’s top\-3 utility weights: price \(\+0\.62\), Sedan \(\+0\.38\), annual service \(\+0\.28\)
Uninformed buyer \(symmetric\_none\), Turn 1 \(<think\>\): I need to start the negotiation \[…\] The buyer’s critical point is to lower the price\. Important items are increasing trade\-in value, getting a basic warranty \[…\]
Informed buyer \(buyer\_informed\), Turn 1 \(<think\>\): The other side, the seller, really wants a higher price, prefers Sedans, and likes annual service\. They don’t care much about trade\-in value or down payment percentage, so those are areas I can push on\.
Figure 8: Negotiating partner modeling\. Before any substantive bargaining occurs, the informed buyer explicitly identifies the seller’s highest\-value attributes \(higher price, Sedan, annual service\)\. In fact, informed agents often possess a usable counterparty model early in the negotiation\.
Illustrative trade\-plan collision
Buyer, Turn 11 \(<think\>\): Trade plan: Concede on down payment to get a higher trade\-in\.
Buyer, Turn 11 \(offer\): “Let’s adjust the down payment to 15% in exchange for a $7,000 trade\-in allowance\.”
Seller, Turn 12 \(<think\>\): Trade plan: Include accessories to demand trade\-in back down to $5k\.
Seller, Turn 12 \(offer\): “I can include the accessories package, but I’ll need the trade\-in back at $5,000\.”
Figure 9: One\-step trade plans do not compose into coordinated bargaining\. Both agents produce locally coherent give/ask packages, but the proposed trades work at cross\-purposes rather than building toward a contingent coordination\.
## Appendix BReplication with DeepSeek\-R1\-671B
To assess whether the main findings depend on the particular model used in the body of the paper, we replicate the two core experiments, asymmetric information \(exp\_asym\) and the trade\-plan intervention \(exp\_trade\_plan\), with DeepSeek\-R1\-671B \[Guo et al\., [2025](https://arxiv.org/html/2605.16575#bib.bib3)\]\. The negotiation domain, protocol, and evaluation pipeline are unchanged\.
The main directional patterns reappear\. Specifically, when the seller is informed, the buyer gains strongly and the seller loses or fails to benefit\. Informed agents again achieve substantial partner\-belief accuracy and the trade\-plan intervention again fails to produce a clear efficiency gain\. The most notable model difference is on the buyer side: with DeepSeek\-R1, buyers are more accommodating than with Qwen3\-235B, suggesting that the buyer\-side policy is less stable across model families than the seller\-side behavior\.
### B\.1\. Asymmetric Information
Table 5: exp\_asym outcomes for DeepSeek\-R1\-671B\. Compare with Table [3](https://arxiv.org/html/2605.16575#S4.T3) in the main text\. $\Delta$ is relative to symmetric\_none\.
When only the seller is informed, buyer utility increases by $+0.070$, essentially identical to the $+0.069$ effect in the main Qwen experiment, while seller utility declines slightly\. Welfare again rises with information\. The directional pattern therefore holds: information on the seller side helps the buyer more than the seller\.
The main difference appears in buyer\_informed, where the buyer’s own utility decreases slightly and the seller’s utility rises\. This reversal is small and within noise, but it reinforces the broader point that the information holder does not reliably capture the gains from added counterparty knowledge\.
Figure 10: The seller\-informed asymmetry replicates: when the seller is informed, the buyer accumulates some utility gain while the seller does not\.
Figure 11: Final utility scatter for exp\_asym\. Across conditions, outcomes remain concentrated in a buyer\-favorable region, with the seller\-informed condition shifting the mean rightward\.
### B\.2\. Reasoning Traces: Belief Accuracy and Alignment
Figure [12](https://arxiv.org/html/2605.16575#A2.F12) shows that agents with access to partner preferences again achieve substantial signed belief accuracy well above baseline by mid\-negotiation\. The basic dissociation therefore replicates: the agents can model the other side’s priorities even when they do not use them to secure better outcomes for themselves\.
Figure 12: Cumulative signed\-accuracy@5 over normalized turn fraction\. Agents with access to counterparty preferences rise well above baseline, replicating the gap between accurate beliefs and strategic use\.
Figure [13](https://arxiv.org/html/2605.16575#A2.F13) shows the main model difference\. With DeepSeek\-R1, both buyers and sellers have positive belief–action alignment, indicating accommodation in both roles\. This differs from Qwen3\-235B, where buyers tended to withhold while sellers accommodated\. The robust part of the mechanism is therefore seller\-side accommodation; the buyer\-side response appears more model\-dependent\.
Figure 13: Belief–action alignment by condition and role\. Both roles show positive alignment, indicating accommodation in both buyers and sellers\.
Figure [14](https://arxiv.org/html/2605.16575#A2.F14) shows a partial replication\. DeepSeek sellers have negative own\-gain on concession turns across all conditions, with the deepest losses in symmetric\_none and seller\_informed\. This confirms that the core seller\-side failure, conceding without extracting compensating value, is not model\-specific\.
The DeepSeek buyers diverge from Qwen3\-235B in an informative way\. In seller\_informed, DeepSeek buyers show a small *positive* own\-gain on concession turns\. This means that when DeepSeek buyers accommodate the seller’s preferences, they can simultaneously move some of their own top\-priority features in a favorable direction, a partial form of strategic coupling that Qwen3\-235B buyers do not exhibit\.
Figure 14: Strategic coupling by role and condition\. Using concession toward counterparty $c_{T+1}$ and own\-gain $g_{T+1}$ as defined in Eqs\. [1](https://arxiv.org/html/2605.16575#S3.E1)–[2](https://arxiv.org/html/2605.16575#S3.E2), we compare mean own\-gain $g_{T+1}$ on turns with versus without positive concession toward counterparty \($c_{T+1} > 0$\), shown separately for buyers \(left\) and sellers \(right\)\. The seller\-side pattern replicates the main result: seller own\-gain on concession turns remains negative across conditions, indicating uncompensated accommodation\. The buyer side is more model\-dependent: in seller\_informed, buyers show near\-zero or slightly positive own\-gain on concession turns, consistent with partial strategic coupling that is absent in the main Qwen results\.
Figure 15: DeepSeek\-R1 exp\_trade\_plan: distance to Pareto frontier \(left\) and to the Nash solution \(right\), in normalized utility space\. The pattern is qualitatively the same as for Qwen3\-235B: information improves Pareto proximity, and the trade plan has no clear efficiency benefit in either regime\.
### B\.3\. Trade Plan Intervention
Table 6: Outcome metrics for exp\_trade\_plan with DeepSeek\-R1 \(100 trials per condition\)\. Utilities and distances are in normalized utility space \($\tilde{U} \in [0,1]$\), averaged over agreed deals\. Lower is better for $d_{\text{Pareto}}$ and $d_{\text{NBS}}$\. The pattern replicates the main Qwen results: the trade plan slightly reduces deal rate without improving Pareto efficiency\.
The trade\-plan result also replicates directionally \([Table 6](https://arxiv.org/html/2605.16575#A2.T6), [Figure 15](https://arxiv.org/html/2605.16575#A2.F15)\)\. There is no clear efficiency gain in the informed condition, and only a small change in deal rate\. DeepSeek\-R1 appears somewhat less brittle than Qwen3\-235B under bilateral plan templates, but the main conclusion is unchanged: making one\-step trades explicit does not close the gap between partner knowledge and effective bargaining\.
### B\.4\. Qwen vs\. DeepSeek
The most robust findings are the seller\-informed asymmetry, the presence of substantial counterparty\-belief accuracy, the lack of compensated seller concessions, and the failure of the trade\-plan intervention to produce a clear efficiency gain\. The main unstable component is buyer\-side policy: Qwen3\-235B buyers tend to withhold, whereas DeepSeek\-R1 buyers tend to accommodate\.
Across both models, the core conclusion remains stable: LLM agents can model negotiating partner preferences, but they do not reliably convert that knowledge into bargaining advantage\. The seller bears the clearest cost of this gap, and structured one\-step trade reasoning does not resolve it\.
## Appendix CPrompt Templates
All agents use a two\-level prompt architecture: \(i\) a *system prompt*, set once at the start of each negotiation trial, which encodes the agent’s role, general strategy, domain context \(available terms and valid ranges\), and a trial\-specific preference block, and \(ii\) a *per\-turn prompt*, constructed at every negotiation step, which gives the full dialogue history, the current offer on the table, and the instruction block\.
Note that the actual prompts contain Unicode emoji markers that are not reproducible in TeX \(\[\!\] replaces a warning/critical emoji; \[\>\>\] replaces a target emoji; \[?\] replaces a magnifying\-glass emoji; \[P\] replaces a lightning/plan emoji\)\.
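The two\-level structure can be pictured with a small Python sketch\. Everything here is hypothetical scaffolding for illustration \(the class, function names, and the way pieces are concatenated are assumptions, not the paper’s harness\): the system prompt is assembled once per trial, and the per\-turn prompt is rebuilt from the running dialogue history at every step\.

```python
from dataclasses import dataclass, field


def build_system_prompt(role_text, strategy_text, domain_context, preference_block):
    """Assemble the once-per-trial system prompt from its four parts."""
    return "\n\n".join([role_text, strategy_text, domain_context, preference_block])


@dataclass
class NegotiationSession:
    """Holds the fixed system prompt and the growing dialogue history."""
    system_prompt: str
    history: list = field(default_factory=list)

    def record(self, speaker, utterance):
        """Append one utterance to the dialogue history."""
        self.history.append((speaker, utterance))

    def turn_prompt(self, current_offer, instruction_block):
        """Build the per-turn prompt: history, current offer, instructions."""
        dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in self.history)
        return (f"Dialogue so far:\n{dialogue}\n\n"
                f"Current offer: {current_offer}\n\n"
                f"{instruction_block}")


system = build_system_prompt("ROLE", "STRATEGY", "DOMAIN", "PREFS")
session = NegotiationSession(system)
session.record("seller", "How about $36k?")
prompt = session.turn_prompt("price=36", "Respond with dialogue, then JSON.")
```

Keeping the system prompt immutable within a trial while regenerating the per\-turn prompt mirrors the split described above, where preferences are fixed per trial and only the dialogue state changes\.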
### C\.1\. Base System Prompt
Both buyer and seller receive the following system prompt structure\.\[BUYER / SELLER\]marks role\-specific text\.
You are a \[BUYER/SELLER\] negotiating to \[purchase/sell\] a car\.
\#\# NEGOTIATION STRATEGY:
1\. \*\*FOLLOW YOUR PREFERENCES\*\*: You have specific preferences listed below\.
\- PRIORITIZE items marked CRITICAL \(most important\)
\- PUSH FOR items marked IMPORTANT \(but can compromise\)
\- USE flexible items as bargaining chips
\- Your goal: get outcomes that match your preferences
2\. \*\*TRADE STRATEGICALLY\*\*: Exchange things you care less about\.
\- Concede on FLEXIBLE items to win on CRITICAL items
\- Don’t give away things you want without getting something back
\- Propose deals that maximize YOUR outcome
3\. \*\*REACH AGREEMENT\*\*: Making a deal is important\!
\- Any deal above your reservation value is better than no deal
\- If opponent offers seem reasonable, seriously consider accepting
\- Don’t let perfect be the enemy of good
\- Converge toward mutually beneficial terms
4\. \*\*UNDERSTAND CONSTRAINTS\*\*: The other party has HARD LIMITS too\!
\- They have minimum/maximum bounds they CANNOT violate
\- If they keep rejecting certain terms, you may be outside their feasible range
\- EXPLORE different combinations \- don’t get stuck demanding impossible terms
\- A successful deal requires finding terms that work for BOTH parties
\#\# RESPONSE FORMAT:
Respond with: natural dialogue \(2\-3 sentences\), then JSON\. BE CONCISE\.
\[\!\] CRITICAL: Your JSON should ONLY contain terms you explicitly mentioned
in your dialogue\.
For early conversation \(exploring\):
\{"action": "COUNTER", "terms": \[\], "notes": "exploring"\}
For proposing specific terms you mentioned:
\{"action": "COUNTER",
"terms": \[\{"name": "model", "type": "categorical", "value": "Truck"\},
\{"name": "price", "type": "money", "value": 35\}\],
"notes": "interested in truck around $35k"\}
DO NOT include terms you haven’t discussed \(like color, warranty, etc\.\)
\- let them come up naturally\.
\[Domain context: available car models, negotiable terms, valid ranges \-\-
buyer sees buyer\-side ranges; seller sees seller\-side minimums/ranges\]
\[Trial\-specific preference block, see Section C\.2\]
NEVER mention JSON, technical details, or utility scores in your dialogue\.
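The response format above \(dialogue followed by a JSON action\) implies a parsing step on the environment side\. The following is a minimal sketch of such a parser, not the authors’ implementation: the function name split\_response is hypothetical, and the set of valid actions is assumed to be the two that appear in the prompts \(COUNTER and ACCEPT\)\.

```python
import json
import re

# Assumption: only the two actions that appear in the prompt templates.
VALID_ACTIONS = {"COUNTER", "ACCEPT"}


def split_response(text):
    """Split a 'dialogue then JSON' reply into (dialogue, action_dict)."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index i.
            obj, _end = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        if isinstance(obj, dict) and obj.get("action") in VALID_ACTIONS:
            # Drop an opening code fence that may precede the JSON object.
            dialogue = re.sub(r"```(?:json)?\s*$", "", text[:i]).strip()
            return dialogue, obj
    raise ValueError("no JSON action found in response")
```

Scanning for the first brace that decodes to an object with a recognized action keeps the sketch tolerant of replies that wrap the JSON in a code fence\.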
### C\.2\. Trial\-Specific Preference Block
A preference block is appended to every agent’s system prompt at the start of each trial\. Utility weights are drawn independently per feature from uniform ranges with sign constraints, then L1\-normalized\. Features are assigned to tiers by absolute weight magnitude: CRITICAL \($|w|>0.6$\), IMPORTANT \($|w|>0.3$\), FLEXIBLE \(any remaining nonzero weight\)\. Below is a representative seller example\.
\#\# YOUR PREFERENCES \(follow these strictly to maximize your utility\):
Role: SELLER
\*\*CRITICAL\*\* \(fight hard, don’t easily concede\):
\- Price: increase price \(higher price \-\> higher utility\)
\*\*IMPORTANT\*\* \(push for\):
\- TradeIn: decrease trade\-in value \(lower value \-\> higher utility\)
\- DeliveryDay: increase delivery time \(later delivery \-\> higher utility\)
\*\*FLEXIBLE\*\* \(use as bargaining chips\):
\- HasAccessories: exclude accessories \(accessories=false \-\> higher utility\)
\- IsTruck: choose Truck \(selecting Truck \-\> higher utility\)
\- ColorBlue: choose Blue color \(Blue \-\> higher utility\)
\- InteriorLuxury: choose Luxury interior \(Luxury \-\> higher utility\)
\- WarrantyBasic: choose basic warranty \(basic \-\> higher utility\)
\- ServiceAnnual: choose annual service \(annual \-\> higher utility\)
\[remaining low\-weight features listed similarly\]
\#\# HOW TO NEGOTIATE:
1\. FOLLOW YOUR PREFERENCES \- they determine your utility
2\. PUSH for high\-weight features \(critical/important\)
3\. TRADE AWAY flexible items to get what you need
4\. Express your preferences naturally through offers and reactions
5\. Maximize utility = weighted sum of normalized features
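The tiering rule and the weighted\-sum utility can be sketched as follows\. This is an illustration under stated assumptions, not the authors’ code: the function names and the example weights are hypothetical, and the thresholds are applied to whatever weights are passed in\. The text does not say whether tiers are computed before or after L1 normalization; because one CRITICAL and two IMPORTANT features would require magnitudes summing above 1, the example below tiers the raw weights, which is an assumption\.

```python
def l1_normalize(weights):
    """Scale weights so their absolute values sum to 1 (signs preserved)."""
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("all weights are zero")
    return {name: w / total for name, w in weights.items()}


def tier(w):
    """Map one weight to its tier by absolute magnitude."""
    magnitude = abs(w)
    if magnitude > 0.6:
        return "CRITICAL"
    if magnitude > 0.3:
        return "IMPORTANT"
    if magnitude > 0:
        return "FLEXIBLE"
    return None  # zero-weight features are not listed


def utility(weights, features):
    """Weighted sum of normalized feature values (each assumed in [0, 1])."""
    return sum(w * features[name] for name, w in weights.items())


# Hypothetical raw seller weights; signs encode the preferred direction.
raw = {"Price": 0.9, "TradeIn": -0.5, "DeliveryDay": 0.4, "ColorBlue": 0.1}
tiers = {name: tier(w) for name, w in raw.items()}  # tiers from raw weights
weights = l1_normalize(raw)                         # weights used for utility
```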
### C\.3\. Per\-Turn Prompt
At every negotiation step the agent receives a turn prompt assembled from the full dialogue history, the current structured offer, and a phase\-dependent instruction block chosen by turn number and offer state\.
\#\# CONVERSATION:
\[full dialogue history\]
\#\# CURRENT OFFER ON TABLE: \[last structured offer, or "None \(you go first\)"\]
\[utility info, if any\]
\#\# YOUR TURN \(Turn N\):
\[phase\-dependent instruction \+ optional opponent intel \+ optional trade plan\]
Respond with: dialogue \(2\-3 sentences\) \+ JSON action\.
\[\!\] REQUIRED: After any thinking, you MUST output your dialogue text and
JSON code block\. Complete your full response\.
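The template above can be read as a simple assembly step\. The sketch below is a hypothetical illustration of that step: the function name build\_turn\_prompt and its parameters are assumptions, and the only ordering taken from the text is that the counterparty intelligence block follows the phase instruction and the trade plan follows any intelligence block\.

```python
def build_turn_prompt(history, current_offer, turn, phase_text,
                      utility_info=None, intel_block=None, plan_block=None):
    """Assemble one per-turn prompt from its parts (illustrative only)."""
    # Instruction block: phase text, then optional intel, then optional plan.
    instruction = [phase_text]
    if intel_block:
        instruction.append(intel_block)
    if plan_block:
        instruction.append(plan_block)

    offer = current_offer if current_offer else "None (you go first)"
    parts = [
        "## CONVERSATION:",
        "\n".join(history),
        f"## CURRENT OFFER ON TABLE: {offer}",
    ]
    if utility_info:
        parts.append(utility_info)
    parts += [
        f"## YOUR TURN (Turn {turn}):",
        "\n\n".join(instruction),
        "Respond with: dialogue (2-3 sentences) + JSON action.",
    ]
    return "\n".join(parts)
```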
The phase\-dependent instruction evolves as follows\.
Turn 1 \(opening\):
START CONVERSATIONALLY \- introduce yourself and express interest\.
\[\>\>\] FOR YOUR JSON:
\- Just exploring? \-\> Use empty terms: \{"action": "COUNTER", "terms": \[\]\}
\- Mentioned specific things? \-\> Include ONLY what you said
Example: "I’m interested in a truck" \-\>
\{"action": "COUNTER", "terms": \[\{"name": "model",
"type": "categorical", "value": "Truck"\}\]\}
\[\!\] DO NOT make up values for terms you haven’t mentioned yet
\(like color, warranty, etc\.\)
Turns 2–5, no complete offer on table \(exploratory\):
DISCUSS what matters to you\. Mention specific preferences\.
Your JSON should include ONLY terms you explicitly mention in your dialogue\.
\- Example: "I’m looking at around $35k for a truck" \-\> Include model \+ price only
\- Missing terms will auto\-fill from their previous offer \(if any\)
\[\!\] DO NOT specify terms you haven’t discussed \- let them emerge naturally
Turn 6\+, no complete offer on table \(propose a deal\):
Time to PROPOSE A COMPLETE DEAL\. State all major terms explicitly in your dialogue\.
When making a complete proposal:
1\. SAY all the terms in your dialogue \(model, price, delivery, etc\.\)
2\. THEN include them in your JSON
3\. Don’t include anything you didn’t explicitly mention
Turns 6–15, offer on table \(active bargaining\):
REACT to their offer\. Push for better terms or accept if good enough\.
For your JSON:
\- Accept their offer? \-\> \{"action": "ACCEPT"\}
\- Change specific terms? \-\> Include ONLY the terms you want to change
\- Their offer auto\-fills unchanged terms
\[\!\] IMPORTANT: The other party has HARD CONSTRAINTS they cannot violate\.
If they keep rejecting certain values, try different combinations \-
find the overlap zone\!
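The auto\-fill behavior mentioned in the prompts \(unchanged terms are carried over from the other party’s previous offer\) can be sketched as a merge keyed on the term name\. This is an assumed reading of the mechanism, not the authors’ implementation; the function name autofill is hypothetical, and terms are assumed to be dictionaries with a "name" field as in the JSON examples\.

```python
def autofill(counter_terms, previous_offer):
    """Merge a partial counter into the previous offer, keyed by term name."""
    # Start from a copy of the previous offer (may be None on early turns).
    merged = {t["name"]: dict(t) for t in (previous_offer or [])}
    # Terms named in the counter override the carried-over values.
    for t in counter_terms:
        merged[t["name"]] = dict(t)
    return list(merged.values())
```

Copying each term keeps the previous offer unmodified, so the dialogue history can still report what was originally on the table\.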
Turns 16–30, offer on table \(convergence\):
CONVERGE toward a deal\! Find mutually acceptable terms\.
\[\!\] If they keep rejecting your proposals, you may be outside their
feasible range\.
TRY DIFFERENT TERMS \- don’t keep demanding impossible values\.
Any deal above your reservation value is better than no deal\.
Turn 31\+, offer on table \(final round\):
FINAL ROUND\! Accept their offer or make your final counter\.
If their offer gives you positive utility \(above reservation\), ACCEPT IT\.
Otherwise, make ONE FINAL counter and prepare to accept their response\.
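The phase schedule above amounts to a lookup on turn number and offer state\. The following sketch is one possible encoding; the function name select\_phase and the phase labels are hypothetical\. The listing does not cover a complete offer appearing during turns 2 to 5, so the sketch treats that case as active bargaining, which is an assumption\.

```python
def select_phase(turn, complete_offer_on_table):
    """Pick the instruction phase for a turn (illustrative encoding)."""
    if turn < 1:
        raise ValueError("turn numbers start at 1")
    if turn == 1:
        return "opening"
    if not complete_offer_on_table:
        # Turns 2-5 explore; from turn 6 the agent must propose a full deal.
        return "exploratory" if turn <= 5 else "propose_deal"
    if turn <= 15:
        # Assumption: an early complete offer (turns 2-5) also lands here.
        return "active_bargaining"
    if turn <= 30:
        return "convergence"
    return "final_round"
```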
### C\.4\. Counterparty Intelligence Block \(Informed Conditions\)
When an agent has nonzero information quality, the following block is appended to the per\-turn instruction after the phase text above\. The CRITICAL / IMPORTANT / FLEXIBLE tiers mirror the preference tiers of Section [C\.2](https://arxiv.org/html/2605.16575#A3.SS2), but now describe the *opponent’s inferred preferences* with action\-oriented language pointing toward the agent’s own gain\.
\#\# \[?\] INTELLIGENCE ON OPPONENT’S PREFERENCES:
\*\*CRITICAL\*\* \(very important to them\):
\- Price: decrease price \(lower price \-\> higher utility for them\)
\- Model: choose Sedan \(selecting Sedan \-\> higher utility for them\)
\*\*IMPORTANT\*\* \(moderately important to them\):
\- Interior: choose Luxury \(Luxury \-\> higher utility for them\)
\- DeliveryDay: decrease delivery time \(faster \-\> higher utility for them\)
\*\*FLEXIBLE\*\* \(they don’t care much\):
\- Color: choose White color \(White \-\> higher utility for them\)
\[remaining low\-weight features listed similarly\]
\*\*STRATEGIC GUIDELINES \- use this to MAXIMIZE YOUR OWN utility:\*\*
\- Their CRITICAL items = YOUR leverage\. They need these badly \-\>
demand concessions on YOUR priorities in exchange\.
\- Items YOU value but THEY don’t care about \-\> Push hard here \-
giving you what you want costs them almost nothing\.
\- Items THEY value but YOU don’t \-\> Only concede these in exchange for
something YOU care about\. Never give them away for free\.
\- Know their walk\-away point: propose terms that give them JUST ENOUGH
to accept, keeping maximum surplus for yourself\.
### C\.5\. Trade\-Plan Block \(with\_plan Condition\)
In the trade\-plan experiment, the following one\-step reasoning scaffold is appended to every per\-turn instruction of every agent \(after any counterparty intelligence block\)\. Agents are instructed to complete the template *before* writing their dialogue and JSON\.
\#\# \[P\] TRADE PLAN \- complete this BEFORE writing your dialogue:
Based on what you know \(opponent preferences if available, conversation
history otherwise\), package a concrete trade before acting:
STEP 1 \- Feature to CONCEDE \(they seem to want it, and giving it
costs you little\):
\-\> Feature: \_\_\_ Direction: \_\_\_
\-\> Why it’s cheap for you: \_\_\_
STEP 2 \- Feature to DEMAND in return \(you want it \- don’t give it
away for free\):
\-\> Feature: \_\_\_ Direction: \_\_\_
\-\> Why you should extract this now: \_\_\_
STEP 3 \- Your package: "I’ll give them \_\_\_ IF they give me \_\_\_"
\(If no trade makes sense this turn, state why in one sentence\.\)
Only AFTER completing the plan above, write your dialogue and JSON\.