Tree-Based Formalization of Multi-Agent Complementarity in Human-AI Interactions
Summary
This paper introduces a tree-based formal framework for modeling complementarity in multi-agent human-AI interactions, proving that complementarity is attainable in regression but obstructed in classification under natural conditions on local aggregation and loss functions.
# Tree-Based Formalization of Multi-Agent Complementarity in Human–AI Interactions
Source: [https://arxiv.org/html/2606.04779](https://arxiv.org/html/2606.04779)
Andrea Ferrario \([https://orcid.org/0000-0001-9968-9474](https://orcid.org/0000-0001-9968-9474)\)\. Email: aferrario@ethz\.ch\. Affiliations: Institute of Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland; SUPSI, Dalle Molle Institute for Artificial Intelligence \(IDSIA\), Lugano, Switzerland; ETH Zurich, Zurich, Switzerland\.
###### Abstract
Complementarity is the case in which a human–AI interaction \(HAI\) outperforms the best prediction benchmark available among its members\. Although this idea is central in human–AI interaction research, formal work on complementarity remains limited\. Existing formal frameworks do not model how agents’ predictions compose into workflow\-sensitive multi\-agent protocols\. We close this gap by introducing a tree\-based formalization of complementarity in multi\-agent HAI\. An HAI protocol is represented by an ordered agent\-role configuration together with a rooted planar binary tree whose leaves are decorated by prediction vectors\. A local binary composition rule is evaluated recursively along the tree, yielding a tree\-relative complementarity functional relative to a pointwise\-min oracle benchmark\. We prove four results\. First, selector\-based HAIs, including self\- or AI\-reliance, cannot achieve complementarity regardless of task, loss, or prediction quality\. Second, in regression under squared loss, maximizing complementarity is equivalent to minimizing the Euclidean distance from the ground\-truth vector; for $N=2$, the optimal linear\-pooling weight has a closed form and a residual\-correction interpretation\. Third, under linear local composition, every protocol tree defines a barycentric coordinate chart on the simplex of leaf weights; Tamari\-cover reparameterizations of protocol trees preserve complementarity, and for $N=4$, the Tamari\-cover reparameterizations satisfy the pentagon identity\. Fourth, in binary classification, no internal local composition can achieve complementarity with the pointwise\-min oracle benchmark under endpoint\-monotone losses, including standard Bregman and many finite Bernoulli $f$\-divergence losses; an analogous obstruction holds for coordinatewise\-internal multiclass aggregation under cross\-entropy\.
In summary, our framework shows that complementarity is attainable in multi\-agent regression, but obstructed in classification under natural conditions on local aggregation and loss functions\. If the pointwise\-min oracle benchmark is appropriate for at least high\-stakes HAI settings, this should prompt a revision of how complementarity is investigated empirically\.
Keywords: Artificial Intelligence, Human–AI Interactions, Complementarity, Planar Binary Trees, Associahedra
## 1 Introduction
In 1998, the Dutch Eurodance project Alice Deejay rose to popularity by asking a relevant question: “Do you think you’re better off alone?” \(I refer to their song “Better off alone” for all details\.\) Almost thirty years later, the human–AI interaction \(HAI\) domain answers *no*\. The reason is that, under certain conditions, a human–AI team can perform better than the best standalone predictor available among its members\. This phenomenon is called *complementarity*\. While complementarity has attracted much attention across HAI \(Bansal et al\., [2021b](https://arxiv.org/html/2606.04779#bib.bib3), [a](https://arxiv.org/html/2606.04779#bib.bib4); Hemmer et al\., [2021](https://arxiv.org/html/2606.04779#bib.bib18), [2025](https://arxiv.org/html/2606.04779#bib.bib7); Donahue et al\., [2022](https://arxiv.org/html/2606.04779#bib.bib5); Vaccaro et al\., [2024](https://arxiv.org/html/2606.04779#bib.bib6)\), formal work on complementarity remains limited\. Existing formalizations are centered on the two\-agent setting: a human and an AI system interacting on the same prediction task\. In this setting, complementarity compares the empirical loss of the human–AI team with the better aggregate empirical loss of the human and the AI considered separately \(Hemmer et al\., [2025](https://arxiv.org/html/2606.04779#bib.bib7); Donahue et al\., [2022](https://arxiv.org/html/2606.04779#bib.bib5)\)\. For instance, Donahue et al\. \([2022](https://arxiv.org/html/2606.04779#bib.bib5)\) analyze complementarity at the level of binary\-classification loss ‘regimes’ in a two\-agent setting, while Rastogi et al\. \([2023](https://arxiv.org/html/2606.04779#bib.bib61)\) develop a taxonomy of human and ML strengths and an optimization framework for convex combinations of human and ML policies\. Although complementarity is a central concept in HAI, the standard two\-agent setting abstracts away from the structure of the interaction itself\.
Many real\-world workflows involve more than one human and one AI system: experts and assistants may collaborate with AI tools in sequential decision processes; multiple human–AI dyads may contribute competing or complementary predictions; and several AI systems may aggregate, filter, or revise predictions at different stages of a workflow\. Thus, once more than two agents are involved, complementarity requires specifying how agents are ordered, how local interactions are performed at the level of prediction vectors, and how intermediate outputs are composed into a final HAI protocol output\.
We close this gap by developing a tree\-based formalization of complementarity in multi\-agent HAI\. We study a regime in which all agents act on the same labeled dataset and predict the same target\. A multi\-agent HAI protocol is represented by an ordered agent\-role configuration together with a rooted planar binary tree whose leaves are decorated by the agents’ prediction vectors\. The ordered configuration captures the protocol\-sensitive role order; the tree captures the binary compositions of the HAI protocol; and a local binary rule determines how two prediction vectors are combined at each internal node\. Evaluating this rule recursively along the tree yields a tree\-relative HAI protocol output\.
Our contributions are as follows\. First, we show a task\-independent impossibility theorem for reliance\. If local rules select one of their two inputs coordinatewise, such as in the case of self\- and AI\-reliance \(Schemmer et al\., [2023](https://arxiv.org/html/2606.04779#bib.bib21)\), then no interaction protocol can achieve complementarity relative to the pointwise\-min oracle benchmark, regardless of the proportion of accurate reliance instances\. Thus, *achieving complementarity requires producing interaction outputs that are not selected from the set of agent predictions*\.
Second, we study regression under squared loss\. In this setting, we prove that the complementarity functional has a geometric form: maximizing complementarity is equivalent to minimizing the Euclidean distance between the HAI protocol output and the ground\-truth vector\. For $N=2$ and linear aggregation, this yields a closed\-form optimizer\. Here, *complementarity depends on whether the human–AI disagreement direction corrects the AI residual with respect to ground truth, and whether the correction is large enough to place the optimum inside the feasible pooling segment*\. For $N\geq 3$, we study complementarity invariance in relation to tree topology and interaction reparameterization\.
Third, in regression with linear pooling as local composition and *any* loss function, we show that every tree defines a barycentric coordinate chart on the simplex of leaf weights\. Using this coordinate representation, we prove complementarity invariance under Tamari covers of protocol trees \(Tamari, [1962](https://arxiv.org/html/2606.04779#bib.bib64)\)\. Therefore, *two distinct HAI protocol trees related by a Tamari move can lead to the same complementarity level after appropriately transporting their local parameters while keeping the same barycentric coordinates*\. In addition, for $N=4$, we prove a coherence result: the Tamari\-cover reparameterizations satisfy the pentagon identity \(Yanofsky, [2024](https://arxiv.org/html/2606.04779#bib.bib80)\)\.
Fourth, we show that in *binary classification, no internal local rule can achieve complementarity under endpoint\-monotone losses*\. Internal rules are those whose coordinatewise outputs remain between the corresponding input probabilities; endpoint monotonicity captures the basic requirement that assigning more probability to the true class should not increase loss\. This impossibility result applies to broad families of Bregman and many standard finite Bernoulli $f$\-divergence losses \(Bregman, [1967](https://arxiv.org/html/2606.04779#bib.bib72); Ali and Silvey, [1966](https://arxiv.org/html/2606.04779#bib.bib79)\), including binary cross\-entropy\. The obstruction extends to multiclass problems with internal local rules under cross\-entropy\. We show that using non\-internal rules, e\.g\., by amplifying binary logarithmic/logit pooling \(Neyman and Roughgarden, [2023b](https://arxiv.org/html/2606.04779#bib.bib68)\), allows one to escape this impossibility regime\.
In summary, our framework shows that complementarity is attainable in multi\-agent regression, but obstructed in classification under natural conditions\. If the pointwise\-min oracle benchmark is appropriate for at least high\-stakes HAI settings, this should prompt a revision of how complementarity is investigated empirically\.
## 2 Related Work on Complementarity
In research on human–AI interaction, complementarity is the case in which the *human–AI team* performs better than either component alone \(Bansal et al\., [2021b](https://arxiv.org/html/2606.04779#bib.bib3), [a](https://arxiv.org/html/2606.04779#bib.bib4); Hemmer et al\., [2021](https://arxiv.org/html/2606.04779#bib.bib18), [2025](https://arxiv.org/html/2606.04779#bib.bib7)\)\. This perspective shifted attention away from evaluating AI systems in isolation and toward the joint performance of the interaction, especially in advice\-based decision\-support settings where humans remain accountable for the final decision \(Bansal et al\., [2021b](https://arxiv.org/html/2606.04779#bib.bib3); Miller, [2023](https://arxiv.org/html/2606.04779#bib.bib24)\)\. A central formalization is *complementarity team performance* \(CTP\), which treats complementarity as a binary property of a prediction\-task human–AI interaction: complementarity is achieved whenever the empirical loss of the team is strictly lower than the minimum of the empirical losses of the human and the AI considered separately \(Hemmer et al\., [2021](https://arxiv.org/html/2606.04779#bib.bib18), [2025](https://arxiv.org/html/2606.04779#bib.bib7)\)\. In this sense, complementarity extends the selector logic characteristic of reliance relations, where the output is constrained to coincide with either the human or the AI prediction, by allowing interaction outputs that need not equal either input \(Schemmer et al\., [2023](https://arxiv.org/html/2606.04779#bib.bib21); Zhang et al\., [2020](https://arxiv.org/html/2606.04779#bib.bib20); Bansal et al\., [2021b](https://arxiv.org/html/2606.04779#bib.bib3)\)\.
Existing formal work focuses on the two\-agent human–AI setting\. Donahue et al\. \([2022](https://arxiv.org/html/2606.04779#bib.bib5)\) study complementarity in a two\-component human–algorithm system directly in loss space\. Their framework is developed for binary classification and partitions the input space into loss\-homogeneous *regimes*\. A regime is not an individual prediction on a sample, but a type or region of cases for which human, algorithmic, and combined performance can be summarized by loss rates\. In regime $i$, the unaided human has loss $h_i$, the algorithm has loss $a_i$, and the combined system has loss $c(a_i,h_i)$, assumed to lie between the two standalone losses, $\min\{a_i,h_i\}\leq c(a_i,h_i)\leq\max\{a_i,h_i\}$\. Complementarity is then defined relative to the better aggregate standalone loss, namely the better of the human and algorithm losses *after averaging across regimes*; see Def\. \(1\) in \(Donahue et al\., [2022](https://arxiv.org/html/2606.04779#bib.bib5)\)\. Their impossibility results show that, in this regime\-level loss framework, complementarity cannot be achieved if human and algorithm loss rates are constant over regimes or if one agent always weakly dominates the loss of the other; see Lemmas 2 and 3 in \(Donahue et al\., [2022](https://arxiv.org/html/2606.04779#bib.bib5)\)\. Rastogi et al\. \([2023](https://arxiv.org/html/2606.04779#bib.bib61)\) develop a taxonomy of human and ML strengths and weaknesses in decision\-making, organized around task definition, input, internal processing, and output\. They emphasize that complementarity should not be assumed because a human and an ML model are combined; rather, one should identify the concrete source of potential complementarity, such as access to different information, different objectives, different models of the world, or different output capabilities\.
To operationalize this idea, Rastogi et al\. \([2023](https://arxiv.org/html/2606.04779#bib.bib61)\) introduce an optimization framework in which a *joint policy* is obtained by convexly combining a human policy and an ML policy at the instance level\. Their corresponding metrics of across\-instance and within\-instance complementarity quantify how the optimal joint policy distributes contribution across the two agents\.
Furthermore, a related line of work has begun to study complementarity, deferral, and collaboration with multiple human experts\. Hemmer et al\. \([2022](https://arxiv.org/html/2606.04779#bib.bib75)\) train a classifier together with an allocation system that routes each instance either to the model or to one of several human experts\. Verma et al\. \([2023](https://arxiv.org/html/2606.04779#bib.bib76)\) study learning\-to\-defer with multiple experts, focusing on surrogate losses, calibration, and consistency guarantees for selecting which expert should handle a case\. Paat and Shen \([2025](https://arxiv.org/html/2606.04779#bib.bib77)\) use conformal prediction sets to select subsets of relevant human experts for instance\-level classification\. Finally, Peng et al\. \([2025](https://arxiv.org/html/2606.04779#bib.bib78)\) prove a no\-free\-lunch result for collaboration among two or more calibrated probabilistic agents in binary classification\.
Taken together, these works show that complementarity depends on allocation, deferral, expert selection, and the distribution of expertise across agents\. However, they do not provide a theory of complementarity as optimization over multi\-agent interaction protocols\. Their central design problem is typically which expert, model, or subset of experts to query, and whether a case should be deferred\. By contrast, they do not model recursive prediction\-vector composition conditional on real\-world workflow topology\. This limitation matters because many real\-world HAI settings are not naturally two\-agent interactions, and workflow structure can affect the final prediction\. In medicine, for example, diagnostic and prognostic decisions may involve a general practitioner, a specialist, a radiologist, a nurse, a patient or caregiver, and one or more AI\-based decision\-support tools\. In education, social services, or public administration, domain experts, case workers, lay users, and AI systems may all contribute different forms of information to a final judgment\. In such settings, complementarity depends on which agents participate, on how human and artificial agents are ordered, which local interactions occur first, and how intermediate outputs are aggregated\. Taken together, these considerations motivate the mathematical framework for complementarity that we introduce in what follows\.
## 3 Tree\-Based Formalization of Multi\-Agent Complementarity
We introduce a tree\-based formalism for empirical complementarity in multi\-agent prediction\-task HAIs\. Our approach is as follows: \(i\) we distinguish HAIs, prediction\-task HAIs, protocols, and protocol trees; \(ii\) we introduce notation for agents, roles, and ordered configurations; and \(iii\) we define tree\-relative complementarity functionals\.
### 3\.1 Human–AI Interactions, Agents, Roles, and Ordered Configurations
In what follows, a prediction task is a tuple $\tau=(\mathcal{X},\mathcal{Y},\widehat{\mathcal{Y}},\ell)$, where $\mathcal{X}$ is an input space, $\mathcal{Y}$ is the label space, $\widehat{\mathcal{Y}}$ is the prediction space, and $\ell:\mathcal{Y}\times\widehat{\mathcal{Y}}\to[0,\infty)$, $(y,\hat{y})\mapsto\ell(y,\hat{y})$, is a pointwise loss function\. A labeled dataset is a finite set
$$D=\{(x_i,y_i)\}_{i=1}^{n}\subseteq\mathcal{X}\times\mathcal{Y}.$$
We use *human–AI interaction* \(HAI\) as a high\-level term for a goal\-oriented process in which human and AI agents contribute informational inputs that are integrated into a single output\. In this paper we restrict attention to *prediction\-task HAIs*: HAIs relative to a prediction task $\tau=(\mathcal{X},\mathcal{Y},\widehat{\mathcal{Y}},\ell)$, in which the relevant inputs and outputs are prediction vectors and the HAI output is evaluated against ground\-truth labels in a dataset $D$\. Thus, a prediction\-task HAI specifies the agents \(either human or AI systems\), their roles, the prediction task, and the goal of producing an HAI output, but it does not yet determine how the agents interact\. We call a *protocol* a concrete realization of such a prediction\-task HAI: it specifies the order in which agent predictions enter the interaction, the local rules by which they are combined, and the way intermediate outputs are propagated to a final prediction\. Hence, the same prediction\-task HAI may admit several protocols\. We elaborate on these concepts in what follows\.
###### Definition 1 \(Agents, roles, and ordered configurations\)\.
Let $\mathcal{A}=\{a_1,\dots,a_N\}$ be a finite set of agents, let $\mathcal{R}$ be a finite set of roles, and let $r:\mathcal{A}\to\mathcal{R}$ be a function assigning a role to each agent\. An *ordered agent\-role configuration* is a tuple
$$c_{\sigma,N}=\bigl((a_{\sigma(1)},r(a_{\sigma(1)})),\dots,(a_{\sigma(N)},r(a_{\sigma(N)}))\bigr),\qquad\sigma\in\mathfrak{S}_N,$$
where $\mathfrak{S}_N$ denotes the permutation group on $N$ elements\.
For readability, in the discussion below we suppress agent identities and display only the ordered role sequence\. In what follows, we assume all agents act on the same dataset $D$ for the same prediction task $\tau$, and all agents predict the same target variable\. Thus, each agent $a_{\sigma(i)}$ in the configuration $c_{\sigma,N}$ is represented by a predictor $f_{a_{\sigma(i)}}:\mathcal{X}\to\widehat{\mathcal{Y}}$, with empirical prediction vector
$$\hat{y}^{(\sigma(i))}=\bigl(f_{a_{\sigma(i)}}(x_1),\dots,f_{a_{\sigma(i)}}(x_n)\bigr)\in\widehat{\mathcal{Y}}^{n}.$$
Once $\tau$ and $D$ are fixed, we use the ordered configuration $c_{\sigma,N}$ as the combinatorial representation of the HAI agents and roles\. We now turn to HAI protocols\. To address them, we introduce rooted planar binary trees\.
###### Definition 2 \(Rooted planar binary trees\)\.
Let $\mathsf{Y}_N$ denote the set of rooted planar binary trees with $N$ leaves, equivalently $N-1$ internal vertices\. We use the recursive convention $\mathsf{Y}_1=\{\,|\,\}$, where $|$ is the one\-leaf tree, with no internal vertex\. For $N\geq 2$, define
$$\mathsf{Y}_N=\coprod_{\begin{subarray}{c}p+q=N\\ p,q\geq 1\end{subarray}}\left\{\sigma\vee\tau:\sigma\in\mathsf{Y}_p,\ \tau\in\mathsf{Y}_q\right\}.$$
Here $\sigma\vee\tau$ denotes root grafting: a new internal root vertex is created whose left subtree is $\sigma$ and whose right subtree is $\tau$\. Thus every nontrivial rooted planar binary tree admits a unique decomposition into its left and right root subtrees\.
For $\mathsf{T}\in\mathsf{Y}_N$, let $V(\mathsf{T})$ denote its set of internal vertices\. Then $|V(\mathsf{T})|=N-1$\. We define the left\-to\-right ordering $o(\mathsf{T})$ of $V(\mathsf{T})$ recursively\. For the one\-leaf tree, $o(|)=\varnothing$\. If $\mathsf{T}=\sigma\vee_x\tau$, where $x$ denotes the newly created root vertex, then
$$o(\mathsf{T})=\{\,o(\sigma)<x<o(\tau)\,\}.$$
Equivalently, the internal vertices of the left subtree come first, then the root vertex $x$, and then the internal vertices of the right subtree\.
The cardinality of $\mathsf{Y}_N$ is the Catalan number
$$|\mathsf{Y}_N|=C_{N-1}=\frac{1}{N}\binom{2N-2}{N-1}.$$
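As a sanity check on Definition 2, the recursive construction can be enumerated directly\. The following Python sketch is illustrative and not part of the paper: the function names `planar_binary_trees` and `catalan_count`, and the encoding of a tree as nested pairs with the string `"|"` for a leaf, are assumptions made here\. It builds every tree by root grafting and compares the counts with the Catalan formula above\.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def planar_binary_trees(n_leaves):
    """Return all rooted planar binary trees with n_leaves leaves.

    Encoding (an assumption of this sketch): a leaf is the string "|",
    and an internal vertex is a pair (left_subtree, right_subtree),
    mirroring the root grafting of Definition 2.
    """
    if n_leaves == 1:
        return ("|",)
    trees = []
    # p leaves go to the left root subtree, q = n_leaves - p to the right one.
    for p in range(1, n_leaves):
        for left in planar_binary_trees(p):
            for right in planar_binary_trees(n_leaves - p):
                trees.append((left, right))
    return tuple(trees)


def catalan_count(n_leaves):
    """Closed form (1/N) * binom(2N-2, N-1) for the number of trees with N leaves."""
    return comb(2 * n_leaves - 2, n_leaves - 1) // n_leaves


# Number of trees for N = 1, ..., 6 leaves.
counts = [len(planar_binary_trees(n)) for n in range(1, 7)]
```

For $N=4$ the enumeration returns five trees, in agreement with $C_3=5$\.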
###### Definition 3 \(Protocol trees for a configuration\)\.
Let $c_{\sigma,N}$ be an ordered agent\-role configuration of length $N$\. A *protocol tree for $c_{\sigma,N}$* is a rooted planar binary tree $\mathsf{T}\in\mathsf{Y}_N$ whose leaves are decorated from left to right by $c_{\sigma,N}$\. Equivalently, once the predictors induced by $c_{\sigma,N}$ are fixed, the leaves of $\mathsf{T}$ are labeled from left to right by the corresponding ordered prediction vectors $\hat{y}^{(\sigma(1))},\dots,\hat{y}^{(\sigma(N))}$\.
Throughout the paper, the ordered configuration $c_{\sigma,N}$ is treated as fixed\. Thus, we do not consider permutations of the leaves, and we work with rooted planar binary trees because left\-to\-right leaf order is part of the workflow specification; in particular, mirror\-related trees are not identified\. For notational simplicity, once a configuration $c_{\sigma,N}$ is specified, we write the ordered leaf predictions simply as
$$\hat{y}^{(1)},\dots,\hat{y}^{(N)}\in\widehat{\mathcal{Y}}^{n}.$$
### 3\.2 Complementarity Functionals
#### 3\.2\.1 The Two\-Agent Setting
Following the literature on HAI, we introduce the complementarity functional in the two\-agent setting\.
###### Definition 4 \(Two\-agent complementarity \(Bansal et al\., [2021b](https://arxiv.org/html/2606.04779#bib.bib3); Hemmer et al\., [2025](https://arxiv.org/html/2606.04779#bib.bib7); Donahue et al\., [2022](https://arxiv.org/html/2606.04779#bib.bib5)\)\)\.
Let a human agent and an AI system interact in an HAI\. Let $\hat{y}^{H}=(\hat{y}_1^{H},\dots,\hat{y}_n^{H})$ and $\hat{y}^{AI}=(\hat{y}_1^{AI},\dots,\hat{y}_n^{AI})$ denote the vectors of predictions of the human agent and the AI system over the dataset $D$\. Let $\hat{y}^{HAI}=(\hat{y}_1^{HAI},\dots,\hat{y}_n^{HAI})$ denote the vector of ‘human–AI’ team predictions that results from the interaction between the human and the AI\. These predictions depend on the inputs $\hat{y}^{H}$ and $\hat{y}^{AI}$\. Consider the average empirical losses
$$L_{\clubsuit}(D):=\frac{1}{n}\sum_{i=1}^{n}\ell(y_i,\hat{y}_i^{\clubsuit}),\qquad\clubsuit\in\{H,AI,HAI\}.$$
Define
$$\mathcal{C}_2(\hat{y}^{H},\hat{y}^{AI};D):=\min\{L_H(D),L_{AI}(D)\}-L_{HAI}(D). \tag{1}$$
The human–AI team achieves complementarity if
$$\mathcal{C}_2(\hat{y}^{H},\hat{y}^{AI};D)>0.$$
Thus, complementarity records the empirical gain of the ‘human–AI team’ relative to the human and the AI system acting independently, on a given dataset\. Our goal is to extend complementarity analysis from the two\-agent setting to multi\-agent protocols\. To do so, however, we start from a different $N=2$ benchmark than \([1](https://arxiv.org/html/2606.04779#S3.E1)\)\. Specifically, at $N=2$ we consider the functional
$$\Psi(\hat{y}^{H},\hat{y}^{AI};D)=\frac{1}{n}\sum_{i=1}^{n}\min\left\{\ell(y_i,\hat{y}_i^{H}),\,\ell(y_i,\hat{y}_i^{AI})\right\}-\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(y_i,\hat{y}_i^{HAI}\bigr). \tag{2}$$
The functional $\Psi$ differs from $\mathcal{C}_2$ in \([1](https://arxiv.org/html/2606.04779#S3.E1)\) by the placement of the minimum and the empirical average\. The pointwise\-min benchmark in \([2](https://arxiv.org/html/2606.04779#S3.E2)\) is the empirical counterpart of the population functional
$$\mathbb{E}_{(X,Y)}\bigl[\min\{\ell(Y,\hat{y}^{H}(X)),\ell(Y,\hat{y}^{AI}(X))\}-\ell(Y,\hat{y}^{HAI}(X))\bigr], \tag{3}$$
whereas $\mathcal{C}_2$ corresponds to the quantity
$$\min\{\mathbb{E}_{(X,Y)}[\ell(Y,\hat{y}^{H}(X))],\mathbb{E}_{(X,Y)}[\ell(Y,\hat{y}^{AI}(X))]\}-\mathbb{E}_{(X,Y)}[\ell(Y,\hat{y}^{HAI}(X))].$$
Thus, $\Psi$ follows the standard statistical\-learning passage from a pointwise population functional to its sample average\. By contrast, the criterion in \([1](https://arxiv.org/html/2606.04779#S3.E1)\) places the minimum after aggregation: it compares aggregate risks, but it is not the empirical\-risk analogue of the pointwise oracle\-gain functional\. They are related as follows:
###### Proposition 1\.
For all predictions $\hat{y}^{H},\hat{y}^{AI}$ and every dataset $D$,
$$\Psi(\hat{y}^{H},\hat{y}^{AI};D)\leq\mathcal{C}_2(\hat{y}^{H},\hat{y}^{AI};D). \tag{4}$$
###### Proof\.
Let $a_i:=\ell(y_i,\hat{y}_i^{H})$ and $b_i:=\ell(y_i,\hat{y}_i^{AI})$\. Since $\min\{a_i,b_i\}\leq a_i$ and $\min\{a_i,b_i\}\leq b_i$ for every $i$, averaging gives
$$\frac{1}{n}\sum_{i=1}^{n}\min\{a_i,b_i\}\leq\frac{1}{n}\sum_{i=1}^{n}a_i=L_H(D)\quad\text{and}\quad\frac{1}{n}\sum_{i=1}^{n}\min\{a_i,b_i\}\leq\frac{1}{n}\sum_{i=1}^{n}b_i=L_{AI}(D).$$
Therefore, $\frac{1}{n}\sum_{i=1}^{n}\min\{a_i,b_i\}\leq\min\{L_H(D),L_{AI}(D)\}$\. Subtracting $L_{HAI}(D)$ from both sides yields \([4](https://arxiv.org/html/2606.04779#S3.E4)\)\. ∎
Proposition [1](https://arxiv.org/html/2606.04779#Thmproposition1) shows that the pointwise\-min oracle benchmark is stricter than the classical aggregate benchmark in the two\-agent complementarity case\. The two benchmarks correspond to different reference risks\. The classical benchmark compares the interaction output with the best fixed standalone predictor in aggregate, whereas the pointwise\-min benchmark compares it with the best available standalone prediction at the case level before averaging\.
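The gap between the two benchmarks can be seen on a toy example\. The Python sketch below is illustrative only: the helper names, the squared loss, the four\-sample data, and the equal\-weight pooling used as team output are all assumptions chosen here, not taken from the paper\. It evaluates the aggregate criterion \(1\) and the pointwise\-min criterion \(2\) on the same predictions\.

```python
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2


def mean_loss(y, y_hat, loss):
    """Average empirical loss of one prediction vector."""
    return sum(loss(t, p) for t, p in zip(y, y_hat)) / len(y)


def c2(y, y_h, y_ai, y_hai, loss=squared_loss):
    """Aggregate criterion: min of the two average losses minus the team loss."""
    best_standalone = min(mean_loss(y, y_h, loss), mean_loss(y, y_ai, loss))
    return best_standalone - mean_loss(y, y_hai, loss)


def psi(y, y_h, y_ai, y_hai, loss=squared_loss):
    """Pointwise-min criterion: average of per-sample minima minus the team loss."""
    oracle = sum(min(loss(t, h), loss(t, a)) for t, h, a in zip(y, y_h, y_ai)) / len(y)
    return oracle - mean_loss(y, y_hai, loss)


# Toy data (assumed): the two agents err on disjoint samples.
y = [0.0, 0.0, 0.0, 0.0]
y_h = [1.0, 0.0, 1.0, 0.0]
y_ai = [0.0, 1.0, 0.0, 1.0]
y_hai = [0.5 * (h + a) for h, a in zip(y_h, y_ai)]  # equal-weight pooling

c2_value = c2(y, y_h, y_ai, y_hai)
psi_value = psi(y, y_h, y_ai, y_hai)
```

On this data the aggregate criterion reports a gain of 0\.25, whereas the pointwise\-min criterion reports \-0\.25, because on every sample one of the two agents is exactly right\. The two values are consistent with inequality \(4\)\.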
#### 3\.2\.2 The Multi\-Agent Setting
We now move to the complementarity functional for $N\geq 2$, generalizing the case $N=2$\. We start with a structural assumption:
###### Assumption 1 \(Benchmark–interaction decomposition\)\.
Complementarity functionals have the form
$$\Psi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)=\Phi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)-\Theta(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D),$$
where $\Phi$ is a benchmark term independent of protocol topology, and $\Theta$ is the loss of the protocol output\.
This assumption generalizes the structure of the complementarity functional in \([1](https://arxiv.org/html/2606.04779#S3.E1)\) of Definition 4 to the multi\-agent setting\. The problem then becomes to \(1\) choose the benchmark term $\Phi$, and \(2\) introduce the interaction term $\Theta$\. The benchmark term can be introduced by generalizing $\Psi$ in \([2](https://arxiv.org/html/2606.04779#S3.E2)\)\. To address the interaction part, however, we need to introduce the logic according to which agents are allowed to interact in an HAI\. The focus on planar binary trees reveals a natural simplification: local *binary* interactions\. On this view, HAIs evolve through elementary interaction steps, each involving two inputs at a time, with intermediate outputs then propagated through the interaction protocol\. The binary restriction gives a direct generalization of the standard $N=2$ human–AI setting: every local interaction is still a two\-input interaction, but larger protocols can be built by composing such elementary steps\. This covers natural sequential workflows, such as expert–assistant–AI configurations, deferred referral structures, and settings in which several human–AI dyads are combined into a final decision\. Further, it separates local interaction from global protocol structure\. The same local rule can be evaluated on different trees, allowing us to ask whether complementarity depends on the local aggregation mechanism or on the order in which agents enter the workflow\. Finally, although the binary restriction is a simplification, as some workflows may involve simultaneous or higher\-arity interactions, it yields a mathematically tractable class of protocols: binary trees expose algebraic questions about associativity, geometric questions about attainable prediction regions, and analytical questions about optimizing complementarity under a chosen loss\. Thus, we arrive at:
###### Assumption 2 \(Local binary composition\)\.
Let $\mathsf{T}\in\mathsf{Y}_N$ with leaves labeled by $\hat{y}^{(1)},\dots,\hat{y}^{(N)}\in\widehat{\mathcal{Y}}^{n}$\. Its output
$$\hat{y}_{\mathsf{T}}:=m_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)})\in\widehat{\mathcal{Y}}^{n}$$
is defined recursively by applying a local binary rule $m_2:\widehat{\mathcal{Y}}^{n}\times\widehat{\mathcal{Y}}^{n}\to\widehat{\mathcal{Y}}^{n}$ at each internal node of $\mathsf{T}$\.
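The recursion in Assumption 2 can be sketched in a few lines of Python\. Everything named here is hypothetical and chosen for illustration: the nested\-pair tree encoding \(`"|"` for a leaf\), the function `evaluate_tree`, and the equal\-weight rule `mean_pool` standing in for the local binary rule\. The example shows that, for a non\-associative local rule, two trees over the same ordered leaves can produce different outputs\.

```python
def evaluate_tree(tree, leaves, m2):
    """Evaluate a protocol tree bottom-up with a local binary rule m2.

    tree   -- "|" for a leaf, or a pair (left_subtree, right_subtree)
    leaves -- prediction vectors, assigned to the leaves from left to right
    """
    def go(node, start):
        # Returns (output of the subtree, index of the next unused leaf).
        if node == "|":
            return leaves[start], start + 1
        left_out, nxt = go(node[0], start)
        right_out, nxt = go(node[1], nxt)
        return m2(left_out, right_out), nxt

    output, used = go(tree, 0)
    if used != len(leaves):
        raise ValueError("tree has fewer leaves than prediction vectors")
    return output


def mean_pool(u, v):
    """Equal-weight linear pooling, used here as an example local rule."""
    return [0.5 * (a + b) for a, b in zip(u, v)]


# Three assumed prediction vectors on n = 2 samples.
y1, y2, y3 = [0.0, 4.0], [4.0, 0.0], [8.0, 8.0]

left_comb = (("|", "|"), "|")    # combine y1 and y2 first, then y3
right_comb = ("|", ("|", "|"))   # combine y2 and y3 first, then y1

out_left = evaluate_tree(left_comb, [y1, y2, y3], mean_pool)
out_right = evaluate_tree(right_comb, [y1, y2, y3], mean_pool)
```

With these inputs the left comb gives `[5.0, 5.0]` and the right comb gives `[3.0, 4.0]`, so the tree topology matters even though the leaf order and the local rule are unchanged\.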
We introduce the complementarity functional for interactions with $N\geq 2$ agents using our tree\-based formulation\. It is the main mathematical object of this work\.
###### Definition 5\(Tree\-relative complementarity functional\)\.
Let $\tau=(\mathcal{X},\mathcal{Y},\widehat{\mathcal{Y}},\ell)$ be a prediction task and $D=\{(x_{i},y_{i})\}_{i=1}^{n}$ be a labeled dataset. Let $m_{2}$ be a local binary composition rule and $\hat{y}^{(1)},\dots,\hat{y}^{(N)}$ be the $N$ predictions of the $N$ agents of a configuration $c_{\sigma,N}$.
For any protocol tree $\mathsf{T}\in\mathsf{Y}_{N}$ for $c_{\sigma,N}$, the tree-relative complementarity functional $\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}$ is defined as
$$\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D):=\Phi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)-\Theta_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D),\tag{5}$$
where the multi-agent benchmark term is the pointwise-min
$$\Phi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D):=\frac{1}{n}\sum_{i=1}^{n}\min_{1\leq j\leq N}\ell\bigl(y_{i},\hat{y}_{i}^{(j)}\bigr),\tag{6}$$
and the interaction term reads
$$\Theta_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D):=\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(y_{i},\hat{y}_{i}^{\mathsf{T}}\bigr).\tag{7}$$
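A minimal sketch of eqs. (5)–(7), assuming the protocol output has already been computed and is passed in as `y_tree`. The function names and the toy data are illustrative assumptions, not part of the paper.

```python
def benchmark_phi(y, preds, loss):
    """Pointwise-min benchmark, eq. (6): mean over cases of the best leaf loss."""
    n = len(y)
    return sum(min(loss(y[i], p[i]) for p in preds) for i in range(n)) / n


def interaction_theta(y, y_tree, loss):
    """Interaction term, eq. (7): mean loss of the protocol output."""
    return sum(loss(t, o) for t, o in zip(y, y_tree)) / len(y)


def psi(y, preds, y_tree, loss):
    """Tree-relative complementarity, eq. (5): Phi minus Theta."""
    return benchmark_phi(y, preds, loss) - interaction_theta(y, y_tree, loss)


squared = lambda t, p: (t - p) ** 2
y = [1.0, 2.0]                      # labels for two cases
preds = [[0.0, 1.0], [2.0, 3.0]]    # two agents, each off by one on every case
y_tree = [1.0, 2.0]                 # midpoint of the two leaf predictions
value = psi(y, preds, y_tree, squared)
```

In this toy case both agents err by one unit in opposite directions, so the midpoint is exact and the functional is strictly positive.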
#### 3\.2\.3On the Choice of Benchmark in the Definition of Complementarity
The comparison in Proposition [1](https://arxiv.org/html/2606.04779#Thmproposition1) extends to the $N$-agent tree-relative setting. For each leaf prediction, let $L_{j}(D):=\frac{1}{n}\sum_{i=1}^{n}\ell(y_{i},\hat{y}_{i}^{(j)})$, $j=1,\dots,N$. If one generalizes the classical aggregate complementarity criterion to $N$ agents by defining
$$\mathcal{C}_{N,\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D):=\min_{1\leq j\leq N}L_{j}(D)-\Theta_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D),\tag{8}$$
with the same interaction term $\Theta_{\mathsf{T}}$ as in ([7](https://arxiv.org/html/2606.04779#S3.E7)), then
$$\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq\mathcal{C}_{N,\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D),$$
because the average of casewise minima never exceeds the minimum of the averages.
The two complementarity functionals coincide if and only if at least one fixed agent is pointwise loss-minimal on every sample in $D$, that is,
$$\exists\,\bar{\jmath}\in\{1,\dots,N\}:\quad\ell\bigl(y_{i},\hat{y}_{i}^{(\bar{\jmath})}\bigr)=\min_{1\leq j\leq N}\ell\bigl(y_{i},\hat{y}_{i}^{(j)}\bigr)\qquad\text{for every }i=1,\dots,n.$$
This comparison clarifies the benchmark used throughout the paper. Let $\mathcal{F}_{N}=\{f_{1},\dots,f_{N}\}$ denote the finite family of available agent predictors, and let $f_{\mathsf{T}}$ denote the predictor induced by the protocol tree $\mathsf{T}$. Its population risk is
$$R(f_{\mathsf{T}}):=\mathbb{E}_{(X,Y)}\!\left[\ell\bigl(Y,f_{\mathsf{T}}(X)\bigr)\right].$$
At the population level, the classical aggregate benchmark corresponds to the best fixed member of $\mathcal{F}_{N}$ in expected risk:
$$R_{\mathcal{C}}(\mathcal{F}_{N}):=\min_{1\leq j\leq N}\mathbb{E}_{(X,Y)}\!\left[\ell\bigl(Y,f_{j}(X)\bigr)\right].$$
The pointwise-min benchmark corresponds instead to the finite-family pointwise oracle:
$$R_{\mathrm{pw}}(\mathcal{F}_{N}):=\mathbb{E}_{(X,Y)}\!\left[\min_{1\leq j\leq N}\ell\bigl(Y,f_{j}(X)\bigr)\right].$$
Thus, at the population level, *complementarity can be understood as minus an excess risk relative to the chosen benchmark*. The empirical counterparts of both complementarity functionals are obtained by the usual plug-in passage from population risk to empirical risk. Both benchmarks are legitimate, but they answer different questions. The aggregate benchmark in ([8](https://arxiv.org/html/2606.04779#S3.E8)) asks whether the interaction improves over deploying the best fixed component across all cases. The pointwise-min benchmark ([6](https://arxiv.org/html/2606.04779#S3.E6)) in ([5](https://arxiv.org/html/2606.04779#S3.E5)) asks instead whether the protocol improves over the best available prediction at the case level, before averaging.
This distinction is relevant for HAI research because the aggregate benchmark can report complementarity even when the interaction fails to improve over the best available prediction on each case. Consider two agents, $H$ and $A$, and losses in arbitrary units:
$$\begin{array}{c|ccc}&L_{H}&L_{A}&L_{\mathrm{HAI}}\\ \hline \text{case }1&0&4&1\\ \text{case }2&4&0&1\end{array}$$
Hence $\mathcal{C}_{2}=2-1=1>0$ and $\Psi=0-1=-1<0$. Thus, the aggregate benchmark reports complementarity, while the pointwise-min benchmark detects that the protocol is worse than an available prediction on every case. A second example separates complementarity from casewise selection. Suppose the protocol selects the better available prediction on each case, again with losses in arbitrary units:
$$\begin{array}{c|ccc}&L_{H}&L_{A}&L_{\mathrm{HAI}}\\ \hline \text{case }1&0&4&0\\ \text{case }2&4&0&0\end{array}$$
Then $\mathcal{C}_{2}=2-0=2>0$ and $\Psi=0-0=0$. The aggregate benchmark treats the casewise selector as complementary relative to the best fixed agent. The pointwise-min benchmark classifies it as optimal selection among available predictions, but not as improvement beyond the best available prediction. (This is a case of ‘appropriate reliance,’ as discussed in Section [4.2](https://arxiv.org/html/2606.04779#S4.SS2).)
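The two loss tables can be recomputed directly. The sketch below is illustrative only; the helper name `aggregate_and_pointwise` is an assumption, and it takes per-case losses as inputs rather than predictions.

```python
def aggregate_and_pointwise(loss_h, loss_a, loss_hai):
    """Return (C_2, Psi) from per-case losses of human, AI, and protocol."""
    n = len(loss_hai)
    mean = lambda xs: sum(xs) / n
    theta = mean(loss_hai)                                   # interaction term
    c2 = min(mean(loss_h), mean(loss_a)) - theta             # aggregate benchmark
    psi = mean([min(h, a) for h, a in zip(loss_h, loss_a)]) - theta  # pointwise-min
    return c2, psi


# First table: the protocol is worse than an available prediction on every case.
first = aggregate_and_pointwise([0, 4], [4, 0], [1, 1])
# Second table: the protocol selects the better prediction on every case.
second = aggregate_and_pointwise([0, 4], [4, 0], [0, 0])
```

The gap between the two returned values is exactly the difference between the minimum of the mean losses and the mean of the casewise minimum losses.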
For this reason, *the pointwise-min benchmark is useful in workflows where case-level errors matter*. Examples include medical diagnosis, triage, and review; judicial or administrative decision support; and safety-critical model monitoring. The same consideration applies to multi-AI and AI-agent environments, where several models, model versions, systems, or agents may produce predictions for the same case. In such settings, an aggregate benchmark may show that an ensemble or multi-agent protocol improves over the best fixed model on average, while still failing to use the best available prediction on important cases. The pointwise-min benchmark is stricter because it evaluates the protocol against the best available casewise prediction from the finite family. In lower-stakes or large-scale settings where the relevant deployment alternative is a single fixed component and the primary object of evaluation is average performance, the aggregate benchmark may be sufficient, although it can inflate complementarity by ignoring casewise heterogeneity among agents. Unless explicitly stated otherwise, *all complementarity claims in the remainder of the paper are relative to pointwise-min oracle benchmarks (see eq. [6](https://arxiv.org/html/2606.04779#S3.E6))*.
This benchmark choice matters for the interpretation of the impossibility results below (see Sections [4.2](https://arxiv.org/html/2606.04779#S4.SS2) and [8.1](https://arxiv.org/html/2606.04779#S8.SS1)). These results use the pointwise-min benchmark. By contrast, the regression optimization results are largely insensitive to the benchmark choice (see Sections [5](https://arxiv.org/html/2606.04779#S5), [6](https://arxiv.org/html/2606.04779#S6), and [7](https://arxiv.org/html/2606.04779#S7)). In squared-loss regression, replacing the pointwise-min benchmark by the aggregate benchmark changes the complementarity functional only by an additive constant independent of the protocol output. Thus the optimizer, protocol-indifference loci, and tree-reparameterization invariance results remain unchanged; only the numerical complementarity value, and hence the threshold for strict positivity, changes.
### 3\.3Optimizing Tree\-Relative Complementarity Functionals
The tree-relative complementarity functional $\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}$ supports several optimization regimes, depending on which part of the interaction design is treated as fixed and which part is treated as variable. In this paper we work with a fixed ordered configuration and identify the corresponding decorated protocol set with $\mathsf{Y}_{N}$ to keep notation light. In applications, however, additional workflow constraints $\mathbf{c}$ may exclude some tree shapes. In that case one replaces $\mathsf{Y}_{N}$ by an admissible subset $\mathsf{Y}_{N}^{\mathrm{adm}}(\mathbf{c})\subseteq\mathsf{Y}_{N}$. We distinguish two optimization cases.
##### Case 1: tree optimization for fixed local interaction\.
Fix a local rule $m_{2}$. The optimization problem is
$$\mathsf{T}^{\ast}\in\operatorname*{arg\,max}_{\mathsf{T}\in\mathsf{Y}_{N}}\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D).\tag{9}$$
This is the pure protocol-topology problem: one searches only over rooted planar binary trees.
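For small $N$, problem (9) can be solved by brute force, since $\mathsf{Y}_{N}$ is finite. The sketch below is a hedged illustration: trees are encoded as nested pairs of 1-based leaf indices, the local rule is fixed to the equal-weight midpoint, and all function names and data are assumptions made for the example.

```python
def planar_binary_trees(lo, hi):
    """All rooted planar binary trees on the ordered leaves lo..hi (inclusive)."""
    if lo == hi:
        return [lo]
    return [(left, right)
            for split in range(lo, hi)
            for left in planar_binary_trees(lo, split)
            for right in planar_binary_trees(split + 1, hi)]


def evaluate(tree, leaves, m2):
    """Apply the local rule m2 recursively along the tree."""
    if isinstance(tree, int):
        return leaves[tree - 1]
    return m2(evaluate(tree[0], leaves, m2), evaluate(tree[1], leaves, m2))


def psi(tree, leaves, y, m2, loss):
    """Tree-relative complementarity: pointwise-min benchmark minus tree loss."""
    out, n = evaluate(tree, leaves, m2), len(y)
    phi = sum(min(loss(y[i], p[i]) for p in leaves) for i in range(n)) / n
    theta = sum(loss(y[i], out[i]) for i in range(n)) / n
    return phi - theta


midpoint = lambda u, v: [(a + b) / 2 for a, b in zip(u, v)]
squared = lambda t, p: (t - p) ** 2

y = [3.0]                               # a single case, for readability
leaves = [[0.0], [4.0], [8.0]]          # three agents
trees = planar_binary_trees(1, 3)       # the two trees with three leaves
best = max(trees, key=lambda t: psi(t, leaves, y, midpoint, squared))
```

Because the benchmark term does not depend on the tree, the same search could equally minimize the interaction term alone. The number of trees grows as a Catalan number, so exhaustive search is only practical for small $N$.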
##### Case 2: joint optimization over trees and local rules\.
Let $\mathcal{M}$ be a class of admissible local rules, possibly parametric. For a fixed tree $\mathsf{T}\in\mathsf{Y}_{N}$, choose an assignment
$$\mathbf{m}_{\mathsf{T}}=(m_{2,v})_{v\in V(\mathsf{T})}\in\mathcal{M}^{N-1}$$
of local rules to the $N-1$ internal nodes of $\mathsf{T}$. This assignment induces, by recursive evaluation along $\mathsf{T}$, a tree-level composition map $m_{\mathsf{T}}$. The dependence of $m_{\mathsf{T}}$ on the node-wise assignment $\mathbf{m}_{\mathsf{T}}$, and on any node-wise parameters when $\mathcal{M}$ is parametric, is suppressed unless it is needed explicitly. One may then optimize jointly over protocol topology and local interaction:
$$\sup_{\mathsf{T}\in\mathsf{Y}_{N},\,\mathbf{m}_{\mathsf{T}}\in\mathcal{M}^{N-1}}\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D).\tag{10}$$
This regime treats both the tree and the local aggregation mechanism as design variables. If $\mathcal{M}$ is parametric, the same family of local rules may be used at every internal node, while the parameter values may vary from node to node.
## 4Local Compositions and a First Impossibility Result
In this section, we discuss some properties of local binary compositions and show a first impossibility result\.
### 4\.1Local Compositions and Associativity
Let us introduce two families $\mathcal{M}_{sel}$ and $\mathcal{M}_{\rho}$ of local binary compositions. Elements of $\mathcal{M}_{sel}$ are *selectors*, namely, set-theoretic local binary compositions that return one of their inputs coordinatewise on ordered sets. Examples are $m_{2}(u,v)=u$, $m_{2}(u,v)=v$, $m_{2}(u,v)=\min\{u,v\}$, and $m_{2}(u,v)=\max\{u,v\}$. Elements of $\mathcal{M}_{\rho}$ are *quasi-arithmetic means* $m_{2,\alpha}^{\rho}$ (Kolmogorov, [1930](https://arxiv.org/html/2606.04779#bib.bib67); Nagumo, [1930](https://arxiv.org/html/2606.04779#bib.bib74)), with $\alpha\in[0,1]$. Let $\rho:I\to\mathbb{R}$ be continuous and strictly increasing, where $I\subseteq\mathbb{R}$ is an interval. For $\alpha\in[0,1]$, define the coordinatewise local rule
$$(m_{2,\alpha}^{\rho}(u,v))_{i}:=\rho^{-1}\!\bigl(\alpha\rho(u_{i})+(1-\alpha)\rho(v_{i})\bigr),\tag{11}$$
for $u,v\in I^{n}$. If $\rho=\mathrm{id}$, one obtains the family of *linear pooling* functions $\alpha u+(1-\alpha)v$. If $I=(0,1)$ and $\rho=\operatorname{logit}$, that is, $\rho(p)=\log\left(\frac{p}{1-p}\right)$ for $p\in(0,1)$, one obtains *logit pooling* functions. Thus quasi-arithmetic means provide local compositions for both regression-style and probabilistic aggregation rules. While the four examples of selector compositions are associative, quasi-arithmetic means are non-associative except in the projection cases:
###### Lemma 1\.
The quasi-arithmetic mean $m_{2,\alpha}^{\rho}$ in ([11](https://arxiv.org/html/2606.04779#S4.E11)) is associative if and only if $\alpha\in\{0,1\}$.
###### Proof\.
Let $I$ contain at least two points. Applying $\rho$ to both sides, the identity $m^{\rho}_{2,\alpha}(m^{\rho}_{2,\alpha}(u,v),w)=m^{\rho}_{2,\alpha}(u,m^{\rho}_{2,\alpha}(v,w))$ reads $\alpha^{2}\rho(u)+\alpha(1-\alpha)\rho(v)+(1-\alpha)\rho(w)=\alpha\rho(u)+\alpha(1-\alpha)\rho(v)+(1-\alpha)^{2}\rho(w)$. Since $\rho$ is strictly monotone, this holds for all $u,v,w\in I$ if and only if $\alpha^{2}=\alpha$ and $(1-\alpha)=(1-\alpha)^{2}$, that is, $\alpha\in\{0,1\}$. ∎
While selector compositions collapse tree dependence, Lemma[1](https://arxiv.org/html/2606.04779#Thmlemma1)shows that within the quasi\-arithmetic family, every nontrivial weight choiceα∈\(0,1\)\\alpha\\in\(0,1\)yields a non\-associative local rule\. Hence, in that family, tree topology becomes a source of variation for complementarity\.
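Lemma 1 can be checked numerically on scalars, since the rule in eq. (11) acts coordinatewise. The following is a small sketch under stated assumptions: the helper names `quasi_arithmetic` and `associativity_gap` and the test points are illustrative choices.

```python
import math


def quasi_arithmetic(rho, rho_inv, alpha):
    """Scalar quasi-arithmetic mean rho^{-1}(alpha*rho(u) + (1-alpha)*rho(v))."""
    return lambda u, v: rho_inv(alpha * rho(u) + (1 - alpha) * rho(v))


def associativity_gap(m2, u, v, w):
    """|m2(m2(u, v), w) - m2(u, m2(v, w))|; zero for an associative rule."""
    return abs(m2(m2(u, v), w) - m2(u, m2(v, w)))


identity = lambda x: x
logit = lambda p: math.log(p / (1 - p))
expit = lambda z: 1 / (1 + math.exp(-z))

linear_half = quasi_arithmetic(identity, identity, 0.5)   # linear pooling
logit_half = quasi_arithmetic(logit, expit, 0.5)          # logit pooling
projection = quasi_arithmetic(identity, identity, 1.0)    # returns its left input

gap_linear = associativity_gap(linear_half, 0.0, 4.0, 8.0)
gap_logit = associativity_gap(logit_half, 0.1, 0.5, 0.9)
gap_projection = associativity_gap(projection, 0.0, 4.0, 8.0)
```

Both interior-weight rules show a strictly positive gap on these inputs, while the projection case $\alpha=1$ shows none, in line with the lemma.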
### 4\.2Reliance Never Reaches Complementarity
We call *reliance* the selector-based special case of a prediction-task HAI. In the standard two-agent decision-support setting, the inputs are the human and AI predictions $\hat{y}_{i}^{H}$ and $\hat{y}_{i}^{AI}$ for the same case $i$. Reliance occurs when the interaction output is not a transformed or combined prediction, but simply one of these two inputs: if $\hat{y}_{i}^{HAI}=\hat{y}_{i}^{H}$, the case is one of *self-reliance*; if $\hat{y}_{i}^{HAI}=\hat{y}_{i}^{AI}$, it is one of *AI reliance* (Schemmer et al., [2023](https://arxiv.org/html/2606.04779#bib.bib21); Zhang et al., [2020](https://arxiv.org/html/2606.04779#bib.bib20); Ferrario, [2025](https://arxiv.org/html/2606.04779#bib.bib52)). In the HAI literature, the normative target is often *appropriate reliance*: the human follows AI advice when the AI is right and ignores it when the AI is wrong. This casewise notion extends to datasets: a two-agent HAI exhibits reliance on $D$ if, for every sample in $D$, its output is either the human prediction or the AI prediction. One may then measure appropriate reliance on $D$ by the proportion of samples on which the selected source is correct. However, the theorem below shows that selector local rules do not yield complementarity in multi-agent HAIs.
###### Theorem 1\(Selectors cannot yield complementarity\)\.
Let $\mathsf{T}\in\mathsf{Y}_{N}$, and suppose the local rule $m_{2}\in\mathcal{M}_{sel}$, that is,
$$(m_{2}(u,v))_{i}\in\{u_{i},v_{i}\}\quad\text{for all }u,v\in\widehat{\mathcal{Y}}^{n},\ i=1,\dots,n.$$
Then, for every dataset $D$, $\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq 0$.
###### Proof\.
Fix a case $i$. Since $m_{2}$ selects one of its two coordinate inputs, recursive composition along the tree implies that the protocol output on coordinate $i$ equals one of the leaf predictions: $(m_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)}))_{i}\in\{\hat{y}_{i}^{(1)},\dots,\hat{y}_{i}^{(N)}\}$. Hence
$$\min_{1\leq j\leq N}\ell(y_{i},\hat{y}_{i}^{(j)})\leq\ell\bigl(y_{i},(m_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)}))_{i}\bigr).$$
Averaging over $i=1,\dots,n$ yields
$$\Phi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(y_{i},(m_{\mathsf{T}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)}))_{i}\bigr)\Longleftrightarrow\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq 0.$$
∎
###### Corollary 1\.
Consider the two\-agent configuration\(human,AI\)\(\\textnormal\{\{human\}\},\\textnormal\{\{AI\}\}\)\. If their interaction output exhibits self\-reliance or AI reliance on every sample ofDD, then the HAI cannot achieve complementarity, regardless of the proportion of appropriately reliant cases onDD\.
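Theorem 1 can be sanity-checked by brute force on synthetic data. The sketch below is illustrative only: it assumes $N=4$ agents, absolute loss, and uniformly random labels and predictions, and it evaluates the four selector examples from Section 4.1 on every tree. None of these choices comes from the paper's experiments.

```python
import random


def planar_binary_trees(lo, hi):
    """All rooted planar binary trees on the ordered leaves lo..hi."""
    if lo == hi:
        return [lo]
    return [(left, right)
            for split in range(lo, hi)
            for left in planar_binary_trees(lo, split)
            for right in planar_binary_trees(split + 1, hi)]


def evaluate(tree, leaves, m2):
    """Apply the local rule m2 recursively along the tree."""
    if isinstance(tree, int):
        return leaves[tree - 1]
    return m2(evaluate(tree[0], leaves, m2), evaluate(tree[1], leaves, m2))


selectors = {
    "left": lambda u, v: u,
    "right": lambda u, v: v,
    "min": lambda u, v: [min(a, b) for a, b in zip(u, v)],
    "max": lambda u, v: [max(a, b) for a, b in zip(u, v)],
}

random.seed(0)  # arbitrary synthetic data, for illustration only
n_cases, n_agents = 6, 4
y = [random.random() for _ in range(n_cases)]
leaves = [[random.random() for _ in range(n_cases)] for _ in range(n_agents)]
loss = lambda t, p: abs(t - p)

phi = sum(min(loss(y[i], p[i]) for p in leaves) for i in range(n_cases)) / n_cases
psi_values = []
for rule in selectors.values():
    for tree in planar_binary_trees(1, n_agents):
        out = evaluate(tree, leaves, rule)
        theta = sum(loss(y[i], out[i]) for i in range(n_cases)) / n_cases
        psi_values.append(phi - theta)

largest_psi = max(psi_values)  # expected to be nonpositive, up to rounding
```

A check like this does not replace the proof, but it makes the mechanism visible: every selector output coordinate is a leaf prediction, so its loss cannot fall below the casewise minimum.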
## 5Complementarity in Regression Under Squared Loss
We now turn to regression, showing that tree-relative complementarity functional optimization under squared loss admits a geometric interpretation and can be solved analytically in a few cases. The label and prediction spaces are $\mathcal{Y}=\widehat{\mathcal{Y}}=\mathbb{R}$, so prediction vectors lie in $\widehat{\mathcal{Y}}^{n}=\mathbb{R}^{n}$ and the loss is the squared loss, i.e., $\ell(y,\hat{y})=(y-\hat{y})^{2}$.
###### Proposition 2\(Squared\-loss complementarity as distance minimization\)\.
Assume regression with squared loss. Define
$$K_{n}:=\frac{1}{n}\sum_{i=1}^{n}\min_{1\leq j\leq N}(y_{i}-\hat{y}_{i}^{(j)})^{2}.\tag{12}$$
Then, for every tree $\mathsf{T}\in\mathsf{Y}_{N}$ and every induced output $\hat{y}_{\mathsf{T}}\in\widehat{\mathcal{Y}}^{n}$,
$$n\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)=nK_{n}-\|y-\hat{y}_{\mathsf{T}}\|^{2}_{2},\tag{13}$$
where $\|x\|_{2}^{2}=\sum_{i=1}^{n}x^{2}_{i}$ for $x\in\mathbb{R}^{n}$. Hence, for fixed $D$ and fixed leaf predictions, maximizing complementarity is equivalent to minimizing the Euclidean distance between the root output $\hat{y}_{\mathsf{T}}$ and the label vector $y$.
###### Proof\.
Eq. ([13](https://arxiv.org/html/2606.04779#S5.E13)) follows from the definition of the tree-relative complementarity functional in eq. ([5](https://arxiv.org/html/2606.04779#S3.E5)), with benchmark term ([6](https://arxiv.org/html/2606.04779#S3.E6)) and interaction term ([7](https://arxiv.org/html/2606.04779#S3.E7)), evaluated under the squared loss. ∎
It follows from Proposition [2](https://arxiv.org/html/2606.04779#Thmproposition2) that $\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}>0\Longleftrightarrow\|y-\hat{y}_{\mathsf{T}}\|_{2}^{2}<nK_{n}$. Thus complementarity holds exactly when the tree prediction $\hat{y}_{\mathsf{T}}$ lies strictly inside the Euclidean sphere centered at $y$ with radius $\sqrt{nK_{n}}$. Likewise, $\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}=0$ occurs exactly when $\hat{y}_{\mathsf{T}}$ lies on that sphere, as displayed in Figure [1](https://arxiv.org/html/2606.04779#S5.F1). Note that, by definition, $K_{n}=\Phi(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)$ (see eq. ([6](https://arxiv.org/html/2606.04779#S3.E6))).
Further, in regression under the squared loss function, *complementarity invariance under HAI protocol change* has a direct geometric interpretation. By Proposition [2](https://arxiv.org/html/2606.04779#Thmproposition2), two distinct trees $\mathsf{T},\mathsf{T}^{\prime}\in\mathsf{Y}_{N}$ have the same complementarity value if and only if their outputs are equidistant from $y$:
$$\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)=\Psi_{\mathsf{T}^{\prime}}^{m_{\mathsf{T}^{\prime}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\Longleftrightarrow\|y-\hat{y}_{\mathsf{T}}\|_{2}^{2}=\|y-\hat{y}_{\mathsf{T}^{\prime}}\|_{2}^{2}.\tag{14}$$
Geometrically, ([14](https://arxiv.org/html/2606.04779#S5.E14)) holds when the two protocol outputs $\hat{y}_{\mathsf{T}}$ and $\hat{y}_{\mathsf{T}^{\prime}}$ lie on the same sphere centered at $y$. Thus, equality of complementarity values does not require equality of protocol outputs: two distinct protocols of the same HAI may achieve the same level of complementarity while producing different predictions. Under linear pooling, the parameter values for which this protocol-indifference condition holds form an algebraic locus:
###### Proposition 3\(Protocol\-indifference locus under linear pooling\)\.
Assume regression under squared loss and fix leaf predictions $\hat{y}^{(1)},\dots,\hat{y}^{(N)}\in\mathbb{R}^{n}$. Let $\mathsf{T},\mathsf{T}^{\prime}\in\mathsf{Y}_{N}$. For each tree, fix the canonical recursive left-to-right ordering of its $N-1$ internal nodes and assign the local linear-pooling rule
$$m^{\mathrm{id}}_{2,\alpha_{k}}(u,v)=\alpha_{k}u+(1-\alpha_{k})v$$
to the $k$-th internal node. For $\boldsymbol{\alpha}=(\alpha_{1},\dots,\alpha_{N-1})\in[0,1]^{N-1}$, write $\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha})$ and $\hat{y}_{\mathsf{T}^{\prime}}(\boldsymbol{\alpha})$ for the corresponding root outputs.
Then the set of local weights for which the two protocol trees have the same complementarity value is
$$\mathcal{S}_{\mathsf{T},\mathsf{T}^{\prime}}:=\left\{\boldsymbol{\alpha}\in[0,1]^{N-1}:P_{\mathsf{T},\mathsf{T}^{\prime}}(\boldsymbol{\alpha})=0\right\},$$
where
$$P_{\mathsf{T},\mathsf{T}^{\prime}}(\boldsymbol{\alpha}):=\|y-\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha})\|_{2}^{2}-\|y-\hat{y}_{\mathsf{T}^{\prime}}(\boldsymbol{\alpha})\|_{2}^{2}.$$
Moreover, $P_{\mathsf{T},\mathsf{T}^{\prime}}$ is a polynomial in $\boldsymbol{\alpha}$ of degree at most $2(N-1)$. The protocol-indifference locus is nonempty: it always contains the two corners
$$(0,\dots,0)\qquad\text{and}\qquad(1,\dots,1).$$
###### Proof\.
The benchmark term $\Phi$ is independent of the protocol tree. Hence two trees have the same complementarity value if and only if their squared-loss interaction terms are equal, equivalently $P_{\mathsf{T},\mathsf{T}^{\prime}}(\boldsymbol{\alpha})=0$. For linear local composition, each coordinate of $\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha})$ and $\hat{y}_{\mathsf{T}^{\prime}}(\boldsymbol{\alpha})$ is a polynomial in $\boldsymbol{\alpha}$ of degree at most $N-1$, since each leaf contribution is weighted by a product of local weights along a root-to-leaf path. Therefore each squared norm has degree at most $2(N-1)$, and so does $P_{\mathsf{T},\mathsf{T}^{\prime}}$. Finally, if $\boldsymbol{\alpha}=(1,\dots,1)$, every internal node selects its left input, so every tree outputs the leftmost leaf prediction $\hat{y}^{(1)}$. If $\boldsymbol{\alpha}=(0,\dots,0)$, every internal node selects its right input, so every tree outputs the rightmost leaf prediction $\hat{y}^{(N)}$. Hence both corners belong to $\mathcal{S}_{\mathsf{T},\mathsf{T}^{\prime}}$. ∎
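The corner argument in the proof can be illustrated for $N=3$. This sketch rests on an explicit assumption: internal nodes consume the weights $\alpha_{1},\alpha_{2}$ in post-order (left subtree, right subtree, then the node itself), which is one possible reading of the canonical ordering. The corner values do not depend on that choice; the interior value does. Names and data are illustrative.

```python
def pooled_output(tree, leaves, alphas):
    """Linear pooling along a tree; weights are consumed in post-order
    (an assumed node ordering, chosen for illustration)."""
    weights = iter(alphas)

    def walk(node):
        if isinstance(node, int):
            return leaves[node - 1]
        u, v = walk(node[0]), walk(node[1])   # subtrees first
        a = next(weights)                     # then this node's weight
        return [a * x + (1 - a) * z for x, z in zip(u, v)]

    return walk(tree)


def indifference_poly(tree_a, tree_b, leaves, y, alphas):
    """P(alpha): difference of squared distances from y of the two outputs."""
    sq = lambda out: sum((t - o) ** 2 for t, o in zip(y, out))
    return (sq(pooled_output(tree_a, leaves, alphas))
            - sq(pooled_output(tree_b, leaves, alphas)))


y = [1.0, 2.0]
leaves = [[0.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
t_left, t_right = ((1, 2), 3), (1, (2, 3))

p_zero = indifference_poly(t_left, t_right, leaves, y, [0.0, 0.0])
p_one = indifference_poly(t_left, t_right, leaves, y, [1.0, 1.0])
p_mid = indifference_poly(t_left, t_right, leaves, y, [0.5, 0.5])
```

At both corners the two trees return the same leaf, so the polynomial vanishes, whereas at the interior point it does not for this data.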
We now study complementarity in regression under squared loss in low-$N$ cases.

Figure 1: Geometric interpretation of complementarity in regression under squared loss in $\mathbb{R}^{n}$. The outer sphere is centered at the ground-truth vector $y$ and has radius $\sqrt{nK_{n}}$, corresponding to the boundary $\|y-\hat{y}_{\mathsf{T}}\|_{2}^{2}=nK_{n}$, i.e., zero complementarity. The smaller concentric sphere represents a lower common squared distance from $y$; the points $\hat{y}_{\mathsf{T}_{1}}$ and $\hat{y}_{\mathsf{T}_{2}}$ illustrate two protocol outputs with the same positive complementarity value.

### 5\.1 The $N=2$ Case and Complementarity Maximization
Let $N=2$ and $m^{\mathrm{id}}_{2,\alpha}\in\mathcal{M}_{\mathrm{id}}$, that is, $m^{\mathrm{id}}_{2,\alpha}(u,v)=\alpha u+(1-\alpha)v$, $\alpha\in[0,1]$. We choose the agent configuration $(\textnormal{human},\textnormal{AI})$ to decorate the leaves of the unique tree $\mathsf{T}\in\mathsf{Y}_{2}$ (see Figure [2](https://arxiv.org/html/2606.04779#S5.F2)). The case $(\textnormal{AI},\textnormal{human})$ is treated analogously; results are interpreted under the new agent order.
Figure 2: The rooted planar binary tree $\mathsf{T}=(12)$ for $N=2$, with leaves decorated by human and AI roles and the corresponding prediction vectors $\hat{y}^{(1)}$ and $\hat{y}^{(2)}$; $\hat{y}_{\mathsf{T}}$ is the tree output.

###### Proposition 4 \(Maximizing complementarity for $N=2$ regression under squared loss and linear pooling\)\.
Let $\mathsf{T}$ be the unique tree in $\mathsf{Y}_{2}$. Choose the squared loss and let
$$m_{2,\alpha}^{\mathrm{id}}(u,v)=\alpha u+(1-\alpha)v,\qquad\alpha\in[0,1].$$
Then
$$n\Psi_{\mathsf{T}}^{m_{2,\alpha}^{\mathrm{id}}}(\hat{y}^{H},\hat{y}^{AI};D)=-A_{n}\alpha^{2}-2B_{n}\alpha+(nK_{n}-C_{n}),\tag{15}$$
where
$$A_{n}=\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}^{2},\quad B_{n}=\langle\hat{y}^{H}-\hat{y}^{AI},\hat{y}^{AI}-y\rangle,\quad C_{n}=\|\hat{y}^{AI}-y\|_{2}^{2}.$$
If $A_{n}>0$, the maximizing weight is
$$\alpha^{\ast}=\Pi_{[0,1]}\!\left(-\frac{B_{n}}{A_{n}}\right).$$
If $A_{n}=0$, the aggregate is constant in $\alpha$ and yields no complementarity.
Proposition [4](https://arxiv.org/html/2606.04779#Thmproposition4) admits an explicit geometric interpretation. Let $\theta$ denote the angle between $\hat{y}^{H}-\hat{y}^{AI}$ and $\hat{y}^{AI}-y$, so that $B_{n}=\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}\|\hat{y}^{AI}-y\|_{2}\cos\theta$.
Then the interior case $-\frac{B_{n}}{A_{n}}\in(0,1)$ is equivalent to the two inequalities
$$\cos\theta<0\qquad\text{and}\qquad\frac{\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}}{\|\hat{y}^{AI}-y\|_{2}}>-\cos\theta.\tag{16}$$
The first condition says that the human–AI disagreement direction $\hat{y}^{H}-\hat{y}^{AI}$ has a component pointing *against* the AI residual $\hat{y}^{AI}-y$. The second condition says that the human prediction $\hat{y}^{H}$ moves sufficiently far in that corrective direction for the projection of $y$ onto the line through $\hat{y}^{AI}$ and $\hat{y}^{H}$ to lie inside the segment between $\hat{y}^{AI}$ and $\hat{y}^{H}$.
###### Corollary 2\.
In the interior case $-\frac{B_{n}}{A_{n}}\in(0,1)$, the optimized value of the complementarity functional is
$$n\Psi_{\mathsf{T}}^{m_{2,\alpha^{\ast}}^{\mathrm{id}}}(\hat{y}^{H},\hat{y}^{AI};D)=nK_{n}-C_{n}+\frac{B_{n}^{2}}{A_{n}}=nK_{n}-\|\hat{y}^{AI}-y\|_{2}^{2}\sin^{2}\theta,\tag{17}$$
where $\alpha^{\ast}\in(0,1)$. Thus, linear pooling can remove only the component of the AI residual that lies along the human–AI disagreement direction. In particular, conditional on a fixed pointwise-oracle benchmark $K_{n}$ and a fixed AI residual norm $\|\hat{y}^{AI}-y\|_{2}$, the optimized complementarity value is maximized by minimizing $\sin^{2}\theta$, which occurs when the human–AI disagreement direction is opposite to the AI residual.
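The closed form in Proposition 4 and the optimized value in eq. (17) can be evaluated directly. The sketch below is a hedged illustration on toy vectors: the function name `n2_linear_pooling` is an assumption, and returning weight 0 in the degenerate case $A_{n}=0$ is an arbitrary convention, since the value is then constant in $\alpha$.

```python
def n2_linear_pooling(y, y_h, y_ai):
    """Return (alpha_star, n * Psi at alpha_star) for N = 2 under squared loss."""
    d = [h - a for h, a in zip(y_h, y_ai)]          # human-AI disagreement
    r = [a - t for a, t in zip(y_ai, y)]            # AI residual
    a_n = sum(x * x for x in d)
    b_n = sum(x * z for x, z in zip(d, r))
    c_n = sum(x * x for x in r)
    # n * K_n: sum over cases of the smaller of the two squared errors, eq. (12)
    n_k = sum(min((t - h) ** 2, (t - a) ** 2) for t, h, a in zip(y, y_h, y_ai))
    if a_n == 0:
        return 0.0, n_k - c_n                       # constant in alpha (convention)
    alpha = min(1.0, max(0.0, -b_n / a_n))          # projection onto [0, 1]
    value = -a_n * alpha ** 2 - 2 * b_n * alpha + (n_k - c_n)   # eq. (15)
    return alpha, value


y = [0.0, 0.0]
y_ai = [2.0, 2.0]
y_h = [-2.0, 2.0]     # disagrees with the AI only on the first case
alpha_star, n_psi = n2_linear_pooling(y, y_h, y_ai)
```

Here the human corrects the AI on the first coordinate only, so pooling removes that component of the residual and leaves the second coordinate's error untouched, which matches the $\sin^{2}\theta$ term in eq. (17).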
Figure 3: Geometry for the $N=2$ regression case with linear pooling. The segment of the line through $\hat{y}^{AI}$ and $\hat{y}^{H}$ that lies between the two vectors is the locus of feasible human–AI team predictions. The line through the origin $\mathbf{0}$ is the human–AI disagreement direction. The vector $\hat{y}^{AI}-y$ is projected onto this line and $\theta$ is the angle between $\hat{y}^{AI}-y$ and $\hat{y}^{H}-\hat{y}^{AI}$. The interior case inequalities ([16](https://arxiv.org/html/2606.04779#S5.E16)) are both satisfied. Further, if $\hat{y}^{H}-\hat{y}^{AI}$ is collinear with $\hat{y}^{AI}-y$ but oppositely oriented, and if the projection of $y$ lies inside the segment between $\hat{y}^{AI}$ and $\hat{y}^{H}$, then $\sin\theta=0$ and the optimized value satisfies $n\Psi_{\mathsf{T}}^{m_{2,\alpha^{\ast}}^{\mathrm{id}}}(\hat{y}^{H},\hat{y}^{AI};D)=nK_{n}$. Note that $K_{n}$ is a function of $\hat{y}^{H}$ and $\hat{y}^{AI}$ (see ([12](https://arxiv.org/html/2606.04779#S5.E12))).

#### 5\.1\.1 Numerical illustration of the $N=2$ regression case
We illustrate the $N=2$ regression case with linear pooling by training a random-forest regressor on the California housing training set and evaluating complementarity on the test set. All details on the experimental setting are in Appendix [E](https://arxiv.org/html/2606.04779#A5). Taking the AI residual $r=\hat{y}^{AI}-y$ as fixed, we generate synthetic human predictions of the form $\hat{y}^{H}=\hat{y}^{AI}+d$, where the displacement $d$ has controlled angle $\theta$ with $r$ and controlled norm ratio $q=\|d\|_{2}/\|r\|_{2}$. For each scenario, we plot the quadratic $n\Psi_{\mathsf{T}}^{m_{2,\alpha}^{\mathrm{id}}}$ and its constrained maximizer $\alpha^{\ast}=\Pi_{[0,1]}(-B_{n}/A_{n})$. Figure [4](https://arxiv.org/html/2606.04779#S5.F4) confirms the geometric setting: when the human displacement points in the same direction as the AI error, or is orthogonal to it, the optimizer collapses to the AI boundary and no complementarity is obtained. A corrective direction is not sufficient by itself: at $\theta=\frac{3\pi}{4}$ with small $q$, the unconstrained maximizer lies beyond the feasible segment, so the constrained optimum is at the human boundary and remains non-complementary. Positive complementarity appears only when the human displacement is both corrective and sufficiently long, as in the $\theta=\frac{3\pi}{4}$ and $\theta=\pi$ cases with interior optima.

Figure 4: $N=2$ regression complementarity under linear pooling using the California housing dataset. Each curve shows the quadratic value of $n\Psi_{\mathsf{T}}^{m_{2,\alpha}^{\mathrm{id}}}$ [$(\$100{,}000)^{2}$] as a function of the aggregation weight $\alpha$, for synthetic human predictions constructed by controlling the angle $\theta$ between $\hat{y}^{H}-\hat{y}^{AI}$ and $\hat{y}^{AI}-y$, and the relative displacement $q=\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}/\|\hat{y}^{AI}-y\|_{2}$. Vertical markers indicate the constrained optimizer. $nK_{n}$ changes as a function of $\hat{y}^{H}$.
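The behavior of the constrained optimizer can be reproduced on synthetic vectors. The following is a minimal sketch, assuming the pooled prediction is $\alpha\hat{y}^{H}+(1-\alpha)\hat{y}^{AI}$, so that the pooled residual is $r+\alpha d$ and the squared error is minimized on $[0,1]$ at the clipped value of $-\langle r,d\rangle/\|d\|_{2}^{2}=-\cos\theta/q$. The helper names, the random stand-in residual, and the printed "gain" (total squared error of the better single agent minus that of the pooled prediction) are illustrative assumptions; the sketch does not use the California housing data and does not evaluate the paper's functional $\Psi$ or the constant $K_{n}$.

```python
import numpy as np

def make_displacement(r, theta, q, rng):
    """Return d with angle theta to r and norm q * ||r|| (hypothetical helper)."""
    r_unit = r / np.linalg.norm(r)
    v = rng.standard_normal(r.shape)
    v -= v.dot(r_unit) * r_unit            # remove the component along r
    v_unit = v / np.linalg.norm(v)         # unit vector orthogonal to r
    return q * np.linalg.norm(r) * (np.cos(theta) * r_unit + np.sin(theta) * v_unit)

def constrained_alpha(r, d):
    """Minimizer over [0, 1] of ||r + alpha * d||^2 (clipped parabola vertex)."""
    return float(np.clip(-r.dot(d) / d.dot(d), 0.0, 1.0))

rng = np.random.default_rng(0)
r = rng.standard_normal(200)               # stand-in for the AI residual

scenarios = [(0.0, 1.0), (np.pi / 2, 1.0), (3 * np.pi / 4, 0.5),
             (3 * np.pi / 4, 1.5), (np.pi, 2.0)]
for theta, q in scenarios:
    d = make_displacement(r, theta, q, rng)
    a = constrained_alpha(r, d)
    pooled = np.sum((r + a * d) ** 2)
    best_single = min(np.sum(r ** 2), np.sum((r + d) ** 2))
    print(f"theta={theta:.3f} q={q:.2f} alpha*={a:.3f} gain={best_single - pooled:.4f}")
```

Under these assumptions the optimizer sits at $0$ for $\theta\in\{0,\pi/2\}$, at $1$ for $\theta=3\pi/4$ with $q=0.5$, and strictly inside $(0,1)$ for the two longer corrective displacements.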
### 5.2 The $N=3$ Case and Complementarity Invariance Under Changes of Protocol

We now study the first nontrivial case in which the same prediction-task HAI admits distinct protocols. Let $\mathsf{T}_{L}=((12)3)$ and $\mathsf{T}_{R}=(1(23))$ be the two rooted planar binary trees in $\mathsf{Y}_{3}$, and decorate their leaves with the fixed ordered configuration $(\texttt{expert},\texttt{assistant},\texttt{AI})$ as in Figure [5](https://arxiv.org/html/2606.04779#S5.F5). Thus, $\mathsf{T}_{L}$ represents the protocol in which the `expert` and `assistant` first form an intermediate human judgment, which is then combined with the `AI` prediction. By contrast, $\mathsf{T}_{R}$ represents the protocol in which the `assistant` first interacts with the `AI`, and the `expert` enters only at the final stage.
We compare the two protocols under local linear pooling. The same local parameters $(\alpha_{1},\alpha_{2})\in[0,1]^{2}$ are assigned according to the internal node ordering for each tree; see Figure [5](https://arxiv.org/html/2606.04779#S5.F5). The corresponding outputs are

$$
\hat{y}_{\mathsf{T}_{L}}(\alpha_{1},\alpha_{2})=\alpha_{2}\bigl(\alpha_{1}\hat{y}^{(1)}+(1-\alpha_{1})\hat{y}^{(2)}\bigr)+(1-\alpha_{2})\hat{y}^{(3)},\tag{18}
$$

and

$$
\hat{y}_{\mathsf{T}_{R}}(\alpha_{1},\alpha_{2})=\alpha_{1}\hat{y}^{(1)}+(1-\alpha_{1})\bigl(\alpha_{2}\hat{y}^{(2)}+(1-\alpha_{2})\hat{y}^{(3)}\bigr).\tag{19}
$$

Already at the symmetric value $(\alpha_{1},\alpha_{2})=(\tfrac{1}{2},\tfrac{1}{2})$, the two protocols generally produce different outputs:

$$
\hat{y}_{\mathsf{T}_{L}}=\tfrac{1}{4}\hat{y}^{(1)}+\tfrac{1}{4}\hat{y}^{(2)}+\tfrac{1}{2}\hat{y}^{(3)},\qquad\hat{y}_{\mathsf{T}_{R}}=\tfrac{1}{2}\hat{y}^{(1)}+\tfrac{1}{4}\hat{y}^{(2)}+\tfrac{1}{4}\hat{y}^{(3)}.
$$

In $\mathsf{T}_{L}$, the `AI` receives the largest weight because it enters at the final composition step. In $\mathsf{T}_{R}$, the `expert` receives the largest weight instead. Hence, applying the same local averaging rule with the same local coordinates does not in general make the two protocols equivalent.
The question is therefore when the two protocols produce the same complementarity value. Under squared loss, Proposition [2](https://arxiv.org/html/2606.04779#Thmproposition2) shows that this is a geometric equality condition: the two protocol outputs have the same complementarity value exactly when they are equally distant from the ground truth vector $y$. This motivates the following protocol-indifference locus.

Figure 5: The two rooted planar binary trees for $N=3$ and the ordered configuration $(\texttt{expert},\texttt{assistant},\texttt{AI})$, with corresponding prediction vectors $\hat{y}^{(1)},\hat{y}^{(2)}$, and $\hat{y}^{(3)}$. The trees represent two distinct protocols of the same prediction-task HAI: $\mathsf{T}_{L}=((12)3)$ first combines expert and assistant predictions, whereas $\mathsf{T}_{R}=(1(23))$ first combines assistant and AI predictions. The vectors $\hat{y}_{\mathsf{T}_{L}}$ and $\hat{y}_{\mathsf{T}_{R}}$ denote the corresponding protocol outputs.

###### Proposition 5 (Protocol-indifference locus for $N=3$).
Let $\mathsf{T}_{L}=((12)3)$ and $\mathsf{T}_{R}=(1(23))$ be the two $N=3$ protocol trees above, equipped with local linear pooling and local parameters $(\alpha_{1},\alpha_{2})\in[0,1]^{2}$. Define $P_{\mathsf{T}_{L},\mathsf{T}_{R}}(\alpha_{1},\alpha_{2})$ and the protocol-indifference locus $\mathcal{S}_{3}$ as in Proposition [3](https://arxiv.org/html/2606.04779#Thmproposition3). Then

1. (i) $P_{\mathsf{T}_{L},\mathsf{T}_{R}}=\sum_{0\leq i,j\leq 2}A_{ij}\alpha_{1}^{i}\alpha_{2}^{j}$ has coefficients such that $\sum_{0\leq i,j\leq 2}A_{ij}=0$.
2. (ii) If $P_{\mathsf{T}_{L},\mathsf{T}_{R}}(\alpha_{1},\alpha_{2}) < 0$, then $\mathsf{T}_{L}$ has higher complementarity than $\mathsf{T}_{R}$. If $P_{\mathsf{T}_{L},\mathsf{T}_{R}}(\alpha_{1},\alpha_{2}) > 0$, then $\mathsf{T}_{R}$ has higher complementarity than $\mathsf{T}_{L}$.
3. (iii) $(0,\alpha_{2}),(\alpha_{1},1)\in\mathcal{S}_{3}$ for all $\alpha_{1},\alpha_{2}\in[0,1]$. At $(0,0)$, both protocols collapse to `AI` reliance; at $(1,1)$, both collapse to `expert` reliance. In both cases, no complementarity is reached.
###### Proof\.
(i) For $N=3$, under the recursive left-to-right ordering, the left tree keeps the parameter order $(\alpha_{1},\alpha_{2})$, while for the right tree $\alpha_{1}$ is the root parameter and $\alpha_{2}$ is the parameter of the internal node $(23)$. Writing $\hat{y}_{\mathsf{T}_{L}}(\alpha_{1},\alpha_{2})$ as in ([18](https://arxiv.org/html/2606.04779#S5.E18)) and $\hat{y}_{\mathsf{T}_{R}}(\alpha_{1},\alpha_{2})$ as in ([19](https://arxiv.org/html/2606.04779#S5.E19)), one obtains $P_{\mathsf{T}_{L},\mathsf{T}_{R}}(\alpha_{1},\alpha_{2})=\sum_{i,j=0}^{2}A_{ij}\alpha_{1}^{i}\alpha_{2}^{j}$, where the coefficients are

$$
\begin{aligned}
A_{00}&=A_{01}=A_{02}=0,\\
A_{10}&=2\left\langle y-\hat{y}^{(3)},\,\hat{y}^{(1)}-\hat{y}^{(3)}\right\rangle,\\
A_{20}&=-\left\|\hat{y}^{(1)}-\hat{y}^{(3)}\right\|_{2}^{2},\\
A_{11}&=-2\left(\left\langle y-\hat{y}^{(3)},\,\hat{y}^{(1)}-\hat{y}^{(3)}\right\rangle+\left\langle\hat{y}^{(2)}-\hat{y}^{(3)},\,\hat{y}^{(1)}-\hat{y}^{(3)}\right\rangle\right),\\
A_{12}&=A_{21}=2\left\langle\hat{y}^{(2)}-\hat{y}^{(3)},\,\hat{y}^{(1)}-\hat{y}^{(3)}\right\rangle,\\
A_{22}&=\left\|\hat{y}^{(1)}-\hat{y}^{(3)}\right\|_{2}^{2}-2\left\langle\hat{y}^{(2)}-\hat{y}^{(3)},\,\hat{y}^{(1)}-\hat{y}^{(3)}\right\rangle.
\end{aligned}
$$

Finally, $\sum_{0\leq i,j\leq 2}A_{ij}=P_{\mathsf{T}_{L},\mathsf{T}_{R}}(1,1)=0$ because $\hat{y}_{\mathsf{T}_{L}}(1,1)=\hat{y}_{\mathsf{T}_{R}}(1,1)=\hat{y}^{(1)}$.

(ii) follows from the definition of $P_{\mathsf{T}_{L},\mathsf{T}_{R}}$ and ([14](https://arxiv.org/html/2606.04779#S5.E14)).

(iii) follows from $\hat{y}_{\mathsf{T}_{L}}(\alpha_{1},\alpha_{2})-\hat{y}_{\mathsf{T}_{R}}(\alpha_{1},\alpha_{2})=\alpha_{1}(1-\alpha_{2})(\hat{y}^{(3)}-\hat{y}^{(1)})$, which vanishes whenever $\alpha_{1}=0$ or $\alpha_{2}=1$. ∎
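The coefficient formulas can be checked numerically. The sketch below assumes that $P_{\mathsf{T}_{L},\mathsf{T}_{R}}$ is the squared distance of the $\mathsf{T}_{L}$ output from $y$ minus that of the $\mathsf{T}_{R}$ output, which matches the sign convention in (ii) but is an assumption here, since the defining Proposition 3 is not reproduced in this section. The vectors are random stand-ins and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
y, y1, y2, y3 = rng.standard_normal((4, 50))    # synthetic truth and leaf predictions

def out_left(a1, a2):    # equation (18), tree ((12)3)
    return a2 * (a1 * y1 + (1 - a1) * y2) + (1 - a2) * y3

def out_right(a1, a2):   # equation (19), tree (1(23))
    return a1 * y1 + (1 - a1) * (a2 * y2 + (1 - a2) * y3)

def P_direct(a1, a2):
    # Assumed definition: squared error of T_L output minus squared error of T_R output.
    return np.sum((y - out_left(a1, a2)) ** 2) - np.sum((y - out_right(a1, a2)) ** 2)

e, a, b = y - y3, y1 - y3, y2 - y3
A = np.zeros((3, 3))                             # A[i, j] multiplies a1**i * a2**j
A[1, 0] = 2 * e.dot(a)
A[2, 0] = -a.dot(a)
A[1, 1] = -2 * (e.dot(a) + b.dot(a))
A[1, 2] = A[2, 1] = 2 * b.dot(a)
A[2, 2] = a.dot(a) - 2 * b.dot(a)

def P_poly(a1, a2):
    return sum(A[i, j] * a1**i * a2**j for i in range(3) for j in range(3))

grid = np.linspace(0.0, 1.0, 6)
max_gap = max(abs(P_direct(s, t) - P_poly(s, t)) for s in grid for t in grid)
print("max |P_direct - P_poly| on the grid:", max_gap)
print("sum of coefficients:", A.sum())
print("P on the branches alpha1=0 and alpha2=1:", P_direct(0.0, 0.3), P_direct(0.7, 1.0))
```

Up to floating-point error, the polynomial and the direct evaluation agree on the grid, the coefficients sum to zero, and $P$ vanishes on the two boundary branches named in (iii).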
#### 5.2.1 Numerical illustration of the protocol-indifference locus for $N=3$ regression

We study when the two distinct $N=3$ trees $\mathsf{T}_{L}=((12)3)$ and $\mathsf{T}_{R}=(1(23))$ induce the same complementarity value under linear local pooling by visualizing the protocol-indifference locus $\mathcal{S}_{3}$ and studying the sign and zeros of $P(\alpha_{1},\alpha_{2})$, as characterized in Proposition [5](https://arxiv.org/html/2606.04779#Thmproposition5). Using the held-out California Housing target vector and the same AI prediction vector as in the previous numerical illustration, we set $\hat{y}^{(3)}=\hat{y}^{AI}$, fix a corrective synthetic expert as $\hat{y}^{(1)}$, and vary the assistant prediction $\hat{y}^{(2)}$ across four regimes: (i) weakly corrective; (ii) strongly corrective; (iii) weakly non-corrective; and (iv) strongly non-corrective. Here, '(non-)corrective' refers to the displacement of the assistant prediction from the fixed AI prediction, $\hat{y}^{(2)}-\hat{y}^{AI}$, relative to the AI residual $\hat{y}^{AI}-y$: corrective displacements move against the AI residual, whereas non-corrective displacements reinforce it. All technical details are contained in Appendix [F](https://arxiv.org/html/2606.04779#A6). Figure [6](https://arxiv.org/html/2606.04779#S5.F6) shows the signed difference $P(\alpha_{1},\alpha_{2})/n$ over the local-weight square. The thick black contour is $\mathcal{S}_{3}$, where both trees attain the same complementarity value. In regions where $P(\alpha_{1},\alpha_{2}) < 0$, $\mathsf{T}_{L}$ has the smaller squared loss and hence higher complementarity; in regions where $P(\alpha_{1},\alpha_{2}) > 0$, $\mathsf{T}_{R}$ has higher complementarity.

Across the four regimes, the nontrivial branches of $\mathcal{S}_{3}$ shift as the assistant displacement from the fixed AI prediction changes from corrective to non-corrective. Complementarity depends on tree topology: with the same ordered leaves and the same linear local rule, different protocol trees achieve higher complementarity in different regions of the weight space.

Figure 6: $N=3$ regression complementarity under linear pooling using the California housing dataset. Each panel shows the values of $P(\alpha_{1},\alpha_{2})/n$ [$(\$100{,}000)^{2}$] and the protocol-indifference locus $P(\alpha_{1},\alpha_{2})=0$ for the two protocol trees $\mathsf{T}_{L}=((12)3)$ and $\mathsf{T}_{R}=(1(23))$. Blue regions, where $P(\alpha_{1},\alpha_{2}) < 0$, indicate higher complementarity for $\mathsf{T}_{L}$; red regions, where $P(\alpha_{1},\alpha_{2}) > 0$, indicate higher complementarity for $\mathsf{T}_{R}$, according to the sign convention in Proposition [5](https://arxiv.org/html/2606.04779#Thmproposition5). The expert prediction $\hat{y}^{(1)}$ is fixed with $\theta_{1}=180^{\circ}$ and $q_{1}=1.75$, while the assistant prediction $\hat{y}^{(2)}$ varies across four regimes. The protocol-indifference locus is shown only inside $[0,1]^{2}$, boundaries included, by the thick black contour; it contains the branches $\alpha_{1}=0$ and $\alpha_{2}=1$, while its interior branches depend on the assistant's direction and magnitude relative to the AI residual.
### 5.3 The $N=4$ Case and the $(\texttt{user},\texttt{AI},\texttt{AI},\texttt{user})$ Agent Configuration

Finally, let us consider $N=4$ and the five rooted planar binary trees in $\mathsf{Y}_{4}$:

$$
\mathsf{T}_{1}=(((12)3)4),\quad\mathsf{T}_{2}=((1(23))4),\quad\mathsf{T}_{3}=((12)(34)),\quad\mathsf{T}_{4}=(1((23)4)),\quad\mathsf{T}_{5}=(1(2(34))).
$$

We decorate their leaves with the ordered configuration $(\texttt{user},\texttt{AI},\texttt{AI},\texttt{user})$. Here the two AI leaves may be interpreted as two copies or variants of the same AI system performing the same intended predictive function in possibly different deployment environments. We show these trees in Figure [7](https://arxiv.org/html/2606.04779#S5.F7). They can be interpreted as follows. $\mathsf{T}_{1}$ is a *left-user anchored sequential protocol*: the first user interacts with the first `AI`, then the second `AI` is added, and the second user enters only at the last stage. Tree $\mathsf{T}_{2}$ is an *AI-ensemble-first protocol with late human review*: the two AI systems are first pooled, then their output is combined with the first user, and finally with the second user. $\mathsf{T}_{3}$ represents *two parallel user–AI dyads*, whose outputs are then aggregated. $\mathsf{T}_{4}$ is again an *AI-ensemble-first protocol*, but with the order of the two users reversed. Finally, $\mathsf{T}_{5}$ is a *right-user anchored sequential protocol*. In real-world applications, certain protocols could be inaccessible due to constraints; however, this $N=4$ configuration example makes it possible to compare parallel human–AI dyads, AI-ensemble-first workflows, and serial protocols with late human intervention.
In regression under squared loss and linear pooling, the $N=4$ case extends the $N=2$ and $N=3$ analyses in a direct way: each protocol tree induces a polynomial map from local weights $(\alpha_{1},\alpha_{2},\alpha_{3})\in[0,1]^{3}$ to a global linear combination of the four leaf predictions, and complementarity is determined by the distance of that output from the ground-truth vector. We therefore do not introduce a separate numerical study for $N=4$. Instead, we focus on the invariance of complementarity in regression under linear pooling as developed in the next sections.

Figure 7: The five rooted planar binary trees for $N=4$ under the ordered configuration $(\texttt{user},\texttt{AI},\texttt{AI},\texttt{user})$, with corresponding prediction vectors $\hat{y}^{(1)},\hat{y}^{(2)},\hat{y}^{(3)},\hat{y}^{(4)}$ and tree outputs $\hat{y}_{\mathsf{T}_{1}},\dots,\hat{y}_{\mathsf{T}_{5}}$.
## 6 Trees in Regression Under Linear Pooling: Barycentric Coordinates

We now turn to the question of whether, in regression problems, distinct protocols can lead to the same HAI output for the same parametrization of local interactions modeled by linear pooling, i.e., $m^{\mathrm{id}}_{2,\alpha}(u,v)=\alpha u+(1-\alpha)v$. In this section, we show that, under linear local composition, every tree induces a convex combination of the leaf predictions. We make this global linear combination explicit through a barycentric coordinate map. To this end, we denote by $\Delta^{N-1}:=\left\{(\omega_{1},\dots,\omega_{N})\in[0,1]^{N}:\sum_{j=1}^{N}\omega_{j}=1\right\}$ the standard $(N-1)$-simplex.
###### Definition 6 (Barycentric coordinate map of a linear protocol tree).

Let $\mathsf{T}\in\mathsf{Y}_{N}$. The *barycentric coordinate map* associated with $\mathsf{T}$ is the map $\varphi_{\mathsf{T}}:[0,1]^{N-1}\to\Delta^{N-1}$, defined recursively as follows. For the one-leaf tree $|\in\mathsf{Y}_{1}$, set $\varphi_{|}(\emptyset)=(1)\in\Delta^{0}$. For $N\geq 2$, let $\mathsf{T}=\sigma\vee\tau$, $\sigma\in\mathsf{Y}_{N_{1}}$, $\tau\in\mathsf{Y}_{N_{2}}$, $N_{1}+N_{2}=N$, be the unique root decomposition of $\mathsf{T}$. With the recursive left-to-right order $o(\mathsf{T})=\{o(\sigma) < x < o(\tau)\}$ of internal nodes, for $\boldsymbol{\alpha}_{N-1}$, the parameters $(\alpha_{1},\dots,\alpha_{N_{1}-1})$ belong to $\sigma$, the parameter $\alpha_{N_{1}}$ belongs to the root, and the parameters $(\alpha_{N_{1}+1},\dots,\alpha_{N-1})$ belong to $\tau$. We define

$$
\varphi_{\mathsf{T}}(\boldsymbol{\alpha}):=\Bigl(\alpha_{N_{1}}\,\varphi_{\sigma}(\alpha_{1},\dots,\alpha_{N_{1}-1}),\,(1-\alpha_{N_{1}})\,\varphi_{\tau}(\alpha_{N_{1}+1},\dots,\alpha_{N-1})\Bigr).
$$
By construction, each component of $\varphi_{\mathsf{T}}(\boldsymbol{\alpha})$ lies in $[0,1]$. Moreover, the components sum to one. Thus $\varphi_{\mathsf{T}}$ is well-defined as a map into $\Delta^{N-1}$. If $\boldsymbol{\alpha}\in(0,1)^{N-1}$, then all recursive scaling factors are strictly positive, and therefore $\varphi_{\mathsf{T}}(\boldsymbol{\alpha})\in\operatorname{int}(\Delta^{N-1})$.

The naming of $\varphi_{\mathsf{T}}$ follows from this observation: if the leaves of $\mathsf{T}$ are labeled by prediction vectors $\hat{y}^{(1)},\dots,\hat{y}^{(N)}\in\mathbb{R}^{n}$, then the linear protocol output can be written as

$$
\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha})=\sum_{j=1}^{N}\varphi_{\mathsf{T},j}(\boldsymbol{\alpha})\hat{y}^{(j)},\quad\sum_{j=1}^{N}\varphi_{\mathsf{T},j}(\boldsymbol{\alpha})=1.\tag{20}
$$

This follows directly from the recursive definition of the linear pooling rule: the coefficients $\varphi_{\mathsf{T},j}(\boldsymbol{\alpha})$ are the weights assigned by the tree to the $j$-th leaf prediction.
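The recursion of Definition 6 translates directly into code. The following sketch assumes a hypothetical encoding of protocol trees as nested Python tuples (any non-tuple object is a leaf) and takes the parameters in the recursive left-to-right order of internal nodes; the function names are illustrative and not from the paper.

```python
def leaf_count(tree):
    """Number of leaves of a nested-tuple tree; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)

def barycentric(tree, alphas):
    """Leaf weights of `tree` for local parameters `alphas` (Definition 6)."""
    if not isinstance(tree, tuple):
        return [1.0]                          # one-leaf tree: weight (1)
    left, right = tree
    n1 = leaf_count(left)
    left_params = alphas[:n1 - 1]             # parameters belonging to the left subtree
    root = alphas[n1 - 1]                     # parameter at the root
    right_params = alphas[n1:]                # parameters belonging to the right subtree
    return ([root * w for w in barycentric(left, left_params)]
            + [(1 - root) * w for w in barycentric(right, right_params)])

T_L = ((1, 2), 3)
T_R = (1, (2, 3))
print(barycentric(T_L, [0.5, 0.5]))           # [0.25, 0.25, 0.5]
print(barycentric(T_R, [0.5, 0.5]))           # [0.5, 0.25, 0.25]
```

At the symmetric parameter value the two $N=3$ trees reproduce the weights displayed after equations (18) and (19), and the weights of any tree sum to one.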
###### Proposition 6.

For every $\mathsf{T}\in\mathsf{Y}_{N}$, the restriction

$$
\varphi_{\mathsf{T}}\big|_{(0,1)^{N-1}}:(0,1)^{N-1}\longrightarrow\operatorname{int}(\Delta^{N-1})
$$

is a bijection.
Proposition [6](https://arxiv.org/html/2606.04779#Thmproposition6) shows that, once a protocol tree $\mathsf{T}$ is fixed, its local linear-pooling parameters provide a coordinate system for the open simplex of leaf weights. Thus, *different local parameters of the same HAI protocol correspond to different barycentric weights assigned to the same ordered leaf predictions*. In the next section, we study whether suitable parameterizations of different trees can instead give rise to the *same* set of barycentric coordinates. The answer is positive, and it provides an algebraic interpretation of complementarity invariance under tree reparameterization in regression.
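One way to see the bijection concretely is to recover the local parameters from a vector of strictly positive leaf weights: at each internal node, the parameter is the share of the subtree's total weight carried by its left child. The sketch below is an illustrative construction under the same hypothetical nested-tuple encoding of trees; it is not taken from the paper's proof of Proposition 6.

```python
def n_leaves(tree):
    """Number of leaves of a nested-tuple tree; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return 1
    return n_leaves(tree[0]) + n_leaves(tree[1])

def local_parameters(tree, weights):
    """Local pooling parameters, in recursive left-to-right order, that
    reproduce the given strictly positive leaf weights on `tree`."""
    if not isinstance(tree, tuple):
        return []                              # a leaf carries no parameter
    left, right = tree
    n1 = n_leaves(left)
    w_left, w_right = weights[:n1], weights[n1:]
    # Share of this subtree's mass that flows to the left child.
    root = sum(w_left) / (sum(w_left) + sum(w_right))
    return (local_parameters(left, w_left) + [root]
            + local_parameters(right, w_right))

print(local_parameters(((1, 2), 3), [0.25, 0.25, 0.5]))   # [0.5, 0.5]
print(local_parameters((1, (2, 3)), [0.2, 0.3, 0.5]))
```

Because each ratio is invariant to rescaling, the sub-weights do not need to be renormalized before recursing. The same target weights generally require different local parameters on different trees, which is the situation studied in the next section.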
## 7 Complementarity Invariance Across Trees in Regression: Tamari Covers and Reparameterizations

We study whether a change in tree topology can be absorbed by a corresponding change in local weights so that the two distinct trees produce the same root output and, therefore, the same complementarity level for any loss function. To answer this question we introduce the so-called Tamari moves on $\mathsf{Y}_{N}$ (Tamari, [1962](https://arxiv.org/html/2606.04779#bib.bib64)). In HAI terms, a Tamari move changes the HAI protocol of a fixed ordered configuration by a local move that we describe in what follows.
###### Definition 7 (Tamari cover, (Tamari, [1962](https://arxiv.org/html/2606.04779#bib.bib64))).

Let $\mathsf{T},\mathsf{T}'\in\mathsf{Y}_{N}$. We say that $\mathsf{T}'$ covers $\mathsf{T}$ in the Tamari order, and write $\mathsf{T}\lessdot_{\mathrm{Tam}}\mathsf{T}'$, if $\mathsf{T}'$ is obtained from $\mathsf{T}$ by a single right rotation, that is, by replacing a subtree of the form $((AB)C)$ with $(A(BC))$, while preserving the left-to-right order of the leaves.

The partial order generated by these cover relations is the *Tamari lattice* on $\mathsf{Y}_{N}$. Its Hasse diagram is the directed cover graph whose arrows are the Tamari covers. After forgetting orientations, this cover graph is the $1$-skeleton of the associahedron $\mathsf{Assoc}_{N}$ (Stasheff, [1963](https://arxiv.org/html/2606.04779#bib.bib66)), whose vertices are the trees in $\mathsf{Y}_{N}$.
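For small $N$, the trees in $\mathsf{Y}_{N}$ and their Tamari covers can be enumerated directly. The sketch below again assumes a hypothetical nested-tuple encoding with integer leaves $1,\dots,N$; `trees`, `right_rotations`, and `covers` are illustrative names, not notation from the paper.

```python
def trees(lo, hi):
    """All rooted planar binary trees on the ordered leaves lo..hi (inclusive)."""
    if lo == hi:
        return [lo]
    out = []
    for k in range(lo, hi):                    # leaves lo..k go left, k+1..hi go right
        for left in trees(lo, k):
            for right in trees(k + 1, hi):
                out.append((left, right))
    return out

def right_rotations(t):
    """Yield every tree obtained from t by one move ((A B) C) -> (A (B C))."""
    if not isinstance(t, tuple):
        return
    left, right = t
    if isinstance(left, tuple):                # rotation applied at this node
        a, b = left
        yield (a, (b, right))
    for new_left in right_rotations(left):     # rotation inside the left subtree
        yield (new_left, right)
    for new_right in right_rotations(right):   # rotation inside the right subtree
        yield (left, new_right)

def covers(n):
    """Set of pairs (T, T') such that T' covers T in the Tamari order on Y_n."""
    return {(t, t2) for t in trees(1, n) for t2 in right_rotations(t)}

for n in (3, 4, 5):
    print(n, len(trees(1, n)), len(covers(n)))
```

For $N=3$ this yields the single cover $((12)3)\lessdot_{\mathrm{Tam}}(1(23))$, and for $N=4$ the five covers that form the pentagon.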
###### Definition 8 (Tamari reparameterization).

Let $\mathsf{T},\mathsf{T}'\in\mathsf{Y}_{N}$ satisfy $\mathsf{T}\lessdot_{\mathrm{Tam}}\mathsf{T}'$. A *Tamari reparameterization* from $\mathsf{T}$ to $\mathsf{T}'$ is a map

$$
\Gamma_{\mathsf{T},\mathsf{T}'}:(0,1)^{N-1}\to(0,1)^{N-1}
$$

such that, for every $\boldsymbol{\alpha}_{\mathsf{T}}\in(0,1)^{N-1}$, and every choice of leaf predictions $\hat{y}^{(1)},\dots,\hat{y}^{(N)}$,

$$
\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha}_{\mathsf{T}})=\hat{y}_{\mathsf{T}'}\bigl(\Gamma_{\mathsf{T},\mathsf{T}'}(\boldsymbol{\alpha}_{\mathsf{T}})\bigr).\tag{21}
$$

Thus, a Tamari reparameterization is a change of local coordinates along a Tamari cover that preserves the outputs of the trees of the cover. The restriction to the open cube excludes degenerate projection cases in which some internal weights are $0$ or $1$. Tamari reparameterizations are relevant in our setting as they preserve complementarity:
###### Proposition 7.

Assume regression with any loss. Let $\mathsf{T},\mathsf{T}'\in\mathsf{Y}_{N}$ satisfy $\mathsf{T}\lessdot_{\mathrm{Tam}}\mathsf{T}'$, and let $\Gamma_{\mathsf{T},\mathsf{T}'}:(0,1)^{N-1}\to(0,1)^{N-1}$ be a Tamari reparameterization. Then, for all leaf predictions $\hat{y}^{(1)},\dots,\hat{y}^{(N)}$, every dataset $D$, and $\boldsymbol{\alpha}_{\mathsf{T}}\in(0,1)^{N-1}$,

$$
\Psi_{\mathsf{T}}^{\,m_{\mathsf{T}}(\boldsymbol{\alpha}_{\mathsf{T}})}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)=\Psi_{\mathsf{T}'}^{\,m_{\mathsf{T}'}(\Gamma_{\mathsf{T},\mathsf{T}'}(\boldsymbol{\alpha}_{\mathsf{T}}))}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D).
$$

###### Proof.

Equation ([21](https://arxiv.org/html/2606.04779#S7.E21)) gives equality of the two root outputs after reparameterization. The claim follows from ([7](https://arxiv.org/html/2606.04779#S3.E7)). ∎

Proposition [7](https://arxiv.org/html/2606.04779#Thmproposition7) identifies a notion of *complementarity invariance along Hasse edges of the Tamari lattice*: distinct protocol trees related by a single Tamari cover may define the same value of the complementarity functional after an appropriate transport of local parameters, i.e., a Tamari reparameterization, for any choice of loss function of the regression problem. We construct a Tamari reparameterization explicitly under local linear pooling:
###### Theorem 2 (Existence of Tamari reparameterizations).

Assume regression with arbitrary pointwise loss and linear local composition $m_{2,\beta}^{\mathrm{id}}(u,v)=\beta u+(1-\beta)v$, $\beta\in(0,1)$. Let $\mathsf{T}\in\mathsf{Y}_{N}$ contain a displayed subtree of the form $((AB)C)$, and let $\mathsf{T}'$ be obtained from $\mathsf{T}$ by the Tamari cover $((AB)C)\mapsto(A(BC))$. Then there exists a Tamari reparameterization

$$
\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}:(0,1)^{N-1}\to(0,1)^{N-1}.
$$

Consequently, for all leaf predictions $\hat{y}^{(1)},\dots,\hat{y}^{(N)}$, every dataset $D$, and $\boldsymbol{\alpha}_{\mathsf{T}}\in(0,1)^{N-1}$,

$$
\Psi_{\mathsf{T}}^{\,m_{\mathsf{T}}(\boldsymbol{\alpha}_{\mathsf{T}})}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)=\Psi_{\mathsf{T}'}^{\,m_{\mathsf{T}'}(\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}(\boldsymbol{\alpha}_{\mathsf{T}}))}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D).
$$
###### Proof.

The construction is local. Let $\gamma\in(0,1)$ be the parameter at the root of $AB$, and let $\delta\in(0,1)$ be the parameter at the root of $((AB)C)$. All parameters outside the subtree $((AB)C)$ are left unchanged by $\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}$.

Define the two new parameters in the rotated fragment $A(BC)$ by

$$
\beta_{A(BC)}:=\gamma\delta,\qquad\beta_{BC}:=\frac{\delta(1-\gamma)}{1-\gamma\delta}.
$$

These parameters are again in $(0,1)$: $0 < \gamma\delta < 1$, and $1-\gamma\delta > 0$, while $1-\gamma\delta-\delta(1-\gamma)=1-\delta > 0$, so $0 < \beta_{BC} < 1$. Thus the construction defines a map $\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}:(0,1)^{N-1}\to(0,1)^{N-1}$. It remains to check that the displayed fragment has the same output before and after the transport $\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}$. Let $z_{A},z_{B},z_{C}$ denote the outputs of the subtrees $A,B,C$. In $\mathsf{T}$, the fragment output is

$$
\delta\bigl(\gamma z_{A}+(1-\gamma)z_{B}\bigr)+(1-\delta)z_{C},
$$

with coefficients $\gamma\delta$, $\delta(1-\gamma)$, and $1-\delta$ on $z_{A},z_{B},z_{C}$. In $\mathsf{T}'$, the transported fragment output is

$$
\beta_{A(BC)}z_{A}+(1-\beta_{A(BC)})\bigl(\beta_{BC}z_{B}+(1-\beta_{BC})z_{C}\bigr).
$$

By the definitions above, the coefficient of $z_{A}$ is $\gamma\delta$, the coefficient of $z_{B}$ is $(1-\gamma\delta)\,\delta(1-\gamma)/(1-\gamma\delta)=\delta(1-\gamma)$, and the coefficient of $z_{C}$ is $1-\delta$. Hence the two fragment outputs are identical.

Since all parameters outside the fragment are unchanged, and since the rotated fragment supplies the same output to the rest of the tree, the full root outputs agree: $\hat{y}_{\mathsf{T}}(\boldsymbol{\alpha}_{\mathsf{T}})=\hat{y}_{\mathsf{T}'}\bigl(\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}(\boldsymbol{\alpha}_{\mathsf{T}})\bigr)$. Then, $\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}$ is a Tamari reparameterization by Definition [8](https://arxiv.org/html/2606.04779#Thmdefinition8). Applying Proposition [7](https://arxiv.org/html/2606.04779#Thmproposition7) ends the proof. ∎
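The transport formulas in the proof are easy to check on a single fragment. The sketch below treats $((AB)C)$ and $(A(BC))$ as three-leaf fragments and compares the weights placed on $z_{A},z_{B},z_{C}$ before and after the transport; the function names are hypothetical.

```python
def rotate_params(gamma, delta):
    """Map (gamma at AB, delta at (AB)C) to (beta at A(BC), beta at BC)."""
    beta_root = gamma * delta
    beta_bc = delta * (1 - gamma) / (1 - gamma * delta)
    return beta_root, beta_bc

def weights_before(gamma, delta):
    """Weights on (z_A, z_B, z_C) in the fragment ((A B) C)."""
    return (delta * gamma, delta * (1 - gamma), 1 - delta)

def weights_after(beta_root, beta_bc):
    """Weights on (z_A, z_B, z_C) in the fragment (A (B C))."""
    return (beta_root, (1 - beta_root) * beta_bc, (1 - beta_root) * (1 - beta_bc))

gamma, delta = 0.5, 0.5
beta_root, beta_bc = rotate_params(gamma, delta)
print("transported parameters:", beta_root, beta_bc)
print("weights before:", weights_before(gamma, delta))
print("weights after: ", weights_after(beta_root, beta_bc))
```

For $N=3$ with $(\gamma,\delta)=(\alpha_{1},\alpha_{2})$ in $\mathsf{T}_{L}$, the symmetric value $(\tfrac{1}{2},\tfrac{1}{2})$ is transported to $(\tfrac{1}{4},\tfrac{1}{3})$ in $\mathsf{T}_{R}$, and both fragments place weights $(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{2})$ on the three leaves up to floating-point rounding.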
Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2) gives complementarity invariance an algebraic and analytic form. The Tamari reparameterization result shows that, for any choice of the loss function and under linear local composition, a Tamari cover determines an explicit transport of local parameters that preserves the entire protocol output. Thus *two protocol trees related by a Tamari move can be made analytically equivalent by a local change of coordinates on the interaction weights*.

For $N=3$, the theorem immediately applies to the unique Tamari cover $\mathsf{T}_{L}=((12)3)\lessdot_{\mathrm{Tam}}\mathsf{T}_{R}=(1(23))$. In HAI terms, the two workflows may have different procedures (`expert`–`assistant` aggregation followed by `AI` integration for $\mathsf{T}_{L}$ versus `assistant`–`AI` aggregation followed by `expert` integration for $\mathsf{T}_{R}$), but, after the Tamari reparameterization $\Gamma^{\diamondsuit}$, they induce the same root output and hence the same complementarity value. For $N=4$, Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2) applies along cover relations in the five-tree Tamari lattice. Furthermore, for $N=4$, we prove a coherence theorem: Tamari-cover reparameterizations satisfy the pentagon identity (Yanofsky, [2024](https://arxiv.org/html/2606.04779#bib.bib80)). We show this result graphically in Figure [8](https://arxiv.org/html/2606.04779#S7.F8) and we state it more precisely in what follows:
###### Theorem 3 (Tamari-cover reparameterizations satisfy the pentagon identity).
Assume regression with any loss and with linear local composition
$$m_{2,\beta}^{\mathrm{id}}(u,v)=\beta u+(1-\beta)v,\qquad\beta\in(0,1).$$
Let $\mathsf{T}_{1}=(((12)3)4)$, $\mathsf{T}_{2}=((1(23))4)$, $\mathsf{T}_{3}=((12)(34))$, $\mathsf{T}_{4}=(1((23)4))$, $\mathsf{T}_{5}=(1(2(34)))$ be the five rooted planar binary trees in $\mathsf{Y}_{4}$, ordered by the Tamari covers. Let
$$\Gamma^{\diamondsuit}_{1,2},\ \Gamma^{\diamondsuit}_{1,3},\ \Gamma^{\diamondsuit}_{2,4},\ \Gamma^{\diamondsuit}_{3,5},\ \Gamma^{\diamondsuit}_{4,5}$$
denote the Tamari-cover reparameterizations along the two directed paths from $\mathsf{T}_{1}$ to $\mathsf{T}_{5}$ as in Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2). Then, for every parameter vector $\boldsymbol{\alpha}_{\mathsf{T}_{1}}\in(0,1)^{3}$, the pentagon identity holds:
$$\bigl(\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3}\bigr)(\boldsymbol{\alpha}_{\mathsf{T}_{1}})=\bigl(\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2}\bigr)(\boldsymbol{\alpha}_{\mathsf{T}_{1}}).\tag{22}$$
Consequently, for all leaf predictions $\hat{y}^{(1)},\dots,\hat{y}^{(4)}$, every dataset $D$, and $\boldsymbol{\alpha}_{\mathsf{T}_{1}}\in(0,1)^{3}$,
$$\begin{aligned}
\Psi_{\mathsf{T}_{1}}^{\,m_{\mathsf{T}_{1}}(\boldsymbol{\alpha}_{\mathsf{T}_{1}})}(\hat{y}^{(1)},\hat{y}^{(2)},\hat{y}^{(3)},\hat{y}^{(4)};D)
&=\Psi_{\mathsf{T}_{5}}^{\,m_{\mathsf{T}_{5}}\bigl((\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3})(\boldsymbol{\alpha}_{\mathsf{T}_{1}})\bigr)}(\hat{y}^{(1)},\hat{y}^{(2)},\hat{y}^{(3)},\hat{y}^{(4)};D)\\
&=\Psi_{\mathsf{T}_{5}}^{\,m_{\mathsf{T}_{5}}\bigl((\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2})(\boldsymbol{\alpha}_{\mathsf{T}_{1}})\bigr)}(\hat{y}^{(1)},\hat{y}^{(2)},\hat{y}^{(3)},\hat{y}^{(4)};D).
\end{aligned}\tag{23}$$
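The pentagon identity (22) can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the theorem itself: trees are encoded as nested tuples `(weight, left, right)`, the helper names are hypothetical, and the single-rotation transport $(a,b)\mapsto\bigl(ab,\ (1-a)b/(1-ab)\bigr)$ is derived here by requiring that every leaf keeps its barycentric weight under linear pooling, which is how we read Theorem 2.

```python
from math import isclose

# A parameterized tree is a leaf label (int) or a triple (weight, left, right);
# the node outputs weight * left + (1 - weight) * right.

def rotate(node):
    """Tamari move ((A B) C) -> (A (B C)) with weights transported so that
    each leaf keeps the same barycentric weight (assumed form of the map)."""
    b, (a, A, B), C = node
    a_new = a * b
    b_new = (1 - a) * b / (1 - a * b)
    return (a_new, A, (b_new, B, C))

def leaf_weights(node, scale=1.0):
    """Barycentric weight of every leaf, returned as {leaf: weight}."""
    if isinstance(node, int):
        return {node: scale}
    w, left, right = node
    out = leaf_weights(left, scale * w)
    out.update(leaf_weights(right, scale * (1 - w)))
    return out

def path_via_T3(t1):
    t3 = rotate(t1)                # T1 -> T3: rotation at the root
    return rotate(t3)              # T3 -> T5: rotation at the root

def path_via_T2_T4(t1):
    b, left, c = t1
    t2 = (b, rotate(left), c)      # T1 -> T2: rotation in the left subtree
    t4 = rotate(t2)                # T2 -> T4: rotation at the root
    a, A, right = t4
    return (a, A, rotate(right))   # T4 -> T5: rotation in the right subtree

def params_right_comb(t5):
    """Read (beta1, beta2, beta3) off the right comb (1(2(34)))."""
    b1, _, (b2, _, (b3, _, _)) = t5
    return (b1, b2, b3)

a1, a2, a3 = 0.3, 0.6, 0.8
T1 = (a3, (a2, (a1, 1, 2), 3), 4)  # left comb (((12)3)4), root weight a3
via_T3 = path_via_T3(T1)
via_T2_T4 = path_via_T2_T4(T1)
closed_form = (a1 * a2 * a3,
               (1 - a1) * a2 * a3 / (1 - a1 * a2 * a3),
               (1 - a2) * a3 / (1 - a2 * a3))
print(params_right_comb(via_T3), params_right_comb(via_T2_T4), closed_form)
```

Both paths should return the same right-comb parameters, and these should agree with the closed-form map displayed in Figure 8. Because each rotation preserves the leaf weights, the root output and the complementarity value are unchanged along either path.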
Finally, we close the regression analysis by combining barycentric coordinates from Section [6](https://arxiv.org/html/2606.04779#S6) and Tamari reparameterizations described so far. Let $\mathsf{T}\lessdot_{\mathrm{Tam}}\mathsf{T}'$ be a Tamari cover, with $\mathsf{T},\mathsf{T}'\in\mathsf{Y}_{N}$, and let $\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}$ be the corresponding reparameterization from Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2). Then Proposition [6](https://arxiv.org/html/2606.04779#Thmproposition6) and Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2) state that $\varphi_{\mathsf{T}'}\circ\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}=\varphi_{\mathsf{T}}$ on $(0,1)^{N-1}$. Equivalently, the following diagram commutes:
$$\begin{array}{ccc}
(0,1)^{N-1} & \xrightarrow{\;\Gamma^{\diamondsuit}_{\mathsf{T},\mathsf{T}'}\;} & (0,1)^{N-1}\\[4pt]
{\scriptstyle\varphi_{\mathsf{T}}}\searrow & & \swarrow{\scriptstyle\varphi_{\mathsf{T}'}}\\[4pt]
 & \operatorname{int}(\Delta^{N-1}) &
\end{array}$$
Thus, the barycentric weights of a tree $\mathsf{T}\in\mathsf{Y}_{N}$ are invariant under a Tamari cover of $\mathsf{T}$ and a Tamari reparameterization of its parameters on $(0,1)^{N-1}$.
[Figure 8 diagram: a pentagon with vertices $\mathsf{T}_{1}$, $\mathsf{T}_{2}$, $\mathsf{T}_{4}$, $\mathsf{T}_{5}$, $\mathsf{T}_{3}$ and directed edges $\Gamma^{\diamondsuit}_{1,2}$, $\Gamma^{\diamondsuit}_{2,4}$, $\Gamma^{\diamondsuit}_{4,5}$, $\Gamma^{\diamondsuit}_{1,3}$, $\Gamma^{\diamondsuit}_{3,5}$, annotated with the common composite map]
$$\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3}=\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2}:\quad(\alpha_{1},\alpha_{2},\alpha_{3})\longmapsto\left(\alpha_{1}\alpha_{2}\alpha_{3},\ \frac{(1-\alpha_{1})\alpha_{2}\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}},\ \frac{(1-\alpha_{2})\alpha_{3}}{1-\alpha_{2}\alpha_{3}}\right)$$
Figure 8: The pentagon identity satisfied by the Tamari-cover reparameterizations for $N=4$. The vertices of the $1$-skeleton of the associahedron $\mathsf{Assoc}_{4}$ are the five rooted planar binary trees in $\mathsf{Y}_{4}$. Edges represent Tamari cover moves. The two highlighted directed paths from the left comb $\mathsf{T}_{1}$ to the right comb $\mathsf{T}_{5}$ induce the same reparameterization. By Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2), both paths preserve the root output and therefore the tree-relative complementarity functional.
## 8 Complementarity in Binary Classification
We now investigate complementarity in binary classification, with labels $y_{i}\in\{0,1\}=\mathcal{Y}$ and predicted probabilities $\hat{y}_{i}\in(0,1)=\widehat{\mathcal{Y}}$, where $\hat{y}_{i}$ denotes the predicted probability of class $1$.
###### Definition 9 (Endpoint-monotone loss).
A loss function $\ell:\mathcal{Y}\times\widehat{\mathcal{Y}}\to[0,\infty)$ is called *endpoint-monotone* if $\hat{y}\mapsto\ell(0,\hat{y})$ is nondecreasing and $\hat{y}\mapsto\ell(1,\hat{y})$ is nonincreasing on $(0,1)$.
Thus, under endpoint-monotonicity, when the true label is $0$, assigning more probability to class $1$ cannot decrease the loss; when the true label is $1$, assigning more probability to class $1$ cannot increase the loss. This condition is satisfied by the main probabilistic classification losses used in practice.
###### Proposition 8.
The following binary classification losses are endpoint-monotone:
1. (i) $\ell_{F}(y,\hat{y})=D_{F}(y,\hat{y})$, where $D_{F}$ is a Bregman divergence generated by a convex $F$ on $[0,1]$, differentiable on $(0,1)$, and with finite endpoint values (Bregman, [1967](https://arxiv.org/html/2606.04779#bib.bib72));
2. (ii) $\ell_{f}(y,\hat{y})=D_{f}(P_{y}\,\|\,Q_{\hat{y}})$, where $D_{f}$ is an $f$-divergence between the Bernoulli distributions of ground truth and predicted probabilities, with $f:[0,\infty)\to\mathbb{R}$ convex, finite on $[0,\infty)$, differentiable on $(0,\infty)$, and normalized by $f(1)=0$ (Ali and Silvey, [1966](https://arxiv.org/html/2606.04779#bib.bib79)).
###### Corollary 3.
The following binary classification losses are endpoint-monotone: the Brier loss, binary cross-entropy/Kullback-Leibler loss, squared Hellinger loss, Pearson $\chi^{2}$ loss, Jensen–Shannon loss, and Tsallis $f$-divergence losses (Tsallis, [1988](https://arxiv.org/html/2606.04779#bib.bib71); Furuichi et al., [2004](https://arxiv.org/html/2606.04779#bib.bib70)).
###### Proof.
The Brier loss and binary cross-entropy are Bregman losses, generated respectively by $F(\hat{y})=\hat{y}^{2}$ and by the negative entropy $F(\hat{y})=\hat{y}\log\hat{y}+(1-\hat{y})\log(1-\hat{y})$. The remaining examples are $f$-divergence losses, with $f:[0,\infty)\to\mathbb{R}$. Squared Hellinger is generated, up to normalization, by $f(t)=\frac{1}{2}(\sqrt{t}-1)^{2}$; Pearson $\chi^{2}$ by $f(t)=(t-1)^{2}$; Jensen–Shannon by
$$f(t)=\frac{1}{2}\left[t\log\frac{2t}{1+t}+\log\frac{2}{1+t}\right];$$
and Tsallis $f$-divergences, for $q>0$, $q\neq 1$, by the normalized generator $f_{q}(t)=\frac{t^{q}-t}{q-1}$. These generators are convex, differentiable on $(0,\infty)$, normalized by $f(1)=0$ and finite at $0$. Proposition [8](https://arxiv.org/html/2606.04779#Thmproposition8) therefore applies. ∎
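As a sanity check on the Bregman part of the proof, the following sketch rebuilds the Brier loss and binary cross-entropy from their generators and tests endpoint-monotonicity on a grid. It is a numerical illustration, not a proof; the function names and the grid are assumptions made here, and the convention $0\log 0=0$ is used at the endpoints.

```python
import math

def bregman(F, dF, y, p):
    """Bregman divergence D_F(y, p) = F(y) - F(p) - F'(p) * (y - p)."""
    return F(y) - F(p) - dF(p) * (y - p)

def neg_entropy(t):
    # Negative Bernoulli entropy, with the convention 0 * log 0 = 0.
    return sum(x * math.log(x) for x in (t, 1 - t) if x > 0)

def d_neg_entropy(t):
    return math.log(t / (1 - t))

def brier(y, p):
    return bregman(lambda t: t * t, lambda t: 2 * t, y, p)

def cross_entropy(y, p):
    return bregman(neg_entropy, d_neg_entropy, y, p)

def is_endpoint_monotone(loss, grid, tol=1e-12):
    """Check Definition 9 on an increasing grid of probabilities."""
    at_zero = [loss(0, p) for p in grid]
    at_one = [loss(1, p) for p in grid]
    nondecreasing = all(a <= b + tol for a, b in zip(at_zero, at_zero[1:]))
    nonincreasing = all(a + tol >= b for a, b in zip(at_one, at_one[1:]))
    return nondecreasing and nonincreasing

grid = [k / 100 for k in range(1, 100)]
print(is_endpoint_monotone(brier, grid), is_endpoint_monotone(cross_entropy, grid))
```

On binary labels the two Bregman divergences reduce to $(y-\hat{y})^{2}$ and to $-\log\hat{y}$ or $-\log(1-\hat{y})$, which is why the grid check is expected to pass for both.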
### 8.1 An Impossibility Theorem for Complementarity in Binary Classification
We now obtain a structural impossibility result for binary classification. The result derives from combining endpoint-monotonicity of loss functions and a local condition on interaction rules in the trees.
###### Definition 10 (Internality property of local rules).
A local rule $m_{2}:(0,1)^{n}\times(0,1)^{n}\to(0,1)^{n}$ satisfies the internality property if, for every $u,v\in(0,1)^{n}$ and every coordinate $i$, $\min\{u_{i},v_{i}\}\leq(m_{2}(u,v))_{i}\leq\max\{u_{i},v_{i}\}$.
Internality captures interaction rules that interpolate between available predicted probabilities. This is a standard property of quasi-arithmetic means (Kolmogorov, [1930](https://arxiv.org/html/2606.04779#bib.bib67); Nagumo, [1930](https://arxiv.org/html/2606.04779#bib.bib74)):
###### Lemma 2.
Let $m_{2}^{\rho}$ be a quasi-arithmetic mean. Then it satisfies the internality property.
We begin with an observation: the internality property is preserved through tree composition.
###### Lemma 3.
Let $m_{2}$ satisfy the internality property, let $\mathsf{T}\in\mathsf{Y}_{N}$, and fix a case $i$. If $\hat{y}_{i}^{(1)},\dots,\hat{y}_{i}^{(N)}\in(0,1)$ are the leaf probabilities on that case and $\hat{y}_{i}^{\mathsf{T}}$ is the output produced by $\mathsf{T}$, then
$$\min_{1\leq j\leq N}\hat{y}_{i}^{(j)}\leq\hat{y}_{i}^{\mathsf{T}}\leq\max_{1\leq j\leq N}\hat{y}_{i}^{(j)}.\tag{24}$$
###### Proof.
Proceed by induction on the number of leaves of $\mathsf{T}$. The case of one leaf is immediate. For the inductive step, suppose the root of $\mathsf{T}$ combines two subtrees $\mathsf{T}_{L}$ and $\mathsf{T}_{R}$, with outputs $u_{i}$ and $v_{i}$ on case $i$. By the inductive hypothesis, each of $u_{i}$ and $v_{i}$ lies in the interval spanned by the leaf probabilities of its own subtree, and hence in the interval spanned by all leaf probabilities $\hat{y}_{i}^{(1)},\dots,\hat{y}_{i}^{(N)}$. Since $m_{2}$ is internal, $\hat{y}_{i}^{\mathsf{T}}=(m_{2}(u,v))_{i}$ lies between $u_{i}$ and $v_{i}$, hence also in the interval spanned by the leaves. ∎
Then we arrive at:
###### Theorem 4 (Impossibility of complementarity in binary classification).
Let $N\geq 2$, let $\mathsf{T}\in\mathsf{Y}_{N}$, and let $D=\{(x_{i},y_{i})\}_{i=1}^{n}$ be a labeled dataset with $y_{i}\in\{0,1\}$. Let $\hat{y}^{(1)},\dots,\hat{y}^{(N)}\in(0,1)^{n}$ be any collection of prediction vectors on $D$. If $m_{2}$ satisfies the internality property, then, for every endpoint-monotone binary loss $\ell$ in the sense of Definition [9](https://arxiv.org/html/2606.04779#Thmdefinition9),
$$\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq 0.$$
###### Proof.
By Lemma [3](https://arxiv.org/html/2606.04779#Thmlemma3), for every case $i$, $\min_{1\leq j\leq N}\hat{y}_{i}^{(j)}\leq\hat{y}_{i}^{\mathsf{T}}\leq\max_{1\leq j\leq N}\hat{y}_{i}^{(j)}$. If $y_{i}=0$, endpoint monotonicity gives
$$\ell(0,\hat{y}_{i}^{\mathsf{T}})\geq\ell\!\left(0,\min_{1\leq j\leq N}\hat{y}_{i}^{(j)}\right)=\min_{1\leq j\leq N}\ell(0,\hat{y}_{i}^{(j)}),$$
where the equality holds because $\ell(0,\cdot)$ is nondecreasing, so its smallest value over the leaves is attained at the smallest leaf probability. If $y_{i}=1$, endpoint monotonicity gives
$$\ell(1,\hat{y}_{i}^{\mathsf{T}})\geq\ell\!\left(1,\max_{1\leq j\leq N}\hat{y}_{i}^{(j)}\right)=\min_{1\leq j\leq N}\ell(1,\hat{y}_{i}^{(j)}),$$
by the symmetric argument for the nonincreasing map $\ell(1,\cdot)$. Therefore, for every $i=1,\dots,n$, we have that $\min_{1\leq j\leq N}\ell(y_{i},\hat{y}_{i}^{(j)})\leq\ell(y_{i},\hat{y}_{i}^{\mathsf{T}})$. Averaging over $i$ gives
$$\frac{1}{n}\sum_{i=1}^{n}\min_{1\leq j\leq N}\ell(y_{i},\hat{y}_{i}^{(j)})\leq\frac{1}{n}\sum_{i=1}^{n}\ell(y_{i},\hat{y}_{i}^{\mathsf{T}}).$$
Thus
$$\Psi_{\mathsf{T}}^{m_{\mathsf{T}}}(\hat{y}^{(1)},\dots,\hat{y}^{(N)};D)\leq 0.$$
∎
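A small numerical check of Lemma 3 and Theorem 4 is sketched below. It makes several assumptions that are not fixed by the theorem: the internal rule is logit pooling with weight $0.5$ (one quasi-arithmetic mean), the loss is binary cross-entropy, the data are uniform random draws, and $\Psi$ is taken to be the mean pointwise-min oracle loss minus the mean protocol loss, consistent with the final inequality of the proof. All function names are hypothetical.

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def logit_pool(u, v, alpha=0.5):
    """Quasi-arithmetic mean in logit space; an internal local rule."""
    return sigmoid(alpha * logit(u) + (1 - alpha) * logit(v))

def evaluate(tree, leaves, rule):
    """Evaluate a nested-tuple tree such as ((0, 1), 2) on a single case."""
    if isinstance(tree, int):
        return leaves[tree]
    left, right = tree
    return rule(evaluate(left, leaves, rule), evaluate(right, leaves, rule))

def cross_entropy(y, p):
    return -math.log(p) if y == 1 else -math.log(1 - p)

def psi(tree, predictions, labels, rule, loss):
    """Mean pointwise-min oracle loss minus mean protocol loss."""
    oracle = sum(min(loss(y, p) for p in case)
                 for case, y in zip(predictions, labels))
    protocol = sum(loss(y, evaluate(tree, case, rule))
                   for case, y in zip(predictions, labels))
    return (oracle - protocol) / len(labels)

random.seed(0)
n, N = 200, 3
labels = [random.randint(0, 1) for _ in range(n)]
# predictions[i][j] is the probability that agent j assigns to class 1 on case i.
predictions = [[random.uniform(0.01, 0.99) for _ in range(N)] for _ in range(n)]
tree = ((0, 1), 2)
value = psi(tree, predictions, labels, logit_pool, cross_entropy)
print(value)
```

Swapping `tree` for `(0, (1, 2))` or replacing `logit_pool` with any other internal rule should leave the sign of `value` non-positive, as the theorem predicts.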
###### Corollary 4.
In binary classification, complementarity cannot be achieved by any internal local rule when the loss is one of the Bregman or finite Bernoulli $f$-divergence losses covered by Proposition [8](https://arxiv.org/html/2606.04779#Thmproposition8).
In summary, Theorem [4](https://arxiv.org/html/2606.04779#Thmtheorem4) identifies two sources of obstruction to complementarity in binary classification: endpoint-monotone losses combined with internal local rules cannot outperform the pointwise best available leaf prediction. Hence, achieving complementarity in binary classification requires relaxing at least one of these two conditions. Relaxing endpoint monotonicity seems conceptually unattractive, since it encodes a natural requirement of loss functions in binary classification. The alternative is therefore to relax internality. One way to do this is to amplify a quasi-arithmetic logit score before mapping it back to probability space. For instance, the amplified logit rule is
$$\bigl(m_{2,\alpha,\lambda}^{\mathrm{logit}}(u,v)\bigr)_{i}:=\sigma\!\Bigl(\lambda\bigl[\alpha\operatorname{logit}(u_{i})+(1-\alpha)\operatorname{logit}(v_{i})\bigr]\Bigr),\tag{25}$$
where $\sigma(t)=(1+e^{-t})^{-1}$, $t\in\mathbb{R}$, and $\lambda\geq 1$. In HAI terms, amplification moves the combined prediction of two agents away from zero in logit space, or equivalently away from probability $\sigma(0)=\frac{1}{2}$, in the direction selected by the agents’ pooled log-odds predictions. Note that, for $\lambda=1$, the rule $m_{2,\alpha,\lambda}^{\mathrm{logit}}$ is normalized logarithmic pooling of Bernoulli forecasts (Neyman and Roughgarden, [2023b](https://arxiv.org/html/2606.04779#bib.bib68), [a](https://arxiv.org/html/2606.04779#bib.bib69)). For $N=2$, the effect of amplifying the logit rule on complementarity can be derived explicitly. Write
$$z_{i}(\alpha):=\alpha\operatorname{logit}(\hat{y}_{i}^{(1)})+(1-\alpha)\operatorname{logit}(\hat{y}_{i}^{(2)}),\qquad m_{i}^{\alpha,\lambda}:=\sigma(\lambda z_{i}(\alpha)).$$
Under binary cross-entropy, with $I_{1}=\{i:y_{i}=1\}$ and $I_{0}=\{i:y_{i}=0\}$, the amplified logit rule gives the complementarity functional
$$n\Psi^{\alpha,\lambda}=\sum_{i\in I_{1}}\left[-\log\max\{\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)}\}-\log(1+e^{-\lambda z_{i}(\alpha)})\right]+\sum_{i\in I_{0}}\left[-\log\max\{1-\hat{y}_{i}^{(1)},1-\hat{y}_{i}^{(2)}\}-\log(1+e^{\lambda z_{i}(\alpha)})\right].$$
Thus, for $i\in I_{1}$, the $i$-th summand is positive if and only if $\bigl(m_{2,\alpha,\lambda}^{\mathrm{logit}}(\hat{y}^{(1)},\hat{y}^{(2)})\bigr)_{i}>\max\{\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)}\}$. For $i\in I_{0}$, it is positive if and only if $\bigl(m_{2,\alpha,\lambda}^{\mathrm{logit}}(\hat{y}^{(1)},\hat{y}^{(2)})\bigr)_{i}<\min\{\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)}\}$. In these local complementarity cases the non-internal output $m_{2,\alpha,\lambda}^{\mathrm{logit}}(\hat{y}^{(1)},\hat{y}^{(2)})$ must move outside the interval spanned by the two input probabilities, and it must do so in the direction of the true class. We analyze this point empirically in the section that follows.
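The closed form for $n\Psi^{\alpha,\lambda}$ can be compared against a direct computation of oracle loss minus protocol loss. The sketch below does this on a four-case toy dataset chosen here for illustration (both agents lean toward the true class on every case); the data and function names are assumptions, not taken from the paper.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def amplified_logit(u, v, alpha, lam):
    """Equation (25) applied to one coordinate."""
    return sigmoid(lam * (alpha * logit(u) + (1 - alpha) * logit(v)))

def n_psi_direct(p1, p2, y, alpha, lam):
    """Sum over cases of (pointwise-min cross-entropy) - (protocol cross-entropy)."""
    total = 0.0
    for a, b, label in zip(p1, p2, y):
        m = amplified_logit(a, b, alpha, lam)
        if label == 1:
            total += min(-math.log(a), -math.log(b)) + math.log(m)
        else:
            total += min(-math.log(1 - a), -math.log(1 - b)) + math.log(1 - m)
    return total

def n_psi_closed_form(p1, p2, y, alpha, lam):
    """The displayed expression for n * Psi^{alpha, lambda}."""
    total = 0.0
    for a, b, label in zip(p1, p2, y):
        z = alpha * logit(a) + (1 - alpha) * logit(b)
        if label == 1:
            total += -math.log(max(a, b)) - math.log(1 + math.exp(-lam * z))
        else:
            total += -math.log(max(1 - a, 1 - b)) - math.log(1 + math.exp(lam * z))
    return total

# Toy data (assumed): both forecasts are on the correct side of 1/2.
p1 = [0.70, 0.80, 0.30, 0.20]
p2 = [0.60, 0.90, 0.40, 0.10]
y = [1, 1, 0, 0]
for lam in (1.0, 3.0):
    print(lam, n_psi_direct(p1, p2, y, 0.5, lam), n_psi_closed_form(p1, p2, y, 0.5, lam))
```

On this dataset the value at $\lambda=1$ is non-positive, in line with Theorem 4, while $\lambda=3$ pushes every pooled forecast past both inputs toward the true class and the value becomes positive. With a confidently wrong pair of forecasts the same amplification would increase the loss instead.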
#### 8.1.1 Numerical illustration of complementarity for binary classification beyond internality
We study complementarity in the $N=2$ binary-classification case under cross-entropy using synthetic pairs of probabilistic predictors. We fix $\alpha=0.5$ and evaluate the amplified logit rule from Equation ([25](https://arxiv.org/html/2606.04779#S8.E25)) for several amplification levels $\lambda\geq 1$. For each simulated pair, we compute the global complementarity value $n\Psi$ and the class-wise rates $k_{0}$ and $k_{1}$ of canonical local complementarity: on $I_{0}$, the pooled prediction lies below both input probabilities, while on $I_{1}$, it lies above both input probabilities. At $\lambda=1$, ordinary internal logit pooling cannot satisfy these strict outside-interval conditions, in line with Theorem [4](https://arxiv.org/html/2606.04779#Thmtheorem4). For $\lambda>1$, amplification can generate such local complementarity cases by moving the pooled logit away from $0$, toward class $0$ or class $1$ depending on its sign. Figure [9](https://arxiv.org/html/2606.04779#S8.F9) shows that larger values of $k_{0}$ and $k_{1}$ are associated with positive global complementarity, although the relation is not deterministic: because cross-entropy is unbounded, a small number of amplified wrong-sign cases can outweigh many locally positive gains.
Figure 9: $N=2$ binary classification under cross-entropy with amplified logit pooling and fixed $\alpha=0.5$. Each point is one simulated pair of probabilistic predictors, plotted by the class-wise rates $(k_{0},k_{1})$ of canonical local complementarity: for $y_{i}=0$, the pooled prediction lies below both input probabilities; for $y_{i}=1$, it lies above both input probabilities. Color indicates global complementarity: blue points satisfy $n\Psi_{\mathsf{T}}^{m_{2,\alpha,\lambda}^{\mathrm{logit}}}>0$, while red points do not. The plot illustrates that larger local-complementarity rates are associated with positive global complementarity, although the relation is not deterministic because cross-entropy losses can be dominated by a small number of amplified wrong-sign cases.
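The quantities $n\Psi$, $k_{0}$ and $k_{1}$ described above can be computed as in the following sketch. The data-generating process is an assumption made purely for illustration (the paper's simulation design is not reproduced here): each predictor's logit is centred on $+s$ for $y=1$ and $-s$ for $y=0$ with Gaussian noise. Comparisons are done in logit space, which is equivalent to comparing probabilities because $\sigma$ is strictly increasing.

```python
import math
import random

def softplus(t):
    return math.log1p(math.exp(t))

def ce_from_logit(y, z):
    """Binary cross-entropy of the forecast sigmoid(z)."""
    return softplus(-z) if y == 1 else softplus(z)

def simulate_pair(n, signal, noise, rng):
    """Hypothetical generator for one pair of predictors, returned as logits."""
    y = [rng.randint(0, 1) for _ in range(n)]
    z1 = [signal * (2 * t - 1) + rng.gauss(0, noise) for t in y]
    z2 = [signal * (2 * t - 1) + rng.gauss(0, noise) for t in y]
    return y, z1, z2

def summarize(y, z1, z2, alpha, lam):
    """Return (n * Psi, k0, k1) for amplified logit pooling."""
    n_psi, hits, counts = 0.0, [0, 0], [0, 0]
    for t, a, b in zip(y, z1, z2):
        pooled = lam * (alpha * a + (1 - alpha) * b)
        n_psi += min(ce_from_logit(t, a), ce_from_logit(t, b)) - ce_from_logit(t, pooled)
        counts[t] += 1
        # Canonical local complementarity: strictly outside the input
        # interval, on the side of the true class.
        outside = pooled > max(a, b) if t == 1 else pooled < min(a, b)
        hits[t] += int(outside)
    return n_psi, hits[0] / counts[0], hits[1] / counts[1]

rng = random.Random(0)
y, z1, z2 = simulate_pair(500, signal=1.0, noise=1.0, rng=rng)
results = {lam: summarize(y, z1, z2, alpha=0.5, lam=lam) for lam in (1.0, 1.5, 2.0, 3.0)}
for lam, (n_psi, k0, k1) in results.items():
    print(lam, round(n_psi, 3), round(k0, 3), round(k1, 3))
```

At $\lambda=1$ both rates are zero by internality. For larger $\lambda$ the rates grow, but the sign of $n\Psi$ depends on how many amplified wrong-sign cases the simulated pair contains, so no particular sign should be expected from this generator.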
## 9 Discussion and Conclusions
We provided a tree-based mathematical language for formalizing complementarity as a property of HAI protocols. We summarize the implications of our formalism for HAI in four messages.
First, *complementarity is not reliance*. If an HAI protocol only selects among existing agent predictions, as in self-reliance or AI-reliance, then it cannot achieve complementarity relative to the pointwise-min oracle benchmark. This holds regardless of task, loss, or prediction quality. Appropriate reliance may still be valuable in certain real-world HAI settings (Schemmer et al., [2023](https://arxiv.org/html/2606.04779#bib.bib21)), but it is not sufficient for complementarity in the strict sense studied here. In summary, complementarity requires aggregation of input predictions beyond choosing one among them.
Second, *in two-agent regression under squared loss, human–AI complementarity is about AI residual correction.* In this setting, maximizing complementarity is equivalent to moving the human–AI interaction protocol output closer to the ground-truth vector. In the linear aggregation case, the optimal weight is determined by the projection of the ground truth onto the line segment between the two predictions. Thus, complementarity depends on whether the human–AI disagreement direction corrects the AI residual, and whether the correction is large enough to place the projection of ground truth inside the feasible pooling segment. This gives a geometric interpretation of why some human–AI disagreements in real-world HAIs may be useful and others are not.
Third, *in regression, the HAI protocol matters when formalizing complementarity, but complementarity invariance across protocol trees is possible.* For $N\geq 3$, non-associative local aggregation makes the protocol tree a design variable: the same ordered agents and the same local rule can lead to different outputs and different complementarity values. The $N=3$ protocol-indifference analysis shows that the equality locus contains boundary components induced by degenerate local weights and nontrivial interior branches determined by the geometry of the prediction vectors. At the same time, the dependence of HAI protocols on tree topology does not mean that every protocol tree change is substantively different. The barycentric and Tamari results show that some protocol changes can be compensated by an appropriate transport of local parameters. Under linear pooling, trees provide different coordinate charts for the same simplex of leaf weights, and Tamari-cover reparameterizations preserve complementarity by preserving the induced barycentric output. For $N=4$, the two directed Tamari paths from the left comb to the right comb induce the same reparameterization. That is, the reparameterizations satisfy the pentagon identity.
Lastly, *in binary classification under standard loss functions, interpolation of agent predictions is insufficient to reach complementarity.* This obstruction holds for endpoint-monotone losses, including cross-entropy and standard Bregman or many finite Bernoulli $f$-divergence losses, whenever the local aggregation rule satisfies the internality property. In particular, quasi-arithmetic means fall under the impossibility theorem. Although amplified logit pooling can help escape the impossibility theorem, as shown in the numerical simulation in Section [8.1.1](https://arxiv.org/html/2606.04779#S8.SS1.SSS1), the classification impossibility theorem is a negative result for HAI research. Thus, if one accepts the pointwise-min oracle benchmark as the relevant standard for complementarity in certain high-stakes HAI domains, then many common human–AI classification protocols cannot achieve complementarity by interpolating between human and AI probabilities (Vaccaro et al., [2024](https://arxiv.org/html/2606.04779#bib.bib6)). This matters for the HAI research domain and calls for a revision of the empirical approach to complementarity in classification tasks.
Future work can extend the proposed framework in different directions. These include adding explicit costs for interaction depth, monitoring burden, and protocol complexity (see *efficient complementarity* in Ferrario et al., [2026](https://arxiv.org/html/2606.04779#bib.bib65)) in the optimization problems ([9](https://arxiv.org/html/2606.04779#S3.E9)) and ([10](https://arxiv.org/html/2606.04779#S3.E10)); integrating feasibility constraints arising from fairness, robustness, oversight, documentation, and domain-specific workflow rules within trees; studying the stability of complementarity under dataset shift and temporal change; and testing the framework on real human–AI and multi-agent prediction settings.
## Acknowledgments
This work was partly conducted within the framework of the EUonAIR Centre of Excellence in Responsible AI and Education. It was partially supported by a grant from Movetia, funded by the Swiss Confederation. We also thank Alessandro Facchini and Matteo Casserini for many useful conversations on complementarity and human–AI interactions.
## Appendix
## Appendix A Proof of Proposition [4](https://arxiv.org/html/2606.04779#Thmproposition4)
###### Proof.
Write
$$\sum_{i=1}^{n}\bigl(\alpha\hat{y}_{i}^{H}+(1-\alpha)\hat{y}_{i}^{AI}-y_{i}\bigr)^{2}=A_{n}\alpha^{2}+2B_{n}\alpha+C_{n},$$
where
$$A_{n}=\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}^{2},\quad B_{n}=\langle\hat{y}^{H}-\hat{y}^{AI},\hat{y}^{AI}-y\rangle,\quad C_{n}=\|\hat{y}^{AI}-y\|_{2}^{2}.$$
Thus
$$n\Psi_{\mathsf{T}}^{m_{2,\alpha}^{\mathrm{id}}}=-A_{n}\alpha^{2}-2B_{n}\alpha+(nK_{n}-C_{n}),$$
which is Equation ([15](https://arxiv.org/html/2606.04779#S5.E15)), where
$$nK_{n}=\sum_{i=1}^{n}\min\!\left\{(y_{i}-\hat{y}_{i}^{H})^{2},\,(y_{i}-\hat{y}_{i}^{AI})^{2}\right\}.$$
If $A_{n}=0$, then $\hat{y}^{H}=\hat{y}^{AI}$, hence $B_{n}=0$, $nK_{n}=C_{n}$, and $n\Psi_{\mathsf{T}}^{m_{2,\alpha}^{\mathrm{id}}}=0$ for all $\alpha\in[0,1]$. Thus every $\alpha\in[0,1]$ is optimal, but no complementarity is possible.
Assume now that $A_{n}>0$. The function
$$Q(\alpha):=-A_{n}\alpha^{2}-2B_{n}\alpha+(nK_{n}-C_{n})$$
is a strictly concave quadratic on $[0,1]$. Its derivative is $Q'(\alpha)=-2A_{n}\alpha-2B_{n}$, so the unconstrained maximizer is $\alpha_{0}=-\frac{B_{n}}{A_{n}}$. Restricting to $[0,1]$ gives the unique constrained maximizer $\alpha^{\ast}=\Pi_{[0,1]}\!\left(-\frac{B_{n}}{A_{n}}\right)$. Equivalently,
$$\alpha^{\ast}=\begin{cases}0,&B_{n}\geq 0,\\[4pt] -\dfrac{B_{n}}{A_{n}},&-A_{n}<B_{n}<0,\\[10pt] 1,&B_{n}\leq-A_{n}.\end{cases}$$
The optimized value is
$$n\Psi_{\mathsf{T}}^{m_{2,\alpha^{\ast}}^{\mathrm{id}}}=nK_{n}-\bigl(A_{n}(\alpha^{\ast})^{2}+2B_{n}\alpha^{\ast}+C_{n}\bigr).$$
In the interior case $-A_{n}<B_{n}<0$, this becomes
$$n\Psi_{\mathsf{T}}^{m_{2,\alpha^{\ast}}^{\mathrm{id}}}=nK_{n}-C_{n}+\frac{B_{n}^{2}}{A_{n}}.$$
Equivalently, the optimized squared loss of the aggregate is
$$C_{n}-\frac{B_{n}^{2}}{A_{n}}.$$
At the endpoints, the aggregate reduces to one of the standalone predictions:
$$\alpha^{\ast}=0\quad\Rightarrow\quad n\Psi_{\mathsf{T}}^{m_{2,0}^{\mathrm{id}}}=nK_{n}-\|\hat{y}^{AI}-y\|_{2}^{2}\leq 0,$$
and
$$\alpha^{\ast}=1\quad\Rightarrow\quad n\Psi_{\mathsf{T}}^{m_{2,1}^{\mathrm{id}}}=nK_{n}-\|\hat{y}^{H}-y\|_{2}^{2}\leq 0.$$
Therefore complementarity, when it occurs, can only occur in the interior case. ∎
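The closed form for $\alpha^{\ast}$ is easy to evaluate directly. The sketch below computes $A_{n}$, $B_{n}$, $C_{n}$, clips $-B_{n}/A_{n}$ to $[0,1]$, and evaluates $n\Psi$ on a toy dataset invented here in which the human overshoots and the AI undershoots the truth by the same amount; the function names are hypothetical.

```python
def optimal_weight(y_h, y_ai, y):
    """Closed-form maximizer for linear pooling alpha * y_h + (1 - alpha) * y_ai."""
    d = [h - a for h, a in zip(y_h, y_ai)]   # human-AI disagreement
    r = [a - t for a, t in zip(y_ai, y)]     # AI residual
    A = sum(x * x for x in d)
    B = sum(x * z for x, z in zip(d, r))
    C = sum(z * z for z in r)
    if A == 0:
        return 0.0, A, B, C                  # predictions coincide; any alpha is optimal
    alpha = min(1.0, max(0.0, -B / A))       # projection of -B/A onto [0, 1]
    return alpha, A, B, C

def n_psi(alpha, y_h, y_ai, y):
    """n * Psi: pointwise-min oracle squared loss minus pooled squared loss."""
    oracle = sum(min((t - h) ** 2, (t - a) ** 2) for h, a, t in zip(y_h, y_ai, y))
    pooled = sum((alpha * h + (1 - alpha) * a - t) ** 2 for h, a, t in zip(y_h, y_ai, y))
    return oracle - pooled

y = [1.0, 2.0, 3.0]
y_h = [1.5, 2.5, 3.5]    # human overshoots by 0.5 (toy data)
y_ai = [0.5, 1.5, 2.5]   # AI undershoots by 0.5 (toy data)
alpha_star, A, B, C = optimal_weight(y_h, y_ai, y)
print(alpha_star, n_psi(alpha_star, y_h, y_ai, y))
```

Here $A_{n}=3$, $B_{n}=-1.5$ and $C_{n}=0.75$, so the interior case applies with $\alpha^{\ast}=0.5$; the pooled prediction equals the ground truth and $n\Psi=nK_{n}-C_{n}+B_{n}^{2}/A_{n}=0.75>0$.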
## Appendix BProof of Proposition[6](https://arxiv.org/html/2606.04779#Thmproposition6)
###### Proof\.
We prove that, for every𝖳∈𝖸N\\mathsf\{T\}\\in\\mathsf\{Y\}\_\{N\}, the restrictionφ𝖳\|\(0,1\)N−1:\(0,1\)N−1→int\(ΔN−1\)\\varphi\_\{\\mathsf\{T\}\}\|\_\{\(0,1\)^\{N\-1\}\}:\(0,1\)^\{N\-1\}\\rightarrow\\operatorname\{int\}\(\\Delta^\{N\-1\}\)is a bijection\.
Let
𝖳=σ∨τ∈𝖸N,σ∈𝖸N1,τ∈𝖸N2,N1\+N2=N\.\\mathsf\{T\}=\\sigma\\vee\\tau\\in\\mathsf\{Y\}\_\{N\},\\qquad\\sigma\\in\\mathsf\{Y\}\_\{N\_\{1\}\},\\qquad\\tau\\in\\mathsf\{Y\}\_\{N\_\{2\}\},\\qquad N\_\{1\}\+N\_\{2\}=N\.With respect to the recursive left\-to\-right ordering of internal nodes, the root of𝖳\\mathsf\{T\}has indexN1N\_\{1\}\. Hence every parameter vector𝜶∈\(0,1\)N−1\\boldsymbol\{\\alpha\}\\in\(0,1\)^\{N\-1\}decomposes as𝜶=\(𝜶L,αN1,𝜶R\)\\boldsymbol\{\\alpha\}=\(\\boldsymbol\{\\alpha\}\_\{L\},\\alpha\_\{N\_\{1\}\},\\boldsymbol\{\\alpha\}\_\{R\}\), where𝜶L=\(α1,…,αN1−1\)\\boldsymbol\{\\alpha\}\_\{L\}=\(\\alpha\_\{1\},\\dots,\\alpha\_\{N\_\{1\}\-1\}\),𝜶R=\(αN1\+1,…,αN−1\)\\boldsymbol\{\\alpha\}\_\{R\}=\(\\alpha\_\{N\_\{1\}\+1\},\\dots,\\alpha\_\{N\-1\}\)\. By definition,
φ𝖳\(𝜶\)=\(αN1φσ\(𝜶L\),\(1−αN1\)φτ\(𝜶R\)\)\.\\varphi\_\{\\mathsf\{T\}\}\(\\boldsymbol\{\\alpha\}\)=\\bigl\(\\alpha\_\{N\_\{1\}\}\\varphi\_\{\\sigma\}\(\\boldsymbol\{\\alpha\}\_\{L\}\),\(1\-\\alpha\_\{N\_\{1\}\}\)\\varphi\_\{\\tau\}\(\\boldsymbol\{\\alpha\}\_\{R\}\)\\bigr\)\.
Injectivity ofφ𝖳\\varphi\_\{\\mathsf\{T\}\}is equivalent to
φ𝖳\(𝜶\)=φ𝖳\(𝜷\)⟹𝜶=𝜷\.\\varphi\_\{\\mathsf\{T\}\}\(\\boldsymbol\{\\alpha\}\)=\\varphi\_\{\\mathsf\{T\}\}\(\\boldsymbol\{\\beta\}\)\\quad\\Longrightarrow\\quad\\boldsymbol\{\\alpha\}=\\boldsymbol\{\\beta\}\.\(26\)We prove this by induction onNN\. ForN=1N=1, the claim is trivial, sinceφ\(\|;∅\)=\(1\)\\varphi\(\|;\\emptyset\)=\(1\)\. ForN=2N=2, the unique tree satisfiesφ𝖳\(α\)=\(α,1−α\)\\varphi\_\{\\mathsf\{T\}\}\(\\alpha\)=\(\\alpha,1\-\\alpha\), so \([26](https://arxiv.org/html/2606.04779#A2.E26)\) holds\. Assume injectivity holds for all trees with fewer thanNNleaves, and let𝖳=σ∨τ∈𝖸N\\mathsf\{T\}=\\sigma\\vee\\tau\\in\\mathsf\{Y\}\_\{N\}\. Let
𝜶=\(𝜶L,αN1,𝜶R\),𝜷=\(𝜷L,βN1,𝜷R\),\\boldsymbol\{\\alpha\}=\(\\boldsymbol\{\\alpha\}\_\{L\},\\alpha\_\{N\_\{1\}\},\\boldsymbol\{\\alpha\}\_\{R\}\),\\qquad\\boldsymbol\{\\beta\}=\(\\boldsymbol\{\\beta\}\_\{L\},\\beta\_\{N\_\{1\}\},\\boldsymbol\{\\beta\}\_\{R\}\),and supposeφ𝖳\(𝜶\)=φ𝖳\(𝜷\)\\varphi\_\{\\mathsf\{T\}\}\(\\boldsymbol\{\\alpha\}\)=\\varphi\_\{\\mathsf\{T\}\}\(\\boldsymbol\{\\beta\}\)\. Equating the firstN1N\_\{1\}components and the lastN2N\_\{2\}components gives
αN1φσ,i\(𝜶L\)\\displaystyle\\alpha\_\{N\_\{1\}\}\\varphi\_\{\\sigma,i\}\(\\boldsymbol\{\\alpha\}\_\{L\}\)=βN1φσ,i\(𝜷L\),\\displaystyle=\\beta\_\{N\_\{1\}\}\\varphi\_\{\\sigma,i\}\(\\boldsymbol\{\\beta\}\_\{L\}\),i=1,…,N1,\\displaystyle i=1,\\dots,N\_\{1\},\(1−αN1\)φτ,j\(𝜶R\)\\displaystyle\(1\-\\alpha\_\{N\_\{1\}\}\)\\varphi\_\{\\tau,j\}\(\\boldsymbol\{\\alpha\}\_\{R\}\)=\(1−βN1\)φτ,j\(𝜷R\),\\displaystyle=\(1\-\\beta\_\{N\_\{1\}\}\)\\varphi\_\{\\tau,j\}\(\\boldsymbol\{\\beta\}\_\{R\}\),j=1,…,N2\.\\displaystyle j=1,\\dots,N\_\{2\}\.Summing the first line overi=1,…,N1i=1,\\dots,N\_\{1\}and using∑iφσ,i=1\\sum\_\{i\}\\varphi\_\{\\sigma,i\}=1givesαN1=βN1\\alpha\_\{N\_\{1\}\}=\\beta\_\{N\_\{1\}\}\. SinceαN1∈\(0,1\)\\alpha\_\{N\_\{1\}\}\\in\(0,1\), the first line then reduces toφσ\(𝜶L\)=φσ\(𝜷L\)\\varphi\_\{\\sigma\}\(\\boldsymbol\{\\alpha\}\_\{L\}\)=\\varphi\_\{\\sigma\}\(\\boldsymbol\{\\beta\}\_\{L\}\)\. By the induction hypothesis applied toσ\\sigma, we obtain𝜶L=𝜷L\\boldsymbol\{\\alpha\}\_\{L\}=\\boldsymbol\{\\beta\}\_\{L\}\. Similarly,1−αN1=1−βN1\>01\-\\alpha\_\{N\_\{1\}\}=1\-\\beta\_\{N\_\{1\}\}\>0, so the second line reduces toφτ\(𝜶R\)=φτ\(𝜷R\)\\varphi\_\{\\tau\}\(\\boldsymbol\{\\alpha\}\_\{R\}\)=\\varphi\_\{\\tau\}\(\\boldsymbol\{\\beta\}\_\{R\}\)\. By the induction hypothesis applied toτ\\tau, we obtain𝜶R=𝜷R\\boldsymbol\{\\alpha\}\_\{R\}=\\boldsymbol\{\\beta\}\_\{R\}\. Hence𝜶=𝜷\\boldsymbol\{\\alpha\}=\\boldsymbol\{\\beta\}, proving injectivity\.
Surjectivity of $\varphi_{\mathsf{T}}$ is equivalent to
$$\forall\,\boldsymbol{\omega}\in\operatorname{int}(\Delta^{N-1}),\quad\exists\,\boldsymbol{\alpha}\in(0,1)^{N-1}\quad\text{such that}\quad\varphi_{\mathsf{T}}(\boldsymbol{\alpha})=\boldsymbol{\omega}.\tag{27}$$
We again argue by induction on $N$. For $N=1$, the statement is trivial. For $N=2$, it follows from $\varphi_{\mathsf{T}}(\alpha)=(\alpha,1-\alpha)$. Assume surjectivity holds for all trees with fewer than $N$ leaves, and let $\mathsf{T}=\sigma\vee\tau\in\mathsf{Y}_{N}$. Then ([27](https://arxiv.org/html/2606.04779#A2.E27)) is equivalent to
$$\begin{aligned}\omega_{i}&=\alpha_{N_{1}}\varphi_{\sigma,i}(\boldsymbol{\alpha}_{L}),&&i=1,\dots,N_{1},\\\omega_{N_{1}+j}&=(1-\alpha_{N_{1}})\varphi_{\tau,j}(\boldsymbol{\alpha}_{R}),&&j=1,\dots,N_{2}.\end{aligned}$$
Summing over $i=1,\dots,N_{1}$ and $j=1,\dots,N_{2}$ gives
$$\alpha_{N_{1}}=\sum_{i=1}^{N_{1}}\omega_{i},\qquad 1-\alpha_{N_{1}}=\sum_{j=1}^{N_{2}}\omega_{N_{1}+j}.$$
Since $\boldsymbol{\omega}$ lies in the interior of the simplex, both sums are strictly positive, so $\alpha_{N_{1}}\in(0,1)$. Then
$$\begin{aligned}\varphi_{\sigma,i}(\boldsymbol{\alpha}_{L})&=\frac{\omega_{i}}{\sum_{l=1}^{N_{1}}\omega_{l}},&&i=1,\dots,N_{1},\\\varphi_{\tau,j}(\boldsymbol{\alpha}_{R})&=\frac{\omega_{N_{1}+j}}{\sum_{s=1}^{N_{2}}\omega_{N_{1}+s}},&&j=1,\dots,N_{2}.\end{aligned}$$
The right-hand sides are strictly positive and sum to one, so they are interior points of $\Delta^{N_{1}-1}$ and $\Delta^{N_{2}-1}$, respectively. The induction hypothesis applied to $\sigma$ and $\tau$ therefore gives $\boldsymbol{\alpha}_{L}$ and $\boldsymbol{\alpha}_{R}$. This concludes the proof. ∎
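The recursion used in the two induction arguments can be sketched in code. The following is a minimal illustration, not the paper's implementation: trees are encoded as nested pairs (a hypothetical encoding chosen here), parameters are listed in the order $(\boldsymbol{\alpha}_{L},\alpha_{N_{1}},\boldsymbol{\alpha}_{R})$ used in the proof, and the names `phi` and `phi_inverse` are assumptions of this sketch. The inverse follows the surjectivity construction: read off the root parameter as the total weight of the left block, renormalize each block, and recurse.

```python
from fractions import Fraction

LEAF = "leaf"  # hypothetical encoding: a tree is LEAF or a pair (left, right)

def n_leaves(tree):
    """Number of leaves of a nested-pair tree."""
    return 1 if tree == LEAF else n_leaves(tree[0]) + n_leaves(tree[1])

def phi(tree, alpha):
    """Leaf weights for node parameters alpha = (alpha_L, alpha_N1, alpha_R)."""
    if tree == LEAF:
        return [Fraction(1)]
    left, right = tree
    k = n_leaves(left) - 1              # number of parameters in the left subtree
    a_root = alpha[k]                   # alpha_{N_1}: weight sent to the left subtree
    w_left = phi(left, alpha[:k])
    w_right = phi(right, alpha[k + 1:])
    return [a_root * w for w in w_left] + [(1 - a_root) * w for w in w_right]

def phi_inverse(tree, omega):
    """Recover the node parameters from interior simplex weights omega."""
    if tree == LEAF:
        return []
    left, right = tree
    n1 = n_leaves(left)
    a_root = sum(omega[:n1])            # alpha_{N_1} = sum of the first N_1 weights
    left_norm = [w / a_root for w in omega[:n1]]
    right_norm = [w / (1 - a_root) for w in omega[n1:]]
    return phi_inverse(left, left_norm) + [a_root] + phi_inverse(right, right_norm)

# Left comb (((12)3)4) with exact rational parameters.
left_comb = (((LEAF, LEAF), LEAF), LEAF)
alpha = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
weights = phi(left_comb, alpha)
assert sum(weights) == 1
assert phi_inverse(left_comb, weights) == alpha
```

Exact rationals are used so that the round trip can be checked with equality rather than a floating-point tolerance.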
## Appendix C The Pentagon Identity on the $N=4$ Associahedron
###### Proof\.
To prove Theorem [3](https://arxiv.org/html/2606.04779#Thmtheorem3), we use the five trees $\mathsf{T}_{1}=(((12)3)4)$, $\mathsf{T}_{2}=((1(23))4)$, $\mathsf{T}_{3}=((12)(34))$, $\mathsf{T}_{4}=(1((23)4))$, $\mathsf{T}_{5}=(1(2(34)))$, which form the vertices of the Stasheff associahedron $\mathsf{Assoc}_{4}$; see Figure [8](https://arxiv.org/html/2606.04779#S7.F8). Throughout, we write $\boldsymbol{\alpha}=(\alpha_{1},\alpha_{2},\alpha_{3})\in(0,1)^{3}$ for the parameterization of the 'left comb' $\mathsf{T}_{1}$, where $\alpha_{1}$ labels the node $(12)$, $\alpha_{2}$ labels the node $((12)3)$, and $\alpha_{3}$ labels the root.
### C.1 Path I: $\mathsf{T}_{1}\to\mathsf{T}_{3}\to\mathsf{T}_{5}$
First consider the cover
$$\mathsf{T}_{1}=(((12)3)4)\longrightarrow\mathsf{T}_{3}=((12)(34)).$$
This rotates the fragment $((AB)C)$ with
$$A=(12),\qquad B=3,\qquad C=4.$$
Hence, using the definition of the map $\Gamma^{\diamondsuit}$, the parameters on $\mathsf{T}_{3}$ are
$$\Gamma^{\diamondsuit}_{1,3}(\alpha_{1},\alpha_{2},\alpha_{3})=\left(\alpha_{1},\,\alpha_{2}\alpha_{3},\,\frac{(1-\alpha_{2})\alpha_{3}}{1-\alpha_{2}\alpha_{3}}\right).$$
Here the coordinates on $\mathsf{T}_{3}$ are ordered as: parameter at $(12)$, parameter at the root $((12)(34))$, and parameter at $(34)$. Next consider the cover
$$\mathsf{T}_{3}=((12)(34))\longrightarrow\mathsf{T}_{5}=(1(2(34))).$$
This rotates the root fragment
$$((AB)C)\longrightarrow(A(BC))$$
with
$$A=1,\qquad B=2,\qquad C=(34).$$
If we write
$$(\gamma_{1},\gamma_{2},\gamma_{3})=\Gamma^{\diamondsuit}_{1,3}(\alpha_{1},\alpha_{2},\alpha_{3}),$$
then
$$\Gamma^{\diamondsuit}_{3,5}(\gamma_{1},\gamma_{2},\gamma_{3})=\left(\gamma_{1}\gamma_{2},\,\frac{(1-\gamma_{1})\gamma_{2}}{1-\gamma_{1}\gamma_{2}},\,\gamma_{3}\right).$$
Substituting the values of $\gamma_{1},\gamma_{2},\gamma_{3}$ gives
$$(\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3})(\alpha_{1},\alpha_{2},\alpha_{3})=\left(\alpha_{1}\alpha_{2}\alpha_{3},\,\frac{(1-\alpha_{1})\alpha_{2}\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}},\,\frac{(1-\alpha_{2})\alpha_{3}}{1-\alpha_{2}\alpha_{3}}\right).\tag{28}$$
### C.2 Path II: $\mathsf{T}_{1}\to\mathsf{T}_{2}\to\mathsf{T}_{4}\to\mathsf{T}_{5}$
Now consider the cover
$$\mathsf{T}_{1}=(((12)3)4)\longrightarrow\mathsf{T}_{2}=((1(23))4).$$
This rotates the fragment
$$((12)3)\longrightarrow(1(23)),$$
so $A=1$, $B=2$, $C=3$. Therefore
$$\Gamma^{\diamondsuit}_{1,2}(\alpha_{1},\alpha_{2},\alpha_{3})=\left(\alpha_{1}\alpha_{2},\,\frac{(1-\alpha_{1})\alpha_{2}}{1-\alpha_{1}\alpha_{2}},\,\alpha_{3}\right).$$
Here the coordinates on $\mathsf{T}_{2}$ are ordered as: parameter at $(1(23))$, parameter at $(23)$, and parameter at the root. Next consider the cover
$$\mathsf{T}_{2}=((1(23))4)\longrightarrow\mathsf{T}_{4}=(1((23)4)).$$
This rotates the root fragment with
$$A=1,\qquad B=(23),\qquad C=4.$$
If
$$(\delta_{1},\delta_{2},\delta_{3})=\Gamma^{\diamondsuit}_{1,2}(\alpha_{1},\alpha_{2},\alpha_{3}),$$
then
$$\Gamma^{\diamondsuit}_{2,4}(\delta_{1},\delta_{2},\delta_{3})=\left(\delta_{1}\delta_{3},\,\delta_{2},\,\frac{(1-\delta_{1})\delta_{3}}{1-\delta_{1}\delta_{3}}\right).$$
Substituting gives
$$(\eta_{1},\eta_{2},\eta_{3}):=(\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2})(\alpha_{1},\alpha_{2},\alpha_{3})=\left(\alpha_{1}\alpha_{2}\alpha_{3},\,\frac{(1-\alpha_{1})\alpha_{2}}{1-\alpha_{1}\alpha_{2}},\,\frac{(1-\alpha_{1}\alpha_{2})\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}}\right).\tag{29}$$
Finally, consider the cover
$$\mathsf{T}_{4}=(1((23)4))\longrightarrow\mathsf{T}_{5}=(1(2(34))).$$
This rotates the fragment
$$((23)4)\longrightarrow(2(34)),$$
so $A=2$, $B=3$, $C=4$. Therefore
$$\Gamma^{\diamondsuit}_{4,5}(\eta_{1},\eta_{2},\eta_{3})=\left(\eta_{1},\,\eta_{2}\eta_{3},\,\frac{(1-\eta_{2})\eta_{3}}{1-\eta_{2}\eta_{3}}\right).$$
Using ([29](https://arxiv.org/html/2606.04779#A3.E29)), it follows that
$$\begin{aligned}1-\eta_{2}&=\frac{1-\alpha_{2}}{1-\alpha_{1}\alpha_{2}},&\eta_{2}\eta_{3}&=\frac{(1-\alpha_{1})\alpha_{2}\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}},\\1-\eta_{2}\eta_{3}&=\frac{1-\alpha_{2}\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}},&\frac{(1-\eta_{2})\eta_{3}}{1-\eta_{2}\eta_{3}}&=\frac{(1-\alpha_{2})\alpha_{3}}{1-\alpha_{2}\alpha_{3}}.\end{aligned}$$
Collecting terms,
$$(\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2})(\alpha_{1},\alpha_{2},\alpha_{3})=\left(\alpha_{1}\alpha_{2}\alpha_{3},\,\frac{(1-\alpha_{1})\alpha_{2}\alpha_{3}}{1-\alpha_{1}\alpha_{2}\alpha_{3}},\,\frac{(1-\alpha_{2})\alpha_{3}}{1-\alpha_{2}\alpha_{3}}\right).\tag{30}$$
Comparing ([28](https://arxiv.org/html/2606.04779#A3.E28)) and ([30](https://arxiv.org/html/2606.04779#A3.E30)) proves
$$(\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3})(\alpha_{1},\alpha_{2},\alpha_{3})=(\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2})(\alpha_{1},\alpha_{2},\alpha_{3}),$$
which is Equation ([22](https://arxiv.org/html/2606.04779#S7.E22)). This establishes the commutativity of the $N=4$ Stasheff pentagon at the level of reparameterizations. By Theorem [2](https://arxiv.org/html/2606.04779#Thmtheorem2), every edgewise transport preserves the root output. Hence both pathwise compositions preserve the root output from $\mathsf{T}_{1}$ to $\mathsf{T}_{5}$, and therefore
$$\hat{y}_{\mathsf{T}_{1}}(\boldsymbol{\alpha})=\hat{y}_{\mathsf{T}_{5}}\bigl((\Gamma^{\diamondsuit}_{3,5}\circ\Gamma^{\diamondsuit}_{1,3})(\boldsymbol{\alpha})\bigr)=\hat{y}_{\mathsf{T}_{5}}\bigl((\Gamma^{\diamondsuit}_{4,5}\circ\Gamma^{\diamondsuit}_{2,4}\circ\Gamma^{\diamondsuit}_{1,2})(\boldsymbol{\alpha})\bigr).$$
Regression complementarity invariance then follows from Equation ([7](https://arxiv.org/html/2606.04779#S3.E7)), proving Theorem [3](https://arxiv.org/html/2606.04779#Thmtheorem3). ∎
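The pentagon computation can be checked mechanically. The sketch below is an illustration under assumptions: the helper `rotate` encodes the two-parameter pattern $(a,b)\mapsto\bigl(ab,\,(1-a)b/(1-ab)\bigr)$ that appears in all five maps of this appendix, and the function names are hypothetical. Exact rational arithmetic makes the comparison of the two paths an equality test rather than a tolerance test.

```python
from fractions import Fraction
from itertools import product

def rotate(a, b):
    """Parameters after rotating ((AB)C) -> (A(BC)); a sits at (AB), b at its parent."""
    return a * b, (1 - a) * b / (1 - a * b)

def gamma_13(a1, a2, a3):
    root, node_34 = rotate(a2, a3)
    return a1, root, node_34

def gamma_35(g1, g2, g3):
    return (*rotate(g1, g2), g3)

def gamma_12(a1, a2, a3):
    return (*rotate(a1, a2), a3)

def gamma_24(d1, d2, d3):
    root, node_234 = rotate(d1, d3)
    return root, d2, node_234

def gamma_45(e1, e2, e3):
    return (e1, *rotate(e2, e3))

def path_one(a1, a2, a3):
    """T1 -> T3 -> T5."""
    return gamma_35(*gamma_13(a1, a2, a3))

def path_two(a1, a2, a3):
    """T1 -> T2 -> T4 -> T5."""
    return gamma_45(*gamma_24(*gamma_12(a1, a2, a3)))

# Check the pentagon identity on a small rational grid inside (0, 1)^3.
grid = [Fraction(k, 7) for k in range(1, 7)]
assert all(path_one(*a) == path_two(*a) for a in product(grid, repeat=3))
```

A grid check is not a proof, but it guards against sign or index slips when the five formulas are transcribed.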
## Appendix D Proof of Proposition [8](https://arxiv.org/html/2606.04779#Thmproposition8)
###### Proof\.
For the Bregman case, let $F:[0,1]\to\mathbb{R}$ be convex and continuous on $[0,1]$, and differentiable on $(0,1)$. For $y\in\{0,1\}$ and $\hat{y}\in(0,1)$, define $D_{F}(y,\hat{y}):=F(y)-F(\hat{y})-F^{\prime}(\hat{y})(y-\hat{y})$. We show endpoint monotonicity. Write $\ell_{F}(0,\hat{y})=F(0)-G_{0}(\hat{y})$ and $\ell_{F}(1,\hat{y})=F(1)-G_{1}(\hat{y})$, where $G_{0}(\hat{y}):=F(\hat{y})-\hat{y}F^{\prime}(\hat{y})$ and $G_{1}(\hat{y}):=F(\hat{y})+(1-\hat{y})F^{\prime}(\hat{y})$. By convexity of $F$, $G_{0}$ is nonincreasing and $G_{1}$ is nondecreasing on $(0,1)$. Hence $\hat{y}\mapsto\ell_{F}(0,\hat{y})$ is nondecreasing and $\hat{y}\mapsto\ell_{F}(1,\hat{y})$ is nonincreasing. Thus $\ell_{F}=D_{F}$ is endpoint-monotone.
We now consider $f$-divergences. Under the standard convention
$$D_{f}(P\|Q):=\sum_{k\in\{0,1\}}Q_{k}\,f\!\left(\frac{P_{k}}{Q_{k}}\right),$$
let $f:[0,\infty)\to\mathbb{R}$ be convex, differentiable on $(0,\infty)$, normalized by $f(1)=0$, and finite at $0$. For binary classification, write the hard-label distribution and the predicted distribution as $P_{y}=(y,1-y)$ and $Q_{\hat{y}}=(\hat{y},1-\hat{y})$. The induced hard-label loss is
$$\ell_{f}(y,\hat{y}):=D_{f}(P_{y}\|Q_{\hat{y}})=\hat{y}\,f\!\left(\frac{y}{\hat{y}}\right)+(1-\hat{y})\,f\!\left(\frac{1-y}{1-\hat{y}}\right).$$
Hence
$$\ell_{f}(1,\hat{y})=\hat{y}\,f\!\left(\frac{1}{\hat{y}}\right)+(1-\hat{y})f(0),\qquad\ell_{f}(0,\hat{y})=\hat{y}f(0)+(1-\hat{y})\,f\!\left(\frac{1}{1-\hat{y}}\right).$$
Set $g(t):=f(t)-tf^{\prime}(t)$. Differentiating the induced binary loss gives
$$\frac{d}{d\hat{y}}\ell_{f}(1,\hat{y})=g\!\left(\frac{1}{\hat{y}}\right)-f(0),\qquad\frac{d}{d\hat{y}}\ell_{f}(0,\hat{y})=f(0)-g\!\left(\frac{1}{1-\hat{y}}\right).$$
By convexity, the tangent inequality at $t>0$, evaluated at $0$, gives
$$f(0)\geq f(t)+f^{\prime}(t)(0-t)=f(t)-tf^{\prime}(t)=g(t).$$
Therefore $g(1/\hat{y})-f(0)\leq 0$ and $f(0)-g(1/(1-\hat{y}))\geq 0$. Hence $\hat{y}\mapsto\ell_{f}(1,\hat{y})$ is nonincreasing and $\hat{y}\mapsto\ell_{f}(0,\hat{y})$ is nondecreasing. Thus, $\ell_{f}$ is endpoint-monotone. ∎
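As a sanity check on the proof, endpoint monotonicity can be tested numerically for a few concrete generators. The sketch below is illustrative only. The chosen generators (squared error and negative Shannon entropy for the Bregman case, Pearson $\chi^{2}$ and squared Hellinger for the $f$-divergence case) are examples picked here, not a list taken from the paper, and all function names are hypothetical. Endpoint monotonicity is read as in the proof: $\ell(1,\cdot)$ nonincreasing and $\ell(0,\cdot)$ nondecreasing on $(0,1)$.

```python
import math

def bregman_loss(F, dF, y, y_hat):
    """D_F(y, y_hat) = F(y) - F(y_hat) - F'(y_hat) * (y - y_hat)."""
    return F(y) - F(y_hat) - dF(y_hat) * (y - y_hat)

def f_divergence_loss(f, y, y_hat):
    """Hard-label loss y_hat * f(y / y_hat) + (1 - y_hat) * f((1 - y) / (1 - y_hat))."""
    return y_hat * f(y / y_hat) + (1 - y_hat) * f((1 - y) / (1 - y_hat))

def is_endpoint_monotone(loss, grid, tol=1e-12):
    """Check l(1, .) nonincreasing and l(0, .) nondecreasing along an increasing grid."""
    ones = [loss(1, p) for p in grid]
    zeros = [loss(0, p) for p in grid]
    nonincreasing = all(b <= a + tol for a, b in zip(ones, ones[1:]))
    nondecreasing = all(b >= a - tol for a, b in zip(zeros, zeros[1:]))
    return nonincreasing and nondecreasing

def xlogx(p):
    """p * log(p) with the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log(p)

# Example generators (illustrative choices).
brier = lambda y, p: bregman_loss(lambda t: t * t, lambda t: 2 * t, y, p)
log_loss = lambda y, p: bregman_loss(
    lambda t: xlogx(t) + xlogx(1 - t), lambda t: math.log(t / (1 - t)), y, p
)
chi2 = lambda y, p: f_divergence_loss(lambda t: (t - 1) ** 2, y, p)
hellinger = lambda y, p: f_divergence_loss(lambda t: (math.sqrt(t) - 1) ** 2, y, p)

grid = [i / 100 for i in range(1, 100)]  # interior points of (0, 1)
results = {name: is_endpoint_monotone(loss, grid)
           for name, loss in [("brier", brier), ("log", log_loss),
                              ("chi2", chi2), ("hellinger", hellinger)]}
```

Both $f$-generators are finite at $0$, as the proposition requires; a generator such as $f(t)=t\log t-t+1$ with the roles of $P$ and $Q$ swapped would fall outside this check.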
## Appendix E Machine-Learning Details for Experiment 1
Experiment 1 illustrates the two-agent regression geometry from Proposition [4](https://arxiv.org/html/2606.04779#Thmproposition4). The purpose of the experiment is to verify how the location of the constrained maximizer $\alpha^{\ast}$ and the sign of complementarity depend on the angle and norm of the human–AI disagreement direction.
### E.1 Dataset and AI model
We use the California Housing regression dataset from scikit-learn. The target is the median house value of California districts, expressed in units of \$100,000. The data are split into training and test sets with a 75/25 split. The AI predictor is a random-forest regressor with 300 trees and `min_samples_leaf=2`. No hyperparameter tuning is performed, because the model is used only to generate a fixed AI prediction vector $\hat{y}^{AI}$. Complementarity is computed on the held-out test set ($n=5160$), not on the training set. The resulting AI test RMSE is $0.5102$, with
$$C_{n}=\|\hat{y}^{AI}-y\|_{2}^{2}=1343.2033\quad([\$100{,}000]^{2}).$$
#### E.1.1 Synthetic human predictions
Starting from the fixed test-set vectors $y$ and $\hat{y}^{AI}$, we generate synthetic human predictions with controlled geometry. Let
$$e_{1}=\frac{\hat{y}^{AI}-y}{\|\hat{y}^{AI}-y\|_{2}}.$$
We sample a random Gaussian vector, orthogonalize it against $e_{1}$, and normalize it to obtain a unit vector $e_{2}$ with $\langle e_{1},e_{2}\rangle\simeq 0$. For a prescribed angle $\theta$ and norm ratio $q$, the human–AI displacement is
$$\hat{y}^{H}-\hat{y}^{AI}=q\,\|\hat{y}^{AI}-y\|_{2}\bigl(\cos(\theta)e_{1}+\sin(\theta)e_{2}\bigr),$$
so that
$$\angle(\hat{y}^{H}-\hat{y}^{AI},\,\hat{y}^{AI}-y)=\theta,\qquad\frac{\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}}{\|\hat{y}^{AI}-y\|_{2}}=q.$$
The human prediction is then $\hat{y}^{H}=\hat{y}^{AI}+(\hat{y}^{H}-\hat{y}^{AI})$. This construction fixes the AI model and the test labels, while varying only the geometry of the synthetic human prediction.
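The construction above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `synthetic_human` is hypothetical, and the example uses random stand-in vectors for $y$ and $\hat{y}^{AI}$ rather than the California Housing test set and random-forest predictions.

```python
import numpy as np

def synthetic_human(y, y_ai, theta, q, rng):
    """Return y_h whose displacement from y_ai has angle theta and norm ratio q
    relative to the AI residual y_ai - y."""
    residual = y_ai - y
    residual_norm = np.linalg.norm(residual)
    e1 = residual / residual_norm
    g = rng.standard_normal(y.shape[0])   # random Gaussian direction
    g -= np.dot(g, e1) * e1               # orthogonalize against e1
    e2 = g / np.linalg.norm(g)            # unit vector with <e1, e2> ~ 0
    displacement = q * residual_norm * (np.cos(theta) * e1 + np.sin(theta) * e2)
    return y_ai + displacement

# Stand-in data (assumption: replaces the real test labels and AI predictions).
rng = np.random.default_rng(0)
y = rng.normal(size=200)
y_ai = y + 0.5 * rng.normal(size=200)
y_h = synthetic_human(y, y_ai, theta=3 * np.pi / 4, q=2.5, rng=rng)

# Recover the prescribed geometry from the generated vectors.
d, r = y_h - y_ai, y_ai - y
cos_angle = d @ r / (np.linalg.norm(d) * np.linalg.norm(r))
ratio = np.linalg.norm(d) / np.linalg.norm(r)
```

Here `cos_angle` and `ratio` should match $\cos\theta$ and $q$ up to floating-point error, which is a quick way to confirm the orthogonalization step.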
### E.2 Complementarity computation
For each scenario, we compute $A_{n}=\|\hat{y}^{H}-\hat{y}^{AI}\|_{2}^{2}$, $B_{n}=\langle\hat{y}^{H}-\hat{y}^{AI},\hat{y}^{AI}-y\rangle$, $C_{n}=\|\hat{y}^{AI}-y\|_{2}^{2}$, and the pointwise-oracle benchmark
$$nK_{n}=\sum_{i=1}^{n}\min\{(\hat{y}_{i}^{H}-y_{i})^{2},(\hat{y}_{i}^{AI}-y_{i})^{2}\}.$$
For the linear team prediction $\hat{y}_{\mathsf{T}}(\alpha)=\alpha\hat{y}^{H}+(1-\alpha)\hat{y}^{AI}$, $\alpha\in[0,1]$, the plotted quantity is
$$n\Psi(\alpha)=-A_{n}\alpha^{2}-2B_{n}\alpha+(nK_{n}-C_{n}),$$
with constrained maximizer
$$\alpha^{\ast}=\Pi_{[0,1]}\!\left(-\frac{B_{n}}{A_{n}}\right).$$
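The quantities of this subsection translate directly into a few array operations. The sketch below is an assumption-laden illustration: the function name `complementarity_curve` is hypothetical, the projection $\Pi_{[0,1]}$ is implemented by clipping, and the code assumes $A_{n}>0$, that is, $\hat{y}^{H}\neq\hat{y}^{AI}$.

```python
import numpy as np

def complementarity_curve(y, y_h, y_ai):
    """Return A_n, B_n, C_n, n*K_n, the constrained maximizer, and n*Psi as a function."""
    d = y_h - y_ai                         # human-AI displacement
    r = y_ai - y                           # AI residual
    A, B, C = d @ d, d @ r, r @ r
    nK = np.minimum((y_h - y) ** 2, (y_ai - y) ** 2).sum()  # pointwise-oracle benchmark
    alpha_star = float(np.clip(-B / A, 0.0, 1.0))            # projection onto [0, 1]

    def n_psi(alpha):
        return -A * alpha**2 - 2 * B * alpha + (nK - C)

    return A, B, C, nK, alpha_star, n_psi

# Stand-in data (assumption: not the experiment's actual vectors).
rng = np.random.default_rng(1)
y = rng.normal(size=50)
y_ai = y + rng.normal(size=50)
y_h = y + rng.normal(size=50)
A, B, C, nK, alpha_star, n_psi = complementarity_curve(y, y_h, y_ai)
```

Expanding $\|\alpha\hat{y}^{H}+(1-\alpha)\hat{y}^{AI}-y\|_{2}^{2}=A_{n}\alpha^{2}+2B_{n}\alpha+C_{n}$ shows that `n_psi(alpha)` equals the oracle benchmark minus the team's summed squared error, which gives an independent way to test the closed form.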
### E.3 Scenarios
We evaluate five controlled scenarios:
$$(\theta,q)\in\left\{(0,1.0),\ \left(\tfrac{\pi}{2},1.0\right),\ \left(\tfrac{3\pi}{4},0.5\right),\ \left(\tfrac{3\pi}{4},2.5\right),\ (\pi,1.25)\right\}.$$
The first two scenarios test non-corrective and orthogonal human movement. The third is corrective but too short, so the unconstrained maximizer lies outside the feasible segment. The last two scenarios satisfy the interior condition and achieve positive complementarity. The $\theta=\pi$ case is the collinear corrective case: the human–AI line passes through the ground-truth direction, so the geometric loss term proportional to $\sin^{2}\theta$ vanishes.
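As a worked check of these scenario descriptions, the unconstrained maximizer can be written in terms of $(\theta,q)$ alone. The derivation below assumes that $e_{2}$ is exactly orthogonal to $e_{1}$ (the construction only guarantees this up to numerical error). Substituting the displacement from E.1.1 into the definitions of $A_{n}$ and $B_{n}$ gives
$$A_{n}=q^{2}C_{n},\qquad B_{n}=q\,C_{n}\cos\theta,\qquad -\frac{B_{n}}{A_{n}}=-\frac{\cos\theta}{q},$$
and therefore
$$\begin{aligned}(0,\,1.0)&:\ -\tfrac{B_{n}}{A_{n}}=-1,&\alpha^{\ast}&=0,\\\left(\tfrac{\pi}{2},\,1.0\right)&:\ -\tfrac{B_{n}}{A_{n}}=0,&\alpha^{\ast}&=0,\\\left(\tfrac{3\pi}{4},\,0.5\right)&:\ -\tfrac{B_{n}}{A_{n}}=\sqrt{2}\approx 1.414,&\alpha^{\ast}&=1,\\\left(\tfrac{3\pi}{4},\,2.5\right)&:\ -\tfrac{B_{n}}{A_{n}}=\tfrac{\sqrt{2}}{5}\approx 0.283,&\alpha^{\ast}&\approx 0.283,\\(\pi,\,1.25)&:\ -\tfrac{B_{n}}{A_{n}}=0.8,&\alpha^{\ast}&=0.8.\end{aligned}$$
Only the last two scenarios have an unconstrained maximizer strictly inside $(0,1)$, which is consistent with the description above. Whether $n\Psi(\alpha^{\ast})$ is positive also depends on $nK_{n}$, which is not determined by $(\theta,q)$ alone.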
## Appendix F Experiment 2 implementation details
We use the California Housing regression dataset with the same preprocessing, train–test split, and random forest model as in Experiment 1. A random forest regressor with 300 trees and `min_samples_leaf=2` is trained on the training set. All equality loci and performance quantities are computed on the held-out test set. Let $\hat{y}^{AI}$ denote the AI prediction vector on the test set and let
$$r_{AI}:=\hat{y}^{AI}-y$$
be the AI residual vector. We construct synthetic agent predictions in an AI-anchored geometry. For each synthetic agent $j$, let
$$d_{j}:=\hat{y}^{(j)}-\hat{y}^{AI}$$
be its displacement from the AI prediction. We control the angle $\theta_{j}=\angle(d_{j},r_{AI})$ and the relative displacement size
$$q_{j}:=\frac{\|\hat{y}^{(j)}-\hat{y}^{AI}\|_{2}}{\|\hat{y}^{AI}-y\|_{2}}=\frac{\|d_{j}\|_{2}}{\|r_{AI}\|_{2}}.$$
Thus, values $\theta_{j}>90^{\circ}$ indicate displacements that are corrective relative to the AI residual, whereas values $\theta_{j}<90^{\circ}$ move broadly in the same direction as the AI residual.
To generate the synthetic predictions, we draw a fixed unit vector $u_{\perp}$ orthogonal to $r_{AI}$ and set
$$\hat{y}^{(j)}=\hat{y}^{AI}+q_{j}\|r_{AI}\|_{2}\left(\cos(\theta_{j})\frac{r_{AI}}{\|r_{AI}\|_{2}}+\sin(\theta_{j})u_{\perp}\right).$$
In all scenarios, $\hat{y}^{(1)}$ is a fixed corrective expert with $\theta_{1}=180^{\circ}$ and $q_{1}=1.75$, while $\hat{y}^{(3)}=\hat{y}^{AI}$. The assistant $\hat{y}^{(2)}$ is varied across four regimes:
$$(\theta_{2},q_{2})\in\{(160^{\circ},0.50),\ (160^{\circ},1.50),\ (20^{\circ},0.50),\ (20^{\circ},1.50)\}.$$
These correspond respectively to weakly corrective, strongly corrective, weakly non-corrective, and strongly non-corrective assistant displacements relative to the AI residual.
For each scenario, we evaluate the two $N=3$ protocol trees $\mathsf{T}_{L}=((12)3)$ and $\mathsf{T}_{R}=(1(23))$, with $\hat{y}_{\mathsf{T}_{L}}$ given by ([18](https://arxiv.org/html/2606.04779#S5.E18)) and $\hat{y}_{\mathsf{T}_{R}}$ given by ([19](https://arxiv.org/html/2606.04779#S5.E19)). The plotted quantity is the signed test-set MSE difference
$$\frac{P(\alpha_{1},\alpha_{2})}{n}=\frac{1}{n}\left(\|y-\hat{y}_{\mathsf{T}_{L}}\|_{2}^{2}-\|y-\hat{y}_{\mathsf{T}_{R}}\|_{2}^{2}\right)=\operatorname{MSE}(\mathsf{T}_{L})-\operatorname{MSE}(\mathsf{T}_{R}).$$
Hence $P/n=0$ is the protocol-indifference locus $\mathcal{S}_{3}$, $P/n<0$ indicates lower MSE for $\mathsf{T}_{L}$, and $P/n>0$ indicates lower MSE for $\mathsf{T}_{R}$. Here, lower tree MSE corresponds to a higher complementarity value for that tree; see Eq. ([13](https://arxiv.org/html/2606.04779#S5.E13)). With the parameter convention in ([18](https://arxiv.org/html/2606.04779#S5.E18))–([19](https://arxiv.org/html/2606.04779#S5.E19)), the two root outputs satisfy $\hat{y}_{\mathsf{T}_{L}}-\hat{y}_{\mathsf{T}_{R}}=\alpha_{1}(1-\alpha_{2})(\hat{y}^{(3)}-\hat{y}^{(1)})$. Consequently, $\alpha_{1}=0$ and $\alpha_{2}=1$ are structural branches of $\mathcal{S}_{3}$ in every scenario, since the two root outputs coincide there. In the implementation, $P/n$ is evaluated on a $251\times 251$ grid over $[0,1]^{2}$.
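The grid evaluation of $P/n$ can be sketched as below. Equations (18) and (19) are not reproduced in this appendix, so the two root outputs in `tree_outputs` are an assumption: they are the linear-pooling forms with $\alpha_{1}$ weighting agent 1 and $\alpha_{2}$ weighting agent 2 or the $(12)$ block, chosen because they reproduce the stated identity $\hat{y}_{\mathsf{T}_{L}}-\hat{y}_{\mathsf{T}_{R}}=\alpha_{1}(1-\alpha_{2})(\hat{y}^{(3)}-\hat{y}^{(1)})$. Function names and the stand-in data are likewise hypothetical.

```python
import numpy as np

def tree_outputs(a1, a2, y1, y2, y3):
    """Root outputs of T_L = ((12)3) and T_R = (1(23)) under linear pooling
    (assumed parameter convention, see the lead-in)."""
    y_left = a2 * (a1 * y1 + (1 - a1) * y2) + (1 - a2) * y3
    y_right = a1 * y1 + (1 - a1) * (a2 * y2 + (1 - a2) * y3)
    return y_left, y_right

def mse_difference_grid(y, y1, y2, y3, grid_size=251):
    """Evaluate P/n = MSE(T_L) - MSE(T_R) on a grid over [0, 1]^2."""
    alphas = np.linspace(0.0, 1.0, grid_size)
    P = np.empty((grid_size, grid_size))
    for i, a1 in enumerate(alphas):
        for j, a2 in enumerate(alphas):
            y_left, y_right = tree_outputs(a1, a2, y1, y2, y3)
            P[i, j] = np.mean((y - y_left) ** 2) - np.mean((y - y_right) ** 2)
    return alphas, P

# Stand-in data (assumption: random vectors instead of the real predictions).
rng = np.random.default_rng(2)
y = rng.normal(size=40)
y1, y2, y3 = (y + rng.normal(size=40) for _ in range(3))
alphas, P = mse_difference_grid(y, y1, y2, y3, grid_size=11)
```

Row `P[0, :]` corresponds to $\alpha_{1}=0$ and column `P[:, -1]` to $\alpha_{2}=1$; both should vanish up to rounding, matching the structural branches of $\mathcal{S}_{3}$.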
## Appendix G Experiment 3 implementation details
Experiment 3 studies binary classification under cross-entropy and amplified logit pooling in the $N=2$ case. The goal is not to fit a real-data classifier, but to illustrate the mechanism by which leaving internality can make positive complementarity possible. We fix $\alpha=0.5$ and evaluate several amplification levels $\lambda\in\{1,2,5,10,20\}$.
We generate $n=1000$ balanced binary labels $y_{i}\in\{0,1\}$, with $I_{1}=\{i:y_{i}=1\}$ and $I_{0}=\{i:y_{i}=0\}$. For each simulated pair of probabilistic predictors, the construction is performed in logit space. Let $s_{i}:=2y_{i}-1\in\{-1,1\}$. We first generate a midpoint logit $C_{i}$, which is the unamplified pooled logit for $\alpha=\frac{1}{2}$. Its magnitude is sampled around a dataset-level signal-strength parameter, while its sign is $s_{i}$ except for a randomly flipped fraction of observations. Thus most pooled logits point toward the correct class, while a controlled fraction points in the wrong direction. For the latter, amplification pushes the output opposite to the locally complementary direction: toward class $0$ on $I_{1}$, or toward class $1$ on $I_{0}$. The two agent logits are then placed symmetrically around this midpoint as $z_{i}^{(1)}=C_{i}+\delta_{i}/2$ and $z_{i}^{(2)}=C_{i}-\delta_{i}/2$, where $\delta_{i}$ is Gaussian disagreement noise. The resulting probabilities are $\hat{y}_{i}^{(1)}=\sigma(z_{i}^{(1)})$ and $\hat{y}_{i}^{(2)}=\sigma(z_{i}^{(2)})$.
For a fixed $\lambda$, the amplified logit-pooling output is $m_{2,\frac{1}{2},\lambda}^{\mathrm{logit}}(\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)})=\sigma(\lambda C_{i})$, since $\alpha=\frac{1}{2}$.
We also compute the class-wise rates of canonical local complementarity. On $I_{1}$, we define $k_{1}$ as the fraction of observations satisfying $m_{2,\frac{1}{2},\lambda}^{\mathrm{logit}}(\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)})>\max\{\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)}\}$. On $I_{0}$, we define $k_{0}$ as the fraction satisfying $m_{2,\frac{1}{2},\lambda}^{\mathrm{logit}}(\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)})<\min\{\hat{y}_{i}^{(1)},\hat{y}_{i}^{(2)}\}$. These are the local cases in which the amplified output leaves the interval spanned by the two input probabilities in the direction of the true class. For $\lambda=1$, ordinary logit pooling is internal, so these strict outside-interval conditions cannot occur. For $\lambda>1$, amplification may create such cases by moving the pooled logit away from $0$, toward class $1$ if $C_{i}>0$ and toward class $0$ if $C_{i}<0$. We conclude the experiment by computing the complementarity functional ([2](https://arxiv.org/html/2606.04779#S3.E2)) with binary cross-entropy as the loss function.
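The simulation described in this appendix can be sketched as follows. Several ingredients are assumptions made for illustration, because the text does not fix them: the signal strength, the spread of $|C_{i}|$, the flipped fraction, and the disagreement standard deviation are hypothetical defaults, and the last line of `evaluate` assumes that the complementarity functional takes the form "pointwise-min oracle loss minus team loss", by analogy with $n\Psi$ in Appendix E. Comparisons for $k_{1}$ and $k_{0}$ are made in logit space, which is equivalent to comparing probabilities because $\sigma$ is strictly increasing.

```python
import numpy as np

def simulate_logits(n=1000, signal=1.0, flip_fraction=0.1, noise_sd=1.0, seed=0):
    """Balanced labels and two agent logits placed symmetrically around C_i.
    All default parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)                      # balanced labels
    s = 2 * y - 1                                      # s_i in {-1, +1}
    magnitude = np.abs(rng.normal(signal, 0.25, n))    # |C_i| around the signal strength
    flipped = rng.random(n) < flip_fraction            # wrong-sign midpoint logits
    c = np.where(flipped, -s, s) * magnitude           # midpoint logit C_i
    delta = rng.normal(0.0, noise_sd, n)               # Gaussian disagreement
    return y, c + delta / 2, c - delta / 2

def bce_from_logit(y, z):
    """Binary cross-entropy of sigmoid(z) against y, computed stably as softplus."""
    return np.logaddexp(0.0, -(2 * y - 1) * z)

def evaluate(y, z1, z2, lam):
    """Class-wise local rates k1, k0 and an assumed complementarity value."""
    z_team = lam * 0.5 * (z1 + z2)                     # amplified pooled logit, alpha = 1/2
    k1 = np.mean(z_team[y == 1] > np.maximum(z1, z2)[y == 1])
    k0 = np.mean(z_team[y == 0] < np.minimum(z1, z2)[y == 0])
    oracle = np.minimum(bce_from_logit(y, z1), bce_from_logit(y, z2)).mean()
    psi = oracle - bce_from_logit(y, z_team).mean()    # assumed form of the functional
    return k1, k0, psi

y, z1, z2 = simulate_logits()
summary = {lam: evaluate(y, z1, z2, lam) for lam in (1, 2, 5, 10, 20)}
```

At $\lambda=1$ the pooled logit lies between the two agent logits, so `k1` and `k0` are zero and `psi` cannot be positive; larger $\lambda$ trades gains on correctly signed observations against amplified losses on the flipped fraction.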
## References
- S\. M\. Ali and S\. D\. Silvey \(1966\)A general class of coefficients of divergence of one distribution from another\.Journal of the Royal Statistical Society: Series B \(Methodological\)28\(1\),pp\. 131–142\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p6.1),[item \(ii\)](https://arxiv.org/html/2606.04779#S8.I1.i2.p1.7)\.
- G\. Bansal, B\. Nushi, E\. Kamar, E\. Horvitz, and D\. S\. Weld \(2021a\)Is the most accurate AI the best teammate? Optimizing ai for teamwork\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.35,pp\. 11405–11414\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p1.1)\.
- G\. Bansal, T\. Wu, J\. Zhou, R\. Fok, B\. Nushi, E\. Kamar, M\. T\. Ribeiro, and D\. Weld \(2021b\)Does the whole exceed its parts? The effect of AI explanations on complementary team performance\.InProceedings of the 2021 CHI conference on human factors in computing systems,pp\. 1–16\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p1.1),[Definition 4](https://arxiv.org/html/2606.04779#Thmdefinition4)\.
- L\. M\. Bregman \(1967\)The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming\.USSR Computational Mathematics and Mathematical Physics7\(3\),pp\. 200–217\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p6.1),[item \(i\)](https://arxiv.org/html/2606.04779#S8.I1.i1.p1.5)\.
- K\. Donahue, A\. Chouldechova, and K\. Kenthapadi \(2022\)Human\-algorithm collaboration: achieving complementarity and avoiding unfairness\.InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency,pp\. 1639–1656\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p2.5),[Definition 4](https://arxiv.org/html/2606.04779#Thmdefinition4)\.
- A\. Ferrario, A\. Facchini, and J\. M\. Durán \(2026\)Epistemology gives a future to complementarity in human\-AI interactions\.arXiv preprint arXiv:2601\.09871\.Cited by:[§9](https://arxiv.org/html/2606.04779#S9.p6.1)\.
- A\. Ferrario \(2025\)Being pragmatic about reliance and trust in artificial intelligence\.Minds and Machines36\(1\),pp\. 5\.Cited by:[§4\.2](https://arxiv.org/html/2606.04779#S4.SS2.p1.8)\.
- S\. Furuichi, K\. Yanagi, and K\. Kuriyama \(2004\)Fundamental properties of Tsallis relative entropy\.Journal of Mathematical Physics45\(12\),pp\. 4868–4877\.Cited by:[Corollary 3](https://arxiv.org/html/2606.04779#Thmcorollary3.p1.2.2)\.
- P\. Hemmer, S\. Schellhammer, M\. Vössing, J\. Jakubik, and G\. Satzger \(2022\)Forming effective human\-AI teams: building machine learning models that complement the capabilities of multiple experts\.InProceedings of the Thirty\-First International Joint Conference on Artificial Intelligence,pp\. 2478–2484\.Cited by:[§2](https://arxiv.org/html/2606.04779#S2.p3.1)\.
- P\. Hemmer, M\. Schemmer, M\. Vössing, and N\. Kühl \(2021\)Human\-AI complementarity in hybrid intelligence systems: A structured literature review\.PACIS78\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p1.1)\.
- P\. Hemmer, M\. Schemmer, N\. Kühl, M\. Vössing, and G\. Satzger \(2025\)Complementarity in human\-AI collaboration: Concept, sources, and evidence\.European Journal of Information Systems,pp\. 1–24\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p1.1),[Definition 4](https://arxiv.org/html/2606.04779#Thmdefinition4)\.
- A\. Kolmogorov \(1930\)On the notion of mean\.Mathematics and Mechanics199\(1\),pp\. 144–146\.Cited by:[§4\.1](https://arxiv.org/html/2606.04779#S4.SS1.p1.13),[§8\.1](https://arxiv.org/html/2606.04779#S8.SS1.p2.1)\.
- T\. Miller \(2023\)Explainable AI is dead, long live explainable AI\! Hypothesis\-driven decision support using evaluative AI\.InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency,pp\. 333–342\.Cited by:[§2](https://arxiv.org/html/2606.04779#S2.p1.1)\.
- M\. Nagumo \(1930\)Über eine Klasse der Mittelwerte\.InJapanese Journal of Mathematics: Transactions and Abstracts,Vol\.7,pp\. 71–79\.Cited by:[§4\.1](https://arxiv.org/html/2606.04779#S4.SS1.p1.13),[§8\.1](https://arxiv.org/html/2606.04779#S8.SS1.p2.1)\.
- E\. Neyman and T\. Roughgarden \(2023a\)From proper scoring rules to max\-min optimal forecast aggregation\.Operations Research71\(6\),pp\. 2175–2195\.Cited by:[§8\.1](https://arxiv.org/html/2606.04779#S8.SS1.p5.6)\.
- E\. Neyman and T\. Roughgarden \(2023b\)No\-regret learning with unbounded losses: The case of logarithmic pooling\.Advances in Neural Information Processing Systems36,pp\. 21857–21877\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p6.1),[§8\.1](https://arxiv.org/html/2606.04779#S8.SS1.p5.6)\.
- H\. Paat and G\. Shen \(2025\)Conformal set\-based human\-AI complementarity with multiple experts\.arXiv preprint arXiv:2508\.06997\.Cited by:[§2](https://arxiv.org/html/2606.04779#S2.p3.1)\.
- K\. Peng, N\. Garg, and J\. Kleinberg \(2025\)A no free lunch theorem for human\-AI collaboration\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 14369–14376\.Cited by:[§2](https://arxiv.org/html/2606.04779#S2.p3.1)\.
- C\. Rastogi, L\. Leqi, K\. Holstein, and H\. Heidari \(2023\)A taxonomy of human and ml strengths in decision\-making to investigate human\-ML complementarity\.InProceedings of the AAAI Conference on Human Computation and Crowdsourcing,Vol\.11,pp\. 127–139\.Cited by:[§1](https://arxiv.org/html/2606.04779#S1.p1.1),[§2](https://arxiv.org/html/2606.04779#S2.p2.5)\.
- M\. Schemmer, N\. Kuehl, C\. Benz, A\. Bartos, and G\. Satzger \(2023\) Appropriate reliance on AI advice: Conceptualization and the effect of explanations\. In Proceedings of the 28th International Conference on Intelligent User Interfaces, IUI ’23, New York, NY, USA, pp\. 410–422\. Cited by: [§1](https://arxiv.org/html/2606.04779#S1.p3.1), [§2](https://arxiv.org/html/2606.04779#S2.p1.1), [§4\.2](https://arxiv.org/html/2606.04779#S4.SS2.p1.8), [§9](https://arxiv.org/html/2606.04779#S9.p2.1)\.
- J\. D\. Stasheff \(1963\) Homotopy associativity of H\-spaces\. I\. Transactions of the American Mathematical Society 108\(2\), pp\. 275–292\. Cited by: [§7](https://arxiv.org/html/2606.04779#S7.p2.4)\.
- D\. Tamari \(1962\) Problèmes d’associativité des monoides et problèmes des mots pour les groupes\. Séminaire Dubreil\. Algèbre et théorie des nombres 16\(1\), pp\. 1–29\. Cited by: [§1](https://arxiv.org/html/2606.04779#S1.p5.1), [§7](https://arxiv.org/html/2606.04779#S7.p1.1), [Definition 7](https://arxiv.org/html/2606.04779#Thmdefinition7)\.
- C\. Tsallis \(1988\) Possible generalization of Boltzmann\-Gibbs statistics\. Journal of Statistical Physics 52\(1\), pp\. 479–487\. Cited by: [Corollary 3](https://arxiv.org/html/2606.04779#Thmcorollary3.p1.2.2)\.
- M\. Vaccaro, A\. Almaatouq, and T\. Malone \(2024\) When combinations of humans and AI are useful: a systematic review and meta\-analysis\. Nature Human Behaviour 8\(12\), pp\. 2293–2303\. Cited by: [§1](https://arxiv.org/html/2606.04779#S1.p1.1), [§9](https://arxiv.org/html/2606.04779#S9.p5.1)\.
- R\. Verma, D\. Barrejón, and E\. Nalisnick \(2023\) Learning to defer to multiple experts: consistent surrogate losses, confidence calibration, and conformal ensembles\. In International Conference on Artificial Intelligence and Statistics, pp\. 11415–11434\. Cited by: [§2](https://arxiv.org/html/2606.04779#S2.p3.1)\.
- N\. S\. Yanofsky \(2024\) Monoidal Category Theory: Unifying Concepts in Mathematics, Physics, and Computing\. MIT Press\. Cited by: [§1](https://arxiv.org/html/2606.04779#S1.p5.1), [§7](https://arxiv.org/html/2606.04779#S7.p6.7)\.
- Y\. Zhang, Q\. V\. Liao, and R\. K\. Bellamy \(2020\) Effect of confidence and explanation on accuracy and trust calibration in AI\-assisted decision making\. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp\. 295–305\. Cited by: [§2](https://arxiv.org/html/2606.04779#S2.p1.1), [§4\.2](https://arxiv.org/html/2606.04779#S4.SS2.p1.8)\.