Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

arXiv cs.AI Papers

Summary

This paper argues that consensus-seeking in multi-agent LLM systems is insufficient for value-laden tasks, proposing a knowledge-representation layer that classifies agent reasoning-trace disagreements into four symbolic states to enable strategic routing in systems like content moderation.

arXiv:2606.04223v1 Announce Type: new Abstract: Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work on reasoning-trace disagreement in human-AI collaborative moderation, we propose a knowledge-representation layer in which reasoning traces and agent decisions are abstracted into symbolic disagreement states. Given agents producing explicit reasoning traces and binary decisions, we distinguish four states according to reasoning similarity and conclusion agreement: convergent agreement, divergent agreement, convergent disagreement and divergent disagreement. These states support defeasible strategic routing rules. We instantiate the framework in content moderation and argue that disagreement-aware routing provides a bridge between sub-symbolic LLM deliberation and symbolic knowledge representation for multi-agent strategic reasoning.

Cached at: 06/05/26, 02:05 AM

# Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
Source: [https://arxiv.org/html/2606.04223](https://arxiv.org/html/2606.04223)
Jarosław A. Chudziak (1, 2)

Affiliations: (1) Laboratory of The New Ethos, Warsaw University of Technology, Warsaw, Poland; (2) Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland

Emails: {michal.wawer.stud, jaroslaw.chudziak}@pw.edu.pl


## 1 Introduction

LLM\-based multi\-agent systems are increasingly used as collective reasoning architectures \(?;?\) in which several agents deliberate, debate, or aggregate judgments before producing a final output \(?;?\)\. Existing approaches typically treat inter\-agent disagreement as a defect to be reduced through majority voting, additional debate rounds, or robust aggregation \(?;?;?;?;?\)\. This is plausible for instrumental tasks where disagreement signals noise or reasoning failure\. It is far less appropriate for value\-laden tasks, where disagreement may be a stable property of the decision problem itself\.

Content moderation is paradigmatic \(?\)\. Decisions about harmful speech, group\-directed language, or political criticism involve competing values, contextual interpretation, and socially situated judgment \(?;?;?;?\)\. Annotator disagreement on such cases is not always error to be averaged away: it may reflect perspectival variation or genuine value pluralism \(?;?\)\. The same observation applies to LLM agents: when differently profiled agents disagree, the disagreement may itself be informative\.

We build on this observation by extending our previous research (?) with a knowledge-representation layer that abstracts agent reasoning traces and decisions into a small set of symbolic states, and a defeasible policy that routes each state to a strategic meta-action. Three contributions follow. First, we reframe disagreement as a representable epistemic state of the multi-agent system rather than an aggregation obstacle (?). Second, we define a compact taxonomy along two dimensions, reasoning similarity and conclusion agreement, yielding four states: convergent agreement (CA), divergent agreement (DA), divergent disagreement (DD), and convergent disagreement (CD). Third, we associate these states with defeasible routing rules so that the system reasons not only about what to decide, but about whether to decide, to inquire, or to escalate.

## 2 Disagreement as a Knowledge-Representation Signal

We model an LLM-based multi-agent system (?) as a finite set of agents $A=\{a_1,\dots,a_n\}$. For a case $c$ (a content item), each agent produces an output $O_i(c)=\langle r_i,d_i,v_i,\gamma_i\rangle$, where $r_i$ is an explicit reasoning trace, $d_i\in D$ is the agent's decision (here $D=\{\textsc{Keep},\textsc{Remove}\}$), $v_i$ is the value or perspective profile, and $\gamma_i$ is a confidence score. The KR layer treats $r_i$ as an observable justificatory artifact rather than a formal proof, in line with the standard view that agents have individual informational states while the system must determine a collective response (?;?).
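A minimal sketch of the output tuple $O_i(c)$ as an immutable Python record follows. The class name `AgentOutput`, its field names, and the assumption that the confidence $\gamma_i$ lies in $[0,1]$ are illustrative choices; the paper does not fix a data format.

```python
from dataclasses import dataclass

# Decision set D used in the moderation instantiation.
DECISIONS: tuple[str, ...] = ("Keep", "Remove")


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical record for O_i(c) = <r_i, d_i, v_i, gamma_i>."""

    reasoning: str     # r_i: explicit reasoning trace, treated as a justificatory artifact
    decision: str      # d_i: must be an element of D
    profile: str       # v_i: value or perspective profile
    confidence: float  # gamma_i: confidence score (assumed to lie in [0, 1])

    def __post_init__(self) -> None:
        # Reject decisions outside D and confidences outside the assumed range.
        if self.decision not in DECISIONS:
            raise ValueError(f"decision must be one of {DECISIONS}, got {self.decision!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {self.confidence}")


# Example: one agent's output for a single content item.
example = AgentOutput(
    reasoning="The post names a protected group and uses a slur in a hostile frame.",
    decision="Remove",
    profile="harm-focused",
    confidence=0.8,
)
```

Freezing the record keeps a trace and its decision paired once produced, which is what the KR layer consumes downstream.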

Two relations between agent outputs constitute the basic vocabulary. Let $sim(r_i,r_j)\in[0,1]$ denote the semantic similarity of two reasoning traces, with mean pairwise similarity $\overline{sim}(c)=\tfrac{2}{n(n-1)}\sum_{i<j}sim(r_i,r_j)$; given a threshold $\theta_s$, $HighSim(c)\equiv\overline{sim}(c)\geq\theta_s$ and $LowSim(c)\equiv\overline{sim}(c)<\theta_s$. The threshold is a policy parameter, not a universal semantic boundary. For conclusion agreement, let $p_d(c)=|\{a_i : d_i=d\}|/n$ and $p^*(c)=\max_{d\in D}p_d(c)$; given $\theta_a$, $Agree(c)\equiv p^*(c)\geq\theta_a$ and $Disagree(c)\equiv p^*(c)<\theta_a$. Conservative settings push $\theta_a$ toward unanimity; permissive settings accept a supermajority.
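Both predicates reduce to a few lines of arithmetic. The sketch below assumes the pairwise similarities are already available as a symmetric $n\times n$ matrix (how $sim$ itself is computed is left open here); the helper names and the threshold values in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def mean_pairwise_similarity(sim: Sequence[Sequence[float]]) -> float:
    """sim_bar(c) = 2 / (n(n-1)) * sum over i < j of sim(r_i, r_j)."""
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two agents")
    total = sum(sim[i][j] for i, j in combinations(range(n), 2))
    return 2.0 * total / (n * (n - 1))


def majority_share(decisions: Sequence[str]) -> float:
    """p*(c) = max over d of |{a_i : d_i = d}| / n."""
    if not decisions:
        raise ValueError("need at least one decision")
    return max(Counter(decisions).values()) / len(decisions)


def high_sim(sim_bar: float, theta_s: float) -> bool:
    return sim_bar >= theta_s  # HighSim(c); LowSim(c) is the negation


def agree(p_star: float, theta_a: float) -> bool:
    return p_star >= theta_a  # Agree(c); Disagree(c) is the negation


# Three agents with a symmetric similarity matrix (toy values).
sim = [[1.0, 0.8, 0.6],
       [0.8, 1.0, 0.7],
       [0.6, 0.7, 1.0]]
sim_bar = mean_pairwise_similarity(sim)               # about 0.7
p_star = majority_share(["Remove", "Remove", "Keep"])  # 2/3
```

With three agents splitting two-to-one, $p^*(c)=2/3$, so under a unanimity-leaning threshold such as $\theta_a=0.8$ the case counts as $Disagree(c)$, while a permissive supermajority threshold of $0.6$ would count it as $Agree(c)$.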

Combining the two dimensions yields four symbolic states:

$$
\begin{aligned}
CA(c) &\equiv HighSim(c)\wedge Agree(c), & DA(c) &\equiv LowSim(c)\wedge Agree(c),\\
CD(c) &\equiv HighSim(c)\wedge Disagree(c), & DD(c) &\equiv LowSim(c)\wedge Disagree(c).
\end{aligned}
$$
These are not merely empirical clusters; they are symbolic abstractions of the multi-agent system's epistemic situation, available to a controller. As in formal argumentation and nonmonotonic reasoning, conflicting reasons may support different conclusions, and we treat the resulting structure as a representable object (?;?;?). The state of greatest interest is $CD(c)$: when agents reason similarly but conclude differently, the residual disagreement is unlikely to be a difference of interpretation. It more plausibly reflects different value weightings on a shared description of the case, a candidate signature of normative pluralism rather than error. By contrast, $DD(c)$ suggests ambiguity or unstable interpretation, $DA(c)$ suggests robustness through independent reasons, and $CA(c)$ is the most straightforward case for automatic resolution. Figure [1](https://arxiv.org/html/2606.04223#S2.F1) summarizes the taxonomy together with the default meta-actions defined next.
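The four definitions form a two-by-two lookup once the two predicates are evaluated. A minimal sketch, taking the already-computed $\overline{sim}(c)$ and $p^*(c)$ as inputs; the `State` enum and the threshold values in the example are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    CA = "convergent agreement"
    DA = "divergent agreement"
    CD = "convergent disagreement"
    DD = "divergent disagreement"


def classify(sim_bar: float, p_star: float, theta_s: float, theta_a: float) -> State:
    """Project (sim_bar(c), p*(c)) onto one of the four symbolic states."""
    high_sim = sim_bar >= theta_s  # HighSim(c), else LowSim(c)
    agree = p_star >= theta_a      # Agree(c), else Disagree(c)
    if agree:
        return State.CA if high_sim else State.DA
    return State.CD if high_sim else State.DD


# Similar reasoning, split vote: the convergent-disagreement signature.
print(classify(sim_bar=0.85, p_star=0.6, theta_s=0.7, theta_a=0.8))  # State.CD
```

Because the states are defined by strict thresholds, every case lands in exactly one state; any softness (for example, a margin around $\theta_s$) would have to be added as a separate policy choice.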

![Figure 1](https://arxiv.org/html/2606.04223v1/x1.png)

Figure 1: The four disagreement states arise from combining reasoning similarity with conclusion agreement. Each state has a default meta-action $\sigma_R$ (Sec. [3](https://arxiv.org/html/2606.04223#S3)). Convergent disagreement is treated as the strongest candidate signal of value-laden conflict.

## 3 Defeasible Strategic Routing Rules

The disagreement states do not by themselves determine the moderation label. They determine a *meta-action*: the system reasons about whether to commit to an automatic decision at all. Let $d^*(c)=\arg\max_{d\in D}p_d(c)$ denote the most-supported decision. We consider four meta-actions: $Auto(c,d^*)$, which automatically accepts the strongest decision; $AutoExplain(c,d^*)$, which accepts the strongest decision but preserves diverse explanations; $SeekContext(c)$, which requests additional information or another deliberation round; and $Escalate(c)$, which forwards the case to human judgment.

We use $\Rightarrow$ to denote a default inference whose consequent normally holds but may be overridden by stronger policy or risk constraints, in the spirit of nonmonotonic reasoning (?). The base routing policy is:

$$
\begin{aligned}
R_1 &:\quad CA(c)\Rightarrow Auto(c,d^*), &&(1)\\
R_2 &:\quad DA(c)\Rightarrow AutoExplain(c,d^*), &&(2)\\
R_3 &:\quad DD(c)\Rightarrow SeekContext(c), &&(3)\\
R_4 &:\quad CD(c)\Rightarrow Escalate(c). &&(4)
\end{aligned}
$$
Rule $R_1$ captures the easy case: justificatory and decisional convergence jointly justify automation. $R_2$ handles agreement through diverse reasons; because the reasons differ, the system preserves explanation diversity rather than collapsing them into a single rationale, which matters when different stakeholders require different explanations (?;?). $R_3$ handles divergent disagreement: the system may not yet have a stable representation of the case, so context acquisition typically dominates immediate escalation. $R_4$ is the central rule. In $CD$, agents share a broadly similar interpretation but turn it into different decisions; forcing consensus (?) here may conceal rather than resolve a normative conflict.
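The base policy $R_1$ to $R_4$ is a fixed table from state to meta-action, with $d^*(c)$ attached only when the action commits to a label. A minimal sketch; the `MetaAction` enum, the `route` helper, and the first-seen tie-breaking for $d^*$ are assumptions (with an even number of agents, $\arg\max$ can tie, and the paper does not say how ties are broken).

```python
from collections import Counter
from enum import Enum
from typing import Optional, Sequence


class MetaAction(Enum):
    AUTO = "Auto"
    AUTO_EXPLAIN = "AutoExplain"
    SEEK_CONTEXT = "SeekContext"
    ESCALATE = "Escalate"


# Base policy: the default meta-action for each symbolic state.
BASE_POLICY = {
    "CA": MetaAction.AUTO,          # R1
    "DA": MetaAction.AUTO_EXPLAIN,  # R2
    "DD": MetaAction.SEEK_CONTEXT,  # R3
    "CD": MetaAction.ESCALATE,      # R4
}


def most_supported(decisions: Sequence[str]) -> str:
    """d*(c) = argmax_d p_d(c); ties go to the decision seen first (an assumption)."""
    return Counter(decisions).most_common(1)[0][0]


def route(state: str, decisions: Sequence[str]) -> tuple[MetaAction, Optional[str]]:
    """Apply R1-R4; only Auto and AutoExplain commit to a moderation label."""
    action = BASE_POLICY[state]
    if action in (MetaAction.AUTO, MetaAction.AUTO_EXPLAIN):
        return action, most_supported(decisions)
    return action, None  # SeekContext and Escalate defer the label
```

Returning `None` for the deferring actions keeps the distinction the section draws: the state decides *whether* to label, and only then does $d^*$ decide *which* label.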

Defeasibility is essential. Even $CA(c)$ may be overridden when content is legally sensitive or predicted harm is high; conversely, $CD(c)$ may not warrant escalation in low-risk cases with high escalation cost. The corresponding domain-level defaults are $HighRisk(c)\Rightarrow Escalate(c)$, $LegalRequirement(c)\Rightarrow Escalate(c)$, and $LowRisk(c)\wedge HighEscCost(c)\Rightarrow AutoExplain(c,d^*)$. The final meta-action results from the interaction between disagreement-state rules and domain rules, in line with classical defeasible-reasoning architectures (?;?). Decision-theoretically, each meta-action has a different cost profile: automation risks illegitimate decisions, $SeekContext$ adds latency, and $Escalate$ consumes scarce institutional capacity. The disagreement state provides a structured signal for allocating these costs, complementing judgment-aggregation perspectives that combine votes but do not, by themselves, decide *whether* to aggregate (?).
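The paper states that domain defaults can defeat the state rules but does not fix a conflict-resolution scheme. One simple reading, sketched below under that caveat, is a priority list: any applicable domain rule defeats the base rule, and earlier domain rules defeat later ones. The `CaseContext` flags, the ordering, and the string action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CaseContext:
    """Hypothetical domain flags; how each predicate is computed is out of scope here."""

    high_risk: bool = False
    legal_requirement: bool = False
    low_risk: bool = False
    high_esc_cost: bool = False


# Base defaults R1-R4 (symbolic state -> meta-action).
BASE_POLICY = {"CA": "Auto", "DA": "AutoExplain", "DD": "SeekContext", "CD": "Escalate"}

# Domain defaults in an assumed priority order: an earlier applicable rule defeats
# later ones, and any applicable domain rule defeats the base state rule.
DOMAIN_RULES: list[tuple[Callable[[CaseContext], bool], str]] = [
    (lambda ctx: ctx.high_risk, "Escalate"),                          # HighRisk => Escalate
    (lambda ctx: ctx.legal_requirement, "Escalate"),                  # LegalRequirement => Escalate
    (lambda ctx: ctx.low_risk and ctx.high_esc_cost, "AutoExplain"),  # LowRisk & HighEscCost => AutoExplain
]


def resolve(state: str, ctx: CaseContext) -> str:
    """Return the final meta-action after domain defaults have had a chance to override."""
    for applies, action in DOMAIN_RULES:
        if applies(ctx):
            return action
    return BASE_POLICY[state]
```

Under this ordering a high-risk $CA$ case is escalated, while a low-risk, costly-to-escalate $CD$ case is resolved automatically with explanations preserved. A specificity-based or cost-weighted priority would fit the text equally well.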

![Figure 2](https://arxiv.org/html/2606.04223v1/x2.png)

Figure 2: Architecture of the disagreement-aware controller. LLM agents deliberate at the object level, producing reasoning traces and decisions $\langle r_i,d_i\rangle$. The KR layer applies the abstraction $\Phi$ to extract one of four symbolic states $\sigma\in\{CA,DA,DD,CD\}$. Defeasible rules $R_1$–$R_4$ then map each state to a strategic meta-action; the convergent-disagreement path (highlighted) defaults to $Escalate$, but any rule may be overridden by domain-level defaults.

## 4 Empirical Faithfulness Check: Content Moderation

The framework above is normative: it prescribes how a controller should react to disagreement structure. We therefore check whether the symbolic abstraction is faithful in a weaker but useful sense: do the four states track empirically distinct epistemic situations, in particular ones that humans also find different? This is a sanity check on the KR layer, not a benchmark of the routing policy.

We reuse the experimental setup of (?). For each content item $c$, five LLM agents are instantiated from the same base model and differentiated by system prompts encoding distinct moderation perspectives: *harm-focused*, *context-sensitive*, *community-norms*, *free-expression*, and *legal-framework*. This isolates value-profile differences from base-capability differences. Each agent produces $\langle r_i,d_i,v_i,\gamma_i\rangle$, where $d_i\in\{\textsc{Keep},\textsc{Remove}\}$ and $r_i$ contains the agent's interpretation, considerations, value trade-offs and conclusion.

We use the Measuring Hate Speech corpus (?;?), which preserves annotator variation and supports perspectivist analysis. We sample 600 items stratified by human annotator disagreement (this sample size is distinct from the number of agents $n$). Reasoning traces are embedded into a shared vector space, and pairwise cosine similarity yields $\overline{sim}(c)$; the decision distribution yields $p^*(c)$. Each case is labeled with one of the four symbolic states, giving the abstraction

$$
\Phi:\ \langle (r_i,d_i)_{i=1}^{n}\rangle \;\longmapsto\; \sigma\in\{CA,DA,DD,CD\}.
$$

The faithfulness check asks two questions about $\Phi$: (i) do cases assigned to different states differ in human disagreement, and (ii) does the structural distinction drawn by $\Phi$ provide information beyond a magnitude-only baseline that ignores conclusion structure?
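End to end, $\Phi$ takes one embedding per trace plus the votes and returns a state label. The sketch below is a minimal stand-in under stated assumptions: the paper does not name its embedding model, so the vectors here are toy 3-dimensional lists, and the default thresholds of 0.8 are hypothetical. Averaging cosine similarity over the $n(n-1)/2$ pairs is the same quantity as $\overline{sim}(c)$.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def phi(embeddings: Sequence[Sequence[float]], decisions: Sequence[str],
        theta_s: float = 0.8, theta_a: float = 0.8) -> str:
    """Map one case's trace embeddings and decisions to CA, DA, CD or DD."""
    pairs = list(combinations(range(len(embeddings)), 2))
    sim_bar = sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)
    p_star = max(Counter(decisions).values()) / len(decisions)
    similarity = "C" if sim_bar >= theta_s else "D"  # convergent vs divergent reasoning
    conclusion = "A" if p_star >= theta_a else "D"   # agreement vs disagreement
    return similarity + conclusion


# Toy 3-d "embeddings": near-identical traces, split decisions.
traces = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.05, 0.0]]
print(phi(traces, ["Keep", "Remove", "Remove"]))  # CD
```

In practice the toy vectors would be replaced by the output of whatever sentence-embedding model is used; the mapping logic does not depend on that choice.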

## 5 Preliminary Results and Evaluation

Table [1](https://arxiv.org/html/2606.04223#S5.T1) reports the distribution of cases across states and the corresponding mean human annotator disagreement $\bar d$. The ordering predicted on conceptual grounds, $DA<CA<DD<CD$, is preserved: divergent agreement is the most stable state and convergent disagreement the least. The two disagreement states $\{CD,DD\}$ are jointly separated from the two agreement states $\{CA,DA\}$ with effect size Cohen's $d=0.80$ ($p<10^{-11}$, 600 cases), suggesting that the structural abstraction tracks something humans also pick up on.
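For readers checking the effect-size arithmetic, the sketch below computes Cohen's $d$ with the common pooled-standard-deviation definition. The paper does not state which variant it uses, so this convention is an assumption, and the scores are toy values, not the study's data.

```python
import math
import statistics
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d: difference of means divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Toy per-case human disagreement scores (not the paper's data).
disagreement_states = [0.6, 0.8, 1.0]  # e.g. cases labeled CD or DD
agreement_states = [0.4, 0.6, 0.8]     # e.g. cases labeled CA or DA
effect = cohens_d(disagreement_states, agreement_states)  # close to 1.0
```

Here the group means differ by 0.2 and each group's standard deviation is 0.2, so $d\approx 1.0$; a reported $d=0.80$ means the gap between the two families of states is about four fifths of a pooled standard deviation.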

Table 1: Distribution of cases over symbolic states and mean human annotator disagreement $\bar d\in[0,1]$. The predicted ordering $DA<CA<DD<CD$ is preserved.

A natural baseline is to use only the magnitude of disagreement, e.g., $1-\overline{sim}(c)$, ignoring conclusion structure. Table [2](https://arxiv.org/html/2606.04223#S5.T2) compares the two as predictors of high human disagreement: category-based routing achieves higher F1 than divergence-only and substantially exceeds chance. Divergence-only achieves high recall but lower precision: it flags many cases where agents reason differently without that necessarily corresponding to human disagreement. This is precisely the information the $CD/DD$ distinction relies on. A purely metric account loses the second axis (whether agents nevertheless converge on a decision), so it cannot tell likely-normative convergent disagreement ($CD$) apart from convergent agreement ($CA$), nor likely-ambiguous divergent disagreement ($DD$) apart from divergent agreement ($DA$). Figure [3](https://arxiv.org/html/2606.04223#S5.F3) visualizes the qualitative ordering.
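The comparison in Table 2 is a standard precision/recall/F1 evaluation of two binary flaggers against a binary "high human disagreement" label. The sketch below assumes the category-based flag fires for the two disagreement states ($CD$ and $DD$), which the paper does not spell out; the threshold `tau` and all case values are hypothetical toy data.

```python
from typing import Sequence


def precision_recall_f1(flags: Sequence[bool], truth: Sequence[bool]) -> tuple[float, float, float]:
    """Precision, recall and F1 of binary flags against binary ground truth."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def category_flags(states: Sequence[str]) -> list[bool]:
    # Assumption: the category-based router flags the two disagreement states.
    return [s in ("CD", "DD") for s in states]


def divergence_flags(sim_bars: Sequence[float], tau: float) -> list[bool]:
    # Magnitude-only baseline: flag when 1 - sim_bar(c) reaches the threshold tau.
    return [1.0 - s >= tau for s in sim_bars]


# Toy cases (not the paper's data): per-case state, sim_bar, and whether humans disagreed.
states = ["CD", "DD", "DA", "CA"]
sim_bars = [0.90, 0.40, 0.30, 0.95]
human_high = [True, True, False, False]

print(precision_recall_f1(category_flags(states), human_high))                # (1.0, 1.0, 1.0)
print(precision_recall_f1(divergence_flags(sim_bars, tau=0.5), human_high))  # (0.5, 0.5, 0.5)
```

On these toy cases the baseline over-flags the divergent-agreement case and misses the convergent-disagreement case, because it never looks at the vote.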

Table 2: Flagging high human-disagreement cases. Category-based routing uses $\Phi$; divergence-only uses $1-\overline{sim}(c)$ thresholded at the same operating point.

The check is preliminary: it uses a single corpus, prompt-based agent differentiation, and embedding-based similarity. Stronger conclusions require independently parameterized agents and alternative similarity functions. The result we treat as load-bearing is qualitative, namely the ordering and the agreement/disagreement gap, rather than the specific F1 numbers.

![Figure 3](https://arxiv.org/html/2606.04223v1/x3.png)

Figure 3: Observed mean human disagreement $\bar d$ per symbolic state, with the conceptually predicted rank order on the right ($1=$ lowest, $4=$ highest). The qualitative ordering $DA<CA<DD<CD$ is preserved.

## 6 Discussion: From Consensus to Strategic Escalation

The framework reframes the design goal of LLM\-based multi\-agent systems\. A consensus\-seeking system asks how agents can be made to agree; a disagreement\-aware system asks what the structure of disagreement implies about the appropriate next action\. Reasoning traces are central to this shift: a vote alone does not reveal whether agents disagree because they misread the case or weigh shared considerations differently\. By comparing traces and decisions jointly, the controller distinguishes interpretive from evaluative disagreement in a manner reminiscent of argumentation frameworks where conclusions depend on the structure of supporting and attacking reasons \(?;?;?\)\.

The state $CD(c)$ does the most strategic work. In factual tasks it would look like inconsistency; in normative tasks it more plausibly indicates that agents share a description of the case and differ in value prioritization. Collapsing such cases into a single automatic decision risks hiding legitimately contested situations. Escalation here is not a failure of automation but a rational meta-action under normative uncertainty. Symmetrically, $DD(c)$ should typically trigger context acquisition rather than human review, because the system has not yet stabilized a representation. This $CD/DD$ asymmetry is a chief benefit of the taxonomy: both states involve disagreement, but they call for different strategies. We do not claim that LLM reasoning traces are formal proofs, nor that semantic similarity captures logical equivalence. The KR layer is a pragmatic interface between sub-symbolic deliberation and symbolic strategic control (?;?).

Multi-agent debate, round-table consensus and Byzantine-tolerant aggregation (?;?;?;?) share a design assumption that disagreement is a transient state to be resolved before output. The empirical line on *perspectivist NLP and content moderation* (?;?;?) challenges this at the data level, treating annotator disagreement as informative rather than noisy. We transfer this perspectivist stance from data to system architecture, making disagreement a representable state of the multi-agent system rather than a defect. Judgment aggregation (?;?) combines individual judgments without exposing the reasoning behind each. Formal argumentation (?;?;?) exposes that reasoning as attack and support relations. Our four-state projection sits between the two: it preserves enough reasoning structure to distinguish shared from divergent interpretations (beyond what aggregation sees), but treats interpretations as propositional abstractions rather than full argumentation graphs.

Several limitations are worth flagging\. Prompt\-based perspective differentiation may underrepresent the heterogeneity of independent agents\. Embedding similarity is a coarse proxy for reasoning equivalence\. The routing rules are hand\-designed defaults rather than learned or formally verified policies\. The empirical check covers a single domain\. Promising directions follow from each\. The KR layer can be enriched with explicit beliefs, preferences, and norms\. Reasoning traces can be coupled to argumentation graphs, so support, attack, and undercutting are detected directly\. Explicit cost models would let escalation choices be analyzed game\-theoretically\. Finally, the faithfulness check should be replicated in domains such as medical triage and legal assistance, where disagreement plausibly carries similar structural significance\.

## 7 Conclusion

LLM-based multi-agent systems are typically designed to suppress disagreement. This goal is strategically insufficient in value-laden tasks, where disagreement may be a stable property of the case rather than a transient defect. We proposed a knowledge-representation layer that abstracts agent reasoning traces and decisions into four symbolic states ($CA$, $DA$, $DD$, $CD$), together with a defeasible policy mapping each state to a strategic meta-action.

The result is an explicit interface between sub-symbolic LLM deliberation and symbolic strategic control: the system reasons not only about what to decide, but about when to decide, when to inquire, and when to escalate. A faithfulness check in content moderation is consistent with the claim that the structure of disagreement carries information its magnitude does not, with convergent disagreement most strongly tracking human normative conflict. Natural extensions include coupling traces to argumentation graphs, learning the routing rules, and experimenting with different LLMs.

## References

Similar Articles

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

arXiv cs.AI

This paper reveals that aggregating complete reasoning traces from multiple LLM agents, rather than just their final answers, can correct errors even when agents unanimously agree, introducing the 'aggregation paradox' and the Self-Consistent Mixture of Agents method.

Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling

arXiv cs.AI

This paper introduces SIGMA, a signed graph-informed multi-agent reasoning framework that explicitly models trust, conflict, and neutral relations among LLM agents to achieve conflict-resilient and globally consistent predictions, outperforming state-of-the-art baselines on six benchmarks.