Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks
Summary
This paper proposes a randomized anchoring strategy to mitigate anchoring bias in LLM-based agents for energy-efficient 6G autonomous networks, achieving up to 25% energy savings using a lightweight 1B-parameter model.
# Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks
Source: [https://arxiv.org/html/2606.18272](https://arxiv.org/html/2606.18272)
Hatim Chergui, Claudia Carballo González, Farhad Rezazadeh, and Merouane Debbah

H. Chergui and C. C. González are with the i2CAT Foundation, 08034 Barcelona, Spain (e-mail: hatim.chergui@i2cat.net). F. Rezazadeh is with the Universitat Politècnica de Catalunya (UPC), 08034 Barcelona, Spain. M. Debbah is with the Research Institute for Digital Future, Khalifa University, 127788 Abu Dhabi, UAE.
###### Abstract
This paper presents an autonomous agentic resource negotiation framework designed to enable zero-touch network slicing in 6G architectures using Large Language Model (LLM) agents. While LLMs offer powerful reasoning capabilities, we demonstrate that such agents inherently suffer from anchoring bias, rigidly adhering to initial heuristic proposals and causing severe network over-provisioning. To systematically mitigate this cognitive bias, we propose a novel randomized anchoring strategy modeled via a Truncated 3-Parameter Weibull distribution. This mathematically bounded approach integrates with burst-aware Digital Twins (DTs) employing Conditional Value at Risk (CVaR) to guarantee strict Service Level Agreement (SLA) tail-latencies. To validate our methodology, we introduce and prove the *Bimodal Constraint-Avoidance Utility Theorem*, demonstrating that while feasible negotiations follow classical convex bounds, highly constrained scenarios undergo a phase transition governed by an inverse rational decay envelope. Empirical results generated using a locally hosted 1B-parameter model (otel-llm-1b-it) confirm these dual-regime bounds. Our cognitive de-biasing successfully dismantles rigid negotiation patterns, forcing agents into active exploration to safely ride SLA boundaries and boost system energy savings up to 25%. Crucially, the lightweight 1B LLM achieves sub-second inference latencies (0.95 s mean), ensuring our multi-agent framework is compatible with the operational timescales of the O-RAN non-Real-Time RAN Intelligent Controller (non-RT RIC).

Footnote 1: Our source code is available for non-commercial use at [https://github.com/HatimChergui](https://github.com/HatimChergui).

## I Introduction
The transition toward sixth-generation (6G) networks is significantly increasing the complexity of managing ubiquitous connectivity and stringent service requirements, exposing the limitations of current network automation approaches. In this context, 6G wireless systems are being driven toward a vision of operational self-governance. To meet the rigorous demands of TM Forum's Level 4 and Level 5 autonomy (closed-loop and full autonomy, respectively) \[[8](https://arxiv.org/html/2606.18272#bib.bib17)\], network architectures must move beyond conventional automation paradigms. This evolution necessitates the deployment of *agentic systems* \[[3](https://arxiv.org/html/2606.18272#bib.bib18)\]. Unlike legacy controllers, these Large Language Model (LLM)-driven entities are designed to reason, plan, and negotiate in a high-level objective space, enabling slice orchestration and service assurance in highly dynamic environments. However, delegating critical network operations to autonomous agents introduces new challenges related to reliability, robustness, and decision integrity.

Recent scholarship suggests a troubling phenomenon in these advanced architectures: artificial intelligence (AI) agents frequently exhibit cognitive biases that mirror human psychological distortions. Rooted in the foundational work of Tversky and Kahneman on heuristics and systematic errors \[[9](https://arxiv.org/html/2606.18272#bib.bib13)\], these biases can compromise the collective decision-making, fairness, and safety of agent-augmented 6G systems. As Xie et al. \[[11](https://arxiv.org/html/2606.18272#bib.bib21)\] observe, these distortions are particularly prevalent in multi-agent systems, which serve as the blueprint for decentralized 6G architectures. Their impact propagates across the entire functional pipeline, manifesting in four key layers. At the data level, biases may arise from historical or cultural imbalances in training datasets, for instance leading to *legacy bias* where agents fail to fully exploit advanced 6G capabilities. At the prompt level, framing effects can skew decision-making, such as prioritizing spectral efficiency at the expense of energy efficiency or fairness. During reasoning, agents may rely on flawed heuristics \[[6](https://arxiv.org/html/2606.18272#bib.bib25)\], exhibiting behaviours such as availability-driven over-provisioning or confirmation bias in threat detection. Finally, biases also affect tool and memory integration, where recency, primacy, or authority effects can distort the use of historical data and external information sources.

Consequently, the intersection of cognitive psychology and autonomous networking has become a critical research direction. A foundational contribution is provided by Chergui et al. \[[1](https://arxiv.org/html/2606.18272#bib.bib16)\], who presented a structured tutorial on cognitive biases within 6G agentic systems. Their work establishes the mathematical formulations for these biases and proposes mitigation strategies at both the agent and system levels, supported by practical 6G use cases.

Beyond individual agent errors, research has highlighted the risks inherent in multi-agent interaction. In \[[2](https://arxiv.org/html/2606.18272#bib.bib2)\], the authors demonstrate that iterative agent discussions can amplify existing biases, creating *conversational echo chambers* where agents prematurely converge on a skewed consensus. Furthermore, in \[[10](https://arxiv.org/html/2606.18272#bib.bib29)\], the authors study cognitive biases in LLM-based interpersonal conflict resolution, showing that model judgments shift under biased prompt phrasing. They propose BiasGUARRD, a multi-agent framework that detects and mitigates such biases in socially sensitive decision-making. In \[[5](https://arxiv.org/html/2606.18272#bib.bib30)\], the authors likewise analyze limitations of Multi-Agent Debate for LLM reasoning, showing that it can reinforce biases, and propose a refined multi-agent prompting framework that enhances reasoning diversity and reduces bias, leading to improved decision accuracy and robustness across strategic tasks. Empirical validation of these concerns is provided by Knipper et al. \[[4](https://arxiv.org/html/2606.18272#bib.bib22)\], who indicate that while larger models ($>$32B parameters) tend to reduce bias in approximately 39.5% of cases, more detailed prompting, while generally helpful, can actually increase certain errors, such as Overattribution, by up to 8.8%.
In light of these challenges, this paper makes the following key contributions:
- Anchoring Bias & Tail-Risk Profiling: We demonstrate how initial proposals trap agents in rigid, over-provisioned optima. To ensure strict SLA compliance under stochastic traffic bursts, we integrate Conditional Value at Risk (CVaR) into the agent's Digital Twin (DT).
- Bimodal Constraint-Avoidance Theorem: We introduce and rigorously prove a novel theoretical framework describing a phase transition in utility degradation bounds. We map empirical utility loss to show dual-regime behavior: a classical linear bound for feasible conditions and an inverse rational decay bound for constraint-heavy environments.
- Truncated Weibull Mitigation: We propose an adaptive, randomized anchoring strategy via a Truncated Weibull distribution (customized via the shape parameter $k$) to safely explore energy-efficient configurations and dismantle rigid negotiation patterns.
- non-RT RIC Compatibility via 1B LLMs: Leveraging the lightweight otel-llm-1b-it model, our multi-agent framework executes complex negotiations with sub-second response times (0.95 s), yielding up to 25% global energy savings while strictly meeting 99.999th percentile URLLC latencies.
## II Network Slicing CVaR Queuing Model

### II-A System Dynamics and Edge-RAN Queues

Our architecture (see Figure [1](https://arxiv.org/html/2606.18272#S2.F1)) considers a multi-domain network slicing environment encompassing an Edge computing domain and a Radio Access Network (RAN). Service requests for a specific slice $i$ are first processed at the Edge, resulting in a computation latency $L^{\text{edge}}_{i}$. Subsequently, the processed packets are enqueued for wireless transmission via the RAN, incurring a transmission latency $L^{\text{RAN}}_{i}$. Consequently, the total end-to-end (E2E) latency is defined as:

$$L_{i}=L^{\text{edge}}_{i}+L^{\text{RAN}}_{i}. \tag{1}$$

Both the Edge and RAN domains operate under finite capacity. Agents representing each slice must negotiate for a partition of the total available RAN bandwidth ($b_{\text{tot}}$) and Edge CPU capacity ($f_{\text{tot}}$). The decision of agent $i$ is represented by the action vector $a_{i}=(b_{i},f_{i})$.
Figure 1: Edge-RAN cross-domain slicing model.

To model these dynamics, each agent $i$ maintains a private DT grounded in queuing theory. At each discrete time interval $t$ of duration $\tau$, a volume of bits $\Lambda_{i,t}$ arrives at the Edge according to a time-varying, trial-specific stochastic process:

$$\Lambda_{i,t}=\lambda_{i,t}\cdot\tau, \tag{2}$$

where the mean arrival rate $\mathbb{E}[\lambda_{i,t}]$ is maintained below the service rate to ensure queue stability. The evolution of the Edge computation queue, $Q^{(e)}_{i,t}$, is modeled as:

$$Q^{(e)}_{i,t+1}=\max\left(0,Q^{(e)}_{i,t}-D^{(e)}_{i,t}\right)+\Lambda_{i,t}, \tag{3}$$

where $D^{(e)}_{i,t}$ represents the bits processed at the Edge:

$$D^{(e)}_{i,t}=\tau\cdot C_{i,t}^{(e)}(f_{i})=\tau\cdot f_{i}\cdot C_{\text{CPU}}. \tag{4}$$

The RAN communication queue, $Q^{(r)}_{i,t}$, is updated based on the output of the preceding computation stage:

$$\begin{split}Q^{(r)}_{i,t+1}&=\max\left(0,Q^{(r)}_{i,t}-D^{(r)}_{i,t}\right)\\ &\quad+\min\left(Q^{(e)}_{i,t}+\Lambda_{i,t},D^{(e)}_{i,t}\right),\end{split} \tag{5}$$

where $D^{(r)}_{i,t}$ is the volume of transmitted bits, depending on the bandwidth allocation $b_{i}$ and the stochastic Spectral Efficiency (SE), $\eta_{i,t}$:

$$D^{(r)}_{i,t}=\tau\cdot C_{i,t}^{(r)}(b_{i},\eta_{i,t})=\tau\cdot b_{i}\cdot\eta_{i,t}. \tag{6}$$
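As a minimal sketch, one interval of the queue recursions in Eqs. (2)-(6) could be coded as below. The function name `step_queues`, its argument names, and the numeric values in the usage note are illustrative assumptions, not taken from the authors' code.

```python
def step_queues(q_edge, q_ran, arrival_rate, f_i, b_i, spectral_eff, c_cpu, tau):
    """Advance the Edge and RAN queues by one interval, following Eqs. (2)-(6).

    All argument names are hypothetical; units must be chosen consistently
    (for example bits, seconds, Hz) by the caller.
    """
    arrivals = arrival_rate * tau          # Eq. (2): bits arriving this interval
    d_edge = tau * f_i * c_cpu             # Eq. (4): bits the Edge can process
    d_ran = tau * b_i * spectral_eff       # Eq. (6): bits the RAN can transmit

    # Eq. (5): drain the RAN queue, then add what the Edge stage forwards.
    forwarded = min(q_edge + arrivals, d_edge)
    next_ran = max(0.0, q_ran - d_ran) + forwarded

    # Eq. (3): drain the Edge queue, then add the new arrivals.
    next_edge = max(0.0, q_edge - d_edge) + arrivals
    return next_edge, next_ran
```

For instance, starting from empty queues, `step_queues(0.0, 0.0, 100.0, 2.0, 10.0, 5.0, 100.0, 0.01)` applies one interval with made-up rates; iterating the call over a traffic trace yields the backlog series used by the latency estimate.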
Applying Little's Law, we define the average E2E latency $L_{i,T}$ over a horizon $T$ as the ratio of aggregate queue lengths to the average arrival rate:

$$L_{i,T}=\frac{1}{\mathbb{E}[\Lambda_{i,t}]\,T}\sum_{t=1}^{T}\left(Q^{(e)}_{i,t}+Q^{(r)}_{i,t}\right). \tag{7}$$

The agent's goal is to optimize the action vector $a_{i}=(b_{i},f_{i})$ to maintain $L_{i,T}$ within the SLA while minimizing a linear power consumption cost $P_{i}(a_{i})$:

$$P_{i}(a_{i})=P_{\text{static},i}+C_{\text{BW}}\cdot b_{i}+C_{\text{CPU}}\cdot f_{i}, \tag{8}$$

where $C_{\text{BW}}$ and $C_{\text{CPU}}$ are the power consumptions per bandwidth and CPU frequency units, respectively.
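The two quantities the agent trades off, Eq. (7) and Eq. (8), can be sketched as plain functions. The names and the sample numbers are assumptions made for illustration. Since $\Lambda_{i,t}$ counts bits per interval, the ratio in Eq. (7) appears to be expressed in intervals, so multiplying by $\tau$ would convert it to seconds.

```python
def mean_e2e_latency(edge_queue, ran_queue, mean_arrivals):
    """Eq. (7): total backlog summed over the horizon, divided by E[Lambda] * T.

    edge_queue and ran_queue are equal-length sequences of queue lengths;
    mean_arrivals is the mean number of bits arriving per interval.
    """
    if len(edge_queue) != len(ran_queue) or len(edge_queue) == 0:
        raise ValueError("queue traces must be non-empty and of equal length")
    horizon = len(edge_queue)
    backlog = sum(qe + qr for qe, qr in zip(edge_queue, ran_queue))
    return backlog / (mean_arrivals * horizon)


def power_cost(b_i, f_i, p_static, c_bw, c_cpu):
    """Eq. (8): static power plus linear bandwidth and CPU terms."""
    return p_static + c_bw * b_i + c_cpu * f_i
```

With the hypothetical traces `[1, 1]` and `[1, 3]` and one bit arriving per interval, the first function averages a backlog of 6 bits over 2 intervals.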
### II-B Digital Twin and CVaR Tail-Latency Prediction

To guarantee robustness against traffic bursts, the agent's internal DT evaluates proposed actions $a_{i}$ using the Conditional Value at Risk (CVaR) of the latency distribution, rather than the mean. For an M/M/1 approximation where the sojourn time follows an exponential distribution, the expected shortfall at the $1-\alpha$ confidence level (e.g., $\alpha=0.00001$ for $99.999\%$ URLLC reliability) is formulated as

$$\text{CVaR}_{1-\alpha}(L_{i})=\mathbb{E}[L_{i}]\left(1-\ln(\alpha)\right). \tag{9}$$

Agents negotiate primarily over this strict CVaR metric. If the negotiated configuration breaches $L_{\text{SLA},i}$, an extreme utility penalty $\mathcal{L}_{\max}$ is incurred.
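To make Eq. (9) concrete, the sketch below evaluates the CVaR multiplier and inverts it to find the largest mean latency compatible with a given SLA. The function names are illustrative assumptions. For $\alpha=10^{-5}$ the multiplier $1-\ln(\alpha)$ is roughly 12.5, so under the exponential approximation a 10 ms CVaR target requires a mean latency of about 0.8 ms.

```python
import math


def cvar_exponential(mean_latency, alpha):
    """Eq. (9): CVaR at level 1 - alpha for an exponentially distributed latency."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return mean_latency * (1.0 - math.log(alpha))


def max_mean_latency(sla_latency, alpha):
    """Largest mean latency whose CVaR still meets the SLA (Eq. (9) inverted)."""
    return sla_latency / (1.0 - math.log(alpha))


# Hypothetical usage with the URLLC target from the experimental setup.
urllc_budget_ms = max_mean_latency(10.0, 1e-5)
```

The same inversion gives the DT a quick feasibility screen: any candidate allocation whose predicted mean latency exceeds this budget cannot satisfy the CVaR constraint under the stated approximation.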
## III Agentic Negotiation & Bimodal Bounds

Anchoring bias arises when the initial resource proposal of each agent $i$, denoted as the vector $a_{i}^{(0)}=(b_{i}^{(0)},f_{i}^{(0)})$, systematically influences subsequent allocation updates. This hinders the exploration of feasible multi-resource configurations under the global system constraints. This effect is particularly critical in multi-round negotiations, as governed by Algorithm [1](https://arxiv.org/html/2606.18272#algorithm1).
Input: Anchor strategy $\mathcal{S}$

Output: Final agreement $\mathcal{A}=(a_{e},a_{u})$ or fallback

- Initialization
    - Initialize DTs and agents $A_{i},\ \forall i\in\{e,u\}$
    - $a_{i}\leftarrow A_{i}.\text{ProposeAnchor}(\mathcal{S}),\ \forall i$, where $a_{i}=(b_{i},f_{i})$
    - $\mathcal{A}\leftarrow\emptyset$
- Turn-based negotiation loop: for $r\leftarrow 1$ to $R_{\max}$ do
    - $\Phi\leftarrow(b_{e}+b_{u}\leq b_{\text{tot}})\land(f_{e}+f_{u}\leq f_{\text{tot}})$
    - Evaluate CVaR utilities $U_{i}$ and SLA violations $V_{i},\ \forall i$
    - if $\Phi\land(\forall i,\ U_{i}\geq U_{\text{th}})$ then
        - $\mathcal{A}\leftarrow(a_{e},a_{u})$; break (mutual agreement)
    - $\delta_{\text{base}}\leftarrow\delta_{\max}\cdot\frac{R_{\max}-r+1}{R_{\max}}$ (decaying step size)
    - for $i\in\{e,u\}$ do (sequential contextual reasoning with PID-like proportional steps)
        - if $V_{i}$ then
            - $\eta\leftarrow\min\big(1.0,\max(0.25,\frac{L_{i}}{L_{\text{SLA},i}}-1)\big)$ (severity scale)
            - $\delta\leftarrow\delta_{\text{base}}\cdot\eta$
            - $ctx_{i}\leftarrow\text{DemandInc}(\delta)$ (proportional climb)
        - else if $\neg\Phi\lor V_{-i}$ then
            - $\delta\leftarrow\delta_{\text{base}}\cdot c_{\text{yield}}$
            - $ctx_{i}\leftarrow\text{Yield}(\delta)$ (hard concession)
        - else
            - $\eta\leftarrow\min\big(1.0,\max(0.1,\frac{L_{\text{SLA},i}}{\max(1.0,L_{i})}-1)\big)$ (safety scale)
            - $\delta\leftarrow\delta_{\text{base}}\cdot\omega_{\text{fine}}\cdot\eta$
            - $ctx_{i}\leftarrow\text{OptimizeEnergy}(\delta)$ (proportional descent)
        - $a_{i}\leftarrow A_{i}.\text{CounterPropose}(a_{-i},r,a_{i},ctx_{i})$
        - Clamp to maintain systemic bounds:
            - $b_{i}\leftarrow\max(b_{\min},\min(b_{i},b_{\text{tot}}-b_{-i}))$
            - $f_{i}\leftarrow\max(f_{\min},\min(f_{i},f_{\text{tot}}-f_{-i}))$
            - $a_{i}\leftarrow(b_{i},f_{i})$
- Finalization
    - if $\mathcal{A}=\emptyset\land\Phi$ then
        - $\mathcal{A}\leftarrow(a_{e},a_{u})$ (fallback to last feasible)
    - $A_{i}.\text{Memory}.\text{Update}(\mathcal{A}),\ \forall i$
    - return $\mathcal{A}$

Algorithm 1: Agentic Resource Negotiation

Formally, anchoring can be modeled as a regularized optimization problem:
$$a_{i}^{\star}=\arg\max_{a_{i}}\Big(U_{i}(a_{i})-\gamma_{i}\cdot d(a_{i},a_{i}^{(0)})\Big), \tag{10}$$

where $d(\cdot,\cdot)$ is a deviation penalty metric and $\gamma_{i}\geq 0$ quantifies the sensitivity of agent $i$ to its initial proposal. Incorporating the adaptive update logic of Algorithm [1](https://arxiv.org/html/2606.18272#algorithm1), the holistic resource adjustment rule is:

$$a_{i}^{(t+1)}=a_{i}^{(t)}+\delta_{t}\cdot\Phi_{i}\big(L_{i},P_{i},\text{DT}\big)-\gamma_{i}\big(a_{i}^{(t)}-a_{i}^{(0)}\big), \tag{11}$$

where $\Phi_{i}(\cdot)$ encodes the prioritized decision mechanism.
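The step-size logic of Algorithm 1, which determines how far an agent may move in each round, can be sketched as follows. This is a simplified illustration: the LLM call `CounterPropose` is left out, the function and argument names are assumptions, and the paper does not state values for $\delta_{\max}$, $c_{\text{yield}}$, or $\omega_{\text{fine}}$, so the numbers in the usage note are placeholders.

```python
def base_step(delta_max, r, r_max):
    """Decaying base step: delta_max * (r_max - r + 1) / r_max."""
    return delta_max * (r_max - r + 1) / r_max


def step_size(delta_base, latency, sla, violated, must_yield, c_yield, omega_fine):
    """Return (context, delta) for one agent, mirroring the three branches.

    violated   : the agent's own SLA is breached (V_i)
    must_yield : joint infeasibility or the opponent's SLA is breached
    """
    if violated:
        eta = min(1.0, max(0.25, latency / sla - 1.0))       # severity scale
        return "DemandInc", delta_base * eta                  # proportional climb
    if must_yield:
        return "Yield", delta_base * c_yield                  # hard concession
    eta = min(1.0, max(0.1, sla / max(1.0, latency) - 1.0))   # safety scale
    return "OptimizeEnergy", delta_base * omega_fine * eta    # proportional descent


def clamp(value, floor, total, other):
    """Keep a proposal above its floor and within the remaining capacity."""
    return max(floor, min(value, total - other))
```

For example, with a placeholder $\delta_{\max}=10$ and $R_{\max}=5$, `base_step` shrinks from 10 in round 1 to 2 in round 5, and `clamp(50, 5, 60, 20)` caps a 50 MHz request at the 40 MHz left over by the other slice.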
The impact of anchoring is quantified through utility degradation $\mathcal{L}_{i,\text{anchor}}=U_{i}(a_{i}^{\dagger})-U_{i}(a_{i}^{\star})$. Traditional convex bounds assert that $\mathcal{L}_{i,\text{anchor}}\leq\frac{\gamma_{i}^{2}}{\mu_{i}}\|a_{i}^{\dagger}-a_{i}^{(0)}\|^{2}$. However, in highly constrained multi-agent settings with discontinuous SLA penalties, empirical observations contradict these classical bounds. To explain this behavior, we formulate the following theorem:

###### Theorem III.1 (Bimodal Constraint-Avoidance Utility Bound).

Let $\mathcal{C}=\{\mathbf{a}\mid\sum_{j}a_{j}\leq C\}$ be the strict physical capacity constraint of the system, and let the agent's utility function include a penalty cliff $\mathcal{L}_{\max}$ for CVaR SLA violations. The anchoring-induced utility degradation $\mathcal{L}_{i,\text{anchor}}$ relative to the squared anchor distance $d_{i}=\|a_{i}^{\dagger}-a_{i}^{(0)}\|^{2}$ exhibits a phase transition characterized by a dual-regime envelope:

1. Phase 1 (Feasible Convergence): If $\sum a_{j}^{(0)}\leq C$, the expected utility loss is bounded by a classical convex linear regime:

   $$\mathcal{L}_{i}\leq\gamma_{i}d_{i}. \tag{12}$$

2. Phase 2 (Penalty Recovery): If $\sum a_{j}^{(0)}>C$, forcing mandatory algorithmic concessions and triggering the penalty cliff, the expected utility loss decays via an inverse rational envelope as strategic low-balling creates safety buffers:

   $$\mathcal{L}_{i}\leq\frac{\mathcal{L}_{\max}}{1+\kappa_{i}d_{i}}. \tag{13}$$
###### Proof.

Phase 1: When the initial multi-agent anchor resides within the interior of the feasible set $\text{int}(\mathcal{C})$, the penalty condition is inactive. Assuming the objective utility $U$ is strongly concave with parameter $\mu$, classical optimization distance bounding guarantees that $U(a^{\dagger})-U(a^{\star})\leq\frac{1}{2\mu}\|\nabla U\|^{2}\leq\gamma_{i}\|a^{\dagger}-a^{(0)}\|^{2}$.

Phase 2: When the initial joint anchor violates $\mathcal{C}$, the negotiation protocol imposes rigid retraction mappings (mandatory concessions) causing severe SLA breaches ($\mathcal{L}_{\max}$). In this regime, an agent can strategically introduce an initial deficit by choosing $a_{i}^{(0)}\ll a_{i}^{\dagger}$, mapping to a spatial safety buffer $s_{i}\propto\sqrt{d_{i}}$. Assuming the opponent's stochastic counter-proposals introduce exploration noise with finite variance $\sigma^{2}$, the probability of the final joint state violating the capacity constraint after the buffer $s_{i}$ is applied is bounded by the one-sided Chebyshev (Cantelli) inequality:

$$\mathbb{P}(\text{Violation})\leq\frac{\sigma^{2}}{\sigma^{2}+s_{i}^{2}}=\frac{\sigma^{2}}{\sigma^{2}+c\cdot d_{i}}. \tag{14}$$

Scaling this probability into the utility loss space yields the expected degradation $\mathcal{L}_{i}=\mathbb{P}(\text{Violation})\times\mathcal{L}_{\max}$. Letting $\kappa_{i}=c/\sigma^{2}$, we arrive at the inverse rational decay envelope $\frac{\mathcal{L}_{\max}}{1+\kappa_{i}d_{i}}$. ∎
Theorem [III.1](https://arxiv.org/html/2606.18272#S3.Thmtheorem1) provides a profound insight: starting closer to the optimum is actually a *hazard* if it triggers systemic constraint violations.
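A short numerical sketch of the two envelopes in Theorem III.1 is given below. The function names are illustrative assumptions; the parameter values $\gamma_{i}\approx 0.05$, $\kappa_{i}=0.03$, and $\mathcal{L}_{\max}\approx 1.2$ are the empirical fits reported in Section V-B. The last function restates the bound of Eq. (14) so the substitution $\kappa_{i}=c/\sigma^{2}$ can be checked numerically.

```python
def phase1_bound(d, gamma):
    """Eq. (12): linear envelope for feasible initial anchors."""
    return gamma * d


def phase2_bound(d, l_max, kappa):
    """Eq. (13): inverse rational envelope after the penalty cliff."""
    return l_max / (1.0 + kappa * d)


def cantelli_violation_bound(d, sigma2, c):
    """Eq. (14): violation probability bound with buffer s^2 = c * d."""
    return sigma2 / (sigma2 + c * d)


# Empirical values quoted in Section V-B of the paper.
GAMMA, KAPPA, L_MAX = 0.05, 0.03, 1.2

# Envelope values over a few squared anchor distances.
envelope = [(d, phase1_bound(d, GAMMA), phase2_bound(d, L_MAX, KAPPA))
            for d in (0.0, 10.0, 100.0)]
```

At $d_{i}=0$ the Phase 2 envelope equals the full penalty $\mathcal{L}_{\max}$, and it falls to a quarter of that value at $d_{i}=100$, which is the decay the low-balling argument relies on.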
## IV Weibull Randomized Bias Correction

To navigate the bounds of Theorem [III.1](https://arxiv.org/html/2606.18272#S3.Thmtheorem1) and eliminate energy inefficiencies caused by deterministic anchoring, we apply a *Truncated 3-Parameter Weibull Distribution* $\mathcal{W}_{\text{trunc}}(\alpha_{c},\beta_{c},\gamma_{c})$. By diversifying initial proposals across negotiation instances, the system reduces the risk of consistently converging to suboptimal equilibria induced by poor anchors in either bandwidth or computational resources.

We define the spatial constraints for $\mathcal{W}_{\text{trunc}}(\alpha_{c},\beta_{c},\gamma_{c})$ for each resource dimension $c\in\{b,f\}$ as follows:

1. Lower Bound ($\alpha_{c}$): The absolute physical floor of the proposal, e.g.,

   $$\alpha_{c}=\max(1.0,\;0.6\cdot c_{\text{req}}), \tag{15}$$

2. Truncation / Upper Bound ($\beta_{c}$): The exploratory ceiling, capped to prevent hoarding, e.g.,

   $$\beta_{c}=\min(0.9\cdot c_{\text{tot}},\;1.1\cdot c_{\text{req}}), \tag{16}$$

3. Target Mode ($\gamma_{c}$): The point of maximum probability density. We intentionally position this mode below the required baseline, e.g.,

   $$\gamma_{c}=0.90\cdot c_{\text{req}}. \tag{17}$$
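The example bounds of Eqs. (15)-(17) translate directly into a small helper. The name `weibull_bounds` and the sample inputs are assumptions for illustration; the coefficients are the "e.g." values given above.

```python
def weibull_bounds(c_req, c_tot):
    """Return (alpha_c, beta_c, gamma_c) from the example rules in Eqs. (15)-(17)."""
    alpha = max(1.0, 0.6 * c_req)            # Eq. (15): physical floor
    beta = min(0.9 * c_tot, 1.1 * c_req)     # Eq. (16): exploratory ceiling
    gamma = 0.90 * c_req                     # Eq. (17): target mode below c_req
    return alpha, beta, gamma


# Hypothetical input: 20 units required out of a 60-unit pool.
alpha_c, beta_c, gamma_c = weibull_bounds(20.0, 60.0)
```

For these inputs the ordering $\alpha_{c}<\gamma_{c}<\beta_{c}$ holds, which is what the truncated distribution needs for its mode to fall inside the support.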
Input: Slice $i\in\{e,u\}$, strategy $\mathcal{S}$, SLA limit $L_{\text{SLA},i}$, DT $D_{i}$, limits $c_{\text{tot}}\in\{b_{\text{tot}},f_{\text{tot}}\}$

Output: Initial multi-resource anchor vector $a_{i}^{(0)}=(b_{i}^{(0)},f_{i}^{(0)})$

- for each resource dimension $c\in\{b,f\}$ do
    - Find the strictly optimal resource for the CVaR SLA via the DT: $c_{\text{req}}\leftarrow\min\big\{c^{\prime}\in[c_{\text{abs\_min}},c_{\text{tot}}]\mid\text{CVaR}_{1-\alpha}(D_{i},c^{\prime})\leq L_{\text{SLA},i}\big\}$
    - $c_{\text{opt}}\leftarrow c_{\text{req}}/1.05$ (target underlying optimum without buffer)
    - if $\mathcal{S}=\text{randomized}$ then (slice-dependent Truncated Weibull anchor generation)
        - if $i=\text{URLLC}$ then
            - $(\alpha_{c},\gamma_{c},k_{i})\leftarrow(0.85\cdot c_{\text{opt}},\;0.98\cdot c_{\text{opt}},\;5.0)$
        - else
            - $(\alpha_{c},\gamma_{c},k_{i})\leftarrow(0.60\cdot c_{\text{opt}},\;0.90\cdot c_{\text{opt}},\;2.0)$
        - $\beta_{c}\leftarrow\min(0.9\cdot c_{\text{tot}},\;1.1\cdot c_{\text{opt}})$
        - $\lambda_{i}\leftarrow(\gamma_{c}-\alpha_{c})\cdot\big(\frac{k_{i}}{k_{i}-1}\big)^{1/k_{i}}$
        - $c_{i}^{(0)}\sim\mathcal{W}_{\text{trunc}}(\alpha_{c},\beta_{c},\gamma_{c};k_{i},\lambda_{i})$
        - $\text{QueryLLM}(\text{Prompt}_{\text{force}}(c_{i}^{(0)}))$
    - else (fixed strategy: default to greedy heuristic)
        - $\mathcal{H}\leftarrow\text{Memory.DistillStrategy()}$
        - $c_{i}^{(0)}\leftarrow\text{QueryLLM}(\text{Prompt}_{\text{heur}}(\mathcal{H},c_{\text{target}}\approx c_{\text{req}}))$
- return $a_{i}^{(0)}\leftarrow(b_{i}^{(0)},f_{i}^{(0)})$

Algorithm 2: Initial Resource Proposal with Anchor Bias Mitigation

(a) Bimodal Constraint-Avoidance Utility Bounds
(b) Utility Degradation CDF

Figure 2: Analytical validation of anchoring-induced utility degradation ($\mathcal{L}_{i,\text{anchor}}$) mapped against the squared anchor distance ($d_{i}$). The empirical data distinctly traces the *Bimodal Constraint-Avoidance* bounds proven in Theorem [III.1](https://arxiv.org/html/2606.18272#S3.Thmtheorem1).

The initial resource proposal $c_{i}^{(0)}$ is drawn from the continuous random variable $X\sim\mathcal{W}_{\text{trunc}}(\alpha_{c},\beta_{c},\gamma_{c})$. To ensure the peak of the distribution aligns exactly with our target mode $\gamma_{c}$, we derive the scale parameter $\lambda$ as:
$$\lambda=\left(\gamma_{c}-\alpha_{c}\right)\times\left(\frac{k}{k-1}\right)^{\frac{1}{k}}. \tag{18}$$
The resulting Truncated Weibull PDF $f(x)$ is formally defined over the support $\alpha_{c}\leq x\leq\beta_{c}$ as:

$$f(x)=\frac{1}{1-e^{-\left(\frac{\beta_{c}-\alpha_{c}}{\lambda}\right)^{k}}}\times\frac{k}{\lambda}\left(\frac{x-\alpha_{c}}{\lambda}\right)^{k-1}\exp\left(-\left(\frac{x-\alpha_{c}}{\lambda}\right)^{k}\right), \tag{19}$$

and 0 otherwise. The deployment of this strategy is shown in Algorithm [2](https://arxiv.org/html/2606.18272#algorithm2).
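As a brief check of why the scale in Eq. (18) places the peak at $\gamma_{c}$, the mode of the PDF in Eq. (19) can be derived by setting the derivative of $\ln f(x)$ to zero. The steps below are a sketch that assumes $k>1$ and $\gamma_{c}<\beta_{c}$, so the mode exists and lies inside the truncated support; the truncation factor is a constant and does not move it.

```latex
\begin{aligned}
\frac{d}{dx}\ln f(x)
  &= \frac{k-1}{x-\alpha_c}
     - \frac{k}{\lambda}\left(\frac{x-\alpha_c}{\lambda}\right)^{k-1} = 0 \\
\Rightarrow\quad \left(\frac{x^{\star}-\alpha_c}{\lambda}\right)^{k}
  &= \frac{k-1}{k} \\
\Rightarrow\quad x^{\star}
  &= \alpha_c + \lambda\left(\frac{k-1}{k}\right)^{1/k}.
\end{aligned}
```

Requiring $x^{\star}=\gamma_{c}$ and solving for $\lambda$ gives $\lambda=(\gamma_{c}-\alpha_{c})\left(\frac{k}{k-1}\right)^{1/k}$, which agrees with Eq. (18).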
We utilize adaptive shape parameters ($k$) to reflect specific slice tolerances. For URLLC, which possesses extreme sensitivity to tail-latencies, we utilize a tightly peaked distribution ($k=5.0$) with a very safe lower limit ($\alpha_{c}=0.85\cdot c_{\text{req}}$). For eMBB, we encourage wider exploration ($k=2.0$, $\alpha_{c}=0.6\cdot c_{\text{req}}$). The Weibull PDF exponentially penalizes overly aggressive demands via its $e^{-x^{k}}$ right-tail decay, ensuring initial proposals inject bounded variance into the negotiation while rigorously anchoring the system toward the optimal penalty-recovery envelope (Phase 2 of Theorem [III.1](https://arxiv.org/html/2606.18272#S3.Thmtheorem1)).
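One way to draw anchors from the distribution of Eq. (19) is inverse-CDF sampling, sketched below with the slice-dependent parameters listed in Algorithm 2. The sampling method is an assumption, since the paper does not say how the draw is implemented, and all function names are illustrative.

```python
import math
import random


def weibull_scale(alpha, gamma, k):
    """Eq. (18): scale that puts the mode of the shifted Weibull at gamma."""
    if k <= 1.0:
        raise ValueError("an interior mode requires k > 1")
    return (gamma - alpha) * (k / (k - 1.0)) ** (1.0 / k)


def sample_truncated_weibull(alpha, beta, gamma, k, rng):
    """Draw one value on [alpha, beta] by inverting the truncated CDF."""
    lam = weibull_scale(alpha, gamma, k)
    # Probability mass of the untruncated Weibull below beta (normalizer in Eq. (19)).
    mass = 1.0 - math.exp(-(((beta - alpha) / lam) ** k))
    u = rng.random()                                   # uniform on [0, 1)
    x = alpha + lam * (-math.log(1.0 - u * mass)) ** (1.0 / k)
    return min(x, beta)                                # guard against rounding


def anchor_params(c_opt, c_tot, slice_type):
    """Slice-dependent (alpha, beta, gamma, k) as listed in Algorithm 2."""
    if slice_type == "URLLC":
        alpha, gamma, k = 0.85 * c_opt, 0.98 * c_opt, 5.0
    else:
        alpha, gamma, k = 0.60 * c_opt, 0.90 * c_opt, 2.0
    beta = min(0.9 * c_tot, 1.1 * c_opt)
    return alpha, beta, gamma, k


# Hypothetical usage: one URLLC CPU anchor for c_opt = 10 out of 40 units.
rng = random.Random(0)
a, b, g, k = anchor_params(10.0, 40.0, "URLLC")
anchor = sample_truncated_weibull(a, b, g, k, rng)
```

Passing an explicit `random.Random` instance keeps trials reproducible, which matters when comparing strategies over repeated negotiation runs.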
## V Numerical Results

### V-A Experimental Setup

We evaluate the mitigation of anchoring bias by dynamically allocating shared RAN bandwidth ($b_{\text{tot}}=60$ MHz) and Edge CPU capacity ($f_{\text{tot}}=40$ GHz) between two network slices. The simulation advances in discrete time steps of $\tau=10\text{ ms}$. To emulate realistic 6G network dynamics, we deploy differentiated, time-varying traffic profiles: the eMBB slice generates a base load of 90 Mbps subject to slow sinusoidal fluctuations, whereas the URLLC slice generates a base load of 40 Mbps interspersed with deterministic 50% traffic bursts to stress the system's tail-latency handling. The autonomous agents are powered by the lightweight otel-llm-1b-it model hosted locally \[[7](https://arxiv.org/html/2606.18272#bib.bib31)\], interfacing via an OpenAI-compatible API with a temperature setting of $0.1$ to ensure highly focused, JSON-compliant reasoning. The negotiation framework restricts interaction to a maximum of $R_{\max}=5$ rounds per trial. Slice latency targets are evaluated at the rigorous CVaR tail: $50\text{ ms}$ for eMBB ($95\%$ confidence) and $10\text{ ms}$ for URLLC ($99.999\%$ confidence). We contrast a deterministic *Fixed* baseline strategy, which anchors heavily on conservative heuristics, against the proposed *Weibull Randomized* strategy over 200 independent trials. Each trial concludes with a 10-step post-agreement evaluation phase to rigorously measure SLA violations, queue dynamics, and aggregate energy savings against the maximum baseline power consumption.
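The two traffic profiles described above could be generated along the following lines. Only the base loads (90 Mbps and 40 Mbps), the 10 ms step, and the 50% burst size come from the text; the sinusoid amplitude and period, the burst schedule, and the reading of "50% bursts" as a 50% increase over the base load are assumptions made purely for illustration.

```python
import math


def embb_load_mbps(step, tau_s=0.01, base=90.0, rel_amp=0.1, period_s=10.0):
    """eMBB load: base rate with a slow sinusoidal fluctuation.

    rel_amp and period_s are hypothetical; the paper does not specify them.
    """
    t = step * tau_s
    return base * (1.0 + rel_amp * math.sin(2.0 * math.pi * t / period_s))


def urllc_load_mbps(step, base=40.0, burst_gain=0.5, burst_every=100, burst_len=10):
    """URLLC load: base rate with deterministic periodic bursts.

    burst_every and burst_len (in steps) are hypothetical scheduling choices.
    """
    in_burst = (step % burst_every) < burst_len
    return base * (1.0 + burst_gain) if in_burst else base


# Hypothetical one-second trace at 10 ms resolution.
trace = [(embb_load_mbps(s), urllc_load_mbps(s)) for s in range(100)]
```

Feeding such a trace into the DT queue model lets the burst steps exercise the CVaR tail rather than the mean latency.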
Figure 3:The use of a locally hosted 1B model ensures sub\-second responses compatible with the non\-RT RIC\.
### V-B Analytical Validation of Theorem III.1

The mechanics of our proposed Bimodal Constraint-Avoidance Utility Theorem are empirically validated in Figure [2](https://arxiv.org/html/2606.18272#S4.F2). Figure [2a](https://arxiv.org/html/2606.18272#S4.F2.sf1) plots the empirical utility degradation ($\mathcal{L}_{i,\text{anchor}}$) against the squared anchor distance ($d_{i}=\|a_{i}^{\dagger}-a_{i}^{(0)}\|^{2}$). As predicted by Theorem [III.1](https://arxiv.org/html/2606.18272#S3.Thmtheorem1), the empirical data distinctly bifurcates into two mathematical regimes, demonstrating a clear phase transition based on initial feasibility.

First, for instances where the joint initial anchor lies within the physical RAN capacity limit ($\sum a_{j}^{(0)}\leq 60$ MHz), the negotiation proceeds without triggering severe SLA penalty cliffs. As shown by the dense cluster in the bottom-left corner of Figure [2a](https://arxiv.org/html/2606.18272#S4.F2.sf1), these points trace the *Phase 1 Feasible Convergence* regime. Here, the utility loss is strictly governed by the suboptimality of the initial distance, tightly hugging the linearly rising Classical Bound defined by $\mathcal{L}_{i}\leq\gamma_{i}d_{i}$ (with an empirical slope of $\gamma_{i}\approx 0.05$).

Conversely, when agents propose aggressively greedy initial anchors that exceed the 60 MHz limit, the system enters the *Phase 2 Penalty Recovery* regime. The rigid capacity constraint forces immediate, mandatory algorithmic concessions, introducing a high risk of violating the strict CVaR SLA targets and triggering a maximum utility penalty ($\mathcal{L}_{\max}\approx 1.2$). However, as mathematically formalized in Phase 2 of the theorem, when an agent strategically injects an initial deficit (increasing $d_{i}$ via tactical "low-balling"), it creates a spatial safety buffer that reduces the probability of a forced constraint violation. The empirical data flawlessly traces this phenomenon: as the anchor distance increases, the utility degradation smoothly slides down the Inverse Rational decay envelope ($\mathcal{L}_{i}\leq\frac{\mathcal{L}_{\max}}{1+\kappa_{i}d_{i}}$, governed by $\kappa_{i}=0.03$).

The proposed Truncated Weibull strategy demonstrates superior adaptability across both domains. While the deterministic baseline (grey crosses) frequently traps agents in suboptimal regions or high-penalty states, the Weibull-driven agents (blue circles) successfully ride the decay curve to mitigate losses. Consequently, Figure [2b](https://arxiv.org/html/2606.18272#S4.F2.sf2) demonstrates the macroscopic impact of this behaviour: the Weibull approach substantially left-shifts the degradation Cumulative Distribution Function (CDF). By effectively mapping its anchor generation to the mathematically optimal regions of the bimodal envelope, the proposed strategy minimizes median utility loss and completely eradicates the extreme tail risks inherent to uniform or deterministic exploration.
### V\-C System Performance & non\-RT RIC Compatibility
For practical O\-RAN integration, Figure [3](https://arxiv.org/html/2606.18272#S5.F3) shows the negotiation latency\. Because we use the optimized 1B\-parameter model, the mean LLM response time drops to 0\.95 seconds under the randomized strategy\. This sub\-second latency keeps complex agentic negotiations well within non\-RT RIC control loops, which operate on timescales of 1 second or longer\.
As shown in Figure [4a](https://arxiv.org/html/2606.18272#S5.F4.sf1), the Weibull strategy breaks the rigid energy savings limit of the deterministic model, dynamically shedding excess capacity to push system\-wide energy savings up to a maximum of 25%\. A deliberate trade\-off is also visible\. Figure [4c](https://arxiv.org/html/2606.18272#S5.F4.sf3) confirms that both strategies meet the strict SLA limits, notably keeping the 99\.999th percentile CVaR latency of the URLLC slice below 10 ms\. However, the randomized strategy intentionally uses the available SLA slack \(evidenced by higher mean latencies in Figure [4b](https://arxiv.org/html/2606.18272#S5.F4.sf2)\), and it is this slack that is converted into energy efficiency gains\.
Ultimately, the integration of bias\-mitigated agentic reasoning within the non\-RT RIC supports a shift toward intent\-driven, zero\-touch network orchestration\. By replacing rigid heuristic baselines with the dynamically adaptive Truncated Weibull anchor, the 6G system not only meets its tail\-latency guarantees under stochastic loads but also maximizes resource frugality\. The ability to perform such high\-level, multi\-objective optimization using a locally hosted, privacy\-preserving 1B model indicates that cognitive de\-biasing is both computationally viable and necessary for the sustainable operation of future autonomous networks\.
Figure 4: Overall system performance comparing Deterministic and Weibull Randomized anchoring\. Panels: \(a\) Energy Saving CDF; \(b\) Latency CDF by Strategy; \(c\) Latency Statistics \(Mean & 99\.999th Percentile\)\.
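The tail\-latency check reported in Figure 4c can be illustrated with an empirical CVaR estimate\. The sketch below uses one common sample\-based definition \(the mean of the samples at or above the empirical $\alpha$\-quantile\); the function name, the toy latency values, and the $\alpha = 0.9$ level are assumptions for illustration, and the Digital Twin's actual estimator may differ\.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the worst (1 - alpha) fraction of samples (higher = worse)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(samples)
    # Index of the empirical alpha-quantile (the Value at Risk)
    start = min(math.floor(alpha * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Toy latency samples in milliseconds (not measured data)
    latencies_ms = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0, 9.5]
    sla_ms = 10.0  # URLLC latency target quoted in the text
    cvar = empirical_cvar(latencies_ms, alpha=0.9)
    print(f"CVaR={cvar:.2f} ms, SLA met: {cvar < sla_ms}")
```

With this estimator, a 99\.999th percentile tail computed from $10^5$ samples contains only about one observation, so a stable estimate at that level needs considerably more samples\.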
## VI Conclusion
This paper addressed anchoring bias in autonomous, LLM\-driven 6G network slicing\. By integrating CVaR tail\-latency evaluations and proposing the Bimodal Constraint\-Avoidance Theorem, we characterized the dual\-regime nature of multi\-agent utility degradation\. Leveraging a Truncated Weibull randomization strategy with adaptive shape scaling, we systematically dismantled rigid over\-provisioning heuristics\. Empirical deployments using the lightweight otel\-llm\-1b\-it model yielded sub\-second inference \(0\.95s mean\), ensuring compatibility with non\-RT RIC control loops\. This mathematically bounded exploration enabled agents to approach SLA boundaries safely, raising network\-wide energy savings to as much as 25% while maintaining 99\.999th percentile URLLC reliability\. Overall, we showed that privacy\-preserving, locally hosted small LLMs can execute such complex optimizations once cognitively de\-biased, which supports intent\-driven, zero\-touch O\-RAN orchestration and sustainable 6G autonomous networks\.