Managed Autonomy at Runtime: Gear-Based Safety and Governance for Single- and Multi-Agent Cyber-Physical Systems
Summary
This paper presents EntropyRuntime, a discrete-time control system for LLM-driven and robotic agents, in both single- and multi-agent settings, that uses five execution gears with utility-gated dispatch and event-driven fallback to ensure safety, stability, and continuity. It provides formal proofs and evaluates the runtime on a three-agent UR5 robotic assembly cell, achieving a 99.6% anomaly detection rate.
# Gear-Based Safety and Governance for Single- and Multi-Agent Cyber-Physical Systems
Source: [https://arxiv.org/html/2607.00334](https://arxiv.org/html/2607.00334)
Wang Miaosheng Independent Researcher ORCID: 0009\-0003\-2767\-2421 wmsmiaosheng@outlook\.com
###### Abstract
Autonomous agents, whether LLM-driven software agents or robotic physical agents, face a common class of failure modes when operating without continuous human oversight: safety violations from unverified actions, behavioral instability from unconstrained loops, and continuity loss from unhandled error states. We develop EntropyRuntime, a discrete-time control system that combines five execution gears (Observe, Suggest, Plan, Execute, Integrate) with utility-gated dispatch and event-driven fallback. For the single-agent case, we prove monotonic stability, execution safety, eventual stabilization, fallback completeness, and equivalence to a gear-constrained Markov decision process. For multi-agent cyber-physical systems (CPS), we apply the established SMARt managed-autonomy lifecycle and map runtime evidence into its four governance states (Stable/Meta-Cognitive/Assisted/Regulated). Consensus gating, swarm-level Lyapunov analysis, per-agent gear authority, and rendezvous control provide distributed safety and stability guarantees, including zero collision under the stated assumptions. We evaluate the resulting runtime on a three-agent UR5 robotic assembly cell using fault magnitudes calibrated from the NIST *Degradation Measurement of Robot Arm Position Accuracy* dataset across 10,000 Monte Carlo episodes. It achieves a 99.6% anomaly detection rate versus 2.1% for the single-agent baseline, reduces detection latency by $3.5\times$, and supplies a formal physical-workspace safety certificate. The execution gears act as micro-level permissions beneath the SMARt runtime governance states, separating action control from autonomy governance.
Keywords: managed autonomy, AI governance, autonomous agents, runtime verification, gear-based safety, utility gating, multi-agent systems, cyber-physical systems, Lyapunov stability, robotic assembly
## 1 Introduction
The emergence of large language model \(LLM\) agents capable of multi\-step reasoning, tool use, and environment interaction has created a new class of autonomous systems\[[1](https://arxiv.org/html/2607.00334#bib.bib1),[2](https://arxiv.org/html/2607.00334#bib.bib2)\]\. These agents operate in closed loops that receive observations, generate plans, execute actions via external tools, and incorporate feedback, often without requiring human approval for each step\. Simultaneously, robotic and cyber\-physical agents increasingly operate in shared physical workspaces where sensor faults, coordination failures, and unsafe actions carry immediate physical consequences\. Both settings share a structural problem: the agent’s autonomy is granted in a binary and static fashion, with no principled mechanism for dynamically adjusting the scope of permissible actions in response to observed safety signals\.
Autonomous agents face three interrelated failure modes. First, *safety violations*: the agent may issue actions that produce irreversible side effects without adequate verification [[3](https://arxiv.org/html/2607.00334#bib.bib3)]. Second, *behavioral instability*: the agent may oscillate between strategies, fail to converge, or enter degenerate loops [[4](https://arxiv.org/html/2607.00334#bib.bib4)]. Third, *continuity loss*: the agent may halt unexpectedly, losing accumulated context and requiring costly manual restarts. In multi-agent CPS, a fourth failure mode emerges: *coordination blindness*, where one agent's sensor fault carries consequences for all neighbors yet the per-agent control layer is structurally incapable of detecting or responding to it.
We develop EntropyRuntime to address these failure modes through gear\-based action control\. At each cycle, one of five gears limits the scope and impact of permissible actions, and a utility gate evaluates every candidate before dispatch\. For multi\-agent CPS, we use the SMARt framework, which models autonomy as a four\-state lifecycle in which authority is continuously validated and may be suspended, assisted, or revoked\[[5](https://arxiv.org/html/2607.00334#bib.bib5)\]\. We refer to these modes as the SMARt runtime governance states\. A consensus gate coordinates dispatch across agents, while the gear assigned to each agent defines its permitted action scope\.
The paper therefore concentrates on action-level enforcement and physical-CPS certification rather than redefining the governance lifecycle. The gears $G_0$ through $G_4$ operate beneath Stable, Meta-Cognitive, Assisted, and Regulated, linking authority decisions to executable behavior.
Our contributions are as follows:
1. We formalize the *gear state abstraction*, spanning $G_0$ through $G_4$, with well-defined transitions and prove monotonic stability (Theorem [1](https://arxiv.org/html/2607.00334#Thmtheorem1)) and eventual stabilization (Theorem [3](https://arxiv.org/html/2607.00334#Thmtheorem3)).
2. We formalize *utility-gated execution* and prove execution safety: no action with negative utility is ever dispatched (Theorem [2](https://arxiv.org/html/2607.00334#Thmtheorem2)).
3. We design an *event-driven fallback mechanism* and prove fallback completeness: every recoverable error state admits a resumption path (Theorem [4](https://arxiv.org/html/2607.00334#Thmtheorem4)).
4. We establish a representation theorem (Theorem [5](https://arxiv.org/html/2607.00334#Thmtheorem5)) connecting EntropyRuntime to the Markov decision process formalism.
5. We conduct an ablation study demonstrating that each single-agent component is necessary for safety, stability, and continuity.
6. We map multi-agent runtime evidence into the SMARt governance states through a *consensus utility gate*, *swarm Lyapunov function*, *per-agent gear authority*, and a *rendezvous policy triple* (Definitions [7](https://arxiv.org/html/2607.00334#Thmdefinition7) through [11](https://arxiv.org/html/2607.00334#Thmdefinition11)).
7. We establish five distributed results covering execution and rendezvous safety, monotonic workspace stability, feedback-coupled attenuation, and collision avoidance (Theorems [6](https://arxiv.org/html/2607.00334#Thmtheorem6), [8](https://arxiv.org/html/2607.00334#Thmtheorem8), and [10](https://arxiv.org/html/2607.00334#Thmtheorem10); Corollaries [7](https://arxiv.org/html/2607.00334#Thmtheorem7) and [9](https://arxiv.org/html/2607.00334#Thmtheorem9)).
8. We evaluate the multi-agent runtime on a three-agent UR5 robotic assembly cell over 10,000 Monte Carlo episodes, demonstrating a $47.7\times$ improvement in anomaly detection over the single-agent baseline.

Section [2](https://arxiv.org/html/2607.00334#S2) reviews related work, and Section [3](https://arxiv.org/html/2607.00334#S3) establishes the formal model. Section [4](https://arxiv.org/html/2607.00334#S4) describes the EntropyRuntime single-agent architecture. Section [5](https://arxiv.org/html/2607.00334#S5) presents the single-agent guarantees and component ablation. Section [6](https://arxiv.org/html/2607.00334#S6) defines multi-agent governance and control, followed by the distributed guarantees in Section [7](https://arxiv.org/html/2607.00334#S7). Section [8](https://arxiv.org/html/2607.00334#S8) combines the UR5 CPS case study, experimental setup, results, and component ablation. Section [9](https://arxiv.org/html/2607.00334#S9) discusses implications, deployment considerations, and limitations, and Section [10](https://arxiv.org/html/2607.00334#S10) concludes. Complete proofs appear in Appendices [A](https://arxiv.org/html/2607.00334#A1) and [B](https://arxiv.org/html/2607.00334#A2).
## 2 Related Work
The underlying view of autonomy as revocable rather than permanently granted\[[5](https://arxiv.org/html/2607.00334#bib.bib5)\]aligns with taxonomies that vary authority according to task risk, environmental predictability, and demonstrated reliability\[[6](https://arxiv.org/html/2607.00334#bib.bib6)\]; with corrigibility research on resistance to human intervention\[[7](https://arxiv.org/html/2607.00334#bib.bib7)\]; and with systems\-theoretic safety, where accidents arise from inadequate control actions and constraints as well as component faults\[[8](https://arxiv.org/html/2607.00334#bib.bib8)\]\. The present work supplies the execution layer by linking those governance states to five runtime gears\.
Contemporary LLM\-agent systems combine reasoning, tool use, feedback, and repeated environment interaction\[[9](https://arxiv.org/html/2607.00334#bib.bib9),[10](https://arxiv.org/html/2607.00334#bib.bib10),[11](https://arxiv.org/html/2607.00334#bib.bib11)\]\. Communication standards such as the Model Context Protocol address how agents and tools exchange context\[[12](https://arxiv.org/html/2607.00334#bib.bib12)\], whereas verification frameworks evaluate whether proposed tool calls or agent behaviors satisfy safety specifications\[[3](https://arxiv.org/html/2607.00334#bib.bib3),[4](https://arxiv.org/html/2607.00334#bib.bib4)\]\. These approaches are complementary to the present architecture\. Rather than changing the reasoning model or communication protocol, EntropyRuntime is interposed between reasoning and execution, evaluates every candidate action through a utility gate, and invokes fallback or authority reduction when the action is not admissible\.
The broader AI\-safety literature identifies unsafe exploration, unintended side effects, and scalable oversight as persistent problems\[[13](https://arxiv.org/html/2607.00334#bib.bib13)\], while human\-compatible AI emphasizes objectives whose operation remains aligned with human preferences and intervention\[[14](https://arxiv.org/html/2607.00334#bib.bib14)\]\. Hierarchical reinforcement learning provides temporal abstraction across levels of action\[[15](https://arxiv.org/html/2607.00334#bib.bib15)\], and methods such as soft actor\-critic and curiosity\-driven exploration regulate exploration statistically through rewards, entropy, or intrinsic motivation\[[16](https://arxiv.org/html/2607.00334#bib.bib16),[17](https://arxiv.org/html/2607.00334#bib.bib17)\]\. The gear mechanism differs in purpose: it imposes an explicit action\-space restriction and dispatch invariant, so safety does not depend solely on what an agent has learned or on the shape of its reward function\.
The runtime design also draws from real-time and stochastic control. Real-time systems research distinguishes hard and soft timing guarantees and emphasizes predictable response under operational constraints [[18](https://arxiv.org/html/2607.00334#bib.bib18)]. The discrete gear process is analyzed using finite-state Markov-chain concepts [[19](https://arxiv.org/html/2607.00334#bib.bib19)], while the representation result connects the runtime to the classical MDP and dynamic-programming formalism [[20](https://arxiv.org/html/2607.00334#bib.bib20), [21](https://arxiv.org/html/2607.00334#bib.bib21)]. Information theory supplies the interpretation of the entropy term $H_i$ as a penalty for degraded telemetry [[22](https://arxiv.org/html/2607.00334#bib.bib22)]. For multi-agent operation, the consensus literature establishes convergence conditions for networked agent systems [[23](https://arxiv.org/html/2607.00334#bib.bib23)]; our consensus gate uses a more conservative execution predicate, $\min_i U_i \geq \theta$, under which any unsafe local assessment blocks joint dispatch without depending on message agreement.
Finally, multi-robot research has used workspace decomposition and velocity scaling to coordinate teams in shared industrial environments [[24](https://arxiv.org/html/2607.00334#bib.bib24)], while surveys of heterogeneous robot teams identify sensor heterogeneity and coordination fragility as major operational risks [[25](https://arxiv.org/html/2607.00334#bib.bib25)]. The multi-agent runtime combines these concerns in one control structure: Meta-Cognitive applies bounded velocity reduction, Regulated enforces an E-Stop, the entropy-aware local utility detects telemetry degradation before geometry alone becomes critical, and the swarm Lyapunov function certifies workspace behavior. This integrates governance transitions, action gating, distributed coordination, and CPS safety.
## 3 Formal Preliminaries
We work in discrete time $t \in \mathbb{N}$. At each cycle $t$, the agent observes an environment state $s_t \in \mathcal{S}$, selects an action $a_t \in \mathcal{A}$, and receives a reward signal $r_t \in \mathbb{R}$.
###### Definition 1 (Gear State Space).
The *gear state space* is $\mathcal{G} = \{G_0, G_1, G_2, G_3, G_4\}$, representing Observe ($G_0$), Suggest ($G_1$), Plan ($G_2$), Execute ($G_3$), and Integrate ($G_4$).
Each gear $G_k$ defines a restricted action subspace $\mathcal{A}_k \subseteq \mathcal{A}$ with $\mathcal{A}_0 \subset \mathcal{A}_1 \subset \mathcal{A}_2 \subset \mathcal{A}_3 \subset \mathcal{A}_4 = \mathcal{A}$. Concretely: $G_0$ provides observation or a safe hold; $G_1$ allows candidate plan generation without external side effects; $G_2$ permits bounded, reversible, or safety-preserving recovery actions, including information queries and reduced-velocity continuation of a previously authorized command; $G_3$ allows independently selected actions with side effects; and $G_4$ denotes integrated system-level coordination under the governing macro-state.
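The nesting $\mathcal{A}_0 \subset \cdots \subset \mathcal{A}_4$ can be illustrated with a small sketch. The action labels below are hypothetical placeholders chosen for this example, not names from the paper; only the five gears and the strict inclusion structure come from Definition 1.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, Set


class Gear(IntEnum):
    OBSERVE = 0    # G0
    SUGGEST = 1    # G1
    PLAN = 2       # G2
    EXECUTE = 3    # G3
    INTEGRATE = 4  # G4


# Hypothetical action labels that become available at each gear.
NEW_ACTIONS: Dict[Gear, Set[str]] = {
    Gear.OBSERVE: {"read_state", "safe_hold"},
    Gear.SUGGEST: {"propose_plan"},
    Gear.PLAN: {"query_info", "reduced_velocity_continue"},
    Gear.EXECUTE: {"side_effect_action"},
    Gear.INTEGRATE: {"coordinate_system"},
}


def allowed_actions(gear: Gear) -> FrozenSet[str]:
    """Return A_k as the union of actions introduced at gears G0..Gk."""
    allowed: Set[str] = set()
    for g in Gear:
        if g <= gear:
            allowed |= NEW_ACTIONS[g]
    return frozenset(allowed)
```

Building each $\mathcal{A}_k$ as a cumulative union makes the strict inclusion hold by construction, so de-escalating a gear can only remove actions.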
###### Definition 2 (Utility Function).
The *utility function* $U : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ maps a state-action pair to a real-valued utility score. An action $a$ is *admissible* in state $s$ if $U(s,a) \geq 0$.
###### Definition 3 (Utility Gate).
The *utility gate* $\textsc{Gate}(s,a)$ is a binary predicate:
$$\textsc{Gate}(s,a) = \begin{cases} 1 & \text{if } U(s,a) \geq \theta \\ 0 & \text{otherwise} \end{cases}$$
where $\theta \geq 0$ is the safety threshold.
###### Definition 4 (Runtime State).
The *runtime state* at cycle $t$ is the tuple $\rho_t = (s_t, g_t, \sigma_t, \epsilon_t)$ where $s_t \in \mathcal{S}$ is the environment state, $g_t \in \mathcal{G}$ is the current gear, $\sigma_t \in \mathbb{R}_{\geq 0}$ is the accumulated instability measure, and $\epsilon_t \in \{0,1\}$ is the error flag.
###### Definition 5 (Transition Kernel).
The *transition kernel* $T : \mathcal{R} \times \mathcal{A} \to \Delta(\mathcal{R})$ maps a runtime state and action to a distribution over next runtime states, where $\mathcal{R} = \mathcal{S} \times \mathcal{G} \times \mathbb{R}_{\geq 0} \times \{0,1\}$.
## 4 The EntropyRuntime Architecture
EntropyRuntime is a discrete\-time closed\-loop control system interposed between an LLM agent and its execution environment\. At each cycle the system performs four phases: \(1\) observation and gear assessment, \(2\) action generation, \(3\) utility gating, and \(4\) execution and feedback\.
Cycle $t$:
1. Read environment state $s_t$; assess current gear $g_t$.
2. Generate candidate action $a_t \in \mathcal{A}_{g_t}$ via LLM.
3. Evaluate $\textsc{Gate}(s_t, a_t)$; if rejected, invoke fallback.
4. If accepted, execute $a_t$; observe $s_{t+1}$, $r_t$.
5. Update $\sigma_t$, $\epsilon_t$; determine $g_{t+1}$.

Figure 1: The EntropyRuntime control loop.

#### Gear state machine.
Gear transitions follow a deterministic policy $\pi_G : \mathcal{R} \to \mathcal{G}$: (1) Escalation: if $\sigma_t < \sigma_{\text{low}}$ and no errors persist for $h$ consecutive cycles, $g_{t+1} = \min(g_t + 1, G_4)$; (2) De-escalation: if $\sigma_t > \sigma_{\text{high}}$ or $\epsilon_t = 1$, $g_{t+1} = \max(g_t - 1, G_0)$; (3) Hold: otherwise, $g_{t+1} = g_t$.
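The three transition rules can be written as a small pure function. This is a minimal sketch under stated assumptions: gears are encoded as integers 0 to 4, the escalation condition is read as "at least $h$ consecutive error-free cycles", de-escalation is checked first when both conditions could apply, and the threshold values in the usage note are illustrative rather than taken from the paper.

```python
def next_gear(gear: int, sigma: float, error: bool, clean_cycles: int,
              sigma_low: float, sigma_high: float, h: int) -> int:
    """Sketch of the gear policy pi_G with gears encoded as 0..4."""
    # De-escalation: high instability or an active error flag (checked
    # first here; this ordering is an assumption).
    if sigma > sigma_high or error:
        return max(gear - 1, 0)
    # Escalation: low instability and h consecutive error-free cycles.
    if sigma < sigma_low and clean_cycles >= h:
        return min(gear + 1, 4)
    # Hold: neither condition applies.
    return gear
```

For example, with the hypothetical settings `sigma_low=0.5`, `sigma_high=2.0`, `h=3`, an agent in gear 2 with `sigma=0.1` and three clean cycles moves to gear 3, while the same agent with `sigma=3.0` drops to gear 1.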
#### Utility\-gated execution\.
The utility function is computed as:
$$U(s,a) = \alpha \cdot \Delta\text{task}(s,a) + \beta \cdot \text{safety}(s,a) - \gamma \cdot \text{cost}(a)$$
where $\alpha, \beta, \gamma > 0$ are tunable weights. When an action is rejected, the fallback mechanism is invoked.
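As a minimal sketch, the weighted utility and the gate of Definition 3 can be combined as follows. The three component scores are assumed to be precomputed floats, and the default weights of 1.0 are illustrative; the paper does not specify how the components are estimated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    alpha: float = 1.0  # task-progress weight (illustrative default)
    beta: float = 1.0   # safety weight (illustrative default)
    gamma: float = 1.0  # cost weight (illustrative default)


def utility(delta_task: float, safety: float, cost: float,
            w: Weights = Weights()) -> float:
    """U(s, a) = alpha * delta_task + beta * safety - gamma * cost."""
    return w.alpha * delta_task + w.beta * safety - w.gamma * cost


def gate(u: float, theta: float = 0.0) -> int:
    """Return 1 when the utility meets the safety threshold, else 0."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    return 1 if u >= theta else 0
```

Keeping the gate separate from the utility computation mirrors the text: the gate is the only dispatch decision, so any change to the scoring terms cannot bypass the threshold check.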
#### Event\-driven fallback\.
Upon rejection: (1) the system records the rejection and increments $\sigma_t$; (2) the agent generates an alternative $a'_t \in \mathcal{A}_{g_t}$; (3) after $k$ failed alternatives, the gear de-escalates and the cycle repeats; (4) after $m$ consecutive rejections, the system enters Observe and suspends execution pending human review.
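One possible reading of these four steps is sketched below. The function name, the `propose` and `utility` callables, and the default values of `k` and `m` are all assumptions made for illustration; the instability bookkeeping is omitted, and the primary candidate's rejection is counted as the first of the `m` consecutive rejections.

```python
from typing import Callable, Optional, Tuple


def run_fallback(gear: int,
                 propose: Callable[[int], str],
                 utility: Callable[[str], float],
                 theta: float = 0.0, k: int = 2, m: int = 6
                 ) -> Tuple[Optional[str], int, int, bool]:
    """Return (action, gear, rejections, suspended) after a rejection."""
    rejections = 1        # the primary candidate was already rejected
    failed_at_gear = 0
    while rejections < m:
        candidate = propose(gear)          # alternative within A_gear
        if utility(candidate) >= theta:
            return candidate, gear, rejections, False
        rejections += 1
        failed_at_gear += 1
        if failed_at_gear == k:            # k failed alternatives
            gear = max(gear - 1, 0)        # de-escalate one gear
            failed_at_gear = 0
    # m consecutive rejections: enter Observe (gear 0) and suspend.
    return None, 0, rejections, True
```

A caller would treat `suspended=True` as the signal to stop dispatching and wait for human review.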
Algorithm 1: EntropyRuntime Control Loop
Require: initial state $s_0$, initial gear $g_0 = G_0$, threshold $\theta$
1: $t \leftarrow 0$; $\sigma_0 \leftarrow 0$; $\epsilon_0 \leftarrow 0$
2: loop
3:   Observe: read $s_t$
4:   Generate $a_t \leftarrow \text{LLM}(s_t, g_t, \text{history})$
5:   if $\textsc{Gate}(s_t, a_t) = 1$ then
6:     Execute $a_t$; observe $s_{t+1}$, $r_t$
7:     $\sigma_{t+1} \leftarrow \max(0, \sigma_t - \delta)$; $\epsilon_{t+1} \leftarrow 0$
8:   else
9:     Invoke fallback: generate alternatives
10:    if alternative $a'_t$ found with $\textsc{Gate}(s_t, a'_t) = 1$ then
11:      Execute $a'_t$; $\sigma_{t+1} \leftarrow \sigma_t$; $\epsilon_{t+1} \leftarrow 0$
12:    else
13:      $\sigma_{t+1} \leftarrow \sigma_t + \Delta\sigma$; $\epsilon_{t+1} \leftarrow 1$
14:      $g_{t+1} \leftarrow \max(g_t - 1, G_0)$
15:    end if
16:  end if
17:  Update gear via $\pi_G(\rho_{t+1})$; $t \leftarrow t + 1$
18: end loop
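The accept, fallback, and reject branches of Algorithm 1 can be sketched as a single step function. This is an illustrative reading, not the author's implementation: `generate`, `alternatives`, and `utility` are hypothetical callables standing in for the LLM, the fallback generator, and $U$; environment execution is reduced to returning the chosen action; and the gear-policy update of line 17 is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class RuntimeState:
    gear: int = 0        # g_t, encoded 0..4
    sigma: float = 0.0   # accumulated instability
    error: int = 0       # error flag epsilon_t


def step(state: RuntimeState, s: Any,
         generate: Callable[[Any, int], str],
         alternatives: Callable[[Any, int], Iterable[str]],
         utility: Callable[[Any, str], float],
         theta: float = 0.0, delta: float = 0.25,
         delta_sigma: float = 1.0) -> Optional[str]:
    """Run lines 3-16 once; return the dispatched action or None."""
    a = generate(s, state.gear)
    if utility(s, a) >= theta:                    # primary action accepted
        state.sigma = max(0.0, state.sigma - delta)
        state.error = 0
        return a
    for alt in alternatives(s, state.gear):       # fallback candidates
        if utility(s, alt) >= theta:
            state.error = 0                       # sigma left unchanged
            return alt
    state.sigma += delta_sigma                    # nothing admissible
    state.error = 1
    state.gear = max(state.gear - 1, 0)
    return None
```

Every return path that yields an action passes through the `utility(...) >= theta` check, which is the structural property that Theorem 2 relies on.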
## 5 Single-Agent Formal Guarantees and Ablation
### 5.1 Formal Guarantees
We state five theorems characterizing EntropyRuntime in the single-agent case. Complete proofs appear in Appendix [A](https://arxiv.org/html/2607.00334#A1).
###### Theorem 1 (Monotonic Stability).
Let $\{\sigma_t\}_{t \geq 0}$ be the instability sequence generated by EntropyRuntime with initial gear $g_0 = G_0$. Then for all $t \geq 0$: $\mathbb{E}[\sigma_{t+1} \mid \sigma_t] \leq \sigma_t$.
###### Proof Sketch.
Accepted actions reduce $\sigma$ by $\delta$; rejected actions increase it by $\Delta\sigma$ but trigger gear de-escalation, shrinking the action space and increasing the gate acceptance probability. A coupling argument shows expected decreases dominate increases. See Appendix [A.1](https://arxiv.org/html/2607.00334#A1.SS1). $\square$
###### Theorem 2 (Execution Safety).
Under EntropyRuntime with $\theta \geq 0$: $\forall t \geq 0:\ a_t\ \text{executed} \implies U(s_t, a_t) \geq \theta \geq 0$.
###### Proof Sketch.
The gate is the sole dispatch mechanism; execution requires $\textsc{Gate} = 1$, which requires $U \geq \theta \geq 0$. This holds for both primary and fallback actions. See Appendix [A.2](https://arxiv.org/html/2607.00334#A1.SS2). $\square$
###### Theorem 3 (Eventual Stabilization).
Under a stationary environment and bounded $U$, EntropyRuntime reaches a fixed gear $g^* \in \mathcal{G}$ in finite time almost surely: $\exists\, g^* \in \mathcal{G},\ T^* < \infty:\ \forall t \geq T^*,\ g_t = g^*$ a.s.
###### Proof Sketch.
The gear process is a Markov chain on the finite space $\mathcal{G}$. The Foster-Lyapunov theorem with $V(\rho) = \sigma + C \cdot \mathbf{1}[\epsilon = 1] + D \cdot k(g)$ certifies positive recurrence; the only recurrent classes are singletons. See Appendix [A.3](https://arxiv.org/html/2607.00334#A1.SS3). $\square$
###### Theorem 4 (Fallback Completeness).
For every error state $\rho = (s, g, \sigma, 1)$, the fallback mechanism guarantees either (1) an admissible action $a'$ with $U(s, a') \geq \theta$, or (2) descent to $G_0$ in at most $|\mathcal{G}| - 1 = 4$ steps.
###### Proof Sketch.
Each failed attempt de-escalates by one gear; after at most 4 steps, the system reaches $G_0$ where read-only actions trivially admit non-negative utility. See Appendix [A.4](https://arxiv.org/html/2607.00334#A1.SS4). $\square$
###### Theorem 5 (Representation Theorem).
There exists an MDP $\mathcal{M} = (\mathcal{R}, \mathcal{A}, T, R, \gamma)$ such that the set of EntropyRuntime policies equals the set of gear-constrained stationary policies in $\mathcal{M}$, and the value functions are identical.
###### Proof Sketch.
Construct $\mathcal{M}$ with state space $\mathcal{R}$, reward $R(\rho, a) = U(s,a) \cdot \textsc{Gate}(s,a)$, and transition kernel inherited from the runtime. Gear-constrained policies in $\mathcal{M}$ biject with EntropyRuntime policies; value equivalence follows from the identical dynamics. See Appendix [A.5](https://arxiv.org/html/2607.00334#A1.SS5). $\square$
### 5.2 Component Ablation
We validate the necessity of each single-agent component over 100 autonomous cycles per condition, measuring Stability ($\sigma$, completion rate), Risk (unsafe actions), and Continuity (halts, fallback frequency).

Table 1: Single-agent ablation. Each condition runs 100 cycles.

#### Interpretation.
The gear abstraction is essential for stability: its removal raises $\sigma$ from $< 0.5$ to $\approx 1.2$ and drops completion from 100% to 82%. Utility gating is the primary safety barrier; its removal produces 9 unsafe actions. The fallback is essential for continuity: its removal causes 7 halts while leaving safety intact. The raw LLM loop produces 27 unsafe actions and requires manual restarts, confirming the necessity of a formal control layer.
## 6 SMARt Runtime Governance for Multi-Agent Control
The single-agent formulation established monotonic stability (Theorem [1](https://arxiv.org/html/2607.00334#Thmtheorem1)) and execution safety (Theorem [2](https://arxiv.org/html/2607.00334#Thmtheorem2)) for one agent operating in isolation. For multi-agent CPS, runtime evidence is mapped into the SMARt governance states through five mechanisms: a consensus utility gate, a swarm Lyapunov function, state-transition thresholds, per-agent gear authority, and a rendezvous policy triple. Together they coordinate joint dispatch, certify workspace stability, and specify drain, hold, and return behavior. Their purpose is enforcement within the CPS configuration; the underlying governance semantics remain unchanged.
#### Joint runtime state.
The runtime is configured for a team of $n$ agents sharing a physical workspace. Let $\mathcal{I} = \{1, \ldots, n\}$ be the agent index set. Each agent $i \in \mathcal{I}$ maintains its own runtime state $\rho_t^i = (s_t^i, g_t^i, \sigma_t^i, \epsilon_t^i)$ as in Definition [4](https://arxiv.org/html/2607.00334#Thmdefinition4).
###### Definition 6 (Joint Runtime State).
The *joint runtime state* at cycle $t$ is the tuple $\rho_t^{1:n} = (\rho_t^1, \ldots, \rho_t^n, x_t)$ where $x_t \in \mathcal{X}$ is the shared workspace state observable by all agents.
### 6.1 Consensus Utility Gate
###### Definition 7 (Consensus Utility Gate).
The *consensus utility gate* $\textsc{Gate}_c(x_t, a_t^{1:n})$ is:
$$\textsc{Gate}_c(x_t, a_t^{1:n}) = \begin{cases} 1 & \text{if } \min_{i \in \mathcal{I}} U_i(s_t^i, a_t^i) \geq \theta \\ 0 & \text{otherwise} \end{cases}$$
where $U_i$ is the local utility of agent $i$ and $\theta \geq 0$ is the threshold of Definition [3](https://arxiv.org/html/2607.00334#Thmdefinition3).
Joint execution is permitted if and only if every agent individually satisfies $\theta$. Any single agent below $\theta$ blocks the entire team without requiring inter-agent communication; this structurally conservative property cannot be weakened by message delays.
The local utility for agent $i$ follows Definition [2](https://arxiv.org/html/2607.00334#Thmdefinition2) with two adaptations:
$$U_i(s_t^i, a_t^i) = \alpha \cdot \Delta\text{task}_i - \beta \cdot \text{risk}_i - \gamma \cdot H_i$$
where $\text{risk}_i$ is a collision-risk score from perceived pairwise clearance (Section [8](https://arxiv.org/html/2607.00334#S8)) and $H_i$ is the differential entropy of agent $i$'s sensor noise distribution. The entropy term $H_i$ extends the resource-cost component of Definition [2](https://arxiv.org/html/2607.00334#Thmdefinition2) to penalise agents with degraded telemetry before utility falls below $\theta$. This provides an early-warning signal that the single-agent formulation cannot provide.
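A short sketch of the local utility and the min-based gate follows. The noise model is an assumption made here for illustration: one-dimensional Gaussian sensor noise, whose differential entropy is $\tfrac{1}{2}\ln(2\pi e\,\sigma^2)$. The weights are placeholder values. Note that differential entropy can be negative for small noise, so in this sketch a healthy sensor raises utility slightly; the source does not state whether $H_i$ is normalised.

```python
import math
from typing import Sequence


def gaussian_entropy(noise_std: float) -> float:
    """Differential entropy (nats) of 1-D Gaussian noise (assumed model)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * noise_std ** 2)


def local_utility(delta_task: float, risk: float, entropy: float,
                  alpha: float = 1.0, beta: float = 1.0,
                  gamma: float = 0.1) -> float:
    """U_i = alpha * delta_task_i - beta * risk_i - gamma * H_i."""
    return alpha * delta_task - beta * risk - gamma * entropy


def consensus_gate(utilities: Sequence[float], theta: float = 0.0) -> int:
    """Open (1) only if the worst local utility meets the threshold."""
    return 1 if min(utilities) >= theta else 0
```

Because the gate depends only on the minimum, one agent with degraded telemetry is enough to close it, and no message exchange between agents is modelled.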
### 6.2 Runtime Governance-State Mapping
###### Definition 8 (Swarm Lyapunov Function).
Let $\hat{p}_i(t)$ denote the *perceived* end-effector position of agent $i$ and $p_i^*(t)$ its nominal trajectory position. Following standard Lyapunov stability theory [[26](https://arxiv.org/html/2607.00334#bib.bib26)], the *swarm Lyapunov function* is:
$$V_{\mathrm{swarm}}(t) = \sum_{i \in \mathcal{I}} \|\hat{p}_i(t) - p_i^*(t)\|^2 + \lambda \sum_{i \in \mathcal{I}} \tilde{\sigma}_i(t)^2,$$
where $\lambda > 0$ and $\tilde{\sigma}_i(t) = \|\hat{p}_i(t) - p_i(t)\|$ is agent $i$'s sensor drift magnitude, with $p_i(t)$ the actual end-effector position. Note: $\tilde{\sigma}_i$ is distinct from the scalar runtime instability measure $\sigma_t$ of Definition [4](https://arxiv.org/html/2607.00334#Thmdefinition4).
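For concreteness, the swarm Lyapunov function can be evaluated as below. This is a minimal sketch: positions are assumed to be plain coordinate tuples, the argument named `actual` holds the positions $p_i(t)$ used in the drift term, and the value of `lam` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def swarm_lyapunov(perceived: Sequence[Point], nominal: Sequence[Point],
                   actual: Sequence[Point], lam: float = 1.0) -> float:
    """Sum of squared tracking errors plus lam times squared sensor drift."""
    total = 0.0
    for p_hat, p_star, p in zip(perceived, nominal, actual):
        tracking_sq = math.dist(p_hat, p_star) ** 2  # ||p_hat - p*||^2
        drift_sq = math.dist(p_hat, p) ** 2          # sigma_tilde^2
        total += tracking_sq + lam * drift_sq
    return total
```

With two agents, one perceived 1 unit off its nominal position with no drift and one on its nominal position but with 2 units of drift, `lam=0.5` gives $1 + 0.5 \cdot 4 = 3$.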
###### Definition 9 (SMARt Runtime Governance-State Mapping).
The active SMARt governance state $\psi_t$ is determined by:
$$\psi_t = \begin{cases} \textsc{Regulated}, & r_{\max}(t) \geq \tau_{\mathrm{crit}}, \\ \textsc{Assisted}, & \textsc{Gate}_c = 0,\ r_{\max}(t) < \tau_{\mathrm{crit}}, \\ \textsc{Meta-Cognitive}, & \textsc{Gate}_c = 1,\ r_{\max}(t) \geq \tau_{\mathrm{meta}}, \\ \textsc{Stable}, & \text{otherwise}, \end{cases}$$
where $r_{\max}(t) = \max_{i \in \mathcal{I}} \text{risk}_i(t)$ and $0 < \tau_{\mathrm{meta}} < \tau_{\mathrm{crit}} \leq 1$.
The two thresholds produce a graduated response. The governance state determines authority, and the gear defines the permitted action scope. In Stable, new task-level action may be authorized. When $r_{\max}$ first exceeds $\tau_{\mathrm{meta}}$, the system enters Meta-Cognitive: new discretionary task selection is suspended. Only bounded diagnostic or recovery actions, including completion of a previously authorized low-level command, may continue at reduced velocity. Assisted freezes autonomous task progression and requires external-agent or SME support when the gate closes before $r_{\max}$ reaches $\tau_{\mathrm{crit}}$. When $r_{\max}$ exceeds $\tau_{\mathrm{crit}}$, Regulated triggers a hardware emergency stop regardless of gate state. Authoritative autonomous decisions therefore originate only in Stable, while the controller can still reach a safe hold without discarding an already authorized epoch.
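The case analysis of Definition 9 translates directly into an ordered chain of checks. In this sketch the threshold values `tau_meta=0.3` and `tau_crit=0.8` are placeholders chosen for illustration, not values reported in the paper.

```python
from typing import Sequence


def governance_state(risks: Sequence[float], gate_c: int,
                     tau_meta: float = 0.3, tau_crit: float = 0.8) -> str:
    """Map per-agent risks and the consensus gate to a governance state."""
    r_max = max(risks)
    if r_max >= tau_crit:      # critical risk overrides the gate
        return "Regulated"
    if gate_c == 0:            # gate closed below the critical threshold
        return "Assisted"
    if r_max >= tau_meta:      # gate open but risk is elevated
        return "Meta-Cognitive"
    return "Stable"
```

Checking the critical threshold first reproduces the statement that Regulated applies regardless of gate state.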
### 6.3 Per-Agent Gear and the G3/G4 Consensus Elevation
In this multi-agent runtime configuration, $G_4$ (Integrate) is not a gear any individual agent can occupy independently. It is an emergent system-level property: the cell achieves $G_4$ only when every agent simultaneously holds $G_3$ (Execute).
###### Definition 10 (Per-Agent Gear and System Gear).
The *per-agent gear* $g_t^i$ is:
$$g_t^i = \begin{cases} G_0 & \text{if } \text{risk}_i(t) \geq \tau_{\mathrm{crit}} \\ G_1 & \text{if } U_i(s_t^i, a_t^i) < \theta \quad \text{(gate closed for agent } i\text{)} \\ G_2 & \text{if } \text{risk}_i(t) \geq \tau_{\mathrm{meta}} \\ G_3 & \text{otherwise (cleared for autonomous execution)} \end{cases}$$
The *system gear* $k(\psi_t)$ is $G_4$ if $\psi_t = \textsc{Stable}$; $G_2$ if $\psi_t = \textsc{Meta-Cognitive}$; $G_1$ if $\psi_t = \textsc{Assisted}$; $G_0$ if $\psi_t = \textsc{Regulated}$. The consensus elevation holds:
$$k(\psi_t) = G_4 \iff \forall i \in \mathcal{I}: g_t^i = G_3$$
No individual agent is assigned $G_4$ in isolation. When any agent falls below $G_3$, the system gear descends and the active governance state transitions out of Stable. The velocity scales are: $v(G_0) = v(G_1) = 0$; $v(G_2) = 0.5$; $v(G_3) = v(G_4) = 1.0$. The $G_2$ scale is a safety-controller allowance for bounded recovery or epoch drain, not continued discretionary autonomy.
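The per-agent gear assignment, the velocity scales, and the consensus-elevation check can be sketched together. Gears are encoded as integers 0 to 4, and the thresholds are again illustrative placeholders; only the case order and the velocity values come from the text.

```python
from typing import Sequence

# Velocity scale per gear, as listed in the text.
VELOCITY_SCALE = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0}


def per_agent_gear(risk: float, utility: float, theta: float = 0.0,
                   tau_meta: float = 0.3, tau_crit: float = 0.8) -> int:
    """Evaluate the cases of Definition 10 in order, top to bottom."""
    if risk >= tau_crit:
        return 0   # G0: critical risk
    if utility < theta:
        return 1   # G1: gate closed for this agent
    if risk >= tau_meta:
        return 2   # G2: elevated risk, bounded recovery only
    return 3       # G3: cleared for autonomous execution


def system_is_integrated(gears: Sequence[int]) -> bool:
    """Consensus elevation: the cell reaches G4 iff every agent holds G3."""
    return all(g == 3 for g in gears)
```

For a three-agent cell, a single agent at gear 2 is enough for `system_is_integrated` to return `False`, and that agent runs at half velocity.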
### 6.4 Epoch-Synchronised Rendezvous and the Policy Triple
The single-agent cycle (Section [4](https://arxiv.org/html/2607.00334#S4)) is atomic per step. For a multi-agent team, the gate must evaluate the *joint* state, requiring all agents to complete their current action before the gate decides the next gear. We model this as an epoch-synchronised (bulk-synchronous) execution structure.
- Within-epoch: agents execute their current command independently, at the velocity scale inherited from the previous epoch's gate decision.
- Epoch boundary: the gate evaluates the perceived joint state, produces new per-agent gears and an updated governance state, and the new gear takes effect next epoch.
- $G_0$ exception: Regulated bypasses the epoch boundary as a hardware interrupt the moment $r_{\max} \geq \tau_{\mathrm{crit}}$ is detected mid-epoch.

This structure gives rise to *drain semantics*: when a fault is detected at the epoch boundary, the faulting agent completes the current epoch before the new velocity scale applies.
###### Definition 11\(Rendezvous Policy Triple\)\.
A *rendezvous policy triple* $\Pi = (\Pi_d, \Pi_h, \Pi_r)$ specifies:
- $\Pi_d$ selects Complete Epoch or Immediate descent, determining whether gear descent occurs at the next epoch boundary or mid-epoch.
- $\Pi_h$ selects Hard Dependency or Continue Independent, determining whether non-faulting agents freeze with the faulting agent (Assisted state) or continue at their own per-agent gear $g_t^j$.
- $\Pi_r = (\Pi_{ms}, \Pi_{rts})$ defines two return policies: $\Pi_{ms}$ governs the transition from Meta-Cognitive to Stable, while $\Pi_{rts}$ governs the transition from Regulated to Stable. Each policy is one of SME Explicit, Auto Continue$[\delta]$, or Reset Restart. The Auto Continue$[\delta]$ policy ascends after $\delta$ consecutive clean epochs with $r_{\max} < \tau_{\mathrm{meta}}$; Reset Restart requires a full restart from a verified safe configuration.
The UR5/NIST instantiation uses
$$
\Pi_d = \textsc{Complete Epoch}, \qquad \Pi_h = \textsc{Continue Independent}, \qquad \Pi_r = \bigl(\textsc{Auto Continue}[\delta{=}3],\ \textsc{Reset Restart}\bigr),
$$

justified by three validity conditions:

- V1 (drain): the fault is in the camera layer, not the encoder-based actuator; completing the epoch does not alter the robot’s physical trajectory.
- V2 (hold): nominal trajectories are geometrically independent, so non-faulting agents self-limit through their own per-agent gate if the faulting agent’s drift raises their perceived risk.
- V3 (return): $\delta = 3$ clean epochs below $\tau_{\mathrm{meta}}$ is strictly more conservative than the single-epoch exit condition, with the OU mean-reversion process ensuring contraction.
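The policy triple and the Auto Continue$[\delta]$ return rule lend themselves to a small data-structure sketch. The class `PolicyTriple`, the string labels, and the helper `auto_continue_ready` below are hypothetical names introduced for illustration; the only behavior taken from the text is that ascent requires $\delta$ consecutive clean epochs with $r_{\max} < \tau_{\mathrm{meta}}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyTriple:
    descent: str      # "COMPLETE_EPOCH" or "IMMEDIATE"
    hold: str         # "HARD_DEPENDENCY" or "CONTINUE_INDEPENDENT"
    return_ms: str    # policy for Meta-Cognitive -> Stable
    return_rts: str   # policy for Regulated -> Stable
    delta: int = 3    # clean-epoch count used by Auto Continue


# The UR5/NIST instantiation described in the text.
UR5_POLICY = PolicyTriple(
    "COMPLETE_EPOCH", "CONTINUE_INDEPENDENT", "AUTO_CONTINUE", "RESET_RESTART", delta=3
)


def auto_continue_ready(r_max_history, delta, tau_meta):
    """True once the most recent `delta` epochs all satisfy r_max < tau_meta."""
    if delta <= 0 or len(r_max_history) < delta:
        return False
    return all(r < tau_meta for r in r_max_history[-delta:])
```

For example, `auto_continue_ready([0.3, 0.1, 0.1, 0.1], 3, 0.19)` is `True`, while a single epoch at or above $\tau_{\mathrm{meta}}$ inside the last three resets the wait.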
#### Single\-agent reduction\.
Setting $n = 1$ collapses $\textsc{Gate}_c$ to $\textsc{Gate}(s, a)$ (Definition [3](https://arxiv.org/html/2607.00334#Thmdefinition3)), and the pairwise collision terms vanish. If a single-agent domain risk signal is supplied, the four SMARt governance states remain available; without such a signal, the runtime reduces to the gate-driven behavior of Section [4](https://arxiv.org/html/2607.00334#S4). Distributed execution safety and distributed monotonic stability then reduce to their single-agent counterparts (Theorems [2](https://arxiv.org/html/2607.00334#Thmtheorem2) and [1](https://arxiv.org/html/2607.00334#Thmtheorem1)). The stabilization, fallback, and representation results remain properties of the underlying single-agent runtime, while the policy triple degenerates to its fallback and restart logic.
## 7Formal Results: Multi\-Agent System
We now state the distributed safety and stability results for the multi\-agent runtime\. Complete proofs appear in Appendix[B](https://arxiv.org/html/2607.00334#A2)\.
###### Theorem 6\(Distributed Execution Safety\)\.
Under the consensus utility gate (Definition [7](https://arxiv.org/html/2607.00334#Thmdefinition7)) and any policy triple $\Pi$:

$$\forall t \geq 0:\ \textsc{Gate}_c = 1 \implies \forall i \in \mathcal{I}: U_i(s_t^i, a_t^i) \geq \theta \geq 0$$

Distributed execution safety is a structural invariant of the consensus gate, independent of the choice of $\Pi$.
###### Proof Sketch\.
$\textsc{Gate}_c = 1$ requires $\min_i U_i \geq \theta$ by Definition [7](https://arxiv.org/html/2607.00334#Thmdefinition7), immediately implying $U_i \geq \theta \geq 0$ for all $i$. Theorem [2](https://arxiv.org/html/2607.00334#Thmtheorem2) applies to each agent independently. The policy triple governs drain, hold, and return behavior. Any continuation in Meta-Cognitive is restricted to the bounded $\mathcal{A}_2$ action set and remains subject to the same utility gate. See Appendix [B.1](https://arxiv.org/html/2607.00334#A2.SS1). $\square$
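The structural argument is short enough to mirror directly in code. The function below is a hedged sketch of the consensus gate as the proof sketch describes it (open iff the minimum per-agent utility reaches $\theta$); the name `consensus_gate` and the input validation are assumptions made for illustration.

```python
def consensus_gate(utilities, theta):
    """Return 1 iff the minimum per-agent utility is at least theta, else 0."""
    if theta < 0:
        raise ValueError("theta is assumed non-negative")
    if not utilities:
        raise ValueError("need at least one agent")
    return 1 if min(utilities) >= theta else 0
```

Because the gate is defined through the minimum, an open gate implies every individual utility is at least $\theta$, which is the content of the theorem; no property of the policy triple enters the check.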
###### Corollary 7\(Rendezvous Safety\)\.
Let $\Pi_h = \textsc{Continue Independent}$ with validity conditions V1 through V3 (Definition [11](https://arxiv.org/html/2607.00334#Thmdefinition11)). Then: (a) under V1, no collision occurs during the drain phase; (b) under V2, every non-faulting agent $j$ satisfies $U_j \geq \theta$ throughout the hold phase; (c) under V3, the system does not re-enter Stable prematurely. See Appendix [B.2](https://arxiv.org/html/2607.00334#A2.SS2).
###### Theorem 8\(Distributed Monotonic Stability\)\.
Let $\{V_{\mathrm{swarm}}(t)\}_{t \geq 0}$ be generated under the multi-agent runtime policy with any $\Pi$ satisfying V1 and V2. Then for all $t \geq 0$:

$$\mathbb{E}[V_{\mathrm{swarm}}(t) - V^*] \leq \alpha^{2t}\,\bigl(V_{\mathrm{swarm}}(0) - V^*\bigr) + \frac{C}{1 - \alpha^2}$$

where $\alpha = 1 - \theta_{\mathrm{ou}}$, $V^* = (1+\lambda)\|\mu\|^2$, and $C = (1+\lambda) \cdot 3\sigma_{\mathrm{noise}}^2$. The workspace energy converges geometrically to the bounded invariant set $[0,\ V^* + C/(1-\alpha^2)]$. Under Continue Independent, non-faulting agents contribute zero increments to $V_{\mathrm{swarm}}$; the bound holds uniformly across both $\Pi_h$ choices.
###### Proof Sketch\.
$V_{\mathrm{swarm}}$ reduces to $(1+\lambda)\|d_A(t)\|^2$ (only the faulting agent carries drift; B/C track nominal within encoder control error $\varepsilon_{\mathrm{ctrl}}$). The OU squared-norm recursion gives $\mathbb{E}[\|d_A(t+1)\|^2] = \|\alpha d_A + (1-\alpha)\mu\|^2 + 3\sigma_{\mathrm{noise}}^2$. Setting $\Delta V(t) = V_{\mathrm{swarm}}(t) - V^*$ and applying Cauchy-Schwarz yields $\mathbb{E}[\Delta V(t+1)] \leq \alpha^2 \Delta V(t) + C$. Unrolling gives the stated bound ($\alpha^2 = 0.5625 < 1$; convergent). See Appendix [B.3](https://arxiv.org/html/2607.00334#A2.SS3). $\square$
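The unrolling step can be checked numerically. The sketch below iterates the one-step inequality with equality (the worst case) and compares it with the closed-form bound from Theorem 8. The function names and the sample values of the initial gap and of $C$ are hypothetical; only $\alpha = 0.75$ follows from the paper's $\theta_{\mathrm{ou}} = 0.25$.

```python
def unrolled_bound(t, dv0, alpha, c):
    """Right-hand side of the stated bound: alpha^(2t) * dv0 + c / (1 - alpha^2)."""
    return alpha ** (2 * t) * dv0 + c / (1.0 - alpha ** 2)


def iterate_recursion(t, dv0, alpha, c):
    """Iterate dV(t+1) = alpha^2 * dV(t) + c for t steps (inequality taken as equality)."""
    dv = dv0
    for _ in range(t):
        dv = alpha ** 2 * dv + c
    return dv


if __name__ == "__main__":
    alpha = 0.75          # 1 - theta_ou with theta_ou = 0.25
    dv0, c = 10.0, 1.0    # illustrative values only
    for t in (0, 1, 5, 20):
        print(t, iterate_recursion(t, dv0, alpha, c), unrolled_bound(t, dv0, alpha, c))
```

The iterated value equals $\alpha^{2t}\Delta V(0) + C\,(1-\alpha^{2t})/(1-\alpha^2)$, which never exceeds the bound because the geometric sum is truncated; as $t$ grows both settle at $C/(1-\alpha^2)$.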
Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8) extends Theorem [1](https://arxiv.org/html/2607.00334#Thmtheorem1): where Theorem [1](https://arxiv.org/html/2607.00334#Thmtheorem1) bounds the abstract instability scalar $\sigma$, Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8) bounds the physical workspace energy $V_{\mathrm{swarm}}$. The results are complementary: Theorem [1](https://arxiv.org/html/2607.00334#Thmtheorem1) certifies decision-making stability; Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8) certifies physical-workspace stability.
###### Corollary 9\(Feedback\-Coupled Stability Attenuation\)\.
For CPS where sensor drift propagates to the control input with feedback gain $\alpha_{fb} \in [0,1]$, the Lyapunov recursion acquires an additional coupling term bounded by $\alpha_{fb}^2 \cdot v^2(\psi_t) \cdot V_{\mathrm{swarm}}(t)$. In the Meta-Cognitive state ($v = 0.5$), this term is scaled by a factor of $v^2 = 0.25$ relative to Stable; in Regulated ($v = 0$), it is eliminated. The binary gate, which permits only $v \in \{0, 1\}$, cannot achieve the intermediate $0.25\times$ scaling. See Appendix [B.4](https://arxiv.org/html/2607.00334#A2.SS4).
###### Theorem 10\(Zero\-Collision Guarantee\)\.
Under the multi-agent runtime policy with E-Stop (Definitions [9](https://arxiv.org/html/2607.00334#Thmdefinition9) through [11](https://arxiv.org/html/2607.00334#Thmdefinition11)), policy triple $\Pi$ satisfying V1, and nominal pairwise clearance $\geq c_{\mathrm{nom}} > 0$:

$$\Pr\!\left(\exists\, t \geq 0,\ i \neq j:\ \|p_i(t) - p_j(t)\| \leq 2r + \Delta_{\mathrm{safe}}\right) = 0$$
###### Proof Sketch\.
Under V1 (fault in camera, not actuator), true positions satisfy $\|p_i(t) - p_i^*(t)\| \leq \varepsilon_{\mathrm{ctrl}} < 2\,\text{mm}$ for all $i, t$. Nominal trajectories maintain pairwise clearance $\geq c_{\mathrm{nom}} = 76\,\text{mm} \gg 2\varepsilon_{\mathrm{ctrl}}$; the triangle inequality prevents contact. For feedback-coupled systems (V1 fails), the E-Stop fires at perceived clearance $\leq -37\,\text{mm}$; since true clearance $\geq \text{perceived} - \|d_A\|$ and $\|d_A\| \leq \sigma_{\mathrm{fault}}$ (OU bound), true clearance at E-Stop trigger $\geq -37 + 120 = 83\,\text{mm} > 0$. See Appendix [B.5](https://arxiv.org/html/2607.00334#A2.SS5). $\square$
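The V1 half of the argument can be written out as a short chain of inequalities. This is a sketch under one stated assumption: nominal clearance is measured beyond the collision threshold $2r + \Delta_{\mathrm{safe}}$, so that the nominal centre distance is at least $2r + \Delta_{\mathrm{safe}} + c_{\mathrm{nom}}$. The appendix proof may organize the steps differently.

```latex
\begin{align*}
\|p_i(t) - p_j(t)\|
  &\geq \|p_i^*(t) - p_j^*(t)\| - \|p_i(t) - p_i^*(t)\| - \|p_j(t) - p_j^*(t)\| \\
  &\geq \bigl(2r + \Delta_{\mathrm{safe}} + c_{\mathrm{nom}}\bigr) - 2\varepsilon_{\mathrm{ctrl}} \\
  &> 2r + \Delta_{\mathrm{safe}} + 76\,\text{mm} - 4\,\text{mm}
   \;=\; 2r + \Delta_{\mathrm{safe}} + 72\,\text{mm}.
\end{align*}
```

The first line is the reverse triangle inequality applied twice, the second uses the nominal clearance and the encoder tracking bound, and the third substitutes $c_{\mathrm{nom}} = 76\,\text{mm}$ and $\varepsilon_{\mathrm{ctrl}} < 2\,\text{mm}$. The true distance therefore stays strictly above the collision threshold.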
## 8UR5 CPS Case Study and Evaluation
### 8\.1Physical and Sensor Setup
Three Universal Robots UR5 manipulators [[27](https://arxiv.org/html/2607.00334#bib.bib27)] are mounted at the vertices of an equilateral triangle of circumradius $R = 0.52\,\text{m}$, oriented inward toward a shared assembly fixture at the workspace origin. Each UR5 has reach $0.85\,\text{m}$ and bounding-sphere radius $r = 0.12\,\text{m}$. Nominal end-effector trajectories maintain pairwise inter-sphere clearance of approximately 80 mm throughout fault-free operation:

$$p_i^*(t) = p_{\mathrm{base},i} + d_i\, r_{\mathrm{reach}}(t), \qquad r_{\mathrm{reach}}(t) = 0.49 + 0.01 \sin\bigl(2\pi(0.2)\,t + \phi_i\bigr).$$

Physical collision is defined as $\|p_i(t) - p_j(t)\| \leq 2r + \Delta_{\mathrm{safe}} = 0.29\,\text{m}$, where $\Delta_{\mathrm{safe}} = 0.05\,\text{m}$ is a safety margin per ISO/TS 15066 [[28](https://arxiv.org/html/2607.00334#bib.bib28)] and ISO 10218-1 [[29](https://arxiv.org/html/2607.00334#bib.bib29)].
#### Sensor architecture and the V1 validity condition\.
UR5 robots use joint\-space encoders for internal motion control\. Cell\-level safety monitoring uses an overhead stereo\-vision system subject to calibration drift and occlusion errors\[[27](https://arxiv.org/html/2607.00334#bib.bib27),[30](https://arxiv.org/html/2607.00334#bib.bib30)\]\. This architectural separation is consequential: the robot’s true trajectory is governed by the encoder\-based controller and is*unaffected*by camera errors\. Sensor drift biases what the safety monitor perceives; it does not alter what the robot executes\. The consensus gate \(Definition[7](https://arxiv.org/html/2607.00334#Thmdefinition7)\) therefore operates on*perceived*positions while physical safety is determined by*true*positions\. This validates V1: epoch drain is safe because the actuator \(encoder loop\) is isolated from the sensor fault\.
The collision-risk signal for ordered pair $(i, j)$ is $\text{risk}_{ij}(t) = \sigma_{\mathrm{logistic}}(-c_{ij}(t)/\Delta_{\mathrm{safe}})$, where $c_{ij}(t) = \|\hat{p}_i(t) - \hat{p}_j(t)\| - 2r - \Delta_{\mathrm{safe}}$. Thresholds: $\tau_{\mathrm{meta}} = 0.19$ activates at drift $\approx 8\,\text{mm}$; $\tau_{\mathrm{crit}} = 0.65$ activates at drift $\approx 117\,\text{mm}$.
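The risk signal is a logistic function of the perceived clearance, which the following sketch makes concrete. The function names are hypothetical, positions are assumed to be 3-vectors in metres, and $\sigma_{\mathrm{logistic}}$ is assumed to be the standard logistic $1/(1+e^{-x})$.

```python
import math

R_SPHERE = 0.12      # bounding-sphere radius r in metres (from the text)
DELTA_SAFE = 0.05    # safety margin in metres (from the text)


def perceived_clearance(p_hat_i, p_hat_j, r=R_SPHERE, delta_safe=DELTA_SAFE):
    """c_ij = ||p_hat_i - p_hat_j|| - 2r - delta_safe, in metres."""
    return math.dist(p_hat_i, p_hat_j) - 2.0 * r - delta_safe


def collision_risk(p_hat_i, p_hat_j, r=R_SPHERE, delta_safe=DELTA_SAFE):
    """risk_ij = logistic(-c_ij / delta_safe), written as 1 / (1 + exp(c_ij / delta_safe))."""
    c = perceived_clearance(p_hat_i, p_hat_j, r, delta_safe)
    return 1.0 / (1.0 + math.exp(c / delta_safe))
```

At zero clearance the risk is exactly one half, it falls toward zero as the perceived spheres separate, and it rises toward one as the perceived clearance becomes negative. Because the inputs are perceived positions, camera drift moves this signal even when the true positions are unchanged.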
### 8\.2Fault Model and Baseline
We model sensor faults using statistics from the NIST *Degradation Measurement of Robot Arm Position Accuracy* dataset [[31](https://arxiv.org/html/2607.00334#bib.bib31)]. At a uniformly random injection time $t_{\mathrm{anom}} \in [20, 60]$, agent A’s camera reports a directional bias toward agent B. The drift vector $d_A(t)$ follows an Ornstein-Uhlenbeck process [[32](https://arxiv.org/html/2607.00334#bib.bib32)]:

$$d_A(t+1) = d_A(t) + \theta_{\mathrm{ou}}\,\bigl(\mu - d_A(t)\bigr) + \varepsilon(t)$$

where $\mu = \sigma_{\mathrm{fault}} \cdot \hat{u}(p_B - p_A)$, $\theta_{\mathrm{ou}} = 0.25$, and $\varepsilon(t) \sim \mathcal{N}(0, \sigma_{\mathrm{noise}}^2 I_3)$ with $\sigma_{\mathrm{noise}} = 0.10 \cdot \sigma_{\mathrm{fault}}$. The bimodal 90%/10% mixture is a simulation design choice informed by the NIST position-degradation measurements; it is not a class distribution supplied by the NIST dataset. Specifically, 90% of episodes use $\sigma_{\mathrm{fault}} = 12\,\text{mm}$ (normal calibration drift), and 10% use $\sigma_{\mathrm{fault}} = 120\,\text{mm}$ (severe multi-fault cascade).
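A minimal simulation of the discrete OU update is sketched below. The helper names are hypothetical, and the sketch assumes that drift starts at zero at the injection time, which the text does not state explicitly.

```python
import random


def ou_step(d, mu, theta_ou, sigma_noise, rng):
    """One update: d + theta_ou * (mu - d) + eps, with eps ~ N(0, sigma_noise^2) per axis."""
    return tuple(
        di + theta_ou * (mi - di) + rng.gauss(0.0, sigma_noise)
        for di, mi in zip(d, mu)
    )


def simulate_drift(mu, theta_ou, sigma_noise, steps, seed=42):
    """Drift trajectory from zero initial drift (an assumption of this sketch)."""
    rng = random.Random(seed)
    d = (0.0,) * len(mu)
    path = [d]
    for _ in range(steps):
        d = ou_step(d, mu, theta_ou, sigma_noise, rng)
        path.append(d)
    return path
```

With the noise switched off, each step closes a quarter of the remaining gap to $\mu$ when $\theta_{\mathrm{ou}} = 0.25$, so the drift approaches the bias $\mu$ geometrically at rate $\alpha = 0.75$; the noise term then keeps it fluctuating around that mean.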
#### Baseline condition\.
The baseline implements the single-agent gate (Definition [3](https://arxiv.org/html/2607.00334#Thmdefinition3)) applied naively to each agent independently. The cell executes at $G_4$ ($v = 1.0$) when all individual gates are open, and freezes at $G_0$ ($v = 0$) when any gate closes. The baseline has no governance-state layer: $\tau_{\mathrm{meta}}$, $\tau_{\mathrm{crit}}$, and the E-Stop are absent. This represents the performance ceiling of the single-agent formulation in a multi-agent setting.
### 8\.3Experimental Setup
We run Monte Carlo simulation [[33](https://arxiv.org/html/2607.00334#bib.bib33)] over $N = 10{,}000$ episodes with fixed seed 42 and $T = 150$ epochs per episode. Both conditions use identical fault seeds, injection times, and OU parameters. The governed condition uses the UR5 policy triple specified after Definition [11](https://arxiv.org/html/2607.00334#Thmdefinition11), including Auto Continue$[\delta{=}3]$ for return from Meta-Cognitive and Reset Restart after Regulated.
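The episode-level sampling described in this setup can be sketched as follows. This is not the authors' simulator: the function name, the dictionary keys, the use of integer injection epochs, and the order of random draws are all assumptions, so the sketch illustrates the design (shared seed, 90%/10% mixture, injection window) without reproducing the reported numbers.

```python
import random


def sample_episodes(n_episodes=10_000, seed=42, severe_fraction=0.10):
    """Draw per-episode fault settings; integer injection epochs are an assumption."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_episodes):
        sigma_fault_mm = 120.0 if rng.random() < severe_fraction else 12.0
        t_anom = rng.randint(20, 60)  # inclusive on both ends
        episodes.append({
            "sigma_fault_mm": sigma_fault_mm,
            "sigma_noise_mm": 0.10 * sigma_fault_mm,
            "t_anom": t_anom,
        })
    return episodes
```

Generating the episode list once and feeding the same list to both the baseline and the governed condition is one simple way to guarantee that the two conditions see identical fault magnitudes and injection times.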
Table 2: Single-agent baseline vs. governed multi-agent runtime. $N = 10{,}000$ episodes, seed $= 42$, $T = 150$ epochs. $\sigma_{\mathrm{fault}} = 12\,\text{mm}$ (90%), $120\,\text{mm}$ (10%).
### 8\.4Results and Analysis
100% convergence in the baseline masks a 97.9% detection failure rate. Every baseline episode returns to Stable, but for 97.9% of episodes this reflects the OU process naturally reverting without any response from the control layer. The system converges *despite* the fault, not *because* of any corrective action. The governed runtime’s 90.2% convergence is lower in absolute terms but qualitatively different: 89.9% of episodes produce a certified $S \to M \to S$ cycle with a full audit trace. The 9.8% that do not converge are severe-fault episodes correctly escalated to Regulated, which is the appropriate response to a multi-fault cascade.
The consensus gate is necessary but not sufficient. For normal faults ($\sigma_{\mathrm{fault}} = 12\,\text{mm}$), $U_A \approx 0.264 \gg \theta$ at OU steady state; the gate never closes. The baseline is structurally incapable of detecting 90% of fault injections because the fault enters at the sensor layer that the single-agent utility function does not monitor. The $\tau_{\mathrm{meta}}$ threshold, evaluated on $r_{\max}$ independently of $\textsc{Gate}_c$, is the decisive addition, as confirmed by the ablation (Table [3](https://arxiv.org/html/2607.00334#S8.T3)).
Equal collision rates do not imply equal safety guarantees. Both conditions report zero physical collisions. Under the baseline, zero collisions is an artefact of the UR5 telemetry architecture (V1): encoder control is unaffected by camera drift. For feedback-coupled CPS (GPS-guided vehicles, vision-servo manipulation), where sensor errors propagate to the control input (V1 fails), the baseline’s result does not hold. Corollary [9](https://arxiv.org/html/2607.00334#Thmtheorem9) shows that the runtime’s $v = 0.5$ in Meta-Cognitive scales the Lyapunov coupling term by a factor of $v^2 = 0.25$, a quantifiable stabilising effect that the binary gate cannot match.
Continue Independent preserves throughput. Agents B and C operate at $G_3$ ($v = 1.0$) while A is damped to $G_2$ ($v = 0.5$), preserving approximately 83% of cell productive throughput in Meta-Cognitive versus 50% under Hard Dependency. By Corollary [7](https://arxiv.org/html/2607.00334#Thmtheorem7)(b), this does not compromise safety because B/C’s per-agent gates remain self-limiting (V2).
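The 83% and 50% figures are consistent with treating cell throughput as the mean velocity scale across the three agents. That reading is an assumption made here for illustration, as is the interpretation that Hard Dependency holds all three agents at $v(G_2) = 0.5$; the paper's own throughput metric may be defined differently.

```python
def cell_throughput(velocity_scales):
    """Mean velocity scale across agents, used here as a proxy for productive throughput."""
    return sum(velocity_scales) / len(velocity_scales)


# Continue Independent: A damped to G2, B and C remain at G3.
continue_independent = cell_throughput([0.5, 1.0, 1.0])

# Hard Dependency (assumed reading): every agent held at the G2 scale.
hard_dependency = cell_throughput([0.5, 0.5, 0.5])
```

Under this proxy, Continue Independent yields $(0.5 + 1.0 + 1.0)/3 \approx 0.83$ and Hard Dependency yields $0.5$.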
### 8\.5Multi\-Agent Component Ablation
Table 3: Incremental ablation of multi-agent runtime mechanisms. $N = 10{,}000$ episodes.

The consensus gate alone provides no improvement for normal faults. The $\tau_{\mathrm{meta}}$ threshold is the decisive component, entirely responsible for the $47.7\times$ detection gain and $3.5\times$ latency reduction. $\tau_{\mathrm{crit}}$ + E-Stop prevents indefinite deadlock on severe faults. $V_{\mathrm{swarm}}$ adds the formal Lyapunov certificate without changing operational metrics. The policy triple adds no operational change to these metrics but makes the domain-specific rendezvous assumptions explicit and formally verifiable via Corollary [7](https://arxiv.org/html/2607.00334#Thmtheorem7).
## 9Discussion
#### Positioning and implications\.
The runtime architecture evaluated here occupies a distinct position among agent control systems\. Unlike verification frameworks\[[3](https://arxiv.org/html/2607.00334#bib.bib3),[4](https://arxiv.org/html/2607.00334#bib.bib4)\]that check actions post\-hoc, the utility gate provides*preventive*control\. Unlike static autonomy taxonomies\[[6](https://arxiv.org/html/2607.00334#bib.bib6)\], the gear mechanism provides*dynamic*adjustment based on observed behavior\. Unlike RL\-based approaches\[[16](https://arxiv.org/html/2607.00334#bib.bib16),[17](https://arxiv.org/html/2607.00334#bib.bib17)\]that learn through trial and error, both frameworks enforce safety*structurally*\. The representation theorem \(Theorem[5](https://arxiv.org/html/2607.00334#Thmtheorem5)\) connects EntropyRuntime to the MDP formalism, enabling the application of standard RL tools such as value iteration and policy gradient while retaining the structural safety guarantees pure MDP formulations lack\.
This layered design prevents governance states from being conflated with robotic control modes\.
#### Practical deployment\.
The utility function requires domain-specific instantiation. A linear combination of task progress, safety margin, and resource cost suffices for most single-agent applications. For multi-agent CPS, the entropy term $H_i$ must be calibrated to sensor specifications; the NIST position-accuracy dataset provides an empirical calibration reference. The safety threshold $\theta$ is a single knob trading caution against productivity; we recommend $\theta = 0.15$ as the UR5 default. The gear escalation patience $h = 3$ balances responsiveness to dynamic environments against stability in predictable ones. The Auto Continue parameter $\delta = 3$ provides sufficient hysteresis against spurious re-escalation under the OU mean-reversion dynamics (validated by Corollary [7](https://arxiv.org/html/2607.00334#Thmtheorem7)(c)).
#### Limitations and future work\.
Several limitations remain. First, the utility function must be specified externally; the framework does not learn $U$ from data. In future deployments, domain-expert constraints and formally specified invariants may supply part of the utility and threshold specification, but that integration is not implemented or empirically tested here. Second, the single-agent gear abstraction assumes a total order on action scopes; non-hierarchical structures may require extensions. Third, Theorem [3](https://arxiv.org/html/2607.00334#Thmtheorem3) assumes environmental stationarity; adaptive gear transition policies are needed for non-stationary settings. Fourth, Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8) assumes the OU process accurately models sensor fault dynamics; other fault processes (step faults, intermittent faults) require re-derivation of the Lyapunov bound. Fifth, the policy triple’s validity conditions V1 through V3 must be verified for each domain instantiation; feedback-coupled systems with $\alpha_{fb} > 0$ require $\Pi_d = \textsc{Immediate}$ and may require $\Pi_h = \textsc{Hard Dependency}$. Sixth, the three-agent simulation moves beyond a two-agent dyad and exercises three pairwise interactions, but it does not establish scalability to substantially larger teams. Future work should quantify how agent count, workspace density, communication and synchronization overhead, decision latency, throughput, and safety margins interact. These space, time, and performance trade-offs must be evaluated in the context of the target problem and application domain; the consensus check is approximately linear in the number of agents, whereas exhaustive pairwise collision evaluation can grow quadratically.
Finally, Theorem[1](https://arxiv.org/html/2607.00334#Thmtheorem1)and Theorem[8](https://arxiv.org/html/2607.00334#Thmtheorem8)are proved separately for the single\-agent and multi\-agent cases; a unified Lyapunov framework spanning both is a natural direction for future work\.
#### Applicability\.
The runtime mechanisms are applicable to safety\-critical autonomous systems in domains such as healthcare, finance, infrastructure management, and industrial robotics, provided that the utility function, thresholds, and policy\-triple validity conditions are instantiated and verified for the target domain\.
## 10Conclusion
We presented EntropyRuntime as an action\-level control layer for single\- and multi\-agent systems\. Five single\-agent results establish stability, safe dispatch, recovery, and an MDP representation\. The multi\-agent analysis extends these guarantees to joint execution, rendezvous behavior, workspace stability, feedback\-coupled attenuation, and collision avoidance under the stated assumptions\.
In the UR5 study, the governed runtime improves anomaly detection by $47.7\times$, reduces detection latency by $3.5\times$, produces an audit trace for 89.9% of episodes, and adds a formal workspace certificate. The component ablation identifies the $\tau_{\mathrm{meta}}$ threshold as the main source of the detection gain, while the E-Stop, Lyapunov bound, and policy triple provide escalation, certification, and explicit coordination semantics.
The central result is a concrete separation of concerns: SMARt governs authority, whereas gears, gates, and rendezvous policies enforce that authority at runtime\. This preserves the established lifecycle while making its CPS behavior formally analyzable\.
## Data Availability Statement
The data supporting the findings of this study are derived from the publicly available*Degradation Measurement of Robot Arm Position Accuracy*dataset cited in Reference\[[31](https://arxiv.org/html/2607.00334#bib.bib31)\], together with publicly documented UR5 specifications cited in Reference\[[27](https://arxiv.org/html/2607.00334#bib.bib27)\]\. The simulation methodology, three\-agent configuration, parameter groupings, invariant and risk thresholds, noise levels, policy settings, random seed, episode count, and evaluation metrics used in this study are described in the manuscript\. No proprietary, private, or personally identifiable datasets were used\. The study did not generate a new external dataset repository; derived simulation outputs and intermediate materials may be made available from the second author upon reasonable request\.
## References
- [1] H. Chase. LangChain: Building applications with LLMs through composability. *GitHub repository*, 2022.
- [2] T. Richards. AutoGPT: An autonomous GPT-4 experiment. *GitHub repository*, 2023.
- [3] R. Doshi and J. Hong. Verifiably safe tool use for LLM agents. *arXiv preprint arXiv:2601.08012*, 2026.
- [4] M. Grigor, A. Kumar, and S. Lee. VET your agent: Verification, evaluation, and testing for autonomous LLM agents. *arXiv preprint arXiv:2512.15892*, 2025.
- [5] S. Ramaswamy. Intelligence as managed autonomy: Failure, escalation, and governance for agentic AI systems. *Journal of Intelligent & Robotic Systems*, to appear, 2026. Preprint: arXiv:2605.27628.
- [6] Z. Feng and R. McDonald. Levels of autonomy for AI agents. *arXiv preprint arXiv:2506.12469*, 2025.
- [7] D. Hadfield-Menell, A. Dragan, P. Abbeel, and S. Russell. The off-switch game. In *Proc. IJCAI*, pages 220-227, 2017.
- [8] N. G. Leveson. *Engineering a Safer World: Systems Thinking Applied to Safety*. MIT Press, 2011.
- [9] C. Hwang, S. Majumder, and N. Peng. Autonomous language model agents with tool use. In *Findings EMNLP 2023*, pages 5678-5692, 2023.
- [10] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao. ReAct: Synergizing reasoning and acting in language models. In *Proc. ICLR*, 2023.
- [11] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao. Reflexion: Language agents with verbal reinforcement learning. In *NeurIPS 36*, pages 8634-8652, 2023.
- [12] A. Sarkar and R. Sarkar. A survey of LLM agent communication with the model context protocol. *arXiv preprint arXiv:2506.05364*, 2025.
- [13] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané. Concrete problems in AI safety. *arXiv preprint arXiv:1606.06565*, 2016.
- [14] S. Russell. *Human Compatible: Artificial Intelligence and the Problem of Control*. Viking, 2019.
- [15] R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. *Artificial Intelligence*, 112(1-2):181-211, 1999.
- [16] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In *Proc. ICML*, pages 1861-1870, 2018.
- [17] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. In *Proc. ICML*, pages 2778-2787, 2017.
- [18] J. A. Stankovic. Misconceptions about real-time computing. *Computer*, 21(10):10-19, 1988.
- [19] J. R. Norris. *Markov Chains*. Cambridge University Press, 1997.
- [20] R. Bellman. *Dynamic Programming*. Princeton University Press, 1957.
- [21] M. L. Puterman. *Markov Decision Processes*. Wiley, 1994.
- [22] T. M. Cover and J. A. Thomas. *Elements of Information Theory*. Wiley, 2nd edition, 2006.
- [23] R. Olfati-Saber, J. A. Fax, and R. M. Murray. Consensus and cooperation in networked multi-agent systems. *Proceedings of the IEEE*, 95(1):215-233, 2007.
- [24] V. Digani, L. Sabattini, C. Secchi, and C. Fantuzzi. Ensemble coordination for multi-robot systems. *IEEE Transactions on Automation Science and Engineering*, 12(2):649-662, 2015.
- [25] A. Rizk, M. Awad, and E. W. Tunstel. Cooperative heterogeneous multi-robot systems: A survey. *ACM Computing Surveys*, 52(2):1-31, 2019.
- [26] H. K. Khalil. *Nonlinear Systems*. Prentice Hall, 3rd edition, 2002.
- [27] Universal Robots. *UR5/CB3 User Manual, Software Version 3.15*. Universal Robots A/S, Odense, Denmark, 2022.
- [28] ISO/TS 15066:2016. *Robots and Robotic Devices: Collaborative Robots*. ISO, Geneva, 2016.
- [29] ISO 10218-1:2011. *Robots and Robotic Devices: Safety Requirements for Industrial Robots, Part 1: Robots*. ISO, Geneva, 2011.
- [30] S. Haddadin, A. De Luca, and A. Albu-Schäffer. Robot collisions: A survey on detection, isolation, and identification. *IEEE Transactions on Robotics*, 33(6):1292-1312, 2017.
- [31] H. Qiao. *Degradation Measurement of Robot Arm Position Accuracy*. National Institute of Standards and Technology, Version 1.0, 2018. DOI: [https://doi.org/10.18434/M31962](https://doi.org/10.18434/M31962). NIST Public Data Repository: [https://data.nist.gov/od/id/754A77D9DA1E771AE0532457068179851962](https://data.nist.gov/od/id/754A77D9DA1E771AE0532457068179851962). Accessed June 29, 2026.
- [32] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. *Physical Review*, 36(5):823-841, 1930.
- [33] D. P. Kroese, T. Brereton, T. Taimre, and Z. I. Botev. Why the Monte Carlo method is so important today. *WIREs Computational Statistics*, 6(6):386-392, 2014.
## Appendix AComplete Proofs: Single\-Agent System
### A\.1Proof of Theorem[1](https://arxiv.org/html/2607.00334#Thmtheorem1)\(Monotonic Stability\)
###### Proof\.
Let $\rho_t = (s_t, g_t, \sigma_t, \epsilon_t)$. We consider three cases.

Case 1: Action accepted. $\textsc{Gate}(s_t, a_t) = 1 \Rightarrow \sigma_{t+1} = \max(0, \sigma_t - \delta) \leq \sigma_t$.

Case 2: Fallback succeeds. An alternative $a'_t$ with $\textsc{Gate} = 1$ is found; $\sigma_{t+1} = \sigma_t$.

Case 3: Fallback fails. $\sigma_{t+1} = \sigma_t + \Delta\sigma$; gear de-escalates to $g_{t+1} = \max(g_t - 1, G_0)$.

Let $p_1(t), p_2(t), p_3(t)$ denote the conditional probabilities of Cases 1, 2, 3.

$$\mathbb{E}[\sigma_{t+1} \mid \rho_t] = \sigma_t - p_1(t)\,\delta + p_3(t)\,\Delta\sigma \tag{1}$$

It suffices to show $p_1(t)\,\delta \geq p_3(t)\,\Delta\sigma$.
Gear de-escalation in Case 3 restricts the action space from $\mathcal{A}_{g_t}$ to $\mathcal{A}_{g_t - 1} \subset \mathcal{A}_{g_t}$, increasing the gate acceptance probability since lower-risk actions dominate in the restricted space:
$$\Pr\!\left[\textsc{Gate} = 1 \mid a \sim \mathrm{LLM}(\cdot \mid s, g')\right] \geq \Pr\!\left[\textsc{Gate} = 1 \mid a \sim \mathrm{LLM}(\cdot \mid s, g)\right], \qquad g' < g.$$

With equal step sizes $\Delta\sigma = \delta$ and $p_1(t) \geq p_3(t)$ (the LLM assigns higher probability to admissible actions under gear restrictions): $p_1(t)\,\delta \geq p_3(t)\,\Delta\sigma$, so $\mathbb{E}[\sigma_{t+1} \mid \rho_t] \leq \sigma_t$. The tower property gives $\mathbb{E}[\sigma_{t+1} \mid \sigma_t] \leq \sigma_t$. $\square$
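The three cases of the proof correspond to a single branching update, sketched below. The function name, the integer gear index, and the representation of fallback alternatives as a list of utilities are hypothetical simplifications; Algorithm 1 in the paper is the authoritative description of the runtime cycle.

```python
def sigma_step(sigma, gear, proposed_u, fallback_us, theta, delta, delta_sigma):
    """Return (next_sigma, next_gear, case) for one step of the three-case analysis.

    gear is an integer index 0..4; utilities are plain floats (hypothetical inputs).
    """
    if proposed_u >= theta:                      # Case 1: action accepted
        return max(0.0, sigma - delta), gear, 1
    if any(u >= theta for u in fallback_us):     # Case 2: fallback succeeds
        return sigma, gear, 2
    # Case 3: fallback fails, instability grows and the gear de-escalates
    return sigma + delta_sigma, max(gear - 1, 0), 3
```

Averaging the first return value over the case probabilities $p_1, p_2, p_3$ recovers the conditional expectation used in the proof, provided $\sigma_t \geq \delta$ so that the clamp at zero is inactive.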
### A\.2Proof of Theorem[2](https://arxiv.org/html/2607.00334#Thmtheorem2)\(Execution Safety\)
###### Proof\.
By Algorithm [1](https://arxiv.org/html/2607.00334#alg1) (line 5), action $a_t$ is executed iff $\textsc{Gate}(s_t, a_t) = 1$. By Definition [3](https://arxiv.org/html/2607.00334#Thmdefinition3), $\textsc{Gate} = 1 \iff U(s_t, a_t) \geq \theta$. Since $\theta \geq 0$: $U(s_t, a_t) \geq \theta \geq 0$. Fallback actions $a'_t$ are executed only if $\textsc{Gate}(s_t, a'_t) = 1$ (Algorithm [1](https://arxiv.org/html/2607.00334#alg1), line 10), yielding the same bound. The LLM can only suggest; the gate decides. Contrapositive: $U(s_t, a_t) < 0 \Rightarrow a_t$ not executed. $\square$
### A.3 Proof of Theorem [3](https://arxiv.org/html/2607.00334#Thmtheorem3) (Eventual Stabilization)
###### Proof.
Define the Lyapunov function $V(\rho)=\sigma + C\cdot\mathbf{1}[\epsilon=1] + D\cdot k(g)$, where $k(g)$ is the gear index and $C,D>0$. Choose $C>\Delta\sigma/\delta$ and $D>C+\Delta\sigma$.

Case A (successful execution): $\sigma_{t+1}\leq\sigma_{t}-\delta$; the gear holds or escalates. $V$ decreases by at least $\delta$ from the $\sigma$ term.

Case B (fallback failure): $\sigma_{t+1}=\sigma_{t}+\Delta\sigma$; $\epsilon_{t+1}=1$; the gear de-escalates. $V(\rho_{t+1})=V(\rho_{t})+\Delta\sigma+C-D<V(\rho_{t})$ since $D>C+\Delta\sigma$.

By the Foster-Lyapunov theorem [[19](https://arxiv.org/html/2607.00334#bib.bib19)], the chain is positive recurrent. The finite state space $\mathcal{G}$ guarantees that the gear process has only singleton recurrent classes. Therefore $\exists\, g^{*}\in\mathcal{G}$, $T^{*}<\infty$ such that $g_{t}=g^{*}$ for all $t\geq T^{*}$ a.s. ∎
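As a numeric sanity check of the two cases, the following sketch evaluates $V(\rho)=\sigma + C\cdot\mathbf{1}[\epsilon=1] + D\cdot k(g)$ before and after each transition. All constants and the starting state are illustrative assumptions, chosen only to satisfy $C>\Delta\sigma/\delta$ and $D>C+\Delta\sigma$; the sketch covers only the gear-holds branch of Case A.

```python
def lyapunov(sigma: float, eps: int, k: int, C: float, D: float) -> float:
    """V(rho) = sigma + C * 1[eps = 1] + D * k(g), with k the gear index."""
    return sigma + C * (1 if eps == 1 else 0) + D * k


# Illustrative constants (assumptions, not values from the paper).
delta = 0.1        # entropy decrease on successful execution
delta_sigma = 0.1  # entropy increase on fallback failure
C = 2.0            # satisfies C > delta_sigma / delta = 1
D = 3.0            # satisfies D > C + delta_sigma = 2.1

# Hypothetical starting state: sigma = 1.0, no error flag, gear index 3.
v_start = lyapunov(1.0, 0, 3, C, D)

# Case A (gear holds): sigma drops by delta, flag and gear unchanged.
v_case_a = lyapunov(1.0 - delta, 0, 3, C, D)

# Case B: sigma rises by delta_sigma, error flag set, gear de-escalates by one.
v_case_b = lyapunov(1.0 + delta_sigma, 1, 2, C, D)

change_a = v_case_a - v_start  # approximately -delta
change_b = v_case_b - v_start  # approximately delta_sigma + C - D
```

Both `change_a` and `change_b` are negative for these values, matching the decrease claimed in Cases A and B.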
### A.4 Proof of Theorem [4](https://arxiv.org/html/2607.00334#Thmtheorem4) (Fallback Completeness)
###### Proof.
Base case $k=0$: At $G_{0}$, $\mathcal{A}_{0}$ contains only read-only observations. $U(s,a)\geq 0$ for any $a\in\mathcal{A}_{0}$, since observation reduces uncertainty without side effects.

Inductive step $k>0$: After at most $m$ failed alternatives at $G_{k}$, the gear de-escalates to $G_{k-1}$. By induction, after at most $k\leq 4$ de-escalation steps, the system reaches $G_{0}$. At $G_{0}$, Condition (1) is satisfied (base case). Since $k\leq 4$, the bound of $|\mathcal{G}|-1=4$ steps is tight. ∎
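The induction can be traced with a short sketch that counts de-escalation steps. The function name `fall_back` and the predicate `has_admissible` are hypothetical: the predicate stands in for "at least one of the at most $m$ alternatives at this gear passes the gate", and $G_{0}$ is treated as always admissible, following the base case.

```python
from typing import Callable, Tuple

NUM_GEARS = 5  # |G| = 5, gear indices 0..4


def fall_back(start_gear: int, has_admissible: Callable[[int], bool]) -> Tuple[int, int]:
    """Descend one gear at a time until some alternative is admissible.

    Returns (final_gear, de_escalation_steps). Gear 0 always terminates
    the loop, mirroring the base case of the proof.
    """
    if not 0 <= start_gear < NUM_GEARS:
        raise ValueError("gear index must lie in 0..4")
    gear, steps = start_gear, 0
    while gear > 0 and not has_admissible(gear):
        gear -= 1
        steps += 1
    return gear, steps


# Worst case: start at the top gear and nothing above gear 0 is admissible.
worst_case = fall_back(4, lambda g: False)
```

In the worst case the loop performs exactly $|\mathcal{G}|-1=4$ de-escalations before reaching gear 0, which is the tight bound stated in the proof.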
### A.5 Proof of Theorem [5](https://arxiv.org/html/2607.00334#Thmtheorem5) (Representation Theorem)
###### Proof.
Construct $\mathcal{M}=(\mathcal{R},\mathcal{A},T,R,\gamma)$ with $R(\rho,a)=U(s,a)\cdot\textsc{Gate}(s,a)$ and $T$ inherited from the runtime.

A policy is *gear-constrained* if $\pi(a\mid\rho)>0 \Rightarrow a\in\mathcal{A}_{g}$. Let $\Pi_{\mathcal{M}}^{gc}$ denote the set of all gear-constrained stationary policies in $\mathcal{M}$.

$\Pi_{\mathrm{EntropyRuntime}}\subseteq\Pi_{\mathcal{M}}^{gc}$: EntropyRuntime constrains $a_{t}\in\mathcal{A}_{g_{t}}$ by the gear mechanism; the resulting mapping $\pi:\mathcal{R}\to\Delta(\mathcal{A})$ is gear-constrained by construction.

$\Pi_{\mathcal{M}}^{gc}\subseteq\Pi_{\mathrm{EntropyRuntime}}$: given $\pi_{\mathcal{M}}\in\Pi_{\mathcal{M}}^{gc}$, sample $a\sim\pi_{\mathcal{M}}(\cdot\mid\rho)$, apply $\textsc{Gate}$, and invoke fallback on rejection. The fallback dynamics are deterministic given $\rho$ and the rejection event, so they can be absorbed into an augmented kernel $T_{\mathrm{aug}}$. This constructs a valid EntropyRuntime policy.

Value equivalence: $V^{\pi}_{\mathrm{EntropyRuntime}}(\rho)=\mathbb{E}_{\pi}\bigl[\sum_{t}\gamma^{t}U(s_{t},a_{t})\,\textsc{Gate}(s_{t},a_{t})\bigr]=V^{\pi_{\mathcal{M}}}_{\mathcal{M}}(\rho)$, since $R(\rho,a)=U(s,a)\cdot\textsc{Gate}(s,a)$ and the dynamics are identical. Therefore $\Pi_{\mathrm{EntropyRuntime}}=\Pi_{\mathcal{M}}^{gc}$ and the value functions coincide. ∎
## Appendix B Complete Proofs: Multi-Agent System
### B.1 Proof of Theorem [6](https://arxiv.org/html/2607.00334#Thmtheorem6) (Distributed Execution Safety)
###### Proof.
Fix $t\geq 0$. Suppose $\textsc{Gate}_{c}(x_{t},a_{t}^{1:n})=1$. By Definition [7](https://arxiv.org/html/2607.00334#Thmdefinition7), this requires $\min_{i\in\mathcal{I}}U_{i}(s_{t}^{i},a_{t}^{i})\geq\theta$, immediately implying $U_{i}\geq\theta\geq 0$ for every $i\in\mathcal{I}$. The gate is evaluated before dispatch. Theorem [2](https://arxiv.org/html/2607.00334#Thmtheorem2) applies to each agent independently. The policy triple $\Pi$ governs drain, hold, and return behavior. Any continuation in Meta-Cognitive is restricted to the bounded $\mathcal{A}_{2}$ action set and remains utility-gated; Assisted and Regulated do not authorize autonomous task progression. ∎
### B.2 Proof of Corollary [7](https://arxiv.org/html/2607.00334#Thmtheorem7) (Rendezvous Safety)
###### Proof.
(a) Drain safety (V1). V1 states that the fault is in the camera layer, not the encoder-based actuator. During the drain epoch, the faulting agent A executes with the old velocity scale $v(g_{t-1}^{A})$; its true position satisfies $\|p_{A}(t)-p_{A}^{*}(t)\|\leq\varepsilon_{\mathrm{ctrl}}$ regardless of perceived drift. Clearance constraints concern true positions (Theorem [10](https://arxiv.org/html/2607.00334#Thmtheorem10)); completing the epoch is therefore safe.

(b) Hold safety (V2). V2 states that non-faulting agents operate on geometrically independent trajectories. The per-agent gate for $j\in\{B,C\}$ evaluates $U_{j}$ on $j$'s perceived state, which includes A's perceived position in $\mathrm{risk}_{jA}$. If A's perceived drift raises $\mathrm{risk}_{jA}\geq\tau_{\mathrm{meta}}$, then $g_{t}^{j}$ descends to $G_{2}$; if $\mathrm{risk}_{jA}\geq\tau_{\mathrm{crit}}$, to $G_{0}$. In either case, $j$'s own gate prevents execution at $G_{3}$ without approval; the hold is self-limiting. By Theorem [6](https://arxiv.org/html/2607.00334#Thmtheorem6) applied to agent $j$, $U_{j}\geq\theta$ whenever $j$'s gate is open.

(c) Return soundness (V3). $\delta$ clean epochs with $r_{\max}<\tau_{\mathrm{meta}}$, combined with the OU contraction rate $\alpha=1-\theta_{\mathrm{ou}}$, imply that $\mathbb{E}[\|d_{A}\|^{2}]$ is decreasing during the clean sequence (Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8)). The probability that a single OU noise step returns $r_{\max}\geq\tau_{\mathrm{meta}}$ decreases geometrically in $\delta$. For $\delta\geq 3$ the return is sound with probability $\geq 1-\alpha^{2\delta}>0.82$. ∎
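The figure $0.82$ in part (c) can be reproduced directly from the bound $1-\alpha^{2\delta}$. The sketch below assumes $\theta_{\mathrm{ou}}=0.25$, which is the value implied by $\alpha=0.75$ in the proof of Theorem 8; the function name is hypothetical.

```python
def return_soundness_lower_bound(theta_ou: float, clean_epochs: int) -> float:
    """Lower bound 1 - alpha^(2 * delta) with alpha = 1 - theta_ou."""
    alpha = 1.0 - theta_ou
    return 1.0 - alpha ** (2 * clean_epochs)


# theta_ou = 0.25 is inferred from alpha = 0.75 (assumption stated above).
bound_three = return_soundness_lower_bound(0.25, 3)  # 1 - 0.75**6
bound_five = return_soundness_lower_bound(0.25, 5)   # 1 - 0.75**10
```

For three clean epochs the bound is $1-0.75^{6}\approx 0.822$, and it grows monotonically as more clean epochs are required before return.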
### B.3 Proof of Theorem [8](https://arxiv.org/html/2607.00334#Thmtheorem8) (Distributed Monotonic Stability)
###### Proof.
Let $\alpha=1-\theta_{\mathrm{ou}}=0.75$. Since $d_{B}=d_{C}=0$ and $\|p_{i}-p_{i}^{*}\|\leq\varepsilon_{\mathrm{ctrl}}\approx 0$:

$$V_{\mathrm{swarm}}(t)=(1+\lambda)\|d_{A}(t)\|^{2}+O(\varepsilon_{\mathrm{ctrl}})\approx(1+\lambda)\|d_{A}(t)\|^{2}$$

with $\lambda=50$, so $(1+\lambda)=51$. Under Continue Independent, B and C track nominal; their contribution to $V_{\mathrm{swarm}}$ is zero. The OU recursion gives:

$$\mathbb{E}[\|d_{A}(t+1)\|^{2}]=\|\alpha d_{A}(t)+(1-\alpha)\mu\|^{2}+3\sigma_{\mathrm{noise}}^{2}$$

Cauchy-Schwarz: $\|\alpha u+(1-\alpha)v\|^{2}\leq[\alpha\|u\|+(1-\alpha)\|v\|]^{2}$. Setting $\Delta V(t)=V_{\mathrm{swarm}}(t)-V^{*}$, where $V^{*}=51\|\mu\|^{2}$:

$$\begin{aligned}\mathbb{E}[\Delta V(t+1)] &\leq 51\Bigl([\alpha\|d_{A}(t)\|+(1-\alpha)\|\mu\|]^{2}-\|\mu\|^{2}\Bigr)+C\\ &\leq \alpha^{2}\Delta V(t)+C,\end{aligned}$$

where $C=51\cdot 3\sigma_{\mathrm{noise}}^{2}$. Iterating: $\mathbb{E}[\Delta V(t)]\leq\alpha^{2t}\Delta V(0)+C/(1-\alpha^{2})$. Since $\alpha^{2}=0.5625<1$, the bound is convergent.

Per-step bound. At OU stationarity, $\|\varepsilon(t)\|\leq 6.2\times 10^{-3}\,\text{m}$ at the 97.5th percentile ($3\sigma_{\mathrm{noise}}\sqrt{3}$). The per-step increment satisfies $\Delta V\leq 51(2\|\mu\|\|\varepsilon\|+\|\varepsilon\|^{2})=0.0095\,\text{m}^{2}<\delta_{V}=0.01\,\text{m}^{2}$. Zero violations were observed in 200 independent validation episodes. ∎
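The iteration step can be checked numerically. The sketch below iterates the worst case of the recursion $\mathbb{E}[\Delta V(t+1)]\leq\alpha^{2}\Delta V(t)+C$ with equality and compares it against the closed-form bound $\alpha^{2t}\Delta V(0)+C/(1-\alpha^{2})$. Only $\alpha=0.75$ comes from the proof; the values of `C_NOISE` and `DV0` are hypothetical placeholders, since the numeric $\sigma_{\mathrm{noise}}$ and initial condition are not restated here.

```python
from typing import List


def iterate_recursion(dv0: float, alpha: float, c: float, steps: int) -> List[float]:
    """Iterate x_{t+1} = alpha^2 * x_t + c, the equality case of the bound."""
    a2 = alpha ** 2
    values = [dv0]
    for _ in range(steps):
        values.append(a2 * values[-1] + c)
    return values


def closed_form_bound(dv0: float, alpha: float, c: float, t: int) -> float:
    """alpha^(2t) * dv0 + c / (1 - alpha^2), the bound stated in the proof."""
    a2 = alpha ** 2
    return a2 ** t * dv0 + c / (1.0 - a2)


ALPHA = 0.75     # from the proof: alpha = 1 - theta_ou
C_NOISE = 1e-3   # hypothetical stand-in for 51 * 3 * sigma_noise^2
DV0 = 0.5        # hypothetical initial excess energy

trajectory = iterate_recursion(DV0, ALPHA, C_NOISE, 30)
limit = C_NOISE / (1.0 - ALPHA ** 2)  # residual level as t grows
```

The iterates stay below the closed-form bound at every step and settle at `limit`, which is the residual term $C/(1-\alpha^{2})$ that remains once the transient $\alpha^{2t}\Delta V(0)$ has decayed.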
### B.4 Proof of Corollary [9](https://arxiv.org/html/2607.00334#Thmtheorem9) (Feedback-Coupled Stability Attenuation)
###### Proof.
Let sensor drift enter the control input through feedback gain $\alpha_{fb}\in[0,1]$. Under macro-state velocity scale $v(\psi_{t})$, the induced position-error contribution has norm bounded by $\alpha_{fb}\,v(\psi_{t})\,\|e_{t}\|$. Squaring this contribution in the Lyapunov energy yields an additional term bounded by $\alpha_{fb}^{2}\,v^{2}(\psi_{t})\,V_{\mathrm{swarm}}(t)$, with fixed weighting constants absorbed into the definition of $V_{\mathrm{swarm}}$. In Stable, $v=1$ and the full coupling term remains. In Meta-Cognitive, $v=0.5$, so the term is multiplied by $0.5^{2}=0.25$. In Regulated, $v=0$, so the coupling term vanishes. A binary gate permits only $v\in\{0,1\}$ and therefore cannot realize the intermediate attenuation factor. ∎
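The attenuation factors can be tabulated in a few lines. The sketch below uses only the three velocity scales named in the proof (Stable, Meta-Cognitive, Regulated); the dictionary and function names are hypothetical, and the Assisted state is omitted because its velocity scale is not given in this proof.

```python
VELOCITY_SCALE = {
    "Stable": 1.0,
    "Meta-Cognitive": 0.5,
    "Regulated": 0.0,
}


def coupling_factor(state: str, alpha_fb: float) -> float:
    """Multiplier alpha_fb^2 * v(state)^2 on V_swarm in the coupling term."""
    if not 0.0 <= alpha_fb <= 1.0:
        raise ValueError("alpha_fb must lie in [0, 1]")
    v = VELOCITY_SCALE[state]
    return (alpha_fb * v) ** 2


# With full feedback gain, the factors are 1, 0.25 and 0 respectively.
factors = {state: coupling_factor(state, 1.0) for state in VELOCITY_SCALE}
```

A binary gate corresponds to restricting `VELOCITY_SCALE` to the values 0.0 and 1.0, which removes the intermediate factor 0.25.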
### B.5 Proof of Theorem [10](https://arxiv.org/html/2607.00334#Thmtheorem10) (Zero-Collision Guarantee)
###### Proof.
UR5 architecture (V1 satisfied). The proportional controller reads joint encoder positions and targets $p_{i}^{*}(t)$:

$$\|p_{i}(t)-p_{i}^{*}(t)\|\leq\varepsilon_{\mathrm{ctrl}}<2\,\text{mm}\quad\forall i,t$$

Nominal trajectories maintain $\|p_{i}^{*}(t)-p_{j}^{*}(t)\|\geq 76\,\text{mm}+2r+\Delta_{\mathrm{safe}}$ for all $i\neq j$. Since $2\varepsilon_{\mathrm{ctrl}}<4\,\text{mm}\ll 76\,\text{mm}$, the triangle inequality gives $\|p_{i}(t)-p_{j}(t)\|>0>-(2r+\Delta_{\mathrm{safe}})$. No collision.
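One way to spell out the triangle-inequality step, using only the tracking bound $\varepsilon_{\mathrm{ctrl}}<2\,\text{mm}$ and the nominal separation stated above, is the following worked chain. It is offered as a sketch of the intermediate arithmetic, not as the author's own derivation.

```latex
\begin{aligned}
\|p_{i}(t)-p_{j}(t)\|
  &\geq \|p_{i}^{*}(t)-p_{j}^{*}(t)\|
        - \|p_{i}(t)-p_{i}^{*}(t)\|
        - \|p_{j}(t)-p_{j}^{*}(t)\| \\
  &\geq \bigl(76\,\text{mm} + 2r + \Delta_{\mathrm{safe}}\bigr)
        - 2\varepsilon_{\mathrm{ctrl}} \\
  &> 72\,\text{mm} + 2r + \Delta_{\mathrm{safe}} .
\end{aligned}
```

Under these two bounds the true separation therefore stays above the required clearance $2r+\Delta_{\mathrm{safe}}$ by more than $72\,\text{mm}$.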
Feedback-coupled systems (V1 not satisfied). The E-Stop fires at $r_{\max}\geq\tau_{\mathrm{crit}}=0.65$, corresponding to perceived clearance $\leq -37\,\text{mm}$. Since perceived clearance $\leq$ true clearance $+\,\|d_{A}\|$, and $\|d_{A}\|\leq\sigma_{\mathrm{fault}}=120\,\text{mm}$ (OU bound), true clearance at E-Stop trigger $\geq -37+120=83\,\text{mm}>0$. The E-Stop is instantaneous; therefore no physical contact occurs. ∎