Thermodynamic Measure Of Intelligence
Summary
This paper proposes a thermodynamic measure of intelligence defined as 'rare-valid lift' and argues that recursive self-simulation is necessary and nearly sufficient for high thermodynamic intelligence, making intelligence measurable on a universal scale.
# Thermodynamic Measure of Intelligence

Source: [https://arxiv.org/html/2606.20231](https://arxiv.org/html/2606.20231)

Ishanu Chattopadhyay (ishanu_ch@uky.edu)

Institute for Biomedical Informatics, University of Kentucky, Lexington, Kentucky, USA

Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA

###### Abstract

Can intelligence be measured? We propose that intelligence can be defined as the lawful amplification of rare but valid futures: a system increases the probability of outcomes that would be unlikely under passive dynamics but remain admissible under the constraints of the domain. We start with the premise that an intelligent system must model the world and its own place within it. Because the system is part of the world it models, this leads naturally to recursive self-simulation: the system represents futures in which its own actions are part of the trajectory. Our central results give a necessity statement and a conditional near-sufficiency statement connecting this architecture to a precise thermodynamic measure of lawful amplification of rare-valid futures: high rare-valid lift is impossible unless the internal simulation identifies rare-valid futures with high fidelity; conversely, when rare-valid fidelity is high and the simulation contains an effective policy, the achievable lift approaches the actuation-limited optimum. Thus recursive self-simulation is not merely a plausible feature of intelligence but, under the stated assumptions, is necessary and nearly sufficient for high thermodynamic intelligence. The resulting framework makes intelligence measurable on a universal scale, from passive matter and feedback controllers, large language models, and humans as text generators to Maxwell-demon-like information engines.

## I Introduction

Can intelligence be framed as a measurable physical quantity?
We start from the observation that any system perceived to be intelligent models the world with itself inside it, simulates possible futures conditioned on its own actions, and uses that internal model to make some futures more likely than they would be under passive dynamics. This suggests that a relevant measurable quantity is *rare-valid lift*: the increase in probability assigned to futures that are unlikely under a passive baseline but remain valid under the constraints of the domain. Our central result is that high rare-valid lift cannot be obtained merely from randomness or strong actuation. Under bounded amplification, it requires high-fidelity self-simulation: the system's internal model must identify rare-valid futures accurately enough to target them.

Figure 1: Conceptual summary. (a) Recursive self-simulation: a system represents the world at one level together with models of its own future states and actions at higher simulated levels. The rare-valid fidelity ($\widehat{\Phi}$) measures how accurately the simulation identifies targetable rare-valid futures. (b) Thermodynamic intelligence: relative to a passive trajectory law ($P_0$), an induced law ($P$) shifts probability mass toward rare-valid trajectories ($V_\delta$), producing rare-valid lift ($I_\delta=\delta^{-1}\int_{V_\delta}(dP-dP_0)$). (c) Representative systems on the compressed scale ($\Lambda=\log_{10}(\log_{10}(I+1)+1)$).
The plotted examples are finite-resolution calibrations of probability lift, with symbolic and demon entries interpreted under the assumptions stated in the text.

Standard definitions of intelligence emphasize behavior: imitation or conversational indistinguishability [[1](https://arxiv.org/html/2606.20231#bib.bib1)], learning, reasoning, planning, generalization, compression, reward maximization, or task success [[2](https://arxiv.org/html/2606.20231#bib.bib2), [3](https://arxiv.org/html/2606.20231#bib.bib3), [4](https://arxiv.org/html/2606.20231#bib.bib4)]. These criteria are useful, but they do not by themselves identify a substrate-independent operation common to brains, large language models, microbial communities, immune repertoires, controllers, and idealized information engines such as Maxwell's demons. All of these systems transform information into action. We therefore ask a different question: what does an intelligent system do to the likelihood of possible futures?

This question is related to, but distinct from, existing task-facing accounts of intelligence. Legg–Hutter intelligence defines an agent's intelligence by its expected reward over a universal distribution of computable environments [[2](https://arxiv.org/html/2606.20231#bib.bib2)]. Chollet's ARC framework instead emphasizes skill-acquisition efficiency: the ability to infer abstract rules and generalize from sparse experience under human-like priors [[4](https://arxiv.org/html/2606.20231#bib.bib4)]. These accounts evaluate performance across environments, benchmarks, or problem classes. Our framework is path-facing. We ask what physical or probabilistic operation underlies such performance once a level of description, baseline law, validity criterion, and observational resolution are fixed. Reward maximization, benchmark generalization, theorem proving, symbolic problem solving, and biological adaptation can then be treated as special cases.
We begin with recursive self-simulation. A system acts intelligently, in the present sense, when it carries a model of the world that includes itself as a causal object. Such a model can represent possible futures, including the system's own interventions, the observations those interventions may produce, and the way its own information state may later change. Minsky anticipated this point in his account of internal models: advanced problem solving requires a system to represent its own goals, resources, and problem-solving activity, and self-understanding can involve models of models of oneself [[5](https://arxiv.org/html/2606.20231#bib.bib5), [6](https://arxiv.org/html/2606.20231#bib.bib6), [7](https://arxiv.org/html/2606.20231#bib.bib7)]. Related ideas appear in work on self-reference and strange loops [[8](https://arxiv.org/html/2606.20231#bib.bib8), [9](https://arxiv.org/html/2606.20231#bib.bib9), [10](https://arxiv.org/html/2606.20231#bib.bib10)].

To measure the capabilities enabled by recursive self-simulation, we use path laws. Passive dynamics induce a baseline distribution $P_0$ over trajectories. A controlled or agent-like system induces another law $P$. A self-simulation becomes observable through the way it changes these probabilities. Intelligence, viewed this way, is lawful trajectory reweighting: some futures become more likely, others less so, and the change must respect the thermodynamic accounting required by measurement, memory, computation, control, and erasure.

Not all trajectory manipulations are the same. The most informative changes occur in the tail of the trajectory distribution. Moving probability among futures already common under $P_0$ may reflect stabilization or regulation, but it does not strongly test whether the system can reach beyond passive dynamics. Rare futures probe that ability because they would otherwise remain effectively unrealized.
Rarity alone, however, is not a measure of intelligence; random noise also produces improbable events. The futures must also remain valid, so we focus on rare-valid futures: trajectories that have low probability under the passive law but remain admissible under the constraints of the domain. Rarity supplies counterfactual difficulty; validity, interpreted as physical realizability, biological viability, semantic coherence, executable correctness, or functional success, prevents the measure from rewarding noise.

To make this notion precise and computable, we consider the thermodynamics of system trajectories, and we define thermodynamic intelligence as rare-valid probability lift: the fractional increase, under the induced law $P$, in the probability of an exceedingly rare but valid set. The definition turns the quantification of intelligence into a question about path measures. The thermodynamic machinery needed for this analysis is well developed [[11](https://arxiv.org/html/2606.20231#bib.bib11), [12](https://arxiv.org/html/2606.20231#bib.bib12), [13](https://arxiv.org/html/2606.20231#bib.bib13), [14](https://arxiv.org/html/2606.20231#bib.bib14), [15](https://arxiv.org/html/2606.20231#bib.bib15)], including non-equilibrium fluctuation theorems, which quantify the relative likelihood of entropy-producing and entropy-reducing trajectories under passive dynamics [[16](https://arxiv.org/html/2606.20231#bib.bib16), [17](https://arxiv.org/html/2606.20231#bib.bib17), [18](https://arxiv.org/html/2606.20231#bib.bib18), [19](https://arxiv.org/html/2606.20231#bib.bib19)]. The symbolic case uses the complementary information-theoretic language of entropy rate and coding [[20](https://arxiv.org/html/2606.20231#bib.bib20), [21](https://arxiv.org/html/2606.20231#bib.bib21)].
We illustrate the framework through examples ranging from passive systems with zero lift, through simple controllers with modest amplification and symbolic generators (including GPT-5 and human text), to Maxwell-demon-like information engines. Maxwell's demon provides the canonical historical case: a hypothetical microscopic observer using information about particle states to sort thermal fluctuations and create an apparent local entropy reduction. Here the demon serves as an idealized high-lift limit, where near-perfect microstate simulation, rare-valid identification, and actuation produce extreme rare-valid trajectory amplification before implementation costs are paid. Conversely, the symbolic examples show how the same formalism can be applied at a finite linguistic resolution, once the baseline ensemble, validity criterion, sequence length, and generator-induced probability shift are specified.

## II Recursive Self-Simulation

To act with intelligence, a system needs a deep model of its world, with itself in it. Minsky anticipated this point in his account of internal models: advanced problem solving requires a system to represent not only the external situation, but also its own goals, resources, and problem-solving activity, and self-understanding can involve models of models of oneself [[5](https://arxiv.org/html/2606.20231#bib.bib5), [6](https://arxiv.org/html/2606.20231#bib.bib6), [7](https://arxiv.org/html/2606.20231#bib.bib7)]. We formalize this self-in-world requirement as an embedded representation hierarchy. Let $B$ denote an agent-like system embedded in an environment $E$, and let $U=B\cup E$.
At a minimal level, the system maintains an internal representation of its observable world,

$$r_B^{(0)}=r_B^{(0)}(U). \tag{1}$$

Because $B\in U$, a sufficiently general model of the local universe must also represent the system itself: its state, memory, uncertainty, actions, and possible future updates, inducing a recursive hierarchy:

$$r_B^{(1)}=r_B^{(1)}\!\left(r_B^{(0)}\right), \tag{2}$$

$$r_B^{(2)}=r_B^{(2)}\!\left(r_B^{(1)}\right), \tag{3}$$

$$\vdots$$

$$r_B^{(k)}=r_B^{(k)}\!\left(r_B^{(k-1)}\right). \tag{4}$$

The hierarchy encodes predictions about the world, predictions about the system's own future actions, and predictions about how its information state may change; recursive self-simulation is therefore a finite self-referential loop. If the environment contains other agents $B_j$, the same idea extends to nested representations:

$$r_B^{(k)}\!\left(r_{B_j}^{(\ell)}\right),\qquad j\neq B,\quad k,\ell\geq 0. \tag{5}$$

Social and biological environments can therefore generate interacting recursive models. This is the thermodynamic analogue of theory-of-mind style modeling: an agent's future depends partly on what it predicts other agents will perceive, infer, and do [[22](https://arxiv.org/html/2606.20231#bib.bib22), [23](https://arxiv.org/html/2606.20231#bib.bib23)].

Recursive self-simulation becomes operational when a system evaluates consequences of its own future actions. Let $h_t$ denote the history available at time $t$, let $\mathcal{A}_t$ denote the available action set, and let $\Gamma_{t:T}$ denote a future trajectory segment.
Suppose an agent evaluates a trajectory functional $G$ under its $k$-level internal model and selects

$$a_t^{\star}\in\arg\max_{a\in\mathcal{A}_t}\mathbb{E}_{r_B^{(k)}}\!\left[G(\Gamma_{t:T})\mid h_t,a\right]. \tag{6}$$

Equivalently, the agent may implement a policy $\pi_t(\cdot\mid h_t)$ concentrated near such maximizing actions. To evaluate this expectation, the model must represent the future environment, the agent's possible actions, and the agent's own future information state. Our construction is distinct from predictive-processing and active-inference views of perception and action [[24](https://arxiv.org/html/2606.20231#bib.bib24)]; our measured quantity is rare-valid probability lift relative to $P_0$, not free energy itself.

The hierarchy of recursive simulation determines which futures the system can identify, evaluate, and target. To connect this architecture to a measurable quantity, we now introduce the level-relative rare-valid lift. Let $\mathcal{L}_0$ denote the base reality, or physical level, under consideration. Throughout the paper, $\mathcal{I}$ denotes a fractional probability lift of a rare-valid set relative to a passive baseline. At level $k$, this lift compares the probability assigned to a rare-valid event under an induced or simulated law with its probability under the corresponding passive law. Realized intelligence is the lift actually induced in the level-$k$ path law. Intelligence potential is the largest such lift available inside a level-$(k+1)$ simulation of level $k$. Section [III](https://arxiv.org/html/2606.20231#S3) gives the corresponding trajectory-space definition and identifies it as thermodynamic intelligence.
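The selection rule in Eq. (6) can be sketched numerically. The following is a minimal illustration, assuming the level-$k$ internal model is available as a sampler that returns one simulated trajectory for a given history and action, so that the expectation can be approximated by Monte Carlo. The names `select_action`, `toy_model`, and `reaches_target`, the random-walk dynamics, and all numerical values are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, Hashable, Sequence


def select_action(
    history: Sequence[float],
    actions: Sequence[Hashable],
    model_sample: Callable[[Sequence[float], Hashable, random.Random], list],
    G: Callable[[list], float],
    n_samples: int = 2000,
    seed: int = 0,
):
    """Monte Carlo form of Eq. (6): argmax over a of E_model[G(path) | history, a]."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for a in actions:
        # Average the trajectory functional over simulated futures for action a.
        total = 0.0
        for _ in range(n_samples):
            total += G(model_sample(history, a, rng))
        value = total / n_samples
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value


def toy_model(history, action, rng, horizon=20):
    """Hypothetical internal model: a random walk whose drift is the action."""
    x = history[-1]
    path = []
    for _ in range(horizon):
        x += action + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def reaches_target(path, threshold=10.0):
    """Trajectory functional G: indicator that the walk ends above a threshold."""
    return 1.0 if path[-1] >= threshold else 0.0


best, value = select_action([0.0], [-0.5, 0.0, 0.5], toy_model, reaches_target)
```

In this toy setting the target is rare under zero drift and becomes likely only under the positive-drift action, so the sampler-based argmax picks that action. A policy $\pi_t(\cdot\mid h_t)$ concentrated near the maximizer could be obtained by replacing the hard argmax with a softmax over the estimated values; that variant is likewise an assumption, not something the paper specifies.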
For $k\geq 0$, let $\Omega_k$ be the trajectory space at level $k$, let $\mathcal{F}_k$ be its $\sigma$-algebra, let $P_{0,k}$ be the passive or baseline law, and let $\eta_k$ be the observational resolution. Let $V_{\delta,k}\subseteq\Omega_k$ be a measurable rare-valid event at level $k$ with baseline mass

$$\delta_k\triangleq P_{0,k}(V_{\delta,k})>0. \tag{7}$$

When the target mass is fixed or clear from context, we write $\delta$ rather than $\delta_k$. Define

$$\mathcal{L}_k\triangleq(\Omega_k,\mathcal{F}_k,P_{0,k},V_{\delta,k},\eta_k). \tag{8}$$

For any level-$k$ path law $Q_k$ on $(\Omega_k,\mathcal{F}_k)$, with $V_{\delta,k}\in\mathcal{F}_k$, define the level-$k$ rare-valid lift

$$\mathcal{I}_{\delta,k}(Q_k;P_{0,k},V_{\delta,k})\triangleq\frac{Q_k(V_{\delta,k})-P_{0,k}(V_{\delta,k})}{P_{0,k}(V_{\delta,k})}=\frac{Q_k(V_{\delta,k})-\delta_k}{\delta_k}. \tag{9}$$

The hierarchy is recursive in the upward direction: $\mathcal{L}_{k+1}$ is a simulation or model of $\mathcal{L}_k$.
We write hatted quantities for the representation of level $k$ inside level $k+1$:

$$\widehat{\mathcal{L}}_{k+1\to k}=\left(\widehat{\Omega}_{k+1\to k},\widehat{\mathcal{F}}_{k+1\to k},\widehat{P}_{0,k+1\to k},\widehat{V}_{\delta,k+1\to k},\widehat{\eta}_{k+1\to k}\right).$$

Here $\widehat{V}_{\delta,k+1\to k}$ is the level-$(k+1)$ representation of the level-$k$ rare-valid set, with simulated baseline mass

$$\widehat{\delta}_{k+1\to k}\triangleq\widehat{P}_{0,k+1\to k}(\widehat{V}_{\delta,k+1\to k})>0. \tag{10}$$

When no ambiguity is possible, we write $\widehat{P}_0$, $\widehat{P}_\pi$, $\widehat{V}_\delta$, and $\widehat{\delta}$ for the corresponding level-$(k+1\to k)$ quantities.

The realized rare-valid lift of $B$ at level $k$, when the system induces the actual level-$k$ path law $P_{B,k}$, is

$$\mathcal{I}^{\rm real}_{\delta,k}(B)\triangleq\mathcal{I}_{\delta,k}(P_{B,k};P_{0,k},V_{\delta,k})=\frac{P_{B,k}(V_{\delta,k})-P_{0,k}(V_{\delta,k})}{P_{0,k}(V_{\delta,k})}. \tag{11}$$

If $P_{B,k}=P_{0,k}$, then $\mathcal{I}^{\rm real}_{\delta,k}(B)=0$ relative to that baseline. This does not say the system lacks intelligence; it says that intelligence is not realized as a path-law change at that level.
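As a small numerical illustration of Eqs. (9) and (11), the sketch below computes the fractional lift from a baseline mass and an induced mass, together with the compressed scale $\Lambda$ quoted in the Figure 1 caption. The function names and the example masses (a baseline of $10^{-6}$ and an induced mass of $10^{-4}$) are hypothetical choices made only for illustration.

```python
import math


def rare_valid_lift(q_mass: float, delta: float) -> float:
    """Eq. (9): fractional lift (Q(V) - delta) / delta, with delta = P0(V) > 0."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("baseline mass delta must lie in (0, 1]")
    if not 0.0 <= q_mass <= 1.0:
        raise ValueError("induced mass must be a probability")
    return (q_mass - delta) / delta


def compressed_scale(lift: float) -> float:
    """Figure 1(c) scale: Lambda = log10(log10(I + 1) + 1), used here for I >= 0."""
    return math.log10(math.log10(lift + 1.0) + 1.0)


delta = 1e-6                                   # hypothetical baseline mass P0(V)
passive = rare_valid_lift(delta, delta)        # P_B = P_0, so the lift is zero
controller = rare_valid_lift(1e-4, delta)      # hypothetical modest amplification
deterministic = rare_valid_lift(1.0, delta)    # P_B(V) = 1, lift = (1 - delta) / delta
```

The three cases mirror the text: a system that leaves the path law unchanged has zero realized lift, while a system that concentrates all probability on the rare-valid set reaches $(1-\delta)/\delta$, the largest value attainable at that $\delta$.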
The intelligence potential for level $k$ is computed inside the level-$(k+1)$ simulation of level $k$. Let $\widehat{\Pi}_{k+1\to k}$ be the simulated policy class. For $\pi\in\widehat{\Pi}_{k+1\to k}$, define the simulated rare-valid lift

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\triangleq\frac{\widehat{P}_{\pi,k+1\to k}(\widehat{V}_{\delta})-\widehat{\delta}}{\widehat{\delta}}. \tag{12}$$

The intelligence potential of $B$ for level $k$, as represented at level $k+1$, is the supremal simulated rare-valid lift over the policy class available in that representation:

$$\mathcal{I}^{\rm pot}_{\delta,k}(B)\triangleq\sup_{\pi\in\widehat{\Pi}_{k+1\to k}}\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi). \tag{13}$$

Thus actuation is not assumed at level $k$ when potential is computed. The simulated action variables live in $\mathcal{L}_{k+1}$; realization at level $k$ is the separate question of whether a simulated policy can indeed be implemented as an actual path-law change.

##### Rare-valid simulation fidelity

The relevant fidelity is not generic prediction accuracy. A model may predict common trajectories well while missing the rare-valid set, or be coarse in irrelevant coordinates while accurate on the rare-valid futures and actions that matter. We therefore define fidelity directly on the target set. Let $\widehat{A}_{k+1\to k}\subseteq\widehat{\Omega}_{k+1\to k}$ denote the set of trajectories identified by the level-$(k+1)$ simulation as targetable rare-valid futures for level $k$.
We define the level-specific rare-valid self-simulation fidelity

$$\widehat{\Phi}_{k+1\to k}\triangleq\frac{\widehat{P}_{0,k+1\to k}\!\left(\widehat{A}_{k+1\to k}\cap\widehat{V}_{\delta,k+1\to k}\right)}{\widehat{\delta}_{k+1\to k}}. \tag{14}$$

When the represented level is fixed, we write $\widehat{\Phi}$ for $\widehat{\Phi}_{k+1\to k}$. The set $\widehat{V}_\delta$ is "true" only relative to the specified level-$k$ description and its represented validity criterion. Thus $\widehat{\Phi}=1$ means that, at the simulated baseline resolution, the targetable set covers the represented rare-valid set; $\widehat{\Phi}=0$ means that the simulation misses it. The corresponding rare-valid self-simulation error is

$$\widehat{\varepsilon}^{\rm RV}_{k+1\to k}\triangleq 1-\widehat{\Phi}. \tag{15}$$

Next we have our necessity result: under bounded amplification, high level-relative intelligence potential cannot be obtained from low rare-valid simulation fidelity.

###### Theorem 1 (Rare-valid self-simulation fidelity is necessary).

Work inside the level-$(k+1)$ simulation of level $k$, and abbreviate $\widehat{P}_0=\widehat{P}_{0,k+1\to k}$, $\widehat{P}_\pi=\widehat{P}_{\pi,k+1\to k}$, $\widehat{V}_\delta=\widehat{V}_{\delta,k+1\to k}$, $\widehat{A}=\widehat{A}_{k+1\to k}$, $\widehat{\delta}=\widehat{P}_0(\widehat{V}_\delta)>0$, and let $\widehat{\Phi}$ denote the fidelity in Eq. ([14](https://arxiv.org/html/2606.20231#S2.E14)). Assume $\widehat{P}_\pi\ll\widehat{P}_0$. Suppose there exists $\alpha_{\max}\geq 1$ such that

$$\frac{d\widehat{P}_\pi}{d\widehat{P}_0}(\omega)\leq\alpha_{\max}\quad\widehat{P}_0\text{-a.e. on }\widehat{A}\cap\widehat{V}_\delta, \tag{16}$$

and

$$\frac{d\widehat{P}_\pi}{d\widehat{P}_0}(\omega)\leq 1\quad\widehat{P}_0\text{-a.e. on }\widehat{V}_\delta\setminus\widehat{A}. \tag{17}$$

Then

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\leq(\alpha_{\max}-1)\,\widehat{\Phi}. \tag{18}$$

Moreover, if $\alpha_{\max}>1$ and $\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\geq I_0$ for some $I_0>0$, then

$$\widehat{\Phi}\geq\frac{I_0}{\alpha_{\max}-1}. \tag{19}$$

Thus high intelligence potential requires high rare-valid simulation fidelity relative to the available amplification budget. In particular, if $I_0>\alpha_{\max}-1$, no policy satisfying the amplification bound can attain $I_0$.

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

The near-converse requires an implementation assumption: the simulation must contain a policy that amplifies the correctly identified rare-valid region.

###### Theorem 2 (Near-sufficiency under effective simulated actuation).

Use the notation of Theorem [1](https://arxiv.org/html/2606.20231#Thmtheorem1). Suppose there exists a simulated policy $\pi$ and constants $\alpha_{\min}>1$, $0\leq\beta_{\min}\leq\alpha_{\min}$ such that

$$\frac{d\widehat{P}_\pi}{d\widehat{P}_0}(\omega)\geq\alpha_{\min}\quad\widehat{P}_0\text{-a.e. on }\widehat{A}\cap\widehat{V}_\delta, \tag{20}$$

and

$$\frac{d\widehat{P}_\pi}{d\widehat{P}_0}(\omega)\geq\beta_{\min}\quad\widehat{P}_0\text{-a.e. on }\widehat{V}_\delta\setminus\widehat{A}. \tag{21}$$

Then

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\geq\alpha_{\min}\widehat{\Phi}+\beta_{\min}(1-\widehat{\Phi})-1. \tag{22}$$

In particular, if $0\leq\varepsilon\leq 1$ and $\widehat{\Phi}\geq 1-\varepsilon$, then

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\geq(\alpha_{\min}-1)-(\alpha_{\min}-\beta_{\min})\,\varepsilon. \tag{23}$$

Therefore, as $\widehat{\Phi}\to 1$, effective simulated actuation drives the intelligence potential toward the actuation-limited value $\alpha_{\min}-1$.

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

Theorems [1](https://arxiv.org/html/2606.20231#Thmtheorem1) and [2](https://arxiv.org/html/2606.20231#Thmtheorem2) are the formal bridge between recursive self-simulation and thermodynamic intelligence. Low rare-valid fidelity caps the achievable lift; high rare-valid fidelity, together with a policy that amplifies the correctly identified region, yields high lift. These statements concern intelligence potential for level $k$ as computed in the level-$(k+1)$ simulation. Realized intelligence at level $k$ additionally requires implementation as an actual level-$k$ path-law change.

## III Thermodynamic Intelligence

The previous section described recursive self-simulation as the internal architecture. We now define the observable: probability lift over rare-valid regions of trajectory space. Because trajectory spaces may be continuous or high-dimensional, rarity is defined at finite observational resolution.
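Before the formal definitions, the fidelity in Eq. (14) and the bounds of Theorems 1 and 2 can be checked on a small finite example. The sketch below assumes a trajectory space of ten cells and a simulated policy whose density ratio is constant on each region; the cell masses, the targeted set, and the helper names `fidelity`, `tilt`, and `simulated_lift` are hypothetical and chosen only to make the arithmetic transparent.

```python
def fidelity(p0, valid, targeted):
    """Eq. (14): P0(A and V) / P0(V) over a finite set of trajectory cells."""
    delta = sum(p0[c] for c in valid)
    return sum(p0[c] for c in valid & targeted) / delta


def tilt(p0, valid, targeted, alpha, beta):
    """Reweight P0 by alpha on (A and V) and by beta on (V minus A),
    then rescale the non-valid cells so the result is a probability law."""
    p = {}
    for c, mass in p0.items():
        if c in valid:
            p[c] = mass * (alpha if c in targeted else beta)
    boosted = sum(p.values())
    if boosted > 1.0:
        raise ValueError("amplification exceeds total probability")
    rest = 1.0 - sum(p0[c] for c in valid)
    scale = (1.0 - boosted) / rest
    for c, mass in p0.items():
        if c not in valid:
            p[c] = mass * scale
    return p


def simulated_lift(p, p0, valid):
    """Eq. (12): (P_pi(V) - delta) / delta."""
    delta = sum(p0[c] for c in valid)
    return (sum(p[c] for c in valid) - delta) / delta


# Hypothetical simulated baseline: four rare-valid cells and six common cells.
p0 = {c: 0.001 for c in range(4)}
p0.update({c: 0.166 for c in range(4, 10)})
valid = {0, 1, 2, 3}
targeted = {0, 1, 2, 7}          # the simulation misses cell 3 and mis-targets cell 7

alpha, beta = 50.0, 0.5
phi = fidelity(p0, valid, targeted)
p_pi = tilt(p0, valid, targeted, alpha, beta)
achieved = simulated_lift(p_pi, p0, valid)

upper = (alpha - 1.0) * phi                       # Theorem 1, Eq. (18)
lower = alpha * phi + beta * (1.0 - phi) - 1.0    # Theorem 2, Eq. (22)
```

Because the density ratio here is exactly `alpha` on the targeted valid cells and exactly `beta` on the missed one, the achieved lift coincides with the lower bound of Eq. (22) and sits just below the cap of Eq. (18). Lowering the fidelity, for instance by removing cells from `targeted`, lowers both bounds proportionally.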
###### Definition 1 (Rare-valid set at finite resolution).

Let $V\subset\Omega$ denote the set of valid trajectories. Validity is domain-dependent. In a physical system, validity means physical admissibility. In a biological system, it may mean viability or functional organization. In a symbolic system, it may mean grammaticality, semantic coherence, factual consistency, and task relevance. Let $\Pi_\eta$ be a finite measurable partition of $\Omega$ at observational resolution $\eta$. For a cell $C\in\Pi_\eta$, $P_0(C)$ is the passive probability of observing a trajectory in that cell. A rare-valid set $V_{\delta,\eta}$ is a union of valid cells with small passive probability and target passive mass $\delta$. When exact normalization is possible, we choose

$$P_0(V_{\delta,\eta})=\delta. \tag{24}$$

For a finite partition, exact equality need not hold for every $\delta$. In that case one may either choose an attainable value of $\delta$, use $P_0(V_{\delta,\eta})\leq\delta$, or obtain exact normalization by randomized inclusion of a boundary cell. Equivalently, $V_{\delta,\eta}$ may be taken as the lowest-baseline-probability valid region of total passive mass $\delta$, up to this boundary convention. When the resolution $\eta$ is fixed, we write $V_\delta$ for $V_{\delta,\eta}$.

###### Definition 2 (Thermodynamic intelligence at resolution $\delta$).

Let $P$ be the trajectory distribution induced by a system, and let $V_\delta$ satisfy $P_0(V_\delta)=\delta$.
Define the $\delta$-scale thermodynamic intelligence of $P$ relative to $P_0$ and $V_\delta$ as

$$\mathcal{I}_\delta(P;P_0,V_\delta)=\frac{P(V_\delta)-P_0(V_\delta)}{\delta}=\frac{P(V_\delta)-\delta}{\delta}. \tag{25}$$

Equivalently, in density notation,

$$\mathcal{I}_\delta(P;P_0,V_\delta)=\frac{1}{\delta}\int_{V_\delta}\left(dP-dP_0\right). \tag{26}$$

When the limit exists, define

$$\mathcal{I}(P;P_0,V)=\lim_{\delta\to 0^{+}}\mathcal{I}_\delta(P;P_0,V_\delta). \tag{27}$$

If $P=P_0$, then $\mathcal{I}_\delta=0$. If a system deterministically realizes a rare-valid cell in $V_\delta$, so that $P(V_\delta)=1$, then

$$\mathcal{I}_\delta=\frac{1-\delta}{\delta}\approx\frac{1}{\delta}. \tag{28}$$

Thus the measure ranges naturally from zero for passive systems to very large values for ideal information engines that select extremely rare valid trajectories. The definition credits only probability mass moved into futures that are both low-probability under $P_0$ and valid under the domain constraints. In flexible settings, recursive self-simulation should improve both the identification of such futures and the actions that make them more likely. Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1) records the information-theoretic consequence: amplifying a rare-valid set requires path-measure divergence from the passive baseline.

###### Lemma 1 (Rare-valid amplification implies path-measure divergence).

Let $P_0(V_\delta)=\delta$, let $p=P(V_\delta)$, and suppose $0<\delta<1$.
Define

$$\mathcal{I}_\delta=\frac{p-\delta}{\delta}. \tag{29}$$

Then

$$D_{\mathrm{KL}}(P\,\|\,P_0)\geq d(p\,\|\,\delta), \tag{30}$$

where

$$d(p\,\|\,\delta)=p\log\frac{p}{\delta}+(1-p)\log\frac{1-p}{1-\delta} \tag{31}$$

is the binary KL divergence. Equivalently, when $p=\delta(1+\mathcal{I}_\delta)\leq 1$,

$$D_{\mathrm{KL}}(P\,\|\,P_0)\geq d\!\left(\delta(1+\mathcal{I}_\delta)\,\|\,\delta\right). \tag{32}$$

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

### III.1 Trajectory-Space Thermodynamics

Rare-valid lift is a change in trajectory probabilities, so its natural thermodynamic setting is path space. Let $(\Omega,\mathcal{F})$ denote a measurable space of trajectories $\omega$ over $[0,\tau]$. Let $P_0$ be the passive path measure and $P_B$ the path measure induced by an agent $B$ acting through feedback. Let $S(\omega)$ denote physical entropy production along $\omega$, and define the dimensionless entropy production

$$\sigma(\omega)=\frac{S(\omega)}{k_B}. \tag{33}$$

For matched entropy-production bins $A_s^{+}$ and $A_s^{-}$, where $A_s^{+}$ collects trajectories with $\sigma(\omega)\approx+s$ and $A_s^{-}$ collects the corresponding time-reversed or otherwise matched trajectories with $\sigma(\omega)\approx-s$, the passive fluctuation-theorem relation is written at event level as

$$\log\frac{P_0(A_s^{+})}{P_0(A_s^{-})}\simeq s. \tag{34}$$

Equivalently, in dimensional units, if the matched bins correspond to entropy productions $+\Delta S$ and $-\Delta S$, then the right-hand side is $\Delta S/k_B$.
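Definition 1 and the divergence bound of Lemma 1 can be sketched for a finite partition as follows. The construction below uses the $P_0(V_{\delta,\eta})\leq\delta$ boundary convention and measures divergence in nats; the five-cell baseline, the choice of valid cells, and the function names are hypothetical and serve only as an illustration.

```python
import math


def rare_valid_set(p0, valid, delta, tol=1e-12):
    """Greedy form of Definition 1: take valid cells in order of increasing
    baseline mass while the total passive mass stays at or below delta."""
    chosen, mass = set(), 0.0
    for c in sorted(valid, key=lambda cell: p0[cell]):
        if mass + p0[c] > delta + tol:
            break
        chosen.add(c)
        mass += p0[c]
    return chosen, mass


def binary_kl(p, q):
    """Binary KL divergence d(p || q) of Eq. (31), in nats, with 0 log 0 = 0."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def min_divergence_for_lift(lift, delta):
    """Lemma 1, Eq. (32): lower bound on D_KL(P || P0) needed for a given lift."""
    p = delta * (1.0 + lift)
    if not 0.0 <= p <= 1.0:
        raise ValueError("this lift is not attainable at this delta")
    return binary_kl(p, delta)


# Hypothetical five-cell baseline; cells b, c, e are the valid ones.
p0 = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.06, "e": 0.04}
v_delta, mass = rare_valid_set(p0, {"b", "c", "e"}, delta=0.15)

no_lift_cost = min_divergence_for_lift(0.0, 1e-3)
modest_cost = min_divergence_for_lift(99.0, 1e-3)     # moves P(V) from 0.001 to 0.1
deterministic_cost = binary_kl(1.0, 1e-6)              # equals log(1 / delta)
```

The last line recovers the familiar scaling for the deterministic limit of Eq. (28): concentrating all probability on a set of passive mass $\delta$ costs at least $\log(1/\delta)$ nats of path-measure divergence.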
The symbol $\simeq$ marks event-level coarse-graining: exact equality requires bins and matching conventions that preserve the underlying trajectory-level fluctuation relation.

Now, feedback replaces the passive law $P_0$ by the controlled law $P_B$. Microscopic feedback fluctuation relations generally contain trajectory-dependent measurement and information terms, so a universal scalar correction need not exist after coarse-graining. For the fixed bins used here, we therefore define the event-level information correction directly:

$$J_s(B)\triangleq s-\log\frac{P_B(A_s^{+})}{P_B(A_s^{-})}. \tag{35}$$

Equivalently,

$$\log\frac{P_B(A_s^{+})}{P_B(A_s^{-})}=s-J_s(B). \tag{36}$$

Thus $J_s(B)$ records how feedback changes the entropy-production log-ratio on the chosen bins relative to the passive fluctuation-theorem scale. It is a coarse-grained diagnostic, not a complete thermodynamic balance; measurement, memory, computation, control, and erasure remain part of the full physical accounting.

##### Coarse-grained path-deviation stability

We next record a stability bound for the entropy-bin signatures just defined. Let $P$ and $Q$ be path measures on $(\Omega,\mathcal{F})$. For matched entropy-production events $A_s^{+}$ and $A_s^{-}$, define

$$\Delta_s(P,Q)\triangleq\log\frac{P(A_s^{+})}{P(A_s^{-})}-\log\frac{Q(A_s^{+})}{Q(A_s^{-})}. \tag{37}$$

For $P=P_B$ and $Q=P_0$, write $\Delta_s(B)=\Delta_s(P_B,P_0)$.
If the passive fluctuation relation holds in the dimensionless convention, then

$$\Delta_{s}(B)=\log\frac{P_{B}(A_{s}^{+})}{P_{B}(A_{s}^{-})}-s, \tag{38}$$

with $s$ replaced by $\Delta S/k_{B}$ in dimensional units, so that $\Delta_{s}(B)=-J_{s}(B)$ by (35).

###### Assumption 1 (Nondegenerate entropy bins).

For the two path measures being compared, there exists $m_{s}>0$ such that

$$P(A_{s}^{+}),\,P(A_{s}^{-}),\,Q(A_{s}^{+}),\,Q(A_{s}^{-})\geq m_{s}. \tag{39}$$

###### Theorem 3 (Coarse-grained path-deviation bound).

Under Assumption [1](https://arxiv.org/html/2606.20231#Thmassumption1),

$$|\Delta_{s}(P,Q)|\leq\frac{\sqrt{2}}{m_{s}}\sqrt{D_{\mathrm{KL}}(P\,\|\,Q)}. \tag{40}$$

In particular,

$$|\Delta_{s}(B)|\leq\frac{\sqrt{2}}{m_{s}}\sqrt{D_{\mathrm{KL}}(P_{B}\,\|\,P_{0})}. \tag{41}$$

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3) gives the thermodynamic role of path-measure divergence. Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1) shows that rare-valid amplification requires divergence from the passive law. Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3) shows that such divergence also controls how much coarse-grained entropy-production log-ratios can change on fixed nondegenerate bins. Thus the rare-valid lift is not an isolated score: when a controller reweights trajectory probabilities, the induced change is constrained in the same path-measure geometry that governs coarse-grained thermodynamic signatures. The bound is intentionally finite-bin and moderate-event; rare-event amplification itself is handled by Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1).
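The two bounds can be checked numerically on a small discrete path space. The sketch below is only an illustration: the six-trajectory laws `P0` and `PB`, the bin choices, and the rare-valid set are hypothetical numbers chosen for readability, not values from the paper. It evaluates the lift (29), the binary-KL lower bound (30), and the path-deviation bound (41).

```python
import math

def kl(p, q):
    """D_KL(p || q) in nats for discrete laws on a shared finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def binary_kl(p, d):
    """Binary KL divergence d(p || d) of (31), for 0 < p < 1 and 0 < d < 1."""
    return p * math.log(p / d) + (1 - p) * math.log((1 - p) / (1 - d))

def mass(law, event):
    """Probability that `law` assigns to a set of trajectory indices."""
    return sum(law[i] for i in event)

def delta_s(P, Q, a_plus, a_minus):
    """Entropy-bin log-ratio deviation Delta_s(P, Q) of (37)."""
    return (math.log(mass(P, a_plus) / mass(P, a_minus))
            - math.log(mass(Q, a_plus) / mass(Q, a_minus)))

# Hypothetical toy path space with six coarse-grained trajectories.
P0 = [0.40, 0.25, 0.20, 0.10, 0.04, 0.01]  # passive law
PB = [0.30, 0.20, 0.15, 0.15, 0.10, 0.10]  # controlled law
A_PLUS, A_MINUS = [0, 1], [2, 3]           # matched entropy-production bins
RARE_VALID = [5]                           # rare-valid set

delta = mass(P0, RARE_VALID)
p = mass(PB, RARE_VALID)
lift = (p - delta) / delta                 # rare-valid lift (29)
divergence = kl(PB, P0)

# Lemma 1: the path-space divergence dominates the binary divergence.
lemma_rhs = binary_kl(p, delta)

# Theorem 3: the bin log-ratio shift is bounded through the same divergence.
m_s = min(mass(law, ev) for law in (PB, P0) for ev in (A_PLUS, A_MINUS))
shift = delta_s(PB, P0, A_PLUS, A_MINUS)
theorem_rhs = math.sqrt(2) / m_s * math.sqrt(divergence)

print(f"lift={lift:.2f}  KL={divergence:.4f}  d(p||delta)={lemma_rhs:.4f}")
print(f"|Delta_s|={abs(shift):.4f}  bound={theorem_rhs:.4f}")
```

In this toy case the Theorem 3 bound is loose, which is consistent with its stated role as a stability estimate on moderate-probability bins rather than a sharp rare-event statement.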
##### Auxiliary model-to-control continuity

The fidelity theorems above are rare-set results. A separate continuity statement compares coarse-grained thermodynamic signatures induced by nearby path laws. It does not prove high thermodynamic intelligence from high fidelity; it only says that finite-depth controlled laws inherit the entropy-bin signatures of an ideal controlled law when the induced path laws are close.

Let $P_{B}^{(k)}$ denote the controlled path law induced by a policy computed from the $k$-level recursive internal model $r_{B}^{(k)}$. Let $P_{B}^{\star}$ denote the ideal controlled path law induced by the limiting or perfectly faithful recursive model for the same objective and admissible control class. Let $\varepsilon_{k}\geq 0$ denote intervention-relevant prediction error.

###### Assumption 2 (Model-to-control stability).

There exist a constant $C>0$ and a modulus $\rho$, with $\rho(\varepsilon)\to 0$ as $\varepsilon\to 0$, such that

$$D_{\mathrm{KL}}\left(P_{B}^{(k)}\,\|\,P_{B}^{\star}\right)\leq C\rho(\varepsilon_{k}). \tag{42}$$

###### Proposition 1 (Recursive fidelity controls convergence to the ideal thermodynamic signature).

Assume model-to-control stability. Suppose the entropy bins are nondegenerate under $P_{B}^{(k)}$ and $P_{B}^{\star}$, with lower bound $m_{s}>0$. Then

$$\left|\Delta_{s}(P_{B}^{(k)},P_{0})-\Delta_{s}(P_{B}^{\star},P_{0})\right|\leq\frac{\sqrt{2C}}{m_{s}}\sqrt{\rho(\varepsilon_{k})}. \tag{43}$$

Consequently, if $\varepsilon_{k}\to 0$, then the finite-depth thermodynamic signature converges to the ideal controlled thermodynamic signature at the rate determined by $\rho$.

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).
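The constant $\sqrt{2C}$ in (43) can be traced in two lines. The following is a sketch of one plausible route, not a reproduction of the appendix proof: it assumes the passive bin probabilities are nonzero so that both deviations are defined, in which case the passive terms cancel and Theorem 3 applies to the pair $(P_{B}^{(k)},P_{B}^{\star})$.

```latex
\begin{align*}
\Delta_s\big(P_B^{(k)},P_0\big)-\Delta_s\big(P_B^{\star},P_0\big)
  &= \log\frac{P_B^{(k)}(A_s^{+})}{P_B^{(k)}(A_s^{-})}
   - \log\frac{P_B^{\star}(A_s^{+})}{P_B^{\star}(A_s^{-})}
   = \Delta_s\big(P_B^{(k)},P_B^{\star}\big),\\
\big|\Delta_s\big(P_B^{(k)},P_B^{\star}\big)\big|
  &\le \frac{\sqrt{2}}{m_s}\sqrt{D_{\mathrm{KL}}\big(P_B^{(k)}\,\|\,P_B^{\star}\big)}
   \le \frac{\sqrt{2}}{m_s}\sqrt{C\rho(\varepsilon_k)}
   = \frac{\sqrt{2C}}{m_s}\sqrt{\rho(\varepsilon_k)}.
\end{align*}
```

The first inequality is (40) and the second is Assumption 2.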
### III.2 Imperfect Rare-Set Identification

The definition above assumes access to the true rare-valid set $V_{\delta}$. Real agents infer an estimated set $V^{\prime}_{\delta}$, so amplification can be spent on false-positive trajectories that are rare but not valid. We model this protocol-level bookkeeping penalty.

##### Perfect identification

Assume that $P$ is absolutely continuous with respect to $P_{0}$ on the true rare-valid set and amplifies that set by a constant likelihood factor $\alpha$:

$$\frac{dP}{dP_{0}}(\omega)=\alpha,\qquad\omega\in V_{\delta}, \tag{44}$$

with $P_{0}(V_{\delta})=\delta$ and $\alpha\delta\leq 1$. Outside $V_{\delta}$, $P$ is renormalized so that it remains a probability distribution. Substituting into Definition 2 gives

$$\mathcal{I}_{\delta}=\frac{P(V_{\delta})-P_{0}(V_{\delta})}{\delta}=\frac{\alpha\delta-\delta}{\delta}=\alpha-1. \tag{45}$$

Thus, at fixed resolution $\delta$, $\alpha=\mathcal{I}_{\delta}+1$. If the limit defining $\mathcal{I}$ exists and the amplification factor has a corresponding limiting value, then $\alpha=\mathcal{I}+1$ in that limit. The local log-likelihood entropy bookkeeping associated with an amplified rare-valid trajectory is

$$S_{\mathrm{ideal}}^{\mathrm{loc}}=-k_{B}\log\frac{dP}{dP_{0}}=-k_{B}\log\alpha=-k_{B}\log(\mathcal{I}_{\delta}+1).$$

This is a local likelihood-accounting term, not a complete entropy balance without a specified physical protocol.

##### Imperfect identification

Let the agent identify an approximate rare-valid set $V^{\prime}_{\delta}$ instead of $V_{\delta}$.
For the main theorem we analyze the conservative-identification case in which the estimated set contains the true rare-valid set,

$$V_{\delta}\subseteq V^{\prime}_{\delta}. \tag{46}$$

This isolates false-positive cost and excludes false negatives. Define

$$E_{\delta}=V^{\prime}_{\delta}\setminus V_{\delta},\qquad p_{\mathrm{err}}=P_{0}(E_{\delta}). \tag{47}$$

Assume that the agent amplifies $V^{\prime}_{\delta}$ by the same likelihood factor $\alpha$:

$$\frac{dP}{dP_{0}}(\omega)=\alpha,\qquad\omega\in V^{\prime}_{\delta}. \tag{48}$$

Require the normalization condition

$$\alpha(\delta+p_{\mathrm{err}})\leq 1, \tag{49}$$

so that the amplified mass assigned to $V^{\prime}_{\delta}$ remains compatible with a probability law. Under ([46](https://arxiv.org/html/2606.20231#S3.E46)), $P(V_{\delta})=\alpha\delta$, hence $\alpha=\mathcal{I}_{\delta}+1$, and $P(E_{\delta})=\alpha p_{\mathrm{err}}$.

False-positive correction depends on the physical protocol used to store, test, correct, or erase erroneous assignments. We therefore keep the cost explicit and report both the expected per-trial term and the version normalized per amplified true rare-valid trajectory.

###### Assumption 3 (Error-resolution protocol).

For a false-positive region $E_{\delta}$ with baseline mass $p_{\mathrm{err}}=P_{0}(E_{\delta})$, the protocol used to resolve, correct, or erase amplified erroneous assignments has entropy cost $k_{B}c(p_{\mathrm{err}})$ per unit amplified false-positive mass, where $c(p)\geq 0$. The baseline-surprisal Landauer bookkeeping protocol corresponds to $c(p)=\log(1/p)$.
This convention charges erroneous assignments according to their rarity under the passive baseline $P_{0}$, not according to their mass after amplification. Other physical implementations may induce different cost functions.

###### Theorem 4 (Conditional imperfect rare-set identification accounting).

Let $V_{\delta}$ be the true rare-valid set with $P_{0}(V_{\delta})=\delta$, and let $V^{\prime}_{\delta}$ be the agent's estimated rare-valid set. Suppose $V_{\delta}\subseteq V^{\prime}_{\delta}$, let

$$p_{\mathrm{err}}=P_{0}(V^{\prime}_{\delta}\setminus V_{\delta}),$$

assume $dP/dP_{0}=\alpha$ on $V^{\prime}_{\delta}$, and assume $\alpha(\delta+p_{\mathrm{err}})\leq 1$. Under Assumption [3](https://arxiv.org/html/2606.20231#Thmassumption3), the expected false-positive overhead per trial is

$$\overline{\Delta S}_{\mathrm{err}}\triangleq\alpha k_{B}p_{\mathrm{err}}\,c(p_{\mathrm{err}}). \tag{50}$$

The corresponding expected protocol-adjusted bookkeeping per trial is

$$\overline{S}_{\mathrm{imperfect}}=-\alpha\delta k_{B}\log\alpha+\overline{\Delta S}_{\mathrm{err}}. \tag{51}$$

Equivalently, normalizing by the amplified true rare-valid mass $P(V_{\delta})=\alpha\delta$, the protocol-adjusted local bookkeeping per amplified true rare-valid trajectory is

$$S_{\mathrm{imperfect}}^{\mathrm{loc}}=-k_{B}\log\alpha+\frac{p_{\mathrm{err}}}{\delta}k_{B}c(p_{\mathrm{err}}). \tag{52}$$

For the baseline-surprisal Landauer bookkeeping protocol $c(p)=\log(1/p)$,

$$\overline{\Delta S}_{\mathrm{err}}=\alpha k_{B}p_{\mathrm{err}}\log\frac{1}{p_{\mathrm{err}}}, \tag{53}$$

and

$$S_{\mathrm{imperfect}}^{\mathrm{loc}}=-k_{B}\log\alpha+\frac{p_{\mathrm{err}}}{\delta}k_{B}\log\frac{1}{p_{\mathrm{err}}}. \tag{54}$$

Using $\alpha=\mathcal{I}_{\delta}+1$, this becomes

$$S_{\mathrm{imperfect}}^{\mathrm{loc}}=-k_{B}\log(\mathcal{I}_{\delta}+1)+\frac{p_{\mathrm{err}}}{\delta}k_{B}\log\frac{1}{p_{\mathrm{err}}}.$$

The expected version is

$$\overline{S}_{\mathrm{imperfect}}=-(\mathcal{I}_{\delta}+1)\delta k_{B}\log(\mathcal{I}_{\delta}+1)+(\mathcal{I}_{\delta}+1)k_{B}p_{\mathrm{err}}\log\frac{1}{p_{\mathrm{err}}}. \tag{55}$$

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

### III.3 Link to recursive self-modeling resolution

The dependence of $p_{\mathrm{err}}$ on recursive model resolution is a statistical learning question, not a thermodynamic law. We therefore state rare-set stability as a hypothesis. In the conservative-identification case, the inferred set covers the true rare-valid set and only the excess false-positive mass shrinks with model fidelity. Let $V_{\delta}^{(k)}$ denote the rare-valid set inferred from the $k$-level recursive model $r_{B}^{(k)}$.
Define

$$p_{\mathrm{err}}^{(k)}=P_{0}\!\left(V_{\delta}^{(k)}\setminus V_{\delta}\right). \tag{56}$$

Let $\varepsilon_{k}$ be the intervention-relevant prediction error from Assumption [2](https://arxiv.org/html/2606.20231#Thmassumption2).

###### Hypothesis 1 (Rare-set stability under model convergence).

There exists a modulus of continuity $\rho_{\delta}$, with $\rho_{\delta}(\varepsilon)\to 0$ as $\varepsilon\to 0$, such that

$$p_{\mathrm{err}}^{(k)}\leq\rho_{\delta}(\varepsilon_{k}). \tag{57}$$

This hypothesis is nontrivial: small KL model error need not imply small false-positive mass on a rare set unless the rare-valid boundary is stable. A concrete finite-resolution sufficient condition is given in Appendix [A.9](https://arxiv.org/html/2606.20231#A1.SS9); it shows that Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1) follows from uniform validity-score convergence together with a margin bound on the rare-valid boundary.

###### Corollary 1 (Conditional recursive self-modeling entropy efficiency).

Assume the baseline-surprisal Landauer bookkeeping protocol and Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1). Suppose $V_{\delta}\subseteq V_{\delta}^{(k)}$, $(\mathcal{I}_{\delta}+1)(\delta+p_{\mathrm{err}}^{(k)})\leq 1$, $p_{\mathrm{err}}^{(k)}<e^{-1}$, and $\rho_{\delta}(\varepsilon_{k})<e^{-1}$.
Define the expected excess bookkeeping penalty

$$\overline{\Delta S}_{\mathrm{err}}^{(k)}\triangleq\overline{S}_{k}+(\mathcal{I}_{\delta}+1)\delta k_{B}\log(\mathcal{I}_{\delta}+1). \tag{58}$$

Then

$$0\leq\overline{\Delta S}_{\mathrm{err}}^{(k)}=(\mathcal{I}_{\delta}+1)k_{B}p_{\mathrm{err}}^{(k)}\log\frac{1}{p_{\mathrm{err}}^{(k)}}\leq(\mathcal{I}_{\delta}+1)k_{B}\rho_{\delta}(\varepsilon_{k})\log\frac{1}{\rho_{\delta}(\varepsilon_{k})}. \tag{59}$$

Equivalently, normalized per amplified true rare-valid trajectory,

$$0\leq\Delta s_{\mathrm{err}}^{(k)}\triangleq S_{k}^{\mathrm{loc}}+k_{B}\log(\mathcal{I}_{\delta}+1)=\frac{k_{B}}{\delta}p_{\mathrm{err}}^{(k)}\log\frac{1}{p_{\mathrm{err}}^{(k)}}\leq\frac{k_{B}}{\delta}\rho_{\delta}(\varepsilon_{k})\log\frac{1}{\rho_{\delta}(\varepsilon_{k})}. \tag{60}$$

Consequently,

$$\varepsilon_{k}\to 0\quad\Longrightarrow\quad p_{\mathrm{err}}^{(k)}\to 0\quad\Longrightarrow\quad\overline{\Delta S}_{\mathrm{err}}^{(k)}\to 0,\;\;\Delta s_{\mathrm{err}}^{(k)}\to 0,$$

and therefore

$$S_{k}^{\mathrm{loc}}\to-k_{B}\log(\mathcal{I}_{\delta}+1) \tag{61}$$

from above.

*Proof.* See Appendix [A](https://arxiv.org/html/2606.20231#A1).

Here $\overline{S}_{k}$ and $S_{k}^{\mathrm{loc}}$ denote the Theorem 4 quantities $\overline{S}_{\mathrm{imperfect}}$ and $S_{\mathrm{imperfect}}^{\mathrm{loc}}$ evaluated with $V^{\prime}_{\delta}=V_{\delta}^{(k)}$. The upper bounds in (59) and (60) combine Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1) with the fact that $x\mapsto x\log(1/x)$ is increasing on $(0,e^{-1})$, which is why both $p_{\mathrm{err}}^{(k)}$ and $\rho_{\delta}(\varepsilon_{k})$ are required to lie below $e^{-1}$.

Together, model-to-control stability and rare-set stability give the intended chain: recursive fidelity improves the controlled law, improves rare-set identification, and reduces false-positive bookkeeping waste. Amplification power and rare-set resolution remain distinct: a powerful but poorly resolved controller can amplify wrong futures, while a high-resolution but weakly actuating model can identify rare-valid futures without making them likely.

## IV Intelligence Calculations

### IV.1 Intelligence of Maxwell's Demons

Maxwell's demon is the canonical limiting case of realized thermodynamic intelligence. In level-relative terms, the demon carries an effectively perfect simulation of the relevant gas microstate and the consequences of its gate action. It observes, stores, and acts to separate particles by velocity. Under passive dynamics, a temperature gradient is exponentially unlikely; under demon-assisted dynamics, the trajectory can become likely or deterministic. The demon is not a free entropy machine, but an idealized upper bound: near-perfect rare-valid fidelity, target identification, and actuation at the measured level.
Let $\Delta S>0$ denote the magnitude of a local entropy reduction in the gas. For a matched entropy-producing trajectory, the fluctuation-theorem scale gives

$$\frac{P_{0}(+\Delta S)}{P_{0}(-\Delta S)}\sim\exp\left(\frac{\Delta S}{k_{B}}\right). \tag{62}$$

Equivalently,

$$P_{0}(-\Delta S)\sim P_{0}(+\Delta S)\exp\left(-\frac{\Delta S}{k_{B}}\right). \tag{63}$$

If the positive-entropy counterpart has order-one probability at the chosen coarse-graining, then the entropy-reducing event has passive probability on the scale $\exp(-\Delta S/k_{B})$. An ideal demon that realizes it with probability near one therefore has amplification

$$\mathcal{I}+1\sim\exp\left(\frac{\Delta S}{k_{B}}\right),\qquad\log_{10}(\mathcal{I}+1)=\frac{\Delta S/k_{B}}{\log 10}. \tag{64}$$

##### Illustrative entropy-reduction calculation

Suppose $\Delta S=10^{-19}\,\mathrm{J/K}$. Since $k_{B}=1.380649\times 10^{-23}\,\mathrm{J/K}$, we obtain

$$\frac{\Delta S}{k_{B}}\approx\frac{10^{-19}}{1.380649\times 10^{-23}}\approx 7243. \tag{65}$$

Therefore, at an $O(1)$ positive-counterpart coarse-graining,

$$P_{0}(-\Delta S)\sim e^{-7243}\approx 10^{-3146},\qquad\mathcal{I}+1\sim e^{7243}\approx 10^{3146}.$$

Thus an ideal demon that deterministically realizes this trajectory has a thermodynamic-intelligence scale on the order of $\mathcal{I}\sim 10^{3146}$. This should be read as a fluctuation-theorem scale calculation, not as a complete microscopic probability model for a particular gas protocol.
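The arithmetic in (64) and (65) is easy to reproduce. The sketch below works in $\log_{10}$ throughout, because $e^{7243}$ overflows double-precision floats; the function name `demon_scale` is a hypothetical helper introduced only for this illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def demon_scale(entropy_reduction):
    """Return (dS/k_B, log10(I+1)) at the fluctuation-theorem scale of (64).

    `entropy_reduction` is the magnitude of the local entropy reduction in J/K.
    The result is a scale estimate, not a microscopic probability model.
    """
    sigma = entropy_reduction / K_B      # dimensionless entropy reduction
    log10_amp = sigma / math.log(10)     # log10 of exp(sigma)
    return sigma, log10_amp

sigma, log10_amp = demon_scale(1e-19)
print(f"dS/k_B     = {sigma:.1f}")
print(f"log10(I+1) = {log10_amp:.1f}")
```

Rounded to integers, the two printed values match the 7243 and 3146 quoted in the text.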
##### Velocity-selection demon

Consider a classical ideal gas of $N$ particles at temperature $T$. Let the demon select particles whose dimensionless kinetic energy exceeds the threshold $\epsilon=\frac{mv_{c}^{2}}{2k_{B}T}$, where $v_{c}$ is the cutoff speed. For a three-dimensional Maxwell–Boltzmann gas, the dimensionless kinetic energy $x=mv^{2}/(2k_{B}T)$ follows a Gamma distribution with density

$$f(x)=\frac{2}{\sqrt{\pi}}x^{1/2}e^{-x}. \tag{66}$$

The fraction of particles above threshold is

$$q(\epsilon)=\mathbb{P}(x>\epsilon)=\frac{\Gamma(3/2,\epsilon)}{\Gamma(3/2)}=\operatorname{erfc}(\sqrt{\epsilon})+\frac{2}{\sqrt{\pi}}\sqrt{\epsilon}\,e^{-\epsilon}.$$

The conditional mean dimensionless energy above threshold is

$$\mathbb{E}[x\mid x>\epsilon]=\frac{\Gamma(5/2,\epsilon)}{\Gamma(3/2,\epsilon)}. \tag{67}$$

The excess dimensionless kinetic energy above the thermal mean $3/2$ is

$$\phi(\epsilon)=\frac{\Gamma(5/2,\epsilon)}{\Gamma(3/2,\epsilon)}-\frac{3}{2}. \tag{68}$$

The following is a local $Q/T$-scale estimate, not a full entropy accounting. Interpreting the selected particles' excess kinetic energy as the heat scale sorted at ambient temperature $T$, the dimensionless local entropy-reduction proxy is

$$\frac{|\Delta S|}{k_{B}}\approx\frac{N}{2}q(\epsilon)\phi(\epsilon). \tag{69}$$

Here the factor $1/2$ corresponds to the idealized one-shot protocol in which the demon acts on one half of the chamber.
Therefore

$$\log_{10}(\mathcal{I}+1)\approx\frac{N}{2\log 10}q(\epsilon)\phi(\epsilon). \tag{70}$$

##### Concrete numerical instance (ambient nitrogen)

To make the velocity-selection estimate explicit, consider nitrogen-like air at $T=300\,\mathrm{K}$, with molecular mass $m_{\mathrm{N_{2}}}\approx 4.65\times 10^{-26}\,\mathrm{kg}$. The cutoff speed corresponding to the dimensionless kinetic-energy threshold $\epsilon$ is

$$v_{c}(\epsilon)=\sqrt{\frac{2k_{B}T\epsilon}{m_{\mathrm{N_{2}}}}}. \tag{71}$$

At $T=300\,\mathrm{K}$, this gives cutoff speeds of approximately $597$, $731$, $844$, and $944\,\mathrm{m/s}$ for $\epsilon=2,3,4,$ and $5$, respectively. The excess kinetic energy sorted by the demon in the one-shot half-chamber protocol is

$$Q_{\mathrm{excess}}\approx\frac{N}{2}q(\epsilon)\phi(\epsilon)k_{B}T\quad\Rightarrow\quad\frac{Q_{\mathrm{excess}}}{T}\approx k_{B}\frac{N}{2}q(\epsilon)\phi(\epsilon),\quad\text{and}\quad\log_{10}(\mathcal{I}+1)\approx\frac{Q_{\mathrm{excess}}}{k_{B}T\log 10}.$$

At one atmosphere and $300\,\mathrm{K}$, an ideal gas has number density $2.44\times 10^{25}\,\mathrm{m}^{-3}$, so $1\,\mathrm{mm}^{3}$ contains $N=2.44\times 10^{16}$ particles. Table [1](https://arxiv.org/html/2606.20231#S4.T1) reports the selected-particle count $N_{\mathrm{sel}}$, sorted excess-energy scale $Q_{\mathrm{excess}}$, and $L=\log_{10}(\mathcal{I}+1)$.
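The velocity-selection quantities can be evaluated with the Python standard library alone. The sketch below is an illustration under one stated shortcut: rather than evaluating $\Gamma(5/2,\epsilon)$ directly, it uses the standard recurrence $\Gamma(a+1,x)=a\,\Gamma(a,x)+x^{a}e^{-x}$, which turns (68) into $\phi(\epsilon)=\epsilon^{3/2}e^{-\epsilon}/\Gamma(3/2,\epsilon)$. The function names are hypothetical helpers; the constants are those quoted in the text.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
M_N2 = 4.65e-26      # N2 molecular mass in kg, as quoted in the text
T = 300.0            # temperature in K

def tail_fraction(eps):
    """q(eps): fraction of particles with dimensionless kinetic energy above eps."""
    return (math.erfc(math.sqrt(eps))
            + 2.0 / math.sqrt(math.pi) * math.sqrt(eps) * math.exp(-eps))

def excess_energy(eps):
    """phi(eps) of (68), via Gamma(5/2, e) = (3/2) Gamma(3/2, e) + e**1.5 * exp(-e)."""
    upper_gamma_3_2 = tail_fraction(eps) * math.gamma(1.5)  # Gamma(3/2, eps)
    return eps ** 1.5 * math.exp(-eps) / upper_gamma_3_2

def cutoff_speed(eps):
    """Cutoff speed v_c(eps) of (71), in m/s."""
    return math.sqrt(2.0 * K_B * T * eps / M_N2)

def log10_amplification(n_particles, eps):
    """L = log10(I + 1) from (70) for the one-shot half-chamber protocol."""
    return n_particles / (2.0 * math.log(10)) * tail_fraction(eps) * excess_energy(eps)

def sorted_excess_energy(n_particles, eps):
    """Q_excess in joules for the same protocol."""
    return n_particles / 2.0 * tail_fraction(eps) * excess_energy(eps) * K_B * T

for eps in (2, 3, 4, 5):
    print(f"eps={eps}  v_c={cutoff_speed(eps):6.1f} m/s  "
          f"q={tail_fraction(eps):.4f}  phi={excess_energy(eps):.4f}  "
          f"L(N=100)={log10_amplification(100, eps):.2f}  "
          f"Q(1 mm^3)={sorted_excess_energy(2.44e16, eps):.2e} J")
```

Because $L$ is linear in $N$, values for other volumes follow by rescaling the `n_particles` argument.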
Table 1: Concrete velocity-selection demon estimates for nitrogen-like air at $T=300\,\mathrm{K}$, using the corrected three-dimensional Maxwell–Boltzmann tail. $N_{\mathrm{sel}}$ is the selected-particle count in the one-shot half-chamber protocol, $Q_{\mathrm{excess}}$ is the local sorted excess kinetic-energy scale for $1\,\mathrm{mm}^{3}$ of air, and $L=\log_{10}(\mathcal{I}+1)$. Scaled columns report the indicated common multipliers. Values are local amplification-scale estimates before measurement, memory, control, and erasure costs.

The scale is extreme because the same selection rule acts on many microscopic degrees of freedom. For $N=100$, $\log_{10}(\mathcal{I}+1)\approx 1.85$–$9.38$ as $\epsilon$ ranges from $5$ down to $2$. In $1\,\mathrm{mm}^{3}$ of air, the rule acts on $10^{14}$–$10^{15}$ eligible particles; the sorted excess energy is only $O(10^{-5})\,\mathrm{J}$, but corresponds to roughly $10^{15}$ thermal-scale selections. Scaling from $1\,\mathrm{mm}^{3}$ to $1\,\mathrm{cm}^{3}$ multiplies $N_{\mathrm{sel}}$, $Q_{\mathrm{excess}}$, and $L$ by $10^{3}$, leaving $v_{c}$, $q(\epsilon)$, and $\phi(\epsilon)$ unchanged. The local entropy reduction must still be compensated by measurement, memory, control, and erasure costs; the table gives idealized local amplification capacity, not a second-law violation.

### IV.2 Symbolic Generation: Human and LLM Text

Many intelligent systems emit symbolic sequences, including human speech or writing and LLM token streams. Even when direct thermodynamic work on the environment is unobserved, symbolic output provides an observable trajectory.
The symbolic case therefore gives a finite empirical version of the rare-valid framework, using the information-theoretic language of entropy rate, coding, and sequence likelihood \[[20](https://arxiv.org/html/2606.20231#bib.bib20), [21](https://arxiv.org/html/2606.20231#bib.bib21)\].

Let $X_{1:n}=(X_{1},\ldots,X_{n})$ be a symbolic sequence over a finite alphabet $\Sigma$. For a prompt or task context $y$, let $P_{0}(\cdot\mid y)$ denote a baseline symbolic distribution on $\Sigma^{n}$, and let $P_{G}(\cdot\mid y)$ denote the distribution induced by generator $G$. The baseline may be an $n$-gram model, a low-order Markov model, a prompt-independent language model, a lower-capacity reference generator, or a fixed decoding policy. Let $V_{n}(y)\subseteq\Sigma^{n}$ be the valid outputs for context $y$. Validity is task-dependent (syntactic well-formedness, semantic coherence, factual consistency, entailment, executable correctness, biological admissibility, or task relevance) and must be measurable under both $P_{0}(\cdot\mid y)$ and $P_{G}(\cdot\mid y)$.

#### IV.2.1 Symbolic rare-valid amplification

A rare-valid symbolic set must be defined by baseline mass, not merely by a pointwise likelihood cutoff. Fix $0<\delta<1$. Let $R_{\delta,n}(y)\subseteq V_{n}(y)$ be a valid set whose baseline probability is $\delta$:

$$P_{0}(R_{\delta,n}(y)\mid y)=\sum_{x\in R_{\delta,n}(y)}P_{0}(x\mid y)=\delta. \tag{72}$$

When exact equality is not attainable because $\Sigma^{n}$ is discrete, one may use the nearest attainable mass, use $P_{0}(R_{\delta,n}(y)\mid y)\leq\delta$, or randomize inclusion of a boundary element to obtain exact mass.
A natural construction is to choose $R_{\delta,n}(y)$ from the lowest-baseline-probability valid outputs until total baseline mass $\delta$ is reached. The symbolic thermodynamic intelligence of generator $G$, at length $n$, context $y$, and rare-valid mass $\delta$, is

$$\mathcal{I}_{\delta,n}(G\mid y)=\frac{P_{G}(R_{\delta,n}(y)\mid y)-P_{0}(R_{\delta,n}(y)\mid y)}{P_{0}(R_{\delta,n}(y)\mid y)}=\frac{P_{G}(R_{\delta,n}(y)\mid y)}{\delta}-1. \tag{73}$$

Equivalently,

$$P_{G}(R_{\delta,n}(y)\mid y)=\delta\left(1+\mathcal{I}_{\delta,n}(G\mid y)\right). \tag{74}$$

Thus $\mathcal{I}_{\delta,n}=0$ when the generator matches the baseline on the rare-valid set; positive values indicate amplification and negative values suppression. For a distribution $\mu$ over prompts or tasks,

$$\mathcal{I}_{\delta,n}^{\mu}(G)=\mathbb{E}_{y\sim\mu}\left[\mathcal{I}_{\delta,n}(G\mid y)\right]. \tag{75}$$

The symbolic definition inherits the binary coarse-graining bound from the path-space theory.
Let

$$p_{G}(y)=P_{G}(R_{\delta,n}(y)\mid y)=\delta\left(1+\mathcal{I}_{\delta,n}(G\mid y)\right). \tag{76}$$

Then data processing for KL divergence under the binary partition $\{R_{\delta,n}(y),R_{\delta,n}(y)^{c}\}$ gives

$$D_{\mathrm{KL}}\left(P_{G}(\cdot\mid y)\,\|\,P_{0}(\cdot\mid y)\right)\geq d(p_{G}(y)\,\|\,\delta), \tag{77}$$

where

$$d(p\,\|\,\delta)=p\log\frac{p}{\delta}+(1-p)\log\frac{1-p}{1-\delta}.$$

Equivalently,

$$D_{\mathrm{KL}}\left(P_{G}(\cdot\mid y)\,\|\,P_{0}(\cdot\mid y)\right)\geq d\left(\delta(1+\mathcal{I}_{\delta,n}(G\mid y))\,\|\,\delta\right). \tag{78}$$

Thus symbolic rare-valid amplification requires measurable divergence from the baseline symbolic law.

#### IV.2.2 Validity-weighted operational estimator

Empirical validity is often graded.
Let $v(x,y)\in[0,1]$ be a calibrated validity score and define

$$w_{\delta,n}(x,y)=v(x,y)\,\mathbf{1}_{\{0<P_{0}(x\mid y)\leq q_{\delta}(y)\}}, \tag{79}$$

where $q_{\delta}(y)$ is chosen so that the baseline weighted mass is

$$Z_{0,\delta}(y)=\sum_{x\in\Sigma^{n}}w_{\delta,n}(x,y)P_{0}(x\mid y)>0. \tag{80}$$

The validity-weighted symbolic intelligence is then

$$\mathcal{I}_{\delta,n}^{\mathrm{val}}(G\mid y)=\frac{\sum_{x\in\Sigma^{n}}w_{\delta,n}(x,y)\left(P_{G}(x\mid y)-P_{0}(x\mid y)\right)}{Z_{0,\delta}(y)}. \tag{81}$$

This reduces to the hard rare-valid definition when $v$ is an indicator of $R_{\delta,n}(y)$, and it penalizes entropy inflation because rare invalid outputs receive little or no weight.

For a concrete task family, let $D_{+}$ contain valid prompt–output pairs and $D_{-}$ invalid, corrupted, or adversarial outputs. Train a calibrated classifier $c_{\theta}(x,y)\in[0,1]$ estimating validity. For threshold $\tau$, define

$$V_{n,\tau}(y)=\{x\in\Sigma^{n}:c_{\theta}(x,y)\geq\tau\}. \tag{82}$$

Given $P_{0}(\cdot\mid y)$, choose a rare-tail threshold $q_{\delta}(y)$ so that

$$R_{\delta,n,\tau}(y)=\{x\in\Sigma^{n}:c_{\theta}(x,y)\geq\tau,\;0<P_{0}(x\mid y)\leq q_{\delta}(y)\},$$

with

$$P_{0}(R_{\delta,n,\tau}(y)\mid y)=\delta, \tag{83}$$

up to the finite-alphabet boundary convention.
The corresponding hard-threshold symbolic intelligence is

$$\mathcal{I}_{\delta,n,\tau}(G\mid y)=\frac{P_{G}(R_{\delta,n,\tau}(y)\mid y)}{\delta}-1. \tag{84}$$

Here $c_{\theta}$ may combine grammaticality, entailment, and task relevance. For executable reasoning, validity can instead be defined by a parser, unit tests, proof checker, or verifier.

#### IV.2.3 Temperature as a negative control

Let $G_{T}$ denote an LLM decoded at temperature $T$. The entropy of the generated symbolic distribution is

$$H_{n}(G_{T}\mid y)=-\sum_{x\in\Sigma^{n}}P_{G_{T}}(x\mid y)\log P_{G_{T}}(x\mid y). \tag{85}$$

The validity-weighted rare-tail score is

$$\mathcal{I}_{\delta,n}^{\mathrm{val}}(G_{T}\mid y)=\frac{\sum_{x\in\Sigma^{n}}w_{\delta,n}(x,y)\left(P_{G_{T}}(x\mid y)-P_{0}(x\mid y)\right)}{Z_{0,\delta}(y)}.$$

Entropy and rare-valid lift need not move together monotonically. Increasing temperature may raise $H_{n}(G_{T}\mid y)$ while shifting mass into invalid or incoherent strings. Therefore

$$\mathcal{I}_{\delta,n}^{\mathrm{val}}(G_{T}\mid y)\not\equiv H_{n}(G_{T}\mid y). \tag{86}$$

The empirical prediction is an intermediate optimum: low temperature is valid but generic, high temperature is rare but often invalid, and the validity-weighted score peaks when outputs are both rare under $P_{0}$ and valid.
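A toy sweep makes the negative-control logic concrete. The sketch below is not an experiment and uses entirely hypothetical numbers: a twelve-string output space with one generic valid string, one rare valid string, and ten rare invalid strings, decoded by temperature-scaled softmax. It computes the entropy (85) and the validity-weighted score (81) with a hard 0/1 validity score.

```python
import math

# Hypothetical toy output space: index 0 is a generic valid string, index 1 a
# rare valid string, indices 2..11 are rare invalid strings. All numbers are made up.
LOGITS = [2.0, 1.0] + [0.0] * 10
VALIDITY = [1.0, 1.0] + [0.0] * 10   # v(x, y); a hard 0/1 score for simplicity
P0 = [0.89] + [0.01] * 11            # baseline law P_0(. | y)
Q_DELTA = 0.01                       # rare-tail cutoff q_delta(y)

def decode(logits, temperature):
    """Temperature-scaled softmax over the toy output space."""
    scaled = [logit / temperature for logit in logits]
    top = max(scaled)                              # subtract max for stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(dist):
    """Shannon entropy in nats, as in (85)."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def validity_weighted_score(pg, p0, validity, q_delta):
    """Validity-weighted rare-tail score of (81) with weights from (79)."""
    w = [v if 0 < b <= q_delta else 0.0 for v, b in zip(validity, p0)]
    z0 = sum(wi * b for wi, b in zip(w, p0))       # baseline weighted mass (80)
    return sum(wi * (g - b) for wi, g, b in zip(w, pg, p0)) / z0

TEMPERATURES = [0.1, 0.87, 10.0]
entropies = [entropy(decode(LOGITS, t)) for t in TEMPERATURES]
scores = [validity_weighted_score(decode(LOGITS, t), P0, VALIDITY, Q_DELTA)
          for t in TEMPERATURES]

for t, h, s in zip(TEMPERATURES, entropies, scores):
    print(f"T={t:5.2f}  entropy={h:.3f} nats  validity-weighted score={s:.2f}")
```

In this toy setting entropy rises with temperature while the score is highest at the middle temperature, which is the qualitative pattern the text predicts; whether real generators behave this way is an empirical question the sketch does not address.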
#### IV.2.4 Set-level lift and symbolic scale calculations

For the hard rare-valid set $R_{\delta,n}(y)$, define the set-level log-lift

$$\lambda_{\delta,n}(G\mid y)=\log\frac{P_{G}(R_{\delta,n}(y)\mid y)}{P_{0}(R_{\delta,n}(y)\mid y)}.\tag{87}$$

Since $P_{0}(R_{\delta,n}(y)\mid y)=\delta$, this gives the exact identity

$$\lambda_{\delta,n}(G\mid y)=\log\left(1+\mathcal{I}_{\delta,n}(G\mid y)\right).\tag{88}$$

Equivalently,

$$\mathcal{I}_{\delta,n}(G\mid y)=\exp\left(\lambda_{\delta,n}(G\mid y)\right)-1.\tag{89}$$

This set-level identity replaces an informal pointwise average log-lift. The latter,

$$\mathbb{E}_{x\sim P_{G}(\cdot\mid R_{\delta,n}(y),y)}\left[\log\frac{P_{G}(x\mid y)}{P_{0}(x\mid y)}\right]\tag{90}$$

is generally unequal to $\lambda_{\delta,n}(G\mid y)$, except when $P_{G}(x\mid y)/P_{0}(x\mid y)$ is nearly constant over $R_{\delta,n}(y)$.

For sequence length $n$, define the per-symbol structured log-lift in bits by

$$\Delta\ell_{\delta,n}(G\mid y)=\frac{\lambda_{\delta,n}(G\mid y)}{n\log 2}.\tag{91}$$

Then

$$\log_{10}\left(\mathcal{I}_{\delta,n}(G\mid y)+1\right)=n\,\Delta\ell_{\delta,n}(G\mid y)\,\log_{10}2.\tag{92}$$

This is a structured log-lift decomposition, not an ordinary entropy-rate statement. The latter becomes relevant only after an additional finite-resolution AEP approximation has been specified.
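The identities in Eqs. (87)–(92) can be checked on a small example. The sketch below uses a hypothetical two-string rare-valid set with made-up probabilities and an assumed sequence length; only the formulas come from the text. It also shows why the pointwise average of Eq. (90) differs from the set-level log-lift: by the chain rule for relative entropy, the gap equals the KL divergence between the generator and baseline laws conditioned on the set, which vanishes only when the likelihood ratio is constant on the set.

```python
import math

def set_level_log_lift(p_gen_mass, p_base_mass):
    """Eq. (87): log of generator mass over baseline mass on the rare-valid set."""
    return math.log(p_gen_mass / p_base_mass)

# Hypothetical rare-valid set with two strings (illustrative numbers only).
p_base = {"x1": 5e-7, "x2": 5e-7}   # baseline mass delta = 1e-6
p_gen = {"x1": 9e-3, "x2": 1e-3}    # generator mass 1e-2
n = 50                              # assumed sequence length

delta = sum(p_base.values())
gen_mass = sum(p_gen.values())

lift = gen_mass / delta - 1.0                   # hard rare-valid lift
lam = set_level_log_lift(gen_mass, delta)       # Eq. (87)
bits_per_symbol = lam / (n * math.log(2))       # Eq. (91)

# Eq. (90): pointwise average log-lift under P_G conditioned on the set.
pointwise = sum(
    (p_gen[x] / gen_mass) * math.log(p_gen[x] / p_base[x]) for x in p_gen
)

# KL divergence between the two laws conditioned on the set.
kl_within = sum(
    (p_gen[x] / gen_mass)
    * math.log((p_gen[x] / gen_mass) / (p_base[x] / delta))
    for x in p_gen
)

print(f"lift I = {lift:.1f}, log-lift = {lam:.4f} nats")
print(f"per-symbol lift = {bits_per_symbol:.4f} bits")
print(f"pointwise average = {pointwise:.4f}, gap = {pointwise - lam:.4f}")
print(f"conditional KL    = {kl_within:.4f}")
```

Because the conditional KL term is nonnegative, the pointwise average never falls below the set-level log-lift in this construction.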
#### IV.2.5 Sentence-scale human and AI estimates

We now instantiate the symbolic calculation at the sentence scale. Louwerse's simplified combinatorial estimate [[26](https://arxiv.org/html/2606.20231#bib.bib26)] gives a finite-resolution ensemble of interpretable English sentences of roughly $3$–$20$ words on the order of

$$N_{V}\triangleq|V_{n}|\approx 5\times 10^{21}.\tag{93}$$

This is not a universal linguistic constant; it is a usable baseline cardinality for the present finite-resolution calculation. To define a comparable rare-valid target, let $G_{n}^{\star}\subset V_{n}$ denote sentence-scale strings that satisfy an additional high-quality human predicate: literary force, explanatory compression, originality, memorability, or comparable semantic/aesthetic force. We estimate its cardinality by taking a broad human canon of $B_{\star}\sim 2\times 10^{3}$ high-quality long-form works and $S_{\star}\sim 5\times 10^{3}$ sentence-scale units per work,

$$N_{G}\triangleq|G_{n}^{\star}|\approx B_{\star}S_{\star}\sim(2\times 10^{3})(5\times 10^{3})=10^{7}.\tag{94}$$

The scale of this assumption is conservative relative to large public-domain corpora; the Standardized Project Gutenberg Corpus, for example, contains more than $5\times 10^{4}$ books and more than $3\times 10^{9}$ word tokens [[27](https://arxiv.org/html/2606.20231#bib.bib27)]. We treat $N_{G}=10^{6}$–$10^{8}$ as a sensitivity range.
Taking the passive symbolic baseline to be approximately uniform over $V_{n}$, the baseline mass of the exemplary human-quality set is

$$\delta_{\star}=\frac{N_{G}}{N_{V}}\approx\frac{10^{7}}{5\times 10^{21}}=2\times 10^{-15}.\tag{95}$$

If $q_{G}=P_{G}(G_{n}^{\star})$ is the probability that generator $G$ lands in this target set under the specified task condition, then

$$\mathcal{I}_{G}+1=\frac{q_{G}}{\delta_{\star}}=q_{G}\,\frac{N_{V}}{N_{G}}.\tag{96}$$

For an expert human process conditioned on producing exemplary sentence-scale text, $q_{H}\approx 1$, giving

$$\mathcal{I}_{H}+1\approx\frac{5\times 10^{21}}{10^{7}}=5\times 10^{14}.\tag{97}$$

Equivalently,

$$L_{H}=\log_{10}(\mathcal{I}_{H}+1)=14.699,\tag{98}$$

$$\Lambda_{H}=\log_{10}(L_{H}+1)=1.196.\tag{99}$$

For the LLM estimate we use only one human reference corpus and one machine generator: Gutenberg prose and GPT-5 long-form prose. The entropy-rate estimates are generated by the self-contained procedure in Appendix [B](https://arxiv.org/html/2606.20231#A2). Both corpora are mapped to a common $27$-symbol alphabet, consisting of the letters $a$–$z$ and space, with punctuation, digits, and non-ASCII characters removed. The resulting mean entropy rates are

$$H_{H}=0.77\ \mathrm{bits/character}\quad\text{(for Gutenberg prose)},\tag{100}$$

$$H_{\mathrm{GPT5}}=0.74\ \mathrm{bits/character}.\tag{101}$$

Since Louwerse's valid-sentence estimate is a sentence-scale count, the entropy-rate correction must use a character-scale length.
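The arithmetic of Eqs. (95)–(99), together with the stated sensitivity range for $N_{G}$, can be reproduced in a few lines. This is a sketch under the paper's own inputs ($N_{V}=5\times 10^{21}$, $N_{G}=10^{7}$, $q_{H}\approx 1$); the function name `symbolic_scale` is a hypothetical helper introduced here and is not taken from the paper's repository.

```python
import math

N_V = 5e21  # valid-sentence ensemble size, Eq. (93)

def symbolic_scale(q, n_g, n_v=N_V):
    """Return (delta_star, I + 1, L, Lambda) for hit probability q and target size n_g."""
    delta_star = n_g / n_v                 # Eq. (95): baseline mass of the target set
    lift_plus_one = q / delta_star         # Eq. (96)
    big_l = math.log10(lift_plus_one)      # Eq. (98)
    big_lambda = math.log10(big_l + 1.0)   # Eq. (99)
    return delta_star, lift_plus_one, big_l, big_lambda

# Central human estimate: q_H = 1 and N_G = 1e7.
central = symbolic_scale(q=1.0, n_g=1e7)
print("delta_star = %.1e, I_H + 1 = %.1e, L_H = %.3f, Lambda_H = %.3f" % central)

# Sensitivity over the stated range N_G = 1e6 to 1e8.
sensitivity = {n_g: symbolic_scale(1.0, n_g) for n_g in (1e6, 1e7, 1e8)}
for n_g, (_, _, big_l, big_lambda) in sensitivity.items():
    print(f"N_G = {n_g:.0e}: L = {big_l:.3f}, Lambda = {big_lambda:.3f}")
```

Each factor of ten in $N_{G}$ moves $L$ by exactly one unit, and the double-log scale $\Lambda$ compresses that shift further.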
We take a 20-word sentence-scale unit to have $n_{\star}=100$ characters after the same coarse-graining. The correction below is an AEP-style support-size approximation, not an exact finite-length theorem. More precisely, we write

$$\log_{2}\frac{q_{\mathrm{GPT5}}}{q_{H}}=n_{\star}(H_{\mathrm{GPT5}}-H_{H})+\rho_{n_{\star}},\tag{102}$$

where $\rho_{n_{\star}}$ collects finite-length typical-set error, boundary effects of the rare-valid set, and the fact that entropy-rate support size is only a proxy for overlap with the exemplary-output subset. The numerical value below sets $\rho_{100}=0$, and should be read as a central order-of-magnitude estimate. With $n_{\star}=100$, this gives

$$\frac{q_{\mathrm{GPT5}}}{q_{H}}\approx 2^{100(0.74-0.77)}=2^{-3}=0.125.\tag{103}$$

Combining ([97](https://arxiv.org/html/2606.20231#S4.E97)) and ([103](https://arxiv.org/html/2606.20231#S4.E103)) gives the central estimate

$$\mathcal{I}_{\mathrm{GPT5}}+1\approx(5\times 10^{14})\,2^{100(0.74-0.77)}=6.25\times 10^{13}.\tag{104}$$

Thus

$$L_{\mathrm{GPT5}}=\log_{10}(6.25\times 10^{13})=13.796,\tag{105}$$

$$\Lambda_{\mathrm{GPT5}}=\log_{10}(L_{\mathrm{GPT5}}+1)=1.170.\tag{106}$$

Since $H_{\mathrm{GPT5}}<H_{H}$, the sentence-scale choice $n_{\star}=100$ is conservative relative to longer independently composable symbolic units: increasing $n_{\star}$ would decrease the GPT-5 estimate, provided the entropy-rate gap persists and the rare-valid construction is consistently extended.

Table 2: Numerical thermodynamic-intelligence scale sorted from small to large.
The table reports the approximate rare-valid lift $\mathcal{I}$ and the stabilized double-log scale $\Lambda=\log_{10}(\log_{10}(\mathcal{I}+1)+1)$. Rows are ordered by the midpoint of the underlying $L=\log_{10}(\mathcal{I}+1)$ range when a range is reported. Symbolic entries are central finite-resolution estimates; the GPT-5 entry sets the finite-length AEP correction $\rho_{100}=0$. Demon entries are local amplification scales before measurement, memory, control, and erasure costs.

#### IV.2.6 Algorithmic-complexity interpretation

The symbolic construction has an algorithmic-statistics interpretation, but not a direct estimator. Kolmogorov complexity is uncomputable and machine-dependent up to additive constants [[28](https://arxiv.org/html/2606.20231#bib.bib28)]; the useful point is structural: algorithmic statistics separates the description of a model class from the index of an object inside it [[29](https://arxiv.org/html/2606.20231#bib.bib29)]. Let $K(x)$ denote the prefix Kolmogorov complexity of a finite string $x$. A two-part code describes $x$ through a finite set or model class $S\ni x$:

$$K(x)\lesssim K(S)+\log|S|.\tag{107}$$

Here $K(S)$ describes the regularity class and $\log|S|$ indexes $x$ within it. Thus high-complexity strings need not be noise; they may lie in rich valid classes. The target is not raw unpredictability, but probability mass assigned to rare strings that remain valid under task constraints.

Levin's coding theorem relates universal a priori probability $m(x)$ to Kolmogorov complexity [[30](https://arxiv.org/html/2606.20231#bib.bib30)]:

$$K(x)=-\log m(x)+O(1).\tag{108}$$

This gives an algorithmic analogue of symbolic rarity.
If $M$ is a universal semimeasure or computable approximation, normalize it over $\Sigma^{n}$ by

$$M_{n}(x)=\frac{M(x)}{\sum_{z\in\Sigma^{n}}M(z)}.\tag{109}$$

For an algorithmic rare-valid set $A_{\delta,n}(y)\subseteq V_{n}(y)$ satisfying $M_{n}(A_{\delta,n}(y))=\delta$, define

$$\mathcal{I}_{\delta,n}^{K}(G\mid y)=\frac{P_{G}(A_{\delta,n}(y)\mid y)}{\delta}-1.\tag{110}$$

Then the same binary KL bridge gives

$$D_{\mathrm{KL}}\!\left(P_{G}(\cdot\mid y)\,\|\,M_{n}\right)\geq d\!\left(\delta(1+\mathcal{I}_{\delta,n}^{K}(G\mid y))\,\|\,\delta\right).\tag{111}$$

For a specified task distribution and baseline, larger symbolic thermodynamic intelligence means more probability mass on valid strings that are rare under that baseline. Empirical comparison therefore requires finite baselines, validity functions, and prompt distributions.

## V Discussion

The central insight in this work is that perceived intelligence is measurable by what a system does to the probability distribution over possible futures. The architectural claim is that intelligence requires recursive self-simulation: a system models a world in which it is itself an acting component, evaluates action-conditioned futures, and uses that model to select interventions. The operational claim is that this architecture becomes measurable as rare-valid probability lift: amplification of futures that were unlikely under a passive baseline but remain valid under the constraints of the domain. The main mathematical claim is that these two ideas are not merely associated.
Under bounded amplification, high rare-valid lift requires high rare-valid fidelity in the system's self-simulation, and high fidelity is nearly sufficient when an effective amplifying policy is available.

This perspective is adjacent to, but distinct from, several existing formalisms. Legg–Hutter intelligence measures expected reward over a universal distribution of computable environments, whereas the present quantity measures probability lift of a specified rare-valid path set relative to a specified passive law [[2](https://arxiv.org/html/2606.20231#bib.bib2)]. Chollet's ARC framework emphasizes skill-acquisition efficiency and abstraction from sparse examples, whereas the present framework asks what path-law operation such success corresponds to once the baseline, validity criterion, and resolution are fixed [[4](https://arxiv.org/html/2606.20231#bib.bib4)]. Free-energy and active-inference formulations describe perception and action through variational free-energy minimization; here the measured object is not free energy itself but the induced reweighting of rare-valid trajectories [[24](https://arxiv.org/html/2606.20231#bib.bib24)]. The closest thermodynamic relative is semantic information, in which information is meaningful when it is causally necessary for a system to maintain viability under counterfactual interventions [[25](https://arxiv.org/html/2606.20231#bib.bib25)]. In contrast, rare-valid lift measures how much an induced law amplifies valid low-baseline-probability futures, and Theorems [1](https://arxiv.org/html/2606.20231#Thmtheorem1) and [2](https://arxiv.org/html/2606.20231#Thmtheorem2) connect that amplification to rare-valid simulation fidelity.
Thus the present contribution is not rarity, validity, self-modeling, or feedback thermodynamics in isolation [[13](https://arxiv.org/html/2606.20231#bib.bib13),[14](https://arxiv.org/html/2606.20231#bib.bib14)], but their combination into a level-relative path-measure definition with explicit fidelity bounds.

This formulation makes intelligence a level-relative measurement. A claim of intelligence is not made relative to an inaccessible absolute reality, but relative to a specified level of description, baseline path law, validity criterion, and observational resolution. At level $k$, realized thermodynamic intelligence means that the system actually changes the level-$k$ path law. Thermodynamic intelligence potential for level $k$ is computed inside a level-$(k+1)$ simulation of level $k$, where counterfactual actions can be evaluated even if they are not implemented at level $k$. This distinction separates what a system can identify in simulation from what it actually makes more probable in the measured world.

The numerical examples calibrate the measure rather than define a taxonomy. Table [2](https://arxiv.org/html/2606.20231#S4.T2) reports the stabilized double-log scale $\Lambda=\log_{10}(\log_{10}(\mathcal{I}+1)+1)$, which allows passive systems, feedback controllers, symbolic generators, and idealized information engines to be placed on the same probability-lift scale. Passive matter has zero lift by construction. Simple feedback produces modest lift; repeated dynamic control compounds small gains; symbolic generators can amplify valid low-baseline-probability sequences; and Maxwell-demon-like systems occupy the high end because microscopic information is used to select rare thermodynamic trajectories. The $1\,\mathrm{mm}^{3}$ velocity-selection demon is large because the same selection rule is applied across $10^{14}$–$10^{15}$ eligible particles.
These are local amplification scales before full measurement, memory, control, and erasure costs are paid.

The formal results separate the ingredients of the theory. Theorems [1](https://arxiv.org/html/2606.20231#Thmtheorem1) and [2](https://arxiv.org/html/2606.20231#Thmtheorem2) provide the central bridge from recursive self-simulation to thermodynamic intelligence: rare-valid fidelity is necessary under bounded amplification and nearly sufficient with effective simulated actuation. Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1) shows that rare-valid amplification entails path-measure divergence from the passive baseline. Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3) and Proposition [1](https://arxiv.org/html/2606.20231#Thmproposition1) relate path-law changes to coarse-grained thermodynamic signatures. Theorem [4](https://arxiv.org/html/2606.20231#Thmtheorem4) gives a protocol-dependent bookkeeping penalty for imperfect rare-set identification. Together, these results distinguish simulation fidelity, amplification power, thermodynamic accounting, and implementation.

Empirical use of $\mathcal{I}_{\delta}$ requires explicit choices of baseline, validity criterion, level of description, and trajectory resolution. These choices are not defects of the framework; they are the conditions under which the measurement is meaningful. Poor baselines can inflate or suppress measured lift, and continuous high-dimensional rare sets require statistical regularity conditions such as finite partitions, margin assumptions, or large-deviation structure. Natural testbeds include symbolic generation with executable or semantic checkers, closed-loop control with known passive dynamics, biological sequence evolution under viability constraints, and synthetic Maxwell-demon-like information engines with explicit measurement and erasure costs.
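The remark that poor baselines can inflate or suppress measured lift can be made concrete with a toy calculation. The sketch below is illustrative only: the four outcomes, the three baselines, and the helper `hard_lift` are hypothetical, and the target set is held fixed across baselines for simplicity, whereas in the framework the rare tail is itself defined relative to the chosen passive law.

```python
def hard_lift(p_gen, p_base, target):
    """Hard lift: generator mass over baseline mass on the target set, minus one."""
    base_mass = sum(p_base[x] for x in target)
    gen_mass = sum(p_gen[x] for x in target)
    if base_mass <= 0.0:
        raise ValueError("target set must have positive baseline mass")
    return gen_mass / base_mass - 1.0

# Hypothetical generator and fixed target set (illustrative numbers only).
p_gen = {"a": 0.1, "b": 0.1, "c": 0.4, "d": 0.4}
target = {"c", "d"}

baselines = {
    # Uniform baseline: the target already carries half the mass.
    "uniform": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
    # Baseline that almost never reaches the target: measured lift is inflated.
    "too_pessimistic": {"a": 0.49, "b": 0.49, "c": 0.01, "d": 0.01},
    # Baseline identical to the generator: measured lift is suppressed to zero.
    "same_as_generator": dict(p_gen),
}

lifts_by_baseline = {
    name: hard_lift(p_gen, p_base, target) for name, p_base in baselines.items()
}
for name, value in lifts_by_baseline.items():
    print(f"{name:>18}: lift = {value:.2f}")
```

The same generator and the same target set yield very different lifts, which is why the baseline has to be fixed and reported before values of $\mathcal{I}_{\delta}$ are compared across systems.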
## VI Conclusion

Intelligence can be treated as recursive self-simulation made observable through lawful amplification of rare-valid futures. A system is intelligent, in this sense, when it models a world containing itself, evaluates action-conditioned futures, and shifts probability mass toward futures that were rare under a specified passive baseline but remain valid. The central result is that high lift cannot be obtained from randomness or actuation alone: under bounded amplification it requires high rare-valid self-simulation fidelity, and with effective simulated actuation that fidelity yields lift near the actuation-limited optimum. The framework is level-relative, thermodynamically accounted, and applicable across passive systems, feedback controllers, Maxwell-demon-like information engines, and symbolic generators once the level, baseline law, validity criterion, trajectory resolution, and induced probability shift are specified.

## Data and Code Availability

The data, parameter files, and scripts used to generate the numerical calibration results reported in this manuscript are available through Harvard Dataverse at [https://doi.org/10.7910/DVN/F5TGT3](https://doi.org/10.7910/DVN/F5TGT3). The accompanying public GitHub repository, [https://github.com/zeroknowledgediscovery/tme](https://github.com/zeroknowledgediscovery/tme), contains the reproducibility code used to generate the values in Fig. [1](https://arxiv.org/html/2606.20231#S1.F1), Table [1](https://arxiv.org/html/2606.20231#S4.T1), Table [2](https://arxiv.org/html/2606.20231#S4.T2), and Appendix [C](https://arxiv.org/html/2606.20231#A3).
The GPT–human entropy-rate values used as symbolic inputs are not re-estimated in the TME repository; they are imported as documented input constants from the workflow in [https://github.com/zeroknowledgediscovery/nero](https://github.com/zeroknowledgediscovery/nero), with provenance specified in the repository metadata and with the estimation protocol described in Appendix [B](https://arxiv.org/html/2606.20231#A2).

###### Acknowledgements.

This work was supported by the Defense Advanced Research Projects Agency (DARPA) under the MAGICS program, DARPA-EA-25-02-05-MAGICS-PA-025, Award No. HR0011-26-3-E016. The views, opinions, and conclusions expressed in this work are those of the author and do not necessarily represent the official position or policy of DARPA or the U.S. Government.

## Appendix A Proofs and Technical Qualifications

### A.1 Proof of Theorem [1](https://arxiv.org/html/2606.20231#Thmtheorem1)

*Proof.* For readability, write $\widehat{P}_{0}$, $\widehat{P}_{\pi}$, $\widehat{V}_{\delta}$, $\widehat{A}$, $\widehat{\delta}$, and $\widehat{\Phi}$ for the corresponding level-$(k+1\to k)$ quantities.
Decompose the simulated rare-valid mass as

$$\widehat{P}_{\pi}(\widehat{V}_{\delta})=\widehat{P}_{\pi}(\widehat{V}_{\delta}\cap\widehat{A})+\widehat{P}_{\pi}(\widehat{V}_{\delta}\setminus\widehat{A}).\tag{112}$$

Since $\widehat{P}_{\pi}\ll\widehat{P}_{0}$, the likelihood-ratio assumptions give

$$\widehat{P}_{\pi}(\widehat{V}_{\delta})\leq\alpha_{\max}\widehat{P}_{0}(\widehat{V}_{\delta}\cap\widehat{A})+\widehat{P}_{0}(\widehat{V}_{\delta}\setminus\widehat{A}).\tag{113}$$

By definition,

$$\widehat{P}_{0}(\widehat{V}_{\delta}\cap\widehat{A})=\widehat{\delta}\,\widehat{\Phi},\qquad\widehat{P}_{0}(\widehat{V}_{\delta}\setminus\widehat{A})=\widehat{\delta}(1-\widehat{\Phi}).\tag{114}$$

Therefore

$$\widehat{P}_{\pi}(\widehat{V}_{\delta})\leq\widehat{\delta}\left[1+(\alpha_{\max}-1)\widehat{\Phi}\right].\tag{115}$$

Substituting into the definition of $\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)$ gives

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)=\frac{\widehat{P}_{\pi}(\widehat{V}_{\delta})-\widehat{\delta}}{\widehat{\delta}}\leq(\alpha_{\max}-1)\widehat{\Phi}.\tag{116}$$

If $\alpha_{\max}>1$ and $\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\geq I_{0}>0$, then Eq. ([18](https://arxiv.org/html/2606.20231#S2.E18)) implies

$$I_{0}\leq(\alpha_{\max}-1)\widehat{\Phi},\tag{117}$$

and rearrangement gives Eq. ([19](https://arxiv.org/html/2606.20231#S2.E19)). $\square$

### A.2 Proof of Theorem [2](https://arxiv.org/html/2606.20231#Thmtheorem2)

*Proof.* Use the same abbreviations as in the previous proof. The lower likelihood-ratio assumptions give

$$\widehat{P}_{\pi}(\widehat{V}_{\delta})\geq\alpha_{\min}\widehat{P}_{0}(\widehat{V}_{\delta}\cap\widehat{A})+\beta_{\min}\widehat{P}_{0}(\widehat{V}_{\delta}\setminus\widehat{A}).\tag{118}$$

Using $\widehat{P}_{0}(\widehat{V}_{\delta}\cap\widehat{A})=\widehat{\delta}\,\widehat{\Phi}$ and $\widehat{P}_{0}(\widehat{V}_{\delta}\setminus\widehat{A})=\widehat{\delta}(1-\widehat{\Phi})$, we obtain

$$\widehat{P}_{\pi}(\widehat{V}_{\delta})\geq\widehat{\delta}\left[\alpha_{\min}\widehat{\Phi}+\beta_{\min}(1-\widehat{\Phi})\right].\tag{119}$$

Therefore

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)=\frac{\widehat{P}_{\pi}(\widehat{V}_{\delta})-\widehat{\delta}}{\widehat{\delta}}\geq\alpha_{\min}\widehat{\Phi}+\beta_{\min}(1-\widehat{\Phi})-1.\tag{120}$$

If $\widehat{\Phi}\geq 1-\varepsilon$, then the right-hand side is minimized over $\widehat{\Phi}\in[1-\varepsilon,1]$ at $\widehat{\Phi}=1-\varepsilon$ whenever $\alpha_{\min}\geq\beta_{\min}$.
This yields

$$\widehat{\mathcal{I}}_{\delta}^{(k+1\to k)}(\pi)\geq\alpha_{\min}(1-\varepsilon)+\beta_{\min}\varepsilon-1\tag{121}$$

$$=(\alpha_{\min}-1)-(\alpha_{\min}-\beta_{\min})\varepsilon.\tag{122}$$

$\square$

### A.3 Proof of Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3)

###### Proof.

Let

$$a=P(A_{s}^{+}),\quad b=P(A_{s}^{-}),\quad a_{0}=Q(A_{s}^{+}),\quad b_{0}=Q(A_{s}^{-}).$$

Then

$$|\Delta_{s}(P,Q)|=\left|\log\frac{a}{b}-\log\frac{a_{0}}{b_{0}}\right|\tag{123}$$

$$\leq|\log a-\log a_{0}|+|\log b-\log b_{0}|.\tag{124}$$

Since all four probabilities are at least $m_{s}$, the logarithm is $1/m_{s}$-Lipschitz on this interval, so

$$|\Delta_{s}(P,Q)|\leq\frac{1}{m_{s}}\left(|a-a_{0}|+|b-b_{0}|\right).\tag{125}$$

The two event probabilities are components of the coarse-grained distributions induced by the partition containing $A_{s}^{+}$, $A_{s}^{-}$, and the complement. Hence

$$|a-a_{0}|+|b-b_{0}|\leq 2\,\mathrm{TV}(P,Q).\tag{126}$$

By Pinsker's inequality,

$$\mathrm{TV}(P,Q)\leq\sqrt{\frac{1}{2}D_{\mathrm{KL}}(P\,\|\,Q)}.\tag{127}$$

Therefore,

$$|\Delta_{s}(P,Q)|\leq\frac{\sqrt{2}}{m_{s}}\sqrt{D_{\mathrm{KL}}(P\,\|\,Q)}.\tag{128}$$

Taking $P=P_{B}$ and $Q=P_{0}$ gives ([41](https://arxiv.org/html/2606.20231#S3.E41)). ∎

### A.4 Proof of Proposition [1](https://arxiv.org/html/2606.20231#Thmproposition1)

###### Proof.
By definition,

$$\begin{aligned}\Delta_{s}(P_{B}^{(k)},P_{0})-\Delta_{s}(P_{B}^{\star},P_{0})&=\log\frac{P_{B}^{(k)}(A_{s}^{+})}{P_{B}^{(k)}(A_{s}^{-})}-\log\frac{P_{B}^{\star}(A_{s}^{+})}{P_{B}^{\star}(A_{s}^{-})}\\&=\Delta_{s}(P_{B}^{(k)},P_{B}^{\star}).\end{aligned}$$

Applying Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3) with $P=P_{B}^{(k)}$ and $Q=P_{B}^{\star}$ gives

$$\left|\Delta_{s}(P_{B}^{(k)},P_{0})-\Delta_{s}(P_{B}^{\star},P_{0})\right|\leq\frac{\sqrt{2}}{m_{s}}\sqrt{D_{\mathrm{KL}}(P_{B}^{(k)}\,\|\,P_{B}^{\star})}.$$

Using ([42](https://arxiv.org/html/2606.20231#S3.E42)) yields ([43](https://arxiv.org/html/2606.20231#S3.E43)). ∎

### A.5 Proof of Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1)

###### Proof.

Apply the data-processing inequality for KL divergence to the binary coarse-graining $\{V_{\delta},V_{\delta}^{c}\}$. The induced Bernoulli laws have success probabilities $p=P(V_{\delta})$ and $\delta=P_{0}(V_{\delta})$. Therefore

$$D_{\mathrm{KL}}(P\,\|\,P_{0})\geq d(p\,\|\,\delta).$$

Substituting $p=\delta(1+\mathcal{I}_{\delta})$ gives ([32](https://arxiv.org/html/2606.20231#S3.E32)). ∎

### A.6 Proof of Theorem [4](https://arxiv.org/html/2606.20231#Thmtheorem4)

###### Proof.

Because $V_{\delta}\subseteq V^{\prime}_{\delta}$ and $dP/dP_{0}=\alpha$ on $V^{\prime}_{\delta}$, the true rare-valid mass is $P(V_{\delta})=\alpha\delta$.
Hence $\mathcal{I}_{\delta}=(\alpha\delta-\delta)/\delta=\alpha-1$. The false-positive region $E_{\delta}=V^{\prime}_{\delta}\setminus V_{\delta}$ has baseline mass $p_{\mathrm{err}}$, so its amplified controlled mass is $P(E_{\delta})=\alpha p_{\mathrm{err}}$. By Assumption [3](https://arxiv.org/html/2606.20231#Thmassumption3), resolving this amplified erroneous mass costs $k_{B}c(p_{\mathrm{err}})$ per unit amplified false-positive mass. Therefore the expected overhead is $\overline{\Delta S}_{\mathrm{err}}=\alpha k_{B}p_{\mathrm{err}}c(p_{\mathrm{err}})$. The expected ideal bookkeeping contribution from the amplified true rare-valid mass is $-\alpha\delta k_{B}\log\alpha$, giving ([51](https://arxiv.org/html/2606.20231#S3.E51)). Dividing ([51](https://arxiv.org/html/2606.20231#S3.E51)) by $P(V_{\delta})=\alpha\delta$ gives the local per-amplified-true-rare-valid expression ([52](https://arxiv.org/html/2606.20231#S3.E52)). The Landauer forms follow by setting $c(p)=\log(1/p)$, and the substitution $\alpha=\mathcal{I}_{\delta}+1$ follows from the conservative-identification identity above. ∎

### A.7 Proof of Corollary [1](https://arxiv.org/html/2606.20231#Thmcorollary1)

###### Proof.

For $0<p<e^{-1}$, the function $g(p)=p\log\frac{1}{p}$ is increasing. Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1) gives $p_{\mathrm{err}}^{(k)}\leq\rho_{\delta}(\varepsilon_{k})$.
Therefore,

$$p_{\mathrm{err}}^{(k)}\log\frac{1}{p_{\mathrm{err}}^{(k)}}\leq\rho_{\delta}(\varepsilon_{k})\log\frac{1}{\rho_{\delta}(\varepsilon_{k})}.$$

Multiplying by $(\mathcal{I}_{\delta}+1)k_{B}$ gives ([59](https://arxiv.org/html/2606.20231#S3.E59)). Multiplying by $k_{B}/\delta$ gives ([60](https://arxiv.org/html/2606.20231#S3.E60)). Since $\rho_{\delta}(\varepsilon)\to 0$ and $p\log(1/p)\to 0$ as $p\to 0^{+}$, both excess bookkeeping penalties vanish as $\varepsilon_{k}\to 0$. Hence $S_{k}^{\mathrm{loc}}$ approaches the ideal local bookkeeping value from above. ∎

### A.8 Additional protocol and rare-set qualifications

The main text uses the conservative-identification case $V_{\delta}\subseteq V^{\prime}_{\delta}$ to isolate false-positive overhead and preserve the identity $\alpha=\mathcal{I}_{\delta}+1$. If false negatives are allowed, define

$$t=P_{0}(V_{\delta}\cap V^{\prime}_{\delta}),\qquad f=P_{0}(V^{\prime}_{\delta}\setminus V_{\delta}),\qquad m=\delta-t.\tag{129}$$

If $dP/dP_{0}=\alpha$ on $V^{\prime}_{\delta}$ and $dP/dP_{0}=\beta$ outside $V^{\prime}_{\delta}$, with

$$\beta=\frac{1-\alpha(t+f)}{1-t-f},\qquad P(V_{\delta})=\alpha t+\beta(\delta-t),\tag{130}$$

then

$$\mathcal{I}_{\delta}=\frac{\alpha t+\beta(\delta-t)-\delta}{\delta}.\tag{131}$$

Thus $\alpha=\mathcal{I}_{\delta}+1$ is specific to the no-false-negative case.

The baseline-surprisal Landauer form in Eq. ([53](https://arxiv.org/html/2606.20231#S3.E53)) charges false-positive assignments according to their rarity under $P_{0}$. A controlled-distribution erasure protocol instead uses $P(E_{\delta})=\alpha p_{\mathrm{err}}$, giving, when $\alpha p_{\mathrm{err}}<1$,

$$\alpha k_{B}p_{\mathrm{err}}\log\frac{1}{\alpha p_{\mathrm{err}}}=\alpha k_{B}p_{\mathrm{err}}\left(\log\frac{1}{p_{\mathrm{err}}}-\log\alpha\right).$$

The protocol-dependent form in the main text keeps this choice explicit. Similarly, the path-deviation bound in Theorem [3](https://arxiv.org/html/2606.20231#Thmtheorem3) is a moderate-event stability bound because of the factor $1/m_{s}$; rare-valid amplification is handled separately through Lemma [1](https://arxiv.org/html/2606.20231#Thmlemma1).

### A.9 A sufficient condition for rare-set stability

Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1) is a statistical regularity condition, not a thermodynamic law. Here we record a concrete finite-resolution setting in which it holds. Fix an observational partition $\Pi_{\eta}$ and a rare-tail region $R_{\delta,\eta}$, chosen under the passive law $P_{0}$.
Let $s:\Pi_{\eta}\to\mathbb{R}$ be a cell-level validity score and let $\tau$ be a validity threshold, so that the true rare-valid set at this resolution is

$$V_{\delta,\eta}=\bigcup_{C\in\Pi_{\eta}:\,C\subseteq R_{\delta,\eta},\,s(C)\geq\tau}C. \tag{132}$$

Let $s_{k}:\Pi_{\eta}\to\mathbb{R}$ be the validity score induced by the $k$-level recursive model, and define the inferred set

$$V_{\delta,\eta}^{(k)}=\bigcup_{C\in\Pi_{\eta}:\,C\subseteq R_{\delta,\eta},\,s_{k}(C)\geq\tau}C. \tag{133}$$

Assume that the score error is uniformly controlled by the intervention-relevant model error:

$$\sup_{C\subseteq R_{\delta,\eta}}|s_{k}(C)-s(C)|\leq c\,\varepsilon_{k} \tag{134}$$

for some constant $c>0$. Also assume a boundary-margin condition: there is a modulus $m_{\delta}(t)$, with $m_{\delta}(t)\to 0$ as $t\to 0$, such that

$$P_{0}\!\left(\bigcup_{C\subseteq R_{\delta,\eta}:\,0\leq\tau-s(C)\leq t}C\right)\leq m_{\delta}(t). \tag{135}$$

Then

$$P_{0}\!\left(V_{\delta,\eta}^{(k)}\setminus V_{\delta,\eta}\right)\leq m_{\delta}(c\,\varepsilon_{k}). \tag{136}$$

Thus Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1) holds with $\rho_{\delta}(\varepsilon)=m_{\delta}(c\varepsilon)$. Indeed, if a cell $C\subseteq R_{\delta,\eta}$ is a false positive, then $s_{k}(C)\geq\tau$ but $s(C)<\tau$. By Eq.
([134](https://arxiv.org/html/2606.20231#A1.E134)),

$$\tau-s(C)\leq s_{k}(C)-s(C)\leq c\,\varepsilon_{k}.$$

Hence every false-positive cell lies in the boundary band

$$\{C\subseteq R_{\delta,\eta}:0\leq\tau-s(C)\leq c\,\varepsilon_{k}\},$$

and Eq. ([136](https://arxiv.org/html/2606.20231#A1.E136)) follows from the margin condition.

A particularly simple case occurs when the validity boundary has a positive finite-resolution margin: if there exists $\gamma>0$ such that no cell in $R_{\delta,\eta}$ satisfies $0<|\tau-s(C)|\leq\gamma$, then $m_{\delta}(t)=0$ for $t<\gamma$. Consequently, whenever $c\,\varepsilon_{k}<\gamma$, the false-positive mass is zero:

$$P_{0}\!\left(V_{\delta,\eta}^{(k)}\setminus V_{\delta,\eta}\right)=0.$$

More generally, if the boundary band satisfies a polynomial margin bound $m_{\delta}(t)\leq C_{\delta}t^{a}$ for constants $C_{\delta}>0$ and $a>0$, then

$$p_{\mathrm{err}}^{(k)}\leq C_{\delta}c^{a}\varepsilon_{k}^{a}.$$

This gives an explicit modulus for Hypothesis [1](https://arxiv.org/html/2606.20231#Thmhypothesis1).

## Appendix B Entropy-rate Estimation for text

This appendix gives the self-contained entropy-rate protocol used in Section [IV.2.5](https://arxiv.org/html/2606.20231#S4.SS2.SSS5). The goal is not to reproduce a full detector or model-comparison study, but only to obtain two character-level entropy-rate estimates on a common symbolic alphabet: a human prose reference and a GPT-5 long-form prose estimate.

All texts are converted to a fixed $27$-symbol alphabet,

$$\Sigma_{27}=\{a,b,\ldots,z,\text{space}\}.$$

Text is lowercased; punctuation, digits, and non-ASCII characters are removed.
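The preprocessing step just described can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `to_sigma27` is hypothetical, and the treatment of whitespace (tabs and newlines mapped to a space, runs of spaces collapsed, ends trimmed) is an assumption, since the text only states that the alphabet contains a single space symbol.

```python
import string

SIGMA_27 = string.ascii_lowercase + " "  # 26 letters plus space


def to_sigma27(text: str) -> str:
    """Map raw text onto the 27-symbol alphabet {a, ..., z, space}."""
    out = []
    for ch in text:
        if not ch.isascii():
            continue  # non-ASCII characters are removed before lowercasing
        ch = ch.lower()
        if ch in string.ascii_lowercase:
            out.append(ch)
        elif ch.isspace():
            out.append(" ")  # assumption: tabs and newlines count as spaces
        # punctuation and digits fall through and are dropped
    # Assumption: collapse runs of spaces and trim both ends.
    return " ".join("".join(out).split())


if __name__ == "__main__":
    print(to_sigma27("Hello, World! 123 é\n"))
```

Filtering non-ASCII characters before lowercasing avoids cases where a non-ASCII code point lowercases into an ASCII letter and would otherwise slip into the stream.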
The resulting character stream is treated as a finite sample path from an approximately stationary symbolic source. Entropy is reported in bits per character.

For a preprocessed sequence $s_{1:N}$, we use a nonparametric probabilistic-finite-state-automaton entropy-rate estimator [[31](https://arxiv.org/html/2606.20231#bib.bib31), [32](https://arxiv.org/html/2606.20231#bib.bib32)]. For a substring-frequency threshold $m$, substrings with fewer than $m$ occurrences are excluded. Empirical next-symbol distributions are estimated from retained histories, and histories inducing similar next-symbol laws are represented as states of an inferred finite-state source. If $Q_{m}$ is the resulting state set, $\widehat{\pi}_{m}(q)$ is the empirical state frequency, and $\widehat{p}_{m}(a\mid q)$ is the empirical next-symbol law, the threshold-specific estimate is

$$\widehat{H}^{(m)}=-\sum_{q\in Q_{m}}\widehat{\pi}_{m}(q)\sum_{a\in\Sigma_{27}}\widehat{p}_{m}(a\mid q)\log_{2}\widehat{p}_{m}(a\mid q). \tag{137}$$

To reduce dependence on a single pruning threshold, the reported document-level estimate is the median across a fixed threshold grid,

$$\widehat{H}=\mathrm{median}_{m\in\{m_{1},\ldots,m_{M}\}}\widehat{H}^{(m)}. \tag{138}$$

The estimator requires no language model, labels, or training corpus. The thresholding step removes poorly supported histories; the median aggregation stabilizes the estimate across admissible frequency cutoffs.

The human reference corpus consists of English Project Gutenberg long-form prose. Legal headers and boilerplate are removed, and texts shorter than $150{,}000$ post-processed characters are excluded. This yields $4341$ Gutenberg documents.
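To make the arithmetic of Eqs. (137) and (138) concrete, the sketch below evaluates the same weighted conditional-entropy sum on a deliberately simplified stand-in for the state set. It is not the probabilistic-finite-state-automaton estimator of [31, 32]: here each fixed-length history is its own state and no merging of histories with similar next-symbol laws is performed. The function names, the `order` parameter, and the default threshold grid are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2
from statistics import median


def conditional_entropy(seq: str, order: int, min_count: int) -> float:
    """Eq. (137) with states taken to be length-`order` histories.

    Simplification (assumption): a history is retained when it is followed
    by a symbol at least `min_count` times; histories are never merged.
    """
    next_counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        next_counts[seq[i - order:i]][seq[i]] += 1

    retained = [c for c in next_counts.values() if sum(c.values()) >= min_count]
    total = sum(sum(c.values()) for c in retained)
    if total == 0:
        raise ValueError("no history meets the frequency threshold")

    h_rate = 0.0
    for counts in retained:
        n_q = sum(counts.values())
        pi_q = n_q / total  # empirical state frequency
        h_q = -sum((n / n_q) * log2(n / n_q) for n in counts.values())
        h_rate += pi_q * h_q  # bits per character
    return h_rate


def median_entropy(seq: str, order: int = 2, thresholds=(1, 2, 5)) -> float:
    """Eq. (138): median of threshold-specific estimates over a grid."""
    return median(conditional_entropy(seq, order, m) for m in thresholds)


if __name__ == "__main__":
    sample = "aab" * 20
    print(conditional_entropy(sample, order=1, min_count=1))
    print(median_entropy(sample))
```

On real text, a fixed-order context model of this kind generally gives different values from a state-merging estimator, so the numbers it produces should not be compared with the figures quoted in this appendix.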
The GPT-5 corpus consists of $197$ long-form prose samples generated with a fixed narrative-prompt protocol using GPT-5 API access. Each sample is generated toward a target length of approximately $150{,}000$ characters using repeated continuation calls under default sampling settings, with only maximum completion length controlled. The cohort-level summary statistics used in the symbolic calculation are: [table not recovered].

The main text uses the mean values $H_{\mathrm{GPT5}}=0.74$ and $H_{H}=0.77$ bits per character. The median values give the same qualitative ordering; using the Gutenberg median $0.78$ instead would make the GPT-5 support correction smaller by an additional factor of $2$ at $n_{\star}=100$.

## Appendix C Numerical Scale Calculations

This appendix supports the numerical values in Table [2](https://arxiv.org/html/2606.20231#S4.T2). The table does not report the raw lift $\mathcal{I}$, because the values span from $0$ to powers such as $10^{10^{15}}$. Instead it reports the compressed double-log scale

$$L\triangleq\log_{10}(\mathcal{I}+1), \tag{139}$$

$$\Lambda\triangleq\log_{10}(L+1). \tag{140}$$

The $+1$ terms keep the scale finite at the passive baseline. For all nonzero large examples, $\Lambda$ behaves as an ordinary $\log\log$-scale. When a row is a range, the endpoints of the displayed $\Lambda$-range are obtained by applying Eq. ([140](https://arxiv.org/html/2606.20231#A3.E140)) to the endpoint values of $L$. The ordering in Table [2](https://arxiv.org/html/2606.20231#S4.T2) uses the midpoint of the underlying $L$-range.

### C.1 Passive baseline

For a passive system, $P=P_{0}$, so $\mathcal{I}=0$.
Therefore

$$L=\log_{10}(1)=0,\text{ and }\Lambda=\log_{10}(1)=0. \tag{141}$$

### C.2 Fixed-feedback amplification

Suppose a controller amplifies a rare-valid set by a constant likelihood factor $\alpha$. Then

$$P(V_{\delta})=\alpha\delta, \tag{142}$$

$$\mathcal{I}_{\delta}=\frac{\alpha\delta-\delta}{\delta}=\alpha-1, \tag{143}$$

$$L=\log_{10}(\mathcal{I}_{\delta}+1)=\log_{10}\alpha. \tag{144}$$

For the fixed-feedback row we use $\alpha=2$ to $10^{2}$. Thus

$$\alpha=2:\quad L=0.301,\quad\Lambda=0.114, \tag{145}$$

$$\alpha=10^{2}:\quad L=2.000,\quad\Lambda=0.477. \tag{146}$$

This gives the table entry $\Lambda=0.114$–$0.477$.

### C.3 Repeated dynamic control

Repeated control compounds set-level lift. If $m$ stages have approximate lift factors $\alpha_{1},\ldots,\alpha_{m}$, then

$$\mathcal{I}+1\approx\prod_{j=1}^{m}\alpha_{j}, \tag{147}$$

$$L\approx\sum_{j=1}^{m}\log_{10}\alpha_{j}. \tag{148}$$

The repeated-control row uses seven to ten binary improvements:

$$\mathcal{I}+1\approx 2^{7}\text{--}2^{10}. \tag{149}$$

Hence

$$m=7:\quad L=7\log_{10}2=2.107,\quad\Lambda=0.493, \tag{150}$$

$$m=10:\quad L=10\log_{10}2=3.010,\quad\Lambda=0.603. \tag{151}$$

This gives $\Lambda=0.493$–$0.603$.

### C.4 Sentence-scale symbolic lift

For symbolic sequences, Eq.
([92](https://arxiv.org/html/2606.20231#S4.E92)) gives

$$L=n\,\Delta\ell\,\log_{10}2, \tag{152}$$

where $n$ is sequence length and $\Delta\ell$ is the per-symbol structured set-level lift in bits. The sentence-scale calculation in Section [IV.2.5](https://arxiv.org/html/2606.20231#S4.SS2.SSS5) instead begins from cardinalities. Louwerse’s finite-resolution combinatorial estimate [[26](https://arxiv.org/html/2606.20231#bib.bib26)] gives

$$N_{V}\approx 5\times 10^{21} \tag{153}$$

valid or interpretable English sentence-scale strings. The exemplary human-quality target set is estimated as

$$N_{G}\approx B_{\star}S_{\star}\sim(2\times 10^{3})(5\times 10^{3})=10^{7}. \tag{154}$$

Thus

$$\delta_{\star}=\frac{N_{G}}{N_{V}}=2\times 10^{-15}, \tag{155}$$

$$\mathcal{I}_{H}+1\approx\frac{N_{V}}{N_{G}}=5\times 10^{14}. \tag{156}$$

Therefore

$$L_{H}=\log_{10}(5\times 10^{14})=14.699, \tag{157}$$

$$\Lambda_{H}=\log_{10}(14.699+1)=1.196. \tag{158}$$

Using $N_{G}=10^{6}$–$10^{8}$ gives $\mathcal{I}_{H}+1=5\times 10^{15}$–$5\times 10^{13}$ and $\Lambda_{H}=1.223$–$1.167$.

For the entropy-rate-corrected GPT-5 calculation, we use an AEP-style support-size approximation with an explicit finite-length slack term,

$$\log_{2}\frac{q_{\mathrm{GPT5}}}{q_{H}}=n_{\star}(H_{\mathrm{GPT5}}-H_{H})+\rho_{n_{\star}}. \tag{159}$$

The term $\rho_{n_{\star}}$ is not estimated here. It represents finite-length typical-set error, rare-valid boundary effects, and support-overlap error.
This qualification matters because $n_{\star}=100$ characters is sentence-scale, whereas AEP convergence is asymptotic. The point estimate in Table [2](https://arxiv.org/html/2606.20231#S4.T2) sets $\rho_{100}=0$:

$$2^{100(H_{\mathrm{GPT5}}-H_{H})}=2^{100(0.74-0.77)}=2^{-3}=0.125, \tag{160}$$

$$\mathcal{I}_{\mathrm{GPT5}}+1\approx(5\times 10^{14})\,2^{-3}=6.25\times 10^{13}, \tag{161}$$

$$\Lambda_{\mathrm{GPT5}}=\log_{10}\{\log_{10}(6.25\times 10^{13})+1\}=1.170. \tag{162}$$

Equivalently, the finite-length corrected expression is

$$\mathcal{I}_{\mathrm{GPT5}}+1\approx(5\times 10^{14})\,2^{100(0.74-0.77)+\rho_{100}}. \tag{163}$$

The compressed score $\Lambda$ is relatively insensitive to moderate multiplicative changes, but the raw $\mathcal{I}$ should be interpreted only at order-of-magnitude precision. The entropy estimates are in bits per character under the $27$-symbol alphabet, so the sentence-scale length used here is $n_{\star}=100$ characters, not $20$ words.

### C.5 Fluctuation-theorem demon

For an ideal demon that realizes an entropy-reducing trajectory of magnitude $\Delta S$, the fluctuation-theorem scaling gives

$$L=\log_{10}(\mathcal{I}+1)=\frac{\Delta S/k_{B}}{\log 10}. \tag{164}$$

With $\Delta S=10^{-19}\,\mathrm{J/K}$ and $k_{B}=1.380649\times 10^{-23}\,\mathrm{J/K}$,

$$\frac{\Delta S}{k_{B}}=7242.97,\quad L=3145.58,\quad\Lambda=\log_{10}(3146.58)=3.498. \tag{165}$$

The table rounds this to $\Lambda=3.498$.
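The scale calculations in Sections C.2 through C.5 reduce to a handful of logarithms, which the following sketch reproduces. The function names are hypothetical and chosen only for this illustration; the numerical inputs are the ones quoted above, and `rho` defaults to zero to match the point estimate that sets $\rho_{100}=0$.

```python
from math import log, log10

K_B = 1.380649e-23  # Boltzmann constant in J/K, as quoted in Section C.5


def compressed_scale(L: float) -> float:
    """Eq. (140): Lambda = log10(L + 1)."""
    return log10(L + 1.0)


def fixed_feedback_L(alpha: float) -> float:
    """Eq. (144): L = log10(alpha) for a constant likelihood factor."""
    return log10(alpha)


def repeated_control_L(alphas) -> float:
    """Eq. (148): stage lifts add on the log10 scale."""
    return sum(log10(a) for a in alphas)


def sentence_scale_L(n_valid: float, n_target: float) -> float:
    """Eqs. (156)-(157): L_H = log10(N_V / N_G)."""
    return log10(n_valid / n_target)


def entropy_corrected_L(L_human, n_chars, h_model, h_human, rho=0.0):
    """Eq. (163) on the log10 scale; rho is the unestimated slack in bits."""
    return L_human + (n_chars * (h_model - h_human) + rho) * log10(2.0)


def demon_L(delta_s: float) -> float:
    """Eq. (164): L = (Delta S / k_B) / ln 10."""
    return delta_s / K_B / log(10.0)


if __name__ == "__main__":
    L_h = sentence_scale_L(5e21, 1e7)
    rows = {
        "fixed feedback, alpha=2": fixed_feedback_L(2),
        "fixed feedback, alpha=100": fixed_feedback_L(100),
        "repeated control, m=10": repeated_control_L([2] * 10),
        "human sentence scale": L_h,
        "GPT-5, rho=0": entropy_corrected_L(L_h, 100, 0.74, 0.77),
        "fluctuation-theorem demon": demon_L(1e-19),
    }
    for name, L in rows.items():
        print(f"{name}: L={L:.3f}, Lambda={compressed_scale(L):.3f}")
```

Working on the $\log_{10}$ scale throughout avoids forming the raw lift $\mathcal{I}$, which would overflow floating point for rows such as the macroscopic demon.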
### C.6 Velocity-selection demon

For the Maxwell–Boltzmann velocity-selection demon, Eq. ([70](https://arxiv.org/html/2606.20231#S4.E70)) gives

$$L\approx\frac{N}{2\log 10}\,q(\epsilon)\phi(\epsilon), \tag{166}$$

where $N$ is the number of particles, $q(\epsilon)$ is the fraction above the velocity threshold, and $\phi(\epsilon)$ is the excess dimensionless kinetic energy above the thermal mean. The numerical inputs $q(\epsilon)$, $\phi(\epsilon)$, and $L$ are listed in Table [1](https://arxiv.org/html/2606.20231#S4.T1). For $N=100$, the endpoint values over $\epsilon=2$–$5$ are

$$L=1.85\text{--}9.38,\text{ and }\Lambda=\log_{10}(L+1)=0.455\text{--}1.016. \tag{167}$$

For $1\,\mathrm{mm}^{3}$ of air at one atmosphere and $300\,\mathrm{K}$, $N\approx 2.44\times 10^{16}$. Table [1](https://arxiv.org/html/2606.20231#S4.T1) gives

$$L=4.50\times 10^{14}\text{--}2.29\times 10^{15}, \tag{168}$$

$$\Lambda=14.653\text{--}15.360. \tag{169}$$

The $1\,\mathrm{mm}^{3}$ demon therefore exceeds the single-event $\Delta S=10^{-19}\,\mathrm{J/K}$ demon on this scale because it aggregates a very large number of microscopic velocity selections.

## References

- [1] A. M. Turing, “Computing machinery and intelligence,” *Mind* 59, 433–460 (1950).
- [2] S. Legg and M. Hutter, “Universal intelligence: A definition of machine intelligence,” *Minds and Machines* 17, 391–444 (2007).
- [3] S. Russell and P. Norvig, *Artificial Intelligence: A Modern Approach*, 4th ed. (Pearson, 2021).
- [4] F. Chollet, “On the measure of intelligence,” arXiv:1911.01547 (2019).
- [5] M. Minsky, “Matter, mind and models,” in *Proceedings of IFIP Congress 65* (1965).
- [6] M.
Minsky, *The Society of Mind* (Simon and Schuster, New York, 1986).
- [7] M. Minsky, *The Emotion Machine* (Simon and Schuster, New York, 2006).
- [8] K. Gödel, “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I,” *Monatshefte für Mathematik und Physik* 38, 173–198 (1931).
- [9] D. R. Hofstadter, *Gödel, Escher, Bach: An Eternal Golden Braid* (Basic Books, New York, 1979).
- [10] D. R. Hofstadter, *I Am a Strange Loop* (Basic Books, New York, 2007).
- [11] R. Landauer, “Irreversibility and heat generation in the computing process,” *IBM Journal of Research and Development* 5, 183–191 (1961).
- [12] C. H. Bennett, “The thermodynamics of computation—a review,” *International Journal of Theoretical Physics* 21, 905–940 (1982).
- [13] T. Sagawa and M. Ueda, “Generalized Jarzynski equality under nonequilibrium feedback control,” *Physical Review Letters* 104, 090602 (2010).
- [14] J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, “Thermodynamics of information,” *Nature Physics* 11, 131–139 (2015).
- [15] D. Mandal and C. Jarzynski, “Work and information processing in a solvable model of Maxwell’s demon,” *Proceedings of the National Academy of Sciences USA* 109, 11641–11645 (2012).
- [16] D. J. Evans, E. G. D. Cohen, and G. P. Morriss, “Probability of second law violations in shearing steady states,” *Physical Review Letters* 71, 2401–2404 (1993).
- [17] C. Jarzynski, “Nonequilibrium equality for free energy differences,” *Physical Review Letters* 78, 2690–2693 (1997).
- [18] G. E. Crooks, “Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences,” *Physical Review E* 60, 2721–2726 (1999).
- [19] U. Seifert, “Entropy production along a stochastic trajectory and an integral fluctuation theorem,” *Physical Review Letters* 95, 040602 (2005).
- [20] C. E.
Shannon, “A mathematical theory of communication,” *Bell System Technical Journal* 27, 379–423, 623–656 (1948).
- [21] T. M. Cover and J. A. Thomas, *Elements of Information Theory*, 2nd ed. (Wiley, Hoboken, 2006).
- [22] D. Premack and G. Woodruff, “Does the chimpanzee have a theory of mind?” *Behavioral and Brain Sciences* 1, 515–526 (1978).
- [23] D. C. Dennett, *The Intentional Stance* (MIT Press, Cambridge, MA, 1987).
- [24] K. Friston, “The free-energy principle: A unified brain theory?” *Nature Reviews Neuroscience* 11, 127–138 (2010).
- [25] A. Kolchinsky and D. H. Wolpert, “Semantic information, autonomous agency and non-equilibrium statistical physics,” *Interface Focus* 8, 20180041 (2018).
- [26] M. M. Louwerse, finite-resolution estimates of English sentence-scale combinatorics (2021).
- [27] M. Gerlach and F. Font-Clos, “A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics,” *Entropy* 22, 126 (2020).
- [28] M. Li and P. Vitányi, *An Introduction to Kolmogorov Complexity and Its Applications*, 3rd ed. (Springer, New York, 2008).
- [29] P. Gács, J. T. Tromp, and P. M. B. Vitányi, “Algorithmic statistics,” *IEEE Transactions on Information Theory* 47, 2443–2463 (2001).
- [30] L. A. Levin, “Laws of information conservation (non-growth) and aspects of the foundation of probability theory,” *Problems of Information Transmission* 10, 206–210 (1974).
- [31] I. Chattopadhyay and collaborators, entropy-rate estimation with probabilistic finite-state automata; software and technical documentation.
- [32] I. Chattopadhyay and collaborators, finite-state source reconstruction and entropy-rate estimation protocols; software and technical documentation.