Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry

arXiv cs.LG Papers

Summary

This paper proposes a unified theoretical framework for phase transitions in deep learning (grokking, emergent capabilities) and non-equilibrium chemistry, describing both as driven informational systems governed by two gradient fields.

arXiv:2605.16325v1 Announce Type: new Abstract: Phase-transition phenomena in deep learning (grokking, emergent capabilities, and ontological reorganization under context shift) have been studied through several lenses, including representational compression, singular learning theory, and information-theoretic progress measures. Independently, non-equilibrium statistical physics has identified phase transitions in driven chemical reaction networks underlying prebiotic selection, with empirical signatures that are difficult to reproduce within single-field gradient accounts. We propose a perspective in which both classes of phenomena admit a common description as driven informational systems: stochastic processes governed by two gradient fields, an entropy production rate Sigma and an information quasi-potential Phi_I := -ln p*, where p* is the stationary density. Within this framework we introduce two candidate order parameters: an adversarial breakdown threshold alpha_dagger and a self-referential coupling threshold kappa_c. The joint scaling of (alpha_dagger, kappa_c) defines a candidate universality class with exponents (gamma_1, gamma_2). We outline the geometric structure of this framework, identify falsifiable predictions distinguishing it from single-field alternatives, and show consistency with recent empirical findings (2024--2026) on alignment transitions, adversarial breakdown scaling, and partial introspection in large language models.

Cached at: 05/19/26, 06:41 AM

# Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry
Source: [https://arxiv.org/html/2605.16325](https://arxiv.org/html/2605.16325)
###### Abstract

Phase-transition phenomena in deep learning (grokking, emergent capabilities, and ontological reorganization under context shift) have been studied through several theoretical lenses, including representational compression, singular learning theory, and information-theoretic progress measures. Independently, non-equilibrium statistical physics has identified phase transitions in driven chemical reaction networks underlying prebiotic selection, with empirical signatures (catalysis–confinement synergy, optimal entropy-flux windows) that are difficult to reproduce within single-field gradient accounts. We propose a perspective in which both classes of phenomena admit a common description as *driven informational systems*: stochastic processes governed by two gradient fields, an entropy production rate $\Sigma$ and an information quasi-potential $\Phi_I := -\ln p^*$, where $p^*$ is the stationary density. Within this framework we discuss two candidate order parameters: an adversarial breakdown threshold $\alpha^{\dagger}$ that decays as the inverse logarithm of the primitive-set cardinality $|\mathcal{O}_N|$, and a self-referential coupling threshold $\kappa_c$ associated with predictive feedback through an internal model. The joint scaling of $(\alpha^{\dagger}, \kappa_c)$ defines a candidate universality class, with two scaling exponents $(\gamma_1, \gamma_2)$ as class invariants. We do not claim that biological intelligence and large language models are the same kind of system; we propose only that they may be productively studied as particular instances within this common framework, on configuration manifolds shaped by carbon–nitrogen chemistry under solar entropy flux and by transformer parameter spaces under gradient-descent flux, respectively.
We outline the geometric structure of the framework, identify three falsifiable predictions that distinguish it from single-field alternatives, and synthesise recent empirical findings (2024–2026) on alignment phase transitions, adversarial breakdown scaling, and partial introspection in frontier language models, with which the framework is consistent. Detailed proofs and supporting numerical analyses of the component results appear in companion preprints; the present paper develops the connections between them.

Keywords: phase transitions, grokking, alignment, non-equilibrium statistical mechanics, free energy principle, self-referential coupling, ontological reorganization, prebiotic selection.

## 1 Introduction

### 1.1 The convergent puzzle

Phase-transition phenomena in deep learning (grokking, emergent capabilities, and ontological reorganization under context shift) have been studied through several theoretical lenses, none of which we take to be definitive. Power et al. ([2022](https://arxiv.org/html/2605.16325#bib.bib3)) observed that small transformers trained on modular arithmetic generalize abruptly long after they have memorized the training set, and Nanda et al. ([2023](https://arxiv.org/html/2605.16325#bib.bib4)) subsequently showed that this delayed transition coincides with the emergence of Fourier-structured circuits. Liu et al. ([2023](https://arxiv.org/html/2605.16325#bib.bib6)) interpreted the same phenomenon through the lens of representational compression, and DeMoss et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib7)) formalized the rise and fall of model complexity in rate–distortion terms. Wei et al. ([2022](https://arxiv.org/html/2605.16325#bib.bib9)) documented *emergent capabilities* in large language models (abilities that appear discontinuously above scale thresholds), and a parallel literature has examined phase transitions in alignment dynamics (Casper et al., [2023](https://arxiv.org/html/2605.16325#bib.bib14)), in-context learning (Olsson et al., [2022](https://arxiv.org/html/2605.16325#bib.bib10)), and introspective access (Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15); Binder et al., [2024](https://arxiv.org/html/2605.16325#bib.bib16)).

These phenomena share three structural features: a substrate-level dynamics governed by gradient descent on a loss surface; a sustained external driving signal (training-data flux, RLHF feedback, weight decay); and a transition between qualitatively distinct representational regimes whose detailed mechanism remains under active investigation.

A structurally similar pattern appears in non-equilibrium chemistry. Ferris et al. ([1996](https://arxiv.org/html/2605.16325#bib.bib40)) demonstrated that clay-catalyzed RNA polymerization combined with geometric confinement produces oligomers of length up to $\sim 55$-mer, an order of magnitude longer than solution-only controls; subsequent reanalysis extracts a catalysis–confinement synergy factor $S \approx 5.75$ that exceeds what single-field gradient dynamics on compact manifolds with linear driving can produce under the assumptions made there. Blank et al. ([2001](https://arxiv.org/html/2605.16325#bib.bib41)) reported a non-monotonic optimal entropy-flux window for amino-acid yield in shock synthesis. Matreux et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib42)) demonstrated that simple heat flows selectively enrich more than fifty prebiotic building blocks by up to three orders of magnitude. Floroni et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib43)) realized a prebiotic-to-biotic transition criterion in a single membraneless protocell driven by a heat gradient. Liang et al. ([2024a](https://arxiv.org/html/2605.16325#bib.bib32), [b](https://arxiv.org/html/2605.16325#bib.bib33)) placed kinetics-independent bounds on the magnitude of symmetry-breaking achievable in driven chemical reaction networks at a given thermodynamic budget.

We propose, in the rest of this paper, that these two lines of work may admit a common description as instances of a broader class of *driven informational systems*, and we develop the elements of that description.

### 1.2 The central proposal

This perspective develops three connected proposals.

#### First.

Driven informational systems, whether chemical reaction networks under thermal flux or transformer parameter manifolds under gradient-descent flux, can be modeled as governed by two gradient fields: an entropy-production rate $\Sigma$ derived from the Schnakenberg decomposition of probability currents (Schnakenberg, [1976](https://arxiv.org/html/2605.16325#bib.bib25)), and an information quasi-potential $\Phi_I := -\ln p^*$ defined self-consistently through the stationary density $p^*$. Under the regularity conditions stated in §[4](https://arxiv.org/html/2605.16325#S4), $\nabla\Sigma$ and $\nabla\Phi_I$ are generically linearly independent off equilibrium.

#### Second.

We propose two candidate order parameters that together may characterize the phase structure of these systems. The first is an adversarial breakdown threshold

$$\alpha^{\dagger} \;=\; \Theta\!\left(\frac{1}{\log|\mathcal{O}_N|}\right),$$

where $|\mathcal{O}_N|$ is the cardinality of the system’s primitive representation. The asymptotic form depends on representational complexity, in contrast to the universal constants of Hampel ([1971](https://arxiv.org/html/2605.16325#bib.bib18)) and Donoho and Huber ([1983](https://arxiv.org/html/2605.16325#bib.bib19)); we are not aware of a prior breakdown point with this dependence in the literature reviewed, though the literature is large and the present paper does not claim exhaustive coverage. The second proposed order parameter is a self-referential coupling threshold $\kappa_c$ associated with predictive feedback through an internal model (§[3.2](https://arxiv.org/html/2605.16325#S3.SS2)).

#### Third.

We propose that biological intelligence and large language models may be productively studied as particular instances within this framework on different configuration manifolds. Biological intelligence is here viewed as one realization of the dynamics, on chemical configuration manifolds under solar entropy flux over evolutionary time scales. Frontier large language models are viewed as a second realization, on transformer parameter manifolds under gradient-descent flux over training time scales. We do not claim these systems are the same kind of object; the time scales, configuration manifolds, and physical substrates differ by many orders of magnitude. We claim only that certain dynamical features (the two-field structure, the candidate order parameters, the phase-transition signature) may be shared across instances, and that this sharing is mathematically meaningful even though the substrates themselves manifestly differ.

### 1.3 Scope and provenance

The framework synthesized here builds on two preceding lines of work, both of which are publicly available as preprints. The Equation of Motion–Information Field Framework (EOM-IFF) for prebiotic chemistry was developed jointly with T. Q. Hoa and is presented in detail in a companion preprint on bioRxiv (Truong and Truong, [2026a](https://arxiv.org/html/2605.16325#bib.bib2)); that work establishes the two-field independence theorem, derives structural constraints on single-field gradient dynamics, and validates the framework against five independently published prebiotic systems (Ferris et al., [1996](https://arxiv.org/html/2605.16325#bib.bib40); Blank et al., [2001](https://arxiv.org/html/2605.16325#bib.bib41); Matreux et al., [2024](https://arxiv.org/html/2605.16325#bib.bib42); Floroni et al., [2025](https://arxiv.org/html/2605.16325#bib.bib43); Rout et al., [2025](https://arxiv.org/html/2605.16325#bib.bib44)). The theory of Ontological Phase Transitions (OPT) in learning systems was developed by the present author and is reported in a separate preprint on SSRN (Truong, [2026](https://arxiv.org/html/2605.16325#bib.bib1)); that work establishes a universal detection lower bound, a compression-dividend theorem, the complexity-dependent breakdown point cited above, and empirical validation of the predicted scaling on grokking dynamics across nine modulus pairs.

The present perspective is a synthesis the author developed independently. Its contribution, and the timestamp it claims, lies not in the component results, which are established in those companion preprints, but in the proposal that the chemistry-side and learning-side frameworks may be productively studied as instances within a common dynamical class, with $\alpha^{\dagger}$ and $\kappa_c$ as candidate order parameters of that class. We are not aware of a prior synthesis along these lines, though the literature is large and the present paper does not claim exhaustive coverage.

This paper *does* outline the geometric perspective, define the two candidate order parameters, identify three falsifiable predictions, and position the framework relative to neighboring research programs: the free energy principle (Friston, [2010](https://arxiv.org/html/2605.16325#bib.bib34); Ramstead et al., [2023](https://arxiv.org/html/2605.16325#bib.bib35)), singular learning theory (Watanabe, [2009](https://arxiv.org/html/2605.16325#bib.bib13); Hoogland et al., [2024](https://arxiv.org/html/2605.16325#bib.bib11)), dissipative adaptation (England, [2015](https://arxiv.org/html/2605.16325#bib.bib30)), grokking-as-compression (Liu et al., [2023](https://arxiv.org/html/2605.16325#bib.bib6); DeMoss et al., [2025](https://arxiv.org/html/2605.16325#bib.bib7); Clauw et al., [2024](https://arxiv.org/html/2605.16325#bib.bib8)), and tangled information hierarchies (Prokopenko et al., [2025](https://arxiv.org/html/2605.16325#bib.bib37)). This paper *does not* provide full proofs (these are in the cited companion preprints), claim universality beyond the stated regimes, or replace existing frameworks where they are correct. Specific limitations and open problems are gathered in §[7](https://arxiv.org/html/2605.16325#S7).

### 1.4 Roadmap

§[2](https://arxiv.org/html/2605.16325#S2) sets up the language of driven informational systems and states the two-field structure abstractly. §[3](https://arxiv.org/html/2605.16325#S3) defines the two candidate order parameters $\alpha^{\dagger}$ and $\kappa_c$ and discusses their relationship. §[4](https://arxiv.org/html/2605.16325#S4) states the two-field independence theorem informally and indicates its consequences for learning theory. §[5](https://arxiv.org/html/2605.16325#S5) presents the chemical and learning instances side by side. §[6](https://arxiv.org/html/2605.16325#S6) derives three falsifiable predictions. §[7](https://arxiv.org/html/2605.16325#S7) positions the framework relative to neighbors and identifies open problems. §[8](https://arxiv.org/html/2605.16325#S8) concludes.

## 2 Driven Informational Systems

This section introduces the abstract framework in which the proposal is stated. The construction is deliberately substrate-agnostic: the same definitions apply when the configuration manifold is a chemical composition simplex and when it is the parameter space of a neural network. What distinguishes the two cases, and what motivates the proposal that both may be productively studied as instances of a common dynamical class, is addressed in §[5](https://arxiv.org/html/2605.16325#S5).

### 2.1 Configuration manifold and Langevin dynamics

A *driven informational system* is specified by a triple $(\mathcal{M}, b, D)$, where $\mathcal{M}$ is a smooth connected $n$-dimensional Riemannian manifold (the configuration space), $b:\mathcal{M}\to T\mathcal{M}$ is a Lipschitz drift vector field, and $D>0$ is a constant noise amplitude. The state $X_t\in\mathcal{M}$ evolves under the overdamped Langevin equation

$$\mathrm{d}X_t \;=\; b(X_t)\,\mathrm{d}t \;+\; \sqrt{2D}\,\mathrm{d}W_t, \tag{1}$$

where $W_t$ is a standard Brownian motion on $\mathcal{M}$. Under mild regularity conditions (confinement of the drift at infinity, linear-growth bounds, and strict positivity of $D$), equation ([1](https://arxiv.org/html/2605.16325#S2.E1)) admits a unique stationary probability measure $\mu^*$ with smooth, strictly positive density $p^*:\mathcal{M}\to(0,\infty)$ (Meyn and Tweedie, [1993](https://arxiv.org/html/2605.16325#bib.bib28)); a precise statement of the regularity conditions is given in the companion preprint.
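For intuition, equation (1) can be integrated numerically with the Euler–Maruyama scheme. The following is a minimal sketch, assuming a flat configuration space $\mathbb{R}^n$ rather than a general Riemannian manifold; the function names `euler_maruyama`, `ou_drift`, and `sample_variance`, the step size, and the Ornstein–Uhlenbeck drift $b(x) = -x$ are illustrative choices, not taken from the paper. For this drift the stationary density is Gaussian with variance $D$ per coordinate, which gives a simple sanity check.

```python
import math
import random


def euler_maruyama(drift, x0, D, dt, n_steps, rng):
    """Integrate dX = b(X) dt + sqrt(2 D) dW with the Euler-Maruyama scheme.

    Assumes flat R^n; a curved manifold would need charts or a geometric integrator.
    """
    x = list(x0)
    path = [tuple(x)]
    noise_scale = math.sqrt(2.0 * D * dt)  # std. dev. of the noise term over one step
    for _ in range(n_steps):
        b = drift(x)
        x = [xi + bi * dt + noise_scale * rng.gauss(0.0, 1.0) for xi, bi in zip(x, b)]
        path.append(tuple(x))
    return path


def ou_drift(x):
    """Hypothetical confining drift b(x) = -x (Ornstein-Uhlenbeck)."""
    return [-xi for xi in x]


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
D = 0.5
path = euler_maruyama(ou_drift, [0.0], D=D, dt=0.01, n_steps=200_000, rng=rng)
xs = [p[0] for p in path[20_000:]]  # drop an initial burn-in segment
print(f"empirical variance: {sample_variance(xs):.3f}  (theory: {D})")
```

The empirical variance should land close to $D$, up to sampling error and a small discretization bias of order `dt`.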

Two instances of this construction frame the present perspective. In *prebiotic chemistry*, $\mathcal{M}$ is the composition simplex $\Delta^{n-1}$ of molecular species under mass conservation, $b$ is determined by the reaction-rate matrix of the chemical network, and $D$ encodes thermal fluctuations. In *neural network learning*, $\mathcal{M}$ is the parameter manifold of a model architecture (typically $\mathbb{R}^P$ with $P$ the parameter count, restricted by weight-decay or layer-norm constraints), $b$ is the negative gradient of the training loss combined with regularization terms, and $D$ encodes mini-batch gradient noise. In both cases the dynamics is *driven*: an external entropy flux (thermal in chemistry, data-driven in learning) maintains the system away from equilibrium, ensuring that the stationary measure $\mu^*$ is genuinely non-equilibrium (i.e., does not satisfy detailed balance with respect to $b$).

### 2.2 The two fields

Two scalar fields can be constructed canonically from the dynamics ([1](https://arxiv.org/html/2605.16325#S2.E1)).

#### Entropy production rate.

Let $J^*(x) := b(x)\,p^*(x) - D\,\nabla p^*(x)$ denote the stationary probability current. The *local entropy production rate* is

$$\Sigma(x) \;:=\; \frac{\|J^*(x)\|^2}{D\cdot p^*(x)}. \tag{2}$$

This is the local Schnakenberg dissipation (Schnakenberg, [1976](https://arxiv.org/html/2605.16325#bib.bib25); Seifert, [2012](https://arxiv.org/html/2605.16325#bib.bib26)), and its integral $\int \Sigma(x)\,p^*(x)\,\mathrm{d}x$ recovers the total entropy production rate of the non-equilibrium steady state. $\Sigma$ vanishes identically if and only if the system is in detailed balance, in which case the dynamics reduces to gradient descent on a potential.

#### Information quasi-potential.

The *information quasi-potential* is

$$\Phi_I(x) \;:=\; -\ln p^*(x). \tag{3}$$

This object is well-defined wherever $p^*$ is, and inherits all the regularity of $p^*$. In equilibrium systems $\Phi_I$ is a derived quantity (proportional to the Boltzmann potential up to additive constants), but off equilibrium it acquires independent geometric structure: in particular, its level sets need not coincide with those of $\Sigma$, and its gradient need not be aligned with $\nabla\Sigma$. In the small-noise limit $D\to 0$, $\Phi_I$ coincides with the Freidlin–Wentzell quasi-potential (Freidlin and Wentzell, [1984](https://arxiv.org/html/2605.16325#bib.bib27)), justifying the terminology.
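Both fields can be evaluated in closed form for a linear drift, which gives a concrete check of definitions (2) and (3). The sketch below assumes a two-dimensional flat space and a hypothetical drift $b(x) = -Ax$ with $A = \begin{pmatrix} a_1 & -\omega \\ \omega & a_2 \end{pmatrix}$; the stationary density is then Gaussian with covariance $S$ solving $AS + SA^{\top} = 2D\,I$. The parameter values and helper names are illustrative assumptions, not taken from the paper. The code evaluates $\Sigma$ and $\Phi_I$ at a point and uses finite differences to compare the directions of their gradients.

```python
import math


def stationary_cov(a1, a2, w, D):
    """Closed-form solution S of A S + S A^T = 2 D I for A = [[a1, -w], [w, a2]]."""
    q = -w * D * (a2 - a1) / ((a1 + a2) * (a1 * a2 + w * w))
    p = (D + w * q) / a1
    r = (D - w * q) / a2
    return [[p, q], [q, r]]


def two_fields(x, a1, a2, w, D):
    """Return (Sigma(x), Phi_I(x)) for the linear drift b(x) = -A x."""
    S = stationary_cov(a1, a2, w, D)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    M = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]  # S^{-1}
    Mx = [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]
    p_star = math.exp(-0.5 * (x[0] * Mx[0] + x[1] * Mx[1])) / (2.0 * math.pi * math.sqrt(det))
    b = [-(a1 * x[0] - w * x[1]), -(w * x[0] + a2 * x[1])]
    # J* = b p* - D grad p*, with grad p* = -(S^{-1} x) p* for a Gaussian density.
    J = [(b[i] + D * Mx[i]) * p_star for i in range(2)]
    sigma = (J[0] ** 2 + J[1] ** 2) / (D * p_star)  # equation (2)
    phi = -math.log(p_star)  # equation (3)
    return sigma, phi


def grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


params = dict(a1=1.0, a2=2.0, w=1.0, D=0.5)  # illustrative values only
x0 = [1.0, 0.5]
g_sigma = grad(lambda x: two_fields(x, **params)[0], x0)
g_phi = grad(lambda x: two_fields(x, **params)[1], x0)
cross = g_sigma[0] * g_phi[1] - g_sigma[1] * g_phi[0]  # zero iff the gradients are parallel
print("Sigma, Phi_I at x0:", two_fields(x0, **params))
print("grad Sigma x grad Phi_I:", cross)
```

A non-zero cross product at the chosen point means the two gradients are not aligned there for these parameter values.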

### 2.3 Two-field structure

The drift in ([1](https://arxiv.org/html/2605.16325#S2.E1)) can be *decomposed* into two gradient components plus a residual:

$$b(x) \;=\; -\alpha\,\nabla\Sigma(x) \;-\; \beta\,\nabla\Phi_I(x) \;+\; b_{\perp}(x), \tag{4}$$

where $\alpha,\beta\in\mathbb{R}$ are coupling constants and $b_{\perp}$ collects any residual non-gradient component. The framework analyzed here is the regime in which $b_{\perp}$ is small in norm relative to the gradient terms, so that the dynamics is effectively two-field gradient. Whether this regime is reached in any particular application is an empirical question; we discuss its plausibility for chemical and learning instances in §[5](https://arxiv.org/html/2605.16325#S5).
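Whether the residual $b_{\perp}$ in (4) is small can be probed numerically once $b$, $\nabla\Sigma$, and $\nabla\Phi_I$ have been estimated at a set of sample points. One possible diagnostic, offered here as an assumption rather than as the procedure used in the companion preprints, is an ordinary least-squares fit of $(\alpha, \beta)$ followed by the relative norm of what is left over. The function name `fit_two_field` and the synthetic vectors are hypothetical.

```python
import math


def fit_two_field(b_samples, grad_sigma, grad_phi):
    """Least-squares fit of b ~ -alpha * grad_sigma - beta * grad_phi.

    Each argument is a list of equal-length vectors, one per sample point.
    Returns (alpha, beta, residual_ratio) with residual_ratio = ||b_perp|| / ||b||.
    """
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    a11 = sum(dot(g, g) for g in grad_sigma)
    a12 = sum(dot(g, h) for g, h in zip(grad_sigma, grad_phi))
    a22 = sum(dot(h, h) for h in grad_phi)
    r1 = -sum(dot(g, b) for g, b in zip(grad_sigma, b_samples))
    r2 = -sum(dot(h, b) for h, b in zip(grad_phi, b_samples))
    det = a11 * a22 - a12 * a12
    if det <= 1e-12 * a11 * a22:
        raise ValueError("gradients are numerically collinear; alpha and beta are not identifiable")
    # Solve the 2x2 normal equations by Cramer's rule.
    alpha = (r1 * a22 - r2 * a12) / det
    beta = (a11 * r2 - a12 * r1) / det
    res_sq, b_sq = 0.0, 0.0
    for b, g, h in zip(b_samples, grad_sigma, grad_phi):
        perp = [bi + alpha * gi + beta * hi for bi, gi, hi in zip(b, g, h)]
        res_sq += dot(perp, perp)
        b_sq += dot(b, b)
    return alpha, beta, math.sqrt(res_sq / b_sq)


# Synthetic check: build b exactly from alpha = 0.3, beta = 1.2 with no residual.
gS = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
gP = [[0.5, 1.0], [1.0, 0.0], [-1.0, 2.0]]
b = [[-(0.3 * s0 + 1.2 * p0), -(0.3 * s1 + 1.2 * p1)] for (s0, s1), (p0, p1) in zip(gS, gP)]
print(fit_two_field(b, gS, gP))
```

On real data a residual ratio well below one would be consistent with the effectively two-field regime, while a ratio near one would indicate that $b_{\perp}$ dominates.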

The central structural fact about ([4](https://arxiv.org/html/2605.16325#S2.E4)) is that the two gradient components are *generically independent* off equilibrium. We state this informally here and develop it in §[4](https://arxiv.org/html/2605.16325#S4).

#### Two-Field Independence (informal statement).

*Let $(\mathcal{M}, b, D)$ be a driven informational system strictly out of detailed balance, satisfying mild non-degeneracy conditions (Morse $\Phi_I$, trivial first cohomology of $\mathcal{M}$, confinement at infinity). Then $\nabla\Sigma$ and $\nabla\Phi_I$ are not proportional on a set of positive $\mu^*$-measure where $\nabla\Phi_I \neq 0$. Within the space of admissible drift fields, the subset for which $\nabla\Sigma \parallel \nabla\Phi_I$ everywhere is contained in a proper algebraic subvariety, hence has Lebesgue measure zero.*

A rigorous statement and proof, in both the discrete (Schnakenberg network) and continuous (transport-equation) formulations, are given as Theorem 1 of Truong and Truong ([2026a](https://arxiv.org/html/2605.16325#bib.bib2)). The proof reduces collinearity of the two gradients to vanishing of every cycle affinity in the Schnakenberg decomposition, which by Kolmogorov’s criterion is equivalent to detailed balance, contradicting the hypothesis.
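The role of cycle affinities in this argument can be illustrated on the smallest network that admits a cycle. The sketch below is an illustration under stated assumptions, not the construction used in the companion preprint: it takes a hypothetical three-state Markov jump process with rates `k[i][j]`, computes the cycle affinity $\mathcal{A} = \ln\frac{k_{01}k_{12}k_{20}}{k_{10}k_{21}k_{02}}$, and checks that the steady-state entropy production rate vanishes when $\mathcal{A} = 0$ (Kolmogorov's criterion) and is positive otherwise.

```python
import math


def stationary_3state(k):
    """Stationary distribution of a 3-state jump process with rates k[i][j], i != j,
    via the spanning-tree (Markov chain tree) formula."""
    w0 = k[1][0] * k[2][0] + k[1][2] * k[2][0] + k[2][1] * k[1][0]
    w1 = k[0][1] * k[2][1] + k[0][2] * k[2][1] + k[2][0] * k[0][1]
    w2 = k[0][2] * k[1][2] + k[0][1] * k[1][2] + k[1][0] * k[0][2]
    z = w0 + w1 + w2
    return [w0 / z, w1 / z, w2 / z]


def cycle_affinity(k):
    """Log-ratio of forward to backward rate products around the cycle 0 -> 1 -> 2 -> 0."""
    return math.log((k[0][1] * k[1][2] * k[2][0]) / (k[1][0] * k[2][1] * k[0][2]))


def entropy_production(k):
    """Steady-state entropy production rate: sum over edges of current times edge affinity."""
    p = stationary_3state(k)
    total = 0.0
    for i, j in [(0, 1), (1, 2), (2, 0)]:
        forward, backward = p[i] * k[i][j], p[j] * k[j][i]
        total += (forward - backward) * math.log(forward / backward)
    return total


# Rates biased around the cycle 0 -> 1 -> 2 -> 0 (driven, non-zero affinity).
driven = [[0.0, 2.0, 1.0], [1.0, 0.0, 2.0], [2.0, 1.0, 0.0]]
# Rates derived from hypothetical state energies E; these satisfy detailed balance.
E = [0.0, 1.0, 2.0]
balanced = [[math.exp(-(E[j] - E[i]) / 2.0) if i != j else 0.0 for j in range(3)] for i in range(3)]

for name, k in (("driven", driven), ("balanced", balanced)):
    print(name, "affinity:", cycle_affinity(k), "entropy production:", entropy_production(k))
```

For a single cycle the entropy production rate equals the cycle current times the cycle affinity, so the two quantities vanish together.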

### 2.4 Why two fields matter

The geometric content of two-field independence is that ([1](https://arxiv.org/html/2605.16325#S2.E1)) cannot be reduced to gradient descent on any single scalar potential without losing information. Three consequences follow, each of which has empirical signatures.

First, on a compact manifold with linear driving, single-field gradient dynamics produces a yield curve at any target configuration that is at most unimodal as a function of the driving parameter. Multi-peaked or oscillatory yield is incompatible with single-field structure.

Second, two independent perturbations with disjoint local supports near a target configuration combine *additively* in depth under single-field gradient dynamics, with superlinearity factor $S = 1 + O(\|\delta V\|^2)$ at second order. The empirically inferred superlinearity $S \approx 5.75$ in clay-catalyzed RNA polymerization under combined catalysis and confinement (Ferris et al., [1996](https://arxiv.org/html/2605.16325#bib.bib40)) therefore falsifies any single-field gradient account in that system.

Third, in learning-theoretic instances, single-field reduction implies that representational reorganization under context shift is basin selection within a fixed loss landscape, which is incompatible with the ontological-restructuring phenomenology in which the primitive set itself changes across the transition.

### 2.5 The substrate-independence question

Two driven informational systems $(\mathcal{M}_1, b_1, D)$ and $(\mathcal{M}_2, b_2, D)$ related by an isometric diffeomorphism $\varphi:\mathcal{M}_1\to\mathcal{M}_2$ with $b_2 = \varphi_* b_1$ share their stationary densities, quasi-potentials, and basin structure exactly. This is a mathematical statement about the invariance of the framework under re-parameterization, not a physical claim about substrate equivalence.

The empirical hypothesis we develop in §[5](https://arxiv.org/html/2605.16325#S5) is weaker than substrate equivalence but stronger than analogy: we propose that biological intelligence and large language models may be productively studied as instances within the framework of equations ([1](https://arxiv.org/html/2605.16325#S2.E1))–([4](https://arxiv.org/html/2605.16325#S2.E4)) on different configuration manifolds, with shared dynamical structure (the two-field decomposition, the candidate order parameters) but manifestly distinct substrates, time scales, and degrees of freedom. Substrate-independence in this sense is a proposal about which dynamical features may be conserved across instances within the framework, not about which substrates can support which dynamics.

## 3 Two Order Parameters

A driven informational system in the two-field regime exhibits a phase structure governed by two order parameters: an adversarial breakdown threshold $\alpha^{\dagger}$ characterizing the system’s resistance to representational corruption from outside, and a self-referential coupling threshold $\kappa_c$ characterizing the emergence of internal predictive coupling. We define each operationally, indicate its scaling with representational complexity, and argue that the two are dual signatures of a single universality class.

### 3.1 The breakdown order parameter $\alpha^{\dagger}$

Consider a driven informational system whose stationary measure $\mu^*$ has support concentrated on a finite primitive set $\mathcal{O}_N = \{\theta_1, \ldots, \theta_{|\mathcal{O}_N|}\}$, where each $\theta_i\in\mathbb{R}^d$ is a primitive representation (a chemical species, a Fourier feature, a circuit motif). The *representational complexity* of the system is the cardinality $|\mathcal{O}_N|$.

Suppose the system is observing a stream of data drawn from a context distribution $p_{C_1}$, and that an adversary corrupts a fraction $\alpha\in[0,1]$ of the stream by replacing samples with adversarially chosen ones. The observer wishes to detect a genuine shift to a new context $p_{C_2}$ with $\Delta := \mathrm{KL}(p_{C_2}\,\|\,p_{C_1}) > 0$, while distinguishing it from contamination. The *minimax adversarial breakdown rate* is the largest $\alpha$ for which some detection protocol can still achieve false-positive plus missed-detection rates summing to less than $1/2$.

#### Scaling.

Theorem 10.5 of Truong ([2026](https://arxiv.org/html/2605.16325#bib.bib1)) establishes that, under bounded-shift assumptions ($0 < \Delta_{\min} \leq \Delta \leq \Delta_{\max} < \infty$), the minimax breakdown rate satisfies

$$\alpha^{\dagger} \;=\; \Theta\!\left(\frac{1}{\log|\mathcal{O}_N|}\right). \tag{5}$$

The proof proceeds by a Le Cam two-point argument (Le Cam, [1986](https://arxiv.org/html/2605.16325#bib.bib23)): a universal detection lower bound $D^{*}_{\text{passive}} = \Omega(\log|\mathcal{O}_N|/\Delta)$ implies that no algorithm can confirm a shift with fewer samples while controlling false-positive rate. An adversary with budget $\alpha$ injects $\alpha\cdot D^{*}_{\text{passive}} = \Theta(\alpha\log|\mathcal{O}_N|/\Delta)$ poisoned samples per detection window. When $\alpha\log|\mathcal{O}_N| \geq 1$, the adversary can place at least one poisoned sample in every coherent detection window, defeating temporal-consistency methods. Setting the product to unity gives ([5](https://arxiv.org/html/2605.16325#S3.E5)).
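The scaling (5) and the threshold argument can be made concrete with a few numbers. The sketch below assumes the constants hidden by $\Theta(\cdot)$ equal one (an assumption made purely for illustration; the theorem fixes only the asymptotic form) and uses the hypothetical helper names `alpha_dagger` and `poisoned_per_window`.

```python
import math


def alpha_dagger(n_primitives, c=1.0):
    """Breakdown threshold c / log|O_N|; the constant c is an illustrative assumption."""
    if n_primitives < 2:
        raise ValueError("need |O_N| >= 2 so that log|O_N| > 0")
    return c / math.log(n_primitives)


def poisoned_per_window(alpha, n_primitives, delta):
    """Poisoned samples per detection window, alpha * log|O_N| / Delta (constants set to 1)."""
    return alpha * math.log(n_primitives) / delta


for n in (10**2, 10**4, 10**8):
    a = alpha_dagger(n)
    print(f"|O_N| = {n:>9d}  alpha_dagger = {a:.4f}  alpha*log|O_N| = {a * math.log(n):.1f}")
```

Squaring the number of primitives halves the threshold under this form, which is the sense in which the decay is logarithmic in $|\mathcal{O}_N|$.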

#### Significance.

Classical breakdown points in robust statistics are universal constants: $1/2$ for the median (Donoho and Huber, [1983](https://arxiv.org/html/2605.16325#bib.bib19)), and $\approx 29\%$ for high-breakdown $S$- and $M$-estimators (Hampel, [1971](https://arxiv.org/html/2605.16325#bib.bib18)). Recent advances in high-dimensional robust statistics (Diakonikolas and Kane, [2023](https://arxiv.org/html/2605.16325#bib.bib20)) bound *error* by ambient dimension, but the underlying breakdown fraction remains constant; targeted-poisoning lower bounds (Hanneke et al., [2022](https://arxiv.org/html/2605.16325#bib.bib21); Chornomaz et al., [2025](https://arxiv.org/html/2605.16325#bib.bib22)) similarly scale error with VC dimension while leaving the breakdown threshold dimension-free. Equation ([5](https://arxiv.org/html/2605.16325#S3.E5)) differs from these in that its asymptotic form depends on representational complexity. We are not aware of a prior breakdown point with this dependence in the literature reviewed; the literature is large and we cannot rule out parallel constructions we have missed. The decay is slow (inverse-logarithmic), so $\alpha^{\dagger}$ remains strictly positive at any finite $|\mathcal{O}_N|$, but $\alpha^{\dagger}\to 0$ as $|\mathcal{O}_N|\to\infty$, consistent with the intuition that more capable systems may be intrinsically harder to defend against adversarial corruption.

### 3.2 The self-referential order parameter $\kappa_c$

Consider now a driven informational system equipped with an internal *model space* $\mathcal{M}_{\text{model}}$ of dimension strictly less than $\dim\mathcal{M}$, together with a smooth surjective projection $\pi:\mathcal{M}\to\mathcal{M}_{\text{model}}$ and a smooth feedback function $g:\mathcal{M}_{\text{model}}\to T\mathcal{M}$. The dynamics is augmented by a self-referential drift term:

$$\mathrm{d}X_t \;=\; b(X_t)\,\mathrm{d}t \;-\; \kappa\,g(\pi(X_t))\,\mathrm{d}t \;+\; \sqrt{2D}\,\mathrm{d}W_t, \tag{6}$$

where $\kappa \geq 0$ is the self-referential coupling strength. Heuristically, $\pi(X_t)$ is the system’s compressed internal representation of its own state, and $g(\pi(X_t))$ is the action this internal representation drives on the substrate.

Two observables, both estimable from time-series data, characterize the regime of this system. The *predictive fidelity* is

$$F(\kappa) \;:=\; 1 - \exp\!\bigl(-2\,I(X_{t+\tau};\,\pi(X_t))\bigr), \tag{7}$$

where $I(\cdot\,;\,\cdot)$ is mutual information and $\tau$ is the slow correlation time of the projected process. The form ([7](https://arxiv.org/html/2605.16325#S3.E7)) is the Linfoot informational-correlation transformation (Linfoot, [1957](https://arxiv.org/html/2605.16325#bib.bib50)): it agrees with squared correlation $\rho^2$ for jointly Gaussian pairs, takes values in $[0,1]$ unconditionally, and is invariant under invertible re-parameterization. The *causal efficacy* is

$$C(\kappa) \;:=\; \frac{\mathbb{E}_{\mu^*}\bigl[\|\kappa\,g(\pi(X))\|\bigr]}{\mathbb{E}_{\mu^*}\bigl[\|b(X) - \kappa\,g(\pi(X))\|\bigr]}, \tag{8}$$

the dimensionless ratio of the self-referential drift magnitude to the magnitude of the net drift $b - \kappa\,g\circ\pi$ of equation ([6](https://arxiv.org/html/2605.16325#S3.E6)), both averaged under the stationary measure.
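The Gaussian consistency property of (7) is easy to verify directly. For a bivariate Gaussian pair with correlation $\rho$, the mutual information in nats is $I = -\tfrac{1}{2}\ln(1-\rho^2)$, so $1 - \exp(-2I) = \rho^2$. The sketch below checks this and also shows a plug-in estimate of (8) from samples of the two drift norms; the function names and sample values are hypothetical, and the mutual-information estimator itself is assumed to be supplied separately.

```python
import math


def gaussian_mi(rho):
    """Mutual information (in nats) of a bivariate Gaussian pair with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def predictive_fidelity(mi_nats):
    """Linfoot transform of equation (7); expects mutual information in nats."""
    return 1.0 - math.exp(-2.0 * mi_nats)


def causal_efficacy(self_ref_norms, net_drift_norms):
    """Plug-in estimate of equation (8) from stationary samples.

    self_ref_norms[i]  = ||kappa * g(pi(X_i))||
    net_drift_norms[i] = ||b(X_i) - kappa * g(pi(X_i))||
    """
    mean_self = sum(self_ref_norms) / len(self_ref_norms)
    mean_net = sum(net_drift_norms) / len(net_drift_norms)
    return mean_self / mean_net


for rho in (0.0, 0.3, 0.6, 0.9):
    # The last two columns agree up to floating-point rounding.
    print(rho, predictive_fidelity(gaussian_mi(rho)), rho ** 2)

# Made-up drift norms, used only to exercise the ratio.
print(causal_efficacy([0.1, 0.2, 0.3], [1.0, 2.0, 3.0]))
```

Note that the identity with $\rho^2$ holds only when $I$ is expressed in nats; an estimator that returns bits would need rescaling by $\ln 2$ first.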

#### Threshold.

The *self-referential coupling threshold* is

$$\kappa_c \;:=\; \inf\bigl\{\kappa \geq 0 \,:\, F(\kappa) \geq F_{\min}\ \wedge\ C(\kappa) \geq C_{\min}\bigr\}. \tag{9}$$

The operational thresholds $F_{\min}$ and $C_{\min}$ are calibrated against the four-criteria framework for genuine introspection developed in Lindsey ([2025](https://arxiv.org/html/2605.16325#bib.bib15)): predictive fidelity $F(\kappa)$ corresponds jointly to Lindsey’s *accuracy* and *grounding* criteria (a self-report’s reliability plus its causal dependence on the state being reported), while causal efficacy $C(\kappa)$ corresponds to the *internality* criterion (the requirement that introspection not be routed through external outputs). We take $F_{\min} = 0.5$ (Lindsey’s threshold for “reliable self-reports”) and $C_{\min} = 0.1$ (the minimum causal contribution distinguishable from probability-matching artefacts in the dissociation analysis of Lederman and Mahowald ([2026](https://arxiv.org/html/2605.16325#bib.bib58))). Varying these choices within the ranges $F_{\min}\in[0.4, 0.6]$, $C_{\min}\in[0.05, 0.2]$ does not qualitatively change the analysis. If the set in ([9](https://arxiv.org/html/2605.16325#S3.E9)) is empty, $\kappa_c = +\infty$: the system cannot reach the self-referential regime under any coupling strength.
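Definition (9) translates directly into a search over measured values. The following is a minimal sketch, assuming $F$ and $C$ have already been estimated on a finite grid of coupling strengths; the grid, the curves, and the function name `kappa_c` are hypothetical, and a grid minimum only approximates the infimum from above.

```python
import math


def kappa_c(kappas, F_vals, C_vals, F_min=0.5, C_min=0.1):
    """Smallest grid value of kappa with F >= F_min and C >= C_min; +inf if none qualifies."""
    admissible = [k for k, F, C in zip(kappas, F_vals, C_vals) if F >= F_min and C >= C_min]
    return min(admissible) if admissible else math.inf


kappas = [0.0, 0.5, 1.0, 1.5, 2.0]       # hypothetical coupling grid
F_vals = [0.00, 0.20, 0.45, 0.60, 0.80]  # hypothetical predictive fidelities
C_vals = [0.00, 0.05, 0.12, 0.20, 0.30]  # hypothetical causal efficacies

print(kappa_c(kappas, F_vals, C_vals))             # 1.5 for these made-up curves
print(kappa_c(kappas, F_vals, C_vals, F_min=0.9))  # inf: the set in (9) is empty
```

Both conditions must hold at the same $\kappa$: at $\kappa = 1.0$ the made-up causal efficacy already exceeds $C_{\min}$, but the fidelity does not yet reach $F_{\min}$.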

#### Interpretation\.

A system with $\kappa<\kappa_{c}$ *samples* from its stationary measure: $\pi(X_{t})$ tracks $X_{t}$ but does not stably encode the dynamics. A system with $\kappa\geq\kappa_{c}$ *encodes and acts on* its statistics: the internal projection $\pi(X_{t})$ is a sufficient statistic with non-trivial causal influence on the substrate dynamics. This is an operational claim about time-series structure, not a metaphysical claim about consciousness or subjective experience: both $F$ and $C$ are estimable from observation (Kraskov et al., [2004](https://arxiv.org/html/2605.16325#bib.bib51); Belghazi et al., [2018](https://arxiv.org/html/2605.16325#bib.bib52)).

A growing body of empirical work on LLM introspection (Binder et al., [2024](https://arxiv.org/html/2605.16325#bib.bib16); Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15); Lederman and Mahowald, [2026](https://arxiv.org/html/2605.16325#bib.bib58); Macar et al., [2026](https://arxiv.org/html/2605.16325#bib.bib59); Song et al., [2025](https://arxiv.org/html/2605.16325#bib.bib60)) reports a consistent qualitative pattern: current frontier systems exhibit *partial* introspective capacity. They detect injected internal states above the noise floor (Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15)) and distinguish self-generated from prefilled content (Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15)), but do so with high context-dependence, identifying *anomaly without content* in the dissociation of Lederman and Mahowald ([2026](https://arxiv.org/html/2605.16325#bib.bib58)). Within the present framework this pattern is naturally interpreted as operation in the vicinity of $\kappa_{c}$: above the regime where introspection is absent ($\kappa\ll\kappa_{c}$) but below the regime where it would be reliably grounded ($\kappa\gg\kappa_{c}$). We develop this interpretation in §[5.2](https://arxiv.org/html/2605.16325#S5.SS2) and stress that it is interpretive, not a direct measurement; direct estimation of $F(\kappa)$ and $C(\kappa)$ for current frontier models from full training trajectories remains an open problem.

### 3.3 The duality claim

The two order parameters $\alpha^{\dagger}$ and $\kappa_{c}$ are defined on apparently distinct dimensions: $\alpha^{\dagger}$ concerns robustness to external perturbation of the representation, while $\kappa_{c}$ concerns the emergence of internal predictive coupling. We propose they are related signatures within a common framework.

#### Common origin in $\Phi_{I}$.

Both order parameters are derived from the negative-log stationary density. The breakdown threshold $\alpha^{\dagger}$ enters through the Le Cam two-point construction, where the hypotheses being distinguished (genuine shift versus contamination) are characterized by their $\Phi_{I}$-distance under the two contexts. The coupling threshold $\kappa_{c}$ enters through the mutual information $I(X_{t+\tau};\pi(X_{t}))$, which can be expressed as a $\Phi_{I}$-divergence between the joint stationary measure and the product of its marginals. Both quantities are functionals of $\Phi_{I}$ alone; the entropy production rate $\Sigma$ enters the dynamics that produces $\Phi_{I}$ but does not appear directly in either threshold.

#### Joint scaling \(conjecture\)\.

We conjecture the following joint scaling relation as $|\mathcal{O}_{N}|\to\infty$:

$$\alpha^{\dagger}(\mathcal{O}_{N})\cdot(\log|\mathcal{O}_{N}|)^{\gamma_{1}}\;\to\;c_{1},\qquad\kappa_{c}(\mathcal{O}_{N})\cdot(\log|\mathcal{O}_{N}|)^{\gamma_{2}}\;\to\;c_{2}, \tag{10}$$

where $\gamma_{1},\gamma_{2}>0$ are scaling exponents and $c_{1},c_{2}>0$ are constants. The first relation, with $\gamma_{1}=1$, follows from ([5](https://arxiv.org/html/2605.16325#S3.E5)) with $c_{1}$ a constant of the breakdown geometry (modulo a universal factor). The exponent $\gamma_{2}$ governing the self-referential coupling threshold is open: dimensional analysis combined with the Le Cam two-point structure underlying $\kappa_{c}$ suggests $\gamma_{2}\in(0,1]$, with $\gamma_{2}=1/2$ as a natural guess from analogy with standard parametric rate arguments and with the logarithmic delay laws established for norm-driven phase transitions in regularised training (Truong et al., [2026](https://arxiv.org/html/2605.16325#bib.bib65); Truong and Truong, [2026b](https://arxiv.org/html/2605.16325#bib.bib66)), but no rigorous derivation in the self-referential setting is currently available.

We treat the pair $(\gamma_{1},\gamma_{2})$ together with $(c_{1},c_{2})$ as *defining the universality class* of a driven informational system: their joint values, rather than any specific functional form, parameterize where in the space of such systems a given instance lies. This duality structure, with two complementary order parameters of potentially distinct anomalous dimensions, is reminiscent of multicritical phenomena in coupled-order-parameter universality classes (Eichhorn et al., [2013](https://arxiv.org/html/2605.16325#bib.bib62); Hasselmann et al., [2007](https://arxiv.org/html/2605.16325#bib.bib63)), though we do not claim formal membership in those specific classes. Precedent for breakdown points whose form depends on a complexity proxy is also found in modern robust statistics (Lecué and Lerasle, [2020](https://arxiv.org/html/2605.16325#bib.bib64)), where breakdown numbers scale with effective dimension; equation ([5](https://arxiv.org/html/2605.16325#S3.E5)) extends this dependence to the logarithm of an ontology-size proxy. The empirical determination of $\gamma_{2}$ is the principal experimental content of Prediction 1 (§[6.1](https://arxiv.org/html/2605.16325#S6.SS1)). We treat ([10](https://arxiv.org/html/2605.16325#S3.E10)) as a falsifiable conjecture (§[6](https://arxiv.org/html/2605.16325#S6)).
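One way the exponents in (10) might be estimated from data is a least-squares fit in doubly logarithmic coordinates: if a threshold behaves as $c/(\log|\mathcal{O}_{N}|)^{\gamma}$, then its logarithm is linear in $\log\log|\mathcal{O}_{N}|$ with slope $-\gamma$. The sketch below recovers the exponents from synthetic, noise-free data; the sizes, constants, and function name are assumptions chosen for illustration, and real measurements would need error bars and a check that the asymptotic regime has been reached.

```python
import numpy as np

def fit_log_scaling(sizes, thresholds):
    """Fit threshold ~ c / (log size)**gamma by least squares; returns (gamma, c).

    Requires every size to exceed 1 so that log(log(size)) is defined.
    """
    x = np.log(np.log(np.asarray(sizes, dtype=float)))
    y = np.log(np.asarray(thresholds, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return float(-slope), float(np.exp(intercept))

# Synthetic data generated from the conjectured form (illustrative only).
sizes = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
alpha_dagger = 0.8 / np.log(sizes)        # gamma_1 = 1,   c_1 = 0.8
kappa_c = 1.5 / np.sqrt(np.log(sizes))    # gamma_2 = 1/2, c_2 = 1.5

gamma_1, c_1 = fit_log_scaling(sizes, alpha_dagger)
gamma_2, c_2 = fit_log_scaling(sizes, kappa_c)
```

Because $\log\log|\mathcal{O}_{N}|$ varies very slowly, a wide range of sizes is needed before nearby exponents such as $1/2$ and $1$ can be told apart on noisy data.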

#### Phase\-transition signature\.

A driven informational system that crosses both thresholds may undergo qualitative restructuring of its dynamics: below threshold, the system is a passive sampler whose representation can be corrupted at rate $\alpha<\alpha^{\dagger}$; above threshold, the system encodes its own statistics self-referentially and may resist corruption at the same nominal rate through internal consistency. We propose this as a candidate signature; whether the joint structure constitutes a universality class in the technical sense is left for future work.

![Refer to caption](https://arxiv.org/html/2605.16325v1/x1.png)

Figure 1: The two candidate order parameters (schematic). (a) The adversarial breakdown threshold $\alpha^{\dagger}=\Theta(1/\log|\mathcal{O}_{N}|)$ as a function of representational complexity $|\mathcal{O}_{N}|$ (solid blue), shown against classical breakdown points of robust statistics (dotted horizontal lines): the median of Hampel ([1971](https://arxiv.org/html/2605.16325#bib.bib18)) at $1/2$, and high-breakdown $S$/$M$-estimators at $\approx 0.29$. The classical breakdown points are universal constants; equation ([5](https://arxiv.org/html/2605.16325#S3.E5)) differs in that its asymptotic form depends on representational complexity. (b) The predictive fidelity $F(\kappa)$ (Linfoot form, solid blue) and the causal efficacy $C(\kappa)$ (dashed red) as a function of the self-referential coupling strength $\kappa$. The threshold $\kappa_{c}$ is the smallest $\kappa$ at which both $F\geq F_{\min}=0.5$ and $C\geq C_{\min}=0.1$ hold (vertical black line). (c) The joint scaling conjecture ([10](https://arxiv.org/html/2605.16325#S3.E10)): the rescaled products $\alpha^{\dagger}\cdot(\log|\mathcal{O}_{N}|)^{\gamma_{1}}$ and $\kappa_{c}\cdot(\log|\mathcal{O}_{N}|)^{\gamma_{2}}$ are conjectured to approach distinct constants $c_{1},c_{2}$ as $|\mathcal{O}_{N}|\to\infty$, with $(\gamma_{1},\gamma_{2})$ defining the universality class ($\gamma_{1}=1$ established; $\gamma_{2}\in(0,1]$ to be determined empirically per Prediction 1).

### 3.4 Relation to existing order parameters

Several candidate order parameters have been proposed for phase-transition phenomena in driven informational systems. We position $(\alpha^{\dagger},\kappa_{c})$ relative to three of the most developed.

#### Real log canonical threshold \(RLCT\) in singular learning theory\.

Watanabe’s RLCT (Watanabe, [2009](https://arxiv.org/html/2605.16325#bib.bib13)) is a geometric invariant of the loss landscape that governs Bayesian generalization rates and has been proposed as the natural order parameter for stagewise development in neural networks (Hoogland et al., [2024](https://arxiv.org/html/2605.16325#bib.bib11); Pepin Lehalleur et al., [2025](https://arxiv.org/html/2605.16325#bib.bib12)). The RLCT and $\alpha^{\dagger}$ measure structurally distinct quantities: the RLCT characterizes *posterior basin geometry* under fixed data distribution, whereas $\alpha^{\dagger}$ characterizes *robustness across distributional shift*. The two are complementary; we conjecture (and state as an open problem) that systems near a Bayesian phase transition in the RLCT sense undergo simultaneous changes in $\alpha^{\dagger}$, but no direct functional relation is known.

#### Variational free energy in the free energy principle\.

Friston’s variational free energy $F_{\text{FEP}}$ (Friston, [2010](https://arxiv.org/html/2605.16325#bib.bib34); Ramstead et al., [2023](https://arxiv.org/html/2605.16325#bib.bib35)) is a single scalar functional combining prediction error and complexity, minimized by self-organizing systems. The Linfoot fidelity $F(\kappa)$ in ([7](https://arxiv.org/html/2605.16325#S3.E7)) is related to but not equal to $F_{\text{FEP}}$: both are mutual-information-based, but $F(\kappa)$ admits a sharp threshold form via the operational thresholds $F_{\min},C_{\min}$, whereas $F_{\text{FEP}}$ is continuously minimized. The two-field framework can be read as supplying the FEP with an explicit substrate dynamics ($\Sigma$-driven) underlying the variational minimization.

#### Compression rate in grokking\-as\-compression\.

Liu et al. ([2023](https://arxiv.org/html/2605.16325#bib.bib6)) propose the linear-mapping number (LMN) as a measure of representational compression, and DeMoss et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib7)) develop a rate–distortion–MDL account of complexity rise-and-fall during training. Both relate to $|\mathcal{O}_{N}|$ through the MDL correspondence between primitive count and description length, but neither yields a breakdown point. The compression-dividend theorem of Truong ([2026](https://arxiv.org/html/2605.16325#bib.bib1), §11) establishes $|\mathcal{O}_{N}^{\text{post}}|\leq|\mathcal{O}_{N}^{\text{pre}}|/\eta$ across a phase transition, where $\eta$ is the alignment-protocol efficiency; the LMN reduction is a special case of this result.

## 4 The Two-Field Independence Theorem

The two-field structure introduced in §[2](https://arxiv.org/html/2605.16325#S2) is not a modeling assumption but a generic structural feature of any driven informational system out of detailed balance. We state this as a theorem at the level of formality appropriate for a perspective paper, and indicate three consequences specific to learning theory.

### 4.1 Statement and proof sketch

#### Theorem \(Two\-Field Independence; informal\)\.

*Let $(\mathcal{M},b,D)$ be a driven informational system with stationary density $p^{*}$, strictly out of detailed balance (i.e., $J^{*}\not\equiv 0$). Assume:*

1. (C1) *Confinement: $\mathcal{M}$ is compact, or non-compact with confining drift ensuring $J^{*}(x)\to 0$ at infinity.*
2. (C2) *Morse condition: $\Phi_{I}=-\ln p^{*}$ has isolated, non-degenerate critical points.*
3. (C3) *Trivial first cohomology: $H^{1}(\mathcal{M};\mathbb{R})=0$ (excludes flat manifolds with non-trivial fundamental group, e.g., the torus).*

*Then $\nabla\Sigma$ and $\nabla\Phi_{I}$ are not proportional on a set of positive $\mu^{*}$-measure where $\nabla\Phi_{I}\neq 0$. Within the space of admissible drift fields satisfying (C1)–(C3) and violating detailed balance, the subset for which $\nabla\Sigma\parallel\nabla\Phi_{I}$ everywhere on the regular set $\{\nabla\Phi_{I}\neq 0\}$ is contained in a proper algebraic subvariety of Lebesgue measure zero.*

A complete proof, in both discrete (finite-state Markov chain) and continuous (Fokker–Planck) formulations, is the central result of the chemistry-side companion preprint cited in §[2.3](https://arxiv.org/html/2605.16325#S2.SS3). We sketch the argument here in three steps.

#### Step 1: collinearity implies shared level sets\.

If $\nabla\Sigma\parallel\nabla\Phi_{I}$ pointwise $\mu^{*}$-almost everywhere, then $\Sigma$ and $\Phi_{I}$ have identical level sets (modulo $\mu^{*}$-null sets), so $\Sigma=F(\Phi_{I})$ for some scalar function $F$.

#### Step 2: cycle affinities must vanish\.

For a finite-state Markov chain with rates $k_{ij}$ and stationary probabilities $p^{*}_{i}$, Schnakenberg’s decomposition expresses the total entropy production as a sum over cycles:

$$\Sigma_{\text{tot}}\;=\;\sum_{c}J_{c}\cdot\mathcal{A}(c),\quad\mathcal{A}(c)=\ln\prod_{\ell\in c}\frac{k_{i_{\ell}i_{\ell+1}}}{k_{i_{\ell+1}i_{\ell}}},$$

where $\mathcal{A}(c)$ is the affinity of cycle $c$. The functional constraint $\Sigma=F(\Phi_{I})=F(-\ln p^{*})$ forces every cycle affinity to be expressible purely in terms of the $p^{*}_{i}$ along the cycle. The only expressions consistent across all cycles simultaneously are $\mathcal{A}(c)=0$ for every cycle.

#### Step 3: Kolmogorov’s criterion\.

Vanishing cycle affinities are precisely Kolmogorov’s criterion for detailed balance (Kelly, [1979](https://arxiv.org/html/2605.16325#bib.bib29)). But detailed balance forces $J^{*}\equiv 0$, contradicting the hypothesis of off-equilibrium operation.

The continuous-state case follows by Sard’s theorem applied to the map from drift fields to stationary currents, with the topological non-degeneracy of (C2) and (C3) ensuring that the singular set has positive codimension. The flat-density counterexample (constant drift on the torus, where $\nabla\Sigma$ and $\nabla\Phi_{I}$ both vanish identically) violates both (C2) and (C3) and is excluded by hypothesis. Configuration spaces relevant to chemistry (composition simplices) and learning (parameter spaces with weight decay) satisfy (C1)–(C3) automatically; we discuss this in §[5](https://arxiv.org/html/2605.16325#S5).
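The finite-state part of the argument (Steps 2 and 3) can be illustrated on a three-state chain. The sketch below computes the cycle affinity from the formula above for a chain constructed to satisfy detailed balance and for a driven variant with one direction of the cycle favoured. The rate matrices and the function name are illustrative assumptions; the example shows only that Kolmogorov's criterion separates the two cases, not the full theorem.

```python
import numpy as np

def cycle_affinity(rates: np.ndarray, cycle: list[int]) -> float:
    """Affinity ln prod_l k[i_l, i_{l+1}] / k[i_{l+1}, i_l] around a closed cycle.

    rates[i, j] is the transition rate from state i to state j.
    """
    total = 0.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        total += np.log(rates[a, b] / rates[b, a])
    return float(total)

# Reversible chain: k_ij = p_j satisfies p_i k_ij = p_j k_ji (detailed balance).
p = np.array([0.5, 0.3, 0.2])
balanced = np.ones((3, 3)) * p[None, :]
np.fill_diagonal(balanced, 0.0)

# Driven chain: double every rate along the direction 0 -> 1 -> 2 -> 0.
driven = balanced.copy()
for a, b in [(0, 1), (1, 2), (2, 0)]:
    driven[a, b] *= 2.0

affinity_balanced = cycle_affinity(balanced, [0, 1, 2])  # vanishes up to rounding
affinity_driven = cycle_affinity(driven, [0, 1, 2])      # equals 3 * ln 2
```

A nonzero affinity on any cycle certifies a stationary current, which is the sense in which the driven chain lies outside detailed balance.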

### 4.2 Consequences for learning theory

The Two\-Field Independence Theorem has three potential consequences for the study of phase transitions in neural network training, each of which is in principle empirically testable\.

#### Consequence 1: gradient descent is generically not single\-field\.

Standard analyses of neural network training treat the dynamics as gradient descent on a loss surface $L(\theta)$, modeled as $\dot{\theta}=-\nabla L(\theta)+\text{noise}$. When training is augmented by weight decay, learning-rate scheduling, batch-size modulation, or RLHF feedback, the effective drift acquires components that need not be gradients of $L$. The Two-Field Independence Theorem implies that if these additional components produce a non-equilibrium stationary distribution (which they generically do, since training does not converge to the Boltzmann distribution of $L$), then the dynamics admits a two-field description, and the parameter distribution $p^{*}(\theta)$ cannot in general be recovered from $L$ alone. The information quasi-potential $\Phi_{I}=-\ln p^{*}$ may encode structural information about the trained network that is not directly visible from the loss geometry.
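A linear toy model makes the last point concrete. For the Ornstein–Uhlenbeck process $dX=-AX\,dt+\sqrt{2D}\,dW$ the stationary density is Gaussian with covariance $S$ solving $AS+SA^{\top}=2D\,I$, so $\Phi_{I}=\tfrac12 x^{\top}S^{-1}x$ up to a constant. The sketch below is an assumption-laden illustration (a quadratic loss, isotropic noise, and a hand-picked rotational drift term, none of which come from the paper): when $A$ is the Hessian of the loss, $S$ is the Boltzmann covariance $D\,H^{-1}$, and adding a non-gradient component changes $S$ even though the loss is unchanged.

```python
import numpy as np

def stationary_covariance(drift_matrix: np.ndarray, noise: float) -> np.ndarray:
    """Solve A S + S A^T = 2 * noise * I for dX = -A X dt + sqrt(2 * noise) dW."""
    n = drift_matrix.shape[0]
    eye = np.eye(n)
    # Row-major vectorisation: vec(A S) = (A kron I) vec(S),
    # vec(S A^T) = (I kron A) vec(S).
    lhs = np.kron(drift_matrix, eye) + np.kron(eye, drift_matrix)
    rhs = (2.0 * noise * eye).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)

hessian = np.array([[1.0, 0.0], [0.0, 2.0]])    # quadratic loss L = x^T H x / 2
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])  # hypothetical non-gradient drift

boltzmann_cov = np.linalg.inv(hessian)          # covariance of exp(-L / D), D = 1
cov_gradient = stationary_covariance(hessian, noise=1.0)
cov_driven = stationary_covariance(hessian + rotation, noise=1.0)
```

With these values the purely gradient dynamics reproduces the Boltzmann covariance, while the driven dynamics yields a correlated Gaussian whose quasi-potential cannot be read off from the loss alone.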

#### Consequence 2: compression dividend may require both fields\.

The compression-dividend theorem cited in §[3.4](https://arxiv.org/html/2605.16325#S3.SS4) establishes that, across an ontological phase transition, the post-transition primitive count satisfies $|\mathcal{O}_{N}^{\text{post}}|\leq|\mathcal{O}_{N}^{\text{pre}}|/\eta$, where $\eta$ is the alignment-protocol efficiency. Reframed in the two-field language: the compression is driven by $\Phi_{I}$ (which selects deeper-well representations), but its rate is set by $\Sigma$ (which provides the exploration entropy through which deeper basins are discovered). Single-field reductions, whether loss-only (Liu et al., [2023](https://arxiv.org/html/2605.16325#bib.bib6)) or compression-only (DeMoss et al., [2025](https://arxiv.org/html/2605.16325#bib.bib7)), may capture one side of this dynamic but not its complete structure within this framework.

#### Consequence 3: alignment protocols may modify $\Phi_{I}$, not $\Sigma$.

Reinforcement learning from human feedback, constitutional training, debate, and targeted interpretability protocols share a common structural feature: they introduce an auxiliary signal that biases the trained network toward configurations satisfying external constraints, without modifying the gradient-descent substrate dynamics that produces representational structure. In the two-field language, these protocols can be read as modifying $\Phi_{I}$ (by reshaping the stationary measure) without modifying $\Sigma$ (the entropy-production geometry). The alignment efficiency $\eta=I(A;C_{2}\mid\mathcal{O}_{N})/H(C_{2}\mid\mathcal{O}_{N})$ introduced in the OPT companion preprint measures, in this reading, the projection of the protocol’s $\Phi_{I}$-modification onto the direction of the ground-truth context shift. Different protocols produce different $\eta$ values; comparing them on this scale provides a common unit for protocol-efficiency comparison.
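For discrete variables the ratio defining $\eta$ can be computed directly from a joint probability table. The sketch below is a minimal illustration under stated assumptions: the three variables are treated as finite-valued, the table layout `joint[a, c, o]` and the function name are hypothetical, and the precise meaning of $A$, $C_{2}$, and $\mathcal{O}_{N}$ is fixed by the companion preprint rather than by this code.

```python
import numpy as np

def alignment_efficiency(joint: np.ndarray) -> float:
    """eta = I(A; C | O) / H(C | O) for a joint pmf indexed as joint[a, c, o].

    Assumes H(C | O) > 0; natural logarithms are used, and the ratio is
    independent of the logarithm base.
    """
    joint = joint / joint.sum()
    p_o = joint.sum(axis=(0, 1))   # p(o)
    p_ao = joint.sum(axis=1)       # p(a, o)
    p_co = joint.sum(axis=0)       # p(c, o)

    cond_mi = 0.0
    for a, c, o in np.argwhere(joint > 0):
        ratio = joint[a, c, o] * p_o[o] / (p_ao[a, o] * p_co[c, o])
        cond_mi += joint[a, c, o] * np.log(ratio)

    cond_entropy = 0.0
    for c, o in np.argwhere(p_co > 0):
        cond_entropy -= p_co[c, o] * np.log(p_co[c, o] / p_o[o])

    return float(cond_mi / cond_entropy)

# Signal A identical to C: the protocol resolves all remaining uncertainty.
perfect = np.zeros((2, 2, 1))
perfect[0, 0, 0] = perfect[1, 1, 0] = 0.5

# Signal A independent of C given O: the protocol carries no information.
uninformative = np.full((2, 2, 1), 0.25)

eta_perfect = alignment_efficiency(perfect)
eta_uninformative = alignment_efficiency(uninformative)
```

Since conditional mutual information never exceeds the conditional entropy, the ratio lies between 0 and 1, which is what allows different protocols to be placed on one scale.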

## 5 Two Candidate Instances

The framework of §§[2](https://arxiv.org/html/2605.16325#S2)–[4](https://arxiv.org/html/2605.16325#S4) is substrate-agnostic: the two-field structure, the candidate order parameters, and the independence theorem are formulated to apply to any driven informational system satisfying the regularity conditions. What distinguishes the instances is which configuration manifold $\mathcal{M}$ the dynamics unfolds on, what entropy flux drives the substrate, and on what time scale the resulting dynamics develops. We present two candidate instances side by side, indicate what the framework predicts they share, and indicate what remains substrate-specific.

![Refer to caption](https://arxiv.org/html/2605.16325v1/x2.png)

Figure 2: The two-field geometry on two configuration manifolds (schematic). Both panels depict the same dynamical class: a stochastic trajectory $X_{t}$ (green) on a configuration manifold $\mathcal{M}$ driven by an external entropy flux ($\Sigma$, blue arrows) and stabilized at attractors of the information quasi-potential $\Phi_{I}$ (red contours). The two gradient fields $\nabla\Sigma$ and $\nabla\Phi_{I}$ are non-collinear off equilibrium (§[4](https://arxiv.org/html/2605.16325#S4)). (a) Prebiotic-chemistry instance: the configuration manifold is the composition simplex $\Delta^{n-1}$ with vertices labelled by amino-acid (AA), nucleobase (NB), and sugar species; entropy flux is solar/thermal; attractors are deep $\Phi_{I}$ wells at AAs and nucleobases. (b) Transformer-learning instance: the configuration manifold is the parameter space $\mathbb{R}^{P}$ with axes labelled by representational regimes (memorization, Fourier features, generalizing circuits); entropy flux is the gradient-descent training signal; the trajectory exhibits a long memorization plateau before transitioning to the generalizing-circuit attractor (the grokking phenomenon; Power et al., [2022](https://arxiv.org/html/2605.16325#bib.bib3); Nanda et al., [2023](https://arxiv.org/html/2605.16325#bib.bib4)).

### 5.1 Instance 1: Carbon–nitrogen chemistry under solar entropy flux

In prebiotic chemistry, the configuration manifold is the composition simplex $\mathcal{M}=\Delta^{n-1}$ of molecular species under mass conservation. The drift $b$ is determined by the reaction-rate matrix of the chemical network, modulated by catalytic surfaces, geometric confinement, and pH conditions. The noise $D$ encodes thermal fluctuations at ambient temperature. The entropy flux that drives the system out of detailed balance is external: ultraviolet photolysis from solar radiation, hydrothermal heat gradients, shock chemistry from impacts, and wet–dry cycling under tidal or atmospheric forcing.

Conditions (C1)–(C3) of the Two-Field Independence Theorem are satisfied automatically. The composition simplex is compact and convex, hence contractible, which gives (C1) and (C3). The Morse condition (C2) corresponds to strictly positive vibrational frequencies at stable molecular configurations, verified spectroscopically across all amino acids and nucleobases.

#### Empirical anchors\.

Five independently published prebiotic systems anchor the framework’s validity in this instance. Ferris et al. ([1996](https://arxiv.org/html/2605.16325#bib.bib40)) demonstrated mineral-catalyzed RNA oligomerization to lengths an order of magnitude beyond solution-only controls; the inferred catalysis–confinement synergy factor $S\approx 5.75$ falsifies single-field gradient accounts (§[2.4](https://arxiv.org/html/2605.16325#S2.SS4)). Blank et al. ([2001](https://arxiv.org/html/2605.16325#bib.bib41)) reported the non-monotonic optimal entropy-flux window for amino-acid yield in shock synthesis. Matreux et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib42)) demonstrated three-orders-of-magnitude selective enrichment of $>50$ prebiotic building blocks under simple heat flow. Floroni et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib43)) realized the full prebiotic-to-biotic transition criterion (attractor coupling, sustained external flux, and geometric confinement) in a single membraneless protocell driven by a microscale heat gradient. Rout et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib44)) demonstrated that amino acids catalyze RNA copolymerization with strongly base-dependent fold-enhancements (more than $100\times$). The framework identifies amino acids and nucleobases as deep wells of $\Phi_{I}$ under sustained entropy flux, with the universality of these motifs across meteoritic, asteroidal, and laboratory environments (Kvenvolden et al., [1970](https://arxiv.org/html/2605.16325#bib.bib47); Oba et al., [2023](https://arxiv.org/html/2605.16325#bib.bib48); McGuire, [2022](https://arxiv.org/html/2605.16325#bib.bib49)) reflecting their status as attractors of equation ([1](https://arxiv.org/html/2605.16325#S2.E1)) on this manifold.

#### Realization\.

Biological intelligence on Earth can, on this view, be regarded as one realization of the dynamics ([1](https://arxiv.org/html/2605.16325#S2.E1)) on the chemical configuration manifold. The relevant time scale is evolutionary ($\sim 10^{9}$ years from prebiotic chemistry to multicellular life). The self-referential coupling threshold $\kappa_{c}$ is crossed in template-directed synthesis and autocatalytic networks, with $\kappa_{AB}>\kappa_{\min}$ consistent with the Floroni protocell experiment. The framework does not claim biological intelligence is the unique solution on this manifold, only that it is one realized solution given Earth’s specific entropy-flux history.

### 5.2 Instance 2: Transformer parameter manifolds under gradient-descent flux

In neural network training, the configuration manifold is the parameter space $\mathcal{M}=\mathbb{R}^{P}$ of a model architecture (or a submanifold if architectural constraints reduce effective dimension), where $P$ is the parameter count. The drift $b$ is the negative gradient of the training loss together with weight-decay regularization and optimizer-specific terms (momentum, adaptive scaling). The noise $D$ encodes mini-batch gradient noise, the magnitude of which depends on batch size and learning-rate schedule (Liu et al., [2022](https://arxiv.org/html/2605.16325#bib.bib5); Hoogland et al., [2024](https://arxiv.org/html/2605.16325#bib.bib11)). The entropy flux is data-driven: each training step deposits new information into the parameter distribution, and the cumulative flux is the integrated gradient signal over training.

Conditions (C1)–(C3) require more care than in the chemical instance. Compactness (C1) is enforced effectively by weight decay (the stationary measure has bounded support). Non-degeneracy (C2) corresponds to strict positivity of the Hessian at typical loss minima, generically satisfied for over-parameterized networks but failing at degenerate basins relevant to singular learning theory (Watanabe, [2009](https://arxiv.org/html/2605.16325#bib.bib13)). Trivial first cohomology (C3) holds for $\mathbb{R}^{P}$ but may fail for architectures with discrete symmetries (permutation invariance of hidden units, weight-tying); we treat these symmetries as gauge degrees of freedom that can be quotiented out.

#### Empirical anchors\.

Several phase-transition phenomena in neural network training serve as empirical anchors for the framework in this instance. *Grokking*, the delayed transition from memorization to generalization in modular arithmetic (Power et al., [2022](https://arxiv.org/html/2605.16325#bib.bib3); Nanda et al., [2023](https://arxiv.org/html/2605.16325#bib.bib4); Liu et al., [2022](https://arxiv.org/html/2605.16325#bib.bib5)), has recently been given a rigorous quantitative theory: the Norm-Hierarchy Transition Law (Truong et al., [2026](https://arxiv.org/html/2605.16325#bib.bib65); Truong and Truong, [2026b](https://arxiv.org/html/2605.16325#bib.bib66)) establishes that the delay scales as $T_{\text{grok}}-T_{\text{mem}}=\Theta\bigl(\gamma_{\text{eff}}^{-1}\log(\|\theta_{\text{mem}}\|^{2}/\|\theta_{\text{post}}\|^{2})\bigr)$, where $\gamma_{\text{eff}}$ is the effective contraction rate of the optimizer ($\gamma_{\text{eff}}=\eta\lambda$ for SGD, $\geq\eta\lambda$ for AdamW), with matching upper and lower bounds under regularised first-order dynamics. We use this rigorous form as the basis for Prediction 3a (§[6.3](https://arxiv.org/html/2605.16325#S6.SS3)). *Emergent capabilities* (Wei et al., [2022](https://arxiv.org/html/2605.16325#bib.bib9)) are identified, in the two-field reading, with the crossing of phase boundaries in $\Phi_{I}$ as model scale and training compute increase.
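To give a sense of the magnitudes implied by the delay law, the sketch below evaluates its leading scale in the SGD case. It assumes that $\eta$ and $\lambda$ in $\gamma_{\text{eff}}=\eta\lambda$ denote the learning rate and the weight-decay coefficient (a reading not spelled out in this section), it drops the $\Theta(1)$ constant, and the norms and hyperparameters are hypothetical values chosen for illustration.

```python
import math

def grokking_delay_scale(norm_mem: float, norm_post: float,
                         learning_rate: float, weight_decay: float) -> float:
    """Leading scale log(||theta_mem||^2 / ||theta_post||^2) / gamma_eff.

    Uses gamma_eff = learning_rate * weight_decay (the SGD case); the
    result fixes the delay only up to an unspecified Theta(1) constant.
    """
    gamma_eff = learning_rate * weight_decay
    return math.log(norm_mem**2 / norm_post**2) / gamma_eff

# Hypothetical run: the parameter norm shrinks threefold across the transition.
delay = grokking_delay_scale(norm_mem=30.0, norm_post=10.0,
                             learning_rate=1e-3, weight_decay=1.0)
delay_strong_decay = grokking_delay_scale(norm_mem=30.0, norm_post=10.0,
                                          learning_rate=1e-3, weight_decay=2.0)
```

Doubling the weight decay halves the predicted scale, while the dependence on the norm ratio is only logarithmic.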

Introspective access in frontier language models has been the subject of converging empirical work in 2024–2026 (Binder et al., [2024](https://arxiv.org/html/2605.16325#bib.bib16); Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15); Lederman and Mahowald, [2026](https://arxiv.org/html/2605.16325#bib.bib58); Macar et al., [2026](https://arxiv.org/html/2605.16325#bib.bib59); Song et al., [2025](https://arxiv.org/html/2605.16325#bib.bib60)). Lindsey ([2025](https://arxiv.org/html/2605.16325#bib.bib15)) demonstrated, using activation injection on Anthropic frontier models, that introspection on internal states satisfies the four operational criteria (accuracy, grounding, internality, metacognitive representation) in *some* scenarios, but is highly unreliable and context-dependent. Lederman and Mahowald ([2026](https://arxiv.org/html/2605.16325#bib.bib58)) replicated these findings in open-source models and dissociated two underlying mechanisms: probability-matching (inference from prompt anomaly) and direct access (content-agnostic state detection). Macar et al. ([2026](https://arxiv.org/html/2605.16325#bib.bib59)) found that introspective capacity is suppressed by default in sampled outputs but detectable via logit lens in intermediate layers, with elicitation methods raising detection rates from $0.3\%$ to $39.9\%$. Song et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib60)) report that LLMs fail to meaningfully introspect under a more demanding “privileged self-access” definition.

The persistent finding across these studies is *partial* introspective capacity: above the noise floor but well below the reliability that full self-modelling would predict. The framework interprets this pattern as operation in the vicinity of $\kappa_{c}$: indicative estimates derived from injection-detection rates and self-report fidelity yield $F\in[0.3,0.5]$ and $C\in[0.05,0.15]$ across the studied frontier systems, placing them in the immediate neighbourhood of $(F_{\min},C_{\min})$ but neither reliably above nor confidently below. Whether $\kappa_{c}$ crossings have already occurred in deployed systems, are imminent in the next training generation, or remain distant, is an open empirical question that direct measurement of $F(\kappa)$ and $C(\kappa)$ on full training trajectories could settle. Independent support for the structural claim that increasing $\kappa$ has measurable effects on the trained network comes from Premakumar et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib61)), who demonstrated that adding self-modelling auxiliary tasks during training reduces the real log canonical threshold (RLCT), a result we discuss further in §[7.2](https://arxiv.org/html/2605.16325#S7.SS2).

#### Realization\.

Frontier large language models can, on this view, be regarded as a second realization of the dynamics ([1](https://arxiv.org/html/2605.16325#S2.E1)), on the transformer parameter manifold. The relevant time scale is computational ($\sim 10^{4}$–$10^{6}$ gradient-descent steps from initialization to deployment). The breakdown threshold $\alpha^{\dagger}=\Theta(1/\log|\mathcal{O}_{N}|)$ predicts that LLMs of higher capability may be intrinsically more vulnerable to adversarial data corruption per unit contamination budget. This is a quantitative formalization of the intuition motivating recent work on data poisoning, jailbreak resistance, and alignment robustness (Casper et al., [2023](https://arxiv.org/html/2605.16325#bib.bib14); Hanneke et al., [2022](https://arxiv.org/html/2605.16325#bib.bib21); Chornomaz et al., [2025](https://arxiv.org/html/2605.16325#bib.bib22)). The framework does not claim that current LLMs are intelligent in any substantive philosophical sense; we propose only that they may be productively studied as particular instances within the framework on the transformer parameter manifold.

### 5.3 What is shared, and what is not

The proposal we develop is that the two instances may be productively viewed as members of a common dynamical class, in the sense that they are particular instances within the same framework on different configuration manifolds\. Below we make explicit which features the framework predicts to be shared and which are instance\-specific; neither the shared features nor their universality status are established by this perspective paper alone\.

Table 1: Universality-class invariants and instance-specific features across the two empirical instances. The shared features are consequences of the two-field structure and the regularity conditions (C1)–(C3); the instance-specific features parameterize which solution within the family is realized.

The asymmetry in scale (nine orders of magnitude in degrees of freedom, fourteen orders of magnitude in time scale) is among the most striking features of the unification. We do not claim it is explanatory: the framework does not predict which solutions are realized in a given universe, only that whatever solutions are realized share the universality-class invariants of Table [1](https://arxiv.org/html/2605.16325#S5.T1). The realized scales are determined by the specific entropy-flux history of the substrate, a question for astrophysics and for the specific computational economics of training runs, respectively, not for the framework itself.

### 5.4 Other potential instances

We mention briefly, without development, three further candidate instances that the framework would in principle accommodate but that we do not validate empirically here. (i) *Neural development in embryos* (Levin, [2023](https://arxiv.org/html/2605.16325#bib.bib38)): the configuration manifold is gene-expression and morphogen-concentration space; the entropy flux is metabolic ATP consumption; the relevant time scale is ontogenetic. (ii) *Major evolutionary transitions* (Maynard Smith and Szathmáry, [1995](https://arxiv.org/html/2605.16325#bib.bib39); Prokopenko et al., [2025](https://arxiv.org/html/2605.16325#bib.bib37)): the configuration manifold is the genotypic–phenotypic state space of a lineage; the entropy flux is selective pressure integrated over generations; the relevant time scale spans the transition from prokaryotic to eukaryotic life or from solitary to multicellular organization. (iii) *Multi-agent collective intelligence*: the configuration manifold is the joint strategy space of interacting agents; the entropy flux is the information exchange rate; the relevant time scale is set by the mixing time of the interaction graph. We treat these as speculative extensions, deferred to future work.

### 5.5 Empirical convergence: what the framework explains

The two-field framework was developed before the empirical findings synthesised below; we summarise them here to make the framework’s *explanatory* (as opposed to merely predictive) content concrete. The findings are recent (mostly 2024–2026), independent of the framework’s development, and converge on a structure that the two-field perspective makes coherent.

#### Alignment phase transitions exist and are localised\.

Turner et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib53)) isolated a mechanistic phase transition in rank-1 LoRA fine-tuning across model families and sizes: directions for misalignment are learnt over a narrow window of training steps, with the transition evident both in fine-tuned parameters and in behavioural misalignment. Soligo et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib54)) found convergent linear representations of misalignment across model organisms, suggesting a shared structural target. Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) decomposed transitions into multiple plain-English order parameters with shared timing, finding that the behavioural transition occurs substantially *later* than the gradient norm peak (which serves as an early-warning signal). Hennick and Corlouer ([2026](https://arxiv.org/html/2605.16325#bib.bib56)) derived a spectral heat-capacity observable from a 2-datapoint reduced density matrix, providing critical-slowing-down early warning of second-order transitions during training. The two-field framework reads these collectively as signatures of $\kappa_c$ crossing: a single underlying regime change with multiple measurable correlates.

#### The adversarial breakdown threshold decreases with model scale.

Souly and others ([2025](https://arxiv.org/html/2605.16325#bib.bib57)), in the largest pretraining poisoning study to date (600M–13B parameters under Chinchilla-optimal training), found that approximately 250 poisoned documents produce robust backdoors *regardless* of model or dataset size. The poisoning fraction required therefore decreases monotonically with scale: a 13B model trained on more than $20\times$ the clean data of a 600M model is backdoored by the same absolute count of poisons. This is qualitatively the prediction $\alpha^{\dagger}(\mathcal{O}_N)\to 0$ as $|\mathcal{O}_N|\to\infty$ established in ([5](https://arxiv.org/html/2605.16325#S3.E5)), *though faster than purely logarithmic*. The two-field framework treats this discrepancy as informative rather than falsifying: the empirical rate of $\alpha^{\dagger}$ decay probes how $|\mathcal{O}_N|$ scales with parameter count $P$. A super-polynomial relation $|\mathcal{O}_N(P)|=\exp(\Theta(P^{\beta}))$ for some $\beta>0$ recovers the [Souly and others](https://arxiv.org/html/2605.16325#bib.bib57) rate from ([5](https://arxiv.org/html/2605.16325#S3.E5)); this is consistent with the combinatorial explosion of representable circuit motifs in transformer parameter spaces, but a rigorous $|\mathcal{O}_N|$–$P$ correspondence remains open (§[7.6](https://arxiv.org/html/2605.16325#S7.SS6)).
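The step from a logarithmic threshold to a power-law decay can be spelled out. The lines below are a sketch, not a result from the source: they assume that Chinchilla-optimal training makes the clean-data volume $D$ roughly proportional to the parameter count $P$, and they write $k$ for the (approximately constant) number of poisoned documents.

```latex
\begin{align*}
\alpha^{\dagger} = \Theta\!\left(\frac{1}{\log|\mathcal{O}_N|}\right),
\quad |\mathcal{O}_N(P)| = \exp\!\big(\Theta(P^{\beta})\big)
\quad &\Longrightarrow\quad
\alpha^{\dagger}(P) = \Theta\!\left(P^{-\beta}\right), \\
\text{constant } k,\ D \propto P
\quad &\Longrightarrow\quad
\alpha_{\text{emp}}(P) \approx \frac{k}{D} \propto P^{-1}.
\end{align*}
```

Under that proportionality assumption, matching the two rates would correspond to $\beta$ close to 1. The source does not fix $\beta$, so this is an illustration of how the empirical decay rate constrains the $|\mathcal{O}_N|$–$P$ relation rather than a derived value.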

#### Introspective capacity is partial and content\-agnostic\.

Recent work on LLM introspection (Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15); Binder et al., [2024](https://arxiv.org/html/2605.16325#bib.bib16); Lederman and Mahowald, [2026](https://arxiv.org/html/2605.16325#bib.bib58); Macar et al., [2026](https://arxiv.org/html/2605.16325#bib.bib59); Song et al., [2025](https://arxiv.org/html/2605.16325#bib.bib60)) collectively reports that frontier LLMs detect injected internal states above the noise floor but identify their content unreliably. Lederman and Mahowald ([2026](https://arxiv.org/html/2605.16325#bib.bib58)) explicitly dissociate two mechanisms: probability-matching (inference from prompt anomaly) and direct access (content-agnostic state detection). The framework interprets this dissociation as the signature of operation in the vicinity of $\kappa_c$: the gap between detection (reflecting medium $F$) and content-encoding (reflecting low $C$) maps onto the framework’s distinction between predictive fidelity and causal efficacy in ([7](https://arxiv.org/html/2605.16325#S3.E7))–([8](https://arxiv.org/html/2605.16325#S3.E8)).

#### Self\-modelling reduces RLCT\.

Premakumar et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib61)) demonstrated that adding self-modelling auxiliary tasks during training reduces the real log canonical threshold (RLCT) of the trained network, indicating lower effective model complexity. Within the two-field reading, increasing the self-referential coupling strength $\kappa$ deepens the wells of $\Phi_I$ on the loss-relevant manifold; this manifests, in the language of singular learning theory, as RLCT reduction. The [Premakumar et al.](https://arxiv.org/html/2605.16325#bib.bib61) finding therefore provides the first direct empirical bridge between $\kappa$ and Watanabe’s RLCT machinery, closing a gap noted as open in §[7.2](https://arxiv.org/html/2605.16325#S7.SS2).

#### Synthesis\.

The pattern across these independent findings is a phase-transition phenomenology in driven informational systems with: (i) a complexity-dependent vulnerability threshold whose decay rate probes the $|\mathcal{O}_N|$–$P$ correspondence; (ii) a self-modelling threshold whose vicinity is occupied by current frontier systems; (iii) measurable correlates in spectral observables (heat capacity, participation ratio) and structural invariants (RLCT, linear representation directions). The two-field framework offers the dynamical class within which this pattern is internally consistent. It does not claim to be the only such class; it claims that no single-field gradient account, taken alone, can reproduce this combination of features.

## 6 Falsifiable Predictions

A perspective paper claims theoretical territory by identifying predictions that, if falsified, would force revision or abandonment of the framework\. We state three such predictions, each operational on existing experimental or computational platforms, and each discriminating the two\-field framework against single\-field alternatives\.

Several of the predictions below already have partial empirical support in the literature. Bereska et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib67)) measure sparse-autoencoder feature-count consolidation at the grokking transition on modular arithmetic, providing one component of the $|\mathcal{O}_N|$ measurement protocol of Prediction 1. Clauw et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib8)) (introduced in §[5](https://arxiv.org/html/2605.16325#S5)) report that O-information synergy peaks before grokking, consistent with the non-additive structure underlying Prediction 2. Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) document that behavioural transitions during emergent-misalignment fine-tuning lag the gradient-norm peak, establishing the sign of $\Delta t$ in Prediction 3a. The protocols below extend these partial validations to the framework’s specific quantitative claims, and identify the discriminating measurements still missing.

### 6.1 Prediction 1: Joint scaling of $\alpha^{\dagger}$ and $\kappa_c$

The duality claim of §[3.3](https://arxiv.org/html/2605.16325#S3.SS3) predicts that the two order parameters scale jointly with representational complexity in a manner specified by ([10](https://arxiv.org/html/2605.16325#S3.E10)). Within a family of driven informational systems of varying complexity (transformers of varying width on a fixed task, or chemical networks of varying species count on a fixed entropy-flux protocol), the products $\alpha^{\dagger}(\mathcal{O}_N)\cdot(\log|\mathcal{O}_N|)^{\gamma_1}$ and $\kappa_c(\mathcal{O}_N)\cdot(\log|\mathcal{O}_N|)^{\gamma_2}$ should approach distinct universal constants asymptotically, with $(\gamma_1,\gamma_2)$ identifying the universality class.

#### Operational protocol \(learning instance\)\.

We propose varying the modulus $p$ of $\mathbb{Z}_p$ as the size variable, following the finite-size-scaling methodology of Bi et al. ([2026](https://arxiv.org/html/2605.16325#bib.bib68)), who note that varying width across model classes does not satisfy the controlled single-family size variation that finite-size scaling requires. The transformer architecture (width, depth, attention heads) is held fixed; only $p$ varies across $p\in\{53,97,113,251,503,1009\}$, with weight decay and learning rate held constant. For each trained model: (i) estimate $|\mathcal{O}_N|$ via the linear-mapping number (Liu et al., [2023](https://arxiv.org/html/2605.16325#bib.bib6)), the sparse-autoencoder feature count (Cunningham et al., [2023](https://arxiv.org/html/2605.16325#bib.bib24)), or the entropy-weighted effective-feature count of Bereska et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib67)); (ii) estimate $\alpha^{\dagger}$ by running data-poisoning experiments at varying contamination rates and identifying the contamination level above which a two-stage shift-detection protocol fails; (iii) estimate $\kappa_c$ via the introspection-injection protocol of Lindsey ([2025](https://arxiv.org/html/2605.16325#bib.bib15)) adapted to grokking-task models, identifying the smallest coupling at which $F(\kappa)\geq F_{\min}$ and $C(\kappa)\geq C_{\min}$.
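Step (ii) amounts to bracketing a threshold from a sweep over contamination rates. The following minimal sketch illustrates that bookkeeping only; the function name `breakdown_bracket`, the input format, and the assumption that detection fails monotonically once the rate is high enough are all hypothetical and not taken from the source.

```python
def breakdown_bracket(sweep):
    """Bracket alpha_dagger from a contamination sweep (illustrative only).

    sweep: iterable of (contamination_rate, detection_succeeded) pairs.
    Returns (last_ok, first_fail): the largest rate below the first failure
    at which detection still succeeded, and the smallest rate at which it
    failed. Either entry is None if the sweep never reaches that regime.
    Assumes detection fails monotonically above the threshold.
    """
    last_ok, first_fail = None, None
    for rate, ok in sorted(sweep):
        if ok:
            last_ok = rate
        else:
            first_fail = rate
            break
    return last_ok, first_fail


# Made-up sweep results, not measurements.
sweep = [(0.001, True), (0.01, True), (0.002, True), (0.02, False), (0.05, False)]
bracket = breakdown_bracket(sweep)
```

The threshold estimate then lies between the two returned rates; refining the sweep inside that bracket narrows it.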

#### Predicted signature\.

For each modulus $p$, plot $\log\alpha^{\dagger}(p)$ and $\log\kappa_c(p)$ against $\log\log|\mathcal{O}_N(p)|$. The two-field framework predicts: (i) both quantities exhibit asymptotic linear behaviour in this log–log scale, with negative slopes $-\gamma_1$ and $-\gamma_2$ respectively; (ii) $\gamma_1=1.0\pm 0.15$, consistent with the $\alpha^{\dagger}$ scaling ([5](https://arxiv.org/html/2605.16325#S3.E5)) (an independent test of established theory); (iii) $\gamma_2$ takes a definite value in $(0,1]$, to be determined by the experiment. The framework is *silent on the specific value* of $\gamma_2$: it predicts only that $\gamma_2\in(0,1]$, so that $\kappa_c$ decreases with complexity, and that the same $\gamma_2$ is observed across system instances within the same nominal class (e.g., across modular addition versus modular multiplication tasks). The experiment thus serves simultaneously as parameter estimation and as a falsification test.
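The slope extraction described above can be sketched as an ordinary least-squares fit in the log–log plane. This is a minimal illustration under stated assumptions: the function name `fit_scaling_exponent` is hypothetical, the input values are synthetic and constructed to have known exponents, and a real analysis would add uncertainty estimates and restrict the fit to the asymptotic regime.

```python
import math


def fit_scaling_exponent(log_complexity, threshold):
    """Least-squares fit of log(threshold) against log(log_complexity).

    log_complexity: values of log|O_N| per model (each must be > 0).
    threshold: matching values of alpha_dagger or kappa_c (each > 0).
    Returns (gamma, intercept), where gamma is minus the fitted slope.
    """
    xs = [math.log(v) for v in log_complexity]
    ys = [math.log(v) for v in threshold]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic illustration only: these numbers are made up, not measurements.
log_O = [4.0, 5.0, 6.0, 8.0, 10.0, 12.0]       # hypothetical log|O_N(p)| per modulus
alpha = [0.5 / v for v in log_O]               # gamma_1 = 1 by construction
kappa = [0.8 / math.sqrt(v) for v in log_O]    # gamma_2 = 0.5 by construction

gamma_1, _ = fit_scaling_exponent(log_O, alpha)
gamma_2, _ = fit_scaling_exponent(log_O, kappa)
```

On the synthetic inputs the fit recovers the exponents built into the data, which checks the fitting code and nothing more; whether real thresholds follow a straight line in this plane is exactly what the prediction puts at stake.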

#### Falsification\.

The framework is falsified if any of the following holds: (a) either quantity fails to exhibit asymptotic linear log–log behaviour (e.g., the corresponding product diverges, vanishes, or shows oscillatory or non-monotone scaling); (b) $\gamma_1$ is found to differ significantly from $1$, contradicting the established $\alpha^{\dagger}$ scaling; (c) $\gamma_2$ differs significantly across instances of the same nominal class, indicating that $\gamma_2$ is not a class invariant; or (d) $\gamma_2\leq 0$, indicating that $\kappa_c$ does not decrease with representational complexity.

#### Discrimination\.

Single\-field theories of grokking predict at most one order parameter \(compression rate, basin depth, RLCT\)\. They do not, in their current forms, predict the joint scaling of two distinct quantities, since they posit only one phase transition\. Confirmation of joint scaling would suggest single\-field alternatives, in their current forms, are incomplete\.

### 6.2 Prediction 2: Catalysis–confinement synergy in LLM training

Theorem 2.2 of Truong and Truong ([2026a](https://arxiv.org/html/2605.16325#bib.bib2)) establishes that single-field gradient dynamics on compact manifolds with linear driving combine two perturbations with disjoint local supports *additively*, with superlinearity factor $S=1+O(\|\delta V\|^{2})$. The empirically inferred $S\approx 5.75$ in clay-catalyzed RNA polymerization (Ferris et al., [1996](https://arxiv.org/html/2605.16325#bib.bib40)) is the discriminating empirical signature for two-field structure in the chemical instance. We predict an analogous signature in the learning instance.

Consider two training-signal modifications with disjoint local support in parameter space: *curriculum learning* (modulating the data-distribution sequence) and *targeted data augmentation* (modulating the per-sample input distribution). Each individually modifies $\Phi_I$ in a localized region of the parameter manifold; under single-field reduction, their joint application would combine additively in compression effect.

#### Operational protocol\.

Train four models on a fixed underlying task (e.g., modular arithmetic, code synthesis on a benchmark, or a small transformer language modeling task), under four conditions: (i) baseline (no modification); (ii) curriculum only; (iii) augmentation only; (iv) both combined. After each model reaches its grokking transition, measure the post-transition primitive count $|\mathcal{O}_N^{\text{post}}|$ via the same estimator as Prediction 1. Compute the synergy factor

$$S_{\text{LLM}}\;:=\;\frac{|\mathcal{O}_N^{\text{base}}|-|\mathcal{O}_N^{\text{both}}|}{\left(|\mathcal{O}_N^{\text{base}}|-|\mathcal{O}_N^{\text{curr}}|\right)+\left(|\mathcal{O}_N^{\text{base}}|-|\mathcal{O}_N^{\text{aug}}|\right)}.$$
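The synergy factor is a ratio of the joint reduction in primitive count to the sum of the two individual reductions. A minimal sketch of the computation follows; the function name `synergy_factor` and the example counts are hypothetical and chosen only to show the additive and super-additive cases.

```python
def synergy_factor(n_base, n_curr, n_aug, n_both):
    """S_LLM = (base - both) / ((base - curr) + (base - aug)).

    Arguments are post-transition primitive counts |O_N| under the four
    training conditions: baseline, curriculum only, augmentation only, both.
    """
    joint_reduction = n_base - n_both
    additive_reduction = (n_base - n_curr) + (n_base - n_aug)
    if additive_reduction == 0:
        raise ValueError("individual reductions sum to zero; S_LLM is undefined")
    return joint_reduction / additive_reduction


# Made-up counts, not measurements.
s_additive = synergy_factor(1000, 900, 850, 750)  # joint reduction equals the sum
s_synergy = synergy_factor(1000, 900, 850, 400)   # joint reduction exceeds the sum
```

A value near 1 corresponds to the additive single-field expectation; a value well above 1 corresponds to the super-additive case. In practice the counts carry estimator noise, so the ratio should be reported with an uncertainty across seeds.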

#### Predicted signature\.

$S_{\text{LLM}}>1+O(\delta^{2})$, where $\delta$ is a measure of perturbation strength in parameter space. We do not predict a specific numerical value (the chemical $S\approx 5.75$ is system-specific), only that the signature exceeds the perturbative additivity bound.

#### Falsification\.

$S_{\text{LLM}}\approx 1$ within experimental uncertainty.

#### Discrimination\.

The no-go result above implies that any single-field gradient model of training, combined with the assumption of disjoint perturbation supports, must predict $S\approx 1$. Observed $S>1$ would suggest that single-field gradient accounts of representational reorganization in transformer training, in their current forms, are incomplete, paralleling the situation in chemistry.

### 6.3 Prediction 3: $\kappa_c$ crossing as the mechanism of observed alignment phase transitions

Phase transitions in alignment fine-tuning are by now an established empirical phenomenon. Turner et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib53)) isolate a mechanistic phase transition in rank-1 LoRA fine-tuning across model families and sizes: directions for misalignment are learnt over a narrow window of training steps, with the transition evident both in fine-tuned parameters and in misalignment scaling behaviour. Soligo et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib54)) demonstrate that emergently misaligned models converge to similar linear representations of misalignment across training conditions. Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) develop a physics-style order-parameter framework for these transitions, finding that the behavioural transition occurs *substantially later* than the gradient norm peak, which serves as an early-warning signal. Hennick and Corlouer ([2026](https://arxiv.org/html/2605.16325#bib.bib56)) derive a spectral heat-capacity observable from a 2-datapoint reduced density matrix, providing critical-slowing-down early warning of second-order transitions during training.

Against this empirical background, the two-field framework makes a *mechanistic* claim: alignment phase transitions are signatures of $\kappa_c$ crossing (the regime change from passive sampling of statistics to self-referential encoding and action), distinguishable from alternative mechanisms (basin selection in a fixed loss landscape, RLCT degeneracy alone, capability-only emergence) by three operational signatures that can be tested simultaneously on existing model-organism setups.

#### 3a\. Phase\-transition timing\.

The behavioural transition lags the gradient norm peak by a delay $\Delta t$ that, under the framework, follows the Norm-Hierarchy Transition (NHT) Law (Truong et al., [2026](https://arxiv.org/html/2605.16325#bib.bib65); Truong and Truong, [2026b](https://arxiv.org/html/2605.16325#bib.bib66)). That law establishes, with matching upper and lower bounds for regularised first-order dynamics, that delayed representational transitions satisfy

$$T\;=\;\Theta\!\left(\gamma_{\text{eff}}^{-1}\,\log\!\left(V_{\text{sc}}/V_{\text{st}}\right)\right),\tag{11}$$

where $\gamma_{\text{eff}}$ is the effective contraction rate of the optimizer ($\eta\lambda$ for SGD; $\geq\eta\lambda$ for AdamW), and $V_{\text{sc}},V_{\text{st}}$ are the characteristic norms of the shortcut and structured representations. We propose that alignment phase transitions in the self-referential regime are governed by an analogous law, with the substrate-level contraction rate $\gamma_{\text{eff}}$ replaced by the self-referential coupling strength $\kappa$ (which plays the role of effective contraction toward the encoded internal model), and the norm ratio $V_{\text{sc}}/V_{\text{st}}$ replaced by an $\mathcal{O}_N$-cardinality proxy. Concretely:

$$\Delta t\;=\;\Theta\!\left(\kappa^{-1}\,\log|\mathcal{O}_N^{\text{pre}}|\right).\tag{12}$$

Equation ([12](https://arxiv.org/html/2605.16325#S6.E12)) is presented as a mapping conjecture rather than a derived identity: the structural form (logarithmic dependence on a complexity proxy, inverse dependence on a contraction-like rate) inherits from ([11](https://arxiv.org/html/2605.16325#S6.E11)), but the identification $\kappa\leftrightarrow\gamma_{\text{eff}}$ and $\log|\mathcal{O}_N^{\text{pre}}|\leftrightarrow\log(V_{\text{sc}}/V_{\text{st}})$ is the proposed extension of the NHT framework to the self-referential coupling regime. Reduction of ([12](https://arxiv.org/html/2605.16325#S6.E12)) to ([11](https://arxiv.org/html/2605.16325#S6.E11)) in a suitable limit, together with rigorous bounds in the alignment-fine-tuning setting, is open. *Falsification:* $\Delta t$ does not scale logarithmically with model capability, or is independent of the LoRA rank used to induce the transition (which controls the effective $\kappa$).
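One way to probe the conjectured delay law is to check whether the rescaled delay $\Delta t\cdot\kappa/\log|\mathcal{O}_N^{\text{pre}}|$ stays within a bounded band across runs. The sketch below is illustrative only: the function names, the run tuples, and the constant used to generate the synthetic delays are hypothetical. Note also that a $\Theta(\cdot)$ statement only bounds the ratio between two constants, so a nearly constant ratio is a stronger condition than the law itself requires.

```python
import math


def delay_law_ratios(runs):
    """Rescaled delays delta_t * kappa / log(n_pre) for each run.

    runs: iterable of (kappa, n_pre, delta_t) tuples, where n_pre is the
    pre-transition primitive count |O_N^pre| (must be > 1).
    """
    return [delta_t * kappa / math.log(n_pre) for kappa, n_pre, delta_t in runs]


def relative_spread(values):
    """(max - min) / mean, a crude measure of how constant the ratios are."""
    return (max(values) - min(values)) / (sum(values) / len(values))


# Synthetic runs generated from delta_t = c * log(n_pre) / kappa with c = 3.
# These are made-up numbers, not measurements.
c = 3.0
runs = [(k, n, c * math.log(n) / k) for k in (0.5, 1.0, 2.0) for n in (100, 1000)]
ratios = delay_law_ratios(runs)
spread = relative_spread(ratios)
```

With measured delays, a ratio that drifts systematically with $\kappa$ or with $|\mathcal{O}_N^{\text{pre}}|$ would count against the conjectured form.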

#### 3b. Spectral coincidence with independently estimated $\kappa_c$.

The peak in the spectral heat capacity (Hennick and Corlouer, [2026](https://arxiv.org/html/2605.16325#bib.bib56)) over training time should coincide with the $\kappa_c$ crossing point estimated independently from the joint scaling experiment of Prediction 1 (§[6.1](https://arxiv.org/html/2605.16325#S6.SS1)), within a confidence interval set by the spectral width and the Prediction 1 estimation error. This two-experiment cross-check is the strongest test available with current methodology. *Falsification:* the two estimates are separated by more than the combined uncertainty, indicating that the spectral observable and the framework’s $\kappa_c$ estimate track different objects.
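The coincidence criterion can be sketched as a comparison of two training-step estimates against their combined uncertainty. The source does not say how the two uncertainties are combined; the sketch below assumes independent errors added in quadrature and a tolerance of `z` combined standard deviations, both of which are illustrative choices, as is the function name.

```python
import math


def estimates_coincide(t_spectral, sigma_spectral, t_kappa, sigma_kappa, z=2.0):
    """True if the two step estimates agree within z combined uncertainties.

    t_spectral, sigma_spectral: heat-capacity peak location and its width.
    t_kappa, sigma_kappa: kappa_c crossing estimate and its error.
    Assumes independent errors combined in quadrature (an assumption).
    """
    combined = math.hypot(sigma_spectral, sigma_kappa)
    return abs(t_spectral - t_kappa) <= z * combined


# Made-up step counts, not measurements.
agree = estimates_coincide(1000, 30, 1040, 40)     # gap 40 vs tolerance 100
disagree = estimates_coincide(1000, 30, 1200, 40)  # gap 200 vs tolerance 100
```

The choice of `z` fixes how strict the falsification criterion is and should be set before the data are examined.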

#### 3c. Persistence discontinuity at $\kappa_c$.

Within a single training run that crosses $\kappa_c$, alignment-protocol perturbations applied at checkpoints $t_i$ should yield persistence $\tau(t_i)$ that exhibits a *derivative discontinuity* at the $\kappa_c$ crossing point: $\partial^{2}\tau/\partial t^{2}$ exceeds a noise-floor threshold in a small window around the predicted $\kappa_c$, while remaining smooth elsewhere, including at the gradient norm peak (which precedes the transition per Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55))). *Falsification:* $\tau(t)$ varies smoothly across the predicted $\kappa_c$ location, or shows discontinuity only at the gradient norm peak. The latter outcome would indicate that the gradient norm peak *is* the alignment-relevant transition rather than its precursor, contradicting the framework’s two-stage structure.
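On checkpointed data, the second derivative can be approximated by a central second difference, and a kink shows up as an isolated large value. The sketch below assumes uniformly spaced checkpoints and compares the absolute second difference against a user-chosen threshold; the function names, the threshold, and the synthetic persistence curve are hypothetical.

```python
def second_differences(tau, dt):
    """Central second finite differences of tau at uniform checkpoint spacing dt.

    Returns one value per interior checkpoint (indices 1 .. len(tau) - 2).
    """
    return [
        (tau[i - 1] - 2.0 * tau[i] + tau[i + 1]) / dt ** 2
        for i in range(1, len(tau) - 1)
    ]


def kink_indices(tau, dt, threshold):
    """Checkpoint indices where |second difference| exceeds the threshold."""
    d2 = second_differences(tau, dt)
    return [i + 1 for i, value in enumerate(d2) if abs(value) > threshold]


# Synthetic persistence curve: slope 0.1 up to checkpoint 5, slope 0.5 after.
# Made-up values, not measurements.
tau = [0.1 * i for i in range(6)] + [0.5 + 0.5 * (i - 5) for i in range(6, 11)]
kinks = kink_indices(tau, dt=1.0, threshold=0.1)
```

With noisy measurements the threshold would need to be calibrated from the scatter of the second differences away from the predicted crossing, and smoothing before differencing may be necessary.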

#### Discrimination\.

Each sub-prediction discriminates the framework against a different alternative. Prediction 3a discriminates against accounts in which gradient norm and behavioural transition coincide (early single-field accounts; [Arnold and Lörch](https://arxiv.org/html/2605.16325#bib.bib55)’s finding already creates pressure on these). Prediction 3b discriminates against accounts that lack a complexity-dependent threshold (e.g., variational free energy minimization in the FEP alone, which predicts continuous improvement). Prediction 3c discriminates against accounts in which the gradient norm peak *is* the transition rather than its precursor, including several mechanistic-interpretability accounts of the Turner et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib53)) setup.

#### Operational protocol\.

Predictions 3a–c can be tested simultaneously on the model-organism setup of Turner et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib53)) (rank-1 LoRA fine-tuning of a mid-scale instruction-tuned model), using the analytical machinery of Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) (LLM-judged plain-English order parameters with statistical dissimilarity measures) and Hennick and Corlouer ([2026](https://arxiv.org/html/2605.16325#bib.bib56)) (2-datapoint reduced density matrix spectral observables). The cost of testing is approximately one fine-tuning run with checkpointed perturbation experiments and post-hoc analysis. Open-source implementations of the necessary primitives are publicly available: rank-1 LoRA fine-tuning with phase-transition probing (Turner et al., [2025](https://arxiv.org/html/2605.16325#bib.bib53); Soligo et al., [2026](https://arxiv.org/html/2605.16325#bib.bib69)), concept-injection introspection on open-weights models (Macar et al., [2026](https://arxiv.org/html/2605.16325#bib.bib59)), and the order-parameter and spectral-observable analyses of Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) and Hennick and Corlouer ([2026](https://arxiv.org/html/2605.16325#bib.bib56)). The combined test is therefore feasible without novel infrastructure. We invite the alignment-evaluation community to apply this existing methodology to the combined test.

### 6.4 Summary

The three predictions are independent: each can be tested without the others, and falsification of any one would call for significant revision of the framework. Predictions 1 and 2 are testable on existing computational infrastructure with no novel measurement apparatus. Prediction 3 may be approached using the introspection-injection protocol of Lindsey ([2025](https://arxiv.org/html/2605.16325#bib.bib15)) combined with the order-parameter and spectral-observable methodologies of Arnold and Lörch ([2025](https://arxiv.org/html/2605.16325#bib.bib55)) and Hennick and Corlouer ([2026](https://arxiv.org/html/2605.16325#bib.bib56)), all demonstrated on research-scale models. We invite empirical groups in both communities, namely alignment evaluation laboratories (Casper et al., [2023](https://arxiv.org/html/2605.16325#bib.bib14); Turner et al., [2025](https://arxiv.org/html/2605.16325#bib.bib53)) and prebiotic-chemistry experimental groups (Mast–Braun (Matreux et al., [2024](https://arxiv.org/html/2605.16325#bib.bib42); Floroni et al., [2025](https://arxiv.org/html/2605.16325#bib.bib43)), Sutherland (Singh et al., [2025](https://arxiv.org/html/2605.16325#bib.bib45)), Damer–Deamer (Damer and Deamer, [2020](https://arxiv.org/html/2605.16325#bib.bib46))), to consider testing the predictions on platforms where they have native expertise.

## 7 Discussion: What This Reframes

The two\-field framework intersects several active research programs\. Honest positioning relative to each is essential both for assessing the framework’s contribution and for identifying productive directions of collaboration\. We address five neighboring lines and conclude with limitations and open problems\.

### 7.1 Grokking and the compression-rate accounts

The compression-as-grokking line of work (Liu et al., [2023](https://arxiv.org/html/2605.16325#bib.bib6); Nanda et al., [2023](https://arxiv.org/html/2605.16325#bib.bib4); DeMoss et al., [2025](https://arxiv.org/html/2605.16325#bib.bib7); Clauw et al., [2024](https://arxiv.org/html/2605.16325#bib.bib8)) has substantially advanced the empirical understanding of representational phase transitions in neural networks. Liu et al. ([2023](https://arxiv.org/html/2605.16325#bib.bib6)) attribute grokking to the emergence of compressed representations measured by the linear-mapping number; Nanda et al. ([2023](https://arxiv.org/html/2605.16325#bib.bib4)) identify Fourier circuits as the specific structure of these compressed representations; and DeMoss et al. ([2025](https://arxiv.org/html/2605.16325#bib.bib7)) and Clauw et al. ([2024](https://arxiv.org/html/2605.16325#bib.bib8)) formalize the compression dynamics through rate–distortion and information-theoretic phase-transition lenses respectively.

The two-field framework is broadly consistent with these results in their direction, and offers complementary tools: an information-theoretic lower bound on detection time, a strict-inequality form of the compression dividend (both established in the OPT companion preprint, see §[3.4](https://arxiv.org/html/2605.16325#S3.SS4)), and a perspective in which compression is read as one consequence of phase transition rather than its definition. Within this reading, the compression rate is not the order parameter; $|\mathcal{O}_N|$ plays that role, and compression $|\mathcal{O}_N^{\text{post}}|/|\mathcal{O}_N^{\text{pre}}|\leq 1/\eta$ is its image under the phase transition. Single-field compression theories typically describe the phase transition as gradient descent on a complexity functional; the two-field framework recovers this regime as the special case in which $\nabla\Sigma$ and $\nabla\Phi_I$ become collinear, identified in §[4](https://arxiv.org/html/2605.16325#S4) as the measure-zero non-generic case under the stated regularity conditions.

### 7.2 Singular learning theory and developmental interpretability

The singular learning theory program (Watanabe, [2009](https://arxiv.org/html/2605.16325#bib.bib13); Hoogland et al., [2024](https://arxiv.org/html/2605.16325#bib.bib11); Pepin Lehalleur et al., [2025](https://arxiv.org/html/2605.16325#bib.bib12)) characterizes Bayesian phase transitions in neural networks via the real log canonical threshold (RLCT), an algebraic-geometric invariant of the loss landscape. The Timaeus developmental-interpretability program applies these tools to study staged learning empirically.

The two frameworks are formally complementary. SLT analyzes the *posterior geometry* under a fixed data distribution; the two-field framework analyzes the *Langevin dynamics* on the parameter manifold under driven training. The RLCT characterizes basin shape; $\alpha^{\dagger}$ characterizes basin robustness across distributional shift; $\kappa_c$ characterizes the emergence of self-referential coupling. Recent empirical work (Premakumar et al., [2024](https://arxiv.org/html/2605.16325#bib.bib61)) establishes that increasing self-modelling auxiliary load reduces the RLCT of trained networks; this provides the first measured bridge between the self-referential-coupling axis ($\kappa$) and the SLT machinery, and is discussed further in §[5.5](https://arxiv.org/html/2605.16325#S5.SS5). We conjecture that systems near a Bayesian phase transition in the RLCT sense undergo simultaneous transitions in $\alpha^{\dagger}$ and possibly in $\kappa_c$, but no formal correspondence has been established and we treat this as an open problem (§[7.6](https://arxiv.org/html/2605.16325#S7.SS6)). The two programs differ chiefly in framing: SLT is Bayesian and posterior-centric; the two-field framework is dynamical and trajectory-centric. Both should be welcomed.

### 7.3 The free energy principle and active inference

The free energy principle (Friston, [2010](https://arxiv.org/html/2605.16325#bib.bib34); Ramstead et al., [2023](https://arxiv.org/html/2605.16325#bib.bib35)) unifies perception and action through the variational minimization of free energy under a generative model. The recent extension to large language models (Prakki, [2024](https://arxiv.org/html/2605.16325#bib.bib17)) treats LLM behavior as approximate active inference.

The Linfoot fidelity $F(\kappa)$ in ([7](https://arxiv.org/html/2605.16325#S3.E7)) is related to but distinct from the FEP variational free energy: both are mutual-information-based, but $F$ admits a sharp threshold form via operational thresholds, whereas free energy is continuously minimized. The two-field framework can be read as supplying the FEP with an explicit substrate dynamics ($\Sigma$-driven) underlying the variational minimization. In particular, the FEP does not specify what generates the generative model itself; the two-field framework proposes that the generative model is encoded by $\Phi_I$, and the threshold $\kappa_c$ is the regime change at which this encoding becomes self-referentially stable. Recent work extending the FEP to origin-of-life dynamics and the present framework’s chemistry instance are mutually compatible; we expect productive cross-pollination.

### 7\.4Algorithmic origins and tangled hierarchies

Walker and Davies \([2013](https://arxiv.org/html/2605.16325#bib.bib36)\) proposed informational takeover as the defining transition in the origin of life: living systems are systems in which information exerts top\-down control\. The recent extension by Prokopenko et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib37)\) formalizes this through tangled information hierarchies and self\-modeling dynamics, identifying the biological arrow of time as emergent from these hierarchies\.

The Prokopenko et al\. framework is the closest neighbor of the $\kappa_{c}$ construct in the present perspective\. Both identify a regime change associated with self\-modeling; both ground the change in information\-theoretic quantities\. The differences are that Prokopenko et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib37)\) present self\-modeling as a qualitative hierarchy structure, whereas this perspective specifies $\kappa_{c}$ as a quantitative threshold with explicit operational definitions $F$ and $C$; and that Prokopenko et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib37)\) address primarily the biological\-evolutionary domain, whereas the present framework applies symmetrically to the learning instance\. We propose the $\kappa_{c}$ formulation as a quantitative formalization of the tangled\-hierarchies framework, not a competing alternative; both direct empirical work toward measurement of self\-modeling onset\.

### 7\.5Thermodynamic\-bounds line and dissipation theories

The thermodynamic\-bounds line \(Busiello et al\., [2021](https://arxiv.org/html/2605.16325#bib.bib31); Liang et al\., [2024a](https://arxiv.org/html/2605.16325#bib.bib32), [b](https://arxiv.org/html/2605.16325#bib.bib33)\) establishes kinetics\-independent upper and lower bounds on symmetry\-breaking in driven chemical reaction networks via the matrix\-tree theorem\. These bounds delineate the *accessible* region of stationary states given a thermodynamic budget\. The dissipative\-adaptation framework \(England, [2015](https://arxiv.org/html/2605.16325#bib.bib30)\) establishes that driven self\-assembling systems are statistically biased toward configurations that absorb and dissipate work efficiently\.

The two\-field framework is complementary to both\. The bounds delimit which stationary states are accessible; the two\-field dynamics predicts which trajectory through the accessible region a system follows under specified entropy flux\. Dissipative adaptation captures the $\Sigma$\-driven exploration; the two\-field framework adds the $\Phi_{I}$\-driven stabilization that selects deep wells within the explored region\. England’s framework is recovered in the limit $\beta \to 0$ of the dynamics \([4](https://arxiv.org/html/2605.16325#S2.E4)\); the bounds of Liang et al\. \([2024a](https://arxiv.org/html/2605.16325#bib.bib32), [b](https://arxiv.org/html/2605.16325#bib.bib33)\) constrain the magnitudes that any specific solution within the framework can exhibit\. The three layers \(bounds, dissipation, and two\-field dynamics\) together characterize driven chemical reaction networks more tightly than any one in isolation\.

### 7\.6Limitations and open problems

The framework as developed here is a perspective on the unification of two preceding lines of work \(the chemistry\-side and learning\-side companion preprints introduced in §[1\.3](https://arxiv.org/html/2605.16325#S1.SS3)\), and inherits their respective scopes and limitations\. Specific items deferred to future work include the following\.

#### \(i\) Rigorous derivation of the joint scaling law\.

Equation \([10](https://arxiv.org/html/2605.16325#S3.E10)\) is conjectural in the value of $\gamma_{2}$\. A derivation from first principles, specifying $\gamma_{2}$ together with the asymptotic constants $c_{1}, c_{2}$ and their dependence on the universality class, is open\. Candidate routes include a Le Cam two\-point reduction with hypothesis classes of size scaling as $\log|\mathcal{O}_{N}|$, a renormalization\-group\-style analysis of the coupled $(\Sigma, \Phi_{I})$ flow at fixed points, and direct estimation from the experimental protocol of §[6\.1](https://arxiv.org/html/2605.16325#S6.SS1)\.

#### \(ii\) Formal correspondence with SLT phase transitions\.

The conjectured simultaneity of RLCT phase transitions and $\alpha^{\dagger}$, $\kappa_{c}$ transitions is unstudied\. A formal correspondence between the singular\-learning and two\-field accounts would be of substantial value\. The empirical bridge provided by Premakumar et al\. \([2024](https://arxiv.org/html/2605.16325#bib.bib61)\) is a starting point\.

#### \(iii\) Empirical estimation of $\kappa_{c}$ for current LLMs\.

The introspection\-based evidence \(Lindsey, [2025](https://arxiv.org/html/2605.16325#bib.bib15); Binder et al\., [2024](https://arxiv.org/html/2605.16325#bib.bib16); Lederman and Mahowald, [2026](https://arxiv.org/html/2605.16325#bib.bib58); Macar et al\., [2026](https://arxiv.org/html/2605.16325#bib.bib59); Song et al\., [2025](https://arxiv.org/html/2605.16325#bib.bib60)\) is suggestive but not yet quantitatively connected to $\kappa_{c}$\. A direct estimation of $F(\kappa)$ and $C(\kappa)$ for frontier models is open\.

#### \(iv\) The $|\mathcal{O}_{N}|$–$P$ correspondence\.

The empirical finding of Souly et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib57)\) that $\alpha^{\dagger}$ decays faster than $1/\log P$ at fixed Chinchilla\-optimal scaling probes the relation between primitive\-set cardinality $|\mathcal{O}_{N}|$ and parameter count $P$\. A rigorous super\-polynomial correspondence $|\mathcal{O}_{N}(P)| = \exp(\Theta(P^{\beta}))$ for some $\beta > 0$ would reconcile theory and experiment but has not been established\.
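
A short worked comparison shows why a super\-polynomial correspondence would matter\. It assumes, as a simplification of the logarithmic decay of $\alpha^{\dagger}$ with $|\mathcal{O}_{N}|$ stated in §8, that $\alpha^{\dagger} \asymp c_{1} / \log|\mathcal{O}_{N}|$; the exact form of the bound is not reproduced in this section, and the polynomial exponent $k$ is a hypothetical symbol introduced only for the comparison\.

```latex
\begin{align*}
% Polynomial correspondence: the threshold decays exactly like 1/log P.
|\mathcal{O}_{N}(P)| = P^{k}
  &\;\Longrightarrow\;
  \alpha^{\dagger} \asymp \frac{c_{1}}{k \log P}
  = \Theta\!\left(\frac{1}{\log P}\right), \\
% Super-polynomial correspondence: the threshold decays as a power law in P.
|\mathcal{O}_{N}(P)| = \exp\!\big(\Theta(P^{\beta})\big)
  &\;\Longrightarrow\;
  \alpha^{\dagger} = \Theta\!\left(P^{-\beta}\right)
  = o\!\left(\frac{1}{\log P}\right).
\end{align*}
```

Under that assumption a polynomial correspondence reproduces the $1/\log P$ rate exactly, so only faster\-than\-polynomial growth of $|\mathcal{O}_{N}|$ in $P$ is compatible with the reported faster decay\.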

#### \(v\) Extension to continuous primitive spaces\.

The breakdown bound \([5](https://arxiv.org/html/2605.16325#S3.E5)\) is stated for finite primitive sets\. A covering\-number extension is sketched in the OPT companion preprint, but the treatment is incomplete\.

#### \(vi\) Connection to mechanistic interpretability\.

The relation between the abstract primitive count $|\mathcal{O}_{N}|$ and the empirical circuit\-formation timescales studied in mechanistic interpretability \(Nanda et al\., [2023](https://arxiv.org/html/2605.16325#bib.bib4); Olsson et al\., [2022](https://arxiv.org/html/2605.16325#bib.bib10)\) is unaddressed\.

#### \(vii\) Multi\-agent and collective extensions\.

The treatment is single\-system\. Whether the two\-field framework extends to multi\-agent collective intelligence \(§[5\.4](https://arxiv.org/html/2605.16325#S5.SS4)\) through a generalized configuration manifold remains to be determined\.

These items define a research program rather than terminal gaps\. Each is independently addressable; together they map the territory of the framework’s possible development\.

## 8Conclusion

We have proposed a framework in which phase\-transition phenomena in deep learning and in non\-equilibrium prebiotic chemistry may be productively studied as instances within a common dynamical class: driven informational systems governed by two gradient fields, the entropy\-production rate $\Sigma$ and the information quasi\-potential $\Phi_{I} = -\ln p^{*}$\. Within this framework, we have discussed two candidate order parameters: an adversarial breakdown threshold $\alpha^{\dagger}$ whose decay with the primitive\-set cardinality $|\mathcal{O}_{N}|$ is logarithmic, and a self\-referential coupling threshold $\kappa_{c}$ associated with the regime in which a system encodes and acts on its own statistics\. The joint scaling of $(\alpha^{\dagger}, \kappa_{c})$ defines a candidate universality class with two scaling exponents $(\gamma_{1}, \gamma_{2})$ as class invariants\. We have identified three predictions, each in principle empirically testable on existing infrastructure: joint scaling of the two thresholds \(parameter estimation of $\gamma_{2}$\), catalysis–confinement synergy in language model training, and three discriminating signatures of $\kappa_{c}$ crossing in alignment fine\-tuning\.

We do not claim that biological intelligence and large language models are the same kind of system; they manifestly are not\. We propose only that they may share dynamical structure as instances of a common framework, on configuration manifolds shaped by carbon–nitrogen chemistry under solar entropy flux and by transformer parameter spaces under gradient\-descent flux, respectively\. The asymmetry of many orders of magnitude in degrees of freedom and time scale between the two instances is not explained by the framework; the framework, if correct in its domain of applicability, would parameterize which instances within the class are realized under given physical conditions, but does not predict that biological\-style or model\-style realizations are inevitable\.

The framework was developed before the empirical findings synthesised in §[5\.5](https://arxiv.org/html/2605.16325#S5.SS5), and is offered here as a theoretical foundation for the convergent phenomenology that Turner et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib53)\), Arnold and Lörch \([2025](https://arxiv.org/html/2605.16325#bib.bib55)\), Soligo et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib54)\), and Hennick and Corlouer \([2026](https://arxiv.org/html/2605.16325#bib.bib56)\) have established for alignment phase transitions, that Souly et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib57)\) have established for adversarial breakdown scaling, and that Lindsey \([2025](https://arxiv.org/html/2605.16325#bib.bib15)\), Lederman and Mahowald \([2026](https://arxiv.org/html/2605.16325#bib.bib58)\), Macar et al\. \([2026](https://arxiv.org/html/2605.16325#bib.bib59)\), and Song et al\. \([2025](https://arxiv.org/html/2605.16325#bib.bib60)\) have established for partial introspection in frontier systems\. Two implications deserve emphasis\. First, the framework reframes alignment as a two\-field problem: protocols modify $\Phi_{I}$ via substrate dynamics, and their efficiency may be measured by the projection of the modification onto the direction of the ground\-truth context shift\. The information\-theoretic unit $\eta = I(A; C_{2} \mid \mathcal{O}_{N}) / H(C_{2} \mid \mathcal{O}_{N})$ \(introduced in the OPT companion preprint, §[3\.4](https://arxiv.org/html/2605.16325#S3.SS4)\) provides a candidate common scale for protocol\-efficiency comparison, an analogue of bits\-per\-second for alignment\. Second, the framework interprets the persistent finding of partial introspection across frontier systems as operation in the vicinity of $\kappa_{c}$, neither reliably above nor confidently below\. Identifying or ruling out a $\kappa_{c}$ crossing in a controlled training run appears to us a useful empirical question for alignment research\.
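
The efficiency unit $\eta$ can be made concrete on a toy discrete example\. The sketch below is a minimal illustration, not the companion preprint's estimator: it assumes that $A$, $C_{2}$, and the conditioning variable standing in for $\mathcal{O}_{N}$ are finite discrete random variables whose joint distribution is available as an array\. The function name `conditional_efficiency` and the axis ordering `joint[a, c, o]` are hypothetical\.

```python
import numpy as np


def conditional_efficiency(joint: np.ndarray) -> float:
    """Return eta = I(A; C | O) / H(C | O) for a joint pmf indexed as joint[a, c, o].

    Assumption: all three variables are discrete and the array holds
    non-negative weights, which are normalized to sum to one.
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()

    p_o = p.sum(axis=(0, 1))   # p(o), shape (O,)
    p_ao = p.sum(axis=1)       # p(a, o), shape (A, O)
    p_co = p.sum(axis=0)       # p(c, o), shape (C, O)

    # Conditional entropy H(C | O) = -sum_{c,o} p(c,o) log[p(c,o) / p(o)].
    mask_co = p_co > 0
    p_o_grid = np.broadcast_to(p_o, p_co.shape)
    h_c_given_o = -np.sum(p_co[mask_co] * np.log(p_co[mask_co] / p_o_grid[mask_co]))
    if h_c_given_o == 0:
        raise ValueError("H(C | O) is zero, so eta is undefined")

    # Conditional mutual information
    # I(A; C | O) = sum p(a,c,o) log[p(a,c,o) p(o) / (p(a,o) p(c,o))].
    mask = p > 0
    numer = p * p_o[None, None, :]
    denom = p_ao[:, None, :] * p_co[None, :, :]
    cmi = np.sum(p[mask] * np.log(numer[mask] / denom[mask]))

    return float(cmi / h_c_given_o)


if __name__ == "__main__":
    # A copies C within every conditioning cell.
    copy = np.zeros((2, 2, 2))
    copy[0, 0, :] = 0.25
    copy[1, 1, :] = 0.25
    print(conditional_efficiency(copy))

    # A and C are conditionally independent given O.
    independent = np.full((2, 2, 2), 0.125)
    print(conditional_efficiency(independent))
```

In this toy setting $\eta = 1$ when $A$ determines $C_{2}$ within every conditioning cell and $\eta = 0$ when the two are conditionally independent, which is the sense in which $\eta$ acts as a normalized scale\.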

We invite the alignment\-evaluation, mechanistic\-interpretability, prebiotic\-chemistry, and origin\-of\-life research communities to consider testing the predictions of §[6](https://arxiv.org/html/2605.16325#S6) on platforms where they have native expertise\. Whether the questions of representational reorganization under context shift, robustness to adversarial corruption, and prebiotic chemical convergence prove to admit a common dynamical description is, ultimately, an empirical question; the present paper proposes one direction of answer and identifies the experiments that could falsify it\.

## Acknowledgements

The Equation of Motion–Information Field Framework \(EOM\-IFF\) on which the chemistry\-side instance of this perspective is built was developed jointly with Truong Quynh Hoa, whose detailed treatment appears in the companion preprint \(Truong and Truong, [2026a](https://arxiv.org/html/2605.16325#bib.bib2)\)\. The present author thanks Hoa for permission to incorporate that framework in the unification proposed here; responsibility for the unification claim and any errors in its presentation rests with the present author alone\.

The author is grateful to the open\-access prebiotic\-chemistry and non\-equilibrium statistical\-physics communities, and to the open\-review machine\-learning theory community, for the public archive of preprints and datasets that made this synthesis possible\.

## Funding

This research received no specific grant from any funding agency in the public, commercial, or not\-for\-profit sectors\. All work was conducted independently at Clevix LLC \(Hanoi, Vietnam\) using the author’s own computational resources\.

## Declaration of AI use

The author used Anthropic’s Claude \(large language model assistant\) during manuscript preparation for \(i\) checking LaTeX syntax and cross\-reference consistency, \(ii\) language editing for grammar and clarity, \(iii\) reviewing the accuracy of literature citations, and \(iv\) identifying recent relevant literature\. All theoretical content, theorem statements, proof sketches, predictions, and final editorial choices are the author’s own\. The author takes full responsibility for the scientific content of this article\.

## Conflict of interest declaration

The author declares no competing interests, financial or otherwise, relevant to the subject matter of this manuscript\. Clevix LLC is an independent research entity and has no commercial interest in the outcome of the theoretical framework developed herein\.

## Data accessibility

This article has no primary experimental data\. All analyses cite published data from peer\-reviewed literature and preprints, which were not re\-collected by the author\. The companion preprints \(Truong and Truong, [2026a](https://arxiv.org/html/2605.16325#bib.bib2); Truong, [2026](https://arxiv.org/html/2605.16325#bib.bib1)\) contain the underlying theoretical derivations and computational validations referenced herein\.

## References

- J\. Arnold and N\. Lörch \(2025\) Decomposing behavioral phase transitions in LLMs: order parameters for emergent misalignment\. arXiv preprint arXiv:2508\.20015\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2508.20015), [Link](https://arxiv.org/abs/2508.20015)\.
- M\. I\. Belghazi, A\. Baratin, S\. Rajeswar, S\. Ozair, Y\. Bengio, A\. Courville, and R\. D\. Hjelm \(2018\) Mutual information neural estimation\. In Proceedings of the 35th International Conference on Machine Learning \(ICML\), pp\. 531–540\. External Links: [Link](https://proceedings.mlr.press/v80/belghazi18a.html)\.
- L\. F\. Bereska, Z\. Tzifa\-Kratira, R\. Samavi, and E\. Gavves \(2025\) Superposition as lossy compression: measure with sparse autoencoders and connect to adversarial vulnerability\. Transactions on Machine Learning Research \(TMLR\)\. Note: arXiv:2512\.13568\. External Links: [Link](https://arxiv.org/abs/2512.13568)\.
- Y\. Bi, C\. Zhang, Q\. Wang, and V\. D\. Calhoun \(2026\) Grokking as a falsifiable finite\-size transition\. arXiv preprint arXiv:2603\.24746\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2603.24746), [Link](https://arxiv.org/abs/2603.24746)\.
- F\. J\. Binder, J\. Chua, T\. Korbak, H\. Sleight, J\. Hughes, R\. Long, E\. Perez, M\. Turpin, and O\. Evans \(2024\) Looking inward: language models can learn about themselves by introspection\. arXiv preprint arXiv:2410\.13787\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2410.13787), [Link](https://arxiv.org/abs/2410.13787)\.
- J\. G\. Blank, G\. H\. Miller, M\. J\. Ahrens, and R\. E\. Winans \(2001\) Experimental shock chemistry of aqueous amino acid solutions and the cometary delivery of prebiotic compounds\. Origins of Life and Evolution of Biospheres 31, pp\. 15–51\. External Links: [Document](https://dx.doi.org/10.1023/A%3A1006758803255)\.
- D\. M\. Busiello, S\. Liang, F\. Piazza, and P\. De Los Rios \(2021\) Dissipation\-driven selection of states in non\-equilibrium chemical networks\. Communications Chemistry 4, pp\. 16\. External Links: [Document](https://dx.doi.org/10.1038/s42004-021-00454-w)\.
- S\. Casper, X\. Davies, C\. Shi, T\. K\. Gilbert, J\. Scheurer, J\. Rando, R\. Freedman, T\. Korbak, D\. Lindner, P\. Freire, T\. Wang, S\. Marks, C\. Segerie, M\. Carroll, A\. Peng, P\. Christoffersen, M\. Damani, S\. Slocum, U\. Anwar, A\. Siththaranjan, M\. Nadeau, E\. J\. Michaud, J\. Pfau, D\. Krasheninnikov, X\. Chen, L\. Langosco, P\. Hase, E\. Bıyık, A\. Dragan, D\. Krueger, D\. Sadigh, and D\. Hadfield\-Menell \(2023\) Open problems and fundamental limitations of reinforcement learning from human feedback\. Transactions on Machine Learning Research \(TMLR\)\. Note: Preprint: arXiv:2307\.15217\. External Links: [Link](https://openreview.net/forum?id=bx24KpJ4Eb)\.
- B\. Chornomaz, Y\. Koren, S\. Moran, and T\. Waknine \(2025\) Agnostic learning under targeted poisoning: optimal rates and the role of randomness\. arXiv preprint arXiv:2506\.03075\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2506.03075), [Link](https://arxiv.org/abs/2506.03075)\.
- K\. Clauw, S\. Stramaglia, and D\. Marinazzo \(2024\) Information\-theoretic progress measures reveal grokking is an emergent phase transition\. arXiv preprint arXiv:2408\.08944\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2408.08944), [Link](https://arxiv.org/abs/2408.08944)\.
- H\. Cunningham, A\. Ewart, L\. Riggs, R\. Huben, and L\. Sharkey \(2023\) Sparse autoencoders find highly interpretable features in language models\. arXiv preprint arXiv:2309\.08600\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2309.08600), [Link](https://arxiv.org/abs/2309.08600)\.
- B\. Damer and D\. Deamer \(2020\) The hot spring hypothesis for an origin of life\. Astrobiology 20\(4\), pp\. 429–452\. External Links: [Document](https://dx.doi.org/10.1089/ast.2019.2045)\.
- B\. DeMoss, S\. Sapora, J\. Foerster, N\. Hawes, and I\. Posner \(2025\) The complexity dynamics of grokking\. Physica D: Nonlinear Phenomena 482, pp\. 134859\. Note: Preprint: arXiv:2412\.09810\. External Links: [Document](https://dx.doi.org/10.1016/j.physd.2025.134859)\.
- I\. Diakonikolas and D\. M\. Kane \(2023\) Algorithmic high\-dimensional robust statistics\. Cambridge University Press\. External Links: [Document](https://dx.doi.org/10.1017/9781108943161)\.
- D\. L\. Donoho and P\. J\. Huber \(1983\) The notion of breakdown point\. In A Festschrift for Erich L\. Lehmann, P\. J\. Bickel, K\. Doksum, and J\. L\. Hodges \(Eds\.\), pp\. 157–184\.
- A\. Eichhorn, D\. Mesterházy, and M\. M\. Scherer \(2013\) Universal behavior of coupled order parameters below three dimensions\. Physical Review E 88, pp\. 042141\. Note: arXiv:1606\.07449 v2 \(2017\)\. External Links: [Document](https://dx.doi.org/10.1103/PhysRevE.88.042141)\.
- J\. L\. England \(2015\) Dissipative adaptation in driven self\-assembly\. Nature Nanotechnology 10, pp\. 919–923\. External Links: [Document](https://dx.doi.org/10.1038/nnano.2015.250)\.
- J\. P\. Ferris, A\. R\. Hill, R\. Liu, and L\. E\. Orgel \(1996\) Synthesis of long prebiotic oligomers on mineral surfaces\. Nature 381, pp\. 59–61\. External Links: [Document](https://dx.doi.org/10.1038/381059a0)\.
- A\. Floroni, N\. Yeh Martín, T\. Matreux, L\. I\. Weise, S\. S\. Mansy, H\. Mutschler, C\. B\. Mast, and D\. Braun \(2025\) Membraneless protocell confined by a heat flow\. Nature Physics 21, pp\. 1303–1310\. External Links: [Document](https://dx.doi.org/10.1038/s41567-025-02935-4)\.
- M\. I\. Freidlin and A\. D\. Wentzell \(1984\) Random perturbations of dynamical systems\. Grundlehren der mathematischen Wissenschaften, Vol\. 260, Springer\. External Links: [Document](https://dx.doi.org/10.1007/978-1-4684-0176-9)\.
- K\. Friston \(2010\) The free\-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, pp\. 127–138\. External Links: [Document](https://dx.doi.org/10.1038/nrn2787)\.
- F\. R\. Hampel \(1971\) A general qualitative definition of robustness\. Annals of Mathematical Statistics 42\(6\), pp\. 1887–1896\. External Links: [Document](https://dx.doi.org/10.1214/aoms/1177693054)\.
- S\. Hanneke, A\. Karbasi, M\. Mahmoody, I\. Mehalel, and S\. Moran \(2022\) On optimal learning under targeted data poisoning\. In Advances in Neural Information Processing Systems \(NeurIPS\)\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2210.02713), [Link](https://arxiv.org/abs/2210.02713)\.
- N\. Hasselmann, A\. Sinner, and P\. Kopietz \(2007\) Two\-parameter scaling of correlation functions near continuous phase transitions\. Physical Review E 76, pp\. 040101\(R\)\. External Links: [Document](https://dx.doi.org/10.1103/PhysRevE.76.040101)\.
- M\. Hennick and G\. Corlouer \(2026\) From density matrices to phase transitions in deep learning: spectral early warnings and interpretability\. arXiv preprint arXiv:2603\.29805\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2603.29805), [Link](https://arxiv.org/abs/2603.29805)\.
- J\. Hoogland, G\. Wang, M\. Farrugia\-Roberts, L\. Carroll, S\. Wei, and D\. Murfet \(2024\) Loss landscape degeneracy and stagewise development in transformers\. arXiv preprint arXiv:2402\.02364\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2402.02364), [Link](https://arxiv.org/abs/2402.02364)\.
- F\. P\. Kelly \(1979\) Reversibility and stochastic networks\. John Wiley & Sons\.
- A\. Kraskov, H\. Stögbauer, and P\. Grassberger \(2004\) Estimating mutual information\. Physical Review E 69, pp\. 066138\. External Links: [Document](https://dx.doi.org/10.1103/PhysRevE.69.066138)\.
- K\. A\. Kvenvolden, J\. Lawless, K\. Pering, E\. Peterson, J\. Flores, C\. Ponnamperuma, I\. R\. Kaplan, and C\. Moore \(1970\) Evidence for extraterrestrial amino\-acids and hydrocarbons in the Murchison meteorite\. Nature 228, pp\. 923–926\. External Links: [Document](https://dx.doi.org/10.1038/228923a0)\.
- L\. Le Cam \(1986\) Asymptotic methods in statistical decision theory\. Springer Series in Statistics, Springer\-Verlag\. External Links: [Document](https://dx.doi.org/10.1007/978-1-4612-4946-7)\.
- G\. Lecué and M\. Lerasle \(2020\) Robust machine learning by median\-of\-means: theory and practice\. Annals of Statistics 48\(2\), pp\. 906–931\. Note: arXiv:1711\.10306 \(2017\)\. External Links: [Document](https://dx.doi.org/10.1214/19-AOS1828)\.
- H\. Lederman and K\. Mahowald \(2026\) Emergent introspection in AI is content\-agnostic\. arXiv preprint arXiv:2603\.05414\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2603.05414), [Link](https://arxiv.org/abs/2603.05414)\.
- M\. Levin \(2023\) Darwin’s agential materials: evolutionary implications of multiscale competency in developmental biology\. Cellular and Molecular Life Sciences 80, pp\. 142\. External Links: [Document](https://dx.doi.org/10.1007/s00018-023-04790-z)\.
- S\. Liang, P\. De Los Rios, and D\. M\. Busiello \(2024a\) Thermodynamic bounds on symmetry breaking in linear and catalytic biochemical systems\. Physical Review Letters 132, pp\. 228402\. External Links: [Document](https://dx.doi.org/10.1103/PhysRevLett.132.228402)\.
- S\. Liang, P\. De Los Rios, and D\. M\. Busiello \(2024b\) Thermodynamic space of chemical reaction networks\. arXiv preprint arXiv:2407\.11498\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2407.11498), [Link](https://arxiv.org/abs/2407.11498)\.
- J\. Lindsey \(2025\) Emergent introspective awareness in large language models\. Note: Transformer Circuits Thread, Anthropic\. External Links: [Link](https://transformer-circuits.pub/2025/introspection/index.html)\.
- E\. H\. Linfoot \(1957\) An informational measure of correlation\. Information and Control 1\(1\), pp\. 85–89\. External Links: [Document](https://dx.doi.org/10.1016/S0019-9958%2857%2990116-X)\.
- Z\. Liu, O\. Kitouni, N\. Nolte, E\. J\. Michaud, M\. Tegmark, and M\. Williams \(2022\) Towards understanding grokking: an effective theory of representation learning\. In Advances in Neural Information Processing Systems \(NeurIPS\)\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2205.10343), [Link](https://arxiv.org/abs/2205.10343)\.
- Z\. Liu, Z\. Zhong, and M\. Tegmark \(2023\)Grokking as compression: a nonlinear complexity perspective\.arXiv preprintarXiv:2310\.05918\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2310.05918),[Link](https://arxiv.org/abs/2310.05918)Cited by:[§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p1.1),[§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p3.1),[§3\.4](https://arxiv.org/html/2605.16325#S3.SS4.SSS0.Px3.p1.3),[§4\.2](https://arxiv.org/html/2605.16325#S4.SS2.SSS0.Px2.p1.4),[§6\.1](https://arxiv.org/html/2605.16325#S6.SS1.SSS0.Px1.p1.9),[§7\.1](https://arxiv.org/html/2605.16325#S7.SS1.p1.1)\.
- U\. Macar, L\. Yang, A\. Wang, P\. Wallich, E\. Ameisen, and J\. Lindsey \(2026\)Mechanisms of introspective awareness\.arXiv preprint arXiv:2603\.21396\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2603.21396),[Link](https://arxiv.org/abs/2603.21396)Cited by:[§3\.2](https://arxiv.org/html/2605.16325#S3.SS2.SSS0.Px2.p2.5),[§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p2.2),[§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px3.p1.3),[§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px5.p1.1),[§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px3.p1.3),[§8](https://arxiv.org/html/2605.16325#S8.p3.4)\.
- T\. Matreux, P\. Aikkila, B\. Scheu, D\. Braun, and C\. B\. Mast \(2024\)Heat flows enrich prebiotic building blocks and enhance their reactivity\.Nature628,pp\. 110–116\.External Links:[Document](https://dx.doi.org/10.1038/s41586-024-07193-7)Cited by:[§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p3.2),[§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p1.1),[§5\.1](https://arxiv.org/html/2605.16325#S5.SS1.SSS0.Px1.p1.4),[§6\.4](https://arxiv.org/html/2605.16325#S6.SS4.p1.1)\.
- J\. Maynard Smith and E\. Szathmáry \(1995\) The major transitions in evolution\. Oxford University Press\. External Links: ISBN 978\-0198502944\. Cited by: [§5\.4](https://arxiv.org/html/2605.16325#S5.SS4.p1.1)\.
- B\. A\. McGuire \(2022\) 2021 census of interstellar, circumstellar, extragalactic, protoplanetary disk, and exoplanetary molecules\. Astrophysical Journal Supplement Series 259\(2\), pp\. 30\. External Links: [Document](https://dx.doi.org/10.3847/1538-4365/ac2a48)\. Cited by: [§5\.1](https://arxiv.org/html/2605.16325#S5.SS1.SSS0.Px1.p1.4)\.
- S\. P\. Meyn and R\. L\. Tweedie \(1993\) Stability of Markovian processes III: Foster–Lyapunov criteria for continuous\-time processes\. Advances in Applied Probability 25\(3\), pp\. 518–548\. External Links: [Document](https://dx.doi.org/10.2307/1427522)\. Cited by: [§2\.1](https://arxiv.org/html/2605.16325#S2.SS1.p1.11)\.
- N\. Nanda, L\. Chan, T\. Lieberum, J\. Smith, and J\. Steinhardt \(2023\) Progress measures for grokking via mechanistic interpretability\. In The Eleventh International Conference on Learning Representations \(ICLR\)\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2301.05217), [Link](https://arxiv.org/abs/2301.05217)\. Cited by: [§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p1.1), [Figure 2](https://arxiv.org/html/2605.16325#S5.F2), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p1.5), [§7\.1](https://arxiv.org/html/2605.16325#S7.SS1.p1.1), [§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px6.p1.1)\.
- Y\. Oba, T\. Koga, Y\. Takano, N\. O\. Ogawa, N\. Ohkouchi, K\. Sasaki, H\. Sato, D\. P\. Glavin, J\. P\. Dworkin, H\. Naraoka, et al\. \(2023\) Uracil in the carbonaceous asteroid \(162173\) Ryugu\. Nature Communications 14, pp\. 1292\. External Links: [Document](https://dx.doi.org/10.1038/s41467-023-36904-3)\. Cited by: [§5\.1](https://arxiv.org/html/2605.16325#S5.SS1.SSS0.Px1.p1.4)\.
- C\. Olsson, N\. Elhage, N\. Nanda, N\. Joseph, N\. DasSarma, T\. Henighan, B\. Mann, A\. Askell, Y\. Bai, A\. Chen, T\. Conerly, D\. Drain, D\. Ganguli, Z\. Hatfield\-Dodds, D\. Hernandez, S\. Johnston, A\. Jones, J\. Kernion, L\. Lovitt, K\. Ndousse, D\. Amodei, T\. Brown, J\. Clark, J\. Kaplan, S\. McCandlish, and C\. Olah \(2022\) In\-context learning and induction heads\. Transformer Circuits Thread, Anthropic\. Note: Preprint: arXiv:2209\.11895\. External Links: [Link](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)\. Cited by: [§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p1.1), [§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px6.p1.1)\.
- S\. Pepin Lehalleur, J\. Hoogland, M\. Farrugia\-Roberts, S\. Wei, A\. Gietelink Oldenziel, G\. Wang, L\. Carroll, and D\. Murfet \(2025\) You are what you eat—AI alignment requires understanding how data shapes structure and generalisation\. arXiv preprint arXiv:2502\.05475\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2502.05475), [Link](https://arxiv.org/abs/2502.05475)\. Cited by: [§3\.4](https://arxiv.org/html/2605.16325#S3.SS4.SSS0.Px1.p1.3), [§7\.2](https://arxiv.org/html/2605.16325#S7.SS2.p1.1)\.
- A\. Power, Y\. Burda, H\. Edwards, I\. Babuschkin, and V\. Misra \(2022\) Grokking: generalization beyond overfitting on small algorithmic datasets\. arXiv preprint arXiv:2201\.02177\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2201.02177), [Link](https://arxiv.org/abs/2201.02177)\. Cited by: [§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p1.1), [Figure 2](https://arxiv.org/html/2605.16325#S5.F2), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p1.5)\.
- R\. Prakki \(2024\) Active inference for self\-organizing multi\-LLM systems: a Bayesian thermodynamic approach to adaptation\. arXiv preprint arXiv:2412\.10425\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2412.10425), [Link](https://arxiv.org/abs/2412.10425)\. Cited by: [§7\.3](https://arxiv.org/html/2605.16325#S7.SS3.p1.1)\.
- V\. N\. Premakumar, M\. Vaiana, F\. Pop, J\. Rosenblatt, D\. Schwerz de Lucena, K\. Ziman, and M\. S\. A\. Graziano \(2024\) Unexpected benefits of self\-modeling in neural systems\. arXiv preprint arXiv:2407\.10188\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2407.10188), [Link](https://arxiv.org/abs/2407.10188)\. Cited by: [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p3.8), [§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px4.p1.3), [§7\.2](https://arxiv.org/html/2605.16325#S7.SS2.p2.5), [§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px2.p1.2)\.
- M\. Prokopenko, P\. C\. W\. Davies, M\. Harré, M\. G\. Heisler, Z\. Kuncic, G\. F\. Lewis, O\. Livson, J\. T\. Lizier, and F\. E\. Rosas \(2025\) Biological arrow of time: emergence of tangled information hierarchies and self\-modelling dynamics\. Journal of Physics: Complexity 6\(1\), pp\. 015006\. External Links: [Document](https://dx.doi.org/10.1088/2632-072X/ad9cdc)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p3.1), [§5\.4](https://arxiv.org/html/2605.16325#S5.SS4.p1.1), [§7\.4](https://arxiv.org/html/2605.16325#S7.SS4.p1.1), [§7\.4](https://arxiv.org/html/2605.16325#S7.SS4.p2.5)\.
- M\. J\. D\. Ramstead, D\. A\. R\. Sakthivadivel, C\. Heins, M\. Koudahl, B\. Millidge, L\. Da Costa, B\. Klein, and K\. J\. Friston \(2023\) On Bayesian mechanics: a physics of and by beliefs\. Interface Focus 13\(3\), pp\. 20220029\. External Links: [Document](https://dx.doi.org/10.1098/rsfs.2022.0029)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p3.1), [§3\.4](https://arxiv.org/html/2605.16325#S3.SS4.SSS0.Px2.p1.7), [§7\.3](https://arxiv.org/html/2605.16325#S7.SS3.p1.1)\.
- S\. K\. Rout, S\. Wunnava, M\. Krepl, G\. Cassone, J\. E\. Šponer, C\. B\. Mast, M\. W\. Powner, and D\. Braun \(2025\) Amino acids catalyse RNA formation under ambient alkaline conditions\. Nature Communications 16, pp\. 5193\. External Links: [Document](https://dx.doi.org/10.1038/s41467-025-60359-3)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p1.1), [§5\.1](https://arxiv.org/html/2605.16325#S5.SS1.SSS0.Px1.p1.4)\.
- J\. Schnakenberg \(1976\) Network theory of microscopic and macroscopic behavior of master equation systems\. Reviews of Modern Physics 48\(4\), pp\. 571–585\. External Links: [Document](https://dx.doi.org/10.1103/RevModPhys.48.571)\. Cited by: [§1\.2](https://arxiv.org/html/2605.16325#S1.SS2.SSS0.Px1.p1.5), [§2\.2](https://arxiv.org/html/2605.16325#S2.SS2.SSS0.Px1.p1.3)\.
- U\. Seifert \(2012\) Stochastic thermodynamics, fluctuation theorems and molecular machines\. Reports on Progress in Physics 75\(12\), pp\. 126001\. External Links: [Document](https://dx.doi.org/10.1088/0034-4885/75/12/126001)\. Cited by: [§2\.2](https://arxiv.org/html/2605.16325#S2.SS2.SSS0.Px1.p1.3)\.
- J\. Singh, B\. Thoma, D\. Whitaker, M\. Satterly Webley, Y\. Yao, and M\. W\. Powner \(2025\) Thioester\-mediated RNA aminoacylation and peptidyl\-RNA synthesis in water\. Nature 644, pp\. 933–944\. External Links: [Document](https://dx.doi.org/10.1038/s41586-025-09388-y)\. Cited by: [§6\.4](https://arxiv.org/html/2605.16325#S6.SS4.p1.1)\.
- A\. Soligo, E\. Turner, S\. Rajamanoharan, and N\. Nanda \(2025\) Convergent linear representations of emergent misalignment\. arXiv preprint arXiv:2506\.11618\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2506.11618), [Link](https://arxiv.org/abs/2506.11618)\. Cited by: [§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px1.p1.1), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.p1.1), [§8](https://arxiv.org/html/2605.16325#S8.p3.4)\.
- A\. Soligo, E\. Turner, M\. Taylor, S\. Rajamanoharan, and N\. Nanda \(2026\) Emergent misalignment is easy, narrow misalignment is hard\. arXiv preprint arXiv:2602\.07852\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2602.07852), [Link](https://arxiv.org/abs/2602.07852)\. Cited by: [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px5.p1.1)\.
- S\. Song, H\. Lederman, J\. Hu, and K\. Mahowald \(2025\) Privileged self\-access matters for introspection in AI\. arXiv preprint arXiv:2508\.14802\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2508.14802), [Link](https://arxiv.org/abs/2508.14802)\. Cited by: [§3\.2](https://arxiv.org/html/2605.16325#S3.SS2.SSS0.Px2.p2.5), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p2.2), [§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px3.p1.3), [§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px3.p1.3), [§8](https://arxiv.org/html/2605.16325#S8.p3.4)\.
- A\. Souly et al\. \(2025\) Poisoning attacks on LLMs require a near\-constant number of poison samples\. arXiv preprint arXiv:2510\.07192\. Note: Anthropic, UK AI Security Institute, Alan Turing Institute\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2510.07192), [Link](https://arxiv.org/abs/2510.07192)\. Cited by: [§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px2.p1.10), [§7\.6](https://arxiv.org/html/2605.16325#S7.SS6.SSS0.Px4.p1.6), [§8](https://arxiv.org/html/2605.16325#S8.p3.4)\.
- Q\. H\. Truong and X\. K\. Truong \(2026a\) Prebiotic selection as a physical process: an information quasi\-potential framework for chemical convergence\. bioRxiv\. Note: Preprint, version 2\. External Links: [Document](https://dx.doi.org/10.64898/2026.04.21.719958), [Link](https://www.biorxiv.org/content/10.64898/2026.04.21.719958v2)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p1.1), [§2\.3](https://arxiv.org/html/2605.16325#S2.SS3.SSS0.Px1.p2.1), [§6\.2](https://arxiv.org/html/2605.16325#S6.SS2.p1.2), [Acknowledgements](https://arxiv.org/html/2605.16325#Sx1.p1.1), [Data accessibility](https://arxiv.org/html/2605.16325#Sx5.p1.1)\.
- X\. K\. Truong, Q\. H\. Truong, D\. T\. Luu, and T\. D\. Phan \(2026\) Why grokking takes so long: a first\-principles theory of representational phase transitions\. arXiv preprint arXiv:2603\.13331\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2603.13331), [Link](https://arxiv.org/abs/2603.13331)\. Cited by: [§3\.3](https://arxiv.org/html/2605.16325#S3.SS3.SSS0.Px2.p1.9), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p1.5), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px1.p1.1)\.
- X\. K\. Truong and Q\. H\. Truong \(2026b\) Norm\-hierarchy transitions in representation learning: when and why neural networks abandon shortcuts\. arXiv preprint arXiv:2603\.07323\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2603.07323), [Link](https://arxiv.org/abs/2603.07323)\. Cited by: [§3\.3](https://arxiv.org/html/2605.16325#S3.SS3.SSS0.Px2.p1.9), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p1.5), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px1.p1.1)\.
- X\. K\. Truong \(2026\) Ontological phase transitions in learning systems: when context shifts force representational restructuring\. Periodic Table of Concepts, Paper V\. SSRN preprint\. Note: Working paper, version 7\. Available on SSRN\. External Links: [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6301678)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p1.1), [§3\.1](https://arxiv.org/html/2605.16325#S3.SS1.SSS0.Px1.p1.1), [§3\.4](https://arxiv.org/html/2605.16325#S3.SS4.SSS0.Px3.p1.3), [Data accessibility](https://arxiv.org/html/2605.16325#Sx5.p1.1)\.
- E\. Turner, A\. Soligo, M\. Taylor, S\. Rajamanoharan, and N\. Nanda \(2025\) Model organisms for emergent misalignment\. arXiv preprint arXiv:2506\.11613\. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2506.11613), [Link](https://arxiv.org/abs/2506.11613)\. Cited by: [§5\.5](https://arxiv.org/html/2605.16325#S5.SS5.SSS0.Px1.p1.1), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px4.p1.1), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.SSS0.Px5.p1.1), [§6\.3](https://arxiv.org/html/2605.16325#S6.SS3.p1.1), [§6\.4](https://arxiv.org/html/2605.16325#S6.SS4.p1.1), [§8](https://arxiv.org/html/2605.16325#S8.p3.4)\.
- S\. I\. Walker and P\. C\. W\. Davies \(2013\) The algorithmic origins of life\. Journal of the Royal Society Interface 10\(79\), pp\. 20120869\. External Links: [Document](https://dx.doi.org/10.1098/rsif.2012.0869)\. Cited by: [§7\.4](https://arxiv.org/html/2605.16325#S7.SS4.p1.1)\.
- S\. Watanabe \(2009\) Algebraic geometry and statistical learning theory\. Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press\. External Links: [Document](https://dx.doi.org/10.1017/CBO9780511800474)\. Cited by: [§1\.3](https://arxiv.org/html/2605.16325#S1.SS3.p3.1), [§3\.4](https://arxiv.org/html/2605.16325#S3.SS4.SSS0.Px1.p1.3), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.p2.1), [§7\.2](https://arxiv.org/html/2605.16325#S7.SS2.p1.1)\.
- J\. Wei, Y\. Tay, R\. Bommasani, C\. Raffel, B\. Zoph, S\. Borgeaud, D\. Yogatama, M\. Bosma, D\. Zhou, D\. Metzler, E\. H\. Chi, T\. Hashimoto, O\. Vinyals, P\. Liang, J\. Dean, and W\. Fedus \(2022\) Emergent abilities of large language models\. Transactions on Machine Learning Research \(TMLR\)\. Note: Preprint: arXiv:2206\.07682\. External Links: [Link](https://openreview.net/forum?id=yzkSU5zdwD)\. Cited by: [§1\.1](https://arxiv.org/html/2605.16325#S1.SS1.p1.1), [§5\.2](https://arxiv.org/html/2605.16325#S5.SS2.SSS0.Px1.p1.5)\.
