Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI Agents
Summary
This paper develops a finite sheaf-theoretic framework for detecting scientific theory shift in AI agents by measuring transport and obstruction across representational contexts, and evaluates it on a benchmark designed to separate deformation within a source language from extension of that language.
# Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI Agents
Source: [https://arxiv.org/html/2605.14033](https://arxiv.org/html/2605.14033)
###### Abstract
Scientific theory shift in AI agents requires more than fitting equations to data\. An artificial scientific agent must detect whether an existing representational framework remains transportable into a new regime, or whether its language has become locally\-to\-globally obstructed and must be extended\. This paper develops a finite sheaf\-theoretic framework for detecting theory\-shift candidates through transport and obstruction\. Contexts are organized as a local\-to\-global structure in which source, overlap, target, and validation charts are fitted, restricted, and tested for gluing\. Obstruction measures failure of coherence through residual fit, overlap incompatibility, constraint violation, limiting\-relation failure, and representational cost\. We evaluate the framework on a controlled transition\-card benchmark designed to separate deformation within a source language from extension of that language\. The main result is direct obstruction ranking: the intended deformation or extension is usually the lowest\-obstruction candidate, and transition type is separated in the benchmark\. A constellation kernel over the same signatures is included only as a secondary representational\-similarity probe\. The aim is not to reconstruct historical paradigm shifts or solve open\-ended autonomous theory invention, but to isolate a finite diagnostic subproblem for AI agents: detecting when representational transport fails and extension becomes the coherent next move\.
###### keywords:
representational change, epistemic architecture, artificial scientific discovery, scientific theory shift, sheaf theory, gluing obstruction, conceptual change
## 1 Introduction
Scientific cognition depends on the reuse of representations across changing contexts\. A representation developed in one regime may remain valid in another as an approximation, a limiting case, or a deformed version of the original model\. Classical mechanics survives as a low\-velocity limit; ideal\-gas reasoning survives in dilute regimes; small\-angle pendulum dynamics survives as a local chart of a nonlinear system\. Scientific understanding therefore requires more than fitting observations\. It requires deciding when an existing representation can be transported and when the representational language itself must be enlarged\.
Current AI-for-science systems increasingly automate pieces of the scientific process: equation discovery, program search, hypothesis generation, tool use, literature search, experiment planning, coding, analysis, and paper-like reporting. Early computational scientific discovery treated law discovery and hypothesis search as explicit computational problems (Langley et al., [1987](https://arxiv.org/html/2605.14033#bib.bib23)). Modern equation-discovery systems extend this agenda through symbolic regression, sparse identification of dynamics, and structured symbolic representations (Schmidt & Lipson, [2009](https://arxiv.org/html/2605.14033#bib.bib34); Brunton et al., [2016](https://arxiv.org/html/2605.14033#bib.bib5); Udrescu & Tegmark, [2020](https://arxiv.org/html/2605.14033#bib.bib41); Cranmer et al., [2020](https://arxiv.org/html/2605.14033#bib.bib8)). Recent benchmark work has further clarified the limits of formula-recovery tasks and the need to evaluate symbolic discovery systems beyond simple curve-fitting settings (Matsubara et al., [2022](https://arxiv.org/html/2605.14033#bib.bib28)).
A second line of work moves from formula recovery toward interactive and agentic scientific systems. Program-search systems such as FunSearch show how language models can participate in mathematically structured exploration (Romera-Paredes et al., [2024](https://arxiv.org/html/2605.14033#bib.bib33)). Interactive environments such as ScienceWorld and DiscoveryWorld evaluate whether agents can plan, experiment, and infer in simplified scientific worlds (Wang et al., [2022](https://arxiv.org/html/2605.14033#bib.bib43); Jansen et al., [2024](https://arxiv.org/html/2605.14033#bib.bib20)). More recent science-agent benchmarks and autonomous research workflows test data-driven scientific tasks, workflow execution, and paper-like research loops (Majumder et al., [2025](https://arxiv.org/html/2605.14033#bib.bib27); Chen et al., [2025](https://arxiv.org/html/2605.14033#bib.bib7); Lu et al., [2024](https://arxiv.org/html/2605.14033#bib.bib25)). These capabilities sharpen rather than remove a deeper question: whether an AI agent can recognize when existing representational resources are no longer adequate, and when genuine scientific progress requires a shift in the theory language rather than further search within the space of learned patterns.
If an artificial scientific agent is to participate in genuine theory change, including discovery\-like transitions comparable in kind to Ptolemaic\-to\-Copernican astronomy, Newtonian\-to\-Einsteinian mechanics, or classical\-to\-quantum transitions, it must diagnose when failure is not merely poor parameterization or insufficient data, but failure of the representational language being transported\. The question is not whether a system can find a better\-fitting formula inside a supplied search space\. The question is whether it can detect the boundary between deformation inside an existing conceptual manifold and extension of that manifold\. This is the diagnostic boundary studied here\. The paper formulates this boundary in sheaf\-theoretic terms: a source theory, a target regime, and their overlap are treated as local contexts, and the test is whether the corresponding charts restrict compatibly, glue across the overlap, preserve limits and constraints, or instead exhibit an obstruction that motivates extension\.
In particular, this paper develops a computational diagnostic for theory transport and extension. We call the structured object being transported a *representational constellation*. A constellation includes observables, law schemas, constraints, measurement roles, limiting regimes, theoretical posits, and admissible transformations. Galilean velocity addition, for example, is not only the equation $w = u + v$; it also carries commitments about absolute time, unrestricted velocity composition, and the absence of an invariant speed. Lorentzian velocity composition changes the equation, but also the constraints, transformations, and limiting relations that define admissible motion. The broader motivation is to develop a formal account of scientific theory shift in the spirit of local-to-global reasoning, with richer Grothendieck-style and topos-theoretic constructions left for future work. The present paper takes a smaller step: it asks how an artificial agent can detect, in a finite setting, when the available local charts no longer glue.
In this paper, a *scientific theory shift* is a transition in which failure cannot be resolved only by parameter adjustment or bounded deformation inside the original representational language. A theory shift occurs when coherence across source, overlap, target, and validation contexts requires a change in the representational constellation itself: a new primitive, constraint, law schema, transformation rule, or limiting relation. The central distinction is therefore between *transport* and *extension*. Transport preserves the representational language and modifies the representation within that language. Extension changes the language by adding one of these representational resources. In future work, richer categorical machinery may provide a way to compare or relate obstructed theoretical contexts through pullback-like constructions; here, however, the objective is deliberately limited to the finite diagnostic problem of detecting when transport fails and extension is required.
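For concreteness, the two composition laws mentioned in this paragraph can be written side by side. The Lorentzian form below is the standard special-relativistic composition for collinear velocities, with $c$ the invariant speed; it is given as textbook background, and the subscripts $\mathrm{Gal}$ and $\mathrm{Lor}$ are labels introduced here rather than notation from the benchmark.

$$w_{\mathrm{Gal}} = u + v, \qquad w_{\mathrm{Lor}} = \frac{u + v}{1 + uv/c^{2}}, \qquad \lim_{c \to \infty} w_{\mathrm{Lor}} = u + v.$$

The limit shows the sense in which the additive law survives as a limiting relation (equivalently, when $uv/c^{2} \ll 1$). For $|u|, |v| < c$ the Lorentzian law also keeps $|w_{\mathrm{Lor}}| < c$, a constraint that the additive law does not carry.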
Open problems such as dark matter or dark energy illustrate the scale of the challenge\. The issue in such cases is not simply to fit one curve, but to assess whether an extended constellation of laws, constraints, scales, measurements, and auxiliary assumptions remains coherent across domains\. The present benchmark is far smaller and does not attempt to model such cases\. It isolates a finite version of the same diagnostic question for AI agents: when does representational transport fail?
Although the definitions introduced here are operational and computational, they are consistent with several conceptual traditions. In Popperian terms, failures matter, but prediction failure alone does not determine whether a problem is local, parametric, or representational (Popper, [1959](https://arxiv.org/html/2605.14033#bib.bib31)). Kuhnian and cognitive accounts emphasize that major scientific shifts involve changes in representational resources, not only better numerical fits (Kuhn, [1962](https://arxiv.org/html/2605.14033#bib.bib22); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40); Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30)). Work on theory building also stresses that scientific construction uses heuristics, case studies, cognitive processes, and defensible but non-guaranteed principles (Danks & Ippoliti, [2018](https://arxiv.org/html/2605.14033#bib.bib10)). Our framework also connects to cognitive-systems accounts in which representational change and knowledge generation are modeled as computational mechanisms (Sun, [2009](https://arxiv.org/html/2605.14033#bib.bib39); Lieto et al., [2019](https://arxiv.org/html/2605.14033#bib.bib24)).
The formalism below makes this diagnostic boundary finite and computable. In standard sheaf theory, local data are assigned to contexts, restriction maps compare descriptions across refinements, and compatible local sections can glue into a coherent global section (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21)). Here this structure is instantiated operationally: contexts are source, overlap, target, and validation regimes; local charts are fitted representational constellations; restriction evaluates charts on shared overlap observations; gluing measures compatibility; and obstruction measures failure to fit, glue, preserve limits, or satisfy constraints. The construction is finite and local-to-global. It is motivated by sheaf-theoretic ideas, but it is not a computation in full topos semantics.
The hypothesis adopted here is that discovery\-like revision begins where representational transport fails\. If a source constellation can be deformed within its original language so that it fits the target, agrees on overlaps, satisfies constraints, and preserves the source limit, the transition is a case of transport\. If no bounded deformation removes the local\-to\-global obstruction, the system must search for a minimal extension of the representational language\. Such an extension is not merely extra flexibility; it is a justified change in the representation that makes the source, overlap, and target regimes coherent again\.
We test this hypothesis with a controlled benchmark built from *transition cards*. A transition card is a finite, structured record of one proposed theory shift: it specifies a source constellation, the regimes in which that constellation is tested, the observations available in source, overlap, target, and validation contexts, and a finite set of candidate representational moves. In this sense, a card is a small computational object for asking whether a source representation can be transported into a new regime or whether the representation itself must be extended. The benchmark contains physics-inspired transition families of two kinds. In deformation-sufficient cards, the correct move remains inside the original representational language. In extension-required cards, the correct move introduces a new primitive, constraint, law schema, transformation rule, or limiting relation. The experiment evaluates whether obstruction signatures rank the intended representational move and whether gluing information contributes to distinguishing transport from representational strain.
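One way to picture a transition card is as a small record type. The sketch below is illustrative only: the class name `TransitionCard`, its field names, and the toy observation values (velocities in units where the invariant speed is 1) are assumptions made for this example, not the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One observation: (input values, measured output). Hypothetical layout.
Observation = Tuple[Tuple[float, ...], float]

CONTEXTS = ("source", "overlap", "target", "validation")


@dataclass
class TransitionCard:
    """Finite record of one proposed theory shift (illustrative schema)."""
    name: str
    source_constellation: str                   # label of the source representation
    observations: Dict[str, List[Observation]]  # keyed by context name
    candidate_moves: List[str]                  # finite set of candidate moves
    intended_move: str                          # label of the intended move
    kind: str                                   # "deformation" or "extension"

    def missing_contexts(self) -> List[str]:
        """Contexts that have no observations attached."""
        return [c for c in CONTEXTS if not self.observations.get(c)]

    def is_well_formed(self) -> bool:
        """All four contexts populated and the intended move is a candidate."""
        return (not self.missing_contexts()
                and self.intended_move in self.candidate_moves
                and self.kind in ("deformation", "extension"))


# Toy card; the numbers are made up for illustration (c = 1 units).
card = TransitionCard(
    name="galilean_to_lorentz",
    source_constellation="galilean_velocity_addition",
    observations={
        "source": [((0.01, 0.02), 0.03)],
        "overlap": [((0.3, 0.3), 0.55)],
        "target": [((0.8, 0.9), 0.988)],
        "validation": [((0.5, 0.7), 0.889)],
    },
    candidate_moves=["rescaled_addition", "lorentz_composition"],
    intended_move="lorentz_composition",
    kind="extension",
)
```

Keeping the four contexts as explicit keys makes it easy to check that a card supplies everything the obstruction test needs before any fitting is attempted.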
This paper takes an initial computational step toward a broader program on genuine scientific discovery in AI agents\. It does not attempt to solve open\-ended autonomous theory invention or to reconstruct historical paradigm shifts\. Instead, it isolates a necessary diagnostic subproblem: when a familiar theory is moved outside its native regime, can an artificial scientific agent detect whether deformation is sufficient, or whether extension of the representational language is required?
### 1.1 Contributions
This paper makes four contributions\. First, it casts scientific theory shift as a finite diagnostic problem for AI agents: detecting when transport inside a source representational language is insufficient and extension is required\. Second, it introduces representational constellations as structured local charts for scientific models\. Third, it formalizes transport, restriction, gluing, obstruction, and minimal extension in a finite sheaf\-theoretic setting\. Fourth, it evaluates the resulting obstruction signatures, together with a secondary constellation\-kernel probe, on controlled transition families that separate deformation from extension\.
## 2 Sheaf-Theoretic Background for Local-to-Global Structure
The mathematical background of the paper is sheaf theory. The relevant idea is local-to-global organization: data, descriptions, or constraints are assigned locally over a base of contexts, and a global description exists only when the local descriptions are compatible on their overlaps. This is the classical role of sheaves in geometry and logic (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21)). Applied sheaf theory uses the same local-to-global structure for data fusion and networked consistency problems (Curry, [2014](https://arxiv.org/html/2605.14033#bib.bib9); Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32)), while finite and cellular sheaf methods make compatibility, inconsistency, and obstruction computationally tractable (Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16); Ayzenberg et al., [2025](https://arxiv.org/html/2605.14033#bib.bib2)).
This local-to-global viewpoint is natural for scientific representations. A theory is often valid first as a local chart over a domain of applicability: a small-angle pendulum model applies over small angular amplitudes, Newtonian kinetic energy applies in a low-velocity regime, and Rayleigh–Jeans reasoning applies in a long-wavelength or low-frequency regime (Feynman et al., [2011](https://arxiv.org/html/2605.14033#bib.bib12)). In each case, the representation carries not only a predictive formula but also a domain of validity, a set of constraints, and a relation to neighboring regimes. The question is therefore not only whether a model fits a set of observations, but whether local descriptions can be restricted, compared, and glued into a coherent larger description.
### 2.1 Contexts, refinements, and covers
Let $\mathcal{C}$ be a finite category of contexts. An object $U \in \mathcal{C}$ represents a regime in which observations, constraints, and representational assumptions are specified. In the present paper, a context is a regime of use: source, overlap, target, validation, or more generally any domain in which a description is meant to apply. For a scientific agent, a context is therefore not just a data subset; it is a regime in which a model is being asked to make sense under particular assumptions. A context may encode a physical regime, an approximation domain, a measurement protocol, a data-quality restriction, or a modeling assumption.
A morphism
$$V \longrightarrow U$$
is interpreted as a refinement or restriction of context. It means that $V$ is a more specific or more demanding view of $U$: a narrower domain, a stricter measurement setting, a higher-resolution regime, or a target regime in which additional constraints become active. In scientific terms, this is the operation of asking how a description valid in one regime behaves when read in a more specific or overlapping regime. This is the same formal role played by restriction maps in sheaf theory and by change-of-context maps in sheaf semantics (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21); Caramello, [2018](https://arxiv.org/html/2605.14033#bib.bib6)).
A cover of a context $U$ is a family of local contexts
$$\{U_i \longrightarrow U\}_{i \in I}$$
that jointly probe $U$. Covers express the idea that a broader regime can be studied through compatible local regimes. For scientific discovery, this means that a theory is not tested in one undifferentiated domain, but across partial regimes whose compatibility must be checked. In the finite experiments below, source, overlap, and target regimes form the operational cover used to test whether a candidate representation can be treated as one coherent description across regimes. The overlap context is central because it is where independently fitted local descriptions are restricted and compared.
As a running example, consider transporting Galilean velocity composition into a higher-velocity regime. The source context $U_s$ contains low-velocity observations where the additive law $w = u + v$ is adequate. The target context $U_t$ contains higher subluminal velocities where invariant-speed constraints become relevant. The overlap context $U_o$ contains intermediate velocities where source-fitted and target-fitted descriptions can both be evaluated. In the finite site used here, $U_o$ is treated as a common refinement of the source and target regimes, with maps $U_o \to U_s$ and $U_o \to U_t$. These maps express that the overlap is the common regime on which the two local descriptions must be restricted and compared. The question is whether these restricted descriptions agree sufficiently to be treated as one transported representation, or whether the mismatch indicates obstruction.
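The finite site of the running example is small enough to write down explicitly. The following is a minimal sketch, assuming contexts are represented by name and a pair `(V, U)` stands for a refinement map from `V` to `U`; the names `CONTEXTS`, `REFINES`, and `common_refinements` are hypothetical helpers introduced for illustration.

```python
# Contexts of the running example and the refinement maps between them.
# A pair (V, U) stands for a morphism V -> U ("V refines U").
# Identity maps are left implicit.
CONTEXTS = {"source", "overlap", "target"}
REFINES = {("overlap", "source"), ("overlap", "target")}


def refines(v: str, u: str) -> bool:
    """True when there is a map v -> u in the finite site."""
    return v == u or (v, u) in REFINES


def common_refinements(a: str, b: str) -> set:
    """Contexts on which descriptions from both a and b can be compared."""
    return {c for c in CONTEXTS if refines(c, a) and refines(c, b)}


print(common_refinements("source", "target"))  # {'overlap'}
```

In this encoding the overlap is the only context that maps to both the source and the target, which is why it is the place where the two fitted descriptions get compared.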
### 2.2 Presheaves as context-dependent descriptions
Intuitively, a presheaf organizes descriptions that depend on context. To connect this intuition with standard notation, let $X$ denote a space of contexts or regimes, and let $\mathcal{O}(X)$ be a family of admissible contexts. A presheaf of representations is written
$$\mathcal{F} : \mathcal{O}(X)^{\mathrm{op}} \to \mathbf{Set}, \qquad U \mapsto \mathcal{F}(U),$$
where $\mathcal{F}(U)$ is the set of admissible descriptions, local laws, or representational constellations on context $U$. For an inclusion or refinement $V \subseteq U$, the restriction map is
$$\rho^{U}_{V} : \mathcal{F}(U) \to \mathcal{F}(V).$$
This is the standard contravariant notation for a presheaf (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21)). In scientific terms, the restriction map answers the question: if a description is valid in one regime, what does that same description imply when it is read, evaluated, or compared in a more specific or overlapping regime?
Here, $X$ is not an arbitrary topological space. The operational site is finite: source, overlap, target, and validation regimes play the role of contexts; candidate constellations play the role of local sections; and restriction is implemented by evaluating fitted charts on shared overlap observations. A presheaf is therefore the bookkeeping structure that records which representational constellations are admissible in each regime and how those constellations are compared across regimes. Because empirical compatibility is approximate, the exact sheaf condition is replaced below by a quantitative obstruction.
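The operational reading of restriction, evaluating a fitted chart on shared overlap inputs, can be sketched in a few lines. This is a toy illustration under stated assumptions: the data are synthetic (generated from Lorentzian composition in units where the invariant speed is 1), the one-parameter chart $w = a(u + v)$ is a stand-in for a source-language candidate, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[float, float]


def lorentz(u: float, v: float, c: float = 1.0) -> float:
    """Synthetic ground truth used only to generate toy data (c = 1 units)."""
    return (u + v) / (1.0 + u * v / c**2)


def fit_scaled_addition(inputs: List[Pair],
                        outputs: List[float]) -> Callable[[float, float], float]:
    """Least-squares fit of the one-parameter chart w = a * (u + v)."""
    num = sum(w * (u + v) for (u, v), w in zip(inputs, outputs))
    den = sum((u + v) ** 2 for u, v in inputs)
    a = num / den
    return lambda u, v: a * (u + v)


def restrict(chart: Callable[[float, float], float],
             overlap_inputs: List[Pair]) -> List[float]:
    """Restriction: read a fitted chart on the shared overlap inputs."""
    return [chart(u, v) for u, v in overlap_inputs]


source_inputs = [(0.01, 0.02), (0.02, 0.03), (0.03, 0.05)]
overlap_inputs = [(0.3, 0.3), (0.3, 0.4), (0.4, 0.4)]

# Fit in the source context, then restrict the fitted chart to the overlap.
source_chart = fit_scaled_addition(
    source_inputs, [lorentz(u, v) for u, v in source_inputs])
restricted = restrict(source_chart, overlap_inputs)
observed = [lorentz(u, v) for u, v in overlap_inputs]
```

In this toy setting the source-fitted chart is nearly exact additive composition, so its restricted predictions overshoot the overlap observations; that gap is the kind of quantity the obstruction is meant to record.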
Applied sheaf theory uses this structure to represent distributed measurements, local constraints, and compatibility across networks or graphs (Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32); Curry, [2014](https://arxiv.org/html/2605.14033#bib.bib9); Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16)). In knowledge representation, sheaf-theoretic formulations describe embeddings or assignments as approximate global sections satisfying local schema constraints (Gebhart et al., [2023](https://arxiv.org/html/2605.14033#bib.bib15)). In graph learning, cellular sheaves generalize graph-based diffusion by assigning structured data and restriction maps to nodes and edges rather than treating the graph as a homogeneous carrier of scalar features (Bodnar et al., [2022](https://arxiv.org/html/2605.14033#bib.bib3)). The common theme is that local assignments matter not only individually, but also through the way they restrict and agree across shared structure.
The key point for scientific representation is that models are not compared only by global prediction error\. They must also behave correctly under restriction\. A representation that fits a target regime but destroys the source limit has not transported the source theory correctly\. A representation that fits two local regimes but gives incompatible consequences on their overlap has not produced a coherent global chart\.
Continuing the Galilean-to-Lorentz example, $\mathcal{F}(U_s)$ contains descriptions admissible in the low-velocity source regime, including the Galilean additive law. The target set $\mathcal{F}(U_t)$ contains descriptions admissible in the higher-velocity regime, where invariant-speed constraints may become active. The overlap set $\mathcal{F}(U_o)$ contains descriptions evaluated in the intermediate regime. Restriction maps such as $\rho^{U_s}_{U_o}$ and $\rho^{U_t}_{U_o}$ express how source-fitted and target-fitted descriptions are read on the common overlap. The obstruction test will ask whether these restricted descriptions agree as one transported constellation, or whether their mismatch indicates that the original presheaf of admissible descriptions must be extended.
### 2.3 The sheaf condition: agreement and gluing
A sheaf is a presheaf with an additional local\-to\-global property\. If several local descriptions agree when restricted to their overlaps, then they should be understood as parts of one coherent global description\. This is the sense in which gluing is used here\. In scientific terms, the point is simple: it is not enough for a model to work separately in several nearby regimes; those local uses must also agree where the regimes meet\.
Suppose a context $U$ is covered by local contexts
$$\{U_i \to U\}_{i \in I}.$$
A local description is a section $s_i \in \mathcal{F}(U_i)$. In the present setting, one can think of $s_i$ as a candidate representational constellation (a law schema together with its constraints and limiting assumptions) as used in regime $U_i$. A family $\{s_i\}_{i \in I}$ is compatible, or a matching family, when its restrictions agree on pairwise overlaps:
$$\rho^{U_i}_{U_i \cap U_j}(s_i) = \rho^{U_j}_{U_i \cap U_j}(s_j) \qquad \text{for all } i, j.$$
Compatibility means that the local descriptions do not contradict one another on their shared domains. They may have been fitted or formulated locally, but once restricted to the common regime, they make the same claims there.
A sheaf is a presheaf in which every compatible family of local sections glues to a unique section $s \in \mathcal{F}(U)$ satisfying
$$\rho^{U}_{U_i}(s) = s_i \qquad \text{for all } i \in I.$$
This is the standard locality-and-gluing condition for sheaves (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21)). Intuitively, if the local pieces already agree wherever they overlap, then there is one coherent global description of which they are the local parts.
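The exact matching-family and gluing conditions can be checked directly in the simplest concrete case: functions on finite sets of points, where a section is a table of values and restriction forgets the points outside a subset. The sketch below uses that standard example; the helper names are hypothetical and the code is not the paper's implementation.

```python
from typing import Dict, Hashable, Iterable, Optional

Section = Dict[Hashable, float]  # a local section: values on a finite set of points


def restrict(section: Section, subset: Iterable[Hashable]) -> Section:
    """Restriction map: keep only the values on the given subset."""
    return {x: section[x] for x in subset}


def is_matching_family(sections: Dict[str, Section]) -> bool:
    """Check that restrictions agree on every pairwise overlap."""
    names = list(sections)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = sections[a].keys() & sections[b].keys()
            if restrict(sections[a], overlap) != restrict(sections[b], overlap):
                return False
    return True


def glue(sections: Dict[str, Section]) -> Optional[Section]:
    """Return the unique glued section, or None when the family does not match."""
    if not is_matching_family(sections):
        return None
    glued: Section = {}
    for s in sections.values():
        glued.update(s)
    return glued


compatible = {"U1": {0: 0.0, 1: 1.0}, "U2": {1: 1.0, 2: 2.0}}
conflicting = {"U1": {0: 0.0, 1: 1.0}, "U2": {1: 1.5, 2: 2.0}}

print(glue(compatible))   # {0: 0.0, 1: 1.0, 2: 2.0}
print(glue(conflicting))  # None
```

The glued section restricts back to each local section by construction, and uniqueness holds because a function on the union is determined by its values on the covering pieces.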
This is the formal local-to-global principle used throughout the paper. Local adequacy is not enough. A candidate representation must also glue. In scientific terms, a model should fit the source context, fit the target context, preserve the correct limiting relation, and give compatible consequences on the overlap. A candidate representation can fit source and target observations separately, but if its source-fitted and target-fitted charts disagree on the overlap, then the local pieces do not glue. Similar local-to-global consistency ideas appear in sheaf models of contextuality (Abramsky & Brandenburger, [2011](https://arxiv.org/html/2605.14033#bib.bib1)), distributed task solvability (Felber et al., [2025](https://arxiv.org/html/2605.14033#bib.bib11)), and sensor integration (Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32)).
Figure [1](https://arxiv.org/html/2605.14033#S2.F1) gives the geometric intuition behind the finite construction used below. In panel (a), the local charts $s_1, s_2, s_3$ restrict compatibly on the overlaps, so they can be interpreted as parts of one coherent global section. In panel (b), the local charts still exist and may each be locally meaningful, but the restricted descriptions fail to agree on the shared overlap, so no coherent gluing is available inside the same representational family. The bottom insets visualize this difference: agreement on the overlap yields a smooth continuation, whereas disagreement produces a visible mismatch.
As a running scientific example, consider again the attempt to transport Galilean velocity composition into a higher-velocity regime. Let $U_1$ denote a low-velocity source regime, $U_3$ a higher-velocity target regime, and $U_2$ an intermediate overlap regime. A source-based local chart $s_1$ may encode the additive law $w = u + v$, while a target-based chart $s_3$ may encode a representation adapted to the higher-velocity regime. The overlap chart $s_2$ represents the intermediate domain in which both sides can be compared. If the restrictions of these descriptions agree on the shared regime, then they behave like panel (a) of Figure [1](https://arxiv.org/html/2605.14033#S2.F1): the theory has been transported coherently. If the restrictions disagree, as in panel (b), then the mismatch indicates that the source representation does not extend coherently across regimes. In the finite setting of this paper, that failure of gluing is what motivates the introduction of obstruction.
Figure 1: Geometric intuition for restriction, gluing, and obstruction. The context landscape is covered by overlapping local regions $U_1, U_2, U_3$, each equipped with a local section $s_i \in \mathcal{F}(U_i)$. In panel (a), the restrictions of the local sections agree on overlaps, so the local descriptions can be glued into a coherent global section $s \in \mathcal{F}(U)$. In panel (b), the restrictions disagree on an overlap, producing a finite local-to-global obstruction within the present representational family. The drawing is schematic: the paper operationalizes this local-to-global idea with finite source, overlap, target, and validation contexts rather than a full topological sheaf.
### 2.4 Obstruction as failed gluing
In the exact sheaf condition, compatibility is categorical: local sections either agree on overlaps or they do not. In empirical settings, compatibility is approximate and quantitative. Data are noisy, models are approximate, and agreement has to be measured rather than asserted. We therefore use a finite obstruction functional. Given a candidate constellation $K$, local contexts $U_i$, and overlaps $U_i \cap U_j$, we measure how far the locally fitted descriptions are from gluing:
$$\mathsf{Glue}(K) = \sum_{i<j} d\!\left(\rho^{U_i}_{U_i \cap U_j}(\widehat{K}_i),\, \rho^{U_j}_{U_i \cap U_j}(\widehat{K}_j)\right),$$
where $\widehat{K}_i$ denotes the candidate fitted in context $U_i$, and $d$ is a discrepancy between restricted predictions or constraint profiles. In scientific terms, $\mathsf{Glue}(K)$ asks whether independently fitted local versions of the same proposed representation make compatible claims where their regimes overlap.
This use of obstruction follows the applied sheaf-theoretic view that incompatibility of local data is itself informative. Cellular sheaf Laplacians and sheaf cohomology provide computational ways to quantify consistency, disagreement, and obstruction in finite settings (Curry, [2014](https://arxiv.org/html/2605.14033#bib.bib9); Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16); Ayzenberg et al., [2025](https://arxiv.org/html/2605.14033#bib.bib2)). The benchmark uses the same principle in a simpler finite form: obstruction is a measured failure of local descriptions to become one coherent representation. In the theory-shift setting, such failure is not treated merely as residual error. It is evidence that the current representational family may not transport across regimes.
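As a toy illustration of the gluing term for a single source/target pair, the sketch below fits two candidate families separately in each context and compares their restricted predictions on the overlap. Everything here is an assumption made for the example: the data are synthetic (Lorentzian composition with invariant speed 1), the discrepancy $d$ is taken to be a mean absolute prediction gap, and the grid search over $c$ is a convenience, not the paper's fitting procedure.

```python
from typing import Callable, List, Tuple

Pair = Tuple[float, float]
Chart = Callable[[float, float], float]
Data = List[Tuple[Pair, float]]


def truth(u: float, v: float) -> float:
    """Synthetic data generator: Lorentzian composition with c = 1."""
    return (u + v) / (1.0 + u * v)


def fit_addition(data: Data) -> Chart:
    """Candidate inside the source language: w = a * (u + v), least-squares a."""
    num = sum(w * (u + v) for (u, v), w in data)
    den = sum((u + v) ** 2 for (u, v), _ in data)
    a = num / den
    return lambda u, v: a * (u + v)


def fit_lorentz(data: Data) -> Chart:
    """Extended candidate: w = (u + v) / (1 + u v / c^2), c picked on a grid."""
    grid = [k / 20 for k in range(10, 41)]  # c from 0.5 to 2.0 in steps of 0.05

    def sse(c: float) -> float:
        return sum((w - (u + v) / (1.0 + u * v / c**2)) ** 2 for (u, v), w in data)

    c_hat = min(grid, key=sse)
    return lambda u, v: (u + v) / (1.0 + u * v / c_hat**2)


def glue(chart_i: Chart, chart_j: Chart, overlap: List[Pair]) -> float:
    """Discrepancy d between two restricted charts: mean absolute gap."""
    return sum(abs(chart_i(u, v) - chart_j(u, v)) for u, v in overlap) / len(overlap)


def sample(pairs: List[Pair]) -> Data:
    return [((u, v), truth(u, v)) for u, v in pairs]


source = sample([(0.01, 0.02), (0.02, 0.03), (0.03, 0.05)])
target = sample([(0.6, 0.7), (0.7, 0.8), (0.8, 0.9)])
overlap = [(0.3, 0.3), (0.3, 0.4), (0.4, 0.4)]

glue_addition = glue(fit_addition(source), fit_addition(target), overlap)
glue_lorentz = glue(fit_lorentz(source), fit_lorentz(target), overlap)
```

In this noise-free toy, the rescaled additive law needs a different scale in each regime, so its two fitted charts disagree on the overlap, while the Lorentzian family recovers the same $c$ in both regimes and glues with essentially zero discrepancy.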
The full obstruction also includes local residuals, constraint violations, limit failures, and representational cost\. Thus obstruction is not simply prediction error\. It measures failure of local\-to\-global coherence:
$$\mathsf{Obs}(K) = w_{\mathrm{res}} R_{\mathrm{res}}(K) + w_{\mathrm{glue}} G_{\mathrm{glue}}(K) + w_{\mathrm{con}} C_{\mathrm{viol}}(K) + w_{\mathrm{lim}} P_{\mathrm{limit}}(K) + \lambda\, \mathsf{Cost}(K).$$
Here $R_{\mathrm{res}}(K)$ aggregates local residuals across source, overlap, target, and validation contexts; $G_{\mathrm{glue}}(K)$ measures disagreement between restricted local charts; $C_{\mathrm{viol}}(K)$ penalizes violations of structural constraints; $P_{\mathrm{limit}}(K)$ penalizes failure to recover the source representation as an appropriate limit; and $\mathsf{Cost}(K)$ penalizes unnecessary representational change.
A low\-obstruction candidate is one whose local descriptions fit, restrict, and glue while preserving the relevant constraints and limiting relations\. A high\-obstruction candidate may fit some local data but fails as a coherent representation across contexts\. In the Galilean\-to\-Lorentz example, this means that a candidate should not only fit low\- and higher\-velocity observations separately; its source\- and target\-fitted charts should also agree on the intermediate overlap and preserve the relevant low\-velocity limit\. Failure on these terms is what the obstruction functional records\.
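The weighted combination and the resulting ranking can be sketched as follows. The component values, the weights, and the candidate names are made-up placeholders chosen only to show the mechanics; they are not benchmark results, and the paper's actual weights are not assumed here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signature:
    """Obstruction components for one candidate (illustrative container)."""
    residual: float   # R_res: aggregated local residuals
    glue: float       # G_glue: disagreement between restricted charts
    violation: float  # C_viol: structural constraint violations
    limit: float      # P_limit: failure to recover the source limit
    cost: float       # Cost: amount of representational change


def obstruction(sig: Signature, w_res: float = 1.0, w_glue: float = 1.0,
                w_con: float = 1.0, w_lim: float = 1.0, lam: float = 0.1) -> float:
    """Weighted sum of the five components; default weights are placeholders."""
    return (w_res * sig.residual + w_glue * sig.glue + w_con * sig.violation
            + w_lim * sig.limit + lam * sig.cost)


def rank(candidates: Dict[str, Signature]) -> List[str]:
    """Candidate names ordered from lowest to highest obstruction."""
    return sorted(candidates, key=lambda name: obstruction(candidates[name]))


# Hypothetical signatures, for illustration only.
candidates = {
    "rescaled_addition": Signature(0.20, 0.26, 0.10, 0.0, 0.0),
    "lorentz_composition": Signature(0.01, 0.0, 0.0, 0.0, 1.0),
    "free_polynomial": Signature(0.02, 0.05, 0.30, 0.40, 2.0),
}

ranking = rank(candidates)
```

The cost term is what keeps a flexible candidate from winning on fit alone: in this made-up example the free polynomial fits well locally but is penalized for breaking constraints, losing the source limit, and changing more of the representation than needed.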
### 2\.5 Transport and extension
The distinction between transport and extension can now be stated in sheaf\-theoretic terms\. Transport tries to carry a description from one context to another while keeping the same representational language\. In the notation above, transport asks whether a source description $s \in \mathcal{F}(U_s)$ can be restricted, deformed, or refitted so that it remains compatible with the overlap and target contexts\. In scientific terms, transport is the case in which the original language still has enough expressive resources to make sense of the new regime\.
Extension changes the presheaf itself\. A new relation, constraint, primitive, law schema, or limiting relation enlarges the set of admissible descriptions:
$$\mathcal{F} \quad\leadsto\quad \mathcal{F}^{+}.$$
The extension is justified when obstruction remains high inside $\mathcal{F}$ but drops after passing to $\mathcal{F}^{+}$\. In other words, the system does not add expressive capacity merely to improve fit; it adds a specific representational resource because that resource restores coherence across source, overlap, and target regimes\.
In the Galilean\-to\-Lorentz example, transport would mean that the Galilean representational language can be deformed or refitted so that its low\-velocity law, overlap behavior, and higher\-velocity predictions remain mutually compatible\. Extension is required when this cannot be done inside the original language\. Adding an invariant\-speed constraint and Lorentzian velocity composition changes the admissible descriptions from $\mathcal{F}$ to an extended family $\mathcal{F}^{+}$, within which the source limit and higher\-velocity regime can be made coherent\.
Thus, the sheaf\-theoretic structure gives the paper its central criterion\. A representational shift is a deformation when coherence can be restored inside the original presheaf of descriptions\. It is an extension when coherence requires enlarging that presheaf\. This is why the benchmark below separates deformation\-sufficient transition families from extension\-required transition families\.
### 2\.6 Finite sheaf model used in this paper
The experiments instantiate the preceding definitions on a finite site\. Each transition card supplies four contexts,
$$U_s,\quad U_o,\quad U_t,\quad U_v,$$
corresponding to source, overlap, target, and validation regimes\. A candidate move $\Delta_j$ produces a candidate constellation $K_j = K_0 + \Delta_j$\. The associated presheaf $\mathcal{F}$ specifies which descriptions are admissible in each context, and the fitted local charts
$$\widehat{K}_{j,s} \in \mathcal{F}(U_s), \qquad \widehat{K}_{j,t} \in \mathcal{F}(U_t)$$
are restricted to the overlap by evaluating them on $U_o$\. Gluing is then the measured disagreement between
$$\rho^{U_s}_{U_o}(\widehat{K}_{j,s}) \quad\text{and}\quad \rho^{U_t}_{U_o}(\widehat{K}_{j,t}),$$
while obstruction combines this disagreement with residual fit, constraint penalties, limit penalties, and representational cost\.
In the Galilean\-to\-Lorentz running example, $U_s$ contains low\-velocity data, $U_o$ contains intermediate velocities where both source\- and target\-fitted charts can be tested, $U_t$ contains higher subluminal velocities, and $U_v$ provides held\-out validation\. A fixed Galilean candidate may fit $U_s$, but its restrictions can disagree with the target\-fitted chart on $U_o$ and fail the invariant\-speed constraints in $U_t$\. A Lorentzian extension changes the admissible constellation family so that the overlap restrictions agree, the low\-velocity limit is preserved, and the higher\-velocity constraints are satisfied\. This finite computation is the operational form of transport, gluing, and obstruction used in the benchmark\.
The construction is finite and computational, but it follows the same local\-to\-global logic used in applied sheaf work on data fusion and distributed consistency, where local assignments are compared through restriction maps and global coherence becomes a computable property \(Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32); Felber et al\., [2025](https://arxiv.org/html/2605.14033#bib.bib11)\)\. Related work in knowledge representation and graph learning likewise treats local assignments, schemas, or cellular sheaf data as objects whose compatibility can be computed \(Gebhart et al\., [2023](https://arxiv.org/html/2605.14033#bib.bib15); Bodnar et al\., [2022](https://arxiv.org/html/2605.14033#bib.bib3); Ayzenberg et al\., [2025](https://arxiv.org/html/2605.14033#bib.bib2)\)\. The benchmark below uses this logic as a diagnostic for scientific theory shift rather than as a full topos\-theoretic semantics\.
## 3 Representational Constellations
Scientific representations are not exhausted by their equations\. A scientific model also carries a structured set of commitments: which quantities are observable, which entities are theoretical posits, which transformations are admissible, which constraints must be preserved, which measurements define the relevant variables, and which limiting regimes connect the model to neighboring descriptions\. This view is consistent with work on models as mediators in scientific practice \(Morgan & Morrison, [1999](https://arxiv.org/html/2605.14033#bib.bib29)\), model\-based reasoning and conceptual change \(Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40)\), and conceptual spaces as structured representational resources \(Gärdenfors, [2000](https://arxiv.org/html/2605.14033#bib.bib13)\)\.
We call this organized structure a *representational constellation*\. A constellation is a local chart for a scientific regime: it contains law schemas, observables, measurement roles, constraints, limit relations, transformation rules, and theoretical posits\. The term “constellation” emphasizes that the object being transported is not a single formula, but a configuration of representational elements that jointly determine what counts as an admissible description\.
For example, Galilean velocity addition is not only the equation $w = u + v$\. It belongs to a constellation involving absolute time, unconstrained velocity composition, Galilean transformations between inertial frames, and the absence of an invariant speed\. Lorentzian velocity composition changes the formula, but also the constraints, transformations, and limiting relations that organize admissible motion\. Likewise, the Rayleigh–Jeans law is not merely a failed high\-frequency formula; it belongs to a classical equipartition constellation without a quantization scale\. Planck\-like radiation introduces a new primitive and a different admissibility structure\. In these cases, conceptual change is not simply parameter replacement, but reorganization of the representational constellation \(Kuhn, [1962](https://arxiv.org/html/2605.14033#bib.bib22); Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40)\)\.
Table 1: Representational constellation shift from Galilean to Lorentzian velocity composition\. The shift changes not only the composition law, but also the constraints, transformations, and limiting structure that define admissible motion\.

### 3\.1 Constituents of a constellation
A representational constellation $\mathcal{K}$ is modeled as a typed structure
$$\mathcal{K} = \bigl(\mathcal{O}, \mathcal{P}, \mathcal{L}, \mathcal{C}_{\mathrm{str}}, \mathcal{M}, \mathcal{R}_{\mathrm{lim}}, \mathcal{T}\bigr),$$
where $\mathcal{O}$ denotes observables, $\mathcal{P}$ theoretical posits, $\mathcal{L}$ law schemas, $\mathcal{C}_{\mathrm{str}}$ structural constraints, $\mathcal{M}$ measurement roles, $\mathcal{R}_{\mathrm{lim}}$ limit relations, and $\mathcal{T}$ admissible transformations\. This tuple is a working representation rather than a complete ontology of science: it keeps the features needed to distinguish deformation within a representational language from extension of that language\.
Each component has a different diagnostic role\. Observables define what can be compared to data\. Law schemas specify functional or relational structure\. Structural constraints encode admissibility conditions such as conservation, boundedness, positivity, finite\-energy behavior, or invariant\-speed conditions\. Limit relations specify how one description reduces to another in a source regime\. Transformation rules specify how quantities may change across frames, scales, or contexts\. Measurement roles connect theoretical quantities to the observations that instantiate them\.
Separating these components matters because failure can occur in different places\. A candidate formula may fit a target regime but violate the source limit\. Another may fit source and target observations but fail to agree on the overlap\. Another may lower residual error by adding flexible terms while violating the structural constraint that made the original representation meaningful\. Treating a model as a constellation makes these failures explicit rather than collapsing them into a single prediction\-error score\.
### 3\.2 Constellations as typed graphs
For computation, a constellation is encoded as a typed graph
$$G_{\mathcal{K}} = (V_{\mathcal{K}}, E_{\mathcal{K}}, \tau_V, \tau_E),$$
where $V_{\mathcal{K}}$ contains representational elements, $E_{\mathcal{K}}$ contains relations among them, and $\tau_V, \tau_E$ assign node and edge types\. The node\-type map $\tau_V$ distinguishes *observables*, *theoretical posits*, *law schemas*, *constraints*, *measurement roles*, *limit relations*, *transformation rules*, and *contexts*\. The edge\-type map $\tau_E$ distinguishes relations of *use*, *assumption*, *constraint*, *preservation*, *validity in a context*, *reduction to a limit*, *extension*, *conflict*, *measurement*, *introduction*, and *removal*\.
The graph does not replace the mathematical law schema\. Rather, if $\ell \in V_{\mathcal{K}}$ is a node of type *law schema*, the incident typed edges record the commitments surrounding $\ell$: assumptions $(\ell, \mathrm{assumes}, p)$, constraints $(\ell, \mathrm{constrains}, c)$, contexts $(\ell, \mathrm{valid\_in}, U)$, limiting relations $(\ell, \mathrm{preserves}, r)$, and representational changes such as $(\ell, \mathrm{introduces}, q)$ or $(\ell, \mathrm{removes}, q')$\. This is why two candidates with similar residual error may still be different constellations: one may preserve a source limit while another breaks it; one may introduce a new primitive while another merely deforms an old formula\.
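The typed\-graph encoding described above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the class name `Constellation`, the method names, and the string spellings of most node and edge types are assumptions; only `assumes`, `constrains`, `valid_in`, `preserves`, `introduces`, and `removes` appear as edge labels in the text.

```python
from dataclasses import dataclass, field

# Type vocabularies follow the lists in the text; the exact spellings are assumed.
NODE_TYPES = {"observable", "posit", "law_schema", "constraint",
              "measurement_role", "limit_relation", "transformation_rule", "context"}
EDGE_TYPES = {"uses", "assumes", "constrains", "preserves", "valid_in",
              "reduces_to", "extends", "conflicts", "measures", "introduces", "removes"}


@dataclass
class Constellation:
    nodes: dict = field(default_factory=dict)  # node name -> node type (tau_V)
    edges: set = field(default_factory=set)    # (source, edge type, target) triples (tau_E)

    def add_node(self, name, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[name] = node_type

    def add_edge(self, src, edge_type, dst):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be declared nodes")
        self.edges.add((src, edge_type, dst))

    def commitments(self, law):
        """Typed edges leaving a law-schema node, sorted for stable comparison."""
        return sorted((kind, dst) for src, kind, dst in self.edges if src == law)


# Hypothetical fragment of the Galilean velocity constellation.
galilean = Constellation()
galilean.add_node("w = u + v", "law_schema")
galilean.add_node("absolute_time", "posit")
galilean.add_node("low_velocity", "context")
galilean.add_edge("w = u + v", "assumes", "absolute_time")
galilean.add_edge("w = u + v", "valid_in", "low_velocity")
```

Two candidates with the same law string but different `commitments` output are different constellations in this encoding, which is the point the paragraph above makes about similar residuals hiding structural differences.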
This representation also makes transition families comparable\. The graph is not intended as a generic knowledge graph, but as a typed record of the local scientific chart being tested\. It keeps the connection to scientific models as structured representational resources \(Gärdenfors, [2000](https://arxiv.org/html/2605.14033#bib.bib13); Morgan & Morrison, [1999](https://arxiv.org/html/2605.14033#bib.bib29)\), while giving the benchmark a computable object for structural comparison\. In the experiments, these graph records support secondary similarity probes through graph\-based representations and graph kernels \(Vishwanathan et al\., [2010](https://arxiv.org/html/2605.14033#bib.bib42)\)\.
### 3\.3 Deformation and extension
A change of constellation can preserve the existing representational language or enlarge it\. A *deformation* modifies a law schema, parameterization, or correction term while keeping the same basic representational resources:
$$\mathcal{K} \leadsto \mathcal{K}_{\theta}.$$
Here $\mathcal{K}$ is the source constellation and $\theta$ denotes an admissible within\-language adjustment\. For example, if $\mathcal{K}$ contains the small\-angle pendulum law, then $\mathcal{K}_{\theta}$ may contain a finite\-angle correction parameterized by amplitude\. If $\mathcal{K}$ contains the ideal\-gas law, then $\mathcal{K}_{\theta}$ may add virial coefficients\. If $\mathcal{K}$ contains Ohm’s law with constant resistance, then $\mathcal{K}_{\theta}$ may add temperature dependence\. In each case, the original variables, constraints, and limiting interpretation remain recognizable, and the source description is preserved as an appropriate limit\.
An *extension* changes what counts as an admissible description:
$$\mathcal{K} \leadsto \mathcal{K}^{+}.$$
Here $\mathcal{K}^{+}$ is not just a reparameterized version of $\mathcal{K}$; it is an enlarged constellation with a new primitive, constraint, transformation rule, limiting relation, or theoretical posit\. If $\mathcal{K}$ is the Galilean velocity constellation, then $\mathcal{K}^{+}$ may add invariant\-speed structure, Lorentz transformations, and Lorentzian velocity composition\. If $\mathcal{K}$ is the Newtonian kinetic\-energy constellation, then $\mathcal{K}^{+}$ may add a relativistic high\-velocity law with the Newtonian expression as a limit\. If $\mathcal{K}$ is the Rayleigh–Jeans constellation, then $\mathcal{K}^{+}$ may add a quantization scale and a finite\-energy constraint\. These moves are not merely larger parameter spaces\. They change the representational resources available to the model\.
This distinction is the one tested in the experiments\. For a transition card $T$, candidate moves generate constellations
$$\mathcal{K}_j = \Delta_j(\mathcal{K}).$$
A card is *deformation\-sufficient* when the lowest\-obstruction candidate belongs to the within\-language family $\{\mathcal{K}_{\theta}\}$\. It is *extension\-required* when every admissible deformation remains obstructed and the lowest\-obstruction candidate lies in an enlarged family $\mathcal{F}^{+}$, represented by some $\mathcal{K}^{+}$\. Thus the benchmark does not ask only which candidate fits the target best; it asks whether low obstruction is achievable inside the source language or only after the language is extended\.
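The two card labels can be read as a decision rule over already computed obstruction values. The sketch below is one possible reading, under stated assumptions: the function name `classify_card`, the tuple layout, the `obstructed_above` threshold for "remains obstructed", and the fallback label `ambiguous` are all hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the benchmark.

```python
def classify_card(candidates, obstructed_above=0.0):
    """Label a card from (name, move_type, obstruction) triples.

    move_type is "deformation" or "extension". A deformation counts as
    "still obstructed" when its obstruction exceeds `obstructed_above`
    (an assumed threshold; the text does not fix one here).
    """
    if not candidates:
        raise ValueError("a card needs at least one candidate move")
    name, move_type, _ = min(candidates, key=lambda c: c[2])
    if move_type == "deformation":
        return "deformation-sufficient", name
    all_deformations_obstructed = all(
        obs > obstructed_above
        for _, kind, obs in candidates
        if kind == "deformation"
    )
    label = "extension-required" if all_deformations_obstructed else "ambiguous"
    return label, name


# Illustrative values only.
velocity_card = [
    ("galilean_fixed", "deformation", 0.80),
    ("galilean_scaled", "deformation", 0.45),
    ("lorentz", "extension", 0.12),
]
pendulum_card = [
    ("small_angle", "deformation", 0.50),
    ("finite_angle", "deformation", 0.05),
    ("new_primitive", "extension", 0.20),
]
velocity_label = classify_card(velocity_card, obstructed_above=0.3)
pendulum_label = classify_card(pendulum_card, obstructed_above=0.3)
```

The `ambiguous` branch covers the case where an extension wins the ranking while some deformation is already below the threshold; the text does not say how such a card should be labeled, so the sketch leaves it unresolved.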
### 3\.4 Constellations as local charts
A constellation $\mathcal{K}$ is indexed by a context $U$\. It is not simply true or false globally; it is valid over a regime, under specified measurement roles, constraints, and limiting assumptions\. A fitted constellation in context $U$ is therefore treated as a local chart
$$\widehat{\mathcal{K}}_{U} \in \mathcal{F}(U),$$
where $\mathcal{F}(U)$ is the set of admissible constellations in that context\. The sheaf\-theoretic question is whether charts fitted in different contexts can be restricted to their overlaps and glued into a coherent larger chart\.
For a transition card $T$, formally defined in the next section, the source constellation $\mathcal{K}_0$ is tested against source, overlap, target, and validation observations\. Each candidate move $\Delta_j$ acts on the source constellation to produce a candidate constellation $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$\. Fitting $\mathcal{K}_j$ in the source and target regimes gives local charts
$$\widehat{\mathcal{K}}_{j,s} \in \mathcal{F}(U_s), \qquad \widehat{\mathcal{K}}_{j,t} \in \mathcal{F}(U_t).$$
These charts are compared by their restrictions to the overlap,
$$\rho^{U_s}_{U_o}(\widehat{\mathcal{K}}_{j,s}) \quad\text{and}\quad \rho^{U_t}_{U_o}(\widehat{\mathcal{K}}_{j,t}).$$
In the Galilean\-to\-Lorentz example, this asks whether the low\-velocity source\-fitted chart and the higher\-velocity target\-fitted chart make compatible claims in the intermediate velocity regime\. Agreement supports transport; disagreement contributes to obstruction\.
This local\-chart view connects the representational object $\mathcal{K}_j$ to the finite local\-to\-global test used in the experiments\. Section [4](https://arxiv.org/html/2605.14033#S4) turns the comparison above into an obstruction functional: a candidate transports when its fitted charts agree on overlaps while preserving constraints and limits, and it requires extension when low obstruction is attainable only after enlarging the representational constellation\.
## 4 Transport, Gluing, and Obstruction Formalism
The experiments use a finite site of scientific contexts and a scalar obstruction functional\. Contexts form the site, constellations are fitted as local charts, restriction compares those charts on overlaps, gluing measures their compatibility, and obstruction quantifies failure of local\-to\-global coherence \(Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21)\)\. This finite construction follows the computational direction of applied sheaf work on data fusion, consistency, and cellular sheaf methods \(Curry, [2014](https://arxiv.org/html/2605.14033#bib.bib9); Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32); Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16)\)\.
### 4\.1 Finite site of scientific contexts
Let $\mathcal{C}$ be a finite category of scientific contexts with basic objects
$$c_s, \qquad c_o, \qquad c_t, \qquad c_v,$$
representing source, overlap, target, and validation regimes\. The source context $c_s$ is where the starting constellation is locally adequate\. The target context $c_t$ is where transport or extension is tested\. The overlap context $c_o$ is the common regime on which source\-fitted and target\-fitted charts are restricted and compared\. The validation context $c_v$ is held out from the selection obstruction and used only as a diagnostic\.
The intended global regime is probed by the finite cover
$$\{c_s, c_o, c_t\}.$$
Thus a candidate constellation is not evaluated only by target fit\. It must remain adequate on $c_s$, fit $c_t$, preserve the relevant source limit, and give compatible consequences on $c_o$\. The finite site is therefore the minimal structure used to test whether a representation still transports across regimes\.
### 4\.2 Transition cards
###### Definition 1 \(Transition card\)\.
A transition card is a tuple
$$T = \bigl(\mathcal{K}_0, D_s, D_o, D_t, D_v, \{\Delta_j\}_{j=1}^{m}\bigr),$$
where $\mathcal{K}_0$ is the source constellation, $D_s, D_o, D_t, D_v$ are observations in the source, overlap, target, and validation contexts, and $\{\Delta_j\}_{j=1}^{m}$ is a finite set of admissible representational moves\. Each move is typed as a deformation or an extension\.
A transition card is the finite object on which the ranking problem is defined\. It presents an artificial scientific agent with a source constellation $\mathcal{K}_0$, evidence from several regimes, and a finite menu of possible ways to modify the source representation\. The task is not to generate a theory from nothing, but to decide which candidate move best restores local\-to\-global coherence when the source constellation is tested outside its native regime\. The evaluation label is not used by the ranking functional; it is used only after ranking to check whether the selected move is the intended deformation or extension\.
The candidate move $\Delta_j$ acts on the source constellation to produce
$$\mathcal{K}_j = \Delta_j(\mathcal{K}_0).$$
This notation makes explicit that a move is an operation on the constellation\. If $\Delta_j$ is a deformation, then $\mathcal{K}_j$ stays inside the original representational family: it changes parameters, correction terms, or law schemas while preserving the basic variables, constraints, and limiting interpretation\. If $\Delta_j$ is an extension, then $\mathcal{K}_j$ belongs to an enlarged family: it adds a primitive, constraint, law schema, transformation rule, theoretical posit, or limit relation\. The obstruction computation below is performed for each $\mathcal{K}_j$, and the candidates are ranked by their resulting obstruction values\.
The benchmark has three nested levels\. A *transition family* is an archetype of representational change, such as Galilean\-to\-Lorentzian velocity composition\. A *transition card* is one concrete instance within such a family, with its own observations and candidate moves\. An *observation record* is one local context\-indexed numerical observation inside $D_s, D_o, D_t$, or $D_v$\. Thus, statements such as “40 source observations” refer to 40 local records inside a single transition card, not to 40 distinct transition cards\. For a schematic view of this hierarchy and a running example, see Appendix [B](https://arxiv.org/html/2605.14033#A2)\.
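The tuple in Definition 1 and the three nested levels map naturally onto a small record type. The following is a hedged sketch only: the class names `Move` and `TransitionCard`, the dictionary layout of the source constellation, and the single placeholder observation are assumptions made for illustration, not structures taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[float, float]  # one observation record: (input x, observed y)


@dataclass(frozen=True)
class Move:
    name: str
    kind: str                      # "deformation" or "extension"
    apply: Callable[[dict], dict]  # Delta_j acting on the source constellation


@dataclass
class TransitionCard:
    family: str                    # transition family (the archetype)
    source: dict                   # K_0, here a plain dict of typed components
    data: Dict[str, List[Record]]  # keys "s", "o", "t", "v" for D_s, D_o, D_t, D_v
    moves: List[Move]

    def __post_init__(self):
        if set(self.data) != {"s", "o", "t", "v"}:
            raise ValueError("data must have exactly the contexts s, o, t, v")
        for move in self.moves:
            if move.kind not in ("deformation", "extension"):
                raise ValueError(f"untyped move: {move.name}")

    def candidates(self) -> Dict[str, dict]:
        """Apply every move to K_0, giving K_j = Delta_j(K_0)."""
        return {m.name: m.apply(self.source) for m in self.moves}


def add_invariant_speed(K):
    # Extension: build a new dict with an added constraint; K_0 is not mutated.
    return {**K, "constraints": K["constraints"] + ["|w| < c"]}


card = TransitionCard(
    family="galilean_to_lorentz",
    source={"laws": ["w = u + v"], "constraints": []},
    data={"s": [(0.01, 0.02)], "o": [], "t": [], "v": []},  # placeholder record
    moves=[
        Move("unchanged", "deformation", lambda K: dict(K)),
        Move("invariant_speed", "extension", add_invariant_speed),
    ],
)
```

Here `len(card.data["s"])` counts observation records inside one card, which is the distinction the paragraph above draws between records and cards.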
### 4\.3 Local fitting and restriction
Each candidate constellation $\mathcal{K}_j$ determines a model family
$$\mathcal{M}_j = \{ f_{\mathcal{K}_j,\theta} : \theta \in \Theta_j \},$$
where $j$ indexes the candidate move, $\theta$ denotes the parameters that can still be fitted inside that candidate, and $f_{\mathcal{K}_j,\theta}$ is the predictive chart associated with $\mathcal{K}_j$\. In the Galilean\-to\-Lorentz case, for example, different values of $j$ may correspond to the unchanged Galilean law, a bounded deformation, a wrong extension, or the Lorentzian extension; $\theta$ then represents the adjustable parameters allowed within that chosen constellation\. Fitting the candidate on a context\-indexed dataset $D$ selects the chart
$$\mathsf{Fit}(\mathcal{K}_j; D) = f_{\mathcal{K}_j,\widehat{\theta}(D)}, \qquad \widehat{\theta}(D) = \operatorname*{arg\,min}_{\theta \in \Theta_j} \sum_{(x,y) \in D} \ell\!\left(f_{\mathcal{K}_j,\theta}(x), y\right).$$
The loss $\ell$ depends on the observation type of the transition family; in the benchmark it is a normalized prediction loss over the observed quantities\.
For each candidate $\mathcal{K}_j$, the source and target fits are
$$\widehat{\mathcal{K}}_{j,s} = \mathsf{Fit}(\mathcal{K}_j; D_s), \qquad \widehat{\mathcal{K}}_{j,t} = \mathsf{Fit}(\mathcal{K}_j; D_t).$$
The first subscript $j$ identifies the candidate constellation, while $s$ and $t$ identify the source and target regimes\. Thus $\widehat{\mathcal{K}}_{j,s}$ is the version of candidate $j$ fitted on source data, and $\widehat{\mathcal{K}}_{j,t}$ is the same candidate fitted on target data\. A global fit is also computed over the source, overlap, and target data:
$$\widehat{\mathcal{K}}_{j,g} = \mathsf{Fit}(\mathcal{K}_j; D_s \cup D_o \cup D_t).$$
The local fits test whether independently adapted source and target charts can agree on the overlap; the global fit supplies the residuals and structural terms entering the obstruction functional\.
Restriction evaluates the local charts on the overlap context:
$$\rho_{s \to o}(\widehat{\mathcal{K}}_{j,s}), \qquad \rho_{t \to o}(\widehat{\mathcal{K}}_{j,t}).$$
Here $s \to o$ and $t \to o$ indicate that a source\-fitted or target\-fitted chart is being read on the common overlap regime\. In the finite empirical setting, these restrictions are the predictions and constraint profiles induced by the fitted charts on the overlap observations $D_o$\. In the Galilean\-to\-Lorentz example, they ask how the low\-velocity fit and the higher\-velocity fit behave on the same intermediate\-velocity observations\. If those restricted charts disagree, the candidate may fit locally but fail to transport coherently\.
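To make local fitting and restriction concrete, the sketch below fits one hypothetical within\-language candidate, a rescaled Galilean law $w = \theta\,(u + v)$, separately on source and target data and then reads both fitted charts on the overlap. Everything specific here is an assumption: the candidate family, the plain squared loss (the benchmark uses a normalized loss that this section does not spell out), the choice $c = 1$, and the synthetic data generated from the Lorentzian composition law.

```python
def lorentz(u, v, c=1.0):
    """Lorentzian velocity composition, used here only to generate synthetic data."""
    return (u + v) / (1.0 + u * v / c**2)


def scaled_galilean(theta):
    """Hypothetical deformation chart f_theta(u, v) = theta * (u + v)."""
    return lambda u, v: theta * (u + v)


def fit_scaled_galilean(data):
    """Closed-form least-squares estimate of theta for w = theta * (u + v)."""
    num = sum((u + v) * w for (u, v), w in data)
    den = sum((u + v) ** 2 for (u, v), _ in data)
    return num / den


def make_data(speeds):
    # Each record is ((u, v), w) with u = v = s, in units where c = 1.
    return [((s, s), lorentz(s, s)) for s in speeds]


def restrict(chart, overlap):
    """Restriction to the overlap: the chart's predictions on the overlap inputs."""
    return [chart(u, v) for (u, v), _ in overlap]


D_s = make_data([0.01, 0.02, 0.03])  # low-velocity source regime
D_o = make_data([0.3, 0.4])          # intermediate overlap regime
D_t = make_data([0.7, 0.8, 0.9])     # higher subluminal target regime

theta_s = fit_scaled_galilean(D_s)   # close to 1: Galilean addition holds at low speed
theta_t = fit_scaled_galilean(D_t)   # noticeably below 1: the target pulls the scale down

rho_s = restrict(scaled_galilean(theta_s), D_o)
rho_t = restrict(scaled_galilean(theta_t), D_o)
```

The two restricted prediction lists differ on every overlap point, which is the situation the text describes as fitting locally while failing to transport coherently.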
### 4\.4 Residual terms
For each candidate $\mathcal{K}_j$, the global fit $\widehat{\mathcal{K}}_{j,g}$ is evaluated on the four context\-indexed datasets\. This gives normalized residuals
$$R_s(\mathcal{K}_j), \quad R_o(\mathcal{K}_j), \quad R_t(\mathcal{K}_j), \quad R_v(\mathcal{K}_j),$$
where the subscripts denote source, overlap, target, and validation regimes\. A typical residual has the form
$$R_c(\mathcal{K}_j) = \operatorname{nRMSE}\left(\widehat{\mathcal{K}}_{j,g}(D_c), D_c\right), \qquad c \in \{s, o, t, v\}.$$
Thus $R_s$ tests whether the candidate remains adequate in the original source regime, $R_t$ tests whether it fits the new target regime, and $R_o$ tests behavior in the intermediate overlap regime\. In the Galilean\-to\-Lorentz example, these terms ask whether a candidate fits low\-velocity data, higher subluminal data, and the intermediate regime where the two descriptions can be compared\. The validation residual $R_v$ is held out from the selection functional; it is used only for diagnostic reporting and for the broader obstruction signature used by the secondary kernel probe\.
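A normalized residual of this form can be sketched as follows. The normalization is an assumption: this section does not say how nRMSE is scaled, so the example divides the root\-mean\-square error by the range of the observed values and falls back to the raw RMSE when that range is zero. The helper names and the toy data are likewise illustrative.

```python
import math


def nrmse(predicted, observed):
    """Root-mean-square error divided by the range of the observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    spread = max(observed) - min(observed)
    # Assumed convention: use the raw RMSE when the observations are constant.
    return math.sqrt(mse) / spread if spread > 0 else math.sqrt(mse)


def context_residuals(chart, datasets):
    """R_c for each context c, evaluating one globally fitted chart everywhere."""
    return {
        c: nrmse([chart(x) for x, _ in D], [y for _, y in D])
        for c, D in datasets.items()
    }


# Toy data: the identity chart is exact on "s" and off by 1 and 2 on "t".
datasets = {"s": [(1.0, 1.0), (2.0, 2.0)], "t": [(3.0, 4.0), (4.0, 6.0)]}
residuals = context_residuals(lambda x: x, datasets)
```

Passing a dataset keyed `"v"` through the same helper would give $R_v$; keeping it out of the selection score is then a matter of which keys the ranking step reads.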
### 4\.5 Gluing residual
The gluing test compares two local views of the same candidate constellation\. For candidate $\mathcal{K}_j$, the source fit $\widehat{\mathcal{K}}_{j,s}$ and target fit $\widehat{\mathcal{K}}_{j,t}$ are both evaluated on the overlap regime:
$$\rho_{s \to o}(\widehat{\mathcal{K}}_{j,s}) \simeq \rho_{t \to o}(\widehat{\mathcal{K}}_{j,t}).$$
The symbol $\simeq$ indicates approximate agreement rather than exact equality\. In the finite benchmark, agreement means that the two restricted charts make compatible predictions and satisfy compatible constraint profiles on the same overlap observations\.
We define the finite gluing residual as
$$G_{\mathrm{glue}}(\mathcal{K}_j) = d_o\left(\rho_{s \to o}(\widehat{\mathcal{K}}_{j,s}), \rho_{t \to o}(\widehat{\mathcal{K}}_{j,t})\right),$$
where $d_o$ is a normalized discrepancy on the overlap\. In the Galilean\-to\-Lorentz example, this term compares what the low\-velocity source\-fitted chart and the higher\-velocity target\-fitted chart imply in the intermediate velocity regime\. A small value means that the candidate transports smoothly across the overlap; a large value means that the local charts remain in tension even if each fits its own regime\.
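One simple choice for the discrepancy $d_o$ is a root\-mean\-square difference between the two restricted prediction vectors, scaled by the spread of the overlap observations. This is an assumed form, not the paper's definition, and it covers only the prediction part of the comparison; the constraint profiles mentioned above are left out of the sketch.

```python
import math


def gluing_residual(rho_s, rho_t, overlap_observed):
    """RMS disagreement between two restricted charts on the same overlap points.

    rho_s, rho_t: predictions of the source-fitted and target-fitted charts.
    overlap_observed: observed overlap values, used only to set the scale.
    """
    if not rho_s or len(rho_s) != len(rho_t):
        raise ValueError("restricted charts must cover the same overlap points")
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(rho_s, rho_t)) / len(rho_s))
    spread = max(overlap_observed) - min(overlap_observed)
    # Assumed convention: unnormalized value when the overlap data are constant.
    return rms / spread if spread > 0 else rms


# Two charts that agree on the overlap glue with zero residual.
glued = gluing_residual([0.5, 0.6], [0.5, 0.6], [0.5, 0.7])
# Two charts offset by 1 on an overlap whose observations span 2.
in_tension = gluing_residual([1.0, 2.0], [2.0, 3.0], [0.0, 2.0])
```

Note that the observations enter only through the scale: the residual measures chart\-against\-chart disagreement, so both charts can be equally wrong about the data and still glue.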
This term is the finite counterpart of the sheaf gluing condition\. A candidate may fit source and target data separately, yet fail to glue because the two fitted charts give incompatible consequences where their domains meet\. This is the representational situation that the obstruction functional is designed to detect\. Similar compatibility and consistency ideas appear in applied sheaf theory for sensor integration \(Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32)\), cellular sheaf Laplacians and obstruction measures \(Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16); Ayzenberg et al\., [2025](https://arxiv.org/html/2605.14033#bib.bib2)\), and distributed systems or graph learning \(Felber et al\., [2025](https://arxiv.org/html/2605.14033#bib.bib11); Bodnar et al\., [2022](https://arxiv.org/html/2605.14033#bib.bib3)\)\.
Figure 2: Finite local\-to\-global obstruction computation\. For each candidate constellation, source and target fits are restricted to the overlap to measure gluing failure\. Global residuals, structural penalties, limit penalties, and representational cost define the selection obstruction $\mathsf{Obs}_S$\. The validation residual $R_v$ is held out and used only for diagnostic signatures\.
### 4\.6 Constraint and limit penalties
Residual error alone is not enough\. For a candidate constellation $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$, good prediction on $D_s, D_o, D_t$ does not guarantee that the candidate is an admissible scientific representation\. It may fit the observations while violating an invariant, boundedness condition, conservation relation, monotonicity requirement, or source\-regime limit\. The obstruction functional therefore includes two structural penalties\.
The constraint\-violation term
$$C_{\mathrm{viol}}(\mathcal{K}_j)$$
measures how strongly $\mathcal{K}_j$ violates the admissibility constraints encoded in the transition family\. These constraints are part of the representational constellation, not external afterthoughts: they specify what counts as an allowed description in the relevant contexts\. In the benchmark, examples include the speed\-bound constraint for Lorentzian velocity composition, finite\-energy constraints for Planck\-like radiation, low\-density admissibility for virial corrections, and sign or monotonicity constraints for response functions\.
The limit\-preservation penalty
$$P_{\mathrm{limit}}(\mathcal{K}_j)$$
measures whether $\mathcal{K}_j$ recovers the source constellation $\mathcal{K}_0$ in the appropriate source regime $c_s$ or limiting domain\. This term is essential because many successful extensions preserve older theories as limiting cases rather than simply replacing them\. Relativistic kinetic energy must reduce to the Newtonian expression when $v/c \ll 1$; Planck\-like radiation must recover the appropriate classical behavior in its limiting regime; finite\-angle pendulum dynamics must recover the small\-angle approximation\. In the Galilean\-to\-Lorentz case, an extension $\mathcal{K}_j \in \mathcal{F}^{+}$ is acceptable only if it satisfies the target invariant\-speed constraint while preserving the Galilean low\-velocity limit of $\mathcal{K}_0$\.
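Two of the limits named above can be checked by a short expansion. These are standard textbook results written out for reference, not derivations taken from the paper; the symbols $m$, $\gamma$, and $E_k$ are introduced here for the illustration.

```latex
% Relativistic kinetic energy reduces to the Newtonian expression for v/c << 1.
\begin{aligned}
\gamma &= \left(1 - \frac{v^2}{c^2}\right)^{-1/2}
        = 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4}
          + O\!\left(\frac{v^6}{c^6}\right), \\
E_k &= (\gamma - 1)\, m c^2
     = \frac{1}{2} m v^2 + \frac{3}{8}\frac{m v^4}{c^2}
       + O\!\left(\frac{m v^6}{c^4}\right).
\end{aligned}

% Lorentzian velocity composition reduces to Galilean addition for |uv| << c^2.
w = \frac{u + v}{1 + uv/c^2}
  = (u + v)\left(1 - \frac{uv}{c^2} + O\!\left(\frac{u^2 v^2}{c^4}\right)\right)
  \;\longrightarrow\; u + v .
```

In both cases the leading correction is suppressed by a power of $v/c$, so a limit penalty evaluated on low\-velocity source data would be small for these extensions and large for a candidate whose low\-velocity behavior departs from the source law at leading order.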
### 4\.7 Cost of representational change
Each candidate move $\Delta_j$ is assigned a representational cost
$$\mathsf{Cost}(\Delta_j).$$
This term penalizes changes of language that are not needed to restore coherence\. A deformation usually has lower cost because it keeps the source constellation $\mathcal{K}_0$ within the same representational family and modifies only a parameter, correction term, or law schema\. An extension has higher cost when it adds a new primitive, constraint, transformation rule, theoretical posit, or limiting relation\. The cost term prevents the ranking rule from selecting a more expressive constellation merely because it can fit the target data\.
The cost is therefore not a generic complexity penalty\. It is tied to the representational move being evaluated: the extension $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$ must pay for the new resources it introduces\. Such cost is justified only when it reduces residual, gluing, constraint, or limit obstruction enough to make the larger constellation more coherent than any admissible deformation\. This follows the same general principle used in model selection and scientific representation: increased expressive power must be justified by improved coherence, not merely by local fit \(Morgan & Morrison, [1999](https://arxiv.org/html/2605.14033#bib.bib29); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40); Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30)\)\.
### 4.8 Obstruction functional
The selection obstruction of candidate $\mathcal{K}_j$ is
$$\begin{split}\mathsf{Obs}_S(\mathcal{K}_j) ={}& w_s R_s(\mathcal{K}_j) + w_o R_o(\mathcal{K}_j) + w_t R_t(\mathcal{K}_j)\\ &+ w_g G_{\mathrm{glue}}(\mathcal{K}_j) + w_c C_{\mathrm{viol}}(\mathcal{K}_j) + w_l P_{\mathrm{limit}}(\mathcal{K}_j) + \lambda\,\mathsf{Cost}(\Delta_j).\end{split} \tag{1}$$
The subscript $S$ marks this as the obstruction used for candidate selection. It combines source, overlap, and target residuals with gluing failure, constraint violation, limit failure, and representational cost. The held-out validation residual $R_v$ is not included in $\mathsf{Obs}_S$; it is used only for diagnostic reporting and for the broader obstruction signatures defined below.
A low-obstruction candidate is one with small source, overlap, and target residuals $R_s, R_o, R_t$, small gluing discrepancy $G_{\mathrm{glue}}$, low constraint and limit penalties $C_{\mathrm{viol}}$ and $P_{\mathrm{limit}}$, and no unnecessary representational cost $\mathsf{Cost}(\Delta_j)$. A high-obstruction candidate may fit one regime well, but fails as a coherent local-to-global representation because one or more of these terms remains large. In the Galilean-to-Lorentz case, for example, a fixed Galilean candidate may have low $R_s$ but large $R_t$, $C_{\mathrm{viol}}$, or $G_{\mathrm{glue}}$. A Lorentzian extension pays cost $\mathsf{Cost}(\Delta_j)$, but is preferred only if the reduction in residual, gluing, constraint, and limit penalties lowers the total $\mathsf{Obs}_S(\mathcal{K}_j)$.
The obstruction functional is the central quantitative object of the paper. It turns the qualitative distinction between deformation and extension into a ranking criterion: the selected move is the candidate with minimal $\mathsf{Obs}_S(\mathcal{K}_j)$.
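Eq. (1) is a weighted sum followed by an arg min, which can be sketched in a few lines. The weights and term values below are placeholders chosen for illustration: they respect the ordering described for the benchmark (unit source and overlap weights, larger target and gluing weights, largest constraint weight, smaller cost weight), but they are not the paper's reference values, and the value of `w_l` in particular is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Terms:
    """Selection terms of Eq. (1) for one candidate constellation K_j."""
    r_s: float      # source residual
    r_o: float      # overlap residual
    r_t: float      # target residual
    g_glue: float   # gluing discrepancy
    c_viol: float   # constraint violation
    p_limit: float  # limit failure
    cost: float     # representational cost of the move

# Hypothetical weights (assumptions, not the paper's table values).
WEIGHTS = {"w_s": 1.0, "w_o": 1.0, "w_t": 2.0, "w_g": 2.0,
           "w_c": 3.0, "w_l": 2.0, "lam": 0.5}

def obs_s(t: Terms, w: dict = WEIGHTS) -> float:
    """Weighted selection obstruction; the validation residual R_v is excluded."""
    return (w["w_s"] * t.r_s + w["w_o"] * t.r_o + w["w_t"] * t.r_t
            + w["w_g"] * t.g_glue + w["w_c"] * t.c_viol
            + w["w_l"] * t.p_limit + w["lam"] * t.cost)

# Made-up term values for a Galilean-to-Lorentz style card.
candidates = {
    "base":        Terms(0.01, 0.05, 0.90, 0.60, 1.00, 0.00, 0.0),
    "deformation": Terms(0.02, 0.06, 0.40, 0.35, 0.80, 0.05, 0.5),
    "extension":   Terms(0.01, 0.02, 0.03, 0.02, 0.00, 0.01, 1.6),
}
scores = {name: obs_s(t) for name, t in candidates.items()}
ranking = sorted(scores, key=scores.get)  # ascending obstruction
best = ranking[0]
print(ranking, scores)
```

With these illustrative numbers the extension pays the largest cost term yet has the lowest total, which is the pattern the text describes for an extension-required transition.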
### 4.9 Transport, obstruction, and extension
The obstruction ranking turns the transport-versus-extension distinction into a decision rule. Let
$$j^{\star} = \operatorname*{arg\,min}_{1 \leq j \leq m} \mathsf{Obs}_S(\mathcal{K}_j)$$
be the lowest-obstruction candidate for a transition card $T$. The transition is *transportable* when the selected candidate $\mathcal{K}_{j^{\star}}$ belongs to the deformation family:
$$\mathcal{K}_{j^{\star}} \in \{\mathcal{K}_j : \Delta_j\ \text{is a deformation}\}.$$
In this case, the source constellation remains adequate after bounded within-language modification.
The transition is *extension-required* when fixed-language deformations retain higher obstruction and the selected candidate belongs to the extension family:
$$\mathcal{K}_{j^{\star}} \in \{\mathcal{K}_j : \Delta_j\ \text{is an extension}\}.$$
In this case, low obstruction is achieved only after enlarging the representational language by adding a primitive, constraint, transformation rule, law schema, theoretical posit, or limit relation. The Galilean-to-Lorentz case has this form when Galilean deformations retain large $R_t$, $G_{\mathrm{glue}}$, or $C_{\mathrm{viol}}$, while the Lorentzian extension lowers the total $\mathsf{Obs}_S(\mathcal{K}_j)$ despite paying $\mathsf{Cost}(\Delta_j)$.
This is the formal criterion used in the experiments. Discovery-like revision is not identified with novelty alone. It is identified with a costed reduction of local-to-global obstruction that cannot be achieved by deformation inside the original constellation.
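The Galilean-to-Lorentz case gives a concrete picture of why a source-language law can retain a constraint violation that an extension removes. The sketch below uses the standard velocity-composition formulas; the two check functions are illustrative stand-ins for a constraint term and a limit term and are assumptions, not the paper's definitions of $C_{\mathrm{viol}}$ or $P_{\mathrm{limit}}$.

```python
C = 1.0  # invariant speed in natural units

def galilean(u: float, v: float) -> float:
    """Source law: velocities add linearly."""
    return u + v

def lorentz(u: float, v: float, c: float = C) -> float:
    """Extension law: relativistic velocity composition."""
    return (u + v) / (1.0 + u * v / c**2)

def invariant_speed_violation(compose, v: float, c: float = C) -> float:
    """Illustrative constraint check: composing c with a frame
    velocity v should return c."""
    return abs(compose(c, v) - c) / c

def low_speed_gap(compose, u: float, v: float) -> float:
    """Illustrative limit check: distance from the Galilean sum."""
    return abs(compose(u, v) - galilean(u, v))

galilean_violation = invariant_speed_violation(galilean, 0.5)
lorentz_violation = invariant_speed_violation(lorentz, 0.5)
lorentz_low_speed = low_speed_gap(lorentz, 1e-3, 1e-3)
print(galilean_violation, lorentz_violation, lorentz_low_speed)
```

The Galilean law violates the invariant-speed constraint for any nonzero frame velocity, while the Lorentzian law satisfies it and still agrees with the Galilean sum at low speeds.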
### 4.10 Obstruction signatures
For comparison across transition families, each candidate move is represented by an obstruction signature
$$\Phi(T, \Delta_j) = \bigl[R_s(\mathcal{K}_j),\, R_o(\mathcal{K}_j),\, R_t(\mathcal{K}_j),\, R_v(\mathcal{K}_j),\, G_{\mathrm{glue}}(\mathcal{K}_j),\, C_{\mathrm{viol}}(\mathcal{K}_j),\, P_{\mathrm{limit}}(\mathcal{K}_j),\, \mathsf{Cost}(\Delta_j),\, \psi(G_{\mathcal{K}_j})\bigr],$$
where $\psi(G_{\mathcal{K}_j})$ denotes typed graph features of the candidate constellation. The first entries record residual behavior across source, overlap, target, and validation contexts; the next entries record gluing, constraint, limit, and cost terms; and the final block records structural features of the typed constellation graph. The validation residual $R_v$ appears here because the signature is used for analysis and kernel comparison, not for candidate selection.
These signatures support two evaluations. First, their scalar projection through Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)) gives the primary obstruction ranking by dropping $R_v$ and weighting the selection terms. Second, the full signature provides a structured feature representation for the secondary representational constellation kernel introduced in the next section.
## 5 Constellation Kernels as a Representational Probe
The obstruction functional in Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)) is the primary decision rule of the paper. It ranks candidate moves by local fit, gluing compatibility, constraint satisfaction, limit preservation, and representational cost. The kernel introduced here uses the resulting obstruction signatures and constellation graphs to test whether these same components induce a transferable similarity space across transition families.
Kernel methods are well suited to this comparison because the objects are sparse, structured, and heterogeneous. Scientific transitions are structured cases whose similarity depends on residual patterns, gluing behavior, constraint profiles, limit preservation, and graph-level representational commitments. General kernel methods compare such objects through feature maps and inner products in implicit representation spaces (Schölkopf & Smola, [2002](https://arxiv.org/html/2605.14033#bib.bib35); Shawe-Taylor & Cristianini, [2004](https://arxiv.org/html/2605.14033#bib.bib36)). Multiple-kernel and block-kernel constructions allow distinct evidence sources to contribute separate similarity components (Hofmann et al., [2008](https://arxiv.org/html/2605.14033#bib.bib19)). Convolution kernels compare structured objects by composing similarities over parts (Haussler, [1999](https://arxiv.org/html/2605.14033#bib.bib18)), while graph kernels provide corresponding tools for typed relational structures (Borgwardt & Kriegel, [2005](https://arxiv.org/html/2605.14033#bib.bib4); Shervashidze et al., [2011](https://arxiv.org/html/2605.14033#bib.bib37); Vishwanathan et al., [2010](https://arxiv.org/html/2605.14033#bib.bib42)).
### 5.1 Candidate signatures
Each candidate move is represented as
$$a = (T, \Delta_j),$$
where $T$ is a transition card and $\Delta_j$ is a candidate deformation or extension. The candidate produces a constellation $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$, and its obstruction signature is
$$\Phi(T, \Delta_j) = \bigl[R_s(\mathcal{K}_j),\, R_o(\mathcal{K}_j),\, R_t(\mathcal{K}_j),\, R_v(\mathcal{K}_j),\, G_{\mathrm{glue}}(\mathcal{K}_j),\, C_{\mathrm{viol}}(\mathcal{K}_j),\, P_{\mathrm{limit}}(\mathcal{K}_j),\, \mathsf{Cost}(\Delta_j),\, \psi(G_{\mathcal{K}_j})\bigr],$$
where $\psi(G_{\mathcal{K}_j})$ denotes typed graph features of the candidate constellation.
The signature separates the evidence used for selection from the information used for comparison. Its weighted scalar projection, excluding the held-out validation residual $R_v$, gives the primary obstruction score $\mathsf{Obs}_S(\mathcal{K}_j)$. As a vector-valued representation, the full signature also supports comparison across transition families. Two different physical transitions may have analogous representational structure if both preserve the source regime, fail under fixed-language deformation, exhibit gluing strain, and become coherent only after introducing a new primitive, constraint, law schema, or limiting relation.
### 5.2 Graph features of a constellation
The graph feature map $\psi(G_{\mathcal{K}_j})$ summarizes the typed structure of a candidate constellation. We write
$$\psi(G_{\mathcal{K}_j}) = \bigl[n_V(G_{\mathcal{K}_j}),\, n_E(G_{\mathcal{K}_j}),\, n_3(G_{\mathcal{K}_j}),\, q(G_{\mathcal{K}_j})\bigr],$$
where $n_V$ gives typed node counts, $n_E$ gives typed edge counts, $n_3$ gives typed triple counts, and $q$ records representational commitments. These commitments include *invariant-speed structure*, *low-speed limits*, *quantization scales*, *absolute time*, *preferred frames*, *limit relations*, *removal of old posits*, and *introduction of new constraints*.
The feature map distinguishes candidates whose numerical residuals may look similar but whose constellation graphs differ. A deformation candidate $\mathcal{K}_{\theta}$ may alter a law-schema node $\ell$ while preserving the surrounding commitments of $G_{\mathcal{K}_0}$. An extension candidate $\mathcal{K}^{+}$ may instead add new nodes or edges, such as $(\ell, \mathrm{introduces}, q)$, $(\ell, \mathrm{constrains}, c)$, or $(\ell, \mathrm{preserves}, r)$. In the Galilean-to-Lorentz case, $\psi(G_{\mathcal{K}_j})$ separates a deformation of the velocity-composition law from an extension that introduces invariant-speed structure, Lorentz transformations, and a low-speed limit relation.
The graph block $\psi(G_{\mathcal{K}_j})$ enters the kernel through $k_{\mathrm{graph}}$, not through the primary selection obstruction $\mathsf{Obs}_S(\mathcal{K}_j)$. Thus the primary ranking remains the obstruction ranking from Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)), while the graph features test whether structural changes in $G_{\mathcal{K}_j}$ help compare candidate moves across transition families. The typed-count representation can be replaced by random-walk, shortest-path, Weisfeiler–Lehman, or other graph-kernel constructions (Gärtner et al., [2003](https://arxiv.org/html/2605.14033#bib.bib14); Borgwardt & Kriegel, [2005](https://arxiv.org/html/2605.14033#bib.bib4); Shervashidze et al., [2011](https://arxiv.org/html/2605.14033#bib.bib37); Vishwanathan et al., [2010](https://arxiv.org/html/2605.14033#bib.bib42)).
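A typed-count feature map of this kind can be sketched with plain counters. The graph encoding (a node-to-type dictionary plus a list of `(head, relation, tail)` triples), the node and relation names, and the commitment vocabulary are all hypothetical choices made for illustration; the paper does not specify its data format in this section.

```python
from collections import Counter

def graph_features(nodes, triples, commitments, vocab):
    """Typed counts for a constellation graph.

    nodes: {node_name: node_type}
    triples: [(head, relation, tail)]
    commitments: set of commitment names held by the candidate
    vocab: fixed tuple of commitment names, so q has a stable layout
    """
    return {
        "n_V": Counter(nodes.values()),                        # typed node counts
        "n_E": Counter(rel for _, rel, _ in triples),          # typed edge counts
        "n_3": Counter((nodes[h], rel, nodes[t])               # typed triple counts
                       for h, rel, t in triples),
        "q": {name: int(name in commitments) for name in vocab},
    }

VOCAB = ("invariant_speed", "low_speed_limit", "absolute_time")

# Hypothetical Galilean source graph.
psi_gal = graph_features(
    nodes={"vel_comp": "law", "abs_time": "posit"},
    triples=[("vel_comp", "assumes", "abs_time")],
    commitments={"absolute_time"},
    vocab=VOCAB,
)

# Hypothetical Lorentzian extension graph.
psi_lor = graph_features(
    nodes={"vel_comp": "law", "c": "primitive", "low_v": "limit"},
    triples=[("vel_comp", "introduces", "c"),
             ("vel_comp", "preserves", "low_v")],
    commitments={"invariant_speed", "low_speed_limit"},
    vocab=VOCAB,
)
print(psi_gal["q"], psi_lor["n_3"])
```

Two candidates with similar residuals would still differ here, because the extension carries `introduces` and `preserves` triples that the source graph lacks.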
### 5.3 Additive block kernel
The obstruction signature is block structured. For a candidate row $a = (T, \Delta_j)$, let
$$\Phi(a) = \bigl[z_{\mathrm{res}}(a),\, z_{\mathrm{glue}}(a),\, z_{\mathrm{con}}(a),\, z_{\mathrm{lim}}(a),\, \psi(G_{\mathcal{K}_j})\bigr],$$
where $z_{\mathrm{res}}$ contains standardized residual features, $z_{\mathrm{glue}}$ the standardized gluing feature, $z_{\mathrm{con}}$ the standardized constraint feature, $z_{\mathrm{lim}}$ the standardized limit feature, and $\psi(G_{\mathcal{K}_j})$ the typed graph features. Each block measures a different aspect of representational adequacy: fit, overlap compatibility, admissibility, source-limit preservation, and graph-level representational structure.
The kernel compares two candidate rows $a = (T, \Delta_i)$ and $b = (T', \Delta_j)$ by adding the similarities of these blocks:
$$k(a,b) = \alpha_{\mathrm{res}} k_{\mathrm{res}}(a,b) + \alpha_{\mathrm{glue}} k_{\mathrm{glue}}(a,b) + \alpha_{\mathrm{con}} k_{\mathrm{con}}(a,b) + \alpha_{\mathrm{lim}} k_{\mathrm{lim}}(a,b) + \alpha_{\mathrm{graph}} k_{\mathrm{graph}}(a,b). \tag{2}$$
The coefficients $\alpha_{\mathrm{res}}, \alpha_{\mathrm{glue}}, \alpha_{\mathrm{con}}, \alpha_{\mathrm{lim}}, \alpha_{\mathrm{graph}}$ control how much each evidence block contributes to similarity. Unlike $\mathsf{Obs}_S(\mathcal{K}_j)$, which is a scalar ranking score for one transition card, $k(a,b)$ compares two candidate moves across cards or families.
For a numeric block $B \in \{\mathrm{res}, \mathrm{glue}, \mathrm{con}, \mathrm{lim}\}$, we use a radial-basis kernel over standardized block vectors:
$$k_B(a,b) = \exp\!\left(-\frac{\|z_B(a) - z_B(b)\|^2}{2\sigma_B^2}\right).$$
Thus $k_{\mathrm{res}}$ compares the residual profiles of two candidates, $k_{\mathrm{glue}}$ compares their overlap-compatibility behavior, $k_{\mathrm{con}}$ compares their constraint-violation profiles, and $k_{\mathrm{lim}}$ compares their source-limit behavior. The graph block uses a normalized linear kernel over typed constellation features:
$$k_{\mathrm{graph}}(a,b) = \frac{\langle \psi(G_{\mathcal{K}_a}), \psi(G_{\mathcal{K}_b}) \rangle}{\|\psi(G_{\mathcal{K}_a})\|\,\|\psi(G_{\mathcal{K}_b})\| + \varepsilon}.$$
Here $G_{\mathcal{K}_a}$ and $G_{\mathcal{K}_b}$ are the typed constellation graphs associated with the two candidate moves. The inner product is large when the candidates share representational commitments, for example when both graphs contain features for a new constraint, a preserved limit relation, or a newly introduced theoretical primitive.
The additive form mirrors the structure of the obstruction signature. The terms $k_{\mathrm{res}}, k_{\mathrm{glue}}, k_{\mathrm{con}}, k_{\mathrm{lim}}$, and $k_{\mathrm{graph}}$ keep fit, overlap compatibility, admissibility, limit preservation, and graph-level commitments as separate evidence sources, while the weights $\alpha_{\mathrm{res}}, \alpha_{\mathrm{glue}}, \alpha_{\mathrm{con}}, \alpha_{\mathrm{lim}}, \alpha_{\mathrm{graph}}$ combine them into one similarity measure. The kernel therefore tests whether the same objects used to compute obstruction also define a useful geometry of representational moves across transition families.
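Eq. (2) and its two component kernels translate directly into code. The sketch below assumes each candidate row is a dictionary of already standardized block vectors with the graph block flattened to a numeric vector; the equal coefficients, unit bandwidths, and example rows are placeholders, since the fitted values are not given in this section.

```python
import math

NUMERIC_BLOCKS = ("res", "glue", "con", "lim")

def rbf(za, zb, sigma=1.0):
    """Radial-basis kernel on one standardized numeric block."""
    d2 = sum((x - y) ** 2 for x, y in zip(za, zb))
    return math.exp(-d2 / (2.0 * sigma**2))

def graph_kernel(pa, pb, eps=1e-12):
    """Normalized linear kernel on typed graph feature vectors."""
    dot = sum(x * y for x, y in zip(pa, pb))
    norm_a = math.sqrt(sum(x * x for x in pa))
    norm_b = math.sqrt(sum(y * y for y in pb))
    return dot / (norm_a * norm_b + eps)

def block_kernel(a, b, alpha, sigma):
    """Additive block kernel: weighted sum of per-block similarities."""
    total = sum(alpha[B] * rbf(a[B], b[B], sigma[B]) for B in NUMERIC_BLOCKS)
    return total + alpha["graph"] * graph_kernel(a["graph"], b["graph"])

# Placeholder coefficients and bandwidths (assumptions).
ALPHA = {"res": 0.2, "glue": 0.2, "con": 0.2, "lim": 0.2, "graph": 0.2}
SIGMA = {"res": 1.0, "glue": 1.0, "con": 1.0, "lim": 1.0}

# Made-up rows: a coherent extension and an obstructed base candidate.
row_a = {"res": [0.1, 0.0, 0.05, 0.1], "glue": [0.0], "con": [0.0],
         "lim": [0.0], "graph": [3, 2, 2, 1]}
row_b = {"res": [0.1, 0.2, 1.5, 1.2], "glue": [1.0], "con": [2.0],
         "lim": [0.0], "graph": [2, 1, 1, 0]}

k_aa = block_kernel(row_a, row_a, ALPHA, SIGMA)
k_ab = block_kernel(row_a, row_b, ALPHA, SIGMA)
k_ba = block_kernel(row_b, row_a, ALPHA, SIGMA)
print(k_aa, k_ab)
```

Because every block kernel is bounded by one and the coefficients here sum to one, self-similarity is approximately one and any mismatch in a single block lowers the total by at most that block's coefficient.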
### 5.4 Kernel ranking task
For each transition card $T$, the ranking task is to order its candidate moves $\{\Delta_j\}_{j=1}^{m}$. In the kernel experiment, each candidate row $a = (T, \Delta_j)$ is represented by its obstruction signature $\Phi(a)$. Given signatures from training families, the kernel $k(a,b)$ defines similarities among candidate moves, and a kernel scoring model assigns a score to each held-out candidate. Candidates inside the held-out card are then ranked by this score.
The evaluation uses a leave-family-out protocol. If $f$ is the held-out transition family, signatures from all other families are used for training, and candidates from $f$ are used only for evaluation. This tests whether the signature blocks $z_{\mathrm{res}}, z_{\mathrm{glue}}, z_{\mathrm{con}}, z_{\mathrm{lim}}$, together with $\psi(G_{\mathcal{K}_j})$, carry structure that transfers across transition types rather than merely describing one family.
The kernel task remains secondary to direct obstruction ranking. Direct obstruction ranking asks whether the theory-defined functional $\mathsf{Obs}_S(\mathcal{K}_j)$ selects the intended move within each transition card. Kernel ranking asks whether the resulting signatures define a useful representational geometry over candidate moves. Thus the kernel is a probe of the structure induced by the obstruction components, not a replacement for the obstruction criterion itself.
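The leave-family-out protocol can be sketched as follows. The text does not specify the kernel scoring model, so this example assumes a kernel-weighted average of training labels (a Nadaraya-Watson style scorer) and, for brevity, a single radial-basis kernel over a flat two-dimensional signature in place of the additive block kernel. The row format and all numbers are hypothetical.

```python
import math

def rbf(x, y, sigma=1.0):
    """Stand-in kernel over flat signature vectors (an assumption)."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma**2))

def kernel_score(phi, train, kernel=rbf):
    """Kernel-weighted average of training labels; an assumed scoring model."""
    weights = [kernel(phi, row["phi"]) for row in train]
    return sum(w * row["intended"] for w, row in zip(weights, train)) / sum(weights)

def leave_family_out(rows, kernel=rbf):
    """Rank each family's candidates using only the other families for training."""
    rankings = {}
    for fam in sorted({r["family"] for r in rows}):
        train = [r for r in rows if r["family"] != fam]
        held = [r for r in rows if r["family"] == fam]
        ranked = sorted(held,
                        key=lambda r: kernel_score(r["phi"], train, kernel),
                        reverse=True)  # highest score first
        rankings[fam] = [r["name"] for r in ranked]
    return rankings

# Toy signatures: [target residual, constraint violation]; intended = 1 marks
# the benchmark-correct move. All values are made up.
rows = [
    {"family": "A", "name": "base", "phi": [1.0, 1.0], "intended": 0},
    {"family": "A", "name": "ext",  "phi": [0.0, 0.0], "intended": 1},
    {"family": "B", "name": "base", "phi": [0.9, 1.1], "intended": 0},
    {"family": "B", "name": "ext",  "phi": [0.1, 0.0], "intended": 1},
    {"family": "C", "name": "base", "phi": [1.2, 0.8], "intended": 0},
    {"family": "C", "name": "ext",  "phi": [0.0, 0.1], "intended": 1},
]
rankings = leave_family_out(rows)
print(rankings)
```

No label from the held-out family enters its own scores, so a correct ranking here reflects similarity to candidates in other families, which is the transfer question the kernel probe asks.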
### 5.5 Workflow
Figure [3](https://arxiv.org/html/2605.14033#S5.F3) summarizes how the primary obstruction ranking and the secondary kernel probe use the same candidate-level evidence. A transition card $T$ supplies the source constellation $\mathcal{K}_0$, context data $D_s, D_o, D_t, D_v$, and candidate moves $\Delta_j$. Each move produces a candidate constellation $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$. From $\mathcal{K}_j$, the finite local-to-global computation extracts residual terms, overlap-restriction and gluing terms, structural penalties, and typed graph features. These components form the obstruction signature $\Phi(T, \Delta_j)$. The scalar projection $\mathsf{Obs}_S(\mathcal{K}_j)$ gives the primary ranking, while the full signature defines the secondary kernel $k(a,b)$ for comparing candidate moves across transition families.
[Figure 3 diagram: transition card ($\mathcal{K}_0, D_s, D_o, D_t, D_v$) → candidate move $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$ → local-to-global evidence blocks (residual terms $R_s, R_o, R_t, R_v$; overlap gluing $G_{\mathrm{glue}}$; graph features $\psi(G_{\mathcal{K}_j})$) → obstruction signature $\Phi(T, \Delta_j)$ → primary ranking $\min_j \mathsf{Obs}_S(\mathcal{K}_j)$ and secondary kernel $k(a,b)$.]

Figure 3: Workflow from transition cards to obstruction ranking and constellation kernels. Each candidate move $\Delta_j$ produces a constellation $\mathcal{K}_j$, whose residual, gluing, structural, and graph features form the obstruction signature $\Phi(T, \Delta_j)$. The scalar obstruction $\mathsf{Obs}_S$ gives the primary decision rule, while the full signature defines the secondary kernel $k(a,b)$ for probing representational similarity across transition families.
## 6 Experimental Design
The experimental design evaluates the ranking rule defined by $\mathsf{Obs}_S(\mathcal{K}_j)$: for each transition card $T$, does the lowest-obstruction candidate correspond to a deformation inside the source constellation or to an extension of that constellation? Each card supplies a source regime $D_s$, an overlap regime $D_o$, a target regime $D_t$, and candidate moves $\{\Delta_j\}_{j=1}^{m}$. The source regime tests whether $\mathcal{K}_0$ remains locally adequate; the overlap regime tests whether independently fitted charts restrict compatibly; and the target regime tests whether low obstruction can be achieved without adding a new primitive, constraint, law schema, transformation rule, or limiting relation. The use of controlled physical transition families follows the computational scientific-discovery tradition of testing structure recovery on interpretable scientific systems (Langley et al., [1987](https://arxiv.org/html/2605.14033#bib.bib23)). Modern equation-discovery and sparse-dynamics methods continue this emphasis on recovering meaningful structure rather than only minimizing curve-fitting error (Schmidt & Lipson, [2009](https://arxiv.org/html/2605.14033#bib.bib34); Brunton et al., [2016](https://arxiv.org/html/2605.14033#bib.bib5)). Recent symbolic-regression and AI-for-science work further sharpens this setting by evaluating whether learned expressions capture the right structural form, not merely predictive accuracy (Udrescu & Tegmark, [2020](https://arxiv.org/html/2605.14033#bib.bib41); Cranmer et al., [2020](https://arxiv.org/html/2605.14033#bib.bib8)).
### 6.1 Transition families
The benchmark contains six physics-inspired transition families. Each family defines a source constellation $\mathcal{K}_0$, context regimes $(D_s, D_o, D_t, D_v)$, and a finite candidate set $\{\Delta_j\}_{j=1}^{m}$. Three families are *deformation-sufficient*: some within-language move $\Delta_j$ produces low obstruction without changing the representational resources of $\mathcal{K}_0$. Three families are *extension-required*: all admissible deformations retain high obstruction, and low $\mathsf{Obs}_S(\mathcal{K}_j)$ requires an enlarged constellation $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$.
The deformation-sufficient families are
$$\begin{array}{ll}\text{small-angle pendulum} & \longrightarrow \text{finite-angle correction},\\ \text{ideal gas law} & \longrightarrow \text{virial correction},\\ \text{Ohm's law} & \longrightarrow \text{temperature-dependent resistance}.\end{array}$$
The extension-required families are
$$\begin{array}{ll}\text{Galilean velocity composition} & \longrightarrow \text{Lorentzian velocity composition},\\ \text{Newtonian kinetic energy} & \longrightarrow \text{relativistic kinetic energy},\\ \text{Rayleigh–Jeans radiation} & \longrightarrow \text{Planck-like blackbody law}.\end{array}$$
For each family, the source, overlap, target, and validation regimes are fixed explicitly, and the intended representational move is used only for evaluation. Table [2](https://arxiv.org/html/2605.14033#S6.T2) summarizes the transition types, regimes, and key structural constraints.

Table 2: Benchmark transition families. Each row defines a source-to-target regime sequence and the structural constraint or limiting relation used to test whether the source constellation transports or requires extension.
### 6.2 Contexts and observations
Each transition card contains four context-indexed datasets $D_s, D_o, D_t, D_v$, corresponding to source, overlap, target, and validation regimes. The source data $D_s$ test whether the initial constellation $\mathcal{K}_0$ remains adequate in its native regime. The overlap data $D_o$ are the common regime on which $\widehat{\mathcal{K}}_{j,s}$ and $\widehat{\mathcal{K}}_{j,t}$ are restricted and compared. The target data $D_t$ test whether candidate $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$ transports or requires extension. The validation data $D_v$ are excluded from $\mathsf{Obs}_S$ and used only as a held-out diagnostic.
An observation record is one local datum $(x, y, c)$, where $x$ denotes the input variables, $y$ the observed quantity, and $c \in \{s, o, t, v\}$ the context label. Each record may also activate context-specific constraints used in $C_{\mathrm{viol}}(\mathcal{K}_j)$ or limiting checks used in $P_{\mathrm{limit}}(\mathcal{K}_j)$. Thus the evaluation is not only residual fitting through $R_s, R_o, R_t$. A candidate must preserve the source regime, agree on the overlap through $G_{\mathrm{glue}}$, satisfy structural constraints through $C_{\mathrm{viol}}$, preserve relevant limits through $P_{\mathrm{limit}}$, and fit the target regime. Table [3](https://arxiv.org/html/2605.14033#S6.T3) summarizes how each card component contributes to this local-to-global test.

Table 3: Information contained in a transition card. Each component contributes to the finite local-to-global test by exposing a distinct failure mode.
### 6.3 Candidate classes
For each transition card $T$, the benchmark supplies a finite set of candidate moves $\{\Delta_j\}_{j=1}^{m}$. Each move acts on the source constellation to produce $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$, and each move has a role: *base*, *deformation*, *incorrect alternative*, or *intended move*. The *base* move leaves $\mathcal{K}_0$ unchanged and tests whether the source constellation already transports. A *deformation* changes a law schema, parameterization, or correction term while remaining inside the original representational language. An *incorrect alternative* is a controlled distractor: it may add flexibility or an extension-like resource, but it does not remove the relevant local-to-global obstruction. The *intended move* is the benchmark-correct deformation or extension used only for evaluation.
This candidate set defines the ranking problem. The obstruction functional uses only the generated constellations $\mathcal{K}_j$, their residuals $R_s, R_o, R_t$, gluing discrepancy $G_{\mathrm{glue}}$, structural penalties $C_{\mathrm{viol}}, P_{\mathrm{limit}}$, and representational cost $\mathsf{Cost}(\Delta_j)$. It does not use the intended label when ranking. The label is used afterward to determine whether the minimum-obstruction candidate
$$j^{\star} = \operatorname*{arg\,min}_{j} \mathsf{Obs}_S(\mathcal{K}_j)$$
matches the benchmark-correct move and whether its transition type is deformation or extension.
The candidate structure separates three diagnostic questions. First, does the base move keep obstruction low, so that $\mathcal{K}_0$ transports without change? Second, does some deformation $\mathcal{K}_{\theta}$ reduce obstruction while preserving the source language? Third, if low obstruction is achieved only by an extension $\mathcal{K}^{+}$, does that extension introduce the structural resource needed to restore gluing, constraints, and limits rather than merely adding flexibility? The present experiment evaluates this obstruction-based ranking problem; autonomous generation of $\Delta_j$ by symbolic search, program synthesis, or language-model proposal is left for future work.
### 6.4 Obstruction weights and representational costs
The selection obstruction $\mathsf{Obs}_S$ uses fixed weights for the seven terms in Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)):
$$R_s(\mathcal{K}_j),\quad R_o(\mathcal{K}_j),\quad R_t(\mathcal{K}_j),\quad G_{\mathrm{glue}}(\mathcal{K}_j),\quad C_{\mathrm{viol}}(\mathcal{K}_j),\quad P_{\mathrm{limit}}(\mathcal{K}_j),\quad \mathsf{Cost}(\Delta_j).$$
These weights are not learned from the evaluation labels. They set the relative scale of the controlled benchmark: $w_s$ and $w_o$ give unit weight to source and overlap fit, $w_t$ and $w_g$ give higher weight to target fit and gluing, $w_c$ gives the strongest weight to structural admissibility, and $\lambda$ keeps representational cost active but smaller than the main coherence terms. Sensitivity analysis in Section [7](https://arxiv.org/html/2605.14033#S7) tests whether the main conclusions depend on narrow tuning of these values.

Table 4: Reference weights in the selection obstruction $\mathsf{Obs}_S$. The values define the fixed scoring rule used before sensitivity analysis.

Representational cost $\mathsf{Cost}(\Delta_j)$ is attached to the move that produces $\mathcal{K}_j = \Delta_j(\mathcal{K}_0)$, not to the fitted residuals $R_s, R_o, R_t$. The base move has cost $0$. Deformations receive small positive costs, typically $0.4$–$0.6$, because they preserve the source language while changing a parameterization, correction term, or law schema. Extensions receive larger costs because they add a new primitive, constraint, transformation rule, law schema, theoretical posit, or limiting relation. In the benchmark, intended extensions have costs around $1.5$–$1.7$, while controlled incorrect extensions are assigned comparable extension-like costs when they add representational capacity without the intended structural role. The weighted term $\lambda\,\mathsf{Cost}(\Delta_j)$ therefore penalizes unnecessary language change while still allowing an extension to win when it sufficiently reduces $R_t(\mathcal{K}_j)$, $G_{\mathrm{glue}}(\mathcal{K}_j)$, $C_{\mathrm{viol}}(\mathcal{K}_j)$, or $P_{\mathrm{limit}}(\mathcal{K}_j)$.
### 6.5 Primary evaluation: obstruction ranking
The primary evaluation ranks candidate moves by the selection obstruction $\mathsf{Obs}_S$. For each transition card $T = (\mathcal{K}_0, D_s, D_o, D_t, D_v, \{\Delta_j\}_{j=1}^{m})$, each candidate move produces
$$\mathcal{K}_j = \Delta_j(\mathcal{K}_0).$$
The score $\mathsf{Obs}_S(\mathcal{K}_j)$ is then computed from Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)) using the source, overlap, and target residuals $R_s, R_o, R_t$, the gluing term $G_{\mathrm{glue}}$, the structural penalties $C_{\mathrm{viol}}, P_{\mathrm{limit}}$, and the weighted cost $\lambda\,\mathsf{Cost}(\Delta_j)$. Candidates are ranked in ascending obstruction, and the selected move is
$$\widehat{j}(T) = \operatorname*{arg\,min}_{1 \leq j \leq m} \mathsf{Obs}_S(\mathcal{K}_j).$$
The intended move is not used in the ranking. It is used only after selection to evaluate whether $\widehat{j}(T)$ matches the benchmark-correct candidate. We report top-1 accuracy, mean reciprocal rank, and transition-type accuracy. Top-1 accuracy asks whether the selected candidate is exactly correct; mean reciprocal rank measures how high the intended candidate appears in the obstruction ranking; and transition-type accuracy asks whether the selected move has the correct type, deformation for deformation-sufficient families and extension for extension-required families.
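The three reported metrics can be computed from per-card obstruction scores as in the sketch below. The card dictionary layout (`scores`, `intended`, `types`, `family_type`) and the numbers are hypothetical and chosen only to show how the metrics can disagree.

```python
def evaluate(cards):
    """Top-1 accuracy, mean reciprocal rank, and transition-type accuracy.

    Each card is a dict with:
      scores      {candidate: obstruction}, lower is better
      intended    name of the benchmark-correct candidate
      types       {candidate: "base" | "deformation" | "extension"}
      family_type "deformation" or "extension"
    """
    top1 = mrr = type_acc = 0.0
    for card in cards:
        ranking = sorted(card["scores"], key=card["scores"].get)  # ascending
        selected = ranking[0]
        top1 += selected == card["intended"]
        mrr += 1.0 / (ranking.index(card["intended"]) + 1)
        type_acc += card["types"][selected] == card["family_type"]
    n = len(cards)
    return {"top1": top1 / n, "mrr": mrr / n, "type_acc": type_acc / n}

cards = [
    {   # extension-required card, intended move ranked first
        "scores": {"base": 6.06, "def": 4.33, "ext": 0.95},
        "intended": "ext",
        "types": {"base": "base", "def": "deformation", "ext": "extension"},
        "family_type": "extension",
    },
    {   # deformation-sufficient card, a different deformation ranked first
        "scores": {"base": 2.0, "def_a": 0.7, "def_b": 0.5, "ext": 1.4},
        "intended": "def_a",
        "types": {"base": "base", "def_a": "deformation",
                  "def_b": "deformation", "ext": "extension"},
        "family_type": "deformation",
    },
]
metrics = evaluate(cards)
print(metrics)
```

On the second card the selected move is the wrong deformation, so top-1 accuracy drops while transition-type accuracy stays at one; this is why the two are reported separately.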
### 6\.6Secondary evaluation: constellation\-kernel ranking
The secondary evaluation tests whether the obstruction signatures define a transferable representation space across transition families\. Each candidate rowa=\(T,Δj\)a=\(T,\\Delta\_\{j\}\)is represented by
Φ\(T,Δj\),\\Phi\(T,\\Delta\_\{j\}\),which contains residual termsRs,Ro,Rt,RvR\_\{s\},R\_\{o\},R\_\{t\},R\_\{v\}, gluing discrepancyGglueG\_\{\\mathrm\{glue\}\}, structural penaltiesCviol,PlimitC\_\{\\mathrm\{viol\}\},P\_\{\\mathrm\{limit\}\}, cost𝖢𝗈𝗌𝗍\(Δj\)\\mathsf\{Cost\}\(\\Delta\_\{j\}\), and graph featuresψ\(G𝒦j\)\\psi\(G\_\{\\mathcal\{K\}\_\{j\}\}\)\. Candidate rows are compared by the additive constellation kernelk\(a,b\)k\(a,b\)in Eq\. \([2](https://arxiv.org/html/2605.14033#S5.E2)\)\.
We use a leave-one-family-out protocol. For a held-out transition family $f$, candidate signatures from the other five families are used to score and rank the candidates in $f$. This tests whether the signature blocks $z_{\mathrm{res}},z_{\mathrm{glue}},z_{\mathrm{con}},z_{\mathrm{lim}}$ and $\psi(G_{\mathcal{K}_j})$ transfer across transition types rather than only separating candidates within one family. Leave-one-group-out evaluation is a standard way to assess generalization across structured groups rather than across exchangeable individual samples (Stone, [1974](https://arxiv.org/html/2605.14033#bib.bib38); Hastie et al., [2009](https://arxiv.org/html/2605.14033#bib.bib17)).
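The leave-one-family-out split can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout (`family`, `phi`, `intended`), the RBF stand-in kernel, and the "similarity to intended rows minus similarity to non-intended rows" scoring rule are all hypothetical. The additive constellation kernel of Eq. (2) and the paper's actual scoring rule are not reproduced here.

```python
import math


def rbf(x, y, gamma=1.0):
    """Stand-in similarity (assumption); not the paper's constellation kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def leave_one_family_out(rows):
    """Yield (held_out_family, train_rows, test_rows), one fold per family."""
    for f in sorted({r["family"] for r in rows}):
        train = [r for r in rows if r["family"] != f]
        test = [r for r in rows if r["family"] == f]
        yield f, train, test


def kernel_score(row, train, kernel=rbf):
    """Hypothetical score: mean similarity to intended minus non-intended rows."""
    def mean(values):
        return sum(values) / len(values) if values else 0.0
    pos = [kernel(row["phi"], t["phi"]) for t in train if t["intended"]]
    neg = [kernel(row["phi"], t["phi"]) for t in train if not t["intended"]]
    return mean(pos) - mean(neg)


# Toy signatures with invented values: two candidates in each of three families.
rows = [
    {"family": "A", "phi": (0.1, 0.1), "intended": True},
    {"family": "A", "phi": (2.0, 2.0), "intended": False},
    {"family": "B", "phi": (0.2, 0.0), "intended": True},
    {"family": "B", "phi": (1.8, 2.1), "intended": False},
    {"family": "C", "phi": (0.0, 0.2), "intended": True},
    {"family": "C", "phi": (2.2, 1.9), "intended": False},
]
folds = list(leave_one_family_out(rows))
```

The point of the split is that no row from the held-out family ever enters the training side, so any ranking success on that family must come from structure shared with the other families.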
The kernel experiment remains secondary to direct obstruction ranking. The primary question is whether $\mathsf{Obs}_S(\mathcal{K}_j)$ selects the intended move inside each transition card. The kernel asks a different question: whether the candidate signatures induced by the obstruction terms and constellation graphs form a useful similarity geometry across families.
### 6.7 Stress tests and robustness
Two additional analyses test the stability of the obstruction ranking. The stress test expands the candidate set $\{\Delta_j\}_{j=1}^{m}$ with additional incorrect formulas, randomized candidates, and matched-cost incorrect extensions. For each expanded card, we recompute $\mathsf{Obs}_S(\mathcal{K}_j)$ and compare the intended candidate with the best incorrect or matched-cost alternative. This tests whether the intended move is selected because it reduces local-to-global obstruction, rather than because it has favorable cost $\mathsf{Cost}(\Delta_j)$ or merely adds flexibility.
The robustness analysis perturbs the context-indexed observations $D_s,D_o,D_t,D_v$. We add noise to the observation values and reduce the number of available records, then recompute the obstruction ranking. The goal is not to simulate every possible experimental uncertainty, but to test whether the selected move $\widehat{j}(T)$ remains stable when the finite evidence is degraded. In this way, the robustness sweep probes whether the diagnosis depends mainly on coherent local-to-global structure or on fragile sampling details.
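A minimal sketch of such a perturbation sweep is given below. It assumes each context is a list of `(x, y)` records, additive Gaussian noise of scale `eta` on the observed values, and uniform subsampling to a fraction `q`; the text does not specify the noise model or sampling scheme, so these are assumptions. The `select` callable is a hypothetical stand-in for the full obstruction ranking.

```python
import random


def perturb(records, eta, q, rng):
    """Keep a fraction q of (x, y) records and add Gaussian noise of scale eta to y."""
    k = max(1, round(q * len(records)))
    kept = rng.sample(records, k)
    return [(x, y + rng.gauss(0.0, eta)) for x, y in kept]


def stability_grid(contexts, select, etas, qs, seed=0):
    """Map each (eta, q) to whether the selected move matches the unperturbed one.

    contexts: dict of context name -> list of (x, y) records
    select:   hypothetical callable mapping contexts to a selected move id
    """
    rng = random.Random(seed)
    reference = select(contexts)
    grid = {}
    for eta in etas:
        for q in qs:
            degraded = {name: perturb(recs, eta, q, rng)
                        for name, recs in contexts.items()}
            grid[(eta, q)] = select(degraded) == reference
    return grid
```

Averaging the boolean grid over many seeds and cards would give an accuracy surface over noise level and retained-record fraction.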
### 6.8 Experimental questions
The experiments are organized around the finite theory-shift detection problem: given a source constellation $\mathcal{K}_0$, candidate moves $\{\Delta_j\}$, and context-indexed evidence $D_s,D_o,D_t,D_v$, can obstruction ranking detect whether the source representation transports or requires extension? The experimental blocks test the ranking itself, the deformation-versus-extension classification, the contribution of obstruction components, and the stability of the diagnosis under controlled perturbations. Table [5](https://arxiv.org/html/2605.14033#S6.T5) summarizes this structure.

Table 5: Experimental questions addressed by the benchmark. The experiments test whether finite obstruction ranking detects transport versus extension and whether the resulting diagnosis is stable under controlled perturbations.

The table also fixes the role of the secondary analyses. Direct obstruction ranking is the decision rule. The kernel asks whether the same signatures define a useful similarity geometry across families. Stress and robustness analyses test where the obstruction diagnosis remains stable and where boundary cases appear.
### 6.9 Summary of experimental logic
The experimental construction applies the finite obstruction test to a collection of transition cards

$$T=(\mathcal{K}_0,D_s,D_o,D_t,D_v,\{\Delta_j\}_{j=1}^{m}).$$

For each candidate move $\Delta_j$, the card generates a candidate constellation $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$. The candidate is fitted in source and target regimes, restricted to the overlap, evaluated by residual, gluing, constraint, limit, and cost terms, and ranked by $\mathsf{Obs}_S(\mathcal{K}_j)$. Thus the core computation is

$$(T,\Delta_j)\;\mapsto\;\mathcal{K}_j\;\mapsto\;\bigl(R_s,R_o,R_t,G_{\mathrm{glue}},C_{\mathrm{viol}},P_{\mathrm{limit}},\mathsf{Cost}\bigr)\;\mapsto\;\mathsf{Obs}_S(\mathcal{K}_j).$$

A deformation-sufficient family is successful when a deformation candidate has the lowest obstruction. An extension-required family is successful when fixed-language candidates remain obstructed and the intended extension is selected. The benchmark therefore tests the central diagnostic claim of the paper: transport is adequate when coherence can be restored inside the original constellation, while extension is required when coherence appears only after enlarging the representational language.
Algorithms [1](https://arxiv.org/html/2605.14033#algorithm1) and [2](https://arxiv.org/html/2605.14033#algorithm2) give the two core operations used in the experiments. Additional protocols for the constellation kernel, stress tests, and robustness sweeps are collected in Appendix [A](https://arxiv.org/html/2605.14033#A1).
**Input:** Transition families $\mathcal{F}$; context regimes $(D_s,D_o,D_t,D_v)$; admissible move classes $\mathcal{D}$

**Output:** Transition-card collection $\mathcal{T}$

1. $\mathcal{T}\leftarrow\emptyset$
2. **foreach** transition family $f\in\mathcal{F}$ **do**
3. Specify the source constellation $\mathcal{K}_0^{(f)}$
4. Generate context-indexed datasets $D_s^{(f)},D_o^{(f)},D_t^{(f)},D_v^{(f)}$ for the source, overlap, target, and validation regimes
5. Define admissible candidate moves $\Delta^{(f)}=\{\Delta^{(f)}_1,\ldots,\Delta^{(f)}_{m_f}\}$, including the base move, deformations, controlled incorrect alternatives, and intended moves
6. Form the transition card $T^{(f)}=\bigl(\mathcal{K}_0^{(f)},D_s^{(f)},D_o^{(f)},D_t^{(f)},D_v^{(f)},\Delta^{(f)}\bigr)$
7. $\mathcal{T}\leftarrow\mathcal{T}\cup\{T^{(f)}\}$
8. **end**

**return** $\mathcal{T}$

Algorithm 1: Transition-card construction

Algorithm [1](https://arxiv.org/html/2605.14033#algorithm1) defines the finite card collection. Algorithm [2](https://arxiv.org/html/2605.14033#algorithm2) ranks the candidate moves inside each card by their selection obstruction.
**Input:** Transition card $T=(\mathcal{K}_0,D_s,D_o,D_t,D_v,\{\Delta_j\}_{j=1}^{m})$

**Output:** Selected move $\widehat{\Delta}(T)$, ranked moves, and obstruction signatures

1. **foreach** candidate move $\Delta_j$ **do**
2. Form the candidate constellation $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$
3. Fit local charts $\widehat{\mathcal{K}}_{j,s}=\mathsf{Fit}(\mathcal{K}_j;D_s)$ and $\widehat{\mathcal{K}}_{j,t}=\mathsf{Fit}(\mathcal{K}_j;D_t)$
4. Restrict local charts to the overlap: $\rho_{s\to o}(\widehat{\mathcal{K}}_{j,s})$ and $\rho_{t\to o}(\widehat{\mathcal{K}}_{j,t})$
5. Compute residuals $R_s(\mathcal{K}_j),R_o(\mathcal{K}_j),R_t(\mathcal{K}_j)$ for selection and $R_v(\mathcal{K}_j)$ for diagnostics
6. Compute gluing residual $G_{\mathrm{glue}}(\mathcal{K}_j)=d_o\!\left(\rho_{s\to o}(\widehat{\mathcal{K}}_{j,s}),\rho_{t\to o}(\widehat{\mathcal{K}}_{j,t})\right)$
7. Compute structural and cost terms $C_{\mathrm{viol}}(\mathcal{K}_j)$, $P_{\mathrm{limit}}(\mathcal{K}_j)$, and $\mathsf{Cost}(\Delta_j)$
8. Evaluate $\mathsf{Obs}_S(\mathcal{K}_j)$ using Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1))
9. Store the signature $\Phi(T,\Delta_j)$
10. **end**
11. Rank moves so that $\mathsf{Obs}_S(\mathcal{K}_{(1)})\leq\mathsf{Obs}_S(\mathcal{K}_{(2)})\leq\cdots\leq\mathsf{Obs}_S(\mathcal{K}_{(m)})$, with $\mathcal{K}_{(r)}=\Delta_{(r)}(\mathcal{K}_0)$
12. $\widehat{\Delta}(T)\leftarrow\Delta_{(1)}$

**return** $\widehat{\Delta}(T)$, ranked moves, and $\{\Phi(T,\Delta_j)\}_{j=1}^{m}$

Algorithm 2: Finite obstruction ranking
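The scoring and ranking steps of the obstruction-ranking procedure can be sketched in Python. Eq. (1) is not reproduced in this section, so the sketch assumes a plain weighted sum of the seven terms with placeholder unit weights. It also takes the fitting, restriction, and residual computations as already done. The `Signature` class, the weight keys, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Precomputed obstruction terms for one candidate (hypothetical layout)."""
    name: str
    r_s: float      # source residual
    r_o: float      # overlap residual
    r_t: float      # target residual
    g_glue: float   # gluing discrepancy on the overlap
    c_viol: float   # constraint-violation penalty
    p_limit: float  # limiting-relation penalty
    cost: float     # representational cost of the move


# Placeholder unit weights; the paper's reference weights are not given here.
UNIT_WEIGHTS = {"res": 1.0, "glue": 1.0, "viol": 1.0, "limit": 1.0, "cost": 1.0}


def obstruction(sig, w=None):
    """Assumed form of the selection obstruction: a weighted sum of all terms."""
    w = UNIT_WEIGHTS if w is None else w
    return (w["res"] * (sig.r_s + sig.r_o + sig.r_t)
            + w["glue"] * sig.g_glue
            + w["viol"] * sig.c_viol
            + w["limit"] * sig.p_limit
            + w["cost"] * sig.cost)


def rank(signatures, w=None):
    """Sort candidates by ascending obstruction; the first entry is selected."""
    return sorted(signatures, key=lambda s: obstruction(s, w))


# Invented toy numbers for illustration only.
candidates = [
    Signature("base", 0.10, 0.50, 2.00, 0.80, 0.30, 0.00, 0.00),
    Signature("poly", 0.10, 0.20, 0.20, 0.15, 0.10, 0.05, 0.10),
    Signature("ext",  0.05, 0.05, 0.05, 0.02, 0.00, 0.00, 0.25),
]
ranked = rank(candidates)
selected = ranked[0].name
```

With these toy values the extension pays the largest cost but has the lowest total, which is the qualitative pattern the extension-required families are designed to test.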
## 7 Results
The results evaluate whether the selection obstruction $\mathsf{Obs}_S(\mathcal{K}_j)$ detects when a source constellation transports by deformation and when it requires extension. The benchmark contains 30 transition cards across six families: three deformation-sufficient families and three extension-required families. For each card $T$, candidate moves $\Delta_j$ generate constellations $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$, and the candidates are ranked by ascending $\mathsf{Obs}_S(\mathcal{K}_j)$. The intended candidate and transition type are used only for evaluation.
Three findings organize the section\. First, direct obstruction ranking selects the intended deformation or extension in most cards and perfectly separates deformation\-sufficient from extension\-required transition types in this benchmark\. Second, baselines, ablations, and stress tests show that the result is not explained by target residual alone, representational cost, or added flexibility\. Third, the constellation kernel gives a useful secondary probe of the representation space, but remains weaker than direct obstruction ranking and is not the decision rule\.
### 7.1 Transition families and candidate sets
Table[6](https://arxiv.org/html/2605.14033#S7.T6)summarizes the evaluated cards\. Each transition family contributes five cards\. Deformation\-sufficient families require a bounded move inside the source representational language\. Extension\-required families require a new primitive, constraint, law schema, transformation rule, or limiting relation\. Each card contains the unchanged source constellation, candidate deformations, controlled incorrect alternatives, and the intended deformation or extension\.
Table 6: Transition families, transition type, number of transition cards, candidate count, and intended representational move. Family labels are shortened for readability.

Figure [4](https://arxiv.org/html/2605.14033#S7.F4) shows how obstruction components support or oppose the intended move. For one representative card from each family, the figure compares the benchmark-correct reference candidate with the best incorrect alternative. The signed margin decomposes into fit, gluing, structural, and cost contributions, so the reader can see whether the ranking is driven by residuals, overlap compatibility, constraints and limits, or representational cost.

Figure 4: Obstruction-margin ledger across transition families. Each row shows one representative transition card. The reference candidate is the benchmark-correct deformation or extension, and the best incorrect alternative is the lowest-obstruction non-reference candidate. Markers show signed contributions of fit, gluing, structure, and cost to the margin $\mathsf{Obs}_S(\mathrm{best\ incorrect})-\mathsf{Obs}_S(\mathrm{reference})$; black diamonds mark the net margin. Positive values favor the reference candidate. The ledger shows which obstruction components support the benchmark-correct move before the detailed candidate landscapes in Figure [5](https://arxiv.org/html/2605.14033#S7.F5).
### 7.2 Obstruction ranking identifies the intended move
The primary evaluation uses no supervised training. For each transition card, the selected candidate is

$$\widehat{j}(T)=\operatorname*{arg\,min}_{1\leq j\leq m}\mathsf{Obs}_S(\mathcal{K}_j).$$

As shown in Table [7](https://arxiv.org/html/2605.14033#S7.T7), obstruction ranking selects the intended candidate in 27 of 30 cards, with mean reciprocal rank $0.950$. It also achieves perfect transition-type accuracy: selected candidates are deformations in deformation-sufficient families and extensions in extension-required families.
This is the main empirical evidence for the proposed diagnosis. The score $\mathsf{Obs}_S$ combines source, overlap, and target residuals $R_s,R_o,R_t$, gluing discrepancy $G_{\mathrm{glue}}$, constraint and limit penalties $C_{\mathrm{viol}},P_{\mathrm{limit}}$, and cost $\lambda\,\mathsf{Cost}(\Delta_j)$. The metrics in Table [7](https://arxiv.org/html/2605.14033#S7.T7) measure three aspects of the ranking. *Top-1* is the fraction of cards for which the lowest-obstruction candidate is the benchmark-correct move. *Mean reciprocal rank* is the average of $1/r(T)$, where $r(T)$ is the rank position of the benchmark-correct candidate in card $T$. *Type accuracy* is the fraction of cards for which the selected move has the correct transition type: deformation for deformation-sufficient families and extension for extension-required families. The perfect transition-type accuracy indicates that the obstruction terms organize the selected move as transport or extension, not merely as a low-error fit.
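The two reported aggregates (27 of 30 top-1 selections and a mean reciprocal rank of $0.950$) can be cross-checked with a short calculation. This is a reader-side consistency check rather than a result stated in the paper. It assumes rankings without ties and that $0.950$ is the mean reciprocal rank rounded to three decimals; the set $\mathcal{M}$ of three missed cards is notation introduced here for illustration.

$$\mathrm{MRR}=\frac{1}{30}\Bigl(27\cdot 1+\sum_{T\in\mathcal{M}}\frac{1}{r(T)}\Bigr)=0.950\quad\Longrightarrow\quad\sum_{T\in\mathcal{M}}\frac{1}{r(T)}=1.5 .$$

Each missed card has $r(T)\geq 2$, so each term is at most $1/2$ and the three terms can sum to $1.5$ only if $r(T)=2$ on every missed card. The next-largest attainable sum, $1/2+1/2+1/3$, would give a mean reciprocal rank of about $0.944$. Under these assumptions, the intended candidate is ranked second whenever it is not selected.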
Table 7: Primary obstruction-ranking performance. Top-1 measures exact candidate selection, MRR measures the rank of the benchmark-correct candidate, and type accuracy measures whether the selected move has the correct deformation-versus-extension type.

Figure [5](https://arxiv.org/html/2605.14033#S7.F5) shows two representative candidate landscapes. In the Galilean-to-Lorentz card, the Lorentzian extension pays representational cost but lowers residual, gluing, and structural obstruction enough to become the selected candidate. In the small-angle-to-finite-pendulum card, a finite-angle deformation restores coherence without enlarging the representational language.
Figure 5:Candidate landscapes for one extension\-required and one deformation\-sufficient transition card\. Bars show total weighted obstruction decomposed into fit, gluing, structure, and cost contributions\. Lower total obstruction is better\. Panel \(a\) shows a Galilean\-to\-Lorentz card in which the Lorentzian extension is the benchmark\-correct and selected candidate\. Panel \(b\) shows a small\-angle\-to\-finite\-pendulum card in which the finite\-angle deformation is benchmark\-correct and selected\. The examples illustrate the distinction between paying representational cost for a necessary extension and restoring coherence by deformation inside the original language\.
### 7.3 Baselines and direct obstruction ablations
The baseline comparisons test whether the theory-shift diagnosis is already captured by residual error or by a simpler costed score. Each row in Table [8](https://arxiv.org/html/2605.14033#S7.T8) defines an alternative scalar ranking score $S_{\mathrm{base}}(\mathcal{K}_j)$; candidates are ranked by ascending $S_{\mathrm{base}}$, exactly as candidates are ranked by ascending $\mathsf{Obs}_S$ in the main evaluation. The target-only baseline uses only $R_t(\mathcal{K}_j)$. The source–target baseline uses $R_s(\mathcal{K}_j)+R_t(\mathcal{K}_j)$. The source–overlap–target baseline uses $R_s(\mathcal{K}_j)+R_o(\mathcal{K}_j)+R_t(\mathcal{K}_j)$. The residual–cost baseline adds $\lambda\,\mathsf{Cost}(\Delta_j)$ to the residual terms. The residual–gluing baseline adds $G_{\mathrm{glue}}(\mathcal{K}_j)$ but omits constraint, limit, and cost terms. The full obstruction row is Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)).
Table[8](https://arxiv.org/html/2605.14033#S7.T8)shows that target residual alone matches full obstruction in aggregate top\-1 accuracy, but not in transition\-type accuracy\. Residual\-only scores can often find a plausible candidate, but they do not reliably organize the selected move as deformation or extension\. Adding cost to residuals performs worse because it can favor cheaper but structurally inadequate candidates\. Adding gluing restores perfect transition\-type accuracy\.
Table 8: Direct non-kernel baselines for candidate ranking. Each row defines a scalar score used to rank candidates by ascending value. The comparison tests whether the full obstruction functional adds diagnostic structure beyond residual fit, cost, or gluing alone.

The added value of obstruction is therefore diagnostic rather than simply aggregate predictive accuracy. The local-to-global terms expose why a candidate wins: whether it fits the target, agrees on the overlap, preserves limits, satisfies constraints, and pays justified representational cost. In this benchmark, the full obstruction functional and residual-only ranking can agree on top-1 counts, but they differ in their ability to preserve the transport-versus-extension structure of the task.
Table [9](https://arxiv.org/html/2605.14033#S7.T9) reports direct ablations of Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)). In each “No $X$” row, candidates are reranked after setting the corresponding obstruction term to zero while leaving the remaining terms unchanged. For example, “No gluing” removes $w_g G_{\mathrm{glue}}(\mathcal{K}_j)$, “No limit” removes $w_l P_{\mathrm{limit}}(\mathcal{K}_j)$, and “No cost” removes $\lambda\,\mathsf{Cost}(\Delta_j)$. The residual-only and residual-plus-cost rows are included again to make the comparison with the baseline scores explicit.
Removing the limit term reduces both top-1 and transition-type accuracy, confirming the role of source-regime preservation. Removing gluing increases aggregate top-1 in this finite benchmark but lowers transition-type accuracy, mainly because it avoids penalizing noisy virial cases. This result should not be read as evidence against gluing. It shows that $G_{\mathrm{glue}}$ is a structural term: it enforces the local-to-global interpretation of a move as transport or extension, while also revealing where finite data and weights create strain. Removing constraints or cost has little aggregate effect in the original 30-card ranking, although $C_{\mathrm{viol}}$ and $\mathsf{Cost}$ remain important for interpreting the selected moves and for the kernel probe below.

Table 9: Direct obstruction ablations. Each row reranks candidates after removing one term from Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)), without using the kernel. “No gluing”, for example, removes $w_g G_{\mathrm{glue}}$, while “No limit” removes $w_l P_{\mathrm{limit}}$.
### 7.4 Weight and cost sensitivity
Figure [6](https://arxiv.org/html/2605.14033#S7.F6) summarizes sensitivity to the weights in Eq. ([1](https://arxiv.org/html/2605.14033#S4.E1)). The reference setting is the $1\times$ multiplier. Each sweep multiplies one obstruction block while leaving the other blocks fixed at their reference values, then recomputes the selected move $\widehat{j}(T)=\operatorname*{arg\,min}_{j}\mathsf{Obs}_S(\mathcal{K}_j)$. Moderate changes to residual, gluing, constraint, and limit blocks leave top-1 accuracy and selected candidates largely unchanged. The cost multiplier is the most sensitive block: excessive cost penalization suppresses necessary extensions and changes several top-1 selections. This is the expected failure mode for a costed theory-shift criterion. Cost should discourage unnecessary language change, but not prevent an extension when it sharply lowers $R_t$, $G_{\mathrm{glue}}$, $C_{\mathrm{viol}}$, or $P_{\mathrm{limit}}$.
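The cost failure mode can be illustrated with a small block-multiplier sweep. This sketch assumes the obstruction decomposes additively into four blocks (`fit`, `glue`, `struct`, `cost`), mirroring the decomposition used in the figures; the block values are invented, and the multiplier grid is arbitrary rather than the one used in the paper.

```python
def select(candidates, multipliers=None):
    """Return the name of the lowest-total candidate under block multipliers."""
    m = {"fit": 1.0, "glue": 1.0, "struct": 1.0, "cost": 1.0}
    m.update(multipliers or {})

    def total(c):
        # Sum only the four obstruction blocks; "name" is not a block.
        return sum(m[block] * c[block] for block in m)

    return min(candidates, key=total)["name"]


def sweep(candidates, block, factors):
    """Scale one block by each factor, holding the others at the 1x reference."""
    return {f: select(candidates, {block: f}) for f in factors}


# Invented toy values: a cheap partial repair versus a costlier coherent extension.
toy = [
    {"name": "poly", "fit": 0.40, "glue": 0.15, "struct": 0.15, "cost": 0.10},
    {"name": "ext",  "fit": 0.15, "glue": 0.02, "struct": 0.00, "cost": 0.25},
]
cost_sweep = sweep(toy, "cost", [0.0, 1.0, 2.0, 4.0])
```

In this toy card the extension is selected up to a $2\times$ cost multiplier and is displaced by the cheaper polynomial repair at $4\times$. A multiplier of zero corresponds to removing the block entirely, as in a "No cost" ablation.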
Figure 6: Weight sensitivity and selection stability. Panel (a) shows top-1 accuracy as each obstruction block is multiplied relative to the reference $1\times$ setting. Panel (b) shows the number of selected moves $\widehat{j}(T)$ that change relative to the reference ranking. Panel (c) isolates the cost multiplier, the most sensitive block: large cost values reduce accuracy by over-penalizing necessary extensions. Panel (d) summarizes selection instability across blocks. Moderate perturbations are stable, while excessive cost penalization is the main failure mode.

Held-out validation residuals $R_v$ are not used in $\mathsf{Obs}_S$, but they check whether low-obstruction choices remain plausible outside the fitted source, overlap, and target regimes. Across the 30 cards, intended candidates have mean validation residual $0.037$, selected top-1 candidates have mean validation residual $0.045$, best incorrect candidates have mean validation residual $0.252$, and source-base candidates have mean validation residual $0.982$. Low obstruction therefore generally transfers to held-out regimes, with the main exceptions occurring in the virial boundary cases discussed next.
### 7.5 Wrong-candidate stress test
The stress test asks whether the intended move wins because it removes local-to-global obstruction, rather than merely because it has more expressive capacity. For each transition card $T$, the original candidate set $\{\Delta_j\}_{j=1}^{m}$ is expanded with additional incorrect formulas, randomized formula perturbations, and matched-cost incorrect extensions. Each new move $\Delta_j$ produces a candidate constellation $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$, and all candidates are reranked by $\mathsf{Obs}_S(\mathcal{K}_j)$. The ranking remains stable: top-1 accuracy stays at $0.900$, mean reciprocal rank remains high at $0.925$, and no matched-cost incorrect extension beats the intended extension.
Figure [7](https://arxiv.org/html/2605.14033#S7.F7) reports the margin

$$M(T)=\mathsf{Obs}_S(\mathcal{K}_{\mathrm{best\ incorrect}})-\mathsf{Obs}_S(\mathcal{K}_{\mathrm{ref}}),$$

where $\mathcal{K}_{\mathrm{ref}}$ is the benchmark-correct candidate and $\mathcal{K}_{\mathrm{best\ incorrect}}$ is the lowest-obstruction non-reference or matched-cost alternative in the expanded set. Positive margins mean that the reference candidate still has lower obstruction. Negative margins mark cases where a non-reference candidate becomes preferred.
Most transition cards retain positive margins. The negative margins are concentrated in perturbed-coefficient ideal-gas-to-virial variants, where the ranking selects a lower-cost linear virial deformation or a randomized perturbed-coefficient formula instead of the intended quadratic virial deformation. In formal terms, those alternatives reduce enough of $G_{\mathrm{glue}}(\mathcal{K}_j)$, $R_o(\mathcal{K}_j)$, or $\mathsf{Cost}(\Delta_j)$ on finite noisy density samples to offset their weaker structural interpretation. These cases identify a boundary of the finite benchmark: partial low-cost corrections can appear more coherent than the intended quadratic correction when the available density evidence is noisy or sparse.
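The margin is simple to compute once per-candidate obstruction totals are available. The sketch below uses a hypothetical `margin` helper. Its first example reuses the three totals quoted for the Newtonian → relativistic kinetic-energy card in the case studies (11.315, 6.894, 0.389), restricted to those three candidates, so it reflects the original card rather than the expanded stress-test set. The second example uses invented values to show a negative margin.

```python
def margin(scores, reference):
    """Return (M, best_incorrect) for one card.

    scores:    dict of candidate name -> total obstruction
    reference: name of the benchmark-correct candidate
    """
    rivals = {name: s for name, s in scores.items() if name != reference}
    best_incorrect = min(rivals, key=rivals.get)
    return rivals[best_incorrect] - scores[reference], best_incorrect


# Totals quoted in the case studies for the kinetic-energy card (labels are ours).
kinetic = {"newtonian_base": 11.315, "rational_ext": 6.894, "relativistic_ext": 0.389}
m_kinetic, rival_kinetic = margin(kinetic, "relativistic_ext")

# Invented boundary case: a cheaper partial correction edges out the reference.
boundary = {"ideal_gas_base": 2.10, "linear_virial": 0.48, "quadratic_virial": 0.55}
m_boundary, rival_boundary = margin(boundary, "quadratic_virial")
```

A positive `m_kinetic` means the reference extension keeps the lowest obstruction among the listed candidates, while the negative `m_boundary` corresponds to the kind of boundary case described above.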
Figure 7: Stress-test margins across expanded transition cards. Panel (a) shows, for each transition family, the margin $M(T)=\mathsf{Obs}_S(\mathcal{K}_{\mathrm{best\ incorrect}})-\mathsf{Obs}_S(\mathcal{K}_{\mathrm{ref}})$ between the best incorrect or matched-cost alternative and the benchmark-correct reference candidate. Positive margins indicate that the reference candidate retains lower obstruction; negative margins indicate boundary cases where a non-reference alternative becomes preferred. Panel (b) highlights the negative cases, which are concentrated in virial/randomized-formula variants. These cases mark informative benchmark boundaries rather than a general collapse of the obstruction criterion.
### 7.6 Robustness to noise and reduced evidence
The robustness sweep perturbs the context-indexed datasets $D_s,D_o,D_t,D_v$ and recomputes the selected move

$$\widehat{j}(T)=\operatorname*{arg\,min}_{j}\mathsf{Obs}_S(\mathcal{K}_j).$$

For each noise level $\eta$ and retained-record fraction $q$, observation values are perturbed and only a fraction $q$ of records is retained in each context. The figure reports the resulting mean top-1 accuracy over transition cards.
Figure [8](https://arxiv.org/html/2605.14033#S7.F8) shows the full perturbation grid and the marginal trends. Accuracy remains high under moderate subsampling, meaning that the obstruction ranking does not require all observation records to recover the same transport-versus-extension diagnosis. Accuracy decreases more sharply as noise increases, because noise directly corrupts the residuals $R_s,R_o,R_t$ and the overlap comparison entering $G_{\mathrm{glue}}$. In this benchmark, the diagnosis is therefore more sensitive to noisy local evidence than to moderate reductions in record availability.

Figure 8: Robustness under observation noise and reduced record availability. Panel (a) reports mean top-1 accuracy after recomputing $\widehat{j}(T)=\operatorname*{arg\,min}_{j}\mathsf{Obs}_S(\mathcal{K}_j)$ for each noise level $\eta$ and retained-record fraction $q$. Panels (b1) and (b2) summarize the same grid by averaging over record fractions and noise levels, respectively. Accuracy decreases mainly as observation noise increases, while moderate reductions in record availability have a weaker effect.
### 7.7 Secondary constellation-kernel probe
The constellation kernel evaluates whether the candidate signatures form a transferable representation space across transition families. Each candidate row $a=(T,\Delta_j)$ is represented by the obstruction signature $\Phi(T,\Delta_j)$, including residual, gluing, constraint, limit, cost, and graph-feature blocks. The kernel $k(a,b)$ then compares candidate moves through these blocks rather than through $\mathsf{Obs}_S(\mathcal{K}_j)$ alone. In leave-one-family-out evaluation, the kernel achieves top-1 accuracy $0.600$, mean reciprocal rank $0.783$, and transition-type accuracy $0.800$. These values are below direct obstruction ranking, but they show that $\Phi(T,\Delta_j)$ and $\psi(G_{\mathcal{K}_j})$ carry cross-family structure.
The family-level results are heterogeneous. The kernel performs strongly on Rayleigh–Jeans → Planck and Newtonian → relativistic energy, but weakly on Ohm → temperature resistance and some deformation-sufficient families. This is expected in the small-sample setting: the kernel must infer similarity across families from few structured examples, whereas the selection obstruction $\mathsf{Obs}_S(\mathcal{K}_j)$ evaluates each candidate directly inside its own transition card.
Figure [9](https://arxiv.org/html/2605.14033#S7.F9) summarizes two diagnostics. Panel (a) ablates the kernel blocks in Eq. ([2](https://arxiv.org/html/2605.14033#S5.E2)). Removing the gluing block $k_{\mathrm{glue}}$ reduces ranking and transition-type discrimination, showing that overlap compatibility contributes to the transferable signature. Removing the graph block $k_{\mathrm{graph}}$ mainly affects transition-type prediction, indicating that typed constellation structure helps separate deformation from extension. The strong “no constraints” result indicates that the constraint block $k_{\mathrm{con}}$ is overactive for kernel generalization in this small benchmark. Panel (b) compares generalization protocols: within-family and mixed-variant generalization are nearly saturated, whereas leave-one-family-out remains the harder analogical-transfer setting. Exact aggregate values for these secondary diagnostics are reported in Appendix [C](https://arxiv.org/html/2605.14033#A3).
Figure 9: Secondary constellation-kernel probe. Panel (a) ablates the kernel blocks in Eq. ([2](https://arxiv.org/html/2605.14033#S5.E2)). Removing $k_{\mathrm{glue}}$ reduces ranking and type discrimination, while removing $k_{\mathrm{graph}}$ mainly affects transition-type prediction; the “no constraints” result indicates that $k_{\mathrm{con}}$ requires better calibration for kernel generalization. Panel (b) compares controlled generalization protocols: within-family and mixed-variant generalization are nearly saturated, while leave-one-family-out remains the harder analogical-transfer setting. The kernel probes the geometry of $\Phi(T,\Delta_j)$ and $\psi(G_{\mathcal{K}_j})$; it does not replace direct obstruction ranking by $\mathsf{Obs}_S(\mathcal{K}_j)$.
### 7\.8Qualitative case studies
The extension\-required families show how𝖮𝖻𝗌S\(𝒦j\)\\mathsf\{Obs\}\_\{S\}\(\\mathcal\{K\}\_\{j\}\)separates local fit from representational coherence\. In each case, the base or fixed\-language candidate remains meaningful in the source regimeDsD\_\{s\}, but fails to give a coherent source–overlap–target account acrossDs,Do,DtD\_\{s\},D\_\{o\},D\_\{t\}\. The selected extension lowers obstruction not only by reducing residuals, but by improvingGglue\(𝒦j\)G\_\{\\mathrm\{glue\}\}\(\\mathcal\{K\}\_\{j\}\), satisfying the relevantCviol\(𝒦j\)C\_\{\\mathrm\{viol\}\}\(\\mathcal\{K\}\_\{j\}\), and preserving the source limit throughPlimit\(𝒦j\)P\_\{\\mathrm\{limit\}\}\(\\mathcal\{K\}\_\{j\}\)\.
#### Galilean→\\rightarrowLorentz velocity composition\.
The source regimeDsD\_\{s\}contains low velocities where additive composition is locally adequate\. The target regimeDtD\_\{t\}contains high subluminal velocities where invariant\-speed and speed\-bound constraints become active, and the overlapDoD\_\{o\}tests whether the source\-fitted and target\-fitted charts agree\. The fixed additive candidate has total obstruction3\.8703\.870\. A polynomial deformation lowers the obstruction to0\.7710\.771, but still leaves nonzero gluing, constraint, and limit penalties: it improves fit without fully restoring local\-to\-global coherence\. The Lorentzian extension has obstruction0\.4010\.401\. It pays representational costλ𝖢𝗈𝗌𝗍\(Δj\)\\lambda\\mathsf\{Cost\}\(\\Delta\_\{j\}\), but lowers the total score by preserving the low\-speed limit, satisfying the invariant\-speed constraint, and improving overlap compatibility\. The selected move is therefore not only a change of formula; it is an extension of the velocity constellation by invariant\-speed structure\.
#### Newtonian $\rightarrow$ relativistic kinetic energy.
Here $D_s$ is the low-speed kinetic-energy regime, while $D_t$ approaches relativistic velocities. The Newtonian fixed-language candidate has obstruction $11.315$, reflecting large target and compatibility failures when the low-speed chart is transported too far. The best incorrect rational extension reduces the obstruction to $6.894$, but does not recover the full source–target coherence. The relativistic energy extension has obstruction $0.389$. It reduces the high-velocity residual terms, improves the gluing behavior on $D_o$, and preserves the Newtonian expression as the appropriate low-velocity limit. This case illustrates the role of $P_{\mathrm{limit}}$: the extension succeeds not by discarding the source theory, but by retaining it as a limiting chart.
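The limiting relation invoked here can be checked by a short expansion. This is standard special relativity rather than a formula quoted from the paper, and how $P_{\mathrm{limit}}$ scores the limit numerically is not specified in this section.

$$
\begin{aligned}
E_k^{\mathrm{rel}} &= (\gamma - 1)\,mc^2, \qquad
\gamma = \left(1 - \frac{v^2}{c^2}\right)^{-1/2}
       = 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + O\!\left(\frac{v^6}{c^6}\right), \\
E_k^{\mathrm{rel}} &= \frac{1}{2}mv^2 + \frac{3}{8}\frac{mv^4}{c^2} + O\!\left(\frac{mv^6}{c^4}\right).
\end{aligned}
$$

The leading term is the Newtonian expression, and the first correction is smaller by a factor of $\tfrac{3}{4}v^2/c^2$ (about $0.75\%$ at $v = 0.1c$). The relativistic chart therefore restricts to the Newtonian one in the low-speed regime.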
#### Rayleigh–Jeans $\rightarrow$ Planck blackbody law.
The source data $D_s$ sample long wavelengths where the Rayleigh–Jeans chart is locally plausible. The target and validation regimes include shorter wavelengths where finite-energy behavior becomes decisive. The fixed-language candidate has obstruction $10.334$. The best incorrect polynomial repair reduces the obstruction to $4.742$, but does so mainly by improving local fit while retaining a large structural penalty. The Planck-like extension has obstruction $0.990$. It introduces the missing quantization-scale structure, reduces the finite-energy constraint violation, and restores a coherent source–overlap–target description. In terms of the obstruction components, the selected extension wins because reductions in $R_t$, $G_{\mathrm{glue}}$, and $C_{\mathrm{viol}}$ justify the added representational cost.
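A small numerical sketch shows why the Rayleigh–Jeans chart is adequate at long wavelengths and fails at short ones. The two radiance formulas are the textbook laws; the function names, the temperature, and the sample wavelengths are assumptions chosen for illustration and are not taken from the benchmark data.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m / s
K_B = 1.380649e-23   # Boltzmann constant, J / K


def rayleigh_jeans(lam: float, temp: float) -> float:
    """Rayleigh-Jeans spectral radiance per unit wavelength."""
    return 2.0 * C * K_B * temp / lam**4


def planck(lam: float, temp: float) -> float:
    """Planck spectral radiance per unit wavelength.

    math.expm1 keeps the long-wavelength limit accurate; it overflows for
    very large H*C/(lam*K_B*temp), so this sketch is for moderate arguments.
    """
    x = H * C / (lam * K_B * temp)
    return (2.0 * H * C**2 / lam**5) / math.expm1(x)


def planck_to_rj_ratio(lam: float, temp: float) -> float:
    """Equals x / (exp(x) - 1): near 1 in the source regime, near 0 beyond it."""
    return planck(lam, temp) / rayleigh_jeans(lam, temp)


if __name__ == "__main__":
    temp = 5000.0  # kelvin, hypothetical
    for lam in (1e-3, 1e-5, 1e-6, 1e-7):  # metres, hypothetical samples
        print(f"lambda={lam:.0e} m  Planck/RJ={planck_to_rj_ratio(lam, temp):.3e}")
```

The ratio stays close to one at millimetre wavelengths and collapses toward zero in the ultraviolet, where the Rayleigh–Jeans radiance keeps growing as $\lambda^{-4}$. That is the finite-energy failure a fixed-language candidate cannot repair.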
Together, these cases show why the obstruction criterion is not reducible to flexible curve fitting. Incorrect or flexible candidates can lower some residual terms, but the selected extension is the one that lowers the full local-to-global obstruction $\mathsf{Obs}_S(\mathcal{K}_j)$ while preserving the source-regime structure encoded by $\mathcal{K}_0$.
### 7.9 Summary of findings
Across the 30 transition cards, the minimum-obstruction rule $\widehat{j}(T)=\operatorname*{arg\,min}_{j}\mathsf{Obs}_S(\mathcal{K}_j)$ behaves as a finite detector of transport versus extension. In deformation-sufficient families, low obstruction is achieved by candidates that remain inside the source language. In extension-required families, fixed-language candidates retain obstruction in target fit, overlap gluing, constraints, or limits, and the selected move is an extension of the constellation. The primary ranking gives the strongest evidence: it achieves $0.900$ top-1 accuracy, $0.950$ MRR, and $1.000$ transition-type accuracy. Baselines and ablations show that the gain over residual-only ranking is not simply more accurate curve fitting; it is the ability to organize a selected move as deformation or extension and to expose which terms $R_s, R_o, R_t, G_{\mathrm{glue}}, C_{\mathrm{viol}}, P_{\mathrm{limit}}, \mathsf{Cost}$ explain the decision. Stress and robustness analyses show where the criterion is stable and where finite evidence creates boundary cases, especially in noisy virial variants. The kernel probe confirms that $\Phi(T,\Delta_j)$ and $\psi(G_{\mathcal{K}_j})$ carry structured cross-family information, while also showing that direct obstruction ranking remains the appropriate decision rule in this benchmark.
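The minimum-obstruction rule and the two ranking metrics can be sketched in a few lines. The first three cards reuse the obstruction totals quoted in the case studies of Section 7.8; the candidate labels and the fourth card are hypothetical and are included only so that the metrics are not trivially equal to one.

```python
from typing import Dict, List, Tuple

# (obstruction total per candidate, benchmark reference candidate)
Card = Tuple[Dict[str, float], str]


def select(obs: Dict[str, float]) -> str:
    """Minimum-obstruction rule; ties resolve to the first candidate listed."""
    return min(obs, key=obs.get)


def reciprocal_rank(obs: Dict[str, float], ref: str) -> float:
    """1 / rank of the reference when candidates are sorted by obstruction."""
    ranking = sorted(obs, key=obs.get)
    return 1.0 / (ranking.index(ref) + 1)


def evaluate(cards: List[Card]) -> Tuple[float, float]:
    """Return (top-1 accuracy, mean reciprocal rank) over the cards."""
    top1 = sum(select(obs) == ref for obs, ref in cards) / len(cards)
    mrr = sum(reciprocal_rank(obs, ref) for obs, ref in cards) / len(cards)
    return top1, mrr


CARDS: List[Card] = [
    ({"fixed_additive": 3.870, "polynomial_deformation": 0.771,
      "lorentz_extension": 0.401}, "lorentz_extension"),
    ({"newtonian_fixed": 11.315, "rational_extension": 6.894,
      "relativistic_extension": 0.389}, "relativistic_extension"),
    ({"rayleigh_jeans_fixed": 10.334, "polynomial_repair": 4.742,
      "planck_extension": 0.990}, "planck_extension"),
    # Hypothetical boundary case: a cheaper partial correction outranks the reference.
    ({"fixed": 2.0, "partial_correction": 0.5,
      "reference_correction": 0.6}, "reference_correction"),
]

if __name__ == "__main__":
    print(evaluate(CARDS))
```

On these four cards the sketch gives a top-1 accuracy of 0.75 and an MRR of 0.875. By the same arithmetic, a top-1 accuracy of $0.900$ with an MRR of $0.950$ over 30 cards is what one would obtain if each of three misses placed the reference second, although the per-card ranks are not listed in this section.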
## 8 Discussion
The results support the central interpretation of the paper: scientific theory-shift detection can be formulated as a local-to-global problem. A candidate representation is not assessed only by its fit to a target regime. It is assessed by whether fitted local charts restrict coherently to overlaps, preserve source-regime limits, satisfy the constraints that define admissible use, and justify any added representational cost. The obstruction functional $\mathsf{Obs}_S(\mathcal{K}_j)$ makes these requirements explicit and turns them into a ranking criterion.
The main empirical point is not only that the intended candidate is usually selected\. It is that the selected candidate is organized as a deformation or an extension by explicit obstruction components\. In deformation\-sufficient families, low obstruction is achieved inside the original representational language\. In extension\-required families, fixed\-language candidates retain obstruction in target fit, overlap compatibility, constraints, or limits, and low obstruction appears only after the constellation is enlarged\. This is the operational distinction needed for theory\-shift detection: transport is adequate when coherence can be restored inside the source language; extension is required when coherence requires a new representational resource\.
The baseline and ablation results clarify why the obstruction criterion is not just residual fitting. Target residual alone can select plausible candidates in many cards, but it does not preserve the deformation-versus-extension structure as reliably as the full obstruction score. The local-to-global terms $G_{\mathrm{glue}}$, $C_{\mathrm{viol}}$, $P_{\mathrm{limit}}$, and $\mathsf{Cost}(\Delta_j)$ make the decision interpretable: they show whether a candidate wins because it agrees on overlaps, satisfies structural constraints, preserves the source limit, or pays a justified cost for added language. The point of obstruction is therefore diagnostic organization, not merely a higher aggregate accuracy number.
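The diagnostic reading of the score can be illustrated with a small container for the components. The obstruction functional itself is defined earlier in the paper; the sketch below assumes a plain sum with unit weights on every term except the cost, which is scaled by $\lambda$. The class, its method names, and all numerical values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ObstructionComponents:
    r_s: float      # source residual
    r_o: float      # overlap residual
    r_t: float      # target residual
    g_glue: float   # gluing disagreement on the overlap
    c_viol: float   # constraint violation
    p_limit: float  # limiting-relation failure
    cost: float     # representational cost of the move

    def total(self, lam: float = 1.0) -> float:
        """Assumed aggregation: unit-weight sum plus lam * cost."""
        return (self.r_s + self.r_o + self.r_t + self.g_glue
                + self.c_viol + self.p_limit + lam * self.cost)

    def dominant_term(self, lam: float = 1.0) -> str:
        """Name of the component contributing most to the total."""
        terms = asdict(self)
        terms["cost"] *= lam
        return max(terms, key=terms.get)


# Hypothetical values: a fixed-language candidate and an extension.
FIXED = ObstructionComponents(0.125, 0.5, 2.0, 1.0, 0.5, 0.0, 0.0)
EXTENDED = ObstructionComponents(0.125, 0.125, 0.125, 0.0625, 0.0, 0.0625, 0.5)

if __name__ == "__main__":
    for name, comp in (("fixed", FIXED), ("extended", EXTENDED)):
        print(name, comp.total(), comp.dominant_term())
```

With these numbers the extension wins at $\lambda = 1$ because its structural terms are small, and its largest remaining contribution is the cost it pays for added language. Raising $\lambda$ to 10 reverses the ranking, which is why cost calibration matters for the decision.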
The finite sheaf-theoretic language provides the organizing structure for this diagnosis. Contexts are scientific regimes; local sections are fitted representational constellations; restriction evaluates a chart on an overlap; gluing measures compatibility between locally fitted descriptions; obstruction measures failure of coherent transport. This use of sheaf language is finite and computational, not a claim to full topos semantics. Its role is to make local-to-global coherence testable in a controlled setting, in line with applied sheaf-theoretic work where consistency, distributed measurements, and global compatibility are treated as computable objects (Mac Lane & Moerdijk, [1992](https://arxiv.org/html/2605.14033#bib.bib26); Johnstone, [2002](https://arxiv.org/html/2605.14033#bib.bib21); Robinson, [2017](https://arxiv.org/html/2605.14033#bib.bib32); Hansen & Ghrist, [2019](https://arxiv.org/html/2605.14033#bib.bib16); Curry, [2014](https://arxiv.org/html/2605.14033#bib.bib9)).
The deformation and extension cases illustrate two complementary modes of theory shift. The deformation cases show that moving outside a source regime does not automatically require a new language: small-angle dynamics, ideal-gas behavior, and Ohmic response can be carried into neighboring regimes by bounded changes that preserve the source constellation. The extension cases show the opposite pattern. Galilean velocity addition, Newtonian kinetic energy, and Rayleigh–Jeans radiation remain meaningful in their source regimes, but fixed-language candidates fail to give coherent source–overlap–target descriptions. The successful candidates introduce invariant-speed structure, relativistic energy, or a quantization scale. These are not merely better curve fits; they change what counts as an admissible constellation of observables, constraints, limits, and transformations, matching the role of representational reorganization emphasized in accounts of conceptual change (Kuhn, [1962](https://arxiv.org/html/2605.14033#bib.bib22); Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40)).
The constellation kernel gives a secondary test of whether obstruction signatures define a reusable representational geometry. It does not replace selection by $\mathsf{Obs}_S$. Instead, it asks whether $\Phi(T,\Delta_j)$ and $\psi(G_{\mathcal{K}_j})$ carry similarity information across transition families. The kernel results suggest that gluing and graph features are transition-relevant, while the constraint block needs better calibration for cross-family generalization. Within-family and mixed-variant generalization are easier than leave-family-out transfer, which is the more demanding analogical setting.
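An additive block kernel with block ablation can be sketched as follows. The paper's Eq. (2) is not reproduced in this section, so the sketch is a stand-in: it assumes one Gaussian (RBF) kernel per block, uses only the three blocks named in Figure 9 (Eq. (2) may contain others), and works on made-up feature vectors. All names are hypothetical.

```python
import math
from typing import Dict, Sequence

Signature = Dict[str, Sequence[float]]  # block name -> feature vector

ALL_BLOCKS = ("glue", "graph", "con")


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian kernel on one block of features (an assumed block kernel)."""
    if len(x) != len(y):
        raise ValueError("block feature vectors must have equal length")
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def additive_kernel(a: Signature, b: Signature,
                    blocks: Sequence[str] = ALL_BLOCKS,
                    gamma: float = 1.0) -> float:
    """Sum of per-block kernels; omit a name from `blocks` to ablate it."""
    return sum(rbf(a[name], b[name], gamma) for name in blocks)


# Hypothetical candidate signatures.
A: Signature = {"glue": [0.1, 0.2], "graph": [1.0, 0.0, 1.0], "con": [0.0]}
B: Signature = {"glue": [0.1, 0.2], "graph": [0.0, 1.0, 1.0], "con": [0.5]}

if __name__ == "__main__":
    print("full kernel:", additive_kernel(A, B))
    print("no glue block:", additive_kernel(A, B, blocks=("graph", "con")))
```

Because a sum of positive semidefinite kernels is itself positive semidefinite, dropping a block still leaves a valid kernel, which is what makes block-wise ablation of this kind well defined.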
The stress and robustness analyses identify where the finite diagnostic is stable and where it reaches boundary cases\. Adding incorrect, randomized, and matched\-cost candidates does not collapse the ranking, which argues against the simple explanation that the intended candidate wins only by having more expressive capacity\. The failures that do occur are localized, especially in noisy virial variants where a partial low\-cost correction can appear more coherent than the intended quadratic correction\. Noise also affects the ranking more strongly than moderate subsampling, indicating that the obstruction signal depends more on the integrity of local evidence than on having every record available\. These boundary cases are informative because they show which parts of the diagnostic depend on data quality, cost calibration, and finite sampling\.
The paper therefore sits between computational scientific discovery and cognitive accounts of conceptual change. Computational discovery systems often search for compact laws, symbolic expressions, or governing equations from data (Langley et al., [1987](https://arxiv.org/html/2605.14033#bib.bib23); Schmidt & Lipson, [2009](https://arxiv.org/html/2605.14033#bib.bib34); Brunton et al., [2016](https://arxiv.org/html/2605.14033#bib.bib5); Udrescu & Tegmark, [2020](https://arxiv.org/html/2605.14033#bib.bib41)). Cognitive and philosophical accounts emphasize model-based reasoning, conceptual restructuring, and new representational resources (Nersessian, [2008](https://arxiv.org/html/2605.14033#bib.bib30); Thagard, [2012](https://arxiv.org/html/2605.14033#bib.bib40); Morgan & Morrison, [1999](https://arxiv.org/html/2605.14033#bib.bib29)). The present framework connects these perspectives through a finite criterion: the object of evaluation is not only a formula, but a representational constellation; and the failure signal is not only prediction error, but obstruction to coherent transport.
The broader implication is that representational strain should not be identified with a single anomalous datum\. It is a pattern: local fits remain possible, but overlap agreement, source\-limit preservation, constraint satisfaction, and representational cost cannot be jointly maintained inside the original constellation\. In this sense, extension is triggered when local adequacy no longer supports global coherence\.
### 8.1 Scope, limitations, and future work
The experiment establishes a finite mechanism for theory\-shift detection: obstruction\-based selection between deformation and extension\. The transition families are curated rather than historical, because the goal is to isolate the diagnostic operation under controlled conditions\. Candidates are supplied to the ranking procedure rather than generated autonomously\. Thus the benchmark is not a reconstruction of scientific history and not a general machine\-learning benchmark\. It is a controlled cognitive\-systems test of whether structured local\-to\-global evidence allows an artificial scientific agent to distinguish adaptation inside a representational language from extension of that language\.
The mathematical scope is also finite\. Contexts, restrictions, overlap comparisons, gluing residuals, constraints, limits, and costs are explicitly represented and measured\. The paper uses sheaf\-theoretic structure as a finite local\-to\-global formalism; it does not implement full topos semantics\. The main empirical boundary is the secondary kernel: direct obstruction ranking is the stronger criterion, while the kernel remains a probe of the representation space induced by obstruction signatures\. Robustness tests also identify a data boundary: the diagnosis is more sensitive to observation noise than to moderate subsampling\.
The next step is to expand transition cards beyond hand\-designed physics\-inspired cases\. A larger open transition\-card database could include curated cases across domains, explicit source, overlap, target, and validation regimes, candidate representational moves, constraints, limits, and train/test splits for evaluating AI agents\. LLM\-assisted card synthesis may be useful for proposing source/target theory pairs, candidate deformations, plausible incorrect extensions, constraints, and validation regimes, but such proposals would require symbolic validation, consistency checks, and human curation before entering a benchmark\. A larger atlas of transition cards could then support training and evaluation of AI systems for representational transport, extension, and theory\-shift detection\. Richer categorical or topos\-theoretic treatments may later describe theory transport through diagrams of contexts, common refinements, or pullback\-like reconciliation structures; the present paper supplies the finite obstruction primitive needed before that broader program can be made operational\.
## 9 Conclusion
This paper formulated scientific theory shift as a finite local\-to\-global diagnostic problem\. A scientific model was represented as a constellation of observables, law schemas, constraints, measurement roles, limiting regimes, and admissible transformations\. Transport preserves this constellation by deformation across contexts; extension enlarges it when the original language no longer supports coherent gluing\.
The obstruction functional makes this distinction computable\. Source, overlap, target, and validation regimes define the finite contextual structure\. Candidate constellations are fitted locally, restricted to overlaps, and evaluated by residual fit, gluing disagreement, constraint violation, limit failure, and representational cost\. The selected move is therefore not simply the candidate that best fits a target dataset, but the candidate that best restores local\-to\-global coherence\.
The benchmark results support this formulation as a controlled proof of concept\. Direct obstruction ranking identifies the intended deformation or extension in most transition cards and separates deformation\-sufficient from extension\-required cases in this benchmark\. Baselines, ablations, stress tests, and robustness analyses show that the diagnosis is not reducible to target residual, raw expressive capacity, or arbitrary cost\. The constellation kernel adds a secondary representational probe, showing that obstruction signatures and typed graph features contain transition\-relevant structure, while confirming that direct obstruction remains the main decision criterion\.
The contribution is a finite computational primitive for a central cognitive operation in scientific modeling: deciding when a representation still transports and when obstruction motivates extension\. In this view, discovery\-like revision begins where local adequacy no longer glues into global coherence\.
## References
- Abramsky & Brandenburger \(2011\)Abramsky, S\., & Brandenburger, A\. \(2011\)\.The sheaf\-theoretic structure of non\-locality and contextuality\.New Journal of Physics,13, 113036\. doi:[10\.1088/1367\-2630/13/11/113036](http://dx.doi.org/10.1088/1367-2630/13/11/113036)\.
- Ayzenberg et al\. \(2025\)Ayzenberg, A\., Gebhart, T\., Magai, G\., & Solomadin, G\. \(2025\)\.Sheaf theory: from deep geometry to deep learning\.[arXiv:2502\.15476](http://arxiv.org/abs/2502.15476)\.
- Bodnar et al\. \(2022\)Bodnar, C\., Di Giovanni, F\., Chamberlain, B\. P\., Liò, P\., & Bronstein, M\. M\. \(2022\)\.Neural sheaf diffusion: A topological perspective on heterophily and oversmoothing in GNNs\.InAdvances in Neural Information Processing Systems\.volume 35\.
- Borgwardt & Kriegel \(2005\)Borgwardt, K\. M\., & Kriegel, H\.\-P\. \(2005\)\.Shortest\-path kernels on graphs\.InProceedings of the Fifth IEEE International Conference on Data Mining\(pp\. 74–81\)\.IEEE\.doi:[10\.1109/ICDM\.2005\.132](http://dx.doi.org/10.1109/ICDM.2005.132)\.
- Brunton et al\. \(2016\)Brunton, S\. L\., Proctor, J\. L\., & Kutz, J\. N\. \(2016\)\.Discovering governing equations from data by sparse identification of nonlinear dynamical systems\.Proceedings of the National Academy of Sciences,113, 3932–3937\. doi:[10\.1073/pnas\.1517384113](http://dx.doi.org/10.1073/pnas.1517384113)\.
- Caramello \(2018\)Caramello, O\. \(2018\)\.Theories, Sites, Toposes: Relating and Studying Mathematical Theories through Topos\-Theoretic Bridges\.Oxford: Oxford University Press\.
- Chen et al\. \(2025\)Chen, Z\., Chen, S\., Ning, Y\., Zhang, Q\., Wang, B\., Yu, B\., Li, Y\., Liao, Z\., Wei, C\., Lu, Z\., Dey, V\., Xue, M\., Baker, F\. N\., Burns, B\., Adu\-Ampratwum, D\., Huang, X\., Ning, X\., Gao, S\., Su, Y\., & Sun, H\. \(2025\)\.ScienceAgentBench: Toward rigorous assessment of language agents for data\-driven scientific discovery\.InInternational Conference on Learning Representations\.URL:[https://openreview\.net/forum?id=6z4YKr0GK6](https://openreview.net/forum?id=6z4YKr0GK6)\.[arXiv:2410\.05080](http://arxiv.org/abs/2410.05080)\.
- Cranmer et al\. \(2020\)Cranmer, M\., Sanchez Gonzalez, A\., Battaglia, P\., Xu, R\., Cranmer, K\., Spergel, D\., & Ho, S\. \(2020\)\.Discovering symbolic models from deep learning with inductive biases\.InAdvances in Neural Information Processing Systems\(pp\. 17429–17442\)\.volume 33\.
- Curry \(2014\)Curry, J\. M\. \(2014\)\.Sheaves, Cosheaves and Applications\.Ph\.D\. thesis University of Pennsylvania\.Available as arXiv:1303\.3255\.
- Danks & Ippoliti \(2018\)Danks, D\., & Ippoliti, E\. \(Eds\.\) \(2018\)\.Building Theories: Heuristics and Hypotheses in Sciencesvolume 41 ofStudies in Applied Philosophy, Epistemology and Rational Ethics\.Cham: Springer\.doi:[10\.1007/978\-3\-319\-72787\-5](http://dx.doi.org/10.1007/978-3-319-72787-5)\.
- Felber et al\. \(2025\)Felber, S\., Hummes Flores, B\., & Rincon Galeana, H\. \(2025\)\.A sheaf\-theoretic characterization of tasks in distributed systems\.[arXiv:2503\.02556](http://arxiv.org/abs/2503.02556)\.
- Feynman et al\. \(2011\)Feynman, R\. P\., Leighton, R\. B\., & Sands, M\. \(2011\)\.The Feynman Lectures on Physics\.\(New millennium ed\.\)\.New York: Basic Books\.
- Gärdenfors \(2000\)Gärdenfors, P\. \(2000\)\.Conceptual Spaces: The Geometry of Thought\.Cambridge, MA: MIT Press\.
- Gärtner et al\. \(2003\)Gärtner, T\., Flach, P\., & Wrobel, S\. \(2003\)\.On graph kernels: Hardness results and efficient alternatives\.InLearning Theory and Kernel Machines\(pp\. 129–143\)\.Springer volume 2777 ofLecture Notes in Computer Science\.doi:[10\.1007/978\-3\-540\-45167\-9\_11](http://dx.doi.org/10.1007/978-3-540-45167-9_11)\.
- Gebhart et al\. \(2023\)Gebhart, T\., Hansen, J\., & Schrater, P\. \(2023\)\.Knowledge sheaves: A sheaf\-theoretic framework for knowledge graph embedding\.InProceedings of The 26th International Conference on Artificial Intelligence and Statistics\.PMLR volume 206 ofProceedings of Machine Learning Research\.
- Hansen & Ghrist \(2019\)Hansen, J\., & Ghrist, R\. \(2019\)\.Toward a spectral theory of cellular sheaves\.Journal of Applied and Computational Topology,3, 315–358\. doi:[10\.1007/s41468\-019\-00038\-7](http://dx.doi.org/10.1007/s41468-019-00038-7)\.
- Hastie et al\. \(2009\)Hastie, T\., Tibshirani, R\., & Friedman, J\. \(2009\)\.The Elements of Statistical Learning: Data Mining, Inference, and Prediction\.\(2nd ed\.\)\.New York: Springer\.
- Haussler \(1999\)Haussler, D\. \(1999\)\.Convolution Kernels on Discrete Structures\.Technical Report UCSC\-CRL\-99\-10 University of California, Santa Cruz\.
- Hofmann et al\. \(2008\)Hofmann, T\., Schölkopf, B\., & Smola, A\. J\. \(2008\)\.Kernel methods in machine learning\.The Annals of Statistics,36, 1171–1220\. doi:[10\.1214/009053607000000677](http://dx.doi.org/10.1214/009053607000000677)\.
- Jansen et al\. \(2024\)Jansen, P\., Côté, M\.\-A\., Khot, T\., Bransom, E\., Dalvi Mishra, B\., Majumder, B\. P\., Tafjord, O\., & Clark, P\. \(2024\)\.DiscoveryWorld: A virtual environment for developing and evaluating automated scientific discovery agents\.InAdvances in Neural Information Processing Systems\.volume 37\.URL:[https://proceedings\.neurips\.cc/paper\_files/paper/2024/hash/13836f251823945316ae067350a5c366\-Abstract\-Datasets\_and\_Benchmarks\_Track\.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/13836f251823945316ae067350a5c366-Abstract-Datasets_and_Benchmarks_Track.html)Datasets and Benchmarks Track\.
- Johnstone \(2002\)Johnstone, P\. T\. \(2002\)\.Sketches of an Elephant: A Topos Theory Compendium\.Oxford: Oxford University Press\.
- Kuhn \(1962\)Kuhn, T\. S\. \(1962\)\.The Structure of Scientific Revolutions\.Chicago: University of Chicago Press\.
- Langley et al\. \(1987\)Langley, P\., Simon, H\. A\., Bradshaw, G\. L\., & Zytkow, J\. M\. \(1987\)\.Scientific Discovery: Computational Explorations of the Creative Processes\.Cambridge, MA: MIT Press\.
- Lieto et al\. \(2019\)Lieto, A\., Perrone, F\., Pozzato, G\. L\., & Chiodino, E\. \(2019\)\.Beyond subgoaling: A dynamic knowledge generation framework for creative problem solving in cognitive architectures\.Cognitive Systems Research,58, 305–316\. doi:[10\.1016/j\.cogsys\.2019\.08\.005](http://dx.doi.org/10.1016/j.cogsys.2019.08.005)\.
- Lu et al\. \(2024\)Lu, C\., Lu, C\., Lange, R\. T\., Foerster, J\., Clune, J\., & Ha, D\. \(2024\)\.The AI scientist: Towards fully automated open\-ended scientific discovery\.arXiv preprint arXiv:2408\.06292\.
- Mac Lane & Moerdijk \(1992\)Mac Lane, S\., & Moerdijk, I\. \(1992\)\.Sheaves in Geometry and Logic: A First Introduction to Topos Theory\.New York: Springer\.
- Majumder et al\. \(2025\)Majumder, B\. P\., Surana, H\., Agarwal, D\., Dalvi Mishra, B\., Meena, A\., Prakhar, A\., Vora, T\., Khot, T\., Sabharwal, A\., & Clark, P\. \(2025\)\.DiscoveryBench: Towards data\-driven discovery with large language models\.InInternational Conference on Learning Representations\.URL:[https://openreview\.net/forum?id=vyflgpwfJW](https://openreview.net/forum?id=vyflgpwfJW)\.[arXiv:2407\.01725](http://arxiv.org/abs/2407.01725)\.
- Matsubara et al\. \(2022\)Matsubara, Y\., Chiba, N\., Igarashi, R\., & Ushiku, Y\. \(2022\)\.Rethinking symbolic regression datasets and benchmarks for scientific discovery\.[arXiv:2206\.10540](http://arxiv.org/abs/2206.10540)\.
- Morgan & Morrison \(1999\)Morgan, M\. S\., & Morrison, M\. \(Eds\.\) \(1999\)\.Models as Mediators: Perspectives on Natural and Social Science\.Cambridge: Cambridge University Press\.
- Nersessian \(2008\)Nersessian, N\. J\. \(2008\)\.Creating Scientific Concepts\.Cambridge, MA: MIT Press\.
- Popper \(1959\)Popper, K\. R\. \(1959\)\.The Logic of Scientific Discovery\.London: Hutchinson\.
- Robinson \(2017\)Robinson, M\. \(2017\)\.Sheaves are the canonical data structure for sensor integration\.Information Fusion,36, 208–224\. doi:[10\.1016/j\.inffus\.2016\.12\.002](http://dx.doi.org/10.1016/j.inffus.2016.12.002)\.
- Romera\-Paredes et al\. \(2024\)Romera\-Paredes, B\., Barekatain, M\., Novikov, A\., Balog, M\., Kumar, M\. P\., Dupont, E\., Ruiz, F\. J\. R\., Ellenberg, J\. S\., Wang, P\., Fawzi, O\., Kohli, P\., & Fawzi, A\. \(2024\)\.Mathematical discoveries from program search with large language models\.Nature,625, 468–475\. doi:[10\.1038/s41586\-023\-06924\-6](http://dx.doi.org/10.1038/s41586-023-06924-6)\.
- Schmidt & Lipson \(2009\)Schmidt, M\., & Lipson, H\. \(2009\)\.Distilling free\-form natural laws from experimental data\.Science,324, 81–85\. doi:[10\.1126/science\.1165893](http://dx.doi.org/10.1126/science.1165893)\.
- Schölkopf & Smola \(2002\)Schölkopf, B\., & Smola, A\. J\. \(2002\)\.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond\.Cambridge, MA: MIT Press\.
- Shawe\-Taylor & Cristianini \(2004\)Shawe\-Taylor, J\., & Cristianini, N\. \(2004\)\.Kernel Methods for Pattern Analysis\.Cambridge: Cambridge University Press\.
- Shervashidze et al\. \(2011\)Shervashidze, N\., Schweitzer, P\., van Leeuwen, E\. J\., Mehlhorn, K\., & Borgwardt, K\. M\. \(2011\)\.Weisfeiler–Lehman graph kernels\.Journal of Machine Learning Research,12, 2539–2561\.
- Stone \(1974\)Stone, M\. \(1974\)\.Cross\-validatory choice and assessment of statistical predictions\.Journal of the Royal Statistical Society: Series B,36, 111–147\.
- Sun \(2009\)Sun, R\. \(2009\)\.Theoretical status of computational cognitive modeling\.Cognitive Systems Research,10, 124–140\. doi:[10\.1016/j\.cogsys\.2008\.07\.002](http://dx.doi.org/10.1016/j.cogsys.2008.07.002)\.
- Thagard \(2012\)Thagard, P\. \(2012\)\.The Cognitive Science of Science: Explanation, Discovery, and Conceptual Change\.Cambridge, MA: MIT Press\.
- Udrescu & Tegmark \(2020\)Udrescu, S\.\-M\., & Tegmark, M\. \(2020\)\.AI Feynman: A physics\-inspired method for symbolic regression\.Science Advances,6, eaay2631\. doi:[10\.1126/sciadv\.aay2631](http://dx.doi.org/10.1126/sciadv.aay2631)\.
- Vishwanathan et al\. \(2010\)Vishwanathan, S\. V\. N\., Schraudolph, N\. N\., Kondor, R\., & Borgwardt, K\. M\. \(2010\)\.Graph kernels\.Journal of Machine Learning Research,11, 1201–1242\.
- Wang et al\. \(2022\)Wang, R\., Jansen, P\., Côté, M\.\-A\., & Ammanabrolu, P\. \(2022\)\.ScienceWorld: Is your agent smarter than a 5th grader?InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing\(pp\. 11279–11298\)\.Abu Dhabi, United Arab Emirates: Association for Computational Linguistics\.URL:[https://aclanthology\.org/2022\.emnlp\-main\.775/](https://aclanthology.org/2022.emnlp-main.775/)\. doi:[10\.18653/v1/2022\.emnlp\-main\.775](http://dx.doi.org/10.18653/v1/2022.emnlp-main.775)\.
## Appendix A Auxiliary Evaluation Protocols
The main text defines the primary procedures: transition\-card construction and finite obstruction ranking\. This appendix records the auxiliary protocols used for the secondary constellation\-kernel probe, stress expansion, and robustness sweeps\. These procedures do not define additional discovery mechanisms; they test transfer, stability, and boundary cases around the direct obstruction ranking rule\.
Algorithm 3: Leave-family-out constellation-kernel evaluation

Input: Candidate signatures $\{\Phi(T,\Delta_j)\}$ grouped by transition family

Output: Kernel ranking metrics

1. Construct the additive constellation kernel $k(a,b)$ from Eq. ([2](https://arxiv.org/html/2605.14033#S5.E2))
2. for each held-out family $f\in\mathcal{F}$ do
   3. Let $\mathcal{A}_{\mathrm{train}}$ contain candidate rows $a=(T,\Delta_j)$ from $\mathcal{F}\setminus\{f\}$
   4. Let $\mathcal{A}_{\mathrm{test}}$ contain candidate rows from $f$
   5. Fit the kernel scoring model on $\mathcal{A}_{\mathrm{train}}$
   6. Score and rank candidates in each held-out transition card
   7. Record top-1 accuracy, reciprocal rank, and transition-type correctness

return family-level and aggregate kernel metrics

Algorithm 4: Stress expansion with incorrect and matched-cost alternatives

Input: Transition-card collection $\mathcal{T}$

Output: Stress-test margins and boundary cases

1. for each transition card $T=(\mathcal{K}_0,D_s,D_o,D_t,D_v,\{\Delta_j\}_{j=1}^{m})\in\mathcal{T}$ do
   2. Expand the candidate set with controlled incorrect formulas, randomized perturbations, and matched-cost incorrect extensions
   3. For each candidate move $\Delta_j$, form $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$ and compute $\mathsf{Obs}_S(\mathcal{K}_j)$
   4. Compute the stress margin $M(T)=\mathsf{Obs}_S(\mathcal{K}_{\mathrm{best\ incorrect}})-\mathsf{Obs}_S(\mathcal{K}_{\mathrm{ref}})$, where $\mathcal{K}_{\mathrm{ref}}$ is the benchmark-correct candidate and $\mathcal{K}_{\mathrm{best\ incorrect}}$ is the lowest-obstruction non-reference or matched-cost alternative
   5. Mark $T$ as a boundary case if $M(T)<0$

return stress margins, selected candidates, and boundary cases

Algorithm 5: Noise and observation-record robustness sweep

Input: Transition-card collection $\mathcal{T}$; noise levels $\mathcal{E}$; record fractions $\mathcal{Q}$

Output: Robustness metrics over $\mathcal{E}\times\mathcal{Q}$

1. for each $\eta\in\mathcal{E}$ do
   2. for each $q\in\mathcal{Q}$ do
      3. Perturb the observation values in $D_s,D_o,D_t,D_v$ at noise level $\eta$
      4. Retain a fraction $q$ of observation records in each context
      5. Recompute $\widehat{j}(T)=\operatorname*{arg\,min}_{j}\mathsf{Obs}_S(\mathcal{K}_j)$ for each perturbed card
      6. Record top-1 accuracy, mean reciprocal rank, transition-type accuracy, and margin statistics

return robustness summaries over noise levels and retained-record fractions
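The stress margin of Algorithm 4 and the sweep of Algorithm 5 can be sketched together. This is a minimal illustration under stated assumptions: records are `(input, observed value)` pairs, noise is additive Gaussian with standard deviation $\eta$, subsampling is uniform without replacement, and the obstruction computation is passed in as a user-supplied `score` function. None of these choices is confirmed by the source, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[float, float]        # (input, observed value): assumed layout
Contexts = Dict[str, List[Record]]  # e.g. keys "D_s", "D_o", "D_t", "D_v"
Scores = Dict[str, float]           # candidate name -> obstruction total


def stress_margin(obs: Scores, ref: str) -> float:
    """M(T): lowest non-reference obstruction minus the reference obstruction."""
    best_incorrect = min(v for name, v in obs.items() if name != ref)
    return best_incorrect - obs[ref]


def perturb_and_subsample(contexts: Contexts, eta: float, q: float,
                          rng: random.Random) -> Contexts:
    """Add Gaussian noise of scale eta, then keep a fraction q per context."""
    out: Contexts = {}
    for name, records in contexts.items():
        noisy = [(x, y + rng.gauss(0.0, eta)) for x, y in records]
        keep = max(1, int(q * len(noisy) + 0.5))  # round half up, at least one
        out[name] = rng.sample(noisy, keep)
    return out


def sweep(contexts: Contexts, noise_levels: Sequence[float],
          fractions: Sequence[float], score: Callable[[Contexts], Scores],
          ref: str, seed: int = 0) -> Dict[Tuple[float, float], float]:
    """Stress margin at every (eta, q); a negative margin is a boundary case."""
    rng = random.Random(seed)
    margins = {}
    for eta in noise_levels:
        for q in fractions:
            data = perturb_and_subsample(contexts, eta, q, rng)
            margins[(eta, q)] = stress_margin(score(data), ref)
    return margins
```

A caller would supply `score` as whatever routine fits each candidate on the perturbed contexts and returns its obstruction total. Aggregating the returned margins over many cards and seeds would give the margin statistics mentioned in step 6 of Algorithm 5.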
## Appendix B Transition-Card Anatomy
The benchmark has three nested levels: transition family, transition card, and observation record\. A transition family specifies a scientific\-transition archetype and the deformation\-versus\-extension question\. A transition card is one concrete instance of that family, with a source constellation, source, overlap, target, and validation contexts, candidate moves, constraints, limits, and an evaluation label\. An observation record is one local context\-indexed datum inside one regime of the card\. Obstruction is computed over the whole transition card for each candidate move, not over a single observation record\.
Table 10: Nested levels in the transition-card benchmark.
For example, in the Galilean-to-Lorentz family, a transition card asks whether Galilean velocity addition can be transported into a higher-velocity regime by bounded deformation or whether Lorentzian structure must be added. The source data $D_s$ contain low-velocity observations, the overlap data $D_o$ contain intermediate velocities, the target data $D_t$ contain high subluminal velocities, and $D_v$ is held out for validation. Candidate moves include the unchanged Galilean law, deformations inside the original language, controlled incorrect extensions, and the Lorentzian extension. The obstruction $\mathsf{Obs}_S(\mathcal{K}_j)$ is computed for each candidate constellation $\mathcal{K}_j=\Delta_j(\mathcal{K}_0)$.
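The three nested levels can be pictured as simple data structures. The sketch below is one possible layout, not the benchmark's schema: every class name, field name, and candidate label is hypothetical, and the example observations are generated from the Lorentz composition rule with $c=1$ rather than taken from the benchmark.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObservationRecord:
    context: str               # "source", "overlap", "target", or "validation"
    inputs: Dict[str, float]   # context-indexed input quantities
    value: float               # observed quantity


@dataclass
class TransitionCard:
    family: str
    candidate_moves: List[str]
    reference_move: str
    transition_type: str       # "deformation" or "extension"
    records: List[ObservationRecord] = field(default_factory=list)

    def regime(self, context: str) -> List[ObservationRecord]:
        """All records belonging to one context of the card."""
        return [r for r in self.records if r.context == context]


@dataclass
class TransitionFamily:
    name: str
    cards: List[TransitionCard] = field(default_factory=list)


def make_example_card() -> TransitionCard:
    """One illustrative Galilean-to-Lorentz card with one record per context."""
    pairs = {"source": (0.01, 0.02), "overlap": (0.3, 0.4),
             "target": (0.8, 0.9), "validation": (0.6, 0.7)}
    records = [
        ObservationRecord(ctx, {"u": u, "v": v}, (u + v) / (1.0 + u * v))
        for ctx, (u, v) in pairs.items()
    ]
    return TransitionCard(
        family="galilean_to_lorentz",
        candidate_moves=["unchanged_galilean", "polynomial_deformation",
                         "incorrect_extension", "lorentz_extension"],
        reference_move="lorentz_extension",
        transition_type="extension",
        records=records,
    )
```

Keeping the context label on each record, rather than storing four separate arrays, makes it straightforward to compute obstruction over the whole card while still restricting to a single regime when a chart is fitted or tested.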
## Appendix CSecondary Kernel Details
The constellation kernel is a secondary diagnostic\. The main text summarizes the kernel results in Figure[9](https://arxiv.org/html/2605.14033#S7.F9); the tables below report the exact aggregate values behind the compact kernel diagnostics\.
Table[11](https://arxiv.org/html/2605.14033#A3.T11)reports ablations of the kernel blocks in Eq\. \([2](https://arxiv.org/html/2605.14033#S5.E2)\)\. These values are used only to interpret the secondary kernel probe and do not replace direct obstruction ranking by𝖮𝖻𝗌S\\mathsf\{Obs\}\_\{S\}\.
Table 11:Kernel ablation results\. The ablations test which signature blocks contribute to the secondary constellation\-kernel probe\.Table[12](https://arxiv.org/html/2605.14033#A3.T12)reports a controlled variant suite for the secondary kernel\. The suite distinguishes within\-family generalization, mixed\-variant generalization, and leave\-family\-out transfer\. The last setting is the hardest because the kernel must compare candidate signatures across different scientific\-transition archetypes\.
Table 12: Controlled variant constellation\-kernel suite\. The suite probes representation quality of the secondary kernel and does not replace the primary obstruction\-ranking experiment\.
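The leave-family-out setting can be sketched as a grouped split in which every card of one family is held out at a time. This is an illustration under assumptions, not the evaluation code: the function name `leave_family_out`, the card identifiers, and the placeholder family names `family_B` and `family_C` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def leave_family_out(
    cards_by_family: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_family, train_cards, test_cards), one split per family."""
    families = sorted(cards_by_family)
    for held_out in families:
        test = list(cards_by_family[held_out])
        train = [
            card
            for family in families
            if family != held_out
            for card in cards_by_family[family]
        ]
        yield held_out, train, test


# Placeholder card identifiers; only the first family name appears in the paper.
example = {
    "Galilean-to-Lorentz": ["gl-0", "gl-1"],
    "family_B": ["b-0", "b-1"],
    "family_C": ["c-0"],
}

splits = list(leave_family_out(example))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)
```

Because no card from the held-out family is seen during fitting, any kernel similarity used at test time has to be carried by signature structure shared across archetypes, which is why this setting is harder than the within-family and mixed-variant splits.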
## Appendix D Reproducibility Note
The accompanying implementation records candidate\-level obstruction components, stress\-test margins, sensitivity sweeps, robustness summaries, and kernel diagnostics as structured tables\. The manuscript reports the corresponding figures, aggregate metrics, and conceptual summaries; implementation\-level filenames and run artifacts remain with the accompanying code rather than the scientific narrative\.