Interaction Locality in Hierarchical Recursive Reasoning
Summary
Proposes interaction locality, a task-geometry-aware framework for measuring whether information flow in spatial reasoning models stays within local cells or crosses into global structure, and applies it to HRM, TRM, and MTU3D models on grid benchmarks and embodied 3D grounding.
# Interaction Locality in Hierarchical Recursive Reasoning
Source: [https://arxiv.org/html/2605.20784](https://arxiv.org/html/2605.20784)
###### Abstract
Spatial reasoning requires both location\-bound computation and location\-invariant structure: agents must make local moves while preserving route, object, or constraint\-level plans\. We propose interaction locality, a task\-geometry\-aware framework for measuring whether information flow stays within nearby cells or semantic segments, or crosses them\. We instantiate the framework with sparse\-autoencoder feature ablations and finite\-noise activation patching, with structural Jacobian and attention checks reported in the appendix, and apply it to HRM and TRM, two compact hierarchical and recursive reasoning models, on Maze\-Hard, Sudoku Extreme, and ARC\-AGI\. Across these models, activation patching gives the clearest architectural fingerprint: high\-level recurrent states tend to write information within nearby cells or same\-segment units, while repeated recursive updates accumulate these local writes into broader solution structure\. This pattern holds across maze paths, Sudoku constraints, and ARC\-AGI object neighborhoods, with the strongest concentration in TRM\. To test whether interaction locality extends beyond toy\-yet\-challenging grid benchmarks, we also apply it to MTU3D, a large\-scale embodied 3D scene\-grounding model\. In this MTU3D setting, causal spatial locality appears primarily at the transition where visual scene features are handed to the downstream grounding module, rather than uniformly throughout the visual encoder\. This contrast suggests that the local\-to\-global handoff observed in HRM and TRM is tied to explicit recursive reasoning dynamics, while embodied 3D models may concentrate causal spatial structure at module boundaries\. Interaction locality turns the intuitive local\-execution/global\-planning story into a reproducible measurement framework for recursive and embodied spatial reasoning\.
## 1 Introduction
Spatial reasoning is central to real\-world agents: robots manipulate objects, vehicles plan around obstacles, and navigation systems search maps while maintaining route\-level intent\. These tasks require a characteristic separation of scale\. A model must preserve location\-specific facts such as walls, cells, colors, and immediate moves, while also forming location\-invariant structure such as corridors, constraint houses, objects, and global plans\. Because spatial reasoning systems can affect physical actions, interpretability should explain how local and global information flow through the computation\.
Compact recursive architectures provide a focused testbed for this question\. The Hierarchical Reasoning Model \(HRM; Wang et al\. \([2025](https://arxiv.org/html/2605.20784#bib.bib1)\)\) and Tiny Recursive Model \(TRM; Jolicoeur\-Martineau \([2025](https://arxiv.org/html/2605.20784#bib.bib2)\)\) solve challenging Maze\-Hard, Sudoku Extreme, and ARC\-AGI tasks by repeatedly applying small Transformer modules \(Vaswani et al\., [2017](https://arxiv.org/html/2605.20784#bib.bib36)\)\. HRM has separate high\-level and low\-level modules, whereas TRM reuses a single module across recursive calls\. It is tempting to equate the H state with global planning and the L state with local refinement, but this is a hypothesis, not a mechanistic measurement: a label does not determine the spatial reach of an update\.
We make this hypothesis testable through *interaction locality*\. For each task, the external geometry defines both fine sites and coarser semantic segments: maze cells and corridors, Sudoku cells and houses, and ARC\-AGI foreground cells and objects\. We ask whether hidden features, update kernels, or finite perturbations stay within these neighborhoods or cross them\. This gives a single coordinate system for comparing local movement, constraint propagation, object\-level aggregation, and route\-level planning across architectures\.
Our contribution is a unified locality framework and a controlled analysis of HRM/TRM under a layered probe suite\. Sparse autoencoder \(SAE\) feature ablations expose human\-readable segment effects, finite\-noise activation patching provides the main causal test of spatial reach, and appendix structural Jacobian/attention checks report the corresponding linearized or architectural topology\. The probes agree on a sharper conclusion than the informal "H is global" story: across Maze, Sudoku, and ARC\-AGI, H\-level writes are often more spatially concentrated than L\-level writes, but cross\-cycle propagation can still carry those summaries broadly through the recursive state\. We also stress\-test the framework beyond toy grids by applying it to MTU3D, a 3D embodied navigation and grounding model \(Zhu et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib37)\), on ScanNet indoor scenes \(Dai et al\., [2017](https://arxiv.org/html/2605.20784#bib.bib38)\)\. There, causal spatial locality appears at the visual\-to\-grounding handoff but largely disappears inside the unified encoder\.
## 2 Background and related work
Both HRM and TRM maintain carry states $(z_H, z_L) \in \mathbb{R}^{B \times T \times D}$ through repeated reasoning steps\. We write cycle labels such as H1L1 as zero\-indexed high\-phase/low\-call locations\. At a low call,

$$z_L^{(n+1)} = f_L\left(z_L^{(n)},\, z_H^{(n)} + \mathbf{e}\right), \qquad z_H^{(n+1)} = f_H\left(z_H^{(n)},\, z_L^{(n+1)}\right), \tag{1}$$

where $\mathbf{e}$ is the token embedding stream\. HRM uses distinct $f_L$ and $f_H$ modules; TRM shares a single module across H/L calls, so any H/L distinction must arise from call context and state trajectory rather than module identity\. Predictions and ACT halting \(Graves, [2016](https://arxiv.org/html/2605.20784#bib.bib4)\) are read from $z_H$\.
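The recurrence above can be sketched in a few lines of Python\. This is a minimal illustration under an assumed nested schedule, in which each high phase runs several low calls and then one high update\. The names `run_schedule` and `mix` and the toy averaging update are hypothetical and are not taken from the released HRM or TRM code\.

```python
from typing import Callable, List, Tuple

State = List[float]  # toy stand-in for one flattened carry state


def mix(state: State, context: State) -> State:
    """Hypothetical update rule: average the state with its context."""
    return [0.5 * s + 0.5 * c for s, c in zip(state, context)]


def run_schedule(
    z_h: State,
    z_l: State,
    e: State,
    f_l: Callable[[State, State], State],
    f_h: Callable[[State, State], State],
    high_phases: int,
    low_calls: int,
) -> Tuple[State, State, List[str]]:
    """Run low calls inside each high phase (assumed schedule) and log cycle labels."""
    labels: List[str] = []
    for h in range(high_phases):
        for l in range(low_calls):
            # Low call: z_L <- f_L(z_L, z_H + e)
            z_l = f_l(z_l, [zh + ei for zh, ei in zip(z_h, e)])
            labels.append(f"H{h}L{l}")  # zero-indexed high-phase/low-call label
        # High update: z_H <- f_H(z_H, z_L)
        z_h = f_h(z_h, z_l)
    return z_h, z_l, labels


# A 2x2 schedule; passing the same function twice mimics a shared module.
z_h, z_l, labels = run_schedule([0.0, 0.0], [0.0, 0.0], [1.0, 2.0], mix, mix, 2, 2)
print(labels)
```

With a 2x2 schedule the last label is H1L1, which matches the zero\-indexed naming used for the critical cycle locations\.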
Mechanistic interpretability studies neural computation through causal mediators, circuits, and sparse features \(Elhage et al\., [2021](https://arxiv.org/html/2605.20784#bib.bib5); Meng et al\., [2022](https://arxiv.org/html/2605.20784#bib.bib6); Bricken et al\., [2023](https://arxiv.org/html/2605.20784#bib.bib7)\)\. Structured domains are useful because external geometry gives an interpretable basis for evaluating internal variables: prior work finds spatially grounded state in chess, maze\-solving Transformers, and geographical representations \(Toshniwal et al\., [2022](https://arxiv.org/html/2605.20784#bib.bib9); Jenner et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib10); Ivanitskiy et al\., [2023](https://arxiv.org/html/2605.20784#bib.bib11); Spies et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib13); Sabbata et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib12)\)\. Recent spatial reasoning models extend this motivation to static VLMs, navigation, and robot policies: SpatialVLM injects spatial supervision, NaviLLM\-like models tokenize viewpoint/history, and OpenVLA\-style policies map vision\-language state to actions \(Chen et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib31); Zheng et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib32); Kim et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib33)\)\. Interpretability for these systems is moving from visualization toward probes, feature interventions, spatial/temporal IDs, and reasoning traces \(Spies et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib13); Kang et al\., [2026](https://arxiv.org/html/2605.20784#bib.bib35); Wu et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib34)\)\. These studies motivate a reusable framework: task\-specific visualizations are valuable, but a common notion of locality is needed to compare reasoning across tasks and architectures\.
For spatial reasoning, causal interpretability should ultimately ask what changes when internal state is changed\. Activation patching and related interventions localize causal mediators in language models \(Meng et al\., [2022](https://arxiv.org/html/2605.20784#bib.bib6); Syed et al\., [2023](https://arxiv.org/html/2605.20784#bib.bib26)\), while spatial\-reasoning interpretability has begun to combine probes, sparse features, and causal interventions in maze, grid\-world, and VLM settings \(Spies et al\., [2024](https://arxiv.org/html/2605.20784#bib.bib13); Kang et al\., [2026](https://arxiv.org/html/2605.20784#bib.bib35); Wu et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib34)\)\. We therefore treat locality as an intervention\-level question: after perturbing one cell, object, or cycle state, does the effect remain near the source, stay inside a semantic segment, or spread globally? Jacobian and attention kernels remain useful structural summaries, but the main framework emphasizes finite perturbations because they are closer to causal behavior under non\-infinitesimal changes\.
## 3 Experimental setup and methods
We analyze released HRM and TRM checkpoints on Maze\-Hard, Sudoku Extreme, and ARC\-AGI\. HRM uses a $2 \times 2$ H/L schedule; TRM uses a shared module with longer schedules \(three high phases and four low calls for Maze/ARC\-AGI, six for Sudoku\)\. Main analyses are anchored at critical cycle locations selected by state\-change convergence: HRM H1L1 for all tasks, TRM H2L3 for Maze/ARC\-AGI, and TRM H2L5 for Sudoku\. Finite\-noise patching uses $n = 50$ examples per model–task cell\. ARC\-AGI neighborhoods are same\-color 4\-connected foreground components after padding grids to $30 \times 30$ and filtering to examples with 2–15 components\. Appendix [G](https://arxiv.org/html/2605.20784#A7) reports the structural Jacobian analyses and their sample counts\.
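The ARC\-AGI neighborhood definition can be illustrated with a small breadth\-first labeling routine\. This is a sketch under stated assumptions: color 0 is treated as background, grids are padded with background at the bottom and right, and the helper names `pad_grid`, `label_components`, and `keep_example` are hypothetical, not the authors' code\.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]


def pad_grid(grid: Grid, size: int = 30, fill: int = 0) -> Grid:
    """Pad a grid to size x size with the (assumed) background color."""
    rows = [list(r) + [fill] * (size - len(r)) for r in grid]
    rows += [[fill] * size for _ in range(size - len(rows))]
    return rows


def label_components(grid: Grid, background: int = 0) -> Tuple[Grid, int]:
    """Label same-color 4-connected foreground components; -1 marks background."""
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] == background or labels[r][c] != -1:
                continue
            color = grid[r][c]
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1 and grid[ny][nx] == color):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


def keep_example(n_components: int, lo: int = 2, hi: int = 15) -> bool:
    """Filter used in the text: keep examples with 2 to 15 components."""
    return lo <= n_components <= hi


toy = [[1, 1, 0],
       [0, 2, 2],
       [1, 0, 2]]
labels, n = label_components(pad_grid(toy))
print(n, keep_example(n))
```

In this toy grid the two color\-1 regions are separate components because they do not touch along an edge, so the neighborhood of a cell is its own object rather than every cell of the same color\.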
#### SAE semantic locality\.
For H and L activations at the critical cycle, we train independent sparse autoencoders \($512 \to 2048$, $\lambda_1 = 10^{-3}$\), rank features by ablation impact, and compute how much each feature’s effect remains within a task segment \(corridor, Sudoku house, or ARC object\)\. SAE locality is human\-readable but feature\-wise: it reveals where individual sparse features act, not necessarily the aggregate topology of the whole update\.
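As a rough illustration of the segment\-fraction idea, the sketch below zeroes one sparse feature and measures how much of the resulting change falls inside a chosen segment\. It makes several assumptions that the text does not specify: a ReLU encoder with a linear decoder, an effect measured on the SAE reconstruction rather than on downstream model outputs, and tiny hand\-written weights in place of a trained 512 to 2048 dictionary\. All function names are hypothetical\.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """ReLU encoder: one row of w_enc per sparse feature."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]


def decode(h: Vector, w_dec: Matrix) -> Vector:
    """Linear decoder: one row of w_dec (a direction) per sparse feature."""
    dim = len(w_dec[0])
    return [sum(h[k] * w_dec[k][d] for k in range(len(h))) for d in range(dim)]


def ablation_effect(acts: Matrix, w_enc: Matrix, b_enc: Vector,
                    w_dec: Matrix, feature: int) -> Vector:
    """Per-position L2 change in the reconstruction when one feature is zeroed."""
    effects = []
    for x in acts:
        h = encode(x, w_enc, b_enc)
        h_ablated = list(h)
        h_ablated[feature] = 0.0
        full, ablated = decode(h, w_dec), decode(h_ablated, w_dec)
        effects.append(sum((a - b) ** 2 for a, b in zip(full, ablated)) ** 0.5)
    return effects


def segment_fraction(effects: Vector, segment: Sequence[int]) -> float:
    """Share of total ablation effect that lands inside the segment."""
    total = sum(effects)
    return sum(effects[i] for i in segment) / total if total else 0.0


# Four positions, two features, identity weights (toy values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
acts = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
effects = ablation_effect(acts, identity, [0.0, 0.0], identity, feature=0)
print(segment_fraction(effects, segment=[0, 1]))
```

A fraction near 1 would mean the feature acts almost entirely inside the segment; comparing it with the segment's share of positions gives a simple reference point\.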
#### Finite\-noise activation patching\.
For a source site $v$, injection level $a$, and capture level $b$, we perturb the single activation vector $z_a[v]$ by $\sigma_a \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$, and measure the activation\-difference field at each target site $u$:

$$A_{a \to b}[u, v] = \left\| z_b^{\mathrm{patched}}[u] - z_b^{\mathrm{clean}}[u] \right\|_2, \qquad L_{a \to b}(v) = \frac{\sum_{u \in \mathcal{N}(v)} A_{a \to b}[u, v]}{\sum_{u} A_{a \to b}[u, v]}. \tag{2}$$

The neighborhood $\mathcal{N}(v)$ is fixed before measurement: distance at most one along the Maze path, the same $3 \times 3$ Sudoku subgrid, or the same 4\-connected ARC\-AGI foreground object\. The denominator ranges over the same valid task positions, and rows with zero total activation change are excluded from the average\. The random baseline is computed with the same neighborhood sizes, so $L_{a \to b}$ is interpreted as locality above a geometry\-matched null\. Noise scale is calibrated independently of the locality score: Maze and Sudoku reuse the reliability\-calibrated scale targeting roughly 30% self\-drop at a probe position, while ARC\-AGI uses the SNR=1 calibration from the cross\-dataset runs\. We report within\-L, within\-H, and cross\-level/cross\-cycle analogs; higher values mean the finite perturbation has more spatially local causal reach\. Structural Jacobian and attention analogs are defined and reported in Appendix [G](https://arxiv.org/html/2605.20784#A7) as topology checks rather than finite\-causal effects\.
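The locality score in Equation 2 reduces to a ratio of sums once the activation\-difference field is available\. The sketch below assumes that field has already been measured and stored as `A[u][v]`, and that each neighborhood includes the source site itself\. The baseline shown is one simple reading of a size\-matched null, namely the expected local fraction if change mass were spread uniformly over targets; the paper's exact baseline procedure is not specified here, so treat `uniform_baseline` as an assumption\.

```python
from typing import Dict, List, Set


def locality_scores(A: List[List[float]],
                    neighborhoods: Dict[int, Set[int]]) -> Dict[int, float]:
    """L(v) = local change mass / total change mass, per source site v.

    A[u][v] is the change at target u after perturbing source v.
    Sources with zero total change are excluded, as in the text.
    """
    n_targets = len(A)
    scores: Dict[int, float] = {}
    for v, nbrs in neighborhoods.items():
        total = sum(A[u][v] for u in range(n_targets))
        if total == 0:
            continue
        scores[v] = sum(A[u][v] for u in nbrs) / total
    return scores


def uniform_baseline(neighborhoods: Dict[int, Set[int]], n_targets: int) -> float:
    """Assumed null: mean of |N(v)| / n_targets over sources."""
    return sum(len(n) for n in neighborhoods.values()) / (len(neighborhoods) * n_targets)


# Toy 4-cell path with "distance at most one" neighborhoods.
neighborhoods = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
A = [
    [3.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 4.0, 0.0, 1.0],
]
scores = locality_scores(A, neighborhoods)
mean_locality = sum(scores.values()) / len(scores)
print(scores, mean_locality, uniform_baseline(neighborhoods, n_targets=4))
```

In this toy case source 2 is dropped because its perturbation changes nothing, which keeps a dead channel from being counted as either local or global\.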
Table 1: Probe suite used to instantiate interaction locality\. The main text prioritizes finite perturbations as causal evidence; SAE features provide semantic readability, and appendix\-only structural probes report linearized topology and attention bias\.

Figure 1: Interaction\-locality framework, with panels \(a\) Task geometries and \(b\) Interaction\-locality pipeline\. Panel \(a\) shows the task geometries that define locality: Maze path cells and corridors, Sudoku cells and $3 \times 3$ boxes, and ARC\-AGI foreground objects/examples\. Panel \(b\) shows how the same measurement suite is aligned across models: HRM and TRM are analyzed at a critical recursive step and cycle, while MTU3D/ScanNet is analyzed at critical layers or blocks\. The green measurement box denotes SAE semantic locality by segment/neighborhood, the blue box denotes within\-cycle or within\-layer finite\-noise patching sensitivity, and the purple box denotes cross\-cycle or cross\-layer patching sensitivity\. Structural Jacobian and attention diagnostics are reported in the appendix as topology checks rather than the primary finite\-intervention evidence\.
#### MTU3D/ScanNet\.
For the embodied 3D extension, we analyze MTU3D \(Zhu et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib37)\) on ScanNet indoor scenes \(Dai et al\., [2017](https://arxiv.org/html/2605.20784#bib.bib38)\)\. MTU3D is a 3D occupancy\-transformer model that predicts scene structure from object\-level scene features; ScanNet provides reconstructed indoor environments with object geometry and semantic annotations\. We evaluate interaction locality on 30 ScanNet scenes by patching object\-level activations at selected layers/blocks and measuring changes in the model’s scene\-completion or occupancy prediction metric\. Local neighborhoods are defined by object distance in the reconstructed 3D scene, and random near\-object patches provide the geometry\-matched baseline\.
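One way to turn object distance into neighborhoods is a fixed\-radius rule over object centroids, sketched below\. The radius threshold, the use of centroids, and the helper name `radius_neighborhoods` are all assumptions for illustration; the text only states that neighborhoods are defined by object distance in the reconstructed scene\.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def radius_neighborhoods(centroids: List[Point], radius: float) -> Dict[int, Set[int]]:
    """Map each object index to the objects within `radius` of it (itself included)."""
    return {
        i: {j for j, q in enumerate(centroids) if math.dist(p, q) <= radius}
        for i, p in enumerate(centroids)
    }


# Three toy objects, coordinates in meters (made-up values).
centroids = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
neighborhoods = radius_neighborhoods(centroids, radius=1.5)
print(neighborhoods)
```

The resulting index sets play the same role as grid neighborhoods in the HRM/TRM analyses, so the same local\-fraction score can be computed once per\-object changes are measured\.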
#### Code availability\.
## 4 Results
### 4\.1 Convergence identifies the analysis window
Hidden states change most sharply early in reasoning\. [Figure 2](https://arxiv.org/html/2605.20784#S4.F2) shows that Maze has the clearest first\-step drop, Sudoku decays over multiple steps, and ARC\-AGI has no single cliff\. The qualitative comparisons in [Figure 3](https://arxiv.org/html/2605.20784#S4.F3) show why this early window is worth analyzing: first\-step decodes already contain much of the final solution structure\. At the same time, these decodes are observational\. They justify the analysis window but do not establish which positions causally influence later updates\. The remainder of the paper therefore uses interaction kernels and finite perturbations for mechanistic claims\.
Figure 2: H/L hidden\-state change across reasoning steps for HRM \(top\) and TRM \(bottom\)\. Blue curves track H\-level changes and orange curves track L\-level changes; shaded bands denote 95% bootstrap CIs over examples\. Maze changes sharply after the first step, Sudoku refines over several steps, and ARC\-AGI lacks a single convergence cliff; these trajectories select the critical windows used by later analyses\. [Appendix D](https://arxiv.org/html/2605.20784#A4) gives the Q\-value view\.

Figure 3: Initial, step\-1, and final decoded outputs for all model–task pairs \(HRM top row, TRM bottom row; columns are Maze, Sudoku, ARC\-AGI\)\. Each panel shows the decoded task structure at the input/initial state, after the first reasoning step, and at the final step; colors follow the task\-specific output convention for paths, digits, or ARC objects\. Step 1 already contains recognizable task structure across domains, supporting it as a solution\-forming window\.
### 4\.2 Activation patching gives the locality fingerprint
SAE features provide a useful segment\-level view \(Figure [4](https://arxiv.org/html/2605.20784#S4.F4)\): Maze HRM has more local L\-level features than H\-level features, while Sudoku and ARC\-AGI features are mostly balanced under the chosen segment definitions\. This tells us where individual sparse features act, but the causally stronger test is finite\-noise activation patching: after corrupting one source site, we measure where the resulting hidden\-state change travels\.
[Table 2](https://arxiv.org/html/2605.20784#S4.T2) gives the cross\-task activation\-patching summary\. The most stable pattern is within\-H locality\. Across all six model–task pairs, within\-H is at least as local as within\-L and is above the corresponding random\-neighborhood baseline\. The effect is clearest in TRM/Maze, where within\-H is \.580 while within\-L is near baseline at \.032, and it extends to ARC\-AGI when connected objects define the neighborhood\. Sudoku shows the same direction under a larger $3 \times 3$ house neighborhood, where the locality unit already spans a substantial constraint region\.


Figure 4: Representative SAE semantic locality examples\. Left: a Maze HRM feature whose ablation concentrates on a corridor pair; the colored corridor overlay marks the selected segment and the bar plots compare that segment with all other path cells\. Right: a Sudoku HRM feature whose ablation concentrates on a pair of $3 \times 3$ boxes; orange bars denote the highlighted box pair and gray bars denote the comparison set, with error bars showing 95% CIs\. These feature ablations give interpretable segment\-level hypotheses, such as corridors and Sudoku boxes; causal reach is tested by the finite\-noise patching results in [Table 2](https://arxiv.org/html/2605.20784#S4.T2) and [Figure 5](https://arxiv.org/html/2605.20784#S4.F5)\.

Table 2: Finite\-noise activation\-patching locality summary \(mean \[95% CI\]\)\. Locality is $L_{a \to b}$ from [Equation 2](https://arxiv.org/html/2605.20784#S3.E2): the fraction of activation\-change mass inside the task\-defined neighborhood; baseline is the expected local fraction under the same neighborhood sizes\. Within\-H is at least as local as within\-L in every model–task cell and remains above the geometry\-matched baseline, supporting a consistent local\-write signature under finite perturbations while allowing cross\-cycle channels to vary with task and architecture\.

[Figure 5](https://arxiv.org/html/2605.20784#S4.F5) shows why reliability diagnostics are needed\. In HRM/Maze, direct within\-H perturbations are concentrated and reliable: H\-level writes local summaries at each token\. Perturbations carried by $z_H$ across cycles decay more slowly, showing that the H state can propagate those summaries more broadly\. TRM/Maze, by contrast, has a within\-L channel that is almost at baseline under the shared recursive operator\. Sudoku has higher locality at both levels because a $3 \times 3$ house already forms a broad constraint unit, but it preserves the same within\-H direction\.
Figure 5: Finite\-noise activation patching analogs for Maze and Sudoku\. Left column: normalized spatial decay of the self\-effect as the patch source moves from the target cell to farther cells, boxes, or segments\. Right column: self\-drop reliability\. The gray dotted line at 0\.3 is the self\-drop calibration target used to set the perturbation scale, and the red dashed line at 0\.1 marks a dilution threshold below which a channel has little effect on its own source prediction\. Colors denote within\-H, within\-L, and cross\-level/cross\-cycle source–target channels\. HRM/Maze separates local H writes from broader cross\-cycle H\-state propagation, while TRM/Maze has a flatter shared\-operator channel\. Full raw drops and heatmaps are in Appendix [H](https://arxiv.org/html/2605.20784#A8)\.
### 4\.3 MTU3D extends the framework to embodied 3D grounding
We stress\-test the framework on MTU3D, a large\-scale 3D embodied scene\-grounding model \(Zhu et al\., [2025](https://arxiv.org/html/2605.20784#bib.bib37)\), using ScanNet indoor scenes \(Dai et al\., [2017](https://arxiv.org/html/2605.20784#bib.bib38)\)\. The experiment asks whether corrupting or restoring one object\-level 3D scene feature changes nearby objects more than distant objects\. MTU3D operates on metric 3D scene objects rather than 2D grid cells, so neighborhoods are defined by object distance in the ScanNet scene\. [Figure 6](https://arxiv.org/html/2605.20784#S4.F6) reveals two distinct outcomes\. Panel \(a\) shows converging evidence at the visual\-to\-grounding handoff \(the MTU3D Stage 1–2 boundary\): structural layer\-0 attention locality \(\.438\) and causal input patching \(\.479\) both exceed the random baseline \(\.359\), confirming the framework’s sensitivity when spatial structure is present\. Yet layer\-recovery patching inside the unified encoder remains at baseline throughout all four layers, even though structural Jacobian and attention scores are well above baseline \(panel b\)\. The dissociation is itself informative: MTU3D has a clear spatial inductive bias in its architecture, but this bias does not compound into causal locality through encoder depth\. Together with the HRM/TRM results, this suggests that the observed local\-to\-global handoff is tied to recursive reasoning dynamics rather than being a generic consequence of spatial attention alone\.
Figure 6: Interaction locality on MTU3D/ScanNet\. Bars show locality scores with 95% CIs; the dashed horizontal line is the random near\-object baseline computed from the same ScanNet object\-distance neighborhoods\. Panel \(a\) evaluates the visual\-to\-grounding handoff by comparing layer\-0 structural attention locality with causal input patching after corrupting one object feature\. Panel \(b\) compares structural Jacobian locality, attention locality, and layer\-recovery patching across the four unified\-encoder layers\. In this MTU3D/ScanNet setting, structural \(layer\-0 attention, \.438\) and causal \(input patching, \.479\) locality both exceed the random baseline \(\.359\) at the handoff, but layer\-recovery patching inside the encoder is near baseline despite structural Jacobian/attention locality above it\. The dissociation separates architectural spatial bias from causal local recovery in this 3D embodied setting\.
### 4\.4 Triangulating claims across probes
Appendix [Table 3](https://arxiv.org/html/2605.20784#A1.T3) summarizes how the probes jointly support the paper’s main claims\. The organization is layered: finite\-noise patching supplies the main causal evidence, SAE feature ablations make the task segments readable, and appendix structural checks explain the corresponding linearized topology\. This triangulation makes the interaction\-locality framework interpretable at both the semantic\-feature level and the finite\-intervention level\.
## 5 Discussion
Interaction locality, the main object of study, unifies semantic feature analysis and finite causal perturbations under the same task geometry\. This improves clarity and falsifiability: a claim such as "H is global" becomes a statement about which intervention channel is local, which channel propagates broadly, and which neighborhood defines locality\. The results therefore revise the informal H/L story\. H\-level states can write locally within a cycle, especially under activation patching, while still carrying those local summaries across recursive time\.
A key implication is that local and global computation should not be treated as static properties of a named state\. They are relations among a state, a task geometry, a perturbation scale, and a recursive time edge\. SAE ablations give readable hypotheses about which corridors, boxes, or objects a sparse feature affects, but do not by themselves establish causal reach\. Finite\-noise patching checks whether the same locality survives a non\-infinitesimal intervention, but requires reliability diagnostics to avoid mistaking signal dilution for broad influence\. The appendix Jacobian and attention analyses add structural context, but the main claims rely on intervention\-level evidence\.
The MTU3D extension clarifies the framework’s scope\. In the 30\-scene MTU3D/ScanNet setting, structural spatial bias is present in attention and Jacobian scores, yet causal recovery inside the encoder is near baseline\. This dissociation is precisely why interaction locality should compare multiple probes: an architecture can attend locally without making downstream causal recovery local\. Conversely, HRM/TRM suggest that recursive cycles can create a local\-to\-global handoff even in compact toy benchmarks\. The framework is therefore not tied to Maze, Sudoku, or ARC\-AGI; it asks which task geometry makes a causal interaction local\.
For model design, the results suggest locality\-aware objectives\. A model could receive separate losses for local move direction, segment\-to\-segment matching, and cycle\-level consistency, or regularizers that encourage the right intervention scale at the right recursive phase\. In navigation this might mean supervising local collision\-avoidance separately from route\-segment matching; in manipulation it might mean separating contact\-local corrections from object\-level goal relations\. Such objectives are not evaluated here, but interaction locality gives an operational diagnostic for whether they work: after training, the intended local and global channels should appear in the corresponding feature, patching, structural, and temporal measurements\.
## 6 Limitations and future work
The analysis compares released checkpoints rather than an accuracy\-matched training sweep, and ARC\-AGI uses HRM on ARC\-AGI\-2 and TRM on ARC\-AGI\-1\. We therefore make claims about mechanisms observed in available compact reasoning models, not about which architecture is more accurate\. The framework is also geometry\-dependent: our neighborhoods are appropriate for Maze, Sudoku, ARC\-AGI, and the MTU3D object\-distance analysis, but they do not exhaust all possible spatial relations\. Sudoku also contains row and column constraints, ARC\-AGI tasks may require transformations across same\-color object boundaries, and 3D embodied models require choices about metric distance, contact, viewpoint, and uncertainty\. A first step toward the Sudoku row/column dimension is given in [Appendix J](https://arxiv.org/html/2605.20784#A10)\.
A first future direction is training dynamics\. If interaction locality is a mechanistic signature rather than a post\-hoc statistic, it should emerge, split, or reorganize over checkpoints\. Tracking SAE features, finite\-noise reach, structural kernels, and temporal propagation during training would reveal whether local moves appear before global segment planning, whether H\-boundary broadening coincides with accuracy jumps, and whether TRM’s shared\-operator locality emerges gradually or only after stable iterative routines appear\. Such experiments would also test whether locality\-aware training objectives improve sample efficiency or robustness\.
A second direction is embodied spatial reasoning\. MTU3D is only a first step: real agents operate in 3D or 4D geometries with depth, contact, viewpoint, time, and uncertainty\. Extending interaction locality to these domains requires replacing grid neighborhoods with scene graphs, contact graphs, egocentric/allocentric coordinate frames, and time\-varying object tracks\. The payoff could be substantial: for safety\-critical navigation or manipulation, one wants to know whether a local action is causally grounded in the intended global plan, not merely whether the decoded rationale is plausible\.
## 7 Conclusion
We introduced interaction locality as a task\-geometry\-aware interpretability framework for recursive spatial reasoning\. Across Maze\-Hard, Sudoku Extreme, and ARC\-AGI, finite\-noise activation patching gives a causally stronger account than visualization alone: H\-level writes tend to occupy the local same\-segment channel under task\-defined neighborhoods, with the small HRM/Sudoku gap illustrating that the magnitude of the H/L contrast depends on the chosen task scale\. Cross\-cycle channels can still propagate those summaries broadly\. SAE features provide readable segment\-level hypotheses, and appendix structural kernels explain the corresponding linearized topology, but the main framework is grounded in finite interventions\.
The broader contribution is a shift in what is measured\. Instead of asking only whether a compact recursive model has an interpretable feature or a plausible decoded plan, the framework asks whether the information needed for that plan moves at the right spatial scale and at the right recursive time\. The MTU3D extension shows why this distinction matters beyond puzzles: a large 3D embodied model can exhibit structural spatial bias while lacking causal locality inside its encoder\. Interaction locality therefore becomes a diagnostic for whether local actions, object\-level relations, and global plans are causally connected, not merely co\-visible in a representation\.
In this paper, Maze, Sudoku, ARC\-AGI, and MTU3D provide geometries with known neighborhoods\. The same principle extends naturally to richer 3D and 4D settings if the neighborhood relation is replaced by contact graphs, object tracks, scene graphs, or egocentric/allocentric coordinate frames\. The immediate future work is therefore to track how interaction locality emerges during training and to test whether locality\-aware objectives improve robustness and transfer\. If successful, interaction locality could become a common diagnostic for spatial reasoning models across symbolic puzzles, navigation, and embodied manipulation\.
## References
- T\. Bricken et al\. \(2023\) Towards monosemanticity: decomposing language models with dictionary learning\. Transformer Circuits Thread\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- B\. Chen, Z\. Xu, S\. Kirmani, B\. Ichter, D\. Driess, P\. Florence, D\. Sadigh, L\. Guibas, and F\. Xia \(2024\) SpatialVLM: endowing vision\-language models with spatial reasoning capabilities\. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp\. 14455–14465\. External Links: [Document](https://dx.doi.org/10.1109/CVPR52733.2024.01370), [Link](https://arxiv.org/abs/2401.12168)\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- A\. Dai, A\. X\. Chang, M\. Savva, M\. Halber, T\. Funkhouser, and M\. Nießner \(2017\) ScanNet: richly\-annotated 3d reconstructions of indoor scenes\. In Proc\. Computer Vision and Pattern Recognition \(CVPR\), IEEE\. Cited by: [Appendix B](https://arxiv.org/html/2605.20784#A2.SS0.SSS0.Px1.p1.1), [§1](https://arxiv.org/html/2605.20784#S1.p4.1), [§3](https://arxiv.org/html/2605.20784#S3.SS0.SSS0.Px3.p1.1), [§4\.3](https://arxiv.org/html/2605.20784#S4.SS3.p1.1)\.
- N\. Elhage, N\. Nanda, C\. Olsson, T\. Henighan, N\. Joseph, B\. Mann, A\. Askell, Y\. Bai, A\. Chen, T\. Conerly, et al\. \(2021\) A mathematical framework for transformer circuits\. Transformer Circuits Thread\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- A\. Graves \(2016\) Adaptive computation time for recurrent neural networks\. arXiv preprint arXiv:1603\.08983\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p1.5)\.
- M\. I\. Ivanitskiy et al\. \(2023\) Structured world representations in maze\-solving transformers\. arXiv preprint arXiv:2312\.02566\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- E\. Jenner et al\. \(2024\) Evidence of learned look\-ahead in a chess\-playing neural network\. In Advances in Neural Information Processing Systems\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- A\. Jolicoeur\-Martineau \(2025\) Less is more: recursive reasoning with tiny networks\. arXiv preprint arXiv:2510\.04871\. Cited by: [§1](https://arxiv.org/html/2605.20784#S1.p2.1)\.
- R\. Kang, H\. Chen, G\. Gkioxari, and P\. Perona \(2026\) Linear mechanisms for spatiotemporal reasoning in vision language models\. In International Conference on Learning Representations\. External Links: [Link](https://arxiv.org/abs/2601.12626)\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1), [§2](https://arxiv.org/html/2605.20784#S2.p3.1)\.
- M\. J\. Kim, K\. Pertsch, S\. Karamcheti, T\. Xiao, A\. Balakrishna, S\. Nair, R\. Rafailov, E\. Foster, G\. Lam, P\. Sanketi, Q\. Vuong, T\. Kollar, B\. Burchfiel, R\. Tedrake, D\. Sadigh, S\. Levine, P\. Liang, and C\. Finn \(2024\) OpenVLA: an open\-source vision\-language\-action model\. arXiv preprint arXiv:2406\.09246\. External Links: 2406\.09246, [Link](https://arxiv.org/abs/2406.09246)\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- K\. Meng, D\. Bau, A\. Andonian, and Y\. Belinkov \(2022\) ROME: locating and editing factual associations in GPT\. In Advances in Neural Information Processing Systems\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1), [§2](https://arxiv.org/html/2605.20784#S2.p3.1)\.
- S\. D\. Sabbata, S\. Mizzaro, and K\. Roitero \(2025\) Geospatial mechanistic interpretability of large language models\. arXiv preprint arXiv:2505\.03368\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- A\. F\. Spies, W\. Edwards, M\. I\. Ivanitskiy, A\. Skapars, T\. Räuker, K\. Inoue, A\. Russo, and M\. Shanahan \(2024\) Transformers use causal world models in maze\-solving tasks\. arXiv preprint arXiv:2412\.11867\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1), [§2](https://arxiv.org/html/2605.20784#S2.p3.1)\.
- A\. Syed, C\. Rager, and A\. Conmy \(2023\) Attribution patching outperforms automated circuit discovery\. arXiv preprint arXiv:2310\.10348\. External Links: 2310\.10348, [Link](https://arxiv.org/abs/2310.10348)\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p3.1)\.
- S\. Toshniwal et al\. \(2022\) Chess as a testbed for language model state tracking\. In Proceedings of the AAAI Conference on Artificial Intelligence\. Cited by: [§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\)Attention is all you need\.InAdvances in Neural Information Processing Systems 30,pp\. 5998–6008\.Cited by:[§1](https://arxiv.org/html/2605.20784#S1.p2.1)\.
- G\. Wang, J\. Li, Y\. Sun, X\. Chen, C\. Liu, Y\. Wu, M\. Lu, S\. Song, and Y\. A\. Yadkori \(2025\)Hierarchical reasoning model\.arXiv preprint arXiv:2506\.21734\.Cited by:[§1](https://arxiv.org/html/2605.20784#S1.p2.1)\.
- J\. Wu, J\. Guan, K\. Feng, Q\. Liu, S\. Wu, L\. Wang, W\. Wu, and T\. Tan \(2025\)Reinforcing spatial reasoning in vision\-language models with interwoven thinking and visual drawing\.arXiv preprint arXiv:2506\.09965\.External Links:2506\.09965,[Link](https://arxiv.org/abs/2506.09965)Cited by:[§2](https://arxiv.org/html/2605.20784#S2.p2.1),[§2](https://arxiv.org/html/2605.20784#S2.p3.1)\.
- D\. Zheng, S\. Huang, L\. Zhao, Y\. Zhong, and L\. Wang \(2024\)Towards learning a generalist model for embodied navigation\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,External Links:[Document](https://dx.doi.org/10.1109/CVPR52733.2024.01293),[Link](https://arxiv.org/abs/2312.02010)Cited by:[§2](https://arxiv.org/html/2605.20784#S2.p2.1)\.
- Z\. Zhu, X\. Wang, Y\. Li, Z\. Zhang, X\. Ma, Y\. Chen, B\. Jia, W\. Liang, Q\. Yu, Z\. Deng, S\. Huang, and Q\. Li \(2025\)Move to understand a 3d scene: bridging visual grounding and exploration for efficient and versatile embodied navigation\.International Conference on Computer Vision \(ICCV\)\.External Links:[Link](https://mtu3d.github.io/)Cited by:[§1](https://arxiv.org/html/2605.20784#S1.p4.1),[§3](https://arxiv.org/html/2605.20784#S3.SS0.SSS0.Px3.p1.1),[§4\.3](https://arxiv.org/html/2605.20784#S4.SS3.p1.1)\.
## Appendix A Claim-level triangulation details
Table 3: Claim-level triangulation. The framework connects semantic readability, finite-intervention locality, and structural topology so each mechanistic claim has a dedicated source of evidence.
## Appendix B Experimental Details
#### MTU3D/ScanNet setting\.
For the 3D embodied extension, MTU3D is the model and ScanNet is the task source: we analyze precomputed object-level Stage-1 features from 30 ScanNet indoor scenes [Dai et al., [2017](https://arxiv.org/html/2605.20784#bib.bib38)]. The measured task is scene grounding over 3D object/voxel representations. Locality neighborhoods are defined by metric object distance, calibrated so that the near set contains roughly seven nearest neighbors per scene; the random baseline is the corresponding fraction of near-object off-diagonal pairs. The MTU3D experiments therefore test whether input corruption or layer recovery changes spatially nearby scene objects more than distant objects.
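As a rough illustration of the neighborhood calibration described above, the sketch below picks a distance radius whose average near-neighbor count reaches a target, then computes the fraction of off-diagonal object pairs that count as near, which plays the role of the random baseline. The function names, the use of object-center coordinates, and the smallest-radius calibration rule are assumptions made for illustration; the text does not specify how the radius is selected.

```python
import math


def pairwise_distances(centers):
    """Euclidean distance matrix between object centers (sequence of coordinate tuples)."""
    return [[math.dist(a, b) for b in centers] for a in centers]


def calibrate_radius(dist, target_neighbors=7):
    """Smallest observed pair distance whose mean near-neighbor count reaches the target.

    Assumption: the radius is chosen per scene from the observed pair distances.
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two objects to define a neighborhood")
    candidates = sorted({dist[i][j] for i in range(n) for j in range(n) if i != j})
    for radius in candidates:
        mean_count = sum(
            sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
            for i in range(n)
        ) / n
        if mean_count >= target_neighbors:
            return radius
    # Fewer objects than the target: every other object is "near".
    return candidates[-1]


def near_pair_fraction(dist, radius):
    """Fraction of ordered off-diagonal object pairs that fall inside the radius."""
    n = len(dist)
    near = sum(1 for i in range(n) for j in range(n) if i != j and dist[i][j] <= radius)
    return near / (n * (n - 1))
```

A locality score for a scene can then be compared against `near_pair_fraction`: a score well above that fraction indicates that effects concentrate on nearby objects more than chance pairing would predict.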
#### Compute resources\.
All reported analyses use released checkpoints and do not require retraining the HRM/TRM models. The accompanying code is written in PyTorch and is intended to run on a single CUDA GPU; the repository recommends a GPU with at least 6 GB of VRAM for notebook-level analyses, with selective state capture and batch size 1 used for memory-intensive activation and intervention runs. The setup script downloads dependencies, datasets, and checkpoints; checkpoint sizes range from tens of MB for small TRM checkpoints to approximately 2 GB for ARC-AGI checkpoints. The finite-noise activation-patching, SAE-ablation, and Jacobian diagnostic scripts process examples independently, so runtime scales approximately linearly with the number of evaluated samples. Reproducing the optional Maze-TRM training run from scratch is documented as requiring roughly 40 hours on a single A100 40 GB GPU, but the paper's main experiments use the released/analyzed checkpoints rather than retraining them.
## Appendix C Additional discussion
#### Broader impacts\.
Interaction locality is intended as an auditing tool for spatial reasoning models\. Its positive impact is to make compact recursive and embodied spatial systems more inspectable before they are used in safety\-relevant settings such as navigation, manipulation, and scene grounding\. The same analysis could also inform stronger spatial reasoning systems, including systems deployed in physical environments\. If such systems are used without adequate safety checks, errors in locality measurement or overconfidence in interpretability results could contribute to unsafe navigation, manipulation failures, or misplaced trust in embodied agents\. We therefore view interaction\-locality analysis as a diagnostic complement to, not a replacement for, task\-specific safety, robustness, and deployment evaluations\.
#### Licence\.
The official HRM implementation is released under Apache\-2\.0 and links to the official Sapient checkpoint artifacts used in this work\. The checkpoint pages are publicly released by the same authors, but do not expose separate license metadata beyond the official repository release context\.
## Appendix D Additional convergence results
Figure 7: Q-value evolution across reasoning steps for HRM and TRM on all three tasks. The scalar Q-value trajectories support the same critical-window selection as hidden-state divergence: Maze changes early, Sudoku refines over more steps, and ARC-AGI lacks a single sharp cliff.

[Appendix D](https://arxiv.org/html/2605.20784#A4) gives the scalar view corresponding to the hidden-state changes in [Figure 2](https://arxiv.org/html/2605.20784#S4.F2). Cycle-level diagnostics support the critical-cycle choices reported in Section [3](https://arxiv.org/html/2605.20784#S3).
## Appendix E Additional first-step cycle decodes
Figure 8: Maze cycle-decode panels not shown in the main selected figure (panels: Maze HRM, remaining L-cycle decodes; Maze TRM, remaining upstream $z_L$ decodes). These additional observational decodes restore the first-step cycle context while keeping the main text focused on H-update endpoints.

Figure 9 (HRM / Sudoku): Selected first-step cycle decodes for Sudoku HRM. The two columns show the H-cycle endpoints H0L1 and H1L1, with $z_L$ on top and $z_H$ below. The decoded grids are observational sanity checks rather than causal evidence.

Figure 10 (TRM / Sudoku): Selected first-step cycle decodes for Sudoku TRM. The top row shows direct upstream $z_L$ cycles and the bottom row shows the corresponding $z_H$ updates.

Figure 11 (HRM / ARC-AGI): Selected first-step cycle decodes for ARC-AGI HRM. The two columns show H-cycle endpoints, with $z_L$ on top and $z_H$ below.

Figure 12 (TRM / ARC-AGI): Selected first-step cycle decodes for ARC-AGI TRM. As in Maze and Sudoku, these panels show what is decodable at each cycle, while the Jacobian analysis measures which positions influence later updates.
## Appendix F Additional SAE locality results
Figure 13: Full top-segment qualitative analysis. Panels: (a) Maze HRM, (b) Sudoku HRM, (c) ARC-AGI HRM, (d) Maze TRM, (e) Sudoku TRM, (f) ARC-AGI TRM. The strongest human-readable positive cases are the HRM Maze and HRM Sudoku panels shown in the main text; the remaining panels show that many high-impact features are balanced or less cleanly aligned with the chosen segments.

Table 4: Per-position correspondence between SAE feature locality and Jacobian locality. Correlations are computed from the cross-dataset summary files. Correlation is task- and architecture-dependent, reinforcing that SAE segment locality and Jacobian cell-to-cell locality are complementary rather than interchangeable measurements.
## Appendix G Additional within-cycle Jacobian and intervention results
The finite-noise patching summary in [Table 2](https://arxiv.org/html/2605.20784#S4.T2) is the primary within-cycle causal evidence in the main text. The Jacobian results below are structural diagnostics: they summarize linearized cell-to-cell and segment-level topology, and help explain why the finite-intervention patterns arise, but are not treated as large-intervention causal effects.
For a Jacobian diagnostic, we write $K^{b\leftarrow a}[u,v]=\|\partial z_b[u]/\partial z_a[v]\|_F$ for the local linear influence from source site $v$ at level $a$ to target site $u$ at level $b$. The shorthand $K_L$ and $K_H$ in the appendix figures denotes the L-output and H-output kernels, respectively; it does not assume that $K_L$ is local or that $K_H$ is global. Cell locality is measured after the fact by the row-normalized diagonal concentration $\bar{\ell}(K)=P^{-1}\sum_u K[u,u]/\sum_v K[u,v]$. For segment diagnostics, a partition $\mathcal{S}=\{S_m\}$ induces $K^{\mathrm{seg}}[m,n]=\mathrm{mean}_{u\in S_m,\,v\in S_n}K[u,v]$. When a segment-granularity score is reported in supplementary outputs, we use

$$g(K)=\frac{r_{\mathrm{seg}}}{1+\bar{\ell}(K)},\tag{3}$$

where $r_{\mathrm{seg}}$ is the mean same-segment entry divided by the mean cross-segment entry. The normalization keeps this descriptive segment score from simply duplicating cell-diagonal concentration.
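The three quantities defined above can be sketched in a few lines of plain Python, assuming the kernel $K$ is already available as a square nested list of non-negative Frobenius norms with positive row sums. The function names are hypothetical, and one reading is assumed where the text is not explicit: $r_{\mathrm{seg}}$ is taken as the mean diagonal entry of $K^{\mathrm{seg}}$ divided by the mean off-diagonal entry of $K^{\mathrm{seg}}$.

```python
def diagonal_concentration(K):
    """Row-normalized diagonal concentration: mean over u of K[u][u] / sum_v K[u][v]."""
    P = len(K)
    return sum(K[u][u] / sum(K[u]) for u in range(P)) / P


def segment_kernel(K, segments):
    """K_seg[m][n]: mean of K[u][v] over u in segment m and v in segment n."""
    return [
        [
            sum(K[u][v] for u in S_m for v in S_n) / (len(S_m) * len(S_n))
            for S_n in segments
        ]
        for S_m in segments
    ]


def segment_ratio(K, segments):
    """r_seg, read here as mean same-segment over mean cross-segment entry of K_seg."""
    K_seg = segment_kernel(K, segments)
    M = len(segments)
    same = sum(K_seg[m][m] for m in range(M)) / M
    cross = sum(K_seg[m][n] for m in range(M) for n in range(M) if m != n) / (M * (M - 1))
    return same / cross


def granularity(K, segments):
    """Segment-granularity score g(K) = r_seg / (1 + diagonal concentration)."""
    return segment_ratio(K, segments) / (1.0 + diagonal_concentration(K))
```

For a toy kernel with two segments of two cells each, where every diagonal entry is 4, within-segment off-diagonal entries are 2, and cross-segment entries are 1, this gives a diagonal concentration of 0.5, a segment ratio of 3, and a granularity of 2.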
#### Within\-cycle structural locality\.
[Appendix G](https://arxiv.org/html/2605.20784#A7.SS0.SSS0.Px1) restores the within-cycle Jacobian summary used as a structural check. In HRM, the L-output kernel is more same-position local than the H-output kernel in Maze and Sudoku, while ARC-AGI collapses the H/L gap under object-local geometry. TRM changes this ordering: Maze and ARC-AGI have larger H-output locality, while Sudoku is nearly tied. These linearized kernels are not the main finite-causal evidence, but they show that the task geometry is also visible in local differential topology.

Table 5: Within-cycle Jacobian same-position locality summary (95% bootstrap CI). $K_L$ and $K_H$ are L-output and H-output Jacobian kernels, respectively; they are level-indexed structural diagnostics, not assumed local or global a priori. Sample sizes are $n=191$ for HRM Maze, $n=530$ for TRM Maze, and $n=100$ for the remaining model–task pairs.
#### Segment granularity\.
[Appendix G](https://arxiv.org/html/2605.20784#A7.SS0.SSS0.Px2) aggregates the same kernels into task segments. Maze HRM remains L-over-H at both cell and corridor resolutions, while Maze TRM is H-over-L; Sudoku is weaker but consistent with the nearly tied TRM pattern; ARC-AGI shows that object-level segments can reverse the HRM cell-level near tie. These results contextualize the main finite-noise patching results by showing where the structural kernels place their segment mass.

Table 6: Segment-granularity summary with 95% bootstrap CIs. Granularity is the segment self/cross ratio normalized by cell-level locality as in [Equation 3](https://arxiv.org/html/2605.20784#A7.E3); higher values indicate influence concentrated within task segments but spread across cells inside those segments.

Figure 14: Maze-TRM segment-level peak comparison. The left panel shows corridor segments on a representative maze, with the top two $K_H$ segment peaks highlighted; the right panels compare those peak segments against the remaining segments for $K_L$ and $K_H$. Maze-TRM localizes strong interaction structure to specific route segments, but confidence intervals overlap in this single-sample peak comparison, so aggregate statistics remain the primary evidence.

Figure 15: Additional quantitative Jacobian segment analyses. Panels: (a) Sudoku constraint-pair breakdown, (b) Sudoku structured-pair mass fraction, (c) ARC-AGI locality by task type. Panel (a) breaks Sudoku Jacobian weight into box, row, column, and other cell-pair categories; panel (b) summarizes the off-diagonal mass in constraint-structured pairs; panel (c) stratifies ARC-AGI locality by task type. Segment-conditioned statistics expose structure that is not visible from the scalar cell-locality summary alone: HRM and TRM differ in how $K_L$ and $K_H$ distribute mass across Sudoku constraint types, and ARC-AGI locality varies with task type.

Figure 16: Per-position Jacobian qualitative analysis for the selected high-variance Maze-TRM sample. Green marks more local positions and red marks more global positions. Locality is heterogeneous inside a single maze: some corridor stretches remain local while selected bottleneck or junction-adjacent positions act more globally.

Figure 17: Per-position Jacobian qualitative analysis for the selected high-variance Sudoku-TRM sample. The scatter plots compare constraint density against row-wise locality. In this sample, higher constraint density is associated with lower TRM locality for both $K_L$ and $K_H$, suggesting broader integration under heavier local constraints.

Figure 18: Per-position Jacobian qualitative analysis for the selected high-variance ARC-TRM sample. Per-object bar charts summarize locality for connected foreground components. Object membership helps explain local/global variation, but the most global positions can lie on the task-defining object rather than only at isolated boundaries.
## Appendix H Additional finite-noise activation patching results
Figure 19: Raw accuracy-drop curves and self-drop fractions for the finite-noise activation patching. Raw drops distinguish genuine spatial reach from globally diluted channels that can look misleading after diagonal normalization.

Figure 20: Activation-difference heatmaps for HRM/Maze patching analogs. Rows are source positions and columns are target positions. Within-H is strongly diagonal, while cross-H channels carry broader off-diagonal effects across cycles.

Figure 21: Activation-difference heatmaps for TRM/Maze patching analogs. The within-L channel is broadly distributed, matching the main text's claim that weight sharing changes how local and global reach are expressed.

Figure 22: Activation-difference heatmaps for HRM/Sudoku patching analogs. Sudoku shows structured constraint-related effects but weaker H/L separation than Maze.

Figure 23: Activation-difference heatmaps for TRM/Sudoku patching analogs. Several cross channels are broad or noise-ceiling cases, reinforcing the need for the reliability diagnostic in [Figure 5](https://arxiv.org/html/2605.20784#S4.F5).
## Appendix I Cross-dataset finite-noise patching locality
Figure 24: Cross-dataset finite-noise patching locality from the experiment summarized numerically in [Table 2](https://arxiv.org/html/2605.20784#S4.T2). Within-H is at least as local as within-L under task-defined neighborhoods, while cross-cycle channels vary with architecture and task geometry.
## Appendix J Constraint-type breakdown: comparing Jacobian, patching, and SAE ablation
The three probe methods in our framework (structural Jacobian, finite-noise activation patching, and SAE feature ablation) each make a different commitment to causal strength, granularity, and scope. This section places them side by side on the same question for Sudoku: *does information flow respect Sudoku constraint boundaries (box, row, column, or other)?*

For each cell pair $(i,j)$ in the $9\times 9$ grid we assign a constraint label based on which Sudoku rule links them: *box* if they share the same $3\times 3$ subgrid (priority), *row*/*column* if they share a row/column outside the same box, and *other* otherwise. The random-assignment baseline is box = 0.100, row = 0.075, col = 0.075, other = 0.750 (exact counts from the $81\times 80$ off-diagonal pair set).
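The labeling rule and the random-assignment baseline can be checked with a short enumeration. The sketch below assumes cells are indexed 0 to 80 in row-major order, which is an illustrative choice rather than something the text specifies; the function names are likewise hypothetical.

```python
from collections import Counter


def constraint_label(i, j):
    """Label an ordered pair of distinct cells; box takes priority over row and column."""
    row_i, col_i = divmod(i, 9)  # assumption: row-major cell indexing
    row_j, col_j = divmod(j, 9)
    if row_i // 3 == row_j // 3 and col_i // 3 == col_j // 3:
        return "box"
    if row_i == row_j:
        return "row"
    if col_i == col_j:
        return "column"
    return "other"


def baseline_fractions():
    """Share of each constraint label among the 81 x 80 off-diagonal pairs."""
    counts = Counter(
        constraint_label(i, j) for i in range(81) for j in range(81) if i != j
    )
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("box", "row", "column", "other")}
```

Each cell has 8 box partners, 6 row partners outside its box, 6 column partners outside its box, and 60 unrelated cells, so the enumeration gives 8/80, 6/80, 6/80, and 60/80, matching the baseline quoted above.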
Jacobian (structural): For each cell pair, we compute the Frobenius-norm Jacobian $K[i,j]=\|\partial z[i]/\partial z[j]\|_F$ at the critical cycle, giving an $81\times 81$ matrix per sample; fractions are sample means weighted by total mass. The Jacobian figure covering HRM and TRM Sudoku is panel (a) of Figure 15 in [Appendix G](https://arxiv.org/html/2605.20784#A7.SS0.SSS0.Px2).

Activation patching (causal): We use the averaged $81\times 81$ heatmaps from the Jacobian-analog patching experiment; each entry is the mean impact on cell $j$ when cell $i$ is source-patched. Fractions are computed off-diagonal; 95% CIs use pair-resampling bootstrap over all 6480 off-diagonal pairs. Results are shown for the $K_H$- and $K_L$-analog channels (within-H and within-L, respectively).

SAE feature ablation (causal, feature-resolved): For the top-30 H-level and L-level SAE features (by total ablation impact), we zero each feature and record the change in output logits across all non-clue target cells. Impact is summed by constraint type; fractions average over features and samples.
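The pair-resampling bootstrap described for the activation-patching fractions can be sketched as follows. This is a minimal version under stated assumptions: the heatmap is a square nested list of non-negative impacts, `label_fn` is any function mapping a cell pair to a constraint label, and the interval is a simple percentile interval (the text does not say which bootstrap interval is used). All function names are hypothetical.

```python
import random


def off_diagonal_pairs(heatmap, label_fn):
    """Flatten a square impact heatmap into (label, impact) records, skipping i == j."""
    n = len(heatmap)
    return [(label_fn(i, j), heatmap[i][j]) for i in range(n) for j in range(n) if i != j]


def type_fractions(pairs):
    """Share of total impact mass carried by each label."""
    totals = {}
    for label, impact in pairs:
        totals[label] = totals.get(label, 0.0) + impact
    mass = sum(totals.values())
    if mass == 0:
        return {}
    return {label: value / mass for label, value in totals.items()}


def bootstrap_ci(pairs, label, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for one label's mass fraction, resampling pairs with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))
        stats.append(type_fractions(sample).get(label, 0.0))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

With a uniform heatmap, `type_fractions` reproduces the pair-count baseline exactly, so a measured fraction whose interval excludes the baseline indicates constraint-type selectivity beyond what pair counts alone would produce.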
[Appendix J](https://arxiv.org/html/2605.20784#A10) presents all three analyses side by side for HRM (top row) and TRM (bottom row). The structural Jacobian shows strong constraint-type preferences; in particular, $K_L$ concentrates substantially on *column* pairs ($\approx 0.25$ vs. baseline $0.075$) in HRM. Both activation patching and SAE feature ablation, by contrast, remain near the random baseline across all constraint types. This dissociation is informative: it shows that the cell-to-cell differential topology aligns with Sudoku structure, but individual sparse features and finite perturbations do not selectively activate along row/column lines. The implication is that constraint-type selectivity in Sudoku is a collective property of the full representational geometry rather than the province of individually identifiable sparse features or activation-patching channels.

Figure 25: Sudoku constraint-type breakdown for all three probe methods, HRM (top) and TRM (bottom). Col. 1 (Jacobian $K$): structural Jacobian mass fraction by constraint type at the critical cycle; bars show $K_L$ (darker) and $K_H$ (lighter), error bars are 95% bootstrap CI over samples. Col. 2 (Activation patching): off-diagonal fraction of the averaged impact heatmap by constraint type; $K_H$- and $K_L$-analog channels; error bars from pair-resampling bootstrap. Col. 3 (SAE feature ablation): mean constraint-type fraction across the top-30 H-level and L-level features; error bars from sample-resampling bootstrap. Dashed lines mark the random pair-type baseline (box = 0.10, row = col = 0.075, other = 0.75). The Jacobian shows clear constraint-type structure (especially column concentration in $K_L$), while patching and SAE ablation are near baseline, indicating that Sudoku constraint-type selectivity is a collective geometric property rather than a signature of individual sparse features or finite perturbation channels.
## Appendix K Additional MTU3D locality results
Figure 26: Qualitative illustration of zero-ablation patching on two ScanNet scenes (top: scene0012_02, bottom: scene0011_00; $N=27$ objects each). Each panel is a top-down $(x,z)$ projection of the scene. Markers: the white star ($\bigstar$) is the zero-ablated source object; all other objects are coloured and sized by the normalised activation-change norm $\|\Delta q\|_2$ (hot colourmap: bright/large = high impact, dark/small = low impact). The dashed circle shows the adaptive locality radius $R$ (calibrated so that $\approx 5$ nearest neighbours fall inside on average). Near-frac in the panel title is the fraction of total impact mass that falls within $R$ of the source: it matches the per-object locality score used in the main-text Exp 1 bar chart. Local-source objects (left column; loc $\approx$ 0.64–0.66) concentrate impact within the radius (near-frac 0.73–0.95), while global-source objects (right column; loc $=$ 0.00) spread impact broadly across the scene (near-frac 0.03–0.07). Both extremes reflect real causal effects; the distinction is whether disruption is spatially structured or diffuse. This per-object variation underlies the aggregate locality score reported in the main text.

Figure 27: MTU3D locality across checkpoints. Stage 1–2 input-patching locality is stable across checkpoints, whereas encoder-layer recovery remains near the random baseline.

Figure 28: MTU3D attention weight versus object-pair distance. Attention contains spatial distance bias, but the main text shows that such structural bias does not by itself imply finite causal recovery locality inside the unified encoder.
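The near-frac statistic described in the Figure 26 caption is the share of total impact mass landing within radius $R$ of the ablated source object. A minimal sketch, assuming object positions are coordinate tuples and `impact` holds one non-negative activation-change norm per object; the function name and the choice to exclude the source object from both sums are assumptions.

```python
import math


def near_frac(positions, source, impact, radius):
    """Fraction of impact mass on objects within `radius` of the source object.

    The source object itself is excluded from numerator and denominator.
    """
    total = 0.0
    near = 0.0
    for k, (position, weight) in enumerate(zip(positions, impact)):
        if k == source:
            continue
        total += weight
        if math.dist(position, positions[source]) <= radius:
            near += weight
    return near / total if total > 0 else 0.0
```

For example, with objects at (0, 0), (1, 0), and (5, 0), the first object as source, impacts of 3 and 1 on the other two, and a radius of 2, three quarters of the impact mass falls inside the radius.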
## Appendix L Additional cross-cycle Jacobian results
For cross-cycle kernels, diagonal concentration means the same row-normalized same-position mass $P^{-1}\sum_u K[u,u]/\sum_v K[u,v]$ computed between aligned positions in adjacent recursive states. The heatmaps below visualize the full row-normalized kernel; the tables report cross-cycle locality summaries and the same-position concentration where H-to-H terms are available.
Table 7: Cross-cycle Jacobian locality by cycle edge (mean [95% CI], $n=100$ per cell). For HRM, $H0L0\to H0L1$ is the within-first-H-step update, $H0L1\to H1L0$ is the cross-H-step boundary, and $H1L0\to H1L1$ is the within-second-H-step update. TRM rows use analogous late-within-H and boundary anchors, with exact edge labels shown because L-cycle counts differ across tasks.

Table 8: Available $H\to H$ cross-cycle diagonal concentrations (mean [95% CI], $n=100$ per cell). Values are listed as sequences over H-update pairs when present in the updated file. Maze and Sudoku TRM have no comparable $H\to H$ term under the analyzed L-cycle indexing, while ARC-AGI TRM has $z_H$ boundary entries.

Figure 29: Position-level L$\to$L cross-cycle Jacobian heatmaps for early and late cycle pairs. The heatmaps visualize the same transition captured by same-position diagonal concentration: early pairs are more self-focused and later pairs show broader spatial attribution.

[Appendix L](https://arxiv.org/html/2605.20784#A12) gives additional quantitative and spatial detail for the appendix structural cross-cycle analysis.