Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
Summary
This paper investigates whether large language models encode syntactic abstractions like phase boundaries that are not captured by Universal Dependencies, using structural probes on wh-movement stimuli with invariant UD distances, finding evidence across 13 LLMs for phase-structure representations that are causally active.
# Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
Source: [https://arxiv.org/html/2605.26431](https://arxiv.org/html/2605.26431)
Yuanhao Chen (Dartmouth College, yc.th@dartmouth.edu) and Peter Chin (Dartmouth College, pc@dartmouth.edu)
###### Abstract
Structural probes train on Universal Dependencies (UD), which does not encode formal-syntactic abstractions such as phase boundaries or phase-internal cohesion. Whether large language models (LLMs) encode these remains an open question that UD-based probing cannot answer by construction. We evaluate structural probes on wh-movement stimuli where UD distances are invariant across conditions by design, so any non-zero effect reflects structure beyond UD. The three conditions (bare small clause, infinitival, and finite) are ordered by the number of Minimalist Program (MP) phase boundaries the wh-element crosses.
Across 13 LLMs from four families, we find a phase-count gradient on a cross-clause pair (12/13 models) and a 13/13 sign asymmetry on a within-clause pair whose UD distance is identical across conditions; the latter is specifically predicted by phase-internal cohesion, an MP abstraction invisible to UD by construction. Activation patching confirms the representations are causally active in 12/13 models. These findings suggest that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond the reach of annotation-based probing; UD-grounded probes provide a lower bound on syntactic encoding, not an upper bound.
## 1 Introduction
Structural probing has established that large language models (LLMs) encode syntactic structure in their hidden representations (Hewitt and Manning, [2019](https://arxiv.org/html/2605.26431#bib.bib12); Manning et al., [2020](https://arxiv.org/html/2605.26431#bib.bib19)). These probes train on Universal Dependencies (UD) tree distances as the gold target: a consistent, broadly applicable annotation, but not a generative grammar. Whether LLMs encode the formal-syntactic abstractions of generative grammar, such as phase boundaries and phase-internal cohesion, remains an open question that UD-based probing cannot answer by construction.
This paper evaluates structural probes on wh-movement stimuli constructed so that UD distances are invariant across conditions by design ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)). The three conditions (bare small clause, infinitival, and finite) are ordered by the number of phase boundaries the wh-element crosses, providing a graded test of phase structure in LLM representations.
Across all 13 LLMs from four families, we find that the structural probe distance between an embedded subject and the embedded verb flips sign across conditions: smaller than baseline in the finite condition, larger in the infinitival condition. No surface property predicts this: UD distance between the two words is exactly one edge in every condition, linear word distance predicts no difference for finite (the two tokens are adjacent in both bare and finite), and a monotone structural-complexity account predicts larger distances in both non-baseline conditions. The pattern is instead predicted by phase-internal cohesion, a Minimalist Program (MP) abstraction invisible to UD by construction ([section 4.2](https://arxiv.org/html/2605.26431#S4.SS2)), providing evidence for formal-syntactic representations in LLMs beyond the reach of annotation-based probing.
We make three contributions\.
1. Phase-structure probing with invariant UD distance. We design wh-movement stimuli where UD distances are held constant across conditions, ensuring that probe effects reflect structure beyond UD. Across 13 LLMs from four families, we find a phase-count gradient ($\beta_{\text{fin}} > \beta_{\text{inf}} > 0$ in 12/13 models under canonical-layer reporting; [sections 3.3](https://arxiv.org/html/2605.26431#S3.SS3) and [4.1](https://arxiv.org/html/2605.26431#S4.SS1)) and a 13/13 esubj-evb sign asymmetry, specifically predicted by MP phase structure and inaccessible to any UD-based probe by construction ([section 4.2](https://arxiv.org/html/2605.26431#S4.SS2)).
2. Canonical-layer reporting. We introduce canonical-layer reporting: anchoring all contrasts to the layer that maximises the most reliable contrast, removing a per-contrast degree of freedom that prior structural-probing work has not controlled for. Under this stricter criterion the phase-count gradient holds in 12/13 models; the single failure mode is concealed by per-contrast peak reporting.
3. Causal corroboration. We show that the representations identified by the probe are computationally active: activation patching at the embedded-subject position shifts probe distances in the predicted direction in 12/13 models, directly addressing the probing-scepticism concern of Agarwal et al. ([2025](https://arxiv.org/html/2605.26431#bib.bib1)).
## 2 Background
#### Phase theory\.
We treat clause structure in the framework of phase theory (Chomsky, [2000](https://arxiv.org/html/2605.26431#bib.bib6), [2001](https://arxiv.org/html/2605.26431#bib.bib7)). A *phase* is a syntactic domain whose complement domain becomes inaccessible to higher derivational operations once the phase is complete. The two phase heads in English are $v^0$, the light-verb head projecting a vP, and C, the head of CP. A finite embedded clause therefore introduces two phases (vP and CP) above its embedded verb; an infinitival TP introduces one (vP); a bare small-clause complement introduces no additional phases. The Phase Impenetrability Condition (PIC) restricts cross-phase operations to material at the phase edge (Chomsky, [2001](https://arxiv.org/html/2605.26431#bib.bib7)), forcing successive-cyclic wh-movement to transit each phase edge (Urk, [2020](https://arxiv.org/html/2605.26431#bib.bib29)). Two predictions from this framework motivate our experimental design. First, the number of phase boundaries between the wh-position (matrix Spec,CP, after movement) and the base-position wh-copy at the embedded verb's complement tracks the complement type: bare < infinitival < finite ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)). Second, arguments inside the same phase are structurally more cohesive than arguments separated by a phase boundary, even when their dependency-tree distance is identical ([section 4.2](https://arxiv.org/html/2605.26431#S4.SS2)). The first motivates the cross-clause wh-esubj probe pair; the second motivates the within-clause esubj-evb pair ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)). [Table 3](https://arxiv.org/html/2605.26431#A2.T3) glosses syntactic terms used throughout.
#### Structural probes\.
A structural probe is a linear map from a model's contextualised hidden states to a Euclidean space, trained so that pairwise distances under the probe approximate gold tree distances between words (Hewitt and Manning, [2019](https://arxiv.org/html/2605.26431#bib.bib12); Manning et al., [2020](https://arxiv.org/html/2605.26431#bib.bib19)). Our probes are trained on the Universal Dependencies English Web Treebank (UD-EWT) (Silveira et al., [2014](https://arxiv.org/html/2605.26431#bib.bib24)) with undirected dependency-tree distance as the gold target. We then evaluate each trained probe on our wh-movement stimuli without further fitting, and report an effect size $\beta$: the estimated condition-vs.-bare difference in probe distance for a designated word pair ([section 3.2](https://arxiv.org/html/2605.26431#S3.SS2)). Because our stimuli are constructed so that UD-tree distance between probed word pairs is invariant across conditions ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)), a non-zero $\beta$ cannot be a re-encoding of the probe's training target; it must reflect structural information that UD does not encode. The probing literature has questioned whether what a probe recovers corresponds to what a model computationally uses (Agarwal et al., [2025](https://arxiv.org/html/2605.26431#bib.bib1)); we address this directly with the causal-intervention experiment of [section 4.3](https://arxiv.org/html/2605.26431#S4.SS3).
## 3 Stimuli and Methods

Code and stimuli: [https://anonymous.4open.science/r/syntax-probe-147D](https://anonymous.4open.science/r/syntax-probe-147D).
### 3.1 Stimuli
Each item is a triple of wh\-questions sharing the wh\-phrase \(what\), the embedded\-subject lemma, and the embedded\-verb lemma, with the matrix verb and complement type varying by condition:
- (a) Bare: *What did she see him eat?*
- (b) Infinitival: *What did she expect him to eat?*
- (c) Finite: *What did she think he ate?*
[Figure 1: phrase-structure trees, panels (a) Bare, (b) Infinitival, (c) Finite]

Figure 1: Schematic phrase structures of the three conditions. The embedded clause is colour-coded; phases it introduces are in bold, indicating the added phase boundaries: zero for bare (a), one for infinitival (b), two for finite (c). The lower wh-copy is shown struck through at its base position. Intermediate bar levels and matrix projections shared across conditions are omitted; the matrix vP (a phase, but shared) is unbolded. Phrase structures with intermediate projections are in [fig. 7](https://arxiv.org/html/2605.26431#A3.F7).

The matrix verb is drawn from a condition-specific class. The *bare* matrix verb is a perception or causative verb (see, watch, make, let) selecting a bare small-clause complement (Stowell, [1981](https://arxiv.org/html/2605.26431#bib.bib25)). The *infinitival* matrix is an exceptional case-marking (ECM) or object-control verb (expect, want, allow, need) selecting an infinitival TP complement (Chomsky, [1986](https://arxiv.org/html/2605.26431#bib.bib5)). The *finite* matrix is a bridge verb (think, believe, claim, say, know, suppose, report) selecting a finite CP complement with a null complementizer (Erteschik-Shir, [1973](https://arxiv.org/html/2605.26431#bib.bib8)). In the bare and infinitival conditions the embedded subject bears accusative case and the embedded verb appears in its base form; in the finite condition the embedded subject bears nominative case and the embedded verb appears in the simple past.
The three conditions are ordered by complement size ([fig. 1](https://arxiv.org/html/2605.26431#S3.F1)): bare small clause < infinitival TP < finite CP, equivalently an ordering by number of phase boundaries crossed along the wh-movement path (Chomsky, [2000](https://arxiv.org/html/2605.26431#bib.bib6), [2001](https://arxiv.org/html/2605.26431#bib.bib7)). We treat bare as the structural baseline: a contrast against bare measures the representational consequences of the additional clause structure.
We tag three positions per stimulus (wh, embedded_subject, embedded_verb) and measure two probe distances: wh-esubj (wh-element to embedded subject), capturing cross-clause structural depth as a function of complement type; and esubj-evb (embedded subject to embedded verb), capturing within-clause cohesion.
[Figure: UD dependency parses, panels (a) Bare, (b) Infinitival, (c) Finite; each panel is annotated WH-ESUBJ: 3, ESUBJ-EVB: 1]

UD parses of the three conditions, with embedded-clause words colour-coded by condition (matching [fig. 1](https://arxiv.org/html/2605.26431#S3.F1)). Within-item UD-tree distances for the two probe pairs are invariant across conditions: 3 edges for wh-esubj (*what* ↔ *him/he*) and 1 edge for esubj-evb (*him/he* ↔ *eat/ate*). Any non-zero $\beta$ on either pair therefore reflects structural information beyond UD.

#### UD-distance invariance.
For probe-distance differences to be interpretable, the UD-tree distance between each probe pair must be constant across conditions within each item; otherwise differences in $\beta$ would be confounded with UD-level differences. We parse every stimulus with spaCy's en_core_web_trf model (RoBERTa-based, self-reported LAS ≈ 94 on its OntoNotes-derived eval, with UD-converted output schema; Honnibal et al., [2020](https://arxiv.org/html/2605.26431#bib.bib13)) and verify within-item invariance. By design, esubj-evb is exactly 1 edge (nsubj) in every condition. wh-esubj varies by at most 1 edge in a minority of items, where the parser assigns a *shorter* UD path in the infinitival than in bare. Since our prediction is that probe distance *increases* with structural depth, this directional mismatch cannot manufacture the predicted effect.
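The invariance check described above amounts to computing an undirected tree distance per probe pair and comparing it across the three conditions of an item. A minimal sketch follows, assuming each parse is given as an array of head indices with `-1` for the root; the function names and the toy head array are hypothetical illustrations, not the authors' code and not a real parse of any stimulus.

```python
def path_to_root(heads, i):
    """Return the node sequence from token i up to the root (head == -1)."""
    path = [i]
    while heads[i] != -1:
        i = heads[i]
        path.append(i)
    return path


def tree_distance(heads, u, v):
    """Undirected dependency-tree distance (number of edges) between u and v."""
    up_from_u = path_to_root(heads, u)
    depth_from_u = {node: d for d, node in enumerate(up_from_u)}
    # Walk up from v until we meet the first node that is also above u.
    for d_v, node in enumerate(path_to_root(heads, v)):
        if node in depth_from_u:
            return depth_from_u[node] + d_v
    raise ValueError("tokens are not in the same tree")


def is_invariant(distances_by_condition):
    """True if one probe pair has the same tree distance in every condition."""
    return len(set(distances_by_condition.values())) == 1


# Toy tree (hypothetical): token 1 is the root, 0 and 2 attach to 1, 3 attaches to 2.
toy_heads = [1, -1, 1, 2]
far_pair = tree_distance(toy_heads, 0, 3)
near_pair = tree_distance(toy_heads, 3, 2)
```

An item would then be retained for a given probe pair only when `is_invariant` holds over its bare, infinitival, and finite parses.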
#### Lexicon and item generation\.
The combinatorial lexicon comprises 7 matrix subjects, 7 embedded subjects (in matched accusative/nominative pairs), 4 bare-class matrix verbs, 4 infinitival-class verbs, 7 bridge verbs, and 20 embedded transitive verbs with inanimate-compatible objects. Items are constrained so that the matrix and embedded subjects differ and so that the three matrix verbs are mutually distinct within an item. The Cartesian product yields on the order of $10^{5}$ candidate items; we use a fixed seed-controlled sample of 1,000 items, yielding 3,000 stimuli (one per condition per item).
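As a rough check on the stated pool size, the sketch below enumerates the Cartesian product and draws a seed-controlled sample. The three matrix-verb lists are those named in section 3.1; the subject and embedded-verb entries are placeholder strings, and the seed value and function names are assumptions. Because the placeholder subject lists do not overlap, the subject-distinctness constraint is not modelled, so the count is an upper bound on the constrained pool.

```python
import itertools
import random

BARE = ["see", "watch", "make", "let"]
INFINITIVAL = ["expect", "want", "allow", "need"]
BRIDGE = ["think", "believe", "claim", "say", "know", "suppose", "report"]

# Placeholders: the actual subject and embedded-verb lexicons are not listed in the text.
MATRIX_SUBJECTS = [f"matrix_subj_{i}" for i in range(7)]
EMBEDDED_SUBJECTS = [f"embedded_subj_{i}" for i in range(7)]
EMBEDDED_VERBS = [f"embedded_verb_{i}" for i in range(20)]


def candidate_items():
    """Enumerate every (subjects, three matrix verbs, embedded verb) combination."""
    return list(itertools.product(
        MATRIX_SUBJECTS, EMBEDDED_SUBJECTS, BARE, INFINITIVAL, BRIDGE, EMBEDDED_VERBS
    ))


def sample_items(pool, n_items, seed):
    """Draw a reproducible sample without replacement."""
    return random.Random(seed).sample(pool, n_items)


pool = candidate_items()
items = sample_items(pool, n_items=1000, seed=0)  # seed value is an assumption
n_stimuli = 3 * len(items)  # one stimulus per condition per item
```

The unconstrained product has 7 × 7 × 4 × 4 × 7 × 20 = 109,760 entries, consistent with the stated order of magnitude.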
### 3.2 Probing Setup
#### Probe training\.
We follow the structural-probe protocol of Hewitt and Manning ([2019](https://arxiv.org/html/2605.26431#bib.bib12)). For each transformer layer $\ell$, we obtain a per-word representation $h_{w}^{(\ell)} \in \mathbb{R}^{d}$ by mean-pooling the subword-token hidden states at layer $\ell$ for each word $w$ ($d$ is the model's hidden size). We then train a linear projection $B^{(\ell)} \in \mathbb{R}^{r \times d}$ with $r = 64$, defining the probe distance between any two words $u, v$ at layer $\ell$ as

$$d_{B}^{(\ell)}(u,v) \;:=\; \big\lVert B^{(\ell)}\big(h_{u}^{(\ell)} - h_{v}^{(\ell)}\big)\big\rVert_{2}^{2},$$

the squared L2 norm of the projected difference (Hewitt and Manning, [2019](https://arxiv.org/html/2605.26431#bib.bib12), eq. 1). $B^{(\ell)}$ is fit by minimising L1 loss between $d_{B}^{(\ell)}(u,v)$ and the gold undirected dependency-tree distance between $u, v$ on the UD-EWT training corpus (Silveira et al., [2014](https://arxiv.org/html/2605.26431#bib.bib24)), using Adam (learning rate $10^{-3}$, batch size 256) for up to 100 epochs with learning-rate decay on plateau (factor 0.1, patience 1, up to 4 resets). Input activations are standardised per training corpus before projection. The resulting per-layer probes are then evaluated on our wh-movement stimuli without further fitting.
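To make the probe distance concrete, here is a minimal pure-Python sketch of mean-pooling, projection, and the squared-L2 distance, together with the L1 objective against gold tree distances. It is an illustration under assumed toy dimensions ($r = 2$, $d = 3$) rather than the paper's $r = 64$; training with Adam, standardisation, and learning-rate scheduling are omitted, and all names are hypothetical.

```python
def mean_pool(subword_vectors):
    """Average a word's subword hidden states into one word vector."""
    n = len(subword_vectors)
    return [sum(column) / n for column in zip(*subword_vectors)]


def project(B, h):
    """Apply the r x d probe matrix B to a d-dimensional vector h."""
    return [sum(b * x for b, x in zip(row, h)) for row in B]


def probe_distance(B, h_u, h_v):
    """Squared L2 norm of the projected difference B(h_u - h_v)."""
    diff = [a - b for a, b in zip(h_u, h_v)]
    return sum(p * p for p in project(B, diff))


def l1_loss(predicted, gold):
    """Mean absolute error between probe distances and gold tree distances."""
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


# Toy example: r = 2, d = 3 (assumed sizes, for illustration only).
B = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
h_u = mean_pool([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])  # two subwords -> [2, 2, 2]
h_v = [0.0, 0.0, 5.0]
d_uv = probe_distance(B, h_u, h_v)
```

Because this toy `B` discards the third coordinate, the large difference there does not contribute to `d_uv`; this is the sense in which the probe selects a subspace in which distances are meant to track tree distance.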
#### Effect\-size estimation\.
For each (model, layer $\ell$, pair) we fit one treatment-coded ordinary least squares (OLS) regression

$$d^{(\ell)}_{i,k} \;=\; \beta_{0}^{(\ell)} + \beta_{\text{fin}}^{(\ell)} \cdot \mathbf{1}[c_{i,k} = \text{fin}] + \beta_{\text{inf}}^{(\ell)} \cdot \mathbf{1}[c_{i,k} = \text{inf}] + \varepsilon^{(\ell)}_{i,k},$$

where $d^{(\ell)}_{i,k}$ is the probe distance $d_{B}^{(\ell)}$ for the relevant word pair in stimulus $k$ of item $i$, $c_{i,k}$ is the condition, $\mathbf{1}[\cdot]$ is the indicator function, $\varepsilon^{(\ell)}_{i,k}$ is the error term, and bare is the reference category (so $\beta_{0}^{(\ell)}$ is the bare-condition mean probe distance). Standard errors are cluster-robust at the item level, accounting for the within-item correlation produced by three stimuli per item. The headline coefficients are the contrast slopes $\beta_{\text{fin}}^{(\ell)}$ (finite − bare) and $\beta_{\text{inf}}^{(\ell)}$ (infinitival − bare). We apply Benjamini–Hochberg false discovery rate (FDR) correction (Benjamini and Hochberg, [1995](https://arxiv.org/html/2605.26431#bib.bib2)) across all (layer × pair × contrast) tests within each model. For point-magnitude reporting in figures we additionally compute non-parametric 95% confidence intervals (CIs) by cluster bootstrap on item id ($n = 1000$ resamples, percentile method); the bootstrap operates on the raw contrast mean-difference $\bar{d}_{\text{target}} - \bar{d}_{\text{bare}}$ and is independent of the OLS fit.
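The sketch below illustrates two pieces of this procedure on made-up data: the item-level cluster bootstrap of the raw mean difference, and the Benjamini–Hochberg step-up rule. With a single treatment-coded factor and no other regressors, the OLS slopes coincide with the condition-minus-bare mean differences, which is why `contrast` is written as a difference of means. Cluster-robust standard errors are not computed here; the row layout, function names, seed, and toy values are assumptions.

```python
import random
from statistics import mean


def contrast(rows, target):
    """Mean probe distance in `target` minus the bare mean; rows are (item, cond, dist)."""
    target_d = [d for _, cond, d in rows if cond == target]
    bare_d = [d for _, cond, d in rows if cond == "bare"]
    return mean(target_d) - mean(bare_d)


def cluster_bootstrap_ci(rows, target, n_resamples=1000, seed=0):
    """Percentile 95% CI, resampling whole items so within-item correlation is kept."""
    by_item = {}
    for row in rows:
        by_item.setdefault(row[0], []).append(row)
    item_ids = sorted(by_item)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        drawn = rng.choices(item_ids, k=len(item_ids))
        resample = [row for item in drawn for row in by_item[item]]
        stats.append(contrast(resample, target))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a reject/keep flag per p-value under the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            largest_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= largest_rank
    return reject


# Made-up data: 5 items, constant within-item offsets of +0.5 (fin) and +0.2 (inf).
toy_rows = []
for item in range(5):
    base = 1.0 + 0.1 * item
    toy_rows += [(item, "bare", base), (item, "fin", base + 0.5), (item, "inf", base + 0.2)]

beta_fin = contrast(toy_rows, "fin")
ci_low, ci_high = cluster_bootstrap_ci(toy_rows, "fin")
```

Resampling item ids rather than individual stimuli keeps each item's three conditions together, so every resample contains both the target and the bare condition.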
#### Stimulus\-verification filter\.
Items failing the UD-distance invariance check ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)) for a given probe pair are excluded from both the regression and the bootstrap for that pair, preventing residual UD-level variance from contributing to $\beta$.
### 3.3 Canonical-Layer Reporting
A common shortcut in the structural-probing literature is to report the per-model layer maximum $\beta^{\text{peak}} = \max_{\ell} \beta^{(\ell)}$ (Hewitt and Manning, [2019](https://arxiv.org/html/2605.26431#bib.bib12); Manning et al., [2020](https://arxiv.org/html/2605.26431#bib.bib19)); when multiple contrasts are involved, this extends to selecting the peak layer independently for each. This summary is opportunistic when the per-contrast maxima land at different layers: in our panel, 0 of 13 models share the same peak layer for $\beta_{\text{fin}}$ and $\beta_{\text{inf}}$ on the wh-esubj pair, so per-contrast peak reporting always selects a different layer for the two contrasts.
We therefore define a single*canonical layer*per model, anchored to the more robust contrast:
$$L^{*} \;:=\; \operatorname*{arg\,max}_{\ell} \beta_{\text{fin}}^{(\ell)},$$

and report the canonical-layer estimate $\beta^{\text{canon}} := \beta^{(L^{*})}$ for both contrasts at $L^{*}$. Anchoring on finite reflects empirical reliability: $\beta_{\text{fin}}$ is reliably positive across the panel ([section 4.1](https://arxiv.org/html/2605.26431#S4.SS1)), whereas $\beta_{\text{inf}}$ is sparser and noisier and is precisely the contrast for which we wish to remove a degree of freedom.
By construction $\beta_{\text{fin}}^{\text{canon}} = \beta_{\text{fin}}^{\text{peak}}$; $\beta_{\text{inf}}^{\text{canon}}$ may be substantially smaller than $\beta_{\text{inf}}^{\text{peak}}$. Canonical-layer reporting is a strict refinement of peak reporting: any claim it makes about $\beta_{\text{inf}}$ also holds for $\beta_{\text{inf}}^{\text{peak}}$, but the converse fails. Where the converse fails (e.g. Qwen-3-4B; [section 4.1](https://arxiv.org/html/2605.26431#S4.SS1)), peak reporting conceals heterogeneity that canonical-layer reporting surfaces.
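The difference between the two reporting schemes can be sketched in a few lines. The per-layer profiles below are invented for illustration (they are not values from any model in the panel) and are chosen so that the infinitival contrast changes sign at the canonical layer; the function names are hypothetical.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def canonical_report(beta_fin, beta_inf):
    """Compare per-contrast peak reporting with canonical-layer reporting."""
    canonical_layer = argmax(beta_fin)  # L* = argmax over layers of beta_fin
    inf_peak_layer = argmax(beta_inf)
    return {
        "canonical_layer": canonical_layer,
        "beta_fin_canon": beta_fin[canonical_layer],  # equals the beta_fin peak
        "beta_inf_canon": beta_inf[canonical_layer],
        "inf_peak_layer": inf_peak_layer,
        "beta_inf_peak": beta_inf[inf_peak_layer],
    }


# Invented per-layer profiles for a 4-layer toy model.
toy_beta_fin = [0.10, 0.40, 0.90, 0.60]
toy_beta_inf = [0.05, 0.34, -0.04, 0.10]
report = canonical_report(toy_beta_fin, toy_beta_inf)

# The gradient beta_fin > beta_inf > 0 holds at per-contrast peaks but not at L*.
gradient_at_peaks = report["beta_fin_canon"] > report["beta_inf_peak"] > 0
gradient_at_canonical = report["beta_fin_canon"] > report["beta_inf_canon"] > 0
```

In this toy case peak reporting would declare the gradient present, while canonical-layer reporting exposes that the two contrasts never satisfy it at a single shared layer.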
### 3.4 Models
We evaluate on 13 decoder-only language models from four families: Gemma-3 (Team et al., [2025](https://arxiv.org/html/2605.26431#bib.bib26)), Llama-3 (Grattafiori et al., [2024](https://arxiv.org/html/2605.26431#bib.bib11)), Mistral (Jiang et al., [2023](https://arxiv.org/html/2605.26431#bib.bib14)), and Qwen (Qwen et al., [2025](https://arxiv.org/html/2605.26431#bib.bib22); Yang et al., [2025](https://arxiv.org/html/2605.26431#bib.bib31)). All models are base (non-instruction-tuned) pretrained checkpoints, accessed via the HuggingFace Transformers library; the full panel is in [table 2](https://arxiv.org/html/2605.26431#A1.T2). Probe training and evaluation were run on a single NVIDIA RTX PRO 6000 Blackwell GPU; total wall-clock time was under two hours.
## 4 Experiments
### 4.1 Cross-Clause Depth: The Phase-Count Gradient
We probe each of 13 LLMs on the wh-movement stimuli ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)) and estimate $\beta_{\text{fin}}^{(\ell)}$ and $\beta_{\text{inf}}^{(\ell)}$ on the wh-esubj pair across all transformer layers. [Figure 2](https://arxiv.org/html/2605.26431#S3.F2) gives per-model layer profiles, [fig. 3](https://arxiv.org/html/2605.26431#S4.F3) compares per-contrast peak vs. canonical-layer reporting, and [table 1](https://arxiv.org/html/2605.26431#S4.T1) reports per-model headline numbers. The within-clause esubj-evb pair is treated separately in [section 4.2](https://arxiv.org/html/2605.26431#S4.SS2).
#### The phase\-count gradient under peak vs\. canonical reporting\.
At the per-contrast peak layer, all 13 models show the predicted gradient $\beta_{\text{fin}} > \beta_{\text{inf}} > 0$ on the wh-esubj pair (see footnote 2), with median panel ratio $\beta_{\text{fin}}^{\text{peak}} / \beta_{\text{inf}}^{\text{peak}} \approx 2.18$. This summary is, however, sensitive to layer choice: none of the 13 models share the same peak layer for $\beta_{\text{fin}}$ and $\beta_{\text{inf}}$. Under canonical-layer reporting ([section 3.3](https://arxiv.org/html/2605.26431#S3.SS3)), the gradient prediction holds in 12 of 13 models: $\beta_{\text{inf}}^{\text{canon}} > 0$ in the predicted direction, and 9 of 13 satisfy $\beta_{\text{inf}}^{\text{canon}} > +0.1$ ([table 1](https://arxiv.org/html/2605.26431#S4.T1)). The median panel $\beta_{\text{inf}}$ drops from $+0.37$ at per-contrast peak to $+0.17$ at canonical layer, while the median $\beta_{\text{fin}}$ is unchanged by construction; the median $\beta_{\text{fin}} / \beta_{\text{inf}}$ ratio rises from $\approx 2.2$ to $\approx 5.1$.

Footnote 2: The basis for $\beta_{\text{inf}} > 0$ is indirect: phrase-structure distance from wh to the embedded subject is equal in bare and infinitival. The distinction is derivational: the infinitival subject is base-merged at Spec,v\*P (phase edge; [fig. 7](https://arxiv.org/html/2605.26431#A3.F7); Chomsky [2001](https://arxiv.org/html/2605.26431#bib.bib7)) while the bare subject is at Spec,VP (non-phase).
The infinitival signal is therefore more layer-localised than the finite signal: it attenuates substantially at $L^{*}$ relative to its own peak ([fig. 3](https://arxiv.org/html/2605.26431#S4.F3)). [Appendix E](https://arxiv.org/html/2605.26431#A5) confirms the gradient direction holds under two additional metrics: the layer median and the fraction of FDR-significant positive layers.
#### The one canonical\-layer failure\.
Qwen-3-4B is the sole canonical-layer failure on wh-esubj: its $\beta_{\text{fin}}$ peaks at layer 10, where $\beta_{\text{inf}}^{\text{canon}} = -0.045$ (a sign reversal at $L^{*}$), while $\beta_{\text{inf}}$ has its own peak at layer 4 ($+0.34$). The methodology thus surfaces a discrepancy that per-contrast peak reporting would have hidden.
#### Per\-layer reliability of the finite signal\.
On wh-esubj, $\beta_{\text{fin}}^{\text{canon}}$ ranges from $+0.55$ (Gemma-3-1B) to $+0.99$ (Gemma-3-27B); the panel median fraction of layers at which $\beta_{\text{fin}}$ is FDR-significant and positive is 94%, five of 13 models reach ≥ 97%, and the lowest is Gemma-3-1B at 63%. The finite-bare contrast is therefore a broad layer-wise signal rather than a single-layer peak. Within-family scaling is heterogeneous: the Gemma family scales monotonically with size ($+0.55$, $+0.91$, $+0.96$, $+0.99$ at 1B, 4B, 12B, 27B) but the Llama family does not (Llama-3.2-1B at $+0.77$ exceeds Llama-3.1-8B at $+0.72$).
Figure 2: Per-model layer profiles of $\beta_{\text{fin}}^{(\ell)}$ (finite − bare) and $\beta_{\text{inf}}^{(\ell)}$ (infinitival − bare) on the wh-esubj pair. Small filled markers indicate layers FDR-significant at $\alpha = 0.05$; the larger white-edged marker is the per-contrast peak in the predicted direction.

Figure 3: Per-model peak vs. canonical-layer $\beta$ on the wh-esubj pair, with 95% cluster-bootstrap CIs. $\beta_{\text{fin}}$ is shown at its peak layer (= canonical by construction). $\beta_{\text{inf}}$ is shown both at its own peak (open markers) and at $L^{*}$ (filled markers).

Table 1: Phase-count-gradient summary at canonical-layer reporting, on the wh-esubj pair. $L^{*}$ is the canonical layer (the layer maximising $\beta_{\text{fin}}$). The two $\beta$ columns give the bootstrap-mean effect at $L^{*}$ with 95% cluster-bootstrap CIs. "FDR+ layers" counts the layers at which the OLS estimate of $\beta_{\text{fin}}$ on wh-esubj is positive and significant under Benjamini–Hochberg correction at $\alpha = 0.05$.
### 4.2 Within-Clause Cohesion: The Sign Asymmetry
We turn from the cross-clause pair to the within-clause pair: the embedded subject and the embedded verb (esubj-evb), whose UD-tree distance is exactly 1 edge (nsubj) in every condition by design ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)). The empirical finding is a robust sign asymmetry: in every one of the 13 models, across all four architecture families, $\beta_{\text{fin}}^{\text{peak}} < 0$ and $\beta_{\text{inf}}^{\text{peak}} > 0$ on esubj-evb, with peak magnitudes typically in the 0.3–0.6 range ([fig. 4](https://arxiv.org/html/2605.26431#S4.F4), with per-model layer profiles in [fig. 5](https://arxiv.org/html/2605.26431#S4.F5)). Since the two contrasts have opposite predicted signs, canonical-layer reporting ([section 3.3](https://arxiv.org/html/2605.26431#S3.SS3)) does not apply here.
Figure 4: Per-model peak $\beta$ on the esubj-evb pair with 95% cluster-bootstrap CIs. All 13 models show $\beta_{\text{fin}}^{\text{peak}} < 0$ and $\beta_{\text{inf}}^{\text{peak}} > 0$, a 13/13 sign asymmetry. $\beta_{\text{fin}}$ is shown at its predicted-direction peak ($\arg\min_{\ell} \beta_{\text{fin}}^{(\ell)}$); $\beta_{\text{inf}}$ at its peak ($\arg\max_{\ell} \beta_{\text{inf}}^{(\ell)}$). Models are ordered by the magnitude of the asymmetry ($\beta_{\text{inf}}^{\text{peak}} - \beta_{\text{fin}}^{\text{peak}}$), descending.

#### Three observations rule out simpler accounts.
First, UD distance between the two words is 1 edge in every condition ([section 3.1](https://arxiv.org/html/2605.26431#S3.SS1)), so a pure UD-decoding probe would yield $\beta \approx 0$. Second, the linear distance between the two words is 1 word in bare and finite but 2 words in infinitival (the *to* marker intervenes). A surface-linear-distance heuristic predicts $\beta_{\text{inf}} > 0$ and $\beta_{\text{fin}} \approx 0$; the observed *negative* $\beta_{\text{fin}}$ cannot come from linear distance. Third, a monotone structural-complexity heuristic ("more structure between the two words means larger probe distance") predicts $\beta_{\text{fin}} > \beta_{\text{inf}} > 0$, since finite adds the most structure and bare the least. The observed *negative* $\beta_{\text{fin}}$ directly contradicts this prediction.
#### Per\-layer pattern\.
The sign asymmetry is not a single-layer peak. Layer profiles ([fig. 5](https://arxiv.org/html/2605.26431#S4.F5)) show $\beta_{\text{fin}} < 0$ and $\beta_{\text{inf}} > 0$ at FDR-significant layers throughout most of the network in most models. As with the cross-clause finding, the pattern holds across all four architecture families.
Figure 5: Per-model layer profiles of $\beta_{\text{fin}}^{(\ell)}$ (finite − bare) and $\beta_{\text{inf}}^{(\ell)}$ (infinitival − bare) on the esubj-evb pair. The sign asymmetry ($\beta_{\text{fin}} < 0$ and $\beta_{\text{inf}} > 0$) holds at FDR-significant layers throughout most of the network in most models. Marker conventions follow [fig. 2](https://arxiv.org/html/2605.26431#S3.F2).
#### From the PIC to representations\.
The PIC (Chomsky, [2000](https://arxiv.org/html/2605.26431#bib.bib6), [2001](https://arxiv.org/html/2605.26431#bib.bib7)) is a constraint on syntactic derivation (it determines which operations the grammar can perform across phase boundaries) and is not, in itself, a claim about representational geometry. To connect phase theory to the observed sign asymmetry we therefore need an additional assumption. The candidate is phase-internal cohesion: items inside the same completed phase are spelled out as a unit and share locality-domain status, leading to more shared computation than structural depth alone predicts and making them representationally closer in the model's hidden states.
#### Three converging lines of work\.
The cohesion hypothesis is supported by independent strands of evidence. Multiple-Spell-Out models (Uriagereka, [1999](https://arxiv.org/html/2605.26431#bib.bib27), [2012](https://arxiv.org/html/2605.26431#bib.bib28); Fox and Pesetsky, [2005](https://arxiv.org/html/2605.26431#bib.bib9)) treat derivation as cyclic spell-out domains dispatched to the interfaces as units. The locality-domain literature (Müller, [2011](https://arxiv.org/html/2605.26431#bib.bib21); Bošković, [2007](https://arxiv.org/html/2605.26431#bib.bib3); Lee-Schoenfeld, [2008](https://arxiv.org/html/2605.26431#bib.bib18); Canac Marquis, [2005](https://arxiv.org/html/2605.26431#bib.bib4)) treats phases as the domains for agreement, case, and binding, within which items remain mutually accessible. And sentence-processing work has found that comprehenders perform additional integrative computation at clause boundaries (Just and Carpenter, [1980](https://arxiv.org/html/2605.26431#bib.bib15); Rayner et al., [2000](https://arxiv.org/html/2605.26431#bib.bib23)), consolidating within-clause material into a unitary representation; this clause-wrap-up effect is not directly about phases but aligns with the cohesion intuition. None of these directly establishes phase-internal cohesion as an empirical fact about LLMs, but each provides a converging reason to expect within-phase material to be represented as a more cohesive unit than cross-phase material.
#### Why esubj-evb specifically.
Phase cohesion would in principle apply to any pair of items inside the same phase. Its predictive force is asymmetric across our two probe pairs for a structural reason. For esubj-evb, both items sit inside the embedded clause: in the finite condition both are inside a complete CP phase, while in the infinitival and bare conditions the embedded clause is not itself a phase. Phase cohesion therefore applies only in the finite condition, predicting that finite tightens the pair representationally relative to bare, which is exactly the observed $\beta_{\text{fin}} < 0$. For wh-esubj, the items span clause boundaries and are never co-internal to any embedded phase; phase cohesion does not apply.
The phase-cohesion reading predicts the observed direction: finite couples the embedded subject and verb most tightly (both inside one phase; $\beta_{\text{fin}} < 0$ against the no-phase baseline); infinitival splits them across a phase boundary (separately spelled out; $\beta_{\text{inf}} > 0$). We treat phase cohesion as one possible account among others. The strong empirical claim is the 13/13 sign asymmetry itself, which no UD-, linear-, or monotone-complexity-based account predicts.
### 4\.3 Causal Experiment: Embedded\-Subject Patching
The cross\-clause and within\-clause results \([sections 4\.1](https://arxiv.org/html/2605.26431#S4.SS1) and [4\.2](https://arxiv.org/html/2605.26431#S4.SS2)\) establish that LLM representations exhibit correlates of phase structure\. This is observational evidence: It does not establish that those representations are causally used in the model’s computation\. The probing literature has raised this concern in general terms \(Agarwal et al\., [2025](https://arxiv.org/html/2605.26431#bib.bib1)\); we address it directly with an activation\-patching experiment\.
#### Design\.
For each item, we pair a source condition \(infinitival\) with a target condition \(bare\)\. The source and target share lexical content but differ in the embedded clause: *She expected him to leave* versus *She saw him leave*\. At each model’s canonical layer $L^{\ast}$ \([table 1](https://arxiv.org/html/2605.26431#S4.T1)\), we replace the target’s residual\-stream representation at the embedded subject’s first subword token with the corresponding source representation; the probe re\-pools across all subwords \([section 3\.2](https://arxiv.org/html/2605.26431#S3.SS2)\), so the patch contributes proportionally to the word’s pooled representation\. We measure the resulting change $\Delta\beta = \beta_{\text{patched}} - \beta_{\text{target}}$ on the wh\-esubj pair\. We use \(infinitival, bare\) rather than \(finite, bare\): The infinitival adds only T and a vP shell above the bare baseline, isolating the structural\-depth manipulation, whereas \(finite, bare\) additionally conflates morphological finiteness, the complementizer, and agreement\. As a negative control, we additionally patch at the wh\-position \(matrix Spec,CP\) and measure $\Delta\beta$ on wh\-esubj\. The wh\-element’s local representation should not directly encode embedded\-clause structure, so this should yield $|\Delta\beta| \approx 0$, which serves as a test against generic artefacts of the patching procedure\.
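The proportional\-contribution point can be illustrated with a minimal sketch\. It assumes mean pooling over subwords, which is an assumption about the re\-pooling step rather than a detail confirmed here, and it uses toy two\-dimensional vectors; the helper names `mean_pool` and `patch_first_subword` are hypothetical\. Under that assumption, patching one of $k$ subwords shifts the pooled word vector by $1/k$ of the difference between the source and target subword vectors\.

```python
from typing import List

Vector = List[float]


def mean_pool(subword_vectors: List[Vector]) -> Vector:
    """Average subword vectors into one word-level vector (assumed pooling rule)."""
    n = len(subword_vectors)
    dim = len(subword_vectors[0])
    return [sum(vec[d] for vec in subword_vectors) / n for d in range(dim)]


def patch_first_subword(target_subwords: List[Vector], source_first: Vector) -> List[Vector]:
    """Copy the target subword vectors, replacing the first one with the source vector."""
    return [list(source_first)] + [list(v) for v in target_subwords[1:]]


# Toy residual-stream vectors for a two-subword embedded subject (hypothetical values).
target = [[1.0, 0.0], [3.0, 2.0]]  # bare (target) condition
source_first = [5.0, 4.0]          # first subword in the infinitival (source) condition

pooled_before = mean_pool(target)
pooled_after = mean_pool(patch_first_subword(target, source_first))
shift = [after - before for after, before in zip(pooled_after, pooled_before)]

print("pooled before patch:", pooled_before)
print("pooled after patch: ", pooled_after)
print("shift:              ", shift)
```

With two subwords the pooled vector moves by half of the patched difference; for a word with a single subword the patch replaces the pooled representation outright\.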
#### Predictions\.
If the embedded subject’s representation causally encodes the clause\-type information picked up by the probe, patching the infinitival’s representation into the bare target should yield $\Delta\beta > 0$ on wh\-esubj, with a bootstrap $95\%$ CI excluding zero\. The wh\-position control should yield $|\Delta\beta| \leq 0.05$\.
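The decision rule implied by these predictions can be sketched as follows\. The sketch assumes a percentile bootstrap over items and uses the mean of hypothetical per\-item effects as a stand\-in for $\Delta\beta$; the resampling scheme, the number of resamples, and the estimator are all assumptions, since only the $95\%$ level and the $0.05$ control threshold are stated above\.

```python
import random
from statistics import mean
from typing import List, Tuple


def bootstrap_ci(values: List[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-item effects (assumed estimator)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample items with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def passes_patch_prediction(ci: Tuple[float, float]) -> bool:
    """Main prediction: the effect is positive and the CI excludes zero."""
    return ci[0] > 0.0


def passes_control(point_estimate: float, threshold: float = 0.05) -> bool:
    """Control prediction: the absolute effect stays within the threshold."""
    return abs(point_estimate) <= threshold


# Hypothetical per-item effects for one model (not data from the paper).
item_effects = [0.12, 0.30, 0.08, 0.25, 0.19, 0.40, 0.15, 0.22]
lo, hi = bootstrap_ci(item_effects)
print("main prediction holds:", passes_patch_prediction((lo, hi)))
print("control at 0.060 passes:", passes_control(0.060))
```

A control estimate of $0.060$ fails the threshold under this rule, while any estimate at or below $0.05$ in absolute value passes\.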
#### Results\.
Twelve of 13 models show positive $\Delta\beta$ on wh\-esubj with a bootstrap $95\%$ CI excluding zero \(point estimates from $+0.08$ to $+0.40$\); the exception is Qwen\-3\-4B \(point estimate $-0.06$; CI includes zero\), the same model that fails canonical\-layer reporting in [section 4\.1](https://arxiv.org/html/2605.26431#S4.SS1)\. The wh\-position control yields $|\Delta\beta| \leq 0.05$ in 12 of 13 models; the exception is Qwen\-2\.5\-7B at $|\Delta\beta| = 0.060$, marginally above the threshold\. These near\-null control results support the interpretation that the headline effect is not a generic patching artefact \([fig\. 6](https://arxiv.org/html/2605.26431#S4.F6)\)\.
Figure 6: Per\-model $\Delta\beta$ for the embedded\-subject patch \(filled circles\) and the wh\-position negative control \(open grey squares\), measured on wh\-esubj \(left panel, blue\) and esubj\-evb \(right panel, orange\)\. Source condition is infinitival; target is bare; intervention layer is each model’s canonical layer $L^{\ast}$\. Bootstrap $95\%$ CIs shown\.
#### Scope of the causal claim\.
The intervention sits on the embedded subject’s residual stream, and probe distances for the within\-clause esubj\-evb pair are consequently affected too \([fig\. 6](https://arxiv.org/html/2605.26431#S4.F6), right panel\)\. We therefore read the result conservatively: The embedded subject’s representation causally encodes clause\-type information in a way that affects every probe pair containing that word, rather than being specific to the cross\-clause structural depth\. The experiment does not directly test the esubj\-evb sign asymmetry, which spans all three conditions and is not reducible to single\-word encoding\.
## 5 Related Work
The structural probe \(Hewitt and Manning, [2019](https://arxiv.org/html/2605.26431#bib.bib12); Manning et al\., [2020](https://arxiv.org/html/2605.26431#bib.bib19)\) casts syntactic encoding as a linear transformation of contextualised word representations trained to recover UD dependency\-tree distances\. Because probe training and evaluation both operate within the UD framework, structural probing is by construction sensitive only to UD\-grounded distinctions\. Kennedy \([2025](https://arxiv.org/html/2605.26431#bib.bib16)\) extends the structural probe to generative\-syntax stimuli contrasting subject\-raising and subject\-control structures, providing the closest precedent for evaluating structural probes against distinctions that UD does not directly encode\. The present work strengthens this test: UD\-tree distances between probed pairs are invariant across conditions by construction, verified per item, so no residual UD difference can account for observed effects\.
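For reference, the probe of Hewitt and Manning \(2019\) can be written as follows\. The notation is theirs and may differ from the notation used elsewhere in this paper: $\mathbf{h}_i^{\ell}$ is the contextual representation of word $w_i^{\ell}$ in sentence $s^{\ell}$, $B$ is the learned linear map, and $d_{T^{\ell}}$ is the path\-length distance in the gold dependency tree\.

```latex
d_B(\mathbf{h}_i, \mathbf{h}_j)^2
  = \big(B(\mathbf{h}_i - \mathbf{h}_j)\big)^{\top} \big(B(\mathbf{h}_i - \mathbf{h}_j)\big),
\qquad
\min_{B} \sum_{\ell} \frac{1}{|s^{\ell}|^{2}} \sum_{i,j}
  \Big| \, d_{T^{\ell}}(w_i^{\ell}, w_j^{\ell}) - d_B(\mathbf{h}_i^{\ell}, \mathbf{h}_j^{\ell})^2 \, \Big|
```

Only the gold tree distance $d_{T^{\ell}}$ enters the objective, which is why stimuli that hold that distance constant across conditions isolate structure the training signal never distinguished\.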
A parallel tradition uses behavioural evaluations, namely minimal\-pair acceptability contrasts, to assess syntactic knowledge in language models\. Marvin and Linzen \([2018](https://arxiv.org/html/2605.26431#bib.bib20)\) established a broad\-coverage evaluation spanning agreement, reflexive anaphora, and negative\-polarity phenomena\. For wh\-movement specifically, Wilcox et al\. \([2018](https://arxiv.org/html/2605.26431#bib.bib30)\) showed that RNN models track gap sites but fail on some island constraints; Koo and Kim \([2026](https://arxiv.org/html/2605.26431#bib.bib17)\) extended this to GPT\-style Transformer models, finding that they fail to replicate human sensitivity to successive\-cyclic movement\. Behavioural and representational methods are not interchangeable: Agarwal et al\. \([2025](https://arxiv.org/html/2605.26431#bib.bib1)\) demonstrate across 32 models that structural probe accuracy does not predict minimal\-pair performance, indicating that the two approaches recover different aspects of syntactic knowledge\.
Whether probed representations play a causal role in model computation, rather than merely correlating with behaviour, is a central methodological concern \(Agarwal et al\., [2025](https://arxiv.org/html/2605.26431#bib.bib1)\), formalised in the causal\-abstraction framework of Geiger et al\. \([2023](https://arxiv.org/html/2605.26431#bib.bib10)\)\. Activation patching tests this directly: An internal representation is replaced with one from a counterfactual input, and the downstream change in the probed quantity measures causal relevance\.
## 6 Discussion
#### Structural encoding below the UD surface\.
The esubj\-evb sign asymmetry, with $\beta_{\text{fin}}^{\text{peak}} < 0$ and $\beta_{\text{inf}}^{\text{peak}} > 0$ across all 13 models, is the central existence proof that LLM representations encode structural information inaccessible to UD\-grounded probing\. Three alternative accounts \(UD\-distance\-based, surface\-linear, and monotone\-complexity\-based\) all fail to account for the sign asymmetry \([section 4\.2](https://arxiv.org/html/2605.26431#S4.SS2)\)\. The pattern is specifically predicted by phase\-internal cohesion, a formal\-syntactic abstraction from the MP’s treatment of cyclic spell\-out that is invisible to UD by construction\. Its 13/13 replication across four architecture families suggests that distributional pretraining can induce representations aligned with formal\-syntactic abstractions beyond the reach of annotation\-based probing\. The implication is as methodological as it is empirical: UD\-grounded probes provide a lower bound on syntactic encoding, not an upper bound\.
#### Canonical\-layer reporting\.
Anchoring all contrasts to $L^{\ast}$ \(the layer maximising the most reliable contrast\) removes one cross\-contrast degree of freedom and makes canonical\-layer claims strictly stronger than per\-contrast peak claims: a positive effect at $L^{\ast}$ implies a positive per\-contrast peak, but not conversely\. We recommend canonical\-layer reporting as a default for structural\-probing work that compares multiple contrasts in the same predicted direction over a shared model panel\.
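A minimal sketch of the two reporting schemes follows\. It assumes that the anchor \(most reliable\) contrast is finite – bare and that $L^{\ast}$ is the layer where its effect peaks; the per\-layer values and the function names are hypothetical\.

```python
from typing import Dict, List


def canonical_layer(anchor_betas: List[float]) -> int:
    """Index of the layer where the anchor contrast is largest."""
    return max(range(len(anchor_betas)), key=lambda layer: anchor_betas[layer])


def report(betas_by_contrast: Dict[str, List[float]], anchor: str) -> Dict[str, Dict[str, float]]:
    """Compare per-contrast peaks with effects read off at the shared canonical layer."""
    l_star = canonical_layer(betas_by_contrast[anchor])
    return {
        name: {"peak": max(betas), "at_canonical": betas[l_star]}
        for name, betas in betas_by_contrast.items()
    }


# Hypothetical per-layer effects for one model (illustrative values only).
betas = {
    "finite-bare": [0.10, 0.40, 0.73, 0.55],
    "infinitival-bare": [0.05, 0.34, -0.04, 0.12],
}
summary = report(betas, anchor="finite-bare")
for name, stats in summary.items():
    print(name, stats)
```

In this toy case the infinitival contrast passes a peak criterion but fails at the canonical layer, which is the kind of discrepancy that canonical\-layer reporting is designed to expose\.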
#### Causal corroboration and its scope\.
The activation\-patching result \([section 4\.3](https://arxiv.org/html/2605.26431#S4.SS3)\) places the phase\-count gradient on causal footing: The embedded subject’s representation is not merely correlated with clause\-type information but encodes it in a way that shapes the model’s internal distances\. The scope is conservative, since the intervention sits on a single word’s residual stream, but the result directly addresses the probing\-scepticism concern of Agarwal et al\. \([2025](https://arxiv.org/html/2605.26431#bib.bib1)\)\.
## Limitations
All stimuli and probes are in English\. The theoretical predictions follow from phase structure, which generalises cross\-linguistically, but the specific three\-way complement\-size contrast exploits English\-specific verb\-class properties and case morphology\. Extension to languages with overt morphological or word\-order diagnostics for phase structure — such as V2 languages, where finite verbs obligatorily move to C in main clauses, or ergative languages with distinct case\-phase interactions — would provide a more stringent cross\-linguistic test\.
The phase\-count gradient prediction is directional: We predict $\beta_{\text{fin}}^{\text{canon}} > \beta_{\text{inf}}^{\text{canon}} > 0$ but make no prediction about the ratio’s magnitude; the prediction holds for any $\beta_{\text{fin}}^{\text{canon}} / \beta_{\text{inf}}^{\text{canon}} > 1$\. Qwen\-3\-4B is the sole canonical\-layer failure; we have no principled account of why this model differs from the other twelve\. Within\-family scaling on $\beta_{\text{fin}}$ is heterogeneous across the panel \([section 4\.1](https://arxiv.org/html/2605.26431#S4.SS1)\), and what determines the $\beta_{\text{fin}} / \beta_{\text{inf}}$ ratio magnitude remains an open question\.
The panel covers 13 publicly available base models from four architecture families, with parameter counts between 1B and 27B\. It excludes instruction\-tuned variants, mixture\-of\-experts models, and models above 27B; whether the phase\-count gradient and the esubj\-evb sign asymmetry extend to these settings is unknown\.
## Acknowledgments
This research was funded by the Defense Advanced Research Projects Agency \(DARPA\), under contract W912CG23C0031\.
## References
- Agarwal et al\. \(2025\) Ananth Agarwal, Jasper Jian, Christopher D Manning, and Shikhar Murty\. 2025\. [Mechanisms vs\. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations](https://doi.org/10.18653/v1/2025.emnlp-main.1712)\. In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 33737–33757, Suzhou, China\. Association for Computational Linguistics\.
- Benjamini and Hochberg \(1995\) Yoav Benjamini and Yosef Hochberg\. 1995\. [Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing](https://doi.org/10.1111/j.2517-6161.1995.tb02031.x)\. *Journal of the Royal Statistical Society: Series B \(Methodological\)*, 57\(1\):289–300\.
- Bošković \(2007\) Željko Bošković\. 2007\. [On the Locality and Motivation of Move and Agree: An Even More Minimal Theory](https://doi.org/10.1162/ling.2007.38.4.589)\. *Linguistic Inquiry*, 38\(4\):589–644\.
- Canac Marquis \(2005\) Rejean Canac Marquis\. 2005\. [Phases and Binding of Reflexives and Pronouns in English](https://doi.org/10.21248/hpsg.2005.28)\. *Proceedings of the International Conference on Head\-Driven Phrase Structure Grammar*\.
- Chomsky \(1986\) Noam Chomsky\. 1986\. *Barriers*\. Linguistic Inquiry Monographs\. MIT Press, Cambridge, MA, USA\.
- Chomsky \(2000\) Noam Chomsky\. 2000\. Minimalist Inquiries: The Framework\. In *Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik*, pages 89–155\. MIT Press, Cambridge\.
- Chomsky \(2001\) Noam Chomsky\. 2001\. [Derivation by Phase](https://doi.org/10.7551/mitpress/4056.003.0004)\. In Michael Kenstowicz, editor, *Ken Hale*, pages 1–52\. The MIT Press\.
- Erteschik\-Shir \(1973\) Nomi Erteschik\-Shir\. 1973\. [*On the nature of island constraints\.*](http://hdl.handle.net/1721.1/12991) Ph\.D\. thesis, Massachusetts Institute of Technology\.
- Fox and Pesetsky \(2005\) Danny Fox and David Pesetsky\. 2005\. [Cyclic Linearization of Syntactic Structure](https://doi.org/10.1515/thli.2005.31.1-2.1)\. *Theoretical Linguistics*, 31\(1\-2\):1–45\.
- Geiger et al\. \(2023\) Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, and Thomas Icard\. 2023\. [Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability](https://arxiv.org/abs/2301.04709v4)\.
- Grattafiori et al\. \(2024\) Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al\-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others\. 2024\. [The Llama 3 Herd of Models](https://doi.org/10.48550/arXiv.2407.21783)\. *arXiv preprint*\. arXiv:2407\.21783 \[cs\.AI\]\.
- Hewitt and Manning \(2019\) John Hewitt and Christopher D\. Manning\. 2019\. [A Structural Probe for Finding Syntax in Word Representations](https://doi.org/10.18653/v1/N19-1419)\. In *Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 \(Long and Short Papers\)*, pages 4129–4138, Minneapolis, Minnesota\. Association for Computational Linguistics\.
- Honnibal et al\. \(2020\) Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd\. 2020\. [spaCy: Industrial\-strength natural language processing in Python](https://doi.org/10.5281/zenodo.1212303)\.
- Jiang et al\. \(2023\) Albert Q\. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie\-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed\. 2023\. [Mistral 7B](https://doi.org/10.48550/arXiv.2310.06825)\. *arXiv preprint*\. arXiv:2310\.06825 \[cs\.CL\]\.
- Just and Carpenter \(1980\) Marcel A\. Just and Patricia A\. Carpenter\. 1980\. [A theory of reading: From eye fixations to comprehension](https://doi.org/10.1037/0033-295X.87.4.329)\. *Psychological Review*, 87\(4\):329–354\.
- Kennedy \(2025\) Mary Kennedy\. 2025\. [Evidence of Generative Syntax in LLMs](https://doi.org/10.18653/v1/2025.conll-1.25)\. In *Proceedings of the 29th Conference on Computational Natural Language Learning*, pages 377–396, Vienna, Austria\. Association for Computational Linguistics\.
- Koo and Kim \(2026\) Keonwoo Koo and Hyosik Kim\. 2026\. [Successive\-cyclic movement in humans and neural language models: testing wh\-filler\-gap dependencies](https://doi.org/10.3389/fpsyg.2025.1699740)\. *Frontiers in Psychology*, 16\.
- Lee\-Schoenfeld \(2008\) Vera Lee\-Schoenfeld\. 2008\. [Binding, Phases, and Locality](https://doi.org/10.1111/j.1467-9612.2008.00118.x)\. *Syntax*, 11\(3\):281–298\.
- Manning et al\. \(2020\) Christopher D\. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy\. 2020\. [Emergent linguistic structure in artificial neural networks trained by self\-supervision](https://doi.org/10.1073/pnas.1907367117)\. *Proceedings of the National Academy of Sciences*, 117\(48\):30046–30054\.
- Marvin and Linzen \(2018\) Rebecca Marvin and Tal Linzen\. 2018\. [Targeted Syntactic Evaluation of Language Models](https://doi.org/10.18653/v1/D18-1151)\. In *Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing*, pages 1192–1202, Brussels, Belgium\. Association for Computational Linguistics\.
- Müller \(2011\) Gereon Müller\. 2011\. [*Constraints on Displacement: A phase\-based approach*](https://doi.org/10.1075/lfab.7), volume 7 of *Language Faculty and Beyond*\. John Benjamins Publishing Company, Amsterdam\.
- Qwen et al\. \(2025\) Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, and 24 others\. 2025\. [Qwen2\.5 Technical Report](https://doi.org/10.48550/arXiv.2412.15115)\. *arXiv preprint*\. arXiv:2412\.15115 \[cs\.CL\]\.
- Rayner et al\. \(2000\) Keith Rayner, Gretchen Kambe, and Susan A\. Duffy\. 2000\. [The Effect of Clause Wrap\-Up on Eye Movements during Reading](https://doi.org/10.1080/713755934)\. *The Quarterly Journal of Experimental Psychology Section A*, 53\(4\):1061–1080\.
- Silveira et al\. \(2014\) Natalia Silveira, Timothy Dozat, Marie\-Catherine de Marneffe, Samuel R\. Bowman, Miriam Connor, John Bauer, and Chris Manning\. 2014\. [A gold standard dependency corpus for English](https://aclanthology.org/L14-1067/)\. In *Proceedings of the Ninth International Conference on Language Resources and Evaluation \(LREC’14\)*, pages 2897–2904, Reykjavik, Iceland\. European Language Resources Association \(ELRA\)\.
- Stowell \(1981\) Timothy Angus Stowell\. 1981\. [*Origins of phrase structure*](https://dspace.mit.edu/handle/1721.1/15626)\. Ph\.D\. thesis, Massachusetts Institute of Technology\.
- Team et al\. \(2025\) Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean\-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, and 197 others\. 2025\. [Gemma 3 Technical Report](https://doi.org/10.48550/arXiv.2503.19786)\. *arXiv preprint*\. arXiv:2503\.19786 \[cs\.CL\]\.
- Uriagereka \(1999\) Juan Uriagereka\. 1999\. [Multiple Spell\-Out](https://doi.org/10.7551/mitpress/7305.003.0012)\. In *Working Minimalism*\. The MIT Press\.
- Uriagereka \(2012\) Juan Uriagereka\. 2012\. *Spell\-out and the minimalist program*\. Oxford Linguistics\. Oxford University Press, Oxford\.
- Urk \(2020\) Coppe van Urk\. 2020\. [Successive Cyclicity and the Syntax of Long\-Distance Dependencies](https://doi.org/10.1146/annurev-linguistics-011718-012318)\. *Annual Review of Linguistics*, 6:111–130\.
- Wilcox et al\. \(2018\) Ethan Wilcox, Roger Levy, Takashi Morita, and Richard Futrell\. 2018\. [What do RNN Language Models Learn about Filler–Gap Dependencies?](https://doi.org/10.18653/v1/W18-5423) In *Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP*, pages 211–221, Brussels, Belgium\. Association for Computational Linguistics\.
- Yang et al\. \(2025\) An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, and 41 others\. 2025\. [Qwen3 Technical Report](https://doi.org/10.48550/arXiv.2505.09388)\. *arXiv preprint*\. arXiv:2505\.09388 \[cs\.CL\]\.
## Appendix A Full Model Panel
[Table 2](https://arxiv.org/html/2605.26431#A1.T2) lists all 13 models with their family, parameter count, and number of transformer blocks\. All models are used under licenses permitting academic research use: the Gemma Terms of Use \(Gemma\-3\), the Meta Llama 3 Community License \(Llama\-3\), and Apache 2\.0 \(Mistral\-7B\-v0\.3, Qwen\-2\.5, Qwen\-3\)\.
Table 2: The 13\-model panel, organised by family\. “Layers” is the number of transformer blocks, excluding the embedding layer\.
## Appendix B Syntactic Terminology
[Table 3](https://arxiv.org/html/2605.26431#A2.T3) defines the syntactic terms used throughout the paper\.
Table 3: Syntactic terms used in the paper\.
## Appendix C Detailed Phrase Structures
[Figure 7](https://arxiv.org/html/2605.26431#A3.F7) gives phrase structures with intermediate projections for all three conditions, showing the vP–VP decomposition under little $v$ and copy\-theoretic lower copies for subject raising and head movement; intermediate wh\-movement copies are omitted for clarity\.
\[Tree diagrams not reproduced\. Panels: \(a\) Bare; \(b\) Infinitival; \(c\) Finite\.\]
Figure 7: Phrase structures with intermediate projections for the three conditions, following copy theory \(Chomsky, [2000](https://arxiv.org/html/2605.26431#bib.bib6)\)\. Struck\-through nodes are lower \(unpronounced\) copies: the auxiliary *did* \(C $\leftarrow$ T\), subjects \(Spec,TP $\leftarrow$ Spec,vP\), and verbs \(v $\leftarrow$ V\)\. The wh\-element \($\textit{what}_i$\) is shown at its base position, at embedded Spec,CP \(finite only\), and at matrix Spec,CP; the intermediate copy at Spec,vP of the embedded clause is omitted for clarity\. The embedded clause is colour\-coded by condition; phase heads introduced within it are in bold: none in bare \(a\), the embedded vP in infinitival \(b\), the embedded CP and vP in finite \(c\)\.
## Appendix D Probe Training Quality
[Figure 8](https://arxiv.org/html/2605.26431#A4.F8) shows per\-layer probe quality on the UD\-EWT validation set for all 13 models\. The textbook structural\-probe profile \(a sharp rise from the embedding layer, a plateau in the middle layers, and a moderate decline near the output\) holds consistently across all four architecture families, confirming that the probes underlying [section 4](https://arxiv.org/html/2605.26431#S4) reliably recover dependency structure\.
Figure 8: Per\-layer probe quality on the UD\-EWT validation set for all 13 models\. Panel \(a\): distance Spearman correlation\. Panel \(b\): undirected unlabelled attachment score \(UUAS\)\. The $x$\-axis is relative depth \(0 = embedding layer, 1 = final transformer block\)\.
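For readers unfamiliar with UUAS, the following sketch shows the usual computation from Hewitt and Manning \(2019\): build the minimum spanning tree over the predicted pairwise distances and count how many gold undirected edges it recovers\. The distance matrix and gold tree are toy values, the function names are hypothetical, and details of the actual evaluation such as punctuation handling are not modelled\.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def minimum_spanning_tree(dist: List[List[float]]) -> Set[Edge]:
    """Prim's algorithm on a dense symmetric distance matrix; edges are stored as (i, j) with i < j."""
    n = len(dist)
    in_tree = {0}
    edges: Set[Edge] = set()
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda pair: dist[pair[0]][pair[1]],
        )
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges


def uuas(pred_dist: List[List[float]], gold_edges: Set[Edge]) -> float:
    """Fraction of gold undirected edges recovered by the MST of predicted distances."""
    gold = {(min(a, b), max(a, b)) for a, b in gold_edges}
    return len(minimum_spanning_tree(pred_dist) & gold) / len(gold)


# Toy four-word sentence: gold tree has word 1 as the hub.
gold_tree = {(0, 1), (1, 2), (1, 3)}
predicted = [
    [0.0, 1.1, 2.0, 0.9],
    [1.1, 0.0, 1.0, 1.2],
    [2.0, 1.0, 0.0, 2.1],
    [0.9, 1.2, 2.1, 0.0],
]
score = uuas(predicted, gold_tree)
print(f"UUAS = {score:.3f}")
```

Here the predicted tree attaches word 3 to word 0 instead of word 1, so two of the three gold edges are recovered\.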
## Appendix E Robustness Across Summary Metrics
[Table 4](https://arxiv.org/html/2605.26431#A5.T4) reports the wh\-esubj phase\-count gradient under four summary statistics: per\-contrast peak $\beta$, the effect at the canonical layer \(@$L^{\ast}$\), the layer median, and the fraction of FDR\-significant positive layers \(%sig\+\)\. The finite\-bare contrast is robust across all four metrics in every model\. The infinitival\-bare contrast passes the peak criterion in all 13 models but attenuates substantially at the canonical layer and median, reflecting its layer\-localised character documented in [section 4\.1](https://arxiv.org/html/2605.26431#S4.SS1)\.
Table 4: Phase\-count\-gradient summary on the wh\-esubj pair under four metrics\. “peak” is the per\-contrast layer maximum; “@$L^{\ast}$” is the effect at the canonical layer; “med” is the layer median; “%sig\+” is the fraction of layers with a positive FDR\-significant OLS estimate\.

| Model | $L^{\ast}$ | finite – bare: peak | finite – bare: med | finite – bare: %sig\+ | infinitival – bare: peak | infinitival – bare: @$L^{\ast}$ | infinitival – bare: med | infinitival – bare: %sig\+ |
|---|---|---|---|---|---|---|---|---|
| Gemma\-3\-1B | 4 | \+0\.55 | \+0\.09 | 63% | \+0\.15 | \+0\.05 | \-0\.10 | 33% |
| Gemma\-3\-4B | 3 | \+0\.91 | \+0\.44 | 94% | \+0\.42 | \+0\.07 | \+0\.07 | 63% |
| Gemma\-3\-12B | 29 | \+0\.96 | \+0\.43 | 98% | \+0\.37 | \+0\.29 | \+0\.03 | 51% |
| Gemma\-3\-27B | 32 | \+0\.99 | \+0\.50 | 97% | \+0\.32 | \+0\.10 | \-0\.01 | 41% |
| Llama\-3\.2\-1B | 3 | \+0\.77 | \+0\.20 | 76% | \+0\.36 | \+0\.12 | \+0\.06 | 53% |
| Llama\-3\.1\-8B | 3 | \+0\.72 | \+0\.32 | 85% | \+0\.44 | \+0\.18 | \+0\.08 | 70% |
| Mistral\-7B\-v0\.3 | 5 | \+0\.73 | \+0\.26 | 91% | \+0\.28 | \+0\.20 | \+0\.03 | 52% |
| Qwen\-2\.5\-1\.5B | 1 | \+0\.67 | \+0\.12 | 69% | \+0\.57 | \+0\.19 | \+0\.06 | 52% |
| Qwen\-3\-1\.7B | 10 | \+0\.82 | \+0\.37 | 93% | \+0\.47 | \+0\.27 | \+0\.04 | 59% |
| Qwen\-3\-4B | 10 | \+0\.73 | \+0\.43 | 95% | \+0\.34 | \-0\.04 | \+0\.02 | 51% |
| Qwen\-2\.5\-7B | 12 | \+0\.88 | \+0\.46 | 97% | \+0\.49 | \+0\.17 | \+0\.10 | 66% |
| Qwen\-3\-8B | 10 | \+0\.76 | \+0\.40 | 97% | \+0\.28 | \+0\.15 | \-0\.04 | 32% |
| Qwen\-3\-14B | 32 | \+0\.89 | \+0\.42 | 98% | \+0\.22 | \+0\.18 | \+0\.03 | 54% |
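The %sig\+ column can be illustrated with a short sketch of the Benjamini–Hochberg step\-up procedure \(Benjamini and Hochberg, 1995\)\. It assumes that this is the FDR correction in use and that it is applied across layers within a model at level $q = 0.05$; both the correction family and the level are assumptions, and the per\-layer estimates and $p$\-values are invented for illustration\.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis under the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def frac_sig_positive(betas: List[float], p_values: List[float], q: float = 0.05) -> float:
    """Fraction of layers whose estimate is positive and FDR-significant."""
    flags = benjamini_hochberg(p_values, q)
    hits = sum(1 for beta, reject in zip(betas, flags) if reject and beta > 0)
    return hits / len(betas)


# Hypothetical per-layer OLS estimates and p-values for a four-layer model.
layer_betas = [0.30, 0.25, -0.10, 0.05]
layer_pvals = [0.001, 0.010, 0.020, 0.600]
flags = benjamini_hochberg(layer_pvals)
pct_sig_pos = frac_sig_positive(layer_betas, layer_pvals)
print(flags, f"%sig+ = {pct_sig_pos:.0%}")
```

The third layer is significant but negative, so it does not count towards %sig\+; only the sign\-consistent significant layers do\.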