The Geometry of Last-Layer Model Stealing

arXiv cs.LG Papers

Summary

The paper provides a geometric interpretation of last-layer model stealing attacks on transformers using exterior differential systems, showing that recovery of the projection matrix is governed by the polar space of a quadric. It also characterizes an identifiability wall below the last layer, revealing what can and cannot be extracted.

arXiv:2606.06854v1 Announce Type: new Abstract: This paper uses geometry to explain how part of a machine learning model can be stolen with an existing, well-known method. It gives the exact conditions under which the final layer of a transformer can be recovered, up to an unavoidable orthogonal transformation. For the layers beneath, it establishes clear limits: a hidden sublayer cannot be fully reverse engineered from the model's outputs alone. The result is a map of what can and cannot be stolen from a model.

# The Geometry of Last-Layer Model Stealing
Source: [https://arxiv.org/html/2606.06854](https://arxiv.org/html/2606.06854)
###### Abstract

We give a geometric reading of the last-layer model-stealing attack of Carlini et al. \[[References](https://arxiv.org/html/2606.06854#bib)\], using the vocabulary of exterior differential systems (EDS) recently extended to Lie algebroids by Hohloch, Mestdag and Yasaka \[[References](https://arxiv.org/html/2606.06854#bib)\]. The set of logit vectors a transformer can emit is the common zero locus of an *ideal* with one linear part (recovered by the singular value decomposition) and one quadratic part (the ellipsoid induced by the final normalization layer). In this language the object that controls recovery of the projection matrix is the *polar space* of that quadric, which we show is exactly the tangent space of the output manifold; recovery succeeds, up to the unavoidable orthogonal gauge, precisely when a pair of *regularity* conditions (the analogues of Kähler-regularity) hold. We verify every step on a fully controlled toy model to machine precision. We then ask what lies below the last layer and report two things. First, the *intrinsic dimension* of the recoverable hidden-state manifold is an observable, invisible to the singular value decomposition and to the quadric, that detects a nonlinear sublayer and measures its effective rank. Second, we characterize what is and is not identifiable beneath the last layer, and exhibit large explicit non-identifiable fibers: different sublayers, and even different architecture widths, that produce bit-identical outputs. We are deliberately explicit about scope: the EDS framing organizes the picture but is not the engine, and the load-bearing results are classical. The contribution is a clean unified account and a concrete identifiability boundary, not a new attack.

## 1 Introduction

Production language models are exposed only through APIs, yet Carlini et al. \[[References](https://arxiv.org/html/2606.06854#bib)\] showed that the final *embedding projection* (“unembedding”) layer of such a model can be recovered, up to symmetries, from ordinary query access. Their attack is top-down: because the last layer maps a small hidden dimension $h$ to a large logit vector of dimension $l \gg h$, the logits live in an $h$-dimensional subspace, and the singular value decomposition (SVD) of enough query responses reveals both $h$ and the projection matrix up to a linear change of basis. A refinement (their Appendix H) exploits the fact that the final normalization layer places hidden states on a sphere, so the logits lie on an ellipsoid; fitting that ellipsoid sharpens the recovery from “up to an invertible matrix” to “up to an orthogonal matrix.”

This note makes two contributions, both modest and clearly delimited\.

### A geometric account\.

We recast the attack in the language of exterior differential systems, which is the natural setting for “reconstruct a global object from local data under constraints, modulo a symmetry group.” The attainable logits form the integral variety of an *ideal*; the object governing recovery of the projection is the *polar space* of its quadratic generator; and recovery is well posed exactly under regularity conditions that are the affine analogues of the Kähler-regularity used in the Cartan–Kähler theorem \[[References](https://arxiv.org/html/2606.06854#bib),[References](https://arxiv.org/html/2606.06854#bib)\]. The single-layer case turns out to be Frobenius-integrable, which is precisely why the attack is closed-form rather than iterative, and we confirm the whole picture numerically.

### An identifiability wall\.

We then look one layer deeper. We observe that the *intrinsic dimension* of the recoverable hidden-state manifold is an extraction observable distinct from the linear span the SVD measures: when a low-rank nonlinear sublayer is present, the span overstates the content dimension, and the intrinsic dimension reveals the bottleneck. Finally we give a crisp identifiability characterization of the sublayer and demonstrate, with machine-precision examples, that most of its parameters lie in a non-identifiable fiber. This explains mechanically why the attack has not been extended past one layer: it is not a missing trick but a property of the observation map.

### Honesty about scope\.

The geometric language is organizing, not enabling: at no point does it produce a result the standard linear\-algebra and manifold tools could not\. The identifiability statements rest on classical neural\-network identifiability \[[References](https://arxiv.org/html/2606.06854#bib)\] and on the known fact that learned representations have low intrinsic dimension \[[References](https://arxiv.org/html/2606.06854#bib)\]\. We state this plainly so the note is read as a unified exposition with a concrete identifiability boundary, not as a new attack\.

### Roadmap\.

Section [2](https://arxiv.org/html/2606.06854#S2) is a self-contained primer on both halves of the story (model stealing and the handful of differential-geometry notions we borrow), written for readers who know neither; specialists can skip to Section [3](https://arxiv.org/html/2606.06854#S3). Sections [4](https://arxiv.org/html/2606.06854#S4)–[6](https://arxiv.org/html/2606.06854#S6) develop and verify the single-layer picture, and Sections [7](https://arxiv.org/html/2606.06854#S7)–[8](https://arxiv.org/html/2606.06854#S8) look beneath the last layer.

## 2 Background and intuition

This section assumes no prior exposure to either model extraction or exterior differential systems (EDS). Experts may skip to Section [3](https://arxiv.org/html/2606.06854#S3).

### What “model stealing” means\.

A language model is usually served behind an API: you send text and receive, for each possible next token, a score (a *logit*) that the model turns into a probability. The provider keeps the model’s weights secret. *Model stealing* asks how much of those weights an outsider can reconstruct using only API queries. One does not expect to copy a multi-billion-parameter model from query access; the surprising result of \[[References](https://arxiv.org/html/2606.06854#bib)\] is that one specific piece, the final linear layer, can be recovered exactly, up to an unavoidable ambiguity, and cheaply.

### Why the last layer is the easy target\.

A transformer carries information in a vector of width $h$ (the “hidden” or “residual” dimension), then multiplies that vector by a matrix $\mathbf{W}$ to produce one logit per vocabulary token. The vocabulary is large ($l$ in the tens of thousands) while $h$ is comparatively small, so $\mathbf{W}$ is a tall, thin, rank-$h$ matrix: it maps a small space up into a large one. That gap, $h \ll l$, is the crack the attack pries open.

### The rank trick\.

Query the model on many different prompts and stack the logit vectors as columns of a matrix. Although each column lives in $l$ dimensions, every column is $\mathbf{W}$ times something $h$-dimensional, so all of them lie in the same $h$-dimensional subspace. Once you have queried more than $h$ times, new responses become linear combinations of old ones. The singular value decomposition (SVD) detects this: it reports exactly $h$ large singular values and a sharp drop afterwards. Counting them recovers the hidden width (Figure [1](https://arxiv.org/html/2606.06854#S4.F1)); a little more linear algebra recovers $\mathbf{W}$ itself, up to a change of basis.
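The rank trick can be sketched in a few lines of NumPy. This is a minimal illustration under toy assumptions, not the paper's reproduction code: `W` and `G` are random stand-ins for a real projection and real hidden states, the sizes are arbitrary, and `recover_hidden_width` is a hypothetical helper that reads the width off the largest multiplicative gap in the singular spectrum.

```python
import numpy as np

def recover_hidden_width(logits: np.ndarray) -> int:
    """Estimate h as the position of the largest ratio between
    consecutive singular values of the stacked logit matrix."""
    s = np.linalg.svd(logits, compute_uv=False)
    # Floor the round-off tail so tiny or zero values cannot create a fake gap.
    s = np.maximum(s, np.finfo(float).eps * s[0])
    ratios = s[:-1] / s[1:]
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(0)
h, l, n_queries = 16, 200, 64             # toy sizes (assumptions)
W = rng.standard_normal((l, h))           # hypothetical projection matrix
G = rng.standard_normal((h, n_queries))   # hypothetical hidden states, one column per query
Z = W @ G                                 # stacked logit vectors, shape (l, n_queries)

h_hat = recover_hidden_width(Z)
print("recovered hidden width:", h_hat)
```

With noise-free logits the gap sits at index $h$, so `h_hat` is expected to match `h`. Logits reconstructed from a real API are noisier, and the gap is then large rather than fourteen orders of magnitude.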

### From a sphere to an ellipsoid\.

Modern transformers *normalize* the hidden vector just before the last layer, which forces it to have fixed length: it lives on a sphere. A linear map sends a sphere to an *ellipsoid*. So the logits do not merely fill an $h$-dimensional subspace; they lie on an ellipsoidal surface inside it. Fitting that ellipsoid pins down more of $\mathbf{W}$: it sharpens “known up to any invertible change of basis” to “known up to a rotation.” The leftover rotation is genuinely unrecoverable, for a simple reason given in Section [5](https://arxiv.org/html/2606.06854#S5).
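To make the sphere-to-ellipsoid step concrete, here is a small sketch that builds the quadric directly from a known `W`. It assumes unit-norm hidden states and a random toy `W`; real normalization layers use a different radius and learned gains, which only rescale the ellipsoid. The matrix `A_hat` below is formed from the pseudo-inverse of `W`, which is one way to write the quadric when `W` is known, not something an attacker has access to.

```python
import numpy as np

rng = np.random.default_rng(1)
h, l, n = 4, 10, 500                       # toy sizes (assumptions)
W = rng.standard_normal((l, h))            # hypothetical projection matrix

g = rng.standard_normal((n, h))
g /= np.linalg.norm(g, axis=1, keepdims=True)   # hidden states on the unit sphere
Z = g @ W.T                                      # logits, one row per query

# Since g = pinv(W) z, the constraint ||g||^2 = 1 becomes z^T A_hat z = 1.
W_pinv = np.linalg.pinv(W)
A_hat = W_pinv.T @ W_pinv                  # l x l, positive semidefinite, rank h

quad = np.einsum("ni,ij,nj->n", Z, A_hat, Z)
print("max |z^T A_hat z - 1| =", np.max(np.abs(quad - 1.0)))
print("rank of A_hat:", np.linalg.matrix_rank(A_hat))
```

Every logit vector satisfies the same quadratic equation, and the quadric has rank $h$ even though it lives in $l$ dimensions.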

### Three borrowed ideas\.

The EDS vocabulary we use names three things that are already implicitly present above\.

- An *ideal* is just the collection of equations every observation satisfies. Here there are two kinds: linear ones (the logits lie in the subspace) and one quadratic one (they lie on the ellipsoid). The surface they cut out is the *integral variety*, the set of attainable outputs.
- A *polar space* answers “given part of a solution, which directions can extend it?” For a quadric this is the classical notion of points *conjugate* with respect to the surface, and (as we show) it is exactly the tangent plane to the ellipsoid. Recovering the last layer amounts to recovering this field of tangent planes.
- *Regularity* is the package of nondegeneracy conditions that make the reconstruction unique and stable: a clean gap in the spectrum, and a genuinely curved (nondegenerate) ellipsoid. When they fail, the attack fails in a predictable way.

### Why bring in EDS at all?

Honestly, for the single layer it is a unifying language rather than a new tool: rank recovery, ellipsoid recovery, the rotation ambiguity, and the stability conditions become one object with three features. Its real payoff is conceptual: it tells us in advance (Remark [1](https://arxiv.org/html/2606.06854#Thmremark1)) why the single-layer attack is one-shot, and it frames the genuinely hard question, “what can be learned about the layer underneath,” as a question about the geometry of a curved surface (Sections [7](https://arxiv.org/html/2606.06854#S7)–[8](https://arxiv.org/html/2606.06854#S8)).

## 3 Setup

Let $\mathcal{X}$ be the token vocabulary, $|\mathcal{X}| = l$. A model produces $f_{\theta}(p) = \operatorname{softmax}(\mathbf{W}\,g_{\theta}(p))$, where $g_{\theta}\colon \mathcal{X}^{N} \to \mathbb{R}^{h}$ computes a hidden state and $\mathbf{W} \in \mathbb{R}^{l\times h}$ is the projection, with $h \ll l$. We assume the idealized oracle that returns the full logit vector $z = \mathbf{W}\,g_{\theta}(p) \in \mathbb{R}^{l}$; the engineering needed to recover logits from top-$K$ log-probabilities and a logit bias is treated at length in \[[References](https://arxiv.org/html/2606.06854#bib)\] and is orthogonal to what follows. We assume the final block is a normalization (RMSNorm or LayerNorm) followed by $\mathbf{W}$, so the attainable hidden states lie on a sphere $S \subseteq \mathbb{R}^{h}$ and the logits lie in $V := \operatorname{col}(\mathbf{W})$.

## 4 The output ideal

The attainable logits are the common zeros of two families of constraints\.

### Degree\-1 generators \(rank\)\.

Let $\{\nu_{1},\dots,\nu_{l-h}\}$ span $V^{\perp}$. Each linear form $\ell_{a}(z) = \langle \nu_{a}, z\rangle$ vanishes on every response; these are what the SVD recovers as the directions with zero singular value.

### Degree\-2 generator \(normalization\)\.

Because $\|g\|$ is fixed by normalization, $z = \mathbf{W}g$ satisfies a single quadric $q(z) = z^{\top}\hat{A}z - 1 = 0$, where $\hat{A}$ is symmetric positive semidefinite of rank $h$ with $\ker\hat{A} = V^{\perp}$. With $U \in \mathbb{R}^{l\times h}$ an orthonormal basis of $V$ and $x = U^{\top}z$, this reads $x^{\top}Ax = 1$ for a positive definite $A \in \mathbb{R}^{h\times h}$, and the recovery factorization $\mathbf{W} = UM^{-1}O$ with $A = M^{\top}M$ and $O$ orthogonal follows \[[References](https://arxiv.org/html/2606.06854#bib)\].

Thus the ideal is $\mathcal{I} = \langle \ell_{1},\dots,\ell_{l-h},\,q\rangle$ and the output manifold is the ellipsoid $\mathcal{M} = \{\ell_{a} = 0,\; q = 0\}$. Two structural facts are worth recording.

![Refer to caption](https://arxiv.org/html/2606.06854v1/x1.png)

Figure 1: The degree-1 part of the ideal. On a toy model with $h = 64$, the logit singular spectrum drops by fourteen orders of magnitude at index $h$: the recovered $\hat{h}$ is exactly $64$.
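Combining the two generators of this section, an attacker-side recovery can be sketched as follows. This is a toy illustration under stated assumptions (noise-free logits, unit-norm hidden states, random `W`), and `fit_ellipsoid` is a hypothetical helper, not the paper's library. It finds $U$ by SVD, fits the symmetric matrix $A$ in $x^{\top}Ax = 1$ by linear least squares, takes a Cholesky factor $M$ with $A = M^{\top}M$, and checks that the leftover factor $O = MU^{\top}\mathbf{W}$ is orthogonal. The last step uses the true `W` and is only possible in a controlled experiment.

```python
import numpy as np

def fit_ellipsoid(X: np.ndarray) -> np.ndarray:
    """Least-squares fit of a symmetric A with x^T A x = 1 for every row x of X."""
    n, h = X.shape
    iu, ju = np.triu_indices(h)
    weights = np.where(iu == ju, 1.0, 2.0)      # off-diagonal products appear twice
    design = X[:, iu] * X[:, ju] * weights      # shape (n, h(h+1)/2)
    coeffs, *_ = np.linalg.lstsq(design, np.ones(n), rcond=None)
    A = np.zeros((h, h))
    A[iu, ju] = coeffs
    A[ju, iu] = coeffs
    return A

rng = np.random.default_rng(2)
h, l, n = 6, 40, 400                            # toy sizes (assumptions)
W = rng.standard_normal((l, h))                 # hidden ground truth
g = rng.standard_normal((n, h))
g /= np.linalg.norm(g, axis=1, keepdims=True)   # normalized hidden states
Z = g @ W.T                                     # observed logits, one row per query

# Degree-1 part: orthonormal basis U of col(W) from the SVD of the responses.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
U = Vt[:h].T                                    # shape (l, h)
X = Z @ U                                       # coordinates x = U^T z

# Degree-2 part: fit the ellipsoid, then factor A = M^T M.
A = fit_ellipsoid(X)
M = np.linalg.cholesky(A).T                     # cholesky returns L with A = L L^T
W_hat = U @ np.linalg.inv(M)                    # recovered projection, up to O(h)

# Verification against ground truth (toy only): the residual factor is orthogonal.
O = M @ U.T @ W
print("||O^T O - I||_F =", np.linalg.norm(O.T @ O - np.eye(h)))
print("||W_hat O - W||_F =", np.linalg.norm(W_hat @ O - W))
```

The least-squares system has $h(h+1)/2$ unknowns, so at least that many queries in general position are needed before the fit is determined.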

## 5 Polar space, gauge, and recovery

In the Cartan–Kähler theory the *polar space* of an integral element controls which directions extend it \[[References](https://arxiv.org/html/2606.06854#bib),[References](https://arxiv.org/html/2606.06854#bib)\]. Here the governing generator is the quadric, so the relevant object is the classical polar (conjugate) space of that quadric.

###### Proposition 1 (Polar space is the tangent space).

At a point $z_{0} \in \mathcal{M}$ the polar space $H(z_{0}) = (\hat{A}z_{0})^{\perp}$, taken within $V$, equals the tangent space $T_{z_{0}}\mathcal{M}$. Consequently recovering the layer is recovering the field of polar hyperplanes, each tangency condition $z_{0}^{\top}\hat{A}v = 0$ being one linear equation in the entries of $\hat{A}$.

We verify this independently: computing a tangent direction of the output manifold by finite differences (without using the recovered $\hat{A}$), its cosine with the recovered polar normal is $5.8\times 10^{-7}$, limited only by the finite-difference step.
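A check of this kind can be sketched as below. For brevity the sketch builds $\hat{A}$ from the true `W` (an assumption made for illustration; the check reported above uses the recovered quadric instead), and it uses a central difference with a hypothetical step `eps`. The tangent direction is computed from the model's outputs only, and its cosine with the polar normal $\hat{A}z_{0}$ should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
h, l = 8, 30                                # toy sizes (assumptions)
W = rng.standard_normal((l, h))             # hypothetical projection matrix
W_pinv = np.linalg.pinv(W)
A_hat = W_pinv.T @ W_pinv                   # quadric z^T A_hat z = 1 on the outputs

def logits(g_raw: np.ndarray) -> np.ndarray:
    """Toy last block: normalize to the unit sphere, then project with W."""
    return W @ (g_raw / np.linalg.norm(g_raw))

g0 = rng.standard_normal(h)                 # base hidden state
d = rng.standard_normal(h)                  # direction to move in before normalizing
eps = 1e-5                                  # finite-difference step (assumption)

z0 = logits(g0)
tangent = (logits(g0 + eps * d) - logits(g0 - eps * d)) / (2 * eps)
normal = A_hat @ z0                         # polar normal at z0

cosine = abs(tangent @ normal) / (np.linalg.norm(tangent) * np.linalg.norm(normal))
print("cosine between tangent and polar normal:", cosine)
```

The residual cosine is set by the finite-difference step, which is the same limitation noted in the text.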

### The gauge\.

The symmetric form $A = M^{\top}M$ fixes only the Gram part of $\mathbf{W}$; the antisymmetric complement, of dimension $\binom{h}{2}$, is free. This is the orthogonal gauge: under $\mathbf{W} \mapsto \mathbf{W}O^{\top}$, $g \mapsto Og$, the logits $z = \mathbf{W}O^{\top}Og = \mathbf{W}g$ are unchanged for every prompt, so the entire output distribution is $O(h)$-invariant and no statistic of the logits can distinguish points of the orbit. Recovery up to $O(h)$ is therefore information-theoretically optimal from logits alone.
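The gauge invariance is easy to confirm numerically. The sketch below uses random toy matrices (assumptions, not the paper's model) and draws an orthogonal matrix from a QR factorization; any orthogonal matrix would do.

```python
import numpy as np

rng = np.random.default_rng(4)
h, l, n = 8, 30, 50                          # toy sizes (assumptions)
W = rng.standard_normal((l, h))              # hypothetical projection matrix
G = rng.standard_normal((h, n))              # hypothetical hidden states, one per column

O, _ = np.linalg.qr(rng.standard_normal((h, h)))   # an orthogonal matrix
W_gauged = W @ O.T                           # W -> W O^T
G_gauged = O @ G                             # g -> O g

max_diff = np.max(np.abs(W @ G - W_gauged @ G_gauged))
print("max logit difference:", max_diff)
print("W changed:", not np.allclose(W, W_gauged))
print("gauge dimension h(h-1)/2 =", h * (h - 1) // 2)
```

The weights differ substantially while every logit agrees to round-off, and the hidden-state norms are preserved, so the normalization constraint cannot break the tie either.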

On the toy model ($h = 64$), Cholesky followed by a *scaled-orthogonal* alignment reconstructs $\mathbf{W}$ to root-mean-square error $6\times 10^{-16}$ (the same machine precision as a full affine alignment with $h^{2}$ free parameters), while the aligning rotation has $\|\Omega - I\|_{F} \approx 11.5 \approx \sqrt{2h}$, i.e. a generic rotation. The quadric thus resolves the symmetric content exactly and leaves precisely the $O(h)$ gauge, as predicted.
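An alignment step of this kind can be sketched with the orthogonal Procrustes solution. The sketch makes several assumptions: it simulates the attack output as `W_rec = W @ Q0.T` for a random orthogonal `Q0` rather than running the full recovery, it omits the scale factor of the scaled-orthogonal alignment, and it uses the true `W`, which is available only in a controlled toy experiment.

```python
import numpy as np

rng = np.random.default_rng(9)
h, l = 16, 80                                 # toy sizes (assumptions)
W = rng.standard_normal((l, h))               # ground-truth projection (toy only)
Q0, _ = np.linalg.qr(rng.standard_normal((h, h)))   # unknown gauge rotation
W_rec = W @ Q0.T                              # simulated recovery, correct up to O(h)

# Orthogonal Procrustes: the orthogonal O minimizing ||W_rec O - W||_F
# is P @ Qt, where P, Qt come from the SVD of W_rec^T W.
P, _, Qt = np.linalg.svd(W_rec.T @ W)
O = P @ Qt

rms = np.sqrt(np.mean((W_rec @ O - W) ** 2))
gauge_size = np.linalg.norm(O - np.eye(h))
print("RMS error after alignment:", rms)
print("||O - I||_F =", gauge_size, " sqrt(2h) =", np.sqrt(2 * h))
```

A rotation with only $\binom{h}{2}$ degrees of freedom closes the gap to machine precision, which is the sense in which nothing but the gauge is left over.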

## 6 Regularity, and what breaks it

Recovery is well posed under the affine analogues of Kähler\-regularity:

(R1) Spectral gap. A strictly positive multiplicative gap between $\sigma_{h}$ and $\sigma_{h+1}$, equivalently full rank $h$ of both the hidden-state matrix and $\mathbf{W}$.

(R2) Nondegenerate quadric. $\hat{A}$ positive definite of rank $h$ on $V$; equivalently the activations are not confined to a proper sub-variety, so the ellipsoid-fitting system has full rank.

(R3) Uniformity. (R1)–(R2) hold on a neighborhood; automatic once $\hat{A} \succ 0$.

These are not decorative. Table [1](https://arxiv.org/html/2606.06854#S6.T1) sweeps i.i.d. logit noise of scale $\sigma$ (the defense of \[[References](https://arxiv.org/html/2606.06854#bib), App. I\]) and shows two distinct laws: the rank gap degrades like $1/\sigma$ but stays above $1$ even at $\sigma = 1$, so $\hat{h}$ is recovered at every level (*rank is robust*); while the orthogonal-recovery error grows linearly, $\mathrm{RMS} \approx 0.036\,\sigma$ (*the projection is fragile*), and needs the regularity to hold tightly. Confining activations to an effective-rank subspace (an R1 violation) makes the attack return the effective rank rather than the nominal width, reproducing the GPT-2-Small anomaly of \[[References](https://arxiv.org/html/2606.06854#bib)\], where $757$ was recovered for a $768$-dimensional model. Figure [2](https://arxiv.org/html/2606.06854#S7.F2) (left) plots both laws.
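The qualitative shape of the two laws can be probed with a sketch like the following. Sizes, seed and noise levels are assumptions chosen for speed, and the subspace error used here (the distance between the recovered and the true column space) is a stand-in for the orthogonal-recovery RMS, so the numbers are not those of Table 1.

```python
import numpy as np

rng = np.random.default_rng(5)
h, l, n = 16, 200, 400                        # toy sizes (assumptions)
W = rng.standard_normal((l, h))               # hypothetical projection matrix
G = rng.standard_normal((n, h))
G /= np.linalg.norm(G, axis=1, keepdims=True) # normalized hidden states
Z = G @ W.T                                   # clean, roughly unit-scale logits
U_true = np.linalg.qr(W)[0]                   # orthonormal basis of col(W)
noise = rng.standard_normal(Z.shape)          # one noise draw, rescaled per level

results = []
for sigma in (1e-3, 1e-2, 1e-1):
    _, s, Vt = np.linalg.svd(Z + sigma * noise, full_matrices=False)
    gap = s[h - 1] / s[h]                     # multiplicative gap at index h (R1)
    h_hat = int(np.argmax(s[:-1] / s[1:])) + 1
    U_hat = Vt[:h].T                          # recovered basis of the logit subspace
    # Part of the recovered basis lying outside the true column space.
    subspace_err = np.linalg.norm(U_hat - U_true @ (U_true.T @ U_hat))
    results.append((sigma, h_hat, gap, subspace_err))

for sigma, h_hat, gap, err in results:
    print(f"sigma={sigma:g}  h_hat={h_hat}  gap={gap:.3g}  subspace_err={err:.3g}")
```

In this toy the width is read off correctly at every level while the gap shrinks and the subspace error grows with $\sigma$, the same qualitative split between rank and projection recovery.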

Table 1: Noise sweep on the toy model ($h = 64$, unit-scale logits). Rank recovery (R1) is robust; projection recovery (R2/R3) degrades linearly.

## 7 Below the last layer

After the single-layer attack, the hidden states $g(p)$ are themselves known up to the global rotation. We now ask what this reveals about the block beneath. Consider a toy with $k$-dimensional content fed through an MLP block with a residual connection,

$$g = \mathrm{norm}\bigl(x + \mathbf{W}_{2}\,\phi(\mathbf{W}_{1}x)\bigr),\qquad x = Bs,\; s \in \mathbb{R}^{k},$$

with $\phi = \tanh$, hidden width $m$, residual width $h$. The attainable hidden states now lie on a *curved* $k$-dimensional submanifold of the sphere whose linear span can be far larger than $k$. The intuition is the same as a circle in the plane: a circle is a one-dimensional object, yet it does not fit in any single line; its linear span is two-dimensional. A curved $k$-dimensional manifold likewise needs more than $k$ linear dimensions to contain it, and the nonlinearity $\phi$ is precisely what bends the content manifold so that its span inflates above its true dimension.
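The circle analogy can be checked directly. In this small sketch the sample count, the arc length and the 5% threshold are arbitrary choices made for illustration: the global rank of the point cloud is two, while a principal component analysis of a short arc sees a single dominant direction.

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])   # points on the unit circle

# Global view: the linear span of the whole circle is the full plane.
span_dim = np.linalg.matrix_rank(circle)

# Local view: PCA on a short arc, counting directions above 5% of the largest.
arc = circle[:5]
s = np.linalg.svd(arc - arc.mean(axis=0), compute_uv=False)
local_dim = int(np.sum(s > 0.05 * s[0]))

print("linear span:", span_dim, " local (intrinsic) dimension:", local_dim)
```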

### An observable the linear attack misses\.

The SVD sees only the linear span. On a two-layer toy with $h = 64$, $k = 8$, $m = 32$ the span is $40 = k + m$, the quadric is positive definite and well posed (design rank $820/820$, where $820 = 40\cdot 41/2$ counts the free entries of a symmetric $40\times 40$ matrix), and the attack returns a clean rank-$40$ projection with no internal sign that anything is amiss: it reports a featureless $40$-dimensional linear layer. But the *intrinsic dimension* of the recovered manifold, estimated by local PCA, is $7$–$8$: the true content dimension. The gap between span ($40$) and intrinsic dimension ($8$) is the fingerprint of a low-rank nonlinear bottleneck. The one-layer control gives intrinsic dimension $62$ against span $64$, i.e. about $\text{span} - 1$ (the sphere, of dimension $63$), with no gap. In the EDS reading the intrinsic dimension is the first Cartan character and the span–intrinsic gap is the osculating (second-order) data the linear attack discards. Figure [2](https://arxiv.org/html/2606.06854#S7.F2) (right) shows the contrast.
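The contrast between span and intrinsic dimension can be sketched on a toy of the same shape. Everything beyond the sizes $h$, $k$, $m$ is an assumption: the weight scalings, the thresholds, and in particular the way the local patch is built. For simplicity the patch is generated by perturbing the latent content $s$ directly, which an attacker cannot do; with query access one would instead take nearest neighbours among many recovered hidden states.

```python
import numpy as np

rng = np.random.default_rng(6)
h, k, m = 64, 8, 32
B = rng.standard_normal((h, k)) / np.sqrt(k)     # content embedding (toy scaling)
W1 = rng.standard_normal((m, h)) / np.sqrt(h)    # MLP input weights (toy scaling)
W2 = rng.standard_normal((h, m)) / np.sqrt(m)    # MLP output weights (toy scaling)

def hidden_state(S: np.ndarray) -> np.ndarray:
    """g = norm(x + W2 tanh(W1 x)) with x = B s, applied to each row s of S."""
    X = S @ B.T
    Y = X + np.tanh(X @ W1.T) @ W2.T
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def numerical_rank(M: np.ndarray, rel_tol: float) -> int:
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# What the SVD sees: the linear span of many hidden states.
G = hidden_state(rng.standard_normal((2000, k)))
span = numerical_rank(G, 1e-10)

# Local PCA on a small patch around one point of the manifold.
s0 = rng.standard_normal(k)
patch = hidden_state(s0 + 1e-4 * rng.standard_normal((200, k)))
intrinsic_dim = numerical_rank(patch - patch.mean(axis=0), 1e-2)

print("linear span:", span, " local intrinsic dimension:", intrinsic_dim)
```

The local estimate depends on the threshold and on how strongly normalization flattens the radial direction at the chosen point, which is one plausible reason an estimate can land slightly below the content dimension.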

![Refer to caption](https://arxiv.org/html/2606.06854v1/x2.png)

Figure 2: Left: regularity is load-bearing. Under logit noise the rank gap (R1) decays like $1/\sigma$ but never falls below $1$; the projection recovery error (R2/R3) grows linearly. Right: below the last layer, the linear span (what the SVD reports) overstates the content; the intrinsic manifold dimension recovers it and exposes the nonlinear bottleneck.

## 8 The identifiability wall

The observable is the hidden-state manifold $\mathcal{S}$ up to the global orthogonal gauge. We characterize the sublayer $x \mapsto x + \mathbf{W}_{2}\phi(\mathbf{W}_{1}x)$ relative to it. The governing intuition is simple: one can only learn about parts of a network that the inputs actually exercise, and only up to relabelings that leave every output unchanged. Both effects turn out to be large here.

###### Proposition 2 (Identifiability boundary, toy model).

From $\mathcal{S}$ the sublayer is determined only by its input–output behaviour on the input support $\mathcal{X} = \operatorname{col}(B)$ (in this section $\mathcal{X}$ denotes the input support, not the vocabulary of Section 3), and only up to: the global rotation $O(h)$; input reparametrization $GL(k)$ on $\mathcal{X}$; and, for minimal $\tanh$ realizations, neuron sign-permutation \[[References](https://arxiv.org/html/2606.06854#bib)\]. The following are free fibers, hence unrecoverable:

1. (a) the action of $\mathbf{W}_{1}$ on $\mathcal{X}^{\perp}$;
2. (b) the MLP width $m$ (only the minimal realization on $\mathcal{X}$ is pinned);
3. (c) $B$ beyond its column space (a full $GL(k)$);
4. (d) the parametrization of the input distribution: only its support matters.

We make the fibers concrete with bit-level examples (all logit differences below are at machine precision, $\sim 10^{-14}$).

(a) Off-support freedom. Let $\Delta = \Delta(I - P_{\mathcal{X}})$ act only on $\mathcal{X}^{\perp}$, where $P_{\mathcal{X}}$ is the orthogonal projector onto $\mathcal{X}$. Then $\Delta x = 0$ for every $x \in \mathcal{X}$, so $\mathbf{W}_{1}$ and $\mathbf{W}_{1} + \Delta$ produce identical outputs. For $h = 64$, $k = 8$, $m = 32$ this hides $m(h-k) = 1792$ of the $2048$ parameters of $\mathbf{W}_{1}$ ($87.5\%$), with $\|\Delta\|_{F} \approx 42$.

(b) Width. Appending a cancelling neuron pair ($\mathbf{W}_{1} \to \begin{pmatrix}\mathbf{W}_{1}\\ w\\ w\end{pmatrix}$, $\mathbf{W}_{2} \to [\mathbf{W}_{2}\;\; c\;\; {-}c]$) changes the width from $m = 32$ to $34$ while leaving every output unchanged.

(c) Input reparametrization. Replacing $B$ by $BR$ for $R \in GL(k)$ leaves the attainable input set, hence $\mathcal{S}$, unchanged.
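The three fibers (a) to (c) can be reproduced in one short sketch. The weights are random with toy scalings chosen here (assumptions, not the paper's values), and `hidden` is a hypothetical helper implementing the sublayer of Section 7. Each modification changes the parameters substantially and the hidden states only at round-off level.

```python
import numpy as np

rng = np.random.default_rng(8)
h, k, m = 64, 8, 32
B = rng.standard_normal((h, k)) / np.sqrt(k)
W1 = rng.standard_normal((m, h)) / np.sqrt(h)
W2 = rng.standard_normal((h, m)) / np.sqrt(m)
S = rng.standard_normal((100, k))                # latent contents s, one per row

def hidden(B_, W1_, W2_, S_):
    """g = norm(x + W2 tanh(W1 x)) with x = B s, applied row-wise."""
    X = S_ @ B_.T
    Y = X + np.tanh(X @ W1_.T) @ W2_.T
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

G_ref = hidden(B, W1, W2, S)

# (a) Off-support freedom: perturb W1 only on the orthogonal complement of col(B).
Qb, _ = np.linalg.qr(B)
P = Qb @ Qb.T                                    # orthogonal projector onto col(B)
Delta = rng.standard_normal((m, h)) @ (np.eye(h) - P)
diff_a = np.max(np.abs(hidden(B, W1 + Delta, W2, S) - G_ref))
hidden_fraction = m * (h - k) / (m * h)          # share of W1 inside this fiber

# (b) Width: append two identical neurons whose output weights cancel.
w = rng.standard_normal((1, h))
c = rng.standard_normal((h, 1))
W1_wide = np.vstack([W1, w, w])
W2_wide = np.hstack([W2, c, -c])
diff_b = np.max(np.abs(hidden(B, W1_wide, W2_wide, S) - G_ref))

# (c) Input reparametrization: B -> B R, with the latent relabelled s -> R^{-1} s.
R = rng.standard_normal((k, k))
S_reparam = S @ np.linalg.inv(R).T
diff_c = np.max(np.abs(hidden(B @ R, W1, W2, S_reparam) - G_ref))

print("(a) max diff:", diff_a, " ||Delta||_F:", np.linalg.norm(Delta),
      " hidden fraction:", hidden_fraction)
print("(b) max diff:", diff_b, " new width:", W1_wide.shape[0])
print("(c) max diff:", diff_c)
```

Because the hidden states agree, the logits $\mathbf{W}g$ agree as well for any last-layer matrix, so no amount of querying separates these parameter settings.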

The interpretation is sharp: the part of the sublayer one would most want, the full $\mathbf{W}_{1}$, lies almost entirely inside the fiber. This is the mechanical reason the attack stops at one layer (the observation map has a large kernel below the last linear layer), and it reframes the open “extend beyond one layer” problem of \[[References](https://arxiv.org/html/2606.06854#bib)\] as an identifiability question rather than an algorithmic one.

## 9 Scope, novelty, and related work

We are explicit about what is and is not new. *What is reproduced:* the last-layer recovery, including recovery up to an orthogonal matrix via the normalization-induced ellipsoid, is from \[[References](https://arxiv.org/html/2606.06854#bib)\]. *What the geometric language adds:* a single account in which rank recovery, ellipsoid recovery, the $O(h)$ gauge, and the regularity conditions are one structure (the ideal, its polar space, and its Kähler-regularity), together with the Frobenius observation (Remark [1](https://arxiv.org/html/2606.06854#Thmremark1)) that explains why the single-layer attack is closed-form. We stress that this framing is organizing, not enabling. *What is a genuinely new observation, though built on known tools:* the intrinsic dimension of the recoverable manifold as an extraction observable (Section [7](https://arxiv.org/html/2606.06854#S7)), and the explicit identifiability boundary with its fibers (Section [8](https://arxiv.org/html/2606.06854#S8)). The first leans on the established fact that representations have low intrinsic dimension \[[References](https://arxiv.org/html/2606.06854#bib)\]; the second on classical network identifiability \[[References](https://arxiv.org/html/2606.06854#bib)\]. Concurrent and prior work on the same attack family includes \[[References](https://arxiv.org/html/2606.06854#bib),[References](https://arxiv.org/html/2606.06854#bib)\]. The EDS toolkit we borrow from is that of \[[References](https://arxiv.org/html/2606.06854#bib)\] and its Lie-algebroid extension \[[References](https://arxiv.org/html/2606.06854#bib)\].

We did *not* recover the sublayer parameters, and Section [8](https://arxiv.org/html/2606.06854#S8) indicates this is blocked by identifiability rather than by a missing technique. A clean impossibility theorem for realistic architectures, with a matching positive recovery result on the identifiable quotient, is the natural next target and is left open.

## 10 Conclusion

Last\-layer model stealing has a tidy geometric description: the logits trace the integral variety of an ideal with a linear and a quadratic generator, the quadric’s polar space is the manifold’s tangent space and controls recovery, and the attack succeeds up to the orthogonal gauge exactly under regularity conditions that we showed are load\-bearing\. One layer down, the linear span the SVD reports can hide a low\-rank nonlinear bottleneck that the intrinsic dimension reveals, and the sublayer’s parameters live largely in an explicit non\-identifiable fiber\. The geometry is a clarifying lens; the wall beneath the last layer is real and, we argue, the more fruitful object of study\.

### Reproducibility\.

Every number in this note is produced by a small NumPy library and a single reproduction script; figures are regenerated by one command. Code: [https://github.com/nssprogrammer/eds-stealing](https://github.com/nssprogrammer/eds-stealing).

## References

- \[1\] N. Carlini et al., Stealing Part of a Production Language Model. ICML 2024. arXiv:2403.06634.
- \[2\] S. Hohloch, T. Mestdag, K. Yasaka, The Cartan–Kähler theorem for exterior differential systems on transitive Lie algebroids. arXiv:2605.29083 (2026).
- \[3\] R. L. Bryant, S. S. Chern, R. B. Gardner, H. L. Goldschmidt, P. A. Griffiths, Exterior Differential Systems. Springer, 1991.
- \[4\] H. J. Sussmann, Uniqueness of the weights for minimal feedforward nets with a given input–output map. Neural Networks 5(4):589–593, 1992.
- \[5\] A. Ansuini, A. Laio, J. H. Macke, D. Zoccolan, Intrinsic dimension of data representations in deep neural networks. NeurIPS 2019. arXiv:1905.12784.
- \[6\] M. Finlayson, S. Swayamdipta, X. Ren, Logits of API-protected LLMs leak proprietary information. arXiv:2403.09539 (2024).
- \[7\] S. Zanella-Béguelin, S. Tople, A. Paverd, B. Köpf, Grey-box extraction of natural language models. ICML 2021.
