
# Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models
Source: [https://arxiv.org/html/2605.11142](https://arxiv.org/html/2605.11142)
Nikolaos Nakis¹, Panagiotis Promponas²,∗, Konstantinos Tsirkas³, Katerina Mamali⁴, Eftychia Makri², Leandros Tassiulas², Nicholas A. Christakis¹
¹Human Nature Lab, Yale University · ²Department of Electrical and Computer Engineering, Yale University · ³Department of Statistics and Data Science, Yale University · ⁴Department of Computer Science, Yale University · New Haven, CT 06511
nicolaos.nakis@gmail.com, panagiotis.promponas@yale.edu

###### Abstract

Graph representation learning has become a standard approach for analyzing networked data, with latent embeddings widely used for link prediction, community detection, and related tasks. Yet a basic design choice, the latent dimension, is still treated as a brittle hyperparameter, fixed before training and tuned by held-out performance. Learned factors are also identifiable only up to rotation and rescaling, so the nominal rank rarely coincides with the quantity that governs model behavior. We propose Spectral Prefix Extraction and Capacity-Targeted Representation Analysis (Spectra), which replaces rank as the unit of analysis with the spectrum of a learned positive semidefinite kernel, trace-normalized so that spectra are comparable across fits. The normalized eigenvalues form a distribution on the simplex, and their Shannon effective rank acts both as a summary of learned capacity and as a controllable training-time coordinate: a single scalar shapes this realized dimension during training, and bisection targets any desired value within the rank cap. To support this theoretically, we show local regularity and monotonicity of the realized-dimension profile. Across collaboration, social, biological, and infrastructure networks, Spectra traces performance–capacity frontiers that make the trade-off between predictive accuracy and realized dimension visible. It performs competitively with strong link-prediction baselines, yields aligned lower-capacity views of the same fitted model through spectral prefixes, and provides a principled handle on capacity in the overparameterized regime. Capacity thus becomes a property of the fitted model rather than a hyperparameter of the training procedure.

## 1 Introduction

A central design choice in graph representation learning is the dimension of the latent space. Despite its influence on model behavior, this quantity is usually treated as a fixed hyperparameter, selected before training and tuned by held-out performance. Modern graph representation methods, including random-walk embeddings (Grover and Leskovec, [2016](https://arxiv.org/html/2605.11142#bib.bib5); Ahmed et al., [2018](https://arxiv.org/html/2605.11142#bib.bib37)), matrix-factorization approaches (Qiu et al., [2018](https://arxiv.org/html/2605.11142#bib.bib60); Cao et al., [2015](https://arxiv.org/html/2605.11142#bib.bib10)), mixed-membership models (Airoldi et al., [2007](https://arxiv.org/html/2605.11142#bib.bib42); Wang et al., [2017](https://arxiv.org/html/2605.11142#bib.bib56)), latent space and inner-product models (Hoff et al., [2002](https://arxiv.org/html/2605.11142#bib.bib71); Hoff, [2007](https://arxiv.org/html/2605.11142#bib.bib72); Athreya et al., [2018](https://arxiv.org/html/2605.11142#bib.bib74); Rubin-Delanchy et al., [2022](https://arxiv.org/html/2605.11142#bib.bib75); Sussman et al., [2013](https://arxiv.org/html/2605.11142#bib.bib76)), and simplex-volume extensions such as HM-LDM (Nakis et al., [2022](https://arxiv.org/html/2605.11142#bib.bib43), [2023a](https://arxiv.org/html/2605.11142#bib.bib41)), all rely, explicitly or implicitly, on such a choice. Across these families, a basic question persists: how many latent dimensions do the data actually require?

Existing selection procedures, including profile-likelihood elbows (Zhu and Ghodsi, [2006](https://arxiv.org/html/2605.11142#bib.bib77)), Bayesian latent-dimension selection (Passino and Heard, [2020](https://arxiv.org/html/2605.11142#bib.bib78)), network cross-validation (Chen and Lei, [2018](https://arxiv.org/html/2605.11142#bib.bib79); Li et al., [2020](https://arxiv.org/html/2605.11142#bib.bib80)), structural criteria (Gu et al., [2021](https://arxiv.org/html/2605.11142#bib.bib102)), input-graph entropy methods (Luo et al., [2021](https://arxiv.org/html/2605.11142#bib.bib81)), and spectral parallel analysis such as NetFlipPA (Hong and Cape, [2025](https://arxiv.org/html/2605.11142#bib.bib101)), largely select a single dimension before or after fitting. They do not expose *representational capacity*, i.e., how many latent modes the fitted model effectively uses, as a quantity that can be measured, controlled, or audited in the fitted model itself. Also, learned latent factors are identifiable only up to rotation and global rescaling (Hoff et al., [2002](https://arxiv.org/html/2605.11142#bib.bib71); Airoldi et al., [2007](https://arxiv.org/html/2605.11142#bib.bib42); Athreya et al., [2018](https://arxiv.org/html/2605.11142#bib.bib74)), and dimension misspecification has measurable statistical costs (Taing and Levin, [2026](https://arxiv.org/html/2605.11142#bib.bib82)). The operative quantity is therefore not only the declared rank, but how representational mass is distributed across latent modes.

We propose Spectral Prefix Extraction and Capacity-Targeted Representation Analysis (Spectra) to replace rank as the unit of analysis with the spectrum of a learned positive semidefinite (PSD) kernel. The kernel is invariant under the gauge symmetries of the factorization, trace normalization fixes total spectral mass, and the normalized eigenvalues form a probability distribution on latent modes, which we call the *spectral occupancy distribution*. We summarize this distribution by the *effective spectral dimension* $d_{\mathrm{spec}}$, the exponential of its Shannon entropy (Roy and Vetterli, [2007](https://arxiv.org/html/2605.11142#bib.bib85); Friedman and Dieng, [2022](https://arxiv.org/html/2605.11142#bib.bib86)). Spectral entropy has been used as a diagnostic in language models (Wei et al., [2024](https://arxiv.org/html/2605.11142#bib.bib90); Jha and Reagen, [2025](https://arxiv.org/html/2605.11142#bib.bib91)), training dynamics (Yang et al., [2024](https://arxiv.org/html/2605.11142#bib.bib103)), and adaptive-rank compression (Cherukuri and Lala, [2025](https://arxiv.org/html/2605.11142#bib.bib92)). Here, it becomes a controllable, end-to-end training-time capacity coordinate for latent graph models.

This positioning connects to two views of overparameterized models: effective complexity can govern generalization even when parameter counts are large (Belkin et al., [2019](https://arxiv.org/html/2605.11142#bib.bib104); Bartlett et al., [2020](https://arxiv.org/html/2605.11142#bib.bib105)), and gradient descent on factorized matrix models is implicitly biased toward low-rank solutions (Gunasekar et al., [2017](https://arxiv.org/html/2605.11142#bib.bib95); Arora et al., [2019](https://arxiv.org/html/2605.11142#bib.bib96)). Our approach makes this bias explicit and parameterizable: an entropy weight $\eta$ shapes the learned spectrum, bisection targets a desired $d_{\mathrm{spec}}$, and the rank cap becomes a parameterization ceiling. Trace normalization separates scale from rank, avoiding the scale coupling of nuclear-norm penalties (Srebro et al., [2004](https://arxiv.org/html/2605.11142#bib.bib97)), while spectral prefixes provide optimal low-rank PSD summaries of the fitted kernel (Eckart and Young, [1936](https://arxiv.org/html/2605.11142#bib.bib65)).

Our contributions are: (i) *Spectral occupancy as realized capacity.* We replace nominal rank with the trace-normalized spectrum of the learned PSD kernel, and use its Shannon effective rank, $d_{\mathrm{spec}}$, as a smooth measure of realized latent dimension. (ii) *Training-time capacity control.* We introduce an entropy-regularized objective in which a scalar $\eta$ shapes the realized spectral dimension, and use bisection to target a prescribed $d_{\mathrm{spec}}$ within tolerance. (iii) *Spectral identifiability and summaries.* We verify the identifiability of active spectral modes under a simple-spectrum condition in our framework, we establish local $\mathcal{C}^1$ behavior of the realized-dimension profile under regularity conditions, and we use the optimality of spectral prefixes as low-rank PSD summaries. (iv) *Empirical capacity frontiers.* Across eight benchmark networks and three rank caps, AUC aligns by achieved $d_{\mathrm{spec}}$ where rank caps overlap. The resulting frontiers distinguish saturated datasets, whose best operating points lie below the cap, from rank-cap-binding datasets, whose best operating points increase with $r$. (v) *Single-fit prefix families.* A fitted kernel yields a nested family of aligned spectral prefixes, providing lower-capacity views of the same representation without retraining. (vi) *Spectrally controlled overparameterization.* At matched effective capacity, $\eta$-targeted overparameterized fits improve test log-likelihood and train–test gap over rank-cap-only baselines in paired experiments.

## 2 Related Work

Latent-distance and inner-product models (Hoff et al., [2002](https://arxiv.org/html/2605.11142#bib.bib71); Hoff, [2007](https://arxiv.org/html/2605.11142#bib.bib72)) alongside random dot product graph (RDPG) frameworks (Athreya et al., [2018](https://arxiv.org/html/2605.11142#bib.bib74); Sussman et al., [2013](https://arxiv.org/html/2605.11142#bib.bib76)) treat spectra of latent positions or adjacency matrices as canonical inferential objects, often with consistency guarantees, but do not provide explicit training-time capacity control. Gradient-based RDPG inference (Fiori et al., [2023](https://arxiv.org/html/2605.11142#bib.bib100)) similarly infers realized rank from the optimizer rather than controlling it, while simplex-constrained and latent-distance extensions (Nakis et al., [2022](https://arxiv.org/html/2605.11142#bib.bib43), [2023b](https://arxiv.org/html/2605.11142#bib.bib57)) add geometric interpretability but retain a fixed nominal dimension. Spectra works in this lineage, but uses an explicit kernel parameterization to expose capacity as a continuous training-time coordinate. Profile-likelihood elbow rules (Zhu and Ghodsi, [2006](https://arxiv.org/html/2605.11142#bib.bib77)), Bayesian latent-dimension selection (Passino and Heard, [2020](https://arxiv.org/html/2605.11142#bib.bib78)), network cross-validation (Chen and Lei, [2018](https://arxiv.org/html/2605.11142#bib.bib79); Li et al., [2020](https://arxiv.org/html/2605.11142#bib.bib80)), structural information criteria (Gu et al., [2021](https://arxiv.org/html/2605.11142#bib.bib102); Luo et al., [2021](https://arxiv.org/html/2605.11142#bib.bib81)), spectral parallel analysis (Hong and Cape, [2025](https://arxiv.org/html/2605.11142#bib.bib101)), and logarithmic search under metric latent-distance models (Nakis et al., [2025](https://arxiv.org/html/2605.11142#bib.bib106)) select a dimension from spectral, posterior, or held-out diagnostics in a one-shot procedure. The operative coordinate is fixed before or after training. Spectra instead shapes realized capacity during training and measures it on the learned kernel spectrum. Factorized PSD models exhibit gradient-descent bias toward low rank (Gunasekar et al., [2017](https://arxiv.org/html/2605.11142#bib.bib95); Arora et al., [2019](https://arxiv.org/html/2605.11142#bib.bib96)), within a broader literature in which effective complexity governs overparameterized generalization (Belkin et al., [2019](https://arxiv.org/html/2605.11142#bib.bib104); Bartlett et al., [2020](https://arxiv.org/html/2605.11142#bib.bib105)). Nuclear-norm relaxations (Candes and Recht, [2012](https://arxiv.org/html/2605.11142#bib.bib93); Recht et al., [2010](https://arxiv.org/html/2605.11142#bib.bib94); Srebro et al., [2004](https://arxiv.org/html/2605.11142#bib.bib97)) provide explicit but scale-coupled regularization, and stable-rank normalization (Sanyal et al., [2020](https://arxiv.org/html/2605.11142#bib.bib107)) controls a Frobenius-to-spectral ratio. By trace-normalizing the learned PSD kernel, Spectra separates scale from rank, exposes capacity on the simplex of normalized eigenvalues, and yields Eckart–Young–Mirsky-optimal low-rank PSD summaries (Eckart and Young, [1936](https://arxiv.org/html/2605.11142#bib.bib65)).
Shannon effective rank has appeared as a signal-processing spectral-entropy measure (Roy and Vetterli, [2007](https://arxiv.org/html/2605.11142#bib.bib85)) and as the Vendi score for diversity (Friedman and Dieng, [2022](https://arxiv.org/html/2605.11142#bib.bib86)); spectral entropy also diagnoses capacity in language models (Wei et al., [2024](https://arxiv.org/html/2605.11142#bib.bib90); Jha and Reagen, [2025](https://arxiv.org/html/2605.11142#bib.bib91)), characterizes neural-network training dynamics (Yang et al., [2024](https://arxiv.org/html/2605.11142#bib.bib103)), and guides adaptive-rank compression (Cherukuri and Lala, [2025](https://arxiv.org/html/2605.11142#bib.bib92)). Related alternatives include the participation ratio, intrinsic dimension (Li et al., [2018](https://arxiv.org/html/2605.11142#bib.bib108)), stable rank (Ipsen and Saibaba, [2025](https://arxiv.org/html/2605.11142#bib.bib89)), and entropy functionals in determinantal point processes (Kulesza and Taskar, [2012](https://arxiv.org/html/2605.11142#bib.bib109)). In these settings the spectrum is typically read post hoc; Spectra uses Shannon effective rank as a controllable training-time coordinate.

## 3 Proposed Method

![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/overview.png)

Figure 1: Overview of Spectra. A rank-capped factor $L$ induces a trace-normalized PSD kernel $K(L)$ used in the edge model. The normalized spectrum of $K(L)$ defines spectral occupancy.

**Preliminaries.** Let $\mathcal{G}=(V,E)$ be a simple undirected graph with $N=|V|$ nodes and adjacency matrix $\bm{Y}\in\{0,1\}^{N\times N}$, where $\bm{Y}_{ij}=\bm{Y}_{ji}$ and $\bm{Y}_{ii}=0$. Our goal is to learn a latent graph representation whose realized dimension can be measured from the fitted model, rather than fixed in advance by a chosen factorization rank. We represent latent affinities between nodes by a positive semidefinite (PSD) kernel $K\in\mathbb{R}^{N\times N}$. To compare the spectra of learned kernels across runs, rank caps, and regularization strengths, we fix the total spectral mass:

$$K \succeq 0, \qquad \operatorname{tr}(K) = N. \tag{1}$$

This trace normalization removes arbitrary global rescaling of the kernel and places all learned spectra on the same scale. As a result, differences in the eigenvalue distribution of $K$ reflect how the representation allocates its fixed spectral budget across latent modes, rather than differences in overall magnitude. We parameterize the kernel by a rank-capped factor $L\in\mathbb{R}^{N\times r}$:

$$K(L) = N\,\frac{LL^{\top}}{\operatorname{tr}(LL^{\top})}, \qquad L \neq 0. \tag{2}$$

The integer $r$ is therefore a parameterization ceiling: it limits the maximum rank available to the optimizer, but it does not define the dimension actually used by the learned representation. The realized dimension will be measured from the normalized spectrum of $K(L)$.
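A minimal NumPy sketch of Eqs. (1)–(2) (the helper name is ours):

```python
import numpy as np

def trace_normalized_kernel(L: np.ndarray) -> np.ndarray:
    """Eq. (2): K(L) = N * L L^T / tr(L L^T), so that tr(K) = N (Eq. 1)."""
    G = L @ L.T
    return L.shape[0] * G / np.trace(G)

# The rank cap r bounds the rank of K(L), but it says nothing about how
# the fixed spectral budget is spread across the r available modes.
rng = np.random.default_rng(0)
L = rng.normal(size=(100, 16))          # N = 100 nodes, rank cap r = 16
K = trace_normalized_kernel(L)
assert np.isclose(np.trace(K), 100.0)
assert np.linalg.matrix_rank(K) <= 16
```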

**Spectral Prefix Extraction and Capacity-Targeted Representation Analysis (Spectra).** Given the trace-normalized kernel $K(L)$, node-specific offsets $\bm{a}\in\mathbb{R}^{N}$, and a slope parameter $\beta>0$, we model each undirected edge as

$$\mathbb{P}(\bm{Y}_{ij}=1 \mid L, \bm{a}, \beta) = \phi\left(a_i + a_j + \beta K_{ij}(L)\right), \qquad i<j, \tag{3}$$

where $\phi(t)=(1+\exp(-t))^{-1}$. The offsets $\bm{a}$ capture degree heterogeneity: nodes with larger offsets have a higher baseline propensity to form edges. The kernel entry $K_{ij}(L)$ captures latent affinity between nodes $i$ and $j$. The slope $\beta$ controls the strength of this latent-affinity signal on the log-odds scale: when $\beta$ is small, edge probabilities are driven mostly by the node offsets, whereas larger $\beta$ makes differences in $K_{ij}(L)$ more influential. Because the kernel trace is fixed, $\beta$ cannot be absorbed into an arbitrary rescaling of $K$, and it has a well-defined role as the global sensitivity to latent affinity.
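A vectorized sketch of Eq. (3) under the same conventions (helper name ours):

```python
import numpy as np

def edge_probabilities(K: np.ndarray, a: np.ndarray, beta: float) -> np.ndarray:
    """Eq. (3): P(Y_ij = 1) = phi(a_i + a_j + beta * K_ij) for all dyads,
    where phi is the logistic function; only entries i < j are meaningful."""
    logits = a[:, None] + a[None, :] + beta * K
    return 1.0 / (1.0 + np.exp(-logits))
```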

**Spectral occupancy and effective spectral dimension.** Let $\lambda_1(K)\geq\lambda_2(K)\geq\cdots\geq\lambda_N(K)\geq 0$ be the eigenvalues of the trace-normalized kernel $K$. Since $\operatorname{tr}(K)=N$, the normalized spectrum

$$p_i(K) = \frac{\lambda_i(K)}{N} \tag{4}$$

defines a probability distribution over spectral modes. We call $p(K)=(p_1(K),\ldots,p_N(K))$ the *spectral occupancy distribution*: it records how the fixed spectral budget of $K$ is allocated across latent directions.

###### Definition 1 (Effective spectral dimension).

The *effective spectral dimension* of $K$ is

$$d_{\mathrm{spec}}(K) := \exp\left(-\sum_{i:\,p_i(K)>0} p_i(K)\log p_i(K)\right). \tag{5}$$

Equivalently, $d_{\mathrm{spec}}(K)=\exp(H(p(K)))$, where $H$ is the Shannon entropy. This quantity is the Shannon effective rank of the trace-normalized kernel.

The value $d_{\mathrm{spec}}(K)$ can be interpreted as the number of equal-mass spectral modes that would have the same Shannon entropy as the observed occupancy distribution $p(K)$. It equals one when the entire trace budget is concentrated in a single mode. It equals $s$ when the trace budget is spread uniformly over $s$ nonzero modes, in which case $p_1(K)=\cdots=p_s(K)=1/s$ and $p_i(K)=0$ for $i>s$. Between these extremes, $d_{\mathrm{spec}}(K)$ varies continuously with the spectrum and provides a smooth effective-dimension coordinate for the learned representation. In contrast, the hard rank $\operatorname{rank}(K)=|\{i : p_i(K)>0\}|$ only counts the support size of the spectrum and is sensitive to modes with arbitrarily small spectral mass.
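The occupancy distribution and $d_{\mathrm{spec}}$ are direct to compute from any symmetric PSD matrix; a minimal NumPy sketch (names ours), with a sanity check of the equal-mass case just described:

```python
import numpy as np

def spectral_occupancy(K: np.ndarray) -> np.ndarray:
    """Occupancy distribution of Eq. (4): normalized eigenvalues, descending."""
    lam = np.linalg.eigvalsh(K)         # ascending eigenvalues of symmetric K
    lam = np.clip(lam, 0.0, None)       # clamp tiny negative round-off
    return lam[::-1] / lam.sum()

def d_spec(K: np.ndarray, eps: float = 1e-12) -> float:
    """Effective spectral dimension of Definition 1: exp(H(p(K)))."""
    p = spectral_occupancy(K)
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

# d_spec = s for s equal-mass modes, and 1 when one mode carries everything.
U, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(50, 50)))
for s in (1, 5, 20):
    lam = np.zeros(50)
    lam[:s] = 50.0 / s                  # tr(K) = N = 50
    K = (U * lam) @ U.T                 # PSD kernel with prescribed spectrum
    assert np.isclose(d_spec(K), s)
```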

**Capacity-control objective.** The effective spectral dimension $d_{\mathrm{spec}}(K)$ is bounded above by the hard rank of $K$, with equality attained when the spectral mass is uniformly distributed. This maximum-entropy reference case motivates controlling realized dimension through the entropy of the spectral occupancy distribution. Higher entropy corresponds to a more diffuse allocation of the trace budget across latent modes, while lower entropy corresponds to a more concentrated allocation.

Let $\ell(L,\bm{a},\beta)$ denote the sampled link-prediction loss induced by the latent-kernel edge model, and let $R(L,\bm{a},\beta)$ denote quadratic regularization. Since $\log d_{\mathrm{spec}}(K(L)) = H(p(K(L)))$, we estimate the model by minimizing the entropy-regularized objective

$$F_{\eta}(L,\bm{a},\beta) = \ell(L,\bm{a},\beta) - \eta\,\log d_{\mathrm{spec}}(K(L)) + R(L,\bm{a},\beta). \tag{6}$$

The scalar $\eta$ serves as a Lagrange-style control parameter governing the tradeoff between predictive fit and spectral spread. Larger positive values of $\eta$ place greater reward on spectral entropy and favor representations with larger effective spectral dimension, while smaller or negative values favor more concentrated spectra and lower effective spectral dimension.
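A PyTorch sketch of one evaluation of Eq. (6) on a sampled dyad batch (the function name and the weight-decay default are ours; the paper's exact loss $\ell$ and regularizer $R$ are specified in its appendix). Autograd differentiates through the eigendecomposition, whose gradient is well defined under the simple-spectrum condition assumed later (Assumption 4); the SVD parameterization sketched at the end of this section avoids the eigensolve entirely:

```python
import torch
import torch.nn.functional as F

def spectra_objective(L, a, beta, pos, neg, eta, weight_decay=1e-5):
    """Eq. (6) on one batch. `pos` and `neg` are (2, B) LongTensors holding
    node-index pairs of observed edges and sampled non-edges."""
    N = L.shape[0]
    G = L @ L.T
    K = N * G / torch.trace(G)                                    # Eq. (2)
    logit_pos = a[pos[0]] + a[pos[1]] + beta * K[pos[0], pos[1]]  # Eq. (3)
    logit_neg = a[neg[0]] + a[neg[1]] + beta * K[neg[0], neg[1]]
    loss = F.softplus(-logit_pos).mean() + F.softplus(logit_neg).mean()
    lam = torch.linalg.eigvalsh(K).clamp_min(0)                   # spectrum of K
    p = lam / lam.sum()                                           # occupancy, Eq. (4)
    log_dspec = -(p * torch.log(p.clamp_min(1e-12))).sum()        # H(p) = log d_spec
    reg = weight_decay * (L.square().sum() + a.square().sum())
    return loss - eta * log_dspec + reg                           # Eq. (6)
```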

For each value of $\eta$, optimization yields a fitted factor $L^{\star}(\eta)$ and a corresponding trace-normalized kernel $K^{\star}(\eta) = K(L^{\star}(\eta))$. This induces the realized-dimension profile

$$d_{\mathrm{spec}}(\eta) = d_{\mathrm{spec}}(K^{\star}(\eta)). \tag{7}$$

We report fitted models in the interpretable coordinate $d_{\mathrm{spec}}(K^{\star}(\eta))$, which gives the effective number of spectral modes realized by the learned representation.

**Interpretation.** Two kernels with the same hard rank can have very different $d_{\mathrm{spec}}$, and conversely. The quantity summarizes the shape of the occupancy distribution rather than only its support size. Among spectral summaries, the participation ratio $\mathrm{PR}(K) = 1/\sum_i p_i(K)^2$ is the exponential of the second-order Rényi entropy and emphasizes high-mass modes through a second-order moment, while thresholded ranks $k_{\tau}(K) = |\{i : \lambda_i(K) \geq \tau\lambda_1(K)\}|$ depend on a discontinuous cutoff. The Shannon effective rank in $d_{\mathrm{spec}}$ is the unique choice that is continuous in the occupancy distribution, mass-proportional, and recovers the hard rank in the equimass limit, making it a smooth coordinate for controlling and reporting realized dimension.
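For comparison, the two alternative summaries just mentioned, reusing `spectral_occupancy` from the sketch above:

```python
import numpy as np

def participation_ratio(K: np.ndarray) -> float:
    """PR(K) = 1 / sum_i p_i^2: exponential of the order-2 Renyi entropy."""
    p = spectral_occupancy(K)
    return float(1.0 / np.sum(p ** 2))

def thresholded_rank(K: np.ndarray, tau: float = 0.01) -> int:
    """k_tau(K): eigenvalues within a factor tau of the largest; note the
    discontinuous dependence on the cutoff tau."""
    lam = np.sort(np.clip(np.linalg.eigvalsh(K), 0.0, None))[::-1]
    return int(np.sum(lam >= tau * lam[0]))
```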

**Identifiability and optimality of spectral summaries.** We record two properties of the spectrum of the trace-normalized kernel. The first shows that spectral prefixes are optimal low-rank summaries of a learned kernel and follows from the Eckart–Young–Mirsky theorem (Eckart and Young, [1936](https://arxiv.org/html/2605.11142#bib.bib65)). The second shows that, under a simple spectrum, the spectral modes are identified up to sign and permutation. We present the proofs in Appendices [B](https://arxiv.org/html/2605.11142#A2) and [C](https://arxiv.org/html/2605.11142#A3) for completeness.

###### Theorem 2 (Optimal spectral prefix).

Let $K\succeq 0$ be a symmetric matrix with rank $r$, eigenvalues $\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_r>0$, and eigenvectors $u_1,\ldots,u_r$ such that $K=\sum_{j=1}^{r}\lambda_j u_j u_j^{\top}$. For any positive integer $k\leq r$, define $K_k := \sum_{j=1}^{k}\lambda_j u_j u_j^{\top}$. Then $K_k$ is a best rank-$k$ positive semidefinite approximation of $K$ in Frobenius norm, i.e.,

$$K_k \in \arg\min_{M\succeq 0,\ \operatorname{rank}(M)\leq k} \|K-M\|_F^2.$$

Moreover, if $\lambda_k > \lambda_{k+1}$, the optimizer is unique.

###### Theorem 3 (Identifiability of simple spectral modes).

Let $K\succeq 0$ be a symmetric matrix of rank $r$ with eigenvalues $\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_r>0$, where $\lambda_i\neq\lambda_j$ for all $i\neq j$, and eigenvectors $u_1,\ldots,u_r$ such that $K=\sum_{j=1}^{r}\lambda_j u_j u_j^{\top}$. Then, if

$$K = U\operatorname{diag}(\lambda_1,\ldots,\lambda_r)\,U^{\top} = U'\operatorname{diag}(\lambda'_1,\ldots,\lambda'_r)\,{U'}^{\top},$$

with $U^{\top}U = {U'}^{\top}U' = I_r$, there exist a permutation matrix $\Pi$ and a diagonal matrix $S$ with diagonal entries in $\{\pm 1\}$ such that $U' = U\Pi S$ and $\lambda' = \Pi^{\top}\lambda$.

**Implications for representation summaries.** Theorem [2](https://arxiv.org/html/2605.11142#Thmtheorem2) shows that the leading spectral modes provide canonical low-rank summaries of the fitted kernel: truncating to the first $k$ eigenmodes is the optimal rank-$k$ PSD approximation in Frobenius norm. Thus, spectral prefixes are not arbitrary coordinate projections of the factor $L$, but intrinsic summaries of the kernel itself. Theorem [3](https://arxiv.org/html/2605.11142#Thmtheorem3) further shows that, when the active spectrum is simple, these spectral modes are stable representation-level objects, identifiable up to sign and permutation. Together, these results separate the parameterization from the representation: the rank cap $r$ controls the largest available kernel rank, while $d_{\mathrm{spec}}(K)$ measures the effective spectral dimension realized by the learned kernel.

**Elastic prefix extraction.** A fitted kernel also induces a nested family of aligned lower-dimensional representations, in the spirit of *Matryoshka Representation Learning* (Kusupati et al., [2024](https://arxiv.org/html/2605.11142#bib.bib88)), but obtained here without any multi-scale training objective. If $K = U\Lambda U^{\top}$, define

$$K_k = U_{1:k}\Lambda_{1:k}U_{1:k}^{\top}, \qquad k=1,\ldots,r. \tag{8}$$

The family $\{K_k\}_{k=1}^{r}$ is strictly nested, since $\operatorname{range}(K_k)\subset\operatorname{range}(K_{k+1})$: $K_{k+1}$ extends $K_k$ by one additional eigenpair without modifying the previous $k$. Independently trained rank-$k$ models do not satisfy this property: their eigenbases are unrelated across $k$, blocking direct comparison of low- and high-capacity summaries of the same data. Each $K_k$ is the optimal rank-$k$ PSD approximation to $K$ in Frobenius norm by Theorem [2](https://arxiv.org/html/2605.11142#Thmtheorem2). In contrast to MRL, where nested representations are enforced through a joint loss at training time, our nestedness and optimality at every $k$ are deterministic consequences of the Eckart–Young–Mirsky theorem applied to a single-objective fit. A single learned kernel thus yields an auditable sequence of deployment-time operating points indexed by $k$ and summarized by $d_{\mathrm{spec}}(K_k)$.
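Extracting the whole prefix family requires only one eigendecomposition of the fitted kernel; a minimal sketch (helper name ours):

```python
import numpy as np

def spectral_prefixes(K: np.ndarray, ks) -> dict:
    """Prefix family of Eq. (8): K_k keeps the k leading eigenpairs. By
    Theorem 2 each K_k is the Frobenius-optimal rank-k PSD approximation
    of K, and all prefixes share one aligned eigenbasis."""
    lam, U = np.linalg.eigh(K)          # eigenvalues ascending
    lam, U = lam[::-1], U[:, ::-1]      # reorder to descending
    return {k: (U[:, :k] * lam[:k]) @ U[:, :k].T for k in ks}

# One fit, many deployment-time operating points; no retraining needed:
# prefixes = spectral_prefixes(K, ks=(4, 8, 16, 32, 64))
```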

**Local behavior as $\eta$ varies.** We next investigate the one-dimensional capacity path induced by varying the entropy weight $\eta$. Because the objective may have multiple critical points, there is no globally single-valued optimizer map $\eta\mapsto L^{\star}(\eta)$ in general. The result below is therefore local and branch-wise: it establishes a path of nondegenerate restricted local minima corresponding to values $\eta$ in the neighborhood of $\eta_0$. Along such a branch, the realized dimension

$$d_{\mathrm{spec}}(\eta) = d_{\mathrm{spec}}\bigl(K(L^{\star}(\eta))\bigr)$$

varies smoothly and is locally nondecreasing, provided no spectral or second-order degeneracy occurs.

The statement must quotient out the orthogonal gauge of the factorization. Indeed, $K(LQ)=K(L)$ for every orthogonal $Q\in\mathbb{R}^{r\times r}$, so the full parameterization contains flat directions unrelated to the represented kernel. Fix a rank $s>0$. After a right-orthogonal reparameterization, work on the slice

$$\widetilde{\Theta} := \left\{([L_1 \mid 0],\, a,\, \beta) : L_1\in\mathbb{R}^{N\times s},\ \operatorname{rank}(L_1)=s,\ L_1^{\top}L_1 \text{ is diagonal},\ a\in\mathbb{R}^{N},\ \beta>0\right\} \subset \Theta.$$

This slice is a smooth embedded submanifold of the parameter space, and all derivatives below are restricted to its tangent spaces (basic differential-geometric definitions are recalled in Appendix [A](https://arxiv.org/html/2605.11142#A1)). Its tangent space at $\theta\in\widetilde{\Theta}$ is denoted by $T_{\theta}\widetilde{\Theta}$. The restricted objective $F_{\eta}|_{\widetilde{\Theta}}$ is $\mathcal{C}^2$; see Appendix [D](https://arxiv.org/html/2605.11142#A4). We denote the restricted gradient by $\nabla_{\theta}F_{\eta}(\theta)\big|_{T_{\theta}\widetilde{\Theta}}$, and the Hessian of the restricted function $F_{\eta}|_{\widetilde{\Theta}}$ by $\nabla^2_{\theta}F_{\eta}(\theta)\big|_{T_{\theta}\widetilde{\Theta}}$, evaluated on tangent directions.

Fix $\eta_0\in\mathbb{R}$. We take as reference an arbitrary restricted critical point $\theta^{\star}=(L^{\star},a^{\star},\beta^{\star})\in\widetilde{\Theta}$ of $F_{\eta_0}|_{\widetilde{\Theta}}$, meaning that $\nabla_{\theta}F_{\eta_0}(\theta^{\star})\big|_{T_{\theta^{\star}}\widetilde{\Theta}} = 0$. The following assumptions are local regularity conditions for the chosen restricted critical point $\theta^{\star}$. If a rank transition, eigenvalue crossing, or Hessian degeneracy occurs, this branch-wise smoothness guarantee no longer applies.

###### Assumption 4 (Simple active spectrum).

The positive eigenvalues of $K(L^{\star})$ are pairwise distinct.

###### Assumption 5 (Nondegenerate second-order condition).

The restricted Hessian is positive definite:

$$v^{\top}\left(\nabla^2_{\theta}F_{\eta_0}(\theta^{\star})\big|_{T_{\theta^{\star}}\widetilde{\Theta}}\right)v > 0 \quad \text{for every } 0\neq v\in T_{\theta^{\star}}\widetilde{\Theta}.$$

The proof of the following theorem is in Appendix [E](https://arxiv.org/html/2605.11142#A5), with an accompanying illustration in Figure [4](https://arxiv.org/html/2605.11142#A5.F4).

###### Theorem 6 (Local smoothness and monotonicity of the effective spectral dimension).

Let $\eta_0\in\mathbb{R}$, and let $\theta^{\star}=(L^{\star},a^{\star},\beta^{\star})$ satisfy Assumptions [4](https://arxiv.org/html/2605.11142#Thmtheorem4)–[5](https://arxiv.org/html/2605.11142#Thmtheorem5). Then there exist an open interval $U\ni\eta_0$ and a $\mathcal{C}^1$ map $\theta^{\star}\colon U\to\widetilde{\Theta}$ such that $\theta^{\star}(\eta_0)=\theta^{\star}$ and the following three claims hold: (i) for every $\eta\in U$, $\nabla_{\theta}F_{\eta}(\theta^{\star}(\eta))\big|_{T_{\theta^{\star}(\eta)}\widetilde{\Theta}} = 0$ and $\theta^{\star}(\eta)$ is a strict local minimum of $F_{\eta}\big|_{\widetilde{\Theta}}$; (ii) the capacity profile $d_{\mathrm{spec}}(\eta) := d_{\mathrm{spec}}\bigl(K(L^{\star}(\eta))\bigr)$ is $\mathcal{C}^1$ on $U$; and (iii) for every $\eta\in U$,

$$\frac{d}{d\eta}\,\log d_{\mathrm{spec}}(\eta) \geq 0.$$

The theorem is intentionally local. Its hypotheses can fail when, as $\eta$ varies, an eigenvalue enters or leaves the active support, when positive eigenvalues collide, or when the restricted Hessian becomes degenerate. These events can produce nonsmooth spectrum changes (see Appendix [E.1](https://arxiv.org/html/2605.11142#A5.SS1), Fig. [5](https://arxiv.org/html/2605.11142#A5.F5)). The appropriate interpretation of Theorem 6 is a sensitivity statement for a chosen nondegenerate local optimum, rather than a global statement about the optimizer set. In nonconvex problems the argmin correspondence $\eta\mapsto\operatorname{argmin} F_{\eta}$ can be set-valued and need not vary continuously. The theorem therefore proves the property that is actually needed locally: whenever a fitted solution is nondegenerate and its active spectrum remains simple, the implicit-function theorem selects a unique nearby branch of local minima, and along this branch increasing the entropy weight cannot decrease $\log d_{\mathrm{spec}}$. Thus the result justifies local continuation and local one-dimensional calibration of realized capacity, but it does not assert that all local minima, or the global minimizer, lie on a single smooth monotone path.

**Targeting a desired effective dimension.** Motivated by the local monotonicity in Theorem 6, we target a desired effective dimension by treating $d_{\mathrm{spec}}(\eta)$ as a one-dimensional response curve on empirically identified monotone intervals. The entropy weight $\eta$ parameterizes a one-dimensional family of fitted kernels: $\eta>0$ raises $d_{\mathrm{spec}}$, $\eta<0$ lowers it. Given a target $d_{\mathrm{spec}}^{\star}$, we solve $d_{\mathrm{spec}}(\eta)\approx d_{\mathrm{spec}}^{\star}$ as one-dimensional root finding for $g(\eta) := d_{\mathrm{spec}}(\eta) - d_{\mathrm{spec}}^{\star}$. Starting from an anchor fit at $\eta=0$, the sign of $g(0)$ selects the search direction; geometric expansion brackets the target, and bisection refines $\eta$ until $|g(\eta)|/d_{\mathrm{spec}}^{\star} \leq \tau$ (we use $\tau=0.02$). A monotone guard during expansion falls back to bidirectional search if local monotonicity fails, and probe fits may early-stop once $d_{\mathrm{spec}}$ stabilizes, since exact convergence is unnecessary for bracketing. The selected $\eta^{\star}$ is retrained from a fresh initialization at the full training budget. By Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6), bisection reaches $\eta$-precision $\delta$ in $O(\log(1/\delta))$ probe evaluations on intervals where the branch is monotone, so calibration adds at most a logarithmic multiplicative factor over a single training run.
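A schematic of the bracket-then-bisect loop (assuming a user-supplied `fit_dspec(eta)` that trains a probe model and returns its achieved $d_{\mathrm{spec}}$; the monotone guard and probe early stopping described above are omitted for brevity):

```python
def calibrate_eta(fit_dspec, d_star, tol=0.02, step=0.05, max_bisect=40):
    """Find eta with |d_spec(eta) - d_star| / d_star <= tol, assuming the
    branch eta -> d_spec(eta) is locally monotone (Theorem 6)."""
    g = lambda eta: fit_dspec(eta) - d_star        # root-finding residual
    lo, g_lo = 0.0, g(0.0)                         # anchor fit at eta = 0
    if abs(g_lo) / d_star <= tol:
        return lo
    hi = step if g_lo < 0 else -step               # sign of g(0) picks direction
    g_hi = g(hi)
    for _ in range(20):                            # geometric expansion
        if g_lo * g_hi <= 0:
            break                                  # target bracketed
        lo, g_lo = hi, g_hi
        hi, g_hi = 2 * hi, g(2 * hi)
    for _ in range(max_bisect):                    # bisection refinement
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if abs(g_mid) / d_star <= tol:
            return mid
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)
```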

**Complexity and parametrization.** We parametrize the kernel through its SVD, $L = Q\,\operatorname{diag}(\sigma)$, with $Q$ column-orthonormal and $\sigma$ positive via softplus. This is equivalent to a free factor under $K(L) = N\,LL^{\top}/\operatorname{tr}(LL^{\top})$, but reads the eigendecomposition of $K$ directly off the parameters ($\lambda_j(K) = N\sigma_j^2/\sum_l \sigma_l^2$, eigenvectors $Q$), making $d_{\mathrm{spec}}$ and prefixes $K_k$ available without an eigensolve and exposing the signed-permutation gauge of Theorem [3](https://arxiv.org/html/2605.11142#Thmtheorem3) at the parameter level. Each training step costs $\mathcal{O}(|E_{\mathrm{batch}}|\,r + Nr)$ with $\mathcal{O}(Nr)$ memory; the total cost is $\mathcal{O}(T(|E_{\mathrm{batch}}|\,r + Nr))$, linear in $N$ and $r$ and tractable for $r\ll N$. Calibration adds a logarithmic factor.
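A PyTorch sketch of this parameterization (class and method names ours), using the library's built-in orthogonal parametrization to keep $Q$ column-orthonormal during optimization:

```python
import torch
import torch.nn.functional as F
from torch.nn.utils import parametrizations

class SpectraFactor(torch.nn.Module):
    """L = Q diag(sigma), with Q column-orthonormal and sigma = softplus(s) > 0.
    The eigendecomposition of K(L) is read off the parameters: eigenvectors
    are the columns of Q and lambda_j(K) = N * sigma_j^2 / sum_l sigma_l^2."""

    def __init__(self, N: int, r: int):
        super().__init__()
        self.N = N
        self.Q = torch.nn.Parameter(torch.empty(N, r))
        torch.nn.init.orthogonal_(self.Q)
        parametrizations.orthogonal(self, "Q")        # maintain Q^T Q = I_r
        self.s = torch.nn.Parameter(torch.zeros(r))   # sigma = softplus(s)

    def occupancy(self) -> torch.Tensor:
        """Spectral occupancy p_j of Eq. (4), with no eigensolve."""
        sigma2 = F.softplus(self.s) ** 2
        return sigma2 / sigma2.sum()

    def log_d_spec(self) -> torch.Tensor:
        """log d_spec(K) = Shannon entropy of the occupancy (Definition 1)."""
        p = self.occupancy()
        return -(p * torch.log(p.clamp_min(1e-12))).sum()

    def kernel(self) -> torch.Tensor:
        """Trace-normalized kernel K = Q diag(N * p) Q^T of Eq. (2)."""
        return (self.Q * (self.N * self.occupancy())) @ self.Q.T
```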

## 4 Experiments & Results

Table 1: Dataset statistics for the eight undirected graphs used in the experiments.

| Statistic | ca-GrQc | ca-HepTh | socfb-American75 | socfb-Amherst41 | bio-grid-human | bio-grid-worm | inf-power | inf-openflights |
|---|---|---|---|---|---|---|---|---|
| Domain | Collaboration | Collaboration | Social | Social | Biological | Biological | Infrastructure | Infrastructure |
| Nodes | 4,158 | 8,638 | 6,370 | 2,235 | 9,186 | 3,343 | 4,941 | 2,905 |
| Edges | 13,426 | 24,818 | 224,024 | 93,188 | 40,224 | 9,780 | 11,534 | 18,550 |

We evaluate Spectra against representative latent representation methods spanning Euclidean embeddings, matrix factorization, mixed-membership models, and latent-distance simplex-based representations. Spectra is optimized with Adam (Kingma and Ba, [2014](https://arxiv.org/html/2605.11142#bib.bib48)) at $5{:}1$ negatives-to-positives sampling per step. For embedding baselines, dyadic edge features use the binary operator (average, Hadamard, weighted-$L_1$, weighted-$L_2$) maximizing each baseline's test AUC, giving each baseline its strongest configuration; likelihood-based latent graph models, including Spectra, are evaluated directly from the learned log-odds. Spectra is reported in a strictly test-blind regime: no hyperparameter, $\eta$ value, $d_{\mathrm{spec}}$ target, or checkpoint is selected with reference to any test quantity at any stage. Hyperparameters, hardware, and implementation pointers are in Appendix [F](https://arxiv.org/html/2605.11142#A6).

Figure 2 panels (16 plots). Top block, saturated regime: inf-power (infrastructure), inf-openflights (infrastructure), socfb-American75 (social), socfb-Amherst41 (social). Bottom block, rank-cap-binding regime: ca-grqc (citation), ca-hepth (citation), bio-grid-human (biological), bio-grid-worm (biological).

Figure 2: Capacity frontiers across rank caps. Each dataset is swept over $\eta\in[-0.25, 0.25]$ at $r\in\{64,128,256\}$; curves show mean $\pm 1\sigma$ across three seeds. Upper panels show achieved $d_{\mathrm{spec}}$ versus $\eta$; lower panels show test AUC versus achieved $d_{\mathrm{spec}}$. Top: saturated datasets. Bottom: rank-cap-binding datasets. Legends report per-rank peak coordinates.

**Datasets and Baselines.** All experiments use featureless undirected graphs spanning four structural domains: collaboration networks ca-GrQc and ca-HepTh (Leskovec et al., [2007](https://arxiv.org/html/2605.11142#bib.bib55)); social networks socfb-American75 and socfb-Amherst41 (Traud et al., [2012](https://arxiv.org/html/2605.11142#bib.bib68)); biological protein-protein interaction networks bio-grid-human and bio-grid-worm (Stark et al., [2006](https://arxiv.org/html/2605.11142#bib.bib69)); and infrastructure networks inf-power (Watts and Strogatz, [1998](https://arxiv.org/html/2605.11142#bib.bib66)) and inf-openflights (Opsahl, [2011](https://arxiv.org/html/2605.11142#bib.bib67)); see Table [1](https://arxiv.org/html/2605.11142#S4.T1) for details. Datasets are obtained from SNAP (Leskovec and Krevl, [2014](https://arxiv.org/html/2605.11142#bib.bib53)) and the Network Repository (Rossi and Ahmed, [2015](https://arxiv.org/html/2605.11142#bib.bib70)). We compare against four families of latent representation methods: (i) *Euclidean vector-space embeddings* that learn node representations in $\mathbb{R}^d$ under inner-product geometry, represented by Node2Vec (Grover and Leskovec, [2016](https://arxiv.org/html/2605.11142#bib.bib5)) and Role2Vec (Ahmed et al., [2018](https://arxiv.org/html/2605.11142#bib.bib37)); (ii) *matrix-factorization methods*, represented by NetMF (Qiu et al., [2018](https://arxiv.org/html/2605.11142#bib.bib60)) and GraRep (Cao et al., [2015](https://arxiv.org/html/2605.11142#bib.bib10)); (iii) *mixed-membership models*, represented by MNMF (Wang et al., [2017](https://arxiv.org/html/2605.11142#bib.bib56)); and (iv) *latent-distance simplex-based models*, represented by HM-LDM (Nakis et al., [2022](https://arxiv.org/html/2605.11142#bib.bib43)). We focus on featureless graphs in an unsupervised link-prediction setting; the comparison class is therefore the latent-representation methods that operate in this regime, rather than message-passing GNNs, which usually underperform in this regime (Nikolentzos et al., [2024](https://arxiv.org/html/2605.11142#bib.bib58)).

#### Empirical tracking protocol.

Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6) provides a local sensitivity result for nondegenerate branches of local minima. In the experiments, we complement this branch-wise theory with a more stringent protocol: each value of $\eta$ is trained from independent random initializations rather than obtained by warm-start continuation from a single solution. Stability of achieved $d_{\mathrm{spec}}$ across seeds therefore tests whether spectral capacity control is reproducible under the actual nonconvex training procedure, not only along a locally tracked branch.

**Capacity Frontiers.** We test whether predictive performance is better organized by realized spectral capacity $d_{\mathrm{spec}}(\eta)$ than by the nominal rank cap. For each dataset, we sweep $\eta$ at rank caps $r\in\{64,128,256\}$ over three seeds, using an adaptive cold-start grid that refines regions where $d_{\mathrm{spec}}(\eta)$ changes rapidly (Appendix [G](https://arxiv.org/html/2605.11142#A7)). Figure [2](https://arxiv.org/html/2605.11142#S4.F2) shows the resulting frontiers. Across all eight datasets, the map $\eta\mapsto d_{\mathrm{spec}}$ is monotone with narrow cross-seed variation, spanning roughly two orders of magnitude in realized dimension before saturating near the rank cap. When plotted against achieved $d_{\mathrm{spec}}$, AUC curves from different rank caps align wherever their realized-capacity ranges overlap, indicating that performance is organized by realized spectral capacity rather than by $r$ alone. At very large $d_{\mathrm{spec}}$, performance can decline because positive $\eta$ pushes the spectrum toward equal mass over all available modes, including modes that carry little signal. The peak locations reveal two regimes. Infrastructure and social networks (inf-power, inf-openflights, socfb-American75, socfb-Amherst41) are saturated: their best operating points occur at similar $d_{\mathrm{spec}}$ across rank caps and lie well below the smallest cap tested. Citation and biological networks (ca-grqc, ca-hepth, bio-grid-human, bio-grid-worm) are rank-cap-binding: their best achieved $d_{\mathrm{spec}}$ increases with $r$, and AUC continues to improve as the cap is enlarged. Thus the multi-$r$ sweep distinguishes datasets whose capacity demand is already saturated from those for which the parameterization remains constraining through $r=256$.

Table 2: AUC-ROC scores on eight benchmark graphs, over $S=5$ seeds for various $d$. Each cell reports AUC / achieved $d_{\mathrm{spec}}$.

| Dataset | $d$ | Node2Vec | Role2Vec | NetMF | GraRep | MNMF | HM-LDM | Spectra ($\eta=0$) | Spectra (med $d_{\mathrm{spec}}$) | Spectra (min $d_{\mathrm{spec}}$) |
|---|---|---|---|---|---|---|---|---|---|---|
| ca-GrQc | 16 | .931 / 15.91 | .936 / 15.36 | .887 / 15.58 | .906 / 8.45 | .905 / 14.99 | .937 / 14.72 | .940 / 15.99 | .938 / 15.18 | .932 / 8.45 |
| ca-GrQc | 32 | .936 / 31.62 | .936 / 30.47 | .899 / 30.71 | .909 / 13.7 | .918 / 30.94 | .942 / 28.15 | .944 / 31.99 | .942 / 30.59 | .940 / 13.7 |
| ca-GrQc | 64 | .937 / 59.64 | .931 / 55.88 | .897 / 60.38 | .897 / 22.45 | .924 / 62.68 | .937 / 48.40 | .949 / 63.74 | .950 / 57.76 | .944 / 22.45 |
| ca-GrQc | 128 | .934 / 86.41 | .924 / 125.58 | .888 / 116.33 | .893 / 35.08 | .918 / 125.58 | .949 / 90.85 | .952 / 124.51 | .953 / 103.59 | .946 / 35.08 |
| ca-HepTh | 16 | .883 / 15.95 | .907 / 15.12 | .831 / 15.75 | .846 / 5.47 | .845 / 14.99 | .876 / 14.70 | .902 / 15.99 | .905 / 15.06 | .891 / 5.47 |
| ca-HepTh | 32 | .888 / 31.80 | .902 / 30.13 | .836 / 31.41 | .842 / 8.24 | .864 / 30.97 | .877 / 29.32 | .910 / 31.99 | .911 / 30.55 | .895 / 8.24 |
| ca-HepTh | 64 | .892 / 63.36 | .896 / 58.86 | .827 / 62.37 | .832 / 13.11 | .875 / 62.91 | .874 / 56.30 | .915 / 63.94 | .917 / 60.61 | .905 / 13.11 |
| ca-HepTh | 128 | .892 / 109.57 | .882 / 108.52 | .810 / 122.42 | .825 / 21.59 | .876 / 126.70 | .891 / 95.85 | .921 / 127.00 | .920 / 109.04 | .908 / 21.59 |
| socfb-American75 | 16 | .923 / 14.35 | .910 / 14.47 | .879 / 13.02 | .932 / 8.05 | .857 / 14.58 | .938 / 11.03 | .944 / 14.82 | .945 / 13.68 | .938 / 8.05 |
| socfb-American75 | 32 | .934 / 28.75 | .914 / 28.81 | .896 / 26.35 | .938 / 11.47 | .876 / 30.21 | .938 / 15.66 | .951 / 27.61 | .951 / 27.55 | .944 / 11.47 |
| socfb-American75 | 64 | .930 / 57.64 | .910 / 60.97 | .905 / 52.22 | .942 / 16.95 | .895 / 60.97 | .938 / 17.24 | .954 / 47.65 | .952 / 54.93 | .947 / 16.95 |
| socfb-American75 | 128 | .914 / 114.33 | .896 / 120.43 | .906 / 100.69 | .943 / 26.44 | .912 / 120.43 | .939 / 20.99 | .955 / 64.58 | .949 / 107.51 | .949 / 20.99 |
| socfb-Amherst41 | 16 | .926 / 13.18 | .903 / 13.51 | .892 / 11.11 | .930 / 7.81 | .884 / 14.07 | .939 / 10.12 | .943 / 14.23 | .901 / 12.14 | .938 / 7.81 |
| socfb-Amherst41 | 32 | .926 / 26.36 | .899 / 26.87 | .901 / 21.80 | .933 / 11.47 | .899 / 29.19 | .941 / 12.35 | .947 / 25.00 | .915 / 24.08 | .942 / 11.47 |
| socfb-Amherst41 | 64 | .909 / 52.58 | .882 / 53.59 | .899 / 42.35 | .934 / 17.95 | .909 / 58.34 | .941 / 13.11 | .949 / 38.01 | .924 / 47.46 | .944 / 13.11 |
| socfb-Amherst41 | 128 | .890 / 102.40 | .862 / 107.40 | .885 / 79.84 | .929 / 29.64 | .913 / 115.79 | .942 / 13.87 | .949 / 40.82 | .932 / 91.12 | .944 / 13.87 |
| bio-grid-human | 16 | .902 / 15.75 | .878 / 15.45 | .867 / 14.62 | .910 / 8.66 | .840 / 14.97 | .919 / 14.29 | .902 / 15.99 | .943 / 14.80 | .899 / 8.66 |
| bio-grid-human | 32 | .900 / 31.51 | .876 / 30.69 | .851 / 29.09 | .917 / 14.34 | .858 / 30.89 | .917 / 27.61 | .915 / 31.97 | .947 / 29.89 | .903 / 14.34 |
| bio-grid-human | 64 | .892 / 62.43 | .875 / 60.36 | .847 / 56.72 | .918 / 23.99 | .867 / 62.71 | .919 / 45.53 | .925 / 63.68 | .946 / 58.54 | .910 / 23.99 |
| bio-grid-human | 128 | .883 / 119.15 | .877 / 118.62 | .854 / 109.69 | .913 / 38.77 | .872 / 125.91 | .923 / 76.82 | .932 / 122.87 | .938 / 114.16 | .919 / 38.77 |
| bio-grid-worm | 16 | .786 / 15.52 | .744 / 15.21 | .769 / 15.57 | .900 / 11.95 | .769 / 14.79 | .913 / 13.32 | .852 / 15.99 | .852 / 15.00 | .835 / 11.95 |
| bio-grid-worm | 32 | .775 / 30.58 | .748 / 29.00 | .821 / 30.10 | .911 / 17.67 | .780 / 30.64 | .914 / 23.06 | .886 / 31.93 | .882 / 29.55 | .860 / 17.67 |
| bio-grid-worm | 64 | .754 / 57.84 | .772 / 54.80 | .852 / 56.28 | .916 / 24.17 | .792 / 62.35 | .920 / 32.03 | .915 / 63.33 | .913 / 55.54 | .877 / 24.17 |
| bio-grid-worm | 128 | .747 / 95.37 | .812 / 103.96 | .878 / 98.86 | .917 / 30.30 | .785 / 124.80 | .920 / 36.19 | .934 / 121.33 | .931 / 97.12 | .889 / 30.30 |
| inf-power | 16 | .892 / 15.95 | .942 / 15.25 | .899 / 15.73 | .956 / 13.01 | .816 / 14.99 | .885 / 14.67 | .936 / 15.99 | .939 / 15.12 | .949 / 13.01 |
| inf-power | 32 | .919 / 31.9 | .942 / 30.92 | .920 / 31.44 | .956 / 20.64 | .841 / 30.97 | .881 / 28.75 | .912 / 31.99 | .916 / 30.94 | .925 / 20.64 |
| inf-power | 64 | .914 / 60.28 | .940 / 55.54 | .892 / 62.45 | .953 / 31.07 | .841 / 62.89 | .847 / 50.53 | .895 / 63.98 | .896 / 57.91 | .916 / 31.07 |
| inf-power | 128 | .901 / 80.41 | .935 / 63.67 | .845 / 122.82 | .945 / 45.15 | .813 / 126.61 | .889 / 95.65 | .880 / 127.89 | .884 / 88.03 | .910 / 45.15 |
| inf-openflights | 16 | .974 / 15.41 | .962 / 13.56 | .948 / 15.45 | .984 / 11.59 | .944 / 14.63 | .992 / 12.88 | .987 / 15.83 | .988 / 14.10 | .989 / 11.59 |
| inf-openflights | 32 | .974 / 30.16 | .963 / 25.26 | .952 / 29.95 | .986 / 16.45 | .959 / 19.93 | .990 / 21.43 | .987 / 30.87 | .988 / 23.34 | .988 / 16.45 |
| inf-openflights | 64 | .974 / 55.31 | .966 / 48.12 | .949 / 57.18 | .985 / 21.90 | .968 / 60.95 | .991 / 26.67 | .987 / 57.73 | .987 / 51.72 | .988 / 21.90 |
| inf-openflights | 128 | .973 / 87.94 | .966 / 94.68 | .935 / 103.84 | .982 / 26.60 | .972 / 122.08 | .991 / 38.86 | .987 / 96.43 | .987 / 91.31 | .988 / 26.60 |

**Link prediction.** We follow the standard unsupervised protocol (Perozzi et al., [2014](https://arxiv.org/html/2605.11142#bib.bib4); Nakis et al., [2023b](https://arxiv.org/html/2605.11142#bib.bib57); Liben-Nowell and Kleinberg, [2003](https://arxiv.org/html/2605.11142#bib.bib110)): remove $50\%$ of edges while keeping the training graph connected, and sample an equal number of non-edges as negatives. Table [2](https://arxiv.org/html/2605.11142#S4.T2) reports mean AUC-ROC over $S=5$ seeds for $d\in\{16,32,64,128\}$; AUC-PR gives qualitatively identical conclusions. Baselines are trained at native dimension $d$, while Spectra uses rank cap $r=d$, so comparisons are matched in parameter count. We also report each method's achieved $d_{\mathrm{spec}} = \exp(H(\sigma^2/\sum_i \sigma_i^2))$ from embedding singular values. Spectra is evaluated at $\eta=0$ and at two bisection-selected settings targeting the median and minimum baseline effective ranks at the same $d$; these targets are fixed from baseline embeddings, with no validation or test AUC used to choose $\eta$. At matched parameter count, Spectra ($\eta=0$) is top or second on $7/8$ datasets at every $d$; the exceptions are inf-power, where GraRep dominates, and inf-openflights, where HM-LDM leads narrowly. Capacity-targeted Spectra remains competitive at much smaller realized dimensions: the median-target setting leads on ca-HepTh, socfb-American75, and bio-grid-human at multiple $d$, while the minimum-target setting often matches full-rank baselines at $5{-}10\times$ smaller $d_{\mathrm{spec}}$. Baselines also realize effective ranks well below nominal $d$ on socfb and bio-grid graphs (e.g., HM-LDM at $d=128$ has $d_{\mathrm{spec}}\approx 21$ on socfb-American75 and $\approx 36$ on bio-grid-worm), reinforcing realized spectral capacity as the operative comparison coordinate.
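A sketch of the split protocol in NetworkX (function name ours; the repeated connectivity check is the simple variant and is adequate at these graph sizes):

```python
import numpy as np
import networkx as nx

def link_prediction_split(G: nx.Graph, frac: float = 0.5, seed: int = 0):
    """Hide `frac` of the edges while keeping the training graph connected,
    and sample an equal number of non-edges as test negatives."""
    rng = np.random.default_rng(seed)
    nodes = list(G.nodes())
    H = G.copy()                                  # training graph
    edges = list(G.edges())
    rng.shuffle(edges)
    target = int(frac * G.number_of_edges())
    test_pos = []
    for u, v in edges:
        if len(test_pos) == target:
            break
        H.remove_edge(u, v)
        if nx.is_connected(H):
            test_pos.append((u, v))
        else:
            H.add_edge(u, v)                      # removal disconnects: undo
    test_neg = set()
    while len(test_neg) < len(test_pos):
        i, j = sorted(rng.integers(0, len(nodes), size=2))
        u, v = nodes[i], nodes[j]
        if i != j and not G.has_edge(u, v):
            test_neg.add((u, v))
    return H, test_pos, list(test_neg)

# AUC is then computed from the learned log-odds of the held-out dyads, e.g.
# sklearn.metrics.roc_auc_score([1]*len(pos) + [0]*len(neg), scores).
```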

![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/adj_grqc_k4.png) (a) $d_{\mathrm{pre}}=4$
![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/adj_grqc_k8.png) (b) $d_{\mathrm{pre}}=8$
![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/adj_grqc_k16.png) (c) $d_{\mathrm{pre}}=16$
![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/adj_grqc_k32.png) (d) $d_{\mathrm{pre}}=32$
![Refer to caption](https://arxiv.org/html/2605.11142v1/figures/adj_grqc_k64.png) (e) $d_{\mathrm{pre}}=64$

Figure 3: Adjacency matrices reordered by spectral assignment $c_i(d_{\mathrm{pre}}) = \arg\max_{j\leq d_{\mathrm{pre}}} U_{ij}^2$ for $d_{\mathrm{pre}}\in\{4,8,16,32,64\}$. Panels come from a single fitted Spectra kernel via the rank-$d_{\mathrm{pre}}$ prefix.

**Visualizing the prefix family.** Figure [3](https://arxiv.org/html/2605.11142#S4.F3) visualizes spectral prefixes $d_{\mathrm{pre}}\in\{4,8,16,32,64\}$ from a *single* Spectra fit on grqc with rank cap $r=128$ and target $d_{\mathrm{spec}}^{\star}=64$. Each panel uses the rank-$d_{\mathrm{pre}}$ prefix $K_{d_{\mathrm{pre}}} = U_{1:d_{\mathrm{pre}}}\Lambda_{1:d_{\mathrm{pre}}}U_{1:d_{\mathrm{pre}}}^{\top}$, which is the optimal rank-$d_{\mathrm{pre}}$ PSD approximation of the fitted kernel in Frobenius norm by Theorem [2](https://arxiv.org/html/2605.11142#Thmtheorem2). Nodes are reordered by the dominant retained spectral coordinate, $c_i(d_{\mathrm{pre}}) = \arg\max_{1\leq j\leq d_{\mathrm{pre}}} U_{ij}^2$, and blocks are shaded by this assignment. This is a standard spectral way to read block structure from leading eigenspaces (Ng et al., [2001](https://arxiv.org/html/2605.11142#bib.bib51); von Luxburg, [2007](https://arxiv.org/html/2605.11142#bib.bib50)), but here the eigenspaces come from the learned trace-normalized kernel rather than directly from the adjacency. As $d_{\mathrm{pre}}$ increases, the reordered adjacency resolves progressively finer structure: broad macro-groups at $d_{\mathrm{pre}}=4$ split into smaller, denser blocks by $d_{\mathrm{pre}}=64$, while off-diagonal density decreases. No panel is a refit. The role of $\eta$ is to set the spectral operating point *before* prefix extraction: targeting $d_{\mathrm{spec}}^{\star}$ determines how spectral mass is distributed across modes, and hence how far the prefix family adds meaningful structure. This gives a graph analogue of Matryoshka-style multi-resolution representations (Kusupati et al., [2024](https://arxiv.org/html/2605.11142#bib.bib88)), but with alignment and optimality following from Eckart–Young–Mirsky for a single fitted kernel rather than from a multi-scale training loss.
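The reordering itself is two lines given the eigenvectors $U$ of the fitted kernel; a sketch (names ours):

```python
import numpy as np

def prefix_assignment(U: np.ndarray, d_pre: int) -> np.ndarray:
    """c_i = argmax_{j <= d_pre} U_ij^2: dominant retained spectral coordinate."""
    return np.argmax(U[:, :d_pre] ** 2, axis=1)

def reorder_adjacency(A: np.ndarray, U: np.ndarray, d_pre: int) -> np.ndarray:
    """Reorder rows and columns of the adjacency by spectral assignment,
    as in Figure 3; a stable sort keeps blocks contiguous."""
    order = np.argsort(prefix_assignment(U, d_pre), kind="stable")
    return A[np.ix_(order, order)]
```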

**Overparameterized target-$d_{\mathrm{spec}}^{\star}$ acquisition.** We test whether overparameterization, used through spectral control, finds a better-calibrated solution than the rank-cap-only practice. We sweep target effective ranks $d_{\mathrm{spec}}^{\star}\in\{16,32,64\}$ and rank caps $r\in\{d_{\mathrm{spec}}^{\star},64,128,256\}$ on grqc and bio-grid-human, with $n=10$ seeds per cell. Each cell trains two models on identical data: a Phase-0 anchor at $\eta=0$ ($d_{\mathrm{spec}}$ saturates near $r$) and a Phase-2 retrain at $\eta^{\star}$ obtained by bidirectional bisection of fully-trained probes ($\tau=0.05$). Metrics target the optimization objective: test log-likelihood $\mathrm{TLL}$ and the train–test generalization gap $\mathrm{GenGap} = \mathrm{trNLL} + \mathrm{TLL}$ (lower is better). To assess whether observed differences between the two conditions reflect a systematic effect, we report paired Wilcoxon signed-rank tests on the per-seed differences: paired because the two conditions in each cell share data, negatives, and seed, and signed-rank rather than $t$-test because paired-difference distributions across seeds are heavy-tailed and not well-modeled as Gaussian. Comparison A (within-cell, isolating $\eta$ at fixed $r$): without spectral control, overparameterization is calibration-disastrous. Across 240 paired comparisons, the retrain improves GenGap ($p<10^{-30}$) and TLL ($p<10^{-25}$), while the anchor fits training strictly harder (lower $\mathrm{trNLL}$, $p<10^{-25}$). Comparison B (matched $d_{\mathrm{spec}}^{\star}$, against the rank-cap-only baseline at $r=d_{\mathrm{spec}}^{\star}$): $\eta$-targeted overparameterization finds a calibration-superior minimum at matched effective capacity. Across 180 paired comparisons, both GenGap ($p\approx 10^{-11}$) and TLL ($p\approx 10^{-12}$) favor the retrain, with a larger effect on bio-grid-human (each $p<10^{-10}$) than on the more-saturated grqc (each $p\approx 0.015$). Since both conditions reach the same realized $d_{\mathrm{spec}}$, the gain comes not from additional representational capacity but from the optimizer landscape that the larger parameterization opens up, a setting consistent with the broader literature on overparameterized models that generalize well when their effective complexity is controlled (Belkin et al., [2019](https://arxiv.org/html/2605.11142#bib.bib104); Bartlett et al., [2020](https://arxiv.org/html/2605.11142#bib.bib105)).
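A sketch of the paired test on hypothetical per-seed gaps (the arrays below are illustrative placeholders, not the paper's measurements):

```python
import numpy as np
from scipy.stats import wilcoxon

# Per-seed generalization gaps for the two paired conditions in one cell:
# anchor at eta = 0 versus retrain at the calibrated eta*, sharing data,
# negatives, and seed. Values are made up for illustration.
gap_anchor  = np.array([0.41, 0.38, 0.44, 0.40, 0.39, 0.42, 0.37, 0.43, 0.40, 0.41])
gap_retrain = np.array([0.30, 0.29, 0.33, 0.31, 0.28, 0.32, 0.27, 0.31, 0.30, 0.29])

# Signed-rank test on per-seed differences: paired, and robust to the
# heavy-tailed difference distributions mentioned above.
stat, p = wilcoxon(gap_anchor - gap_retrain)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}")
```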

## 5 Conclusion & Limitations

We introduced Spectra, which treats capacity in latent graph models as a property of the fitted representation rather than as a nominal rank chosen before training. Its key object is the trace-normalized learned PSD kernel: the Shannon effective rank $d_{\mathrm{spec}}$ gives a smooth realized-dimension coordinate, shaped by an entropy weight $\eta$ and targeted by bisection. The framework provides identifiability of active spectral modes, local $C^1$ regularity away from structural events, and Eckart–Young–Mirsky-optimal spectral prefixes. Empirically, Spectra is competitive with strong link-prediction baselines at matched parameter count, remains competitive at substantially smaller realized dimensions, reveals saturated and rank-cap-binding regimes through capacity frontiers, and yields aligned Matryoshka-style prefix views without a multi-scale objective. Beyond link prediction, Spectra suggests a broader use for comparing ensembles of networks through realized representational complexity. For example, $d_{\mathrm{spec}}$ could be used to ask whether village-level social networks from field experiments in Honduras (Airoldi and Christakis, [2024](https://arxiv.org/html/2605.11142#bib.bib117)), India (Alexander et al., [2022](https://arxiv.org/html/2605.11142#bib.bib118)), or Malawi (Beaman et al., [2021](https://arxiv.org/html/2605.11142#bib.bib119)); school friendship networks from Add Health (Harris et al., [2019](https://arxiv.org/html/2605.11142#bib.bib120); Jeon and Goodson, [2015](https://arxiv.org/html/2605.11142#bib.bib121)) or school intervention studies (Paluck et al., [2016](https://arxiv.org/html/2605.11142#bib.bib122)); protein interaction networks such as those studied in lethality-centrality work (Jeong et al., [2001](https://arxiv.org/html/2605.11142#bib.bib124)); or infrastructure networks such as city electrical grids (Pagani and Aiello, [2013](https://arxiv.org/html/2605.11142#bib.bib123)) exhibit comparable structural complexity. In such settings, $d_{\mathrm{spec}}$ could serve as an outcome variable for studying how network complexity varies with intervention context, school environment, biological organization, or urban scale.

The main limitation is geometric: we work with Euclidean latent spaces, where capacity is read from an $O(r)$-invariant Gram spectrum. Simplex-valued and non-Euclidean latent models require the corresponding invariant kernel, e.g., a log-ratio Gram under Aitchison geometry for compositional memberships (Nakis et al., [2026](https://arxiv.org/html/2605.11142#bib.bib87)) or a geometry-native kernel for hyperbolic embeddings. Cross-network comparisons would also require common preprocessing and fitting protocols, with controls for graph size and density. We leave these extensions to future work.

## Acknowledgments

The authors gratefully acknowledge support from the NOMIS Foundation, the Stavros Niarchos Foundation, and the National Institutes of Health \(R01AG081814\)\.

## References

- N. K. Ahmed, R. Rossi, J. B. Lee, T. L. Willke, R. Zhou, X. Kong, and H. Eldardiry (2018). Learning role-based graph embeddings. arXiv preprint arXiv:1802.02896.
- E. Airoldi and N. Christakis (2024). Induction of social contagion for diverse outcomes in structured experiments in isolated villages. Science 384, eadi5147.
- E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing (2007). Mixed membership stochastic blockmodels. arXiv preprint arXiv:0705.4485.
- M. Alexander, L. Forastiere, S. Gupta, and N. A. Christakis (2022). Algorithms for seeding social networks can enhance the adoption of a public health intervention in urban India. Proceedings of the National Academy of Sciences 119(30), e2120742119.
- S. Arora, N. Cohen, W. Hu, and Y. Luo (2019). Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems 32.
- A. Athreya, D. E. Fishkind, M. Tang, C. E. Priebe, Y. Park, J. T. Vogelstein, K. Levin, V. Lyzinski, Y. Qin, and D. L. Sussman (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research 18(226), 1–92.
- P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences 117(48), 30063–30070.
- L. Beaman, A. Benyishay, and J. Magruder (2021). Can network theory-based targeting increase technology adoption? American Economic Review 111, 1918–1943.
- M. Belkin, D. Hsu, S. Ma, and S. Mandal (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854.
- E. Candes and B. Recht (2012). Exact matrix completion via convex optimization. Communications of the ACM 55(6), 111–119.
- S. Cao, W. Lu, and Q. Xu (2015). GraRep: learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 891–900.
- K. Chen and J. Lei (2018). Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association 113(521), 241–251.
- K. Cherukuri and A. Lala (2025). Low-rank matrix approximation for neural network compression. In 2025 10th International Conference on Machine Learning Technologies (ICMLT), 166–170.
- C. Eckart and G. Young (1936). The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218.
- M. Fiori, B. Marenco, F. Larroca, P. Bermolen, and G. Mateos (2023). Gradient-based spectral embeddings of random dot product graphs. IEEE Transactions on Signal and Information Processing over Networks 10, 1–16.
- D. Friedman and A. B. Dieng (2022). The Vendi score: a diversity evaluation metric for machine learning. arXiv preprint arXiv:2210.02410.
- A. Grover and J. Leskovec (2016). node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864.
- W. Gu, A. Tandon, Y. Ahn, and F. Radicchi (2021). Principled approach to the selection of the embedding dimension of networks. Nature Communications 12(1), 3772.
- S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro (2017). Implicit regularization in matrix factorization. Advances in Neural Information Processing Systems 30.
- K. Harris, C. Halpern, E. Whitsel, J. Hussey, L. Killeya-Jones, J. Tabor, and S. Dean (2019). Cohort profile: the National Longitudinal Study of Adolescent to Adult Health (Add Health). International Journal of Epidemiology 48.
- P. D. Hoff, A. E. Raftery, and M. S. Handcock (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97(460), 1090–1098.
- P. Hoff (2007). Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems 20.
- D. Hong and J. Cape (2025). Network signflip parallel analysis for selecting the embedding dimension. arXiv preprint arXiv:2509.05722.
- I. C. Ipsen and A. K. Saibaba (2025). Stable rank and intrinsic dimension of real and complex matrices. SIAM Journal on Matrix Analysis and Applications 46(3), 1988–2007.
- K. C. Jeon and P. Goodson (2015). US adolescents' friendship networks and health risk behaviors: a systematic review of studies using social network analysis and Add Health data. PeerJ 3, e1052.
- H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai (2001). Lethality and centrality in protein networks. Nature 411(6833), 41–42.
- N. K. Jha and B. Reagen (2025). Spectral scaling laws in language models: how effectively do feed-forward networks use their latent space? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 35047–35058.
- D. P. Kingma and J. Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- A. Kulesza and B. Taskar (2012). Determinantal point processes for machine learning. Foundations and Trends in Machine Learning 5(2–3), 123–286.
- A. Kusupati, G. Bhatt, A. Rege, M. Wallingford, A. Sinha, V. Ramanujan, W. Howard-Snyder, K. Chen, S. Kakade, P. Jain, and A. Farhadi (2024). Matryoshka representation learning. arXiv preprint arXiv:2205.13147.
- J. M. Lee (2013). Introduction to Smooth Manifolds. 2nd edition, Springer.
- J. Leskovec, J. Kleinberg, and C. Faloutsos (2007). Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1), 2–es.
- J. Leskovec and A. Krevl (2014). SNAP Datasets: Stanford large network dataset collection.
- C. Li, H. Farkhoor, R. Liu, and J. Yosinski (2018). Measuring the intrinsic dimension of objective landscapes. arXiv preprint arXiv:1804.08838.
- T. Li, E. Levina, and J. Zhu (2020). Network cross-validation by edge sampling. Biometrika 107(2), 257–276.
- D. Liben-Nowell and J. Kleinberg (2003). The link prediction problem for social networks. In CIKM, 556–559.
- G. Luo, J. Li, J. Su, H. Peng, C. Yang, L. Sun, P. S. Yu, and L. He (2021). Graph entropy guided node embedding dimension selection for graph neural networks. arXiv preprint arXiv:2105.03178.
- N. Nakis, A. Celikkanat, L. Boucherie, C. Djurhuus, F. Burmester, D. M. Holmelund, M. Frolcová, and M. Mørup (2023a). Characterizing polarization in social networks using the signed relational latent distance model. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, PMLR 206, 11489–11505.
- N. Nakis, A. Çelikkanat, S. L. Jørgensen, and M. Mørup (2023b). A hierarchical block distance model for ultra low-dimensional graph representations. arXiv preprint arXiv:2204.05885.
- N. Nakis, A. Çelikkanat, and M. Mørup (2022). HM-LDM: a hybrid-membership latent distance model. arXiv preprint arXiv:2206.03463.
- N. Nakis, N. R. Holm, A. L. Fiehn, and M. Mørup (2025). How low can you go? Searching for the intrinsic dimensionality of complex networks using metric node embeddings. In The Thirteenth International Conference on Learning Representations.
- N. Nakis, C. Kosma, P. Promponas, M. Chatzianastasis, and G. Nikolentzos (2026). Aitchison embeddings for learning compositional graph representations. arXiv preprint arXiv:2605.00716.
- A. Ng, M. I. Jordan, and Y. Weiss (2001). On spectral clustering: analysis and an algorithm. In Neural Information Processing Systems.
- G. Nikolentzos, M. Chatzianastasis, and M. Vazirgiannis (2024). What do GNNs actually learn? Towards understanding their representations. arXiv preprint arXiv:2304.10851.
- J. Nocedal and S. J. Wright (2006). Numerical Optimization. Springer.
- T. Opsahl (2011). Why Anchorage is not (that) important: binary ties and sample selection. https://toreopsahl.com/datasets/.
- G. A. Pagani and M. Aiello (2013). The power grid as a complex network: a survey. Physica A: Statistical Mechanics and its Applications 392(11), 2688–2700.
- E. Paluck, H. Shepherd, and P. Aronow (2016). Changing climates of conflict: a social network experiment in 56 schools. Proceedings of the National Academy of Sciences 113, 201514483.
- F. S. Passino and N. A. Heard (2020). Bayesian estimation of the latent dimension and communities in stochastic blockmodels. Statistics and Computing 30(5), 1291–1307.
- B. Perozzi, R. Al-Rfou, and S. Skiena (2014). DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
- J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang (2018). Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining, 459–467.
- B. Recht, M. Fazel, and P. A. Parrilo (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501.
- R. A. Rossi and N. Ahmed (2015). The network data repository with interactive graph analytics and visualization. In AAAI Conference on Artificial Intelligence.
- O. Roy and M. Vetterli (2007). The effective rank: a measure of effective dimensionality. In 2007 15th European Signal Processing Conference, 606–610.
- P. Rubin-Delanchy, J. Cape, M. Tang, and C. E. Priebe (2022). A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(4), 1446–1473.
- A. Sanyal, P. H. S. Torr, and P. K. Dokania (2020). Stable rank normalization for improved generalization in neural networks and GANs. arXiv preprint arXiv:1906.04659.
- N. Srebro, J. Rennie, and T. Jaakkola (2004). Maximum-margin matrix factorization. Advances in Neural Information Processing Systems 17.
- C. Stark, B. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers (2006). BioGRID: a general repository for interaction datasets. Nucleic Acids Research 34.
- D. L. Sussman, M. Tang, and C. E. Priebe (2013). Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(1), 48–57.
- R. Taing and K. Levin (2026). On the effect of misspecifying the embedding dimension in low-rank network models. arXiv preprint arXiv:2601.06014.
- A. L. Traud, P. J. Mucha, and M. A. Porter (2012). Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications 391(16), 4165–4180.
- L. W. Tu (2011). An Introduction to Manifolds. 2nd edition, Springer.
- U. von Luxburg (2007). A tutorial on spectral clustering. arXiv preprint arXiv:0711.0189.
- X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang (2017). Community preserving network embedding. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
- D. J. Watts and S. H. Strogatz (1998). Collective dynamics of 'small-world' networks. Nature 393, 440–442.
- L. Wei, Z. Tan, C. Li, J. Wang, and W. Huang (2024). Diff-eRank: a novel rank-based metric for evaluating large language models. Advances in Neural Information Processing Systems 37, 39501–39521.
- J. Yang, Y. Zhao, and Q. Zhu (2024). Effective rank and the staircase phenomenon: new insights into neural network training dynamics. arXiv e-prints, arXiv–2412.
- M. Zhu and A. Ghodsi (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis 51(2), 918–930.

Table 3: Notation used throughout the paper. Symbols appear in the order they are introduced. The two effective-dimension symbols $d_{\mathrm{spec}}$ and $d_{\mathrm{pre}}$ are conceptually distinct: $d_{\mathrm{spec}}$ is the continuous Shannon effective rank of the fitted kernel (controlled at training time by $\eta$), while $d_{\mathrm{pre}}$ is the integer prefix size chosen post hoc when extracting a rank-$d_{\mathrm{pre}}$ prefix of the fitted kernel.

| Symbol | Meaning |
| --- | --- |
| *Graph and observations* | |
| $\mathcal{G}=(V,E)$ | Simple undirected graph with node set $V$ and edge set $E$. |
| $N$ | Number of nodes, $N=\lvert V\rvert$. |
| $\bm{Y}\in\{0,1\}^{N\times N}$ | Symmetric adjacency matrix, $\bm{Y}_{ii}=0$. |
| *Latent representation* | |
| $L\in\mathbb{R}^{N\times r}$ | Latent factor; rank cap $r$. |
| $r$ | Parameterization ceiling on $\operatorname{rank}(L)$. |
| $K(L)\in\mathbb{R}^{N\times N}$ | Trace-normalized PSD kernel, $K(L)=N\,LL^{\top}/\operatorname{tr}(LL^{\top})$. |
| $\bm{a}\in\mathbb{R}^{N}$ | Node-specific log-odds offsets. |
| $\beta>0$ | Global slope on the latent affinity. |
| $\phi(t)$ | Sigmoid, $\phi(t)=(1+e^{-t})^{-1}$. |
| *Spectrum and effective dimension* | |
| $\lambda_j(K)$ | The $j$-th eigenvalue of $K$, sorted descending. |
| $u_j$ | Orthonormal eigenvector associated with $\lambda_j(K)$. |
| $U,\Lambda$ | Orthonormal eigenvector matrix and diagonal eigenvalue matrix in $K=U\Lambda U^{\top}$. |
| $p_i(K)$ | Spectral occupancy, $p_i(K)=\lambda_i(K)/N$. |
| $p(K)$ | Spectral occupancy distribution $(p_1(K),\ldots,p_N(K))$ on the $(N-1)$-simplex. |
| $H(p)$ | Shannon entropy of $p$, $H(p)=-\sum_i p_i\log p_i$. |
| $d_{\mathrm{spec}}(K)$ | Effective spectral dimension, $d_{\mathrm{spec}}(K)=\exp(H(p(K)))$. Continuous; ranges over $[1,\operatorname{rank}(K)]$. |
| $d_{\mathrm{spec}}(\eta)$ | Realized $d_{\mathrm{spec}}$ along the $\eta$-controlled solution branch, $d_{\mathrm{spec}}(\eta)=d_{\mathrm{spec}}(K(L^{\star}(\eta)))$. |
| $d_{\mathrm{spec}}^{\star}$ | Target value of $d_{\mathrm{spec}}$ specified by the user; $\eta$ is calibrated so that $d_{\mathrm{spec}}(\eta)\approx d_{\mathrm{spec}}^{\star}$. |
| $\mathrm{PR}(K)$ | Participation ratio, $\mathrm{PR}(K)=1/\sum_i p_i(K)^2$. |
| $k_{\tau}(K)$ | Thresholded active rank, $k_{\tau}(K)=\lvert\{i:\lambda_i(K)\geq\tau\,\lambda_1(K)\}\rvert$. |
| *Spectral prefixes* | |
| $d_{\mathrm{pre}}$ | Integer prefix size used for prefix-based visualization and analysis; $1\leq d_{\mathrm{pre}}\leq r$. |
| $K_{d_{\mathrm{pre}}}$ | Rank-$d_{\mathrm{pre}}$ spectral prefix of $K$, $K_{d_{\mathrm{pre}}}=U_{1:d_{\mathrm{pre}}}\Lambda_{1:d_{\mathrm{pre}}}U_{1:d_{\mathrm{pre}}}^{\top}$. EYM-optimal among rank-$d_{\mathrm{pre}}$ PSD approximations of $K$ (Thm. [2](https://arxiv.org/html/2605.11142#Thmtheorem2)). |
| $\pi_{ij}$ | Soft membership of node $i$ in prefix mode $j$, $\pi_{ij}\propto q_j[i]^2$, normalized to a row-simplex. |
| *Training objective and capacity control* | |
| $\eta\in\mathbb{R}$ | Spectral-entropy weight (Lagrange-style coefficient) controlling realized $d_{\mathrm{spec}}$. |
| $\ell(L,\bm{a},\beta)$ | Sampled link-prediction negative log-likelihood. |
| $R(L,\bm{a},\beta)$ | Quadratic regularization on parameters. |
| $F_{\eta}(L,\bm{a},\beta)$ | Entropy-regularized objective, $F_{\eta}=\ell-\eta\log d_{\mathrm{spec}}(K(L))+R$. |
| $L^{\star}(\eta)$ | Local minimizer of $F_{\eta}$ along a fitted branch. |
| $K^{\star}(\eta)$ | Trace-normalized kernel at $L^{\star}(\eta)$. |
| $\widetilde{\Theta}$ | Gauge-fixed slice of the parameter space used to state second-order conditions. |
| *Calibration of $\eta$* | |
| $\tau$ | Relative tolerance of the bisection stopping criterion, $\lvert d_{\mathrm{spec}}(\eta)-d_{\mathrm{spec}}^{\star}\rvert/d_{\mathrm{spec}}^{\star}\leq\tau$. |
| $\eta^{\star}$ | Calibrated entropy weight selected by bisection. |
| $[\eta_{\min},\eta_{\max}]$ | Bracket interval used in bisection; $W=\eta_{\max}-\eta_{\min}$. |
| $\delta$ | Target $\eta$-precision in the complexity statement. |
| *Evaluation* | |
| $d$ | Nominal embedding dimension at which baselines are trained (matched to Spectra's rank cap $r=d$ in Table [2](https://arxiv.org/html/2605.11142#S4.T2)). |
| $\mathrm{TLL}$ | Test log-likelihood. |
| $\mathrm{trNLL}$ | Training negative log-likelihood. |
| $\mathrm{GenGap}$ | Generalization gap, $\mathrm{GenGap}=\mathrm{trNLL}+\mathrm{TLL}$. |
| $S$ | Number of random seeds per experimental cell. |
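As a reading aid for the table, the following sketch computes the central quantities $K(L)$, $p(K)$, $d_{\mathrm{spec}}$, $\mathrm{PR}$, and $k_{\tau}$ from a factor matrix $L$. The function and variable names are ours, not the paper's code.

```python
import numpy as np

def capacity_summaries(L, tau=0.05):
    """Spectral-occupancy summaries from Table 3 for a factor matrix L."""
    N = L.shape[0]
    K = N * (L @ L.T) / np.trace(L @ L.T)        # trace-normalized PSD kernel
    lam = np.linalg.eigvalsh(K)[::-1]            # eigenvalues, descending
    p = np.clip(lam, 0.0, None) / N              # spectral occupancy; sums to 1
    p_pos = p[p > 0]
    d_spec = np.exp(-np.sum(p_pos * np.log(p_pos)))   # Shannon effective rank
    pr = 1.0 / np.sum(p**2)                           # participation ratio
    k_tau = int(np.sum(lam >= tau * lam[0]))          # thresholded active rank
    return d_spec, pr, k_tau
```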

## Appendix A Extra Preliminaries

We give some basic definitions used in the proofs. We only consider smooth submanifolds of Euclidean space.

The definitions in this appendix are standard; see, for example, Lee [[2013](https://arxiv.org/html/2605.11142#bib.bib115)] and Tu [[2011](https://arxiv.org/html/2605.11142#bib.bib116)].

###### Definition 7 (Smooth embedded submanifold).

Let $M \subset \mathbb{R}^D$. We say that $M$ is a smooth embedded submanifold of dimension $m$ if, around every point $p \in M$, there exist an open set $W \subset \mathbb{R}^m$ and a smooth parametrization $\psi: W \to \mathbb{R}^D$ such that $\psi(W)$ is a neighborhood of $p$ in $M$, the map $\psi: W \to \psi(W)$ is one-to-one with smooth inverse, and the Jacobian matrix $J_{\psi}(x)$ has rank $m$ for every $x \in W$.

###### Definition 8 (Local chart).

Let $M \subset \mathbb{R}^D$ be a smooth embedded submanifold. A local chart around $p \in M$ is a smooth parametrization $\psi: W \subset \mathbb{R}^m \to M$ whose image is a neighborhood of $p$ in $M$. If $p = \psi(x)$, then $x \in \mathbb{R}^m$ is the local coordinate representation of $p$. Thus a function $f: M \to \mathbb{R}$ can locally be written as the ordinary Euclidean function $\widehat{f}(x) := f(\psi(x))$.

###### Definition 9 (Tangent vector).

Let $M \subset \mathbb{R}^D$ be a smooth embedded submanifold and let $p \in M$. A vector $v \in \mathbb{R}^D$ is tangent to $M$ at $p$ if there exists a smooth curve $\gamma: (-\varepsilon, \varepsilon) \to M$ such that $\gamma(0) = p$ and $\gamma'(0) = v$.

###### Definition 10 (Tangent space).

The tangent space of $M$ at $p$, denoted $T_p M$, is the set of all tangent vectors to $M$ at $p$:
$$T_p M := \left\{ \gamma'(0) \mid \gamma: (-\varepsilon, \varepsilon) \to M \text{ is smooth and } \gamma(0) = p \right\}.$$
It is a vector subspace of the ambient space $\mathbb{R}^D$.

###### Definition 11 (Tangent space in local coordinates).

Let $\psi: W \subset \mathbb{R}^m \to M$ be a local chart and let $p = \psi(x)$. Then $T_p M = \operatorname{Im} J_{\psi}(x)$. Equivalently, every tangent vector $v \in T_p M$ can be written as $v = J_{\psi}(x)\,u$ for some $u \in \mathbb{R}^m$.

###### Definition 12 (Tangent space of a constraint set).

Suppose
$$M = \{x \in \mathbb{R}^D : c(x) = 0\},$$
where $c: \mathbb{R}^D \to \mathbb{R}^q$ is smooth. Suppose also that the Jacobian matrix $J_c(p)$ has full rank for every $p \in M$. Then $M$ is a smooth embedded submanifold, and its tangent space is
$$T_p M = \ker J_c(p).$$
Equivalently, $v \in T_p M \iff J_c(p)\,v = 0$.

## Appendix B Proof of Theorem [2](https://arxiv.org/html/2605.11142#Thmtheorem2)

Since $K$ is symmetric and PSD, the spectral theorem gives an orthonormal eigenbasis for its image $\mathrm{Im}(K)=\{Kx : x\in\mathbb{R}^N\}$. Thus we may write
$$K=\sum_{j=1}^{r}\lambda_j u_j u_j^{\top},$$
where $r=\operatorname{rank}(K)$, the vectors $u_1,\ldots,u_r$ are orthonormal, and the nonzero eigenvalues satisfy $\lambda_j>0$. Equivalently,
$$K=U\operatorname{diag}(\lambda_1,\ldots,\lambda_r)U^{\top},$$
where $U=(u_1,\ldots,u_r)$ and $U^{\top}U=I_r$. By the Eckart–Young–Mirsky theorem (Eckart and Young [[1936](https://arxiv.org/html/2605.11142#bib.bib65)]), the best rank-$k$ approximation to $K$ in Frobenius norm, among all matrices of rank at most $k$, is obtained by keeping the $k$ largest singular values and their corresponding singular vectors. Since $K\succeq 0$, its singular values are exactly its eigenvalues. Therefore the unconstrained best rank-$k$ approximation is
$$K_k=\sum_{j=1}^{k}\lambda_j u_j u_j^{\top}.$$
Next, we check that the positive semidefinite constraint does not change the optimizer. The optimization over all rank-$k$ matrices is a relaxation of the optimization over rank-$k$ PSD matrices:
$$\min_{\operatorname{rank}(M)\leq k}\|K-M\|_F^2 \;\leq\; \min_{\substack{M\succeq 0\\ \operatorname{rank}(M)\leq k}}\|K-M\|_F^2.$$
But the unconstrained minimizer $K_k$ is itself PSD, because it is a nonnegative linear combination of rank-one PSD matrices. Therefore it is also optimal among PSD rank-$k$ matrices.

It remains to prove the uniqueness statement. Assume that $\lambda_k>\lambda_{k+1}$, and let $M\succeq 0$ with $\operatorname{rank}(M)\leq k$ be any minimizer. Write the spectral decomposition of $M$ as
$$M=\sum_{i=1}^{k}\mu_i v_i v_i^{\top},$$
where $\mu_1\geq\cdots\geq\mu_k\geq 0$ and $v_1,\ldots,v_k$ are orthonormal. Using the Frobenius inner product,
$$\|K-M\|_F^2=\|K\|_F^2+\|M\|_F^2-2\langle K,M\rangle.$$
Since $K\succeq 0$ and $M\succeq 0$, von Neumann's trace inequality gives
$$\langle K,M\rangle=\operatorname{tr}(KM)\leq\sum_{i=1}^{k}\lambda_i\mu_i.$$
Therefore,
$$\|K-M\|_F^2 \;\geq\; \sum_{j=1}^{r}\lambda_j^2+\sum_{i=1}^{k}\mu_i^2-2\sum_{i=1}^{k}\lambda_i\mu_i \;=\; \sum_{i=1}^{k}(\mu_i-\lambda_i)^2+\sum_{j=k+1}^{r}\lambda_j^2.$$
On the other hand, the truncated matrix $K_k$ satisfies $\|K-K_k\|_F^2=\sum_{j=k+1}^{r}\lambda_j^2$. Since $M$ is also assumed optimal, we must have $\|K-M\|_F^2=\sum_{j=k+1}^{r}\lambda_j^2$. Comparing this with the lower bound above, we get
$$\sum_{i=1}^{k}(\mu_i-\lambda_i)^2=0,$$
hence $\mu_i=\lambda_i$ for $i=1,\ldots,k$.

Moreover, equality must also hold in von Neumann's trace inequality, which implies that the image of $M$ is a top-$k$ eigenspace of $K$. Because $\lambda_k>\lambda_{k+1}$, this top-$k$ eigenspace is uniquely determined, namely $\operatorname{span}\{u_1,\ldots,u_k\}$. Therefore $M$ has eigenvalues $\lambda_1,\ldots,\lambda_k$ on this uniquely determined subspace and is zero on its orthogonal complement. Consequently,
$$M=\sum_{j=1}^{k}\lambda_j u_j u_j^{\top}=K_k.$$
Thus $K_k$ is the unique minimizer. ∎
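The two steps of the proof, PSD-ness of the truncation and the tail-mass error identity, are easy to check numerically; the following sketch, with illustrative sizes of our choosing, does so.

```python
import numpy as np

# Numerical sanity check of Theorem 2 (a sketch, not the paper's code):
# the spectral prefix K_k attains the tail eigenvalue mass in Frobenius error,
# and random PSD competitors of the same rank never beat it.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 8))
K = A @ A.T                                     # PSD, rank 8
lam, U = np.linalg.eigh(K)
lam, U = lam[::-1], U[:, ::-1]                  # descending order

k = 3
K_k = U[:, :k] @ np.diag(lam[:k]) @ U[:, :k].T
err_opt = np.linalg.norm(K - K_k, "fro")**2
assert np.isclose(err_opt, np.sum(lam[k:]**2))  # equals the tail mass

for _ in range(100):
    B = rng.normal(size=(30, k))
    M = B @ B.T                                 # random PSD matrix of rank <= k
    assert np.linalg.norm(K - M, "fro")**2 >= err_opt - 1e-9
```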

## Appendix C Proof of Theorem [3](https://arxiv.org/html/2605.11142#Thmtheorem3)

Since $K$ is symmetric and PSD, the spectral theorem gives an orthonormal basis of eigenvectors for its image. Thus $K$ can be written as
$$K=\sum_{j=1}^{r}\lambda_j u_j u_j^{\top},$$
where $u_1,\ldots,u_r$ are orthonormal eigenvectors and $\lambda_1,\ldots,\lambda_r$ are the positive eigenvalues.

For any eigenvalue $\lambda_j$, the associated eigenspace is
$$E_{\lambda_j}=\{v : Kv=\lambda_j v\}.$$
Since $E_{\lambda_j}$ is one-dimensional, any unit eigenvector $u'_j$ associated with $\lambda_j$ satisfies $u'_j=c_j u_j$ for some scalar $c_j$. Because the vectors in the eigendecomposition are orthonormal, $\|u'_j\|_2=\|u_j\|_2=1$, so $|c_j|=1$ and hence $c_j\in\{\pm 1\}$. Therefore
$$u'_j=\pm u_j. \tag{9}$$

Now suppose
$$K=U\operatorname{diag}(\lambda_1,\ldots,\lambda_r)U^{\top}=U'\operatorname{diag}(\lambda'_1,\ldots,\lambda'_r){U'}^{\top}$$
are two spectral decompositions, with $U^{\top}U={U'}^{\top}U'=I_r$.

For each $\ell\in\{1,\ldots,r\}$, the vector $u'_{\ell}$, the $\ell$-th column of $U'$, is a unit eigenvector of $K$ with eigenvalue $\lambda'_{\ell}$. Since the positive eigenvalues of $K$ are exactly $\lambda_1,\ldots,\lambda_r$, and these eigenvalues are pairwise distinct, there exists a unique index $\sigma(\ell)\in\{1,\ldots,r\}$ such that
$$\lambda'_{\ell}=\lambda_{\sigma(\ell)}.$$
The map $\sigma:\{1,\ldots,r\}\to\{1,\ldots,r\}$ is a permutation.

Since both $u'_{\ell}$ and $u_{\sigma(\ell)}$ are unit eigenvectors in the same one-dimensional eigenspace $E_{\lambda_{\sigma(\ell)}}$, by ([9](https://arxiv.org/html/2605.11142#A3.E9)) there exists a sign $s_{\ell}\in\{\pm 1\}$ such that
$$u'_{\ell}=s_{\ell}\,u_{\sigma(\ell)}.$$

Let $\Pi$ be the permutation matrix whose $\ell$-th column is $e_{\sigma(\ell)}$, and let
$$S=\operatorname{diag}(s_1,\ldots,s_r).$$
Then the column-wise identities above imply
$$U'=U\Pi S.$$
Moreover, the identities $\lambda'_{\ell}=\lambda_{\sigma(\ell)}$ imply $\lambda'=\Pi^{\top}\lambda$, and the proof is complete. ∎
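The conclusion $U'=U\Pi S$ can likewise be checked numerically. In the sketch below (sizes and seeds ours), the second decomposition comes from a standard eigensolver, so after sorting the permutation is the identity and only the column signs remain.

```python
import numpy as np

# Sketch of the identifiability statement of Theorem 3: two orthonormal
# eigendecompositions of the same kernel with distinct positive eigenvalues
# agree up to a permutation and column signs.
rng = np.random.default_rng(2)
r, N = 5, 40
lam = np.sort(rng.uniform(1.0, 10.0, size=r))[::-1]   # distinct, positive
Q, _ = np.linalg.qr(rng.normal(size=(N, r)))          # orthonormal columns
K = Q @ np.diag(lam) @ Q.T

w, V = np.linalg.eigh(K)                              # a second decomposition
w, V = w[::-1], V[:, ::-1]                            # sort descending
U_prime = V[:, :r]

# Matching eigenvalues fixes the permutation (identity after sorting);
# the remaining freedom is one sign per column.
signs = np.sign(np.sum(U_prime * Q, axis=0))
assert np.allclose(U_prime, Q * signs, atol=1e-8)
assert np.allclose(w[:r], lam, atol=1e-8)
```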

## Appendix D Smoothness of the restricted objective, $F_{\eta}|_{\widetilde{\Theta}}$

###### Lemma 13.

For any $\eta$, the restricted objective $F_{\eta}|_{\widetilde{\Theta}}$ is $\mathcal{C}^2$.

###### Proof.

Recall that
$$F_{\eta}(\theta)=\Phi(\theta)-\eta h(\theta),\qquad \Phi(\theta):=\ell(\theta)+R(\theta),\qquad h(\theta):=\log d_{\mathrm{spec}}(K(L)).$$
The logistic negative log-likelihood $\ell$ is $\mathcal{C}^{\infty}$ in $\theta$ on $\{L\neq 0\}$, as it is a finite sum of terms $\log\phi(\pm(a_i+a_j+\beta K_{ij}(L)))$, where $\phi$ and $\log\phi$ are $\mathcal{C}^{\infty}$ on $\mathbb{R}$, and $K(L)=N\,LL^{\top}/\|L\|_F^2$ is rational in $L$ with denominator nonzero on $\{L\neq 0\}$. The regularizer $R$ is a quadratic polynomial and therefore $\mathcal{C}^{\infty}$. It remains to show that $h|_{\widetilde{\Theta}}$ is smooth.

Let
$$\theta=([L_1\mid 0],a,\beta)\in\widetilde{\Theta},$$
and write the columns of $L_1$ as $L_1=(\ell_1,\ldots,\ell_s)$. Since $L_1^{\top}L_1$ is diagonal and $L_1$ has full column rank, define
$$d_i:=\|\ell_i\|_2^2>0,\qquad Q:=\sum_{j=1}^{s}d_j=\|L_1\|_F^2.$$
Then $L_1^{\top}L_1=\operatorname{diag}(d_1,\ldots,d_s)$. The nonzero eigenvalues of $L_1 L_1^{\top}$ are the same as the eigenvalues of $L_1^{\top}L_1$. Therefore the nonzero eigenvalues of
$$K(L)=N\,\frac{L_1 L_1^{\top}}{\|L_1\|_F^2}$$
are
$$\lambda_i(K(L))=N\,\frac{d_i}{Q},\qquad i=1,\ldots,s.$$
Hence the active spectral occupancies are
$$p_i(L)=\frac{\lambda_i(K(L))}{N}=\frac{d_i}{Q},\qquad i=1,\ldots,s.$$
Thus, on $\widetilde{\Theta}$,
$$h(\theta)=\log d_{\mathrm{spec}}(K(L))=-\sum_{i=1}^{s}\frac{d_i}{Q}\log\!\left(\frac{d_i}{Q}\right).$$
Each $d_i$ is a polynomial in the entries of $L_1$, and on $\widetilde{\Theta}$ we have $d_i>0$ and $Q>0$. Therefore the expression above is smooth in $L_1$; it does not depend on $a$ or $\beta$. Hence $h|_{\widetilde{\Theta}}$ is smooth.

Consequently,
$$F_{\eta}|_{\widetilde{\Theta}}=\Phi|_{\widetilde{\Theta}}-\eta\,h|_{\widetilde{\Theta}}$$
is $\mathcal{C}^2$ for every fixed $\eta$. This proves the lemma. ∎
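The key identity of the proof, that on the gauge-fixed slice the nonzero eigenvalues of $K(L)$ are $N d_i/Q$, can be verified directly. The sketch below (dimensions ours) constructs an $L_1$ with orthogonal columns and compares.

```python
import numpy as np

# Sketch of the computation in Lemma 13: for L = [L1 | 0] with orthogonal
# columns, the nonzero eigenvalues of K(L) are N * d_i / Q, so h(theta)
# reduces to the entropy of the normalized column energies d_i / Q.
rng = np.random.default_rng(3)
N, s, r = 50, 4, 10
Q_basis, _ = np.linalg.qr(rng.normal(size=(N, s)))
scales = rng.uniform(0.5, 3.0, size=s)
L1 = Q_basis * scales                      # orthogonal columns; L1^T L1 diagonal
L = np.hstack([L1, np.zeros((N, r - s))])

d = np.sum(L1**2, axis=0)                  # d_i = ||ell_i||^2
Q = d.sum()                                # Q = ||L1||_F^2
K = N * (L @ L.T) / np.trace(L @ L.T)
lam = np.sort(np.linalg.eigvalsh(K))[::-1][:s]
assert np.allclose(np.sort(lam), np.sort(N * d / Q))

p = d / Q
h = -np.sum(p * np.log(p))                 # = log d_spec(K(L)) on the slice
```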

## Appendix E Proof of Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6)

![Refer to caption](https://arxiv.org/html/2605.11142v1/x17.png)
(a)
![Refer to caption](https://arxiv.org/html/2605.11142v1/x18.png)
(b)

Figure 4: Illustration of Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6). (a) Solution branches in $\widetilde{\Theta}$. Near a reference point $\theta^{*}(\eta_0)$ satisfying Assumptions [4](https://arxiv.org/html/2605.11142#Thmtheorem4) and [5](https://arxiv.org/html/2605.11142#Thmtheorem5), the optimum traces a smooth curve as $\eta$ varies, with each $\theta^{*}(\eta)$ a strict local minimum of $F_{\eta}$ on its slice. $F_{\eta}$ may have additional local minima, each generating its own local branch under the same conditions. (b) Capacity profiles $d_{\mathrm{spec}}(\eta)$ along the branches. The tracked branch is smooth and nondecreasing on $U$. Beyond $U$, structural events on the tracked branch can interrupt the local guarantee.

First, we show that the set $\widetilde{\Theta}$ is a smooth submanifold and define the tangent space $T_{\theta}\widetilde{\Theta}$. Then we prove the three parts of Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6).

#### Smooth Submanifold and Tangent Space.

Recall the definition of the set $\widetilde{\Theta}$:
$$\widetilde{\Theta}:=\left\{([L_1\mid 0],a,\beta): L_1\in\mathbb{R}^{N\times s},\ \operatorname{rank}(L_1)=s,\ L_1^{\top}L_1\text{ is diagonal},\ a\in\mathbb{R}^{N},\ \beta>0\right\}.$$
The set $\widetilde{\Theta}$ is a smooth submanifold as per Definition [7](https://arxiv.org/html/2605.11142#Thmtheorem7): indeed, the conditions $\operatorname{rank}(L_1)=s$ and $\beta>0$ are open conditions, while the constraints $L_2=0$ and $\operatorname{offdiag}(L_1^{\top}L_1)=0$ form a smooth submersion on the full-column-rank set; hence the claim follows from the regular level-set theorem (see Chapter 8 of Lee [[2013](https://arxiv.org/html/2605.11142#bib.bib115)]).

Fix a point $\theta=([L_1\mid 0],a,\beta)\in\widetilde{\Theta}$. Next we show that the tangent space of $\widetilde{\Theta}$ is
$$T_{\theta}\widetilde{\Theta}=\left\{([\dot{L}_1\mid 0],\dot{a},\dot{\beta}):\operatorname{offdiag}\!\left(L_1^{\top}\dot{L}_1+\dot{L}_1^{\top}L_1\right)=0\right\},$$
using Definition [12](https://arxiv.org/html/2605.11142#Thmtheorem12). We write a general factor matrix $L\in\mathbb{R}^{N\times r}$ as $L=[L_1\mid L_2]$, where $L_1\in\mathbb{R}^{N\times s}$ and $L_2\in\mathbb{R}^{N\times(r-s)}$. Consider the open set $\mathcal{U}:=\{(L_1,L_2,a,\beta):\operatorname{rank}(L_1)=s,\ \beta>0\}$. Then $\widetilde{\Theta}\subseteq\mathcal{U}$ and is defined by the smooth equality constraints $L_2=0$ and $\operatorname{offdiag}(L_1^{\top}L_1)=0$. Equivalently, define $c:\mathcal{U}\to\mathbb{R}^{N\times(r-s)}\times\mathbb{S}_0^{s}$, where $\mathbb{S}_0^{s}:=\{B\in\mathbb{R}^{s\times s}:B^{\top}=B,\ B_{ii}=0\text{ for all }i\}$, by
$$c(L_1,L_2,a,\beta):=\left(L_2,\ \operatorname{offdiag}(L_1^{\top}L_1)\right).$$
Then $\widetilde{\Theta}=c^{-1}(0)$. After identifying matrix spaces with Euclidean spaces by vectorization, this is exactly of the form $M=\{x:c(x)=0\}$ in Definition [12](https://arxiv.org/html/2605.11142#Thmtheorem12). Thus $T_{\theta}\widetilde{\Theta}=\ker J_c(\theta)$, which we calculate next.

Let $\dot{\theta}=([\dot{L}_1\mid\dot{L}_2],\dot{a},\dot{\beta})$ be an arbitrary direction in the ambient space. The first component of $c$ is the map $(L_1,L_2,a,\beta)\mapsto L_2$, which is linear in $L_2$ and independent of $L_1,a,\beta$. Therefore the Jacobian of this component maps $([\dot{L}_1\mid\dot{L}_2],\dot{a},\dot{\beta})\mapsto\dot{L}_2$.

It remains to compute the Jacobian of the second component, $L_1\mapsto\operatorname{offdiag}(L_1^{\top}L_1)$. Write $L_1=(\ell_1,\ldots,\ell_s)$ and $\dot{L}_1=(\dot{\ell}_1,\ldots,\dot{\ell}_s)$, where $\ell_i,\dot{\ell}_i\in\mathbb{R}^{N}$. For $i\neq j$, define the scalar constraint
$$c_{ij}(L_1):=(L_1^{\top}L_1)_{ij}=\ell_i^{\top}\ell_j=\sum_{k=1}^{N}(L_1)_{ki}(L_1)_{kj}.$$
We identify $L_1$ with the vector $\operatorname{vec}(L_1)$, so $J_{c_{ij}}(L_1)$ denotes the ordinary Jacobian row of the scalar function $c_{ij}$ with respect to the entries of $L_1$. For $a\in[N]$ and $b\in[s]$, we have
$$\frac{\partial c_{ij}}{\partial (L_1)_{ab}}(L_1)=\mathbf{1}_{\{b=i\}}(L_1)_{aj}+\mathbf{1}_{\{b=j\}}(L_1)_{ai}.$$
Therefore, applying this Jacobian row to the direction $\operatorname{vec}(\dot{L}_1)$, we obtain
$$J_{c_{ij}}(L_1)\operatorname{vec}(\dot{L}_1)=\sum_{a=1}^{N}\sum_{b=1}^{s}\frac{\partial c_{ij}}{\partial (L_1)_{ab}}(L_1)\,(\dot{L}_1)_{ab}=\sum_{a=1}^{N}(L_1)_{aj}(\dot{L}_1)_{ai}+\sum_{a=1}^{N}(L_1)_{ai}(\dot{L}_1)_{aj}=\ell_j^{\top}\dot{\ell}_i+\ell_i^{\top}\dot{\ell}_j=\bigl(L_1^{\top}\dot{L}_1+\dot{L}_1^{\top}L_1\bigr)_{ij}.$$
Collecting these identities over all off-diagonal pairs $i\neq j$, we get $J_{\operatorname{offdiag}(L_1^{\top}L_1)}(L_1)\operatorname{vec}(\dot{L}_1)=\operatorname{offdiag}\bigl(L_1^{\top}\dot{L}_1+\dot{L}_1^{\top}L_1\bigr)$.

Combining the two components of $c$, the full Jacobian satisfies
$$J_c(L_1,L_2,a,\beta)\,(\dot{L}_1,\dot{L}_2,\dot{a},\dot{\beta})=\left(\dot{L}_2,\ \operatorname{offdiag}\bigl(L_1^{\top}\dot{L}_1+\dot{L}_1^{\top}L_1\bigr)\right).$$
By Definition [12](https://arxiv.org/html/2605.11142#Thmtheorem12), the tangent space is the kernel of the Jacobian, i.e., all $\dot{\theta}$ such that $J_c(\theta)\dot{\theta}=0$. Therefore, using the formula above, we get the result
$$T_{\theta}\widetilde{\Theta}=\left\{([\dot{L}_1\mid 0],\dot{a},\dot{\beta}):\operatorname{offdiag}\bigl(L_1^{\top}\dot{L}_1+\dot{L}_1^{\top}L_1\bigr)=0\right\}.$$

#### Claim (i)

Recall that $\theta^{\star}\in\widetilde{\Theta}$ satisfies Assumptions [4](https://arxiv.org/html/2605.11142#Thmtheorem4) and [5](https://arxiv.org/html/2605.11142#Thmtheorem5), and
$$F_{\eta}(\theta)=\Phi(\theta)-\eta h(\theta),\qquad \Phi(\theta):=\ell(\theta)+R(\theta),\qquad h(\theta):=\log d_{\mathrm{spec}}(K(L)).$$
Since $\widetilde{\Theta}$ is a smooth manifold as per Definition [7](https://arxiv.org/html/2605.11142#Thmtheorem7), there exists a smooth local chart $\psi:W\subset\mathbb{R}^m\to\widetilde{\Theta}$ with $\psi(0)=\theta^{\star}$. Define $\widehat{F}(x,\eta):=F_{\eta}(\psi(x))$, $\widehat{h}(x):=h(\psi(x))$, and $\widehat{\Phi}(x):=\Phi(\psi(x))$. Then
$$\widehat{F}(x,\eta)=\widehat{\Phi}(x)-\eta\widehat{h}(x).$$
By Lemma [13](https://arxiv.org/html/2605.11142#Thmtheorem13) in Appendix [D](https://arxiv.org/html/2605.11142#A4), $\widehat{F}$ is $\mathcal{C}^2$ in $(x,\eta)$. Now define $G:\mathbb{R}^m\times\mathbb{R}\to\mathbb{R}^m$ as
$$G(x,\eta):=\nabla_x\widehat{F}(x,\eta).$$
By Assumption [5](https://arxiv.org/html/2605.11142#Thmtheorem5), since $\psi(0)=\theta^{\star}$, it holds that $G(0,\eta_0)=\nabla_x\widehat{F}(0,\eta_0)=0$. Moreover, $\nabla_x G(0,\eta_0)=\nabla_x^2\widehat{F}(0,\eta_0)$, which is positive definite by Assumption [5](https://arxiv.org/html/2605.11142#Thmtheorem5); in particular, it is invertible. Hence, by the implicit function theorem [Lee, [2013](https://arxiv.org/html/2605.11142#bib.bib115), Ch. 5], there exist an open interval $U\ni\eta_0$ and a $\mathcal{C}^1$ map $x^{\star}:U\to W$ such that $x^{\star}(\eta_0)=0$ and $G(x^{\star}(\eta),\eta)=0$ for every $\eta\in U$. Define
$$\theta^{\star}(\eta):=\psi(x^{\star}(\eta)).$$
Then $\theta^{\star}:U\to\widetilde{\Theta}$ is $\mathcal{C}^1$, $\theta^{\star}(\eta_0)=\theta^{\star}$, and
$$\nabla_{\theta}F_{\eta}(\theta^{\star}(\eta))\big|_{T_{\theta^{\star}(\eta)}\widetilde{\Theta}}=0\qquad\text{for every }\eta\in U. \tag{10}$$
This proves the restricted-criticality part of claim (i). It remains to show that these restricted critical points, i.e., those satisfying Equation ([10](https://arxiv.org/html/2605.11142#A5.E10)), are strict local minima. Define
$$H(\eta):=\nabla_x^2\widehat{F}(x^{\star}(\eta),\eta).$$
The map $\eta\mapsto H(\eta)$ is continuous, because $\widehat{F}$ is $\mathcal{C}^2$ and $x^{\star}$ is $\mathcal{C}^1$. For $\eta=\eta_0$, we have $H(\eta_0)=\nabla_x^2\widehat{F}(0,\eta_0)\succ 0$ by Assumption [5](https://arxiv.org/html/2605.11142#Thmtheorem5). Positive definiteness is an open condition, so after possibly shrinking $U$ we have $H(\eta)\succ 0$ for every $\eta\in U$.

Thus for any $\eta\in U$ the function $x\mapsto\widehat{F}(x,\eta)$ satisfies
$$\nabla_x\widehat{F}(x^{\star}(\eta),\eta)=0,\qquad \nabla_x^2\widehat{F}(x^{\star}(\eta),\eta)=H(\eta)\succ 0.$$
By the standard second-order sufficient condition for an unconstrained local minimum [Nocedal and Wright, [2006](https://arxiv.org/html/2605.11142#bib.bib2), Ch. 2], $x^{\star}(\eta)$ is a strict local minimum of $x\mapsto\widehat{F}(x,\eta)$. Since $\psi$ is a local coordinate chart for $\widetilde{\Theta}$, this is exactly the statement that $\theta^{\star}(\eta)$ is a strict local minimum of $F_{\eta}$ restricted to $\widetilde{\Theta}$. Claim (i) follows.

#### Claim (ii)

By claim (i), the map $\eta\mapsto\theta^{\star}(\eta)$ is $\mathcal{C}^1$ and takes values in $\widetilde{\Theta}$. Therefore we may write $L^{\star}(\eta)=[L_1^{\star}(\eta)\mid 0]$, where $L_1^{\star}(\eta)=(\ell_1(\eta),\ldots,\ell_s(\eta))$ depends $\mathcal{C}^1$ on $\eta$. Define
$$d_i(\eta):=\|\ell_i(\eta)\|_2^2,\qquad Q(\eta):=\sum_{j=1}^{s}d_j(\eta).$$
Since $d_i(\eta_0)=d_i^{\star}>0$, continuity implies that, after shrinking $U$ if necessary, $d_i(\eta)>0$ for every $i=1,\ldots,s$ and every $\eta\in U$. Therefore
$$p_i(\eta):=\frac{d_i(\eta)}{Q(\eta)}$$
is $\mathcal{C}^1$ on $U$. Notice that $p_i(\eta)=\lambda_i(K(L^{\star}(\eta)))/N$, since $(L^{\star})^{\top}L^{\star}$ is diagonal and $K(L^{\star})=N\cdot L^{\star}(L^{\star})^{\top}/\operatorname{tr}(L^{\star}(L^{\star})^{\top})$. (We repeat this argument in more detail in the proof of Lemma [13](https://arxiv.org/html/2605.11142#Thmtheorem13).) Hence
$$\log d_{\mathrm{spec}}(\eta)=\log d_{\mathrm{spec}}(K(L^{\star}(\eta)))=-\sum_{i=1}^{s}p_i(\eta)\log p_i(\eta).$$
The function $x\mapsto x\log x$ is $\mathcal{C}^{\infty}$ on $(0,\infty)$, and each $p_i(\eta)$ is positive and $\mathcal{C}^1$ (it is a ratio of polynomials in the entries of $L^{\star}$, and $L^{\star}$ is $\mathcal{C}^1$ in $\eta$). Therefore $\log d_{\mathrm{spec}}(\eta)$ is $\mathcal{C}^1$. Since $d_{\mathrm{spec}}(\eta)=\exp(\log d_{\mathrm{spec}}(\eta))$, the capacity profile $d_{\mathrm{spec}}$ is also $\mathcal{C}^1$. This proves claim (ii).

#### Claim (iii)

We work in the same local coordinates as in the proof of claim (i). Recall that

$$\widehat{F}(x, \eta) = \widehat{\Phi}(x) - \eta\, \widehat{h}(x).$$

The branch satisfies

$$\nabla_x \widehat{F}\big(x^\star(\eta), \eta\big) = 0 \qquad \text{for every } \eta \in U.$$

Differentiate this identity with respect to $\eta$. The first term below comes from differentiating the map $x \mapsto \nabla_x \widehat{F}(x, \eta)$ along the curve $x^\star(\eta)$, while the second comes from the explicit dependence of $\widehat{F}$ on $\eta$. Since

$$\partial_\eta \nabla_x \widehat{F}(x, \eta) = -\nabla_x \widehat{h}(x),$$

we get

$$0 = \frac{d}{d\eta}\nabla_x \widehat{F}\big(x^\star(\eta), \eta\big) = H(\eta)\,\frac{dx^\star}{d\eta}(\eta) - \nabla_x \widehat{h}\big(x^\star(\eta)\big).$$

Therefore,

$$H(\eta)\,\frac{dx^\star}{d\eta}(\eta) = \nabla_x \widehat{h}\big(x^\star(\eta)\big).$$

By claim (i), $H(\eta) \succ 0$ for every $\eta \in U$. Thus $H(\eta)$ is invertible, and

$$\frac{dx^\star}{d\eta}(\eta) = H(\eta)^{-1}\,\nabla_x \widehat{h}\big(x^\star(\eta)\big).$$

Now

$$\log d_{\mathrm{spec}}(\eta) = h\big(\theta^\star(\eta)\big) = \widehat{h}\big(x^\star(\eta)\big).$$

Hence, by the chain rule,

$$\frac{d}{d\eta}\log d_{\mathrm{spec}}(\eta) = \nabla_x \widehat{h}\big(x^\star(\eta)\big)^{\top}\frac{dx^\star}{d\eta}(\eta).$$

Substituting the expression for $dx^\star/d\eta$, we obtain

$$\frac{d}{d\eta}\log d_{\mathrm{spec}}(\eta) = \nabla_x \widehat{h}\big(x^\star(\eta)\big)^{\top} H(\eta)^{-1}\,\nabla_x \widehat{h}\big(x^\star(\eta)\big).$$

Since $H(\eta) \succ 0$, also $H(\eta)^{-1} \succ 0$. Therefore

$$\frac{d}{d\eta}\log d_{\mathrm{spec}}(\eta) \geq 0.$$

This proves Claim (iii).
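As a numerical sanity check on this quadratic-form identity, the following self-contained NumPy sketch builds a toy instance $\widehat{F}(x,\eta) = \widehat{\Phi}(x) - \eta\,\widehat{h}(x)$ with quadratic $\widehat{\Phi}$ and $\widehat{h}$ (all matrices and names here are illustrative, not the paper's graph objective) and compares the closed form $\nabla\widehat{h}^{\top} H^{-1} \nabla\widehat{h}$ against a central finite difference of $\eta \mapsto \widehat{h}(x^\star(\eta))$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # Hessian of Phi-hat, PD
C = rng.standard_normal((n, n)); C = 0.1 * (C + C.T)          # quadratic part of h-hat
b = rng.standard_normal(n)

def x_star(eta):
    # Stationarity of F-hat: (A - eta C) x = eta b
    return np.linalg.solve(A - eta * C, eta * b)

def h_hat(x):
    return b @ x + 0.5 * x @ C @ x

eta = 0.3
x = x_star(eta)
grad_h = b + C @ x                 # gradient of h-hat at the branch point
H = A - eta * C                    # Hessian of F-hat along the branch, PD here
closed_form = grad_h @ np.linalg.solve(H, grad_h)   # grad^T H^{-1} grad >= 0
finite_diff = (h_hat(x_star(eta + 1e-6)) - h_hat(x_star(eta - 1e-6))) / 2e-6
print(closed_form, finite_diff)    # agree to high precision; both nonnegative
```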

### E.1 Structural events along the $\eta$ path

Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6) guarantees $\mathcal{C}^1$ regularity and local monotonicity of the realized effective dimension $d_{\mathrm{spec}}(\eta)$ on a neighborhood of any reference $\eta_0$ where Assumptions [4](https://arxiv.org/html/2605.11142#Thmtheorem4)–[5](https://arxiv.org/html/2605.11142#Thmtheorem5) hold. Three failure modes can interrupt this regularity along an extended $\eta$ path: an eigenvalue enters or leaves the active support; two adjacent positive eigenvalues approach equality (violating Assumption [4](https://arxiv.org/html/2605.11142#Thmtheorem4)); or the restricted Hessian becomes degenerate (violating Assumption [5](https://arxiv.org/html/2605.11142#Thmtheorem5)).

We monitor the first two directly during the $\eta$ sweep. Support changes are flagged when $k_\tau(K) = |\{i : \lambda_i(K) \geq \tau\,\lambda_1(K)\}|$, the active-support count at relative-mass threshold $\tau$, changes between consecutive probes. Near-degeneracy events are flagged when the minimum gap between adjacent positive eigenvalues drops below a relative tolerance of the largest eigenvalue. Hessian degeneracy is not directly observable from the spectrum and is treated as correlated with the first two.
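A minimal NumPy sketch of this monitor is given below; the function name, the default thresholds, and the assumption that eigenvalues arrive sorted in descending order are our illustrative choices, since the text specifies the two tests but not their implementation:

```python
import numpy as np

def structural_events(eigs_prev, eigs_curr, tau=1e-3, gap_tol=1e-3):
    """Flag the two spectrally observable event types between consecutive
    eta probes. Inputs are descending, nonnegative eigenvalue arrays of
    the trace-normalized kernel K at the previous and current probe."""
    k_tau = lambda lam: int(np.sum(lam >= tau * lam[0]))  # active support size
    support_change = k_tau(eigs_prev) != k_tau(eigs_curr)

    pos = eigs_curr[eigs_curr > 0]
    min_gap = np.min(-np.diff(pos)) if pos.size > 1 else np.inf
    near_degenerate = bool(min_gap < gap_tol * eigs_curr[0])
    return support_change, near_degenerate
```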

Figure [5](https://arxiv.org/html/2605.11142#A5.F5) shows the $\eta$ path of the realized effective dimension on four representative datasets, one from each structural family, at rank caps $r = 128$ and $r = 256$. Markers indicate detected structural events, without distinguishing type. Three observations are consistent across datasets: (i) the map $\eta \mapsto d_{\mathrm{spec}}$ is monotone in $\eta$ across the entire sweep, including across detected events; (ii) events concentrate near the extremes of the $\eta$ range: at large positive $\eta$ the spectrum is driven toward uniformity and adjacent eigenvalues collide, while at large negative $\eta$ the spectrum compresses and modes drop below the support threshold; (iii) between events, the path is visibly smooth, consistent with the local $\mathcal{C}^1$ guarantee of Theorem [6](https://arxiv.org/html/2605.11142#Thmtheorem6).

Empirically, structural events do not invalidate the capacity frontier: $d_{\mathrm{spec}}(\eta)$ remains monotone and continuous in $\eta$ on every dataset and rank cap we examined. The events appear as visible inflections rather than as discontinuities, suggesting that on these graphs the local $\mathcal{C}^1$ branches connect into a globally continuous (if not globally $\mathcal{C}^1$) profile. We report $d_{\mathrm{spec}}(K^\star)$ as the operating coordinate while tracking events as a diagnostic.

Comparing the two rank caps shows that the event landscape depends on whether $r$ binds the representation. On rank-cap-binding graphs (ca-grqc, bio-grid-worm), increasing $r$ from $128$ to $256$ extends the $d_{\mathrm{spec}}$ range and shifts events upward; on saturated-regime graphs (inf-power, socfb-American75), the path stays well below the cap at both values of $r$ and the event landscape is sparser. This is consistent with the regime distinction in the main paper.


Figure 5: Structural events along the $\eta$ path at two rank caps, on four representative datasets spanning the four structural families. *Top row:* $r = 128$. *Bottom row:* $r = 256$. Columns: collaboration (ca-grqc), infrastructure (inf-power), social (socfb-American75), biological (bio-grid-worm). Each panel shows $d_{\mathrm{spec}}(\eta)$ vs. $\eta$ on a log scale, mean $\pm 1\sigma$ across three seeds. Vertical line: $\eta = 0$. Event markers indicate probes where a structural event was detected along the fitted branch (an eigenvalue entering or leaving the active support, or adjacent positive eigenvalues approaching equality; see text). Most $\eta$ paths are smooth across $\eta \in [-0.25, 0.25]$, with events concentrating near the extremes; the realized-dimension profile remains continuous and monotone in $\eta$ on every dataset, even where local $\mathcal{C}^1$ regularity is interrupted by an event.

## Appendix F Experimental setup

We extensively evaluate Spectra against prominent baseline graph representation learning methods, including unconstrained Euclidean embeddings, matrix factorization, mixed-membership models, and latent-distance simplex-based representations, across networks of varying sizes and structures. We ran all methods using publicly available implementations. When GPU support was available, experiments were executed on an NVIDIA A100 GPU; otherwise, we used an Apple M2 machine with 8 GB RAM.

#### Optimization.

For Spectra we optimize the $\eta$-augmented objective $F_\eta$ (Eq. ([6](https://arxiv.org/html/2605.11142#S3.E6))) with Adam [Kingma and Ba, [2014](https://arxiv.org/html/2605.11142#bib.bib48)] at learning rate $10^{-2}$ for 6,000 iterations unless stated otherwise. At each step we sample five non-edges per observed edge uniformly from the complement of the training graph, and combine the positive and negative log-likelihood terms with equal weight.
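The negative-sampling step amounts to rejection sampling from the complement of the training edge set. A hedged sketch follows; the helper name and data layout are of our choosing, not the released code:

```python
import numpy as np

def sample_nonedges(edge_set, n_nodes, n_samples, rng):
    """Draw non-edges uniformly from the complement of the training graph
    by rejection sampling (efficient when the graph is sparse).
    edge_set holds undirected training edges as sorted (i, j) tuples."""
    negatives = []
    while len(negatives) < n_samples:
        i, j = rng.integers(0, n_nodes, size=2)
        if i != j and (min(i, j), max(i, j)) not in edge_set:
            negatives.append((min(i, j), max(i, j)))
    return negatives

# Per step: five negatives per observed edge, weighted equally with positives.
# negatives = sample_nonedges(train_edges, N, 5 * len(train_edges), rng)
```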

#### Regularization.

The spectral regularizer $R$ in $F_\eta$ is weighted by a fixed coefficient $\lambda = 10^{-4}$ in all experiments reported in the main text and supplement. We found results to be robust to this choice: varying $\lambda$ over several orders of magnitude around $10^{-4}$ produced negligible changes in test AUC and in the achieved $d_{\mathrm{spec}}$, so no per-dataset tuning was performed.

#### Baselines.

For embedding baselines, dyadic edge features are constructed using the standard binary operators (average, Hadamard, weighted-$L_1$, weighted-$L_2$), followed by an $L_2$-regularized logistic regression classifier; in each cell we report the operator that maximizes the baseline's test AUC, giving each baseline its strongest possible configuration. For likelihood-based latent graph models, including ours, we evaluate links directly from the learned log-odds, with no auxiliary classifier and no operator selection.
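A minimal scikit-learn sketch of this baseline pipeline follows; it assumes precomputed node embeddings `emb` and labeled train/test dyads, and every name in it is ours, introduced for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# The four standard binary operators for turning a pair of node
# embeddings (u, v) into a dyadic feature vector.
OPERATORS = {
    "average":     lambda u, v: (u + v) / 2.0,
    "hadamard":    lambda u, v: u * v,
    "weighted_l1": lambda u, v: np.abs(u - v),
    "weighted_l2": lambda u, v: (u - v) ** 2,
}

def best_operator_auc(emb, train_pairs, y_train, test_pairs, y_test):
    """Fit one L2-regularized logistic classifier per operator and return
    the operator maximizing test AUC, mirroring the per-cell selection."""
    scores = {}
    for name, op in OPERATORS.items():
        X_tr = np.array([op(emb[i], emb[j]) for i, j in train_pairs])
        X_te = np.array([op(emb[i], emb[j]) for i, j in test_pairs])
        clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X_tr, y_train)
        scores[name] = roc_auc_score(y_test, clf.predict_proba(X_te)[:, 1])
    return max(scores, key=scores.get), scores
```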

#### Test-blind protocol.

Spectra is reported in a strictly test-blind regime: no hyperparameter, $\eta$ value, $d_{\mathrm{spec}}$ target, or model checkpoint is selected with reference to any test quantity, on any dataset, at any stage.

## Appendix G Frontier protocol

We test whether predictive performance is organized by the realized spectral capacity $d_{\mathrm{spec}}(\eta)$ rather than by the nominal rank cap $r$. If so, curves obtained at different $r$ should collapse onto a common frontier when plotted against $d_{\mathrm{spec}}$, and any residual dependence on $r$ should appear only once the rank cap binds the achievable capacity.

For each dataset and three seeds, we sweep the spectral-prior weight $\eta$ at three rank caps $r \in \{64, 128, 256\}$. Each probe is trained from scratch (cold start, no warm-starting across $\eta$ values), so that the realized capacity at a given $\eta$ reflects optimization under that prior alone rather than path dependence from a neighboring configuration.

#### Adaptive grid.

The $\eta$ values are chosen *adaptively*: starting from the most negative $\eta$, the next probe is selected by halving or doubling the current $\eta$ step according to whether $d_{\mathrm{spec}}$ moved by more than an upper threshold, or by less than a lower threshold, between consecutive probes. This refines the grid where $d_{\mathrm{spec}}(\eta)$ changes rapidly while spending few probes on slowly varying segments, so structural transitions are resolved without paying for a uniformly fine sweep. A compact sketch of this controller follows.
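In the sketch below, `fit_probe`, the thresholds `lo`/`hi`, and the step bounds are placeholders we introduce; the text states the halving/doubling rule but not these constants:

```python
def adaptive_eta_sweep(fit_probe, eta_start, eta_end, step0,
                       lo=0.05, hi=0.5, min_step=1e-4, max_step=0.1):
    """Adaptive eta grid: halve the step when d_spec jumped by more than
    `hi` between consecutive probes, double it when it moved by less than
    `lo`. fit_probe(eta) trains a cold-start model and returns its d_spec."""
    etas, dspecs = [eta_start], [fit_probe(eta_start)]
    step = step0
    while etas[-1] < eta_end:
        eta = min(etas[-1] + step, eta_end)
        dspecs.append(fit_probe(eta))
        etas.append(eta)
        move = abs(dspecs[-1] - dspecs[-2])
        if move > hi:                    # changing fast: refine the grid
            step = max(step / 2.0, min_step)
        elif move < lo:                  # flat segment: stride further
            step = min(step * 2.0, max_step)
    return etas, dspecs
```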
