A Topological Characterization of Graph Neural Networks via Stochastic Block Model Embeddings on the n-Sphere
Summary
Proposes a topological framework for comparing trained Graph Neural Networks by mapping Stochastic Block Model embeddings onto the n-sphere, enabling visual inspection and transfer-learning candidate retrieval without retraining.
View Cached Full Text
Cached at: 06/09/26, 08:48 AM
# A Topological Characterization of Graph Neural Networks via Stochastic Block Model Embeddings on the 𝑛-Sphere
Source: [https://arxiv.org/html/2606.07598](https://arxiv.org/html/2606.07598)
Gopal Anantharaman KnotTheory\.ai Inc\., along with Dept\. of Mathematics, Emporia State University gopal@knottheory\.ai
###### Abstract
We propose a topological framework for comparing trained Graph Neural Networks \(GNNs\) by mapping the Stochastic Block Models \(SBMs\) induced on the graphon\-signal space of a Message Passing Neural Network \(MPNN\) onto the unitnn\-sphere𝕊n−1⊂ℝn\\mathbb\{S\}^\{n\-1\}\\subset\\mathbb\{R\}^\{n\}\. The construction rests on three classical pillars: the*compactness*of the cut\-distance graphon space\(𝒲~0,δ□\)\(\\widetilde\{\\mathcal\{W\}\}\_\{0\},\\delta\_\{\\square\}\)\(Lovász and Szegedy,[2006](https://arxiv.org/html/2606.07598#bib.bib19); Lovász,[2012](https://arxiv.org/html/2606.07598#bib.bib18)\), the Frieze–Kannan*weak regularity lemma*together with its graphon\-signal extension due toLevie \([2023](https://arxiv.org/html/2606.07598#bib.bib17)\), and the Lipschitz continuity of MPNNs with respect to the cut\-distance\. We show that, for any prescribed toleranceε\>0\\varepsilon\>0, a trained MPNNΦ\\Phiacting on a sufficiently large graph factors \(up toε\\varepsilon\) through a step\-graphon\-signal of bounded complexity, and we construct an explicit measure\-preserving mapΨn:\[0,1\]→𝕊n−1\\Psi\_\{n\}\\colon\[0,1\]\\to\\mathbb\{S\}^\{n\-1\}that places the SBM regions on disjoint spherical caps\. This produces a problem\-agnostic, low\-dimensional “fingerprint” of a trained GNN that is amenable to visual inspection and to nearest\-neighbour search across model zoos, enabling*transfer\-learning candidate retrieval*without retraining\. We discuss the obstruction posed by concentration of measure in high dimension — a phenomenon directly relevant to LLM\-scale embeddings\. We close with five concrete future research directions: hyperbolic and Grassmannian alternatives to the spherical model, Gromov–Wasserstein distances on graphon\-signals as an isometry\-free alternative to thenn\-sphere map, an information\-geometric \(Fisher\) reformulation of the SBM manifold, persistent\-homology fingerprints of layer\-wise embedding clouds, and a spectral\-distance baseline derived from the graphon eigendecomposition\.
###### Contents
1. [1Introduction](https://arxiv.org/html/2606.07598#S1)
2. [2Graph Neural Networks and Message Passing](https://arxiv.org/html/2606.07598#S2)
3. [3Graphons as Limits of Dense Graphs](https://arxiv.org/html/2606.07598#S3)
4. [4Weak Regularity, Discrete and Continuous](https://arxiv.org/html/2606.07598#S4)
5. [5The Graphon\-Signal Extension](https://arxiv.org/html/2606.07598#S5)
6. [6The Spherical Topological Model for SBMs](https://arxiv.org/html/2606.07598#S6)1. [6\.1Notation and ambient geometry](https://arxiv.org/html/2606.07598#S6.SS1) 2. [6\.2The block\-to\-cap mapΨn\\Psi\_\{n\}](https://arxiv.org/html/2606.07598#S6.SS2) 3. [6\.3Why the sphere?](https://arxiv.org/html/2606.07598#S6.SS3) 4. [6\.4Choosing the dimensionnn](https://arxiv.org/html/2606.07598#S6.SS4)
7. [7Main Results](https://arxiv.org/html/2606.07598#S7)1. [7\.1Concentration of measure: the equator problem](https://arxiv.org/html/2606.07598#S7.SS1) 2. [7\.2Closed\-form bound on inter\-SBM distance via the spherical map](https://arxiv.org/html/2606.07598#S7.SS2) 3. [7\.3Algorithmic recipe](https://arxiv.org/html/2606.07598#S7.SS3)
8. [8Application: Transfer\-Learning Candidate Retrieval](https://arxiv.org/html/2606.07598#S8)
9. [9Future Research Directions](https://arxiv.org/html/2606.07598#S9)1. [9\.1Beyond the sphere: hyperbolic and Grassmannian alternatives](https://arxiv.org/html/2606.07598#S9.SS1) 2. [9\.2Gromov–Wasserstein: an isometry\-free distance](https://arxiv.org/html/2606.07598#S9.SS2) 3. [9\.3Information geometry: SBMs as a statistical manifold](https://arxiv.org/html/2606.07598#S9.SS3) 4. [9\.4Persistent homology of layer\-wise embedding clouds](https://arxiv.org/html/2606.07598#S9.SS4) 5. [9\.5Spectral fingerprints from the graphon eigendecomposition](https://arxiv.org/html/2606.07598#S9.SS5) 6. [9\.6An experimental program](https://arxiv.org/html/2606.07598#S9.SS6)
10. [10Conclusion](https://arxiv.org/html/2606.07598#S10)
11. [References](https://arxiv.org/html/2606.07598#bib)
## 1Introduction
#### The problem in one paragraph\.
Imagine you have a library of trained Graph Neural Networks, one per problem domain — molecular toxicity here, citation\-graph classification there, traffic forecasting somewhere else\. A new graph problem arrives\. Which existing model should you start from? Today the answer is essentially “ask a human expert,” because two models with completely different weight tensors can be doing nearly identical work underneath, and two models that look superficially similar can be doing different things\. We need a comparable, low\-dimensional summary of*what a trained GNN has actually learned to do*— a fingerprint we can search\. This paper proposes one\.
#### What a GNN really learns\.
A trained Graph Neural Network is, in operational terms, a sequence of non\-linear update functionsΦ1,…,ΦK\\Phi\_\{1\},\\dots,\\Phi\_\{K\}that map an initial node/edge/graph signalx∈\(ℝd\)Vx\\in\(\\mathbb\{R\}^\{d\}\)^\{V\}to a learned embeddingh∈\(ℝd′\)Vh\\in\(\\mathbb\{R\}^\{d^\{\\prime\}\}\)^\{V\}\. The embedding is problem\-specific: its dimensions, scale, and even semantics are tied to a particular dataset, loss function, and downstream task\. Two GNNs trained on two ostensibly different problems may nevertheless converge on internal representations that encode*topologically equivalent*structure — a fact that is invisible if one compares only the raw weight tensors\.
Figure 1:Spherical fingerprint of a55\-class SBM: each block is mapped to a distinct zonal band of the unit sphere, and individual points represent the empirical distribution of nodes inside each block\. Disjoint colours correspond to disjoint blocks\. Comparing two such pictures \(or their pushforward measures, viaW1W\_\{1\}\) is the operational form of the cross\-model retrieval we propose\.The key insight is that the right object for cross\-problem comparison is not the network itself, nor its embeddings, but the*step\-function approximation*of the underlying graphon\-signal that the network has effectively learned\. A graphon is, intuitively, the continuum limit of an adjacency matrix as the number of nodes grows — a\[0,1\]×\[0,1\]\[0,1\]\\times\[0,1\]“pixel” picture where intensity is edge probability\. A step\-graphon is a graphon constant on rectangular blocks; these step functions are exactly the Stochastic Block Models \(SBMs\) familiar from network science\(Holland et al\.,[1983](https://arxiv.org/html/2606.07598#bib.bib11)\)\. Step\-graphons form a dense subset of the compact graphon space\(𝒲~0,δ□\)\(\\widetilde\{\\mathcal\{W\}\}\_\{0\},\\delta\_\{\\square\}\)\(Lovász and Szegedy,[2006](https://arxiv.org/html/2606.07598#bib.bib19); Lovász,[2012](https://arxiv.org/html/2606.07598#bib.bib18)\), which is why we can hope to build a finite library of canonical fingerprints that covers everything to a prescribed tolerance\.
Our contribution is to push this comparison from an abstract metric statement into a concrete, visualisable representation:
1. 1\.A canonical mappingΨn:\[0,1\]→𝕊n−1\\Psi\_\{n\}\\colon\[0,1\]\\to\\mathbb\{S\}^\{n\-1\}that places thennblocks of a step\-graphon onto disjoint spherical caps of equal Hausdorff measure\.
2. 2\.Numerical determination of the unique sphere dimension at which the total surface area equals11, suitable for use as a normalised probability surface\.
3. 3\.A discussion of when thisnn\-sphere model breaks down — in particular, the concentration\-of\-measure phenomenon that affectsn≳30n\\gtrsim 30\.
4. 4\.A roster of meaningful alternatives drawn from differential geometry, optimal transport, and topological data analysis\.
The resulting object is a low\-dimensional, hue\-coded picture of an SBM that the developer of a new GNN can compare visually \(and algorithmically, viaL2L^\{2\}on the sphere or via Wasserstein\) against a library of fingerprints of already\-trained GNNs\. When a close match is found, the new task may be amenable to a re\-mapping of an existing embedding rather than de novo training\.
#### Roadmap\.
[Section˜2](https://arxiv.org/html/2606.07598#S2)reviews the necessary background\.[Sections˜3](https://arxiv.org/html/2606.07598#S3),[4](https://arxiv.org/html/2606.07598#S4)and[5](https://arxiv.org/html/2606.07598#S5)present graphons, the weak regularity lemma, and Levie’s graphon\-signal extension\.[Section˜6](https://arxiv.org/html/2606.07598#S6)develops the SBM\-to\-sphere construction\.[Section˜7](https://arxiv.org/html/2606.07598#S7)is original: it diagnoses the concentration\-of\-measure obstruction and gives a careful proof of the measure\-preservation property\.[Section˜8](https://arxiv.org/html/2606.07598#S8)discusses transfer\-learning workflows\.[Section˜9](https://arxiv.org/html/2606.07598#S9)outlines five future\-research directions\.
## 2Graph Neural Networks and Message Passing
We assume the reader is familiar with empirical\-risk minimisation, train/validation/test splits, and the universal approximation property of multilayer perceptrons\(Cybenko,[1989](https://arxiv.org/html/2606.07598#bib.bib5); Hornik,[1991](https://arxiv.org/html/2606.07598#bib.bib12)\), which justifies using MLPs as elementary update modules inside a GNN\.
Figure 2:The canonical GNN task: given a graph of unlabelled nodes \(left, Zachary’s karate club\), produce a labelling \(right\) that respects the graph structure\. A trained GNN performs this map by repeated rounds of message passing between neighbouring nodes\. Figure reproduced fromSanchez\-Lengeling et al\. \([2021](https://arxiv.org/html/2606.07598#bib.bib26)\)\.We refer the reader toSanchez\-Lengeling et al\. \([2021](https://arxiv.org/html/2606.07598#bib.bib26)\)for an extensive visual walkthrough of the GNN model class; the formulation below follows the message\-passing abstraction ofGilmer et al\. \([2017](https://arxiv.org/html/2606.07598#bib.bib9)\)\.
A graphG=\(V,E\)G=\(V,E\)with node featuresxv∈ℝd0x\_\{v\}\\in\\mathbb\{R\}^\{d\_\{0\}\}\(v∈Vv\\in V\) and edge featureseuv∈ℝdee\_\{uv\}\\in\\mathbb\{R\}^\{d\_\{e\}\}\(\{u,v\}∈E\\\{u,v\\\}\\in E\) is processed by a*Message Passing Neural Network \(MPNN\)*\(Gilmer et al\.,[2017](https://arxiv.org/html/2606.07598#bib.bib9)\)as follows\. Each node holds a vector “state”hv\(k\)h\_\{v\}^\{\(k\)\}at roundkk\. In each round, every node collects messages from its neighbours, aggregates them in a permutation\-invariant way \(sum, mean, max, or attention\), and updates its state\. AfterKKrounds, the state at each node summarises information drawn from theKK\-hop neighbourhood around it \([Fig\.˜3](https://arxiv.org/html/2606.07598#S2.F3)\)\. Formally, node embeddingshv\(k\)∈ℝdkh\_\{v\}^\{\(k\)\}\\in\\mathbb\{R\}^\{d\_\{k\}\}evolve according to
mv\(k\+1\)\\displaystyle m\_\{v\}^\{\(k\+1\)\}=Aggregate\(k\)\(\{\{Msg\(k\)\(hu\(k\),hv\(k\),euv\)\}\}u∈N\(v\)\),\\displaystyle=\\mathrm\{Aggregate\}^\{\(k\)\}\\\!\\Bigl\(\\bigl\\\{\\\!\\\!\\bigl\\\{\\,\\mathrm\{Msg\}^\{\(k\)\}\\\!\\bigl\(h\_\{u\}^\{\(k\)\},h\_\{v\}^\{\(k\)\},e\_\{uv\}\\bigr\)\\,\\bigr\\\}\\\!\\\!\\bigr\\\}\_\{u\\in N\(v\)\}\\Bigr\),\(1\)hv\(k\+1\)\\displaystyle h\_\{v\}^\{\(k\+1\)\}=Update\(k\)\(hv\(k\),mv\(k\+1\)\),\\displaystyle=\\mathrm\{Update\}^\{\(k\)\}\\\!\\bigl\(h\_\{v\}^\{\(k\)\},\\,m\_\{v\}^\{\(k\+1\)\}\\bigr\),\(2\)whereN\(v\)=\{u:\{u,v\}∈E\}N\(v\)=\\\{u:\\\{u,v\\\}\\in E\\\},\{\{⋅\}\}\\\{\\\!\\\!\\\{\\cdot\\\}\\\!\\\!\\\}denotes a multiset, andMsg\(k\),Update\(k\)\\mathrm\{Msg\}^\{\(k\)\},\\mathrm\{Update\}^\{\(k\)\}are MLPs\. TheAggregateoperation must be a permutation\-invariant multiset\-to\-vector reduction \(sum, mean, max,…\) so that the network respects the symmetry groupSym\(V\)\\mathrm\{Sym\}\(V\)acting on node labellings\.
Figure 3:One round of message passing at a node\. The node collects the states of its neighbours along incident edges, aggregates them in a permutation\-invariant way, and updates its own state\. Figure reproduced fromSanchez\-Lengeling et al\. \([2021](https://arxiv.org/html/2606.07598#bib.bib26)\)\.Special cases of[Eqs\.˜1](https://arxiv.org/html/2606.07598#S2.E1)and[2](https://arxiv.org/html/2606.07598#S2.E2)include Graph Convolutional Networks\(Kipf and Welling,[2017](https://arxiv.org/html/2606.07598#bib.bib14)\), Graph Attention Networks\(Veličković et al\.,[2018](https://arxiv.org/html/2606.07598#bib.bib30)\), and the Graph Isomorphism Network\(Xu et al\.,[2019a](https://arxiv.org/html/2606.07598#bib.bib31)\)\. The expressive power of MPNNs is bounded above by the Weisfeiler–Lehman colour\-refinement hierarchy\(Xu et al\.,[2019a](https://arxiv.org/html/2606.07598#bib.bib31); Maron et al\.,[2019](https://arxiv.org/html/2606.07598#bib.bib20)\), a fact we will revisit when we discuss limitations of the spherical fingerprint in[Section˜7](https://arxiv.org/html/2606.07598#S7)\.
#### From signals on graphs to signals on graphons\.
For a fixed maximum number of nodes, the node\-feature space\(ℝd\)V\(\\mathbb\{R\}^\{d\}\)^\{V\}is finite\-dimensional, but to discuss*generalisation across graph sizes*one passes to a continuum limit\. This is precisely the purpose of graphons\.
## 3Graphons as Limits of Dense Graphs
#### Intuition\.
Picture ann×nn\\times nadjacency matrix laid out as a grid of black/white pixels, with the unit square\[0,1\]2\[0,1\]^\{2\}as its canvas: white where there is no edge, black where there is one\. Asnngrows, the pixels shrink and the picture is well approximated by a continuous greyscale functionW:\[0,1\]2→\[0,1\]W\\colon\[0,1\]^\{2\}\\to\[0,1\], with the grey level representing edge probability\. That continuous limit is the*graphon*of the graph sequence\. Graphons let us talk about “the same graph at different scales” — the underlying probabilistic blueprint — which is exactly what we want when comparing two GNNs that may be trained on graphs of very different sizes\.
###### Definition 3\.1\(Graphon\)\.
A*graphon*is a symmetric, Lebesgue\-measurable functionW:\[0,1\]2→\[0,1\]W\\colon\[0,1\]^\{2\}\\to\[0,1\]\. We write𝒲\\mathcal\{W\}for the set of all graphons and𝒲0\\mathcal\{W\}\_\{0\}for the closure under almost\-everywhere equality\.
###### Example 3\.2\.
A simple graphGGonnnlabelled vertices induces a graphonWGW\_\{G\}as follows\. Partition\[0,1\]\[0,1\]intonnequal subintervalsI1=\[0,1n\),I2=\[1n,2n\),…,InI\_\{1\}=\[0,\\tfrac\{1\}\{n\}\),\\,I\_\{2\}=\[\\tfrac\{1\}\{n\},\\tfrac\{2\}\{n\}\),\\,\\dots,\\,I\_\{n\}and setWG\(x,y\)=AijW\_\{G\}\(x,y\)=A\_\{ij\}forx∈Ii,y∈Ijx\\in I\_\{i\},\\,y\\in I\_\{j\}, whereAAis the adjacency matrix ofGG\. In other words:WGW\_\{G\}is just the adjacency matrix ofGGre\-drawn on the unit square at resolution1/n1/n\.WGW\_\{G\}is piecewise constant on then×nn\\times ngrid of dyadic squares; the SBM picture described in the Introduction is the same object with the cells coarsened fromnnto a smaller block count\.
We now need a metric on graphons\. The pointwiseL2L^\{2\}distance is the obvious choice but is too sensitive to noise in the pixel picture: shuffling the rows and columns of an adjacency matrix produces the same graph, butL2L^\{2\}\-distance can rise to its maximum\. We need a metric that ignores such relabellings, and that compares two pictures by how different they look at*any*block\-vs\-block scale\. The*cut\-norm*captures this: it takes the supremum, over all pairs of subsetsS,T⊂\[0,1\]S,T\\subset\[0,1\], of the discrepancy between the two graphons on the rectangleS×TS\\times T\.
###### Definition 3\.3\(Cut\-norm and cut\-distance\)\.
ForW∈𝒲W\\in\\mathcal\{W\}define the*cut\-norm*
‖W‖□:=supS,T⊂\[0,1\]\|∫S×TW\(x,y\)𝑑x𝑑y\|\.\\left\\\|W\\right\\\|\_\{\\square\}\\;:=\\;\\sup\_\{S,T\\subset\[0,1\]\}\\Bigl\|\\\!\\int\_\{S\\times T\}\\\!W\(x,y\)\\,dx\\,dy\\Bigr\|\.For two graphonsU,WU,Wthe*cut\-metric*\(or invariant cut\-distance\) is
δ□\(U,W\):=infφ∈𝒮\[0,1\]‖U−Wφ‖□,whereWφ\(x,y\)=W\(φ\(x\),φ\(y\)\)\\delta\_\{\\square\}\(U,W\)\\;:=\\;\\inf\_\{\\varphi\\in\\mathcal\{S\}\_\{\[0,1\]\}\}\\left\\\|U\-W^\{\\varphi\}\\right\\\|\_\{\\square\},\\qquad\\text\{where \}W^\{\\varphi\}\(x,y\)=W\(\\varphi\(x\),\\varphi\(y\)\)and𝒮\[0,1\]\\mathcal\{S\}\_\{\[0,1\]\}is the group of measure\-preserving bijections of\[0,1\]\[0,1\]modulo null\-sets\.
The infimum overφ\\varphiencodes the “relabel before comparing” step:φ\\varphipermutes the rows/columns of the graphon picture, and we keep the best alignment\. Two graphons at cut\-distance zero are the same graph on the unit square up to a measure\-preserving rearrangement of the parameter axis\.
###### Theorem 3\.4\(Compactness of\(𝒲~0,δ□\)\(\\widetilde\{\\mathcal\{W\}\}\_\{0\},\\delta\_\{\\square\}\), Lovász–Szegedy\)\.
The quotient space𝒲~0:=𝒲0/∼□\\widetilde\{\\mathcal\{W\}\}\_\{0\}:=\\mathcal\{W\}\_\{0\}/\\\!\\\!\\sim\_\{\\square\}obtained by identifying graphons at cut\-distance zero is a compact metric space underδ□\\delta\_\{\\square\}\.
###### Sketch\.
SeeLovász \([2012](https://arxiv.org/html/2606.07598#bib.bib18), Thm\. 9\.23\)\. The argument proceeds by sampling: any sequence of graphons admits a subsequence whoseWW\-random graphs converge in distribution; the limit corresponds \(uniquely up to∼□\\sim\_\{\\square\}\) to a graphon\. ∎
Compactness is the working hypothesis that makes the whole programme go\. Any compact metric space can be covered by finitely many balls of any fixed radius \(anε\\varepsilon\-net\)\. Translated to our setting: for any toleranceε\\varepsilonthere is a*finite*list of step\-graphons such that every graphon is withinε\\varepsilon\(in cut\-distance\) of one of them\. That finite list is the prototype of our model\-zoo fingerprint library\.
## 4Weak Regularity, Discrete and Continuous
The regularity lemma is the engine that turns “arbitrary graph” into “small SBM that looks the same\.” The idea: partition the vertex set into a bounded number of blocks, average the edge density within each block\-pair, and you obtain a piecewise\-constant graphon \(an SBM\) that matches the original at the resolution of those blocks\. The remarkable fact is that for any toleranceε\\varepsilon, a partition with at mostexp\(poly\(1/ε\)\)\\exp\(\\mathrm\{poly\}\(1/\\varepsilon\)\)blocks always suffices —*regardless of how large or complicated the graph is*\.
We first state the discrete Frieze–Kannan version\(Frieze and Kannan,[1999](https://arxiv.org/html/2606.07598#bib.bib8)\); the graphon analogue follows by passing to the continuum limit\.
###### Theorem 4\.1\(Weak Regularity, Frieze–Kannan\)\.
For everyε\>0\\varepsilon\>0there existsk0\(ε\)≤41/ε2k\_\{0\}\(\\varepsilon\)\\leq 4^\{1/\\varepsilon^\{2\}\}such that every graphG=\(V,E\)G=\(V,E\)admits a partitionV=V1⊔⋯⊔VkV=V\_\{1\}\\sqcup\\cdots\\sqcup V\_\{k\}withk≤k0\(ε\)k\\leq k\_\{0\}\(\\varepsilon\)for which the piecewise\-constant functionWG,PW\_\{G,P\}obtained by averagingAGA\_\{G\}over the blocks satisfies
‖WG−WG,P‖□≤ε\.\\left\\\|W\_\{G\}\-W\_\{G,P\}\\right\\\|\_\{\\square\}\\;\\leq\\;\\varepsilon\.
###### Theorem 4\.3\(Weak Regularity for Graphons\)\.
For everyW∈𝒲0W\\in\\mathcal\{W\}\_\{0\}and everyε\>0\\varepsilon\>0there exists a step\-graphon \(equivalently, a Stochastic Block Model\)UUon at most⌈41/ε2⌉\\lceil 4^\{1/\\varepsilon^\{2\}\}\\rceilblocks such that‖W−U‖□≤ε\\left\\\|W\-U\\right\\\|\_\{\\square\}\\leq\\varepsilon\.
###### Sketch\.
Apply[Theorem˜4\.1](https://arxiv.org/html/2606.07598#S4.Thmtheorem1)to a sequence ofWW\-random graphs of growing size and pass to the cut\-norm limit using[Theorem˜3\.4](https://arxiv.org/html/2606.07598#S3.Thmtheorem4)\. A complete proof appears inLovász \([2012](https://arxiv.org/html/2606.07598#bib.bib18), Lem\. 9\.10\)\. ∎
###### Corollary 4\.4\(Step\-graphons are dense\)\.
The set𝒲0step\\mathcal\{W\}\_\{0\}^\{\\mathrm\{step\}\}of all step\-graphons is dense in\(𝒲~0,δ□\)\(\\widetilde\{\\mathcal\{W\}\}\_\{0\},\\delta\_\{\\square\}\)\.
## 5The Graphon\-Signal Extension
A pure graphon describes only the*wiring*of a graph \(where the edges go\), not the*labels*\(the node features the GNN actually consumes\)\. Real GNN inputs come with feature vectorsxvx\_\{v\}at every node; the GNN’s job is to mix wiring and labels\. To talk about GNNs at the graphon level we therefore need to carry both pieces forward\. A*graphon\-signal*is a pair\(W,f\)\(W,f\): the graphonWWis still the edge\-probability picture, and the signalf:\[0,1\]→ℝdf\\colon\[0,1\]\\to\\mathbb\{R\}^\{d\}is the continuous\-limit version of the node\-feature functionv↦xvv\\mapsto x\_\{v\}\.Levie \([2023](https://arxiv.org/html/2606.07598#bib.bib17)\)extends the cut\-norm machinery and the weak regularity lemma to this richer setting, and shows that any MPNNΦ\\Phiacts on graphon\-signals as a Lipschitz map\.
###### Definition 5\.1\(Graphon\-signal cut\-norm\)\.
Fixr\>0r\>0\. For a graphon\-signal\(W,f\)\(W,f\)with‖f‖∞≤r\\\|f\\\|\_\{\\infty\}\\leq rdefine
‖\(W,f\)‖□r:=‖W‖□\+supS⊂\[0,1\]‖∫Sf\(x\)𝑑x‖∞\.\{\\left\\\|\(W,f\)\\right\\\|\_\{\\square\}\}\_\{r\}\\;:=\\;\\left\\\|W\\right\\\|\_\{\\square\}\\;\+\\;\\sup\_\{S\\subset\[0,1\]\}\\Bigl\\\|\\int\_\{S\}f\(x\)\\,dx\\Bigr\\\|\_\{\\infty\}\.The associated invariant cut\-distanceδ□r\(\(W,f\),\(U,g\)\)\\delta\_\{\\square\}^\{r\}\(\(W,f\),\(U,g\)\)is defined exactly as in[Definition˜3\.3](https://arxiv.org/html/2606.07598#S3.Thmtheorem3)by infimum over measure\-preserving rearrangements applied jointly toWWandff\.
In words: the first term measures cut\-norm discrepancy on the wiring \(same as before\), and the second term measures the worst\-case discrepancy of the signal’s coordinate\-wise mass on any sub\-intervalSS\. The∥⋅∥∞\\\|\\cdot\\\|\_\{\\infty\}on the vector\-valued integral simply picks out the coordinate offfwith the largest accumulated mass; the parameterrrrecords the boundedness assumption but does not appear in the formula itself\.
###### Theorem 5\.2\(Levie 2023, Graphon\-Signal Weak Regularity\)\.
There is a universal constantccsuch that for every graphon\-signal\(W,f\)\(W,f\)with‖f‖∞≤r\\\|f\\\|\_\{\\infty\}\\leq rand everyε\>0\\varepsilon\>0there exists a*step graphon\-signal*\(U,g\)\(U,g\)on at most⌈2cr2/ε2⌉\\lceil 2^\{cr^\{2\}/\\varepsilon^\{2\}\}\\rceilblocks satisfyingδ□r\(\(W,f\),\(U,g\)\)≤ε\\delta\_\{\\square\}^\{r\}\(\(W,f\),\(U,g\)\)\\leq\\varepsilon\.
###### Theorem 5\.3\(MPNNs are Lipschitz on graphon\-signal space, Levie 2023\)\.
LetΦ\\Phibe aKK\-layer MPNN with all message and update MLPsLL\-Lipschitz in their inputs and outputs bounded in∥⋅∥∞\\\|\\cdot\\\|\_\{\\infty\}\. Then there existsLΦ=LΦ\(L,K,r\)L\_\{\\Phi\}=L\_\{\\Phi\}\(L,K,r\)such that for all signals with‖f‖∞,‖g‖∞≤r\\\|f\\\|\_\{\\infty\},\\\|g\\\|\_\{\\infty\}\\leq r,
δ□r\(Φ\(W,f\),Φ\(U,g\)\)≤LΦδ□r\(\(W,f\),\(U,g\)\)\.\\delta\_\{\\square\}^\{r\}\\\!\\bigl\(\\Phi\(W,f\),\\,\\Phi\(U,g\)\\bigr\)\\;\\leq\\;L\_\{\\Phi\}\\,\\delta\_\{\\square\}^\{r\}\\bigl\(\(W,f\),\(U,g\)\\bigr\)\.
The combination of[Theorems˜5\.2](https://arxiv.org/html/2606.07598#S5.Thmtheorem2)and[5\.3](https://arxiv.org/html/2606.07598#S5.Thmtheorem3)is the operational core of the present work\. Read informally:
- •Any graphon\-signal input isε\\varepsilon\-close \(in cut\-distance\) to a step\-graphon\-signal — an SBM with feature values constant on each block\.
- •The MPNNΦ\\Phiis Lipschitz, so its output is withinLΦεL\_\{\\Phi\}\\,\\varepsilonof the output it would produce on that SBM\.
- •The output itself is thereforeε\\varepsilon\-close to another SBM with a bounded number of blocks\.
The network is, up to controlled error, an SBM\-to\-SBM map\. This is what licenses comparing trained GNNs by comparing the SBMs they effectively implement, rather than by comparing weight tensors\.
## 6The Spherical Topological Model for SBMs
We now construct an explicit map from a step\-graphon\-signal ofnnblocks to a labelled point cloud on the unit\(n−1\)\(n\-1\)\-sphere\.
### 6\.1Notation and ambient geometry
Forn∈ℕn\\in\\mathbb\{N\}, let𝕊n−1⊂ℝn\\mathbb\{S\}^\{n\-1\}\\subset\\mathbb\{R\}^\{n\}denote the unit Euclidean sphere, and𝔹n\\mathbb\{B\}^\{n\}the closed unit ball\. Their volumes and surface areas are
Vol\(𝔹n\)=πn/2Γ\(n2\+1\),Area\(𝕊n−1\)=2πn/2Γ\(n/2\)\.\\operatorname\{Vol\}\(\\mathbb\{B\}^\{n\}\)\\;=\\;\\frac\{\\pi^\{n/2\}\}\{\\Gamma\(\\tfrac\{n\}\{2\}\+1\)\},\\qquad\\operatorname\{Area\}\(\\mathbb\{S\}^\{n\-1\}\)\\;=\\;\\frac\{2\\,\\pi^\{n/2\}\}\{\\Gamma\(n/2\)\}\.\(3\)We use spherical\-cap coordinates\(θ1,…,θn−1\)\(\\theta\_\{1\},\\dots,\\theta\_\{n\-1\}\)withθi∈\[0,π\]\\theta\_\{i\}\\in\[0,\\pi\]fori<n−1i<n\-1andθn−1∈\[0,2π\)\\theta\_\{n\-1\}\\in\[0,2\\pi\)\.
### 6\.2The block\-to\-cap mapΨn\\Psi\_\{n\}
#### Picture first\.
We want to take annn\-block SBM and paint it on a sphere\. Pick a “north pole” directionuu\. Slice the sphere intonnhorizontal bands like latitudes on a globe; choose the latitude lines so bandjjhas surface areapjp\_\{j\}\(the size of blockjj\)\. Blockjjpaints bandjj\. The points inside the band are coloured by the feature value the SBM assigns to blockjj\. That is the spherical fingerprint — a globe withnnlatitude stripes carrying the block colours\.[Figure˜1](https://arxiv.org/html/2606.07598#S1.F1)shows the result for a55\-class SBM\.
The only technical question is where to put the latitude lines\. Equal areas do*not*correspond to equal latitude widths \(the bands near the equator are wider than those at the poles, because the sphere bulges\)\. The right cut\-points come from the cosine\-distribution of a uniformly random point on the sphere, which is well\-known to be Beta\-distributed\.
#### Formal construction\.
Given a partition𝒫=\{I1,…,In\}\\mathcal\{P\}=\\\{I\_\{1\},\\dots,I\_\{n\}\\\}of\[0,1\]\[0,1\]with\|Ij\|=pj\|I\_\{j\}\|=p\_\{j\}\(so∑pj=1\\sum p\_\{j\}=1\), we want to map each block to a region of the sphere of normalised areapjp\_\{j\}such that:
1. \(i\)Each block maps to a connected, simply\-connected region \(a cap or a band\)\.
2. \(ii\)The blocks map to pairwise disjoint regions\.
3. \(iii\)The map is measurable, so signals on\[0,1\]\[0,1\]pull back to measurable functions on the sphere\.
Choose any unit vectoru∈𝕊n−1u\\in\\mathbb\{S\}^\{n\-1\}\(the “north pole”\) and partition𝕊n−1\\mathbb\{S\}^\{n\-1\}intonn*nested zonal bands*:
Bj\(u\)=\{x∈𝕊n−1:aj−1≤⟨x,u⟩<aj\},j=1,…,n,B\_\{j\}\(u\)\\;=\\;\\bigl\\\{\\,x\\in\\mathbb\{S\}^\{n\-1\}\\;:\\;a\_\{j\-1\}\\leq\\langle x,u\\rangle<a\_\{j\}\\bigr\\\},\\qquad j=1,\\dots,n,with−1=a0<a1<⋯<an=1\-1=a\_\{0\}<a\_\{1\}<\\dots<a\_\{n\}=1to be chosen\. Thejjth band is the set of points whose projection ontouulies betweenaj−1a\_\{j\-1\}andaja\_\{j\}— geometrically, a slab cut by two parallel hyperplanes perpendicular touu\. We wantArea\(Bj\(u\)\)/Area\(𝕊n−1\)=pj\\operatorname\{Area\}\(B\_\{j\}\(u\)\)/\\operatorname\{Area\}\(\\mathbb\{S\}^\{n\-1\}\)=p\_\{j\}\.
The projection⟨X,u⟩\\langle X,u\\rangleof a uniform pointX∈𝕊n−1X\\in\\mathbb\{S\}^\{n\-1\}has density proportional to\(1−t2\)\(n−3\)/2\(1\-t^\{2\}\)^\{\(n\-3\)/2\}on\[−1,1\]\[\-1,1\]\(this is the density of the cosine of the angle between a random sphere point and a fixed direction\)\. The substitutions=\(1\+t\)/2s=\(1\+t\)/2converts this to a symmetricBeta\(\(n−1\)/2,\(n−1\)/2\)\\mathrm\{Beta\}\\bigl\(\(n\-1\)/2,\(n\-1\)/2\\bigr\)density on\[0,1\]\[0,1\]\. The band\-area condition is then equivalent to requiring the CDF of this Beta to hit the cumulative block sizes:
I\(1\+aj\)/2\(n−12,n−12\)=p1\+p2\+⋯\+pj,I\_\{\(1\+a\_\{j\}\)/2\}\\\!\\Bigl\(\\tfrac\{n\-1\}\{2\},\\tfrac\{n\-1\}\{2\}\\Bigr\)\\;=\\;p\_\{1\}\+p\_\{2\}\+\\cdots\+p\_\{j\},\(4\)whereIx\(a,b\)I\_\{x\}\(a,b\)is the regularised incomplete beta function\. In words: to find the upper edge of bandjj, take the inverse Beta\-CDF at the cumulative probabilityp1\+⋯\+pjp\_\{1\}\+\\dots\+p\_\{j\}, then convert from\[0,1\]\[0,1\]back to\[−1,1\]\[\-1,1\]\. \(The one\-sided spherical\-cap formI\(1\+aj\)/2\(\(n−1\)/2,1/2\)I\_\{\(1\+a\_\{j\}\)/2\}\\bigl\(\(n\-1\)/2,1/2\\bigr\)gives a different band structure and is not the object needed here; \([4](https://arxiv.org/html/2606.07598#S6.E4)\) agrees with Monte Carlo sampling in every\(n,aj\)\(n,a\_\{j\}\)regime\.\)
Within each bandBjB\_\{j\}, defineΨn\\Psi\_\{n\}on the inverse\-CDF coordinate of the band so that the block parameterx∈Ijx\\in I\_\{j\}is sent to a uniformly distributed point onBjB\_\{j\}\. The resulting map
Ψn:\[0,1\]⟶𝕊n−1\\Psi\_\{n\}\\colon\[0,1\]\\;\\longrightarrow\\;\\mathbb\{S\}^\{n\-1\}is measurable and*measure\-preserving*: Lebesgue measure on\[0,1\]\[0,1\]pushes forward to normalised surface \(Hausdorff\) measure on𝕊n−1\\mathbb\{S\}^\{n\-1\}\.
###### Proposition 6\.1\(Properties ofΨn\\Psi\_\{n\}\)\.
The mapΨn\\Psi\_\{n\}defined above satisfies:
1. \(a\)Ψn\\Psi\_\{n\}is measurable and pushes Lebesgue measure on\[0,1\]\[0,1\]to normalised surface measure on𝕊n−1\\mathbb\{S\}^\{n\-1\}\.
2. \(b\)For each blockIjI\_\{j\}, the imageΨn\(Ij\)\\Psi\_\{n\}\(I\_\{j\}\)is the closed bandBjB\_\{j\}of normalised areapjp\_\{j\}\.
3. \(c\)Ψn\\Psi\_\{n\}is a bijection between\[0,1\]\[0,1\]and a co\-null subset of𝕊n−1\\mathbb\{S\}^\{n\-1\}, and is continuous on eachIjI\_\{j\}but*not*globally continuous \(it is discontinuous at the band boundaries\)\.
### 6\.3Why the sphere?
Several manifolds could serve as targets for a fingerprint: Euclidean space, the simplex, the hypercube, the torus\. We chose the sphere for three reasons, each of which matters for a downstream retrieval system\.
1. 1\.Compactness\.𝕊n−1\\mathbb\{S\}^\{n\-1\}is compact, matching the compactness of𝒲~0\\widetilde\{\\mathcal\{W\}\}\_\{0\}from[Theorem˜3\.4](https://arxiv.org/html/2606.07598#S3.Thmtheorem4)\. Comparisons can therefore be performed against a fixed library without renormalisation\.
2. 2\.Isotropy\.The sphere is the unique simply\-connected Riemannian symmetric space of constant positive curvature; visual comparison is rotation\-invariant\.
3. 3\.Probability surface\.When endowed with normalised surface measure,𝕊n−1\\mathbb\{S\}^\{n\-1\}is a probability space, matching the natural interpretation of edge densities as probabilities\.
### 6\.4Choosing the dimensionnn
We want the sphere to double as a probability surface \(total mass11\)\. The unit sphere does not, in general, have surface area11—Area\(𝕊n−1\)\\operatorname\{Area\}\(\\mathbb\{S\}^\{n\-1\}\)varies non\-monotonically withnn, peaks nearn≈7n\\approx 7, and decays to zero asn→∞n\\to\\infty\. We have two clean ways to land at unit total mass: either fix the radius and pick the \(real\) dimension that yields unit area, or fix the integer dimension and rescale the radius\.
#### Convention A: fix surface area to11, solve fornn\.
Setting the right\-hand side of \([3](https://arxiv.org/html/2606.07598#S6.E3)\) equal to11and solving for the real parameternn:
2πn/2Γ\(n/2\)=1\.\\frac\{2\\,\\pi^\{n/2\}\}\{\\Gamma\(n/2\)\}\\;=\\;1\.\(5\)This is a transcendental equation with no closed form\. Numerical solution \(scipy\.optimize\.brentqon the interval\[10,40\]\[10,40\]\) gives
n⋆≈18\.768281888226916,2πn⋆/2Γ\(n⋆/2\)≈1±10−15\.n^\{\\star\}\\;\\approx\\;18\.768281888226916,\\qquad\\frac\{2\\pi^\{n^\{\\star\}/2\}\}\{\\Gamma\(n^\{\\star\}/2\)\}\\;\\approx\\;1\\pm 10^\{\-15\}\.\(6\)
#### Convention B: fix integernn, scale the radius\.
Pickn∈ℕn\\in\\mathbb\{N\}and rescale byrrso thatArea\(r𝕊n−1\)=1\\operatorname\{Area\}\(r\\mathbb\{S\}^\{n\-1\}\)=1:
r=\(Γ\(n/2\)2πn/2\)1/\(n−1\)\.r\\;=\\;\\Bigl\(\\frac\{\\Gamma\(n/2\)\}\{2\\pi^\{n/2\}\}\\Bigr\)^\{\\\!1/\(n\-1\)\}\.Forn=4n=4\(the33\-sphere𝕊3⊂ℝ4\\mathbb\{S\}^\{3\}\\subset\\mathbb\{R\}^\{4\}\),Area\(𝕊3\)=2π2≈19\.7392\\operatorname\{Area\}\(\\mathbb\{S\}^\{3\}\)=2\\pi^\{2\}\\approx 19\.7392, so a unit\-area33\-sphere requiresr=\(2π2\)−1/3≈0\.3700r=\(2\\pi^\{2\}\)^\{\-1/3\}\\approx 0\.3700\.
In what follows we adopt Convention A with the integer\-rounded valuen=19n=19, for whichArea\(𝕊18\)≈0\.886\\operatorname\{Area\}\(\\mathbb\{S\}^\{18\}\)\\approx 0\.886, and equip the sphere with*normalised*surface measure throughout, side\-stepping the unit\-area issue entirely\.
## 7Main Results
### 7\.1Concentration of measure: the equator problem
The spherical fingerprint works beautifully at the dimensions we have been discussing \(n=19n=19, modest block counts\)\. In modern deep learning, however, embedding dimensions of700700to20482048are routine\. At those dimensions the geometry of the sphere becomes*very*different from the everyday𝕊2\\mathbb\{S\}^\{2\}globe picture\. The technical phenomenon is called*concentration of measure*\(Ledoux,[2001](https://arxiv.org/html/2606.07598#bib.bib15)\), and it is the principal obstacle to scaling the spherical fingerprint to LLM\-class embeddings\.
#### The physical picture\.
On𝕊2\\mathbb\{S\}^\{2\}, the equator is a comfortable belt and the poles are distant points; surface area is spread reasonably evenly\. On𝕊999\\mathbb\{S\}^\{999\}, almost all the surface area lies within1/1000≈0\.031/\\sqrt\{1000\}\\approx 0\.03of*any*equator you choose\. A uniformly random point is, with probability close to one, nearly orthogonal to any fixed direction\. The poles, in this high\-dimensional regime, are essentially empty\.
For our fingerprint this is bad: our zonal bands carry the block structure, and the polar bands are the ones with the smallest area\. In high dimensions those polar bands collapse to negligible mass while everything piles up at the equator\. Two SBMs whose block structures differ only at the poles become visually identical\.
#### The formal statement\.
###### Theorem 7\.1\(Lévy’s Lemma\)\.
Letf:𝕊n−1→ℝf\\colon\\mathbb\{S\}^\{n\-1\}\\to\\mathbb\{R\}beLL\-Lipschitz, and letXXbe uniformly distributed on𝕊n−1\\mathbb\{S\}^\{n\-1\}\. Then for everyt\>0t\>0,
ℙ\[\|f\(X\)−𝔼f\(X\)\|≥t\]≤2exp\(−\(n−1\)t22L2\)\.\\mathbb\{P\}\\bigl\[\\,\|f\(X\)\-\\mathbb\{E\}f\(X\)\|\\;\\geq\\;t\\,\\bigr\]\\;\\leq\\;2\\exp\\\!\\Bigl\(\-\\tfrac\{\(n\-1\)\\,t^\{2\}\}\{2L^\{2\}\}\\Bigr\)\.
Takingffto be projection onto a fixed direction gives the immediate corollary, which is the concrete form we will use\.
###### Corollary 7\.2\(Equatorial concentration\)\.
For any fixed unit vectoru∈𝕊n−1u\\in\\mathbb\{S\}^\{n\-1\}andXXuniform on the sphere,
ℙ\[\|⟨X,u⟩\|≥t\]≤2exp\(−\(n−1\)t22\)\.\\mathbb\{P\}\\bigl\[\\,\|\\langle X,u\\rangle\|\\;\\geq\\;t\\,\\bigr\]\\;\\leq\\;2\\exp\\\!\\Bigl\(\-\\tfrac\{\(n\-1\)\\,t^\{2\}\}\{2\}\\Bigr\)\.
In words: forn≥50n\\\!\\geq\\\!50essentially all the surface area of𝕊n−1\\mathbb\{S\}^\{n\-1\}lies within𝒪\(1/n\)\\mathcal\{O\}\(1/\\sqrt\{n\}\)of the equator perpendicular touu\. Two SBMs whose block\-to\-cap maps differ in any non\-equatorial way are visually indistinguishable from a fixed viewpoint\.
###### Open Problem 7\.3\(The equator problem\)\.
Forn≥30n\\geq 30, the spherical fingerprintΨn\(𝒫\)\\Psi\_\{n\}\(\\mathcal\{P\}\)of annn\-block SBM has all its non\-zero “mass” within𝒪\(1/n\)\\mathcal\{O\}\(1/\\sqrt\{n\}\)of any chosen equator\. Find a normalisation, projection, or alternative manifold \([Section˜9\.1](https://arxiv.org/html/2606.07598#S9.SS1)\) that retains the discriminative information of the cap structure in the high\-dimensional limit\.
[Open Problem˜7\.3](https://arxiv.org/html/2606.07598#S7.Thmtheorem3)is the principal technical problem opened by this framework;[Section˜9](https://arxiv.org/html/2606.07598#S9)sketches three approaches — hyperbolic targets, Grassmannians, and Gromov–Wasserstein — each of which side\-steps positive\-curvature concentration in a different way\.
### 7\.2Closed\-form bound on inter\-SBM distance via the spherical map
For the spherical fingerprint to be useful as a retrieval index, we need a guarantee that if two GNNs are close in some operationally meaningful sense, their fingerprints are also close — and vice versa\. The following theorem makes this precise\. “Close” on the GNN side means small cut\-distance between the graphon\-signals the networks act on; “close” on the fingerprint side means small11\-Wasserstein distance \(W1W\_\{1\}\) between the spherical point clouds\.W1W\_\{1\}is the classical optimal\-transport cost: the minimum total geodesic distance needed to move one distribution onto the other on the sphere\.
###### Theorem 7\.4\(Spherical fingerprint Lipschitz bound\)\.
Let\(W,f\)\(W,f\)and\(U,g\)\(U,g\)be two graphon\-signals with‖f‖∞,‖g‖∞≤r\\\|f\\\|\_\{\\infty\},\\\|g\\\|\_\{\\infty\}\\leq r, and let\(WP,fP\),\(UQ,gQ\)\(W^\{P\},f^\{P\}\),\(U^\{Q\},g^\{Q\}\)be their step approximations onnnblocks furnished by[Theorem˜5\.2](https://arxiv.org/html/2606.07598#S5.Thmtheorem2)\. LetμPΨ,μQΨ\\mu\_\{P\}^\{\\Psi\},\\mu\_\{Q\}^\{\\Psi\}be the pushforward measures of the labelled node distributions underΨn\\Psi\_\{n\}, viewed as signed measures on𝕊n−1\\mathbb\{S\}^\{n\-1\}with vector\-valued total variationrr\. Then there exists a constantC=C\(n,r\)C=C\(n,r\)such that
W1\(μPΨ,μQΨ\)≤C\(δ□r\(\(W,f\),\(U,g\)\)\+ε\),W\_\{1\}\\bigl\(\\mu\_\{P\}^\{\\Psi\},\\mu\_\{Q\}^\{\\Psi\}\\bigr\)\\;\\leq\\;C\\\!\\left\(\\delta\_\{\\square\}^\{r\}\\\!\\bigl\(\(W,f\),\(U,g\)\\bigr\)\+\\varepsilon\\right\),whereW1W\_\{1\}is the11\-Wasserstein distance on𝕊n−1\\mathbb\{S\}^\{n\-1\}with the geodesic metric andε\\varepsilonis the regularity tolerance\.
###### Sketch\.
Apply[Theorem˜5\.2](https://arxiv.org/html/2606.07598#S5.Thmtheorem2)to bound the distance between\(W,f\)\(W,f\)and its step approximation in cut\-distance byε\\varepsilon, and likewise for\(U,g\)\(U,g\)\. AlthoughΨn\\Psi\_\{n\}is globally discontinuous at band boundaries, its restriction to eachIjI\_\{j\}is bi\-Lipschitz onto the corresponding bandBjB\_\{j\}\(an inverse\-CDF reparametrisation of a smooth piece of sphere\); we use this band\-wise Lipschitz property to transfer the cut\-distance bound into a Wasserstein bound on𝕊n−1\\mathbb\{S\}^\{n\-1\}\. The factorCCcollects the maximum band diameter and the band\-to\-block area correspondence \([4](https://arxiv.org/html/2606.07598#S6.E4)\)\. Full details follow the template ofLevie \([2023](https://arxiv.org/html/2606.07598#bib.bib17), §4\)\. ∎
In plain English: nearest\-neighbour search on spherical fingerprints is a sound proxy for nearest\-neighbour search in cut\-distance on the underlying graphon\-signals\. Two caveats: the bound has a constantC=C\(n,r\)C=C\(n,r\)that degrades asn\\sqrt\{n\}\(the same concentration\-of\-measure warning as[Open Problem˜7\.3](https://arxiv.org/html/2606.07598#S7.Thmtheorem3)\), and the regularity toleranceε\\varepsilonenters additively \(so very fine retrieval requires very fine SBM approximations, hence many blocks\)\.
### 7\.3Algorithmic recipe
Given a trained MPNNΦ\\Phideployed on a graphGGand a toleranceε\\varepsilon, the procedure for producing the spherical fingerprint is:
1. 1\.Forward pass\.RunΦ\\PhionGGto obtainhv\(K\)∈ℝdh\_\{v\}^\{\(K\)\}\\in\\mathbb\{R\}^\{d\}for eachv∈Vv\\in V\.
2. 2\.Cluster\.Applykk\-means or spectral clustering to the embedding cloud\{hv\(K\)\}v∈V\\\{h\_\{v\}^\{\(K\)\}\\\}\_\{v\\in V\}withk≤41/ε2k\\leq 4^\{1/\\varepsilon^\{2\}\}\.
3. 3\.Block summary\.For each clusterjjrecordpj=\|Vj\|/\|V\|p\_\{j\}=\|V\_\{j\}\|/\|V\|and the block\-edge densitiesWij=\|E\(Vi,Vj\)\|/\(\|Vi\|\|Vj\|\)W\_\{ij\}=\|E\(V\_\{i\},V\_\{j\}\)\|/\(\|V\_\{i\}\|\|V\_\{j\}\|\), plus the centroidh¯j\\bar\{h\}\_\{j\}\.
4. 4\.Embed\.ApplyΨn\\Psi\_\{n\}as in[Section˜6\.2](https://arxiv.org/html/2606.07598#S6.SS2)withn=max\(k,18\)n=\\max\(k,18\); colour each bandBjB\_\{j\}by the first three principal components of the centroidh¯j\\bar\{h\}\_\{j\}\.
5. 5\.Render\.Render the resulting coloured sphere to a small image \(e\.g\.256×256256\\times 256pixels or an animated GIF rotating about the polar axis\)\.
The output is a low\-dimensional, colour\-coded picture that summarises both the block structure of the learned graphon*and*the principal directions of the embedding itself\.
## 8Application: Transfer\-Learning Candidate Retrieval
#### Related work on GNN transfer\.
There is a substantial existing literature on GNN transferability, with two strands particularly close to ours\.Ruiz et al\. \([2020](https://arxiv.org/html/2606.07598#bib.bib25)\)introduce*graphon neural networks*and prove that a fixed\-architecture GNN trained on a small graph sampled from a graphonWWtransfers, with explicit rate𝒪\(n−1/2\)\\mathcal\{O\}\(n^\{\-1/2\}\)error bounds, to graphs of much larger size drawn from the sameWW\.Maskey et al\. \([2022](https://arxiv.org/html/2606.07598#bib.bib21)\)extend this to generalisation bounds for MPNNs on large random graphs, building directly onLevie \([2023](https://arxiv.org/html/2606.07598#bib.bib17)\)’s graphon\-signal apparatus that we use here\. Spectral\-side transferability of graph convolutional filters is treated inLevie et al\. \([2019](https://arxiv.org/html/2606.07598#bib.bib16)\)\. Engineering\-level GNN transfer — pretrain\-then\-finetune across tasks rather than across graph sizes — is surveyed inHu et al\. \([2020](https://arxiv.org/html/2606.07598#bib.bib13)\)andCervino et al\. \([2023](https://arxiv.org/html/2606.07598#bib.bib4)\)\.
What is missing from this body of work is a procedure for*retrieving*, from a library of trained models, the GNN whose learned graphon is closest to the one a new task induces\. Ruiz et al\. and Maskey et al\. give guarantees for a single model across graph sizes; pretrain/finetune strategies pick a source model by domain heuristic\. The spherical fingerprint targets exactly this retrieval gap\.
#### Workflow\.
The intended use case is a*model zoo with thumbnails*\. Suppose a practitioner has trainedNNGNNs onNNdifferent problems \(antibody–antigen binding, molecular toxicity, citation\-graph node classification, traffic\-flow forecasting, etc\.\) and stored, for each, the spherical fingerprint produced by[Section˜7\.3](https://arxiv.org/html/2606.07598#S7.SS3)\. When a new problem arrives, the practitioner trains a small*seed*MPNN for one or two epochs, computes its fingerprint, and performs a nearest\-neighbour search \(inW1W\_\{1\}on𝕊n−1\\mathbb\{S\}^\{n\-1\}\) against the library\. Close matches identify candidate models from which to bootstrap\. The actual transfer proceeds by re\-mapping the matched model’s block centroids into the new problem’s embedding space using a learned linear map of dimensiondnew×doldd\_\{\\mathrm\{new\}\}\\times d\_\{\\mathrm\{old\}\}; the rest of the matched MPNN’s weights serve as initial conditions for fine\-tuning, combinable with the finetune strategies ofHu et al\. \([2020](https://arxiv.org/html/2606.07598#bib.bib13)\)\.
This workflow side\-steps the need to train each new GNN from random initialisation when a structurally similar trained model already exists, and complements — rather than replaces — the same\-graphon transfer guarantees ofRuiz et al\. \([2020](https://arxiv.org/html/2606.07598#bib.bib25)\); Maskey et al\. \([2022](https://arxiv.org/html/2606.07598#bib.bib21)\)\. Empirical validation is left as future work \([Section˜9\.6](https://arxiv.org/html/2606.07598#S9.SS6)\)\.
## 9Future Research Directions
### 9\.1Beyond the sphere: hyperbolic and Grassmannian alternatives
The equator problem \([Open Problem˜7\.3](https://arxiv.org/html/2606.07598#S7.Thmtheorem3)\) is partly an artefact of positive curvature\. Three alternative target manifolds suggest themselves\.
#### Hyperbolic spaceℍn−1\\mathbb\{H\}^\{n\-1\}\.
Tree\-like and hierarchical graphs \(citation networks, taxonomies, social influence trees\) are known to embed with much lower distortion in hyperbolic space than in Euclidean or spherical space\(Nickel and Kiela,[2017](https://arxiv.org/html/2606.07598#bib.bib23); Sala et al\.,[2018](https://arxiv.org/html/2606.07598#bib.bib27)\)\. The analogue of[Section˜6\.2](https://arxiv.org/html/2606.07598#S6.SS2)inℍn−1\\mathbb\{H\}^\{n\-1\}would map each block to a horoball; the volume\-of\-balls identity in the Poincaré disk gives an exponential family of candidate radii\.
#### GrassmannianGr\(k,n\)\\mathrm\{Gr\}\(k,n\)\.
When the embedding dimensionddis large, a more natural object than a single point cloud is the*principal subspace*spanned by the embedding cloud of each SBM block\. The collection ofkk\-dimensional subspaces inℝn\\mathbb\{R\}^\{n\}is the Grassmannian, on which the principal\-angle \(Frobenius / chordal\) metric induces a well\-developed comparison theory\(Edelman et al\.,[1998](https://arxiv.org/html/2606.07598#bib.bib6)\)\.
#### Stiefel manifoldVk\(ℝn\)V\_\{k\}\(\\mathbb\{R\}^\{n\}\)\.
An orderedkk\-frame representation preserves more information than the Grassmannian \(it remembers the first principal direction\) and is appropriate when block\-internal ordering carries meaning \(e\.g\. temporal block models\)\.
### 9\.2Gromov–Wasserstein: an isometry\-free distance
The dependence of the spherical fingerprint on the choice of north pole and on the cap\-ordering can be removed entirely by working with the*Gromov–Wasserstein distance*\(Mémoli,[2011](https://arxiv.org/html/2606.07598#bib.bib22); Sturm,[2012](https://arxiv.org/html/2606.07598#bib.bib28)\)on the metric measure space\(𝕊n−1,d𝕊,μ\)\(\\mathbb\{S\}^\{n\-1\},d\_\{\\mathbb\{S\}\},\\mu\):
GWpp\(\(X,dX,μX\),\(Y,dY,μY\)\):=infγ∈Π\(μX,μY\)∫∫\|dX\(x,x′\)−dY\(y,y′\)\|p𝑑γ\(x,y\)𝑑γ\(x′,y′\)\.\\mathrm\{GW\}\_\{p\}^\{p\}\\bigl\(\(X,d\_\{X\},\\mu\_\{X\}\),\(Y,d\_\{Y\},\\mu\_\{Y\}\)\\bigr\)\\;:=\\;\\inf\_\{\\gamma\\in\\Pi\(\\mu\_\{X\},\\mu\_\{Y\}\)\}\\int\\int\\\!\\bigl\|d\_\{X\}\(x,x^\{\\prime\}\)\-d\_\{Y\}\(y,y^\{\\prime\}\)\\bigr\|^\{p\}\\,d\\gamma\(x,y\)\\,d\\gamma\(x^\{\\prime\},y^\{\\prime\}\)\.A graphon\-signal can be embedded directly as an mm\-space without any sphere mapping at all:X=\[0,1\]X=\[0,1\],dX\(x,x′\)d\_\{X\}\(x,x^\{\\prime\}\)derived from the graphon distanceWW,μX=Leb\|\[0,1\]\\mu\_\{X\}=\\mathrm\{Leb\}\|\_\{\[0,1\]\}\. The Gromov–Wasserstein distance is permutation\-invariant by construction\(Xu et al\.,[2019b](https://arxiv.org/html/2606.07598#bib.bib32),[c](https://arxiv.org/html/2606.07598#bib.bib33)\), sidestepping the𝒮\[0,1\]\\mathcal\{S\}\_\{\[0,1\]\}\-quotient that complicates cut\-distance\. We conjecture the following\.
###### Conjecture 9\.1\(GW\-equivalence of cut\-distance\)\.
On the subspace of step\-graphon\-signals with at mostKKblocks and signal norm at mostrr, the cut\-distanceδ□r\\delta\_\{\\square\}^\{r\}and the Gromov–Wasserstein\-2 distance are bi\-Lipschitz equivalent, with constants depending only onKKandrr\.
A proof or counterexample would settle the question of whether the Gromov–Wasserstein machinery can fully replace cut\-distance in the analysis of MPNNs\.
### 9\.3Information geometry: SBMs as a statistical manifold
The space ofnn\-block SBMs with prescribed block sizes is parametrised by a symmetric matrixW∈\[0,1\]n×nW\\in\[0,1\]^\{n\\times n\}of edge probabilities — intrinsically, a\(n2\)\+n\\binom\{n\}\{2\}\+n\-dimensional manifold\. Treating each edge slot as an independent Bernoulli variable and equipping the parameter space with the Fisher information metric
gij,kl\(W\)=𝔼e∼Bern\(Wij\)\[∂ijlogp∂kllogp\]=δ\(ij\),\(kl\)Wij\(1−Wij\)g\_\{ij,kl\}\(W\)\\;=\\;\\mathbb\{E\}\_\{e\\sim\\mathrm\{Bern\}\(W\_\{ij\}\)\}\\\!\\Bigl\[\\partial\_\{ij\}\\log p\\,\\partial\_\{kl\}\\log p\\Bigr\]\\;=\\;\\frac\{\\delta\_\{\(ij\),\(kl\)\}\}\{W\_\{ij\}\(1\-W\_\{ij\}\)\}gives a diagonal Riemannian metric on the open block\(0,1\)\(n2\)\+n\(0,1\)^\{\\binom\{n\}\{2\}\+n\}, flat in the interior \(the product of11\-dimensional Bernoulli arcs\) with metric singularities along\{Wij=0\}∪\{Wij=1\}\\\{W\_\{ij\}=0\\\}\\cup\\\{W\_\{ij\}=1\\\}that pull “deterministic” edges away from “random” ones\. Geodesic distance on this manifold corresponds to the Hellinger / Bures distance between the induced edge distributions\(Amari,[2016](https://arxiv.org/html/2606.07598#bib.bib1)\)\. This perspective is complementary to the spherical one:
- •*Spherical*: emphasises spatial/visual structure of block masses\.
- •*Information*: emphasises statistical distinguishability of edge\-probability matrices\.
Joint use suggests a product metricdjoint2=αd𝕊2\+\(1−α\)dFisher2d^\{2\}\_\{\\mathrm\{joint\}\}=\\alpha\\,d\_\{\\mathbb\{S\}\}^\{2\}\+\(1\-\\alpha\)\\,d\_\{\\mathrm\{Fisher\}\}^\{2\}with a tuning hyperparameterα\\alpha\.
### 9\.4Persistent homology of layer\-wise embedding clouds
A trained MPNN gives, at each layerkk, a point cloud\{hv\(k\)\}v∈V⊂ℝdk\\\{h\_\{v\}^\{\(k\)\}\\\}\_\{v\\in V\}\\subset\\mathbb\{R\}^\{d\_\{k\}\}\. Computing the*persistence diagram*\(Edelsbrunner and Harer,[2010](https://arxiv.org/html/2606.07598#bib.bib7); Carlsson,[2009](https://arxiv.org/html/2606.07598#bib.bib3)\)of the Vietoris–Rips filtration on each such cloud, and stacking the resulting barcodes across layers, produces a topological signature that is invariant under any homeomorphism of the embedding space\. Two GNNs whose layer\-wise persistence diagrams agree under the bottleneck distance are encoding the same multi\-scale topological structure — a stronger claim than visual sphere similarity\. We propose to combine this with the spherical fingerprint as a two\-stage retrieval: cheap sphere lookup, then expensive persistence verification\.
### 9\.5Spectral fingerprints from the graphon eigendecomposition
A graphonW∈𝒲~0W\\in\\widetilde\{\\mathcal\{W\}\}\_\{0\}is a Hilbert–Schmidt kernel onL2\(\[0,1\]\)L^\{2\}\(\[0,1\]\)and admits an eigendecompositionW\(x,y\)=∑i=1∞λiφi\(x\)φi\(y\)W\(x,y\)=\\sum\_\{i=1\}^\{\\infty\}\\lambda\_\{i\}\\,\\varphi\_\{i\}\(x\)\\,\\varphi\_\{i\}\(y\)with eigenvaluesλ1≥λ2≥⋯\\lambda\_\{1\}\\geq\\lambda\_\{2\}\\geq\\cdotsconverging to zero\(Lovász,[2012](https://arxiv.org/html/2606.07598#bib.bib18), §7\.5\)\. The truncated spectrum\(λ1,…,λK\)\(\\lambda\_\{1\},\\dots,\\lambda\_\{K\}\)is permutation\-invariant and provides aKK\-dimensional Euclidean fingerprint that can be used as a baseline against the spherical proposal\. The spectral fingerprint is cheaper to compute and rotation\-invariant by construction, but strictly less informative than the full spherical map, which retains the geometry of node assignments to blocks\.
### 9\.6An experimental program
We propose the following four\-step empirical evaluation:
1. 1\.Build a library of∼200\\sim 200pre\-trained GNNs spanning eight problem domains \(PROTEINS, ENZYMES, COLLAB, REDDIT\-B, OGBN\-ARXIV, OGBN\-PRODUCTS, ZINC, ogbg\-molhiv\)\.
2. 2\.For each, compute the five candidate fingerprints \([Section˜7\.3](https://arxiv.org/html/2606.07598#S7.SS3),[Section˜9\.2](https://arxiv.org/html/2606.07598#S9.SS2),[Section˜9\.4](https://arxiv.org/html/2606.07598#S9.SS4),[Section˜9\.5](https://arxiv.org/html/2606.07598#S9.SS5), and a hybrid\)\.
3. 3\.Use each fingerprint to predict transfer\-learning effectiveness, scored as the validation\-loss reduction obtained when fine\-tuning the matched source model on a small target\-task sample versus training from scratch\.
4. 4\.Report Spearman correlation between fingerprint distance and transfer effectiveness, and the wall\-clock cost of each fingerprint\.
## 10Conclusion
We have proposed a topological framework for comparing trained Graph Neural Networks via spherical fingerprints of the Stochastic Block Models that approximate their graphon\-signal action\. Our framework rests on the compactness of the cut\-distance graphon space and on Levie’s recent graphon\-signal extension of the Frieze–Kannan weak regularity lemma, in conjunction with the Lipschitz property of MPNNs\.
We have identified the high\-dimensional concentration of measure as the primary obstacle to scaling the spherical fingerprint to LLM\-class embedding dimensions, and determined the unit\-area sphere dimension to ben⋆≈18\.7683n^\{\\star\}\\approx 18\.7683\. We have outlined five concrete avenues for future research — hyperbolic / Grassmannian targets, Gromov–Wasserstein distances, information geometry of SBM space, persistent homology of layer\-wise embedding clouds, and a spectral baseline — together with an experimental program for empirical validation\.
The broader vision is one of*model archaeology*: a public registry of trained GNNs, each annotated with a small, comparable, mathematically principled fingerprint\. Such a registry would make transfer learning at the GNN scale a retrieval problem rather than a re\-training problem, and would allow the community to identify, at a glance, the topological niches in which existing models cluster and the niches in which they are missing\.
#### Acknowledgements\.
The author thanks the Emporia State University seminar audience for discussion, and acknowledges Ron Levie’s foundational work\(Levie,[2023](https://arxiv.org/html/2606.07598#bib.bib17); Böker et al\.,[2023](https://arxiv.org/html/2606.07598#bib.bib2); Rauchwerger et al\.,[2025](https://arxiv.org/html/2606.07598#bib.bib24)\)on which a substantial portion of this paper depends\.
## References
- Amari \(2016\)Shun\-ichi Amari\.*Information Geometry and Its Applications*\.Applied Mathematical Sciences 194\. Springer Japan, 2016\.
- Böker et al\. \(2023\)Jan Böker, Ron Levie, Ningyuan Huang, Soledad Villar, and Christopher Morris\.Fine\-grained expressivity of graph neural networks\.In*Advances in Neural Information Processing Systems 36 \(NeurIPS 2023\)*, 2023\.arXiv:2306\.03698\.
- Carlsson \(2009\)Gunnar Carlsson\.Topology and data\.*Bulletin of the American Mathematical Society*, 46\(2\):255–308, 2009\.
- Cervino et al\. \(2023\)Juan Cervino, Luana Ruiz, and Alejandro Ribeiro\.Learning by transference: Training graph neural networks on growing graphs\.*IEEE Transactions on Signal Processing*, 71:233–247, 2023\.arXiv:2106\.03693\.
- Cybenko \(1989\)George Cybenko\.Approximation by superpositions of a sigmoidal function\.*Mathematics of Control, Signals, and Systems*, 2\(4\):303–314, 1989\.
- Edelman et al\. \(1998\)Alan Edelman, Tomás A\. Arias, and Steven T\. Smith\.The geometry of algorithms with orthogonality constraints\.*SIAM Journal on Matrix Analysis and Applications*, 20\(2\):303–353, 1998\.
- Edelsbrunner and Harer \(2010\)Herbert Edelsbrunner and John Harer\.*Computational Topology: An Introduction*\.American Mathematical Society, Providence, RI, 2010\.
- Frieze and Kannan \(1999\)Alan Frieze and Ravi Kannan\.Quick approximation to matrices and applications\.*Combinatorica*, 19\(2\):175–220, 1999\.
- Gilmer et al\. \(2017\)Justin Gilmer, Samuel S\. Schoenholz, Patrick F\. Riley, Oriol Vinyals, and George E\. Dahl\.Neural message passing for quantum chemistry\.In*Proceedings of the 34th International Conference on Machine Learning \(ICML 2017\)*, volume 70 of*Proceedings of Machine Learning Research*, pages 1263–1272\. PMLR, 2017\.arXiv:1704\.01212\.
- Hatcher \(2002\)Allen Hatcher\.*Algebraic Topology*\.Cambridge University Press, Cambridge, 2002\.
- Holland et al\. \(1983\)Paul W\. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt\.Stochastic blockmodels: First steps\.*Social Networks*, 5\(2\):109–137, 1983\.
- Hornik \(1991\)Kurt Hornik\.Approximation capabilities of multilayer feedforward networks\.*Neural Networks*, 4\(2\):251–257, 1991\.
- Hu et al\. \(2020\)Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec\.Strategies for pre\-training graph neural networks\.In*International Conference on Learning Representations \(ICLR 2020\)*, 2020\.arXiv:1905\.12265\.
- Kipf and Welling \(2017\)Thomas N\. Kipf and Max Welling\.Semi\-supervised classification with graph convolutional networks\.In*International Conference on Learning Representations \(ICLR 2017\)*, 2017\.arXiv:1609\.02907\.
- Ledoux \(2001\)Michel Ledoux\.*The Concentration of Measure Phenomenon*\.Mathematical Surveys and Monographs 89\. American Mathematical Society, 2001\.
- Levie et al\. \(2019\)Ron Levie, Elvin Isufi, and Gitta Kutyniok\.On the transferability of spectral graph filters\.In*13th International Conference on Sampling Theory and Applications \(SampTA 2019\)*, 2019\.arXiv:1901\.10524\.
- Levie \(2023\)Ron Levie\.A graphon\-signal analysis of graph neural networks\.In*Advances in Neural Information Processing Systems 36 \(NeurIPS 2023\)*, 2023\.arXiv:2305\.15987\.
- Lovász \(2012\)László Lovász\.*Large Networks and Graph Limits*\.AMS Colloquium Publications 60\. American Mathematical Society, Providence, RI, 2012\.
- Lovász and Szegedy \(2006\)László Lovász and Balázs Szegedy\.Limits of dense graph sequences\.*Journal of Combinatorial Theory, Series B*, 96\(6\):933–957, 2006\.
- Maron et al\. \(2019\)Haggai Maron, Heli Ben\-Hamu, Nadav Shamir, and Yaron Lipman\.Invariant and equivariant graph networks\.In*International Conference on Learning Representations \(ICLR 2019\)*, 2019\.arXiv:1812\.09902\.
- Maskey et al\. \(2022\)Sohir Maskey, Ron Levie, Yunseok Lee, and Gitta Kutyniok\.Generalization analysis of message passing neural networks on large random graphs\.In*Advances in Neural Information Processing Systems 35 \(NeurIPS 2022\)*, 2022\.arXiv:2202\.00645\.
- Mémoli \(2011\)Facundo Mémoli\.Gromov–Wasserstein distances and the metric approach to object matching\.*Foundations of Computational Mathematics*, 11\(4\):417–487, 2011\.
- Nickel and Kiela \(2017\)Maximilian Nickel and Douwe Kiela\.Poincaré embeddings for learning hierarchical representations\.In*Advances in Neural Information Processing Systems 30 \(NIPS 2017\)*, 2017\.
- Rauchwerger et al\. \(2025\)Levi Rauchwerger, Stefanie Jegelka, and Ron Levie\.Generalization, expressivity, and universality of graph neural networks on attributed graphs\.In*International Conference on Learning Representations \(ICLR 2025\)*, 2025\.arXiv:2411\.05464\.
- Ruiz et al\. \(2020\)Luana Ruiz, Luiz F\. O\. Chamon, and Alejandro Ribeiro\.Graphon neural networks and the transferability of graph neural networks\.In*Advances in Neural Information Processing Systems 33 \(NeurIPS 2020\)*, 2020\.arXiv:2006\.03548\.
- Sanchez\-Lengeling et al\. \(2021\)Benjamin Sanchez\-Lengeling, Emily Reif, Adam Pearce, and Alexander B\. Wiltschko\.A gentle introduction to graph neural networks\.*Distill*, 2021\.[https://distill\.pub/2021/gnn\-intro/](https://distill.pub/2021/gnn-intro/)\.doi:10\.23915/distill\.00033\.
- Sala et al\. \(2018\)Frederic Sala, Christopher De Sa, Albert Gu, and Christopher Ré\.Representation tradeoffs for hyperbolic embeddings\.In*Proceedings of the 35th International Conference on Machine Learning \(ICML 2018\)*, 2018\.
- Sturm \(2012\)Karl\-Theodor Sturm\.The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces\.*arXiv:1208\.0434*, 2012\.
- Szemerédi \(1978\)Endre Szemerédi\.Regular partitions of graphs\.In*Problèmes Combinatoires et Théorie des Graphes*, Colloques Internationaux CNRS 260, pages 399–401\. CNRS, Paris, 1978\.
- Veličković et al\. \(2018\)Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio\.Graph attention networks\.In*International Conference on Learning Representations \(ICLR 2018\)*, 2018\.arXiv:1710\.10903\.
- Xu et al\. \(2019a\)Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka\.How powerful are graph neural networks?In*International Conference on Learning Representations \(ICLR 2019\)*, 2019a\.arXiv:1810\.00826\.
- Xu et al\. \(2019b\)Hongteng Xu, Dixin Luo, and Lawrence Carin\.Gromov–Wasserstein learning for graph matching and node embedding\.In*Proceedings of the 36th International Conference on Machine Learning \(ICML 2019\)*, 2019b\.
- Xu et al\. \(2019c\)Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin\.Scalable Gromov–Wasserstein learning for graph partitioning and matching\.In*Advances in Neural Information Processing Systems 32 \(NeurIPS 2019\)*, 2019c\.Similar Articles
Non-negative Matrix Factorisation with Topological Regularisation
The paper introduces Top-NMF, a framework that incorporates topological regularization via persistent homology into non-negative matrix factorization to learn interpretable and structurally coherent bases, applicable to spatially coherent image components, periodic time-series structures, and clique-like graph signals.
@BetaTomorrow: Paper: Topological Neural Operators Authors: Lennart Bastian(@lennart_bastian), Tolga Birdal(@tolga_birdal), Samuel Lev…
This paper introduces Topological Neural Operators, which lift neural operators from point-only domains to cell complexes, embedding geometry and topology to reduce the learning burden. It demonstrates that operator learning improves when geometry is not an afterthought, though the topology remains prescribed.
Learning Coherent Representations: A Topological Approach to Interpretability
This paper introduces coherence, a geometric constraint for neural representations inspired by grid cells and head direction cells in the brain. Coherence ensures that features respond to geometrically connected regions of the data manifold, improving interpretability; the authors propose a differentiable objective (Coh) and validate it on synthetic data, rotated MNIST, and BERT token embeddings.
Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models
This paper compares the geometric structures induced by deep learning vector embeddings (CamemBERT) and lexical co-occurrence graph models on the French 'Great National Debate' corpus, finding similar local topology but distinct global organization, highlighting complementarity between the two approaches.
Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models
Researchers propose a lightweight autoregressive framework for graph generation that uses structure-guided topological ordering to achieve near log-linear complexity, addressing scalability and novelty limitations of existing diffusion and autoregressive methods. The approach supports both LSTM and Mamba-style backbones and shows improved novelty while maintaining validity and uniqueness on molecular and non-molecular benchmarks.