Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS

arXiv cs.LG 05/08/26, 04:00 AM Papers
Summary
Introduces Graph Normalization, a differentiable dynamical system for approximating Maximum Weight Independent Set, with convergence guarantees and applications in structured sparse attention and constrained optimization.
arXiv:2605.05330v1 Announce Type: new Abstract: We introduce Graph Normalization (GN), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP-hard Maximum Weight Independent Set (MWIS) problem. MWIS encompasses many combinatorial challenges, including optimal assignment, scheduling, set packing, and MAP inference in discrete Markov Random Fields. Unlike Belief Propagation, we prove GN always converges to a binary indicator of a Maximum Independent Set. GN realizes a fast quasi-Newton descent through an exact Majorization-Minimization step, systematically improving the MWIS relaxed primal objective. We establish an equivalence between GN and the Replicator Dynamics of a nonlinear evolutionary game, where vertices compete for inclusion in an independent set. While a non-potential game, the GN game follows Fisher's Fundamental Theorem of Natural Selection, where the average fitness equals the MWIS primal objective and strictly increases. This connection leads to a weighted extension of the Motzkin-Straus theorem, showing MISes are in bijection with the local minima of a quadratic form over a tilted simplex. For the Assignment Problem, GN acts as a variant of the Sinkhorn algorithm that naturally converges to a hard assignment while generalizing to arbitrary constraint graphs. We demonstrate GN's performance as a fast binarization engine for the state-of-the-art Bregman-Sinkhorn relaxed MWIS solver. On real-world benchmarks with up to 1M edges, GN identifies solutions within 1% of the best known results in seconds on a CPU. GN opens new avenues for deep learning architectures requiring differentiable, "hard" decisions under constraints, with applications in structured sparse attention, dynamic network pruning, and Mixture-of-Experts. Beyond core AI, the GN framework enables end-to-end learning of constrained optimization in computer vision, computational biology, and resource allocation.
Original Article
View Cached Full Text
Cached at: 05/08/26, 06:53 AM
# Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS
Source: [https://arxiv.org/html/2605.05330](https://arxiv.org/html/2605.05330)
###### Abstract

We introduce Graph Normalization \(GN\), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP\-hard Maximum Weight Independent Set \(MWIS\) problem\. MWIS encompasses a vast class of combinatorial challenges, including optimal assignment, scheduling, set packing, and MAP inference in discrete Markov Random Fields\. Unlike Belief Propagation, we prove that GN always converges to a binary indicator of a Maximum Independent Set \(MIS\)\. GN realizes a fast quasi\-Newton descent through an exact Majorization\-Minimization step, which systematically improves the MWIS relaxed primal objective\. We establish an equivalence between GN and the Replicator Dynamics of a nonlinear evolutionary game, where vertices compete for inclusion in an independent set\. While being a non\-potential game, the GN game follows Fisher’s Fundamental Theorem of Natural Selection, where the average fitness is equal to the MWIS primal objective and strictly increases over time\. This connection leads to a weighted extension of the Motzkin\-Straus theorem, showing that MISes are in bijection with the local minima of a quadratic form over a tilted simplex\. For the Assignment Problem, GN acts as a variant of the Sinkhorn algorithm that naturally converges to a hard assignment while generalizing to arbitrary constraint graphs beyond bipartite matching\. We demonstrate GN’s performance as a fast binarization engine for the state\-of\-the\-art Bregman\-Sinkhorn relaxed MWIS solver\. On real\-world benchmarks with up to 1M edges, GN identifies solutions within a11% gap of the best known results in seconds on a laptop CPU\. GN opens new avenues for deep learning architectures requiring differentiable, ”hard” decisions under constraints, with potential applications in structured sparse attention, dynamic network pruning, and Mixture\-of\-Experts\. Beyond core AI, the GN framework provides a foundation for end\-to\-end learning of constrained optimization tasks in fields such as computer vision, computational biology, and resource allocation\.

## 1Introduction

The Maximum Weight Independent Set \(MWIS\) problem – selecting a set of non\-adjacent nodes in a graph of maximum total weight – is a cornerstone of combinatorial optimization\. It is equivalent to the Maximum Weight Clique \(MWC\) problem in the complementary graph and it directly subsumes classic problems such as Bipartite Graph Matching and itsnn\-dimensional extensions,kk\-Coloring, Maximum Weight Set Packing, the multi\-dimensional Knapsack problem, and Inexact Graph MatchingBunke \([1997](https://arxiv.org/html/2605.05330#bib.bib12)\)\. Furthermore, it serves as a pivotal framework for MAP inference in discrete Markov Random FieldsSanghaviet al\.\([2009](https://arxiv.org/html/2605.05330#bib.bib106)\), Max\-SAT reductionsKarp \([1972](https://arxiv.org/html/2605.05330#bib.bib28)\), and Distributed Constraint Optimization \(DCOP\)Fiorettoet al\.\([2018](https://arxiv.org/html/2605.05330#bib.bib20)\)\. Many real\-world problems can be formulated as MWIS instances, spanning from logistics and vehicle routingDumaset al\.\([1991](https://arxiv.org/html/2605.05330#bib.bib5)\), wireless link scheduling in telecommunicationsTassiulas and Ephremides \([1990](https://arxiv.org/html/2605.05330#bib.bib7)\), to map label placementVerweij and Aardal \([1999](https://arxiv.org/html/2605.05330#bib.bib112)\)\. In the domain of computer vision, this includes image segmentationBrendel and Todorovic \([2010](https://arxiv.org/html/2605.05330#bib.bib107)\), 3D reconstructionZhuoet al\.\([2016](https://arxiv.org/html/2605.05330#bib.bib156)\), multi\-object trackingPapageorgiou and Salpukas \([2009](https://arxiv.org/html/2605.05330#bib.bib115)\), and action recognitionYu and Yuan \([2015](https://arxiv.org/html/2605.05330#bib.bib138)\)\. In biological sciences, MWIS addresses molecular sequence alignmentKececioglu \([1993](https://arxiv.org/html/2605.05330#bib.bib19)\), drug discoveryBanchiet al\.\([2020](https://arxiv.org/html/2605.05330#bib.bib8)\)and genome sequence assemblyLaso\-Jadartet al\.\([2020](https://arxiv.org/html/2605.05330#bib.bib18)\), while in economics, it provides the foundation for solving combinatorial auctionsDe Vries and Vohra \([2003](https://arxiv.org/html/2605.05330#bib.bib153)\)and is used to model portfolio managementBettinelliet al\.\([2017](https://arxiv.org/html/2605.05330#bib.bib6)\)\.

Despite its fundamental importance, MWIS is NP\-hard and hard to approximateGarey and Johnson \([1979](https://arxiv.org/html/2605.05330#bib.bib148)\), making exact solutions intractable for the massive real\-world graphs encountered in modern applications\. Traditional approaches often rely on LP relaxations of the associated Binary Linear Program \(BLP\)Halleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\); however, bridging the gap between fractional solutions and feasible integer assignments, i\.e\., actual solutions to the MWIS problem, remains a significant challenge and often a computational bottleneck\. Even modern learning\-based approaches using Graph Neural Networks \(GNNs\), such asKaralias and Loukas \([2020](https://arxiv.org/html/2605.05330#bib.bib21)\), typically solve a continuous relaxation of the problem and suffer from the same binarization deficit: they do not guarantee a valid binary output, necessitating post\-hoc greedy heuristics\. They also fail to provide the rigorous optimality gap guarantees found in classical optimization frameworks\.

Another fundamental challenge is the integration of MWIS solvers into deep learning pipelines to enable end\-to\-end learning of constrained tasks\. Classical combinatorial algorithms are procedural and non\-differentiable, precluding their direct use in gradient\-based learning\. While black\-box differentiationPogančićet al\.\([2019](https://arxiv.org/html/2605.05330#bib.bib11)\)circumvents this by perturbing the input, it incurs high computational overhead and treats the solver as an external oracle rather than a native dynamical layer\. Alternatively, Reinforcement Learning \(RL\) has been employed to treat optimization as a sequential decision taskBelloet al\.\([2016](https://arxiv.org/html/2605.05330#bib.bib3)\); Khalilet al\.\([2017](https://arxiv.org/html/2605.05330#bib.bib4)\); however, RL suffers from significant sample inefficiency and high variance, making it computationally prohibitive for the production\-scale graphs addressed in this work\. While Belief Propagation \(BP\) is differentiable, it is notoriously prone to non\-convergence on graphs with loopsSanghaviet al\.\([2009](https://arxiv.org/html/2605.05330#bib.bib106)\)\. To our knowledge, the only native differentiable MWIS framework providing guaranteed convergence to a valid binary output without external annealing schedules is the Replicator Dynamics approach of Bomze and co\-workersBomze \([1997](https://arxiv.org/html/2605.05330#bib.bib57)\)\. We detail the relationship between our approach and Bomze’s – and why GN provides superior scalability – in Section[5](https://arxiv.org/html/2605.05330#S5)\.

Crucially, the most prominent success story of differentiable combinatorial optimization – the Sinkhorn\-Knopp \(SK\) algorithm – is fundamentally an MWIS solver restricted to bipartite graphs\. Indeed, Bipartite Graph Matching \(BGM\), also known as the Assignment Problem – finding a 1:1 matching between two sets which maximizes the total weight of the matches, is equivalent to MWIS in the line graphL\(Kn,n\)L\(K\_\{n,n\}\), in which the role of nodes and edges is exchanged compared to the complete bipartite graphKn,nK\_\{n,n\}\. BGM is an easy sub\-problem of MWIS: it can be solved in polynomial time, e\.g\., by Kuhn’s Hungarian algorithmKuhn \([1955](https://arxiv.org/html/2605.05330#bib.bib149)\)\. In their seminal work, Sinkhorn and Knopp have proven that alternating row and column normalization of a non\-negative matrix with total support converges to a bi\-stochastic matrixSinkhorn and Knopp \([1967](https://arxiv.org/html/2605.05330#bib.bib117)\), i\.e\., to a ”soft” / relaxed solution of the Assignment Problem\. The relationship between Sinkhorn\-Knopp and the ”hard” assignment problem was historically navigated through the lens of statistical physics, most notably in the SoftAssignGoldet al\.\([1996](https://arxiv.org/html/2605.05330#bib.bib125)\)and Graduated AssignmentGold and Rangarajan \([1996](https://arxiv.org/html/2605.05330#bib.bib124)\)frameworks\. These methods utilize deterministic annealing to solve the assignment problem by minimizing a free energy functionYuilleet al\.\([1990](https://arxiv.org/html/2605.05330#bib.bib31)\); Kosowsky and Yuille \([1994](https://arxiv.org/html/2605.05330#bib.bib123)\); as the “temperature” parameter is decreased toward the0\-temperature limit, the soft solution is forced to converge toward a “crisp” permutation matrixRangarajanet al\.\([1997](https://arxiv.org/html/2605.05330#bib.bib98)\)\. This lineage provided the foundational theory for the modern breakthrough of Optimal Transport \(OT\), where the Sinkhorn algorithm provides an efficient solution to a strictly convex relaxation of the Kantorovich problemCuturi \([2013](https://arxiv.org/html/2605.05330#bib.bib38)\)\. Mathematically, Sinkhorn iterations act as Bregman projections – minimizing the Kullback\-Leibler divergence toward the constraint set – where an entropic penalty effectively acts as the inverse of the annealing temperatureCuturi \([2013](https://arxiv.org/html/2605.05330#bib.bib38)\); Genevay \([2019](https://arxiv.org/html/2605.05330#bib.bib100)\)\. The popularization of OT within the machine learning community has, in turn, established “Sinkhorn Layers” as foundational plug\-and\-play modules across a diverse array of neural architectures – ranging from differentiable rankingAdams and Zemel \([2011](https://arxiv.org/html/2605.05330#bib.bib40)\); Menaet al\.\([2018](https://arxiv.org/html/2605.05330#bib.bib41)\), graph alignmentZanfir and Sminchisescu \([2018](https://arxiv.org/html/2605.05330#bib.bib139)\), image matchingSarlinet al\.\([2020](https://arxiv.org/html/2605.05330#bib.bib22)\)to multi\-person 3D pose trackingReddyet al\.\([2021](https://arxiv.org/html/2605.05330#bib.bib23)\)\. The utility of Sinkhorn dynamics is evident in modern Large Language Models \(LLMs\); Mixture\-of\-Experts \(MoE\) and Manifold Constrained Hyper\-ConnectionsXieet al\.\([2025](https://arxiv.org/html/2605.05330#bib.bib34)\)employ SK iterations to stabilize training and enforce balanced token routingClarket al\.\([2022](https://arxiv.org/html/2605.05330#bib.bib37)\); DeepSeek\-AI \([2024](https://arxiv.org/html/2605.05330#bib.bib33)\)\.

Despite its success, SK is fundamentally constrained by its underlying bipartite topology, addressing primarily the assignment problem and its multi\-dimensional generalizationAltschuler and Boix\-Adsera \([2023](https://arxiv.org/html/2605.05330#bib.bib94)\)\. While effective for 1:1 matchings, SK cannot natively resolve complex multi\-way conflicts found in general relational data without artificial bipartitization such as in the SoftAssign algorithm for graph matchingGold and Rangarajan \([1996](https://arxiv.org/html/2605.05330#bib.bib124)\)\. Furthermore, SK typically produces soft assignments within the Birkhoff polytope, requiring vanishing entropy regularization via sensitive annealing schedules to recover discrete solutionsRangarajanet al\.\([1999](https://arxiv.org/html/2605.05330#bib.bib32)\)\.

In this paper, we introduce Graph Normalization \(GN\), a principled dynamical system that overcomes these topological constraints by generalizing Sinkhorn\-like normalization dynamics to arbitrary constraint graphs\. We show that GN realizes a fast quasi\-Newton descent through an exact Majorization\-Minimization step that is mathematically proven to always converge to a binary indicator of a Maximum Independent Set, eliminating the need for post\-hoc heuristics or sensitive annealing schedules\. We establish a formal equivalence between GN and the Replicator Dynamics of a nonlinear evolutionary game, where the average population fitness identifies with the MWIS primal objective and is proven to strictly increase over time\. Empirically, we demonstrate the efficiency of GN as as a high\-speed binarization engine for the relaxed solutions provided by the state\-of\-the\-art Bregman\-Sinkhorn fractional MWIS algorithmHalleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\), producing high quality solutions on production\-scale graphs with millions of edges\. As a*differentiable*MWIS engine, GN unlocks end\-to\-end learning of complex constrained tasks, bridging the gap between combinatorial optimization, theoretical game theory, and the structural demands of modern neural architectures\.

## 2Graph Normalization: Definition and Basic Properties

We denoteℝ\+n\\mathbb\{R\}\_\{\+\}^\{n\}andℝ\+\+n\\mathbb\{R\}\_\{\+\+\}^\{n\}as the non\-negative and positive orthants, respectively\. We writex≥0x\\geq 0ifx∈ℝ\+nx\\in\\mathbb\{R\}\_\{\+\}^\{n\}andx\>0x\>0ifx∈ℝ\+\+nx\\in\\mathbb\{R\}\_\{\+\+\}^\{n\}\. Forx,y∈ℝ\+\+nx,y\\in\\mathbb\{R\}\_\{\+\+\}^\{n\},x⊙yx\\odot yandx⊘yx\\oslash ydenote resp\. componentwise multiplication and division\. LetG=\(V,E\)G=\(V,E\)be a simple undirected graph with\|V\|=n\|V\|=nand adjacency matrixA∈\{0,1\}n×nA\\in\\\{0,1\\\}^\{n\\times n\}, verifyingAii=0A\_\{ii\}=0asGGis simple\. We identifyVVwith\{1…n\}\\\{1\\dots n\\\}andGGwith its adjacency matrix\. For a vectorx∈ℝ\+nx\\in\\mathbb\{R\}\_\{\+\}^\{n\},supp⁡\(x\)=\{i:xi\>0\}⊆V\\operatorname\{supp\}\(x\)=\\\{i:x\_\{i\}\>0\\\}\\subseteq Vis its*support*\. Conversely, ifS⊆VS\\subseteq V, then𝟏S\\mathbf\{1\}\_\{S\}denotes its*indicator vector*, which is binary and verifies\(𝟏S\)i=1⇔i∈S\(\\mathbf\{1\}\_\{S\}\)\_\{i\}=1\\Leftrightarrow i\\in S\. Assupp⁡\(𝟏S\)=S\\operatorname\{supp\}\(\\mathbf\{1\}\_\{S\}\)=S, we identify each subsetS⊆VS\\subseteq Vwith its indicator vector𝟏S∈\{0,1\}n\\mathbf\{1\}\_\{S\}\\in\\\{0,1\\\}^\{n\}\. A vectorx∈\[0,1\]nx\\in\[0,1\]^\{n\}can be interpreted as a*fuzzy set*membership vector over the set of verticesVV\. Whenx∈\{0,1\}nx\\in\\\{0,1\\\}^\{n\}is binary, it is the indicator vector𝟏S\\mathbf\{1\}\_\{S\}of a*crisp*subset of verticesS⊂VS\\subset V\.S⊂VS\\subset Vis an*Independent Set \(IS\)*,S∈𝖨𝖲S\\in\\mathsf\{IS\}, when no two elements ofSSare neighbors inAA, i\.e,Aij=0A\_\{ij\}=0for all\(i,j\)∈S2\(i,j\)\\in S^\{2\}, or equivalently,𝟏STA𝟏S=0\\mathbf\{1\}\_\{S\}^\{T\}A\\mathbf\{1\}\_\{S\}=0\. An ISSSis a*Maximal Independent Set \(MIS\)*,S∈𝖬𝖨𝖲S\\in\\mathsf\{MIS\}, if it is not included in another IS\. For any nodei∈Vi\\in V,N\(i\):=\{j∈V:Aij=1\}N\(i\):=\\\{j\\in V:A\_\{ij\}=1\\\}is its*neighborhood*\. We denote byAI:=A\+I\{A\}\_\{\\scriptscriptstyle I\}:=A\+Ithe*closed adjacency matrix*, where the addition of the identity matrixIIcorresponds to the inclusion of self\-loops for every vertex, such that\(AI\)ii=1\(\{A\}\_\{\\scriptscriptstyle I\}\)\_\{ii\}=1\. For a vectorx∈ℝ\+nx\\in\\mathbb\{R\}\_\{\+\}^\{n\}, the values\(AIx\)i=xi\+∑j∈N\(i\)xj\(\{A\}\_\{\\scriptscriptstyle I\}x\)\_\{i\}=x\_\{i\}\+\\sum\_\{j\\in N\(i\)\}x\_\{j\}are called the*closed neighborhood sums*\.

###### Definition 1\.

A vectorx∈ℝ\+nx\\in\\mathbb\{R\}\_\{\+\}^\{n\}isNormalizableon a graphAAif and only ifAIx\>0\{A\}\_\{\\scriptscriptstyle I\}x\>0\.𝖭A:=\{x∈ℝ\+n:AIx\>𝟎\}\\mathsf\{N\}\_\{A\}:=\\\{x\\in\\mathbb\{R\}\_\{\+\}^\{n\}:\{A\}\_\{\\scriptscriptstyle I\}x\>\\mathbf\{0\}\\\}denotes the set of normalizable vectors onAA\.

The non\-normalizable vectors are the vectors which are identically zero on some closed neighborhoods, i\.e\., such that for some nodeii,xi=0x\_\{i\}=0as well as for all its neighbors\.

###### Definition 2\.

TheGraph Normalization \(GN\)map𝒩A:𝖭A→\[0,1\]n\\mathcal\{N\}\_\{A\}:\\mathsf\{N\}\_\{A\}\\to\[0,1\]^\{n\}associates to a normalizable vectorxxthe vector:

𝒩A\(x\):=x⊘AIx\\mathcal\{N\}\_\{A\}\(x\):=x\\oslash\{A\}\_\{\\scriptscriptstyle I\}x\(1\)
Element\-wise, this is expressed as:\(𝒩A\(x\)\)i=xi/\(xi\+∑j∈N\(i\)xj\)\(\\mathcal\{N\}\_\{A\}\(x\)\)\_\{i\}=x\_\{i\}/\\left\(x\_\{i\}\+\\sum\_\{j\\in N\(i\)\}x\_\{j\}\\right\)\.

Iterative Graph Normalization \(IGN\)is then defined by the sequence:x0∈𝖭Ax^\{0\}\\in\\mathsf\{N\}\_\{A\};xk\+1=𝒩A\(xk\)x^\{k\+1\}=\\mathcal\{N\}\_\{A\}\(x^\{k\}\)\.

By GN, each node is normalized in parallel by the sum of its own mass and the mass of its neighbors\.

When the graph is the complete graphKnK\_\{n\}, all nodes are pairwise connected, and the closed neighborhood sum for every node is identical:\(AIx\)i=∑j=1nxj\(\{A\}\_\{\\scriptscriptstyle I\}x\)\_\{i\}=\\sum\_\{j=1\}^\{n\}x\_\{j\}\. Consequently, GN corresponds exactly to simple probabilistic vector normalization, projecting anyxxonto the simplexΔn−1\\Delta^\{n\-1\}in a single iteration\. The*binary*fixed points of the dynamics are the one\-hot vectors, representing the MISes ofKnK\_\{n\}\.

When the graph is the line graphL\(KV1,V2\)L\(K\_\{V\_\{1\},V\_\{2\}\}\)corresponding to the Assignment Problem between two setsV1,V2V\_\{1\},V\_\{2\}, GN normalizes each weightWijW\_\{ij\},\(i,j\)∈V1×V2\(i,j\)\\in V\_\{1\}\\times V\_\{2\}by the sum of all competing assignments in its respective row and column:Wij\+∑k∈V1∖\{i\}Wkj\+∑l∈V2∖\{j\}WilW\_\{ij\}\+\\sum\_\{k\\in V\_\{1\}\\setminus\\\{i\\\}\}W\_\{kj\}\+\\sum\_\{l\\in V\_\{2\}\\setminus\\\{j\\\}\}W\_\{il\}\. We refer to this process as matrix*cross\-normalization*\. GN thus replaces the alternating row/column projections of Sinkhorn\-Knopp with a single normalization step over all conflicting assignments\. This scheme extends beyond line graphs of assignment problems to arbitrary constraint topologies; while close in spirit to parallel Bregman\-Dijkstra projectionsBauschke and Borwein \([1996](https://arxiv.org/html/2605.05330#bib.bib2)\), GN is a distinct dynamical system\.

While the MISes are the only binary fixed points of the GN map, the system admits spurious fractional fixed points which can act as attractors or create plateaus that impede convergence to binary solutions\. We provide a complete characterization of the fractional attractors of this unregularized GN map, including an exhaustive enumeration up ton=10n=10in Appendix[A](https://arxiv.org/html/2605.05330#A1)\. Furthermore, a naive initializationx0=wx^\{0\}=wprovides only a transient bias; because the GN operator is topologically driven by the adjacency matrixAI\{A\}\_\{\\scriptscriptstyle I\}, the unweighted dynamics remain objective\-agnostic\. To transform GN into a robust, weight\-aware MWIS engine, we introduce two synergistic modifications: \(i\) a regularization parameterγ\\gammathat functions as a structural catalyst for binarization, and \(ii\) a scaling of the adjacency matrix that embeds the weighted objective directly into the system’s fixed\-point topology\.

###### Definition 3\(Weighted Regularized Graph Normalization \(WRGN\)\)\.

LetG=\(A,w\)G=\(A,w\)withw∈ℝ\+\+nw\\in\\mathbb\{R\}\_\{\+\+\}^\{n\}be a weighted graph andγ\>1\\gamma\>1a regularization parameter\. Letv:=wv:=\\sqrt\{w\}\. theWRGNmap is the map𝒩Aγ,v\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}, whereAγ,vA^\{\\gamma,v\}is the Weighted Regularized Adjacency Matrix:

Aγ,v:=γ⋅diag\(v\)−1A⋅diag\(v\)A^\{\\gamma,v\}:=\\gamma\\cdot\\operatorname\{diag\}\(v\)^\{\-1\}A\\cdot\\operatorname\{diag\}\(v\)\(2\)Component\-wise, this gives\(𝒩Aγ,v\(x\)\)i=xi/\(xi\+γ∑j∈N\(i\)vjvixj\)\(\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x\)\)\_\{i\}=x\_\{i\}/\\left\(x\_\{i\}\+\\gamma\\sum\_\{j\\in N\(i\)\}\\frac\{v\_\{j\}\}\{v\_\{i\}\}x\_\{j\}\\right\)\.

In this formulation,γ\\gammabreaks the symmetry between self\-influence and neighbor\-influence\. We show in next sections thatγ\>1\\gamma\>1triggers a hyperbolic transition that makes fractional solutions repulsive, leaving Maximal Independent Sets \(MISes\) as the only asymptotically stable attractors\. Simultaneously, the biasvvrealigns the energy landscape to prioritize high\-weight vertices, embedding the MWIS objective directly into the dynamics regardless of initialization\.

Note that whileAAandI\+γAI\+\\gamma Aare symmetric,Aγ,vA^\{\\gamma,v\}is*not*\. It is*diagonally similar*toI\+γAI\+\\gamma Avia the constant potentialvv\. This structure implies that the pressure exerted on nodeiiby neighborjjis scaled byvj/viv\_\{j\}/v\_\{i\}, while the reciprocal pressure is scaled byvi/vjv\_\{i\}/v\_\{j\}\. This asymmetry is what allows high\-weight nodes to exert more “normalization pressure” on their lower\-weight neighbors, effectively suppressing them in the competition for inclusion in the independent set\.

Before studying the dynamics induced by WRGN, we first provide some basic properties of the WRGN map \(proofs are straightforward\)\. For anyA,γ,vA,\\gamma,v, WRGN verifies:

i\) Boundedness:For anyx∈𝖭Ax\\in\\mathsf\{N\}\_\{A\},𝒩Aγ,v\(x\)∈\[0,1\]n\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x\)\\in\[0,1\]^\{n\};

ii\) Support Invariance:The support of a vector is invariant under WRGN, i\.e\.,supp⁡\(𝒩Aγ,v\(x\)\)=supp⁡\(x\)\\operatorname\{supp\}\(\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x\)\)=\\operatorname\{supp\}\(x\)\. Hence zeros are stable and do not influence the dynamics\. We can thus always remove zero nodes from the graph and assume thatx0x^\{0\}and all the iterates have full support, i\.e\., are strictly positive vectors\.

iii\) Connected Components Independence:Because updates only depend on neighboring values, the dynamics between two different connected components of the graph do not influence each other\. We can thus assume that the graph is connected\.

iv\) Scale Invariance:𝒩Aγ,v\(x\)\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x\)is*projective*by nature; for anyα\>0\\alpha\>0,𝒩Aγ,v\(αx\)=𝒩Aγ,v\(x\)\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(\\alpha x\)=\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x\)\.

## 3Energy Descent and Global Convergence

We establish that WRGN performs a Quasi\-Newton descent on an energy function, utilizing updates that are exact Majorization\-Minimization \(MM\) stepsLee and Seung \([2000](https://arxiv.org/html/2605.05330#bib.bib45)\)\. By leveraging the general framework of Attouch, Bolte, and SvaiterAttouchet al\.\([2013](https://arxiv.org/html/2605.05330#bib.bib43)\)for the convergence of descent\-like algorithms, we prove systematic convergence to a normalizable fixed point of the dynamics\.

###### Definition 4\.

LetAAbe the adjacency matrix of a connected undirected simple graph,v∈ℝ\+\+nv\\in\\mathbb\{R\}\_\{\+\+\}^\{n\}andγ\>0\\gamma\>0\. We define the quadratic energy onx∈ℝ\+\+nx\\in\\mathbb\{R\}\_\{\+\+\}^\{n\}:

ℰγ,v\(x\):=12xTAIγ,vx−∑vi2xi\\mathcal\{E\}\_\{\\gamma,v\}\(x\):=\\frac\{1\}\{2\}x^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}x\-\\sum v\_\{i\}^\{2\}x\_\{i\}

For a WRGN sequence\{xk\}\\\{x^\{k\}\\\}, we define the weighted state spaceyk:=v⊙xky^\{k\}:=v\\odot x^\{k\}, which is diffeomorphic to the native state space asv\>0v\>0\. The energyℰγ,v\(x\)\\mathcal\{E\}\_\{\\gamma,v\}\(x\)inxxis equivalent to the energy inyy:ℰ~γ,v\(y\)=12yTAIγy−vTy\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\)=\\frac\{1\}\{2\}y^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y\-v^\{T\}ywhereAIγ:=I\+γA\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}:=I\+\\gamma Ais the*unweighted*regularized closed adjacency matrix\. We then prove in Appendix[B](https://arxiv.org/html/2605.05330#A2):

###### Theorem 1\(Strict Energy Descent via MM Updates\)\.

The WRGN update corresponds to the minimization of a convex quadratic majorant of the energyℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}in the weighted state spacey=v⊙xy=v\\odot x\. Consequently, the dynamics strictly decrease the energiesℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}andℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}at every step unless the system is at a fixed point\.

This MM equivalence frames WRGN as a fast diagonal quasi\-Newton method using a dynamic diagonal approximation of the inverse Hessian\. As shown in Appendix[C](https://arxiv.org/html/2605.05330#A3), the update is a preconditioned gradient descent:

yk\+1−yk=−diag⁡\(yk\+1⊘v\)∇ℰ~γ,v\(yk\)\.y^\{k\+1\}\-y^\{k\}=\-\\operatorname\{diag\}\(y^\{k\+1\}\\oslash v\)\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\.Apart from approximating the inverse Hessian, the adaptive step sizediag⁡\(yk\+1⊘v\)\\operatorname\{diag\}\(y^\{k\+1\}\\oslash v\)also naturally incorporates log\-barriers which confine the dynamics in the unit box\[0,1\]n\[0,1\]^\{n\}\.

Despite the strict decrease of the energyℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}, there are two difficulties in proving systematic convergence to a single point\. First, the normalizable domain is an open set, with non\-normalizable points belonging to its boundary\. We must make sure that the dynamics cannot fall into these singularities and remain confined inside a compact set\. This is proven in Appendix[D](https://arxiv.org/html/2605.05330#A4)\. Second, the WRGN map actually has non\-isolated fractional fixed points which constitute continuous manifolds \(see Appendix[A](https://arxiv.org/html/2605.05330#A1)\)\. Even with vanishing steps \(which directly follows from the MM theory\), in general, the sequence could indefinitely drift along those manifolds and never settle down\. We leverage the general framework of Attouch, Bolte and SvaiterAttouchet al\.\([2013](https://arxiv.org/html/2605.05330#bib.bib43)\)who have proven that if i\) the energy is continuous at critical points and verifies the Kurdyka\-Łojasiewicz \(KL\) property – which characterizes the local geometry of a function, ensuring it is not ”too flat” near its critical points, and ii\) the descent method ensures sufficient energy decrease and a relative error condition, then it necessarily converges to a critical point of the energy\. This gives the following theorem \(proof in Appendix[E](https://arxiv.org/html/2605.05330#A5)\):

###### Theorem 2\(Global Convergence of Iterated WRGN\)\.

For any simple undirected graphAA, regularizationγ\>0\\gamma\>0, biasv∈ℝ\+\+nv\\in\\mathbb\{R\}^\{n\}\_\{\+\+\}, and normalizable initializationx0∈𝖭Ax^\{0\}\\in\\mathsf\{N\}\_\{A\}, the WRGN sequence\{xk\}\\\{x^\{k\}\\\}converges to a unique normalizable fixed pointx∗∈𝖭Ax^\{\*\}\\in\\mathsf\{N\}\_\{A\}\.

## 4Weighted Mass Increase, MWIS Objective Alignment and Binarization

We now characterize the specific behavior of WRGN as a binarizing engine that systematically increases the weighted mass of the system, i\.e\., the relaxed primal objective of the MWIS problem\. We prove in Appendix[F](https://arxiv.org/html/2605.05330#A6):

###### Theorem 3\(Weighted Mass Growth\)\.

Consider the WRGN sequence\{yk\}\\\{y^\{k\}\\\}in the weighted state spacey=v⊙xy=v\\odot x\. The weighted massMv\(y\):=∑i=1nviyiM\_\{v\}\(y\):=\\sum\_\{i=1\}^\{n\}v\_\{i\}y\_\{i\}is strictly increasing:Mv\(yk\+1\)\>Mv\(yk\)M\_\{v\}\(y^\{k\+1\}\)\>M\_\{v\}\(y^\{k\}\)whenever the sequence is not at a fixed point \(yk≠yk\+1y^\{k\}\\neq y^\{k\+1\}\)\.

In particular, when applied to the MWIS problem withv=wv=\\sqrt\{w\}, WRGN strictly increases the relaxed MWIS objective∑wixi\\sum w\_\{i\}x\_\{i\}at each iteration\.

The WRGN dynamics are thus governed by a dual Lyapunov structure: while the energyℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}decreases, the weighted mass, i\.e\. the relaxed MWIS objective, strictly increases\. Recall that the energyℰγ,v\(x\)=Q\(x\)−M\(x\)\\mathcal\{E\}\_\{\\gamma,v\}\(x\)=Q\(x\)\-M\(x\)opposes two conflicting terms: a quadratic termQ\(x\)=12xTAIγ,vxQ\(x\)=\\frac\{1\}\{2\}x^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}xwhich is minimized whenxxis an Independent Set, and the weighted mass termM\(x\)=wTxM\(x\)=w^\{T\}xwhich is maximized on\[0,1\]n\[0,1\]^\{n\}when all nodes are selected:x=𝟏x=\\mathbf\{1\}\. WRGN minimization path onℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}is thus driven by a mass growing force toward an independent solution, i\.e\. a local MWIS\.

The WRGN map however admits non\-binary fixed points in general\. The following Theorem shows that the regularizationγ\\gammaacts as a symmetry\-breaking force that makes those fractional solutions repulsive, forcing the WRGN sequence to converge to a MIS \(proof in Appendix[G](https://arxiv.org/html/2605.05330#A7)\):

###### Theorem 4\(Binary Regularization\)\.

For any weighted graphG=\(A,w\)G=\(A,w\)and regularizationγ\>1\\gamma\>1, let𝒢γ=𝒩Aγ,w\\mathcal\{G\}\_\{\\gamma\}=\\mathcal\{N\}\_\{A^\{\\gamma,\\sqrt\{w\}\}\}be the WRGN map\. Then:

- •Fractional Repulsivity: Every non\-binary fixed point of𝒢γ\\mathcal\{G\}\_\{\\gamma\}is strictly repulsive\.
- •Binary Stability: A binary fixed pointxx\(with MISM=supp\(x\)M=\\text\{supp\}\(x\)\) is asymptotically stable iff it isγ\\gamma\-stable, i\.e, verifiesstabγ,w\(M\)\>1\\text\{stab\}\_\{\\gamma,w\}\(M\)\>1, where: stabγ,w⁡\(M\):=γmini∉M∑j∈N\(i\)∩Mwj/wi\.\\operatorname\{stab\}\_\{\\gamma,w\}\(M\):=\\gamma\\min\_\{i\\notin M\}\\sum\_\{j\\in N\(i\)\\cap M\}\\sqrt\{w\_\{j\}/w\_\{i\}\}\.\(3\)
- •Optimality: Every MWIS ofGGis a stable attractor of the dynamics \(stab≥γ\>1\\text\{stab\}\\geq\\gamma\>1\)\.
- •Convergence: Any sequence\{xk\}\\\{x^\{k\}\\\}initialized on a normalizable point converges to aγ\\gamma\-stable MIS\. The weighted mass∑wixik\\sum w\_\{i\}x\_\{i\}^\{k\}strictly increases until convergence\.

The thresholdγ=1\\gamma=1represents a topological phase transition in the energy landscape\. Forγ\>1\\gamma\>1, fractional equilibria become repulsive, ensuring convergence to a binary MIS, a feasible solution of the combinatorial optimization problem\. Beyond ensuring proper binarization,γ\\gammaacts as a selectivity filter: asγ→1\+\\gamma\\to 1^\{\+\}, the basins of attraction for sub\-optimal MISes vanish, increasing the likelihood that the system settles into the basin of the global MWIS\. This selectivity introduces a speed\-accuracy trade\-off: as the landscape flattens nearγ=1\\gamma=1, the convergence rate slows\. The evolution of the energy landscape asγ\\gammaincreases for the particular case ofK2K\_\{2\}is studied in Appendix[H](https://arxiv.org/html/2605.05330#A8)\. Unlike deterministic annealing or entropy regularization, which only recover discrete solutions in the theoretical zero\-temperature limit, WRGN operates in the low\-entropy regime natively\. Binarization is the natural limit of a parameter\-free, stable dynamical system\.

## 5Graph Normalization as an Evolutionary Game

The WRGN dynamics admit a natural interpretation through Evolutionary Game Theory \(EGT\)\.*Replicator Dynamics \(RD\)*– the canonical deterministic model of EGT – describe how strategy frequenciespip\_\{i\}in a population evolve based on relative fitnessfi\(p\)f\_\{i\}\(p\)Taylor and Jonker \([1978](https://arxiv.org/html/2605.05330#bib.bib50)\); Hofbauer and Sigmund \([1998](https://arxiv.org/html/2605.05330#bib.bib52)\)\. We define the*simplex state*pkp^\{k\}of the WRGN system by normalizing the weighted state:pik:=viyik/Mvkp\_\{i\}^\{k\}:=v\_\{i\}y\_\{i\}^\{k\}/M\_\{v\}^\{k\}, whereMvk:=∑vjyjkM\_\{v\}^\{k\}:=\\sum v\_\{j\}y\_\{j\}^\{k\}is the total weighted mass\. We then obtain the following Theorem \(proof in Appendix[I](https://arxiv.org/html/2605.05330#A9)\):

###### Theorem 5\(WRGN as Nonlinear Replicator\)\.

The dynamics of the simplex statepkp^\{k\}of a WRGN sequence generated by the map𝒩AIγ,v\\mathcal\{N\}\_\{\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}\}obey a nonlinear non\-potential replicator equation:

pik\+1=pikfi\(pk\)f¯\(pk\),with fitnessfi\(p\)=vi\(AIγ\(p⊘v\)\)i,p\_\{i\}^\{k\+1\}=p\_\{i\}^\{k\}\\frac\{f\_\{i\}\(p^\{k\}\)\}\{\\bar\{f\}\(p^\{k\}\)\},\\quad\\text\{with fitness\}\\quad f\_\{i\}\(p\)=\\frac\{v\_\{i\}\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p\\oslash v\)\)\_\{i\}\},\(4\)wheref¯\(pk\):=∑pikfi\(pk\)\\bar\{f\}\(p^\{k\}\):=\\sum p^\{k\}\_\{i\}f\_\{i\}\(p^\{k\}\)is the average fitness of the population\.

Moreover,Mvk\+1=f¯\(pk\)M\_\{v\}^\{k\+1\}=\\bar\{f\}\(p^\{k\}\): the total weighted mass of each generation is exactly the average fitness of the preceding one, which increases monotonically throughout the dynamics\.

Replicators capture the principle of natural selection: strategies with above\-average fitness replicate, while those with below\-average fitness decrease in frequency\. The dynamics preserve the simplex \(∑ipik=1\\sum\_\{i\}p^\{k\}\_\{i\}=1\), fixed points correspond to Nash equilibria, and asymptotically stable equilibria are Evolutionary Stable Strategies \(ESS\)\.

The WRGN Harvesting Game: We can interpret the WRGN dynamics as a spatial competition for resources\. Consider an infinite population of agents distributed across settlement sites with intrinsic yieldsvi=wiv\_\{i\}=\\sqrt\{w\_\{i\}\}\. Letpikp^\{k\}\_\{i\}denote the population density at siteiiat roundkk\. Agents at siteiiharvest resources from their local site and – controlled by the interference coefficientγ\\gamma– from adjacent sites\. The effective demand on siteii, defined asDik=\(AIγ\(pk⊘v\)\)iD^\{k\}\_\{i\}=\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p^\{k\}\\oslash v\)\)\_\{i\}, represents the aggregate harvesting pressure from the closed neighborhood\. The individual payoff \(fitness\) is the ratio of a site’s yield to its effective demand:fi=vi/Dikf\_\{i\}=v\_\{i\}/D^\{k\}\_\{i\}\. As agents migrate toward higher\-yield locations following the Replicator Equation, the population’s average fitness \(average harvest\) increases monotonically\. Whenγ\>1\\gamma\>1, the competitive cost of adjacency outweighs the cost of local congestion\. In this regime, the population converges to an equilibrium where it occupies a set of non\-adjacent sites – an Independent Set – while trying to maximize the total resource extraction∑wi\\sum w\_\{i\}, hence approximating the MWIS problem through natural selection\.

While replicator dynamics originated in biological contexts to describe the evolution of phenotypes, they have since been extensively adopted in evolutionary economics to model the competition between firms and technologiesMetcalfe \([1994](https://arxiv.org/html/2605.05330#bib.bib16)\); Friedman \([1991](https://arxiv.org/html/2605.05330#bib.bib17)\)\. In an economical context, the WRGN Harvesting game can be reformulated as a model of capital rebalancing under structural externalities; forγ\>1\\gamma\>1, the dynamics act as a price discovery mechanism where the stable portfolio consists exclusively of non\-correlated assets that maximize total yield\.

Most discrete replicators rely on*potential*games, where the fitness is defined as the gradient of a scalar potential which increases monotonically according to Fisher’s Fundamental Theorem of Natural Selection, and guarantees convergence\. In contrast, for non\-potential systems, discrete dynamics typically manifest complex behaviors such as chaos or limit cyclesSato and Crutchfield \([2003](https://arxiv.org/html/2605.05330#bib.bib15)\)\. WRGN is thus a rare class of globally convergent nonlinear, non\-potential dynamics that are globally convergent and obey Fisher’s principle of natural selection\.

Bomze pioneered the application of*linear*replicators to combinatorial optimization by formulating the Maximum Weight Clique \(MWC\) problem as an evolutionary gameBomze \([1997](https://arxiv.org/html/2605.05330#bib.bib57)\); Bomzeet al\.\([2000](https://arxiv.org/html/2605.05330#bib.bib69),[2002](https://arxiv.org/html/2605.05330#bib.bib62)\)\. Bomze leveraged the Motzkin\-Straus theoremMotzkin and Straus \([1965](https://arxiv.org/html/2605.05330#bib.bib132)\), which establishes a correspondence between the clique numberω\(A\)\\omega\(A\)of a graphAAand the global maximum of the quadratic formpTApp^\{T\}Apon the simplex\. Linear replicators with fitnessf\(p\)=Apf\(p\)=Apare a potential game on the quadratic potential12pTAp\\frac\{1\}\{2\}p^\{T\}Ap, which is also the average fitness\. They thus monotonically increase the Motzkin\-Straus quadratic form and converge to a local maximum\. However, the potential often exhibits fractional ”spurious” local maxima\. To address this, BomzeBomze \([1997](https://arxiv.org/html/2605.05330#bib.bib57)\)introduced a regularization of the adjacency matrixAAby incorporating a self\-loop term:A^=A\+12I\\hat\{A\}=A\+\\frac\{1\}\{2\}I\. This modification ensures that the the asymptotically stable fixed points of the dynamics correspond exactly to the maximal cliques of the graph, thereby eliminating the fractional, non\-clique local maxima of the original Motzkin\-Straus formulation\. We drew direct inspiration from Bomze in our regularized formulation of GN\.

Since an IS inGGis exactly a Clique in the complement graphG¯\\overline\{G\}, WRGN non\-linear replicators and Bomze’s linear replicators address the same combinatorial core through dual structural lenses\. However, while linear replicators correspond to first\-order Mirror Descent on a potential, the reciprocal fitness in WRGN incorporates the local curvature of conflict, resulting in a nonlinear second\-order\-like optimization path\. A comparison between WRGN and Bomze’s linear formulation for the*unweighted case*is provided in Table[1](https://arxiv.org/html/2605.05330#S5.T1)\. This analysis reveals a fundamental complementarity: where the nonlinear WRGN game utilizes*competition*to minimize structural conflict on the original graph, the linear clique game uses*collaboration*to maximize mutual support on the complement graph\. Remark the reciprocal symmetry between the two formulations of the fitness\.

Table 1:Comparison of Replicator Dynamics for MaxMIS / MaxClique \(unweighted\)\.For the weighted case \(MWIS/MWC\), Bomze leveraged Gibbons et al\. weighted extension of Motzkin\-StrausGibbonset al\.\([1997](https://arxiv.org/html/2605.05330#bib.bib56)\), who characterized the MWIS as the global*minima*on the simplex of a quadratic formpTGwpp^\{T\}G\_\{w\}pwith a particular matrixGwG\_\{w\}which incorporates the weights\. In order to fit the linear replicators*maximization*framework, Bomze uses the matrixGw′=αJ−GwG^\{\\prime\}\_\{w\}=\\alpha J\-G\_\{w\}, whereJJis the all\-ones matrix andα\>0\\alpha\>0a large enough constant to ensure thatGw′G^\{\\prime\}\_\{w\}is positive\. This introduces a meta\-parameterα\\alphaand breaks the natural scale invariance of the MWIS problem\. Furthermore, in Gibbons et al\.’s matrixGwG\_\{w\}the regularization and the weighting part are entangled\. Instead WRGN respects MWIS scale invariance and totally decouplesγ\\gammaandww, leading to a new weighted extension of the Motzkin\-Strauss theorem:

###### Theorem 6\(Weight\-Tilted Simplex Motzkin\-Straus\)\.

LetG=\(A,w\)G=\(A,w\)be a weighted simple undirected graph withwi\>0w\_\{i\}\>0\. LetΔn−1w:=\{x∈ℝ\+n∣∑wixi=1\}\\Delta^\{w\}\_\{n\-1\}:=\\\{x\\in\\mathbb\{R\}^\{n\}\_\{\+\}\\mid\\sum\\sqrt\{w\_\{i\}\}x\_\{i\}=1\\\}be theWeight\-Tilted Simplex\. For anyγ\>1\\gamma\>1, the local minima of the quadratic form𝒬\(x\)=xT\(I\+γA\)x\\mathcal\{Q\}\(x\)=x^\{T\}\(I\+\\gamma A\)xonΔn−1w\\Delta^\{w\}\_\{n\-1\}are in 1:1 correspondence with theγ\\gamma\-Stable Maximal Independent Sets ofGG\. The value of the𝒬\\mathcal\{Q\}at such a minimum corresponding to a MISMMis1/∑i∈Mwi1/\\sum\_\{i\\in M\}w\_\{i\}\. In particular, any Maximum Weight Independent Set ofGGisγ\\gamma\-Stable and corresponds to a global minimum of𝒬\\mathcal\{Q\}onΔn−1w\\Delta^\{w\}\_\{n\-1\}\.

The proof is provided in Appendix[J](https://arxiv.org/html/2605.05330#A10)\. Note that this theorem characterizes the minima of a quadratic form independently of any specific optimization algorithm, such as WRGN\. The elegance of this result lies in its geometric interpretation: the weight vector manifests as a deformation of the simplex, effectively shifting the problem’s weights from the objective function \(𝒬\\mathcal\{Q\}\) to the geometry of the constraint manifold \(Δn−1w\\Delta^\{w\}\_\{n\-1\}\)\. WhileΔn−1\\Delta\_\{n\-1\}is a hyperplane with normal vector𝟏\\mathbf\{1\},Δn−1w\\Delta^\{w\}\_\{n\-1\}is a hyperplane with normal vectorw\\sqrt\{w\}\. Geometrically, weighting the graph nodes simply manifests as*tilting*the constraint hyperplane, which reconfigures the landscape of local minima to favor directions aligned with heavier weights\. On its side, the regularization factorγ\\gammafilters out the weakest MISes while preserving the global MWIS111Note that both Motkzin\-Straus and Gibbons et al\. theorems only characterize the global optima\. Our theorem provides a richer characterization of all local minima\.\. It is easy to prove that for uniform weights, all MISes areγ\\gamma\-Stable for anyγ\>1\\gamma\>1\.Δn−1w\\Delta^\{w\}\_\{n\-1\}then aligns with the simplexΔn−1\\Delta\_\{n\-1\}, the MISes and the minima ofQQare in 1:1 mapping and the global optima are the maximum*size*MISes, which recovers Bomze’s result inBomze \([1997](https://arxiv.org/html/2605.05330#bib.bib57)\)\. For the weighted case, for any graph and weights, there is aγ0\\gamma\_\{0\}such that forγ\>γ0\\gamma\>\\gamma\_\{0\}*all*MISes areγ\\gamma\-Stable hence there is a 1:1 correspondence between MISes and local minima, together with alignement of the depth of the local minima with the weight of the MISes\.

## 6Experimental Results

GN can be leveraged as an MWIS solver using either random initializations or as a fast, accurate binarization engine for methods that excel at providing fractional solutions\. The state\-of\-the\-art method in this domain is the Bregman\-Sinkhorn \(BS\) MWIS solver by Haller and SavchynskyyHalleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\)\. The authors solve the clique\-cover LP relaxation of the MWIS problem via entropy\-regularized dual coordinate descent, which utilizes a Bregman\-Sinkhorn projection on the clique constraints at its core\. While BS is highly efficient at reaching a low*relaxed*duality gap – frequently in under 1s for graphs with tens of thousands of nodes – it relies on randomized greedy heuristics and optimal recombination to round fractional solutions into feasible integer sets \(i\.e\., actual ’hard’ solutions to the discrete combinatorial problem\)\. In practice, reaching low integral gaps takes orders of magnitude longer than reaching low fractional gaps\. We demonstrate that GN can be efficiently employed to bridge this fractional\-to\-integral gap\.

Tracking the global minimum byγ\\gamma\-pursuit\.As discussed previously, in the limitγ→0\+\\gamma\\to 0^\{\+\}, the GN energy is convex and possesses a unique fractional global minimum\. Asγ\\gammaincreases, the energy landscape deforms until the most stable MIS becomesγ\\gamma\-stable\. This eventually leads to a regime where only MISes are stable, which is guaranteed forγ\>1\\gamma\>1\. We refer to this graduated non\-convexity tracking algorithm as*γ\\gamma\-pursuit*\. A PyTorch implementation of a Graph Normalization layer incorporating this tracking logic is provided in Appendix[K](https://arxiv.org/html/2605.05330#A11)\. In all experiments, we utilized1,0001,000GN iterations with a linear increase ofγ\\gammafromγ0=0\.9\\gamma\_\{0\}=0\.9toγ1=1\.5\\gamma\_\{1\}=1\.5\. These parameters ensured convergence to a valid MIS \(after rounding\) for all tested graphs, demonstrating robust and rapid convergence even for the largest instances in the dataset\.

Randomized start and warm start\.We compared two distinct initialization strategies of WRGN sequences: random sampling on the simplex and warm\-starting from fractional solutions computed via BS\. For the warm\-start configuration, we modified the BS source code provided at[https://github\.com/vislearn/libmpopt](https://github.com/vislearn/libmpopt)Halleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\)to execute the BS fractional optimization scheme in isolation, bypassing the default heuristic binarization and rounding phases\. We ran this modified solver until the*fractional*duality gap fell below0\.1%0\.1\\%, a threshold typically reached in under one second for graphs with10,00010,000nodes\. At this stage, the*integral*gap often remains near100%100\\%, and reducing it below1%1\\%requires several minutes of additional computation\. To ensure diversity within our warm\-start batches, we generated a variety of fractional solutions by randomizing the clique optimization order and jittering initial temperatures uniformaly in the interval\[100,110\]\[100,110\]\. The temperature drop factor was set to0\.990\.99\.

Datasets\.We benchmarked these approaches on the5959large\-scale real\-world MWIS instances provided byHalleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\)\. This dataset includes the Amazon Vehicle Routing \(AVR\) benchmark, where nodes represent potential routes and edges represent conflicts between them, and the Meta\-Segmentation for Cell Detection \(MSCD\) dataset, in which MWIS is used to select non\-overlapping cell segmentations\. These datasets are particularly challenging as they scale up to 882K nodes and 344M edges \(see Table[2](https://arxiv.org/html/2605.05330#S6.T2)\), and contain the largest real\-world MWIS instances publicly available\.

Hardware\.All experiments were conducted on an Apple M4 Pro CPU with 64GB of unified memory\. We hit memory limitations though, both with thelibmpoptBregman\-Sinkhorn code and our WRGN code\. BS memory usage scales linearly with the number of cliques in the clique cover and could not run on 4/59 graphs with the largest number of cliques\. WRGN memory usage scale linearly with the number of edges in the graph, and although we could run on all graphs \(except the 4 for which BS did not run\), memory limitations triggered page swaps for graphs exceeding 10M edges\. For these instances, we limited testing to a single warm start instead of1616runs for other graphs; consequently, these runtimes do not reflect the intrinsic efficiency of the WRGN operator\.

Results\.Table[2](https://arxiv.org/html/2605.05330#S6.T2)summarizes the performance across 55/59 instances\. The full results are reported to appendix[L](https://arxiv.org/html/2605.05330#A12)\.BS Timeis the time of a BS run as reported inHalleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\);GN Timeis the time of a GN run\.Gapdenotes the relative difference between the GN solutions and the solutions reported inHalleret al\.\([2024](https://arxiv.org/html/2605.05330#bib.bib83)\), which were obtained with 1h runs of BS, and constitute the best known solutions for each problem to date, beating multiple competing algorithms including leading commercial software such as Gurobi\.E\[Gap\]andBest Gapare resp\. the average and the minimum gap found by GN for each type of start \(random or warm\)\.

Table 2:Summary of MWIS Performance on Representative Instances\.Those results yield several key insights: Warm\-starting from a BS fractional solution provides a significantly better initial basin\. While random starts frequently result in gaps exceeding5−10%5\-10\\%, warm\-starting brings the solution within1%1\\%of the best\-known integer solution for almost all instances under1M1Medges\. In particular, GN finds the best known solution on AVR\_024 and AVR\_034, and solutions below0\.1%0\.1\\%gap on AVR\_023 and AVR\_027, in less than11s\. Some instance, such as AVR\_036, are particularly hard, both for BS ad GN, which is apparent in the time taken by BS to reach a fractional gap of0\.10\.1% and by the poor quality of the final solution found by GN, even compared to larger graphs\.

## 7Conclusion

We have introduced Graph Normalization \(GN\), a principled dynamical system that provides a fast, differentiable engine for the Maximum Weight Independent Set problem\. By establishing an equivalence between GN and nonlinear Replicator Dynamics, we have provided a rigorous game\-theoretic foundation for binarization that avoids the sensitive hyperparameter tuning required by deterministic annealing\. Our theoretical contributions – including the Weight\-Tilted Simplex Motzkin\-Straus theorem – characterize the landscape of the MWIS problem as a geometric deformation of the constraint manifold, where weights manifest as the ”tilt” of a hyperplane and binarization emerges as a natural topological phase transition\. Empirically, GN serves as a high\-speed binarization bridge for state\-of\-the\-art relaxed solvers, reaching within a 1% gap of best\-known solutions for production\-scale graphs in seconds\. Beyond combinatorial optimization, the differentiability of the GN layer allows for its seamless integration into deep learning pipelines\. This opens new avenues for architectures that require ”hard” decisions under structural constraints, such as sparse Mixture\-of\-Experts, dynamic network pruning, and end\-to\-end learnable structured attention\. Future work will investigate the extension of GN to hypergraph constraints and the potential for a continuous\-time formulation of the dynamics to further improve convergence speed on massive\-scale relational data\.

## References

- \[1\]R\. P\. Adams and R\. S\. Zemel\(2011\)Ranking via sinkhorn propagation\.arXiv preprint arXiv:1106\.1925\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[2\]J\. M\. Altschuler and E\. Boix\-Adsera\(2023\)Polynomial\-time algorithms for multimarginal optimal transport problems with structure\.Mathematical Programming199\(1\),pp\. 1107–1178\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p5.1)\.
- \[3\]H\. Attouch, J\. Bolte, and B\. F\. Svaiter\(2013\)Convergence of descent methods for semi\-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized gauss–seidel methods\.Mathematical programming137\(1\),pp\. 91–129\.Cited by:[Appendix E](https://arxiv.org/html/2605.05330#A5.p1.1),[Appendix E](https://arxiv.org/html/2605.05330#A5.p3.1),[Appendix E](https://arxiv.org/html/2605.05330#A5.p6.1),[§3](https://arxiv.org/html/2605.05330#S3.p1.1),[§3](https://arxiv.org/html/2605.05330#S3.p4.1)\.
- \[4\]L\. Banchi, M\. Fingerhuth, T\. Babej, C\. Ing, and J\. M\. Arrazola\(2020\)Molecular docking with gaussian boson sampling\.Science advances6\(23\),pp\. eaax1950\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[5\]H\. H\. Bauschke and J\. M\. Borwein\(1996\)On projection algorithms for solving convex feasibility problems\.SIAM review38\(3\),pp\. 367–426\.Cited by:[§2](https://arxiv.org/html/2605.05330#S2.p5.5)\.
- \[6\]I\. Bello, H\. Pham, Q\. V\. Le, M\. Norouzi, and S\. Bengio\(2016\)Neural combinatorial optimization with reinforcement learning\.arXiv preprint arXiv:1611\.09940\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p3.1)\.
- \[7\]A\. Bettinelli, V\. Cacchiani, and E\. Malaguti\(2017\)A branch\-and\-bound algorithm for the knapsack problem with conflict graph\.INFORMS Journal on Computing29\(3\),pp\. 457–473\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[8\]I\. M\. Bomze, M\. Budinich, P\. M\. Pardalos, and M\. Pelillo\(2002\)The maximum clique problem\.InHandbook of Combinatorial Optimization,pp\. 1–74\.Note:Survey of continuous\-based methods for max clique, including replicator dynamics\. Positions these methods within combinatorial optimization\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p6.7)\.
- \[9\]I\. M\. Bomze\(1997\)Evolution towards the maximum clique\.Journal of Global Optimization10\(2\),pp\. 143–164\.Note:Crucial reference \- First to apply replicator dynamics to the maximum clique problem \(dual of MWIS\)\. Shows that replicator dynamics converge to maximal cliques under certain conditions\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p3.1),[§5](https://arxiv.org/html/2605.05330#S5.p6.7),[§5](https://arxiv.org/html/2605.05330#S5.p9.15)\.
- \[10\]I\. M\. Bomze, M\. Pelillo, and V\. Stix\(2000\)Approximating the maximum weight clique using replicator dynamics\.IEEE Transactions on Neural Networks11\(6\),pp\. 1228–1241\.External Links:[Document](https://dx.doi.org/10.1109/72.883417)Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p6.7)\.
- \[11\]W\. Brendel and S\. Todorovic\(2010\)Segmentation as maximum\-weight independent set\.InAdvances in neural information processing systems,pp\. 307–315\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[12\]H\. Bunke\(1997\)On a relation between graph edit distance and maximum common subgraph\.Pattern recognition letters18\(8\),pp\. 689–694\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[13\]A\. Clark, D\. De Las Casas, A\. Guy, A\. Mensch, M\. Paganini, J\. Hoffmann, B\. Damoc, B\. Hechtman, J\. Trevor, I\. Danihelka,et al\.\(2022\)Unified scaling laws for routed language models\.International Conference on Machine Learning \(ICML\)\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[14\]M\. Cuturi\(2013\)Sinkhorn distances: lightspeed computation of optimal transport\.InAdvances in Neural Information Processing Systems \(NeurIPS\),Vol\.26\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[15\]D\. M\. Cvetković, M\. Doob, and H\. Sachs\(1980\)Spectra of graphs: theory and applications\.Pure and Applied Mathematics,Academic Press,New York\.External Links:ISBN 0121951502Cited by:[Appendix G](https://arxiv.org/html/2605.05330#A7.p6.8)\.
- \[16\]S\. De Vries and R\. V\. Vohra\(2003\)Combinatorial auctions: a survey\.INFORMS Journal on computing15\(3\),pp\. 284–309\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[17\]DeepSeek\-AI\(2024\)DeepSeek\-v3 technical report\.arXiv preprint arXiv:2412\.19437\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[18\]Y\. Dumas, J\. Desrosiers, and F\. Soumis\(1991\)The pickup and delivery problem with time windows\.European journal of operational research54\(1\),pp\. 7–22\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[19\]F\. Fioretto, H\. Xu, S\. Koenig, and T\. S\. Kumar\(2018\)Constraint composite graph\-based lifted message passing for distributed constraint optimization problems\.\.InISAIM,Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[20\]D\. Friedman\(1991\)Evolutionary games in economics\.Econometrica: Journal of the Econometric Society59\(3\),pp\. 637–666\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p4.1)\.
- \[21\]M\. R\. Garey and D\. S\. Johnson\(1979\)Computers and intractability\.Vol\.174,freeman San Francisco\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p2.1)\.
- \[22\]A\. Genevay\(2019\)Entropy\-regularized optimal transport for machine learning\.Ph\.D\. Thesis,Paris Sciences et Lettres\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[23\]L\. E\. Gibbons, D\. W\. Hearn, P\. M\. Pardalos, and M\. V\. Ramana\(1997\)Continuous characterizations of the maximum clique problem\.Mathematics of Operations Research22\(3\),pp\. 754–768\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p8.10)\.
- \[24\]S\. Gold, A\. Rangarajan,et al\.\(1996\)Softmax to softassign: neural network algorithms for combinatorial optimization\.Journal of Artificial Neural Networks2\(4\),pp\. 381–399\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[25\]S\. Gold and A\. Rangarajan\(1996\)A graduated assignment algorithm for graph matching\.IEEE Transactions on pattern analysis and machine intelligence18\(4\),pp\. 377–388\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3),[§1](https://arxiv.org/html/2605.05330#S1.p5.1)\.
- \[26\]S\. Haller, B\. Savchynskyy, M\. Schlöter, and B\. Savchynskyy\(2024\)A Bregman\-Sinkhorn algorithm for the maximum weight independent set problem\.arXiv preprint arXiv:2408\.02086\.External Links:2408\.02086Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p2.1),[§1](https://arxiv.org/html/2605.05330#S1.p6.1),[§6](https://arxiv.org/html/2605.05330#S6.p1.1),[§6](https://arxiv.org/html/2605.05330#S6.p3.6),[§6](https://arxiv.org/html/2605.05330#S6.p4.1),[§6](https://arxiv.org/html/2605.05330#S6.p6.5)\.
- \[27\]J\. Hofbauer and K\. Sigmund\(1998\)Evolutionary games and population dynamics\.Cambridge University Press\.Note:Comprehensive treatment of replicator dynamics, potential games, and stability analysis\. Establishes the connection between replicator dynamics and gradient flows on the simplex\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p1.5)\.
- \[28\]D\. R\. Hunter and K\. Lange\(2004\)A tutorial on mm algorithms\.The American Statistician58\(1\),pp\. 30–37\.Cited by:[Appendix D](https://arxiv.org/html/2605.05330#A4.4.p4.12)\.
- \[29\]N\. Karalias and A\. Loukas\(2020\)Erdős goes neural: an unsupervised learning framework for combinatorial optimization on graphs\.InAdvances in Neural Information Processing Systems \(NeurIPS\),Vol\.33,pp\. 6659–6670\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p2.1)\.
- \[30\]R\. M\. Karp\(1972\)Reducibility among combinatorial problems\.InComplexity of Computer Computations,pp\. 85–103\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[31\]J\. Kececioglu\(1993\)The maximum weight trace problem in multiple sequence alignment\.InAnnual Symposium on Combinatorial Pattern Matching,pp\. 106–119\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[32\]E\. Khalil, H\. Dai, Y\. Zhang, B\. Dilkina, and L\. Song\(2017\)Learning combinatorial optimization algorithms over graphs\.Advances in neural information processing systems30\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p3.1)\.
- \[33\]J\. J\. Kosowsky and A\. L\. Yuille\(1994\)The invisible hand algorithm: solving the assignment problem with statistical physics\.Neural networks7\(3\),pp\. 477–490\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[34\]H\. W\. Kuhn\(1955\)The hungarian method for the assignment problem\.Naval research logistics quarterly2\(1\-2\),pp\. 83–97\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[35\]R\. Laso\-Jadart, C\. Ambroise, P\. Peterlongo, and M\. Madoui\(2020\)MetaVaR: introducing metavariant species models for reference\-free metagenomic\-based population genomics\.PLoS One15\(12\),pp\. e0244637\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[36\]D\. Lee and H\. S\. Seung\(2000\)Algorithms for non\-negative matrix factorization\.Advances in neural information processing systems13\.Cited by:[item 2](https://arxiv.org/html/2605.05330#A2.I1.i2.p1.4),[Appendix B](https://arxiv.org/html/2605.05330#A2.p2.2),[§3](https://arxiv.org/html/2605.05330#S3.p1.1)\.
- \[37\]B\. D\. McKayet al\.\(1981\)Practical graph isomorphism\.Cited by:[Appendix A](https://arxiv.org/html/2605.05330#A1.p16.4)\.
- \[38\]G\. Mena, D\. Belanger, S\. Linderman, and J\. Snoek\(2018\)Learning latent permutations with gumbel\-sinkhorn networks\.InInternational Conference on Learning Representations \(ICLR\),Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[39\]J\. S\. Metcalfe\(1994\)Competition, fisher’s principle and economic evolution\.Evolutionary Economics4\(1\),pp\. 1–31\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p4.1)\.
- \[40\]T\. S\. Motzkin and E\. G\. Straus\(1965\)Maxima for graphs and a new proof of a theorem of turán\.Canadian Journal of Mathematics17,pp\. 533–540\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p6.7)\.
- \[41\]D\. J\. Papageorgiou and M\. R\. Salpukas\(2009\)The maximum weight independent set problem for data association in multiple hypothesis tracking\.InOptimization and Cooperative Control Strategies,pp\. 235–255\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[42\]M\. V\. Pogančić, A\. Paulus, V\. Musil, G\. Martius, and M\. Rolinek\(2019\)Differentiation of blackbox combinatorial solvers\.InInternational conference on learning representations,Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p3.1)\.
- \[43\]A\. Rangarajan, A\. Yuille, and S\. Gold\(1999\)Convergence properties of the softassign quadratic assignment algorithm\.Neural Computation11\(6\),pp\. 1455–1474\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p5.1)\.
- \[44\]A\. Rangarajan, A\. L\. Yuille, S\. Gold, and E\. Mjolsness\(1997\)A convergence proof for the softassign quadratic assignment algorithm\.InAdvances in neural information processing systems,pp\. 620–626\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[45\]N\. D\. Reddy, L\. Guigues, L\. Pishchulin, J\. Eledath, and S\. G\. Narasimhan\(2021\)Tessetrack: end\-to\-end learnable multi\-person articulated 3d pose tracking\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 15190–15200\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[46\]S\. Sanghavi, D\. Shah, and A\. S\. Willsky\(2009\)Message passing for maximum weight independent set\.IEEE Transactions on Information Theory55\(11\),pp\. 4822–4834\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2),[§1](https://arxiv.org/html/2605.05330#S1.p3.1)\.
- \[47\]P\. Sarlin, D\. DeTone, T\. Malisiewicz, and A\. Rabinovich\(2020\)Superglue: learning feature matching with graph neural networks\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 4938–4947\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[48\]Y\. Sato and J\. P\. Crutchfield\(2003\)Coupled replicator equations for the dynamics of learning in multiagent systems\.Physical Review E67\(1\),pp\. 015206\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p5.1)\.
- \[49\]R\. Sinkhorn and P\. Knopp\(1967\)Concerning nonnegative matrices and doubly stochastic matrices\.Pacific Journal of Mathematics21\(2\),pp\. 343–348\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[50\]L\. Tassiulas and A\. Ephremides\(1990\)Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks\.In29th IEEE Conference on Decision and Control,pp\. 2130–2132\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[51\]P\. D\. Taylor and L\. B\. Jonker\(1978\)Evolutionary stable strategies and game dynamics\.Mathematical Biosciences40\(1\-2\),pp\. 145–156\.Note:Introduced the continuous replicator dynamics equation\. Foundation for all later work\.Cited by:[§5](https://arxiv.org/html/2605.05330#S5.p1.5)\.
- \[52\]B\. Verweij and K\. Aardal\(1999\)An optimisation algorithm for maximum independent set with applications in map labelling\.InEuropean Symposium on Algorithms,pp\. 426–437\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[53\]Z\. Xie, Y\. Wei, H\. Cao, C\. Zhao, C\. Deng, J\. Li, D\. Dai, H\. Gao, J\. Chang, K\. Yu,et al\.\(2025\)Mhc: manifold\-constrained hyper\-connections\.arXiv preprint arXiv:2512\.24880\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[54\]G\. Yu and J\. Yuan\(2015\)Fast action proposals for human action detection and search\.InProceedings of the IEEE conference on computer vision and pattern recognition,pp\. 1302–1311\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.
- \[55\]A\. L\. Yuille, D\. Geiger, and H\. Bozic\(1990\)Stereo and eye movement\.Biological Cybernetics63,pp\. 1–12\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[56\]A\. Zanfir and C\. Sminchisescu\(2018\)Deep learning of graph matching\.InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition,pp\. 2684–2693\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p4.3)\.
- \[57\]W\. Zhuo, M\. Salzmann, X\. He, and M\. Li\(2016\)Indoor scene reconstruction from a single view as maximum weight independent set\.InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition \(CVPR\),pp\. 5000–5009\.Cited by:[§1](https://arxiv.org/html/2605.05330#S1.p1.2)\.

## Appendix AFixed Points of the Unregularized GN Map

A fixed pointxxof the GN map defined by Equation[1](https://arxiv.org/html/2605.05330#S2.E1)is a normalizable vectorx∈𝖭Ax\\in\\mathsf\{N\}\_\{A\}which verifies componentwise:

xi=xi/\(\(A\+I\)x\)i\.x\_\{i\}=x\_\{i\}/\(\(A\+I\)x\)\_\{i\}\.\(5\)
Hencexix\_\{i\}is fixed if and only if eitherxi=0x\_\{i\}=0or\(\(A\+I\)x\)i=1\(\(A\+I\)x\)\_\{i\}=1\.

Letx∈𝖭Ax\\in\\mathsf\{N\}\_\{A\}be a fixed point andC1,C2,…CpC\_\{1\},C\_\{2\},\\dots C\_\{p\}the connected components of its supportsupp⁡\(x\)\\operatorname\{supp\}\(x\)\. Assumei∈supp⁡\(x\)i\\in\\operatorname\{supp\}\(x\)belongs to the componentCkC\_\{k\}\. As its closed neighborhood sum\(\(A\+I\)x\)i=xi\+∑j∈N\(i\)xj=xi\+∑j∈N\(i\)∩Ckxj\(\(A\+I\)x\)\_\{i\}=x\_\{i\}\+\\sum\_\{j\\in N\(i\)\}x\_\{j\}=x\_\{i\}\+\\sum\_\{j\\in N\(i\)\\cap C\_\{k\}\}x\_\{j\}, the restrictionxCkx\_\{C\_\{k\}\}ofxxtoCkC\_\{k\}is a fixed point of the subgraphACkA\_\{C\_\{k\}\}induced byCkC\_\{k\}\. We call a fixed point with*full support*on a*connected graph*anAtomof Graph Normalization \(GNA\)\. Any fixed point thus uniquely decomposes into i\) zeros, and ii\) atoms over the connected components of its support\.

For example, the following fixed point is made up of 3 atoms highlighted in grey:

010101/43/4It is not binary \(values1/41/4and3/43/4\), but it has two positive binary components which correspond to the smallest GN atom: the value11over the complete graphK1K\_\{1\}with only 1 node and no edge\. The MISes of a graph are precisely the fixed points which decompose into zeros andK1K\_\{1\}atoms – which are non adjacent by definition of the decomposition by connectivity\.

Atoms constitute the building blocks of the general fixed points of GN\. An atomxxover a connected graphAAis a solution of:

\(A\+I\)x=𝟏\.s\.t\.x\>0\.\(A\+I\)x=\\mathbf\{1\}\.\\quad\\textrm\{s\.t\.\}\\quad x\>0\.\(6\)Hence, all the closed neighborhood sums of an atom – the sum of the value of a node plus the values of its neighbors – are equal to11, i\.e\., normalized\. Note that for a generic fixed point, with zeros, the closed neighborhood sums at zero nodes are not normalized in general\. Consider for examplex=\(1,0,1\)x=\(1,0,1\)on the path graph ’o\-o\-o’ with adjacencyA=\(010101010\)A=\\begin\{pmatrix\}0&1&0\\\\ 1&0&1\\\\ 0&1&0\\end\{pmatrix\}\. The closed neighborhood sum of the zero node is22\. Note that as a fixed point must be normalizable, none of its closed neighborhood sums can be null, i\.e\., any zero must have at least one nonzero neighbor\.

For a given connected graphAA, we call the set𝒮\(A\)\\mathcal\{S\}\(A\)of the GN atoms onAA, i\.e\., the solutions of Equation[6](https://arxiv.org/html/2605.05330#A1.E6), theAtomic SpectrumofAA\. LetB:=A\+IB:=A\+I\. The atomic spectrum of a connected graphAAcan be either:

1\. A single point \(discrete\):𝒮\(A\)=\{x\}\\mathcal\{S\}\(A\)=\\\{x\\\}whendet\(B\)≠0\\det\(B\)\\neq 0andx=B−1𝟏\>0x=B^\{\-1\}\\mathbf\{1\}\>0\.

In that case, becauseB−1=1det\(B\)adj⁡\(B\)B^\{\-1\}=\\frac\{1\}\{\\det\(B\)\}\\operatorname\{adj\}\(B\)whereadj⁡\(B\)\\operatorname\{adj\}\(B\)is the adjoint ofBBandBBis binary,det\(B\)\\det\(B\)andadj⁡\(B\)\\operatorname\{adj\}\(B\)both have integer values, hence all values ofB−1𝟏B^\{\-1\}\\mathbf\{1\}are fractions proportional to1/\|det\(B\)\|1/\|\\det\(B\)\|\. Figure[1](https://arxiv.org/html/2605.05330#A1.F1)provides some examples\.

Figure 1:Examples of Discrete GN Atoms\.![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_6_2_ID.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_7_11_ID.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_8_225_RD.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_8_73_ID.png)2\. A continuous manifold:whendet\(B\)=0\\det\(B\)=0and∃x0\>0:Bx0=1\\exists x\_\{0\}\>0:Bx\_\{0\}=1\.

In that case𝒮\(A\)=\(x0\+ker⁡\(B\)\)∩\(0,1\)n\\mathcal\{S\}\(A\)=\(x\_\{0\}\+\\ker\(B\)\)\\cap\(0,1\)^\{n\}is the intersection of an affine space with the open unit cube\. The manifold of fixed points is connected and has dimensiondim\(ker⁡\(B\)\)\\dim\(\\ker\(B\)\)\. Figure[2](https://arxiv.org/html/2605.05330#A1.F2)provides some examples\. The dimension of the manifold is indicated byDim\. The values on the nodes containDimvariablesa, b, …\. Any value of these variables which generate values in\(0,1\)\(0,1\)for all the nodes defines a valid atom\.

Figure 2:Examples of Continuous GN Atoms\.![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_3_1_RC.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_6_4_RC.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_6_7_IC.png)![Refer to caption](https://arxiv.org/html/2605.05330v1/atoms/GNA_7_7_IC.png)3\. Empty:ℱ\(A\)=∅\\mathcal\{F\}\(A\)=\\emptysetotherwise\.

Most GN atomic spectra are*empty*\. We say that a graph with a non empty spectrum isAtomic\.

Regular graphsare always atomic: indeed ifAAisdd\-regular then the uniform vectorx0=1d\+1𝟏x\_\{0\}=\\frac\{1\}\{d\+1\}\\mathbf\{1\}is always a solution ofBx0=𝟏Bx\_\{0\}=\\mathbf\{1\}222One can also prove that regular graphs are the only connected graphs which are normalized by a uniform vector\.\. Among regular graphs, some have a discrete spectrum \(e\.g\. the44\-cycleC4C\_\{4\}\), i\.e\., are only normalized for uniform weights, but others have a continuous spectrum\. The condition isdet\(B\)=0\\det\(B\)=0, i\.e\.−1\-1is an eigenvalue of the adjacency matrixAA\. There is no trivial condition for when this spectral condition happens in general, however one can identify some particular*families*of regular graphs with continuous spectra:

- •Circulant graphs:Cn\(S\)C\_\{n\}\(S\)for which−1\-1is an eigenvalue ofAA\. This includes: - –Complete graphs forn≥2n\\geq 2:𝒮\(Kn\)=\{x\>0:∑xi=1\}\\mathcal\{S\}\(K\_\{n\}\)=\\\{x\>0:\\sum x\_\{i\}=1\\\}is a continuous spectrum of dimensionn−1n\-1\. - –3n3n\-cycles:𝒮\(C3n\)=\{\(a,b,c,…,a,b,c\)⏟ntimes\):a,b,c∈\(0,1\);a\+b\+c=1\}\\mathcal\{S\}\(C\_\{3n\}\)=\\\{\\underbrace\{\(a,b,c,\\dots,a,b,c\)\}\_\{n~\\text\{times\}\}\):a,b,c\\in\(0,1\);a\+b\+c=1\\\}is a continuous spectrum of dimension22\(note thatC3=K3C\_\{3\}=K\_\{3\}\)\. - –Cn\(1,2\)C\_\{n\}\(1,2\)forn=6,10,…n=6,10,\\dots - –Cn\(1,3\)C\_\{n\}\(1,3\)forn=8,10,…n=8,10,\\dots
- •Hamming graphs:H\(d,q\)=Kq□dH\(d,q\)=K\_\{q\}^\{\\square d\}\(Cartesian product ofddcopies ofKqK\_\{q\}, vertex\-transitive andd\(q−1\)d\(q\-1\)\-regular\); eigenvalues ared\(q−1\)−qid\(q\-1\)\-qiwith multiplicity\(di\)\(q−1\)i\\binom\{d\}\{i\}\(q\-1\)^\{i\}, containing−1\-1for infinitely many\(d,q\)\(d,q\)pairs\. In particular anyHypercubeQn=K2□dQ\_\{n\}=K\_\{2\}^\{\\square d\}forddodd \(eigenvalue−1\-1has multiplicity\(d\(d\+1\)/2\)\\binom\{d\}\{\(d\+1\)/2\}\)
- •Generalized Petersen graphs:\(3\-regular\) e\.g\., the Desargues graphGP\(10,3\)GP\(10,3\)for whichspec⁡\(A\)=\{3,24,15,06,\(−1\)4,\(−2\)5\}\\operatorname\{spec\}\(A\)=\\\{3,2^\{4\},1^\{5\},0^\{6\},\(\-1\)^\{4\},\(\-2\)^\{5\}\\\}
- •Odd graphs:OkO\_\{k\}\(kk\-regular, distance\-regular, vertex\-transitive\) fork≥4k\\geq 4; e\.g\.,O4O\_\{4\}for whichspec⁡\(A\)=\{4,214,\(−1\)14,\(−3\)6\}\\operatorname\{spec\}\(A\)=\\\{4,2^\{14\},\(\-1\)^\{14\},\(\-3\)^\{6\}\\\}
- •Cayley graphs:Various constructions on non\-abelian groups \(dihedral, symmetric groups\) with generating sets giving eigenvalue−1\-1

We enumerated the11,716,57111,716,571connected graphs up to1010nodes – without isomorphic duplicate and in canonical order – using*nauty*\[[37](https://arxiv.org/html/2605.05330#bib.bib47)\], and tested if they were atomic\. We classified atomic graphs into four classes:\(Irregular,Regular\)×\(Discrete,Continuous\)\(\\textrm\{Irregular\},\\textrm\{Regular\}\)\\times\(\\textrm\{Discrete\},\\textrm\{Continuous\}\)\. The counts for each class are provided in table[3](https://arxiv.org/html/2605.05330#A1.T3)\. One sees that the overall density of atomic graphs quickly decreases withnnand that irregular atomic graphs are much more frequent than regular ones\. Discrete and continuous spectra seem to appear approximately in the same proportions\.

Table 3:Enumeration of GN Atomic Graphs up to1010Nodes\.
## Appendix BProof of Theorem[1](https://arxiv.org/html/2605.05330#Thmtheorem1)\(Strict Energy Descent\)

We work in the weighted state spacey=v⊙xy=v\\odot x\. DefineBv:=I\+γdiag\(v\)−1Adiag\(v\)B\_\{v\}:=I\+\\gamma\\operatorname\{diag\}\(v\)^\{\-1\}A\\operatorname\{diag\}\(v\)andB:=I\+γAB:=I\+\\gamma A\. One verifies by substitution that the WRGN updatexik\+1=xik/\(Bvxk\)ix^\{k\+1\}\_\{i\}=x^\{k\}\_\{i\}/\(B\_\{v\}x^\{k\}\)\_\{i\}provides the update for the stateyy:yik\+1=viyik/\(Byk\)iy^\{k\+1\}\_\{i\}=v\_\{i\}y\_\{i\}^\{k\}/\(By^\{k\}\)\_\{i\}and that the energyℰγ,v\(x\)=12xTBvx−∑vi2xi\\mathcal\{E\}\_\{\\gamma,v\}\(x\)=\\frac\{1\}\{2\}x^\{T\}B\_\{v\}x\-\\sum v\_\{i\}^\{2\}x\_\{i\}is equal to the energy inyy:

ℰ~γ,v\(y\):=12yTBy−∑viyi\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\):=\\frac\{1\}\{2\}y^\{T\}By\-\\sum v\_\{i\}y\_\{i\}
Following the Majorization\-Minimization \(MM\) framework\[[36](https://arxiv.org/html/2605.05330#bib.bib45)\], we define an auxiliary functionG\(y,yk\)G\(y,y^\{k\}\)that serves as a global upper bound forℰ~γ,v\(y\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\):

G\(y,yk\):=12∑i\(Byk\)iyikyi2−∑iviyiG\(y,y^\{k\}\):=\\frac\{1\}\{2\}\\sum\_\{i\}\\frac\{\(By^\{k\}\)\_\{i\}\}\{y\_\{i\}^\{k\}\}y\_\{i\}^\{2\}\-\\sum\_\{i\}v\_\{i\}y\_\{i\}
This surrogate function satisfies the two necessary MM conditions:

1. 1\.Equality at the Tangent Point: By substitution,G\(yk,yk\)=12\(yk\)TByk−vTyk=ℰ~γ,v\(yk\)G\(y^\{k\},y^\{k\}\)=\\frac\{1\}\{2\}\(y^\{k\}\)^\{T\}By^\{k\}\-v^\{T\}y^\{k\}=\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\.
2. 2\.Global Dominance: BecauseBBis a*symmetric*non\-negative matrix, the diagonal majorization lemma\[[36](https://arxiv.org/html/2605.05330#bib.bib45)\]ensuresyTBy≤∑i\(Byk\)iyikyi2y^\{T\}By\\leq\\sum\_\{i\}\\frac\{\(By^\{k\}\)\_\{i\}\}\{y\_\{i\}^\{k\}\}y\_\{i\}^\{2\}thusℰ~γ,v\(y\)≤G\(y,yk\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\)\\leq G\(y,y^\{k\}\)for ally≥0y\\geq 0\.

The next iterateyk\+1y^\{k\+1\}is obtained by minimizing the majorant\. SinceGGis a separable quadratic function with a diagonal Hessian∇yy2G=diag\(\(Byk\)iyik\)\\nabla^\{2\}\_\{yy\}G=\\text\{diag\}\\bigl\(\\frac\{\(By^\{k\}\)\_\{i\}\}\{y^\{k\}\_\{i\}\}\\bigr\), it is strictly convex providedyk\>0y^\{k\}\>0, which can always be assumed as zeros are fixed and can be ignored in the dynamics\.

Setting the gradient∇yG=0\\nabla\_\{y\}G=0yields the unique global minimizer:

yik\+1=yikvi\(Byk\)iy\_\{i\}^\{k\+1\}=y\_\{i\}^\{k\}\\frac\{v\_\{i\}\}\{\(By^\{k\}\)\_\{i\}\}
We thus recover the WRGN update rule in weighted state space, which is equivalent to the update rule in unweighted state space and shows that WRGN is an*exact*MM algorithm\.

By the fundamental properties of MM algorithms:

ℰ~γ,v\(yk\+1\)≤G\(yk\+1,yk\)≤G\(yk,yk\)=ℰ~γ,v\(yk\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\+1\}\)\\leq G\(y^\{k\+1\},y^\{k\}\)\\leq G\(y^\{k\},y^\{k\}\)=\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)
The first inequality is given by global dominance\. The second inequality is strict \(G\(yk\+1,yk\)<G\(yk,yk\)G\(y^\{k\+1\},y^\{k\}\)<G\(y^\{k\},y^\{k\}\)\) iffyk\+1≠yky^\{k\+1\}\\neq y^\{k\}due to the strict convexity of the surrogate\.

## Appendix CWRGN Preconditioned Gradient Descent Form

The gradient of the energyℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}is:

∇ℰ~γ,v\(yk\)=AIγyk−v\.\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)=\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y^\{k\}\-v\.
The WRGN update rule for the stateyyis:

yik\+1=viyik\(AIγyk\)i\.y\_\{i\}^\{k\+1\}=\\frac\{v\_\{i\}y\_\{i\}^\{k\}\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y^\{k\}\)\_\{i\}\}\.
Substituting\(AIγyk\)i=vi\+∇ℰ~γ,v\(yk\)i\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y^\{k\}\)\_\{i\}=v\_\{i\}\+\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\_\{i\}:

yik\+1\(vi\+∇ℰ~γ,v\(yk\)i\)=viyik\.y\_\{i\}^\{k\+1\}\(v\_\{i\}\+\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\_\{i\}\)=v\_\{i\}y\_\{i\}^\{k\}\.
Expanding and isolating the stepyik\+1−yiky\_\{i\}^\{k\+1\}\-y\_\{i\}^\{k\}:

yik\+1−yik=−yik\+1vi∇ℰ~γ,v\(yk\)i\.y\_\{i\}^\{k\+1\}\-y\_\{i\}^\{k\}=\-\\frac\{y\_\{i\}^\{k\+1\}\}\{v\_\{i\}\}\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\_\{i\}\.
In vector form, this is the preconditioned gradient descent update:

yk\+1−yk=−diag\(yk\+1⊘v\)∇ℰ~γ,v\(yk\)\.y^\{k\+1\}\-y^\{k\}=\-\\text\{diag\}\(y^\{k\+1\}\\oslash v\)\\nabla\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y^\{k\}\)\.

## Appendix DWRGN Cannot Accumulate on Non\-Normalizable Points

###### Proposition 1\.

For any graphAA,v∈ℝ\+\+nv\\in\\mathbb\{R\}\_\{\+\+\}^\{n\},γ\>0\\gamma\>0, andx0∈𝖭Ax^\{0\}\\in\\mathsf\{N\}\_\{A\}, the WRGN sequence\{xk\}k≥0\\\{x^\{k\}\\\}\_\{k\\geq 0\}cannot accumulate on a non\-normalizable point\.

Consequently, there exists someδ\>0\\delta\>0such that for sufficiently largekk, the sequence is confined to the compact set:

Kδ:=\{x∈\[0,1\]n∣d\(x,∂𝖭A\)≥δ\}\.K\_\{\\delta\}:=\\\{x\\in\[0,1\]^\{n\}\\mid d\(x,\\partial\\mathsf\{N\}\_\{A\}\)\\geq\\delta\\\}\.

###### Proof\.

Suppose there exists an accumulation pointx∗∈∂𝖭Ax^\{\*\}\\in\\partial\\mathsf\{N\}\_\{A\}\. By definition of the normalizable domain for the WRGN map, there exists at least one nodeiisuch that its weighted neighborhood sum is zero:\(Aγ,vx∗\)i=0\(A^\{\\gamma,v\}x^\{\*\}\)\_\{i\}=0\. LetU=\{x∈𝖭A∣\(Aγ,vx\)i<1/2\}U=\\\{x\\in\\mathsf\{N\}\_\{A\}\\mid\(A^\{\\gamma,v\}x\)\_\{i\}<1/2\\\}be an open set containingx∗x^\{\*\}\. According to the WRGN update rule, whenxk∈Ux^\{k\}\\in U, the iteratexik\+1x\_\{i\}^\{k\+1\}satisfies:

xik\+1=xik\(Aγ,vxk\)i\>2xik\.x\_\{i\}^\{k\+1\}=\\frac\{x\_\{i\}^\{k\}\}\{\(A^\{\\gamma,v\}x^\{k\}\)\_\{i\}\}\>2x\_\{i\}^\{k\}\.
We now show that for anyL∈ℕL\\in\\mathbb\{N\}, there exists a contiguous block ofLLiterates\{xk,…,xk\+L\}\\\{x^\{k\},\\dots,x^\{k\+L\}\\\}contained inUU\.

Since the mapx↦\(Aγ,vx\)ix\\mapsto\(A^\{\\gamma,v\}x\)\_\{i\}is continuous and\(Aγ,vx∗\)i=0\(A^\{\\gamma,v\}x^\{\*\}\)\_\{i\}=0,UUis an open neighborhood ofx∗x^\{\*\}\. There exists anϵ\>0\\epsilon\>0such that the open ballℬ\(x∗,ϵ\)⊂U\\mathcal\{B\}\(x^\{\*\},\\epsilon\)\\subset U\.

As established, WRGN is an exact MM algorithm minimizing the energyℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}\. Given thatℰγ,v\\mathcal\{E\}\_\{\\gamma,v\}is bounded below on the invariant set\[0,1\]n\[0,1\]^\{n\}, the vanishing step size‖xk\+1−xk‖2→0\\\|x^\{k\+1\}\-x^\{k\}\\\|\_\{2\}\\to 0is a classical result of MM theory\[[28](https://arxiv.org/html/2605.05330#bib.bib93)\]\. Since step sizes vanish, for anyLL, there existsKLK\_\{L\}such that for allk\>KLk\>K\_\{L\},‖xk\+1−xk‖2<ϵ2L\\\|x^\{k\+1\}\-x^\{k\}\\\|\_\{2\}<\\frac\{\\epsilon\}\{2L\}\. Becausex∗x^\{\*\}is an accumulation point, there exist infinitely many indicesk\>KLk\>K\_\{L\}such that‖xk−x∗‖2<ϵ/2\\\|x^\{k\}\-x^\{\*\}\\\|\_\{2\}<\\epsilon/2\. For anym∈\{1,…,L\}m\\in\\\{1,\\dots,L\\\}, the triangle inequality yields:

‖xk\+m−x∗‖2≤‖xk−x∗‖2\+∑j=0m−1‖xk\+j\+1−xk\+j‖2<ϵ2\+mϵ2L≤ϵ\\\|x^\{k\+m\}\-x^\{\*\}\\\|\_\{2\}\\leq\\\|x^\{k\}\-x^\{\*\}\\\|\_\{2\}\+\\sum\_\{j=0\}^\{m\-1\}\\\|x^\{k\+j\+1\}\-x^\{k\+j\}\\\|\_\{2\}<\\frac\{\\epsilon\}\{2\}\+m\\frac\{\\epsilon\}\{2L\}\\leq\\epsilonThus,xk\+m∈ℬ\(x∗,ϵ\)⊂Ux^\{k\+m\}\\in\\mathcal\{B\}\(x^\{\*\},\\epsilon\)\\subset Ufor allm∈\{0,…,L\}m\\in\\\{0,\\dots,L\\\}\.

During this contiguous block of lengthLL, the growth property holds at every step, yieldingxik\+L\>2Lxikx\_\{i\}^\{k\+L\}\>2^\{L\}x\_\{i\}^\{k\}\. Sincex0x^\{0\}has full support and the WRGN map is support\-invariant,xik\>0x\_\{i\}^\{k\}\>0for all finitekk\. For anyk\>KLk\>K\_\{L\}that is sufficiently close tox∗x^\{\*\},xikx\_\{i\}^\{k\}is a fixed positive value\. We can choose anLLlarge enough such that2L\>1/xik2^\{L\}\>1/x\_\{i\}^\{k\}, ensuringxik\+L\>1x\_\{i\}^\{k\+L\}\>1\. This contradicts the fact that the WRGN sequence is bounded within\[0,1\]n\[0,1\]^\{n\}\. Therefore,x∗x^\{\*\}cannot be an accumulation point\.

Since the sequence\{xk\}\\\{x^\{k\}\\\}is bounded in\[0,1\]n\[0,1\]^\{n\}and cannot accumulate on the boundary∂𝖭A\\partial\\mathsf\{N\}\_\{A\}, it must be confined to the compact setKδK\_\{\\delta\}for someδ\>0\\delta\>0\. ∎

###### Corollary 1\(Accumulation on Fixed Points\)\.

Any WRGN sequence accumulates on the \(normalizable\) fixed points of the map\.

###### Proof\.

By Proposition[1](https://arxiv.org/html/2605.05330#Thmproposition1), the sequence is confined to a compact subset of𝖭A\\mathsf\{N\}\_\{A\}for sufficiently largekk\. Therefore, the sequence must have at least one accumulation pointx∗x^\{\*\}\. The graph normalization map𝒩Aγ,v\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}is continuous on𝖭A\\mathsf\{N\}\_\{A\}\. Let\{xkj\}\\\{x^\{k\_\{j\}\}\\\}be a subsequence converging tox∗x^\{\*\}\. Since the step sizes vanish \(‖xk\+1−xk‖2→0\\\|x^\{k\+1\}\-x^\{k\}\\\|\_\{2\}\\to 0\), we have:

𝒩Aγ,v\(x∗\)=𝒩Aγ,v\(limj→∞xkj\)=limj→∞𝒩Aγ,v\(xkj\)=limj→∞xkj\+1\.\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x^\{\*\}\)=\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(\\lim\_\{j\\to\\infty\}x^\{k\_\{j\}\}\)=\\lim\_\{j\\to\\infty\}\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x^\{k\_\{j\}\}\)=\\lim\_\{j\\to\\infty\}x^\{k\_\{j\}\+1\}\.
Because‖xkj\+1−xkj‖2→0\\\|x^\{k\_\{j\}\+1\}\-x^\{k\_\{j\}\}\\\|\_\{2\}\\to 0, it follows thatlimxkj\+1=limxkj=x∗\\lim x^\{k\_\{j\}\+1\}=\\lim x^\{k\_\{j\}\}=x^\{\*\}\. Thus,𝒩Aγ,v\(x∗\)=x∗\\mathcal\{N\}\_\{A^\{\\gamma,v\}\}\(x^\{\*\}\)=x^\{\*\}, showing that every accumulation point is a fixed point, which is normalizable by definition\. ∎

## Appendix EProof of Theorem[2](https://arxiv.org/html/2605.05330#Thmtheorem2)\(Global Convergence\)

We leverage the following Theorem by Attouch, Bolte and Svaiter\[[3](https://arxiv.org/html/2605.05330#bib.bib43), p\. 12\]:

Theorem 2\.9*\(Convergence to a critical point\)\. Letf:ℝn→ℝ∪\{\+∞\}f:\\mathbb\{R\}^\{n\}\\to\\mathbb\{R\}\\cup\\\{\+\\infty\\\}be a proper lower semicontinuous function\. Consider a sequence\(xk\)k∈ℕ\(x^\{k\}\)\_\{k\\in\\mathbb\{N\}\}that satisfies H1, H2, and H3\. Ifffhas the Kurdyka\-Łojasiewicz property at the cluster pointx~\\tilde\{x\}specified in H3 then the sequence\(xk\)k∈ℕ\(x^\{k\}\)\_\{k\\in\\mathbb\{N\}\}converges tox¯=x~\\bar\{x\}=\\tilde\{x\}askkgoes to infinity, andx¯\\bar\{x\}is a critical point offf\.*

where H1, H2 and H3 are defined on p\. 8 in\[[3](https://arxiv.org/html/2605.05330#bib.bib43)\]and reproduced below\.

Let\{yk\}\\\{y^\{k\}\\\}be a WRGN sequence in weighted state space\. Letδ\>0\\delta\>0\. By Proposition[1](https://arxiv.org/html/2605.05330#Thmproposition1), there exists an integerKKsuch that the tail sequencex=\{yk\+K\}k∈ℕx=\\\{y^\{k\+K\}\\\}\_\{k\\in\\mathbb\{N\}\}is confined to the compact setKδK\_\{\\delta\}\. We define the extended\-value objective onℝn\\mathbb\{R\}^\{n\}:

f\(z\)=\{ℰ~γ,v\(z\)z∈Kδ\+∞otherwise\.f\(z\)=\\begin\{cases\}\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(z\)&z\\in K\_\{\\delta\}\\\\ \+\\infty&\\text\{otherwise\}\\end\{cases\}\.\(7\)
ffis proper, lower semicontinuous, and, being a composition of semi\-algebraic terms, satisfies the Kurdyka\-Łojasiewicz property everywhere\.

We verify that the tail sequencexxsatisfies conditions H1\-H3 of Theorem 2\.9\[[3](https://arxiv.org/html/2605.05330#bib.bib43), p\. 12\]:

H1\.\(Sufficient decrease condition\)\.a\>0a\>0exists such that for eachk∈ℕk\\in\\mathbb\{N\},

f\(xk\+1\)\+a‖xk\+1−xk‖2≤f\(xk\)\.f\(x^\{k\+1\}\)\+a\\\|x^\{k\+1\}\-x^\{k\}\\\|^\{2\}\\leq f\(x^\{k\}\)\.
Verification:By the MM construction,xk\+1x^\{k\+1\}is the minimizer of a separable quadratic surrogate functionG\(x\|xk\)G\(x\|x^\{k\}\)\. Specifically, forx∈Kδx\\in K\_\{\\delta\}, the second derivatives of the surrogate components are bounded below by a constantamin\>0a\_\{min\}\>0related to the minimum weighted neighborhood sum\. By the exact second\-order Taylor expansion ofGGaround its minimizerxk\+1x^\{k\+1\}, noting that∇G\(xk\+1\|xk\)=0\\nabla G\(x^\{k\+1\}\|x^\{k\}\)=0andG\(xk\|xk\)=f\(xk\)G\(x^\{k\}\|x^\{k\}\)=f\(x^\{k\}\), we have:

f\(xk\)=G\(xk\|xk\)≥G\(xk\+1\|xk\)\+amin2‖xk\+1−xk‖2\.f\(x^\{k\}\)=G\(x^\{k\}\|x^\{k\}\)\\geq G\(x^\{k\+1\}\|x^\{k\}\)\+\\frac\{a\_\{min\}\}\{2\}\\\|x^\{k\+1\}\-x^\{k\}\\\|^\{2\}\.Sincef\(xk\+1\)≤G\(xk\+1\|xk\)f\(x^\{k\+1\}\)\\leq G\(x^\{k\+1\}\|x^\{k\}\)by the majorization property, H1 is satisfied witha=amin/2a=a\_\{min\}/2\.

H2\.\(Relative error condition\)\.b\>0b\>0exists such that for eachk∈ℕk\\in\\mathbb\{N\}, there existswk\+1∈∂f\(xk\+1\)w^\{k\+1\}\\in\\partial f\(x^\{k\+1\}\)such that

‖wk\+1‖≤b‖xk\+1−xk‖\.\\\|w^\{k\+1\}\\\|\\leq b\\\|x^\{k\+1\}\-x^\{k\}\\\|\.\[where∂f\\partial fis the limiting subdifferential offf\]

Verification:Sincexk\+1x^\{k\+1\}is an interior point ofKδK\_\{\\delta\}for sufficiently largekkwe choosewk\+1=∇f\(xk\+1\)∈∂f\(xk\+1\)w^\{k\+1\}=\\nabla f\(x^\{k\+1\}\)\\in\\partial f\(x^\{k\+1\}\)\. Asxk\+1x^\{k\+1\}minimizes the surrogateG\(x\|xk\)G\(x\|x^\{k\}\), we have∇G\(xk\+1\|xk\)=0\\nabla G\(x^\{k\+1\}\|x^\{k\}\)=0, and thus:

∥wk\+1∥=∥∇f\(xk\+1\)−𝟎∥=∥∇f\(xk\+1\)−∇G\(xk\+1\|xk\)∥\.\\\|w^\{k\+1\}\\\|=\\\|\\nabla f\(x^\{k\+1\}\)\-\\mathbf\{0\}\\\|=\\\|\\nabla f\(x^\{k\+1\}\)\-\\nabla G\(x^\{k\+1\}\|x^\{k\}\)\\\|\.
Since∇f\(xk\)=∇G\(xk\|xk\)\\nabla f\(x^\{k\}\)=\\nabla G\(x^\{k\}\|x^\{k\}\)by the tangency property of the MM surrogate, the difference∥∇f\(xk\+1\)−∇G\(xk\+1\|xk\)∥\\\|\\nabla f\(x^\{k\+1\}\)\-\\nabla G\(x^\{k\+1\}\|x^\{k\}\)\\\|is bounded byb‖xk\+1−xk‖b\\\|x^\{k\+1\}\-x^\{k\}\\\|for someb\>0b\>0due to the Lipschitz continuity of the gradients onKδK\_\{\\delta\}\. This satisfies H2\.

H3\.\(Continuity condition\)\. There exists a subsequence\(xkj\)j∈ℕ\(x^\{k\_\{j\}\}\)\_\{j\\in\\mathbb\{N\}\}andx~\\tilde\{x\}such that

xkj→x~andf\(xkj\)→f\(x~\)asj→∞\.x^\{k\_\{j\}\}\\to\\tilde\{x\}\\quad\\textrm\{and\}\\quad f\(x^\{k\_\{j\}\}\)\\to f\(\\tilde\{x\}\)\\quad\\textrm\{as\}\\quad j\\to\\infty\.
Verification:Proposition[1](https://arxiv.org/html/2605.05330#Thmcorollary1)shows thatxxhas at least one accumulation pointx~∈Kδ\\tilde\{x\}\\in K\_\{\\delta\}and a subsequencexkj→x~x^\{k\_\{j\}\}\\to\\tilde\{x\}\. Sinceℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}is a smooth rational function onKδK\_\{\\delta\}, it is continuous atx~\\tilde\{x\}\. Thus,f\(xkj\)→f\(x~\)f\(x^\{k\_\{j\}\}\)\\to f\(\\tilde\{x\}\)asj→∞j\\to\\infty, satisfying H3\.

Application of Theorem 2\.9All conditions H1–H3 are satisfied hence the sequence\{xk\}\\\{x^\{k\}\\\}converges tox¯=x~\\bar\{x\}=\\tilde\{x\}\.

## Appendix FProof of Theorem[3](https://arxiv.org/html/2605.05330#Thmtheorem3)\(WRGN Increases the Weighted Mass\)

###### Lemma 1\.

LetSSbe a symmetric matrix withSii=1S\_\{ii\}=1andSij≥0S\_\{ij\}\\geq 0\. Letv∈ℝ\+\+nv\\in\\mathbb\{R\}^\{n\}\_\{\+\+\}be a vector of positive weights\. Lety∈ℝ\+\+ny\\in\\mathbb\{R\}^\{n\}\_\{\+\+\}be the result of a WRGN update step in weighted state space, i\.e\., there existsz\>0z\>0such thatyi=vizi\(Sz\)iy\_\{i\}=\\frac\{v\_\{i\}z\_\{i\}\}\{\(Sz\)\_\{i\}\}\. Then:

yTSy≤∑i=1nviyiy^\{T\}Sy\\leq\\sum\_\{i=1\}^\{n\}v\_\{i\}y\_\{i\}\(8\)with equality if and only ifyyis a fixed point of the dynamics, satisfying\(Sy\)i=vi\(Sy\)\_\{i\}=v\_\{i\}for alli∈supp\(y\)i\\in\\text\{supp\}\(y\)\.

###### Proof\.

Letdi=\(Sz\)id\_\{i\}=\(Sz\)\_\{i\}\. By the definition of the WRGN update, we havezi=yidiviz\_\{i\}=\\frac\{y\_\{i\}d\_\{i\}\}\{v\_\{i\}\}\. Substituting this into the expansion ofdi=\(Sz\)id\_\{i\}=\(Sz\)\_\{i\}:

di=∑j=1nSijzj=∑j=1nSijyjdjvj⟹1=∑j=1nSijyjdjvjdi\.d\_\{i\}=\\sum\_\{j=1\}^\{n\}S\_\{ij\}z\_\{j\}=\\sum\_\{j=1\}^\{n\}S\_\{ij\}\\frac\{y\_\{j\}d\_\{j\}\}\{v\_\{j\}\}\\implies 1=\\sum\_\{j=1\}^\{n\}S\_\{ij\}\\frac\{y\_\{j\}d\_\{j\}\}\{v\_\{j\}d\_\{i\}\}\.
Define the weighted differenceΔv\(y\)=∑viyi−yTSy\\Delta\_\{v\}\(y\)=\\sum v\_\{i\}y\_\{i\}\-y^\{T\}Sy\. Expanding the quadratic form and usingSii=1S\_\{ii\}=1:

Δv\(y\)=∑iyi\(vi−yi\)−∑i≠jSijyiyj\.\\Delta\_\{v\}\(y\)=\\sum\_\{i\}y\_\{i\}\(v\_\{i\}\-y\_\{i\}\)\-\\sum\_\{i\\neq j\}S\_\{ij\}y\_\{i\}y\_\{j\}\.
Substituting the identity1=∑jSijyjdjvjdi1=\\sum\_\{j\}S\_\{ij\}\\frac\{y\_\{j\}d\_\{j\}\}\{v\_\{j\}d\_\{i\}\}into theviv\_\{i\}term:

Δv\(y\)\\displaystyle\\Delta\_\{v\}\(y\)=\\displaystyle=∑iyi\(vi∑jSijyjdjvjdi\)−∑iyi2−∑i≠jSijyiyj\\displaystyle\\sum\_\{i\}y\_\{i\}\\left\(v\_\{i\}\\sum\_\{j\}S\_\{ij\}\\frac\{y\_\{j\}d\_\{j\}\}\{v\_\{j\}d\_\{i\}\}\\right\)\-\\sum\_\{i\}y\_\{i\}^\{2\}\-\\sum\_\{i\\neq j\}S\_\{ij\}y\_\{i\}y\_\{j\}=\\displaystyle=∑i,jSijyiyjvidjvjdi−∑iyi2−∑i≠jSijyiyj\\displaystyle\\sum\_\{i,j\}S\_\{ij\}y\_\{i\}y\_\{j\}\\frac\{v\_\{i\}d\_\{j\}\}\{v\_\{j\}d\_\{i\}\}\-\\sum\_\{i\}y\_\{i\}^\{2\}\-\\sum\_\{i\\neq j\}S\_\{ij\}y\_\{i\}y\_\{j\}
Using the symmetry ofSS\(Sij=SjiS\_\{ij\}=S\_\{ji\}\), we group the terms for each edge\(i,j\)∈E\(i,j\)\\in E:

Δv\(y\)=∑\(i,j\)∈ESijyiyj\(vidjvjdi\+vjdividj−2\)\.\\Delta\_\{v\}\(y\)=\\sum\_\{\(i,j\)\\in E\}S\_\{ij\}y\_\{i\}y\_\{j\}\\left\(\\frac\{v\_\{i\}d\_\{j\}\}\{v\_\{j\}d\_\{i\}\}\+\\frac\{v\_\{j\}d\_\{i\}\}\{v\_\{i\}d\_\{j\}\}\-2\\right\)\.
Factoring the expression in the parentheses:

Δv\(y\)=∑\(i,j\)∈ESijyiyjvivjdidj\(vidj−vjdi\)2\\Delta\_\{v\}\(y\)=\\sum\_\{\(i,j\)\\in E\}\\frac\{S\_\{ij\}y\_\{i\}y\_\{j\}\}\{v\_\{i\}v\_\{j\}d\_\{i\}d\_\{j\}\}\\left\(v\_\{i\}d\_\{j\}\-v\_\{j\}d\_\{i\}\\right\)^\{2\}
SinceSij,yi,di,vi\>0S\_\{ij\},y\_\{i\},d\_\{i\},v\_\{i\}\>0, every term in the sum is non\-negative, henceΔv\(y\)≥0\\Delta\_\{v\}\(y\)\\geq 0, which impliesyTSy≤∑viyiy^\{T\}Sy\\leq\\sum v\_\{i\}y\_\{i\}\.

Equality Case:

The equalityΔv\(y\)=0\\Delta\_\{v\}\(y\)=0holds if and only if each term in the sum vanishes\. For all pairs\(i,j\)\(i,j\)withSij\>0S\_\{ij\}\>0, this requiresvidj−vjdi=0v\_\{i\}d\_\{j\}\-v\_\{j\}d\_\{i\}=0, which impliesdivi=λ\\frac\{d\_\{i\}\}\{v\_\{i\}\}=\\lambdafor some constantλ\>0\\lambda\>0on each connected component of the support \(λ\>0\\lambda\>0becausedi\>0d\_\{i\}\>0\)\. By the definition of the WRGN update,zi=yidiviz\_\{i\}=\\frac\{y\_\{i\}d\_\{i\}\}\{v\_\{i\}\}\. Substitutingdivi=λ\\frac\{d\_\{i\}\}\{v\_\{i\}\}=\\lambdayieldszi=λyiz\_\{i\}=\\lambda y\_\{i\}\(orz=λyz=\\lambda yin vector form\)\. Substituting this back into the definition ofddgives:d=Sz=S\(λy\)=λSyd=Sz=S\(\\lambda y\)=\\lambda Sy\. Consequently, the conditiondi=λvid\_\{i\}=\\lambda v\_\{i\}becomesλ\(Sy\)i=λvi\\lambda\(Sy\)\_\{i\}=\\lambda v\_\{i\}\. Sinceλ\>0\\lambda\>0, we conclude\(Sy\)i=vi\(Sy\)\_\{i\}=v\_\{i\}for alli∈supp\(y\)i\\in\\text\{supp\}\(y\)\. Henceyyis a fixed point of the WRGN dynamics, satisfying the weighted normalization condition\. ∎

###### Theorem 7\(Monotonic Growth of Weighted Mass\)\.

LetSSbe a symmetric matrix withSii=1S\_\{ii\}=1andSij≥0S\_\{ij\}\\geq 0\. Letv∈ℝ\+\+nv\\in\\mathbb\{R\}^\{n\}\_\{\+\+\}be a vector of positive weights\. Consider the WRGN sequence\{yk\}\\\{y^\{k\}\\\}defined by the update rule:

yik\+1=yikvi\(Syk\)i\.y\_\{i\}^\{k\+1\}=y\_\{i\}^\{k\}\\frac\{v\_\{i\}\}\{\(Sy^\{k\}\)\_\{i\}\}\.Ifyk∈ℝ\+\+ny^\{k\}\\in\\mathbb\{R\}^\{n\}\_\{\+\+\}is the image of a prior WRGN step, then the weighted massMv\(y\):=∑i=1nviyiM\_\{v\}\(y\):=\\sum\_\{i=1\}^\{n\}v\_\{i\}y\_\{i\}is non\-decreasing:

Mv\(yk\+1\)≥Mv\(yk\)\.M\_\{v\}\(y^\{k\+1\}\)\\geq M\_\{v\}\(y^\{k\}\)\.Furthermore, the increase is strict \(Mv\(yk\+1\)\>Mv\(yk\)M\_\{v\}\(y^\{k\+1\}\)\>M\_\{v\}\(y^\{k\}\)\) unlessyky^\{k\}is a fixed point of the dynamics\.

###### Proof\.

For the current stateyky^\{k\}, we define a probability distributionλ\\lambdasuch that:

λi=viyikMv\(yk\),implying∑i=1nλi=1\.\\lambda\_\{i\}=\\frac\{v\_\{i\}y\_\{i\}^\{k\}\}\{M\_\{v\}\(y^\{k\}\)\},\\quad\\text\{implying\}\\quad\\sum\_\{i=1\}^\{n\}\\lambda\_\{i\}=1\.
The weighted mass at iterationk\+1k\+1is given by:

Mv\(yk\+1\)=∑i=1nviyik\+1=∑i=1nvi\(yikvi\(Syk\)i\)=∑i=1nvi2yik\(Syk\)i\.M\_\{v\}\(y^\{k\+1\}\)=\\sum\_\{i=1\}^\{n\}v\_\{i\}y\_\{i\}^\{k\+1\}=\\sum\_\{i=1\}^\{n\}v\_\{i\}\\left\(y\_\{i\}^\{k\}\\frac\{v\_\{i\}\}\{\(Sy^\{k\}\)\_\{i\}\}\\right\)=\\sum\_\{i=1\}^\{n\}\\frac\{v\_\{i\}^\{2\}y\_\{i\}^\{k\}\}\{\(Sy^\{k\}\)\_\{i\}\}\.
Factoring out the current massMv\(yk\)M\_\{v\}\(y^\{k\}\), we rewrite the sum in terms ofλi\\lambda\_\{i\}:

Mv\(yk\+1\)=Mv\(yk\)∑i=1nλi\(vi\(Syk\)i\)M\_\{v\}\(y^\{k\+1\}\)=M\_\{v\}\(y^\{k\}\)\\sum\_\{i=1\}^\{n\}\\lambda\_\{i\}\\left\(\\frac\{v\_\{i\}\}\{\(Sy^\{k\}\)\_\{i\}\}\\right\)
Sincef\(t\)=1/tf\(t\)=1/tis a strictly convex function on\(0,∞\)\(0,\\infty\), we apply Jensen’s inequality:

∑i=1nλi1\(Syk\)i/vi≥1∑i=1nλi\(Syk\)ivi\.\\sum\_\{i=1\}^\{n\}\\lambda\_\{i\}\\frac\{1\}\{\(Sy^\{k\}\)\_\{i\}/v\_\{i\}\}\\geq\\frac\{1\}\{\\sum\_\{i=1\}^\{n\}\\lambda\_\{i\}\\frac\{\(Sy^\{k\}\)\_\{i\}\}\{v\_\{i\}\}\}\.
Substituting the definition ofλi\\lambda\_\{i\}into the denominator:

∑i=1nλi\(Syk\)ivi=∑i=1n\(viyikMv\(yk\)\)\(Syk\)ivi=1Mv\(yk\)∑i=1nyik\(Syk\)i=\(yk\)TSykMv\(yk\)\\sum\_\{i=1\}^\{n\}\\lambda\_\{i\}\\frac\{\(Sy^\{k\}\)\_\{i\}\}\{v\_\{i\}\}=\\sum\_\{i=1\}^\{n\}\\left\(\\frac\{v\_\{i\}y\_\{i\}^\{k\}\}\{M\_\{v\}\(y^\{k\}\)\}\\right\)\\frac\{\(Sy^\{k\}\)\_\{i\}\}\{v\_\{i\}\}=\\frac\{1\}\{M\_\{v\}\(y^\{k\}\)\}\\sum\_\{i=1\}^\{n\}y\_\{i\}^\{k\}\(Sy^\{k\}\)\_\{i\}=\\frac\{\(y^\{k\}\)^\{T\}Sy^\{k\}\}\{M\_\{v\}\(y^\{k\}\)\}
Combining these results yields the bound:

Mv\(yk\+1\)≥Mv\(yk\)⋅Mv\(yk\)\(yk\)TSyk\.M\_\{v\}\(y^\{k\+1\}\)\\geq M\_\{v\}\(y^\{k\}\)\\cdot\\frac\{M\_\{v\}\(y^\{k\}\)\}\{\(y^\{k\}\)^\{T\}Sy^\{k\}\}\.
By Lemma[1](https://arxiv.org/html/2605.05330#Thmlemma1), we haveMv\(yk\)≥\(yk\)TSykM\_\{v\}\(y^\{k\}\)\\geq\(y^\{k\}\)^\{T\}Sy^\{k\}for anyyky^\{k\}that is an image of the WRGN operator\. ThereforeMv\(yk\+1\)≥Mv\(yk\)M\_\{v\}\(y^\{k\+1\}\)\\geq M\_\{v\}\(y^\{k\}\)\.

The inequality is strict unless all terms\(Syk\)i/vi\(Sy^\{k\}\)\_\{i\}/v\_\{i\}are equal fori∈supp\(yk\)i\\in\\text\{supp\}\(y^\{k\}\)\. If\(Syk\)i/vi=λ\(Sy^\{k\}\)\_\{i\}/v\_\{i\}=\\lambdafor some constantλ\\lambda, then by Lemma[1](https://arxiv.org/html/2605.05330#Thmlemma1),yky^\{k\}is a fixed point andλ=1\\lambda=1\. In all other cases, the strict convexity off\(t\)=1/tf\(t\)=1/tensuresMv\(yk\+1\)\>Mv\(yk\)M\_\{v\}\(y^\{k\+1\}\)\>M\_\{v\}\(y^\{k\}\)\. Furthermore, since\(yk\)TSyk<Mv\(yk\)\(y^\{k\}\)^\{T\}Sy^\{k\}<M\_\{v\}\(y^\{k\}\)for all non\-fixed pointsyk∈im\(WRGN\)y^\{k\}\\in\\text\{im\}\(\\text\{WRGN\}\), the growth factorMv\(yk\)\(yk\)TSyk\\frac\{M\_\{v\}\(y^\{k\}\)\}\{\(y^\{k\}\)^\{T\}Sy^\{k\}\}is strictly greater than unity, reinforcing the monotonic increase\. ∎

###### Corollary 2\.

When applied to the MWIS problem withv=wv=\\sqrt\{w\}, WRGN strictly increases the MWIS objective∑wixi\\sum w\_\{i\}x\_\{i\}at each iteration\.

###### Proof\.

Mv\(y\)=∑i=1nviyi=∑i=1nvi×vixi=∑i=1nvi2xi=∑i=1nwixiM\_\{v\}\(y\)=\\sum\_\{i=1\}^\{n\}v\_\{i\}y\_\{i\}=\\sum\_\{i=1\}^\{n\}v\_\{i\}\\times v\_\{i\}x\_\{i\}=\\sum\_\{i=1\}^\{n\}v^\{2\}\_\{i\}x\_\{i\}=\\sum\_\{i=1\}^\{n\}w\_\{i\}x\_\{i\}\. ∎

## Appendix GProof of Theorem[4](https://arxiv.org/html/2605.05330#Thmtheorem4)\(Binarization Regularization\)

Jacobian at Fixed Points\.

###### Lemma 2\(Jacobian at Fixed Points of GN\)\.

Given a graphAA,v\>0v\>0andγ\>0\\gamma\>0, we denoteB:=AIγ,vB:=\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}to lighten notations\. Letx∗x^\{\*\}be a fixed point of the WRGN map𝒩B\\mathcal\{N\}\_\{B\}andS:=supp⁡\(x∗\)S:=\\operatorname\{supp\}\(x^\{\*\}\)the support ofx∗x^\{\*\}\. Let\{C1,…Cp\}\\\{C\_\{1\},\\dots C\_\{p\}\\\}be the connected components ofSS\. The JacobianJ\(x∗\)J\(x^\{\*\}\)of𝒩B\\mathcal\{N\}\_\{B\}atx∗x^\{\*\}has the form:

Jij\(x∗\)=\(δij−xi∗Bij\)\(Bx∗\)iJ\_\{ij\}\(x^\{\*\}\)=\\frac\{\\big\(\\delta\_\{ij\}\-x\_\{i\}^\{\*\}B\_\{ij\}\\big\)\}\{\(Bx^\{\*\}\)\_\{i\}\}
whereδij\\delta\_\{ij\}is the Kronecker delta\. It yields the upper\-diagonal block structure:

J\(x∗\)=\(JC1C10…JC1S¯0⋱0⋮00JCpCpJCpS¯000JS¯S¯\)J\(x^\{\*\}\)=\\begin\{pmatrix\}J\_\{C\_\{1\}C\_\{1\}\}&0&\\dots&J\_\{C\_\{1\}\\bar\{S\}\}\\\\ 0&\\ddots&0&\\vdots\\\\ 0&0&J\_\{C\_\{p\}C\_\{p\}\}&J\_\{C\_\{p\}\\bar\{S\}\}\\\\ 0&0&0&J\_\{\\bar\{S\}\\bar\{S\}\}\\end\{pmatrix\}
whereJCkCkJ\_\{C\_\{k\}C\_\{k\}\}corresponds to edges between nodes of the componentCkC\_\{k\},JS¯S¯J\_\{\\bar\{S\}\\bar\{S\}\}to edges between zero nodes, andJCkS¯J\_\{C\_\{k\}\\bar\{S\}\}to edges between the componentCkC\_\{k\}and zero nodes\. The lower\-left block is zero\.

The diagonal blocks are:

JCkCk\\displaystyle J\_\{C\_\{k\}C\_\{k\}\}=ICk−diag⁡\(xCk∗\)BCk\\displaystyle=I\_\{C\_\{k\}\}\-\\operatorname\{diag\}\(x^\{\*\}\_\{C\_\{k\}\}\)B\_\{C\_\{k\}\}JS¯S¯\\displaystyle J\_\{\\bar\{S\}\\bar\{S\}\}=diag⁡\(1BS¯xS¯∗\)\\displaystyle=\\operatorname\{diag\}\\left\(\\frac\{1\}\{B\_\{\\bar\{S\}\}x^\{\*\}\_\{\\bar\{S\}\}\}\\right\)

###### Proof\.

Jij\(x\)=∂∂xj\[xi\(Bx\)i\]=δij\(Bx\)i−xiBij\(Bx\)i2=1\(Bx\)i\(δij−xi\(Bx\)iBij\)J\_\{ij\}\(x\)=\\frac\{\\partial\}\{\\partial x\_\{j\}\}\\Big\[\\frac\{x\_\{i\}\}\{\(Bx\)\_\{i\}\}\\Big\]=\\frac\{\\delta\_\{ij\}\(Bx\)\_\{i\}\-x\_\{i\}B\_\{ij\}\}\{\(Bx\)\_\{i\}^\{2\}\}=\\frac\{1\}\{\(Bx\)\_\{i\}\}\\big\(\\delta\_\{ij\}\-\\frac\{x\_\{i\}\}\{\(Bx\)\_\{i\}\}B\_\{ij\}\\big\)
Therefore at a fixed pointx∗x^\{\*\}, verifyingx∗=x∗/Bx∗x^\{\*\}=x^\{\*\}/Bx^\{\*\}:

Jij\(x∗\)=\(δij−xi∗Bij\)\(Bx∗\)iJ\_\{ij\}\(x^\{\*\}\)=\\frac\{\\big\(\\delta\_\{ij\}\-x\_\{i\}^\{\*\}B\_\{ij\}\\big\)\}\{\(Bx^\{\*\}\)\_\{i\}\}
Fori∈Si\\in S, we havexi∗\>0x\_\{i\}^\{\*\}\>0and thus\(Bx∗\)i=1\(Bx^\{\*\}\)\_\{i\}=1, hence:

Jij\(x∗\)=δij−xi∗BijJ\_\{ij\}\(x^\{\*\}\)=\\delta\_\{ij\}\-x\_\{i\}^\{\*\}B\_\{ij\}
Note that ifjjalso belongs toSS, i\.e\. ifxj∗\>0x^\{\*\}\_\{j\}\>0, thenJij\(x∗\)≠0J\_\{ij\}\(x^\{\*\}\)\\neq 0if and only ifjjis a neighbor ofiibelonging to the same connected componentCkC\_\{k\}of the support asii\. Hence the diagonal block structure ofJSSJ\_\{SS\}\.

Fori∉Si\\notin S, we havexi∗=0x\_\{i\}^\{\*\}=0, and thusJS¯S¯J\_\{\\bar\{S\}\\bar\{S\}\}is diagonal:

Jij\(x∗\)=δij\(Bx∗\)iJ\_\{ij\}\(x^\{\*\}\)=\\frac\{\\delta\_\{ij\}\}\{\(Bx^\{\*\}\)\_\{i\}\}
Note that\(Bx∗\)i\>0\(Bx^\{\*\}\)\_\{i\}\>0becausex∗∈𝖭Ax^\{\*\}\\in\\mathsf\{N\}\_\{A\}\. ∎

We now consider a fixed pointx∗x^\{\*\}of the WRGN map𝒩AIγ,v\\mathcal\{N\}\_\{\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}\}withγ\>1\\gamma\>1\. We prove the items of Theorem[4](https://arxiv.org/html/2605.05330#Thmtheorem4)one by one\.

Every non\-binary fixed point is repulsive\.Suppose thatx∗x^\{\*\}is not binary\. Its support must then contain a connected componentCCwith at least 2 nodes\. Indeed, if every nodei∈Si\\in Ssatisfiedxj∗=0x^\{\*\}\_\{j\}=0for allj∈N\(i\)j\\in N\(i\), then\(Bx∗\)i=Biixi∗=xi∗\(Bx^\{\*\}\)\_\{i\}=B\_\{ii\}x^\{\*\}\_\{i\}=x^\{\*\}\_\{i\}, which impliesxi∗=xi∗/xi∗=1x^\{\*\}\_\{i\}=x^\{\*\}\_\{i\}/x^\{\*\}\_\{i\}=1, meaningx∗x^\{\*\}would be binary\.

We analyze the Jacobian blockJCCJ\_\{CC\}\. To avoid clutter, we restrict all notations implicitly to the componentCC:J=I−diag⁡\(x∗\)AIγ,vJ=I\-\\operatorname\{diag\}\(x^\{\*\}\)\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}\. LetX=diag⁡\(x∗\)X=\\operatorname\{diag\}\(x^\{\*\}\)\. SinceJJis similar toM=X−1/2JX1/2M=X^\{\-1/2\}JX^\{1/2\}, they share the same eigenvalues\. We have:

M=I−X1/2AIγ,vX1/2\.M=I\-X^\{1/2\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}X^\{1/2\}\.
LetQ=X1/2AIγ,vX1/2Q=X^\{1/2\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}X^\{1/2\}\. SinceAIγ,v=V−1AIγV\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}=V^\{\-1\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}VandV,XV,Xare positive diagonal matrices,QQis congruent toAIγ\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\. By Sylvester’s Law of Inertia,QQandAIγ\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}share the same number of negative eigenvalues\. We examine the minimum eigenvalue ofAIγ\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}:

λmin\(AIγ\)=γλmin\(A\)\+1\.\\lambda\_\{\\min\}\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\)=\\gamma\\lambda\_\{\\min\}\(A\)\+1\.
For any*connected*graph with at least one edge,λmin\(A\)≤−1\\lambda\_\{\\min\}\(A\)\\leq\-1\[[15](https://arxiv.org/html/2605.05330#bib.bib49)\]\. Thus,λmin\(AIγ\)≤1−γ<0\\lambda\_\{\\min\}\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\)\\leq 1\-\\gamma<0forγ\>1\\gamma\>1\. By congruence,QQpossesses at least one negative eigenvalueλQ<0\\lambda\_\{Q\}<0\. The corresponding eigenvalue of the Jacobian blockMMisλM=1−λQ\>1\\lambda\_\{M\}=1\-\\lambda\_\{Q\}\>1\. Therefore, the spectral radiusρ\(J\(x∗\)\)≥λM\>1\\rho\(J\(x^\{\*\}\)\)\\geq\\lambda\_\{M\}\>1, establishing that the fixed point is strictly repulsive\.

Condition of stability of binary fixed points\.Now assume thatx∗x^\{\*\}is binary\. It is necessarily a MIS hence all the connected components of its supportS=supp⁡\(x∗\)S=\\operatorname\{supp\}\(x^\{\*\}\)are isolated nodes\. For a nodeiiwithxi=1x\_\{i\}=1, the Jacobian blocJiiJ\_\{ii\}corresponding to its support is a1×11\\times 1bloc with value:Jii=Iii−diag\(x∗\)ii\(AIγ,v\)ii=1−1∗1=0J\_\{ii\}=I\_\{ii\}\-\\operatorname\{diag\}\(x^\{\*\}\)\_\{ii\}\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}\)\_\{ii\}=1\-1\*1=0\. The corresponding eigenvalue is thus0\.

The non\-support blockJS¯S¯\(x∗\)J\_\{\\bar\{S\}\\bar\{S\}\}\(x^\{\*\}\)is diagonal with entries fori∉Si\\notin S:Jii\(x∗\)=1\(AIγ,vx∗\)iJ\_\{ii\}\(x^\{\*\}\)=\\frac\{1\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}x^\{\*\}\)\_\{i\}\}which are the eigenvalues ofJS¯S¯\(x∗\)J\_\{\\bar\{S\}\\bar\{S\}\}\(x^\{\*\}\)\. Developing the denominator:

\(AIγ,vx∗\)i=∑j=1nBijγ,vxj∗=∑j∈Svjvi\(γAij\+δij\)\.\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}x^\{\*\}\)\_\{i\}=\\sum\_\{j=1\}^\{n\}B^\{\\gamma,v\}\_\{ij\}x^\{\*\}\_\{j\}=\\sum\_\{j\\in S\}\\frac\{v\_\{j\}\}\{v\_\{i\}\}\(\\gamma A\_\{ij\}\+\\delta\_\{ij\}\)\.
Sincei∉Si\\notin S, the Kronecker deltaδij=0\\delta\_\{ij\}=0\. Substitutingv=wv=\\sqrt\{w\}:

\(AIγ,vx∗\)i=γ∑j∈N\(i\)∩Swjwi\.\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma,v\}x^\{\*\}\)\_\{i\}=\\gamma\\sum\_\{j\\in N\(i\)\\cap S\}\\sqrt\{\\frac\{w\_\{j\}\}\{w\_\{i\}\}\}\.
Therefore

ρ\(J\(x∗\)\)=ρ\(JS¯S¯\(x∗\)\)=maxi∉S⁡\(1γ∑j∈N\(i\)∩Swjwi\)\\rho\(J\(x^\{\*\}\)\)=\\rho\(J\_\{\\bar\{S\}\\bar\{S\}\}\(x^\{\*\}\)\)=\\max\_\{i\\notin S\}\\left\(\\frac\{1\}\{\\gamma\\sum\_\{j\\in N\(i\)\\cap S\}\\sqrt\{\\frac\{w\_\{j\}\}\{w\_\{i\}\}\}\}\\right\)
Asymptotic stability requiresρ\(J\(x∗\)\)<1\\rho\(J\(x^\{\*\}\)\)<1which gives the stability condition provided in the theorem\.

Stability of the MWISes\.LetSSbe is a MWIS\. Because the functionf\(z\)=zf\(z\)=\\sqrt\{z\}is concave:∑wj≥∑wj\\sum\\sqrt\{w\_\{j\}\}\\geq\\sqrt\{\\sum w\_\{j\}\}\. Therefore:

stabγ,w⁡\(S\)\\displaystyle\\operatorname\{stab\}\_\{\\gamma,w\}\(S\)=mini∉S⁡\(γ∑j∈N\(i\)∩Swjwi\)\\displaystyle=\\min\_\{i\\notin S\}\\left\(\\gamma\\sum\_\{j\\in N\(i\)\\cap S\}\\sqrt\{\\frac\{w\_\{j\}\}\{w\_\{i\}\}\}\\right\)=mini∉S⁡\(γ∑j∈N\(i\)∩Swjwi\)\\displaystyle=\\min\_\{i\\notin S\}\\left\(\\gamma\\frac\{\\sum\_\{j\\in N\(i\)\\cap S\}\\sqrt\{w\_\{j\}\}\}\{\\sqrt\{w\_\{i\}\}\}\\right\)≥mini∉S⁡\(γ∑j∈N\(i\)∩Swjwi\)\\displaystyle\\geq\\min\_\{i\\notin S\}\\left\(\\gamma\\frac\{\\sqrt\{\\sum\_\{j\\in N\(i\)\\cap S\}w\_\{j\}\}\}\{\\sqrt\{w\_\{i\}\}\}\\right\)=γmini∉S⁡∑j∈N\(i\)∩Swjwi\\displaystyle=\\gamma\\min\_\{i\\notin S\}\\sqrt\{\\sum\_\{j\\in N\(i\)\\cap S\}\\frac\{w\_\{j\}\}\{w\_\{i\}\}\}
AsSSis a MWIS, for any nodei∉Si\\notin S, the set\(S∪\{i\}\)∖\(N\(i\)∩S\)\(S\\cup\\\{i\\\}\)\\setminus\(N\(i\)\\cap S\)is another independent set, whose weight must be less than or equal to the weight ofSS:

∀i∉S:wi−∑j∈N\(i\)∩Swj≤0⟹∀i∉S:∑j∈N\(i\)∩Swjwi≥1\.\\forall i\\notin S:\\quad w\_\{i\}\-\\sum\_\{j\\in N\(i\)\\cap S\}w\_\{j\}\\leq 0\\implies\\forall i\\notin S:\\quad\\sum\_\{j\\in N\(i\)\\cap S\}\\frac\{w\_\{j\}\}\{w\_\{i\}\}\\geq 1\.
Hence:

stabγ,w⁡\(S\)≥γmini∉S⁡∑j∈N\(i\)∩Swjwi≥γ\>1\.\\operatorname\{stab\}\_\{\\gamma,w\}\(S\)\\geq\\gamma\\min\_\{i\\notin S\}\\sqrt\{\\sum\_\{j\\in N\(i\)\\cap S\}\\frac\{w\_\{j\}\}\{w\_\{i\}\}\}\\geq\\gamma\>1\.
Convergence to a MIS\.As WRGN always converges and it must converge to a stable attractor, it necessarily converges to a MIS verifying the stability condition above\. As the MWISes are always stable \(and also the global minimizers of the energy\), there is at least one such stable attractor \(conversely if it was not the case we would have a contradiction\)\.

## Appendix HEnergy Landscapes and Phase Transitions

We explore11D energy profiles on the simplex as a function ofγ\\gammaandwwin order to illustrate the phase transition mechanism\.

Consider the weighted complete graph\(K2,w\)\(K\_\{2\},w\), with two nodes and a single edge between them\. We number the nodes0and11and parameterize their simplex state byp0=1−xp\_\{0\}=1\-xandp1=xp\_\{1\}=x, so thatx=0x=0corresponds to the binary solutionp=\(1,0\)p=\(1,0\)indicator of the MISM0=\{0\}M\_\{0\}=\\\{0\\\},x=1x=1top=\(0,1\)p=\(0,1\)indicator of the MISM1=\{1\}M\_\{1\}=\\\{1\\\}, and intermediate statesp=\(1−x,x\)p=\(1\-x,x\)correspond to fuzzy sets solutions\. As energy minima only depend on the ratiow0/w1w\_\{0\}/w\_\{1\}, we fix the weightw1=1w\_\{1\}=1\.

The energy for this system is the quadratic function:

ℰγ,w0\(x\)\\displaystyle\\mathcal\{E\}\_\{\\gamma,w\_\{0\}\}\(x\)=\(\(1−x\)/w0,x\)\(1γγ1\)\(\(1−x\)/w0,x\)\\displaystyle=\(\(1\-x\)/\\sqrt\{w\_\{0\}\},x\)\\begin\{pmatrix\}1&\\gamma\\\\ \\gamma&1\\end\{pmatrix\}\(\(1\-x\)/\\sqrt\{w\_\{0\}\},x\)=x2\+2γw0x\(1−x\)\+1w0\(1−x\)2\\displaystyle=x^\{2\}\+\\frac\{2\\gamma\}\{\\sqrt\{w\_\{0\}\}\}x\(1\-x\)\+\\frac\{1\}\{w\_\{0\}\}\(1\-x\)^\{2\}
The graph ofℰγ,w0\\mathcal\{E\}\_\{\\gamma,w\_\{0\}\}is a parabola with its apex atℰγ,w0′\(xa\)=0\\mathcal\{E\}\_\{\\gamma,w\_\{0\}\}^\{\\prime\}\(x\_\{a\}\)=0, which gives:

xa=1−γw01\+w0−2γw0\.x\_\{a\}=\\frac\{1\-\\gamma\\sqrt\{w\_\{0\}\}\}\{1\+w\_\{0\}\-2\\gamma\\sqrt\{w\_\{0\}\}\}\.
The curvature of the parabola isC=ℰγ,w0′′\(x\)=2−4γ/w0\+2/w0C=\\mathcal\{E\}\_\{\\gamma,w\_\{0\}\}^\{\\prime\\prime\}\(x\)=2\-4\\gamma/\\sqrt\{w\_\{0\}\}\+2/w\_\{0\}\. It is zero, i\.e\. the parabola is a line, forγF=12\(w0\+1w0\)\\gamma\_\{F\}=\\frac\{1\}\{2\}\(\\sqrt\{w\_\{0\}\}\+\\frac\{1\}\{\\sqrt\{w\_\{0\}\}\}\)\.

The graphs of this energy are represented in Figure[3](https://arxiv.org/html/2605.05330#A8.F3)for different values ofγ\\gammaandw0w\_\{0\}\. As states outside the simplexΔ1=\[0,1\]\\Delta\_\{1\}=\[0,1\]are prohibited, we represent them with infinite energies, which corresponds to energy barriers atx=0x=0andx=1x=1\. Local minima \(stable optima\) of the energy are represented by black disks\. Local maxima \(unstable optima\) of the energy are represented by white disks\.

Figure 3:Energy profiles onK2K\_\{2\}1\. Unweighted case \(w0=1w\_\{0\}=1\)\.

Forγ<1\\gamma<1, the energy is strictly convex with a unique minimum atx=1/2x=1/2\. Iterated GN converges to the fractional point\(1/2,1/2\)\(1/2,1/2\)from any initialization333Note thatγ=0\\gamma=0is a special case which corresponds to completely removing the edges from the graph, leaving the WRGN update:xik\+1=xik/\(xik\+0\)=1x^\{k\+1\}\_\{i\}=x^\{k\}\_\{i\}/\(x^\{k\}\_\{i\}\+0\)=1\. As in WRGN, the domain constraints are enforced by the image of the map itself,γ=0\\gamma=0enforces the unique solutionx=𝟏x=\\mathbf\{1\}which is the MIS of the graph withnnnodes and no edges\.\.

Forγ=1\\gamma=1, the energy is flat and anyx∈\[0,1\]x\\in\[0,1\]a stable minimum\. Any positive initialization is projected to the simplex in11iteration of graph normalization and stable after that\.

Forγ\>1\\gamma\>1, the energy is strictly concave, with two local minima atx=0x=0andx=1x=1, which correspond the two binary MISes of the graph\. Iterated GN converges to\(1,0\)\(1,0\)if initialized atx0<1/2x^\{0\}<1/2and to\(0,1\)\(0,1\)if initializedx0\>1/2x^\{0\}\>1/2\. The apexxa=1/2x\_\{a\}=1/2is an unstable local maximum: initialization exactly on1/21/2remains fixed but any slight deviation make the system fall to one of the two MISes of the graph\.

2\. Weighted case \(w0\>1w\_\{0\}\>1\)\.

As show by the Tilted Simplex Motzkin\-Straus Theorem[6](https://arxiv.org/html/2605.05330#Thmtheorem6),w0≠1w\_\{0\}\\neq 1corresponds to tilting the simplex\. The energy value on the MIS atx=0x=0isℰγ,w0\(0\)=1/w0\\mathcal\{E\}\_\{\\gamma,w\_\{0\}\}\(0\)=1/w\_\{0\}\.

Asγ\\gammaincreases, the system goes through two phase transitions\.

Forγ=0\\gamma=0, the parabola is convex \(C=2\+2/w0\>0C=2\+2/w\_\{0\}\>0\) and its unique global minimum is atxa=w01\+w0∈\(0,1\)x\_\{a\}=\\frac\{w\_\{0\}\}\{1\+w\_\{0\}\}\\in\(0,1\)\. WRGN thus converges to the fractional solution\(1−xa,xa\)\(1\-x\_\{a\},x\_\{a\}\)from any initialization\.

Asγ\\gammaincreasesxax\_\{a\}decreases until it crosses0atγB=1/w0\\gamma\_\{B\}=1/\\sqrt\{w\_\{0\}\}\. It corresponds to the transition where WRGN now converges to the binary MIS\(1,0\)\(1,0\)from any initialization\. Forw0=2w\_\{0\}=2, the transition happens atγB=1/2\\gamma\_\{B\}=1/\\sqrt\{2\}, and forw0=4w\_\{0\}=4atγB=1/2\\gamma\_\{B\}=1/2, i\.e\. closer to0\.

Further increasingγ\\gamma, the curvature of the parabola crosses0atγF\\gamma\_\{F\}, where it becomes a straight line before turning concave\. However, because this line is tilted by the weight ratio,x=0x=0remains the unique minimum\. A second local minimum \(the suboptimal MIS\) only emerges when the apexxax\_\{a\}crosses11atγD=w0\\gamma\_\{D\}=\\sqrt\{w\_\{0\}\}\. Forγ\>γD\\gamma\>\\gamma\_\{D\}, the suboptimal MIS becomesγ\\gamma\-Stable, and the system exhibits the local\-minimum correspondence predicted by Theorem[6](https://arxiv.org/html/2605.05330#Thmtheorem6)\.

Generalization to other graphs\.

For a general weighted graph\(A,w\)\(A,w\), if we consider two adjacent nodesi,ji,j, then the energy on the11D simplex slice given bypi=1−xp\_\{i\}=1\-x,pj=xp\_\{j\}=xandpk=0p\_\{k\}=0fork≠i,jk\\neq i,jhas the same general profile than in theK2K\_\{2\}case and the same succession of transitions occur asγ\\gammaincreases\. More generally, any11D slice of the simplex gives a11D quadratic energy which can have either11or22minima\. To have11fractional minimum, the quadratic function must be convex on the slice, with its minimum in\(0,1\)\(0,1\)\. On the contrary, if it has22minima, they must be on the boundary of the domain and the energy must be concave along the slice\. In general, the energetical landscape is a combination of concave or convex profiles along11D directions\.

## Appendix IProof of Theorem[5](https://arxiv.org/html/2605.05330#Thmtheorem5)\(WRGN as Nonlinear Replicator\)

By the definition of the simplex statepkp^\{k\}, we haveyik=Mvkpikviy\_\{i\}^\{k\}=\\frac\{M\_\{v\}^\{k\}p\_\{i\}^\{k\}\}\{v\_\{i\}\}, or in vector formyk=Mvkpk⊘vy^\{k\}=M\_\{v\}^\{k\}p^\{k\}\\oslash v\. Substituting this into the WRGN update rule:

yik\+1=\(Mvkpikvi\)vi\(AIγ\(Mvkpk⊘v\)\)i=MvkpikMvk\(AIγ\(pk⊘v\)\)i=pik\(AIγ\(pk⊘v\)\)iy\_\{i\}^\{k\+1\}=\\left\(\\frac\{M\_\{v\}^\{k\}p\_\{i\}^\{k\}\}\{v\_\{i\}\}\\right\)\\frac\{v\_\{i\}\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(M\_\{v\}^\{k\}p^\{k\}\\oslash v\)\)\_\{i\}\}=\\frac\{M\_\{v\}^\{k\}p\_\{i\}^\{k\}\}\{M\_\{v\}^\{k\}\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p^\{k\}\\oslash v\)\)\_\{i\}\}=\\frac\{p\_\{i\}^\{k\}\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p^\{k\}\\oslash v\)\)\_\{i\}\}
Multiplying both sides byviv\_\{i\}:

viyik\+1=pikvi\(AIγ\(pk⊘v\)\)i=pikfi\(pk\)v\_\{i\}y\_\{i\}^\{k\+1\}=p\_\{i\}^\{k\}\\frac\{v\_\{i\}\}\{\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p^\{k\}\\oslash v\)\)\_\{i\}\}=p\_\{i\}^\{k\}f\_\{i\}\(p^\{k\}\)
Summing overiiyields the evolution of the weighted mass:

Mvk\+1=∑ipikfi\(pk\)=f¯\(pk\)M\_\{v\}^\{k\+1\}=\\sum\_\{i\}p\_\{i\}^\{k\}f\_\{i\}\(p^\{k\}\)=\\bar\{f\}\(p^\{k\}\)which shows that it is equal to the average fitness at the previous generation\. As WRGN systematically increasesMvkM\_\{v\}^\{k\}out of equilibrium, the average fitness follows the same evolution\.

The normalized weighted statepik\+1=viyik\+1/Mvk\+1p\_\{i\}^\{k\+1\}=v\_\{i\}y\_\{i\}^\{k\+1\}/M\_\{v\}^\{k\+1\}then reduces to the discrete replicator equation:

pik\+1=pikfi\(pk\)f¯\(pk\)\.p\_\{i\}^\{k\+1\}=p\_\{i\}^\{k\}\\frac\{f\_\{i\}\(p^\{k\}\)\}\{\\bar\{f\}\(p^\{k\}\)\}\.
WRGN is a non\-potential game\.For a vector fieldf\(x\)f\(x\)to be a gradient \(i\.e\.,f=∇Φf=\\nabla\\Phi\), the Poincaré Lemma requires that its Jacobian matrix must be symmetric:∂fi∂xj=∂fj∂xi\\frac\{\\partial f\_\{i\}\}\{\\partial x\_\{j\}\}=\\frac\{\\partial f\_\{j\}\}\{\\partial x\_\{i\}\}\. For the WRGN fitness function, lettingS=AIγS=\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}andq=p⊘vq=p\\oslash v, gives the condition:

−vivjSij\(Sq\)i2=−vjviSji\(Sq\)j2\-\\frac\{v\_\{i\}\}\{v\_\{j\}\}\\frac\{S\_\{ij\}\}\{\(Sq\)\_\{i\}^\{2\}\}=\-\\frac\{v\_\{j\}\}\{v\_\{i\}\}\\frac\{S\_\{ji\}\}\{\(Sq\)\_\{j\}^\{2\}\}\(9\)
AsSSis symmetric \(Sij=SjiS\_\{ij\}=S\_\{ji\}\):

vi2\(Sq\)i2=vj2\(Sq\)j2⟹\(Sq\)ivi=\(Sq\)jvj\\frac\{v\_\{i\}^\{2\}\}\{\(Sq\)\_\{i\}^\{2\}\}=\\frac\{v\_\{j\}^\{2\}\}\{\(Sq\)\_\{j\}^\{2\}\}\\implies\\frac\{\(Sq\)\_\{i\}\}\{v\_\{i\}\}=\\frac\{\(Sq\)\_\{j\}\}\{v\_\{j\}\}\(10\)
Even in the unweighted case \(v=𝟏v=\\mathbf\{1\}\), this requires\(Sx\)i=\(Sx\)j\(Sx\)\_\{i\}=\(Sx\)\_\{j\}for all adjacent nodes, a condition that cannot be satisfied globally \(for allxx\) for non\-trivial graphs\. Non\-uniform weights render the condition even harder\.

This shows that the WRGN fitness is not a gradient hence the WRGN game is a non\-potential game\.

## Appendix JProof of Theorem[6](https://arxiv.org/html/2605.05330#Thmtheorem6)\(Weight\-Tilted Simplex Motzkin\-Straus\)

Correspondence between stable fixed points of WRGN and minima of energy\.

Sinceℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}is a smooth quadratic and WRGN is an exact MM algorithm, we have the stationarity property: every limit pointy∗∈𝖭Ay^\{\*\}\\in\\mathsf\{N\}\_\{A\}is a fixed point of the map, which implies it is a critical point ofℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}satisfying the KKT conditions \(specifically, the complementarity conditionyi\[\(AIγy\)i−vi\]=0y\_\{i\}\[\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y\)\_\{i\}\-v\_\{i\}\]=0\)\. While the set of limit points is the set of critical points, only the stable fixed points of the map correspond to the local minima of the energy\. As proven by Theorem[4](https://arxiv.org/html/2605.05330#Thmtheorem4), forγ\>1\\gamma\>1, the only asymtotically stable fixed points of WRGN are theγ\\gamma\-stable MISes\.

Now, WRGN is undefined at the non\-normalizable points of the non\-negative orthant\. However those non\-normalizable points are not local minima ofℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\. Indeed, a pointy∈ℝ\+ny\\in\\mathbb\{R\}\_\{\+\}^\{n\}is non\-normalizable if there exists at least one indexiisuch that\(AIγy\)i=0\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y\)\_\{i\}=0\. The partial derivative of the energy in the directioniiis then:

∂ℰ~γ,v∂yi=\(AIγy\)i−vi=0−vi=−vi\\frac\{\\partial\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\}\{\\partial y\_\{i\}\}=\(\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y\)\_\{i\}\-v\_\{i\}=0\-v\_\{i\}=\-v\_\{i\}Sincevi=wi\>0v\_\{i\}=\\sqrt\{w\_\{i\}\}\>0, the gradient is strictly negative in the direction ofii\. As the constraintyi≥0y\_\{i\}\\geq 0is active at that boundary, the point cannot satisfy the first\-order necessary conditions for a local minimum\. Moving fromyytoy\+ϵeiy\+\\epsilon e\_\{i\}\(whereeie\_\{i\}is theii\-th basis vector\) strictly decreases the energy for anyϵ\>0\\epsilon\>0\.

As a conclusion, for anyγ\>1\\gamma\>1, the local minima of the energyℰ~γ,v\(y\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\)onℝ\+n\\mathbb\{R\}\_\{\+\}^\{n\}are in1:11:1correspondence with theγ\\gamma\-stable MISes of the graph\.

Minima on the Simplex\.

We establish the simplex form of the WRGN energy\. For a weighted statey∈ℝ\+ny\\in\\mathbb\{R\}\_\{\+\}^\{n\}, letMv=∑iviyiM\_\{v\}=\\sum\_\{i\}v\_\{i\}y\_\{i\}denote the weighted mass andp=\(v⊙y\)/Mvp=\(v\\odot y\)/M\_\{v\}the simplex state, such thaty=Mv\(p⊘v\)y=M\_\{v\}\(p\\oslash v\)\. Substituting into the energyℰ~γ,v\(y\)=12yTAIγy−vTy\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\)=\\frac\{1\}\{2\}y^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}y\-v^\{T\}y:

ℰ~γ,v\(Mv,p\)\\displaystyle\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(M\_\{v\},p\)=Mv22\(p⊘v\)TAIγ\(p⊘v\)−Mv=Mv22𝒬v\(p\)−Mv\\displaystyle=\\frac\{M\_\{v\}^\{2\}\}\{2\}\(p\\oslash v\)^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p\\oslash v\)\-M\_\{v\}=\\frac\{M\_\{v\}^\{2\}\}\{2\}\\mathcal\{Q\}\_\{v\}\(p\)\-M\_\{v\}where𝒬v\(p\):=\(p⊘v\)T\(I\+γA\)\(p⊘v\)\\mathcal\{Q\}\_\{v\}\(p\):=\(p\\oslash v\)^\{T\}\(I\+\\gamma A\)\(p\\oslash v\)\.

For any fixedp∈Δn−1p\\in\\Delta\_\{n\-1\}, since𝒬v\(p\)≥0\\mathcal\{Q\}\_\{v\}\(p\)\\geq 0, the energyℰ~γ,v\(Mv,p\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(M\_\{v\},p\)is a strictly convex quadratic function ofMvM\_\{v\}\. The optimal mass along the ray defined byppis found where∂ℰ~γ,v∂Mv=0\\frac\{\\partial\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\}\{\\partial M\_\{v\}\}=0, yieldingMv∗\(p\)=1/𝒬v\(p\)M\_\{v\}^\{\*\}\(p\)=1/\\mathcal\{Q\}\_\{v\}\(p\)\. Substituting this into the energy yields the reduced objective:

ℰ~γ,v\(p\):=ℰ~γ,v\(Mv∗\(p\),p\)=−12𝒬v\(p\)\.\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(p\):=\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(M\_\{v\}^\{\*\}\(p\),p\)=\-\\frac\{1\}\{2\\mathcal\{Q\}\_\{v\}\(p\)\}\.
Becausex↦−1/xx\\mapsto\-1/xis a monotonically increasing function forx\>0x\>0, minimizingℰ~γ,v\(p\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(p\)is equivalent to minimizing𝒬v\(p\)\\mathcal\{Q\}\_\{v\}\(p\)\. Hence minimizingℰ~γ,v\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}overℝ\+n\\mathbb\{R\}\_\{\+\}^\{n\}is equivelent to minimizing𝒬v\(p\)\\mathcal\{Q\}\_\{v\}\(p\)overΔn−1\\Delta\_\{n\-1\}\.

Minima on the Weight\-tilted Simplex\.The change of variablesr=p⊘vr=p\\oslash vis a bijection between the standard simplexΔn−1\\Delta\_\{n\-1\}and the weight\-deformed manifoldΔn−1v:=\{r∈ℝ\+n∣∑iviri=1\}\\Delta^\{v\}\_\{n\-1\}:=\\\{r\\in\\mathbb\{R\}\_\{\+\}^\{n\}\\mid\\sum\_\{i\}v\_\{i\}r\_\{i\}=1\\\}\. Under this transformation, the quadratic form becomes:

𝒬v\(p\)=\(p⊘v\)TAIγ\(p⊘v\)=rTAIγr=:𝒬\(r\)\.\\mathcal\{Q\}\_\{v\}\(p\)=\(p\\oslash v\)^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}\(p\\oslash v\)=r^\{T\}\{A\}\_\{\\scriptscriptstyle I\}^\{\\gamma\}r=:\\mathcal\{Q\}\(r\)\.Consequently, minimizing the WRGN energyℰ~γ,v\(y\)\\tilde\{\\mathcal\{E\}\}\_\{\\gamma,v\}\(y\)overℝ\+n\\mathbb\{R\}\_\{\+\}^\{n\}is equivalent to finding:

minr∈Δn−1v⁡rT\(I\+γA\)r\.\\min\_\{r\\in\\Delta^\{v\}\_\{n\-1\}\}r^\{T\}\(I\+\\gamma A\)r\.
Hence the correspondence betweenγ\\gamma\-stable MISes of the graph and the local minima ofrT\(I\+γA\)rr^\{T\}\(I\+\\gamma A\)roverΔn−1v\\Delta^\{v\}\_\{n\-1\}\.

## Appendix KIterative Graph Normalization Pytorch Module

1classIterativeGraphNormalization\(nn\.Module\):

2def\_\_init\_\_\(self,

3A:torch\.sparse\_coo\_tensor,

4weights:np\.ndarray,

5batch\_size:int=256,

6device:str=’cpu’\):

7super\(\)\.\_\_init\_\_\(\)

8self\.v=torch\.sqrt\(weights\)\.unsqueeze\(0\)

9self\.A=A

10self\.batch\_size=batch\_size

11self\.device=device

12

13

14params=\-torch\.rand\(self\.batch\_size,self\.v\.shape\[0\],device=self\.device\)\.log\(\)

15params=torch\.clamp\(params/torch\.max\(params\),1e\-3,1\.\)

16self\.params=nn\.Parameter\(params\)

17

18defforward\(self,gamma0:float=0\.9,gamma1:float=1\.5,iterations:int=1000\):

19foriterinrange\(iterations\):

20p=float\(iter\)/float\(iterations\-1\)

21gamma=p\*gamma1\+\(1\-p\)\*gamma0

22y=self\.v\*x

23Ay=torch\.sparse\.mm\(self\.A,y\.t\(\)\)\.t\(\)

24Ay\.mul\_\(gamma\)\.add\_\(y\)

25

26mask=Ay\>1e\-9

27x\[mask\]=y\[mask\]/Ay\[mask\]

28x\[~mask\]=0\.5

29

30delAy

31returntorch\.clamp\_\(x,0,1\)

Listing 1:Iterative Graph Normalization Pytorch Module
## Appendix LComplete experimental results

Table 4:Performance on AVR datasetTable 5:Performance on MSCD dataset
Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS

Similar Articles

A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering

Weight normalization: A simple reparameterization to accelerate training of deep neural networks

Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation

Oversmoothing as Representation Degeneracy in Neural Sheaf Diffusion

From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning

Submit Feedback

Similar Articles

A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering
Weight normalization: A simple reparameterization to accelerate training of deep neural networks
Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation
Oversmoothing as Representation Degeneracy in Neural Sheaf Diffusion
From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning