Modeling Heterophily in Multiplex Graphs: An Adaptive Approach for Node Classification

arXiv:2605.12699v1 [cs.LG], 05/14/26

Source: [https://arxiv.org/html/2605.12699](https://arxiv.org/html/2605.12699)

Kamel Abdous†, Nairouz Mrabah†, and Mohamed Bouguessa

Department of Computer Science, University of Quebec at Montreal, Montreal, QC, Canada
abous.kamel@courrier.uqam.ca; mrabah.nairouz@etsmtl.livia.ca; bouguessa.mohamed@uqam.ca

† Kamel Abdous and Nairouz Mrabah contributed equally to this work.

###### Abstract

Existing multiplex graph models often assume homophily, where connected nodes tend to belong to the same class or share similar attributes\. Consequently, these models may struggle with graphs exhibiting heterophily, where connected nodes typically belong to different classes and have dissimilar attributes\. While recent methods have been developed to learn reliable node representations from unidimensional graphs with heterophily, they do not fully address the complexities of multiplex graphs\. In a multiplex graph, nodes are linked through multiple types of edges \(referred to as dimensions\), which can simultaneously exhibit homophilic and heterophilic interactions\. To address this gap, we propose HAAM, a novel method for node classification in multiplex graphs that adapts to both homophilic and heterophilic dimensions\. HAAM introduces dimension\-specific compatibility matrices to model varying degrees of homophily and heterophily across dimensions\. A key innovation is its use of a product of trainable low\-pass and high\-pass filters, approximated via Chebyshev polynomials, to capture both smooth and abrupt changes in the graph signal\. By composing these filters and optimizing label predictions using a proximal\-gradient method, HAAM dynamically adjusts to the heterophilic characteristics of each dimension\. Extensive experiments on synthetic and real\-world datasets provide evidence that HAAM captures the complex interplay of homophilic and heterophilic interactions in multiplex graphs, and tends to yield improved node classification performance compared to state\-of\-the\-art methods\.

Keywords: Multiplex Graphs; Heterophily & Homophily; Graph Neural Networks

## 1 Introduction

Multiplex graphs offer a comprehensive framework for modeling interconnected systems by capturing multiple layers of interactions within a single structure\[[5](https://arxiv.org/html/2605.12699#bib.bib78)\]\. Unlike traditional unidimensional graphs\[[30](https://arxiv.org/html/2605.12699#bib.bib92)\],\[[29](https://arxiv.org/html/2605.12699#bib.bib94)\],\[[28](https://arxiv.org/html/2605.12699#bib.bib95)\], which represent relationships with a single type of edge, multiplex graphs enable simultaneous and diverse connections between the same set of nodes\[[1](https://arxiv.org/html/2605.12699#bib.bib93)\]\. Each type of connection represents a distinct graph dimension\. This capability is particularly helpful in modeling complex systems where entities interact through various channels, such as social networks\[[6](https://arxiv.org/html/2605.12699#bib.bib12)\], biological networks\[[31](https://arxiv.org/html/2605.12699#bib.bib5)\], and online recommendation systems\[[38](https://arxiv.org/html/2605.12699#bib.bib14)\]\. By preserving the distinct nature of each interaction, multiplex graphs support a more accurate and nuanced analysis of system dynamics, which may lead to deeper insights into how different layers of connectivity contribute to the overall behavior\[[2](https://arxiv.org/html/2605.12699#bib.bib3)\]\.

Graph modeling techniques often rely on the homophily assumption, which presumes that nodes are more likely to connect if they share the same class or closely related attributes\[[37](https://arxiv.org/html/2605.12699#bib.bib79)\]\. While this assumption underpins many traditional methods, its applicability can be limited in systems where connections arise from diverse or even opposing factors\[[46](https://arxiv.org/html/2605.12699#bib.bib72)\]\. In such cases, heterophily plays an important role, as nodes with differing attributes may be more likely to connect\[[23](https://arxiv.org/html/2605.12699#bib.bib81)\]\. For instance, professional networks foster collaborations between individuals with complementary skills\[[4](https://arxiv.org/html/2605.12699#bib.bib82)\], while biological systems often rely on interactions between entities that assume diverse yet interdependent roles\[[20](https://arxiv.org/html/2605.12699#bib.bib71)\]\.

Despite progress in modeling homophily in multiplex graphs\[[41](https://arxiv.org/html/2605.12699#bib.bib4)\],\[[18](https://arxiv.org/html/2605.12699#bib.bib49)\],\[[19](https://arxiv.org/html/2605.12699#bib.bib42)\],\[[25](https://arxiv.org/html/2605.12699#bib.bib48)\],\[[34](https://arxiv.org/html/2605.12699#bib.bib34)\],\[[35](https://arxiv.org/html/2605.12699#bib.bib88)\], the role of heterophily remains relatively underexplored\. Although recent efforts have investigated heterophily in traditional unidimensional graphs\[[21](https://arxiv.org/html/2605.12699#bib.bib1)\],\[[17](https://arxiv.org/html/2605.12699#bib.bib2)\],\[[32](https://arxiv.org/html/2605.12699#bib.bib67)\],\[[43](https://arxiv.org/html/2605.12699#bib.bib74)\], extending this framework to multiplex graphs introduces additional challenges\. Multiplex systems often exhibit conflicting patterns of connectivity, where nodes may be dissimilar in one dimension but similar in another\[[7](https://arxiv.org/html/2605.12699#bib.bib18)\]\. Furthermore, certain dimensions may emphasize homophilic interactions \(e\.g\., individuals with shared interests\), while others reflect heterophilic dynamics \(e\.g\., collaborations between individuals with different expertise\)\. Addressing these challenges is important for advancing tasks like node classification, where leveraging both homophilic and heterophilic connections can improve predictive accuracy\.

To address these challenges, we introduce HAAM (Heterophily-Aware Adaptive Multiplex model), a novel framework tailored to node classification in multiplex graphs.¹ Unlike existing methods that primarily focus on homophilic structures, HAAM explicitly accommodates both homophilic and heterophilic interactions across graph dimensions. More precisely, HAAM uses learnable, dimension-specific compatibility matrices \[[45](https://arxiv.org/html/2605.12699#bib.bib73)\] to capture the varying levels of homophily and heterophily across dimensions. Additionally, we propose a combination of learnable low-pass and high-pass spectral Chebyshev filters \[[11](https://arxiv.org/html/2605.12699#bib.bib83)\] to extract smooth (i.e., low-frequency, homophilic) and rapidly changing (i.e., high-frequency, heterophilic) information from node interactions. In particular, our framework applies the two filters sequentially; in the spectral domain, this composition is derived mathematically as the product of their Chebyshev interpolations. Finally, the proposed model is trained with two loss functions: the standard cross-entropy loss, and a second loss that minimizes the divergence between dimension-specific and consensus predictions while promoting sparsity in the consensus predictions. To handle the non-smooth regularization that induces sparsity, we adopt a proximal-gradient optimization framework.

¹ This work focuses on multiplex graphs with homogeneous nodes, where nodes of the same type are connected through multiple dimensions, each representing a distinct type of relation. While multiplex graphs involve multiple types of edges, this setting is distinct from heterogeneous graphs, which involve both multi-typed nodes and edges. Other graph types, such as heterogeneous or dynamic (time-evolving) graphs, fall outside the scope of this study and warrant further investigation.

Spectral polynomial filters have been shown to approximate a wide range of spectral filters effectively \[[42](https://arxiv.org/html/2605.12699#bib.bib85)\], making them suitable for spectral graph convolutions on both homophilic and heterophilic graphs. In particular, Chebyshev polynomials \[[11](https://arxiv.org/html/2605.12699#bib.bib83)\], \[[13](https://arxiv.org/html/2605.12699#bib.bib84)\] are recognized for their strong approximation capabilities, owing to the properties of the Chebyshev basis and its ability to mitigate the Runge phenomenon. Unlike GREET \[[22](https://arxiv.org/html/2605.12699#bib.bib65)\], PolyGCL \[[9](https://arxiv.org/html/2605.12699#bib.bib70)\], and TFE-GNN \[[12](https://arxiv.org/html/2605.12699#bib.bib89)\], which all use linear combinations or concatenations of Chebyshev filters, our approach employs the product of low-pass and high-pass filters. As an advantage, the product of the filters can capture non-linear (multiplicative) interactions between the low-frequency and high-frequency components. Furthermore, the product introduces higher-order Chebyshev terms (up to $2K$ for two Chebyshev filters of order $K$), which enables capturing more complex interactions between low and high frequencies.

Contributions:

- **Heterophily-aware adaptive multiplex model (HAAM).** We introduce a principled framework for multiplex node classification that explicitly models relation-specific levels of homophily/heterophily via learnable compatibility matrices and reconciles dimension-wise predictions into a *sparse* consensus label distribution through a proximal-gradient formulation (Alg. [1](https://arxiv.org/html/2605.12699#alg1)).
- **Product-composed Chebyshev spectral filtering.** We propose to fuse learnable low-pass and high-pass Chebyshev filters by composition (matrix product) rather than by linear mixtures, enabling non-linear low/high-frequency interactions and implicitly introducing higher-order terms up to degree $2K$ (Sec. [4.2](https://arxiv.org/html/2605.12699#S4.SS2)). We characterize the composed operator in the spectral domain (Proposition [4.1](https://arxiv.org/html/2605.12699#S4.Thmtheorem1) and Corollary [4.2](https://arxiv.org/html/2605.12699#S4.Thmtheorem2)) and derive a Chebyshev product-to-sum expansion (Proposition [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3)) that avoids explicit $N\times N$ matrix products and yields an efficient implementation.
- **Theoretical guarantees for stability and generalization.** We establish bounded-input, bounded-output (BIBO) stability bounds for Chebyshev bases and polynomial filters and extend them to the product-composed operator (Propositions [4.4](https://arxiv.org/html/2605.12699#S4.Thmtheorem4)–[4.5](https://arxiv.org/html/2605.12699#S4.Thmtheorem5) and Corollary [4.6](https://arxiv.org/html/2605.12699#S4.Thmtheorem6)). Furthermore, we prove that the row-wise softmax is Lipschitz-stable (Proposition [4.7](https://arxiv.org/html/2605.12699#S4.Thmtheorem7)) and formalize a bandwise noise-attenuation effect induced by the product fusion (Proposition [4.8](https://arxiv.org/html/2605.12699#S4.Thmtheorem8)). Finally, we provide a statistical generalization bound linking the generalization gap to the operator norms $\|\hat{L}_d\|_2\,\|H_d\|_2$ and, via Corollary [4.6](https://arxiv.org/html/2605.12699#S4.Thmtheorem6), to the learned Chebyshev coefficients (Proposition [4.9](https://arxiv.org/html/2605.12699#S4.Thmtheorem9) and Theorem [4.10](https://arxiv.org/html/2605.12699#S4.Thmtheorem10); Sec. [4.7](https://arxiv.org/html/2605.12699#S4.SS7)).
- **Comprehensive empirical validation across homophily regimes.** We conduct extensive experiments on synthetic multiplex graphs with controlled homophily ratios and on real-world multiplex datasets. Results show that HAAM consistently outperforms competitive multiplex baselines and adapted heterophily-aware unidimensional methods (Fig. [2](https://arxiv.org/html/2605.12699#S5.F2) and Table [2](https://arxiv.org/html/2605.12699#S5.T2)). Ablations and sensitivity analyses further validate the impact of compatibility modeling, proximal consensus, and product-based filtering (Table [3](https://arxiv.org/html/2605.12699#S5.T3) and Fig. [6](https://arxiv.org/html/2605.12699#S5.F6)).

## 2 Related Work

This section reviews existing approaches for modeling multiplex graphs and the advancements in addressing heterophily within unidimensional graphs\.

### 2.1 Models for Multiplex Graphs

Many methods designed for multiplex graphs have traditionally focused on capturing homophily patterns. DMGI \[[33](https://arxiv.org/html/2605.12699#bib.bib43)\] integrates embeddings from various types of node relations using a consensus regularization framework and a bilinear discriminator, together with a mutual information maximization mechanism \[[15](https://arxiv.org/html/2605.12699#bib.bib20)\] that pulls similar nodes together and pushes dissimilar ones apart. HDMI \[[19](https://arxiv.org/html/2605.12699#bib.bib42)\] builds upon the contrastive loss of DMGI by including a higher-order objective function that incorporates node features. X-GOAL \[[18](https://arxiv.org/html/2605.12699#bib.bib49)\] further improves the contrastive loss by pairing topologically similar nodes and nodes within the same cluster as positive and negative pairs. SSDCM \[[25](https://arxiv.org/html/2605.12699#bib.bib48)\] introduces a self-supervised framework for learning node representations in multiplex graphs; the model leverages a cluster-based graph summary that improves the discriminative power of node embeddings. Since all these methods rely on optimizing contrastive loss functions, they inherently group nodes that are topologically close, operating under the assumption of homophily. Beyond fixed-structure contrastive objectives, InfoMGF \[[36](https://arxiv.org/html/2605.12699#bib.bib75)\] tackles multiplex graph reliability by refining each graph view to remove task-irrelevant noise and learning a fused graph that maximizes both view-shared and view-unique task-relevant information.

GATNE \[[8](https://arxiv.org/html/2605.12699#bib.bib40)\] decomposes the node embeddings into base, edge, and attribute embeddings, and integrates neighborhood information into the edge embeddings using a self-attention mechanism. Similarly, mGCN \[[24](https://arxiv.org/html/2605.12699#bib.bib41)\] uses two types of node embeddings: one that captures interactions within and across dimensions, and another that considers the node embeddings with respect to the entire graph. SSAMN \[[35](https://arxiv.org/html/2605.12699#bib.bib88)\] combines node embeddings and class label embeddings learned in a semi-supervised setting using a small subset of labeled nodes. HMGE \[[2](https://arxiv.org/html/2605.12699#bib.bib3)\] embeds high-dimensional multiplex graphs by hierarchically encoding the graph dimensions; it progressively builds hidden graph dimensions that can capture new types of interactions through nonlinear combinations of the original graph structures. DMG \[[27](https://arxiv.org/html/2605.12699#bib.bib50)\] focuses on capturing the common and complementary information across the graph dimensions, taking advantage of disentangled representations to distinguish between shared and unique information. In the same context, MGDCR \[[26](https://arxiv.org/html/2605.12699#bib.bib51)\] minimizes the correlation between inter-dimension and intra-dimension codes.

While recent studies on multiplex graphs have made significant progress, the treatment of heterophily, where nodes with different attributes or classes are more likely to connect, remains only partially explored\. This aspect may limit the ability to fully capture the structural diversity observed in complex systems\. This work aims to complement recent efforts by proposing a novel approach that explicitly integrates both homophilic and heterophilic interactions across multiplex dimensions in a principled way\. The proposed method can accurately capture the full spectrum of interactions within multiplex graph dimensions, including heterophilic and homophilic interactions\.

### 2.2 Heterophily in Unidimensional Graphs

In recent years, heterophily has emerged as a significant challenge in modeling unidimensional graphs, leading to the development of various methods to address it. GPR-GNN \[[10](https://arxiv.org/html/2605.12699#bib.bib69)\] is a flexible graph neural network that adapts to homophilic and heterophilic graphs. By learning generalized PageRank weights, GPR-GNN optimizes the balance between node features and topological information while avoiding over-smoothing. LINKX \[[20](https://arxiv.org/html/2605.12699#bib.bib71)\] introduces a simple and scalable method to learn reliable representations on heterophilous graphs: it separately embeds node features and adjacency information with MLPs and combines the embeddings by concatenation. DGCN \[[32](https://arxiv.org/html/2605.12699#bib.bib67)\] follows a similar approach, employing a mixed filter to balance low- and high-frequency information. CPGNN \[[45](https://arxiv.org/html/2605.12699#bib.bib73)\] incorporates a learnable compatibility matrix that models the likelihood of connections between nodes of different classes. SELENE \[[44](https://arxiv.org/html/2605.12699#bib.bib66)\] proposes a dual-channel embedding pipeline that discriminates between r-ego networks, exploiting node attributes and structural information separately. GREET \[[22](https://arxiv.org/html/2605.12699#bib.bib65)\] employs an edge discriminator that separates homophilic from heterophilic edges using features and structural information, coupled with a dual-channel encoder that processes the separated edges independently with a concatenation of low-pass and high-pass filters. PolyGCL \[[9](https://arxiv.org/html/2605.12699#bib.bib70)\] applies contrastive learning with polynomial filters, making it adaptable to homophilic and heterophilic graphs; it also introduces a dual-channel filtering mechanism with a linear combination of low-pass and high-pass filters. Similarly, TFE-GNN \[[12](https://arxiv.org/html/2605.12699#bib.bib89)\] employs triple filter ensembles and combines low-pass and high-pass filters using linear summations and concatenations.

Although these methods have made significant contributions to addressing heterophily in unidimensional graphs, they fall short when it comes to handling the complexity of multiplex graphs. Multiplex graphs, characterized by multiple dimensions of diverse interactions between nodes, present unique challenges that unidimensional approaches cannot tackle directly. In this paper, we propose a principled approach specifically designed to handle the varying levels of homophily and heterophily across the dimensions of multiplex graphs.

## 3 Definitions & Notations

Before describing the proposed approach, we present the main definitions and notation. For convenience, Table [4](https://arxiv.org/html/2605.12699#A1.T4) in Appendix [A](https://arxiv.org/html/2605.12699#A1) summarizes the notation used throughout the paper.

### 3.1 Multiplex Graphs

We consider a $D$-dimensional multiplex graph $G$, defined as a set of $D$ graphs $G=\{G_1,\dots,G_D\}$. Each graph $G_d=(V,A_d)$, for $d\in\{1,\dots,D\}$, consists of the same set of $N$ nodes $V=\{v_1,\dots,v_N\}$ and an adjacency matrix $A_d\in\mathbb{R}^{N\times N}$, where $(A_d)_{ij}=1$ if there is an edge between nodes $v_i$ and $v_j$ in dimension $d$, and $0$ otherwise. The degree matrix $\Delta_d$ of the self-loop-augmented adjacency matrix $A_d+I$ is diagonal, with $(\Delta_d)_{ii}=\sum_{j=1}^{N}(A_d+I)_{ij}$, and the associated normalized Laplacian matrix is $L_d=I-\Delta_d^{-1/2}\,(A_d+I)\,\Delta_d^{-1/2}$. We define $X\in\mathbb{R}^{N\times F}$ as the node feature matrix, where the $i$-th row corresponds to the feature vector of node $v_i$ and $F$ is the number of features per node. The label matrix is denoted by $Y\in\{0,1\}^{N\times C}$, where each row corresponds to a node and each node is assigned exactly one of the $C$ labels.
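A minimal NumPy sketch of the per-dimension Laplacian may help fix the conventions. It assumes dense, symmetric 0/1 adjacency matrices without self-loops and takes the degrees over $A_d+I$ (the usual self-loop renormalization), so a node that is isolated in one dimension still gets a nonzero degree. The function name and the toy dimensions are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I.

    Assumes `adj` is a dense, symmetric 0/1 matrix without self-loops.
    """
    n = adj.shape[0]
    adj_loops = adj + np.eye(n)                 # add self-loops
    deg = adj_loops.sum(axis=1)                 # every degree is >= 1, so no division by zero
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = d_inv_sqrt[:, None] * adj_loops * d_inv_sqrt[None, :]
    return np.eye(n) - norm_adj

# Two dimensions over the same 3 nodes: a path 0-1-2, and a single edge 0-2 (node 1 isolated).
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)
laplacians = [normalized_laplacian(A) for A in (A1, A2)]
```

Dense arrays keep the sketch short; for real multiplex graphs a sparse matrix format would be the natural choice.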

### 3.2 Homophily Ratio

Let $\mathcal{Y}_d\in\mathbb{N}^{C\times C}$ be the matrix that measures class-wise connectivity, where each entry $(\mathcal{Y}_d)_{ij}$ indicates the number of edges connecting nodes of class $i$ to nodes of class $j$ in dimension $d$. The homophily ratio $h_d$ for dimension $d$ is defined in Eq. ([1](https://arxiv.org/html/2605.12699#S3.E1)). The proportion of heterophilic edges is given by $1-h_d$. As $h_d$ decreases, the level of homophily in dimension $d$ decreases, while the level of heterophily increases.

$$h_d=\frac{\sum_{i=1}^{C}(\mathcal{Y}_d)_{ii}}{\sum_{i,j=1}^{C}(\mathcal{Y}_d)_{ij}}.\tag{1}$$
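The class-wise connectivity matrix and Eq. (1) take only a few lines to compute. The sketch below assumes dense symmetric adjacency matrices and integer class labels for every node; `class_connectivity`, `homophily_ratio`, and the toy "friends"/"collab" dimensions are hypothetical illustrations, not datasets from the paper. It shows one fully homophilic and one fully heterophilic dimension over the same three nodes.

```python
import numpy as np

def class_connectivity(adj: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """C x C matrix whose (i, j) entry counts edges from class-i nodes to class-j nodes.

    Computed as Y^T A Y with Y the one-hot label matrix. For a symmetric A, each
    undirected edge is counted in both directions, which leaves h_d unchanged.
    """
    onehot = np.eye(num_classes)[labels]        # N x C label matrix Y
    return onehot.T @ adj @ onehot

def homophily_ratio(adj: np.ndarray, labels: np.ndarray, num_classes: int) -> float:
    """Eq. (1): fraction of edges that connect nodes of the same class."""
    conn = class_connectivity(adj, labels, num_classes)
    total = conn.sum()
    if total == 0:
        raise ValueError("this dimension has no edges")
    return float(np.trace(conn) / total)

labels = np.array([0, 0, 1])
A_friends = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # edge 0-1 (same class)
A_collab = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)   # edges 0-2, 1-2 (cross-class)
print(homophily_ratio(A_friends, labels, 2), homophily_ratio(A_collab, labels, 2))  # 1.0 0.0
```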

### 3.3 Chebyshev Polynomials

For $x\in\mathbb{R}$, the Chebyshev polynomials $T_k(x)$ are defined recursively by $T_{k+1}(x)=2x\,T_k(x)-T_{k-1}(x)$, with initial conditions $T_0(x)=1$ and $T_1(x)=x$. For a square matrix $M$, the same recurrence is applied with matrix products rather than element-wise: $T_{k+1}(M)=2M\,T_k(M)-T_{k-1}(M)$, with $T_0(M)=I$ and $T_1(M)=M$. For a symmetric $M=U\Lambda U^\top$, this gives $T_k(M)=U\,T_k(\Lambda)\,U^\top$, i.e., the polynomial acts element-wise on the eigenvalues, which makes Chebyshev polynomials useful for approximating spectral functions.
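To make the matrix form of the recurrence concrete, the NumPy sketch below builds $T_0(M),\dots,T_K(M)$ with matrix products and checks them against the spectral form $Q\,T_k(\Lambda)\,Q^\top$, using the identity $T_k(\cos\vartheta)=\cos(k\vartheta)$. The helper name and the random test matrix are illustrative assumptions.

```python
import numpy as np

def chebyshev_basis(m: np.ndarray, K: int) -> list:
    """Return [T_0(M), ..., T_K(M)] using T_{k+1}(M) = 2 M T_k(M) - T_{k-1}(M)."""
    basis = [np.eye(m.shape[0])]
    if K >= 1:
        basis.append(m.copy())
    for _ in range(2, K + 1):
        basis.append(2 * m @ basis[-1] - basis[-2])  # matrix product, not element-wise
    return basis

# Symmetric test matrix with eigenvalues in [-1, 1].
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
lam = np.array([-0.9, -0.2, 0.3, 1.0])
M = Q @ np.diag(lam) @ Q.T

T = chebyshev_basis(M, K=3)
# Spectral check: T_k(M) = Q diag(cos(k * arccos(lam))) Q^T.
spectral = [Q @ np.diag(np.cos(k * np.arccos(lam))) @ Q.T for k in range(4)]
```

Applying the recurrence entry by entry (e.g., `2 * M * M - 1` for $k=2$) would give a different matrix that no longer corresponds to filtering the spectrum.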

### 3.4 Spectral Graph Convolution

Spectral graph neural networks are built on spectral graph convolutions. Recall that $X\in\mathbb{R}^{N\times F}$ denotes the node feature matrix introduced in Sec. 3.1, where each row corresponds to one node and each column to one feature. For dimension $d$, this operation is given by:

$$Z_d=f_d(L_d)\,X,\qquad\forall d\in\{1,\dots,D\},\tag{2}$$

where $Z_d\in\mathbb{R}^{N\times F}$ is the filtered feature matrix for dimension $d$ (it has the same shape as $X$), and $f_d(L_d)\in\mathbb{R}^{N\times N}$ is the spectral filter computed from the normalized Laplacian matrix $L_d$. Let $L_d=U_d\,\Lambda_d\,U_d^\top$ be the eigendecomposition of $L_d$, where $\Lambda_d=\mathrm{diag}(\lambda_d^{(1)},\dots,\lambda_d^{(N)})$ is the diagonal matrix of eigenvalues, which represent the graph frequencies, and $U_d=[u_d^{(1)},\dots,u_d^{(N)}]$ contains the corresponding eigenvectors. The graph Fourier transform of a graph signal $x\in\mathbb{R}^N$ is defined as $\hat{x}=U_d^\top x$, and the inverse transform is $x=U_d\,\hat{x}$. In the spectral domain, the filter rescales each frequency component, as expressed in:

$$Z_d=f_d(L_d)\,X=U_d\,f_d(\Lambda_d)\,U_d^\top X.\tag{3}$$
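Eq. (3) can be checked numerically: for a symmetric Laplacian, filtering in the eigenbasis gives the same result as applying the matrix function directly. The sketch assumes a dense Laplacian small enough for a full eigendecomposition (which costs $O(N^3)$, one reason polynomial filters are preferred in practice). `spectral_filter` is a hypothetical helper, and for brevity the example uses the plain normalized Laplacian $I-D^{-1/2}AD^{-1/2}$ of a path graph.

```python
import numpy as np

def spectral_filter(L: np.ndarray, X: np.ndarray, response) -> np.ndarray:
    """Z = U f(Lambda) U^T X for a symmetric L and a scalar frequency response f."""
    eigvals, U = np.linalg.eigh(L)                     # L = U diag(eigvals) U^T
    X_hat = U.T @ X                                    # graph Fourier transform
    return U @ (response(eigvals)[:, None] * X_hat)    # scale each frequency, then invert

# Path graph 0-1-2 with the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d_inv_sqrt = 1 / np.sqrt(A.sum(axis=1))
L = np.eye(3) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# For a polynomial response, the spectral route matches evaluating the polynomial in L.
Z_spec = spectral_filter(L, X, lambda lam: 1 - lam / 2)
Z_poly = (np.eye(3) - L / 2) @ X
```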

## 4 The Proposed HAAM Approach

In this section, we describe the proposed approach HAAM \(Heterophily\-Aware Adaptive Multiplex model\)\.

### 4.1 Overall Framework

![The architecture of HAAM](https://arxiv.org/html/2605.12699v1/figures/framework.drawio.png)

Figure 1: The architecture of HAAM.

Fig. [1](https://arxiv.org/html/2605.12699#S4.F1) depicts the architecture of HAAM. The process starts by embedding the features $X$ with a multilayer perceptron (MLP) to generate an initial distribution of label predictions $\hat{Y}_0\in\mathbb{R}^{N\times C}$. In highly heterophilic settings, the graph topology reflects more complex relations between the label distribution and the node features than in homophilous graphs \[[20](https://arxiv.org/html/2605.12699#bib.bib71)\]. To account for this, we embed $X$ independently of the graph structures to extract a prior distribution of node labels, which is then refined in the subsequent stages.
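To make the graph-agnostic first stage concrete, here is a minimal NumPy sketch of an MLP that maps $X$ to a row-stochastic prior $\hat{Y}_0$. The two-layer ReLU architecture, the hidden size, the softmax output, and the random weights are assumptions for illustration; the section does not specify them. Note that no adjacency matrix enters the computation.

```python
import numpy as np

def softmax_rows(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_prior(X, W1, b1, W2, b2):
    """Graph-agnostic prior Y_0: a two-layer ReLU MLP on node features only."""
    hidden = np.maximum(X @ W1 + b1, 0.0)
    return softmax_rows(hidden @ W2 + b2)

rng = np.random.default_rng(0)
N, F, hidden_dim, C = 5, 8, 16, 3              # hypothetical sizes
X = rng.standard_normal((N, F))
W1, b1 = 0.1 * rng.standard_normal((F, hidden_dim)), np.zeros(hidden_dim)
W2, b2 = 0.1 * rng.standard_normal((hidden_dim, C)), np.zeros(C)
Y0 = mlp_prior(X, W1, b1, W2, b2)              # shape (N, C); computed from X alone
```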

We formulate learnable low-pass and high-pass Chebyshev filters for each dimension $d$, denoted by $f_d^{\mathcal{L}}$ and $f_d^{\mathcal{H}}$, respectively. The filters are computed from the rescaled Laplacian matrices $\tilde{L}_d=2L_d/\lambda_d^{\max}-I$, where $\lambda_d^{\max}$ is the largest eigenvalue of the normalized Laplacian $L_d$; for a positive semidefinite $L_d$, this rescaling maps the spectrum into $[-1,1]$, the interval on which Chebyshev polynomials are bounded by $1$. The low-pass filters smooth signals over neighborhoods, emphasizing homophilic properties in the graph structures. Conversely, the high-pass filters capture significant variations between adjacent nodes, highlighting heterophilic properties.
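A sketch of the rescaling step, assuming a dense symmetric Laplacian built with self-loops and degrees taken over $A_d+I$, so that $L_d$ is positive semidefinite and $\tilde{L}_d$ has its spectrum in $[-1,1]$. Computing $\lambda_d^{\max}$ exactly with a full eigensolver costs $O(N^3)$; for large graphs it could instead be estimated iteratively or replaced by the upper bound $2$ that holds for normalized Laplacians, at the cost of a slightly compressed spectrum. `rescale_laplacian` is a hypothetical name.

```python
import numpy as np

def rescale_laplacian(L: np.ndarray) -> np.ndarray:
    """L_tilde = 2 L / lambda_max - I, mapping the spectrum [0, lambda_max] onto [-1, 1]."""
    lam_max = np.linalg.eigvalsh(L).max()      # L is symmetric, so eigvalsh applies
    return 2.0 * L / lam_max - np.eye(L.shape[0])

# Star graph (center 0, leaves 1-3); Laplacian with self-loops: I - D^{-1/2}(A + I)D^{-1/2}.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
A_loops = A + np.eye(4)
d_inv_sqrt = 1 / np.sqrt(A_loops.sum(axis=1))
L = np.eye(4) - d_inv_sqrt[:, None] * A_loops * d_inv_sqrt[None, :]

L_tilde = rescale_laplacian(L)
spectrum = np.linalg.eigvalsh(L_tilde)         # lies in [-1, 1], with maximum exactly 1
```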

For each dimension, the low\-pass and high\-pass spectral filters extract distinct information, which we combine through a matrix product to obtain

$$\hat{L}_d=f_d^{\mathcal{L}}(\tilde{L}_d)\cdot f_d^{\mathcal{H}}(\tilde{L}_d),\qquad\hat{L}_d\in\mathbb{R}^{N\times N}.\tag{4}$$

This contrasts with previous work that employs linear combinations and concatenations to fuse filters \[[22](https://arxiv.org/html/2605.12699#bib.bib65),[9](https://arxiv.org/html/2605.12699#bib.bib70),[12](https://arxiv.org/html/2605.12699#bib.bib89)\].
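Because both factors in Eq. (4) are polynomials in the same matrix $\tilde{L}_d$, they commute and share its eigenvectors, so the frequency response of the composed operator is the pointwise product of the two individual responses. The sketch below checks this numerically; the coefficient values (chosen to be roughly low-pass and high-pass on $[-1,1]$) and the random test matrix are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_matrix(L_tilde: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Evaluate sum_k theta_k T_k(L_tilde) with the matrix three-term recurrence."""
    n = L_tilde.shape[0]
    T_prev, T_curr = np.eye(n), L_tilde
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2 * L_tilde @ T_curr - T_prev
        out = out + theta[k] * T_curr
    return out

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
lam = np.linspace(-1, 1, 5)                    # spectrum of a rescaled Laplacian lies in [-1, 1]
L_tilde = Q @ np.diag(lam) @ Q.T

theta_low = np.array([0.5, -0.4, 0.1])         # hypothetical coefficients, K = 2
theta_high = np.array([0.5, 0.4, 0.1])
F_low, F_high = cheb_matrix(L_tilde, theta_low), cheb_matrix(L_tilde, theta_high)
L_hat = F_low @ F_high                         # Eq. (4)

# Spectral view: the composed response is h_low(lam) * h_high(lam), eigenvalue by eigenvalue.
expected = Q @ np.diag(cheb.chebval(lam, theta_low) * cheb.chebval(lam, theta_high)) @ Q.T
```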

The next step is to compute the updated label predictions $\hat{Y}_d$ for each dimension $d$. To this end, we model the probability of connections between nodes in different classes using an empirically estimated compatibility matrix $H_d\in\mathbb{R}^{C\times C}$ \[[45](https://arxiv.org/html/2605.12699#bib.bib73)\]. As explained in Sec. [4.4](https://arxiv.org/html/2605.12699#S4.SS4), the entry $(H_d)_{c_1c_2}$ represents the likelihood that nodes from class $c_1$ in dimension $d$ connect to nodes from class $c_2$. Both matrices, $\hat{L}_d$ and $H_d$, capture heterophily and homophily within the graph: while $\hat{L}_d$ diffuses information across the topology, $H_d$ adjusts the predictions based on inter-class connection probabilities. Accordingly, the updated predictions $\hat{Y}_d$ are expressed as:

$$\hat{Y}_d=\mathrm{softmax}\big(\hat{L}_d\cdot\hat{Y}_0\cdot H_d\big),\tag{5}$$

where the softmax is applied row-wise, $\hat{Y}_d\in\mathbb{R}^{N\times C}$ denotes the dimension-specific label predictions, and $H_d\in\mathbb{R}^{C\times C}$ is the compatibility matrix detailed in Sec. [4.4](https://arxiv.org/html/2605.12699#S4.SS4).
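To illustrate how $H_d$ changes what a neighbor "votes" for, the sketch below evaluates Eq. (5) on a two-node toy graph. A plain adjacency matrix stands in for $\hat{L}_d$, and both compatibility matrices are hand-set for illustration rather than estimated or learned as in HAAM; the function names are hypothetical.

```python
import numpy as np

def softmax_rows(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dimension_predictions(L_hat, Y0, H):
    """Eq. (5): row-wise softmax of L_hat @ Y0 @ H."""
    return softmax_rows(L_hat @ Y0 @ H)

# Two connected nodes, two classes; the adjacency matrix stands in for L_hat.
L_hat = np.array([[0.0, 1.0], [1.0, 0.0]])
Y0 = np.array([[0.9, 0.1],                     # node 0: confident prior for class 0
               [0.5, 0.5]])                    # node 1: uninformative prior
H_homo = np.eye(2)                             # links stay within a class
H_hetero = np.array([[0.0, 1.0], [1.0, 0.0]])  # links cross classes

Y_homo = dimension_predictions(L_hat, Y0, H_homo)
Y_hetero = dimension_predictions(L_hat, Y0, H_hetero)
print(Y_homo[1].argmax(), Y_hetero[1].argmax())  # 0 1
```

With the homophilic $H$, node 1 inherits its neighbor's class; with the heterophilic $H$, the same neighbor evidence points to the other class.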

Finally, we generate the consensus label predictions $\hat{Y}$ from all the dimension-specific predictions $\hat{Y}_d$. Specifically, we minimize the divergence between each $\hat{Y}_d$ and $\hat{Y}$ while promoting sparsity in the consensus predictions. To handle the non-smooth regularization that induces sparsity, we use proximal-gradient optimization.
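This section does not spell out the divergence or the sparsity term (Alg. 1 gives the exact procedure), so the sketch below only illustrates the proximal-gradient pattern under assumed choices: a squared Frobenius divergence $\sum_d\|\hat{Y}-\hat{Y}_d\|_F^2$ and an $\ell_1$ penalty $\lambda\|\hat{Y}\|_1$, whose proximal operator is soft-thresholding. `consensus_ista` is a hypothetical name, and the rows of the result are not renormalized; an implementation that needs distributions would have to add that step.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def consensus_ista(Y_dims, lam, n_iter=50):
    """Minimize sum_d ||Y - Y_d||_F^2 + lam * ||Y||_1 by proximal gradient (ISTA)."""
    D = len(Y_dims)
    step = 1.0 / (2 * D)                       # 1 / Lipschitz constant of the smooth term
    Y = np.mean(Y_dims, axis=0)
    for _ in range(n_iter):
        grad = 2 * sum(Y - Yd for Yd in Y_dims)   # gradient of the smooth divergence
        Y = soft_threshold(Y - step * grad, step * lam)
    return Y

# One node, three classes, two dimensions (hypothetical predictions).
Y_dims = [np.array([[0.70, 0.20, 0.10]]), np.array([[0.60, 0.35, 0.05]])]
Y_cons = consensus_ista(Y_dims, lam=0.4)       # the weakest class entry is driven to exactly 0
```

For this particular objective the minimizer also has a closed form, the soft-thresholded mean with threshold $\lambda/(2D)$, which makes the sketch easy to check.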

### 4.2 Composition of Chebyshev Filters

We define the filters $f_d^{\mathcal{L}}$ and $f_d^{\mathcal{H}}$ as Chebyshev polynomial expansions \[[11](https://arxiv.org/html/2605.12699#bib.bib83)\] that approximate optimal spectral filters. Let $K$ denote the degree of the polynomial filters; then:

$$f_d^{\mathcal{L}}(\tilde{L}_d)=\sum_{k=0}^{K}\theta_k^{\mathcal{L},d}\,T_k(\tilde{L}_d),\qquad f_d^{\mathcal{H}}(\tilde{L}_d)=\sum_{k=0}^{K}\theta_k^{\mathcal{H},d}\,T_k(\tilde{L}_d),\tag{6}$$

where $\theta_k^{\mathcal{L},d}$ and $\theta_k^{\mathcal{H},d}$ are the polynomial coefficients of the low-pass and high-pass filters, respectively, for dimension $d$. These coefficients control which spectral components of the graph signal are emphasized. Low-pass filters aim to retain low-frequency components, which correspond to smooth variations across the graph and are associated with smaller eigenvalues. High-pass filters, on the other hand, are designed to capture high-frequency components, which correspond to rapid variations between neighboring nodes and are associated with larger eigenvalues. Intuitively, low-pass filters capture homophilic relations, while high-pass filters capture heterophilic relations.
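Evaluating Eq. (6) does not require an eigendecomposition. A common approach in ChebNet-style implementations is to run the three-term recurrence on the features directly, so that each step is a single matrix-by-features product. The sketch assumes a dense $\tilde{L}_d$ and hypothetical coefficients; with a sparse $\tilde{L}_d$, each step would cost time proportional to the number of edges times the number of feature columns.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_filter_apply(L_tilde, X, theta):
    """Compute sum_k theta_k T_k(L_tilde) X without forming any T_k(L_tilde).

    Runs the recurrence on the signal: T_{k+1} X = 2 L_tilde (T_k X) - T_{k-1} X.
    """
    Tx_prev, Tx_curr = X, L_tilde @ X
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx_curr
    for k in range(2, len(theta)):
        Tx_prev, Tx_curr = Tx_curr, 2 * L_tilde @ Tx_curr - Tx_prev
        out = out + theta[k] * Tx_curr
    return out

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam = np.linspace(-1, 1, 6)
L_tilde = Q @ np.diag(lam) @ Q.T
X = rng.standard_normal((6, 3))
theta = np.array([0.3, -0.5, 0.2, 0.1])        # hypothetical coefficients, K = 3

Z = cheb_filter_apply(L_tilde, X, theta)
Z_spectral = Q @ np.diag(cheb.chebval(lam, theta)) @ Q.T @ X   # reference via Eq. (3)
```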

Similar to \[[11](https://arxiv.org/html/2605.12699#bib.bib83)\], the coefficients $\theta_k^{\mathcal{L},d}$ and $\theta_k^{\mathcal{H},d}$ could be optimized directly via gradient descent. However, unconstrained coefficients may fail to exploit the approximation properties of Chebyshev coefficients and can lead to overfitting \[[13](https://arxiv.org/html/2605.12699#bib.bib84)\]. We aim to approximate arbitrary low-pass and high-pass filters that adapt to the specific characteristics of the multiplex graph. To achieve this, we adopt a double reparameterization approach. Given $\mathcal{F}\in\{\mathcal{L},\mathcal{H}\}$, we first reparameterize $\theta_k^{\mathcal{F},d}$ by a vector $\gamma^{\mathcal{F},d}\in\mathbb{R}^{K+1}$ to capture the characteristics of Chebyshev coefficients, following the formulation in \[[13](https://arxiv.org/html/2605.12699#bib.bib84)\]:

$$
\theta_{k}^{\mathcal{F},d}=\frac{2}{K+1}\sum_{j=0}^{K}\gamma_{j}^{\mathcal{F},d}\,T_{k}\!\left(\cos\!\left(\frac{j+1/2}{K+1}\pi\right)\right).\tag{7}
$$

Constraining the coefficients in this way makes it possible to approximate an arbitrary spectral filter polynomially with an optimal convergence rate [[13](https://arxiv.org/html/2605.12699#bib.bib84)]. Second, we further reparameterize $\gamma^{\mathcal{F},d}$ using a prefix difference and a prefix sum [[9](https://arxiv.org/html/2605.12699#bib.bib70)]:

$$
\gamma_{i}^{\mathcal{L},d}=\gamma_{0}^{d}-\sum_{j=1}^{i}\gamma_{j}^{d},\qquad \gamma_{i}^{\mathcal{H},d}=\sum_{j=0}^{i}\gamma_{j}^{d},\tag{8}
$$

where $\gamma_{0}^{\mathcal{F},d}=\gamma_{0}^{d}$ is a predefined initial value and $\gamma^{d}=[\gamma_{1}^{d},\dots,\gamma_{K}^{d}]$ is a vector of non-negative learnable parameters. Substituting $\gamma_{i}^{\mathcal{L},d}$ and $\gamma_{i}^{\mathcal{H},d}$ into Eq. ([7](https://arxiv.org/html/2605.12699#S4.E7)) yields the low-pass and high-pass filters, respectively. This construction ensures that the Chebyshev polynomials $f_{d}^{\mathcal{L}}$ and $f_{d}^{\mathcal{H}}$ approximate filters with low-pass and high-pass properties.
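The two reparameterization steps can be sketched in NumPy as follows. The function names are hypothetical. The sketch also rejects negative inputs, which is an assumption, because the text does not say how non-negativity of $\gamma^d$ is enforced during training:

```python
import numpy as np


def prefix_gammas(gamma0: float, gamma) -> tuple:
    """Eq. (8) for i = 0..K: gamma^L_i = gamma0 - sum_{j=1}^i gamma_j and
    gamma^H_i = sum_{j=0}^i gamma_j, with gamma_0 = gamma0.
    `gamma` holds the learnable values [gamma_1, ..., gamma_K]."""
    gamma = np.asarray(gamma, dtype=float)
    if np.any(gamma < 0):
        raise ValueError("gamma_1..gamma_K must be non-negative")
    partial = np.concatenate(([0.0], np.cumsum(gamma)))  # sum_{j=1}^i gamma_j
    return gamma0 - partial, gamma0 + partial


def theta_from_gamma(gamma_f: np.ndarray) -> np.ndarray:
    """Eq. (7): theta_k = 2/(K+1) * sum_j gamma_j * T_k(x_j) for k = 0..K,
    where x_j = cos((j + 1/2) / (K + 1) * pi) are the Chebyshev nodes."""
    K = len(gamma_f) - 1
    x = np.cos((np.arange(K + 1) + 0.5) / (K + 1) * np.pi)
    k = np.arange(K + 1)[:, None]
    T = np.cos(k * np.arccos(x)[None, :])  # T[k, j] = T_k(x_j)
    return 2.0 / (K + 1) * T @ gamma_f


gamma_low, gamma_high = prefix_gammas(1.0, [0.2, 0.1, 0.3])
theta_low, theta_high = theta_from_gamma(gamma_low), theta_from_gamma(gamma_high)
```

Because $\gamma_1^d,\dots,\gamma_K^d\geq 0$, the sequence $\gamma_i^{\mathcal{L},d}$ is non-increasing and $\gamma_i^{\mathcal{H},d}$ is non-decreasing in $i$. By the discrete orthogonality of Chebyshev polynomials at these nodes, a constant $\gamma^{\mathcal{F},d}=c$ yields $\theta_0^{\mathcal{F},d}=2c$ and $\theta_k^{\mathcal{F},d}=0$ for $k\geq 1$. This gives a convenient sanity check for an implementation of Eq. (7).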

The filters $f_{d}^{\mathcal{L}}(\tilde{L}_{d})$ and $f_{d}^{\mathcal{H}}(\tilde{L}_{d})$ capture two distinct perspectives for each dimension: one focusing on homophilic relations, the other on heterophilic relations. Since we aim to design an adaptive approach, these filters must be combined. Previous work [[22](https://arxiv.org/html/2605.12699#bib.bib65),[9](https://arxiv.org/html/2605.12699#bib.bib70),[12](https://arxiv.org/html/2605.12699#bib.bib89)] has explored linear combinations and concatenations to fuse filters. In this work, we propose an alternative: the product of $f_{d}^{\mathcal{L}}(\tilde{L}_{d})$ and $f_{d}^{\mathcal{H}}(\tilde{L}_{d})$. The product arises from the composition of the two filters, as explained in Prop. [4.1](https://arxiv.org/html/2605.12699#S4.Thmtheorem1).

###### Proposition 4.1 (Spectral response of filter composition. Proof in [B](https://arxiv.org/html/2605.12699#A2)).

The application of a low-pass filter $f^{\mathcal{L}}(L)$ followed by a high-pass filter $f^{\mathcal{H}}(L)$ to a graph signal $x$ is equivalent to the application of a filter $f(L)$ whose eigenvalues are equal to the element-wise product of the eigenvalues of $f^{\mathcal{L}}(L)$ and $f^{\mathcal{H}}(L)$. Formally, the filter output is given by:

$$
y=f(L)\,x=U\left(f^{\mathcal{H}}(\Lambda)\cdot f^{\mathcal{L}}(\Lambda)\right)U^{\top}x,
$$

where $y$ is the filter output, $\Lambda$ is the eigenvalue matrix of $L$, and $U$ contains the corresponding eigenvectors.
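A quick numerical check of Prop. 4.1 follows. It assumes a small random symmetric matrix in place of $L$ and two hypothetical polynomial filters. The monomial basis is used only for brevity; the statement holds for any polynomial in a symmetric $L$, including the Chebyshev filters of Eq. (6).

```python
import numpy as np


def poly_of_matrix(coeffs, M):
    """Evaluate sum_k coeffs[k] * M^k (monomial basis) for a square matrix M."""
    out = np.zeros_like(M)
    power = np.eye(M.shape[0])
    for c in coeffs:
        out = out + c * power
        power = power @ M
    return out


rng = np.random.default_rng(1)
B = rng.random((5, 5))
L = (B + B.T) / 2                               # hypothetical symmetric operator
lam, U = np.linalg.eigh(L)
f_low, f_high = [1.0, -0.5], [0.2, 0.3, 0.1]    # hypothetical filter coefficients
x = rng.standard_normal(5)

# Vertex domain: apply the low-pass filter, then the high-pass filter.
sequential = poly_of_matrix(f_high, L) @ (poly_of_matrix(f_low, L) @ x)
# Spectral domain: multiply the two responses eigenvalue by eigenvalue.
# np.polyval expects the highest-degree coefficient first, hence the reversal.
spectral = U @ (np.polyval(f_high[::-1], lam) * np.polyval(f_low[::-1], lam) * (U.T @ x))
```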

###### Corollary 4.2 (Order-invariance of filter composition. Proof in [C](https://arxiv.org/html/2605.12699#A3)).

The composition of a low-pass filter $f^{\mathcal{L}}(L)$ and a high-pass filter $f^{\mathcal{H}}(L)$ is order-invariant:

$$
f^{\mathcal{H}}(L)\cdot f^{\mathcal{L}}(L)=f^{\mathcal{L}}(L)\cdot f^{\mathcal{H}}(L).
$$

The corollary implies that the output of the composed filter is the same regardless of whether the high-pass or the low-pass filter is applied first, since the two filters commute. Compared with a linear combination, the product couples the two branches multiplicatively (its coefficients are bilinear in $\theta^{\mathcal{L},d}$ and $\theta^{\mathcal{H},d}$) and introduces Chebyshev terms of degree up to $2K$. This allows it to capture more complex interactions between the low-frequency and high-frequency components. However, computing the product of $N\times N$ matrices is computationally expensive. Moreover, backpropagating through products of large matrices incurs substantial memory costs. To mitigate this issue, we express the product of Chebyshev polynomials as a linear combination of Chebyshev polynomials weighted by the coefficients $\theta_{i}^{\mathcal{L},d}\,\theta_{j}^{\mathcal{H},d}$, as explained in Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3).

###### Proposition 4.3 (Chebyshev product-to-sum expansion. Proof in [D](https://arxiv.org/html/2605.12699#A4)).

Given the low-pass Chebyshev filter $f_{d}^{\mathcal{L}}(\tilde{L}_{d})$ and the high-pass Chebyshev filter $f_{d}^{\mathcal{H}}(\tilde{L}_{d})$, the product of these filters $\hat{L}_{d}=f_{d}^{\mathcal{L}}(\tilde{L}_{d})\cdot f_{d}^{\mathcal{H}}(\tilde{L}_{d})$ can be expressed as a sum of Chebyshev polynomials weighted by $\theta_{i}^{\mathcal{L},d}\,\theta_{j}^{\mathcal{H},d}$:

$$
\hat{L}_{d}=\frac{1}{2}\sum_{i=0}^{K}\sum_{j=0}^{K}\theta_{i}^{\mathcal{L},d}\,\theta_{j}^{\mathcal{H},d}\left[T_{i+j}(\tilde{L}_{d})+T_{|i-j|}(\tilde{L}_{d})\right].
$$

Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3) reduces the computational and memory overhead. Rather than multiplying the two filter matrices directly, the matrix terms $T_{k}(\tilde{L}_{d})$, $k=0,\dots,2K$, are weighted by the scalar products $\theta_{i}^{\mathcal{L},d}\,\theta_{j}^{\mathcal{H},d}$ and summed.
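The following minimal NumPy sketch shows one way to use Prop. 4.3 in practice. It assumes that $\hat{L}_d$ is only needed through its action on $\hat{Y}_0$. The double sum is first collapsed into $2K+1$ coefficients, and the resulting single Chebyshev series is then applied with the three-term recurrence. This recurrence is a standard way to evaluate Chebyshev filters, but whether HAAM's implementation uses it is an assumption. The helper names are hypothetical.

```python
import numpy as np


def product_coefficients(theta_low, theta_high):
    """Prop. 4.3: since T_i T_j = (T_{i+j} + T_{|i-j|}) / 2, the product
    f_L * f_H is a single Chebyshev series (2K + 1 coefficients when both
    inputs have K + 1 coefficients)."""
    theta_bar = np.zeros(len(theta_low) + len(theta_high) - 1)
    for i, a in enumerate(theta_low):
        for j, b in enumerate(theta_high):
            theta_bar[i + j] += 0.5 * a * b
            theta_bar[abs(i - j)] += 0.5 * a * b
    return theta_bar


def cheb_apply(L_tilde, theta, Y):
    """Return sum_k theta[k] T_k(L_tilde) @ Y via the three-term recurrence.
    Only products L_tilde @ (N x C) are formed; L_tilde may also be a
    SciPy sparse matrix."""
    t_prev = Y
    out = theta[0] * Y
    if len(theta) > 1:
        t_curr = L_tilde @ Y
        out = out + theta[1] * t_curr
        for k in range(2, len(theta)):
            t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
            out = out + theta[k] * t_curr
    return out
```

With this design, computing $\hat{L}_d\hat{Y}_0$ costs $2K$ products of $\tilde{L}_d$ with an $N\times C$ matrix, and no dense $N\times N$ product is ever formed.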

### 4.3 Stability of the Composed Chebyshev Filter

We now provide a stability analysis of the proposed composed operator $\hat{L}_{d}=f_{d}^{\mathcal{L}}(\tilde{L}_{d})\,f_{d}^{\mathcal{H}}(\tilde{L}_{d})$, where $\tilde{L}_{d}=2\,L_{d}/\lambda_{d}^{\max}-I$ is the rescaled Laplacian used in Sec. [4.2](https://arxiv.org/html/2605.12699#S4.SS2). Throughout, $\|\cdot\|_{2}$ denotes the spectral norm (largest singular value) and $\|\cdot\|_{F}$ denotes the Frobenius norm.

##### Why rescaling matters

When $L_{d}$ is symmetric (e.g., undirected graphs with symmetric normalization), it admits an eigendecomposition $L_{d}=U_{d}\Lambda_{d}U_{d}^{\top}$ with real eigenvalues $\lambda_{d}^{(i)}\in[0,\lambda_{d}^{\max}]$. Therefore, $\tilde{L}_{d}=2\,L_{d}/\lambda_{d}^{\max}-I$ has eigenvalues $\tilde{\lambda}_{d}^{(i)}=2\,\lambda_{d}^{(i)}/\lambda_{d}^{\max}-1\in[-1,1]$. This is the classical regime in which Chebyshev polynomials satisfy $|T_{k}(x)|\leq 1$ for $x\in[-1,1]$, which prevents the exponential growth that occurs when $|x|>1$.
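The rescaling can be sketched as below. The sketch follows line 2 of Algorithm 1 and assumes that $\Delta_d$ is the degree matrix of $A_d+I$ and that the graph has at least one edge (otherwise $\lambda_d^{\max}=0$). A dense eigensolver computes $\lambda_d^{\max}$ purely for illustration. Large graphs would call for a sparse eigensolver or a cheap upper bound on $\lambda_d^{\max}$, which still keeps the spectrum inside $[-1,1]$.

```python
import numpy as np


def rescaled_laplacian(A: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} (A + I) D^{-1/2} with D the degree matrix of A + I
    (an assumption about Delta_d), then L_tilde = 2 L / lambda_max - I."""
    N = A.shape[0]
    A_hat = A + np.eye(N)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))          # degrees are >= 1
    L = np.eye(N) - d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()                    # dense, illustration only
    return 2.0 * L / lam_max - np.eye(N)


# Hypothetical 4-node undirected graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
eigs = np.linalg.eigvalsh(rescaled_laplacian(A))
```

Because the smallest eigenvalue of this normalized Laplacian is $0$ and the largest is $\lambda_d^{\max}$, the rescaled spectrum spans exactly $[-1,1]$.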

###### Definition 1 (BIBO stability).

Fix a dimension $d\in\{1,\dots,D\}$. We say that a linear graph operator $\mathcal{T}_{d}\in\mathbb{R}^{N\times N}$ is *bounded-input bounded-output (BIBO) stable* with respect to the Frobenius norm if there exists a constant $C_{d}<\infty$ such that, for every matrix-valued graph signal $S\in\mathbb{R}^{N\times C}$, we have:

$$
\|\mathcal{T}_{d}\,S\|_{F}\leq C_{d}\,\|S\|_{F}.
$$

In HAAM, the relevant operators are the Chebyshev polynomial filters $f_{d}(\tilde{L}_{d})$ and the composed operator $\hat{L}_{d}=f_{d}^{\mathcal{L}}(\tilde{L}_{d})\,f_{d}^{\mathcal{H}}(\tilde{L}_{d})$ acting on the initial predictions $\hat{Y}_{0}\in\mathbb{R}^{N\times C}$.

###### Proposition 4.4 (Bounded Chebyshev bases. Proof in Appendix [E](https://arxiv.org/html/2605.12699#A5)).

Let $\tilde{L}_{d}\in\mathbb{R}^{N\times N}$ be symmetric and admit the eigendecomposition $\tilde{L}_{d}=U_{d}\tilde{\Lambda}_{d}U_{d}^{\top}$, where $\tilde{\Lambda}_{d}=\mathrm{diag}(\tilde{\lambda}_{d}^{(1)},\dots,\tilde{\lambda}_{d}^{(N)})$ satisfies $\tilde{\lambda}_{d}^{(i)}\in[-1,1]$ for all $i\in\{1,\dots,N\}$. Then for all integers $k\geq 0$, we have:

$$
\|T_{k}(\tilde{L}_{d})\|_{2}\leq 1.
$$

Prop. [4.4](https://arxiv.org/html/2605.12699#S4.Thmtheorem4) implies that the spectral norm of a finite-order Chebyshev filter is bounded by the $\ell_{1}$ norm of its coefficient vector, which yields a BIBO-type stability guarantee.

###### Proposition 4.5 (BIBO stability of Chebyshev polynomial filters. Proof in Appendix [F](https://arxiv.org/html/2605.12699#A6)).

Let $f_{d}(\tilde{L}_{d})=\sum_{k=0}^{K}\alpha_{k}\,T_{k}(\tilde{L}_{d})$, where $\tilde{L}_{d}$ is symmetric and its eigenvalues satisfy $\tilde{\lambda}_{d}^{(i)}\in[-1,1]$ for all $i$. Then

$$
\|f_{d}(\tilde{L}_{d})\|_{2}\leq\sum_{k=0}^{K}|\alpha_{k}|\quad\text{and}\quad\|f_{d}(\tilde{L}_{d})\,S\|_{F}\leq\Big(\sum_{k=0}^{K}|\alpha_{k}|\Big)\,\|S\|_{F},\quad\forall S\in\mathbb{R}^{N\times C}.
$$
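Props. 4.4 and 4.5 can be checked numerically on a random symmetric matrix with spectrum in $[-1,1]$. The sketch builds $T_0(\tilde{L}),\dots,T_K(\tilde{L})$ as dense matrices, which is only sensible for small illustrative matrices. The helper name is hypothetical.

```python
import numpy as np


def cheb_matrices(L_tilde: np.ndarray, K: int) -> list:
    """Return [T_0(L_tilde), ..., T_K(L_tilde)] via T_{k+1} = 2 L T_k - T_{k-1}."""
    N = L_tilde.shape[0]
    Ts = [np.eye(N), L_tilde.copy()]
    for _ in range(2, K + 1):
        Ts.append(2.0 * L_tilde @ Ts[-1] - Ts[-2])
    return Ts[: K + 1]


rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
L_tilde = U @ np.diag(rng.uniform(-1.0, 1.0, 8)) @ U.T   # spectrum in [-1, 1]
Ts = cheb_matrices(L_tilde, 12)
alpha = rng.standard_normal(13)                          # hypothetical coefficients
f = sum(a * T for a, T in zip(alpha, Ts))                # f(L_tilde) = sum_k alpha_k T_k
S = rng.standard_normal((8, 3))
```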

###### Corollary 4.6 (Stability of the product-composed filter. Proof in Appendix [G](https://arxiv.org/html/2605.12699#A7)).

For each dimension $d$, let $f_{d}^{\mathcal{L}}(\tilde{L}_{d})=\sum_{k=0}^{K}\theta_{k}^{\mathcal{L},d}\,T_{k}(\tilde{L}_{d})$ and $f_{d}^{\mathcal{H}}(\tilde{L}_{d})=\sum_{k=0}^{K}\theta_{k}^{\mathcal{H},d}\,T_{k}(\tilde{L}_{d})$. Define $\hat{L}_{d}=f_{d}^{\mathcal{L}}(\tilde{L}_{d})\,f_{d}^{\mathcal{H}}(\tilde{L}_{d})$. Then

$$
\begin{aligned}
\|\hat{L}_{d}\|_{2}&\leq\Big(\sum_{k=0}^{K}|\theta_{k}^{\mathcal{L},d}|\Big)\Big(\sum_{k=0}^{K}|\theta_{k}^{\mathcal{H},d}|\Big),\\
\|\hat{L}_{d}\,S\|_{F}&\leq\Big(\sum_{k=0}^{K}|\theta_{k}^{\mathcal{L},d}|\Big)\Big(\sum_{k=0}^{K}|\theta_{k}^{\mathcal{H},d}|\Big)\,\|S\|_{F},\qquad\forall S\in\mathbb{R}^{N\times C}.
\end{aligned}
$$

Moreover, by Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3), the product $\hat{L}_{d}$ can be written as a *single* Chebyshev polynomial of degree $2K$, i.e.,

$$
\hat{L}_{d}=\sum_{r=0}^{2K}\bar{\theta}_{r}^{\,d}\,T_{r}(\tilde{L}_{d}),
$$

where $\bar{\theta}^{\,d}=[\bar{\theta}_{0}^{\,d},\dots,\bar{\theta}_{2K}^{\,d}]^{\top}\in\mathbb{R}^{2K+1}$ denotes the resulting coefficient vector. This induced vector satisfies

$$
\|\bar{\theta}^{\,d}\|_{1}\leq\|\theta^{\mathcal{L},d}\|_{1}\,\|\theta^{\mathcal{H},d}\|_{1},
$$

with $\theta^{\mathcal{L},d}=[\theta_{0}^{\mathcal{L},d},\dots,\theta_{K}^{\mathcal{L},d}]^{\top}$ and similarly for $\theta^{\mathcal{H},d}$.
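NumPy's `numpy.polynomial.chebyshev.chebmul` multiplies two Chebyshev series, so it returns the $2K+1$ coefficients $\bar{\theta}^{\,d}$ of Cor. 4.6. The small sketch below uses a hypothetical helper name and compares the two sides of the $\ell_1$ bound. A dense grid on $[-1,1]$ stands in for the eigenvalues of $\tilde{L}_d$, which is enough because the bound holds at every point of the interval.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def coefficient_l1_bound(theta_low, theta_high):
    """Return (||theta_bar||_1, ||theta_L||_1 * ||theta_H||_1), where theta_bar
    holds the Chebyshev coefficients of the product f_L * f_H."""
    theta_bar = cheb.chebmul(theta_low, theta_high)
    return np.abs(theta_bar).sum(), np.abs(theta_low).sum() * np.abs(theta_high).sum()


rng = np.random.default_rng(4)
tl, th = rng.standard_normal(6), rng.standard_normal(6)   # K = 5, hypothetical values
lhs, rhs = coefficient_l1_bound(tl, th)
grid = np.linspace(-1.0, 1.0, 2001)
peak = np.max(np.abs(cheb.chebval(grid, tl) * cheb.chebval(grid, th)))
```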

##### Stability of the full update rule

Recall that HAAM predicts, for each dimension $d$, the score matrix $S_{d}=\hat{L}_{d}\,\hat{Y}_{0}\,H_{d}\in\mathbb{R}^{N\times C}$ and then applies a *row-wise* softmax (Eq. ([5](https://arxiv.org/html/2605.12699#S4.E5))) to obtain $\hat{Y}_{d}=\mathrm{softmax}(S_{d})$. Using the inequalities $\|AB\|_{F}\leq\|A\|_{2}\|B\|_{F}$ and $\|BC\|_{F}\leq\|B\|_{F}\|C\|_{2}$, we obtain the bound:

$$
\|S_{d}\|_{F}\leq\|\hat{L}_{d}\|_{2}\,\|\hat{Y}_{0}\|_{F}\,\|H_{d}\|_{2}.\tag{9}
$$

Combining Eq. ([9](https://arxiv.org/html/2605.12699#S4.E9)) with Cor. [4.6](https://arxiv.org/html/2605.12699#S4.Thmtheorem6) shows that the pre-softmax logits remain bounded whenever the $\ell_{1}$ norms of the coefficients and $\|H_{d}\|_{2}$ are controlled.
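A minimal sketch of Eq. (9), with random matrices standing in (hypothetically) for $\hat{L}_d$, $\hat{Y}_0$, and $H_d$. In NumPy, `numpy.linalg.norm(M, 2)` returns the largest singular value of a matrix and `numpy.linalg.norm(M, "fro")` returns its Frobenius norm.

```python
import numpy as np


def score_matrix_and_bound(L_hat, Y0, H):
    """Return S_d = L_hat @ Y0 @ H and the right-hand side of Eq. (9)."""
    S = L_hat @ Y0 @ H
    bound = np.linalg.norm(L_hat, 2) * np.linalg.norm(Y0, "fro") * np.linalg.norm(H, 2)
    return S, bound


rng = np.random.default_rng(10)
S, bound = score_matrix_and_bound(rng.standard_normal((9, 9)),   # stand-in for L_hat
                                  rng.random((9, 3)),            # stand-in for Y0
                                  rng.standard_normal((3, 3)))   # stand-in for H_d
```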

###### Proposition 4.7 (Stability of the row-wise softmax. Proof in Appendix [H](https://arxiv.org/html/2605.12699#A8)).

Let $S,S'\in\mathbb{R}^{N\times C}$ and define $Y=\mathrm{softmax}(S)$ and $Y'=\mathrm{softmax}(S')$, where $\mathrm{softmax}(\cdot)$ is applied row-wise. Then

$$
\|Y\|_{F}\leq\sqrt{N}\quad\text{and}\quad\|Y-Y'\|_{F}\leq\frac{1}{2}\,\|S-S'\|_{F}.
$$

As an immediate consequence, small perturbations of the score matrix translate into controlled changes in the predicted probabilities. In particular, for two initial predictions $\hat{Y}_{0}$ and $\hat{Y}'_{0}$, letting $S_{d}=\hat{L}_{d}\hat{Y}_{0}H_{d}$ and $S'_{d}=\hat{L}_{d}\hat{Y}'_{0}H_{d}$, we obtain:

$$
\|\hat{Y}_{d}-\hat{Y}'_{d}\|_{F}=\|\mathrm{softmax}(S_{d})-\mathrm{softmax}(S'_{d})\|_{F}\leq\frac{1}{2}\,\|\hat{L}_{d}\|_{2}\,\|H_{d}\|_{2}\,\|\hat{Y}_{0}-\hat{Y}'_{0}\|_{F}.
$$
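The row-wise softmax and the two bounds of Prop. 4.7 can be checked in a few lines. The constant $1/2$ cannot be improved. For two classes with equal logits, the softmax Jacobian $\mathrm{diag}(p)-pp^{\top}$ has spectral norm exactly $1/2$, and a small symmetric perturbation approaches that ratio. The random score matrices are hypothetical.

```python
import numpy as np


def row_softmax(S: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting each row's max avoids overflow in exp."""
    Z = S - S.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


rng = np.random.default_rng(5)
S1 = 3.0 * rng.standard_normal((50, 4))
S2 = 3.0 * rng.standard_normal((50, 4))
Y1, Y2 = row_softmax(S1), row_softmax(S2)

# Near-worst case for the Lipschitz constant: two classes at equal logits.
eps = 1e-6
diff = row_softmax(np.array([[eps, -eps]])) - row_softmax(np.zeros((1, 2)))
ratio = np.linalg.norm(diff) / np.linalg.norm([eps, -eps])
```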

##### Why the product is robust to high\-frequency noise

Applying Prop. [4.1](https://arxiv.org/html/2605.12699#S4.Thmtheorem1) to $L=\tilde{L}_{d}$, the composed spectral response satisfies $f_{d}(\tilde{\lambda})=f_{d}^{\mathcal{L}}(\tilde{\lambda})\,f_{d}^{\mathcal{H}}(\tilde{\lambda})$. Hence, a frequency component keeps a large gain only if neither branch attenuates it. Whenever either branch assigns it a small gain, so does the product, which yields a soft gating effect. The following result formalizes this bandwise attenuation.

###### Proposition 4.8 (Bandwise noise attenuation of the product. Proof in Appendix [I](https://arxiv.org/html/2605.12699#A9)).

Let $\tilde{L}_{d}=U_{d}\tilde{\Lambda}_{d}U_{d}^{\top}$ with $\tilde{\Lambda}_{d}=\mathrm{diag}(\tilde{\lambda}_{d}^{(1)},\dots,\tilde{\lambda}_{d}^{(N)})$ and $\tilde{\lambda}_{d}^{(i)}\in[-1,1]$ for all $i$. For any index set $\Omega\subseteq\{1,\dots,N\}$, define the spectral projector $P_{\Omega,d}=U_{d}\,\mathrm{diag}(\mathbf{1}_{i\in\Omega})\,U_{d}^{\top}$. Let $f_{d}^{\mathcal{L}}(\tilde{L}_{d})$ and $f_{d}^{\mathcal{H}}(\tilde{L}_{d})$ be two spectral filters and define $f_{d}(\tilde{L}_{d})=f_{d}^{\mathcal{L}}(\tilde{L}_{d})\,f_{d}^{\mathcal{H}}(\tilde{L}_{d})$. Then for any $x\in\mathbb{R}^{N}$, we have:

$$
\|P_{\Omega,d}\,f_{d}(\tilde{L}_{d})\,x\|_{2}\leq\Big(\max_{i\in\Omega}\big|f_{d}^{\mathcal{L}}(\tilde{\lambda}_{d}^{(i)})\,f_{d}^{\mathcal{H}}(\tilde{\lambda}_{d}^{(i)})\big|\Big)\,\|P_{\Omega,d}\,x\|_{2}.
$$

In particular, if $f_{d}^{\mathcal{L}}$ is low-pass so that $\max_{i\in\Omega_{\mathrm{high}}}|f_{d}^{\mathcal{L}}(\tilde{\lambda}_{d}^{(i)})|\leq\varepsilon_{\mathrm{high}}$ on a high-frequency band $\Omega_{\mathrm{high}}$, then

$$
\max_{i\in\Omega_{\mathrm{high}}}\big|f_{d}^{\mathcal{L}}(\tilde{\lambda}_{d}^{(i)})\,f_{d}^{\mathcal{H}}(\tilde{\lambda}_{d}^{(i)})\big|\leq\varepsilon_{\mathrm{high}}\cdot\max_{i\in\Omega_{\mathrm{high}}}|f_{d}^{\mathcal{H}}(\tilde{\lambda}_{d}^{(i)})|.
$$

Prop. [4.8](https://arxiv.org/html/2605.12699#S4.Thmtheorem8) shows that the product cannot arbitrarily amplify high-frequency noise when the low-pass branch attenuates that band. Compared with the additive fusion $f_{d,\mathrm{sum}}(\tilde{\lambda})=\delta\,f_{d}^{\mathcal{L}}(\tilde{\lambda})+(1-\delta)\,f_{d}^{\mathcal{H}}(\tilde{\lambda})$, the product $f_{d}(\tilde{\lambda})=f_{d}^{\mathcal{L}}(\tilde{\lambda})\,f_{d}^{\mathcal{H}}(\tilde{\lambda})$ is conservative. If either branch suppresses a frequency, the composed response also suppresses it, whereas the additive fusion still passes that frequency with a gain proportional to the other branch. This provides a principled explanation for the empirical robustness of the product operation compared with the sum or weighted-sum variants reported in the ablation study (Sec. [5.5](https://arxiv.org/html/2605.12699#S5.SS5)).
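The following toy illustration of Prop. 4.8 assumes hypothetical scalar responses $f^{\mathcal{L}}(\tilde{\lambda})=(1-\tilde{\lambda})/2$ and $f^{\mathcal{H}}(\tilde{\lambda})=1+\tilde{\lambda}$ on $[-1,1]$, a high-frequency band $\tilde{\lambda}>0.75$, and an equal-weight additive fusion ($\delta=0.5$). Its second part checks the projector inequality on a random orthonormal eigenbasis.

```python
import numpy as np

# Part 1: gains on a hypothetical high-frequency band.
lam = np.linspace(-1.0, 1.0, 201)          # rescaled eigenvalues in [-1, 1]
f_low = (1.0 - lam) / 2.0                  # hypothetical low-pass response
f_high = 1.0 + lam                         # hypothetical high-pass response
band = lam > 0.75                          # hypothetical band Omega_high
eps_high = np.abs(f_low[band]).max()       # leakage of the low-pass branch

gain_product = np.abs(f_low * f_high)[band].max()
delta = 0.5                                # hypothetical fusion weight
gain_sum = np.abs(delta * f_low + (1 - delta) * f_high)[band].max()

# Part 2: projector inequality with a random eigenbasis.
rng = np.random.default_rng(6)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
ev = np.linspace(-1.0, 1.0, 6)
omega = ev > 0.5                                   # Omega: the two highest frequencies
P = U @ np.diag(omega.astype(float)) @ U.T         # spectral projector P_Omega
g = ((1.0 - ev) / 2.0) * (1.0 + ev)                # composed response f_L * f_H
F = U @ np.diag(g) @ U.T
x = rng.standard_normal(6)
```

In this example the product's largest gain on the band is about $0.21$, while the additive fusion still reaches a gain of $1$ at $\tilde{\lambda}=1$.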

### 4.4 Compatibility Matrices

The dimension-specific compatibility matrix $H_{d}$ captures the likelihood of connections between nodes of each pair of classes in dimension $d$ [[45](https://arxiv.org/html/2605.12699#bib.bib73)]. We define $H_{d}$ as a learnable $C\times C$ matrix, which is initialized empirically from the training labels. Let $I_{c}^{\text{train}}$ denote the set of indices of training nodes belonging to class $c$. For every pair of classes $c_{1}$ and $c_{2}$, we initialize $(H_{d})_{c_{1}c_{2}}$ with:

$$
(H_{d})_{c_{1}c_{2}}=\frac{\sum_{i\in I_{c_{1}}^{\text{train}},\ j\in I_{c_{2}}^{\text{train}}}(A_{d})_{ij}}{\sum_{i,j=1}^{N}(A_{d})_{ij}}.\tag{10}
$$

The initial weight $(H_{d})_{c_{1}c_{2}}$ is the proportion of edges in dimension $d$ that link a training node of class $c_{1}$ with a training node of class $c_{2}$. During training, $H_{d}$ is composed with $\hat{Y}_{0}$ and $\hat{L}_{d}$ as indicated in Eq. ([5](https://arxiv.org/html/2605.12699#S4.E5)) and is refined by backpropagating the loss, thereby adapting the model to the prediction task. Importantly, to avoid violating the semi-supervised learning paradigm, only training labels are used to estimate $H_{d}$.

The compatibility matrix $H_{d}$ serves as a mechanism to integrate and adjust the level of homophily or heterophily in the update rule of graph neural networks, in contrast to traditional methods that typically initialize weights from a normal distribution. The initialization in Eq. ([10](https://arxiv.org/html/2605.12699#S4.E10)) helps account for the varying levels of homophily across the dimensions of the multiplex graph. Specifically, the overall homophily level within dimension $d$ (i.e., $h_{d}$) can be estimated by averaging the diagonal elements of $H_{d}$.
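The following dense-matrix sketch implements the initialization in Eq. (10); the function name is hypothetical. It reads labels only at `train_idx`, so unlabeled nodes never leak into $H_d$. A large-scale version would more likely accumulate over a sparse edge list, which is an assumption not stated in the text.

```python
import numpy as np


def init_compatibility(A: np.ndarray, labels: np.ndarray, train_idx: np.ndarray, C: int) -> np.ndarray:
    """Eq. (10): (H)_{c1 c2} = sum of A_ij over training nodes i of class c1 and
    training nodes j of class c2, divided by the total edge weight sum_{i,j} A_ij."""
    H = np.zeros((C, C))
    total = A.sum()
    train_labels = labels[train_idx]              # only training labels are read
    for c1 in range(C):
        I1 = train_idx[train_labels == c1]
        for c2 in range(C):
            I2 = train_idx[train_labels == c2]
            H[c1, c2] = A[np.ix_(I1, I2)].sum() / total
    return H


# Hypothetical path graph 0-1-2-3; node 2 is not a training node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = init_compatibility(A, labels=np.array([0, 0, 1, 1]), train_idx=np.array([0, 1, 3]), C=2)
```

Here, only the edge between training nodes 0 and 1 contributes (counted in both directions of the symmetric adjacency), giving $(H)_{00}=2/6$. The label stored for node 2 is never read.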

### 4.5 Sparse Consensus Labels

The matrix $\hat{Y}_{0}=\mathrm{MLP}(X)\in\mathbb{R}^{N\times C}$ defines the initial label predictions. For each dimension $d$, we form the dimension-specific score matrix:

$$
S_{d}=\hat{L}_{d}\,\hat{Y}_{0}\,H_{d}\in\mathbb{R}^{N\times C}.\tag{11}
$$

To obtain a valid class-probability vector for each node, we apply a row-wise softmax. The matrix $\hat{Y}_{d}\in\mathbb{R}^{N\times C}$ represents the dimension-specific label predictions defined in Eq. ([5](https://arxiv.org/html/2605.12699#S4.E5)). For each node $v_{i}$, the entries of the prediction $(\hat{Y}_{d})_{i:}$ are:

$$
(\hat{Y}_{d})_{ic}=\big(\mathrm{softmax}((S_{d})_{i:})\big)_{c}=\frac{\exp\big((S_{d})_{ic}\big)}{\sum_{c'=1}^{C}\exp\big((S_{d})_{ic'}\big)}.\tag{12}
$$

Accordingly, the per-node cross-entropy loss on dimension $d$ is:

$$
\ell\big((S_{d})_{i:},Y_{i:}\big)=-\sum_{c=1}^{C}Y_{ic}\,\log(\hat{Y}_{d})_{ic},\tag{13}
$$

where $Y_{i:}\in\{0,1\}^{C}$ denotes the one-hot ground-truth label vector of node $v_{i}$.

We train HAAM in a semi-supervised setting, aligning the ground-truth and predicted labels in each dimension. Let $I^{\text{train}}$ denote the set of indices of labeled training nodes. The minimized loss function is an $\ell_{2}$-regularized categorical cross-entropy loss:

$$
\mathcal{J}=\sum_{i\in I^{\text{train}}}\sum_{d=1}^{D}\ell\big((S_{d})_{i:},Y_{i:}\big)+\alpha\sum_{w\in\mathbb{W}}\|w\|_{2}^{2},\tag{14}
$$

where $\alpha$ is a balancing hyperparameter and $\mathbb{W}$ is the set of trainable parameters of the multilayer perceptron.
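The following is a minimal sketch of Eqs. (12) to (14). The score matrices $S_d$ are passed in directly, and `haam_loss` is a hypothetical name. Because $\ell$ in Eq. (13) already carries the minus sign and is non-negative, the sketch adds it to the regularizer. Which arrays make up $\mathbb{W}$ (for example, whether biases are included) is an assumption left to the caller.

```python
import numpy as np


def log_softmax(S: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, computed stably by subtracting each row's max."""
    Z = S - S.max(axis=1, keepdims=True)
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))


def haam_loss(scores, Y_onehot, train_idx, mlp_weights, alpha):
    """Sum over dimensions d and training nodes i of l((S_d)_i, Y_i), plus
    alpha * sum of squared l2 norms of the given MLP weight arrays.
    `scores` is a list of D score matrices S_d, each of shape N x C."""
    ce = 0.0
    for S in scores:
        logp = log_softmax(S)[train_idx]
        ce += -(Y_onehot[train_idx] * logp).sum()
    reg = sum(np.sum(w ** 2) for w in mlp_weights)
    return ce + alpha * reg


# Uniform logits: each training node contributes log(C) per dimension.
Y_onehot = np.eye(4)[[0, 1, 2, 3, 0]]                  # N = 5 nodes, C = 4 classes
value = haam_loss([np.zeros((5, 4))] * 2, Y_onehot, np.array([0, 2, 4]),
                  [np.ones((2, 2))], alpha=0.1)
```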

The cross-entropy loss models dimension-specific class information in $\hat{Y}_{d}$, which may lead to different predicted labels across dimensions. To find the sparse consensus predictions $\hat{Y}$, we solve the optimization problem in Eq. ([15](https://arxiv.org/html/2605.12699#S4.E15)) once training with Eq. ([14](https://arxiv.org/html/2605.12699#S4.E14)) is complete.

$$
\underset{\hat{Y}\in\mathbb{R}^{N\times C}}{\operatorname{argmin}}\;\sum_{d=1}^{D}\|\hat{Y}-\hat{Y}_{d}\|_{F}^{2}+\beta\,\|\hat{Y}\|_{1}.\tag{15}
$$

The first term in Eq. ([15](https://arxiv.org/html/2605.12699#S4.E15)) is a sum of squared Frobenius norms that pulls the consensus prediction $\hat{Y}$ toward the dimension-specific predictions $\hat{Y}_{d}$. The second term is an entrywise $\ell_{1}$-norm regularization that induces sparsity in $\hat{Y}$: small class scores are driven to zero, promoting solutions in which nodes are assigned to classes with high likelihood. The coefficient $\beta$ is a hyperparameter balancing the two terms.

The objective in Eq. ([15](https://arxiv.org/html/2605.12699#S4.E15)) is convex. It is the sum of a smooth convex term (the squared Frobenius distances) and a non-smooth convex term (the $\ell_{1}$-norm regularization). We therefore adopt a proximal-gradient method to handle the non-smooth regularization. At each iteration, the consensus predictions $\hat{Y}$ are updated as follows:

$$
\hat{Y}^{(i+1)}=\operatorname{prox}_{\beta t}\Big(\hat{Y}^{(i)}-2t\sum_{d=1}^{D}\big(\hat{Y}^{(i)}-\hat{Y}_{d}\big)\Big),\tag{16}
$$

where $\hat{Y}^{(i)}$ denotes the predictions at iteration $i$ and $t=\frac{1}{4D}$ is the step size. The gradient $2\sum_{d}(\hat{Y}-\hat{Y}_{d})$ of the smooth term is $2D$-Lipschitz, so this step size lies below the standard limit $1/(2D)$ for proximal-gradient convergence. The proximal operator, applied entrywise, is:

$$
\operatorname{prox}_{\lambda}(v)=\operatorname{sign}(v)\max\left(|v|-\lambda,0\right).\tag{17}
$$
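A minimal NumPy sketch of Eqs. (16) and (17), starting from $\hat{Y}^{(0)}=\mathbf{0}$; the function names are hypothetical. The smooth term equals $D\|\hat{Y}-\bar{Y}\|_F^2$ up to a constant, where $\bar{Y}$ is the mean of the $\hat{Y}_d$. Therefore the minimizer of Eq. (15) is also available in closed form as entrywise soft-thresholding of $\bar{Y}$ at $\beta/(2D)$, which makes a convenient reference for checking the iterations. With $t=\frac{1}{4D}$, each update halves the distance to that solution.

```python
import numpy as np


def soft_threshold(V: np.ndarray, lam: float) -> np.ndarray:
    """Eq. (17), entrywise: sign(v) * max(|v| - lam, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)


def sparse_consensus(Y_dims, beta: float, n_iter: int = 100) -> np.ndarray:
    """Proximal-gradient iterations of Eq. (16) with step t = 1/(4D)."""
    D = len(Y_dims)
    t = 1.0 / (4 * D)
    Y = np.zeros_like(Y_dims[0])
    total = sum(Y_dims)
    for _ in range(n_iter):
        grad = 2.0 * (D * Y - total)          # gradient of sum_d ||Y - Y_d||_F^2
        Y = soft_threshold(Y - t * grad, beta * t)
    return Y


rng = np.random.default_rng(7)
Y_dims = [rng.dirichlet(np.ones(3), size=4) for _ in range(3)]   # D = 3, N = 4, C = 3
beta = 0.6
Y_hat = sparse_consensus(Y_dims, beta)
closed_form = soft_threshold(sum(Y_dims) / 3, beta / (2 * 3))
```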
**Algorithm 1** HAAM

**Input:** multiplex graph $G$, feature matrix $X$, indices $I_{c}^{\text{train}}$ of training nodes for every class $c$, filter degree $K$, numbers of iterations $T_{1}$ and $T_{2}$.

**Output:** sparse consensus label predictions $\hat{Y}$.

1. **for** $d\leftarrow 1$ **to** $D$ **do**
2. $\quad L_{d}\leftarrow I-\Delta_{d}^{-\frac{1}{2}}(A_{d}+I)\,\Delta_{d}^{-\frac{1}{2}}$
3. $\quad \tilde{L}_{d}\leftarrow 2\,L_{d}/\lambda_{d}^{\max}-I$
4. $\quad$ **for** $c_{1},c_{2}\in\{1,\dots,C\}$ **do**
5. $\qquad (H_{d})_{c_{1}c_{2}}\leftarrow\frac{\sum_{i\in I_{c_{1}}^{\text{train}},\ j\in I_{c_{2}}^{\text{train}}}(A_{d})_{ij}}{\sum_{i,j=1}^{N}(A_{d})_{ij}}$
6. $\quad$ **end for**
7. **end for**
8. **for** $\mathit{epoch}\leftarrow 1$ **to** $T_{1}$ **do**
9. $\quad \hat{Y}_{0}\leftarrow\mathrm{MLP}(X)$
10. $\quad$ **for** $d\leftarrow 1$ **to** $D$ **do**
11. $\qquad \gamma_{i}^{\mathcal{L},d}\leftarrow\gamma_{0}^{d}-\sum_{j=1}^{i}\gamma_{j}^{d},\quad\forall i\in\{1,\dots,K\}$
12. $\qquad \gamma_{i}^{\mathcal{H},d}\leftarrow\sum_{j=0}^{i}\gamma_{j}^{d},\quad\forall i\in\{1,\dots,K\}$
13. $\qquad$ **for** $k\leftarrow 0$ **to** $K$ **do**
14. $\qquad\quad \theta_{k}^{\mathcal{L},d}\leftarrow\frac{2}{K+1}\sum_{j=0}^{K}\gamma_{j}^{\mathcal{L},d}\,T_{k}\big(\cos\big(\frac{j+1/2}{K+1}\pi\big)\big)$
15. $\qquad\quad \theta_{k}^{\mathcal{H},d}\leftarrow\frac{2}{K+1}\sum_{j=0}^{K}\gamma_{j}^{\mathcal{H},d}\,T_{k}\big(\cos\big(\frac{j+1/2}{K+1}\pi\big)\big)$
16. $\qquad$ **end for**
17. $\qquad \hat{L}_{d}\leftarrow\frac{1}{2}\sum_{i=0}^{K}\sum_{j=0}^{K}\theta_{i}^{\mathcal{L},d}\,\theta_{j}^{\mathcal{H},d}\big[T_{i+j}(\tilde{L}_{d})+T_{|i-j|}(\tilde{L}_{d})\big]$
18. $\qquad \hat{Y}_{d}\leftarrow\mathrm{softmax}\big(\hat{L}_{d}\,\hat{Y}_{0}\,H_{d}\big)$ (row-wise)
19. $\quad$ **end for**
20. $\quad$ Update the parameters $\gamma^{d}$, $H_{d}$, and the MLP weights via gradient descent to minimize the $\ell_{2}$-regularized categorical cross-entropy loss $\mathcal{J}$.
21. **end for**
22. $\hat{Y}^{(0)}\leftarrow\mathbf{0}$
23. **for** $i\leftarrow 0$ **to** $T_{2}-1$ **do**
24. $\quad \hat{Y}^{(i+1)}\leftarrow\operatorname{prox}_{\beta t}\big(\hat{Y}^{(i)}-2\,t\sum_{d=1}^{D}(\hat{Y}^{(i)}-\hat{Y}_{d})\big)$
25. **end for**
26. **return** $\hat{Y}\leftarrow\hat{Y}^{(T_{2})}$

### 4.6 Algorithm & Complexity

Algorithm [1](https://arxiv.org/html/2605.12699#alg1) summarizes the proposed approach HAAM. The time complexity of the proposed model is $\mathcal{O}\big(DC^{2}\mathcal{E}+T\big(FMC+D(K^{2}\mathcal{E}+NC\mathcal{E})\big)\big)$, where $T$ is the number of iterations, $N$ is the number of nodes, $D$ is the number of dimensions, $C$ is the number of classes, $K$ is the degree of the polynomial filters $f_{d}^{\mathcal{L}}(\tilde{L}_{d})$ and $f_{d}^{\mathcal{H}}(\tilde{L}_{d})$, $M$ is the size of embeddings, $F$ is the size of the features, and $\mathcal{E}$ is the maximum number of edges over all dimensions. The memory complexity is $\mathcal{O}\big(D(K^{2}\mathcal{E}+C^{2}+NC)+\eta\big)$. Both complexities are linear with respect to $N$ and $\mathcal{E}$.

### 4.7 Generalization Analysis

We now provide a statistical generalization bound for HAAM that relates the expected classification risk to the empirical training risk and shows how the gap between them depends on the operator norms of the dimension-specific propagation matrices $\hat{L}_{d}$ and compatibility matrices $H_{d}$.

##### Risk definitions

Let $I^{\mathrm{train}}\subseteq\{1,\dots,N\}$ be the index set of labeled training nodes and let $n:=|I^{\mathrm{train}}|$. We define the empirical training risk averaged across dimensions as:

$$
\widehat{\mathcal{R}}=\frac{1}{D\,n}\sum_{d=1}^{D}\sum_{i\in I^{\mathrm{train}}}\ell\big((S_{d})_{i:},Y_{i:}\big).\tag{18}
$$

For the population risk, we adopt the standard learning-theoretic abstraction in which training examples $(x,y)$ are drawn i.i.d. from an unknown distribution $\mathcal{D}$ over node features and labels. (This abstraction is commonly used to quantify how empirical performance translates to expected performance. In the semi-supervised node classification protocol used in Sec. 5, the labeled training nodes $I^{\mathrm{train}}$ can be viewed as a random labeled sample from the underlying node population.) Let $\mathcal{R}$ denote the expected risk averaged across dimensions:

$$
\mathcal{R}=\frac{1}{D}\sum_{d=1}^{D}\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\ell\big(s_{d}(x),y\big)\Big],\tag{19}
$$

where $s_{d}(x)\in\mathbb{R}^{C}$ denotes the dimension-$d$ score vector produced by the HAAM pipeline for an input feature vector $x$, consistent with the matrix in Eq. ([11](https://arxiv.org/html/2605.12699#S4.E11)).

##### Vector\-valued Rademacher complexity

Let $\mathcal{F}_{0}$ denote the class of functions implemented by the MLP mapping $x\mapsto\hat{y}_{0}(x)\in\mathbb{R}^{C}$ (i.e., the row-wise outputs of $\hat{Y}_{0}$). For a labeled sample $S=\{(x_{i},y_{i})\}_{i=1}^{n}$, define the empirical Rademacher complexity of $\mathcal{F}_{0}$ by:

$$
\mathfrak{R}_{n}(\mathcal{F}_{0})=\frac{1}{n}\,\mathbb{E}_{\sigma}\Bigg[\sup_{f\in\mathcal{F}_{0}}\sum_{i=1}^{n}\sum_{c=1}^{C}\sigma_{ic}\,f_{c}(x_{i})\Bigg],\tag{20}
$$

where $\{\sigma_{ic}\}$ are i.i.d. Rademacher random variables taking values in $\{-1,+1\}$.
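Eq. (20) rarely has a closed form for an MLP, but it can be estimated by Monte Carlo for simple stand-in classes. The sketch below assumes the hypothetical linear class $\mathcal{F}_0=\{x\mapsto Wx:\|W\|_F\leq B\}$, not HAAM's MLP. For that class, the Cauchy-Schwarz inequality gives the supremum as $B\,\|\sigma^{\top}X\|_F$. Jensen's inequality then bounds the complexity by $B\sqrt{C\sum_i\|x_i\|_2^2}/n$.

```python
import numpy as np


def rademacher_linear(X: np.ndarray, C: int, B: float, n_draws: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of Eq. (20) for the hypothetical class
    {x -> W x : ||W||_F <= B}, whose supremum equals B * ||sigma^T X||_F."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=(n, C))      # i.i.d. Rademacher signs
        total += B * np.linalg.norm(sigma.T @ X, "fro")  # sup over the class
    return total / n_draws / n


X = np.random.default_rng(9).standard_normal((40, 6))    # n = 40 samples, 6 features
estimate = rademacher_linear(X, C=3, B=2.0)
jensen_bound = 2.0 * np.sqrt(3 * np.sum(X ** 2)) / 40
```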

###### Lemma 1 (Lipschitzness and boundedness of softmax cross-entropy. Proof in Appendix [J](https://arxiv.org/html/2605.12699#A10)).

Let $\ell(\cdot,y)$ be the softmax cross-entropy in Eq. ([13](https://arxiv.org/html/2605.12699#S4.E13)). Then, for any fixed one-hot label vector $y\in\{0,1\}^{C}$, the map $z\mapsto\ell(z,y)$ is $\sqrt{2}$-Lipschitz with respect to the Euclidean norm $\|\cdot\|_{2}$. Moreover, if $\|z\|_{\infty}\leq B_{z}$, then $\ell(z,y)\leq\log(C)+2B_{z}$.
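Both parts of Lemma 1 can be probed numerically. The gradient of $\ell(\cdot,y)$ with respect to the logits is $\mathrm{softmax}(z)-y$, whose Euclidean norm never exceeds $\sqrt{2}$. The random logits below are hypothetical test inputs.

```python
import numpy as np


def cross_entropy(z: np.ndarray, y: np.ndarray) -> float:
    """Softmax cross-entropy l(z, y) = -z_y + log sum_c exp(z_c) for one-hot y."""
    m = z.max()
    return float(-(z @ y) + m + np.log(np.exp(z - m).sum()))


def ce_grad(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient with respect to the logits: softmax(z) - y."""
    p = np.exp(z - z.max())
    return p / p.sum() - y


rng = np.random.default_rng(8)
C = 5
y = np.eye(C)[2]
samples = [rng.uniform(-4.0, 4.0, C) for _ in range(200)]
max_grad_norm = max(np.linalg.norm(ce_grad(z, y)) for z in samples)
```

The constant is nearly attained when another class dominates the true one, since then $\mathrm{softmax}(z)-y$ approaches the difference of two standard basis vectors.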

##### Complexity scaling under graph propagation

Define the class of dimension-$d$ score functions (logits) obtained by composing the MLP with the linear propagation operators $\hat{L}_{d}$ and $H_{d}$:

$$
\mathcal{F}_{d}=\Big\{x\mapsto s_{d}(x)\in\mathbb{R}^{C}\;:\;s_{d}(\cdot)\text{ is induced by }S_{d}\text{ in Eq. (11)},\ \hat{Y}_{0}\text{ generated by some }f\in\mathcal{F}_{0}\Big\}.\tag{21}
$$

The next result shows that the Rademacher complexity of $\mathcal{F}_{d}$ is controlled by the operator norms of $\hat{L}_{d}$ and $H_{d}$.

###### Proposition 4.9 (Rademacher complexity under propagation. Proof in Appendix [K](https://arxiv.org/html/2605.12699#A11)).

Fix a dimension $d \in \{1, \dots, D\}$. Assume $\hat{L}_d \in \mathbb{R}^{N\times N}$ and $H_d \in \mathbb{R}^{C\times C}$ are fixed matrices. Then, for any labeled sample of size $n$, the corresponding empirical Rademacher complexity satisfies

$$\mathfrak{R}_n(\mathcal{F}_d) \le \|\hat{L}_d\|_2\,\|H_d\|_2\,\mathfrak{R}_n(\mathcal{F}_0). \tag{22}$$
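Prop. 4.9 says that each dimension multiplies the base complexity by $\|\hat{L}_d\|_2\,\|H_d\|_2$. A minimal sketch for computing these amplification factors, assuming dense matrices stored as nested lists and estimating the largest singular value by power iteration on $A^\top A$ (an illustrative choice; any SVD routine would serve). Function names are hypothetical.

```python
import math
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def spectral_norm(A, iters=200, seed=0):
    """Largest singular value of A via power iteration on A^T A."""
    rng = random.Random(seed)
    At = transpose(A)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # A is (numerically) zero on this direction
        v = [x / norm for x in w]
    Av = matvec(A, v)
    return math.sqrt(sum(x * x for x in Av))

def amplification_factors(L_hats, Hs):
    """||L_hat_d||_2 * ||H_d||_2 for each dimension d, as in Eq. (22)."""
    return [spectral_norm(L) * spectral_norm(H) for L, H in zip(L_hats, Hs)]
```

For instance, `spectral_norm([[1.0, 1.0], [0.0, 1.0]])` returns the golden ratio $\approx 1.618$, the largest singular value of that shear matrix.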

###### Theorem 4.10 (Generalization bound for HAAM. Proof in Appendix [L](https://arxiv.org/html/2605.12699#A12)).

Assume the labeled training nodes form an i.i.d. sample of size $n$ from $\mathcal{D}$. Assume further that, for each dimension $d$, the score vectors are uniformly bounded: $\|s_d(x)\|_\infty \le B_d$ for all $x$. Then, for any $\delta \in (0,1)$, with probability at least $1-\delta$, the following holds simultaneously for the dimension-averaged risk of HAAM:

$$\mathcal{R} \le \widehat{\mathcal{R}} + \frac{2\sqrt{2}}{D}\Big(\sum_{d=1}^{D}\|\hat{L}_d\|_2\,\|H_d\|_2\Big)\,\mathfrak{R}_n(\mathcal{F}_0) + 3\,B_{\max}\sqrt{\frac{\log(2/\delta)}{2n}}, \tag{23}$$

where $B_{\max} := \max_{d\in\{1,\dots,D\}}\big(\log(C) + 2B_d\big)$.
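The right-hand side of Eq. (23) is straightforward to evaluate once its ingredients are known. The helper below (a hypothetical name, not part of any released code) simply transcribes the formula; in practice $\mathfrak{R}_n(\mathcal{F}_0)$ would itself have to be bounded, so the inputs here are placeholders.

```python
import math

def haam_generalization_bound(emp_risk, amp_factors, rad_f0, C, B, n, delta):
    """Right-hand side of Eq. (23).

    amp_factors: list of ||L_hat_d||_2 * ||H_d||_2, one entry per dimension.
    rad_f0:      (a bound on) the empirical Rademacher complexity of F_0.
    B:           list of logit bounds B_d, one entry per dimension.
    """
    D = len(amp_factors)
    complexity = 2 * math.sqrt(2) / D * sum(amp_factors) * rad_f0
    b_max = max(math.log(C) + 2 * b for b in B)
    confidence = 3 * b_max * math.sqrt(math.log(2 / delta) / (2 * n))
    return emp_risk + complexity + confidence
```

Halving the summed amplification factors halves the complexity term, whereas the confidence term shrinks only as $1/\sqrt{n}$.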

##### Interpretation for HAAM

Theorem [4.10](https://arxiv.org/html/2605.12699#S4.Thmtheorem10) shows that the generalization gap is governed by: (i) the base prediction complexity $\mathfrak{R}_n(\mathcal{F}_0)$ of the MLP, (ii) the dimension-specific amplification factors $\|\hat{L}_d\|_2\|H_d\|_2$, and (iii) the sample size $n = |I^{\mathrm{train}}|$. Importantly, by Corollary [4.6](https://arxiv.org/html/2605.12699#S4.Thmtheorem6) (Sec. [4.3](https://arxiv.org/html/2605.12699#S4.SS3)), the composed Chebyshev propagation operator satisfies

$$\|\hat{L}_d\|_2 \le \Big(\sum_{k=0}^{K}|\theta_k^{\mathcal{L},d}|\Big)\Big(\sum_{k=0}^{K}|\theta_k^{\mathcal{H},d}|\Big),$$

which provides explicit control of the generalization term in Eq. ([23](https://arxiv.org/html/2605.12699#S4.E23)) through the learned Chebyshev coefficients.
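The coefficient bound follows from $|T_k(x)| \le 1$ on $[-1, 1]$: each Chebyshev filter satisfies $|f(\lambda)| \le \sum_k |\theta_k|$, and for a symmetric normalized Laplacian the operator norm of a polynomial filter equals the largest $|f(\lambda)|$ over its eigenvalues. The sketch checks the bound on a dense grid, assuming that $\hat{L}_d$ is the product of the two filters applied to the same Laplacian, the common rescaling $x = \lambda - 1$ from $[0, 2]$ to $[-1, 1]$, and random, hypothetical coefficients.

```python
import random

def chebyshev_filter(theta, lam):
    """f(lam) = sum_k theta_k T_k(lam - 1); lam in [0, 2] is mapped to x in [-1, 1]."""
    x = lam - 1.0
    t_prev, t_curr = 1.0, x                    # T_0(x), T_1(x)
    value = theta[0] * t_prev
    for k in range(1, len(theta)):
        value += theta[k] * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev  # T_{k+1} = 2x T_k - T_{k-1}
    return value

rng = random.Random(1)
K = 4
theta_L = [rng.uniform(-1, 1) for _ in range(K + 1)]  # hypothetical low-pass coefficients
theta_H = [rng.uniform(-1, 1) for _ in range(K + 1)]  # hypothetical high-pass coefficients
bound = sum(abs(t) for t in theta_L) * sum(abs(t) for t in theta_H)
grid = [2.0 * i / 1000 for i in range(1001)]          # dense grid on [0, 2]
worst = max(abs(chebyshev_filter(theta_L, lam) * chebyshev_filter(theta_H, lam))
            for lam in grid)
print(f"max |f_H f_L| on grid = {worst:.4f} <= bound = {bound:.4f}")
```

The bound is attained only when all coefficients share the sign pattern of $T_k$ at an endpoint, so it is usually loose; it is nonetheless cheap to compute from the learned coefficients.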

Moreover, the bounded-logit assumption can be connected to Sec. [4.3](https://arxiv.org/html/2605.12699#S4.SS3). By Eq. ([9](https://arxiv.org/html/2605.12699#S4.E9)), $\|z\|_\infty \le \|z\|_2$, and the fact that each $s_d(x)$ is a row of $S_d$, one has $\|s_d(x)\|_\infty \le \|S_d\|_F \le \|\hat{L}_d\|_2\,\|\hat{Y}_0\|_F\,\|H_d\|_2$ whenever the MLP outputs $\hat{Y}_0$ are bounded.
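The chain of inequalities above can be checked numerically on a small random instance. The sketch assumes $S_d = \hat{L}_d \hat{Y}_0 H_d$ (consistent with the bound, but stated here as an assumption), estimates spectral norms by power iteration, and takes $\max_x \|s_d(x)\|_\infty$ to be the largest absolute entry of $S_d$. All names are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Dense product of nested-list matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frobenius(A):
    return math.sqrt(sum(v * v for row in A for v in row))

def spectral_norm(A, iters=300, seed=0):
    """Largest singular value of A via power iteration on A^T A."""
    AtA = matmul([list(c) for c in zip(*A)], A)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(AtA))]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in AtA]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in Av))

rng = random.Random(0)
N, C = 6, 3
L_hat = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
Y0 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
H = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C)]

S = matmul(matmul(L_hat, Y0), H)                    # assumed form of S_d
max_logit = max(abs(v) for row in S for v in row)   # max over x of ||s_d(x)||_inf
fro_S = frobenius(S)
upper = spectral_norm(L_hat) * frobenius(Y0) * spectral_norm(H)
print(f"{max_logit:.3f} <= {fro_S:.3f} <= {upper:.3f}")
```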

## 5 Experiments

We conduct an empirical evaluation to assess the suitability of the proposed approach for modeling heterophily and homophily in node classification on multiplex graphs. We compare HAAM (code available at [this link](https://drive.google.com/drive/folders/1ROhUghYARMRGzyLyBP2wXJH6MxXfu4yf?usp=drive_link)) against state-of-the-art multiplex graph models, namely: GATNE \[[8](https://arxiv.org/html/2605.12699#bib.bib40)\], mGCN \[[24](https://arxiv.org/html/2605.12699#bib.bib41)\], SSDCM \[[25](https://arxiv.org/html/2605.12699#bib.bib48)\], DMGI \[[33](https://arxiv.org/html/2605.12699#bib.bib43)\], HDMI \[[19](https://arxiv.org/html/2605.12699#bib.bib42)\], MGDCR \[[26](https://arxiv.org/html/2605.12699#bib.bib51)\], DMG \[[27](https://arxiv.org/html/2605.12699#bib.bib50)\], X-GOAL \[[18](https://arxiv.org/html/2605.12699#bib.bib49)\], HMGE \[[2](https://arxiv.org/html/2605.12699#bib.bib3)\], and InfoMGF \[[36](https://arxiv.org/html/2605.12699#bib.bib75)\]. In addition, we include recent unidimensional graph representation learning baselines that can handle heterophily, namely PolyGCL \[[9](https://arxiv.org/html/2605.12699#bib.bib70)\] and TFE-GNN \[[12](https://arxiv.org/html/2605.12699#bib.bib89)\]. We adapt these methods to the multiplex setting following the common practice of aggregating all multiplex dimensions into one unified adjacency matrix $\tilde{A}$, used together with the original feature matrix $X$:

$$\tilde{A} = \frac{1}{D}\sum_{d=1}^{D}(A_d + I). \tag{24}$$
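Eq. (24) in code, as a minimal dense sketch (the function name is hypothetical). The example makes one consequence visible: an edge present in only one of the $D$ dimensions receives weight $1/D$ in $\tilde{A}$, and the information about which relation produced it is lost.

```python
def aggregate_adjacency(adjs):
    """Eq. (24): A_tilde = (1/D) * sum_d (A_d + I), for dense N x N nested lists."""
    D = len(adjs)
    N = len(adjs[0])
    # (1/D) * sum_d (A_d + I) equals (1/D) * sum_d A_d + I.
    return [
        [sum(A[i][j] for A in adjs) / D + (1.0 if i == j else 0.0) for j in range(N)]
        for i in range(N)
    ]

# Two dimensions on three nodes: edge 0-1 in the first, edge 1-2 in the second.
A1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
A2 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
A_tilde = aggregate_adjacency([A1, A2])
```

Here both off-diagonal edges end up with the same weight 0.5, so a single-graph method applied to `A_tilde` cannot tell that they came from different relations.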
Table 1: Data description.

| Dataset | Synthetic | arXiv | Movies | Amazon |
|---|---|---|---|---|
| # Dims | 3 | 2 | 3 | 3 |
| # Nodes | 9,600 | 169,343 | 10,589 | 7,621 |
| # Edges | 343,494 | 7,879,585 | 485,962 | 1,386,799 |
| # Attributes | 100 | 128 | 200 | 2,000 |
| # Classes | 6 | 5 | 4 | 4 |
| $h_d$ | $\{0.1, 0.2, \dots, 0.9\}$ | 0.22 - 0.29 | 0.34 - 0.32 - 0.37 | 0.27 - 0.26 - 0.25 |

### 5.1 Datasets

In this section, we describe the datasets used in the experiments. Table [1](https://arxiv.org/html/2605.12699#S5.T1) summarizes their characteristics, including the homophily ratio $h_d$ of each dimension.

Synthetic datasets: We generate node classification labels and adjacency matrices using a method inspired by \[[45](https://arxiv.org/html/2605.12699#bib.bib73)\]. The generated dimensions have homophily ratios $h_d \in \{0.1, 0.2, \dots, 0.9\}$. We describe the synthetic generation process in Sec. [5.3.1](https://arxiv.org/html/2605.12699#S5.SS3.SSS1).

arXiv: This dataset, inspired by \[[20](https://arxiv.org/html/2605.12699#bib.bib71)\], is based on the OGBN-arXiv network but contains different labels and two dimensions instead of one. The nodes represent papers, and edges encode citation and co-authorship relations between papers. The node features are derived from the word2vec features of titles and abstracts. The class labels correspond to the publication year of the paper.

Movies: This dataset is extracted from the film-director-actor-writer network in \[[39](https://arxiv.org/html/2605.12699#bib.bib86)\]. The nodes represent movies, and edges connect movies that share directors, actors, or writers. Node features are token count vectorizations of the movie descriptions. The class labels correspond to the year the movie was released.

Amazon: This dataset \[[14](https://arxiv.org/html/2605.12699#bib.bib77)\] consists of Amazon items, with bag-of-words features of the item descriptions. There are three types of relations between items: also viewed, also bought, and bought together. The classes are the items' categories (e.g., beauty and baby products).
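This section does not restate how the per-dimension ratio $h_d$ in Table 1 is computed. A common choice, assumed here, is the edge homophily ratio: the fraction of edges in dimension $d$ whose endpoints share a class label. A minimal sketch with hypothetical helper names:

```python
def edge_homophily(edges, labels):
    """Fraction of edges (u, v) whose endpoints share a class label."""
    if not edges:
        return float("nan")  # undefined for an empty dimension
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def homophily_per_dimension(multiplex_edges, labels):
    """One ratio h_d per dimension, given one edge list per dimension."""
    return [edge_homophily(edges, labels) for edges in multiplex_edges]
```

Under this definition, values near 1 indicate a homophilic dimension and values near 0 a heterophilic one; with $C$ balanced classes and random edges, the ratio is about $1/C$.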

### 5.2 Evaluation Protocol & Parameter Settings

We evaluate HAAM and all baseline methods on the task of node classification over both synthetic and real\-world multiplex graph datasets\. For unsupervised representation learning methods, including HDMI, HMGE, GATNE, DMG, X\-GOAL, InfoMGF, and PolyGCL, we first learn node embeddings without using labels and then train a logistic regression classifier on top of these embeddings to predict node labels\. For supervised or semi\-supervised methods, including DMGI, SSDCM, MGDCR, mGCN, TFE\-GNN, and HAAM, class predictions are obtained directly from the model outputs\. All experiments are performed using the same training/validation/test splits\.

We report F1-Macro and F1-Micro scores by comparing predicted labels against the ground-truth labels. Each experiment is repeated five times. For HAAM, we report the mean and standard deviation across runs, whereas for the baseline methods we report the best result among five runs. For HAAM, the embedding dimension is fixed to 64. The degree $K$ of the Chebyshev polynomial filters is set to 5, 4, 2, and 3 for the synthetic datasets, arXiv, Movies, and Amazon, respectively. We optimize the model using the Adam optimizer with a learning rate of 0.001 and an $\ell_2$ weight decay coefficient $\alpha = 10^{-5}$. Training runs for at most 1,000 epochs, with early stopping if the validation performance does not improve for 100 consecutive epochs. The $\ell_1$ regularization parameter $\beta$ is set to 1.0, and we optimize the loss in Eq. ([15](https://arxiv.org/html/2605.12699#S4.E15)).
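For reference, a minimal implementation of the two reported metrics for single-label multi-class prediction (the helper name is hypothetical). Macro-F1 averages per-class F1 over the classes appearing in either the labels or the predictions (the default convention in common libraries such as scikit-learn), while micro-F1 pools the counts over classes. In this single-label setting micro-F1 equals accuracy, since every error is counted once as a false positive and once as a false negative.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (F1-Macro, F1-Micro) for non-empty single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); the denominator is positive for
    # every class that appears somewhere in y_true or y_pred.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    macro = sum(per_class) / len(per_class)
    n_tp, n_fp, n_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * n_tp / (2 * n_tp + n_fp + n_fn)
    return macro, micro
```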

### 5.3 Experiments on Synthetic Datasets

We first evaluate HAAM against baseline methods on node classification tasks using synthetic datasets. The experiments are divided into two parts: (i) constant homophily ratios and (ii) variable homophily ratios across the dimensions of the multiplex graph. Before presenting them, we describe the synthetic data generation process.

#### 5.3.1 Synthetic Data Generation

We generate node classification labels and adjacency matrices using a method inspired by \[[45](https://arxiv.org/html/2605.12699#bib.bib73)\], which extends the Barabási-Albert model with configurable class settings. First, nodes are randomly assigned to $C$ classes, keeping a balanced distribution. Node features are then assigned using the features of the ogbn-products dataset from the Open Graph Benchmark (OGB) \[[16](https://arxiv.org/html/2605.12699#bib.bib76)\], a product co-purchasing graph. For the edges, we initialize a compatibility matrix $\mathcal{B}_d$ that controls the homophily and heterophily settings of each dimension $d$, resulting in an overall homophily ratio $\rho_d$. The diagonal elements of $\mathcal{B}_d$ are set to the same value $\rho_d$, while the off-diagonal elements are set following the approach in \[[3](https://arxiv.org/html/2605.12699#bib.bib87)\]. The matrices $\mathcal{B}_d$ are then used to sample edges: for nodes $v_i$ and $v_j$ with classes $c_{v_i}$ and $c_{v_j}$, the edge $(v_i, v_j)$ is added to dimension $d$ with probability $(\mathcal{B}_d)_{c_{v_i} c_{v_j}}$. This process results in a multiplex graph with $D$ dimensions, each with a homophily ratio equal to $\rho_d$. In our experiments, we generate (1) multiplex graphs where $\rho_d$ is the same for all dimensions and (2) multiplex graphs where $\rho_d$ varies from one dimension to another.
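A simplified sketch of this generation process, under explicit assumptions: the off-diagonal entries of $\mathcal{B}_d$ spread the remaining mass $1-\rho_d$ uniformly (the text instead follows a cited approach), and edges are grown in a Barabási-Albert fashion, where each new node attaches to $m$ earlier nodes with probability proportional to (degree + 1) times the compatibility entry. With balanced classes, the resulting edge homophily is close to $\rho_d$. All names are hypothetical, and node features are omitted.

```python
import random

def compatibility_matrix(num_classes, rho):
    """B_d with diagonal rho; off-diagonal mass (1 - rho) spread uniformly (assumption)."""
    assert 0.0 < rho < 1.0 and num_classes >= 2
    off = (1.0 - rho) / (num_classes - 1)
    return [[rho if a == b else off for b in range(num_classes)] for a in range(num_classes)]

def generate_dimension(labels, B, m, rng):
    """Grow one dimension: each new node i links to min(m, i) distinct earlier nodes j,
    drawn with probability proportional to (deg(j) + 1) * B[c_i][c_j]."""
    degree = [0] * len(labels)
    edges = []
    for i in range(1, len(labels)):
        weights = [(degree[j] + 1) * B[labels[i]][labels[j]] for j in range(i)]
        targets = set()
        while len(targets) < min(m, i):
            targets.add(rng.choices(range(i), weights=weights)[0])
        for j in targets:
            edges.append((j, i))
            degree[i] += 1
            degree[j] += 1
    return edges

def generate_multiplex(n, num_classes, rhos, m=3, seed=0):
    """Balanced random labels plus one edge list per dimension."""
    rng = random.Random(seed)
    labels = [i % num_classes for i in range(n)]
    rng.shuffle(labels)
    dims = [generate_dimension(labels, compatibility_matrix(num_classes, rho), m, rng)
            for rho in rhos]
    return labels, dims

def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)

labels, dims = generate_multiplex(n=600, num_classes=4, rhos=[0.2, 0.8])
print([round(edge_homophily(e, labels), 2) for e in dims])
```

Because same-class targets are weighted by $\rho_d$ and the other $C-1$ classes by $(1-\rho_d)/(C-1)$ each, a new edge is homophilous with probability close to $\rho_d$ whenever degree mass is roughly balanced across classes.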

![Synthetic results with constant homophily](https://arxiv.org/html/2605.12699v1/x1.png) (a) Constant homophily ratios across all dimensions.
![Synthetic results with variable homophily](https://arxiv.org/html/2605.12699v1/x2.png) (b) Variable homophily ratios in the same graph.

Figure 2: Results of node classification on synthetic datasets.

#### 5.3.2 Constant Homophily Ratios

Figure [2(a)](https://arxiv.org/html/2605.12699#S5.F2.sf1) compares multiplex graph methods on datasets with increasing homophily ratios. For each value of $h \in \{0.1, 0.2, \dots, 0.9\}$, we generate a synthetic multiplex graph with $h_d = h$ in all dimensions $d$, following the methodology described in Sec. [5.3.1](https://arxiv.org/html/2605.12699#S5.SS3.SSS1). We then train each method and measure accuracy on the resulting synthetic graphs. The results indicate that HAAM generally achieves higher accuracy than the compared methods across the entire range of $h$. As $h$ increases, the accuracy of HAAM also improves, approaching very high values when $h$ is close to 0.9. Other methods, such as HMGE, X-GOAL, and mGCN, obtain lower performance, particularly at smaller homophily ratios. DMG and MGDCR improve moderately as homophily increases but tend to underperform HAAM, and they struggle more when $h$ is low, which points to their limitations in settings with stronger heterophily. Overall, the results suggest that while some baselines can adapt to higher homophily, HAAM maintains strong performance across the full spectrum of $h$, making it a competitive approach in both low- and high-homophily regimes.

#### 5.3.3 Variable Homophily Ratios

Fig. [2(b)](https://arxiv.org/html/2605.12699#S5.F2.sf2) groups results by the values of $h_d$ for the three dimensions of each graph (0.1-0.3-0.6, 0.3-0.5-0.7, and 0.5-0.7-0.9). We follow the same generation protocol described in Sec. [5.3.1](https://arxiv.org/html/2605.12699#S5.SS3.SSS1), but assign a different value of $h_d$ to each dimension. The objective is to evaluate competing algorithms when $h_d$ varies from one dimension to another. In this setting, HAAM remains competitive and consistent across all ranges, particularly in the lowest homophily setting (0.1-0.3-0.6), where the performance gap is more evident. The combination of adaptive filter products and learnable dimension-specific compatibility matrices provides a flexible modeling approach, allowing HAAM to better capture varying levels of heterophily within the same multiplex graph. Moreover, consensus labels support the integration of information from different dimensions. Overall, HAAM demonstrates robustness across a wide range of homophily levels. In contrast, methods such as DMGI and HMGE tend to perform worse in low-homophily settings, reflecting their limitations in modeling heterophily.

### 5.4 Experiments on Real-World Datasets

Table 2: Results of node classification on real-world datasets.

Table [2](https://arxiv.org/html/2605.12699#S5.T2) reports the results of node classification on real-world datasets. The table is organized into two blocks: (i) unidimensional graph methods (PolyGCL and TFE-GNN), which are originally designed for single graphs and are therefore evaluated on the aggregated adjacency matrix $\tilde{A}$, and (ii) native multiplex graph methods, which operate directly on all dimensions $\{A_d\}_{d=1}^{D}$.

Across all datasets, HAAM achieves the best F1-Macro and F1-Micro scores, with consistently low standard deviations, indicating stable performance. On arXiv, HAAM reaches F1-Macro = 39.95 and F1-Micro = 48.44, outperforming the strongest multiplex competitors DMG (second-best Macro: 35.45) and mGCN (second-best Micro: 44.45) by +4.50 and +3.99, respectively. We also observe that HDMI and SSDCM run out of memory on this large dataset, whereas HAAM remains feasible.

On Movies, HAAM obtains 41.81/42.39 (Macro/Micro) and improves over the best baseline HDMI (40.25/41.42) by +1.56 (Macro) and +0.97 (Micro). On Amazon, which exhibits low homophily across all dimensions, HAAM also achieves clear improvements: 88.32/88.37 versus the strongest baseline X-GOAL (85.70/85.79), i.e., +2.62 (Macro) and +2.58 (Micro). These trends suggest that existing multiplex methods are limited by not explicitly accounting for heterophily patterns that vary by relation. The obtained results thus empirically support our main design choices: (i) learning dimension-specific compatibility matrices to model class couplings that differ across dimensions, (ii) composing low-pass and high-pass Chebyshev filters via the proposed product mechanism to jointly exploit homophilic and heterophilic signals, and (iii) producing a sparse consensus prediction using proximal-gradient optimization.
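The margins quoted above are plain differences between HAAM and the strongest baseline for each metric. The snippet recomputes them from the numbers stated in the text (no other Table 2 entries are assumed):

```python
# (HAAM, strongest baseline) pairs as quoted in the text for Table 2.
reported = {
    "arXiv F1-Macro (vs. DMG)": (39.95, 35.45),
    "arXiv F1-Micro (vs. mGCN)": (48.44, 44.45),
    "Movies F1-Macro (vs. HDMI)": (41.81, 40.25),
    "Movies F1-Micro (vs. HDMI)": (42.39, 41.42),
    "Amazon F1-Macro (vs. X-GOAL)": (88.32, 85.70),
    "Amazon F1-Micro (vs. X-GOAL)": (88.37, 85.79),
}
# Round to two decimals to remove floating-point noise from the subtraction.
margins = {name: round(haam - base, 2) for name, (haam, base) in reported.items()}
for name, gain in margins.items():
    print(f"{name}: +{gain:.2f}")
```

Among these six comparisons, the largest margin is on arXiv F1-Macro (+4.50).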

Although PolyGCL and TFE-GNN are designed to handle heterophily in unidimensional graphs, their adaptation to multiplex graphs via $\tilde{A}$ generally underperforms native multiplex models, especially on arXiv and Amazon. This suggests that aggregating dimensions can discard relation-specific homophily/heterophily structure that is critical for prediction. In contrast, InfoMGF is more competitive on Movies but remains substantially below HAAM on arXiv and Amazon. These results suggest that multiplex structure learning alone may be insufficient when heterophily must be explicitly modeled during propagation and prediction. Overall, Table [2](https://arxiv.org/html/2605.12699#S5.T2) provides evidence that HAAM captures the interplay of homophilic and heterophilic interactions across multiplex dimensions and yields improved node classification performance compared to both recent unidimensional heterophily methods and state-of-the-art multiplex baselines.

Table 3: Ablation study on node classification.

### 5.5 Ablation Study

The ablation study in Table [3](https://arxiv.org/html/2605.12699#S5.T3) provides a fine-grained analysis of the main components of HAAM by isolating: (i) compatibility modeling (shared vs. dimension-specific), (ii) sparse consensus via proximal optimization, and (iii) the spectral filtering design (low-pass, high-pass, their sum, weighted sum, and product). All variants are trained and evaluated under the same protocol and data splits described in Sec. [5.2](https://arxiv.org/html/2605.12699#S5.SS2).

##### Ablation variants

We consider the following models:

- Naive model: a per-dimension two-layer GCN baseline that does not use compatibility matrices, spectral filter compositions, or the proximal consensus mechanism.
- HAAM-CM ($H$): HAAM with a single shared compatibility matrix $H \in \mathbb{R}^{C\times C}$ across all dimensions, testing whether a global class-compatibility structure is sufficient.
- HAAM-CM ($H_d$): HAAM with dimension-specific compatibility matrices $\{H_d\}_{d=1}^{D}$, capturing relation-dependent class interactions.
- HAAM-CM ($H_d + \text{prox}$): HAAM-CM ($H_d$) augmented with the proximal-gradient consensus optimization (Eq. ([15](https://arxiv.org/html/2605.12699#S4.E15))), yielding sparse consensus predictions.
- HAAM-LP ($H_d + f_d^{\mathcal{L}} + \text{prox}$): HAAM-CM ($H_d + \text{prox}$) with only the low-pass Chebyshev filter $f_d^{\mathcal{L}}$.
- HAAM-HP ($H_d + f_d^{\mathcal{H}} + \text{prox}$): HAAM-CM ($H_d + \text{prox}$) with only the high-pass Chebyshev filter $f_d^{\mathcal{H}}$.
- HAAM-SUM ($H_d + f_d^{\mathcal{L}} + f_d^{\mathcal{H}} + \text{prox}$): HAAM-CM ($H_d + \text{prox}$) where the two filter responses are combined by an unweighted sum, i.e., $f_d^{\mathcal{L}} + f_d^{\mathcal{H}}$.
- HAAM-SUM ($H_d + \delta f_d^{\mathcal{L}} + (1-\delta) f_d^{\mathcal{H}} + \text{prox}$): HAAM-CM ($H_d + \text{prox}$) with a weighted sum of the low- and high-pass responses, where $\delta \in [0,1]$ is selected on the validation set.
- HAAM-Prod ($H_d + f_d^{\mathcal{H}} \cdot f_d^{\mathcal{L}} + \text{prox}$): the full model, where the low-pass and high-pass filters are combined by the proposed product (composition) mechanism.
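To see why the fusion rule matters, consider two fixed first-order filters on $[0, 2]$ (illustrative stand-ins, not HAAM's learned Chebyshev filters): $g_{\mathcal{L}}(\lambda) = 1 - \lambda/2$ and $g_{\mathcal{H}}(\lambda) = \lambda/2$. Their unweighted sum is identically 1, i.e., an all-pass response that filters nothing; the weighted sum only tilts the response linearly; the product $\lambda(2-\lambda)/4$ is a band-pass response that vanishes at both ends of the spectrum and peaks at $\lambda = 1$. Function names are hypothetical.

```python
def low_pass(lam):
    return 1.0 - lam / 2.0   # 1 at lam = 0, 0 at lam = 2

def high_pass(lam):
    return lam / 2.0         # 0 at lam = 0, 1 at lam = 2

def fused_response(lam, mode, delta=0.5):
    """Frequency response of the three fusion rules compared in the ablation."""
    if mode == "sum":
        return low_pass(lam) + high_pass(lam)
    if mode == "weighted":
        return delta * low_pass(lam) + (1.0 - delta) * high_pass(lam)
    if mode == "product":
        return high_pass(lam) * low_pass(lam)
    raise ValueError(f"unknown mode: {mode}")

grid = [i / 4 for i in range(9)]  # 0, 0.25, ..., 2
for mode in ("sum", "weighted", "product"):
    print(mode, [round(fused_response(lam, mode, delta=0.8), 3) for lam in grid])
```

With learned higher-order filters the shapes differ, but the contrast carries over for nonnegative responses: a sum passes any frequency passed by either filter, whereas a product passes only frequencies retained by both.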

##### Impact of compatibility modeling

As Table [3](https://arxiv.org/html/2605.12699#S5.T3) shows, comparing the naive model with the compatibility-based variants highlights that compatibility modeling is a primary driver of the improvements, especially on low-homophily datasets such as Amazon (Table [1](https://arxiv.org/html/2605.12699#S5.T1)). For example, moving from the naive model to HAAM-CM improves performance on Amazon (F1-Macro: 74.86 → 86.62 with a shared $H$, and 74.86 → 87.17 with dimension-specific $H_d$). This indicates that explicitly modeling cross-class couplings is critical when edges frequently connect nodes with different labels.

##### Shared vs\. dimension\-specific compatibility matrices

Replacing a single shared compatibility matrix $H$ with dimension-specific matrices $H_d$ yields consistent gains on Movies and Amazon (e.g., Amazon F1-Micro: 86.79 → 87.42). These results support the hypothesis that different relations encode different class-interaction patterns. On arXiv, the two designs are competitive (with a small trade-off between Macro and Micro), which suggests that some datasets may benefit from partial sharing. However, the dimension-specific design provides a stronger and more flexible inductive bias overall. This observation is also consistent with the qualitative differences across dimensions shown in Fig. [7](https://arxiv.org/html/2605.12699#S5.F7).

##### Effect of proximal consensus optimization

Adding proximal consensus (HAAM-CM ($H_d + \text{prox}$)) improves over HAAM-CM ($H_d$) on all three datasets. These results indicate that explicitly reconciling the dimension-wise predictions into a sparse consensus label distribution is beneficial. Concretely, on arXiv we observe improvements (F1-Macro: 38.97 → 39.51, F1-Micro: 47.05 → 47.66), and similar gains appear on Movies and Amazon. These results empirically support the role of the sparsity-inducing consensus mechanism in stabilizing predictions across dimensions.
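Eq. (15) is not reproduced in this section, so the sketch below uses a stand-in objective, $\tfrac12\|z - a\|_2^2 + \beta\|z\|_1$, where $a$ could be, for instance, an average of dimension-wise scores (an assumption). It shows the generic proximal-gradient (ISTA) update: a gradient step on the smooth term, followed by entrywise soft-thresholding, which is the proximal operator of the $\ell_1$ penalty and produces exact zeros. For this stand-in, the minimizer is known in closed form, $z^\star = \mathrm{soft}(a, \beta)$, which provides a check. Function names are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied entrywise."""
    return [(abs(x) - tau) * (1.0 if x > 0 else -1.0) if abs(x) > tau else 0.0 for x in v]

def ista(a, beta, step=0.5, iters=200):
    """Proximal gradient for 0.5 * ||z - a||^2 + beta * ||z||_1.

    The smooth term has a 1-Lipschitz gradient, so any step <= 1 converges.
    """
    z = [0.0] * len(a)
    for _ in range(iters):
        grad = [zi - ai for zi, ai in zip(z, a)]           # gradient of the smooth term
        z = soft_threshold([zi - step * g for zi, g in zip(z, grad)], step * beta)
    return z

z = ista([1.5, -0.3, 0.8, -2.0], beta=1.0)  # beta = 1.0 as in Sec. 5.2
```

Entries of $a$ with magnitude at most $\beta$ are driven exactly to zero, which is the sparsity the consensus mechanism relies on.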

##### Low\-pass vs\. high\-pass filtering

The relative performance of HAAM-LP and HAAM-HP depends on the homophily regime of the dataset (Table [1](https://arxiv.org/html/2605.12699#S5.T1)). On Movies, which exhibits higher homophily than Amazon, HAAM-LP is stronger than HAAM-HP (e.g., Movies F1-Micro: 42.21 vs. 41.53), indicating that low-pass smoothing is particularly beneficial. Conversely, on Amazon and arXiv, both characterized by lower homophily, HAAM-HP yields larger gains (e.g., Amazon F1-Macro: 88.26), highlighting the importance of preserving high-frequency (heterophilic) signals. This behavior is also consistent with the spectral response visualizations reported in Figs. [3](https://arxiv.org/html/2605.12699#S5.F3) and [4](https://arxiv.org/html/2605.12699#S5.F4).

##### Sum vs\. weighted sum vs\. product

Among the fusion strategies, HAAM-SUM provides improvements over compatibility-only variants but is generally less robust than either (i) tuning a weighted mixture or (ii) using the proposed product. Introducing the validation-selected $\delta$ in the weighted sum improves over the unweighted sum (e.g., Amazon F1-Macro: 87.85 → 88.18), showing that balancing low- and high-frequency contributions matters. However, HAAM-Prod achieves the strongest and most consistent performance across datasets, obtaining the best results on five out of six metrics and remaining competitive on the sixth (arXiv F1-Macro, where HAAM-HP is highest). Overall, these results support our design choice. More precisely, the product-based composition provides a robust mechanism to jointly exploit homophilic (low-frequency) and heterophilic (high-frequency) information without requiring manual tuning of a mixture coefficient, and it tends to generalize well across datasets with different homophily regimes.

![Learned filter responses on Amazon](https://arxiv.org/html/2605.12699v1/figures/filter_response_amazon_dim1.png)
Figure 3: Spectral frequency responses of the learned filters of HAAM on Amazon. For each dimension $d$, we plot the learned low-pass response $f_d^{\mathcal{L}}(\lambda)$, the learned high-pass response $f_d^{\mathcal{H}}(\lambda)$, and their composed response obtained from the Chebyshev-product coefficients, evaluated on a dense grid of $\lambda \in [0, 2]$, the spectral range of the normalized Laplacian. We also overlay the pointwise product $f_d^{\mathcal{H}}(\lambda)\,f_d^{\mathcal{L}}(\lambda)$ to verify that it matches the Chebyshev-product implementation.

![Learned filter responses on Movies](https://arxiv.org/html/2605.12699v1/figures/filter_response_movies_dim1.png)
Figure 4: Spectral frequency responses of the learned filters of HAAM on Movies, plotted with the same protocol as in Fig. [3](https://arxiv.org/html/2605.12699#S5.F3).

### 5.6 Spectral Filter Responses

To provide an interpretable view of how HAAM adapts to homophily and heterophily at the signal level, we visualize the spectral frequency responses of the learned Chebyshev filters for the multiplex dimensions. Concretely, after training we extract the learned Chebyshev coefficients and evaluate the low-pass filter $f_d^{\mathcal{L}}(\lambda)$, the high-pass filter $f_d^{\mathcal{H}}(\lambda)$, and the composed filter $f_d^{\mathcal{H}}(\lambda)\,f_d^{\mathcal{L}}(\lambda)$ on a dense grid of $\lambda \in [0, 2]$, which covers the spectrum of the normalized Laplacian. This shows which frequency components the model attenuates or amplifies in each dimension.

Figs. [3](https://arxiv.org/html/2605.12699#S5.F3) and [4](https://arxiv.org/html/2605.12699#S5.F4) show the learned responses on Amazon and Movies. The dashed pointwise product closely overlaps the composed response obtained via the Chebyshev-product coefficients, which provides an empirical sanity check of Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3) and of its implementation.

On Amazon, which exhibits low homophily ratios across dimensions (Table [1](https://arxiv.org/html/2605.12699#S5.T1)), the learned low-pass and high-pass components show complementary behavior, indicating that the model does not rely on purely homophilic smoothing. Importantly, the composed response remains stable across the spectrum while preserving dimension-specific differences, which is consistent with the gains obtained by the full model on Amazon in Table [2](https://arxiv.org/html/2605.12699#S5.T2) and with the ablation results showing that combining $H_d$, proximal consensus, and composed filtering yields the best overall performance (Table [3](https://arxiv.org/html/2605.12699#S5.T3)).

On Movies, where homophily levels are comparatively higher, the composed response places more emphasis on lower frequencies, reflecting a stronger contribution of smoothing components that are beneficial under homophily. At the same time, the presence of a non-trivial high-pass component indicates that the model still preserves heterophilic information where it is relevant. Overall, these frequency-response plots corroborate the main motivation of HAAM: learning dimension-specific filters that flexibly combine homophilic and heterophilic propagation patterns rather than committing to a single regime.
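The agreement between the pointwise product and the Chebyshev-product response can be reproduced with the standard identity $T_m T_n = \tfrac12\big(T_{m+n} + T_{|m-n|}\big)$, which turns two degree-$K$ Chebyshev expansions into a single degree-$2K$ expansion. We assume Prop. 4.3 rests on this identity; the coefficients below are hypothetical, and $x = \lambda - 1$ maps $[0, 2]$ to $[-1, 1]$.

```python
def cheb_eval(theta, x):
    """sum_k theta_k T_k(x) via the three-term recurrence, for x in [-1, 1]."""
    t_prev, t_curr = 1.0, x
    value = theta[0]
    for k in range(1, len(theta)):
        value += theta[k] * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return value

def cheb_product(a, b):
    """Coefficients of (sum_m a_m T_m)(sum_n b_n T_n), using
    T_m T_n = (T_{m+n} + T_{|m-n|}) / 2."""
    c = [0.0] * (len(a) + len(b) - 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            c[m + n] += 0.5 * am * bn
            c[abs(m - n)] += 0.5 * am * bn
    return c

theta_L = [0.6, -0.3, 0.1]   # hypothetical low-pass-like coefficients
theta_H = [0.4, 0.35, 0.05]  # hypothetical high-pass-like coefficients
theta_prod = cheb_product(theta_L, theta_H)
grid = [-1.0 + i / 250 for i in range(501)]  # x = lam - 1 for lam in [0, 2]
gap = max(abs(cheb_eval(theta_prod, x) - cheb_eval(theta_L, x) * cheb_eval(theta_H, x))
          for x in grid)
print(f"max deviation on grid: {gap:.2e}")
```

The deviation is at the level of floating-point round-off, which is the numerical counterpart of the overlapping curves in Figs. 3 and 4.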

![Robustness to feature masking](https://arxiv.org/html/2605.12699v1/figures/robust_feat_mask_f1_micro_drop.png) (a) Feature masking.
![Robustness to edge dropping](https://arxiv.org/html/2605.12699v1/figures/robust_edge_dropout_f1_micro_drop.png) (b) Edge dropping.

Figure 5: Robustness of HAAM under input perturbations on two real-world multiplex datasets (Amazon and Movies). The y-axis reports the drop in F1-Micro (in percentage points) relative to the clean graph. Error bars denote the standard deviation over five random perturbation trials.

### 5.7 Robustness Analysis

We evaluate the robustness of HAAM under two common perturbations that emulate noisy or incomplete multiplex data: (i) feature masking, where a fraction $p$ of node feature entries is randomly set to zero, and (ii) edge dropping, where a fraction $p$ of edges is randomly removed in each dimension. We apply perturbations only at test time (the model is trained on the clean graph) and report the resulting performance drop relative to the clean setting. Each perturbation level is repeated over five random trials, and we report the mean and standard deviation (Fig. [5](https://arxiv.org/html/2605.12699#S5.F5)).

Feature masking. Fig. [5(a)](https://arxiv.org/html/2605.12699#S5.F5.sf1) shows that HAAM degrades gracefully as feature information is removed. With 50% masking, the F1-Micro drop is 7.39 pp on Amazon and 2.71 pp on Movies (F1-Macro drop: 7.48 pp and 2.15 pp, respectively). This indicates that HAAM can compensate for partially missing attributes by leveraging multiplex structural signals captured by the composed low-/high-pass filters and the consensus mechanism.

Edge dropping. Fig. [5(b)](https://arxiv.org/html/2605.12699#S5.F5.sf2) shows that HAAM is highly stable under substantial edge removal. Across dropout rates up to 50%, the maximum observed F1-Micro drop remains below 0.19 pp on Movies and below 0.01 pp on Amazon (the latter even exhibits a negligible gain, within noise). This robustness is consistent with the design of HAAM, which integrates information across multiple relations and relies on learned compatibility matrices to reweight same-class and cross-class propagation rather than overfitting to a specific connectivity pattern. Overall, these stress tests suggest that HAAM remains reliable under common forms of missing features and missing links in multiplex graphs.
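A minimal sketch of the two test-time perturbations, assuming dense feature rows and one edge list per dimension, and sampling an exact fraction $p$ without replacement (an independent Bernoulli mask per entry or edge would be an equally reasonable reading of the protocol). Helper names are hypothetical.

```python
import random

def mask_features(X, p, rng):
    """Return a copy of X with a fraction p of all feature entries set to zero."""
    n, f = len(X), len(X[0])
    k = int(round(p * n * f))
    masked = [row[:] for row in X]  # leave the clean features untouched
    for idx in rng.sample(range(n * f), k):
        masked[idx // f][idx % f] = 0.0
    return masked

def drop_edges(multiplex_edges, p, rng):
    """Remove a fraction p of the edges, separately in each dimension."""
    out = []
    for edges in multiplex_edges:
        keep = len(edges) - int(round(p * len(edges)))
        out.append(rng.sample(edges, keep))
    return out
```

Repeating each call with five different seeds and averaging the resulting F1 drops mirrors the shape of the protocol; it does not reproduce the reported numbers.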

![Sensitivity, F1-Macro on Movies](https://arxiv.org/html/2605.12699v1/figures/sensitivity_movies_macro.png) (a) F1-Macro on Movies
![Sensitivity, F1-Micro on Movies](https://arxiv.org/html/2605.12699v1/figures/sensitivity_movies_micro.png) (b) F1-Micro on Movies
![Sensitivity, F1-Macro on Amazon](https://arxiv.org/html/2605.12699v1/figures/sensitivity_amazon_macro.png) (c) F1-Macro on Amazon
![Sensitivity, F1-Micro on Amazon](https://arxiv.org/html/2605.12699v1/figures/sensitivity_amazon_micro.png) (d) F1-Micro on Amazon

Figure 6: Sensitivity analysis of HAAM.

### 5.8 Sensitivity Analysis

Fig. [6](https://arxiv.org/html/2605.12699#S5.F6) presents the sensitivity analysis of HAAM, showing the impact of varying the embedding size $M$ and the degree $K$ of the filters on node classification performance. The heatmaps reveal that our approach is relatively stable across different settings of $M$ and $K$, with only minor fluctuations in F1 scores. For the Movies dataset, the scores show a slight preference for a lower filter degree ($K = 1$ or $2$), particularly when the embedding size is set to $M = 64$. In contrast, the best performance on Amazon is observed at a larger degree ($K = 3$ or $5$). Overall, the analysis suggests that the model performs robustly across a wide range of settings.

![Learned compatibility matrices on Amazon](https://arxiv.org/html/2605.12699v1/figures/H_learned_all_dims_amazon.png) (a) Amazon (three dimensions).
![Learned compatibility matrices on Movies](https://arxiv.org/html/2605.12699v1/figures/H_learned_all_dims_movies.png) (b) Movies (three dimensions).

Figure 7: Learned dimension-specific compatibility matrices $H_d$ of HAAM on two datasets (Amazon and Movies). Each heatmap is a $C \times C$ matrix, where entry $(H_d)_{c,c'}$ controls how information associated with class $c$ contributes to the score of class $c'$ after propagation in dimension $d$. Diagonal-dominant structure corresponds to homophilic couplings, whereas strong off-diagonal structure (in magnitude) indicates cross-class (heterophilic) couplings.

### 5.9 Learned Compatibility Matrices

A key component of HAAM is the dimension-specific compatibility matrix $H_d \in \mathbb{R}^{C\times C}$, which is learned jointly with the spectral filters and the consensus mechanism. Fig. [7](https://arxiv.org/html/2605.12699#S5.F7) provides a qualitative view of the learned $H_d$ matrices for Amazon and Movies. First, the learned compatibility patterns differ substantially across dimensions, supporting our design choice to learn dimension-specific $H_d$ rather than a single shared compatibility matrix. Second, the matrices exhibit both diagonal and pronounced off-diagonal structure (with dataset- and dimension-dependent intensity), indicating that HAAM does not assume a fixed homophily regime but instead learns to reweight same-class and cross-class signals in a relation-aware manner.

These visual results complement the quantitative ablation findings in Table [3](https://arxiv.org/html/2605.12699#S5.T3). More precisely, introducing $H_d$ on top of the naive per-dimension GCN yields consistent gains, with a particularly large improvement on Amazon (F1-Macro: $74.86\rightarrow 86.62$). This suggests that compatibility modeling is a major contributor to the overall performance improvements reported in Table [2](https://arxiv.org/html/2605.12699#S5.T2).

![t-SNE of HAAM on Movies](https://arxiv.org/html/2605.12699v1/figures/HAAM_tsne_movies.png)
(a) HAAM on Movies.
![t-SNE of InfoMGF on Movies](https://arxiv.org/html/2605.12699v1/figures/InfoMGF_tsne_movies.png)
(b) InfoMGF on Movies.
![t-SNE of HAAM on Amazon](https://arxiv.org/html/2605.12699v1/figures/HAAM_tsne_amazon.png)
(c) HAAM on Amazon.
![t-SNE of InfoMGF on Amazon](https://arxiv.org/html/2605.12699v1/figures/InfoMGF_tsne_amazon.png)
(d) InfoMGF on Amazon.

Figure 8: Latent-space t-SNE visualization of the node representations learned by HAAM and InfoMGF on Movies and Amazon. Points are colored by ground-truth class. Following common practice, we subsample up to $5{,}000$ nodes (approximately balanced across classes) and use a fixed random seed.
### 5.10 Embedding Visualization

We provide a qualitative assessment of representation quality using t-SNE [[40](https://arxiv.org/html/2605.12699#bib.bib91)]. After training, we extract the node representations produced by each method, project them to two dimensions, and color nodes by their ground-truth class labels. As illustrated in Fig. [8](https://arxiv.org/html/2605.12699#S5.F8), HAAM yields representations that are more class-separable than those of InfoMGF. On Amazon, the embedding space of HAAM exhibits compact and well-separated clusters with limited overlap between categories. In contrast, InfoMGF shows a more fragmented and mixed structure, which is consistent with its lower F1 scores on this dataset (Table [2](https://arxiv.org/html/2605.12699#S5.T2)). On Movies, class overlap remains more pronounced for both methods. Nevertheless, the HAAM embeddings appear less fragmented, which aligns with its consistent F1 improvements over competing approaches.
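The subsampling step described for Fig. 8 (up to 5,000 nodes, approximately balanced across classes, fixed random seed) can be sketched as follows. This is an assumed procedure rather than the authors' script, and `balanced_subsample` is a hypothetical helper. The selected rows of a trained embedding matrix would then be projected to two dimensions, for example with scikit-learn's `sklearn.manifold.TSNE(n_components=2, random_state=seed).fit_transform(...)`, which is left out here to keep the example dependency-light.

```python
import numpy as np


def balanced_subsample(labels, max_nodes=5000, seed=0):
    """Pick up to max_nodes indices with an equal per-class quota, reproducibly.

    Classes smaller than the quota contribute all of their nodes; the unused budget
    is not redistributed, so the result is only approximately balanced.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    quota = max_nodes // len(classes)
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        take = min(quota, idx.size)
        picked.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(picked))


# Toy label vector: two large classes and one small class.
labels = np.repeat([0, 1, 2], [10000, 10000, 100])
selected = balanced_subsample(labels, max_nodes=5000, seed=0)
print(len(selected), np.bincount(labels[selected]))
```

Fixing the seed makes the selected subset, and hence the plotted point cloud, reproducible across runs.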

## 6 Conclusion

This paper addresses a critical gap in the literature on multiplex graphs by introducing HAAM, a novel adaptive framework designed for node classification in multiplex graphs exhibiting both heterophilic and homophilic dimensions. While existing models tend to focus on homophilic structures, they may fall short in capturing the structural diversity of real-world systems where heterophily is prevalent. Our method leverages the product of high-pass and low-pass Chebyshev filters, combined with dimension-specific compatibility matrices and consensus labels, to adapt dynamically to varying connectivity patterns across dimensions. This design supports more stable and robust predictions in multi-relational settings.

The dimension\-specific compatibility matrices are estimated using labeled nodes within the semi\-supervised setting\. As a result, their reliability depends on the availability and representativeness of labeled data\. When labeled nodes are extremely scarce, highly imbalanced, or biased, the empirical class\-mixing estimates may become less stable, which can in turn affect calibration and prediction robustness\. Although this behavior is inherent to approaches relying on class\-conditional connectivity statistics, incorporating additional regularization or prior structural assumptions could further improve robustness in low\-label regimes\. Exploring such strategies constitutes a promising direction for future work\.
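As a rough illustration of why label-based class-mixing estimates become fragile when labels are scarce, the sketch below computes a row-normalized class-mixing matrix from edges whose two endpoints are both labeled. It is not necessarily how HAAM estimates $H_d$; the function `empirical_compatibility` and its additive-smoothing parameter `alpha` (one simple instance of the regularization mentioned above) are hypothetical.

```python
import numpy as np


def empirical_compatibility(edges, labels, labeled_mask, num_classes, alpha=0.0):
    """Row-normalized class-mixing matrix from edges whose two endpoints are labeled.

    alpha > 0 adds a uniform pseudo-count to every cell: a simple smoothing prior,
    used here only as one hypothetical form of regularization for low-label regimes.
    """
    counts = np.full((num_classes, num_classes), float(alpha))
    for u, v in edges:
        if labeled_mask[u] and labeled_mask[v]:
            # Undirected edge: count both orientations.
            counts[labels[u], labels[v]] += 1.0
            counts[labels[v], labels[u]] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes with no labeled evidence (possible when alpha == 0) keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Heterophilic 4-cycle 0-1-2-3-0 with alternating labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = np.array([0, 1, 0, 1])
all_labeled = np.ones(4, dtype=bool)
one_labeled = np.array([True, False, False, False])

print(empirical_compatibility(edges, labels, all_labeled, 2))             # cross-class pattern
print(empirical_compatibility(edges, labels, one_labeled, 2))             # no usable edges
print(empirical_compatibility(edges, labels, one_labeled, 2, alpha=1.0))  # falls back to uniform
```

With every node labeled, the estimate recovers the purely cross-class pattern; with a single labeled node, no edge is usable, and only the smoothing prior keeps the estimate defined.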

This study focuses exclusively on multiplex graphs, in which nodes of the same type are connected via multiple types of edges; it does not target more general heterogeneous multidimensional graphs that involve both multiple node types and multiple edge types. Likewise, it does not address dynamic graphs that evolve over time. Nonetheless, this work may pave the way for exploring more complex graph structures, including heterogeneous multiplex graphs and dynamic multiplex networks.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper\.

## Data Availability

The dataset and the source code used to support the findings of this study are available at [this link](https://drive.google.com/drive/folders/1ROhUghYARMRGzyLyBP2wXJH6MxXfu4yf?usp=drive_link).

## References

- [1] (2024) A geometric perspective for high-dimensional multiplex graphs. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 4–13.
- [2] K. Abdous, N. Mrabah, and M. Bouguessa (2024) Hierarchical aggregations for high-dimensional multiplex graph embedding. IEEE Transactions on Knowledge and Data Engineering 36(4), pp. 1624–1637.
- [3] S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. Ver Steeg, and A. Galstyan (2019) MixHop: higher-order graph convolutional architectures via sparsified neighborhood mixing. In International Conference on Machine Learning, pp. 21–29.
- [4] O. Barranco, C. Lozares, and D. Muntanyola-Saura (2019) Heterophily in social groups formation: a social network analysis. Quality & Quantity 53(2), pp. 599–619.
- [5] F. Battiston, V. Nicosia, and V. Latora (2017) The new challenges of multiplex networks: measures and models. The European Physical Journal Special Topics 226, pp. 401–416.
- [6] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, and D. Pedreschi (2013) Multidimensional networks: foundations of structural analysis. World Wide Web 16(5), pp. 567–593.
- [7] O. Boutemine and M. Bouguessa (2017) Mining community structures in multidimensional networks. ACM Transactions on Knowledge Discovery from Data 11(4), pp. 1–36.
- [8] Y. Cen, X. Zou, J. Zhang, H. Yang, J. Zhou, and J. Tang (2019) Representation learning for attributed multiplex heterogeneous network. In International Conference on Knowledge Discovery and Data Mining, pp. 1358–1368.
- [9] J. Chen, R. Lei, and Z. Wei (2024) PolyGCL: graph contrastive learning via learnable spectral polynomial filters. In The Twelfth International Conference on Learning Representations.
- [10] E. Chien, J. Peng, P. Li, and O. Milenkovic (2021) Adaptive universal generalized PageRank graph neural network. In The Ninth International Conference on Learning Representations.
- [11] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems 29, pp. 3844–3852.
- [12] R. Duan, M. Guang, J. Wang, C. Yan, H. Qi, W. Su, C. Tian, and H. Yang (2024) Unifying homophily and heterophily for spectral graph neural networks via triple filter ensembles. Advances in Neural Information Processing Systems 37, pp. 93540–93567.
- [13] M. He, Z. Wei, and J. Wen (2022) Convolutional neural networks on graphs with Chebyshev approximation, revisited. Advances in Neural Information Processing Systems 35, pp. 7264–7276.
- [14] R. He and J. McAuley (2016) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In The Web Conference, pp. 507–517.
- [15] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio (2019) Learning deep representations by mutual information estimation and maximization. In The Seventh International Conference on Learning Representations.
- [16] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec (2020) Open Graph Benchmark: datasets for machine learning on graphs. Advances in Neural Information Processing Systems 33, pp. 22118–22133.
- [17] W. Jin, H. Ma, Y. Zhang, Z. Li, and L. Chang (2024) Multi-view discriminative edge heterophily contrastive learning network for attributed graph anomaly detection. Expert Systems with Applications 255, pp. 124460. ISSN 0957-4174.
- [18] B. Jing, S. Feng, Y. Xiang, X. Chen, Y. Chen, and H. Tong (2022) X-GOAL: multiplex heterogeneous graph prototypical contrastive learning. In International Conference on Information & Knowledge Management, pp. 894–904.
- [19] B. Jing, C. Park, and H. Tong (2021) HDMI: high-order deep multiplex infomax. In The Web Conference, pp. 2414–2424.
- [20] D. Lim, F. Hohne, X. Li, S. L. Huang, V. Gupta, O. Bhalerao, and S. N. Lim (2021) Large scale learning on non-homophilous graphs: new benchmarks and strong simple methods. Advances in Neural Information Processing Systems 34, pp. 20887–20902.
- [21] S. Liu, D. He, Z. Yu, D. Jin, and Z. Feng (2025) Beyond homophily: neighborhood distribution-guided graph convolutional networks. Expert Systems with Applications 259, pp. 125274. ISSN 0957-4174.
- [22] Y. Liu, Y. Zheng, D. Zhang, V. C. Lee, and S. Pan (2023) Beyond smoothing: unsupervised graph representation learning with edge heterophily discriminating. In AAAI Conference on Artificial Intelligence, Vol. 37, pp. 4516–4524.
- [23] S. Luan, C. Hua, Q. Lu, J. Zhu, M. Zhao, S. Zhang, X. Chang, and D. Precup (2022) Revisiting heterophily for graph neural networks. Advances in Neural Information Processing Systems 35, pp. 1362–1375.
- [24] Y. Ma, S. Wang, C. C. Aggarwal, D. Yin, and J. Tang (2019) Multi-dimensional graph convolutional networks. In International Conference on Data Mining, pp. 657–665.
- [25] A. Mitra, P. Vijayan, R. Sanasam, D. Goswami, S. Parthasarathy, and B. Ravindran (2021) Semi-supervised deep learning for multiplex networks. In International Conference on Knowledge Discovery and Data Mining, pp. 1234–1244.
- [26] Y. Mo, Y. Chen, Y. Lei, L. Peng, X. Shi, C. Yuan, and X. Zhu (2023) Multiplex graph representation learning via dual correlation reduction. IEEE Transactions on Knowledge and Data Engineering 35(12), pp. 12814–12827.
- [27] Y. Mo, Y. Lei, J. Shen, X. Shi, H. T. Shen, and X. Zhu (2023) Disentangled multiplex graph representation learning. In International Conference on Machine Learning, pp. 24983–25005.
- [28] N. Mrabah, M. Bouguessa, and R. Ksantini (2022) Escaping feature twist: a variational graph auto-encoder for node clustering. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 3351–3357.
- [29] N. Mrabah, M. Bouguessa, and R. Ksantini (2024) A contrastive variational graph auto-encoder for node clustering. Pattern Recognition 149, pp. 110209.
- [30] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini (2022) Rethinking graph auto-encoder models for attributed graph clustering. IEEE Transactions on Knowledge and Data Engineering 35(9), pp. 9037–9053.
- [31] R. Oughtred, J. Rust, C. Chang, B. Breitkreutz, C. Stark, A. Willems, L. Boucher, G. Leung, N. Kolas, F. Zhang, et al. (2021) The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Science 30(1), pp. 187–200.
- [32] E. Pan and Z. Kang (2023) Beyond homophily: reconstructing structure for graph-agnostic clustering. In International Conference on Machine Learning, pp. 26868–26877.
- [33] C. Park, D. Kim, J. Han, and H. Yu (2020) Unsupervised attributed multiplex network embedding. In AAAI Conference on Artificial Intelligence, Vol. 34, pp. 5371–5378.
- [34] L. Pio-Lopez, A. Valdeolivas, L. Tichit, É. Remy, and A. Baudot (2021) MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach. Scientific Reports 11(1), pp. 1–20.
- [35] Y. Sadikaj, J. Rass, Y. Velaj, and C. Plant (2023) Semi-supervised embedding of attributed multiplex networks. In The Web Conference, pp. 578–587.
- [36] Z. Shen, S. Wang, and Z. Kang (2024) Beyond redundancy: information-aware unsupervised multiplex graph structure learning. Advances in Neural Information Processing Systems 37, pp. 31629–31658.
- [37] H. Sun, X. Li, Z. Wu, D. Su, R. Li, and G. Wang (2024) Breaking the entanglement of homophily and heterophily in semi-supervised node classification. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 2379–2392.
- [38] J. Sun, Y. Zhang, C. Ma, M. Coates, H. Guo, R. Tang, and X. He (2019) Multi-graph convolution collaborative filtering. In International Conference on Data Mining, pp. 1306–1311.
- [39] J. Tang, J. Sun, C. Wang, and Z. Yang (2009) Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 807–816.
- [40] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9(86), pp. 2579–2605.
- [41] B. Wang, X. Cai, M. Xu, and W. Xiang (2023) A graph-enhanced attention model for community detection in multiplex networks. Expert Systems with Applications 230, pp. 120552. ISSN 0957-4174.
- [42] X. Wang and M. Zhang (2022) How powerful are spectral graph neural networks. In International Conference on Machine Learning, pp. 23341–23362.
- [43] X. Zheng, Y. Wang, Y. Liu, M. Li, M. Zhang, D. Jin, P. S. Yu, and S. Pan (2022) Graph neural networks for graphs with heterophily: a survey. arXiv preprint arXiv:2202.07082.
- [44] Z. Zhong, G. Gonzalez, D. Grattarola, and J. Pang (2022) Unsupervised network embedding beyond homophily. Transactions on Machine Learning Research (TMLR).
- [45] J. Zhu, R. A. Rossi, A. Rao, T. Mai, N. Lipka, N. K. Ahmed, and D. Koutra (2021) Graph neural networks with heterophily. In AAAI Conference on Artificial Intelligence, Vol. 35, pp. 11168–11176.
- [46] J. Zhu, Y. Yan, L. Zhao, M. Heimann, L. Akoglu, and D. Koutra (2020) Beyond homophily in graph neural networks: current limitations and effective designs. Advances in Neural Information Processing Systems 33, pp. 7793–7804.

## Appendices

## Appendix A Notation summary

In the main manuscript, we use a variety of symbols to denote graph structure, node- and label-related quantities, and the parameters of the spectral filters. For convenience, this appendix collects this notation in a single place. It is intended as a quick reference while reading Sections 3 and 4, particularly the derivations of the Chebyshev filters and the update rules.

Table [4](https://arxiv.org/html/2605.12699#A1.T4) lists each symbol, along with a brief description and its dimensions. This helps distinguish, for example, between node-level matrices ($X$, $Y$, $\hat{Y}_d$, $\hat{Y}$), graph-structural operators ($A_d$, $L_d$, $\hat{L}_d$), and filter parameters ($\theta_k^{\mathcal{L},d}$, $\theta_k^{\mathcal{H},d}$, $\gamma^{\mathcal{L},d}$, $\gamma^{\mathcal{H},d}$). Dataset-specific notation is introduced directly in Sec. 5 when needed.

Table 4: Summary of main notation.
## Appendix B Proof of Proposition [4.1](https://arxiv.org/html/2605.12699#S4.Thmtheorem1)

###### Proof\.

Let $L$ be the graph Laplacian with eigendecomposition $L = U\Lambda U^\top$, where $\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_N)$ is the diagonal matrix of eigenvalues and $U$ is the matrix of corresponding eigenvectors. Since $L$ is symmetric, $U$ is orthonormal, so $U^\top U = I$. The graph signal $x$ can be transformed into the spectral domain using the graph Fourier transform:

$$\hat{x} = U^\top x.$$

Applying a low-pass filter $f^{\mathcal{L}}(L)$ to the signal $x$ yields:

$$f^{\mathcal{L}}(L)\,x = U\,f^{\mathcal{L}}(\Lambda)\,U^\top x = U\,f^{\mathcal{L}}(\Lambda)\,\hat{x},$$

where $f^{\mathcal{L}}(\Lambda)$ denotes the element-wise application of the low-pass filter to the eigenvalues in $\Lambda$. Similarly, applying a high-pass filter $f^{\mathcal{H}}(L)$ after the low-pass filter gives:

$$\begin{aligned} f^{\mathcal{H}}(L)\left(f^{\mathcal{L}}(L)\,x\right) &= U\,f^{\mathcal{H}}(\Lambda)\,U^\top U\,f^{\mathcal{L}}(\Lambda)\,\hat{x} \\ &= U\,f^{\mathcal{H}}(\Lambda)\,f^{\mathcal{L}}(\Lambda)\,\hat{x}. \end{aligned}$$

Thus, the combined filter $f(L)$ has the spectral response $f^{\mathcal{H}}(\Lambda)\cdot f^{\mathcal{L}}(\Lambda)$, i.e.,

$$f(L) = f^{\mathcal{H}}(L)\cdot f^{\mathcal{L}}(L),$$

and the filter output becomes:

$$y = U\left(f^{\mathcal{H}}(\Lambda)\cdot f^{\mathcal{L}}(\Lambda)\right)U^\top x.$$

∎
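Prop. 4.1 can be sanity-checked numerically. The following minimal NumPy sketch assumes the symmetric normalized Laplacian of a small random graph and uses the illustrative responses $g_{\text{low}}(\lambda) = 1-\lambda/2$ and $g_{\text{high}}(\lambda) = \lambda/2$ on $[0,2]$; these responses are assumptions for the example, not HAAM's learned filters. Applying the two matrix functions in sequence matches the single filter whose spectral response is their product.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# Random undirected graph and its symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
A = np.triu((rng.random((N, N)) < 0.4).astype(float), k=1)
A = A + A.T
deg = A.sum(axis=1)
d_inv_sqrt = np.zeros(N)
d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

lam, U = np.linalg.eigh(L)  # L = U diag(lam) U^T with orthonormal U


def g_low(l):
    """Illustrative low-pass response: keeps small eigenvalues."""
    return 1.0 - l / 2.0


def g_high(l):
    """Illustrative high-pass response: keeps large eigenvalues."""
    return l / 2.0


def spectral_filter(g):
    """Matrix function f(L) = U g(Lambda) U^T."""
    return U @ np.diag(g(lam)) @ U.T


x = rng.standard_normal(N)
sequential = spectral_filter(g_high) @ (spectral_filter(g_low) @ x)
combined = U @ np.diag(g_high(lam) * g_low(lam)) @ U.T @ x
print(np.max(np.abs(sequential - combined)))
```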

## Appendix C Proof of Corollary [4.2](https://arxiv.org/html/2605.12699#S4.Thmtheorem2)

###### Proof\.

According to Prop. 4.1, applying a low-pass filter $f^{\mathcal{L}}(L)$ followed by a high-pass filter $f^{\mathcal{H}}(L)$ to a graph signal $x$ is equivalent to applying a single filter whose eigenvalues are the element-wise product of the eigenvalues of the low-pass and high-pass filters.

So, we have:

$$y = f(L)\,x = U\left(f^{\mathcal{H}}(\Lambda)\cdot f^{\mathcal{L}}(\Lambda)\right)U^\top x.$$

Since $f^{\mathcal{H}}(\Lambda)$ and $f^{\mathcal{L}}(\Lambda)$ are diagonal, their matrix product is the diagonal matrix of the element-wise products of their diagonal entries. Because scalar multiplication is commutative, diagonal matrices commute:

$$f^{\mathcal{H}}(\Lambda)\cdot f^{\mathcal{L}}(\Lambda) = f^{\mathcal{L}}(\Lambda)\cdot f^{\mathcal{H}}(\Lambda).$$

Multiplying both sides by $U$ on the left and by $U^\top$ on the right, it then follows that:

$$f^{\mathcal{H}}(L)\cdot f^{\mathcal{L}}(L) = U\,f^{\mathcal{H}}(\Lambda)\,f^{\mathcal{L}}(\Lambda)\,U^\top = U\,f^{\mathcal{L}}(\Lambda)\,f^{\mathcal{H}}(\Lambda)\,U^\top = f^{\mathcal{L}}(L)\cdot f^{\mathcal{H}}(L).$$

Thus, the order in which the low-pass and high-pass filters are applied does not affect the outcome, which proves that the composition is order-invariant.

∎
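Order invariance can also be observed without any eigendecomposition, since two polynomial filters in the same matrix always commute. The sketch below builds Chebyshev filters with the three-term recurrence; it assumes the rescaling $\tilde{L} = L - I$ for the symmetric normalized Laplacian (whose spectrum lies in $[0,2]$), and the random coefficients merely stand in for learned $\theta^{\mathcal{L},d}$ and $\theta^{\mathcal{H},d}$ (they are not constrained to be low- or high-pass, which the argument does not require). The helper `cheb_filter` is hypothetical.

```python
import numpy as np


def cheb_filter(L_tilde, theta):
    """Return sum_k theta[k] * T_k(L_tilde) using T_{k+1} = 2 L_tilde T_k - T_{k-1}."""
    n = L_tilde.shape[0]
    T_prev, T_curr = np.eye(n), L_tilde.copy()
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out = out + theta[k] * T_curr
    return out


rng = np.random.default_rng(1)
N, K = 10, 3
A = np.triu((rng.random((N, N)) < 0.3).astype(float), k=1)
A = A + A.T
deg = A.sum(axis=1)
d_inv_sqrt = np.zeros(N)
d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
L_tilde = L - np.eye(N)  # assumed rescaling: spectrum [0, 2] -> [-1, 1]

theta_low = rng.standard_normal(K + 1)   # stand-ins for learned coefficients
theta_high = rng.standard_normal(K + 1)
F_low = cheb_filter(L_tilde, theta_low)
F_high = cheb_filter(L_tilde, theta_high)

x = rng.standard_normal(N)
low_then_high = F_high @ (F_low @ x)
high_then_low = F_low @ (F_high @ x)
print(np.max(np.abs(low_then_high - high_then_low)))
```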

## Appendix D Proof of Proposition [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3)

We first show the following result, which is used to prove the proposition\.

###### Lemma 2\.

Let $T_i(x)$ and $T_j(x)$ be Chebyshev polynomials of degree $i$ and $j$, respectively. Then their product can be expressed as:

$$T_i(x)\cdot T_j(x) = \frac{1}{2}\left[T_{i+j}(x) + T_{|i-j|}(x)\right].$$

###### Proof\.

The equality can be proved using the trigonometric definition of Chebyshev polynomials on $[-1,1]$:

$$T_k(x) = \cos(k\arccos x),$$

and the product-to-sum formula for the cosine:

$$\cos\alpha\cdot\cos\beta = \frac{1}{2}\left[\cos(\alpha+\beta) + \cos(\alpha-\beta)\right].$$

Applying both formulas to $T_i(x)\cdot T_j(x)$ yields:

$$\begin{aligned} T_i(x)\cdot T_j(x) &= \cos(i\arccos x)\cdot\cos(j\arccos x) \\ &= \frac{1}{2}\left[\cos\left((i+j)\arccos x\right) + \cos\left((i-j)\arccos x\right)\right] \\ &= \frac{1}{2}\left[T_{i+j}(x) + T_{|i-j|}(x)\right], \end{aligned}$$

where the last step uses the evenness of the cosine, $\cos\left((i-j)\vartheta\right) = \cos\left(|i-j|\vartheta\right)$. Since both sides are polynomials that agree on $[-1,1]$, the identity holds for all $x$, and hence also for matrix arguments. ∎
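Lemma 2 is easy to verify numerically. The following sketch evaluates Chebyshev basis polynomials with NumPy's `numpy.polynomial.Chebyshev.basis` on a grid over $[-1,1]$, checks the product identity for all degree pairs up to 7, and cross-checks the basis against the trigonometric definition; the helper `T` and the grid size are choices made for this example.

```python
import numpy as np
from numpy.polynomial import Chebyshev

x = np.linspace(-1.0, 1.0, 401)


def T(k, pts):
    """Evaluate the degree-k Chebyshev polynomial of the first kind at pts."""
    return Chebyshev.basis(k)(pts)


# Product identity: T_i * T_j = (T_{i+j} + T_{|i-j|}) / 2.
max_err = 0.0
for i in range(8):
    for j in range(8):
        lhs = T(i, x) * T(j, x)
        rhs = 0.5 * (T(i + j, x) + T(abs(i - j), x))
        max_err = max(max_err, float(np.max(np.abs(lhs - rhs))))

# Cross-check against the trigonometric definition T_k(x) = cos(k arccos x) on [-1, 1].
trig_err = max(float(np.max(np.abs(T(k, x) - np.cos(k * np.arccos(x))))) for k in range(16))
print(max_err, trig_err)
```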

We can now prove Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3).

###### Proof\.

Let $\hat{L}_d = f_d^{\mathcal{L}}(\tilde{L}_d)\cdot f_d^{\mathcal{H}}(\tilde{L}_d)$. The product of $f_d^{\mathcal{L}}$ and $f_d^{\mathcal{H}}$ can be expressed as:

$$\begin{aligned} f_d^{\mathcal{L}}(\tilde{L}_d)\cdot f_d^{\mathcal{H}}(\tilde{L}_d) &= \left(\sum_{k=0}^{K}\theta_k^{\mathcal{L},d}\,T_k(\tilde{L}_d)\right)\left(\sum_{k=0}^{K}\theta_k^{\mathcal{H},d}\,T_k(\tilde{L}_d)\right) \\ &= \sum_{i=0}^{K}\sum_{j=0}^{K}\theta_i^{\mathcal{L},d}\,\theta_j^{\mathcal{H},d}\,T_i(\tilde{L}_d)\cdot T_j(\tilde{L}_d). \end{aligned}$$

We apply Lemma [2](https://arxiv.org/html/2605.12699#Thmlemma2) to express $T_i(\tilde{L}_d)\cdot T_j(\tilde{L}_d)$ as half the sum of $T_{i+j}(\tilde{L}_d)$ and $T_{|i-j|}(\tilde{L}_d)$. Thus, the double sum can be rewritten as:

$$\hat{L}_d = \frac{1}{2}\sum_{i=0}^{K}\sum_{j=0}^{K}\theta_i^{\mathcal{L},d}\,\theta_j^{\mathcal{H},d}\left[T_{i+j}(\tilde{L}_d) + T_{|i-j|}(\tilde{L}_d)\right].$$

∎
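The coefficient merging in Prop. 4.3 can be written as a short routine. In the sketch below, the hypothetical helper `product_coefficients` accumulates the contributions $\frac{1}{2}\theta_i^{\mathcal{L},d}\theta_j^{\mathcal{H},d}$ into the degree-$2K$ coefficient vector and compares the result with NumPy's Chebyshev-series multiplication `numpy.polynomial.chebyshev.chebmul`; random coefficients stand in for learned ones. The pointwise check on $[-1,1]$ carries over to $\tilde{L}_d$ because both sides are polynomials in the same matrix.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def product_coefficients(theta_low, theta_high):
    """Degree-2K Chebyshev coefficients of f_L * f_H via Lemma 2 (inputs of length K+1)."""
    K = len(theta_low) - 1
    bar = np.zeros(2 * K + 1)
    for i, a in enumerate(theta_low):
        for j, b in enumerate(theta_high):
            bar[i + j] += 0.5 * a * b
            bar[abs(i - j)] += 0.5 * a * b
    return bar


rng = np.random.default_rng(2)
K = 4
theta_low = rng.standard_normal(K + 1)   # stand-ins for learned coefficients
theta_high = rng.standard_normal(K + 1)
theta_bar = product_coefficients(theta_low, theta_high)

# NumPy's Chebyshev-series multiplication should give the same coefficients.
reference = cheb.chebmul(theta_low, theta_high)

# Pointwise check on [-1, 1]: product of responses equals the merged degree-2K response.
lam = np.linspace(-1.0, 1.0, 101)
lhs = cheb.chebval(lam, theta_low) * cheb.chebval(lam, theta_high)
rhs = cheb.chebval(lam, theta_bar)
print(np.max(np.abs(theta_bar - reference)), np.max(np.abs(lhs - rhs)))
```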

## Appendix E Proof of Proposition [4.4](https://arxiv.org/html/2605.12699#S4.Thmtheorem4)

###### Proof\.

Since $\tilde{L}_d$ is symmetric, it admits an eigendecomposition $\tilde{L}_d = U_d\tilde{\Lambda}_d U_d^\top$, where $U_d$ is orthonormal and $\tilde{\Lambda}_d = \mathrm{diag}\big(\tilde{\lambda}_d^{(1)},\dots,\tilde{\lambda}_d^{(N)}\big)$ with $\tilde{\lambda}_d^{(i)}\in[-1,1]$ for all $i$. Because $T_k(\cdot)$ is a polynomial, we have:

$$T_k(\tilde{L}_d) = U_d\,T_k(\tilde{\Lambda}_d)\,U_d^\top,\qquad T_k(\tilde{\Lambda}_d) = \mathrm{diag}\big(T_k(\tilde{\lambda}_d^{(1)}),\dots,T_k(\tilde{\lambda}_d^{(N)})\big).$$

Therefore, since the spectral norm is invariant under orthogonal transformations,

$$\|T_k(\tilde{L}_d)\|_2 = \|T_k(\tilde{\Lambda}_d)\|_2 = \max_{1\le i\le N}\big|T_k(\tilde{\lambda}_d^{(i)})\big|.$$

Since $|T_k(x)| = |\cos(k\arccos x)|\le 1$ for all $x\in[-1,1]$, we obtain $\|T_k(\tilde{L}_d)\|_2\le 1$. ∎
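A numerical illustration of Prop. 4.4: on a random graph, the sketch below forms $\tilde{L} = L - I = -D^{-1/2}AD^{-1/2}$ for the symmetric normalized Laplacian (an assumed rescaling that places the spectrum in $[-1,1]$), generates $T_0(\tilde{L}),\dots,T_{20}(\tilde{L})$ with the three-term recurrence, and records their spectral norms, which should all stay at or below 1 up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 12

# Random undirected graph; L_tilde = L - I = -D^{-1/2} A D^{-1/2} (assumed rescaling).
# Isolated nodes get an all-zero row, i.e. eigenvalue 0 of L_tilde.
A = np.triu((rng.random((N, N)) < 0.3).astype(float), k=1)
A = A + A.T
deg = A.sum(axis=1)
d_inv_sqrt = np.zeros(N)
d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
L_tilde = -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])

eigs = np.linalg.eigvalsh(L_tilde)

# Spectral norms of T_0(L_tilde), ..., T_20(L_tilde) via T_{k+1} = 2 L_tilde T_k - T_{k-1}.
T_prev, T_curr = np.eye(N), L_tilde.copy()
norms = [np.linalg.norm(T_prev, 2), np.linalg.norm(T_curr, 2)]
for k in range(2, 21):
    T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
    norms.append(np.linalg.norm(T_curr, 2))

print(eigs.min(), eigs.max(), max(norms))
```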

## Appendix F Proof of Proposition [4.5](https://arxiv.org/html/2605.12699#S4.Thmtheorem5)

###### Proof\.

Let $f_d(\tilde{L}_d) = \sum_{k=0}^{K}\alpha_k\,T_k(\tilde{L}_d)$. By the triangle inequality and Prop. [4.4](https://arxiv.org/html/2605.12699#S4.Thmtheorem4), we have:

$$\|f_d(\tilde{L}_d)\|_2 \le \sum_{k=0}^{K}|\alpha_k|\,\|T_k(\tilde{L}_d)\|_2 \le \sum_{k=0}^{K}|\alpha_k|.$$

For any matrix-valued signal $S\in\mathbb{R}^{N\times C}$, we similarly obtain:

$$\begin{aligned} \|f_d(\tilde{L}_d)\,S\|_F &= \Big\|\sum_{k=0}^{K}\alpha_k\,T_k(\tilde{L}_d)\,S\Big\|_F \\ &\le \sum_{k=0}^{K}|\alpha_k|\,\|T_k(\tilde{L}_d)\,S\|_F \\ &\le \sum_{k=0}^{K}|\alpha_k|\,\|T_k(\tilde{L}_d)\|_2\,\|S\|_F \\ &\le \Big(\sum_{k=0}^{K}|\alpha_k|\Big)\,\|S\|_F, \end{aligned}$$

where the third line uses $\|MS\|_F\le\|M\|_2\,\|S\|_F$ and the last line again uses Prop. [4.4](https://arxiv.org/html/2605.12699#S4.Thmtheorem4). ∎
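The bounds of Prop. 4.5 can be checked on a random instance. The sketch below assumes the same rescaling $\tilde{L} = L - I$ of the symmetric normalized Laplacian, draws arbitrary coefficients $\alpha$ and a random signal $S\in\mathbb{R}^{N\times C}$, and compares the spectral norm and the Frobenius gain of $f(\tilde{L})$ with $\|\alpha\|_1$; the helper `cheb_filter` is hypothetical.

```python
import numpy as np


def cheb_filter(L_tilde, alpha):
    """Return sum_k alpha[k] * T_k(L_tilde) using T_{k+1} = 2 L_tilde T_k - T_{k-1}."""
    n = L_tilde.shape[0]
    T_prev, T_curr = np.eye(n), L_tilde.copy()
    out = alpha[0] * T_prev
    if len(alpha) > 1:
        out = out + alpha[1] * T_curr
    for k in range(2, len(alpha)):
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out = out + alpha[k] * T_curr
    return out


rng = np.random.default_rng(4)
N, C, K = 15, 3, 5
A = np.triu((rng.random((N, N)) < 0.3).astype(float), k=1)
A = A + A.T
deg = A.sum(axis=1)
d_inv_sqrt = np.zeros(N)
d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
L_tilde = -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])  # L - I (assumed rescaling)

alpha = rng.standard_normal(K + 1)  # arbitrary filter coefficients
S = rng.standard_normal((N, C))     # matrix-valued signal

F = cheb_filter(L_tilde, alpha)
l1_bound = np.abs(alpha).sum()
spectral_norm = np.linalg.norm(F, 2)
gain = np.linalg.norm(F @ S, "fro") / np.linalg.norm(S, "fro")
print(spectral_norm, gain, l1_bound)
```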

## Appendix G Proof of Corollary [4.6](https://arxiv.org/html/2605.12699#S4.Thmtheorem6)

###### Proof\.

By submultiplicativity of the spectral norm, we have:

$$\|\hat{L}_d\|_2 = \|f_d^{\mathcal{L}}(\tilde{L}_d)\,f_d^{\mathcal{H}}(\tilde{L}_d)\|_2 \le \|f_d^{\mathcal{L}}(\tilde{L}_d)\|_2\;\|f_d^{\mathcal{H}}(\tilde{L}_d)\|_2.$$

Applying Prop. [4.5](https://arxiv.org/html/2605.12699#S4.Thmtheorem5) to each factor yields:

$$\|f_d^{\mathcal{L}}(\tilde{L}_d)\|_2 \le \sum_{k=0}^{K}|\theta_k^{\mathcal{L},d}|,\qquad \|f_d^{\mathcal{H}}(\tilde{L}_d)\|_2 \le \sum_{k=0}^{K}|\theta_k^{\mathcal{H},d}|,$$

which proves the stated bound on $\|\hat{L}_d\|_2$.

For any $S\in\mathbb{R}^{N\times C}$, we use $\|\hat{L}_d\,S\|_F \le \|\hat{L}_d\|_2\,\|S\|_F$ to obtain the Frobenius inequality in the corollary.

For the coefficient bound, Prop. [4.3](https://arxiv.org/html/2605.12699#S4.Thmtheorem3) expresses $\hat{L}_d$ as a degree-$2K$ Chebyshev expansion with coefficients obtained by summing contributions of the form $\frac{1}{2}\theta_i^{\mathcal{L},d}\theta_j^{\mathcal{H},d}$ (each pair $(i,j)$ contributes to at most two terms: $T_{i+j}$ and $T_{|i-j|}$). By the triangle inequality, the $\ell_1$ norm of the resulting coefficient vector $\bar{\theta}^{\,d}$ satisfies:

$$\begin{aligned} \|\bar{\theta}^{\,d}\|_1 &\le \frac{1}{2}\sum_{i=0}^{K}\sum_{j=0}^{K}\big|\theta_i^{\mathcal{L},d}\theta_j^{\mathcal{H},d}\big| + \frac{1}{2}\sum_{i=0}^{K}\sum_{j=0}^{K}\big|\theta_i^{\mathcal{L},d}\theta_j^{\mathcal{H},d}\big| \\ &= \Big(\sum_{i=0}^{K}|\theta_i^{\mathcal{L},d}|\Big)\Big(\sum_{j=0}^{K}|\theta_j^{\mathcal{H},d}|\Big) = \|\theta^{\mathcal{L},d}\|_1\,\|\theta^{\mathcal{H},d}\|_1. \end{aligned}$$

∎
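The bounds of Corollary 4.6 can be illustrated without building a graph. In the sketch below, random coefficients stand in for $\theta^{\mathcal{L},d}$ and $\theta^{\mathcal{H},d}$, `numpy.polynomial.chebyshev.chebmul` produces the merged coefficients $\bar{\theta}^{\,d}$, and the product response is evaluated on a dense grid of $[-1,1]$. For any symmetric $\tilde{L}_d$ with spectrum in $[-1,1]$, $\|\hat{L}_d\|_2$ is the maximum of this response over the eigenvalues, so it cannot exceed the supremum over $[-1,1]$, which the grid maximum approximates from below.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

rng = np.random.default_rng(5)
K = 3
theta_low = rng.standard_normal(K + 1)   # stand-ins for learned coefficients
theta_high = rng.standard_normal(K + 1)
theta_bar = cheb.chebmul(theta_low, theta_high)  # merged degree-2K coefficients

l1_bar = np.abs(theta_bar).sum()
l1_product = np.abs(theta_low).sum() * np.abs(theta_high).sum()

# Grid maximum of |f_L(lambda) * f_H(lambda)| over [-1, 1].
grid = np.linspace(-1.0, 1.0, 2001)
sup_response = np.max(np.abs(cheb.chebval(grid, theta_low) * cheb.chebval(grid, theta_high)))
print(sup_response, l1_bar, l1_product)
```

Typically the merged $\ell_1$ norm is strictly smaller than the product bound, because contributions of opposite sign partially cancel.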

## Appendix H Proof of Proposition [4.7](https://arxiv.org/html/2605.12699#S4.Thmtheorem7)

###### Proof\.

We first prove the Frobenius bound. For any row vector $s \in \mathbb{R}^C$, let $p = \mathrm{softmax}(s) \in \mathbb{R}^C$, i.e., $p_c = \exp(s_c)/\sum_{c'=1}^{C}\exp(s_{c'})$. Then $p_c \ge 0$ and $\sum_{c=1}^{C} p_c = 1$, hence $\|p\|_2 \le \|p\|_1 = 1$. Applying this row-wise to $Y = \mathrm{softmax}(S)$ gives:

$$
\|Y\|_F^2 = \sum_{i=1}^{N}\|Y_{i:}\|_2^2 \le \sum_{i=1}^{N} 1 = N,
$$

so $\|Y\|_F \le \sqrt{N}$.
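A quick numerical sanity check of the Frobenius bound follows. It is a minimal sketch assuming a row-wise softmax over an $N \times C$ score matrix; the helper `softmax_rows` and the random scores are placeholders, not part of the paper.

```python
import numpy as np


def softmax_rows(S):
    """Row-wise softmax; subtracting each row's max avoids overflow."""
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)


rng = np.random.default_rng(2)
N, C = 50, 7
S = 5.0 * rng.standard_normal((N, C))    # hypothetical score matrix
Y = softmax_rows(S)

row_norms = np.linalg.norm(Y, axis=1)    # each row lies in the probability simplex
print(row_norms.max(), "<= 1;", np.linalg.norm(Y, "fro"), "<=", np.sqrt(N))
```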

We now prove the Lipschitz bound. Consider the vector softmax map $\varphi : \mathbb{R}^C \to \mathbb{R}^C$, $\varphi(s) = \mathrm{softmax}(s)$. Its Jacobian at $s$ is:

$$
J(s) = \nabla\varphi(s) = \mathrm{diag}(p) - p\,p^\top, \qquad p = \varphi(s).
$$

For any unit vector $v \in \mathbb{R}^C$, we have

$$
v^\top J(s)\,v = \sum_{c=1}^{C} p_c v_c^2 - \Big(\sum_{c=1}^{C} p_c v_c\Big)^2 = \mathrm{Var}(V),
$$

where $V$ is a real-valued random variable taking value $v_c$ with probability $p_c$. Since $V \in [v_{\min}, v_{\max}]$ with $v_{\min} = \min_c v_c$ and $v_{\max} = \max_c v_c$, Popoviciu's inequality yields:

$$
\mathrm{Var}(V) \le \frac{(v_{\max} - v_{\min})^2}{4}.
$$

Moreover, $v_{\max} - v_{\min} \le \max_{i,j}|v_i - v_j| \le \sqrt{2}\,\|v\|_2 = \sqrt{2}$, because $|v_i - v_j| = |(e_i - e_j)^\top v| \le \|e_i - e_j\|_2\,\|v\|_2 \le \sqrt{2}\,\|v\|_2$, where $e_i$ denotes the $i$-th standard basis vector. Therefore,

$$
v^\top J(s)\,v \le \frac{(\sqrt{2})^2}{4} = \frac{1}{2} \qquad \forall v : \|v\|_2 = 1.
$$

Since $J(s)$ is symmetric and positive semidefinite (its quadratic form equals $\mathrm{Var}(V) \ge 0$), its spectral norm equals $\max_{\|v\|_2 = 1} v^\top J(s)\,v$; hence $\|J(s)\|_2 \le 1/2$ for all $s \in \mathbb{R}^C$. By the mean value inequality, for any $s, s' \in \mathbb{R}^C$, we obtain:

$$
\|\varphi(s) - \varphi(s')\|_2 \le \sup_{\tau\in[0,1]}\big\|J\big(s' + \tau(s - s')\big)\big\|_2\,\|s - s'\|_2 \le \frac{1}{2}\,\|s - s'\|_2.
$$

Applying this row-wise to $Y = \mathrm{softmax}(S)$ and $Y' = \mathrm{softmax}(S')$ yields:

$$
\begin{aligned}
\|Y - Y'\|_F^2 &= \sum_{i=1}^{N}\|\varphi(S_{i:}) - \varphi(S'_{i:})\|_2^2\\
&\le \sum_{i=1}^{N}\Big(\frac{1}{2}\|S_{i:} - S'_{i:}\|_2\Big)^2 = \frac{1}{4}\,\|S - S'\|_F^2,
\end{aligned}
$$

hence $\|Y - Y'\|_F \le \frac{1}{2}\|S - S'\|_F$. ∎
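Both steps of the Lipschitz argument can be probed numerically. The sketch below is an illustration under stated assumptions (row-wise softmax, random placeholder logits), not the paper's code. It checks the Jacobian formula against central finite differences and records the largest eigenvalue of $J(s)$ over random logits. It also shows that splitting the mass evenly between two classes essentially attains the constant $1/2$, and records the worst observed ratio $\|Y - Y'\|_F / \|S - S'\|_F$. The helper names `softmax` and `softmax_jacobian` are hypothetical.

```python
import numpy as np


def softmax(S):
    """Softmax along the last axis (works for a vector or row-wise on a matrix)."""
    E = np.exp(S - S.max(axis=-1, keepdims=True))
    return E / E.sum(axis=-1, keepdims=True)


def softmax_jacobian(s):
    """J(s) = diag(p) - p p^T with p = softmax(s) for a single logit vector s."""
    p = softmax(s)
    return np.diag(p) - np.outer(p, p)


rng = np.random.default_rng(3)
C = 6

# 1) Closed-form Jacobian vs. central finite differences at one point.
s0, eps = rng.standard_normal(C), 1e-6
fd = np.column_stack([(softmax(s0 + eps * e) - softmax(s0 - eps * e)) / (2 * eps) for e in np.eye(C)])
jacobian_ok = np.allclose(fd, softmax_jacobian(s0), atol=1e-7)

# 2) J is symmetric PSD, so ||J||_2 is its largest eigenvalue; track the maximum.
largest_eig = max(np.linalg.eigvalsh(softmax_jacobian(3.0 * rng.standard_normal(C))).max()
                  for _ in range(1000))

# 3) p close to (1/2, 1/2, 0, 0) essentially attains the constant 1/2.
near_tight = np.linalg.eigvalsh(softmax_jacobian(np.array([0.0, 0.0, -50.0, -50.0]))).max()

# 4) Row-wise Lipschitz ratio ||Y - Y'||_F / ||S - S'||_F on random N x C pairs.
N = 20
worst_ratio = 0.0
for _ in range(500):
    S = 4.0 * rng.standard_normal((N, C))
    S_prime = S + rng.uniform(1e-3, 3.0) * rng.standard_normal((N, C))
    ratio = np.linalg.norm(softmax(S) - softmax(S_prime)) / np.linalg.norm(S - S_prime)
    worst_ratio = max(worst_ratio, ratio)

print(jacobian_ok, largest_eig, near_tight, worst_ratio)
```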

## Appendix I Proof of Proposition [4.8](https://arxiv.org/html/2605.12699#S4.Thmtheorem8)

###### Proof\.

Because $f_d^{\mathcal{L}}(\tilde{L}_d)$ and $f_d^{\mathcal{H}}(\tilde{L}_d)$ are spectral filters of $\tilde{L}_d$, they diagonalize in the eigenbasis of $\tilde{L}_d$:

$$
f_d^{\mathcal{L}}(\tilde{L}_d) = U_d\, f_d^{\mathcal{L}}(\tilde{\Lambda}_d)\, U_d^\top, \qquad f_d^{\mathcal{H}}(\tilde{L}_d) = U_d\, f_d^{\mathcal{H}}(\tilde{\Lambda}_d)\, U_d^\top.
$$

Hence,

$$
f_d(\tilde{L}_d) = f_d^{\mathcal{L}}(\tilde{L}_d)\, f_d^{\mathcal{H}}(\tilde{L}_d) = U_d\,\mathrm{diag}\Big(f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})\, f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})\Big)_{i=1}^{N}\, U_d^\top.
$$

Also, $P_{\Omega,d} = U_d\,\mathrm{diag}(\mathbf{1}_{i\in\Omega})\,U_d^\top$. Let $\hat{x} = U_d^\top x$. Since $U_d$ is orthogonal, and therefore preserves Euclidean norms, we have:

$$
\begin{aligned}
\|P_{\Omega,d}\, f_d(\tilde{L}_d)\, x\|_2^2 &= \sum_{i\in\Omega}\big|f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})\, f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})\big|^2\,|\hat{x}_i|^2\\
&\le \Big(\max_{i\in\Omega}\big|f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})\, f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})\big|^2\Big)\sum_{i\in\Omega}|\hat{x}_i|^2\\
&= \Big(\max_{i\in\Omega}\big|f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})\, f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})\big|^2\Big)\,\|P_{\Omega,d}\, x\|_2^2.
\end{aligned}
$$

Taking square roots yields the first inequality in Prop. [4.8](https://arxiv.org/html/2605.12699#S4.Thmtheorem8). The second inequality follows immediately from:

$$
\max_{i\in\Omega_{\mathrm{high}}}\big|f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})\, f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})\big| \le \Big(\max_{i\in\Omega_{\mathrm{high}}}|f_d^{\mathcal{L}}(\tilde{\lambda}_d^{(i)})|\Big)\Big(\max_{i\in\Omega_{\mathrm{high}}}|f_d^{\mathcal{H}}(\tilde{\lambda}_d^{(i)})|\Big).
$$

∎
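A small numerical illustration of both inequalities follows. It is a sketch only. The operator, the band $\Omega_{\mathrm{high}}$ (taken here as the top quarter of the spectrum), and the two response functions `f_low` and `f_high` are hypothetical choices, not HAAM's learned Chebyshev filters. The argument only needs both filters to act through the same eigenbasis.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 40
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
L_tilde = Q @ np.diag(rng.uniform(-1.0, 1.0, size=N)) @ Q.T  # hypothetical operator
lam, U = np.linalg.eigh(L_tilde)                               # ascending eigenvalues


def f_low(x):
    """Hypothetical low-pass response: largest at the bottom of [-1, 1]."""
    return np.exp(-2.0 * (x + 1.0))


def f_high(x):
    """Hypothetical high-pass response: grows toward the top of [-1, 1]."""
    return 0.5 * (1.0 + x)


def spectral(values):
    """Build U diag(values) U^T in the eigenbasis of L_tilde."""
    return U @ np.diag(values) @ U.T


F = spectral(f_low(lam)) @ spectral(f_high(lam))   # f_d = f^L f^H
omega = np.arange(N) >= 3 * N // 4                 # top quarter of the spectrum
P = spectral(omega.astype(float))                  # projector P_{Omega,d}

x = rng.standard_normal(N)
lhs = np.linalg.norm(P @ F @ x)
first = np.abs(f_low(lam[omega]) * f_high(lam[omega])).max() * np.linalg.norm(P @ x)
second = np.abs(f_low(lam[omega])).max() * np.abs(f_high(lam[omega])).max() * np.linalg.norm(P @ x)
print(lhs, "<=", first, "<=", second)
```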

## Appendix J Proof of Lemma [1](https://arxiv.org/html/2605.12699#Thmlemma1)

###### Proof\.

Recall the softmax cross-entropy written in terms of logits $z \in \mathbb{R}^C$ and a one-hot label $y \in \{0,1\}^C$:

$$
\ell(z, y) = \log\Big(\sum_{c=1}^{C} e^{z_c}\Big) - \sum_{c=1}^{C} y_c z_c.
$$

Let $c^\star$ be the true class, so that $y_{c^\star} = 1$ and $y_c = 0$ for $c \ne c^\star$. Then, we have:

$$
\ell(z, y) = \log\Big(\sum_{c=1}^{C} e^{z_c}\Big) - z_{c^\star}.
$$

Its gradient with respect to $z$ is:

$$
\nabla_z \ell(z, y) = \mathrm{softmax}(z) - y.
$$

Write $p = \mathrm{softmax}(z)$. Since $p$ lies in the probability simplex, $\|p\|_2^2 = \sum_c p_c^2 \le \sum_c p_c = 1$ and $p_{c^\star} \ge 0$; together with $\|y\|_2 = 1$, this gives

$$
\|\nabla_z \ell(z, y)\|_2^2 = \|p - y\|_2^2 = \|p\|_2^2 - 2p_{c^\star} + \|y\|_2^2 \le 2, \qquad\text{so}\qquad \|\nabla_z \ell(z, y)\|_2 \le \sqrt{2}.
$$

A bounded gradient implies Lipschitzness: for all $z, z' \in \mathbb{R}^C$,

$$
|\ell(z, y) - \ell(z', y)| \le \sup_{\xi}\|\nabla \ell(\xi, y)\|_2\,\|z - z'\|_2 \le \sqrt{2}\,\|z - z'\|_2.
$$

This proves the $\sqrt{2}$-Lipschitz claim.
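The gradient formula and the $\sqrt{2}$ constant can be checked with a few lines of NumPy. This is a sketch assuming a single example with integer label $c^\star$; the helpers `cross_entropy` and `grad_cross_entropy` are hypothetical names. It compares the closed-form gradient against central finite differences and records the largest gradient norm and difference quotient over random logits.

```python
import numpy as np


def cross_entropy(z, c_star):
    """log(sum_c exp(z_c)) - z[c_star], with the usual max-shift for stability."""
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[c_star]


def grad_cross_entropy(z, c_star):
    """Closed form: softmax(z) - y, with y the one-hot vector of c_star."""
    p = np.exp(z - z.max())
    p /= p.sum()
    p[c_star] -= 1.0
    return p


rng = np.random.default_rng(6)
C = 8

# Finite-difference check of the gradient at one random point.
z0, c0, eps = rng.standard_normal(C), 2, 1e-6
fd = np.array([(cross_entropy(z0 + eps * e, c0) - cross_entropy(z0 - eps * e, c0)) / (2 * eps)
               for e in np.eye(C)])

max_grad, max_quotient = 0.0, 0.0
for _ in range(2000):
    z, z2, c = 4.0 * rng.standard_normal(C), 4.0 * rng.standard_normal(C), rng.integers(C)
    max_grad = max(max_grad, np.linalg.norm(grad_cross_entropy(z, c)))
    quotient = abs(cross_entropy(z, c) - cross_entropy(z2, c)) / np.linalg.norm(z - z2)
    max_quotient = max(max_quotient, quotient)

print(np.allclose(fd, grad_cross_entropy(z0, c0), atol=1e-6), max_grad, max_quotient)
```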

For boundedness, if $\|z\|_\infty \le B_z$, then $\max_c z_c \le B_z$ and $\min_c z_c \ge -B_z$. Therefore, we obtain:

$$
\log\Big(\sum_{c=1}^{C} e^{z_c}\Big) \le \log\big(C\,e^{B_z}\big) = \log(C) + B_z, \qquad -z_{c^\star} \le B_z,
$$

which yields $\ell(z, y) \le \log(C) + 2B_z$. ∎
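The boundedness step can be illustrated as follows, assuming logits constrained to $\|z\|_\infty \le B_z$; the values of $C$ and $B_z$ are arbitrary. For a fixed true class, the loss increases in every other logit and decreases in $z_{c^\star}$, so the worst case puts the true class at $-B_z$ and every other class at $+B_z$. Its loss stays below $\log(C) + 2B_z$ by roughly $\log\frac{C}{C-1}$, because only $C-1$ of the terms in the log-sum-exp reach $e^{B_z}$.

```python
import numpy as np


def cross_entropy(z, c_star):
    """Stable softmax cross-entropy for a single logit vector."""
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[c_star]


C, B_z = 10, 3.0
bound = np.log(C) + 2.0 * B_z

# Adversarial logits: true class (index 0) at -B_z, all others at +B_z.
z_worst = np.full(C, B_z)
z_worst[0] = -B_z
worst = cross_entropy(z_worst, 0)

# Random logits inside the box never exceed the adversarial case.
rng = np.random.default_rng(7)
sampled = max(cross_entropy(rng.uniform(-B_z, B_z, size=C), rng.integers(C)) for _ in range(1000))
print(sampled, "<=", worst, "<=", bound)
```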

## Appendix K Proof of Proposition [4.9](https://arxiv.org/html/2605.12699#S4.Thmtheorem9)

###### Proof\.

Fix a labeled sample $S = \{(x_i, y_i)\}_{i=1}^{n}$ and let $\Sigma \in \{-1,+1\}^{n\times C}$ be a matrix of i.i.d. Rademacher signs. For any matrix $Z \in \mathbb{R}^{n\times C}$, denote the Frobenius inner product by $\langle \Sigma, Z\rangle := \mathrm{tr}(\Sigma^\top Z)$.

For dimension $d$, the score matrix on the sample can be written as

$$
Z_d = A_d\, Z_0\, H_d,
$$

where $Z_0 \in \mathbb{R}^{n\times C}$ collects the MLP outputs on the sample and $A_d$ is the (sample-restricted) linear propagation operator induced by $\hat{L}_d$. Since restricting a matrix to a subset of its rows and columns cannot increase its spectral norm, we have $\|A_d\|_2 \le \|\hat{L}_d\|_2$.

By the definition of the empirical Rademacher complexity,

$$
\begin{aligned}
\mathfrak{R}_n(\mathcal{F}_d) &= \frac{1}{n}\,\mathbb{E}_\Sigma\Big[\sup_{Z_0\in\mathcal{F}_0(S)}\langle \Sigma,\, A_d Z_0 H_d\rangle\Big]\\
&= \frac{1}{n}\,\mathbb{E}_\Sigma\Big[\sup_{Z_0\in\mathcal{F}_0(S)}\mathrm{tr}(\Sigma^\top A_d Z_0 H_d)\Big]\\
&= \frac{1}{n}\,\mathbb{E}_\Sigma\Big[\sup_{Z_0\in\mathcal{F}_0(S)}\mathrm{tr}\big((A_d^\top \Sigma H_d^\top)^\top Z_0\big)\Big]\\
&= \frac{1}{n}\,\mathbb{E}_\Sigma\Big[\sup_{Z_0\in\mathcal{F}_0(S)}\langle A_d^\top \Sigma H_d^\top,\, Z_0\rangle\Big].
\end{aligned}
$$

Using the Cauchy–Schwarz inequality for the Frobenius inner product,

$$
\langle A_d^\top \Sigma H_d^\top,\, Z_0\rangle \le \|A_d^\top \Sigma H_d^\top\|_F\,\|Z_0\|_F.
$$

Moreover, using the mixed-norm inequality $\|AMB\|_F \le \|A\|_2\,\|M\|_F\,\|B\|_2$ (a consequence of submultiplicativity), we can bound the transformed Rademacher signs as:

$$
\|A_d^\top \Sigma H_d^\top\|_F \le \|A_d^\top\|_2\,\|\Sigma\|_F\,\|H_d^\top\|_2 = \|A_d\|_2\,\|H_d\|_2\,\|\Sigma\|_F.
$$

This shows that the supremum over the transformed class is scaled by at most $\|A_d\|_2\,\|H_d\|_2$. Since $\|A_d\|_2 \le \|\hat{L}_d\|_2$, we obtain:

$$
\mathfrak{R}_n(\mathcal{F}_d) \le \|\hat{L}_d\|_2\,\|H_d\|_2\,\mathfrak{R}_n(\mathcal{F}_0),
$$

which is exactly Eq. ([22](https://arxiv.org/html/2605.12699#S4.E22)). ∎
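The two ingredients of this proof, the trace manipulation and the norm bounds, can be checked numerically. The sketch below is illustrative only. For concreteness it assumes that $A_d$ is the principal submatrix of a symmetric $\hat{L}_d$ indexed by the labeled nodes, and it uses random placeholders for $\hat{L}_d$, $Z_0$, $H_d$, and the labeled set. The helpers `frob_inner` and `spectral_norm` are hypothetical names.

```python
import numpy as np


def frob_inner(X, Y):
    """Frobenius inner product <X, Y> = tr(X^T Y)."""
    return np.trace(X.T @ Y)


def spectral_norm(M):
    """Largest singular value of M."""
    return np.linalg.norm(M, 2)


rng = np.random.default_rng(8)
N, n, C = 60, 15, 5
L_hat = rng.standard_normal((N, N))
L_hat = 0.5 * (L_hat + L_hat.T)                 # hypothetical symmetric operator
labeled = rng.choice(N, size=n, replace=False)  # hypothetical labeled nodes
A = L_hat[np.ix_(labeled, labeled)]             # sample-restricted operator A_d
Z0 = rng.standard_normal((n, C))                # stand-in for the MLP outputs
H = rng.standard_normal((C, C))                 # stand-in for H_d
Sigma = rng.choice([-1.0, 1.0], size=(n, C))    # i.i.d. Rademacher signs

# Cyclic property of the trace: <Sigma, A Z0 H> = <A^T Sigma H^T, Z0>.
trace_ok = np.isclose(frob_inner(Sigma, A @ Z0 @ H), frob_inner(A.T @ Sigma @ H.T, Z0))

# Submatrix bound and mixed-norm inequality ||A^T Sigma H^T||_F <= ||A||_2 ||H||_2 ||Sigma||_F.
sub_ok = spectral_norm(A) <= spectral_norm(L_hat)
mixed_lhs = np.linalg.norm(A.T @ Sigma @ H.T, "fro")
mixed_rhs = spectral_norm(A) * spectral_norm(H) * np.linalg.norm(Sigma, "fro")
print(trace_ok, sub_ok, mixed_lhs <= mixed_rhs)
```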

## Appendix L Proof of Theorem [4.10](https://arxiv.org/html/2605.12699#S4.Thmtheorem10)

###### Proof\.

Let $\mathcal{G}$ denote the class of dimension-averaged predictors induced by HAAM, and consider the loss $\ell$ in Eq. ([13](https://arxiv.org/html/2605.12699#S4.E13)). Under the i.i.d. sampling assumption, a standard Rademacher generalization result for bounded losses implies that, with probability at least $1-\delta$,

$$
\mathcal{R} \le \widehat{\mathcal{R}} + 2\,\mathfrak{R}_n(\ell\circ\mathcal{G}) + 3\,B_{\max}\sqrt{\frac{\log(2/\delta)}{2n}},
$$

where $\mathcal{R}$ and $\widehat{\mathcal{R}}$ denote the expected and empirical risks, $\mathfrak{R}_n(\ell\circ\mathcal{G})$ is the empirical Rademacher complexity of the loss-composed class, and $B_{\max}$ upper bounds the loss range.

By Lemma [1](https://arxiv.org/html/2605.12699#Thmlemma1), $\ell(\cdot, y)$ is $\sqrt{2}$-Lipschitz in its logits argument with respect to $\|\cdot\|_2$. Therefore, by the vector contraction principle, we have:

$$
\mathfrak{R}_n(\ell\circ\mathcal{G}) \le \frac{\sqrt{2}}{D}\sum_{d=1}^{D}\mathfrak{R}_n(\mathcal{F}_d).
$$

Finally, applying Proposition [4.9](https://arxiv.org/html/2605.12699#S4.Thmtheorem9) to each $\mathfrak{R}_n(\mathcal{F}_d)$ yields

$$
\mathfrak{R}_n(\ell\circ\mathcal{G}) \le \frac{\sqrt{2}}{D}\Big(\sum_{d=1}^{D}\|\hat{L}_d\|_2\,\|H_d\|_2\Big)\,\mathfrak{R}_n(\mathcal{F}_0).
$$

Substituting this inequality into the generalization bound above proves Eq. ([23](https://arxiv.org/html/2605.12699#S4.E23)). ∎
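Substituting the pieces gives a right-hand side that is easy to evaluate. The helper below is a sketch assembled from the proof steps above, not a function from the paper, and its name `haam_risk_bound` is hypothetical. It assumes the loss range $B_{\max} = \log(C) + 2B_z$ from Lemma 1, which is valid when the logits satisfy $\|z\|_\infty \le B_z$ since the cross-entropy is nonnegative. It takes $\widehat{\mathcal{R}}$, $\mathfrak{R}_n(\mathcal{F}_0)$, and the per-dimension norms as given inputs; each $\|\hat{L}_d\|_2$ could itself be replaced by the corollary's bound $\|\theta^{\mathcal{L},d}\|_1\|\theta^{\mathcal{H},d}\|_1$. All numeric values are made up.

```python
import math


def haam_risk_bound(emp_risk, L_hat_norms, H_norms, rad_f0, n, delta, num_classes, B_z):
    """R <= R_hat + 2 * (sqrt(2)/D) * sum_d ||L_hat_d||_2 ||H_d||_2 * Rad(F_0)
           + 3 * B_max * sqrt(log(2/delta) / (2n)),  with B_max = log(C) + 2 B_z."""
    D = len(L_hat_norms)
    B_max = math.log(num_classes) + 2.0 * B_z
    rad_loss = (math.sqrt(2.0) / D) * sum(l * h for l, h in zip(L_hat_norms, H_norms)) * rad_f0
    confidence = 3.0 * B_max * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return emp_risk + 2.0 * rad_loss + confidence


# Hypothetical inputs for D = 2 dimensions; only n differs between the two calls.
small_n = haam_risk_bound(0.2, [1.0, 2.0], [1.0, 1.0], 0.1, n=1_000, delta=0.05, num_classes=5, B_z=2.0)
large_n = haam_risk_bound(0.2, [1.0, 2.0], [1.0, 1.0], 0.1, n=10_000, delta=0.05, num_classes=5, B_z=2.0)
print(small_n, large_n)
```

Here $\mathfrak{R}_n(\mathcal{F}_0)$ is held fixed for simplicity, so only the confidence term changes with $n$.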
Similar Articles

Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation

Towards Robust Federated Multimodal Graph Learning under Modality Heterogeneity

Heterogeneous Graph Condensation via Role-Aware Clustering

Modeling Spectral Energy Shifts in Spatio-Temporal Graph Anomaly Detection

Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs
