TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection
Summary
TERGAD is a novel data augmentation framework that uses large language models to translate node-level topological properties into semantic narratives, then fuses these with original node attributes via a gated dual-branch autoencoder for graph anomaly detection, achieving state-of-the-art results on six datasets.
# TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection
Source: [https://arxiv.org/html/2605.19738](https://arxiv.org/html/2605.19738)
Wen Shi, Zhe Wang, Huafei Huang, Qing Qing, Ziqi Xu, Qixin Zhang, Xikun Zhang, Renqiang Luo, Feng Xia

Shi Wen, Zhe Wang, Qing Qing, and Renqiang Luo are with College of Computer Science and Technology, Jilin University, Changchun 130012, China ({shiwen24, qingqing25}@mails.jlu.edu.cn, {wz2000, lrenqiang}@jlu.edu.cn). Huafei Huang is with the School of Computer Science and Information Technology, Adelaide University, Adelaide SA5095, Australia (e-mail: hhuafei@outlook.com). Ziqi Xu, Xikun Zhang, and Feng Xia are with the School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia (e-mail: {ziqi.xu, xikun.zhang}@rmit.edu.au, f.xia@ieee.org). Qixin Zhang is with College of Computing and Data Science, Nanyang Technological University, 639798, Singapore (e-mail: qixin.zhang2026@gmail.com). Corresponding authors: Xikun Zhang, Renqiang Luo.
###### Abstract
Graph Anomaly Detection (GAD) aims to identify atypical graph entities, such as nodes, edges, or substructures, that deviate significantly from the majority. While existing text-rich approaches typically rely on raw textual features in the data representation pipeline, they often neglect the structural context of nodes. This limitation hinders their ability to detect sophisticated anomalies arising from inconsistencies between a node's inherent content and its topological role. To bridge this gap, we propose TERGAD (Structure-aware Text-enhanced Representations for Graph Anomaly Detection), a novel data augmentation framework that enriches structural semantics for GAD via the semantic reasoning capabilities of Large Language Models (LLMs). Specifically, TERGAD translates node-level topological properties into descriptive natural language narratives, which are subsequently processed by an LLM to derive high-level semantic embeddings. These embeddings are then adaptively fused with original node attributes through a gated dual-branch autoencoder to jointly reconstruct both graph structure and node features. The anomaly score is computed based on the integrated reconstruction error, effectively capturing deviations in both observable attributes and LLM-informed semantic expectations. Extensive experiments on six real-world datasets demonstrate that TERGAD consistently outperforms state-of-the-art baselines. Furthermore, our ablation studies validate the indispensable role of structural semantic guidance and the efficacy of the gated fusion mechanism. Code is available at https://github.com/Kantorakitty/TERGAD-main.
## I Introduction
Graph Anomaly Detection \(GAD\) aims to identify atypical graph objects, such as nodes, edges, or substructures, that deviate significantly from majority patterns in graph databases\[[33](https://arxiv.org/html/2605.19738#bib.bib33)\]\. This task has become indispensable in high\-stakes domains, including financial fraud prevention\[[14](https://arxiv.org/html/2605.19738#bib.bib14)\], cyber intrusion detection\[[29](https://arxiv.org/html/2605.19738#bib.bib32)\], and social network spam filtering\[[25](https://arxiv.org/html/2605.19738#bib.bib27)\]\. The growing interconnectedness of real\-world data and rapid advancements in graph data mining have significantly boosted interest in GAD over the past decade\[[15](https://arxiv.org/html/2605.19738#bib.bib15)\]\. A key shift has been the transition from reliance on human expertise to traditional machine learning, and more recently, to sophisticated deep learning techniques\[[6](https://arxiv.org/html/2605.19738#bib.bib7)\]\. These deep learning approaches enhance detection performance by learning complex, non\-linear patterns directly from massive datasets through end\-to\-end data processing pipelines without manual feature engineering\[[16](https://arxiv.org/html/2605.19738#bib.bib16)\]\.
In many real\-world scenarios, however, graphs are “text\-poor” or purely structural, consisting only of nodes and edges without rich descriptive attributes\. This lack of explicit semantic information poses a significant challenge, as anomalous nodes often hide within subtle structural patterns that are difficult to distinguish using numerical features alone\[[24](https://arxiv.org/html/2605.19738#bib.bib26),[12](https://arxiv.org/html/2605.19738#bib.bib13)\]\. Reconstruction\-based analysis has become a common approach for detecting these anomalies, identifying nodes that the model fails to reconstruct accurately from learned representations\. However, existing methods often struggle to interpret the functional meaning of a node’s position within the data representation pipeline, as they rely on raw adjacency matrices or basic degree statistics\[[30](https://arxiv.org/html/2605.19738#bib.bib31)\]\. This limitation stems from an inability to translate abstract topological patterns into semantically rich representations that provide a higher\-level understanding of node behavior\. For example, a node with a high clustering coefficient might be a normal hub in one community but a malicious bridge in another, a nuance that numerical feature engineering pipelines often miss\.
To overcome the sparsity of node attributes, we explore the potential of transforming structural patterns into text\-rich representations via Large Language Models \(LLMs\) as a data augmentation strategy\[[11](https://arxiv.org/html/2605.19738#bib.bib12)\]\. A growing paradigm involves converting graph topology into natural language descriptions, allowing the model to leverage the vast reasoning capabilities of LLMs for structural understanding\. For instance, recent works have attempted to prompt LLMs to produce explanations or labels based on graph connectivity\[[8](https://arxiv.org/html/2605.19738#bib.bib8),[7](https://arxiv.org/html/2605.19738#bib.bib6)\]\. These approaches demonstrate promising results in tasks that benefit from natural language reasoning and external knowledge integration\[[13](https://arxiv.org/html/2605.19738#bib.bib11)\]\. However, standard Graph2Text methods suffer from critical limitations, such as requiring LLMs to infer implicit structures from long, sequential text descriptions\. Furthermore, these methods often produce excessively long sequences that exceed the practical context windows of most LLMs, making them unscalable for large\-scale graph data\. Moreover, most current methods lack an explicit mechanism to balance these newly generated semantic priors with the original structural properties of the graph within a unified feature integration framework\. To further investigate the direct applicability of LLMs in this domain, we conducted a preliminary study evaluating vanilla LLMs on the GAD task\. The results \(in Fig\.[1](https://arxiv.org/html/2605.19738#S1.F1)\) reveal their zero\-shot performance significantly lags behind GCN\-based GAD models, suggesting that LLMs struggle to inherently grasp the underlying structural anomalies without specialized adaptation\. 
These findings underscore the non\-trivial nature of effectively leveraging LLMs for graph anomaly detection and motivate the need for a more integrated semantic\-structural approach\.
Figure 1: Performance comparison between DOMINANT, a traditional GCN-based method, and four representative LLMs. The significant performance gap motivates the need for the integrated semantic-structural approach in LLM-based GAD.

Addressing these gaps leads to a fundamental question: Can we effectively empower structural graph learning by synthesizing text-rich representations through LLMs? Answering this question requires overcoming several technical hurdles regarding efficiency and integration. First, we must determine how to effectively translate abstract graph topology into semantically rich yet computationally efficient textual descriptions within a structured data transformation pipeline. Second, it is crucial to integrate these LLM-derived semantic insights with specialized graph learning models in a complementary manner. Third, we need to design a framework that jointly optimizes both the synthesized semantic consistency and the native structural information for the anomaly detection task. Effectively balancing these multi-modal signals is essential for capturing the subtle deviations that characterize sophisticated anomalies in otherwise attribute-poor networks.
Our work addresses these challenges by connecting raw structural information with LLM-based semantic reasoning within a unified data processing framework. We propose TERGAD, a novel framework for structure-aware Text-Enhanced Representations for Graph Anomaly Detection. Specifically, our framework enriches the representation of each node by generating natural language descriptions that articulate its unique structural role within the network. This is achieved through carefully designed templates that translate complex topological metrics, such as centrality and community membership, into concise and interpretable textual roles. A gated dual-branch autoencoder then adaptively fuses these synthesized semantic embeddings with the original structural features through a learnable gating mechanism. This architecture jointly reconstructs both the synthesized semantic descriptions and the underlying graph structure, with anomaly scores reflecting deviations in both domains.
Our main contributions are summarized as follows:
- Constructing Text-Rich Graphs for Anomaly Detection. We propose TERGAD, a framework that addresses attribute sparsity by integrating explicit structural semantics derived from LLMs through a systematic semantic enrichment pipeline. By translating topology into natural language, we transform purely structural graphs into text-rich representations for more robust detection.
- Semantic-Guided Graph Encoding and Adaptive Fusion. We introduce a structured template-based approach to encode topological roles via LLMs, followed by an adaptive fusion mechanism. This allows the model to selectively integrate high-level semantic priors with raw structural data based on the specific context of each node.
- Extensive Empirical Validation. We conduct comprehensive experiments on six real-world datasets to evaluate the effectiveness of our proposed framework in practical data engineering settings. The results demonstrate that TERGAD consistently outperforms state-of-the-art GAD methods.
## II Related Work
### II-A LLMs for Graph Learning
Recent advances in LLMs have spurred growing interest in applying their reasoning capabilities to graphs[[32](https://arxiv.org/html/2605.19738#bib.bib34)]. A dominant paradigm involves converting graph topology into natural language descriptions through structured data transformation pipelines (often termed Graph2Text), which are then processed by frozen LLMs to generate semantic node features[[31](https://arxiv.org/html/2605.19738#bib.bib35)]. For instance, a unified vandalism detection system for Wikidata employs a Graph2Text approach to convert complex factual triples and multilingual edits into a single textual space, allowing a language model to evaluate both structural and content changes for potential knowledge alterations[[22](https://arxiv.org/html/2605.19738#bib.bib23)]. Similarly, GPT4Graph[[7](https://arxiv.org/html/2605.19738#bib.bib6)] systematically evaluates the structural understanding of LLMs by translating graphs into various textual formats such as adjacency lists and GraphML.
However, the Graph2Text approach suffers from two critical limitations in data processing efficiency: it forces LLMs to infer implicit graph structures from sequential text, which is inefficient compared to native graph learners, and it often produces excessively long sequences that exceed LLM context windows. To address these issues, InstructGLM[[27](https://arxiv.org/html/2605.19738#bib.bib29)] proposes instruction-finetuning LLMs directly on graph tasks using scalable natural language prompts that explicitly describe multi-hop neighborhoods. In a different vein, GraphLLM[[1](https://arxiv.org/html/2605.19738#bib.bib1)] integrates a graph transformer with an LLM via graph-enhanced prefix tuning, condensing graph information into a compact prefix and bypassing verbose conversion.
Despite these innovations, most existing methods either treat LLMs as static feature extractors or require complex multi\-stage training pipelines\. Moreover, the generated representations often lack explicit mechanisms to balance original attribute fidelity and LLM\-derived semantic priors\. Our work, TERGAD, directly addresses this gap through a gated dual\-branch architecture that adaptively fuses both modalities for robust anomaly detection within real\-world data quality assessment workflows\.
### II-B Graph Anomaly Detection
Various deep learning-based approaches have been developed to tackle the challenges of GAD. Graph Auto-Encoders (GAEs) serve as a foundational paradigm, identifying anomalies by measuring the reconstruction quality of node embeddings. For instance, AnomalyDAE[[5](https://arxiv.org/html/2605.19738#bib.bib5)] employs a dual auto-encoder architecture with attention mechanisms to capture the complex interplay between network topology and node attributes. GAAN[[3](https://arxiv.org/html/2605.19738#bib.bib2)] introduces a generative adversarial framework, integrating a generator and discriminator to detect anomalies through both reconstruction errors and discrimination confidence.
Beyond reconstruction, self-supervised learning has gained significant traction. CoLA[[10](https://arxiv.org/html/2605.19738#bib.bib10)] utilizes contrastive learning on instance pairs to enhance scalability, while CONAD[[26](https://arxiv.org/html/2605.19738#bib.bib28)] incorporates anomaly-specific prior knowledge via data augmentation and a Siamese GNN encoder. More recent works focus on local contexts and data scarcity; GAD-NR[[17](https://arxiv.org/html/2605.19738#bib.bib17)] models local structures through neighborhood reconstruction, and FIAD[[2](https://arxiv.org/html/2605.19738#bib.bib3)] alleviates label scarcity by injecting synthetic anomaly signals directly into the feature matrix. To address multi-modal inconsistencies, AHFAN[[23](https://arxiv.org/html/2605.19738#bib.bib25)] employs an attention-based module to fuse topology-driven and semantics-driven representations.
Despite these advancements, most existing GAD methods operate strictly within numerical topological and attribute spaces\. They lack the capacity to incorporate high\-level semantic knowledge regarding graph structural data, such as the functional roles nodes play, which limits their detection performance and interpretability on complex, real\-world datasets\. This semantic void in data representation motivates our exploration of LLMs to bridge the gap between raw structural data and high\-level conceptual understanding\.
## III Preliminaries
### III-A Notations
Unless otherwise specified, we adopt the following mathematical notations throughout this paper: sets are denoted by calligraphic uppercase letters (e.g., $\mathcal{A}$), matrices by bold uppercase letters (e.g., $\mathbf{A}$), and vectors by bold lowercase letters (e.g., $\mathbf{x}$).
We define a graph as $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{X})$, where $\mathcal{V}$ represents the set of $n$ nodes ($|\mathcal{V}|=n$), $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges, and $\mathbf{X}\in\mathbb{R}^{n\times d_{x}}$ denotes the node attribute matrix. The topological structure is represented by an adjacency matrix $\mathbf{A}\in\{0,1\}^{n\times n}$, where $\mathbf{A}[i,j]=1$ if an edge exists between node $i$ and node $j$, and 0 otherwise. To facilitate graph learning, we compute the normalized adjacency matrix $\tilde{\mathbf{A}}=\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$, where $\mathbf{I}$ is the identity matrix and $\mathbf{D}$ is the degree matrix of $\mathbf{A}+\mathbf{I}$.
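The normalization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `normalize_adjacency` and the toy path graph are assumptions introduced here.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # row sums of A + I (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy undirected path graph 0 - 1 - 2 (hypothetical example data)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_norm = normalize_adjacency(A)
```

Because self-loops are added before the degrees are computed, every degree is at least one, so the inverse square root is always defined, even for isolated nodes.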
Furthermore, we let $\mathbf{Z}_{\text{LLM}}\in\mathbb{R}^{n\times d_{z}}$ denote the semantic embeddings generated by LLMs from natural language descriptions of the graph. Our dual-branch encoder maps input features into two latent representations: $\mathbf{H}^{(x)}\in\mathbb{R}^{n\times h}$ from the attribute branch and $\mathbf{H}^{(z)}\in\mathbb{R}^{n\times h}$ from the LLM branch, with $h$ being the hidden dimension. These are integrated via an adaptive gate matrix $\mathbf{G}\in[0,1]^{n\times h}$ to obtain the final fused representation $\mathbf{Z}\in\mathbb{R}^{n\times h}$. The decoder then reconstructs the node attributes and adjacency matrix, denoted as $\hat{\mathbf{X}}$ and $\hat{\mathbf{A}}$ respectively. For each node $i$, the anomaly score is computed as $s_{i}$, where $\alpha\in[0,1]$ is a hyperparameter balancing the respective reconstruction losses. A complete summary of notations is provided in Table [I](https://arxiv.org/html/2605.19738#S3.T1).
TABLE I: Notations and their definitions.
### III-B Graph Anomaly Detection
Given an attributed graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{X})$, the goal of GAD is to identify a subset of nodes $\mathcal{V}_{\text{anom}}\subset\mathcal{V}$ that exhibit significant deviations from the majority in terms of structural connectivity or attribute patterns. A prevalent approach in this domain leverages Graph Autoencoders based on Graph Convolutional Networks (GCNs). In this framework, the GCN encoder utilizes graph convolution layers to learn node representations by aggregating information from local neighborhoods. A GCN layer is defined as:

$$\mathbf{H}^{(l+1)}=\sigma\left(\tilde{\mathbf{A}}\mathbf{H}^{(l)}\mathbf{W}^{(l)}\right), \tag{1}$$

where $\tilde{\mathbf{A}}=\mathbf{D}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-\frac{1}{2}}$ is the normalized adjacency matrix with self-loops, $\mathbf{H}^{(l)}$ is the matrix of node representations at layer $l$, $\mathbf{W}^{(l)}$ is the trainable weight matrix, and $\sigma$ denotes an activation function.
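As a rough sketch of the propagation rule in Eq. (1), the layer reduces to two matrix products followed by an activation. The function name `gcn_layer`, the ReLU default, and the identity placeholder for the normalized adjacency are assumptions made for illustration only.

```python
import numpy as np

def gcn_layer(a_norm, h, w, activation=lambda x: np.maximum(x, 0.0)):
    """One GCN propagation step: activation(A_norm @ H @ W)."""
    return activation(a_norm @ h @ w)

rng = np.random.default_rng(0)
n, d_in, d_out = 4, 3, 2
# For a graph with no edges, A + I = I and D = I, so the normalized adjacency is I.
a_norm = np.eye(n)
H0 = rng.normal(size=(n, d_in))      # input node representations
W0 = rng.normal(size=(d_in, d_out))  # weight matrix (random here, trainable in practice)
H1 = gcn_layer(a_norm, H0, W0)
```

With the identity adjacency each node only sees itself, so the layer degenerates to a per-node linear map; a non-trivial $\tilde{\mathbf{A}}$ mixes each node's features with those of its neighbors.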
The encoder maps nodes to their latent representations, i.e., $\mathbf{Z}=\text{Encoder}(\mathbf{X},\mathbf{A})$, and the decoder is adopted to reconstruct the original graph information, typically the adjacency matrix: $\hat{\mathbf{A}}=\text{Decoder}(\mathbf{Z})$. By minimizing the reconstruction error, the structural patterns of the graph are encoded in the GCN parameters. A common reconstruction error formulation is:

$$\mathcal{L}_{\text{struct}}=\|\mathbf{A}-\hat{\mathbf{A}}\|_{F}^{2}. \tag{2}$$

For attributed graphs, the attribute reconstruction loss may also be included:

$$\mathcal{L}_{\text{attr}}=\|\mathbf{X}-\hat{\mathbf{X}}\|_{F}^{2}. \tag{3}$$

The final anomaly score $s_{i}$ for a node $v_{i}$ is derived from its combined reconstruction errors, reflecting its deviation from the learned normal pattern. A typical scoring function is:

$$s_{i}=(1-\alpha)\,\|\mathbf{a}_{i}-\hat{\mathbf{a}}_{i}\|_{2}^{2}+\alpha\,\|\mathbf{x}_{i}-\hat{\mathbf{x}}_{i}\|_{2}^{2}, \tag{4}$$

where $\mathbf{a}_{i}$ and $\hat{\mathbf{a}}_{i}$ are the original and reconstructed adjacency vectors for node $i$, $\mathbf{x}_{i}$ and $\hat{\mathbf{x}}_{i}$ are its original and reconstructed attribute vectors, and $\alpha\in[0,1]$ is a hyperparameter balancing the two terms. Nodes with higher scores $s_{i}$ are considered more anomalous.
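The scoring function in Eq. (4) is a row-wise weighted sum of squared errors. The sketch below assumes dense NumPy arrays; the function name `anomaly_scores` and the toy reconstructions are hypothetical.

```python
import numpy as np

def anomaly_scores(A, A_hat, X, X_hat, alpha=0.5):
    """Per-node score: (1 - alpha) * ||a_i - a_hat_i||^2 + alpha * ||x_i - x_hat_i||^2."""
    struct_err = np.sum((A - A_hat) ** 2, axis=1)   # squared L2 error of each adjacency row
    attr_err = np.sum((X - X_hat) ** 2, axis=1)     # squared L2 error of each attribute row
    return (1.0 - alpha) * struct_err + alpha * attr_err

# Toy data: node 0 is reconstructed perfectly, node 1 is not.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
A_hat = np.array([[0.0, 1.0], [0.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
X_hat = np.array([[1.0, 0.0], [0.0, -1.0]])
scores = anomaly_scores(A, A_hat, X, X_hat, alpha=0.25)
ranking = np.argsort(-scores)   # most anomalous node first
```

Setting `alpha` closer to 1 makes the score depend mostly on attribute reconstruction, while values near 0 emphasize structural reconstruction.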
Figure 2: The architecture of TERGAD.
## IV The Design of TERGAD
In this section, we introduce the architecture of TERGAD \(Fig\.[2](https://arxiv.org/html/2605.19738#S3.F2)\)\. We first present the graph information extraction with LLMs and the LLM\-based semantic embedding, followed by the design of the dual\-branch autoencoder\. Finally, we define the anomaly score based on TERGAD\. Our algorithm is summarized in Algorithm[1](https://arxiv.org/html/2605.19738#alg1)\.
### IV-A Graph Structure Data Extraction with LLMs
To bridge the modality gap between graph-structured data and the textual inputs required by LLMs, we transform raw graph inputs into node-level natural language descriptions via a structured prompting template. This process begins with a comprehensive JSON-formatted intermediate representation that encodes global statistics (e.g., node/edge counts, directedness), degree distributions, high-order topological features (e.g., clustering coefficients, $k$-core numbers, centrality measures), and community structures.
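To make the intermediate representation concrete, the following sketch builds a small JSON summary for an undirected simple graph using only the standard library. The field names (`global`, `nodes`, `degree`, `triangles`, `clustering`) are assumptions for illustration; the paper does not specify its schema, and the sketch omits $k$-core numbers, centrality measures, and community structures.

```python
import json

def structural_summary(edges, n):
    """Build a JSON-serializable structural summary of an undirected simple graph."""
    nbrs = {i: set() for i in range(n)}
    for u, v in edges:
        if u != v:                       # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    nodes = {}
    for i in range(n):
        k = len(nbrs[i])
        # Each edge between two neighbours of i closes one triangle through i.
        tri = sum(1 for u in nbrs[i] for v in nbrs[i] if u < v and v in nbrs[u])
        clustering = 2.0 * tri / (k * (k - 1)) if k > 1 else 0.0
        nodes[str(i)] = {"degree": k, "triangles": tri, "clustering": clustering}
    num_edges = sum(len(s) for s in nbrs.values()) // 2
    return {
        "global": {"num_nodes": n, "num_edges": num_edges, "directed": False},
        "nodes": nodes,
    }

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
summary = structural_summary([(0, 1), (1, 2), (0, 2), (2, 3)], n=4)
payload = json.dumps(summary, indent=2)
```

Keeping the statistics in a structured form like this separates metric computation from text generation, so the wording of the template can change without recomputing graph statistics.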
Our template dynamically constructs descriptive sentences by contextualizing raw values into human\-interpretable semantics:
- Connectivity: Degree is described relative to global percentiles (e.g., “top 1% of all nodes by degree”) or absolute ranks to signify its relative importance.
- Local Cohesion: The clustering coefficient is mapped to qualitative labels (e.g., “high”, “moderate”) while retaining precise numerical values for granularity.
- Global Significance: Centrality metrics are annotated with percentile-based significance (e.g., “critical bridge node within the top 10%”).
- Topological Roles: Structural roles (e.g., hub, peripheral node, core member) are explicitly assigned based on multi-criteria heuristic thresholds.
Each description is initialized with a base identity sentence: “Node $i$ is a vertex in an undirected graph…”. It is crucial to note that the node index ($i$) serves exclusively as a persistent identifier within the current graph instance to maintain cross-modal alignment between the structural branch and the LLM branch. These IDs carry no global semantic meaning and are not utilized as features for prediction, thereby ensuring that no test-set information is leaked and the model's generalizability is preserved. This design ensures that the resulting text is fluent, self-contained, and semantically aligned with the reasoning patterns of off-the-shelf LLMs without requiring task-specific fine-tuning. A detailed illustration of the template is shown in Fig. [3](https://arxiv.org/html/2605.19738#S4.F3).
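A template of this kind might look like the sketch below, which covers only the identity sentence, connectivity, and local cohesion. The function name `describe_node`, the cohesion thresholds (0.6 and 0.3), and the exact wording are assumptions; the paper's actual template and heuristic thresholds are not reproduced here.

```python
def describe_node(node_id, n_nodes, n_edges, degree, degree_percentile, clustering):
    """Render a short structural description for one node.

    degree_percentile is the share (0-100) of nodes with degree at or below this node's.
    """
    # Map the clustering coefficient to a qualitative label (illustrative thresholds).
    if clustering >= 0.6:
        cohesion = "high"
    elif clustering >= 0.3:
        cohesion = "moderate"
    else:
        cohesion = "low"
    # Convert the percentile into a "top X%" statement, never reporting below 1%.
    top_pct = max(1, round(100 - degree_percentile))
    parts = [
        f"Node {node_id} is a vertex in an undirected graph "
        f"with {n_nodes:,} nodes and {n_edges:,} edges.",
        f"It has a degree of {degree:,}, placing it in the top {top_pct}% "
        f"of all nodes by degree.",
        f"Its local clustering coefficient is {clustering:.3f} ({cohesion}).",
    ]
    return " ".join(parts)

text = describe_node(42, 6000, 8000, 1248, 99.4, 0.756)
```

The node identifier appears only in the generated sentence, consistent with its role as an alignment key rather than a predictive feature.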
Figure 3: Structured text description template for each node.

Illustrative Example: Node 42 is a vertex in an undirected graph with 6,000 nodes and 8,000 edges. It has a degree of 1,248, placing it in the top 1% of all nodes by connectivity. This node participates in 3,892 triangles, indicating its immersion in tightly-knit local clusters. With a local clustering coefficient of 0.756, nearly all its neighbors are interconnected. It resides in the $k$-core layer 12, remaining connected even after the iterative removal of lower-degree nodes, which identifies it as a structurally robust core member. Furthermore, it belongs to community 3. Centrality Analysis: Degree centrality (0.852, top 10%) indicates high direct influence; Closeness centrality (0.723, top 10%) reflects its proximity to the network center; Betweenness centrality (0.612, top 10%) marks it as a critical bridge. In its 1-hop ego network, it maintains 1248 neighbors with an average degree of $45.32\pm 12.45$. This node is among the top 3 primary hubs. Global spectral analysis shows a Fiedler value of 0.154, indicating strong algebraic connectivity across the graph.
### IV-B LLM-Based Semantic Embedding
We encode the generated natural language descriptions into semantic embeddings using BGE-large-en-v1.5. Following the architecture, we apply last-token pooling to extract the hidden state of the last non-padding token as the sentence representation. To avoid interference with subsequent standardization and ensure effective statistical alignment, we disable the default L2 normalization.
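Last-token pooling can be illustrated independently of any particular model. The sketch below operates on a generic array of token hidden states and an attention mask; it assumes right-padded sequences with at least one real token each, and the function name `last_token_pool` is hypothetical. No L2 normalization is applied, in line with the description above.

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Select the hidden state of the last non-padding token in each sequence.

    hidden_states: float array of shape (batch, seq_len, dim)
    attention_mask: integer array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    last_idx = attention_mask.sum(axis=1) - 1          # index of the last real token
    batch_idx = np.arange(hidden_states.shape[0])
    return hidden_states[batch_idx, last_idx]          # shape (batch, dim)

# Toy batch: two sequences of length 3 with 2-dimensional hidden states.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([[1, 1, 0],     # first sequence has one padding token
                 [1, 1, 1]])    # second sequence has none
pooled = last_token_pool(hidden, mask)
```

With left-padded inputs the last real token always sits at position `seq_len - 1`, so the index computation would need to change accordingly.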
To ensure compatibility with the original node attributes and stabilize downstream autoencoder training, we then apply Z-score standardization across all nodes for each embedding dimension:

$$\mathbf{z}_{i}^{\text{norm}}=\frac{\mathbf{z}_{i}-\mu}{\sigma}, \tag{5}$$

where $\mu=\frac{1}{n}\sum_{j=1}^{n}\mathbf{z}_{j}$ and $\sigma=\sqrt{\frac{1}{n}\sum_{j=1}^{n}(\mathbf{z}_{j}-\mu)^{2}}$ denote the mean and standard deviation of each embedding dimension over all $n$ nodes, with all operations applied element-wise. This transformation aligns the statistical distribution of LLM-derived embeddings with typical preprocessed graph features, enabling balanced reconstruction in the dual-branch architecture. The final embeddings $\mathbf{Z}_{\text{LLM}}\in\mathbb{R}^{n\times 1024}$ serve as enhanced attribute inputs to the dual-branch autoencoder.
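The standardization in Eq. (5) is a column-wise operation on the embedding matrix. In the sketch below, the function name `zscore_columns` and the small constant `eps` (added to guard against constant dimensions) are assumptions not stated in the paper.

```python
import numpy as np

def zscore_columns(Z, eps=1e-12):
    """Standardize each embedding dimension (column) across all nodes (rows)."""
    mu = Z.mean(axis=0, keepdims=True)
    sigma = Z.std(axis=0, keepdims=True)   # population std (ddof=0), matching the 1/n in Eq. (5)
    return (Z - mu) / (sigma + eps)

# Toy embeddings: 3 nodes, 2 dimensions on very different scales.
Z = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z_norm = zscore_columns(Z)
```

After standardization the two columns coincide even though their raw scales differ by a factor of ten, which is the alignment effect described above.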
### IV-C Dual-Branch Autoencoder
Our framework begins with two complementary inputs: the original node attributes $\mathbf{X}\in\mathbb{R}^{n\times d_{x}}$ and the LLM-derived semantic embeddings $\mathbf{Z}_{\text{LLM}}\in\mathbb{R}^{n\times d_{z}}$, where the latter is standardized via Z-score normalization as described in Section [IV-B](https://arxiv.org/html/2605.19738#S4.SS2). While $\mathbf{X}$ captures task-specific features (e.g., textual content or metadata), $\mathbf{Z}_{\text{LLM}}$ encodes high-level structural semantics inferred from natural language descriptions of the graph. By preserving both modalities as separate inputs, we avoid premature fusion and allow the model to learn modality-specific graph representations before integration.
The graph structure is encoded using a normalized adjacency matrix $\tilde{\mathbf{A}}=\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$, following the standard GCN framework[[9](https://arxiv.org/html/2605.19738#bib.bib9)]. This preprocessing ensures stable message propagation and incorporates self-loops to retain node-specific information.
The encoder consists of two parallel graph convolutional branches:

$$\mathbf{H}^{(x)}=\sigma\left(\tilde{\mathbf{A}}\mathbf{X}\mathbf{W}^{(x)}\right), \tag{6}$$

$$\mathbf{H}^{(z)}=\sigma\left(\tilde{\mathbf{A}}\mathbf{Z}_{\text{LLM}}\mathbf{W}^{(z)}\right), \tag{7}$$

where $\mathbf{W}^{(x)}\in\mathbb{R}^{d_{x}\times h}$ and $\mathbf{W}^{(z)}\in\mathbb{R}^{d_{z}\times h}$ are learnable projection matrices, and $\sigma(\cdot)$ denotes the ReLU activation function. Each branch independently aggregates neighborhood information within its own semantic space, thereby preserving the distinct characteristics of raw attributes and LLM-enhanced embeddings.
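The two branches in Eqs. (6) and (7) share the normalized adjacency but use separate projection matrices. The following NumPy sketch shows a forward pass only, with randomly initialized weights standing in for trained parameters; the function names, the ring graph, and the dimensions are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def normalize_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dual_branch_encode(a_norm, X, Z_llm, W_x, W_z):
    """Encode raw attributes and LLM embeddings in two parallel GCN branches."""
    H_x = relu(a_norm @ X @ W_x)        # attribute branch, shape (n, h)
    H_z = relu(a_norm @ Z_llm @ W_z)    # LLM branch, shape (n, h)
    return H_x, H_z

rng = np.random.default_rng(0)
n, d_x, d_z, h = 5, 7, 16, 4
A = np.zeros((n, n))
for i in range(n):                      # undirected ring graph on 5 nodes
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
a_norm = normalize_adjacency(A)
X = rng.normal(size=(n, d_x))
Z_llm = rng.normal(size=(n, d_z))
W_x = rng.normal(size=(d_x, h))
W_z = rng.normal(size=(d_z, h))
H_x, H_z = dual_branch_encode(a_norm, X, Z_llm, W_x, W_z)
```

Both outputs live in the same $h$-dimensional space regardless of the input dimensions, which is what makes the element-wise fusion in the next step possible.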
Algorithm 1 TERGAD: Graph Anomaly Detection via LLM-Augmented Semantic Fusion

Input: Graph $\mathcal{G}=(\mathcal{V},\mathbf{A},\mathbf{X})$, natural language template $\mathcal{T}$, LLM $\mathcal{M}_{\text{LLM}}$, hidden dimension $h$, loss weight $\alpha$

Output: Anomaly scores $\{s_{i}\}_{i=1}^{n}$ for all nodes

Phase 1: Graph-to-Text Conversion

- for each node $v_{i}\in\mathcal{V}$ do
  - $t_{i}\leftarrow\mathcal{T}(v_{i},\mathcal{N}(v_{i}),\mathbf{X})$
- end for
- $\mathbf{T}\leftarrow[t_{1},t_{2},\dots,t_{n}]$

Phase 2: LLM Semantic Embedding

- $\mathbf{Z}_{\text{LLM}}\leftarrow\mathcal{M}_{\text{LLM}}(\mathbf{T})$
- $\mathbf{Z}_{\text{LLM}}\leftarrow\operatorname{StandardScaler}(\mathbf{Z}_{\text{LLM}})$

Phase 3: Dual-Branch Gated Autoencoder

- $\tilde{\mathbf{A}}\leftarrow\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$
- $\mathbf{H}^{(x)}\leftarrow\operatorname{ReLU}(\tilde{\mathbf{A}}\mathbf{X}\mathbf{W}^{(x)})$
- $\mathbf{H}^{(z)}\leftarrow\operatorname{ReLU}(\tilde{\mathbf{A}}\mathbf{Z}_{\text{LLM}}\mathbf{W}^{(z)})$
- $\mathbf{G}\leftarrow\sigma_{g}\left(\mathbf{W}_{g}[\mathbf{H}^{(x)}\parallel\mathbf{H}^{(z)}]+\mathbf{b}_{g}\right)$
- $\mathbf{H}^{\text{fused}}\leftarrow\mathbf{G}\odot\mathbf{H}^{(x)}+(1-\mathbf{G})\odot\mathbf{H}^{(z)}$
- $\mathbf{Z}\leftarrow\operatorname{ReLU}(\tilde{\mathbf{A}}\mathbf{H}^{\text{fused}}\mathbf{W}^{\text{fuse}})$

Phase 4: Reconstruction and Scoring

- $\hat{\mathbf{X}}\leftarrow\tilde{\mathbf{A}}\mathbf{Z}\mathbf{W}^{\text{dec}}$
- $\hat{\mathbf{A}}\leftarrow\sigma_{s}(\mathbf{Z}\mathbf{Z}^{\top})$
- for each node $i=1$ to $n$ do
  - $s_{i}\leftarrow(1-\alpha)\cdot\|\mathbf{a}_{i}-\hat{\mathbf{a}}_{i}\|_{2}^{2}+\alpha\cdot\|\mathbf{x}_{i}-\hat{\mathbf{x}}_{i}\|_{2}^{2}$
- end for
- return $\{s_{i}\}_{i=1}^{n}$

To combine these representations in a context\-aware manner, we introduce a gated fusion mechanism:

$$\mathbf{G}=\sigma_{g}\left(\mathbf{W}_{g}[\mathbf{H}^{(x)}\parallel\mathbf{H}^{(z)}]+\mathbf{b}_{g}\right), \tag{8}$$

$$\mathbf{H}^{\text{fused}}=\mathbf{G}\odot\mathbf{H}^{(x)}+(1-\mathbf{G})\odot\mathbf{H}^{(z)}, \tag{9}$$

where $\sigma_{g}$ is the sigmoid function, $\mathbf{W}_{g}\in\mathbb{R}^{h\times 2h}$ and $\mathbf{b}_{g}\in\mathbb{R}^{h}$ are trainable parameters, and $\odot$ denotes element-wise multiplication. The gate $\mathbf{G}\in[0,1]^{n\times h}$ dynamically determines the contribution of each branch for every node and feature dimension. This adaptability is essential: for nodes with reliable structural roles, LLM semantics may dominate; for ambiguous or sparse cases, raw attributes may carry more weight.
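A minimal NumPy sketch of Eqs. (8) and (9) follows. Since node representations are stored as rows of an $n\times h$ matrix and $\mathbf{W}_{g}$ has shape $h\times 2h$, the sketch applies the gate weights row-wise as `concat @ W_g.T`; this layout choice, the function names, and the random inputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(H_x, H_z, W_g, b_g):
    """Fuse two branch outputs with a learned element-wise gate."""
    concat = np.concatenate([H_x, H_z], axis=1)   # [H_x || H_z], shape (n, 2h)
    G = sigmoid(concat @ W_g.T + b_g)             # gate values in (0, 1), shape (n, h)
    H_fused = G * H_x + (1.0 - G) * H_z           # element-wise convex combination
    return G, H_fused

rng = np.random.default_rng(0)
n, h = 4, 3
H_x = rng.normal(size=(n, h))
H_z = rng.normal(size=(n, h))
W_g = rng.normal(size=(h, 2 * h))                 # random stand-in for trained weights
b_g = np.zeros(h)
G, H_fused = gated_fusion(H_x, H_z, W_g, b_g)
```

Because each gate entry lies strictly between 0 and 1, every fused entry lies between the corresponding entries of the two branches; with all-zero gate parameters the gate equals 0.5 everywhere and the fusion reduces to a plain average.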
The fused representation is further refined through an additional graph convolutional layer:
𝐙=σ\(𝐀~𝐇fused𝐖fuse\),\\mathbf\{Z\}=\\sigma\\left\(\\tilde\{\\mathbf\{A\}\}\\mathbf\{H\}^\{\\text\{fused\}\}\\mathbf\{W\}^\{\\text\{fuse\}\}\\right\),with𝐖fuse∈ℝh×h\\mathbf\{W\}^\{\\text\{fuse\}\}\\in\\mathbb\{R\}^\{h\\times h\}\. This step enables higher\-order interactions among neighbors on the integrated signal, enhancing the expressiveness of the final latent representation\.
Finally, a shared decoder reconstructs both the original node attributes and the graph structure from𝐙\\mathbf\{Z\}:
$$\hat{\mathbf{X}} = \tilde{\mathbf{A}}\mathbf{Z}\mathbf{W}^{\text{dec}}, \tag{10}$$
$$\hat{\mathbf{A}} = \sigma_s\left(\mathbf{Z}\mathbf{Z}^{\top}\right), \tag{11}$$
where $\mathbf{W}^{\text{dec}} \in \mathbb{R}^{h\times d_x}$ is a learnable weight matrix, and $\sigma_s(\cdot)$ is the sigmoid function that maps the reconstructed edge values into the $[0,1]$ range. Crucially, the attribute decoder reproduces the original attributes $\mathbf{X}$ rather than LLM embeddings, ensuring reconstructed features remain grounded in the observable data space.
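The refinement layer and the shared decoder of Eqs. (10) and (11) can be sketched as follows. Two points are assumptions rather than statements from this section: $\tilde{\mathbf{A}}$ is taken to be the symmetrically normalized adjacency with self-loops (the usual GCN choice), and the unspecified activation $\sigma$ of the refinement layer is taken to be ReLU. The function names and toy sizes are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops; an assumed form of A-tilde."""
    A_loop = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_loop.sum(axis=1))
    return A_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def refine_and_decode(A_norm, H_fused, W_fuse, W_dec):
    # Refinement layer; ReLU is assumed for the unspecified activation.
    Z = np.maximum(A_norm @ H_fused @ W_fuse, 0.0)
    X_hat = A_norm @ Z @ W_dec                    # Eq. (10): attribute decoder
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))      # Eq. (11): inner-product decoder
    return Z, X_hat, A_hat

# Toy path graph 0-1-2-3 with h = 3 hidden and d_x = 5 attribute dimensions.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H_fused = rng.normal(size=(4, 3))
W_fuse = rng.normal(scale=0.5, size=(3, 3))
W_dec = rng.normal(scale=0.5, size=(3, 5))
Z, X_hat, A_hat = refine_and_decode(normalize_adj(A), H_fused, W_fuse, W_dec)
```

Note that `X_hat` has the dimensionality of the original attributes, not of the LLM embeddings, mirroring the point that reconstruction stays in the observable data space.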
### IV-D Graph Anomaly Score
Within the TERGAD framework, anomaly detection is performed via the reconstruction errors of the dual-branch autoencoder. For a given node $v_i$, the anomaly score is computed as:
$$s_i = (1-\alpha)\,\|\mathbf{a}_i-\hat{\mathbf{a}}_i\|_2^2 + \alpha\,\|\mathbf{x}_i-\hat{\mathbf{x}}_i\|_2^2, \tag{12}$$
where $\alpha \in [0,1]$ balances the contributions of structural and attribute reconstruction errors. Here, $\mathbf{a}_i$ and $\hat{\mathbf{a}}_i$ denote the $i$-th rows of the ground-truth adjacency matrix $\mathbf{A}$ and its reconstructed version $\hat{\mathbf{A}}$, respectively, while $\mathbf{x}_i$ and $\hat{\mathbf{x}}_i$ are the $i$-th rows of the original node attributes $\mathbf{X}$ and the reconstructed $\hat{\mathbf{X}}$. Notably, $\hat{\mathbf{X}}$ is derived from a fused latent representation that integrates both the original attributes and LLM-enhanced semantic embeddings, allowing the model to leverage external semantic knowledge for more robust attribute reconstruction. Higher scores indicate greater deviation from normal patterns and thus higher anomaly likelihood.
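As a worked illustration of Eq. (12), the sketch below computes per-node scores from row-wise squared reconstruction errors. The function name and the toy matrices are hypothetical; the default `alpha=0.8` follows the value reported in the experimental setup.

```python
import numpy as np

def anomaly_scores(A, A_hat, X, X_hat, alpha=0.8):
    """Eq. (12): weighted sum of row-wise squared reconstruction errors."""
    struct_err = np.sum((A - A_hat) ** 2, axis=1)   # ||a_i - a_hat_i||_2^2
    attr_err = np.sum((X - X_hat) ** 2, axis=1)     # ||x_i - x_hat_i||_2^2
    return (1.0 - alpha) * struct_err + alpha * attr_err

# Toy case: structure is reconstructed exactly, and only node 2 has an
# attribute error (one dimension off by 1).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_hat = A.copy()
X = np.zeros((3, 2))
X_hat = X.copy()
X_hat[2, 0] = 1.0
scores = anomaly_scores(A, A_hat, X, X_hat)   # node 2 scores 0.8, the others 0
ranking = np.argsort(-scores)                 # most anomalous node first
```

Nodes are then ranked by `scores`; no threshold is implied by Eq. (12) itself.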
## V Experiments
### V-A Datasets
In our experiments, we employ six widely adopted graph datasets. Detailed statistics of these datasets are summarized in Table [II](https://arxiv.org/html/2605.19738#S5.T2).
- •Cora \[[19](https://arxiv.org/html/2605.19738#bib.bib19)\] is a citation network consisting of 2,708 machine learning papers grouped into 7 categories. The graph has 5,429 citation edges, and node attributes are bag-of-words features derived from the paper abstracts.
- •Citeseer \[[19](https://arxiv.org/html/2605.19738#bib.bib19)\] is a citation network comprising 3,327 scientific publications from the Citeseer digital library. The graph contains 4,732 citation edges, and node attributes are bag-of-words features based on the paper content.
- •DBLP \[[28](https://arxiv.org/html/2605.19738#bib.bib30)\] is a citation network comprising 5,484 publications from the DBLP Computer Science Bibliography, with 8,117 citation edges. Node attributes are extracted from article titles.
- •ACM \[[20](https://arxiv.org/html/2605.19738#bib.bib20)\] is a citation network with 16,484 papers from the ACM digital library. The graph has 71,980 citation edges, and node attributes are bag-of-words features extracted from the paper titles and abstracts.
- •Pubmed \[[19](https://arxiv.org/html/2605.19738#bib.bib19)\] is a citation network of 19,717 scientific publications from the PubMed database. The graph includes 88,648 citation links, and node attributes are TF-IDF weighted word frequencies from the paper abstracts.
- •BlogCatalog \[[21](https://arxiv.org/html/2605.19738#bib.bib21)\] is a social network dataset sourced from the BlogCatalog blogging platform (shortened to Blog in this paper). Nodes correspond to users, while edges capture the follower-followee relationships among them. Node features are derived from user-generated content, including blog posts and photo tags.
TABLE II: The statistics of six real-world datasets.
TABLE III: Performance comparison of TERGAD and baselines across six datasets. Results are reported using ROC-AUC (%) and PR-AUC (%), and the best performance in each column is highlighted in red and bold.
### V-B Baselines
In our experiments, we compare our model with nine graph anomaly detection methods, which can be grouped into three categories:
The first category comprises methods that use reconstruction error as the anomaly signal.
- •MLPAE \[[18](https://arxiv.org/html/2605.19738#bib.bib18)\] uses a multi-layer perceptron (MLP) as both encoder and decoder to reconstruct node attributes.
- •GCNAE \[[9](https://arxiv.org/html/2605.19738#bib.bib9)\] replaces the MLP with a GCN to jointly model graph structure and attributes.
- •DOMINANT \[[4](https://arxiv.org/html/2605.19738#bib.bib4)\] adopts a GCN as the encoder but reconstructs the adjacency matrix via inner product and the attribute matrix via a reverse GCN layer.
- •AnomalyDAE \[[5](https://arxiv.org/html/2605.19738#bib.bib5)\] introduces dedicated structure and attribute decoders to jointly learn topological and semantic patterns for anomaly detection.
The second category comprises approaches that leverage generative modeling or contrastive learning to capture anomalous patterns beyond reconstruction.
- •GAAN \[[3](https://arxiv.org/html/2605.19738#bib.bib2)\] is the first to apply generative adversarial networks (GANs) to graph anomaly detection, using a generator-discriminator framework to identify outliers.
- •CoLA \[[10](https://arxiv.org/html/2605.19738#bib.bib10)\] utilizes contrastive learning to pull normal node pairs closer and push anomalous ones apart.
The third category comprises models that enhance node representations through attention, neighborhood modeling, or explicit anomaly injection.
- •GADNR \[[17](https://arxiv.org/html/2605.19738#bib.bib17)\] emphasizes local neighborhood structures to better capture diverse anomaly types.
- •FIAD \[[2](https://arxiv.org/html/2605.19738#bib.bib3)\] directly injects synthetic anomaly information into the feature matrix, enabling the model to learn fine-grained anomaly patterns from all nodes.
- •AHFAN \[[23](https://arxiv.org/html/2605.19738#bib.bib25)\] proposes a dual-branch framework that fuses semantic and attention-based representations to address class and semantic inconsistency.
### V-C Experimental Setup
Following the standard evaluation protocol established in prior GAD literature \[[4](https://arxiv.org/html/2605.19738#bib.bib4)\], we adopt a synthetic anomaly injection strategy to ensure controlled and reproducible evaluation across all datasets. Specifically, for each dataset, we designate 5% of nodes as anomalous and perturb them via two complementary mechanisms:
- •Attribute Perturbation: For each selected anomalous node, we randomly flip 30% of its attribute dimensions. For bag-of-words or TF-IDF features (e.g., Cora, Pubmed), this corresponds to inverting the presence/absence or weight of selected terms, thereby creating semantic inconsistencies between the node's content and its structural role.
- •Structural Perturbation: We rewire 20% of the edges incident to each anomalous node by disconnecting it from its original neighbors and reconnecting it to randomly selected nodes outside its local community. This creates hub-periphery mismatches or bridge anomalies, where a node exhibits connectivity patterns inconsistent with its attribute profile.
The anomaly injection ratios for the six datasets after perturbation are detailed in Table [II](https://arxiv.org/html/2605.19738#S5.T2). The injected anomalies are designed to exhibit semantic-structural inconsistencies, a hallmark of real-world anomalies in domains such as financial fraud \[[14](https://arxiv.org/html/2605.19738#bib.bib14)\] and cyber intrusion detection \[[29](https://arxiv.org/html/2605.19738#bib.bib32)\].
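The two perturbation mechanisms can be sketched as below. This is an illustrative simplification, not the authors' injection script: it assumes binary attributes and an undirected graph without self-loops, and it approximates "nodes outside the local community" by nodes that are not current neighbors, since the community definition is not given here. The function name and toy graph are hypothetical.

```python
import numpy as np

def inject_anomalies(A, X, node_frac=0.05, attr_frac=0.30, edge_frac=0.20, seed=0):
    """Perturb a random subset of nodes; returns (A', X', labels)."""
    rng = np.random.default_rng(seed)
    A, X = A.copy(), X.copy()
    n, d = X.shape
    k = max(1, int(round(node_frac * n)))
    anomalous = rng.choice(n, size=k, replace=False)
    for i in anomalous:
        # Attribute perturbation: flip a random subset of binary dimensions.
        dims = rng.choice(d, size=int(round(attr_frac * d)), replace=False)
        X[i, dims] = 1 - X[i, dims]
        # Structural perturbation: move a share of incident edges to
        # randomly chosen non-neighbors (stand-in for "outside the community").
        nbrs = np.flatnonzero(A[i])
        non_nbrs = np.setdiff1d(np.arange(n), np.append(nbrs, i))
        m = min(int(round(edge_frac * len(nbrs))), len(non_nbrs))
        if m > 0:
            old = rng.choice(nbrs, size=m, replace=False)
            new = rng.choice(non_nbrs, size=m, replace=False)
            A[i, old] = A[old, i] = 0
            A[i, new] = A[new, i] = 1
    labels = np.zeros(n, dtype=int)
    labels[anomalous] = 1
    return A, X, labels

# Toy graph: 40 nodes, random undirected edges, 10 binary attributes.
rng = np.random.default_rng(1)
upper = np.triu(rng.random((40, 40)) < 0.2, k=1).astype(int)
A0 = upper + upper.T
X0 = (rng.random((40, 10)) < 0.5).astype(int)
A1, X1, y = inject_anomalies(A0, X0)
```

Rewiring in this sketch keeps the total number of edges unchanged, so the perturbation alters who a node connects to rather than how many connections exist overall.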
For all experiments, we set the hidden dimension of the dual-branch encoder to 64 (256 for the Blog dataset), the number of training epochs to 200, the initial learning rate to $5\times 10^{-3}$, and the dropout rate to 0.3. The balancing parameter $\alpha$ is set to 0.8, which under Eq. (12) weights the attribute reconstruction error by 0.8 and the structural reconstruction error by 0.2.
To ensure statistical reliability and mitigate the impact of random initialization, we run each experiment with four different random seeds and report the mean and variance of the ROC-AUC scores. All models are implemented in PyTorch and trained on a single NVIDIA L40 GPU with the CUDA backend. The model code and precomputed LLM embeddings are publicly available to ensure reproducibility.
### V-D Comparison Results
To comprehensively evaluate the effectiveness of the proposed TERGAD framework, we compare it against nine state-of-the-art baselines across six real-world datasets. The quantitative results, measured by ROC-AUC (%), are summarized in Table [III](https://arxiv.org/html/2605.19738#S5.T3).
The results demonstrate that TERGAD consistently outperforms all baseline methods across the six datasets, achieving 93.06%, 95.78%, 98.03%, 94.97%, and 97.72% on Cora, Citeseer, Pubmed, ACM, and DBLP, respectively. This consistent superiority across diverse domains, ranging from citation networks to co-authorship graphs, underscores the robustness and strong generalizability of our framework. Several key observations can be drawn from the results. First, traditional reconstruction-based methods, such as MLPAE and GCNAE, exhibit relatively lower performance, indicating their inherent limitations in capturing complex, non-linear anomalous patterns. Second, while FIAD performs competitively on certain datasets, its reliance on synthetic anomaly injection based on global statistics often overlooks localized structural irregularities, leading to inconsistent performance across different graph topologies.
The performance gain of TERGAD highlights the effectiveness of leveraging LLMs to capture structural semantic priors for anomaly detection. By translating abstract graph topology into descriptive natural language that articulates structural roles and connectivity patterns, our framework empowers LLMs to identify anomalies through a lens of semantic consistency. This advantage is particularly pronounced on the Pubmed and DBLP datasets, where our method achieves 98.03% and 97.72%, significantly surpassing all baselines. These results suggest that structural anomalies in academic networks often manifest as deviations from expected topological roles, such as nodes with abnormal community memberships, which are more effectively captured through semantic reasoning than pure numerical modeling.
Notably, TERGAD also achieves the leading performance on the ACM dataset (94.97%), further confirming the efficacy of LLM-enhanced understanding in citation networks. This result demonstrates that our framework remains highly effective even when anomaly signals are subtle and deeply intertwined with both structural and attribute-based cues. In conclusion, this comprehensive comparison validates that structural information, when processed through the semantic reasoning of LLMs, provides a superior foundation for graph anomaly detection. Our approach offers a novel perspective by identifying structurally anomalous patterns that are often overlooked by traditional topological methods.
To provide a more comprehensive evaluation of TERGAD's performance, we report the Precision-Recall Area Under the Curve (PR-AUC) scores across all six benchmark datasets. PR-AUC is particularly informative for anomaly detection tasks due to the inherent class imbalance in these datasets. As shown in Table [III](https://arxiv.org/html/2605.19738#S5.T3), TERGAD achieves substantial improvements in PR-AUC across all datasets, with particularly notable gains on Pubmed (76.42%), DBLP (73.95%), and Cora (68.71%). These results confirm that our framework effectively identifies anomalous nodes even in highly imbalanced scenarios.
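For readers who want to reproduce the two metrics from raw anomaly scores, the sketch below gives one common way to compute them. It is an assumption that PR-AUC is estimated by average precision; the text does not state which estimator was used. The function names and toy data are hypothetical.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Probability that a random anomaly outscores a random normal node (ties count 0.5)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which a true anomaly appears."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / hits.sum())

# Toy example: 2 anomalies among 5 nodes.
y_true = np.array([0, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.2, 0.8])
auc = roc_auc(y_true, scores)            # 5 of 6 anomaly/normal pairs are ordered correctly
ap = average_precision(y_true, scores)   # (1/1 + 2/3) / 2
```

The pairwise ROC-AUC above is quadratic in the number of nodes and is meant for clarity; a rank-based implementation would be preferable on graphs the size of Pubmed.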
### V-E Ablation Study
#### V-E1 Component Ablation
To evaluate the contribution of individual components within the TERGAD framework, we conduct a systematic ablation study by comparing the full model against two key variants. The first variant, TERGAD w/o Prompt, omits the structured template-based prompting strategy, directly depriving the LLM of the contextual information that encodes the topological roles. This variant simply concatenates structural information to the original node attributes $\mathbf{X}$. The second variant, TERGAD w/o Gate, eliminates the gated fusion mechanism, replacing the adaptive node-wise balancing with a simple concatenation to integrate node attributes and LLM-derived semantic embeddings.
The anomaly detection performance of these variants across six benchmark datasets is summarized in Table [IV](https://arxiv.org/html/2605.19738#S5.T4). The results consistently demonstrate that both components are indispensable for achieving optimal performance. Specifically, the significant performance degradation observed in the w/o Prompt variant, most notably on Cora (17.07%) and Citeseer (23.43%), underscores the critical role of structured prompting in eliciting high-quality semantic priors. By translating raw topology into role-based natural language, this strategy enables the LLM to capture nuanced structural features (e.g., hub or bridge nodes) that are vital for identifying semantic-structural inconsistencies.
TABLE IV: Results of the ablation study (ROC-AUC %).
Similarly, the inferior results of the TERGAD w/o Gate variant highlight the necessity of an adaptive fusion mechanism. The performance gaps are particularly pronounced on Cora (7.50%) and ACM (6.47%), suggesting that these datasets require a fine-grained balance between raw attributes and semantic insights. In the absence of the gating mechanism, the model struggles to reconcile potentially conflicting signals from the dual branches, leading to suboptimal representations that fail to exploit the synergy between structural roles and attribute information. Notably, the full TERGAD model achieves the best performance across all datasets, confirming that the integration of semantic enrichment and adaptive fusion is essential for robust graph anomaly detection.
#### V-E2 Impact of High-Order Structural Descriptions in Prompt Design
To evaluate the contribution of individual high-order structural components within our prompting strategy, we conduct a fine-grained ablation study. Specifically, we systematically remove distinct topological features, including degree (Deg.), triangles and clustering coefficient (T & C), $k$-core ($k$-C.), centrality measures (Cen.), and top-hub status (T-H), from the natural language descriptions while retaining the full TERGAD architecture. This allows us to isolate the semantic value provided by each structural property. The performance comparison, measured by ROC-AUC (%), is presented in Table [IV](https://arxiv.org/html/2605.19738#S5.T4).
The results in Table [IV](https://arxiv.org/html/2605.19738#S5.T4) demonstrate that every structural component contributes positively to the overall detection performance. The full TERGAD model consistently achieves the highest scores across all six datasets, validating the necessity of comprehensive structural semantics. Notably, removing the $k$-core decomposition leads to the most significant performance degradation, particularly on Cora (a drop of 6.19%) and Citeseer (a drop of 6.67%). This suggests that global hierarchical structural information is critical for the LLM to understand a node's coreness and robustness within the network. Similarly, excluding centrality measures results in substantial declines on ACM (7.78%) and Cora (3.42%), indicating that metrics quantifying node influence (e.g., betweenness, closeness) are vital for identifying anomalies that deviate from expected influence patterns.
Furthermore, the removal of local cohesion metrics (Triangles & Clustering Coefficient) and connectivity basics (Degree) also yields consistent performance drops across all datasets, though to a varying extent. For instance, on ACM, removing degree information causes a 6.08% decrease, highlighting that even basic connectivity counts are foundational for semantic grounding. The relatively smaller drops when removing Top-Hub status imply that while explicit hub ranking is beneficial, its information is partially correlated with degree and centrality. Overall, these findings confirm that enriching prompts with diverse, multi-scale structural descriptors, from local connectivity to global hierarchy, enables the LLM to construct more nuanced semantic representations, thereby enhancing the robustness of anomaly detection.
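To make the feature-removal ablation concrete, the sketch below builds a narrative description for one node and lets individual descriptors be switched off. The sentence wording, the function name, and the `include` keys are all hypothetical; the actual prompt template is not reproduced in this section, and $k$-core and centrality descriptors are omitted here to keep the example dependency-free.

```python
def describe_node(adj, v, include=("degree", "triangles", "top_hub"), n_hubs=1):
    """Build a short narrative for node v from an adjacency dict {node: set_of_neighbors}."""
    nbrs = adj[v]
    deg = len(nbrs)
    # A triangle through v is a pair of neighbors that are themselves connected.
    tri = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    clustering = 2.0 * tri / (deg * (deg - 1)) if deg > 1 else 0.0
    hubs = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:n_hubs]
    parts = [f"This is node {v}."]
    if "degree" in include:
        parts.append(f"It has {deg} direct connections.")
    if "triangles" in include:
        parts.append(
            f"Its triangle count is {tri} and its clustering coefficient is {clustering:.2f}."
        )
    if "top_hub" in include:
        status = "is" if v in hubs else "is not"
        parts.append(f"It {status} one of the top-degree hubs in the graph.")
    return " ".join(parts)

# Toy graph: a triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
full_text = describe_node(adj, 0)
no_degree_text = describe_node(adj, 0, include=("triangles", "top_hub"))
```

Dropping a key from `include` removes the corresponding sentence while leaving the rest of the description intact, which is the kind of controlled change the ablation relies on.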
TABLE V: Performance comparison of TERGAD with different module variants (ROC-AUC %).
#### V-E3 Effect of Different Module Variants
To investigate the impact of different fusion strategies in our dual-branch architecture, we conduct a comprehensive study comparing five representative fusion mechanisms: (1) Concatenation (Concat), which horizontally concatenates the original node attributes and the LLM-derived semantic embeddings; (2) Element-wise Addition (Add), which performs dimension-aligned addition of the two feature matrices; (3) Element-wise Multiplication (Multiply), which computes the Hadamard product of the two feature matrices; (4) Attention mechanism (Attention), which uses a standard attention mechanism to compute adaptive weights; and (5) Temperature-scaled Attention (T-A), which introduces a learnable temperature parameter $\tau$ to sharpen or smooth the attention distribution, i.e., $\text{softmax}(\mathbf{q}^{\top}\mathbf{k}/\tau)$.
For the Add and Multiply operations, when the dimensions of original attributes and LLM embeddings differ, our implementation automatically applies PCA dimensionality reduction to align both representations to the minimum dimension before fusion\. All fusion variants operate on LLM embeddings that have been standardized using Z\-score normalization to ensure stable training\. All variants share the same overall architecture and training protocol, differing only in the fusion module\. The results, measured by ROC\-AUC \(%\), are summarized in Table[V](https://arxiv.org/html/2605.19738#S5.T5)\.
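The alignment step for the Add and Multiply variants can be sketched as follows: Z-score the LLM embeddings, reduce whichever matrix is wider to the smaller dimension with PCA, then combine element-wise. This is a minimal NumPy illustration under the assumption that PCA is computed via SVD on centered data; the function names and toy dimensions are hypothetical.

```python
import numpy as np

def zscore(E, eps=1e-8):
    """Column-wise standardization of the LLM embeddings."""
    return (E - E.mean(axis=0)) / (E.std(axis=0) + eps)

def pca_reduce(M, k):
    """Project centered rows onto the top-k principal directions (via SVD)."""
    Mc = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Mc @ Vt[:k].T

def fuse(X, E, mode="add"):
    E = zscore(E)
    if mode == "concat":
        return np.concatenate([X, E], axis=1)
    k = min(X.shape[1], E.shape[1])             # align to the minimum dimension
    Xk = X if X.shape[1] == k else pca_reduce(X, k)
    Ek = E if E.shape[1] == k else pca_reduce(E, k)
    if mode == "add":
        return Xk + Ek
    if mode == "multiply":
        return Xk * Ek                          # Hadamard product
    raise ValueError(f"unknown fusion mode: {mode}")

# Toy sizes: 20 nodes, 6 attribute dimensions, 10 embedding dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
E = rng.normal(loc=3.0, scale=2.0, size=(20, 10))
fused_add = fuse(X, E, "add")
fused_mul = fuse(X, E, "multiply")
fused_cat = fuse(X, E, "concat")
```

Only the wider matrix is projected, so when the attribute dimension is already the smaller one the original attributes pass through unchanged.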
Comparing the results of our fusion strategies with the baseline methods in Table [III](https://arxiv.org/html/2605.19738#S5.T3), we observe consistent and significant improvements. For instance, on the Cora dataset, our Temp-Attention fusion achieves 89.77%, outperforming the best baseline DOMINANT (88.52%) by 1.25 points. Similarly, on Pubmed, both Add (95.60%) and Multiply (95.44%) fusion strategies exceed the best baseline GAAN (90.35%) by approximately 5 points. Even on datasets like ACM where the baseline performance is relatively strong, our Temp-Attention method (91.56%) still surpasses the best baseline GAAN (88.25%) by 3.31 points. These results highlight that integrating LLM-derived semantic embeddings through adaptive fusion mechanisms consistently enhances anomaly detection performance beyond state-of-the-art baselines.
To assess how the choice of LLMs affects performance, we followed the identical experimental pipeline with two alternative semantic encoders: Qwen3-Embedding-4B and nomic-embed-text-v1.5. The only difference in this study lies in the model used to generate the initial semantic node embeddings from textual node descriptions. The rest of the framework architecture and training procedure, comprising gated fusion, dual-branch autoencoding, and anomaly score calculation, was kept strictly consistent. The comparative results are presented in Table [V](https://arxiv.org/html/2605.19738#S5.T5).
The results, as presented in Table [V](https://arxiv.org/html/2605.19738#S5.T5), demonstrate that both Qwen3-Embedding-4B and nomic-embed-text-v1.5 enable our TERGAD framework to achieve strong performance, outperforming non-LLM baseline methods on most datasets (as shown in Table [III](https://arxiv.org/html/2605.19738#S5.T3)). Notably, Qwen3-Embedding-4B achieves superior performance on two datasets (Citeseer and Pubmed), while nomic-embed-text-v1.5 performs best on the other three (Cora, ACM, and DBLP). Across all datasets, nomic-embed-text-v1.5 attains a marginally higher average ROC-AUC. This indicates that both LLMs are capable semantic encoders, with each showing particular strengths on different types of graph data. The competitive performance of nomic-embed-text-v1.5, despite its significantly smaller parameter size (0.65B vs. 4B), suggests that model scale is not the sole determinant of effectiveness for this task.
It is worth noting that, although these two LLM variants achieve competitive results, they do not outperform our primary model employing BGE-large-en-v1.5 embeddings, corresponding to the last row (TERGAD) in Table [V](https://arxiv.org/html/2605.19738#S5.T5), which attains the best results across all datasets (a mean AUC of 93.06% on Cora, 95.78% on Citeseer, 98.03% on Pubmed, 94.97% on ACM, and 97.72% on DBLP). This suggests that BGE, despite being a smaller model, is particularly well-suited for capturing the semantic signals relevant to graph anomaly detection in our setup. In summary, the choice of LLM remains a critical factor, and our results highlight the importance of selecting an LLM whose pretraining domain aligns with the target application.
#### V-E4 Prompt Analysis
Our node descriptions are composed of multiple structural sentences. To assess whether the order in which these sentences appear affects model performance, we evaluate TERGAD under a shuffled prompt configuration, where the sequence of descriptive clauses is randomly permuted for each node while preserving all content.
Fig. [6(a)](https://arxiv.org/html/2605.19738#S5.F6.sf1) shows two independent trials of this per-node shuffling process, which are representative samples among the many random configurations tested. As summarized in Table [V](https://arxiv.org/html/2605.19738#S5.T5), TERGAD achieves strong performance under shuffled prompts, only slightly lower than the results obtained with the original structured prompt.
These findings confirm that TERGAD’s performance is largely invariant to the syntactic ordering of structural facts in the prompt, suggesting that the LLM effectively aggregates semantic signals regardless of surface\-level sentence arrangement\. This robustness further supports the reliability of our text\-enhanced representation strategy\.
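The per-node shuffling used in this robustness check can be sketched in a few lines. The example sentences and the seeding scheme are hypothetical; the only property taken from the text is that each node's clauses are permuted independently while all content is preserved.

```python
import random

def shuffle_prompt(sentences, node_id, seed=0):
    """Return one node's descriptive sentences in a node-specific random order."""
    rng = random.Random(f"{seed}-{node_id}")   # independent stream per (seed, node)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical structural sentences for a single node.
sentences = [
    "It has 12 direct connections.",
    "It participates in 4 triangles.",
    "Its core number is 3.",
    "It is not among the top hubs.",
]
trial_a = shuffle_prompt(sentences, node_id=7, seed=0)
trial_b = shuffle_prompt(sentences, node_id=7, seed=1)
prompt_a = " ".join(trial_a)
```

Changing `seed` gives a new independent trial, while fixing it makes a given trial reproducible.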
(a) Illustration of Node-level Prompt Order Divergence. Two random realizations of per-node prompt shuffling, demonstrating how independent randomization leads to distinct input sequences for different nodes.
To investigate how different natural language formulations affect the quality of LLM-derived structural semantics, we conduct a comparative study between our original narrative-style prompt and an alternative structured profile-style template. The profile-style template adopts a technical dossier format with explicit section headers and delimiter-based organization (as shown in Fig. [6(a)](https://arxiv.org/html/2605.19738#S5.F6.sf1)), contrasting with the original fluent paragraph-style descriptions. This experiment isolates the impact of linguistic presentation on the downstream anomaly detection performance, while keeping all other components identical.
As shown in Table [V](https://arxiv.org/html/2605.19738#S5.T5), the original narrative-style prompt consistently outperforms the structured profile-style template across all six datasets, with an average improvement of 4.53%. The performance gap is most pronounced on Cora (7.50%) and DBLP (2.94%), suggesting that fluent, self-contained natural language descriptions better align with the pretraining distribution of off-the-shelf LLMs. We hypothesize that narrative-style prompts, which mimic human-readable explanatory text, enable LLMs to more effectively leverage their inherent linguistic reasoning capabilities for structural understanding. In contrast, the profile-style template, while more compact and machine-readable, may introduce syntactic patterns that deviate from the LLM's pretraining corpus, slightly impairing semantic extraction quality.
Notably, on Pubmed, both prompt styles achieve nearly identical performance (98.03% vs. 97.96%), indicating that for larger graphs with rich attribute information, the specific linguistic formulation becomes less critical. This observation suggests that the semantic signal from node attributes may dominate the structural signal in such scenarios, reducing the sensitivity to prompt design. Overall, these findings validate our design choice of using narrative-style prompts and highlight the importance of aligning prompt formulations with LLM pretraining characteristics for optimal structural semantic extraction.
### V-F Hyperparameter Analysis
We also investigate the impact of model depth by comparing single-layer and two-layer GCN variants for both the encoder and decoder. The experimental results demonstrate that the single-layer configuration consistently outperforms its two-layer counterpart across all six evaluated datasets.
To ensure a fair and controlled comparison, the two-layer GCN variant strictly adheres to the experimental setup delineated in Section [V-C](https://arxiv.org/html/2605.19738#S5.SS3). Specifically, we extend the original single-layer architecture by sequentially stacking an additional GCN layer for each branch (i.e., the attribute branch and the LLM embedding branch), while maintaining identical hyperparameter configurations (learning rate, hidden dimension, dropout rate) and training protocols. The two-layer configuration adopts a standard stacked GCN architecture (Layer 1 $\rightarrow$ ReLU $\rightarrow$ Layer 2 $\rightarrow$ ReLU) without incorporating residual connections or skip connections. This methodological choice ensures that any observed performance discrepancies can be unequivocally attributed to variations in model depth rather than confounding factors such as architectural modifications or hyperparameter heterogeneity.
We attribute this performance degradation to the increased model complexity introduced by the additional layer, which may lead to over\-smoothing or information redundancy, particularly when the LLM\-derived semantic embeddings already provide rich structural context\. This observation, supported by the empirical evidence in Fig\.[6](https://arxiv.org/html/2605.19738#S5.F6), further validates our choice of a shallow, single\-layer architecture as the optimal balance between expressiveness and sensitivity to anomalous patterns\.
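The single-layer and two-layer branch variants differ only in how many times the propagation step is applied. A minimal sketch, assuming each layer has the form ReLU($\tilde{\mathbf{A}}\mathbf{H}\mathbf{W}$) with no residual connections; the function name, weights, and toy graph are hypothetical.

```python
import numpy as np

def gcn_branch(A_norm, H, weights):
    """Stack one GCN layer per weight matrix: H <- ReLU(A_norm @ H @ W)."""
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)
    return H

# Triangle graph with self-loops, symmetrically normalized: every entry is 1/3.
A_norm = np.full((3, 3), 1.0 / 3.0)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(2, 2))
H_one = gcn_branch(A_norm, X, [W1])        # single-layer variant
H_two = gcn_branch(A_norm, X, [W1, W2])    # Layer 1 -> ReLU -> Layer 2 -> ReLU
```

On this fully connected toy graph a single propagation already averages all nodes, so every row of `H_one` is identical. That is an extreme case, but it illustrates how repeated neighborhood averaging can erase the node-level differences an anomaly detector depends on.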
Furthermore, we conduct a study to evaluate the sensitivity of our model to the balance parameter $\alpha$. As illustrated in Fig. [6](https://arxiv.org/html/2605.19738#S5.F6), the ROC-AUC scores remain consistently high across a wide range of $\alpha$ values (from 0.1 to 0.9) for all six datasets. Notably, the performance peaks around $\alpha = 0.8$, but remains robust even when $\alpha$ varies between 0.6 and 0.9, with minimal degradation. This demonstrates that our TERGAD framework is not overly sensitive to the exact choice of $\alpha$, indicating its stability and generalizability. The consistent top performance at $\alpha = 0.8$ further validates our default setting, while the flat performance curve in the neighborhood of this value suggests that our method can maintain strong detection accuracy even under suboptimal parameter configurations.


Figure 6: Performance comparison of TERGAD across six datasets under varying balance parameter $\alpha$ and for single-layer versus two-layer GCN variants.
### V-G Efficiency Analysis
#### V-G1 Generation Time Consumption Analysis
To evaluate the computational efficiency of our text generation and embedding process, we measure the time required for two key steps on six benchmark datasets. The first step is the generation of structured, natural language descriptions for each node based on its graph structural properties. The second step is the computation of embeddings from these generated texts using the BGE model.
As shown in Table [VI](https://arxiv.org/html/2605.19738#S5.T6), the text generation phase is efficient, taking less than 0.25 seconds even for the largest dataset (Pubmed, with 19,717 nodes). This demonstrates that our method for converting complex graph topology into descriptive text is highly scalable and incurs minimal overhead. In contrast, the embedding computation is the primary time cost, which is expected as it involves a large pre-trained language model. The time for this step scales linearly with the number of nodes, confirming the predictable computational demand of our approach.
#### V-G2 Training Time Consumption Analysis
We measure the total training time (in seconds) for 200 epochs across all methods and datasets. This metric reflects the computational efficiency of each approach during the training phase. The results are presented in Table [VI](https://arxiv.org/html/2605.19738#S5.T6).
TABLE VI: Training time (seconds) for 200 epochs across six datasets.
TERGAD demonstrates competitive computational efficiency, achieving the fastest training time on smaller datasets like Cora (1.77 s) and Citeseer (2.45 s). On larger datasets (DBLP, ACM, Pubmed), while lightweight autoencoders (e.g., MLPAE, GCNAE, AHFAN) exhibit lower training times due to simpler architectures, TERGAD remains significantly more efficient than complex baselines such as DOMINANT, GAAN, and GADNR. For instance, on ACM, TERGAD completes training in 52.70 s, substantially faster than DOMINANT (2,361.97 s) and GAAN (5,663.09 s). This efficiency stems from our streamlined dual-branch architecture, which avoids computationally expensive adversarial training or extensive neighborhood reconstruction. The modest overhead compared to simple autoencoders is justified by the substantial performance gains in detection accuracy, highlighting a favorable trade-off between efficiency and effectiveness.
#### V-G3 Memory Allocation Analysis
We report the maximum GPU memory allocated (in MB) during training for each method. Additionally, we provide detailed memory consumption analysis for TERGAD's LLM embedding generation phase. The results are shown in Table [VII](https://arxiv.org/html/2605.19738#S5.T7).
TABLE VII: Maximum GPU memory allocated (MB) during training across six datasets.
Table [VII](https://arxiv.org/html/2605.19738#S5.T7) presents a comprehensive comparison of GPU memory consumption across all methods. We observe that lightweight autoencoder-based approaches (e.g., AHFAN, MLPAE, CoLA) exhibit lower memory footprints due to their simpler architectures and absence of semantic enrichment modules. However, these methods achieve such efficiency at the cost of limited representational capacity, as evidenced by their comparatively lower detection performance in Table [III](https://arxiv.org/html/2605.19738#S5.T3).
In contrast, TERGAD strikes a favorable balance between memory efficiency and detection effectiveness. On smaller datasets (Cora: 276.53 MB, Citeseer: 494.21 MB, DBLP: 1,374.29 MB), TERGAD maintains memory consumption within a practical range while delivering substantial performance gains. On larger datasets such as ACM and Pubmed, memory usage increases to 8,960.41 MB and 10,643.45 MB, respectively. Reading the baseline figures in the same order, this is well below FIAD (15,170.38 MB and 15,250.86 MB) on both datasets and below DOMINANT on ACM (14,686.25 MB), whereas DOMINANT on Pubmed (9,123.77 MB) and GAAN (8,558.05 MB and 4,318.37 MB) allocate less memory than TERGAD.
The base BGE\-large\-en\-v1\.5 model requires 1,257 MB for initialization\. During batch processing \(100 nodes per batch\), the peak memory consumption remains stable across all datasets \(approximately 2,592–2,598 MB\), demonstrating that our embedding generation process is memory\-efficient and scalable regardless of graph size\. This one\-time preprocessing cost is acceptable given the substantial performance gains achieved by incorporating LLM\-derived semantic embeddings\.
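The batch\-wise measurement described above can be approximated as follows\. This is a minimal sketch rather than the authors' script: it assumes the model is loaded through the `sentence-transformers` package under the Hugging Face identifier `BAAI/bge-large-en-v1.5`, and the helper names `iter_batches` and `embed_with_peak_memory` are hypothetical\. Peak usage is read from PyTorch's CUDA allocator statistics and converted from bytes to MiB\.

```python
from typing import Iterator, Sequence, Tuple


def iter_batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_peak_memory(
    texts: Sequence[str],
    model_name: str = "BAAI/bge-large-en-v1.5",  # assumed checkpoint id
    batch_size: int = 100,                        # batch size quoted in the text
) -> Tuple[object, float]:
    """Encode node descriptions batch by batch and report peak GPU memory.

    Returns (embeddings, peak_mib). On a CPU-only machine peak_mib is 0.0.
    """
    if len(texts) == 0:
        raise ValueError("texts must not be empty")

    # Heavy dependencies are imported lazily so the batching helper
    # above stays usable without them.
    import torch
    from sentence_transformers import SentenceTransformer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = SentenceTransformer(model_name, device=device)
    if device == "cuda":
        torch.cuda.reset_peak_memory_stats()

    chunks = []
    for batch in iter_batches(texts, batch_size):
        emb = model.encode(list(batch), batch_size=batch_size,
                           convert_to_tensor=True)
        chunks.append(emb.cpu())  # move off the GPU so memory stays flat

    peak_mib = 0.0
    if device == "cuda":
        peak_mib = torch.cuda.max_memory_allocated() / 2**20
    return torch.cat(chunks, dim=0), peak_mib
```

Because each batch is moved to the CPU before the next one is encoded, the peak depends on the batch size and the longest description rather than on the number of nodes, which is consistent with the near\-constant peak reported above\.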
Finally, we revisit Fig\. 1, whose primary objective is to address a fundamental question: Can LLMs alone, without explicit graph structure modeling, effectively detect graph anomalies? To investigate this, we conducted a controlled experiment on the standard text\-rich graph dataset Citeseer\.
We compare the following methods:
- DOMINANT \[[4](https://arxiv.org/html/2605.19738#bib.bib4)\], a strong GCN\-based graph anomaly detection method achieving AUC = 92\.85% \(see Table [III](https://arxiv.org/html/2605.19738#S5.T3)\)\.
- ChatGPT\-4\.1, Gemini\-3, DeepSeek, and Qwen3\-Max, all operating in a zero\-shot setting\. Specifically, we provide each LLM with raw node attributes \(bag\-of\-words\) and the complete adjacency matrix in textual format \(e\.g\., “node $i$ connects to \[$j_1, j_2, \dots$\]”\), and request binary anomaly labels for each node\.
- We report Recall@K, where $K$ equals the number of ground\-truth anomalous nodes \($K = 150$\)\.
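The protocol above can be sketched in a few lines\. The code below is an illustration under stated assumptions, not the authors' evaluation script: `adjacency_to_text` and `recall_at_k` are hypothetical helper names, the textual format simply mirrors the quoted example, and each detector is assumed to yield a per\-node anomaly score from which the top $K$ nodes are taken\.

```python
from typing import Dict, List, Optional, Sequence


def adjacency_to_text(adj: Dict[int, List[int]]) -> str:
    """Serialise an adjacency list as one 'node i connects to [...]' line per node."""
    lines = [
        f"node {i} connects to {sorted(nbrs)}"
        for i, nbrs in sorted(adj.items())
    ]
    return "\n".join(lines)


def recall_at_k(
    scores: Sequence[float],
    labels: Sequence[int],
    k: Optional[int] = None,
) -> float:
    """Fraction of true anomalies found among the k highest-scoring nodes.

    `labels` holds 1 for anomalous nodes and 0 otherwise. When k is None
    it defaults to the number of ground-truth anomalies, as in the text.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_anomalies = sum(labels)
    if n_anomalies == 0:
        raise ValueError("at least one ground-truth anomaly is required")
    if k is None:
        k = n_anomalies
    # Rank node indices from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / n_anomalies


if __name__ == "__main__":
    print(adjacency_to_text({1: [2, 0], 0: [1]}))
    print(recall_at_k([0.9, 0.1, 0.8, 0.3, 0.7], [1, 0, 0, 0, 1]))
```

With $K$ fixed to the number of true anomalies, a perfect ranking gives a recall of 1, so the metric is directly comparable across methods on the same dataset\.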
As shown in Fig\. 1, all LLM\-based methods exhibit Recall@K scores that are more than 70% lower than DOMINANT\. This substantial performance gap confirms our hypothesis: LLMs alone cannot reliably detect graph anomalies, even on text\-rich graphs, due to their lack of explicit graph reasoning mechanisms\.
This finding directly motivates the core design principle of TERGAD: rather than serving as an independent detector, the LLM functions as a semantic augmentation module that generates high\-quality embeddings from structured textual descriptions, which are then adaptively fused with structural information through our dual\-branch gated autoencoder\.
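The fuse\-then\-reconstruct idea can be illustrated with a small sketch\. Everything in it is an assumption made for illustration: the gate logits are passed in directly \(in a trained model they would come from a learned layer over both branches\), the errors are plain L2 norms, and the weight `alpha` and all function names are hypothetical\. The formulation in the method section of the paper takes precedence over this sketch\.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fuse(
    h_attr: Sequence[float],
    h_sem: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Element-wise convex mix of attribute and LLM-semantic embeddings.

    Each gate g in (0, 1) decides how much of the attribute branch to keep:
    z = g * h_attr + (1 - g) * h_sem.
    """
    if not (len(h_attr) == len(h_sem) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    gates = [sigmoid(v) for v in gate_logits]
    return [g * a + (1.0 - g) * s for g, a, s in zip(gates, h_attr, h_sem)]


def l2_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Euclidean distance between a vector and its reconstruction."""
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)))


def anomaly_score(
    x: Sequence[float],
    x_hat: Sequence[float],
    a_row: Sequence[float],
    a_row_hat: Sequence[float],
    alpha: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Weighted sum of attribute and structure reconstruction errors for one node."""
    return alpha * l2_error(x, x_hat) + (1.0 - alpha) * l2_error(a_row, a_row_hat)
```

A node whose attributes or adjacency row are poorly reconstructed from the fused embedding receives a high score, which is the sense in which the reconstruction error serves as the anomaly signal\.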
## VI Conclusion
In this paper, we propose TERGAD, a novel framework for graph anomaly detection that bridges the gap between graph topology and semantic reasoning through LLMs\. Grounded in the insight that natural language can effectively encapsulate high\-level structural roles, such as “hubs”, “bridges”, or “peripheral nodes”, we first transform abstract graph structures into descriptive textual narratives, which are subsequently encoded by an LLM into rich semantic embeddings for enhanced data representation\. To synergize these insights with raw node attribute data, we introduce a gated dual\-branch autoencoder that adaptively fuses both modalities to reconstruct the original attributes and graph structure\. The resulting anomaly scores are derived from the joint reconstruction error, ensuring that the detection process is both interpretable and aligned with the fundamental definitions of graph anomalies\. Notably, TERGAD is a flexible, plug\-and\-play framework that requires no LLM fine\-tuning and supports efficient end\-to\-end training\. Extensive experiments on six real\-world graph datasets demonstrate that TERGAD consistently outperforms baselines\. In the future, we will explore dynamic prompt optimization, multi\-hop semantic propagation, and the extension of this framework to heterogeneous and temporal graph settings for broader data mining applications\.
## References
- \[1\] \(2025\) GraphLLM: boosting graph reasoning ability of large language model\. IEEE Transactions on Big Data\.
- \[2\] A\. Chen, J\. Wu, and H\. Zhang \(2025\) FIAD: graph anomaly detection framework based feature injection\. Expert Systems with Applications 259, pp\. 125216\.
- \[3\] Z\. Chen, B\. Liu, M\. Wang, P\. Dai, J\. Lv, and L\. Bo \(2020\) Generative adversarial attributed network anomaly detection\. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, pp\. 1989–1992\.
- \[4\] K\. Ding, J\. Li, R\. Bhanushali, and H\. Liu \(2019\) Deep anomaly detection on attributed networks\. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp\. 594–602\.
- \[5\] H\. Fan, F\. Zhang, and Z\. Li \(2020\) AnomalyDAE: dual autoencoder for anomaly detection on attributed networks\. In ICASSP 2020 \- 2020 IEEE International Conference on Acoustics, Speech and Signal Processing \(ICASSP\), pp\. 5685–5689\.
- \[6\] D\. Guo, Z\. Liu, and R\. Li \(2023\) RegraphGAN: a graph generative adversarial network model for dynamic network anomaly detection\. Neural Networks 166, pp\. 273–285\.
- \[7\] J\. Guo, L\. Du, H\. Liu, M\. Zhou, X\. He, and S\. Han \(2023\) GPT4Graph: can large language models understand graph structured data? an empirical evaluation and benchmarking\. arXiv preprint arXiv:2305\.15066\.
- \[8\] X\. He, X\. Bresson, T\. Laurent, A\. Perold, Y\. LeCun, and B\. Hooi \(2024\) Harnessing explanations: llm\-to\-lm interpreter for enhanced text\-attributed graph representation learning\. In Proceedings of the International Conference on Learning Representations 2024 \(ICLR 2024\)\.
- \[9\] T\. N\. Kipf and M\. Welling \(2017\) Semi\-supervised classification with graph convolutional networks\. In Proceedings of the 5th International Conference on Learning Representations\.
- \[10\] Y\. Liu, Z\. Li, S\. Pan, C\. Gong, C\. Zhou, and G\. Karypis \(2021\) Anomaly detection on attributed networks via contrastive self\-supervised learning\. IEEE Transactions on Neural Networks and Learning Systems 33\(6\), pp\. 2378–2392\.
- \[11\] R\. Luo, H\. Huang, T\. Tang, J\. Ren, Z\. Xu, M\. Hou, E\. Dai, and F\. Xia \(2026\) FairGE: fairness\-aware graph encoding in incomplete social networks\. In Proceedings of the ACM on Web Conference 2026\.
- \[12\] R\. Luo, H\. Huang, S\. Yu, F\. Yu, F\. Xia, S\. K\. Das, and C\. Zhang \(2026\) Utility\-preserving federated graph learning with dual\-perspective fairness\. IEEE Transactions on Pattern Analysis and Machine Intelligence\.
- \[13\] R\. Luo, D\. Zhang, Y\. Gao, W\. Shi, M\. Hou, J\. Liu, Z\. Wang, and S\. Yu \(2026\) Bridging semantic understanding and popularity bias with llms\. In Proceedings of the ACM on Web Conference 2026\.
- \[14\] X\. Ma, J\. Wu, J\. Yang, and Q\. Z\. Sheng \(2023\) Towards graph\-level anomaly detection via deep evolutionary mapping\. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 1631–1642\.
- \[15\] A\. D\. Pazho, G\. A\. Noghre, A\. A\. Purkayastha, J\. Vempati, O\. Martin, and H\. Tabkhi \(2024\) A survey of graph\-based deep learning for anomaly detection in distributed systems\. IEEE Transactions on Knowledge and Data Engineering 36\(1\), pp\. 1–20\.
- \[16\] H\. Qiao, H\. Tong, B\. An, I\. King, C\. Aggarwal, and G\. Pang \(2025\) Deep graph anomaly detection: a survey and new perspectives\. IEEE Transactions on Knowledge and Data Engineering 37\(9\)\.
- \[17\] A\. Roy, J\. Shu, J\. Li, C\. Yang, O\. Elshocht, J\. Smeets, and P\. Li \(2024\) GAD\-NR: graph anomaly detection via neighborhood reconstruction\. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp\. 576–585\.
- \[18\] M\. Sakurada and T\. Yairi \(2014\) Anomaly detection using autoencoders with nonlinear dimensionality reduction\. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis\.
- \[19\] P\. Sen, G\. Namata, M\. Bilgic, L\. Getoor, B\. Galligher, and T\. Eliassi\-Rad \(2008\) Collective classification in network data\. AI Magazine 29\(3\), pp\. 93–93\.
- \[20\] J\. Tang, J\. Zhang, L\. Yao, J\. Li, L\. Zhang, and Z\. Su \(2008\) ArnetMiner: extraction and mining of academic social networks\. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp\. 990–998\.
- \[21\] L\. Tang and H\. Liu \(2009\) Relational learning via latent social dimensions\. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp\. 817–826\.
- \[22\] M\. Trokhymovych, L\. Pintscher, R\. Baeza\-Yates, and D\. S\. Trumper \(2025\) Graph\-linguistic fusion: using language models for wikidata vandalism detection\. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, pp\. 284–294\.
- \[23\] X\. Wang, H\. Dou, D\. Dong, and Z\. Meng \(2025\) Graph anomaly detection based on hybrid node representation learning\. Neural Networks 185, pp\. 107169\.
- \[24\] F\. Xia, C\. Peng, J\. Ren, F\. G\. Febrinanto, R\. Luo, V\. Saikrishna, S\. Yu, and X\. Kong \(2026\) Graph learning\. Foundations and Trends® in Signal Processing, pp\. 362–519\.
- \[25\] C\. Xiao, X\. Xu, Y\. Lei, K\. Zhang, S\. Liu, and F\. Zhou \(2023\) Counterfactual graph learning for anomaly detection on attributed networks\. IEEE Transactions on Knowledge and Data Engineering 35\(10\), pp\. 10540–10553\.
- \[26\] Z\. Xu, X\. Huang, Y\. Zhao, Y\. Dong, and J\. Li \(2022\) Contrastive attributed network anomaly detection with data augmentation\. In Proceedings of the Pacific\-Asia Conference on Knowledge Discovery and Data Mining, pp\. 444–457\.
- \[27\] R\. Ye, C\. Zhang, R\. Wang, S\. Xu, and Y\. Zhang \(2024\) Language is all a graph needs\. In Findings of the Association for Computational Linguistics: EACL 2024, pp\. 1955–1973\.
- \[28\] X\. Yuan, N\. Zhou, S\. Yu, H\. Huang, Z\. Chen, and F\. Xia \(2021\) Higher\-order structure based anomaly detection on attributed networks\. In Proceedings of the 2021 IEEE Conference on Big Data, pp\. 2691–2700\.
- \[29\] H\. Zhang, K\. Zeng, and S\. Lin \(2023\) Federated graph neural network for fast anomaly detection in controller area networks\. IEEE Transactions on Information Forensics and Security 18, pp\. 1566–1579\.
- \[30\] Q\. Zhang, S\. Chen, Y\. Bei, Z\. Yuan, H\. Zhou, Z\. Hong, H\. Chen, Y\. Xiao, C\. Zhou, J\. Dong, et al\. \(2025\) A survey of graph retrieval\-augmented generation for customized large language models\. arXiv preprint arXiv:2501\.13958\.
- \[31\] Z\. Zhang, Y\. Hu, B\. Pan, C\. Ling, and L\. Zhao \(2025\) TAGA: text\-attributed graph self\-supervised learning by synergizing graph and text mutual transformations\. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pp\. 4263–4272\.
- \[32\] Z\. Zhang, X\. Wang, H\. Zhou, Y\. Yu, M\. Zhang, C\. Yang, and C\. Shi \(2025\) Can large language models improve the adversarial robustness of graph neural networks? In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 2008–2019\.
- \[33\] Y\. Zheng, H\. Y\. Koh, M\. Jin, L\. Chi, K\. T\. Phan, S\. Pan, Y\. P\. Chen, and W\. Xiang \(2024\) Correlation\-aware spatial–temporal graph learning for multivariate time\-series anomaly detection\. IEEE Transactions on Neural Networks and Learning Systems 35\(9\), pp\. 11802–11816\.
Wen Shi is currently a Master's student in the College of Software Engineering, Jilin University, Changchun, China\. Before that, he received the B\.Sc\. degree from Northeast Agricultural University, Harbin, China, in 2024\. His research interests include graph learning and large language models\.
Zhe Wang received the B\.Sc\., M\.Sc\., and Ph\.D\. degrees from Jilin University, Changchun, China, in 1997, 2001, and 2005, respectively\. He is currently a Full Professor with the College of Computer Science and Technology, Jilin University, Changchun, China\. He has published over 50 scientific papers in international journals and conferences\. His research interests include artificial intelligence, data mining, social media mining, and business intelligence\.
Huafei Huang \(Graduate Student Member, IEEE\) received the BSc degree in Internet of Things engineering from the North University of China in 2020, and the MSc degree in software engineering from the Dalian University of Technology in 2023\. He is currently working toward the PhD degree with the School of Computer Science and Information Technology, Adelaide University\. His research interests include graph learning and large language models\.
Qing Qing is currently a PhD student in the College of Computer Science and Technology, Jilin University, Changchun, China\. Before that, she received the B\.Sc\. degree from Northeast Agricultural University, Harbin, China, in 2018, and the M\.Sc\. degree from Dalian University of Technology, Dalian, China, in 2021\. Her research interests include graph learning, algorithmic fairness, and responsible AI\.
Ziqi Xu received the M\.S\. degree in Computing and Innovation from the School of Computer and Mathematical Sciences, The University of Adelaide, Australia, and the Ph\.D\. degree in Computer Science from the University of South Australia, Australia\. He is currently a Lecturer in Data Science and Artificial Intelligence with the School of Computing Technologies, RMIT University, Australia\. His research interests include responsible AI, causal inference, fairness, and explainable machine learning\.
Qixin Zhang received his B\.S\. degree from the University of Science and Technology of China in 2018\. He subsequently earned his Ph\.D\. degree in the College of Computing at City University of Hong Kong in 2024\. Currently, he is a Research Fellow at Nanyang Technological University, Singapore\. His research interests include optimization, subset selection, online learning, and large language models\. He has published over 25 papers in top\-tier venues such as ICML, NeurIPS, ICLR, CVPR, ACL, and TKDE\.
Xikun Zhang is a Lecturer at the School of Computing Technologies at RMIT University\. He received his Ph\.D\. from the School of Computer Science at the University of Sydney\. His research interests span deep graph learning, reasoning with large language models, and biomedical AI\. His work has been published in leading conferences and journals, including ICLR, NeurIPS, KDD, ICDM, CVPR, ECCV, TPAMI, and TNNLS\.
Renqiang Luo received the B\.Sc\. degree from University of Science and Technology of China, Hefei, China, in 2016, and the M\.Sc\. degree from University of South Australia, Adelaide, Australia, in 2019\. He received a Ph\.D\. degree in the School of Software, Dalian University of Technology, Dalian, China, in 2024\. Dr\. Renqiang Luo is currently an Assistant Professor at Jilin University, Changchun, China\. His research interests include graph learning, algorithmic fairness, and trustworthy AI\.
Feng Xia \(Fellow, IEEE\) received the BSc and PhD degrees from Zhejiang University, Hangzhou, China\. He is a Professor in the School of Computing Technologies, RMIT University, Australia\. Recognized as a Clarivate Highly Cited Researcher and a ScholarGPS Highly Ranked Scholar, Dr\. Xia has published over 400 scientific papers\. His work is featured in top\-tier journals and conferences\. Dr\. Xia has extensive editorial and organizational experience, having served as an Associate or Guest Editor for over 20 journals and in various Chair roles for more than 30 conferences\. His contributions and leadership have been recognized by prestigious awards\. He has delivered numerous keynote speeches and invited talks at international venues worldwide\. He is the Chair of IEEE Task Force on Learning for Graphs\. His research interests include artificial intelligence, graph learning, brain, robotics, and cyber\-physical systems\. He is a Fellow of the IEEE\.