PropLLM: Propagation-Aware Scene Reconstruction for Network Fault Diagnosis

arXiv cs.AI 06/02/26, 04:00 AM Papers
Summary
PropLLM integrates hop-by-hop scene reconstruction with LLMs for network fault diagnosis. It uses a dual-layer knowledge graph and a temporal causal propagation attention mechanism to trace back along propagation paths, improving accuracy and reducing hallucinations.
arXiv:2606.00582v1
# PropLLM: Propagation-Aware Scene Reconstruction for Network Fault Diagnosis

This work was supported in part by the National Natural Science Foundation of China under Grant 62302527, and in part by the High Performance Computing Center of Central South University. (Corresponding author: Fengxiao Tang.)

Source: [https://arxiv.org/html/2606.00582](https://arxiv.org/html/2606.00582)
###### Abstract

Network faults propagate layer by layer along topology and protocol dependencies, yet operations systems typically observe only symptomatic alerts at the tail end of propagation chains, where distinct root\-cause faults may produce highly similar end\-point symptoms\. Existing approaches, whether rule\-based, machine learning \(ML\)\-based, or large language model \(LLM\)\-based, fundamentally map the alert set to a diagnosis in a single pass and are structurally incapable of resolving this end\-point ambiguity\. This paper proposes PropLLM, which is the first to integrate the hop\-by\-hop scene reconstruction paradigm with the generative reasoning capabilities of LLMs\. Starting from end\-point alerts, PropLLM traces back hop\-by\-hop along the propagation path, retrieving verifiable factual evidence from a dual\-layer knowledge graph \(KG\) at each hop, while the proposed Temporal Causal Propagation Attention \(TCPA\) mechanism encodes known topological causal priors directly into the attention computation to guide the model along the correct causal direction, ultimately localizing the root cause and determining the fault type through a fully evidenced causal chain\. On a real\-world Wi\-Fi multimodal fault dataset, PropLLM improves fault type diagnosis accuracy by 3\.9% and root cause localization accuracy by 4\.7% over the strongest baseline, while reducing the hallucination rate by 50\.8%\. Supplementary experiments on the TeleLogs 5G dataset further demonstrate the effectiveness of the proposed method across different network scenarios\.

## I Introduction

Network fault diagnosis \(NFD\) aims to accurately determine fault types \(e\.g\., link degradation, misconfiguration, device failure\) from observed anomalies, thereby guiding operators toward targeted remediation\[[20](https://arxiv.org/html/2606.00582#bib.bib1),[47](https://arxiv.org/html/2606.00582#bib.bib2)\]\. The fundamental difficulty of this task lies in the fact that network faults propagate layer by layer along topology and protocol dependencies, yet operations systems typically observe only symptomatic alerts at the tail end of propagation chains \(as illustrated in Fig\.[1](https://arxiv.org/html/2606.00582#S1.F1)\(a\)\)\. These end\-point alerts are highly ambiguous: different types of root\-cause faults may produce similar downstream symptoms, and the same fault may manifest entirely different alert patterns along different propagation paths\.

Current mainstream methods—whether based on rule matching\[[20](https://arxiv.org/html/2606.00582#bib.bib1)\], machine learning classification\[[50](https://arxiv.org/html/2606.00582#bib.bib3),[7](https://arxiv.org/html/2606.00582#bib.bib4)\], or large language model \(LLM\)\-based generation\[[1](https://arxiv.org/html/2606.00582#bib.bib5)\]—all follow a single\-pass mapping paradigm\. They directly map the observed alert set to a fault type or root cause through one\-step feature extraction or reasoning, without tracing backward along the propagation path for hop\-by\-hop reconstruction and hypothesis verification \(as shown in the upper half of Fig\.[1](https://arxiv.org/html/2606.00582#S1.F1)\(b\)\)\. The core limitation of this paradigm is that alerts represent only the end products of fault propagation\. The same set of symptoms can arise from heterogeneous root causes via different paths, making it impossible to resolve endpoint ambiguity from observations alone\. Accurate fault diagnosis requires reconstructing the full causal chain by tracing back to the root cause, which the single\-pass paradigm is inherently unable to achieve\.

![Refer to caption](https://arxiv.org/html/2606.00582v1/x1.png)

Figure 1: (a) Fault cascading propagation produces ambiguous end-point alerts. (b) Top: one-shot mapping paradigm; Bottom: hop-by-hop scene reconstruction paradigm of PropLLM.

In practice, experienced network engineers resolve end-point alert ambiguity by tracing back hop-by-hop along topology and protocol dependencies. They reconstruct node states, validate causal hypotheses, and localize the root cause (as shown in the lower half of Fig. [1](https://arxiv.org/html/2606.00582#S1.F1)(b)). This approach ensures diagnosis is based on a fully evidenced causal chain rather than statistical symptom correlations. Automating this process requires a reasoning engine that interprets multimodal data, dynamically adjusts hypotheses, and produces interpretable diagnosis chains\[[29](https://arxiv.org/html/2606.00582#bib.bib7)\]. LLMs are a natural choice for this task due to their cross-modal understanding and generative reasoning capabilities\[[33](https://arxiv.org/html/2606.00582#bib.bib8),[35](https://arxiv.org/html/2606.00582#bib.bib9),[1](https://arxiv.org/html/2606.00582#bib.bib5),[5](https://arxiv.org/html/2606.00582#bib.bib6)\]. Integrating LLMs into hop-by-hop reconstruction requires two conditions: (1) verifiable evidence must be retrievable at each hop for state reconstruction, and (2) the reasoning must be causally aware, distinguishing upstream causes from downstream effects.

However, existing work has systematic gaps in factual grounding, making hop\-by\-hop scene reconstruction difficult in practice\. Scene reconstruction requires restoring each node’s state using two distinct types of facts: structural knowledge \(topology, protocol configurations, and device parameters\) and experiential knowledge \(historical fault patterns on similar paths\)\. Although knowledge graphs \(KGs\) are suitable for organizing both\[[16](https://arxiv.org/html/2606.00582#bib.bib10)\], most existing methods use flat structures that mix them indiscriminately\[[23](https://arxiv.org/html/2606.00582#bib.bib11),[36](https://arxiv.org/html/2606.00582#bib.bib12)\]\. This mixing introduces retrieval noise and fails to capture hierarchical relationships, as the two knowledge types differ in update frequency, query patterns, and indexing\[[14](https://arxiv.org/html/2606.00582#bib.bib13)\]\. Moreover, current RAG methods\[[19](https://arxiv.org/html/2606.00582#bib.bib14)\]inject knowledge in a single pass\[[2](https://arxiv.org/html/2606.00582#bib.bib15)\], lacking hop\-by\-hop verification, which makes models prone to hallucination and produces diagnosis paths that deviate from the true causal chain\.

Existing work also lacks causal awareness\. Standard attention mechanisms\[[31](https://arxiv.org/html/2606.00582#bib.bib16)\]assign symmetric weights to all tokens, failing to distinguish causal direction and often misidentifying end\-point symptoms as root causes\. Although chain\-of\-thought \(CoT\) prompting\[[38](https://arxiv.org/html/2606.00582#bib.bib17)\]can guide reasoning steps, it does not alter the underlying symmetric attention computation, so directional errors persist in complex scenarios\. Crucially, network fault propagation direction is a known structural prior derived from topology and protocol dependencies, unlike microservice systems where causality must be discovered from data\[[15](https://arxiv.org/html/2606.00582#bib.bib18),[44](https://arxiv.org/html/2606.00582#bib.bib19),[43](https://arxiv.org/html/2606.00582#bib.bib20)\]\. The key missing component is a mechanism that encodes these causal priors directly into attention layers, enabling directional awareness at every step rather than relying on post\-hoc prompts\.

This paper proposes PropLLM, the first framework to integrate hop\-by\-hop hypothesis\-verification scene reconstruction with the generative reasoning capabilities of LLMs\. To enable factual grounding, we construct a dual\-layer knowledge graph that separates structural knowledge of topology and protocols from experiential knowledge of historical fault patterns, with cross\-layer associations for efficient retrieval\. To achieve propagation\-aware causal reasoning, we introduce the Temporal Causal Propagation Attention \(TCPA\) mechanism, which encodes topological causal priors into every attention layer through a causal direction mask, propagation diffusion matrix, and temporal bias\. The TCPA output is injected into the LLM decoder via cross\-attention\. Built on this foundation, PropLLM performs hop\-by\-hop reconstruction through a dynamic closed loop, where verification results trigger targeted retrieval at the next hop, constraining the reasoning chain with factual evidence and effectively suppressing hallucination\.

The main contributions of this paper are as follows:

- We propose the PropLLM framework, which for the first time formalizes the hop-by-hop scene reconstruction methodology of human experts into a computable reasoning paradigm, revealing that accurate fault type diagnosis depends on a fully evidenced causal chain rather than single-pass mapping from end-point observations.
- We propose TCPA, a standalone Transformer encoder that encodes known topological causal priors into every attention layer and injects its output into the LLM decoder via cross-attention, enabling continuous perception of fault propagation direction during generation.
- We construct a dual-layer KG separating structural from experiential knowledge, with a dynamic closed-loop retrieval mechanism that constrains each hop’s reasoning with factual evidence, effectively suppressing hallucination.

## II Related Work

### II-A Network Fault Diagnosis Methods

The evolution of NFD methods has shifted from rule\-driven to data\-driven approaches\. Rule\-based methods\[[20](https://arxiv.org/html/2606.00582#bib.bib1)\]depend on expert\-defined rules but scale poorly and cover only known faults\. Traditional ML methods\[[20](https://arxiv.org/html/2606.00582#bib.bib1),[50](https://arxiv.org/html/2606.00582#bib.bib3)\]treat diagnosis as classification but struggle with topological and temporal dependencies\. Deep learning approaches\[[7](https://arxiv.org/html/2606.00582#bib.bib4),[21](https://arxiv.org/html/2606.00582#bib.bib21),[49](https://arxiv.org/html/2606.00582#bib.bib22)\]improve representation learning yet still rely on statistical co\-occurrence and falter under ambiguous alert patterns from multiple root causes\. Unlike microservice systems \(e\.g\., MULAN\[[51](https://arxiv.org/html/2606.00582#bib.bib23)\], Minder\[[8](https://arxiv.org/html/2606.00582#bib.bib24)\]\), where causality must be discovered from data, network fault propagation is a known structural prior from topology and protocols\. Recent LLM\-based methods\[[1](https://arxiv.org/html/2606.00582#bib.bib5),[5](https://arxiv.org/html/2606.00582#bib.bib6),[17](https://arxiv.org/html/2606.00582#bib.bib25),[33](https://arxiv.org/html/2606.00582#bib.bib8)\]show promise, but effectively harnessing their reasoning capabilities in knowledge interaction and causal awareness remains an open question\.

### II-B LLM-Driven Fault Diagnosis

Existing work on applying LLMs to fault diagnosis can be categorized into three main approaches\. The end\-to\-end generation approach directly maps event descriptions to diagnostic results\[[1](https://arxiv.org/html/2606.00582#bib.bib5),[5](https://arxiv.org/html/2606.00582#bib.bib6),[45](https://arxiv.org/html/2606.00582#bib.bib26)\]\. While offering strong zero\-shot generalization, these methods rely solely on parametric knowledge and lack external factual grounding\. The retrieval\-augmented approach enhances LLMs via RAG\[[19](https://arxiv.org/html/2606.00582#bib.bib14)\]\[[17](https://arxiv.org/html/2606.00582#bib.bib25),[33](https://arxiv.org/html/2606.00582#bib.bib8)\]\. However, they typically follow a single\-retrieval, single\-generation pipeline, which cannot support the dynamic knowledge demands of multi\-hop reasoning\. The agent\-based approach equips LLMs with dynamic tool invocation\[[35](https://arxiv.org/html/2606.00582#bib.bib9),[43](https://arxiv.org/html/2606.00582#bib.bib20)\]\. Nevertheless, they still suffer from temporal decoupling between knowledge acquisition and reasoning, preventing closed\-loop hop\-by\-hop verification\. Furthermore, all three approaches lack causal awareness\. LLM self\-attention\[[31](https://arxiv.org/html/2606.00582#bib.bib16)\]assigns symmetric weights and cannot distinguish causal direction, while CoT prompting\[[38](https://arxiv.org/html/2606.00582#bib.bib17)\]only operates at the decoding level\. In summary, existing LLM\-based fault diagnosis methods suffer from two core limitations: \(1\) insufficient on\-demand hop\-by\-hop knowledge interaction and verification, and \(2\) inability to perceive causal propagation direction in attention mechanisms\.

### II-C Knowledge Graph Representation and Causal Reasoning

The previous subsection identified knowledge interaction and causal awareness as two core gaps\. This subsection reviews the technical foundations of the proposed dual\-layer KG and TCPA\.

For KG representation in fault diagnosis, early studies mainly relied on single\-layer knowledge graphs combined with GCNs for fault classification\[[23](https://arxiv.org/html/2606.00582#bib.bib11)\]\. Recent methods introduced temporal knowledge graphs in UniDiag\[[48](https://arxiv.org/html/2606.00582#bib.bib27)\]and hierarchical structures in KG4Diagnosis\[[52](https://arxiv.org/html/2606.00582#bib.bib28)\]\. KG\-augmented LLM approaches\[[40](https://arxiv.org/html/2606.00582#bib.bib52),[24](https://arxiv.org/html/2606.00582#bib.bib29),[11](https://arxiv.org/html/2606.00582#bib.bib30)\]further explore retrieval\-reasoning closed loops\. However, none of these methods separates structural knowledge from experiential knowledge into semantically linked yet distinct layers\. This separation is critical for accurate fault diagnosis, as it enables simultaneous verification of current states using structural knowledge and historical fault patterns using experiential knowledge at each reasoning step\. Mixing the two types introduces retrieval noise and hinders precise on\-demand verification\.

For causally\-aware attention, existing methods\[[18](https://arxiv.org/html/2606.00582#bib.bib31),[30](https://arxiv.org/html/2606.00582#bib.bib32),[6](https://arxiv.org/html/2606.00582#bib.bib33),[26](https://arxiv.org/html/2606.00582#bib.bib34)\]focus on discovering latent causal relations from data\. In contrast, PropLLM operates under a known structural prior of fault propagation direction obtainable from network configurations\. While input\-stage GNNs inject topology only once, TCPA encodes causal priors into every attention layer through its causal direction mask and propagation diffusion matrix, achieving deeper integration\.

## III Dual-Layer Knowledge Graph Construction

Each step of hop-by-hop verification requires two fundamentally different types of knowledge, namely structural knowledge describing network topology and protocol behavior specifications, and experiential knowledge derived from historical fault cases. We organize these into an infrastructure-layer graph $\mathcal{G}_{\text{infra}}$ and a fault-experience-layer graph $\mathcal{G}_{\text{fault}}$, linked by cross-layer semantic associations for joint querying. Fig. [2](https://arxiv.org/html/2606.00582#S3.F2) illustrates the overall architecture and knowledge sources of the dual-layer KG.

The knowledge sources of the dual-layer KG fall into three categories: (1) domain standards (IEEE 802.11, TCP/IP specifications, common topology patterns) from protocol documents and engineering textbooks, forming the backbone of $\mathcal{G}_{\text{infra}}$; (2) operations expert knowledge (causal propagation patterns, symptom signatures, diagnostic rules) from expert interviews and manuals, constituting prior causal templates in $\mathcal{G}_{\text{fault}}$; and (3) instantiation data (topology configurations, monitoring records) from specific network environments, binding generic knowledge to a concrete network.

Since the third category partially overlaps with the evaluation dataset, we clarify fairness safeguards. Information from the dataset in $\mathcal{G}_{\text{infra}}$ is limited to topology structure and static device attributes, equivalent to a deployment-time asset inventory without fault labels or runtime state, and thus does not constitute information leakage. Case knowledge in $\mathcal{G}_{\text{fault}}$ is drawn strictly from the training split, with the first 80% of cases by occurrence time used for graph construction and no test case participating. Protocol specifications and topology patterns in $\mathcal{G}_{\text{infra}}$, as well as prior causal templates in $\mathcal{G}_{\text{fault}}$, are publicly available domain knowledge independent of any specific dataset.
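The time-ordered split described above can be illustrated with a short sketch. It assumes each case is a record carrying an occurrence timestamp; the `FaultCase` class, its field names, and the `chronological_split` helper are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultCase:
    case_id: str
    occurred_at: float  # hypothetical field: occurrence time as a Unix timestamp


def chronological_split(cases, train_fraction=0.8):
    """Return (train, test): the earliest train_fraction of cases by time, then the rest."""
    ordered = sorted(cases, key=lambda c: c.occurred_at)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data: five cases with out-of-order timestamps.
cases = [FaultCase(f"case-{i}", float(t)) for i, t in enumerate([50, 10, 40, 20, 30])]
train, test = chronological_split(cases)
# Under the stated safeguard, only `train` would feed the fault-experience layer.
```

Splitting by time rather than at random keeps every graph-construction case strictly earlier than every evaluation case, which is the property the safeguard relies on.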

![Refer to caption](https://arxiv.org/html/2606.00582v1/x2.png)

Figure 2: Architecture of the dual-layer knowledge graph with cross-layer semantic links.

### III-A Infrastructure-Layer Graph $\mathcal{G}_{\text{infra}}$

$\mathcal{G}_{\text{infra}}$ structurally encodes the network’s topology, device capabilities, protocol behavior specifications, and communication patterns, providing hop-by-hop verification with foundational facts such as each device’s role, connectivity, protocol configuration, and normal behavior baseline.

Let $\mathcal{G}_{\text{infra}} = (\mathcal{V}_{\text{infra}}, \mathcal{E}_{\text{infra}}, \mathcal{A}_{\text{infra}})$, where $\mathcal{V}_{\text{infra}}$ is the entity set, $\mathcal{E}_{\text{infra}} \subseteq \mathcal{V}_{\text{infra}} \times \mathcal{R}_{\text{infra}} \times \mathcal{V}_{\text{infra}}$ is the relation set, $\mathcal{R}_{\text{infra}}$ is the relation type set, and $\mathcal{A}_{\text{infra}}: \mathcal{V}_{\text{infra}} \to 2^{\mathcal{K} \times \text{Val}}$ is the attribute mapping.

$\mathcal{V}_{\text{infra}}$ comprises four entity types: *device* entities carrying role labels that determine structural positions along propagation paths; *interface* entities recording operating mode, frequency band, and channel parameters; *protocol* entities encoding behavioral specifications serving as anomaly detection baselines; and *topology pattern* entities representing common network topologies and their propagation characteristics. $\mathcal{R}_{\text{infra}}$ contains six relation types: physical/wireless connectivity, device–interface ownership, protocol instantiation, data-flow communication, topology pattern instantiation, and inter-protocol dependency.

$\mathcal{G}_{\text{infra}}$ is constructed in two stages: *generic knowledge injection* extracts protocol entities, topology patterns, and inter-protocol dependencies from standard documents and engineering knowledge bases; *instantiation* binds this generic knowledge to a concrete network by extracting devices and connectivity from topology configurations and parsing interface attributes and communication relations from device and traffic records.

### III-B Fault-Experience-Layer Graph $\mathcal{G}_{\text{fault}}$

$\mathcal{G}_{\text{fault}}$ structurally encodes the causal propagation patterns of fault types and the observational data of historical cases, providing hop-by-hop verification with experiential evidence on how specific symptoms typically propagate and what root causes they have historically pointed to.

Let $\mathcal{G}_{\text{fault}} = (\mathcal{V}_{\text{fault}}, \mathcal{E}_{\text{fault}}, \mathcal{A}_{\text{fault}})$, where $\mathcal{V}_{\text{fault}}$ is the fault-related entity set, $\mathcal{E}_{\text{fault}} \subseteq \mathcal{V}_{\text{fault}} \times \mathcal{R}_{\text{fault}} \times \mathcal{V}_{\text{fault}}$ is the relation set, and $\mathcal{A}_{\text{fault}}$ is the attribute mapping.

$\mathcal{V}_{\text{fault}}$ comprises five entity types: *fault type* entities encoding definitions and triggering conditions; *causal template* entities describing typical propagation paths distilled from expert knowledge; *fault case* entities carrying type labels, timestamps, and root-cause devices; *alert* entities with codes, severity levels, and device identifiers; and *symptom pattern* entities aggregating similar alert combinations across cases. $\mathcal{R}_{\text{fault}}$ contains six relation types: fault type–template association, temporal triggering between alerts, alert–case membership, case–type determination, case–device localization, and case–symptom pattern association.

The temporal triggering relation is the core structure of $\mathcal{G}_{\text{fault}}$. For each training-set case, alerts are sorted by timestamp; for pairs within a time window $\Delta t_{\text{max}}$ whose devices are topologically connected in $\mathcal{G}_{\text{infra}}$, a triggering edge is established from the earlier alert to the later one, and transitive reduction removes redundant edges, converting each case into a directed acyclic propagation graph. Symptom pattern entities are generated by hierarchical clustering over training-set cases using alert code sets as features, each associated with a fault type distribution for rapid hypothesis narrowing.
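The edge construction and transitive reduction just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Alert` record, the function names, and the toy topology are hypothetical. Two choices are assumptions made here rather than details from the paper: alerts with identical timestamps are not linked, which keeps the graph acyclic, and alerts on the same device are linked only if the topology lists a self-link.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alert:
    alert_id: str
    device: str
    timestamp: float


def build_trigger_edges(alerts, topo_links, dt_max):
    """Add an edge earlier -> later for alert pairs within dt_max on connected devices."""
    connected = {frozenset(link) for link in topo_links}
    ordered = sorted(alerts, key=lambda a: a.timestamp)
    edges = set()
    for a, b in combinations(ordered, 2):  # a is never later than b
        in_window = 0 < b.timestamp - a.timestamp <= dt_max
        if in_window and frozenset((a.device, b.device)) in connected:
            edges.add((a.alert_id, b.alert_id))
    return edges


def transitive_reduction(edges):
    """Drop every edge (u, v) for which v stays reachable from u via a longer path."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(src, dst, skip_edge):
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {e for e in edges if not reachable(e[0], e[1], skip_edge=e)}


# Toy case: a fault starting at the router shows up at the switch, then the AP.
alerts = [
    Alert("a3", "ap", 4.0),
    Alert("a1", "router", 0.0),
    Alert("a2", "switch", 2.0),
]
links = [("router", "switch"), ("switch", "ap"), ("router", "ap")]
edges = build_trigger_edges(alerts, links, dt_max=5.0)
reduced = transitive_reduction(edges)
```

In the toy case the direct edge from `a1` to `a3` is redundant because the path through `a2` already explains it, so only the two chain edges survive the reduction.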

### III-C Cross-Layer Semantic Links

The core value of the dual-layer KG lies in cross-layer joint querying. When traceback reaches a device, the model must simultaneously access its infrastructure attributes and historical propagation behavior under similar alerts. This is realized through two semantic links: *device anchoring* links alert entities in $\mathcal{G}_{\text{fault}}$ to device entities in $\mathcal{G}_{\text{infra}}$ via identifiers, enabling retrieval of both historical alerts and topological position in a single traversal; *topology-template association* connects causal templates in $\mathcal{G}_{\text{fault}}$ with topology patterns in $\mathcal{G}_{\text{infra}}$, allowing the model to filter inapplicable templates based on the current network’s topology.

The complete dual-layer KG is defined as $\mathcal{G} = (\mathcal{G}_{\text{infra}}, \mathcal{G}_{\text{fault}}, \mathcal{L}_{\text{cross}})$, where $\mathcal{L}_{\text{cross}}$ is the cross-layer link set. For the $k$-th step of hop-by-hop verification, given the current device $d_k$ and causal hypothesis $h_k$, the model executes the following cross-layer joint queries:

$$q_{\text{struct}}(d_k) = \{(v, r, v') \in \mathcal{E}_{\text{infra}} \mid v = d_k \lor v' = d_k\} \tag{1}$$

$$q_{\text{exp}}(d_k, h_k) = \{(v, r, v') \in \mathcal{E}_{\text{fault}} \mid \text{anchor}(v) = d_k \land \text{sim}(\text{attr}(v), h_k) > \tau\} \tag{2}$$

where $q_{\text{struct}}$ returns all associated facts of device $d_k$ in the infrastructure layer, including topological neighbors, protocol configurations, and normal communication baselines; $q_{\text{exp}}$ returns historical alerts and propagation records of device $d_k$ in the fault-experience layer that are semantically similar to the current hypothesis $h_k$, with $\tau$ as the similarity threshold. The results of both queries jointly constitute the factual evidence for verification at step $k$.
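The two queries in Eqs. (1) and (2) can be read as simple set filters over triples. The sketch below is illustrative only: the paper does not specify the form of $\text{sim}$, so a Jaccard overlap between keyword sets stands in for it here, and the dictionaries `anchor` and `attr`, the entity names, and the threshold value are all hypothetical.

```python
def q_struct(infra_edges, d_k):
    """Eq. (1): all infrastructure triples that touch device d_k."""
    return [(v, r, v2) for (v, r, v2) in infra_edges if v == d_k or v2 == d_k]


def jaccard(a, b):
    """Stand-in for sim(., .): set overlap between two keyword sets (an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def q_exp(fault_edges, anchor, attr, d_k, h_k, tau):
    """Eq. (2): fault-layer triples whose head is anchored to d_k and resembles h_k."""
    return [
        (v, r, v2)
        for (v, r, v2) in fault_edges
        if anchor.get(v) == d_k and jaccard(attr.get(v, ()), h_k) > tau
    ]


# Toy infrastructure layer and fault-experience layer.
infra_edges = [
    ("ap1", "connects_to", "sw1"),
    ("sw1", "connects_to", "gw1"),
    ("ap1", "runs", "802.11ac"),
]
fault_edges = [
    ("alert_17", "triggers", "alert_18"),
    ("alert_42", "triggers", "alert_43"),
]
anchor = {"alert_17": "sw1", "alert_18": "ap1", "alert_42": "sw1", "alert_43": "ap1"}
attr = {"alert_17": {"link", "flap", "uplink"}, "alert_42": {"dhcp", "timeout"}}

hypothesis = {"uplink", "link", "degradation"}
structural = q_struct(infra_edges, "sw1")
experiential = q_exp(fault_edges, anchor, attr, "sw1", hypothesis, tau=0.4)
```

Both alerts on `sw1` pass the anchoring test, but only `alert_17` shares enough keywords with the hypothesis to clear the threshold, which shows how the similarity condition narrows the experiential evidence at each hop.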

### III-D Graph Statistics and Analysis

Table [I](https://arxiv.org/html/2606.00582#S3.T1) reports the KG statistics. Protocol and topology pattern entities in $\mathcal{G}_{\text{infra}}$ originate from public domain knowledge (55.8% of $\mathcal{G}_{\text{infra}}$ entities), ensuring transferability: deploying to a new network requires updating only device and interface entities. $\mathcal{G}_{\text{fault}}$ accounts for 95.5% of total entities and 77.6% of relations, reflecting that diagnostic complexity lies in propagation pattern diversity rather than network structure. Every alert entity is anchored to a device entity for seamless cross-layer access. All cases in $\mathcal{G}_{\text{fault}}$ are strictly confined to the training split.

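TABLE I: Dual-layer knowledge graph statistics.

| Layer | Entity Type | Count | Source |
| --- | --- | --- | --- |
| $\mathcal{G}_{\text{infra}}$ | Device | 21 | Dataset topology config |
| $\mathcal{G}_{\text{infra}}$ | Interface | 21 | Dataset device records |
| $\mathcal{G}_{\text{infra}}$ | Protocol | 47 | Protocol standards (IEEE/IETF) |
| $\mathcal{G}_{\text{infra}}$ | Topology pattern | 6 | Network engineering KB |
| $\mathcal{G}_{\text{infra}}$ | Relations | 312 | — |
| $\mathcal{G}_{\text{fault}}$ | Fault type | 11 | Domain taxonomy |
| $\mathcal{G}_{\text{fault}}$ | Causal template | 34 | Operations expert knowledge |
| $\mathcal{G}_{\text{fault}}$ | Fault case | 472 | Training split (80%) |
| $\mathcal{G}_{\text{fault}}$ | Alert | 14,780 | Training split (80%) |
| $\mathcal{G}_{\text{fault}}$ | Symptom pattern | 89 | Clustering over training cases |
| $\mathcal{G}_{\text{fault}}$ | Relations | 52,436 | — |
| $\mathcal{L}_{\text{cross}}$ | Anchor | 14,780 | Device identifier matching |
| $\mathcal{L}_{\text{cross}}$ | Topo\_match | 34 | Topology-template semantic matching |
| Total | Entities | 15,481 | |
| Total | Relations | 67,562 | |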

## IV PropLLM Framework

As illustrated in Fig\.[3](https://arxiv.org/html/2606.00582#S4.F3), PropLLM consists of four modules: \(1\) multimodal encoding, which maps heterogeneous monitoring data into a unified semantic space; \(2\) subgraph retrieval and knowledge injection, which dynamically retrieves relevant subgraphs from the dual\-layer KG based on the current reasoning state; \(3\) the TCPA Transformer, which drives hop\-by\-hop causal traceback along the propagation path; and \(4\) output and verification, which generates fault type diagnoses and causal explanation chains with consistency checking\. Training employs reinforcement learning with diagnosis accuracy, causal chain quality, and hallucination suppression as joint reward signals\.

![Refer to caption](https://arxiv.org/html/2606.00582v1/x3.png)

Figure 3: Overall architecture of the PropLLM framework.

### IV-A Multimodal Encoding Module

The input comprises three modalities: alert text/logs, device performance time series, and topological graph structure\. Three parallel encoding paths capture modality\-specific structures\.

For alert and log text of device $d_i$, a pretrained language model embedding layer\[[9](https://arxiv.org/html/2606.00582#bib.bib35)\] produces:

$$\mathbf{h}_i^{\text{text}} = \text{LLM\_Embed}(\text{Concat}(w_1, \ldots, w_m)) \in \mathbb{R}^{d_t} \tag{3}$$

where $\{w_1, \ldots, w_m\}$ is the concatenated alert sequence and $d_t$ is the embedding dimension.

For performance time series $\mathbf{X}_i \in \mathbb{R}^{T \times F}$ with $T$ time steps and $F$ features, a bidirectional LSTM\[[28](https://arxiv.org/html/2606.00582#bib.bib36)\] captures temporal dependencies:

$$\mathbf{h}_i^{\text{ts}} = \text{LSTM}(\mathbf{X}_i; \theta_{\text{lstm}}) \in \mathbb{R}^{d_s} \tag{4}$$

where $\mathbf{h}_i^{\text{ts}}$ is the final hidden state and $d_s$ is the hidden dimension.

For the network topology with adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, a graph attention network (GAT)\[[32](https://arxiv.org/html/2606.00582#bib.bib37)\] encodes topological context:

$$\mathbf{h}_i^{\text{topo}} = \text{GAT}(\mathbf{H}^{(0)}, \mathbf{A}; \theta_{\text{gat}}) \in \mathbb{R}^{d_g} \tag{5}$$

where $\mathbf{H}^{(0)}$ is the initial feature matrix from device role encodings.

The three representations are fused via learnable linear projections:

$$\mathbf{z}_i = \mathbf{W}_t \mathbf{h}_i^{\text{text}} + \mathbf{W}_s \mathbf{h}_i^{\text{ts}} + \mathbf{W}_g \mathbf{h}_i^{\text{topo}} + \mathbf{b} \tag{6}$$

where $\mathbf{W}_t \in \mathbb{R}^{d \times d_t}$, $\mathbf{W}_s \in \mathbb{R}^{d \times d_s}$, and $\mathbf{W}_g \in \mathbb{R}^{d \times d_g}$ are projection matrices. The fused matrix $\mathbf{Z} = [\mathbf{z}_1, \ldots, \mathbf{z}_N]^{\top} \in \mathbb{R}^{N \times d}$ serves as input to subsequent stages.
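To make the dimensions in Eq. (6) concrete, the following sketch fuses three toy modality vectors for a single device using plain Python lists. The numeric values and the sizes $d = 2$, $d_t = 3$, $d_s = 2$, $d_g = 1$ are made up for illustration; in the framework the projections would be learned and the inputs would come from the three encoders.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse(h_text, h_ts, h_topo, W_t, W_s, W_g, b):
    """Eq. (6): z_i = W_t h_text + W_s h_ts + W_g h_topo + b."""
    parts = (matvec(W_t, h_text), matvec(W_s, h_ts), matvec(W_g, h_topo))
    return [sum(terms) + bias for terms, bias in zip(zip(*parts), b)]


# Toy per-device representations with different widths.
h_text = [1.0, 2.0, 3.0]  # d_t = 3
h_ts = [4.0, 5.0]         # d_s = 2
h_topo = [6.0]            # d_g = 1

# Each projection maps its modality into the shared d = 2 space.
W_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
W_s = [[0.5, 0.0], [0.0, 0.5]]
W_g = [[1.0], [-1.0]]
b = [0.5, 0.5]

z_i = fuse(h_text, h_ts, h_topo, W_t, W_s, W_g, b)
```

Because each modality has its own projection matrix, the encoders are free to use different hidden sizes while the fused vector always has the shared width $d$.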

### IV-B Subgraph Retrieval and Knowledge Injection

Hop-by-hop verification requires the model to retrieve knowledge relevant to the current causal hypothesis at each reasoning step. At the $k$-th step of the TCPA Transformer, the retrieval query is formulated as:

$$\text{query}_k = (d_k,\; \text{code}_k,\; \mathbf{s}_k) \tag{7}$$

where $d_k$ is the device currently reached by traceback, $\text{code}_k$ is its alert code set, and $\mathbf{s}_k$ is the model’s current hidden state. These three components respectively specify the spatial anchor, content anchor, and semantic anchor of retrieval.

Based on this query, the model performs cross\-layer subgraph retrieval on the dual\-layer KG\. On𝒢infra\\mathcal\{G\}\_\{\\text\{infra\}\}, anLL\-hop neighborhood expansion centered atdkd\_\{k\}extracts the structural subgraph:

𝒢kinfra=L\-hop\-subgraph\(𝒢infra,dk\)\\mathcal\{G\}\_\{k\}^\{\\text\{infra\}\}=\\text\{$L$\-hop\-subgraph\}\(\\mathcal\{G\}\_\{\\text\{infra\}\},\\;d\_\{k\}\)\(8\)which contains the device’s topological neighbors, interface attributes, protocol configurations, and normal communication baselines\. On𝒢fault\\mathcal\{G\}\_\{\\text\{fault\}\}, the anchoring link locates historical alert entities ofdkd\_\{k\}, filters the subset matchingcodek\\text\{code\}\_\{k\}, and expands along triggering edges to extract the experiential subgraph:

𝒢kfault=PropSubgraph\(𝒢fault,anchor\(dk\),codek\)\\mathcal\{G\}\_\{k\}^\{\\text\{fault\}\}=\\text\{PropSubgraph\}\(\\mathcal\{G\}\_\{\\text\{fault\}\},\\;\\text\{anchor\}\(d\_\{k\}\),\\;\\text\{code\}\_\{k\}\)\(9\)which contains related propagation path segments and root\-cause information\. The two subgraphs are merged as𝒢k=𝒢kinfra∪𝒢kfault\\mathcal\{G\}\_\{k\}=\\mathcal\{G\}\_\{k\}^\{\\text\{infra\}\}\\cup\\mathcal\{G\}\_\{k\}^\{\\text\{fault\}\}\.
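The two retrieval steps in Eqs. (8) and (9) can be sketched with plain graph traversals. This is an illustration under stated assumptions, not the paper's implementation: the adjacency-dict representation, the `triggered_by` mapping (alert to the alerts that triggered it), the function names, and the device and alert identifiers are all hypothetical, and the semantic anchor $\mathbf{s}_{k}$ is omitted.

```python
from collections import deque

def l_hop_subgraph(adj, center, L):
    """Eq. (8) sketch: nodes within L hops of `center`, found by BFS."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == L:              # do not expand beyond L hops
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def prop_subgraph(triggered_by, alerts_by_device, device, codes):
    """Eq. (9) sketch: anchor at the device's historical alerts, keep those
    matching `codes`, then follow triggering edges (direction is assumed)."""
    seeds = {a for a, code in alerts_by_device.get(device, ()) if code in codes}
    seen, stack = set(seeds), list(seeds)
    while stack:
        a = stack.pop()
        for nxt in triggered_by.get(a, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Toy infrastructure graph and fault history (hypothetical identifiers).
adj = {"ap": ["relay", "sta1"], "relay": ["ap", "sta2"],
       "sta1": ["ap"], "sta2": ["relay"]}
alerts_by_device = {"sta2": [("a3", "LOW_TPUT"), ("a9", "AUTH_FAIL")]}
triggered_by = {"a3": ["a2"], "a2": ["a1"], "a9": ["a8"]}

infra_nodes = l_hop_subgraph(adj, "sta2", L=2)
fault_nodes = prop_subgraph(triggered_by, alerts_by_device, "sta2", {"LOW_TPUT"})
merged = infra_nodes | fault_nodes    # union, as in G_k = G_k^infra U G_k^fault
```

With $L=2$ the expansion from `sta2` reaches `relay` and `ap` but not `sta1`, which shows how the neighborhood radius keeps retrieval local to the device under inspection.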

The retrieved subgraph is converted into a sequence via structure-aware linearization\[[4](https://arxiv.org/html/2606.00582#bib.bib40)\]: each triple $(h,r,t)$ in $\mathcal{G}_{k}$ is rendered as a natural language description with structural markers and encoded through the LLM embedding layer, yielding the knowledge matrix $\mathbf{K}_{k}\in\mathbb{R}^{|\mathcal{G}_{k}|\times d}$, which is injected into the TCPA layer via cross-attention. Unlike single-pass RAG\[[19](https://arxiv.org/html/2606.00582#bib.bib14)\], the query at each step is dynamically determined by the model’s reasoning state, so retrieval focus automatically shifts from end-point to upstream devices as traceback progresses, achieving hop-by-hop synchronization between reasoning and knowledge acquisition.

### IV-C Temporal Causal Propagation Attention (TCPA)

TCPA enables PropLLM to perceive fault propagation direction at the attention level. Standard self-attention\[[31](https://arxiv.org/html/2606.00582#bib.bib16)\] has no built-in notion of propagation direction and therefore cannot distinguish causal upstream from downstream. TCPA addresses this through three mechanisms: a causal direction mask constraining information flow, a propagation diffusion matrix encoding causal reachability, and a temporal propagation bias capturing event temporal order.

#### IV-C1 Causal Direction Mask

Fault propagation spreads from the root-cause device downstream along topology and protocol dependencies, so hop-by-hop traceback should proceed in the reverse direction, from end-point alerts toward the upstream. The causal direction mask $\mathbf{M}_{\text{causal}}\in\mathbb{R}^{N\times N}$ enforces this prior by constraining attention directionality:

$$\mathbf{M}_{\text{causal}}(i,j)=\begin{cases}0,&\text{if }j\text{ is causal upstream or peer of }i\\ -\infty,&\text{otherwise}\end{cases} \tag{10}$$

Causal upstream is determined jointly by topological distance and alert temporal order. Device $j$ is the causal upstream of device $i$ if and only if:

$$\text{upstream}(j,i)\equiv(\text{depth}(j)\leq\text{depth}(i))\land(t_{j}^{\text{first}}\leq t_{i}^{\text{first}}) \tag{11}$$

where $\text{depth}(\cdot)$ denotes the device’s depth in the topology tree (AP as root, closer to root is upstream) and $t_{\cdot}^{\text{first}}$ denotes its earliest alert timestamp. Positions masked with $-\infty$ yield near-zero attention after softmax, blocking reverse information flow and forcing the model to trace back exclusively toward the root cause.

Note that $\text{depth}(\cdot)$ in Eq. ([11](https://arxiv.org/html/2606.00582#S4.E11)) is a concrete instantiation for tree topologies, not a structural constraint of TCPA. For general topologies, it can be replaced by any partial order function reflecting causal propagation direction, e.g., topological sort on forwarding DAGs in mesh networks, or inter-base-station interference order in the TeleLogs 5G scenario (validated in our experiments). When the topological partial order is locally undefined (e.g., routing loops), the temporal constraint $t_{j}^{\text{first}}\leq t_{i}^{\text{first}}$ provides a fallback cue. When both degenerate simultaneously, TCPA sets the mask to zero rather than $-\infty$, gracefully degrading to standard self-attention for that device pair.
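The mask in Eqs. (10) and (11) can be built with two broadcast comparisons. The sketch below covers only the tree-topology case and treats any pair satisfying both inequalities (including a device with itself) as "upstream or peer"; the fallback for undefined orderings is not implemented, and the function name and toy inputs are assumptions.

```python
import numpy as np

def causal_mask(depth, t_first):
    """Return M with M[i, j] = 0 if j is upstream/peer of i per Eq. (11), else -inf."""
    depth = np.asarray(depth)
    t_first = np.asarray(t_first, dtype=float)
    # Rows index the attending device i, columns index the attended device j.
    depth_ok = depth[None, :] <= depth[:, None]        # depth(j) <= depth(i)
    time_ok = t_first[None, :] <= t_first[:, None]     # t_j^first <= t_i^first
    return np.where(depth_ok & time_ok, 0.0, -np.inf)

# Toy chain: AP (depth 0) -> relay (depth 1) -> station (depth 2),
# with alerts appearing in that order (illustrative timestamps in seconds).
M = causal_mask(depth=[0, 1, 2], t_first=[0.0, 1.5, 3.0])
```

For this chain the result is lower-triangular: the station may attend to the relay and the AP, while the AP can attend only to itself.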

#### IV-C2 Propagation Diffusion Matrix

The causal direction mask provides binary directional constraints, but the causal association strength at different positions along the propagation path is not uniformly distributed; upstream events closer to the root cause typically carry stronger causal signals. The propagation diffusion matrix $\mathbf{D}\in\mathbb{R}^{N\times N}$ continuously encodes this non-uniform causal strength.

We construct $\mathbf{D}$ based on the graph diffusion process\[[12](https://arxiv.org/html/2606.00582#bib.bib38)\]. Given the normalized graph Laplacian $\mathbf{L}=\mathbf{I}-\mathbf{D}_{\text{deg}}^{-1/2}\mathbf{A}\mathbf{D}_{\text{deg}}^{-1/2}$, where $\mathbf{A}$ is the topology adjacency matrix and $\mathbf{D}_{\text{deg}}$ is the degree matrix, the diffusion matrix is computed via the matrix exponential with $K$-th order truncation:

$$\mathbf{D}=\exp(-\beta\mathbf{L})\approx\sum_{k=0}^{K}\frac{(-\beta\mathbf{L})^{k}}{k!} \tag{12}$$

where $\beta>0$ controls the propagation decay rate and $K$ is the truncation order. $\mathbf{D}(i,j)$ reflects diffusion reachability from device $j$ to $i$, decaying exponentially with hop count. This matrix encodes three properties: maximum diffusion between directly connected devices, cumulative effects when multiple paths exist, and exponential decay with distance that prioritizes nearest upstream events while suppressing distant noise.

The truncation order $K=4$ covers the maximum propagation depth in the dataset. Each attention head uses an independent $\beta_{h}$ whose initialization is determined by grid search, enabling different heads to attend to propagation scales from local 1–2 hops to longer-range chains.
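The truncated series in Eq. (12) can be accumulated term by term without forming factorials explicitly. The following is a sketch, assuming an undirected adjacency matrix; the function name, the 3-node path graph, and $\beta=0.5$ are illustrative choices rather than values from the paper.

```python
import numpy as np

def diffusion_matrix(A, beta, K=4):
    """Approximate exp(-beta * L) with a K-th order Taylor truncation (Eq. 12)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    # D_deg^{-1/2}, leaving isolated nodes (degree 0) at zero.
    inv_sqrt = np.zeros(n)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    L = np.eye(n) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    D = np.zeros((n, n))
    term = np.eye(n)                         # (-beta L)^0 / 0!
    for k in range(K + 1):
        D += term
        term = term @ (-beta * L) / (k + 1)  # next term: (-beta L)^(k+1) / (k+1)!
    return D

# Path graph 0 - 1 - 2 (toy topology).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
D = diffusion_matrix(A, beta=0.5, K=4)
```

On this path graph the entry between adjacent nodes, `D[0, 1]`, is larger than the two-hop entry `D[0, 2]`, matching the distance-decay property described above.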

#### IV-C3 Temporal Propagation Bias

The direction mask and diffusion matrix are both based on static topology and do not exploit alert timestamp information. However, alert pairs at the same topological distance may have vastly different temporal intervals, and temporally proximate pairs are more likely to lie on the same propagation chain. The temporal propagation bias $\mathbf{B}\in\mathbb{R}^{N\times N}$ encodes this temporal signal as attention biases:

$$\mathbf{B}(i,j)=-\lambda\cdot\frac{|t_{i}^{\text{first}}-t_{j}^{\text{first}}|}{\Delta t_{\max}} \tag{13}$$

where $\lambda>0$ is the temporal decay coefficient and $\Delta t_{\max}$ is the maximum alert time span serving as normalization. This bias assigns higher weights to temporally proximate alert pairs while attenuating those with large gaps. Unlike positional encodings\[[31](https://arxiv.org/html/2606.00582#bib.bib16)\], $\mathbf{B}(i,j)$ is based on actual event times rather than sequence positions, correctly handling the non-uniform temporal distribution of fault propagation.
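A short sketch of Eq. (13). It assumes that $\Delta t_{\max}$ is the difference between the latest and earliest first-alert timestamps within one fault event, which is one plausible reading of "maximum alert time span"; the function name and timestamps are illustrative.

```python
import numpy as np

def temporal_bias(t_first, lam=2.0):
    """Return B with B[i, j] = -lam * |t_i - t_j| / delta_t_max (Eq. 13)."""
    t = np.asarray(t_first, dtype=float)
    span = t.max() - t.min()            # assumed definition of delta_t_max
    if span == 0.0:                     # all alerts simultaneous: no bias
        return np.zeros((t.size, t.size))
    return -lam * np.abs(t[:, None] - t[None, :]) / span

# Three alerts at 0 s, 1 s, and 4 s.
B = temporal_bias([0.0, 1.0, 4.0], lam=2.0)
```

Because the bias is added before the softmax, the pair separated by the full span is penalized by $-\lambda$, while simultaneous alerts receive no penalty.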

#### IV-C4 TCPA Attention Computation

The three components are integrated into the attention computation to form the complete TCPA:

$$\text{TCPA}(\mathbf{Q},\mathbf{K},\mathbf{V})=\text{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_{k}}}\odot\mathbf{D}+\mathbf{B}+\mathbf{M}_{\text{causal}}\right)\mathbf{V} \tag{14}$$

where $\mathbf{Q}=\mathbf{Z}\mathbf{W}_{Q}$, $\mathbf{K}=\mathbf{Z}\mathbf{W}_{K}$, and $\mathbf{V}=\mathbf{Z}\mathbf{W}_{V}$ are query, key, and value projections, $\odot$ denotes element-wise multiplication, and $d_{k}$ is the key dimension used for scaling (not to be confused with the device $d_{k}$ in Eq. (7)). Compared to standard self-attention $\text{softmax}(\mathbf{Q}\mathbf{K}^{\top}/\sqrt{d_{k}})\mathbf{V}$, TCPA introduces a triple causal inductive bias: $\mathbf{D}$ weights attention scores by propagation diffusion strength, $\mathbf{B}$ adjusts biases by temporal proximity, and $\mathbf{M}_{\text{causal}}$ blocks attention flow that violates the causal direction.
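A single-head NumPy sketch of Eq. (14). The placeholder values for $\mathbf{D}$, $\mathbf{B}$, and $\mathbf{M}_{\text{causal}}$ are built inline from a toy 3-node chain purely for illustration; the function name, sizes, and random projections are assumptions, and multi-head splitting is omitted.

```python
import numpy as np

def tcpa_attention(Z, W_Q, W_K, W_V, D, B, M):
    """Single-head form of Eq. (14); returns (output, attention weights)."""
    Q, K, V = Z @ W_Q, Z @ W_K, Z @ W_V
    d_k = K.shape[-1]                                  # key dimension for scaling
    scores = (Q @ K.T) / np.sqrt(d_k) * D + B + M      # scale, diffuse, bias, mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                           # exp(-inf) = 0 for masked pairs
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
N, d = 3, 4                                            # toy sizes (assumption)
Z = rng.normal(size=(N, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

idx = np.arange(N)
hops = np.abs(idx[:, None] - idx[None, :])             # chain topology 0 - 1 - 2
D = np.exp(-0.5 * hops)                                # stand-in for exp(-beta L)
B = -2.0 * hops / hops.max()                           # stand-in temporal bias
M = np.where(idx[None, :] <= idx[:, None], 0.0, -np.inf)  # j upstream of or equal to i

out, attn = tcpa_attention(Z, W_Q, W_K, W_V, D, B, M)
```

Each row of `attn` sums to one and is exactly zero above the diagonal, so every device draws information only from itself and its upstream devices.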

TCPA employs multi-head attention\[[31](https://arxiv.org/html/2606.00582#bib.bib16)\] with each head $h$ using an independent $\beta_{h}$, allowing low-$\beta$ heads to focus on tightly coupled local causal pairs while high-$\beta$ heads capture long-range propagation effects across multiple hops. Following multi-head self-attention, cross-attention integrates the knowledge representation $\mathbf{K}_{k}$ from the subgraph retrieval module:

$$\text{CrossAttn}(\mathbf{Z}^{\prime},\mathbf{K}_{k})=\text{softmax}\!\left(\frac{\mathbf{Z}^{\prime}\mathbf{W}_{Q}^{c}(\mathbf{K}_{k}\mathbf{W}_{K}^{c})^{\top}}{\sqrt{d_{k}}}\right)\mathbf{K}_{k}\mathbf{W}_{V}^{c} \tag{15}$$

where $\mathbf{Z}^{\prime}$ is the output of multi-head self-attention. This cross-attention enables the model to selectively attend to the most relevant facts in the retrieved knowledge for verifying the current causal hypothesis at each reasoning step.

The complete TCPA block follows the Pre-Norm architecture\[[41](https://arxiv.org/html/2606.00582#bib.bib39)\]:

$$\mathbf{Z}^{\prime}=\mathbf{Z}+\text{MultiHead}(\text{LN}(\mathbf{Z})) \tag{16}$$

$$\mathbf{Z}^{\prime\prime}=\mathbf{Z}^{\prime}+\text{CrossAttn}(\text{LN}(\mathbf{Z}^{\prime}),\mathbf{K}_{k}) \tag{17}$$

$$\mathbf{Z}_{\text{out}}=\mathbf{Z}^{\prime\prime}+\text{FFN}(\text{LN}(\mathbf{Z}^{\prime\prime})) \tag{18}$$

where FFN is a two-layer feed-forward network and LN denotes LayerNorm. TCPA blocks are stacked for $L$ layers, sharing $\mathbf{M}_{\text{causal}}$ and $\mathbf{B}$ across layers (as these are statically determined by the input event) while independently learning diffusion coefficients and projection parameters at each layer.
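The Pre-Norm residual wiring of Eqs. (16) to (18) can be sketched with the three sub-layers passed in as callables. This is only a structural illustration: `layer_norm` here omits the learnable gain and bias, and the stand-in sub-layers that return zeros are hypothetical placeholders used to expose the residual paths.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the feature axis, without learnable gain or bias."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def tcpa_block(Z, K_k, multi_head, cross_attn, ffn):
    """One Pre-Norm block: normalize first, apply the sub-layer, add the residual."""
    Z1 = Z + multi_head(layer_norm(Z))           # Eq. (16)
    Z2 = Z1 + cross_attn(layer_norm(Z1), K_k)    # Eq. (17)
    return Z2 + ffn(layer_norm(Z2))              # Eq. (18)

Z = np.arange(12, dtype=float).reshape(3, 4)     # toy input, N=3, d=4
K_k = np.ones((2, 4))                            # toy knowledge matrix
zero = lambda x, *rest: np.zeros_like(x)         # stand-in sub-layer
Z_out = tcpa_block(Z, K_k, multi_head=zero, cross_attn=zero, ffn=zero)
```

With all sub-layers returning zeros the block reduces to the identity, which makes the residual structure easy to verify before real attention and FFN modules are plugged in.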

The output $\mathbf{Z}^{(L)}$ of the TCPA Transformer is injected as a prefix sequence into the cross-attention layers of the LLM Decoder, enabling the Decoder to continuously perceive the causal structure encoded by TCPA during autoregressive generation. As reasoning unfolds hop-by-hop, the subgraph retrieval module provides updated knowledge $\mathbf{K}_{k+1}$, and the cross-attention in the TCPA layer switches to the new knowledge context accordingly, forming an iterative closed loop of traceback, retrieval, and verification.

### IV-D Output and Verification Module

The LLM Decoder generates a diagnostic sequence that is mapped to a fault type probability distribution through a classification head:

$$p(y|\mathbf{X})=\text{softmax}(\mathbf{W}_{\text{cls}}\mathbf{h}_{\text{[CLS]}}+\mathbf{b}_{\text{cls}}) \tag{19}$$

where $\mathbf{h}_{\text{[CLS]}}$ is the classification token representation from the Decoder output and $y$ is the fault type label. The Decoder further generates a causal explanation chain in an autoregressive manner, i.e., a natural language description of the hop-by-hop traceback path from end-point alerts to the root-cause device, with the generation process constrained by the causal structure encoded in $\mathbf{Z}^{(L)}$.

The generated causal chain undergoes consistency verification against the dual-layer KG. For each claim, three conditions are checked: topological connectivity in $\mathcal{G}_{\text{infra}}$, temporal conformity with propagation direction, and historical precedent in $\mathcal{G}_{\text{fault}}$. Failing claims trigger re-generation under stronger knowledge constraints, suppressing hallucination. The root-cause device is localized as the terminal node where traceback finds no further upstream predecessor.
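The three per-claim checks can be sketched as a loop over consecutive hops of a traceback chain. The data structures are assumptions made for illustration: the paper does not specify how claims are represented, and historical precedent is simplified here to a set of device pairs, whereas $\mathcal{G}_{\text{fault}}$ stores alert-level paths.

```python
def verify_chain(chain, infra_edges, t_first, precedent_pairs):
    """Check each hop of `chain` (ordered from end-point device to claimed root cause).

    Returns a list of failing hops as
    (upstream, downstream, connected, temporal_ok, has_precedent).
    """
    failures = []
    for downstream, upstream in zip(chain, chain[1:]):
        # 1. Topological connectivity in the infrastructure graph.
        connected = ((upstream, downstream) in infra_edges
                     or (downstream, upstream) in infra_edges)
        # 2. Temporal conformity: the upstream alert must not come later.
        temporal_ok = t_first[upstream] <= t_first[downstream]
        # 3. Historical precedent for this propagation step.
        has_precedent = (upstream, downstream) in precedent_pairs
        if not (connected and temporal_ok and has_precedent):
            failures.append((upstream, downstream, connected, temporal_ok, has_precedent))
    return failures

# Toy knowledge (hypothetical identifiers and timestamps).
infra_edges = {("ap", "relay"), ("relay", "sta2")}
t_first = {"ap": 0.0, "relay": 1.0, "sta2": 2.5}
precedent_pairs = {("ap", "relay"), ("relay", "sta2")}

ok_failures = verify_chain(["sta2", "relay", "ap"], infra_edges, t_first, precedent_pairs)
bad_failures = verify_chain(["sta2", "ap"], infra_edges, t_first, precedent_pairs)
```

A chain that skips the relay fails the connectivity and precedent checks even though its timestamps are consistent; in the described pipeline, such a failing claim would trigger re-generation.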

## V Experiments

We conduct systematic experiments to answer four research questions:

RQ1 evaluates overall performance on both fault type diagnosis and root cause localization, and reveals the causal linkage between them through conditional analysis.

RQ2 examines the causal reasoning capability that TCPA confers on the model from the perspective of multi-hop cascading scenario analysis.

RQ3 quantifies the independent contribution and synergistic effects of each component through ablation studies, and further investigates the cold-start scenario performance and inference efficiency.

RQ4 delves into per-class performance across fault types, combining confusion matrices, misclassification patterns, and a diagnostic case study to reveal the mechanism by which hop-by-hop backtracking resolves end-point ambiguity.

### V-A Experimental Settings

#### V-A1 Datasets

The main experiments are conducted on the Wi-Fi Multimodal Fault Benchmark dataset\[[46](https://arxiv.org/html/2606.00582#bib.bib53)\]. Built on a real physical testbed with 3 Basic Service Sets and 21 nodes in an AP-STA tree topology, the dataset contains approximately 600 fault cases covering 11 fault types and 1 normal state, with roughly balanced samples per class. Each case provides four types of multimodal monitoring data (traffic-level metrics, packet-level traces, alert events, and system logs), totaling 18,475 alert records. The dataset is temporally split, with the first 80% used for training and KG construction and the remaining 20% for testing.

Supplementary experiments are conducted on the TeleLogs dataset\[[27](https://arxiv.org/html/2606.00582#bib.bib47)\]. TeleLogs targets root cause analysis in 5G wireless networks by simulating drive-test scenarios based on real network engineering parameters, with user equipment moving across multiple gNodeB base stations. It includes eight root cause types covering typical 5G faults such as downtilt misconfiguration, co-channel interference, PCI conflict, and handover threshold errors. Each case provides two modalities: base station configuration parameters and user-plane measurement metrics. Compared with the primary Wi-Fi dataset, TeleLogs features a different fault propagation mechanism: instead of propagating hop-by-hop along a tree topology, faults propagate through interference and handover between base stations, resulting in substantial differences in both topology structure and fault patterns. Data partitioning follows the official TeleLogs standard.

Adapting PropLLM to TeleLogs involves three scenario-specific configurations: the $\mathcal{G}_{\text{infra}}$ layer replaces the Wi-Fi device dependency graph with a 5G base station neighbor relation graph; the causal direction masks in TCPA replace tree depth with the directional order of inter-base-station interference propagation; and the propagation path annotations in $\mathcal{G}_{\text{fault}}$ replace alarm temporal chains with inter-base-station interference and handover anomaly propagation paths. The core architecture and training configuration remain unchanged.

#### V-A2 Implementation Details

PropLLM uses Qwen3-8B\[[42](https://arxiv.org/html/2606.00582#bib.bib46)\] as the backbone language model, with the LLM Decoder initialized from pretrained weights and most parameters frozen; only the classification head, cross-attention layers, and LoRA adapters are unfrozen. The TCPA Transformer is configured with 6 layers, 8 attention heads per layer, hidden dimension $d=512$, truncation order $K=4$, and temporal decay coefficient $\lambda=2.0$. The LSTM encoder is a 2-layer bidirectional structure, and the GAT encoder has 2 layers with 4 attention heads. Subgraph retrieval uses neighborhood expansion hops $L=2$ and similarity threshold $\tau=0.7$.

Training proceeds in two stages. The first stage is supervised fine-tuning with cross-entropy loss for 10 epochs at learning rate 2e-5 with cosine annealing. The second stage is RL fine-tuning with PPO for 5 epochs, reward weights $\alpha=1.0$, $\beta=0.5$, $\gamma=0.3$, clip ratio $\epsilon=0.2$, and learning rate 5e-6. All experiments are conducted on 4 NVIDIA A100-80G GPUs with DeepSpeed ZeRO-2 acceleration.

#### V-A3 Baselines

Baselines for the fault type diagnosis experiment are divided into non\-LLM and LLM methods\. Non\-LLM methods include XGBoost\[[3](https://arxiv.org/html/2606.00582#bib.bib42)\], GDN\[[7](https://arxiv.org/html/2606.00582#bib.bib4)\], Minder\[[8](https://arxiv.org/html/2606.00582#bib.bib24)\], and FAMOS\[[10](https://arxiv.org/html/2606.00582#bib.bib45)\], spanning the technical spectrum from traditional machine learning to deep multimodal fusion\. LLM methods include LLM\-Direct, LLM\-RAG, LLM\-CoT, NetLLM\[[39](https://arxiv.org/html/2606.00582#bib.bib43)\], Confucius\[[37](https://arxiv.org/html/2606.00582#bib.bib44)\], and BiAn\[[33](https://arxiv.org/html/2606.00582#bib.bib8)\], covering the full paradigm range from zero\-shot to specialized architectures\. Baselines for the root cause localization experiment include non\-LLM methods GDN\[[7](https://arxiv.org/html/2606.00582#bib.bib4)\], MULAN\[[51](https://arxiv.org/html/2606.00582#bib.bib23)\], Chain\-of\-Event\[[43](https://arxiv.org/html/2606.00582#bib.bib20)\], CORAL\[[34](https://arxiv.org/html/2606.00582#bib.bib48)\], RUN\[[22](https://arxiv.org/html/2606.00582#bib.bib49)\], BARO\[[25](https://arxiv.org/html/2606.00582#bib.bib50)\], and AERCA\[[13](https://arxiv.org/html/2606.00582#bib.bib51)\], as well as LLM methods LLM\-RAG, LLM\-CoT, and BiAn\[[33](https://arxiv.org/html/2606.00582#bib.bib8)\]\. To ensure fair comparison, all LLM\-based methods uniformly adopt Qwen3\-8B as the backbone language model; for methods originally designed with different backbone models, we re\-implement them with Qwen3\-8B while keeping all other configurations consistent with their original papers\.

### V-B RQ1: Overall Performance

Tables [II](https://arxiv.org/html/2606.00582#S5.T2) and [III](https://arxiv.org/html/2606.00582#S5.T3) report the overall results for fault type diagnosis and root cause localization, respectively. PropLLM achieves 91.2% Acc and 74.8% RC@1, exceeding the strongest baseline, BiAn, by 3.9% and 4.7%, respectively.

TABLE II: Overall fault type diagnosis performance (mean $\pm$ std over 3 independent runs). Precision, Recall, and F1 are macro-averaged.

TABLE III: Root cause localization performance (mean $\pm$ std, 3 independent runs).

The performance hierarchy in Table [II](https://arxiv.org/html/2606.00582#S5.T2) reveals three noteworthy observations:

*First*, general\-purpose LLM parametric knowledge is insufficient for network fault diagnosis: LLM\-Direct achieves only 57\.6%, below XGBoost’s 65\.3%, confirming that cascading fault propagation exceeds zero\-shot capability\. From LLM\-RAG through LLM\-CoT to NetLLM, external knowledge, chain\-of\-thought guidance, and domain adaptation contribute stackable gains, yet all operate on the input or decoding side without modifying the model’s internal perception of causal direction\.

*Second*, the gap between non\-LLM and LLM methods is not a simple generational divide\. FAMOS achieves 86\.1% Acc, surpassing NetLLM and Confucius through effective cross\-modal fusion, but its one\-shot mapping paradigm cannot reason stepwise along propagation paths\. BiAn breaks through the non\-LLM ceiling at 87\.3% through hierarchical summarization and multi\-step reasoning\.

*Third*, PropLLM’s 3\.9% Acc gain over BiAn quantifies the contribution of causally\-aware attention and hop\-by\-hop fact verification\. BiAn’s summarization–reasoning pipeline lacks topological causal direction constraints and per\-hop fact checking\. PropLLM’s TCPA injects propagation direction priors at the attention level and cross\-validates with the dual\-layer KG at every hop, producing a synchronized 4\.5% improvement in both Precision \(filtering spurious paths\) and Recall \(discovering discriminative evidence at intermediate nodes\)\.

In the root cause localization dimension, PropLLM achieves 74.8% RC@1 on the Wi-Fi Multimodal Fault Dataset, improving over BiAn by 4.7%, and 96.3% RC@1 on TeleLogs, improving over BiAn by 3.5%. The RC@1 gains exceed those in accuracy on both datasets. This is not coincidental: root cause localization demands precise backtracking along the full propagation chain, where even small causal direction errors are highly detrimental. Thus, the constraints imposed by TCPA’s causal masks are particularly effective. All methods perform substantially better on TeleLogs than on the Wi-Fi dataset, consistent with TeleLogs’ clearer causal signals from well-defined configuration deviations for each of its eight root causes versus the intermediate state aliasing caused by multi-hop cascading in Wi-Fi. The notably high 96.3% RC@1 on TeleLogs further demonstrates PropLLM’s cross-topology adaptability, as TCPA effectively captures the dominant propagation direction even when switching from tree structures to 5G inter-base-station interference patterns.

Among non-LLM methods, Chain-of-Event reaches 56.5% RC@1 on Wi-Fi by learning weighted event causal graphs, outperforming MULAN but falling slightly below CORAL. AERCA further improves to 65.1% via Granger causal discovery. However, these approaches rely on learning causal structures from limited observational data. In contrast, PropLLM directly encodes known causal priors from topology in $\mathcal{G}_{\text{infra}}$ and historical fault templates in $\mathcal{G}_{\text{fault}}$, bypassing the statistical bottleneck of data-driven causal graph learning.

Fig. [4](https://arxiv.org/html/2606.00582#S5.F4) presents a radar chart across five dimensions: Acc, F1, RC@1, 1-CCED, and 1-HR (the latter two mapped to $[0,1]$ via min-max normalization). PropLLM occupies the outermost contour on all axes, with the largest lead over BiAn on the 1-HR and RC@1 axes, directly reflecting that the core benefit of hop-by-hop verification lies in causal chain fidelity and root cause precision rather than classification accuracy alone.

![Refer to caption](https://arxiv.org/html/2606.00582v1/x4.png)

Figure 4: Multi-dimensional performance radar chart of PropLLM and representative baselines.

Causal propagation analysis between root cause localization and classification. The results above show that PropLLM outperforms baselines on both dimensions, but a deeper question remains: does root cause localization accuracy truly drive classification performance? To examine this core thesis, we partition PropLLM’s test cases on the Wi-Fi Multimodal Fault Dataset (118 cases) into two groups according to whether RC@1 hits, and compute the classification accuracy for each group. The results are shown in Fig. [5](https://arxiv.org/html/2606.00582#S5.F5).

![Refer to caption](https://arxiv.org/html/2606.00582v1/x5.png)

Figure 5: Conditional association between root cause localization correctness and classification accuracy.

The RC@1-hit group achieves a classification accuracy of 97.7%, while the miss group drops sharply to 70.0%, with the error rate rising from 2.3% to 30.0%. The only 2 classification errors in the hit group both originate from complex topologies where multiple propagation paths converge: even though the root cause device is correctly located, the mixed symptoms downstream of the convergence point still cause type confusion, representing the most challenging cases in this dataset. The 30.0% error rate in the miss group points to a causal amplification effect: a root cause localization error is not an isolated local mistake but corrupts the starting point of every subsequent hop-by-hop fact verification, causing deviations to accumulate along the causal chain until classification judgment loses its reliable basis. This result provides empirical evidence that the synchronized improvement of PropLLM on Acc and RC@1 is not a coincidence across two independent dimensions, but is consistent with root cause localization accuracy propagating through the causal chain to the classification decision.

### V-C RQ2: Multi-hop Cascading Scenario Analysis

We partition the fault cases in the test set into three groups by causal chain length after excluding the Normal class, yielding 108 cases in total: short chains (1–2 hops, 46 cases), medium chains (3–4 hops, 42 cases), and long chains (5 or more hops, 20 cases). Table [IV](https://arxiv.org/html/2606.00582#S5.T4) compares PropLLM against representative baselines that possess both root cause localization capability and causal reasoning mechanisms.

TABLE IV: Acc (%) and RC@1 (%) across different propagation hop counts (mean $\pm$ std, 3 independent runs).

All methods degrade as hop count increases, but their decay rates differ fundamentally. PropLLM’s Acc decreases by 12.0% from short to long chains, compared with BiAn’s 20.5% decrease, yielding a decay rate only 59% of the latter. In the RC@1 dimension, the gap between PropLLM and BiAn widens from 3.5% on short chains to 6.0% on long chains, with the advantage growing approximately linearly with chain length. The reason is that short-chain scenarios present limited path ambiguity, which one-shot reasoning already handles effectively, so the marginal gain of hop-by-hop verification is modest. As chain length increases, each hop introduces new branching possibilities, causing methods without causal direction constraints to rapidly degrade under cumulative ambiguity. The TCPA causal masks eliminate reverse attention at each hop, keeping ambiguity accumulation within a manageable range.
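As a quick check of the relative decay rate quoted above, the ratio of the two short-to-long Acc drops is:

$$\frac{\Delta\text{Acc}_{\text{PropLLM}}}{\Delta\text{Acc}_{\text{BiAn}}}=\frac{12.0}{20.5}\approx 0.585\approx 59\%$$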

Fig. [6](https://arxiv.org/html/2606.00582#S5.F6) reveals the mechanistic source through attention heatmaps of a 5-hop case. The standard self-attention matrix is approximately symmetric, with the end-point STA’s throughput alarm receiving the highest weight, causing misidentification of this symptom as the root cause. The TCPA matrix exhibits a clear lower-triangular structure: the causal mask zeroes out all reverse attention, forcing the model to look exclusively upstream. A gradient band along the diagonal reflects the diffusion matrix $\exp(-\beta\mathbf{L})$: weights decay exponentially with topological distance, concentrating focus on causal neighbors. These two features explain why PropLLM decays most slowly in the long-chain regime in Table [IV](https://arxiv.org/html/2606.00582#S5.T4): even beyond 5 hops, attention at each hop remains constrained to the correct causal direction.

![Refer to caption](https://arxiv.org/html/2606.00582v1/x6.png)

Figure 6: Attention weight matrices of standard self-attention (left) vs. TCPA (right) on a 5-hop long-chain case.

### V-D RQ3: Ablation Study

Table [V](https://arxiv.org/html/2606.00582#S5.T5) quantifies the independent contribution of each component by removing them one at a time.

TABLE V: Ablation study results.

The ablation results can be interpreted along three dimensions. Regarding independent component contributions, TCPA has the most pronounced impact: removing it entirely causes Acc to drop by 9.3% to 81.9%, RC@1 to drop by 16.1% to 58.7%, and HR to surge from 6.3% to 22.7%, with all three dimensions degrading simultaneously. This confirms that causal direction awareness is the cornerstone of the hop-by-hop backtracking paradigm. The three sub-mechanisms within TCPA contribute in a hierarchical fashion: removing the causal direction mask causes a 5.0% Acc decline, removing the propagation diffusion matrix causes a 4.2% decline, and removing the temporal bias causes a 2.5% decline. This pattern is consistent with the design rationale, where direction judgment is the fundamental prerequisite, distance decay refines precision, and temporal bias provides auxiliary correction. On the knowledge side, degrading hop-by-hop retrieval to one-shot RAG causes HR to surge to 18.3%, the largest HR increase among the ablation variants apart from removing TCPA entirely (22.7%), indicating that dynamic per-hop factual constraints contribute even more to reasoning faithfulness than RL fine-tuning.

Regarding inter\-component synergy, removing TCPA drops Acc to 81\.9%, which falls not only far below the full model but even below FAMOS at 86\.1% and BiAn at 87\.3%; removing hop\-by\-hop retrieval and removing multi\-modal inputs similarly produce Acc values of 84\.4% and 83\.6% that fall below baseline levels\. This reveals that PropLLM’s components are not independently pluggable modules but rather form a tightly coupled synergistic system\. Without TCPA’s directional guidance, hop\-by\-hop retrieval may backtrack in the wrong direction and inject irrelevant subgraph information, performing worse than one\-shot methods\. This “negative transfer” phenomenon provides a counter\-proof of PropLLM’s core thesis: propagation\-aware reasoning and hop\-by\-hop fact verification must be deeply coupled, and the absence of either side substantially undermines the utility of the other\.

Additionally, removing multi-modal inputs yields an HR of only 9.1%, which is paradoxically lower than the 11.8% obtained when removing RL and the 13.2% obtained when removing $\mathcal{G}_{\text{fault}}$, a non-monotonic phenomenon worth noting. With only text alarm inputs, the model’s reasoning space is constrained to the factual scope of textual descriptions, producing shorter causal chains with an average length dropping from 3.8 to 2.9 hops. This reduces hallucination opportunities, but at the cost of a 7.6% Acc decline and a 14.4% RC@1 decline, as the model misses critical causal evidence embedded in the temporal and topological modalities. This indicates that HR should not be interpreted in isolation: high-fidelity reasoning must be built on a sufficient information foundation.

Cold-start scenario analysis. $\mathcal{G}_{\text{fault}}$ relies on historical fault cases, so its performance when cases are scarce during early deployment of a new network deserves attention. Table [VI](https://arxiv.org/html/2606.00582#S5.T6) quantifies PropLLM’s performance under varying levels of historical accumulation by controlling the case volume in $\mathcal{G}_{\text{fault}}$.

TABLE VI: Impact of $\mathcal{G}_{\text{fault}}$ scale on performance (cold-start analysis).

When $\mathcal{G}_{\text{fault}}$ is completely empty, PropLLM still achieves 87.6% Acc, on par with BiAn’s 87.3%, indicating that TCPA and $\mathcal{G}_{\text{infra}}$ alone already match the strongest baseline, and $\mathcal{G}_{\text{fault}}$ serves as the critical increment for surpassing it rather than a necessary prerequisite. Performance gains exhibit diminishing marginal returns as cases accumulate: just 10% of historical cases push Acc to 88.4%; at 30% it reaches 89.5%, about 98% of full performance; and at 80% it essentially saturates. This rapid convergence stems from the reusability of causal templates: faults of the same type exhibit highly similar propagation patterns even when occurring on different devices, so a small number of cases suffices to cover the principal modes. Furthermore, verified propagation paths from each successful diagnosis can be automatically written back to $\mathcal{G}_{\text{fault}}$, enabling system performance to improve continuously over operational time. It should be noted that this experiment only controls the case volume in $\mathcal{G}_{\text{fault}}$ while keeping the SFT training data at full scale; the 87.6% Acc with an empty $\mathcal{G}_{\text{fault}}$ can be regarded as an optimistic upper bound on cold-start performance, and a joint cold-start analysis that simultaneously reduces both training data and $\mathcal{G}_{\text{fault}}$ is left for future work.

Inference efficiency analysis. Table [VII](https://arxiv.org/html/2606.00582#S5.T7) reports the average inference latency per diagnostic case for PropLLM and representative baselines (4× A100-80G, batch size = 1).

TABLE VII: Average inference latency per diagnostic case.

PropLLM achieves an average inference latency of 1.23 seconds, lower than BiAn’s 1.45 seconds. Latency scales approximately linearly with chain length: short-chain (1–2 hops) 0.72s, medium-chain (3–4 hops) 1.35s, and long-chain (5+ hops) 1.84s. The LLM Decoder’s autoregressive generation dominates at approximately 75% of total latency, while TCPA and KG retrieval together account for only about 10%, indicating that causality-aware attention adds little overhead. TCPA’s complexity is $O(N^{2}L)$, which is negligible at the current scale ($N=21$), and the diffusion matrix $D$ is precomputed offline. For larger networks, the $L$-hop neighborhood expansion confines retrieval to local topology, making latency weakly dependent on total network scale. Compared with manual troubleshooting (minutes to hours), PropLLM’s latency on the order of seconds satisfies the requirements of online assisted diagnosis.
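As a rough check of the "approximately linear" claim, the three bucket latencies can be fitted with an ordinary least-squares line. This is a sketch under stated assumptions: each bucket is represented by a single hop count (1.5, 3.5, and 5.5, the last being a guess for the open-ended "5+" bucket), and `fit_line` and `pairwise_terms` are illustrative helpers, not part of PropLLM.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Assumed representative hop counts for the 1-2, 3-4, and 5+ buckets.
HOPS = [1.5, 3.5, 5.5]
# Latencies in seconds, as reported in the text.
LATENCY_S = [0.72, 1.35, 1.84]

slope, intercept = fit_line(HOPS, LATENCY_S)
print(f"{slope:.2f} s per extra hop, {intercept:.2f} s fixed cost")


def pairwise_terms(n_nodes, n_hops):
    """N^2 * L term count behind the O(N^2 L) bound, ignoring constants."""
    return n_nodes * n_nodes * n_hops


# With N = 21, each hop level involves 21 * 21 node pairs.
print(pairwise_terms(21, 1))
```

Under these assumptions the fit gives roughly 0.28 s per additional hop on top of a fixed cost of about 0.32 s, and the medium-chain bucket lies within about 0.05 s of the fitted line.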

### V-E RQ4: Fine-Grained Analysis and Case Study

Per-class diagnosis performance. Table [VIII](https://arxiv.org/html/2606.00582#S5.T8) reports the per-class F1 scores of PropLLM across all 12 categories compared with the strongest baseline, BiAn.

TABLE VIII: Per-class F1 scores across 12 fault types.

PropLLM outperforms BiAn across all 12 categories. However, the magnitude of improvement is highly uneven and closely aligns with the method’s design expectations. The largest gains occur in HiddenNode, AppSlowdown, and PoorLinkQuality (F1 improvements of 0.102, 0.077, and 0.077, respectively). These categories have highly confusable terminal symptoms, all manifesting as decreased throughput and increased latency at end nodes, which makes them nearly indistinguishable from terminal alerts alone. PropLLM’s hop-by-hop traceback resolves this by mining discriminative state features at intermediate nodes. By contrast, the smallest improvements are observed in TrafficOverload, AppCrash, and BeaconLoss (gains of only 0.017 to 0.022). Their terminal symptoms are already highly discriminative, so the existing one-shot mapping approach is sufficient and leaves limited marginal benefit for hop-by-hop traceback.
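For readers who want to reproduce this kind of per-class comparison, the sketch below computes per-class F1 from a confusion matrix using the standard definition $F1 = 2TP / (2TP + FP + FN)$. The 3×3 matrix is a toy example with made-up counts, not data from the paper, and the row/column convention (rows are true classes) is an assumption.

```python
def per_class_f1(cm):
    """Per-class F1 from a square confusion matrix.

    cm[i][j] is the number of cases whose true class is i and whose
    predicted class is j (assumed convention).
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy matrix with hypothetical counts: classes 0 and 1 are confused with
# each other, class 2 is always recognized.
TOY_CM = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

f1_scores = per_class_f1(TOY_CM)
print([round(s, 3) for s in f1_scores])
```

Per-class gains such as the 0.102 reported for HiddenNode are differences between two such score vectors, one per method, computed on the same test set.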

Fig. [7](https://arxiv.org/html/2606.00582#S5.F7) displays the 12-class confusion matrices of BiAn and PropLLM side by side as heatmaps. BiAn’s confusion matrix exhibits prominent off-diagonal blocks in three category-pair regions: HiddenNode/PoorLinkQuality, AppSlowdown/TrafficOverload, and BufferBloat/QueueOverflow. These blocks show concentrated and symmetric misclassifications that reflect the inherently bidirectional ambiguity of end-point symptoms. PropLLM’s confusion matrix is highly diagonalized, with the off-diagonal elements in these three regions nearly zeroed out. The residual 10 misclassifications are sparsely distributed with no discernible clustering pattern, indicating a qualitative shift from “systematic category confusion” to “case-level sporadic errors.”

![Refer to caption](https://arxiv.org/html/2606.00582v1/x7.png)

Figure 7: Confusion matrix heatmaps of BiAn (left) and PropLLM (right).

Misclassification pattern analysis. Fig. [8](https://arxiv.org/html/2606.00582#S5.F8) compares the misclassification counts of PropLLM, BiAn, and FAMOS on the top-5 confusion pairs using grouped bar charts.

![Refer to caption](https://arxiv.org/html/2606.00582v1/x8.png)

Figure 8: Misclassification counts on top-5 confusion pairs.

Baseline misclassifications concentrate heavily on category pairs with similar end-point symptoms: the top four confusion pairs account for 9 of BiAn’s 15 total errors (60%) and 12 of FAMOS’s 16 (75%). PropLLM produces only 4 misclassifications across these five confusion pairs, less than half of BiAn’s count, because hop-by-hop backtracking acquires discriminative evidence at intermediate nodes that end-point alarms cannot provide. PropLLM’s residual 10 errors are uniformly dispersed across category pairs with no more than one per pair, occurring primarily in atypical propagation patterns that are rare in the training set. As $\mathcal{G}_{\text{fault}}$ cases continue to accumulate, these sporadic misclassifications are expected to diminish progressively.
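The concentration statistics above (9/15 = 60% for BiAn, 12/16 = 75% for FAMOS) are shares of total errors that fall on the most-confused category pairs. A minimal sketch of how such pairs could be ranked from a confusion matrix follows; the function name, the choice to merge both directions of a pair, and the 4×4 toy matrix with made-up counts are all assumptions for illustration.

```python
def top_confusion_pairs(cm, labels, k):
    """Rank unordered class pairs by misclassifications in both directions.

    Returns the top-k (count, label_a, label_b) tuples and the share of all
    errors that those pairs account for.
    """
    n = len(cm)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            count = cm[i][j] + cm[j][i]
            if count:
                pairs.append((count, labels[i], labels[j]))
    pairs.sort(key=lambda p: -p[0])  # stable sort, largest count first
    top = pairs[:k]
    total_errors = sum(cm[i][j] for i in range(n) for j in range(n) if i != j)
    share = sum(c for c, _, _ in top) / total_errors if total_errors else 0.0
    return top, share


# Toy example with hypothetical counts (rows: true class, columns: predicted).
LABELS = ["HiddenNode", "PoorLinkQuality", "AppSlowdown", "TrafficOverload"]
TOY_CM = [
    [20, 3, 0, 0],
    [2, 21, 1, 0],
    [0, 0, 19, 2],
    [0, 1, 1, 22],
]

top_pairs, error_share = top_confusion_pairs(TOY_CM, LABELS, k=2)
print(top_pairs, error_share)
```

In the toy matrix, two pairs account for 8 of 10 errors, the kind of clustering the text attributes to the baselines; a matrix with at most one error per pair would produce a much lower share.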

Diagnostic case study. Fig. [9](https://arxiv.org/html/2606.00582#S5.F9) visualizes the hop-by-hop backtracking process on a HiddenNode fault test case, showing the ESS network topology with three APs and associated STAs. Directed arrows in different colors mark the ground-truth propagation path (gray dashed), PropLLM’s backtracking path (green solid), and BiAn’s judgment path (red solid).

![Refer to caption](https://arxiv.org/html/2606.00582v1/x9.png)

Figure 9: Hop-by-hop backtracking path comparison on a HiddenNode fault case. BiAn misclassifies based on end-point symptom similarity, while PropLLM correctly backtracks through intermediate nodes to identify the root cause.

In this case, STA-7 and STA-12 simultaneously report throughput degradation and latency increase, and BiAn classifies the fault as PoorLinkQuality based on end-point symptom similarity. PropLLM’s trajectory differs. At hop 1, it departs from STA-12 and retrieves from $\mathcal{G}_{\text{infra}}$ that its associated AP is AP-2. At hop 2, it finds that AP-2’s retransmission rate is 3.2× normal while RSSI remains normal; this is the key evidence distinguishing HiddenNode (MAC-layer collisions) from PoorLinkQuality (physical-layer degradation), and it is invisible at end-points. At hop 3, it matches “high retransmission + normal RSSI” against $\mathcal{G}_{\text{fault}}$ causal templates (similarity 0.91) and confirms via $\mathcal{G}_{\text{infra}}$ that STA-7 and STA-12 share AP-2’s collision domain. At hop 4, it verifies that there are no upstream anomalies above AP-2, localizing it as the root cause. This exemplifies the synergy of three components: TCPA ensures upstream backtracking, $\mathcal{G}_{\text{infra}}$ provides protocol-level facts, and $\mathcal{G}_{\text{fault}}$ supplies experiential evidence.
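The control flow of this case can be illustrated with a small rule-based sketch. It is a deliberately simplified stand-in, not PropLLM itself: there is no LLM, TCPA, or knowledge-graph retrieval, and the upstream node `GW-1`, the `RETX_THRESHOLD` value, the dictionary layout, and the non-AP metric values are all hypothetical. Only the STA-7/STA-12/AP-2 association, the 3.2× retransmission ratio, and the normal RSSI at AP-2 come from the case study.

```python
# Hypothetical slice of the infrastructure graph: node -> upstream parent.
UPSTREAM = {"STA-7": "AP-2", "STA-12": "AP-2", "AP-2": "GW-1"}

# Per-node observations. AP-2's values follow the case study; the rest are
# placeholder values chosen for illustration.
STATE = {
    "STA-7": {"retx_ratio": 1.0, "rssi_normal": True, "anomalous": True},
    "STA-12": {"retx_ratio": 1.0, "rssi_normal": True, "anomalous": True},
    "AP-2": {"retx_ratio": 3.2, "rssi_normal": True, "anomalous": True},
    "GW-1": {"retx_ratio": 1.0, "rssi_normal": True, "anomalous": False},
}

RETX_THRESHOLD = 2.0  # hypothetical cut-off for "high retransmission"


def backtrack(start):
    """Walk upstream from an alerting node while the parent is anomalous."""
    path = [start]
    node = start
    while True:
        parent = UPSTREAM.get(node)
        if parent is None or not STATE[parent]["anomalous"]:
            return path  # no anomalous upstream node: last node is the root
        path.append(parent)
        node = parent


def classify(node):
    """Toy template match on the root-cause candidate's state."""
    state = STATE[node]
    if state["retx_ratio"] >= RETX_THRESHOLD and state["rssi_normal"]:
        return "HiddenNode"       # MAC-layer collisions
    if not state["rssi_normal"]:
        return "PoorLinkQuality"  # physical-layer degradation
    return "Unknown"


path = backtrack("STA-12")
root = path[-1]
print(path, root, classify(root))
```

Classifying from STA-12’s own state would return `Unknown` here, since the discriminative evidence only appears at AP-2, which mirrors why end-point symptoms alone are insufficient in this case.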

## VI Conclusion

This paper proposes PropLLM, the first framework integrating hop-by-hop scene reconstruction with LLMs for network fault diagnosis via causal backtracking. Its core consists of three tightly coupled components: a dual-layer knowledge graph separating structural and experiential knowledge for verifiable per-hop reconstruction; TCPA, which encodes topological causal priors into attention via causal masks, propagation diffusion matrices, and temporal biases; and a retrieval-verification loop coupling backtracking with dynamic retrieval to suppress hallucinations. Experiments show that PropLLM outperforms all baselines on the Wi-Fi Multimodal Fault Dataset and the TeleLogs 5G dataset, with pronounced gains in multi-hop scenarios. The main limitation is that experiments cover only IoT wireless and 5G cellular networks; effectiveness on more complex topologies such as data center fat-trees and WAN meshes remains to be verified.

In future work, we will explore extension to larger-scale heterogeneous network topologies, few-shot cold-start strategies, and model distillation with inference acceleration.

## References

- [1] T. Ahmed, S. Ghosh, C. Bansal, T. Zimmermann, X. Zhang, and S. Rajmohan (2023) Recommending root-cause and mitigation steps for cloud incidents using large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 1737–1749.
- [2] A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi (2024) Self-rag: learning to retrieve, generate, and critique through self-reflection. In International conference on learning representations, Vol. 2024, pp. 9112–9141.
- [3] T. Chen and C. Guestrin (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794.
- [4] W. Chen, Y. Su, X. Yan, and W. Y. Wang (2020) KGPT: knowledge-grounded pre-training for data-to-text generation. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp. 8635–8648.
- [5] Y. Chen, H. Xie, M. Ma, Y. Kang, X. Gao, L. Shi, Y. Cao, X. Gao, H. Fan, M. Wen, et al. (2024) Automatic root cause analysis via large language models for cloud incidents. In Proceedings of the Nineteenth European Conference on Computer Systems, pp. 674–688.
- [6] Y. Cheng, L. Li, T. Xiao, Z. Li, J. Suo, K. He, and Q. Dai (2024) Cuts+: high-dimensional causal discovery from irregular time-series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, pp. 11525–11533.
- [7] A. Deng and B. Hooi (2021) Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35, pp. 4027–4035.
- [8] Y. Deng, X. Shi, Z. Jiang, X. Zhang, L. Zhang, Z. Zhang, B. Li, Z. Song, H. Zhu, G. Liu, et al. (2025) Minder: faulty machine detection for large-scale distributed model training. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25), pp. 505–521.
- [9] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pp. 4171–4186.
- [10] C. Duan, Y. Yang, T. Jia, G. Liu, J. Liu, H. Zhang, Q. Zhou, Y. Li, and G. Huang (2025) Famos: fault diagnosis for microservice systems through effective multi-modal data fusion. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pp. 2613–2624.
- [11] J. Fang, Z. Meng, and C. Macdonald (2025) Kirag: knowledge-driven iterative retriever for enhancing retrieval-augmented generation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 18969–18985.
- [12] J. Gasteiger, S. Weißenberger, and S. Günnemann (2019) Diffusion improves graph learning. Advances in neural information processing systems 32.
- [13] X. Han, S. Absar, L. Zhang, and S. Yuan (2025) Root cause analysis of anomalies in multivariate time series through granger causal discovery. In The Thirteenth International Conference on Learning Representations.
- [14] Z. Hu, Y. Dong, K. Wang, and Y. Sun (2020) Heterogeneous graph transformer. In Proceedings of the web conference 2020, pp. 2704–2710.
- [15] A. Ikram, S. Chakraborty, S. Mitra, S. Saini, S. Bagchi, and M. Kocaoglu (2022) Root cause analysis of failures in microservices through causal discovery. Advances in Neural Information Processing Systems 35, pp. 31158–31170.
- [16] S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE transactions on neural networks and learning systems 33(2), pp. 494–514.
- [17] Y. Jiang, C. Zhang, S. He, Z. Yang, M. Ma, S. Qin, Y. Kang, Y. Dang, S. Rajmohan, Q. Lin, et al. (2024) Xpert: empowering incident management with query recommendations via large language models. In Proceedings of the IEEE/ACM 46th International conference on software engineering, pp. 1–13.
- [18] L. Kong, W. Li, H. Yang, Y. Zhang, J. Guan, and S. Zhou (2024) Causalformer: an interpretable transformer for temporal causal discovery. IEEE Transactions on Knowledge and Data Engineering 37(1), pp. 102–115.
- [19] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, et al. (2020) Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33, pp. 9459–9474.
- [20] M. Steinder and A. S. Sethi (2004) A survey of fault localization techniques in computer networks. Science of computer programming 53(2), pp. 165–194.
- [21] A. Li, S. Lu, S. Nath, R. Padhye, and V. Sekar (2024) ExChain: exception dependency analysis for root cause diagnosis. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pp. 2047–2062.
- [22] C. Lin, C. Chang, W. Wang, K. Wang, and W. Peng (2024) Root cause analysis in microservice using neural granger causal discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, pp. 206–213.
- [23] L. Liu, B. Wang, F. Ma, Q. Zheng, L. Yao, C. Zhang, and M. A. Mohamed (2022) A concurrent fault diagnosis method of transformer based on graph convolutional network and knowledge graph. Frontiers in Energy Research 10, pp. 837553.
- [24] L. Luo, Y. Li, R. Haffari, and S. Pan (2024) Reasoning on graphs: faithful and interpretable large language model reasoning. In International Conference on Learning Representations, Vol. 2024, pp. 14400–14423.
- [25] L. Pham, H. Ha, and H. Zhang (2024) Baro: robust root cause analysis for microservices via multivariate bayesian online change point detection. Proceedings of the ACM on Software Engineering 1(FSE), pp. 2214–2237.
- [26] R. Y. Rohekar, Y. Gurwicz, and S. Nisimov (2023) Causal interpretation of self-attention in pre-trained transformers. Advances in Neural Information Processing Systems 36, pp. 31450–31465.
- [27] M. Sana, N. Piovesan, A. De Domenico, Y. Kang, H. Zhang, M. Debbah, and F. Ayed (2025) Reasoning language models for root cause analysis in 5g wireless networks. arXiv preprint arXiv:2507.21974.
- [28] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Comput 9(8), pp. 1735–1780.
- [29] H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal (2023) Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers), pp. 10014–10037.
- [30] A. Vashishtha, A. Kumar, A. Pandey, A. G. Reddy, K. Ahuja, V. N. Balasubramanian, and A. Sharma (2025) Teaching transformers causal reasoning through axiomatic training. In Proc. International Conference on Machine Learning (ICML).
- [31] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. Advances in neural information processing systems 30.
- [32] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, et al. (2018) Graph attention networks. In International conference on learning representations, Vol. 6.
- [33] C. Wang, X. Zhang, R. Lu, X. Lin, X. Zeng, X. Zhang, Z. An, G. Wu, J. Gao, C. Tian, et al. (2025) Towards llm-based failure localization in production-scale networks. In Proceedings of the ACM SIGCOMM 2025 Conference, pp. 496–511.
- [34] D. Wang, Z. Chen, Y. Fu, Y. Liu, and H. Chen (2023) Incremental causal graph learning for online root cause analysis. In Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining, pp. 2269–2278.
- [35] H. Wang, A. Abhashkumar, C. Lin, T. Zhang, X. Gu, N. Ma, C. Wu, S. Liu, W. Zhou, Y. Dong, et al. (2024) NetAssistant: dialogue based network diagnosis in data center networks. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pp. 2011–2024.
- [36] T. Wang and G. Qi (2024) A comprehensive survey on root cause analysis in (micro) services: methodologies, challenges, and trends. arXiv preprint arXiv:2408.00803.
- [37] Z. Wang, S. Lin, G. Yan, S. Ghorbani, M. Yu, J. Zhou, N. Hu, L. Baruah, S. Peters, S. Kamath, et al. (2025) Intent-driven network management with multi-agent llms: the confucius framework. In Proceedings of the ACM SIGCOMM 2025 Conference, pp. 347–362.
- [38] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al. (2022) Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35, pp. 24824–24837.
- [39] D. Wu, X. Wang, Y. Qiao, Z. Wang, J. Jiang, S. Cui, and F. Wang (2024) Netllm: adapting large language models for networking. In Proceedings of the ACM SIGCOMM 2024 Conference, pp. 661–678.
- [40] Z. Wu, M. Zhao, Y. Li, H. Li, S. Peng, N. Kato, and F. Tang (2026) Kgv: integrating large language models with knowledge graphs for cyber threat intelligence credibility assessment. IEEE Transactions on Mobile Computing.
- [41] R. Xiong, Y. Yang, D. He, K. Zheng, S. Zheng, C. Xing, H. Zhang, Y. Lan, L. Wang, and T. Liu (2020) On layer normalization in the transformer architecture. In International conference on machine learning, pp. 10524–10533.
- [42] A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. (2025) Qwen3 technical report. arXiv preprint arXiv:2505.09388.
- [43] Z. Yao, C. Pei, W. Chen, H. Wang, L. Su, H. Jiang, Z. Xie, X. Nie, and D. Pei (2024) Chain-of-event: interpretable root cause analysis for microservices through automatically learning weighted event causal graph. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 50–61.
- [44] G. Yu, P. Chen, Y. Li, H. Chen, X. Li, and Z. Zheng (2023) Nezha: interpretable fine-grained root causes analysis for microservices on multi-modal observability data. In Proceedings of the 31st ACM joint European software engineering conference and symposium on the foundations of software engineering, pp. 553–565.
- [45] D. Zhang, X. Zhang, C. Bansal, P. Las-Casas, R. Fonseca, and S. Rajmohan (2024) LM-pace: confidence estimation by large language models for effective root causing of cloud incidents. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 388–398.
- [46] J. Zhang, H. Deng, X. Li, M. Zhao, F. Tang, and N. Kato (2026) Toward realistic wi-fi fault diagnosis: a multi-modal benchmark. arXiv:2605.22008, [Link](https://arxiv.org/abs/2605.22008).
- [47] L. Zhang, T. Jia, M. Jia, Y. Wu, A. Liu, Y. Yang, Z. Wu, X. Hu, P. S. Yu, and Y. Li (2024) A survey of aiops for failure management in the era of large language models. arXiv preprint arXiv:2406.11213.
- [48] S. Zhang, Y. Zhao, S. Xia, S. Wei, Y. Sun, C. Zhao, S. Ma, J. Kuang, B. Zhu, L. Pan, et al. (2024) No more data silos: unified microservice failure diagnosis with temporal knowledge graph. IEEE Transactions on Services Computing 17(6), pp. 4013–4026.
- [49] S. Zhang, Y. Zhao, X. Xiong, Y. Sun, X. Nie, J. Zhang, F. Wang, X. Zheng, Y. Zhang, and D. Pei (2024) Illuminating the gray zone: non-intrusive gray failure localization in server operating systems. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 126–137.
- [50] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao (2019) Deep learning and its applications to machine health monitoring. Mechanical systems and signal processing 115, pp. 213–237.
- [51] L. Zheng, Z. Chen, J. He, and H. Chen (2024) Mulan: multi-modal causal structure learning and root cause analysis for microservice systems. In Proceedings of the ACM Web Conference 2024, pp. 4107–4116.
- [52] K. Zuo, Y. Jiang, F. Mo, and P. Lio (2025) Kg4diagnosis: a hierarchical multi-agent llm framework with knowledge graph enhancement for medical diagnosis. In AAAI Bridge Program on AI for Medicine and Healthcare, pp. 195–204.