Logical Grammar Induction via Graph Kolmogorov Complexity: A Neuro-Symbolic Framework for Self-Healing Clinical Data Integrity

arXiv cs.LG 05/18/26, 04:00 AM Papers
Summary
Proposes Logic-GNN, a neuro-symbolic framework that uses temporal graph neural networks and graph Kolmogorov complexity to induce a symbolic grammar for clinical records, enabling detection and correction of data entry errors as grammatical violations. The system achieves an F1-score of 0.94 on a large healthcare dataset, outperforming state-of-the-art methods by 12%.
arXiv:2605.15242v1 Announce Type: new Abstract: The reliability of Healthcare Information Systems (HIS) is frequently compromised by human-induced data entry errors, which existing statistical anomaly detection methods fail to distinguish from legitimate clinical extremes. This paper proposes Logic-GNN, a novel neuro-symbolic framework that treats clinical records as a structured “private language” governed by latent logical games. By integrating Temporal Graph Neural Networks (TGNN) with Graph Kolmogorov Complexity, we induce a symbolic grammar that represents the underlying logic of medical interactions. We define anomalies as “grammatical violations” that cause a significant expansion in the Minimum Description Length (MDL) of the clinical graph. Evaluated on the Sina System dataset (2M+ records), Logic-GNN achieves an F1-score of 0.94, outperforming state-of-the-art baselines by 12% in distinguishing between life-threatening medical outliers and data corruption. Our approach introduces a self-healing mechanism that suggests logical corrections to maintain data integrity in real-time HIS environments.
# Logical Grammar Induction via Graph Kolmogorov Complexity: A Neuro-Symbolic Framework for Self-Healing Clinical Data Integrity
Source: [https://arxiv.org/html/2605.15242](https://arxiv.org/html/2605.15242)
Abolfazl Zarghani, Amir Malekesfandiari

Both authors are with the Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran\. E\-mail: abolfazlzarghani1999@mail\.um\.ac\.ir, malekesfandiari\.amir@mail\.um\.ac\.ir

###### Abstract

The reliability of Healthcare Information Systems \(HIS\) is frequently compromised by human\-induced data entry errors, which existing statistical anomaly detection methods fail to distinguish from legitimate clinical extremes\. This paper proposes Logic\-GNN, a novel neuro\-symbolic framework that treats clinical records as a structured “private language” governed by latent logical games\. By integrating Temporal Graph Neural Networks \(TGNN\) with Graph Kolmogorov Complexity, we induce a symbolic grammar that represents the underlying logic of medical interactions\. We define anomalies as “grammatical violations” that cause a significant expansion in the Minimum Description Length \(MDL\) of the clinical graph\. Evaluated on the Sina System dataset \(2M\+ records\), Logic\-GNN achieves an F1\-score of 0\.94, outperforming state\-of\-the\-art baselines by 12% in distinguishing between life\-threatening medical outliers and data corruption\. Our approach introduces a self\-healing mechanism that suggests logical corrections to maintain data integrity in real\-time HIS environments\.

## 1 Introduction

Healthcare Information Systems \(HIS\) have evolved from basic record\-keeping tools into large\-scale, high\-dimensional digital repositories that now serve as the foundation of modern clinical decision support\. However, as shown in recent large\-scale studies on contaminated medical datasets\[[10](https://arxiv.org/html/2605.15242#bib.bib10)\], the reliability of these systems is frequently undermined by various forms of human\-induced noise\. This noise includes stochastic clerical errors as well as systematic logical inconsistencies, creating serious challenges for the development and deployment of robust predictive models\. The problem is especially critical in centralized platforms such as the Sina system, which contains more than two million records\.

Traditional machine learning approaches to anomaly detection typically rely on statistical density estimation or distance\-based techniques, flagging instances that lie in low\-density regions of the feature space\. While effective in many industrial applications, this paradigm is inadequate in the clinical domain\. In medicine, statistically rare events often represent critical, life\-threatening conditions that carry high clinical value\. In contrast, many data entry errors may appear numerically plausible yet violate fundamental logical rules of healthcare\. This limitation necessitates a fundamental shift from purely statistical notions of anomaly toward a structural and symbolic understanding of data integrity\.

The theoretical foundation of this work is inspired by Ludwig Wittgenstein’s concept of “Language Games” \(Sprachspiele\) in his later philosophy\. We posit that a clinical database is not merely a passive collection of stochastic variables, but rather a dynamic system of logical interactions governed by an implicit grammar\. Within this perspective, each medical record functions as a “sentence” in a private clinical language\. For example, recording an obstetric procedure for a male patient is not simply a statistical outlier; it constitutes a violation of the underlying logical grammar of the system\.

To address these challenges, we propose Logic\-GNN, a novel neuro\-symbolic framework that models clinical records as nodes and interactions in a temporal heterogeneous graph\. By integrating Temporal Graph Neural Networks with the concept of Graph Kolmogorov Complexity, the framework induces the latent symbolic grammar governing valid medical interactions\. Anomalies are formally defined as “grammatical violations” that cause a significant increase in the Minimum Description Length \(MDL\) of the entire clinical graph\. This information\-theoretic formulation enables the system not only to detect inconsistencies but also to identify the specific rule violations\.

A key innovation of Logic\-GNN is its self\-healing capability\. When a logical contradiction is detected, the framework identifies the violated constraint and, through gradient\-based optimization of graph complexity, suggests corrective modifications\. This allows for automated or human\-in\-the\-loop restoration of data integrity in real\-time healthcare environments\.

Additionally, clinical data streams are subject to concept drift resulting from evolving medical protocols, seasonal health trends, and changes in clinical practice\. As explored in our prior work on adaptive reinforcement learning for data streams\[[15](https://arxiv.org/html/2605.15242#bib.bib15)\], static detection approaches are insufficient\. Logic\-GNN incorporates adaptive temporal mechanisms that allow the induced logical grammar to evolve alongside real\-world clinical practice, thereby maintaining high precision without penalizing legitimate medical outliers\.

The primary contributions of this paper are as follows:

1. We introduce Graph Kolmogorov Complexity, a formal information\-theoretic metric for measuring the logical consistency of nodes in high\-dimensional clinical graphs\.
2. We propose a differentiable Logic Extraction Layer for GNN architectures that enables the model to learn and enforce first\-order logical constraints directly from relational medical data\.
3. We conduct a comprehensive evaluation on the Sina Hospital Information System dataset comprising over 2\.2 million records, demonstrating that Logic\-GNN significantly outperforms state\-of\-the\-art baselines in distinguishing logical errors from legitimate clinical extremes while exhibiting strong robustness against noise and concept drift \[[10](https://arxiv.org/html/2605.15242#bib.bib10),[15](https://arxiv.org/html/2605.15242#bib.bib15)\]\.

## 2 Related Work

### 2\.1 Architectural Evolution of Graph Neural Networks

The landscape of Graph Neural Networks \(GNNs\) has evolved from static spatial aggregations to dynamic, temporal\-aware architectures\. As comprehensively reviewed by Waikhom and Patgiri \[[13](https://arxiv.org/html/2605.15242#bib.bib13)\], the taxonomies of GNNs now span supervised, semi\-supervised, and self\-supervised settings\. Early models primarily focused on Euclidean\-based message passing; however, the shift toward non\-Euclidean domains has necessitated more robust structural\-feature learning\. Ponzi and Napoli \[[12](https://arxiv.org/html/2605.15242#bib.bib12)\] emphasize that recent advances in GNN architectures, specifically those utilizing attention mechanisms, have significantly improved the capacity of models to capture long\-range dependencies in complex networks, a critical requirement for clinical data where patient\-physician interactions are often sparse and intermittent\.

### 2\.2 Graph\-Based Anomaly Detection in Clinical Domains

Anomaly detection in medical graphs presents unique challenges due to the high variance of legitimate biological signals\. Previous research has categorized these tasks into structural outliers and attribute\-based anomalies\. Our previous work, EpiGraph \[[11](https://arxiv.org/html/2605.15242#bib.bib11)\], demonstrated the efficacy of integrating Temporal Graph Neural Networks \(TGNNs\) with LSTM units to predict disease outbreaks by monitoring contact networks\. Despite the high AUC achieved by such models, they often operate as “black boxes,” failing to bridge the Interpretability Gap\. As highlighted in recent case studies on contaminated clinical datasets \[[10](https://arxiv.org/html/2605.15242#bib.bib10)\], standard algorithms like Isolation Forest or traditional Autoencoders struggle to distinguish between a “medical outlier” \(a rare but valid condition\) and a “logical anomaly” \(data entry error\), necessitating a move toward symbolic reasoning\.

### 2\.3 Neuro\-Symbolic Integration and Logic Induction

The integration of symbolic logic with neural architectures is an emerging frontier aimed at providing formal guarantees for AI predictions\. Wu et al\. \[[5](https://arxiv.org/html/2605.15242#bib.bib5)\] discuss the foundational shift toward inductive logic programming within GNNs, suggesting that relational data can be modeled as a set of learnable logical clauses\. However, the induction of these rules in real\-time healthcare streams remains largely unexplored\. By treating clinical interactions as “language games” in the Wittgensteinian sense, our framework attempts to learn the underlying grammar of the Sina system\. This aligns with the push for interpretable GNNs \[[6](https://arxiv.org/html/2605.15242#bib.bib6)\] that not only detect deviations but also explain them through symbolic constraints, thereby ensuring that the detected anomalies correspond to actual violations of medical protocols rather than statistical noise\.

### 2\.4 Kolmogorov Complexity and Minimum Description Length

Theoretical foundations for anomaly detection through data compression are rooted in the concept of Kolmogorov Complexity\. Li and Vitányi \[[14](https://arxiv.org/html/2605.15242#bib.bib14)\] established that the most consistent explanation for a dataset is its shortest description\. In the context of graph mining, this principle is operationalized through the Minimum Description Length \(MDL\) criterion\. While MDL has seen success in community detection and graph clustering, its application to high\-dimensional, time\-varying clinical streams is limited by the uncomputability of the underlying Kolmogorov complexity\. Our approach leverages the predictive power of TGNNs to serve as an approximate compressor\. This methodology transforms the detection task from a density estimation problem into a complexity\-minimization problem, allowing for a more rigorous definition of “logical consistency” in medical records\.

### 2\.5 Adaptive Processing in Dynamic Data Streams

Clinical databases like the Sina system are not static; they are characterized by constant “concept drift” and varying interaction frequencies\. Adaptive sliding window techniques, as explored in our concurrent research on RL\-Window \[[15](https://arxiv.org/html/2605.15242#bib.bib15)\], have shown that reinforcement learning can optimize window sizes based on the spectral and temporal characteristics of the stream\. This adaptability is crucial for Kolmogorov\-based detection, as the complexity of the “grammatical rules” might shift over time\. Integrating adaptive windowing with GNN\-based logic induction ensures that the self\-healing mechanism remains computationally efficient on resource\-constrained HIS architectures, a gap identified in recent surveys on GNN scalability \[[9](https://arxiv.org/html/2605.15242#bib.bib9)\]\.

## 3 Methodology

### 3\.1 Problem Formulation

Healthcare Information Systems \(HIS\) produce highly dynamic and heterogeneous relational data streams composed of patients, physicians, laboratory examinations, prescriptions, diagnoses, hospitalization events, and temporal clinical interactions\. Traditional anomaly detection methods interpret abnormality as statistical rarity; however, such approaches are insufficient in medical environments where rare events may correspond to life\-threatening but valid physiological conditions\.

Logic\-GNN reformulates anomaly detection as a problem of logical consistency over a temporal clinical graph\. Instead of identifying records that are statistically distant from the data distribution, the framework detects records that violate the latent symbolic grammar governing valid healthcare interactions\.

We formally define the HIS as a dynamic heterogeneous graph:

$$G^{(t)} = (V^{(t)}, E^{(t)}, X^{(t)}) \tag{1}$$

where:

- $V^{(t)}$ represents the set of clinical entities at time $t$,
- $E^{(t)}$ denotes temporal interactions among entities,
- $X^{(t)}$ contains multi\-modal node attributes\.

The objective of Logic\-GNN is to simultaneously:

1. learn a latent neuro\-symbolic grammar $\Gamma$,
2. estimate the Graph Kolmogorov Complexity \(GKC\),
3. identify logical inconsistencies,
4. generate self\-healing corrective suggestions\.

Unlike conventional approaches based purely on Euclidean density estimation, our framework models the HIS as a structured symbolic language whose integrity is governed by learnable logical constraints\.

Figure 1: Overall architecture of Logic\-GNN\. The framework integrates temporal graph attention, symbolic logic induction, and MDL\-based anomaly reasoning for self\-healing healthcare data integrity\. Blocks shown: Sina HIS Dataset; Geometric Encoding \(Temporal GAT Layer\); Neuro\-Symbolic Logic Induction; Complexity Layer \(MDL Approximation\); Self\-Healing Anomaly Detection\. Edge labels: $(V,E)$, $Z_{v}^{(t)}$, $\mathcal{L}(G)$, $\nabla K$\.

Figure [1](https://arxiv.org/html/2605.15242#S3.F1) illustrates the complete Logic\-GNN pipeline\. The framework first encodes clinical interactions into a geometric latent representation using a Temporal Graph Attention Network \(TGAT\)\. The resulting embeddings are then passed into a neuro\-symbolic reasoning module responsible for inducing soft First\-Order Logic \(FOL\) clauses\. Finally, the Minimum Description Length \(MDL\) approximator evaluates the complexity contribution of each clinical record and triggers the self\-healing mechanism whenever logical inconsistencies are detected\.

### 3\.2 Temporal Clinical Graph Construction

Clinical workflows naturally exhibit relational and temporal dependencies\. Logic\-GNN therefore converts raw HIS records into a dynamic interaction graph capable of preserving temporal causal relationships\.

Each node corresponds to a medical entity:

- patients,
- physicians,
- laboratory examinations,
- prescriptions,
- ICD\-10 diagnoses,
- hospitalization events\.

Edges represent semantic clinical interactions including:

- physician consultations,
- diagnosis assignments,
- prescription events,
- laboratory requests,
- temporal follow\-ups\.

Temporal annotations are attached to edges to preserve the sequential evolution of patient trajectories\.
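The construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record fields (`patient`, `target`, `target_type`, `relation`, `time`), the class names, and the helper `neighbors_before` are hypothetical and do not come from the Sina schema or the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str   # e.g. "consultation", "diagnosis"
    timestamp: int  # temporal annotation attached to the edge


@dataclass
class TemporalGraph:
    node_types: dict = field(default_factory=dict)  # node id -> entity type
    edges: list = field(default_factory=list)

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, src, dst, relation, timestamp):
        self.edges.append(Edge(src, dst, relation, timestamp))

    def neighbors_before(self, node_id, t):
        """Interactions touching node_id with timestamp <= t, in time order."""
        out = []
        for e in self.edges:
            if e.timestamp <= t and node_id in (e.src, e.dst):
                other = e.dst if e.src == node_id else e.src
                out.append((other, e.relation, e.timestamp))
        return sorted(out, key=lambda item: item[2])


def build_graph(records):
    """Turn flat records (hypothetical schema) into a typed temporal graph."""
    g = TemporalGraph()
    for r in records:
        g.add_node(r["patient"], "patient")
        g.add_node(r["target"], r["target_type"])
        g.add_edge(r["patient"], r["target"], r["relation"], r["time"])
    return g


records = [
    {"patient": "p1", "target": "dr7", "target_type": "physician",
     "relation": "consultation", "time": 1},
    {"patient": "p1", "target": "lab3", "target_type": "lab_test",
     "relation": "laboratory_request", "time": 2},
    {"patient": "p1", "target": "O80", "target_type": "icd10",
     "relation": "diagnosis", "time": 3},
]
graph = build_graph(records)
history = graph.neighbors_before("p1", 2)  # trajectory of p1 up to time 2
```

Keeping the timestamp on the edge rather than on the node is what lets a query such as `neighbors_before` recover a patient trajectory in chronological order.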

Figure 2: Temporal heterogeneous graph construction in Logic\-GNN\. Node types shown: Patient, Physician, Lab Test, ICD\-10\. Edge labels: consultation, diagnosis, request; temporal ordering $t_{1} \rightarrow t_{2}$\.

As shown in Figure [2](https://arxiv.org/html/2605.15242#S3.F2), the temporal graph preserves both semantic and chronological dependencies\. This design allows Logic\-GNN to model evolving healthcare workflows and detect inconsistencies that only emerge over time\.

### 3\.3 Geometric Representation Learning

Logic\-GNN employs a Temporal Graph Attention Network \(TGAT\) to encode relational clinical interactions into latent embeddings\.

For each node $v$, the hidden representation is updated according to:

$$h_{v}^{(t+1)} = \sigma\left(\sum_{u \in \mathcal{N}(v)} \alpha_{uv}^{(t)} W h_{u}^{(t)}\right) \tag{2}$$

where:

- $\mathcal{N}(v)$ is the temporal neighborhood,
- $\alpha_{uv}^{(t)}$ denotes attention coefficients,
- $W$ is a trainable projection matrix\.
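To make the update in Eq. (2) concrete, here is a minimal pure-Python sketch of one aggregation step for a single node. Two choices are assumptions, since the text does not specify them: attention scores are taken as dot products between projected states and normalized with a softmax, and the nonlinearity is a ReLU. The function names are illustrative only.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_node(h_v, neighbor_states, W):
    """One step of Eq. (2): h_v' = sigma(sum_u alpha_uv * W h_u)."""
    projected = [matvec(W, h_u) for h_u in neighbor_states]
    query = matvec(W, h_v)
    # Assumption: dot-product attention scores, softmax-normalized.
    scores = [sum(q * p for q, p in zip(query, p_u)) for p_u in projected]
    alpha = softmax(scores)
    aggregated = [
        sum(a * p_u[i] for a, p_u in zip(alpha, projected))
        for i in range(len(query))
    ]
    # Assumption: ReLU plays the role of sigma.
    return [max(0.0, x) for x in aggregated], alpha


W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
h_new, alpha = update_node([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], W)
```

With the identity projection, the neighbor whose state matches the query receives the larger attention weight, and the new state is the attention-weighted mix of the two neighbor states.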

Unlike conventional GNNs operating in Euclidean space, Logic\-GNN embeds nodes into a hyperbolic latent manifold\. This geometric choice is particularly suitable for healthcare systems because medical taxonomies such as ICD\-10 naturally exhibit hierarchical structures\.

The encoder therefore captures:

- temporal dependencies,
- latent hierarchical organization,
- relational semantics,
- long\-range clinical correlations\.
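The text does not say which model of hyperbolic space is used. As an illustration of why a hyperbolic latent space suits hierarchies, the sketch below assumes the Poincaré ball model and computes its standard geodesic distance; the example coordinates are invented and do not correspond to real ICD-10 embeddings.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


root = [0.0, 0.0]      # e.g. the top of a taxonomy (hypothetical placement)
chapter = [0.5, 0.0]   # an intermediate category
leaf_a = [0.9, 0.1]    # two fine-grained codes near the boundary
leaf_b = [0.9, -0.1]

d_root_chapter = poincare_distance(root, chapter)
d_leaves = poincare_distance(leaf_a, leaf_b)
```

Points near the boundary are far apart in hyperbolic distance even when their Euclidean distance is small, which leaves room for the exponentially many leaves of a tree-like taxonomy.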

### 3\.4 Differentiable Neuro\-Symbolic Logic Induction

The latent embeddings generated by the TGAT encoder are passed into a differentiable symbolic reasoning module\.

Instead of learning purely distributed vector representations, the framework induces a set of soft First\-Order Logic clauses:

$$\Gamma = \{C_{1}, C_{2}, \dots, C_{k}\} \tag{3}$$

Examples of induced rules include:

$$\mathrm{Pregnancy}(x) \rightarrow \mathrm{Female}(x) \tag{4}$$

$$\mathrm{PediatricWard}(x) \rightarrow \mathrm{Age}(x) < 18 \tag{5}$$

The logical consistency score of node $v$ is computed as:

$$H(v|\Gamma) = -\sum_{C \in \Gamma} \left(1 - S_{norm}(v \models C)\right) \tag{6}$$

where $S_{norm}$ denotes the soft satisfaction probability\.

This differentiable formulation allows symbolic constraints to be jointly optimized with neural embeddings through gradient descent\.
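A small sketch can show how the score in Eq. (6) behaves on the two example rules. The soft implication used here (Reichenbach's `1 - a + a*b`) is an assumption, as the text does not state which relaxation is used; the record fields and clause names are hypothetical, and the age test is kept crisp for brevity.

```python
def implies(a, b):
    """Soft implication a -> b on truth degrees in [0, 1] (assumed form)."""
    return 1.0 - a + a * b


# Each clause maps a record to a soft satisfaction degree S_norm in [0, 1].
CLAUSES = {
    "pregnancy_implies_female":
        lambda r: implies(r["pregnancy"], r["female"]),
    "pediatric_ward_implies_minor":
        lambda r: implies(r["pediatric_ward"], 1.0 if r["age"] < 18 else 0.0),
}


def consistency_score(record, clauses=CLAUSES):
    """Eq. (6): H(v | Gamma) = -sum_C (1 - S_norm(v |= C)); 0 is fully consistent."""
    return -sum(1.0 - satisfaction(record) for satisfaction in clauses.values())


def violated_clauses(record, clauses=CLAUSES, tol=0.5):
    """Names of clauses whose satisfaction falls below a tolerance."""
    return [name for name, sat in clauses.items() if sat(record) < tol]


valid = {"pregnancy": 1.0, "female": 1.0, "pediatric_ward": 0.0, "age": 29}
corrupt = {"pregnancy": 1.0, "female": 0.0, "pediatric_ward": 0.0, "age": 29}

h_valid = consistency_score(valid)
h_corrupt = consistency_score(corrupt)
broken = violated_clauses(corrupt)
```

Because each clause contributes a term in `[0, 1]`, the score also says which rule was broken, which is what the self-healing step needs.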

Figure 3: Self\-healing optimization mechanism in Logic\-GNN\. The framework detects logical inconsistencies, identifies violated clauses, computes graph complexity gradients, and generates corrective updates\. Flowchart steps shown: Incoming Clinical Record; Logical Violation? \(No: Accept Record; Yes: Identify Violated Clause\); Compute $\nabla_{X} S(v)$; Generate Correction; Update HIS Database\.

Figure [3](https://arxiv.org/html/2605.15242#S3.F3) presents the self\-healing optimization pipeline\. Unlike conventional anomaly detection systems that only flag suspicious records, Logic\-GNN actively identifies violated logical rules and computes corrective modifications capable of restoring clinical consistency\.

### 3\.5 Graph Kolmogorov Complexity Approximation

The primary theoretical contribution of Logic\-GNN is the introduction of Graph Kolmogorov Complexity \(GKC\)\.

###### Definition 1 \(Graph Kolmogorov Complexity\)\.

Let $G = (V, E, \Gamma)$ denote a temporal clinical graph\.

The Graph Kolmogorov Complexity is defined as:

$$K(G) = \min_{p \in \mathcal{P}} \{\, |p| : U(p) = G \,\} \tag{7}$$

where:

- $U$ denotes a universal Turing machine,
- $p$ is the shortest program reconstructing the graph,
- $|p|$ is the encoding length of $p$\.

Since exact Kolmogorov complexity is uncomputable, we approximate it through Minimum Description Length \(MDL\):

$$K(G) \approx L(\Gamma) + L(G|\Gamma) \tag{8}$$

where:

- $L(\Gamma)$ is the encoding length of the induced grammar,
- $L(G|\Gamma)$ is the conditional graph encoding cost\.

Intuitively, logically consistent records compress efficiently under the induced grammar, whereas inconsistent records require exception encoding and therefore increase total graph complexity\.
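The two-part code in Eq. (8) can be illustrated with a toy coder over binary attributes. Everything below is an assumption made for the example rather than the encoder used in Logic-GNN: a fixed cost per clause (`CLAUSE_BITS`), one raw bit per attribute that no rule predicts, and a small probability mass `EPS` reserved for exceptions to a rule.

```python
import math

EPS = 1.0 / 64.0     # hypothetical probability reserved for rule exceptions
CLAUSE_BITS = 16.0   # hypothetical fixed cost of writing down one clause


def record_bits(record, grammar):
    """L(record | Gamma): bits needed for one record under the grammar.

    A clause (a, c) reads "a(x) -> c(x)" over binary attributes.
    """
    predicted = {c: True for a, c in grammar if record[a]}
    bits = 0.0
    for attr, value in record.items():
        if attr in predicted:
            # Predicted attribute: cheap when the rule holds, costly otherwise.
            p = 1.0 - EPS if value == predicted[attr] else EPS
            bits += -math.log2(p)
        else:
            bits += 1.0  # no rule applies: one raw bit
    return bits


def description_length(records, grammar):
    """Eq. (8): L(Gamma) + L(G | Gamma)."""
    return CLAUSE_BITS * len(grammar) + sum(record_bits(r, grammar) for r in records)


grammar = [("pregnancy", "female")]
consistent = [{"pregnancy": True, "female": True} for _ in range(100)]
violation = {"pregnancy": True, "female": False}

with_grammar = description_length(consistent, grammar)
without_grammar = description_length(consistent, [])
exception_cost = record_bits(violation, grammar)
```

In this toy setting the grammar pays for itself on consistent data (about 118 bits versus 200), while a single violating record costs 7 bits instead of roughly 1, which is the "exception encoding" effect described above.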

### 3\.6 MDL\-Based Anomaly Scoring

For each node $v$, Logic\-GNN computes:

$$S(v) = K(G \cup v) - K(G) \tag{9}$$

If $S(v)$ exceeds a detection threshold, the record is considered logically inconsistent\.

Unlike traditional statistical methods, this score measures semantic disruption rather than numerical rarity\.
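The marginal score of Eq. (9) is independent of how $K$ is estimated, so it can be sketched against any description-length function. In the example below, `toy_K` is a hypothetical stand-in for the learned compressor and `TAU` is an invented threshold; neither comes from the paper.

```python
def anomaly_score(records, candidate, K):
    """Eq. (9): S(v) = K(G u v) - K(G) for any description-length estimate K."""
    return K(records + [candidate]) - K(records)


def toy_K(records):
    """Hypothetical compressor: a record obeying "pregnancy -> female"
    costs 1 bit, a violating record costs 7 bits."""
    return sum(
        7.0 if (r["pregnancy"] and not r["female"]) else 1.0
        for r in records
    )


TAU = 3.0  # hypothetical detection threshold, in bits

G = [
    {"pregnancy": True, "female": True},
    {"pregnancy": False, "female": False},
]
rare_but_valid = {"pregnancy": False, "female": True}
corrupted = {"pregnancy": True, "female": False}

s_valid = anomaly_score(G, rare_but_valid, toy_K)
s_corrupt = anomaly_score(G, corrupted, toy_K)
flagged = [s > TAU for s in (s_valid, s_corrupt)]
```

Only the record that breaks the rule raises the description length beyond the threshold; a record that is merely uncommon adds the ordinary per-record cost.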

### 3\.7 Self\-Healing Optimization

Upon detecting an anomaly, the framework computes:

$$\nabla_{X} S(v) \tag{11}$$

which represents the gradient of graph complexity with respect to clinical attributes\.

The optimization objective becomes:

$$\min_{\Delta x} S(x + \Delta x) \tag{12}$$
The framework therefore identifies the minimum attribute modification required to restore logical consistency\.

For example, if a pregnancy diagnosis is assigned to a male patient, the optimization process determines whether the inconsistency originates from gender metadata, diagnostic coding, or laboratory corruption\.
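For categorical attributes, the objective in Eq. (12) can be illustrated with an exhaustive search over single-attribute changes. This swaps in brute-force enumeration for the gradient-based step described above, purely to keep the example small; the rules, attribute domains, and field names are hypothetical.

```python
def violation_score(r):
    """Hypothetical stand-in for S: number of hard rules the record breaks."""
    score = 0
    if r["diagnosis"] == "pregnancy" and r["sex"] != "F":
        score += 1
    if r["ward"] == "pediatric" and r["age"] >= 18:
        score += 1
    return score


# Candidate values per editable attribute (invented for the example).
DOMAINS = {
    "sex": ["F", "M"],
    "diagnosis": ["pregnancy", "fracture"],
    "ward": ["general", "pediatric"],
}


def suggest_corrections(record, domains=DOMAINS):
    """Single-attribute edits that lower the violation score, best first."""
    base = violation_score(record)
    suggestions = []
    for attr, values in domains.items():
        for value in values:
            if value == record[attr]:
                continue
            candidate = dict(record, **{attr: value})
            score = violation_score(candidate)
            if score < base:
                suggestions.append((score, attr, value))
    return sorted(suggestions)


record = {"sex": "M", "diagnosis": "pregnancy", "ward": "general", "age": 30}
suggestions = suggest_corrections(record)
```

The search returns both the sex correction and the diagnosis revision as equally small repairs. Choosing between them requires additional evidence, such as related laboratory records, which is why a human-in-the-loop review remains useful.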

### 3\.8 Training Objective

The final optimization objective integrates:

- graph reconstruction,
- symbolic consistency,
- variational regularization,
- complexity minimization\.

The total loss is defined as:

$$\mathcal{L}_{Logic\text{-}GNN} = \mathcal{L}_{recon} + \alpha K(\Gamma) + \beta \sum_{v \in V} \mathcal{D}_{KL}\left(q(Z_{v}|G) \,\|\, p(Z_{v})\right) \tag{13}$$

where:

- $\mathcal{L}_{recon}$ is reconstruction loss,
- $K(\Gamma)$ is logical program complexity,
- $\mathcal{D}_{KL}$ denotes KL\-divergence regularization,
- $\alpha, \beta$ are balancing hyperparameters\.
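The way the three terms of Eq. (13) combine can be sketched as follows. The text does not state the form of $q$ and $p$; the sketch assumes a diagonal Gaussian posterior and a standard normal prior, for which the KL term has a closed form. The reconstruction loss and grammar complexity are passed in as plain numbers, and all values are invented.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def total_loss(recon, grammar_complexity, posteriors, alpha, beta):
    """Eq. (13): L_recon + alpha * K(Gamma) + beta * sum_v KL(q(Z_v|G) || p(Z_v)).

    posteriors: list of (mu, log_var) pairs, one per node (assumed Gaussian).
    """
    kl = sum(kl_diag_gaussian(mu, log_var) for mu, log_var in posteriors)
    return recon + alpha * grammar_complexity + beta * kl


# Illustrative numbers only.
loss = total_loss(
    recon=2.0,
    grammar_complexity=10.0,
    posteriors=[([1.0], [0.0])],  # one node, 1-D latent with mu=1, var=1
    alpha=0.1,
    beta=2.0,
)
```

Raising `alpha` pushes training toward shorter grammars, while `beta` controls how strongly node embeddings are pulled toward the prior.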

### 3\.9 Computational Complexity

The computational complexity of the TGAT encoder is approximately:

$$O(|E|d + |V|d^{2}) \tag{14}$$

where $d$ is the embedding dimensionality\.

The symbolic reasoning module contributes an additional cost that depends on $k$, the number of induced logical clauses\.

Overall, Logic\-GNN scales linearly with temporal interactions, making the framework suitable for large\-scale HIS environments\.
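As a rough worked example of Eq. (14), the operation count can be evaluated with the graph sizes reported in Section 4.1. Two inputs are assumptions: the embedding dimension `d = 64` is hypothetical, and only the 285,000 patient nodes are counted because the total node count is not stated, so the node term is a lower bound.

```python
def tgat_cost(num_edges, num_nodes, d):
    """Dominant operation count from Eq. (14): |E| * d + |V| * d^2."""
    return num_edges * d + num_nodes * d * d


D = 64                 # hypothetical embedding dimension
EDGES = 1_800_000      # temporal edges reported for the Sina graph
NODES = 285_000        # patient nodes only (lower bound on |V|)

base = tgat_cost(EDGES, NODES, D)
doubled_edges = tgat_cost(2 * EDGES, NODES, D)
edge_increment = doubled_edges - base
```

For fixed `|V|` and `d`, doubling the number of temporal interactions adds exactly `|E| * d` operations, which is the linear dependence on interactions referred to above.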

## 4 Experiments and Results

### 4\.1 Experimental Setup

All experiments were conducted on the Sina Hospital Information System \(HIS\) dataset, a large\-scale, real\-world clinical repository gathered from 42 hospital departments\. The dataset comprises more than 2\.2 million clinical records belonging to 285,000 unique patients\. It includes a wide range of heterogeneous medical information such as demographic details, ICD\-10 diagnostic codes, laboratory measurements, prescription histories, physician interactions, and hospitalization events\.

To effectively capture the complex relational and temporal dependencies inherent in clinical workflows, we transformed the raw data into a dynamic heterogeneous temporal graph\. This graph consists of 285,000 patient nodes, approximately 1\.8 million temporal edges, and 3\.4 million attributed interactions\. Table [I](https://arxiv.org/html/2605.15242#S4.T1) presents the detailed statistics of both the dataset and the constructed clinical graph\.

TABLE I: Dataset and Graph Statistics

The experiments were carried out on a high\-performance computing server equipped with an NVIDIA A100 GPU and 128 GB of RAM, and were implemented using the PyTorch Geometric library\. This configuration provided sufficient computational resources for efficient training and inference on the large\-scale temporal graph\.

### 4\.2 Baseline Comparison

We compared the performance of Logic\-GNN with several competitive baselines spanning both traditional machine learning and modern graph\-based methods: Isolation Forest, Variational Autoencoder \(VAE\), Graph Autoencoder \(GAE\), and our previous work EpiGraph\.

As shown in Table [II](https://arxiv.org/html/2605.15242#S4.T2), Logic\-GNN significantly outperforms all baselines across every evaluation metric\. The proposed model achieved a Precision of 0\.95, Recall of 0\.93, F1\-Score of 0\.94, and AUC of 0\.97\. This corresponds to an improvement of approximately 12% in F1\-Score over the strongest baseline \(EpiGraph\)\. These results highlight the advantage of combining temporal graph neural networks with neuro\-symbolic logic induction and Graph Kolmogorov Complexity for distinguishing genuine medical outliers from logical data entry errors\.
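The reported F1-Score can be checked against the reported Precision and Recall, since F1 is their harmonic mean. The short sketch below only verifies that the three figures are mutually consistent; it does not reproduce any experiment.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


f1 = f1_score(0.95, 0.93)   # values reported for Logic-GNN
f1_rounded = round(f1, 2)
```

The result rounds to 0.94, matching the reported figure.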

TABLE II: Comparative Performance Evaluation

### 4\.3 Ablation Study

To assess the contribution of each key component in the Logic\-GNN architecture, we conducted a comprehensive ablation study\. The results are summarized in Table [III](https://arxiv.org/html/2605.15242#S4.T3)\.

TABLE III: Ablation Study

The removal of the symbolic logic induction layer resulted in the most significant performance degradation, with a notable increase in false positives\. This finding confirms that purely neural models have difficulty separating rare but clinically valid cases from genuine logical inconsistencies\. The complete Logic\-GNN model achieved the best trade\-off between detection accuracy and interpretability\.

### 4\.4 Computational Scalability

We evaluated the scalability of Logic\-GNN by training the model on clinical graphs of progressively increasing size\. As reported in Table [IV](https://arxiv.org/html/2605.15242#S4.T4), the framework demonstrates excellent scalability while maintaining practical training times and memory usage, making it suitable for deployment in large\-scale hospital information systems\.

TABLE IV: Computational Scalability Analysis

### 4\.5 Clinical Case Study

In addition to quantitative evaluation, we performed a detailed qualitative analysis of the anomalies detected by Logic\-GNN\. The model exhibited a strong ability to differentiate between true physiological extremes, temporal disease outbreaks, administrative errors, and logically impossible combinations\.

Specific examples of correctly identified inconsistencies include: pregnancy\-related procedures recorded for male patients, pediatric diagnoses assigned to elderly individuals, clinically impossible medication combinations, and contradictory temporal sequences in hospitalization events\. Importantly, in each detected case, the self\-healing optimization module generated meaningful correction suggestions \(e\.g\., recommending gender correction, diagnosis revision, or timestamp adjustment\)\. This interpretability and corrective capability provide a substantial advantage over conventional black\-box anomaly detection systems and demonstrate the practical value of the proposed neuro\-symbolic approach in real clinical environments\.

## 5 Conclusion

This paper presented Logic\-GNN, a neuro\-symbolic approach to clinical data integrity\. By moving beyond statistical noise modeling and adopting a grammar\-induction perspective based on Kolmogorov Complexity, we demonstrated that AI can understand the “logic” of healthcare\. Our model not only detects errors but also offers a path toward self\-healing databases\. Future work will investigate the application of this framework to decentralized blockchain\-based health records to ensure cross\-institutional data consistency\.

## References

- \[1\] L\. Waikhom and R\. Patgiri, “Graph Neural Networks: Methods, Applications, and Opportunities,” arXiv:2108\.10733v2 \[cs\.LG\], 2021\.
- \[2\] V\. Ponzi and C\. Napoli, “Graph Neural Networks: Architectures, Applications, and Future Directions,” IEEE Access, vol\. 13, pp\. 62870\-62891, 2025\.
- \[3\] A\. Zarghani, “EpiGraph: Anomaly Detection in Contact Networks for Early Disease Outbreak Prediction,” Preprint, 2024\.
- \[4\] A\. Zarghani and B\. B\. Haghighi, “A Novel Anomaly Detection Using Autoencoders on Contaminated Data,” Ferdowsi University of Mashhad, 2024\.
- \[5\] L\. Wu, P\. Cui, J\. Pei, and L\. Zhao, Graph Neural Networks: Foundations, Frontiers, and Applications, Springer, 2022\.
- \[6\] N\. Liu, Q\. Feng, and X\. Hu, “Interpretability in Graph Neural Networks,” in Graph Neural Networks: Foundations, Frontiers, and Applications, Springer, pp\. 121\-147, 2022\.
- \[7\] M\. Li and P\. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed\., Springer, 2008\.
- \[8\] A\. Zarghani, “Adaptive Sliding Window Optimization for Multi\-Dimensional Data Streams Using Reinforcement Learning,” Preprint, 2024\.
- \[9\] H\. Ma, Y\. Rong, and J\. Huang, “Graph Neural Networks: Scalability,” in Graph Neural Networks: Foundations, Frontiers, and Applications, Springer, pp\. 99\-119, 2022\.
- \[10\] A\. Zarghani and B\. B\. Haghighi, “A Novel Anomaly Detection Using Autoencoders on Contaminated Data,” Journal of Medical Systems \(Under Review\), 2024\.
- \[11\] A\. Zarghani, “EpiGraph: Anomaly Detection in Contact Networks for Early Disease Outbreak Prediction,” Preprint, 2024\.
- \[12\] V\. Ponzi and C\. Napoli, “Graph Neural Networks: Architectures, Applications, and Future Directions,” IEEE Access, vol\. 13, pp\. 62870\-62891, 2025\.
- \[13\] L\. Waikhom and R\. Patgiri, “Graph Neural Networks: Methods, Applications, and Opportunities,” arXiv:2108\.10733, 2021\.
- \[14\] M\. Li and P\. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed\., Springer, 2008\.
- \[15\] A\. Zarghani, “Adaptive Sliding Window Optimization for Multi\-Dimensional Data Streams Using Reinforcement Learning,” Preprint, 2024\.
- \[16\] L\. Wittgenstein, Philosophical Investigations, Blackwell, 1953\.