Scalable Uncertainty Reasoning in Knowledge Graphs

arXiv cs.AI Papers

Summary

This thesis proposes a modular framework for scalable uncertainty reasoning in knowledge graphs, addressing imprecise attribute values, probabilistic triple existence, and incomplete schema through tailored algebraic, logical, and geometric techniques.

arXiv:2605.16568v1 Announce Type: new Abstract: Knowledge Graphs are pivotal for semantic data integration. The real-world data they model is often inherently uncertain. Within knowledge graphs, uncertainty manifests in three distinct levels: imprecise attribute values, probabilistic triple existence, and incomplete schema knowledge. However, current Semantic Web standards lack native support for reasoning over such uncertainty, and na\"ive extensions often incur computational intractability. In this thesis, I aim to develop a modular framework that addresses each level through tailored techniques: (1) defining probabilistic literals and a corresponding query algebra for continuous attributes; (2) a compilation-based framework transforming SPARQL provenance into tractable probabilistic circuits for uncertain triples; and (3) topology-aware geometric embeddings for statistical schema reasoning. The central hypothesis is that specialized reasoning mechanisms, namely algebraic, logical, and geometric approaches, can reconcile semantic precision with computational tractability.
Original Article
View Cached Full Text

Cached at: 05/19/26, 06:34 AM

# Scalable Uncertainty Reasoning in Knowledge Graphs
Source: [https://arxiv.org/html/2605.16568](https://arxiv.org/html/2605.16568)
11institutetext:University of Stuttgart, Stuttgart, Germany
11email:jingcheng\.wu@ki\.uni\-stuttgart\.de###### Abstract

Knowledge Graphs are pivotal for semantic data integration\. The real\-world data they model is often inherently uncertain\. Within knowledge graphs, uncertainty manifests in three distinct levels: imprecise attribute values, probabilistic triple existence, and incomplete schema knowledge\. However, current Semantic Web standards lack native support for reasoning over such uncertainty, and naïve extensions often incur computational intractability\. In this thesis, I aim to develop a modular framework that addresses each level through tailored techniques: \(1\) defining probabilistic literals and a corresponding query algebra for continuous attributes; \(2\) a compilation\-based framework transforming SPARQL provenance into tractable probabilistic circuits for uncertain triples; and \(3\) topology\-aware geometric embeddings for statistical schema reasoning\. The central hypothesis is that specialized reasoning mechanisms, namely algebraic, logical, and geometric approaches, can reconcile semantic precision with computational tractability\.

## 1Introduction

Knowledge Graphs \(KGs\)\[[27](https://arxiv.org/html/2605.16568#bib.bib1)\]serve as a versatile framework for the semantic integration of heterogeneous data\[[8](https://arxiv.org/html/2605.16568#bib.bib4),[31](https://arxiv.org/html/2605.16568#bib.bib57)\], typically relying on W3C standards such as RDF\[[13](https://arxiv.org/html/2605.16568#bib.bib2)\]for data representation and SPARQL\[[23](https://arxiv.org/html/2605.16568#bib.bib3)\]for querying\. While these standards excel at retrieving deterministic facts, they operate under an assumption of binary truth\[[33](https://arxiv.org/html/2605.16568#bib.bib6)\]\. For instance, the triple \(:Motor123, rdf:type, :ElectricMotor\) asserts that the entity Motor123 is classified as an electric motor\. A standard query for this triple mandates an exact graph match, treating the statement as a binary fact that must explicitly exist in the dataset\.

In reality, KGs are frequently incomplete and subject to uncertainty, containing missing links and unreliable facts that conflict with the deterministic nature of W3C standards\[[20](https://arxiv.org/html/2605.16568#bib.bib5),[33](https://arxiv.org/html/2605.16568#bib.bib6),[52](https://arxiv.org/html/2605.16568#bib.bib56)\]\. This pervasive uncertainty is not a monolithic phenomenon but in heterogeneous forms that defy a “one\-size\-fits\-all” treatment\. Consider the contrast between a triple representing a confirmed relationship with an uncertain value, such as \(:Motor123,:hasTemperature,𝒩​\(80∘​C,1∘​C\)\\mathcal\{N\}\(80^\{\\circ\}\\text\{C\},1^\{\\circ\}\\text\{C\}\)\), and a triple where the existence of the statement is a probabilistic hypothesis, such as \(:Grinder07812,:hasFault,:Overheat\) withP=0\.12P=0\.12\. The former relies on calculus\-based integration to evaluate range\-based constraints over infinite continuous domains\[[50](https://arxiv.org/html/2605.16568#bib.bib38)\]\. Conversely, the latter employs combinatorial model counting to aggregate the probabilities of discrete possible worlds\[[43](https://arxiv.org/html/2605.16568#bib.bib8)\]\.

Attempting to unify these heterogeneous forms by coupling discrete structural probabilities with continuous attribute distributions often incurs prohibitive computational complexity, such as \#P\-hard inference\[[14](https://arxiv.org/html/2605.16568#bib.bib9)\], because the engine cannot efficiently reconcile discrete counting with infinite continuous domains\[[39](https://arxiv.org/html/2605.16568#bib.bib10)\]\. To achieve scalable reasoning without sacrificing semantic precision, we decompose this complexity by distinguishing the nature of the uncertainty\. Specifically, following\[[45](https://arxiv.org/html/2605.16568#bib.bib12),[47](https://arxiv.org/html/2605.16568#bib.bib11)\], we categorize KG uncertainty into three distinct levels:

- •Attribute\-level Uncertaintyrefers to the imprecision or inaccuracy of literal values\[[5](https://arxiv.org/html/2605.16568#bib.bib7),[25](https://arxiv.org/html/2605.16568#bib.bib13)\]\. At this level, the existence of the triple is deterministic, but the literal value remains uncertain due to factors such as measurement errors or sensor noise\. Recalling the aforementionedMotor123example, the triple \(:Motor123,:hasTemperature,ll\) is confirmed, but the literalllfollows the distribution𝒩​\(80∘​C,1∘​C\)\\mathcal\{N\}\(80^\{\\circ\}\\text\{C\},1^\{\\circ\}\\text\{C\}\)rather than a precise scalar\.
- •Triple\-level Uncertaintyconcerns the existential probability of a specific triple between two entities or an entity and a literal\[[20](https://arxiv.org/html/2605.16568#bib.bib5),[33](https://arxiv.org/html/2605.16568#bib.bib6)\]\. This uncertainty is frequently introduced by expert assessment\[[5](https://arxiv.org/html/2605.16568#bib.bib7)\], link prediction\[[6](https://arxiv.org/html/2605.16568#bib.bib40),[19](https://arxiv.org/html/2605.16568#bib.bib58)\], or information extraction\[[20](https://arxiv.org/html/2605.16568#bib.bib5)\]\. Recalling theGrinder07812example, based on domain experts’ observations, there is a 12% probability of an overheat fault\. Accordingly, the triple \(:Grinder07812,:hasFault,:Overheat\) is represented not as a binary fact, but as a probabilistic hypothesis whereP​\(:Grinder07812,:hasFault,:Overheat\)=0\.12P\(\\texttt\{:Grinder07812\},\\texttt\{:hasFault\},\\texttt\{:Overheat\}\)=0\.12\.
- •Group\-level Uncertaintyrefers to the probabilistic constraints and statistical regularities residing within the KG schema\[[34](https://arxiv.org/html/2605.16568#bib.bib14),[51](https://arxiv.org/html/2605.16568#bib.bib15)\]\. This level extends the scope of uncertainty from individual facts to terminological axioms regarding entire classes of entities\. Instead of focusing on isolated instances, it models general dependencies\. For example, regarding the class:AngleGrinderwhere:Grinder07812∈:AngleGrinder\\texttt\{:Grinder07812\}\\in\\texttt\{:AngleGrinder\}, statistical schema knowledge asserts that85%85\\%of such devices are equipped with a:DustCover\. Formally, this is encoded as the probabilistic inclusion axiom:AngleGrinder⊑0\.85∃:hasPart\.:DustCover\\texttt\{:AngleGrinder\}\\sqsubseteq\_\{0\.85\}\\exists\\texttt\{:hasPart\}\.\\texttt\{:DustCover\}\.

Despite their differences, these three levels face a common barrier: existing Semantic Web engines lack native support for probabilistic reasoning\. Attribute\-level handling is currently restricted to descriptive metadata or inefficient sampling; Triple\-level inference is \#P\-hard without structural optimization; and Group\-level logical reasoning scales poorly\. This thesis addresses these limitations by extending the Semantic Web stack with distinct reasoning mechanisms tailored to each uncertainty type\. Specifically, I study: \(1\) an algebraic query framework for Attribute\-level uncertainty, instantiated via closed\-form Gaussian Mixture Models \(GMMs\); \(2\) a logical compilation framework for Triple\-level uncertainty, transforming probabilistic graph patterns into tractable circuits; and \(3\) a geometric embedding model for Group\-level uncertainty, mapping statistical schemas onto topology\-aware manifolds\. The central hypothesis is that treating these levels through their respective algebraic, logical, and geometric lenses enables the reconciliation of semantic precision with computational tractability\.

## 2State of the Art

We analyze the state of the art based on the three levels\.

### 2\.1Attribute\-level Uncertainty

Existing standards within the Semantic Web, such as the Semantic Sensor Network \(SSN\) ontology\[[25](https://arxiv.org/html/2605.16568#bib.bib13)\]and ProbOnto\[[44](https://arxiv.org/html/2605.16568#bib.bib17)\], have established rich vocabularies for describing probability distributions and sensor observations\. Similarly, SCOVO\[[24](https://arxiv.org/html/2605.16568#bib.bib37)\]and the RDF Data Cube\[[12](https://arxiv.org/html/2605.16568#bib.bib33)\]provide ontologies for exchanging statistical data\. However, these frameworks function primarily asdescriptive metadata schemas\. They define how statistical data is structurally represented but lack the underlying query algebra required to execute operations, such as convolution or Bayesian fusion, directly within the database engine\.

In the RDF stream processing, Keskisärkkä et al\.\[[30](https://arxiv.org/html/2605.16568#bib.bib47),[29](https://arxiv.org/html/2605.16568#bib.bib48)\]introduced a custom literal datatype and SPARQL extensions for probabilistic filtering within the RSP\-QL\* model\. While sharing the high\-level idea of embedding distributions in RDF literals, their work targets the transient nature of data streams and differs from our persistent KG framework in three key respects\. First, their representation conflates random variables with distributions, whereas our framework separates them to uniformly handle multi\-dimensional data\. Second, their operations map distributions strictly to scalar probabilities without algebraic closure, precluding chained transformations \(e\.g\., convolution, Bayesian fusion\) or distributional comparisons like similarity joins\. Third, they support only basic parametric families, whereas our polymorphic support for heterogeneous distribution families \(including GMMs, Dirichlet, and histograms\) moving beyond basic parametric forms\.

Beyond the Semantic Web, handling continuous variables in probabilistic databases often incurs prohibitive computational costs\. In relational databases, systems like Orion\[[41](https://arxiv.org/html/2605.16568#bib.bib32)\]and MCDB\[[28](https://arxiv.org/html/2605.16568#bib.bib34)\]allow multiple probability distributions as tuple attributes\. Orion defines algebraic operations such as floor, marginalize, and product over distributions, whereas MCDB relies on generating thousands of random samples during query evaluation\. Although the Monte Carlo approach offers flexibility, it introduces massive runtime overhead, rendering it unsuitable for the interactive querying demands of large\-scale knowledge graphs\.

These limitations reveal a critical research gap: the absence of a query algebra that treats diverse probability distributions \(encompassing discrete, continuous, parametric, and non\-parametric forms\) asfirst\-class citizenswithin the RDF data model\. Grounded in standard random variable modeling, this algebra supports algebraically closed distribution\-to\-distribution transformations and similarity joins across heterogeneous families, while maintaining computational efficiency through a hybrid of closed\-form operations and different sampling strategies\[[38](https://arxiv.org/html/2605.16568#bib.bib49),[35](https://arxiv.org/html/2605.16568#bib.bib50),[46](https://arxiv.org/html/2605.16568#bib.bib52)\]\.

### 2\.2Triple\-level Uncertainty

This level addresses the existential probability of relations between entities\. The foundational framework is the Tuple\-Independent Database \(TID\)\[[18](https://arxiv.org/html/2605.16568#bib.bib19),[9](https://arxiv.org/html/2605.16568#bib.bib18),[43](https://arxiv.org/html/2605.16568#bib.bib8)\], where every triple is an independent Bernoulli event and query evaluation maps to Weighted Model Counting on the provenance\. However, real\-world triples frequently exhibit dependencies modeled via PGMs\[[32](https://arxiv.org/html/2605.16568#bib.bib24)\], and integrating such structures into SPARQL evaluation remains open\.

Even under TID, the Dichotomy Theorem\[[15](https://arxiv.org/html/2605.16568#bib.bib20)\]classifies conjunctive queries into Safe \(PTIME\) and Unsafe \(\#P\-hard\) classes\. Safe queries admit lifted inference directly from query structure, but existing provenance engines\[[2](https://arxiv.org/html/2605.16568#bib.bib23),[26](https://arxiv.org/html/2605.16568#bib.bib22)\]based on the semiring framework fail to exploit this classification\. While these systems provide a general algebraic approach for metadata propagation, they treat all query patterns uniformly, missing opportunities to perform efficient lifted inference for safe graph patterns\.

For the full expressivity of SPARQL, including non\-monotonic operators \(OPTIONAL, MINUS\), Geerts et al\.\[[21](https://arxiv.org/html/2605.16568#bib.bib21)\]introduced spm\-semirings to capture their provenance semantics, implemented by SPARQLProv\[[26](https://arxiv.org/html/2605.16568#bib.bib22)\]and NPCS\[[2](https://arxiv.org/html/2605.16568#bib.bib23)\]\. However, probabilistic evaluation of the resulting lineage remains computationally prohibitive\[[18](https://arxiv.org/html/2605.16568#bib.bib19)\]\.

To mitigate this computational intractability, the database community leverages knowledge compilation \(KC\)\[[17](https://arxiv.org/html/2605.16568#bib.bib42)\]\. This paradigm shifts the complexity from online query evaluation to an offline compilation phase, transforming the lineage formula into a tractable target language, notably deterministic Decomposable Negation Normal Form \(d\-DNNF\)\[[18](https://arxiv.org/html/2605.16568#bib.bib19),[9](https://arxiv.org/html/2605.16568#bib.bib18)\]\. Unlike raw lineage expressions, d\-DNNFs satisfy determinism and decomposability, two structural properties that together reduce inference complexity from \#P\-hard to linear in the circuit size\. While KC is highly effective for monotonic relational queries, adapting it to handle the non\-monotonic semantics of full SPARQL with spm\-semirings provenance remains a significant open problem that this thesis also aims to address\.

We note that Fuzzy Logic\[[33](https://arxiv.org/html/2605.16568#bib.bib6),[42](https://arxiv.org/html/2605.16568#bib.bib35),[53](https://arxiv.org/html/2605.16568#bib.bib36)\]assigns degrees of truth rather than probabilities and lacks the statistical compositionality required for rigorous inference\. This thesis therefore focuses on probabilistic semantics\.

These observations identify three interrelated gaps: \(1\) the failure to exploit query structure for lifted inference; \(2\) the absence of efficient inference mechanisms for non\-monotonic provenance; and \(3\) the lack of native support for tuple dependencies beyond the TID assumption\. This thesis addresses these gaps through a unified compilation framework that transforms SPARQL lineage into tractable circuit representations\.

### 2\.3Group\-level Uncertainty

This level addresses statistical constraints over terminological knowledge, asserting conditional probabilities such as “85%85\\%of angle grinders are equipped with a dust cover\.” The foundational formalism for this reasoning is Statisticalℰ​ℒ\\mathcal\{EL\}\(SEL\)\[[37](https://arxiv.org/html/2605.16568#bib.bib25)\]\. While SEL provides rigorous semantics for these probabilistic terminological axioms, theoretical analyses establish that its exact satisfiability checking isExptime\-complete\[[4](https://arxiv.org/html/2605.16568#bib.bib26)\]\. This high complexity renders classical inference mechanisms relying on linear programming\[[34](https://arxiv.org/html/2605.16568#bib.bib14)\]or tableau algorithms\[[7](https://arxiv.org/html/2605.16568#bib.bib27)\]computationally infeasible for large\-scale KGs\[[51](https://arxiv.org/html/2605.16568#bib.bib15)\]\.

To overcome the scalability barrier, recent research has pivoted towards geometric neuro\-symbolic approximations\[[22](https://arxiv.org/html/2605.16568#bib.bib29),[48](https://arxiv.org/html/2605.16568#bib.bib28)\]\. Representative geometric embedding models, exemplified by BoxEL\-based methods, map concepts to axis\-parallel boxes in vector space\[[48](https://arxiv.org/html/2605.16568#bib.bib28)\]\. This paradigm shift replaces intractable logical deduction with polynomial\-time geometric measurement, modeling the conditional probabilityP​\(D∣C\)P\(D\\mid C\)as a volumetric ratio\[[51](https://arxiv.org/html/2605.16568#bib.bib15)\]:

P​\(D∣C\)≈Vol​\(Box​\(C\)∩Box​\(D\)\)Vol​\(Box​\(C\)\)P\(D\\mid C\)\\approx\\frac\{\\text\{Vol\}\(\\text\{Box\}\(C\)\\cap\\text\{Box\}\(D\)\)\}\{\\text\{Vol\}\(\\text\{Box\}\(C\)\)\}\(1\)However, these models typically operate within flat Euclidean space, creating a fundamental topological mismatch when representing intrinsically hierarchical ontologies such as taxonomies\[[49](https://arxiv.org/html/2605.16568#bib.bib30)\]\. Euclidean embeddings often require high dimensionality to preserve such structures, leading to parameter inefficiency and systematic approximation errors\[[36](https://arxiv.org/html/2605.16568#bib.bib31)\]\.

This observation identifies a critical research gap: the absence of topology\-aware geometric embeddings that match the structural characteristics of ontologies to appropriate manifolds, thereby improving approximation fidelity while maintaining computational tractability\.

## 3Problem Statement and Contributions

In this thesis, I will investigate: 1\) how to extend SPARQL to support algebraic operations over continuous probability distributions for attribute\-level uncertainty; 2\) how to efficiently evaluate complex, non\-monotonic SPARQL queries by compiling their lineage into tractable probabilistic circuits\[[18](https://arxiv.org/html/2605.16568#bib.bib19),[10](https://arxiv.org/html/2605.16568#bib.bib41)\]for triple\-level uncertainty; and 3\) how to approximate intractable terminological reasoning by embedding concepts into topology\-aware geometric manifolds \(e\.g\., Hyperbolic space\) for group\-level uncertainty\.

Problem 1: Attribute\-level Uncertainty\.An RDF graph𝒢\\mathcal\{G\}is a finite set of triples\(s,p,o\)∈\(ℐ∪ℬ\)×ℐ×\(ℐ∪ℬ∪ℒ\)\(s,p,o\)\\in\(\\mathcal\{I\}\\cup\\mathcal\{B\}\)\\times\\mathcal\{I\}\\times\(\\mathcal\{I\}\\cup\\mathcal\{B\}\\cup\\mathcal\{L\}\), whereℐ,ℬ\\mathcal\{I\},\\mathcal\{B\}, andℒ\\mathcal\{L\}denote disjoint sets of IRIs, blank nodes, and literals, respectively\. The evaluation of a SPARQL queryQQover𝒢\\mathcal\{G\}, denoted as⟦Q⟧𝒢\\llbracket Q\\rrbracket\_\{\\mathcal\{G\}\}, relies on deterministic pattern matching to yield a set of solution mappingsμ:V→\(ℐ∪ℬ∪ℒ\)\\mu:V\\to\(\\mathcal\{I\}\\cup\\mathcal\{B\}\\cup\\mathcal\{L\}\)\. A fundamental representational gap exists because the literal setℒ\\mathcal\{L\}consists strictly of deterministic values\. Standard RDF specifications provide no built\-in datatype to encode continuous random variables, such as the multi\-dimensional distributions inherently present in sensor measurements\. Furthermore, the standard SPARQL algebra lacks built\-in operators to manipulate such probabilistic data\. Specifically, it offers no formal semantics to evaluate probabilistic filter constraints \(e\.g\.,FILTER\(P\(?X \> ?c\) \>= ?theta\)\) or to execute closed\-form algebraic transformations, such as distribution fusion, as native query operations\.

Hypothesis 1\.Extending SPARQL with a native datatype for continuous random variables and corresponding closed\-form algebraic operators, instantiated via GMMs \(universal approximators admitting closed\-form solutions for operations such as convolution and product\), allows for the manipulation of uncertain data with semantic precision comparable to Monte Carlo simulations but at significantly lower latency\.

The related research questions are:

- •RQ 1\.1\.How can multi\-dimensional sensor measurement data be efficiently represented as random variables and treated as first\-class citizens within the RDF data model?
- •RQ 1\.2\.How can common probabilistic functions be executed as SPARQL operations that interact seamlessly with other knowledge graph representations?
- •RQ 1\.3\.How does the proposed algebraic approach compare to sampling\-based baselines in terms of query execution time and approximation error?

Problem 2: Triple\-level Uncertainty\.We address the challenge of scalable computation of exact probabilities for SPARQL queries over a probabilistic knowledge graph𝒢p​r​o​b\\mathcal\{G\}\_\{prob\}\. Formally, we define𝒢p​r​o​b\\mathcal\{G\}\_\{prob\}as a pair\(𝒢,𝒫\)\(\\mathcal\{G\},\\mathcal\{P\}\), where𝒢\\mathcal\{G\}is an RDF graph and𝒫\\mathcal\{P\}denotes a joint probability distribution over the subgraphs of𝒢\\mathcal\{G\}\. The TID model parameterizes the joint distribution by associating a marginal probabilityP​\(t\)∈\(0,1\]P\(t\)\\in\(0,1\]with each triplet∈𝒢t\\in\\mathcal\{G\}, assuming independence such that𝒫\\mathcal\{P\}is determined by the factorization𝒫​\(W\)=∏t∈WP​\(t\)​∏t∈𝒢∖W\(1−P​\(t\)\)\\mathcal\{P\}\(W\)=\\prod\_\{t\\in W\}P\(t\)\\prod\_\{t\\in\\mathcal\{G\}\\setminus W\}\(1\-P\(t\)\)for any worldW⊆𝒢W\\subseteq\\mathcal\{G\}\. Crucially, accurately modeling real\-world data requires extending beyond the TID assumption to capture complex dependencies between triples, thereby incorporating conditional probabilities to specify𝒫\\mathcal\{P\}\. The query semantics are grounded in the possible worlds framework, where the sample spaceΩ\\Omegaconsists of2\|𝒢\|2^\{\|\\mathcal\{G\}\|\}deterministic subgraphs of𝒢\\mathcal\{G\}, each weighted by𝒫\\mathcal\{P\}\.

As discussed in Section[2\.2](https://arxiv.org/html/2605.16568#S2.SS2), three barriers remain\. First, existing engines do not exploit the Safe/Unsafe classification to perform lifted inference for tractable query classes\. Second, while spm\-semirings provide a representational solution for non\-monotonic operators, no efficient inference mechanism exists to evaluate the resulting lineage expressions\. Third, integrating probabilistic dependencies modeled via PGMs with SPARQL query evaluation remains an open challenge\.

Hypothesis 2\.Compiling provenance lineage and probabilistic dependencies into probabilistic circuits \(e\.g\., d\-DNNF\[[16](https://arxiv.org/html/2605.16568#bib.bib43)\]\) enables scalable exact inference for SPARQL queries, reducing the inference complexity from exponential to polynomial in the size of the circuit\.

The related research questions are:

- •RQ 2\.1\.How can independent sub\-patterns within a SPARQL query plan be identified and exploited during knowledge compilation to achieve lifted inference for monotonic fragments?
- •RQ 2\.2\.How can lineage polynomials defined over spm\-semirings be correctly compiled into probabilistic circuits while preserving the probabilistic semantics of non\-monotonic operators?
- •RQ 2\.3\.How can probabilistic dependencies between triples, modeled via Bayesian Networks, be efficiently compiled into a circuit representation and integrated with the query lineage to support exact inference over correlated data?

Problem 3: Group\-level Uncertainty\.This level addresses the integration of statistical terminological knowledge into knowledge graph querying\. Formally, we consider Statisticalℰ​ℒ\\mathcal\{EL\}\(𝒮​ℰ​ℒ\\mathcal\{SEL\}\), a probabilistic extension of the lightweight Description Logicℰ​ℒ\\mathcal\{EL\}where probabilities represent proportions in the domain\. A𝒮​ℰ​ℒ\\mathcal\{SEL\}knowledge base is a tuple𝒦=\(𝒯,𝒜\)\\mathcal\{K\}=\(\\mathcal\{T\},\\mathcal\{A\}\), consisting of a TBox𝒯\\mathcal\{T\}and an ABox𝒜\\mathcal\{A\}\. The TBox𝒯\\mathcal\{T\}contains both classical axioms \(e\.g\.,C⊑DC\\sqsubseteq D\) and statistical conditionals of the form\(D∣C\)​\[p\]\(D\\mid C\)\[p\], asserting that the proportion ofCC\-instances that are alsoDD\-instances ispp\. Classical subsumptionC⊑DC\\sqsubseteq Darises as the special case wherep=1p=1\. The ABox𝒜\\mathcal\{A\}contains assertions about specific individuals\. Given an ABox𝒜\\mathcal\{A\}and a target conceptCC, the reasoning task is to compute the probability of instance membershipa:Ca:C, which requires projecting the statistical proportions from𝒯\\mathcal\{T\}onto the individualaaconditioned on the explicit assertions in𝒜\\mathcal\{A\}\.

Suppose𝒯\\mathcal\{T\}contains the conditional\(HasFault∣AngleGrinder\)​\[0\.12\]\(\\texttt\{HasFault\}\\mid\\texttt\{AngleGrinder\}\)\[0\.12\], and the ABox assertsGrinder07812:AngleGrinder\\texttt\{Grinder07812\}:\\texttt\{AngleGrinder\}without explicit fault information\. A probabilistic query asking for all potentially faulty devices should returnGrinder07812with probability0\.120\.12, derived from the statistical schema constraint\. This reasoning pattern, propagating schema\-level statistics to instance\-level probability estimates, constitutes the core computational task of group\-level uncertainty\.

As discussed in Section[2\.3](https://arxiv.org/html/2605.16568#S2.SS3), two fundamental challenges impede this reasoning task\. First, exact probabilistic inference in𝒮​ℰ​ℒ\\mathcal\{SEL\}isExptime\-complete, motivating the use of approximate inference methods to achieve scalability\. Second, while geometric embedding approaches such as BoxEL\[[48](https://arxiv.org/html/2605.16568#bib.bib28)\]offer tractable approximations, their representational capacity is constrained by the geometry of flat Euclidean space, which fails to preserve hierarchical structures without incurring excessive dimensionality\.

Hypothesis 3\.Mapping ontological concepts to manifolds that match their structural characteristics \(e\.g\., hyperbolic space for hierarchies\) improves approximation fidelity compared to flat Euclidean embeddings, while maintaining polynomial\-time inference complexity\.

The related research questions are:

- •RQ 3\.1\.How can probabilistic box embeddings be generalized from Euclidean space to non\-Euclidean manifolds \(e\.g\., Hyperbolic space\) while preserving tractable volume computation for conditional probability estimation?
- •RQ 3\.2\.How can the structural characteristics of an ontology \(e\.g\., hierarchical depth, branching factor\) guide the selection of appropriate geometric spaces to improve approximation fidelity?

## 4Research Methodology and Approach

In this section, I describe the proposed approach to address the problems identified in Section[3](https://arxiv.org/html/2605.16568#S3)\.

### 4\.1Approach to Attribute\-level Uncertainty

To address the limitations identified in Section[2\.1](https://arxiv.org/html/2605.16568#S2.SS1), my approach embeds probability distributions as first\-class citizens within the RDF data model, instantiating the framework using GMMs, which serve as universal approximators that admit closed\-form algebraic solutions\.

To addressRQ 1\.1andRQ 1\.2, I propose a two\-layered extension to the Semantic Web stack\. At the data level, I will define a custom RDF datatype to represent continuous random variables by encoding their probability distributions as structured literals\. At the query level, I will extend the SPARQL algebra with distribution\-aware operators, including probabilistic filtering \(P​\(X\>c\)≥θP\(X\>c\)\\geq\\theta\), Bayesian fusion, and similarity joins\. This design ensures that uncertainty reasoning is handled natively within the query algebra rather than through external application logic\. AddressingRQ 1\.3, I will benchmark the algebraic framework against Monte Carlo baselines to quantify the trade\-off between query latency and approximation fidelity\. Furthermore, I will specifically evaluate the performance impact of the proposed extensions by comparing the execution times of queries utilizing similarity joins and filters against standard SPARQL benchmarks\.

To evaluate these contributions, I construct synthetic knowledge graphs scaling up to 3 million triples with varying distribution complexities \(e\.g\., mixture models withK∈\{1,3,5,10\}K\\in\\\{1,3,5,10\\\}components\)\. Since standard SPARQL benchmarks lack uncertainty, I generate parallel deterministic datasets \(replacing each GMM with its scalar mean\) to isolate the overhead of probabilistic extensions\. Beyond overhead ratio, mean absolute error, and latency scaling, the evaluation targets two additional dimensions\. First, I compare the dedicatedSIMJOINoperator against a naïveBIND\+FILTERpipeline to quantify the benefit of distribution\-aware pruning via the Data Processing Inequality\[[3](https://arxiv.org/html/2605.16568#bib.bib51)\]\. Second, I evaluate multiple sampling strategies \(naïve Monte Carlo, stratified sampling\[[38](https://arxiv.org/html/2605.16568#bib.bib49)\], sequential testing\[[46](https://arxiv.org/html/2605.16568#bib.bib52)\], and an adaptive cascade\) to assess per\-pair latency and classification accuracy across varying difficulty levels\.

### 4\.2Approach to Triple\-level Uncertainty

To address the limitations identified in Section[2\.2](https://arxiv.org/html/2605.16568#S2.SS2), my approach leverages the framework of probabilistic circuits\[[11](https://arxiv.org/html/2605.16568#bib.bib44)\]\. This technique shifts the computational burden from online query evaluation to an offline compilation phase, producing tractable circuit representations that enable linear\-time inference\. To integrate this technique into the Semantic Web stack, the system operates as a query\-rewriting middleware\. It intercepts standard SPARQL queries, evaluates them under possible\-worlds semantics, and returns solution mappings annotated with marginal probabilities\.

To tackleRQ 2\.1, a static analysis phase classifies queries according to the Safe/Unsafe dichotomy\[[15](https://arxiv.org/html/2605.16568#bib.bib20)\]\. For Safe queries, the system rewrites the query into a safe evaluation plan that decomposes probability computation along independent sub\-queries, enabling lifted inference without compilation\. For Unsafe queries, the provenance lineage is compiled into d\-DNNF circuits\. To studyRQ 2\.2, this compilation must correctly handle the monus operator \(⊖\\ominus\) from spm\-semiring provenance while preserving decomposability for tractable inference\. AnsweringRQ 2\.3, I will extend the framework to incorporate probabilistic dependencies modeled as Bayesian Networks\. By encoding the network structure as Conjunctive Normal Form \(CNF\) constraints and compiling them jointly with the query lineage, the system captures both query structure and tuple correlations in a unified circuit representation\.

To mitigate the potential exponential cost of compilation, I plan to investigate circuit caching techniques to amortize the offline overhead for workloads with recurring query patterns\.

For evaluation, correctness is verified by comparing circuit\-based inference against brute\-force enumeration over all2\|G\|2^\{\|G\|\}possible worlds on small\-scale probabilistic KGs \(\|G\|≤20\|G\|\\leq 20\)\. Efficiency is assessed by measuring compilation time, circuit size, and online inference time as functions of graph size and query complexity\. I additionally compare lifted inference on Safe queries against full compilation to quantify the benefit of exploiting query structure\.

### 4\.3Approach to Group\-level Uncertainty

To address the limitations identified in Section[2\.3](https://arxiv.org/html/2605.16568#S2.SS3), my approach adopts the geometric approximation paradigm, modeling ontological axioms as spatial containment in the embedding space and estimating conditional probabilities via volumetric ratios\.

Existing models such as BoxEL\[[48](https://arxiv.org/html/2605.16568#bib.bib28)\]operate in flat Euclidean space, where representing deep hierarchies requires high\-dimensional embeddings because Euclidean volume grows only polynomially with the radius\. To addressRQ 3\.1, I plan to extend box embeddings to non\-Euclidean manifolds, such as hyperbolic space, where volume grows exponentially with the radius, naturally matching the exponential branching of taxonomic hierarchies while requiring fewer dimensions\. AddressingRQ 3\.2, I will investigate how structural properties of an ontology, such as Gromov hyperbolicity, correlate with embedding quality\. This study aims to provide theoretical guidance on the conditions under which non\-Euclidean embeddings offer clear advantages\.

Approximation fidelity is evaluated against exact inference on small\-scaleℰ​ℒ\\mathcal\{EL\}knowledge bases using mean absolute error\. I compare Euclidean BoxEL against hyperbolic embeddings at equal dimensionality, varying ontology depth and branching factor to test Hypothesis 3\. Scalability is assessed by measuring training and inference time as a function of ontology size\.

## 5Preliminary Results

To natively integrate probabilistic literals into SPARQL for attribute\-level uncertainty, I developedProbSPARQL\. It ensures algebraic closure through distribution\-valued operators \(e\.g\., convolution\) while extracting scalar statistics via numeric ones \(e\.g\.,CDF,JSD\)\. This framework, which additionally introduces a divergence\-based similarity join \(SIMJOIN\), is implemented atop Apache Jena/Fuseki\.

Benchmarks on 3M\-triple KGs show a1\.25×1\.25\\times–2\.16×2\.16\\timesconstant overhead over deterministic baselines\. Pushing probabilistic filters below joins yields up to a17\.7×17\.7\\timesspeedup\. ForSIMJOIN, Data Processing Inequality pruning eliminates90%90\\%of candidates, achieving178\.8×178\.8\\timesend\-to\-end speedup\. The architecture supports histogram and Dirichlet distributions\. Histogram\-based similarity computation is three orders of magnitude faster than Monte Carlo sampling\. Code and datasets are available\.111[https://github\.com/0sidewalkenforcer0/ProbSPARQL](https://github.com/0sidewalkenforcer0/ProbSPARQL)

## 6Conclusions

This thesis addresses scalable uncertainty reasoning in KGs through a decompose\-then\-specialize methodology: algebraic operators for attribute\-level distributions, compilation\-based circuits for triple\-level provenance, and geometric embeddings for group\-level statistical schemas\. The preliminary results onProbSPARQL\(Section[5](https://arxiv.org/html/2605.16568#S5)\) validate this strategy for the attribute level\. While restricting group\-level reasoning toℰ​ℒ\\mathcal\{EL\}limits expressivity,ℰ​ℒ\\mathcal\{EL\}underpins major ontologies \(e\.g\., SNOMED CT\[[40](https://arxiv.org/html/2605.16568#bib.bib55)\], Gene Ontology\[[1](https://arxiv.org/html/2605.16568#bib.bib54)\]\) and provides a natural starting point\. Extensions to more expressive logics remain an open question\.

## Acknowledgments

I would like to express sincere gratitude to my supervisor, Prof\. Dr\. Steffen Staab, and my co\-supervisors, Dr\. Ratan Bahadur Thapa and Dr\. Daniel Hernández, for their continuous support and guidance throughout this research\. This Ph\.D\. project is funded by the Deutsche Forschungsgemeinschaft \(DFG, German Research Foundation\) within the Collaborative Research Center SFB 1574, Project number 471687386\.

## References

- \[1\]M\. Ashburner, C\. A\. Ball, J\. A\. Blake,et al\.\(2000\)Gene ontology: tool for the unification of biology\.Nature genetics25\(1\),pp\. 25–29\.External Links:[Document](https://dx.doi.org/10.1038/75556)Cited by:[§6](https://arxiv.org/html/2605.16568#S6.p1.2)\.
- \[2\]Z\. Asma, D\. Hernández, L\. Galárraga, G\. Flouris, I\. Fundulaki, and K\. Hose\(2024\)NPCS: native provenance computation for SPARQL\.InWWW,pp\. 2085–2093\.External Links:[Document](https://dx.doi.org/10.1145/3589334.3645557)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p2.1),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p3.1)\.
- \[3\]N\. J\. Beaudry and R\. Renner\(2012\)An intuitive proof of the data processing inequality\.Quantum Inf\. Comput\.12\(5\-6\),pp\. 432–441\.External Links:[Document](https://dx.doi.org/10.26421/QIC12.5-6-4)Cited by:[§4\.1](https://arxiv.org/html/2605.16568#S4.SS1.p3.1)\.
- \[4\]B\. Bednarczyk\(2021\)Statistical EL is exptime\-complete\.Inf\. Process\. Lett\.169,pp\. 106113\.External Links:[Document](https://dx.doi.org/10.1016/J.IPL.2021.106113)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p1.2)\.
- \[5\]F\. Bobillo and U\. Straccia\(2011\)Fuzzy ontology representation using owl 2\.Int\. J\. Approx\. Reasoning52\(7\),pp\. 1073–1094\.External Links:[Document](https://dx.doi.org/10.1016/j.ijar.2011.05.003)Cited by:[1st item](https://arxiv.org/html/2605.16568#S1.I1.i1.p1.3),[2nd item](https://arxiv.org/html/2605.16568#S1.I1.i2.p1.1)\.
- \[6\]A\. Bordes, N\. Usunier, A\. García\-Durán, J\. Weston, and O\. Yakhnenko\(2013\)Translating embeddings for modeling multi\-relational data\.InNIPS,pp\. 2787–2795\.External Links:[Link](https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html)Cited by:[2nd item](https://arxiv.org/html/2605.16568#S1.I1.i2.p1.1)\.
- \[7\]L\. Botha, T\. Meyer, and R\. Peñaloza\(2019\)A bayesian extension of the description logic*ALC*\.InJELIA,Lecture Notes in Computer Science, Vol\.11468,pp\. 339–354\.External Links:[Document](https://dx.doi.org/10.1007/978-3-030-19570-0%5F22)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p1.2)\.
- \[8\]G\. Buchgeher, D\. Gabauer, J\. Martinez\-Gil, and L\. Ehrlinger\(2021\)Knowledge graphs in manufacturing and production: A systematic literature review\.IEEE Access9,pp\. 55537–55554\.External Links:[Document](https://dx.doi.org/10.1109/ACCESS.2021.3070395)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p1.1)\.
- \[9\]İ\. İ\. Ceylan, A\. Darwiche, and G\. V\. den Broeck\(2016\)Open\-world probabilistic databases\.InKR,pp\. 339–348\.External Links:[Link](https://aaai.org/papers/35-12908-open-world-probabilistic-databases/)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p1.1),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p4.1)\.
- \[10\]İ\. İ\. Ceylan, A\. Darwiche, and G\. V\. den Broeck\(2021\)Open\-world probabilistic databases: semantics, algorithms, complexity\.Artif\. Intell\.295,pp\. 103474\.External Links:[Document](https://dx.doi.org/10.1016/J.ARTINT.2021.103474)Cited by:[§3](https://arxiv.org/html/2605.16568#S3.p1.1)\.
- \[11\]Y\. Choi, A\. Vergari, and G\. Van den Broeck\(2020\)Probabilistic circuits: a unifying framework for tractable probabilistic models\.Technical reportUCLA\.External Links:[Link](http://starai.cs.ucla.edu/papers/ProbCirc20.pdf)Cited by:[§4\.2](https://arxiv.org/html/2605.16568#S4.SS2.p1.1)\.
- \[12\]R\. Cyganiak and D\. Reynolds\(2014\-01\)The RDF Data Cube Vocabulary\.Note:W3C RecommendationExternal Links:[Link](https://www.w3.org/TR/vocab-data-cube/)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p1.1)\.
- \[13\]R\. Cyganiak, D\. Wood, and M\. Lanthaler\(2014\-02\)RDF 1\.1 concepts and abstract syntax\.W3C RecommendationW3C\.External Links:[Link](https://www.w3.org/TR/rdf11-concepts/)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p1.1)\.
- \[14\]N\. N\. Dalvi and D\. Suciu\(2007\)Efficient query evaluation on probabilistic databases\.VLDB J\.16\(4\),pp\. 523–544\.External Links:[Document](https://dx.doi.org/10.1007/S00778-006-0004-3)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p3.1)\.
- \[15\]N\. N\. Dalvi and D\. Suciu\(2012\)The dichotomy of probabilistic inference for unions of conjunctive queries\.J\. ACM59\(6\),pp\. 30:1–30:87\.External Links:[Document](https://dx.doi.org/10.1145/2395116.2395119)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p2.1),[§4\.2](https://arxiv.org/html/2605.16568#S4.SS2.p2.1)\.
- \[16\]A\. Darwiche and P\. Marquis\(2002\)A knowledge compilation map\.J\. Artif\. Intell\. Res\.17,pp\. 229–264\.External Links:[Document](https://dx.doi.org/10.1613/JAIR.989)Cited by:[§3](https://arxiv.org/html/2605.16568#S3.p8.1)\.
- \[17\]A\. Darwiche and P\. Marquis\(2011\)A knowledge compilation map\.CoRRabs/1106\.1819\.External Links:[Link](http://arxiv.org/abs/1106.1819)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p4.1)\.
- \[18\]G\. V\. den Broeck and D\. Suciu\(2017\)Query processing on probabilistic data: A survey\.Found\. Trends Databases7\(3\-4\),pp\. 197–341\.External Links:[Document](https://dx.doi.org/10.1561/1900000052)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p1.1),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p3.1),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p4.1),[§3](https://arxiv.org/html/2605.16568#S3.p1.1)\.
- \[19\]Z\. Ding, J\. Wu, J\. Wu, Y\. Xia, B\. Xiong, and V\. Tresp\(2024\)Temporal fact reasoning over hyper\-relational knowledge graphs\.InEMNLP \(Findings\),Findings of ACL,pp\. 355–373\.External Links:[Document](https://dx.doi.org/10.18653/V1/2024.FINDINGS-EMNLP.20)Cited by:[2nd item](https://arxiv.org/html/2605.16568#S1.I1.i2.p1.1)\.
- \[20\]X\. Dong, E\. Gabrilovich, G\. Heitz, W\. Horn, N\. Lao, K\. Murphy, T\. Strohmann, S\. Sun, and W\. Zhang\(2014\)Knowledge vault: a web\-scale approach to probabilistic knowledge fusion\.InKDD,pp\. 601–610\.External Links:[Document](https://dx.doi.org/10.1145/2623330.2623623)Cited by:[2nd item](https://arxiv.org/html/2605.16568#S1.I1.i2.p1.1),[§1](https://arxiv.org/html/2605.16568#S1.p2.2)\.
- \[21\]F\. Geerts, T\. Unger, G\. Karvounarakis, I\. Fundulaki, and V\. Christophides\(2016\)Algebraic structures for capturing the provenance of SPARQL queries\.J\. ACM63\(1\),pp\. 7:1–7:63\.External Links:[Document](https://dx.doi.org/10.1145/2810037)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p3.1)\.
- \[22\]C\. Gregucci, M\. Nayyeri, D\. Hernández, and S\. Staab\(2023\)Link prediction with attention applied on multiple knowledge graph embedding models\.InWWW,pp\. 2600–2610\.External Links:[Document](https://dx.doi.org/10.1145/3543507.3583358)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p2.1)\.
- \[23\]S\. Harris and A\. Seaborne\(2013\-03\)SPARQL 1\.1 query language\.W3C RecommendationW3C\.External Links:[Link](https://www.w3.org/TR/sparql11-query/)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p1.1)\.
- \[24\]M\. Hausenblas, W\. Halb, Y\. Raimond, L\. Feigenbaum, and D\. Ayers\(2009\)SCOVO: using statistics on the web of data\.InESWC,Lecture Notes in Computer Science, Vol\.5554,pp\. 708–722\.External Links:[Document](https://dx.doi.org/10.1007/978-3-642-02121-3%5F52)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p1.1)\.
- \[25\]C\. Henson, H\. Neuhaus, A\. Sheth,et al\.\(2012\)The SSN ontology: W3C semantic sensor network incubator group report\.Web Semantics: Science, Services and Agents on the World Wide Web17,pp\. 25–32\.External Links:[Document](https://dx.doi.org/10.1016/j.websem.2012.05.003)Cited by:[1st item](https://arxiv.org/html/2605.16568#S1.I1.i1.p1.3),[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p1.1)\.
- \[26\]D\. Hernández, L\. Galárraga, and K\. Hose\(2021\)Computing how\-provenance for SPARQL queries via query rewriting\.Proc\. VLDB Endow\.14\(13\),pp\. 3389–3401\.External Links:[Document](https://dx.doi.org/10.14778/3484224.3484235)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p2.1),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p3.1)\.
- \[27\]A\. Hogan, E\. Blomqvist, M\. Cochez, C\. d’Amato, G\. de Melo, C\. Gutierrez, S\. Kirrane, J\. E\. L\. Gayo, R\. Navigli, S\. Neumaier, A\. N\. Ngomo, A\. Polleres, S\. M\. Rashid, A\. Rula, L\. Schmelzeisen, J\. F\. Sequeda, S\. Staab, and A\. Zimmermann\(2022\)Knowledge graphs\.ACM Comput\. Surv\.54\(4\),pp\. 71:1–71:37\.External Links:[Document](https://dx.doi.org/10.1145/3447772)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p1.1)\.
- \[28\]R\. Jampani, F\. Xu, M\. Wu, L\. L\. Perez, C\. M\. Jermaine, and P\. J\. Haas\(2008\)MCDB: a monte carlo approach to managing uncertain data\.InSIGMOD Conference,pp\. 687–700\.External Links:[Document](https://dx.doi.org/10.1145/1376616.1376686)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p3.1)\.
- \[29\]R\. Keskisärkkä, E\. Blomqvist, and O\. Hartig\(2021\)Optimizing RDF stream processing for uncertainty management\.InSEMANTiCS,Studies on the Semantic Web,pp\. 118–132\.External Links:[Document](https://dx.doi.org/10.3233/SSW210039)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p2.1)\.
- \[30\]R\. Keskisärkkä, E\. Blomqvist, L\. Lind, and O\. Hartig\(2020\)Capturing and querying uncertainty in RDF stream processing\.InEKAW,Lecture Notes in Computer Science,pp\. 37–53\.External Links:[Document](https://dx.doi.org/10.1007/978-3-030-61244-3%5F3)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p2.1)\.
- \[31\]G\. Lanza, B\. Deml, S\. Matthiesen, M\. Martin, O\. Brützel, and R\. Hörsting\(2024\)The vision of the circular factory for the perpetual innovative product\.at – Automatisierungstechnik72\(9\),pp\. 774–788\.External Links:[Document](https://dx.doi.org/10.1515/auto-2024-0012)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p1.1)\.
- \[32\]J\. Li, B\. Saha, and A\. Deshpande\(2011\)A unified approach to ranking in probabilistic databases\.VLDB J\.20\(2\),pp\. 249–275\.External Links:[Document](https://dx.doi.org/10.1007/S00778-011-0220-3)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p1.1)\.
- \[33\]T\. Lukasiewicz and U\. Straccia\(2008\)Managing uncertainty and vagueness in description logics for the semantic web\.J\. Web Semant\.6\(4\),pp\. 291–308\.External Links:[Document](https://dx.doi.org/10.1016/J.WEBSEM.2008.04.001)Cited by:[2nd item](https://arxiv.org/html/2605.16568#S1.I1.i2.p1.1),[§1](https://arxiv.org/html/2605.16568#S1.p1.1),[§1](https://arxiv.org/html/2605.16568#S1.p2.2),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p5.1)\.
- \[34\]C\. Lutz and L\. Schröder\(2010\)Probabilistic description logics for subjective uncertainty\.InKR,External Links:[Link](https://aaai.org/papers/45-1243-probabilistic-description-logics-for-subjective-uncertainty/)Cited by:[3rd item](https://arxiv.org/html/2605.16568#S1.I1.i3.p1.3),[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p1.2)\.
- \[35\]N\. Metropolis and S\. Ulam\(1949\)The monte carlo method\.Journal of the American Statistical Association44\(247\),pp\. 335–341\.External Links:[Document](https://dx.doi.org/10.2307/2280232)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p4.1)\.
- \[36\]M\. Nickel and D\. Kiela\(2017\)Poincaré embeddings for learning hierarchical representations\.InNIPS,pp\. 6338–6347\.External Links:[Link](https://proceedings.neurips.cc/paper/2017/hash/59dfa2df42d9e3d41f5b02bfc32229dd-Abstract.html)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p2.2)\.
- \[37\]R\. Peñaloza and N\. Potyka\(2017\)Towards statistical reasoning in description logics over finite domains\.InSUM,Lecture Notes in Computer Science, Vol\.10564,pp\. 280–294\.External Links:[Document](https://dx.doi.org/10.1007/978-3-319-67582-4%5F20)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p1.2)\.
- \[38\]C\. P\. Robert and G\. Casella\(2004\)Monte carlo statistical methods\.Springer Texts in Statistics,Springer\.External Links:[Document](https://dx.doi.org/10.1007/978-1-4757-4145-2)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p4.1),[§4\.1](https://arxiv.org/html/2605.16568#S4.SS1.p3.1)\.
- \[39\]S\. Sanner and E\. Abbasnejad\(2012\)Symbolic variable elimination for discrete and continuous graphical models\.InAAAI,pp\. 1954–1960\.External Links:[Document](https://dx.doi.org/10.1609/AAAI.V26I1.8406)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p3.1)\.
- \[40\]S\. Schulz, P\. Suntisukwongchote, F\. Baader, and M\. Boeker\(2009\)SNOMED reaching its adolescence: ontologists’ and logicians’ health check\.International Journal of Medical Informatics78,pp\. S86–S94\.External Links:[Document](https://dx.doi.org/10.1016/j.ijmedinf.2008.06.004)Cited by:[§6](https://arxiv.org/html/2605.16568#S6.p1.2)\.
- \[41\]S\. Singh, C\. Mayfield, S\. Mittal, S\. Prabhakar, S\. E\. Hambrusch, and R\. Shah\(2008\)The orion uncertain data management system\.InCOMAD,pp\. 273–276\.External Links:[Link](http://www.cse.iitb.ac.in/%5C%7Ecomad/2008/PDFs/orion2-demo.pdf)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p3.1)\.
- \[42\]U\. Straccia\(2012\)Foundations of fuzzy logic and semantic web languages\.InCILC,CEUR Workshop Proceedings, Vol\.857,pp\. 1\.External Links:[Link](https://ceur-ws.org/Vol-857/paper%5C_i01.pdf)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p5.1)\.
- \[43\]D\. Suciu\(2020\)Probabilistic databases for all\.InPODS,pp\. 19–31\.External Links:[Document](https://dx.doi.org/10.1145/3375395.3389129)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p2.2),[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p1.1)\.
- \[44\]M\. J\. Swat, P\. Grenon, and S\. M\. Wimalaratne\(2016\)ProbOnto: ontology and knowledge base of probability distributions\.Bioinform\.32\(17\),pp\. 2719–2721\.External Links:[Document](https://dx.doi.org/10.1093/BIOINFORMATICS/BTW170)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p1.1)\.
- \[45\]Y\. Tao, X\. Xiao, and R\. Cheng\(2007\)Range search on multidimensional uncertain data\.ACM Trans\. Database Syst\.32\(3\),pp\. 15\.External Links:[Document](https://dx.doi.org/10.1145/1272743.1272745)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p3.1)\.
- \[46\]A\. Wald\(1945\)Sequential tests of statistical hypotheses\.The Annals of Mathematical Statistics16\(2\),pp\. 117–186\.External Links:[Document](https://dx.doi.org/10.1214/aoms/1177731118)Cited by:[§2\.1](https://arxiv.org/html/2605.16568#S2.SS1.p4.1),[§4\.1](https://arxiv.org/html/2605.16568#S4.SS1.p3.1)\.
- \[47\]Y\. Wang, X\. Li, X\. Li, and Y\. Wang\(2013\)A survey of queries over uncertain data\.Knowl\. Inf\. Syst\.37\(3\),pp\. 485–530\.External Links:[Document](https://dx.doi.org/10.1007/S10115-013-0638-6)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p3.1)\.
- \[48\]B\. Xiong, N\. Potyka, T\. Tran, M\. Nayyeri, and S\. Staab\(2022\)Faithful embeddings for EL\+\+ knowledge bases\.InISWC 2022,Berlin, Heidelberg,pp\. 22–38\.External Links:[Document](https://dx.doi.org/10.1007/978-3-031-19433-7%5F2)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p2.1),[§3](https://arxiv.org/html/2605.16568#S3.p13.1),[§4\.3](https://arxiv.org/html/2605.16568#S4.SS3.p2.1)\.
- \[49\]B\. Xiong, S\. Zhu, M\. Nayyeri, C\. Xu, S\. Pan, C\. Zhou, and S\. Staab\(2022\)Ultrahyperbolic knowledge graph embeddings\.InKDD,pp\. 2130–2139\.External Links:[Document](https://dx.doi.org/10.1145/3534678.3539333)Cited by:[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p2.2)\.
- \[50\]Y\. Zhang, R\. Cheng, and J\. Chen\(2010\)Evaluating continuous probabilistic queries over imprecise sensor data\.InDASFAA \(1\),Lecture Notes in Computer Science, Vol\.5981,pp\. 535–549\.External Links:[Document](https://dx.doi.org/10.1007/978-3-642-12026-8%5F41)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p2.2)\.
- \[51\]Y\. Zhu, N\. Potyka, B\. Xiong, T\. Tran, M\. Nayyeri, E\. Kharlamov, and S\. Staab\(2024\)Approximating probabilistic inference in statistical EL with knowledge graph embeddings\.CoRRabs/2407\.11821\.External Links:[Document](https://dx.doi.org/10.48550/ARXIV.2407.11821)Cited by:[3rd item](https://arxiv.org/html/2605.16568#S1.I1.i3.p1.3),[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p1.2),[§2\.3](https://arxiv.org/html/2605.16568#S2.SS3.p2.1)\.
- \[52\]Y\. Zhu, J\. Wu, Y\. Wang, H\. Zhou, J\. Chen, E\. Kharlamov, and S\. Staab\(2025\)Certainty in uncertainty: reasoning over uncertain knowledge graphs with statistical guarantees\.InEMNLP,pp\. 8730–8752\.External Links:[Document](https://dx.doi.org/10.18653/V1/2025.EMNLP-MAIN.441)Cited by:[§1](https://arxiv.org/html/2605.16568#S1.p2.2)\.
- \[53\]A\. Zimmermann, N\. Lopes, A\. Polleres, and U\. Straccia\(2012\)A general framework for representing, reasoning and querying with annotated semantic web data\.J\. Web Semant\.11,pp\. 72–95\.External Links:[Document](https://dx.doi.org/10.1016/J.WEBSEM.2011.08.006)Cited by:[§2\.2](https://arxiv.org/html/2605.16568#S2.SS2.p5.1)\.

Similar Articles

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv cs.CL

This paper proposes SGR, a framework that enhances LLM stepwise reasoning by integrating external knowledge graphs through query-relevant subgraph generation, combining Cypher-based reasoning with collaborative reasoning integration. Experiments on CWQ, WebQSP, GrailQA, and KQA Pro show improved reasoning accuracy over standard prompting and knowledge-enhanced baselines.