Lifted Causal Inference

arXiv cs.AI Papers

Summary

This paper introduces lifted causal inference, leveraging parametric causal factor graphs to efficiently compute causal effects in relational domains, and presents the Lifted Causal Inference (LCI) algorithm for polynomial-time inference.

arXiv:2606.28024v1 Announce Type: new Abstract: Lifted inference exploits indistinguishabilities in probabilistic graphical models by using a representative for indistinguishable objects, thereby speeding up query answering while maintaining exact answers. In this article, we show how lifting can be applied to efficiently compute causal effects in relational domains. More specifically, we introduce parametric causal factor graphs (PCFGs) to incorporate causal knowledge in lifted models and give a formal semantics of interventions therein. We further present the Lifted Causal Inference (LCI) algorithm to compute causal effects on a lifted level, thereby drastically speeding up causal inference compared to propositional inference, e.g., in causal Bayesian networks. In addition, we present partially directed parametric causal factor graphs (PD-PCFGs) as a generalisation of PCFGs to handle partial causal knowledge and extend LCI to perform lifted causal inference in a PD-PCFG, thereby extending the applicability of lifted causal inference to a broader range of models requiring less prior knowledge about causal relationships.
Original Article
View Cached Full Text

Cached at: 06/29/26, 05:28 AM

# Lifted Causal Inference
Source: [https://arxiv.org/html/2606.28024](https://arxiv.org/html/2606.28024)
\[1,2\]\\fnmMalte\\surLuttermann

1\]\\orgdivInstitute for Humanities\-Centered Artificial Intelligence,\\orgnameUniversity of Hamburg,\\orgaddress\\cityHamburg,\\countryGermany \[2\]\\orgnameGerman Research Center for Artificial Intelligence \(DFKI\),\\orgaddress\\cityLübeck,\\countryGermany 3\]\\orgdivData Science Group,\\orgnameUniversity of Münster,\\orgaddress\\cityMünster,\\countryGermany

###### Abstract

Lifted inference exploits indistinguishabilities in probabilistic graphical models by using a representative for indistinguishable objects, thereby speeding up query answering while maintaining exact answers\. In this article, we show how lifting can be applied to efficiently compute causal effects in relational domains\. More specifically, we introduceparametric causal factor graphsto incorporate causal knowledge in lifted models and give a formal semantics of interventions therein\. We further present theLifted Causal Inference\(LCI\) algorithm to compute causal effects on a lifted level, thereby drastically speeding up causal inference compared to propositional inference, e\.g\., incausal Bayesian networks\. In addition, we presentpartially directed parametric causal factor graphsas a generalisation ofPCFGsto handle partial causal knowledge and extendLCIto perform lifted causal inference in aPD\-PCFG, thereby extending the applicability of lifted causal inference to a broader range of models requiring less prior knowledge about causal relationships\.

###### keywords:

causal inference, lifting, probabilistic relational models

## 1Introduction

A fundamental problem in the research field of artificial intelligence for an intelligent agent is to plan and act rationally in a relational domain\. To compute the best possible action in a perceived state, the agent considers the available actions and chooses the one with the maximum expected utility\. When computing the expected utility of an action performed on a specific variable, it is crucial to deploy the semantics of an intervention instead of a typical conditioning on that variable\[Pearl2009a, Chapter 4\]\. When calculating the effect of an intervention, a specific variable is set to a fixed value and all incoming probabilistic causal influences of this variable must be ignored for the specific query\. It is fundamental to deploy the semantics of an intervention instead of the typical conditioning to correctly determine the effect of an action\. Otherwise, when treating actions as evidence \(by applying a classical conditioning\), conclusions might become misleading\. For example, assume a scenario in which the severity of fires influences the number of firefighters trying to extinguish the fire, that is, the more severe a fire is, the more firefighters are on duty\. Classical conditioning then suggests to reduce the number of firefighters to reduce the severity of fires \(because the probability for a severe fire is lower when observing a low number of firefighters on duty\)\. In this article, we apply lifting to efficiently compute causal effects \(and hence, the correct effect of actions\) in relational domains, where efficient inference refers to inference running in polynomial time with respect to domain sizes\.

Over the last years, causal models have become a widely used formalism to answer questions concerning the causal effect of an intervention on arandom variable\(randvar\) on anotherrandvar\. A causal model consists of\(i\)a causal graph representing the causal relationships between the involvedrandvars, and\(ii\)a probability distribution over therandvars\.There has been a considerable amount of work to perform causal effect estimation in causal models, and most of this work focuses on propositional models\[Spirtes2000a,Pearl2009a,Pearl2016a,Peters2017a\]\. Some works extend propositional \(undirected\)factor graphsby adding edge directions to enable the computation of the effect of interventions\[Frey2003a,Winn2012a\]\.Maier2013aintroduce so\-called relational causal models to express causal dependencies within relational domains\. Their work focuses on causal discovery, that is, on learning relational causal models from observed data\[Maier2010a\]\. Further developments on relational causal models also focus on causal discovery and on reasoning about conditional independence \(e\.g\.,Lee2015a,Lee2016a,Lee2019a\)\. Relational causal models provide a lifted representation \(that is, a representation that abstracts over individual objects and hence over all instantiations of a relational model\) to reason about conditional independence, however, relational causal models do not support lifted causal inference\. More recently, relational causal models have also been extended to cover cyclic dependency structures\[Ahsan2022a,Ahsan2023a\]\. Prior work dealing with the estimation of causal effects in relational domains still applies propositional probabilistic inference\[Arbour2016a,Salimi2020a\]\. Consequently, there is a lack of efficient algorithms to compute causal effects on a lifted level\. In probabilistic inference, lifting exploits indistinguishabilities in a relational model, allowing to carry out query answering more efficiently while maintaining exact answers\[Niepert2014a\]\. First introduced byPoole2003a,parametric factor graphsandLifted Variable Elimination\(LVE\) allow to perform lifted probabilistic inference, resulting in significant speed\-ups for probabilistic query answering in relational domains\. Over time,LVEhas been refined by many researchers to reach its current form\[DeSalvoBraz2005a,DeSalvoBraz2006a,Milch2008a,Kisynski2009a,Taghipour2013a,Braun2018a\]\. To perform efficient inference in aPFGnot only for single queries but also for sets of queries,Braun2016aintroduce theLifted Junction Tree\(LJT\) algorithm\.\\Acppfg have been well\-studied for many years and have been developed further to incorporate probabilistic inference over time\[Gehrke2018a,Gehrke2020a\], and, among other extensions, to allow for decision making by following the maximum expected utility principle\[Gehrke2018b,Gehrke2019c,Braun2022a\]\. Markov logic networks are another lifted representation and have been extended to incorporate maximum expected utility as well\[Apsel2012a\]\. In this article, we extendPFGsto enable lifted causal inference to correctly determine the effect of actions on a lifted level\.

This article is based on and extends the works\[Luttermann2024b\]and\[Luttermann2024g\]\. Specifically, we present the introduced models and algorithms for lifted causal inference under a unified view, thereby making the following contributions: First, we give a formal definition ofcausal factor graphsas an extension ofFGsto incorporate causal knowledge on a propositional level\. We then provide a unified view on fully directed lifted causal models introduced byLuttermann2024band partially directed lifted causal models introduced byLuttermann2024g\. In particular, we expose the connection between these models and their corresponding algorithms to perform lifted causal inference therein\. We especially highlight the differences in the assumptions made in the two models and exhibit how these assumptions affect their corresponding inference algorithms\. Furthermore, we align the model definitions and algorithm descriptions for consistency of terminology and improved clarity\. We also extend the theoretical results for fully directed and partially directed lifted causal models and showcase all presented concepts on a full running example\.

The remaining part of this article is structured as follows\. In[Sec\.˜2](https://arxiv.org/html/2606.28024#S2), we introduceCFGsand define the notion of an intervention in aCFGto allow for the computation of causal effects therein \(on a propositional level\)\. Thereafter, in[Sec\.˜3](https://arxiv.org/html/2606.28024#S3), we presentPCFGsas an extension ofPFGsand provide a formal semantics of interventions inPCFGs\. By incorporating causal knowledge on a lifted level, aPCFGallows to perform lifted causal inference, thereby enabling efficient decision making in relational domains using the notion of an intervention\. Then, in[Sec\.˜4](https://arxiv.org/html/2606.28024#S4), we elucidate theLCIalgorithm, which operates on aPCFG, and show howLCIcomputes causal effects on a lifted level to avoid grounding thePCFGas much as possible\. We then portrayPD\-PCFGsas a generalisation ofPCFGsin[Sec\.˜5](https://arxiv.org/html/2606.28024#S5)\. Afterwards, we investigate how the effect of interventions can be computed in aPD\-PCFGin the presence of unknown causal relationships\. In[Sec\.˜6](https://arxiv.org/html/2606.28024#S6), we present theExtended Lifted Causal Inference\(ELCI\) algorithm as a generalisation ofLCIto efficiently compute causal effects in aPD\-PCFGbefore we conclude this article in[Sec\.˜7](https://arxiv.org/html/2606.28024#S7)\.

## 2Causal Factor Graphs

Similar to acausal Bayesian network\(CBN\)\[Pearl1988a,Pearl2009a\], aCFGis a probabilistic graphical model that simultaneously encodes a probability distribution over a set ofrandvars𝑹\\boldsymbol\{R\}and causal relationships between therandvarsin𝑹\\boldsymbol\{R\}\. As in non\-causalFGs\[Frey1997a,Kschischang2001a\], the full joint probability distribution is encoded as a product of factors, where each factor is a function of a subset of therandvars\. The difference between anFGand aCFGis that aCFGcontains directed edges instead of undirected edges to represent the causal relationships between therandvars\. More specifically, a directed edge from arandvarRiR\_\{i\}to anotherrandvarRjR\_\{j\}in aCFGindicates thatRiR\_\{i\}is a direct cause ofRjR\_\{j\}and thus, the value ofRiR\_\{i\}influences the value ofRjR\_\{j\}\[Pearl2009a\]\. Therefore, in any causal graph, it holds that the value of arandvardepends on the values of its parents\. We next provide a formal definition of aCFGbased on the definition of directedFGsgiven byFrey2003a\. In the following, we denote byrange​\(Ri\)\\mathrm\{range\}\(R\_\{i\}\)the range of arandvarRiR\_\{i\}, that is, the set of possible values thatRiR\_\{i\}can take\.

###### Definition 1\(Causal Factor Graph\)\.

We define a*CFG*as a tupleM=\(𝐕,𝐄,𝚽\)M=\(\\boldsymbol\{V\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)where\(𝐕,𝐄\)\(\\boldsymbol\{V\},\\boldsymbol\{E\}\)is a directed bipartite graph with node set𝐕=𝐑∪𝐅\\boldsymbol\{V\}=\\boldsymbol\{R\}\\cup\\boldsymbol\{F\}and edge set𝐄⊆𝐑×𝐅\\boldsymbol\{E\}\\subseteq\\boldsymbol\{R\}\\times\\boldsymbol\{F\}and𝚽\\boldsymbol\{\\Phi\}is a set of function definitions\. The set of nodes𝐕\\boldsymbol\{V\}is divided into a set ofrandvars𝐑=\{R1,…,Rn\}\\boldsymbol\{R\}=\\\{R\_\{1\},\\ldots,R\_\{n\}\\\}\(variable nodes\) and a set of function names \(factor nodes\)𝐅=\{f1,…,fm\}\\boldsymbol\{F\}=\\\{f\_\{1\},\\ldots,f\_\{m\}\\\}\. Every function namefj∈𝐅f\_\{j\}\\in\\boldsymbol\{F\}has a function definition \(factor, for short\)ϕj​\(ℛj\)∈𝚽\\phi\_\{j\}\(\\mathcal\{R\}\_\{j\}\)\\in\\boldsymbol\{\\Phi\}, whereϕj:×R∈ℛjrange\(R\)↦ℝ≥0\\phi\_\{j\}\\colon\\times\_\{R\\in\\mathcal\{R\}\_\{j\}\}\\mathrm\{range\}\(R\)\\mapsto\\mathbb\{R\}\_\{\\geq 0\}maps range values of a sequenceℛj\\mathcal\{R\}\_\{j\}ofrandvarsfrom𝐑\\boldsymbol\{R\}to a non\-negative real number \(potential\)\. For each function definition, there must be at least one sequence of range values that is mapped to a potential which is non\-zero\. The set of edges𝐄\\boldsymbol\{E\}contains two types of edges\. For every factor nodefj∈𝐅f\_\{j\}\\in\\boldsymbol\{F\}with corresponding function definitionϕj​\(ℛj\)\\phi\_\{j\}\(\\mathcal\{R\}\_\{j\}\), there is either an undirected edge\{Ri,fj\}∈𝐄\\\{R\_\{i\},f\_\{j\}\\\}\\in\\boldsymbol\{E\}or a directed edge\(fj,Ri\)∈𝐄\(f\_\{j\},R\_\{i\}\)\\in\\boldsymbol\{E\}for everyrandvarRi∈ℛjR\_\{i\}\\in\\mathcal\{R\}\_\{j\}\. We stipulate that for every factor nodefj∈𝐅f\_\{j\}\\in\\boldsymbol\{F\}, there exists exactly one outgoing directed edge\(fj,Ri\)∈𝐄\(f\_\{j\},R\_\{i\}\)\\in\\boldsymbol\{E\}among the edges incident tofjf\_\{j\}\. Each directed edge\{Ri,fj\},\(fj→Rk\)\\\{R\_\{i\},f\_\{j\}\\\},\(f\_\{j\}\\to R\_\{k\}\)from arandvarRi∈𝐑R\_\{i\}\\in\\boldsymbol\{R\}to arandvarRk∈𝐑R\_\{k\}\\in\\boldsymbol\{R\}via a factor nodefj∈𝐅f\_\{j\}\\in\\boldsymbol\{F\}corresponds to a direct causal relationship betweenRiR\_\{i\}andRkR\_\{k\}\. Furthermore,MMhas to be acyclic, that is,MMis required to not contain any directed cycles\. The joint potential for an assignment𝐑=𝐫\\boldsymbol\{R\}=\\boldsymbol\{r\}is defined as the product over all factors in theCFGMM:

ψM​\(𝑹=𝒓\)=∏j=1mϕj​\(ℛj=𝒓j\),\\displaystyle\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=\\prod\_\{j=1\}^\{m\}\\phi\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\),\(1\)where𝐫j\\boldsymbol\{r\}\_\{j\}is a projection of𝐫\\boldsymbol\{r\}to the argument list ofϕj\\phi\_\{j\}\. The full joint probability distributionPM​\(𝐑\)P\_\{M\}\(\\boldsymbol\{R\}\)over𝐑\\boldsymbol\{R\}encoded byMMis then given by the normalised joint potential:

PM​\(𝑹=𝒓\)\\displaystyle P\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=1Z​ψM​\(𝑹=𝒓\),\\displaystyle=\\frac\{1\}\{Z\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\),\(2\)where the normalisation constantZZis defined as the sum of all joint potentials:

Z\\displaystyle Z=∑𝒓∈range​\(R1\)×…×range​\(Rn\)ψM​\(𝑹=𝒓\)\.\\displaystyle=\\sum\_\{\\boldsymbol\{r\}\\in\\mathrm\{range\}\(R\_\{1\}\)\\times\\ldots\\times\\mathrm\{range\}\(R\_\{n\}\)\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)\.\(3\)

###### Example 1\(Causal Factor Graph\)\.

Consider theCFGM=\(𝐕,𝐄,𝚽\)M=\(\\boldsymbol\{V\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)depicted in[Fig\.˜1\(a\)](https://arxiv.org/html/2606.28024#S2.F1.sf1)\.MMrepresents the causal relationships between the competences and salaries of three employeesA​l​i​c​eAlice,B​o​bBob, andC​h​a​r​l​i​eCharlieand the revenue of the company they work for\. The underlying causal graph is illustrated in[Fig\.˜1\(b\)](https://arxiv.org/html/2606.28024#S2.F1.sf2)\. In set notation, the graph structure ofMMis given by\(𝐕=𝐑∪𝐅,𝐄\)\(\\boldsymbol\{V\}=\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\}\), where

𝑹=\\displaystyle\\boldsymbol\{R\}=\{C​o​m​A,C​o​m​B,C​o​m​C,R​e​v,S​a​l​A,S​a​l​B,S​a​l​C\},\\displaystyle\\\{ComA,ComB,ComC,Rev,SalA,SalB,SalC\\\},𝑭=\\displaystyle\\boldsymbol\{F\}=\{f1,f2,f3,f4,f5,f6,f7\},and\\displaystyle\\\{f\_\{1\},f\_\{2\},f\_\{3\},f\_\{4\},f\_\{5\},f\_\{6\},f\_\{7\}\\\},~\\text\{and\}𝑬=\\displaystyle\\boldsymbol\{E\}=\{\(f1,ComA\),\{ComA,f4\},\{ComA,f5\},\(f2,ComB\),\{ComB,f4\},\{ComB,f6\},\\displaystyle\\\{\(f\_\{1\},ComA\),\\\{ComA,f\_\{4\}\\\},\\\{ComA,f\_\{5\}\\\},\(f\_\{2\},ComB\),\\\{ComB,f\_\{4\}\\\},\\\{ComB,f\_\{6\}\\\},\(f3,C​o​m​C\),\{C​o​m​C,f4\},\{C​o​m​C,f7\},\(f4,R​e​v\),\{R​e​v,f5\},\{R​e​v,f6\},\{R​e​v,f7\},\\displaystyle\\phantom\{\\\{\}\(f\_\{3\},ComC\),\\\{ComC,f\_\{4\}\\\},\\\{ComC,f\_\{7\}\\\},\(f\_\{4\},Rev\),\\\{Rev,f\_\{5\}\\\},\\\{Rev,f\_\{6\}\\\},\\\{Rev,f\_\{7\}\\\},\(f5,SalA\),\(f6,SalB\),\(f7,SalC\)\}\.\\displaystyle\\phantom\{\\\{\}\(f\_\{5\},SalA\),\(f\_\{6\},SalB\),\(f\_\{7\},SalC\)\\\}\.Moreover, the set of function definitions is

𝚽=\\displaystyle\\boldsymbol\{\\Phi\}=\{ϕ1\(ComA\),ϕ2\(ComB\),ϕ3\(ComC\),ϕ4\(ComA,ComB,ComC,Rev\),\\displaystyle\\\{\\phi\_\{1\}\(ComA\),\\phi\_\{2\}\(ComB\),\\phi\_\{3\}\(ComC\),\\phi\_\{4\}\(ComA,ComB,ComC,Rev\),ϕ5\(ComA,Rev,SalA\),ϕ6\(ComB,Rev,SalB\),ϕ7\(ComC,Rev,SalC\)\},\\displaystyle\\phantom\{\\\{\}\\phi\_\{5\}\(ComA,Rev,SalA\),\\phi\_\{6\}\(ComB,Rev,SalB\),\\phi\_\{7\}\(ComC,Rev,SalC\)\\\},where we omit the exact specification of the potential tables for brevity\.

C​o​m​AComAC​o​m​BComBC​o​m​CComCS​a​l​ASalAS​a​l​BSalBS​a​l​CSalCR​e​vRevf1f\_\{1\}f2f\_\{2\}f3f\_\{3\}f4f\_\{4\}f5f\_\{5\}f6f\_\{6\}f7f\_\{7\}\(a\)C​o​m​AComAC​o​m​BComBC​o​m​CComCS​a​l​ASalAS​a​l​BSalBS​a​l​CSalCR​e​vRev\(b\)
Figure 1:\(a\) ACFGmodelling the interplay between the competences and salaries of three employeesA​l​i​c​eAlice,B​o​bBob, andC​h​a​r​l​i​eCharlieand the revenue of a company, and \(b\) the underlying causal graph\. We omit the potential tables of the factors for brevity\.If the context is clear, we may omit the subscriptMMinPMP\_\{M\}and simply writePPinstead\. From the definition of the full joint probability distributionPMP\_\{M\}in[Eq\.˜2](https://arxiv.org/html/2606.28024#S2.E2), it becomes clear that the full joint probability distributionPMP\_\{M\}encoded by aCFGMMis independent of the edge directions inMM\. In other words, changing the edge directions inMMdoes not affect the probability distributionPMP\_\{M\}encoded byMM\. However, the edge directions impact the effect of an intervention and also the conditional independence statements induced byMM, which are implied by separation inMM\. The separation criteria in aCFGdiffer from the separation criteria in aFGas the direction of edges influences whether paths are blocked or not\. Before we define the separation criteria in aCFG, we introduce the following notations for aCFGM=\(𝑹∪𝑭,𝑬,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\):

- •Pa𝑹​\(M,R\)=\{R′∈𝑹∣∃f∈𝑭:\{R′,f\}∈𝑬∧\(f,R\)∈𝑬\}\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\)=\\\{R^\{\\prime\}\\in\\boldsymbol\{R\}\\mid\\exists f\\in\\boldsymbol\{F\}\\colon\\\{R^\{\\prime\},f\\\}\\in\\boldsymbol\{E\}\\land\(f,R\)\\in\\boldsymbol\{E\}\\\}denotes the set of parentrandvarsof arandvarR∈𝑹R\\in\\boldsymbol\{R\}inMM,
- •Pa​\(M,f\)=\{R∈𝑹∣\{R,f\}∈𝑬\}\\mathrm\{Pa\}\(M,f\)=\\\{R\\in\\boldsymbol\{R\}\\mid\\\{R,f\\\}\\in\\boldsymbol\{E\}\\\}denotes the set of parentrandvarsof a factor nodef∈𝑭f\\in\\boldsymbol\{F\}inMM,
- •Ch𝑹​\(M,R\)=\{R′∈𝑹∣∃f∈𝑭:\{R,f\}∈𝑬∧\(f,R′\)∈𝑬\}\\mathrm\{Ch\}\_\{\\boldsymbol\{R\}\}\(M,R\)=\\\{R^\{\\prime\}\\in\\boldsymbol\{R\}\\mid\\exists f\\in\\boldsymbol\{F\}\\colon\\\{R,f\\\}\\in\\boldsymbol\{E\}\\land\(f,R^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}denotes the singleton set of childrandvarsof arandvarR∈𝑹R\\in\\boldsymbol\{R\}inMM,
- •Ch​\(M,f\)=\{R∈𝑹∣\(f,R\)∈𝑬\}\\mathrm\{Ch\}\(M,f\)=\\\{R\\in\\boldsymbol\{R\}\\mid\(f,R\)\\in\\boldsymbol\{E\}\\\}denotes the singleton set of childrandvarsof a factor nodef∈𝑭f\\in\\boldsymbol\{F\}inMM,
- •De𝑹\(M,R\)=\{R′∈𝑹∣∃f1,…,fk∈𝑭,R1,…,Rk−1∈𝑹:\{R,f1\},\(f1,R1\),…,\{Rk−1,fk\},\(fk,R′\)∈𝑬\}\\mathrm\{De\}\_\{\\boldsymbol\{R\}\}\(M,R\)=\\\{R^\{\\prime\}\\in\\boldsymbol\{R\}\\mid\\exists f\_\{1\},\\allowbreak\\ldots,\\allowbreak f\_\{k\}\\in\\boldsymbol\{F\},\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{k\-1\}\\in\\boldsymbol\{R\}\\colon\\\{R,\\allowbreak f\_\{1\}\\\},\\allowbreak\(f\_\{1\},\\allowbreak R\_\{1\}\),\\allowbreak\\ldots,\\allowbreak\\\{R\_\{k\-1\},\\allowbreak f\_\{k\}\\\},\\allowbreak\(f\_\{k\},\\allowbreak R^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}denotes the set of descendantrandvarsof arandvarR∈𝑹R\\in\\boldsymbol\{R\}inMM\(i\.e\.,randvarsthat can be reached fromRRvia a directed path\), and
- •De\(M,f\)=\{R′∈𝑹∣∃f1,…,fk∈𝑭,R1,…,Rk∈𝑹:\(f,R1\),\{R1,f1\},…,\(fk,R′\)∈𝑬\}\\mathrm\{De\}\(M,f\)=\\\{R^\{\\prime\}\\in\\boldsymbol\{R\}\\mid\\exists f\_\{1\},\\allowbreak\\ldots,\\allowbreak f\_\{k\}\\in\\boldsymbol\{F\},\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{k\}\\in\\boldsymbol\{R\}\\colon\(f,\\allowbreak R\_\{1\}\),\\allowbreak\\\{R\_\{1\},\\allowbreak f\_\{1\}\\\},\\allowbreak\\ldots,\\allowbreak\(f\_\{k\},\\allowbreak R^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}denotes the set of descendantrandvarsof a factor nodef∈𝑭f\\in\\boldsymbol\{F\}inMM\(i\.e\.,randvarsthat can be reached fromffvia a directed path\)\.

The subscript𝑹\\boldsymbol\{R\}indicates that the sets are defined with respect to therandvarsin𝑹\\boldsymbol\{R\}, that is, the sets contain neighbouringrandvars\(connected via a factor node\) instead of directly connected factor nodes\. As factor nodes are directly connected torandvars, we omit the subscript𝑹\\boldsymbol\{R\}for the sets defined with respect to factor nodes\. Moreover, since every factor node has exactly one outgoing directed edge, it holds that\|Ch​\(M,f\)\|=1\\lvert\\mathrm\{Ch\}\(M,f\)\\rvert=1for every factor nodef∈𝑭f\\in\\boldsymbol\{F\}\. For the ease of reading, we may writef→Rf\\to R\(orR←fR\\leftarrow f\) to represent a directed edge\(f,R\)∈𝑬\(f,R\)\\in\\boldsymbol\{E\}andR−fR\-f\(orf−Rf\-R\) to represent an undirected edge\{R,f\}∈𝑬\\\{R,f\\\}\\in\\boldsymbol\{E\}\. We next define separation inCFGsbased on the definition given byFrey2003afor directedFGs\.

###### Definition 2\(Separation in Causal Factor Graphs\)\.

LetM=\(𝐑∪𝐅,𝐄,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)denote aCFGand let𝐑i⊆𝐑\\boldsymbol\{R\}\_\{i\}\\subseteq\\boldsymbol\{R\},𝐑j⊆𝐑\\boldsymbol\{R\}\_\{j\}\\subseteq\\boldsymbol\{R\}, and𝐒⊆𝐑\\boldsymbol\{S\}\\subseteq\\boldsymbol\{R\}denote pairwise disjoint sets ofrandvars\. A path from arandvarto anotherrandvarinMMis a connected sequence of edges and is not restricted to follow the directions of the edges\. Thus, it is also possible for a path to pass from a parentingrandvarof a factor to another parentingrandvarof the same factor\. A path is blocked by𝐒\\boldsymbol\{S\}if

1. 1\.the path contains the patternf1→S←f2f\_\{1\}\\to S\\leftarrow f\_\{2\}, wheref1,f2∈𝑭f\_\{1\},f\_\{2\}\\in\\boldsymbol\{F\}, such that neitherSSnor any of its descendants are in𝑺\\boldsymbol\{S\}, or
2. 2\.the path passes fromf1f\_\{1\}throughSStof2f\_\{2\}, wheref1,f2∈𝑭f\_\{1\},f\_\{2\}\\in\\boldsymbol\{F\}, such that it does not contain the patternf1→S←f2f\_\{1\}\\to S\\leftarrow f\_\{2\}andSSis in𝑺\\boldsymbol\{S\}, or
3. 3\.the path passes from a parent of a factor nodef∈𝑭f\\in\\boldsymbol\{F\}to another parent offf, and neither the child offfnor any of its descendants are in𝑺\\boldsymbol\{S\}\.

MMimplies\(𝐑i⟂⟂𝐑j∣𝐒\)\(\\boldsymbol\{R\}\_\{i\}\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}\\boldsymbol\{R\}\_\{j\}\\mid\\boldsymbol\{S\}\)if𝐒\\boldsymbol\{S\}*separates*𝐑i\\boldsymbol\{R\}\_\{i\}and𝐑j\\boldsymbol\{R\}\_\{j\}inMM, that is, if𝐒\\boldsymbol\{S\}blocks all paths from arandvarin𝐑i\\boldsymbol\{R\}\_\{i\}to arandvarin𝐑j\\boldsymbol\{R\}\_\{j\}\.

The separation criteria forCFGsgiven in[Def\.˜2](https://arxiv.org/html/2606.28024#Thmdefinition2)directly correspond to the rules ofdd\-separation inBayesian networksintroduced byPearl1986a\.

###### Example 2\(Separation\)\.

Consider again theCFGMMdepicted in[Fig\.˜1\(a\)](https://arxiv.org/html/2606.28024#S2.F1.sf1)\. For instance, it holds thatC​o​m​A⟂⟂C​o​m​BComA\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}ComB\(i\.e\.,𝐒=∅\\boldsymbol\{S\}=\\emptyset\) as all paths fromC​o​m​AComAtoC​o​m​BComBare blocked by the condition given in[Item˜3](https://arxiv.org/html/2606.28024#S2.I2.i3)from[Def\.˜2](https://arxiv.org/html/2606.28024#Thmdefinition2)\. However, it holds thatC​o​m​A⟂⟂C​o​m​B∣\{R​e​v\}ComA\\mathrel\{\\not\\\!\\perp\\\!\\\!\\\!\\perp\}ComB\\mid\\\{Rev\\\}as, for instance, the pathC​o​m​A−f4−C​o​m​BComA\-f\_\{4\}\-ComBis not blocked anymore as soon asR​e​v∈𝐒Rev\\in\\boldsymbol\{S\}\.

When using aCFGMMto encode a probability distribution, it is crucial that the conditional independence statements induced byMMactually hold in the probability distribution\. The*global Markov property*ensures that separation criteria in aCFGare compliant with the conditional independence statements in a probability distribution\.

###### Definition 3\(Global Markov Property\[Lauritzen1996a\]\)\.

A probability distributionPPsatisfies the*global Markov property*for aCFGMMif and only if for all disjoint sets of variables𝐑i\\boldsymbol\{R\}\_\{i\},𝐑j\\boldsymbol\{R\}\_\{j\}, and𝐒\\boldsymbol\{S\}it holds that if𝐑i\\boldsymbol\{R\}\_\{i\}is separated from𝐑j\\boldsymbol\{R\}\_\{j\}given𝐒\\boldsymbol\{S\}inMM, then𝐑i\\boldsymbol\{R\}\_\{i\}is conditionally independent from𝐑j\\boldsymbol\{R\}\_\{j\}given𝐒\\boldsymbol\{S\}inPP\.

In other words, every conditional independence statement induced by the graph structure of aCFGMMalso holds in the probability distributionPPif the global Markov property is satisfied\. Thus, when encoding a probability distributionPPusing aCFGMM,MMcannot be chosen arbitrarily but instead must be chosen such thatPPsatisfies the global Markov property with respect toMM\. In this article, we therefore assume that all distributionsPPsatisfy the global Markov property for the correspondingCFGthat is used to encodePP\. Furthermore, we additionally require thatMMandPPfulfil the causal Markov property\[Spirtes2000a\], stating that inMM, everyrandvarR∈𝑹R\\in\\boldsymbol\{R\}is independent of allrandvarsthat are neither effects nor direct causes ofRRgiven the set ofRR’s direct causes\. By assuming that the causal Markov property holds, we ensure that every directed edge inMMaccurately represents a causal relationship between the involvedrandvars\(i\.e\., edge directions are actually causal\)\.

###### Definition 4\(Causal Markov Property\[Spirtes2000a\]\)\.

LetM=\(𝐑∪𝐅,𝐄,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aCFGand letPPbe a probability distribution over therandvarsin𝐑\\boldsymbol\{R\}generated by the underlying causal structure ofMM\.MMandPPsatisfy the*causal Markov property*if and only if for everyrandvarR∈𝐑R\\in\\boldsymbol\{R\},RRis independent of its non\-descendants given its parents, i\.e\.,R⟂⟂𝐑∖\(De𝐑​\(M,R\)∪Pa𝐑​\(M,R\)\)∣Pa𝐑​\(M,R\)R\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}\\boldsymbol\{R\}\\setminus\(\\mathrm\{De\}\_\{\\boldsymbol\{R\}\}\(M,R\)\\cup\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\)\)\\mid\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\)\.

In case aCFGMMand a probability distributionPPsatisfy the causal Markov property,PPalso satisfies the global Markov property with respect toMM, that is, the global Markov property is implied by the causal Markov property\. However, it is generally possible for a probability distributionPPto satisfy the global Markov property for a directed \(non\-causal\)FGMMwithoutMMandPPsatisfying the causal Markov property ifMMencodes the conditional independence statements inPPbut does not represent the true underlying causal relationships ofPP\(i\.e\., the directed edges inMMare not causal\)\. In case the causal Markov property is satisfied, direct causes of arandvarRRare given by the parents ofRRand the effects ofRRare given by the children ofRRand hence,PPfactorises as

P​\(R1=r1,…,Rn=rn\)=∏i=1nP​\(Ri=ri∣Pa𝑹​\(M,Ri\)=pa𝑹​\(M,Ri\)\),\\displaystyle P\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{n\}=r\_\{n\}\)=\\prod\_\{i=1\}^\{n\}P\(R\_\{i\}=r\_\{i\}\\mid\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)=\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)\),\(4\)wherepa𝑹​\(M,Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)denotes a projection of the assignment\(r1,…,rn\)\(r\_\{1\},\\ldots,r\_\{n\}\)to the parents ofRiR\_\{i\}\. Moreover, we assume causal sufficiency\[Spirtes2000a\]in this article\. The causal sufficiency assumption states that all common causes ofrandvarsthat are included in a model are also included in the model\.

###### Definition 5\(Causal Sufficiency\)\.

A set𝐑\\boldsymbol\{R\}ofrandvarsis*causally sufficient*if and only if every common cause of any tworandvarsin𝐑\\boldsymbol\{R\}is also in𝐑\\boldsymbol\{R\}\.

To summarise, whenever we deal with aCFG\(or any of the causal models introduced in the upcoming sections\)MM, we make the following assumptions:

1. 1\.MMis acyclic, i\.e\.,MMcontains no directed cycles \([Def\.˜1](https://arxiv.org/html/2606.28024#Thmdefinition1)\),
2. 2\.MMand the probability distributionPPencoded byMMsatisfy the causal Markov property \([Def\.˜4](https://arxiv.org/html/2606.28024#Thmdefinition4)\), and
3. 3\.the set ofrandvars𝑹\\boldsymbol\{R\}inMMis causally sufficient \([Def\.˜5](https://arxiv.org/html/2606.28024#Thmdefinition5)\)\.

We next introduce a formal notion of an intervention in aCFG\. An interventiond​o​\(R=r\)do\(R=r\)sets the value of arandvarRRto a fixed valuer∈range​\(R\)r\\in\\mathrm\{range\}\(R\)and removes all incoming influences onRR\[Pearl2016a\]\. In the following, we use the notationd​o​\(R1=r1,…,Rk=rk\)do\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{k\}=r\_\{k\}\)to denote a joint intervention on therandvarsR1,…,RkR\_\{1\},\\ldots,R\_\{k\}, which is an abbreviation ofd​o​\(R1=r1\),…,d​o​\(Rk=rk\)do\(R\_\{1\}=r\_\{1\}\),\\ldots,do\(R\_\{k\}=r\_\{k\}\)\. When performing an intervention, the underlying probability distribution changes\. The next definition formalises the effect of an intervention on the probability distribution encoded by aCFG\.

###### Definition 6\(Interventional Distribution\)\.

LetM=\(𝐑∪𝐅,𝐄,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aCFGwith𝐑=\{R1,…,Rn\}\\boldsymbol\{R\}=\\\{R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{n\}\\\}\. An interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)on therandvarsR1′,…,Rk′∈𝐑R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\in\\boldsymbol\{R\}changes the probability distributionPMP\_\{M\}encoded byMMsuch that

PM\\displaystyle P\_\{M\}\(R1=r1,…,Rn=rn∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)\\displaystyle\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{n\}=r\_\{n\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=\{∏Ri∈\{R1,…,Rn\}∖\{R1′,…,Rk′\}P​\(ri∣pa𝑹​\(M,Ri\)\)if​∀j∈\{1,…,k\}:rj=rj′0otherwise,\\displaystyle=\\begin\{cases\}\\prod\\limits\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{n\}\\\}\\setminus\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)\)&\\text\{if \}\\forall j\\in\\\{1,\\ldots,k\\\}\\colon r\_\{j\}=r^\{\\prime\}\_\{j\}\\\\ 0&\\text\{otherwise\},\\end\{cases\}wherepa𝐑​\(M,Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)denotes a projection of the assignment\(r1,…,rn\)\(r\_\{1\},\\ldots,r\_\{n\}\)to the parentsPa𝐑​\(M,Ri\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)ofRiR\_\{i\}\.PM​\(R1=r1,…,Rn=rn∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\_\{M\}\(R\_\{1\}=r\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{n\}=r\_\{n\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)is called the*interventional distribution*ofPMP\_\{M\}under the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\.

The interventional distribution is used to compute the effect of an intervention, that is, to answer an interventional query, which we define as follows\.

###### Definition 7\(Interventional Query\)\.

An*interventional query*P​\(Q∣d​o​\(R1=r1,…,Rk=rk\)\)P\(Q\\mid do\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{k\}=r\_\{k\}\)\)consists of a query termQQ\(also called query variable\) and a set of interventions𝐈=\{d​o​\(R1=r1\),…,d​o​\(Rk=rk\)\}\\boldsymbol\{I\}=\\\{do\(R\_\{1\}=r\_\{1\}\),\\ldots,do\(R\_\{k\}=r\_\{k\}\)\\\}whereQQandR1,…,RkR\_\{1\},\\ldots,R\_\{k\}are disjointrandvars\. We also refer to the variablesR1,…,RkR\_\{1\},\\ldots,R\_\{k\}in𝐈\\boldsymbol\{I\}as intervention variables\. To query a specific probability instead of a probability distribution, the query term is an eventQ=qQ=q\.

To answer an interventional query, we can directly apply[Def\.˜6](https://arxiv.org/html/2606.28024#Thmdefinition6)\. In particular, given aCFGM=\(𝑹∪𝑭,𝑬,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)with𝑹=\{R1,…,Rℓ,R1′,…,Rk′\}\\boldsymbol\{R\}=\\\{R\_\{1\},\\ldots,R\_\{\\ell\},R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}, for an interventiond​o​\(R1′=r1′,…​Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\), the joint distribution overR1,…,RℓR\_\{1\},\\ldots,R\_\{\\ell\}is given as

P\(R1=r1,…,Rℓ=rℓ∣do\(R1′=r1′,…,Rk′=rk′\)\)=∏Ri∈\{R1,…,Rℓ\}P​\(Ri=ri∣Pa𝑹​\(M,Ri\)=pa𝑹​\(M,Ri\)\),\\displaystyle\\begin\{split\}P\(&R\_\{1\}=r\_\{1\},\\ldots,R\_\{\\ell\}=r\_\{\\ell\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)\\\\ &=\\prod\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{\\ell\}\\\}\}P\(R\_\{i\}=r\_\{i\}\\mid\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)=\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)\),\\end\{split\}\(5\)wherepa𝑹​\(M,Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)is a projection of the assignment\(r1,…,rℓ,r1′,…,rk′\)\(r\_\{1\},\\ldots,r\_\{\\ell\},r^\{\\prime\}\_\{1\},\\ldots,r^\{\\prime\}\_\{k\}\)to the parentsPa𝑹​\(M,Ri\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)ofRiR\_\{i\}\.[Equation˜5](https://arxiv.org/html/2606.28024#S2.E5)is also known under the name of the truncated product formula or g\-formula\[Pearl2016a\]\. If we are not interested in the joint distribution over*all*randvarsR1,…,RℓR\_\{1\},\\ldots,R\_\{\\ell\}that are not intervened on but instead wish to compute the distribution over a subset of\{R1,…,Rℓ\}\\\{R\_\{1\},\\ldots,R\_\{\\ell\}\\\}, we have to sum out allrandvarsfromR1,…,RℓR\_\{1\},\\ldots,R\_\{\\ell\}that are not queried\.

###### Definition 8\(Truncated Product Formula\)\.

LetM=\(𝐑∪𝐅,𝐄,𝚽\)M=\(\\boldsymbol\{R\}\\cup\\boldsymbol\{F\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aCFG\. Further, let𝐑=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\boldsymbol\{R\}=\\\{Q,R\_\{1\},\\ldots,R\_\{\\ell\},R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\. The result of an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)is then given by

P\(Q∣do\(R1′=r1′,…Rk′=rk′\)\)=∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)P​\(Q∣pa𝑹​\(M,Q\)\)⋅∏Ri∈\{R1,…,Rℓ\}P​\(ri∣pa𝑹​\(M,Ri\)\),\\displaystyle\\begin\{split\}P\(&Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)\\\\ &=\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}P\(Q\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,Q\)\)\\cdot\\prod\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{\\ell\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)\),\\end\{split\}\(6\)whererir\_\{i\}is again shorthand forRi=riR\_\{i\}=r\_\{i\}\(analogously forpa𝐑\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\) andpa𝐑​\(M,Q\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,Q\)as well aspa𝐑​\(M,Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M,R\_\{i\}\)denote projections of the assignment\(q,r1,…,rℓ,r1′,…,rk′\)\(q,r\_\{1\},\\ldots,r\_\{\\ell\},r^\{\\prime\}\_\{1\},\\ldots,r^\{\\prime\}\_\{k\}\)to the parents ofQQandRiR\_\{i\}, respectively\.

So far, we introducedCFGsas propositional causal models to compute the effect of interventions\. Next, we combine lifted representations with causal knowledge to allow for lifted causal inference\.

## 3Parametric Causal Factor Graphs

APCFGcombines aCFGand relational logic\[Genesereth2017a\]\(that is, first\-order logic with known universes\)\. By incorporating relational logic, aPCFGallows to encode that certain properties hold for all objects in a group \(i\.e\., set\) of objects\. In aPCFG,parameterised randvarsandparametric factorsrepresent sets ofrandvarsand factors, respectively\. More specifically, aPRVis parameterised bylogical variables, each having a domain consisting of constants, to represent a set ofrandvars\. Replacing thelogvarswith constants from their respective domains, called*grounding*, results in classicalrandvarsagain\. To restrictlogvarsto specific constants from their respective domains,PRVsare provided with constraints\. We first definePRVsand their components and afterwards defineparfactors, before we introducePCFGsand their semantics\.

### 3\.1Definitions

The upcoming definitions are based on the definitions given byBraun2020abut have been adapted and extended\.

###### Definition 9\(Parameterised Random Variable\)\.

Let𝐑\\boldsymbol\{R\}be a set ofrandvarnames,𝐋\\boldsymbol\{L\}a set oflogvarnames, and𝐃\\boldsymbol\{D\}a set of constants\. All sets are finite\. EachlogvarL∈𝐋L\\in\\boldsymbol\{L\}has a domaindom​\(L\)⊆𝐃\\mathrm\{dom\}\(L\)\\subseteq\\boldsymbol\{D\}\. A*constraint*C=\(ℒ,𝐂ℒ\)C=\(\\mathcal\{L\},\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\)is a tuple of a sequence oflogvarsℒ=\(L1,…,Ln\)\\mathcal\{L\}=\(L\_\{1\},\\ldots,L\_\{n\}\)and a set𝐂ℒ⊆×i=1ndom\(Li\)\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\\subseteq\\times\_\{i=1\}^\{n\}\\mathrm\{dom\}\(L\_\{i\}\)\. The symbol⊤\\topforCCmarks that no restrictions apply, i\.e\.,𝐂ℒ=×i=1ndom\(Li\)\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}=\\times\_\{i=1\}^\{n\}\\mathrm\{dom\}\(L\_\{i\}\)\. A*substitution*𝛔=\{Li↦ti\}i=1n\\boldsymbol\{\\sigma\}=\\\{L\_\{i\}\\mapsto t\_\{i\}\\\}\_\{i=1\}^\{n\}replaces every occurrence oflogvarLiL\_\{i\}with termti∈dom​\(Li\)t\_\{i\}\\in\\mathrm\{dom\}\(L\_\{i\}\)\(also called grounding\)\. A*PRV*R​\(L1,…,Ln\)R\(L\_\{1\},\\ldots,L\_\{n\}\),n≥0n\\geq 0, is a syntactical construct of arandvarnameR∈𝐑R\\in\\boldsymbol\{R\}possibly combined withlogvarsL1,…,Ln∈𝐋L\_\{1\},\\ldots,L\_\{n\}\\in\\boldsymbol\{L\}to represent a set ofrandvars\. Ifn=0n=0, thePRVis parameterless and forms a propositionalrandvar\. APRVAA\(orlogvarLL\) under constraintCCis given byA\|CA\_\{\|C\}\(L\|CL\_\{\|C\}, respectively\)\. We may omit\|⊤\|\\topinA\|⊤A\_\{\|\\top\}orL\|⊤L\_\{\|\\top\}\. The termrange​\(A\)\\mathrm\{range\}\(A\)denotes the possible values of aPRVAA\. An*event*A=aA=adenotes the occurrence ofPRVAAwith range valuea∈range​\(A\)a\\in\\mathrm\{range\}\(A\)and a set of events𝚵=\{A1=a1,…,Ak=ak\}\\boldsymbol\{\\Xi\}=\\\{A\_\{1\}=a\_\{1\},\\ldots,A\_\{k\}=a\_\{k\}\\\}is called*evidence*\.

We further denote bylv​\(Y\)\\mathrm\{lv\}\(Y\)thelogvarsoccurring inYY, whereYYmay be aPRVor a constraint\. The set of all instances ofYY\(alogvarorPRV\) with respect to given constraints is denoted bygr​\(Y\)\\mathrm\{gr\}\(Y\)\. An instance \(also called grounding\) ofYYis the result of substituting thelogvarsinYYwith constants from the specified constraints\. For a set of elements𝒀\\boldsymbol\{Y\}\(e\.g\.,logvarsorPRVs\), we definelv​\(𝒀\)=⋃Y∈𝒀lv​\(Y\)\\mathrm\{lv\}\(\\boldsymbol\{Y\}\)=\\bigcup\_\{Y\\in\\boldsymbol\{Y\}\}\\mathrm\{lv\}\(Y\)andgr​\(𝒀\)=⋃Y∈𝒀gr​\(Y\)\\mathrm\{gr\}\(\\boldsymbol\{Y\}\)=\\bigcup\_\{Y\\in\\boldsymbol\{Y\}\}\\mathrm\{gr\}\(Y\)\. The next example introducesPRVsfor our running example\.

###### Example 3\(Parameterised Random Variable\)\.

Consider𝐑=\{C​o​m,R​e​v,S​a​l\}\\boldsymbol\{R\}=\\\{Com,Rev,Sal\\\}for competence, revenue, and salary, respectively,𝐋=\{E\}\\boldsymbol\{L\}=\\\{E\\\}withdom​\(E\)=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\mathrm\{dom\}\(E\)=\\\{Alice,\\allowbreak Bob,\\allowbreak Charlie\\\}\(employees\), and𝐃=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\boldsymbol\{D\}=\\\{Alice,\\allowbreak Bob,\\allowbreak Charlie\\\}\. CombiningC​o​mComandS​a​lSalwith thelogvarEE, we obtain thePRVsC​o​m​\(E\)\|⊤=C​o​m​\(E\)Com\(E\)\_\{\|\\top\}=Com\(E\)andS​a​l​\(E\)\|⊤=S​a​l​\(E\)Sal\(E\)\_\{\|\\top\}=Sal\(E\)\. Furthermore,R​e​vRevis a parameterlessPRV\. For the sake of the example, letrange​\(C​o​m​\(E\)\)=range​\(R​e​v\)=range​\(S​a​l​\(E\)\)=\{low,high\}\\mathrm\{range\}\(Com\(E\)\)=\\mathrm\{range\}\(Rev\)=\\mathrm\{range\}\(Sal\(E\)\)=\\\{\\mathrm\{low\},\\allowbreak\\mathrm\{high\}\\\}\. Applying the substitution𝛔=\{E↦A​l​i​c​e\}\\boldsymbol\{\\sigma\}=\\\{E\\mapsto Alice\\\}toC​o​m​\(E\)Com\(E\)results inC​o​m​\(A​l​i​c​e\)Com\(Alice\)\. The groundings ofC​o​m​\(E\)Com\(E\)are given bygr​\(C​o​m​\(E\)\)=\{C​o​m​\(A​l​i​c​e\),C​o​m​\(B​o​b\),C​o​m​\(C​h​a​r​l​i​e\)\}\\mathrm\{gr\}\(Com\(E\)\)=\\\{Com\(Alice\),\\allowbreak Com\(Bob\),\\allowbreak Com\(Charlie\)\\\}\. Applying the constraintC=\(E,\{A​l​i​c​e\}\)C=\(E,\\\{Alice\\\}\)toC​o​m​\(E\)Com\(E\)yieldsC​o​m​\(E\)\|\(E,\{Alice\}\)Com\(E\)\_\{\|\(E,\\\{Alice\\\}\)\}with groundingsgr​\(C​o​m​\(E\)\|\(E,\{Alice\}\)\)=\{C​o​m​\(A​l​i​c​e\)\}\\mathrm\{gr\}\(Com\(E\)\_\{\|\(E,\\\{Alice\\\}\)\}\)=\\\{Com\(Alice\)\\\}\.

We next defineparfactors, which represent sets of factors and are used to encode the probability distribution over therandvars\. Aparfactordescribes a function, mapping argument values to positive real numbers \(potentials\), of which at least one is non\-zero\.

###### Definition 10\(Parfactor\)\.

Let𝚽\\boldsymbol\{\\Phi\}denote a set of function definitions, let𝒜=\(A1,…,An\)\\mathcal\{A\}=\(A\_\{1\},\\ldots,A\_\{n\}\)denote a sequence ofPRVs, and let\(ℒ,𝐂ℒ\)\(\\mathcal\{L\},\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\)denote a constraint on thelogvarsℒ\\mathcal\{L\}in𝒜\\mathcal\{A\}\. Withϕ:×i=1nrange\(Ai\)↦ℝ≥0\\phi\\colon\\times\_\{i=1\}^\{n\}\\mathrm\{range\}\(A\_\{i\}\)\\mapsto\\mathbb\{R\}\_\{\\geq 0\}being a function from𝚽\\boldsymbol\{\\Phi\}, a*parfactor*is given by∀𝐥∈𝐂ℒ:ϕ​\(𝒜\)\|\(ℒ,𝐂ℒ\)\\forall\\boldsymbol\{l\}\\in\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\\colon\\phi\(\\mathcal\{A\}\)\_\{\|\(\\mathcal\{L\},\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\)\}, whereℒ\\mathcal\{L\}is substituted by𝐥\\boldsymbol\{l\}in𝒜\\mathcal\{A\}\. We writeϕ​\(𝒜\)\|\(ℒ,𝐂ℒ\)\\phi\(\\mathcal\{A\}\)\_\{\|\(\\mathcal\{L\},\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\)\}as a shorthand for∀𝐥∈𝐂ℒ:ϕ​\(𝒜\)\|\(ℒ,𝐂ℒ\)\\forall\\boldsymbol\{l\}\\in\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\\colon\\phi\(\\mathcal\{A\}\)\_\{\|\(\\mathcal\{L\},\\boldsymbol\{C\}\_\{\\mathcal\{L\}\}\)\}\(omitting the substitution\) and we again may omit\|⊤\|\\topinϕ​\(𝒜\)\|⊤\\phi\(\\mathcal\{A\}\)\_\{\|\\top\}\.

For aparfactorϕ\\phi,lv​\(ϕ\)\\mathrm\{lv\}\(\\phi\)again refers to thelogvarsinϕ\\phiandgr​\(ϕ\)\\mathrm\{gr\}\(\\phi\)again denotes the set of instances ofϕ\\phi\. We next introduceparfactorsfor our running example\.

###### Example 4\(Parfactor\)\.

Take a look atϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}withrange​\(C​o​m​\(E\)\)=\{low,high\}\\mathrm\{range\}\(Com\(E\)\)=\\\{\\mathrm\{low\},\\allowbreak\\mathrm\{high\}\\\}anddom​\(E\)=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\mathrm\{dom\}\(E\)=\\\{Alice,\\allowbreak Bob,\\allowbreak Charlie\\\}\. Forϕ1\\phi\_\{1\}, we have

ϕ1​\(C​o​m​\(E\)\)\|⊤\\displaystyle\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}=∀e∈dom​\(E\):ϕ1​\(C​o​m​\(e\)\)\|⊤\.\\displaystyle=\\forall e\\in\\mathrm\{dom\}\(E\)\\colon\\phi\_\{1\}\(Com\(e\)\)\_\{\|\\top\}\.It holds thatgr​\(ϕ1​\(C​o​m​\(E\)\)\|⊤\)=\{ϕ1​\(C​o​m​\(A​l​i​c​e\)\),ϕ1​\(C​o​m​\(B​o​b\)\),ϕ1​\(C​o​m​\(C​h​a​r​l​i​e\)\)\}\\mathrm\{gr\}\(\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}\)=\\\{\\phi\_\{1\}\(Com\(Alice\)\),\\allowbreak\\phi\_\{1\}\(Com\(Bob\)\),\\allowbreak\\phi\_\{1\}\(Com\(Charlie\)\)\\\}\. In this specific example,ϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}thus represents a set of three ground factors\.

Before we are ready to define aPCFG, we need one more concept, namely the concept of acounting randvar\(CRV\)\[Milch2008a\], which allows us to compactly encode a factor where it does not matter which specific individualrandvarshave a certain range value but instead only the number ofrandvarshaving particular range values is of interest\. The range of aCRVis the space of histograms, i\.e\., a range value is a histogram indicating how manyrandvarshave a certain value\.

###### Definition 11\(Counting Random Variable\)\.

LetA​\(ℒ\)\|CA\(\\mathcal\{L\}\)\_\{\|C\}denote aPRVunder constraintCC, wherelv​\(ℒ\)=\{L\}\\mathrm\{lv\}\(\\mathcal\{L\}\)=\\\{L\\\}, i\.e\., eitherℒ\\mathcal\{L\}consists of onlyLLor the other inputs are constants \(meaningℒ\\mathcal\{L\}contains at most onelogvar\)\. We denote a*CRV*by\#L​\[A​\(ℒ\)\|C\]\\\#\_\{L\}\[A\(\\mathcal\{L\}\)\_\{\|C\}\]\. Its range is the space of possible histograms\. A histogramhhis a set of tuples\{\(vi,ni\)\}i=1m\\\{\(v\_\{i\},n\_\{i\}\)\\\}\_\{i=1\}^\{m\},vi∈range​\(A​\(ℒ\)\)v\_\{i\}\\in\\mathrm\{range\}\(A\(\\mathcal\{L\}\)\),ni∈ℕn\_\{i\}\\in\\mathbb\{N\},m=\|range​\(A​\(ℒ\)\)\|m=\\lvert\\mathrm\{range\}\(A\(\\mathcal\{L\}\)\)\\rvert, and∑ini=\|gr​\(L\|C\)\|\\sum\_\{i\}n\_\{i\}=\\lvert\\mathrm\{gr\}\(L\_\{\|C\}\)\\rvertfor some constraintCCoverℒ\\mathcal\{L\}\. A shorthand notation is\[n1,…,nm\]\[n\_\{1\},\\ldots,n\_\{m\}\]\. Since counting binds thelogvarLL,lv​\(\#L​\[A​\(ℒ\)\]\)=ℒ∖\{L\}\\mathrm\{lv\}\(\\\#\_\{L\}\[A\(\\mathcal\{L\}\)\]\)=\\mathcal\{L\}\\setminus\\\{L\\\}\.

###### Example 5\(Counting Random Variable\)\.

Let\#E​\[C​o​m​\(E\)\]\\\#\_\{E\}\[Com\(E\)\]be aCRV,range​\(C​o​m​\(E\)\)=\{low,high\}\\mathrm\{range\}\(Com\(E\)\)=\\\{\\mathrm\{low\},\\allowbreak\\mathrm\{high\}\\\}anddom​\(E\)=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\mathrm\{dom\}\(E\)=\\\{Alice,\\allowbreak Bob,\\allowbreak Charlie\\\}\. Then, there arem=\|range​\(C​o​m​\(E\)\)\|=2m=\\lvert\\mathrm\{range\}\(Com\(E\)\)\\rvert=2possible range values andn=\|gr​\(E\)\|=3n=\\lvert\\mathrm\{gr\}\(E\)\\rvert=3groundings\. Hence, the histograms are\[0,3\]\[0,3\],\[1,2\]\[1,2\],\[2,1\]\[2,1\], and\[3,0\]\[3,0\]\(corresponding to\{\(high,0\),\(low,3\)\}\\\{\(\\mathrm\{high\},0\),\\allowbreak\(\\mathrm\{low\},3\)\\\},\{\(high,1\),\(low,2\)\}\\\{\(\\mathrm\{high\},1\),\\allowbreak\(\\mathrm\{low\},2\)\\\},\{\(high,2\),\(low,1\)\}\\\{\(\\mathrm\{high\},2\),\\allowbreak\(\\mathrm\{low\},1\)\\\}and\{\(high,3\),\(low,0\)\}\\\{\(\\mathrm\{high\},3\),\\allowbreak\(\\mathrm\{low\},0\)\\\}in set notation, respectively\)\.

We have now introduced all components of aPCFG, which we define next\.

###### Definition 12\(Parametric Causal Factor Graph\)\.

A*PCFG*M=\(𝐕,𝐄,𝚽\)M=\(\\boldsymbol\{V\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)consists of a directed graph\(𝐕,𝐄\)\(\\boldsymbol\{V\},\\boldsymbol\{E\}\)with node set𝐕=𝐀∪𝐆\\boldsymbol\{V\}=\\boldsymbol\{A\}\\cup\\boldsymbol\{G\}and edge set𝐄⊆𝐀×𝐆\\boldsymbol\{E\}\\subseteq\\boldsymbol\{A\}\\times\\boldsymbol\{G\}\. The set of nodes𝐕=𝐀∪𝐆\\boldsymbol\{V\}=\\boldsymbol\{A\}\\cup\\boldsymbol\{G\}is partitioned into a set ofPRVs𝐀=\{A1,…,An\}\\boldsymbol\{A\}=\\\{A\_\{1\},\\ldots,A\_\{n\}\\\}and a set ofparfactornames \(parfactornodes\)𝐆=\{g1,…,gm\}\\boldsymbol\{G\}=\\\{g\_\{1\},\\ldots,g\_\{m\}\\\}\. For everyparfactornamegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}, there is a function definition \(parfactor\)ϕj​\(𝒜j\)\|C∈𝚽\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|C\}\\in\\boldsymbol\{\\Phi\}with𝒜j\\mathcal\{A\}\_\{j\}being a sequence ofPRVsfrom𝐀\\boldsymbol\{A\}andCCbeing a constraint on thelogvarsof𝒜j\\mathcal\{A\}\_\{j\}such thatϕ:×A∈𝒜jrange\(A\)↦ℝ≥0\\phi\\colon\\times\_\{A\\in\\mathcal\{A\}\_\{j\}\}\\mathrm\{range\}\(A\)\\mapsto\\mathbb\{R\}\_\{\\geq 0\}maps range values in𝒜j\\mathcal\{A\}\_\{j\}to a positive real number \(potential\)\. In every function definition, at least one potential has to be non\-zero and we again may omit\|⊤\|\\topinϕj​\(𝒜j\)\|⊤\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|\\top\}\. For eachparfactornamegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}with corresponding function definitionϕj​\(𝒜j\)\|C∈𝚽\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|C\}\\in\\boldsymbol\{\\Phi\}, there is either an undirected edge\{Ai,gj\}∈𝐄\\\{A\_\{i\},g\_\{j\}\\\}\\in\\boldsymbol\{E\}or a directed edge\(gj,Ai\)∈𝐄\(g\_\{j\},A\_\{i\}\)\\in\\boldsymbol\{E\}for everyPRVAi∈𝒜jA\_\{i\}\\in\\mathcal\{A\}\_\{j\}\(directed edges are only allowed to point fromparfactornodes toPRVsbut not vice versa\)\. We stipulate that for everyparfactornodegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}, there is exactly one outgoing directed edge\(gj,Ai\)∈𝐄\(g\_\{j\},A\_\{i\}\)\\in\\boldsymbol\{E\}among the edges incident togjg\_\{j\}\. Each directed edgeAi−gj→AkA\_\{i\}\-g\_\{j\}\\to A\_\{k\}from aPRVAi∈𝐀A\_\{i\}\\in\\boldsymbol\{A\}to aPRVAk∈𝐀A\_\{k\}\\in\\boldsymbol\{A\}via aparfactornodegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}corresponds to a direct causal relationship betweenAiA\_\{i\}andAkA\_\{k\}\. APCFGis an acyclic graph, that is,𝐄\\boldsymbol\{E\}contains no sequence of edges\{A1,g1\},\(g1,A2\),…,\{Ak−1,gk\},\(gk,A1\)\\\{A\_\{1\},\\allowbreak g\_\{1\}\\\},\\allowbreak\(g\_\{1\},\\allowbreak A\_\{2\}\),\\allowbreak\\ldots,\\allowbreak\\\{A\_\{k\-1\},\\allowbreak g\_\{k\}\\\},\\allowbreak\(g\_\{k\},\\allowbreak A\_\{1\}\)starting from an arbitraryPRVA1∈𝐀A\_\{1\}\\in\\boldsymbol\{A\}such that the sequence ends again atA1A\_\{1\}when following the edges in the direction of the arrows\. The semantics ofMMis given by grounding with respect to constraints and building a full joint distribution over𝐑=gr​\(𝐀\)\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\. The joint potential for an assignment𝐑=𝐫\\boldsymbol\{R\}=\\boldsymbol\{r\}is

ψM​\(𝑹=𝒓\)=∏ϕj∈𝚽∏ϕk∈gr​\(ϕj\)ϕk​\(ℛk=𝒓k\),\\displaystyle\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=\\prod\_\{\\phi\_\{j\}\\in\\boldsymbol\{\\Phi\}\}\\prod\_\{\\phi\_\{k\}\\in\\mathrm\{gr\}\(\\phi\_\{j\}\)\}\\phi\_\{k\}\(\\mathcal\{R\}\_\{k\}=\\boldsymbol\{r\}\_\{k\}\),\(7\)where𝐫k\\boldsymbol\{r\}\_\{k\}is a projection of𝐫\\boldsymbol\{r\}to the argument listℛk\\mathcal\{R\}\_\{k\}ofϕk\\phi\_\{k\}\. The normalised joint potential then yields the full joint probability distribution over𝐑\\boldsymbol\{R\}that is encoded byMM, that is,

PM​\(𝑹=𝒓\)=1Z​ψM​\(𝑹=𝒓\),\\displaystyle P\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=\\frac\{1\}\{Z\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\),\(8\)whereZZis the normalisation constant, defined as

Z=∑𝒓⁣∈⁣×R∈𝑹range​\(R\)ψM​\(𝑹=𝒓\)\.\\displaystyle Z=\\sum\_\{\\boldsymbol\{r\}\\in\\times\_\{R\\in\\boldsymbol\{R\}\}\\mathrm\{range\}\(R\)\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)\.\(9\)

The definition of aPCFGalso implies that everyCFGis aPCFGcontaining only parameterlessPRVs\(analogously, every factor is aparfactorhaving only parameterless arguments\)\. Grounding aPCFGthus yields aCFGentailing equivalent semantics \(that is, encoding the same full joint probability distribution\) as thePCFG\. Aslogvarsabstract from individual objects, we refer toPCFGsas*lifted*representations and toCFGsas*propositional*representations \(in the same way, we refer to parameterlessPRVsas propositionalrandvarsand toparfactorshaving only parameterless arguments as propositional factors\)\. In the literature, a lifted representation is sometimes also referred to as a first\-order representation\. Before we take a look at an example, we introduce the following notations for aPCFGM=\(𝑨∪𝑮,𝑬,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\):

- •Pa𝑨​\(M,A\)=\{A′∈𝑨∣∃g∈𝑮:\{A′,g\}∈𝑬∧\(g,A\)∈𝑬\}\\mathrm\{Pa\}\_\{\\boldsymbol\{A\}\}\(M,A\)=\\\{A^\{\\prime\}\\in\\boldsymbol\{A\}\\mid\\exists g\\in\\boldsymbol\{G\}\\colon\\\{A^\{\\prime\},g\\\}\\in\\boldsymbol\{E\}\\land\(g,A\)\\in\\boldsymbol\{E\}\\\}denotes the set of parentPRVsof aPRVA∈𝑨A\\in\\boldsymbol\{A\}inMM,
- •Pa​\(M,g\)=\{A∈𝑨∣\{A,g\}∈𝑬\}\\mathrm\{Pa\}\(M,g\)=\\\{A\\in\\boldsymbol\{A\}\\mid\\\{A,g\\\}\\in\\boldsymbol\{E\}\\\}denotes the set of parentPRVsof aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}inMM,
- •Ch𝑨​\(M,A\)=\{A′∈𝑨∣∃g∈𝑮:\{A,g\}∈𝑬∧\(g,A′\)∈𝑬\}\\mathrm\{Ch\}\_\{\\boldsymbol\{A\}\}\(M,A\)=\\\{A^\{\\prime\}\\in\\boldsymbol\{A\}\\mid\\exists g\\in\\boldsymbol\{G\}\\colon\\\{A,g\\\}\\in\\boldsymbol\{E\}\\land\(g,A^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}denotes the singleton set of childPRVsof aPRVA∈𝑨A\\in\\boldsymbol\{A\}inMM,
- •Ch​\(M,g\)=\{A∈𝑨∣\(g,A\)∈𝑬\}\\mathrm\{Ch\}\(M,g\)=\\\{A\\in\\boldsymbol\{A\}\\mid\(g,A\)\\in\\boldsymbol\{E\}\\\}denotes the singleton set of childPRVsof aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}inMM,
- •De𝑨\(M,A\)=\{A′∈𝑨∣∃g1,…,gk∈𝑮,A1,…,Ak−1∈𝑨:\{A,g1\},\(g1,A1\),…,\{Ak−1,gk\},\(gk,A′\)∈𝑬\}\\mathrm\{De\}\_\{\\boldsymbol\{A\}\}\(M,A\)=\\\{A^\{\\prime\}\\in\\boldsymbol\{A\}\\mid\\exists g\_\{1\},\\allowbreak\\ldots,\\allowbreak g\_\{k\}\\in\\boldsymbol\{G\},\\allowbreak A\_\{1\},\\allowbreak\\ldots,\\allowbreak A\_\{k\-1\}\\in\\boldsymbol\{A\}\\colon\\\{A,\\allowbreak g\_\{1\}\\\},\\allowbreak\(g\_\{1\},\\allowbreak A\_\{1\}\),\\allowbreak\\ldots,\\allowbreak\\\{A\_\{k\-1\},\\allowbreak g\_\{k\}\\\},\\allowbreak\(g\_\{k\},\\allowbreak A^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}is the set of descendantPRVsof aPRVA∈𝑨A\\in\\boldsymbol\{A\}inMM, and
- •De\(M,g\)=\{A′∈𝑨∣∃g1,…,gk∈𝑮,A1,…,Ak∈𝑨:\(g,A1\),\{A1,g1\},…,\(gk,A′\)∈𝑬\}\\mathrm\{De\}\(M,g\)=\\\{A^\{\\prime\}\\in\\boldsymbol\{A\}\\mid\\exists g\_\{1\},\\allowbreak\\ldots,\\allowbreak g\_\{k\}\\in\\boldsymbol\{G\},\\allowbreak A\_\{1\},\\allowbreak\\ldots,\\allowbreak A\_\{k\}\\in\\boldsymbol\{A\}\\colon\(g,\\allowbreak A\_\{1\}\),\\allowbreak\\\{A\_\{1\},\\allowbreak g\_\{1\}\\\},\\allowbreak\\ldots,\\allowbreak\(g\_\{k\},\\allowbreak A^\{\\prime\}\)\\in\\boldsymbol\{E\}\\\}is the set of descendantPRVsof aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}inMM\.

As before, the subscript𝑨\\boldsymbol\{A\}indicates that the sets are defined with respect to thePRVsin𝑨\\boldsymbol\{A\}, that is, the sets contain neighbouringPRVsthat are connected via aparfactornode\. Furthermore, as the definition of aPCFGrequires everyparfactornode to have exactly one outgoing directed edge, it holds that\|Ch​\(M,g\)\|=1\\lvert\\mathrm\{Ch\}\(M,g\)\\rvert=1for everyparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}\. We next give an example\.

###### Example 6\(Parametric Causal Factor Graph\)\.

[Figure˜2](https://arxiv.org/html/2606.28024#S3.F2)displays aPCFGMMfor our running example\.MMcontains twoPRVsC​o​m​\(E\)Com\(E\)\(for the competence of employees\) andS​a​l​\(E\)Sal\(E\)\(for the salary of employees\), as well as a propositionalrandvarR​e​vRev\(for the revenue of the company\)\. The ranges of thePRVsarerange​\(C​o​m​\(E\)\)=range​\(S​a​l​\(E\)\)=range​\(R​e​v\)=\{low,high\}\\mathrm\{range\}\(Com\(E\)\)=\\mathrm\{range\}\(Sal\(E\)\)=\\mathrm\{range\}\(Rev\)=\\\{\\mathrm\{low\},\\allowbreak\\mathrm\{high\}\\\}and thelogvarEE\(representing employees\) has the domaindom​\(E\)=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\mathrm\{dom\}\(E\)=\\\{Alice,Bob,Charlie\\\}\. There are threeparfactornodesg1g\_\{1\},g2g\_\{2\}, andg3g\_\{3\}with corresponding function definitionsϕ1​\(C​o​m​\(E\)\)\\phi\_\{1\}\(Com\(E\)\),ϕ2​\(\#E​\[C​o​m​\(E\)\],R​e​v\)\\phi\_\{2\}\(\\\#\_\{E\}\[Com\(E\)\],\\allowbreak Rev\), andϕ3​\(C​o​m​\(E\),R​e​v,S​a​l​\(E\)\)\\phi\_\{3\}\(Com\(E\),\\allowbreak Rev,\\allowbreak Sal\(E\)\)\. We omit the potential tables of theparfactorsfor brevity\. AsC​o​m​\(E\)Com\(E\)appears count\-converted inϕ2​\(\#E​\[C​o​m​\(E\)\],R​e​v\)\\phi\_\{2\}\(\\\#\_\{E\}\[Com\(E\)\],\\allowbreak Rev\), it holds thatlv​\(ϕ2\)=∅\\mathrm\{lv\}\(\\phi\_\{2\}\)=\\emptysetand thus,g2g\_\{2\}is not layered in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)whileg1g\_\{1\}andg3g\_\{3\}are layered \(becauselv​\(ϕ1\)≠∅\\mathrm\{lv\}\(\\phi\_\{1\}\)\\neq\\emptysetandlv​\(ϕ3\)≠∅\\mathrm\{lv\}\(\\phi\_\{3\}\)\\neq\\emptyset\)\. In set notation,M=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)is given as

𝑹\\displaystyle\\boldsymbol\{R\}=\{C​o​m,R​e​v,S​a​l\},\\displaystyle=\\\{Com,Rev,Sal\\\},𝑳\\displaystyle\\boldsymbol\{L\}=\{E\},\\displaystyle=\\\{E\\\},𝑫\\displaystyle\\boldsymbol\{D\}=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\},\\displaystyle=\\\{Alice,Bob,Charlie\\\},𝑨\\displaystyle\\boldsymbol\{A\}=\{C​o​m​\(E\),R​e​v,S​a​l​\(E\)\},\\displaystyle=\\\{Com\(E\),Rev,Sal\(E\)\\\},𝑮\\displaystyle\\boldsymbol\{G\}=\{g1,g2,g3\},\\displaystyle=\\\{g\_\{1\},g\_\{2\},g\_\{3\}\\\},𝑬\\displaystyle\\boldsymbol\{E\}=\{\(g1,C​o​m​\(E\)\),\{C​o​m​\(E\),g2\},\{C​o​m​\(E\),g3\},\(g2,R​e​v\),\{R​e​v,g3\},\(g3,S​a​l​\(E\)\)\},\\displaystyle=\\\{\(g\_\{1\},Com\(E\)\),\\\{Com\(E\),g\_\{2\}\\\},\\\{Com\(E\),g\_\{3\}\\\},\(g\_\{2\},Rev\),\\\{Rev,g\_\{3\}\\\},\(g\_\{3\},Sal\(E\)\)\\\},𝚽\\displaystyle\\boldsymbol\{\\Phi\}=\{ϕ1​\(C​o​m​\(E\)\),ϕ2​\(\#E​\[C​o​m​\(E\)\],R​e​v\),ϕ3​\(C​o​m​\(E\),R​e​v,S​a​l​\(E\)\)\},\\displaystyle=\\\{\\phi\_\{1\}\(Com\(E\)\),\\phi\_\{2\}\(\\\#\_\{E\}\[Com\(E\)\],\\allowbreak Rev\),\\phi\_\{3\}\(Com\(E\),\\allowbreak Rev,\\allowbreak Sal\(E\)\)\\\},where𝐑\\boldsymbol\{R\}is the set ofrandvarnames,𝐋\\boldsymbol\{L\}is the set oflogvarnames, and𝐃\\boldsymbol\{D\}is the set of constants\. GroundingMMresults in theCFGfrom[Ex\.˜1](https://arxiv.org/html/2606.28024#Thmexample1), whereC​o​m​\(A​l​i​c​e\)Com\(Alice\)corresponds toC​o​m​AComA,C​o​m​\(B​o​b\)Com\(Bob\)corresponds toC​o​m​BComB, and so on\.

C​o​m​\(E\)Com\(E\)g1g\_\{1\}g2g\_\{2\}g3g\_\{3\}R​e​vRevS​a​l​\(E\)Sal\(E\)Figure 2:An illustration of aPCFGfor our running example\. We omit the potential tables of the \(par\)factors for brevity\.We deliberately chose labelsC​o​m​AComA,C​o​m​BComB, andC​o​m​CComCinstead ofC​o​m​\(A​l​i​c​e\)Com\(Alice\),C​o​m​\(B​o​b\)Com\(Bob\), andC​o​m​\(C​h​a​r​l​i​e\)Com\(Charlie\)in[Ex\.˜1](https://arxiv.org/html/2606.28024#Thmexample1)to emphasise that there is no explicit representation of objects \(here employees\) in the graph structure of the propositionalCFG\. In general, node labels can be arbitrary strings of characters\. The size of thePCFG\(that is, the number of nodes and edges in the graph\) remains constant even if the number of employees increases\. In theCFGfrom[Fig\.˜1\(a\)](https://arxiv.org/html/2606.28024#S2.F1.sf1), however, the size of the graph increases linearly with the number of employees as every additional employee adds tworandvarsand two factors to the graph\. In general, there might be multiple groups of indistinguishable objects instead of having a single group including all objects, which can be represented by using constraints\.

Before we continue to define the semantics of an intervention in aPCFG, we briefly provide the separation criteria in aPCFG, linking its graph structure to conditional independence statements in the underlying probability distribution\.

### 3\.2Conditional Independence in Parametric Causal Factor Graphs

The separation criteria in aPCFGare given on a ground level and hence directly correspond to the separation criteria for aCFGgiven in[Def\.˜2](https://arxiv.org/html/2606.28024#Thmdefinition2)\.

###### Definition 13\(Separation in Parametric Causal Factor Graphs\)\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aPCFG\. Further, let𝐑=gr​\(𝐀\)\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)and let𝐑i⊆𝐑\\boldsymbol\{R\}\_\{i\}\\subseteq\\boldsymbol\{R\},𝐑j⊆𝐑\\boldsymbol\{R\}\_\{j\}\\subseteq\\boldsymbol\{R\}, and𝐒⊆𝐑\\boldsymbol\{S\}\\subseteq\\boldsymbol\{R\}be pairwise disjoint sets of groundrandvars\. We say that𝐒\\boldsymbol\{S\}*separates*𝐑i\\boldsymbol\{R\}\_\{i\}and𝐑j\\boldsymbol\{R\}\_\{j\}inMMif𝐒\\boldsymbol\{S\}blocks all paths from anyrandvarin𝐑i\\boldsymbol\{R\}\_\{i\}to anyrandvarin𝐑j\\boldsymbol\{R\}\_\{j\}ingr​\(M\)\\mathrm\{gr\}\(M\)\.MMimplies the conditional independence statement\(𝐑i⟂⟂𝐑j∣𝐒\)\(\\boldsymbol\{R\}\_\{i\}\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}\\boldsymbol\{R\}\_\{j\}\\mid\\boldsymbol\{S\}\)if𝐒\\boldsymbol\{S\}separates𝐑i\\boldsymbol\{R\}\_\{i\}and𝐑j\\boldsymbol\{R\}\_\{j\}inMM\.

While the separation criteria in aPCFGare defined on a ground level, it is nevertheless possible to check for separation on a lifted level without grounding the entirePCFG\. The Bayes\-Ball algorithm\[Shachter1998a\]allows to efficiently check for induced conditional independence statements in a propositionalBN\(and hence can also be applied to aCFGby taking the underlying causal graph structure into account\)\.Meert2010aextend the Bayes\-Ball algorithm to the lifted setting, thereby allowing to run it directly on a lifted representation\.

It is also possible to check for implied conditional independence statements involvingPRVsinstead of groundrandvarsin a highly efficient manner on a lifted level\. In aPCFG, everyPRVA​\(ℒ\)\|CA\(\\mathcal\{L\}\)\_\{\|C\}is represented by a variable node and thus, checking for conditional independence statements that involveAAcan be done by looking at this specific variable node instead of taking into account all groundings ofAA\. In contrast, in a propositional setting \(i\.e\., in a ground model\), each groundrandvaringr​\(A\)\\mathrm\{gr\}\(A\)is an individual node in the graph and hence must be looked at individually\. For instance, to check whetherC​o​m​\(E\)⟂⟂S​a​l​\(E\)∣R​e​vCom\(E\)\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}Sal\(E\)\\mid Revis implied by thePCFGdepicted in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2), only three variable nodes are of relevance whereas2⋅\|dom​\(E\)\|\+12\\cdot\\lvert\\mathrm\{dom\}\(E\)\\rvert\+1nodes are of relevance in the corresponding ground model shown in[Fig\.˜1\(a\)](https://arxiv.org/html/2606.28024#S2.F1.sf1)\. Here, the conditional independence statementC​o​m​\(E\)⟂⟂S​a​l​\(E\)∣R​e​vCom\(E\)\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}Sal\(E\)\\mid Rev\(which is not implied by thePCFGfrom[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)\) is a shorthand to refer to a set of conditional independence statements resulting from substitutingEEwith everye∈dom​\(E\)e\\in\\mathrm\{dom\}\(E\), i\.e\.,\{C​o​m​\(e\)⟂⟂S​a​l​\(e\)∣R​e​v\}e∈dom​\(E\)\\\{Com\(e\)\\mathrel\{\\perp\\\!\\\!\\\!\\perp\}Sal\(e\)\\mid Rev\\\}\_\{e\\in\\mathrm\{dom\}\(E\)\}\.

For now, we additionally assume that everyPRVAkA\_\{k\}in aPCFGhas exactly one parentparfactornodegjg\_\{j\}such that each corresponding ground function definitionϕj​\(R1,…,Rk\)∈gr​\(ϕj​\(A1,…,Ak\)\)\\phi\_\{j\}\(R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{k\}\)\\in\\mathrm\{gr\}\(\\phi\_\{j\}\(A\_\{1\},\\allowbreak\\ldots,\\allowbreak A\_\{k\}\)\)represents a conditional probability distributionP​\(Rk∣R1,…,Rk−1\)P\(R\_\{k\}\\mid R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{k\-1\}\)\. In other words, we assume that the groundCFGrepresented by a givenPCFGdirectly corresponds to aCBN\. This assumption results in a more convenient and often more efficient computation when answering interventional queries in aPCFGbut is not necessary to answer such queries in general\. Later on, we relax this assumption and show how interventional queries can still be answered on a lifted level\. In the next subsection, we apply the notion of an intervention toPCFGs\.

### 3\.3Interventions in Parametric Causal Factor Graphs

An intervention in aPCFGis defined analogously to an intervention in aCFG\. The interventional distribution, in particular, is defined on a ground level again\.

###### Definition 14\(Interventional Distribution in a Parametric Causal Factor Graph\)\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aPCFGwith𝐑=gr​\(𝐀\)=\{R1,…,Rn\}\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{n\}\\\}\. Further, letd​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)be an intervention on therandvarsR1′,…,Rk′∈gr​\(𝐀\)R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\in\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\. The interventional distribution under the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)is given by

PM\\displaystyle P\_\{M\}\(R1=r1,…,Rn=rn∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)\\displaystyle\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{n\}=r\_\{n\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=\{∏Ri∈\{R1,…,Rn\}∖\{R1′,…,Rk′\}P​\(ri∣pa𝑹​\(gr​\(M\),Ri\)\)if​∀j∈\{1,…,k\}:rj=rj′0otherwise,\\displaystyle=\\begin\{cases\}\\prod\\limits\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{n\}\\\}\\setminus\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)\)&\\text\{if \}\\forall j\\in\\\{1,\\ldots,k\\\}\\colon r\_\{j\}=r^\{\\prime\}\_\{j\}\\\\ 0&\\text\{otherwise\},\\end\{cases\}wherepa𝐑​\(gr​\(M\),Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)denotes a projection of the assignment\(r1,…,rn\)\(r\_\{1\},\\ldots,r\_\{n\}\)to the parentsPa𝐑​\(gr​\(M\),Ri\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)ofRiR\_\{i\}in the ground modelgr​\(M\)\\mathrm\{gr\}\(M\)\.

Furthermore, we allow for interventions onPRVs\. An interventiond​o​\(A​\(ℒ\)\|C=a\)do\(A\(\\mathcal\{L\}\)\_\{\|C\}=a\)on aPRVAA, wherea∈range​\(A\)a\\in\\mathrm\{range\}\(A\), can be seen as a joint intervention on all groundrandvarsingr​\(A\|C\)\\mathrm\{gr\}\(A\_\{\|C\}\)\. In other words,d​o​\(A​\(ℒ\)\|C=a\)do\(A\(\\mathcal\{L\}\)\_\{\|C\}=a\)is equivalent tod​o​\(R1=a,…,Rk=a\)do\(R\_\{1\}=a,\\ldots,R\_\{k\}=a\), wheregr​\(A\|C\)=\{R1,…,Rk\}\\mathrm\{gr\}\(A\_\{\|C\}\)=\\\{R\_\{1\},\\ldots,R\_\{k\}\\\}\. From now on, we therefore also allow for interventional queries of the formP​\(Q∣d​o​\(A1=a1,…,Ak=ak\)\)P\(Q\\mid do\(A\_\{1\}=a\_\{1\},\\ldots,A\_\{k\}=a\_\{k\}\)\), whereA1,…,AkA\_\{1\},\\ldots,A\_\{k\}arePRVs\. Since any interventional query involvingPRVscan be reduced to an interventional query containing only parameterlessrandvars, we continue to work with our original definition of an interventional query \([Def\.˜7](https://arxiv.org/html/2606.28024#Thmdefinition7)\)\. To answer an interventional query in aPCFG, we can again apply the truncated product formula\.

###### Definition 15\(Truncated Product Formula in a Parametric Causal Factor Graph\)\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aPCFGand let𝐑=gr​\(𝐀\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,R\_\{1\},\\ldots,R\_\{\\ell\},R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\. The result of an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)is then given by

P​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)=∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)P​\(Q∣pa𝑹​\(gr​\(M\),Q\)\)⋅∏Ri∈\{R1,…,Rℓ\}P​\(ri∣pa𝑹​\(gr​\(M\),Ri\)\),\\displaystyle\\begin\{split\}P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=&\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}P\(Q\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),Q\)\)\\\\ \\cdot&\\prod\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{\\ell\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)\),\\end\{split\}\(10\)wherepa𝐑​\(gr​\(M\),Q\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),Q\)andpa𝐑​\(gr​\(M\),Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)denote projections of the assignment\(q,r1,…,rℓ,r1′,…,rk′\)\(q,\\allowbreak r\_\{1\},\\allowbreak\\ldots,\\allowbreak r\_\{\\ell\},\\allowbreak r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak r^\{\\prime\}\_\{k\}\)to the parents ofQQandRiR\_\{i\}in the ground modelgr​\(M\)\\mathrm\{gr\}\(M\), respectively\.

Even though both the interventional distribution and the truncated product formula are defined on a ground level, an interventional query can be answered without grounding the entirePCFG\. In particular,[Eq\.˜10](https://arxiv.org/html/2606.28024#S3.E10)gives us a formula that consists of a set of probabilistic queries, which can be answered on a lifted level\. Under the assumption of having a direct correspondence ofparfactorsto conditional probability distributions, we can further simplify query answering in aPCFGMM\.

###### Proposition 1\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)denote aPCFGwith eachϕj​\(Rj1,…,Rjz\)∈gr​\(𝚽\)\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}\)representing a conditional probability distributionP​\(Rjz∣Rj1,…,Rjz−1\)P\(R\_\{j\_\{z\}\}\\mid R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\-1\}\}\), letgr​\(𝐀\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{\\ell\},\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}, and letP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)be an interventional query\. Further, letM′=\(𝐀∪𝐆,𝐄,𝚽′\)M^\{\\prime\}=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\},\\allowbreak\\boldsymbol\{\\Phi\}^\{\\prime\}\)be thePCFGobtained by changing𝚽\\boldsymbol\{\\Phi\}to𝚽′\\boldsymbol\{\\Phi\}^\{\\prime\}such that every factorϕj​\(Rj1,…,Rjz\)∈gr​\(𝚽\)\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}\)that has a childRjz=Rz′R\_\{j\_\{z\}\}=R^\{\\prime\}\_\{z\}in\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}is replaced by a factorϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)with

ϕj′​\(Rj1=rj1,…,Rjz=rjz\)=\{1if​rjz=rz′0if​rjz≠rz′\.\\displaystyle\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\}=r\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}=r\_\{j\_\{z\}\}\)=\\begin\{cases\}1&\\text\{if \}r\_\{j\_\{z\}\}=r^\{\\prime\}\_\{z\}\\\\ 0&\\text\{if \}r\_\{j\_\{z\}\}\\neq r^\{\\prime\}\_\{z\}\.\\end\{cases\}All factors whose child is not in\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}remain unchanged\. The result of the interventional queryPM​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\_\{M\}\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in the original modelMMis then given by the result of the probabilistic queryPM′​\(Q∣R1′=r1′,…,Rk′=rk′\)P\_\{M^\{\\prime\}\}\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)in the modified modelM′M^\{\\prime\}\.

###### Proof\.

For each ground factorϕj​\(Rj1,…,Rjz\)∈gr​\(𝚽\)\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}\), it holds that

ϕj\(Rj1=rj1,…,Rjz=rjz\)=P\(Rjz=rjz∣Rj1=rj1,…,Rjz−1=rjz−1\)\\displaystyle\\phi\_\{j\}\(R\_\{j\_\{1\}\}=r\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}=r\_\{j\_\{z\}\}\)=P\(R\_\{j\_\{z\}\}=r\_\{j\_\{z\}\}\\mid R\_\{j\_\{1\}\}=r\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\-1\}\}=r\_\{j\_\{z\-1\}\}\)\(11\)for all assignments\(rj1,…,rjz\)\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)\. Entering[Eq\.˜11](https://arxiv.org/html/2606.28024#S3.E11)into the truncated product formula \([Eq\.˜10](https://arxiv.org/html/2606.28024#S3.E10)\) leaves us with

PM​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)=∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)∏ϕj∈gr^​\(𝚽\)ϕj​\(ℛj=𝒓j\),\\displaystyle P\_\{M\}\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}\\prod\_\{\\phi\_\{j\}\\in\\hat\{\\mathrm\{gr\}\}\(\\boldsymbol\{\\Phi\}\)\}\\phi\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\),wheregr^​\(𝚽\)=\{ϕj​\(Rj1,…,Rjz\)∈gr​\(𝚽\)∣Rjz∉\{R1′,…,Rk′\}\}\\hat\{\\mathrm\{gr\}\}\(\\boldsymbol\{\\Phi\}\)=\\\{\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}\)\\mid R\_\{j\_\{z\}\}\\notin\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\\\}denotes the set of groundings of𝚽\\boldsymbol\{\\Phi\}whose child is not in\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}and𝒓j\\boldsymbol\{r\}\_\{j\}denotes a projection of the assignment\(q,r1,…,rℓ,r1′,…,rk′\)\(q,\\allowbreak r\_\{1\},\\allowbreak\\ldots,\\allowbreak r\_\{\\ell\},\\allowbreak r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak r^\{\\prime\}\_\{k\}\)to the argument listℛj\\mathcal\{R\}\_\{j\}ofϕj\\phi\_\{j\}\. Now, consider the modified modelM′M^\{\\prime\}and the queryP​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\. By entering[Eq\.˜11](https://arxiv.org/html/2606.28024#S3.E11)into the definition of the full joint probability distributionPM′P\_\{M^\{\\prime\}\}encoded byM′M^\{\\prime\}\([Eq\.˜8](https://arxiv.org/html/2606.28024#S3.E8)\), we end up with

PM′​\(Q∣R1′=r1′,…,Rk′=rk′\)=1Z′​∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)∏ϕj′∈gr​\(𝚽′\)ϕj′​\(ℛj=𝒓j\)\.\\displaystyle P\_\{M^\{\\prime\}\}\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)=\\frac\{1\}\{Z^\{\\prime\}\}\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}\\prod\_\{\\phi^\{\\prime\}\_\{j\}\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\}\\phi^\{\\prime\}\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\)\.Furthermore, as every factorϕj′\\phi^\{\\prime\}\_\{j\}whose child is not in\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}is left unchanged \(i\.e\.,ϕj′​\(Rj1,…,Rjz\)=ϕj​\(Rj1,…,Rjz\)\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)=\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)ifRjz∉\{R1′,…,Rk′\}R\_\{j\_\{z\}\}\\notin\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\), it holds that

gr​\(𝚽′\)\\displaystyle\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)=\{ϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)∣Rjz∉\{R1′,…,Rk′\}\}\\displaystyle=\\\{\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\\mid R\_\{j\_\{z\}\}\\notin\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\\\}∪\{ϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)∣Rjz∈\{R1′,…,Rk′\}\}\\displaystyle\\cup\\\{\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\\mid R\_\{j\_\{z\}\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\\\}=gr^​\(𝚽\)∪\{ϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)∣Rjz∈\{R1′,…,Rk′\}\}\.\\displaystyle=\\hat\{\\mathrm\{gr\}\}\(\\boldsymbol\{\\Phi\}\)\\cup\\\{\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\\mid R\_\{j\_\{z\}\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\\\}\.Due to the modifications inM′M^\{\\prime\}, it holds that every factorϕj′​\(Rj1,…,Rjz\)∈\{ϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)∣Rjz∈\{R1′,…,Rk′\}\}\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\\{\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\\mid R\_\{j\_\{z\}\}\\in\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\\\}maps any assignment that assignsR1′=r1′,…,Rk′=rk′R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}to the value one\. Consequently, we end up with

PM′​\(Q∣R1′=r1′,…,Rk′=rk′\)\\displaystyle P\_\{M^\{\\prime\}\}\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)=1Z′​∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)∏ϕj′∈gr​\(𝚽′\)ϕj′​\(ℛj=𝒓j\)\\displaystyle=\\frac\{1\}\{Z^\{\\prime\}\}\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}\\prod\_\{\\phi^\{\\prime\}\_\{j\}\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\}\\phi^\{\\prime\}\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\)=1Z′​∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)∏ϕj∈gr^​\(𝚽\)ϕj​\(ℛj=𝒓j\)\.\\displaystyle=\\frac\{1\}\{Z^\{\\prime\}\}\\sum\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}\\prod\_\{\\phi\_\{j\}\\in\\hat\{\\mathrm\{gr\}\}\(\\boldsymbol\{\\Phi\}\)\}\\phi\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\)\.Thus, to conclude the proof of showing thatPM​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)=PM′​\(Q∣R1′=r1′,…,Rk′=rk′\)P\_\{M\}\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=P\_\{M^\{\\prime\}\}\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\), it remains to be shown thatZ′=1Z^\{\\prime\}=1\. The definition of the normalisation constantZ′Z^\{\\prime\}\([Eq\.˜9](https://arxiv.org/html/2606.28024#S3.E9)\) is given by

Z′=∑\(q,r1,…,rℓ,r1′,…,rk′\)⁣∈⁣×R∈gr​\(𝑨\)range​\(R\)∏ϕj′∈gr​\(𝚽′\)ϕj′​\(ℛj=𝒓j\)\.\\displaystyle Z^\{\\prime\}=\\sum\_\{\(q,r\_\{1\},\\ldots,r\_\{\\ell\},r^\{\\prime\}\_\{1\},\\ldots,r^\{\\prime\}\_\{k\}\)\\in\\times\_\{R\\in\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\}\\mathrm\{range\}\(R\)\}\\prod\_\{\\phi^\{\\prime\}\_\{j\}\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)\}\\phi^\{\\prime\}\_\{j\}\(\\mathcal\{R\}\_\{j\}=\\boldsymbol\{r\}\_\{j\}\)\.After the modification, every factorϕj′​\(Rj1,…,Rjz\)∈gr​\(𝚽′\)\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}^\{\\prime\}\)still represents a valid conditional probability distributionP\(Rjz=rjz∣Rj1=rj1,…,Rjz−1=rjz−1\)P\(R\_\{j\_\{z\}\}=r\_\{j\_\{z\}\}\\mid R\_\{j\_\{1\}\}=r\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\-1\}\}=r\_\{j\_\{z\-1\}\}\)because exactly one assignment ofRjzR\_\{j\_\{z\}\}is mapped to one while all other assignments are mapped to zero, thus ensuring that the sum of all assignments is one\. We can therefore again apply[Eq\.˜11](https://arxiv.org/html/2606.28024#S3.E11)to the definition of the normalisation constantZ′Z^\{\\prime\}and obtain

Z′=∑\(q,r1,…,rℓ,r1′,…,rk′\)⁣∈⁣×R∈gr​\(𝑨\)range​\(R\)∏Ri∈gr​\(𝑨\)P​\(ri∣pa𝑹​\(gr​\(M′\),Ri\)\),\\displaystyle Z^\{\\prime\}=\\sum\_\{\(q,r\_\{1\},\\ldots,r\_\{\\ell\},r^\{\\prime\}\_\{1\},\\ldots,r^\{\\prime\}\_\{k\}\)\\in\\times\_\{R\\in\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\}\\mathrm\{range\}\(R\)\}\\prod\_\{R\_\{i\}\\in\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M^\{\\prime\}\),R\_\{i\}\)\),whererir\_\{i\}is the assigned value forRiR\_\{i\}in the assignment\(q,r1,…,rℓ,r1′,…,rk′\)\(q,\\allowbreak r\_\{1\},\\allowbreak\\ldots,\\allowbreak r\_\{\\ell\},\\allowbreak r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak r^\{\\prime\}\_\{k\}\)andpa𝑹​\(gr​\(M\),Ri\)\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\_\{i\}\)is a projection of the assignment\(q,r1,…,rℓ,r1′,…,rk′\)\(q,\\allowbreak r\_\{1\},\\allowbreak\\ldots,\\allowbreak r\_\{\\ell\},\\allowbreak r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak r^\{\\prime\}\_\{k\}\)to the parents ofRiR\_\{i\}in the ground modelgr​\(M′\)\\mathrm\{gr\}\(M^\{\\prime\}\)\. In other words,Z′Z^\{\\prime\}is a sum over all entries in the full joint probability distribution, and hence, it holds thatZ′=1Z^\{\\prime\}=1as all entries in a full joint probability distribution must sum up to one\. ∎

An alternative way of verifying that the normalisation constant is equal to one if every factorϕj​\(Rj1,…,Rjz\)\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)represents a conditional probability distributionP​\(Rjz∣Rj1,…,Rjz−1\)P\(R\_\{j\_\{z\}\}\\mid R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\-1\}\}\)is to make use of[Eq\.˜11](https://arxiv.org/html/2606.28024#S3.E11)in the factorisation implied by the causal Markov property \([Eq\.˜4](https://arxiv.org/html/2606.28024#S2.E4)\)\. The resulting factorisation then is equivalent to the factorisation given in the definition of the semantics of aPCFG\([Eq\.˜8](https://arxiv.org/html/2606.28024#S3.E8)\) forZ=1Z=1and as both factorisations are valid,ZZhas to be equal to one\.

By modifying the original model, an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)can be answered by computing the result of*a single*probabilistic query in the modified model \(however, the modification of the model introduces some overhead\)\. Using the truncated product formula directly on the original model instead, we obtain a set of multiple probabilistic queries that have to be answered\. In both cases, a lifted inference algorithm such asLVEor theLJTalgorithm \(which is specifically advantageous if a set of probabilistic queries needs to be answered as a result of the truncated product formula\) can be applied to answer these queries on a lifted level\.

Before we give a full algorithm to efficiently answer interventional queries in aPCFG, we explain how the modification of the original model is done such that[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1)can be applied\. In particular, as only specific ground factors that have an intervention variable as a child are changed, we have to*split*parfactorssuch that modified ground factors can be separated from the remaining ground factors\. Splitting aparfactorin aPCFGMMresults in a modifiedPCFGM′M^\{\\prime\}entailing equivalent semantics asMM\[DeSalvoBraz2005a\]such thatM′M^\{\\prime\}forms a valid model on which lifted inference algorithms \(such asLVEand theLJTalgorithm\) can be run\. The procedure of splitting aparfactorϕ​\(𝒜\)\|C\\phi\(\\mathcal\{A\}\)\_\{\|C\}on a specific instanceA​\(l1,…,lz\)∈gr​\(A​\(ℒA\)\)A\(l\_\{1\},\\ldots,l\_\{z\}\)\\in\\mathrm\{gr\}\(A\(\\mathcal\{L\}\_\{A\}\)\), whereA​\(ℒA\)∈𝒜A\(\\mathcal\{L\}\_\{A\}\)\\in\\mathcal\{A\}is aPRVin the argument list ofϕ​\(𝒜\)\|C\\phi\(\\mathcal\{A\}\)\_\{\|C\}, replacesϕ​\(𝒜\)\|C\\phi\(\\mathcal\{A\}\)\_\{\|C\}by twoparfactorsϕ​\(𝒜\)\|C1\\phi\(\\mathcal\{A\}\)\_\{\|C\_\{1\}\}andϕ​\(𝒜\)\|C2\\phi\(\\mathcal\{A\}\)\_\{\|C\_\{2\}\}\. The constraintsC1C\_\{1\}andC2C\_\{2\}are chosen such that the inputs ofϕ​\(𝒜\)\|C1\\phi\(\\mathcal\{A\}\)\_\{\|C\_\{1\}\}are restricted to all sequences under constraintCCthat containA​\(l1,…,lz\)A\(l\_\{1\},\\ldots,l\_\{z\}\)and the inputs ofϕ​\(𝒜\)\|C2\\phi\(\\mathcal\{A\}\)\_\{\|C\_\{2\}\}are restricted to the remaining input sequences under constraintCC\.

We next present theLCIalgorithm, which efficiently answers interventional queries in aPCFGon a lifted level\. The basic idea ofLCIis to splitparfactorsbased on the intervention variables such that the parent factors of intervention variables are detached from their respective groups and thus can be changed according to[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1)\.

## 4The Lifted Causal Inference Algorithm

TheLCIalgorithm solves the problem of efficiently computing the effect of interventions in aPCFG\.\\Aclci avoids to fully ground thePCFGif possible to benefit from lifted inference\. For instance, consider again thePCFGMMillustrated in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)and assume we would like to compute the answer to the interventional queryP​\(R​e​v∣d​o​\(C​o​m​\(B​o​b\)=high\)\)P\(Rev\\mid do\(Com\(Bob\)=\\mathrm\{high\}\)\)inMM\. As the interventiond​o​\(C​o​m​\(B​o​b\)=high\)do\(Com\(Bob\)=\\mathrm\{high\}\)fixes the value ofC​o​m​\(B​o​b\)Com\(Bob\)tohigh\\mathrm\{high\}, we have to treatB​o​bBobdifferently fromA​l​i​c​eAliceandC​h​a​r​l​i​eCharlie, whose competences remain unobserved\. In other words, not all employees are indistinguishable anymore\. Nevertheless, and this is the crucial point, we can still treatA​l​i​c​eAliceandC​h​a​r​l​i​eCharlieas indistinguishable when computing the result of the interventional queryP​\(R​e​v∣d​o​\(C​o​m​\(B​o​b\)=high\)\)P\(Rev\\mid do\(Com\(Bob\)=\\mathrm\{high\}\)\)\.

Algorithm 1Lifted Causal InferenceInput:APCFGM=\(𝑨∪𝑮,𝑬,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\), and an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)with\{Q,R1′,…,Rk′\}⊆gr​\(𝑨\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\\{Q,R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\\subseteq\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{\\ell\},\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\. Output:The result of the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)\.

1:

M′←M^\{\\prime\}\\leftarrowPCFGobtained by splittingparfactorsin

MMon each

Ri′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}
2:for each

Ri′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}do

3:for each

ϕj′​\(Rj1,…,Rjz\)∈Pa​\(M′,Ri′\)\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\ldots,R\_\{j\_\{z\}\}\)\\in\\mathrm\{Pa\}\(M^\{\\prime\},R^\{\\prime\}\_\{i\}\)do

4:for eachassignment

\(rj1,…,rjz\)∈range​\(Rj1\)×…×range​\(Rjz\)\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)\\in\\mathrm\{range\}\(R\_\{j\_\{1\}\}\)\\times\\ldots\\times\\mathrm\{range\}\(R\_\{j\_\{z\}\}\)do

5:Set

ϕj′​\(rj1,…,rjz\)=\{1if\(rj1,…,rjz\)assignsRi′=ri′0if\(rj1,…,rjz\)assignsRi′≠ri′\\phi^\{\\prime\}\_\{j\}\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)=\\begin\{cases\}1&\\text\{if $\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)$ assigns $R^\{\\prime\}\_\{i\}=r^\{\\prime\}\_\{i\}$\}\\\\ 0&\\text\{if $\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)$ assigns $R^\{\\prime\}\_\{i\}\\neq r^\{\\prime\}\_\{i\}$\}\\end\{cases\}
6:

D←D\\leftarrowCallLVEon

M′M^\{\\prime\}and

P​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)
7:return

DD

### 4\.1Algorithm Description

We next describe theLCIalgorithm to compute the result of an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in aPCFGMMwhere every ground factor represents a conditional probability distribution\.[Algorithm˜1](https://arxiv.org/html/2606.28024#alg1)displays the entireLCIalgorithm, which we now discuss in detail\. First, in[˜1](https://arxiv.org/html/2606.28024#alg1.l1),LCIsplits theparfactorsinMMon the intervention variablesRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}to obtain a modifiedPCFGM′M^\{\\prime\}\. More specifically,LCIsplits everyparfactorϕ∈𝚽\\phi\\in\\boldsymbol\{\\Phi\}for which there is an instanceϕj∈gr​\(ϕ\)\\phi\_\{j\}\\in\\mathrm\{gr\}\(\\phi\)such that any intervention variableRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}is a child ofϕj\\phi\_\{j\}\. After the splitting procedure, the semantics of the model remains unchanged as the set of ground factors inM′M^\{\\prime\}is still the same as the set of ground factors of the initial modelMM\. The only difference after splitting is that the ground factors are now arranged differently across the sets of ground instances\. Having completed the split of all respectiveparfactors,LCInext changes the parentparfactorsof all intervention variablesRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}to modify the underlying probability distribution encoded byM′M^\{\\prime\}according to the semantics of the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\([˜2](https://arxiv.org/html/2606.28024#alg1.l2),[3](https://arxiv.org/html/2606.28024#alg1.l3),[4](https://arxiv.org/html/2606.28024#alg1.l4)and[5](https://arxiv.org/html/2606.28024#alg1.l5)\)\.\\Aclci changes theparfactorsinM′M^\{\\prime\}according to[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1)and thus, after theparfactorshave been changed,M′M^\{\\prime\}encodes the interventional distribution under the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\. More specifically, as each intervention variableRi′R^\{\\prime\}\_\{i\}is fixed to the valueri′r^\{\\prime\}\_\{i\}, all parentparfactorsϕj′​\(Rj1,…,Rjz\)∈Pa​\(M′,Ri′\)\\phi^\{\\prime\}\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{Pa\}\(M^\{\\prime\},R^\{\\prime\}\_\{i\}\)ofRi′R^\{\\prime\}\_\{i\}are altered such that all input sequences\(rj1,…,rjz\)\(r\_\{j\_\{1\}\},\\ldots,r\_\{j\_\{z\}\}\)assigningRi′=ri′R^\{\\prime\}\_\{i\}=r^\{\\prime\}\_\{i\}map to the value one while all other input sequences map to zero\. Finally,LCIcomputes the result of the probabilistic queryP​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)in the modified modelM′M^\{\\prime\}, which is, according to[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1), equivalent to the result of the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in the original modelMM\. To compute the result ofP​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)inM′M^\{\\prime\},LCIcallsLVEonP​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)andM′M^\{\\prime\}and then returns the result computed byLVE\([˜6](https://arxiv.org/html/2606.28024#alg1.l6)and[7](https://arxiv.org/html/2606.28024#alg1.l7)\)\. During this step,LVE\(which originally operates on aPFG\) ignores the edge directions inM′M^\{\\prime\}\. Since the semantics of the underlying full joint probability distribution encoded by aPFGare defined identically to the semantics of the probability distribution encoded by aPCFG,LVEcan also be applied to compute the result of probabilistic queries in aPCFG\(alternatively, a different lifted inference algorithm that works on aPFGcould be called as well\)\.

###### Example 7\(Lifted Causal Inference\)\.

Look at thePCFGMMshown in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)and consider the interventional queryP​\(R​e​v∣d​o​\(C​o​m​\(B​o​b\)=high\)\)P\(Rev\\mid do\(Com\(Bob\)=\\mathrm\{high\}\)\)\. In accordance with the previous examples, we assume thatdom​\(E\)=\{A​l​i​c​e,B​o​b,C​h​a​r​l​i​e\}\\mathrm\{dom\}\(E\)=\\\{Alice,\\allowbreak Bob,\\allowbreak Charlie\\\}\. SinceC​o​m​\(B​o​b\)Com\(Bob\)is a particular instance ofC​o​m​\(E\)Com\(E\), we have to split theparfactorϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}, which is a parentparfactorofC​o​m​\(E\)Com\(E\)\.[Figure˜3](https://arxiv.org/html/2606.28024#S4.F3)shows the modifiedPCFGM′M^\{\\prime\}obtained after splittingϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}onC​o​m​\(B​o​b\)Com\(Bob\)\. InM′M^\{\\prime\},ϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}has been replaced byϕ1​\(C​o​m​\(E\)\)\|C′\\phi\_\{1\}\(Com\(E\)\)\_\{\|C^\{\\prime\}\}\(the correspondingparfactornode isg1′g^\{\\prime\}\_\{1\}\) andϕ1​\(C​o​m​\(E\)\)\|C′′\\phi\_\{1\}\(Com\(E\)\)\_\{\|C^\{\\prime\\prime\}\}\(the correspondingparfactornode isg1′′g^\{\\prime\\prime\}\_\{1\}\), whereC′=\(E,\{B​o​b\}\)C^\{\\prime\}=\(E,\\\{Bob\\\}\)andC′′=\(E,\{A​l​i​c​e,C​h​a​r​l​i​e\}\)C^\{\\prime\\prime\}=\(E,\\\{Alice,\\allowbreak Charlie\\\}\)\. To incorporate the semantics of the interventiond​o​\(C​o​m​\(B​o​b\)\)=highdo\(Com\(Bob\)\)=\\mathrm\{high\},LCInext modifies the parentparfactorsofC​o​m​\(B​o​b\)Com\(Bob\), i\.e\.,LCImodifiesϕ1​\(C​o​m​\(E\)\)\|C′\\phi\_\{1\}\(Com\(E\)\)\_\{\|C^\{\\prime\}\}in this example \(sinceϕ1​\(C​o​m​\(E\)\)\|C′′\\phi\_\{1\}\(Com\(E\)\)\_\{\|C^\{\\prime\\prime\}\}is constrained toA​l​i​c​eAliceandC​h​a​r​l​i​eCharlie,ϕ1​\(C​o​m​\(E\)\)\|C′′\\phi\_\{1\}\(Com\(E\)\)\_\{\|C^\{\\prime\\prime\}\}is not a parent ofC​o​m​\(B​o​b\)Com\(Bob\)and hence not changed\)\. More specifically,ϕ1​\(C​o​m​\(E\)=high\)\|C′\\phi\_\{1\}\(Com\(E\)=\\mathrm\{high\}\)\_\{\|C^\{\\prime\}\}is set to one andϕ1​\(C​o​m​\(E\)=low\)\|C′\\phi\_\{1\}\(Com\(E\)=\\mathrm\{low\}\)\_\{\|C^\{\\prime\}\}is set to zero\. Finally,LCIrunsLVEto computeP​\(R​e​v∣C​o​m​\(B​o​b\)=high\)P\(Rev\\mid Com\(Bob\)=\\mathrm\{high\}\)inM′M^\{\\prime\}, which is equivalent to computingP​\(R​e​v∣d​o​\(C​o​m​\(B​o​b\)=high\)\)P\(Rev\\mid do\(Com\(Bob\)=\\mathrm\{high\}\)\)in the original modelMM\.

C​o​m​\(E\)Com\(E\)g1′g^\{\\prime\}\_\{1\}g1′′g^\{\\prime\\prime\}\_\{1\}g2g\_\{2\}g3g\_\{3\}R​e​vRevS​a​l​\(E\)Sal\(E\)Figure 3:A visualisation of the modifiedPCFGobtained after altering thePCFGshown in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)by splittingϕ1​\(C​o​m​\(E\)\)\|⊤\\phi\_\{1\}\(Com\(E\)\)\_\{\|\\top\}onC​o​m​\(B​o​b\)Com\(Bob\)\.\\Ac

lci is able to handle both interventions on a single \(ground\)randvaras well as interventions on a conjunction of multiplerandvarsefficiently\. In particular, when intervening on multiple indistinguishablerandvarsat the same time,LCIis able to treat thoserandvarsas a group even after the intervention\. For instance, assume that in our running example, we would like to train multiple employees simultaneously, as a training program is mostly offered not only for a single employee but for a group of employees \(here, training an employeee∈dom​\(E\)e\\in\\mathrm\{dom\}\(E\)corresponds to the interventiond​o​\(C​o​m​\(e\)=high\)do\(Com\(e\)=\\mathrm\{high\}\)\)\. Then, it is not necessary to split all trained employees into separate groups but instead it is sufficient to differentiate between trained employees and all remaining employees\. Formally, the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)on an arbitrary set ofrandvars\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}can thus efficiently be handled by splitting theparfactorsinMMsuch that allRi′R^\{\\prime\}\_\{i\}that are represented by the samePRVAAand that are set to the same valueri′∈range​\(Ri′\)r^\{\\prime\}\_\{i\}\\in\\mathrm\{range\}\(R^\{\\prime\}\_\{i\}\)remain grouped\. Specifically,LCIneeds just a single split on theparfactorsper group and thus avoids manipulating the parents of each individualrandvarseparately\. In contrast, in a propositional model, every object has to be treated individually and therefore the parents for eachrandvarneed to be manipulated separately\. Given the way we specified the semantics of an intervention in aPCFG, it immediately follows thatLCIcorrectly computes the effect of interventions\.

###### Proposition 2\.

The result computed byLCI\([Alg\.˜1](https://arxiv.org/html/2606.28024#alg1)\) is the correct answer to the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in the givenPCFGMM\.

###### Proof\.

AsLCIdirectly applies[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1)by setting the parent factors of all intervention variables accordingly, the result of the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in the original modelMMis equivalent to the result of the probabilistic queryP​\(Q∣R1′=r1′,…,Rk′=rk′\)P\(Q\\mid R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)in the modified modelM′M^\{\\prime\}computed byLCI\. ∎

Moreover, by callingLVE,LCIallows for tractable probabilistic inference problems with respect to domain sizes oflogvars\. Thus,LCIruns in polynomial time with respect to domain sizes oflogvarsfor allPCFGsbelonging to the class of domain\-liftable models\. The class of domain\-liftable models includes allPCFGscontaining onlyparfactorswith at most twologvarsand allPCFGscontaining onlyPRVshaving at most onelogvar\[VanDenBroeck2011a\]\.

###### Proposition 3\.

\\Ac

lci \([Alg\.˜1](https://arxiv.org/html/2606.28024#alg1)\) allows for tractable probabilistic inference problems with respect to domain sizes oflogvarsfor the class of domain\-liftable models\.

###### Proof\.

If the inputPCFGMMforLCIbelongs to the class of domain\-liftable models, so does the modified modelM′M^\{\\prime\}obtained after splitting theparfactorsinMMand changing the parent factors of the intervention variables because newly introducedparfactorscontain identicallogvarsas the originalparfactorsthat were split\. Consequently, the input forLVE, which is given byM′M^\{\\prime\}, belongs to the class of domain\-liftable models and asLVEis complete for this model class\[Taghipour2013b\]\(that is,LVEruns in polynomial time with respect to the domain sizes of thelogvarsin its input model for all combinations of queries, evidence, and models in this class\), the call ofLVEallows for tractable probabilistic inference problems with respect to domain sizes oflogvarsprovided thatMMbelongs to the class of domain\-liftable models\. Furthermore, both the splitting procedure in[˜1](https://arxiv.org/html/2606.28024#alg1.l1)and the loops in[˜2](https://arxiv.org/html/2606.28024#alg1.l2),[3](https://arxiv.org/html/2606.28024#alg1.l3),[4](https://arxiv.org/html/2606.28024#alg1.l4)and[5](https://arxiv.org/html/2606.28024#alg1.l5)of[Alg\.˜1](https://arxiv.org/html/2606.28024#alg1)do not influence the overall time complexity ofLCI\(as the loops iterate over potential tables that must be considered anyway during inference\)\. Thus,LCIallows for tractable probabilistic inference problems with respect to domain sizes oflogvarsfor the class of domain\-liftable models\. ∎

To summarise,LCIis a simple, yet effective algorithm to perform lifted causal inference\.\\Aclci can also handle queries with multiple query variables, provided that the lifted inference algorithm which is called in[˜6](https://arxiv.org/html/2606.28024#alg1.l6)can handle multiple query variables as well\. Next, we take a look at our experiments, which highlight the practical performance ofLCIto compute the effect of interventions in aPCFGon a lifted level\.

### 4\.2Experiments

In this subsection, we evaluate the runtimes needed to compute the result of interventional queries inCBNs,CFGs, andPCFGs\. For our experiments, we use a slightly modified version of thePCFGMMgiven in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2), whose groundCFGdirectly corresponds to aCBN\. In addition to thePCFGMM, we also investigate runtimes for causal inference in theCFGobtained by groundingMM, and its equivalentCBN\. To obtain the equivalentCBN, we apply the transformation from directedFGtoCBNproposed byFrey2003a\. Hence, all three models, thePCFG, theCFG, and theCBN, encode the same underlying full joint probability distribution\. As a remark, we note that thePCFGused in our experiments to demonstrate the practical efficiency of lifted causal inference is rather small with fourparfactorsandPRVs, respectively, and the gain we obtain from lifted inference might further increase for models consisting of morePRVs\. Our experiments can thus be seen as a proof of concept demonstrating the practical efficiency of lifted causal inference and more extensive experiments onPCFGswith various graph structures and onPCFGswith morePRVsare left for future work\.

101001000100008163264128256512102420484096Number of employeesddtime \(ms\)LVE \(PCFG\)VE \(CBN\)VE \(CFG\)Figure 4:A comparison of the runtimes required to compute interventional distributions on different graphical models encoding equivalent full joint probability distributions\.We test the required runtime to compute the result of an interventional query for each of the three graphical models on different graph sizes by setting the domain size of the employees tod∈\{8,16,32,64,128,256,512,1024,2048,4096\}d\\in\\\{8,\\allowbreak 16,\\allowbreak 32,\\allowbreak 64,\\allowbreak 128,\\allowbreak 256,\\allowbreak 512,\\allowbreak 1024,\\allowbreak 2048,\\allowbreak 4096\\\}, that is,\|dom​\(E\)\|=d\\lvert\\mathrm\{dom\}\(E\)\\rvert=d\.[Figure˜4](https://arxiv.org/html/2606.28024#S4.F4)shows the runtimes needed to compute the result of the probabilistic query in the modified graph when runningvariable elimination\(VE\) on theCFG,VEon theCBN, andLVEon thePCFG\. As we have seen,VEalgorithm is the propositional counterpart ofLVEand operates on a propositional \(ground\) model, such as aCBNor anCFG\. Consequently,VEconsiders every object \(e\.g\., every employee\) individually for computations, independent of whether objects are indistinguishable or not\. In contrast,LVEtreats indistinguishable objects as a group by using a representative for computations instead of considering each of those objects separately\. The results emphasise that theLCIalgorithm, which internally exploitsLVE, overcomes scalability issues for large domain sizes as the runtime ofLVE, in contrast to the runtimes ofVEon theCBNand theCFG, does not exponentially increase for increasing values ofdd\(y\-axis is log\-scaled\)\. Even though the splitting ofparfactorsresults in a less compressed lifted representation, it becomes evident that the performance ofLVEis not significantly affected by the splitting\.

WhileLCIin combination with a givenPCFGsolves the problem of efficiently computing the effect of interventions on a lifted level, in practice, we often face the problem of not knowing all the underlying causal relationships\. Thus, in the upcoming section, we relax the assumption of knowing all causal relationships by allowing for partial causal knowledge\. In particular, we introducePD\-PCFGsa generalisation ofPCFGsand investigate the implications of not knowing all causal relationships for answering interventional queries\. Moreover, we have hitherto assumed that every factor in aPCFGencodes a conditional probability distribution and we also abandon this assumption in the next section\.

## 5Partially Directed Parametric Causal Factor Graphs

We now move on to define aPD\-PCFGas a lifted representation that is able to incorporate partial causal knowledge by combining aPFGwith a partially directed graph to model causal relationships\. The major advantage of aPD\-PCFGover aPCFGis that not all causal relationships between the involvedrandvarsneed to be known, thereby reducing the amount of prior knowledge required and thus making the formalism more suitable for many practical settings\.

###### Definition 16\(Partially Directed Parametric Causal Factor Graph\)\.

A*PD\-PCFG*M=\(𝐕,𝐄,𝚽\)M=\(\\boldsymbol\{V\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)consists of a partially directed graph\(𝐕,𝐄\)\(\\boldsymbol\{V\},\\boldsymbol\{E\}\)with node set𝐕=𝐀∪𝐆\\boldsymbol\{V\}=\\boldsymbol\{A\}\\cup\\boldsymbol\{G\}and edge set𝐄⊆𝐀×𝐆\\boldsymbol\{E\}\\subseteq\\boldsymbol\{A\}\\times\\boldsymbol\{G\}\. The set of nodes𝐕=𝐀∪𝐆\\boldsymbol\{V\}=\\boldsymbol\{A\}\\cup\\boldsymbol\{G\}is partitioned into a set ofPRVs𝐀=\{A1,…,An\}\\boldsymbol\{A\}=\\\{A\_\{1\},\\ldots,A\_\{n\}\\\}and a set ofparfactornames \(parfactornodes\)𝐆=\{g1,…,gm\}\\boldsymbol\{G\}=\\\{g\_\{1\},\\ldots,g\_\{m\}\\\}\. For everyparfactornamegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}, there is a function definition \(parfactor\)ϕj​\(𝒜j\)\|C∈𝚽\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|C\}\\in\\boldsymbol\{\\Phi\}with𝒜j\\mathcal\{A\}\_\{j\}being a sequence ofPRVsfrom𝐀\\boldsymbol\{A\}andCCbeing a constraint on thelogvarsof𝒜j\\mathcal\{A\}\_\{j\}such thatϕ:×A∈𝒜jrange\(A\)↦ℝ≥0\\phi\\colon\\times\_\{A\\in\\mathcal\{A\}\_\{j\}\}\\mathrm\{range\}\(A\)\\mapsto\\mathbb\{R\}\_\{\\geq 0\}maps range values in𝒜j\\mathcal\{A\}\_\{j\}to a positive real number \(potential\)\. As usual, in every function definition, at least one potential has to be non\-zero and we may omit\|⊤\|\\topinϕj​\(𝒜j\)\|⊤\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|\\top\}\. For eachparfactornamegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}with corresponding function definitionϕj​\(𝒜j\)\|C∈𝚽\\phi\_\{j\}\(\\mathcal\{A\}\_\{j\}\)\_\{\|C\}\\in\\boldsymbol\{\\Phi\}, there is either an undirected edge\{Ai,gj\}∈𝐄\\\{A\_\{i\},g\_\{j\}\\\}\\in\\boldsymbol\{E\}or a directed edge\(gj,Ai\)∈𝐄\(g\_\{j\},A\_\{i\}\)\\in\\boldsymbol\{E\}for everyPRVAi∈𝒜jA\_\{i\}\\in\\mathcal\{A\}\_\{j\}\. We stipulate that for everyparfactornodegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}, there is at most one outgoing directed edge\(gj,Ai\)∈𝐄\(g\_\{j\},A\_\{i\}\)\\in\\boldsymbol\{E\}among the edges incident togjg\_\{j\}\. In case aparfactorϕj∈𝚽\\phi\_\{j\}\\in\\boldsymbol\{\\Phi\}has only a single argumentA∈𝐀A\\in\\boldsymbol\{A\}, its correspondingparfactornodegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}is connected toAAvia a directed edge\(gj,A\)∈𝐄\(g\_\{j\},A\)\\in\\boldsymbol\{E\}\. Furthermore, each directed edgeAk−gj→AiA\_\{k\}\-g\_\{j\}\\to A\_\{i\}from aPRVAk∈𝐀A\_\{k\}\\in\\boldsymbol\{A\}to aPRVAi∈𝐀A\_\{i\}\\in\\boldsymbol\{A\}via aparfactornodegj∈𝐆g\_\{j\}\\in\\boldsymbol\{G\}corresponds to a direct causal relationship betweenAkA\_\{k\}andAiA\_\{i\}\. The directed edges in𝐄\\boldsymbol\{E\}are not allowed to form any directed cycles, i\.e\.,𝐄\\boldsymbol\{E\}contains no sequence of edges\{A1,g1\},\(g1,A2\),…,\{Ak−1,gk\},\(gk,A1\)\\\{A\_\{1\},\\allowbreak g\_\{1\}\\\},\\allowbreak\(g\_\{1\},\\allowbreak A\_\{2\}\),\\allowbreak\\ldots,\\allowbreak\\\{A\_\{k\-1\},\\allowbreak g\_\{k\}\\\},\\allowbreak\(g\_\{k\},\\allowbreak A\_\{1\}\)starting from an arbitraryPRVA1∈𝐀A\_\{1\}\\in\\boldsymbol\{A\}such that the sequence ends again atA1A\_\{1\}while every second edge in the sequence is directed and the sequence follows the arrow directions of the directed edges\. The semantics ofMMis given by grounding with respect to constraints and building a full joint distribution over𝐑=gr​\(𝐀\)\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)\. The joint potential for an assignment𝐑=𝐫\\boldsymbol\{R\}=\\boldsymbol\{r\}is defined as

ψM​\(𝑹=𝒓\)=∏ϕj∈𝚽∏ϕk∈gr​\(ϕj\)ϕk​\(ℛk=𝒓k\),\\displaystyle\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=\\prod\_\{\\phi\_\{j\}\\in\\boldsymbol\{\\Phi\}\}\\prod\_\{\\phi\_\{k\}\\in\\mathrm\{gr\}\(\\phi\_\{j\}\)\}\\phi\_\{k\}\(\\mathcal\{R\}\_\{k\}=\\boldsymbol\{r\}\_\{k\}\),\(12\)where𝐫k\\boldsymbol\{r\}\_\{k\}is a projection of𝐫\\boldsymbol\{r\}to the argument listℛk\\mathcal\{R\}\_\{k\}ofϕk\\phi\_\{k\}\. The normalised joint potential then yields the full joint probability distribution over𝐑\\boldsymbol\{R\}that is encoded byMM, that is,

PM​\(𝑹=𝒓\)=1Z​ψM​\(𝑹=𝒓\),\\displaystyle P\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)=\\frac\{1\}\{Z\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\),\(13\)whereZZis the normalisation constant, defined as

Z=∑𝒓⁣∈⁣×R∈𝑹range​\(R\)ψM​\(𝑹=𝒓\)\.\\displaystyle Z=\\sum\_\{\\boldsymbol\{r\}\\in\\times\_\{R\\in\\boldsymbol\{R\}\}\\mathrm\{range\}\(R\)\}\\psi\_\{M\}\(\\boldsymbol\{R\}=\\boldsymbol\{r\}\)\.\(14\)

APD\-PCFGthus offers the possibility to omit edge directions if no information about the underlying causal relationships is available\. Grounding aPD\-PCFGMMyields a partially directedCFG, which encodes the same underlying full joint probability distribution asMM\. If all causal relationships are known \(and hence allparfactornodes have an outgoing edge\), aPD\-PCFGis identical to aPCFG\. In aPD\-PCFGM=\(𝑨∪𝑮,𝑬,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\), we follow the same notations for the parentPRVsPa𝑨​\(M,A\)\\mathrm\{Pa\}\_\{\\boldsymbol\{A\}\}\(M,A\)of aPRVA∈𝑨A\\in\\boldsymbol\{A\}, the parentPRVsPa​\(M,g\)\\mathrm\{Pa\}\(M,g\)of aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}, the childPRVsCh𝑨​\(M,A\)\\mathrm\{Ch\}\_\{\\boldsymbol\{A\}\}\(M,A\)of aPRVA∈𝑨A\\in\\boldsymbol\{A\}, the childPRVsCh​\(M,g\)\\mathrm\{Ch\}\(M,g\)of aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}, the descendantPRVsDe𝑨​\(M,A\)\\mathrm\{De\}\_\{\\boldsymbol\{A\}\}\(M,A\)of aPRVA∈𝑨A\\in\\boldsymbol\{A\}, and the descendantPRVsDe​\(M,g\)\\mathrm\{De\}\(M,g\)of aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}as in aPCFG\. Additionally, we define the \(undirected\) neighbourPRVsof aPRVA∈𝑨A\\in\\boldsymbol\{A\}inMMasNe𝑨​\(M,A\)=\{A′∈𝑨∣∃g∈𝑮:Ch​\(M,g\)=∅∧\{g,A′\}∈𝑬∧\{g,A\}∈𝑬\}\\mathrm\{Ne\}\_\{\\boldsymbol\{A\}\}\(M,A\)=\\\{A^\{\\prime\}\\in\\boldsymbol\{A\}\\mid\\exists g\\in\\boldsymbol\{G\}\\colon\\mathrm\{Ch\}\(M,g\)=\\emptyset\\land\\\{g,A^\{\\prime\}\\\}\\in\\boldsymbol\{E\}\\land\\\{g,A\\\}\\in\\boldsymbol\{E\}\\\}\. In contrast to aPCFG, the set of childPRVsCh​\(M,g\)\\mathrm\{Ch\}\(M,g\)of aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}in aPD\-PCFGMMmay be empty\. More specifically, as everyparfactornode has at most one outgoing edge, it holds that\|Ch​\(M,g\)\|≤1\\lvert\\mathrm\{Ch\}\(M,g\)\\rvert\\leq 1for everyparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}\. If aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}corresponds to aparfactorwith a single argument, it always holds that\|Ch​\(M,g\)\|=1\\lvert\\mathrm\{Ch\}\(M,g\)\\rvert=1\. From now on, we also drop the assumption that every ground factorϕj​\(Rj1,…,Rjz\)∈gr​\(𝚽\)\\phi\_\{j\}\(R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\}\}\)\\in\\mathrm\{gr\}\(\\boldsymbol\{\\Phi\}\)in aPD\-PCFGMMencodes a conditional probability distributionP​\(Rjz∣Rj1,…,Rjz−1\)P\(R\_\{j\_\{z\}\}\\mid R\_\{j\_\{1\}\},\\allowbreak\\ldots,\\allowbreak R\_\{j\_\{z\-1\}\}\)\. Let us next consider a modified version of our running example, where the underlying causal relationships are only partially known, resulting in aPD\-PCFGinstead of a fully directedPCFG\.

I​n​t​\(E\)Int\(E\)C​o​m​\(E\)Com\(E\)R​e​vRevS​a​l​\(E\)Sal\(E\)g1g\_\{1\}g2g\_\{2\}g3g\_\{3\}Figure 5:APD\-PCFGthat extends thePCFGdepicted in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)in the sense that an additionalPRVI​n​t​\(E\)Int\(E\)has been added to the model\. We omit the specification of the potential tables of the \(par\)factors for brevity\.###### Example 8\(Partially Directed Parametric Causal Factor Graph\)\.

[Figure˜5](https://arxiv.org/html/2606.28024#S5.F5)depicts aPD\-PCFGMM, which extends thePCFGgiven in[Fig\.˜2](https://arxiv.org/html/2606.28024#S3.F2)\. In particular, there is an additionalPRVI​n​t​\(E\)Int\(E\)inMM, which represents the intelligence of an employee\. Moreover, for the sake of the example, there is no information available about the causal relationship betweenI​n​t​\(E\)Int\(E\)andC​o​m​\(E\)Com\(E\)\. Asg1g\_\{1\}has no outgoing directed edge, we haveNe𝐀​\(M,C​o​m​\(E\)\)=\{I​n​t​\(E\)\}\\mathrm\{Ne\}\_\{\\boldsymbol\{A\}\}\(M,Com\(E\)\)=\\\{Int\(E\)\\\}andNe𝐀​\(M,I​n​t​\(E\)\)=\{C​o​m​\(E\)\}\\mathrm\{Ne\}\_\{\\boldsymbol\{A\}\}\(M,Int\(E\)\)=\\\{Com\(E\)\\\}, whereas the remainingPRVshave no undirected neighbourPRVs\. We omit the set notation ofMMfor brevity\.

Separation in aPD\-PCFGis defined as in aPCFG, that is, the conditions specifying when a path is blocked are identical\. Again, it is also possible to check whetherPRVs\(instead of groundrandvars\) are conditionally independent in a highly efficient manner on a lifted level in aPD\-PCFG\. In aPD\-PCFG, everyPRVA​\(ℒ\)\|CA\(\\mathcal\{L\}\)\_\{\|C\}is represented by a single variable node and thus, checking for conditional independence statements that involveAAcan be done by looking at this single variable node instead of taking into account all groundings ofAAindividually\.

In accordance with our previous assumptions, whenever we deal with aPD\-PCFGin this article, we demand that whenever a probability distributionPPis modelled using aPD\-PCFGMM,PPsatisfies the global Markov property with respect toMM\. We further stipulate that all directed edges in aPD\-PCFGMMare causal and hence accurately represent causal relationships between the involvedrandvars\.

We next show how the computation of the effect of interventions can efficiently be realised in aPD\-PCFG\. An important challenge is that the effect of an intervention might differ depending on the actual causal relationships between therandvars\. As there might be multiple possible causal explanations and we do not know the correct one, it is not always possible to uniquely determine the effect of an intervention\.

The semantics of an intervention is defined on a fully directed graph\. In particular, the interventional distribution is defined as a factorisation over the conditional probability distributions of allrandvars, which are no intervention variables, given their parents\. APD\-PCFG, however, might containparfactornodes without any outgoing directed edges, thereby possibly leading to unknown sets of parents for somerandvars\. Thus, when computing the effect of an intervention, we have to take all possible parent sets of the intervention variables into account\. In general, not all combinations of orienting the undirected edges in aPD\-PCFGare consistent with the conditional independence statements holding in the underlying probability distribution\. More specifically, everyPD\-PCFGMMrepresents a set of fully directedPCFGsobtained by orienting the undirected edges inMMsuch that everyparfactornode has exactly one outgoing directed edge and the resulting model entails the same conditional independence statements asMM\. We formalise this concept in the following definition\.

###### Definition 17\(Consistent Extension\)\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\},\\allowbreak\\boldsymbol\{\\Phi\}\)denote aPD\-PCFG\. APCFGM′=\(𝐀∪𝐆,𝐄′,𝚽\)M^\{\\prime\}=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\}^\{\\prime\},\\allowbreak\\boldsymbol\{\\Phi\}\)is a*consistent extension*ofMMif

1. 1\.every directed edge\(g,A\)∈𝑬\(g,A\)\\in\\boldsymbol\{E\}is also in𝑬′\\boldsymbol\{E\}^\{\\prime\},
2. 2\.for everyparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}withCh​\(M,g\)=∅\\mathrm\{Ch\}\(M,g\)=\\emptyset, exactly one edge\{A,g\}∈𝑬\\\{A,g\\\}\\in\\boldsymbol\{E\}is replaced by an edge\(g,A\)\(g,A\)in𝑬′\\boldsymbol\{E\}^\{\\prime\}, and
3. 3\.M′M^\{\\prime\}entails the same conditional independence statements asMM\.

We denote the set of all consistent extensions ofMMas\[M\]\[M\]\.

In other words, a consistent extension of aPD\-PCFGMMis aPCFGobtained by orienting the undirected edges inMMsuch that everyparfactornode has exactly one outgoing directed edge and the implied conditional independence statements in the extension remain the same as inMM\. The set of consistent extensions\[M\]\[M\]of aPD\-PCFGMMmight be empty\. If\[M\]≠∅\[M\]\\neq\\emptyset, we say thatMMis extendable\. The concept of a consistent extension is closely related to the concept of a Markov equivalence class, which is a set of directed acyclic graphs that entail the same conditional independence statements\[Verma1990a,Andersson1997a\]\(and thus, the set of consistent extensions of a partially directed acyclic graph is a subset of a Markov equivalence class\)\.

###### Example 9\(Consistent Extension\)\.

Consider again thePD\-PCFGMMdepicted in[Fig\.˜5](https://arxiv.org/html/2606.28024#S5.F5)\. The set of consistent extensions\[M\]\[M\]ofMMcontains two fully directedPD\-PCFGsM1M\_\{1\}andM2M\_\{2\}, which are illustrated in[Fig\.˜6\(a\)](https://arxiv.org/html/2606.28024#S5.F6.sf1)and[Fig\.˜6\(b\)](https://arxiv.org/html/2606.28024#S5.F6.sf2), respectively\. InM1M\_\{1\}, the edgeg1−C​o​m​\(E\)g\_\{1\}\-Com\(E\)has been replaced by an edgeg1→C​o​m​\(E\)g\_\{1\}\\to Com\(E\)and inM2M\_\{2\}, the edgeI​n​t​\(E\)−g1Int\(E\)\-g\_\{1\}has been replaced by an edgeg1→I​n​t​\(E\)g\_\{1\}\\to Int\(E\)\. BothM1M\_\{1\}andM2M\_\{2\}entail the same conditional independence statements asMMand could possibly model the correct underlying causal relationships but we do not know whetherM1M\_\{1\}orM2M\_\{2\}is actually the correct model\.

I​n​t​\(E\)Int\(E\)C​o​m​\(E\)Com\(E\)R​e​vRevS​a​l​\(E\)Sal\(E\)g1g\_\{1\}g2g\_\{2\}g3g\_\{3\}

\(a\)
I​n​t​\(E\)Int\(E\)C​o​m​\(E\)Com\(E\)R​e​vRevS​a​l​\(E\)Sal\(E\)g1g\_\{1\}g2g\_\{2\}g3g\_\{3\}

\(b\)

Figure 6:A graphical illustration of the set of consistent extensions\[M\]=\{M1,M2\}\[M\]=\\\{M\_\{1\},M\_\{2\}\\\}of thePD\-PCFGMMshown in[Fig\.˜5](https://arxiv.org/html/2606.28024#S5.F5)\. \(a\) shows thePCFGM1M\_\{1\}, whereC​o​m​\(E\)−g1Com\(E\)\-g\_\{1\}has been oriented asg1→C​o​m​\(E\)g\_\{1\}\\to Com\(E\), and \(b\) shows thePCFGM2M\_\{2\}, whereI​n​t​\(E\)−g1Int\(E\)\-g\_\{1\}has been oriented asg1→I​n​t​\(E\)g\_\{1\}\\to Int\(E\)\.We remark that the definition of a consistent extension refers to edges in thePD\-PCFGMMinstead of referring to edges in the ground modelgr​\(M\)\\mathrm\{gr\}\(M\)\. Thus, we assume that every edge\{A,g\}\\\{A,g\\\}inMMrepresents a set of edges ingr​\(M\)\\mathrm\{gr\}\(M\)such that all of the edges in this set are oriented in the same way\. For instance, in the ground graph of our running example, we do not allow any orientation where, e\.g\.,I​n​t​A−f1→C​o​m​AIntA\-f\_\{1\}\\to ComAandI​n​t​B←f2−C​o​m​BIntB\\leftarrow f\_\{2\}\-ComBoccur at the same time\. As we assume thatA​l​i​c​eAliceandB​o​bBobare indistinguishable, we also assume identical edge orientations for their correspondingrandvars\. Defining consistent extensions on the lifted level is, however, not necessary to apply the approaches presented here\. If wanted, the set of consistent extensions of aPD\-PCFGMMcan also be defined with respect to the ground model ofMM\(however, a definition with respect to the ground model is rather unintuitive as objects are not really indistinguishable if surrounding edges are oriented differently\)\.

To determine the effect of an intervention, we need to know the parents of the intervention variables\. As anyPD\-PCFGrepresents a set of consistent extensions, there are various possible parent sets for the intervention variables in general\. Fortunately, we do not always have to consider all consistent extensions of a givenPD\-PCFGMMto compute the effect of an intervention because there might be consistent extensions with identical parent sets for the intervention variables, thereby leading to the same effect of the intervention\. Therefore, in case all parents of therandvarson which we intervene are known, we can uniquely determine the effect of an intervention even if there are still undirected edges present inMM\. This result has been shown for propositional partially directed acyclic graphs\[Maathuis2009a,Nandy2017a\]and we now transfer this result toPD\-PCFGs\.

###### Theorem 4\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\},\\allowbreak\\boldsymbol\{\\Phi\}\)denote aPD\-PCFG, let𝐑=gr​\(𝐀\)=\{R1,…,Rn\}\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{n\}\\\}and letd​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)be an intervention on\{R1′,…,Rk′\}⊆𝐑\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\\subseteq\\boldsymbol\{R\}\. IfNe𝐑​\(gr​\(M\),R1′\)=∅,…,Ne𝐑​\(gr​\(M\),Rk′\)=∅\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{1\}\)=\\emptyset,\\allowbreak\\ldots,\\allowbreak\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{k\}\)=\\emptyset, then the interventional distributionPM′​\(R1=r1,…,Rn=rn∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\_\{M^\{\\prime\}\}\(R\_\{1\}=r\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{n\}=r\_\{n\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)under the interventiond​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)is identical in all consistent extensionsM′∈\[M\]M^\{\\prime\}\\in\[M\]ofMM\.

###### Proof\.

The interventional distribution of aPCFGM′∈\[M\]M^\{\\prime\}\\in\[M\]is given by \([Def\.˜14](https://arxiv.org/html/2606.28024#Thmdefinition14)\):

PM′\\displaystyle P\_\{M^\{\\prime\}\}\(R1=r1,…,Rn=rn∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)\\displaystyle\(R\_\{1\}=r\_\{1\},\\ldots,R\_\{n\}=r\_\{n\}\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)=\{∏Ri∈\{R1,…,Rn\}∖\{R1′,…,Rk′\}P​\(ri∣pa𝑹​\(gr​\(M′\),Ri\)\)if​∀j∈\{1,…,k\}:rj=rj′0otherwise\.\\displaystyle=\\begin\{cases\}\\prod\\limits\_\{R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{n\}\\\}\\setminus\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M^\{\\prime\}\),R\_\{i\}\)\)&\\text\{if \}\\forall j\\in\\\{1,\\ldots,k\\\}\\colon r\_\{j\}=r^\{\\prime\}\_\{j\}\\\\ 0&\\text\{otherwise\}\.\\end\{cases\}Given thatNe𝑹​\(gr​\(M\),R1′\)=∅,…,Ne𝑹​\(gr​\(M\),Rk′\)=∅\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{1\}\)=\\emptyset,\\ldots,\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{k\}\)=\\emptyset, the parentsPa𝑹​\(gr​\(M\),R1′\),…,Pa𝑹​\(gr​\(M\),Rk′\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{1\}\),\\allowbreak\\ldots,\\allowbreak\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{k\}\)ofR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}inMMare fully known and identical in allM′∈\[M\]M^\{\\prime\}\\in\[M\], i\.e\.,Pa𝑹​\(gr​\(M\),Ri′\)=Pa𝑹​\(gr​\(M′\),Ri′\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{i\}\)=\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M^\{\\prime\}\),R^\{\\prime\}\_\{i\}\)for allM′∈\[M\]M^\{\\prime\}\\in\[M\]and alli∈\{1,…,k\}i\\in\\\{1,\\allowbreak\\ldots,\\allowbreak k\\\}\. The conditional probability distributions being removed from the product of the interventional distribution thus are identical for allM′∈\[M\]M^\{\\prime\}\\in\[M\]\. Hence, it remains to be shown that the factorisation of all groundrandvarsthat are not in\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}is equivalent for all consistent extensionsM′∈\[M\]M^\{\\prime\}\\in\[M\]ofMM\. We know that everyPCFGM′∈\[M\]M^\{\\prime\}\\in\[M\]entails exactly the same conditional independence statements asMMand thus, the factorisation induced by anyPCFGM′∈\[M\]M^\{\\prime\}\\in\[M\]is valid, i\.e\., allPCFGsM′∈\[M\]M^\{\\prime\}\\in\[M\]encode the same underlying full joint probability distributionPPasMM\. Therefore, as the parents of\{R1′,…,Rk′\}\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}are identical in allM′∈\[M\]M^\{\\prime\}\\in\[M\]and allM′∈\[M\]M^\{\\prime\}\\in\[M\]encode the same probability distribution, the product over the conditional distributions ofRi∈\{R1,…,Rn\}∖\{R1′,…,Rk′\}R\_\{i\}\\in\\\{R\_\{1\},\\ldots,R\_\{n\}\\\}\\setminus\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}given their respective parents is identical in allM′∈\[M\]M^\{\\prime\}\\in\[M\]\(just as allBNstructures over a fixed set ofrandvarsentailing the same conditional independence statements induce equivalent factorisations of the underlying probability distribution\)\. Consequently, the interventional distribution is identical in all consistent extensionsM′∈\[M\]M^\{\\prime\}\\in\[M\]ofMM\. ∎

A direct consequence of[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)is that the result of any interventional query is uniquely determined if there are no undirected edges connected to the intervention variables in the corresponding ground graph of aPD\-PCFG\.

###### Corollary 1\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\},\\allowbreak\\boldsymbol\{\\Phi\}\)denote aPD\-PCFG, let𝐑=gr​\(𝐀\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{\\ell\},\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}and letP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)be an interventional query\. IfNe𝐑​\(gr​\(M\),R1′\)=∅,…,Ne𝐑​\(gr​\(M\),Rk′\)=∅\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{1\}\)=\\emptyset,\\allowbreak\\ldots,\\allowbreak\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{k\}\)=\\emptyset, then the result ofP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)is identical in all consistent extensionsM′∈\[M\]M^\{\\prime\}\\in\[M\]ofMM\.

Even though in practice, there might be undirected edges connected to the intervention variables,[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)implies that we do not have to consider all possible edge directions of the undirected edges in aPD\-PCFGwhen computing the effect of an intervention\. Instead, we only have to consider the possible directions of the undirected edges that are relevant for the intervention, i\.e\., the directions of the undirected edges that are connected to the intervention variables\. Hence, we might not have to consider all consistent extensions of the givenPD\-PCFG\. All terms required to answer the interventional query according to the truncated product formula can be computed by querying thePD\-PCFGMM, as the semantics ofMMis well\-defined even if there are undirected edges inMM\(that is, the underlying full joint probability distribution is well\-defined because its definition is independent of the edge directions inMM\)\. Intuitively, it becomes clear that the effect of an intervention is not guaranteed to be uniquely determined if there are undirected edges connected to the intervention variables because there might be various consistent extensions with different parent sets, resulting in multiple possible disjoint effects of the intervention\. This result has been shown for propositional partially directed acyclic graphs\[Maathuis2009a\]and we next show that it also holds forPD\-PCFGs\.

###### Theorem 5\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\allowbreak\\boldsymbol\{E\},\\allowbreak\\boldsymbol\{\\Phi\}\)denote aPD\-PCFG, let𝐑=gr​\(𝐀\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{\\ell\},\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}and letP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)be an interventional query\. If there exists arandvarRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}such thatNe𝐑​\(gr​\(M\),Ri′\)≠∅\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{i\}\)\\neq\\emptyset, then there might be consistent extensionsM1,M2∈\[M\]M\_\{1\},M\_\{2\}\\in\[M\]ofMMsuch that the result ofP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)is not identical inM1M\_\{1\}andM2M\_\{2\}\.

###### Proof\.

If there exists arandvarRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}such thatNe𝑹​\(gr​\(M\),Ri′\)≠∅\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R^\{\\prime\}\_\{i\}\)\\neq\\emptysetholds, there might existM1,M2∈\[M\]M\_\{1\},M\_\{2\}\\in\[M\]such thatPa𝑹​\(gr​\(M1\),Ri′\)≠Pa𝑹​\(gr​\(M2\),Ri′\)\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\_\{1\}\),R^\{\\prime\}\_\{i\}\)\\neq\\mathrm\{Pa\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\_\{2\}\),R^\{\\prime\}\_\{i\}\)\. Then, by definition of the interventional distribution \([Def\.˜14](https://arxiv.org/html/2606.28024#Thmdefinition14)\), the conditional probability distributions being removed from the product differ inM1M\_\{1\}andM2M\_\{2\}, thereby yielding different interventional distributions forM1M\_\{1\}andM2M\_\{2\}\. ∎

Generally, there might be scenarios in which it is possible to uniquely determine the result of an interventional query even if there are undirected edges connected to the intervention variables, as possibly not all undirected edges can be oriented in both directions\. In particular, some orientations might introduce a cycle or change the conditional independence statements implied by the graph structure and hence do not result in a consistent extension\. In other words, it might be possible that there is just a single possible orientation of the parents of the intervention variables and in this case, the result of any interventional query can be uniquely determined\.

Next, we gather the theoretical insights from this section to introduce theELCIalgorithm, which efficiently computes the effect of an intervention in aPD\-PCFG\.

## 6The Extended Lifted Causal Inference Algorithm

Combining the insights from[Thms\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)and[5](https://arxiv.org/html/2606.28024#Thmtheorem5)naturally leads to an algorithm to compute the effect of interventions in aPD\-PCFG\. The idea is that all possible parent sets of the intervention variables have to be considered\. If there is just one possible set of parents, the effect of the intervention can be uniquely determined, otherwise there are multiple possible effects that are enumerated\. This idea is incorporated in the IDA algorithm and its variants\[Guo2021a,Liu2020a,Maathuis2009a\]for interventionsd​o​\(R′=r′\)do\(R^\{\\prime\}=r^\{\\prime\}\)with a single intervention variableR′R^\{\\prime\}in propositional causal models\.Nandy2017aalso consider the case of multiple intervention variablesR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}in propositional causal models, however, they operate in a different setting as they assume observational data generated by an unknown linear structural equation model with independent errors\.[Algorithm˜2](https://arxiv.org/html/2606.28024#alg2)displays theELCIalgorithm, which extends the idea of just considering the possible parent sets of intervention variables to handle arbitrary interventionsd​o​\(R1′=r1′,…,Rk′=rk′\)do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)withk≥1k\\geq 1in aPD\-PCFG\.

Algorithm 2Extended Lifted Causal InferenceInput:APD\-PCFGM=\(𝑨∪𝑮,𝑬,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\), and an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)with\{Q,R1′,…,Rk′\}⊆𝑹=gr​\(𝑨\)=\{Q,R1,…,Rℓ,R1′,…,Rk′\}\\\{Q,\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\\subseteq\\boldsymbol\{R\}=\\mathrm\{gr\}\(\\boldsymbol\{A\}\)=\\\{Q,\\allowbreak R\_\{1\},\\allowbreak\\ldots,\\allowbreak R\_\{\\ell\},\\allowbreak R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}\. Output:The set of all possible answers to the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)inMM\.

1:

𝑷←∅\\boldsymbol\{P\}\\leftarrow\\emptyset
2:

M←M\\leftarrow\\Acpdpcfg after splittingparfactorsin

MMon each

Ri′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}
3:for each

𝑪1⊆Ne𝑹​\(M,R1′\),…,𝑪k⊆Ne𝑹​\(M,Rk′\)\\boldsymbol\{C\}\_\{1\}\\subseteq\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(M,R^\{\\prime\}\_\{1\}\),\\ldots,\\boldsymbol\{C\}\_\{k\}\\subseteq\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(M,R^\{\\prime\}\_\{k\}\)s\.t\.

𝑪1,…,𝑪k\\boldsymbol\{C\}\_\{1\},\\ldots,\\boldsymbol\{C\}\_\{k\}are cliquesdo

4:

M′←MM^\{\\prime\}\\leftarrow M
5:for eachintervention variable

Ri′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}do

6:for eachundirected neighbourrandvar

C∈𝑪iC\\in\\boldsymbol\{C\}\_\{i\}of

Ri′R^\{\\prime\}\_\{i\}do

7:Orient

C−f−Ri′C\-f\-R^\{\\prime\}\_\{i\}as

C−f→Ri′C\-f\\to R^\{\\prime\}\_\{i\}in

M′M^\{\\prime\}
8:if

\[M′\]=∅\[M^\{\\prime\}\]=\\emptysetthen

9:continue

10:

M′′←M^\{\\prime\\prime\}\\leftarrowAny consistent extension from

\[M′\]\[M^\{\\prime\}\]
11:

D←∑r1∈range​\(R1\)…​∑rℓ∈range​\(Rℓ\)∏Ri∈\{Q,R1,…,Rℓ\}P​\(ri∣pa𝑹​\(M′′,Ri\)\)D\\leftarrow\\sum\\limits\_\{r\_\{1\}\\in\\mathrm\{range\}\(R\_\{1\}\)\}\\ldots\\sum\\limits\_\{r\_\{\\ell\}\\in\\mathrm\{range\}\(R\_\{\\ell\}\)\}\\prod\\limits\_\{R\_\{i\}\\in\\\{Q,R\_\{1\},\\ldots,R\_\{\\ell\}\\\}\}P\(r\_\{i\}\\mid\\mathrm\{pa\}\_\{\\boldsymbol\{R\}\}\(M^\{\\prime\\prime\},R\_\{i\}\)\)
12:Add

DDto

𝑷\\boldsymbol\{P\}
13:return

𝑷\\boldsymbol\{P\}

Given aPD\-PCFGM=\(𝑨∪𝑮,𝑬,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)and an interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\),ELCIproceeds as follows to compute the set of all possible results forP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)\. First, after initialising an empty set𝑷\\boldsymbol\{P\}to which possible query results are added \([˜1](https://arxiv.org/html/2606.28024#alg2.l1)\),ELCIsplits theparfactorsinMMbased on eachRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\\\}\([˜2](https://arxiv.org/html/2606.28024#alg2.l2)\)\. In particular,ELCIsplits everyparfactorϕ∈𝚽\\phi\\in\\boldsymbol\{\\Phi\}for which there is an instanceϕj∈gr​\(ϕ\)\\phi\_\{j\}\\in\\mathrm\{gr\}\(\\phi\)such that any intervention variableRi′∈\{R1′,…,Rk′\}R^\{\\prime\}\_\{i\}\\in\\\{R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}\\\}is a child ofϕj\\phi\_\{j\}\.\\Acelci then iterates over all possible combinations of parent sets \(i\.e\., over all combinations of subsets of undirected neighbours\) of the intervention variablesR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\([˜3](https://arxiv.org/html/2606.28024#alg2.l3),[4](https://arxiv.org/html/2606.28024#alg2.l4),[5](https://arxiv.org/html/2606.28024#alg2.l5),[6](https://arxiv.org/html/2606.28024#alg2.l6)and[7](https://arxiv.org/html/2606.28024#alg2.l7)\)\. When considering the subsets of undirected neighbours, it is necessary that all subsets are jointly valid, that is, they are not allowed to alter the conditional independence statements encoded by the model and they must not introduce any directed cycles when oriented towardsR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}\. To ensure the validity of these subsets, they are required to form a clique\. A clique𝑪\\boldsymbol\{C\}is a subset of nodes such that all pairs of nodes in𝑪\\boldsymbol\{C\}are directly connected via aparfactornode, that is, for each pair of nodesC1∈𝑪C\_\{1\}\\in\\boldsymbol\{C\},C2∈𝑪C\_\{2\}\\in\\boldsymbol\{C\}withC1≠C2C\_\{1\}\\neq C\_\{2\}it holds that there exists aparfactornodeg∈𝑮g\\in\\boldsymbol\{G\}such that there is an edge betweenC1C\_\{1\}andggas well as an edge betweenC2C\_\{2\}andggin𝑬\\boldsymbol\{E\}\(either directed or undirected\)\. By ensuring that the subsets of undirected neighbours form cliques, the orientation of the incident edges towardsR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\}does not introduce any patternC1−g1→Ri′←g2−C2C\_\{1\}\-g\_\{1\}\\to R^\{\\prime\}\_\{i\}\\leftarrow g\_\{2\}\-C\_\{2\}whereC1C\_\{1\}andC2C\_\{2\}are not directly connected via aparfactornode, as due to the clique property,C1C\_\{1\}andC2C\_\{2\}are always guaranteed to be directly connected via a factor node\. In consequence, the conditional independence statements encoded byM′M^\{\\prime\}are guaranteed to be equivalent to those encoded byMM\[Maathuis2009a\]\. Having obtained a possible combination of parent sets ofR1′,…,Rk′R^\{\\prime\}\_\{1\},\\ldots,R^\{\\prime\}\_\{k\},ELCInext extends the modified modelM′M^\{\\prime\}to anyPCFGfrom the set of consistent extensions ofM′M^\{\\prime\}, if such a consistent extension exists \([˜10](https://arxiv.org/html/2606.28024#alg2.l10),[8](https://arxiv.org/html/2606.28024#alg2.l8)and[9](https://arxiv.org/html/2606.28024#alg2.l9)\)\. In case there is no consistent extension \(e\.g\., due toM′M^\{\\prime\}containing a directed cycle\),ELCIcontinues with the next possible combination of parent sets\. If there is a consistent extensionM′′∈\[M′\]M^\{\\prime\\prime\}\\in\[M^\{\\prime\}\], then the result of the provided query is given by applying the truncated product formula \([Eq\.˜10](https://arxiv.org/html/2606.28024#S3.E10)\) inM′′M^\{\\prime\\prime\}\. As the parents of the intervention variables are fixed inM′M^\{\\prime\}, the result of the given interventional query is identical in all consistent extensions ofM′M^\{\\prime\}according to[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)\. Thus,ELCIcomputes the result of the interventional query inM′′M^\{\\prime\\prime\}by applying the truncated product formula and then adds the result to𝑷\\boldsymbol\{P\}\([˜11](https://arxiv.org/html/2606.28024#alg2.l11)and[12](https://arxiv.org/html/2606.28024#alg2.l12)\)\. We remark that the computation can be simplified if it is known that the factors inMMencode conditional probability distributions\. Then, the modification from[Prop\.˜1](https://arxiv.org/html/2606.28024#Thmtheorem1)can be applied as in theLCIalgorithm \([Alg\.˜1](https://arxiv.org/html/2606.28024#alg1)\)\. After the result of the interventional query inM′′M^\{\\prime\\prime\}has been computed,ELCIrepeats the above steps for the next parent set of the intervention variablesR1′,…,Rk′R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}until all possible parent sets have been taken into account\. In the end,ELCIreturns the set𝑷\\boldsymbol\{P\}containing all possible results for the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)\([˜13](https://arxiv.org/html/2606.28024#alg2.l13)\)\. In case there is no causal explanation for the givenPD\-PCFGMM\(that is,MMhas no consistent extension at all\),ELCIreturns an empty set\. Such a situation might occur ifMMalready contains a directed cycle or if there are undirected edges that cannot be oriented without introducing a directed cycle or altering the conditional independence statements implied by the model\.

\\Ac

elci is also able to handle multiple query variables at once \(then,[˜11](https://arxiv.org/html/2606.28024#alg2.l11)is adjusted such that only non\-query variables are summed out\)\. To compute a consistent extension in[˜10](https://arxiv.org/html/2606.28024#alg2.l10),ELCImight just call any of the efficient extension algorithms that are already available\[Verma1992a,Wienoebst2021a,Luttermann2023a\]\. Each of these algorithms operates on a partially directed acyclic graph and hence can be directly applied to the underlying causal graph of thePD\-PCFG\. While the set of probabilistic queries obtained from the truncated product formula refers to groundrandvars, these queries can be answered using lifted probabilistic inference, e\.g\., by running theLJTalgorithm \(which is specifically designed to efficiently handle sets of queries\) onMM\. The correctness ofELCIdirectly follows from[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)\.

###### Proposition 6\.

The result computed byELCI\([Alg\.˜2](https://arxiv.org/html/2606.28024#alg2)\) is correct, i\.e\., the returned set contains all possible results for the interventional queryP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)in the givenPD\-PCFGMM\.

###### Proof\.

From[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4), we know that the result of any interventional query is identical in all consistent extensions of aPD\-PCFGMMif the parents of the intervention variables are known\. Thus, to compute the set of possible results forP​\(Q∣d​o​\(R1′=r1′,…,Rk′=rk′\)\)P\(Q\\mid do\(R^\{\\prime\}\_\{1\}=r^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}=r^\{\\prime\}\_\{k\}\)\)inMM, it is sufficient to consider all possible parent sets of the intervention variablesR1′,…,Rk′R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}and compute the result of the given query in the resulting models\. Due toMaathuis2009a, it holds that any subset of undirected neighbours of the intervention variables needs to form a clique in order to obtain a valid orientation when orienting the edges towards the intervention variables \(as otherwise, the conditional independence statements induced by the graph change\)\. Consequently, by ensuring that undirected neighbours ofR1′,…,Rk′R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}form cliques before orienting them towardsR1′,…,Rk′R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\},ELCIdoes not miss any possible parent set of the intervention variables\. For any fixed set of parents ofR1′,…,Rk′R^\{\\prime\}\_\{1\},\\allowbreak\\ldots,\\allowbreak R^\{\\prime\}\_\{k\}, due to[Thm\.˜4](https://arxiv.org/html/2606.28024#Thmtheorem4)it is then sufficient to consider any consistent extension and compute the result for the given query in it\. AsELCIapplies the truncated product formula from[Def\.˜15](https://arxiv.org/html/2606.28024#Thmdefinition15)to compute the result of the given query, the correctness ofELCIfollows\. ∎

Given our assumption that the graph structure is identical for all groundings, it also holds that, e\.g\., given an interventiond​o​\(C​o​m​\(E\)=high\)do\(Com\(E\)=\\mathrm\{high\}\),ELCIhas to consider only two possible parent sets regardless of the number of employees while there are2\|dom​\(E\)\|2^\{\\lvert\\mathrm\{dom\}\(E\)\\rvert\}possible parent sets in an equivalent propositional model to consider\. In a propositional model, it is also possible to reduce the number of possible parent sets when background knowledge is introduced, i\.e\., when knowing that specificrandvarsare actually representable by a singlePRV\.

###### Corollary 2\.

LetM=\(𝐀∪𝐆,𝐄,𝚽\)M=\(\\boldsymbol\{A\}\\cup\\boldsymbol\{G\},\\boldsymbol\{E\},\\boldsymbol\{\\Phi\}\)be aPD\-PCFG\. When intervening on aPRVA​\(ℒ\)\|C∈𝐀A\(\\mathcal\{L\}\)\_\{\|C\}\\in\\boldsymbol\{A\}, under the assumption that the graph structure is identical for all groundings, it holds that

1. 1\.ELCIconsidersO​\(2\|Ne𝑨​\(M,A\)\|\)O\(2^\{\\lvert\\mathrm\{Ne\}\_\{\\boldsymbol\{A\}\}\(M,A\)\\rvert\}\)possible parent sets in the worst case, and
2. 2\.in a propositional model,O​\(2∑R∈gr​\(A\)\|Ne𝑹​\(gr​\(M\),R\)\|\)O\(2^\{\\sum\_\{R\\in\\mathrm\{gr\}\(A\)\}\\lvert\\mathrm\{Ne\}\_\{\\boldsymbol\{R\}\}\(\\mathrm\{gr\}\(M\),R\)\\rvert\}\)possible parent sets have to be considered in the worst case\.

We refrain from empirically evaluatingELCIas we have already shown the superiority ofLCIto the propositional case\.\\Acelci \(LCI, respectively\) enables the computation of answers to interventional queries on a lifted level and hence can also be plugged into parameterised decision models\[Gehrke2018b\]to compute the action that maximises the expected utility under the semantics of interventions \(instead of using the semantics of conditioning\)\. A parameterised decision model originally extends aPFGby action nodes and utility nodes\. Instead of using an undirectedPFGas a basis, we can use aPD\-PCFG\(or aPCFG\) as a basis for a parameterised decision model and then compute the expected utility of an action usingELCI\(LCI, respectively\), thereby allowing for first\-order decision making\.

## 7Conclusion

We introducePCFGsto combine lifted probabilistic inference with causal knowledge\. To leverage the power of lifted inference for the computation of the effect of interventions, we further present theLCIalgorithm, which operates on a lifted level and thus allows us to drastically speed up causal inference compared to running causal inference on an equivalent propositional \(ground\) model\.\\Aclci is a simple, yet effective algorithm to compute the effect of interventions\. Moreover, we introducePD\-PCFGsas lifted causal models that allow to incorporate partial causal knowledge, thereby enabling lifted causal inference without the requirement of having a fully specified causal graph at hand\. APD\-PCFGgeneralises the concept of aPCFGby incorporating both undirected and directed edges in the graph structure\. To compute the effect of interventions in aPD\-PCFG, we introduce theELCIalgorithm, which enumerates all possible results for an interventional query without grounding the entire model\.

An interesting direction for future work is to relax the causal sufficiency assumption \([Def\.˜5](https://arxiv.org/html/2606.28024#Thmdefinition5)\) and allow for hidden confounders \(i\.e\., confounding variables that are not observed and hence not included in the set ofrandvarsover which the model is defined\) in aPD\-PCFG\. Under the presence of hidden confounders, the result of an interventional query might not be uniquely determinable anymore\. Thus, the identification problem \(i\.e\., the problem of determining whether a causal effect can be uniquely identified\) becomes relevant if hidden confounders are present\. In particular, an interesting question is whether thed​odo\-calculus introduced byPearl1995a, which allows to rewrite an interventional query to obtain a probabilistic query free ofd​odo\-expressions, can be applied to aPD\-PCFGwith hidden confounders\.

\\bmhead

Acknowledgements This work is supported by the BMBF project AnoMed 16KISA057 and extends the works\[Luttermann2024b\]and\[Luttermann2024g\]\. The authors also would like to thank the anonymous reviewers for their valuable feedback and suggestions to improve the manuscript\. This version of the article has been accepted for publication, after peer review but is not the Version of Record and does not reflect post\-acceptance improvements, or any corrections\. The Version of Record is available online at:[https://doi\.org/10\.1007/s10472\-026\-10009\-1](https://doi.org/10.1007/s10472-026-10009-1)\.

\\bmhead

Conflict of Interest The authors declare that they have no conflict of interest\.

## References

Similar Articles

LLM Explainability with Counterfactual Chains and Causal Graphs

Hugging Face Daily Papers

This paper proposes a four-phase method for constructing causal graphs that model LLM inference processes, using counterfactual augmentation to enable stable causal discovery and provide transparent, concept-level explainability.

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Hugging Face Daily Papers

CausaLab is a scalable environment for evaluating LLM agents on interactive causal discovery, assessing both predictive accuracy and faithful recovery of underlying causal mechanisms. Experiments reveal a gap between prediction and mechanism recovery, highlighting limits in current LLM agents as experimental causal reasoners.

Relational Structural Causal Models

arXiv cs.AI

This paper introduces relational structural causal models, extending structural causal models to settings with varying objects and relations. It provides theoretical results for identification and proposes relational neural causal models that outperform non-relational baselines on simulated traffic scenes.