Relational Structural Causal Models

arXiv cs.AI 06/16/26, 04:00 AM Papers
Summary
This paper introduces relational structural causal models, extending structural causal models to settings with varying objects and relations. It provides theoretical results for identification and proposes relational neural causal models that outperform non-relational baselines on simulated traffic scenes.
arXiv:2606.14892v1 Announce Type: new Abstract: An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show that answers to not only causal but also observational queries about unseen combinations of objects cannot be identified without further assumptions. To enable such identification--including in the presence of unobserved confounding--we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.
Cached at: 06/16/26, 11:43 AM
# Relational Structural Causal Models
Source: [https://arxiv.org/html/2606.14892](https://arxiv.org/html/2606.14892)

Machine Learning, ICML

## 1 Introduction

Behind a Rube Goldberg machine is a sequence of simple mechanisms. A ball rolls down a ramp, tipping a weight, pulling a string, swinging a hammer, and striking a gong. Predicting what happens next and why requires a model of how these bodies interact. This is precisely what *world models* aim to provide, so that AI systems can learn efficiently and generalize across environments (Ha & Schmidhuber, [2018](https://arxiv.org/html/2606.14892#bib.bib46); LeCun, [2022](https://arxiv.org/html/2606.14892#bib.bib62); Gurnee & Tegmark, [2024](https://arxiv.org/html/2606.14892#bib.bib45); Richens & Everitt, [2024](https://arxiv.org/html/2606.14892#bib.bib94); Vafa et al., [2024](https://arxiv.org/html/2606.14892#bib.bib119)). In this work, we consider two important problems that such a model must address.

The first problem is that of representing objects and composing them via relations (Battaglia et al., [2018](https://arxiv.org/html/2606.14892#bib.bib9); Lake et al., [2017](https://arxiv.org/html/2606.14892#bib.bib61); Tenenbaum et al., [2011](https://arxiv.org/html/2606.14892#bib.bib117); Chollet, [2019](https://arxiv.org/html/2606.14892#bib.bib16)). Downstream of such representations is the ability to answer questions about unseen combinations of objects, e.g., a new Rube Goldberg machine with an added ramp. Such combinatorial structure arises in many domains. Robots must reason about varying types of objects and their spatial relations to navigate and manipulate the world (Li et al., [2019](https://arxiv.org/html/2606.14892#bib.bib65); Wang et al., [2025b](https://arxiv.org/html/2606.14892#bib.bib125); Locatello et al., [2020](https://arxiv.org/html/2606.14892#bib.bib70)); language is generated from unbounded combinations of nouns related by verbs (Chomsky, [1965](https://arxiv.org/html/2606.14892#bib.bib17)); and biological systems are naturally described in terms of interacting proteins, metabolites, and cells (Barabási et al., [2011](https://arxiv.org/html/2606.14892#bib.bib5); Veličković, [2023](https://arxiv.org/html/2606.14892#bib.bib121); Regev et al., [2017](https://arxiv.org/html/2606.14892#bib.bib92)). The generality of this problem has inspired active research into relational and object-centric machine learning (Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60); Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40); Veličković et al., [2018](https://arxiv.org/html/2606.14892#bib.bib122); Kipf & Welling, [2017](https://arxiv.org/html/2606.14892#bib.bib59); Zambaldi et al., [2018](https://arxiv.org/html/2606.14892#bib.bib135)) aimed at such combinatorial generalization.

The second problem is that of answering causal questions: what if the weight were lighter, the string were cut, or the ramp angle were changed in our Rube Goldberg machine? A common view is that such questions cannot be answered from observations of the environment alone, but require interventions, *causal* inductive biases, or often both (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Pearl & Mackenzie, [2018](https://arxiv.org/html/2606.14892#bib.bib87); Bareinboim et al., [2022](https://arxiv.org/html/2606.14892#bib.bib7); Schölkopf et al., [2021](https://arxiv.org/html/2606.14892#bib.bib105); Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6)). In our case, evidence for this view comes from the weaknesses of relational machine learning. Despite strong in-distribution predictive performance, relational and non-relational methods alike can still exploit correlations that are unstable under interventions or distribution shift (de Haan et al., [2019](https://arxiv.org/html/2606.14892#bib.bib25); Park et al., [2021](https://arxiv.org/html/2606.14892#bib.bib84); Fan et al., [2022](https://arxiv.org/html/2606.14892#bib.bib33); Wu et al., [2022](https://arxiv.org/html/2606.14892#bib.bib128); Vo et al., [2025](https://arxiv.org/html/2606.14892#bib.bib123)). Relational structure alone does not guarantee answers to causal questions.

![Figure 1](https://arxiv.org/html/2606.14892v1/x1.png)

Figure 1: A schematic for the problem of relational identification across varying traffic scenes, following the schema in Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1). Each panel shows: (i) a relational skeleton $\rho$ representing a particular combination of signals ($s$), pedestrians ($p$), and cars ($c$), (ii) the corresponding causal graph $\bar{\mathcal{G}}_{\rho}$, and (iii) the available data or the query of interest. The goal is to use data from the *source* skeletons $\rho_A$ and $\rho_B$ to answer a query about the *target* skeleton $\rho_C$. In any $\rho$, an edge $(s_i, p_j)$ indicates that signal $s_i$ controls pedestrian $p_j$; an edge $(s_i, c_j)$ indicates that signal $s_i$ controls car $c_j$; and an edge $(p_i, c_j)$ indicates that pedestrian $p_i$ is in the path of car $c_j$. A bubble marks the second tuple element. Cars with similar relational neighborhoods across skeletons are circumscribed by similar shapes (red dotted line).

Causal machine learning methods aim to answer such questions in the contexts of decision-making, generative modeling, fairness, and more (Schölkopf, [2022](https://arxiv.org/html/2606.14892#bib.bib103); Plečko & Bareinboim, [2024](https://arxiv.org/html/2606.14892#bib.bib88); Bareinboim et al., [2024](https://arxiv.org/html/2606.14892#bib.bib8); Pan & Bareinboim, [2024](https://arxiv.org/html/2606.14892#bib.bib80), [2025](https://arxiv.org/html/2606.14892#bib.bib81)). However, many of these results do not easily lend themselves to combinatorial generalization because they rely on a fixed causal graph over a fixed set of variables.
In domains such as autonomous driving, where traffic scenes differ in how many signals, pedestrians, and cars appear and how they relate (Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)), causal methods must make assumptions that simplify these relations. They often assume that objects are unrelated (i.i.d.) or that the relational structure is fixed. Causal relational learning (Lee & Honavar, [2016](https://arxiv.org/html/2606.14892#bib.bib64); Maier et al., [2010](https://arxiv.org/html/2606.14892#bib.bib71); Salimi et al., [2020](https://arxiv.org/html/2606.14892#bib.bib99)) and methods for causal inference from non-i.i.d. data (Rubin, [1990](https://arxiv.org/html/2606.14892#bib.bib97); Sobel, [2006](https://arxiv.org/html/2606.14892#bib.bib108); Hudgens & Halloran, [2008](https://arxiv.org/html/2606.14892#bib.bib52); Ogburn & VanderWeele, [2014](https://arxiv.org/html/2606.14892#bib.bib76); Weinstein & Blei, [2024](https://arxiv.org/html/2606.14892#bib.bib126)) make progress towards relaxing the i.i.d. assumption, but assume that the objects and their relations are fixed. For instance, they study what can be inferred about a given traffic scene from data on that same scene, but not from data on a different scene. As such, they do not address the problem of generalizing across combinations in which both the object counts and their relations vary.

**Contributions.** We develop causal models for object-relational settings, enabling causal inference across varying combinations of objects. More specifically, our contributions are as follows.

1. **Relational SCMs.** In Sec. [3](https://arxiv.org/html/2606.14892#S3), we formalize how different combinations of objects can be unified by the same data-generating process: a relational structural causal model (Def. [3.1](https://arxiv.org/html/2606.14892#S3.Thmtheorem1)). Based on this formalization, we prove the limits of learning distributions over seen and unseen combinations of objects, showing that even observational distributions of unseen combinations cannot be learned without further assumptions.
2. **Graphical identification.** In Sec. [4](https://arxiv.org/html/2606.14892#S4), given the previous impossibility results, we introduce a graphical language for encoding assumptions that enable *relational identification* (Def. [4.2](https://arxiv.org/html/2606.14892#S4.Thmtheorem2)) across combinations of objects. We show when and how existing causal inference tools can be used for this task.
3. **Relational neural causal models.** In Sec. [5](https://arxiv.org/html/2606.14892#S5), we develop *relational neural causal models* (Def. [5.1](https://arxiv.org/html/2606.14892#S5.Thmtheorem1)), which form the basis of a sound and complete neural approach for relational identification in practice.

Experiments with simulated traffic scenes (Sec. [6](https://arxiv.org/html/2606.14892#S6), Sec. [E](https://arxiv.org/html/2606.14892#A5)) support our findings. We give an extended discussion of related literature in Sec. [A](https://arxiv.org/html/2606.14892#A1); further definitions and examples, including a comparison with standard SCMs, in Secs. [B](https://arxiv.org/html/2606.14892#A2) and [C](https://arxiv.org/html/2606.14892#A3); and proofs and further results in Sec. [D](https://arxiv.org/html/2606.14892#A4).

## 2 Preliminaries

**Notation.** Capital letters ($X$) denote variables, $\textnormal{dom}(X)$ denotes the domain of $X$, small letters ($x$) denote values in their domains, and bold letters denote sets of variables ($\mathbf{X}$) and their values ($\mathbf{x}$). $P(\mathbf{X})$ denotes the probability distribution over a set of variables $\mathbf{X}$. We consistently use $P(\mathbf{x})$ to abbreviate probabilities $P(\mathbf{X}=\mathbf{x})$.

**Structural causal models.** Our framework extends that of *structural causal models* (SCMs) (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6)), a formalism for data-generating processes. An SCM $\mathcal{M}$ is a four-tuple $\mathcal{M}=\langle\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$, where $\mathbf{V}$ and $\mathbf{U}$ are sets of endogenous (observed) and exogenous (unobserved) variables, respectively. $\mathcal{F}$ is a set of mechanisms: each $V\in\mathbf{V}$ takes the value $f_V(\mathbf{pa}_V,\mathbf{u}_V)$, a function of the values of its endogenous and exogenous parents, $\mathbf{Pa}_V\subseteq\mathbf{V}$ and $\mathbf{U}_V\subseteq\mathbf{U}$, respectively. $P(\mathbf{U})$ is a joint distribution over $\mathbf{U}$; as in prior work (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138); Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130)), we assume the variables in $\mathbf{U}$ are jointly independent, although a given $U$ may affect more than one $V$.

Every SCM induces a *causal graph*, constructed as follows: (1) add a vertex for every $V\in\mathbf{V}$; (2) add an edge $V_i\to V_j$ for every $V_i,V_j\in\mathbf{V}$ such that $V_i\in\mathbf{Pa}_{V_j}$; (3) add a dashed bidirected edge between $V_i$ and $V_j$ if $\mathbf{U}_{V_i}\cap\mathbf{U}_{V_j}\neq\emptyset$. See Sec. [B](https://arxiv.org/html/2606.14892#A2) for additional background.
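The three construction steps can be sketched in a few lines of Python. This is a minimal illustration, assuming an SCM is summarized only by its parent sets $\mathbf{Pa}_V$ and $\mathbf{U}_V$, stored as plain dictionaries; the function `causal_graph` and the toy variables `Z`, `X`, `Y`, `U1`, `U2`, `U3` are hypothetical.

```python
from itertools import combinations

def causal_graph(endo_parents, exo_parents):
    """Build the causal graph induced by an SCM.

    endo_parents: dict mapping each endogenous V to its set Pa_V.
    exo_parents:  dict mapping each endogenous V to its set U_V.
    Returns (vertices, directed_edges, bidirected_edges).
    """
    # (1) one vertex per endogenous variable
    vertices = set(endo_parents)
    # (2) a directed edge Vi -> Vj whenever Vi is an endogenous parent of Vj
    directed = {(vi, vj) for vj, pa in endo_parents.items() for vi in pa}
    # (3) a bidirected edge between Vi and Vj whenever U_Vi and U_Vj overlap
    bidirected = {
        frozenset((vi, vj))
        for vi, vj in combinations(sorted(vertices), 2)
        if exo_parents[vi] & exo_parents[vj]
    }
    return vertices, directed, bidirected

# Toy SCM: Z -> X -> Y, with X and Y sharing the exogenous variable U2.
endo = {"Z": set(), "X": {"Z"}, "Y": {"X"}}
exo = {"Z": {"U1"}, "X": {"U2"}, "Y": {"U2", "U3"}}
V, D, B = causal_graph(endo, exo)
print(sorted(D))                # [('X', 'Y'), ('Z', 'X')]
print([sorted(e) for e in B])   # [['X', 'Y']]
```

Bidirected edges are stored as unordered pairs (`frozenset`), since a shared exogenous parent has no direction.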

**Objects and relations.** We build on the entity-relationship (ER) model (Ullman & Widom, [2002](https://arxiv.org/html/2606.14892#bib.bib118)). A *relational schema* is a 3-tuple $\mathcal{S}=\langle\mathcal{E},\mathcal{R},\mathcal{A}\rangle$, where $\mathcal{E}$ is a set of entity (or object) types; $\mathcal{R}$ is a set of relation types over $\mathcal{E}$; and $\mathcal{A}$ is a set of observed attribute types $O.A$ for types $O\in\mathcal{E}\cup\mathcal{R}$. A *relational skeleton* $\rho$ of $\mathcal{S}$ is a finite set of ground entities and relations $o$ of the specified types $O\in\mathcal{E}\cup\mathcal{R}$. We write $\rho(O)$ for the set of instances $o$ in $\rho$ of type $O$.

###### Example 1 (Relational schema and skeleton for traffic scene).

A simple relational schema for traffic scenes would be

$$
\begin{aligned}
\mathcal{E} &= \{\text{Signal }(\mathsf{Sig}),\ \text{Car }(\mathsf{Car}),\ \text{Pedestrian }(\mathsf{Ped})\}\\
\mathcal{R} &= \{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped}),\ \mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car}),\ \mathsf{Path}(\mathsf{Ped},\mathsf{Car})\}\\
\mathcal{A} &= \{\mathsf{Sig}.W,\ \mathsf{Ped}.X,\ \mathsf{Car}.B\},
\end{aligned}
$$

with all attributes binary-valued: $\mathsf{Sig}.W\in\{1,0\}$ denotes walk/drive; $\mathsf{Ped}.X\in\{1,0\}$ cross/wait; and $\mathsf{Car}.B\in\{1,0\}$ brake/go. $\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped})$ (resp. $\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})$) indicates that a signal controls a pedestrian (resp. car), and $\mathsf{Path}(\mathsf{Ped},\mathsf{Car})$ indicates that the pedestrian is in the car’s path. Figure [1](https://arxiv.org/html/2606.14892#S1.F1) shows three skeletons for three traffic scenes; e.g., in $\rho_A$ (slip lane), one signal $s_1$ controls pedestrians $p_1, p_2$ and car $c_1$ (but not $c_2$). $\square$
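A schema and a skeleton can be encoded as plain data. The sketch below is a hypothetical encoding, not taken from the paper: it stores the schema of Ex. 1 and the slip-lane skeleton $\rho_A$, with the relation instances of $\rho_A$ read off the ground mechanisms of Ex. 4 ($s_1$ controls $p_1$, $p_2$, and $c_1$; $p_1$ and $p_2$ are in the path of $c_1$; $p_2$ is in the path of $c_2$). Because the schema uses the name $\mathsf{Ctrl}$ for two relation types, the sketch assumes the hypothetical names `CtrlPed` and `CtrlCar` to keep them apart.

```python
# Relational schema S = <E, R, A> for the traffic example (Ex. 1).
ENTITY_TYPES = {"Sig", "Ped", "Car"}
RELATION_TYPES = {                # relation name -> (first type, second type)
    "CtrlPed": ("Sig", "Ped"),    # Ctrl(Sig, Ped)
    "CtrlCar": ("Sig", "Car"),    # Ctrl(Sig, Car)
    "Path": ("Ped", "Car"),       # Path(Ped, Car)
}
ATTRIBUTES = {"Sig": "W", "Ped": "X", "Car": "B"}  # one binary attribute per type

# Skeleton rho_A (slip lane): ground entities and ground relation instances.
rho_A = {
    "Sig": ["s1"],
    "Ped": ["p1", "p2"],
    "Car": ["c1", "c2"],
    "CtrlPed": [("s1", "p1"), ("s1", "p2")],
    "CtrlCar": [("s1", "c1")],    # s1 does not control c2
    "Path": [("p1", "c1"), ("p2", "c1"), ("p2", "c2")],
}

def instances(rho, O):
    """rho(O): the instances of type O in skeleton rho."""
    return rho.get(O, [])

def related(rho, relation, target):
    """Objects t such that relation(t, target) holds in rho."""
    return [t for (t, o) in instances(rho, relation) if o == target]

def check_skeleton(rho):
    """Every relation instance must connect entities of the declared types."""
    for rel, (t1, t2) in RELATION_TYPES.items():
        assert t1 in ENTITY_TYPES and t2 in ENTITY_TYPES, rel
        for a, b in instances(rho, rel):
            assert a in rho[t1] and b in rho[t2], (rel, a, b)

check_skeleton(rho_A)
print(related(rho_A, "CtrlCar", "c2"))  # []
print(related(rho_A, "Path", "c1"))     # ['p1', 'p2']
```

Other skeletons of the same schema differ only in the contents of the dictionary, which is what lets one template model cover many scenes.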

## 3 Defining and Characterizing Relational SCMs

In this section, we introduce relational structural causal models (RSCMs) with the goal of specifying a data-generating process that underlies varying combinations of objects.

### 3.1 Defining Relational SCMs

An RSCM generalizes a standard SCM in two ways. First, different types of objects carry different attributes, and hence different sets of variables in an RSCM. Second, an attribute of one object may affect that of another only when the two objects stand in a particular relation. Following previous work on relational modeling (Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)), we capture such contingent dependencies using *relational constraints* (Def. [B.1](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition1)).

###### Example 2 (Relational constraints for traffic scene).

In Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1), consider a signal $\mathsf{Sig}$, pedestrian $\mathsf{Ped}$, and car $\mathsf{Car}$. In skeleton $\rho_A$, the constraint $\phi: \mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})$ is true for $\mathsf{Sig}=s_1$ and $\mathsf{Car}=c_1$ but not for $\mathsf{Car}=c_2$. In $\rho_B$, the constraint $\phi': \mathsf{Path}(\mathsf{Ped},\mathsf{Car})$ is true for $\mathsf{Ped}=p_1$ and $\mathsf{Car}=c_1$ but not for $\mathsf{Car}=c_2$. $\square$

An RSCM specifies one mechanism per attribute (e.g., for whether a car brakes). Its output can depend on the attributes of related objects (e.g., the crossing states of all pedestrians in the car’s path), possibly via aggregation (Def. [B.2](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition2)).

###### Definition 3.1 (Relational structural causal model (RSCM)).

A *relational structural causal model* (RSCM) is a 5-tuple $\mathcal{M}=\langle\mathcal{S},\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$, where $\mathcal{S}=\langle\mathcal{E},\mathcal{R},\mathcal{A}\rangle$ is a relational schema; $\mathbf{V}$ is a set of endogenous variables $O.A$, one for each attribute $O.A$ in $\mathcal{A}$; $\mathcal{F}$ is a set of mechanisms $f_{O.A}$, one for each variable $O.A$ in $\mathbf{V}$; $\mathbf{U}$ is a set of exogenous variables $O.U$ tied to objects $O\in\mathcal{E}\cup\mathcal{R}$; and $P(\mathbf{U})$ is a probability distribution over $\mathbf{U}$ factorizing as $P(\mathbf{U})=\prod_{O.U\in\mathbf{U}}P(O.U)$. Each mechanism has the form

$$
O.A \leftarrow f_{O.A}\left(\mathbf{Pa}_{O.A},\ \mathbf{U}_{O.A},\ \mathbf{Pa}^{r}_{O.A},\ \mathbf{U}^{r}_{O.A}\right).
$$

Here, $\mathbf{Pa}_{O.A}\subseteq\mathbf{V}$ and $\mathbf{U}_{O.A}\subseteq\mathbf{U}$ are *non-relational parents*, comprising attributes of the same object instance. On the other hand, $\mathbf{Pa}^{r}_{O.A}$ and $\mathbf{U}^{r}_{O.A}$ are *relational parents*. Each endogenous relational parent is a tuple $(\mathbf{W},\phi,\textrm{AGG})$, where $\mathbf{W}\subseteq\mathbf{V}$ are variables belonging to some type $T\in\mathcal{E}\cup\mathcal{R}$; $\phi$ is a relational constraint over entities associated with $O$ and $T$; and $\textrm{AGG}$ is an optional list of aggregators, one for each $T.W\in\mathbf{W}$. Exogenous relational parents are analogous.

###### Example 3 (RSCM for traffic scene).

Continuing Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1) and [2](https://arxiv.org/html/2606.14892#Thmexample2), we define an RSCM for traffic scenes. The endogenous variables are $\mathbf{V}=\{\mathsf{Sig}.W,\ \mathsf{Ped}.X,\ \mathsf{Car}.B\}$. The exogenous variables $\mathbf{U}$ are $\mathsf{Sig}.U_W\sim\mathcal{B}(0.3)$, $\mathsf{Ped}.U_X\sim\mathcal{B}(0.4)$, and $\mathsf{Car}.U_B\sim\mathcal{B}(0.2)$, where $\mathcal{B}(q)$ is a Bernoulli distribution with success probability $q$; they capture unobserved factors such as a pedestrian’s intent to cross or a driver’s alertness. The mechanisms are

$$
\begin{aligned}
\mathsf{Sig}.W &\leftarrow \mathsf{Sig}.U_W,\\
\mathsf{Ped}.X &\leftarrow \mathsf{Ped}.U_X \oplus \bigwedge_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped})} \mathsf{Sig}.W, \text{ and}\\
\mathsf{Car}.B &\leftarrow \mathsf{Car}.U_B \oplus \bigg(\bigvee_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})} \mathsf{Sig}.W \lor \bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})} \mathsf{Ped}.X\bigg).
\end{aligned}
$$

For example, $\mathsf{Ped}.X$ has the non-relational parent $\mathsf{Ped}.U_X$ and the relational parent $(\{\mathsf{Sig}.W\},\ \mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped}),\ \wedge)$. Intuitively, $f_{\mathsf{Ped}.X}$ makes a pedestrian cross when all controlling signals are in the ‘walk’ state, up to noise (e.g., the pedestrian does not intend to cross). Similarly, $f_{\mathsf{Car}.B}$ makes a car brake when any controlling signal says ‘walk’ or any pedestrian in its path is crossing, again up to noise (e.g., the driver is not alert). See Ex. [10](https://arxiv.org/html/2606.14892#Thmexample10) for an extended example. $\square$
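The three mechanisms of Ex. 3 can be written once, at the template level, as functions that receive the multiset of relational-parent values as a variable-length list. This is a minimal sketch, assuming bits are encoded as Python `int`s in {0, 1}; the function names are hypothetical. Python's `all` and `any` return `True` and `False` on an empty input, matching the usual conventions $\bigwedge\emptyset = 1$ and $\bigvee\emptyset = 0$.

```python
def f_sig_W(u_w):
    """Sig.W <- Sig.U_W."""
    return u_w

def f_ped_X(u_x, ctrl_signals_W):
    """Ped.X <- Ped.U_X XOR (AND over the W values of controlling signals)."""
    return u_x ^ int(all(ctrl_signals_W))

def f_car_B(u_b, ctrl_signals_W, path_peds_X):
    """Car.B <- Car.U_B XOR (OR over controlling signals' W, OR over in-path pedestrians' X)."""
    return u_b ^ int(any(ctrl_signals_W) or any(path_peds_X))

# The same template accepts multisets of any size.
print(f_car_B(0, [1], [0, 0]))  # 1: the signal says 'walk', so the car brakes
print(f_car_B(0, [], [1]))      # 1: a pedestrian in the path is crossing
print(f_car_B(1, [], []))       # 1: nothing forces braking, but the noise flips the output
print(f_ped_X(0, [1, 1]))       # 1: all controlling signals say 'walk'
```

Because the mechanism never indexes into its inputs, it applies unchanged to a car with zero, one, or many controlling signals.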

Note how mechanisms in an RSCM differ from those in an SCM.¹ Since the number of objects satisfying a constraint (e.g., pedestrians in a car’s path) can vary across skeletons, each $f_{O.A}$ in an RSCM must accept multisets of varying size, whereas in an SCM each mechanism accepts a fixed-size input. In practice, relational learning often uses permutation-invariant *aggregators* (Def. [B.2](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition2)) such as mean, sum, max, majority, or attention pooling to implement functions on sets.

¹ We give a side-by-side comparison of RSCMs with standard SCMs for the traffic example in Table [B.3.1](https://arxiv.org/html/2606.14892#A2.SS3.T1), as well as for an additional example in Sec. [C.1](https://arxiv.org/html/2606.14892#A3.SS1).
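Permutation invariance is what lets a single mechanism consume multisets of different sizes. The sketch below implements a few of the aggregators listed above (mean, sum, max, majority) and checks by brute force that each output is unchanged under every reordering of the input. Attention pooling is omitted, and the empty-set conventions and the tie-breaking rule for majority are illustrative assumptions.

```python
from itertools import permutations
from statistics import mean

def majority(values):
    """1 if strictly more than half of the binary values are 1 (ties go to 0)."""
    return int(2 * sum(values) > len(values))

AGGREGATORS = {
    "mean": lambda xs: mean(xs) if xs else 0.0,  # assumed convention: empty mean is 0
    "sum": sum,
    "max": lambda xs: max(xs, default=0),
    "majority": majority,
}

def is_permutation_invariant(agg, values):
    """True iff agg gives the same output for every ordering of the multiset."""
    return len({agg(list(p)) for p in permutations(values)}) == 1

for name, agg in AGGREGATORS.items():
    for multiset in ([], [1], [0, 1], [1, 1, 0], [0, 1, 1, 0]):
        assert is_permutation_invariant(agg, multiset), name
print(majority([1, 1, 0]), AGGREGATORS["mean"]([0, 1, 1, 0]))  # 1 0.5
```

A function that reads, say, only the first element of its input would fail this check, which is why such functions are unsuitable as relational mechanisms.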

An RSCM $\mathcal{M}$ may additionally be *Markovian*.

###### Definition 3.2 (RSCM Markovianity).

We say an RSCM $\mathcal{M}=\langle\mathcal{S},\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$ is $\rho$-*Markovian* if, for each variable $O.A\in\mathbf{V}$, the set of exogenous relational parents $\mathbf{U}^{r}_{O.A}$ is empty. We say $\mathcal{M}$ is *Markovian* if it is $\rho$-Markovian and no two variables $O.A, T.B\in\mathbf{V}$ share a non-relational exogenous parent.
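Def. 3.2 can be checked mechanically from the exogenous parents of each template variable. This is a hypothetical encoding, assuming each variable $O.A$ lists its non-relational exogenous parents $\mathbf{U}_{O.A}$ under `"local"` and its exogenous relational parents $\mathbf{U}^r_{O.A}$ under `"relational"`, both by name.

```python
from itertools import combinations

def is_rho_markovian(exo):
    """rho-Markovian: no variable has exogenous relational parents."""
    return all(not spec["relational"] for spec in exo.values())

def is_markovian(exo):
    """Markovian: rho-Markovian and no two variables share a non-relational exogenous parent."""
    if not is_rho_markovian(exo):
        return False
    return all(not (exo[a]["local"] & exo[b]["local"]) for a, b in combinations(exo, 2))

# Traffic RSCM of Ex. 3: each attribute has its own exogenous variable.
traffic = {
    "Sig.W": {"local": {"Sig.U_W"}, "relational": set()},
    "Ped.X": {"local": {"Ped.U_X"}, "relational": set()},
    "Car.B": {"local": {"Car.U_B"}, "relational": set()},
}
print(is_markovian(traffic))  # True
```

A variant in which two attributes of the same object share one exogenous variable would be $\rho$-Markovian but not Markovian.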

An RSCM can be instantiated for any skeleton $\rho$. It induces a standard *ground* RSCM (Def. [B.4](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition4)) with a ground variable $o.A$ for each attribute $O.A$ and instance $o\in\rho(O)$. The function determining $o.A$ substitutes the relational parents in $f_{O.A}$ with the ground variables $t.W$ for which $o$ and $t$ stand in the required relation.

###### Example 4 (Ground RSCM for traffic scene).

For the RSCM $\mathcal{M}$ in Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3) and skeleton $\rho_A$ in Fig. [1](https://arxiv.org/html/2606.14892#S1.F1), the ground RSCM $\mathcal{M}_{\rho_A}=\langle\mathbf{V}_{\rho_A},\mathbf{U}_{\rho_A},\mathcal{F}_{\rho_A},P(\mathbf{U}_{\rho_A})\rangle$ is as follows.

$$
\begin{aligned}
\mathbf{V}_{\rho_A} &= \{s_1.W,\ p_1.X,\ p_2.X,\ c_1.B,\ c_2.B\}\\
\mathbf{U}_{\rho_A} &= \{s_1.U_W,\ p_1.U_X,\ p_2.U_X,\ c_1.U_B,\ c_2.U_B\}\\
s_1.W &\leftarrow s_1.U_W\\
p_1.X &\leftarrow p_1.U_X \oplus \textstyle\bigwedge\{s_1.W\}\\
p_2.X &\leftarrow p_2.U_X \oplus \textstyle\bigwedge\{s_1.W\}\\
c_1.B &\leftarrow c_1.U_B \oplus \left(\textstyle\bigvee\{s_1.W\} \lor \textstyle\bigvee\{p_1.X,\ p_2.X\}\right)\\
c_2.B &\leftarrow c_2.U_B \oplus \left(\textstyle\bigvee\emptyset \lor \textstyle\bigvee\{p_2.X\}\right)
\end{aligned}
$$

with $s_1.U_W\sim\mathcal{B}(0.3)$; $p_1.U_X,\ p_2.U_X\sim_{\mathsf{iid}}\mathcal{B}(0.4)$; and $c_1.U_B,\ c_2.U_B\sim_{\mathsf{iid}}\mathcal{B}(0.2)$. $\mathcal{M}_{\rho_A}$ describes the generative process for traffic scenes with the structure $\rho_A$. $\square$
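Grounding can be sketched as a loop over the instances of each type that gathers relational-parent values through the skeleton’s relation instances. The sketch below assumes a hypothetical dictionary encoding of $\rho_A$ (with `CtrlPed` and `CtrlCar` standing for the two $\mathsf{Ctrl}$ relations) and the parameters of Ex. 3. It builds $\mathcal{M}_{\rho_A}$ and computes its observational distribution exactly by enumerating all $2^5$ settings of $\mathbf{U}_{\rho_A}$. Signals, then pedestrians, then cars is a topological order for this schema.

```python
from itertools import product

P_U = {"Sig": 0.3, "Ped": 0.4, "Car": 0.2}  # P(O.U = 1) for each type, from Ex. 3

rho_A = {
    "Sig": ["s1"], "Ped": ["p1", "p2"], "Car": ["c1", "c2"],
    "CtrlPed": [("s1", "p1"), ("s1", "p2")],
    "CtrlCar": [("s1", "c1")],
    "Path": [("p1", "c1"), ("p2", "c1"), ("p2", "c2")],
}

def parents(rho, rel, o):
    """Ground relational parents of o through relation rel."""
    return [t for (t, x) in rho[rel] if x == o]

def solve(rho, u):
    """Evaluate the ground mechanisms of M_rho for one exogenous setting u.
    Each object has one attribute, so values are keyed by object name."""
    v = {}
    for s in rho["Sig"]:
        v[s] = u[s]                                             # s.W <- s.U_W
    for p in rho["Ped"]:
        v[p] = u[p] ^ int(all(v[s] for s in parents(rho, "CtrlPed", p)))
    for c in rho["Car"]:
        sig = any(v[s] for s in parents(rho, "CtrlCar", c))
        ped = any(v[p] for p in parents(rho, "Path", c))
        v[c] = u[c] ^ int(sig or ped)
    return v

def observational(rho):
    """Exact P(v_rho) by enumerating every exogenous setting."""
    objs = [(o, T) for T in ("Sig", "Ped", "Car") for o in rho[T]]
    dist = {}
    for bits in product((0, 1), repeat=len(objs)):
        u = {o: b for (o, _), b in zip(objs, bits)}
        prob = 1.0
        for (o, T), b in zip(objs, bits):
            prob *= P_U[T] if b else 1 - P_U[T]
        key = tuple(sorted(solve(rho, u).items()))
        dist[key] = dist.get(key, 0.0) + prob
    return dist

dist_A = observational(rho_A)
p_c1_brakes = sum(p for k, p in dist_A.items() if dict(k)["c1"] == 1)
print(round(sum(dist_A.values()), 10), round(p_c1_brakes, 4))  # 1.0 0.6488
```

The same `solve` and `observational` functions ground the template for any other skeleton of this schema; only the dictionary changes.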

We assume throughout that, for any skeleton $\rho$, the ground RSCM $\mathcal{M}_\rho$ is recursive (or acyclic, Def. [B.3](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition3)).²

² This is weaker than requiring the template RSCM itself to be acyclic. For instance, an RSCM may specify that for cars $C$ and $C'$, $C.B$ affects $C'.B$ if $C'$ is behind $C$. This appears cyclic at the template level; however, in any grounding $\mathcal{M}_\rho$, two cars cannot both be behind each other, and so $\mathcal{M}_\rho$ is acyclic. We implement such an RSCM in Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2).
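Whether a particular grounding is recursive can be checked with a topological sort of its ground causal graph. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) and assumes a hypothetical `Behind` relation given as `(front, back)` pairs, each contributing the ground edge $\text{front}.B \to \text{back}.B$. A queue of cars yields an acyclic grounding; a skeleton in which two cars are each behind the other, impossible in a real scene but expressible as data, does not.

```python
from graphlib import CycleError, TopologicalSorter

def ground_edges(behind_pairs):
    """Predecessor map for the ground graph: (front, back) means front.B -> back.B."""
    preds = {}
    for front, back in behind_pairs:
        preds.setdefault(back, set()).add(front)
        preds.setdefault(front, set())
    return preds

def is_recursive(behind_pairs):
    """Return (True, topological order) if the ground graph is acyclic, else (False, None)."""
    try:
        order = list(TopologicalSorter(ground_edges(behind_pairs)).static_order())
        return True, order
    except CycleError:
        return False, None

queue = [("c1", "c2"), ("c2", "c3")]                 # c2 is behind c1, c3 is behind c2
print(is_recursive(queue))                           # (True, ['c1', 'c2', 'c3'])
print(is_recursive([("c1", "c2"), ("c2", "c1")])[0])  # False
```

The template-level self-loop on $\mathsf{Car}.B$ never appears here; only ground edges between distinct cars are checked.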

### 3.2 Limits of Learning Relational SCMs

In most domains, the true data-generating process, or RSCM, is unknown (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Bareinboim et al., [2022](https://arxiv.org/html/2606.14892#bib.bib7); Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6)). What we observe instead is data from many skeletons, each with its own combination of objects and relations. In this section, we consider what can be learned from such data about the true RSCM, and what this implies for unseen relational structures.

A ground RSCM $\mathcal{M}_\rho$ induces observational, interventional, and counterfactual distributions over $\mathbf{V}_\rho$ (Def. [B.5](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition5)).

###### Example 5 (RSCM distributions for traffic scene).

Consider $\mathcal{M}_{\rho_A}$ in Ex. [4](https://arxiv.org/html/2606.14892#Thmexample4). The observational query $P(s_1.W=1)$ is the probability that signal $s_1$ says ‘walk’. The interventional query $P(c_1.B=1\mid do(p_1.X=1))$ is the probability that car $c_1$ brakes when pedestrian $p_1$ is made to cross by an intervention (e.g., by an officer), irrespective of the signal. The counterfactual $P\big(c_1.B_{s_1.W=1}=1 \,\big|\, s_1.W=0,\ p_1.X=1,\ c_1.B=0\big)$ asks: in a scene where the signal said ‘drive’, $p_1$ crossed, and $c_1$ did not brake, what is the probability that $c_1$ *would have* braked had $s_1$ been set to ‘walk’? $\square$
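The three queries can be computed exactly for $\mathcal{M}_{\rho_A}$ by enumerating its exogenous settings. This is a sketch under stated assumptions: the hypothetical helper `solve_ground` hard-codes the ground mechanisms of Ex. 4 and accepts interventions as a `do` dictionary, and the counterfactual is computed with the standard abduction-action-prediction recipe (keep only exogenous settings consistent with the evidence, intervene, then recompute).

```python
from itertools import product

P1 = {"s1": 0.3, "p1": 0.4, "p2": 0.4, "c1": 0.2, "c2": 0.2}  # P(o.U = 1), Ex. 4
NAMES = ["s1", "p1", "p2", "c1", "c2"]

def solve_ground(u, do=None):
    """Ground mechanisms of M_{rho_A}; `do` maps variables to forced values."""
    do = do or {}
    v = {}
    def assign(name, value):
        v[name] = do.get(name, value)   # an intervention overrides the mechanism
    assign("s1", u["s1"])
    assign("p1", u["p1"] ^ v["s1"])     # AND over {s1.W} is just s1.W
    assign("p2", u["p2"] ^ v["s1"])
    assign("c1", u["c1"] ^ int(v["s1"] or v["p1"] or v["p2"]))
    assign("c2", u["c2"] ^ int(v["p2"]))  # OR over the empty set is 0
    return v

def worlds():
    """Yield (u, P(u)) for every exogenous setting."""
    for bits in product((0, 1), repeat=len(NAMES)):
        u = dict(zip(NAMES, bits))
        p = 1.0
        for o in NAMES:
            p *= P1[o] if u[o] else 1 - P1[o]
        yield u, p

def prob(event, do=None, evidence=None):
    """P(event under do | evidence): abduction on the evidence, then action and prediction."""
    num = den = 0.0
    for u, p in worlds():
        if evidence and not evidence(solve_ground(u)):  # abduction: keep consistent u
            continue
        den += p
        num += p * event(solve_ground(u, do))           # action + prediction
    return num / den

print(round(prob(lambda v: v["s1"] == 1), 4))                  # 0.3
print(round(prob(lambda v: v["c1"] == 1, do={"p1": 1}), 4))    # 0.8
print(round(prob(lambda v: v["c1"] == 1, do={"s1": 1},
                 evidence=lambda v: v["s1"] == 0 and v["p1"] == 1 and v["c1"] == 0), 4))  # 0.0
```

The counterfactual comes out as 0: the evidence forces $c_1.U_B = 1$ (the car failed to brake although $p_1$ crossed), and with that noise value $c_1$ also fails to brake once the signal says ‘walk’.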

The classic challenge in causal inference is inferring causal effects from observational data for a fixed skeleton. Generalizing this, suppose we have data from a given set of source skeletons and want to answer a query about an unseen target skeleton. We show that even the *observational* distribution for this target is not identifiable in general.

###### Theorem 3.3 (Impossibility of observational inference across skeletons).

Consider a schema $\mathcal{S}$, source skeletons $\rho_1,\dots,\rho_l$, and target skeleton $\rho_\star$. Then, for any RSCM $\mathcal{M}$ over $\mathcal{S}$, there exists another RSCM $\mathcal{M}'$ over $\mathcal{S}$ such that $\mathcal{M}$ and $\mathcal{M}'$ agree on the observational distributions $P(\mathbf{v}_{\rho_k})$ for every source skeleton $\rho_k$ but disagree on the observational distribution $P(\mathbf{v}_{\rho_\star})$ of the target skeleton.

###### Example 6 (Impossibility of observational inference across skeletons).

Consider the source skeleton $\rho_A$ and target skeleton $\rho_C$ (Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)). Say we know $P(\mathbf{v}_{\rho_A})$ and want to learn $P(\mathbf{v}_{\rho_C})$. Following Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3), we can show that this is not possible. Let the true RSCM be $\mathcal{M}$ given in Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3), where the mechanism determining whether a car brakes is

$$
\mathsf{Car}.B \leftarrow \mathsf{Car}.U_B \oplus \bigg(\bigvee_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})} \mathsf{Sig}.W \lor \bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})} \mathsf{Ped}.X\bigg).
$$
Consider another RSCM $\mathcal{M}'$ which is identical to $\mathcal{M}$, except with a slightly different braking mechanism:

$$
\mathsf{Car}.B \leftarrow \mathsf{Car}.U_B \oplus \bigg(\bigoplus_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})} \mathsf{Sig}.W \lor \bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})} \mathsf{Ped}.X\bigg).
$$

In skeleton $\rho_A$, every car is controlled by at most one signal, so $\mathcal{M}$ and $\mathcal{M}'$ yield identical distributions for $\rho_A$. In particular, $c_1$ is controlled only by $s_1$, and $\bigvee\{s_1.W\}=\bigoplus\{s_1.W\}$; $c_2$ is not controlled by any signal, and $\bigvee\emptyset=\bigoplus\emptyset$. However, in the target skeleton $\rho_C$, car $c_1$ is controlled by two signals; here, $\bigvee\{s_1.W,s_2.W\}\neq\bigoplus\{s_1.W,s_2.W\}$ whenever both signals say ‘walk’. Consider the following conditionals; since $c_1.U_B$ is independent of the parents of $c_1.B$, conditioning on those parents leaves its distribution unchanged:

$$
\begin{aligned}
&P^{\mathcal{M}_{\rho_C}}(c_1.B=1\mid p_1.X=0,\ p_2.X=0,\ s_1.W=1,\ s_2.W=1)\\
&\qquad= P^{\mathcal{M}_{\rho_C}}\big(c_1.U_B\oplus((1\lor 1)\lor(0\lor 0))=1\big)
= P^{\mathcal{M}_{\rho_C}}(c_1.U_B=0)=0.8,\\
&P^{\mathcal{M}'_{\rho_C}}(c_1.B=1\mid p_1.X=0,\ p_2.X=0,\ s_1.W=1,\ s_2.W=1)\\
&\qquad= P^{\mathcal{M}'_{\rho_C}}\big(c_1.U_B\oplus((1\oplus 1)\lor(0\lor 0))=1\big)
= P^{\mathcal{M}'_{\rho_C}}(c_1.U_B=1)=0.2.
\end{aligned}
$$

Thus, the two RSCMs disagree on $P(\mathbf{v}_{\rho_C})$. $\square$
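The argument of Ex. 6 can be checked numerically. The sketch below assumes a hypothetical dictionary encoding of skeletons and the parameters of Ex. 3. It computes the full joint $P(\mathbf{v}_{\rho_A})$ under both braking mechanisms (OR versus XOR over the controlling signals) and confirms that they coincide, then evaluates the conditional on $\rho_C$. Only the part of $\rho_C$ that the conditional depends on is encoded: two controlling signals for $c_1$ and two pedestrians in its path, using the independence of $c_1.U_B$ from $c_1$’s parents.

```python
from functools import reduce
from itertools import product
from operator import xor

P_U = {"Sig": 0.3, "Ped": 0.4, "Car": 0.2}  # P(O.U = 1), Ex. 3

def or_agg(bits):
    return int(any(bits))

def xor_agg(bits):
    return reduce(xor, bits, 0)  # XOR over the empty set is 0

def joint(rho, sig_agg):
    """Exact P(v_rho) for the traffic RSCM, using sig_agg over controlling signals in Car.B."""
    objs = [(o, T) for T in ("Sig", "Ped", "Car") for o in rho[T]]
    dist = {}
    for bits in product((0, 1), repeat=len(objs)):
        u = dict(zip((o for o, _ in objs), bits))
        p = 1.0
        for o, T in objs:
            p *= P_U[T] if u[o] else 1 - P_U[T]
        v = {}
        for s in rho["Sig"]:
            v[s] = u[s]
        for q in rho["Ped"]:
            v[q] = u[q] ^ int(all(v[s] for s, x in rho["CtrlPed"] if x == q))
        for c in rho["Car"]:
            sig = sig_agg([v[s] for s, x in rho["CtrlCar"] if x == c])
            ped = or_agg([v[q] for q, x in rho["Path"] if x == c])
            v[c] = u[c] ^ int(sig or ped)
        key = tuple(sorted(v.items()))
        dist[key] = dist.get(key, 0.0) + p
    return dist

rho_A = {"Sig": ["s1"], "Ped": ["p1", "p2"], "Car": ["c1", "c2"],
         "CtrlPed": [("s1", "p1"), ("s1", "p2")], "CtrlCar": [("s1", "c1")],
         "Path": [("p1", "c1"), ("p2", "c1"), ("p2", "c2")]}

d_M, d_M2 = joint(rho_A, or_agg), joint(rho_A, xor_agg)
same_on_A = d_M.keys() == d_M2.keys() and all(abs(d_M[k] - d_M2[k]) < 1e-12 for k in d_M)

def p_c1_brakes(sig_agg, sig_values, ped_values, p_ub=P_U["Car"]):
    """P(c1.B = 1 | parents) = P(c1.U_B XOR g(parents) = 1), by independence of c1.U_B."""
    g = int(sig_agg(sig_values) or or_agg(ped_values))
    return 1 - p_ub if g == 1 else p_ub

print(same_on_A)                                        # True
print(round(p_c1_brakes(or_agg, [1, 1], [0, 0]), 4))    # 0.8 under M
print(round(p_c1_brakes(xor_agg, [1, 1], [0, 0]), 4))   # 0.2 under M'
```

Agreement on $\rho_A$ holds for every exogenous setting, not just on average, because OR and XOR coincide on inputs with at most one element.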

Suppose we want to learn an interventional distribution for an unseen target skeleton $\rho_\star$. Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3) already limits our ability to do this: it implies that even the observational distribution $P(\mathbf{v}_{\rho_\star})$, a prerequisite of existing causal inference methods, may not be identified from the source data. An independently interesting question, however, is: when we *do* know some distributions for the target skeleton, e.g., $P(\mathbf{v}_{\rho_\star})$, does this suffice to identify other interventional (or counterfactual) queries in the target? We give a negative answer.³

³ It may seem that a negative answer follows immediately from the causal hierarchy theorem (CHT) (Bareinboim et al., [2022](https://arxiv.org/html/2606.14892#bib.bib7)). However, the proof of the CHT relies on being able to construct an *arbitrary* SCM $\mathcal{M}'$ that matches the true SCM $\mathcal{M}$ on the given distribution(s) but not on the query. We are in a stricter setting where $\mathcal{M}'$ must share exogenous distributions and functions across objects of the same type, i.e., be a grounding of an RSCM.

###### Theorem 3.4 (Impossibility of causal inference within a skeleton).

Consider a schema $\mathcal{S}$ where at least one entity or relation type has more than one observed attribute. For any relational SCM $\mathcal{M}$ over $\mathcal{S}$ and skeleton $\rho$, there exists another relational SCM $\mathcal{M}'$ over $\mathcal{S}$ such that $\mathcal{M}$ and $\mathcal{M}'$ agree on the observational distribution $P(\mathbf{v}_\rho)$ but disagree on some interventional distribution over $\mathbf{V}_\rho$.

Thms. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3) and [3.4](https://arxiv.org/html/2606.14892#S3.Thmtheorem4) hold even when the relational structure is known. This suggests the need for assumptions about the causal structure, in addition to the relational structure.

## 4 Relational Identification

Previously, we showed that without further assumptions, even the observational distribution for unseen combinations of objects cannot be identified. In this section, we develop a graphical model approach to overcome this impossibility.

### 4.1 Defining Relational Causal Graphs

First, we extend causal graphs to include relational constraints (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)).

###### Definition 4.1 (Relational causal graph).

An RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ induces a *relational causal graph* $\mathcal{G}$ constructed as follows.

- Non-relational subgraph. For each object type $O \in \mathcal{E} \cup \mathcal{R}$, let $\mathcal{G}$ contain a node for each variable $O.A \in \mathbf{V}$, a directed edge $O.B \to O.A$ for any $O.B \in \mathbf{Pa}_{O.A}$, and a dashed bidirected edge $O.A \leftrightarrow O.B$ for any $O.B \in \mathbf{V}$ such that $\mathbf{U}_{O.A} \cap \mathbf{U}_{O.B} \neq \emptyset$, annotated with the constraint $O = O'$.
- Relational subgraph. For each variable $O.A \in \mathbf{V}$ and each relational parent $R = (\mathbf{W}, \phi, \mathrm{AGG}) \in \mathbf{Pa}^r_{O.A}$, let $\mathcal{G}$ contain a *relational node* $O.R$ and an edge $O.R \to O.A$. For each $T.W \in \mathbf{W}$, add an edge $T.W \to O.R$ annotated with $\phi$ and $\mathrm{AGG}$. Finally, for any $T.B \in \mathbf{V}$ such that $O.A$ and $T.B$ have exogenous relational parents $(\mathbf{W}_1, \phi_1, \mathrm{AGG}_1)$ and $(\mathbf{W}_2, \phi_2, \mathrm{AGG}_2)$, respectively, with some $Z.U \in \mathbf{W}_1 \cap \mathbf{W}_2$, add a dashed bidirected edge $O.A \leftrightarrow T.B$ annotated with the constraint $\exists Z : \phi_1 \wedge \phi_2$, or append the constraint to an existing $O.A \leftrightarrow T.B$ edge.

Fig. [2](https://arxiv.org/html/2606.14892#S4.F2) shows the graph for the traffic RSCM (Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3)). Like an RSCM, a relational causal graph $\mathcal{G}$ can be instantiated for any skeleton to yield a *ground graph* $\mathcal{G}_\rho$ (Def. [B.6](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition6)). The relational nodes in $\mathcal{G}_\rho$ can be marginalized (Def. [B.7](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition7), Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)) to yield a standard causal graph $\bar{\mathcal{G}}_\rho$.
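Grounding and marginalization can be sketched in a few lines, under stated assumptions: a skeleton is a dict of typed objects plus a set of `(relation, source, target)` triples; only the relational edges of the traffic graph are encoded (the non-relational subgraph and exogenous nodes are omitted, and no bidirected edges arise); and the example scene is hypothetical rather than one of $\rho_A$, $\rho_B$, $\rho_C$. `REL_PARENTS` and `ground` are illustrative names.

```python
# Hypothetical encoding of the relational edges of the traffic graph (Fig. 2):
# (child type, child attribute) -> [(relation, parent attribute), ...].
# A relation triple (name, src, dst) means name(src, dst), e.g. Ctrl(s1, c1).
REL_PARENTS = {
    ("Car", "B"): [("Ctrl", "W"), ("Path", "X")],
    ("Ped", "X"): [("Ctrl", "W")],
}


def ground(objects, relations):
    """Return (ground-graph edges with relational nodes, edges after marginalizing them)."""
    ground_edges, marginal_edges = set(), set()
    for obj, obj_type in objects.items():
        for (child_type, child_attr), parents in REL_PARENTS.items():
            if child_type != obj_type:
                continue
            child = f"{obj}.{child_attr}"
            for rel, parent_attr in parents:
                rel_node = f"{obj}.{rel}"                     # relational node o.R
                ground_edges.add((rel_node, child))           # o.R -> o.A
                for name, src, dst in relations:
                    if name == rel and dst == obj:
                        parent = f"{src}.{parent_attr}"
                        ground_edges.add((parent, rel_node))  # t.W -> o.R
                        marginal_edges.add((parent, child))   # t.W -> o.A once o.R is removed
    return ground_edges, marginal_edges


# Hypothetical scene: s1 controls p1 and c1, p1 is in c1's path, c2 is isolated.
objects = {"s1": "Sig", "p1": "Ped", "c1": "Car", "c2": "Car"}
relations = {("Ctrl", "s1", "p1"), ("Ctrl", "s1", "c1"), ("Path", "p1", "c1")}
g_rho, g_bar_rho = ground(objects, relations)
print(sorted(g_bar_rho))  # [('p1.X', 'c1.B'), ('s1.W', 'c1.B'), ('s1.W', 'p1.X')]
```

The isolated car `c2` keeps relational nodes with no incoming edges (empty aggregates), which disappear after marginalization.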

Causal graphs can be used to encode assumptions about the space of possible RSCMs. They help circumvent the impossibility results of Sec. [3](https://arxiv.org/html/2606.14892#S3), enabling *relational identification*.

###### Definition 4.2 (Relational counterfactual identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeletons $\rho_1, \dots, \rho_l$, source distributions $\mathbb{P} = \{\{P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))\}_{j=1}^{m_k}\}_{k=1}^{l}$, and target skeleton $\rho_\star$. Let $P(\mathbf{y}_* \mid \mathbf{x}_*)$ be a target query with $\mathbf{Y}_*, \mathbf{X}_* \subseteq \mathbf{V}_{\rho_\star}$.

We say $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is *relationally identifiable* from $\mathcal{G}$ and $\mathbb{P}$ if any two RSCMs $\mathcal{M}, \mathcal{M}'$ consistent with $\mathcal{G}$ that agree on the source data, i.e., for every $\rho_k$ and $j = 1, \dots, m_k$,

$$P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) > 0,$$

also agree on the query:

$$P^{\mathcal{M}_{\rho_\star}}(\mathbf{y}_* \mid \mathbf{x}_*) = P^{\mathcal{M}'_{\rho_\star}}(\mathbf{y}_* \mid \mathbf{x}_*).$$

Otherwise, the query is *relationally non-identifiable*.

A special case of the above definition is when all skeletons (source and target) are *isomorphic* to each other (Def. [B.1](https://arxiv.org/html/2606.14892#A2.Thmtheorem1)); we call this task same-skeleton identification. When the target skeleton is non-isomorphic to every source skeleton, we refer to the task as cross-skeleton identification.
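To make the same-skeleton versus cross-skeleton distinction concrete, the sketch below checks skeleton isomorphism by brute force. It assumes isomorphism means a type-preserving bijection between objects that maps relation tuples exactly onto relation tuples; Def. B.1 is not reproduced here, so treat this as an assumption, and note that relation attributes are ignored. The skeletons and the name `isomorphic` are hypothetical, and enumerating permutations is only practical for tiny scenes.

```python
from itertools import permutations, product


def isomorphic(sk1, sk2):
    """Brute-force search for a type-preserving bijection that maps relations onto relations."""
    objs1, rels1 = sk1
    objs2, rels2 = sk2
    if sorted(objs1.values()) != sorted(objs2.values()) or len(rels1) != len(rels2):
        return False
    types = sorted(set(objs1.values()))
    by_type1 = {t: [o for o in objs1 if objs1[o] == t] for t in types}
    by_type2 = {t: [o for o in objs2 if objs2[o] == t] for t in types}
    # Try every combination of per-type permutations (only viable for tiny skeletons).
    for perms in product(*(permutations(by_type2[t]) for t in types)):
        mapping = {}
        for t, perm in zip(types, perms):
            mapping.update(zip(by_type1[t], perm))
        if {(name, mapping[a], mapping[b]) for name, a, b in rels1} == set(rels2):
            return True
    return False


# Hypothetical skeletons: one signal controlling one of two cars, under different names,
# versus one signal controlling both cars.
sk_a = ({"s1": "Sig", "c1": "Car", "c2": "Car"}, {("Ctrl", "s1", "c1")})
sk_b = ({"s7": "Sig", "x": "Car", "y": "Car"}, {("Ctrl", "s7", "y")})
sk_c = ({"s1": "Sig", "c1": "Car", "c2": "Car"}, {("Ctrl", "s1", "c1"), ("Ctrl", "s1", "c2")})
print(isomorphic(sk_a, sk_b), isomorphic(sk_a, sk_c))  # True False
```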

###### Example 7 (Relational identification across skeletons).

Continuing Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1), suppose we are given the graph $\mathcal{G}$ in Fig. [2](https://arxiv.org/html/2606.14892#S4.F2). Let the target skeleton be $\rho_B$ (Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)) and consider the query

$$P^{\rho_B}(c_1.B = 1 \mid do(s_1.W = 1)),$$

the effect of setting signal $s_1$ to 'walk' on whether car $c_1$ brakes. A non-relational identification task for this query would assume all available distributions are from the same skeleton $\rho_B$. On the other hand, relational identification can ask whether the same query is answerable from a distribution for a different skeleton, e.g., $P(\mathbf{v}_{\rho_A})$.

Crucially, the target query is not interchangeable with $P^{\rho_A}(c_1.B = 1 \mid do(s_1.W = 1))$: in $\rho_A$, $s_1$ controls two pedestrians in the path of $c_1$, whereas in $\rho_B$, $s_1$ controls only one pedestrian in the path of $c_1$. So, the contribution of $s_1.W$ need not be the same. The target quantity is also not interchangeable with $P^{\rho_B}(c_2.B = 1 \mid do(s_1.W = 1))$, since $s_1$ has no effect on $c_2$. A non-relational query such as $P(\mathsf{Car}.B = 1 \mid do(\mathsf{Sig}.W = 1))$, the usual target of causal inference, hides this variation. □

### 4.2 Identification Machinery

We now introduce tools for relational identification\.

##### Observational inference\.

We first ask when observational distributions can be transferred across skeletons, addressing the limitation in Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3). We call a ground variable $o.A$ 'unconfounded' if there are no bidirected edges incident to it in the ground graph $\mathcal{G}_\rho$.

###### Theorem 4.3 (Observational identification across skeletons).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeleton $\rho$, and target skeleton $\rho_\star$. Let $o.A$ be an unconfounded variable in $\mathbf{V}_{\rho_\star}$. The conditional $P(o.a \mid \mathbf{pa}_{o.A}, \mathbf{pa}^r_{o.A})$ is relationally identifiable from $\mathcal{G}$ and $P(\mathbf{v}_\rho)$ if there exists a source instance $o' \in \rho$ such that $o'.A$ is unconfounded and $\mathrm{dom}(\mathbf{Pa}^r_{o.A}) \subseteq \mathrm{dom}(\mathbf{Pa}^r_{o'.A})$. In this case, $P(o.a \mid \mathbf{pa}_{o.A}, \mathbf{pa}^r_{o.A}) = P(o'.a \mid \mathbf{pa}_{o'.A}, \mathbf{pa}^r_{o'.A})$.

![Figure 2](https://arxiv.org/html/2606.14892v1/x2.png)
Figure 2: A relational causal graph (Def. [4.1](https://arxiv.org/html/2606.14892#S4.Thmtheorem1)) for the traffic RSCM in Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3). $\mathsf{Sig}.W$ denotes the state of a signal, $\mathsf{Ped}.X$ whether a pedestrian crosses, and $\mathsf{Car}.B$ whether a car brakes. A signal affects a pedestrian or car only if it controls them ($\mathsf{Ctrl}$), and a pedestrian affects a car only if they are in the car's path ($\mathsf{Path}$). The relational nodes represent aggregated values ($\mathsf{or}$, $\mathsf{and}$) of the related objects.

If $\mathcal{G}$ is Markovian, the unconfoundedness condition holds automatically, allowing us to recover the full target observational distribution $P(\mathbf{v}_{\rho_\star})$ if the support condition is satisfied for each instance (Cor. [D.4](https://arxiv.org/html/2606.14892#A4.Thmtheorem4)).

###### Example 8 (Observational identification across skeletons).

Continuing Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1), let $\rho_A$ and $\rho_C$ be the source and target skeletons, respectively, and let $\mathcal{G}$ (Fig. [2](https://arxiv.org/html/2606.14892#S4.F2)) be the relational causal graph. The conditional $P^{\rho_C}(c_2.b \mid s_2.w, p_2.x, p_3.x)$ in the target is identifiable from $P(\mathbf{v}_{\rho_A})$ and $\mathcal{G}$. This is because $c_1.B$ in $\mathbf{V}_{\rho_A}$ and $c_2.B$ in $\mathbf{V}_{\rho_C}$ are both unconfounded, and each is affected by exactly two pedestrians and one signal (and therefore has the same parent domains). In particular, writing the multisets of values $\bar{w} = \{s_2.w\}$ and $\bar{x} = \{p_2.x, p_3.x\}$,

$$\begin{aligned}
P^{\rho_C}(c_2.b \mid s_2.w, p_2.x, p_3.x)
&= P^{\rho_C}(c_2.b \mid \{s_2.W\} = \bar{w}, \{p_2.X, p_3.X\} = \bar{x})\\
&= P^{\rho_A}(c_1.b \mid \{s_1.W\} = \bar{w}, \{p_1.X, p_2.X\} = \bar{x}).
\end{aligned}$$

Consider a more complex query $P^{\rho_C}(c_1.b \mid s_1.w, s_2.w, p_1.x, p_2.x)$ in the target. No car in the source is controlled by two signals. If the parent domains are multisets of values, then no source car meets the support condition for this query. However, note the aggregation constraint in $\mathcal{G}$: $\mathsf{Car}.B$ depends on its controlling signals only via the aggregate $\lor$. Then, letting $\bar{w} = \lor(s_1.w, s_2.w)$ and $\bar{x} = \lor(p_1.x, p_2.x)$, we have

$$\begin{aligned}
P^{\rho_C}(c_1.b \mid s_1.w, s_2.w, p_1.x, p_2.x)
&= P^{\rho_C}(c_1.b \mid \lor(s_1.W, s_2.W) = \bar{w}, \lor(p_1.X, p_2.X) = \bar{x})\\
&= P^{\rho_A}(c_1.b \mid \lor(s_1.W) = \bar{w}, \lor(p_1.X, p_2.X) = \bar{x}),
\end{aligned}$$

rendering the query identifiable. □
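The support condition and the role of aggregation can be illustrated with a tabular stand-in for the shared $\mathsf{Car}.B$ mechanism: pool source records across cars, estimate $P(c.b = 1 \mid \text{parents})$ by frequency, then look up a target car's parent configuration. This is a sketch under assumptions, not the paper's method (that is the RNCM of Sec. 5): the source data are synthetic $\rho_A$-style records generated from an assumed ∨ mechanism with $P(U_B = 1) = 0.2$, and `fit_table`, `multiset_key`, and `or_key` are hypothetical helpers.

```python
from collections import defaultdict
from itertools import product


def fit_table(records, key_fn):
    """Pooled frequency estimate of P(b = 1 | key) over all source cars (shared mechanism)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number of b = 1, number of records]
    for sigs, peds, b in records:
        key = key_fn(sigs, peds)
        counts[key][0] += b
        counts[key][1] += 1
    return {key: n1 / n for key, (n1, n) in counts.items()}


def multiset_key(sigs, peds):
    """Parents as multisets of values: no aggregation assumed."""
    return tuple(sorted(sigs)), tuple(sorted(peds))


def or_key(sigs, peds):
    """Parents seen only through the OR aggregates specified in the graph."""
    return int(any(sigs)), int(any(peds))


# Hypothetical rho_A-style source data: one signal and two pedestrians per car,
# b = OR(parents) xor u, with u = 1 in one record out of five.
records = [((s,), (p1, p2), int(s or p1 or p2) ^ u)
           for s, p1, p2 in product([0, 1], repeat=3) for u in (0, 0, 0, 0, 1)]

by_multiset = fit_table(records, multiset_key)
by_or = fit_table(records, or_key)

# Target c_2 in rho_C (one signal, two pedestrians) is in support either way.
print(by_multiset.get(multiset_key((1,), (0, 1))))  # 0.8
# Target c_1 in rho_C (two signals): out of support as multisets, in support via OR.
print(by_multiset.get(multiset_key((1, 1), (0, 0))), by_or.get(or_key((1, 1), (0, 0))))  # None 0.8
```

A `None` lookup plays the role of a failed support condition; the ∨ key maps the two-signal target onto a configuration the source already covers.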

##### Causal inference\.

For the task of same-skeleton identification, we show how the ctf-calculus (Correa & Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib19)), a recent generalization of do-calculus (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)), can be used to establish identifiability.

###### Proposition 4.4 (Sufficient condition for same-skeleton relational identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, skeleton $\rho$, and family of interventional distributions $\mathbb{P}$ over $\mathbf{V}_\rho$. If $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is identifiable via ctf-calculus from the marginalized ground graph $\bar{\mathcal{G}}_\rho$ and $\mathbb{P}$, then it is also relationally identifiable from $\mathcal{G}$ and $\mathbb{P}$.

As a corollary, we recover the well-known backdoor adjustment formula (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)) in the relational setting (Cor. [D.2](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary2)).

###### Example 9 (Same-skeleton relational backdoor).

Continuing Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1), let $\rho = \rho_A$ be the skeleton of interest, $\mathcal{G}$ (Fig. [2](https://arxiv.org/html/2606.14892#S4.F2)) the graph, and $P(c_1.B \mid do(p_1.X))$ the query. In the marginalized ground graph $\bar{\mathcal{G}}_{\rho_A}$ (Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)), $\mathbf{Z} = \{s_1.W, p_2.X\}$ is a valid backdoor adjustment set for the query. This gives

$$P(c_1.b \mid do(p_1.x)) = \sum_{p_2.x,\, s_1.w} P(c_1.b \mid p_1.x, p_2.x, s_1.w) \cdot P(p_2.x, s_1.w).$$

□
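The adjustment can be checked numerically on a fully specified toy model. The sketch below assumes a structure for the neighbourhood of $c_1$ in $\rho_A$ ($s_1$ controls $p_1$, $p_2$, and $c_1$; both pedestrians are in $c_1$'s path), an assumed brake mechanism $c_1.B = c_1.U_B \oplus (s_1.W \lor p_1.X \lor p_2.X)$, and made-up parameters. It computes the observational joint exactly by enumeration, then compares the backdoor expression with the truncated-factorization value of $P(c_1.B = 1 \mid do(p_1.X = x_1))$ and with the naive, confounded conditional.

```python
from itertools import product

# Assumed (hypothetical) parameters for the neighbourhood of c_1 in rho_A.
P_W1 = 0.5                       # P(s1.W = 1)
P_X1_GIVEN_W = {0: 0.2, 1: 0.7}  # P(p.X = 1 | s1.W) for each controlled pedestrian
P_UB1 = 0.2                      # P(c1.U_B = 1)


def bern(p, v):
    """Probability of value v under Bernoulli(p)."""
    return p if v == 1 else 1 - p


def p_b(w, x1, x2, b):
    """P(c1.B = b | s1.W = w, p1.X = x1, p2.X = x2) for B = U_B xor (w or x1 or x2)."""
    inner = int(w or x1 or x2)
    return bern(P_UB1, b ^ inner)  # B = b requires U_B = b xor inner


def joint():
    """Exact observational distribution P(w, x1, x2, b)."""
    return {(w, x1, x2, b): bern(P_W1, w) * bern(P_X1_GIVEN_W[w], x1)
            * bern(P_X1_GIVEN_W[w], x2) * p_b(w, x1, x2, b)
            for w, x1, x2, b in product([0, 1], repeat=4)}


def backdoor(P, x1, b=1):
    """Sum over Z = {s1.W, p2.X} of P(b | x1, x2, w) * P(x2, w), using only the joint P."""
    total = 0.0
    for w, x2 in product([0, 1], repeat=2):
        p_xz = sum(P[(w, x1, x2, bb)] for bb in (0, 1))
        p_z = sum(P[(w, xx, x2, bb)] for xx in (0, 1) for bb in (0, 1))
        total += P[(w, x1, x2, b)] / p_xz * p_z
    return total


def truncated(x1, b=1):
    """Ground truth P(b | do(p1.X = x1)): drop p1.X's own factor and fix its value."""
    return sum(bern(P_W1, w) * bern(P_X1_GIVEN_W[w], x2) * p_b(w, x1, x2, b)
               for w, x2 in product([0, 1], repeat=2))


def naive(P, x1, b=1):
    """Unadjusted conditional P(b | x1), which is confounded through s1.W."""
    num = sum(P[(w, x1, x2, b)] for w, x2 in product([0, 1], repeat=2))
    den = sum(P[(w, x1, x2, bb)] for w, x2, bb in product([0, 1], repeat=3))
    return num / den


P = joint()
for x1 in (0, 1):
    print(x1, round(backdoor(P, x1), 4), round(truncated(x1), 4), round(naive(P, x1), 4))
```

With these parameters the backdoor and truncated values coincide (0.56 at $x_1 = 0$), while the naive conditional is noticeably lower because observing $p_1.X = 0$ makes $s_1.W = 0$ more likely.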

Next, we turn to non-identifiability. Many practical queries are within-instance: they ask how intervening on an individual's treatment affects that same individual's outcome when individuals interact (e.g., showing an online ad to one user in a social network and measuring that user's purchases). We show that if a within-instance query is non-identifiable in the standard, non-relational sense, then it is also relationally non-identifiable.

###### Proposition 4.5 (Necessary condition for within-instance relational identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeletons $\rho_1, \dots, \rho_l$ with available interventional distributions $\mathbb{P}$, and a target skeleton $\rho_\star$. Let $o \in \rho_\star$ be a target instance of type $O$ and consider a counterfactual query $P(\mathbf{y}_* \mid \mathbf{x}_*)$ with $\mathbf{Y}_*, \mathbf{X}_* \subseteq \mathbf{V}_o$, the attributes of $o$.

Let the restriction $\mathbb{P}|_O$ be defined as follows. For each source skeleton $\rho_k$, each distribution $P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) \in \mathbb{P}$, and each object $o' \in \rho_k(O)$, include $P(\mathbf{v}_{\rho_k, o'} \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_{\rho_k, o'}))$ in $\mathbb{P}|_O$, with instance identifiers omitted. Let $\mathcal{G}_o$ be the induced subgraph of the marginalized ground graph $\bar{\mathcal{G}}_{\rho_\star}$ on $\mathbf{V}_o$, with instance identifiers omitted.

If $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is non-identifiable via ctf-calculus from $\mathbb{P}|_O$ and $\mathcal{G}_o$, then it is relationally non-identifiable from $\mathcal{G}$ and $\mathbb{P}$.

See Ex. [C.2](https://arxiv.org/html/2606.14892#A3.Thmadxexample2) for an application of the above result.

This section developed symbolic criteria for relational identification across a range of settings\. We next introduce a neural approach that generalizes these criteria and provides a practical route to identification\.

**Algorithm 1** $\mathsf{RelationalNeuralID}$

**Input:** schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source data $\mathcal{D} = \big\{\big(\rho_k, \{P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))\}_{j=1}^{m_k}\big)\big\}_{k=1}^{l}$, target skeleton $\rho_\star$, query $P(\mathbf{y}_* \mid \mathbf{x}_*)$

1. $\hat{M} \leftarrow \mathcal{G}\text{-RNCM}$
2. $\hat{\theta}_l \leftarrow \arg\min_{\theta \in \Theta(\hat{M})} P^{\hat{M}_{\rho_\star}(\theta)}(\mathbf{y}_* \mid \mathbf{x}_*)$ subject to $P^{\hat{M}_{\rho_k}(\theta)}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))$ for all $k, j$
3. $q_l \leftarrow P^{\hat{M}_{\rho_\star}(\hat{\theta}_l)}(\mathbf{y}_* \mid \mathbf{x}_*)$
4. $\hat{\theta}_r \leftarrow \arg\max_{\theta \in \Theta(\hat{M})} P^{\hat{M}_{\rho_\star}(\theta)}(\mathbf{y}_* \mid \mathbf{x}_*)$ subject to $P^{\hat{M}_{\rho_k}(\theta)}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))$ for all $k, j$
5. $q_r \leftarrow P^{\hat{M}_{\rho_\star}(\hat{\theta}_r)}(\mathbf{y}_* \mid \mathbf{x}_*)$
6. **if** $q_l = q_r$ **then return** $q_l$ **else return** FAIL
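The min/max logic of Alg. 1 (fit the source data, then push the target query as low and as high as possible) can be illustrated without neural networks. The sketch below swaps the gradient-based search over RNCM parameters for exhaustive search over a tiny hypothetical model family: the signal aggregator of $\mathsf{Car}.B$ is either ∨ or ⊕, and $P(\mathsf{Car}.U_B = 1)$ ranges over a small grid. The source data cover only neighbourhoods with at most one signal, as in $\rho_A$; all names (`relational_id`, `source_table`, the grid) are illustrative assumptions.

```python
from itertools import product


def p_brake(agg, p_ub1, sigs, peds):
    """P(c.B = 1 | parents) for B = U_B xor (AGG(signals) or OR(pedestrians))."""
    sig_term = int(any(sigs)) if agg == "or" else sum(sigs) % 2
    inner = int(sig_term or any(peds))
    return p_ub1 if inner == 0 else 1 - p_ub1


def source_table(agg, p_ub1):
    """Hypothetical source data: P(B = 1 | parents) for neighbourhoods with <= 1 signal."""
    return {(sigs, peds): p_brake(agg, p_ub1, sigs, peds)
            for n_s in (0, 1)
            for sigs in product([0, 1], repeat=n_s)
            for peds in product([0, 1], repeat=2)}


def relational_id(candidates, observed, query, tol=1e-9):
    """Min and max the query over candidates that reproduce the source data (cf. Alg. 1)."""
    feasible = []
    for model in candidates:
        table = source_table(*model)
        if all(abs(table[k] - v) <= tol for k, v in observed.items()):
            feasible.append(model)
    if not feasible:
        raise ValueError("no candidate reproduces the source data")
    values = [query(*model) for model in feasible]
    q_l, q_r = min(values), max(values)
    return q_l if q_r - q_l <= tol else "FAIL"


def query(agg, p_ub1):
    """Target query: P(c_1.B = 1 | two walking signals, no crossing pedestrians) in rho_C."""
    return p_brake(agg, p_ub1, (1, 1), (0, 0))


observed = source_table("or", 0.2)                        # data from the 'true' model
grid = [0.1, 0.2, 0.3]
free = [(agg, p) for agg in ("or", "xor") for p in grid]  # aggregator left unspecified
constrained = [("or", p) for p in grid]                   # graph fixes the OR aggregate
print(relational_id(free, observed, query))         # FAIL
print(relational_id(constrained, observed, query))  # 0.8
```

Without an aggregation constraint both aggregators fit the source data yet disagree on the target, so the gap stays open; restricting the family to the ∨ aggregate, as the graph in Fig. 2 does, closes it.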

## 5 Relational Neural Causal Models

Neural causal models (NCMs) (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130), [2023](https://arxiv.org/html/2606.14892#bib.bib131)) parameterize an SCM with neural networks. We adopt the same idea in the relational setting, developing a neural RSCM constrained by a given graph to enable relational identification from graph and data.

A neural RSCM contains one neural network for each variable $O.A$; this network is reused for all ground variables $o.A$. The given graph $\mathcal{G}$ constrains which inputs each network may use. In the traffic example, one network for $\mathsf{Car}.B$ is applied to every car's $c.B$, taking as input that car's non-relational inputs (here, the exogenous $c.U_B$) and relational parents (e.g., the counts of $s.W$ values for controlling signals and of $p.X$ values for in-path pedestrians).

##### Assumptions\.

We assume throughout this section that observed attributes are discrete with finite domains, that $\mathcal{G}$ is $\rho$-Markovian (no unobserved confounding between different instances), and that each relational-parent multiset has bounded size (see Sec. [D.3](https://arxiv.org/html/2606.14892#A4.SS3)).

![Figure 3(a)](https://arxiv.org/html/2606.14892v1/x3.png) (a) Target $c_1$ (2 signals, 2 pedestrians)
![Figure 3(b)](https://arxiv.org/html/2606.14892v1/x4.png) (b) Target $c_2$ (1 signal, 2 pedestrians)
![Figure 3(c)](https://arxiv.org/html/2606.14892v1/x5.png) (c) Target $c_3$ (0 signals, 1 pedestrian)

Figure 3: Estimation accuracy of RNCMs for causal effects across the traffic scenes in Fig. [1](https://arxiv.org/html/2606.14892#S1.F1) (Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1)). The x-axis denotes the source skeleton used for training. The y-axis denotes $\log(\mathrm{MSE})$ to ground truth of various methods in estimating an interventional query in the target skeleton $\rho_C$, averaged over 10 seeds (lower is better). Each panel represents a different car for which the query is formulated. RNCMs consistently outperform baselines, even when the baselines are trained directly on the target skeleton while the RNCM is not. In identifiable cases, RNCMs often match the gold-standard NCM$^*$ trained directly on the target. Performance degrades only in non-identifiable settings (training on $\rho_A$ and evaluating on $c_1$). 'OOS' denotes out-of-support cases for which the estimate is undefined.

###### Definition 5.1 ($\mathcal{G}$-Constrained Relational Neural Causal Model ($\mathcal{G}$-RNCM)).

Consider a schema $\mathcal{S}$ and a relational causal graph $\mathcal{G}$. A $\mathcal{G}$-RNCM $\mathcal{N} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ is an RSCM constructed as follows.

1. $\mathbf{V}$ contains a variable $O.A$ for every non-relational node $O.A$ in $\mathcal{G}$.
2. For every object type $O \in \mathcal{E} \cup \mathcal{R}$ and every maximal bidirected clique $\mathbf{C}$ in $\mathcal{G}$ over variables belonging to type $O$, $\mathbf{U}$ contains a variable $O.U_{\mathbf{C}} \sim \mathcal{U}([0, 1])$.⁴
3. For each $O.A \in \mathbf{V}$, the mechanism $f_{O.A}$ is a feed-forward neural network, $O.A \leftarrow f_{O.A}(\mathbf{pa}_{O.A}, \mathbf{u}_{O.A}, \mathbf{pa}^r_{O.A})$. Above, $\mathbf{Pa}_{O.A}$ consists of variables $O.B \in \mathbf{V}$ with a directed edge $O.B \to O.A$ in $\mathcal{G}$; $\mathbf{U}_{O.A}$ consists of variables $O.U_{\mathbf{C}}$ such that $O.A$ is in the clique $\mathbf{C}$; and $\mathbf{Pa}^r_{O.A}$ consists of multisets (or aggregates as specified in $\mathcal{G}$) of variables $\mathbf{W}$ such that there is an edge $O.R \to O.A$ in $\mathcal{G}$, where $O.R$ is a relational node with $R = (\mathbf{W}, \phi, \mathrm{AGG})$.

⁴ See Sec. [B.2](https://arxiv.org/html/2606.14892#A2.SS2) for the definition of a maximal bidirected clique.

Note that since $\mathcal{G}$ is $\rho$-Markovian, the set $\mathbf{U}^r_{O.A}$ is empty for each $O.A$ in the definition above. We show that $\mathcal{G}$-RNCMs are expressive enough to represent any RSCM consistent with $\mathcal{G}$, under our stated assumptions.
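Parameter sharing in a $\mathcal{G}$-RNCM mechanism can be sketched in pure Python, with several loudly labeled assumptions: the layer sizes and random initialization are arbitrary and training is omitted; relational parents are encoded as histograms of value counts (the encoding footnote 5 describes for the aggregator-free experiments); and the binary output is made deterministic in the uniform exogenous input via $o.A = \mathbb{1}[u < \sigma(\text{net}(\cdot))]$, which is one simple choice rather than necessarily the paper's architecture. `SharedMechanism`, `histogram`, and `car_features` are hypothetical names.

```python
import math
import random


class SharedMechanism:
    """One small MLP for a schema variable O.A, reused for every ground o.A."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def prob(self, features):
        """Probability that o.A = 1 given fixed-size parent features."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))

    def __call__(self, features, u):
        """Deterministic mechanism given exogenous u ~ U([0, 1]): o.A = 1 iff u < prob."""
        return int(u < self.prob(features))


def histogram(values, domain=(0, 1)):
    """Counts of each domain value: a fixed-size summary of a multiset of parent values."""
    return [sum(1 for v in values if v == d) for d in domain]


def car_features(signal_states, pedestrian_states):
    """Relational parents of c.B: controlling-signal and in-path-pedestrian histograms."""
    return histogram(signal_states) + histogram(pedestrian_states)


f_car_b = SharedMechanism(n_inputs=4)  # the same weights serve every car

# Cars with differently sized neighbourhoods share one network (cf. c_1 and c_3 in rho_C).
neighbourhoods = {"c1": ([1, 1], [0, 0]), "c3": ([], [1])}
rng = random.Random(1)
for car, (sigs, peds) in neighbourhoods.items():
    x = car_features(sigs, peds)
    print(car, x, round(f_car_b.prob(x), 3), f_car_b(x, rng.random()))
```

The histogram keeps the input width fixed no matter how many signals or pedestrians a car has, which is what lets a single set of weights be applied across skeletons.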

###### Theorem 5.2 (Expressivity of RNCMs).

Consider a relational schema $\mathcal{S}$. For every RSCM $\mathcal{M}$ over $\mathcal{S}$ inducing relational causal graph $\mathcal{G}$, there exists a $\mathcal{G}$-RNCM $\mathcal{N}$ such that for every skeleton $\rho$, the ground RSCMs $\mathcal{M}_\rho$ and $\mathcal{N}_\rho$ induce the same counterfactual distributions over $\mathbf{V}_\rho$.

The expressivity of RNCMs lays the groundwork for causal identification via neural methods. In particular, we can now train the parameters of an RNCM to fit the source data while minimizing or maximizing the query on our target skeleton; this procedure is given in Alg. [1](https://arxiv.org/html/2606.14892#alg1). As a corollary of Thm. [5.2](https://arxiv.org/html/2606.14892#S5.Thmtheorem2), Alg. [1](https://arxiv.org/html/2606.14892#alg1) is sound and complete for the task of relational identification (Cor. [D.7](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary7)).

While Alg. [1](https://arxiv.org/html/2606.14892#alg1) is stated as taking distributions as inputs, in practice (and in our implementation) each distribution is observed only through finitely many samples. The equality constraints are then replaced by approximate constraints induced by a maximum-likelihood objective, adapting standard NCM training procedures (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130), [2023](https://arxiv.org/html/2606.14892#bib.bib131)) to allow for parameter sharing across instances (Appendix [E](https://arxiv.org/html/2606.14892#A5)).

## 6 Experiments

### 6.1 Estimation accuracy across traffic scenes

We evaluate how well RNCMs can estimate identifiable queries on both seen and unseen skeletons using Alg. [2](https://arxiv.org/html/2606.14892#alg2). We use the traffic schema in Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1) and graph $\mathcal{G}$ in Fig. [2](https://arxiv.org/html/2606.14892#S4.F2), but omit the aggregators, creating a more challenging task.⁵

⁵ We use a histogram of counts of the different values in the domain of the relational parents. For discrete variables, this is a sufficient statistic for the multiset of relational parent values.

##### Setup\.

We train on four source settings from Fig. [1](https://arxiv.org/html/2606.14892#S1.F1): three use a single skeleton for training ($\rho_A$, $\rho_B$, or $\rho_C$) and one uses a pair of skeletons ($\rho_A$ and $\rho_B$). For each source skeleton, we generate $n = 10^4$ observational samples and train five models: (i) a $\mathcal{G}$-RNCM; (ii) NCM-X, a non-relational causal baseline that trains a standard NCM on data consisting of a Cartesian product of all $(s.W, p.X, c.B)$ triples; (iii) NCM-J, a causal baseline with partial relational information, similar to NCM-X but restricted to 'joined' triples where $s, p, c$ are pairwise related; (iv) Rel-MLP, a relational non-causal baseline that predicts $c.B$ from the number and states of related cars and pedestrians; and (v) Rel-MLP + deg, a variant of Rel-MLP that predicts $c.B$ only from data on cars with the same degree as $c$ (see Appendix [E.4](https://arxiv.org/html/2606.14892#A5.SS4) for details). We also train a gold-standard NCM$^*$ directly on the target data and ground graph (without parameter sharing). Then, we evaluate each trained model on the target skeleton $\rho_\star = \rho_C$, estimating for each car $c$ the query: what is the probability that $c$ brakes when all the pedestrians in its path are set to 'cross'? The three cars in $\rho_C$ yield three queries in total (Table [E.4.1](https://arxiv.org/html/2606.14892#A5.SS4.T1)); each query requires backdoor adjustment on $s.W$ for signals $s$ that both control $c$ and control pedestrians $p$ in the path of $c$.

##### Comparison with baselines\.

RNCMs consistently outperform flat NCMs and Rel-MLP variants, often by $\approx 100\times$ in identifiable source-target settings (Fig. [3](https://arxiv.org/html/2606.14892#S5.F3)). Notably, even when the RNCM is trained on a source skeleton distinct from the target, it outperforms baselines trained directly on the target. For instance, an RNCM trained on source skeleton $\rho_A$ outperforms NCM-X and NCM-J trained on $\rho_C$ when estimating the query for car $c_2$ in $\rho_C$ (Fig. [3(b)](https://arxiv.org/html/2606.14892#S5.F3.sf2)). This highlights the importance of relational structure in determining causal effects. Rel-MLP + deg is notably strong on car $c_3$ (Fig. [3(c)](https://arxiv.org/html/2606.14892#S5.F3.sf3)): since no signal controls $c_3$, there is no confounding, so $P(c_3.b \mid do(p_3.x)) = P(c_3.b \mid p_3.x)$, which explains the success of this non-causal method. Finally, RNCMs trained on sources distinct from the target are typically strong enough to match or even exceed the gold-standard NCM$^*$ trained directly on the target. The main exception is training on $\rho_A$ and evaluating on $c_1$ (Fig. [3(a)](https://arxiv.org/html/2606.14892#S5.F3.sf1)), a non-identifiable case which we discuss next.

##### Role of training source\.

Our source-target combinations cover three cases: generalization to exactly matched neighbourhoods (e.g., source $\rho_A$ to car $c_2$ in $\rho_C$); generalization to smaller neighbourhoods (e.g., source $\rho_B$ to car $c_2$ in $\rho_C$); and generalization to larger neighbourhoods (e.g., source $\rho_A$ to car $c_1$ in $\rho_C$). We find that RNCMs succeed at the first two but fail at the third, as predicted by the identifiability theory of Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3) and Prop. [4.4](https://arxiv.org/html/2606.14892#S4.Thmtheorem4). For instance, training on $\rho_A$ underperforms on $c_1$ (Fig. [3(a)](https://arxiv.org/html/2606.14892#S5.F3.sf1)): in $\rho_A$, every car is controlled by at most one signal, so no source car meets the support condition (Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3)) for $c_1$, which is controlled by two signals. Still, combining $\rho_A$ and $\rho_B$ in training recovers performance, illustrating how RNCMs integrate sources to improve accuracy.

### 6.2 Identification accuracy across traffic scenes

We evaluate how well RNCMs are able to decide when a causal effect is identifiable, following Alg. [1](https://arxiv.org/html/2606.14892#alg1).

![Figure 4](https://arxiv.org/html/2606.14892v1/x6.png)
Figure 4: Identification accuracy of RNCMs on unseen traffic scenes (Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2)). (left) Relational causal graphs $\mathcal{G}_{\mathsf{bow}}$ (top) and $\mathcal{G}_{\mathsf{iv}}$ (bottom). Within each car $C$, $\mathsf{Car}.X$ affects and is confounded with $\mathsf{Car}.Y$ in both graphs. (right) The average max–min gap produced by $\mathsf{RelationalNeuralID}$ (Alg. [1](https://arxiv.org/html/2606.14892#alg1)) collapses toward zero for the identifiable different-car query (blue) but remains large for the non-identifiable same-car query (orange). Shaded regions denote the 25th and 75th percentiles across 10 random seeds.

##### Setup.

We use a schema with one entity type $\mathsf{Car}$ ($C$) and one relation $\mathsf{Behind}(\mathsf{Car}_1, \mathsf{Car}_2)$. Each car has two observed attributes, $\mathsf{Car}.X$ and $\mathsf{Car}.Y$. The source skeleton $\rho$ has two cars $c_1, c_2$ with $\mathsf{Behind}(c_1, c_2)$. The target skeleton $\rho_\star$ has three cars $c_1, c_2, c_3$ with $\mathsf{Behind}(c_1, c_2)$, $\mathsf{Behind}(c_2, c_3)$, and $\mathsf{Behind}(c_1, c_3)$. We consider two causal graphs, the relational bow $\mathcal{G}_{\mathsf{bow}}$ and the relational IV $\mathcal{G}_{\mathsf{iv}}$ (Figs. [4](https://arxiv.org/html/2606.14892#S6.F4), [E.6.1](https://arxiv.org/html/2606.14892#A5.SS6.F1)). Data generation follows Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1) but with majority aggregation. For each graph, we consider two target queries on $\rho_\star$: $Q_d := P^{\rho_\star}(c_3.Y \mid do(c_2.X))$, an identifiable effect across cars; and $Q_s := P^{\rho_\star}(c_3.Y \mid do(c_3.X))$, a non-identifiable effect within the same car. Results of Alg. [1](https://arxiv.org/html/2606.14892#alg1) are shown in Fig. [4](https://arxiv.org/html/2606.14892#S6.F4).
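The two skeletons can be written down as plain data. The minimal Python sketch below is our own illustration, not the paper's code. It reads $\mathsf{Behind}(a, b)$ as "car $a$ is behind car $b$", the reading under which $c_3$ has two cars behind it, and checks that no source car has more than one car behind it. The `majority` helper only illustrates what a majority aggregate over a variable-size neighborhood could look like; its tie-breaking rule, the attribute it is applied to, and the `x_values` used below are hypothetical.

```python
# Behind(a, b) is read here as "car a is behind car b", so a is in b's neighborhood.
SOURCE_BEHIND = [("c1", "c2")]
TARGET_BEHIND = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]


def cars_behind(cars: list[str], behind: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each car to the sorted list of cars behind it in a relational skeleton."""
    neighbors: dict[str, list[str]] = {car: [] for car in cars}
    for back, front in behind:
        neighbors[front].append(back)
    return {car: sorted(others) for car, others in neighbors.items()}


def majority(bits: list[int], tie: int = 0) -> int:
    """Majority vote over binary values; the tie-break (incl. empty input) is hypothetical."""
    ones = sum(bits)
    if 2 * ones > len(bits):
        return 1
    if 2 * ones < len(bits):
        return 0
    return tie


source = cars_behind(["c1", "c2"], SOURCE_BEHIND)
target = cars_behind(["c1", "c2", "c3"], TARGET_BEHIND)
print(source)  # {'c1': [], 'c2': ['c1']}
print(target)  # {'c1': [], 'c2': ['c1'], 'c3': ['c1', 'c2']}

# Largest neighborhood in the source vs. the neighborhood of the target car c3.
print(max(len(v) for v in source.values()), len(target["c3"]))  # 1 2

# Hypothetical binary X values, aggregated over the cars behind c3.
x_values = {"c1": 1, "c2": 1}
print(majority([x_values[c] for c in target["c3"]]))  # 1
```

An aggregate such as `majority` accepts any number of inputs, which is what lets a single mechanism be applied to cars whose neighborhoods differ in size across skeletons.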

##### Results\.

Although the target car $c_3$ in $\rho_\star$ has a larger relational neighborhood (two cars behind it) than any car in the source $\rho$, $\mathcal{G}$-RNCMs correctly assess identifiability in the target, matching the theory. In the non-relational IV and bow graphs, the causal effect $P(y \mid do(x))$ is non-identifiable from purely observational data (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)). By Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5), the same conclusion carries over to the within-car query $Q_s$. Consistent with this, the estimates of RNCMs trained to minimize and to maximize $Q_s$ remain far apart (orange curve in Fig. [4](https://arxiv.org/html/2606.14892#S6.F4)). In contrast, for both graphs we show that the cross-car query $Q_d$ on $\rho_\star$ *is* identifiable from the source data $P(\mathbf{v}_\rho)$ (Prop. [E.4](https://arxiv.org/html/2606.14892#A5.Thmadxprop4)). Accordingly, the RNCM max–min gap for $Q_d$ collapses to approximately zero (solid blue curve in Fig. [4](https://arxiv.org/html/2606.14892#S6.F4)), indicating that Alg. [1](https://arxiv.org/html/2606.14892#alg1) correctly certifies identifiability in this setting.
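The non-identifiability of $P(y \mid do(x))$ in the non-relational bow graph can be checked by hand for binary $X$ and $Y$. The Python sketch below is our own illustration, not Alg. 1: it parameterizes every model compatible with an observed joint $P(X, Y)$ through the four canonical response types of $Y$ (never 1, always 1, complier, defier) and sweeps the two free proportions on a grid. The resulting range of $P(Y=1 \mid do(X=1))$ is $[P(X=1,Y=1),\ P(X=1,Y=1) + P(X=0)]$, so the max–min gap equals $P(X=0)$ and stays positive whenever $P(X=0) > 0$. The observed probabilities in the usage lines are made up.

```python
import itertools
import math


def bow_query_range(p: dict[tuple[int, int], float], steps: int = 11) -> tuple[float, float]:
    """Min and max of P(Y=1 | do(X=1)) over bow-graph models that reproduce p.

    p maps (x, y) to the observed joint probability P(X=x, Y=y), with x, y in {0, 1}.
    """
    # Units observed with X=1 reveal their outcome under do(X=1) directly.
    fixed = p[(1, 1)]
    # Units observed with X=0, Y=1 are 'always-1' (Y=1 under do(X=1)) or 'defiers' (Y=0);
    # units observed with X=0, Y=0 are 'compliers' (Y=1 under do(X=1)) or 'never-1' (Y=0).
    # Unobserved confounding leaves these splits unconstrained: a and b range over [0, 1].
    grid = [i / (steps - 1) for i in range(steps)]
    values = [fixed + a * p[(0, 1)] + b * p[(0, 0)] for a, b in itertools.product(grid, grid)]
    return min(values), max(values)


# Made-up observational distribution P(X, Y).
p_obs = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}
low, high = bow_query_range(p_obs)
p_x0 = p_obs[(0, 1)] + p_obs[(0, 0)]
print(round(low, 6), round(high, 6))   # 0.3 0.8
print(math.isclose(high - low, p_x0))  # True: the gap equals P(X=0)
```

Because the query is linear in the two free fractions, the extremes sit at the grid corners, so even a coarse grid recovers the exact bounds here; a neural max–min search plays the same role when no such closed form is available.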

In Sec. [E](https://arxiv.org/html/2606.14892#A5), we evaluate RNCMs on larger relational structures, under model misspecification, and on front-door estimation.

## 7 Conclusions

In this paper, we introduced *relational structural causal models* (RSCMs), a generalization of SCMs to object-relational domains. We characterized the limits of learning in this setting (Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3), Thm. [3.4](https://arxiv.org/html/2606.14892#S3.Thmtheorem4)) and gave graphical conditions for identification within and across relational skeletons (Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3), Cor. [D.4](https://arxiv.org/html/2606.14892#A4.Thmtheorem4), Props. [4.4](https://arxiv.org/html/2606.14892#S4.Thmtheorem4), [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5)). Finally, we developed *relational neural causal models* for identification that are provably correct (Alg. [1](https://arxiv.org/html/2606.14892#alg1), Alg. [2](https://arxiv.org/html/2606.14892#alg2), Thm. [5.2](https://arxiv.org/html/2606.14892#S5.Thmtheorem2), Cor. [D.7](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary7)) and empirically outperform existing neural-causal baselines (Sec. [6](https://arxiv.org/html/2606.14892#S6)). We hope our work informs causal reasoning in domains where relations between objects are of scientific importance.

## Acknowledgements

This research is supported in part by the NSF, ONR, AFOSR, DoE, Amazon, JP Morgan, and The Alfred P. Sloan Foundation. We thank Kevin Xia and Yushu Pan for their helpful comments.

## Impact Statement

This paper presents results that advance the field of machine learning, bringing together the areas of causal inference and relational learning. In particular, we contribute methodological foundations for causal inference in object-relational settings and validate our methods on simulated traffic scenes. If adopted in practice, our work could enable better generalization in settings such as autonomous driving and robotics.

## References

- Ahsan et al. (2022) Ahsan, R., Arbour, D., and Zheleva, E. Relational causal models with cycles: Representation and reasoning. In Schölkopf, B., Uhler, C., and Zhang, K. (eds.), *Proceedings of the First Conference on Causal Learning and Reasoning*, volume 177 of *Proceedings of Machine Learning Research*, pp. 1–18. PMLR, 11–13 Apr 2022. URL [https://proceedings.mlr.press/v177/ahsan22a.html](https://proceedings.mlr.press/v177/ahsan22a.html).
- Arbour et al. (2016) Arbour, D., Garant, D., and Jensen, D. Inferring network effects from observational data. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, KDD ’16, pp. 715–724, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450342322. doi: 10.1145/2939672.2939791. URL [https://doi.org/10.1145/2939672.2939791](https://doi.org/10.1145/2939672.2939791).
- Bach et al. (2017) Bach, S. H., Broecheler, M., Huang, B., and Getoor, L. Hinge-loss Markov random fields and probabilistic soft logic. *J. Mach. Learn. Res.*, 18(1):3846–3912, January 2017. ISSN 1532-4435.
- Baek et al. (2025) Baek, J., Wu, Y.-F., Singh, G., and Ahn, S. Dreamweaver: Learning compositional world models from pixels. In *The Thirteenth International Conference on Learning Representations*, 2025. URL [https://openreview.net/forum?id=e5mTvjXG9u](https://openreview.net/forum?id=e5mTvjXG9u).
- Barabási et al. (2011) Barabási, A.-L., Gulbahce, N., and Loscalzo, J. Network Medicine: A Network-based Approach to Human Disease. *Nature reviews. Genetics*, 12(1):56–68, January 2011. ISSN 1471-0056. doi: 10.1038/nrg2918. URL [https://pmc.ncbi.nlm.nih.gov/articles/PMC3140052/](https://pmc.ncbi.nlm.nih.gov/articles/PMC3140052/).
- Bareinboim (2025) Bareinboim, E. *Causal Artificial Intelligence: A Roadmap for Building Causally Intelligent Systems*. 2025. URL [https://causalai-book.net/](https://causalai-book.net/). Draft version (October 11, 2025).
- Bareinboim et al. (2022) Bareinboim, E., Correa, J. D., Ibeling, D., and Icard, T. On Pearl’s hierarchy and the foundations of causal inference. In *Probabilistic and Causal Inference: The Works of Judea Pearl*, pp. 507–556. Association for Computing Machinery, New York, NY, USA, 1st edition, 2022.
- Bareinboim et al. (2024) Bareinboim, E., Lee, S., and Zhang, J. An introduction to causal reinforcement learning. Technical Report 65, December 2024. URL [https://causalai.net/r65.pdf](https://causalai.net/r65.pdf).
- Battaglia et al. (2018) Battaglia, P., Hamrick, J. B. C., Bapst, V., Sanchez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G. E., Vaswani, A., Allen, K., Nash, C., Langston, V. J., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y., and Pascanu, R. Relational inductive biases, deep learning, and graph networks. *arXiv*, 2018. URL [https://arxiv.org/pdf/1806.01261.pdf](https://arxiv.org/pdf/1806.01261.pdf).
- Bhattacharya et al. (2020) Bhattacharya, R., Malinsky, D., and Shpitser, I. Causal inference under interference and network uncertainty. In Adams, R. P. and Gogate, V. (eds.), *Proceedings of The 35th Uncertainty in Artificial Intelligence Conference*, volume 115 of *Proceedings of Machine Learning Research*, pp. 1028–1038. PMLR, 22–25 Jul 2020. URL [https://proceedings.mlr.press/v115/bhattacharya20a.html](https://proceedings.mlr.press/v115/bhattacharya20a.html).
- Blei (2012) Blei, D. M. Probabilistic topic models. 55, 2012.
- Bordes et al. (2013) Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), *Advances in Neural Information Processing Systems*, volume 26. Curran Associates, Inc., 2013. URL [https://proceedings.neurips.cc/paper_files/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf).
- Bunescu & Mooney (2004) Bunescu, R. and Mooney, R. J. Collective information extraction with relational Markov networks. In *Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics*, ACL ’04, pp. 438–es, USA, 2004. Association for Computational Linguistics. doi: 10.3115/1218955.1219011. URL [https://doi.org/10.3115/1218955.1219011](https://doi.org/10.3115/1218955.1219011).
- Buntine (1994) Buntine, W. L. Operations for learning with graphical models. *J. Artif. Int. Res.*, 2(1):159–225, December 1994. ISSN 1076-9757. doi: 10.1613/jair.62. URL [https://doi.org/10.1613/jair.62](https://doi.org/10.1613/jair.62).
- Chen et al. (2025) Chen, T., Kanatsoulis, C., and Leskovec, J. RelGNN: Composite message passing for relational deep learning. In Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K., and Zhu, J. (eds.), *Proceedings of the 42nd International Conference on Machine Learning*, volume 267 of *Proceedings of Machine Learning Research*, pp. 8296–8312. PMLR, 13–19 Jul 2025. URL [https://proceedings.mlr.press/v267/chen25ad.html](https://proceedings.mlr.press/v267/chen25ad.html).
- Chollet (2019) Chollet, F. On the measure of intelligence, 2019. URL [https://arxiv.org/abs/1911.01547](https://arxiv.org/abs/1911.01547).
- Chomsky (1965) Chomsky, N. *Aspects of the Theory of Syntax*. The MIT Press, 50 edition, 1965. ISBN 9780262527408. URL [http://www.jstor.org/stable/j.ctt17kk81z](http://www.jstor.org/stable/j.ctt17kk81z).
- Codd (1970) Codd, E. F. A relational model of data for large shared data banks. *Commun. ACM*, 13(6):377–387, June 1970. ISSN 0001-0782. doi: 10.1145/362384.362685. URL [https://doi.org/10.1145/362384.362685](https://doi.org/10.1145/362384.362685).
- Correa & Bareinboim (2025) Correa, J. D. and Bareinboim, E. Counterfactual graphical models: Constraints and inference. In *Forty-second International Conference on Machine Learning*, 2025. URL [https://openreview.net/forum?id=Z1qZoHa6ql](https://openreview.net/forum?id=Z1qZoHa6ql).
- Cotta et al. (2023) Cotta, L., Bevilacqua, B., Ahmed, N., and Ribeiro, B. Causal lifting and link prediction. *Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences*, 479(2276):20230121, 08 2023. ISSN 1364-5021. doi: 10.1098/rspa.2023.0121. URL [https://doi.org/10.1098/rspa.2023.0121](https://doi.org/10.1098/rspa.2023.0121).
- Cui et al. (2019) Cui, Z., Henrickson, K., Ke, R., Pu, Z., and Wang, Y. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting, 2019. URL [https://arxiv.org/abs/1802.07007](https://arxiv.org/abs/1802.07007).
- Cussens (2000) Cussens, J. Stochastic logic programs: sampling, inference and applications. In *Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence*, UAI’00, pp. 115–122, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1558607099.
- Dai et al. (2016) Dai, H., Dai, B., and Song, L. Discriminative embeddings of latent variable models for structured data. In *Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, ICML’16, pp. 2702–2711. JMLR.org, 2016.
- de Finetti (1931) de Finetti, B. Funzione caratteristica di un fenomeno aleatorio. *Atti della R. Accademia Nazionale dei Lincei, Ser. 6. Memorie, Classe di Scienze Fisiche, Matematiche e Naturali 4*, pp. 251–299, 1931.
- de Haan et al. (2019) de Haan, P., Jayaraman, D., and Levine, S. *Causal confusion in imitation learning*. Curran Associates Inc., Red Hook, NY, USA, 2019.
- De Raedt et al. (2007) De Raedt, L., Kimmig, A., and Toivonen, H. ProbLog: a probabilistic Prolog and its application in link discovery. In *Proceedings of the 20th International Joint Conference on Artificial Intelligence*, IJCAI’07, pp. 2468–2473, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
- Derrow-Pinion et al. (2021) Derrow-Pinion, A., She, J., Wong, D., Lange, O., Hester, T., Perez, L., Nunkesser, M., Lee, S., Guo, X., Wiltshire, B., Battaglia, P. W., Gupta, V., Li, A., Xu, Z., Sanchez-Gonzalez, A., Li, Y., and Velickovic, P. ETA prediction with graph neural networks in Google Maps. In *Proceedings of the 30th ACM International Conference on Information & Knowledge Management*, CIKM ’21, pp. 3767–3776, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384469. doi: 10.1145/3459637.3481916. URL [https://doi.org/10.1145/3459637.3481916](https://doi.org/10.1145/3459637.3481916).
- Du et al. (2023) Du, Y., Durkan, C., Strudel, R., Tenenbaum, J. B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W. Reduce, reuse, recycle: compositional generation with energy-based diffusion models and MCMC. In *Proceedings of the 40th International Conference on Machine Learning*, ICML’23. JMLR.org, 2023.
- Duan et al. (2025) Duan, X., He, Y., Tajwar, F., Chen, W., Salakhutdinov, R., and Schneider, J. State combinatorial generalization in decision making with conditional diffusion models, 2025. URL [https://openreview.net/forum?id=PH7ja3T0vN](https://openreview.net/forum?id=PH7ja3T0vN).
- Duvenaud et al. (2015) Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In *Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2*, NIPS’15, pp. 2224–2232, Cambridge, MA, USA, 2015. MIT Press.
- Dwivedi et al. (2026) Dwivedi, V. P., Jaladi, S., Shen, Y., López, F., Kanatsoulis, C. I., Puri, R., Fey, M., and Leskovec, J. Relational graph transformer, 2026. URL [https://arxiv.org/abs/2505.10960](https://arxiv.org/abs/2505.10960).
- Falcon & Cho (2020) Falcon, W. and Cho, K. A framework for contrastive self-supervised learning and designing a new approach, 2020. URL [https://arxiv.org/abs/2009.00104](https://arxiv.org/abs/2009.00104).
- Fan et al. (2022) Fan, S., Wang, X., Mo, Y., Shi, C., and Tang, J. Debiasing graph neural networks via learning disentangled causal substructure. In *Proceedings of the 36th International Conference on Neural Information Processing Systems*, NIPS ’22, Red Hook, NY, USA, 2022. Curran Associates Inc. ISBN 9781713871088.
- Feng et al. (2025) Feng, F., Lippe, P., and Magliacane, S. Learning interactive world model for object-centric reinforcement learning. In *The Thirty-ninth Annual Conference on Neural Information Processing Systems*, 2025. URL [https://openreview.net/forum?id=E0cjqfM55C](https://openreview.net/forum?id=E0cjqfM55C).
- Ferraro et al. (2023) Ferraro, S., Mazzaglia, P., Verbelen, T., and Dhoedt, B. FOCUS: Object-centric world models for robotic manipulation. In *Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023*, 2023. URL [https://openreview.net/forum?id=RoQbZRv1zw](https://openreview.net/forum?id=RoQbZRv1zw).
- Fey et al. (2024) Fey, M., Hu, W., Huang, K., Lenssen, J. E., Ranjan, R., Robinson, J., Ying, R., You, J., and Leskovec, J. Position: Relational deep learning - graph representation learning on relational databases. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), *Proceedings of the 41st International Conference on Machine Learning*, volume 235 of *Proceedings of Machine Learning Research*, pp. 13592–13607. PMLR, 21–27 Jul 2024. URL [https://proceedings.mlr.press/v235/fey24a.html](https://proceedings.mlr.press/v235/fey24a.html).
- Friedman et al. (1999) Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. Learning probabilistic relational models. In *Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2*, IJCAI’99, pp. 1300–1307, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
- Gao et al. (2019) Gao, H., Pei, J., and Huang, H. Conditional random field enhanced graph convolutional neural networks. In *Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*, KDD ’19, pp. 276–284, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450362016. doi: 10.1145/3292500.3330888. URL [https://doi.org/10.1145/3292500.3330888](https://doi.org/10.1145/3292500.3330888).
- Gelman & Hill (2006) Gelman, A. and Hill, J. *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Analytical Methods for Social Research. Cambridge University Press, 2006.
- Getoor & Taskar (2007) Getoor, L. and Taskar, B. *Introduction to Statistical Relational Learning*. The MIT Press, 08 2007. ISBN 9780262256230. doi: 10.7551/mitpress/7432.001.0001. URL [https://doi.org/10.7551/mitpress/7432.001.0001](https://doi.org/10.7551/mitpress/7432.001.0001).
- Getoor et al. (2003) Getoor, L., Friedman, N., Koller, D., and Taskar, B. Learning probabilistic models of link structure. *J. Mach. Learn. Res.*, 3:679–707, March 2003. ISSN 1532-4435.
- Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In *Proceedings of the 34th International Conference on Machine Learning - Volume 70*, ICML’17, pp. 1263–1272. JMLR.org, 2017.
- Gori et al. (2005) Gori, M., Monfardini, G., and Scarselli, F. A new model for learning in graph domains. In *Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.*, volume 2, pp. 729–734 vol. 2, 2005. doi: 10.1109/IJCNN.2005.1555942.
- Guo et al. (2024) Guo, S., Zhang, C., Mohan, K., Huszár, F., and Schölkopf, B. Do Finetti: On causal effects for exchangeable data. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), *Advances in Neural Information Processing Systems*, volume 37, pp. 127317–127345. Curran Associates, Inc., 2024. doi: 10.52202/079017-4044.
- Gurnee & Tegmark (2024) Gurnee, W. and Tegmark, M. Language models represent space and time. In Kim, B., Yue, Y., Chaudhuri, S., Fragkiadaki, K., Khan, M., and Sun, Y. (eds.), *International Conference on Representation Learning*, volume 2024, pp. 2483–2503, 2024.
- Ha & Schmidhuber (2018) Ha, D. and Schmidhuber, J. Recurrent world models facilitate policy evolution. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), *Advances in Neural Information Processing Systems*, volume 31. Curran Associates, Inc., 2018. URL [https://proceedings.neurips.cc/paper_files/paper/2018/file/2de5d16682c3c35007e4e92982f1a2ba-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2018/file/2de5d16682c3c35007e4e92982f1a2ba-Paper.pdf).
- Hajij et al. (2023) Hajij, M., Zamzmi, G., Papamarkou, T., Miolane, N., Guzmán-Sáenz, A., Ramamurthy, K. N., Birdal, T., Dey, T. K., Mukherjee, S., Samaga, S. N., Livesay, N., Walters, R., Rosen, P., and Schaub, M. T. Topological deep learning: Going beyond graph data, 2023. URL [https://arxiv.org/abs/2206.00606](https://arxiv.org/abs/2206.00606).
- Hamaguchi et al. (2017) Hamaguchi, T., Oiwa, H., Shimbo, M., and Matsumoto, Y. Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach. In *Proceedings of the 26th International Joint Conference on Artificial Intelligence*, IJCAI’17, pp. 1802–1808. AAAI Press, 2017. ISBN 9780999241103.
- Hamilton et al. (2017) Hamilton, W. L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. In *Proceedings of the 31st International Conference on Neural Information Processing Systems*, NIPS’17, pp. 1025–1035, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.
- Hamrick et al. (2018) Hamrick, J. B., Allen, K. R., Bapst, V., Zhu, T., McKee, K. R., Tenenbaum, J. B., and Battaglia, P. W. Relational inductive bias for physical construction in humans and machines, 2018. URL [https://arxiv.org/abs/1806.01203](https://arxiv.org/abs/1806.01203).
- Heckerman et al. (2004) Heckerman, D., Meek, C., and Koller, D. Probabilistic models for relational data. Technical Report MSR-TR-2004-30, March 2004.
- Hudgens & Halloran (2008) Hudgens, M. G. and Halloran, M. E. Toward causal inference with interference. *Journal of the American Statistical Association*, 103(482):832–842, 2008. doi: 10.1198/016214508000000292. URL [https://doi.org/10.1198/016214508000000292](https://doi.org/10.1198/016214508000000292). PMID: 19081744.
- Hwang et al. (2023) Hwang, G., Choi, J., Cho, H., and Kang, M. MAGANet: Achieving combinatorial generalization by modeling a group action. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), *Proceedings of the 40th International Conference on Machine Learning*, volume 202 of *Proceedings of Machine Learning Research*, pp. 14237–14248. PMLR, 23–29 Jul 2023. URL [https://proceedings.mlr.press/v202/hwang23b.html](https://proceedings.mlr.press/v202/hwang23b.html).
- Jensen et al. (2020) Jensen, D., Burroni, J., and Rattigan, M. Object conditioning for causal inference. In Adams, R. P. and Gogate, V. (eds.), *Proceedings of The 35th Uncertainty in Artificial Intelligence Conference*, volume 115 of *Proceedings of Machine Learning Research*, pp. 1072–1082. PMLR, 22–25 Jul 2020. URL [https://proceedings.mlr.press/v115/jensen20a.html](https://proceedings.mlr.press/v115/jensen20a.html).
- Jeong et al. (2025) Jeong, H., Ejaz, A., Tian, J., and Bareinboim, E. Testing causal models with hidden variables in polynomial delay via conditional independencies. *Proceedings of the AAAI Conference on Artificial Intelligence*, 39(25):26813–26822, April 2025. ISSN 2159-5399. doi: 10.1609/aaai.v39i25.34885. URL [http://dx.doi.org/10.1609/aaai.v39i25.34885](http://dx.doi.org/10.1609/aaai.v39i25.34885).
- Johnson et al. (2016) Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Zitnick, C. L., and Girshick, R. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, 2016. URL [https://arxiv.org/abs/1612.06890](https://arxiv.org/abs/1612.06890).
- Jothimurugan et al. (2023) Jothimurugan, K., Hsu, S., Bastani, O., and Alur, R. Robust subtask learning for compositional generalization. In *Proceedings of the 40th International Conference on Machine Learning*, ICML’23. JMLR.org, 2023.
- Kersting & Raedt (2001) Kersting, K. and Raedt, L. D. Bayesian logic programs, 2001. URL [https://arxiv.org/abs/cs/0111058](https://arxiv.org/abs/cs/0111058).
- Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In *International Conference on Learning Representations*, 2017. URL [https://openreview.net/forum?id=SJU4ayYgl](https://openreview.net/forum?id=SJU4ayYgl).
- Koller & Friedman (2009) Koller, D. and Friedman, N. *Probabilistic Graphical Models: Principles and Techniques*. Adaptive computation and machine learning. MIT Press, 2009. ISBN 9780262013192. URL [https://books.google.co.in/books?id=7dzpHCHzNQ4C](https://books.google.co.in/books?id=7dzpHCHzNQ4C).
- Lake et al. (2017) Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. Building machines that learn and think like people. *Behavioral and Brain Sciences*, 40:e253, 2017. doi: 10.1017/S0140525X16001837.
- LeCun (2022) LeCun, Y. A path towards autonomous machine intelligence version 0.9.2, 2022-06-27. *Open Review*, 62(1):1–62, 2022.
- Lee et al. (2019) Lee, J., Lee, Y., Kim, J., Kosiorek, A., Choi, S., and Teh, Y. W. Set transformer: A framework for attention-based permutation-invariant neural networks. In Chaudhuri, K. and Salakhutdinov, R. (eds.), *Proceedings of the 36th International Conference on Machine Learning*, volume 97 of *Proceedings of Machine Learning Research*, pp. 3744–3753. PMLR, 09–15 Jun 2019. URL [https://proceedings.mlr.press/v97/lee19d.html](https://proceedings.mlr.press/v97/lee19d.html).
- Lee & Honavar (2016) Lee, S. and Honavar, V. On learning causal models from relational data. *Proceedings of the AAAI Conference on Artificial Intelligence*, 30(1), Mar. 2016. doi: 10.1609/aaai.v30i1.10417. URL [https://ojs.aaai.org/index.php/AAAI/article/view/10417](https://ojs.aaai.org/index.php/AAAI/article/view/10417).
- Li et al. (2019) Li, R., Jabri, A., Darrell, T., and Agrawal, P. Towards practical multi-object manipulation using relational reinforcement learning. In *arXiv preprint arXiv:XXXX*, 2019.
- Li et al. (2018) Li, Y., Yu, R., Shahabi, C., and Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In *International Conference on Learning Representations*, 2018. URL [https://openreview.net/forum?id=SJiHXGWAZ](https://openreview.net/forum?id=SJiHXGWAZ).
- Liang et al. (2025) Liang, Q., Qian, D., Ziyin, L., and Fiete, I. R. Compositional generalization via forced rendering of disentangled latents. In *Forty-second International Conference on Machine Learning*, 2025. URL [https://openreview.net/forum?id=rkHCHI5H5W](https://openreview.net/forum?id=rkHCHI5H5W).
- Lin et al. (2022) Lin, Y., Wang, A. S., Undersander, E., and Rai, A. Efficient and interpretable robot manipulation with graph neural networks. *IEEE Robotics and Automation Letters*, 7(2):2740–2747, 2022. doi: 10.1109/LRA.2022.3143518.
- Liu et al. (2022) Liu, N., Li, S., Du, Y., Torralba, A., and Tenenbaum, J. B. Compositional visual generation with composable diffusion models. In Avidan, S., Brostow, G., Cissé, M., Farinella, G. M., and Hassner, T. (eds.), *Computer Vision – ECCV 2022*, pp. 423–439, Cham, 2022. Springer Nature Switzerland. ISBN 978-3-031-19790-1.
- Locatello et al. (2020) Locatello, F., Weissenborn, D., Unterthiner, T., Mahendran, A., Heigold, G., Uszkoreit, J., Dosovitskiy, A., and Kipf, T. Object-centric learning with slot attention. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), *Advances in Neural Information Processing Systems*, volume 33, pp. 11525–11538. Curran Associates, Inc., 2020. URL [https://proceedings.neurips.cc/paper_files/paper/2020/file/8511df98c02ab60aea1b2356c013bc0f-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2020/file/8511df98c02ab60aea1b2356c013bc0f-Paper.pdf).
- Maier et al. (2010) Maier, M., Taylor, B., Oktay, H., and Jensen, D. Learning causal models of relational domains. *Proceedings of the AAAI Conference on Artificial Intelligence*, 24(1):531–538, Jul. 2010. doi: 10.1609/aaai.v24i1.7695. URL [https://ojs.aaai.org/index.php/AAAI/article/view/7695](https://ojs.aaai.org/index.php/AAAI/article/view/7695).
- Mendez et al. (2022) Mendez, J. A., van Seijen, H., and Eaton, E. Modular lifelong reinforcement learning via neural composition. In *International Conference on Learning Representations*, 2022. URL [https://openreview.net/forum?id=5XmLzdslFNN](https://openreview.net/forum?id=5XmLzdslFNN).
- Montero et al. (2021) Montero, M. L., Ludwig, C. J., Costa, R. P., Malhotra, G., and Bowers, J. The role of disentanglement in generalisation. In *International Conference on Learning Representations*, 2021. URL [https://openreview.net/forum?id=qbH974jKUVy](https://openreview.net/forum?id=qbH974jKUVy).
- Mosbach et al. (2025) Mosbach, M., Ewertz, J. N., Villar-Corrales, A., and Behnke, S. SOLD: Slot object-centric latent dynamics models for relational manipulation learning from pixels. In *Forty-second International Conference on Machine Learning*, 2025. URL [https://openreview.net/forum?id=XOUpHJPYRX](https://openreview.net/forum?id=XOUpHJPYRX).
- Nakano et al. (2023) Nakano, A., Suzuki, M., and Matsuo, Y. Interaction-based disentanglement of entities for object-centric world models. In *The Eleventh International Conference on Learning Representations*, 2023. URL [https://openreview.net/forum?id=JQc2VowqCzz](https://openreview.net/forum?id=JQc2VowqCzz).
- Ogburn & VanderWeele (2014) Ogburn, E. L. and VanderWeele, T. J. Causal Diagrams for Interference. *Statistical Science*, 29(4):559–578, 2014. doi: 10.1214/14-STS501. URL [https://doi.org/10.1214/14-STS501](https://doi.org/10.1214/14-STS501).
- Okawa et al. (2023) Okawa, M., Lubana, E. S., Dick, R. P., and Tanaka, H. Compositional abilities emerge multiplicatively: Exploring diffusion models on a synthetic task. In *Thirty-seventh Conference on Neural Information Processing Systems*, 2023. URL [https://openreview.net/forum?id=frVo9MzRuU](https://openreview.net/forum?id=frVo9MzRuU).
- Orbanz & Roy (2015) Orbanz, P. and Roy, D. M. Bayesian models of graphs, arrays and other exchangeable random structures. *IEEE Trans. Pattern Anal. Mach. Intell.*, 37(2):437–461, February 2015. ISSN 0162-8828. doi: 10.1109/TPAMI.2014.2334607. URL [https://doi.org/10.1109/TPAMI.2014.2334607](https://doi.org/10.1109/TPAMI.2014.2334607).
- Oñoro et al\. \(2017\)Oñoro, D\., Niepert, M\., García\-Durán, A\., González\-Sánchez, R\., and López\-Sastre, R\.Representation learning for visual\-relational knowledge graphs\.09 2017\.doi:10\.48550/arXiv\.1709\.02314\.
- Pan & Bareinboim \(2024\)Pan, Y\. and Bareinboim, E\.Counterfactual image editing\.In Salakhutdinov, R\., Kolter, Z\., Heller, K\., Weller, A\., Oliver, N\., Scarlett, J\., and Berkenkamp, F\. \(eds\.\),*Proceedings of the 41st International Conference on Machine Learning*, volume 235 of*Proceedings of Machine Learning Research*, pp\. 39087–39101\. PMLR, 21–27 Jul 2024\.URL[https://proceedings\.mlr\.press/v235/pan24a\.html](https://proceedings.mlr.press/v235/pan24a.html)\.
- Pan & Bareinboim \(2025\)Pan, Y\. and Bareinboim, E\.Counterfactual image editing with disentangled causal latent space\.In*The Thirty\-ninth Annual Conference on Neural Information Processing Systems*, 2025\.URL[https://openreview\.net/forum?id=u2Lgi4NIe7](https://openreview.net/forum?id=u2Lgi4NIe7)\.
- Papamarkou et al\. \(2024\)Papamarkou, T\., Birdal, T\., Bronstein, M\. M\., Carlsson, G\. E\., Curry, J\., Gao, Y\., Hajij, M\., Kwitt, R\., Lio, P\., Di Lorenzo, P\., Maroulas, V\., Miolane, N\., Nasrin, F\., Natesan Ramamurthy, K\., Rieck, B\., Scardapane, S\., Schaub, M\. T\., Veličković, P\., Wang, B\., Wang, Y\., Wei, G\., and Zamzmi, G\.Position: Topological deep learning is the new frontier for relational learning\.In Salakhutdinov, R\., Kolter, Z\., Heller, K\., Weller, A\., Oliver, N\., Scarlett, J\., and Berkenkamp, F\. \(eds\.\),*Proceedings of the 41st International Conference on Machine Learning*, volume 235 of*Proceedings of Machine Learning Research*, pp\. 39529–39555\. PMLR, 21–27 Jul 2024\.URL[https://proceedings\.mlr\.press/v235/papamarkou24a\.html](https://proceedings.mlr.press/v235/papamarkou24a.html)\.
- Papillon et al\. \(2025\)Papillon, M\., Sanborn, S\., Mathe, J\., Cornelis, L\., Bertics, A\., Buracas, D\., J Lillemark, H\., Shewmake, C\., Dinc, F\., Pennec, X\., and Miolane, N\.Beyond Euclid: an illustrated guide to modern machine learning with geometric, topological, and algebraic structures\.*Machine Learning: Science and Technology*, 6\(3\):031002, August 2025\.doi:10\.1088/2632\-2153/adf375\.URL[https://doi\.org/10\.1088/2632\-2153/adf375](https://doi.org/10.1088/2632-2153/adf375)\.
- Park et al\. \(2021\)Park, J\., Seo, Y\., Liu, C\., Zhao, L\., Qin, T\., Shin, J\., and Liu, T\.\-Y\.Object\-aware regularization for addressing causal confusion in imitation learning\.In*Proceedings of the 35th International Conference on Neural Information Processing Systems*, NIPS ’21, Red Hook, NY, USA, 2021\. Curran Associates Inc\.ISBN 9781713845393\.
- Paszke et al\. \(2019\)Paszke, A\., Gross, S\., Massa, F\., Lerer, A\., Bradbury, J\., Chanan, G\., Killeen, T\., Lin, Z\., Gimelshein, N\., Antiga, L\., Desmaison, A\., Köpf, A\., Yang, E\., DeVito, Z\., Raison, M\., Tejani, A\., Chilamkurthy, S\., Steiner, B\., Fang, L\., Bai, J\., and Chintala, S\.Pytorch: An imperative style, high\-performance deep learning library, 2019\.URL[https://arxiv\.org/abs/1912\.01703](https://arxiv.org/abs/1912.01703)\.
- Pearl \(2009\)Pearl, J\.*Causality: Models, Reasoning, and Inference*\.Cambridge University Press, New York, 2nd edition, 2009\.
- Pearl & Mackenzie \(2018\)Pearl, J\. and Mackenzie, D\.*The Book of Why*\.Basic Books, New York, 2018\.
- Plečko & Bareinboim \(2024\)Plečko, D\. and Bareinboim, E\.Causal fairness analysis: A causal toolkit for fair machine learning\.*Found\. Trends Mach\. Learn\.*, 17\(3\):304–589, January 2024\.ISSN 1935\-8237\.doi:10\.1561/2200000106\.URL[https://doi\.org/10\.1561/2200000106](https://doi.org/10.1561/2200000106)\.
- Qu et al\. \(2019\)Qu, M\., Bengio, Y\., and Tang, J\.GMNN: Graph Markov neural networks\.In Chaudhuri, K\. and Salakhutdinov, R\. \(eds\.\),*Proceedings of the 36th International Conference on Machine Learning*, volume 97 of*Proceedings of Machine Learning Research*, pp\. 5241–5250\. PMLR, 09–15 Jun 2019\.URL[https://proceedings\.mlr\.press/v97/qu19a\.html](https://proceedings.mlr.press/v97/qu19a.html)\.
- Ranjan et al\. \(2025\)Ranjan, R\., Hudovernik, V\., Znidar, M\., Kanatsoulis, C\., Upendra, R\., Mohammadi, M\., Meyer, J\., Palczewski, T\., Guestrin, C\., and Leskovec, J\.Relational transformer: Toward zero\-shot foundation models for relational data, 2025\.URL[https://arxiv\.org/abs/2510\.06377](https://arxiv.org/abs/2510.06377)\.
- Raposo et al\. \(2017\)Raposo, D\., Santoro, A\., Barrett, D\., Pascanu, R\., Lillicrap, T\., and Battaglia, P\.Discovering objects and their relations from entangled scene representations, 2017\.URL[https://arxiv\.org/abs/1702\.05068](https://arxiv.org/abs/1702.05068)\.
- Regev et al\. \(2017\)Regev, A\., Teichmann, S\. A\., Lander, E\. S\., Amit, I\., Benoist, C\., Birney, E\., Bodenmiller, B\., Campbell, P\., Carninci, P\., Clatworthy, M\., Clevers, H\., Deplancke, B\., Dunham, I\., Eberwine, J\., Eils, R\., Enard, W\., Farmer, A\., Fugger, L\., Göttgens, B\., Hacohen, N\., Haniffa, M\., Hemberg, M\., Kim, S\., Klenerman, P\., Kriegstein, A\., Lein, E\., Linnarsson, S\., Lundberg, E\., Lundeberg, J\., Majumder, P\., Marioni, J\. C\., Merad, M\., Mhlanga, M\., Nawijn, M\., Netea, M\., Nolan, G\., Pe’er, D\., Phillipakis, A\., Ponting, C\. P\., Quake, S\., Reik, W\., Rozenblatt\-Rosen, O\., Sanes, J\., Satija, R\., Schumacher, T\. N\., Shalek, A\., Shapiro, E\., Sharma, P\., Shin, J\. W\., Stegle, O\., Stratton, M\., Stubbington, M\. J\. T\., Theis, F\. J\., Uhlen, M\., van Oudenaarden, A\., Wagner, A\., Watt, F\., Weissman, J\., Wold, B\., Xavier, R\., Yosef, N\., and Human Cell Atlas Meeting Participants\.The Human Cell Atlas\.*eLife*, 6:e27041, December 2017\.ISSN 2050\-084X\.doi:10\.7554/eLife\.27041\.
- Richardson & Domingos \(2006\)Richardson, M\. and Domingos, P\.Markov logic networks\.*Mach\. Learn\.*, 62\(1–2\):107–136, February 2006\.ISSN 0885\-6125\.doi:10\.1007/s10994\-006\-5833\-1\.URL[https://doi\.org/10\.1007/s10994\-006\-5833\-1](https://doi.org/10.1007/s10994-006-5833-1)\.
- Richens & Everitt \(2024\)Richens, J\. and Everitt, T\.Robust agents learn causal world models\.In Kim, B\., Yue, Y\., Chaudhuri, S\., Fragkiadaki, K\., Khan, M\., and Sun, Y\. \(eds\.\),*International Conference on Learning Representations*, volume 2024, pp\. 15786–15817, 2024\.
- Robinson et al\. \(2024\)Robinson, J\., Ranjan, R\., Hu, W\., Huang, K\., Han, J\., Dobles, A\., Fey, M\., Lenssen, J\. E\., Yuan, Y\., Zhang, Z\., He, X\., and Leskovec, J\.Relbench: A benchmark for deep learning on relational databases\.In Globerson, A\., Mackey, L\., Belgrave, D\., Fan, A\., Paquet, U\., Tomczak, J\., and Zhang, C\. \(eds\.\),*Advances in Neural Information Processing Systems*, volume 37, pp\. 21330–21341\. Curran Associates, Inc\., 2024\.doi:10\.52202/079017\-0672\.
- Rosenbaum \(2007\)Rosenbaum, P\. R\.Interference between units in randomized experiments\.*Journal of the American Statistical Association*, 102\(477\):191–200, 2007\.doi:10\.1198/016214506000001112\.URL[https://doi\.org/10\.1198/016214506000001112](https://doi.org/10.1198/016214506000001112)\.
- Rubin \(1990\)Rubin, D\. B\.\[On the Application of Probability Theory to Agricultural Experiments\. Essay on Principles\. Section 9\.\] Comment: Neyman \(1923\) and Causal Inference in Experiments and Observational Studies\.*Statistical Science*, 5\(4\):472 – 480, 1990\.doi:10\.1214/ss/1177012032\.URL[https://doi\.org/10\.1214/ss/1177012032](https://doi.org/10.1214/ss/1177012032)\.
- Sakai et al\. \(2025\)Sakai, Y\., Kamigaito, H\., and Watanabe, T\.Revisiting compositional generalization capability of large language models considering instruction following ability\.In Che, W\., Nabende, J\., Shutova, E\., and Pilehvar, M\. T\. \(eds\.\),*Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pp\. 31219–31238, Vienna, Austria, July 2025\. Association for Computational Linguistics\.ISBN 979\-8\-89176\-251\-0\.doi:10\.18653/v1/2025\.acl\-long\.1508\.URL[https://aclanthology\.org/2025\.acl\-long\.1508/](https://aclanthology.org/2025.acl-long.1508/)\.
- Salimi et al\. \(2020\)Salimi, B\., Parikh, H\., Kayali, M\., Getoor, L\., Roy, S\., and Suciu, D\.Causal relational learning\.In*Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data*, SIGMOD ’20, pp\. 241–256, New York, NY, USA, 2020\. Association for Computing Machinery\.ISBN 9781450367356\.doi:10\.1145/3318464\.3389759\.URL[https://doi\.org/10\.1145/3318464\.3389759](https://doi.org/10.1145/3318464.3389759)\.
- Santoro et al\. \(2017\)Santoro, A\., Raposo, D\., Barrett, D\. G\., Malinowski, M\., Pascanu, R\., Battaglia, P\., and Lillicrap, T\.A simple neural network module for relational reasoning\.In Guyon, I\., Luxburg, U\. V\., Bengio, S\., Wallach, H\., Fergus, R\., Vishwanathan, S\., and Garnett, R\. \(eds\.\),*Advances in Neural Information Processing Systems*, volume 30\. Curran Associates, Inc\., 2017\.URL[https://proceedings\.neurips\.cc/paper\_files/paper/2017/file/e6acf4b0f69f6f6e60e9a815938aa1ff\-Paper\.pdf](https://proceedings.neurips.cc/paper_files/paper/2017/file/e6acf4b0f69f6f6e60e9a815938aa1ff-Paper.pdf)\.
- Scarselli et al\. \(2009a\)Scarselli, F\., Gori, M\., Tsoi, A\. C\., Hagenbuchner, M\., and Monfardini, G\.The graph neural network model\.*Trans\. Neur\. Netw\.*, 20\(1\):61–80, January 2009a\.ISSN 1045\-9227\.doi:10\.1109/TNN\.2008\.2005605\.URL[https://doi\.org/10\.1109/TNN\.2008\.2005605](https://doi.org/10.1109/TNN.2008.2005605)\.
- Scarselli et al\. \(2009b\)Scarselli, F\., Gori, M\., Tsoi, A\. C\., Hagenbuchner, M\., and Monfardini, G\.Computational capabilities of graph neural networks\.*Trans\. Neur\. Netw\.*, 20\(1\):81–102, January 2009b\.ISSN 1045\-9227\.doi:10\.1109/TNN\.2008\.2005141\.URL[https://doi\.org/10\.1109/TNN\.2008\.2005141](https://doi.org/10.1109/TNN.2008.2005141)\.
- Schölkopf \(2022\)Schölkopf, B\.*Causality for Machine Learning*, pp\. 765–804\.Association for Computing Machinery, New York, NY, USA, 1 edition, 2022\.ISBN 9781450395861\.URL[https://doi\.org/10\.1145/3501714\.3501755](https://doi.org/10.1145/3501714.3501755)\.
- Schott et al\. \(2022\)Schott, L\., von Kügelgen, J\., Träuble, F\., Gehler, P\., Russell, C\., Bethge, M\., Schölkopf, B\., Locatello, F\., and Brendel, W\.Visual representation learning does not generalize strongly within the same domain, 2022\.URL[https://arxiv\.org/abs/2107\.08221](https://arxiv.org/abs/2107.08221)\.
- Schölkopf et al\. \(2021\)Schölkopf, B\., Locatello, F\., Bauer, S\., Ke, N\. R\., Kalchbrenner, N\., Goyal, A\., and Bengio, Y\.Toward causal representation learning\.*Proceedings of the IEEE*, 109\(5\):612–634, 2021\.doi:10\.1109/JPROC\.2021\.3058954\.
- Sherman & Shpitser \(2018\)Sherman, E\. and Shpitser, I\.Identification and estimation of causal effects from dependent data\.In Bengio, S\., Wallach, H\., Larochelle, H\., Grauman, K\., Cesa\-Bianchi, N\., and Garnett, R\. \(eds\.\),*Advances in Neural Information Processing Systems*, volume 31\. Curran Associates, Inc\., 2018\.URL[https://proceedings\.neurips\.cc/paper\_files/paper/2018/file/024677efb8e4aee2eaeef17b54695bbe\-Paper\.pdf](https://proceedings.neurips.cc/paper_files/paper/2018/file/024677efb8e4aee2eaeef17b54695bbe-Paper.pdf)\.
- Singla & Domingos \(2006\)Singla, P\. and Domingos, P\.Entity resolution with Markov logic\.In*Proceedings of the Sixth International Conference on Data Mining*, ICDM ’06, pp\. 572–582, USA, 2006\. IEEE Computer Society\.ISBN 0769527019\.doi:10\.1109/ICDM\.2006\.65\.URL[https://doi\.org/10\.1109/ICDM\.2006\.65](https://doi.org/10.1109/ICDM.2006.65)\.
- Sobel \(2006\)Sobel, M\. E\.What do randomized studies of housing mobility demonstrate?*Journal of the American Statistical Association*, 101\(476\):1398–1407, 2006\.doi:10\.1198/016214506000000636\.URL[https://doi\.org/10\.1198/016214506000000636](https://doi.org/10.1198/016214506000000636)\.
- Sommestad et al\. \(2010\)Sommestad, T\., Ekstedt, M\., and Johnson, P\.A probabilistic relational model for security risk analysis\.*Computers & Security*, 29\(6\):659–679, 2010\.ISSN 0167\-4048\.doi:10\.1016/j\.cose\.2010\.02\.002\.URL[https://www\.sciencedirect\.com/science/article/pii/S0167404810000209](https://www.sciencedirect.com/science/article/pii/S0167404810000209)\.
- Song et al\. \(2024\)Song, Y\., Lee, D\., and Kim, G\.Compositional conservatism: A transductive approach in offline reinforcement learning\.In*The Twelfth International Conference on Learning Representations*, 2024\.URL[https://openreview\.net/forum?id=HRkyLbBRHI](https://openreview.net/forum?id=HRkyLbBRHI)\.
- Spalević et al\. \(2020\)Spalević, S\., Veličković, P\., Kovačević, J\., and Nikolić, M\.Hierarchical protein function prediction with Tail\-GNNs, 2020\.URL[https://arxiv\.org/abs/2007\.12804](https://arxiv.org/abs/2007.12804)\.
- Spiegelhalter \(2002\)Spiegelhalter, D\. J\.Bayesian graphical modelling: A case\-study in monitoring health outcomes\.*Journal of the Royal Statistical Society Series C: Applied Statistics*, 47\(1\):115–133, 01 2002\.ISSN 0035\-9254\.doi:10\.1111/1467\-9876\.00101\.URL[https://doi\.org/10\.1111/1467\-9876\.00101](https://doi.org/10.1111/1467-9876.00101)\.
- Spirtes et al\. \(2000\)Spirtes, P\., Glymour, C\. N\., and Scheines, R\.*Causation, Prediction, and Search*\.MIT Press, Cambridge, MA, 2nd edition, 2000\.
- Sutton & McCallum \(2007\)Sutton, C\. and McCallum, A\.An introduction to conditional random fields for relational learning\.In*Introduction to Statistical Relational Learning*\. The MIT Press, 08 2007\.ISBN 9780262256230\.doi:10\.7551/mitpress/7432\.003\.0006\.URL[https://doi\.org/10\.7551/mitpress/7432\.003\.0006](https://doi.org/10.7551/mitpress/7432.003.0006)\.
- Sypetkowski et al\. \(2024\)Sypetkowski, M\., Wenkel, F\., Poursafaei, F\., Dickson, N\., Suri, K\., Fradkin, P\., and Beaini, D\.On the scalability of GNNs for molecular graphs\.In*The Thirty\-eighth Annual Conference on Neural Information Processing Systems*, 2024\.URL[https://openreview\.net/forum?id=klqhrq7fvB](https://openreview.net/forum?id=klqhrq7fvB)\.
- Taskar et al\. \(2002\)Taskar, B\., Abbeel, P\., and Koller, D\.Discriminative probabilistic models for relational data\.In*Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence*, UAI’02, pp\. 485–492, San Francisco, CA, USA, 2002\. Morgan Kaufmann Publishers Inc\.ISBN 1558608974\.
- Tenenbaum et al\. \(2011\)Tenenbaum, J\. B\., Kemp, C\., Griffiths, T\. L\., and Goodman, N\. D\.How to grow a mind: Statistics, structure, and abstraction\.*Science*, 331\(6022\):1279–1285, 2011\.doi:10\.1126/science\.1192788\.URL[https://www\.science\.org/doi/abs/10\.1126/science\.1192788](https://www.science.org/doi/abs/10.1126/science.1192788)\.
- Ullman & Widom \(2002\)Ullman, J\. and Widom, J\.*A First Course in Database Systems*\.An Alan R\. Apt book\. Prentice Hall, 2002\.ISBN 9780131225206\.URL[https://books\.google\.com/books?id=B5hOQAAACAAJ](https://books.google.com/books?id=B5hOQAAACAAJ)\.
- Vafa et al\. \(2024\)Vafa, K\., Chen, J\. Y\., Rambachan, A\., Mullainathan, S\., and Kleinberg, J\.Evaluating the world model implicit in a generative model\.In*Proceedings of the 38th International Conference on Neural Information Processing Systems*, NIPS ’24, Red Hook, NY, USA, 2024\. Curran Associates Inc\.ISBN 9798331314385\.
- Veerapaneni et al\. \(2020\)Veerapaneni, R\., Co\-Reyes, J\. D\., Chang, M\., Janner, M\., Finn, C\., Wu, J\., Tenenbaum, J\., and Levine, S\.Entity abstraction in visual model\-based reinforcement learning\.In Kaelbling, L\. P\., Kragic, D\., and Sugiura, K\. \(eds\.\),*Proceedings of the Conference on Robot Learning*, volume 100 of*Proceedings of Machine Learning Research*, pp\. 1439–1456\. PMLR, 30 Oct–01 Nov 2020\.URL[https://proceedings\.mlr\.press/v100/veerapaneni20a\.html](https://proceedings.mlr.press/v100/veerapaneni20a.html)\.
- Veličković \(2023\)Veličković, P\.Everything is connected: Graph neural networks\.*Current Opinion in Structural Biology*, 79:102538, April 2023\.ISSN 0959\-440X\.doi:10\.1016/j\.sbi\.2023\.102538\.URL[https://www\.sciencedirect\.com/science/article/pii/S0959440X2300012X](https://www.sciencedirect.com/science/article/pii/S0959440X2300012X)\.
- Veličković et al\. \(2018\)Veličković, P\., Cucurull, G\., Casanova, A\., Romero, A\., Liò, P\., and Bengio, Y\.Graph attention networks\.In*International Conference on Learning Representations*, 2018\.URL[https://openreview\.net/forum?id=rJXMpikCZ](https://openreview.net/forum?id=rJXMpikCZ)\.
- Vo et al\. \(2025\)Vo, A\., Nguyen, K\.\-N\., Taesiri, M\. R\., Dang, V\. T\., Nguyen, A\. T\., and Kim, D\.Vision language models are biased, 2025\.URL[https://arxiv\.org/abs/2505\.23941](https://arxiv.org/abs/2505.23941)\.
- Wang et al\. \(2025a\)Wang, Y\., Wang, X\., Gan, Q\., Wang, M\., Yang, Q\., Wipf, D\., and Zhang, M\.Griffin: Towards a graph\-centric relational database foundation model, 2025a\.URL[https://arxiv\.org/abs/2505\.05568](https://arxiv.org/abs/2505.05568)\.
- Wang et al\. \(2025b\)Wang, Z\., Wang, K\., Zhao, L\., Stone, P\., and Bian, J\.Dyn\-O: Building structured world models with object\-centric representations, 2025b\.URL[https://arxiv\.org/abs/2507\.03298](https://arxiv.org/abs/2507.03298)\.
- Weinstein & Blei \(2024\)Weinstein, E\. N\. and Blei, D\. M\.Hierarchical causal models, 2024\.URL[https://arxiv\.org/abs/2401\.05330](https://arxiv.org/abs/2401.05330)\.
- Wu et al\. \(2025\)Wu, F\., Dwivedi, V\. P\., and Leskovec, J\.Large language models are good relational learners, 2025\.URL[https://arxiv\.org/abs/2506\.05725](https://arxiv.org/abs/2506.05725)\.
- Wu et al\. \(2022\)Wu, Y\.\-X\., Wang, X\., Zhang, A\., He, X\., and Chua, T\.\-S\.Discovering invariant rationales for graph neural networks\.In*ICLR*, 2022\.
- Wu et al\. \(2023\)Wu, Z\., Dvornik, N\., Greff, K\., Kipf, T\., and Garg, A\.Slotformer: Unsupervised visual dynamics simulation with object\-centric models\.In*The Eleventh International Conference on Learning Representations*, 2023\.URL[https://openreview\.net/forum?id=TFbwV6I0VLg](https://openreview.net/forum?id=TFbwV6I0VLg)\.
- Xia et al\. \(2021\)Xia, K\., Lee, K\.\-Z\., Bengio, Y\., and Bareinboim, E\.The causal\-neural connection: expressiveness, learnability, and inference\.In*Proceedings of the 35th International Conference on Neural Information Processing Systems*, NIPS ’21, Red Hook, NY, USA, 2021\. Curran Associates Inc\.ISBN 9781713845393\.
- Xia et al\. \(2023\)Xia, K\. M\., Pan, Y\., and Bareinboim, E\.Neural causal models for counterfactual identification and estimation\.In*The Eleventh International Conference on Learning Representations*, 2023\.URL[https://openreview\.net/forum?id=vouQcZS8KfW](https://openreview.net/forum?id=vouQcZS8KfW)\.
- Xu et al\. \(2005\)Xu, Z\., Tresp, V\., Yu, K\., Yu, S\., and Kriegel, H\.\-P\.Dirichlet enhanced relational learning\.In*Proceedings of the 22nd International Conference on Machine Learning*, ICML ’05, pp\. 1004–1011, New York, NY, USA, 2005\. Association for Computing Machinery\.ISBN 1595931805\.doi:10\.1145/1102351\.1102478\.URL[https://doi\.org/10\.1145/1102351\.1102478](https://doi.org/10.1145/1102351.1102478)\.
- Yang et al\. \(2024\)Yang, H\., Lu, H\., Lam, W\., and Cai, D\.Exploring compositional generalization of large language models\.In Cao, Y\. T\., Papadimitriou, I\., Ovalle, A\., Zampieri, M\., Ferraro, F\., and Swayamdipta, S\. \(eds\.\),*Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 4: Student Research Workshop\)*, pp\. 16–24, Mexico City, Mexico, June 2024\. Association for Computational Linguistics\.doi:10\.18653/v1/2024\.naacl\-srw\.3\.URL[https://aclanthology\.org/2024\.naacl\-srw\.3/](https://aclanthology.org/2024.naacl-srw.3/)\.
- Zaheer et al\. \(2017\)Zaheer, M\., Kottur, S\., Ravanbakhsh, S\., Póczos, B\., Salakhutdinov, R\., and Smola, A\. J\.Deep sets\.In*Proceedings of the 31st International Conference on Neural Information Processing Systems*, NIPS’17, pp\. 3394–3404, Red Hook, NY, USA, 2017\. Curran Associates Inc\.ISBN 9781510860964\.
- Zambaldi et al\. \(2018\)Zambaldi, V\., Raposo, D\., Santoro, A\., Bapst, V\., Li, Y\., Babuschkin, I\., Tuyls, K\., Reichert, D\., Lillicrap, T\., Lockhart, E\., Shanahan, M\., Langston, V\., Pascanu, R\., Botvinick, M\., Vinyals, O\., and Battaglia, P\.Relational deep reinforcement learning, 2018\.URL[https://arxiv\.org/abs/1806\.01830](https://arxiv.org/abs/1806.01830)\.
- Zečević et al\. \(2021\)Zečević, M\., Dhami, D\. S\., Veličković, P\., and Kersting, K\.Relating graph neural networks to structural causal models, 2021\.URL[https://arxiv\.org/abs/2109\.04173](https://arxiv.org/abs/2109.04173)\.
- Zhang et al\. \(2022a\)Zhang, C\., Mohan, K\., and Pearl, J\.Causal inference with non\-iid data using linear graphical models\.In Koyejo, S\., Mohamed, S\., Agarwal, A\., Belgrave, D\., Cho, K\., and Oh, A\. \(eds\.\),*Advances in Neural Information Processing Systems*, volume 35, pp\. 13214–13225\. Curran Associates, Inc\., 2022a\.
- Zhang et al\. \(2022b\)Zhang, J\., Tian, J\., and Bareinboim, E\.Partial counterfactual identification from observational and experimental data\.In Chaudhuri, K\., Jegelka, S\., Song, L\., Szepesvari, C\., Niu, G\., and Sabato, S\. \(eds\.\),*Proceedings of the 39th International Conference on Machine Learning*, volume 162 of*Proceedings of Machine Learning Research*, pp\. 26548–26558\. PMLR, 17–23 Jul 2022b\.URL[https://proceedings\.mlr\.press/v162/zhang22ab\.html](https://proceedings.mlr.press/v162/zhang22ab.html)\.
- Zhang et al\. \(2020\)Zhang, Y\., Chen, X\., Yang, Y\., Ramamurthy, A\., Li, B\., Qi, Y\., and Song, L\.Efficient probabilistic logic reasoning with graph neural networks\.In*International Conference on Learning Representations*, 2020\.URL[https://openreview\.net/forum?id=rJg76kStwH](https://openreview.net/forum?id=rJg76kStwH)\.
- Zhao et al\. \(2025\)Zhao, C\., Tan, Z\., Ma, P\., Li, D\., Jiang, B\., Wang, Y\., Yang, Y\., and Liu, H\.Is chain\-of\-thought reasoning of LLMs a mirage? A data distribution lens\.In*First Workshop on Foundations of Reasoning in Language Models*, 2025\.URL[https://openreview\.net/forum?id=o2AoLPIjle](https://openreview.net/forum?id=o2AoLPIjle)\.
- Zhao et al\. \(2022\)Zhao, L\., Kong, L\., Walters, R\., and Wong, L\. L\.Toward compositional generalization in object\-oriented world modeling\.In Chaudhuri, K\., Jegelka, S\., Song, L\., Szepesvari, C\., Niu, G\., and Sabato, S\. \(eds\.\),*Proceedings of the 39th International Conference on Machine Learning*, volume 162 of*Proceedings of Machine Learning Research*, pp\. 26841–26864\. PMLR, 17–23 Jul 2022\.URL[https://proceedings\.mlr\.press/v162/zhao22b\.html](https://proceedings.mlr.press/v162/zhao22b.html)\.
- Zhou et al\. \(2023\)Zhou, D\., Schärli, N\., Hou, L\., Wei, J\., Scales, N\., Wang, X\., Schuurmans, D\., Cui, C\., Bousquet, O\., Le, Q\., and Chi, E\.Least\-to\-most prompting enables complex reasoning in large language models, 2023\.URL[https://arxiv\.org/abs/2205\.10625](https://arxiv.org/abs/2205.10625)\.

## Appendices


## Appendix A Background and Related Works

Much of machine learning is ‘Euclidean’ \(Papillon et al\., [2025](https://arxiv.org/html/2606.14892#bib.bib83)\): it operates on data represented as coordinates in a high\-dimensional space\. Such representations are also described as ‘flat’, ‘variable\-based’, ‘feature\-vector’, ‘attribute\-value’, and ‘propositional’ \(Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40); Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)\)\. Standard statistical learning guarantees typically assume that the data are values of independent and identically distributed \(i\.i\.d\.\) random variables\. In this section, we discuss settings in which it is useful to relax this assumption, and present methods, causal and otherwise, for such settings\.

### A\.1 Relational Data

#### A\.1\.1 What is Relational Data?

There is no single definition of what counts as relational data\. Broadly construed, it is data that is not flat\. Below, we describe what various literatures mean by the term ‘relational’\.

*In database theory*, it refers to data organized under the relational model in a relational database \(Codd, [1970](https://arxiv.org/html/2606.14892#bib.bib18); Ullman & Widom, [2002](https://arxiv.org/html/2606.14892#bib.bib118)\)\. A vast amount of enterprise, government, and academic data is stored in relational databases\.

*In statistical relational learning*, ‘relational’ concerns domains modeled not as a fixed set of random variables, but rather as structured spaces consisting of many types of objects \(entities\) related in different ways \(Getoor et al\., [2003](https://arxiv.org/html/2606.14892#bib.bib41); Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)\)\. Both entities and relations may have attributes\. For example, to model inheritance across family trees, each tree consists of its own set of individuals, each with their own attributes \(Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)\)\. These individuals have varying relations: Mother\-of, Father\-of, Married\-to, Sibling\-of\. Both the number of individuals and their relations vary across family trees\. Since individuals are related, inferences about one individual may be made based on observations of another, e\.g\., a child’s risk of a disease based on their parents’ risks\.

*In deep learning*, ‘relational’ refers not only to a type of input data but also to a type of learning problem that involves reasoning about objects in various relations with one another\. For example, while images can be encoded in a fixed Euclidean space, the task of answering questions like ‘are there any rubber things that have the same size as the yellow metallic cylinder?’ for a given image is considered relational \(Santoro et al\., [2017](https://arxiv.org/html/2606.14892#bib.bib100)\); intermediate representations of the data for this task might be structured as objects and relations\. Relational reasoning involves composing objects \(entities\) together via relations, following certain rules \(Battaglia et al\., [2018](https://arxiv.org/html/2606.14892#bib.bib9)\)\.
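To make pairwise relational reasoning concrete, the following is a minimal NumPy sketch of the relation\-network form described by Santoro et al\. \(2017\), RN\(O\) = f\(Σ\_\{i,j\} g\(o\_i, o\_j\)\)\. A shared function g encodes every ordered pair of objects, and a readout f acts on the summed pair codes\. The weights `W_g` and `W_f`, the single\-layer tanh maps that stand in for MLPs, and the sizes are illustrative assumptions, not the settings of the original architecture\.

```python
import numpy as np

rng = np.random.default_rng(1)
d, hidden = 3, 6                          # object attribute size, pair-code size (assumed)
W_g = rng.normal(size=(2 * d, hidden))    # hypothetical weights for g (pair encoder)
W_f = rng.normal(size=(hidden, 1))        # hypothetical weights for f (readout)

def relation_network(objects):
    """RN(O) = f(sum over all ordered pairs (i, j) of g(o_i, o_j))."""
    n = objects.shape[0]
    left = np.repeat(objects, n, axis=0)   # o_i repeated once for every j
    right = np.tile(objects, (n, 1))       # o_j cycled once for every i
    pairs = np.concatenate([left, right], axis=1)   # shape (n * n, 2d)
    pair_codes = np.tanh(pairs @ W_g)                # g applied to each pair
    return np.tanh(pair_codes.sum(axis=0) @ W_f)     # f applied to the pooled code

scene = rng.normal(size=(5, d))           # five objects in one scene
out = relation_network(scene)
out_permuted = relation_network(scene[::-1])
```

Every ordered pair is visited and the pair codes are summed, so permuting the objects leaves the output unchanged up to floating\-point error\. The same weights also apply to scenes with any number of objects\.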

#### A\.1\.2 How is Relational Data Represented?

We give some standard representations in increasing order of generality\.

##### Sequences and sets \(without explicit relations\)\.

At the least structured end, an instance is represented as a finite set of entities \(objects\) with attributes\. Their relations are not specified explicitly, covering cases where relations are absent, unobserved, or deliberately abstracted away\. A common modeling assumption is that entity indices are arbitrary, so the distribution \(and the model\) should be invariant to permutations of entities\. For infinitely exchangeable sequences, de Finetti’s theorem implies that they can be viewed as i\.i\.d\. draws conditional on a latent variable \(de Finetti, [1931](https://arxiv.org/html/2606.14892#bib.bib24); Orbanz & Roy, [2015](https://arxiv.org/html/2606.14892#bib.bib78)\)\. For sets, modern architectures \(Zaheer et al\., [2017](https://arxiv.org/html/2606.14892#bib.bib134); Lee et al\., [2019](https://arxiv.org/html/2606.14892#bib.bib63); Locatello et al\., [2020](https://arxiv.org/html/2606.14892#bib.bib70)\) enforce permutation invariance \(or equivariance\) directly in the way they construct representations\. Examples include object\-centric representations of scenes \(a set of detected objects\), multi\-agent systems \(a set of agents with state vectors\), and point clouds \(a set of points in space\)\. A limitation of set\-based representations is that they do not take relational structure as an explicit input\. However, they may be used to learn the relations between objects, e\.g\., via attention between all pairs of objects\.
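The permutation\-invariance idea can be illustrated with a minimal sum\-pooling sketch in the style of Deep Sets \(Zaheer et al\., 2017\)\. Each entity is encoded independently by φ, the codes are summed, and ρ maps the sum to an output\. The weight matrices, the single\-layer maps, and the sizes are assumptions chosen for illustration\.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for phi (per-entity encoder) and rho (set-level decoder).
W_phi = rng.normal(size=(3, 8))   # maps 3 attributes -> 8 features
W_rho = rng.normal(size=(8, 2))   # maps pooled features -> 2 outputs

def phi(x):
    """Encode each entity independently: (n, 3) -> (n, 8)."""
    return np.tanh(x @ W_phi)

def rho(z):
    """Decode the pooled set representation: (8,) -> (2,)."""
    return np.tanh(z @ W_rho)

def deep_set(entities):
    """Sum-pool the per-entity encodings, so the output ignores entity order."""
    return rho(phi(entities).sum(axis=0))

# A scene with 4 objects, each described by 3 attributes.
scene = rng.normal(size=(4, 3))
shuffled = scene[rng.permutation(4)]
print(np.allclose(deep_set(scene), deep_set(shuffled)))  # True
```

Summation ignores order, so shuffling the objects does not change the output\. The same function also accepts a scene with a different number of objects, which a fixed\-length feature vector cannot do\.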

##### Matrices and tensors \(special cases of graphs\)\.

Matrices arise when there is a single relation between two types of entities, one indexed by rows and the other by columns, e\.g\., user–item interactions in a recommender system\. This can be viewed as a *bipartite graph*\. The matrix entries are edge attributes \(ratings, clicks, links\), and missing entries correspond to absent or unobserved edges\. Tensors extend this to multi\-way relations \(e\.g\., user × item × context\) or to multiple relation types\. The gain relative to sets is that some interaction structure is now explicit \(who is connected to whom\)\. The limitation is that the representation presupposes a small number of entity types and a small number of relations that can be aligned to axes; many relational domains do not naturally factor into a single rectangular array\.
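The matrix\-as\-bipartite\-graph view can be sketched in a few lines, assuming a hypothetical rating matrix in which `NaN` marks an unobserved entry\. Each observed entry becomes a \(user, item, rating\) edge, and missing entries simply produce no edge\.

```python
import numpy as np

# Hypothetical user x item rating matrix; NaN marks an unobserved entry.
ratings = np.array([
    [5.0, np.nan, 3.0],
    [np.nan, 4.0, np.nan],
])

def matrix_to_bipartite_edges(matrix):
    """Return (user, item, rating) triples for the observed entries only."""
    users, items = np.nonzero(~np.isnan(matrix))
    return [(f"user{u}", f"item{i}", float(matrix[u, i]))
            for u, i in zip(users, items)]

edges = matrix_to_bipartite_edges(ratings)
# [('user0', 'item0', 5.0), ('user0', 'item2', 3.0), ('user1', 'item1', 4.0)]
```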

##### Relational databases \(tables\)\.

A relational database contains one or more tables \(‘relations’\), with attributes as columns and records as rows\. Each attribute has a data type, and each record specifies a value of that type\. The primary key is an attribute that uniquely identifies each row, whereas foreign keys are attributes that link a given row to rows in other tables\. Relational databases can be encoded as heterogeneous graphs \(Fey et al\., [2024](https://arxiv.org/html/2606.14892#bib.bib36)\), as we explain next\.
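The table\-to\-graph encoding can be sketched as follows, in the spirit of Fey et al\. \(2024\) but not their implementation\. The sketch assumes a hypothetical three\-table schema\. Each row becomes a node typed by its table, and each foreign\-key reference becomes an edge typed by the foreign\-key column\. Foreign\-key columns are dropped from the node attributes because they are now represented as edges\.

```python
# Hypothetical toy schema: each table maps primary key -> row (a dict of attributes).
tables = {
    "customers": {1: {"name": "Ada"}, 2: {"name": "Bo"}},
    "products": {10: {"price": 3.5}, 11: {"price": 9.0}},
    "orders": {
        100: {"customer_id": 1, "product_id": 10, "qty": 2},
        101: {"customer_id": 2, "product_id": 10, "qty": 1},
        102: {"customer_id": 1, "product_id": 11, "qty": 5},
    },
}

# Foreign keys: (table, column) -> referenced table.
foreign_keys = {
    ("orders", "customer_id"): "customers",
    ("orders", "product_id"): "products",
}

def database_to_hetero_graph(tables, foreign_keys):
    """One typed node per row; one typed edge per foreign-key reference."""
    nodes = {
        (table, pk): {col: val for col, val in row.items()
                      if (table, col) not in foreign_keys}
        for table, rows in tables.items()
        for pk, row in rows.items()
    }
    edges = []
    for (table, column), target in foreign_keys.items():
        for pk, row in tables[table].items():
            edges.append(((table, pk), column, (target, row[column])))
    return nodes, edges

nodes, edges = database_to_hetero_graph(tables, foreign_keys)
```

The resulting graph has three node types \(customers, products, orders\) and two edge types \(`customer_id`, `product_id`\). A different database instance with the same schema yields a graph of a different size with the same types\.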

##### Graphs and heterogeneous graphs \(explicit interaction structure\)\.

A graph input consists of explicit entities \(nodes\) and relations \(edges\), possibly with attributes on both and possibly with multiple node and edge types\. This representation is natural for domains such as social networks \(people connected by friendship or follow relations\), transportation networks \(intersections connected by road segments\), molecular graphs \(atoms connected by bonds\), and citation or knowledge graphs \(papers or entities connected by typed relations\)\. It is commonly used by relational neural methods that compute by passing information along edges \(Gilmer et al\., [2017](https://arxiv.org/html/2606.14892#bib.bib42); Battaglia et al\., [2018](https://arxiv.org/html/2606.14892#bib.bib9)\)\. Compared to set representations, graphs commit to a notion of locality \(neighbors\) and hence constrain which entities can directly influence which others\. Compared to matrices, graphs permit arbitrary sparsity patterns, multiple relation types, and heterogeneous entities without forcing them into a single array\.
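The locality constraint can be seen in a single round of message passing\. The following is a minimal sketch assuming mean aggregation over in\-neighbors and a tanh update with hypothetical weights `W_self` and `W_neigh`\. Gilmer et al\. \(2017\) and Battaglia et al\. \(2018\) describe more general message and update functions\.

```python
import numpy as np

def mean_message_passing(node_features, edges, W_self, W_neigh):
    """One round of message passing: each node averages its in-neighbors' features."""
    n = node_features.shape[0]
    agg = np.zeros_like(node_features)
    in_degree = np.zeros(n)
    for src, dst in edges:                      # directed edge src -> dst
        agg[dst] += node_features[src]
        in_degree[dst] += 1
    agg /= np.maximum(in_degree, 1)[:, None]    # nodes with no in-neighbors get a zero message
    return np.tanh(node_features @ W_self + agg @ W_neigh)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                     # 3 nodes, 4 features each
W_self, W_neigh = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
edges = [(0, 1), (1, 2)]                        # path graph 0 -> 1 -> 2
h = mean_message_passing(x, edges, W_self, W_neigh)

x_changed = x.copy()
x_changed[0] += 10.0                            # perturb node 0 only
h_changed = mean_message_passing(x_changed, edges, W_self, W_neigh)
print(np.allclose(h[2], h_changed[2]))          # True: node 0 is not a neighbor of node 2
```

After one round, node 2 can only be influenced by itself and node 1. Stacking rounds widens this receptive field one hop at a time\.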

##### More general non\-Euclidean spaces\.

Finally, some domains require modeling more than just edges or pairwise relations between objects\. Higher\-order relations \(hyperedges\) and interactions between edges, triangles, and cliques are often useful in domains spanning physical systems, traffic forecasting, drug discovery, and more \(Papillon et al\., [2025](https://arxiv.org/html/2606.14892#bib.bib83); Papamarkou et al\., [2024](https://arxiv.org/html/2606.14892#bib.bib82); Hajij et al\., [2023](https://arxiv.org/html/2606.14892#bib.bib47)\)\. Structures such as hypergraphs, simplicial complexes, sheaves, and combinatorial complexes are used to represent such data in the active field of topological deep learning\.
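A small example shows why pairwise edges can be insufficient\. Replacing each hyperedge with all pairwise edges among its members, a construction known as clique expansion, maps one three\-way interaction and three separate two\-way interactions to the same graph\. The node names and the two hypergraphs below are hypothetical\.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Replace each hyperedge with all pairwise edges among its members."""
    return {frozenset(pair)
            for e in hyperedges
            for pair in combinations(sorted(e), 2)}

# One three-way interaction among a, b, c (e.g., a single joint reaction) ...
three_way = [{"a", "b", "c"}]
# ... versus three separate pairwise interactions.
pairwise = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

print(clique_expansion(three_way) == clique_expansion(pairwise))  # True
```

The two hypergraphs differ, yet a pairwise graph cannot tell them apart\. Hypergraphs and related structures are designed to preserve this kind of higher\-order information\.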

### A\.2 Probabilistic Models for Relational Data

Much of the probabilistic modeling literature in the 20th century operated on flat, i\.i\.d\. data\. For instance, Bayesian networks and related graphical models \(Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)\) captured dependencies among attributes of a fixed\-dimensional random vector\. An important exception was the tradition of hierarchical modeling \(including multi\-level and cross\-classified models\) for non\-i\.i\.d\. data \(Gelman & Hill, [2006](https://arxiv.org/html/2606.14892#bib.bib39); Orbanz & Roy, [2015](https://arxiv.org/html/2606.14892#bib.bib78)\)\. However, these models typically involve two or fewer entity and relation types and simple, homogeneous relational neighborhoods \(i\.e\., sequence\- or matrix\-structured data\)\.

A new literature emerged in the 1990s, often grouped under *statistical relational learning* \(SRL\), that shifted from attribute\-value representations of data to object\-relational ones \(Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60); Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40)\)\. The goal of these methods was to enable probabilistic inference in object\-relational settings, e\.g\., to infer a patient’s risk of a disease from the risks of their family members, or a user’s rating for a movie from their friends’ ratings\. To do so, these methods define a template\-level model for a particular schema, laying out probabilistic independencies that can be grounded for any relational instance\.
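Grounding a template can be sketched with the family\-tree example\. The dependency "Disease\(child\) depends on Disease\(mother\) and Disease\(father\)" is stated once, and its conditional probability is shared by every individual in every tree\. The probability values, the root prior `P_ROOT`, and the helper names are hypothetical\. Real SRL systems such as PRMs use richer schemas and inference\.

```python
# Hypothetical prior for individuals whose parents are not in the tree (roots).
P_ROOT = 0.05

def p_disease(mother_has, father_has):
    """Shared template CPD: P(Disease(child)=1 | parents' status); values are assumed."""
    return 0.05 + 0.4 * mother_has + 0.4 * father_has

def ground(family):
    """Ground the template for one family tree given as child -> (mother, father)."""
    people = set(family) | {p for pair in family.values() for p in pair}
    return {
        f"Disease({person})": [f"Disease({p})" for p in family.get(person, ())]
        for person in sorted(people)
    }

def child_marginal(p_mother, p_father):
    """P(Disease(child)=1), summing over two independent root parents."""
    return sum(
        (p_mother if m else 1 - p_mother)
        * (p_father if f else 1 - p_father)
        * p_disease(m, f)
        for m in (0, 1)
        for f in (0, 1)
    )

# Two families of different sizes, grounded from the same template.
family_a = {"Cal": ("Ann", "Bob")}
family_b = {"Dee": ("Eve", "Finn"), "Gus": ("Eve", "Finn")}
ground_a = ground(family_a)   # 3 ground variables
ground_b = ground(family_b)   # 4 ground variables
```

The template is fixed, while the ground Bayesian network grows or shrinks with each family\. With these assumed numbers, `child_marginal(P_ROOT, P_ROOT)` evaluates to 0\.09\.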

##### Plate models\.

Plate models represent objects as plates, and relations as intersections between these plates \(Buntine, [1994](https://arxiv.org/html/2606.14892#bib.bib14); Spiegelhalter, [2002](https://arxiv.org/html/2606.14892#bib.bib112)\)\. Their primary use is in modeling domains with repeated measurements, encoding parameter sharing for objects in the same plate\. For example, in a recommender system, one may have one plate for users and another for movies; the intersection of these plates may define a variable ‘rating’\. This variable exists for every \(user, movie\) pair\. Plate models are widely used in applied Bayesian statistics \(Blei, [2012](https://arxiv.org/html/2606.14892#bib.bib11)\). As typically used in the literature, however, they do not express dependencies that are contingent on relational constraints, and they assume simple, pre\-defined relational neighborhoods; Heckerman et al\. \([2004](https://arxiv.org/html/2606.14892#bib.bib51)\) do define a generalized plate model that can express such constraints\. For example, a plate model can express that a user’s rating for a movie depends on the user’s preferences and the movie’s quality\. However, it cannot express that this rating additionally depends on the user’s friends’ ratings for that movie\.
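The recommender example can be written as a minimal generative sketch\. It assumes one latent preference per user and one latent quality per movie, with a linear\-Gaussian rating at the plate intersection\. The weights and noise level are hypothetical\. NumPy broadcasting instantiates the same tied conditional for every \(user, movie\) pair\.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies = 4, 3

# Plate "users": one preference per user. Plate "movies": one quality per movie.
preference = rng.normal(size=n_users)
quality = rng.normal(size=n_movies)

# Shared (tied) parameters, reused at every (user, movie) intersection.
w_pref, w_qual, noise_sd = 1.0, 2.0, 0.1

# rating[u, m] depends only on preference[u] and quality[m].
mean_rating = w_pref * preference[:, None] + w_qual * quality[None, :]
rating = mean_rating + noise_sd * rng.normal(size=(n_users, n_movies))
```

The parents of `rating[u, m]` are fixed by the plate indices alone\. The sketch has no mechanism to add a term for the ratings of `u`'s friends, which is the limitation described above\.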

##### Directed relational models\.

More general directed relational graphical models such as probabilistic relational models (PRMs, also known as relational Bayesian networks) (Friedman et al., [1999](https://arxiv.org/html/2606.14892#bib.bib37); Getoor et al., [2003](https://arxiv.org/html/2606.14892#bib.bib41); Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60)) and the directed acyclic probabilistic entity-relationship model (DAPER) (Heckerman et al., [2004](https://arxiv.org/html/2606.14892#bib.bib51); Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40)) address the limitation of plate models by defining dependencies using first-order constraints and allowing variable-sized neighbourhoods. Application areas include citation networks and web hyperlinks (including link prediction and topic classification) (Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40)), medical diagnosis (Xu et al., [2005](https://arxiv.org/html/2606.14892#bib.bib132)), and IT security risk analysis (Sommestad et al., [2010](https://arxiv.org/html/2606.14892#bib.bib109)). Nevertheless, like Bayesian networks, they specify factorizations of observational distributions but do not provide a causal semantics for interventions and counterfactuals.
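
By contrast, a directed relational model can make a user's rating depend on an aggregate over a neighbourhood selected by a first-order constraint, whose size varies from user to user. The sketch below is illustrative only: the `friends` relation, the mean aggregator, the mixing `weight`, and the fallback for users without friends are all assumptions, not details of PRMs or DAPER.

```python
from statistics import mean

# Hypothetical toy skeleton: a directed Friends relation and base ratings for movie m1.
friends = {("ann", "bob"), ("ann", "cat"), ("bob", "ann"), ("cat", "ann")}
base_rating = {("ann", "m1"): 0.4, ("bob", "m1"): 0.9,
               ("cat", "m1"): 0.7, ("dan", "m1"): 0.2}


def friend_neighbourhood(user: str) -> list[str]:
    """Constraint phi(u, v) = Friends(u, v); the neighbourhood size varies by user."""
    return sorted(v for (u, v) in friends if u == user)


def rating(user: str, movie: str, weight: float = 0.5) -> float:
    """Own base rating, pulled towards the mean of friends' ratings when friends exist."""
    own = base_rating[(user, movie)]
    peers = [base_rating[(v, movie)] for v in friend_neighbourhood(user)]
    if not peers:  # empty neighbourhood: fall back to the user's own rating
        return own
    return (1 - weight) * own + weight * mean(peers)
```

Here `ann` aggregates over two friends, `bob` over one, and `dan` over none, all with the same template-level dependency.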

##### Other approaches\.

In our work, we build on the directed graphical model approach for object-relational domains. We note, however, that there are a number of other prominent approaches to SRL. Undirected models address the limitation of directed models in capturing cyclic or symmetric dependencies (e.g., feedback loops, dynamical systems, or dependencies such as "these individuals are likely to enjoy the same movies"). They include relational Markov networks (Taskar et al., [2002](https://arxiv.org/html/2606.14892#bib.bib116)), Markov logic networks (Richardson & Domingos, [2006](https://arxiv.org/html/2606.14892#bib.bib93)), and conditional random fields (Sutton & McCallum, [2007](https://arxiv.org/html/2606.14892#bib.bib114)). Probabilistic inference tends to be more difficult in undirected models than in directed models; still, undirected models have proved useful for entity resolution in citation networks (e.g., is author $o$ the same as author $o'$?) (Singla & Domingos, [2006](https://arxiv.org/html/2606.14892#bib.bib107)), for classifying webpages (Taskar et al., [2002](https://arxiv.org/html/2606.14892#bib.bib116)), and for extracting knowledge from unstructured text (Bunescu & Mooney, [2004](https://arxiv.org/html/2606.14892#bib.bib13)). In addition, since causation is directed and asymmetric, such undirected models do not suffice to represent causality in relational settings (Maier et al., [2010](https://arxiv.org/html/2606.14892#bib.bib71)). Besides graphical models, probabilistic logic programs constitute another (and in some cases, equivalent) approach, and have been applied to link prediction in heterogeneous biological networks (Kersting & Raedt, [2001](https://arxiv.org/html/2606.14892#bib.bib58); Cussens, [2000](https://arxiv.org/html/2606.14892#bib.bib22); Bach et al., [2017](https://arxiv.org/html/2606.14892#bib.bib3); De Raedt et al., [2007](https://arxiv.org/html/2606.14892#bib.bib26)).

### A.3 Relational Deep Learning

Relational deep learning combines and extends the tradition of graphical models with deep neural networks for scalable inference. Unlike SRL, it is not necessarily probabilistic; like SRL, it operates on non-Euclidean data, most commonly graphs and databases. Its goal is to design architectures with 'relational inductive biases' that leverage relational information for prediction and promote *combinatorial generalization*, understood as generalization across combinations of objects (Battaglia et al., [2018](https://arxiv.org/html/2606.14892#bib.bib9)).

##### Graph neural networks\.

Graph neural networks (GNNs) are perhaps the most popular architecture for relational deep learning (Gori et al., [2005](https://arxiv.org/html/2606.14892#bib.bib43); Scarselli et al., [2009a](https://arxiv.org/html/2606.14892#bib.bib101), [b](https://arxiv.org/html/2606.14892#bib.bib102); Hamilton et al., [2017](https://arxiv.org/html/2606.14892#bib.bib49); Kipf & Welling, [2017](https://arxiv.org/html/2606.14892#bib.bib59); Veličković et al., [2018](https://arxiv.org/html/2606.14892#bib.bib122); Veličković, [2023](https://arxiv.org/html/2606.14892#bib.bib121)). Given a graph with node (entity) features and edge (relation) features, message passing architectures iteratively update each node representation by aggregating messages from its neighbors. GNNs have been applied to molecular property prediction (Duvenaud et al., [2015](https://arxiv.org/html/2606.14892#bib.bib30); Gilmer et al., [2017](https://arxiv.org/html/2606.14892#bib.bib42); Sypetkowski et al., [2024](https://arxiv.org/html/2606.14892#bib.bib115)), physical reasoning and interaction networks in vision and control (Raposo et al., [2017](https://arxiv.org/html/2606.14892#bib.bib91); Zambaldi et al., [2018](https://arxiv.org/html/2606.14892#bib.bib135); Hamrick et al., [2018](https://arxiv.org/html/2606.14892#bib.bib50)), knowledge graphs (Bordes et al., [2013](https://arxiv.org/html/2606.14892#bib.bib12); Oñoro et al., [2017](https://arxiv.org/html/2606.14892#bib.bib79); Hamaguchi et al., [2017](https://arxiv.org/html/2606.14892#bib.bib48)), and spatiotemporal forecasting on transportation networks (Li et al., [2018](https://arxiv.org/html/2606.14892#bib.bib66); Cui et al., [2019](https://arxiv.org/html/2606.14892#bib.bib21); Derrow-Pinion et al., [2021](https://arxiv.org/html/2606.14892#bib.bib27)). GNNs also provide a computational interface between SRL and modern representation learning, since many SRL domains can be compiled into heterogeneous graphs.
A number of works have integrated undirected SRL methods into GNN architectures (Dai et al., [2016](https://arxiv.org/html/2606.14892#bib.bib23); Gao et al., [2019](https://arxiv.org/html/2606.14892#bib.bib38); Spalević et al., [2020](https://arxiv.org/html/2606.14892#bib.bib111); Qu et al., [2019](https://arxiv.org/html/2606.14892#bib.bib89); Zhang et al., [2020](https://arxiv.org/html/2606.14892#bib.bib139)).
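
The message-passing update described above can be sketched without any framework. This is a minimal illustration assuming scalar node features, a sum aggregator, and hypothetical stand-ins `message` and `update` for what would be learned, vector-valued functions in an actual GNN layer. Because messages are summed, relabelling the nodes simply relabels the output.

```python
# Hypothetical toy graph: undirected edges stored in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = {0: 1.0, 1: 2.0, 2: 3.0}


def message(h_src: float) -> float:
    """Message sent by a neighbour; a stand-in for a learned function."""
    return 0.5 * h_src


def update(h: float, aggregated: float) -> float:
    """Combine a node's current state with its aggregated incoming messages."""
    return h + aggregated


def message_passing_step(h: dict[int, float],
                         edges: list[tuple[int, int]]) -> dict[int, float]:
    aggregated = {v: 0.0 for v in h}
    for src, dst in edges:
        aggregated[dst] += message(h[src])  # sum aggregation: order of neighbours is irrelevant
    return {v: update(h[v], aggregated[v]) for v in h}


h1 = message_passing_step(features, edges)
```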

##### Causality and GNNs\.

An important distinction is that the graphs used as input to GNNs are not causal graphs. They are relational graphs, with nodes depicting entities and edges depicting relations between them. Causal graphs, on the other hand, are naturally formulated over features (of nodes or edges). As such, GNNs, like the previously discussed SRL approaches, lack the architecture and guarantees for predicting the effects of interventions and counterfactuals, as distinct from observations. For instance, changing a node feature (a movie's logline) or adding an edge (recommending a movie to a user) is not modeled as a do-intervention. Cotta et al. ([2023](https://arxiv.org/html/2606.14892#bib.bib20)) take an important step towards causal prediction using graph embeddings for the task of link prediction. Zečević et al. ([2021](https://arxiv.org/html/2606.14892#bib.bib136)) propose GNNs for causal inference; their method is restricted to settings without unobserved confounding, and under this restriction it is equivalent to neural causal models (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130)). However, it applies only to i.i.d. attribute-value data, leaving the relational setting open.

##### Relational deep learning on databases\.

A distinct and increasingly active direction studies deep learning directly on relational databases, with the goal of avoiding manual, error-prone feature engineering to flatten relational data (Fey et al., [2024](https://arxiv.org/html/2606.14892#bib.bib36); Robinson et al., [2024](https://arxiv.org/html/2606.14892#bib.bib95)). These works typically model databases as heterogeneous graphs to leverage the power of graph neural networks. They fall roughly into two categories: models that are trained on and applicable to a fixed schema (Wu et al., [2025](https://arxiv.org/html/2606.14892#bib.bib127); Chen et al., [2025](https://arxiv.org/html/2606.14892#bib.bib15); Dwivedi et al., [2026](https://arxiv.org/html/2606.14892#bib.bib31)), and 'foundation models' that are trained on and applicable to diverse schemas (Wang et al., [2025a](https://arxiv.org/html/2606.14892#bib.bib124); Ranjan et al., [2025](https://arxiv.org/html/2606.14892#bib.bib90)).
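
One common way to cast a database as a heterogeneous graph is to turn each row into a node typed by its table and each foreign-key reference into a typed edge. The following is a minimal sketch under that assumption; the tables, columns, and the function `database_to_hetero_graph` are hypothetical and do not correspond to the API of any particular library.

```python
# Hypothetical toy database: table name -> rows; "id" is each table's primary key.
tables = {
    "users": [{"id": 1, "age": 30}, {"id": 2, "age": 41}],
    "orders": [
        {"id": 10, "user_id": 1, "amount": 5.0},
        {"id": 11, "user_id": 1, "amount": 7.5},
        {"id": 12, "user_id": 2, "amount": 3.0},
    ],
}
# Foreign keys: (child table, column) -> referenced parent table.
foreign_keys = {("orders", "user_id"): "users"}


def database_to_hetero_graph(tables, foreign_keys):
    """Rows become typed nodes keyed by (table, primary key), with non-key columns as
    features; each foreign-key reference becomes a typed edge (child row -> parent row)."""
    fk_columns = set(foreign_keys)
    nodes = {
        (name, row["id"]): {k: v for k, v in row.items()
                            if k != "id" and (name, k) not in fk_columns}
        for name, rows in tables.items()
        for row in rows
    }
    edges = [
        ((child, column, parent), (child, row["id"]), (parent, row[column]))
        for (child, column), parent in foreign_keys.items()
        for row in tables[child]
    ]
    return nodes, edges


nodes, edges = database_to_hetero_graph(tables, foreign_keys)
```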

### A.4 Combinatorial and Compositional Generalization

The focus of our work echoes the problem of *combinatorial generalization* studied in artificial intelligence. There is no agreed-upon definition of combinatorial generalization. Sometimes, it is used interchangeably with the term *compositional generalization* (e.g., in Liu et al., [2022](https://arxiv.org/html/2606.14892#bib.bib69)). However, compositional generalization may also involve the composition of functions or tasks (e.g., subtask reinforcement learning (Mendez et al., [2022](https://arxiv.org/html/2606.14892#bib.bib72); Jothimurugan et al., [2023](https://arxiv.org/html/2606.14892#bib.bib57)) and compositional instruction following or skill composition in LLMs (Yang et al., [2024](https://arxiv.org/html/2606.14892#bib.bib133); Zhao et al., [2025](https://arxiv.org/html/2606.14892#bib.bib140); Sakai et al., [2025](https://arxiv.org/html/2606.14892#bib.bib98); Zhou et al., [2023](https://arxiv.org/html/2606.14892#bib.bib142))).

##### Combining features vs combining objects\.

We distinguish between two types of combinatorial generalization. First, *feature-combination generalization* holds the underlying object-relational structure fixed but varies the combinations of features of these objects. A canonical example is attribute binding: training on blue circles and red squares and testing on blue squares. Second, *object-combination generalization* varies the underlying set of objects and relations itself (often including the number of objects), e.g., going from 2 stacked blocks to 3 stacked blocks, or from small interaction graphs to larger ones. In this work, we study the latter type of combinatorial generalization.
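
The difference between the two notions shows up in how a train/test split is built. A hypothetical sketch, assuming scenes are multisets of (shape, colour) objects with no relations: the feature-combination split fixes the scene size and withholds the blue-square pairing, while the object-combination split withholds a scene size altogether.

```python
from itertools import product

shapes, colours = ["circle", "square"], ["red", "blue"]
objects = list(product(shapes, colours))  # 4 kinds of (shape, colour) objects

# All scenes of 1 to 3 objects, stored as sorted tuples (objects may repeat).
scenes = sorted({tuple(sorted(s)) for n in (1, 2, 3) for s in product(objects, repeat=n)})

# Feature-combination split: structure fixed at two objects; the pairing
# ("square", "blue") is never seen in training, though "square" and "blue" each are.
held_out = ("square", "blue")
two_object = [s for s in scenes if len(s) == 2]
feat_train = [s for s in two_object if held_out not in s]
feat_test = [s for s in two_object if held_out in s]

# Object-combination split: train on scenes with at most two objects, test on three.
obj_train = [s for s in scenes if len(s) <= 2]
obj_test = [s for s in scenes if len(s) == 3]
```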

##### Computer vision\.

In computer vision, combinatorial generalization is often studied for visual scenes with varying objects, attributes, and relations (Okawa et al., [2023](https://arxiv.org/html/2606.14892#bib.bib77); Hwang et al., [2023](https://arxiv.org/html/2606.14892#bib.bib53)). CLEVR is a paradigmatic dataset for combinatorial generalization in visual question answering (Johnson et al., [2016](https://arxiv.org/html/2606.14892#bib.bib56)). Combinatorial generalization is also a goal of image generation. Liu et al. ([2022](https://arxiv.org/html/2606.14892#bib.bib69)) and Du et al. ([2023](https://arxiv.org/html/2606.14892#bib.bib28)) develop an approach that composes pre-trained diffusion models, explicitly enforcing compositionality during inference using logical operators instead of relying on implicit learning. This approach, alongside several other works (Schott et al., [2022](https://arxiv.org/html/2606.14892#bib.bib104); Montero et al., [2021](https://arxiv.org/html/2606.14892#bib.bib73); Liang et al., [2025](https://arxiv.org/html/2606.14892#bib.bib67)), challenges a common view that disentangled representation learning is sufficient for combinatorial generalization.

##### Decision\-making and world models\.

For tasks such as robotic manipulation and autonomous driving, combinatorial generalization is unavoidable, since scenes naturally vary in the number of objects and their relations (Cui et al., [2019](https://arxiv.org/html/2606.14892#bib.bib21); Derrow-Pinion et al., [2021](https://arxiv.org/html/2606.14892#bib.bib27); Lin et al., [2022](https://arxiv.org/html/2606.14892#bib.bib68)). Zambaldi et al. ([2018](https://arxiv.org/html/2606.14892#bib.bib135)) introduce an approach for relational deep reinforcement learning that uses attention over entity representations, showing improved generalization to instances more complex than those seen during training. Duan et al. ([2025](https://arxiv.org/html/2606.14892#bib.bib29)) introduce a formal definition of 'out-of-combination' generalization in the decision-making context that assumes a fixed number of objects and requires generalization to regions outside the *support* of the training distribution over the state space. While their diffusion-based approach shows promising zero-shot generalization for this task, both the task definition and their approach assume that all objects (or 'base elements') are seen during training (Duan et al., [2025](https://arxiv.org/html/2606.14892#bib.bib29), Sec. 3.3), thus ruling out varying numbers of objects. Song et al. ([2024](https://arxiv.org/html/2606.14892#bib.bib110)) solve a similar problem to Duan et al. ([2025](https://arxiv.org/html/2606.14892#bib.bib29)) using an approach that maps unseen states to the closest state seen during training.
An active recent literature on object-oriented (as opposed to monolithic) *world models* aims to learn object-centric representations of pixel data (Nakano et al., [2023](https://arxiv.org/html/2606.14892#bib.bib75); Wu et al., [2023](https://arxiv.org/html/2606.14892#bib.bib129); Baek et al., [2025](https://arxiv.org/html/2606.14892#bib.bib4); Ferraro et al., [2023](https://arxiv.org/html/2606.14892#bib.bib35); Veerapaneni et al., [2020](https://arxiv.org/html/2606.14892#bib.bib120); Wang et al., [2025b](https://arxiv.org/html/2606.14892#bib.bib125); Mosbach et al., [2025](https://arxiv.org/html/2606.14892#bib.bib74); Feng et al., [2025](https://arxiv.org/html/2606.14892#bib.bib34); Zhao et al., [2022](https://arxiv.org/html/2606.14892#bib.bib141)), for instance by decomposing the latent space into 'slots' for different objects and sometimes explicitly modeling relations between objects. While these methods are evaluated on out-of-distribution generalization tasks, their performance on unseen combinations of objects remains relatively understudied.

### A.5 Relational Causal Models

The intersection of causal and relational modeling is relatively understudied. In the graphical framework of causality, most works focus on relational causal discovery, the task of learning a relational causal graph from data. Works on relational causal inference, i.e., answering causal queries from a graph and data, are few in number and deal with special types of relations, as we describe below.

##### Relational causal discovery\.

Maier et al. ([2010](https://arxiv.org/html/2606.14892#bib.bib71)) provide the first algorithm for causal discovery over data stored in relational databases. They use the DAPER model to encode conditional independencies in relational data, and extend the PC algorithm (Spirtes et al., [2000](https://arxiv.org/html/2606.14892#bib.bib113)) to the relational setting. Lee & Honavar ([2016](https://arxiv.org/html/2606.14892#bib.bib64)) build on this DAPER framework, coining the term 'relational causal model' for the DAPER model and providing more efficient and informative algorithms for relational causal discovery; Ahsan et al. ([2022](https://arxiv.org/html/2606.14892#bib.bib1)) extend this to include cyclic dependencies. However, the DAPER model is a probabilistic model, offering a compact encoding of conditional independencies in relational data (as Bayesian networks do for flat data). Like Bayesian networks, it lacks a causal semantics for interventions and counterfactuals. This gap is precisely what motivated the formulation of structural causal models (SCMs) and causal/counterfactual Bayesian networks (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86); Bareinboim et al., [2022](https://arxiv.org/html/2606.14892#bib.bib7); Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6); Correa & Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib19)). These works therefore leave open the grounding of causality in relational domains and, with it, the definition and inference of causal queries. Additionally, all of them address only those conditional independence structures that can be represented using graphs with directed edges, leaving out settings with unobserved confounding (often represented via bidirected edges) (Bareinboim et al., [2022](https://arxiv.org/html/2606.14892#bib.bib7); Jeong et al., [2025](https://arxiv.org/html/2606.14892#bib.bib55)).

##### Causal inference under interference\.

Causal inference under interference can be seen as a special case of relational causal inference where all objects and relations are of the same type (Sobel, [2006](https://arxiv.org/html/2606.14892#bib.bib108); Rosenbaum, [2007](https://arxiv.org/html/2606.14892#bib.bib96); Ogburn & VanderWeele, [2014](https://arxiv.org/html/2606.14892#bib.bib76); Bhattacharya et al., [2020](https://arxiv.org/html/2606.14892#bib.bib10); Hudgens & Halloran, [2008](https://arxiv.org/html/2606.14892#bib.bib52); Zhang et al., [2022a](https://arxiv.org/html/2606.14892#bib.bib137); Sherman & Shpitser, [2018](https://arxiv.org/html/2606.14892#bib.bib106)). The interference literature assumes a fixed number of 'units' (or objects) and a fixed interaction structure between them; this captures settings such as people in a given neighborhood, students in a given school, or patients in a given hospital. The query of interest is typically an aggregated causal effect across all units (e.g., an average direct effect, an average spillover effect, or a global average treatment effect). Unlike DAPER, however, these models do not necessarily enforce parameter sharing across units; e.g., Zhang et al. ([2022a](https://arxiv.org/html/2606.14892#bib.bib137)) study interference in linear models without enforcing mechanism sharing across instances of the same type, allowing different coefficients for different neighbors (which violates permutation invariance). Additionally, works in graphical causality under interference assume the absence of unobserved confounding, an assumption violated in many real-world settings.

##### Relational causal inference\.

Relational causal inference generalizes inference under interference, allowing heterogeneous objects and relations (Arbour et al., [2016](https://arxiv.org/html/2606.14892#bib.bib2); Jensen et al., [2020](https://arxiv.org/html/2606.14892#bib.bib54); Salimi et al., [2020](https://arxiv.org/html/2606.14892#bib.bib99); Weinstein & Blei, [2024](https://arxiv.org/html/2606.14892#bib.bib126)). Jensen et al. ([2020](https://arxiv.org/html/2606.14892#bib.bib54)), Guo et al. ([2024](https://arxiv.org/html/2606.14892#bib.bib44)), and Weinstein & Blei ([2024](https://arxiv.org/html/2606.14892#bib.bib126)) provide sound methods for causal inference in plate models, showing how 'object-conditioning' can lead to greater identifiability even in the presence of unobserved confounding. However, these models capture only a subset of the already restricted types of relations (not including first-order constraints) expressible in plate models. Arbour et al. ([2016](https://arxiv.org/html/2606.14892#bib.bib2)) are the first to generalize causal inference to the entity-relationship model, giving a backdoor criterion for identifying interventional queries from abstract ground graphs (Maier et al., [2010](https://arxiv.org/html/2606.14892#bib.bib71)). They take as input a ground relational network for a particular relational skeleton. As such, they do not define a template-level causal model that can be instantiated on, and thereby tie together, different skeletons. Salimi et al. ([2020](https://arxiv.org/html/2606.14892#bib.bib99)) define a template-level formalism with 'causal rules' that specify first-order constraints for when one variable affects another, and give a similar backdoor adjustment criterion. However, these causal rules do not specify the *mechanism* by which this effect unfolds, and thus resemble the coarse-grained information encoded in a causal graph.

First, neither Arbour et al. ([2016](https://arxiv.org/html/2606.14892#bib.bib2)) nor Salimi et al. ([2020](https://arxiv.org/html/2606.14892#bib.bib99)) provides a mechanism-level definition of relational causal models; both define interventions only through the truncated Markov factorization (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)). As such, their set-up is limited to modeling interventions in settings without unobserved confounding, and provides neither semantics nor identification criteria for counterfactuals. Second, both works assume that, for causal identification in a given skeleton, observational data from that skeleton is available; they do not address the problem of cross-skeleton inference. Finally, while Arbour et al. ([2016](https://arxiv.org/html/2606.14892#bib.bib2)) allow the effect of one variable on another to depend on the exact relation satisfied, Salimi et al. ([2020](https://arxiv.org/html/2606.14892#bib.bib99)) do not: for example, if both friends' and family members' vaccination statuses affect a given individual's infection risk, they are assumed to affect it in the same way. While possibly beneficial for estimation in practice, this assumption is quite restrictive in real-world settings.
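
The restriction can be illustrated with two toy mechanisms for an individual's infection risk. The sketch below assumes binary vaccination statuses, a clipped linear risk, and arbitrary weights, all chosen for illustration. Pooling contacts into a single aggregate makes friends and family interchangeable; giving each relation its own relational parent lets their effects differ.

```python
def risk_pooled(friends: list[int], family: list[int], w: float = 0.1) -> float:
    """One aggregate over all contacts: every relation has the same effect."""
    unvaccinated = sum(1 - v for v in friends + family)
    return min(1.0, 0.05 + w * unvaccinated)


def risk_relation_specific(friends: list[int], family: list[int],
                           w_friend: float = 0.05, w_family: float = 0.2) -> float:
    """A separate relational parent per relation, each with its own weight."""
    unvax_friends = sum(1 - v for v in friends)
    unvax_family = sum(1 - v for v in family)
    return min(1.0, 0.05 + w_friend * unvax_friends + w_family * unvax_family)


# Moving one unvaccinated contact from "friend" to "family" (1 = vaccinated, 0 = not).
pooled_a, pooled_b = risk_pooled([0], [1]), risk_pooled([1], [0])
specific_a, specific_b = risk_relation_specific([0], [1]), risk_relation_specific([1], [0])
```

Under pooling the two risks coincide, whereas the relation-specific mechanism assigns a higher risk when the unvaccinated contact is a family member.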

## Appendix B Further Definitions and Summary of Concepts

##### Graphical terminology\.

Consider a graph $\mathcal{G} = (\mathbf{V}_{\mathcal{G}}, \mathbf{E}_{\mathcal{G}})$ with directed and bidirected edges. For a node $X$ in $\mathcal{G}$, the *graphical parents* of $X$ are the set of nodes $\{Y : Y \to X \in \mathbf{E}_{\mathcal{G}}\}$. The descendants of $X$ are the nodes $Y$ such that there is a directed path $X \rightsquigarrow Y$ in $\mathcal{G}$. For a set of nodes $\mathbf{X} \subseteq \mathbf{V}_{\mathcal{G}}$, the *induced subgraph* $\mathcal{G}_{\mathbf{X}}$ is the graph with node set $\mathbf{X}$ that contains an edge between $V, W \in \mathbf{X}$ if and only if this edge is present in $\mathcal{G}$. A *bidirected clique* in $\mathcal{G}$ is a set of nodes every pair of which is connected by a bidirected edge in $\mathcal{G}$. Such a clique is *maximal* if it is not strictly contained in any larger bidirected clique.
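
The notions above can be made concrete on a small mixed graph. The following Python sketch is illustrative; the class name `MixedGraph` and its methods are hypothetical, and maximal bidirected cliques are enumerated with the standard Bron-Kerbosch recursion, which is exponential in the worst case and intended only for small graphs.

```python
class MixedGraph:
    """A graph with directed edges (a -> b) and bidirected edges (a <-> b)."""

    def __init__(self, nodes, directed, bidirected):
        self.nodes = set(nodes)
        self.directed = set(directed)                         # pairs (a, b) meaning a -> b
        self.bidirected = {frozenset(e) for e in bidirected}  # unordered pairs {a, b}

    def parents(self, x):
        """Graphical parents: {Y : Y -> x is an edge}."""
        return {a for (a, b) in self.directed if b == x}

    def descendants(self, x):
        """Nodes Y with a directed path x ~> Y. Following a common convention, x itself
        is included via the trivial path; drop it if a strict notion is needed."""
        seen, stack = {x}, [x]
        while stack:
            v = stack.pop()
            for (a, b) in self.directed:
                if a == v and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    def induced_subgraph(self, xs):
        """Keep exactly the nodes in xs and the edges with both endpoints in xs."""
        xs = set(xs)
        directed = {(a, b) for (a, b) in self.directed if a in xs and b in xs}
        bidirected = {tuple(e) for e in self.bidirected if e <= xs}
        return MixedGraph(xs, directed, bidirected)

    def maximal_bidirected_cliques(self):
        """Bron-Kerbosch enumeration of maximal cliques in the bidirected part."""
        adj = {v: set() for v in self.nodes}
        for e in self.bidirected:
            a, b = tuple(e)
            adj[a].add(b)
            adj[b].add(a)
        cliques = []

        def expand(r, p, x):
            if not p and not x:
                cliques.append(r)
                return
            for v in list(p):
                expand(r | {v}, p & adj[v], x & adj[v])
                p = p - {v}
                x = x | {v}

        expand(set(), set(self.nodes), set())
        return cliques


g = MixedGraph("ABCD", [("A", "B"), ("B", "C")],
               [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```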

### B.1 Relational Structural Causal Models

We use a modified definition of relational constraints from (Heckerman et al., [2004](https://arxiv.org/html/2606.14892#bib.bib51), Def. 6).

###### Definition B.1 (Relational constraint).

Consider a relational schema $\mathcal{S} = \langle \mathcal{E}, \mathcal{R}, \mathcal{A} \rangle$. Let $\mathcal{X} \subseteq \mathcal{E}$ be a set of entity types. A relational constraint $\phi(\mathcal{X})$ is a first-order expression whose atoms are relationship symbols in $\mathcal{R}$ (alongside predefined constraints such as equality) and whose only free variables range over the space of ground entities of the types in $\mathcal{X}$.
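
As an illustration, a relational constraint can be evaluated as a Boolean predicate over ground entities, given the relationship instances in a skeleton. The sketch assumes a hypothetical schema with `Friends` and `Rated` relationship symbols; the constraint $\phi(u, v) = \mathrm{Friends}(u, v) \land u \neq v \land \exists m\, (\mathrm{Rated}(u, m) \land \mathrm{Rated}(v, m))$ combines relationship atoms, equality, and a quantifier over movies, with $u$ and $v$ as its only free variables.

```python
# Hypothetical skeleton: relationship symbol -> set of ground tuples.
skeleton = {
    "Friends": {("ann", "bob"), ("ann", "cat"), ("bob", "ann")},
    "Rated": {("ann", "m1"), ("bob", "m1"), ("cat", "m2")},
}
movies = {"m1", "m2"}
people = {"ann", "bob", "cat"}


def holds(relation: str, *args: str) -> bool:
    """Atom: true iff the ground tuple is among the skeleton's relationship instances."""
    return tuple(args) in skeleton[relation]


def phi(u: str, v: str) -> bool:
    """Friends(u, v) AND u != v AND EXISTS m: Rated(u, m) AND Rated(v, m)."""
    return (
        holds("Friends", u, v)
        and u != v
        and any(holds("Rated", u, m) and holds("Rated", v, m) for m in movies)
    )


# Ground entities v for which phi("ann", v) holds in this skeleton.
neighbourhood_of_ann = {v for v in people if phi("ann", v)}
```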

RSCMs are defined using permutation-invariant functions on multiset inputs (Zaheer et al., [2017](https://arxiv.org/html/2606.14892#bib.bib134)). In relational learning, such multisets are often summarized using *aggregators* (Koller & Friedman, [2009](https://arxiv.org/html/2606.14892#bib.bib60); Getoor & Taskar, [2007](https://arxiv.org/html/2606.14892#bib.bib40)). We define such aggregators below.

###### Definition B.2 (Aggregator).

Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets, and let $\mathcal{X}^{\mathsf{multiset}}$ be the set of all finite multisets over $\mathcal{X}$. An *aggregator* is a function

$$\mathrm{AGG} : \mathcal{X}^{\mathsf{multiset}} \to \mathcal{Y}$$

that maps a finite multiset of elements from $\mathcal{X}$ to an element of $\mathcal{Y}$.

Note that since aggregators take \(multi\-\)sets as input, they do not depend on any ordering over the elements of their input\.
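
A minimal sketch of aggregators, assuming multisets are represented with Python's `collections.Counter`, so that two inputs differing only in order are the same multiset. The specific aggregators and the defaults for the empty multiset are illustrative choices, not ones prescribed by the definition.

```python
from collections import Counter
from statistics import mean


def agg_count(ms: Counter) -> int:
    """Number of elements, counting multiplicity."""
    return sum(ms.values())


def agg_max(ms: Counter, default=None):
    """Largest element (multiplicities are irrelevant); `default` for the empty multiset."""
    return max(ms, default=default)


def agg_mean(ms: Counter, default: float = 0.0) -> float:
    """Average over all elements, counting multiplicity."""
    return mean(ms.elements()) if ms else default


# The same multiset of neighbour values listed in two different orders.
speeds_a = Counter([3, 1, 3, 2])
speeds_b = Counter([2, 3, 1, 3])
```

Because `speeds_a == speeds_b`, any function of the `Counter` alone necessarily returns the same value for both orderings.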

We also define what it means for two relational skeletons to be considered the ‘same’\.

###### Definition B.1 (Skeleton isomorphism).

An isomorphism between skeletons $\rho$ and $\rho'$ over a given schema $\mathcal{S}$ is a bijection $\pi : \rho \to \rho'$ on entities and relations that (a) preserves types and (b) preserves relations between entities. We write $\rho \cong \rho'$ if such a $\pi$ exists.
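
For small skeletons, isomorphism can be checked by brute force over bijections of entities, with the map on relation instances induced by the map on entities. The representation (entity-to-type dictionaries and sets of entity tuples per relation type) and the `Faces` relation are assumptions made for this sketch; the search is factorial in the number of entities.

```python
from itertools import permutations


def is_isomorphic(ents1, rels1, ents2, rels2):
    """Brute-force skeleton isomorphism check.

    ents: dict entity -> entity type; rels: dict relation type -> set of entity tuples.
    The bijection on relation instances is induced by the bijection on entities."""
    if sorted(ents1.values()) != sorted(ents2.values()) or set(rels1) != set(rels2):
        return False
    names1, names2 = list(ents1), list(ents2)
    for image in permutations(names2):
        pi = dict(zip(names1, image))
        # (a) type preservation
        if any(ents1[e] != ents2[pi[e]] for e in names1):
            continue
        # (b) relation preservation: mapped tuples must coincide with the target's tuples
        if all({tuple(pi[e] for e in t) for t in rels1[r]} == rels2[r] for r in rels1):
            return True
    return False


ents_a = {"c1": "Car", "c2": "Car", "s1": "Signal"}
rels_a = {"Faces": {("c1", "s1")}}
ents_b = {"a": "Car", "b": "Car", "z": "Signal"}
rels_b = {"Faces": {("b", "z")}}
rels_c = {"Faces": {("a", "z"), ("b", "z")}}
```

Here the first two skeletons differ only in entity names and which car faces the signal, so they are isomorphic; the third has two cars facing the signal and is not.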

In Prop. [D.1](https://arxiv.org/html/2606.14892#A4.Thmadxprop1), we show that RSCM-induced distributions are invariant to skeleton isomorphism.

Next, we define what it means for an RSCM to be acyclic, or recursive\.

###### Definition B.3 (Recursive RSCM).

An RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ is said to be *recursive* (or *acyclic*) if its relational causal graph $\mathcal{G}$ contains no directed cycles. Equivalently, there exists a topological ordering of the variables $\mathbf{V}$ such that for every attribute $O.A \in \mathbf{V}$, the structural function $f_{O.A}$ only takes as input (relational or non-relational) variables that precede $O.A$ in this ordering.
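
Checking recursiveness amounts to searching for a topological ordering of the relational causal graph. The sketch below uses Kahn's algorithm on a hypothetical template graph with traffic-flavoured variable names (not the paper's example); each key lists the inputs of that variable's mechanism, with `Car.SignalSeen` and `Pedestrian.NearbySpeeds` standing in for relational variables.

```python
from collections import deque


def topological_order(parents: dict[str, set[str]]):
    """Kahn's algorithm. Returns an ordering in which every variable comes after all
    of its inputs, or None if the graph contains a directed cycle."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for v, ps in parents.items():
        for p in ps:
            indegree[v] += 1
            children[p].append(v)
    queue = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order if len(order) == len(nodes) else None


def is_recursive(parents: dict[str, set[str]]) -> bool:
    return topological_order(parents) is not None


# Hypothetical template graph: variable -> inputs of its mechanism.
template_parents = {
    "Signal.State": set(),
    "Car.SignalSeen": {"Signal.State"},            # relational: signals the car faces
    "Car.Speed": {"Car.SignalSeen"},
    "Pedestrian.NearbySpeeds": {"Car.Speed"},      # relational: nearby cars' speeds
    "Pedestrian.Crosses": {"Pedestrian.NearbySpeeds"},
}
```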

For a given skeleton $\rho$, an RSCM $\mathcal{M}$ induces a *ground RSCM*, a standard SCM with functions and exogenous distributions shared across variables of the same type.

###### Definition B.4 (Ground relational structural causal model).

Given an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ and a relational skeleton $\rho$ following the schema $\mathcal{S}$, a ground relational structural causal model of $\mathcal{M}$ for $\rho$ is an SCM $\mathcal{M}_{\rho} = \langle \mathbf{V}_{\rho}, \mathbf{U}_{\rho}, \mathcal{F}_{\rho}, P(\mathbf{U}_{\rho}) \rangle$ with

1. ground endogenous variables $\mathbf{V}_{\rho} = \{o.A \mid O.A \in \mathbf{V}, o \in \rho(O)\}$,
2. ground exogenous variables $\mathbf{U}_{\rho} = \{o.U \mid O.U \in \mathbf{U}, o \in \rho(O)\}$ with distributions $o.U \sim P(O.U)$ given by $P(\mathbf{U})$, and
3. ground mechanisms $\mathcal{F}_{\rho}$, with $f_{o.A}$ obtained by substituting each relational parent $(\mathbf{W}, \phi, \mathrm{AGG})$ in $f_{O.A}$ with (the specified aggregate of) the set of ground variables $\{t.W \mid T.W \in \mathbf{W}, t \in \rho(T), \phi(o, t) \text{ holds in } \rho\}$. This yields a function $f_{o.A}(\mathbf{pa}_{o.A}, \mathbf{u}_{o.A}, \mathbf{pa}^{r}_{o.A}, \mathbf{u}^{r}_{o.A})$ with $\mathbf{Pa}_{o.A} \subseteq \mathbf{V}_{\rho}$ and $\mathbf{U}_{o.A} \subseteq \mathbf{U}_{\rho}$.
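
The grounding step can be sketched for a toy schema with entity types `Car` and `Signal` and a `Faces(car, signal)` relationship. Everything here is hypothetical: the mechanisms, the count-of-red-signals aggregator, and the simplification that each mechanism uses only its own object's exogenous variable (no relational exogenous inputs). The point is that each car's relational parent set is read off the skeleton via the constraint, while the mechanism itself is shared across all cars.

```python
# Hypothetical skeleton rho: typed entities and a Faces(car, signal) relationship.
rho = {"Car": ["c1", "c2", "c3"], "Signal": ["s1", "s2"]}
faces = {("c1", "s1"), ("c2", "s1"), ("c2", "s2")}


# Template-level mechanisms, shared by all ground variables of the same type.
def f_signal_state(u: int) -> int:
    return u  # 1 = green, 0 = red; here a signal's state is just its own noise


def f_car_speed(red_signals_seen: int, u: int) -> int:
    # Relational parent enters through its aggregate: stop if any faced signal is red.
    return 0 if red_signals_seen > 0 else 1 + u


def ground(rho, faces, u):
    """Instantiate the template on rho and evaluate it for exogenous values u,
    which map ground exogenous variables such as "s1.U" or "c1.U" to values.
    Signals are evaluated before cars (a topological order of the template)."""
    values = {}
    for s in rho["Signal"]:
        values[f"{s}.State"] = f_signal_state(u[f"{s}.U"])
    for c in rho["Car"]:
        # Constraint phi(c, s) = Faces(c, s) selects the ground relational parents.
        parent_vals = [values[f"{s}.State"] for s in rho["Signal"] if (c, s) in faces]
        red_count = sum(1 for v in parent_vals if v == 0)  # AGG = count of red signals
        values[f"{c}.Speed"] = f_car_speed(red_count, u[f"{c}.U"])
    return values


u = {"s1.U": 1, "s2.U": 0, "c1.U": 1, "c2.U": 1, "c3.U": 0}
values = ground(rho, faces, u)
```

Car `c3` faces no signal, so its relational parent set is empty and the aggregate is zero; the same `f_car_speed` nonetheless applies to it.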

Note that a ground RSCM can be viewed as a standard SCM with typed variables and function/noise sharing across variables of the same type. The standard mechanism corresponding to each relational mechanism $f_{o.A}(\mathbf{pa}_{o.A}, \mathbf{u}_{o.A}, \mathbf{pa}^{r}_{o.A}, \mathbf{u}^{r}_{o.A})$ is the same function, but with the argument signature $f_{o.A}(\mathbf{pa}_{o.A} \cup \mathbf{pa}^{r}_{o.A}, \mathbf{u}_{o.A} \cup \mathbf{u}^{r}_{o.A})$. A ground RSCM $\mathcal{M}_{\rho}$ is said to be recursive if it is recursive when viewed as a standard SCM (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)).

Next, we define the various observational, interventional, and counterfactual distributions that a ground RSCM induces\. Note the similarity to standard SCMs\(Pearl,[2009](https://arxiv.org/html/2606.14892#bib.bib86); Bareinboim,[2025](https://arxiv.org/html/2606.14892#bib.bib6)\), since ground RSCMs are simply standard SCMs\.

###### Definition B.5 (RSCM-induced distributions).

Fix an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$, a relational skeleton $\rho$ following the schema $\mathcal{S}$, and a corresponding ground RSCM $\mathcal{M}_{\rho} = \langle \mathbf{V}_{\rho}, \mathbf{U}_{\rho}, \mathcal{F}_{\rho}, P(\mathbf{U}_{\rho}) \rangle$. For any counterfactual events $\mathbf{Y}_{\mathbf{x}}, \dots, \mathbf{Z}_{\mathbf{w}}$ over $\mathbf{V}_{\rho}$, $\mathcal{M}_{\rho}$ induces the distribution

$$P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}}, \dots, \mathbf{z}_{\mathbf{w}}) := \sum_{\mathbf{u}_{\rho}} \mathbf{1}\big[\mathbf{Y}_{\mathbf{x}}(\mathbf{u}_{\rho}) = \mathbf{y}, \dots, \mathbf{Z}_{\mathbf{w}}(\mathbf{u}_{\rho}) = \mathbf{z}\big]\, P(\mathbf{u}_{\rho}),$$

where the value $\mathbf{Y}_{\mathbf{x}}(\mathbf{u}_{\rho})$ is obtained by the standard SCM semantics over $\mathcal{F}_{\rho}$.
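
For a small ground RSCM with discrete exogenous variables, the defining sum can be computed by brute-force enumeration. The model below is hypothetical (one signal, one car facing it, independent exogenous variables, made-up probabilities); each event is evaluated in its own submodel but under the same exogenous assignment $\mathbf{u}_{\rho}$, which is what lets one expression cover observational, interventional, and counterfactual queries.

```python
from itertools import product

# Hypothetical ground RSCM: one signal s1 and one car c1 that faces it.
# Exogenous distributions P(u), assumed independent for this sketch.
P_U = {"s1.U": {0: 0.3, 1: 0.7}, "c1.U": {0: 0.5, 1: 0.5}}


def solve(u: dict, do: dict) -> dict:
    """SCM semantics: evaluate mechanisms in topological order; an intervened
    variable takes its fixed value instead of its mechanism's output."""
    v = {}
    v["s1.State"] = do.get("s1.State", u["s1.U"])  # 1 = green, 0 = red
    red_seen = 1 if v["s1.State"] == 0 else 0       # count of red faced signals
    v["c1.Speed"] = do.get("c1.Speed", 0 if red_seen else 1 + u["c1.U"])
    return v


def prob(events) -> float:
    """Sum over u of 1[every event holds in its own submodel] * P(u).

    events: list of (intervention, outcome) dict pairs; {} means no intervention."""
    names = list(P_U)
    total = 0.0
    for vals in product(*(P_U[n] for n in names)):
        u = dict(zip(names, vals))
        p_u = 1.0
        for n in names:
            p_u *= P_U[n][u[n]]
        if all(all(solve(u, do)[k] == val for k, val in outcome.items())
               for do, outcome in events):
            total += p_u
    return total


p_obs = prob([({}, {"c1.Speed": 0})])                                      # observational
p_int = prob([({"s1.State": 1}, {"c1.Speed": 2})])                         # interventional
p_cf = prob([({}, {"c1.Speed": 0}), ({"s1.State": 1}, {"c1.Speed": 2})])   # counterfactual
```

Here `p_cf` is the probability that the car stopped but would have reached speed 2 had the signal been set to green; it is counterfactual in the sense of the definition because its two events involve different interventions.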

A distribution over ground variables $\mathbf{V}_{\rho}$ is said to be *observational*, *interventional*, or *counterfactual* depending on the form of the counterfactual event(s) it assigns probability to.

1. If none of the events involve interventions, i.e., each event is of the form $\mathbf{Y}$ (equivalently $\mathbf{Y}_{\emptyset}$), then $P^{\mathcal{M}_{\rho}}$ is *observational*. In this case we write $P^{\mathcal{M}_{\rho}}(\mathbf{y})$ for $\mathbf{Y} \subseteq \mathbf{V}_{\rho}$, or $P^{\mathcal{M}_{\rho}}(\mathbf{v}_{\rho})$.
2. If all events share the same intervention and contain no nested interventions, i.e., all events are of the form $\mathbf{Y}_{\mathbf{x}}$ for a single intervention $\mathbf{X} = \mathbf{x}$, then $P^{\mathcal{M}_{\rho}}$ is *interventional*. In this case we write $P^{\mathcal{M}_{\rho}}(\mathbf{y} \mid do(\mathbf{x}))$ or $P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}})$, and similarly $P^{\mathcal{M}_{\rho}}(\mathbf{v}_{\rho} \mid do(\mathbf{x}))$.
3. If the distribution involves at least two events $\mathbf{Y}_{\mathbf{x}}, \mathbf{Z}_{\mathbf{w}}$ such that $\mathbf{X}$ and $\mathbf{W}$ are distinct variables or $\mathbf{x}$ and $\mathbf{w}$ are distinct values, then it is *counterfactual*. In this case we write $P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}}, \dots, \mathbf{z}_{\mathbf{w}})$.

### B.2 Relational Identification

###### Definition B.6 (Ground relational causal graph).

Consider a relational causal graph $\mathcal{G}$ over schema $\mathcal{S}$ and a skeleton $\rho$. The *ground relational causal graph* $\mathcal{G}_{\rho}$ is constructed as follows.

- For each node $O.V$ in $\mathcal{G}$ (relational or otherwise), and each instance $o \in \rho(O)$, include a node $o.V$ in $\mathcal{G}_{\rho}$.
- For each edge $O.B \to O.A$ in $\mathcal{G}$ where $O.A$ is a non-relational node, include an edge $o.B \to o.A$ in $\mathcal{G}_{\rho}$ for each $o \in \rho(O)$.
- For each edge $O.A \leftrightarrow T.B$ in $\mathcal{G}$ annotated with constraint $\phi$, include an edge $o.A \leftrightarrow t.B$ in $\mathcal{G}_{\rho}$ for each $o \in \rho(O), t \in \rho(T)$ such that $\phi(o, t)$ holds.
- For each edge $T.W \to O.R$ in $\mathcal{G}$ where $O.R$ is a relational node with $R = (\mathbf{W}, \phi, \mathrm{AGG})$, include edges $t.W \to o.R$ for every $o \in \rho(O)$ and every $t \in \rho(T)$ such that $\phi(o, t)$ holds.
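
The construction can be sketched for a hypothetical template with a `Signal.State` attribute, a relational node `Car.R_sig` collecting the states of signals a car faces, and a bidirected edge between the speeds of cars in the same lane. The constraints `faces` and `same_lane` are toy predicates standing in for first-order expressions evaluated on the skeleton.

```python
# Hypothetical template relational causal graph.
template_nodes = [("Signal", "State"), ("Car", "R_sig"), ("Car", "Speed")]  # R_sig is relational
within_object_edges = [("Car", "R_sig", "Speed")]                 # O.B -> O.A
relational_edges = [("Signal", "State", "Car", "R_sig", "faces")]  # T.W -> O.R with constraint
bidirected_edges = [("Car", "Speed", "Car", "Speed", "same_lane")]  # O.A <-> T.B with constraint

# Hypothetical skeleton rho and constraints evaluated on it.
rho = {"Car": ["c1", "c2", "c3"], "Signal": ["s1"]}
constraints = {
    "faces": lambda o, t: (o, t) in {("c1", "s1"), ("c2", "s1")},
    "same_lane": lambda o, t: {o, t} == {"c1", "c2"},
}


def ground_graph(rho, constraints):
    """Apply the four grounding rules to the module-level template above."""
    nodes = {f"{o}.{a}" for (typ, a) in template_nodes for o in rho[typ]}
    directed, bidirected = set(), set()
    for typ, b, a in within_object_edges:               # copied once per object o
        for o in rho[typ]:
            directed.add((f"{o}.{b}", f"{o}.{a}"))
    for t_typ, w, o_typ, r, phi in relational_edges:    # only where phi(o, t) holds
        for o in rho[o_typ]:
            for t in rho[t_typ]:
                if constraints[phi](o, t):
                    directed.add((f"{t}.{w}", f"{o}.{r}"))
    for o_typ, a, t_typ, b, phi in bidirected_edges:    # only where phi(o, t) holds
        for o in rho[o_typ]:
            for t in rho[t_typ]:
                if constraints[phi](o, t):
                    bidirected.add(frozenset({f"{o}.{a}", f"{t}.{b}"}))
    return nodes, directed, bidirected


nodes, directed, bidirected = ground_graph(rho, constraints)
```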

###### Definition B.7 (Marginalized ground relational causal graph).

Fix $\mathcal{G}$ and $\rho$, and let $\mathcal{G}_\rho$ be the ground relational causal graph (Def. [B.6](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition6)). The *marginalized ground relational causal graph* $\bar{\mathcal{G}}_\rho$ is the graph on node set $\mathbf{V}_\rho$ obtained by marginalizing out all ground relational role nodes. Equivalently, start from $\mathcal{G}_\rho$, delete every relational node $o.R$, and for each deleted node add directed edges $t.W\to o.A$ whenever $\mathcal{G}_\rho$ contained $t.W\to o.R\to o.A$. Bidirected edges among nodes in $\mathbf{V}_\rho$ are inherited unchanged.
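The marginalization step can be sketched as a path contraction. This is a minimal illustration under a hypothetical encoding (ground nodes as `(instance, attribute)` pairs, bidirected edges as two-element frozensets, the name `marginalize` invented here), and it assumes, as in the construction of Def. B.6, that every relational node has only non-relational parents and children, so one contraction step suffices.

```python
def marginalize(nodes, directed, bidirected, is_relational):
    """Marginalize ground relational nodes out of a ground graph (cf. Def. B.7).

    Assumes relational nodes have only non-relational parents t.W and only
    non-relational children o.A.
    """
    kept = {n for n in nodes if not is_relational(n)}
    new_dir = {(a, b) for (a, b) in directed if a in kept and b in kept}
    for r in set(nodes) - kept:
        parents = {a for (a, b) in directed if b == r}
        children = {b for (a, b) in directed if a == r}
        # every path t.W -> o.R -> o.A becomes a direct edge t.W -> o.A
        new_dir |= {(p, c) for p in parents for c in children}
    # bidirected edges (frozensets of two nodes) among kept nodes are inherited
    new_bi = {e for e in bidirected if e <= kept}
    return kept, new_dir, new_bi

# Hand-written ground graph for two befriended students s1, s2
nodes = {(s, v) for s in ("s1", "s2") for v in ("T", "G", "R")}
directed = {(("s1", "T"), ("s1", "G")), (("s2", "T"), ("s2", "G")),
            (("s2", "T"), ("s1", "R")), (("s1", "T"), ("s2", "R")),
            (("s1", "R"), ("s1", "G")), (("s2", "R"), ("s2", "G"))}
V, D, B = marginalize(nodes, directed, set(), lambda n: n[1] == "R")
```

After contraction, each student's GPA has two parents: the student's own tutoring status and the friend's.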

### B.3 Summarizing RSCMs versus SCMs

To help relate our terminology to standard Pearl-style SCMs, Table [B.3.1](https://arxiv.org/html/2606.14892#A2.SS3.T1) compares the main objects in the standard and relational settings. Informally, the schema provides the types of objects and relations common to various domains, the skeleton provides the concrete objects and relations in one domain, the RSCM specifies reusable template-level mechanisms, and grounding an RSCM on a skeleton produces an ordinary SCM over the instantiated object attributes.

Table B.3.1: Side-by-side comparison of standard SCM concepts and relational SCM concepts, illustrated using the traffic example.

## Appendix C Further Examples and Comparison with Standard SCMs

### C.1 RSCMs versus SCMs: Student Example

A standard SCM may be viewed as a special case of a relational SCM with a single entity type and no relation types. The purpose of this example is to make this comparison explicit and to show how assuming i.i.d.-ness can lead to incorrect causal effect estimates when relational effects exist.

Consider measuring the effect of student tutoring on GPA. For each student, let $T\in\{0,1\}$ indicate tutoring enrollment, where $T=1$ means enrolled, and let $G\in\{0,1\}$ indicate whether the student's GPA exceeds $3.5$, where $G=1$ means GPA $>3.5$.

#### Relational schema

- Standard SCM. The units are students. Equivalently, the implicit relational schema is $\mathcal{S}_{\mathrm{std}}=\langle\mathcal{E},\mathcal{R},\mathcal{A}\rangle$ with $\mathcal{E}=\{\mathsf{Student}\}$ and $\mathcal{R}=\emptyset$, and observed attributes $\mathcal{A}=\{\mathsf{Student}.T,\mathsf{Student}.G\}$.
- Relational SCM. The entity type is again students, but now students may be related by friendship: $\mathcal{S}_{\mathrm{rel}}=\langle\mathcal{E},\mathcal{R},\mathcal{A}\rangle$ with $\mathcal{E}=\{\mathsf{Student}\}$ and $\mathcal{R}=\{\mathsf{Friend}(\mathsf{Student},\mathsf{Student})\}$, and the same observed attributes $\mathcal{A}=\{\mathsf{Student}.T,\mathsf{Student}.G\}$.

#### Relational skeletons

Let the observed skeleton contain $100$ students $s_1,\dots,s_{100}$. We pair the students into friendship pairs so that, symmetrically,

$$\mathsf{Friend}(s_{2i-1},s_{2i})\quad\text{and}\quad\mathsf{Friend}(s_{2i},s_{2i-1})\qquad\text{for }i=1,\dots,50.$$

Each student has exactly one friend.

- Standard SCM. The implicit skeleton consists of $100$ student instances and no relations between them.
- Relational SCM. The skeleton consists of $100$ student instances and the friendship relations above.

#### Structural causal model: template level

- Standard SCM. The standard SCM is $M_{\mathrm{std}}=\langle\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$, where $\mathbf{V}=\{T,G\}$ and $\mathbf{U}=\{U_T,U_G\}$. The mechanisms are
  $$T\leftarrow f_T(U_T)=U_T,\qquad G\leftarrow f_G(T,U_G)=T\oplus U_G,$$
  where $\oplus$ denotes XOR. Thus $T$ affects $G$ for the same student only. The exogenous variables are independent, with $U_T\sim\mathrm{Bernoulli}(0.2)$ and $U_G\sim\mathrm{Bernoulli}(0.3)$.
- Relational SCM. The relational SCM is $M_{\mathrm{rel}}=\langle\mathcal{S}_{\mathrm{rel}},\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$, where $\mathbf{V}=\{\mathsf{Student}.T,\mathsf{Student}.G\}$ and $\mathbf{U}=\{\mathsf{Student}.U_T,\mathsf{Student}.U_G\}$. For a student $S$, the tutoring mechanism is
  $$S.T\leftarrow f_{S.T}(S.U_T)=S.U_T.$$
  Here, $S.T$ has one exogenous non-relational parent, $S.U_T$ for the same student, and no non-relational endogenous parents or relational endogenous or exogenous parents. The GPA mechanism has one non-relational endogenous parent, the student's own tutoring status $S.T$, and one relational endogenous parent, the tutoring status of $S$'s friends:
  $$\mathbf{Pa}^r_{S.G}=\left\{\left(\{\mathsf{Student}.T\},\mathsf{Friend}(S,S'),\lor\right)\right\}.$$
  The corresponding mechanism is
  $$S.G\leftarrow f_{S.G}\left(S.T,S.U_G,\bigvee_{S':\mathsf{Friend}(S,S')}S'.T\right),$$
  with
  $$S.G\leftarrow\left(S.T\wedge\bigvee_{S':\mathsf{Friend}(S,S')}S'.T\right)\oplus S.U_G.$$
  Here, tutoring improves a student's GPA only when both the student and at least one of the student's friends are tutored, up to noise. The exogenous variables are independent across attributes and across student instances, with $S.U_T\sim\mathrm{Bernoulli}(0.2)$ and $S.U_G\sim\mathrm{Bernoulli}(0.3)$.
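At the template level, each mechanism is a single function reused for every student, whatever the number of friends. The following Python sketch writes out the two GPA mechanisms above; the function names are hypothetical, and the relational version takes the (variable-length) list of friends' tutoring statuses and applies the OR aggregator to it.

```python
def f_T(u_T: int) -> int:
    """Tutoring mechanism S.T <- S.U_T (identical in both models)."""
    return u_T

def f_G_std(t: int, u_G: int) -> int:
    """Standard SCM: G <- T XOR U_G."""
    return t ^ u_G

def f_G_rel(t: int, u_G: int, friend_ts: list) -> int:
    """Relational SCM: S.G <- (S.T AND OR_{friends S'} S'.T) XOR S.U_G.

    friend_ts holds the tutoring statuses of all of S's friends; its length
    may vary across students, and any() returns False when it is empty.
    """
    any_friend_tutored = int(any(friend_ts))   # OR aggregator
    return (t & any_friend_tutored) ^ u_G      # XOR with exogenous noise
```

For example, `f_G_rel(1, 0, [0, 1])` returns 1 because one friend is tutored, whereas `f_G_rel(1, 0, [])` returns 0 for a student with no friends.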

#### Structural causal models: ground level

- Ground standard SCM. Although the standard SCM formalism does not usually introduce a separate notion of a ground model, the i.i.d. sampling assumption implicitly induces one copy of the SCM for each student. For the skeleton with students $s_1,\dots,s_{100}$, the ground variables are
  $$\mathbf{V}_\rho=\{T^{(s_i)},G^{(s_i)}:i=1,\dots,100\},\qquad\mathbf{U}_\rho=\{U_T^{(s_i)},U_G^{(s_i)}:i=1,\dots,100\}.$$
  The ground mechanisms are
  $$T^{(s_i)}\leftarrow U_T^{(s_i)},\qquad G^{(s_i)}\leftarrow T^{(s_i)}\oplus U_G^{(s_i)}.$$
  The exogenous variables are mutually independent, with $U_T^{(s_i)}\sim\mathrm{Bernoulli}(0.2)$ and $U_G^{(s_i)}\sim\mathrm{Bernoulli}(0.3)$.
- Ground relational SCM. The ground relational SCM for the friendship skeleton is $M_\rho=\langle\mathbf{V}_\rho,\mathbf{U}_\rho,\mathcal{F}_\rho,P(\mathbf{U}_\rho)\rangle$, with
  $$\mathbf{V}_\rho=\{s_i.T,s_i.G:i=1,\dots,100\},\qquad\mathbf{U}_\rho=\{s_i.U_T,s_i.U_G:i=1,\dots,100\}.$$
  For every student $s_i$, the tutoring mechanism is $s_i.T\leftarrow s_i.U_T$. Since each student has exactly one friend, let $\mathrm{fr}(s_{2i-1})=s_{2i}$ and $\mathrm{fr}(s_{2i})=s_{2i-1}$ for $i=1,\dots,50$. Then the GPA mechanisms are
  $$s_j.G\leftarrow\left(s_j.T\wedge\mathrm{fr}(s_j).T\right)\oplus s_j.U_G.$$
  The exogenous variables are mutually independent, with $s_i.U_T\sim\mathrm{Bernoulli}(0.2)$ and $s_i.U_G\sim\mathrm{Bernoulli}(0.3)$.
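Both ground models can be simulated directly on the $100$-student skeleton. In this minimal sketch, the helper names (`fr`, `sample_exogenous`, `solve_std`, `solve_rel`) are hypothetical, the 0-based index `j` stands for student $s_{j+1}$, and an optional `do_T` list replaces every tutoring mechanism by a fixed value.

```python
import random

N = 100
# 0-based index j stands for s_{j+1}; fr pairs (s_1, s_2), (s_3, s_4), ...
fr = [j + 1 if j % 2 == 0 else j - 1 for j in range(N)]

def sample_exogenous(rng: random.Random):
    """Mutually independent U_T ~ Bernoulli(0.2) and U_G ~ Bernoulli(0.3)."""
    u_T = [int(rng.random() < 0.2) for _ in range(N)]
    u_G = [int(rng.random() < 0.3) for _ in range(N)]
    return u_T, u_G

def solve_std(u_T, u_G, do_T=None):
    """Ground standard SCM: one disconnected copy of T -> G per student."""
    T = list(u_T) if do_T is None else list(do_T)
    return T, [T[j] ^ u_G[j] for j in range(N)]

def solve_rel(u_T, u_G, do_T=None):
    """Ground relational SCM: s_j.G <- (s_j.T AND fr(s_j).T) XOR s_j.U_G."""
    T = list(u_T) if do_T is None else list(do_T)
    return T, [(T[j] & T[fr[j]]) ^ u_G[j] for j in range(N)]

rng = random.Random(0)
u_T, u_G = sample_exogenous(rng)
T_obs, G_obs = solve_rel(u_T, u_G)   # one observational draw of the skeleton
```

The two solvers differ only in the GPA line, which is exactly where the relational parent $\mathrm{fr}(s_j).T$ enters.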

#### Causal graph: template level

- Standard causal graph. The standard SCM induces the graph $\mathcal{G}_{\textrm{std}}:\ T\to G$.
- Relational causal graph. The relational SCM induces a relational causal graph $\mathcal{G}_{\textrm{rel}}$ with:
  1. non-relational nodes $S.T$ and $S.G$,
  2. a non-relational edge $S.T\to S.G$,
  3. a relational node $S.R_{\mathsf{Friend}}$, and
  4. relational edges $S.T\overset{\mathsf{Friend}(S,S'),\lor}{\to}S.R_{\mathsf{Friend}}\to S.G$.

  Intuitively, $S.G$ depends on $S.T$ and on the logical OR of $S'.T$ among students $S'$ such that $\mathsf{Friend}(S,S')$.

#### Causal graph: ground and marginalized ground

- Ground standard graph. The implicit ground graph contains one disconnected copy of $T\to G$ for each student: $\mathcal{G}_{\textrm{std},\rho}:\ T^{(s_i)}\to G^{(s_i)}$, $i=1,\dots,100$. There are no edges between distinct students.
- Ground relational graph. The ground relational graph $\mathcal{G}_{\textrm{rel},\rho}$ contains, for each student $s_j$, the non-relational edge $s_j.T\to s_j.G$ and a relational node $s_j.R_{\mathsf{Friend}}$ collecting the tutoring statuses of $s_j$'s friends: $\mathrm{fr}(s_j).T\to s_j.R_{\mathsf{Friend}}\to s_j.G$.
- Marginalized ground relational graph. After marginalizing the relational node $s_j.R_{\mathsf{Friend}}$, the resulting marginalized ground graph $\bar{\mathcal{G}}_{\textrm{rel},\rho}$ has a direct edge into a given student's GPA from the tutoring status of that student's friend: $s_j.T\to s_j.G$ and $\mathrm{fr}(s_j).T\to s_j.G$.

#### Causal effects

We now compare two within-skeleton interventional queries for the given set of students. The first query compares tutoring all students to tutoring no students. The second query compares tutoring only the even-indexed students to tutoring no students.

For both queries, the outcome is the average GPA indicator $\overline{G}=\frac{1}{100}\sum_{i=1}^{100}G^{(s_i)}$ in the standard SCM and $\overline{G}=\frac{1}{100}\sum_{i=1}^{100}s_i.G$ in the relational SCM.

###### Query 1: tutoring all students versus tutoring no students.

The first intervention contrast is
$$do(\mathbf{T}=\mathbf{1})\ \text{versus}\ do(\mathbf{T}=\mathbf{0}),$$
where $\mathbf{T}=(T^{(s_i)})_{i=1,\dots,100}$ in the standard SCM and $\mathbf{T}=(s_i.T)_{i=1,\dots,100}$ in the relational SCM.

- Standard SCM. Because students are i.i.d. in the standard setting, we have
  $$\begin{aligned}
  P(g^{(s_1)},\dots,g^{(s_{100})}\mid do(t^{(s_1)},\dots,t^{(s_{100})}))
  &=\prod_{i=1}^{100}P(g^{(s_i)}\mid do(t^{(s_1)},\dots,t^{(s_{100})})) &&\text{(Rule 1 of do-calculus: students' grades are conditionally independent in }\mathcal{G}_{\textrm{std},\rho}\text{)}\\
  &=\prod_{i=1}^{100}P(g^{(s_i)}\mid do(t^{(s_i)})) &&\text{(Rule 3 of do-calculus: no directed path from one student's tutoring status to another's GPA in }\mathcal{G}_{\textrm{std},\rho}\text{)}
  \end{aligned}$$
  Since the students are identically distributed, we additionally have $P(g^{(s_i)}\mid do(t^{(s_i)}))=P(g^{(s_j)}\mid do(t^{(s_j)}))$ for any $i,j$. By linearity of expectation,
  $$\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{t})]=\frac{1}{100}\sum_{i=1}^{100}\mathbb{E}[G^{(s_i)}\mid do(T^{(s_i)}=t_i)].$$
  Under $do(\mathbf{T}=\mathbf{1})$, every student has $T^{(s_i)}=1$, so $G^{(s_i)}\leftarrow 1\oplus U_G^{(s_i)}$ implies $P(G^{(s_i)}=1\mid do(T^{(s_i)}=1))=P(U_G^{(s_i)}=0)=0.7$. Therefore,
  $$\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{1})]=\frac{1}{100}\sum_{i=1}^{100}0.7=0.7.$$
  Under $do(\mathbf{T}=\mathbf{0})$, a similar calculation yields $\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.3$. Thus,
  $$\mathrm{ATE}^{\mathrm{all}}_{\mathrm{std}}=\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{1})]-\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.7-0.3=0.4.$$
- Relational SCM. In the relational SCM, recall that a student's outcome depends both on the student's own tutoring and on the tutoring status of the student's friend:
  $$s_j.G\leftarrow\left(s_j.T\wedge\mathrm{fr}(s_j).T\right)\oplus s_j.U_G.$$
  Under $do(\mathbf{T}=\mathbf{1})$, every student and every student's friend is tutored. Hence, for each $s_j$, $s_j.G\leftarrow(1\wedge 1)\oplus s_j.U_G=1\oplus s_j.U_G$, and therefore $\Pr(s_j.G=1\mid do(\mathbf{T}=\mathbf{1}))=\Pr(s_j.U_G=0)=0.7$. Thus, $\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{1})]=0.7$. Under $do(\mathbf{T}=\mathbf{0})$, neither a student nor the student's friend is tutored, and a similar calculation gives $\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.3$. Therefore,
  $$\mathrm{ATE}^{\mathrm{all}}_{\mathrm{rel}}=\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{1})]-\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.7-0.3=0.4.$$
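Because the intervention fixes every tutoring status, each student's GPA depends only on that student's own $U_G$, so both expectations above can be computed exactly by linearity, with no Monte Carlo noise. The sketch below does this with Python's `fractions.Fraction`; the helper names are hypothetical, and `fr` is the 0-based friend map of the paired skeleton.

```python
from fractions import Fraction

N = 100
P_UG1 = Fraction(3, 10)                      # P(U_G = 1)
fr = [j + 1 if j % 2 == 0 else j - 1 for j in range(N)]

def p_G1(effective_t: int) -> Fraction:
    """P(G = 1) when G <- effective_t XOR U_G with U_G ~ Bernoulli(0.3)."""
    return 1 - P_UG1 if effective_t else P_UG1

def mean_G_std(t):
    # standard SCM: only the student's own tutoring matters
    return sum(p_G1(t[j]) for j in range(N)) / N

def mean_G_rel(t):
    # relational SCM: the student and the friend must both be tutored
    return sum(p_G1(t[j] & t[fr[j]]) for j in range(N)) / N

ones, zeros = [1] * N, [0] * N
ate_all_std = mean_G_std(ones) - mean_G_std(zeros)
ate_all_rel = mean_G_rel(ones) - mean_G_rel(zeros)
print(ate_all_std, ate_all_rel)              # 2/5 2/5
```

Both contrasts equal $2/5=0.4$, matching the derivations above.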

Therefore, for the all-treated versus none-treated query, $\mathrm{ATE}^{\mathrm{all}}_{\mathrm{std}}=\mathrm{ATE}^{\mathrm{all}}_{\mathrm{rel}}=0.4$. By construction, the two models agree numerically on this query. Next, we consider a query on which they differ.

###### Query 2: tutoring even-indexed students versus tutoring no students.

Now consider the intervention that tutors exactly the even-indexed students,
$$do(T^{(s_i)}=1\text{ for even }i,\;T^{(s_i)}=0\text{ for odd }i),$$
compared to the baseline intervention $do(\mathbf{T}=\mathbf{0})$. This policy tutors exactly one student in each friendship pair.

- Standard SCM. Again, since each student's GPA depends only on that student's own tutoring, we reuse the factorization from Query 1:
  $$P(g^{(s_1)},\dots,g^{(s_{100})}\mid do(t^{(s_1)},\dots,t^{(s_{100})}))=\prod_{i=1}^{100}P(g^{(s_i)}\mid do(t^{(s_i)})).$$
  Since the even-indexed students have $\Pr(G^{(s_i)}=1\mid do(T^{(s_i)}=1))=0.7$ and the odd-indexed students have $\Pr(G^{(s_i)}=1\mid do(T^{(s_i)}=0))=0.3$, we get
  $$\mathbb{E}[\overline{G}\mid do(T^{(s_i)}=1\text{ for even }i,\;T^{(s_i)}=0\text{ for odd }i)]=\frac{50}{100}(0.7)+\frac{50}{100}(0.3)=0.5.$$
  As before, $\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.3$, giving $\mathrm{ATE}^{\mathrm{even}}_{\mathrm{std}}=0.5-0.3=0.2$.
- Relational SCM. Recall that in the relational SCM, $s_j.G\leftarrow\left(s_j.T\wedge\mathrm{fr}(s_j).T\right)\oplus s_j.U_G$. Since each friendship pair contains exactly one tutored student, we have $s_j.T\wedge\mathrm{fr}(s_j).T=0$, and therefore $s_j.G=0\oplus s_j.U_G=s_j.U_G$. Hence
  $$\Pr(s_j.G=1\mid do(T^{(s_i)}=1\text{ for even }i,\;T^{(s_i)}=0\text{ for odd }i))=0.3,$$
  and so
  $$\mathbb{E}[\overline{G}\mid do(T^{(s_i)}=1\text{ for even }i,\;T^{(s_i)}=0\text{ for odd }i)]=0.3.$$
  As before, $\mathbb{E}[\overline{G}\mid do(\mathbf{T}=\mathbf{0})]=0.3$. This implies that $\mathrm{ATE}^{\mathrm{even}}_{\mathrm{rel}}=0.3-0.3=0$.
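The same exact computation reproduces the disagreement on Query 2. In this sketch (helper names hypothetical), note that the 1-based even-indexed students $s_2,s_4,\dots$ sit at 0-based indices $1,3,\dots$; the only difference between the two models is the effective tutoring signal fed into each GPA mechanism.

```python
from fractions import Fraction

N = 100
P_UG1 = Fraction(3, 10)                      # P(U_G = 1)
fr = [j + 1 if j % 2 == 0 else j - 1 for j in range(N)]

def p_G1(effective_t: int) -> Fraction:
    """P(G = 1) when G <- effective_t XOR U_G."""
    return 1 - P_UG1 if effective_t else P_UG1

def mean_G(t, relational: bool) -> Fraction:
    # effective tutoring signal entering each student's GPA mechanism
    eff = [t[j] & t[fr[j]] if relational else t[j] for j in range(N)]
    return sum(p_G1(e) for e in eff) / N

# 1-based even indices s_2, s_4, ... are the 0-based odd indices 1, 3, ...
t_even = [1 if (j + 1) % 2 == 0 else 0 for j in range(N)]
zeros = [0] * N
ate_even = {name: mean_G(t_even, rel) - mean_G(zeros, rel)
            for name, rel in (("standard", False), ("relational", True))}
print(ate_even)   # {'standard': Fraction(1, 5), 'relational': Fraction(0, 1)}
```

The standard model returns $1/5=0.2$ and the relational model returns $0$, as derived above.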

Therefore, for the even-treated versus none-treated query, $\mathrm{ATE}^{\mathrm{even}}_{\mathrm{std}}=0.2$ whereas $\mathrm{ATE}^{\mathrm{even}}_{\mathrm{rel}}=0$. This second query highlights a key difference between the two models. In the standard SCM, tutoring an even-indexed student improves that student's GPA regardless of the treatment status of other students. In the relational SCM, however, tutoring a student improves GPA only when that student's friend is also tutored. Since the even-treated policy tutors exactly one student in each friendship pair, no student's GPA mechanism receives an effective tutoring signal, and the ATE is $0$. Thus, if the true data-generating process is this relational SCM, the standard SCM methodology yields an incorrect estimate of the causal effect of this policy.

### C.2 Extended Traffic Example

In this section, we consider an extended version of our running example (Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1)) with unobserved confounding across objects.

###### Example C.1 (Extended relational schema).

We extend the schema in Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1) with an attribute $\mathsf{Ped}.V$ for whether a pedestrian is visible and an attribute $\mathsf{Ped}.A$ for whether they look alert. An extended relational schema for traffic scenes would be

$$\begin{aligned}
\mathcal{E}&=\{\text{Signal }(\mathsf{Sig}),\ \text{Car }(\mathsf{Car}),\ \text{Pedestrian }(\mathsf{Ped})\}\\
\mathcal{R}&=\{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped}),\ \mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car}),\ \mathsf{Path}(\mathsf{Ped},\mathsf{Car})\}\\
\mathcal{A}&=\{\mathsf{Sig}.W,\ \mathsf{Ped}.V,\ \mathsf{Ped}.A,\ \mathsf{Ped}.X,\ \mathsf{Car}.B\},
\end{aligned}$$

with all attributes binary-valued: $\mathsf{Sig}.W\in\{1,0\}$ denotes walk/drive; $\mathsf{Ped}.V\in\{1,0\}$ visible/not; $\mathsf{Ped}.A\in\{1,0\}$ alert/not; $\mathsf{Ped}.X\in\{1,0\}$ cross/wait; and $\mathsf{Car}.B\in\{1,0\}$ brake/go. $\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped})$ (resp. $\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})$) indicates that a signal controls a pedestrian (resp. car), and $\mathsf{Path}(\mathsf{Ped},\mathsf{Car})$ indicates that the pedestrian is in the car's path.

Since the object and relation types are the same as in Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1), the three relational skeletons shown in Fig. [1](https://arxiv.org/html/2606.14892#S1.F1) can also be viewed as skeletons of this new schema. We give an example RSCM for this schema that departs from Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3) in two ways: (i) it includes unobserved confounding, and (ii) it includes a relational parent consisting of more than one variable, illustrating why $\mathbf{W}$ in Def. [3.1](https://arxiv.org/html/2606.14892#S3.Thmtheorem1) is a set and not a single variable.

###### Example 10 (RSCM for traffic scene).

The endogenous variables are $\mathbf{V}=\{\mathsf{Sig}.W,\ \mathsf{Ped}.V,\ \mathsf{Ped}.A,\ \mathsf{Ped}.X,\ \mathsf{Car}.B\}$. The exogenous variables $\mathbf{U}$, capturing unobserved factors (e.g., a pedestrian's intent to cross or a driver's alertness), are $\mathsf{Sig}.U_W\sim\mathcal{B}(0.3)$, $\mathsf{Ped}.U_{XA}\sim\mathcal{B}(0.4)$, $\mathsf{Ped}.U_V\sim\mathcal{B}(0.8)$, and $\mathsf{Car}.U_B\sim\mathcal{B}(0.2)$. The mechanisms are

$$\begin{aligned}
\mathsf{Sig}.W&\leftarrow\mathsf{Sig}.U_W,\\
\mathsf{Ped}.V&\leftarrow\mathsf{Ped}.U_V,\\
\mathsf{Ped}.A&\leftarrow\mathsf{Ped}.U_{XA},\\
\mathsf{Ped}.X&\leftarrow\mathsf{Ped}.U_{XA}\oplus\bigwedge_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Ped})}\mathsf{Sig}.W,\\
\mathsf{Car}.B&\leftarrow\mathsf{Car}.U_B\oplus\left(\bigvee_{\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car})}\mathsf{Sig}.W\ \lor\bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})}\big((\mathsf{Ped}.X\lor\mathsf{Ped}.A)\wedge\mathsf{Ped}.V\big)\right).
\end{aligned}$$

For each pedestrian, $\mathsf{Ped}.A$ and $\mathsf{Ped}.X$ are confounded by the pedestrian's unobserved intent $\mathsf{Ped}.U_{XA}$. $\mathsf{Car}.B$ has a relational parent $(\{\mathsf{Sig}.W\},\mathsf{Ctrl}(\mathsf{Sig},\mathsf{Car}),\lor)$ as in Ex. [3](https://arxiv.org/html/2606.14892#Thmexample3). It also has the relational parent $(\{\mathsf{Ped}.V,\mathsf{Ped}.A,\mathsf{Ped}.X\},\mathsf{Path}(\mathsf{Ped},\mathsf{Car}),\perp)$, which means that for each pedestrian $p$ in the path of the car, the mechanism for $\mathsf{Car}.B$ may jointly use that pedestrian's triple $(\mathsf{Ped}.V,\mathsf{Ped}.A,\mathsf{Ped}.X)$. The relational causal graph for this RSCM is shown in Fig. [C.2.1(a)](https://arxiv.org/html/2606.14892#A3.SS2.F1.sf1).
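To make the mechanisms concrete, the sketch below evaluates them on a small skeleton and computes interventional probabilities by enumerating all exogenous assignments. The skeleton `SKEL` (one signal controlling one car and two pedestrians, both in the car's path) is a hypothetical example, not one of the skeletons in Fig. 1, and `solve` and `prob` are illustrative helper names. A pedestrian with no controlling signal would get the empty conjunction, which Python's `all` evaluates to true; that convention is an assumption.

```python
from itertools import product

# Bernoulli parameters of the exogenous variables in Ex. 10
PROBS = {"U_W": 0.3, "U_XA": 0.4, "U_V": 0.8, "U_B": 0.2}

# Hypothetical skeleton: signal s1 controls car c1 and pedestrians p1, p2;
# both pedestrians are in c1's path.
SKEL = {"Sig": ["s1"], "Ped": ["p1", "p2"], "Car": ["c1"],
        "Ctrl_Ped": [("s1", "p1"), ("s1", "p2")],
        "Ctrl_Car": [("s1", "c1")],
        "Path": [("p1", "c1"), ("p2", "c1")]}

def solve(skel, u, do):
    """Evaluate the Ex. 10 mechanisms for one exogenous assignment u."""
    v = {}
    def assign(key, value):
        v[key] = do.get(key, value)          # interventions override mechanisms
    for s in skel["Sig"]:
        assign((s, "W"), u[(s, "U_W")])
    for p in skel["Ped"]:
        assign((p, "V"), u[(p, "U_V")])
        assign((p, "A"), u[(p, "U_XA")])     # A and X share U_XA (confounding)
        sigs = [s for (s, q) in skel["Ctrl_Ped"] if q == p]
        walk = int(all(v[(s, "W")] for s in sigs))   # AND aggregator
        assign((p, "X"), u[(p, "U_XA")] ^ walk)
    for c in skel["Car"]:
        sigs = [s for (s, d) in skel["Ctrl_Car"] if d == c]
        peds = [p for (p, d) in skel["Path"] if d == c]
        sig_walk = any(v[(s, "W")] for s in sigs)    # OR aggregator
        # joint relational parent: per-pedestrian (X or A) and V, then OR
        hazard = any((v[(p, "X")] or v[(p, "A")]) and v[(p, "V")] for p in peds)
        assign((c, "B"), u[(c, "U_B")] ^ int(sig_walk or hazard))
    return v

def prob(skel, event, do=None):
    """P(event | do) by summing over all exogenous assignments."""
    do = do or {}
    keys = ([(s, "U_W") for s in skel["Sig"]]
            + [(p, n) for p in skel["Ped"] for n in ("U_XA", "U_V")]
            + [(c, "U_B") for c in skel["Car"]])
    total = 0.0
    for bits in product((0, 1), repeat=len(keys)):
        u = dict(zip(keys, bits))
        weight = 1.0
        for (_, name), b in u.items():
            weight *= PROBS[name] if b else 1 - PROBS[name]
        if event(solve(skel, u, do)):
            total += weight
    return total
```

For instance, under $do(s_1.W=1)$ the car brakes exactly when $c_1.U_B=0$, so `prob(SKEL, lambda v: v[("c1", "B")] == 1, do={("s1", "W"): 1})` is $0.8$ up to floating-point rounding.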

![Relational causal graph for the extended traffic RSCM](https://arxiv.org/html/2606.14892v1/figs/rcg-py.png) (a) Relational causal graph $\mathcal{G}$ for the extended traffic RSCM in Ex. [10](https://arxiv.org/html/2606.14892#Thmexample10).
![Marginalized ground relational causal graph for skeleton rho_C](https://arxiv.org/html/2606.14892v1/figs/ground-rcg-py-c.png) (b) Marginalized ground relational causal graph $\bar{\mathcal{G}}_{\rho_C}$ for skeleton $\rho_C$ in Fig. [1](https://arxiv.org/html/2606.14892#S1.F1).
![Induced subgraph on the attributes of pedestrian p1](https://arxiv.org/html/2606.14892v1/figs/subgraph-py.png) (c) Induced subgraph of $\bar{\mathcal{G}}_{\rho_C}$ on attributes of pedestrian $p_1$ (with instance identifiers omitted) for Ex. [C.2](https://arxiv.org/html/2606.14892#A3.Thmadxexample2).

Figure C.2.1: Relational causal graphs for the extended traffic examples.

It is important that $\{\mathsf{Ped}.V,\mathsf{Ped}.A,\mathsf{Ped}.X\}$ appears as a single relational parent rather than as three separate relational parents $(\{\mathsf{Ped}.V\},\mathsf{Path}(\mathsf{Ped},\mathsf{Car}))$, $(\{\mathsf{Ped}.A\},\mathsf{Path}(\mathsf{Ped},\mathsf{Car}))$, and $(\{\mathsf{Ped}.X\},\mathsf{Path}(\mathsf{Ped},\mathsf{Car}))$. This is because the braking condition depends on a within-pedestrian interaction, $\exists p\textrm{ in path}:\ \mathsf{Ped}.V\wedge(\mathsf{Ped}.A\lor\mathsf{Ped}.X)$; that is, a given pedestrian must be visible and either crossing or alert to trigger braking. If we aggregate $\mathsf{Ped}.V$, $\mathsf{Ped}.A$, and $\mathsf{Ped}.X$ separately across pedestrians, we lose information about whether these properties hold for the same pedestrian. In general,

$$\bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})}\big((\mathsf{Ped}.X\lor\mathsf{Ped}.A)\wedge\mathsf{Ped}.V\big)\neq\left(\bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})}\mathsf{Ped}.X\lor\bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})}\mathsf{Ped}.A\right)\wedge\bigvee_{\mathsf{Path}(\mathsf{Ped},\mathsf{Car})}\mathsf{Ped}.V.$$

The right-hand side can be true even when no single pedestrian satisfies both conditions, for instance when one pedestrian is visible but neither crossing nor alert while another is crossing or alert but not visible. The left-hand side is false in that situation.

This illustrates why Def. [3.1](https://arxiv.org/html/2606.14892#S3.Thmtheorem1) allows a relational parent to contain a set of variables: it lets the mechanism represent interactions among attributes of the same related object. $\square$
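The inequality can be checked by brute force over all attribute assignments for two pedestrians in a car's path. The function names `lhs` and `rhs` below are illustrative; each pedestrian is encoded as a hypothetical `(V, A, X)` tuple.

```python
from itertools import product

def lhs(peds):
    """OR over pedestrians of the within-pedestrian term (X or A) and V."""
    return any((x or a) and v for (v, a, x) in peds)

def rhs(peds):
    """Aggregate V, A, X separately across pedestrians, then combine."""
    any_v = any(v for (v, _, _) in peds)
    any_a = any(a for (_, a, _) in peds)
    any_x = any(x for (_, _, x) in peds)
    return (any_x or any_a) and any_v

# All (V, A, X) assignments for two pedestrians in a car's path
scenes = list(product(product((0, 1), repeat=3), repeat=2))
mismatches = [s for s in scenes if bool(lhs(s)) != bool(rhs(s))]

# p1 visible but neither crossing nor alert; p2 alert but not visible
print(((1, 0, 0), (0, 1, 0)) in mismatches)   # True
```

Every mismatch has the left-hand side false and the right-hand side true, as the argument above predicts: separate aggregation can only over-trigger braking.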

Next, we illustrate an application of Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5) to show non-identifiability in this example.

###### Example C.2 (Relational non-identifiability using Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5)).

Continuing Ex. [10](https://arxiv.org/html/2606.14892#Thmexample10), consider the causal diagram $\mathcal{G}$ in Fig. [C.2.1(a)](https://arxiv.org/html/2606.14892#A3.SS2.F1.sf1), the source skeleton $\rho=\rho_A$, and the target skeleton $\rho_\star=\rho_C$ from Fig. [1](https://arxiv.org/html/2606.14892#S1.F1). Suppose we are given the source distributions $\mathbb{P}=\{P(\mathbf{v}_\rho),\ P(\mathbf{v}_\rho\mid do(p_1.x)),\ P(\mathbf{v}_\rho\mid do(p_1.x,p_1.a))\}$, and we are interested in the query $P^{\rho_\star}(p_1.x\mid do(p_1.a))$ in $\rho_\star$, the causal effect of pedestrian $p_1$'s alertness on whether or not they cross. Notice that in the marginalized ground graph $\bar{\mathcal{G}}_{\rho_C}$ (Fig. [C.2.1(b)](https://arxiv.org/html/2606.14892#A3.SS2.F1.sf2)) we see a bow-graph (Pearl, [2009](https://arxiv.org/html/2606.14892#bib.bib86)) structure over $p_1.A$ and $p_1.X$. Standard identification theory implies that in this case $P^{\rho_\star}(p_1.x\mid do(p_1.a))$ is not identifiable from $\mathcal{G}_{\rho_C}$ and $P(\mathbf{v}_{\rho_\star})$. We show, using Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5), that it is also not relationally identifiable from $\mathbb{P}$ and $\mathcal{G}$.

Following the notation of Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5), our query concerns attributes of the instance $x=p_1$ in the target $\rho_\star$, where $\mathbf{V}_{p_1}=\{p_1.V,p_1.X,p_1.A\}$.

The induced subgraph of $\bar{\mathcal{G}}_{\rho_C}$ on $\mathbf{V}_{p_1}$ (with instance identifiers omitted) is given in Fig. [C.2.1(c)](https://arxiv.org/html/2606.14892#A3.SS2.F1.sf3). From $\mathbb{P}$, we construct the restriction $\mathbb{P}|_P$ for instances $p_1,p_2$ in $\rho$ of type $P$ (Pedestrian) as follows. Note that $\mathbf{V}_{p_1}=\{p_1.V,p_1.X,p_1.A\}$ and $\mathbf{V}_{p_2}=\{p_2.V,p_2.X,p_2.A\}$.

1. $P(\mathbf{v}_\rho)\in\mathbb{P}$:
   - $p_1$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_1})=P(\mathbf{v}_{p_1})$;
   - $p_2$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_2})=P(\mathbf{v}_{p_2})$.
2. $P(\mathbf{v}_\rho\mid do(p_1.x))$:
   - $p_1$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_1}\mid do(\{p_1.x\}\cap\mathbf{v}_{p_1}))=P(\mathbf{v}_{p_1}\mid do(p_1.x))$;
   - $p_2$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_2}\mid do(\{p_1.x\}\cap\mathbf{v}_{p_2}))=P(\mathbf{v}_{p_2})$.
3. $P(\mathbf{v}_\rho\mid do(p_1.x,p_1.a))$:
   - $p_1$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_1}\mid do(\{p_1.x,p_1.a\}\cap\mathbf{v}_{p_1}))=P(\mathbf{v}_{p_1}\mid do(p_1.x,p_1.a))$;
   - $p_2$ gives $P(\mathbf{v}_\rho\cap\mathbf{v}_{p_2}\mid do(\{p_1.x,p_1.a\}\cap\mathbf{v}_{p_2}))=P(\mathbf{v}_{p_2})$.
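Each entry of the list above intersects both the observed set and the intervened set with one instance's variables; omitting identifiers then merges entries that coincide. A minimal sketch under a hypothetical encoding (ground variables as `(instance, attribute)` pairs, each element of $\mathbb{P}$ as a pair of observed and intervened variable sets; the signal `s1` and car `c1` are placeholders for the other objects of $\rho_A$):

```python
def vars_of(inst):
    return frozenset((inst, a) for a in ("V", "X", "A"))

def restrict(dist, inst_vars):
    """Intersect both the observed and the intervened set with one instance."""
    observed, intervened = dist
    return observed & inst_vars, intervened & inst_vars

def drop_ids(vars_):
    """Omit instance identifiers, keeping only the type-level attribute."""
    return frozenset(("Ped", a) for (_, a) in vars_)

# Pedestrian variables of the source skeleton; attributes of other objects
# (here a placeholder signal s1 and car c1) vanish under the restriction.
V_rho = vars_of("p1") | vars_of("p2") | {("s1", "W"), ("c1", "B")}
P = [(V_rho, frozenset()),
     (V_rho, frozenset({("p1", "X")})),
     (V_rho, frozenset({("p1", "X"), ("p1", "A")}))]

restricted = {tuple(map(drop_ids, restrict(d, vars_of(inst))))
              for d in P for inst in ("p1", "p2")}
for observed, intervened in sorted(restricted, key=lambda r: len(r[1])):
    print(sorted(a for _, a in observed), "| do", sorted(a for _, a in intervened))
```

The six per-instance entries collapse to three type-level distributions, because every restriction to $p_2$ reduces to the observational $P(\mathbf{v}_{p_2})$.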

Omitting identifiers, we get the restriction
$$\mathbb{P}|_P=\{P(\mathsf{Ped}.v,\mathsf{Ped}.x,\mathsf{Ped}.a),\ P(\mathsf{Ped}.v,\mathsf{Ped}.x,\mathsf{Ped}.a\mid do(\mathsf{Ped}.x)),\ P(\mathsf{Ped}.v,\mathsf{Ped}.x,\mathsf{Ped}.a\mid do(\mathsf{Ped}.x,\mathsf{Ped}.a))\}$$
and the query $P(\mathsf{Ped}.x\mid do(\mathsf{Ped}.a))$. By counterfactual calculus, since the subgraph $\mathcal{G}_P$ in Fig. [C.2.1(c)](https://arxiv.org/html/2606.14892#A3.SS2.F1.sf3) contains a bow structure over $\mathsf{Ped}.A$ and $\mathsf{Ped}.X$, the query $P(\mathsf{Ped}.x\mid do(\mathsf{Ped}.a))$ is non-identifiable from $\mathcal{G}_P$ and $\mathbb{P}|_P$. Then, by Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5), the original query $P^{\rho_\star}(p_1.x\mid do(p_1.a))$ is non-identifiable from $\mathbb{P}$ and $\mathcal{G}$.
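The non-identifiability of the type-level query can also be checked numerically with the usual two-model argument: exhibit two SCMs compatible with a bow structure over $\mathsf{Ped}.A$ and $\mathsf{Ped}.X$ that agree on every distribution in $\mathbb{P}|_P$ but disagree on the query. The sketch below is a standard bow-graph counterexample, not the counterfactual-calculus argument used in the text. Both models, the shared confounder $U\sim\mathcal{B}(0.4)$, and the function names are illustrative assumptions, and $\mathsf{Ped}.V$ is assumed independent of $\mathsf{Ped}.A$ and $\mathsf{Ped}.X$.

```python
from itertools import product

P_U = {0: 0.6, 1: 0.4}    # assumed shared confounder U of A and X
P_UV = {0: 0.2, 1: 0.8}   # V <- U_V, assumed independent of A and X

def model_1(u, u_v, do):
    a = do.get("A", u)
    x = do.get("X", u)    # X ignores A: A and X are only confounded
    return (u_v, a, x)

def model_2(u, u_v, do):
    a = do.get("A", u)
    x = do.get("X", a)    # X copies A: a genuine A -> X effect
    return (u_v, a, x)

def dist(model, do):
    """Joint distribution of (V, A, X) under intervention do."""
    out = {}
    for u, u_v in product((0, 1), repeat=2):
        key = model(u, u_v, do)
        out[key] = out.get(key, 0.0) + P_U[u] * P_UV[u_v]
    return out

# Regimes in the restriction: no intervention, do(X), and do(X, A)
regimes = ([{}] + [{"X": x} for x in (0, 1)]
           + [{"X": x, "A": a} for x in (0, 1) for a in (0, 1)])
agree = all(dist(model_1, d) == dist(model_2, d) for d in regimes)

def p_x1_do_a1(model):
    return sum(p for (_, _, x), p in dist(model, {"A": 1}).items() if x == 1)

print(agree, round(p_x1_do_a1(model_1), 6), round(p_x1_do_a1(model_2), 6))
# True 0.4 1.0
```

Since the two models match on all three available regimes yet give $0.4$ versus $1.0$ for $P(\mathsf{Ped}.X=1\mid do(\mathsf{Ped}.A=1))$, no procedure can recover the query from $\mathbb{P}|_P$ alone.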

## Appendix D Further Results and Proofs

### D.1 Proofs for Sec. [3](https://arxiv.org/html/2606.14892#S3)

The following proposition shows that two isomorphic skeletons induce the ‘same’ counterfactual distributions, up to renaming of the ground variables.

###### Proposition D.1 (Isomorphism-invariance of RSCM distributions).

Consider an RSCM $\mathcal{M}=\langle\mathcal{S},\mathbf{V},\mathbf{U},\mathcal{F},P(\mathbf{U})\rangle$ and relational skeletons $\rho,\rho^{\prime}$ isomorphic under a mapping $\pi$. Then, for any counterfactual events $\mathbf{Y}_{\mathbf{x}},\dots,\mathbf{Z}_{\mathbf{w}}$ over $\mathbf{V}_{\rho}$,

$$P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}},\dots,\mathbf{z}_{\mathbf{w}})=P^{\mathcal{M}_{\rho^{\prime}}}(\pi(\mathbf{y})_{\pi(\mathbf{x})},\dots,\pi(\mathbf{z})_{\pi(\mathbf{w})}),$$

where $\pi(o.A)=\pi(o).A$ extends $\pi$ to ground variables $o.A\in\mathbf{V}_{\rho}$.

###### Proof\.

Recall from Def. [B.5](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition5) that

$$P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}},\dots,\mathbf{z}_{\mathbf{w}})=\sum_{\mathbf{u}_{\rho}}\mathbf{1}[\mathbf{Y}_{\mathbf{x}}(\mathbf{u}_{\rho})=\mathbf{y},\dots,\mathbf{Z}_{\mathbf{w}}(\mathbf{u}_{\rho})=\mathbf{z}]\;P(\mathbf{u}_{\rho}),$$

and

$$P^{\mathcal{M}_{\rho^{\prime}}}(\pi(\mathbf{y})_{\pi(\mathbf{x})},\dots,\pi(\mathbf{z})_{\pi(\mathbf{w})})=\sum_{\mathbf{u}_{\rho^{\prime}}}\mathbf{1}[\pi(\mathbf{Y})_{\pi(\mathbf{x})}(\mathbf{u}_{\rho^{\prime}})=\pi(\mathbf{y}),\dots,\pi(\mathbf{Z})_{\pi(\mathbf{w})}(\mathbf{u}_{\rho^{\prime}})=\pi(\mathbf{z})]\;P(\mathbf{u}_{\rho^{\prime}}).$$

We prove the desired equality by a change of variables. The isomorphism $\pi:\rho\to\rho^{\prime}$ induces a bijection on exogenous assignments, $\mathbf{u}_{\rho}\mapsto\pi(\mathbf{u}_{\rho})$, so we can re-index the second sum by writing $\mathbf{u}_{\rho^{\prime}}=\pi(\mathbf{u}_{\rho})$:

$$P^{\mathcal{M}_{\rho^{\prime}}}(\pi(\mathbf{y})_{\pi(\mathbf{x})},\dots,\pi(\mathbf{z})_{\pi(\mathbf{w})})=\sum_{\pi(\mathbf{u}_{\rho})}\mathbf{1}[\pi(\mathbf{Y})_{\pi(\mathbf{x})}(\pi(\mathbf{u}_{\rho}))=\pi(\mathbf{y}),\dots,\pi(\mathbf{Z})_{\pi(\mathbf{w})}(\pi(\mathbf{u}_{\rho}))=\pi(\mathbf{z})]\;P(\pi(\mathbf{u}_{\rho})).$$
We claim that for any intervention assignment $\mathbf{x}$ and any exogenous assignment $\mathbf{u}_{\rho}$,

$$\pi(\mathbf{Y})_{\pi(\mathbf{x})}(\pi(\mathbf{u}_{\rho}))=\pi(\mathbf{y})\text{ in }\mathcal{M}_{\rho^{\prime}}\iff\mathbf{Y}_{\mathbf{x}}(\mathbf{u}_{\rho})=\mathbf{y}\text{ in }\mathcal{M}_{\rho},$$

and similarly for the other counterfactual events such as $\mathbf{Z}_{\mathbf{w}}$.

To see this, fix a ground variable $o.A\in\mathbf{V}_{\rho}$. By Def. [B.4](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition4), each relational parent $(\mathbf{W},\phi,\textrm{AGG})$ in the template mechanism $f_{O.A}$ is instantiated in $\mathcal{M}_{\rho}$ as the multiset

$$\{\,t.W\mid T.W\in\mathbf{W},\ t\in\rho(T),\ \phi(o,t)\text{ holds in }\rho\,\},$$

and analogously in $\mathcal{M}_{\rho^{\prime}}$. Since $\pi$ is a skeleton isomorphism, it preserves relations and hence the satisfaction of constraints:

$$\phi(o,t)\text{ holds in }\rho\iff\phi(\pi(o),\pi(t))\text{ holds in }\rho^{\prime}.$$

Hence the structural function (or intervened constant) for $\pi(o).A$ in $\mathcal{M}_{\rho^{\prime}}$ is exactly the $\pi$-renaming of the structural function for $o.A$ in $\mathcal{M}_{\rho}$. This proves the claim, so that for every $\mathbf{u}_{\rho}$,

$$\mathbf{1}[\pi(\mathbf{Y})_{\pi(\mathbf{x})}(\pi(\mathbf{u}_{\rho}))=\pi(\mathbf{y}),\dots,\pi(\mathbf{Z})_{\pi(\mathbf{w})}(\pi(\mathbf{u}_{\rho}))=\pi(\mathbf{z})]=\mathbf{1}[\mathbf{Y}_{\mathbf{x}}(\mathbf{u}_{\rho})=\mathbf{y},\dots,\mathbf{Z}_{\mathbf{w}}(\mathbf{u}_{\rho})=\mathbf{z}].$$
It remains to show that for every assignment of values $\mathbf{u}_{\rho}$,

$$P(\mathbf{u}_{\rho})\text{ in }\mathcal{M}_{\rho}=P(\pi(\mathbf{u}_{\rho}))\text{ in }\mathcal{M}_{\rho^{\prime}}.$$

For each entity/relation type $O$, recall that the RSCM $\mathcal{M}$ specifies, for each exogenous variable $O.U\in\mathbf{U}$, a distribution $O.U\sim P(O.U)$. By definition of the ground RSCM $\mathcal{M}_{\rho}$, $o.U\sim P(O.U)$ for each $o\in\rho(O)$. Since $\pi$ preserves types, we also have $\pi(o).U\sim P(O.U)$ by definition of the ground RSCM $\mathcal{M}_{\rho^{\prime}}$. Therefore, the above equality follows. ∎
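Proposition D.1 can be sanity-checked numerically on a tiny, hypothetical RSCM. In the sketch below, signals have a red/green attribute $S := U$, and each pedestrian's walking attribute $W$ depends on its own exogenous variable and on a relational parent: the multiset of $S$ values of the signals $s$ with $\mathsf{Ctrl}(s, q)$, aggregated by `max`. The mechanisms, the probabilities, the skeleton encoding, and the assumption that $P(\mathbf{u}_\rho)$ factorizes over instances are all illustrative choices, not taken from the paper. Exact enumeration shows that renaming the joint of $\rho$ by an isomorphism $\pi$ reproduces the joint of $\rho^{\prime}$, while a non-isomorphic renaming does not.

```python
from fractions import Fraction
from itertools import product

P_US = Fraction(3, 10)  # hypothetical P(Signal.U = 1), i.e. the signal is red
P_UP = Fraction(6, 10)  # hypothetical P(Pedestrian.U = 1), i.e. the pedestrian wants to walk


def ground_joint(skel):
    """Exact P(v_rho) of a toy ground RSCM, by enumerating every exogenous assignment."""
    sigs, peds, ctrl = skel["Signal"], skel["Pedestrian"], skel["Ctrl"]
    dist = {}
    for us in product((0, 1), repeat=len(sigs)):
        for up in product((0, 1), repeat=len(peds)):
            # Toy assumption: P(u_rho) factorizes over instances, each o.U ~ P(O.U).
            p = Fraction(1)
            for u in us:
                p *= P_US if u else 1 - P_US
            for u in up:
                p *= P_UP if u else 1 - P_UP
            v = {}
            for s, u in zip(sigs, us):
                v[(s, "S")] = u  # s.S := s.U
            for q, u in zip(peds, up):
                # Relational parent: S over signals s with Ctrl(s, q), aggregated by max.
                red = max((v[(s, "S")] for s in sigs if (s, q) in ctrl), default=0)
                v[(q, "W")] = u * (1 - red)  # walk only if no controlling signal is red
            event = frozenset(v.items())
            dist[event] = dist.get(event, 0) + p
    return dist


def rename(dist, pi):
    """Apply pi(o.A) = pi(o).A to every ground variable of every event."""
    return {frozenset(((pi[o], attr), val) for (o, attr), val in event): p
            for event, p in dist.items()}


rho = {"Signal": ["s1"], "Pedestrian": ["p1", "p2"], "Ctrl": {("s1", "p1")}}
rho_prime = {"Signal": ["sig"], "Pedestrian": ["ped_x", "ped_y"], "Ctrl": {("sig", "ped_y")}}
pi = {"s1": "sig", "p1": "ped_y", "p2": "ped_x"}  # a skeleton isomorphism rho -> rho_prime

assert rename(ground_joint(rho), pi) == ground_joint(rho_prime)
```

The check depends only on the renaming being type- and relation-preserving; mapping `p1` to `ped_x` instead would send a controlled pedestrian to an uncontrolled one, and the two joints would then differ.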

###### Theorem [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3) (Impossibility of observational inference across skeletons).

Consider a schema $\mathcal{S}$, source skeletons $\rho_{1},\dots,\rho_{l}$, and a target skeleton $\rho_{\star}$ that is not isomorphic to any source skeleton. Then, for any RSCM $\mathcal{M}$ over $\mathcal{S}$, there exists another RSCM $\mathcal{M}^{\prime}$ over $\mathcal{S}$ such that $\mathcal{M}$ and $\mathcal{M}^{\prime}$ agree on the observational distributions $P(\mathbf{v}_{\rho_{k}})$ for every source skeleton $\rho_{k}$ but disagree on the observational distribution $P(\mathbf{v}_{\rho_{\star}})$ of the target skeleton.

Proof idea. We will prove this by constructing a relational constraint $\phi_{\star}$ that evaluates to true only on skeletons isomorphic to the given $\rho_{\star}$. Given $\mathcal{M}$, we then construct another RSCM $\mathcal{M}^{\prime}$ that is almost identical to $\mathcal{M}$, except that it behaves differently when $\phi_{\star}$ is true. For example, such a constraint for the skeleton $\rho_{A}$ given in Ex. [1](https://arxiv.org/html/2606.14892#Thmexample1) would be:

$$\begin{aligned}
\phi_{A}:\ &\exists\text{ Signal }S_{1},\text{ Pedestrian }P_{1},P_{2},\text{ Car }C_{1},C_{2}\text{ such that }(P_{1}\neq P_{2})\wedge(C_{1}\neq C_{2})\\
&\wedge(\forall\text{ Signal }S,\ S=S_{1})\\
&\wedge(\forall\text{ Pedestrian }P,\ P=P_{1}\lor P=P_{2})\\
&\wedge(\forall\text{ Car }C,\ C=C_{1}\lor C=C_{2})\\
&\wedge\mathsf{Ctrl}(S_{1},P_{1})\wedge\mathsf{Ctrl}(S_{1},P_{2})\\
&\wedge\mathsf{Ctrl}(S_{1},C_{1})\wedge\neg\mathsf{Ctrl}(S_{1},C_{2})\\
&\wedge\mathsf{Path}(P_{1},C_{1})\wedge\mathsf{Path}(P_{1},C_{2})\\
&\wedge\mathsf{Path}(P_{2},C_{1})\wedge\neg\mathsf{Path}(P_{2},C_{2})
\end{aligned}$$
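To make the semantics of $\phi_{A}$ concrete, the sketch below evaluates the sentence literally on a finite skeleton by trying every assignment of the existential variables. The encoding is an assumption for illustration: each entity type maps to a list of instance ids, and $\mathsf{Ctrl}$ and $\mathsf{Path}$ are sets of ordered id pairs. The example skeleton `rho_A` is simply one that satisfies $\phi_{A}$ as written, not necessarily the exact encoding used in Ex. 1.

```python
from itertools import product


def phi_A(skel):
    """Literal evaluation of the first-order sentence phi_A on a finite skeleton."""
    S, P, C = skel["Signal"], skel["Pedestrian"], skel["Car"]
    ctrl, path = skel["Ctrl"], skel["Path"]
    for s1, p1, p2, c1, c2 in product(S, P, P, C, C):
        if p1 == p2 or c1 == c2:                  # (P1 != P2) and (C1 != C2)
            continue
        if not all(s == s1 for s in S):           # exactly one signal
            continue
        if not all(p in (p1, p2) for p in P):     # exactly two pedestrians
            continue
        if not all(c in (c1, c2) for c in C):     # exactly two cars
            continue
        if ((s1, p1) in ctrl and (s1, p2) in ctrl
                and (s1, c1) in ctrl and (s1, c2) not in ctrl
                and (p1, c1) in path and (p1, c2) in path
                and (p2, c1) in path and (p2, c2) not in path):
            return True
    return False


rho_A = {
    "Signal": ["s"], "Pedestrian": ["a", "b"], "Car": ["x", "y"],
    "Ctrl": {("s", "a"), ("s", "b"), ("s", "x")},
    "Path": {("a", "x"), ("a", "y"), ("b", "x")},
}
assert phi_A(rho_A)
```

Adding a third car, or letting the signal also control `y`, makes the sentence false, which is the sense in which $\phi_{A}$ singles out one isomorphism class.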
###### Proof\.

Since $\rho_{\star}$ is a finite relational skeleton, there exists a first-order formula $\phi_{\star}$ that is true for a given skeleton $\rho$ if and only if $\rho\cong\rho_{\star}$. Such a $\phi_{\star}$ is constructed as follows. For each entity/relation type $O$ and each instance $o$ of $O$ in $\rho_{\star}$, introduce one existentially quantified variable. Require that the variables of a given type are pairwise distinct. For each type $O$, introduce a universally quantified variable of type $O$ and require that it equals at least one of these $o$-variables. Finally, require that every relation $R$ in the schema $\mathcal{S}$ holds on exactly those instance variables for which it is true in $\rho_{\star}$, and on no others. By construction, $\phi_{\star}$ is true only on skeletons isomorphic to $\rho_{\star}$.
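The recipe above is mechanical, so it can be sketched as a small generator that prints $\phi_{\star}$ as text. The input format is an assumption for illustration: entity types map to lists of instance ids (unique across types), and each relation maps to its allowed type signatures plus the set of ordered pairs on which it holds in $\rho_{\star}$. Relations are treated as binary predicates here; relation instances carrying their own attributes are not modeled.

```python
from itertools import combinations, product


def characteristic_sentence(skel, relations):
    """Return phi_star (as text) for a finite skeleton, following the recipe in the proof.

    skel:      entity type -> list of instance ids in rho_star (ids unique across types)
    relations: relation name -> (list of (type1, type2) signatures, set of true (id1, id2) pairs)
    """
    var, exists, body = {}, [], []
    for typ, insts in skel.items():
        names = [f"{typ.lower()}{i + 1}" for i in range(len(insts))]
        var.update(zip(insts, names))
        exists += [f"{typ} {n}" for n in names]
        # the variables of one type denote pairwise distinct instances
        body += [f"({a} != {b})" for a, b in combinations(names, 2)]
        # every instance of this type is one of them ("false" if the type is empty)
        cover = " or ".join(f"Z = {n}" for n in names) or "false"
        body.append(f"(forall {typ} Z: {cover})")
    for rel, (signatures, true_pairs) in relations.items():
        for t1, t2 in signatures:
            for a, b in product(skel[t1], skel[t2]):
                lit = f"{rel}({var[a]}, {var[b]})"
                # R holds exactly on the pairs where it holds in rho_star, and on no others
                body.append(lit if (a, b) in true_pairs else f"not {lit}")
    return "exists " + ", ".join(exists) + ": " + " and ".join(body)


skel_A = {"Signal": ["s"], "Pedestrian": ["a", "b"], "Car": ["x", "y"]}
rels_A = {
    "Ctrl": ([("Signal", "Pedestrian"), ("Signal", "Car")], {("s", "a"), ("s", "b"), ("s", "x")}),
    "Path": ([("Pedestrian", "Car")], {("a", "x"), ("a", "y"), ("b", "x")}),
}
phi = characteristic_sentence(skel_A, rels_A)
print(phi)
```

Applied to this encoding of the example skeleton, the output has the same shape as $\phi_{A}$, up to variable names, with exactly two negated literals: $\neg\mathsf{Ctrl}(S_1, C_2)$ and $\neg\mathsf{Path}(P_2, C_2)$.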

Having constructed $\phi_{\star}$, consider the given RSCM $\mathcal{M}$ and skeletons $\rho_{1},\dots,\rho_{l}$. Let $\mathcal{M}^{\prime}$ be the same as $\mathcal{M}$, with the following changes. First, for some arbitrary entity or relation type $O$ with an observed attribute $O.A\in\mathbf{V}$, $\mathcal{M}^{\prime}$ contains an additional exogenous variable $O.U$ with the same domain as $O.A$, so that $\mathbf{U}^{\prime}=\mathbf{U}\cup\{O.U\}$. Second, $\mathcal{M}^{\prime}$ has a function $f^{\prime}_{O.A}$ as follows:

$$f^{\prime}_{O.A}(\mathbf{pa}_{O.A},\mathbf{u}^{\prime}_{O.A},\mathbf{pa}^{r}_{O.A},\mathbf{u}^{r}_{O.A})=\begin{cases}O.u & \text{if }\phi_{\star}(X)\\ f_{O.A}(\mathbf{pa}_{O.A},\mathbf{u}^{\prime}_{O.A}\setminus\{O.u\},\mathbf{pa}^{r}_{O.A},\mathbf{u}^{r}_{O.A}) & \text{if }\neg\phi_{\star}(X)\end{cases}$$
As a result, since $f^{\prime}_{O.A}$ in $\mathcal{M}^{\prime}$ equals $f_{O.A}$ in $\mathcal{M}$ whenever $\phi_{\star}$ is false, $\mathcal{M}^{\prime}$ and $\mathcal{M}$ induce the same observational distributions on the skeletons $\rho_{1},\dots,\rho_{l}\not\cong\rho_{\star}$. On the skeleton $\rho_{\star}$, however, the distribution $P(O.U)$ in $\mathcal{M}^{\prime}$ can be chosen depending on $P^{\mathcal{M}}(\mathbf{v}_{\rho_{\star}})$ so that the different behaviour of $f^{\prime}_{O.A}$ and $f_{O.A}$ results in different observational distributions for $\mathcal{M}^{\prime}$ and $\mathcal{M}$. ∎
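The construction can be illustrated with a deliberately tiny, hypothetical RSCM: one entity type (pedestrians) with a single binary attribute $A$, no relations, and the toy mechanism $f_{O.A}(u) = u$. Skeletons of this schema are determined up to isomorphism by their number of pedestrians, so $\phi_{\star}$ for a two-pedestrian target reduces to a count check. All probabilities are made up for the example.

```python
from fractions import Fraction
from itertools import product

P_U_OLD = Fraction(1, 2)   # hypothetical P(Pedestrian.U_A = 1), used by both M and M'
P_U_NEW = Fraction(9, 10)  # hypothetical P(Pedestrian.U = 1) for the extra exogenous variable of M'


def phi_star(n_peds):
    # Hypothetical target rho_star: exactly two pedestrians and nothing else, so
    # "isomorphic to rho_star" reduces to comparing the number of pedestrians.
    return n_peds == 2


def joint(n_peds, modified):
    """Exact P(v_rho) over the pedestrians' A attributes in M (modified=False) or M' (True)."""
    dist = {}
    for us in product((0, 1), repeat=n_peds):      # original exogenous U_A of each pedestrian
        for ws in product((0, 1), repeat=n_peds):  # extra exogenous U (ignored by M)
            p = Fraction(1)
            for u, w in zip(us, ws):
                p *= (P_U_OLD if u else 1 - P_U_OLD) * (P_U_NEW if w else 1 - P_U_NEW)
            if modified and phi_star(n_peds):
                a = ws  # f'_{O.A} returns o.u when phi_star holds
            else:
                a = us  # otherwise f'_{O.A} = f_{O.A}, here the toy mechanism f(u) = u
            dist[a] = dist.get(a, 0) + p
    return dist


for n in (1, 2, 3):
    print(n, joint(n, False) == joint(n, True))
```

The two models agree on the one- and three-pedestrian skeletons but not on the two-pedestrian target, mirroring the argument in the proof.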

###### Theorem [3.4](https://arxiv.org/html/2606.14892#S3.Thmtheorem4) (Impossibility of causal inference within a skeleton).

Consider a schema𝒮\\mathcal\{S\}where at least one entity or relation type has more than one observed attribute\. For any relational SCMℳ\\mathcal\{M\}over𝒮\\mathcal\{S\}and skeletonρ\\rho, there exists another relational SCMℳ′\\mathcal\{M\}^\{\\prime\}over𝒮\\mathcal\{S\}such thatℳ\\mathcal\{M\}andℳ′\\mathcal\{M\}^\{\\prime\}agree on the observational distributionP\(𝐯ρ\)P\(\\mathbf\{v\}\_\{\\rho\}\)but disagree on some interventional distribution over𝐕ρ\\mathbf\{V\}\_\{\\rho\}\.

Proof idea. We will construct an RSCM $\mathcal{M}^{\prime}$ such that, for an entity/relation type $O$ with observed attributes $O.A$ and $O.B$, the function $f^{\prime}_{O.B}$ in $\mathcal{M}^{\prime}$ can ‘detect’ that $O.A$ has been intervened on (for the same instance). Then, $f^{\prime}_{O.B}$ behaves differently from $f_{O.B}$ when $O.A$ is under intervention, but identically otherwise. Because $O.A$ and $O.B$ belong to the same instance, such ‘detection’ can be implemented: the constraints in $f_{O.A}$ that hold for an instance $o$ also hold for $o$ in $f^{\prime}_{O.B}$.

###### Proof\.

By assumption, there exists some entity/relation type $O$ such that $O.A,O.B\in\mathbf{V}$. First, assume, without loss of generality, that $O.B\notin\mathbf{Pa}_{O.A}$ in $\mathcal{M}$. Define $\mathcal{M}^{\prime}$ to be the same as $\mathcal{M}$, but with two modifications.

First, $\mathcal{M}^{\prime}$ contains an additional exogenous variable $O.U$ with the same domain as $O.B$, so that $\mathbf{U}^{\prime}=\mathbf{U}\cup\{O.U\}$. Second, $\mathcal{M}^{\prime}$ has a function $f^{\prime}_{O.B}$ as follows:

$$\begin{aligned}
&f^{\prime}_{O.B}(\mathbf{pa}_{O.B}\cup\mathbf{pa}_{O.A}\cup\{O.a\},\ \mathbf{u}^{\prime}_{O.B}\cup\mathbf{u}_{O.A},\ \mathbf{pa}^{r}_{O.B}\cup\mathbf{pa}^{r}_{O.A},\ \mathbf{u}^{r}_{O.B}\cup\mathbf{u}^{r}_{O.A})\\
&\quad=\begin{cases}f_{O.B}(\mathbf{pa}_{O.B},\mathbf{u}^{\prime}_{O.B}\setminus\{O.u\},\mathbf{pa}^{r}_{O.B},\mathbf{u}^{r}_{O.B}) & \textbf{if }O.a=f_{O.A}(\mathbf{pa}_{O.A},\mathbf{u}_{O.A},\mathbf{pa}^{r}_{O.A},\mathbf{u}^{r}_{O.A})\\ O.u & \text{otherwise}\end{cases}
\end{aligned}$$

Above, $f^{\prime}_{O.B}$ has an extended parent set that includes all the parents (endogenous and exogenous, non-relational and relational) of $O.A$, as well as $O.A$ itself. Since $O.A$ and $O.B$ belong to the same instance, for any relational parent $(\mathbf{W},\phi)\in\mathbf{Pa}^{r}_{O.A}\cup\mathbf{U}^{r}_{O.A}$ and any skeleton $\rho$, $\phi(o,t)$ holds in $f^{\prime}_{O.B}$ in $\rho$ if and only if it holds in $f_{O.A}$ in $\rho$.

In the observational regime, the condition $O.a=f_{O.A}(\dots)$ always holds; therefore, $f^{\prime}_{O.B}$ coincides with $f_{O.B}$, and $\mathcal{M}$ and $\mathcal{M}^{\prime}$ induce the same $P(\mathbf{v}_{\rho})$. This is not the case under intervention. Consider an intervention that sets $o_{1}.A=a$ for some instance $o_{1}$ of $O$ in $\rho$. There is some assignment $\mathbf{u}_{\rho}$ to the exogenous variables such that the valuation $o_{1}.A(\mathbf{u}_{\rho})\neq a$. Under this assignment, the condition $o_{1}.a=f_{O.A}(\dots)$ fails, and thus $o_{1}.B\leftarrow o_{1}.u$. The distribution $P(O.U)$ in $\mathcal{M}^{\prime}$ can therefore be chosen such that $P^{\mathcal{M}^{\prime}_{\rho}}(o_{1}.B=b\mid do(o_{1}.A=a))\neq P^{\mathcal{M}_{\rho}}(o_{1}.B=b\mid do(o_{1}.A=a))$ for some $b$ in the domain of $O.B$.

∎
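A numerical instance of this construction, on a single hypothetical instance $o$ with binary attributes $A$ and $B$: in $\mathcal{M}$, $A := U_A$ and $B := A \oplus U_B$ (XOR); $\mathcal{M}^{\prime}$ adds $O.U$ and overrides $B$ with $O.u$ whenever the value of $A$ differs from $f_{O.A}(U_A)$. The mechanisms and probabilities are assumptions chosen for illustration. Exact enumeration confirms that the observational joints coincide while $P(B = 1 \mid do(A = 1))$ does not.

```python
from fractions import Fraction
from itertools import product

# Hypothetical exogenous probabilities: P(U_A = 1), P(U_B = 1), P(O.U = 1).
P_UA, P_UB, P_UNEW = Fraction(1, 2), Fraction(1, 10), Fraction(1, 5)


def f_A(u_a):
    return u_a          # toy mechanism for O.A


def f_B(a, u_b):
    return a ^ u_b      # toy mechanism for O.B


def dist(modified, do_a=None):
    """Exact P(A, B) in M (modified=False) or M' (True), optionally under do(A = do_a)."""
    out = {}
    for u_a, u_b, u_new in product((0, 1), repeat=3):
        p = ((P_UA if u_a else 1 - P_UA) * (P_UB if u_b else 1 - P_UB)
             * (P_UNEW if u_new else 1 - P_UNEW))
        a = f_A(u_a) if do_a is None else do_a
        if not modified or a == f_A(u_a):
            b = f_B(a, u_b)  # f'_B agrees with f_B when A was not overridden
        else:
            b = u_new        # f'_B "detects" the intervention and returns O.u
        out[(a, b)] = out.get((a, b), 0) + p
    return out


def p_b1(d):
    """Marginal probability that B = 1."""
    return sum(p for (a, b), p in d.items() if b == 1)


print(dist(False) == dist(True))                          # observational joints coincide
print(p_b1(dist(False, do_a=1)), p_b1(dist(True, do_a=1)))
```

In this toy, $P^{\mathcal{M}^{\prime}}(B = 1 \mid do(A = 1)) = 11/20$ rather than $P(O.U = 1) = 1/5$: the override only fires on the assignments where $f_{O.A}(U_A) \neq 1$, so the interventional distribution mixes $f_{O.B}$ and $O.u$.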

### D.2 Proofs for Sec. [4](https://arxiv.org/html/2606.14892#S4)

We first show that if an RSCM $\mathcal{M}$ induces a relational causal graph $\mathcal{G}$ by Def. [4.1](https://arxiv.org/html/2606.14892#S4.Thmtheorem1), then for any skeleton $\rho$, the ground RSCM $\mathcal{M}_{\rho}$, viewed as a standard SCM, induces (in the sense of Sec. [2](https://arxiv.org/html/2606.14892#S2)) a causal graph equal to the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$ (Def. [B.7](https://arxiv.org/html/2606.14892#A2.Thmadxdefinition7)).

###### Lemma D\.3\.

Consider a relational schema $\mathcal{S}$, skeleton $\rho$, and RSCM $\mathcal{M}$ inducing the relational causal graph $\mathcal{G}$. Then, the ground RSCM $\mathcal{M}_{\rho}$ induces the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$.

###### Proof\.

Let $\mathcal{M}$, $\mathcal{G}$, $\rho$, and $\bar{\mathcal{G}}_{\rho}$ be as in the statement of the lemma, and let $\mathcal{G}^{\prime}$ be the causal graph induced by $\mathcal{M}_{\rho}$ viewed as a standard SCM, according to Sec. [2](https://arxiv.org/html/2606.14892#S2). By construction, $\mathcal{G}^{\prime}$ and $\bar{\mathcal{G}}_{\rho}$ contain the same nodes, since the relational nodes of $\mathcal{G}_{\rho}$ have been marginalized out to obtain $\bar{\mathcal{G}}_{\rho}$. We show $\mathcal{G}^{\prime}=\bar{\mathcal{G}}_{\rho}$ by showing that every edge in $\bar{\mathcal{G}}_{\rho}$ is also in $\mathcal{G}^{\prime}$ and vice versa.

##### Within\-instance edges\.

Fix a type $O\in\mathcal{E}\cup\mathcal{R}$ and an instance $o\in\rho(O)$. For any variables $o.A,o.B\in\mathbf{V}_{\rho}$ (the endogenous variables of $\mathcal{M}_{\rho}$):

1. Every within-instance edge in $\mathcal{G}^{\prime}$ is also an edge in $\bar{\mathcal{G}}_{\rho}$.

   $$\begin{aligned}
   o.B\in\mathbf{Pa}_{o.A}\text{ in }\mathcal{M}_{\rho}
   &\implies O.B\in\mathbf{Pa}_{O.A}\text{ in }\mathcal{M} && (\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho\text{, Def. B.4})\\
   &\implies O.B\to O.A\in\mathcal{G} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M}\text{, Def. 4.1})\\
   &\implies o.B\to o.A\in\mathcal{G}_{\rho} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G}\text{, Def. B.6})\\
   &\implies o.B\to o.A\in\bar{\mathcal{G}}_{\rho} && (\text{by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho}\text{, Def. B.7})
   \end{aligned}$$

   $$\begin{aligned}
   \exists\,o.U\in\mathbf{U}_{\rho},\ o.U\in\mathbf{U}_{o.A}\cap\mathbf{U}_{o.B}
   &\implies O.U\in\mathbf{U}_{O.A}\cap\mathbf{U}_{O.B}\text{ in }\mathcal{M} && (\text{since }\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)\\
   &\implies O.A\overset{O=O^{\prime}}{\leftrightarrow}O.B\in\mathcal{G} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies o.A\leftrightarrow o.B\in\mathcal{G}_{\rho} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies o.A\leftrightarrow o.B\in\bar{\mathcal{G}}_{\rho} && (\text{by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho})
   \end{aligned}$$
2. Every within-instance edge in $\bar{\mathcal{G}}_{\rho}$ is also an edge in $\mathcal{G}^{\prime}$.

   $$\begin{aligned}
   o.B\to o.A\in\bar{\mathcal{G}}_{\rho}
   &\implies o.B\to o.A\in\mathcal{G}_{\rho} && (\text{within-instance edges preserved by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho})\\
   &\implies O.B\to O.A\in\mathcal{G} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies O.B\in\mathbf{Pa}_{O.A}\text{ in }\mathcal{M} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies o.B\in\mathbf{Pa}_{o.A}\text{ in }\mathcal{M}_{\rho} && (\text{since }\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)
   \end{aligned}$$

   $$\begin{aligned}
   o.B\leftrightarrow o.A\in\bar{\mathcal{G}}_{\rho}
   &\implies o.B\leftrightarrow o.A\in\mathcal{G}_{\rho} && (\text{within-instance edges preserved by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho})\\
   &\implies O.B\overset{O=O^{\prime}}{\leftrightarrow}O.A\in\mathcal{G} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies\exists\,O.U\in\mathbf{U},\ O.U\in\mathbf{U}_{O.A}\cap\mathbf{U}_{O.B}\text{ in }\mathcal{M} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies\exists\,o.U\in\mathbf{U}_{\rho},\ o.U\in\mathbf{U}_{o.A}\cap\mathbf{U}_{o.B}\text{ in }\mathcal{M}_{\rho} && (\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)
   \end{aligned}$$

##### Cross\-instance edges\.

Fix types $O,T\in\mathcal{E}\cup\mathcal{R}$ and distinct instances $o\in\rho(O)$, $t\in\rho(T)$. For any variables $o.A,t.B\in\mathbf{V}_{\rho}$:

1. Every cross-instance edge in $\mathcal{G}^{\prime}$ is also an edge in $\bar{\mathcal{G}}_{\rho}$.

   $$\begin{aligned}
   t.B\in\mathbf{Pa}_{o.A}\text{ in }\mathcal{M}_{\rho}
   &\implies\exists R=(\mathbf{W},\phi,\textrm{AGG})\text{ with }T.B\in\mathbf{W},\ R\in\mathbf{Pa}^{r}_{O.A}\text{ in }\mathcal{M}\text{ such that }\phi(o,t) && (\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho\text{, Def. B.4})\\
   &\implies T.B\overset{\phi,\textrm{AGG}}{\to}O.R\to O.A\in\mathcal{G} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M}\text{, Def. 4.1})\\
   &\implies t.B\overset{\phi,\textrm{AGG}}{\to}o.R\to o.A\in\mathcal{G}_{\rho} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G}\text{, Def. B.6})\\
   &\implies t.B\to o.A\in\bar{\mathcal{G}}_{\rho} && (\text{by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho}\text{, Def. B.7})
   \end{aligned}$$

   $$\begin{aligned}
   \exists\,z.U\in\mathbf{U}_{\rho},\ z.U\in\mathbf{U}_{o.A}\cap\mathbf{U}_{t.B}
   &\implies\exists R_{1}=(\mathbf{W}_{1},\phi_{1},\textrm{AGG}_{1})\in\mathbf{U}^{r}_{O.A}\text{ and }R_{2}=(\mathbf{W}_{2},\phi_{2},\textrm{AGG}_{2})\in\mathbf{U}^{r}_{T.B}\text{ in }\mathcal{M}\\
   &\qquad\text{with }Z.U\in\mathbf{W}_{1}\cap\mathbf{W}_{2}\text{ and }\phi_{1}(o,z)\wedge\phi_{2}(t,z) && (\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)\\
   &\implies O.A\overset{\exists Z:\phi_{1}(O,Z)\wedge\phi_{2}(T,Z)}{\leftrightarrow}T.B\in\mathcal{G} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies o.A\leftrightarrow t.B\in\mathcal{G}_{\rho} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies o.A\leftrightarrow t.B\in\bar{\mathcal{G}}_{\rho} && (\text{by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho}\text{, which preserves bidirected edges})
   \end{aligned}$$
2. Every cross-instance edge in $\bar{\mathcal{G}}_{\rho}$ is also an edge in $\mathcal{G}^{\prime}$.

   $$\begin{aligned}
   t.B\to o.A\in\bar{\mathcal{G}}_{\rho}
   &\implies t.B\to o.R\to o.A\in\mathcal{G}_{\rho}\text{ for some }R=(\mathbf{W},\phi,\textrm{AGG}) && (\text{by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho})\\
   &\implies T.B\overset{\phi,\textrm{AGG}}{\to}O.R\to O.A\in\mathcal{G} && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies R\in\mathbf{Pa}^{r}_{O.A}\text{ in }\mathcal{M} && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies t.B\in\mathbf{Pa}^{r}_{o.A}\text{ in }\mathcal{M}_{\rho} && (\text{since }\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)
   \end{aligned}$$

   $$\begin{aligned}
   t.B\leftrightarrow o.A\in\bar{\mathcal{G}}_{\rho}
   &\implies t.B\leftrightarrow o.A\in\mathcal{G}_{\rho} && (\text{bidirected edges preserved by construction of }\bar{\mathcal{G}}_{\rho}\text{ from }\mathcal{G}_{\rho})\\
   &\implies T.B\overset{\phi}{\leftrightarrow}O.A\in\mathcal{G}\text{ for some }\phi\text{ such that }\phi(o,t) && (\text{by construction of }\mathcal{G}_{\rho}\text{ from }\mathcal{G})\\
   &\implies\exists\,Z.U\in\mathbf{U},\ z\in\rho(Z),\ R_{1}=(\mathbf{W}_{1},\phi_{1},\textrm{AGG}_{1})\in\mathbf{U}^{r}_{O.A}\text{ and }R_{2}=(\mathbf{W}_{2},\phi_{2},\textrm{AGG}_{2})\in\mathbf{U}^{r}_{T.B}\text{ in }\mathcal{M}\\
   &\qquad\text{with }Z.U\in\mathbf{W}_{1}\cap\mathbf{W}_{2}\text{ and }\phi_{1}(o,z)\wedge\phi_{2}(t,z) && (\text{by construction of }\mathcal{G}\text{ from }\mathcal{M})\\
   &\implies\exists\,z.U\in\mathbf{U}_{\rho},\ z.U\in\mathbf{U}_{o.A}\cap\mathbf{U}_{t.B}\text{ in }\mathcal{M}_{\rho} && (\text{since }\mathcal{M}_{\rho}\text{ grounds }\mathcal{M}\text{ on }\rho)
   \end{aligned}$$

∎
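Both directions of the proof rely on the marginalization step in which a chain $t.B\to o.R\to o.A$ in $\mathcal{G}_{\rho}$ corresponds to a direct edge $t.B\to o.A$ in $\bar{\mathcal{G}}_{\rho}$. A minimal sketch of that rewrite, under stated assumptions (nodes are plain strings, directed edges are (tail, head) pairs, relational nodes are listed explicitly, and no bidirected edge touches a relational node, so bidirected edges are copied unchanged, as in the proof), is given below. It is an illustration, not a transcription of Def. B.7.

```python
def marginalize_relational(directed, bidirected, relational):
    """Sketch of G_rho -> bar G_rho: remove each relational node r, rewiring every
    path parent -> r -> child into a direct edge parent -> child."""
    edges = set(directed)
    for r in relational:
        parents = {u for (u, v) in edges if v == r}
        children = {v for (u, v) in edges if u == r}
        edges = {(u, v) for (u, v) in edges if r not in (u, v)}
        edges |= {(p, c) for p in parents for c in children}
    # Assumption: bidirected edges only join ground attributes, so they are kept as-is.
    return edges, set(bidirected)


directed = {("t.B", "o.R"), ("t2.B", "o.R"), ("o.R", "o.A"), ("o.C", "o.A")}
bidirected = {("o.A", "t.B")}
bar_directed, bar_bidirected = marginalize_relational(directed, bidirected, {"o.R"})
print(sorted(bar_directed))
```

Eliminating relational nodes one at a time also handles chains of relational nodes, should a graph contain them; in the chains used in the proof, each relational node has only attribute parents and a single attribute child.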

###### Corollary D\.1\.

Consider a relational schema $\mathcal{S}$, skeleton $\rho$, and RSCM $\mathcal{M}$ inducing the relational causal graph $\mathcal{G}$. Then, the ground RSCM $\mathcal{M}_{\rho}$ induces counterfactual distributions that satisfy all counterfactual equality constraints encoded in $\bar{\mathcal{G}}_{\rho}$.

###### Proof\.

By Lemma [D.3](https://arxiv.org/html/2606.14892#A4.Thmtheorem3), $\mathcal{M}_{\rho}$ induces $\bar{\mathcal{G}}_{\rho}$. Xia et al. ([2023](https://arxiv.org/html/2606.14892#bib.bib131), Lemma 1) show that if a standard SCM $\mathcal{M}$ induces a causal diagram $\mathcal{G}$, then its induced distributions satisfy all counterfactual equality constraints encoded in $\mathcal{G}$. The result follows. ∎

###### Theorem [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3) (Observational identification across skeletons).

Consider a schema𝒮\\mathcal\{S\}, relational causal graph𝒢\\mathcal\{G\}, source skeletonρ\\rho, and target skeletonρ⋆\\rho\_\{\\star\}\. Leto\.Ao\.Abe an unconfounded variable in𝐕ρ⋆\\mathbf\{V\}\_\{\\rho\_\{\\star\}\}\. The conditionalP\(o\.a∣𝐩𝐚o\.A,𝐩𝐚o\.Ar\)P\(o\.a\\mid\\mathbf\{pa\}\_\{o\.A\},\\mathbf\{pa\}^\{r\}\_\{o\.A\}\)is relationally identifiable from𝒢\\mathcal\{G\}andP\(𝐯ρ\)P\(\\mathbf\{v\}\_\{\\rho\}\)if there exists a source instanceo′∈ρo^\{\\prime\}\\in\\rhosuch thato′\.Ao^\{\\prime\}\.Ais unconfounded anddom\(𝐏𝐚o\.Ar\)⊆dom\(𝐏𝐚o′\.Ar\)\\textnormal\{dom\}\(\\mathbf\{Pa\}^\{r\}\_\{o\.A\}\)\\subseteq\\textnormal\{dom\}\(\\mathbf\{Pa\}^\{r\}\_\{o^\{\\prime\}\.A\}\)\. In this case,P\(o\.a∣𝐩𝐚o\.A,𝐩𝐚o\.Ar\)=P\(o′\.a∣𝐩𝐚o′\.A,𝐩𝐚o′\.Ar\)P\(o\.a\\mid\\mathbf\{pa\}\_\{o\.A\},\\mathbf\{pa\}^\{r\}\_\{o\.A\}\)=P\(o^\{\\prime\}\.a\\mid\\mathbf\{pa\}\_\{o^\{\\prime\}\.A\},\\mathbf\{pa\}^\{r\}\_\{o^\{\\prime\}\.A\}\)\.

###### Proof\.

Consider any two RSCMs $\mathcal{M},\mathcal{M}^{\prime}$ compatible with $\mathcal{G}$ such that $P^{\mathcal{M}_{\rho}}(\mathbf{v}_{\rho})=P^{\mathcal{M}^{\prime}_{\rho}}(\mathbf{v}_{\rho})$. We need to show that

$$P^{\mathcal{M}_{\rho_{\star}}}(o.a\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})=P^{\mathcal{M}^{\prime}_{\rho_{\star}}}(o.a\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A}).$$
Consider the $o^{\prime}\in\rho$ given by the assumption. Since $\mathcal{M}$ and $\mathcal{M}^{\prime}$ agree on the observational distribution $P(\mathbf{v}_{\rho})$, we have

$$P^{\mathcal{M}_{\rho}}(o^{\prime}.a\mid\mathbf{pa}_{o^{\prime}.A},\mathbf{pa}^{r}_{o^{\prime}.A})=P^{\mathcal{M}^{\prime}_{\rho}}(o^{\prime}.a\mid\mathbf{pa}_{o^{\prime}.A},\mathbf{pa}^{r}_{o^{\prime}.A}).$$
It therefore suffices to show that, for all values with $o.a=o^{\prime}.a$, $\mathbf{pa}_{o.A}=\mathbf{pa}_{o^{\prime}.A}$, and $\mathbf{pa}^{r}_{o.A}=\mathbf{pa}^{r}_{o^{\prime}.A}$ (with the latter comparison made for values in the support of $\mathbf{pa}^{r}_{o.A}$),

$$P^{\mathcal{M}_{\rho}}(o^{\prime}.a\mid\mathbf{pa}_{o^{\prime}.A},\mathbf{pa}^{r}_{o^{\prime}.A})=P^{\mathcal{M}_{\rho_{\star}}}(o.a\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})\tag{1}$$

and

$$P^{\mathcal{M}^{\prime}_{\rho}}(o^{\prime}.a\mid\mathbf{pa}_{o^{\prime}.A},\mathbf{pa}^{r}_{o^{\prime}.A})=P^{\mathcal{M}^{\prime}_{\rho_{\star}}}(o.a\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A}).\tag{2}$$
Let the structural equation for $o.A$ in $\mathcal{M}_{\rho_{\star}}$ be $o.A\leftarrow f_{O.A}(\mathbf{pa}_{o.A},\ \mathbf{u}_{o.A},\ \mathbf{pa}^{r}_{o.A})$, where $\mathbf{U}_{o.A}$ are the exogenous parents of $o.A$ in $\mathcal{M}_{\rho_{\star}}$. By condition (1) of unconfoundedness, $o.A$ shares no bidirected edge with any variable in $\bar{\mathcal{G}}_{\rho_{\star}}$. By Lemma [D.3](https://arxiv.org/html/2606.14892#A4.Thmtheorem3), this implies that $\mathbf{U}_{o.A}\perp\!\!\!\perp(\mathbf{Pa}_{o.A},\mathbf{Pa}^{r}_{o.A})$. Therefore, for any values $a,\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A}$,

$$\begin{aligned}
P^{\mathcal{M}_{\rho_{\star}}}(o.a\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})
&=\sum_{\mathbf{u}_{o.A}}P(o.a\mid\mathbf{u}_{o.A},\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})\,P(\mathbf{u}_{o.A}\mid\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})\\
&=\sum_{\mathbf{u}_{o.A}}P(o.a\mid\mathbf{u}_{o.A},\mathbf{pa}_{o.A},\mathbf{pa}^{r}_{o.A})\,P(\mathbf{u}_{o.A}) && (\mathbf{U}_{o.A}\perp\!\!\!\perp(\mathbf{Pa}_{o.A},\mathbf{Pa}^{r}_{o.A}))\\
&=\sum_{\mathbf{u}_{o.A}}\mathbf{1}[f_{O.A}(\mathbf{pa}_{o.A},\mathbf{u}_{o.A},\mathbf{pa}^{r}_{o.A})=a]\,P(\mathbf{u}_{o.A})
\end{aligned}$$
Analogously, for $o'.A$ in $\mathcal{M}_{\rho}$, we get

$$P^{\mathcal{M}_{\rho}}(o'.a \mid \mathbf{pa}_{o'.A}, \mathbf{pa}^{r}_{o'.A}) = \sum_{\mathbf{u}_{o'.A}} \mathbf{1}[f_{O.A}(\mathbf{pa}_{o'.A}, \mathbf{u}_{o'.A}, \mathbf{pa}^{r}_{o'.A}) = a]\, P(\mathbf{u}_{o'.A}).$$

Since $\mathcal{M}_{\rho}$ and $\mathcal{M}_{\rho_\star}$ are both groundings of the same RSCM $\mathcal{M}$, $\mathbf{U}_{o'.A}$ and $\mathbf{U}_{o.A}$ have the same domains, and whenever $\mathbf{u}_{o'.A} = \mathbf{u}_{o.A}$, we also have $P(\mathbf{u}_{o'.A}) = P(\mathbf{u}_{o.A})$. Additionally, whenever $\mathbf{pa}_{o.A} = \mathbf{pa}_{o'.A}$ and $\mathbf{pa}^{r}_{o.A} = \mathbf{pa}^{r}_{o'.A}$, because the mechanism $f_{O.A}$ is shared, we also have $\mathbf{1}[f_{O.A}(\mathbf{pa}_{o'.A}, \mathbf{u}_{o'.A}, \mathbf{pa}^{r}_{o'.A}) = a] = \mathbf{1}[f_{O.A}(\mathbf{pa}_{o.A}, \mathbf{u}_{o.A}, \mathbf{pa}^{r}_{o.A}) = a]$ for matching exogenous values. The two sums therefore agree term by term, which proves the equality in Eq. ([1](https://arxiv.org/html/2606.14892#A4.E1)). A similar calculation for the groundings of $\mathcal{M}'$ proves the equality in Eq. ([2](https://arxiv.org/html/2606.14892#A4.E2)). ∎

###### Corollary D.4 (Observational identification across skeletons - Markovian).

Consider a schema $\mathcal{S}$ and a Markovian relational causal graph $\mathcal{G}$. Given source skeletons $\rho_1, \dots, \rho_l$ and a target skeleton $\rho_\star$, $P(\mathbf{v}_{\rho_\star})$ is identifiable from the distributions $\{P(\mathbf{v}_{\rho_k})\}_{k=1}^{l}$ and $\mathcal{G}$, assuming the support condition of Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3) is met for every variable $o.A \in \mathbf{V}_{\rho_\star}$ by some $\rho_k$.

###### Proof\.

A Markovian graph $\mathcal{G}$ contains no bidirected edges. Therefore, condition (1) of Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3) is met for every $o.A \in \mathbf{V}_{\rho_\star}$ by some $o'.A$ in some $\rho_k$. Additionally, since $\mathcal{G}$ is Markovian, for any RSCM $\mathcal{M}$ compatible with $\mathcal{G}$, the ground RSCM $\mathcal{M}_{\rho_\star}$ is also Markovian. For each $o.A \in \mathbf{V}_{\rho_\star}$, $P(o.a \mid \mathbf{pa}_{o.A}, \mathbf{pa}^{r}_{o.A})$ is identifiable from some $P(\mathbf{v}_{\rho_k})$ by Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3). Hence the joint $P(\mathbf{v}_{\rho_\star})$ is identifiable as well, by the Markov factorization (Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6), Thm. 2.4.1) and the compatibility of $P(\mathbf{v}_{\rho_\star})$ with the marginalized ground graph $\bar{\mathcal{G}}_{\rho_\star}$:

$$P(\mathbf{v}_{\rho_\star}) = \prod_{o.a \in \mathbf{v}_{\rho_\star}} P(o.a \mid \mathbf{pa}_{o.A}, \mathbf{pa}^{r}_{o.A}).$$

Above, $\mathbf{Pa}_{o.A}, \mathbf{Pa}^{r}_{o.A} \subseteq \mathbf{V}_{\rho_\star}$ are the graphical parents of $o.A$ in $\bar{\mathcal{G}}_{\rho_\star}$, containing variables within the same instance and from different instances, respectively. ∎
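The factorization can be illustrated on a toy Markovian schema that we assume purely for illustration (it is not the paper's traffic benchmark): one signal with a binary attribute `red` and several cars with a binary attribute `stop`, where each car's only parent is the signal's `red` via a relational edge. The source skeleton has one car and the target skeleton has two, so the shared conditional $P(o.\text{stop} \mid \text{red})$ is estimated once from the source and reused for every car in the target product. Frequency counts stand in for any consistent estimator, and the data are made up.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical source data from skeleton rho_1 (one signal, one car).
# Each record is (signal.red, car.stop), both binary.
source = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 0)] * 35 + [(0, 1)] * 15


def fit_cpds(records):
    """Estimate P(red) and the shared conditional P(stop | red) by counting."""
    n = len(records)
    red_counts = Counter(red for red, _ in records)
    pair_counts = Counter(records)
    p_red = {r: Fraction(red_counts[r], n) for r in (0, 1)}
    p_stop = {
        (s, r): Fraction(pair_counts[(r, s)], red_counts[r])
        for r in (0, 1)
        for s in (0, 1)
    }
    return p_red, p_stop


def target_joint(p_red, p_stop, n_cars):
    """Markov factorization on a target skeleton with n_cars cars and one signal."""
    joint = {}
    for red in (0, 1):
        for stops in product((0, 1), repeat=n_cars):
            prob = p_red[red]
            for s in stops:  # the same conditional is reused for every car instance
                prob *= p_stop[(s, red)]
            joint[(red,) + stops] = prob
    return joint


p_red, p_stop = fit_cpds(source)
joint = target_joint(p_red, p_stop, n_cars=2)
print(joint[(1, 1, 1)])  # P(red=1, car1.stop=1, car2.stop=1) -> 8/25
```

The target skeleton never appears in the data; only the per-type conditional does, which is what the support condition of Thm. 4.3 asks for.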

###### Proposition [4.4](https://arxiv.org/html/2606.14892#S4.Thmtheorem4) (Sufficient condition for same-skeleton relational identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, skeleton $\rho$, and family of interventional distributions $\mathbb{P}$ over $\mathbf{V}_{\rho}$. If $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is identifiable via ctf-calculus from the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$ and $\mathbb{P}$, then it is also relationally identifiable from $\mathcal{G}$ and $\mathbb{P}$.

###### Proof\.

Assume that the given $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is identifiable via ctf-calculus from the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$ and $\mathbb{P}$. By the soundness of ctf-calculus (Correa & Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib19)), this implies that all (standard) SCMs $\mathcal{N}, \mathcal{N}'$ that are consistent with $\bar{\mathcal{G}}_{\rho}$ and agree on $\mathbb{P}$ also agree on $P(\mathbf{y}_* \mid \mathbf{x}_*)$. We need to show that for any two RSCMs $\mathcal{M}, \mathcal{M}'$ over $\mathcal{S}$ consistent with $\mathcal{G}$, if the ground RSCMs $\mathcal{M}_{\rho}$ and $\mathcal{M}'_{\rho}$ agree on $\mathbb{P}$, then they also agree on $P(\mathbf{y}_* \mid \mathbf{x}_*)$. By Lemma [D.3](https://arxiv.org/html/2606.14892#A4.Thmtheorem3), $\mathcal{M}_{\rho}$ and $\mathcal{M}'_{\rho}$ induce the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$. Since $\mathcal{M}_{\rho}$ and $\mathcal{M}'_{\rho}$ are themselves standard SCMs over $\mathbf{V}_{\rho}$, the result follows. ∎

###### Corollary D.2 (Relational backdoor adjustment).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, skeleton $\rho$, and observational distribution $P(\mathbf{v}_{\rho})$. Let $P(\mathbf{y} \mid do(\mathbf{x}))$ be some query with $\mathbf{X}, \mathbf{Y} \subseteq \mathbf{V}_{\rho}$, and let $\mathbf{Z} \subseteq \mathbf{V}_{\rho}$ be such that

1. $\mathbf{Z}$ contains no descendants of $\mathbf{X}$ in the marginalized ground graph $\bar{\mathcal{G}}_{\rho}$, and
2. $\mathbf{Z}$ blocks every path between $\mathbf{X}$ and $\mathbf{Y}$ that contains an arrow into $\mathbf{X}$ in $\bar{\mathcal{G}}_{\rho}$.

Then, $P(\mathbf{y} \mid do(\mathbf{x}))$ is relationally identifiable from $\mathcal{G}$ and $P(\mathbf{v}_{\rho})$ as

$$P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z})\, P(\mathbf{z}).$$
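As a numeric sanity check of the adjustment formula, the sketch below assumes a hypothetical ground model over binary variables in which $Z$ (for instance, an attribute of a related pedestrian) is a parent of both $X$ and $Y$ and not a descendant of $X$, so $\mathbf{Z} = \{Z\}$ meets both conditions. The joint table is made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical observational joint P(z, x, y) over binary variables.
joint = {
    (0, 0, 0): F(24, 100), (0, 0, 1): F(6, 100),
    (0, 1, 0): F(6, 100), (0, 1, 1): F(14, 100),
    (1, 0, 0): F(4, 100), (1, 0, 1): F(6, 100),
    (1, 1, 0): F(8, 100), (1, 1, 1): F(32, 100),
}


def prob(z=None, x=None):
    """Marginal probability of the event {Z = z, X = x}; None means unconstrained."""
    return sum(
        p for (zz, xx, _), p in joint.items()
        if (z is None or zz == z) and (x is None or xx == x)
    )


def backdoor(y, x):
    """P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    return sum(joint[(z, x, y)] / prob(z=z, x=x) * prob(z=z) for z in (0, 1))


print(backdoor(1, 1), backdoor(1, 0))  # 3/4 2/5
```

For comparison, the unadjusted conditional $P(Y=1 \mid X=1)$ in this table is $23/30$, which differs from the adjusted $3/4$ because $Z$ confounds $X$ and $Y$.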

###### Proof\.

This follows from Prop. [4.4](https://arxiv.org/html/2606.14892#S4.Thmtheorem4) and the validity of backdoor adjustment (Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib6)). ∎

###### Proposition [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5) (Necessary condition for within-instance relational identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeletons $\rho_1, \dots, \rho_l$ with available interventional distributions $\mathbb{P}$, and a target skeleton $\rho_\star$. Let $o \in \rho_\star$ be a target instance and consider a counterfactual query $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ with $\mathbf{Y}_\star, \mathbf{X}_\star \subseteq \mathbf{V}_o$, the attributes of $o$.

Let the restriction $\mathbb{P}|_O$ be defined as follows. For each source skeleton $\rho_k$, each distribution $P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) \in \mathbb{P}$, and each object $o' \in \rho_k(O)$, include $P(\mathbf{v}_{\rho_k, o'} \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_{\rho_k, o'}))$ in $\mathbb{P}|_O$, with instance identifiers omitted. Let $\mathcal{G}_o$ be the induced subgraph of the marginalized ground graph $\bar{\mathcal{G}}_{\rho_\star}$ on $\mathbf{V}_o$, with instance identifiers omitted.

If $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ is non-identifiable via ctf-calculus from $\mathbb{P}|_O$ and $\mathcal{G}_o$, then it is relationally non-identifiable from $\mathcal{G}$ and $\mathbb{P}$.
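For an observational entry of $\mathbb{P}$ (empty intervention set), the corresponding entries of $\mathbb{P}|_O$ are simply the per-instance marginals with the instance identifier dropped. The sketch below assumes, hypothetically, that ground variables are named `"<instance>.<attribute>"` and that the skeleton-level distribution is given as an explicit table; the function name `restrict` is ours, not the paper's.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical observational joint over a skeleton with two Car instances.
# Keys are tuples of (ground variable name, value) pairs named "<instance>.<attribute>".
joint = {
    (("car1.stop", 0), ("car2.stop", 0)): Fraction(1, 2),
    (("car1.stop", 0), ("car2.stop", 1)): Fraction(1, 4),
    (("car1.stop", 1), ("car2.stop", 0)): Fraction(1, 8),
    (("car1.stop", 1), ("car2.stop", 1)): Fraction(1, 8),
}


def restrict(joint, instance):
    """Marginal over one instance's attributes, with the instance identifier dropped."""
    out = defaultdict(Fraction)  # Fraction() == 0
    for assignment, p in joint.items():
        key = tuple(sorted(
            (name.split(".", 1)[1], value)
            for name, value in assignment
            if name.split(".", 1)[0] == instance
        ))
        out[key] += p
    return dict(out)


# One entry of P|_O per Car instance.
restricted = [restrict(joint, inst) for inst in ("car1", "car2")]
print(restricted)
```

Each instance contributes its own entry, so $\mathbb{P}|_O$ can contain several distinct distributions over the same attribute set, one per source instance.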

###### Proof\.

Fix an instance $o \in \rho_\star(O)$ and a query $P(\mathbf{y}_* \mid \mathbf{x}_*)$ where $\mathbf{Y}_*, \mathbf{X}_* \subseteq \mathbf{V}_o$, as described.

Assume that $P(\mathbf{y}_* \mid \mathbf{x}_*)$ is not identifiable via ctf-calculus from $\mathcal{G}_o$ and the restriction $\mathbb{P}|_O$. Note that here, and in the remainder of the proof, we ignore instance identifiers in $\mathbb{P}|_O$ and $\mathcal{G}_o$, simply considering within-instance attributes as in the standard SCM setting.

Then, by the completeness of ctf-calculus (Correa & Bareinboim, [2025](https://arxiv.org/html/2606.14892#bib.bib19)), there exist two (standard) SCMs $\mathcal{N}, \mathcal{N}'$ consistent with $\mathcal{G}_o$ that agree on all distributions in $\mathbb{P}|_O$ but disagree on $P(\mathbf{y}_* \mid \mathbf{x}_*)$. Using $\mathcal{N}, \mathcal{N}'$, we will construct two RSCMs $\mathcal{M}, \mathcal{M}'$ that (when grounded) agree on $\mathbb{P}$ but not on $P(\mathbf{y}_* \mid \mathbf{x}_*)$.

Construct $\mathcal{M}$ as follows. Let $\mathcal{M}$ contain the endogenous variables given in $\mathcal{G}$.

1. First, consider attributes of type $O$ in $\mathcal{M}$. For each exogenous variable $U$ in $\mathcal{N}$, let $\mathcal{M}$ contain exogenous variable $O.U$. For attributes $O.A$ belonging to type $O$, let the function determining $O.A$ in $\mathcal{M}$ be the same as that determining $o.A$ in $\mathcal{N}$. In particular, $\mathbf{Pa}^{\mathcal{M}}_{O.A} = \mathbf{Pa}^{\mathcal{N}}_{o.A}$ and $\mathbf{U}^{\mathcal{M}}_{O.A} = \mathbf{U}^{\mathcal{N}}_{o.A}$; $O.A$ has no relational parents (endogenous or exogenous) in $\mathcal{M}$.
2. Next, consider attributes of every type $T \neq O$ in $\mathcal{M}$. For each such attribute $T.B$, let $\mathcal{M}$ contain an exogenous variable $T.U_B \sim \mathcal{B}(0.5)$, and define the function $f_{T.B}: T.B \leftarrow T.U_B$. In other words, for each type $T \neq O$, each attribute $T.B$ is determined by an independent fair coin flip.

Define $\mathcal{M}'$ similarly, but using the functions from $\mathcal{N}'$. Then, for any skeleton $\rho$, the groundings $\mathcal{M}_{\rho}, \mathcal{M}'_{\rho}$ consist of "copies" of $\mathcal{N}$ and $\mathcal{N}'$, respectively, for every instance of type $O$, and of coin flips for all other variables. Importantly, in both $\mathcal{M}$ and $\mathcal{M}'$ there are no relational effects; all instances are independent in any grounding.

We need to show that $\mathcal{M}$ and $\mathcal{M}'$ are consistent with $\mathcal{G}$. Since $\mathcal{M}$ and $\mathcal{M}'$ contain no relational effects, it suffices to show that they are consistent with the non-relational edges in $\mathcal{G}$. Since attributes of types $T \neq O$ are all independent by construction, $\mathcal{M}$ and $\mathcal{M}'$ are consistent with any edges in $\mathcal{G}$ incident to such attributes. For attributes of type $O$, note that for any instances $o', o''$ of type $O$ in any skeleton $\rho$, the induced subgraphs $\mathcal{G}_{o'}$ and $\mathcal{G}_{o''}$ coincide with $\mathcal{G}_o$ once instance identifiers are omitted; since $\mathcal{N}$ and $\mathcal{N}'$ are consistent with $\mathcal{G}_o$, so is every copy of them.

Next, we show that $\mathcal{M}$ and $\mathcal{M}'$ agree on $\mathbb{P}$. Consider some distribution $P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))$ in $\mathbb{P}$. Then,

$$\begin{aligned}
P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))
&= \prod_{T \in \mathcal{E} \cup \mathcal{R}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j})) && (\text{no relational effects in } \mathcal{M} \implies \text{all instances independent in } \mathcal{M}_{\rho_k}) \\
&= \prod_{T \in \mathcal{E} \cup \mathcal{R}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t), do(\mathbf{x}_{k,j} \setminus \mathbf{v}_t)) \\
&= \prod_{T \in \mathcal{E} \cup \mathcal{R}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t)) && (\text{all instances independent in } \mathcal{M}_{\rho_k}) \\
&= \Big(\prod_{T \in (\mathcal{E} \cup \mathcal{R}) \setminus \{O\}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t))\Big) \cdot \prod_{o \in \rho_k(O)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_o \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_o)) \\
&= \Big(\prod_{T \in (\mathcal{E} \cup \mathcal{R}) \setminus \{O\}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t))\Big) \cdot \prod_{o \in \rho_k(O)} P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_o \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_o)) && (\text{variables of all but } O\text{-type objects are coin flips in } \mathcal{M}, \mathcal{M}') \\
&= \Big(\prod_{T \in (\mathcal{E} \cup \mathcal{R}) \setminus \{O\}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t))\Big) \cdot \prod_{o \in \rho_k(O)} P^{\mathcal{N}}(\mathbf{v}_o \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_o)) && (\text{by construction of } \mathcal{M}) \\
&= \Big(\prod_{T \in (\mathcal{E} \cup \mathcal{R}) \setminus \{O\}} \prod_{t \in \rho_k(T)} P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_t \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_t))\Big) \cdot \prod_{o \in \rho_k(O)} P^{\mathcal{N}'}(\mathbf{v}_o \mid do(\mathbf{x}_{k,j} \cap \mathbf{v}_o)) && (\text{since } \mathcal{N}, \mathcal{N}' \text{ agree on } \mathbb{P}|_O) \\
&= P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) && (\text{by a symmetric derivation})
\end{aligned}$$
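Finally, since $\mathcal{N}$ and $\mathcal{N}'$ disagree on $P(\mathbf{y}_* \mid \mathbf{x}_*)$, and the attributes of $o$ in $\mathcal{M}_{\rho_\star}$ and $\mathcal{M}'_{\rho_\star}$ are copies of $\mathcal{N}$ and $\mathcal{N}'$ that are independent of all other variables, $\mathcal{M}_{\rho_\star}$ and $\mathcal{M}'_{\rho_\star}$ also disagree on $P(\mathbf{y}_* \mid \mathbf{x}_*)$.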

∎
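The construction can be checked numerically on a textbook-style pair of SCMs (our own illustration, not taken from the paper). Assume a type $O$ with binary attributes $X \to Y$, $X \sim \mathcal{B}(0.5)$, and $Y = h_U(X)$, where $U$ selects a response function. In $\mathcal{N}$, $U$ is uniform over {constant 0, constant 1}; in $\mathcal{N}'$, it is uniform over {identity, negation}. Both give the same observational joint and $P(y \mid do(x)) = 1/2$, yet $P(Y_{x=1}=1 \mid X=0, Y=0)$ is 0 under $\mathcal{N}$ and 1 under $\mathcal{N}'$. The sketch copies each SCM into every $O$-instance of a skeleton, as in the proof; for brevity it omits the coin-flip attributes of other types and only tests interventions that set every instance's $X$.

```python
from fractions import Fraction
from itertools import product

# Response functions for O.Y given O.X (hypothetical illustration).
RESPONSES = {
    "zero": lambda x: 0,
    "one": lambda x: 1,
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
}
P_U_N = {"zero": Fraction(1, 2), "one": Fraction(1, 2)}                 # SCM N
P_U_N_PRIME = {"identity": Fraction(1, 2), "negation": Fraction(1, 2)}  # SCM N'


def instance_worlds(p_u):
    """(probability, x, u) for one O-instance: X ~ Bern(1/2) independent of U ~ p_u."""
    for x in (0, 1):
        for u, pu in p_u.items():
            yield Fraction(1, 2) * pu, x, u


def ground_joint(p_u, n_instances, do_x=None):
    """Joint of (x_1, y_1, ..., x_n, y_n) in a grounding made of independent copies.

    If do_x is given, every instance's X is set to do_x.
    """
    joint = {}
    for worlds in product(list(instance_worlds(p_u)), repeat=n_instances):
        prob, values = Fraction(1), []
        for p, x, u in worlds:
            x_val = x if do_x is None else do_x
            values += [x_val, RESPONSES[u](x_val)]
            prob *= p
        key = tuple(values)
        joint[key] = joint.get(key, Fraction(0)) + prob
    return joint


def counterfactual(p_u):
    """P(Y_{x=1} = 1 | X = 0, Y = 0) for a single O-instance."""
    num = den = Fraction(0)
    for p, x, u in instance_worlds(p_u):
        if x == 0 and RESPONSES[u](0) == 0:
            den += p
            if RESPONSES[u](1) == 1:
                num += p
    return num / den


# The groundings agree on the observational and interventional distributions ...
for do_x in (None, 0, 1):
    assert ground_joint(P_U_N, 2, do_x) == ground_joint(P_U_N_PRIME, 2, do_x)
# ... but disagree on the counterfactual for the target instance.
print(counterfactual(P_U_N), counterfactual(P_U_N_PRIME))  # 0 1
```

Because the copies are independent, agreement on every per-instance distribution lifts to agreement on the skeleton-level product, while the within-instance disagreement survives grounding unchanged.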

### D.3 Proofs for Sec. [5](https://arxiv.org/html/2606.14892#S5)

The proofs in this section resemble those for the expressivity and correctness of NCM training in (Xia et al., [2023](https://arxiv.org/html/2606.14892#bib.bib131)), adapted to enforce parameter sharing for variables of the same type and permutation invariance for relational parents.
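The two constraints named above can be sketched as follows, assuming (as our illustration, not the paper's exact architecture) that the mechanism for an attribute type is one parameter set shared by all instances of that type, and that relational parents enter only through a feature-wise sum over their multiset, in the style of Deep Sets. A linear threshold unit with random weights stands in for an MLP; the class name `SharedMechanism` and the pedestrian features are hypothetical.

```python
import random


class SharedMechanism:
    """A single mechanism for an attribute type O.A, reused by every instance of O."""

    def __init__(self, n_own, n_rel, seed=0):
        rng = random.Random(seed)
        self.w_own = [rng.uniform(-1, 1) for _ in range(n_own)]
        self.w_rel = [rng.uniform(-1, 1) for _ in range(n_rel)]
        self.bias = rng.uniform(-1, 1)

    def __call__(self, own_parents, relational_parents, u):
        # Sum-pool the relational-parent multiset feature-wise; the sum does not
        # depend on the order of the multiset (up to floating-point rounding).
        pooled = [0.0] * len(self.w_rel)
        for features in relational_parents:
            pooled = [p + f for p, f in zip(pooled, features)]
        score = self.bias + u
        score += sum(w * v for w, v in zip(self.w_own, own_parents))
        score += sum(w * v for w, v in zip(self.w_rel, pooled))
        return 1 if score > 0 else 0


# One mechanism for Car.stop: one within-instance parent, 2-dim relational features.
f_stop = SharedMechanism(n_own=1, n_rel=2)
pedestrians = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for car_speed in (0.2, 0.9):  # two car instances evaluated with the same parameters
    a = f_stop([car_speed], pedestrians, u=0.1)
    b = f_stop([car_speed], pedestrians[::-1], u=0.1)
    assert a == b
```

Sum pooling also handles multisets of any size up to the bound $D$ with a fixed number of parameters, which is what lets one mechanism serve every skeleton.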

##### Assumptions\.

We assume that, in any domain of interest, the true RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ satisfies the following.

1. (I) $\mathcal{M}$ is $\rho$-Markovian.
2. (II) All variables in $\mathbf{V}$ have discrete, finite domains.
3. (III) There is a non-negative integer $D$ such that, for any skeleton $\rho$, every variable $o.A$ in the grounding $\mathcal{M}_{\rho}$ has relational-parent multisets of size at most $D$.

#### D.3.1 Discrete RSCMs

First, we introduce *discrete RSCMs*, following (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138)).

###### Definition D.1 (Discrete RSCM).

An RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ is said to be a discrete RSCM if all variables in $\mathbf{V}$ have discrete, finite domains and all variables in $\mathbf{U}$ are discrete.

We will show that the space of discrete RSCMs is as expressive as the space of all RSCMs satisfying assumptions (I)-(III). This licenses the assumption, without loss of generality, that the exogenous variables have discrete and finite domains.

We adapt certain definitions from (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138)).

###### Definition D.5 (Equivalence classes (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138), Def. A.1)).

For an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ and every $O.A \in \mathbf{V}$, let the functions in $\textnormal{dom}_{\mathbf{Pa}_{O.A}} \times \textnormal{dom}_{\mathbf{Pa}^{r}_{O.A}} \mapsto \textnormal{dom}(O.A)$ be ordered as $\{h^{(i)}_{O.A} \mid i \in I_{O.A}\}$, where $I_{O.A} = \{1, \ldots, m_{O.A}\}$ and $m_{O.A} = \lvert \textnormal{dom}_{\mathbf{Pa}_{O.A}} \times \textnormal{dom}_{\mathbf{Pa}^{r}_{O.A}} \mapsto \textnormal{dom}(O.A) \rvert$. (Here, $\textnormal{dom}_{\mathbf{Pa}^{r}_{O.A}}$ contains multisets of size at most $D$ for each relational parent, per assumption (III).) An *equivalence class* $\mathcal{U}^{(i)}_{O.A}$ for function $h^{(i)}_{O.A}$, $i = 1, \ldots, m_{O.A}$, is a subset of $\textnormal{dom}(\mathbf{U}_{O.A})$ such that

$$\mathcal{U}^{(i)}_{O.A} = \bigl\{\, \mathbf{u}_{O.A} \in \textnormal{dom}(\mathbf{U}_{O.A}) \;\big|\; f_{O.A}(\cdot, \mathbf{u}_{O.A}) = h^{(i)}_{O.A} \,\bigr\}. \tag{26}$$

###### Definition D.6 (Canonical Partition (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138), Def. A.2)).

For an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$, $\{\mathcal{U}^{(i)}_{O.A} \mid i \in I_{O.A}\}$ is the *canonical partition* over the exogenous domain $\textnormal{dom}(\mathbf{U}_{O.A})$ for every $O.A \in \mathbf{V}$.

###### Corollary D\.3\.

For any RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ and each $O.A \in \mathbf{V}$, the function $f_{O.A} \in \mathcal{F}$ can be decomposed as:

$$f_{O.A}(\mathbf{pa}_{O.A}, \mathbf{pa}^{r}_{O.A}, \mathbf{u}_{O.A}) = \sum_{i \in I_{O.A}} h^{(i)}_{O.A}(\mathbf{pa}_{O.A}, \mathbf{pa}^{r}_{O.A})\, \mathbf{1}_{\mathbf{u}_{O.A} \in \mathcal{U}^{(i)}_{O.A}}. \tag{27}$$

###### Proof\.

This is an immediate consequence of Lemma A.3 in (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138)), considering the union of relational and non-relational parents. ∎
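To see Defs. D.5 and D.6 and Cor. D.3 on a toy case, the sketch below assumes a binary attribute $O.A$ with a single binary parent (relational and non-relational parents merged into one tuple, as in the proof above), an exogenous variable with a small finite domain, and a made-up mechanism `f`. It groups exogenous values by the function $\mathbf{pa} \mapsto f(\mathbf{pa}, \mathbf{u})$ they induce, which gives the equivalence classes of Eq. (26), and then checks the decomposition of Eq. (27).

```python
from itertools import product

PA_DOMAIN = [0, 1]   # domain of the (merged) parents of O.A; here one binary parent
U_DOMAIN = range(6)  # hypothetical finite exogenous domain


def f(pa, u):
    """Hypothetical binary mechanism f_{O.A}(pa, u)."""
    return (pa + u) % 2 if u < 4 else u % 2


# Every function dom(Pa) -> {0, 1}, encoded as the tuple (h(0), h(1)); these play the role of h^(i).
functions = list(product([0, 1], repeat=len(PA_DOMAIN)))

# Eq. (26): U^(i) collects the exogenous values whose induced function f(., u) equals h^(i).
classes = {
    h: [u for u in U_DOMAIN if tuple(f(pa, u) for pa in PA_DOMAIN) == h]
    for h in functions
}


def decomposed(pa, u):
    """Eq. (27): sum over i of h^(i)(pa) * 1[u in U^(i)]."""
    return sum(h[PA_DOMAIN.index(pa)] * (u in members) for h, members in classes.items())


assert all(decomposed(pa, u) == f(pa, u) for pa in PA_DOMAIN for u in U_DOMAIN)
print(classes)  # {(0, 0): [4], (0, 1): [0, 2], (1, 0): [1, 3], (1, 1): [5]}
```

The six exogenous values collapse into four classes, so only the class probabilities, not the finer structure of $\mathbf{U}_{O.A}$, affect any distribution the model induces.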

###### Corollary D\.4\.

Consider an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$. For any relational skeleton $\rho$, let the ground RSCM be $\mathcal{M}_{\rho} = \langle \mathbf{V}_{\rho}, \mathbf{U}_{\rho}, \mathcal{F}_{\rho}, P(\mathbf{U})_{\rho} \rangle$. Let the indexing set be $\mathbf{I} = \bigtimes_{O.A \in \mathbf{V}} \bigtimes_{o \in \rho(O)} I_{O.A}$, i.e., a product of indexing sets over the different types of variables and, within each type, over its instances. Then, for any ground variables $\mathbf{Y}, \dots, \mathbf{Z}, \mathbf{X}, \dots, \mathbf{W} \subseteq \mathbf{V}_{\rho}$,

$$P^{\mathcal{M}_{\rho}}(\mathbf{y}_{\mathbf{x}}, \dots, \mathbf{z}_{\mathbf{w}}) = \sum_{\mathbf{i} \in \mathbf{I}} \mathbf{1}_{\mathbf{Y}_{\mathbf{x}}(\mathbf{i}) = \mathbf{y}, \dots, \mathbf{Z}_{\mathbf{w}}(\mathbf{i}) = \mathbf{z}} \cdot P\Big(\bigcap_{O.A \in \mathbf{V},\, o \in \rho(O)} \mathcal{U}^{(i)}_{O.A}\Big),$$

where, inside the intersection, $i$ is the index for $o.A$ in $\mathbf{i}$; $\mathbf{Y}_{\mathbf{x}}(\mathbf{i}) = \{o.B_{\mathbf{x}}(\mathbf{i}) \mid o.B \in \mathbf{Y}\}$; and each $o.B_{\mathbf{x}}(\mathbf{i})$ is recursively computed as

$$o.B_{\mathbf{x}}(\mathbf{i}) = \begin{cases} \mathbf{x}_{o.B} & \text{if } o.B \in \mathbf{X} \\ h^{(i)}_{O.B}\big((\mathbf{Pa}_{o.B})_{\mathbf{x}}(\mathbf{i})\big) & \text{otherwise} \end{cases}$$

###### Proof\.

This follows from (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138), Lemma A.4), noting that each instance $o.A$ is determined by the same function $f_{O.A}$ of its parents in $\mathcal{M}_{\rho}$. ∎

###### Corollary D\.5\.

Consider an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ inducing relational causal graph $\mathcal{G}$. For any relational skeleton $\rho$, let the indexing set $\mathbf{I}$ be as in Cor. [D.4](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary4). For a given type $O \in \mathcal{E} \cup \mathcal{R}$, let $\mathcal{C}(\mathcal{G})_O = \{\mathbf{C} \subseteq \mathbf{V} \mid \mathbf{C} \text{ is a bidirected connected component in } \mathcal{G} \text{ containing some } O.A\}$. Then, for the ground RSCM $\mathcal{M}_{\rho}$,

$$P\Big(\bigcap_{O.A \in \mathbf{V},\, o \in \rho(O)} \mathcal{U}^{(i)}_{O.A}\Big) = \prod_{O \in \mathcal{E} \cup \mathcal{R}} \prod_{o \in \rho(O)} \prod_{\mathbf{C} \in \mathcal{C}(\mathcal{G})_O} P\Big(\bigcap_{O.A \in \mathbf{C}} \mathcal{U}^{(i)}_{O.A}\Big).$$

###### Proof\.

This follows by an argument similar to that of (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138), Lemma A.5), noting that, since $\mathcal{M}$ is $\rho$-Markovian, exogenous variables are independent across different instances $o$ and identically distributed across instances $o$ of the same type $O$. ∎
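Cors. D.4 and D.5 can be exercised on a toy model that we assume for illustration: one type $O$ with binary attributes $X \to Y$ and no bidirected edges, so each canonical index picks a constant for $O.X$ and a response function $(h(0), h(1))$ for $O.Y$, with class probabilities shared by all instances. The probabilities are made up. The sketch evaluates $P(o_1.Y_{x=1}=1,\ o_2.X=0)$ on a skeleton with two $O$-instances by summing over canonical index tuples with the product weights of Cor. D.5.

```python
from fractions import Fraction
from itertools import product

# Canonical classes for O.X (no parents): the constant value the class induces.
P_X = {0: Fraction(3, 10), 1: Fraction(7, 10)}
# Canonical classes for O.Y (parent O.X): response tuples (h(0), h(1)).
P_Y = {(0, 0): Fraction(1, 10), (0, 1): Fraction(4, 10),
       (1, 0): Fraction(2, 10), (1, 1): Fraction(3, 10)}


def event_holds(index):
    """Indicator of o1.Y_{x=1} = 1 and o2.X = 0 for index tuple (iX1, iY1, iX2, iY2)."""
    _, i_y1, i_x2, _ = index
    y1_under_do = i_y1[1]  # o1.Y with o1.X set to 1 reads h(1) of its class
    return y1_under_do == 1 and i_x2 == 0


total = Fraction(0)
for index in product(P_X, P_Y, P_X, P_Y):  # canonical index tuples for two O-instances
    # No bidirected edges in this toy: classes are independent across variables and
    # instances, and identically distributed across instances of the same type.
    prob = P_X[index[0]] * P_Y[index[1]] * P_X[index[2]] * P_Y[index[3]]
    if event_holds(index):
        total += prob

print(total)  # 21/100
```

The result equals $P(o_1.Y_{x=1}=1) \cdot P(o_2.X=0) = \tfrac{7}{10} \cdot \tfrac{3}{10}$, as expected when instances share class probabilities but draw their classes independently.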

###### Corollary D.6 (Discrete RSCM expressiveness).

Consider an RSCM $\mathcal{M} = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}, \mathcal{F}, P(\mathbf{U}) \rangle$ satisfying assumptions (I)-(III) and inducing relational causal graph $\mathcal{G}$. Then, there exists a discrete RSCM $\mathcal{M}'$ consistent with $\mathcal{G}$ such that, for any skeleton $\rho$, the ground RSCMs $\mathcal{M}_{\rho}$ and $\mathcal{M}'_{\rho}$ agree on all counterfactual distributions over $\mathbf{V}_{\rho}$.

###### Proof\.

This follows from (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138), Lemma A.6), Cor. [D.5](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary5), and Cor. [D.4](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary4).

∎

#### D.3.2 Neural RSCMs

###### Theorem [5.2](https://arxiv.org/html/2606.14892#S5.Thmtheorem2) (Expressivity of RNCMs).

Consider a relational schema $\mathcal{S}$. For every RSCM $\mathcal{M}$ over $\mathcal{S}$ inducing relational causal graph $\mathcal{G}$, there exists a $\mathcal{G}$-RNCM $\mathcal{N}$ such that, for every skeleton $\rho$, the ground RSCMs $\mathcal{M}_{\rho}$ and $\mathcal{N}_{\rho}$ induce the same counterfactual distributions over $\mathbf{V}_{\rho}$.

###### Proof\.

Fix an RSCM $\mathcal{M}^*$ over $\mathcal{S}$ inducing relational causal graph $\mathcal{G}$. By the discrete RSCM expressiveness corollary above (Cor. D.6), there exists a discrete RSCM $\mathcal{M}' = \langle \mathcal{S}, \mathbf{V}, \mathbf{U}', \mathcal{F}', P(\mathbf{U}') \rangle$ consistent with $\mathcal{G}$ such that, for every skeleton $\rho$, the ground RSCMs $\mathcal{M}^*_{\rho}$ and $\mathcal{M}'_{\rho}$ agree on all counterfactual distributions over $\mathbf{V}_{\rho}$. Thus, it suffices to construct a $\mathcal{G}$-RNCM $\mathcal{N}$ such that $\mathcal{N}_{\rho}$ agrees with $\mathcal{M}'_{\rho}$ on all counterfactuals, for every $\rho$.

We follow the proof strategy of (Xia et al., [2023](https://arxiv.org/html/2606.14892#bib.bib131)): (i) represent the discrete exogenous distribution using uniform $\mathcal{U}([0,1])$ noise via a neural inverse probability integral transform (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130), Lemma 5), and (ii) represent each mechanism $f_{O.A}$ using a multi-layer perceptron (MLP).

Since $\mathcal{M}'$ is $\rho$-Markovian (Assumption (I)), bidirected edges occur only among attributes of the same instance $o$. Fix a type $O \in \mathcal{E} \cup \mathcal{R}$, and let $\mathcal{C}(\mathcal{G})_O = \{\mathbf{C}_1, \dots, \mathbf{C}_k\}$ be the set of bidirected maximal cliques among $\{O.A : O.A \in \mathbf{V}\}$ in $\mathcal{G}$. We construct $\mathcal{N}$ as in Def. [5.1](https://arxiv.org/html/2606.14892#S5.Thmtheorem1) by introducing, for each $\mathbf{C} \in \mathcal{C}(\mathcal{G})_O$, an exogenous variable $\hat{U}_{\mathbf{C}} \sim \mathcal{U}([0,1])$. For each clique $\mathbf{C}$, let $\mathbf{U}'_{\mathbf{C}}$ denote the tuple of discrete exogenous variables in $\mathcal{M}'$ that are the shared exogenous parents of the variables in $\mathbf{C}$ (within an instance of type $O$). Since $\mathbf{U}'_{\mathbf{C}}$ has a finite domain, Lemma 5 of (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130)) provides an MLP $g_{\mathbf{C}}$ such that $g_{\mathbf{C}}(\hat{U}_{\mathbf{C}})$ has the same distribution as $\mathbf{U}'_{\mathbf{C}}$.
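Step (i) relies on pushing uniform noise through a map whose output has a prescribed discrete law. The sketch below uses the exact inverse CDF (a step function) instead of the MLP approximation guaranteed by Lemma 5 of Xia et al. (2021); `inverse_cdf_sampler` is a hypothetical helper name, and the target distribution for $\mathbf{U}'_{\mathbf{C}}$ is made up.

```python
import bisect
import random
from fractions import Fraction
from itertools import accumulate


def inverse_cdf_sampler(dist):
    """Return g such that g(U) ~ dist when U ~ Uniform[0, 1].

    dist maps each value of a finite domain to its probability.
    """
    values = list(dist)
    cdf = list(accumulate(dist[v] for v in values))

    def g(u):
        # First value whose cumulative probability exceeds u (clamped at the end for u = 1).
        return values[min(bisect.bisect_right(cdf, u), len(values) - 1)]

    return g


# Hypothetical law of a discrete exogenous tuple U'_C shared by a clique C.
dist = {("a", 0): Fraction(1, 5), ("a", 1): Fraction(3, 10), ("b", 0): Fraction(1, 2)}
g_C = inverse_cdf_sampler(dist)

rng = random.Random(0)
n = 100_000
counts = {v: 0 for v in dist}
for _ in range(n):
    counts[g_C(rng.random())] += 1
print({v: c / n for v, c in counts.items()})  # empirical frequencies, close to dist
```

An MLP can approximate this step function arbitrarily well away from the finitely many jump points, which is why a finite exogenous domain is enough for the construction.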

Next, consider some variable $O.A \in \mathbf{V}$. Under Assumptions (II) and (III), the input space $\textnormal{dom}(\mathbf{Pa}_{O.A}) \times \textnormal{dom}(\mathbf{Pa}^{r}_{O.A}) \times \textnormal{dom}(\mathbf{U}'_{O.A})$ is finite (the relational-parent domain is finite because each multiset has size at most $D$). Hence, by Lemma 4 of (Xia et al., [2021](https://arxiv.org/html/2606.14892#bib.bib130)), there exists an MLP $h_{O.A}$ that agrees with $f'_{O.A}$ on every possible input.

Composing the MLP $h_{O.A}$ with the MLPs $g_{\mathbf{C}}$ for every clique $\mathbf{C}$ such that $O.A \in \mathbf{C}$ yields a function $\hat{f}_{O.A}: \textnormal{dom}(\mathbf{Pa}_{O.A}) \times \textnormal{dom}(\mathbf{Pa}^{r}_{O.A}) \times \textnormal{dom}(\hat{\mathbf{U}}_{O.A}) \to \textnormal{dom}(O.A)$.

For any skeleton $\rho$, the ground RSCM $\mathcal{N}_{\rho}$ shares the same function $\hat{f}_{O.A}$ across all instances $o \in \rho(O)$; likewise, $\mathcal{M}'_{\rho}$ shares $f'_{O.A}$ across these instances. Therefore, for every ground variable $o.A \in \mathbf{V}_{\rho}$, $\mathcal{M}'_{\rho}$ and $\mathcal{N}_{\rho}$ induce the same distribution over exogenous parents and the same mechanism. It follows that $\mathcal{N}_{\rho}$ agrees with $\mathcal{M}'_{\rho}$, and hence with $\mathcal{M}^*_{\rho}$, on all counterfactual distributions. ∎

###### Definition D.2 (Data-dependent relational counterfactual identification).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeletons $\rho_1, \dots, \rho_l$, source distributions $\mathbb{P} = \{\{P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) > 0\}_{j=1}^{m_k}\}_{k=1}^{l}$, and target skeleton $\rho_\star$. Let $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ be a target query with $\mathbf{Y}_\star, \mathbf{X}_\star \subseteq \mathbf{V}_{\rho_\star}$.

We say $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ is *relationally identifiable* from $\mathcal{G}$ and $\mathbb{P}$ if, for any RSCMs $\mathcal{M}, \mathcal{M}'$ consistent with $\mathcal{G}$ that agree on the source data, i.e., for every $\rho_k$ and $j = 1, \dots, m_k$,

$$P^{\mathcal{M}_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P^{\mathcal{M}'_{\rho_k}}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})),$$

they also agree on the query:

$$P^{\mathcal{M}_{\rho_\star}}(\mathbf{y}_\star \mid \mathbf{x}_\star) = P^{\mathcal{M}'_{\rho_\star}}(\mathbf{y}_\star \mid \mathbf{x}_\star).$$

Otherwise, the query is *relationally non-identifiable dependent on the data*.

###### Corollary D.7 (Correctness of $\mathsf{RelationalNeuralID}$ (Alg. [1](https://arxiv.org/html/2606.14892#alg1))).

Consider a schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source skeletons $\rho_1, \dots, \rho_l$, source distributions $\mathbb{P} = \{\{P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) > 0\}_{j=1}^{m_k}\}_{k=1}^{l}$, and target skeleton $\rho_\star$. Let $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ be a target query with $\mathbf{Y}_\star, \mathbf{X}_\star \subseteq \mathbf{V}_{\rho_\star}$. Let $Q$ be the output of $\mathsf{RelationalNeuralID}$ (Alg. [1](https://arxiv.org/html/2606.14892#alg1)) given these inputs. Then, $Q$ is not FAIL if and only if the query $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ is relationally identifiable from $\mathcal{G}$ and $\mathbb{P}$ in the data-dependent sense of Def. [D.2](https://arxiv.org/html/2606.14892#A4.Thmadxdefinition2), in which case $Q = P(\mathbf{y}_\star \mid \mathbf{x}_\star)$.

###### Proof\.

Fix a relational causal diagram $\mathcal{G}$. By Thm. [5.2](https://arxiv.org/html/2606.14892#Thminnercustomthm4), for every RSCM $\mathcal{M}$ consistent with $\mathcal{G}$, there exists a $\mathcal{G}$-RNCM agreeing with $\mathcal{M}$ on counterfactual distributions for every possible skeleton $\rho$. Additionally, every $\mathcal{G}$-RNCM is consistent with $\mathcal{G}$ by construction. Therefore, minimizing / maximizing the query $Q$ in the space of RSCMs consistent with $\mathcal{G}$ and inducing the given $\mathbb{P}$ is equivalent to minimizing / maximizing $Q$ in the space of $\mathcal{G}$-RNCMs inducing the given $\mathbb{P}$. The correctness of Alg. [1](https://arxiv.org/html/2606.14892#alg1) thus follows from the definition of data-dependent relational identification (Def. [D.2](https://arxiv.org/html/2606.14892#A4.Thmadxdefinition2)).

∎

### D.4 Additional Algorithms

##### Estimating identifiable queries\.

Alg. [2](https://arxiv.org/html/2606.14892#alg2) provides an algorithm for point-estimation of a given query from graph and data. It modifies Alg. [1](https://arxiv.org/html/2606.14892#alg1) by simply fitting a single $\mathcal{G}$-RNCM to the available data and outputting the value of the query under that RNCM. Since Alg. [2](https://arxiv.org/html/2606.14892#alg2) does not minimize/maximize the query, it is correct only under the assumption that the given query is relationally identifiable from the given graph and distributions.

**Algorithm 2** $\mathsf{RelationalNeuralEstimation}$

**Input:** schema $\mathcal{S}$, relational causal graph $\mathcal{G}$, source data $\mathcal{D} = \big\{\big(\rho_k, \{P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))\}_{j=1}^{m_k}\big)\big\}_{k=1}^{l}$, target skeleton $\rho_\star$, query $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$

1. $\hat{M} \leftarrow \mathcal{G}\text{-RNCM}$
2. $\hat{\theta} \leftarrow \theta \in \Theta(\hat{M})$ subject to $\forall k, j:\ P^{\hat{M}_{\rho_k}(\theta)}(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j})) = P(\mathbf{v}_{\rho_k} \mid do(\mathbf{x}_{k,j}))$
3. $q \leftarrow P^{\hat{M}_{\rho_\star}(\hat{\theta})}(\mathbf{y}_\star \mid \mathbf{x}_\star)$
4. **return** $q$
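To make the point-estimation idea concrete, here is a minimal sketch that swaps the neural mechanisms for count-based conditional probability tables (an assumption: this is not an RNCM). It assumes a toy schema with a signal attribute `W` and a car attribute `B` whose only relational parent is the number of active controlling signals. Each shared mechanism is fitted once by pooling all instances of its type across the source data, and the fitted tables are then reused to evaluate a query on a target car. The helper names and data layout are hypothetical.

```python
from collections import defaultdict
from math import comb

# Hypothetical toy schema: Sig.W has no parents; Car.B depends only on the count
# of active signals controlling the car. Each sample is
# (signal values, car values, set of (signal, car) pairs for which Ctrl holds).

def fit_shared_mechanisms(samples):
    """Pool every signal and every car instance to fit one table per mechanism."""
    w_ones, w_total = 0, 0
    b_ones, b_total = defaultdict(int), defaultdict(int)
    for signals, cars, ctrl in samples:
        w_ones += sum(signals.values())
        w_total += len(signals)
        for car, b in cars.items():
            k = sum(w for s, w in signals.items() if (s, car) in ctrl)  # count aggregate
            b_ones[k] += b
            b_total[k] += 1
    p_w = w_ones / w_total
    p_b = {k: b_ones[k] / b_total[k] for k in b_total}
    return p_w, p_b

def query_brake(p_w, p_b, n_ctrl):
    """Evaluate P(c.B = 1) for a target car controlled by n_ctrl i.i.d. signals."""
    total = 0.0
    for k in range(n_ctrl + 1):
        if k not in p_b:
            return None  # this count never occurs in the source data
        total += comb(n_ctrl, k) * p_w**k * (1 - p_w) ** (n_ctrl - k) * p_b[k]
    return total

samples = [
    ({"s1": 1}, {"c1": 1, "c2": 0}, {("s1", "c1")}),
    ({"s1": 0}, {"c1": 0, "c2": 1}, {("s1", "c1")}),
]
p_w, p_b = fit_shared_mechanisms(samples)
print(query_brake(p_w, p_b, 1))  # a target car with one controlling signal
print(query_brake(p_w, p_b, 2))  # None: no source car had two active signals
```

A table has nothing to extrapolate with, so the sketch returns `None` for an unseen count; a fitted RNCM would instead return whatever its network outputs, and that value is fixed by the data only when the query is relationally identifiable, which is why Alg. 2 assumes identifiability.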

## Appendix E Further Experiments and Experimental Details

### E.1 Compute and Implementation

All implementations are in Python, adapted from the Neural Causal Models codebase (Xia et al., [2023](https://arxiv.org/html/2606.14892#bib.bib131)). Neural Causal Models are built using PyTorch (Paszke et al., [2019](https://arxiv.org/html/2606.14892#bib.bib85)) and trained using PyTorch Lightning (Falcon & Cho, [2020](https://arxiv.org/html/2606.14892#bib.bib32)). We trained our models on a single NVIDIA H100 GPU on a shared compute cluster with 2x Intel Xeon Platinum 8480+ CPUs (112 cores total, 224 threads) at up to 3.8 GHz, and 210 MiB L3 cache.

### E.2 Data Generation

We generated all synthetic data using *regional canonical models* (Xia et al., [2023](https://arxiv.org/html/2606.14892#bib.bib131), Def. 11), adapted to share functions across variables of the same type. Concretely, given a $\rho$-Markovian relational causal graph $\mathcal{G}$, we introduce one exogenous variable $U_{\mathbf{C}} \sim \mathcal{U}([0,1])$ for each maximal bidirected clique $\mathbf{C}$ in $\mathcal{G}$, to represent the within-instance latent confounding encoded by $\mathcal{G}$. For each endogenous variable $O.A$, we then sample a random deterministic function

$$f_{O.A} : \mathrm{dom}(\mathbf{Pa}_{O.A}) \times \mathrm{dom}(\mathbf{Pa}^{r}_{O.A}) \times \mathrm{dom}(\mathbf{u}_{O.A}) \to \mathrm{dom}(O.A)$$

according to the canonical construction of Xia et al. ([2023](https://arxiv.org/html/2606.14892#bib.bib131)), where $\mathbf{u}_{O.A} = \{U_{\mathbf{C}} : O.A \in \mathbf{C}\}$.

For each relational parent $(\mathbf{W}, \phi, \mathrm{AGG}) \in \mathbf{Pa}^{r}_{O.A}$, we set its domain to be the set of histograms over $\mathrm{dom}(\mathbf{W})$ induced by multisets of size at most $D$; in our experiments, $D = 5$, which upper-bounds the relational-parent multiset sizes in Fig. [1](https://arxiv.org/html/2606.14892#S1.F1). Since we assume discrete endogenous domains, the histogram of counts is a sufficient statistic for a multiset of $\mathbf{w}$-values.
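As an illustration of this encoding, the sketch below maps a multiset of parent values to its histogram and enumerates the resulting finite domain for a binary attribute with $D = 5$. It assumes a single discrete attribute per relational parent; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def to_histogram(values, domain, max_size=5):
    """Encode a multiset of discrete parent values as a tuple of per-value counts."""
    if len(values) > max_size:
        raise ValueError("multiset exceeds the size bound D")
    counts = Counter(values)
    return tuple(counts[v] for v in domain)

def histogram_domain(domain, max_size=5):
    """All count vectors over `domain` whose total is at most max_size."""
    return [h for h in product(range(max_size + 1), repeat=len(domain)) if sum(h) <= max_size]

# Only how many related objects take each value matters, not their order.
print(to_histogram([1, 0, 1], domain=(0, 1)))  # (1, 2)
print(len(histogram_domain((0, 1))))           # 21
```

Fixing the bound $D$ keeps the domain finite, so each relational parent can be treated like an ordinary discrete parent when drawing $f_{O.A}$.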

For each trial, we sample a random regional canonical model as above. Given a skeleton $\rho$, to generate data from $P(\mathbf{v}_\rho)$, we instantiate variables $o.A$ and $o.U_{\mathbf{C}}$ for every instance $o$, and reuse the same mechanism for every variable of the same type. Due to the expressivity of canonical models (Zhang et al., [2022b](https://arxiv.org/html/2606.14892#bib.bib138); Xia et al., [2023](https://arxiv.org/html/2606.14892#bib.bib131)), this avoids the bias in data generation that choosing a particular parametric model would induce.
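The following sketch shows the sharing pattern: one randomly drawn mechanism per attribute type, reused for every instance, with a fresh exogenous draw per instance. It is a simplified stand-in for the canonical construction of Xia et al. (2023), not a reimplementation; it assumes a Markovian toy graph (no bidirected cliques), equal-width regions of $[0, 1]$, and a car mechanism whose only parent is the count of active controlling signals. All names are hypothetical.

```python
import random

def random_mechanism(parent_values, out_values, n_regions, rng):
    """Draw a random deterministic f(pa, u): u in [0, 1] selects a region, and each
    region holds its own random lookup table from parent values to outputs."""
    table = {(pa, r): rng.choice(out_values) for pa in parent_values for r in range(n_regions)}

    def f(pa, u):
        region = min(int(u * n_regions), n_regions - 1)  # u = 1.0 falls in the last region
        return table[(pa, region)]

    return f

def sample_skeleton(signals, cars, ctrl, f_w, f_b, rng):
    """One draw of v_rho: every instance gets its own exogenous u, but all signals
    share f_w and all cars share f_b."""
    w = {s: f_w((), rng.random()) for s in signals}
    b = {}
    for c in cars:
        k = sum(w[s] for s in signals if (s, c) in ctrl)  # count-aggregated parent
        b[c] = f_b(k, rng.random())
    return w, b

rng = random.Random(0)
f_w = random_mechanism([()], (0, 1), n_regions=4, rng=rng)
f_b = random_mechanism(range(6), (0, 1), n_regions=4, rng=rng)  # counts 0..D with D = 5

# The same mechanisms generate data for two differently shaped skeletons.
small = sample_skeleton(["s1"], ["c1", "c2"], {("s1", "c1")}, f_w, f_b, rng)
large = sample_skeleton(["s1", "s2"], ["c1"], {("s1", "c1"), ("s2", "c1")}, f_w, f_b, rng)
print(small, large)
```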

### E.3 Model Architecture and Training

We refer readers to the comprehensive Appendix B.6 of Xia et al. ([2023](https://arxiv.org/html/2606.14892#bib.bib131)) for details on the MLE-NCM architecture and training procedure. In addition to the clique-level noise variables (Def. [5.1](https://arxiv.org/html/2606.14892#S5.Thmtheorem1)), an MLE-NCM includes a Gumbel noise variable for each attribute.

Given a relational causal graph $\mathcal{G}$, we initialize a module for each mechanism $f_{O.A}$ to obtain an RNCM $\hat{\mathcal{M}}$; each module is reused for every instance $o.A$ across source and target skeletons. To train to maximize a query $P(\mathbf{y}_\star \mid \mathbf{x}_\star)$ on a target skeleton $\rho_\star$ given data $\mathcal{D}$ from sources $\rho_1, \dots, \rho_l$, where each source $\rho_k$ has, for each of its regimes $j = 1, \dots, m_k$, a dataset $\{\mathbf{v}^{(i)}_{\rho_k, \mathbf{z}_{j,k}}\}_{i=1}^{n_{j,k}}$ of $n_{j,k}$ datapoints, we use the following modified MLE-NCM loss:

$$L(\hat{\mathcal{M}}, \mathcal{D}) = \sum_{k=1}^{l} \sum_{j=1}^{m_k} \frac{1}{n_{j,k}} \sum_{i=1}^{n_{j,k}} -\log P^{\hat{\mathcal{M}}_{\rho_k}}\big(\mathbf{v}^{(i)}_{\rho_k, \mathbf{z}_{j,k}}\big) - \lambda \log P^{\hat{\mathcal{M}}_{\rho_\star}}(\mathbf{y}_\star \mid \mathbf{x}_\star) \tag{3}$$
Here, $\lambda$ is a parameter that decreases during training. To minimize the query, we replace the $\lambda \log P^{\hat{\mathcal{M}}_{\rho_\star}}(\mathbf{y}_\star \mid \mathbf{x}_\star)$ term with $\lambda\big(1 - \log P^{\hat{\mathcal{M}}_{\rho_\star}}(\mathbf{y}_\star \mid \mathbf{x}_\star)\big)$. In our estimation experiments (Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1)), we set $\lambda = 0$ and train only one RNCM.
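A minimal sketch of how the terms of Eq. (3) combine, assuming the per-datapoint log-likelihoods have already been computed by the RNCM. In practice these would be PyTorch tensors; plain floats are used here to keep the example self-contained, and the function name and nested-list layout are hypothetical.

```python
import math

def rncm_loss(source_logliks, query_prob, lam, maximize=True):
    """Modified MLE-NCM loss of Eq. (3).

    source_logliks[k][j] lists log P^{M_{rho_k}}(v^{(i)}) for the n_{j,k}
    datapoints of regime j in source skeleton k; query_prob is the model's
    current value of P(y_star | x_star) on the target skeleton.
    """
    nll = 0.0
    for regimes in source_logliks:              # sum over sources k = 1..l
        for logliks in regimes:                 # sum over regimes j = 1..m_k
            nll -= sum(logliks) / len(logliks)  # average NLL within the regime
    log_q = math.log(query_prob)
    if maximize:
        return nll - lam * log_q                # lower loss for a larger query
    return nll - lam * (1.0 - log_q)            # minimization variant from the text

# With lam = 0 (the estimation setting), only the data-fit term remains.
data = [[[math.log(0.5), math.log(0.25)]]]
print(rncm_loss(data, query_prob=0.4, lam=0.0))
```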

All modules contain 2 hidden layers with 128 neurons each and were trained for 200 epochs (often converging earlier) with a learning rate of $10^{-3}$ and a batch size of 1000 datapoints.

### E.4 RNCM Estimation Experiments: Details

##### Specification of queries\.

In Table [E.4.1](https://arxiv.org/html/2606.14892#A5.SS4.T1), we give the exact target queries used in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1).

##### Proof of identifiability\.

We give an example derivation showing that $P^{\rho_C}(c_2.b \mid do(p_2.x, p_3.x))$ is identifiable from $P(\mathbf{v}_{\rho_A})$. The proofs for the other identifiable cases follow similarly, by applying same-skeleton backdoor adjustment (Cor. [D.2](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary2)) and cross-skeleton observational inference (Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3)).

###### Proposition E\.1\.

Given source skeleton $\rho_A$ and target skeleton $\rho_C$ as in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1), and relational causal graph $\mathcal{G}$ (Fig. [2](https://arxiv.org/html/2606.14892#S4.F2)), the distribution $P^{\rho_C}(c_2.b \mid do(p_2.x, p_3.x))$ is relationally identifiable from $P(\mathbf{v}_{\rho_A})$ and $\mathcal{G}$.

###### Proof\.

By Cor. [D.2](https://arxiv.org/html/2606.14892#A4.Thmadxcorollary2), $P^{\rho_C}(c_2.b \mid do(p_2.x, p_3.x))$ is identifiable from $P(\mathbf{v}_{\rho_C})$ via backdoor adjustment on the set $\{s_2.W\}$, using the formula

$$P^{\rho_C}(c_2.b \mid do(p_2.x, p_3.x)) = \sum_{s_2.w} P^{\rho_C}(c_2.b \mid s_2.w, p_2.x, p_3.x)\, P^{\rho_C}(s_2.w).$$

By Thm. [4.3](https://arxiv.org/html/2606.14892#S4.Thmtheorem3), we have that $P^{\rho_C}(c_2.b \mid s_2.w, p_2.x, p_3.x) = P^{\rho_A}(c_1.b \mid s_1.w, p_1.x, p_2.x)$ and $P^{\rho_C}(s_2.w) = P^{\rho_A}(s_1.w)$, so every term on the right-hand side is determined by $P(\mathbf{v}_{\rho_A})$, and we are done. ∎
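The adjustment formula can be evaluated directly from a joint table. The sketch below assumes a single binary signal value as the adjustment set and two binary pedestrian values as the intervened variables; all probabilities are hypothetical. By Thm. 4.3, the same conditional and marginal tables could be read off the source skeleton $\rho_A$ after renaming $c_2, s_2, p_2, p_3$ to $c_1, s_1, p_1, p_2$. The naive conditional is included for contrast.

```python
from collections import defaultdict
from itertools import product

def backdoor_adjust(joint, x, b):
    """P(B=b | do(X=x)) = sum_w P(B=b | W=w, X=x) P(W=w), from joint[(w, x, b)].
    x may be a tuple, e.g. the pair (p2.x, p3.x)."""
    p_w, p_wx = defaultdict(float), defaultdict(float)
    for (w_, x_, _), p in joint.items():
        p_w[w_] += p
        p_wx[(w_, x_)] += p
    total = 0.0
    for w_, pw in p_w.items():
        if p_wx[(w_, x)] == 0.0:
            raise ValueError("positivity violated for w=%r" % (w_,))
        total += joint.get((w_, x, b), 0.0) / p_wx[(w_, x)] * pw
    return total

def conditional(joint, x, b):
    """Naive P(B=b | X=x), which ignores confounding by W."""
    num = sum(p for (_, x_, b_), p in joint.items() if x_ == x and b_ == b)
    den = sum(p for (_, x_, _), p in joint.items() if x_ == x)
    return num / den

def toy_joint():
    """Hypothetical numbers: the signal W affects both pedestrians and the car."""
    joint = {}
    for w, x2, x3, b in product((0, 1), repeat=4):
        p = 0.3 if w else 0.7                     # P(s.W = w)
        px = 0.7 if w else 0.2                    # P(p.X = 1 | s.W = w), per pedestrian
        p *= (px if x2 else 1 - px) * (px if x3 else 1 - px)
        pb = 0.2 + 0.3 * w + 0.2 * x2 + 0.2 * x3  # P(c.B = 1 | w, x2, x3)
        p *= pb if b else 1 - pb
        joint[(w, (x2, x3), b)] = p
    return joint

joint = toy_joint()
print(backdoor_adjust(joint, (1, 1), 1))  # 0.7 * 0.6 + 0.3 * 0.9, i.e. about 0.69
print(conditional(joint, (1, 1), 1))      # about 0.852, biased upward by W
```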

##### Proof of non\-identifiability\.

We show that in the case where RNCMs underperform relative to the gold standard NCM$^*$ (training on source $\rho_A$ and evaluating on car $c_1$ in target $\rho_C$), the query is in fact non-identifiable from the available data and assumptions.

###### Proposition E\.2\.

Given source skeleton $\rho_A$ and target skeleton $\rho_C$ as in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1), and relational causal graph $\mathcal{G}$ (Fig. [2](https://arxiv.org/html/2606.14892#S4.F2)), the distribution $P^{\rho_C}(c_1.B = 1 \mid do(p_1.X = 1, p_2.X = 1))$ is relationally non-identifiable from $P(\mathbf{v}_{\rho_A})$ and $\mathcal{G}$.

###### Proof\.

We construct two RSCMs inducing $\mathcal{G}$ that agree on the observational distribution over the source skeleton $\rho_A$ but disagree on the target interventional query.

Consider an RSCM $\mathcal{M}$ as follows. The endogenous variables are $\mathbf{V} = \{\mathsf{Sig}.W,\ \mathsf{Ped}.X,\ \mathsf{Car}.B\}$. The exogenous variables $\mathbf{U}$ are $\mathsf{Sig}.U_W \sim \mathcal{B}(0.3)$, $\mathsf{Ped}.U_X \sim \mathcal{B}(0.4)$, and $\mathsf{Car}.U_B \sim \mathcal{B}(0.2)$, mutually independent. The mechanisms are

$$
\begin{aligned}
\mathsf{Sig}.W &\leftarrow \mathsf{Sig}.U_W, \\
\mathsf{Ped}.X &\leftarrow \mathsf{Ped}.U_X \oplus \mathbf{1}\big[|\{\mathsf{Sig} \mid \mathsf{Sig}.W = 1, \mathsf{Ctrl}(\mathsf{Sig}, \mathsf{Ped})\}| > 0\big], \\
\mathsf{Car}.B &\leftarrow \mathsf{Car}.U_B \oplus H(\mathsf{Car}),
\end{aligned}
$$

where

$$H(\mathsf{Car}) = \mathbf{1}\big[|\{\mathsf{Sig} \mid \mathsf{Sig}.W = 1, \mathsf{Ctrl}(\mathsf{Sig}, \mathsf{Car})\}| > 0\big] \lor \mathbf{1}\big[|\{\mathsf{Ped} \mid \mathsf{Ped}.X = 1, \mathsf{Path}(\mathsf{Ped}, \mathsf{Car})\}| > 0\big].$$

Next, consider an RSCM $\mathcal{M}'$, identical to $\mathcal{M}$ except with a modified braking mechanism

$$\mathsf{Car}.B \leftarrow \mathsf{Car}.U_B \oplus H(\mathsf{Car}) \oplus Z(\mathsf{Car}),$$

where

$$Z(\mathsf{Car}) = \mathbf{1}\big[|\{\mathsf{Sig} \mid \mathsf{Sig}.W = 1, \mathsf{Ctrl}(\mathsf{Sig}, \mathsf{Car})\}| > 1\big].$$

Both $\mathcal{M}$ and $\mathcal{M}'$ induce the same relational causal graph $\mathcal{G}$, since the additional term $Z(\mathsf{Car})$ depends only on signal variables already related to $\mathsf{Car}.B$.

We first show that $\mathcal{M}$ and $\mathcal{M}'$ agree on the source skeleton $\rho_A$. In $\rho_A$, every car is controlled by at most one signal. Thus, for every car $c$ in $\rho_A$, $Z(c) = 0$. Therefore the braking mechanisms of $\mathcal{M}$ and $\mathcal{M}'$ coincide on $\rho_A$. Since all other mechanisms are identical by construction, $P^{\mathcal{M}_{\rho_A}}(\mathbf{v}_{\rho_A}) = P^{\mathcal{M}'_{\rho_A}}(\mathbf{v}_{\rho_A})$.

We now show that the two models disagree on the target query. In $\rho_C$, car $c_1$ is controlled by two signals, $s_1$ and $s_2$, and has $p_1, p_2$ in its path. Under the intervention $do(p_1.X = 1, p_2.X = 1)$, we have $H(c_1) = 1$ in both models, regardless of the signal values. Therefore,

$$
\begin{aligned}
P^{\mathcal{M}_{\rho_C}}(c_1.B = 1 \mid do(p_1.X = 1, p_2.X = 1))
&= P^{\mathcal{M}_{\rho_C}}(c_1.U_B \oplus 1 = 1) \\
&= P^{\mathcal{M}_{\rho_C}}(c_1.U_B = 0) = 0.8.
\end{aligned}
$$

In $\mathcal{M}'$, the additional term $Z(c_1)$ equals $1$ exactly when both controlling signals are active, i.e., when $s_1.W = s_2.W = 1$. Since the signal mechanisms are unchanged by the intervention, the signals' exogenous variables are independent, and $P(s.W = 1) = 0.3$, we have $P^{\rho_C}(Z(c_1) = 1) = P(s_1.W = 1, s_2.W = 1) = 0.3^2 = 0.09$. Moreover, $Z(c_1)$ is independent of $c_1.U_B$. Thus,

$$
\begin{aligned}
P^{\mathcal{M}'_{\rho_C}}(c_1.B = 1 \mid do(p_1.X = 1, p_2.X = 1))
&= P^{\mathcal{M}'_{\rho_C}}(c_1.U_B \oplus 1 \oplus Z(c_1) = 1) \\
&= P(Z(c_1) = 0)\, P(c_1.U_B = 0) + P(Z(c_1) = 1)\, P(c_1.U_B = 1) \\
&= (1 - 0.09)(0.8) + (0.09)(0.2) \\
&= 0.746.
\end{aligned}
$$
Thus $\mathcal{M}$ and $\mathcal{M}'$ agree on $P(\mathbf{v}_{\rho_A})$ while inducing different values ($0.8$ vs. $0.746$) for the target query. ∎
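The two probabilities in this proof can be checked by brute-force enumeration. The sketch below encodes only the braking mechanisms of $\mathcal{M}$ and $\mathcal{M}'$, as functions of the number of active controlling signals and active pedestrians on the car's path (the function names are hypothetical). It confirms that the two mechanisms coincide whenever at most one signal controls the car, as in $\rho_A$, and recomputes the target query in $\rho_C$.

```python
from itertools import product

P_SIG = 0.3  # P(Sig.U_W = 1), hence P(s.W = 1)
P_UB = 0.2   # P(Car.U_B = 1)

def bern(p, v):
    return p if v == 1 else 1.0 - p

def brake(u_b, n_active_signals, n_active_peds, with_z):
    """Car.B under M (with_z=False) or M' (with_z=True)."""
    h = int(n_active_signals > 0 or n_active_peds > 0)  # H(Car)
    z = int(n_active_signals > 1) if with_z else 0      # Z(Car), only in M'
    return u_b ^ h ^ z

def target_query(with_z):
    """P(c1.B = 1 | do(p1.X = 1, p2.X = 1)) in rho_C, where s1 and s2 control c1."""
    total = 0.0
    for w1, w2, u_b in product((0, 1), repeat=3):
        prob = bern(P_SIG, w1) * bern(P_SIG, w2) * bern(P_UB, u_b)
        if brake(u_b, w1 + w2, n_active_peds=2, with_z=with_z) == 1:
            total += prob
    return total

# In rho_A each car has at most one controlling signal, so M and M' coincide there.
same_on_source = all(
    brake(u, s, p, False) == brake(u, s, p, True)
    for u, s, p in product((0, 1), (0, 1), (0, 1, 2))
)
print(same_on_source, target_query(False), target_query(True))  # True, ~0.8, ~0.746
```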

##### Design of baselines\.

We compare RNCMs with five baselines, constructed as follows\.

1. **NCM-X, a non-relational causal baseline.** This trains an NCM constrained by the flattened, non-relational graph with edges $W \to X$, $W \to B$, $X \to B$. It is trained on 'Cartesian' flattened data, splitting each datapoint into all triples of signals, pedestrians, and cars. For example, a datapoint of source skeleton $\rho_A$ is split into four datapoints as follows:

   $$(s_1.w, p_1.x, p_2.x, c_1.b, c_2.b) \to \begin{cases} (s_1.w, p_1.x, c_1.b) \\ (s_1.w, p_1.x, c_2.b) \\ (s_1.w, p_2.x, c_1.b) \\ (s_1.w, p_2.x, c_2.b). \end{cases}$$

   If the original dataset for source $\rho_A$ contains $10^4$ datapoints, the flattened dataset thus contains $4 \times 10^4$ datapoints.
2. **NCM-J, a causal baseline with partial relational information.** It is trained on 'join' flattened data, splitting each datapoint into all triples of signals, pedestrians, and cars that are pairwise related. For example, a datapoint of source skeleton $\rho_A$ is split into two datapoints as follows:

   $$(s_1.w, p_1.x, p_2.x, c_1.b, c_2.b) \to \begin{cases} (s_1.w, p_1.x, c_1.b) \\ (s_1.w, p_2.x, c_1.b). \end{cases}$$

   Note that $(s_1.w, p_1.x, c_2.b)$ is not an included triple since $\mathsf{Ctrl}(s_1, c_2)$ is false. If the original dataset for source $\rho_A$ contains $10^4$ datapoints, the flattened dataset thus contains $2 \times 10^4$ datapoints.
3. **Rel-MLP, a relational non-causal baseline.** This baseline trains an MLP to predict $c.B$ from count features of the objects related to $c$. Each relational datapoint is split into one supervised example per car. The feature vector for car $c$ is

   $$\Big(\sum_{s : \mathsf{Ctrl}(s, c)} s.W,\ |\{s : \mathsf{Ctrl}(s, c)\}|,\ \sum_{p : \mathsf{InPath}(p, c)} p.X,\ |\{p : \mathsf{InPath}(p, c)\}|\Big).$$

   Thus the model observes both the number of related signals and pedestrians and the number of active related signals and pedestrians. For example, a datapoint from $\rho_A$ is split into two car-level examples:

   $$(s_1.w, p_1.x, p_2.x, c_1.b, c_2.b) \mapsto \begin{cases} (s_1.w,\ 1,\ p_1.x + p_2.x,\ 2,\ c_1.b), \\ (0,\ 0,\ p_2.x,\ 1,\ c_2.b). \end{cases}$$

   Here the first four entries are the input features and the final entry is the prediction target. Since this baseline is not causal, it cannot directly compute interventional quantities. To estimate our target query $P(c.B = 1 \mid do(p_1.X = 1, \ldots, p_k.X = 1))$, we use a conditional probability. First, the intervention is mapped to the active pedestrian count it induces in the target car's neighborhood:

   $$m_c = \sum_{j=1}^{k} \mathbf{1}\{\mathsf{InPath}(p_j, c)\}\, x_j.$$

   We then average the MLP predictions over the subset $\mathcal{I}$ of rows in the source data whose active pedestrian count is $m_c$:

   $$\widehat{P}(c.B = 1 \mid do(p_1.X = 1, \ldots, p_k.X = 1)) = \frac{1}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} \sigma(f_\theta(i)).$$

   If $\mathcal{I}$ is empty, the estimate is left undefined.
4. **Rel-MLP + deg, a degree-matched variant of Rel-MLP.** This baseline uses the same trained MLP and the same count features as Rel-MLP, but a different query estimator. In addition to matching the active pedestrian count induced by the intervention, it also requires the source car row to have the same number of related signals and pedestrians as the target car, thus choosing a smaller subset $\mathcal{I}$ above. For example, suppose the target query concerns a car $c$ with one related signal and two related pedestrians, under an intervention setting both related pedestrians to $1$. Then Rel-MLP averages MLP predictions over all source rows with two active related pedestrians, whereas Rel-MLP + deg further restricts this average to source rows whose cars also have exactly one related signal and two related pedestrians.
5. **NCM$^*$, a gold-standard target-only NCM.** Finally, for a target skeleton $\rho_\star$, NCM$^*$ is simply a $\bar{\mathcal{G}}_{\rho_\star}$-constrained NCM trained on data from the target, where $\bar{\mathcal{G}}_{\rho_\star}$ is the marginalized ground graph of the true relational causal graph on $\rho_\star$. Since it is a standard NCM, it does not share parameters across different objects of the same type.
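To make the baseline data layouts concrete, the sketch below flattens one $\rho_A$ datapoint in the Cartesian (NCM-X) and join (NCM-J) styles, builds the Rel-MLP count features, and implements the conditional-averaging estimator with an optional degree match (Rel-MLP + deg). The attribute values are arbitrary, the relations $\mathsf{Ctrl}(s_1, p_1)$ and $\mathsf{Ctrl}(s_1, p_2)$ are assumed so that the join output matches item 2, the MLP is replaced by a list of given predictions, and all function names are hypothetical.

```python
from itertools import product

# One hypothetical rho_A datapoint: attribute values and ground relations.
values = {"s1": 1, "p1": 0, "p2": 1, "c1": 1, "c2": 0}
signals, peds, cars = ["s1"], ["p1", "p2"], ["c1", "c2"]
ctrl = {("s1", "p1"), ("s1", "p2"), ("s1", "c1")}  # Ctrl(s1, p*) is an assumption
in_path = {("p1", "c1"), ("p2", "c1"), ("p2", "c2")}

def cartesian_flatten(v):
    """NCM-X: every (signal, pedestrian, car) triple."""
    return [(v[s], v[p], v[c]) for s, p, c in product(signals, peds, cars)]

def join_flatten(v):
    """NCM-J: only triples whose members are pairwise related."""
    return [(v[s], v[p], v[c]) for s, p, c in product(signals, peds, cars)
            if (s, p) in ctrl and (s, c) in ctrl and (p, c) in in_path]

def rel_mlp_example(v, car):
    """Rel-MLP: (active signals, #signals, active pedestrians, #pedestrians, target)."""
    sigs = [s for s in signals if (s, car) in ctrl]
    ps = [p for p in peds if (p, car) in in_path]
    return (sum(v[s] for s in sigs), len(sigs), sum(v[p] for p in ps), len(ps), v[car])

def rel_mlp_estimate(rows, preds, m_c, degree=None):
    """Average predictions over source rows with m_c active pedestrians; if
    degree = (#signals, #pedestrians) is given, also match it (Rel-MLP + deg)."""
    idx = [i for i, r in enumerate(rows)
           if r[2] == m_c and (degree is None or (r[1], r[3]) == degree)]
    if not idx:
        return None  # estimate left undefined
    return sum(preds[i] for i in idx) / len(idx)

print(cartesian_flatten(values))  # 4 triples
print(join_flatten(values))       # 2 triples
print([rel_mlp_example(values, c) for c in cars])
```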

Table E.4.1: Target queries used in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1) and Fig. [3](https://arxiv.org/html/2606.14892#S5.F3).

### E.5 Experiment with Front-door Estimation

In this section, we present an experiment with a front-door graph, complementing the backdoor graphs considered in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1). We use a set of source and target relational skeletons similar to that of Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2). We train RNCMs using the graph in Fig. [E.5.1](https://arxiv.org/html/2606.14892#A5.SS5.F1) to estimate the identifiable same-car query $P(c_1.B = 1 \mid do(c_1.X = 1))$ in the target skeleton. This query is computable via front-door adjustment on $\{c_1.Z, c_2.Z, c_3.Z\}$. Averaging across five trials, RNCMs achieve an MSE of $0.00082 \pm 0.0018$ relative to the ground truth, estimating the query accurately.
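For reference, the standard front-door formula can be evaluated from an observational joint table as sketched below. This uses a hypothetical single-car model (a hidden $U$ confounding $X$ and $Y$, with $X \to Z \to Y$) rather than the relational graph of Fig. E.5.1, which is not reproduced here; the mediator value may be a tuple to cover a set such as $\{c_1.Z, c_2.Z, c_3.Z\}$. All names and numbers are illustrative.

```python
from collections import defaultdict
from itertools import product

def front_door(joint, x, y):
    """P(Y=y | do(X=x)) = sum_z P(z | x) sum_x' P(y | x', z) P(x'), from joint[(x, z, y)].
    z may be a tuple holding several mediators."""
    p_x, p_xz = defaultdict(float), defaultdict(float)
    for (x_, z_, _), p in joint.items():
        p_x[x_] += p
        p_xz[(x_, z_)] += p
    total = 0.0
    for z_ in {z_ for (_, z_, _) in joint}:
        inner = 0.0
        for x2, px2 in p_x.items():
            if p_xz[(x2, z_)] == 0.0:
                raise ValueError("positivity violated")
            inner += joint.get((x2, z_, y), 0.0) / p_xz[(x2, z_)] * px2
        total += p_xz[(x, z_)] / p_x[x] * inner
    return total

def toy_joint():
    """Hypothetical model: hidden U confounds X and Y; X -> Z -> Y."""
    joint = defaultdict(float)
    for u, x, z, y in product((0, 1), repeat=4):
        px = 0.8 if u else 0.2
        pz = 0.9 if x else 0.1
        py = 0.2 + 0.5 * z + 0.2 * u
        p = 0.5 * (px if x else 1 - px) * (pz if z else 1 - pz) * (py if y else 1 - py)
        joint[(x, z, y)] += p  # U is unobserved, so it is summed out
    return dict(joint)

print(front_door(toy_joint(), x=1, y=1))  # truth: 0.9 * 0.8 + 0.1 * 0.3 = 0.75
```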

![Front-door relational causal graph](https://arxiv.org/html/2606.14892v1/x7.png)
(a) Front-door relational causal graph.

![Marginalized ground graph on source skeleton](https://arxiv.org/html/2606.14892v1/x8.png)
(b) Marginalized ground graph on source skeleton.

![Marginalized ground graph on target skeleton](https://arxiv.org/html/2606.14892v1/x9.png)
(c) Marginalized ground graph on target skeleton.

Figure E.5.1: Front-door graphs used in Exp. [E.5](https://arxiv.org/html/2606.14892#A5.SS5).

### E.6 RNCM Identification Experiments: Details

In Fig. [E.6.1](https://arxiv.org/html/2606.14892#A5.SS6.F1), we show the marginalized ground causal graphs for the source skeleton $\rho$ and target skeleton $\rho_\star$ of both the relational bow and the relational IV graphs used in Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2).

We also prove the identifiability result for the different\-car query in both cases\.

###### Proposition E\.3\.

Given source skeleton $\rho$ and target skeleton $\rho_\star$ as in Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2), and causal graph $\mathcal{G}_{\mathsf{bow}}$ (Fig. [E.6.1](https://arxiv.org/html/2606.14892#A5.SS6.F1), top left), the distribution $P^{\rho_\star}(c_3.y)$ is relationally identifiable from $P(\mathbf{v}_\rho)$ and $\mathcal{G}_{\mathsf{bow}}$.

###### Proof\.

Consider any RSCM $\mathcal{M}$ consistent with $\mathcal{G}_{\mathsf{bow}}$. Since $\mathcal{G}_{\mathsf{bow}}$ is $\rho$-Markovian, so is $\mathcal{M}$. Let $\mathbf{U}_X$ (resp., $\mathbf{U}_Y$) denote the non-relational exogenous variables affecting $\mathsf{Car}.X$ (resp., $\mathsf{Car}.Y$), and let $\mathbf{U}'_X = \mathbf{U}_X \setminus \mathbf{U}_Y$. Then, in the ground RSCM $\mathcal{M}_{\rho_\star}$,

$$
\begin{aligned}
P^{\mathcal{M}_{\rho_\star}}(c_3.y)
&= \sum_{c_3.x,\, c_3.\mathbf{u}_Y} P^{\mathcal{M}_{\rho_\star}}(c_3.y \mid c_3.x, c_3.\mathbf{u}_Y)\, P^{\mathcal{M}_{\rho_\star}}(c_3.x \mid c_3.\mathbf{u}_Y)\, P^{\mathcal{M}_{\rho_\star}}(c_3.\mathbf{u}_Y) \\
&= \sum_{c_3.x,\, c_3.\mathbf{u}_Y} \mathbf{1}\big[f_{\mathsf{Car}.Y}(c_3.x, c_3.\mathbf{u}_Y) = c_3.y\big] \sum_{c_3.\mathbf{u}'_X} \mathbf{1}\big[f_{\mathsf{Car}.X}(c_3.\mathbf{u}_X) = c_3.x\big]\, P^{\mathcal{M}_{\rho_\star}}(c_3.\mathbf{u}_Y, c_3.\mathbf{u}'_X) \qquad \text{($c_3.Y$ has no relational parents)} \\
&= \sum_{c_3.x,\, c_3.\mathbf{u}_Y} \mathbf{1}\big[f_{\mathsf{Car}.Y}(c_3.x, c_3.\mathbf{u}_Y) = c_3.y\big] \sum_{c_3.\mathbf{u}'_X} \mathbf{1}\big[f_{\mathsf{Car}.X}(c_3.\mathbf{u}_X) = c_3.x\big]\, P^{\mathcal{M}_{\rho}}(c_2.\mathbf{u}_Y, c_2.\mathbf{u}'_X) \qquad \text{(exogenous distributions are shared across $c$ in $\rho, \rho_\star$)} \\
&= \sum_{c_2.x,\, c_2.\mathbf{u}_Y} \mathbf{1}\big[f_{\mathsf{Car}.Y}(c_2.x, c_2.\mathbf{u}_Y) = c_2.y\big] \sum_{c_2.\mathbf{u}'_X} \mathbf{1}\big[f_{\mathsf{Car}.X}(c_2.\mathbf{u}_X) = c_2.x\big]\, P^{\mathcal{M}_{\rho}}(c_2.\mathbf{u}_Y, c_2.\mathbf{u}'_X) \qquad \text{(functions $f_{\mathsf{Car}.X}, f_{\mathsf{Car}.Y}$ are shared across $c$ in $\rho, \rho_\star$)} \\
&= P^{\mathcal{M}_{\rho}}(c_2.y) \qquad \text{(by a symmetric derivation)}
\end{aligned}
$$
This proves that for any RSCMs $\mathcal{M}, \mathcal{M}'$ consistent with $\mathcal{G}_{\mathsf{bow}}$, we have $P^{\mathcal{M}_\rho}(c_2.y) = P^{\mathcal{M}_{\rho_\star}}(c_3.y)$ and $P^{\mathcal{M}'_\rho}(c_2.y) = P^{\mathcal{M}'_{\rho_\star}}(c_3.y)$. If $\mathcal{M}, \mathcal{M}'$ agree on the source data, then $P^{\mathcal{M}_\rho}(c_2.y) = P^{\mathcal{M}'_\rho}(c_2.y)$, and we are done. ∎

###### Proposition E\.4\.

Given source skeleton $\rho$ and target skeleton $\rho_\star$ as in Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2), and causal graph $\mathcal{G}_{\mathsf{bow}}$ (Fig. [E.6.1](https://arxiv.org/html/2606.14892#A5.SS6.F1), top left), the query $P^{\rho_\star}(c_3.y \mid do(c_2.x))$ is relationally identifiable from $P(\mathbf{v}_\rho)$ and $\mathcal{G}_{\mathsf{bow}}$.

###### Proof\.

There is no directed path from $c_2.X$ to $c_3.Y$ in $\mathcal{G}_{\mathsf{bow}, \rho_\star}$ (Fig. [E.6.1](https://arxiv.org/html/2606.14892#A5.SS6.F1), top right). By do-calculus and Prop. [4.4](https://arxiv.org/html/2606.14892#S4.Thmtheorem4), $P^{\rho_\star}(c_3.y \mid do(c_2.x)) = P^{\rho_\star}(c_3.y)$. By Prop. [E.3](https://arxiv.org/html/2606.14892#A5.Thmadxprop3), $P^{\rho_\star}(c_3.y)$ is relationally identifiable, and we are done. ∎

An identical derivation shows that $P^{\rho_\star}(c_3.y \mid do(c_2.x))$ is also relationally identifiable from $P(\mathbf{v}_\rho)$ and $\mathcal{G}_{\mathsf{iv}}$.

In both graphs, the non-identifiability of the same-car query follows from Prop. [4.5](https://arxiv.org/html/2606.14892#S4.Thmtheorem5), by an argument similar to that in Ex. [C.2](https://arxiv.org/html/2606.14892#A3.Thmadxexample2).

![Marginalized ground graphs for the relational bow and IV graphs](https://arxiv.org/html/2606.14892v1/figs/exp-id-ground.png)

Figure E.6.1: Marginalized ground graphs corresponding to $\mathcal{G}_{\mathsf{bow}}$ (top) and $\mathcal{G}_{\mathsf{iv}}$ (bottom) for the source $\rho$ (middle) and target $\rho_\star$ (right) in Exp. [6.2](https://arxiv.org/html/2606.14892#S6.SS2).

### E.7 Experiments with Larger Relational Structures

In this section, we evaluate the scalability of RNCMs on larger relational structures with up to 75 variables, extending the estimation experiments of Sec. [6.1](https://arxiv.org/html/2606.14892#S6.SS1).

##### Setup\.

We use the relational causal graph of Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1) (Fig. [2](https://arxiv.org/html/2606.14892#S4.F2)). For $n \in \{10, 20, 30, 40, 50\}$, we train on a source skeleton with $n$ objects and evaluate on two distinct target skeletons: one with $n$ objects and one with $1.5n$ objects, all with similar neighborhood sizes but different topology. We generate $5 \times 10^3$ datapoints using a regional CTM similar to Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1) with count aggregation, with three trials per source skeleton. We compute a large-scale aggregate query: the average of $P(c.B = 1 \mid do(p.X = 1))$ over all pedestrians $p$ and cars $c$ such that $\mathsf{Path}(p, c)$.

##### Generating relational skeletons\.

We generate source skeletons with $(n_S, n_P, n_C) \in \{(2,4,4),\ (4,8,8),\ (6,12,12),\ (8,16,16),\ (10,20,20)\}$ signals, pedestrians, and cars, respectively. For each source size, we construct two target skeletons: a matched-size target $\rho_\star$, and a larger target $\rho_{\star\star}$ with $(n_S', n_P', n_C') = (\lceil 1.5 n_S \rceil, \lceil 1.5 n_P \rceil, \lceil 1.5 n_C \rceil)$ signals, pedestrians, and cars, respectively.

For each relation type $R(A, B)$, we generate ground relations as follows. Consider $n_A$ objects of type $A$ and $n_B$ objects of type $B$, ordered as $a_0, \ldots, a_{n_A - 1}$ and $b_0, \ldots, b_{n_B - 1}$. For each object $b_j$, we assign a local neighborhood of objects $a \in \rho(A)$ as follows. We choose an anchor index $a(j) \in \{0, \ldots, n_A - 1\}$ and include

$$R(a_i, b_j) \iff |i - a(j)| \le \Delta_R,$$

where $\Delta_R$ is a relation-specific neighborhood radius. We use $\Delta_R = 0$ for $\mathsf{Controls}(S, P)$, $\Delta_R = 1$ for $\mathsf{Controls}(S, C)$, and $\Delta_R = 1$ for $\mathsf{Path}(P, C)$.

The source and target skeletons differ in how the anchor $a(j)$ is chosen. For the source skeleton, we use

$$a_{\mathrm{src}}(j) = \left\lfloor \frac{j n_A}{n_B} \right\rfloor.$$

For both target skeletons, we use an alternating floor/ceiling anchor

$$a_{\mathrm{tgt}}(j) = \begin{cases} \left\lfloor \frac{j n_A}{n_B} \right\rfloor, & j \text{ even}, \\[3pt] \left\lceil \frac{j n_A}{n_B} \right\rceil, & j \text{ odd}. \end{cases}$$

Thus, the target skeletons preserve the local neighborhood radii of the source skeleton but are not isomorphic to it. See Fig. [E.7.1](https://arxiv.org/html/2606.14892#A5.SS7.F1) for a visualization of the marginalized ground graphs of the source skeleton and the large target skeleton for $n = 10$.
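The neighborhood construction can be sketched as follows for small sizes. Assumptions: objects are identified by their indices, relations are returned as sets of $(i, j)$ index pairs, and an odd-$j$ ceiling anchor that would exceed $n_A - 1$ is clamped to $n_A - 1$ to stay in the stated range $\{0, \ldots, n_A - 1\}$ (the text does not say how this edge case is handled). The aggregate query from the setup is included as a plain average over $\mathsf{Path}$ pairs, with `query` standing in for a trained RNCM. All names are hypothetical.

```python
def anchor(j, n_a, n_b, target):
    """a_src(j) = floor(j n_A / n_B); a_tgt(j) uses the ceiling instead for odd j."""
    if target and j % 2 == 1:
        ceil = -((-j * n_a) // n_b)  # exact integer ceiling
        return min(ceil, n_a - 1)    # assumption: clamp into {0, ..., n_A - 1}
    return (j * n_a) // n_b          # exact integer floor

def ground_relation(n_a, n_b, radius, target=False):
    """Index pairs (i, j) with R(a_i, b_j), i.e. |i - a(j)| <= radius."""
    return {(i, j) for j in range(n_b) for i in range(n_a)
            if abs(i - anchor(j, n_a, n_b, target)) <= radius}

def skeleton(n_s, n_p, n_c, target=False):
    """Ground relations of one skeleton, using the radii stated above."""
    return {
        "Controls(S,P)": ground_relation(n_s, n_p, 0, target),
        "Controls(S,C)": ground_relation(n_s, n_c, 1, target),
        "Path(P,C)": ground_relation(n_p, n_c, 1, target),
    }

def aggregate_query(path_pairs, query):
    """Average of query(p, c), e.g. P(c.B = 1 | do(p.X = 1)), over all Path(p, c) pairs."""
    pairs = sorted(path_pairs)
    return sum(query(p, c) for p, c in pairs) / len(pairs)

src = skeleton(2, 4, 4)               # smallest source size
tgt = skeleton(2, 4, 4, target=True)  # matched-size target
big = skeleton(3, 6, 6, target=True)  # larger target: ceil(1.5 * n) per type
print(sorted(src["Controls(S,P)"]), sorted(tgt["Controls(S,P)"]))
```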

![Refer to caption](https://arxiv.org/html/2606.14892v1/x10.png) (a) Source skeleton $\rho$ with 10 objects.
![Refer to caption](https://arxiv.org/html/2606.14892v1/x11.png) (b) Target skeleton $\rho_{\star\star}$ with 15 objects.

Figure E.7.1: Marginalized ground graphs for the relational skeletons used in the scaling experiment (Exp. [E.7](https://arxiv.org/html/2606.14892#A5.SS7)).

![Refer to caption](https://arxiv.org/html/2606.14892v1/x12.png) (a) Accuracy of RNCMs trained on the source and evaluated on the target.
![Refer to caption](https://arxiv.org/html/2606.14892v1/x13.png) (b) Training time of RNCMs.

Figure E.7.2: Performance of RNCMs on large relational structures (Exp. [E.7](https://arxiv.org/html/2606.14892#A5.SS7)), averaged across 10 trials. An RNCM is trained on a source skeleton with $n$ objects and evaluated on two distinct target skeletons: one with $n$ objects and another with $1.5n$ objects. Accuracy remains stable with an increasing number of objects, alongside a manageable increase in runtime.
##### Results\.

In Fig. [E.7.2](https://arxiv.org/html/2606.14892#A5.SS7.F2), we report the MSE and runtime of RNCMs in this setting. Accuracy remains stable and high as the number of objects increases; interestingly, accuracy improves slightly from $n = 10$ to $n = 20$ objects, which might be explained by the fact that more objects yield more effective training datapoints for each shared mechanism. While runtime increases with the number of objects, the growth is modest beyond small sizes, suggesting that parameter sharing mitigates the combinatorial explosion typically associated with increased dimensionality. This highlights the scalability of RNCMs to larger relational structures.
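To illustrate the effective-datapoint explanation, here is a sketch of a single shared mechanism for $C.B$ reused by every car. The logistic form, its parameters, the three pedestrians per car path, and independent crossing with probability 0.5 are all hypothetical choices, not the RNCM architecture. The point is the bookkeeping: each sampled scene contributes one input-output pair per car, so the number of training pairs for the shared mechanism grows linearly with the number of cars while its parameter count stays fixed.

```python
import math
import random


def shared_brake_prob(params, n_crossing):
    """Hypothetical shared mechanism for C.B: P(brake) given the number of
    crossing pedestrians in the car's path. The same `params` are reused
    for every car in every skeleton."""
    weight, bias = params
    return 1.0 / (1.0 + math.exp(-(weight * n_crossing + bias)))


def sample_training_pairs(params, n_cars, ped_per_car, n_samples, seed=0):
    """Simulate scenes and collect (input, output) pairs for the shared mechanism.

    Each scene contributes one pair per car, so the mechanism sees
    n_samples * n_cars pairs while len(params) stays fixed.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        for _ in range(n_cars):
            # Hypothetical: each pedestrian in the path crosses with probability 0.5.
            n_crossing = sum(rng.random() < 0.5 for _ in range(ped_per_car))
            brake = int(rng.random() < shared_brake_prob(params, n_crossing))
            pairs.append((n_crossing, brake))
    return pairs


params = (1.2, -1.0)  # hypothetical weight and bias
for n_cars in (4, 8, 20):
    pairs = sample_training_pairs(params, n_cars, ped_per_car=3, n_samples=100)
    print(n_cars, "cars:", len(pairs), "pairs for", len(params), "shared parameters")
```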

### E.8 Experiments with Misspecified Assumptions

In this section, we conduct further experiments to evaluate the sensitivity of RNCMs to two types of misspecification in the assumptions: incorrect edges in the graph structure and incorrect aggregators\.

#### E.8.1 Misspecified Graphs

##### Setup\.

Across 10 trials, we generate $10^4$ observational datapoints for source $\rho_A$ (Fig. [1](https://arxiv.org/html/2606.14892#S1.F1)) using a random relational CTM following the true causal graph in Fig. [2](https://arxiv.org/html/2606.14892#S4.F2) (with count aggregation). For each trial, we train four RNCMs, each given a different graph as input: the true graph; an underspecified graph without a causal effect from $S.W$ to $C.B$; an underspecified graph without a causal effect from $S.W$ to $P.X$; and an overspecified graph, similar to the true graph but without any relational constraints on the edges (so that every signal affects every car and pedestrian, and every pedestrian affects every car). We evaluate each RNCM on the identifiable query $P^{\rho_C}(c_2.B = 1 \mid do(p_2.X = 1, p_3.X = 1))$ in the target skeleton $\rho_C$. We also evaluate flat NCM baselines: the two underspecified flat graphs remove the $W \to X$ and $W \to B$ edges, respectively, and the overspecified flat graph is simply the true (complete) graph.
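To see what removing relational constraints does to the ground graph, the sketch below grounds three relational edges on a toy skeleton. We assume, as the description of the overspecified variant suggests, that the true graph contains $S.W \to P.X$ constrained by $\mathsf{Controls}(S,P)$, $S.W \to C.B$ constrained by $\mathsf{Controls}(S,C)$, and $P.X \to C.B$ constrained by $\mathsf{Path}(P,C)$; any further structure of Fig. 2 (such as confounding) is omitted, and the skeleton and names are hypothetical. On this skeleton the true graph has 8 ground edges, the two underspecified graphs have 5 and 6, and the overspecified graph has 16.

```python
from itertools import product

# Hypothetical toy skeleton: 2 signals, 2 pedestrians, 3 cars.
SIGNALS = ["s1", "s2"]
PEDS = ["p1", "p2"]
CARS = ["c1", "c2", "c3"]
CONTROLS_SP = {("s1", "p1"), ("s2", "p2")}
CONTROLS_SC = {("s1", "c1"), ("s1", "c2"), ("s2", "c3")}
PATH_PC = {("p1", "c1"), ("p2", "c2"), ("p2", "c3")}


def ground_edges(drop=(), relational=True):
    """Ground edges for S.W -> P.X, S.W -> C.B and P.X -> C.B.

    relational=True: an edge exists only where its relation holds (true graph).
    relational=False: every pair of objects is connected (overspecified graph).
    `drop` removes whole relational edges (underspecified graphs).
    """
    spec = {
        "S.W->P.X": (SIGNALS, PEDS, CONTROLS_SP, ".W", ".X"),
        "S.W->C.B": (SIGNALS, CARS, CONTROLS_SC, ".W", ".B"),
        "P.X->C.B": (PEDS, CARS, PATH_PC, ".X", ".B"),
    }
    edges = set()
    for name, (srcs, dsts, rel, attr_src, attr_dst) in spec.items():
        if name in drop:
            continue
        for u, v in product(srcs, dsts):
            if not relational or (u, v) in rel:
                edges.add((u + attr_src, v + attr_dst))
    return edges


print(len(ground_edges()))                    # 8: true graph
print(len(ground_edges(drop=("S.W->C.B",))))  # 5: no S.W -> C.B
print(len(ground_edges(drop=("S.W->P.X",))))  # 6: no S.W -> P.X
print(len(ground_edges(relational=False)))    # 16: overspecified
```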

##### Results\.

We report the average MSE across 10 trials in Fig. [E.8.1](https://arxiv.org/html/2606.14892#A5.SS8.F1). Misspecification does not necessarily hurt RNCM performance. For instance, the underspecified graph without the $S.W \to P.X$ causal effect yields RNCM accuracy similar to that of the true graph. However, the underspecified graph without the $S.W \rightsquigarrow C.B$ path significantly harms RNCM accuracy. Overspecification also decreases accuracy, though less so than a missing $S.W \rightsquigarrow C.B$ effect. Importantly, despite misspecification in the input graph, RNCMs achieve accuracy greater than or comparable to that of flat baselines, even when the flat baselines are given the true flattened graph.

![Refer to caption](https://arxiv.org/html/2606.14892v1/x14.png)

Figure E.8.1: Graphs and results of Exp. [E.8.1](https://arxiv.org/html/2606.14892#A5.SS8.SSS1) for evaluating the sensitivity of RNCMs to misspecification in the input graph. Misspecification does not necessarily hurt accuracy: a missing $S.W \to P.X$ effect yields accuracy similar to that of the true graph. This shows that sensitivity to misspecification depends strongly on the graph, query, and type of misspecification. Importantly, RNCMs given a misspecified graph have accuracy greater than or comparable to flat baselines given the true graph.

#### E.8.2 Misspecified Aggregators

##### Setup\.

We use the graph structure in Fig. [4.1](https://arxiv.org/html/2606.14892#S4.Thmtheorem1) both for generating the data and for training RNCMs, while varying the role aggregators. For each of the count and majority aggregators, we generate $10^4$ observational datapoints across 10 trials for source $\rho_A$. For each dataset, we train two RNCMs: one using the true aggregator and one using the misspecified aggregator (majority for count-generated data, count for majority-generated data). We evaluate each RNCM on the identifiable query $P^{\rho_C}(c_2.B = 1 \mid do(p_2.X = 1, p_3.X = 1))$ in the target skeleton $\rho_C$, as in Exp. [E.8.1](https://arxiv.org/html/2606.14892#A5.SS8.SSS1).
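A minimal sketch of the two aggregators for binary parent attributes. We assume count aggregation returns the number of related parents taking each value, and majority aggregation returns 1 only when strictly more than half of the related parents equal 1 (ties and empty neighborhoods map to 0). The paper's exact definitions may differ, and the function names are hypothetical.

```python
from collections import Counter


def count_aggregate(values):
    """Assumed count aggregation: (#parents equal to 0, #parents equal to 1)."""
    counts = Counter(values)
    return (counts[0], counts[1])


def majority_aggregate(values):
    """Assumed majority aggregation: 1 iff strictly more than half of the parents are 1."""
    return int(2 * sum(values) > len(values))


# Hypothetical neighborhoods: states X of the pedestrians in one car's path.
for neighborhood in ([1, 1, 0], [1, 0], [0, 0, 0, 1], []):
    print(neighborhood, count_aggregate(neighborhood), majority_aggregate(neighborhood))
```

Majority compresses a neighborhood to a single bit, whereas the counts retain both its size and its composition, at the cost of a higher-dimensional input to the mechanism.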

##### Results\.

We report the average MSE across 10 trials in Fig. [E.8.2](https://arxiv.org/html/2606.14892#A5.SS8.F2). Overall, correctly specified count aggregation gives the lowest error, and correctly specified majority aggregation the next lowest. When the true aggregator is count, estimating with majority aggregation yields a substantial decrease in accuracy. When the true aggregator is majority, correctly specified majority aggregation yields the lowest error, but estimating with count aggregation yields only slightly lower accuracy, and RNCMs remain performant (MSE below $10^{-3}$). This can be explained by count being a sufficient statistic for majority: the majority value is a function of the counts, so count aggregation discards no information that majority aggregation uses, and the slight decrease in accuracy is possibly due to the increased dimensionality of the problem.
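The sufficiency argument can be checked exhaustively on small neighborhoods. Under assumed definitions (count as a histogram of binary parent values, majority with ties mapped to 0; both hypothetical), the sketch verifies that majority is a function of the counts, while the counts are not a function of majority. This asymmetry is consistent with the results: a count-based RNCM can in principle represent a majority mechanism, but a majority-based RNCM cannot represent a count mechanism.

```python
from itertools import product


def count_aggregate(values):
    """Assumed count aggregation: histogram (#zeros, #ones) of binary parents."""
    return (values.count(0), values.count(1))


def majority_aggregate(values):
    """Assumed majority aggregation: 1 iff strictly more than half are 1 (ties -> 0)."""
    return int(2 * sum(values) > len(values))


def is_function_of(f, g, n_max=6):
    """True iff f(v) is determined by g(v) for all binary vectors of length <= n_max."""
    seen = {}
    for n in range(n_max + 1):
        for values in product((0, 1), repeat=n):
            v = list(values)
            # Record f for the first vector with this g-value; any later mismatch
            # means two inputs share g but differ in f.
            if seen.setdefault(g(v), f(v)) != f(v):
                return False
    return True


print(is_function_of(majority_aggregate, count_aggregate))  # True
print(is_function_of(count_aggregate, majority_aggregate))  # False
```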

![Refer to caption](https://arxiv.org/html/2606.14892v1/x15.png)

Figure E.8.2: Accuracy of RNCMs under misspecification of aggregators (Exp. [E.8.2](https://arxiv.org/html/2606.14892#A5.SS8.SSS2)). When the true aggregator is count, using majority aggregation for estimation yields a substantial decrease in accuracy. When the true aggregator is majority, using count aggregation only slightly decreases accuracy, still maintaining MSE under $10^{-3}$.

## Appendix F Frequently Asked Questions

1. Q1. What is the difference between a schema, skeleton, RSCM, ground SCM, RCG, and ground graph? How are they related to standard SCMs? Answer. A summary of these concepts, with an illustrative example and a comparison to standard SCMs, can be found in Table [B.3.1](https://arxiv.org/html/2606.14892#A2.SS3.T1). Another example with a more explicit comparison to standard SCMs is given in Sec. [C.1](https://arxiv.org/html/2606.14892#A3.SS1).
2. Q2. How does an RSCM differ from a standard SCM? Answer. A standard SCM is defined over a fixed set of variables. It assumes that each ‘unit’ or object in the domain (e.g., cars, signals, or pedestrians) can be described by this fixed set of variables. Typically, it also assumes that units are i.i.d. This assumption can be restrictive when units causally affect each other, which has motivated a number of generalizations of SCMs to settings with interference between units. However, these methods still assume a fixed set of units with a fixed set of relationships between them, so queries about this network are answered using data from the same network. RSCMs generalize this fixed-unit view by defining a causal data-generating process that can be instantiated for any set of units (of possibly different types) and relations between them. RSCMs achieve this by making causal relationships contingent on relational constraints. For example, an RSCM may specify that a signal’s state affects a car’s braking speed whenever the signal controls the car, and that the same mechanism for this effect is shared across cars. This mechanism can be applied to any set of signals and cars. Thus, RSCMs enable answering queries about one network of units using data from another; we call this cross-skeleton relational identification (Def. [4.2](https://arxiv.org/html/2606.14892#S4.Thmtheorem2)) and evaluate it in Exp. [6.1](https://arxiv.org/html/2606.14892#S6.SS1).
3. Q3. Does grounding an RSCM just produce an ordinary SCM? If so, why not work directly with that SCM? Answer. For any fixed skeleton $\rho$, grounding an RSCM $\mathcal{M}$ does produce an ordinary SCM $\mathcal{M}_\rho$ over instantiated variables. The advantage of the RSCM is that it explains how these ordinary SCMs across different skeletons are related. Different traffic scenes may contain different numbers of cars, signals, and pedestrians, giving rise to different ground SCMs. The RSCM ties these ground SCMs together by sharing mechanisms across object and attribute types. This is what allows the framework to ask whether causal information from one skeleton can transfer to another.
4. Q4. Do RSCMs require stronger assumptions than standard SCMs? Answer. Not necessarily. RSCMs require explicit assumptions about the relational structure, but standard SCMs also make such an assumption when they treat units as i.i.d. RSCMs allow a researcher to relax this assumption and make more general interactions between units explicit. Often, these interactions themselves can be learned from data; see Q6 for an example.
5. Q5. When is a standard SCM preferable, and when is an RSCM preferable? Answer. A standard SCM is preferable when the causal problem naturally has a fixed set of variables and the units can reasonably be treated as i.i.d. If the setting involves units that causally affect each other, an RSCM is preferable, both for its theoretical guarantees and, as our experiments suggest, for empirical accuracy (Fig. [3](https://arxiv.org/html/2606.14892#S5.F3)).
6. Q6. What prior knowledge is needed in practice? Answer. In our identification results, we assume that the schema, skeletons, and relational causal graph are given. This is analogous to graphical identification in standard causal inference, where the causal graph is assumed to be given. This prior knowledge can come from experts or be learned from data. Our framework is complementary to relational causal discovery, which learns the causal graph, and to methods that infer relational structure (i.e., skeletons) from raw data, such as scene graph extraction in vision.
7. Q7. What is the intuition behind the non-identifiability result for unseen skeletons (Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3))? Answer. The key issue is that source skeletons may not reveal how the model behaves under relational configurations that occur only in the target skeleton. Two RSCMs can agree on every observed source skeleton but disagree on an unseen target skeleton. For example, suppose the target skeleton contains a relational pattern, such as three pedestrians being in the path of a car, that never occurs in the source skeletons. One RSCM may define the car’s behavior to change when this pattern occurs, while another RSCM may ignore that pattern. If the pattern is absent from all source skeletons, both models induce the same source distributions, but they can induce different target distributions. Therefore, the target distribution is not identifiable without further assumptions.
8. Q8. How should Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3) be interpreted for relational machine learning? Answer. Thm. [3.3](https://arxiv.org/html/2606.14892#S3.Thmtheorem3) formalizes a limitation of generalization across relational structures (e.g., as in inductive graph learning) even for observational queries, the lowest rung of the causal hierarchy, which involves neither interventions nor counterfactuals. The theorem states that even if mechanisms are shared across objects of the same type (as is common in relational machine learning), the *observational* distributions of a given set of skeletons may not uniquely determine the observational distribution of an unseen skeleton. This suggests that parameter sharing alone is not enough to guarantee cross-skeleton generalization. This does not mean that relational models such as GNNs cannot generalize in practice. Rather, it clarifies that successful generalization relies on additional assumptions, such as smoothness, regularity, invariance, or architectural biases. Our framework leverages relational causal graphs to make such assumptions explicit.
9. Q9. How does the approach scale to larger relational structures? Answer. The main source of scalability is parameter sharing. The same structural mechanism is reused across all instances of a given attribute type. For example, the same car-braking mechanism applies to every car, regardless of how many cars appear in a particular skeleton. In additional experiments, we train on skeletons with up to 50 objects and evaluate on unseen skeletons with up to 75 objects (Exp. [E.7](https://arxiv.org/html/2606.14892#A5.SS7)). Accuracy remains stable as the number of objects increases, while runtime increases moderately. That said, scalability also depends on the choice of relational aggregation. High-dimensional aggregators, such as histograms over large neighborhoods, can be expensive; lower-dimensional summaries such as sums, means, maxima, or attention pooling may be preferable in large-scale applications.
10. Q10. Does the framework require discrete variables? Answer. The identification theory of Sec. [4](https://arxiv.org/html/2606.14892#S4) does not require discrete variables. The discreteness assumption enters in our neural implementation (Sec. [5](https://arxiv.org/html/2606.14892#S5)), which follows discrete canonical neural causal model constructions. Extending the neural implementation to continuous variables with theoretical guarantees is an important direction for future work. Conceptually, continuous attributes can be included in an RSCM in the same way as discrete attributes: they are attributes of entities or relations and have structural equations. The main challenge is practical estimation and optimization, not the relational semantics themselves.
11. Q11. How do attributes on relations fit into the framework? Answer. Relation attributes are attributes attached to relation instances rather than entity instances. For example, consider entities $\mathsf{Student}$ and $\mathsf{Course}$, and a relation $\mathsf{Takes}(\mathsf{Student}, \mathsf{Course})$. The relation instance $\mathsf{Takes}(s, c)$ could have an attribute $\mathsf{Takes}.G$, representing the grade student $s$ receives in course $c$. Relation attributes behave like entity attributes in the formalism.
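The example in Q7 can be made concrete with a small sketch. Two hypothetical braking mechanisms, each shared across all cars as in Q8, agree whenever at most two pedestrians are in a car's path but differ when three are. Assuming, for illustration only, that pedestrians cross independently with probability 0.5, every configuration with at most two pedestrians per path yields identical braking probabilities under both models, while the unseen three-pedestrian configuration does not (0.6375 versus 0.525). The mechanisms, probabilities, and names are all hypothetical.

```python
from math import comb


def brake_model_1(k):
    """Hypothetical P(car brakes | k crossing pedestrians in its path)."""
    return min(0.2 + 0.3 * k, 1.0)


def brake_model_2(k):
    """Agrees with model 1 for k <= 2 but reacts differently to k = 3."""
    return brake_model_1(k) if k <= 2 else 0.1


def p_brake(model, n_path, p_cross=0.5):
    """Marginal P(brake) for a car with n_path pedestrians in its path,
    each crossing independently with probability p_cross."""
    return sum(
        comb(n_path, k) * p_cross**k * (1 - p_cross) ** (n_path - k) * model(k)
        for k in range(n_path + 1)
    )


# Source skeletons: at most 2 pedestrians in any car's path.
for n in (1, 2):
    print(n, p_brake(brake_model_1, n), p_brake(brake_model_2, n))
# Unseen target configuration: 3 pedestrians in one car's path.
print(3, p_brake(brake_model_1, 3), p_brake(brake_model_2, 3))
```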