Optimal Experiments for Partial Causal Effect Identification

arXiv cs.AI 05/11/26, 04:00 AM Papers
Summary
This paper introduces the 'max-potency problem' for selecting cost-constrained experiments to maximize the tightening of bounds on partial causal effects. The authors propose graphical pruning criteria to reduce the search space and demonstrate the method on NHANES health data.
arXiv:2605.06993v1 Announce Type: new Abstract: Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem is NP-hard via a reduction from 0-1 knapsack. Building on the polynomial-programming framework of Duarte et al. (2023), we give a general procedure for evaluating epistemic potency in discrete settings. To control the super-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm. On Erdos-Renyi random graphs and 11 bnlearn benchmark networks, the two criteria together prune 50-88% of candidate experiments on average without solving a single polynomial program. For the general subset search, we show that ID-pruned experiments are combinatorially inert, yielding a super-exponential reduction in the number of subsets evaluated. We close with an end-to-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes.
Original Article
View Cached Full Text
Cached at: 05/11/26, 07:10 AM
# Optimal Experiments for Partial Causal Effect Identification
Source: [https://arxiv.org/html/2605.06993](https://arxiv.org/html/2605.06993)
Tobias Maringgele Technical University of Munich tobias\.maringgele@tum\.de &Jalal Etesami Technical University of Munich j\.etesami@tum\.de

###### Abstract

Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly\. We study the problem of selecting, prior to observing experimental outcomes, a cost\-constrained subset of experiments that maximally tightens bounds on a target query\. We formalize this as the*max\-potency problem*, where epistemic potency measures the worst\-case reduction in bound width guaranteed by an experiment, and show that this problem is NP\-hard via a reduction from 0\-1 knapsack\. Building on the polynomial\-programming framework ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\), we give a general procedure for evaluating epistemic potency in discrete settings\. To control the super\-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path\-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm\. On Erdős–Rényi random graphs and 11*bnlearn*benchmark networks, the two criteria together prune 50–88% of candidate experiments on average without solving a single polynomial program\. For the general subset search, we show that ID\-pruned experiments are combinatorially inert, yielding a super\-exponential reduction in the number of subsets evaluated\. We close with an end\-to\-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes\.

## 1Introduction

A central goal of causal inference is to estimate the effect of a treatment on an outcome\. Population\-level summaries such as the average treatment effect \(ATE\) capture aggregate behavior, but many decisions hinge on finer\-grained counterfactual quantities—for example, the probability that a treatment is both necessary and sufficient to produce an outcome \(PNS\)\(Pearl,[1999](https://arxiv.org/html/2605.06993#bib.bib4); Mueller and Pearl,[2022](https://arxiv.org/html/2605.06993#bib.bib3)\)\. Such queries are rarely point\-identifiable from observational data; the best one can do is derive*bounds*\(Balke and Pearl,[1997](https://arxiv.org/html/2605.06993#bib.bib11); Tian and Pearl,[2000](https://arxiv.org/html/2605.06993#bib.bib16)\), which are often too wide to be decision\-relevant\. Tightening them requires additional data, and additional data means experiments, which are costly and subject to budget constraints\.

This motivates the central question of this paper:*given a causal graph, an observational distribution, and a budget, which subset of experiments, chosen ex ante, most tightens bounds on a target query?*Answering it well means reasoning about how informative an experiment will be*before*running it, unlike sequential designs that adapt to realized outcomes\(Aileret al\.,[2024](https://arxiv.org/html/2605.06993#bib.bib36)\), and doing so over a search space that grows super\-exponentially in the number of variables\.

Our contributions are as follows:

- •We formalize this as the*max\-potency problem*, where epistemic potency measures the worst\-case reduction in bound width an experiment guarantees, and show it is NP\-hard via a reduction from 0\-1 knapsack \(Section[4](https://arxiv.org/html/2605.06993#S4)\)\.
- •We give a general procedure for evaluating epistemic potency in discrete settings, building on the polynomial\-programming framework ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\)\(Section[5](https://arxiv.org/html/2605.06993#S5)\)\.
- •We introduce two graphical pruning criteria that depend only on the causal graph and the query to certify zero potency of candidate experiments without solving any polynomial program: a novel path\-interception rule that runs in linear time by exploiting the graph’s district structure, and an identifiability check based on the ID algorithm\(Shpitser and Pearl,[2006](https://arxiv.org/html/2605.06993#bib.bib6)\)\. We further show that ID\-pruned experiments are*combinatorially inert*: they can be removed from the subset search entirely \(Section[6](https://arxiv.org/html/2605.06993#S6)\)\.
- •On Erdős–Rényi random graphs and 11*bnlearn*benchmark networks, the two criteria together prune 50–88% of candidate experiments on average when selecting a single optimal experiment\. For the general subset search, ID\-inertness alone reduces the search space exponent by roughly60%60\\%on average—a super\-exponential reduction in the number of subsets that need to be evaluated\. We demonstrate the full pipeline end\-to\-end on observational NHANES data \(Section[7](https://arxiv.org/html/2605.06993#S7)\)\.

### 1\.1Related Work

Our work bridges two largely separate threads in causal inference: causal*identification*, which asks whether a query can be uniquely computed from the available data, and*partial identification*, which asks how tightly the query can be bounded when unique computation is impossible\. Optimal experimental design has been extensively studied for the former; we address it for the latter\.

Causal identification\.Shpitser and Pearl \([2006](https://arxiv.org/html/2605.06993#bib.bib6)\)present a sound and complete algorithm \(the*ID algorithm*\) for determining whether the causal effect of a subset of variables,𝑿\\boldsymbol\{X\}, on a target subset,𝒀\\boldsymbol\{Y\}, denoted byP\(𝒚∣do\(𝒙\)\)P\(\\boldsymbol\{y\}\\mid do\(\\boldsymbol\{x\}\)\), is identifiable from observational data alone, which holds if and only if a specific graphical structure known as a*hedge*is absent\.Bareinboim and Pearl \([2012](https://arxiv.org/html/2605.06993#bib.bib5)\)extend this to settings where a specific form of experimental data is also available, introducing*z\-identifiability*: a queryP\(𝒚∣do\(𝒙\)\)P\(\\boldsymbol\{y\}\\mid do\(\\boldsymbol\{x\}\)\)is*z\-identifiable*if it can be uniquely computed from the observational distributionP\(𝑽\)P\(\\boldsymbol\{V\}\)together with distributions obtained by intervening on a designated subset of variables𝒁\\boldsymbol\{Z\}\. Intuitively, interventions on𝒁\\boldsymbol\{Z\}can break certain confounding structures that prevent identification from observations alone\.Leeet al\.\([2019](https://arxiv.org/html/2605.06993#bib.bib28)\)further extend this to*general identifiability*\(g\-identifiability\), where the available data is an arbitrary collection of interventional distributions of the form\{P\(𝑽∖𝒁i∣do\(𝒁i\)\)\}i=0m\\\{P\(\\boldsymbol\{V\}\\setminus\\boldsymbol\{Z\}\_\{i\}\\mid do\(\\boldsymbol\{Z\}\_\{i\}\)\)\\\}\_\{i=0\}^\{m\}rather than a single family\.Kivvaet al\.\([2022](https://arxiv.org/html/2605.06993#bib.bib29)\)provide a sound and complete algorithm for g\-identifiability, reducing it to a sequence of classical ID sub\-problems\.

Experimental design\.Akbariet al\.\([2025](https://arxiv.org/html/2605.06993#bib.bib14)\)formulate the minimum\-cost intervention design \(MCID\) problem for*full*identification, show that it is NP\-hard, and propose an algorithm based on minimum hitting sets\.Elahiet al\.\([2024](https://arxiv.org/html/2605.06993#bib.bib30)\)reformulate MCID as weighted partial MAX\-SAT and integer linear programming instances, achieving orders\-of\-magnitude speedups in practice\. Both focus on point identification\. Closer to our regime,Aileret al\.\([2024](https://arxiv.org/html/2605.06993#bib.bib36)\)sequentially design indirect experiments to narrow bound gaps in nonlinear, confounded IV settings; we instead study ex\-ante subset selection in discrete graphs, with graphical certificates of zero potency\.

Partial identification and causal bounds\.Balke and Pearl \([1997](https://arxiv.org/html/2605.06993#bib.bib11)\)derive sharp \(i\.e\. tightest possible\) bounds on binary causal effects using linear programming over response\-type variables; their approach is restricted to discrete, in particular binary, settings with a specific graph structure\.Tian and Pearl \([2000](https://arxiv.org/html/2605.06993#bib.bib16)\)provide closed\-form bounds for the probabilities of causation \(PNS, PN, PS\) for a single binary treatment and outcome, also requiring discrete variables\.Zhanget al\.\([2022](https://arxiv.org/html/2605.06993#bib.bib35)\)bound counterfactual queries from arbitrary combinations of observational and interventional distributions via canonical SCMs\.Duarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\)further generalize to arbitrary causal and counterfactual queries in discrete settings via polynomial programming\.Sachset al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib10)\)characterize conditions under which these polynomial programs reduce to linear programs\.Shridharan and Iyengar \([2023b](https://arxiv.org/html/2605.06993#bib.bib38)\)extend this class with response\-type aggregation to prune the underlying LP\.Shridharan and Iyengar \([2023a](https://arxiv.org/html/2605.06993#bib.bib39)\)further show that on quasi\-Markovian graphs\(Zaffalonet al\.,[2020](https://arxiv.org/html/2605.06993#bib.bib40)\), the polynomial degree reduces to the number of*intervened*c\-components, yielding LP or QP formulations when at most two c\-components are intervened upon\. Like ours, the latter exploits a c\-factor decomposition\(Tian and Pearl,[2002](https://arxiv.org/html/2605.06993#bib.bib8)\), but to reduce the structural complexity of the inner program on a restricted graph class; we instead use c\-factor structure \(Section[6\.1](https://arxiv.org/html/2605.06993#S6.SS1)\) to certify experiments as uninformative ex ante on general ADMGs\. In a related vein,Liet al\.\([2026](https://arxiv.org/html/2605.06993#bib.bib2)\)characterize when sharp bounds have width at most2ϵ2\\epsilonfor binary probabilities of causation\.

## 2Preliminaries

Notation\.Random variables are denoted by capital letters and their values by lowercase letters; bold denotes sets\. For a random variableXX, we writesupp\(X\)supp\(X\)to refer to its support\. WhenXXis a binary variable, i\.e\.\|supp\(X\)\|=2\|supp\(X\)\|=2, we usexxfor the caseX=1X=1\(ortrue\) andx′x^\{\\prime\}forX=0X=0\(orfalse\)\.

Structural Causal Model \(SCM\)\.An SCM is a tupleℳ=\(𝑽,𝑼,𝑭,P\(𝑼\)\)\\mathcal\{M\}=\(\\boldsymbol\{V\},\\boldsymbol\{U\},\\boldsymbol\{F\},P\(\\boldsymbol\{U\}\)\)of observed \(endogenous\) variables𝑽\\boldsymbol\{V\}, latent \(exogenous\) variables𝑼\\boldsymbol\{U\}, structural equations𝑭=\{fV\}V∈𝑽\\boldsymbol\{F\}=\\\{f\_\{V\}\\\}\_\{V\\in\\boldsymbol\{V\}\}, where eachfVf\_\{V\}determines the value ofVVas a function of its \(latent and observed\) parents\.

Acyclic Directed Mixed Graph \(ADMG\)\.The causal structure of an SCM is summarized by an ADMG𝒢=\(𝑽,Ed,Eb\)\\mathcal\{G\}=\(\\boldsymbol\{V\},E^\{d\},E^\{b\}\)over observed variables, with directed edgesVi→Vj∈EdV\_\{i\}\\to V\_\{j\}\\in E^\{d\}indicating thatVjV\_\{j\}structurally depends onViV\_\{i\}, and bidirected edgesVi↔Vj∈EbV\_\{i\}\\leftrightarrow V\_\{j\}\\in E^\{b\}indicating a hidden common cause ofViV\_\{i\}andVjV\_\{j\}\. The directed part is acyclic\. For a nodeV∈𝑽V\\in\\boldsymbol\{V\},𝒑𝒂\(V\)\\boldsymbol\{pa\}\(V\),𝒄𝒉\(V\)\\boldsymbol\{ch\}\(V\), and𝒂𝒏𝒄\(V\)\\boldsymbol\{anc\}\(V\)denote its parents, children, and ancestors, respectively; these extend to sets naturally\. In the figures of this paper, hidden common causes are shown as dashed nodes with directed edges to their children; this is equivalent to bidirected edges in the ADMG representation\.

An*intervention*on𝑿⊆𝑽\\boldsymbol\{X\}\\subseteq\\boldsymbol\{V\}, denoted bydo\(𝑿=𝒙\)do\(\\boldsymbol\{X\}=\\boldsymbol\{x\}\), replaces each structural equationfXf\_\{X\}forX∈𝑿X\\in\\boldsymbol\{X\}with the constant valuex∈supp\(X\)x\\in supp\(X\), removing all incoming edges to𝑿\\boldsymbol\{X\}in𝒢\\mathcal\{G\}\. The resulting*post\-intervention distribution*on𝒀⊆𝑽∖𝑿\\boldsymbol\{Y\}\\subseteq\\boldsymbol\{V\}\\setminus\\boldsymbol\{X\},P\(𝒀𝒙\)=P\(𝒀∣do\(𝑿=𝒙\)\)P\(\\boldsymbol\{Y\}\_\{\\boldsymbol\{x\}\}\)=P\(\\boldsymbol\{Y\}\\mid do\(\\boldsymbol\{X\}=\\boldsymbol\{x\}\)\), may differ from the observational conditionalP\(𝒀∣𝑿=𝒙\)P\(\\boldsymbol\{Y\}\\mid\\boldsymbol\{X\}=\\boldsymbol\{x\}\)whenever hidden confounders are present\. We writeP\(𝒀\|do\(𝑿\)\)P\(\\boldsymbol\{Y\}\|do\(\\boldsymbol\{X\}\)\)to denote the family of post\-intervention distributions indexed by𝒙∈supp\(𝑿\)\\boldsymbol\{x\}\\in supp\(\\boldsymbol\{X\}\)\. A*counterfactual*is a statement that involves contradictory interventions\. For example, in the binary case, a counterfactual probability may take the formP\(yx,yx′′\)P\(y\_\{x\},y^\{\\prime\}\_\{x^\{\\prime\}\}\)for variablesX,YX,Y\. In this paper, we will only consider counterfactuals that are conjunctive statements like the above, where each conjunct is a counterfactual*world*\. We denote the set of all counterfactual worlds appearing in a given statement byℭ\\mathfrak\{C\}\. In the example above,\|ℭ\|=2\|\\mathfrak\{C\}\|=2, with interventions onX\(𝔠\)X^\{\(\\mathfrak\{c\}\)\}and outcomes onY\(𝔠\)Y^\{\(\\mathfrak\{c\}\)\}for each world𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}\.

Discrete variables and response\-type variables\.We assume all variables in𝑽\\boldsymbol\{V\}are discrete with known finite supports\. Since each structural equationfVf\_\{V\}maps finitely many parent configurations to a finite output, it can only encode a finite number of distinct input\-output behaviors\. For each latent disturbanceUkU\_\{k\}, these behaviors over the children ofUkU\_\{k\}are indexed by a discrete*response\-type*RkR\_\{k\}\. The collection𝑹=\{Rk\}Uk∈𝑼\\boldsymbol\{R\}=\\\{R\_\{k\}\\\}\_\{U\_\{k\}\\in\\boldsymbol\{U\}\}is finite, and a fixed𝒓∈supp\(𝑹\)\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\)fully determines the model \(including all counterfactuals\)\. Replacing latent variables𝑼\\boldsymbol\{U\}with𝑹\\boldsymbol\{R\}yields an equivalent model with no loss of generality\(Duarteet al\.,[2023](https://arxiv.org/html/2605.06993#bib.bib13), Proposition 1\)\. Given\(𝒢,P\(𝑽\),supp\(𝑽\)\)\(\\mathcal\{G\},P\(\\boldsymbol\{V\}\),supp\(\\boldsymbol\{V\}\)\)and a target query, the polynomial\-programming framework ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\)constructs a program over𝑹\\boldsymbol\{R\}whose optima are sharp bounds on the query\. See Appendix[A](https://arxiv.org/html/2605.06993#A1)for the formal construction\.

## 3Causal Bounds and Epistemic Potency

Setup\.We are given an ADMG𝒢\\mathcal\{G\}over observed variables𝑽\\boldsymbol\{V\}\(all discrete\) with distributionP\(𝑽\)P\(\\boldsymbol\{V\}\)and a causal queryθ\\theta, which may correspond to a post\-intervention, a counterfactual, or a functional thereof\. Note that using the framework ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\), the sharp observational boundsLθ≤θ≤UθL\_\{\\theta\}\\leq\\theta\\leq U\_\{\\theta\}are obtained by optimizing a polynomial program over𝑹\\boldsymbol\{R\}\.

We define𝒜:=\{\(yx\),\(yx,yx′′\),…\}\\mathcal\{A\}:=\\\{\(y\_\{x\}\),\(y\_\{x\},y^\{\\prime\}\_\{x^\{\\prime\}\}\),\\dots\\\}as a set of*candidate experiments*, i\.e\., arbitrary polynomial functions of the response\-type variables𝑹\\boldsymbol\{R\}\. We use the term broadly:𝒜\\mathcal\{A\}may contain physically realizable experiments such as randomized controlled trials and interventions\(Raghavan and Bareinboim,[2025](https://arxiv.org/html/2605.06993#bib.bib37)\), as well as structural assumptions defensible in the application domain, such as monotonicity\. The polynomial program treats both uniformly as constraints on𝑹\\boldsymbol\{R\}\. For any𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\}with outcomesp𝒶p\_\{\\mathcal\{a\}\}, adding the constraintsf𝒶\(𝑹\)=p𝒶f\_\{\\mathcal\{a\}\}\(\\boldsymbol\{R\}\)=p\_\{\\mathcal\{a\}\}to the polynomial program yields new sharp bounds, denoted byUθ\(𝒶,p𝒶\)U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\(upper\) andLθ\(𝒶,p𝒶\)L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\(lower\)\. We writeUθU\_\{\\theta\}andLθL\_\{\\theta\}for the bounds without any experimental constraint \(i\.e\.,𝒶=∅\\mathcal\{a\}=\\emptyset\)\. We then define theworst\-case bound widthas follows:

𝒲θ\(𝒶\):=maxp𝒶∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]\(Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\),\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\):=\\underset\{\\begin\{subarray\}\{c\}p\_\{\\mathcal\{a\}\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]\\end\{subarray\}\}\{\\max\}\\bigl\(U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\bigr\),wherelp𝒶~l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}andup𝒶~u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}are the sharp observational bounds onp𝒶~p\_\{\\tilde\{\\mathcal\{a\}\}\}, itself a causal query\. We take the maximum because bounds must hold regardless of the experimental outcome; any smaller choice could yield a bound width that is exceeded in practice\.111Possible alternatives are briefly discussed in Appendix[B](https://arxiv.org/html/2605.06993#A2)\.We denote the observational bound width as𝒲θ:=𝒲θ\(∅\)=Uθ−Lθ\\mathcal\{W\}\_\{\\theta\}:=\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)=U\_\{\\theta\}\-L\_\{\\theta\}\.

For illustration, Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)shows an ADMG with𝑽=\{A,B,X,M,Y,C,D\}\\boldsymbol\{V\}=\\\{A,B,X,M,Y,C,D\\\}\(all binary\)\. Restricting𝒜\\mathcal\{A\}to single\-world interventionsP\(𝑾∣do\(𝒁=𝒛\)\)P\(\\boldsymbol\{W\}\\mid do\(\\boldsymbol\{Z\}\{=\}\\boldsymbol\{z\}\)\)with𝑾∩𝒁=∅\\boldsymbol\{W\}\\cap\\boldsymbol\{Z\}=\\emptyset, eachV∈𝑽V\\in\\boldsymbol\{V\}falls into exactly one of three roles: intervened with a chosen value, measured as an outcome, or neither\. This results in\|𝒜\|=∏V∈𝑽\(\|supp\(V\)\|\+2\)=47=16,384\|\\mathcal\{A\}\|\\;=\\;\\prod\_\{V\\in\\boldsymbol\{V\}\}\\big\(\|supp\(V\)\|\+2\\big\)\\;=\\;4^\{7\}\\;=\\;16\{,\}384candidate experiments\.

XXYYUU

\(a\)An ADMG with two observed variables and a hidden confounder\.
AABBXXMMYYCCDDR3R\_\{3\}R1R\_\{1\}R2R\_\{2\}R4R\_\{4\}

\(b\)An ADMG with seven observed binary variables and four hidden common causes\.

Figure 1:Two example ADMGs\.###### Example 3\.1\(Observational bounds on PNS\)\.

Consider a binary treatmentXXwith causal effect on a binary outcomeYY, confounded by a hidden variableUU\. The ADMG is shown in Figure[1\(a\)](https://arxiv.org/html/2605.06993#S3.F1.sf1)\. Our query is the probability of necessity and sufficiencyPNS=P\(yx,yx′′\)\\text\{PNS\}=P\(y\_\{x\},y^\{\\prime\}\_\{x^\{\\prime\}\}\), whereyxy\_\{x\}denotesY=y∣do\(X=x\)Y\{=\}y\\mid do\(X\{=\}x\)\.

Tian and Pearl\([2000](https://arxiv.org/html/2605.06993#bib.bib16)\)showed that the sharp bounds for the PNS are:

max\{0,P\(yx\)−P\(yx′\),P\(y\)−P\(yx′\),P\(yx\)−P\(y\)\}≤PNS≤min\{P\(yx\),P\(x,y\)\+P\(x′,y′\),P\(yx′′\),P\(yx\)−P\(yx′\)\+P\(x,y′\)\+P\(x′,y\)\}\.\\max\\left\\\{\\\!\\\!\\\!\\begin\{array\}\[\]\{l\}0,\\ P\(y\_\{x\}\)\-P\(y\_\{x^\{\\prime\}\}\),\\\\ P\(y\)\-P\(y\_\{x^\{\\prime\}\}\),P\(y\_\{x\}\)\-P\(y\)\\end\{array\}\\\!\\\!\\\!\\right\\\}\\\!\\leq\\\!\\text\{PNS\}\\\!\\leq\\\!\\min\\\!\\left\\\{\\\!\\\!\\\!\\begin\{array\}\[\]\{l\}P\(y\_\{x\}\),\\ P\(x,y\)\+P\(x^\{\\prime\},y^\{\\prime\}\),\\\\ P\(y^\{\\prime\}\_\{x^\{\\prime\}\}\),\\ P\(y\_\{x\}\)\-P\(y\_\{x^\{\\prime\}\}\)\+P\(x,y^\{\\prime\}\)\+P\(x^\{\\prime\},y\)\\end\{array\}\\\!\\\!\\\!\\right\\\}\.In this example, two candidate experiments are available:𝒶1:yx\\mathcal\{a\}\_\{1\}:y\_\{x\}and𝒶2:yx′\\mathcal\{a\}\_\{2\}:y\_\{x^\{\\prime\}\}\. SinceP\(yx\)P\(y\_\{x\}\)andP\(yx′\)P\(y\_\{x^\{\\prime\}\}\)are not identifiable from observational data, only the purely observational terms in the above bounds remain, giving0≤PNS≤P\(x,y\)\+P\(x′,y′\)0\\leq\\text\{PNS\}\\leq P\(x,y\)\+P\(x^\{\\prime\},y^\{\\prime\}\)\. For instance, by assumingP\(x,y\)=0\.2,P\(x,y′\)=0\.1,P\(x′,y\)=0\.3,P\(x,y\)=0\.2,P\(x,y^\{\\prime\}\)=0\.1,P\(x^\{\\prime\},y\)=0\.3,andP\(x′,y′\)=0\.4P\(x^\{\\prime\},y^\{\\prime\}\)=0\.4, we get𝒲θ\(∅\)=0\.6\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)=0\.6\.

###### Definition 3\.2\(Epistemic Potency\)\.

For any𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\}and a given causal queryθ\\theta, we definepotθ\(𝒶\):=𝒲θ−𝒲θ\(𝒶\)pot\_\{\\theta\}\(\\mathcal\{a\}\):=\\mathcal\{W\}\_\{\\theta\}\-\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)\. Ifpotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0, we call𝒶\\mathcal\{a\}*useless*\. For a single experiment𝒶~∈𝒜\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{A\}, we writepotθ\(𝒶~\)pot\_\{\\theta\}\(\\tilde\{\\mathcal\{a\}\}\)as shorthand forpotθ\(\{𝒶~\}\)pot\_\{\\theta\}\(\\\{\\tilde\{\\mathcal\{a\}\}\\\}\)\.

Potency measures how much the worst\-case bound width on a causal queryθ\\thetais reduced by performing experiment𝒶\\mathcal\{a\}\. The following result establishes that potency is always non\-negative\. In words, experiments can only help, never hurt\.

###### Lemma 3\.3\.

For any𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\},potθ\(𝒶\)≥0pot\_\{\\theta\}\(\\mathcal\{a\}\)\\geq 0\.

This is because adding experimental constraints restricts the set of feasible response\-type configurations, which can only tighten the resulting bounds\. All detailed proofs are in Appendix[F](https://arxiv.org/html/2605.06993#A6)\. The next result gives a useful characterization of zero potency that will underpin our pruning criteria: an experiment has zero potency if and only if some feasible outcome leaves both bounds unchanged\.

###### Lemma 3\.4\.

For any𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\},potθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0if and only if∃p𝒶†∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]\\exists\\,p\_\{\\mathcal\{a\}\}^\{\\dagger\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]s\.t\.Uθ\(𝒶,p𝒶†\)=UθU\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=U\_\{\\theta\}andLθ\(𝒶,p𝒶†\)=LθL\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=L\_\{\\theta\}\.

###### Example 3\.5\.

Consider the setting of Example[3\.1](https://arxiv.org/html/2605.06993#S3.Thmtheorem1)\. \(1\) Performing𝒶1\\mathcal\{a\}\_\{1\}: From\(Tian and Pearl,[2000](https://arxiv.org/html/2605.06993#bib.bib16)\), we obtain thatp𝒶1∈\[P\(x,y\),1−P\(x,y′\)\]=\[0\.2,0\.9\]p\_\{\\mathcal\{a\}\_\{1\}\}\\in\[P\(x,y\),\\,1\-P\(x,y^\{\\prime\}\)\]=\[0\.2,0\.9\]\. \(2\) Performing𝒶2\\mathcal\{a\}\_\{2\}: Similarly, we havep𝒶2∈\[0\.3,0\.6\]p\_\{\\mathcal\{a\}\_\{2\}\}\\in\[0\.3,0\.6\]\. For each, we obtain the following bounds:

Uθ\(𝒶1,p𝒶1\)−Lθ\(𝒶1,p𝒶1\)=\{p𝒶1p𝒶1∈\[0\.2,0\.5\],0\.5p𝒶1∈\(0\.5,0\.6\],1\.1−p𝒶1p𝒶1∈\(0\.6,0\.9\]\.Uθ\(𝒶2,p𝒶2\)−Lθ\(𝒶2,p𝒶2\)=\{0\.1\+p𝒶2p𝒶2∈\[0\.3,0\.4\],0\.5p𝒶2∈\(0\.4,0\.5\],1−p𝒶2p𝒶2∈\[0\.5,0\.6\]\.\\displaystyle U\_\{\\theta\}\(\\\!\\mathcal\{a\}\_\{1\},p\_\{\\mathcal\{a\}\_\{1\}\}\\\!\)\\\!\\\!\-\\\!\\\!L\_\{\\theta\}\(\\\!\\mathcal\{a\}\_\{1\},p\_\{\\mathcal\{a\}\_\{1\}\}\\\!\)\\\!=\\\!\\begin\{cases\}\\\!p\_\{\\mathcal\{a\}\_\{1\}\}&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{1\}\}\\\!\\in\\\!\[0\.2,0\.5\],\\\\ \\\!0\.5&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{1\}\}\\\!\\in\\\!\(0\.5,0\.6\],\\\\ \\\!1\.1\\\!\-\\\!p\_\{\\mathcal\{a\}\_\{1\}\}&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{1\}\}\\\!\\in\\\!\(0\.6,0\.9\]\.\\end\{cases\}\\ U\_\{\\theta\}\(\\\!\\mathcal\{a\}\_\{2\},p\_\{\\mathcal\{a\}\_\{2\}\}\\\!\)\\\!\\\!\-\\\!\\\!L\_\{\\theta\}\(\\\!\\mathcal\{a\}\_\{2\},p\_\{\\mathcal\{a\}\_\{2\}\}\\\!\)\\\!=\\\!\\begin\{cases\}\\\!0\.1\\\!\+\\\!p\_\{\\mathcal\{a\}\_\{2\}\}&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{2\}\}\\\!\\in\\\!\[0\.3,0\.4\],\\\\ \\\!0\.5&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{2\}\}\\\!\\in\\\!\(0\.4,0\.5\],\\\\ \\\!1\\\!\-\\\!p\_\{\\mathcal\{a\}\_\{2\}\}&\\\!\\\!\\\!p\_\{\\mathcal\{a\}\_\{2\}\}\\\!\\in\\\!\[0\.5,0\.6\]\.\\end\{cases\}This implies the worst\-case widths are𝒲θ\(𝒶1\)=0\.5\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\_\{1\}\)=0\.5and𝒲θ\(𝒶2\)=0\.5\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\_\{2\}\)\\\!=\\\!0\.5\.

\(3\) Performing both: Let𝒶3=\{𝒶1,𝒶2\}\\mathcal\{a\}\_\{3\}\\\!=\\\!\\\{\\mathcal\{a\}\_\{1\}\\\!,\\mathcal\{a\}\_\{2\}\\\}\. MaximizingUθ\(𝒶3,p𝒶1,p𝒶2\)−Lθ\(𝒶3,p𝒶1,p𝒶2\)U\_\{\\theta\}\(\\mathcal\{a\}\_\{3\},p\_\{\\mathcal\{a\}\_\{1\}\},p\_\{\\mathcal\{a\}\_\{2\}\}\)\\\!\-\\\!L\_\{\\theta\}\(\\mathcal\{a\}\_\{3\},p\_\{\\mathcal\{a\}\_\{1\}\},p\_\{\\mathcal\{a\}\_\{2\}\}\)over\(p𝒶1,p𝒶2\)∈\[0\.2,0\.9\]×\[0\.3,0\.6\]\(p\_\{\\mathcal\{a\}\_\{1\}\},p\_\{\\mathcal\{a\}\_\{2\}\}\)\\\!\\in\\\!\[0\.2,0\.9\]\\\!\\times\\\!\[0\.3,0\.6\]yields𝒲θ\(𝒶3\)=0\.4\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\_\{3\}\)=0\.4\. Therefore,potθ\(𝒶1\)=0\.1pot\_\{\\theta\}\(\\mathcal\{a\}\_\{1\}\)=0\.1,potθ\(𝒶2\)=0\.1pot\_\{\\theta\}\(\\mathcal\{a\}\_\{2\}\)=0\.1,potθ\(𝒶3\)=0\.2pot\_\{\\theta\}\(\\mathcal\{a\}\_\{3\}\)=0\.2\. Thus,𝒶3\\mathcal\{a\}\_\{3\}strictly outperforms either individual experiment, illustrating that optimal design requires reasoning over*sets*of candidates\. See Figure[6](https://arxiv.org/html/2605.06993#A4.F6)in Appendix[D](https://arxiv.org/html/2605.06993#A4)\.

## 4The Max\-Potency Problem

Here, we introduce our problem\. Consider a cost functionc:2𝒜→\[0,∞\]c:2^\{\\mathcal\{A\}\}\\to\[0,\\infty\]that assigns a nonnegative cost to any combination of experiments in𝒜\\mathcal\{A\}, along with a budgetBB\. This function encodes the analyst’s burden uniformly across both physical experiments and structural assumptions in𝒜\\mathcal\{A\}—running an experiment or committing to an assumption\. We allow infinite cost for impossible experiments, and assume at least one𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\}has cost at mostBB\. The*max\-potency problem*is given by

max𝒶⊆𝒜potθ\(𝒶\)s\.t\.c\(𝒶\)≤B\.\\underset\{\\mathcal\{a\}\\subseteq\\mathcal\{A\}\}\{\\text\{max\}\}\\ pot\_\{\\theta\}\(\\mathcal\{a\}\)\\quad\\text\{s\.t\.\}\\quad c\(\\mathcal\{a\}\)\\leq B\.
###### Theorem 4\.1\.

The max\-potency problem is NP\-hard\.

We reduce 0\-1 knapsack to max\-potency by constructing, for each knapsack item, a disjoint confounded treatment\-outcome cell whose per\-cell potency can be tuned to any target value in a non\-degenerate range\. Disjointness makes the districts independent, and the resulting potency is additive across cells, so the max\-potency optimum coincides with the knapsack optimum\. See Appendix[C](https://arxiv.org/html/2605.06993#A3)\.

###### Example 4\.2\.

Consider the ADMG in Figure[1\(a\)](https://arxiv.org/html/2605.06993#S3.F1.sf1), with the total budgetB=1B=1and each of𝒶1,𝒶2\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{2\}costs11; the combined experiment𝒶3=\{𝒶1,𝒶2\}\\mathcal\{a\}\_\{3\}=\\\{\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{2\}\\\}costs22\. The next table summarizes the resulting potencies\.

While𝒶3\\mathcal\{a\}\_\{3\}achieves the largest potency, it is infeasible underB=1B=1\. Both𝒶1\\mathcal\{a\}\_\{1\}and𝒶2\\mathcal\{a\}\_\{2\}are optimal, each guaranteeing a reduction of0\.10\.1\.

For completeness, Appendix[E\.4](https://arxiv.org/html/2605.06993#A5.SS4)gives an exact enumeration algorithm \(Algorithm[4](https://arxiv.org/html/2605.06993#alg4)\) that skips useless candidates identified by the pruning criteria of Section[6](https://arxiv.org/html/2605.06993#S6)\.

## 5Constructing the Polynomial Program

We compute the boundsLθL\_\{\\theta\}andUθU\_\{\\theta\}via the polynomial\-programming framework ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\); we summarize the construction here and defer algorithmic details to Appendix[E\.1](https://arxiv.org/html/2605.06993#A5.SS1)\.

Base program\.Given a canonical ADMG𝒢\\mathcal\{G\}, an observational distributionP\(𝑽\)P\(\\boldsymbol\{V\}\), supportssupp\(𝑽\)supp\(\\boldsymbol\{V\}\), and a target queryθ\\theta,Duarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\)produce a polynomial program𝒫∅=\(𝒯,𝒞∅\)\\mathcal\{P\}\_\{\\emptyset\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\emptyset\}\)whose decision variables are the response\-type probabilities\{P\(𝒓\):𝒓∈supp\(𝑹\)\}\\\{P\(\\boldsymbol\{r\}\):\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\)\\\}\.*Target polynomial*𝒯\\mathcal\{T\}expresses the causal queryθ\\thetain the decision variables\. That is,𝒯=∑𝒓∈h\(θ\)P\(𝒓\)\\mathcal\{T\}=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\theta\)\}P\(\\boldsymbol\{r\}\), whereh\(θ\)⊆supp\(𝑹\)h\(\\theta\)\\subseteq supp\(\\boldsymbol\{R\}\)collects the configurations under whichθ\\thetaholds \(Definition[6\.1](https://arxiv.org/html/2605.06993#S6.Thmtheorem1)\)\.*Constraint set*𝒞∅\\mathcal\{C\}\_\{\\emptyset\}comprises the probability axiomsP\(𝒓\)≥0P\(\\boldsymbol\{r\}\)\\geq 0and∑𝒓P\(𝒓\)=1\\sum\_\{\\boldsymbol\{r\}\}P\(\\boldsymbol\{r\}\)=1, together with the observational constraint, i\.e\.,∑𝒓∈h\(𝒗\)P\(𝒓\)=P\(𝒗\)\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\)=P\(\\boldsymbol\{v\}\)for each𝒗∈supp\(𝑽\)\\boldsymbol\{v\}\\in supp\(\\boldsymbol\{V\}\)\. Minimizing \(maximizing\)𝒯\\mathcal\{T\}subject to𝒞∅\\mathcal\{C\}\_\{\\emptyset\}yields the sharp observational lower \(upper\) bound onθ\\theta\.

Adding experiments\.For𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\}, each single experiment𝒶~∈𝒶\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}with outcomep𝒶~p\_\{\\tilde\{\\mathcal\{a\}\}\}contributes a polynomial constraint of the form:f𝒶~\(𝑹\):=∑𝒓∈h\(𝒶~\)P\(𝒓\)=p𝒶~\.f\_\{\\tilde\{\\mathcal\{a\}\}\}\(\\boldsymbol\{R\}\):=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\tilde\{\\mathcal\{a\}\}\)\}P\(\\boldsymbol\{r\}\)=p\_\{\\tilde\{\\mathcal\{a\}\}\}\.That is,f𝒶~f\_\{\\tilde\{\\mathcal\{a\}\}\}aggregates the probability mass across configurations compatible with𝒶~\\tilde\{\\mathcal\{a\}\}, which the data fixes top𝒶~p\_\{\\tilde\{\\mathcal\{a\}\}\}\. We writep𝒶=\(p𝒶~\)𝒶~∈𝒶p\_\{\\mathcal\{a\}\}=\(p\_\{\\tilde\{\\mathcal\{a\}\}\}\)\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}for the collection of outcomes\. Setting𝒞𝒶:=𝒞∅∪\{f𝒶~\(𝑹\)=p𝒶~:𝒶~∈𝒶\}\\mathcal\{C\}\_\{\\mathcal\{a\}\}:=\\mathcal\{C\}\_\{\\emptyset\}\\cup\\big\\\{f\_\{\\tilde\{\\mathcal\{a\}\}\}\(\\boldsymbol\{R\}\)=p\_\{\\tilde\{\\mathcal\{a\}\}\}:\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\\big\\\}gives the augmented program𝒫𝒶=\(𝒯,𝒞𝒶\)\\mathcal\{P\}\_\{\\mathcal\{a\}\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\mathcal\{a\}\}\), whose optima are denoted byLθ\(𝒶,p𝒶\)L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)andUθ\(𝒶,p𝒶\)U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\. The quantity𝒲θ\(𝒶\)\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)is therefore obtained by maximizingUθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)over feasible values ofp𝒶p\_\{\\mathcal\{a\}\}\. See Appendix[E\.2](https://arxiv.org/html/2605.06993#A5.SS2)for a coupled\-program formulation, which computes this worst\-case width without explicitly enumeratingp𝒶p\_\{\\mathcal\{a\}\}\.

## 6Pruning Useless Experiments

The max\-potency problem is computationally challenging for three reasons: \(1\) finding the optimal subset is NP\-hard even ifpotθpot\_\{\\theta\}were easy to evaluate; \(2\) evaluatingpotθpot\_\{\\theta\}requires solving polynomial programs; \(3\)\|𝒜\|\|\\mathcal\{A\}\|grows super\-exponentially in\|𝑽\|\|\\boldsymbol\{V\}\|\. We cannot easily circumvent \(1\) or \(2\), but \(3\) offers room: we can identify𝒜†⊆𝒜\\mathcal\{A\}^\{\\dagger\}\\subseteq\\mathcal\{A\}such that∀𝒶†∈𝒜†:potθ\(𝒶†\)=0\\forall\\mathcal\{a\}^\{\\dagger\}\\in\\mathcal\{A\}^\{\\dagger\}:pot\_\{\\theta\}\(\\mathcal\{a\}^\{\\dagger\}\)=0, a set of useless experiments,*without*solving any polynomial program\.

### 6\.1Relevant Response\-Types

To find and filter out the useless experiments, we exploit a result ofDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\)that shows if every directed path from a response\-type to an outcome variable of𝒶\\mathcal\{a\}passes through an intervened variable, that response\-type is not needed for expressing𝒶\\mathcal\{a\}\. We extend this result and show that the base program𝒫∅\\mathcal\{P\}\_\{\\emptyset\}introduced in Section[5](https://arxiv.org/html/2605.06993#S5)optimizes only over a subset of response\-type variables𝑹θ∗⊆𝑹\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\\subseteq\\boldsymbol\{R\}\. Consequently, any experiment whose relevant response\-types lie entirely outside𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}must have zero potency\.

Strategy overview\.We proceed in four steps: \(1\) identify*query\-relevant*response\-types𝑹θ\\boldsymbol\{R\}\_\{\\theta\}\(those with an unblocked directed path to a query outcome\); \(2\) expand via the*district hull*to𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\(all response\-types in any district intersecting𝑹θ\\boldsymbol\{R\}\_\{\\theta\}\); \(3\) show via a Cartesian product decomposition and district factorization that𝒫∅\\mathcal\{P\}\_\{\\emptyset\}depends only on𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}; \(4\) show that any experiment𝒶\\mathcal\{a\}withR⋆\(𝒶\)∩𝑹θ∗=∅R^\{\\star\}\(\\mathcal\{a\}\)\\cap\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=\\emptysethas zero potency, whereR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)denotes the response\-type variables relevant for𝒶\\mathcal\{a\}\.

###### Definition 6\.1\(Compatible Response\-Types\)\.

Let𝒶\\mathcal\{a\}be a joint statement of counterfactual worldsℭ\\mathfrak\{C\}, each𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}specifying interventions𝐗\(𝔠\)\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}and outcomes𝐘\(𝔠\)\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\}\. The compatible configurations are:

h\(𝒶\):=\{𝒓∈supp\(𝑹\):∀𝔠∈ℭ,∀V∈𝒀\(𝔠\),FV\(𝔠\)\(𝒓\)=v\},h\(\\mathcal\{a\}\):=\\big\\\{\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\):\\forall\\mathfrak\{c\}\\in\\mathfrak\{C\},\\;\\forall\\,V\\in\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\},\\;F^\{\(\\mathfrak\{c\}\)\}\_\{V\}\(\\boldsymbol\{r\}\)=v\\big\\\},whereFV\(𝔠\)\(𝐫\)F^\{\(\\mathfrak\{c\}\)\}\_\{V\}\(\\boldsymbol\{r\}\)recursively evaluates the value ofVVin world𝔠\\mathfrak\{c\}, using the intervention value ifV∈𝐗\(𝔠\)V\\in\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}and otherwise applying the structural equation\.

###### Example 6\.2\.

In Figure[1\(a\)](https://arxiv.org/html/2605.06993#S3.F1.sf1),XXandYYare binary\. Let the tuple\(RX,RY\)\\\!\(R\_\{X\},R\_\{Y\}\)be our response\-type replacingUU\.RX∈\{x′,x\}R\_\{X\}\\\!\\in\\\!\\\{x^\{\\prime\},x\\\}encodes the constant value taken byXX\.RYR\_\{Y\}encodes the function\{x′,x\}→\{y′,y\}\\\{x^\{\\prime\},x\\\}\\\!\\to\\\!\\\{y^\{\\prime\},y\\\}as the pair\(fY\(x′\),fY\(x\)\)∈\{\(y′,y′\),\(y′,y\),\(y,y′\),\(y,y\)\}\\big\(f\_\{Y\}\(x^\{\\prime\}\),\\,f\_\{Y\}\(x\)\\big\)\\\!\\in\\\!\\\{\(y^\{\\prime\},y^\{\\prime\}\),\\,\(y^\{\\prime\},y\),\\,\(y,y^\{\\prime\}\),\\,\(y,y\)\\\}\. ThePNS=P\(yx,yx′′\)\\text\{PNS\}\\\!=\\\!P\(y\_\{x\},y^\{\\prime\}\_\{x^\{\\prime\}\}\)requiresfY\(x,rY\)=yf\_\{Y\}\(x,r\_\{Y\}\)\\\!=\\\!yandfY\(x′,rY\)=y′f\_\{Y\}\(x^\{\\prime\},r\_\{Y\}\)\\\!=\\\!y^\{\\prime\}, which uniquely selectsrY=\(y′,y\)r\_\{Y\}\\\!=\\\!\(y^\{\\prime\},y\)\(the*complier*\);rXr\_\{X\}is unconstrained, thush\(PNS\)=\{\(x′,\(y′,y\)\),\(x,\(y′,y\)\)\}\.h\(\\text\{PNS\}\)\\\!=\\\!\\big\\\{\(x^\{\\prime\},\\,\(y^\{\\prime\},y\)\),\\;\(x,\\,\(y^\{\\prime\},y\)\)\\big\\\}\.

###### Definition 6\.3\(Districts𝑫\\boldsymbol\{D\}, District\-Compatible Response\-Types\)\.

The*districts*𝐃=\{𝐃1,…,𝐃c\}\\boldsymbol\{D\}=\\\{\\boldsymbol\{D\}\_\{1\},\\dots,\\boldsymbol\{D\}\_\{c\}\\\}are the connected components of the bidirected graph obtained by replacing eachR∈𝐑R\\in\\boldsymbol\{R\}with a bidirected edge between its children\.𝐃\\boldsymbol\{D\}partitions𝐕∪𝐑\\boldsymbol\{V\}\\cup\\boldsymbol\{R\}\. For any realization𝐯∈supp\(𝐕\)\\boldsymbol\{v\}\\in supp\(\\boldsymbol\{V\}\)and subset of districts𝐃′⊆𝐃\\boldsymbol\{D\}^\{\\prime\}\\subseteq\\boldsymbol\{D\}with observed variables𝐕𝐃′=𝐃′∩𝐕\\boldsymbol\{V\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}=\\boldsymbol\{D\}^\{\\prime\}\\cap\\boldsymbol\{V\}and response\-types𝐑𝐃′=𝐃′∩𝐑\\boldsymbol\{R\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}=\\boldsymbol\{D\}^\{\\prime\}\\cap\\boldsymbol\{R\}, let

h𝑫′\(𝒗\):=\{𝒓𝑫′∈supp\(𝑹𝑫′\):∀Vi∈𝑽𝑫′,fVi\(𝒑𝒂Vi\(𝒗\),𝒓𝑫′\)=vi\},h\_\{\\boldsymbol\{D\}^\{\\prime\}\}\(\\boldsymbol\{v\}\):=\\big\\\{\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\\in supp\(\\boldsymbol\{R\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\):\\forall\\,V\_\{i\}\\in\\boldsymbol\{V\}\_\{\\boldsymbol\{D\}^\{\\prime\}\},\\;f\_\{V\_\{i\}\}\\\!\\big\(\\boldsymbol\{pa\}\_\{V\_\{i\}\}\(\\boldsymbol\{v\}\),\\,\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\\big\)=v\_\{i\}\\big\\\},where𝐩𝐚Vi\(𝐯\)\\boldsymbol\{pa\}\_\{V\_\{i\}\}\(\\boldsymbol\{v\}\)denotes the values assigned toViV\_\{i\}’s parents by𝐯\\boldsymbol\{v\}\. For a single district𝐃k\\boldsymbol\{D\}\_\{k\}, we writehk\(𝐯\):=h𝐃k\(𝐯\)h\_\{k\}\(\\boldsymbol\{v\}\):=h\_\{\\boldsymbol\{D\}\_\{k\}\}\(\\boldsymbol\{v\}\)\.

Next, using the fact thatfVf\_\{V\}depends only on the response\-types ofVV’s district, we show thath\(𝒗\)h\(\\boldsymbol\{v\}\)can be decomposed into district\-compatible response\-types\. See Appendix[D\.1](https://arxiv.org/html/2605.06993#A4.Thmtheorem1)for an example\.

###### Lemma 6\.5\(Cartesian Product Decomposition\)\.

For any𝐯∈supp\(𝐕\)\\boldsymbol\{v\}\\in supp\(\\boldsymbol\{V\}\):h\(𝐯\)=∏𝐃k∈𝐃hk\(𝐯\)h\(\\boldsymbol\{v\}\)=\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}\}h\_\{k\}\(\\boldsymbol\{v\}\)\.

###### Definition 6\.6\(C\-Factor\(Tian and Pearl,[2002](https://arxiv.org/html/2605.06993#bib.bib8)\)\)\.

For district𝐃k\\boldsymbol\{D\}\_\{k\}and a topological orderV1≺⋯≺VnV\_\{1\}\\prec\\cdots\\prec V\_\{n\}, we defineQk\(𝐯\):=Q𝐃k\(𝐯\):=∏\{i:Vi∈𝐃k∩𝐕\}P\(vi∣𝐯\(i−1\)\)Q\_\{k\}\(\\boldsymbol\{v\}\):=Q\_\{\\boldsymbol\{D\}\_\{k\}\}\(\\boldsymbol\{v\}\):=\\prod\_\{\\\{i:V\_\{i\}\\in\\boldsymbol\{D\}\_\{k\}\\cap\\boldsymbol\{V\}\\\}\}P\(v\_\{i\}\\mid\\boldsymbol\{v\}^\{\(i\-1\)\}\), where𝐕\(i−1\):=∏j<iVj\\boldsymbol\{V\}^\{\(i\-1\)\}:=\\prod\_\{j<i\}V\_\{j\}and𝐕\(0\)=∅\\boldsymbol\{V\}^\{\(0\)\}=\\emptyset\. The joint distribution factorizes asP\(𝐯\)=∏k=1cQk\(𝐯\)P\(\\boldsymbol\{v\}\)=\\prod\_\{k=1\}^\{c\}Q\_\{k\}\(\\boldsymbol\{v\}\)\.

The next lemma connects c\-factors to response\-type sums, providing an algebraic bridge needed to eliminate the irrelevant districts\.

###### Lemma 6\.7\.

For any𝐃′⊆𝐃\\boldsymbol\{D\}^\{\\prime\}\\subseteq\\boldsymbol\{D\}and any𝐯∈supp\(𝐕\)\\boldsymbol\{v\}\\in supp\(\\boldsymbol\{V\}\), we can decompose∏𝐃k∈𝐃′Qk\(𝐯\)=∑𝐫𝐃′∈h𝐃′\(𝐯\)P\(𝐫𝐃′\)\.\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\\prime\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\\in h\_\{\\boldsymbol\{D\}^\{\\prime\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\)\.

###### Definition 6\.8\(Relevant Response\-Types,𝑹θ\\boldsymbol\{R\}\_\{\\theta\}, District Hull,𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\)\.

R⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)is the set of response\-types with at least one directed path to an outcome not passing through any intervened variables in any counterfactual world𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}of𝒶\\mathcal\{a\}\. We write𝐑θ=R⋆\(θ\)\\boldsymbol\{R\}\_\{\\theta\}=R^\{\\star\}\(\\theta\)to denote the relevant response types of the queryθ\\theta\. Let𝐃θ∗\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}denote the set of districts with non\-empty overlap with𝐑θ\\boldsymbol\{R\}\_\{\\theta\}\. The*district hull*is given byD∗\(𝐑θ\):=∪𝐃k∈𝐃θ∗𝐃k⊆𝐕∪𝐑D^\{\*\}\(\\boldsymbol\{R\}\_\{\\theta\}\):=\\cup\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}\\boldsymbol\{D\}\_\{k\}\\subseteq\\boldsymbol\{V\}\\cup\\boldsymbol\{R\}and𝐑θ∗=D∗\(𝐑θ\)∩𝐑\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=D^\{\*\}\(\\boldsymbol\{R\}\_\{\\theta\}\)\\cap\\boldsymbol\{R\}\.

It has been shown by\(Duarteet al\.,[2023](https://arxiv.org/html/2605.06993#bib.bib13), Proposition 3\)that the relevant response\-typesR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)are sufficient to express the causal quantity𝒶\\mathcal\{a\}in the polynomial program\. We can therefore formulate our objective function only in terms of𝑹θ⊆𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}\\subseteq\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\.

###### Example 6\.9\(Query\-relevant response\-types in Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)\)\.

Letθ=P\(Y∣do\(M\)\)\\theta=P\(Y\\mid do\(M\)\)in the ADMG of Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)\. To form𝐑θ\\boldsymbol\{R\}\_\{\\theta\}, we need to findR∈𝐑R\\in\\boldsymbol\{R\}such that there is a directed path fromRRtoYYnot passing throughMM\.R1∉𝐑θR\_\{1\}\\not\\in\\boldsymbol\{R\}\_\{\\theta\}since all pathsR1→…→YR\_\{1\}\\to\\dots\\to Ypass throughMM\. Similarly,R3R\_\{3\}andR4R\_\{4\}are not in𝐑θ\\boldsymbol\{R\}\_\{\\theta\}\. In contrast,R2→YR\_\{2\}\\to Ydoes not pass throughMM\. Hence𝐑θ=\{R2\}\\boldsymbol\{R\}\_\{\\theta\}=\\\{R\_\{2\}\\\}\. The districts of Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)are𝐃1=\{A,B,C,R3,R4\}\\boldsymbol\{D\}\_\{1\}=\\\{A,B,C,R\_\{3\},R\_\{4\}\\\},𝐃2=\{X,M,Y,R1,R2\}\\boldsymbol\{D\}\_\{2\}=\\\{X,M,Y,R\_\{1\},R\_\{2\}\\\},𝐃3=\{D\}\\boldsymbol\{D\}\_\{3\}=\\\{D\\\}\. SinceR2∈𝐃2R\_\{2\}\\in\\boldsymbol\{D\}\_\{2\}, then𝐃θ∗=\{𝐃2\}\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}=\\\{\\boldsymbol\{D\}\_\{2\}\\\}\. Consequently, we haveD∗\(𝐑θ\)=𝐃2D^\{\*\}\(\\boldsymbol\{R\}\_\{\\theta\}\)=\\boldsymbol\{D\}\_\{2\}and𝐑θ∗=\{R1,R2\}\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=\\\{R\_\{1\},R\_\{2\}\\\}\.

Using Lemmas[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5)and[6\.7](https://arxiv.org/html/2605.06993#S6.Thmtheorem7), we can show that the observational constraints, which naively involve all of𝑹\\boldsymbol\{R\}, can be reformulated to involve only𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. Specifically, the contribution of response\-types outside the district hull factors out and cancels, leaving a set of constraints purely described by𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\.

###### Lemma 6\.10\.

For the purpose of optimizing𝒯\\mathcal\{T\}, the observational constraintsP\(𝐯\)=∑𝐫∈h\(𝐯\)P\(𝐫\)P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\)can equivalently be replaced by∏𝐃k∈𝐃θ∗Qk\(𝐯\)=∑𝐫θ∗∈h𝐃θ∗\(𝐯\)P\(𝐫θ∗\),\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\\in h\_\{\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\),which involves only variables in𝐑θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. The left\-hand side is computable fromP\(𝐕\)P\(\\boldsymbol\{V\}\)\.

The base program can be defined only over𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. For instance, in the previous example, the base program𝒫∅\\mathcal\{P\}\_\{\\emptyset\}optimizes over\|supp\(R1\)\|⋅\|supp\(R2\)\|\|supp\(R\_\{1\}\)\|\\cdot\|supp\(R\_\{2\}\)\|decision variables rather than∏i=14\|supp\(Ri\)\|\\prod\_\{i=1\}^\{4\}\|supp\(R\_\{i\}\)\|\. See Appendix[D\.2](https://arxiv.org/html/2605.06993#A4.Thmtheorem2)\.

###### Theorem 6\.11\.

∀𝒶∈𝒜\\forall\\mathcal\{a\}\\in\\mathcal\{A\}withR⋆\(𝒶\)∩𝐑θ∗=∅R^\{\\star\}\(\\mathcal\{a\}\)\\cap\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=\\emptyset,potθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\.

Theorem[6\.11](https://arxiv.org/html/2605.06993#S6.Thmtheorem11)reduces zero\-potency certification to checking whether an experiment’s relevant response\-types overlap with𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. The following corollary gives a purely graphical sufficient condition for this check\.

###### Corollary 6\.12\(Interception\)\.

Let𝒶∈𝒜\\mathcal\{a\}\\in\\mathcal\{A\}have counterfactual worldsℭ\\mathfrak\{C\}\. If for every𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}, every directed path from𝐑θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}to𝐘\(𝔠\)\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\}passes through𝐗\(𝔠\)\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}, thenpotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\.

###### Example 6\.13\(Zero potency\)\.

Consider experimentP\(D∣do\(C\)\)P\(D\\mid do\(C\)\)in Example[6\.9](https://arxiv.org/html/2605.06993#S6.Thmtheorem9)\. The only directed path from𝐑θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}toDDisR1→C→DR\_\{1\}\\to C\\to D, which passes throughCC\. Hencepotθ\(P\(D∣do\(C\)\)\)=0pot\_\{\\theta\}\(P\(D\\mid do\(C\)\)\)=0\. On the other hand, the experimentP\(D∣do\(B\)\)P\(D\\mid do\(B\)\)has potentially non\-zero potency, since the pathR1→X→C→DR\_\{1\}\\to X\\to C\\to Ddoes not pass throughBB\.

Figure 2:GetUselessExperiments
Graph

𝒢\\mathcal\{G\}, Query

θ\\theta, Experiment set

𝒜\\mathcal\{A\}
𝒜†⊆𝒜\\mathcal\{A\}^\{\\dagger\}\\subseteq\\mathcal\{A\}with

∀𝒶∈𝒜†:potθ\(𝒶\)=0\\forall\\,\\mathcal\{a\}\\in\\mathcal\{A\}^\{\\dagger\}:pot\_\{\\theta\}\(\\mathcal\{a\}\)=0
𝑹θ∗←D∗\(R⋆\(θ\)\)∩𝑹\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\\leftarrow D^\{\*\}\\\!\\big\(R^\{\\star\}\(\\theta\)\\big\)\\cap\\boldsymbol\{R\};

𝒜†←∅\\quad\\mathcal\{A\}^\{\\dagger\}\\leftarrow\\emptyset
for

𝒶∈𝒜\\mathcal\{a\}\\in\\mathcal\{A\}with worlds

ℭ𝒶\\mathfrak\{C\}\_\{\\mathcal\{a\}\}do

if

\|ℭ𝒶\|=1\|\\mathfrak\{C\}\_\{\\mathcal\{a\}\}\|=1and

ID\(𝒢,𝒀\(𝔠\),𝑿\(𝔠\)\)\\textsc\{ID\}\(\\mathcal\{G\},\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\},\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}\)then

𝒜†←𝒜†∪\{𝒶\}\\mathcal\{A\}^\{\\dagger\}\\leftarrow\\mathcal\{A\}^\{\\dagger\}\\cup\\\{\\mathcal\{a\}\\\};continue

endif

if

∀𝔠∈ℭ𝒶:\\forall\\mathfrak\{c\}\\\!\\in\\\!\\mathfrak\{C\}\_\{\\mathcal\{a\}\}\\\!\\\!:\\\!Alg\.[3](https://arxiv.org/html/2605.06993#alg3)\(𝒢,𝑹θ∗,𝒀\(𝔠\),𝑿\(𝔠\)\)\(\\mathcal\{G\},\\boldsymbol\{R\}\_\{\\theta\}^\{\*\},\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\},\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}\)then

𝒜†←𝒜†∪\{𝒶\}\\mathcal\{A\}^\{\\dagger\}\\leftarrow\\mathcal\{A\}^\{\\dagger\}\\cup\\\{\\mathcal\{a\}\\\}

endif

endfor

Algorithm[3](https://arxiv.org/html/2605.06993#alg3)in Appendix[E\.3](https://arxiv.org/html/2605.06993#A5.SS3)determines whether a given single\-world experiment is useless by verifying the condition of Corollary[6\.12](https://arxiv.org/html/2605.06993#S6.Thmtheorem12)\. If this algorithm returnsTrue, the intervention is deemed useless; otherwise, the result is inconclusive\. Its computational complexity is linear in the size of the graph\. For a worked example on Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2), see Appendix[D\.3](https://arxiv.org/html/2605.06993#A4.Thmtheorem3)\.

### 6\.2Identifiable Experiments

When an intervention is identifiable, it cannot provide any additional information beyond what is already contained in the observational distribution\. In other words, it is useless\.

###### Theorem 6\.14\.

∀𝒶∈𝒜\\forall\\,\\mathcal\{a\}\\in\\mathcal\{A\}whereP\(𝒶\)P\(\\mathcal\{a\}\)is identifiable \(ID\) in𝒢\\mathcal\{G\},potθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\.

TheIDalgorithm inShpitser and Pearl \([2006](https://arxiv.org/html/2605.06993#bib.bib6)\)can be applied to verify identifiability for single\-world interventionsP\(𝒘∣do\(𝒛\)\)P\(\\boldsymbol\{w\}\\mid do\(\\boldsymbol\{z\}\)\)\. Note that cross\-world counterfactuals are not ID by construction, soIDis applied only to interventions\. Furthermore, the results of Theorems[6\.11](https://arxiv.org/html/2605.06993#S6.Thmtheorem11)and[6\.14](https://arxiv.org/html/2605.06993#S6.Thmtheorem14)can be applied independently, in any order, to filter out useless experiments, and Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)exploits this fact\.

###### Corollary 6\.15\(ID\-inertness\)\.

∀𝒶∈𝒜\\forall\\mathcal\{a\}\\in\\mathcal\{A\}where𝒶\\mathcal\{a\}isID:potθ\(\{𝒶\}∪𝒷\)=potθ\(𝒷\)pot\_\{\\theta\}\(\\\{\\mathcal\{a\}\\\}\\cup\\mathcal\{b\}\)=pot\_\{\\theta\}\(\\mathcal\{b\}\)for any𝒷⊆𝒜\\mathcal\{b\}\\subseteq\\mathcal\{A\}\.

In words, ID\-pruned experiments can be removed from the subset search entirely without affecting the optimum \(Appendix[E\.4](https://arxiv.org/html/2605.06993#A5.SS4)\)\.

## 7Evaluation

Synthetic evaluation\.We evaluate Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)on two types of graphs: randomly generated graphs and several real\-world networks selected from the*bnlearn*222[https://www\.bnlearn\.com/bnrepository/](https://www.bnlearn.com/bnrepository/)repository\. For each graph, we randomly choose a query as well as a set of\|𝒜\|=200\|\\mathcal\{A\}\|=200candidate experiments\. We report the mean fraction of candidates pruned by the two core mechanisms of Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2): the ID check, and the interception rule \(Algorithm[3](https://arxiv.org/html/2605.06993#alg3)\); throughout,±\\pmdenotes one standard deviation across simulations\. For details on graph, query, and experiment generation as well as per\-graph results we refer to Appendix[H](https://arxiv.org/html/2605.06993#A8)\.

Erdős–Rényi random graphs:We sample ADMGs with\|𝑽\|∈\{10,15,20,30\}\|\\boldsymbol\{V\}\|\\in\\\{10,15,20,30\\\}observed nodes and vary the confounding density\|𝑼\|/\|𝑽\|\|\\boldsymbol\{U\}\|/\|\\boldsymbol\{V\}\|from roughly0\.250\.25to1\.01\.0\. Across1,5961\{,\}596simulations, Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)prunes on average74\.3%±19\.4%74\.3\\%\\pm 19\.4\\%of candidate experiments \(33\.0%±23\.2%33\.0\\%\\pm 23\.2\\%via the ID check,41\.3%±38\.5%41\.3\\%\\pm 38\.5\\%via interception\), reaching as much as90\.6%90\.6\\%in some simulations\. Figure[3](https://arxiv.org/html/2605.06993#S7.F3)reveals a clear pattern: as\|𝑼\|/\|𝑽\|\|\\boldsymbol\{U\}\|/\|\\boldsymbol\{V\}\|grows, interception weakens \(from70\.9%±31\.0%70\.9\\%\\pm 31\.0\\%at the sparsest setting to16\.5%±30\.0%16\.5\\%\\pm 30\.0\\%at the densest\)\. The fraction pruned by the ID check, in turn, increases with confounding density, since the ID algorithm is only executed on experiments that were not already pruned by interception\.

![Refer to caption](https://arxiv.org/html/2605.06993v1/images/evaluation/er_fraction_pruned.png)Figure 3:Erdős–Rényi graphs\. For different sizes of𝑽\\boldsymbol\{V\}, we plot the fraction of pruned experiments against the density\|𝑼\|\|𝑽\|\\frac\{\|\\boldsymbol\{U\}\|\}\{\|\\boldsymbol\{V\}\|\}\. The hatched orange area represents experiments pruned by the interception mechanism \(Algorithm[3](https://arxiv.org/html/2605.06993#alg3)\); the blue area represents those pruned by the ID check\. In total,1,5961\{,\}596simulations were performed\.bnlearn:We evaluate on 11 benchmark networks ranging from\|𝑽\|=8\|\\boldsymbol\{V\}\|\{=\}8\(ASIA\) to\|𝑽\|=76\|\\boldsymbol\{V\}\|\{=\}76\(WIN95PTS\)\. Across1,1001\{,\}100simulations, Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)prunes on average63\.4%±12\.4%63\.4\\%\\pm 12\.4\\%of candidate experiments \(47\.1%±14\.5%47\.1\\%\\pm 14\.5\\%via the ID check,16\.2%±17\.5%16\.2\\%\\pm 17\.5\\%via interception\)\. Figure[4](https://arxiv.org/html/2605.06993#S7.F4)displays the pruning rates per graph\. The contribution of the interception rule varies dramatically across graphs, ranging from2\.3%±4\.3%2\.3\\%\\pm 4\.3\\%onMILDEWto29\.2%±25\.5%29\.2\\%\\pm 25\.5\\%onWIN95PTS\.

![Refer to caption](https://arxiv.org/html/2605.06993v1/images/evaluation/bnlearn_per_graph.png)Figure 4:*bnlearn*\. For each graph, we plot the mean fraction of pruned experiments\. The hatched orange area represents experiments pruned by the interception mechanism \(Algorithm[3](https://arxiv.org/html/2605.06993#alg3)\); the blue area represents those pruned by the ID check\. Error bars indicate±1\\pm 1standard deviation of the total pruning rate across 100 simulations per graph\. In total,1,1001\{,\}100simulations were performed\.The high variability in the interception rate \(e\.g\.±25\.5%\\pm 25\.5\\%onWIN95PTS\) reflects its strong dependence on the specific topology\. Figure[9](https://arxiv.org/html/2605.06993#A8.F9)in Appendix[H\.5](https://arxiv.org/html/2605.06993#A8.SS5)shows the main structural driver of interception pruning\. When the query is localized \(small\|𝑹θ∗\|/\|𝑽\|\|\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\|/\|\\boldsymbol\{V\}\|\), most experiments fail to interact with𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}and are pruned immediately; as the hull grows, this mechanism weakens monotonically\. This matches the Erdős–Rényi findings: denser confounding produces larger districts, which enlarge𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}and shift the pruning burden onto the ID check\. Together, the two criteria yield substantial pruning across singleton candidates without solving a single polynomial program\.

Subset search space reduction:The pruning rates reported above determine the search space reduction when selecting a single optimal experiment, a practically important setting\. For the general subset search over2\|𝒜\|2^\{\|\\mathcal\{A\}\|\}subsets, Corollary[6\.15](https://arxiv.org/html/2605.06993#S6.Thmtheorem15)reduces the exponent by\|𝒜ID†\|\|\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}\|\(Appendix[E\.4](https://arxiv.org/html/2605.06993#A5.SS4)\)\. Running the ID check on all candidates independently \(without prior interception filtering\), it certifies61\.4%±11\.3%61\.4\\%\\pm 11\.3\\%on Erdős–Rényi and60\.0%±10\.8%60\.0\\%\\pm 10\.8\\%on*bnlearn*—for\|𝒜\|=200\|\\mathcal\{A\}\|=200, the exponent drops from200200to roughly8080on average\.

Physical Activity and Diabetes\.We demonstrate the full pipeline, pruning followed by potency computation, on observational data from NHANES 2017–2018\.333[https://wwwn\.cdc\.gov/nchs/nhanes/](https://wwwn.cdc.gov/nchs/nhanes/)

Graph and query:Five binary variables are drawn from the survey: balance problems \(AA\), falls \(BB\), health insurance \(ZZ\), physical activity \(XX\), and not diabetes \(YY\)\.444We invertYYfor better interpretability\.Three latent confounders could represent inner\-ear disorders \(U3U\_\{3\}, confoundingA↔BA\\leftrightarrow B\), healthcare access \(U1U\_\{1\}, confoundingZ↔XZ\\leftrightarrow X\), and genetic predisposition \(U2U\_\{2\}, confoundingX↔YX\\leftrightarrow Y\)\. The assumed causal structure is shown in Figure[10](https://arxiv.org/html/2605.06993#A9.F10)in Appendix[I](https://arxiv.org/html/2605.06993#A9)\. The query of interest isθ=ATE=P\(yx\)−P\(yx′\)\\theta=\\text\{ATE\}=P\(y\_\{x\}\)\-P\(y\_\{x^\{\\prime\}\}\), the average treatment effect of physical activity on not getting diabetes\. The observational data yields a bound width of𝒲θ=1\\mathcal\{W\}\_\{\\theta\}=1withθ∈\[−0\.47,0\.53\]\\theta\\in\[\-0\.47,0\.53\]\.

Experiments:In practice, an analyst need not commit the entire research budget at once; a natural workflow is to select a small optimal batch, execute it, and re\-solve with the realized outcome incorporated\. We consider the candidate experiments and combinations in Table[1](https://arxiv.org/html/2605.06993#S7.T1)\. Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)certifies𝒶1\\mathcal\{a\}\_\{1\}and𝒶2\\mathcal\{a\}\_\{2\}as useless; we compute potency for the rest\.

Table 1:Pruning and potency results\. Left: cost\-1 singleton experiments \(𝒶1,𝒶2\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{2\}pruned by ID and Interception, respectively\)\. Right: cost\-2 singletons and combinations\.ExperimentPruned?potθpot\_\{\\theta\}cost𝒶1\\mathcal\{a\}\_\{1\}P\(X∣do\(B\)\)P\(X\\mid do\(B\)\)ID011𝒶2\\mathcal\{a\}\_\{2\}P\(B∣do\(A\)\)P\(B\\mid do\(A\)\)Int\.011𝒶3\\mathcal\{a\}\_\{3\}P\(X,Y∣do\(z\)\)P\(X,Y\\mid do\(z\)\)–0\.120\.1211𝒶4\\mathcal\{a\}\_\{4\}P\(yx′,yx′\)P\(y\_\{x^\{\\prime\}\},y^\{\\prime\}\_\{x\}\)–0\.470\.4711
Experimentpotθpot\_\{\\theta\}cost𝒶5\\mathcal\{a\}\_\{5\}P\(Y∣do\(x′\)\)P\(Y\\mid do\(x^\{\\prime\}\)\)0\.460\.4622𝒶6\\mathcal\{a\}\_\{6\}P\(Y∣do\(x\)\)P\(Y\\mid do\(x\)\)0\.530\.5322\{𝒶4,𝒶6\}\\\{\\mathcal\{a\}\_\{4\},\\mathcal\{a\}\_\{6\}\\\}0\.530\.5333\{𝒶4,𝒶5\}\\\{\\mathcal\{a\}\_\{4\},\\mathcal\{a\}\_\{5\}\\\}0\.840\.8433

The solution to the max\-potency problem depends on the budgetBBcommitted per round\. ForB=1B=1, the optimal choice is\{𝒶4\}\\\{\\mathcal\{a\}\_\{4\}\\\}\. ForB=2B=2, we should carry out the positive intervention\{𝒶6\}\\\{\\mathcal\{a\}\_\{6\}\\\}\. ForB=3B=3, the most potent combination is\{𝒶4,𝒶5\}\\\{\\mathcal\{a\}\_\{4\},\\mathcal\{a\}\_\{5\}\\\}—even though the positive intervention is more potent by itself, in combination with𝒶4\\mathcal\{a\}\_\{4\}, the negative intervention is superior\. Committing toB=1B=1and findingP\(yx′,yx′\)=0P\(y\_\{x^\{\\prime\}\},y^\{\\prime\}\_\{x\}\)=0\(intuitively, physical activity cannot cause diabetes\) would yieldθ∈\[0,0\.53\]\\theta\\in\[0,0\.53\], after which the analyst re\-solves against the updated program\.

## 8Conclusion

We introduced the max\-potency problem—selecting cost\-constrained experiments to maximally tighten bounds on a partially identifiable causal query—defined*epistemic potency*as a worst\-case measure of informativeness, and showed the problem is NP\-hard\. Two graphical pruning criteria, a path\-interception rule and an ID check, together prune 50–88% of the super\-exponential candidate space for singleton selection\. For subset search, ID\-pruned experiments are combinatorially inert \(Corollary[6\.15](https://arxiv.org/html/2605.06993#S6.Thmtheorem15)\) and can be removed entirely, reducing the exponent by roughly 60% on average\. A natural extension is the sequential setting, where realized outcomes are incorporated into the base program after each round and the next optimal batch is selected against the updated bounds, as illustrated in Section[7](https://arxiv.org/html/2605.06993#S7)\.

Limitations\.i\) Our framework inherits limitations fromDuarteet al\.\([2023](https://arxiv.org/html/2605.06993#bib.bib13)\): known ADMG, discrete variables with finite support, exactP\(𝑽\)P\(\\boldsymbol\{V\}\), and possibly non\-sharp bounds if the nonconvex polynomial program is terminated early\. ii\) Our pruning criteria are sound but not complete: experiments not certified useless may still have zero potency, and interception\-pruned experiments may contribute non\-zero potency in subsets\. iii\) Our worst\-case bound\-width definition \(Section[3](https://arxiv.org/html/2605.06993#S3)\) guarantees tightening regardless of outcome; under alternative criteria \(Appendix[B](https://arxiv.org/html/2605.06993#A2)\) the optimal set may differ\.

## References

- E\. Ailer, N\. Dern, J\. Hartford, and N\. Kilbertus \(2024\)Targeted sequential indirect experiment design\.InAdvances in Neural Information Processing Systems 37 \(NeurIPS 2024\),Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p3.1),[§1](https://arxiv.org/html/2605.06993#S1.p2.1)\.
- S\. Akbari, J\. Etesami, and N\. Kiyavash \(2025\)Optimal experiment design for causal effect identification\.Journal of Machine Learning Research26,pp\. 1–56\.Note:Submitted 12/22; Revised 11/24; Published 2/25External Links:[Link](http://jmlr.org/papers/v26/22-1516.html)Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p3.1)\.
- J\. D\. Angrist, G\. W\. Imbens, and D\. B\. Rubin \(1996\)Identification of causal effects using instrumental variables\.Journal of the American statistical Association91\(434\),pp\. 444–455\.Cited by:[Example D\.1](https://arxiv.org/html/2605.06993#A4.Thmtheorem1.p2.8.8)\.
- A\. Balke and J\. Pearl \(1997\)Bounds on treatment effects from studies with imperfect compliance\.Journal of the American Statistical Association92\(439\),pp\. 1171–1176\.Cited by:[Definition A\.1](https://arxiv.org/html/2605.06993#A1.Thmtheorem1.p1.17.9),[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1),[§1](https://arxiv.org/html/2605.06993#S1.p1.1)\.
- E\. Bareinboim and J\. Pearl \(2012\)Causal inference by surrogate experiments: z\-identifiability\.InProceedings of the Twenty\-Eighth Conference on Uncertainty in Artificial Intelligence,pp\. 113–120\.Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p2.8)\.
- J\. Duarte, S\. Lee, K\. Mohan, and I\. Shpitser \(2023\)An automated approach to causal inference in discrete settings\.Journal of the American Statistical Association118\(543\),pp\. 1778–1792\.External Links:[Document](https://dx.doi.org/10.1080/01621459.2022.2063540),[Link](https://doi.org/10.1080/01621459.2022.2063540)Cited by:[Proposition A\.2](https://arxiv.org/html/2605.06993#A1.Thmtheorem2),[Appendix A](https://arxiv.org/html/2605.06993#A1.p1.1),[§E\.1](https://arxiv.org/html/2605.06993#A5.SS1.p1.1),[§E\.1](https://arxiv.org/html/2605.06993#A5.SS1.p2.2),[Appendix F](https://arxiv.org/html/2605.06993#A6.SS0.SSS0.Px1.p1.6),[§F\.6](https://arxiv.org/html/2605.06993#A6.SS6.1.p1.7),[2nd item](https://arxiv.org/html/2605.06993#S1.I1.i2.p1.1),[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1),[§2](https://arxiv.org/html/2605.06993#S2.p5.11),[§3](https://arxiv.org/html/2605.06993#S3.p1.6),[§5](https://arxiv.org/html/2605.06993#S5.p1.2),[§5](https://arxiv.org/html/2605.06993#S5.p2.19),[§6\.1](https://arxiv.org/html/2605.06993#S6.SS1.p1.5),[§6\.1](https://arxiv.org/html/2605.06993#S6.SS1.p5.3),[§8](https://arxiv.org/html/2605.06993#S8.p2.1)\.
- S\. Elahi, S\. Akbari, J\. Etesami, N\. Kiyavash, and P\. Thiran \(2024\)Fast proxy experiment design for causal effect identification\.InAdvances in Neural Information Processing Systems 38 \(NeurIPS 2024\),Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p3.1)\.
- C\. E\. Frangakis and D\. B\. Rubin \(2002\)Principal stratification in causal inference\.Biometrics58\(1\),pp\. 21–29\.External Links:[Document](https://dx.doi.org/10.1111/j.0006-341X.2002.00021.x)Cited by:[Definition A\.1](https://arxiv.org/html/2605.06993#A1.Thmtheorem1.p1.17.9)\.
- S\. Greenland and J\. M\. Robins \(1986\)Identifiability, exchangeability, and epidemiological confounding\.International Journal of Epidemiology15\(3\),pp\. 413–419\.External Links:ISSN 0300\-5771,[Document](https://dx.doi.org/10.1093/ije/15.3.413),[Link](https://doi.org/10.1093/ije/15.3.413),https://academic\.oup\.com/ije/article\-pdf/15/3/413/1873024/15\-3\-413\.pdfCited by:[Example D\.1](https://arxiv.org/html/2605.06993#A4.Thmtheorem1.p2.8.8)\.
- Y\. Kivva, E\. Mokhtarian, J\. Etesami, and N\. Kiyavash \(2022\)Revisiting the general identifiability problem\.InProceedings of the Thirty\-Eighth Conference on Uncertainty in Artificial Intelligence \(UAI\),Vol\.180,pp\. 1022–1030\.Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p2.8)\.
- S\. Lee, J\. D\. Correa, and E\. Bareinboim \(2019\)General identifiability with arbitrary surrogate experiments\.InProceedings of the Thirty\-Fifth Conference on Uncertainty in Artificial Intelligence \(UAI\),Vol\.115,pp\. 389–398\.Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p2.8)\.
- A\. Li, S\. Mueller, X\. Shu, and J\. Pearl \(2026\)ϵ\\epsilon\-Identifiability of causal quantities\.InProceedings of the International Conference on Artificial Intelligence and Statistics \(AISTATS\),Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.
- S\. Mueller and J\. Pearl \(2022\)Personalized decision making – a conceptual introduction\.Technical reportTechnical ReportR\-513,Department of Computer Science, University of California, Los Angeles\.Cited by:[§1](https://arxiv.org/html/2605.06993#S1.p1.1)\.
- J\. Pearl \(1999\)Probabilities of causation: three counterfactual interpretations and their identification\.Synthese121,pp\. 93–149\.Cited by:[§1](https://arxiv.org/html/2605.06993#S1.p1.1)\.
- A\. Raghavan and E\. Bareinboim \(2025\)Counterfactual realizability\.InThe Thirteenth International Conference on Learning Representations \(ICLR\),Note:SpotlightCited by:[§3](https://arxiv.org/html/2605.06993#S3.p2.12)\.
- M\. C\. Sachs, G\. Jonzon, A\. Sjölander, and E\. E\. Gabriel \(2023\)A General Method for Deriving Tight Symbolic Bounds on Causal Effects\.Journal of Computational and Graphical Statistics32\(2\),pp\. 567–576\(en\)\.External Links:ISSN 1061\-8600, 1537\-2715,[Link](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2071905),[Document](https://dx.doi.org/10.1080/10618600.2022.2071905)Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.
- I\. Shpitser and J\. Pearl \(2006\)Identification of joint interventional distributions in recursive semi\-markovian causal models\.InProceedings of the 21st National Conference on Artificial Intelligence \(AAAI\),Vol\.21,pp\. 1219–1226\.Cited by:[3rd item](https://arxiv.org/html/2605.06993#S1.I1.i3.p1.1),[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p2.8),[§6\.2](https://arxiv.org/html/2605.06993#S6.SS2.p2.1)\.
- M\. Shridharan and G\. Iyengar \(2023a\)Causal bounds in quasi\-Markovian graphs\.InProceedings of the 40th International Conference on Machine Learning,A\. Krause, E\. Brunskill, K\. Cho, B\. Engelhardt, S\. Sabato, and J\. Scarlett \(Eds\.\),Proceedings of Machine Learning Research, Vol\.202,pp\. 31675–31692\.External Links:[Link](https://proceedings.mlr.press/v202/shridharan23a.html)Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.
- M\. Shridharan and G\. Iyengar \(2023b\)Scalable computation of causal bounds\.Journal of Machine Learning Research24\(237\),pp\. 1–35\.Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.
- J\. Tian and J\. Pearl \(2000\)Probabilities of causation: Bounds and identification\.Annals of Mathematics and Artificial Intelligence28\(1\),pp\. 287–313\.External Links:ISSN 1573\-7470,[Link](https://doi.org/10.1023/A:1018912507879),[Document](https://dx.doi.org/10.1023/A%3A1018912507879)Cited by:[§C\.1](https://arxiv.org/html/2605.06993#A3.SS1.p1.2),[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1),[§1](https://arxiv.org/html/2605.06993#S1.p1.1),[Example 3\.1](https://arxiv.org/html/2605.06993#S3.Thmtheorem1.p2.9),[Example 3\.5](https://arxiv.org/html/2605.06993#S3.Thmtheorem5.p1.4.4)\.
- J\. Tian and J\. Pearl \(2002\)A general identification condition for causal effects\.InEighteenth National Conference on Artificial Intelligence,USA,pp\. 567–573\.External Links:ISBN 0262511290Cited by:[§F\.4](https://arxiv.org/html/2605.06993#A6.SS4.1.p1.1),[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1),[Definition 6\.6](https://arxiv.org/html/2605.06993#S6.Thmtheorem6)\.
- M\. Zaffalon, A\. Antonucci, and R\. Cabañas \(2020\)Structural causal models are \(solvable by\) credal networks\.InProceedings of the 10th International Conference on Probabilistic Graphical Models,M\. Jaeger and T\. D\. Nielsen \(Eds\.\),Proceedings of Machine Learning Research, Vol\.138,pp\. 581–592\.External Links:[Link](https://proceedings.mlr.press/v138/zaffalon20a.html)Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.
- J\. Zhang, J\. Tian, and E\. Bareinboim \(2022\)Partial counterfactual identification from observational and experimental data\.InProceedings of the 39th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.162,pp\. 26548–26558\.Cited by:[§1\.1](https://arxiv.org/html/2605.06993#S1.SS1.p4.1)\.

Appendix

## Appendix ADefinition of Response\-Type Variables

FollowingDuarteet al\.\[[2023](https://arxiv.org/html/2605.06993#bib.bib13)\], we define response\-type variables per disturbance and assume the given ADMG𝒢\\mathcal\{G\}to be*canonical*without loss of generality \(see their paper for details on canonicalization\)\.

###### Definition A\.1\(Response\-Type Variable\)\.

Let𝒢\\mathcal\{G\}be a canonical ADMG with discrete observed variables𝐕\\boldsymbol\{V\}and latent disturbances𝐔\\boldsymbol\{U\}\. For a disturbanceUk∈𝐔U\_\{k\}\\in\\boldsymbol\{U\}with observed children𝐜𝐡\(Uk\)⊆𝐕\\boldsymbol\{ch\}\(U\_\{k\}\)\\subseteq\\boldsymbol\{V\}, declare two valuesuk,uk′∈supp\(Uk\)u\_\{k\},u\_\{k\}^\{\\prime\}\\in supp\(U\_\{k\}\)equivalent, writtenuk∼kuk′u\_\{k\}\\sim\_\{k\}u\_\{k\}^\{\\prime\}, if for everyV∈𝐜𝐡\(Uk\)V\\in\\boldsymbol\{ch\}\(U\_\{k\}\)the partially applied structural equation

fV\(uk\):∏P∈𝑷𝒂\(V\)∖\{Uk\}supp\(P\)→supp\(V\)f\_\{V\}^\{\(u\_\{k\}\)\}:\\prod\_\{P\\in\\boldsymbol\{Pa\}\(V\)\\setminus\\\{U\_\{k\}\\\}\}supp\(P\)\\;\\to\\;supp\(V\)is identical underuku\_\{k\}anduk′u\_\{k\}^\{\\prime\}\. The equivalence classessupp\(Uk\)/∼ksupp\(U\_\{k\}\)/\\\!\\sim\_\{k\}are the*generalized principal strata*ofUkU\_\{k\}\[Frangakis and Rubin,[2002](https://arxiv.org/html/2605.06993#bib.bib32), Balke and Pearl,[1997](https://arxiv.org/html/2605.06993#bib.bib11)\], and the*response\-type variable*RkR\_\{k\}associated withUkU\_\{k\}is the discrete categorical variable whose support is exactly these classes\. The induced distributionP\(Rk\)P\(R\_\{k\}\)aggregatesP\(Uk\)P\(U\_\{k\}\)over each class\. We write𝐑=\{Rk\}Uk∈𝐔\\boldsymbol\{R\}=\\\{R\_\{k\}\\\}\_\{U\_\{k\}\\in\\boldsymbol\{U\}\}for the collection of all response\-type variables\.

###### Proposition A\.2\(Duarteet al\.,[2023](https://arxiv.org/html/2605.06993#bib.bib13), Proposition 1\)\.

Replacing eachUkU\_\{k\}in the canonical SCM with its response\-type variableRkR\_\{k\}yields an equivalent SCM with respect to the full data law \(every factual, interventional, and counterfactual joint distribution is preserved\)\.

## Appendix BAlternative Criteria for𝒲θ\\mathcal\{W\}\_\{\\theta\}

In Section[3](https://arxiv.org/html/2605.06993#S3), we defined𝒲θ\(𝒶\)\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)using the worst\-case \(maximum\) over feasible experimental outcomes\. Here, we briefly discuss two alternative criteria and compare them on our running example\.

#### Option 1: Expected width\.

One could assume a distribution over the unknown experimental outcome, e\.g\.p𝒶∼Uni\(lp𝒶,up𝒶\)p\_\{\\mathcal\{a\}\}\\sim\\mathrm\{Uni\}\(l\_\{p\_\{\\mathcal\{a\}\}\},u\_\{p\_\{\\mathcal\{a\}\}\}\), and define

𝒲θ\(𝒶\)=𝔼\[Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\]=1up𝒶−lp𝒶∫lp𝒶up𝒶Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)dp\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\mathbb\{E\}\[U\_\{\\theta\}\(\\mathcal\{a\},\{p\_\{\\mathcal\{a\}\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},\{p\_\{\\mathcal\{a\}\}\}\)\]=\\frac\{1\}\{u\_\{p\_\{\\mathcal\{a\}\}\}\-l\_\{p\_\{\\mathcal\{a\}\}\}\}\\int\_\{l\_\{p\_\{\\mathcal\{a\}\}\}\}^\{u\_\{p\_\{\\mathcal\{a\}\}\}\}U\_\{\\theta\}\(\\mathcal\{a\},\{p\_\{\\mathcal\{a\}\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},\{p\_\{\\mathcal\{a\}\}\}\)\\,dpfor a singleton𝒶∈𝒜\\mathcal\{a\}\\in\\mathcal\{A\}as the expected width\.

#### Option 2: Best\-case width\.

One may change the maximum to a minimum in the original definition and define

𝒲θ\(𝒶\)=minp𝒶∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]\(Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\)\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\underset\{\\begin\{subarray\}\{c\}p\_\{\\mathcal\{a\}\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]\\end\{subarray\}\}\{\\min\}\\bigl\(U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\bigr\)as the smallest possible bound width an experiment could yield\.

#### Comparison on the running example\.

Table 2:Comparison of alternative width criteria on the running example\.Table[2](https://arxiv.org/html/2605.06993#A2.T2)displays alternative definitions of𝒲θ\\mathcal\{W\}\_\{\\theta\}under the setting of Example[3\.1](https://arxiv.org/html/2605.06993#S3.Thmtheorem1)\. Notably, at\(p𝒶1,p𝒶2\)=\(0\.2,0\.6\)\(p\_\{\\mathcal\{a\}\_\{1\}\},p\_\{\\mathcal\{a\}\_\{2\}\}\)=\(0\.2,0\.6\), the bounds collapse toPNS=0\\text\{PNS\}=0; the PNS is not identifiable from observational data alone, but becomes point\-identified for this specific experiment realization\.

Under all three criteria,𝒶3\\mathcal\{a\}\_\{3\}yields the smallest bound width; among singleton experiments, both𝒶1\\mathcal\{a\}\_\{1\}and𝒶2\\mathcal\{a\}\_\{2\}are equivalent under the worst\-case criterion, while𝒶1\\mathcal\{a\}\_\{1\}is strictly preferred under the alternatives\. Only the worst\-case criterion provides a*guaranteed*reduction in bound width regardless of the experimental outcome, which is why we adopt it throughout this paper\.

## Appendix CNP\-hardness of the Max\-Potency Problem

In this appendix we prove Theorem[4\.1](https://arxiv.org/html/2605.06993#S4.Thmtheorem1)\(max\-potency is NP\-hard\) via a polynomial\-time reduction from the 0\-1 knapsack problem\. The construction uses a basic confounded treatment\-outcome pair as an atomic unit, which we call a*TP\-cell*\(after Tian and Pearl, who gave the closed\-form sharp bounds we rely on\)\. Disjoint copies of this cell, one per knapsack item, yield a max\-potency instance whose potency function is additive across cells, so that the knapsack optimum coincides with the max\-potency optimum\.

### C\.1The atomic unit: TP\-cells

###### Definition C\.1\(TP\-cell\)\.

A*TP\-cell*𝒞i\\mathcal\{C\}\_\{i\}is the ADMG with binary observed variablesXi,YiX\_\{i\},Y\_\{i\}, a directed edgeXi→YiX\_\{i\}\\to Y\_\{i\}, and a bidirected edgeXi↔YiX\_\{i\}\\leftrightarrow Y\_\{i\}, equipped with an observational distributionPi\(Xi,Yi\)P\_\{i\}\(X\_\{i\},Y\_\{i\}\)parameterized by

αi=Pi\(xi,yi\),βi=Pi\(xi,yi′\),γi=Pi\(xi′,yi\),δi=Pi\(xi′,yi′\),\\alpha\_\{i\}=P\_\{i\}\(x\_\{i\},y\_\{i\}\),\\quad\\beta\_\{i\}=P\_\{i\}\(x\_\{i\},y\_\{i\}^\{\\prime\}\),\\quad\\gamma\_\{i\}=P\_\{i\}\(x\_\{i\}^\{\\prime\},y\_\{i\}\),\\quad\\delta\_\{i\}=P\_\{i\}\(x\_\{i\}^\{\\prime\},y\_\{i\}^\{\\prime\}\),with\(αi,βi,γi,δi\)∈Δ\(\\alpha\_\{i\},\\beta\_\{i\},\\gamma\_\{i\},\\delta\_\{i\}\)\\in\\Delta, where

Δ:=\{\(α,β,γ,δ\)∈ℝ≥04:α\+β\+γ\+δ=1\}\\Delta:=\\bigl\\\{\(\\alpha,\\beta,\\gamma,\\delta\)\\in\\mathbb\{R\}\_\{\\geq 0\}^\{4\}:\\alpha\+\\beta\+\\gamma\+\\delta=1\\bigr\\\}denotes the probability simplex over the four joint outcomes of\(Xi,Yi\)\(X\_\{i\},Y\_\{i\}\)\.

XiX\_\{i\}YiY\_\{i\}Figure 5:A TP\-cell𝒞i\\mathcal\{C\}\_\{i\}: a confounded treatment\-outcome pair\.The query of interest within a single cell is the probability of necessity and sufficiency,

PNSi=P\(yi,xi,yi,xi′′\)\.\\mathrm\{PNS\}\_\{i\}=P\(y\_\{i,x\_\{i\}\},y^\{\\prime\}\_\{i,x\_\{i\}^\{\\prime\}\}\)\.For a TP\-cell,Tian and Pearl \[[2000](https://arxiv.org/html/2605.06993#bib.bib16)\]give the sharp observational bounds

0≤PNSi≤αi\+δi,0\\leq\\mathrm\{PNS\}\_\{i\}\\leq\\alpha\_\{i\}\+\\delta\_\{i\},so that the observational worst\-case width is

𝒲i\(∅\)=αi\+δi\.\\mathcal\{W\}\_\{i\}\(\\emptyset\)=\\alpha\_\{i\}\+\\delta\_\{i\}\.
We writepoti\(𝒶i\)=𝒲i\(∅\)−𝒲i\(𝒶i\)pot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)=\\mathcal\{W\}\_\{i\}\(\\emptyset\)\-\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\}\)for the per\-cell potency, in line with Definition[3\.2](https://arxiv.org/html/2605.06993#S3.Thmtheorem2)\. Furthermore, we useW\(⋅\)\(𝒶,p𝒶\):=U\(⋅\)\(𝒶,p𝒶\)−L\(⋅\)\(𝒶,p𝒶\)W\_\{\(\\cdot\)\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\):=U\_\{\(\\cdot\)\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\(\\cdot\)\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)to denote the instantiated bound width\.

### C\.2The construction

Given a 0\-1 knapsack instance with valuesv1,…,vn≥0v\_\{1\},\\dots,v\_\{n\}\\geq 0, weightsw1,…,wn≥0w\_\{1\},\\dots,w\_\{n\}\\geq 0, and budgetB≥0B\\geq 0, we construct a max\-potency instance as follows\.

1. 1\.Graph\.Let𝒢\\mathcal\{G\}be the disjoint union ofnnTP\-cells𝒞1,…,𝒞n\\mathcal\{C\}\_\{1\},\\dots,\\mathcal\{C\}\_\{n\}\. The observed variables are𝑽=\{X1,Y1,…,Xn,Yn\}\\boldsymbol\{V\}=\\\{X\_\{1\},Y\_\{1\},\\dots,X\_\{n\},Y\_\{n\}\\\}\.
2. 2\.Observational distribution\.Choose per\-cell distributionsPi\(Xi,Yi\)P\_\{i\}\(X\_\{i\},Y\_\{i\}\)such thatpoti\(𝒶i\)=vipot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)=v\_\{i\}, where𝒶i=P\(yi,xi\)\\mathcal\{a\}\_\{i\}=P\(y\_\{i,x\_\{i\}\}\)\. Constructive achievability is established in Lemma[C\.3](https://arxiv.org/html/2605.06993#A3.Thmtheorem3)below; if necessary, rescale the valuesviv\_\{i\}so that they lie in the achievable range\. SetP\(𝑽\)=∏i=1nPi\(Xi,Yi\)P\(\\boldsymbol\{V\}\)=\\prod\_\{i=1\}^\{n\}P\_\{i\}\(X\_\{i\},Y\_\{i\}\)\.
3. 3\.Query\.Let θ=∑i=1nPNSi=∑i=1nP\(yi,xi,yi,xi′′\)\.\\theta=\\sum\_\{i=1\}^\{n\}\\mathrm\{PNS\}\_\{i\}=\\sum\_\{i=1\}^\{n\}P\(y\_\{i,x\_\{i\}\},y^\{\\prime\}\_\{i,x\_\{i\}^\{\\prime\}\}\)\.
4. 4\.Experiment set\.Let𝒜=\{𝒶1,…,𝒶n\}\\mathcal\{A\}=\\\{\\mathcal\{a\}\_\{1\},\\dots,\\mathcal\{a\}\_\{n\}\\\}with𝒶i=P\(yi,xi\)\\mathcal\{a\}\_\{i\}=P\(y\_\{i,x\_\{i\}\}\)\. For anyS⊆\{1,…,n\}S\\subseteq\\\{1,\\dots,n\\\}, we write𝒶S:=\{𝒶i:i∈S\}\\mathcal\{a\}\_\{S\}:=\\\{\\mathcal\{a\}\_\{i\}:i\\in S\\\}\.
5. 5\.Cost function and budget\.Setc\(𝒶S\)=∑i∈Swic\(\\mathcal\{a\}\_\{S\}\)=\\sum\_\{i\\in S\}w\_\{i\}forS⊆\{1,…,n\}S\\subseteq\\\{1,\\dots,n\\\}, and use the original budgetBB\.

The construction is polynomial innn: each TP\-cell is a constant\-size object, and Lemma[C\.3](https://arxiv.org/html/2605.06993#A3.Thmtheorem3)produces the requiredPiP\_\{i\}in time polynomial in the bit\-length ofviv\_\{i\}\.

### C\.3Additivity of potency across cells

The key technical claim is that the potency of any subset𝒶S\\mathcal\{a\}\_\{S\}decomposes as a sum of per\-cell potencies\. This follows from the district structure of𝒢\\mathcal\{G\}via the machinery developed in Section[6](https://arxiv.org/html/2605.06993#S6)\.

###### Lemma C\.2\(Additivity\)\.

For anyS⊆\{1,…,n\}S\\subseteq\\\{1,\\dots,n\\\},

potθ\(𝒶S\)=∑i∈Spoti\(𝒶i\)\.pot\_\{\\theta\}\(\\mathcal\{a\}\_\{S\}\)\\;=\\;\\sum\_\{i\\in S\}pot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)\.

###### Proof\.

Each TP\-cell𝒞i\\mathcal\{C\}\_\{i\}is a connected component of the bidirected graph and shares no variables with any other cell, so the districts of𝒢\\mathcal\{G\}are exactly𝒞1,…,𝒞n\\mathcal\{C\}\_\{1\},\\dots,\\mathcal\{C\}\_\{n\}\. LetRiR\_\{i\}denote the response\-types of cellii\. By Lemma[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5)\(Cartesian product decomposition\),h\(𝒗\)=∏i=1nhi\(vi\)h\(\\boldsymbol\{v\}\)=\\prod\_\{i=1\}^\{n\}h\_\{i\}\(v\_\{i\}\)for any𝒗=\(v1,…,vn\)∈supp\(𝑽\)\\boldsymbol\{v\}=\(v\_\{1\},\\dots,v\_\{n\}\)\\in\\mathrm\{supp\}\(\\boldsymbol\{V\}\), and response\-types in distinct cells are independent:P\(𝒓\)=∏i=1nP\(ri\)P\(\\boldsymbol\{r\}\)=\\prod\_\{i=1\}^\{n\}P\(r\_\{i\}\)\.

We now show that the polynomial program𝒫𝒶S\\mathcal\{P\}\_\{\\mathcal\{a\}\_\{S\}\}decomposes intonnindependent per\-cell programs\.

#### Objective separates\.

EachPNSi\\mathrm\{PNS\}\_\{i\}involves only interventions, outcomes, and response\-types of cellii, so the target polynomial𝒯=∑iPNSi\\mathcal\{T\}=\\sum\_\{i\}\\mathrm\{PNS\}\_\{i\}is a sum where theii\-th summand depends only onRiR\_\{i\}\.

#### Observational constraints separate\.

The joint constraintP\(𝒗\)=∑𝒓∈h\(𝒗\)P\(𝒓\)P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\)is implied by the per\-cell constraintsPi\(vi\)=∑ri∈hi\(vi\)P\(ri\)P\_\{i\}\(v\_\{i\}\)=\\sum\_\{r\_\{i\}\\in h\_\{i\}\(v\_\{i\}\)\}P\(r\_\{i\}\)fori=1,…,ni=1,\\dots,n, sinceP\(𝒗\)=∏iPi\(vi\)P\(\\boldsymbol\{v\}\)=\\prod\_\{i\}P\_\{i\}\(v\_\{i\}\)andh\(𝒗\)=∏ihi\(vi\)h\(\\boldsymbol\{v\}\)=\\prod\_\{i\}h\_\{i\}\(v\_\{i\}\)\.

#### Experimental constraints separate\.

Each constraintf𝒶i\(𝑹\)=p𝒶if\_\{\\mathcal\{a\}\_\{i\}\}\(\\boldsymbol\{R\}\)=p\_\{\\mathcal\{a\}\_\{i\}\}involves onlyRiR\_\{i\}\.

Hence𝒫𝒶S\\mathcal\{P\}\_\{\\mathcal\{a\}\_\{S\}\}decomposes as

Uθ\(𝒶S,p𝒶S\)=∑i∈SUi\(𝒶i,p𝒶i\)\+∑i∉SUi\(∅\),U\_\{\\theta\}\(\\mathcal\{a\}\_\{S\},p\_\{\\mathcal\{a\}\_\{S\}\}\)=\\sum\_\{i\\in S\}U\_\{i\}\(\\mathcal\{a\}\_\{i\},p\_\{\\mathcal\{a\}\_\{i\}\}\)\+\\sum\_\{i\\notin S\}U\_\{i\}\(\\emptyset\),and similarly forLθL\_\{\\theta\}, giving the instantiated width

Wθ\(𝒶S,p𝒶S\)=∑i∈SWi\(𝒶i,p𝒶i\)\+∑i∉SWi\(∅\)\.W\_\{\\theta\}\(\\mathcal\{a\}\_\{S\},p\_\{\\mathcal\{a\}\_\{S\}\}\)=\\sum\_\{i\\in S\}W\_\{i\}\(\\mathcal\{a\}\_\{i\},p\_\{\\mathcal\{a\}\_\{i\}\}\)\+\\sum\_\{i\\notin S\}W\_\{i\}\(\\emptyset\)\.
The worst\-case is taken over the product domainp𝒶S∈∏i∈S\[lp𝒶i,up𝒶i\]p\_\{\\mathcal\{a\}\_\{S\}\}\\in\\prod\_\{i\\in S\}\[l\_\{p\_\{\\mathcal\{a\}\_\{i\}\}\},u\_\{p\_\{\\mathcal\{a\}\_\{i\}\}\}\]\. Since the sum is separable in thep𝒶ip\_\{\\mathcal\{a\}\_\{i\}\}, the maximum decomposes:

𝒲θ\(𝒶S\)=maxp𝒶S⁡Wθ\(𝒶S,p𝒶S\)=∑i∈S𝒲i\(𝒶i\)\+∑i∉SWi\(∅\)\.\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\_\{S\}\)=\\max\_\{p\_\{\\mathcal\{a\}\_\{S\}\}\}W\_\{\\theta\}\(\\mathcal\{a\}\_\{S\},p\_\{\\mathcal\{a\}\_\{S\}\}\)=\\sum\_\{i\\in S\}\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\}\)\+\\sum\_\{i\\notin S\}W\_\{i\}\(\\emptyset\)\.ForS=∅S=\\emptysetthis gives𝒲θ\(∅\)=∑iWi\(∅\)\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)=\\sum\_\{i\}W\_\{i\}\(\\emptyset\)\. Combining,

potθ\(𝒶S\)\\displaystyle pot\_\{\\theta\}\(\\mathcal\{a\}\_\{S\}\)=𝒲θ\(∅\)−𝒲θ\(𝒶S\)\\displaystyle=\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)\-\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\_\{S\}\)=∑iWi\(∅\)−∑i∈S𝒲i\(𝒶i\)−∑i∉SWi\(∅\)\\displaystyle=\\sum\_\{i\}W\_\{i\}\(\\emptyset\)\-\\sum\_\{i\\in S\}\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\}\)\-\\sum\_\{i\\notin S\}W\_\{i\}\(\\emptyset\)=∑i∈S\[Wi\(∅\)−𝒲i\(𝒶i\)\]=∑i∈Spoti\(𝒶i\)\.∎\\displaystyle=\\sum\_\{i\\in S\}\\bigl\[W\_\{i\}\(\\emptyset\)\-\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\}\)\\bigr\]=\\sum\_\{i\\in S\}pot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)\.\\qed

### C\.4Achievability of target potencies

###### Lemma C\.3\(Constructive achievability\)\.

There existsv∗\>0v^\{\*\}\>0and a polynomial\-time algorithm that, given any rationalv∈\[0,v∗\]v\\in\[0,v^\{\*\}\], outputs rational parameters\(α,β,γ,δ\)∈Δ\(\\alpha,\\beta,\\gamma,\\delta\)\\in\\Deltawithpoti\(𝒶i\)=vpot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)=v\.

###### Proof\.

We exhibit a one\-parameter family of cells indexed directly byvv\. Forv∈\[0,1/8\]v\\in\[0,\\,1/8\], define

Pv:=\(α,β,γ,δ\)=\(1−8v,v,3v,4v\)\.P^\{v\}:=\\bigl\(\\alpha,\\,\\beta,\\,\\gamma,\\,\\delta\\bigr\)=\\bigl\(1\-8v,\\;v,\\;3v,\\;4v\\bigr\)\.All four entries are non\-negative on\[0,1/8\]\[0,\\,1/8\]and sum to11, soPv∈ΔP^\{v\}\\in\\Delta\. We will showpoti\(𝒶i;Pv\)=vpot\_\{i\}\(\\mathcal\{a\}\_\{i\};\\,P^\{v\}\)=von this range, and setv∗:=0\.1v^\{\*\}:=0\.1\.

#### Observational width\.

By the Tian–Pearl observational bound stated above,

Wi\(∅;Pv\)=α\+δ=1−4v\.W\_\{i\}\(\\emptyset;\\,P^\{v\}\)=\\alpha\+\\delta=1\-4v\.\(1\)

#### Width with the experimental constraint\.

We now use the Tian–Pearl bounds for PNS when the interventional marginalp=P\(yi,xi\)p=P\(y\_\{i,x\_\{i\}\}\)is known\. With the experimental constraint𝒶i=P\(yi,xi\)=p\\mathcal\{a\}\_\{i\}=P\(y\_\{i,x\_\{i\}\}\)=padded to the observational data, the bounds still involve the unobserved interventional marginal

q:=P\(yi,xi′\)\.q:=P\(y\_\{i,x\_\{i\}^\{\\prime\}\}\)\.Sinceqqis not part of the allowed experiment set in the reduction, it is optimized out over its observationally feasible range

q∈\[γ,1−δ\]=\[3v,1−4v\]\.q\\in\[\\gamma,1\-\\delta\]=\[3v,1\-4v\]\.Thus the sharp lower and upper bounds onPNSi\\mathrm\{PNS\}\_\{i\}, conditional on observingpp, are obtained as

Li\(𝒶i,p\)\\displaystyle L\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)=minq∈\[3v,1−4v\]⁡max⁡\{0,p−q,P\(y\)−q,p−P\(y\)\},\\displaystyle=\\min\_\{q\\in\[3v,\\,1\-4v\]\}\\max\\bigl\\\{0,\\;p\-q,\\;P\(y\)\-q,\\;p\-P\(y\)\\bigr\\\},Ui\(𝒶i,p\)\\displaystyle U\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)=maxq∈\[3v,1−4v\]⁡min⁡\{p,1−q,α\+δ,p−q\+β\+γ\}\.\\displaystyle=\\max\_\{q\\in\[3v,\\,1\-4v\]\}\\min\\bigl\\\{p,\\;1\-q,\\;\\alpha\+\\delta,\\;p\-q\+\\beta\+\\gamma\\bigr\\\}\.Substituting the marginalsP\(y\)=α\+γ=1−5vP\(y\)=\\alpha\+\\gamma=1\-5v,α\+δ=1−4v\\alpha\+\\delta=1\-4v, andβ\+γ=4v\\beta\+\\gamma=4v:

*Lower bound\.*The expressionmax⁡\{0,p−q,\(1−5v\)−q,p−\(1−5v\)\}\\max\\\{0,\\,p\-q,\\,\(1\-5v\)\-q,\\,p\-\(1\-5v\)\\\}is non\-increasing inqq\(each term is either independent ofqqor non\-increasing inqq\), so the minimum overq∈\[3v,1−4v\]q\\in\[3v,\\,1\-4v\]is attained atq=1−4vq=1\-4v\. Substituting,

Li\(𝒶i,p\)=max⁡\{0,p\+4v−1,−v,p\+5v−1\}=max⁡\{0,p\+5v−1\},L\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)=\\max\\bigl\\\{0,\\;p\+4v\-1,\\;\-v,\\;p\+5v\-1\\bigr\\\}=\\max\\bigl\\\{0,\\;p\+5v\-1\\bigr\\\},where we drop−v≤0\-v\\leq 0and notep\+5v−1≥p\+4v−1p\+5v\-1\\geq p\+4v\-1\.

*Upper bound\.*The expressionmin⁡\{p,1−q,1−4v,p−q\+4v\}\\min\\\{p,\\,1\-q,\\,1\-4v,\\,p\-q\+4v\\\}is non\-decreasing in\(−q\)\(\-q\), so the maximum overq∈\[3v,1−4v\]q\\in\[3v,\\,1\-4v\]is attained atq=3vq=3v\. Substituting,

Ui\(𝒶i,p\)=min⁡\{p,1−3v,1−4v,p\+v\}=min⁡\{p,1−4v\},U\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)=\\min\\bigl\\\{p,\\;1\-3v,\\;1\-4v,\\;p\+v\\bigr\\\}=\\min\\bigl\\\{p,\\;1\-4v\\bigr\\\},since1−4v≤1−3v1\-4v\\leq 1\-3vandp≤p\+vp\\leq p\+v\.

The bound widthWi\(𝒶i,p;Pv\)=Ui\(𝒶i,p\)−Li\(𝒶i,p\)W\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p;\\,P^\{v\}\)=U\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)\-L\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p\)follows by case analysis onp∈\[1−8v,1−v\]p\\in\[1\-8v,\\,1\-v\]:

Wi\(𝒶i,p;Pv\)=\{p,p∈\[1−8v,1−5v\],1−5v,p∈\[1−5v,1−4v\],\(2−9v\)−p,p∈\[1−4v,1−v\]\.W\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p;\\,P^\{v\}\)=\\begin\{cases\}p,&p\\in\[\\,1\-8v,\\;1\-5v\\,\],\\\\\[2\.0pt\] 1\-5v,&p\\in\[\\,1\-5v,\\;1\-4v\\,\],\\\\\[2\.0pt\] \(2\-9v\)\-p,&p\\in\[\\,1\-4v,\\;1\-v\\,\]\.\\end\{cases\}\(2\)The functionp↦Wi\(𝒶i,p;Pv\)p\\mapsto W\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p;\\,P^\{v\}\)is continuous and piecewise\-linear: it rises with slope\+1\+1fromW=1−8vW=1\-8vtoW=1−5vW=1\-5v, plateaus at1−5v1\-5von\[1−5v,1−4v\]\[1\-5v,\\,1\-4v\], then falls with slope−1\-1toW=1−8vW=1\-8v\. The maximum is attained on the plateau, giving

𝒲i\(𝒶i;Pv\)=maxp∈\[1−8v,1−v\]⁡Wi\(𝒶i,p;Pv\)=1−5v\.\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\};\\,P^\{v\}\)=\\max\_\{p\\in\[1\-8v,\\,1\-v\]\}W\_\{i\}\(\\mathcal\{a\}\_\{i\},\\,p;\\,P^\{v\}\)=1\-5v\.\(3\)

#### Potency\.

Combining \([1](https://arxiv.org/html/2605.06993#A3.E1)\) and \([3](https://arxiv.org/html/2605.06993#A3.E3)\),

poti\(𝒶i;Pv\)=Wi\(∅;Pv\)−𝒲i\(𝒶i;Pv\)=\(1−4v\)−\(1−5v\)=v\.pot\_\{i\}\(\\mathcal\{a\}\_\{i\};\\,P^\{v\}\)=W\_\{i\}\(\\emptyset;\\,P^\{v\}\)\-\\mathcal\{W\}\_\{i\}\(\\mathcal\{a\}\_\{i\};\\,P^\{v\}\)=\(1\-4v\)\-\(1\-5v\)=v\.Atv=0\.1v=0\.1this recoverspoti\(𝒶i;P0\.1\)=0\.1pot\_\{i\}\(\\mathcal\{a\}\_\{i\};\\,P^\{0\.1\}\)=0\.1in agreement with Example[3\.5](https://arxiv.org/html/2605.06993#S3.Thmtheorem5)\.

#### Polynomial time\.

Given a rationalvvwith bit\-lengthLL, the four entries\(1−8v,v,3v,4v\)\(1\-8v,\\,v,\\,3v,\\,4v\)ofPvP^\{v\}are computed by a constant number of rational arithmetic operations and have bit\-length𝒪\(L\)\\mathcal\{O\}\(L\)\. The cell is therefore constructed in time polynomial in the bit\-length ofvv\. ∎

### C\.5The reduction

###### Theorem C\.4\(Theorem[4\.1](https://arxiv.org/html/2605.06993#S4.Thmtheorem1), restated\)\.

The max\-potency problem is NP\-hard\.

###### Proof\.

We exhibit a polynomial\-time reduction from 0\-1 knapsack to max\-potency\. Let\(v1,…,vn;w1,…,wn;B\)\(v\_\{1\},\\dots,v\_\{n\};\\,w\_\{1\},\\dots,w\_\{n\};\\,B\)be a 0\-1 knapsack instance\.

#### Rescaling\.

Ifmaxi⁡vi\>v∗\\max\_\{i\}v\_\{i\}\>v^\{\*\}, replace eachviv\_\{i\}withv~i:=vi⋅v∗/maxj⁡vj\\tilde\{v\}\_\{i\}:=v\_\{i\}\\cdot v^\{\*\}/\\max\_\{j\}v\_\{j\}, so thatv~i∈\[0,v∗\]\\tilde\{v\}\_\{i\}\\in\[0,v^\{\*\}\]\. Since this rescaling multiplies every subset sum∑i∈Svi\\sum\_\{i\\in S\}v\_\{i\}by the same positive constantv∗/maxj⁡vjv^\{\*\}/\\max\_\{j\}v\_\{j\}, it preserves the argmax: the optimalS∗S^\{\*\}under\(v~,w,B\)\(\\tilde\{v\},w,B\)is identical to the optimalS∗S^\{\*\}under\(v,w,B\)\(v,w,B\)\. The rescaled valuesv~i\\tilde\{v\}\_\{i\}are rational with bit\-length polynomial in the original input\. Henceforth we work withv~i\\tilde\{v\}\_\{i\}in place ofviv\_\{i\}\.

#### Construction\.

Construct the max\-potency instance\(𝒢,P\(𝑽\),θ,𝒜,c,B\)\(\\mathcal\{G\},P\(\\boldsymbol\{V\}\),\\theta,\\mathcal\{A\},c,B\)as in Section[C\.2](https://arxiv.org/html/2605.06993#A3.SS2), with cell distributions chosen via Lemma[C\.3](https://arxiv.org/html/2605.06993#A3.Thmtheorem3)so thatpoti\(𝒶i\)=v~ipot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)=\\tilde\{v\}\_\{i\}\. The construction runs in time polynomial in the size of the knapsack instance\.

#### Equivalence\.

By Lemma[C\.2](https://arxiv.org/html/2605.06993#A3.Thmtheorem2), for anyS⊆\{1,…,n\}S\\subseteq\\\{1,\\dots,n\\\},

potθ\(𝒶S\)=∑i∈Spoti\(𝒶i\)=∑i∈Sv~i,pot\_\{\\theta\}\(\\mathcal\{a\}\_\{S\}\)=\\sum\_\{i\\in S\}pot\_\{i\}\(\\mathcal\{a\}\_\{i\}\)=\\sum\_\{i\\in S\}\\tilde\{v\}\_\{i\},and by constructionc\(𝒶S\)=∑i∈Swic\(\\mathcal\{a\}\_\{S\}\)=\\sum\_\{i\\in S\}w\_\{i\}\. The max\-potency problem on the constructed instance is therefore

maxS⊆\{1,…,n\}∑i∈Sv~is\.t\.∑i∈Swi≤B,\\max\_\{S\\subseteq\\\{1,\\dots,n\\\}\}\\sum\_\{i\\in S\}\\tilde\{v\}\_\{i\}\\quad\\text\{s\.t\.\}\\quad\\sum\_\{i\\in S\}w\_\{i\}\\leq B,whose argmax coincides, by the rescaling argument, with the argmax of the original 0\-1 knapsack instance\. Since 0\-1 knapsack is NP\-hard, max\-potency is NP\-hard\. ∎

## Appendix DAdditional Examples

![Refer to caption](https://arxiv.org/html/2605.06993v1/images/simplestExample/Z1_W.png)\(a\)Experiment𝒶1\\mathcal\{a\}\_\{1\}\.
![Refer to caption](https://arxiv.org/html/2605.06993v1/images/simplestExample/Z2_W.png)\(b\)Experiment𝒶2\\mathcal\{a\}\_\{2\}\.
![Refer to caption](https://arxiv.org/html/2605.06993v1/images/simplestExample/Z3_W.png)\(c\)Experiment𝒶3=\{𝒶1,𝒶2\}\\mathcal\{a\}\_\{3\}=\\\{\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{2\}\\\}\.

Figure 6:Example[3\.5](https://arxiv.org/html/2605.06993#S3.Thmtheorem5)\. The realized bound widthWθ\(𝒶,p𝒶\):=Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)W\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\):=U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)is shown as a function ofp𝒶p\_\{\\mathcal\{a\}\}\.###### Example D\.1\(Cartesian product decomposition\)\.

Consider the chainA→B→C→DA\\to B\\to C\\to Dwith confoundersR1R\_\{1\}on\{A,B\}\\\{A,B\\\}andR2R\_\{2\}on\{C,D\}\\\{C,D\\\}\(Figure[7](https://arxiv.org/html/2605.06993#A4.F7)\)\. Structural equations:A=fA\(R1\)A=f\_\{A\}\(R\_\{1\}\),B=fB\(A,R1\)B=f\_\{B\}\(A,R\_\{1\}\),C=fC\(B,R2\)C=f\_\{C\}\(B,R\_\{2\}\),D=fD\(C,R2\)D=f\_\{D\}\(C,R\_\{2\}\)\. The districts are𝐃1=\{R1,A,B\}\\boldsymbol\{D\}\_\{1\}=\\\{R\_\{1\},A,B\\\}and𝐃2=\{R2,C,D\}\\boldsymbol\{D\}\_\{2\}=\\\{R\_\{2\},C,D\\\}with response\-types𝐑𝐃1=\{R1\}\\boldsymbol\{R\}\_\{\\boldsymbol\{D\}\_\{1\}\}=\\\{R\_\{1\}\\\}and𝐑𝐃2=\{R2\}\\boldsymbol\{R\}\_\{\\boldsymbol\{D\}\_\{2\}\}=\\\{R\_\{2\}\\\}\.

AABBCCDDR1R\_\{1\}R2R\_\{2\}Figure 7:Chain graph with two districts and confoundersR1R\_\{1\},R2R\_\{2\}\.We label response\-type values by the function they encode\. ForAA\(no parents\),a0a\_\{0\}anda1a\_\{1\}denote the constantsA=0A=0andA=1A=1\. ForBBwith parentAA, the four functions\{0,1\}→\{0,1\}\\\{0,1\\\}\\to\\\{0,1\\\}are labeled following the response\-type terminology of the literature\[Angristet al\.,[1996](https://arxiv.org/html/2605.06993#bib.bib34), Greenland and Robins,[1986](https://arxiv.org/html/2605.06993#bib.bib33)\]:

The behaviors ofCC\(w\.r\.t\. parentBB\) andDD\(w\.r\.t\. parentCC\) are labeled analogously\. Fix𝐯=\(A=1,B=0,C=1,D=0\)\\boldsymbol\{v\}=\(A\{=\}1,B\{=\}0,C\{=\}1,D\{=\}0\):

h1\(𝒗\)\\displaystyle h\_\{1\}\(\\boldsymbol\{v\}\)=\{r1:fA\(r1\)=1∧fB\(1,r1\)=0\}=\{\(a1,bnev\),\(a1,bdef\)\},\\displaystyle=\\big\\\{r\_\{1\}:f\_\{A\}\(r\_\{1\}\)=1\\;\\land\\;f\_\{B\}\(1,r\_\{1\}\)=0\\big\\\}=\\big\\\{\(a\_\{1\},b\_\{nev\}\),\\;\(a\_\{1\},b\_\{def\}\)\\big\\\},h2\(𝒗\)\\displaystyle h\_\{2\}\(\\boldsymbol\{v\}\)=\{r2:fC\(0,r2\)=1∧fD\(1,r2\)=0\}=\{\(cdef,dnev\),\(cdef,ddef\),\(calw,dnev\),\(calw,ddef\)\}\.\\displaystyle=\\big\\\{r\_\{2\}:f\_\{C\}\(0,r\_\{2\}\)=1\\;\\land\\;f\_\{D\}\(1,r\_\{2\}\)=0\\big\\\}=\\big\\\{\(c\_\{def\},d\_\{nev\}\),\\;\(c\_\{def\},d\_\{def\}\),\\;\(c\_\{alw\},d\_\{nev\}\),\\;\(c\_\{alw\},d\_\{def\}\)\\big\\\}\.By Lemma[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5),h\(𝐯\)=h1\(𝐯\)×h2\(𝐯\)h\(\\boldsymbol\{v\}\)=h\_\{1\}\(\\boldsymbol\{v\}\)\\times h\_\{2\}\(\\boldsymbol\{v\}\)with\|h\(𝐯\)\|=2×4=8\|h\(\\boldsymbol\{v\}\)\|=2\\times 4=8\. For verification:𝐫=\(a1,bnev,cdef,dnev\)\\boldsymbol\{r\}=\(a\_\{1\},b\_\{nev\},c\_\{def\},d\_\{nev\}\)yieldsfA\(a1\)=1f\_\{A\}\(a\_\{1\}\)=1,fB\(1,bnev\)=0f\_\{B\}\(1,b\_\{nev\}\)=0,fC\(0,cdef\)=1f\_\{C\}\(0,c\_\{def\}\)=1,fD\(1,dnev\)=0f\_\{D\}\(1,d\_\{nev\}\)=0— consistent with𝐯\\boldsymbol\{v\}and therefore𝐫∈h\(𝐯\)\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\.

###### Example D\.2\(Reducing observational constraints via c\-factors in Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)\)\.

Under topological orderA≺B≺X≺C≺D≺M≺YA\\prec B\\prec X\\prec C\\prec D\\prec M\\prec Y, the c\-factors are:

Q1\(𝒗\)\\displaystyle Q\_\{1\}\(\\boldsymbol\{v\}\)=P\(a\)P\(b∣a\)P\(c∣a,b,x\),\\displaystyle=P\(a\)\\,P\(b\\mid a\)\\,P\(c\\mid a,b,x\),Q2\(𝒗\)\\displaystyle Q\_\{2\}\(\\boldsymbol\{v\}\)=P\(x∣a,b\)P\(m∣a,b,x,c,d\)P\(y∣a,b,x,c,d,m\),\\displaystyle=P\(x\\mid a,b\)\\,P\(m\\mid a,b,x,c,d\)\\,P\(y\\mid a,b,x,c,d,m\),Q3\(𝒗\)\\displaystyle Q\_\{3\}\(\\boldsymbol\{v\}\)=P\(d∣a,b,x,c\)\.\\displaystyle=P\(d\\mid a,b,x,c\)\.By Lemma[6\.10](https://arxiv.org/html/2605.06993#S6.Thmtheorem10), the full observational constraintP\(𝐯\)=∑𝐫∈h\(𝐯\)P\(r1,r2,r3,r4\)P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(r\_\{1\},r\_\{2\},r\_\{3\},r\_\{4\}\)reduces to:

Q2\(𝒗\)=∑\(r1,r2\)∈h2\(𝒗\)P\(r1,r2\)\.Q\_\{2\}\(\\boldsymbol\{v\}\)=\\sum\_\{\(r\_\{1\},r\_\{2\}\)\\in h\_\{2\}\(\\boldsymbol\{v\}\)\}P\(r\_\{1\},r\_\{2\}\)\.R3R\_\{3\}andR4R\_\{4\}have been divided out\. The base program𝒫∅\\mathcal\{P\}\_\{\\emptyset\}therefore optimizes over\|supp\(R1\)\|⋅\|supp\(R2\)\|\|supp\(R\_\{1\}\)\|\\cdot\|supp\(R\_\{2\}\)\|decision variables rather than∏i=14\|supp\(Ri\)\|\\prod\_\{i=1\}^\{4\}\|supp\(R\_\{i\}\)\|\.

###### Example D\.3\(Step\-by\-step trace of Algorithm[3](https://arxiv.org/html/2605.06993#alg3)\)\.

We runAllPathsInterceptedon Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)with𝐑θ∗=\{R1,R2\}\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=\\\{R\_\{1\},R\_\{2\}\\\},𝐖=\{D\}\\boldsymbol\{W\}=\\\{D\\\},𝐙=\{C\}\\boldsymbol\{Z\}=\\\{C\\\}\. Removing𝐙\\boldsymbol\{Z\}yields the graph in Figure[8](https://arxiv.org/html/2605.06993#A4.F8)\(edges to/fromCCdeleted\)\.

AABBXXMMYYCCDDR3R\_\{3\}R1R\_\{1\}R2R\_\{2\}R4R\_\{4\}Figure 8:Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)with all edges to/fromCCremoved \(i\.e\., after deleting𝐙=\{C\}\\boldsymbol\{Z\}=\\\{C\\\}\)\.After initialization:visited=\{R1,R2\}\\texttt\{visited\}=\\\{R\_\{1\},R\_\{2\}\\\},queue=\(R1,R2\)\\texttt\{queue\}=\(R\_\{1\},R\_\{2\}\)\.

Iteration 1:current=R1\\texttt\{current\}=R\_\{1\}\. Not in𝐖\\boldsymbol\{W\}\. Children in𝐄′\\boldsymbol\{E\}^\{\\prime\}:XX,MM\(unvisited\)\. Updatevisited=\{R1,R2,X,M\}\\texttt\{visited\}=\\\{R\_\{1\},R\_\{2\},X,M\\\},queue=\(R2,X,M\)\\texttt\{queue\}=\(R\_\{2\},X,M\)\.

Iteration 2:current=R2\\texttt\{current\}=R\_\{2\}\. Unvisited child:YY\. Updatevisited=\{R1,R2,X,M,Y\}\\texttt\{visited\}=\\\{R\_\{1\},R\_\{2\},X,M,Y\\\},queue=\(X,M,Y\)\\texttt\{queue\}=\(X,M,Y\)\.

Iterations 3–5:XXhas no unvisited children;MM’s childYYalready visited;YYhas no children\.

ReturnTrue\. No path from𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}toDDexists in𝑬′\\boldsymbol\{E\}^\{\\prime\}\. By Corollary[6\.12](https://arxiv.org/html/2605.06993#S6.Thmtheorem12), this intervention is useless\.

###### Example D\.4\(Pruning experiments in Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)\)\.

Recall the graph from Figure[1\(b\)](https://arxiv.org/html/2605.06993#S3.F1.sf2)\. Every observed variableVVis binary and can take values in\{v,v′\}\\\{v,v^\{\\prime\}\\\}\. Our query isθ=P\(y\|do\(m\)\)\\theta=P\(y\|do\(m\)\)\. Therefore,𝐑θ∗=\{R1,R2\}\\boldsymbol\{R\}^\{\*\}\_\{\\theta\}=\\\{R\_\{1\},R\_\{2\}\\\}\. We define:

- •𝒶1\\mathcal\{a\}\_\{1\}:C,D\|do\(x\)C,D\|do\(x\),
- •𝒶2\\mathcal\{a\}\_\{2\}:Y,M,X\|do\(a,b\)Y,M,X\|do\(a,b\),
- •𝒶3\\mathcal\{a\}\_\{3\}:Y,M\|do\(x\)Y,M\|do\(x\),
- •𝒶4\\mathcal\{a\}\_\{4\}:Y,M\|do\(x′\)Y,M\|do\(x^\{\\prime\}\),
- •𝒶5\\mathcal\{a\}\_\{5\}:B\|do\(a\)B\|do\(a\),
- •𝒶6\\mathcal\{a\}\_\{6\}:dc′,dc′d\_\{c^\{\\prime\}\},d^\{\\prime\}\_\{c\}— that is, testing for monotonicity alongC→DC\\to D\.

Our candidate experiments are𝒜=\{𝒶1,…,𝒶6\}\\mathcal\{A\}=\\\{\\mathcal\{a\}\_\{1\},\\dots,\\mathcal\{a\}\_\{6\}\\\}on which we run Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)\. SinceP\(𝒶2\)P\(\\mathcal\{a\}\_\{2\}\)is ID and𝒶1,𝒶5,𝒶6\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{5\},\\mathcal\{a\}\_\{6\}fulfill the criterion from Corollary[6\.12](https://arxiv.org/html/2605.06993#S6.Thmtheorem12), we obtain the set of useless experiments𝒜†=\{𝒶1,𝒶2,𝒶5,𝒶6\}\\mathcal\{A\}^\{\\dagger\}=\\\{\\mathcal\{a\}\_\{1\},\\mathcal\{a\}\_\{2\},\\mathcal\{a\}\_\{5\},\\mathcal\{a\}\_\{6\}\\\}\.

## Appendix EAlgorithmic Details

### E\.1Constructing the Polynomial Program

Canonicalization\.We assume throughout that the input ADMG is canonicalized—no disturbance has a parent in𝒢\\mathcal\{G\}, and no disturbance influences a strict subset of another’s children\. This is without loss of generality for the full data law\[Duarteet al\.,[2023](https://arxiv.org/html/2605.06993#bib.bib13), Proposition 1\]; see Appendix B\.1 of that paper for the canonicalization procedure\.

Constructing𝒫∅\\mathcal\{P\}\_\{\\emptyset\}\.The base program is obtained by invoking Algorithm 2 ofDuarteet al\.\[[2023](https://arxiv.org/html/2605.06993#bib.bib13)\]with inputs\(𝒢,P\(𝑽\),∅,supp\(𝑽\),θ\)\(\\mathcal\{G\},P\(\\boldsymbol\{V\}\),\\emptyset,supp\(\\boldsymbol\{V\}\),\\theta\)\.

Constructing𝒫𝒶\\mathcal\{P\}\_\{\\mathcal\{a\}\}\.Given the base program, each experimental statement is polynomialized via the same procedure applied to the equality𝒶~=p𝒶~\\tilde\{\\mathcal\{a\}\}=p\_\{\\tilde\{\\mathcal\{a\}\}\}:

Algorithm 1Formulating𝒫𝒶\\mathcal\{P\}\_\{\\mathcal\{a\}\}1:Base Program

𝒫∅=\(𝒯,𝒞∅\)\\mathcal\{P\}\_\{\\emptyset\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\emptyset\}\), Experiments

𝒶⊆𝒜\\mathcal\{a\}\\subseteq\\mathcal\{A\}
2:Program

𝒫𝒶=\(𝒯,𝒞𝒶\)\\mathcal\{P\}\_\{\\mathcal\{a\}\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\mathcal\{a\}\}\)
3:

𝒞𝒶←𝒞∅\\mathcal\{C\}\_\{\\mathcal\{a\}\}\\leftarrow\\mathcal\{C\}\_\{\\emptyset\}⊳\\trianglerightInitialize with base constraints

4:for

𝒶~∈𝒶\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}do

5:

c𝒶~←duarte\.polynomialize\(𝒶~=p𝒶~\)c\_\{\\tilde\{\\mathcal\{a\}\}\}\\leftarrow\\text\{duarte\.polynomialize\}\(\\tilde\{\\mathcal\{a\}\}=p\_\{\\tilde\{\\mathcal\{a\}\}\}\)
6:

𝒞𝒶←𝒞𝒶∪\{c𝒶~\}\\mathcal\{C\}\_\{\\mathcal\{a\}\}\\leftarrow\\mathcal\{C\}\_\{\\mathcal\{a\}\}\\cup\\\{c\_\{\\tilde\{\\mathcal\{a\}\}\}\\\}
7:endfor

8:return

𝒫𝒶=\(𝒯,𝒞𝒶\)\\mathcal\{P\}\_\{\\mathcal\{a\}\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\mathcal\{a\}\}\)

### E\.2Evaluating Potency

For a fixed experiment set𝒶\\mathcal\{a\}, potency evaluation requires the worst\-case post\-experimental width

𝒲θ\(𝒶\)=maxp𝒶⁡\[Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\]\.\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\max\_\{p\_\{\\mathcal\{a\}\}\}\\left\[U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\right\]\.Unfolding the inner bounds as polynomial programs over independent optimizersPUP^\{U\}andPLP^\{L\}, both required to satisfy the observational constraints and to reproduce the experimental outcomep𝒶p\_\{\\mathcal\{a\}\},

𝒲θ\(𝒶\)=maxp𝒶⁡maxPU∈𝒞∅f\(PU\)=p𝒶⁡maxPL∈𝒞∅f\(PL\)=p𝒶⁡\[𝒯\(PU\)−𝒯\(PL\)\]\.\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\max\_\{p\_\{\\mathcal\{a\}\}\}\\;\\max\_\{\\begin\{subarray\}\{c\}P^\{U\}\\in\\mathcal\{C\}\_\{\\emptyset\}\\\\ f\(P^\{U\}\)=p\_\{\\mathcal\{a\}\}\\end\{subarray\}\}\\;\\max\_\{\\begin\{subarray\}\{c\}P^\{L\}\\in\\mathcal\{C\}\_\{\\emptyset\}\\\\ f\(P^\{L\}\)=p\_\{\\mathcal\{a\}\}\\end\{subarray\}\}\\bigl\[\\mathcal\{T\}\(P^\{U\}\)\-\\mathcal\{T\}\(P^\{L\}\)\\bigr\]\.The variablep𝒶p\_\{\\mathcal\{a\}\}appears only as the common value off\(PU\)f\(P^\{U\}\)andf\(PL\)f\(P^\{L\}\)\. We can therefore drop it as a decision variable and replace its two binding constraints with the single equalityf\(PU\)=f\(PL\)f\(P^\{U\}\)=f\(P^\{L\}\)\. Whatever value the two optimizers agree on*is*the experimental outcome; no separate enumeration ofp𝒶p\_\{\\mathcal\{a\}\}is needed\.

This also resolves the question of feasibility forp𝒶p\_\{\\mathcal\{a\}\}\. In the outer\-loop view one would first compute bounds onp𝒶p\_\{\\mathcal\{a\}\}under𝒞∅\\mathcal\{C\}\_\{\\emptyset\}and grid over them; here those bounds are enforced implicitly, since the existence ofPU,PL∈𝒞∅P^\{U\},P^\{L\}\\in\\mathcal\{C\}\_\{\\emptyset\}producing a common valueppis exactly the definition ofppbeing feasible\. See Algorithm[2](https://arxiv.org/html/2605.06993#alg2)\.

Algorithm 2EvaluatePotency1:Base program

𝒫∅=\(𝒯,𝒞∅\)\\mathcal\{P\}\_\{\\emptyset\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\emptyset\}\), experiment set

𝒶\\mathcal\{a\}
2:Potency

potθ\(𝒶\)pot\_\{\\theta\}\(\\mathcal\{a\}\)
3:

Lθ←min⁡𝒯L\_\{\\theta\}\\leftarrow\\min\\mathcal\{T\}subject to

𝒞∅\\mathcal\{C\}\_\{\\emptyset\}
4:

Uθ←max⁡𝒯U\_\{\\theta\}\\leftarrow\\max\\mathcal\{T\}subject to

𝒞∅\\mathcal\{C\}\_\{\\emptyset\}
5:

𝒲θ\(∅\)←Uθ−Lθ\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)\\leftarrow U\_\{\\theta\}\-L\_\{\\theta\}
6:Create two copies of the response\-type probabilities:

\{PU\(𝒓\):𝒓∈supp\(𝑹\)\},\{PL\(𝒓\):𝒓∈supp\(𝑹\)\}\.\\\{P^\{U\}\(\\boldsymbol\{r\}\):\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\)\\\},\\qquad\\\{P^\{L\}\(\\boldsymbol\{r\}\):\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\)\\\}\.
7:Let

𝒞∅U\\mathcal\{C\}\_\{\\emptyset\}^\{U\}and

𝒞∅L\\mathcal\{C\}\_\{\\emptyset\}^\{L\}denote copies of

𝒞∅\\mathcal\{C\}\_\{\\emptyset\}using

PU\(𝒓\)P^\{U\}\(\\boldsymbol\{r\}\)and

PL\(𝒓\)P^\{L\}\(\\boldsymbol\{r\}\), respectively\.

8:Form the coupled constraint set

𝒞𝒶cpl=𝒞∅U∪𝒞∅L∪\{f𝒶~\(𝑹U\)=f𝒶~\(𝑹L\):𝒶~∈𝒶\}\.\\mathcal\{C\}^\{\\mathrm\{cpl\}\}\_\{\\mathcal\{a\}\}=\\mathcal\{C\}\_\{\\emptyset\}^\{U\}\\cup\\mathcal\{C\}\_\{\\emptyset\}^\{L\}\\cup\\left\\\{f\_\{\\tilde\{\\mathcal\{a\}\}\}\(\\boldsymbol\{R\}^\{U\}\)=f\_\{\\tilde\{\\mathcal\{a\}\}\}\(\\boldsymbol\{R\}^\{L\}\):\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\\right\\\}\.
9:Compute

𝒲θ\(𝒶\)←max⁡\[𝒯\(𝑹U\)−𝒯\(𝑹L\)\]s\.t\.𝒞𝒶cpl\.\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)\\leftarrow\\max\\left\[\\mathcal\{T\}\(\\boldsymbol\{R\}^\{U\}\)\-\\mathcal\{T\}\(\\boldsymbol\{R\}^\{L\}\)\\right\]\\quad\\text\{s\.t\.\}\\quad\\mathcal\{C\}^\{\\mathrm\{cpl\}\}\_\{\\mathcal\{a\}\}\.
10:return

𝒲θ\(∅\)−𝒲θ\(𝒶\)\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)\-\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)

The formulation is exact at the mathematical level; numerical solutions are subject to the tolerances of the polynomial optimizer\.

### E\.3AllPathsIntercepted Algorithm

Algorithm 3AllPathsInterceptedGraph

𝒢=\(𝑽,𝑹,𝑬\)\\mathcal\{G\}=\(\\boldsymbol\{V\},\\boldsymbol\{R\},\\boldsymbol\{E\}\),

𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}, Outcomes

𝑾\\boldsymbol\{W\}, Interventions

𝒁\\boldsymbol\{Z\}
Trueiff every directed path

𝑹θ∗→𝑾\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\\to\\boldsymbol\{W\}passes through

𝒁\\boldsymbol\{Z\}
𝑬′←\{\(u,v\)∈𝑬:u∉𝒁∧v∉𝒁\}\\boldsymbol\{E\}^\{\\prime\}\\leftarrow\\\{\(u,v\)\\in\\boldsymbol\{E\}:u\\notin\\boldsymbol\{Z\}\\land v\\notin\\boldsymbol\{Z\}\\\}⊳\\trianglerightRemove edges involving𝒁\\boldsymbol\{Z\}

visited←𝑹θ∗\\texttt\{visited\}\\leftarrow\\boldsymbol\{R\}\_\{\\theta\}^\{\*\};

queue←list\(𝑹θ∗\)\\quad\\texttt\{queue\}\\leftarrow\\texttt\{list\}\(\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\)
while

queue≠∅\\texttt\{queue\}\\neq\\emptysetdo

current←queue\.pop\(\)\\texttt\{current\}\\leftarrow\\texttt\{queue\.pop\}\(\)

if

current∈𝑾\\texttt\{current\}\\in\\boldsymbol\{W\}thenreturnFalse⊳\\trianglerightUnintercepted path found

endif

for

c∈children\(current,𝑬′\)c\\in\\texttt\{children\}\(\\texttt\{current\},\\boldsymbol\{E\}^\{\\prime\}\)with

c∉visitedc\\notin\\texttt\{visited\}do

visited←visited∪\{c\}\\texttt\{visited\}\\leftarrow\\texttt\{visited\}\\cup\\\{c\\\};

queue\.append\(c\)\\quad\\texttt\{queue\.append\}\(c\)
endfor

endwhile

returnTrue⊳\\trianglerightAll paths intercepted

Runtime of Algorithm[3](https://arxiv.org/html/2605.06993#alg3)

We analyze each component of Algorithm[3](https://arxiv.org/html/2605.06993#alg3):

Initialization:Constructing𝑬′\\boldsymbol\{E\}^\{\\prime\}requires iterating through all edges and performing two membership checks per edge, yielding𝒪\(\|𝑬\|\)\\mathcal\{O\}\(\|\\boldsymbol\{E\}\|\)\. Initializingvisitedandqueuewith𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}takes𝒪\(\|𝑹θ∗\|\)≤𝒪\(\|𝑽\|\)\\mathcal\{O\}\(\|\\boldsymbol\{R\}^\{\*\}\_\{\\theta\}\|\)\\leq\\mathcal\{O\}\(\|\\boldsymbol\{V\}\|\)\(this inequality holds in canonical DAGs\)\.

BFS Loop:Each node in𝑽∖𝒁\\boldsymbol\{V\}\\setminus\\boldsymbol\{Z\}is added tovisitedat most once and consequently toqueueat most once\. Therefore, the while\-loop executes at most\|𝑽∖𝒁\|≤\|𝑽\|\|\\boldsymbol\{V\}\\setminus\\boldsymbol\{Z\}\|\\leq\|\\boldsymbol\{V\}\|iterations, contributing𝒪\(\|𝑽\|\)\\mathcal\{O\}\(\|\\boldsymbol\{V\}\|\)for dequeue operations and membership checks\.

For the inner loop, each edge\(u,v\)∈𝑬′\(u,v\)\\in\\boldsymbol\{E\}^\{\\prime\}is examined at most once—when nodeuuis dequeued and we iterate over its children\. Since every edge is examined at most once across the entire execution, the total work by the inner loop is𝒪\(\|𝑬′\|\)⊆𝒪\(\|𝑬\|\)\\mathcal\{O\}\(\|\\boldsymbol\{E\}^\{\\prime\}\|\)\\subseteq\\mathcal\{O\}\(\|\\boldsymbol\{E\}\|\)\.

Total:

𝒪\(\|𝑽\|\+\|𝑬\|\)⏟init\.\+𝒪\(\|𝑽\|\)⏟while\-loop\+𝒪\(\|𝑬\|\)⏟edge exams\.=𝒪\(\|𝑽\|\+\|𝑬\|\)\.\\underbrace\{\\mathcal\{O\}\(\|\\boldsymbol\{V\}\|\+\|\\boldsymbol\{E\}\|\)\}\_\{\\text\{init\.\}\}\+\\underbrace\{\\mathcal\{O\}\(\|\\boldsymbol\{V\}\|\)\}\_\{\\text\{while\-loop\}\}\+\\underbrace\{\\mathcal\{O\}\(\|\\boldsymbol\{E\}\|\)\}\_\{\\text\{edge exams\.\}\}=\\mathcal\{O\}\(\|\\boldsymbol\{V\}\|\+\|\\boldsymbol\{E\}\|\)\.

### E\.4Solving the Max\-Potency Problem

We useEvaluatePotencyas a subroutine to solve the budget\-constrained max\-potency problem exactly\. Since the problem is NP\-hard \(Appendix[C](https://arxiv.org/html/2605.06993#A3)\), the algorithm is exponential in the worst case\.

Algorithm[2](https://arxiv.org/html/2605.06993#S6.F2)certifies a set𝒜†\\mathcal\{A\}^\{\\dagger\}of individually useless experiments\. Let𝒜ID†⊆𝒜†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}\\subseteq\\mathcal\{A\}^\{\\dagger\}denote those pruned by the ID check and𝒜Int†=𝒜†∖𝒜ID†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{Int\}\}=\\mathcal\{A\}^\{\\dagger\}\\setminus\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}those pruned by interception\. By Corollary[6\.15](https://arxiv.org/html/2605.06993#S6.Thmtheorem15),𝒜ID†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}can be removed from the search entirely\. The algorithm then enumerates all subsets of𝒜∖𝒜ID†\\mathcal\{A\}\\setminus\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}, skipping singletons in𝒜Int†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{Int\}\}\.

Algorithm 4SolveMaxPotency1:Graph

𝒢\\mathcal\{G\}, query

θ\\theta, base program

𝒫∅=\(𝒯,𝒞∅\)\\mathcal\{P\}\_\{\\emptyset\}=\(\\mathcal\{T\},\\mathcal\{C\}\_\{\\emptyset\}\), candidate experiments

𝒜\\mathcal\{A\}, cost function

c:2𝒜→ℝ\+c:2^\{\\mathcal\{A\}\}\\to\\mathbb\{R\}\_\{\+\}, budget

BB
2:Optimal experiment set

𝒶⋆\\mathcal\{a\}^\{\\star\}and potency value

v⋆v^\{\\star\}
3:

𝒜†←GetUselessExperiments\(𝒢,θ,𝒜\)\\mathcal\{A\}^\{\\dagger\}\\leftarrow\\textsc\{GetUselessExperiments\}\(\\mathcal\{G\},\\theta,\\mathcal\{A\}\)
4:Partition

𝒜†\\mathcal\{A\}^\{\\dagger\}into

𝒜ID†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}and

𝒜Int†\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{Int\}\}
5:

𝒶⋆←∅\\mathcal\{a\}^\{\\star\}\\leftarrow\\emptyset;

v⋆←0v^\{\\star\}\\leftarrow 0
6:for

𝒶⊆𝒜∖𝒜ID†\\mathcal\{a\}\\subseteq\\mathcal\{A\}\\setminus\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}do

7:if

\|𝒶\|=1\|\\mathcal\{a\}\|=1and

𝒶⊆𝒜Int†\\mathcal\{a\}\\subseteq\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{Int\}\}then

8:continue

9:endif

10:if

c\(𝒶\)\>Bc\(\\mathcal\{a\}\)\>Bthen

11:continue

12:endif

13:

v←EvaluatePotency\(𝒫∅,𝒶\)v\\leftarrow\\textsc\{EvaluatePotency\}\(\\mathcal\{P\}\_\{\\emptyset\},\\mathcal\{a\}\)
14:if

v\>v⋆v\>v^\{\\star\}then

15:

v⋆←vv^\{\\star\}\\leftarrow v
16:

𝒶⋆←𝒶\\mathcal\{a\}^\{\\star\}\\leftarrow\\mathcal\{a\}
17:endif

18:endfor

19:return

𝒶⋆,v⋆\\mathcal\{a\}^\{\\star\},v^\{\\star\}

The algorithm is exact provided thatEvaluatePotency\(Algorithm[2](https://arxiv.org/html/2605.06993#alg2)\) globally solves the coupled polynomial program\. Its runtime is𝒪\(2\|𝒜\|−\|𝒜ID†\|⋅Tpot\)\\mathcal\{O\}\(2^\{\|\\mathcal\{A\}\|\-\|\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}\|\}\\cdot T\_\{\\mathrm\{pot\}\}\), whereTpotT\_\{\\mathrm\{pot\}\}denotes the cost of one potency evaluation\. Corollary[6\.15](https://arxiv.org/html/2605.06993#S6.Thmtheorem15)reduces the exponent from\|𝒜\|\|\\mathcal\{A\}\|to\|𝒜\|−\|𝒜ID†\|\|\\mathcal\{A\}\|\-\|\\mathcal\{A\}^\{\\dagger\}\_\{\\mathrm\{ID\}\}\|\.

## Appendix FOmitted Proofs

#### Feasible sets\.

We define two types of feasible sets used in the proofs below\. The*observational feasible set*ℱ∅:=\{𝒓∈supp\(𝑹\):all observational constraints fromP\(𝑽\)are satisfied\}\\mathcal\{F\}\_\{\\emptyset\}:=\\\{\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\):\\text\{all observational constraints from \}P\(\\boldsymbol\{V\}\)\\text\{ are satisfied\}\\\}is the feasible set of the base program𝒫∅\\mathcal\{P\}\_\{\\emptyset\}as constructed byDuarteet al\.\[[2023](https://arxiv.org/html/2605.06993#bib.bib13)\]\. For an experiment𝒶\\mathcal\{a\}with observed outcomep𝒶p\_\{\\mathcal\{a\}\}, the*experimental feasible set*isℱ𝒶\(p𝒶\):=\{𝒓∈supp\(𝑹\):f𝒶\(𝒓\)=p𝒶\}\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}\):=\\\{\\boldsymbol\{r\}\\in supp\(\\boldsymbol\{R\}\):f\_\{\\mathcal\{a\}\}\(\\boldsymbol\{r\}\)=p\_\{\\mathcal\{a\}\}\\\}\. The combined feasible set isℱ∅∩ℱ𝒶\(p𝒶\)\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}\)\.

### F\.1Proof of Lemma[3\.3](https://arxiv.org/html/2605.06993#S3.Thmtheorem3)

###### Proof\.

Letp𝒶∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]p\_\{\\mathcal\{a\}\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]\. The combined feasible setℱ∅∩ℱ𝒶\(p𝒶\)⊆ℱ∅\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}\)\\subseteq\\mathcal\{F\}\_\{\\emptyset\}by definition of intersection\. From optimization theory, restricting to a smaller feasible set can only decrease the maximum and increase the minimum\. HenceUθ\(𝒶,p𝒶\)≤UθU\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\leq U\_\{\\theta\}andLθ≤Lθ\(𝒶,p𝒶\)L\_\{\\theta\}\\leq L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\. It follows∀p𝒶\\forall p\_\{\\mathcal\{a\}\}:

0≤Uθ−Uθ\(𝒶,p𝒶\)∧Lθ−Lθ\(𝒶,p𝒶\)≤0⟹Lθ−Lθ\(𝒶,p𝒶\)≤Uθ−Uθ\(𝒶,p𝒶\)0\\leq U\_\{\\theta\}\-U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\;\\land\\;L\_\{\\theta\}\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\leq 0\\implies L\_\{\\theta\}\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\leq U\_\{\\theta\}\-U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)⟹Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)≤Uθ−Lθ=𝒲θ\\implies U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\leq U\_\{\\theta\}\-L\_\{\\theta\}=\\mathcal\{W\}\_\{\\theta\}Since this holds for allp𝒶p\_\{\\mathcal\{a\}\}, we have𝒲θ\(𝒶\)=maxp𝒶\(Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\)≤𝒲θ\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\underset\{\\begin\{subarray\}\{c\}p\_\{\\mathcal\{a\}\}\\end\{subarray\}\}\{\\max\}\\bigl\(U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\bigr\)\\leq\\mathcal\{W\}\_\{\\theta\}, and hencepotθ\(𝒶\)≥0pot\_\{\\theta\}\(\\mathcal\{a\}\)\\geq 0\. ∎

### F\.2Proof of Lemma[3\.4](https://arxiv.org/html/2605.06993#S3.Thmtheorem4)

###### Proof\.

\(⇒\\Rightarrow\)Unpacking the definition yields

potθ\(𝒶\)=0⟹𝒲θ\(𝒶\)=𝒲θ⟹maxp𝒶∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]\(Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)\)=Uθ−Lθ\.pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\\implies\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\mathcal\{W\}\_\{\\theta\}\\implies\\underset\{\\begin\{subarray\}\{c\}p\_\{\\mathcal\{a\}\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]\\end\{subarray\}\}\{\\max\}\\bigl\(U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\\bigr\)=U\_\{\\theta\}\-L\_\{\\theta\}\.Letp𝒶†=argmaxp𝒶∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]Uθ\(𝒶,p𝒶\)−Lθ\(𝒶,p𝒶\)p\_\{\\mathcal\{a\}\}^\{\\dagger\}=\\underset\{\\begin\{subarray\}\{c\}p\_\{\\mathcal\{a\}\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]\\end\{subarray\}\}\{\\operatorname\{arg\\,max\}\}\\ U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}\):

Uθ\(𝒶,p𝒶†\)−Lθ\(𝒶,p𝒶†\)=Uθ−Lθ⟹Uθ−Uθ\(𝒶,p𝒶†\)=Lθ−Lθ\(𝒶,p𝒶†\)\.U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=U\_\{\\theta\}\-L\_\{\\theta\}\\implies U\_\{\\theta\}\-U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=L\_\{\\theta\}\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\.As we saw in the proof of Lemma[3\.3](https://arxiv.org/html/2605.06993#S3.Thmtheorem3),Lθ≤Lθ\(𝒶,p𝒶†\)L\_\{\\theta\}\\leq L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)andUθ\(𝒶,p𝒶†\)≤UθU\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\\leq U\_\{\\theta\}, which implies that555SinceUθ−Uθ\(𝒶,p𝒶†\)≥0U\_\{\\theta\}\-U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\\geq 0andLθ−Lθ\(𝒶,p𝒶†\)≤0L\_\{\\theta\}\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\\leq 0, their equality forces both to be zero\.

Uθ−Uθ\(𝒶,p𝒶†\)=0∧Lθ−Lθ\(𝒶,p𝒶†\)=0⟹Uθ\(𝒶,p𝒶†\)=Uθ∧Lθ\(𝒶,p𝒶†\)=Lθ\.U\_\{\\theta\}\-U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=0\\;\\land\\;L\_\{\\theta\}\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=0\\implies U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=U\_\{\\theta\}\\;\\land\\;L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=L\_\{\\theta\}\.
\(⇐\\Leftarrow\)Suppose there existsp𝒶†∈∏𝒶~∈𝒶\[lp𝒶~,up𝒶~\]p\_\{\\mathcal\{a\}\}^\{\\dagger\}\\in\\prod\_\{\\tilde\{\\mathcal\{a\}\}\\in\\mathcal\{a\}\}\[l\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\},u\_\{p\_\{\\tilde\{\\mathcal\{a\}\}\}\}\]withUθ\(𝒶,p𝒶†\)=UθU\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=U\_\{\\theta\}andLθ\(𝒶,p𝒶†\)=LθL\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=L\_\{\\theta\}\. Then

Uθ\(𝒶,p𝒶†\)−Lθ\(𝒶,p𝒶†\)=Uθ−Lθ=𝒲θ\.U\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)\-L\_\{\\theta\}\(\\mathcal\{a\},p\_\{\\mathcal\{a\}\}^\{\\dagger\}\)=U\_\{\\theta\}\-L\_\{\\theta\}=\\mathcal\{W\}\_\{\\theta\}\.Since𝒲θ\(𝒶\)\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)is the maximum over allp𝒶p\_\{\\mathcal\{a\}\}, it must hold that𝒲θ\(𝒶\)≥𝒲θ\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)\\geq\\mathcal\{W\}\_\{\\theta\}\. By Lemma[3\.3](https://arxiv.org/html/2605.06993#S3.Thmtheorem3),𝒲θ\(𝒶\)≤𝒲θ\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)\\leq\\mathcal\{W\}\_\{\\theta\}, hence𝒲θ\(𝒶\)=𝒲θ\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\mathcal\{W\}\_\{\\theta\}andpotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\. ∎

### F\.3Proof of Lemma[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5)\(Cartesian Product Decomposition\)

###### Proof\.

We start by applying Definition[6\.1](https://arxiv.org/html/2605.06993#S6.Thmtheorem1)toh\(𝒗\)h\(\\boldsymbol\{v\}\), simplified for the case of\|ℭ\|=1\|\\mathfrak\{C\}\|=1:

h\(𝒗\)=\{𝒓:∀V∈𝑽,FV\(𝒓\)=v\}\.h\(\\boldsymbol\{v\}\)=\\\{\\boldsymbol\{r\}:\\forall V\\in\\boldsymbol\{V\},F\_\{V\}\(\\boldsymbol\{r\}\)=v\\\}\.Since𝑿=∅\\boldsymbol\{X\}=\\emptyset\(no interventions\),FV\(𝒓\)F\_\{V\}\(\\boldsymbol\{r\}\)reduces to the structural equation with all parent values recursively fixed by𝒗\\boldsymbol\{v\}:

FV\(𝒓\)=fV\(⋀P∈𝒑𝒂\(V\)f\(f\(…f\(𝒓\),𝒓\),𝒓\)⏟equals elements of𝒗,𝒓\)\.F\_\{V\}\(\\boldsymbol\{r\}\)=f\_\{V\}\\\!\\left\(\\bigwedge\_\{P\\in\\boldsymbol\{pa\}\(V\)\}\\underbrace\{f\(f\(\\dots f\(\\boldsymbol\{r\}\),\\boldsymbol\{r\}\),\\boldsymbol\{r\}\)\}\_\{\\text\{equals elements of \}\\boldsymbol\{v\}\},\\boldsymbol\{r\}\\right\)\.Using Definition[6\.3](https://arxiv.org/html/2605.06993#S6.Thmtheorem3)to replace the recursive evaluation:

h\(𝒗\)=\{𝒓:∀V∈𝑽,fV\(𝒑𝒂𝒗,𝒓\)=v\}\.h\(\\boldsymbol\{v\}\)=\\\{\\boldsymbol\{r\}:\\forall V\\in\\boldsymbol\{V\},\\;f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\},\\boldsymbol\{r\}\)=v\\\}\.
Rewriting∀V∈𝑽\\forall V\\in\\boldsymbol\{V\}as∀𝑫k∈𝑫,∀V∈𝑫k∩𝑽\\forall\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\},\\forall V\\in\\boldsymbol\{D\}\_\{k\}\\cap\\boldsymbol\{V\}and partitioning𝒓\\boldsymbol\{r\}into per\-district vectors𝒓1,…,𝒓\|𝑫\|\\boldsymbol\{r\}\_\{1\},\\dots,\\boldsymbol\{r\}\_\{\|\\boldsymbol\{D\}\|\}:

h\(𝒗\)=\{\(𝒓1,…,𝒓\|𝑫\|\):∀𝑫k∈𝑫,∀V∈𝑫k∩𝑽,fV\(𝒑𝒂𝒗,𝒓\)=v\}\.h\(\\boldsymbol\{v\}\)=\\\{\(\\boldsymbol\{r\}\_\{1\},\\dots,\\boldsymbol\{r\}\_\{\|\\boldsymbol\{D\}\|\}\):\\forall\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\},\\forall V\\in\\boldsymbol\{D\}\_\{k\}\\cap\\boldsymbol\{V\},\\;f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\},\\boldsymbol\{r\}\)=v\\\}\.SincefV\(𝒑𝒂𝒗,𝒓\)=fV\(𝒑𝒂𝒗,𝒓k\)f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\},\\boldsymbol\{r\}\)=f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\},\\boldsymbol\{r\}\_\{k\}\)wheneverV∈𝑫kV\\in\\boldsymbol\{D\}\_\{k\}:

h\(𝒗\)=\{\(𝒓1,…,𝒓\|𝑫\|\):∀𝑫k∈𝑫,∀V∈𝑫k∩𝑽,fV\(𝒑𝒂𝒗,𝒓k\)=v\}\.h\(\\boldsymbol\{v\}\)=\\\{\(\\boldsymbol\{r\}\_\{1\},\\dots,\\boldsymbol\{r\}\_\{\|\\boldsymbol\{D\}\|\}\):\\forall\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\},\\forall V\\in\\boldsymbol\{D\}\_\{k\}\\cap\\boldsymbol\{V\},\\;f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\},\\boldsymbol\{r\}\_\{k\}\)=v\\\}\.The predicate is a conjunction where each conjunct depends only on the response\-type configuration𝒓k\\boldsymbol\{r\}\_\{k\}of one district𝑫k\\boldsymbol\{D\}\_\{k\}\. By Lemma[G\.1](https://arxiv.org/html/2605.06993#A7.Thmtheorem1)\(Appendix[G](https://arxiv.org/html/2605.06993#A7)\), we can factorize this set:

h\(𝒗\)=∏𝑫k∈𝑫\{𝒓k:∀V∈𝑽∩𝑫k,fV\(𝒑𝒂𝒗\(V\),𝒓k\)=v\}=∏𝑫k∈𝑫hk\(𝒗\)\.h\(\\boldsymbol\{v\}\)=\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}\}\\\{\\boldsymbol\{r\}\_\{k\}:\\forall V\\in\\boldsymbol\{V\}\\cap\\boldsymbol\{D\}\_\{k\},\\;f\_\{V\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\}\(V\),\\boldsymbol\{r\}\_\{k\}\)=v\\\}=\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}\}h\_\{k\}\(\\boldsymbol\{v\}\)\.∎

### F\.4Proof of Lemma[6\.7](https://arxiv.org/html/2605.06993#S6.Thmtheorem7)

###### Proof\.

Single district\.By Eq\. \(26\) fromTian and Pearl \[[2002](https://arxiv.org/html/2605.06993#bib.bib8)\], the c\-factor for district𝑫k\\boldsymbol\{D\}\_\{k\}is

Qk\(𝒗\)=∑𝒖k∈supp\(𝑼k\)∏Vi∈𝑽kP\(vi∣𝒑𝒂𝒗\(Vi\),𝒖k\)P\(𝒖k\),Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{u\}\_\{k\}\\in supp\(\\boldsymbol\{U\}\_\{k\}\)\}\\prod\_\{V\_\{i\}\\in\\boldsymbol\{V\}\_\{k\}\}P\(v\_\{i\}\\mid\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\}\(V\_\{i\}\),\\boldsymbol\{u\}\_\{k\}\)\\,P\(\\boldsymbol\{u\}\_\{k\}\),where𝑼k\\boldsymbol\{U\}\_\{k\}denotes the latent confounders in𝑫k\\boldsymbol\{D\}\_\{k\}\. Replacing𝑼k\\boldsymbol\{U\}\_\{k\}by our discrete𝑹k\\boldsymbol\{R\}\_\{k\}:

Qk\(𝒗\)=∑𝒓k∏Vi∈𝑽kP\(vi∣𝒑𝒂𝒗\(Vi\),𝒓k\)P\(𝒓k\)\.Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{k\}\}\\prod\_\{V\_\{i\}\\in\\boldsymbol\{V\}\_\{k\}\}P\(v\_\{i\}\\mid\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\}\(V\_\{i\}\),\\boldsymbol\{r\}\_\{k\}\)\\,P\(\\boldsymbol\{r\}\_\{k\}\)\.Each realization𝒓k\\boldsymbol\{r\}\_\{k\}deterministically fixes the structural equations, so the conditional probability becomes an indicator:

Qk\(𝒗\)=∑𝒓k∏Vi∈𝑽k𝟏\[fVi\(𝒑𝒂𝒗\(Vi\),𝒓k\)=vi\]P\(𝒓k\)\.Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{k\}\}\\prod\_\{V\_\{i\}\\in\\boldsymbol\{V\}\_\{k\}\}\\boldsymbol\{1\}\\\!\\big\[f\_\{V\_\{i\}\}\(\\boldsymbol\{pa\}\_\{\\boldsymbol\{v\}\}\(V\_\{i\}\),\\,\\boldsymbol\{r\}\_\{k\}\)=v\_\{i\}\\big\]\\,P\(\\boldsymbol\{r\}\_\{k\}\)\.The product of indicators equals 1 if and only if𝒓k∈hk\(𝒗\)\\boldsymbol\{r\}\_\{k\}\\in h\_\{k\}\(\\boldsymbol\{v\}\), yieldingQk\(𝒗\)=∑𝒓k∈hk\(𝒗\)P\(𝒓k\)Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{k\}\\in h\_\{k\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{k\}\)\.

Set of districts\.We show the result for\|𝑫′\|=2\|\\boldsymbol\{D\}^\{\\prime\}\|=2; the general case follows by induction\. Let𝑫′=\{𝑫1,𝑫2\}\\boldsymbol\{D\}^\{\\prime\}=\\\{\\boldsymbol\{D\}\_\{1\},\\boldsymbol\{D\}\_\{2\}\\\}\. Applying the single\-district result:

∏𝑫k∈𝑫′Qk\(𝒗\)=\(∑r1∈h1\(𝒗\)P\(r1\)\)\(∑r2∈h2\(𝒗\)P\(r2\)\)\.\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\\prime\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)=\\left\(\\sum\_\{r\_\{1\}\\in h\_\{1\}\(\\boldsymbol\{v\}\)\}P\(r\_\{1\}\)\\right\)\\left\(\\sum\_\{r\_\{2\}\\in h\_\{2\}\(\\boldsymbol\{v\}\)\}P\(r\_\{2\}\)\\right\)\.Expanding the product of sums:

=∑r1∈h1\(𝒗\)∑r2∈h2\(𝒗\)P\(r1\)P\(r2\)\.=\\sum\_\{r\_\{1\}\\in h\_\{1\}\(\\boldsymbol\{v\}\)\}\\sum\_\{r\_\{2\}\\in h\_\{2\}\(\\boldsymbol\{v\}\)\}P\(r\_\{1\}\)\\,P\(r\_\{2\}\)\.Since response\-types in distinct districts are independent,P\(rk1\)P\(rk2\)=P\(𝒓𝑫′\)P\(r\_\{k\_\{1\}\}\)\\,P\(r\_\{k\_\{2\}\}\)=P\(\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\)\. Collapsing the double sum into a sum over the Cartesian product666Formally justified by Lemma[G\.2](https://arxiv.org/html/2605.06993#A7.Thmtheorem2)\.and applying Lemma[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5):

=∑𝒓𝑫′∈h1\(𝒗\)×h2\(𝒗\)P\(𝒓𝑫′\)=∑𝒓𝑫′∈h𝑫′\(𝒗\)P\(𝒓𝑫′\)\.∎=\\sum\_\{\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\\in h\_\{1\}\(\\boldsymbol\{v\}\)\\times h\_\{2\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\\in h\_\{\\boldsymbol\{D\}^\{\\prime\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\boldsymbol\{D\}^\{\\prime\}\}\)\.\\qed

### F\.5Proof of Lemma[6\.10](https://arxiv.org/html/2605.06993#S6.Thmtheorem10)

###### Proof\.

We write the observational constraint for a fixed𝒗∈supp\(𝑽\)\\boldsymbol\{v\}\\in supp\(\\boldsymbol\{V\}\):

P\(𝒗\)=∑𝒓∈h\(𝒗\)P\(𝒓\)\.P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\)\.Let𝑫θ¯=𝑫∖𝑫θ∗\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}=\\boldsymbol\{D\}\\setminus\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}denote all districts not in𝑫θ∗\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}and𝑹θ¯=𝑫θ¯∩𝑹\\boldsymbol\{R\}\_\{\\bar\{\\theta\}\}=\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\\cap\\boldsymbol\{R\}their response types\. Since response\-types in distinct districts are independent,P\(𝒓\)=P\(𝒓θ∗\)P\(𝒓θ¯\)P\(\\boldsymbol\{r\}\)=P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\\,P\(\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\):

P\(𝒗\)=∑𝒓∈h\(𝒗\)P\(𝒓θ∗\)P\(𝒓θ¯\)\.P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\\in h\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\\,P\(\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\)\.By Lemma[6\.5](https://arxiv.org/html/2605.06993#S6.Thmtheorem5),h\(𝒗\)=∏𝑫k∈𝑫hk\(𝒗\)h\(\\boldsymbol\{v\}\)=\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}\}h\_\{k\}\(\\boldsymbol\{v\}\), which we partition ash𝑫θ∗\(𝒗\)×h𝑫θ¯\(𝒗\)h\_\{\\boldsymbol\{D\}\_\{\\theta\}^\{\*\}\}\(\\boldsymbol\{v\}\)\\times h\_\{\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}\(\\boldsymbol\{v\}\)\. Writing as a double summation and pulling outP\(𝒓θ∗\)P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\(independent of𝒓θ¯\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\):

P\(𝒗\)=∑𝒓θ∗∈h𝑫θ∗\(𝒗\)P\(𝒓θ∗\)∑𝒓θ¯∈h𝑫θ¯\(𝒗\)P\(𝒓θ¯\)\.P\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\\in h\_\{\\boldsymbol\{D\}\_\{\\theta\}^\{\*\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\\sum\_\{\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\\in h\_\{\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\)\.Since𝑫θ∗∪𝑫θ¯=𝑫\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\\cup\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}=\\boldsymbol\{D\}, the c\-factor decomposition givesP\(𝒗\)=∏𝑫k∈𝑫θ∗Qk\(𝒗\)⋅∏𝑫l∈𝑫θ¯Ql\(𝒗\)P\(\\boldsymbol\{v\}\)=\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)\\cdot\\prod\_\{\\boldsymbol\{D\}\_\{l\}\\in\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}Q\_\{l\}\(\\boldsymbol\{v\}\):

∏𝑫k∈𝑫θ∗Qk\(𝒗\)∏𝑫l∈𝑫θ¯Ql\(𝒗\)=∑𝒓θ∗∈h𝑫θ∗\(𝒗\)P\(𝒓θ∗\)∑𝒓θ¯∈h𝑫θ¯\(𝒗\)P\(𝒓θ¯\)\.\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)\\prod\_\{\\boldsymbol\{D\}\_\{l\}\\in\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}Q\_\{l\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\\in h\_\{\\boldsymbol\{D\}\_\{\\theta\}^\{\*\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\\sum\_\{\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\\in h\_\{\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\)\.By Lemma[6\.7](https://arxiv.org/html/2605.06993#S6.Thmtheorem7),∏𝑫l∈𝑫θ¯Ql\(𝒗\)=∑𝒓θ¯∈h𝑫θ¯\(𝒗\)P\(𝒓θ¯\)\\prod\_\{\\boldsymbol\{D\}\_\{l\}\\in\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}Q\_\{l\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\\in h\_\{\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\bar\{\\theta\}\}\)\. Dividing both sides by∏𝑫l∈𝑫θ¯Ql\(𝒗\)\>0\\prod\_\{\\boldsymbol\{D\}\_\{l\}\\in\\boldsymbol\{D\}\_\{\\bar\{\\theta\}\}\}Q\_\{l\}\(\\boldsymbol\{v\}\)\>0yields the desired identity:

∏𝑫k∈𝑫θ∗Qk\(𝒗\)=∑𝒓θ∗∈h𝑫θ∗\(𝒗\)P\(𝒓θ∗\)\.\\prod\_\{\\boldsymbol\{D\}\_\{k\}\\in\\boldsymbol\{D\}^\{\*\}\_\{\\theta\}\}Q\_\{k\}\(\\boldsymbol\{v\}\)=\\sum\_\{\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\\in h\_\{\\boldsymbol\{D\}\_\{\\theta\}^\{\*\}\}\(\\boldsymbol\{v\}\)\}P\(\\boldsymbol\{r\}\_\{\\theta\}^\{\*\}\)\.∎

###### Lemma F\.1\.

The base program𝒫∅\\mathcal\{P\}\_\{\\emptyset\}optimizes only over𝐑θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\.

###### Proof\.

First, by construction of𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}, it covers all response\-types necessary for expressingθ\\theta\. Therefore the target polynomial𝒯\\mathcal\{T\}is only a function of𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. Second, Lemma[6\.10](https://arxiv.org/html/2605.06993#S6.Thmtheorem10)shows that all observational constraints depend only on𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. Hence𝒞∅\\mathcal\{C\}\_\{\\emptyset\}is only a function of𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}\. Since both the target function and the constraints depend only on𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}, it is sufficient to optimize over𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}in𝒫∅\\mathcal\{P\}\_\{\\emptyset\}\. ∎

### F\.6Proof of Theorem[6\.11](https://arxiv.org/html/2605.06993#S6.Thmtheorem11)

###### Proof\.

Since𝒫∅\\mathcal\{P\}\_\{\\emptyset\}optimizes over𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}only \(Lemma[F\.1](https://arxiv.org/html/2605.06993#A6.Thmtheorem1)\), there is no sequence of co\-occurring decision variables connectingR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)to𝒯\\mathcal\{T\}\. ByDuarteet al\.\[[2023](https://arxiv.org/html/2605.06993#bib.bib13)\], the constraint for𝒶\\mathcal\{a\}can be dropped, resulting in𝒫𝒶=𝒫∅\\mathcal\{P\}\_\{\\mathcal\{a\}\}=\\mathcal\{P\}\_\{\\emptyset\}andpotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\. ∎

### F\.7Proof of Corollary[6\.12](https://arxiv.org/html/2605.06993#S6.Thmtheorem12)

###### Proof\.

By assumption, for every𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}, every directed path from𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}to𝒀\(𝔠\)\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\}passes through𝑿\(𝔠\)\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}\. By the definition ofR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\), a response\-typeRRis inR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)only if there exists some𝔠∈ℭ\\mathfrak\{c\}\\in\\mathfrak\{C\}and a directed path fromRRto𝒀\(𝔠\)\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\}that does*not*pass through𝑿\(𝔠\)\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}\. Since no such path exists for any𝔠\\mathfrak\{c\}, no element of𝑹θ∗\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}can be inR⋆\(𝒶\)R^\{\\star\}\(\\mathcal\{a\}\)\. HenceR⋆\(𝒶\)∩𝑹θ∗=∅R^\{\\star\}\(\\mathcal\{a\}\)\\cap\\boldsymbol\{R\}\_\{\\theta\}^\{\*\}=\\emptyset\. By Theorem[6\.11](https://arxiv.org/html/2605.06993#S6.Thmtheorem11), we havepotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\. ∎

### F\.8Proof of Theorem[6\.14](https://arxiv.org/html/2605.06993#S6.Thmtheorem14)

###### Proof\.

Since𝒶\\mathcal\{a\}is ID, it is uniquely computable fromP\(𝑽\)P\(\\boldsymbol\{V\}\)\. Therefore, we can determine a unique valuep𝒶∗p\_\{\\mathcal\{a\}\}^\{\*\}for the experimental outcome, which satisfies∀𝒓∈ℱ∅:f𝒶\(𝒓\)=p𝒶∗\\forall\\boldsymbol\{r\}\\in\\mathcal\{F\}\_\{\\emptyset\}:f\_\{\\mathcal\{a\}\}\(\\boldsymbol\{r\}\)=p\_\{\\mathcal\{a\}\}^\{\*\}\. Henceℱ∅⊆ℱ𝒶\(p𝒶∗\)\\mathcal\{F\}\_\{\\emptyset\}\\subseteq\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}^\{\*\}\), which gives:

ℱ∅∩ℱ𝒶\(p𝒶∗\)=ℱ∅\.\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}^\{\*\}\)=\\mathcal\{F\}\_\{\\emptyset\}\.Since the combined feasible set coincides withℱ∅\\mathcal\{F\}\_\{\\emptyset\}, the optimization is unchanged:𝒲θ\(𝒶\)=𝒲θ\(∅\)\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{a\}\)=\\mathcal\{W\}\_\{\\theta\}\(\\emptyset\)andpotθ\(𝒶\)=0pot\_\{\\theta\}\(\\mathcal\{a\}\)=0\. ∎

### F\.9Proof of Corollary[6\.15](https://arxiv.org/html/2605.06993#S6.Thmtheorem15)

###### Proof\.

Since𝒶\\mathcal\{a\}is identifiable, it has a unique feasible outcomep𝒶∗p\_\{\\mathcal\{a\}\}^\{\*\}withℱ∅∩ℱ𝒶\(p𝒶∗\)=ℱ∅\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}^\{\*\}\)=\\mathcal\{F\}\_\{\\emptyset\}\(Theorem[6\.14](https://arxiv.org/html/2605.06993#S6.Thmtheorem14)\)\. For any𝒷⊆𝒜\\mathcal\{b\}\\subseteq\\mathcal\{A\}and feasiblep𝒷p\_\{\\mathcal\{b\}\}:

ℱ∅∩ℱ\{𝒶\}∪𝒷\(p𝒶∗,p𝒷\)=ℱ∅∩ℱ𝒶\(p𝒶∗\)∩ℱ𝒷\(p𝒷\)=ℱ∅∩ℱ𝒷\(p𝒷\)\.\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\\{\\mathcal\{a\}\\\}\\cup\\mathcal\{b\}\}\(p\_\{\\mathcal\{a\}\}^\{\*\},p\_\{\\mathcal\{b\}\}\)=\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{a\}\}\(p\_\{\\mathcal\{a\}\}^\{\*\}\)\\cap\\mathcal\{F\}\_\{\\mathcal\{b\}\}\(p\_\{\\mathcal\{b\}\}\)=\\mathcal\{F\}\_\{\\emptyset\}\\cap\\mathcal\{F\}\_\{\\mathcal\{b\}\}\(p\_\{\\mathcal\{b\}\}\)\.Sincep𝒶∗p\_\{\\mathcal\{a\}\}^\{\*\}is the only feasible value ofp𝒶p\_\{\\mathcal\{a\}\}, the worst\-case width satisfies𝒲θ\(\{𝒶\}∪𝒷\)=𝒲θ\(𝒷\)\\mathcal\{W\}\_\{\\theta\}\(\\\{\\mathcal\{a\}\\\}\\cup\\mathcal\{b\}\)=\\mathcal\{W\}\_\{\\theta\}\(\\mathcal\{b\}\), and hencepotθ\(\{𝒶\}∪𝒷\)=potθ\(𝒷\)pot\_\{\\theta\}\(\\\{\\mathcal\{a\}\\\}\\cup\\mathcal\{b\}\)=pot\_\{\\theta\}\(\\mathcal\{b\}\)\. ∎

## Appendix GHelper Lemmas

###### Lemma G\.1\(Factorization of Cartesian Products\)\.

Let𝐑1,𝐑2\\boldsymbol\{R\}\_\{1\},\\boldsymbol\{R\}\_\{2\}be two sets of random variables with disjoint supports, and letϕ1,ϕ2\\phi\_\{1\},\\phi\_\{2\}be predicates depending only on configurations of𝐑1\\boldsymbol\{R\}\_\{1\}and𝐑2\\boldsymbol\{R\}\_\{2\}, respectively\. Then

\{\(𝒓1,𝒓2\)∈supp\(𝑹1\)×supp\(𝑹2\):ϕ1\(𝒓1\)∧ϕ2\(𝒓2\)\}\\displaystyle\\Big\\\{\(\\boldsymbol\{r\}\_\{1\},\\boldsymbol\{r\}\_\{2\}\)\\in supp\(\\boldsymbol\{R\}\_\{1\}\)\\times supp\(\\boldsymbol\{R\}\_\{2\}\):\\phi\_\{1\}\(\\boldsymbol\{r\}\_\{1\}\)\\wedge\\phi\_\{2\}\(\\boldsymbol\{r\}\_\{2\}\)\\Big\\\}=\{𝒓1∈supp\(𝑹1\):ϕ1\(𝒓1\)\}×\{𝒓2∈supp\(𝑹2\):ϕ2\(𝒓2\)\}\.\\displaystyle=\\big\\\{\\boldsymbol\{r\}\_\{1\}\\in supp\(\\boldsymbol\{R\}\_\{1\}\):\\phi\_\{1\}\(\\boldsymbol\{r\}\_\{1\}\)\\big\\\}\\times\\big\\\{\\boldsymbol\{r\}\_\{2\}\\in supp\(\\boldsymbol\{R\}\_\{2\}\):\\phi\_\{2\}\(\\boldsymbol\{r\}\_\{2\}\)\\big\\\}\.

###### Proof\.

\(⊆\)\(\\subseteq\): If\(𝒓1,𝒓2\)\(\\boldsymbol\{r\}\_\{1\},\\boldsymbol\{r\}\_\{2\}\)is in the left\-hand side, thenϕ1\(𝒓1\)\\phi\_\{1\}\(\\boldsymbol\{r\}\_\{1\}\)holds, so𝒓1\\boldsymbol\{r\}\_\{1\}is in the first factor, andϕ2\(𝒓2\)\\phi\_\{2\}\(\\boldsymbol\{r\}\_\{2\}\)holds, so𝒓2\\boldsymbol\{r\}\_\{2\}is in the second factor\.

\(⊇\)\(\\supseteq\): If𝒓1\\boldsymbol\{r\}\_\{1\}is in the first factor and𝒓2\\boldsymbol\{r\}\_\{2\}in the second, then bothϕ1\(𝒓1\)\\phi\_\{1\}\(\\boldsymbol\{r\}\_\{1\}\)andϕ2\(𝒓2\)\\phi\_\{2\}\(\\boldsymbol\{r\}\_\{2\}\)hold, so\(𝒓1,𝒓2\)\(\\boldsymbol\{r\}\_\{1\},\\boldsymbol\{r\}\_\{2\}\)is in the left\-hand side\.

The generalization to any finite collection𝑹1,…,𝑹K\\boldsymbol\{R\}\_\{1\},\\dots,\\boldsymbol\{R\}\_\{K\}follows by induction\. ∎

###### Lemma G\.2\(Summation over Cartesian Product as Double Summation\)\.

Let𝔸\\mathbb\{A\}be one ofℕ,ℤ,ℚ,ℝ,ℂ\\mathbb\{N\},\\mathbb\{Z\},\\mathbb\{Q\},\\mathbb\{R\},\\mathbb\{C\}\. LetS,TS,Tbe finite sets andf:S×T→𝔸f\\colon S\\times T\\to\\mathbb\{A\}\. Then

∑\(s,t\)∈S×Tf\(s,t\)=∑s∈S∑t∈Tf\(s,t\)\.\\sum\_\{\(s,t\)\\,\\in\\,S\\times T\}f\(s,t\)=\\sum\_\{s\\in S\}\\;\\sum\_\{t\\in T\}f\(s,t\)\.

###### Proof\.

We proceed by induction on\|T\|\|T\|\.

#### Base case\.

If\|T\|=0\|T\|=0, thenT=∅T=\\emptyset, soS×T=∅S\\times T=\\emptyset\. Both sides equal0by the convention for empty summation\.

#### Inductive step\.

Suppose the identity holds whenever\|T\|=n≥0\|T\|=n\\geq 0, and let\|T\|=n\+1\|T\|=n\+1\. Pickt0∈Tt\_\{0\}\\in T, setT′:=T∖\{t0\}T^\{\\prime\}:=T\\setminus\\\{t\_\{0\}\\\}\. ThenS×T=\(S×T′\)∪\(S×\{t0\}\)S\\times T=\(S\\times T^\{\\prime\}\)\\cup\(S\\times\\\{t\_\{0\}\\\}\)\(disjoint\)\. By the sum over disjoint unions:

∑\(s,t\)∈S×Tf\(s,t\)=∑\(s,t\)∈S×T′f\(s,t\)\+∑\(s,t\)∈S×\{t0\}f\(s,t\)\.\\sum\_\{\(s,t\)\\in S\\times T\}f\(s,t\)=\\sum\_\{\(s,t\)\\in S\\times T^\{\\prime\}\}f\(s,t\)\+\\sum\_\{\(s,t\)\\in S\\times\\\{t\_\{0\}\\\}\}f\(s,t\)\.By the induction hypothesis, the first summand equals∑s∈S∑t∈T′f\(s,t\)\\sum\_\{s\\in S\}\\sum\_\{t\\in T^\{\\prime\}\}f\(s,t\)\. The second equals∑s∈Sf\(s,t0\)\\sum\_\{s\\in S\}f\(s,t\_\{0\}\)\. Combining via linearity of summation:

=∑s∈S\(∑t∈T′f\(s,t\)\+f\(s,t0\)\)=∑s∈S∑t∈Tf\(s,t\)\.∎=\\sum\_\{s\\in S\}\\left\(\\sum\_\{t\\in T^\{\\prime\}\}f\(s,t\)\+f\(s,t\_\{0\}\)\\right\)=\\sum\_\{s\\in S\}\\sum\_\{t\\in T\}f\(s,t\)\.\\qed

## Appendix HExperimental Setup Details

This appendix details the synthetic evaluation protocol of Section[7](https://arxiv.org/html/2605.06993#S7)\.

### H\.1Common Parameters

Both the Erdős–Rényi and*bnlearn*evaluations share the following configuration:

\|𝒜\|\|\\mathcal\{A\}\|\(candidate experiments per simulation\)200200Master seed4242θ\\thetaCF\_fraction1\.01\.0θ\\thetaoutcome size\|𝒀\(𝔠\)\|\|\\boldsymbol\{Y\}^\{\(\\mathfrak\{c\}\)\}\|11θ\\thetaintervention size\|𝑿\(𝔠\)\|\|\\boldsymbol\{X\}^\{\(\\mathfrak\{c\}\)\}\|11ExperimentCF\_fraction0\.30\.3Experiment outcome size\|𝑾\(𝔠\)\|\|\\boldsymbol\{W\}^\{\(\\mathfrak\{c\}\)\}\|Unif\{1,2,3\}\\mathrm\{Unif\}\\\{1,2,3\\\}Experiment intervention size\|𝒁\(𝔠\)\|\|\\boldsymbol\{Z\}^\{\(\\mathfrak\{c\}\)\}\|Unif\{1,2,3\}\\mathrm\{Unif\}\\\{1,2,3\\\}We fixθ\\thetato be a two\-world counterfactual to guarantee that the query is not point\-identifiable, placing every simulation in the partial\-identification regime our framework targets\. The30%30\\%counterfactual share for candidate experiments reflects a plausible mix of interventional and counterfactual experiments a practitioner might consider\.

### H\.2Query Sampling

Each queryθ\\thetais sampled as follows:

1. 1\.PickY\(1\)Y^\{\(1\)\}uniformly from𝑽\\boldsymbol\{V\}; pickX\(1\)X^\{\(1\)\}uniformly fromanc\(Y\(1\)\)∖\{Y\(1\)\}\\mathrm\{anc\}\(Y^\{\(1\)\}\)\\setminus\\\{Y^\{\(1\)\}\\\}\.
2. 2\.For the second world, pickY\(2\)Y^\{\(2\)\}uniformly from𝑽\\boldsymbol\{V\}subject toX\(1\)∈anc\(Y\(2\)\)X^\{\(1\)\}\\in\\mathrm\{anc\}\(Y^\{\(2\)\}\), then setX\(2\)=X\(1\)X^\{\(2\)\}=X^\{\(1\)\}\.

### H\.3Candidate Experiment Sampling

Each𝒶∈𝒜\\mathcal\{a\}\\in\\mathcal\{A\}is sampled independently\. With probability0\.30\.3,𝒶\\mathcal\{a\}has two counterfactual worlds; otherwise one\.

1. 1\.Sample\|𝑾\(1\)\|,\|𝒁\(1\)\|∼Unif\{1,2,3\}\|\\boldsymbol\{W\}^\{\(1\)\}\|,\|\\boldsymbol\{Z\}^\{\(1\)\}\|\\sim\\mathrm\{Unif\}\\\{1,2,3\\\}\.
2. 2\.Pick𝑾\(1\)⊆𝑽\\boldsymbol\{W\}^\{\(1\)\}\\subseteq\\boldsymbol\{V\}uniformly; pick𝒁\(1\)⊆anc\(𝑾\(1\)\)∖𝑾\(1\)\\boldsymbol\{Z\}^\{\(1\)\}\\subseteq\\mathrm\{anc\}\(\\boldsymbol\{W\}^\{\(1\)\}\)\\setminus\\boldsymbol\{W\}^\{\(1\)\}uniformly\.
3. 3\.For two\-world experiments, sample𝑾\(2\),𝒁\(2\)\\boldsymbol\{W\}^\{\(2\)\},\\boldsymbol\{Z\}^\{\(2\)\}analogously, but seed each with a shared element: pickw∈𝑾\(1\)w\\in\\boldsymbol\{W\}^\{\(1\)\}uniformly and requirew∈𝑾\(2\)w\\in\\boldsymbol\{W\}^\{\(2\)\}; pickz∈𝒁\(1\)∩\(anc\(𝑾\(2\)\)∖𝑾\(2\)\)z\\in\\boldsymbol\{Z\}^\{\(1\)\}\\cap\(\\mathrm\{anc\}\(\\boldsymbol\{W\}^\{\(2\)\}\)\\setminus\\boldsymbol\{W\}^\{\(2\)\}\)uniformly and requirez∈𝒁\(2\)z\\in\\boldsymbol\{Z\}^\{\(2\)\}\. Remaining variables fill from𝑽\\boldsymbol\{V\}andanc\(𝑾\(2\)\)∖𝑾\(2\)\\mathrm\{anc\}\(\\boldsymbol\{W\}^\{\(2\)\}\)\\setminus\\boldsymbol\{W\}^\{\(2\)\}, respectively\.

### H\.4Erdős–Rényi Graphs

Each ER graph has a fixed topological orderV1<⋯<VNV\_\{1\}<\\cdots<V\_\{N\}\. Directed edgesVi→VjV\_\{i\}\\to V\_\{j\}fori<ji<jare drawn independently with probabilitypp; bidirected edges between each pair are drawn independently with probabilityqq\.

The full sweep yields4×4×100=1,6004\\times 4\\times 100=1\{,\}600simulations, of which1,5961\{,\}596are reported\.

### H\.5*bnlearn*Graphs

We use1111networks from the*bnlearn*repository:ASIA,SACHS,ALARM,BARLEY,CHILD,INSURANCE,MILDEW,WATER,HAILFINDER,HEPAR2,WIN95PTS\. Only the graph structure is retained; conditional probability tables are discarded\. Since these DAGs are not associated with latent variables, confounders are introduced synthetically\. For each simulation, after samplingθ\\thetaon the bare DAG:

1. 1\.Add a forced confounder betweenX\(1\)X^\{\(1\)\}andY\(1\)Y^\{\(1\)\}\.
2. 2\.Draw a target ratior∼Unif\(0\.1,0\.9\)r\\sim\\mathrm\{Unif\}\(0\.1,0\.9\), clamped below by1/\|𝑽\|1/\|\\boldsymbol\{V\}\|for achievability\.
3. 3\.Add confounders between uniformly random pairs of observed variables until the total reaches⌊r⋅\|𝑽\|⌋\\lfloor r\\cdot\|\\boldsymbol\{V\}\|\\rfloor\.

With100100simulations per graph, this yields11×100=1,10011\\times 100=1\{,\}100simulations\.

![Refer to caption](https://arxiv.org/html/2605.06993v1/images/evaluation/bnlearn_against_Rstartheta.png)Figure 9:*bnlearn*\. The fraction of experiments pruned by the interception mechanism \(Algorithm[3](https://arxiv.org/html/2605.06993#alg3)\) against the relative size of𝑹θ∗\\boldsymbol\{R\}^\{\*\}\_\{\\theta\}\. The line represents the mean fraction of experiments pruned; the band around it represents the±1\\pm 1standard deviation\.
### H\.6Compute Used

All experiments were executed on a single Microsoft Surface Laptop Studio with an 11th Gen Intel Core i7\-11370H CPU \(3\.30 GHz base\) and 32 GB RAM\. No GPU was used and no parallelization was employed; all simulations ran single\-threaded\. Each of the three experimental components—the Erdős–Rényi sweep \(1,596 simulations\), the*bnlearn*sweep \(1,100 simulations across 11 networks\), and the NHANES end\-to\-end demonstration \(Section[7](https://arxiv.org/html/2605.06993#S7), Table[1](https://arxiv.org/html/2605.06993#S7.T1)\)—completed in a matter of hours\. No additional compute infrastructure was used\.

### H\.7Per\-graph results \(*bnlearn*\)

Table[3](https://arxiv.org/html/2605.06993#A8.T3)reports the per\-graph mean and standard deviation of pruning rates across 100 simulations per graph, broken down by mechanism\. Across all 1,100 simulations, total pruning has mean63\.4%63\.4\\%\(std12\.4%12\.4\\%\), of which47\.1%±14\.5%47\.1\\%\\pm 14\.5\\%is contributed by the ID check and16\.2%±17\.5%16\.2\\%\\pm 17\.5\\%by interception\. The interception rate varies by more than an order of magnitude across networks—from2\.3%±4\.3%2\.3\\%\\pm 4\.3\\%onMILDEWto29\.2%±25\.5%29\.2\\%\\pm 25\.5\\%onWIN95PTS—driven primarily by the relative size of the district hull\|𝑹θ∗\|/\|𝑽\|\|\\boldsymbol\{R\}^\{\*\}\_\{\\theta\}\|/\|\\boldsymbol\{V\}\|\(Figure[9](https://arxiv.org/html/2605.06993#A8.F9)\)\.

Table 3:Per\-graph pruning rates on*bnlearn*networks \(mean±\\pmstd over 100 simulations\)\.
### H\.8Per\-density breakdown \(Erdős–Rényi\)

Table[4](https://arxiv.org/html/2605.06993#A8.T4)reports the breakdown of pruning rates across the four confounding\-density settingsc∈\{0\.5,1\.0,1\.5,2\.0\}c\\in\\\{0\.5,1\.0,1\.5,2\.0\\\}, where edges of the bidirected graph are drawn independently with probabilityc/Nc/N\(Appendix[H\.4](https://arxiv.org/html/2605.06993#A8.SS4)\)\. The mean achieved confounder ratio\|𝑼\|/\|𝑽\|\|\\boldsymbol\{U\}\|/\|\\boldsymbol\{V\}\|grows monotonically withcc, from0\.240\.24atc=0\.5c\{=\}0\.5to0\.940\.94atc=2\.0c\{=\}2\.0\. As confounding density increases, interception weakens \(mean drops from70\.9%70\.9\\%to16\.5%16\.5\\%\) and the ID check picks up the slack \(mean rises from17\.2%17\.2\\%to45\.6%45\.6\\%\), but total pruning still declines from88\.1%88\.1\\%to62\.1%62\.1\\%\.

Table 4:Per\-density breakdown for the Erdős–Rényi evaluation \(mean±\\pmstd\)\.\|𝑼\|/\|𝑽\|\|\\boldsymbol\{U\}\|/\|\\boldsymbol\{V\}\|is the mean confounder ratio for the given parametercc; remaining columns are pruning rates by mechanism\.
### H\.9ID\-only pruning \(subset search\)

For the subset search space reduction reported in Section[7](https://arxiv.org/html/2605.06993#S7), we run the ID check on all\|𝒜\|\|\\mathcal\{A\}\|candidates independently, without prior interception filtering\. Tables[5](https://arxiv.org/html/2605.06993#A8.T5)and[6](https://arxiv.org/html/2605.06993#A8.T6)report the results across the same simulations\.

Table 5:ID\-only pruning on*bnlearn*networks \(mean±\\pmstd over 100 simulations\)\.Table 6:ID\-only pruning for Erdős–Rényi, pooled over\|𝑽\|\|\\boldsymbol\{V\}\|\(mean±\\pmstd\)\.

## Appendix INHANES Variable Definitions

This appendix details the construction of the NHANES experiment in Section[7](https://arxiv.org/html/2605.06993#S7)\. We use the NHANES 2017–2018 cycle777[https://wwwn\.cdc\.gov/nchs/nhanes/](https://wwwn.cdc.gov/nchs/nhanes/)and restrict the sample to adult respondents \(RIDAGEYR≥18\\text\{RIDAGEYR\}\\geq 18\) with non\-missing entries on all five variables, yieldingn=5,833n=5\{,\}833complete cases\.

The assumed causal structure \(Figure[10](https://arxiv.org/html/2605.06993#A9.F10)\) encodes the following substantive assumptions: balance problems \(AA\) may cause falls \(BB\), and both may influence the propensity to engage in physical activity \(XX\); health insurance \(ZZ\) acts as an instrument\-like predictor of physical activity through access to preventive care; and physical activity affects the \(negated\) diabetes outcome \(YY\)\. Three latent confounders are posited:U3U\_\{3\}betweenAAandBB\(e\.g\., inner\-ear or vestibular disorders affecting both balance and falls\),U1U\_\{1\}betweenZZandXX\(e\.g\., overall healthcare access and socioeconomic factors\), andU2U\_\{2\}betweenXXandYY\(e\.g\., genetic predisposition or general health\-consciousness\)\. The query of interest is the average treatment effectθ=ATE=P\(yx\)−P\(yx′\)\\theta=\\mathrm\{ATE\}=P\(y\_\{x\}\)\-P\(y\_\{x^\{\\prime\}\}\)of physical activity on not having diabetes\.

AABBZZXXYYU3U\_\{3\}U1U\_\{1\}U2U\_\{2\}Figure 10:Assumed causal ADMG for the NHANES experiment \(n=5,833n=5\{,\}833complete cases\)\. Solid nodes are observed; dashed nodes are latent confounders\.Table[7](https://arxiv.org/html/2605.06993#A9.T7)maps each variable to its corresponding NHANES variable code and the rule used to discretize it to a binary value\. We negate the diabetes indicator \(denoting the variable¬Y\\neg Yin the table\) so thatY=1Y=1corresponds to*not*having a doctor\-diagnosed diabetes condition; this preserves the natural reading ofθ\>0\\theta\>0as “physical activity is protective\.”

Table 7:Mapping of variables to NHANES 2017–2018 codes\.The five\-variable ADMG and the discretized observational distributionP\(A,B,Z,X,Y\)P\(A,B,Z,X,Y\)derived from these definitions form the input to the pruning and potency\-evaluation pipeline reported in Table[1](https://arxiv.org/html/2605.06993#S7.T1)\.
Optimal Experiments for Partial Causal Effect Identification

Similar Articles

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection

PACER: Acyclic Causal Discovery from Large-Scale Interventional Data

One Ruler: A Same-Hands Re-Evaluation of Bivariate Causal Direction on Tuebingen, with a Parameter-Free Compression Baseline

Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions

Treatment Effect Estimation with Differentiated Networked Effect on Graph Data

Submit Feedback

Similar Articles

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
PACER: Acyclic Causal Discovery from Large-Scale Interventional Data
One Ruler: A Same-Hands Re-Evaluation of Bivariate Causal Direction on Tuebingen, with a Parameter-Free Compression Baseline
Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions
Treatment Effect Estimation with Differentiated Networked Effect on Graph Data