GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
Summary
This arXiv preprint introduces GRALIS, a unified mathematical framework that uses the Riesz Representation Theorem to formalize and compare linear attribution methods such as SHAP, LIME, and Integrated Gradients.
# A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
Source: [https://arxiv.org/html/2605.05480](https://arxiv.org/html/2605.05480)
Raimondo Fanale, Universitas Mercatorum, Rome, Italy. Correspondence: raimondo.fanale@ieee.org. Companion paper with experimental validation on BreaKHis: *GRALIS-LLM: Multimodal Explainable AI for Automated Clinical Report Generation in Breast Cancer Histology* (in preparation, Frontiers in Signal Processing, 2026).
(May 2026 · *arXiv Preprint*)
###### Abstract
The main Explainable AI (XAI) methods for deep neural networks (GradCAM, SHAP, LIME, Integrated Gradients) operate on separate theoretical foundations and are not formally comparable. This work presents GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework that unifies a broad class of linear attribution methods, including SHAP, IG, LIME and linearized GradCAM, by establishing a *representation theory for attributions*. Every additive, linear, and continuous attribution functional *in $L^2(\mathcal{Q},\mu)$* admits a unique canonical representation of the form $(\mathcal{Q},w,\Delta)$. Here $\mathcal{Q}$ is the index space induced by the attribution mechanism (integration paths, coalitions, or feature maps), $w\in L^2(\mathcal{Q})$ is a weight function and $\Delta$ is a marginal contribution. This class includes SHAP, IG and LIME, but *not* nonlinear functionals such as standard GradCAM, attention maps, or saliency methods with smoothing. Seven formal theorems provide simultaneous guarantees absent in any individual method:

- (T1) necessary canonical form via the Riesz Theorem;
- (T2) exact completeness;
- (T3) Monte Carlo convergence with bound $O(1/\sqrt{m})+O(1/k)$;
- (T4) exact Shapley Interaction Values;
- (T5) Hoeffding ANOVA decomposition;
- (T6) generalization of Sobol sensitivity indices;
- (T7) multi-scale extension (MS-GRALIS) with minimum-variance weights.

Appendix X provides the algebraic justification of the GRALIS–SIV correspondence via the Möbius transform. It shows that GRALIS *constructs* a cooperative game $v_G$ and *computes exactly* the SIVs on $v_G$, without circularity. GRALIS satisfies a broader set of axiomatic and structural properties than any existing method (see the axiom table). These include completeness, sensitivity, locality, order-$k$ interactions and multi-scale aggregation with optimal weights, all holding simultaneously.
Preliminary experimental validation on breast histology (BreaKHis, 1,187 images, DenseNet-121) is reported in Section [5](https://arxiv.org/html/2605.05480#S5); an extended comparison with baseline XAI methods is planned for a companion paper.
###### Contents
1. [1 Introduction](https://arxiv.org/html/2605.05480#S1)
2. [2 Background and Related Work](https://arxiv.org/html/2605.05480#S2)
   1. [2.1 XAI Attribution Methods](https://arxiv.org/html/2605.05480#S2.SS1)
   2. [2.2 Unification Attempts and Existing Gap](https://arxiv.org/html/2605.05480#S2.SS2)
   3. [2.3 Gap Analysis: Six Structural Gaps](https://arxiv.org/html/2605.05480#S2.SS3)
3. [3 The GRALIS Framework](https://arxiv.org/html/2605.05480#S3)
   1. [3.1 Definitions and Notation](https://arxiv.org/html/2605.05480#S3.SS1)
   2. [3.2 Theorem 1: Unified Canonical Form (Riesz)](https://arxiv.org/html/2605.05480#S3.SS2)
   3. [3.3 Theorem 2: Completeness](https://arxiv.org/html/2605.05480#S3.SS3)
   4. [3.4 Theorem 3: Monte Carlo Convergence (GRALIS-MC)](https://arxiv.org/html/2605.05480#S3.SS4)
   5. [3.5 Theorem 4: Shapley Interaction Values](https://arxiv.org/html/2605.05480#S3.SS5)
   6. [3.6 Theorem 5: Hoeffding ANOVA Decomposition](https://arxiv.org/html/2605.05480#S3.SS6)
   7. [3.7 Theorem 6: Sobol Sensitivity Indices](https://arxiv.org/html/2605.05480#S3.SS7)
   8. [3.8 Theorem 7: Multi-Scale Extension (MS-GRALIS)](https://arxiv.org/html/2605.05480#S3.SS8)
4. [4 Summary and Axiomatic Comparison](https://arxiv.org/html/2605.05480#S4)
5. [5 Preliminary Experimental Validation](https://arxiv.org/html/2605.05480#S5)
6. [6 Conclusions](https://arxiv.org/html/2605.05480#S6)
7. [X Justification via Möbius Transform of the GRALIS–SIV Correspondence](https://arxiv.org/html/2605.05480#A1)
   1. [X.1 Motivation](https://arxiv.org/html/2605.05480#A1.SS1)
   2. [X.2 Cooperative Game Induced by GRALIS](https://arxiv.org/html/2605.05480#A1.SS2)
   3. [X.3 Möbius Transform](https://arxiv.org/html/2605.05480#A1.SS3)
   4. [X.4 Second-Order Interactions](https://arxiv.org/html/2605.05480#A1.SS4)
   5. [X.5 Recovery of Shapley Interaction Values](https://arxiv.org/html/2605.05480#A1.SS5)
   6. [X.6 Interpretation for GRALIS](https://arxiv.org/html/2605.05480#A1.SS6)
   7. [X.7 Final Proposition](https://arxiv.org/html/2605.05480#A1.SS7)
   8. [X.8 Concluding Remark](https://arxiv.org/html/2605.05480#A1.SS8)
8. [Y Formalization of the Projection $\rho$ and the Operator $P_\rho$](https://arxiv.org/html/2605.05480#A1a)
   1. [Y.1 Motivation](https://arxiv.org/html/2605.05480#A1.SS1a)
   2. [Y.2 Lemma 1: Push-Forward Measure and Well-Definedness of $v_G$](https://arxiv.org/html/2605.05480#A1.SS2a)
   3. [Y.3 Lemma 2: Measurable Partition and Full Coverage of $\mathcal{Q}$](https://arxiv.org/html/2605.05480#A1.SS3a)
   4. [Y.4 Lemma 3: Invariance with Respect to the Labeling of $\rho$](https://arxiv.org/html/2605.05480#A1.SS4a)
   5. [Y.5 The Operator $P_\rho$ and the Algebraic Structure of GRALIS](https://arxiv.org/html/2605.05480#A1.SS5a)
   6. [Y.6 GRALIS as a Functor between Continuous Spaces and Cooperative Games](https://arxiv.org/html/2605.05480#A1.SS6a)
9. [References](https://arxiv.org/html/2605.05480#bib)
## 1 Introduction
Deep learning has achieved accuracies surpassing those of human specialists in numerous medical imaging tasks. However, the *black-box* nature of deep neural networks makes it difficult for a clinician to understand *why* a model produced a given prediction. Explainable AI (XAI) has responded with heterogeneous post-hoc methods [[26](https://arxiv.org/html/2605.05480#bib.bib26)]:

- GradCAM [[1](https://arxiv.org/html/2605.05480#bib.bib1)] identifies the most influential regions via gradients of the last convolutional layer;
- SHAP [[2](https://arxiv.org/html/2605.05480#bib.bib2)] assigns Shapley values to features via kernel approximation;
- LIME [[3](https://arxiv.org/html/2605.05480#bib.bib3)] locally approximates the model with an interpretable linear surrogate;
- Integrated Gradients [[4](https://arxiv.org/html/2605.05480#bib.bib4)] integrates the gradients along the straight-line path from a baseline to the input.

Each is developed on distinct foundations with different guarantees, so systematic comparisons between them lack a rigorous basis.
This fragmentation has direct practical consequences. The choice of XAI method is often empirical, attribution maps from different methods are not formally comparable, and the combination of multiple methods lacks a unifying mathematical justification. Prior unification attempts have been partial. Ancona et al. [[5](https://arxiv.org/html/2605.05480#bib.bib5)] observe that GradCAM and related gradient methods are *empirically* expressible as gradient × input, a linear form. However, they do not prove that this form is *structurally necessary*, and they do not include SHAP or LIME. Covert and Lee [[9](https://arxiv.org/html/2605.05480#bib.bib9)] unify LIME, SHAP and IG via Shapley games. Their framework cannot accommodate GradCAM, because its post-aggregation $\mathrm{ReLU}$ violates the linearity the framework requires.
This work presents GRALIS, a framework that resolves the fragmentation. It demonstrates that a broad class of linear additive attribution methods, including SHAP, IG, LIME and linearized GradCAM, are special cases of a unique canonical structure:

$$\phi_i^{\text{GRALIS}}(f,x,x') \;=\; \int_{\mathcal{Q}} w(q)\cdot\Delta_i(f,x,x',q)\,d\mu(q), \tag{1}$$

where $\mathcal{Q}$ is the integration index space, $w$ is the weight function and $\Delta_i$ is the marginal contribution of feature $i$. The Riesz Representation Theorem guarantees that this form is the *unique* possible representation for every *additive* linear and continuous attribution functional (Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4)).
#### Contributions.
1. Theoretical unification. Seven formal theorems that unify linearized GradCAM (GradCAM-lin), SHAP, LIME and IG, and extend their guarantees.
2. Computational efficiency. GRALIS-MC reduces the complexity from $O(2^n\cdot k)$ to $O(m\cdot n\cdot k)$ with an explicit error bound.
3. Algebraic justification. The Möbius transform (Appendix [X](https://arxiv.org/html/2605.05480#A1)) shows that the SIVs are computed *exactly* on a cooperative game constructed by GRALIS, without approximation.
Preliminary experimental validation on breast histology (BreaKHis dataset, distilled DenseNet-121 model) is reported in Section [5](https://arxiv.org/html/2605.05480#S5); an extended comparison with baseline XAI methods is planned for a companion paper [[28](https://arxiv.org/html/2605.05480#bib.bib28)].
## 2 Background and Related Work
### 2.1 XAI Attribution Methods
An attribution method assigns to each input feature a score representing its contribution to the prediction. Formally, given $f:\mathbb{R}^n\to\mathbb{R}$ and an input $x$ with baseline $x'$, the method produces $\phi\in\mathbb{R}^n$ such that $\phi_i$ represents the importance of feature $i$.
#### GradCAM [[1](https://arxiv.org/html/2605.05480#bib.bib1)].
GradCAM computes the map $\mathrm{ReLU}\big(s_{pq}\big)$, with pre-ReLU argument $s_{pq}=\sum_k \alpha_k^c A^k_{pq}$ and $\alpha_k^c=\frac{1}{Z}\sum_{i,j}\frac{\partial y^c}{\partial A^k_{ij}}$. It is computationally efficient (a single backward pass) but satisfies neither completeness nor locality. For Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4) we use *GradCAM-lin*, the pre-ReLU variant defined as

$$L^c_{\mathrm{lin}}(p,q) \;=\; \sum_k \alpha_k^c\,A^k_{pq},\qquad \alpha_k^c=\frac{1}{Z}\sum_{i,j}\frac{\partial y^c}{\partial A^k_{ij}},$$

which omits the post-aggregation $\mathrm{ReLU}$ of standard GradCAM. The $\mathrm{ReLU}$ in [[1](https://arxiv.org/html/2605.05480#bib.bib1)] is a *visualization heuristic* that retains only the spatial positions with a positive influence on the target class. It is not an axiomatic requirement. It applies a pointwise nonlinearity to the output of a linear operator, which destroys linearity in $f$ (see Remark [3.5](https://arxiv.org/html/2605.05480#S3.Thmtheorem5) below).
GradCAM-lin corresponds precisely to the first-order Taylor expansion of $y^c$ around the zero-activation reference $\bar{A}^k=0$:

$$y^c \;\approx\; y^c\big|_{A=0}+\sum_{k,i,j}\frac{\partial y^c}{\partial A^k_{ij}}\,A^k_{ij}, \tag{2}$$

whose spatial summary at position $(p,q)$, obtained by pooling the $A^k_{ij}$ contributions over all $(i,j)$ with uniform weight $1/Z$, is exactly $L^c_{\mathrm{lin}}(p,q)$. This linearization is consistent with three existing lines of work, all of which operate on the pre-nonlinearity linear term:

- the gradient × input interpretation of GradCAM established by Ancona et al. [[5](https://arxiv.org/html/2605.05480#bib.bib5)];
- the deep Taylor decomposition framework of Montavon et al. [[6](https://arxiv.org/html/2605.05480#bib.bib6)];
- the gradient-based saliency maps of Simonyan et al. [[7](https://arxiv.org/html/2605.05480#bib.bib7)].

Standard GradCAM is recovered *exactly* from GradCAM-lin when $\alpha_k^c\geq 0$ for all $k$, that is, when every channel contributes positively to class $c$. This condition is met by the majority of channels in well-trained classifiers [[1](https://arxiv.org/html/2605.05480#bib.bib1)]. The recovery follows because the post-ReLU activations satisfy $A^k_{pq}\geq 0$: the aggregated map is then non-negative, and the final $\mathrm{ReLU}$ acts as the identity. Whenever some $\alpha_k^c<0$, the two methods still agree to first order in the Taylor expansion.
Remark on implementation. Suppose GradCAM-lin is computed on the *pre-activation* feature maps $Z^k_{pq}$ of the last convolutional layer, i.e. before the network's internal ReLU rather than after it. Then the expansion ([2](https://arxiv.org/html/2605.05480#S2.E2)) holds *exactly*, with no first-order qualification:

$$L^c_{\mathrm{lin,pre}}(p,q) \;=\; \sum_k \alpha_k^c\,Z^k_{pq},\quad Z^k_{pq}\in\mathbb{R}.$$

Since $Z^k_{pq}$ is unconstrained in sign, the weighted sum is a genuine linear functional of $f$ in all cases. Standard GradCAM is recovered precisely when $\alpha_k^c\geq 0$ and the network's internal ReLU acts as the identity ($Z^k_{pq}\geq 0$), i.e. when the network itself is locally linear at that layer. This implementation choice is available in any standard deep learning framework and costs no more to compute than standard GradCAM.
#### SHAP [[2](https://arxiv.org/html/2605.05480#bib.bib2)].
The Shapley value of feature $i$ is its average marginal contribution over all coalitions: $\phi_i=\sum_{S\subseteq\mathcal{F}\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\,[f(S\cup\{i\})-f(S)]$. It satisfies efficiency, symmetry, dummy and additivity, but exact computation is $O(2^n)$; KernelSHAP approximates it with $O(m\cdot n)$.
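The SHAP formula above can be evaluated directly by enumerating every coalition. This makes the $O(2^n)$ cost explicit and lets the efficiency axiom be checked. This is a minimal sketch under one assumption: `value` is a hypothetical callable that returns $f(S)$ for a set of "present" features. How absent features are filled in (baseline, expectation) is left to the caller.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values by enumerating all coalitions (O(2^n) evaluations).

    value: callable mapping a frozenset of feature indices to a real number,
           playing the role of f(S) in the formula above.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes |S| = 0, ..., n - 1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                coalition = frozenset(members)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi
```

For example, take an additive game plus a bonus of 2 when features 0 and 1 are both present. The bonus is split equally between the two features, and the values sum to $v(\mathcal{F})-v(\emptyset)$ (efficiency).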
#### LIME [[3](https://arxiv.org/html/2605.05480#bib.bib3)].
LIME locally approximates $f$ with a linear model: $w^*=\arg\min_{g\in\mathcal{G}} L(f,g,\pi_x)+\Omega(g)$, where $\pi_x$ is a proximity kernel. The conditions of Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4) are satisfied within the local approximation (the neighborhood $N_x$ defined by $\pi_x$).
#### Integrated Gradients [[4](https://arxiv.org/html/2605.05480#bib.bib4)].
$\phi_i(f,x)=(x_i-x'_i)\int_0^1\frac{\partial f(x'+\alpha(x-x'))}{\partial x_i}\,d\alpha$. IG satisfies completeness and sensitivity. Its numerical approximation with $k$ Riemann steps introduces an error of $O(1/k)$.
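The $O(1/k)$ discretization error and the completeness property can both be seen in a short Riemann-sum implementation. This is a minimal sketch, not a reference implementation. It assumes a user-supplied gradient function `grad_f` (a hypothetical name) and uses the right Riemann sum with points $x'+\tfrac{j}{k}(x-x')$, $j=1,\ldots,k$.

```python
import numpy as np


def integrated_gradients(grad_f, x, baseline, k=64):
    """Right Riemann-sum approximation of Integrated Gradients with k steps.

    grad_f: callable returning the gradient of f (shape (n,)) at a point.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = np.arange(1, k + 1) / k  # alpha_j = j / k for j = 1..k
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    # (x_i - x'_i) * (1/k) * sum_j d f / d x_i at the j-th path point
    return (x - baseline) * total / k
```

Consider $f(x)=x_0^2+3x_0x_1$ with $x=(1,2)$ and a zero baseline. The exact IG is $(4,3)$, which sums to $f(x)-f(x')=7$. The right Riemann sum overshoots by the factor $(k+1)/k$, an error of order $1/k$.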
### 2.2 Unification Attempts and Existing Gap
The gradient × input family established by Ancona et al. [[5](https://arxiv.org/html/2605.05480#bib.bib5)] provides the closest prior bridge to our linearization. They show that GradCAM (as well as guided backpropagation and plain gradients) can be expressed as $\phi_i\propto\nabla_i f(x)\cdot x_i$. This is a *pointwise product of gradient and input*, which is structurally a linear functional of $f$. The observation is important empirically: it reveals that the GradCAM family *de facto* operates in the linear regime. However, Ancona et al. establish this only as a family relationship among existing methods. They do not prove that linearity is *structurally necessary*, i.e. that any valid additive attribution must take this form. Nor do they include SHAP and LIME in the same algebraic framework.
Lundstrom et al. [[10](https://arxiv.org/html/2605.05480#bib.bib10)] extend the gradient × input analysis to internal neuron attributions. They show that the first-order (linear) term is the unique term satisfying path-completeness at every intermediate layer. This further supports the restriction to GradCAM-lin: the Taylor linearization is not an ad hoc simplification but the canonical form that preserves completeness at the feature-map level.
GradCAM has demonstrated strong empirical reliability across localization benchmarks. The ROAR/KAR evaluation of Hooker et al. [[12](https://arxiv.org/html/2605.05480#bib.bib12)] shows that GradCAM outperforms pixel-level gradient methods. The reason given is that its global average pooling of feature-map gradients provides robustness to local gradient noise. This result confirms practical reliability but leaves axiomatic properties unaddressed.

The breadth of GradCAM's adoption has generated a progressive line of refinements. Each targets a specific structural limitation without questioning practical utility:

- GradCAM++ [[8](https://arxiv.org/html/2605.05480#bib.bib8)] introduces non-uniform gradient weights to improve localization of multiple object instances.
- Score-CAM [[13](https://arxiv.org/html/2605.05480#bib.bib13)] replaces gradients entirely with activation perturbations. It obtains gradient-free class-discriminative maps at the cost of $O(k)$ forward passes.
- Axiom-based Grad-CAM [[14](https://arxiv.org/html/2605.05480#bib.bib14)] adds a completeness correction term to restore the efficiency axiom, explicitly acknowledging that standard GradCAM violates it.
- HiResCAM [[15](https://arxiv.org/html/2605.05480#bib.bib15)] modifies channel aggregation with an element-wise product before pooling, to improve spatial faithfulness.

All these variants retain the post-aggregation $\mathrm{ReLU}$ as a class-discriminative visualization choice. This confirms that the community regards the $\mathrm{ReLU}$ as a valid design option rather than a defect.
The limitation relevant to GRALIS is therefore not one of empirical reliability but of *axiomatic incompatibility*. As shown in Remark [3.5](https://arxiv.org/html/2605.05480#S3.Thmtheorem5), the post-aggregation $\mathrm{ReLU}$ introduces a nonlinearity that prevents GradCAM from satisfying the linearity condition required for a Riesz representation. This is a structural barrier that none of GradCAM++, Score-CAM or the axiom correction of [[14](https://arxiv.org/html/2605.05480#bib.bib14)] resolves, because all of them retain the same nonlinear post-processing step.

The critique of Kindermans et al. [[11](https://arxiv.org/html/2605.05480#bib.bib11)] targets a different issue: the input-shift sensitivity of pixel-level saliency maps and guided backpropagation. It does not apply directly to GradCAM, which operates on feature maps rather than input pixels.

GradCAM-lin is therefore not a critique of GradCAM but its *canonical extension*. It is the unique member of the GradCAM family that satisfies the mathematical conditions for formal unification, and it recovers standard GradCAM exactly when all channel weights satisfy $\alpha_k^c\geq 0$.
Covert and Lee [[9](https://arxiv.org/html/2605.05480#bib.bib9)] show that LIME, SHAP and IG are instances of the same "Shapley games" framework. Their framework excludes GradCAM, however, and does not cover the formal properties of convergence and ANOVA.
GRALIS resolves the limitations of both lines of work.

- The gradient × input observation of Ancona et al. is subsumed. GradCAM-lin is the canonical representative of the GradCAM family within $(\mathcal{Q},w,\Delta)$. Its gradient × input structure follows from the triple $w=1/Z$, $\Delta_{k,i,j}=(\partial y^c/\partial A^k_{ij})\cdot A^k_{pq}$ (Table [3](https://arxiv.org/html/2605.05480#S3.T3)).
- The Shapley-game unification of Covert and Lee is also subsumed, and extended with a *necessary* canonical form (Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4)). Prior work shows that certain methods *can* be written as weighted integrals. GRALIS proves that they *must* be, and that the canonical representation is unique.

No prior work produces a necessary canonical form that simultaneously encompasses GradCAM-lin, SHAP, LIME and IG, and that also provides completeness, convergence and an interaction hierarchy.
Table [2](https://arxiv.org/html/2605.05480#S2.T2) summarizes the comparison.
### 2.3 Gap Analysis: Six Structural Gaps
Despite sharing the same canonical form $(\mathcal{Q},w,\Delta)$, the four methods leave six structural gaps uncovered.
Gap 1: Arbitrary baseline in IG. IG requires a fixed baseline $x'$ (e.g. a black image), and this choice can change the attributions drastically: in general $\mathrm{IG}_i(x,x'_1)\neq\mathrm{IG}_i(x,x'_2)$ for $x'_1\neq x'_2$. No method integrates over the baseline distribution in a structured way.
Gap 2: SHAP ignores curvature between coalitions. SHAP compares $f(S\cup\{i\})$ with $f(S)$ but does not consider *how* one moves from $S$ to $S\cup\{i\}$: $\phi_i^{\mathrm{SHAP}}=\mathbb{E}_S[f(S\cup\{i\})-f(S)]\;\not\supset\;\int_0^1\nabla_iF(\cdot)\,d\alpha$. In other words, the path integral of the gradient plays no role in SHAP.
Gap 3: LIME does not satisfy completeness. The coefficients of the local surrogate do not, in general, sum to $F(x)-F(x')$: $\sum_i w_i^{\mathrm{LIME}}\neq F(x)-F(x')$.
Gap 4: GradCAM confined to CNN space. GradCAM operates on $A^k$ (convolutional feature maps), not on the input space. It is not applicable to dense layers, Transformers or pure GNNs, and it does not satisfy any Shapley axiom. MS-GRALIS (Theorem [3.16](https://arxiv.org/html/2605.05480#S3.Thmtheorem16)) partially resolves this gap. By aggregating attributions over multiple levels ($\ell=1,\ldots,L$) with optimal weights, it covers CNN feature maps, dense layers and attention. The requirement that $F$ be differentiable on intermediate representations remains (≈).
Gap 5: No method models feature interactions. All four methods produce marginal attributions $s_i$ for a single feature. The pairwise interaction $\Phi_{ij}=s_{ij}-s_i-s_j\neq 0$ is not captured in an integrated way. SHAP-Interaction exists but does not integrate gradients or locality, and incurs an additional $O(2^n)$ cost.
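The coalition-averaged version of the pairwise interaction $\Phi_{ij}$ is the pairwise Shapley interaction index. This is the quantity SHAP interaction values are built on, and computing it exactly is what incurs the extra exponential cost. The index averages the discrete second-order difference $v(S\cup\{i,j\})-v(S\cup\{i\})-v(S\cup\{j\})+v(S)$ over coalitions $S\subseteq\mathcal{F}\setminus\{i,j\}$, using weights $|S|!\,(n-|S|-2)!/(n-1)!$. The following is a brute-force sketch for small $n$. The set function `value` and the name `shapley_interaction` are illustrative assumptions, not part of GRALIS.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(value, n, i, j):
    """Pairwise Shapley interaction index I_ij by exhaustive enumeration.

    value: callable mapping a frozenset of feature indices to a real number.
    """
    others = [t for t in range(n) if t not in (i, j)]
    total = 0.0
    for k in range(n - 1):  # |S| = 0, ..., n - 2
        weight = factorial(k) * factorial(n - k - 2) / factorial(n - 1)
        for members in combinations(others, k):
            S = frozenset(members)
            # Discrete mixed second difference of the game at coalition S
            delta = value(S | {i, j}) - value(S | {i}) - value(S | {j}) + value(S)
            total += weight * delta
    return total
```

For a game that is additive except for a bonus $c$ when features 0 and 1 are both present, the index returns $c$ for the pair $(0,1)$ and $0$ for every other pair.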
Gap 6: No multi-scale aggregation. No method aggregates attributions over multiple levels of abstraction with mathematically motivated weights. GradCAM++ [[8](https://arxiv.org/html/2605.05480#bib.bib8)] uses heuristic weights without axiomatic guarantees.
Table [1](https://arxiv.org/html/2605.05480#S2.T1) summarizes which methods generate or resolve each gap.
Table 1: Gap analysis. "cause" = this method generates the gap; ✓ = gap filled; ≈ = partially; × = not resolved.

Table 2: Comparison among XAI unification approaches with respect to properties that justify composite analysis. ≈ = partially satisfied; ∗ = without formal guarantees; Comm. = commensurability.
## 3 The GRALIS Framework
### 3.1 Definitions and Notation
###### Definition 3.1 (GRALIS explicit formula).
The explicit formula of GRALIS combines three components:
$$\mathrm{GRALIS}_i(x) \;=\; \frac{1}{Z_x}\sum_{S\subseteq\mathcal{F}\setminus\{i\}}\underbrace{\frac{|S|!\,(n-|S|-1)!}{n!}}_{\text{Shapley weight}}\cdot\;\underbrace{\pi_x(x_S)}_{\text{LIME kernel}}\cdot\;\underbrace{(x_i-x'_i)\int_0^1\nabla_iF(\tilde{x}^{(S,\alpha)})\,d\alpha}_{\text{IG conditioned on }S}, \tag{3}$$

where the normalizer is

$$Z_x=\sum_{S\subseteq\mathcal{F}\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\,\pi_x(x_S).$$

It ranges over the same coalitions as the outer sum, since the Shapley weight is undefined for $S=\mathcal{F}$. The evaluation point is $\tilde{x}^{(S,\alpha)}_j=x'_j+\alpha(x_j-x'_j)$ if $j\in S\cup\{i\}$, and $x'_j$ otherwise. The IG path is therefore not global but *conditioned on the coalition $S$*: features outside $S\cup\{i\}$ remain at the baseline during integration.
#### Decomposition of the three components.
- Shapley weight $w(S)=|S|!\,(n-|S|-1)!/n!$: guarantees symmetry and the dummy axiom; averages over all feature insertion orders.
- LIME kernel $\pi_x(x_S)=\exp\!\big(-\|x_S-x'_S\|^2/(2\sigma^2)\big)$: penalizes coalitions far from $x$ and introduces locality. With $\sigma\to\infty$ one obtains KernelSHAP; with $\sigma\to 0$, LIME.
- Conditioned IG: captures the curvature of $F$ along the path specific to each coalition (Gap 2 filled). Because it inherits the fundamental theorem of calculus (FTC), it guarantees completeness (Gap 3 filled).
The canonical triple of GRALIS has three components:

- $\mathcal{Q}=2^{\mathcal{F}\setminus\{i\}}\times[0,1]$;
- $w_{S,\alpha}=\tilde{w}(S)\,d\alpha$, with $\tilde{w}(S)=w(S)\,\pi_x(x_S)/Z_x$;
- $\Delta^{(i)}_{S,\alpha}=(x_i-x'_i)\,\nabla_iF(\tilde{x}^{(S,\alpha)})$.

This triple is strictly richer than the individual triples of GradCAM, SHAP, LIME and IG.
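Eq. (3) can be evaluated by brute force for small $n$, which makes the role of each factor concrete. The sketch makes three assumptions, all hypothetical:

- a user-supplied gradient function `grad_F`;
- the kernel bandwidth `sigma`, left free as in the text;
- a midpoint rule with `k` nodes for the $\alpha$-integral.

Following the definition of $Z_x$, the normalizer is accumulated over $S\subseteq\mathcal{F}\setminus\{i\}$ for each target feature $i$. The cost is $O(2^{n-1}\cdot k)$ gradient calls per feature, which is the cost GRALIS-MC is designed to avoid.

```python
import numpy as np
from itertools import combinations
from math import factorial


def gralis_explicit(grad_F, x, baseline, sigma=1.0, k=64):
    """Brute-force sketch of Eq. (3): Shapley weight x LIME kernel x conditioned IG.

    grad_F: callable returning the gradient of F (shape (n,)) at a point.
    sigma:  bandwidth of the kernel pi_x (free parameter).
    k:      number of midpoint-rule nodes on [0, 1].
    """
    x = np.asarray(x, dtype=float)
    xb = np.asarray(baseline, dtype=float)
    n = x.size
    alphas = (np.arange(k) + 0.5) / k  # midpoint rule on [0, 1]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        num, Z = 0.0, 0.0
        for size in range(n):
            w_shap = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                S = list(members)
                # Kernel pi_x(x_S) evaluated on the coalition coordinates only
                pi = np.exp(-np.sum((x[S] - xb[S]) ** 2) / (2 * sigma ** 2))
                active = S + [i]  # features that move along the conditioned path
                ig = 0.0
                for a in alphas:
                    point = xb.copy()
                    point[active] = xb[active] + a * (x[active] - xb[active])
                    ig += grad_F(point)[i]
                ig *= (x[i] - xb[i]) / k
                num += w_shap * pi * ig
                Z += w_shap * pi
        phi[i] = num / Z
    return phi
```

For a linear model $F(x)=a^\top x$ every conditioned path gives the same value. The normalized weights then sum to one, so the sketch returns $a_i(x_i-x'_i)$ for any choice of `sigma`.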
*[Figure 1: the $(x_1,x_2)$ plane from baseline $x'$ to input $x$, showing the conditioned paths $S=\emptyset$, with $\tilde{x}^{(\emptyset,\alpha)}=(x'_1+\alpha d_1,\,x'_2)$ ending at $(x_1,x'_2)$, and $S=\{2\}$, with $\tilde{x}^{(\{2\},\alpha)}=(x'_1+\alpha d_1,\,x'_2+\alpha d_2)$. GRALIS ($i=1$, $n=2$) averages over $2^{n-1}=2$ paths with weights $w(S)\cdot\pi_x(x_S)$.]*

Figure 1: Conditioned integration paths in GRALIS ($n=2$, target feature $i=1$, $d_j=x_j-x'_j$). The blue path ($S=\emptyset$) keeps $x_2$ at the baseline during integration. It captures the *pure* contribution of $x_1$, ignoring the interaction with $x_2$. The red path ($S=\{2\}$) moves both features simultaneously and captures the mixed curvature $\partial^2F/\partial x_1\partial x_2$ (Gap 2). The standard IG gradient corresponds exclusively to the path $S=\mathcal{F}\setminus\{i\}$. GRALIS averages over all $2^{n-1}$ paths with Shapley × LIME-kernel weights, producing an attribution that balances completeness and local curvature.

###### Definition 3.3 (GRALIS canonical structure).
Let $f:\mathbb{R}^n\to\mathbb{R}$ be a model and let $(\mathcal{Q},\Sigma,\mu)$ be a $\sigma$-finite measure space fixed a priori. The GRALIS *canonical structure* is a triple $(\mathcal{Q},w,\Delta)$ where:
- $\mathcal{Q}$ is the integration index space (continuous or discrete);
- $w:\mathcal{Q}\to\mathbb{R}^+$ is the weight function, with $\int_{\mathcal{Q}}w(q)\,d\mu(q)=1$;
- $\Delta:\mathcal{Q}\to\mathbb{R}^n$ is the marginal contribution function, subject to the *constitutive constraint* $\sum_i\Delta_i(f,x,x',q)=f(x)-f(x')$ for $\mu$-a.e. $q\in\mathcal{Q}$.
The GRALIS attribution for featureiiis:
$$\phi_i^{\text{GRALIS}}(f,x,x') \;=\; \int_{\mathcal{Q}} w(q)\cdot\Delta_i(f,x,x',q)\,d\mu(q).$$

In the discrete case ($\mathcal{Q}$ finite or countable), the integral is understood as a sum. $L^2(\mathcal{Q})=\mathbb{R}^{|\mathcal{Q}|}$ when $\mathcal{Q}$ is finite, and $L^2(\mathcal{Q})=\ell^2$ when it is countably infinite; the Riesz Theorem applies in both cases.
### 3.2 Theorem 1: Unified Canonical Form (Riesz)
###### Theorem 3.4 (Canonical Form: Riesz).
Let $\mathcal{D}(\mathcal{Q})\subseteq L^2(\mathcal{Q})$ be the closed subspace of admissible marginal functions. An attribution method $\phi$ satisfying the conditions:
- (a) $\mathcal{Q}$ is a *measurable index space* $(\mathcal{Q},\mathcal{A},\mu)$ fixed a priori, independently of the specific input $x$. $\mathcal{Q}$ *is not arbitrary*: it is *induced* by the attribution mechanism, and GRALIS operates conditionally on this choice. Concretely:
  - for Integrated Gradients, $\mathcal{Q}=[0,1]$ (path) with Lebesgue measure;
  - for SHAP, $\mathcal{Q}=2^N$ (coalitions) with counting measure;
  - for GRALIS, $\mathcal{Q}$ is the image space with the measure induced by the superpixel segmentation.

  The framework only requires that $\mu$ be $\sigma$-finite and that $w\cdot\Delta\in L^2(\mathcal{Q},\mu)$;
- (b) the evaluation map $E_i:\mathcal{D}(\mathcal{Q})\to\mathbb{R}$, $E_i(\Delta_i):=\phi_i(f,x,x';\Delta_i)$, is linear and continuous with respect to the $L^2(\mathcal{Q})$ norm;
- (c) $\Delta$ satisfies the constitutive constraint of Definition [3.3](https://arxiv.org/html/2605.05480#S3.Thmtheorem3);
admits a unique representation:
$$\phi_i(f,x,x') \;=\; \int_{\mathcal{Q}} w(q)\cdot\Delta_i(q)\,d\mu(q)$$

for a unique $w\in\mathcal{D}(\mathcal{Q})\subseteq L^2(\mathcal{Q})$ (unique in all of $L^2(\mathcal{Q})$ when $\mathcal{D}(\mathcal{Q})=L^2(\mathcal{Q})$).
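In the finite case ($\mathcal{Q}$ with $m$ points and masses $\mu_q>0$), the representer $w$ promised by the theorem can be read off mechanically. Evaluating the functional on the indicator of each point gives $\Phi(e_q)=w_q\,\mu_q$. The sketch below assumes $\mathcal{D}(\mathcal{Q})=L^2(\mathcal{Q})=\mathbb{R}^m$; the name `riesz_representer` is hypothetical. It illustrates the existence half of the argument; uniqueness follows because the indicators span the space.

```python
import numpy as np


def riesz_representer(functional, mu):
    """Recover w such that functional(d) = sum_q w_q * d_q * mu_q on a finite Q.

    functional: callable taking a vector of length |Q| and returning a float;
                assumed linear (continuity is automatic in finite dimension).
    mu:         positive masses mu({q}) of the measure on Q.
    """
    mu = np.asarray(mu, dtype=float)
    m = mu.size
    w = np.empty(m)
    for q in range(m):
        e_q = np.zeros(m)
        e_q[q] = 1.0
        # Phi(e_q) = sum_r w_r * e_q[r] * mu_r = w_q * mu_q
        w[q] = functional(e_q) / mu[q]
    return w
```

For SHAP, $\mu$ is the counting measure ($\mu_q=1$), and the recovered vector is the table of Shapley weights $w(S)$.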
###### Proof.
*Step 1: Linearity and continuity.* The functional $\Phi_i[\Delta_i]=\int_{\mathcal{Q}}w(q)\cdot\Delta_i(q)\,d\mu(q)$ is linear by linearity of the integral. It is also continuous, since $|\Phi_i[\Delta_i]|\leq\|w\|_{L^2}\|\Delta_i\|_{L^2}$ by Cauchy–Schwarz.
*Step 2: Uniqueness via Riesz.* By the Riesz Representation Theorem [[25](https://arxiv.org/html/2605.05480#bib.bib25)], every continuous linear functional $\Phi_i:L^2(\mathcal{Q})\to\mathbb{R}$ admits a unique representation $\Phi_i[g]=\langle g,w\rangle_{L^2}=\int_{\mathcal{Q}}g(q)\cdot w(q)\,d\mu(q)$ for a unique $w\in L^2(\mathcal{Q})$. Under conditions (a)–(c), $\Phi_i$ is such a functional. Because $\mathcal{D}(\mathcal{Q})$ is closed, it is itself a Hilbert space. The theorem therefore also applies to the evaluation map $E_i$ restricted to $\mathcal{D}(\mathcal{Q})$, with the representer unique within $\mathcal{D}(\mathcal{Q})$.
*Step 3 — Special cases\.*
- GradCAM-lin (Case 1): $s_{pq}=\sum_{k}\alpha^{c}_{k}A^{k}_{pq}=\sum_{k,i,j}\tfrac{1}{Z}\cdot\underbrace{\tfrac{\partial y^{c}}{\partial A^{k}_{ij}}\cdot A^{k}_{pq}}_{\Delta^{(pq)}(f)}$, using the GradCAM weights $\alpha^{c}_{k}=\tfrac{1}{Z}\sum_{i,j}\tfrac{\partial y^{c}}{\partial A^{k}_{ij}}$. Triple: $\mathcal{Q}=\{(k,i,j)\}$, $w_{k,i,j}=1/Z$, $\Delta_{k,i,j}=\tfrac{\partial y^{c}}{\partial A^{k}_{ij}}\cdot A^{k}_{pq}$.
- SHAP (Case 2): $\phi_{i}=\sum_{S\subseteq\mathcal{F}\setminus\{i\}}\underbrace{\tfrac{|S|!\,(n-|S|-1)!}{n!}}_{w(S)}\cdot\underbrace{[f(S\cup\{i\})-f(S)]}_{\Delta^{(i)}_{S}(f)}$. Triple: $\mathcal{Q}=2^{\mathcal{F}\setminus\{i\}}$, $w(S)=|S|!\,(n-|S|-1)!/n!$, $\Delta^{(i)}_{S}=f(S\cup\{i\})-f(S)$. The weights sum to 1: $\sum_{S}w(S)=1$ (Lemma [3.6](https://arxiv.org/html/2605.05480#S3.Thmtheorem6)).
- LIME (Case 3): the weighted linear regression solves $w^{*}=(X^{T}WX)^{-1}X^{T}Wf$, where $X_{t,i}=z^{(t)}_{i}\in\{0,1\}$, $W=\mathrm{diag}(\pi_{x}(z^{(t)}))$ and $f_{t}=f(x\odot z^{(t)}+x^{\prime}\odot(1-z^{(t)}))$. The map $f\mapsto w^{*}_{i}$ is *linear* in $f$ (the inverted matrix does not depend on $f$). Triple: $\mathcal{Q}=\{z^{(1)},\ldots,z^{(T)}\}$, $w_{t}=\bigl[(X^{T}WX)^{-1}X^{T}W\bigr]_{i,t}$, $\Delta^{(i)}_{t}=f(x\odot z^{(t)}+x^{\prime}\odot(1-z^{(t)}))$. *Note:* the nonlinear dependence of $(X^{T}WX)^{-1}$ on the samples $\{z^{(t)}\}$ is harmless, because the design matrix is fixed a priori, before $f$ is queried.
- Integrated Gradients (Case 4): $\mathrm{IG}_{i}\approx\sum_{j=1}^{k}\underbrace{\tfrac{x_{i}-x^{\prime}_{i}}{k}}_{w_{j}}\cdot\underbrace{\nabla_{i}F(x^{(j)})}_{\Delta^{(i)}_{j}(f)}$, with $x^{(j)}=x^{\prime}+\tfrac{j}{k}(x-x^{\prime})$. Discrete triple: $\mathcal{Q}=\{1,\ldots,k\}$, $w_{j}=(x_{i}-x^{\prime}_{i})/k$, $\Delta^{(i)}_{j}=\nabla_{i}F(x^{(j)})$. In the continuous limit: $\mathcal{Q}=[0,1]$, $w(\alpha)\,d\alpha=(x_{i}-x^{\prime}_{i})\,d\alpha$, $\Delta^{(i)}(\alpha)=\nabla_{i}F(x^{\prime}+\alpha(x-x^{\prime}))$.
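As a concrete check of the SHAP row of this mapping, the sketch below evaluates $\langle w,\Delta^{(i)}\rangle$ by enumerating the index space $\mathcal{Q}=2^{\mathcal{F}\setminus\{i\}}$ for a small set function $v$ standing in for $f(S)$. It is a minimal illustration that assumes $f(S)$ can be evaluated exactly for every coalition; the toy game and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet

Game = Callable[[FrozenSet[int]], float]


def shapley_weight(size: int, n: int) -> float:
    """Canonical weight w(S) = |S|! (n - |S| - 1)! / n! for |S| = size."""
    return factorial(size) * factorial(n - size - 1) / factorial(n)


def shap_canonical(v: Game, n: int, i: int) -> float:
    """phi_i = sum over S in 2^(F minus {i}) of w(S) * [v(S + {i}) - v(S)]."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):  # |S| ranges over 0, ..., n - 1
        for members in combinations(others, size):
            S = frozenset(members)
            delta = v(S | {i}) - v(S)  # marginal contribution Delta_S^(i)
            total += shapley_weight(size, n) * delta
    return total


# Hypothetical toy game: additive part plus a pairwise bonus for {0, 1}.
a = [1.0, 2.0, 0.5]


def v(S: FrozenSet[int]) -> float:
    return sum(a[j] for j in S) + (3.0 if {0, 1} <= S else 0.0)


phi = [shap_canonical(v, 3, i) for i in range(3)]
print(phi)  # approximately [2.5, 3.5, 0.5]
```

The pairwise bonus of 3 is split equally between features 0 and 1, and the attributions add up to $v(\mathcal{F})-v(\emptyset)=6.5$, as efficiency requires.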
*Uniqueness Lemma (Riesz):* the marginal contribution $\Delta^{(i)}_{q}(f)=f(q^{+i})-f(q^{-i})$ is linear in $f$, where $q^{+i}$ denotes configuration $q$ with feature $i$ included (e.g. coalition $S\cup\{i\}$) and $q^{-i}$ denotes it with feature $i$ excluded (e.g. $S$). The composition $\Phi_{i}[\Delta^{(i)}(\,\cdot\,;f)]=\langle w,\Delta^{(i)}(f)\rangle$ therefore remains linear in $f$, and the canonical form is necessary: it is not a design choice, but a structural consequence of any linear additive attribution (Riesz, 1909). ∎
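The linearity argument can be checked numerically for the LIME row: once the binary design $X$ and the kernel weights $\pi_x(z^{(t)})$ are fixed, the coefficient vector is a fixed matrix applied to the vector of model queries. A minimal NumPy sketch, assuming an intercept-free regression as in the formula for Case 3 and arbitrary positive kernel values; the design, kernel values and query vectors are all hypothetical.

```python
import numpy as np


def lime_operator(Z: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Return M = (X^T W X)^{-1} X^T W with X = Z and W = diag(pi).

    Row i of M holds the canonical weights w_t used for feature i."""
    X = Z.astype(float)
    W = np.diag(pi)
    # Solve (X^T W X) M = X^T W instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ W @ X, X.T @ W)


# Fixed design: T = 6 binary perturbations of n = 3 features (full column rank).
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 1, 1]])
pi = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])  # hypothetical kernel values
M = lime_operator(Z, pi)

rng = np.random.default_rng(0)
f_vals, g_vals = rng.normal(size=6), rng.normal(size=6)  # two hypothetical models
lhs = M @ (2.0 * f_vals - 3.0 * g_vals)
rhs = 2.0 * (M @ f_vals) - 3.0 * (M @ g_vals)
print(np.allclose(lhs, rhs))  # True: f -> w* is linear
```

Changing the design or the kernel changes $M$, but for any fixed choice the map stays linear, which is the property the canonical triple relies on.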
###### Lemma 3.6 (Combinatorial identity: Shapley weights).
For every fixed feature $i$, the Shapley weights form a probability distribution over the $2^{n-1}$ coalitions of $\mathcal{F}\setminus\{i\}$:

$$\sum_{S\subseteq\mathcal{F}\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\;=\;1.$$

###### Proof.
Group by cardinality $k=|S|\in\{0,\ldots,n-1\}$, noting that there are $\binom{n-1}{k}$ subsets of $\mathcal{F}\setminus\{i\}$ of cardinality $k$:

$$\sum_{S\subseteq\mathcal{F}\setminus\{i\}}w(S)=\sum_{k=0}^{n-1}\binom{n-1}{k}\cdot\frac{k!\,(n-k-1)!}{n!}=\sum_{k=0}^{n-1}\frac{(n-1)!}{k!\,(n-1-k)!}\cdot\frac{k!\,(n-k-1)!}{n!}.$$

Since $n-k-1=n-1-k$, we have $(n-k-1)!=(n-1-k)!$, so the factorials cancel and each term equals

$$\frac{(n-1)!}{n!}=\frac{1}{n}.$$

The sum over $k=0,\ldots,n-1$ contains $n$ terms, each equal to $1/n$:

$$\sum_{S}w(S)=n\cdot\frac{1}{n}=1.$$

∎
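The identity can also be verified in exact rational arithmetic. The short check below reproduces the grouping argument of the proof, asserting that each cardinality class contributes exactly $1/n$; the function name is illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def shapley_weight_sum(n: int) -> Fraction:
    """Sum of |S|!(n-|S|-1)!/n! over all S in F minus {i}, grouped by k = |S|."""
    total = Fraction(0)
    for k in range(n):
        group = comb(n - 1, k) * Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        assert group == Fraction(1, n)  # each cardinality class contributes 1/n
        total += group
    return total


print(all(shapley_weight_sum(n) == 1 for n in range(1, 21)))  # True
```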
Table 3: Mapping of XAI methods onto the canonical triple $(\mathcal{Q},w,\Delta)$.
### 3.3 Theorem 2: Completeness
###### Theorem 3.8 (Completeness).
GRALIS attributions satisfy exact completeness:

$$\sum_{i=1}^{n}\phi_{i}^{\textsc{GRALIS}}(f,x,x^{\prime})\;=\;f(x)-f(x^{\prime}).$$

###### Proof.
By linearity of the integral (exchanging the finite sum over $i$ with the integral over $\mathcal{Q}$), the constitutive constraint $\sum_{i}\Delta_{i}(q)=f(x)-f(x^{\prime})$ for every $q$, and the normalization $\int_{\mathcal{Q}}w\,d\mu=1$:

$$\sum_{i}\phi_{i}=\int_{\mathcal{Q}}w(q)\Bigl[\sum_{i}\Delta_{i}(q)\Bigr]\,d\mu(q)=\int_{\mathcal{Q}}w(q)\,[f(x)-f(x^{\prime})]\,d\mu(q)=f(x)-f(x^{\prime}).$$
*Lemma A (completeness of conditioned IG on $S$).* For every $S\subseteq\mathcal{F}$ and fixed $x^{\prime}$: $\sum_{j\in S}\mathrm{IG}^{S}_{j}=F(x_{S})-F(x^{\prime})$, where $x_{S}$ denotes the vector with $x_{j}$ for $j\in S$ and $x^{\prime}_{j}$ otherwise. Proof: by construction $\tilde{x}^{(S,0)}=x^{\prime}$ and $\tilde{x}^{(S,1)}=x_{S}$, and the path moves each coordinate $j\in S$ at constant speed $x_{j}-x^{\prime}_{j}$ while keeping the others at $x^{\prime}_{j}$. Applying the fundamental theorem of calculus and the chain rule to the path $\gamma(\alpha)=\tilde{x}^{(S,\alpha)}$:

$$F(x_{S})-F(x^{\prime})=\int_{0}^{1}\frac{d}{d\alpha}F(\tilde{x}^{(S,\alpha)})\,d\alpha=\sum_{j\in S}(x_{j}-x^{\prime}_{j})\int_{0}^{1}\nabla_{j}F(\tilde{x}^{(S,\alpha)})\,d\alpha=\sum_{j\in S}\mathrm{IG}^{S}_{j}.$$

∎
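Lemma A can be illustrated numerically. The sketch below assumes the straight path $\tilde{x}^{(S,\alpha)}=x'+\alpha(x_S-x')$ (the natural path with the endpoints stated in the lemma) and a midpoint rule; the test function $F(x)=x_0x_1+x_2^2$ and its hand-coded gradient are hypothetical. Every partial derivative is linear along this path, so the midpoint rule is exact here and the identity holds up to rounding.

```python
from typing import Callable, Dict, List, Sequence, Set

Vec = List[float]


def conditioned_ig(grad_F: Callable[[Vec], Vec], x: Sequence[float],
                   x_ref: Sequence[float], S: Set[int], k: int = 32) -> Dict[int, float]:
    """IG^S_j for j in S along x_ref -> x_S (midpoint rule with k steps)."""
    x_S = [x[j] if j in S else x_ref[j] for j in range(len(x))]
    ig = {j: 0.0 for j in S}
    for step in range(k):
        alpha = (step + 0.5) / k  # midpoint of the step-th subinterval
        point = [r + alpha * (s - r) for r, s in zip(x_ref, x_S)]
        grad = grad_F(point)
        for j in S:
            ig[j] += (x[j] - x_ref[j]) * grad[j] / k
    return ig


def F(p: Sequence[float]) -> float:
    return p[0] * p[1] + p[2] ** 2


def grad_F(p: Vec) -> Vec:
    return [p[1], p[0], 2.0 * p[2]]


x, x_ref = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
for S in ({0, 1, 2}, {0, 2}):
    ig = conditioned_ig(grad_F, x, x_ref, S)
    x_S = [x[j] if j in S else x_ref[j] for j in range(3)]
    print(sorted(S), sum(ig.values()), F(x_S) - F(x_ref))
```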
### 3.4 Theorem 3: Monte Carlo Convergence (GRALIS-MC)
GRALIS-MC approximates the integral by Monte Carlo sampling:

$$\hat{\phi}_{i}=\frac{1}{m}\sum_{r=1}^{m}\Delta_{i}(f,x,x^{\prime},q_{r}),\quad q_{r}\sim w(q).$$
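For a finite index space, the estimator reduces to drawing indices in proportion to $w$ and averaging $\Delta$. The sketch below assumes $w$ is normalized ($\sum_q w(q)=1$, as for the Shapley weights of Lemma 3.6); `random.choices` normalizes the weights internally, so for unnormalized $w$ the estimate targets $\langle w,\Delta\rangle/\sum_q w(q)$. The index set, weights and $\Delta$ values are hypothetical.

```python
import random
from typing import Callable, Hashable, Sequence


def mc_estimate(qs: Sequence[Hashable], weights: Sequence[float],
                delta: Callable[[Hashable], float], m: int, seed: int = 0) -> float:
    """(1/m) * sum_r delta(q_r), with q_r drawn i.i.d. with probability proportional to w."""
    rng = random.Random(seed)
    draws = rng.choices(qs, weights=weights, k=m)
    return sum(delta(q) for q in draws) / m


qs = [0, 1, 2, 3, 4]
w = [1 / 9, 2 / 9, 3 / 9, 2 / 9, 1 / 9]  # normalized weights over Q


def delta(q: int) -> float:
    return q / 4  # hypothetical marginal contributions in [0, 1]


exact = sum(wq * delta(q) for q, wq in zip(qs, w))  # <w, Delta>, approximately 0.5
approx = mc_estimate(qs, w, delta, m=100_000)
print(exact, approx)
```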
###### Theorem 3.10 (GRALIS-MC Convergence).
With probability at least $1-\delta$:

$$\Bigl|\hat{\phi}_{i}^{(m,k)}-\phi_{i}^{\textsc{GRALIS}}(x)\Bigr|\;\leq\;\underbrace{\frac{B}{\sqrt{m\delta}}}_{\text{MC error}}\;+\;\underbrace{\frac{(x_{i}-x^{\prime}_{i})^{2}\,\|\nabla^{2}F\|_{\infty}}{8k}}_{\text{Riemann error}},$$

where $B=\sup_{S,\alpha}\bigl|\pi_{x}(x_{S})(x_{i}-x^{\prime}_{i})\nabla_{i}F(\tilde{x}^{(S,\alpha)})\bigr|$. The cost drops from $O(2^{n}\cdot k)$ to $O(m\cdot n\cdot k)$ gradient evaluations.
###### Proof.
*Step 1 (unbiased estimator).* Let $Q_{\pi,i}$ be the contribution of a single permutation $\pi$; then $\mathbb{E}_{\pi}[Q_{\pi,i}]=\phi_{i}^{\textsc{GRALIS}}(x)$. Under a uniformly random permutation, the set of features preceding $i$ takes each of the $2^{n-1}$ coalitions $S\subseteq\mathcal{F}\setminus\{i\}$ with probability exactly $w(S)$, so sampling reproduces the exact weighted sum in expectation.

*Step 2 (variance bound).* Since $|Q_{\pi,i}|\leq B$ by definition, $\operatorname{Var}[\hat{\phi}_{i}^{(m)}]\leq B^{2}/m$. By Chebyshev's inequality, $P\bigl(|\hat{\phi}_{i}-\phi_{i}|>B/\sqrt{m\delta}\bigr)\leq\delta$.

*Step 3 (IG path discretization error).* With $k$ uniform steps of width $h=1/k$, the right-endpoint Riemann sum $\frac{1}{k}\sum_{j=1}^{k}g(j/k)$ approximates $\int_{0}^{1}g\,d\alpha$ with error

$$\Bigl|\int_{0}^{1}g\,d\alpha-\frac{1}{k}\sum_{j=1}^{k}g(j/k)\Bigr|\;\leq\;\frac{h}{2}\,\|g^{\prime}\|_{\infty}\;=\;\frac{1}{2k}\,\|g^{\prime}\|_{\infty}.$$

Since $g(\alpha)=\nabla_{i}F(\tilde{x}^{(S,\alpha)})$, we have $\|g^{\prime}\|_{\infty}\leq|x_{i}-x^{\prime}_{i}|\,\|\nabla^{2}F\|_{\infty}/4$; multiplying by the prefactor $|x_{i}-x^{\prime}_{i}|$ of the IG term gives a Riemann error of at most $\frac{(x_{i}-x^{\prime}_{i})^{2}\|\nabla^{2}F\|_{\infty}}{8k}=O(1/k)$. *Note:* the midpoint rule would improve this to $O(1/k^{2})$ when $g\in C^{2}$ (i.e. $F\in C^{3}$); composite Gauss–Legendre quadrature with $p$ nodes per step would achieve $O(1/k^{2p})$ for sufficiently smooth $F$.

*Step 4 (total bound).* Combining Steps 2 and 3 through the triangle inequality yields the stated bound with probability at least $1-\delta$.

*Variance reduction: antithetic sampling.*

$$\hat{\phi}_{i}^{\mathrm{antith}}=\frac{1}{2m}\sum_{t=1}^{m}\bigl[Q_{\pi^{(t)},i}+Q_{\pi^{(t)\mathrm{rev}},i}\bigr],\qquad\operatorname{Var}[\hat{\phi}_{i}^{\mathrm{antith}}]=\frac{\operatorname{Var}[Q_{\pi,i}]+\operatorname{Cov}[Q_{\pi,i},Q_{\pi^{\mathrm{rev}},i}]}{2m}.$$

Hence $\operatorname{Var}[\hat{\phi}_{i}^{\mathrm{antith}}]\leq\operatorname{Var}[Q_{\pi,i}]/2m$ whenever the covariance is non-positive. In $\pi^{\mathrm{rev}}$ the features preceding $i$ are exactly those that follow $i$ in $\pi$, so the two coalitions are complementary within $\mathcal{F}\setminus\{i\}$; this typically produces negative correlation, and hence lower variance than $2m$ independent permutations at the same number of model evaluations, without additional random draws. ∎
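The permutation sampler of Steps 1 and 2 with antithetic (reversed) permutations can be sketched as follows for a generic set function $v$ standing in for the model evaluated on coalitions. This is a minimal illustration, not the GRALIS-MC implementation: it uses plain marginal contributions $v(S\cup\{i\})-v(S)$ in place of the kernel-weighted IG terms. For the hypothetical unanimity game on $\{0,1\}$, exactly one ordering in each antithetic pair is favorable to player 0 (and to player 1), so the estimate is exact for any $m$, an extreme case of negative correlation.

```python
import random
from typing import Callable, FrozenSet, List

Game = Callable[[FrozenSet[int]], float]


def antithetic_shapley(v: Game, n: int, m: int, seed: int = 0) -> List[float]:
    """Permutation-sampling Shapley estimate using m antithetic pairs (pi, reversed pi)."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(m):
        perm = list(range(n))
        rng.shuffle(perm)
        for order in (perm, perm[::-1]):  # pi and pi^rev share one random draw
            S: FrozenSet[int] = frozenset()
            for i in order:
                phi[i] += v(S | {i}) - v(S)  # marginal contribution of i given predecessors S
                S = S | {i}
    return [p / (2 * m) for p in phi]


def unanimity_01(S: FrozenSet[int]) -> float:
    return 1.0 if {0, 1} <= S else 0.0


print(antithetic_shapley(unanimity_01, n=3, m=5))  # [0.5, 0.5, 0.0]
```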
The GRALIS-MC algorithm is given in Algorithm [2](https://arxiv.org/html/2605.05480#S3.F2).

```
Algorithm: GRALIS-MC
Input:  F, x, x', samples m, steps k, bandwidth σ
Output: φ[1..n]  (normalized attributions)

φ[i] ← 0 for each i;  Z ← 0
for t = 1 to m:
    π ← uniform random permutation of {1..n}
    S ← ∅
    for i in order π:
        ig_val ← (x_i − x'_i) · (1/k) Σ_{j=1..k} ∇_i F(x̃^(S, j/k))     [Riemann]
        π_w ← exp(−‖x_S − x'_S‖² / 2σ²)
        φ[i] += π_w · ig_val;   Z += π_w
        S ← S ∪ {i}
return φ/Z if Z > 0, else φ
```
Figure 2: GRALIS-MC pseudocode. The random permutation samples the $2^{n-1}$ coalitions of $\mathcal{F}\setminus\{i\}$ with probability $w(S)$, as in permutation-sampling Shapley estimators.

Table 4: Comparative computational complexity.
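A minimal Python transcription of the Figure 2 pseudocode is sketched below. Its assumptions are hypothetical and not taken from the reference implementation: the conditioned path is $\tilde{x}^{(S,\alpha)}=x'+\alpha(x_S-x')$ with $x_S$ as in Lemma A; `grad_F` is a user-supplied callable returning $\nabla F$ (for example from an autodiff framework); and a `midpoint` flag switches the nodes from $j/k$ to $(j-\tfrac12)/k$, matching the midpoint rule mentioned in the note to Step 3. As in the pseudocode, the accumulated kernel mass $Z$ normalizes all attributions jointly.

```python
import math
import random
from typing import Callable, List, Sequence

Vec = List[float]


def gralis_mc(grad_F: Callable[[Vec], Vec], x: Sequence[float], x_ref: Sequence[float],
              m: int, k: int, sigma: float, midpoint: bool = False, seed: int = 0) -> Vec:
    n = len(x)
    rng = random.Random(seed)
    phi = [0.0] * n
    Z = 0.0
    nodes = [((j - 0.5) if midpoint else j) / k for j in range(1, k + 1)]
    for _ in range(m):
        perm = list(range(n))
        rng.shuffle(perm)  # uniform random permutation of the features
        S: set = set()
        for i in perm:
            x_S = [x[j] if j in S else x_ref[j] for j in range(n)]
            # Riemann sum of the i-th partial derivative along x_ref -> x_S
            g_sum = sum(grad_F([r + a * (s - r) for r, s in zip(x_ref, x_S)])[i] for a in nodes)
            ig_val = (x[i] - x_ref[i]) * g_sum / k
            # Locality kernel computed on the coordinates already in S
            dist2 = sum((x[j] - x_ref[j]) ** 2 for j in S)
            pi_w = math.exp(-dist2 / (2.0 * sigma ** 2))
            phi[i] += pi_w * ig_val
            Z += pi_w
            S.add(i)
    return [p / Z for p in phi] if Z > 0 else phi


# Hypothetical linear model F(x) = x0 + 2 x1 + 3 x2, so the gradient is constant.
phi = gralis_mc(lambda p: [1.0, 2.0, 3.0], x=[1.0, 1.0, 1.0], x_ref=[0.0, 0.0, 0.0],
                m=20, k=10, sigma=1e6)
print(phi)  # close to [1/3, 2/3, 1]: with a very wide kernel, pi_w ≈ 1 and Z ≈ m * n
```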
### 3.5 Theorem 4: Shapley Interaction Values
Let $\rho:\mathcal{Q}\to 2^{N}$ be a measurable projection from the continuous space $\mathcal{Q}$ to the coalition lattice $2^{N}$. In practical contexts, $\rho$ is constructed via measurable grouping operators (binary masks, SLIC superpixels, channel-based partitions), which guarantees its existence by construction. GRALIS induces the cooperative game

$$v_{\!G}(S)\;=\;\int_{\rho^{-1}(S)}w(q)\cdot\Delta(f,x,x^{\prime},q)\,d\mu(q),\quad\rho^{-1}(S):=\{q\in\mathcal{Q}:\rho(q)=S\}.\tag{4}$$
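For a finite index space, the integral in (4) becomes a sum over the preimages of $\rho$, which makes the partition property easy to check. In the sketch below, $\mathcal{Q}$, $w$, $\Delta$ and the projection `rho` (non-empty bit masks over three hypothetical segments) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def induced_game(qs: Iterable[Hashable], w: Callable[[Hashable], float],
                 delta: Callable[[Hashable], float],
                 rho: Callable[[Hashable], FrozenSet[int]]) -> Dict[FrozenSet[int], float]:
    """v_G(S) = sum of w(q) * Delta(q) over the exact preimage {q : rho(q) = S}."""
    v: Dict[FrozenSet[int], float] = defaultdict(float)
    for q in qs:
        v[rho(q)] += w(q) * delta(q)
    return dict(v)  # coalitions outside the image of rho have v_G(S) = 0


qs = range(14)


def rho(q: int) -> FrozenSet[int]:
    mask = q % 7 + 1  # hypothetical: q encodes a non-empty binary mask over 3 segments
    return frozenset(j for j in range(3) if (mask >> j) & 1)


v_G = induced_game(qs, w=lambda q: 1 / 14, delta=lambda q: 0.1 * q, rho=rho)
print(len(v_G), frozenset() in v_G)  # 7 False, so v_G(empty set) = 0
```

Because the preimages partition $\mathcal{Q}$, the values of $v_{\!G}$ add up to $\sum_q w(q)\Delta(q)$.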
###### Theorem 3.11 (Shapley Interaction Values).
The Shapley Interaction Values of GRALIS are

$$I_{ij}^{\mathrm{GRALIS}}(\rho)\;=\;I_{ij}^{\mathrm{Sh}}(v_{\!G})\;=\;\sum_{S\subseteq N\setminus\{i,j\}}\pi_{n}(|S|)\bigl[v_{\!G}(S\cup\{i,j\})-v_{\!G}(S\cup\{i\})-v_{\!G}(S\cup\{j\})+v_{\!G}(S)\bigr],$$

where $\pi_{n}(|S|)=|S|!\,(n-|S|-2)!/(n-1)!$ is the Shapley-interaction weight (Grabisch & Roubens [[19](https://arxiv.org/html/2605.05480#bib.bib19)]).
###### Proof.
*Step 1.* $v_{\!G}(\emptyset)=0$ (empty preimage); $v_{\!G}(N)=\int_{\mathcal{Q}}w\,\Delta\,d\mu=f(x)-f(x^{\prime})$ by Theorem [3.8](https://arxiv.org/html/2605.05480#S3.Thmtheorem8). The preimages $\{\rho^{-1}(S)\}_{S\subseteq N}$ form a measurable partition of $\mathcal{Q}$ (each $q$ belongs to exactly one $\rho^{-1}(S)$, since $\rho$ is a function).

*Step 2 (second-order difference operator).* $\Delta_{ij}v_{\!G}(S)=v_{\!G}(S\cup\{i,j\})-v_{\!G}(S\cup\{i\})-v_{\!G}(S\cup\{j\})+v_{\!G}(S)$ is well defined by linearity of the Lebesgue integral on measurable sets.

*Step 3.* $I_{ij}^{\mathrm{GRALIS}}(\rho)=\sum_{S}\pi_{n}(|S|)\,\Delta_{ij}v_{\!G}(S)=I_{ij}^{\mathrm{Sh}}(v_{\!G})$ by the definition of Grabisch & Roubens (1999). The equality is exact, not approximate.

*Proposition ($\rho$-equivalence).* Two projections $\rho_{1},\rho_{2}$ that define the same partition of $\mathcal{Q}$ produce identical $I_{ij}^{\mathrm{GRALIS}}$ (the structure depends on the preimages, not on the labeling of $N$). ∎
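The interaction formula of Theorem 3.11 can be evaluated directly on any finite game, including an induced game $v_{\!G}$ tabulated over $2^N$. The sketch below (function and game names hypothetical) checks two standard sanity cases: the unanimity game on $\{0,1\}$ has $I_{01}=1$, and any additive game has $I_{ij}=0$.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet

Game = Callable[[FrozenSet[int]], float]


def shapley_interaction(v: Game, n: int, i: int, j: int) -> float:
    """I_ij = sum over S in N minus {i, j} of pi_n(|S|) * Delta_ij v(S)."""
    rest = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for s in range(n - 1):  # |S| ranges over 0, ..., n - 2
        weight = factorial(s) * factorial(n - s - 2) / factorial(n - 1)
        for members in combinations(rest, s):
            S = frozenset(members)
            total += weight * (v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S))
    return total


def unanimity_01(S: FrozenSet[int]) -> float:
    return 1.0 if {0, 1} <= S else 0.0


def additive(S: FrozenSet[int]) -> float:
    return sum(0.5 * (p + 1) for p in S)


print(shapley_interaction(unanimity_01, 4, 0, 1))  # approximately 1.0
print(shapley_interaction(additive, 4, 0, 1))      # approximately 0.0
```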
### 3.6 Theorem 5: Hoeffding ANOVA Decomposition
###### Theorem 3.13 (Coincidence with Hoeffding Decomposition).
Let $\mu=\bigotimes_{i}\mu_{i}$ be a product measure (feature independence assumption) and let $w(q)$ be constructed as the product measure induced by $\mu$. Under these conditions, the terms $\Phi_{T}^{\textsc{GRALIS}}$ induced by GRALIS *coincide* with the terms of the Hoeffding functional decomposition [[20](https://arxiv.org/html/2605.05480#bib.bib20)]:

$$F(x)-F(x^{\prime})\;=\;\sum_{\emptyset\neq T\subseteq\mathcal{F}}\Phi_{T}^{\textsc{GRALIS}}(x).$$

###### Proof sketch.
*H1 (existence and uniqueness).* The Hoeffding decomposition is the unique orthogonal decomposition of $F\in L^{2}(\mu)$ such that $\int f_{S}\,d\mu_{i}=0$ for every $i\in S$ ([[20](https://arxiv.org/html/2605.05480#bib.bib20)], Efron & Stein [[21](https://arxiv.org/html/2605.05480#bib.bib21)]). Under a product $\mu$, the projectors $P_{S}:L^{2}(\mu)\to L^{2}_{S}(\mu)$ are well defined via marginal integration.

*H2.* The terms $\Phi_{T}^{\textsc{GRALIS}}$ coincide with the Hoeffding terms in the limit $\pi_{x}\to\mu$. Since the Hoeffding terms are uniquely determined (H1), the GRALIS terms inherit their properties: $\langle\Phi_{T}^{\textsc{GRALIS}},\Phi_{T^{\prime}}^{\textsc{GRALIS}}\rangle_{L^{2}(\mu)}=0$ for $T\neq T^{\prime}$, and $\int\Phi_{T}^{\textsc{GRALIS}}\,d\mu_{i}=0$ for every $i\in T$.

*H3.* Completeness holds at every order: $F(x)=F(x^{\prime})+\sum_{i}\Phi_{i}+\sum_{i<j}\Phi_{ij}+\cdots+\Phi_{\mathcal{F}}$ (exact identity).

*Note.* Orthogonality requires a product $\mu$. For non-product measures, the terms $f_{S}$ are in general not orthogonal and the decomposition loses uniqueness. ∎
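To make H1 to H3 concrete, the sketch below computes the Hoeffding components of a two-feature function under a finite product measure by marginal integration, then checks reconstruction, zero marginal means and orthogonality. It illustrates the Hoeffding decomposition itself, not the GRALIS terms; the function and the measure are hypothetical.

```python
from typing import Callable, Sequence


def hoeffding_2d(F: Callable[[float, float], float],
                 s1: Sequence[float], p1: Sequence[float],
                 s2: Sequence[float], p2: Sequence[float]):
    """Components f0, f1, f2, f12 of F under the product measure mu1 x mu2."""
    f0 = sum(F(a, b) * pa * pb for a, pa in zip(s1, p1) for b, pb in zip(s2, p2))
    f1 = {a: sum(F(a, b) * pb for b, pb in zip(s2, p2)) - f0 for a in s1}
    f2 = {b: sum(F(a, b) * pa for a, pa in zip(s1, p1)) - f0 for b in s2}
    f12 = {(a, b): F(a, b) - f0 - f1[a] - f2[b] for a in s1 for b in s2}
    return f0, f1, f2, f12


def F(a: float, b: float) -> float:
    return 0.5 + a + 2.0 * b + a * b


s1, p1 = [0.0, 1.0], [0.3, 0.7]
s2, p2 = [0.0, 1.0], [0.5, 0.5]
f0, f1, f2, f12 = hoeffding_2d(F, s1, p1, s2, p2)

# Orthogonality of the main effect f1 and the interaction f12 in L2(mu).
inner = sum(pa * pb * f1[a] * f12[(a, b)] for a, pa in zip(s1, p1) for b, pb in zip(s2, p2))
print(abs(inner) < 1e-12)  # True
```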
### 3.7 Theorem 6: Sobol Sensitivity Indices
###### Theorem 3.14 (Extension to Sobol Indices).
In the limit where $\pi_{x}$ converges to the marginal distribution $\mu$ and $F$ is square-integrable, the variances of the first-order ($|T|=1$) GRALIS terms define the Sobol sensitivity indices [[22](https://arxiv.org/html/2605.05480#bib.bib22)]:

$$S_{i}:=\frac{\operatorname{Var}[\mathbb{E}[F\mid x_{i}]]}{\operatorname{Var}[F]}=\frac{\operatorname{Var}[\Phi_{i}^{\textsc{GRALIS}}(x)]}{\operatorname{Var}[F]}.$$

The higher-order closed indices correspond to $S_{T}^{\mathrm{clo}}=\sum_{\emptyset\neq L\subseteq T}\operatorname{Var}[\Phi_{L}^{\textsc{GRALIS}}]/\operatorname{Var}[F]$ (the total index of $T$ instead sums over all $L$ with $L\cap T\neq\emptyset$). GRALIS produces *local* Sobol indices at the point $x$: $S_{T}^{\mathrm{Sobol}}=\mathbb{E}_{x\sim\mu}[S_{T}^{\textsc{GRALIS}}(x)]$.
###### Proof.
By Theorem [3.13](https://arxiv.org/html/2605.05480#S3.Thmtheorem13), $f_{T}=\Phi_{T}^{\textsc{GRALIS}}$ when $\pi_{x}\to\mu$. The Hoeffding terms are orthogonal: $\operatorname{Cov}[\Phi_{T}^{\textsc{GRALIS}},\Phi_{T^{\prime}}^{\textsc{GRALIS}}]=0$ for $T\neq T^{\prime}$. Therefore $\operatorname{Var}[F]=\sum_{\emptyset\neq T}\operatorname{Var}[\Phi_{T}^{\textsc{GRALIS}}]$, and $S_{T}^{\textsc{GRALIS}}:=\operatorname{Var}[\Phi_{T}^{\textsc{GRALIS}}]/\operatorname{Var}[F]$ satisfies $\sum_{T}S_{T}=1$ and coincides with Sobol's $S_{T}$. ∎
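A direct evaluation of $S_i=\operatorname{Var}[\mathbb{E}[F\mid x_i]]/\operatorname{Var}[F]$ on a hypothetical toy model, $F(x_1,x_2)=x_1+2x_2+x_1x_2$ with independent features uniform on $\{-1,+1\}$, shows the variance split described above: the main effects carry variances 1 and 4 and the interaction carries 1, so $S_1=1/6$, $S_2=2/3$, and the remaining $1/6$ belongs to the second-order term.

```python
from itertools import product
from typing import Callable, List, Sequence


def first_order_sobol(F: Callable[..., float], support: Sequence[float], n: int) -> List[float]:
    """Exact first-order Sobol indices for i.i.d. features uniform on `support`."""
    points = list(product(support, repeat=n))
    mean = sum(F(*p) for p in points) / len(points)
    var = sum((F(*p) - mean) ** 2 for p in points) / len(points)
    indices = []
    for i in range(n):
        cond_means = []
        for a in support:
            sub = [p for p in points if p[i] == a]
            cond_means.append(sum(F(*p) for p in sub) / len(sub))  # E[F | x_i = a]
        var_i = sum((c - mean) ** 2 for c in cond_means) / len(cond_means)
        indices.append(var_i / var)
    return indices


S = first_order_sobol(lambda a, b: a + 2 * b + a * b, support=[-1.0, 1.0], n=2)
print(S, 1 - sum(S))  # approximately [1/6, 2/3] and 1/6 for the interaction
```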
### 3.8 Theorem 7: Multi-Scale Extension (MS-GRALIS)
###### Definition 3.15 (MS-GRALIS).
Given a model with $L$ levels $h^{(1)},\ldots,h^{(L)}$, the multi-scale attribution is

$$\mathrm{GRALIS}_{i}^{\mathrm{MS}}(x)\;=\;\sum_{\ell=1}^{L}\lambda_{\ell}\cdot\mathrm{GRALIS}_{i}^{(\ell)}(x),\qquad\lambda_{\ell}>0,\;\sum_{\ell}\lambda_{\ell}=1.$$

###### Theorem 3.16 (Minimum-variance weights).
The weights $\lambda^{*}_{\ell}$ that minimize $\operatorname{Var}[\mathrm{GRALIS}_{i}^{\mathrm{MS}}]$ subject to $\sum_{\ell}\lambda_{\ell}=1$ (with independent layers) are given by inverse-variance weighting:

$$\lambda^{*}_{\ell}=\frac{\sigma_{\ell}^{-2}}{\sum_{\ell^{\prime}}\sigma_{\ell^{\prime}}^{-2}},\qquad\sigma^{2}_{\ell}=\operatorname{Var}\bigl[\mathrm{GRALIS}_{i}^{(\ell)}\bigr].$$
###### Proof.
With independent layers, $\operatorname{Var}[\mathrm{GRALIS}_{i}^{\mathrm{MS}}]=\sum_{\ell}\lambda_{\ell}^{2}\sigma_{\ell}^{2}$. Minimizing with Lagrange multiplier $\nu$:

$$\frac{\partial}{\partial\lambda_{\ell}}\Bigl[\sum_{\ell}\lambda_{\ell}^{2}\sigma_{\ell}^{2}+\nu\Bigl(\sum_{\ell}\lambda_{\ell}-1\Bigr)\Bigr]=0\;\Rightarrow\;2\lambda_{\ell}\sigma_{\ell}^{2}+\nu=0\;\Rightarrow\;\lambda_{\ell}\propto\sigma_{\ell}^{-2}.$$

Imposing $\sum_{\ell}\lambda_{\ell}=1$ fixes the proportionality constant and yields $\lambda^{*}_{\ell}$; these weights are automatically positive, and since the objective is strictly convex, the stationary point is the minimum. ∎

Layers with more stable attributions (lower variance) receive greater weight. GradCAM++ [[8](https://arxiv.org/html/2605.05480#bib.bib8)] applied to multiple layers is a special case of MS-GRALIS in which (i) $\mathrm{GRALIS}_{i}^{(\ell)}$ is approximated with a single gradient and (ii) $\lambda_{\ell}$ is chosen heuristically. MS-GRALIS provides the missing axiomatic foundation and the optimal weight choice via variance minimization.
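Once per-layer variances are available (for example, estimated from repeated GRALIS-MC runs), the optimal weights and the resulting variance $\bigl(\sum_\ell\sigma_\ell^{-2}\bigr)^{-1}$ are straightforward to compute. A minimal sketch with hypothetical variances:

```python
from typing import List, Sequence


def inverse_variance_weights(variances: Sequence[float]) -> List[float]:
    """lambda_l = sigma_l^{-2} / sum_l' sigma_l'^{-2}."""
    inv = [1.0 / s2 for s2 in variances]
    total = sum(inv)
    return [c / total for c in inv]


def combined_variance(lams: Sequence[float], variances: Sequence[float]) -> float:
    """Variance of the weighted sum for independent layers: sum_l lambda_l^2 sigma_l^2."""
    return sum(lam_l * lam_l * s2 for lam_l, s2 in zip(lams, variances))


sigma2 = [1.0, 4.0]  # hypothetical per-layer attribution variances
lam = inverse_variance_weights(sigma2)
print(lam, combined_variance(lam, sigma2), combined_variance([0.5, 0.5], sigma2))
# weights ≈ [0.8, 0.2]; optimal variance ≈ 0.8 versus 1.25 for equal weights
```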
## 4 Summary and Axiomatic Comparison
Table [5](https://arxiv.org/html/2605.05480#S4.T5) summarizes the properties satisfied by each method.

Table 5: Axiomatic comparison: 8 formal properties and 6 structural gaps. ✓ = satisfied; ≈ = partially; × = not satisfied; *cause* = this method generates the gap. Totals in the last row indicate how many properties each method satisfies fully or partially.

| Axiom / Property | GradCAM | SHAP | LIME | IG | GRALIS |
|---|---|---|---|---|---|
| *Shapley axioms* | | | | | |
| Efficiency (completeness) | × | ✓ | × | ✓ | ✓ |
| Symmetry | × | ✓ | × | ✓ | ✓ |
| Dummy | × | ✓ | ≈ | ✓ | ✓ |
| Linearity in $F$ | ≈ | ✓ | ✓ | ✓ | ✓ |
| Sensitivity (gradients) | ✓ | × | × | ✓ | ✓ |
| Locality ($\pi_{x}$) | × | × | ✓ | × | ✓ |
| Order-$k$ interactions | × | ≈ | × | × | ✓ |
| Multi-scale (optimal wt.) | ≈ | × | × | × | ✓ |
| *Structural gaps (see Sec. [2.3](https://arxiv.org/html/2605.05480#S2.SS3))* | | | | | |
| Gap 1 (arbitrary baseline) | — | ≈ | × | cause | ✓ |
| Gap 2 (curvature) | × | cause | × | × | ✓ |
| Gap 3 (LIME completeness) | × | × | cause | × | ✓ |
| Gap 4 (CNN only) | cause | × | × | × | ≈ |
| Gap 5 (interactions) | × | × | × | × | ✓ |
| Gap 6 (multi-scale) | ≈ | × | × | × | ✓ |
| Total (out of 14) | 2.5 | 5.5 | 3.5 | 6 | 13.5 |
Note: this comparison is *structural and indicative*, not an absolute ranking. Scores depend on the specific instantiation of each method: GradCAM and LIME may satisfy additional properties in non-standard configurations, and SHAP and IG can be extended to cover properties not attributed here. The score $13.5/14$ for GRALIS refers to Algorithm 1 with SLIC superpixels and LIME kernel ($\sigma=0.75$); "Multi-scale" is partial (≈) due to the explicit dependence on $n_{\mathrm{seg}}$. The table should therefore be read as a map of the structural properties that theory guarantees for each method in its standard instantiation.
#### Reducibility map.

$$\textsc{GRALIS}\xrightarrow{\pi_{x}\equiv 1,\;\mathrm{IG}\approx\Delta f}\text{Shapley-IG}\xrightarrow{\text{linear path}}\text{IG}$$

$$\textsc{GRALIS}\xrightarrow{\nabla F\approx\text{const.}}\text{KernelSHAP}\xrightarrow{\text{uniform kernel}}\text{SHAP}$$

$$\textsc{GRALIS}\xrightarrow{S=\mathcal{F},\;\pi_{x}\to\delta_{x}}\text{LIME}\qquad\text{MS-GRALIS}\xrightarrow{\text{1 layer, no path}}\text{GradCAM}$$

Figure [3](https://arxiv.org/html/2605.05480#S4.F3) visualizes the reducibility hierarchy.

[Figure 3 diagram: nodes GRALIS (canonical form), KernelSHAP, Integrated Gradients, LIME, MS-GRALIS, SHAP and GradCAM, connected by specialization arrows; score annotations 6/14 and 5.5/14 (cf. Table 5).]

Figure 3: Reducibility hierarchy of GRALIS. Each arrow is a specialization (limit or special case). Scores refer to Table [5](https://arxiv.org/html/2605.05480#S4.T5).
## 5 Preliminary Experimental Validation
The theoretical predictions of GRALIS are validated on a breast histology classification task (BreaKHis [[24](https://arxiv.org/html/2605.05480#bib.bib24)], 1,187 test images, 694 benign / 493 malignant) using a DenseNet-121 classifier trained with knowledge distillation. The results reported here are based on the complete run (1,187/1,187 images processed on an NVIDIA A100-80GB); a full comparison with baseline methods is planned for a companion paper [[28](https://arxiv.org/html/2605.05480#bib.bib28)].
#### Implementation of Algorithm 1.
GRALIS is implemented with SLIC segmentation (30 segments requested, $n_{\mathrm{seg}}\approx 25$ effective, compactness $=50$), $m=30$ Monte Carlo permutations with antithetic variants, $k=10$ integration steps (midpoint rule) and a LIME kernel with $\sigma=0.75$. The resulting map is *piecewise constant*: each pixel in segment $i$ inherits the scalar value $\phi_{i}^{\textsc{GRALIS}}$, so each map takes at most $n_{\mathrm{seg}}$ distinct values and is structurally sparse by construction. The average computation time is $\approx 45$ s/image on an A100-80GB.
#### Map properties.
Maps produced by Algorithm 1 show SAL $=0.762\pm 0.109$ (mean attribution in the top 20% of pixels by intensity), consistent with the identification of semantically coherent salient regions. Compactness $\phi_{\mathrm{active}}=0.39$ (the fraction of pixels belonging to superpixels with attribution above the median) improves by $19\times$ compared to the GRALIS-MC variant operating in feature space ($\phi_{\mathrm{active}}\approx 1.0$). The CPT metric ($=1-H_{\mathrm{norm}}$) is $0.016$: although low in absolute terms, this value reflects a limitation of the metric on piecewise-constant maps with 50,176 pixels (the normalized entropy remains high even with only 25 distinct values); $\phi_{\mathrm{active}}$ is the more appropriate sparsity proxy for superpixel-based methods.
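The piecewise-constant construction and the $\phi_{\mathrm{active}}$ proxy can be sketched on a toy label grid. Two assumptions are made here that are not taken from the implementation: the median is taken over the per-superpixel attributions (not over pixels), and "above" is a strict inequality. The 4×4 segmentation and the attribution values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def broadcast(labels: List[List[int]], phi_seg: Dict[int, float]) -> List[List[float]]:
    """Piecewise-constant map: every pixel inherits its segment's attribution."""
    return [[phi_seg[s] for s in row] for row in labels]


def phi_active(labels: List[List[int]], phi_seg: Dict[int, float]) -> float:
    """Fraction of pixels whose superpixel attribution exceeds the median over superpixels."""
    threshold = median(list(phi_seg.values()))
    pixels = [s for row in labels for s in row]
    return sum(1 for s in pixels if phi_seg[s] > threshold) / len(pixels)


labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 3, 3],
          [2, 3, 3, 3]]  # hypothetical 4x4 SLIC-style segmentation
phi_seg = {0: 0.9, 1: 0.1, 2: 0.4, 3: 0.6}
attribution_map = broadcast(labels, phi_seg)
print(attribution_map[3], phi_active(labels, phi_seg))  # [0.4, 0.6, 0.6, 0.6] 0.5625
```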
#### Faithfulness via superpixel deletion.
The correct faithfulness test for piecewise-constant maps is *superpixel deletion*: iterative removal of entire superpixels in descending order of mean attribution, rather than of individual pixels. The results (Table [6](https://arxiv.org/html/2605.05480#S5.T6)) show a symmetric and theoretically coherent pattern.
Table 6: Faithfulness via full superpixel deletion ($n=25$ per class, preliminary). Drop $=f(x)-f(x_{\setminus K})$; a positive value indicates reduced malignant-class confidence after removal.

For *malignant* images ($K=3,5$), deleting the highest-attribution superpixels reduces malignant confidence in 96% of cases (mean drop $+0.025$ to $+0.027$), indicating that GRALIS localizes the regions actually used by the model for the prediction. For *benign* images, the drop is negative in 96% of cases, which is the symmetric and theoretically expected behavior: GRALIS identifies the superpixels that *suppress* the malignant signal (evidence of benignity), so removing them increases malignant confidence, consistent with a correct attribution. This result goes beyond the naive view of faithfulness as a "monotone reduction of the prediction" and calls for separate evaluation per class; a more extensive discussion with a multi-method comparison is planned in a companion paper [[28](https://arxiv.org/html/2605.05480#bib.bib28)].
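The deletion protocol can be sketched as follows. The model is abstracted as a hypothetical callable `f_removed` that returns the malignant-class score after the given superpixels have been replaced by a baseline; how removal is implemented (for example mean or blurred fill) is an assumption left to the caller. Segments are removed in descending order of attribution.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def deletion_drops(f_removed: Callable[[FrozenSet[int]], float],
                   phi_seg: Dict[int, float], ks: Iterable[int]) -> Dict[int, float]:
    """Drop(K) = f(x) - f(x without its top-K superpixels by attribution)."""
    order = sorted(phi_seg, key=phi_seg.get, reverse=True)  # highest attribution first
    full = f_removed(frozenset())
    return {k: full - f_removed(frozenset(order[:k])) for k in ks}


# Hypothetical model: segments 0 and 3 carry the malignant evidence.
def toy_score(removed: FrozenSet[int]) -> float:
    return 0.9 - 0.05 * len(removed & {0, 3})


phi_seg = {0: 0.9, 1: 0.1, 2: 0.4, 3: 0.6}
print(deletion_drops(toy_score, phi_seg, ks=[0, 1, 3]))  # drops ≈ 0, 0.05, 0.10
```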
#### Deletion faithfulness AUC.
Following Petsiuk et al. [[16](https://arxiv.org/html/2605.05480#bib.bib16)], we define the *superpixel deletion AUC* as the normalized area under the drop curve:

$$\mathrm{DelAUC}(K_{\max})\;=\;\frac{1}{K_{\max}}\int_{0}^{K_{\max}}\mathrm{Drop}(k)\,dk,\tag{5}$$

where $\mathrm{Drop}(k)=f(x)-f(x_{\setminus k})$ is the mean confidence drop after removing the top-$k$ superpixels by attribution, and the integral is approximated by the trapezoid rule. For piecewise-constant maps, this formulation operates at the superpixel level rather than the pixel level, consistent with the map structure (pixel-level deletion is shown in Table [6](https://arxiv.org/html/2605.05480#S5.T6) to produce uninformative results for this class of maps).
Using the data of Table [6](https://arxiv.org/html/2605.05480#S5.T6) ($K\in\{0,1,3,5\}$, $K_{\max}=5$), the preliminary estimates are

$$\mathrm{DelAUC}_{\mathrm{mal}}\approx+0.015,\qquad\mathrm{DelAUC}_{\mathrm{ben}}\approx-0.020.$$

The positive value for malignant images indicates that attribution-guided removal consistently degrades malignant confidence, while the negative value for benign images reflects the class-conditional structure discussed above. Both estimates rest on three removal steps only; a full 25-step deletion curve with baseline comparison is planned for a companion paper [[28](https://arxiv.org/html/2605.05480#bib.bib28)].
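Equation (5) with the trapezoid rule on an irregular grid of removal counts takes only a few lines. The drop values below are hypothetical placeholders, not the entries of Table 6.

```python
from typing import Sequence


def del_auc(ks: Sequence[int], drops: Sequence[float]) -> float:
    """Trapezoid approximation of (1/K_max) * integral_0^K_max Drop(k) dk."""
    if len(ks) != len(drops) or len(ks) < 2 or ks[0] != 0:
        raise ValueError("need matching grids starting at K = 0")
    area = sum((k1 - k0) * (d0 + d1) / 2.0
               for k0, k1, d0, d1 in zip(ks, ks[1:], drops, drops[1:]))
    return area / ks[-1]


print(del_auc([0, 1, 3, 5], [0.0, 0.01, 0.02, 0.03]))  # approximately 0.017
```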
## 6 Conclusions
This work has presented GRALIS, a mathematical framework that unifies a broad class of XAI attribution methods (including SHAP, Integrated Gradients, LIME and linearized GradCAM) under a unique canonical form $(\mathcal{Q},w,\Delta)$ justified by the Riesz Representation Theorem.

The seven theorems establish that: (T1) the canonical form is *necessary* for any linear, continuous, additive attribution; (T2) completeness is guaranteed automatically; (T3) GRALIS-MC reduces the exponential cost $O(2^{n}\cdot k)$ to $O(m\cdot n\cdot k)$ with an explicit error bound; (T4) SIVs are computed *exactly* on the induced game $v_{\!G}$; (T5) the Hoeffding ANOVA decomposition emerges under feature independence; (T6) Sobol indices are recovered as a limiting case, with GRALIS providing their local counterparts; (T7) optimal multi-scale weights are given by inverse-variance weighting.

Appendix [X](https://arxiv.org/html/2605.05480#A1) clarifies via the Möbius transform that GRALIS does not *approximate* the SIVs, but *computes them exactly* on a cooperative game it constructs itself. This distinction is theoretically crucial.

GRALIS satisfies a broader set of axiomatic and structural properties than any of the existing methods compared in Table [5](https://arxiv.org/html/2605.05480#S4.T5), combining for the first time completeness, sensitivity, locality, exact interactions and optimal multi-scale weights in a single framework. The common formulation (Table [3](https://arxiv.org/html/2605.05480#S3.T3)) makes formally comparable methods that until now could only be compared empirically.
#### Limitations and future work.
*Theoretical scope.* Conditions (a)–(c) of Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4) require linearity and continuity of the attribution functional and therefore do not cover nonlinear methods such as standard GradCAM, attention maps, or saliency methods with smoothing. Extending the canonical form to piecewise-linear or Lipschitz functionals would require tools beyond the Riesz representation and is left as future work.

*Experimental scope.* Section [5](https://arxiv.org/html/2605.05480#S5) constitutes a *proof-of-concept validation*: its purpose is to verify that the theoretical guarantees of T1–T7 translate into measurable behaviors (saliency, sparsity, faithfulness) on a real clinical task, not to provide a full comparative benchmark. Three specific gaps remain open at this stage. (i) *Baseline comparison:* maps from GradCAM, KernelSHAP, LIME and IG are not yet reported alongside GRALIS; this comparison, together with visual evaluation on matched image pairs, is planned for a companion paper [[28](https://arxiv.org/html/2605.05480#bib.bib28)]. (ii) *Standard faithfulness metrics:* the deletion AUC reported here is based on three removal steps ($K\in\{1,3,5\}$); a full 25-step curve and the ROAD benchmark [[17](https://arxiv.org/html/2605.05480#bib.bib17)] are planned as well. (iii) *Generalization:* results are reported on a single dataset (BreaKHis) and a single architecture (DenseNet-121 with knowledge distillation); evaluation on additional histology datasets and transformer architectures is planned.

These limitations concern the empirical validation only and do not affect the theoretical contributions (T1–T7), which hold under the stated conditions.

*Self-contained claims.* Independently of the companion paper, the following results are fully reported and verifiable in this work: (a) the axiomatic comparison (Table [5](https://arxiv.org/html/2605.05480#S4.T5)), which shows GRALIS satisfying 13.5/14 properties vs. 2.5–6/14 for individual methods, is derived entirely from the theoretical framework and does not require empirical validation; (b) the preliminary deletion AUC ($\mathrm{DelAUC}_{\mathrm{mal}}\approx+0.015$, $\mathrm{DelAUC}_{\mathrm{ben}}\approx-0.020$) and the 96% class-conditional faithfulness consistency are computed on the full 1,187-image run and constitute standalone empirical evidence; (c) the SAL and sparsity metrics ($\mathrm{SAL}=0.762\pm 0.109$, $\phi_{\mathrm{active}}=0.39$) are reported for the complete dataset without dependence on any external reference.
## Appendix X Justification via Möbius Transform of the GRALIS–SIV Correspondence
### X.1 Motivation
Theorem [3.11](https://arxiv.org/html/2605.05480#S3.Thmtheorem11) shows that GRALIS generates Shapley Interaction Values by constructing a finite cooperative game $v_{\!G}:2^{N}\to\mathbb{R}$ from the continuous space $\mathcal{Q}$. However, the passage from $\mathcal{Q}$ to the discrete coalition lattice $2^{N}$ requires a formal justification. This appendix provides it through the Möbius transform, the canonical tool for decomposing a function defined on coalitions into its pure higher-order contributions.
### X.2 Cooperative Game Induced by GRALIS
Let $(\mathcal{Q},\mathcal{A},\mu)$ be the GRALIS measure space and let $\rho:\mathcal{Q}\to 2^{N}$ be a measurable projection. For every coalition $S\subseteq N$ we define

$$v_{\!G}(S)\;=\;\int_{\rho^{-1}(S)}w(q)\cdot\Delta(f,x,x^{\prime},q)\,d\mu(q),\tag{6}$$

where $\rho^{-1}(S):=\{q\in\mathcal{Q}:\rho(q)=S\}$ is the *exact preimage* of $S$ under $\rho$. With this definition, the sets $\{\rho^{-1}(S)\}_{S\subseteq N}$ form a *measurable partition* of $\mathcal{Q}$ and $v_{\!G}(\emptyset)=0$.

The push-forward measure $\rho_{\#}(w\,d\mu)(S)=\int_{\rho^{-1}(S)}w(q)\,d\mu(q)$ represents the total weight assigned by GRALIS to coalition $S$.
### X.3 Möbius Transform
For a set function $v:2^N\to\mathbb{R}$, the Möbius transform is:
$$m_v(T)=\sum_{A\subseteq T}(-1)^{|T|-|A|}\,v(A),\qquad\forall T\subseteq N,$$
with inverse $v(S)=\sum_{T\subseteq S}m_v(T)$. For the GRALIS game we write $m_{\!G}:=m_{v_{\!G}}$. The coefficient $m_{\!G}(T)$ represents the pure contribution of coalition $T$, i.e. the part of $v_{\!G}(T)$ not explainable by proper sub-coalitions.
Applying the decomposition to the GRALIS game and substituting \([6](https://arxiv.org/html/2605.05480#A1.E6)\):
$$\begin{aligned} m_{\!G}(T) &= \sum_{A\subseteq T}(-1)^{|T|-|A|}\int_{\rho^{-1}(A)}w(q)\,\Delta(q)\,d\mu(q)\\ &= \int_{\mathcal{Q}}\Biggl[\sum_{A\subseteq T}(-1)^{|T|-|A|}\,\mathbf{1}_{\rho(q)=A}\Biggr]w(q)\,\Delta(q)\,d\mu(q). \end{aligned}\tag{7}$$
Here $\Delta(q)$ abbreviates $\Delta(f,x,x',q)$. The exchange of sum and integral is justified by the absolute integrability of $w(q)\cdot\Delta(q)$ ($w\in L^1(\mathcal{Q},\mu)$, $\Delta$ bounded) and the finiteness of the lattice (at most $2^n$ terms). Since exactly one $A$ can equal $\rho(q)$, the bracket reduces to $(-1)^{|T|-|\rho(q)|}\,\mathbf{1}_{\rho(q)\subseteq T}$.
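The Möbius transform and its inverse can be sketched directly on a finite lattice. This minimal sketch assumes players are indexed $0,\dots,n-1$ and coalitions are bitmasks; the helper names (`submasks`, `mobius`, `mobius_inverse`) and the random game are hypothetical. Brute-force enumeration of all pairs $A\subseteq T$ costs $O(3^n)$, which is adequate only for small $n$.

```python
import numpy as np


def submasks(T):
    """Yield every submask A of the bitmask T, from T down to 0."""
    A = T
    while True:
        yield A
        if A == 0:
            return
        A = (A - 1) & T


def popcount(x):
    return bin(x).count("1")


def mobius(v, n):
    """m(T) = sum over A subset of T of (-1)^(|T| - |A|) * v(A)."""
    m = np.zeros(2**n)
    for T in range(2**n):
        for A in submasks(T):
            m[T] += (-1) ** (popcount(T) - popcount(A)) * v[A]
    return m


def mobius_inverse(m, n):
    """v(S) = sum over T subset of S of m(T)."""
    return np.array([sum(m[T] for T in submasks(S)) for S in range(2**n)])


rng = np.random.default_rng(1)
n = 4
v = rng.normal(size=2**n)
v[0] = 0.0                                   # normalized game: v(empty) = 0
m = mobius(v, n)
```

The round trip `mobius_inverse(mobius(v, n), n)` recovers `v` up to floating-point error, and on a singleton the transform reduces to $m(\{i\})=v(\{i\})-v(\emptyset)$.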
### X.4 Second-Order Interactions
For two players $i,j\in N$, the interaction conditioned on $S\subseteq N\setminus\{i,j\}$ is the second-order discrete difference:
$$\Delta_{ij}v_{\!G}(S)=v_{\!G}(S\cup\{i,j\})-v_{\!G}(S\cup\{i\})-v_{\!G}(S\cup\{j\})+v_{\!G}(S).$$
Using the Möbius representation $v_{\!G}(S)=\sum_{T\subseteq S}m_{\!G}(T)$:
$$\Delta_{ij}v_{\!G}(S)=\sum_{T:\,\{i,j\}\subseteq T\subseteq S\cup\{i,j\}}m_{\!G}(T).$$
Terms not containing both $i$ and $j$ cancel: a $T$ containing neither player appears in all four sums with signs $+1-1-1+1$, and a $T$ containing exactly one of them appears in two sums with opposite signs. This identity shows that the second-order discrete difference captures exactly the pure contributions involving both $i$ and $j$.
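The cancellation argument can be checked numerically on a random game. The sketch below uses hypothetical helper names, bitmask coalitions and a toy random game; it compares $\Delta_{ij}v(S)$ with the sum of Möbius coefficients over $\{i,j\}\subseteq T\subseteq S\cup\{i,j\}$ for every pair and every $S\subseteq N\setminus\{i,j\}$.

```python
import itertools
import numpy as np


def submasks(T):
    """Yield every submask of the bitmask T, from T down to 0."""
    A = T
    while True:
        yield A
        if A == 0:
            return
        A = (A - 1) & T


def popcount(x):
    return bin(x).count("1")


def mobius(v, n):
    """Moebius transform m(T) = sum over A subset T of (-1)^(|T|-|A|) v(A)."""
    return np.array([sum((-1) ** (popcount(T) - popcount(A)) * v[A] for A in submasks(T))
                     for T in range(2**n)])


def second_difference(v, i, j, S):
    """Delta_ij v(S) = v(S+{i,j}) - v(S+{i}) - v(S+{j}) + v(S), with i, j not in S."""
    bi, bj = 1 << i, 1 << j
    return v[S | bi | bj] - v[S | bi] - v[S | bj] + v[S]


def mobius_side(m, i, j, S):
    """Sum of m(T) over {i,j} subset T subset S+{i,j}, written as T = {i,j} union R, R subset S."""
    pair = (1 << i) | (1 << j)
    return sum(m[pair | R] for R in submasks(S))


rng = np.random.default_rng(2)
n = 4
v = rng.normal(size=2**n)
v[0] = 0.0
m = mobius(v, n)

max_err = 0.0
for i, j in itertools.combinations(range(n), 2):
    rest = (2**n - 1) & ~((1 << i) | (1 << j))   # bitmask of N minus {i, j}
    for S in submasks(rest):
        err = abs(second_difference(v, i, j, S) - mobius_side(m, i, j, S))
        max_err = max(max_err, err)
print(max_err)                                    # only floating-point round-off remains
```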
### X.5 Recovery of Shapley Interaction Values
The SIV of the pair $(i,j)$ is
$$I_{ij}^{\mathrm{Sh}}(v_{\!G})=\sum_{S\subseteq N\setminus\{i,j\}}\pi_n(|S|)\cdot\Delta_{ij}v_{\!G}(S),\qquad\pi_n(s)=\frac{s!\,(n-s-2)!}{(n-1)!}.$$
Substituting the Möbius expression and exchanging the sums:
$$I_{ij}^{\mathrm{Sh}}(v_{\!G})=\sum_{\substack{T\subseteq N\\ \{i,j\}\subseteq T}}\frac{1}{|T|-1}\cdot m_{\!G}(T),\tag{8}$$
where $\alpha_T=1/(|T|-1)$ is the weight with which the pure contribution $m_{\!G}(T)$ participates in the interaction (Grabisch & Roubens \[[19](https://arxiv.org/html/2605.05480#bib.bib19)\], Thm. 3.1).
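Formula (8) can be compared against the definition with weights $\pi_n(s)=s!\,(n-s-2)!/(n-1)!$. This is a minimal sketch on a hypothetical random game with bitmask coalitions; `siv_direct` follows the definition, `siv_mobius` follows (8), and both names are illustrative.

```python
import itertools
from math import factorial
import numpy as np


def submasks(T):
    """Yield every submask of the bitmask T, from T down to 0."""
    A = T
    while True:
        yield A
        if A == 0:
            return
        A = (A - 1) & T


def popcount(x):
    return bin(x).count("1")


def mobius(v, n):
    """Moebius transform m(T) = sum over A subset T of (-1)^(|T|-|A|) v(A)."""
    return np.array([sum((-1) ** (popcount(T) - popcount(A)) * v[A] for A in submasks(T))
                     for T in range(2**n)])


def pi_n(s, n):
    """Shapley interaction weight s! (n - s - 2)! / (n - 1)!."""
    return factorial(s) * factorial(n - s - 2) / factorial(n - 1)


def siv_direct(v, n, i, j):
    """I_ij = sum over S subset N minus {i,j} of pi_n(|S|) * Delta_ij v(S)."""
    bi, bj = 1 << i, 1 << j
    rest = (2**n - 1) & ~(bi | bj)
    return sum(pi_n(popcount(S), n) * (v[S | bi | bj] - v[S | bi] - v[S | bj] + v[S])
               for S in submasks(rest))


def siv_mobius(m, n, i, j):
    """Eq. (8): sum over T containing {i,j} of m(T) / (|T| - 1), with T = {i,j} union R."""
    pair = (1 << i) | (1 << j)
    rest = (2**n - 1) & ~pair
    return sum(m[pair | R] / (popcount(R) + 1) for R in submasks(rest))


rng = np.random.default_rng(3)
n = 5
v = rng.normal(size=2**n)
v[0] = 0.0
m = mobius(v, n)
pairs = list(itertools.combinations(range(n), 2))
direct = np.array([siv_direct(v, n, i, j) for i, j in pairs])
via_mobius = np.array([siv_mobius(m, n, i, j) for i, j in pairs])
```

For the unanimity game of $\{0,1\}$ (value 1 on coalitions containing both players, 0 otherwise) the only nonzero Möbius coefficient is $m(\{0,1\})=1$, so both routines give $I_{01}=1$ and $I_{02}=0$.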
### X.6 Interpretation for GRALIS
In the GRALIS context:
1. $\mathcal{Q}$ indexes the local marginal contributions $\Delta(q)$;
2. $\rho$ groups these contributions into discrete coalitions;
3. $\int_{\rho^{-1}(S)}$ constructs the coalition value $v_{\!G}(S)$;
4. the Möbius transform decomposes $v_{\!G}$ into pure effects;
5. the SIV aggregates the pure effects involving both $i$ and $j$.
### X.7 Final Proposition
###### Proposition X.2.
Let $v_{\!G}:2^N\to\mathbb{R}$ be the cooperative game induced by GRALIS via a measurable projection $\rho:\mathcal{Q}\to 2^N$. Then for every pair of distinct players $i,j\in N$:
$$I_{ij}^{\mathrm{GRALIS}}(\rho)=I_{ij}^{\mathrm{Sh}}(v_{\!G})=\sum_{\substack{T\subseteq N\\ \{i,j\}\subseteq T}}\frac{1}{|T|-1}\cdot m_{\!G}(T).$$
###### Proof\.
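By definition, $I_{ij}^{\mathrm{GRALIS}}(\rho)=\sum_{S\subseteq N\setminus\{i,j\}}\pi_n(|S|)\cdot\Delta_{ij}v_{\!G}(S)$, which is exactly the Grabisch–Roubens SIV applied to $v_{\!G}$, so $I_{ij}^{\mathrm{GRALIS}}=I_{ij}^{\mathrm{Sh}}(v_{\!G})$. Formula ([8](https://arxiv.org/html/2605.05480#A1.E8)) follows by substituting the Möbius representation of $\Delta_{ij}v_{\!G}(S)$ and collecting terms with the same $T$. The coefficient $m_{\!G}(T)$ occurs for every $S$ with $T\setminus\{i,j\}\subseteq S\subseteq N\setminus\{i,j\}$, i.e. $S=(T\setminus\{i,j\})\cup R$ with $R\subseteq N\setminus T$; using $\pi_n(s)=s!\,(n-s-2)!/(n-1)!=\int_0^1x^{s}(1-x)^{n-2-s}\,dx$ and the binomial theorem, its total weight is
$$\sum_{r=0}^{n-|T|}\binom{n-|T|}{r}\int_0^1x^{|T|-2+r}(1-x)^{n-|T|-r}\,dx=\int_0^1x^{|T|-2}\,dx=\frac{1}{|T|-1}.$$
∎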
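Collecting terms in this proof relies on the total weight of $m_{\!G}(T)$, namely $\sum\pi_n(|S|)$ over all $S$ with $T\setminus\{i,j\}\subseteq S\subseteq N\setminus\{i,j\}$ and $\pi_n(s)=s!\,(n-s-2)!/(n-1)!$, being equal to $1/(|T|-1)$. That identity can be checked in exact rational arithmetic; the sketch below (illustrative function names) groups the $\binom{n-|T|}{r}$ sets $S$ that add $r$ players from outside $T$.

```python
from fractions import Fraction
from math import comb, factorial


def pi_n(s, n):
    """Exact Shapley interaction weight s! (n - s - 2)! / (n - 1)!."""
    return Fraction(factorial(s) * factorial(n - s - 2), factorial(n - 1))


def mobius_coefficient(t, n):
    """Total weight of m(T) with |T| = t >= 2: S = (T minus {i,j}) union R, R subset N minus T."""
    return sum(comb(n - t, r) * pi_n(t - 2 + r, n) for r in range(n - t + 1))


for n in range(2, 10):
    for t in range(2, n + 1):
        assert mobius_coefficient(t, n) == Fraction(1, t - 1)
print("coefficient identity holds for n = 2..9")
```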
### X.8 Concluding Remark
The Möbius transform shows that the discretization of $\mathcal{Q}$ is not a heuristic simplification but a measurable projection that induces a finite set function. Once the game $v_{\!G}$ is constructed, the SIVs are obtained according to the standard theory of cooperative games.
The correct formulation is not:
GRALIS approximates the SIVs,
but:
GRALIS constructs a game $v_{\!G}$ and computes the SIVs exactly on $v_{\!G}$.
The link between the Möbius coefficients $m_{\!G}(T)$ and the components of the Hoeffding decomposition (Theorem [3.13](https://arxiv.org/html/2605.05480#S3.Thmtheorem13)) holds only under additional hypotheses, in particular feature independence ($\mu=\bigotimes_i\mu_i$) and an appropriate choice of $\rho$. In general, GRALIS provides a common integral construction from which both ANOVA-type decompositions and Shapley interaction indices can be derived under appropriate assumptions, without one automatically coinciding with the other.
## Appendix Y: Formalization of the Projection $\rho$ and the Operator $P_\rho$
### Y.1 Motivation
The construction of the cooperative game $v_{\!G}$ (Appendix [X](https://arxiv.org/html/2605.05480#A1)) depends on the measurable projection $\rho:\mathcal{Q}\to 2^N$ and the integral $v_{\!G}(S)=\int_{\rho^{-1}(S)}w\cdot\Delta\,d\mu$. A measure-theoretic reviewer might object that: (i) the well-definedness of $v_{\!G}$ for a general measurable $\rho$ is not proved; (ii) the family $\{\rho^{-1}(S)\}$ might not cover $\mathcal{Q}$; (iii) results might depend on an arbitrary labeling of $\rho$. This appendix addresses all three points with three formal lemmas and introduces the linear operator $P_\rho$, which expresses GRALIS as a composition of linear maps.
### Y.2 Lemma 1: Push-Forward Measure and Well-Definedness of $v_{\!G}$
###### Lemma Y.1 (Push-forward and $v_{\!G}\in\ell^1(2^N)$).
Let $(\mathcal{Q},\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $\rho:\mathcal{Q}\to 2^N$ an $(\mathcal{A},\mathcal{P}(2^N))$-measurable map, where $\mathcal{P}(2^N)$ is the discrete $\sigma$-algebra on $2^N$. Let $w\cdot\Delta\in L^1(\mathcal{Q},\mu)$. Then:
1. *(i)* The push-forward measure $\nu:=\rho_{\#}\mu$, defined by $\nu(\mathcal{S}):=\mu(\rho^{-1}(\mathcal{S}))$ for every $\mathcal{S}\subseteq 2^N$ (so that $\nu(\{S\})=\mu(\rho^{-1}(S))$ for every coalition $S\subseteq N$), is a measure on $(2^N,\mathcal{P}(2^N))$ with total mass $\nu(2^N)=\mu(\mathcal{Q})$; in particular it is finite whenever $\mu(\mathcal{Q})<\infty$.
2. *(ii)* The cooperative game $v_{\!G}(S):=\int_{\rho^{-1}(S)}w(q)\cdot\Delta(q)\,d\mu(q)$ is well defined and satisfies $v_{\!G}\in\ell^1(2^N)$ with $\|v_{\!G}\|_{\ell^1}\le\|w\cdot\Delta\|_{L^1(\mathcal{Q},\mu)}$.
###### Proof.
*(i)* Since $\rho$ is measurable, $\rho^{-1}(\mathcal{S})\in\mathcal{A}$ for every $\mathcal{S}\subseteq 2^N$. The map $\nu:\mathcal{P}(2^N)\to[0,+\infty]$ satisfies $\nu(\emptyset)=\mu(\rho^{-1}(\emptyset))=\mu(\emptyset)=0$ (here $\emptyset$ is the empty family of coalitions), and for any pairwise disjoint $\mathcal{S}_1,\ldots,\mathcal{S}_k\subseteq 2^N$ (every disjoint family of nonempty subsets of $2^N$ is finite, since $|2^N|=2^n<\infty$):
$$\nu\Bigl(\bigsqcup_{j=1}^{k}\mathcal{S}_j\Bigr)=\mu\Bigl(\bigsqcup_{j=1}^{k}\rho^{-1}(\mathcal{S}_j)\Bigr)=\sum_{j=1}^{k}\mu\bigl(\rho^{-1}(\mathcal{S}_j)\bigr)=\sum_{j=1}^{k}\nu(\mathcal{S}_j),$$
where the first equality uses that preimages commute with unions, and the second uses that the preimages are pairwise disjoint ($\rho^{-1}(\mathcal{S}_j)\cap\rho^{-1}(\mathcal{S}_k)=\rho^{-1}(\mathcal{S}_j\cap\mathcal{S}_k)=\emptyset$ for $j\neq k$) together with the additivity of $\mu$. Since $2^N$ is finite, $\sigma$-additivity reduces to finite additivity; thus $\nu$ is a measure, with total mass $\nu(2^N)=\mu(\rho^{-1}(2^N))=\mu(\mathcal{Q})$.
*(ii)* For every $S\subseteq N$, since $\rho^{-1}(S)\in\mathcal{A}$ and $w\cdot\Delta\in L^1(\mathcal{Q},\mu)$:
$$|v_{\!G}(S)|\le\int_{\rho^{-1}(S)}|w(q)\cdot\Delta(q)|\,d\mu(q)\le\|w\cdot\Delta\|_{L^1(\mathcal{Q},\mu)}<\infty.$$
Thus $v_{\!G}(S)\in\mathbb{R}$ for every $S$. For the $\ell^1$ norm:
$$\|v_{\!G}\|_{\ell^1}=\sum_{S\subseteq N}|v_{\!G}(S)|\le\sum_{S\subseteq N}\int_{\rho^{-1}(S)}|w\cdot\Delta|\,d\mu=\int_{\mathcal{Q}}|w\cdot\Delta|\,d\mu=\|w\cdot\Delta\|_{L^1}<\infty,$$
where the inequality is the triangle inequality for integrals and the equality uses the partition $\mathcal{Q}=\bigsqcup_{S\subseteq N}\rho^{-1}(S)$ (Lemma [Y.2](https://arxiv.org/html/2605.05480#A1.Thmtheorem2a)) together with the additivity of the integral over disjoint sets. Thus $v_{\!G}\in\ell^1(2^N)$. ∎
### Y.3 Lemma 2: Measurable Partition and Full Coverage of $\mathcal{Q}$
###### Lemma Y.2 (Measurable partition).
Under the hypotheses of Lemma [Y.1](https://arxiv.org/html/2605.05480#A1.Thmtheorem1a), the family $\mathcal{P}_\rho:=\{\rho^{-1}(S)\}_{S\subseteq N}$ forms a *measurable partition* of $\mathcal{Q}$ (some blocks may be empty), i.e.:
1. *(i)* Disjointness: $\rho^{-1}(S)\cap\rho^{-1}(T)=\emptyset$ for every $S\neq T$ in $2^N$.
2. *(ii)* Coverage: $\bigsqcup_{S\subseteq N}\rho^{-1}(S)=\mathcal{Q}$.
3. *(iii)* Measurability: $\rho^{-1}(S)\in\mathcal{A}$ for every $S\subseteq N$.
###### Proof\.
All three properties follow from the definition of preimage and the measurability ofρ\\rho\.
*(i)* Let $q\in\rho^{-1}(S)\cap\rho^{-1}(T)$. Then $\rho(q)=S$ and $\rho(q)=T$, hence $S=T$ because $\rho$ is single-valued, contradicting $S\neq T$.
*(ii)* For every $q\in\mathcal{Q}$, since $\rho$ is defined on all of $\mathcal{Q}$, there exists a unique $S:=\rho(q)\in 2^N$ such that $q\in\rho^{-1}(S)$. Thus every $q$ belongs to exactly one element of $\mathcal{P}_\rho$.
*(iii)* Since $\rho$ is $(\mathcal{A},\mathcal{P}(2^N))$-measurable and $\{S\}\in\mathcal{P}(2^N)$, we have $\rho^{-1}(S)=\rho^{-1}(\{S\})\in\mathcal{A}$. ∎
### Y.4 Lemma 3: Invariance with Respect to the Labeling of $\rho$
###### Lemma Y.4 ($\rho$-equivalence and invariance of SIVs).
Two measurable projections $\rho_1,\rho_2:\mathcal{Q}\to 2^N$ are *$\rho$-equivalent* ($\rho_1\sim\rho_2$) if there exists a permutation $\sigma\in\mathfrak{S}_N$ such that $\rho_2(q)=\sigma(\rho_1(q))$ for $\mu$-almost every $q\in\mathcal{Q}$, where $\sigma$ acts on $2^N$ as $\sigma(S):=\{\sigma(i):i\in S\}$. If $\rho_1\sim\rho_2$, then:
1. *(i)* $v_{\!G}^{(2)}(S)=v_{\!G}^{(1)}(\sigma^{-1}(S))$ for every $S\subseteq N$.
2. *(ii)* The Shapley values $\phi_i(v_{\!G}^{(j)})$ satisfy $\phi_{\sigma(i)}(v_{\!G}^{(2)})=\phi_i(v_{\!G}^{(1)})$ for every $i\in N$.
3. *(iii)* ExpiScore and every symmetric attribution metric are invariant under $\rho_1\sim\rho_2$.
###### Proof\.
*(i)* By definition:
$$\begin{aligned} v_{\!G}^{(2)}(S) &= \int_{\rho_2^{-1}(S)}w\cdot\Delta\,d\mu=\int_{\{q:\,\sigma(\rho_1(q))=S\}}w\cdot\Delta\,d\mu\\ &= \int_{\{q:\,\rho_1(q)=\sigma^{-1}(S)\}}w\cdot\Delta\,d\mu=\int_{\rho_1^{-1}(\sigma^{-1}(S))}w\cdot\Delta\,d\mu=v_{\!G}^{(1)}(\sigma^{-1}(S)). \end{aligned}$$
The second equality holds because the two domains of integration differ only by a $\mu$-null set, and the third because $\sigma$ is a bijection of $2^N$.
*(ii)* The Shapley value of player $\sigma(i)$ in $v_{\!G}^{(2)}$ is
$$\phi_{\sigma(i)}(v_{\!G}^{(2)})=\sum_{S\subseteq N\setminus\{\sigma(i)\}}\frac{|S|!\,(n-|S|-1)!}{n!}\bigl[v_{\!G}^{(2)}(S\cup\{\sigma(i)\})-v_{\!G}^{(2)}(S)\bigr].$$
Substituting (i): $v_{\!G}^{(2)}(S\cup\{\sigma(i)\})=v_{\!G}^{(1)}(\sigma^{-1}(S)\cup\{i\})$ and $v_{\!G}^{(2)}(S)=v_{\!G}^{(1)}(\sigma^{-1}(S))$. The substitution $T=\sigma^{-1}(S)$ is a bijection from the subsets of $N\setminus\{\sigma(i)\}$ onto the subsets of $N\setminus\{i\}$ with $|T|=|S|$, so the sum becomes exactly $\phi_i(v_{\!G}^{(1)})$.
*\(iii\)*Follows from \(ii\) since all symmetric metrics \(ExpiScore, SAL, CPT, CSC\) depend on attribution values and not on feature labeling\. ∎
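Lemma Y.4 can be illustrated on a finite $\mathcal{Q}$: relabel the coalitions produced by $\rho_1$ with a permutation $\sigma$, rebuild both games, and compare Shapley values. The point masses, the values of $w\cdot\Delta$ and the permutation `sigma` below are hypothetical toy choices; `permute_mask` is an illustrative helper implementing $\sigma(S)=\{\sigma(i):i\in S\}$ on bitmasks.

```python
from math import factorial
import numpy as np


def popcount(x):
    return bin(x).count("1")


def permute_mask(S, sigma):
    """sigma(S) = {sigma(i) : i in S} for a bitmask S; sigma[i] is the image of player i."""
    out = 0
    for i, si in enumerate(sigma):
        if (S >> i) & 1:
            out |= 1 << si
    return out


def induced_game(wd, mu, rho, n):
    """v_G(S) = sum of (w * Delta)_k * mu_k over the points with rho_k = S."""
    v = np.zeros(2**n)
    np.add.at(v, rho, wd * mu)
    return v


def shapley(v, n):
    """Exact Shapley values by enumerating S subset N minus {i}."""
    phi = np.zeros(n)
    for i in range(n):
        bi = 1 << i
        for S in range(2**n):
            if S & bi:
                continue
            s = popcount(S)
            phi[i] += factorial(s) * factorial(n - s - 1) / factorial(n) * (v[S | bi] - v[S])
    return phi


rng = np.random.default_rng(4)
n, m_pts = 4, 300
mu = rng.uniform(0.5, 1.5, size=m_pts)
wd = rng.normal(size=m_pts)                        # hypothetical values of w * Delta
rho1 = rng.integers(1, 2**n, size=m_pts)
sigma = [2, 0, 3, 1]                               # hypothetical relabeling of players
rho2 = np.array([permute_mask(int(S), sigma) for S in rho1])   # rho2 = sigma o rho1

v1, v2 = induced_game(wd, mu, rho1, n), induced_game(wd, mu, rho2, n)
phi1, phi2 = shapley(v1, n), shapley(v2, n)
```

Any quantity that depends only on the multiset of attribution values, such as `np.linalg.norm(phi)`, is therefore unchanged by the relabeling.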
### Y.5 The Operator $P_\rho$ and the Algebraic Structure of GRALIS
The three preceding lemmas allow GRALIS to be reformulated as a composition of standard linear operators, a structure that is not made explicit in the main-body formulation.
###### Definition Y.6 (Partition integration operator).
Given a measurable $\rho:\mathcal{Q}\to 2^N$, the *partition operator* $P_\rho:L^1(\mathcal{Q},\mu)\to\ell^1(2^N)$ is defined by:
$$(P_\rho f)(S):=\int_{\rho^{-1}(S)}f(q)\,d\mu(q),\qquad S\subseteq N.$$
###### Proposition Y.7 (Properties of $P_\rho$).
$P_\rho$ is a *bounded* linear operator with norm $\|P_\rho\|_{L^1\to\ell^1}\le 1$. More precisely:
1. *(i)* $\|P_\rho f\|_{\ell^1}\le\|f\|_{L^1}$ for every $f\in L^1$.
2. *(ii)* $P_\rho$ is a positive operator: $f\ge 0$ $\mu$-a.e. $\Rightarrow$ $(P_\rho f)(S)\ge 0$ for every $S$.
3. *(iii)* If $\mu(\mathcal{Q})<\infty$ (so that $\mathbf{1}_{\mathcal{Q}}\in L^1$), then $P_\rho\mathbf{1}_{\mathcal{Q}}=\nu$, where $\nu=\rho_{\#}\mu$ is the push-forward measure of Lemma [Y.1](https://arxiv.org/html/2605.05480#A1.Thmtheorem1a), identified with its mass function $S\mapsto\mu(\rho^{-1}(S))$.
###### Proof.
*(i)* From the partition of Lemma [Y.2](https://arxiv.org/html/2605.05480#A1.Thmtheorem2a): $\|P_\rho f\|_{\ell^1}=\sum_S|(P_\rho f)(S)|\le\sum_S\int_{\rho^{-1}(S)}|f|\,d\mu=\int_{\mathcal{Q}}|f|\,d\mu=\|f\|_{L^1}$. *(ii)* and *(iii)* follow directly from the definition. ∎
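On a finite $\mathcal{Q}$ the operator $P_\rho$ is a $2^n\times m$ matrix whose column $k$ holds $\mu_k$ in the row of the coalition $\rho(q_k)$. The sketch below builds that matrix for hypothetical toy data and checks the three properties numerically; `partition_operator` is an illustrative name, not an API from the paper.

```python
import numpy as np


def partition_operator(mu, rho, n):
    """Matrix of P_rho on Q = {q_1..q_m}: (P f)(S) = sum of mu_k f_k over {k : rho_k = S}."""
    P = np.zeros((2**n, len(mu)))
    P[rho, np.arange(len(mu))] = mu          # exactly one nonzero entry per column
    return P


rng = np.random.default_rng(5)
n, m = 3, 50
mu = rng.uniform(0.1, 1.0, size=m)           # point masses, so mu(Q) is finite
rho = rng.integers(0, 2**n, size=m)          # coalition label of each point
P = partition_operator(mu, rho, n)

f = rng.normal(size=m)
norm_L1 = np.sum(np.abs(f) * mu)             # ||f||_{L^1(Q, mu)}
norm_l1 = np.sum(np.abs(P @ f))              # ||P_rho f||_{l^1(2^N)}
nu = P @ np.ones(m)                          # P_rho applied to the indicator of Q
print(norm_l1 <= norm_L1)                    # property (i)
```

Each column has a single nonzero entry $\mu_k$, so $\|P_\rho f\|_{\ell^1}=\sum_S|\sum_{k:\rho_k=S}\mu_kf_k|\le\sum_k\mu_k|f_k|$, which is the bound in (i) for the $\mu$-weighted $L^1$ norm.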
#### Functional space compatibility\.
The Riesz Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4) guarantees the representation of the attribution functional in $L^2(\mathcal{Q},\mu)$, while $P_\rho$ is defined on $L^1(\mathcal{Q},\mu)$. The two spaces are compatible by the following inclusion:
$$L^2(\mathcal{Q},\mu)\subseteq L^1(\mathcal{Q},\mu)\qquad\text{when }\mu(\mathcal{Q})<\infty.\tag{9}$$
In our construction, $\mathcal{Q}$ is a finite measure space (superpixels of a $224\times 224$ image, or more generally any compact domain equipped with a finite Borel measure); thus ([9](https://arxiv.org/html/2605.05480#A1.E9)) holds by Hölder's inequality: $\|f\|_{L^1}\le\mu(\mathcal{Q})^{1/2}\|f\|_{L^2}$. For $\sigma$-finite spaces with $\mu(\mathcal{Q})=+\infty$ (e.g. unbounded integration paths), the framework can be extended by restricting $w\cdot\Delta$ to compact support or by passing to weighted spaces $L^2_w(\mathcal{Q}):=\{f:\int_{\mathcal{Q}}|f|^2w\,d\mu<\infty\}$, on which the inclusion $L^2_w\subseteq L^1_{\mathrm{loc}}$ remains valid for the weight functions considered.
Therefore, if $w\cdot\Delta\in L^2(\mathcal{Q},\mu)$, as required by Riesz (this holds when $w\in L^2$ and $\Delta$ is bounded), then $w\cdot\Delta\in L^1(\mathcal{Q},\mu)$ and $P_\rho(w\cdot\Delta)\in\ell^1(2^N)$ is well defined. The functional chain is:
$$L^2(\mathcal{Q},\mu)\xrightarrow{\ \text{incl.}\ }L^1(\mathcal{Q},\mu)\xrightarrow{\ P_\rho\ }\ell^1(2^N)\xrightarrow{\ \mathrm{Sh}\ }\mathbb{R}^N.$$
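For completeness, the inclusion (9) is the Cauchy–Schwarz case ($p=q=2$) of Hölder's inequality applied to $|f|$ and the constant function $1$ on $\mathcal{Q}$; the one-line derivation is:

$$\|f\|_{L^1}=\int_{\mathcal{Q}}|f|\cdot 1\,d\mu\le\Bigl(\int_{\mathcal{Q}}|f|^2\,d\mu\Bigr)^{1/2}\Bigl(\int_{\mathcal{Q}}1\,d\mu\Bigr)^{1/2}=\mu(\mathcal{Q})^{1/2}\,\|f\|_{L^2}.$$

So every $f\in L^2(\mathcal{Q},\mu)$ has finite $L^1$ norm as soon as $\mu(\mathcal{Q})<\infty$.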
#### Structural representation of GRALIS.
With the notation introduced, the cooperative game is written:
$$v_{\!G}=P_\rho(w\cdot\Delta),$$
where $w\cdot\Delta\in L^1(\mathcal{Q},\mu)$ is the weight-marginal-contribution product. Let $\mathrm{Sh}:\mathbb{R}^{2^N}\to\mathbb{R}^N$ be the *bounded linear operator* of Shapley values, $(\mathrm{Sh}\,v)_i=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S\cup\{i\})-v(S)]$. Linearity in $v$ is immediate from the definition. Boundedness follows from the fact that $2^N$ is finite: every linear map between finite-dimensional vector spaces is automatically continuous and bounded. Explicitly, since the Shapley weights over $S\subseteq N\setminus\{i\}$ sum to $1$ and each marginal difference satisfies $|v(S\cup\{i\})-v(S)|\le 2\|v\|_\infty$, we have $\|\mathrm{Sh}\,v\|_\infty\le 2\|v\|_\infty$ for every $v\in\mathbb{R}^{2^N}$, thus $\|\mathrm{Sh}\|_{\ell^\infty\to\ell^\infty}\le 2$ (the bound is attained by $v(S)=1$ if $i\in S$ and $v(S)=-1$ otherwise). Then the GRALIS attribution formula is written as:
$$\boxed{\phi^{\textsc{GRALIS}}=\mathrm{Sh}\circ P_\rho\,(w\cdot\Delta)}\tag{10}$$
This equation shows that GRALIS is the *composition of two bounded linear operators*: the partition integration operator $P_\rho$ and the Shapley operator $\mathrm{Sh}$. The Riesz representation (Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4)) guarantees that the functional $f\mapsto\phi^{\textsc{GRALIS}}(f)$ is the unique continuous linear functional of this form on $L^2(\mathcal{Q},\mu)$.
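On a finite $\mathcal{Q}$, the composition $\mathrm{Sh}\circ P_\rho$ becomes a product of two matrices: the Shapley matrix ($n\times 2^n$) times the partition matrix ($2^n\times m$). This is a minimal sketch with hypothetical toy data and illustrative helper names. It also checks numerically that the $\ell^\infty\to\ell^\infty$ norm of $\mathrm{Sh}$ (its maximum absolute row sum) equals 2, and that the attributions satisfy the Shapley efficiency identity $\sum_i\phi_i=v_{\!G}(N)-v_{\!G}(\emptyset)$.

```python
from math import factorial
import numpy as np


def shapley_matrix(n):
    """Matrix of Sh: R^(2^n) -> R^n, with (Sh @ v)[i] the Shapley value of player i."""
    Sh = np.zeros((n, 2**n))
    for i in range(n):
        bi = 1 << i
        for S in range(2**n):
            if S & bi:
                continue
            s = bin(S).count("1")
            weight = factorial(s) * factorial(n - s - 1) / factorial(n)
            Sh[i, S | bi] += weight                 # + weight * v(S + {i})
            Sh[i, S] -= weight                      # - weight * v(S)
    return Sh


def partition_operator(mu, rho, n):
    """Matrix of P_rho on a finite Q with point masses mu and coalition labels rho."""
    P = np.zeros((2**n, len(mu)))
    P[rho, np.arange(len(mu))] = mu
    return P


rng = np.random.default_rng(6)
n, m = 3, 120
mu = rng.uniform(0.1, 1.0, size=m)
rho = rng.integers(0, 2**n, size=m)
wd = rng.normal(size=m)                             # hypothetical values of w * Delta

Sh = shapley_matrix(n)
P = partition_operator(mu, rho, n)
A = Sh @ P                                          # Sh o P_rho as a single n x m matrix
v_G = P @ wd
phi = A @ wd                                        # eq. (10) on the finite Q

sh_norm_inf = np.abs(Sh).sum(axis=1).max()          # l^inf -> l^inf operator norm of Sh
```

Because the whole pipeline reduces to one matrix `A`, attributions for several $w\cdot\Delta$ profiles can be computed at once as `A @ W`, with one profile per column of `W`.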
###### Corollary Y.9 (Uniqueness of the composition).
Under the conditions of Theorem [3.4](https://arxiv.org/html/2605.05480#S3.Thmtheorem4), for every $\rho_1\sim\rho_2$ (in the sense of Lemma [Y.4](https://arxiv.org/html/2605.05480#A1.Thmtheorem4)):
$$\mathrm{Sh}\circ P_{\rho_1}(w\cdot\Delta)=\sigma\bigl(\mathrm{Sh}\circ P_{\rho_2}(w\cdot\Delta)\bigr),$$
where $\sigma\in\mathfrak{S}_N$ is the permutation realizing $\rho_1\sim\rho_2$, acting on $\mathbb{R}^N$ by $(\sigma x)_i:=x_{\sigma(i)}$. In particular, the norm $\|\phi^{\textsc{GRALIS}}\|$ and all symmetric attribution quantities are *independent of the choice of* $\rho$ within its $\rho$-equivalence class.
###### Proof\.
Follows directly from Lemma[Y\.4](https://arxiv.org/html/2605.05480#A1.Thmtheorem4)\(ii\) and linearity ofSh\\mathrm\{Sh\}\. ∎
### Y.6 GRALIS as a Functor between Continuous Spaces and Cooperative Games
The results of this appendix point to a categorical structure, in which GRALIS acts as a functor from continuous attribution spaces to cooperative games.
## References
- [1] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. *ICCV*, 618–626.
- [2] Lundberg, S.M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. *NeurIPS 30*.
- [3] Ribeiro, M.T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. *KDD*, 1135–1144.
- [4] Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. *ICML*, 3319–3328.
- [5] Ancona, M., Ceolini, E., Öztireli, C., & Gross, M. (2018). Towards better understanding of gradient-based attribution methods for deep neural networks. *ICLR*.
- [6] Montavon, G., Lapuschkin, S., Binder, A., Müller, K.-R., & Samek, W. (2017). Explaining nonlinear classification decisions with deep Taylor decomposition. *Pattern Recognition*, 65, 211–222.
- [7] Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. *arXiv preprint arXiv:1312.6034*.
- [8] Chattopadhyay, A., Sarkar, A., Howlader, P., & Balasubramanian, V.N. (2018). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. *WACV*, 839–847.
- [9] Covert, I., & Lee, S.-I. (2021). Improving KernelSHAP: Practical Shapley value estimation using linear regression. *AISTATS*.
- [10] Lundstrom, D., Jain, T., & Koyejo, S. (2022). A rigorous study of integrated gradients method and extensions to internal neuron attributions. *Transactions on Machine Learning Research (TMLR)*.
- [11] Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K.T., Dähne, S., Erhan, D., & Kim, B. (2019). The (un)reliability of saliency methods. In *Explainability of AI*, Springer LNCS, pp. 267–280.
- [12] Hooker, S., Erhan, D., Kindermans, P.-J., & Kim, B. (2019). A benchmark for interpretability methods in deep neural networks. *NeurIPS 32*.
- [13] Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X. (2020). Score-CAM: Score-weighted visual explanations for convolutional neural networks. *CVPR Workshops*.
- [14] Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., & Li, B. (2020). Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs. *BMVC*.
- [15] Draelos, R.L., & Carin, L. (2021). Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. *arXiv preprint arXiv:2011.08891*.
- [16] Petsiuk, V., Das, A., & Saenko, K. (2018). RISE: Randomized input sampling for explanation of black-box models. *BMVC*.
- [17] Rong, Y., Leemann, T., Nguyen, T.-N., Zeitler, L., Jyothiprakash, P., Bhatt, U., Kasneci, E., & Kasneci, G. (2022). Evaluating the faithfulness of saliency-based explanations via the ROAD benchmark. *arXiv preprint arXiv:2202.00449*.
- [18] Bhatt, U., Weller, A., & Moura, J.M.F. (2020). Evaluating and aggregating feature-based model explanations. *IJCAI*, 3016–3022.
- [19] Grabisch, M., & Roubens, M. (1999). An axiomatic approach to the concept of interaction among players in cooperative games. *International Journal of Game Theory*, 28(4), 547–565.
- [20] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. *Annals of Mathematical Statistics*, 19(3), 293–325.
- [21] Efron, B., & Stein, C. (1981). The jackknife estimate of variance. *Annals of Statistics*, 9(3), 586–596.
- [22] Sobol’, I.M. (1993). Sensitivity estimates for nonlinear mathematical models. *Mathematical Modelling and Computational Experiments*, 1(4), 407–414.
- [23] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. *IEEE TPAMI*, 34(11), 2274–2282.
- [24] Spanhol, F.A., Oliveira, L.S., Petitjean, C., & Heutte, L. (2016). A dataset for breast cancer histological image classification. *IEEE Transactions on Biomedical Engineering*, 63(7), 1455–1462.
- [25] Riesz, F. (1909). Sur les opérations fonctionnelles linéaires. *Comptes Rendus de l’Académie des Sciences*, 149, 974–977.
- [26] Fanale, R., Martini, G., Sciarrone, F., & Caldelli, R. (2026). Explainable artificial intelligence for the analysis of histopathological images of breast cancer: Methods, interpretability and emerging directions. *Frontiers in Signal Processing*. doi:10.3389/frsip.2026.1795809
- [27] Fanale, R. et al. (2025). ExpiScore: A quantitative framework for evaluating XAI methods in medical imaging. Manuscript under review. Transparency note: this work shares authorship with the present paper; results involving ExpiScore should be interpreted with this in mind.
- [28] Fanale, R. (2026). GRALIS-LLM: Multimodal explainable AI for automated clinical report generation in breast cancer histology. Manuscript in preparation.