Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery

arXiv cs.LG Papers

Summary

This paper introduces the ⋆_G tensor algebra, a framework that makes equivariance an intrinsic algebraic property rather than an architectural constraint, providing provably-optimal symmetry-preserving tensor approximation, Kronecker factorization for composing multiple symmetries, and a Lean 4 formalization. Experiments on QM9 molecular geometry demonstrate data-driven discovery of physical symmetry selection rules.

arXiv:2605.20440v1 Announce Type: new Abstract: We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i)~an Eckart-Young optimality guarantee for the $\star_G$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time; (ii)~a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii)~a 600-line Lean~4 formalization of the $\star_G$ algebra. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner--Eckart selection rules of angular momentum from data alone, with no quantum mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l\!=\!1$ as the rank-2-trace decomposition $l\!=\!0 \oplus l\!=\!2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130{,}831 molecules), $\star_G$-SVD with ridge regression provides closed form predictions at $\sim50-90\times$ fewer parameters than parameter-matched MLPs. Algebraic equivariance thus complements architectural equivariance not as a faster-better-cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, per-irrep interpretability, and data-driven physical discovery.

Cached at: 05/21/26, 06:26 AM

# Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery
Source: [https://arxiv.org/html/2605.20440](https://arxiv.org/html/2605.20440)
Shashanka Ubaru (IBM Research), Dongsung Huh (Independent), Vasileios Kalantzis (IBM Research), Kenneth L. Clarkson (IBM Research), Misha Kilmer (Tufts University), Haim Avron (Tel-Aviv University), Lior Horesh (IBM Research)

###### Abstract

We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i) an Eckart–Young optimality guarantee for the $\star_G$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time where Tucker is only $\sqrt{d}$-quasi-optimal, CP is NP-hard, and tensor-train has no global optimality; (ii) a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii) a 600-line Lean 4 formalization of the $\star_G$ algebra with zero unresolved proof obligations. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner–Eckart selection rules of angular momentum from data alone, with no quantum-mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l=1$ as the rank-2-trace decomposition $l=0 \oplus l=2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130,831 molecules), $\star_G$-SVD with ridge regression provides closed-form predictions at $\sim$50–90$\times$ fewer parameters than parameter-matched MLPs and competitive accuracy in the parameter-efficient regime; a within-isomer audit shows that the apparent advantage of larger models on pooled $R^2$ is largely a size-prediction effect that vanishes once chemistry is controlled for. Algebraic equivariance thus complements architectural equivariance not as a faster–better–cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, machine-verified equivariance, per-irrep interpretability, and data-driven physical discovery.

## 1 Introduction

Much of the data encountered in science and engineering is inherently multidimensional: molecular configurations encode three-dimensional atomic positions, quantum states exist in exponentially large Hilbert spaces, and sensor arrays sample signals across spatial and temporal domains. Traditional machine learning methods typically vectorize such data, collapsing its natural tensor structure into flat feature vectors (Kolda and Bader, [2009](https://arxiv.org/html/2605.20440#bib.bib1); Sidiropoulos et al., [2017](https://arxiv.org/html/2605.20440#bib.bib2)). This is akin to unfolding an origami crane into a flat sheet: the operation is technically lossless, but the geometry that gives the object its meaning is destroyed. All subsequent processing must then recover, implicitly and at great computational cost, the structure that was discarded at the outset.

Symmetry compounds this problem. A molecule exists in three-dimensional space: it can be rotated without changing its properties (rotational symmetry), and its identical atoms can be indexed in any order without changing the molecule itself (permutation symmetry). These symmetries coexist and interact (Noether, [1918](https://arxiv.org/html/2605.20440#bib.bib3); Bronstein et al., [2021](https://arxiv.org/html/2605.20440#bib.bib4)), yet vectorization treats them as incidental features of the data rather than as fundamental structural constraints.

The dominant paradigm for incorporating symmetry is through Equivariant Neural Networks (ENNs) (Cohen and Welling, [2016](https://arxiv.org/html/2605.20440#bib.bib5); Thomas et al., [2018](https://arxiv.org/html/2605.20440#bib.bib6); Fuchs et al., [2020](https://arxiv.org/html/2605.20440#bib.bib7); Batzner et al., [2022](https://arxiv.org/html/2605.20440#bib.bib8)), which have achieved remarkable success in molecular property prediction (Schütt et al., [2017](https://arxiv.org/html/2605.20440#bib.bib9)) and protein structure (Jumper et al., [2021](https://arxiv.org/html/2605.20440#bib.bib10)). ENNs handle symmetry through architecture: to respect a rotation, one engineers rotation-equivariant layers; to respect a permutation, one engineers permutation-equivariant layers. But when multiple symmetries coexist, as they do in virtually all physical systems, the architectural approach faces a combinatorial wall. The blueprint must be redesigned from scratch for each new combination of symmetries, with no guarantee that the resulting representation is optimal in any rigorous sense. The physics is hard-coded into the network topology, and changing the physics means rebuilding the network.

In this article, we propose a different philosophy. Instead of constraining the architecture to fit the symmetry, we change the mathematics to fit the geometry of the data. Building on the $\star_M$ algebra of Kilmer and collaborators (Kilmer et al., [2021](https://arxiv.org/html/2605.20440#bib.bib11); Kernfeld et al., [2015](https://arxiv.org/html/2605.20440#bib.bib12)), we construct a tensor algebra, $\star_G$, where any finite group $G$ defines the multiplication rule. The resulting algebra inherits equivariance as an intrinsic property of multiplication, not as an architectural constraint. Composing multiple symmetries requires only specifying the direct product $G_1 \times G_2 \times \cdots \times G_d$; no redesign is needed. The $\star_G$ algebra admits an SVD with provable Eckart–Young optimality (Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1)), and, by the Peter–Weyl theorem (Serre, [1977](https://arxiv.org/html/2605.20440#bib.bib13); Peter and Weyl, [1927](https://arxiv.org/html/2605.20440#bib.bib14)), decomposes naturally into irreducible representation channels that can reveal the symmetry content of physical observables (Section [2.6](https://arxiv.org/html/2605.20440#S2.SS6)). The Wigner–Eckart theorem (1931) states that matrix elements of tensor operators between angular momentum eigenstates factorize into a geometric part (Clebsch–Gordan coefficient) and a reduced matrix element independent of magnetic quantum numbers, implying selection rules: an operator of rank $l$ couples only states whose angular momenta differ by at most $l$. We show empirically that these rules are recoverable from data alone via $\star_G$ decomposition.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x1.png)

Figure 1: The $\star_G$ tensor algebra: from optimal decomposition to symmetry discovery. (Top left, From molecules to algebra) Molecular data measured under all elements of a symmetry group $G$ form a structured tensor $\mathcal{A} \in \mathbb{R}^{n \times d \times |G|}$, preserving geometric information that is destroyed by vectorization into $A \in \mathbb{R}^{n \cdot d \times |G|}$. (Top right, The $\star_G$ product) Two tensors are multiplied via group convolution along the tube dimension, computed efficiently in the Fourier domain via the Peter–Weyl theorem: $F_G$ transforms each tensor to its block-diagonal spectral form, standard matrix products are applied per irreducible representation block, and $F_G^{-1}$ returns the result. The group $G$ can be any finite group (a single symmetry or a product $G_1 \times G_2 \times \cdots$) with no architectural changes required. (Bottom left, The $\star_G$-SVD) Every $\star_G$-tensor admits a factorization $\mathcal{A} = \mathcal{U} \star_G \mathcal{S} \star_G \mathcal{V}^H$. The rank-$k$ truncation $\mathcal{A}_k$ is provably optimal: $\|\mathcal{A} - \mathcal{A}_k\|_F \leq \|\mathcal{A} - \mathcal{B}\|_F$ for any rank-$k$ equivariant tensor $\mathcal{B}$ (Eckart–Young theorem for $\star_G$, Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1)). (Bottom right, From Eckart–Young to Wigner–Eckart) The same algebraic framework that delivers optimal low-rank compression also serves as a spectroscope for physical symmetry. Decomposing predictive power by irreducible representation (irrep) over the octahedral group recovers the Wigner–Eckart selection rules directly from molecular geometry data: scalar properties are dominated by the $l=0$ (A$_1$) channel, dipole vector components require the $l=1$ (T$_1$) channel, and polarizability is uniquely insensitive to $l=1$.

## 2 Results

### 2.1 The $\star_G$ Algebra

#### Theoretical Foundation

Let $G$ be a finite group of order $n$ with elements $\{g_1, g_2, \ldots, g_n\}$. We define the *convolution tensor* $\mathcal{T} \in \mathbb{R}^{n \times n \times n}$ by

$$\mathcal{T}(a,b,c) = \begin{cases} 1 & \text{if } ab = c \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

for all $a, b, c \in G$. This tensor encodes the complete multiplication table of $G$: its frontal slices are permutation matrices, and it satisfies an associativity identity inherited directly from group associativity. By the Peter–Weyl theorem (Serre, [1977](https://arxiv.org/html/2605.20440#bib.bib13); Peter and Weyl, [1927](https://arxiv.org/html/2605.20440#bib.bib14)), $\mathcal{T}$ admits a spectral decomposition:

$$\mathcal{T}(a,b,c) = \sum_{i,j,k} \mathcal{C}(i,j,k)\, F_G(a,i)\, F_G(b,j)\, F_G^{-1}(c,k), \tag{2}$$

where $F_G$ is a generalized group Fourier transform matrix assembled from the irreducible unitary representations (irreps) of $G$, and $\mathcal{C}$ is a sparse core tensor encoding the block-diagonal matrix multiplication structure of the irreps of $G$. For abelian groups, $F_G$ is a generalized Fourier matrix (for cyclic groups, it reduces to the standard DFT matrix) and $\mathcal{C}$ is diagonal; for non-abelian groups, $F_G$ is an invertible matrix. The precise definition of $F_G$ is provided in the Supplementary Information (SI Section 2). Crucially, equation ([2](https://arxiv.org/html/2605.20440#S2.E2)) means that group convolution in the original domain corresponds to *block-diagonal matrix multiplication* in the Fourier domain: one independent matrix product per irrep block, enabling highly efficient computation.
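For intuition, the cyclic case can be checked numerically. The following is a minimal sketch, assuming $G = \mathbb{Z}_n$ written additively and NumPy's unnormalized DFT convention; the helper name `convolution_tensor` is illustrative and not taken from the paper's code. It builds $\mathcal{T}$ from equation (1), confirms that each frontal slice is a permutation matrix, and checks that convolution through $\mathcal{T}$ agrees with a pointwise product in the DFT domain, which is what a diagonal core $\mathcal{C}$ amounts to.

```python
import numpy as np

def convolution_tensor(n):
    """Convolution tensor of the cyclic group Z_n: T[a, b, c] = 1 iff a + b = c (mod n)."""
    T = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            T[a, b, (a + b) % n] = 1.0
    return T

n = 6
T = convolution_tensor(n)

# Every frontal slice T[:, :, c] has exactly one 1 per row and per column.
slices_are_permutations = all(
    (T[:, :, c].sum(axis=0) == 1).all() and (T[:, :, c].sum(axis=1) == 1).all()
    for c in range(n)
)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)

# Group convolution written directly with the convolution tensor.
conv_direct = np.einsum("abc,a,b->c", T, x, y)

# The same convolution as a pointwise product in the DFT domain (diagonal core).
conv_fourier = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real

print(slices_are_permutations, np.allclose(conv_direct, conv_fourier))
```

For non-abelian groups the pointwise product would be replaced by one small matrix product per irrep block, which this cyclic sketch does not cover.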

We view order-$(2+d)$ tensors $\mathcal{A} \in \mathbb{R}^{\ell \times m \times n_1 \times \cdots \times n_d}$ as $\ell \times m$ matrices whose entries (which are order-$d$ tensors) lie in the convolutional ring $\mathbb{K}_G$ for $G = G_1 \times \cdots \times G_d$ and $n_i = |G_i|$. The *$\star_G$ product* of $\mathcal{A}$ with $\mathcal{B} \in \mathbb{R}^{m \times p \times n_1 \times \cdots \times n_d}$ is defined via group convolution along the group dimensions:

$$(\mathcal{A} \star_G \mathcal{B})_{ij}(c_1, \ldots, c_d) = \sum_{k} \sum_{(a_1, \ldots, a_d) \in G} \mathcal{A}_{ik}(a_1, \ldots, a_d)\, \mathcal{B}_{kj}(a_1^{-1} c_1, \ldots, a_d^{-1} c_d). \tag{3}$$

This product defines a novel tensor algebra (Kernfeld et al., [2015](https://arxiv.org/html/2605.20440#bib.bib12)) that generalizes classical matrix algebra while embedding the symmetry group $G$ directly into the multiplicative structure. The resulting algebraic system supports a full suite of matrix-mimetic operations (inverses, transposes, norms, and decompositions), all inheriting equivariance by construction. Equivariance is thus a property of the algebra, not an imposed constraint. The $\star_G$ product can be computed efficiently by (i) applying $F_G$ to each tensor along its group dimension, (ii) performing standard matrix products at each of the $|\hat{G}|$ Fourier irreps independently in parallel, and (iii) applying $F_G^{-1}$ to recover the result. For abelian groups $G$, the total cost is $O(n \ell m p + n \log n)$ including the Fourier transforms, matching the complexity of a single matrix product up to logarithmic factors.
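The three-step recipe above can be sketched for a single cyclic factor. This is an illustrative example under the assumption $G = \mathbb{Z}_n$ with the FFT standing in for $F_G$; the function names `star_product_direct` and `star_product_fft` are hypothetical. It compares equation (3) evaluated literally against the Fourier-domain route, and checks the equivariance property that shifting one operand along the group axis shifts the product by the same amount.

```python
import numpy as np

def star_product_direct(A, B):
    """Equation (3) for G = Z_n. A has shape (l, m, n), B has shape (m, p, n)."""
    l, m, n = A.shape
    p = B.shape[1]
    C = np.zeros((l, p, n))
    for c in range(n):
        for a in range(n):
            # In additive notation, a^{-1} c is (c - a) mod n.
            C[:, :, c] += A[:, :, a] @ B[:, :, (c - a) % n]
    return C

def star_product_fft(A, B):
    """Same product: transform, multiply matrices per frequency, transform back."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ikf,kjf->ijf", A_hat, B_hat)
    return np.fft.ifft(C_hat, axis=2).real

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 12))
B = rng.standard_normal((4, 2, 12))

C_direct = star_product_direct(A, B)
C_fft = star_product_fft(A, B)

# Equivariance: acting on B with a group element acts on the product the same way.
shift = 5
C_shifted_input = star_product_fft(A, np.roll(B, shift, axis=2))
equivariant = np.allclose(C_shifted_input, np.roll(C_fft, shift, axis=2))

print(np.allclose(C_direct, C_fft), equivariant)
```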

The *$\star_G$-Hermitian transpose* $\mathcal{A}^H \in \mathbb{R}^{m \times \ell \times n_1 \times \cdots \times n_d}$ is defined entry-wise by

$$(\mathcal{A}^H)_{ij}(g_1, \ldots, g_d) = \overline{\mathcal{A}_{ji}(g_1^{-1}, \ldots, g_d^{-1})}, \tag{4}$$

where the overline denotes complex conjugation (trivial for real-valued tensors). Equivalently, in the Fourier domain, $\widehat{\mathcal{A}^H}(:,:,\rho) = \hat{\mathcal{A}}(:,:,\rho)^H$ for every irrep $\rho$ (see SI Section 3 for the precise definition of $\hat{\mathcal{A}}(:,:,\rho)$), so the $\star_G$-transpose maps to the ordinary matrix Hermitian transpose at each irrep block. This definition makes $\star_G$-unitarity and the SVD factor conditions below fully analogous to their matrix counterparts.
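The Fourier-domain characterization of the transpose can be verified in the cyclic case. The sketch below assumes $G = \mathbb{Z}_n$, where the inverse of $g$ is $-g \bmod n$ and each irrep block is a single DFT frequency; `star_hermitian_transpose` is an illustrative name, not the paper's API.

```python
import numpy as np

def star_hermitian_transpose(A):
    """Equation (4) for G = Z_n: swap matrix indices, invert the group element, conjugate."""
    n = A.shape[2]
    inverse = (-np.arange(n)) % n  # g -> g^{-1} in additive notation
    return np.conj(np.transpose(A[:, :, inverse], (1, 0, 2)))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 8))
A_H = star_hermitian_transpose(A)

A_hat = np.fft.fft(A, axis=2)
A_H_hat = np.fft.fft(A_H, axis=2)

# At every frequency, the block of A^H is the conjugate transpose of the block of A.
blocks_match = all(
    np.allclose(A_H_hat[:, :, f], A_hat[:, :, f].conj().T)
    for f in range(A.shape[2])
)
print(blocks_match)
```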

#### The $\star_G$-SVD and Optimality

Every tensor $\mathcal{A}$ in the $\star_G$ algebra admits a singular value decomposition $\mathcal{A} = \mathcal{U} \star_G \mathcal{S} \star_G \mathcal{V}^H$, where $\mathcal{U}$ and $\mathcal{V}$ are $\star_G$-unitary (satisfying $\mathcal{U}^H \star_G \mathcal{U} = \mathcal{I}$) and $\mathcal{S}$ is f-diagonal (its frontal slices are diagonal matrices) with non-negative real entries, the *singular tubes* $\mathbf{s}_i$. This decomposition is computed exactly by applying the group Fourier transform along the group dimension, performing standard matrix SVDs independently at each Fourier irrep, and applying the inverse group Fourier transform, a procedure that is both exact and computationally efficient.

###### Theorem 2.1 (Eckart–Young for $\star_G$).

The rank-$k$ truncation $\mathcal{A}_k$ minimizes $\|\mathcal{A} - \mathcal{B}\|_F^2$ over all tensors $\mathcal{B}$ of $\star_G$-rank at most $k$, with $\|\mathcal{A} - \mathcal{A}_k\|_F^2 = \sum_{i=k+1}^{r} \|\mathbf{s}_i\|_F^2$.

Full proof is provided in the Supplementary Information (SI Section 5), and a machine-verified Lean 4 formalization is available in the code repository (see Code Availability). The $\star_G$-rank of $\mathcal{A}$ is the number of non-zero singular tubes in the $\star_G$-SVD $\mathcal{A} = \mathcal{U} \star_G \mathcal{S} \star_G \mathcal{V}^H$.

This result is a direct analogue of the classical matrix Eckart–Young theorem (Eckart and Young, [1936](https://arxiv.org/html/2605.20440#bib.bib15)): just as the rank-$k$ matrix SVD provides the best rank-$k$ approximation in Frobenius norm, the $\star_G$-SVD provides the best $\star_G$-rank-$k$ approximation among all group-equivariant tensors of that rank. This is the first such optimality guarantee for symmetry-preserving tensor approximation; by contrast, Tucker/HOSVD gives only quasi-optimal bounds with a $\sqrt{d}$ factor (de Silva and Lim, [2008](https://arxiv.org/html/2605.20440#bib.bib16)), CP decomposition is NP-hard to compute optimally, and tensor-train has no global optimality guarantee. The error has a closed-form expression in terms of the discarded singular tube norms, enabling principled rank selection with full control over approximation quality. In practice, $\star_G$-rank-$k$ features derived from the leading singular tubes carry the maximal group-equivariant information about the data for any given parameter budget.
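The closed-form error in Theorem 2.1 can be checked numerically in the cyclic case. The following is a sketch under stated assumptions: $G = \mathbb{Z}_n$, the FFT plays the role of $F_G$, and `star_svd_truncate` is a hypothetical helper rather than the authors' implementation. Because NumPy's FFT is unnormalized, Parseval's identity contributes a factor $1/n$ when the discarded Fourier-domain singular values are converted into squared singular tube norms.

```python
import numpy as np

def star_svd_truncate(A, k):
    """Rank-k truncation of a real (l, m, n) tensor over Z_n via per-frequency SVDs.

    Returns the truncated tensor and the Fourier-domain singular values,
    stored as an array of shape (min(l, m), n).
    """
    n = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    Ak_hat = np.zeros_like(A_hat)
    sigma = np.zeros((min(A.shape[0], A.shape[1]), n))
    for f in range(n):
        U, s, Vh = np.linalg.svd(A_hat[:, :, f], full_matrices=False)
        sigma[:, f] = s
        # Keep the k leading singular triplets at this frequency.
        Ak_hat[:, :, f] = (U[:, :k] * s[:k]) @ Vh[:k, :]
    return np.fft.ifft(Ak_hat, axis=2).real, sigma

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5, 12))
k = 2
A_k, sigma = star_svd_truncate(A, k)

# Squared Frobenius error of the truncation ...
error = np.sum((A - A_k) ** 2)
# ... equals the energy in the discarded singular tubes (1/n from Parseval).
discarded = np.sum(sigma[k:, :] ** 2) / A.shape[2]

print(np.isclose(error, discarded))
```

The check confirms the error formula only; it does not by itself demonstrate optimality over all competitors, which is the content of the proof referenced above.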

#### Composing Multiple Symmetries

###### Theorem 2.2 (Product Groups).

For $G = G_1 \times \cdots \times G_d$, the convolution tensor factorizes as $\mathcal{T}_G = \mathcal{T}_{G_1} \otimes \cdots \otimes \mathcal{T}_{G_d}$, and the generalized Fourier matrix is $F_G = F_{G_1} \otimes \cdots \otimes F_{G_d}$.

Full proof is provided in the Supplementary Information \(SI Section 6\)\.

This Kronecker structure is the algebraic reason why multiple symmetries compose without architectural redesign. The factorization of $\mathcal{T}_G$ implies that the irreps of a product group are exactly the tensor products of the factors' irreps, so the Fourier-domain block-diagonal structure is the Kronecker product of the individual block-diagonal structures. Concretely, for $G = \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$, the group Fourier transform $F_G = \mathrm{DFT}_{n_1} \otimes \mathrm{DFT}_{n_2}$ computes a 2D DFT, resolving coupled frequencies $(f_1, f_2)$ that are entirely invisible to either factor group alone. Adding a new symmetry $G_{d+1}$ to an existing $\star_G$ model requires only replacing $F_G$ with $F_G \otimes F_{G_{d+1}}$; no layers are redesigned and no weights are reinitialized.
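The concrete case $\mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$ is easy to confirm with NumPy. This is a minimal sketch, assuming the unnormalized DFT convention and row-major flattening of the two group axes; `dft_matrix` is an illustrative helper. It shows that the Kronecker product of two DFT matrices applied to a flattened array matches the 2D FFT.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized DFT matrix, matching the convention of np.fft.fft."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

n1, n2 = 6, 4
F_G = np.kron(dft_matrix(n1), dft_matrix(n2))  # F_{Z_6} (x) F_{Z_4}

rng = np.random.default_rng(0)
x = rng.standard_normal((n1, n2))

# Row-major flattening places element (a, b) at index a * n2 + b,
# which is the ordering the Kronecker product expects.
via_kron = F_G @ x.reshape(-1)
via_fft2 = np.fft.fft2(x).reshape(-1)

print(np.allclose(via_kron, via_fft2))
```

In practice one would apply the factor transforms axis by axis rather than forming the dense Kronecker matrix; the dense form is used here only to make the identity explicit.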

### 2.2 Experiment 1: Synthetic Validation

We first validated the $\star_G$ framework on controlled synthetic data to confirm that the algebraic guarantees translate into empirical performance. We generated 1,000 synthetic molecules with exact $\mathbb{Z}_{12}$ rotational symmetry and compared $\star_G$-SVD with ridge regression against four baselines (Augmented MLP, Neural $\star_G$, Standard MLP, and Invariant MLP) across three metrics: predictive accuracy ($R^2$), rotational invariance (variance of predictions under unseen rotations), and parameter efficiency. Cyclic structure was verified at machine precision ($4.2 \times 10^{-16}$). The target property $\mathbf{y}$ combines mean interatomic distance, distance variance, and atomic number contributions, representing a rotationally invariant scalar analogous to size-dependent molecular properties such as polarizability.

Results are summarized in Table [1](https://arxiv.org/html/2605.20440#S2.T1) and Figure [2](https://arxiv.org/html/2605.20440#S2.F2). The $\star_G$-SVD achieves perfect prediction ($R^2 = 1.000 \pm 0.000$) and exact invariance (rotation variance $5.8 \times 10^{-31}$) using only 101 parameters, compared to 5,249–14,465 for the neural baselines. The Standard MLP achieves $R^2 = 0.377$ with rotation variance $0.14$, confirming that without explicit symmetry handling the model neither learns well nor respects the symmetry. The Augmented MLP ($R^2 = 0.998$) achieves near-perfect accuracy but retains a residual rotation variance ($3.7 \times 10^{-5}$) roughly 26 orders of magnitude larger than that of $\star_G$-SVD, illustrating that augmentation approximates but does not algebraically guarantee invariance. Figure [2](https://arxiv.org/html/2605.20440#S2.F2)b shows a gap of up to 30 orders of magnitude in rotation variance between $\star_G$-SVD and the non-algebraic methods, a qualitative difference rather than merely a quantitative improvement. The predicted-versus-true scatter plot (Fig. [3](https://arxiv.org/html/2605.20440#S2.F3)) makes this vivid: $\star_G$-SVD produces a perfect diagonal, while the Standard MLP produces near-random scatter.
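To illustrate why an algebraically invariant pipeline can reach rotation variance at floating-point noise, here is a small sketch. It is not the paper's featurization: it assumes hypothetical features on $\mathbb{Z}_{12}$, uses the power spectrum along the group axis as a stand-in invariant feature map, and fits closed-form ridge regression. All names and the regularization value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, d, n = 200, 5, 12
X = rng.standard_normal((n_samples, d, n))  # hypothetical features indexed by Z_12

def invariant_features(X):
    """Power spectrum along the group axis; unchanged by cyclic shifts of that axis."""
    return (np.abs(np.fft.rfft(X, axis=-1)) ** 2).reshape(len(X), -1)

def ridge_fit(Phi, y, lam=1e-3):
    """Closed-form ridge regression: w = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

Phi = invariant_features(X)
y = Phi @ rng.standard_normal(Phi.shape[1])  # synthetic rotation-invariant target
w = ridge_fit(Phi, y)

# Predict on every cyclic rotation of the inputs and measure the spread.
preds = np.stack(
    [invariant_features(np.roll(X, s, axis=-1)) @ w for s in range(n)]
)
rotation_variance = preds.var(axis=0).max()
print(rotation_variance)
```

The variance printed here is nonzero only through rounding error, whereas a model trained with augmentation has no such structural guarantee.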

Table 1: Synthetic validation ($\mathbb{Z}_{12}$, 1,000 molecules, 3 seeds).

![Refer to caption](https://arxiv.org/html/2605.20440v1/x2.png)

Figure 2: Synthetic validation ($\mathbb{Z}_{12}$). (a) Test $R^2$. (b) Rotation variance (log scale): a gap of up to 30 orders of magnitude between $\star_G$-SVD and the non-algebraic methods. (c) Parameter efficiency (Pareto frontier): $\star_G$-SVD dominates all baselines on both axes simultaneously. (d) Multi-metric normalized summary scores.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x3.png)

Figure 3: Predicted vs. true (synthetic). (a) $\star_G$-SVD: perfect diagonal at $R^2 = 1.000$. (b) Standard MLP: near-random scatter at $R^2 = -0.084$, illustrating the cost of ignoring symmetry.

### 2.3 Experiment 2: QM9 Molecular Property Prediction

Having established correctness on synthetic data, we evaluated the $\star_G$ framework on the QM9 benchmark (Ramakrishnan et al., [2014](https://arxiv.org/html/2605.20440#bib.bib17)), a widely used dataset of approximately 134,000 small organic molecules composed of C, H, O, N, and F, with up to nine heavy (non-hydrogen) atoms, each annotated with 12 quantum chemical properties computed at the DFT level of theory. QM9 has become a standard testbed for machine learning methods in computational chemistry, particularly for evaluating how well models leverage molecular symmetry: molecular properties are invariant under 3D rotation and atomic index permutation, and models that exploit these symmetries empirically generalize better with fewer samples (Batzner et al., [2022](https://arxiv.org/html/2605.20440#bib.bib8); Schütt et al., [2017](https://arxiv.org/html/2605.20440#bib.bib9)). We focus on predicting the HOMO–LUMO gap (the difference in energy between the highest occupied and lowest unoccupied molecular orbitals), a property of direct relevance to photovoltaics, drug design, and materials discovery.

We used 1,000 molecules with $\mathbb{Z}_{12}$ rotational featurization. Results are presented in Table [2](https://arxiv.org/html/2605.20440#S2.T2) and Figure [4](https://arxiv.org/html/2605.20440#S2.F4). The $\star_G$-SVD with ridge regression achieves the best result on this real-world task, reaching $R^2 = 0.556 \pm 0.047$ and RMSE $= 0.035$ Ha with just 107 parameters, and is one of only two methods with positive $R^2$. All pure neural baselines collapse: the Standard MLP reaches $R^2 = -10.99 \pm 5.90$, the Invariant MLP $R^2 = -3.85 \pm 1.58$, and the Neural $\star_G$ (which also uses algebraic equivariance) reaches $R^2 = -5.15 \pm 3.10$. The Augmented MLP is the only other method with positive $R^2$ ($0.384$), but it requires $49\times$ more parameters and leaves substantially larger residuals. The $\star_G$-SVD's rotation variance ($3.2 \times 10^{-31}$) is at floating-point noise, while all MLPs show residual rotational sensitivity. The learning curves (Figure [5](https://arxiv.org/html/2605.20440#S2.F5)) show that this advantage persists across training-set sizes: $\star_G$-SVD maintains positive $R^2$ from as few as 100 molecules, while neural baselines overfit catastrophically at small sample sizes and begin to approach competitive performance only near 1,000 samples. This data efficiency follows directly from optimal algebraic compression: 107 well-chosen equivariant features capture more structural information than thousands of learned neural parameters. The results indicate that in data-scarce scientific settings, algebraic structure is a stronger inductive bias than architectural expressivity.

Table 2: QM9 HOMO–LUMO gap ($\mathbb{Z}_{12}$, 1,000 molecules, 3 seeds).

![Refer to caption](https://arxiv.org/html/2605.20440v1/x4.png)

Figure 4: QM9 HOMO–LUMO gap (1,000 real molecules). (a) Test $R^2$: $\star_G$-SVD and Augmented MLP are the only methods with positive $R^2$; all pure neural baselines overfit catastrophically. (b) RMSE (Hartree): $\star_G$-SVD achieves the lowest error at 0.035 Ha. (c) Rotation variance (log scale): $\star_G$-SVD achieves exact invariance at floating-point noise.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x5.png)

Figure 5: Learning curves on QM9. $\star_G$-SVD + Ridge maintains positive $R^2$ from as few as 100 molecules. Neural baselines overfit at small sample sizes (negative $R^2$) and require substantially more data to approach competitive performance. Bands: $\pm 1$ s.d. over 3 seeds.

### 2.4 Experiment 3: Product Group Composition

To test whether the algebraic composition theorem (Theorem [2.2](https://arxiv.org/html/2605.20440#S2.Thmtheorem2)) translates into empirical performance, we constructed a task with two independent, commuting symmetries: $G_1 = \mathbb{Z}_6$ (discrete rotations in the $xy$-plane) and $G_2 = \mathbb{Z}_4$ (periodic translations along $z$). The target quantity was designed to be dominated by coupled 2D Fourier frequencies $(f_1, f_2)$ that require both symmetries simultaneously; neither factor alone can resolve these modes. This setting models the physical situation in which a molecular property depends on two structural degrees of freedom simultaneously, as is common in materials with layered or helical symmetry.

Results are shown in Table [3](https://arxiv.org/html/2605.20440#S2.T3) and Figure [6](https://arxiv.org/html/2605.20440#S2.F6). The $\star_G$ model over $G_1 \times G_2$ achieves perfect prediction ($R^2 = 1.000 \pm 0.000$) with just 186 parameters. The single-factor models capture at most 23% of the variance: $G_2$ alone ($R^2 = 0.229$) and $G_1$ alone ($R^2 = 0.155$). The 2D frequency map (Figure [6](https://arxiv.org/html/2605.20440#S2.F6)b) visualizes why: the coupled frequency cells (highlighted in red) carry 87% of the target energy and are resolved only by $F_{\mathbb{Z}_6} \otimes F_{\mathbb{Z}_4}$; the individual transforms see only the axis-aligned marginals. The cyclic approximation $\mathbb{Z}_{24}$ (treating the same 24-dimensional group dimension as a single cyclic group) reaches $R^2 = 0.986$, close but not exact, because it cannot distinguish the tensor-product structure of the irreps. The ablation cascade (Figure [7](https://arxiv.org/html/2605.20440#S2.F7)) confirms a strict performance hierarchy: product group $>$ wrong cyclic $>$ single factor $>$ no symmetry, each step removing algebraic information and reducing performance monotonically. These results validate Theorem [2.2](https://arxiv.org/html/2605.20440#S2.Thmtheorem2) empirically and demonstrate that exact product group specification is both necessary and sufficient for exact recovery.
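The role of coupled frequency cells can be seen in a toy signal. The sketch below is an assumption-laden illustration, not the experiment's data: it places a single hypothetical mode with $(f_1, f_2) = (1, 1)$ on $\mathbb{Z}_6 \times \mathbb{Z}_4$ and shows that all of its spectral energy lies off the axis-aligned cells, while both marginals vanish identically.

```python
import numpy as np

n1, n2 = 6, 4
a = np.arange(n1)[:, None]
b = np.arange(n2)[None, :]

# Hypothetical target: one coupled mode with frequencies (f1, f2) = (1, 1).
x = np.cos(2 * np.pi * (a / n1 + b / n2))

# Energy per cell of the 2D frequency map.
energy = np.abs(np.fft.fft2(x)) ** 2
f1 = np.arange(n1)[:, None]
f2 = np.arange(n2)[None, :]
axis_aligned = (f1 == 0) | (f2 == 0)  # cells visible to a single factor

coupled_fraction = energy[~axis_aligned].sum() / energy.sum()

# Summing out either factor leaves nothing for a one-factor transform to see.
marginal_over_g2 = x.sum(axis=1)
marginal_over_g1 = x.sum(axis=0)

print(coupled_fraction)
```

In the paper's task the coupled cells carry 87% of the energy rather than all of it, so the single-factor models retain some predictive power.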

Table 3: Product group $\mathbb{Z}_6 \times \mathbb{Z}_4$ (1,000 molecules, 3 seeds).

![Refer to caption](https://arxiv.org/html/2605.20440v1/x6.png)

Figure 6: Product group $\mathbb{Z}_6 \times \mathbb{Z}_4$: compositional advantage. (a) Eight-method comparison. The product group achieves $R^2 = 1.000$; each factor alone captures $\leq 23$%. (b) 2D frequency map: coupled cells (red borders) carry 87% of target energy and are resolved only by $F_{\mathbb{Z}_6} \otimes F_{\mathbb{Z}_4}$.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x7.png)

Figure 7: Ablation cascade. Progressively removing symmetry components reveals a strict performance hierarchy: product group $\to$ wrong cyclic approximation $\to$ single factor $\to$ no symmetry.

### 2.5 Experiment 4: Symmetry and Factorization Discovery

Beyond prediction, the⋆G\\star\_\{G\}framework enables a qualitatively new capability: data\-driven discovery of the symmetry group that best describes a dataset, without any prior knowledge of its structure\. This is a potentially transformative tool for scientific inquiry: given observations of a physical system, one can scan a library of candidate groups and identify the one that maximally captures the data’s structure, effectively reading off the symmetry of nature from empirical measurements alone\.

Group discovery on QM9 (Figure [8](https://arxiv.org/html/2605.20440#S2.F8)a): scanning eight candidate groups of small order, $\mathbb{Z}_4$ achieves the highest combined prediction score ($R^2 = 0.590$), consistent with the $C_4$ rotational symmetry of many small organic molecules in the dataset. This result was obtained purely from molecular geometry and property labels, with no crystallographic or group-theoretic input.

Factorization discovery for $n = 24$ (Figure [8](https://arxiv.org/html/2605.20440#S2.F8)b): given that data lives on a 24-element group, the algorithm scans all factorizations $G_1 \times G_2$ of order 24 and identifies $\mathbb{Z}_3 \times \mathbb{Z}_8$ as optimal ($R^2 = 1.000$), outperforming the naive cyclic $\mathbb{Z}_{24}$ ($R^2 = 0.985$) and all other factorizations. This demonstrates that the $\star_G$ framework can recover the latent product structure of a symmetry group from data, a capability unavailable in existing equivariant network frameworks where the group is always specified manually.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x8.png)

Figure 8: Symmetry discovery. (a) QM9 group discovery: scanning candidate groups reveals $\mathbb{Z}_{4}$ as the best fit, consistent with $C_{4}$ molecular symmetry. The combined score axis reflects both predictive accuracy and invariance quality. (b) Factorization discovery for order 24: $\mathbb{Z}_{3}\times\mathbb{Z}_{8}$ is identified as the optimal decomposition ($R^{2}=1.000$), surpassing cyclic $\mathbb{Z}_{24}$ ($R^{2}=0.985$) and revealing the latent product structure of the symmetry group from data alone.
### 2.6 Experiment 5: Empirical Recovery of Wigner–Eckart Selection Rules

The preceding experiments validated the $\star_{G}$ framework as a prediction tool. We now ask a deeper question: can the algebraic decomposition *discover* physical symmetry structure that was not provided as input? The Wigner–Eckart theorem states that scalar observables ($l\!=\!0$) couple only to the trivial representation, vector observables ($l\!=\!1$) require the fundamental representation, and rank-2 tensor observables ($l\!=\!2$) require the $l\!=\!0$ and $l\!=\!2$ channels. We test whether the $\star_{G}$ framework recovers these selection rules from molecular geometry data alone.

#### Setup

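We replace the cyclic group $\mathbb{Z}_{24}$ with the chiral octahedral group $O$ (order 24, a subgroup of SO(3)) whose five irreducible representations (A$_1$, A$_2$, E, T$_1$, T$_2$) correspond directly to angular momentum channels; the assignment of $l$ values is given in Section [4.2](https://arxiv.org/html/2605.20440#S4.SS2).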

For 2,000 QM9 molecules, we compute features under all 24 octahedral rotations (face, vertex, and edge rotations of the cube), then decompose the Fourier power into per-irrep contributions using the actual representation matrices. We predict not only scalar properties (HOMO–LUMO gap, HOMO, LUMO, ZPVE) and the dipole magnitude $|\boldsymbol{\mu}|$, but also the dipole vector components $\mu_{x},\mu_{y},\mu_{z}$ computed from Mulliken partial charges. These vector components transform under the $l\!=\!1$ (T$_1$) irrep and cannot be predicted from rotation-invariant features.

#### Results

Table [4](https://arxiv.org/html/2605.20440#S2.T4) shows the predictive power ($R^{2}$) of each irrep's features alone for each property. Three findings emerge that are consistent with the Wigner–Eckart theorem and are summarized in Figures [9](https://arxiv.org/html/2605.20440#S2.F9) and [10](https://arxiv.org/html/2605.20440#S2.F10).

(i) Dipole components require equivariant information. The A$_1$ (invariant) irrep predicts scalar properties well ($R^{2}=0.64$ mean) but gives essentially zero for dipole components ($R^{2}=0.04$ mean), a separation of $+0.59$. Dipole vector components *cannot* be predicted from rotation-invariant features alone: they require directional ($l\geq 1$) information, as the Wigner–Eckart theorem demands for a rank-1 tensor operator (Figure [9](https://arxiv.org/html/2605.20440#S2.F9)a).

(ii) The T$_1$/A$_1$ ratio separates vector from scalar properties. For scalar properties, the T$_1$ channel provides about half the predictive power of A$_1$ (ratio $\sim$0.54). For dipole components, T$_1$ provides roughly 2.8 times the predictive power of A$_1$ (ratio $\sim$2.78). This $5\times$ shift in the ratio indicates that dipole components draw their predictive signal primarily from the $l\!=\!1$ angular channel, as the Wigner–Eckart theorem predicts for vector observables (Figure [9](https://arxiv.org/html/2605.20440#S2.F9)b).

(iii) Polarizability is uniquely insensitive to $l\!=\!1$. The isotropic polarizability (trace of a rank-2 tensor) has T$_1$ $R^{2}=0.017$, essentially zero, while every other property has T$_1$ $R^{2}>0.08$. This is consistent with the representation-theoretic decomposition of symmetric rank-2 tensors, which contain $l\!=\!0$ and $l\!=\!2$ components but no $l\!=\!1$ component. The heatmap in Figure [10](https://arxiv.org/html/2605.20440#S2.F10) makes the block structure visible: rank-0 properties (above the separator line) show strong A$_1$ and weak T$_1$; rank-1 dipole components show the reverse; the rank-2 polarizability is the outlier with near-zero T$_1$.

These patterns were discovered entirely from molecular geometry and Mulliken charges, without any quantum-mechanical theory as input. The $\star_{G}$ framework, by decomposing predictions into irreducible representation channels of a physically meaningful group, serves as a *spectroscope for symmetry*: it reveals which angular momentum channels carry information about which physical observables.

Table 4: Per-irrep predictive power ($R^{2}$) on QM9 properties. The T$_1$/A$_1$ ratio reveals which properties depend on directional ($l\!=\!1$) information.

| Property | Rank | A$_1$ | A$_2$ | E | T$_1$ | T$_2$ | T$_1$/A$_1$ |
|---|---|---|---|---|---|---|---|
| HOMO–LUMO gap | 0 | 0.986 | 0.986 | 0.438 | 0.618 | 0.338 | 0.63 |
| HOMO energy | 0 | 0.237 | 0.237 | 0.079 | 0.145 | $-$0.01 | 0.61 |
| LUMO energy | 0 | 0.126 | 0.126 | 0.039 | 0.080 | 0.000 | 0.64 |
| ZPVE | 0 | 0.985 | 0.985 | 0.166 | 0.378 | $-$0.02 | 0.38 |
| $\lvert\boldsymbol{\mu}\rvert$ (magnitude) | 0 | 0.853 | 0.853 | 0.171 | 0.395 | 0.064 | 0.46 |
| $\mu_{x}$ (component) | 1 | 0.010 | 0.010 | 0.008 | 0.043 | 0.003 | 4.47 |
| $\mu_{y}$ (component) | 1 | 0.124 | 0.124 | 0.029 | 0.191 | $-$0.00 | 1.54 |
| $\mu_{z}$ (component) | 1 | $-$0.00 | $-$0.00 | $-$0.00 | $-$0.01 | 0.000 | N/A |
| Polarizability | 2 | 0.313 | 0.313 | 0.030 | 0.017 | 0.001 | 0.05 |

![Refer to caption](https://arxiv.org/html/2605.20440v1/x9.png)

Figure 9: Empirical recovery of Wigner–Eckart selection rules. Per-irrep predictive power ($R^{2}$) for each quantum property. Scalar properties (blue shades) are dominated by A$_1$ ($l\!=\!0$). Dipole magnitude (orange) also lives primarily in A$_1$ because $|\boldsymbol{\mu}|$ is a scalar. Dipole vector components (red shades) show a qualitatively different pattern: A$_1$ gives nearly zero while T$_1$ ($l\!=\!1$) is the dominant channel, consistent with the Wigner–Eckart selection rule for a rank-1 tensor operator. Polarizability (purple) has the lowest T$_1$ of any property, consistent with its rank-2 decomposition containing only $l\!=\!0$ and $l\!=\!2$.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x10.png)

Figure 10: Irrep decomposition heatmap. $R^{2}$ for each (property, irrep) combination, sorted by tensor rank with group separators. Rank-0 properties (above line) show strong A$_1$ and weak T$_1$; rank-1 dipole components (below line) show the reverse; the rank-2 polarizability is the outlier with near-zero T$_1$.

### 2.7 Comparison with Equivariant Neural Networks at Scale

We now report a head-to-head comparison of $\star_{G}$ against representative equivariant neural networks (ENNs) on the full QM9 benchmark (130,831 characterized molecules, 91,581 / 19,624 / 19,626 train / validation / test split, 3 seeds). The aim of this section is to characterize *where* the algebraic framework dominates and *why* the pooled-$R^{2}$ scoreboard on QM9 is a less informative metric than it appears. ENNs do achieve higher pooled $R^{2}$, and we report the gap explicitly.

##### Concession: ENNs achieve higher pooled $R^{2}$, and so do generic MLPs at modest parameter cost.

Table [5](https://arxiv.org/html/2605.20440#S2.T5) reports test $R^{2}$ on the four scalar QM9 targets across all molecule-level summary methods plus three ENN baselines. MACE, a state-of-the-art equivariant message-passing network, reaches $R^{2}=0.985$ on the HOMO–LUMO gap target with $945{,}168$ parameters; $\star_{G}$-SVD + Ridge reaches $R^{2}=0.482$ with 144 parameters. Standard MLP and Invariant MLP, given the same molecule-level feature tensor as $\star_{G}$, reach $R^{2}=0.513$ and $0.529$ respectively at $3{,}073$ to $5{,}761$ parameters: comparable to $\star_{G}$-SVD + Ridge on pooled $R^{2}$ at $20$–$40\times$ more parameters. The interesting question is what these differences actually mean, addressed next.

##### The pooled-$R^{2}$ gap is largely a size-prediction race.

On QM9, the HOMO–LUMO gap depends strongly on molecular size: a model that captures gross size and composition features, without any chemistry-aware bond topology, already explains the bulk of the variance. We measure the *within-isomer* $R^{2}$ (restricted to formulas with $\geq 5$ constitutional isomers, sample-weighted across formulas) for every method on the same test predictions. The empirical finding (Table [6](https://arxiv.org/html/2605.20440#S2.T6)) is structural and clean: on QM9 HOMO–LUMO gap within isomer groups, *every* method that consumes the molecule-level $(n_{\mathrm{feat}},|G|)$ summary falls to within-isomer $R^{2}\approx 0$ or below ($\star_{G}$ ridge: $+0.01$; $\star_{G}$ neural: $+0.10$; MLP standard: $+0.06$; MLP invariant: $+0.08$; MLP augmented: $-2.03$). SchNet ($0.991$) and MACE ($0.968$) consume the full atomic graph, which carries the bond topology that distinguishes constitutional isomers. *The within-isomer $R^{2}$ gap therefore separates methods by input information content, not by algorithmic merit.* Among methods that consume the same molecule-level summary, $\star_{G}$ is competitive with all alternatives at orders of magnitude fewer parameters, with closed-form solution, and with the per-irrep decomposition no MLP can produce. The implication for Section [2.7](https://arxiv.org/html/2605.20440#S2.SS7)'s framing is that the $\star_{G}$ contribution is not in pooled $R^{2}$ on QM9 but in (a) the per-irrep predictive decomposition no ENN provides (Section [2.6](https://arxiv.org/html/2605.20440#S2.SS6), Table [7](https://arxiv.org/html/2605.20440#S2.T7)), (b) the cross-selectivity headline on tensorial polarizability targets (Table [8](https://arxiv.org/html/2605.20440#S2.T8)), and (c) the orthogonal capabilities listed below.
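The within-isomer metric can be sketched as below. This is an illustrative reading of the definition above, not the authors' evaluation script: it assumes "formulas with $\geq 5$ constitutional isomers" means at least five test molecules sharing a formula string, and that "sample-weighted" means weighting each formula's $R^{2}$ by its number of test molecules. The function names are hypothetical.

```python
from collections import defaultdict

import numpy as np

def r2_score(y_true, y_pred):
    """Ordinary coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def within_isomer_r2(formulas, y_true, y_pred, min_isomers=5):
    """Sample-weighted mean of per-formula R^2 over formulas with enough isomers."""
    groups = defaultdict(list)
    for idx, formula in enumerate(formulas):
        groups[formula].append(idx)
    weighted_sum, total = 0.0, 0
    for members in groups.values():
        if len(members) < min_isomers:
            continue                       # too few isomers to assess
        members = np.asarray(members)
        yt = y_true[members]
        if np.allclose(yt, yt.mean()):
            continue                       # R^2 undefined for a constant target
        weighted_sum += len(members) * r2_score(yt, y_pred[members])
        total += len(members)
    return weighted_sum / total if total else float("nan")
```

A predictor that outputs only the per-formula mean scores exactly zero on this metric while still scoring well on pooled $R^{2}$ whenever formulas differ in their mean target, which is the size-prediction effect the paragraph describes.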

##### The $\star_{G}$ algebra delivers capabilities that ENNs cannot provide.

Three of these are central to the manuscript\.

(i) Per-irrep predictive decomposition. The generalized Fourier transform of the molecule tensor decomposes naturally into irreducible representation channels of $G$; for the octahedral group this gives the A$_1$, A$_2$, E, T$_1$, T$_2$ channels of Section [2.6](https://arxiv.org/html/2605.20440#S2.SS6). Per-irrep test $R^{2}$ for any target is a closed-form quantity, computed by ridge-regressing on a single irrep's projected features (Table [4](https://arxiv.org/html/2605.20440#S2.T4)). ENNs operate inside spherical-harmonic / Wigner-$D$ spaces but their final predictions do not decompose this way; the readout is end-to-end. This is the primary interpretability differentiator.

\(ii\) Empirical recovery of the Wigner–Eckart selection rules\.The per\-irrep decomposition above, applied to QM9 properties of varying tensor rank, recovers the angular\-momentum selection rules of the 1931 Wigner–Eckart theorem from molecular geometry data alone, with no quantum\-mechanical input \(Section[2\.6](https://arxiv.org/html/2605.20440#S2.SS6)\)\. No ENN paper has reported this\. It is unique to the algebraic framing\.

(iii) Provably optimal symmetry-preserving compression. The rank-$k$ $\star_{G}$-SVD is the best $G$-equivariant rank-$k$ approximation in Frobenius norm (Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1), machine-verified in Lean 4). Tucker has only $\sqrt{d}$-quasi-optimal bounds; CP is NP-hard to compute optimally; tensor-train has no global optimality. The structural advantage compounds with composability: adding a new symmetry $G^{\prime}$ replaces $F_{G}$ by $F_{G}\otimes F_{G^{\prime}}$ with no architectural change.
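For intuition, the cyclic case $G=\mathbb{Z}_{n}$ can be sketched with ordinary NumPy calls. The sketch assumes that, for cyclic $G$, the truncation reduces to an ordinary SVD of each frequency slice after an FFT along the group axis, as in the tubal algebra the paper cites as a special case. It is not the authors' implementation, the function name is hypothetical, and non-abelian groups would need block-structured slices that are not shown.

```python
import numpy as np

def cyclic_star_svd_truncate(T, k):
    """Rank-k truncation of a p x q x n tensor, with the last axis indexed by Z_n.

    Returns the truncated tensor and the squared Frobenius error predicted by
    applying the matrix Eckart-Young theorem to every frequency slice.
    """
    n = T.shape[2]
    T_hat = np.fft.fft(T, axis=2)              # Fourier transform along the group axis
    out_hat = np.empty_like(T_hat)
    discarded = 0.0
    for f in range(n):                         # one ordinary SVD per frequency slice
        U, s, Vh = np.linalg.svd(T_hat[:, :, f], full_matrices=False)
        out_hat[:, :, f] = (U[:, :k] * s[:k]) @ Vh[:k, :]
        discarded += np.sum(s[k:] ** 2)
    T_k = np.fft.ifft(out_hat, axis=2).real    # real input yields a real result
    # Parseval for the unnormalized FFT: ||T||_F^2 = (1/n) * sum_f ||T_hat[:, :, f]||_F^2
    return T_k, discarded / n

rng = np.random.default_rng(0)
T = rng.normal(size=(5, 4, 6))
T_2, predicted_error_sq = cyclic_star_svd_truncate(T, k=2)
actual_error_sq = np.sum((T - T_2) ** 2)       # agrees with predicted_error_sq
```

Cyclically shifting the input along the group axis shifts the truncated tensor by the same amount (for generic inputs with distinct singular values), which is the symmetry-preserving property in this special case.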

##### Matched\-input\-information comparison and the Augmented MLP collapse\.

On the same molecule-level $(n_{\mathrm{feat}},|G|)$ feature tensor (the $48$-row angular featurizer of Section [4.2](https://arxiv.org/html/2605.20440#S4.SS2), carrying angular moments, heavy-atom rows, atom-pair Coulomb couplings, and distance-distribution rows under cyclic $\mathbb{Z}_{12}$), $\star_{G}$-SVD + Ridge (144 params, $R^{2}=0.482$ on gap, $0.998$ on ZPVE) substantially exceeds Standard MLP (3,073 params, $R^{2}=0.513$ on gap) and Invariant MLP (5,761 params, $R^{2}=0.529$ on gap) on $\alpha$ ($0.909$ vs. $0.818$ vs. $0.852$) and is comparable to them on the other three scalars at $\sim 20$–$40\times$ fewer parameters. The parameter-efficiency advantage at the low-budget end of the Pareto frontier is real (Figure [11](https://arxiv.org/html/2605.20440#S2.F11)); the within-isomer audit above explains why the larger MLPs do not extract proportionally more signal. The starkest empirical difference on identical input is with the Augmented MLP, the closest non-equivariant analogue to $\star_{G}$, which attempts to learn invariance by training on rotated copies of each molecule. At full QM9 scale the Augmented MLP collapses to $R^{2}=0.019\pm 0.014$ on gap and $R^{2}=0.072\pm 0.019$ on $\alpha$, because orbit augmentation injects label-preserving noise that a 3,073-parameter MLP cannot disentangle once the dataset is large. The case for algebraic equivariance over data-augmented learning of equivariance is therefore empirically dramatic at large scale, considerably more so than at the 1,000-molecule scale of Section [2.3](https://arxiv.org/html/2605.20440#S2.SS3). The ENN comparison remains a different-input comparison: SchNet, e3nn, and MACE consume the full atomic graph; $\star_{G}$ consumes a molecule-level summary. The contribution of the algebra is what it adds within that input contract.

##### Position in the equivariant ML lineage\.

Equivariant ML has progressed through three stages: data augmentation, architectural equivariance (ENNs), and now algebraic equivariance ($\star_{G}$). Each stage cut back, but did not eliminate, a class of cost from the previous one (Section [2.6](https://arxiv.org/html/2605.20440#S2.SS6) discusses this transition in detail). $\star_{G}$'s contribution is not a faster–better–cheaper alternative to ENNs but a different mathematical affordance: provably optimal low-rank approximation, machine-verified equivariance, symmetry discovery from data, and per-irrep interpretability. Where the data demands chemistry-aware predictive power on a single target with generous compute, ENNs remain the right tool; where the goal is algebraic structure, interpretability, parameter efficiency, or the discovery of physical laws like the Wigner–Eckart selection rules from data, the algebraic framework offers things no ENN does.

Table 5: Full QM9, four scalar targets, mean test $R^{2}\pm$ std over 3 seeds (130,831 molecules; 91,581 / 19,624 / 19,626 train / val / test split). The MACE row makes the magnitude of the pooled-$R^{2}$ gap explicit and motivates the within-isomer audit in Table [6](https://arxiv.org/html/2605.20440#S2.T6). $^{*}$ $\star_{G}$-SVD + Ridge and $\star_{G}$ neural use the $48$-row angular featurizer specified in Methods §[4.2](https://arxiv.org/html/2605.20440#S4.SS2). $^{\dagger}$ SchNet uses the PyTorch Geometric reference implementation (torch_geometric.nn.models.SchNet) with $128$ hidden channels, $6$ interactions, $50$-Gaussian RBF, cutoff $10$ Å, $455{,}809$ trainable parameters. SchNet matches or modestly exceeds MACE on every target while using $\sim 2.1\times$ fewer trainable parameters. Full configurations for all three ENN baselines are in Supplementary §[L.1](https://arxiv.org/html/2605.20440#A12.SS1).

Table 6: Pooled vs. within-isomer test $R^{2}$ for the HOMO–LUMO gap target. Within-isomer $R^{2}$ is computed over molecular formulas with $\geq 5$ constitutional isomers and reported as the sample-weighted mean across formulas. The collapse from pooled to within-isomer quantifies how much of the headline $R^{2}$ is size-prediction signal. *Within-isomer $R^{2}$ averaged over $\sim$230 molecular formulas (per seed, $\geq 5$ constitutional isomers each); covered test molecules range from $\sim$10,500 (PyG-split SchNet) to $\sim$19,250 ($\star_{G}$ / MLP / MACE). The sample-weighted mean across formulas is robust to the per-method test-set partitioning, so the numerical comparison is unaffected. $^{\ddagger}$ MACE within-isomer is the mean over two seeds; the third seed will be added in the camera-ready (the change in 3-seed mean would be at most $\sim 0.005$ per the per-seed dispersion observed).*

Table 7: Per-irrep test $R^{2}$ on full QM9 with the chiral octahedral group $O$. Each cell is the test $R^{2}$ when $\star_{G}$-SVD + Ridge is trained on *only* the projected features of one irrep. The Wigner–Eckart signature reads off each property's tensor character directly from the data: ZPVE is A$_1$-pure (a scalar property); $\alpha$ (rank-2 trace) has T$_1\approx 0$, exactly as required by the representation theory of symmetric rank-2 tensors (which decompose into $l\!=\!0\oplus l\!=\!2$, no $l\!=\!1$ component); $\mu$ (magnitude of a vector) has T$_1$ stronger than A$_1$, signalling its underlying vector character. ENNs do not produce this decomposition.

Table 8: QM7-X polarizability components: cross-selectivity table. For each method we train two single-target models, one predicting the $E_{g}$ irrep magnitude $\|\boldsymbol{\alpha}_{E}\|$ of the molecular polarizability tensor, the other predicting the $T_{2g}$ magnitude $\|\boldsymbol{\alpha}_{T_{2}}\|$, on identical 60/20/20 per-molecule splits across three seeds (42, 43, 44). Cross-selectivity is $\mathrm{cs}=(R^{2}_{\mathrm{on}}-R^{2}_{\mathrm{off}})/(R^{2}_{\mathrm{on}}+R^{2}_{\mathrm{off}}+10^{-12})$, where the "on" model targets the indicated component and the "off" model is the architecturally-identical model trained on the other component. A method whose two single-target architectures achieve similar $R^{2}$ values has cross-selectivity near zero, indicating that the architecture treats the two targets symmetrically. A method with very asymmetric $R^{2}$ has cross-selectivity near one, indicating a structural prior that aligns with one component and not the other. *The three ENNs achieve modest predictive $R^{2}$ in the $0.66$–$0.71$ range on each polarizability component individually, matching the $\star_{G}$ all-irreps + Ridge baseline at $6$ parameters. None of the three ENN architectures achieves cross-selectivity above $\sim 1\%$: MACE, SchNet, and e3nn treat the two octahedral irrep components symmetrically and cannot disentangle them. The $\star_{G}$ $E$-only and $T_{2}$-only models, with $2$ trainable parameters each, retain $96.2\%$ and $97.3\%$ cross-selectivity respectively, a direct consequence of restricting the algebra's per-irrep features to a single irrep at training time. This is the central empirical claim of the manuscript: not that $\star_{G}$ wins on raw $R^{2}$, but that the algebra exposes a representation-theoretic structure that no equivariant neural network we are aware of provides.*
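The cross-selectivity score defined in the Table 8 caption is a one-line computation. The sketch below follows the stated formula; the function name is hypothetical, and the two example inputs are made-up values that only illustrate the symmetric and asymmetric regimes, not numbers from the paper.

```python
def cross_selectivity(r2_on, r2_off, eps=1e-12):
    """cs = (R2_on - R2_off) / (R2_on + R2_off + eps), as defined in the caption."""
    return (r2_on - r2_off) / (r2_on + r2_off + eps)

# Hypothetical inputs: two nearly equal scores give a value near zero,
# while a strongly asymmetric pair gives a value near one.
symmetric_case = cross_selectivity(0.70, 0.69)
asymmetric_case = cross_selectivity(0.70, 0.01)
```

The small constant in the denominator only guards against division by zero when both scores vanish.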

![Refer to caption](https://arxiv.org/html/2605.20440v1/x11.png)

Figure 11: Parameter-efficiency vs predictive power on QM9 HOMO–LUMO gap. (a) Pooled test $R^{2}$ vs trainable parameters (3 seeds, error bars are $\pm$ std). MACE occupies the upper-right ($R^{2}=0.985$ at $945{,}168$ parameters); $\star_{G}$-SVD + Ridge occupies the upper-left at $144$ parameters ($R^{2}=0.482$, parameter efficiency $\sim 6{,}600\times$ better than MACE). MLP-augmented sits at the bottom ($R^{2}\approx 0.02$), illustrating the structural collapse of orbit-augmented learning at full QM9 scale. (b) Within-isomer test $R^{2}$ on the same axes (formulas with $\geq 5$ constitutional isomers, sample-weighted mean over 3 seeds). Every method that consumes the molecule-level $(n_{\mathrm{feat}},|G|)$ summary collapses into the band $R^{2}_{\mathrm{within}}\in[-0.17,+0.10]$, demonstrating that the panel-(a) gap among those methods is almost entirely a size-prediction signal. SchNet ($R^{2}_{\mathrm{within}}\approx 0.991$) and MACE ($R^{2}_{\mathrm{within}}\approx 0.968$, seed 0) both sit $\sim 1$ unit above every molecule-level-summary method on the $y$-axis, making the input-information gap visible at a glance.

![Refer to caption](https://arxiv.org/html/2605.20440v1/figures/fig11_paradigm.png)

Figure 12: Paradigm comparison. Left (ENN paradigm): each symmetry requires a bespoke architecture; combining symmetries requires redesigning from scratch. Right ($\star_{G}$ paradigm): the same algebra handles any group; composing symmetries requires only specifying $G_{1}\times G_{2}$ in the Fourier transform.

Table 9: Structural comparison: ENNs vs. $\star_{G}$ algebra.

## 3 Discussion

The central finding of this work is that the $\star_{G}$ tensor algebra, when applied over a physically meaningful group (the octahedral subgroup of SO(3)), recovers the angular momentum selection rules of the Wigner–Eckart theorem directly from molecular geometry data. The T$_1$/A$_1$ predictive power ratio separates vector observables ($\sim$2.8) from scalar observables ($\sim$0.5) by a factor of five, and the isotropic polarizability's near-zero T$_1$ dependence confirms the representation-theoretic absence of $l\!=\!1$ content in symmetric rank-2 tensors. These patterns emerge without any quantum-mechanical input, demonstrating that the $\star_{G}$ framework functions as a *spectroscope for physical symmetry*: it decomposes empirical predictions into irreducible representation channels that reveal the underlying mathematical structure of nature.

This discovery capability rests on four theoretical pillars validated by our experiments: (i) the Peter–Weyl spectral decomposition of the convolution tensor, which expresses group structure through the sparse core tensor $\mathcal{C}$ (equation ([2](https://arxiv.org/html/2605.20440#S2.E2))); (ii) the Eckart–Young optimality guarantee for the $\star_{G}$-SVD (Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1)), the first such result for symmetry-preserving tensor approximation; (iii) the product group ring isomorphism (Theorem [2.2](https://arxiv.org/html/2605.20440#S2.Thmtheorem2)), which composes multiple symmetries seamlessly ($R^{2}=1.000$ for $\mathbb{Z}_{6}\times\mathbb{Z}_{4}$ vs. $\leq 0.23$ for single factors); and (iv) data-driven group and factorization discovery.

The practical implications are equally significant. In the data-scarce regime that characterizes scientific applications, $\star_{G}$-SVD with ridge regression outperforms neural baselines with $49\times$ more parameters on real QM9 data (Table [2](https://arxiv.org/html/2605.20440#S2.T2)). The success of simple linear methods in the $\star_{G}$ representation suggests that when data symmetry is properly captured algebraically, the remaining structure is essentially linear, with immediate implications for interpretability, computational efficiency, and deployment in resource-constrained scientific settings.

### Broader Impact

The framework opens a path from Eckart–Young (optimal low-rank tensor approximation preserving group structure) to Wigner–Eckart (angular momentum selection rules for physical observables) through a single algebraic construction, closing a circle between two theorems that, fittingly, share a common author in Carl Eckart. By changing only the group $G$, the same machinery that delivers parameter-efficient molecular property prediction can decompose physical observables by angular momentum channel, discover product group structure, or identify the best-fitting symmetry from a candidate set. We anticipate applications in materials science (crystallographic point groups), particle physics (gauge symmetries), and drug discovery (molecular chirality), where the ability to simultaneously predict properties and reveal their symmetry content could accelerate scientific understanding.

### Formal Verification

All core algebraic results in this paper have been machine-verified in the Lean 4 proof assistant (de Moura and Ullrich, [2021](https://arxiv.org/html/2605.20440#bib.bib18)) using the Mathlib library (The mathlib Community, [2020](https://arxiv.org/html/2605.20440#bib.bib19)). The formalization comprises 600 lines of Lean 4 across six modules, with zero unresolved proof obligations (sorry) and five axioms (standard textbook results not yet available in Mathlib). Every theorem is fully verified: Eckart–Young optimality (Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1)), product-group composition (Theorem [2.2](https://arxiv.org/html/2605.20440#S2.Thmtheorem2)), $\star_{G}$ associativity, identity, distributivity, transpose reversal, left and right equivariance, Frobenius norm and DC component invariance, and all three Wigner–Eckart selection rules. Full details are provided in SI Section [N](https://arxiv.org/html/2605.20440#A14).

### Limitations and Future Directions

The current framework handles finite groups. Extension to continuous group structures is a separate problem we do not address here. For completeness we note that related Eckart–Young-type results for the $\star_{M}$ (tubal) algebra, a special case of our framework with $G=\mathbb{Z}_{n}$, have been obtained independently by Mor (Mor and Avron, [2025](https://arxiv.org/html/2605.20440#bib.bib20); Mor, [2026](https://arxiv.org/html/2605.20440#bib.bib21)). The octahedral group, while physically meaningful, captures only the cubic subgroup of the full rotation group; icosahedral ($|G|=60$) or larger polyhedral groups would provide finer angular resolution. The Lean 4 formalization axiomatizes five standard results from linear algebra and harmonic analysis that are not yet in Mathlib; closing these axioms is a contribution to the Mathlib library rather than to the mathematics of this paper. Predicting full tensor-valued properties (not just scalar invariants) would enable complete Wigner–Eckart decomposition including $l\!=\!2$ selection rules for the quadrupole moment.

More broadly, the $\star_{G}$ framework inverts the conventional relationship between data and mathematics. Rather than destroying the geometry of tensorial data to fit the algebra of vectors, we adapt the algebra to the geometry of the data. The practical consequence is stark: on real molecular data, 107 algebraic parameters outperform 5,249 neural network parameters. One does not need big data if one has deep algebra.

## 4 Methods

### 4.1 Algorithmic Overview

The full $\star_{G}$ pipeline used in every experiment of this paper consists of four stages: (i) *group selection*, in which a finite group $G$ is fixed and its convolution tensor $\mathcal{T}_{G}$ together with the generalized Fourier matrix $F_{G}$ are precomputed once and cached; (ii) *tensorial featurization*, in which each input molecule is mapped to a tensor $\mathcal{X}\in\mathbb{R}^{n_{f}\times|G|}$ (or $\mathbb{R}^{n_{f}\times|G_{1}|\times\cdots\times|G_{d}|}$ for product groups) by sampling a measurement basis at every group element; (iii) *algebraic decomposition*, in which group-invariant features are extracted from $\mathcal{X}$ via the generalized Fourier transform and the $\star_{G}$-SVD; and (iv) *prediction*, in which a downstream regressor (ridge regression for the linear pipeline; an MLP or a Neural-$\star_{G}$ network for the neural pipelines) maps these features to the target property. For the Wigner–Eckart experiment a fifth stage replaces the global generalized Fourier power with a per-irrep decomposition (Section [2.6](https://arxiv.org/html/2605.20440#S2.SS6)). The end-to-end procedure, the $\star_{G}$ product computation, the $\star_{G}$-SVD, the invariant feature extractor, the per-irrep decomposition, and the symmetry- and factorization-discovery search are written as explicit algorithm blocks in the Supplementary Information. Reference implementations are provided in MATLAB (core/StarGAlgebra.m, core/extractStarGFeatures.m, core/NeuralStarGFramework.m) and in Python (python/StarGAlgebra.py); the numerical results in this paper were generated by the MATLAB scripts under experiments/, with the Python implementation passing the same regression-test suite.

### 4.2 Feature Construction

For the single-group and product-group experiments, molecular features are inner products with a rotating measurement basis at angles $\theta_{g}=2\pi g/n$. For the product group, axial features use periodic $z$-embeddings and coupled features are angular $\times$ axial products; the two actions commute because $z$-rotation modifies $(x,y)$ not $z$, while $z$-translation modifies $z$ not $(x,y)$.

For the Wigner–Eckart experiment, features are computed under the 24 rotations of the chiral octahedral group $O$ (9 face rotations at $90^{\circ}/180^{\circ}/270^{\circ}$ about the three coordinate axes, 8 vertex rotations at $120^{\circ}/240^{\circ}$ about the four body diagonals, 6 edge rotations at $180^{\circ}$ about axes through opposite edge midpoints, plus the identity). This group is a subgroup of SO(3) whose irreps (A$_1$, A$_2$, E, T$_1$, T$_2$) correspond to angular momentum channels $l=0,0,2,1,2$.
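The 24 rotations can be generated without hand-listing axes: the proper rotations of the cube are exactly the signed $3\times 3$ permutation matrices with determinant $+1$. The sketch below uses that standard fact; the function names and class labels are illustrative and not taken from the paper's code.

```python
import itertools
from collections import Counter

import numpy as np

def chiral_octahedral_group():
    """Return the 24 proper rotations of the cube as 3x3 integer matrices."""
    rotations = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            R = np.zeros((3, 3), dtype=int)
            for row in range(3):
                R[row, perm[row]] = signs[row]
            if round(float(np.linalg.det(R))) == 1:   # keep rotations, drop reflections
                rotations.append(R)
    return rotations

def rotation_class(R):
    """Classify a rotation by its trace, using trace = 1 + 2 cos(theta)."""
    trace = int(np.trace(R))
    if trace == 3:
        return "identity"
    if trace == 1:
        return "face quarter-turn"        # 90 or 270 degrees about a coordinate axis
    if trace == 0:
        return "vertex rotation"          # 120 or 240 degrees about a body diagonal
    # trace == -1 means a half-turn; diagonal matrices turn about a coordinate axis
    off_diagonal = R - np.diag(np.diag(R))
    return "edge half-turn" if off_diagonal.any() else "face half-turn"

group = chiral_octahedral_group()
counts = Counter(rotation_class(R) for R in group)
```

Counting by class gives 1 identity, 6 quarter-turns and 3 half-turns about coordinate axes, 8 vertex rotations, and 6 edge half-turns, 24 in total.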

### 4.3 Dipole Vector Computation

The QM9 .xyz files include Mulliken partial charges $q_{i}$ as a fifth column. The dipole vector is computed as $\boldsymbol{\mu}=\sum_{i}q_{i}\mathbf{r}_{i}$, yielding components $\mu_{x},\mu_{y},\mu_{z}$ that transform as a rank-1 tensor (vector) under rotation. This provides ground-truth targets with known transformation properties for the Wigner–Eckart test.
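The point-charge dipole formula is a single contraction. The sketch below assumes the charges and Cartesian positions have already been parsed into arrays (file parsing and unit conversion are left out, and the function name is hypothetical); the toy molecule is made up for illustration.

```python
import numpy as np

def dipole_from_charges(charges, positions):
    """Point-charge dipole mu = sum_i q_i r_i for charges (N,) and positions (N, 3)."""
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    return charges @ positions            # shape (3,): (mu_x, mu_y, mu_z)

# Toy three-atom example with made-up, net-neutral partial charges.
q = np.array([-0.4, 0.2, 0.2])
r = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [-0.8, 0.6, 0.0]])
mu = dipole_from_charges(q, r)

# Rotating the molecule rotates the dipole: a 90-degree turn about the z axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
mu_rotated = dipole_from_charges(q, r @ Rz.T)   # equals Rz @ mu
```

The components change under rotation while the magnitude does not, which is why the components serve as rank-1 targets and $|\boldsymbol{\mu}|$ as a scalar target.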

### 4.4 Per-Irrep Fourier Decomposition

For each irrep $\rho$ of dimension $d_{\rho}$ with representation matrices $\{\rho(g)\}_{g\in G}$, the Fourier transform of feature row $j$ is $\hat{X}_{j}^{\rho}=\sqrt{d_{\rho}/|G|}\sum_{g}X(j,g)\,\rho(g)$, a $d_{\rho}\times d_{\rho}$ matrix. The per-irrep power is $\|\hat{X}_{j}^{\rho}\|_{F}^{2}$. Per-irrep features (one power value per feature row per irrep) are used independently as predictors for each quantum property via ridge regression, yielding per-irrep $R^{2}$ values. Pseudocode is given in Algorithm [4](https://arxiv.org/html/2605.20440#alg4).
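A compact sketch of this transform for the octahedral group follows. It is an illustration under stated assumptions, not the paper's Algorithm 4. The group is generated as signed permutation matrices; T$_1$ is taken to be the defining rotation matrices, A$_2$ the parity of the underlying axis permutation, and T$_2$ their product. The two-dimensional irrep E is omitted for brevity, and all function names are hypothetical.

```python
import itertools

import numpy as np

def octahedral_rotations():
    """The 24 proper rotations of the cube, stacked as an array of shape (24, 3, 3)."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1.0, -1.0), repeat=3):
            R = np.zeros((3, 3))
            R[np.arange(3), list(perm)] = signs
            if np.isclose(np.linalg.det(R), 1.0):
                mats.append(R)
    return np.stack(mats)

def irreps(rotations):
    """Representation matrices rho(g) for four of the five irreps (E omitted)."""
    # Parity of the axis permutation underlying each rotation: +1 or -1.
    parity = np.round([np.linalg.det(np.abs(R)) for R in rotations]).reshape(-1, 1, 1)
    return {
        "A1": np.ones((len(rotations), 1, 1)),
        "A2": parity,
        "T1": rotations,
        "T2": parity * rotations,
    }

def per_irrep_power(X, rho):
    """X: (n_rows, |G|), rho: (|G|, d, d). Returns one power value per feature row."""
    order, d = rho.shape[0], rho.shape[1]
    X_hat = np.sqrt(d / order) * np.einsum("jg,gab->jab", X, rho)
    return np.sum(np.abs(X_hat) ** 2, axis=(1, 2))   # squared Frobenius norm

rotations = octahedral_rotations()
reps = irreps(rotations)
X = np.random.default_rng(0).normal(size=(4, 24))     # stand-in feature tensor
powers = {name: per_irrep_power(X, rho) for name, rho in reps.items()}
```

Because each $\rho(h)$ is orthogonal, relabelling the group axis by a left translation multiplies $\hat{X}_{j}^{\rho}$ by $\rho(h)$ and leaves every power value unchanged, which is what makes these quantities usable as invariant predictors.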

### 4.5 Invariant Feature Extraction

Given a sample tensor $\mathcal{X}\in\mathbb{R}^{n_{f}\times|G|}$, the invariant feature vector concatenates seven complementary descriptors, all of which are exactly invariant under the left action of $G$ (proofs in the SI Equivariance Proofs section): (a) the DC component $\bar{\mathcal{X}}_{j}=\tfrac{1}{|G|}\sum_{g}\mathcal{X}(j,g)$ for each feature row $j$; (b) the AC energy $\sigma_{j}=\mathrm{std}_{g}\,\mathcal{X}(j,g)$; (c) the total per-frequency power $\sum_{j}|\hat{\mathcal{X}}(j,k)|^{2}$ for each generalized Fourier bin $k$, where $\hat{\mathcal{X}}=\mathcal{X}F_{G}$ is the generalized Fourier transform of $\mathcal{X}$ along the group dimension; (d) per-row generalized Fourier power $|\hat{\mathcal{X}}(j,k)|^{2}$ for the first $K=14$ equivariant rows; (e) the singular tube norms $\|\mathbf{s}_{i}\|_{F}$ for $i=1,\ldots,\min(p,q)$ obtained from the $\star_{G}$-SVD of a reshaped $p\times q\times|G|$ tensor (the reshape $(p,q)$ is chosen to maximize $\min(p,q)$); (f) the rows of $\mathcal{X}$ identified as invariant by row-variance (constant under the group action); and (g) four spectral statistics of the unfolded matrix (nuclear norm, spectral norm, condition number, and entropy of the singular-value distribution). Features are $z$-score normalized using statistics computed from training data, and an unregularized intercept column is appended. Pseudocode is given in Algorithm [3](https://arxiv.org/html/2605.20440#alg3).
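Descriptors (a) through (d) can be sketched for the cyclic case $G=\mathbb{Z}_{n}$, where the generalized Fourier transform is the ordinary DFT along the group axis. This is a simplified illustration rather than the paper's Algorithm 3: it keeps per-row power for all rows instead of the first $K=14$, and it omits descriptors (e) through (g). The function name is hypothetical.

```python
import numpy as np

def cyclic_invariant_features(X):
    """Invariant descriptors (a)-(d) for a sample tensor X of shape (n_f, n), G = Z_n."""
    n = X.shape[1]
    dc = X.mean(axis=1)                            # (a) DC component per feature row
    ac = X.std(axis=1)                             # (b) AC energy per feature row
    X_hat = np.fft.fft(X, axis=1) / np.sqrt(n)     # unitary DFT along the group axis
    power = np.abs(X_hat) ** 2                     # (d) per-row, per-frequency power
    total_power = power.sum(axis=0)                # (c) total power per frequency bin
    return np.concatenate([dc, ac, total_power, power.ravel()])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))
features = cyclic_invariant_features(X)
features_shifted = cyclic_invariant_features(np.roll(X, 3, axis=1))
```

A cyclic shift of the group axis multiplies each Fourier coefficient by a unit-modulus phase, so all four descriptor families are unchanged and the two feature vectors above agree to rounding error.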

### 4.6 Ridge Regression and Rank Selection

The downstream linear model is ridge regression with hyperparameter $\lambda$ selected from the geometric grid $\{10^{-3},10^{-2},\ldots,10^{2},10^{3}\}$ by validation MSE. The intercept is unregularized. The number of singular tubes retained in the $\star_{G}$-SVD feature block is set to $\min(p,q)$, where $(p,q)$ is the optimal rectangular reshape of $n_{f}$ (Section [4.5](https://arxiv.org/html/2605.20440#S4.SS5)); no further truncation is applied because the Eckart–Young theorem guarantees a closed-form bound on the truncation error (Theorem [2.1](https://arxiv.org/html/2605.20440#S2.Thmtheorem1)). The total parameter count for $\star_{G}$-SVD + Ridge is $1+d_{\mathrm{feat}}$, where $d_{\mathrm{feat}}$ is the number of non-degenerate feature columns retained after $z$-score normalization (107 on QM9 and 186 on the product-group task).
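A minimal sketch of this selection procedure is given below, assuming the feature matrix carries a leading column of ones for the intercept. The helper names `ridge_fit` and `select_lambda` and the synthetic data are illustrative assumptions, not part of the released code.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Closed-form ridge; column 0 of Phi is the intercept and is not penalized."""
    penalty = lam * np.eye(Phi.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)

def select_lambda(Phi_tr, y_tr, Phi_val, y_val):
    """Pick lambda from the geometric grid 1e-3 ... 1e3 by validation MSE."""
    grid = [10.0 ** k for k in range(-3, 4)]
    mse = []
    for lam in grid:
        w = ridge_fit(Phi_tr, y_tr, lam)
        mse.append(np.mean((Phi_val @ w - y_val) ** 2))
    best = grid[int(np.argmin(mse))]
    return best, ridge_fit(Phi_tr, y_tr, best)

# Synthetic stand-in for z-scored invariant features (not QM9 data).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
Phi = np.hstack([np.ones((200, 1)), features])
y = 2.0 + features @ np.array([1.0, -0.5, 0.0, 0.3, 0.0]) + 0.1 * rng.normal(size=200)
lam_best, w = select_lambda(Phi[:140], y[:140], Phi[140:170], y[140:170])
```

The returned weight vector has $1+d_{\mathrm{feat}}$ entries, matching the parameter count stated above.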

### 4.7 Baseline Architectures and Training

All four neural baselines share the same hidden widths $[64,32]$, ReLU activations on hidden layers, a linear output, He initialization, full-batch validation, and the Adam optimizer ($\beta_{1}=0.9$, $\beta_{2}=0.999$, $\varepsilon=10^{-8}$) with early stopping on validation MSE (patience 20). They differ in three respects: (i) the input representation $X_{\mathrm{in}}$, (ii) the training-set construction, and (iii) the parameter count. Table [10](https://arxiv.org/html/2605.20440#S4.T10) (Methods) summarizes these differences; the complete per-experiment hyperparameter sheet is given in SI Table 3.

Standard MLP. Input is the un-rotated raw feature vector $X_{\mathrm{in}}=\mathcal{X}(:,e)\in\mathbb{R}^{n_{f}}$ (frontal slice at the identity), $z$-score normalized using training-set statistics. The model is $[n_{f}\to 64\to 32\to 1]$. No symmetry information is used during training. Trained for up to 300 epochs in every experiment (synthetic, QM9, and product-group) at learning rate $0.003$, with batch size 32 (synthetic) or 256 (QM9). This is the “no-symmetry” control.

Invariant MLP. Input is the concatenation of four hand-crafted group-invariant pooling statistics applied along the group dimension: $X_{\mathrm{in}}=[\,\mathrm{mean}_{g}\mathcal{X},\ \mathrm{std}_{g}\mathcal{X},\ \min_{g}\mathcal{X},\ \max_{g}\mathcal{X}\,]\in\mathbb{R}^{4n_{f}}$, $z$-score normalized. The model is $[4n_{f}\to 64\to 32\to 1]$. By construction the input is exactly $G$-invariant, so the model is invariant by composition; this is the “manual invariance” control. Same optimizer schedule as Standard MLP.

Augmented MLP. Input is the un-rotated raw feature vector $X_{\mathrm{in}}=\mathcal{X}(:,e)\in\mathbb{R}^{n_{f}}$, but the training set is expanded by including $X_{\mathrm{aug}}=\mathcal{X}(:,g)$ for every $g\in G$, with the same target label, yielding a $|G|$-fold augmented training set. $z$-score statistics are computed on the augmented set. The model is $[n_{f}\to 64\to 32\to 1]$. Test-time prediction uses the un-rotated slice. This is the standard data-augmentation strategy and serves as a proxy for invariance learned from data rather than enforced algebraically; it is the closest non-equivariant baseline to an ENN. Trained for 80–300 epochs at learning rate $0.003$–$0.005$, with batch size 32 (synthetic) or 256 (QM9 / product-group); the lower epoch count for the synthetic experiment reflects the $|G|$-fold larger effective batch budget. A precise pseudocode specification is given in Algorithm [6](https://arxiv.org/html/2605.20440#alg6).
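The $|G|$-fold training-set expansion used by the Augmented MLP can be sketched as follows, assuming the batch is stored as an array of shape $(N, n_{f}, |G|)$ with the group index last. The function name `augment` is hypothetical, and this is not a transcription of Algorithm 6.

```python
import numpy as np

def augment(X, y):
    """X: (N, n_f, |G|), y: (N,). Returns (N*|G|, n_f) inputs and repeated labels."""
    N, n_f, n = X.shape
    # One training row per (sample, group element): row i*n + g equals X[i, :, g].
    X_aug = np.transpose(X, (0, 2, 1)).reshape(N * n, n_f)
    y_aug = np.repeat(y, n)   # every slice keeps the label of its source sample
    return X_aug, y_aug

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4, 6))
y = rng.normal(size=10)
X_aug, y_aug = augment(X, y)
```

Normalization statistics would then be computed on `X_aug`, while test-time inputs remain the single slice at the identity.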

Neural $\star_{G}$. A symmetry-aware feed-forward network whose linear layers are $\star_{G}$ products with weight tensors $W^{(\ell)}\in\mathbb{R}^{n_{\ell+1}\times n_{\ell}\times|G|}$ rather than ordinary matrix products. Forward pass: $\mathcal{A}^{(\ell+1)}=\mathrm{ReLU}\bigl(W^{(\ell)}\star_{G}\mathcal{A}^{(\ell)}+\mathbf{b}^{(\ell)}\bigr)$ for hidden layers, with a linear output and an invariant pooling $y=\mathrm{mean}_{g,j}\,\mathcal{A}^{(L)}(j,g)$. The hidden widths are $[64,32]$, matching the MLPs. Input is the $\star_{G}$-feature vector $X_{\mathrm{in}}$ from Section [4.5](https://arxiv.org/html/2605.20440#S4.SS5). Trained for 300 epochs at learning rate $0.003$, batch size 256, with the same Adam settings and early stopping. Equivariance is exact by construction (the per-layer rotation variance is $\sim 10^{-28}$, at floating-point noise), so this baseline isolates the cost of replacing a closed-form ridge regressor with a non-linear trainable model on top of the same algebraic representation. Pseudocode is given in Algorithm [7](https://arxiv.org/html/2605.20440#alg7).

Table 10: Baseline summary. Hidden $=[64,32]$, ReLU, Adam, He init, patience 20.

### 4.8 Symmetry and Factorization Discovery

The symmetry-discovery experiment (Section [2.5](https://arxiv.org/html/2605.20440#S2.SS5)) and the factorization-discovery experiment use the same scoring function: $\mathrm{score}(G)=\alpha\cdot R^{2}_{\mathrm{val}}(G)+(1-\alpha)\cdot[1-\mathrm{rotvar}(G)/\mathrm{rotvar}_{\max}]$ with $\alpha=0.7$, where $R^{2}_{\mathrm{val}}(G)$ is the validation $R^{2}$ achieved by $\star_{G}$-SVD + Ridge with group $G$ and $\mathrm{rotvar}(G)$ is the prediction variance under the candidate group action. The candidate library contains all groups of order $\leq 12$ ($\mathbb{Z}_{n}$, $D_{n}$, $S_{3}$, Klein 4, Quaternion 8) plus, for the factorization experiment, every product $\mathbb{Z}_{a}\times\mathbb{Z}_{b}$ with $ab=n_{\mathrm{group}}$. Pseudocode is given in Algorithm [5](https://arxiv.org/html/2605.20440#alg5).
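The scoring rule is simple enough to state directly in code. In the sketch below the function name `discovery_score` is hypothetical and the candidate values are made-up placeholders chosen only to exercise the formula; they are not results from the paper.

```python
def discovery_score(r2_val, rotvar, rotvar_max, alpha=0.7):
    """score(G) = alpha * R2_val(G) + (1 - alpha) * (1 - rotvar(G) / rotvar_max)."""
    return alpha * r2_val + (1.0 - alpha) * (1.0 - rotvar / rotvar_max)

# Placeholder (R2_val, rotvar) pairs for three candidate groups; illustrative only.
candidates = {"Z_12": (0.95, 0.0), "Z_6": (0.80, 0.2), "Z_4": (0.60, 0.5)}
rotvar_max = max(rotvar for _, rotvar in candidates.values())
scores = {
    name: discovery_score(r2, rotvar, rotvar_max)
    for name, (r2, rotvar) in candidates.items()
}
best_group = max(scores, key=scores.get)
```

With $\alpha=0.7$ the validation $R^{2}$ dominates, and the rotation-variance term acts as a tie-breaker that favors candidates under which predictions are stable.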

### 4.9 Reproducibility

All experiments use 3 random seeds and a 70/15/15 train/validation/test split. The same random seeds ($\texttt{seed}$, $111\cdot\texttt{seed}$, and $31\cdot\texttt{seed}$) drive train/val/test partitioning, weight initialization, and mini-batch shuffling, so that runs are reproducible up to floating-point non-determinism. End-to-end runtimes (single seed, 1,000 molecules) on a 2024 desktop CPU are reported in SI Table 4 and range from 0.7 s ($\star_{G}$-SVD + Ridge) to 4.0 s (Augmented MLP).

## Data Availability

## Code Availability

Open-source implementations of the $\star_{G}$ algebra in MATLAB and Python, together with scripts to reproduce all experiments and figures, will be made available at [https://gitfront.io/r/supermanG/hRrSeL77snCf/tensor-group-sym/](https://gitfront.io/r/supermanG/hRrSeL77snCf/tensor-group-sym/) upon acceptance. The Lean 4 formalization of all algebraic proofs (zero `sorry`, five axioms) is included in the repository under `lean/`.

## Acknowledgements

We wish to acknowledge Tammy Kolda for early feedback and her foundational contributions to the tensor algebra field, and Tess Smidt, Maurice Weiler and Joe Kileel for engaging discussions\. P\.H\. gratefully acknowledges the IBM internship program, which enabled key components of this study\.

Supplementary Information

## Appendix A Mathematical Foundations

### A.1 Notation

- $G$: finite group of order $n=|G|$ with identity $e$.
- $\hat{G}$: set of equivalence classes of irreducible unitary representations.
- $d_{\rho}=\dim(\rho)$ for $\rho\in\hat{G}$.
- $\mathcal{A}(:,:,g)$: frontal slice at group element $g$. $\mathcal{A}_{ij}=\mathcal{A}(i,j,:)$: tube at indices $i,j$.

### A.2 The Group Algebra

###### Definition A.1 (Group Algebra).

$\mathbb{R}[G]$ is the vector space of formal sums $\sum_{g\in G}a_{g}g$ with convolution product:

$$\left(\sum_{g}a_{g}g\right)\cdot\left(\sum_{h}b_{h}h\right)=\sum_{c\in G}\left(\sum_{g\in G}a_{g}b_{g^{-1}c}\right)c,\tag{5}$$

where $a_{g}$ and $b_{h}$ are scalar coefficients in $\mathbb{R}$.
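For readers who prefer code, the convolution product of Eq. (5) can be evaluated from a multiplication (Cayley) table. The sketch below uses $G=\mathbb{Z}_{n}$ for simplicity; the helper names `cayley_table_cyclic` and `group_algebra_product` are assumptions of this example.

```python
import numpy as np

def cayley_table_cyclic(n):
    """mul[a, b] = index of g_a g_b in Z_n (addition mod n)."""
    idx = np.arange(n)
    return (idx[:, None] + idx[None, :]) % n

def group_algebra_product(a, b, mul):
    """Coefficient of c is the sum of a_g * b_h over all pairs with g h = c."""
    out = np.zeros(len(a))
    for g in range(len(a)):
        for h in range(len(b)):
            out[mul[g, h]] += a[g] * b[h]
    return out

mul = cayley_table_cyclic(4)
e1 = np.array([0.0, 1.0, 0.0, 0.0])             # the basis element g = 1
e2 = np.array([0.0, 0.0, 1.0, 0.0])             # the basis element g = 2
product = group_algebra_product(e1, e2, mul)    # the basis element g = 1 + 2 = 3
```

Summing over pairs with $gh=c$ is the same as the inner sum over $g$ with $h=g^{-1}c$ in Eq. (5). Swapping in the Cayley table of a non-abelian group changes nothing else in the routine.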

## Appendix B The Convolution Tensor, Spectral Decomposition, and Generalized Fourier Matrix

###### Definition B.1 (Convolution Tensor).

$\mathcal{T}\in\mathbb{R}^{n\times n\times n}$ is defined by $\mathcal{T}(a,b,c)=\delta(g_{a}g_{b}=g_{c})$ (also known as the structure constants of the group algebra).

###### Proposition B.2 (Properties of $\mathcal{T}$).

1. (i) Associativity: $\sum_{d}\mathcal{T}(a,b,d)\,\mathcal{T}(d,c,e)\;=\;\sum_{d}\mathcal{T}(a,d,e)\,\mathcal{T}(b,c,d)$.
2. (ii) Identity: $\mathcal{T}(e,b,c)=\delta_{bc}$.
3. (iii) Each slice $\mathcal{T}(a,:,:)$ is a permutation matrix.

###### Proof.

(i) follows from associativity of group multiplication; both sides equal $\delta(g_{a}g_{b}g_{c}=g_{e})$. (ii)–(iii) follow from the group axioms. ∎
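The three properties can be verified mechanically for a small non-abelian group. The sketch below builds $\mathcal{T}$ for $S_{3}$ from permutation composition; the element ordering (identity first) and the helper `compose` are choices made for this example.

```python
import itertools
import numpy as np

elems = list(itertools.permutations(range(3)))   # S3; elems[0] is the identity
index = {p: i for i, p in enumerate(elems)}

def compose(p, q):
    """Product p q acting as x -> p(q(x))."""
    return tuple(p[q[x]] for x in range(3))

n = len(elems)
T = np.zeros((n, n, n))
for a, p in enumerate(elems):
    for b, q in enumerate(elems):
        T[a, b, index[compose(p, q)]] = 1.0      # T(a, b, c) = 1 iff g_a g_b = g_c

lhs = np.einsum("abd,dce->abce", T, T)           # (g_a g_b) g_c = g_e
rhs = np.einsum("ade,bcd->abce", T, T)           # g_a (g_b g_c) = g_e
is_associative = np.array_equal(lhs, rhs)
has_identity = np.array_equal(T[0], np.eye(n))
slices_are_permutations = bool(
    np.all(T.sum(axis=1) == 1) and np.all(T.sum(axis=2) == 1)
)
```

All three flags should evaluate to `True` for any valid Cayley table, so the same check doubles as a test that a programmatically constructed group is closed and associative.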

###### Definition B.3 (Generalized Fourier Transform Matrix).

$F_{G}\in\mathbb{C}^{n\times n}$ is defined row-wise: the row $F_{G}(g,:)$ is given by the concatenation $[\mathrm{rvec}(\rho_{1}(g)),\dots,\mathrm{rvec}(\rho_{\ell}(g))]$, where $\mathrm{rvec}(\rho_{i}(g))$ denotes the row-vectorization of the matrix $\rho_{i}(g)$ for each $\rho_{i}\in\hat{G}$. For abelian groups, $F_{G}$ is a generalized Fourier matrix, and for cyclic groups it reduces to the standard DFT matrix. For non-abelian groups, $F_{G}$ is invertible but in general neither unitary nor block-unitary.

###### Theorem B.4 (Peter–Weyl Spectral Decomposition).

$$\mathcal{T}(a,b,c)=\sum_{i,j,k}\mathcal{C}(i,j,k)\,F_{G}(a,i)\,F_{G}(b,j)\,F_{G}^{-1}(c,k)\tag{6}$$

where $\mathcal{C}$ is a core tensor that is typically sparse. For abelian groups, $\mathcal{C}$ is diagonal.

###### Proof.

By the Peter–Weyl theorem, $\{\sqrt{d_{\rho}}\,\rho_{ij}(g):\rho\in\hat{G}\}$ is an orthonormal basis for $L^{2}(G)$. Convolution becomes block multiplication in the Fourier domain: $\widehat{(f*h)}(\rho)=\hat{f}(\rho)\cdot\hat{h}(\rho)$. Writing this in index notation using the row-vectorization construction of $F_{G}$ yields the spectral decomposition with core $\mathcal{C}$ determined by the Fourier-domain multiplication structure. ∎

###### Corollary B.5.

For $G=\mathbb{Z}_{n}$: $F_{G}=\mathrm{DFT}_{n}$ and $\mathcal{C}$ is diagonal, recovering the circular convolution theorem (i.e., the $t$-product).

## Appendix C The Group Fourier Transform of a Tensor

###### Definition C.1 (Group Fourier Transform of a Tensor).

For $\mathcal{A}\in\mathbb{R}^{\ell\times m\times n}$, the Group Fourier transform $\mathcal{F}_{G}$ assigns to each irrep $\rho\in\hat{G}$ the $\ell d_{\rho}\times md_{\rho}$ block matrix

$$\hat{\mathcal{A}}(:,:,\rho)=\sum_{g\in G}\mathcal{A}(i,j,g)\,\rho(g),\qquad\text{with }(i,j)\text{ indexing the }\ell\times m\text{ blocks.}\tag{7}$$

The full Fourier representation is the block-diagonal matrix $\oplus_{\rho\in\hat{G}}\hat{\mathcal{A}}(:,:,\rho)$.

###### Proposition C.2 (Group Fourier Inversion Theorem).

Given $\hat{\mathcal{A}}(:,:,\rho)$ for each $\rho\in\hat{G}$, the inverse Group Fourier transform recovers

$$\mathcal{A}(i,j,g)=\sum_{\rho\in\hat{G}}\frac{d_{\rho}}{n}\,\mathrm{Tr}\!\left[\hat{\mathcal{A}}(i,j,\rho)\,\rho(g)^{H}\right].\tag{8}$$

###### Proof.

Follows by applying the standard Peter–Weyl inversion theorem for each fixed $i$ and $j$. ∎

Algorithm 1: $\star_{G}$ product (used by every $\star_{G}$-based method)

1. Input: $\mathcal{A}\in\mathbb{R}^{\ell\times m\times n}$, $\mathcal{B}\in\mathbb{R}^{m\times p\times n}$, generalized Fourier matrix $F_{G}\in\mathbb{C}^{n\times n}$, irrep block sizes $\{d_{\rho}\}_{\rho\in\hat{G}}$
2. Output: $\mathcal{C}=\mathcal{A}\star_{G}\mathcal{B}\in\mathbb{R}^{\ell\times p\times n}$
3. $\hat{\mathcal{A}}\leftarrow\mathcal{A}\times_{3}F_{G}$ ▷ generalized Fourier transform along group dimension
4. $\hat{\mathcal{B}}\leftarrow\mathcal{B}\times_{3}F_{G}$
5. for $\rho\in\hat{G}$ in parallel do
6. extract $\hat{\mathcal{A}}_{\rho}\in\mathbb{C}^{\ell d_{\rho}\times md_{\rho}}$, $\hat{\mathcal{B}}_{\rho}\in\mathbb{C}^{md_{\rho}\times pd_{\rho}}$ from block-diagonal form
7. $\hat{\mathcal{C}}_{\rho}\leftarrow\hat{\mathcal{A}}_{\rho}\cdot\hat{\mathcal{B}}_{\rho}$ ▷ ordinary matrix product
8. end for
9. assemble $\hat{\mathcal{C}}$ from $\{\hat{\mathcal{C}}_{\rho}\}_{\rho}$ in block-diagonal form
10. $\mathcal{C}\leftarrow\mathrm{Re}(\hat{\mathcal{C}}\times_{3}F_{G}^{-1})$ ▷ inverse generalized Fourier transform; imaginary part is zero up to roundoff for real inputs
11. return $\mathcal{C}$

## Appendix D The $\star_{G}$ Algebra: Properties and Proofs

###### Definition D.1 ($\star_{G}$ Product).

For $\mathcal{A}\in\mathbb{R}^{\ell\times m\times n}$, $\mathcal{B}\in\mathbb{R}^{m\times p\times n}$:

$$(\mathcal{A}\star_{G}\mathcal{B})_{ij}(c)=\sum_{k}\sum_{a\in G}\mathcal{A}_{ik}(a)\,\mathcal{B}_{kj}(a^{-1}c).\tag{9}$$

Equivalently: $\widehat{(\mathcal{A}\star_{G}\mathcal{B})}(:,:,\rho)=\hat{\mathcal{A}}(:,:,\rho)\cdot\hat{\mathcal{B}}(:,:,\rho)$ for each irrep $\rho\in\hat{G}$.
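The equivalence between Eq. (9) and the Fourier-domain route of Algorithm 1 can be checked numerically in the cyclic case. The sketch below assumes $G=\mathbb{Z}_{n}$, where $F_{G}$ is the DFT, every $d_{\rho}=1$, and $a^{-1}c$ is $(c-a)\bmod n$; both function names are hypothetical.

```python
import numpy as np

def star_product_cyclic(A, B):
    """Fourier route: A is (l, m, n), B is (m, p, n), G = Z_n."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # One ordinary matrix product per irrep (per Fourier bin r).
    C_hat = np.einsum("ikr,kjr->ijr", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def star_product_direct(A, B):
    """Direct evaluation of sum_k sum_a A[i, k, a] * B[k, j, (c - a) mod n]."""
    l, m, n = A.shape
    p = B.shape[1]
    C = np.zeros((l, p, n))
    for c in range(n):
        for a in range(n):
            C[:, :, c] += A[:, :, a] @ B[:, :, (c - a) % n]
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4, 8))
B = rng.normal(size=(4, 2, 8))
gap = np.max(np.abs(star_product_cyclic(A, B) - star_product_direct(A, B)))
```

For a non-abelian group the FFT calls would be replaced by multiplication with $F_{G}$ and the per-bin products by products of the $\ell d_{\rho}\times md_{\rho}$ blocks; that generalization is not shown here.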

###### Proposition D.2 (Algebraic Properties).

(i) Associativity. (ii) Distributivity. (iii) Identity: $\mathcal{I}(:,:,e)=I$, $\mathcal{I}(:,:,g\neq e)=0$. (iv) $(\mathcal{A}\star_{G}\mathcal{B})^{H}=\mathcal{B}^{H}\star_{G}\mathcal{A}^{H}$.

###### Proof\.

All follow from the corresponding matrix properties applied per\-irrep in the Fourier domain \(Definition[C\.1](https://arxiv.org/html/2605.20440#A3.Thmtheorem1)\), plus linearity and invertibility of the Fourier transform\. ∎

## Appendix E The $\star_{G}$-SVD: Full Proof of Optimality

###### Theorem E.1 ($\star_{G}$-SVD Existence).

Every $\mathcal{A}\in\mathbb{R}^{\ell\times m\times n}$ admits $\mathcal{A}=\mathcal{U}\star_{G}\mathcal{S}\star_{G}\mathcal{V}^{H}$, where $\mathcal{U},\mathcal{V}$ are $\star_{G}$-unitary and $\mathcal{S}$ is f-diagonal.

###### Proof.

For each irrep $\rho\in\hat{G}$, $\hat{\mathcal{A}}(:,:,\rho)$ is a standard matrix admitting an SVD: $\hat{\mathcal{A}}(:,:,\rho)=U_{\rho}\Sigma_{\rho}V_{\rho}^{H}$. Setting $\hat{\mathcal{U}}(:,:,\rho)=U_{\rho}$, $\hat{\mathcal{S}}(:,:,\rho)=\Sigma_{\rho}$, $\hat{\mathcal{V}}(:,:,\rho)=V_{\rho}$ and applying the inverse Fourier transform yields the $\star_{G}$-SVD. Unitarity: $\widehat{\mathcal{U}^{H}\star_{G}\mathcal{U}}(:,:,\rho)=U_{\rho}^{H}U_{\rho}=I$ for all $\rho$, so $\mathcal{U}^{H}\star_{G}\mathcal{U}=\mathcal{I}$. ∎

Algorithm 2: $\star_{G}$-SVD

1. Input: $\mathcal{A}\in\mathbb{R}^{\ell\times m\times n}$
2. Output: $\mathcal{U},\mathcal{S},\mathcal{V}$
3. Compute $\hat{\mathcal{A}}(:,:,\rho)$ for all $\rho\in\hat{G}$ using $\mathcal{F}_{G}$ (Definition [C.1](https://arxiv.org/html/2605.20440#A3.Thmtheorem1))
4. for $\rho\in\hat{G}$ do
5. $[U_{\rho},\Sigma_{\rho},V_{\rho}]\leftarrow\mathrm{SVD}(\hat{\mathcal{A}}(:,:,\rho))$
6. end for
7. Apply $\mathcal{F}_{G}^{-1}$ to $\{U_{\rho}\}$, $\{\Sigma_{\rho}\}$, $\{V_{\rho}\}$ to obtain $\mathcal{U},\mathcal{S},\mathcal{V}$

###### Theorem E.2 (Eckart–Young for $\star_{G}$).

Let $\mathcal{A}=\mathcal{U}\star_{G}\mathcal{S}\star_{G}\mathcal{V}^{H}$ with singular tubes ordered by $\|\mathbf{s}_{1}\|_{F}\geq\cdots\geq\|\mathbf{s}_{r}\|_{F}$. The rank-$k$ truncation $\mathcal{A}_{k}$ satisfies:

$$\|\mathcal{A}-\mathcal{A}_{k}\|_{F}^{2}=\sum_{i=k+1}^{r}\|\mathbf{s}_{i}\|_{F}^{2}\leq\|\mathcal{A}-\mathcal{B}\|_{F}^{2}\tag{10}$$

for any $\mathcal{B}$ with $\star_{G}$-rank $\leq k$.

###### Proof.

Step 1 (Parseval). By the generalized Fourier transform’s isometry (Peter–Weyl):

$$\|\mathcal{A}-\mathcal{B}\|_{F}^{2}=\sum_{\rho\in\hat{G}}\frac{d_{\rho}}{n}\|\hat{\mathcal{A}}(:,:,\rho)-\hat{\mathcal{B}}(:,:,\rho)\|_{F}^{2}.\tag{11}$$

Step 2 (Per-irrep Eckart–Young). If $\star_{G}$-rank$(\mathcal{B})\leq k$, then $\mathrm{rank}(\hat{\mathcal{B}}(:,:,\rho))\leq k$ for each $\rho$. By the classical Eckart–Young theorem:

$$\|\hat{\mathcal{A}}(:,:,\rho)-\hat{\mathcal{A}}_{k}(:,:,\rho)\|_{F}^{2}\leq\|\hat{\mathcal{A}}(:,:,\rho)-\hat{\mathcal{B}}(:,:,\rho)\|_{F}^{2}.\tag{12}$$

Step 3 (Summation). Summing over $\rho\in\hat{G}$:

$$\|\mathcal{A}-\mathcal{A}_{k}\|_{F}^{2}=\sum_{\rho\in\hat{G}}\frac{d_{\rho}}{n}\sum_{i=k+1}^{r}\sigma_{i}(\rho)^{2}=\sum_{i=k+1}^{r}\underbrace{\sum_{\rho}\frac{d_{\rho}}{n}\sigma_{i}(\rho)^{2}}_{=\|\mathbf{s}_{i}\|_{F}^{2}}\leq\|\mathcal{A}-\mathcal{B}\|_{F}^{2}.\tag{13}$$

∎
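The error identity in Eq. (13) is easy to test numerically in the cyclic case. The sketch below assumes $G=\mathbb{Z}_{n}$, so each irrep is one DFT bin with $d_{\rho}=1$ and the Parseval weight is $1/n$; the function name `truncate_cyclic` is hypothetical and the routine is not the authors' $\star_{G}$-SVD implementation.

```python
import numpy as np

def truncate_cyclic(A, k):
    """Rank-k truncation of A (l, m, n) for G = Z_n, plus the predicted squared error."""
    n = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    Ak_hat = np.zeros_like(A_hat)
    tail = 0.0
    for r in range(n):
        U, s, Vh = np.linalg.svd(A_hat[:, :, r], full_matrices=False)
        Ak_hat[:, :, r] = (U[:, :k] * s[:k]) @ Vh[:k]   # best rank-k slice
        tail += np.sum(s[k:] ** 2) / n                  # weight d_rho / n with d_rho = 1
    # For real A the truncated slices stay conjugate-symmetric, so the
    # inverse transform is real up to roundoff.
    return np.real(np.fft.ifft(Ak_hat, axis=2)), tail

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3, 6))
A1, predicted_err = truncate_cyclic(A, k=1)
actual_err = np.sum((A - A1) ** 2)
```

Here `actual_err` and `predicted_err` should agree to roundoff, and any other tensor whose Fourier slices all have rank at most $k$ should give an error no smaller than `predicted_err`.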

## Appendix F Product Groups: Full Proof

###### Theorem F.1 (Product Group Ring Isomorphism).

For $G=G_{1}\times\cdots\times G_{d}$: (i) $\mathbb{K}_{G}\cong\mathbb{K}_{G_{1}}\otimes\cdots\otimes\mathbb{K}_{G_{d}}$. (ii) $\mathcal{T}_{G}=\mathcal{T}_{G_{1}}\otimes\cdots\otimes\mathcal{T}_{G_{d}}$. (iii) $F_{G}=F_{G_{1}}\otimes\cdots\otimes F_{G_{d}}$.

###### Proof.

(i) Standard: $\mathbb{R}[G_{1}\times\cdots\times G_{d}]\cong\mathbb{R}[G_{1}]\otimes\cdots\otimes\mathbb{R}[G_{d}]$, Serre ([1977](https://arxiv.org/html/2605.20440#bib.bib13)).

(ii) The product group multiplication $(a_{1},\ldots,a_{d})(b_{1},\ldots,b_{d})=(a_{1}b_{1},\ldots,a_{d}b_{d})$ gives

$$\mathcal{T}_{G}(\mathbf{a},\mathbf{b},\mathbf{c})=\prod_{i=1}^{d}\mathcal{T}_{G_{i}}(a_{i},b_{i},c_{i})=(\mathcal{T}_{G_{1}}\otimes\cdots\otimes\mathcal{T}_{G_{d}})(\mathbf{a},\mathbf{b},\mathbf{c}).\tag{14}$$

(iii) Irreps of $G_{1}\times\cdots\times G_{d}$ are tensor products $\rho_{1}\otimes\cdots\otimes\rho_{d}$. Matrix elements factor: $(\rho_{1}\otimes\cdots\otimes\rho_{d})(g_{1},\ldots,g_{d})=\rho_{1}(g_{1})\otimes\cdots\otimes\rho_{d}(g_{d})$. By the $\mathrm{rvec}$ construction of $F_{G}$ and the mixed-product property of Kronecker products, $F_{G}=F_{G_{1}}\otimes\cdots\otimes F_{G_{d}}$. ∎

###### Corollary F.2 (2D Frequency Resolution).

For $G=\mathbb{Z}_{n_{1}}\times\mathbb{Z}_{n_{2}}$, the Fourier transform $F_{G}=\mathrm{DFT}_{n_{1}}\otimes\mathrm{DFT}_{n_{2}}$ computes a 2D DFT, resolving coupled frequencies $(f_{1},f_{2})$ that are invisible to either factor alone.
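The corollary can be confirmed in a few lines. The sketch below assumes group elements $(g_{1},g_{2})$ are flattened in row-major order, so that the Kronecker-structured $F_{G}$ acts on the vectorized signal; the helper `dft_matrix` and the sizes $n_{1}=3$, $n_{2}=4$ are arbitrary choices for this example.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

n1, n2 = 3, 4
F = np.kron(dft_matrix(n1), dft_matrix(n2))      # F_G for Z_3 x Z_4

rng = np.random.default_rng(0)
x = rng.normal(size=(n1, n2))                    # signal indexed by (g1, g2)
via_kron = (F @ x.reshape(-1)).reshape(n1, n2)   # apply F_G to the flattened signal
via_fft2 = np.fft.fft2(x)                        # standard 2D DFT
gap = np.max(np.abs(via_kron - via_fft2))
```

Because only the matrix $F_{G}$ changes, the same downstream code serves single and product groups, which is the practical content of Theorem F.1(iii).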

## Appendix G Invariant Feature Extraction Algorithm

The $\star_{G}$-feature extractor used in every linear and Neural-$\star_{G}$ experiment of this paper is given in Algorithm [3](https://arxiv.org/html/2605.20440#alg3). The seven feature blocks (a)–(g) below correspond to the seven concatenated columns of the output. Block (a) is the DC component; (b) the AC standard deviation; (c) the global per-frequency power; (d) the per-row Fourier power restricted to the first $K$ equivariant rows; (e) the singular tube norms from the $\star_{G}$-SVD; (f) the rows of $\mathcal{X}$ that are constant under the group action (recovered by row-variance thresholding); and (g) four spectral statistics of the unfolded matrix $\mathcal{X}_{(1)}$. Every feature is exactly $G$-invariant; proofs are in SI Section 7.

Algorithm 3: $\star_{G}$ invariant feature extraction (extractStarGFeatures.m)

1. Input: batch $\mathcal{X}\in\mathbb{R}^{N\times n_{f}\times n}$, group algebra $G$, optional normalization parameters $\Theta$ from training
2. Output: feature matrix $\Phi\in\mathbb{R}^{N\times(d_{\mathrm{feat}}+1)}$ ($+1$ for the unregularized intercept) and updated $\Theta$
3. if $\Theta$ is not provided then
4. $(p,q)\leftarrow\arg\max_{p\cdot q\leq n_{f}}\min(p,q)$ ▷ best near-square reshape for $\star_{G}$-SVD
5. $n_{\mathrm{svd}}\leftarrow\min(p,q)$
6. $\sigma_{\mathrm{row}}\leftarrow\mathrm{var}_{g}(\mathcal{X}(1,:,:))$; $\mathrm{inv\_mask}\leftarrow\sigma_{\mathrm{row}}<10^{-8}\cdot\max(\sigma_{\mathrm{row}})$ ▷ rows constant under $G$
7. $\mathrm{eq\_idx}\leftarrow\{j:\mathrm{inv\_mask}_{j}=\text{false}\}$; $K\leftarrow\min(14,|\mathrm{eq\_idx}|)$
8. end if
9. $\bar{\mathcal{X}}_{ij}\leftarrow\tfrac{1}{n}\sum_{g}\mathcal{X}(i,j,g)$ ▷ (a) DC component, $N\times n_{f}$
10. $\sigma_{ij}\leftarrow\mathrm{std}_{g}\,\mathcal{X}(i,j,g)$ ▷ (b) AC energy, $N\times n_{f}$
11. $\hat{\mathcal{X}}\leftarrow\mathcal{X}\times_{3}F_{G}$ ▷ generalized Fourier transform along group dim
12. $P^{\mathrm{col}}_{ik}\leftarrow\sum_{j}|\hat{\mathcal{X}}(i,j,k)|^{2}$ ▷ (c) per-frequency power, $N\times n$
13. $P^{\mathrm{row}}_{i,(r-1)n+k}\leftarrow|\hat{\mathcal{X}}(i,\mathrm{eq\_idx}_{r},k)|^{2}$ ▷ (d) per-row Fourier power, $N\times Kn$
14. for $i=1,\ldots,N$ do
15. $X_{i}\leftarrow$ pad $\mathcal{X}(i,:,:)$ to $p\times q\times n$
16. $[\,\cdot\,,\mathcal{S}_{i},\,\cdot\,]\leftarrow\mathrm{starG\_SVD}(X_{i})$ ▷ (e) singular tubes
17. $T_{i,k}\leftarrow\|\mathcal{S}_{i}(k,k,:)\|_{F}$ for $k=1,\ldots,n_{\mathrm{svd}}$; sort $T_{i,:}$ descending
18. $V_{ij}\leftarrow\mathcal{X}(i,j,1)$ for $j$ with $\mathrm{inv\_mask}_{j}$ ▷ (f) direct invariants
19. $\Sigma_{i}\leftarrow\mathrm{svd}(\mathcal{X}_{(1)}(i,:,:))$ ▷ (g) spectral statistics
20. $S_{i}\leftarrow[\sum\Sigma_{i},\ \Sigma_{i,1},\ \Sigma_{i,1}/\Sigma_{i,\mathrm{end}},\ -\sum_{k}\tilde{\Sigma}_{i,k}\log\tilde{\Sigma}_{i,k}]$ where $\tilde{\Sigma}_{i}=\Sigma_{i}/\sum\Sigma_{i}$
21. end for
22. $\Phi\leftarrow[\bar{\mathcal{X}}\ \mid\ \sigma\ \mid\ P^{\mathrm{col}}\ \mid\ P^{\mathrm{row}}\ \mid\ T\ \mid\ V\ \mid\ S]$
23. replace NaN/Inf with 0
24. if $\Theta$ not provided then
25. $\mathrm{keep}\leftarrow\{j:\mathrm{std}(\Phi_{:,j})\geq 10^{-8}\}$
26. $\mu\leftarrow\mathrm{mean}(\Phi_{:,\mathrm{keep}})$; $s\leftarrow\mathrm{std}(\Phi_{:,\mathrm{keep}})$; $s_{j}\leftarrow\max(s_{j},1)$
27. store $\Theta=(p,q,n_{\mathrm{svd}},\mathrm{inv\_mask},\mathrm{eq\_idx},K,\mathrm{keep},\mu,s)$
28. end if
29. $\Phi\leftarrow(\Phi_{:,\mathrm{keep}}-\mu)/s$
30. $\Phi\leftarrow[\,\mathbf{1}_{N}\ \mid\ \Phi\,]$ ▷ prepend unregularized intercept
31. return $(\Phi,\Theta)$

## Appendix H Equivariance Proofs

###### Proposition H.1 (Equivariance of $\star_{G}$).

$(g\cdot\mathcal{A})\star_{G}\mathcal{B}=g\cdot(\mathcal{A}\star_{G}\mathcal{B})=\mathcal{A}\star_{G}(g\cdot\mathcal{B})$.

###### Proof.

By definition of the group action and the $\star_{G}$ product,

$$\bigl((g\cdot\mathcal{A})\star_{G}\mathcal{B}\bigr)_{ij}(h)=\sum_{k}\sum_{a\in G}\mathcal{A}_{ik}(g^{-1}a)\,\mathcal{B}_{kj}(a^{-1}h).$$

Substituting $a^{\prime}=g^{-1}a$ (so $a=ga^{\prime}$ and $a^{-1}=a^{\prime-1}g^{-1}$),

$$\sum_{k}\sum_{a^{\prime}\in G}\mathcal{A}_{ik}(a^{\prime})\,\mathcal{B}_{kj}(a^{\prime-1}g^{-1}h)=(\mathcal{A}\star_{G}\mathcal{B})_{ij}(g^{-1}h)=\bigl(g\cdot(\mathcal{A}\star_{G}\mathcal{B})\bigr)_{ij}(h).$$

The right-equivariance identity follows by the symmetric argument applied to the second factor. ∎
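The first identity, $(g\cdot\mathcal{A})\star_{G}\mathcal{B}=g\cdot(\mathcal{A}\star_{G}\mathcal{B})$, can be checked by brute force on a non-abelian group. The sketch below uses $S_{3}$ with the left action $(g\cdot\mathcal{A})(h)=\mathcal{A}(g^{-1}h)$ from the proof and evaluates the product directly from Eq. (9); the helper names `compose`, `inverse`, `star`, and `act` are assumptions of this example.

```python
import itertools
import numpy as np

elems = list(itertools.permutations(range(3)))   # S3, identity first
index = {p: i for i, p in enumerate(elems)}
n = len(elems)

def compose(p, q):
    """Product p q acting as x -> p(q(x))."""
    return tuple(p[q[x]] for x in range(3))

def inverse(p):
    inv = [0] * 3
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def star(A, B):
    """(A * B)[i, j, c] = sum_k sum_a A[i, k, a] * B[k, j, a^{-1} c]."""
    C = np.zeros((A.shape[0], B.shape[1], n))
    for c, pc in enumerate(elems):
        for a, pa in enumerate(elems):
            b = index[compose(inverse(pa), pc)]
            C[:, :, c] += A[:, :, a] @ B[:, :, b]
    return C

def act(g, A):
    """Left action: (g . A)[:, :, h] = A[:, :, g^{-1} h]."""
    out = np.empty_like(A)
    g_inv = inverse(elems[g])
    for h, ph in enumerate(elems):
        out[:, :, h] = A[:, :, index[compose(g_inv, ph)]]
    return out

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3, n))
B = rng.normal(size=(3, 2, n))
gap = max(
    np.max(np.abs(star(act(g, A), B) - act(g, star(A, B)))) for g in range(n)
)
```

The quantity `gap` should be at floating-point noise for every $g\in S_{3}$, mirroring the substitution $a^{\prime}=g^{-1}a$ used in the proof.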

###### Corollary H.2 (Invariance of Features).

The following are invariant under $g\cdot X$: (i) $\|\mathbf{s}_{i}\|_{F}$ (singular tube norms); (ii) $\|\hat{X}(:,:,\rho)\|_{F}^{2}$ (per-irrep Fourier power); (iii) $\bar{X}_{j}=\frac{1}{n}\sum_{g}X(j,g)$ (DC component).

###### Proof.

(i) The group action permutes frontal slices; the SVD is computed per-irrep, where the action multiplies each block by a unitary, preserving singular values. (ii) The group action shifts $g\to g^{\prime}g$ inside the sum defining $\hat{X}(:,:,\rho)$, multiplying the block by $\rho(g^{\prime})$, which is unitary, leaving the Frobenius norm unchanged. (iii) $\frac{1}{n}\sum_{g}X(j,g^{\prime}g)=\frac{1}{n}\sum_{h}X(j,h)=\bar{X}_{j}$. ∎

## Appendix I Extended Data

Results on extended data are presented in figures [13](https://arxiv.org/html/2605.20440#A9.F13) and [14](https://arxiv.org/html/2605.20440#A9.F14), and tables [11](https://arxiv.org/html/2605.20440#A9.T11) and [12](https://arxiv.org/html/2605.20440#A9.T12).

![Refer to caption](https://arxiv.org/html/2605.20440v1/x12.png)

Figure 13: Extended Data Figure 1: Ablation of symmetry components. Removing group structure from the product group experiment causes systematic degradation.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x13.png)

Figure 14: Extended Data Figure 2: Predicted vs. true (synthetic). (a) $\star_{G}$-SVD: perfect diagonal. (b) Standard MLP: scattered.

Table 11: Per-method hyperparameter settings used in all experiments. Hidden widths $[64,32]$, ReLU activations, He initialization, and an unregularized bias per layer are common to all neural baselines. “Native” = original training set of size $n$; “$|G|$-aug” = original training set augmented by applying every $g\in G$ to each input, yielding $|G|\cdot n$ samples. Footnotes: (a) 80 epochs in the synthetic $\mathbb{Z}_{12}$ experiment; 300 elsewhere. (b) 32 in the synthetic experiment. (c) 0.005 in the synthetic experiment. (d) 80 in the synthetic experiment, 200 in the product-group experiment, 300 elsewhere; the smaller epoch budget for heavily augmented training reflects the $|G|$-fold larger gradient budget per epoch. Optimizer is Adam ($\beta_{1}=0.9$, $\beta_{2}=0.999$, $\varepsilon=10^{-8}$) with early-stopping patience 20 on validation MSE for every neural baseline.

Table 12: Computational cost (wall-clock, single seed, 1,000 molecules).

## Appendix J Wigner–Eckart Discovery: Extended Data

![Refer to caption](https://arxiv.org/html/2605.20440v1/x14.png)

Figure 15: Extended Data Figure 4: Per-irrep predictive power. Grouped bar chart showing $R^{2}$ from each irrep’s features alone, for all 9 quantum properties. The qualitative pattern shift between scalar properties (A$_1$-dominated) and dipole vector components (T$_1$-dominated) is the data-driven signature of the Wigner–Eckart selection rules.

![Refer to caption](https://arxiv.org/html/2605.20440v1/x15.png)

Figure 16: Extended Data Figure 5: Irrep decomposition heatmap. $R^{2}$ for each (property, irrep) pair, sorted by tensor rank. The block structure separating rank-0 from rank-1 properties is visible as a qualitative change in the A$_1$ and T$_1$ columns across the horizontal separator.

The octahedral group $O$ was constructed programmatically from its 24 rotation matrices (the identity, 9 face rotations comprising six quarter-turns and three half-turns, 8 vertex rotations, and 6 edge rotations). The multiplication table was verified to satisfy the group axioms. The five irreducible representations were constructed as: A$_1$ (trivial), A$_2$ (determinant), T$_1$ (the rotation matrices themselves, 3D), and E + T$_2$ (from the rank-2 symmetric traceless tensor representation, decomposed into the 2D and 3D invariant subspaces). All representations were verified to be closed under the group multiplication.

Algorithm 4: Per-irrep Fourier decomposition for Wigner–Eckart analysis

1. Input: feature batch $\mathcal{X}\in\mathbb{R}^{N\times n_f\times|G|}$, irreps $\hat{G}=\{\rho_1,\ldots,\rho_M\}$ with dimensions $d_\rho$ and matrices $\rho(g)$, target vector $y\in\mathbb{R}^N$, ridge grid $\Lambda$
2. Output: per-irrep predictive scores $\{R^2_\rho\}_{\rho\in\hat{G}}$
3. for $\rho\in\hat{G}$ do
4. for $i=1,\ldots,N$ and $j=1,\ldots,n_f$ in parallel do
5. $\hat{X}^{\rho}_{ij}\leftarrow\sqrt{d_\rho/|G|}\,\sum_{g}\mathcal{X}(i,j,g)\,\rho(g)\in\mathbb{C}^{d_\rho\times d_\rho}$
6. $P^{\rho}_{ij}\leftarrow\|\hat{X}^{\rho}_{ij}\|_F^2$ $\triangleright$ $G$-invariant power
7. end for
8. assemble $\Phi^{\rho}\in\mathbb{R}^{N\times n_f}$ from $\{P^{\rho}_{ij}\}$
9. split $\Phi^{\rho},y$ into train/val/test (70/15/15); standardize $\Phi^{\rho}$
10. $\lambda^{\star}\leftarrow\arg\min_{\lambda\in\Lambda}\mathrm{MSE}_{\mathrm{val}}(\Phi^{\rho},y;\lambda)$
11. $w_\rho\leftarrow(\Phi^{\rho\top}_{\mathrm{train}}\Phi^{\rho}_{\mathrm{train}}+\lambda^{\star}I)^{-1}\Phi^{\rho\top}_{\mathrm{train}}y_{\mathrm{train}}$
12. $R^2_\rho\leftarrow 1-\mathrm{SS}_{\mathrm{res}}(\Phi^{\rho}_{\mathrm{test}}w_\rho,y_{\mathrm{test}})/\mathrm{SS}_{\mathrm{tot}}(y_{\mathrm{test}})$
13. end for
14. return $\{R^2_\rho\}$, $\{T_1/A_1\text{ ratio}\}$, irrep heatmap data
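As a minimal illustration of the Fourier-power step of Algorithm 4, the sketch below specializes to the cyclic group $\mathbb{Z}_n$, whose irreps are the one-dimensional characters $\rho_k(g)=e^{-2\pi i k g/n}$ (so $d_\rho=1$). The restriction to $\mathbb{Z}_n$ and the function name `cyclic_irrep_power` are assumptions for illustration; the paper's analysis uses the octahedral group with matrix-valued irreps.

```python
import numpy as np

def cyclic_irrep_power(X):
    """X: (N, n_f, n) real array whose last axis is indexed by Z_n.
    Returns P of shape (n, N, n_f), where P[k] is the power in the k-th character."""
    n = X.shape[-1]
    g = np.arange(n)
    # Characters of Z_n, all of dimension d_rho = 1; rows index k, columns index g.
    chars = np.exp(-2j * np.pi * np.outer(g, g) / n)
    X_hat = np.sqrt(1.0 / n) * np.einsum("ijg,kg->kij", X, chars)
    return np.abs(X_hat) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3, 12))
P = cyclic_irrep_power(X)
# Acting with a group element (a cyclic shift along the group axis) leaves the power unchanged.
P_shifted = cyclic_irrep_power(np.roll(X, shift=4, axis=-1))
print(np.allclose(P, P_shifted))
```

With the $\sqrt{d_\rho/|G|}$ normalization, summing `P` over `k` recovers the squared norm of each feature fiber, which is the Parseval identity in this special case.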

## Appendix K Symmetry and Factorization Discovery Algorithm

The symmetry-discovery experiment of Section 2.4 of the main text scans a candidate library of finite groups, fits the $\star_G$ pipeline with each candidate, and selects the group that maximizes a combined accuracy / invariance score. Algorithm [5](https://arxiv.org/html/2605.20440#alg5) states the procedure precisely. The factorization-discovery experiment uses the same algorithm restricted to candidate groups of the form $\mathbb{Z}_a\times\mathbb{Z}_b$ with $ab=n_{\mathrm{group}}$.

Algorithm 5: Symmetry / factorization discovery via $\star_G$ score scan

1. Input: dataset $(\mathcal{X},y)$, candidate library $\mathcal{G}=\{G_1,\ldots,G_K\}$, mixing weight $\alpha\in[0,1]$ (default 0.7)
2. Output: ranked list $\{(G_k,\mathrm{score}_k)\}_{k=1}^{K}$
3. for $G_k\in\mathcal{G}$ do
4. construct $G_k$, $\mathcal{T}_{G_k}$, $F_{G_k}$ (cached)
5. $\Phi_k\leftarrow$ Algorithm [3](https://arxiv.org/html/2605.20440#alg3) $(\mathcal{X},G_k)$
6. split into train/val/test; standardize
7. $\lambda^{\star}\leftarrow\arg\min_{\lambda}\mathrm{MSE}_{\mathrm{val}}(\Phi_k,y;\lambda)$
8. $R^2_k\leftarrow$ ridge-regression $R^2$ on the validation fold
9. $\nu_k\leftarrow\mathrm{Var}_{g\in G_k}\,\hat{y}_k(g\cdot\mathcal{X}_{\mathrm{val}})$ $\triangleright$ rotation variance under $G_k$
10. end for
11. $\nu_{\max}\leftarrow\max_k\nu_k$
12. $\mathrm{score}_k\leftarrow\alpha\cdot R^2_k+(1-\alpha)\cdot(1-\nu_k/\nu_{\max})$
13. return $\{(G_k,\mathrm{score}_k)\}$ sorted descending by score
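The final scoring and ranking steps of Algorithm 5 can be sketched as follows. The candidate names and the $R^2$ and variance values are made-up placeholders, not results from the paper, and the guard for $\nu_{\max}=0$ is an added assumption that the algorithm does not specify.

```python
import numpy as np

def rank_candidates(names, r2, nu, alpha=0.7):
    """Combine validation R^2 and group variance into the mixed score and sort descending."""
    r2 = np.asarray(r2, dtype=float)
    nu = np.asarray(nu, dtype=float)
    nu_max = nu.max()
    # Assumption (not in the source): if every candidate is exactly invariant,
    # treat the invariance term as 1 instead of dividing by zero.
    invariance = 1.0 - nu / nu_max if nu_max > 0 else np.ones_like(nu)
    score = alpha * r2 + (1.0 - alpha) * invariance
    order = np.argsort(-score)
    return [(names[i], float(score[i])) for i in order]

# Hypothetical inputs for three candidate groups.
ranking = rank_candidates(["Z12", "Z6", "Z4"], r2=[0.95, 0.80, 0.60], nu=[0.0, 0.5, 1.0])
for name, score in ranking:
    print(name, round(score, 3))
```

Because $\nu_k$ is normalized by the largest variance in the library, the invariance term is relative: adding or removing a candidate can change every score, though a candidate with the best $R^2$ and zero variance always ranks first.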

## Appendix L Baseline Implementation Algorithms

This section gives explicit pseudocode for the two non-trivial baselines used in the main paper: the Augmented MLP (the strongest non-equivariant baseline) and the Neural $\star_G$ network (the equivariant non-linear baseline). The Standard MLP and Invariant MLP differ from the Augmented MLP only in their input representation (the raw frontal slice and $[\mathrm{mean},\mathrm{std},\min,\max]_g\,\mathcal{X}$, respectively) and in the absence of orbit augmentation; both follow the training loop of Algorithm [6](https://arxiv.org/html/2605.20440#alg6) with the augmentation steps removed.

Algorithm 6: Augmented MLP training

1. Input: training tensor $\mathcal{X}^{\mathrm{tr}}\in\mathbb{R}^{n\times n_f\times|G|}$, target $y^{\mathrm{tr}}\in\mathbb{R}^{n}$, group $G$, validation $(\mathcal{X}^{\mathrm{va}},y^{\mathrm{va}})$, hidden widths $h=[64,32]$, learning rate $\eta$, max epochs $E$, batch size $B$, patience $P$
2. Output: trained weights $W=\{W^{(1)},W^{(2)},W^{(3)}\}$, biases $b$
3. $\tilde{X}\leftarrow\mathrm{reshape}(\mathrm{permute}(\mathcal{X}^{\mathrm{tr}},[1,3,2]),[\,n|G|,n_f\,])$ $\triangleright$ stack all $|G|$ orbit copies
4. $\tilde{y}\leftarrow\mathrm{repmat}(y^{\mathrm{tr}},|G|,1)$ $\triangleright$ labels are $G$-invariant; replicate
5. $(\mu,s)\leftarrow(\mathrm{mean}(\tilde{X}),\mathrm{std}(\tilde{X})+10^{-8})$ $\triangleright$ $z$-norm on augmented set
6. $\tilde{X}\leftarrow(\tilde{X}-\mu)/s$
7. initialize $W^{(\ell)}\sim\mathcal{N}(0,2/\mathrm{fan\_in}_{\ell})$ (He init); $b^{(\ell)}\leftarrow 0$
8. Adam state: $m^{(\ell)},v^{(\ell)}\leftarrow 0$, step counter $t\leftarrow 0$
9. best-$W\leftarrow W$; $\mathrm{wait}\leftarrow 0$; $\mathrm{best\_val}\leftarrow+\infty$
10. for $\mathrm{epoch}=1,\ldots,E$ do
11. shuffle $\tilde{X}$
12. for each minibatch of size $B$ do
13. $t\leftarrow t+1$
14. forward: $A^{(0)}\leftarrow X_{\mathrm{batch}}$; $A^{(\ell)}\leftarrow\mathrm{ReLU}(W^{(\ell)}A^{(\ell-1)}+b^{(\ell)})$ for $\ell<L$; $A^{(L)}\leftarrow W^{(L)}A^{(L-1)}+b^{(L)}$
15. $\mathcal{L}\leftarrow\tfrac{1}{B}\|A^{(L)}-y_{\mathrm{batch}}\|^{2}$
16. backprop $\nabla_{W}\mathcal{L},\nabla_{b}\mathcal{L}$
17. Adam update: $m,v$ exponential moving averages with $\beta_1=0.9$, $\beta_2=0.999$, $\varepsilon=10^{-8}$
18. $W^{(\ell)}\leftarrow W^{(\ell)}-\eta\cdot\hat{m}^{(\ell)}/(\sqrt{\hat{v}^{(\ell)}}+\varepsilon)$
19. end for
20. $\mathcal{L}_{\mathrm{val}}\leftarrow$ MSE on $((\mathcal{X}^{\mathrm{va}}(:,:,e)-\mu)/s,\;y^{\mathrm{va}})$
21. if $\mathcal{L}_{\mathrm{val}}<\mathrm{best\_val}$ then
22. $\mathrm{best\_val}\leftarrow\mathcal{L}_{\mathrm{val}}$; best-$W\leftarrow W$; $\mathrm{wait}\leftarrow 0$
23. else
24. $\mathrm{wait}\leftarrow\mathrm{wait}+1$; if $\mathrm{wait}\geq P$ break
25. end if
26. end for
27. return best-$W$, best-$b$
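The orbit-stacking and normalization steps of Algorithm 6 are written in MATLAB-style notation, where `reshape` is column-major. A NumPy sketch of the same data layout is given below; the function name `stack_orbit_copies` is hypothetical, and moving the group axis to the front is one way (an assumption, not the authors' code) to keep each stacked row aligned with its replicated label under NumPy's row-major `reshape`.

```python
import numpy as np

def stack_orbit_copies(X, y):
    """X: (n, n_f, |G|), y: (n,). Returns standardized (n*|G|, n_f) inputs,
    matching labels, and the normalization statistics."""
    n, n_f, order = X.shape
    # Group axis first: a row-major reshape then lists all samples for g=0, then g=1, ...
    X_aug = np.transpose(X, (2, 0, 1)).reshape(order * n, n_f)
    y_aug = np.tile(y, order)          # labels repeated in the same block order
    mu = X_aug.mean(axis=0)
    s = X_aug.std(axis=0) + 1e-8       # small constant avoids division by zero
    return (X_aug - mu) / s, y_aug, mu, s

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4, 3))         # n=7 samples, n_f=4 features, |G|=3
y = rng.normal(size=7)
X_aug, y_aug, mu, s = stack_orbit_copies(X, y)
print(X_aug.shape, y_aug.shape)
```

A naive `X.reshape(n * order, n_f)` without the transpose would interleave features and group elements, silently pairing rows with the wrong labels, so the row/label alignment is worth checking explicitly when porting the pseudocode.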

Algorithm 7: Neural $\star_G$ forward pass and gradient

1. Input: batch $\mathcal{X}\in\mathbb{R}^{N\times n_f\times|G|}$, $\star_G$-weights $\{W^{(\ell)}\}_{\ell=1}^{L}$ with $W^{(\ell)}\in\mathbb{R}^{n_{\ell+1}\times n_{\ell}\times|G|}$, biases $\{b^{(\ell)}\}\in\mathbb{R}^{n_{\ell+1}\times 1\times|G|}$, group $G$
2. Output: scalar predictions $\hat{y}\in\mathbb{R}^{N}$
3. $\mathcal{A}^{(0)}\leftarrow\mathcal{X}$
4. for $\ell=1,\ldots,L$ do
5. $\mathcal{Z}^{(\ell)}\leftarrow W^{(\ell)}\star_G\mathcal{A}^{(\ell-1)}+b^{(\ell)}$ $\triangleright$ Algorithm [1](https://arxiv.org/html/2605.20440#alg1)
6. if $\ell<L$ then
7. $\mathcal{A}^{(\ell)}\leftarrow\mathrm{ReLU}(\mathcal{Z}^{(\ell)})$
8. else
9. $\mathcal{A}^{(\ell)}\leftarrow\mathcal{Z}^{(\ell)}$ $\triangleright$ linear output
10. end if
11. end for
12. $\hat{y}_i\leftarrow\tfrac{1}{n_L|G|}\sum_{j,g}\mathcal{A}^{(L)}(i,j,g)$ $\triangleright$ $G$-invariant pooling
13. return $\hat{y}$
14. Training (300 epochs, Adam $\eta=0.003$, batch 32, patience 20): backprop through Algorithm [1](https://arxiv.org/html/2605.20440#alg1) layer-by-layer; equivariance is preserved exactly to floating-point precision because each step factors through the per-irrep block multiplication.
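A minimal sketch of the forward pass of Algorithm 7 for the special case $G=\mathbb{Z}_n$ follows. It assumes that for a cyclic group the $\star_G$ product reduces to a matrix product combined with circular convolution along the group axis, evaluated here by FFT; this convention, the activation layout (features × samples × group), and the omission of biases are all simplifying assumptions rather than the paper's Algorithm 1.

```python
import numpy as np

def star_cyclic(W, A):
    """W: (p, m, n), A: (m, q, n). Returns (p, q, n) with
    out[i, j, g] = sum_k sum_a W[i, k, a] * A[k, j, (g - a) mod n]."""
    W_hat = np.fft.fft(W, axis=-1)
    A_hat = np.fft.fft(A, axis=-1)
    out_hat = np.einsum("ikf,kjf->ijf", W_hat, A_hat)   # one matrix product per frequency
    return np.fft.ifft(out_hat, axis=-1).real

def forward(X, weights):
    """X: (N, n_f, n); weights: list of (n_out, n_in, n) arrays. Returns (N,) predictions."""
    A = np.transpose(X, (1, 0, 2))            # features as rows, samples as columns
    for layer, W in enumerate(weights):
        Z = star_cyclic(W, A)                 # biases omitted in this sketch
        A = np.maximum(Z, 0.0) if layer < len(weights) - 1 else Z
    return A.mean(axis=(0, 2))                # pool over output features and group elements

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 6))
weights = [rng.normal(size=(5, 3, 6)), rng.normal(size=(2, 5, 6))]
y_hat = forward(X, weights)
# Shifting the input along the group axis should leave the pooled prediction unchanged.
y_hat_shifted = forward(np.roll(X, 2, axis=-1), weights)
print(np.allclose(y_hat, y_hat_shifted))
```

The pointwise ReLU commutes with cyclic shifts of the group axis, and the convolution is shift-equivariant, so the pooled output is invariant in this setting.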

### L.1 ENN baselines: SchNet, e3nn, MACE

The three ENN baselines used in Section 2.7 are not reimplemented from scratch: each is a published reference implementation, executed on the same train/val/test splits and the same seeds as the $\star_G$ and MLP baselines. The configuration we used is recorded explicitly so that the comparison is reproducible.

- SchNet, Schütt et al. ([2017](https://arxiv.org/html/2605.20440#bib.bib9)). Reference implementation: `schnetpack` v2.0.4 (pinned in the repository's `requirements.txt`). Configuration: 128 atom-basis features, 6 interaction blocks, 20-Gaussian radial basis, cosine cutoff at 5.0 Å, MSE loss with L1 monitoring, Adam $\eta=5\times10^{-4}$, batch 64, max 200 epochs, early-stop patience 20. We run the standard `spk.datasets.QM9` loader with `remove_uncharacterized=True` to match PyG's 130,831-molecule subset, the `ASENeighborList(cutoff=5.0)` transform, and a per-target $z$-norm via `RemoveOffsets`/`AddOffsets`. Scalar targets only ($\mu$, $\alpha$, gap, ZPVE).
- e3nn-based SE(3)-equivariant model, Thomas et al. ([2018](https://arxiv.org/html/2605.20440#bib.bib6)); Geiger and Smidt ([2022](https://arxiv.org/html/2605.20440#bib.bib22)). A compact equivariant message-passing network built directly from `e3nn` v0.5.4 primitives. Three layers, hidden irreps `32x0e+16x1o+8x2e`, edge spherical harmonics `1x0e+1x1o+1x2e`, RBF with 16 Gaussians on the 0–5 Å cutoff, gated equivariant non-linearities, sum-pool over atoms, `FullyConnectedTensorProduct` head with output irreps matched to target rank (`1x0e` for scalars, `1x1o` for the $\boldsymbol{\mu}$ vector, `1x2e+1x0e` for the $\alpha$ tensor). Adam $\eta=5\times10^{-4}$, batch 32, 200 epochs, patience 20. Used for tensor-rank-matched comparison rather than as the SOTA ENN target.
- MACE, Batatia et al. ([2022](https://arxiv.org/html/2605.20440#bib.bib23)). Reference implementation: `mace-torch` v0.3.15 (pinned in `requirements.txt`). Configuration: `ScaleShiftMACE` with $r_{\max}=5.0$ Å, 8 Bessel radial features, 5-th order polynomial cutoff, $\ell_{\max}=3$, correlation 3, two interaction blocks (`RealAgnosticInteractionBlock` first; `RealAgnosticResidualInteractionBlock` second), hidden irreps `128x0e+128x1o`, MLP irreps `16x0e`, 5 elements (H/C/N/O/F), per-target shift/scale $(\bar{y},\mathrm{std}\,y)$. Optimizer: Adam (amsgrad), $\eta=10^{-3}$, batch 32, `ReduceLROnPlateau` (factor 0.5, patience 15), max 200 epochs, early-stop patience 25 on validation MSE. Total parameter count: 945,168. Scalar targets only.

*Implementation notes.* Three non-trivial integration adjustments were required that may be useful to anyone reproducing the comparison. (i) In `schnetpack`'s `ModelOutput`, every metric must be an `nn.Module` (lambda functions are silently rejected by `nn.ModuleDict`); we use only `L1Loss` as the metric and recompute RMSE/$R^2$ ourselves from saved test predictions. (ii) In `mace-torch` $\geq 0.3.10$, `interaction_cls` and `interaction_cls_first` are mandatory, and `hidden_irreps`/`MLP_irreps` must be `o3.Irreps` objects, not strings; per-molecule `AtomicData` construction requires a single shared `z_table` for the full element set (H/C/N/O/F) so that the per-graph `node_attrs` have uniform width when batched. (iii) Loading `e3nn` 0.4.4's pickled `constants.pt` (transitively imported by `mace-torch`) fails under the default `weights_only=True` of PyTorch $\geq 2.6$; we set `TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1` for the comparison runs only.

## Appendix M End-to-End Workflow

The complete pipeline used to produce every numerical result in this paper is summarized in Algorithm [8](https://arxiv.org/html/2605.20440#alg8). The pipeline is identical across the synthetic, QM9, product-group, symmetry-discovery, and Wigner–Eckart experiments; only the group $G$, the featurization $\phi:\mathrm{molecule}\mapsto\mathcal{X}$, and the regression head differ.

Algorithm 8: End-to-end $\star_G$-SVD + ridge pipeline

1. Input: dataset $\{(\mathrm{mol}_i,y_i)\}_{i=1}^{N}$, group $G$, featurizer $\phi$, ridge grid $\Lambda$
2. Output: trained predictor $\hat{f}$, test scores
3. precompute $\mathcal{T}_G$, $F_G$, irrep dimensions $\{d_\rho\}$
4. for $i=1,\ldots,N$ do
5. $\mathcal{X}_i\leftarrow\phi(\mathrm{mol}_i;G)$ $\triangleright$ tensorial featurization
6. end for
7. assemble $\mathcal{X}\in\mathbb{R}^{N\times n_f\times|G|}$
8. split $\mathcal{X},y$ into train/val/test (70/15/15)
9. $(\Phi_{\mathrm{tr}},\Theta)\leftarrow$ Algorithm [3](https://arxiv.org/html/2605.20440#alg3) $(\mathcal{X}_{\mathrm{tr}},G)$
10. $\Phi_{\mathrm{va}},\Phi_{\mathrm{te}}\leftarrow$ Algorithm [3](https://arxiv.org/html/2605.20440#alg3) $(\cdot,G;\Theta)$
11. $\lambda^{\star}\leftarrow\arg\min_{\lambda\in\Lambda}\mathrm{MSE}_{\mathrm{val}}(\Phi_{\mathrm{va}},y_{\mathrm{va}};\lambda)$
12. $w\leftarrow(\Phi_{\mathrm{tr}}^{\top}\Phi_{\mathrm{tr}}+\lambda^{\star}I)^{-1}\Phi_{\mathrm{tr}}^{\top}y_{\mathrm{tr}}$
13. $R^2_{\mathrm{te}}\leftarrow 1-\mathrm{SS}_{\mathrm{res}}(\Phi_{\mathrm{te}}w,y_{\mathrm{te}})/\mathrm{SS}_{\mathrm{tot}}(y_{\mathrm{te}})$
14. $\nu\leftarrow\mathrm{Var}_{g\in G}\,\hat{y}(g\cdot\mathcal{X}_{\mathrm{te}})$ $\triangleright$ rotation-variance audit
15. return $\hat{f}:\mathcal{X}\mapsto\mathrm{Algorithm}$ [3](https://arxiv.org/html/2605.20440#alg3)$(\mathcal{X};\Theta)\cdot w$, $R^2_{\mathrm{te}}$, $\nu$
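The ridge-selection and scoring steps of Algorithm 8 have a closed form that is easy to sketch. The example below uses synthetic Gaussian features in place of the $\star_G$ features of Algorithm 3, and the helper names (`ridge_fit`, `r2_score`, `select_and_evaluate`) and the grid values are illustrative assumptions.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Closed-form ridge weights: (Phi^T Phi + lam I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_and_evaluate(Phi_tr, y_tr, Phi_va, y_va, Phi_te, y_te, grid):
    """Pick lambda by validation MSE, refit on train, report test R^2."""
    val_mse = [np.mean((Phi_va @ ridge_fit(Phi_tr, y_tr, lam) - y_va) ** 2) for lam in grid]
    lam_star = grid[int(np.argmin(val_mse))]
    w = ridge_fit(Phi_tr, y_tr, lam_star)
    return lam_star, w, r2_score(y_te, Phi_te @ w)

# Synthetic stand-in for the feature matrix (assumption: not the paper's data).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 5))
y = Phi @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
idx = rng.permutation(200)
tr, va, te = idx[:140], idx[140:170], idx[170:]      # 70/15/15 split
grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
lam_star, w, r2_te = select_and_evaluate(Phi[tr], y[tr], Phi[va], y[va], Phi[te], y[te], grid)
print(lam_star, round(r2_te, 3))
```

Using `np.linalg.solve` rather than forming the explicit inverse is a numerical-stability choice; it computes the same weights as the formula in the pseudocode.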

## Appendix N Formal Verification in Lean 4

All core algebraic results in this paper have been machine-verified in the Lean 4 proof assistant, de Moura and Ullrich ([2021](https://arxiv.org/html/2605.20440#bib.bib18)), using the Mathlib library, The mathlib Community ([2020](https://arxiv.org/html/2605.20440#bib.bib19)). The formalization comprises 600 lines of Lean 4, with zero unresolved proof obligations (`sorry`), providing a certificate of correctness for every theorem, lemma, and corollary in the main text and supplementary information.

### N.1 Architecture

The formalization is organized into six modules mirroring the paper’s logical structure:

### N.2 Axiom Budget

Five standard results from linear algebra and finite\-group harmonic analysis are axiomatized because they are not yet available in Mathlib\. Every other statement is derived from first principles\.

### N.3 Key Proof Techniques

##### Associativity of $\star_G$ (Proposition 4.2(i)).

Rather than fragile nested sum-exchange calls (`Finset.sum_comm`), we define an explicit `Equiv` on the 4-tuple product type $\mathrm{Fin}\,p\times(G\times(\mathrm{Fin}\,m\times G))$ that simultaneously permutes components and applies the bijection $b\mapsto a^{-1}b$. A single call to `Fintype.sum_equiv` then completes the proof, with the group-element arithmetic handled by the `group` tactic.
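The reindexing can be written out by hand. The derivation below assumes the product convention $(\mathcal{A}\star_G\mathcal{B})(i,j,g)=\sum_{k}\sum_{a\in G}\mathcal{A}(i,k,a)\,\mathcal{B}(k,j,a^{-1}g)$, which is consistent with the bijection $b\mapsto a^{-1}b$ named above but is not restated in this appendix; the index names $k,l,a,b,c$ are chosen here for illustration.

```latex
\begin{aligned}
\big((\mathcal{A}\star_G\mathcal{B})\star_G\mathcal{C}\big)(i,j,g)
 &= \sum_{l}\sum_{b\in G}\Big[\sum_{k}\sum_{a\in G}
    \mathcal{A}(i,k,a)\,\mathcal{B}(k,l,a^{-1}b)\Big]\,\mathcal{C}(l,j,b^{-1}g),\\
\big(\mathcal{A}\star_G(\mathcal{B}\star_G\mathcal{C})\big)(i,j,g)
 &= \sum_{k}\sum_{a\in G}\mathcal{A}(i,k,a)\Big[\sum_{l}\sum_{c\in G}
    \mathcal{B}(k,l,c)\,\mathcal{C}(l,j,c^{-1}a^{-1}g)\Big].
\end{aligned}
% For fixed a, the substitution c = a^{-1} b is a bijection of G, and
% c^{-1} a^{-1} g = b^{-1} a a^{-1} g = b^{-1} g,
% so both lines are the same sum over the 4-tuple (k, a, l, b).
```

Under this convention the two bracketings differ only by a relabeling of a single finite sum over $(k,a,l,b)$, which is why one bijection on the product index type suffices.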

##### Kronecker product of irreps \(Theorem 2\(iii\)\)\.

The tensor-product representation $\rho_1\otimes\rho_2$ is defined entry-wise via `finProdFinEquiv.symm`, mapping $\mathrm{Fin}(d_1d_2)$ indices to pairs $\mathrm{Fin}\,d_1\times\mathrm{Fin}\,d_2$. A `sum_split` helper converts sums over $\mathrm{Fin}(d_1d_2)$ into double sums, after which the `is_hom` and `unitary` proofs factor naturally into products of single sums via $\rho_i.\mathrm{is\_hom}$ and $\rho_i.\mathrm{unitary}$.
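The two properties proved in Lean, the homomorphism law and unitarity of the Kronecker-product representation, can be checked numerically on a small example. The sketch below uses real 2-dimensional rotation representations of $\mathbb{Z}_4$ and $\mathbb{Z}_3$; the helper names `rotation_rep` and `kron_rep` are hypothetical, and the resulting 4-dimensional representation is not claimed to be irreducible.

```python
import numpy as np

def rotation_rep(n):
    """Real 2-D orthogonal representation of Z_n: k -> rotation by 2*pi*k/n."""
    def rho(k):
        t = 2.0 * np.pi * k / n
        return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return rho

def kron_rep(rho1, rho2):
    """Representation of G1 x G2 given by (g1, g2) -> kron(rho1(g1), rho2(g2))."""
    return lambda g1, g2: np.kron(rho1(g1), rho2(g2))

a, b = 4, 3
rho = kron_rep(rotation_rep(a), rotation_rep(b))
g, h = (1, 2), (3, 2)
gh = ((g[0] + h[0]) % a, (g[1] + h[1]) % b)       # product in Z_4 x Z_3
is_hom = np.allclose(rho(*gh), rho(*g) @ rho(*h))
is_orthogonal = np.allclose(rho(*g).T @ rho(*g), np.eye(4))
print(is_hom, is_orthogonal)
```

Both checks follow from the mixed-product property $(A\otimes B)(C\otimes D)=AC\otimes BD$, which is the matrix-level counterpart of splitting a sum over $\mathrm{Fin}(d_1d_2)$ into a double sum.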

##### Fourier power invariance \(Corollary 7\.2\(ii\)\)\.

The `fourierBlock_leftAction` lemma shows that the group action multiplies each Fourier block by $(I_\ell\otimes\rho(g))$. An `orthogonal_preserves_sum_sq` lemma proves $\sum_s\big(\sum_{s'}R_{s,s'}v_{s'}\big)^2=\sum_s v_s^2$ when $R^\top R=I$, by expanding squares, exchanging sums, and applying orthogonality.
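Written out, the three steps named for `orthogonal_preserves_sum_sq` (expand, exchange, apply orthogonality) read as follows; the auxiliary index $s''$ is introduced here for the expansion.

```latex
\begin{aligned}
\sum_{s}\Big(\sum_{s'}R_{s,s'}v_{s'}\Big)^{2}
 &= \sum_{s}\sum_{s'}\sum_{s''}R_{s,s'}R_{s,s''}\,v_{s'}v_{s''}
    && \text{(expand the square)}\\
 &= \sum_{s'}\sum_{s''}v_{s'}v_{s''}\sum_{s}R_{s,s'}R_{s,s''}
    && \text{(exchange sums)}\\
 &= \sum_{s'}\sum_{s''}v_{s'}v_{s''}\,(R^{\top}R)_{s',s''}
  = \sum_{s'}\sum_{s''}v_{s'}v_{s''}\,\delta_{s',s''}
  = \sum_{s'}v_{s'}^{2}
    && \text{(orthogonality)}.
\end{aligned}
```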

##### Eckart–Young for $\star_G$ (Theorem 1).

The optimal rank-$k$ approximation is defined in the Fourier domain via `fourier_surjective`: its Fourier block at each irrep $\rho$ is, by construction, the best rank-$k$ matrix approximation of $\hat{\mathcal{A}}(:,:,\rho)$ (obtained from `matrix_best_rank_k_approx`). Per-irrep optimality is then a direct consequence of the classical matrix Eckart–Young theorem. The global bound follows by applying `parseval_group` to decompose the Frobenius error into per-irrep terms, multiplying each per-irrep inequality by the positive Parseval weight $d_\rho/|G|$, and summing via `Finset.sum_le_sum`.
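The structure of this argument, per-block truncation plus Parseval, can be illustrated numerically for $G=\mathbb{Z}_n$, where the group Fourier transform is the DFT along the third axis. This is a sketch under that assumption, not the paper's $\star_G$-SVD implementation; the unitary normalization `norm="ortho"` absorbs the Parseval weights, and `cyclic_truncate` is a hypothetical helper name.

```python
import numpy as np

def cyclic_truncate(A, k):
    """Rank-k truncation of every Fourier-domain frontal slice of A, shape (m, p, n).
    Returns the real-space approximation and the discarded singular-value energy."""
    A_hat = np.fft.fft(A, axis=-1, norm="ortho")      # unitary DFT along the group axis
    B_hat = np.empty_like(A_hat)
    tail_energy = 0.0
    for f in range(A.shape[-1]):
        U, s, Vh = np.linalg.svd(A_hat[:, :, f], full_matrices=False)
        B_hat[:, :, f] = (U[:, :k] * s[:k]) @ Vh[:k]  # best rank-k matrix for this block
        tail_energy += np.sum(s[k:] ** 2)
    B = np.fft.ifft(B_hat, axis=-1, norm="ortho")
    return B.real, tail_energy

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3, 6))
B, tail_energy = cyclic_truncate(A, k=1)
# Parseval: the real-space Frobenius error equals the summed per-block tail energy.
error = np.sum((A - B) ** 2)
print(np.isclose(error, tail_energy))
```

The equality of `error` and `tail_energy` is the Parseval decomposition; the matrix Eckart–Young theorem then says no other choice of rank-$k$ blocks can make any per-block term smaller.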

##### Wigner–Eckart selection rules \(§2\.5\)\.

The octahedral group's five irreps are encoded as an inductive type `OctIrrep` with decidable equality. Clebsch–Gordan multiplicities are hardcoded from the standard character table and verified by `rfl` (definitional equality). The three selection rules are proved as concrete multiplicity computations: (i) $A_1\otimes\rho=\rho$ for all $\rho$; (ii) $T_1\otimes T_1$ contains $A_1$; (iii) $\mathrm{Sym}^2(T_1)$ has zero $T_1$ multiplicity. The dimension formula $\sum d_\rho^2=24$ is verified by `native_decide`.
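The same three multiplicity computations can be reproduced outside Lean from the standard character table of the rotation group $O$. The sketch below is an independent check, not the paper's formalization; the class ordering ($E$, $8C_3$, $3C_2$, $6C_4$, $6C_2'$) and the helper names `multiplicity` and `sym2` are choices made here.

```python
import numpy as np

class_sizes = np.array([1, 8, 3, 6, 6])            # E, 8C3, 3C2, 6C4, 6C2'
chars = {
    "A1": np.array([1, 1, 1, 1, 1]),
    "A2": np.array([1, 1, 1, -1, -1]),
    "E":  np.array([2, -1, 2, 0, 0]),
    "T1": np.array([3, 0, -1, 1, -1]),
    "T2": np.array([3, 0, -1, -1, 1]),
}
# Index of the class containing g^2 for g in each class:
# E -> E, C3 -> C3, C2 -> E, C4 -> C2, C2' -> E.
square_class = np.array([0, 1, 0, 2, 0])

def multiplicity(chi, irrep):
    """Number of times `irrep` occurs in the representation with character chi."""
    return int(round(np.sum(class_sizes * chi * chars[irrep]) / class_sizes.sum()))

def sym2(chi):
    """Character of the symmetric square: (chi(g)^2 + chi(g^2)) / 2."""
    return (chi ** 2 + chi[square_class]) // 2

rule_i = all(
    multiplicity(chars["A1"] * chars[r], s) == (1 if r == s else 0)
    for r in chars for s in chars
)
rule_ii = multiplicity(chars["T1"] * chars["T1"], "A1")
rule_iii = multiplicity(sym2(chars["T1"]), "T1")
dim_sum = sum(int(c[0]) ** 2 for c in chars.values())
print(rule_i, rule_ii, rule_iii, dim_sum)
```

Decomposing `sym2(chars["T1"])` fully gives one copy each of $A_1$, $E$, and $T_2$, which matches the $l=0\oplus l=2$ content of a symmetric rank-2 tensor restricted to $O$.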

### N.4 Verification Status

The formalization achieves:

- Zero `sorry` (unresolved proof obligations) across all six modules.
- Five declared axioms, all corresponding to standard textbook results not yet available in Mathlib. Of these, four are transitively used in proofs of theorems in the paper:
  - `matrix_best_rank_k_approx`, `parseval_group`, and `fourier_surjective`, used in Theorem [E.2](https://arxiv.org/html/2605.20440#A5.Thmtheorem2);
  - `fourier_multiplicative`, used in Theorem [F.1](https://arxiv.org/html/2605.20440#A6.Thmtheorem1).

  The fifth axiom, `fourier_injective`, is declared but not currently invoked; it is retained for completeness, since closing it would simultaneously close `fourier_surjective` via Peter–Weyl.
- Complete coverage of Theorems [E.2](https://arxiv.org/html/2605.20440#A5.Thmtheorem2) and [F.1](https://arxiv.org/html/2605.20440#A6.Thmtheorem1), the algebraic identities of Proposition [B.2](https://arxiv.org/html/2605.20440#A2.Thmtheorem2) and the $\star_G$-product properties, equivariance and Frobenius/Fourier-power invariance, the Kronecker product construction for product-group irreps, and the Wigner–Eckart selection rules from §2.5.
- All algebraic theorems (associativity, identity, distributivity, transpose, equivariance, Frobenius and per-irrep Fourier-power invariance) depend solely on Lean's three core axioms (`propext`, `Classical.choice`, `Quot.sound`); they introduce no project-level axioms. The `StarG/Audit.lean` module exhibits the full `#print axioms` certificate for each theorem.
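The axiom audit in the last item relies on Lean's built-in `#print axioms` command. A toy illustration is given below; the theorem `demo_add_comm` is hypothetical and is not part of the paper's `StarG` development.

```lean
-- Hypothetical toy theorem, unrelated to the StarG library.
theorem demo_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Lists every axiom that the proof of `demo_add_comm` transitively depends on.
#print axioms demo_add_comm
```

Running the same command on a theorem that uses a project-level `axiom` declaration would list that axiom by name, which is how a dependency such as `parseval_group` becomes visible in an audit.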

To our knowledge, this is the first machine\-verified proof of an Eckart–Young\-type optimality theorem for symmetry\-preserving tensor approximation\.

## References

- Kolda and Bader (2009) Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. *SIAM Review*, 51:455–500, 2009.
- Sidiropoulos et al. (2017) Nicholas D. Sidiropoulos et al. Tensor decomposition for signal processing and machine learning. *IEEE Trans. Signal Process.*, 65:3551–3582, 2017.
- Noether (1918) Emmy Noether. Invariante Variationsprobleme. *Nachr. Ges. Wiss. Göttingen*, pages 235–257, 1918.
- Bronstein et al. (2021) Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. *arXiv:2104.13478*, 2021.
- Cohen and Welling (2016) Taco Cohen and Max Welling. Group equivariant convolutional networks. In *ICML*, 2016.
- Thomas et al. (2018) Nathaniel Thomas et al. Tensor field networks. *arXiv:1802.08219*, 2018.
- Fuchs et al. (2020) Fabian Fuchs et al. SE(3)-transformers. In *NeurIPS*, 2020.
- Batzner et al. (2022) Simon Batzner et al. E(3)-equivariant graph neural networks for interatomic potentials. *Nat. Commun.*, 13:2453, 2022.
- Schütt et al. (2017) Kristof T. Schütt et al. SchNet. In *NeurIPS*, 2017.
- Jumper et al. (2021) John Jumper et al. Highly accurate protein structure prediction with AlphaFold. *Nature*, 596:583–589, 2021.
- Kilmer et al. (2021) Misha E. Kilmer, Lior Horesh, Haim Avron, and Elizabeth Newman. Tensor-tensor products for optimal representation and compression. *PNAS*, 118:e2015851118, 2021.
- Kernfeld et al. (2015) Eric Kernfeld, Misha Kilmer, and Shuchin Aeron. Tensor–tensor products with invertible linear transforms. *Linear Algebra Appl.*, 485:545–570, 2015.
- Serre (1977) Jean-Pierre Serre. *Linear Representations of Finite Groups*. Springer, 1977.
- Peter and Weyl (1927) Fritz Peter and Hermann Weyl. Die vollständigkeit der primitiven darstellungen. *Math. Ann.*, 97:737–755, 1927.
- Eckart and Young (1936) Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. *Psychometrika*, 1:211–218, 1936.
- de Silva and Lim (2008) Vin de Silva and Lek-Heng Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. *SIAM J. Matrix Anal. Appl.*, 30:1084–1127, 2008.
- Ramakrishnan et al. (2014) Raghunathan Ramakrishnan et al. Quantum chemistry structures and properties of 134 thousand molecules. *Sci. Data*, 1:140022, 2014. doi:10.1038/sdata.2014.22.
- de Moura and Ullrich (2021) Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language. In *CADE*, 2021.
- The mathlib Community (2020) The mathlib Community. The Lean mathematical library. [https://github.com/leanprover-community/mathlib4](https://github.com/leanprover-community/mathlib4), 2020.
- Mor and Avron (2025) Uria Mor and Haim Avron. Quasi tubal tensor algebra for separable groups. *arXiv:2504.16231 preprint*, 2025.
- Mor (2026) Uria Mor. Sufficient and necessary conditions for an Eckart–Young theorem. *arXiv:2512.24405 preprint*, 2026.
- Geiger and Smidt (2022) Mario Geiger and Tess Smidt. e3nn: Euclidean neural networks. [https://github.com/e3nn/e3nn](https://github.com/e3nn/e3nn), 2022. arXiv:2207.09453.
- Batatia et al. (2022) Ilyes Batatia, David Peter Kovacs, Gregor N. C. Simm, Christoph Ortner, and Gábor Csányi. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In *Advances in Neural Information Processing Systems (NeurIPS)*, 2022.
