A homotopy-type-theoretic generalization of neurosymbolic inference
Summary
This paper presents a homotopy-type-theoretic generalization of neurosymbolic inference that preserves symmetry information and proof multiplicity, showing that this framework recovers classical inference when symmetries are trivial and yields shortcut-aware concept posteriors computable in closed form, with practical improvements on reasoning-shortcut benchmarks.
View Cached Full Text
Cached at: 06/17/26, 05:38 AM
# A homotopy-type-theoretic generalization of neurosymbolic inference
Source: [https://arxiv.org/html/2606.17851](https://arxiv.org/html/2606.17851)
\\jmlrproceedings\\jmlrvolume\\jmlryear\\jmlrpages
Electrical and Mathematical Sciences & Engineering DivisionKing Abdullah University of Science and Technology4700 KAUST23955ThuwalSaudi Arabia; KAUST Center of Excellence for Smart Health \(KCSH\); KAUST Center of Excellence for Generative AI\\NameRobert Hoehndorf\\Emailrobert\.hoehndorf@kaust\.edu\.sa \\addrComputerElectrical and Mathematical Sciences & Engineering DivisionKing Abdullah University of Science and Technology4700 KAUST23955ThuwalSaudi Arabia; KAUST Center of Excellence for Smart Health \(KCSH\); KAUST Center of Excellence for Generative AI
###### Abstract
A wide range of neurosymbolic \(NeSy\) systems compute one functional: a belief\-weighted sum of a logical quantity over a space ofσ\\sigma\-structures, of which weighted model counting, fuzzy logic, and probabilistic logic are special cases\. This account is built on*sets*, and a set deliberately forgets two things that are important for NeSy: when twoσ\\sigma\-structures are the same up to a symmetry of the theory, and how many distinct proofs witness a query\. Replacing the underlying sets by*types*, in the sense of homotopy type theory, preserves this information, and turns this functional into a belief\-weighted homotopy cardinality, a notion of size that counts each object in inverse proportion to its symmetries\. We develop the framework from scratch for NeSy systems, prove a conservativity theorem that recovers the classical functional when symmetries are trivial, and show that the symmetry our framework exposes is exactly the one behind reasoning shortcuts\. The payoff is concrete: the shortcut\-aware concept posterior that recent methods reach by ensembling or expressive density estimation is the only symmetry\-invariant point of the confusion\-set simplex, computable in closed form by averaging a single model over the symmetry group\. On MNIST reasoning\-shortcut benchmarks this single\-model wrapper is better calibrated than a diversity\-trained ensemble, while leaving label accuracy and identifiable concepts untouched\. Code is freely available at\\urlhttps://github\.com/bio\-ontology\-research\-group/hott\-nesy\.
## 1Introduction
Much of neurosymbolic AI can be read as a single recipe: a neural network proposes weights over possibilities, a logical theory says which possibilities are admissible, and inference aggregates the weights of the admissible ones\(desmet2025defining\)\. Fixing a spaceΩ\\Omegaof*σ\\sigma\-structures*\(assignments of values to the symbols of a logical language\), a*logic function*ℓ\\ellthat assigns a value to a sentenceφ\\varphiin aσ\\sigma\-structure, and a neural*belief*bθb\_\{\\theta\}overσ\\sigma\-structures, neurosymbolic inference can be defined as the belief\-weighted aggregate
Fθ\(φ\)=∫Ωℓ\(φ,ω\)bθ\(ω\)dm\(ω\)\.F\_\{\\theta\}\(\\varphi\)\\;=\\;\\int\_\{\\Omega\}\\ell\(\\varphi,\\omega\)\\,b\_\{\\theta\}\(\\omega\)\\,\\mathrm\{d\}m\(\\omega\)\.\(1\)With a counting measure this is weighted model counting \(WMC\); over a continuous domain it is weighted model integration\. Through the lens of algebraic model counting\(kimmig2017algebraic\), varying the arithmetic used to combine values recovers probabilistic, fuzzy, tropical \(MaxSAT/Viterbi\), and provenance inference; varying the belief recovers the semantic loss\(xu2018semantic\), DeepProbLog\(manhaeve2018deepproblog\), logic tensor networks\(badreddine2022logic\), Scallop\(huang2021scallop\), and more\. Equation \([1](https://arxiv.org/html/2606.17851#S1.E1)\) is therefore a common denominator for large parts of the NeSy field\.
A space ofσ\\sigma\-structures is a*set*, and a set is a deliberately forgetful object: it records which elements it contains and nothing else – neither*\(i\)*that twoσ\\sigma\-structures are “the same up to a symmetry” of the theory, nor*\(ii\)*how many distinct derivations witness that a query holds\. Both are central to relational and structured NeSy\. If two individuals are interchangeable \(no evidence distinguishes them\), theσ\\sigma\-structures obtained by swapping them are not different ones, yet Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) counts them as two; if a query is provable in several ways, Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) collapses this to a single number\. In the terminology of category theory\(baez2001finite\), the functional is*decategorified*: it projects a structured situation down to bare numbers\. The first symmetry is not an exotic concern: it is exactly the structure behind*reasoning shortcuts*\(marconato2023notall\), a central failure mode of NeSy learning\.
We investigate what Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) becomes if its underlying sets are replaced by*types*, in the sense of homotopy type theory \(HoTT\)\(hottbook\)\. A type is like a set that additionally remembers, for any two elements, the ways in which they may be identified; from this it recovers each element’s symmetry group and each proof’s identity\. The correct notion of*size*for such an object is not cardinality but*homotopy cardinality*\(baez2001finite;leinster2008euler\), which counts each element weighted by the reciprocal of its symmetries\. Carrying Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) through this change of foundations \(Section[3](https://arxiv.org/html/2606.17851#S3)\) yields a belief\-weighted homotopy cardinality that retains this information while specializing back to Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) when all symmetries are trivial\.
We keep the development self\-contained and assume no background in type theory: Section[2\.3](https://arxiv.org/html/2606.17851#S2.SS3)introduces every notion we use, and Appendix[A](https://arxiv.org/html/2606.17851#A1)expands it in logic and knowledge\-graph terms\. Our contributions are:
- •lifting each ingredient of Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) one level, exposing two design parameters the classical functional cannot name: a*symmetry parameter*and a*truncation level*;
- •a*conservativity theorem*\(Section[3\.1](https://arxiv.org/html/2606.17851#S3.SS1)\), verified in Lean 4 with Mathlib\(mathlib\), identifying the belief\-weighted homotopy cardinality with a symmetry\-corrected weighted count and recovering Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) when the symmetry is trivial or the semiring idempotent;
- •we identify \(Section[3\.2](https://arxiv.org/html/2606.17851#S3.SS2)\) the symmetry parameter with the mechanism behind reasoning shortcuts, leading to a closed\-form, single\-model method for shortcut\-aware concept posteriors \(*orbit\-averaging*\);
- •experiments \(Section[4](https://arxiv.org/html/2606.17851#S4)\) on MNIST reasoning\-shortcut tasks where the wrapper is better calibrated than a five\-modelBearsensemble\(marconato2024bears\), with accuracy and identifiable concepts preserved;
- •a generating\-function consequence for exchangeable models \(Section[5\.1](https://arxiv.org/html/2606.17851#S5.SS1)\) and a discussion \(Sections[5\.2](https://arxiv.org/html/2606.17851#S5.SS2)–[5\.3](https://arxiv.org/html/2606.17851#S5.SS3)\) of what types buy and of the route to the continuous side through cohesion\.
## 2Background
### 2\.1The neurosymbolic inference functional
As shown in Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\), a NeSy model\(desmet2025defining\)fixes a signatureσ\\sigmaof symbols with associated domains, so that a*σ\\sigma\-structure*ω:σ→D\\omega\\colon\\sigma\\to Dassigns each symbol a value; the space of allσ\\sigma\-structures isΩ=Dσ\\Omega=D^\{\\sigma\}\. A semantics sends a sentenceφ\\varphiand aσ\\sigma\-structureω\\omegato a truth value in a setV↪ℝ\+V\\hookrightarrow\\mathbb\{R\}^\{\+\}\(Boolean\{0,1\}\\\{0,1\\\}, fuzzy\[0,1\]\[0,1\], and so on\), and the logic functionℓ\(φ,ω\)\\ell\(\\varphi,\\omega\)reads off the part of that value we wish to aggregate\. A beliefbθ:Ω→ℝ\+b\_\{\\theta\}\\colon\\Omega\\to\\mathbb\{R\}^\{\+\}, typically computed by a neural network \(the*perception*\) from the raw input, weightsσ\\sigma\-structures\. For finiteΩ\\Omegawith the counting measure, Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) is the weighted model count
Fθ\(φ\)=∑ω⊧φbθ\(ω\)\.F\_\{\\theta\}\(\\varphi\)\\;=\\;\\sum\_\{\\omega\\models\\varphi\}b\_\{\\theta\}\(\\omega\)\.\(2\)
###### Example 2\.1\(A two\-individual smokers program\)\.
Let two individuals1,21,2each either smoke or not, so aσ\\sigma\-structure is a functionω:\{1,2\}→\{⊥,⊤\}\\omega\\colon\\\{1,2\\\}\\to\\\{\\bot,\\top\\\}andΩ\\Omegahas four elements\. Suppose perception outputs an independent smoking probabilitypip\_\{i\}for each individual, giving the beliefbθ\(ω\)=∏ipi\[ω\(i\)=⊤\]\(1−pi\)\[ω\(i\)=⊥\]b\_\{\\theta\}\(\\omega\)=\\prod\_\{i\}p\_\{i\}^\{\[\\omega\(i\)=\\top\]\}\(1\-p\_\{i\}\)^\{\[\\omega\(i\)=\\bot\]\}\. For the queryφ=\\varphi=“someone smokes”, Eqn\. \([2](https://arxiv.org/html/2606.17851#S2.E2)\) sumsbθb\_\{\\theta\}over the three satisfyingσ\\sigma\-structures and gives1−\(1−p1\)\(1−p2\)1\-\(1\-p\_\{1\}\)\(1\-p\_\{2\}\)\. We return to this program throughout\.
A single algebraic generalization underlies the field’s diversity: replacing the value setVVby a commutative semiring\(⊕,⊗\)\(\\oplus,\\otimes\)and reading Eqn\. \([2](https://arxiv.org/html/2606.17851#S2.E2)\) in that semiring is*algebraic model counting*\(kimmig2017algebraic\): probability uses\(\+,×\)\(\+,\\times\), fuzzy logic\(max,min\)\(\\max,\\min\), MaxSAT and Viterbi the tropical\(max,\+\)\(\\max,\+\), provenance free semirings of proof terms\.
These readings are all*decategorifications*: passages from one structured object to a numerical invariant that forgets the structure\. The prototype is that the natural numbers are the decategorification of finite sets, where a number records an isomorphism class, disjoint union becomes\+\+, and Cartesian product becomes×\\times; algebraic model counting then varies which invariant, that is which semiring, one reads off \(Section[3](https://arxiv.org/html/2606.17851#S3)\)\.
### 2\.2Reasoning shortcuts
A NeSy model trains its perception – which predicts the latent symbols, or*concepts*, of each input – with supervision only on the logical output, and this under\-determination has a well\-studied failure mode\. Writeβ\\betafor the map sending a concept assignment to the program output it produces\. A*reasoning shortcut*is a perception that minimizes the training loss while assigning its concepts the wrong meaning, induced by a relabelingα\\alphaof the vocabulary that leaves the output unchanged,β\(α\(c\)\)=β\(c\)\\beta\(\\alpha\(c\)\)=\\beta\(c\)\(marconato2023notall;bortolotti2025identifiability\)\. The concept assignments reachable from a given one by such output\-preserving relabelings form its*confusion set*\(vankrieken2025rsindependence\), and, under mild assumptions, the loss\-minimizing perceptions are the convex combinations of the resulting deterministic optima\(marconato2023notall\)\. Under the common*independence assumption*, that the perception factorizes over concepts\(vankrieken2024independence\), the model is driven to one deterministic assignment per input and commits to a single element of each confusion set with no evidence to prefer it\.
The concepts within a confusion set are*unidentifiable*from the data, so a predictor should be*uncertain*across them rather than commit to one shortcut\(marconato2024bears\); the desired*shortcut\-aware*concept posterior is the maximum\-entropy \(uniform\) distribution on each confusion set\. Existing methods reach it by training a diversity\-encouraged ensemble \(Bears,marconato2024bears\), by regularizing toward a uniform target\(vankrieken2025rsindependence\), or by fitting an expressive joint distribution over concepts\(vankrieken2025nesydm\)\. Section[3\.2](https://arxiv.org/html/2606.17851#S3.SS2)recovers this target in closed form by reading confusion sets as symmetry orbits\.
### 2\.3Types, identifications, and symmetry
A type is a set that also remembers how its elements can be identified\. For neurosymbolic inference the elements areσ\\sigma\-structures, and an identification is a renaming of individuals carrying oneσ\\sigma\-structure to another that respects the theory – an isomorphism ofσ\\sigma\-structures, or an automorphism when the two coincide\.
For a typeAAand elementsa,b:Aa,b:Athere is an*identity type*IdA\(a,b\)\\mathrm\{Id\}\_\{A\}\(a,b\)whose elements are*identifications*ofaawithbb\. TakeAAto be theσ\\sigma\-structures of a theoryTT\. An identification ofσ\\sigma\-structureaawithσ\\sigma\-structurebbis then a renaming of the domain individuals that carriesaatobband respectsTT\. There may be none \(the two are genuinely different models\), exactly one \(the situation in an ordinary set ofσ\\sigma\-structures\), or several\. The self\-identifications of a singleσ\\sigma\-structure, the renamings of individuals that leave it unchanged, compose and invert, so they form a group: the*automorphism group*Aut\(a\)\\mathrm\{Aut\}\(a\)\. This is the information a set discards: a set ofσ\\sigma\-structures is the degenerate case in which the only renaming fixing anyσ\\sigma\-structure is the identity\.
In the smokers program of Example[2\.1](https://arxiv.org/html/2606.17851#S2.Thmtheorem1), if no evidence distinguishes individuals11and22, then swapping them identifies theσ\\sigma\-structure “only11smokes” with “only22smokes”\. The same swap leaves “nobody smokes” and “everybody smokes” unchanged, so it is a nontrivial automorphism of each\. The type ofσ\\sigma\-structures records this relabeling symmetry of the program; the set ofσ\\sigma\-structures cannot\.
### 2\.4Truncation: propositions, sets, groupoids
Types are stratified by how much identification structure they carry, the*truncation level*\. A*\(−1\)\(\-1\)\-type*, or*proposition*, has at most one element: a yes/no fact, such as whether a ground atom holds or whetherφ\\varphiis entailed\. A*0\-type*, or*set*, has at most one identification between any two elements: theσ\\sigma\-structuresΩ=Dσ\\Omega=D^\{\\sigma\}form a0\-type, recording no symmetry\. A*11\-type*, or*groupoid*, may have nontrivial automorphism groups but no higher structure;σ\\sigma\-structures carrying their renaming symmetries live here\.
### 2\.5Dependent sums and homotopy cardinality
If to eacha:Aa:Awe assign a typeP\(a\)P\(a\), the*dependent sum*∑a:AP\(a\)\\sum\_\{a:A\}P\(a\)is the type of pairs\(a,p\)\(a,p\)withp:P\(a\)p:P\(a\)\. We takeP\(ω\)P\(\\omega\)to be the*derivations*that witness the query inσ\\sigma\-structureω\\omega: a Boolean once we forget which, or the several distinct proof trees that provenance records\. The dependent sum bundles each model with a derivation, and its identifications combine renamings of models with identifications of the derivations they carry, so symmetries of models and of proofs are tracked together\.
Ordinary counting is the wrong measure of*size*for a groupoid ofσ\\sigma\-structures, because it ignores symmetry: aσ\\sigma\-structure fixed by a nontrivial renaming should count as less than a whole one\. The right notion is*homotopy cardinality*\(baez2001finite;leinster2008euler\), which counts each element in inverse proportion to its symmetries,
\|X\|=∑\[x\]1\|Aut\(x\)\|,\\lvert X\\rvert\\;=\\;\\sum\_\{\[x\]\}\\frac\{1\}\{\\lvert\\mathrm\{Aut\}\(x\)\\rvert\},\(3\)the sum running overσ\\sigma\-structures up to renaming\. A set ofkkσ\\sigma\-structures has\|X\|=k\\lvert X\\rvert=k; a single model with symmetry groupGGcounts as1\|G\|\\tfrac\{1\}\{\\lvert G\\rvert\}\.
To connect this to a group of renamings, we use three standard notions from group actions\. When a finite groupGGacts on a finite setXXofσ\\sigma\-structures, the*orbit*G⋅x=\{g⋅x:g∈G\}G\\cdot x=\\\{g\\cdot x:g\\in G\\\}collects the structures indistinguishable fromxxup to renaming, the*stabilizer*StabG\(x\)=\{g∈G:g⋅x=x\}\\mathrm\{Stab\}\_\{G\}\(x\)=\\\{g\\in G:g\\cdot x=x\\\}collects the renamings fixingxx, and each orbit has size\|G\|\|StabG\(x\)\|\\tfrac\{\\lvert G\\rvert\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}\(the*orbit–stabilizer theorem*\)\. The*action groupoid*X//GX/\\\!\\\!/G\(a double slash, distinguishing it from the orbit setX/GX/G, which forgets the stabilizers\) adds an identificationx→g⋅xx\\to g\\cdot xfor everyg∈Gg\\in G; combining the two preceding facts, its homotopy cardinality is
\|X//G\|=∑orbits1\|StabG\(x\)\|=\|X\|\|G\|\.\\lvert X/\\\!\\\!/G\\rvert=\\sum\_\{\\text\{orbits\}\}\\frac\{1\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}=\\frac\{\\lvert X\\rvert\}\{\\lvert G\\rvert\}\.\(4\)
###### Example 2\.2\(Models up to relabeling\)\.
Take the fourσ\\sigma\-structures of Example[2\.1](https://arxiv.org/html/2606.17851#S2.Thmtheorem1)with renaming groupSym\(2\)=\{id,swap\}\\mathrm\{Sym\}\(2\)=\\\{\\mathrm\{id\},\\text\{swap\}\\\}\. “Nobody” and “everybody” are each fixed by the swap \(stabilizer of size22\); the two “exactly one” models form one orbit with trivial stabilizer\. Then Eqn\. \([3](https://arxiv.org/html/2606.17851#S2.E3)\) gives12\+12\+1=2=42\\tfrac\{1\}\{2\}\+\\tfrac\{1\}\{2\}\+1=2=\\tfrac\{4\}\{2\}, the symmetry\-weighted count, not the number33of models\-up\-to\-relabeling\.
Homotopy cardinality requires finiteness \(finitely many isomorphism classes, finite automorphism groups; “π\\pi\-finiteness”\)\.
## 3Lifting the functional
We now carry each ingredient of Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) from sets to types \(Table[1](https://arxiv.org/html/2606.17851#S3.T1)\); in every row the classical object is recovered by truncating\.
Table 1:Lifting the four ingredients of neurosymbolic inference one level\.First, the structures become a typeΩ:𝒰\\Omega:\\mathcal\{U\}whose identity types record structure symmetries, so thatσ\\sigma\-structures related by an isomorphism are identified; the classicalΩ=Dσ\\Omega=D^\{\\sigma\}is the0\-truncated case\. Second, the truth value becomes a proof family: we replaceℓ\(φ,ω\)\\ell\(\\varphi,\\omega\)by a type familyP:Ω→𝒰P\\colon\\Omega\\to\\mathcal\{U\}, whereP\(ω\)P\(\\omega\)is the type of*witnesses*thatφ\\varphiholds atω\\omega\. Its truncation level is a parameter, so that a proposition records only*whether*φ\\varphiholds, a set records*how many*derivations witness it \(provenance\), and a groupoid identifies derivations related by a symmetry\. Third, inference becomes a homotopy cardinality: the dependent sum∑ω:ΩP\(ω\)\\sum\_\{\\omega:\\Omega\}P\(\\omega\)is the type of*\(structure, witness\)*pairs, and its belief\-weighted homotopy cardinality
N\(P,b\)=∥∑ω:ΩP\(ω\)∥V,bN\(P,b\)\\;=\\;\\Big\\lVert\\textstyle\\sum\_\{\\omega:\\Omega\}P\(\\omega\)\\Big\\rVert\_\{V,b\}\(5\)is our inference value\. WhenΩ\\Omegais a set,PPa proposition, and∥⋅∥V,b\\lVert\\cdot\\rVert\_\{V,b\}the belief\-weighted counting measure, Eqn\. \([5](https://arxiv.org/html/2606.17851#S3.E5)\) collapses to the weighted model count \([2](https://arxiv.org/html/2606.17851#S2.E2)\)\. Fourth, the valuation∥⋅∥V\\lVert\\cdot\\rVert\_\{V\}turning the proof type into a number is exactly the choice of commutative semiring: algebraic model counting\(kimmig2017algebraic\)viewed as decategorification maps applied to the*same*object \([5](https://arxiv.org/html/2606.17851#S3.E5)\)\.
The lift exposes two parameters Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) cannot express\. The first is a*symmetry parameter*: one may quotientΩ\\Omegaby any subgroupG≤Aut\(T\)G\\leq\\mathrm\{Aut\}\(T\)of the theory’s symmetries, interpolating betweenG=𝟏G=\\mathbf\{1\}\(every grounding distinct\) andG=Aut\(T\)G=\\mathrm\{Aut\}\(T\)\(σ\\sigma\-structures counted up to isomorphism\)\. This parameter is not free, because Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)needs the belief to beGG\-invariant, soGGis exactly the symmetry shared by the theory and the belief: a perception that scores individuals differently breaks the symmetry and forcesG=𝟏G=\\mathbf\{1\}, whereas an exchangeable perception \(one invariant under permuting individuals\) permits the fullG=Aut\(T\)G=\\mathrm\{Aut\}\(T\)\. The second is a*truncation level*: the level ofPPseparates systems that check satisfaction \(propositions; e\.g\.badreddine2022logic\) from those that count derivations \(sets; e\.g\.manhaeve2018deepproblog;huang2021scallop\) from those that identify symmetric derivations \(groupoids\)\.
Concretely, on the smokers program \(Example[2\.1](https://arxiv.org/html/2606.17851#S2.Thmtheorem1)\) with query “someone smokes”, the structure–witness object∑ωP\(ω\)\\sum\_\{\\omega\}P\(\\omega\)has the three satisfying models with their beliefs; the probability semiring\(\+,×\)\(\+,\\times\)sums these to1−\(1−p1\)\(1−p2\)1\-\(1\-p\_\{1\}\)\(1\-p\_\{2\}\), the weighted model count, while the tropical\(max,\+\)\(\\max,\+\)reading of the*same*object returns the most probable satisfying model\. The object is fixed; the semiring chooses which number to read off it\.
### 3\.1The conservativity theorem
We make precise, and machine\-check, that Eqn\. \([5](https://arxiv.org/html/2606.17851#S3.E5)\) extends Eqn\. \([1](https://arxiv.org/html/2606.17851#S1.E1)\) faithfully, working in the finite,11\-truncated \(groupoid\) case, where the structure–witness type with symmetry is a finite action groupoid \(Section[2\.5](https://arxiv.org/html/2606.17851#S2.SS5)\)\.
###### Definition 3\.1\(Belief\-weighted groupoid cardinality\)\.
Let a finite groupGGact on a finite typeXXof structure–witness pairs, and letb:X→ℚ\+b\\colon X\\to\\mathbb\{Q\}^\{\+\}be*GG\-invariant*,b\(g⋅x\)=b\(x\)b\(g\\cdot x\)=b\(x\)\. The belief\-weighted homotopy cardinality ofX//GX/\\\!\\\!/Gis\|X//G\|b=∑\[x\]∈X/Gb\(x\)\|StabG\(x\)\|\\lvert X/\\\!\\\!/G\\rvert\_\{b\}=\\sum\_\{\[x\]\\in X/G\}\\tfrac\{b\(x\)\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}\.
###### Theorem 3\.2\(Conservativity bridge\)\.
With the notation of Definition[3\.1](https://arxiv.org/html/2606.17851#S3.Thmtheorem1),\|X//G\|b=1\|G\|∑x∈Xb\(x\)\\lvert X/\\\!\\\!/G\\rvert\_\{b\}=\\tfrac\{1\}\{\\lvert G\\rvert\}\\sum\_\{x\\in X\}b\(x\)\.
The proof is orbit–stabilizer arithmetic \(Appendix[B](https://arxiv.org/html/2606.17851#A2)\), formalized in Lean 4 over Mathlib \(Appendix[C](https://arxiv.org/html/2606.17851#A3)\)\. WhenGGis trivial andPPis a proposition,N\(P,b\)=Fθ\(φ\)N\(P,b\)=F\_\{\\theta\}\(\\varphi\)is the weighted model count of Eqn\. \([2](https://arxiv.org/html/2606.17851#S2.E2)\)\. Lifted inference\(vandenbroeck2011lifted;niepert2014tractability\)exploits the same orbits, but to compute the*flat*count∑xb\(x\)\\sum\_\{x\}b\(x\)faster; the symmetry parameter chooses between the flat count and the quotient\.
#### When the lift is inert\.
Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)is the\(\+,×\)\(\+,\\times\)instance; read in an arbitrary commutative semiring via the valuation∥⋅∥V\\lVert\\cdot\\rVert\_\{V\}, the lift has a second, dual way to collapse onto the set\-based functional\.
###### Proposition 3\.3\(Two degeneracies of the lift\)\.
Read Eqn\. \([5](https://arxiv.org/html/2606.17851#S3.E5)\) in a commutative semiring\(⊕,⊗\)\(\\oplus,\\otimes\)via∥⋅∥V\\lVert\\cdot\\rVert\_\{V\}, withbbGG\-invariant\. The lifted value equals the set\-based one,\|X//G\|b=⨁x∈Xb\(x\)\\lvert X/\\\!\\\!/G\\rvert\_\{b\}=\\bigoplus\_\{x\\in X\}b\(x\), under either degeneracy:*\(a\)*the symmetry is trivial,G=𝟏G=\\mathbf\{1\}\(any semiring\); or*\(b\)*⊕\\oplusis idempotent \(anyGG\)\.
Case \(a\) is Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)atG=𝟏G=\\mathbf\{1\}\. In case \(b\) the belief is constant on each orbit, so an idempotent⊕\\oplusfuses the orbit’s equal copies into one \(v⊕⋯⊕v=vv\\oplus\\cdots\\oplus v=v\) while the1\|Stab\|\\tfrac\{1\}\{\\lvert\\mathrm\{Stab\}\\rvert\}weight is the identity \(Appendix[B](https://arxiv.org/html/2606.17851#A2); machine\-checked, Appendix[C](https://arxiv.org/html/2606.17851#A3)\)\. The degeneracies are dual: trivial symmetry has nothing to quotient, an idempotent semiring nothing to overcount\. The lift is therefore visible only in the non\-idempotent semirings \(probabilistic\(\+,×\)\(\+,\\times\), the gradient semiring, counting, provenance\) and inert on the idempotent ones, such as\(max,\+\)\(\\max,\+\)\(MPE/Viterbi\),\(max,min\)\(\\max,\\min\)\(fuzzy\), and\(∨,∧\)\(\\vee,\\wedge\)\(SAT\); the most\-probable\-σ\\sigma\-structure reading is symmetry\-invariant for free\. Remark[A\.1](https://arxiv.org/html/2606.17851#A1.Thmtheorem1)discusses which semirings the1\|Aut\|\\tfrac\{1\}\{\\lvert\\mathrm\{Aut\}\\rvert\}weight is defined in\.
### 3\.2Reasoning shortcuts as residual symmetry
The symmetry parameter is not an abstraction, but the structure behind reasoning shortcuts \(Section[2\.2](https://arxiv.org/html/2606.17851#S2.SS2)\), and turning it into a measure yields a practical method\. The output\-preserving relabelings of the vocabulary are invertible and closed under composition, so they form a group, the symmetry parameterG≤Aut\(T\)G\\leq\\mathrm\{Aut\}\(T\)instantiated by “preserves the belief”\. A confusion set is then exactly an orbitG⋅xG\\cdot xof aσ\\sigma\-structure under this group, and two facts the literature states set\-theoretically become orbit–stabilizer arithmetic: the number of distinct shortcuts through aσ\\sigma\-structure is the orbit size\|G\|\|Stab\|\\tfrac\{\\lvert G\\rvert\}\{\\lvert\\mathrm\{Stab\}\\rvert\}, and the symmetry\-corrected functional\|X//G\|b\\lvert X/\\\!\\\!/G\\rvert\_\{b\}is invariant under this group by construction \(Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\)\. The independence assumption is the trivial caseG=𝟏G=\\mathbf\{1\}: every grounding is distinct, and the loss minima are the disconnected deterministic choices of Section[2\.2](https://arxiv.org/html/2606.17851#S2.SS2)\(vankrieken2024independence\)\.
The shortcut\-aware posterior is the unique symmetry\-invariant point: the concept posteriors form the probability simplex on a confusion set, that is on an orbit, the group action permutes its vertices, and the single invariant point is the uniform distribution, the maximum\-entropy target of Section[2\.2](https://arxiv.org/html/2606.17851#S2.SS2)\. Here it is forced: for any belief the symmetry parameter fixes, the orbit\-conditional posterior is uniform,b\(y\)∑zb\(z\)=1\|G⋅x\|\\tfrac\{b\(y\)\}\{\\sum\_\{z\}b\(z\)\}=\\tfrac\{1\}\{\\lvert G\\cdot x\\rvert\}, the same invariant measure the functional integrates against \(proof in Lean, Appendix[C](https://arxiv.org/html/2606.17851#A3)\)\.
This response lives only in a non\-idempotent semiring: reading the same confusion set in an idempotent one, such as the\(max,\+\)\(\\max,\+\)of most\-probable\-model inference, collapses its tied values to one number \(Proposition[3\.3](https://arxiv.org/html/2606.17851#S3.Thmtheorem3)\), so the aware uniform posterior and an overconfident commitment to one shortcut become indistinguishable; awareness needs the counting semiring\.
Orbit\-averaging realizes this invariant point as a closed\-form, inference\-time method that requires neither an ensemble nor a density model:*symmetrize*any base perceptionpθ\(c∣x\)p\_\{\\theta\}\(c\\mid x\)with the group\-averaging operator over the output\-preserving groupGG,
p¯\(c∣x\)=1\|G\|∑g∈Gpθ\(g−1⋅c∣x\)\.\\bar\{p\}\(c\\mid x\)\\;=\\;\\frac\{1\}\{\\lvert G\\rvert\}\\sum\_\{g\\in G\}p\_\{\\theta\}\(g^\{\-1\}\\\!\\cdot c\\mid x\)\.\(6\)
###### Proposition 3\.4\(Orbit\-averaging is closed\-form shortcut awareness\)\.
For an output\-preservingGG, the symmetrized posterior in Eqn\. \([6](https://arxiv.org/html/2606.17851#S3.E6)\)*\(i\)*leaves the induced label distribution unchanged, so label accuracy is preserved exactly;*\(ii\)*equals the uniform \(maximum\-entropy\) distribution on each confusion set, the onlyGG\-invariant point; and*\(iii\)*is computed in one forward pass plus a sum overGG, with the orbit size\|G\|\|Stab\|\\tfrac\{\\lvert G\\rvert\}\{\\lvert\\mathrm\{Stab\}\\rvert\}as a per\-input shortcut\-degeneracy score\.
Part \(i\) holds because eachggpermutes every label fiberβ−1\(y\)\\beta^\{\-1\}\(y\)within itself \(Appendix[B](https://arxiv.org/html/2606.17851#A2)\); part \(ii\) is the invariant\-point statement above\. So orbit\-averaging keeps confident, correct predictions on identifiable concepts \(orbit size11\) and is calibrated\-uncertain exactly on the confusable ones, from a single model\. The groupGGis read off the knowledge as the output\-preserving relabelings; computing it is graph\-automorphism detection \(encode the theory as a vertex\-colored graph\), practical at scale with solvers such as nauty\(mckay2014practical\)or saucy\(darga2008faster\)\. The closed form is exact whenGGis a genuine symmetry of the data; where it is only partial, an ensemble\(marconato2024bears\)or expressive model\(vankrieken2025nesydm\)represents the rest of the simplex\. To our knowledge, the reading of reasoning shortcuts as orbits of the theory’s symmetry group, and the closed\-form awareness method it yields, are new\.
## 4Experiments
We test orbit\-averaging on MNIST reasoning\-shortcut tasks, in the style of thersbenchsuite\(bortolotti2024rsbench\), where the symmetry group is known and concept ground truth is available; full details, architectures, and a second task \(XOR\-parity, groupℤ/2\\mathbb\{Z\}/2\) are in Appendix[D](https://arxiv.org/html/2606.17851#A4)\.
#### Setup\.
A single MNIST digit in\{0,…,5\}\\\{0,\\dots,5\\\}is mapped by a knowledgeβ\\betathat merges labels \(0↦00\\\!\\mapsto\\\!0,\{1,2\}↦1\\\{1,2\\\}\\\!\\mapsto\\\!1,\{3,4,5\}↦2\\\{3,4,5\\\}\\\!\\mapsto\\\!2\); training sees onlyβ\\beta\. The output\-preserving group isG=Sym\{0\}×Sym\{1,2\}×Sym\{3,4,5\}G=\\mathrm\{Sym\}\\\{0\\\}\\times\\mathrm\{Sym\}\\\{1,2\\\}\\times\\mathrm\{Sym\}\\\{3,4,5\\\}, so digit0is identifiable \(orbit size11\) while\{1,2\}\\\{1,2\\\}and\{3,4,5\}\\\{3,4,5\\\}are confusable \(orbit sizes2,32,3\): a reasoning shortcut on the confusable digits only\. We compare a base model, the same model orbit\-averaged \([6](https://arxiv.org/html/2606.17851#S3.E6)\), and a diversity\-trainedBearsensemble\(marconato2024bears\)\(members penalized by KL\-divergence from the running average plus an entropy term; Appendix[D](https://arxiv.org/html/2606.17851#A4)\)\. We report, with full definitions in Appendix[D](https://arxiv.org/html/2606.17851#A4): the accuracy of the merged labelβ\\beta\(*label*\); the tie\-aware expected calibration error \(ECE, the gap between confidence and accuracy\) of the digit posterior on the identifiable digit \(*id\-ECE*\) and on the confusable digits \(*conf\-ECE*\); and the mean posterior entropy on the confusable digits \(*conf\-ent\.*\), which a shortcut\-aware model should keep high\.
Table 2:Mixed\-identifiability MNIST task \(single\-model rows: means over 8 seeds; eachBearsrow: one five\-member ensemble\)\. Orbit\-averaging a single model calibrates the confusable concepts \(conf\-ECE\) while leaving the identifiable digit and label accuracy untouched; the ensemble trades calibration against accuracy\.
#### Results\.
Table[2](https://arxiv.org/html/2606.17851#S4.T2)shows the base model is overconfident and wrong on the confusable digits \(conf\-ECE0\.6140\.614\)\. Orbit\-averaging a*single*model drives this to0\.0100\.010while preserving label accuracy and the identifiable digit \(id\-ECE0\.0040\.004\): it acts exactly on the confusable orbits and nowhere else\. The diversity\-trained ensemble faces a tradeoff orbit\-averaging does not: its best label\-preserving setting reaches conf\-ECE0\.1580\.158, and pushing diversity harder lowers this to0\.0630\.063only by collapsing label accuracy to0\.590\.59and miscalibrating the identifiable digit \(id\-ECE0\.660\.66\), because its diversity pressure is global\. On the XOR\-parity task the base model commits to the wrong parity convention on60%60\\%of seeds \(calibration error on the parity bit, bit\-ECE, of0\.5920\.592\), while orbit\-averaging stays calibrated \(bit\-ECE0\.0120\.012\); see Appendix[D](https://arxiv.org/html/2606.17851#A4)\.
## 5Discussion
### 5\.1A generating\-function consequence
The categorified reading has an analytic consequence already in the simplest relational model\. Generalizing Example[2\.1](https://arxiv.org/html/2606.17851#S2.Thmtheorem1)tonnindividuals with one unary predicate, aσ\\sigma\-structure is a subset of\{S\(1\),…,S\(n\)\}\\\{S\(1\),\\dots,S\(n\)\\\},Aut\(T\)=Sym\(n\)\\mathrm\{Aut\}\(T\)=\\mathrm\{Sym\}\(n\), and the exchangeable beliefb\(ω\)=β\|ω\|b\(\\omega\)=\\beta^\{\\lvert\\omega\\rvert\}isSym\(n\)\\mathrm\{Sym\}\(n\)\-invariant, so the symmetry parameter is maximal,G=Sym\(n\)G=\\mathrm\{Sym\}\(n\)\. Orbits are indexed by the numbermmof smokers, with size\(nm\)\\binom\{n\}\{m\}and stabilizer of orderm\!\(n−m\)\!m\!\(n\-m\)\!, giving
∑ωb\(ω\)=\(1\+β\)n,\|Ω//Sym\(n\)\|b=\(1\+β\)nn\!,\\textstyle\\sum\_\{\\omega\}b\(\\omega\)=\(1\+\\beta\)^\{n\},\\qquad\\lvert\\Omega/\\\!\\\!/\\mathrm\{Sym\}\(n\)\\rvert\_\{b\}=\\frac\{\(1\+\\beta\)^\{n\}\}\{n\!\},\(7\)in agreement with Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\(\|G\|=n\!\\lvert G\\rvert=n\!\); the closed form is machine\-checked for everynn\(Appendix[C](https://arxiv.org/html/2606.17851#A3)\)\.
###### Corollary 5\.1\(Inference generating functions\)\.
For the exchangeable model, the flat functional \([2](https://arxiv.org/html/2606.17851#S2.E2)\) assembles into the*ordinary*generating function∑n\(1\+β\)nxn=11−\(1\+β\)x\\sum\_\{n\}\(1\+\\beta\)^\{n\}x^\{n\}=\\frac\{1\}\{1\-\(1\+\\beta\)x\}, while the belief\-weighted homotopy cardinality \([5](https://arxiv.org/html/2606.17851#S3.E5)\) assembles into the*exponential*generating function∑n\(1\+β\)nn\!xn=e\(1\+β\)x\\sum\_\{n\}\\frac\{\(1\+\\beta\)^\{n\}\}\{n\!\}x^\{n\}=e^\{\(1\+\\beta\)x\}\.
In the theory of combinatorial species\(joyal1981species\), a species’ exponential generating function is the homotopy cardinality of its associated groupoid\(baez2001finite\); our model is the species “subsets of the domain”\. Because species compose, the homotopy\-cardinality reading composes correctly when relational models are built from parts, while the flat count overcounts relabelings; the correct generating function follows from the inference functional itself, without a separate exchangeability argument\.
### 5\.2Comparison of types and sets
Two further payoffs are worth naming\. The first is proof\-relevance: a set\-based truth value cannot distinguish “provable” from “provable in three symmetric ways”, whereas the truncation level ofPPplaces satisfaction checking, derivation counting, and symmetric\-derivation counting on one axis\.
The second payoff is representation invariance: HoTT’s univalence axiom\(hottbook\)lets equivalent types be treated as equal \(Appendix[A](https://arxiv.org/html/2606.17851#A1)\), so inference computed in one symbolic encoding equals inference in any equivalent one, with no separate proof; in set\-based foundations such invariance must be re\-established by hand\. The finite case is machine\-checked \(groupoidCard\_transport, Appendix[C](https://arxiv.org/html/2606.17851#A3)\); Appendix[B](https://arxiv.org/html/2606.17851#A2)spells out the consequence for description\-logic NeSy systems\.
### 5\.3The continuous side, and an open problem
Plain homotopy type theory supplies identifications but neither a metric nor a gradient\.*Real\-cohesive*homotopy type theory\(shulman2018brouwer\)equips every type with a synthetic notion of continuity alongside its identifications; its*shape modality*collapses a continuous type to its discrete content, a synthetic symbol\-grounding map\.
One substantial gap remains: homotopy cardinality is a counting invariant forπ\\pi\-finite types \(Section[2\.5](https://arxiv.org/html/2606.17851#S2.SS5)\), and a synthetic measure compatible with cohesion that lets Eqn\. \([5](https://arxiv.org/html/2606.17851#S3.E5)\) range over a continuousΩ\\Omegadoes not yet exist – the central open problem the framework raises\.
## 6Related work
Our value object refines established machinery: algebraic model counting\(kimmig2017algebraic\)already reads probabilistic, fuzzy, tropical, and provenance inference as one semiring sum, and we recover it as decategorifications of the dependent sum, with Scallop\(huang2021scallop\)the set\-valued case and lifted inference\(vandenbroeck2011lifted;niepert2014tractability\)the orbit machinery of Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\. Reasoning shortcuts\(marconato2023notall\), the independence assumption\(vankrieken2024independence;vankrieken2025rsindependence\), and the awareness methods\(marconato2024bears;vankrieken2025nesydm\)supply confusion sets and uniform targets set\-theoretically; Section[3\.2](https://arxiv.org/html/2606.17851#S3.SS2)reads them as orbits and the invariant point of a group action\. Categorical deep learning\(gavranovic2024categorical\)works11\- and22\-categorically under strict equality, below the identity\-type and cohesive structure we use; homotopy\-theoretic models of neural data\(manin2024homotopy\)apply algebraic topology to neural activity, not type theory to inference\.
## 7Conclusion
Read over types rather than sets, neurosymbolic inference becomes a belief\-weighted homotopy cardinality that remembers structure symmetries and proof identity and recovers the classical functional when symmetries are trivial \(Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2), machine\-checked\)\. The symmetry it exposes is the one behind reasoning shortcuts, and orbit\-averaging, the closed\-form single\-model method it yields, matches or beats a five\-model ensemble\.
## References
## Appendix AA fuller introduction to homotopy type theory
This appendix expands Section[2\.3](https://arxiv.org/html/2606.17851#S2.SS3)\. It first fixes the logical setting \(signatures, structures, truth, models\) and then the type\-theoretic notions used in the main text; it is self\-contained and followshottbook, to which all numbered results refer, with illustrations in logic\-programming and knowledge\-graph terms\.
### A\.1Signatures, structures, truth, and models
A*signature*σ\\sigmais a set of relation, function, and constant symbols, each with an arity\. A*σ\\sigma\-structure*ω\\omegafixes a domainDDof individuals and interprets every symbol ofσ\\sigmaoverDD: akk\-ary relation symbol as a subset ofDkD^\{k\}, a function symbol as a mapDk→DD^\{k\}\\to D, a constant as an element ofDD\. Theσ\\sigma\-structures over a fixed finite domain form the spaceΩ\\Omega\. A sentenceφ\\varphiis*true*inω\\omega, writtenω⊧φ\\omega\\models\\varphi, by the Tarskian satisfaction relation; aσ\\sigma\-structure withω⊧ψ\\omega\\models\\psifor every axiomψ\\psiof a theoryTTis a*model*ofTT, the least Herbrand model of a definite logic program being the standard example\. Twoσ\\sigma\-structures are*isomorphic*when a renaming of individuals \(a bijection ofDD\) carries one to the other while preserving every interpreted symbol; an isomorphism of a structure with itself is an*automorphism*, and the automorphisms ofTTare the renamings under whichTTis invariant\. These are the objects Section[2\.3](https://arxiv.org/html/2606.17851#S2.SS3)promotes one level:Ω\\Omegabecomes a type whose identifications are exactly these isomorphisms, and the truth valueω⊧φ\\omega\\models\\varphibecomes the proof familyP\(ω\)P\(\\omega\)of derivations that witnessφ\\varphi\.
### A\.2Types, terms, and the two basic constructors
Type theory manipulates*types*A,B,…A,B,\\dotsand*terms*a:Aa:A\. From a typeAAand a familyB:A→𝒰B\\colon A\\to\\mathcal\{U\}\(with𝒰\\mathcal\{U\}a*universe*, a type of types\) one forms the*dependent function type*∏a:AB\(a\)\\prod\_\{a:A\}B\(a\)\(ordinaryA→BA\\to BwhenBBis constant\) and the*dependent pair type*∑a:AB\(a\)\\sum\_\{a:A\}B\(a\)of pairs\(a,b\)\(a,b\)withb:B\(a\)b:B\(a\)\(ordinaryA×BA\\times BwhenBBis constant\)\. Under propositions\-as\-types∏\\prodreads “for all” and∑\\sum“there exists”:∏ismokes\(i\)\\prod\_\{i\}\\mathrm\{smokes\}\(i\)says every individual smokes, whereas∑ω:ΩP\(ω\)\\sum\_\{\\omega:\\Omega\}P\(\\omega\)is the type of aσ\\sigma\-structureω\\omegapaired with a derivationp:P\(ω\)p:P\(\\omega\)of the query, the \(structure, proof\) pairs the inference functional aggregates\. When eachB\(a\)B\(a\)is a yes/no fact,∑a:AB\(a\)\\sum\_\{a:A\}B\(a\)is the subtype ofaasatisfyingBB, for instance the models of a theory\.
### A\.3Identity types and the groupoid structure
Fora,b:Aa,b:Athe*identity type*a=Aba=\_\{A\}b\(alsoIdA\(a,b\)\\mathrm\{Id\}\_\{A\}\(a,b\)\) collects the identifications ofaawithbb; forσ\\sigma\-structures these are the renamings of individuals carrying one model to the other\. Its sole constructor is reflexivityrefla:a=Aa\\mathrm\{refl\}\_\{a\}\\colon a=\_\{A\}a, and its induction principle \(path induction\) says that to define something for allp:a=Abp\\colon a=\_\{A\}bit suffices to treat the caseb≡ab\\equiv a,p≡reflap\\equiv\\mathrm\{refl\}\_\{a\}\. Readingppas a*path*, path induction generates the higher structure: reflexivity is the constant path \(the identity renaming\), symmetryp−1p^\{\-1\}the inverse renaming, transitivityp⋅qp\\cdot qtheir composition \(hottbook, §2\.1\); these satisfy the groupoid laws up to higher paths, so each type is a \(weak\)∞\\infty\-groupoid\.
The self\-identificationsa=Aaa=\_\{A\}aform, under composition, the automorphism groupAut\(a\)\\mathrm\{Aut\}\(a\)of the model; the nontrivial automorphisms of a knowledge graph \(relabelings of its entities that preserve every triple\) are exactly its nontrivial elements\. A set is the degenerate case in which every such group is trivial\.
### A\.4Equivalence, univalence, and transport
A functionf:A→Bf\\colon A\\to Bis an*equivalence*,A≃BA\\simeq B, when it has a coherent inverse \(hottbook, §2\.4,§4\)\. ForA,B:𝒰A,B:\\mathcal\{U\}there is a canonical𝗂𝖽𝗍𝗈𝖾𝗊𝗏:\(A=𝒰B\)→\(A≃B\)\\mathsf\{idtoeqv\}\\colon\(A=\_\{\\mathcal\{U\}\}B\)\\to\(A\\simeq B\), and the*univalence axiom*\(hottbook, Axiom 2\.10\.3\) makes it an equivalence, so\(A=𝒰B\)≃\(A≃B\)\(A=\_\{\\mathcal\{U\}\}B\)\\simeq\(A\\simeq B\): equivalent types may be identified\. Its computational content is*transport*, and along𝗎𝖺\(f\)\\mathsf\{ua\}\(f\)transport computes toff\(hottbook, §2\.10\)\. Concretely, two knowledge bases differing only by an isomorphic renaming of their vocabulary \(distinct IRIs or constant names for the same entities\) have identified structure types, and the inference functional transports between them unchanged; representation invariance \(Section[5\.2](https://arxiv.org/html/2606.17851#S5.SS2)\) is automatic rather than proved per encoding\.
### A\.5Truncation levels \(h\-levels\)
The amount of identification structure a type carries is its*truncation level*, defined by recursion \(hottbook, Def\. 7\.1\.1\):XXis a\(−2\)\(\-2\)\-type if it is contractible, and an\(n\+1\)\(n\{\+\}1\)\-type ifx=Xyx=\_\{X\}yis annn\-type for allx,y:Xx,y:X\. Therefore\(−1\)\(\-1\)\-types are the*mere propositions*\(any two elements identified;hottbook, Def\. 3\.3\.1\), the truth of a ground atom;0\-types are the*sets*\(any two parallel paths identified;hottbook, Def\. 3\.1\.1\), theσ\\sigma\-structures; and11\-types are the*groupoids*,σ\\sigma\-structures with their renaming symmetries\. The levels are cumulative \(hottbook, Thm\. 7\.1\.7\) and closed under∑\\sumand∏\\prod\(hottbook, Thm\. 7\.1\.8–7\.1\.9\), so the structure–witness object∑ωP\(ω\)\\sum\_\{\\omega\}P\(\\omega\)has a level determined by those ofΩ\\OmegaandPP\. Truncating to levelnnforgets all structure abovenn\.
### A\.6Homotopy cardinality
For aπ\\pi\-finite groupoid \(finitely many isomorphism classes, finite automorphism groups\) the homotopy \(or groupoid\) cardinality is given by Eqn\. \([3](https://arxiv.org/html/2606.17851#S2.E3)\)\(baez2001finite;leinster2008euler\)\. The two facts we use are that a set ofkkσ\\sigma\-structures has cardinalitykk, and that for a finite renaming groupGGacting on a finite setXXofσ\\sigma\-structures the action groupoid satisfies\|X//G\|=\|X\|\|G\|\\lvert X/\\\!\\\!/G\\rvert=\\frac\{\\lvert X\\rvert\}\{\\lvert G\\rvert\}by orbit–stabilizer: the count of models up to isomorphism\. Homotopy cardinality is the decategorification under which disjoint unions add, products multiply, and the groupoid of finite structures of a species has the species’ exponential generating function as its cardinality, the fact behind Corollary[5\.1](https://arxiv.org/html/2606.17851#S5.Thmtheorem1)\.
## Appendix BProofs
#### Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\.
By orbit–stabilizer each orbit\[x\]\[x\]has\|G\|\|StabG\(x\)\|\\frac\{\\lvert G\\rvert\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}members, all of equal belief byGG\-invariance, so
∑x∈Xb\(x\)=∑\[x\]\|G\|\|StabG\(x\)\|b\(x\)=\|G\|∑\[x\]b\(x\)\|StabG\(x\)\|=\|G\|⋅\|X//G\|b\.\\sum\_\{x\\in X\}b\(x\)=\\sum\_\{\[x\]\}\\frac\{\\lvert G\\rvert\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}\\,b\(x\)=\\lvert G\\rvert\\sum\_\{\[x\]\}\\frac\{b\(x\)\}\{\\lvert\\mathrm\{Stab\}\_\{G\}\(x\)\\rvert\}=\\lvert G\\rvert\\cdot\\lvert X/\\\!\\\!/G\\rvert\_\{b\}\.Dividing by\|G\|\\lvert G\\rvertgives the claim\. The argument is formalized in Lean 4 \(Appendix[C](https://arxiv.org/html/2606.17851#A3)\)\.
#### Proposition[3\.3](https://arxiv.org/html/2606.17851#S3.Thmtheorem3)\(b\)\.
Read the dependent sum in a commutative semiring with idempotent⊕\\oplus\. On each orbit\[x\]\[x\]the belief is constant,b\(⋅\)=vb\(\\cdot\)=v, byGG\-invariance\. The set\-based valuation of the orbit is⨁y∈\[x\]v=v\\bigoplus\_\{y\\in\[x\]\}v=v\(idempotence: any finite⊕\\oplusof copies ofvvisvv\), independent of the orbit size; and idempotence makes\|G\|⋅1=1\\lvert G\\rvert\\cdot 1=1, so the weight1\|Stab\|\\tfrac\{1\}\{\\lvert\\mathrm\{Stab\}\\rvert\}is the identity and the type\-based valuation of the orbit isvvas well\. The two agree orbit by orbit, so\|X//G\|b=⨁x∈Xb\(x\)\\lvert X/\\\!\\\!/G\\rvert\_\{b\}=\\bigoplus\_\{x\\in X\}b\(x\)for everyGG\. Case \(a\) is Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)atG=𝟏G=\\mathbf\{1\}\.
#### Proposition[3\.4](https://arxiv.org/html/2606.17851#S3.Thmtheorem4)\(i\), accuracy preservation\.
Fix a labelyyand writeF=β−1\(y\)F=\\beta^\{\-1\}\(y\)for its fiber\. Eachg∈Gg\\in Gpreservesβ\\beta, sog−1g^\{\-1\}restricts to a bijection ofFF\. Hence
∑c∈Fp¯\(c∣x\)\\displaystyle\\sum\_\{c\\in F\}\\bar\{p\}\(c\\mid x\)=1\|G\|∑g∈G∑c∈Fpθ\(g−1⋅c∣x\)\\displaystyle=\\frac\{1\}\{\\lvert G\\rvert\}\\sum\_\{g\\in G\}\\sum\_\{c\\in F\}p\_\{\\theta\}\(g^\{\-1\}\\\!\\cdot c\\mid x\)=1\|G\|∑g∈G∑c′∈Fpθ\(c′∣x\)=∑c∈Fpθ\(c∣x\),\\displaystyle=\\frac\{1\}\{\\lvert G\\rvert\}\\sum\_\{g\\in G\}\\sum\_\{c^\{\\prime\}\\in F\}p\_\{\\theta\}\(c^\{\\prime\}\\mid x\)=\\sum\_\{c\\in F\}p\_\{\\theta\}\(c\\mid x\),the inner sum being reindexed byc′=g−1⋅cc^\{\\prime\}=g^\{\-1\}\\\!\\cdot c\. The induced label distribution is unchanged, so any label decision \(and its accuracy\) is preserved\. Part \(ii\) is the orbit\-conditional uniformity machine\-checked asinvariant\_posterior\_uniform\.
#### Corollary[5\.1](https://arxiv.org/html/2606.17851#S5.Thmtheorem1)\.
Summing Eqn\. \([7](https://arxiv.org/html/2606.17851#S5.E7)\) againstxnx^\{n\}gives∑n\(1\+β\)nxn=11−\(1\+β\)x\\sum\_\{n\}\(1\+\\beta\)^\{n\}x^\{n\}=\\frac\{1\}\{1\-\(1\+\\beta\)x\}for the flat count and∑n\(1\+β\)nn\!xn=e\(1\+β\)x\\sum\_\{n\}\\frac\{\(1\+\\beta\)^\{n\}\}\{n\!\}x^\{n\}=e^\{\(1\+\\beta\)x\}for the homotopy cardinality, the latter by the exponential series\.
#### Description\-logic symmetries\.
In sROIQ a symmetric role declarationSym\(R\)\\mathrm\{Sym\}\(R\)identifiesR\(a,b\)R\(a,b\)withR\(b,a\)R\(b,a\), and interchangeable nominals make a swap\(ab\)\(a\\,b\)an automorphism of the theoryTT, so\|Aut\(T\)\|\>1\\lvert\\mathrm\{Aut\}\(T\)\\rvert\>1\. A knowledge\-compilation circuit for WMC over a grounding ofTTthen carries, unchanged, bothWMC\|Aut\(T\)\|\\frac\{\\mathrm\{WMC\}\}\{\\lvert\\mathrm\{Aut\}\(T\)\\rvert\}\(Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\) and the tropical\(max,\+\)\(\\max,\+\)reading\.
## Appendix CThe Lean formalization
The development \(HottNeSy/GroupoidCard\.lean, Mathlib v4\.29\.1\) proves, with\#print axiomsreporting onlypropext,Classical\.choice,Quot\.sound\(nosorry\):groupoidCard\_eq\(Theorem[3\.2](https://arxiv.org/html/2606.17851#S3.Thmtheorem2)\);groupoidCard\_trivial\(the trivial\-symmetry recovery\);groupoidCard\_transport\(representation invariance\);invariant\_posterior\_uniform\(the orbit\-conditional posterior is uniform, the awareness target of Section[3\.2](https://arxiv.org/html/2606.17851#S3.SS2)\); andexch\_groupoidCard, verifying the closed form\(1\+β\)nn\!\\frac\{\(1\+\\beta\)^\{n\}\}\{n\!\}of \([7](https://arxiv.org/html/2606.17851#S5.E7)\) for everynn\(with supportingexch\_invariant,exch\_flat\_sum\); andgroupoidCard\_idempotent\_inert\(Proposition[3\.3](https://arxiv.org/html/2606.17851#S3.Thmtheorem3)\(b\): for aGG\-invariant belief into an idempotent additive monoid the type\-based and set\-based valuations coincide for everyGG, with supportingnsmul\_idem,sum\_eq\_sum\_orbits\_nsmul\)\. The whole argument is orbit–stabilizer arithmetic, so the finite case needs no homotopy type theory and checks in a proof\-irrelevant kernel\.
## Appendix DExperimental details
#### Architectures and training\.
Perception is a small convolutional network \(two conv layers, two linear layers\) mapping a28×2828\\times 28image to digit logits, trained with Adam \(10−310^\{\-3\}, batch256256\) on the merged label by minimizing the negative log\-likelihood ofβ\\beta, the standard NeSy objective\. The symmetry groupGGis enumerated fromβ\\betaas the product of symmetric groups on its label fibers\. Orbit\-averaging \([6](https://arxiv.org/html/2606.17851#S3.E6)\) is applied at inference\.BearstrainsK=5K=5members sequentially; memberttadds−γ1KL\(pt∥p¯<t\)−γ2H\(pt\)\-\\gamma\_\{1\}\\,\\mathrm\{KL\}\(p\_\{t\}\\\|\\bar\{p\}\_\{<t\}\)\-\\gamma\_\{2\}H\(p\_\{t\}\)to the loss \(diversity from the running average plus entropy\), a variant of Eq\. 7 ofmarconato2024bearsin which the running average is over the previous members and their normalization constants are omitted, and the ensemble averages members\.
#### Metrics\.
All calibration numbers are expected calibration error \(ECE\): test examples are binned by their top\-1 confidence, and ECE is the weighted mean over bins of the gap\|accuracy−confidence\|\\lvert\\text\{accuracy\}\-\\text\{confidence\}\\rvert\. On the mixed task, accuracy is*tie\-aware*: a top\-1 tie amongkkclasses counts as1/k1/kcorrect \(expected top\-1 correctness under uniform tie\-breaking\), so a deliberately uniform posterior is scored honestly rather than by an arbitrary argmax\. We report ECE on the identifiable digit \(*id\-ECE*\) and on the confusable digits \(*conf\-ECE*\) separately, the accuracy of the merged labelβ\\beta\(*label*\), and the mean entropy of the digit posterior on the confusable digits \(*conf\-ent\.*\), which is higher when the model is more uncertain, as the aware target requires\. On the XOR task*bit\-ECE*is the same ECE applied to the posterior over a parity bit, with plain argmax accuracy; the missing tie adjustment can only penalize orbit\-averaging, whose parity marginal ties at exactly12\\tfrac\{1\}\{2\}\.
#### Mixed task\.
Table[2](https://arxiv.org/html/2606.17851#S4.T2)reports means over88base seeds for the base and orbit\-averaged rows; eachBearsrow is a single five\-member ensemble \(member seeds100100–104104\)\. TheBearsrows sweep\(γ1,γ2\)\(\\gamma\_\{1\},\\gamma\_\{2\}\): the best label\-preserving setting \(γ1=0\.1,γ2=0\\gamma\_\{1\}\{=\}0\.1,\\gamma\_\{2\}\{=\}0\) gives conf\-ECE0\.1580\.158at label0\.9980\.998; the aggressive setting \(γ1=0\.5,γ2=0\.1\\gamma\_\{1\}\{=\}0\.5,\\gamma\_\{2\}\{=\}0\.1\) gives conf\-ECE0\.0630\.063but label0\.590\.59and id\-ECE0\.660\.66\.
#### XOR\-parity task\.
Two MNIST images, conceptsb1,b2b\_\{1\},b\_\{2\}their parities, labely=b1⊕b2y=b\_\{1\}\\oplus b\_\{2\}, trained onyyonly\. The global flip\(b1,b2\)↦\(1−b1,1−b2\)\(b\_\{1\},b\_\{2\}\)\\mapsto\(1\{\-\}b\_\{1\},1\{\-\}b\_\{2\}\)preserves XOR, soG=ℤ/2G=\\mathbb\{Z\}/2and each parity is identifiable only up to the flip\. Over1010seeds the base model commits to the wrong convention on60%60\\%of them \(bit\-ECE0\.5920\.592, confidence0\.980\.98\); orbit\-averaging a single model gives bit\-ECE0\.0120\.012with the label and the relationb1=b2b\_\{1\}\\\!=\\\!b\_\{2\}accuracy unchanged\. A naive ensemble inherits the majority convention’s bias \(bit\-ECE0\.5880\.588\); an ensemble balanced across conventions with the help of concept ground truth, which the other methods do not receive, only approximates the invariant point \(bit\-ECE0\.1500\.150\) and needs many models\.Similar Articles
A mathematical theory of balancing relational generalization and memorization
This paper introduces a novel task, transitive inference with exceptions, and analytically characterizes how neural network models (kernel ridge regression) balance relational generalization and memorization. The theory is validated in pretrained language models, showing systematic mistakes predicted by the theory.
Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates
This paper investigates the role of group-equivariant architectures in neural fluid dynamics surrogates, introducing the AB-GATr model. It finds that equivariance is beneficial when data lacks strong alignment, but can degrade performance on highly aligned datasets.
NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
This paper introduces NeSyCat Torch, a differentiable tensor implementation of categorical semantics for neurosymbolic learning, unifying classical, fuzzy, and probabilistic semantics under a monadic framework and demonstrating superior speed and accuracy on MNIST addition compared to existing systems like LTN and DeepProbLog.
Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery
This paper introduces the ⋆_G tensor algebra, a framework that makes equivariance an intrinsic algebraic property rather than an architectural constraint, providing provably-optimal symmetry-preserving tensor approximation, Kronecker factorization for composing multiple symmetries, and a Lean 4 formalization. Experiments on QM9 molecular geometry demonstrate data-driven discovery of physical symmetry selection rules.
ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization
ImProver 2 is a neurosymbolic framework for automated proof optimization in Lean 4 that uses an expert-iteration pipeline and a scaffold to train a 7B-parameter model, outperforming much larger models and demonstrating that small models can effectively restructure research-level proofs.