NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
Summary
This paper introduces NeSyCat Torch, a differentiable tensor implementation of categorical semantics for neurosymbolic learning, unifying classical, fuzzy, and probabilistic semantics under a monadic framework and demonstrating superior speed and accuracy on MNIST addition compared to existing systems like LTN and DeepProbLog.
Cached at: 06/18/26, 05:42 AM
# NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
Source: [https://arxiv.org/html/2606.19279](https://arxiv.org/html/2606.19279)
Daniel Romero Schellhorn (daniel.schellhorn@uni-osnabrueck.de), Till Mossakowski (till.mossakowski@uni-osnabrueck.de), Björn Gehrke (bjoern.gehrke@uni-osnabrueck.de); University of Osnabrück, Osnabrück, Germany
###### Abstract
Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. So far, however, NeSyCat has lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link: it interprets computational symbols via neural networks and implements the framework in probabilistic-programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it with a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we additionally employ a batch monad. The axioms *are* the source code: they are written once in monad-based do-notation, and monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while nearly matching the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay within a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation for this case is left for future work).
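The log-semiring stores probabilities as logarithms, so that multiplying probabilities becomes adding log-probabilities and marginalising (summing) becomes a log-sum-exp. The Haskell sketch below illustrates only this arithmetic. `LogDist`, `logSumExp`, `marginalLog`, `fromProbs`, and `sumOfDigits` are hypothetical names, and the list-based representation is an assumption standing in for the lazy log-tensor monad (it has neither tensors nor laziness).

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical list-based log-space distribution: outcomes with log-probabilities.
-- (Illustrative stand-in only; not the lazy log-tensor monad itself.)
newtype LogDist a = LogDist {runLogDist :: [(a, Double)]}

-- Log-semiring "addition": numerically stable log-sum-exp.
logSumExp :: [Double] -> Double
logSumExp [] = -1 / 0                     -- the empty sum is log 0 = -infinity
logSumExp ls
  | isInfinite m && m < 0 = m             -- every term is log 0
  | otherwise = m + log (sum [exp (l - m) | l <- ls])
  where
    m = maximum ls

instance Functor LogDist where
  fmap f (LogDist xs) = LogDist [(f x, l) | (x, l) <- xs]

instance Applicative LogDist where
  pure x = LogDist [(x, 0)]               -- log 1 = 0
  LogDist fs <*> LogDist xs = LogDist [(f x, lf + lx) | (f, lf) <- fs, (x, lx) <- xs]

instance Monad LogDist where
  -- Log-semiring "multiplication": bind adds log-probabilities along each branch.
  LogDist xs >>= k = LogDist [(y, lx + ly) | (x, lx) <- xs, (y, ly) <- runLogDist (k x)]

-- Marginalisation: merge equal outcomes with log-sum-exp.
marginalLog :: Ord a => LogDist a -> [(a, Double)]
marginalLog (LogDist xs) =
  [(x, logSumExp (map snd g)) | g@((x, _) : _) <- groupBy ((==) `on` fst) (sortOn fst xs)]

-- Convert ordinary probabilities (e.g. a softmax output) to log space.
fromProbs :: [(a, Double)] -> LogDist a
fromProbs ps = LogDist [(x, log p) | (x, p) <- ps]

-- MNIST-addition-style query: distribution of the sum of two digits.
sumOfDigits :: LogDist Int -> LogDist Int -> LogDist Int
sumOfDigits d1 d2 = do
  a <- d1
  b <- d2
  return (a + b)
```

For example, with `coin = fromProbs [(0, 0.5), (1, 0.5)]`, `marginalLog (sumOfDigits coin coin)` yields the logarithms of 0.25, 0.5, and 0.25. The benefit of log space shows in `logSumExp [-1000, -1000]`, which returns approximately $-1000+\log 2$, whereas computing `log (exp (-1000) + exp (-1000))` directly underflows to $-\infty$ in double precision.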
## 1 Introduction
Neurosymbolic (NeSy) AI combines the perceptual strength of neural networks with the structured, verifiable reasoning of symbolic logic. A recurring obstacle is fragmentation: classical, fuzzy, and probabilistic NeSy systems each come with their own logical language and semantics, so knowledge bases and learning objectives rarely transfer between them. ULLER, the Unified Language for Learning and Reasoning (vankriekenULLER2024), endows first-order logic (FOL) syntax with three pairwise-independent semantics (classical, fuzzy, and probabilistic), each carrying its own inductive definition of truth.
A recent line of work (schellhornNeSyCatCategorical2026) reformulates all three semantics as instances of a single *categorical* framework built on *monads*, Moggi’s construct for computational effects in functional programming (moggiNotionsComputation1991). The key observation is that an ULLER computation formula $x := m(T_1,\dots,T_n)\,(F)$, interpreted as “run model $m$, then bind its result to $x$, then evaluate $F$”, is exactly monadic do-notation. Fixing a strong monad $\mathcal{M}$ (the effect) and an aggregated truth-value space $\Omega$ with connectives and quantifiers yields a *NeSy framework*; classical, fuzzy, probabilistic, LTN, and possibilistic semantics all reappear as choices of $\mathcal{M}$ and $\Omega$, evaluated by *one* inductive definition of truth.
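To make the correspondence concrete, here is a minimal Haskell sketch assuming the finite distribution monad as $\mathcal{M}$ and Booleans as $\Omega$ (the probabilistic semantics). The classifier `digit` is a hypothetical placeholder that reads a softmax vector instead of running a network, and `Dist`, `sumIs`, and `probTrue` are illustrative names rather than NeSyCat Torch code.

```haskell
-- Finite distribution monad: a list of outcomes with probabilities.
newtype Dist a = Dist {runDist :: [(a, Double)]}

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- Hypothetical stand-in for a neural digit classifier: the "image" is given
-- directly as the softmax vector a trained network would output for it.
type Image = [Double]

digit :: Image -> Dist Int
digit img = Dist (zip [0 ..] img)

-- ULLER formula  x := digit(i1) ( y := digit(i2) ( x + y = s ) )
-- read as: run the model, bind its result to a variable, evaluate the body.
sumIs :: Image -> Image -> Int -> Dist Bool
sumIs i1 i2 s = do
  x <- digit i1
  y <- digit i2
  return (x + y == s)

-- Truth value of a probabilistic formula: the probability of True.
probTrue :: Dist Bool -> Double
probTrue (Dist xs) = sum [p | (True, p) <- xs]
```

With `sumIs [0.0, 0.9, 0.1] [0.2, 0.8] 2`, the digit pairs summing to 2 are (1, 1) and (2, 0), so `probTrue` returns $0.9\cdot 0.8 + 0.1\cdot 0.2 = 0.74$.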
For efficiency reasons, we use *lazy* monads. The lifting operation in the distribution monad computes probabilities by marginalization; a lazy monad ensures that marginalization is performed only where it is actually needed.
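One way to see why marginalisation can be postponed is that it changes only the representation of a distribution, not its meaning. The sketch below assumes a list-based distribution monad whose bind does *not* merge equal outcomes, plus an explicit, hypothetical `marginalize` step, and compares never marginalising intermediate results (`sumDeferred`) with marginalising after every bind (`sumEager`). It illustrates the trade-off only and is not the paper's lazy monad.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Finite distribution monad whose bind does NOT merge equal outcomes.
newtype Dist a = Dist {runDist :: [(a, Double)]}

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- Explicit marginalisation: merge equal outcomes by summing their probabilities.
marginalize :: Ord a => Dist a -> Dist a
marginalize (Dist xs) =
  Dist [(x, sum (map snd g)) | g@((x, _) : _) <- groupBy ((==) `on` fst) (sortOn fst xs)]

uniformDigit :: Dist Int
uniformDigit = Dist [(d, 0.1) | d <- [0 .. 9]]

-- Sum of n random digits, never marginalising intermediate results.
sumDeferred :: Int -> Dist Int
sumDeferred 0 = pure 0
sumDeferred n = do
  s <- sumDeferred (n - 1)
  d <- uniformDigit
  return (s + d)

-- Sum of n random digits, marginalising after every step.
sumEager :: Int -> Dist Int
sumEager 0 = pure 0
sumEager n = marginalize $ do
  s <- sumEager (n - 1)
  d <- uniformDigit
  return (s + d)

-- Probability of one outcome (sums over duplicate entries).
probOf :: Eq a => a -> Dist a -> Double
probOf v (Dist xs) = sum [p | (x, p) <- xs, x == v]
```

For three uniform digits both variants assign the same probability to every sum (e.g. 0.001 to a sum of 0), but `sumDeferred 3` carries 1000 unmerged branches while `sumEager 3` carries 28. Since the semantics is identical either way and only the cost differs, marginalisation can be restricted to the places where it actually pays off.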
Besides the usual FOL function and predicate symbols, we consider computational function symbols $X\to\mathcal{M}Y$ and computational predicate symbols $X\to\mathcal{M}\Omega$. At the deep-learning level, we also need to consider two-sided computational function symbols $\mathcal{M}X\to\mathcal{M}Y$ and two-sided computational predicate symbols $\mathcal{M}X\to\mathcal{M}\Omega$.
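The relation between one-sided and two-sided symbols can be sketched in Haskell, again assuming a finite distribution monad for $\mathcal{M}$. Every one-sided symbol $f\colon X\to\mathcal{M}Y$ induces a two-sided one by monadic bind, but a two-sided symbol may also read its entire input distribution, as a network fed with a probability vector would, and then it need not arise this way. `OneSided`, `TwoSided`, `extend`, `noisySucc`, and `meanValue` are hypothetical names.

```haskell
-- Finite distribution monad: a list of outcomes with probabilities.
newtype Dist a = Dist {runDist :: [(a, Double)]}

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- One-sided computational function symbol  X -> M Y.
type OneSided x y = x -> Dist y

-- Two-sided computational function symbol  M X -> M Y.
type TwoSided x y = Dist x -> Dist y

-- Every one-sided symbol induces a two-sided one via bind (Kleisli extension).
extend :: OneSided x y -> TwoSided x y
extend f = (>>= f)

-- Hypothetical one-sided symbol: a noisy successor on digits.
noisySucc :: OneSided Int Int
noisySucc n = Dist [(n + 1, 0.9), (n, 0.1)]

-- Hypothetical two-sided symbol that reads the whole input distribution and
-- returns a point mass at its expected value; it is not of the form  extend f.
meanValue :: TwoSided Int Double
meanValue (Dist xs) = pure (sum [fromIntegral x * p | (x, p) <- xs])
```

`meanValue` maps the distribution assigning 0.5 to each of 1 and 3 to a point mass at 2.0. If it were `extend f` for some `f`, then `f n` would have to equal `meanValue (pure n)`, a point mass at `n`, and `extend f` would return a two-point distribution on that same input; so no such `f` exists.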
We now recall monads; the monads used in this paper are presented in Table tab:monads-nesycat.
## 2 Monads for Computational Effects
###### Definition 2.1 (Monad (kohlSchwaigerMonads2021, §3.1)).
A monad is given by a triple (`m`, `return`, `>>=`), where `m` is a type constructor mapping a type `a` to the type `m a` of computations with effects and values from `a`; `return` embeds values into computations, and `>>=` (bind) sequences computations:

```haskell
return :: a -> m a
(>>=)  :: m a -> (a -> m b) -> m b
```

Here, `c >>= f` runs the computation `c` over type `a` and passes its value(s) to a function `f` delivering a computation over type `b`.
Haskell provides do-notation: `do { x <- y; f }` is syntactic sugar for `y >>= \x -> f`.
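The desugaring can be checked on Haskell's built-in list monad (nondeterministic choice), where `return` builds a singleton list and `>>=` is `concatMap` with its arguments flipped. The function names in this sketch are illustrative.

```haskell
-- A computation in the list monad: all pairs (x, y) with x from xs and y from ys.
pairsDo :: [Int] -> [Char] -> [(Int, Char)]
pairsDo xs ys = do
  x <- xs
  y <- ys
  return (x, y)

-- The same computation with the do-notation desugared into explicit binds.
pairsBind :: [Int] -> [Char] -> [(Int, Char)]
pairsBind xs ys = xs >>= \x -> ys >>= \y -> return (x, y)

-- For lists, bind is concatMap with flipped arguments.
bindList :: [a] -> (a -> [b]) -> [b]
bindList c f = concatMap f c
```

Both `pairsDo [1, 2] "ab"` and `pairsBind [1, 2] "ab"` evaluate to `[(1,'a'),(1,'b'),(2,'a'),(2,'b')]`.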
## Appendix A Categorical Background
### Monads.
Categorically, a monad on a category $\mathcal{C}$ is a functor $T\colon\mathcal{C}\to\mathcal{C}$ with natural transformations $\eta\colon\mathrm{id}_{\mathcal{C}}\Rightarrow T$ (*unit*) and $\mu\colon TT\Rightarrow T$ (*multiplication*) satisfying $\mu\circ\eta T=\mathrm{id}=\mu\circ T\eta$ and $\mu\circ T\mu=\mu\circ\mu T$. The programming definition above corresponds one-to-one to the equivalent presentation as a *Kleisli triple* $(T,\eta,(\cdot)^{\mathcal{M}})$, where $f^{\mathcal{M}}\colon TA\to TB$ for $f\colon A\to TB$ satisfies $\eta_A^{\mathcal{M}}=\mathrm{id}_{TA}$, $f^{\mathcal{M}}\circ\eta_A=f$, and $g^{\mathcal{M}}\circ f^{\mathcal{M}}=(g^{\mathcal{M}}\circ f)^{\mathcal{M}}$: `(>>=)` is the Kleisli lift $(\cdot)^{\mathcal{M}}$ (applied flipped), and the do-notation of Section [2](https://arxiv.org/html/2606.19279#S2) is its syntactic sugar.
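The three Kleisli-triple laws can be tested on concrete values. This hedged Haskell sketch assumes a finite distribution monad for $\mathcal{M}$, takes `ext` as the Kleisli lift and `pure` as $\eta$, and compares distributions after merging equal outcomes, up to floating-point rounding. The helper names and the sample arrows `f` and `g` are hypothetical.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Finite distribution monad: a list of outcomes with probabilities.
newtype Dist a = Dist {runDist :: [(a, Double)]}

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- Kleisli lift  f^M : M A -> M B  of  f : A -> M B  (bind, flipped).
ext :: (a -> Dist b) -> Dist a -> Dist b
ext f d = d >>= f

-- Unit  eta_A : A -> M A.
eta :: a -> Dist a
eta = pure

-- Canonical form for comparison: merge equal outcomes.
marginal :: Ord a => Dist a -> [(a, Double)]
marginal (Dist xs) =
  [(x, sum (map snd g)) | g@((x, _) : _) <- groupBy ((==) `on` fst) (sortOn fst xs)]

-- Equality of distributions up to floating-point rounding.
sameDist :: Ord a => Dist a -> Dist a -> Bool
sameDist d e =
  map fst m == map fst n && and (zipWith (\p q -> abs (p - q) < 1e-12) (map snd m) (map snd n))
  where
    m = marginal d
    n = marginal e

-- Two sample Kleisli arrows on the integers.
f, g :: Int -> Dist Int
f k = Dist [(k, 0.5), (k + 1, 0.5)]
g k = Dist [(2 * k, 0.25), (k, 0.75)]

law1, law3 :: Dist Int -> Bool
law1 d = sameDist (ext eta d) d                          -- eta^M = id
law3 d = sameDist (ext g (ext f d)) (ext (ext g . f) d)  -- g^M . f^M = (g^M . f)^M

law2 :: Int -> Bool
law2 a = sameDist (ext f (eta a)) (f a)                  -- f^M . eta = f (the lift is composed with eta itself)
```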
### States.
We work over a *concrete* Cartesian category $\mathcal{C}$: objects are sets equipped with structure (for example measurable spaces, tensor spaces, or plain sets), morphisms are structure-preserving maps, finite products exist, and the terminal object $1$ is the one-element set. Effectful maps are always written explicitly as maps $f\colon S\to\mathcal{M}\,T$ for the chosen strong monad $\mathcal{M}$. Because $\mathcal{C}$ is concrete and Cartesian, a *state* on $S$, formally a Kleisli point $1\to\mathcal{M}\,S$, is the same thing as an *element* of $\mathcal{M}\,S$; we use this identification throughout and simply write $D\in\mathcal{M}\,S$.
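In code, this identification amounts to applying a function to the unit value. A minimal Haskell sketch, modelling the terminal object $1$ as `()` and assuming a finite distribution monad for $\mathcal{M}$ (`KleisliPoint`, `toElement`, `toPoint`, and `applyTo` are illustrative names): converting in both directions is lossless, and Kleisli-composing a point with an effectful map $f\colon S\to\mathcal{M}\,T$ gives the same result as binding the element $D$ to $f$.

```haskell
import Control.Monad ((<=<))

-- Finite distribution monad: a list of outcomes with probabilities.
newtype Dist a = Dist {runDist :: [(a, Double)]} deriving (Eq, Show)

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- A Kleisli point  1 -> M S , with the terminal object 1 modelled as ().
type KleisliPoint s = () -> Dist s

-- Identification of Kleisli points with elements of M S.
toElement :: KleisliPoint s -> Dist s
toElement p = p ()

toPoint :: Dist s -> KleisliPoint s
toPoint d () = d

-- Applying an effectful map  f : S -> M T  to a state  D in M S  is bind.
applyTo :: (s -> Dist t) -> Dist s -> Dist t
applyTo f d = d >>= f
```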
###### Proposition A.1 (Pointwise evaluation).
For each $i\in\underline{B}$, evaluation $\mathrm{ev}_i\colon\mathcal{B}\mathcal{M}\,X\to\mathcal{M}\,X$, $m\mapsto m(i)$, is a monad morphism and therefore commutes with the interpretation of do-notation programs: for a batch $s\colon\underline{B}\to S$,

$$\llbracket\varphi\rrbracket^{\mathcal{B}\mathcal{M}}(s)(i)\;=\;\llbracket\varphi\rrbracket^{\mathcal{M}}(s_i)\qquad(i\in\underline{B}).$$
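A hedged Haskell sketch of Proposition A.1, assuming that the batch monad $\mathcal{B}\mathcal{M}$ can be modelled as a reader over batch indices, so that a batched computation is a function $\underline{B}\to\mathcal{M}X$ (here with indices `0 .. n-1` and a finite distribution monad for $\mathcal{M}$). `Batch`, `ev`, `phi`, and the `Sample` layout are hypothetical, and this is not the paper's tensor implementation. The program `phi` is written once, generically in the monad, and interpreted both per batch and per sample.

```haskell
-- Finite distribution monad: a list of outcomes with probabilities.
newtype Dist a = Dist {runDist :: [(a, Double)]} deriving (Eq, Show)

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- Hypothetical batch monad over Dist: one distribution per batch index.
newtype Batch a = Batch {atIndex :: Int -> Dist a}

instance Functor Batch where
  fmap f (Batch m) = Batch (fmap f . m)

instance Applicative Batch where
  pure x = Batch (const (pure x))
  Batch mf <*> Batch mx = Batch (\i -> mf i <*> mx i)

instance Monad Batch where
  Batch m >>= k = Batch (\i -> m i >>= \x -> atIndex (k x) i)

-- ev_i : B M X -> M X, evaluation at batch index i.
ev :: Int -> Batch a -> Dist a
ev i b = atIndex b i

-- A program written once in do-notation, generic in the monad.
phi :: Monad m => m Int -> m Int -> m Int -> m Bool
phi digit1 digit2 target = do
  x <- digit1
  y <- digit2
  s <- target
  return (x + y == s)

-- Hypothetical sample: two digit distributions and the target sum.
type Sample = (Dist Int, Dist Int, Int)

-- Batched interpretation: all samples at once.
runBatched :: [Sample] -> Batch Bool
runBatched ss =
  phi (Batch (\i -> fst3 (ss !! i)))
      (Batch (\i -> snd3 (ss !! i)))
      (Batch (\i -> pure (thd3 (ss !! i))))
  where
    fst3 (a, _, _) = a
    snd3 (_, b, _) = b
    thd3 (_, _, c) = c

-- Unbatched interpretation: one sample.
runSingle :: Sample -> Dist Bool
runSingle (a, b, t) = phi a b (pure t)
```

For every valid index `i`, `ev i (runBatched ss)` equals `runSingle (ss !! i)`: evaluation at `i` preserves `return` and `>>=`, so it can be pushed through every step of the do-block.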
One batched run thus yields all per-sample truth values. Over the full sample, these are the $N$ numbers that $\widehat{L}$ averages; the average over a uniformly sampled mini-batch is an unbiased estimate of $\widehat{L}$.
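The unbiasedness claim can be verified exhaustively on a toy sample: if every mini-batch of size $k$ is equally likely, the expected mini-batch average equals the full-sample average. The per-sample values used below are made-up numbers standing in for the quantities that $\widehat{L}$ averages, and `mean`, `batchesOfSize`, and `expectedBatchMean` are illustrative helpers.

```haskell
import Data.List (subsequences)

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- All mini-batches of size k, drawn without replacement (order ignored).
batchesOfSize :: Int -> [a] -> [[a]]
batchesOfSize k xs = filter ((== k) . length) (subsequences xs)

-- Expected mini-batch average when each size-k batch is equally likely.
expectedBatchMean :: Int -> [Double] -> Double
expectedBatchMean k xs = mean (map mean (batchesOfSize k xs))
```

For the values `[0.9, 0.2, 0.75, 0.4, 1.0]`, `expectedBatchMean k` returns 0.65 (up to rounding) for every `k` from 1 to 5, which is the full-sample mean. Enumerating all subsets is exponential, so this only checks the identity; it is not a training procedure.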