XOResNet: Exclusive-OR Meta-Residuals Facilitate Deep Spiking Neural Networks Learning
Summary
XOResNet introduces OR-ADD shortcut connections and XOR meta-residuals to address spike redundancy and information loss in deep spiking neural networks, achieving state-of-the-art results on Fashion-MNIST, CIFAR-10, CIFAR-100, and miniImageNet.
Cached at: 06/01/26, 09:28 AM
# XOResNet: Exclusive-OR Meta-Residuals Facilitate Deep Spiking Neural Networks Learning
Source: [https://arxiv.org/html/2605.30362](https://arxiv.org/html/2605.30362)
Junsong Wang ([ORCID: 0000-0002-4846-6585](https://orcid.org/0000-0002-4846-6585)), wangjunsong@sztu.edu.cn

School of Artificial Intelligence, Shenzhen Technology University, Shenzhen 518118, China; Faculty of Data Science, City University of Macau, Macau 999078, China
###### Abstract
Spiking neural networks \(SNNs\) hold promise for demonstrating superior learning and representation capabilities in deep models\. Given the tremendous success of ResNet in deep learning, it would naturally follow to train deep SNNs with residual learning\. However, existing residual structures for constructing deep SNNs still present challenges of spike redundancy or information loss, as well as redundant learning\. In the present study, we first aim to address issues of relative spike redundancy in identity mapping and information loss in non\-identity mapping\. To this end, we propose an OR\-ADD \(OA\) shortcut connection to merge output spikes/currents from two branches in the residual structure\. Furthermore, to mitigate redundant learning in the backbone branch of the residual structure, we introduce the concept of XOR meta\-residuals, i\.e\., selecting pre\-learning residuals using the Exclusive\-OR \(XOR\) operation for the backbone branch\. Finally, by integrating the OA shortcut and XOR meta\-residuals, we devise the XOR residual block and further construct XOResNet with varying depths based on this block\. Extensive experiments on four datasets, Fashion\-MNIST, CIFAR\-10, CIFAR\-100, and miniImageNet, show that the proposed XOResNet outperforms existing state\-of\-the\-art deep SNNs optimized via gradient descent\. These results validate the effectiveness of our OA shortcut and XOR meta\-residual components in overcoming fundamental limitations of residual learning in SNNs, providing new architectural insights for building high\-performance neuromorphic systems\.
###### keywords:
Spiking neural networks, Residual learning, OR-ADD (OA) shortcut connection, Exclusive-OR (XOR) meta-residuals

## 1 Introduction
Inspired by the working mechanisms of the human brain and the working patterns of biological neurons, spiking neural networks (SNNs) are considered a promising model in artificial intelligence, embodying high efficiency akin to the brain Roy et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib1)\]. Meanwhile, SNNs are also considered the third generation of neural networks due to their energy advantage of asynchronous binary spiking communication and powerful representation of spatio-temporal dynamics Maass \[[1997](https://arxiv.org/html/2605.30362#bib.bib2)\]. By drawing on and mimicking the learning algorithms and network structure of artificial neural networks (ANNs), SNNs exhibit performance close to that of ANNs on some classification tasks but are still inferior to ANNs on complex tasks Guo et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib3)\], Fang et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib4)\], Xian et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib6)\], Tan et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib7)\], Vicente-Sola et al. \[[2025](https://arxiv.org/html/2605.30362#bib.bib8)\]. An important reason is that discrete spikes and complex spatio-temporal dynamics limit SNNs from directly adopting the deep construction method of ANNs. However, deep networks have advantages over shallow networks in terms of computational cost and representational capability Bengio et al. \[[2007](https://arxiv.org/html/2605.30362#bib.bib5)\].
Artificial neural networks have achieved great success in various tasks, largely due to the success of deep learning. The depth of a network is closely related to its performance on practical tasks: to represent the function of a deep network, a single-hidden-layer network needs an exponential number of neural units Montufar et al. \[[2014](https://arxiv.org/html/2605.30362#bib.bib9)\]. To solve the gradient problem in deep neural networks, He et al. \[[2016](https://arxiv.org/html/2605.30362#bib.bib10)\] proposed the concept of residual learning and used the residual structure to construct “very deep” networks. As a result, the residual structure is widely used in the construction of deep neural networks and has catalyzed the rapid development of deep learning.
To achieve higher performance in SNNs, it is natural to construct deeper networks with residual structures. The spiking version of ResNet (Spiking ResNet) Hu et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib11)\], Sengupta et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib12)\], Han et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\] achieves state-of-the-art performance on most datasets by replacing nonlinear activation units in ANNs with spiking neurons. However, Spiking ResNet still suffers from performance degradation caused by gradient issues. Meanwhile, simply transplanting shortcut connections from ANNs can disrupt the spike-based identity mapping. To train deep SNNs using spatio-temporal backpropagation (STBP) Wu et al. \[[2018](https://arxiv.org/html/2605.30362#bib.bib16)\], Fang et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib4)\] solved the gradient problem by summing output spikes from two branches, constructing SEW ResNet, a deep SNN model with over 100 layers. However, this spike-summing approach can lead to non-spike computation, harming deployment on neuromorphic chips that process binary inputs Shen et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib14)\]. To preserve the binary property of spikes in shortcut connections, Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] used the OR operation to merge output spikes from two branches. This maintains the binary property of spikes while reducing spike redundancy. However, when the shortcut connection involves non-identity mappings with scaling transformations, the OR operation may result in the loss of joint information from both branches.
Furthermore, existing methods for constructing deep SNNs have almost entirely adopted the residual structure of ResNet He et al. \[[2016](https://arxiv.org/html/2605.30362#bib.bib10)\]. This may result in spike redundancy in the residual learning of the backbone branch relative to the shortcut branch. The residual branch (backbone branch) in ResNet specializes in capturing residual features, specifically the differential components between the layer input and the identity shortcut output. However, only after the information from the two branches is fused can the learned residual information be truly determined. In other words, the backbone branch does not determine in advance what residual information needs to be learned, i.e., the residuals of the backbone branch are post-learning residuals. In the case of information transmission via binary spikes, there may be relative redundancy of spikes in the two branches. This may also result in redundant learning in the backbone branch. To reduce spike redundancy and redundant learning, the backbone branch should be provided with pre-learning residual guidance, i.e., selecting pre-learning residuals.
In the present study, we consider two cases of shortcut connection in the construction of deep SNNs: spike operation in identity mapping and information retention and utilization in non-identity mapping. To reduce spike redundancy of the backbone branch relative to the shortcut branch and facilitate residual learning of the backbone branch, we propose utilizing the Exclusive-OR (XOR) operation to provide pre-learning residuals, i.e., meta-residuals, for the backbone branch. We construct residual blocks using the aforementioned method and use them to build a deep SNN called XOResNet, which consistently outperforms both OA ResNet and OR ResNet. We deepen XOResNet to 110 layers without encountering any degradation problem, and in principle it can be deepened to any desired depth.
The main contributions and highlights of this study can be summarized as follows:
1. (i) For the shortcut connection of residual structures, we propose the OR-ADD (OA) connection method. If the shortcut branch realizes spike identity mapping, the output spikes of the shortcut and backbone branches are merged by the OR operation, achieving information complementation while maintaining the binary property of spikes. If the shortcut branch realizes non-identity mapping with scale transformation, the sum of output currents from both branches is used as input to the spiking neuron to avoid information loss.
2. (ii) For the residual learning in the backbone branch, we propose to utilize the XOR operation to pre-screen the residual features that require learning, thereby providing pre-learning residuals (i.e., meta-residuals) for the backbone branch. This approach aims to reduce redundant learning in the backbone branch and enhance its residual learning capabilities.
3. (iii) We integrate the OA shortcut and XOR meta-residuals to construct deep SNNs called XOResNet. Extensive comparisons across four benchmark datasets (CIFAR-10, CIFAR-100, Fashion-MNIST, and miniImageNet) reveal that XOResNet consistently outperforms both OA ResNet and OR ResNet. This demonstrates the efficiency of our proposed residual structure.
The remainder of this paper is organized as follows. Section [2](https://arxiv.org/html/2605.30362#S2) is an overview of related work on building deep SNNs. In Section [3](https://arxiv.org/html/2605.30362#S3), we describe in detail the proposed shortcut branch connection method OR-ADD (OA), the residual information extraction method, and the constructed XOResNet network. In Section [4](https://arxiv.org/html/2605.30362#S4), we systematically present the datasets and experimental results. In Section [5](https://arxiv.org/html/2605.30362#S5), we present a detailed discussion of our work. Finally, in Section [6](https://arxiv.org/html/2605.30362#S6), we present our conclusions and further work.
## 2 Related Works
Deep networks offer advantages over shallow networks in terms of computational cost and representation ability\. Research efforts in constructing deep SNNs can be categorized into two main classes: \(1\) converting pre\-trained deep ANNs into SNNs, and \(2\) training deep SNNs with residual structures by spatio\-temporal backpropagation \(STBP\)\.
ANN to SNN conversion (ANN2SNN). ANN2SNN replaces the non-linear activation units of the pre-trained source ANN with spiking neurons Stöckl and Maass \[[2021](https://arxiv.org/html/2605.30362#bib.bib17)\], Tang et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib18)\]. The central idea of this approach is to use the firing rate of spiking neurons or the average postsynaptic potential to approximate ReLU activation in artificial neurons Shao et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib19)\]. Some advanced conversion methods achieve near-lossless accuracy on VGG and ResNet architectures by adding scaling techniques such as weight normalization and threshold balancing Hu et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib11)\], Sengupta et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib12)\], Han et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\], Duan et al. \[[2022](https://arxiv.org/html/2605.30362#bib.bib20)\]. However, spiking activity must be accumulated over hundreds or thousands of time-steps before the firing rate approximates the activation output of a ReLU.
Training deep SNNs with residual structures based on STBP. The non-differentiable binary spike activity prevents SNNs from being trained directly by backpropagation (BP) algorithms LeCun et al. \[[1988](https://arxiv.org/html/2605.30362#bib.bib21)\], Lillicrap et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib22)\]; nevertheless, the great success of BP in training deep ANNs makes it very attractive for training SNNs. An algorithm called spatio-temporal backpropagation Wu et al. \[[2018](https://arxiv.org/html/2605.30362#bib.bib16)\] was proposed for training SNNs by substituting the gradient of a differentiable function for the gradient of the Heaviside step function in the error backpropagation process. Among works on training deep SNNs based on STBP, Spiking ResNet Hu et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib11)\], Sengupta et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib12)\], Han et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\], a spiking version of ResNet that fully adopted the connection structure of ResNet, still suffered from performance degradation caused by the gradient problem. SEW ResNet solves the gradient issue in Spiking ResNet by summing output spikes from the backbone and shortcut branches, enabling depths beyond 100 layers Fang et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib4)\]. However, spike summation destroys the binary nature of spikes ($1+1=2$), hindering deployment on neuromorphic chips with binary inputs Shen et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib14)\]. To preserve the binary nature of spikes, Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] proposed using an OR operation to merge the output spikes from the backbone branch and the shortcut branch. However, when the shortcut connections are non-identity mappings with scale transformations, the OR operation may result in joint information loss from both branches.
Hu et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib23)\] directly merged the output currents of the two branches, proposing MS-ResNet, which avoids some problems caused by spike operations. Subsequently, this residual connection approach was also used to construct the spiking version of Transformer Yao et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib24)\]. However, this may also result in redundancy of output spikes during identity mapping. Furthermore, the aforementioned works do not account for the specificities and advantages of spiking communications, resulting in a certain degree of redundant learning in the backbone branch.
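The surrogate-gradient idea behind STBP can be sketched as follows. This is a minimal illustration rather than the formulation of Wu et al.: the sigmoid-derivative surrogate, the threshold `v_th`, and the sharpness `alpha` are all assumptions chosen for the example.

```python
import math

def heaviside_spike(v, v_th=1.0):
    """Forward pass: emit a binary spike when the potential reaches the threshold."""
    return 1 if v >= v_th else 0

def surrogate_grad(v, v_th=1.0, alpha=4.0):
    """Backward pass: derivative of sigmoid(alpha * (v - v_th)).

    It stands in for the Heaviside derivative, which is zero almost everywhere.
    The sigmoid surrogate is an assumed choice; other smooth functions also work.
    """
    s = 1.0 / (1.0 + math.exp(-alpha * (v - v_th)))
    return alpha * s * (1.0 - s)

# The spike is binary, while the surrogate gradient peaks at the threshold.
for v in (0.2, 0.9, 1.0, 1.5):
    print(v, heaviside_spike(v), round(surrogate_grad(v), 4))
```

The forward pass keeps the binary spike, so only the backward pass is approximated.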
## 3 Methods
### 3.1 Shortcut connections in residual structures
The fact that deep neural networks have demonstrated more powerful representation ability than shallow networks also holds for SNNs\. Simply increasing the depth of SNNs inevitably suffers from the performance degradation problem experienced by ANNs\. Ignoring the differences between SNNs and ANNs and duplicating ResNet’s residual structure exactly also fails to solve the performance degradation problem of deep SNNs in gradient\-based training\. The networks constructed from the two basic building blocks are Plain Network and Spiking ResNet, as shown in Fig\.[1](https://arxiv.org/html/2605.30362#S3.F1), but both deep networks suffer from performance degradation \(shown in Fig\.[2](https://arxiv.org/html/2605.30362#S3.F2)\)\.

Figure 1: The basic building blocks for deep SNNs. (a) The basic building block in Plain Network. (b) The basic building block in Spiking ResNet. $S^{l}[t]$ / $O^{l}[t]$ denotes the input/output spikes of layer $l$ at time $t$. $1\times 1$ and $3\times 3$ denote the convolution kernel sizes. BN is a batch normalization operation. SN denotes the spiking neuron.

Figure 2: Training accuracy and test accuracy of models with different depths on CIFAR-10. (a) Plain Network. (b) Spiking ResNet.

In this work, we consider the specificity of spike communication in SNNs. For the shortcut connection in a residual structure, we propose the OR-ADD (OA) connection method. Specifically, when spikes realize identity mapping (im), we merge the output spikes of the shortcut branch and backbone branch via the OR operation, maintaining the binary property of spikes and avoiding spike redundancy. When the shortcut connection is a non-identity mapping (nim) with scale transformation, the output currents from the two branches are summed as input to the spiking neuron (SN), avoiding information loss.

Figure 3: Shortcut connection. (a) OR shortcut connection. (b) (c) OR-ADD (OA) shortcut connection.

Table 1: Interconversion between logical operations and arithmetic operations.

As shown in Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(a), the OR shortcut connection proposed in Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] is effective for merging output spikes and maintaining the binary property when the shortcut realizes identity mapping. However, for a non-identity mapping shortcut, this may result in information loss. Combining the dynamical properties of SN, logical operations (Table [1](https://arxiv.org/html/2605.30362#S3.T1)), and our proposed OA connection method (Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(b) and Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(c)), for non-identity mapping:
$$
\begin{aligned}
SN(f^{l}(S^{l}[t])) &\neq f^{l}(S^{l}[t]) \neq S^{l}[t], \\
SN(\mathcal{F}^{l}(S^{l}[t])) \vee SN(f^{l}(S^{l}[t])) &\neq SN(\mathcal{F}^{l}(S^{l}[t]) + f^{l}(S^{l}[t])).
\end{aligned}
\tag{1}
$$

$SN(\cdot)$ is a non-linear activation unit that produces spiking outputs of 0 or 1. Therefore, we propose the OA shortcut connection method, for which different spike/current merging methods are used for different shortcut connection purposes. As shown in Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(b) and Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(c), the OA connection method can be expressed as follows:
$$
OA(S^{l}[t], \mathcal{F}^{l}(S^{l}[t])) =
\begin{cases}
\mathcal{F}^{l}(S^{l}[t]) \vee S^{l}[t] = \mathcal{F}^{l}(S^{l}[t]) + S^{l}[t] - \mathcal{F}^{l}(S^{l}[t]) \cdot S^{l}[t], & \text{im} \\
SN(\mathcal{F}^{l}(S^{l}[t]) + f^{l}(S^{l}[t])), & \text{nim}
\end{cases}
\tag{2}
$$
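The OA rule in Eq. (2) can be sketched in plain Python for a single time-step. The sketch assumes that $SN(\cdot)$ is a memoryless threshold unit with a hypothetical threshold `v_th` (a real spiking neuron also carries membrane state across time-steps), and all function names are illustrative.

```python
def sn(current, v_th=1.0):
    """Simplified spiking neuron: fire iff the input current reaches the threshold.

    Assumption: no leak and no membrane state, i.e. a single time-step only.
    """
    return 1 if current >= v_th else 0

def or_merge(a, b):
    """Logical OR of two binary spikes written arithmetically: a + b - a*b."""
    return a + b - a * b

def oa_identity(backbone_spike, input_spike):
    """'im' case of Eq. (2): merge the spikes of the two branches with OR."""
    return or_merge(backbone_spike, input_spike)

def oa_non_identity(backbone_current, shortcut_current, v_th=1.0):
    """'nim' case of Eq. (2): sum the currents first, then apply the spiking neuron."""
    return sn(backbone_current + shortcut_current, v_th)

# The identity case stays binary even when both branches fire (no 1 + 1 = 2).
print(oa_identity(1, 1))          # 1

# Two sub-threshold currents: OR of the separately thresholded spikes misses the
# joint event, while summing the currents before SN captures it.
x1, x2 = 0.6, 0.6
print(or_merge(sn(x1), sn(x2)))   # 0
print(oa_non_identity(x1, x2))    # 1
```

The last two lines are a concrete instance of the inequality in Eq. (1): thresholding each branch and then applying OR differs from thresholding the summed current.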
### 3.2 The information retention advantages of OA connections
Next, from an information preservation perspective, we theoretically demonstrate that the ADD operation outperforms the OR operation in shortcut connections involving non-identity mappings with scale transformations. In Fig. [3](https://arxiv.org/html/2605.30362#S3.F3)(c), for a non-identity mapping with scale transformation, the joint distribution of the inputs $\mathcal{F}^{l}(S^{l}[t])$ and $f^{l}(S^{l}[t])$ is given by $p(\mathcal{F}^{l}(S^{l}[t]), f^{l}(S^{l}[t]))$. For simplicity, we denote $\mathcal{F}^{l}(S^{l}[t])$ by the random variable $X_{1}$ and $f^{l}(S^{l}[t])$ by $X_{2}$. If the OR operation is applied to merge the spiking outputs of the backbone branch and the shortcut branch, the firing probability of $O^{l}_{OR}[t]$ is:

$$
p\left(O^{l}_{OR}[t]=1 \mid X_{1}, X_{2}\right) = p(X_{1}) + p(X_{2}) - p(X_{1}) \cdot p(X_{2}).
\tag{3}
$$

The conditional entropy is:

$$
\begin{aligned}
H\left(O^{l}_{OR}[t] \mid X_{1}, X_{2}\right) &= H_{B}\left(p(X_{1}) + p(X_{2}) - p(X_{1}) \cdot p(X_{2})\right), \\
H_{B}(p) &= -p\log p - (1-p)\log(1-p).
\end{aligned}
\tag{4}
$$

Since the output $O^{l}_{OR}[t]$ depends solely on the logical OR operation of $SN(X_{1})$ and $SN(X_{2})$, it cannot distinguish whether the spiking event is triggered by $X_{1}$ or $X_{2}$, leading to a loss of joint information. If the ADD operation is applied to sum the currents, the merged current is then fed as input to $SN(\cdot)$. The firing probability of $O^{l}_{ADD}[t]$ can then be expressed as:

$$
p\left(O^{l}_{ADD}[t]=1 \mid X_{1}, X_{2}\right) = p(X_{1} + X_{2}).
\tag{5}
$$

The conditional entropy is:

$$
H\left(O^{l}_{ADD}[t] \mid X_{1}, X_{2}\right) = H_{B}\left(p(X_{1} + X_{2})\right).
\tag{6}
$$

The output $O^{l}_{ADD}[t]$ directly reflects the summation of inputs, preserving the linear combination information from $X_{1}$ and $X_{2}$. Even when individual inputs $X_{1}$ or $X_{2}$ are subthreshold (insufficient to trigger a spike), their superposition $X_{1} + X_{2}$ may still evoke a firing event, thereby capturing joint information.

According to Jensen’s inequality:

$$
p(X_{1}) + p(X_{2}) - p(X_{1}) \cdot p(X_{2}) \leq p(X_{1} + X_{2}).
\tag{7}
$$

This demonstrates that the output $O^{l}_{ADD}[t]$ exhibits a higher firing probability, with the information entropy satisfying $H(O^{l}_{ADD}[t]) \geq H(O^{l}_{OR}[t])$, while the conditional entropy shows $H\left(O^{l}_{ADD}[t] \mid X_{1}, X_{2}\right) \leq H\left(O^{l}_{OR}[t] \mid X_{1}, X_{2}\right)$. The mutual information can be computed as:

$$
\begin{aligned}
I\left(X_{1}, X_{2}; O^{l}_{ADD}[t]\right) &= H\left(O^{l}_{ADD}[t]\right) - H\left(O^{l}_{ADD}[t] \mid X_{1}, X_{2}\right), \\
I\left(X_{1}, X_{2}; O^{l}_{OR}[t]\right) &= H\left(O^{l}_{OR}[t]\right) - H\left(O^{l}_{OR}[t] \mid X_{1}, X_{2}\right).
\end{aligned}
\tag{8}
$$

By applying the established inequality, we derive the following result:

$$
I\left(X_{1}, X_{2}; O^{l}_{ADD}[t]\right) \geq I\left(X_{1}, X_{2}; O^{l}_{OR}[t]\right).
\tag{9}
$$
The ADD operation preserves the joint information of input signals through current summation prior to spike generation, whereas the OR operation incurs greater information loss by independently processing inputs with logical disjunction\. Therefore, from an information\-theoretic perspective, the ADD operation demonstrates superior performance over the OR operation when implementing non\-identity mappings with scaling transformations\.
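The quantities in Eqs. (3) and (4) are easy to evaluate numerically. The sketch below assumes base-2 logarithms (the text does not state the base) and reads $p(X_1)$ and $p(X_2)$ as the firing probabilities of the two branches; the example probabilities are made up for illustration.

```python
import math

def binary_entropy(p):
    """H_B(p) = -p log p - (1 - p) log(1 - p), in bits (base 2 is an assumption)."""
    if p in (0.0, 1.0):
        return 0.0  # limit value: a deterministic output carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def or_firing_probability(p1, p2):
    """Eq. (3): firing probability of the OR-merged output."""
    return p1 + p2 - p1 * p2

p_or = or_firing_probability(0.2, 0.5)   # illustrative branch probabilities
print(round(p_or, 3))                    # 0.6
print(round(binary_entropy(p_or), 3))    # Eq. (4) evaluated at p_or
print(binary_entropy(0.5))               # 1.0, the maximum of H_B
```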
### 3.3 Exclusive-OR Meta-Residuals
SNNs rely on asynchronous binary spikes for transmitting and representing information. Since spikes are binary, additional computation and redundant spikes are unnecessary. However, the current residual structure used to construct deep SNNs does not account for this and simply follows the residual extraction method of ResNet, resulting in some spike redundancy and redundant learning. To reduce spike redundancy and promote residual learning in the backbone branch, we propose selecting pre-learning residuals for the backbone branch via the XOR operation. Since these pre-learning residuals produce the desired residuals, we also refer to them as meta-residuals.

Figure 4: Exclusive-OR meta-residuals structure. (a) OR shortcut connection. (b) ADD shortcut connection.

The proposed XOR meta-residuals structure is shown in Fig. [4](https://arxiv.org/html/2605.30362#S3.F4). Since $f^{l}_{1}(S^{l}[t])$ is primarily used for channel transformation of feature maps with approximate input and output features, the output feature scale of $f^{l}_{2}(S^{l}[t])$ is the same as that of $f^{l}_{1}(S^{l}[t])$. The output features of the two branches are merged by the XOR operation to select the meta-residuals that need to be learned, which is expressed as:
$$
\begin{aligned}
MR &= f^{l}_{1}(S^{l}[t]) \oplus f^{l}_{2}(S^{l}[t]) \\
\Longrightarrow MR &= f^{l}_{1}(S^{l}[t]) + f^{l}_{2}(S^{l}[t]) - 2\left(f^{l}_{1}(S^{l}[t]) \times f^{l}_{2}(S^{l}[t])\right),
\end{aligned}
\tag{10}
$$

where $MR$ denotes the meta-residuals. They guide the backbone branch to specialize in learning residual information that differs from that of the shortcut branch, which both reduces spike redundancy and facilitates residual learning in the backbone branch.
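The arithmetic form of XOR in Eq. (10) can be checked on a toy example. The flat lists below are hypothetical stand-ins for binary feature maps, and the function name is illustrative.

```python
def xor_meta_residual(f1, f2):
    """Element-wise XOR of two binary spike maps via Eq. (10): a + b - 2ab."""
    return [a + b - 2 * a * b for a, b in zip(f1, f2)]

f1 = [1, 1, 0, 0]   # hypothetical output spikes of the first mapping
f2 = [1, 0, 1, 0]   # hypothetical output spikes of the second mapping
print(xor_meta_residual(f1, f2))   # [0, 1, 1, 0]
```

Positions where both mappings agree are zeroed out, so only the differing positions are passed on as meta-residuals.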
The XOR meta-residuals structure shown in Fig. [4](https://arxiv.org/html/2605.30362#S3.F4) uses the OA shortcut connection method proposed in Section [3.1](https://arxiv.org/html/2605.30362#S3.SS1) to merge output spikes/currents. In the following, we combine these two shortcut connection methods to demonstrate that XOResNet, a deep SNN constructed based on the residual structure with XOR meta-residuals, does not suffer from the gradient vanishing/exploding problem during gradient-based training.
When the residual $\mathcal{F}^{l}(S^{l}[t]) = 0$, the identity mapping is completed by an OR connection, and the gradient of the output $O^{l}[t]$ of the $l$-th residual block with respect to the input $S^{l}[t]$ is computed as:

$$
\begin{aligned}
\frac{\partial O^{l}[t]}{\partial S^{l}[t]} &= \frac{\partial\left(\mathcal{F}^{l}(S^{l}[t]) + S^{l}[t] - \mathcal{F}^{l}(S^{l}[t]) \times S^{l}[t]\right)}{\partial S^{l}[t]} \\
&= \frac{\partial S^{l}[t]}{\partial S^{l}[t]} = 1.
\end{aligned}
\tag{11}
$$

Therefore, the OA shortcut connection can overcome the vanishing/exploding gradient problem, and in principle, XOResNet can be deepened to any desired depth.
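The step between the two lines of Eq. (11) can be spelled out. The following expansion is added for illustration and is not taken from the paper; it writes $\mathcal{F}$ for $\mathcal{F}^{l}(S^{l}[t])$ and $S$ for $S^{l}[t]$, and treats spikes as differentiable quantities, as surrogate-gradient training does:

$$
\begin{aligned}
\frac{\partial O}{\partial S}
&= \frac{\partial \mathcal{F}}{\partial S} + 1 - \frac{\partial \mathcal{F}}{\partial S}\, S - \mathcal{F} \\
&= \frac{\partial \mathcal{F}}{\partial S}\,(1 - S) + (1 - \mathcal{F}).
\end{aligned}
$$

When the residual vanishes identically, so that $\mathcal{F} = 0$ and $\partial \mathcal{F} / \partial S = 0$, the expression reduces to $1$. The product of such factors over stacked identity-mapping blocks is then also $1$, which is why depth alone neither shrinks nor amplifies the gradient in this case.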
### 3.4 Network Architectures
The proposed XOR meta-residuals block is used as the basic building block to stack the XOResNet to different depths, as shown in Table [2](https://arxiv.org/html/2605.30362#S3.T2), where $\times K$ denotes the number of iterative stackings. Images from different datasets have different sizes and corresponding output sizes. For instance, $28$ denotes an output size of $28\times 28$, and other sizes follow a similar pattern. Downsampling is performed in Stage 2 and Stage 3 by a convolution operation with stride = 2. In the stem stage, for an input image of $256\times 256$ resolution, we first utilize a $4\times 4$ convolution to reduce its size to $\frac{1}{4}$ of the original. The structure of the proposed XOResNet is illustrated in Fig. [5](https://arxiv.org/html/2605.30362#S3.F5), unfolded in the temporal and spatial dimensions with shared network parameters in the temporal dimension. XOResNet comprises a spiking encoder network and a classifier network, where $\times N$ corresponds to “Stage” in Table [2](https://arxiv.org/html/2605.30362#S3.T2) and $\times M$ to $\times K$ in Table [2](https://arxiv.org/html/2605.30362#S3.T2).
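The resolution schedule described above can be traced with a few lines of arithmetic. This is only a sketch under stated assumptions: it supposes three stages, with Stage 1 keeping the resolution and Stages 2 and 3 each halving it, and it supposes that the $4\times 4$ stem convolution shrinks each side by a factor of 4. The stage count and the stem factor are assumptions, since the contents of Table 2 are not reproduced here. The 28 and 32 pixel inputs are the standard Fashion-MNIST and CIFAR image sizes.

```python
def stage_output_sizes(input_size, stem_reduction=1, stage_strides=(1, 2, 2)):
    """Return the feature-map side length after the stem and after each stage.

    stem_reduction: factor by which the stem shrinks each side (assumed 4 for
        the 4x4 stem convolution on 256x256 images, 1 when there is no stem
        downsampling).
    stage_strides: assumed strides of Stage 1..3; stride 2 halves the side.
    """
    size = input_size // stem_reduction
    sizes = [size]
    for stride in stage_strides:
        size //= stride
        sizes.append(size)
    return sizes

print(stage_output_sizes(28))                     # Fashion-MNIST: [28, 28, 14, 7]
print(stage_output_sizes(32))                     # CIFAR-10/100:  [32, 32, 16, 8]
print(stage_output_sizes(256, stem_reduction=4))  # miniImageNet:  [64, 64, 32, 16]
```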

Figure 5: The network structure of XOResNet and its unfolded formulation. $\times M$ denotes the number of iterations of the identity mapping, and $\times N$ denotes the number of downsample steps. Note that the network’s parameters are shared at all time-steps.

Table 2: The architecture of XOResNet with different depths.

- The three output sizes respectively correspond to the Fashion-MNIST, CIFAR-10/100, and miniImageNet datasets.

## 4 Experiments and results
We conduct extensive experiments on four publicly available datasets: Fashion-MNIST Xiao et al. \[[2017](https://arxiv.org/html/2605.30362#bib.bib25)\], CIFAR-10 Krizhevsky et al. \[[2009](https://arxiv.org/html/2605.30362#bib.bib26)\], CIFAR-100 Krizhevsky et al. \[[2009](https://arxiv.org/html/2605.30362#bib.bib26)\], and miniImageNet Vinyals et al. \[[2016](https://arxiv.org/html/2605.30362#bib.bib27)\]. For simplicity, we refer to the OR-connected ResNet as OR ResNet and the OA-connected ResNet as OA ResNet. Neither architecture incorporates meta-residual configurations, and they differ solely in their non-identity mapping connections.
### 4.1 Performance evaluation in classification tasks
We construct XOResNet models of varying depths according to the structure in Table[2](https://arxiv.org/html/2605.30362#S3.T2), and compare their performance with OA ResNet, OR ResNet, Spiking ResNet, and Plain Network models of the same depths, as shown in Fig\.[6](https://arxiv.org/html/2605.30362#S4.F6)\. We report the mean and standard deviation of model accuracy across 10 trials\. Both Plain Network and Spiking ResNet suffer from performance degradation as model depth increases\. However, OR ResNet, OA ResNet, and XOResNet do not exhibit this problem, as shown in Fig\.[7](https://arxiv.org/html/2605.30362#S4.F7): the training and test accuracy of these models are not inferior to those of shallower models across the four datasets\. This indicates that the OR and OA shortcut connections alleviate the vanishing/exploding gradient issues caused by deep network architectures\. As shown in Fig\.[6](https://arxiv.org/html/2605.30362#S4.F6), XOResNet significantly outperforms other models, indicating our proposed XOR meta\-residuals structure facilitates deep SNN learning\. Meanwhile, OA ResNet outperforms OR ResNet at the same depth, indicating OR shortcut solves the gradient problem but results in information loss for non\-identity mapping, which the OA connection resolves\.

Figure 6: Evaluation of models with varying depths on the CIFAR-10 dataset.

Figure 7: The training and test accuracy of different models with different depths on different datasets. (a) Fashion-MNIST. (b) CIFAR-10. (c) CIFAR-100. (d) miniImageNet.

Table 3: Performance comparison between the proposed method and previous works on different datasets.

| Dataset | Method | Time-steps | Accuracy (%) |
| --- | --- | --- | --- |
| Fashion-MNIST | Spiking ResNet Zheng et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib36)\] | 16 | 93.94 |
| | LISNN Cheng et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib28)\] | 20 | 92.07 |
| | ST-RSBP Zhang and Li \[[2019](https://arxiv.org/html/2605.30362#bib.bib29)\] | 1 | 90.13 |
| | SNN-BP Zhu et al. \[[2022](https://arxiv.org/html/2605.30362#bib.bib30)\] | 5 | 93.28 |
| | BackEISNN Zhao et al. \[[2022](https://arxiv.org/html/2605.30362#bib.bib31)\] | 30 | 93.04 ± 0.31 |
| | TSSL-BP Zhang and Li \[[2020](https://arxiv.org/html/2605.30362#bib.bib32)\] | 5 | 92.69 ± 0.09 |
| | OR-Spiking ResNet Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 16 | 94.21 |
| | OR ResNet50 Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 8 | 93.88 ± 0.02 |
| | OA ResNet50 (ours) | 8 | 94.11 ± 0.06 |
| | XOResNet50 (ours) | 8 | 94.53 ± 0.05 |
| CIFAR-10 | Spiking ResNet Zheng et al. \[[2021](https://arxiv.org/html/2605.30362#bib.bib36)\] | 16 | 88.65 |
| | ANN2SNN Hunsberger and Eliasmith \[[2015](https://arxiv.org/html/2605.30362#bib.bib33)\] | 30 | 82.95 |
| | ANN2SNN Cao et al. \[[2015](https://arxiv.org/html/2605.30362#bib.bib35)\] | 400 | 77.43 |
| | ANN2SNN (ResNet20) Sengupta et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib12)\] | 2000 | 87.46 |
| | NeuNorm Wu et al. \[[2019](https://arxiv.org/html/2605.30362#bib.bib34)\] | 12 | 90.53 |
| | ANN2SNN (ResNet20) Han et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\] | 2048 | 91.36 |
| | MS-ResNet56 Hu et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib23)\] | 6 | 90.4 |
| | OR-Spiking ResNet Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 16 | 89.72 |
| | OR ResNet110 Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 8 | 88.09 ± 0.22 |
| | OA ResNet110 (ours) | 8 | 88.60 ± 0.21 |
| | XOResNet110 (ours) | 8 | 90.54 ± 0.16 |
| CIFAR-100 | VGG11 Zhu et al. \[[2022](https://arxiv.org/html/2605.30362#bib.bib30)\] | 12 | 63.97 |
| | ANN2SNN (ResNet20) Han et al. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\] | 2048 | 67.82 |
| | MS-ResNet50 Hu et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib23)\] | 6 | 65.24 |
| | OR ResNet50 Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 8 | 58.58 ± 0.14 |
| | OA ResNet50 (ours) | 8 | 61.85 ± 0.13 |
| | XOResNet50 (ours) | 8 | 66.30 ± 0.13 |
| miniImageNet | SNN-ResNet-12 (5-way 1-shot) Zhan et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib37)\] | 16 | 48.37 ± 0.24 |
| | SNN-ResNet-12 (5-way 5-shot) Zhan et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib37)\] | 16 | 65.61 ± 0.26 |
| | Matching networks Vinyals et al. \[[2016](https://arxiv.org/html/2605.30362#bib.bib27)\] | - | 60.00 |
| | BASS Qi et al. \[[2024](https://arxiv.org/html/2605.30362#bib.bib38)\] | - | 36.60 ± 0.30 |
| | OR ResNet44 Shan et al. \[[2023](https://arxiv.org/html/2605.30362#bib.bib15)\] | 4 | 53.68 ± 0.24 |
| | OA ResNet44 (ours) | 4 | 57.59 ± 0.20 |
| | XOResNet44 (ours) | 4 | 68.00 ± 0.21 |
To evaluate the effectiveness of the proposed method for facilitating deep SNN learning, we compare it with existing state\-of\-the\-art methods\. Table [3](https://arxiv.org/html/2605.30362#S4.T3) reports the results on the four datasets\. On the Fashion\-MNIST dataset, our XOResNet50 reaches a recognition accuracy of 94\.53%, the highest among the compared methods\. On the CIFAR\-10 dataset, our XOResNet achieves the best recognition performance except for the ANN2SNN method of Han et al\. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\]\. Notably, that method uses 2,048 time\-steps, whereas our XOResNet uses only 8, reducing the number of inference time\-steps to 1/256\. Sengupta et al\. \[[2019](https://arxiv.org/html/2605.30362#bib.bib12)\], Hunsberger and Eliasmith \[[2015](https://arxiv.org/html/2605.30362#bib.bib33)\], and Cao et al\. \[[2015](https://arxiv.org/html/2605.30362#bib.bib35)\] also use ANN2SNN but with fewer time\-steps, and their accuracy is correspondingly lower than that of Han et al\. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\], showing that the number of inference time\-steps is critical for ANN2SNN\. This also indicates that training deep SNNs with STBP is necessary\. Similarly, on the 100\-class CIFAR\-100 dataset, our method is second only to the method presented in Han et al\. \[[2020](https://arxiv.org/html/2605.30362#bib.bib13)\]\. The images in these three datasets have low resolution, so we also evaluate XOResNet on a more complex dataset\. Since ImageNet is too large, we use its subset miniImageNet as an alternative, containing 100 classes with 600 images each\. We standardize the image resolution to 256×256\. The significant performance advantage of our XOResNet44 over methods based on few\-shot learning shows that our model is effective at extracting features from both simple and complex images\.

Figure 8: The Jaccard similarity coefficient between the input and output binary spike features of the backbone branch in the residual block with identity mapping\.

To further demonstrate the effectiveness of the proposed XOR meta\-residuals structure in reducing redundant learning, we compute the Jaccard similarity coefficient \(Bag et al\. \[[2019](https://arxiv.org/html/2605.30362#bib.bib39)\]\) between the input and output binary spike features of the backbone branch in the residual block with identity mapping\. The Jaccard similarity coefficient is the intersection over union \(IoU\) of two binary spike features\. A larger value indicates greater similarity, and vice versa\. As shown in Fig\. [8](https://arxiv.org/html/2605.30362#S4.F8), for the four datasets, the depths of the SNN models are 56, 50, 50, and 32, respectively, and the corresponding numbers of residual blocks with identity mapping are 15, 13, 13, and 7, respectively\. On all four datasets, the Jaccard similarity coefficients between the input and output features of the backbone branch are smaller with XOR meta\-residuals than without them, indicating that XOR meta\-residuals reduce redundant learning in the SNN and promote residual learning in the backbone branch\.
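The intersection\-over\-union definition above can be sketched in a few lines of NumPy\. This is only an illustrative sketch: the function name `jaccard_similarity`, the toy spike arrays, and the convention of returning 1\.0 when neither tensor contains a spike are assumptions made here, not details taken from the paper\.

```python
import numpy as np

def jaccard_similarity(a, b):
    """Intersection over union of two binary spike tensors of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("spike tensors must have the same shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Neither tensor contains a spike; returning 1.0 is a convention
        # assumed for this sketch.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

# Toy input/output spike features of a backbone branch (hypothetical values).
block_input = np.array([[1, 0, 1, 0],
                        [0, 1, 1, 0]])
block_output = np.array([[1, 0, 0, 0],
                         [0, 1, 1, 1]])

# 3 positions spike in both maps, 5 positions spike in at least one.
score = jaccard_similarity(block_input, block_output)
print(score)
```

A backbone whose output largely repeats its input scores close to 1, while a backbone that produces mostly new spikes scores close to 0, which is the sense in which lower values in Fig\. 8 are read as less redundant learning\.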
### 4\.2 Ablation study
The proposed XOResNet shows superior performance compared to other models and methods on four datasets: Fashion\-MNIST, CIFAR\-10, CIFAR\-100, and miniImageNet\. To further validate the effectiveness of the proposed individual components, we perform ablation analysis on these four datasets\.
Fashion\-MNIST\. The Fashion\-MNIST dataset comprises grayscale images, each with a resolution of 28×28 pixels\. Based on the structural setup in Table [2](https://arxiv.org/html/2605.30362#S3.T2), we construct five SNN networks with varying depths, as detailed in Table [4](https://arxiv.org/html/2605.30362#S4.T4)\. As a baseline method, the OR shortcut connection method solves the gradient problem\. As demonstrated in Table [4](https://arxiv.org/html/2605.30362#S4.T4), SNNs employing OR shortcut connections for both identity and non\-identity mappings exhibit no performance degradation when the network depth is increased from 11 to 50 layers\. However, as shown in Eq\. \([1](https://arxiv.org/html/2605.30362#S3.E1)\), the OR shortcut connection is not applicable to non\-identity mapping\. We propose an OA connection method that addresses both identity and non\-identity mappings\. For identity mapping, the method merges the output spikes from the two branches using OR shortcut connections, thereby eliminating spike redundancy while preserving the binary property of the spikes\. For non\-identity mapping, the method sums the currents from the two branches and feeds the sum to the spiking neurons, effectively mitigating information loss\. As shown in Table [4](https://arxiv.org/html/2605.30362#S4.T4), SNNs employing OA connections exhibit no performance degradation as the network depth increases from 11 to 50 layers, indicating that the proposed OA connection method effectively mitigates gradient\-related issues\. For identical network depths, SNNs with OA connections demonstrate superior recognition performance compared to those with OR connections, indicating that current merging is more effective than spike merging for non\-identity mappings\.
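The two merging rules described above can be sketched as follows\. This is a minimal illustration under stated assumptions, not the authors' implementation: the stateless threshold neuron `heaviside_fire`, the threshold `v_th=1.0`, and all array values are hypothetical, and the neuron model actually used in the paper \(with membrane dynamics over time\-steps\) is not reproduced here\.

```python
import numpy as np

def heaviside_fire(current, v_th=1.0):
    """Hypothetical stateless spiking neuron: emit 1 where current >= v_th."""
    return (np.asarray(current, dtype=float) >= v_th).astype(np.uint8)

def oa_identity(backbone_spikes, shortcut_spikes):
    """Identity mapping: merge two binary spike maps with element-wise OR."""
    return np.logical_or(backbone_spikes, shortcut_spikes).astype(np.uint8)

def oa_non_identity(backbone_current, shortcut_current, v_th=1.0):
    """Non-identity mapping: sum the two branch currents, then fire once."""
    total = np.asarray(backbone_current) + np.asarray(shortcut_current)
    return heaviside_fire(total, v_th)

# Identity mapping: OR keeps the output binary, whereas plain addition
# produces a value of 2 where both branches spike.
backbone_spikes = np.array([1, 0, 1, 0], dtype=np.uint8)
shortcut_spikes = np.array([1, 1, 0, 0], dtype=np.uint8)
merged = oa_identity(backbone_spikes, shortcut_spikes)
added = backbone_spikes + shortcut_spikes

# Non-identity mapping: sub-threshold currents in each branch can jointly
# cross the threshold when summed, but vanish if each branch fires first
# and the resulting spikes are merged with OR.
backbone_current = np.array([0.6, 0.2, 0.9, 0.1])
shortcut_current = np.array([0.5, 0.3, 0.4, 0.2])
fired_on_sum = oa_non_identity(backbone_current, shortcut_current)
fired_then_or = oa_identity(heaviside_fire(backbone_current),
                            heaviside_fire(shortcut_current))

print(merged, added, fired_on_sum, fired_then_or)
```

In this toy case the summed currents produce spikes at two positions, while firing each branch separately and then applying OR produces none\. This is one plausible way to picture the information loss attributed to spike merging on non\-identity mappings; whether it matches the mechanism analyzed in Eq\. \(1\) is an assumption\.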
Table 4: The test accuracy of different depth models on the Fashion\-MNIST dataset\.

We further integrate the meta\-residual and OA connections into the SNN\. As shown in Table [4](https://arxiv.org/html/2605.30362#S4.T4), for SNNs with identical depths, networks incorporating meta\-residuals markedly outperform those without meta\-residuals in terms of recognition performance\. This indicates that the proposed meta\-residual structure promotes learning in deep SNNs\. Furthermore, Fig\. [9](https://arxiv.org/html/2605.30362#S4.F9) presents the confusion matrices of 50\-layer networks under different configurations\. The three network configurations achieved average test accuracies of 94\.00%, 94\.20%, and 94\.70% on the 10\-class classification task, further validating the effectiveness of the proposed components\.

Figure 9: Confusion matrices of 50\-layer networks on the Fashion\-MNIST dataset\. The left column represents the training dataset, and the right column represents the test dataset\.

Table 5: The test accuracy of different depth models on the CIFAR\-10 dataset\.

Figure 10: Confusion matrices of 110\-layer networks on the CIFAR\-10 dataset\. The left column represents the training dataset, and the right column represents the test dataset\.

CIFAR\-10\. We construct seven networks with varying depths, as shown in Table [5](https://arxiv.org/html/2605.30362#S4.T5), using different components based on the structural configuration in Table [2](https://arxiv.org/html/2605.30362#S3.T2)\. The networks exhibit no performance degradation as the model depth increases from 11 to 110 layers, indicating that both OR shortcut connections and OA shortcut connections can effectively address the gradient issues in deep models\. For networks of identical depth, those using the OA connection method consistently demonstrate superior recognition performance compared to networks employing the OR connection method\. This indicates that current merging is more effective than spike merging for non\-identity mapping connections\. The introduction of the meta\-residual component further improves the recognition performance of networks at each depth level, demonstrating that the proposed meta\-residual component enhances residual learning in deep SNNs\. Fig\. [10](https://arxiv.org/html/2605.30362#S4.F10) shows the confusion matrices of three 110\-layer networks with different components\. The average recognition accuracies of these networks on the CIFAR\-10 classification task are 88\.10%, 88\.30%, and 90\.90%, respectively, which again demonstrates the effectiveness of the proposed individual components\.

Figure 11: Evaluation of models with varying depths on the CIFAR\-100 dataset\.

Table 6: The test accuracy of different depth models on the CIFAR\-100 dataset\.

CIFAR\-100\. Compared to the CIFAR\-10 dataset with 10 classes, the CIFAR\-100 dataset has 100 classes but only 600 samples per class, making model training and learning more challenging\. As illustrated in Fig\. [11](https://arxiv.org/html/2605.30362#S4.F11), across four networks with varying depths, the OA connection method consistently outperforms the OR method throughout all learning stages, while the introduction of the meta\-residual component markedly enhances learning capabilities\. Table [6](https://arxiv.org/html/2605.30362#S4.T6) further demonstrates that OA shortcut connections outperform OR shortcut connections and that the meta\-residual component enhances residual learning in the model, which aligns with the conclusions drawn from Table [4](https://arxiv.org/html/2605.30362#S4.T4) and Table [5](https://arxiv.org/html/2605.30362#S4.T5)\.

miniImageNet\. The image resolution of the aforementioned three datasets does not exceed 32×32\. We evaluate model performance on the more complex miniImageNet dataset, standardizing the image resolution to 256×256\. On these higher\-resolution and more complex images, as demonstrated in Table [7](https://arxiv.org/html/2605.30362#S4.T7), the advantages of the OA connection method and the meta\-residual component become even more pronounced\. This superiority persists throughout the entire learning process of the model, as illustrated in Fig\. [12](https://arxiv.org/html/2605.30362#S4.F12)\.

Table 7: The test accuracy of different depth models on the miniImageNet dataset\.

Figure 12: Evaluation of models with varying depths on the miniImageNet dataset\.

Through extensive experiments, we demonstrate that the proposed XOR meta\-residuals structure can promote the learning of deep SNNs\. Meanwhile, systematic ablation studies reveal the effectiveness of our proposed components\.
## 5 Discussion
The structure and ideas of ResNet in ANNs offer useful guidance for constructing deep SNNs\. Nevertheless, Spiking ResNet, which exactly mimics the connection structure of ResNet \(Fig\. [1](https://arxiv.org/html/2605.30362#S3.F1)\) while disregarding the binary property of spikes and their spatio\-temporal dynamics, still suffers from the performance degradation problem \(Fig\. [2](https://arxiv.org/html/2605.30362#S3.F2)\)\. The OR operation \(Fig\. [3](https://arxiv.org/html/2605.30362#S3.F3)\(a\)\) resolves performance degradation and maintains the binary spike property, but it causes information loss in the non\-identity mapping \(Eq\. \([1](https://arxiv.org/html/2605.30362#S3.E1)\) and Eq\. \([9](https://arxiv.org/html/2605.30362#S3.E9)\)\)\. We propose the OR\-ADD \(OA\) connection method for shortcut connections in deep SNNs, as depicted in Eq\. \([2](https://arxiv.org/html/2605.30362#S3.E2)\) and Fig\. [3](https://arxiv.org/html/2605.30362#S3.F3)\(b\)\(c\): \(1\) for shortcut connections with identity mapping, the output spikes from the two branches are merged using an OR operation, maintaining the binary spike property and avoiding redundancy; \(2\) for non\-identity mapping connections requiring dimensional transformation, the sum of both branches’ output currents serves as input to the spiking neurons, avoiding information loss\. We analyze how the proposed OA connection method addresses the vanishing/exploding gradient problem\. The approach, as outlined in Eq\. \([11](https://arxiv.org/html/2605.30362#S3.E11)\), offers a promising solution for constructing deep SNNs\.
For residual learning of the backbone branch, and given that communication relies on binary spikes, we propose selecting residual features via the XOR operation, an approach aimed at avoiding excessive computation and reducing spike redundancy\. We introduce the concept of meta\-residuals, as denoted in Eq\. \([10](https://arxiv.org/html/2605.30362#S3.E10)\), which refers to the selected residual features that are yet to be learned\. Integrating the proposed OA shortcut and meta\-residuals within the residual block yields the XOR meta\-residuals structure shown in Fig\. [4](https://arxiv.org/html/2605.30362#S3.F4)\. With this structure, we construct XOResNet models of varying depths, as detailed in Table [2](https://arxiv.org/html/2605.30362#S3.T2) and Fig\. [5](https://arxiv.org/html/2605.30362#S3.F5)\.
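As a small illustration of why XOR\-based selection and an OR shortcut can fit together, the sketch below checks a Boolean identity on all four combinations of two binary spikes\. The choice of operands \(`x` as spikes already carried by the shortcut, `y` as a second, hypothetical spike map\) is an assumption made for illustration; the exact operands of Eq\. \(10\) are not reproduced here\.

```python
import numpy as np

# All four combinations of two binary spikes.
x = np.array([0, 0, 1, 1], dtype=np.uint8)  # assumed: spikes on the shortcut
y = np.array([0, 1, 0, 1], dtype=np.uint8)  # assumed: a second spike map

xor_part = np.bitwise_xor(x, y)         # 1 only where the two maps disagree
recovered = np.bitwise_or(x, xor_part)  # OR-merge shortcut with the XOR part
union = np.bitwise_or(x, y)             # OR-merge of the two original maps

print(xor_part, recovered, union)
```

Because `x OR (x XOR y)` equals `x OR y` for binary inputs, passing only the XOR\-selected part alongside an OR shortcut loses nothing relative to merging the two full maps, while the XOR part never repeats a spike that both maps already share\.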
We conduct extensive experiments on four datasets: Fashion\-MNIST, CIFAR\-10, CIFAR\-100, and miniImageNet\. As shown in Table [3](https://arxiv.org/html/2605.30362#S4.T3), our XOResNet method outperforms state\-of\-the\-art deep SNNs trained with STBP on all four datasets\. Meanwhile, Fig\. [6](https://arxiv.org/html/2605.30362#S4.F6) shows that both Plain Network and Spiking ResNet suffer from performance degradation, which can be solved by both the OR connection and the OA connection\. The OA connection further compensates for the information loss problem in the OR connection\. We also monitor the models’ performance changes during training\. As shown in Fig\. [7](https://arxiv.org/html/2605.30362#S4.F7), the deep model’s performance is not inferior to that of the shallow model at any stage, consistent with the results and conclusions in Fig\. [6](https://arxiv.org/html/2605.30362#S4.F6)\. We further demonstrate the contribution of the XOR meta\-residuals structure to facilitating backbone branch residual learning via Jaccard similarity coefficients \(Fig\. [8](https://arxiv.org/html/2605.30362#S4.F8)\)\. Ablation experiments \(Figs\. [9](https://arxiv.org/html/2605.30362#S4.F9)\-[12](https://arxiv.org/html/2605.30362#S4.F12), Tables [4](https://arxiv.org/html/2605.30362#S4.T4)\-[7](https://arxiv.org/html/2605.30362#S4.T7)\) demonstrate the effectiveness of the proposed OA connection in addressing degradation and information loss, and the contribution of XOR meta\-residuals to residual learning\. Our method provides an empirically supported reference for constructing deep SNNs, preserving the binary nature of spikes while mitigating information loss, spike redundancy, and redundant learning\.
## 6 Conclusion
Spiking neural networks are considered promising models for achieving machine intelligence\. The residual connection and design of ResNet in ANNs provide valuable insights for constructing deep SNNs using gradient\-based training\. In this work, we consider the specificity of spike communication and propose the OA shortcut connection method for SNNs, which maintains the binary property of spikes without causing redundancy or information loss\. We also propose XOR meta\-residuals to facilitate residual learning in deep SNNs by selecting pre\-learning features\. Integrating these ideas, we propose the XOR meta\-residuals structure and use it to construct deep XOResNet\. Extensive experiments on four datasets demonstrate the superiority and efficiency of XOResNet, which can in principle be deepened to arbitrary depth\. In future work, we will use XOResNet for research on biologically plausible few\-shot learning algorithms\.
## CRediT authorship contribution statement
Jianfang Wu: Writing \- original draft, Software, Methodology, Formal analysis, Visualization, Validation, Conceptualization\. Junsong Wang: Formal analysis, Writing \- review & editing, Validation, Supervision, Funding acquisition, Project administration\.
## Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper\.
## Data availability
The data used in this study are publicly available\.
## Acknowledgments
This work was supported by Shenzhen Natural Science Foundation Project \(Grant No\. JCYJ20250604145123031\), Shenzhen Science and Technology Major Project \(Grant No\. KJZD20230923114615032\), the Major Project of Science and Technology Research Program of Chongqing Education Commission \(Grant No\. KJZDM202302001\), the National Natural Science Foundation of China \(Grant No\. 61876132\), the Shenzhen University of Technology Self\-made Experimental Instruments and Equipment Project \(Grant No\. JSZZ202301006\), the Open Fund of National Engineering Laboratory for Big Data System Computing Technology \(Grant No\. SZU\-BDSC\-OF2024\-13\), and the Guangxi Key Laboratory of Brain\-inspired Computing and Intelligent Chips \(Grant No\. BCIC\-24\-K8\)\.
## References
- S\. Bag, S\. K\. Kumar, and M\. K\. Tiwari \(2019\) An efficient recommendation generation using relevant Jaccard similarity\. Information Sciences 483, pp\. 53–64\.
- Y\. Bengio, Y\. LeCun, et al\. \(2007\) Scaling learning algorithms towards AI\. Large\-scale kernel machines 34\(5\), pp\. 1–41\.
- Y\. Cao, Y\. Chen, and D\. Khosla \(2015\) Spiking deep convolutional neural networks for energy\-efficient object recognition\. International Journal of Computer Vision 113, pp\. 54–66\.
- X\. Cheng, Y\. Hao, J\. Xu, and B\. Xu \(2020\) LISNN: improving spiking neural networks with lateral interactions for robust object recognition\. In IJCAI, pp\. 1519–1525\.
- C\. Duan, J\. Ding, S\. Chen, Z\. Yu, and T\. Huang \(2022\) Temporal effective batch normalization in spiking neural networks\. Advances in Neural Information Processing Systems 35, pp\. 34377–34390\.
- W\. Fang, Z\. Yu, Y\. Chen, T\. Huang, T\. Masquelier, and Y\. Tian \(2021\) Deep residual learning in spiking neural networks\. Advances in Neural Information Processing Systems 34, pp\. 21056–21069\.
- Y\. Guo, W\. Peng, Y\. Chen, L\. Zhang, X\. Liu, X\. Huang, and Z\. Ma \(2023\) Joint A\-SNN: joint training of artificial and spiking neural networks via self\-distillation and weight factorization\. Pattern Recognition 142, pp\. 109639\.
- B\. Han, G\. Srinivasan, and K\. Roy \(2020\) RMP\-SNN: residual membrane potential neuron for enabling deeper high\-accuracy and low\-latency spiking neural network\. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp\. 13558–13567\.
- K\. He, X\. Zhang, S\. Ren, and J\. Sun \(2016\) Deep residual learning for image recognition\. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp\. 770–778\.
- Y\. Hu, H\. Tang, and G\. Pan \(2021\) Spiking deep residual networks\. IEEE Transactions on Neural Networks and Learning Systems 34\(8\), pp\. 5200–5205\.
- Y\. Hu, L\. Deng, Y\. Wu, M\. Yao, and G\. Li \(2024\) Advancing spiking neural networks toward deep residual learning\. IEEE Transactions on Neural Networks and Learning Systems 36\(2\), pp\. 2353–2367\.
- E\. Hunsberger and C\. Eliasmith \(2015\) Spiking deep networks with LIF neurons\. arXiv preprint arXiv:1510\.08829\.
- A\. Krizhevsky, G\. Hinton, et al\. \(2009\) Learning multiple layers of features from tiny images\.
- Y\. LeCun, D\. Touresky, G\. Hinton, and T\. Sejnowski \(1988\) A theoretical framework for back\-propagation\. In Proceedings of the 1988 Connectionist Models Summer School, Vol\. 1, pp\. 21–28\.
- T\. P\. Lillicrap, A\. Santoro, L\. Marris, C\. J\. Akerman, and G\. Hinton \(2020\) Backpropagation and the brain\. Nature Reviews Neuroscience 21\(6\), pp\. 335–346\.
- W\. Maass \(1997\) Networks of spiking neurons: the third generation of neural network models\. Neural Networks 10\(9\), pp\. 1659–1671\.
- G\. F\. Montufar, R\. Pascanu, K\. Cho, and Y\. Bengio \(2014\) On the number of linear regions of deep neural networks\. Advances in Neural Information Processing Systems 27\.
- Y\. Qi, Y\. Ban, T\. Wei, J\. Zou, H\. Yao, and J\. He \(2024\) Meta\-learning with neural bandit scheduler\. Advances in Neural Information Processing Systems 36\.
- K\. Roy, A\. Jaiswal, and P\. Panda \(2019\) Towards spike\-based machine intelligence with neuromorphic computing\. Nature 575\(7784\), pp\. 607–617\.
- A\. Sengupta, Y\. Ye, R\. Wang, C\. Liu, and K\. Roy \(2019\) Going deeper in spiking neural networks: VGG and residual architectures\. Frontiers in Neuroscience 13, pp\. 425055\.
- Y\. Shan, X\. Qiu, R\. Zhu, R\. Li, M\. Wang, and H\. Qu \(2023\) OR residual connection achieving comparable accuracy to ADD residual connection in deep residual spiking neural networks\. arXiv preprint arXiv:2311\.06570\.
- Z\. Shao, X\. Fang, Y\. Li, C\. Feng, J\. Shen, and Q\. Xu \(2023\) EICIL: joint excitatory inhibitory cycle iteration learning for deep spiking neural networks\. Advances in Neural Information Processing Systems 36, pp\. 32117–32128\.
- H\. Shen, H\. Wang, Y\. Ma, L\. Li, S\. Duan, and S\. Wen \(2024\) Multi\-LRA: multi logical residual architecture for spiking neural networks\. Information Sciences 660, pp\. 120136\.
- C\. Stöckl and W\. Maass \(2021\) Optimized spiking neurons can classify images with high accuracy through temporal coding with two spikes\. Nature Machine Intelligence 3\(3\), pp\. 230–238\.
- C\. Tan, M\. Šarlija, and N\. Kasabov \(2021\) NeuroSense: short\-term emotion recognition and understanding based on spiking neural network modelling of spatio\-temporal EEG patterns\. Neurocomputing 434, pp\. 137–148\.
- J\. Tang, J\. Lai, X\. Xie, L\. Yang, and W\. Zheng \(2023\) AC2AS: activation consistency coupled ANN\-SNN framework for fast and memory\-efficient SNN training\. Pattern Recognition 144, pp\. 109826\.
- A\. Vicente\-Sola, D\. L\. Manna, P\. Kirkland, G\. Di Caterina, and T\. J\. Bihl \(2025\) Spiking neural networks for event\-based action recognition: a new task to understand their advantage\. Neurocomputing 611, pp\. 128657\.
- O\. Vinyals, C\. Blundell, T\. Lillicrap, D\. Wierstra, et al\. \(2016\) Matching networks for one shot learning\. Advances in Neural Information Processing Systems 29\.
- Y\. Wu, L\. Deng, G\. Li, and L\. Shi \(2018\) Spatio\-temporal backpropagation for training high\-performance spiking neural networks\. Frontiers in Neuroscience 12, pp\. 323875\.
- Y\. Wu, L\. Deng, G\. Li, J\. Zhu, Y\. Xie, and L\. Shi \(2019\) Direct training for spiking neural networks: faster, larger, better\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 33, pp\. 1311–1318\.
- R\. Xian, X\. Xiong, H\. Peng, J\. Wang, A\. R\. de Arellano Marrero, and Q\. Yang \(2024\) Feature fusion method based on spiking neural convolutional network for edge detection\. Pattern Recognition 147, pp\. 110112\.
- H\. Xiao, K\. Rasul, and R\. Vollgraf \(2017\) Fashion\-MNIST: a novel image dataset for benchmarking machine learning algorithms\. arXiv preprint arXiv:1708\.07747\.
- M\. Yao, J\. Hu, Z\. Zhou, L\. Yuan, Y\. Tian, B\. Xu, and G\. Li \(2023\) Spike\-driven transformer\. Advances in Neural Information Processing Systems 36, pp\. 64043–64058\.
- Q\. Zhan, B\. Wang, A\. Jiang, X\. Xie, M\. Zhang, and G\. Liu \(2024\) A two\-stage spiking meta\-learning method for few\-shot classification\. Knowledge\-Based Systems 284, pp\. 111220\.
- W\. Zhang and P\. Li \(2019\) Spike\-train level backpropagation for training deep recurrent spiking neural networks\. Advances in Neural Information Processing Systems 32\.
- W\. Zhang and P\. Li \(2020\) Temporal spike sequence learning via backpropagation for deep spiking neural networks\. Advances in Neural Information Processing Systems 33, pp\. 12022–12033\.
- D\. Zhao, Y\. Zeng, and Y\. Li \(2022\) BackEISNN: a deep spiking neural network with adaptive self\-feedback and balanced excitatory–inhibitory neurons\. Neural Networks 154, pp\. 68–77\.
- H\. Zheng, Y\. Wu, L\. Deng, Y\. Hu, and G\. Li \(2021\) Going deeper with directly\-trained larger spiking neural networks\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 35, pp\. 11062–11070\.
- Y\. Zhu, Z\. Yu, W\. Fang, X\. Xie, T\. Huang, and T\. Masquelier \(2022\) Training spiking neural networks with event\-driven backpropagation\. Advances in Neural Information Processing Systems 35, pp\. 30528–30541\.