Dendritic Neural Networks with Equilibrium Propagation
Summary
This paper investigates integrating dendritic neural networks with equilibrium propagation, showing that this biologically plausible approach improves performance on challenging datasets compared to standard equilibrium propagation.
# Dendritic Neural Networks with Equilibrium Propagation
Source: [https://arxiv.org/html/2605.08135](https://arxiv.org/html/2605.08135)
Yoshimasa Kubo, Department of Computer Science, Lakehead University, Thunder Bay, Canada (ykubo@lakeheadu.ca)
###### Abstract
Equilibrium propagation (EP) is a biologically plausible alternative to backpropagation (BP), but its effectiveness can degrade in deeper and more challenging learning settings. In parallel, dendritic neural networks have demonstrated improved performance and generalization when trained with BP, suggesting that structured, biologically inspired architectures may enhance learning. In this work, we investigate the integration of dendritic neural networks with equilibrium propagation using an advanced EP framework. We evaluate the proposed dendritic EP model on MNIST, Kuzushiji-MNIST (KMNIST), and Fashion-MNIST (FMNIST), considering both shallow and deeper architectures. Our results show that dendritic EP achieves performance comparable to standard EP on simple tasks, while providing consistent improvements on more challenging datasets and deeper models. In particular, dendritic EP significantly outperforms standard EP on KMNIST and FMNIST, and approaches the performance of dendritic networks trained with backpropagation through time. To further understand these improvements, we analyze the evolution of hidden states during the free phase. We observe that dendritic EP exhibits higher activation magnitudes and more distributed hidden-state activity compared to standard EP, indicating that dendritic structure alters the internal network dynamics. These findings suggest that incorporating dendritic structure can enhance the effectiveness of biologically plausible learning algorithms, especially in regimes where standard EP struggles. Our work highlights the importance of architectural design for improving biologically inspired training methods.
## 1 Introduction
Equilibrium propagation (EP) (Scellier and Bengio, [2017](https://arxiv.org/html/2605.08135#bib.bib1), [2019](https://arxiv.org/html/2605.08135#bib.bib2); Ernoult et al., [2019](https://arxiv.org/html/2605.08135#bib.bib3); Laborieux et al., [2021](https://arxiv.org/html/2605.08135#bib.bib4); Laborieux and Zenke, [2022](https://arxiv.org/html/2605.08135#bib.bib5)) is a biologically plausible training algorithm that serves as an alternative to backpropagation (BP), the method most widely used to train neural networks. Recent work (Laborieux et al., [2021](https://arxiv.org/html/2605.08135#bib.bib4)) has shown that recurrent neural networks trained with advanced variants of EP can achieve performance competitive with those trained using BP.
Several extensions of EP have been proposed, including applications to reinforcement learning (Kubo et al., [2022](https://arxiv.org/html/2605.08135#bib.bib6)), continual learning (Kubo et al., [2025](https://arxiv.org/html/2605.08135#bib.bib7)), models with heterogeneous time constants (Kubo et al., [2026](https://arxiv.org/html/2605.08135#bib.bib12)), and architectures incorporating convolutional layers (Ernoult et al., [2019](https://arxiv.org/html/2605.08135#bib.bib3); Laborieux et al., [2021](https://arxiv.org/html/2605.08135#bib.bib4)). While these studies expand the applicability of EP, they primarily focus on the learning algorithm itself rather than on the design of biologically plausible neural network architectures.
In parallel, recent studies have explored dendritic neural networks trained with BP, demonstrating improved performance in continual learning (Grewal et al., [2021](https://arxiv.org/html/2605.08135#bib.bib16)) and reduced overfitting (Chavlis and Poirazi, [2025](https://arxiv.org/html/2605.08135#bib.bib17)). Although dendritic neurons are biologically motivated, these approaches still rely on BP, which is not considered biologically plausible.
In this work, we investigate the integration of dendritic neural networks with equilibrium propagation, using an advanced EP framework proposed by Laborieux et al. ([2021](https://arxiv.org/html/2605.08135#bib.bib4)). We evaluate our approach on MNIST (LeCun and Cortes, [2005](https://arxiv.org/html/2605.08135#bib.bib13)), Kuzushiji-MNIST (KMNIST) (Clanuwat et al., [2018](https://arxiv.org/html/2605.08135#bib.bib14)), and Fashion-MNIST (FMNIST) (Xiao et al., [2017](https://arxiv.org/html/2605.08135#bib.bib15)). Our results show that the proposed dendritic EP model matches standard EP without dendritic structure on MNIST and outperforms it on the more challenging KMNIST and FMNIST datasets, achieving performance competitive with dendritic networks trained using backpropagation through time.
To further understand the effect of dendritic structure, we also analyze the internal dynamics of the models by visualizing hidden-state trajectories during the free phase. This analysis provides insight into how dendritic architectures influence network representations beyond performance metrics.
## 2 Methods
In this section, we describe equilibrium propagation (Section 2.1), the dendritic neuron model (Section 2.2), and the model and dataset specifications (Section 2.3).
### 2.1 Equilibrium Propagation
Equilibrium Propagation (EP) (Scellier and Bengio, [2017](https://arxiv.org/html/2605.08135#bib.bib1), [2019](https://arxiv.org/html/2605.08135#bib.bib2); Ernoult et al., [2019](https://arxiv.org/html/2605.08135#bib.bib3); Laborieux et al., [2021](https://arxiv.org/html/2605.08135#bib.bib4); Laborieux and Zenke, [2022](https://arxiv.org/html/2605.08135#bib.bib5)) is a biologically plausible learning algorithm based on energy minimization. Given an input $\mathbf{x}$, the network evolves its state variables $\mathbf{s}$ toward a fixed point determined by an energy function $E(\mathbf{s};\theta)$, where $\theta$ denotes the model parameters.
In the free phase, the network evolves without any teaching signal:
$$\frac{d\mathbf{s}}{dt} = -\frac{\partial E(\mathbf{s};\theta)}{\partial \mathbf{s}}. \tag{1}$$
This phase converges to a free fixed point, denoted by $\mathbf{s}^{0}$:
$$\mathbf{s}^{0} = \arg\min_{\mathbf{s}} E(\mathbf{s};\theta). \tag{2}$$
In the nudged phase, the output layer is weakly driven by a loss function $\ell(\mathbf{s},\mathbf{y})$, where $\mathbf{y}$ is the target label. For a positive nudging strength $+\beta$, the dynamics are:
$$\frac{d\mathbf{s}}{dt} = -\frac{\partial E(\mathbf{s};\theta)}{\partial \mathbf{s}} - \beta\frac{\partial \ell(\mathbf{s},\mathbf{y})}{\partial \mathbf{s}}, \tag{3}$$
which converges to a positively nudged fixed point $\mathbf{s}^{+\beta}$.
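The relaxation in Eqs. (1) and (3) can be illustrated with a minimal Euler-integration sketch. It assumes a hypothetical scalar toy with energy $E(s;\theta) = \tfrac12 s^2 - \theta x s$ and loss $\ell(s,y) = \tfrac12 (s-y)^2$; the step size `dt`, the number of steps, and all function names are illustrative choices, not the paper's multilayer energy or its settings.

```python
def dE_ds(s, theta, x):
    # Toy energy E(s; theta) = 0.5*s**2 - theta*x*s (hypothetical, for illustration only)
    return s - theta * x


def dloss_ds(s, y):
    # Squared-error loss l(s, y) = 0.5*(s - y)**2
    return s - y


def relax(theta, x, y, beta, dt=0.1, steps=500):
    """Euler-integrate ds/dt = -dE/ds - beta * dl/ds, starting from s = 0.

    beta = 0 gives the free phase (Eq. 1); beta > 0 gives the nudged phase (Eq. 3).
    """
    s = 0.0
    for _ in range(steps):
        s += dt * (-dE_ds(s, theta, x) - beta * dloss_ds(s, y))
    return s


theta, x, y, beta = 0.5, 2.0, 3.0, 0.1
s_free = relax(theta, x, y, beta=0.0)     # settles at theta*x
s_nudged = relax(theta, x, y, beta=beta)  # settles at (theta*x + beta*y) / (1 + beta)
print(s_free, s_nudged)
```

For this toy, the free phase settles at $s^{0} = \theta x = 1$, and the nudged phase settles at $(\theta x + \beta y)/(1+\beta) \approx 1.18$, i.e. slightly pulled toward the target $y = 3$. Passing a negative `beta` gives the oppositely nudged phase used by the symmetric variant in Eq. (5).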
The standard two-phase EP update is then estimated as:
$$\Delta\theta \propto -\frac{1}{\beta}\left(\frac{\partial E(\mathbf{s}^{+\beta};\theta)}{\partial\theta} - \frac{\partial E(\mathbf{s}^{0};\theta)}{\partial\theta}\right). \tag{4}$$
The bracketed difference divided by $\beta$ approximates the gradient, with respect to $\theta$, of the loss evaluated at the free fixed point, so the leading minus sign makes Eq. (4) a gradient-descent step.
In this study, we use the symmetric nudging variant proposed by Laborieux et al. ([2021](https://arxiv.org/html/2605.08135#bib.bib4)). In addition to the positively nudged phase, the network is also nudged in the opposite direction using $-\beta$:
$$\frac{d\mathbf{s}}{dt} = -\frac{\partial E(\mathbf{s};\theta)}{\partial \mathbf{s}} + \beta\frac{\partial \ell(\mathbf{s},\mathbf{y})}{\partial \mathbf{s}}. \tag{5}$$
This phase converges to a negatively nudged fixed point $\mathbf{s}^{-\beta}$.
The symmetric EP update is computed using a centered finite difference:
$$\Delta\theta \propto -\frac{1}{2\beta}\left(\frac{\partial E(\mathbf{s}^{+\beta};\theta)}{\partial\theta} - \frac{\partial E(\mathbf{s}^{-\beta};\theta)}{\partial\theta}\right). \tag{6}$$
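Compared with the standard two-phase estimator, this centered estimator reduces the bias introduced by finite nudging: the one-sided difference in Eq. (4) has an error of order $\beta$, whereas the centered difference in Eq. (6) cancels the first-order term and leaves an error of order $\beta^{2}$. This is particularly useful in deeper networks, where accurate feedback signals are important for stable credit assignment.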
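The difference in bias can be checked on a scalar toy that admits closed-form fixed points. This is an illustrative assumption, not the paper's network: take $E(s;\theta) = \tfrac12 s^2 - \theta x s$ and $\ell(s,y) = \tfrac12 (s-y)^2$, so that $s^{\pm\beta} = (\theta x \pm \beta y)/(1 \pm \beta)$, $\partial E/\partial\theta = -xs$, and the loss at the free fixed point is $L(\theta) = \tfrac12(\theta x - y)^2$. Working through the finite-difference terms of Eqs. (4) and (6), the one-sided estimate equals $\partial L/\partial\theta$ scaled by $1/(1+\beta)$, while the centered one is scaled by $1/(1-\beta^2)$. The sketch below verifies this numerically; the function names are hypothetical.

```python
def fixed_point(theta, x, y, beta):
    """Closed-form minimizer of E + beta*l for the toy energy and loss."""
    return (theta * x + beta * y) / (1.0 + beta)


def dE_dtheta(s, x):
    """Partial derivative of E(s; theta) = 0.5*s**2 - theta*x*s with respect to theta."""
    return -x * s


def one_sided(theta, x, y, beta):
    """Two-phase gradient estimate (free vs. +beta), the finite-difference term of Eq. (4)."""
    s0 = fixed_point(theta, x, y, 0.0)
    s_pos = fixed_point(theta, x, y, beta)
    return (dE_dtheta(s_pos, x) - dE_dtheta(s0, x)) / beta


def symmetric(theta, x, y, beta):
    """Centered gradient estimate (+beta vs. -beta), the finite-difference term of Eq. (6)."""
    s_pos = fixed_point(theta, x, y, beta)
    s_neg = fixed_point(theta, x, y, -beta)
    return (dE_dtheta(s_pos, x) - dE_dtheta(s_neg, x)) / (2.0 * beta)


theta, x, y = 0.5, 2.0, 3.0
true_grad = x * (theta * x - y)  # dL/dtheta for L(theta) = 0.5*(theta*x - y)**2

for beta in (0.5, 0.1, 0.01):
    r1 = one_sided(theta, x, y, beta) / true_grad  # analytically 1 / (1 + beta)
    r2 = symmetric(theta, x, y, beta) / true_grad  # analytically 1 / (1 - beta**2)
    print(f"beta={beta}: one-sided ratio {r1:.4f}, centered ratio {r2:.4f}")
```

As $\beta$ shrinks by a factor of ten, the one-sided error shrinks roughly tenfold while the centered error shrinks roughly a hundredfold, which is the first-order versus second-order behavior described above.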
### 2.2 Dendritic Neurons
Biological neurons receive inputs through distinct dendritic compartments, primarily basal and apical dendrites. Basal dendrites integrate feedforward inputs from lower layers, while apical dendrites receive feedback signals from higher layers. These compartments process signals locally before integration at the soma, enabling structured and nonlinear interactions between feedforward and feedback pathways.
To model this mechanism, we introduce a dendritic neural network architecture in which each neuron receives two types of inputs: a basal (feedforward) input and an apical (feedback) input. Our implementation follows recent dendritic neural network models that represent each neuron as a collection of nonlinear dendritic branches with aggregated outputs at the soma (Han et al., [2022](https://arxiv.org/html/2605.08135#bib.bib18)). In contrast to biologically detailed compartmental models, we adopt a simplified and computationally efficient formulation that is compatible with equilibrium propagation.
Formally, for a hidden layer $\mathbf{s}^{\ell}$, the basal and apical inputs are defined as:
$$\mathbf{b}^{\ell} = f_{b}\!\left(\mathbf{W}^{\ell}\mathbf{s}^{\ell-1}\right), \quad \mathbf{a}^{\ell} = f_{a}\!\left(\mathbf{B}^{\ell}\mathbf{s}^{\ell+1}\right), \tag{7}$$
where $\mathbf{W}^{\ell}$ and $\mathbf{B}^{\ell}$ denote the basal and apical connections, respectively, and $f_{b}(\cdot)$ and $f_{a}(\cdot)$ represent nonlinear dendritic transformations.
Each dendritic compartment consists of multiple sparse branches. Each branch connects to a subset of presynaptic neurons, applies a linear transformation followed by a nonlinearity, and produces a local response. The outputs of these branches are then aggregated to form the dendritic input. Let $z_{i,k}^{\ell}$ denote the output of the $k$-th branch associated with neuron $i$ in layer $\ell$. The basal input is computed as:
$$b_{i}^{\ell} = \frac{1}{K}\sum_{k=1}^{K} z_{i,k}^{\ell}, \tag{8}$$
where $K$ is the number of branches per neuron. An analogous formulation is used for the apical input.
In practice, the number of basal and apical branches, branch sparsity, and the scaling of apical feedback are treated as hyperparameters. The specific values used in our experiments are provided in Section [2.3](https://arxiv.org/html/2605.08135#S2.SS3).
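Eq. (8) can be sketched for a single layer as follows. Everything beyond the averaging itself is an assumption not specified in the text: each branch draws its own random presynaptic subset; "branch sparsity 0.5" is read as half of the presynaptic connections being dropped per branch; branches carry no bias; and weights are Gaussian with variance 1/(number of kept inputs). The helper names `make_branches` and `dendritic_input` are hypothetical. The toy layer uses 8 branches per neuron, matching the basal setting reported in Section 2.3, but only a handful of units.

```python
import random


def relu(v):
    return max(0.0, v)


def make_branches(n_in, n_out, n_branches, sparsity, rng):
    """Create hypothetical sparse branch parameters for one layer.

    Returns one entry per neuron; each entry is a list of
    (presynaptic_indices, weights) pairs, one pair per branch.
    """
    n_keep = max(1, round(n_in * (1.0 - sparsity)))  # inputs kept per branch
    layer = []
    for _ in range(n_out):
        neuron = []
        for _ in range(n_branches):
            idx = rng.sample(range(n_in), n_keep)  # random presynaptic subset
            w = [rng.gauss(0.0, 1.0 / n_keep ** 0.5) for _ in idx]
            neuron.append((idx, w))
        layer.append(neuron)
    return layer


def dendritic_input(layer, s_pre):
    """Eq. (8): mean over branches of z_{i,k} = ReLU(w_{i,k} . s_pre[subset])."""
    out = []
    for neuron in layer:
        z = [relu(sum(wj * s_pre[j] for j, wj in zip(idx, w))) for idx, w in neuron]
        out.append(sum(z) / len(z))
    return out


rng = random.Random(0)
basal = make_branches(n_in=6, n_out=3, n_branches=8, sparsity=0.5, rng=rng)
b = dendritic_input(basal, [0.0, 1.0, 0.5, 0.0, 1.0, 0.2])
```

An apical compartment would reuse `make_branches` with `n_branches=2` and take the activity of the layer above as its input.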
The somatic activation is obtained by combining basal and apical inputs:
$$\mathbf{s}^{\ell} = \sigma\!\left(\mathbf{b}^{\ell} + \alpha\,\mathbf{a}^{\ell}\right), \tag{9}$$
where $\sigma(\cdot)$ is the activation function and $\alpha$ controls the relative strength of the apical feedback signal.
This dendritic formulation introduces structured, sparse, and nonlinear processing of both feedforward and feedback signals, while remaining computationally efficient and compatible with equilibrium propagation. Unlike prior dendritic models trained with backpropagation, our approach integrates this architecture with a biologically plausible learning rule. Figure [1(a)](https://arxiv.org/html/2605.08135#S3.F1.sf1) summarizes the architecture.
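Eq. (9) combines the two compartments at the soma. The element-wise sketch below assumes that $\sigma$ is the clip-to-[0, 1] form of the hard sigmoid, $\sigma(v) = \min(1, \max(0, v))$ (PyTorch's `nn.Hardsigmoid`, by contrast, uses $v/6 + 1/2$ before clipping), and that the apical scaling factor of 0.2 listed in Section 2.3 is the $\alpha$ of Eq. (9). Both are assumptions, and the function names are hypothetical.

```python
def hard_sigmoid(v):
    # Clip-to-[0, 1] form assumed here; other libraries use a different slope
    return min(1.0, max(0.0, v))


def soma(basal, apical, alpha=0.2):
    """Eq. (9), element-wise: s_i = sigma(b_i + alpha * a_i)."""
    return [hard_sigmoid(b + alpha * a) for b, a in zip(basal, apical)]


basal = [0.3, 1.2, -0.4]
apical = [1.0, 0.5, 3.0]
s = soma(basal, apical)
# unit 0: 0.3 + 0.2*1.0 = 0.5; unit 1 saturates at 1.0;
# unit 2: basal drive alone is negative (silent), apical feedback lifts it to about 0.2
```

The third unit shows how apical feedback enters: its basal drive alone would leave it at zero under $\sigma$, but the scaled apical input moves it into the active range.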
Table 1: Hyperparameters for EP, DEP, and DBPTT. Here, $\alpha_1$ is the learning rate for the weights between the input and the first hidden layer, $\alpha_2$ is the learning rate for the weights between the first hidden layer and the next layer (the output layer, or the second hidden layer when there are two hidden layers), and $\alpha_3$ is the learning rate for the weights between the second hidden layer and the output layer (two-hidden-layer models only). These learning rates are distinct from the apical scaling $\alpha$ in Eq. (9). The 'Free Phase' and 'Clamped Phase' columns specify the number of time steps used during the free and weakly clamped (nudged) phases, respectively. $\beta$ is the nudging parameter for the weakly clamped phase.
### 2.3 Model and Dataset Specification
We evaluate our equilibrium propagation model with dendritic neurons (DEP) on MNIST (LeCun and Cortes, [2005](https://arxiv.org/html/2605.08135#bib.bib13)), Kuzushiji-MNIST (KMNIST) (Clanuwat et al., [2018](https://arxiv.org/html/2605.08135#bib.bib14)), and Fashion-MNIST (FMNIST) (Xiao et al., [2017](https://arxiv.org/html/2605.08135#bib.bib15)). We compare against two baselines: (i) a standard EP model without dendritic structure (EP), and (ii) a dendritic model trained using backpropagation through time (DBPTT).
The hyperparameters used in our experiments are summarized in Table [1](https://arxiv.org/html/2605.08135#S2.T1). For MNIST, we use a single hidden layer with 256 units. For KMNIST and FMNIST, we use two hidden layers of 256 units each for all models, reflecting the increased complexity of these datasets.
For the activation function, we use the hard sigmoid for most configurations. However, for the EP model on FMNIST, we use the $\tanh$ activation, as we empirically observed that the hard sigmoid leads to unstable training on this dataset.
For dendritic neurons, we use a fixed configuration across all datasets, consisting of 8 basal branches, 2 apical branches, a branch sparsity of 0.5, and an apical scaling factor of 0.2. These settings were chosen to balance model expressivity and computational efficiency. Within dendritic branches, we use the rectified linear unit (ReLU) as the nonlinearity.
We train all models using stochastic gradient descent (SGD) with momentum 0.9. We do not employ adaptive optimization methods such as Adam (Kingma and Ba, [2014](https://arxiv.org/html/2605.08135#bib.bib19)), in order to maintain consistency across models and avoid introducing additional optimization-related confounding factors.
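The architectural and optimizer settings stated in this section can be gathered into one configuration object, which makes the per-dataset differences (number of hidden layers, the $\tanh$ exception for EP on FMNIST) explicit. This is a sketch with hypothetical class and field names; the learning rates, $\beta$, and phase lengths listed in Table 1 are not reproduced in this text, so they are deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DendriteConfig:
    # Values stated in Section 2.3; the field names are hypothetical.
    n_basal_branches: int = 8
    n_apical_branches: int = 2
    branch_sparsity: float = 0.5
    apical_scale: float = 0.2
    branch_nonlinearity: str = "relu"


@dataclass(frozen=True)
class RunConfig:
    dataset: str                                # "MNIST", "KMNIST", or "FMNIST"
    model: str                                  # "EP", "DEP", or "DBPTT"
    hidden_sizes: Tuple[int, ...]
    activation: str
    momentum: float = 0.9                       # SGD with momentum; no adaptive optimizer
    dendrites: Optional[DendriteConfig] = None  # None for the plain EP baseline


def make_config(dataset: str, model: str) -> RunConfig:
    """Assemble the settings described in Section 2.3 for one (dataset, model) pair."""
    hidden = (256,) if dataset == "MNIST" else (256, 256)
    # tanh replaces the hard sigmoid only for plain EP on FMNIST
    activation = "tanh" if (model == "EP" and dataset == "FMNIST") else "hard_sigmoid"
    dendrites = None if model == "EP" else DendriteConfig()
    return RunConfig(dataset, model, hidden, activation, dendrites=dendrites)


configs = {
    (d, m): make_config(d, m)
    for d in ("MNIST", "KMNIST", "FMNIST")
    for m in ("EP", "DEP", "DBPTT")
}
```

Freezing the dataclasses keeps each run's settings immutable once created, so a configuration cannot drift between the free and nudged phases of a training loop.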
## 3 Results
Table 2: Training and test accuracy (%) for EP, dendritic EP (DEP), and dendritic BPTT (DBPTT) across datasets. Results are reported as mean ± standard deviation over multiple runs.
Figure 1: Dendritic architecture and learning dynamics. (a) Dendritic neuron architecture: feedforward inputs are processed through basal branches and feedback signals through apical branches before integration at the soma. (b–d) Test accuracy learning curves for EP, dendritic EP (DEP), and dendritic BPTT (DBPTT) on (b) MNIST, (c) KMNIST, and (d) FMNIST. Shaded regions indicate standard deviation over multiple runs.
### 3.1 MNIST
The first row of Table [2](https://arxiv.org/html/2605.08135#S3.T2) reports the performance of all models on MNIST. On this dataset, all methods achieve comparable results, with no substantial difference in final test accuracy.
Figure [1(b)](https://arxiv.org/html/2605.08135#S3.F1.sf2) shows the corresponding learning curves. We observe that the standard EP model converges faster, reaching its peak performance within fewer epochs. In contrast, both dendritic models (DEP and DBPTT) require more epochs to reach their maximum accuracy, although their final performance is similar to that of EP.
### 3.2 Kuzushiji-MNIST
The second row of Table [2](https://arxiv.org/html/2605.08135#S3.T2) summarizes the results on KMNIST. In contrast to MNIST, clearer differences between models emerge on this more challenging dataset.
As shown in Figure [1(c)](https://arxiv.org/html/2605.08135#S3.F1.sf3), DEP requires slightly more epochs to converge than the other methods. However, it achieves a substantially higher test accuracy (90.02 ± 0.27%) than standard EP (88.54 ± 0.33%) and approaches the performance of DBPTT (91.92 ± 0.09%). This suggests that incorporating dendritic structure improves performance in more complex settings while remaining competitive with backpropagation-based training.
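The KMNIST comparison can be made more concrete with a little arithmetic on the reported means. This sketch ignores the standard deviations, and the number of runs is not stated in the text, so it is not a significance test; the variable names are illustrative.

```python
# Mean test accuracies (%) on KMNIST, as reported in the text above
ep, dep, dbptt = 88.54, 90.02, 91.92

gain = dep - ep                            # absolute improvement of DEP over EP, in points
err_ep, err_dep = 100.0 - ep, 100.0 - dep  # corresponding test error rates (%)
rel_err_reduction = (err_ep - err_dep) / err_ep
gap_closed = (dep - ep) / (dbptt - ep)     # share of the EP-to-DBPTT gap closed by DEP

print(f"DEP - EP: {gain:.2f} points")
print(f"relative test-error reduction: {rel_err_reduction:.1%}")
print(f"share of EP-to-DBPTT gap closed: {gap_closed:.1%}")
```

On these means, DEP improves on EP by about 1.5 points, which is a relative test-error reduction of roughly 13%, and it closes a little under half of the gap between EP and DBPTT.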
### 3.3 Fashion-MNIST
The final row of Table [2](https://arxiv.org/html/2605.08135#S3.T2) presents the results on FMNIST. Similar to KMNIST, we observe a clear performance gap between standard EP and the dendritic models.
Figure [1(d)](https://arxiv.org/html/2605.08135#S3.F1.sf4) shows that DEP again converges more slowly than the other models. Nevertheless, it achieves strong final performance (88.52 ± 0.14%), close to that of DBPTT (89.29 ± 0.17%). These results further support the effectiveness of dendritic architectures combined with equilibrium propagation on more challenging datasets.
### 3.4 Hidden-State Comparison
Figure 2: Hidden-state trajectories during the free phase for EP and DEP on representative MNIST test samples: (a) samples with labels 7 and 2; (b) samples with labels 1 and 0. DEP exhibits higher activation magnitudes and more distributed hidden-state activity than standard EP.
To better understand the effect of dendritic neurons, we visualize the evolution of hidden states during the free phase for both EP and DEP on the MNIST dataset (Figure [2](https://arxiv.org/html/2605.08135#S3.F2)). Across these examples (digits 7, 2, 1, and 0), DEP exhibits higher activation magnitudes and engages a larger proportion of hidden neurons than standard EP. These observations suggest that incorporating dendritic structure alters the internal network dynamics, leading to more distributed hidden-state representations.
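The hidden-state comparison above is qualitative. One way to turn "higher activation magnitudes" and "a larger proportion of engaged neurons" into numbers is sketched below. The function name, the use of the final free-phase step, and the activity threshold of 0.05 are all assumptions, since the paper does not define these quantities, and the toy trajectory is made up purely to exercise the code; it is not data from the paper.

```python
def state_metrics(trajectory, threshold=0.05):
    """Summarize one free-phase trajectory.

    `trajectory` is a list of time steps, each a list of hidden activations.
    Returns (mean absolute activation at the final step,
             fraction of units whose final |activation| exceeds `threshold`).
    """
    final = trajectory[-1]
    n = len(final)
    mean_magnitude = sum(abs(v) for v in final) / n
    fraction_engaged = sum(1 for v in final if abs(v) > threshold) / n
    return mean_magnitude, fraction_engaged


# Made-up 3-step, 4-unit trajectory, used only to exercise the function
toy_trajectory = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.3, 0.02],
    [0.2, 0.0, 0.6, 0.04],
]
mag, frac = state_metrics(toy_trajectory)  # final step: mean |s| about 0.21, 2 of 4 units engaged
```

Averaged over many test samples for both EP and DEP, these two numbers would complement the individual trajectories shown in Figure 2.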
## 4 Discussion
In this work, we investigated the integration of dendritic neural network architectures with equilibrium propagation (EP). Across all datasets, the learning curves indicate that the dendritic EP model (DEP) generally converges more slowly than standard EP. This slower convergence is consistent with the increased architectural complexity introduced by dendritic branches and the additional nonlinear processing they perform.
Despite the slower learning dynamics, DEP achieves higher test accuracy than standard EP on the more challenging KMNIST and FMNIST datasets, while remaining competitive with dendritic models trained using backpropagation through time (DBPTT). These results suggest a trade-off between convergence speed and representational capacity: incorporating dendritic structure may enhance performance in complex settings at the cost of slower optimization.
To further understand this behavior, we analyzed the evolution of hidden states during the free phase. We observed that DEP exhibits higher activation magnitudes and engages a larger proportion of hidden neurons than standard EP, indicating more distributed internal representations. This difference in dynamics suggests that dendritic structure alters how information is processed and propagated through the network.
One possible interpretation is that the slower convergence of DEP reflects a more gradual exploration of the energy landscape. This behavior may act as an implicit form of regularization, guiding the model toward solutions that generalize better on more complex datasets. In particular, DEP may favor flatter regions of the energy landscape; however, validating this hypothesis requires further investigation.
Future work will explore integrating alternative biologically motivated learning rules, such as the predictive learning rule (Luczak et al., [2022](https://arxiv.org/html/2605.08135#bib.bib9); Luczak and Kubo, [2022](https://arxiv.org/html/2605.08135#bib.bib10); Kubo et al., [2023](https://arxiv.org/html/2605.08135#bib.bib11)), which may provide improved learning dynamics while maintaining biological plausibility. In addition, extending this framework to spiking neural networks is a promising direction, given recent studies demonstrating the applicability of EP to spiking models (O’Connor et al., [2019](https://arxiv.org/html/2605.08135#bib.bib20); Martin et al., [2021](https://arxiv.org/html/2605.08135#bib.bib22); Lin et al., [2024](https://arxiv.org/html/2605.08135#bib.bib21)). Combining dendritic architectures, spiking dynamics, and equilibrium-based learning may offer a more comprehensive and biologically grounded learning framework.
#### Acknowledgments
This research was enabled in part by computational resources provided by the Digital Research Alliance of Canada (alliancecan.ca).
## References
- S. Chavlis and P. Poirazi (2025). Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning. Nature Communications 16(1), pp. 943.
- T. Clanuwat, M. Bober-Irizar, A. Kitamoto, A. Lamb, K. Yamamoto, and D. Ha (2018). Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718.
- M. Ernoult, J. Grollier, D. Querlioz, Y. Bengio, and B. Scellier (2019). Updates of equilibrium prop match gradients of backprop through time in an RNN with static input. Advances in Neural Information Processing Systems 32.
- K. Grewal, J. Forest, B. P. Cohen, and S. Ahmad (2021). Going beyond the point neuron: active dendrites and sparse representations for continual learning. bioRxiv, 2021-10.
- Z. Han, E. Gorobets, and P. Chen (2022). Parameter efficient dendritic-tree neurons outperform perceptrons. [arXiv:2207.00708](https://arxiv.org/abs/2207.00708).
- D. P. Kingma and J. Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Y. Kubo, E. Chalmers, and A. Luczak (2022). Combining backpropagation with equilibrium propagation to improve an actor-critic reinforcement learning framework. Frontiers in Computational Neuroscience 16, pp. 980613.
- Y. Kubo, E. Chalmers, and A. Luczak (2023). Biologically-inspired neuronal adaptation improves learning in neural networks. Communicative & Integrative Biology 16(1), pp. 2163131.
- Y. Kubo, J. E. Delanois, and M. Bazhenov (2025). Toward lifelong learning in equilibrium propagation: sleep-like and awake rehearsal for enhanced stability. arXiv preprint arXiv:2508.14081.
- Y. Kubo, S. P. Modi, and S. Patel (2026). Heterogeneous time constants improve stability in equilibrium propagation. arXiv preprint arXiv:2603.03402.
- A. Laborieux, M. Ernoult, B. Scellier, Y. Bengio, J. Grollier, and D. Querlioz (2021). Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in Neuroscience 15, pp. 633674.
- A. Laborieux and F. Zenke (2022). Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. Advances in Neural Information Processing Systems 35, pp. 12950–12963.
- Y. LeCun and C. Cortes (2005). The MNIST database of handwritten digits. [Link](https://api.semanticscholar.org/CorpusID:60282629).
- J. Lin, M. Bal, and A. Sengupta (2024). Scaling SNNs trained using equilibrium propagation to convolutional architectures. [arXiv:2405.02546](https://arxiv.org/abs/2405.02546).
- A. Luczak and Y. Kubo (2022). Predictive neuronal adaptation as a basis for consciousness. Frontiers in Systems Neuroscience 15, pp. 767461.
- A. Luczak, B. L. McNaughton, and Y. Kubo (2022). Neurons learn by predicting future activity. Nature Machine Intelligence 4(1), pp. 62–72.
- E. Martin, M. Ernoult, J. Laydevant, S. Li, D. Querlioz, T. Petrisor, and J. Grollier (2021). EqSpike: spike-driven equilibrium propagation for neuromorphic implementations. iScience 24(3).
- P. O’Connor, E. Gavves, and M. Welling (2019). Training a spiking neural network with equilibrium propagation. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of Machine Learning Research, Vol. 89, pp. 1516–1523. [Link](https://proceedings.mlr.press/v89/o-connor19a.html).
- B. Scellier and Y. Bengio (2017). Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience 11, pp. 24.
- B. Scellier and Y. Bengio (2019). Equivalence of equilibrium propagation and recurrent backpropagation. Neural Computation 31(2), pp. 312–329.
- H. Xiao, K. Rasul, and R. Vollgraf (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.