SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting
Summary
Introduces SpikF-GO, a spiking neural network model for multivariate time series forecasting that combines graph-based inter-variable dependency modeling with spike-driven spectral processing, achieving state-of-the-art results among SNN methods with reduced energy consumption.
# SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting
Source: [https://arxiv.org/html/2606.13901](https://arxiv.org/html/2606.13901)
Data Science Group, University of Hildesheim, Hildesheim, Germany

Email: bakhshaliyevj@uni-hildesheim.de, landwehr@uni-hildesheim.de

###### Abstract
Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series forecasting (TSF), with methods exploring spiking temporal backbones, spike-compatible positional encodings, Fourier-domain processing, and redesigned neuron dynamics. However, existing SNN forecasting approaches process variables independently and lack explicit mechanisms for modeling inter-variable dependencies. This is a critical limitation in multivariate settings, where cross-variable correlations carry substantial predictive information. We propose Spiking Fourier Graph Operators (SpikF-GO), which addresses this gap by combining a hypervariate graph formulation, in which every scalar observation becomes a graph node, with spike-driven spectral processing. SpikF-GO introduces a Hard Concrete frequency gate for learnable sparse frequency selection and a Complex LIF gate that applies independent spiking neurons to the real and imaginary Fourier components, preserving binary, event-driven computation throughout the spectral domain. We further present a variant incorporating Central Pattern Generator (CPG)-based positional encodings for stronger long-range temporal modeling. Evaluated on eight benchmarks under a unified experimental protocol, SpikF-GO achieves the best average rank among all SNN methods and outperforms its ANN counterpart, FourierGNN, at reduced energy cost. SpikF-GO maintains competitive accuracy even at substantially smaller embedding dimensions, thereby achieving significant energy reductions. To our knowledge, this is among the first works to bring graph-based multivariate modeling into the spiking domain for TSF, and the first to provide a unified comparison of SNN forecasting architectures under a common experimental protocol.
*This paper has been accepted for presentation at ECML–PKDD 2026.*

## 1 Introduction
Spiking Neural Networks \(SNNs\), considered the third generation of neural networks\[[20](https://arxiv.org/html/2606.13901#bib.bib20)\], have attracted significant attention for their energy efficiency, sparsity, and event\-driven processing\[[7](https://arxiv.org/html/2606.13901#bib.bib12),[25](https://arxiv.org/html/2606.13901#bib.bib25)\]\. Unlike Artificial Neural Networks \(ANNs\) that operate on continuous\-valued activations, SNNs communicate via discrete spike events, mimicking how biological neurons transmit information through action potentials\[[28](https://arxiv.org/html/2606.13901#bib.bib17),[20](https://arxiv.org/html/2606.13901#bib.bib20),[7](https://arxiv.org/html/2606.13901#bib.bib12)\]\. This event\-driven paradigm enables SNNs to process information only when necessary, offering substantial computational savings over ANNs whose continuous operations create significant challenges for resource\-constrained and edge\-deployment environments\[[23](https://arxiv.org/html/2606.13901#bib.bib11),[25](https://arxiv.org/html/2606.13901#bib.bib25)\]\.
Leveraging these efficiency advantages, SNNs have achieved significant progress across machine learning domains, particularly in computer vision\. Spiking Transformer architectures have been applied to image classification\[[30](https://arxiv.org/html/2606.13901#bib.bib26),[35](https://arxiv.org/html/2606.13901#bib.bib27),[11](https://arxiv.org/html/2606.13901#bib.bib28)\], object detection\[[17](https://arxiv.org/html/2606.13901#bib.bib29)\], and semantic segmentation\[[12](https://arxiv.org/html/2606.13901#bib.bib30)\], in several cases matching or surpassing ANN counterparts at a fraction of the energy cost\. Beyond static inputs, the inherent temporal dynamics of spiking neurons make SNNs naturally suited for sequential and time\-varying data, including time series forecasting\[[9](https://arxiv.org/html/2606.13901#bib.bib16)\]\.
Recent work on SNN-based time series forecasting has developed spiking counterparts of major temporal backbones. Lv et al.\[[19](https://arxiv.org/html/2606.13901#bib.bib13)\] introduced spiking TCN, RNN, and Transformer models with competitive accuracy and lower energy consumption, while Central Pattern Generator (CPG)-based positional encodings were proposed to address the permutation-invariance issue of spiking self-attention\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\]. The Temporal Segment LIF (TS-LIF) neuron\[[9](https://arxiv.org/html/2606.13901#bib.bib16)\] further improves multi-timescale integration through a dual-compartment design that separates low- and high-frequency processing. More recently, SpikF\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\] showed that Fourier-domain processing is well suited to SNN forecasting by using a Spiking Fast Fourier Transform (S-FFT) for frequency selection, avoiding permutation invariance while achieving strong energy–accuracy trade-offs. In parallel, on the ANN side, FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\], which is considered one of the state-of-the-art GNN-based forecasting architectures (see Section [2.1](https://arxiv.org/html/2606.13901#S2.SS1)), proposed a hypervariate graph formulation that unifies spatial and temporal modeling through Fourier Graph Operators (FGOs), achieving log-linear complexity without separate graph and temporal modules. Despite this progress, existing SNN forecasting methods still process each variable independently and lack explicit modeling of inter-variable dependencies. In multivariate time series, cross-variable correlations often carry substantial predictive information, such as correlated sensor readings in traffic networks or co-moving patterns in energy grids, and ignoring these relationships leaves significant predictive signal unexploited\[[5](https://arxiv.org/html/2606.13901#bib.bib31)\].
Furthermore, prior SNN forecasting studies lack a unified experimental comparison across architectures under consistent settings, making it difficult to assess the true relative strengths of different approaches\.
In this paper, we propose **Spik**ing **F**ourier **G**raph **O**perators (SpikF-GO), which addresses both limitations by combining the unified hypervariate graph formulation of FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\] with spike-driven Fourier-domain processing. Rather than processing variables or time steps through separate modules, SpikF-GO treats each scalar observation in the input window as a node in a hypervariate graph, enabling the model to learn joint spatiotemporal dependencies through spectral graph convolution in a single unified structure. This formulation addresses the lack of explicit inter-variable modeling in prior SNN forecasting methods such as TS-LIF\[[9](https://arxiv.org/html/2606.13901#bib.bib16)\] and SpikF\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\]. To maintain energy efficiency within this graph-based framework, SpikF-GO employs a Hard Concrete frequency gate\[[16](https://arxiv.org/html/2606.13901#bib.bib23)\] for learnable sparse frequency selection and a Complex LIF gate that applies independent spiking neurons to the real and imaginary components of the Fourier spectrum, preserving binary, event-driven computation throughout the spectral domain. In addition, we evaluate a variant, **SpikF-GO w/ CPG**, which injects CPG positional encodings\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\] prior to spectral mixing to strengthen long-range temporal modeling. In summary, the main contributions are as follows:
- We propose **SpikF-GO** (**Spik**ing **F**ourier **G**raph **O**perators), a spiking architecture for multivariate time series forecasting that combines a hypervariate graph formulation with spike-driven Fourier graph processing to model intra-series temporal dependencies, inter-series dependencies, and time-varying cross-variable interactions. We further introduce **SpikF-GO w/ CPG** to improve long-range temporal modeling.
- On eight benchmarks, **SpikF-GO w/ CPG** achieves the best average rank on both $R^2$ (2.4) and MAE (2.3), while **SpikF-GO** achieves the second-best average rank on $R^2$ (2.8), outperforming prior SNN baselines and surpassing the ANN baseline FourierGNN.
- We provide extensive ablations and a theoretical energy analysis, showing that SpikF-GO achieves 1.89× lower theoretical energy than FourierGNN, and up to 7.86× lower with a compact embedding size of $E=8$.
## 2 Related Work
### 2.1 GNNs and Frequency-Based Models for Multivariate TSF
Multivariate time series forecasting \(TSF\) has traditionally been approached with temporal architectures such as recurrent networks, convolutional models, and more recently Transformers\[[24](https://arxiv.org/html/2606.13901#bib.bib2),[3](https://arxiv.org/html/2606.13901#bib.bib3),[27](https://arxiv.org/html/2606.13901#bib.bib4),[36](https://arxiv.org/html/2606.13901#bib.bib5)\]\. In parallel, graph\-based forecasting methods leverage the observation that many multivariate systems exhibit structured inter\-series dependencies, and therefore model both temporal dynamics and cross\-variable interactions using Graph Neural Networks \(GNNs\)\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\]\.
Early spatio\-temporal GNN models such as STGCN and TAMP\-S2GCNets rely on a predefined graph to encode spatial correlations\[[33](https://arxiv.org/html/2606.13901#bib.bib1),[6](https://arxiv.org/html/2606.13901#bib.bib6)\]\. However, in many real\-world forecasting problems the underlying dependency graph is unknown or time\-varying\. To address this, later approaches learn inter\-series relations directly from data, including StemGNN\[[4](https://arxiv.org/html/2606.13901#bib.bib7)\], MTGNN\[[29](https://arxiv.org/html/2606.13901#bib.bib8)\], and AGCRN\[[2](https://arxiv.org/html/2606.13901#bib.bib9)\]\. Despite their effectiveness, many of these methods still adopt a two\-stream design—a graph module \(e\.g\., GCN/GAT\) for cross\-series interactions and a temporal backbone \(e\.g\., RNN/GRU/LSTM\) for temporal dependencies—which increases architectural complexity and can complicate optimization\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\]\.
FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\] addresses this limitation by proposing a unified formulation in which each scalar value in the multivariate input window is treated as a node in a *hypervariate graph*. Fourier Graph Operators (FGOs) then perform graph convolutions in the time domain by carrying out the equivalent matrix multiplications in Fourier space, achieving log-linear complexity and strong accuracy while eliminating the need for separate spatial and temporal modules.
### 2.2 Time Series Forecasting with Spiking Neural Networks
Within the SNN forecasting literature, Lv et al.\[[19](https://arxiv.org/html/2606.13901#bib.bib13)\] established the first systematic study by proposing spiking variants of TCN, RNN, and Transformer backbones, demonstrating competitive accuracy with significantly reduced energy consumption. However, sequence modeling in spiking Transformers faces a structural challenge: standard self-attention is permutation-invariant, and positional encoding under spike-based processing remains largely underexplored, which can weaken long-range dependency modeling. To mitigate this, subsequent work proposes spike-compatible positional encodings inspired by Central Pattern Generators (CPGs), injecting structured timing signals into spiking models\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\], though positional encoding in spiking architectures remains an open challenge.
A complementary direction improves the spiking neuron itself. The Temporal Segment LIF (TS-LIF) model\[[9](https://arxiv.org/html/2606.13901#bib.bib16)\] introduces a dual-compartment neuron design in which dendritic and somatic pathways specialize in low- and high-frequency components, respectively, improving multi-timescale integration and alleviating the long-horizon limitations of standard LIF dynamics. However, TS-LIF and the aforementioned spiking backbones process each variable independently and lack explicit mechanisms for capturing inter-variable correlations, a limitation the authors identify as future work.
Most closely related to our work, SpikF\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\] demonstrates that Fourier-domain processing is a natural fit for SNN-based forecasting. By encoding patches of the input sequence and applying a Spiking Frequency Selection mechanism via a Spiking Fast Fourier Transform (S-FFT), SpikF avoids the permutation-invariance problem of self-attention while naturally exploiting the positional structure embedded in the Fourier transform. SpikF further provides a theoretical efficiency analysis showing that S-FFT operations yield substantially lower energy consumption than their floating-point counterparts when spike trains are sparse\[[28](https://arxiv.org/html/2606.13901#bib.bib17),[15](https://arxiv.org/html/2606.13901#bib.bib18),[22](https://arxiv.org/html/2606.13901#bib.bib19)\]. Yet, like prior spiking methods, SpikF does not model cross-variable dependencies through an explicit graph structure.
Motivated by these advances, **SpikF-GO** bridges this gap by combining the unified hypervariate graph formulation of FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\] with spike-driven Fourier-domain processing inspired by SpikF\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\], bringing graph-based multivariate modeling into the spiking domain for multivariate TSF. We additionally present **SpikF-GO w/ CPG**, which injects CPG positional signals prior to spectral mixing\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\] to strengthen long-range temporal modeling.
## 3 Problem Formulation
### 3.1 Multivariate Time Series Forecasting
We focus on the multivariate TSF setting, where the goal is to predict future values of multiple correlated variables over time\.
A batch of multivariate time series with batch size $B$, sequence length $T$, and $N$ variables is represented as

$$\mathbf{X}\in\mathbb{R}^{B\times T\times N}, \tag{1}$$

where $\mathbf{X}_{b,t,n}$ denotes the value of variable $n\in\{1,\ldots,N\}$ at time $t\in\{1,\ldots,T\}$ for sample $b\in\{1,\ldots,B\}$, and we denote the $N$-dimensional observation vector by $\mathbf{x}_{b,t}\in\mathbb{R}^{N}$.
Given a look-back window length $L<T$ and a forecast horizon $O=T-L$, we construct, for each sample $b$, the input window and forecast target as

$$\mathbf{X}^{(b)}_{\text{in}}=\big[\mathbf{x}_{b,1},\mathbf{x}_{b,2},\ldots,\mathbf{x}_{b,L}\big]\in\mathbb{R}^{L\times N}, \tag{2}$$

$$\mathbf{Y}^{(b)}=\big[\mathbf{x}_{b,L+1},\mathbf{x}_{b,L+2},\ldots,\mathbf{x}_{b,T}\big]\in\mathbb{R}^{O\times N}. \tag{3}$$

Stacking $\{\mathbf{X}^{(b)}_{\text{in}}\}_{b=1}^{B}$ and $\{\mathbf{Y}^{(b)}\}_{b=1}^{B}$ yields the batched tensors

$$\mathbf{X}_{\text{in}}\in\mathbb{R}^{B\times L\times N},\qquad\mathbf{Y}\in\mathbb{R}^{B\times O\times N}. \tag{4}$$
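The window construction in Eqs. (2)–(4) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' data pipeline: nested lists stand in for tensors, and the helper names `make_window` and `make_batch` are hypothetical. The toy sizes use $L=O=12$, matching the horizon setting reported in the experiments.

```python
from typing import List, Tuple

Series = List[List[float]]  # T rows; row t is the N-dimensional observation x_t


def make_window(series: Series, L: int) -> Tuple[Series, Series]:
    """Cut one length-T sample into X_in (L x N, Eq. 2) and Y (O x N, Eq. 3), O = T - L."""
    T = len(series)
    if not 0 < L < T:
        raise ValueError("look-back length must satisfy 0 < L < T")
    x_in = [list(row) for row in series[:L]]  # x_1, ..., x_L
    y = [list(row) for row in series[L:]]     # x_{L+1}, ..., x_T
    return x_in, y


def make_batch(samples: List[Series], L: int) -> Tuple[List[Series], List[Series]]:
    """Stack per-sample windows into B x L x N inputs and B x O x N targets (Eq. 4)."""
    pairs = [make_window(s, L) for s in samples]
    return [x for x, _ in pairs], [y for _, y in pairs]


# Toy batch: B = 2 samples, T = 24 steps, N = 3 variables, L = O = 12.
samples = [[[float(100 * b + 10 * t + n) for n in range(3)] for t in range(24)]
           for b in range(2)]
X_in, Y = make_batch(samples, L=12)
print(len(X_in), len(X_in[0]), len(X_in[0][0]))  # 2 12 3
print(len(Y), len(Y[0]), len(Y[0][0]))           # 2 12 3
```

Because $O=T-L$ is implied by the sample length, the horizon never has to be passed separately; the check on `L` simply enforces $L<T$.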
A forecasting model $\mathbf{F}_{\boldsymbol{\theta}}$ parameterized by $\boldsymbol{\theta}$ learns the mapping

$$\mathbf{F}_{\boldsymbol{\theta}}:\ \mathbb{R}^{B\times L\times N}\rightarrow\mathbb{R}^{B\times O\times N},\qquad\mathbf{X}_{\text{in}}\mapsto\mathbf{F}_{\boldsymbol{\theta}}(\mathbf{X}_{\text{in}}). \tag{5}$$
The model is trained by minimizing the Mean Squared Error \(MSE\) over mini\-batches of training samples, with model parameters updated iteratively via backpropagation through time \(BPTT\):
$$\mathcal{L}_{\text{MSE}}=\frac{1}{B\,O\,N}\left\|\mathbf{Y}-\mathbf{F}_{\boldsymbol{\theta}}(\mathbf{X}_{\text{in}})\right\|_{F}^{2}, \tag{6}$$

where $\|\cdot\|_{F}$ denotes the Frobenius norm.
### 3.2 Spiking Neuron Dynamics and Training
The fundamental unit of our model is the Leaky Integrate-and-Fire (LIF) neuron\[[20](https://arxiv.org/html/2606.13901#bib.bib20),[19](https://arxiv.org/html/2606.13901#bib.bib13)\], whose membrane potential $U[t]$ evolves at each discrete time step $t$ as

$$U[t]=H[t-\Delta t]+I[t], \tag{7}$$

$$S[t]=\mathbf{1}\bigl(U[t]\geq\vartheta\bigr), \tag{8}$$

$$H[t]=V_{\text{reset}}\,S[t]+\bigl(1-S[t]\bigr)\,\beta\,U[t], \tag{9}$$

where $\Delta t$ is the discretization constant controlling the granularity of LIF modeling, $I[t]$ is the input current computed by the preceding layer, $\beta<1$ is the membrane decay factor, $\vartheta$ is the firing threshold, and $V_{\text{reset}}$ is the reset potential. When the membrane potential reaches $\vartheta$, the neuron emits a binary spike $S[t]=1$ and the potential is reset; otherwise it decays by the factor $\beta$.
Since the Heaviside indicator $\mathbf{1}(\cdot)$ is non-differentiable, we adopt the arctangent surrogate gradient\[[8](https://arxiv.org/html/2606.13901#bib.bib32)\] during BPTT:

$$S[t]\approx\frac{1}{\pi}\arctan\!\left(\frac{\pi}{2}\,\alpha\,U[t]\right)+\frac{1}{2}, \tag{10}$$

where $\alpha$ controls the sharpness of the approximation.
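A single LIF neuron following Eqs. (7)–(9), together with the arctangent surrogate of Eq. (10) and its derivative, can be sketched as below. This is a scalar, framework-free illustration; the default values ($\beta=0.5$, $\vartheta=1$, $V_{\text{reset}}=0$, $\alpha=2$) are assumptions, not the paper's settings. The derivative $\frac{\alpha/2}{1+(\frac{\pi}{2}\alpha x)^2}$ follows directly from differentiating Eq. (10); evaluating it at $x=U[t]-\vartheta$, so that the surrogate is centred on the threshold, is likewise an assumption about how Eq. (10) is applied.

```python
import math
from typing import List, Tuple


def lif_forward(currents: List[float], beta: float = 0.5, theta: float = 1.0,
                v_reset: float = 0.0) -> Tuple[List[int], List[float]]:
    """Run one LIF neuron over a current sequence; returns spikes S[t] and potentials U[t]."""
    h = 0.0  # H[t - dt]: membrane state after reset/leak, carried to the next step
    spikes, potentials = [], []
    for i_t in currents:
        u = h + i_t                               # Eq. (7): integrate input current
        s = 1 if u >= theta else 0                # Eq. (8): Heaviside firing
        h = v_reset * s + (1 - s) * beta * u      # Eq. (9): hard reset or leaky decay
        spikes.append(s)
        potentials.append(u)
    return spikes, potentials


def atan_surrogate(x: float, alpha: float = 2.0) -> float:
    """Smooth stand-in for the step function, Eq. (10)."""
    return math.atan(math.pi / 2 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x: float, alpha: float = 2.0) -> float:
    """Derivative of Eq. (10), used as dS/dU in the backward pass (x assumed = U - theta)."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)


spikes, _ = lif_forward([0.6, 0.6, 0.6, 0.6], beta=0.5, theta=1.0)
print(spikes)  # [0, 0, 1, 0]: charge accumulates over two steps, fires, then resets
```

The forward pass keeps the exact binary spike; only the gradient is replaced, which is what lets BPTT flow through the non-differentiable threshold.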
##### Temporal Alignment\.
Following\[[19](https://arxiv.org/html/2606.13901#bib.bib13)\], we align the continuous time series with the discrete spiking dimension by dividing each time-series step $\Delta T$ into $T_s$ finer SNN steps of size $\Delta t$, so that $\Delta T=T_s\Delta t$. This bridges the time-series time step $\Delta T$ and the SNN time step $\Delta t$, allowing both to share the same temporal meaning. The model therefore processes $T_s\times T\times N$ possible spike events per sample, and a spike encoder converts the floating-point inputs into spike trains of temporal resolution $T_s$.
## 4 Methodology
### 4.1 Overview
Given a multivariate input $\mathbf{X}_{\text{in}}\in\mathbb{R}^{B\times L\times N}$, the model proceeds through three stages: (1) an **Encoder** that constructs a hypervariate graph representation, embeds it into a latent space, and converts it into spike trains, (2) a **Spiking Fourier Graph Operator (S-FGO)** that performs sparse spectral mixing via a Hard Concrete frequency gate and LIF-gated complex linear operators, and (3) a **Decoder** that maps the processed representations to the prediction horizon $\hat{\mathbf{Y}}\in\mathbb{R}^{B\times O\times N}$. The overall architecture of SpikF-GO is illustrated in Figure [1](https://arxiv.org/html/2606.13901#S4.F1).
### 4.2 Encoder with Hypervariate Graph
Following FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\], we construct a hypervariate graph by treating every scalar observation in the input window as a node\. Unlike conventional spatio\-temporal approaches that model temporal and cross\-variable dependencies through separate modules, the hypervariate graph connects any two variables at any two time steps, simultaneously encoding intra\-series temporal dependencies, inter\-series spatial dependencies, and time\-varying cross\-variable interactions within a single unified structure\.
We define the graph formation operator $\mathcal{H}:\mathbb{R}^{B\times L\times N}\to\mathbb{R}^{B\times M}$, which flattens the spatial and temporal dimensions of each sample into $M=N\times L$ graph nodes:

$$\mathbf{X}^{G}=\mathcal{H}(\mathbf{X}_{\text{in}})\in\mathbb{R}^{B\times M}, \tag{11}$$

where $\mathcal{H}$ is applied independently to each sample in the batch and each entry $x^{G}_{b,m}$ corresponds to a single node in the fully-connected hypervariate graph. Each node is then projected into an $E$-dimensional embedding space via a learnable vector $\mathbf{e}\in\mathbb{R}^{1\times E}$:

$$\mathbf{V}=\mathbf{X}^{G}_{(\cdot)}\cdot\mathbf{e}\in\mathbb{R}^{B\times M\times E}, \tag{12}$$

where the multiplication broadcasts over the embedding dimension so that each node scales the shared embedding $\mathbf{e}$. The node embeddings are refined by a learnable affine transform $A_{\mathrm{enc}}(\cdot)$ followed by Root Mean Square Normalization (RMSNorm)\[[34](https://arxiv.org/html/2606.13901#bib.bib22)\] over the node axis $M$:

$$\hat{\mathbf{V}}=\operatorname{RMSNorm}\!\left(A_{\mathrm{enc}}(\mathbf{V})\right)\in\mathbb{R}^{B\times M\times E}, \tag{13}$$

where RMSNorm normalizes each channel independently over the node axis without mean subtraction, chosen over LayerNorm for its compatibility with neuromorphic hardware\[[1](https://arxiv.org/html/2606.13901#bib.bib21)\].
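For a single sample, Eqs. (11)–(13) can be sketched as follows. Three details are assumptions rather than statements from the text: nodes are flattened variable-major ($m=n\cdot L+l$, so that Section 4.3 can later split $M$ back into $N\times L$), the hypothetical $A_{\mathrm{enc}}$ is a per-channel scale and shift, and RMSNorm uses a per-channel gain with a small $\epsilon$.

```python
import math
from typing import List

Matrix = List[List[float]]


def to_hypervariate_nodes(x_in: Matrix) -> List[float]:
    """H (Eq. 11) for one L x N window: M = N * L nodes, assumed order m = n * L + l."""
    L, N = len(x_in), len(x_in[0])
    return [x_in[l][n] for n in range(N) for l in range(L)]


def embed_nodes(nodes: List[float], e: List[float]) -> Matrix:
    """Eq. (12): every scalar node scales the shared embedding e, giving M x E."""
    return [[x * e_k for e_k in e] for x in nodes]


def affine(v: Matrix, scale: List[float], shift: List[float]) -> Matrix:
    """Hypothetical A_enc: per-channel scale and shift."""
    return [[x * s + b for x, s, b in zip(row, scale, shift)] for row in v]


def rmsnorm_over_nodes(v: Matrix, gain: List[float], eps: float = 1e-6) -> Matrix:
    """RMSNorm along the node axis M, separately per channel, without mean subtraction."""
    M, E = len(v), len(v[0])
    rms = [math.sqrt(sum(v[m][k] ** 2 for m in range(M)) / M + eps) for k in range(E)]
    return [[v[m][k] / rms[k] * gain[k] for k in range(E)] for m in range(M)]


x_in = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]   # L = 3 time steps, N = 2 variables
nodes = to_hypervariate_nodes(x_in)           # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
V = embed_nodes(nodes, e=[0.5, -1.0, 2.0])    # M = 6 nodes, E = 3 channels
V_hat = rmsnorm_over_nodes(affine(V, [1.0] * 3, [0.1, 0.0, 0.0]), gain=[1.0] * 3)
```

Because the statistic is taken over the $M$ nodes of one channel, every channel of `V_hat` ends up with unit root-mean-square across the graph, independently of the other channels.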
To produce a multi-step spike representation compatible with the SNN temporal dimension, $\hat{\mathbf{V}}$ is replicated across $T_s$ SNN steps and modulated by learnable per-step parameters $\gamma_t,\beta_t\in\mathbb{R}$:

$$\mathbf{U}_{t}=\hat{\mathbf{V}}\cdot\gamma_{t}+\beta_{t},\quad t=1,\ldots,T_{s}. \tag{14}$$

Each modulated signal is passed through a LIF encoder layer to produce binary spike trains, yielding the spike tensor

$$\mathbf{S}=\bigl[\operatorname{LIF}(\mathbf{U}_{1}),\;\ldots,\;\operatorname{LIF}(\mathbf{U}_{T_{s}})\bigr]\in\{0,1\}^{T_{s}\times B\times M\times E}. \tag{15}$$
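A minimal sketch of the multi-step encoding in Eqs. (14)–(15) for one sample is shown below. It assumes that the LIF encoder keeps one neuron per (node, channel) pair whose membrane state persists across the $T_s$ steps, starts from zero, and hard-resets to $0$. The leak factor is named `decay` here only to avoid a clash with the modulation offsets $\beta_t$; its value, like the threshold, is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def spike_encode(v_hat: Matrix, gammas: List[float], betas: List[float],
                 decay: float = 0.5, theta: float = 1.0) -> List[List[List[int]]]:
    """Eqs. (14)-(15) for one sample: returns binary spikes of shape T_s x M x E."""
    M, E = len(v_hat), len(v_hat[0])
    h = [[0.0] * E for _ in range(M)]  # membrane state, zero-initialized (assumption)
    spikes = []
    for g_t, b_t in zip(gammas, betas):          # one iteration per SNN step t
        s_t = [[0] * E for _ in range(M)]
        for m in range(M):
            for k in range(E):
                u = h[m][k] + v_hat[m][k] * g_t + b_t   # current U_t = V_hat * gamma_t + beta_t
                s = 1 if u >= theta else 0
                h[m][k] = (1 - s) * decay * u           # reset to 0 on a spike, else leak
                s_t[m][k] = s
        spikes.append(s_t)
    return spikes


v_hat = [[2.0, 0.2], [-1.0, 0.7]]                # M = 2 nodes, E = 2 channels
S = spike_encode(v_hat, gammas=[1.0] * 4, betas=[0.0] * 4)   # T_s = 4
print([s[0][0] for s in S], [s[1][1] for s in S])  # [1, 1, 1, 1] [0, 1, 0, 1]
```

The example shows how the replicated value becomes a rate-like code: a large activation fires on every step, a moderate one fires on every second step, and small or negative ones stay silent.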
Figure 1: The overall architecture of SpikF-GO. (a) The input signal is formed into a hypervariate graph, embedded and encoded into spike trains, then processed by the S-FGO block with sparse frequency gating and $N_\ell$ sequential Complex LIF-gated Fourier Graph Operators before decoding. (b) The Complex LIF gate applies independent LIF neurons to real and imaginary parts, combined via logical OR to produce a binary spike mask. (c) The decoder compresses the temporal dimension, applies a LIF layer with average pooling over SNN steps, followed by GELU activation and a final linear projection.
### 4.3 Spiking Fourier Graph Operators (S-FGO)
##### S\-FFT\.
Motivated by the efficiency of Fourier-domain graph convolutions\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\] and the spiking FFT\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\], we apply the Spiking Fast Fourier Transform (S-FFT) along the node axis $M$ of the spike tensor. Since multiplication in the Fourier domain of the hypervariate graph is equivalent to graph convolution in the node domain\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\], all subsequent spectral operations perform implicit graph mixing. The spike tensor is transformed as

$$\mathbf{Z}=\mathcal{SF}(\mathbf{S})\in\mathbb{C}^{T_{s}\times B\times F\times E}, \tag{16}$$

where $\mathcal{SF}$ denotes the S-FFT applied independently at each SNN step and $F=\lfloor M/2\rfloor+1$ is the number of frequency bins.
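The two properties used in this step can be checked numerically on a single (step, sample, channel) slice: a real-valued spike sequence of length $M$ is fully described by $F=\lfloor M/2\rfloor+1$ bins, and pointwise multiplication of spectra equals circular convolution along the node axis (the convolution theorem), which is the sense in which spectral mixing acts as mixing across graph nodes. The sketch uses a naive $O(M^2)$ DFT as a stand-in; it does not reproduce how the S-FFT exploits binary inputs, and the filter `w` is hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(M^2) DFT along the node axis (stand-in for an FFT)."""
    M = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * f * m / M) for m in range(M))
            for f in range(M)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Inverse of dft."""
    M = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * m / M) for f in range(M)) / M
            for m in range(M)]


def rfft_bins(x: Sequence[float]) -> List[complex]:
    """Real-input spectrum: keep only the F = floor(M/2) + 1 non-negative frequencies."""
    return dft(x)[: len(x) // 2 + 1]


def circular_conv(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Direct circular convolution along the node axis."""
    M = len(x)
    return [sum(x[j] * w[(m - j) % M] for j in range(M)) for m in range(M)]


spikes = [1, 0, 1, 1, 0, 0, 1, 0]               # one binary node sequence, M = 8
w = [0.5, -0.25, 0.0, 0.0, 0.1, 0.0, 0.0, 0.3]  # hypothetical node-domain filter
Z = rfft_bins(spikes)
print(len(Z))  # 5, i.e. 8 // 2 + 1
spectral = idft([a * b for a, b in zip(dft(spikes), dft(w))])
direct = circular_conv(spikes, w)
print(max(abs(s.real - d) for s, d in zip(spectral, direct)) < 1e-9)  # True
```

In a framework implementation the naive loops would be replaced by an FFT routine, which is where the log-linear cost in $M$ comes from.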
##### Hard Concrete Frequency Gate\.
To encourage sparse frequency utilization and enable hardware-friendly inference with a fixed set of active bins, we apply a Hard Concrete gate\[[16](https://arxiv.org/html/2606.13901#bib.bib23)\] over the $F$ frequency bins. A learnable log-odds parameter $\log\alpha_f$ is maintained for each bin $f$. During training, the binary concrete distribution\[[21](https://arxiv.org/html/2606.13901#bib.bib24)\] is stretched to the $(\gamma,\zeta)$ interval with $\gamma<0$ and $\zeta>1$, and rectified via a hard-sigmoid:

$$\bar{S}_{f}=\sigma\!\left(\frac{\log u-\log(1-u)+\log\alpha_{f}}{\tau}\right)(\zeta-\gamma)+\gamma,\quad u\sim\mathrm{Uniform}(0,1), \tag{17}$$

$$M_{f}=\min\!\left(1,\;\max\!\left(0,\;\bar{S}_{f}\right)\right), \tag{18}$$

where $\tau$ is a temperature parameter. This stretching ensures that the gate can take exact zero and one values, folding the probability mass of the underlying continuous distribution onto those endpoints\[[16](https://arxiv.org/html/2606.13901#bib.bib23)\]. At inference, the stochasticity is removed and the gate is binarized as $M_f\leftarrow\mathbf{1}\bigl[\sigma(\log\alpha_f)(\zeta-\gamma)+\gamma>0.5\bigr]$ to obtain a fixed binary frequency mask suitable for neuromorphic deployment. The gated spectrum is

$$\tilde{\mathbf{Z}}=\mathbf{Z}\odot\mathbf{M},\quad\mathbf{M}\in[0,1]^{F}, \tag{19}$$

where $\odot$ broadcasts over $T_s$, $B$, and $E$. An $\ell_0$ sparsity penalty is added to the training objective to promote frequency pruning:

$$\mathcal{L}_{\ell_0}=\frac{1}{F}\sum_{f=1}^{F}\sigma\!\left(\log\alpha_f\right). \tag{20}$$
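The gate of Eqs. (17)–(20) can be sketched per frequency bin in plain Python. The constants $\gamma=-0.1$, $\zeta=1.1$, $\tau=2/3$ are assumed values (the common defaults from the Hard Concrete literature), not settings reported in this section. The penalty implements Eq. (20) exactly as written, and `apply_gate` shows Eq. (19) for one slice of the spectrum.

```python
import math
import random
from typing import List

GAMMA, ZETA, TAU = -0.1, 1.1, 2.0 / 3.0  # stretch limits and temperature (assumed values)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hard_concrete_train(log_alpha: List[float], rng: random.Random) -> List[float]:
    """Eqs. (17)-(18): one stochastic gate value in [0, 1] per frequency bin."""
    gates = []
    for la in log_alpha:
        u = rng.uniform(1e-6, 1 - 1e-6)  # keep log(u) and log(1 - u) finite
        s_bar = sigmoid((math.log(u) - math.log(1 - u) + la) / TAU) * (ZETA - GAMMA) + GAMMA
        gates.append(min(1.0, max(0.0, s_bar)))  # hard-sigmoid clipping
    return gates


def hard_concrete_eval(log_alpha: List[float]) -> List[int]:
    """Deterministic binary mask used at inference."""
    return [1 if sigmoid(la) * (ZETA - GAMMA) + GAMMA > 0.5 else 0 for la in log_alpha]


def l0_penalty(log_alpha: List[float]) -> float:
    """Eq. (20): mean of sigma(log alpha_f) over the F bins."""
    return sum(sigmoid(la) for la in log_alpha) / len(log_alpha)


def apply_gate(spectrum: List[complex], gates: List[float]) -> List[complex]:
    """Eq. (19) for one (step, sample, channel) slice: scale each bin by its gate."""
    return [z * g for z, g in zip(spectrum, gates)]


log_alpha = [4.0, -4.0, 2.0]                    # F = 3 bins (toy values)
train_gates = hard_concrete_train(log_alpha, random.Random(0))
print(hard_concrete_eval(log_alpha))            # [1, 0, 1]
```

During training the penalty pushes $\log\alpha_f$ down for bins that do not help the forecast, and at inference those bins are dropped entirely by the binary mask.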
##### S\-FGO Block\.
The S-FGO block applies a sequence of $N_\ell$ complex-valued linear operators in the frequency domain, with each operator gated by a Complex LIF activation. Since each complex linear operator acts in the Fourier domain of the hypervariate graph, it performs learnable spectral graph mixing, corresponding to an implicit graph-convolution-like operation in the node domain\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\]. Given the gated spectrum $\tilde{\mathbf{Z}}\in\mathbb{C}^{T_s\times B\times F\times E}$, each layer applies a complex affine normalization $\mathbf{A}^{(n)}$, a Complex LIF gate $\mathcal{G}$, a complex linear operator $\mathbf{W}^{(n)}$, and a second Complex LIF gate:

$$\mathbf{Z}^{(n)}=\mathcal{G}\!\left(\mathbf{W}^{(n)}\cdot\mathcal{G}\!\left(\mathbf{A}^{(n)}\bigl(\mathbf{Z}^{(n-1)}\bigr)\right)\right),\quad n=1,\ldots,N_{\ell}, \tag{21}$$

where $\mathbf{Z}^{(0)}=\tilde{\mathbf{Z}}$. Residual connections with learnable scaling parameters are employed between layers to stabilize training. The Complex LIF gate $\mathcal{G}$ operates on a complex tensor $\mathbf{Q}=\mathbf{Q}_r+i\,\mathbf{Q}_i$ by applying independent LIF neurons to the real and imaginary parts and combining them via a logical OR:

$$\mathcal{G}(\mathbf{Q})=\mathbf{Q}\odot\Bigl[\mathbf{1}\!\left(\operatorname{LIF}(\mathbf{Q}_r)>0\right)\;\vee\;\mathbf{1}\!\left(\operatorname{LIF}(\mathbf{Q}_i)>0\right)\Bigr], \tag{22}$$

ensuring that frequency components are gated in a binary, event-driven fashion throughout the block, preserving the sparse computational nature of the spiking framework.
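A sketch of Eqs. (21)–(22) for the $E$-dimensional channel vector at a single frequency bin and a single SNN step is given below. To keep it short it assumes that each LIF neuron starts from zero state, so "fires" reduces to "input reaches $\vartheta$"; the full model would carry membrane state across the $T_s$ steps. The per-channel complex affine standing in for $\mathbf{A}^{(n)}$ is hypothetical, and the residual path mentioned above is omitted.

```python
from typing import List

CVec = List[complex]


def lif_fires(x: float, theta: float = 1.0) -> bool:
    """Single-step LIF from zero state: spikes iff the input reaches the threshold."""
    return x >= theta


def complex_lif_gate(q: CVec, theta: float = 1.0) -> CVec:
    """Eq. (22): keep a component if its real-part OR imaginary-part neuron spikes."""
    return [z if (lif_fires(z.real, theta) or lif_fires(z.imag, theta)) else 0j for z in q]


def complex_affine(q: CVec, scale: CVec, shift: CVec) -> CVec:
    """Hypothetical stand-in for A^(n): per-channel complex scale and shift."""
    return [z * a + b for z, a, b in zip(q, scale, shift)]


def complex_linear(q: CVec, W: List[CVec]) -> CVec:
    """Apply an E x E complex matrix W to the channel vector q."""
    return [sum(W[i][j] * q[j] for j in range(len(q))) for i in range(len(W))]


def s_fgo_layer(z_prev: CVec, scale: CVec, shift: CVec, W: List[CVec],
                theta: float = 1.0) -> CVec:
    """Eq. (21) without the residual path: G(W . G(A(z_prev)))."""
    gated_in = complex_lif_gate(complex_affine(z_prev, scale, shift), theta)
    return complex_lif_gate(complex_linear(gated_in, W), theta)


q = [1.5 + 0.2j, 0.3 - 2.0j, 0.1 + 1.2j, -3.0 - 3.0j]
print(complex_lif_gate(q))  # [(1.5+0.2j), 0j, (0.1+1.2j), 0j]
```

Note that the gate passes the original complex value rather than the spike itself; the spikes only decide which components survive, which keeps phase information intact.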
##### S\-iFFT\.
After the S-FGO block, the processed spectrum $\mathbf{Z}^{(N_\ell)}\in\mathbb{C}^{T_s\times B\times F\times E}$ is mapped back to the node domain via the Spiking Inverse Fast Fourier Transform (S-iFFT):

$$\mathbf{P}=\mathcal{SF}^{-1}\!\left(\mathbf{Z}^{(N_\ell)}\right)\in\mathbb{R}^{T_s\times B\times N\times E\times L}, \tag{23}$$

where the $M$-dimensional node axis is separated back into the original variable and temporal dimensions.
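The inverse step of Eq. (23) can be sketched for one sample and one SNN step: each channel's $F$ bins are inverted to $M$ real node values, and the node axis is split into $N\times L$. The variable-major node order ($m=n\cdot L+l$) is an assumption carried over from the encoder sketch, and the naive inverse stands in for the S-iFFT. When $M$ is even, the Nyquist bin has no mirrored partner and must be counted once.

```python
import cmath
from typing import List, Sequence


def rfft(x: Sequence[float]) -> List[complex]:
    """Naive real-input DFT keeping F = M // 2 + 1 bins."""
    M = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * f * m / M) for m in range(M))
            for f in range(M // 2 + 1)]


def irfft(X: Sequence[complex], M: int) -> List[float]:
    """Inverse of a length-M real DFT from its non-negative bins."""
    out = []
    for m in range(M):
        acc = X[0].real
        for f in range(1, len(X)):
            term = (X[f] * cmath.exp(2j * cmath.pi * f * m / M)).real
            # Bins 0 < f < M/2 stand for a +/- frequency pair; the Nyquist bin does not.
            acc += term if (M % 2 == 0 and f == M // 2) else 2 * term
        out.append(acc / M)
    return out


def spectrum_to_nodes(Z: List[List[complex]], N: int, L: int) -> List[List[List[float]]]:
    """Z: F x E spectrum -> P: N x E x L real tensor (assumed node order m = n * L + l)."""
    M, E = N * L, len(Z[0])
    cols = [irfft([Z[f][k] for f in range(len(Z))], M) for k in range(E)]  # E x M
    return [[[cols[k][n * L + l] for l in range(L)] for k in range(E)] for n in range(N)]


# Round trip for N = 2 variables, L = 3 steps (M = 6), E = 2 channels.
ch0, ch1 = [float(m) for m in range(6)], [10.0 + m for m in range(6)]
spec0, spec1 = rfft(ch0), rfft(ch1)
Z = [[spec0[f], spec1[f]] for f in range(len(spec0))]
P = spectrum_to_nodes(Z, N=2, L=3)
print(round(P[1][0][2], 6))  # 5.0: variable n=1, step l=2 is node m=5 of channel 0
```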
### 4.4 Decoder
The decoder maps $\mathbf{P}\in\mathbb{R}^{T_s\times B\times N\times E\times L}$ to the final predictions $\hat{\mathbf{Y}}\in\mathbb{R}^{B\times O\times N}$. First, the temporal dimension $L$ is compressed to a small projection dimension $p\ll L$ via a linear projection $\mathbf{W}_p\in\mathbb{R}^{L\times p}$:

$$\mathbf{P}'_{t}=\mathbf{P}_{t}\mathbf{W}_{p}\in\mathbb{R}^{B\times N\times E\times p},\quad t=1,\ldots,T_{s}. \tag{24}$$

After reshaping $\mathbf{P}'_t$ by merging the last two dimensions into $D=E\cdot p$, a LIF layer and a linear projection $\mathbf{W}_1\in\mathbb{R}^{D\times d_r}$ are applied at each spiking step. The resulting representations are averaged over the $T_s$ SNN steps, followed by a Gaussian Error Linear Unit (GELU) activation and a final linear projection $\mathbf{W}_2\in\mathbb{R}^{d_r\times O}$. Since the linear layers act on the last dimension, the decoder first produces outputs in $\mathbb{R}^{B\times N\times O}$, which are then transposed to match the forecasting convention:

$$\hat{\mathbf{Y}}=\mathrm{Transpose}_{(N,O)}\!\left(\mathrm{GELU}\!\left(\frac{1}{T_s}\sum_{t=1}^{T_s}\mathrm{LIF}\!\left(\mathrm{Reshape}(\mathbf{P}'_t)\right)\mathbf{W}_1\right)\mathbf{W}_2\right). \tag{25}$$

Here, $\mathbf{W}_1$ and $\mathbf{W}_2$ are weight-normalized linear projections, and $d_r$ is a reduced hidden dimension.
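The data flow of Eqs. (24)–(25) for one sample can be traced with the plain-Python sketch below. It is a shape-faithful illustration under stated simplifications: biases and weight normalization are omitted, the LIF state (zero-initialized, hard reset to 0, illustrative `decay` and `theta`) persists across the $T_s$ steps, and the weights are random placeholders.

```python
import math
import random
from typing import List

Mat = List[List[float]]


def matvec(v: List[float], W: Mat) -> List[float]:
    """Row vector v (length a) times matrix W (a x b) -> length b."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def gelu(x: float) -> float:
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def decode(P, Wp: Mat, W1: Mat, W2: Mat, decay: float = 0.5, theta: float = 1.0) -> Mat:
    """P: T_s x N x E x L for one sample -> Y_hat: O x N, following Eqs. (24)-(25)."""
    Ts, N = len(P), len(P[0])
    D, d_r = len(W1), len(W1[0])
    h = [[0.0] * D for _ in range(N)]       # LIF state per variable and feature
    acc = [[0.0] * d_r for _ in range(N)]   # running average over the T_s steps
    for t in range(Ts):
        for n in range(N):
            # Eq. (24): L -> p for every channel, then flatten E x p into D = E * p.
            feat = [v for channel in P[t][n] for v in matvec(channel, Wp)]
            spikes = []
            for d in range(D):
                u = h[n][d] + feat[d]
                s = 1 if u >= theta else 0
                h[n][d] = (1 - s) * decay * u
                spikes.append(s)
            hidden = matvec(spikes, W1)     # spikes times W_1
            acc[n] = [a + x / Ts for a, x in zip(acc[n], hidden)]
    out = [matvec([gelu(x) for x in acc[n]], W2) for n in range(N)]  # N x O
    return [[out[n][o] for n in range(N)] for o in range(len(W2[0]))]  # transpose -> O x N


rng = random.Random(0)
Ts, N, E, L, p, d_r, O = 4, 3, 2, 12, 2, 5, 12
rand = lambda r, c: [[rng.gauss(0.0, 1.0) for _ in range(c)] for _ in range(r)]
P = [[[[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(E)] for _ in range(N)]
     for _ in range(Ts)]
Y_hat = decode(P, rand(L, p), rand(E * p, d_r), rand(d_r, O))
print(len(Y_hat), len(Y_hat[0]))  # 12 3
```

Only the step before $\mathbf{W}_1$ is spiking; averaging over $T_s$ and the GELU head produce the real-valued forecast, which is why the output can take arbitrary values.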
## 5 Experiments
### 5.1 Experimental Settings
##### Datasets\.
We evaluate on eight public multivariate TSF benchmarks spanning traffic flow (Traffic, METR-LA, PEMS-BAY), energy systems (Solar, Electricity), epidemiological records (COVID-19), biomedical signals (ECG), and web activity (Wiki), with 55–2,000 variables and 5-minute to daily granularity, providing a broad evaluation setting for TSF\[[26](https://arxiv.org/html/2606.13901#bib.bib34),[32](https://arxiv.org/html/2606.13901#bib.bib10)\]; summary statistics are in Appendix [0.A](https://arxiv.org/html/2606.13901#Pt0.A1). Following\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\], all datasets except COVID-19 are split chronologically into training, validation, and test sets with a 7:2:1 ratio; for COVID-19 the ratio is 6:2:2.
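The chronological split can be sketched as follows. The rounding rule (floor for the training and validation sizes, remainder to the test set) is an assumption, since the text gives only the ratios; the step counts in the example are arbitrary and do not correspond to any of the benchmarks.

```python
from typing import Sequence, Tuple


def chronological_split(num_steps: int, ratio: Sequence[int]) -> Tuple[range, range, range]:
    """Contiguous train/val/test index ranges; no shuffling, so the future never leaks back."""
    total = sum(ratio)
    n_train = num_steps * ratio[0] // total
    n_val = num_steps * ratio[1] // total
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, num_steps))


def split_for(dataset: str, num_steps: int) -> Tuple[range, range, range]:
    """6:2:2 for COVID-19, 7:2:1 for every other benchmark."""
    ratio = (6, 2, 2) if dataset == "COVID-19" else (7, 2, 1)
    return chronological_split(num_steps, ratio)


train, val, test = split_for("Solar", 1000)
print(len(train), len(val), len(test))  # 700 200 100
```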
##### Baselines\.
We evaluate the proposed **SpikF-GO** model introduced in Section 4 and a variant, **SpikF-GO w/ CPG**, which uses the same CPG-based positional encoding as in\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\]. The CPG module is used exactly as implemented in the original work. We compare these models against nine SNN baselines spanning the major architectural families in SNN-based forecasting. These include SpikeRNN and SpikeTCN with CPG positional encodings, as well as Spikformer w/ CPG\[[18](https://arxiv.org/html/2606.13901#bib.bib15)\]; TS-GRU, TS-TCN, and TS-Former\[[9](https://arxiv.org/html/2606.13901#bib.bib16)\]; SpikF\[[28](https://arxiv.org/html/2606.13901#bib.bib17)\]; and Spike-GRU and iSpikformer\[[19](https://arxiv.org/html/2606.13901#bib.bib13)\]. This set includes strong recent SNN models such as the TS-LIF-based models and SpikF, whose original studies report competitive or better average performance than ANN baselines such as iTransformer\[[14](https://arxiv.org/html/2606.13901#bib.bib14)\]. The single ANN baseline is FourierGNN\[[32](https://arxiv.org/html/2606.13901#bib.bib10)\], which is among the state-of-the-art graph-based ANN forecasters for multivariate TSF and the closest continuous-valued counterpart to SpikF-GO, since both methods build on the hypervariate graph formulation. For each included baseline, we use the hyperparameters reported in the original paper on the datasets originally evaluated; for additional datasets, hyperparameters are tuned on the validation set. Unless stated otherwise, our model uses $N_\ell=3$ and $E=128$, matching FourierGNN.
##### Training Settings\.
All experiments use PyTorch 2.5.1 on a single NVIDIA RTX 4090. All models are trained with MSE as the main objective (Eq. [6](https://arxiv.org/html/2606.13901#S3.E6)). For SpikF-GO, we additionally apply an adaptive $\ell_0$ sparsity penalty to the frequency gate (Eq. [20](https://arxiv.org/html/2606.13901#S4.E20)) to encourage sparse frequency utilization. We apply Reversible Instance Normalization (RevIN) \[[10](https://arxiv.org/html/2606.13901#bib.bib33)\] to all models. We also correct a data leakage issue in the original FourierGNN codebase, where normalization statistics were computed over the full dataset rather than over the training data only; in our experiments, all preprocessing uses training-set statistics exclusively, ensuring proper train–test separation. All SNN models use $T_s=4$ spiking time steps, except SpikF, which uses $T_s=16$ following its original setting. We evaluate test-set performance using the coefficient of determination ($R^2$) and mean absolute error (MAE); their formal definitions are provided in Appendix [0.B](https://arxiv.org/html/2606.13901#Pt0.A2).
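The leakage fix amounts to fitting the scaler on the training split alone and then reusing those statistics, unchanged, for validation and test. The following is a minimal plain-Python sketch, assuming per-variable z-score preprocessing with rows as time steps and columns as variables; the exact preprocessing in the FourierGNN codebase may differ, and `fit_scaler` and `transform` are hypothetical names. RevIN, which additionally normalizes each input window by its own statistics inside the model, is not shown.

```python
import statistics


def fit_scaler(train_rows):
    """Compute per-variable mean and std from the TRAINING rows only."""
    columns = list(zip(*train_rows))  # one tuple per variable
    means = [statistics.fmean(col) for col in columns]
    # Population std; fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply training-set statistics to any split (train, val, or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


series = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [100.0, 50.0]]
train, test = series[:3], series[3:]  # chronological: the outlier lies in the test period
means, stds = fit_scaler(train)       # the test row never influences the statistics
print(transform(test, means, stds))
```

Fitting on the full series instead would let the test-period value 100.0 shift the mean and standard deviation used during training, which is exactly the leak described above.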
### 5.2 Main Results
Tables [1](https://arxiv.org/html/2606.13901#S5.T1) and [2](https://arxiv.org/html/2606.13901#S5.T2) report forecasting results on eight benchmarks, averaged over 5 runs, using $R^2$ (higher is better) and MAE (lower is better). The input window and forecast horizon are both 12. Bold and underline denote the best and second-best results, respectively. Avg. Rank is the mean rank over all eight datasets, computed separately for each metric (lower is better).
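Avg. Rank can be reproduced from per-dataset scores as shown below. This is a minimal sketch, assuming models are ranked independently on each dataset (rank 1 = best) and that tied models share the average of their positions; the tie rule and the name `average_ranks` are assumptions, since the text does not specify them. The direction flag matters: $R^2$ is ranked in descending order and MAE in ascending order.

```python
def average_ranks(scores, higher_is_better):
    """Mean rank of each model over datasets for one metric.

    scores: {model: [score on dataset 1, ..., score on dataset D]}
    Rank 1 is best; ties receive the average of the tied positions.
    """
    models = list(scores)
    num_datasets = len(next(iter(scores.values())))
    totals = dict.fromkeys(models, 0.0)
    for d in range(num_datasets):
        # Sort so that the best score on dataset d comes first.
        ordered = sorted(models, key=lambda m: scores[m][d], reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over a run of models tied with ordered[i].
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
            for m in ordered[i : j + 1]:
                totals[m] += shared_rank
            i = j + 1
    return {m: totals[m] / num_datasets for m in models}


r2 = {"A": [0.90, 0.70], "B": [0.85, 0.75], "C": [0.80, 0.60]}  # made-up R^2 values
print(average_ranks(r2, higher_is_better=True))  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
```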
SpikF-GO w/ CPG achieves the best overall average rank, ranking first on both $R^2$ (2.4) and MAE (2.3), while SpikF-GO achieves the second-best average rank on $R^2$ (2.8). The benefit of CPG is consistent with the findings of \[[18](https://arxiv.org/html/2606.13901#bib.bib15)\], where injecting explicit positional structure improved spiking RNN, TCN, and Transformer backbones; here, CPG similarly strengthens SpikF-GO on most datasets. Among the baselines, SpikF is the strongest SNN model, while FourierGNN achieves the second-best average rank on MAE. The remaining SNN baselines perform notably worse, likely due to their lack of explicit cross-variate modeling. Results with standard deviations and per-horizon breakdowns are provided in Appendix [0.C](https://arxiv.org/html/2606.13901#Pt0.A3).
Table 1: Forecasting results on ECG, COVID-19 (COVID), Solar, and Electricity (ECL), averaged over 5 runs. Bold and underline indicate the best and second-best results in each column, respectively. Avg. Rank is computed over all eight datasets.
Table 2: Forecasting results on METR-LA (METR), Traffic, PEMS-BAY (PEMS), and Wiki, averaged over 5 runs. Bold and underline denote the best and second-best results in each column. Avg. Rank is computed over all eight datasets.
### 5.3 Model Analysis
We ablate three components on Solar, Traffic, and COVID-19, as shown in Table [3](https://arxiv.org/html/2606.13901#S5.T3). The Temporal-Only variant, which removes cross-variable modeling by processing each variable independently, causes the largest drop on all three datasets, with $R^2$ decreasing by 0.018, 0.074, and 0.030, respectively. This indicates that the hypervariate graph formulation is the main source of SpikF-GO's performance gains, particularly on Traffic, where sensor correlations are strong. Replacing the learned Hard Concrete gate with fixed Top-K frequency selection also degrades performance, indicating that adaptive frequency pruning is more effective than a fixed selection strategy. Replacing RMSNorm with a simple Scale-Shift affine transform, which is more suitable for neuromorphic hardware, yields nearly identical performance on Solar and only slight degradation on Traffic and COVID-19, making it a practical hardware-friendly alternative.
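The two frequency-selection strategies compared in the ablation can be contrasted in a few lines. This sketch uses the standard Hard Concrete gate and expected-$\ell_0$ penalty of Louizos et al. \[16\] with their commonly used stretch parameters ($\gamma=-0.1$, $\zeta=1.1$, $\beta=2/3$); these constants, the plain-Python loops, and all function names are assumptions, and the adaptive weighting of the penalty in SpikF-GO (Eq. 20) is not reproduced. The Top-K baseline is assumed here to keep the $k$ bins with the largest spectral magnitude. In a real model, `log_alpha` would be a learnable parameter trained through the reparameterized sample.

```python
import math
import random

# Stretch limits and temperature: common defaults from Louizos et al. (2018),
# assumed here; SpikF-GO's actual values are not given in this section.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_concrete_gates(log_alpha, training, rng=random):
    """One gate in [0, 1] per frequency bin; an exact 0 prunes the bin."""
    gates = []
    for la in log_alpha:
        if training:
            u = rng.uniform(1e-6, 1.0 - 1e-6)  # keep log() finite
            s = sigmoid((math.log(u) - math.log(1.0 - u) + la) / BETA)
        else:
            s = sigmoid(la)  # deterministic gate at inference
        stretched = s * (ZETA - GAMMA) + GAMMA  # stretch to (GAMMA, ZETA)
        gates.append(min(1.0, max(0.0, stretched)))  # hard-clip to [0, 1]
    return gates


def expected_l0(log_alpha):
    """Expected number of non-zero gates; differentiable in log_alpha."""
    shift = BETA * math.log(-GAMMA / ZETA)
    return sum(sigmoid(la - shift) for la in log_alpha)


def top_k_mask(magnitudes, k):
    """Fixed baseline: keep the k bins with the largest spectral magnitude."""
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(magnitudes))]


log_alpha = [-6.0, 0.0, 6.0]  # hypothetical learned logits for three bins
print(hard_concrete_gates(log_alpha, training=False))
print(expected_l0(log_alpha))
print(top_k_mask([0.1, 5.0, 3.0, 0.2], k=2))  # [0.0, 1.0, 1.0, 0.0]
```

The design difference is that the Hard Concrete gate learns per-bin keep probabilities from data and can drive bins to exactly zero, whereas Top-K fixes the number of retained bins in advance.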
Table 3: Ablation study on key components of SpikF-GO. Best results are shown in bold.

Figure [2](https://arxiv.org/html/2606.13901#S5.F2) shows the sensitivity of SpikF-GO to three hyperparameters: spiking time steps $T_s$, input window length $L$, and embedding size $E$. Shaded regions denote ±1 standard deviation over three runs. Performance improves with increasing $T_s$, peaking at $T_s=8$ for Solar and $T_s=12$ for METR-LA, with no further gains beyond these values. Increasing the input window from $L=96$ to $L=168$ yields negligible improvement, indicating that $L=96$ is sufficient for the considered datasets. Likewise, $R^2$ remains nearly unchanged from $E=8$ to $E=128$, showing that SpikF-GO maintains performance even with compact embeddings, which is desirable for neuromorphic deployment because of lower energy and memory costs.
Figure 2: Sensitivity analysis of SpikF-GO on three hyperparameters: spiking time steps $T_s$, input window length $L$, and embedding size $E$. Shaded regions denote ±1 std over three runs.

Table [4](https://arxiv.org/html/2606.13901#S5.T4) reports theoretical energy consumption, wall-clock runtime, and energy reduction relative to FourierGNN on Solar with prediction length 12. Energy is estimated for 45 nm hardware following \[[13](https://arxiv.org/html/2606.13901#bib.bib35)\] and decomposed into memory-access ($E_{\mathrm{Mem}}$), operational ($E_{\mathrm{Ops}}$), and addressing ($E_{\mathrm{Addr}}$) components. The energy cost of S-FFT and S-iFFT is estimated following \[[28](https://arxiv.org/html/2606.13901#bib.bib17)\]. We assume 4.6 pJ per FLOP and 0.9 pJ per synaptic operation (SOP) on 45 nm hardware \[[31](https://arxiv.org/html/2606.13901#bib.bib36)\]. Training and inference times are reported as average per-batch runtimes.
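Of the three energy terms, only the operational term $E_{\mathrm{Ops}}$ is simple enough to illustrate directly from the per-operation costs above. The sketch below assumes the SOP-counting convention widely used in the SNN literature, SOPs = firing rate × $T_s$ × FLOPs, under which an accumulate is triggered only by an incoming spike. The memory-access and addressing terms of \[13\] and the S-FFT/S-iFFT cost model of \[28\] are omitted, so this does not reproduce the reduction factors in Table 4; the layer size and firing rate in the usage lines are made up.

```python
E_FLOP_PJ = 4.6  # pJ per FLOP (dense multiply-accumulate), 45 nm, as in the text
E_SOP_PJ = 0.9   # pJ per synaptic operation (spike-driven accumulate), 45 nm


def ann_ops_energy_mj(flops):
    """Operational energy E_Ops of a dense ANN layer, in millijoules."""
    return flops * E_FLOP_PJ * 1e-9  # 1 pJ = 1e-9 mJ


def snn_ops_energy_mj(flops, firing_rate, time_steps):
    """Operational energy E_Ops of a spiking layer, in millijoules.

    Assumes SOPs = firing_rate * time_steps * FLOPs: an accumulate is
    only triggered when an input spike arrives.
    """
    sops = firing_rate * time_steps * flops
    return sops * E_SOP_PJ * 1e-9


# Made-up layer: 1 GFLOP, 20% firing rate, T_s = 4.
ann = ann_ops_energy_mj(1e9)
snn = snn_ops_energy_mj(1e9, firing_rate=0.2, time_steps=4)
print(f"ANN {ann:.2f} mJ, SNN {snn:.2f} mJ, reduction {ann / snn:.2f}x")
# ANN 4.60 mJ, SNN 0.72 mJ, reduction 6.39x
```

The formula also makes the trade-off in the sensitivity study concrete: the SNN term grows linearly with $T_s$ and with the firing rate, while reducing $E$ lowers the underlying FLOP count for both model types.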
As shown in Table [4](https://arxiv.org/html/2606.13901#S5.T4), SpikF-GO reduces energy consumption by 1.89× relative to FourierGNN while achieving better forecasting performance. Reducing the embedding dimension to $E=8$ further increases the reduction to 7.86× with little change in performance (Fig. [2](https://arxiv.org/html/2606.13901#S5.F2)), making it the most energy-efficient graph-based configuration we evaluate. Although SpikF consumes less energy than SpikF-GO with $E=128$ (a 4.27× reduction relative to FourierGNN), it does not explicitly model cross-variate dependencies and does not outperform the full SpikF-GO model. In terms of wall-clock runtime, FourierGNN is the fastest because it relies only on conventional ANN computation, whereas SNN-based models incur additional overhead from simulating multiple spiking time steps on a GPU; this overhead would not arise on neuromorphic hardware.
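The GPU overhead comes from the sequential nature of spiking simulation: the membrane state at each of the $T_s$ time steps depends on the state left by the previous step, so the steps cannot be collapsed into one dense pass. A minimal sketch of a standard leaky integrate-and-fire (LIF) layer with hard reset, following the charge equation of SpikingJelly's LIF neuron \[8\]; the values of `tau`, `v_threshold`, and `v_reset` are illustrative assumptions, not SpikF-GO's settings, and `lif_multistep` is a hypothetical name.

```python
def lif_multistep(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one layer of LIF neurons over T_s time steps.

    inputs: list of T_s lists, each holding the input current of every neuron.
    Returns a list of T_s binary spike vectors.
    """
    membrane = [v_reset] * len(inputs[0])
    spikes_over_time = []
    for x_t in inputs:  # sequential loop over T_s: the source of the GPU overhead
        spikes_t = []
        for i, x in enumerate(x_t):
            # Leaky integration: decay toward v_reset while charging with input x.
            membrane[i] = membrane[i] + (x - (membrane[i] - v_reset)) / tau
            spike = 1 if membrane[i] >= v_threshold else 0
            if spike:
                membrane[i] = v_reset  # hard reset after firing
            spikes_t.append(spike)
        spikes_over_time.append(spikes_t)
    return spikes_over_time


spikes = lif_multistep([[1.5, 0.4]] * 4)  # constant input over T_s = 4 steps
print(spikes)  # [[0, 0], [1, 0], [0, 0], [1, 0]]
```

On neuromorphic hardware, each neuron would integrate its inputs locally and asynchronously, so this explicit per-step loop would not be needed.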
Table 4: Energy consumption and runtime comparison across models. $\downarrow n\times$ denotes the energy reduction factor relative to the ANN baseline FourierGNN. Runtime is reported as training / inference time per batch in seconds.
## 6 Conclusion
We introduced SpikF\-GO, a spiking model for multivariate time series forecasting that addresses the lack of explicit cross\-variable modeling in prior SNN forecasting methods through a hypervariate graph formulation and spike\-driven Fourier\-domain graph processing\. We further presented SpikF\-GO w/ CPG, which strengthens long\-range temporal modeling through positional encoding\. Experiments on eight benchmark datasets demonstrated the effectiveness of the proposed models, with SpikF\-GO w/ CPG achieving the best overall average rank and SpikF\-GO also outperforming strong SNN and ANN baselines\. In addition, the proposed approach provides clear theoretical energy benefits over FourierGNN, highlighting its potential for efficient neuromorphic forecasting\.
At the same time, our method has several limitations\. Its complexity and energy consumption increase when both the input length and the number of channels become large, which may limit scalability in high\-dimensional long\-horizon settings\. Moreover, our energy analysis is theoretical and does not yet include deployment on real neuromorphic hardware\. Future work will therefore focus on improving scalability, reducing computational cost for large\-scale inputs, and validating the proposed models in practical neuromorphic deployment scenarios\.
#### 6.0.1 Disclosure of Interests.
The authors have no competing interests to declare\.
#### 6.0.2 Use of Generative AI.
Generative AI was used only for grammar correction and language polishing\.
## References
- [1] S. Abreu, S. B. Shrestha, R. Zhu, and J. Eshraghian (2025). Neuromorphic principles for efficient large language models on Intel Loihi 2. arXiv:2503.18002. [Link](https://arxiv.org/abs/2503.18002)
- [2] L. Bai, L. Yao, C. Li, X. Wang, and C. Wang (2020). Adaptive graph convolutional recurrent network for traffic forecasting. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 17804–17815. [Link](https://proceedings.neurips.cc/paper_files/paper/2020/file/ce1aad92b939420fc17005e5461e6f48-Paper.pdf)
- [3] S. Bai, J. Z. Kolter, and V. Koltun (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271. [Link](https://arxiv.org/abs/1803.01271)
- [4] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong, and Q. Zhang (2021). Spectral temporal graph neural network for multivariate time-series forecasting. arXiv:2103.07719. [Link](https://arxiv.org/abs/2103.07719)
- [5] S. Chen, C. Li, N. Yoder, S. O. Arik, and T. Pfister (2023). TSMixer: an all-MLP architecture for time series forecasting. arXiv:2303.06053. [Link](https://arxiv.org/abs/2303.06053)
- [6] Y. Chen, I. Segovia-Dominguez, B. Coskunuzer, and Y. R. Gel (2022). TAMP-S2GCNets: coupling time-aware multipersistence knowledge representation with spatio-supra graph convolutional networks for time-series forecasting. In International Conference on Learning Representations. [Link](https://api.semanticscholar.org/CorpusID:251648968)
- [7] J. K. Eshraghian, M. Ward, E. Neftci, X. Wang, G. Lenz, G. Dwivedi, M. Bennamoun, D. S. Jeong, and W. D. Lu (2023). Training spiking neural networks using lessons from deep learning. arXiv:2109.12894. [Link](https://arxiv.org/abs/2109.12894)
- [8] W. Fang, Y. Chen, J. Ding, Z. Yu, T. Masquelier, D. Chen, L. Huang, H. Zhou, G. Li, and Y. Tian (2023). SpikingJelly: an open-source machine learning infrastructure platform for spike-based intelligence. Science Advances 9(40), pp. eadi1480. doi:10.1126/sciadv.adi1480. [Link](https://www.science.org/doi/abs/10.1126/sciadv.adi1480)
- [9] S. Feng, W. Feng, X. Gao, P. Zhao, and Z. Shen (2025). TS-LIF: a temporal segment spiking neuron network for time series forecasting. arXiv:2503.05108. [Link](https://arxiv.org/abs/2503.05108)
- [10] T. Kim, J. Kim, Y. Tae, C. Park, J. Choi, and J. Choo (2021). Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations. [Link](https://openreview.net/forum?id=cGDAkQo1C0p)
- [11] D. Lee, Y. Li, Y. Kim, S. Xiao, and P. Panda (2025). Spiking transformer with spatial-temporal attention. arXiv:2409.19764. [Link](https://arxiv.org/abs/2409.19764)
- [12] Z. Lei, M. Yao, J. Hu, X. Luo, Y. Lu, B. Xu, and G. Li (2024). Spike2Former: efficient spiking transformer for high-performance image segmentation. arXiv:2412.14587. [Link](https://arxiv.org/abs/2412.14587)
- [13] E. Lemaire, L. Cordone, A. Castagnetti, P. Novac, J. Courtois, and B. Miramond (2023). An analytical estimation of spiking neural networks energy efficiency. In Neural Information Processing, pp. 574–587. ISBN 9783031301056, ISSN 1611-3349. doi:10.1007/978-3-031-30105-6_48. [Link](http://dx.doi.org/10.1007/978-3-031-30105-6_48)
- [14] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long (2024). iTransformer: inverted transformers are effective for time series forecasting. arXiv:2310.06625. [Link](https://arxiv.org/abs/2310.06625)
- [15] J. Lopez-Randulfe, N. Reeb, N. Karimi, C. Liu, H. A. Gonzalez, R. Dietrich, B. Vogginger, C. Mayr, and A. Knoll (2022). Time-coded spiking Fourier transform in neuromorphic hardware. IEEE Transactions on Computers 71(11), pp. 2792–2802. ISSN 2326-3814. doi:10.1109/tc.2022.3162708. [Link](https://dx.doi.org/10.1109/tc.2022.3162708)
- [16] C. Louizos, M. Welling, and D. P. Kingma (2018). Learning sparse neural networks through $L_0$ regularization. arXiv:1712.01312. [Link](https://arxiv.org/abs/1712.01312)
- [17] X. Luo, M. Yao, Y. Chou, B. Xu, and G. Li (2025). Integer-valued training and spike-driven inference spiking neural network for high-performance and energy-efficient object detection. arXiv:2407.20708. [Link](https://arxiv.org/abs/2407.20708)
- [18] C. Lv, D. Han, Y. Wang, X. Zheng, X. Huang, and D. Li (2024). Advancing spiking neural networks for sequential modeling with central pattern generators. arXiv:2405.14362. [Link](https://arxiv.org/abs/2405.14362)
- [19] C. Lv, Y. Wang, D. Han, X. Zheng, X. Huang, and D. Li (2024). Efficient and effective time-series forecasting with spiking neural networks. arXiv:2402.01533. [Link](https://arxiv.org/abs/2402.01533)
- [20] W. Maass (1997). Networks of spiking neurons: the third generation of neural network models. Neural Networks 10(9), pp. 1659–1671. ISSN 0893-6080. doi:10.1016/S0893-6080(97)00011-7. [Link](https://dx.doi.org/10.1016/S0893-6080%2897%2900011-7)
- [21] C. J. Maddison, A. Mnih, and Y. W. Teh (2017). The concrete distribution: a continuous relaxation of discrete random variables. arXiv:1611.00712. [Link](https://arxiv.org/abs/1611.00712)
- [22] G. Orchard, E. P. Frady, D. B. D. Rubin, S. Sanborn, S. B. Shrestha, F. T. Sommer, and M. Davies (2021). Efficient neuromorphic signal processing with Loihi 2. arXiv:2111.03746. [Link](https://arxiv.org/abs/2111.03746)
- [23] K. Roy, A. Jaiswal, and P. Panda (2019). Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784), pp. 607–617. doi:10.1038/s41586-019-1677-2. [Link](https://dx.doi.org/10.1038/s41586-019-1677-2)
- [24] D. Salinas, V. Flunkert, and J. Gasthaus (2019). DeepAR: probabilistic forecasting with autoregressive recurrent networks. arXiv:1704.04110. [Link](https://arxiv.org/abs/1704.04110)
- [25] C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay (2022). Opportunities for neuromorphic computing algorithms and applications. Nature Computational Science 2(1). ISSN 2662-8457. doi:10.1038/s43588-021-00184-y. [Link](https://dx.doi.org/10.1038/s43588-021-00184-y)
- [26] R. Sen, H. Yu, and I. Dhillon (2019). Think globally, act locally: a deep neural network approach to high-dimensional time series forecasting. arXiv:1905.03806. [Link](https://arxiv.org/abs/1905.03806)
- [27] H. Wu, J. Xu, J. Wang, and M. Long (2022). Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. arXiv:2106.13008. [Link](https://arxiv.org/abs/2106.13008)
- [28] W. Wu, D. Huo, and H. Chen (2025). SpikF: spiking Fourier network for efficient long-term prediction. In Forty-second International Conference on Machine Learning. [Link](https://openreview.net/forum?id=5jlvLwoO1n)
- [29] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang (2020). Connecting the dots: multivariate time series forecasting with graph neural networks. arXiv:2005.11650. [Link](https://arxiv.org/abs/2005.11650)
- [30] M. Yao, X. Qiu, T. Hu, J. Hu, Y. Chou, K. Tian, J. Liao, L. Leng, B. Xu, and G. Li (2025). Scaling spike-driven transformer with efficient spike firing approximation training. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(4), pp. 2973–2990. ISSN 1939-3539. doi:10.1109/tpami.2025.3530246. [Link](https://dx.doi.org/10.1109/tpami.2025.3530246)
- [31] M. Yao, G. Zhao, H. Zhang, Y. Hu, L. Deng, Y. Tian, B. Xu, and G. Li (2022). Attention spiking neural networks. arXiv:2209.13929. [Link](https://arxiv.org/abs/2209.13929)
- [32] K. Yi, Q. Zhang, W. Fan, H. He, L. Hu, P. Wang, N. An, L. Cao, and Z. Niu (2023). FourierGNN: rethinking multivariate time series forecasting from a pure graph perspective. arXiv:2311.06190. [Link](https://arxiv.org/abs/2311.06190)
- [33] B. Yu, H. Yin, and Z. Zhu (2018). Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-2018), pp. 3634–3640. doi:10.24963/ijcai.2018/505. [Link](https://dx.doi.org/10.24963/ijcai.2018/505)
- [34] B. Zhang and R. Sennrich (2019). Root mean square layer normalization. arXiv:1910.07467. [Link](https://arxiv.org/abs/1910.07467)
- [35] C. Zhou, H. Zhang, Z. Zhou, L. Yu, L. Huang, X. Fan, L. Yuan, Z. Ma, H. Zhou, and Y. Tian (2024). QKFormer: hierarchical spiking transformer using Q-K attention. arXiv:2403.16552. [Link](https://arxiv.org/abs/2403.16552)
- [36] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang (2021). Informer: beyond efficient transformer for long sequence time-series forecasting. arXiv:2012.07436. [Link](https://arxiv.org/abs/2012.07436)
## Appendix 0.A Dataset Statistics
Table[5](https://arxiv.org/html/2606.13901#Pt0.A1.T5)summarizes the benchmark datasets used in our experiments, including the number of variables or channels \(\#Vars\.\), the number of observations \(\#Obs\.\), the sampling frequency \(Freq\.\), and the application domain\. For the Wiki dataset, we randomly sample 2,000 variables from the full dataset, which contains more than 100,000 time series\. We provide the processed datasets used in our experiments at[https://figshare\.com/s/7617530bce306584fe95?file=62576929](https://figshare.com/s/7617530bce306584fe95?file=62576929)\.
Table 5: Summary of benchmark datasets. ECL denotes Electricity.
## Appendix 0.B Evaluation Metrics
We evaluate all models using two complementary metrics: the coefficient of determination ($R^2$) and mean absolute error (MAE). Following the notation introduced in Section 3.1, let $N$ denote the number of variables, $O$ the forecasting horizon, $B$ the number of test samples, and $\hat{\mathbf{Y}}=F_{\theta}(\mathbf{X}_{\text{in}})\in\mathbb{R}^{B\times O\times N}$ the model predictions. We write $Y^{b}_{o,n}$ and $\hat{Y}^{b}_{o,n}$ for the ground-truth and predicted values of variable $n$ at horizon step $o$ for sample $b$, respectively.
##### Coefficient of Determination ($R^2$).
$R^2$ measures the proportion of variance in the ground-truth targets explained by the model. Higher values indicate a better fit, with a maximum of 1.
$$
R^{2}=1-\frac{\displaystyle\sum_{b=1}^{B}\sum_{o=1}^{O}\sum_{n=1}^{N}\left(Y^{b}_{o,n}-\hat{Y}^{b}_{o,n}\right)^{2}}{\displaystyle\sum_{b=1}^{B}\sum_{o=1}^{O}\sum_{n=1}^{N}\left(Y^{b}_{o,n}-\bar{Y}\right)^{2}},\tag{26}
$$

where

$$
\bar{Y}=\frac{1}{BON}\sum_{b=1}^{B}\sum_{o=1}^{O}\sum_{n=1}^{N}Y^{b}_{o,n}
$$

is the global mean of the ground-truth targets over the test set.
##### Mean Absolute Error \(MAE\)\.
MAE measures the average absolute difference between the predicted and ground\-truth values\. Lower values indicate better performance\.
$$
\mathrm{MAE}=\frac{1}{BON}\sum_{b=1}^{B}\sum_{o=1}^{O}\sum_{n=1}^{N}\left|Y^{b}_{o,n}-\hat{Y}^{b}_{o,n}\right|.\tag{27}
$$
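Both metrics are computed over the whole test tensor at once. In particular, Eq. (26) uses a single global mean $\bar{Y}$ rather than a per-variable mean, so it generally differs from averaging per-variable $R^2$ scores. The following is a minimal plain-Python sketch of Eqs. (26) and (27), assuming targets and predictions are nested lists of shape $[B][O][N]$ on the same scale (whether evaluation happens before or after de-normalization is not stated here); `r2_and_mae` is a hypothetical name.

```python
def r2_and_mae(y_true, y_pred):
    """Global R^2 (Eq. 26) and MAE (Eq. 27) over samples b, steps o, variables n.

    y_true, y_pred: nested lists of shape [B][O][N].
    """
    flat_true = [v for sample in y_true for step in sample for v in step]
    flat_pred = [v for sample in y_pred for step in sample for v in step]
    count = len(flat_true)  # equals B * O * N
    y_bar = sum(flat_true) / count  # one global mean, not one per variable
    ss_res = sum((t - p) ** 2 for t, p in zip(flat_true, flat_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in flat_true)
    mae = sum(abs(t - p) for t, p in zip(flat_true, flat_pred)) / count
    return 1.0 - ss_res / ss_tot, mae


y_true = [[[1.0, 2.0], [3.0, 4.0]]]  # B=1 sample, O=2 steps, N=2 variables
y_pred = [[[1.0, 2.0], [3.0, 5.0]]]
r2, mae = r2_and_mae(y_true, y_pred)
print(round(r2, 3), mae)  # 0.8 0.25
```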
## Appendix 0.C Additional Results
This section provides additional experimental results. Section [0.C.1](https://arxiv.org/html/2606.13901#Pt0.A3.SS1) presents the main-paper results with standard deviations, and Section [0.C.2](https://arxiv.org/html/2606.13901#Pt0.A3.SS2) reports results across forecast horizons {6, 12, 24, 48}, also with standard deviations.
### 0.C.1 Main Results with Standard Deviations
Tables [6](https://arxiv.org/html/2606.13901#Pt0.A3.T6) and [7](https://arxiv.org/html/2606.13901#Pt0.A3.T7) report results for the setting with input length 12 and forecast horizon 12. Unlike the main-paper tables, they report each value as the mean and standard deviation over 5 random seeds.
Table 6: Forecasting results on ECG, COVID-19 (COVID), Solar, and Electricity (ECL), averaged over 5 runs. Values are reported as mean ± standard deviation. Bold and underline indicate the best and second-best results in each column, respectively.

Table 7: Forecasting results on METR-LA (METR), Traffic, PEMS-BAY (PEMS), and Wiki, averaged over 5 runs. Values are reported as mean ± standard deviation. Bold and underline indicate the best and second-best results in each column, respectively.
### 0.C.2 Results Across Forecast Horizons
Tables[8](https://arxiv.org/html/2606.13901#Pt0.A3.T8)–[11](https://arxiv.org/html/2606.13901#Pt0.A3.T11)report forecasting performance across forecast horizons\{6,12,24,48\}\\\{6,12,24,48\\\}\. Results are averaged over 3 random seeds and reported as mean±\\pmstandard deviation, except for the horizon\-12 setting in the main paper, which is averaged over 5 random seeds\. Avg\. denotes the mean score over the four forecast horizons\. The compared methods include the top four baseline models for each dataset and our two proposed models\.
Table 8: COVID-19 results across various forecast horizons. Avg. averages the mean scores over horizons {6, 12, 24, 48}; standard deviations are shown only for individual horizons. Bold/underline denote best/second-best based on the unrounded mean values.

Table 9: PEMS-BAY results across various forecast horizons. Avg. averages the mean scores over horizons {6, 12, 24, 48}; standard deviations are shown only for individual horizons. Bold/underline denote best/second-best based on the unrounded mean values.

| Models | Metric | 6 | 12 | 24 | 48 | Avg. |
|---|---|---|---|---|---|---|
| FourierGNN | $R^2$ ↑ | .847 ± .001 | .733 ± .003 | .508 ± .001 | .123 ± .008 | .553 |
| | MAE ↓ | 1.62 ± .002 | 2.16 ± .002 | 3.04 ± .002 | 4.39 ± .010 | 2.80 |
| SpikeTCN | $R^2$ ↑ | .857 ± .000 | .747 ± .001 | .539 ± .001 | .078 ± .012 | .555 |
| | MAE ↓ | 1.72 ± .000 | 2.25 ± .002 | 3.09 ± .008 | 4.48 ± .013 | 2.89 |
| TS-TCN | $R^2$ ↑ | .854 ± .003 | .743 ± .004 | .520 ± .013 | .189 ± .000 | .577 |
| | MAE ↓ | 1.74 ± .012 | 2.27 ± .008 | 3.13 ± .014 | 4.46 ± .000 | 2.90 |
| SpikF | $R^2$ ↑ | .862 ± .001 | .740 ± .001 | .504 ± .000 | .120 ± .000 | .557 |
| | MAE ↓ | 1.66 ± .001 | 2.21 ± .001 | 3.10 ± .001 | 4.41 ± .001 | 2.85 |
| SpikF-GO | $R^2$ ↑ | .870 ± .001 | .762 ± .002 | .545 ± .004 | .209 ± .005 | .597 |
| | MAE ↓ | 1.62 ± .000 | 2.17 ± .008 | 3.03 ± .002 | 4.33 ± .010 | 2.79 |
| SpikF-GO w/ CPG | $R^2$ ↑ | .873 ± .000 | .766 ± .002 | .569 ± .000 | .225 ± .006 | .608 |
| | MAE ↓ | 1.62 ± .003 | 2.16 ± .012 | 3.02 ± .023 | 4.34 ± .028 | 2.79 |

Table 10: Solar results across various forecast horizons. Avg. averages the mean scores over horizons {6, 12, 24, 48}; standard deviations are shown only for individual horizons. Bold/underline denote best/second-best based on the unrounded mean values.

| Models | Metric | 6 | 12 | 24 | 48 | Avg. |
|---|---|---|---|---|---|---|
| FourierGNN | $R^2$ ↑ | .767 ± .001 | .742 ± .001 | .703 ± .001 | .670 ± .001 | .721 |
| | MAE ↓ | 8.56 ± .008 | 9.03 ± .019 | 9.80 ± .020 | 10.47 ± .02 | 9.47 |
| SpikeTCN | $R^2$ ↑ | .743 ± .007 | .706 ± .006 | .656 ± .005 | .620 ± .007 | .681 |
| | MAE ↓ | 9.26 ± .148 | 9.84 ± .098 | 10.74 ± .12 | 11.29 ± .09 | 10.28 |
| TS-TCN | $R^2$ ↑ | .740 ± .004 | .707 ± .007 | .658 ± .004 | .621 ± .006 | .682 |
| | MAE ↓ | 9.28 ± .089 | 9.80 ± .100 | 10.67 ± .11 | 11.30 ± .06 | 10.26 |
| SpikF | $R^2$ ↑ | .735 ± .000 | .712 ± .001 | .667 ± .002 | .633 ± .001 | .687 |
| | MAE ↓ | 9.20 ± .022 | 9.61 ± .032 | 10.43 ± .045 | 11.14 ± .024 | 10.10 |
| SpikF-GO | $R^2$ ↑ | .765 ± .004 | .740 ± .005 | .702 ± .008 | .665 ± .014 | .718 |
| | MAE ↓ | 8.61 ± .068 | 9.18 ± .065 | 9.87 ± .098 | 10.51 ± .164 | 9.54 |
| SpikF-GO w/ CPG | $R^2$ ↑ | .767 ± .011 | .742 ± .005 | .707 ± .016 | .665 ± .028 | .720 |
| | MAE ↓ | 8.60 ± .144 | 9.03 ± .106 | 9.66 ± .203 | 10.40 ± .37 | 9.42 |

Table 11: Electricity results across various forecast horizons. Avg. averages the mean scores over horizons {6, 12, 24, 48}; standard deviations are shown only for individual horizons. Bold/underline denote best/second-best based on the unrounded mean values.