Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

arXiv cs.LG Papers

Summary

This paper proposes DSFM, a novel generative framework that uses wavelet decomposition and spectral flow matching to synthesize realistic fMRI time series for brain disorder identification, addressing data scarcity and non-stationarity challenges.

arXiv:2605.30387v1 Announce Type: new

Cached at: 06/01/26, 09:23 AM

# Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
Source: [https://arxiv.org/html/2605.30387](https://arxiv.org/html/2605.30387)
Hwa Hui Tew¹, Junn Yong Loo¹, Fang Yu Leong¹, Julia K. Lau¹, Ding Fan¹, Hernando Ombao³, Raphaël C.-W. Phan¹, Chee Pin Tan², Chee-Ming Ting¹

¹School of Information Technology, Monash University Malaysia; ²School of Engineering, Monash University Malaysia; ³Statistics Program, King Abdullah University of Science and Technology

{hwa.tew, loo.junnyong, ting.cheeming}@monash.edu

###### Abstract

Functional Magnetic Resonance Imaging (fMRI) provides non-invasive access to dynamic brain activity by measuring blood oxygen level-dependent (BOLD) signals over time. However, the resource-intensive nature of fMRI acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often struggle to replicate the inherent non-stationarity, intricate spatiotemporal dynamics, and physiological variations of raw BOLD signals. To address these challenges, we propose Dual-Spectral Flow Matching (DSFM), a novel fMRI generative framework that cascades a dual frequency representation of BOLD signals with spectral flow matching. Specifically, our framework first converts BOLD signals into a wavelet decomposition map via a discrete wavelet transform (DWT) to capture globalized transient and multi-scale variations, and then projects this map into the discrete cosine transform (DCT) space across brain regions and time to exploit the localized energy compaction of low-frequency-dominant BOLD coefficients. Subsequently, a spectral flow matching model is trained to generate a class-conditioned cosine-frequency representation. The generated samples are reconstructed through inverse DCT and inverse DWT operations to recover physiologically plausible time-domain BOLD signals. This dual-transform approach imposes structured frequency priors and preserves key physiological brain dynamics. Ultimately, we demonstrate the efficacy of our approach through improved downstream fMRI-based brain network classification. The code is available at [https://github.com/htew0001/DSFM.git](https://github.com/htew0001/DSFM.git).

## 1 Introduction

Recent advances in deep generative modeling have shown promising capability in synthesizing realistic yet diverse variations of neuroimaging modalities (Yap et al., [2024](https://arxiv.org/html/2605.30387#bib.bib16)). Among available modalities, functional MRI (fMRI) signals offer a non-invasive view of neuronal activity, critical for diagnosing neuropsychiatric and neurodevelopmental disorders (Noman et al., [2022](https://arxiv.org/html/2605.30387#bib.bib19); [2024](https://arxiv.org/html/2605.30387#bib.bib17)). However, fMRI data collection is costly and yields limited, often imbalanced datasets (Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)). These shortcomings limit the generalizability of data-driven brain analysis models, ultimately affecting the reliability of computer-aided clinical tools for neurological and psychiatric conditions (Bollmann and Barth, [2021](https://arxiv.org/html/2605.30387#bib.bib14); Ting et al., [2022](https://arxiv.org/html/2605.30387#bib.bib20)). To address these challenges, generative models have been explored for fMRI signal synthesis to support data augmentation and downstream applications (Power et al., [2014](https://arxiv.org/html/2605.30387#bib.bib15); Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)).

Most existing approaches generate brain connectivity directly in the functional connectivity (FC) space, where BOLD signal dependencies are summarized by a single correlation matrix (Biswal and Uddin, [2025](https://arxiv.org/html/2605.30387#bib.bib6)). For instance, Tan et al. ([2024b](https://arxiv.org/html/2605.30387#bib.bib7)) propose a DCGAN that preserves connectomic structure and improves the performance of downstream FC classifiers. Similarly, BrainFC-CGAN jointly trains adversarial and supervised loss components to preserve the subject identity of real FC in synthetic samples (Tan et al., [2024a](https://arxiv.org/html/2605.30387#bib.bib8)). However, such FC representations encode static pairwise relations into dyads and do not effectively capture transient network states within human brain networks (Shabestari et al., [2025](https://arxiv.org/html/2605.30387#bib.bib4)).

Recent works have revisited time-domain modeling of fMRI as an alternative to correlation-based functional connectivity (FC). Yuan and Qiao ([2024](https://arxiv.org/html/2605.30387#bib.bib45)) design Diffusion-TS, a denoising diffusion probabilistic model (DDPM) for fMRI time series generation, showing improved robustness over GAN-based and variational autoencoder (VAE)-based generative models. Hu et al. ([2024](https://arxiv.org/html/2605.30387#bib.bib5)) propose FM-TS, which accelerates sampling while still providing quality synthetic samples via a flow matching framework. While these methods shift focus from traditional FC to time-series generation, their feasibility and effectiveness for neuroimaging tasks remain largely unexplored. We argue that limiting generative modeling to FC matrices or the raw time series is inadequate to faithfully reproduce the brain's transient states, multiscale oscillations, and cross-frequency interactions, owing to the difficulty of disentangling physiologically driven fluctuations (e.g., cardiac pulsation, respiratory cycles, motion-induced artifacts) (Biswal and Uddin, [2025](https://arxiv.org/html/2605.30387#bib.bib6)). In contrast, a time-frequency/scale representation that captures both temporal and spectral BOLD information can more faithfully reproduce the rich spatiotemporal dynamics of BOLD signals. Our work is motivated by T2I-Diff and ImagenTime, both of which frame time-series generation as an image-generation task (Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57); Naiman et al., [2024](https://arxiv.org/html/2605.30387#bib.bib56)). T2I-Diff specifically adapted and validated this time-frequency image-based approach for generating BOLD signals. However, its performance gains were modest due to the fixed-resolution Short-Time Fourier Transform (STFT) representation, which neglects fine-grained transients and attenuates frequency amplitude modulations, leading to artifacts during the image-to-signal reconstruction (Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)).

To address these issues, we propose Dual-Spectral Flow Matching (DSFM), an fMRI generation framework that cascades two spectral transformations of BOLD signals and integrates spectral flow matching for generative modeling. Our framework first decomposes BOLD signals using the discrete wavelet transform (DWT) to form multiresolution time-scale scalogram images. Subsequently, we compute a discrete cosine transform (DCT) that exploits low-frequency BOLD coefficients. These transforms produce a dual-spectral view in which local and global dynamics are jointly represented. Additionally, our framework introduces spectral-domain flow matching for efficient and high-fidelity generation of the time-scale fMRI scalograms conditioned on subject classes. The generated time-frequency scalograms are then reverted to BOLD signals via image-to-time-series transforms. Our main contributions are summarized as follows:

1. Our proposed DSFM framework is the first to jointly leverage DWT and DCT, forming a unified dual-spectral image transform to capture both global and local spatiotemporal and spectral features for fMRI BOLD signal generation and brain disorder classification.
2. We develop a spectral flow matching method that models a heat dissipation process in the DCT domain to achieve efficient, coarse-to-fine generation aligned with the frequency hierarchy of the dual-spectral representation. This enables DSFM to leverage the spectral sparsity inherent in fMRI signals to effectively capture diverse brain profiles.
3. Our results show that DSFM achieves strong performance on unconditional and conditional spectral image synthesis, and improves brain disorder classification compared to recent time-series and fMRI generation baselines.

![Refer to caption](https://arxiv.org/html/2605.30387v1/x1.png)

Figure 1: The pipeline of DSFM. ROI-based BOLD time series are first extracted, followed by DWT-based multiresolution decomposition and blockwise 2D DCT for localized spectral encoding. U-ViT is used to model the velocity field in the DCT domain for ODE-based sampling. The reconstructed signals (via IDCT and IDWT) are then used for data augmentation, FC matrix construction, and classification. Finally, fidelity and downstream performance are evaluated.

## 2 Methods

### 2.1 Discrete Wavelet Transform and Its Inversion

Fig. [1](https://arxiv.org/html/2605.30387#S1.F1) provides an overview of our proposed framework. Given high-dimensional fMRI signals from $S$ subjects, denoted as $\mathcal{X}=\{x_s\}_{s=1}^{S}$, where each subject $x_s\in\mathbb{R}^{D\times T}$ consists of $D$ regions of interest (ROIs) recorded over $T$ time points, our objective is to learn the underlying real data distribution $p_{\text{data}}(\mathcal{X})$ and generate a synthetic distribution $p_{\theta}(\mathcal{X})$ that is statistically indistinguishable from the real data. Unlike conventional time-series generative tasks that operate exclusively in the time domain, our approach transforms fMRI time series into time-scale images using the DWT, defined as follows:

$$W(k,j)=\sum_{n=1}^{N}x(n)\,\psi_{j,k}[n], \tag{1}$$

where $x(n)$ is the BOLD signal at local time index $n\in\{1,2,\dots,N\}$. Here, $\psi_{j,k}[n]=2^{-k/2}\psi[2^{k}n-j]$ is the dyadic wavelet basis function derived from the mother wavelet $\psi$, where the scale $k\in\{1,2,\dots,\lfloor\log_{2}N\rfloor\}$ controls the frequency resolution and the translation index $j\in\{1,2,\dots,N/2^{k}\}$ determines the time location. To construct a wavelet decomposition map, we upsample each wavelet subband to the original time length and stack the subbands along the scale axis. This forms a full wavelet coefficient tensor $W(i,j,k)\in\mathbb{R}^{D\times T_{\psi}\times C}$, where $T_{\psi}=N/2^{C}$ and $C=\lfloor\log_{2}N\rfloor$, that captures both low-frequency trends and high-frequency transients in the fMRI BOLD signals. We further perform component-wise normalization to accentuate the difference between high and low coefficients over brain regions and time. As shown in Fig. [2](https://arxiv.org/html/2605.30387#S2.F2), this allows the time-series signals to be represented as multichannel images with preserved spectral-temporal characteristics.
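The map construction described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the hand-written Haar filter, the nearest-neighbour upsampling via `np.repeat`, the edge padding for odd lengths, and the min-max normalization per channel are all assumptions, and the helper names (`haar_dwt_levels`, `wavelet_map`, `normalize_per_channel`) are hypothetical.

```python
import numpy as np

def haar_dwt_levels(x, levels):
    """Multi-level orthonormal Haar DWT of a 1D signal.

    Returns [d_1, ..., d_levels, a_levels]: detail subbands from fine to
    coarse, followed by the final approximation subband.
    """
    a = np.asarray(x, dtype=float)
    subbands = []
    for _ in range(levels):
        if a.size % 2:                      # assumption: pad odd lengths by repeating the last sample
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        subbands.append((even - odd) / np.sqrt(2.0))   # detail (high-pass)
        a = (even + odd) / np.sqrt(2.0)                # approximation (low-pass)
    subbands.append(a)
    return subbands

def wavelet_map(x, levels=5):
    """Upsample every subband to len(x) and stack them into a (T, levels + 1) map."""
    T = len(x)
    channels = []
    for band in haar_dwt_levels(x, levels):
        reps = int(np.ceil(T / band.size))
        channels.append(np.repeat(band, reps)[:T])     # nearest-neighbour upsampling
    return np.stack(channels, axis=-1)

def normalize_per_channel(w, eps=1e-8):
    """Assumed normalization: min-max scale each subband channel of a (D, T, C) map."""
    lo = w.min(axis=(0, 1), keepdims=True)
    hi = w.max(axis=(0, 1), keepdims=True)
    return (w - lo) / (hi - lo + eps)

rng = np.random.default_rng(0)
bold = rng.standard_normal((116, 232))                 # toy (D, T) array, not real fMRI
maps = normalize_per_channel(np.stack([wavelet_map(roi) for roi in bold]))
print(maps.shape)                                      # (D, T, C) with C = levels + 1
```

With a 5-level decomposition the map has six channels (five detail subbands plus one approximation subband), one image of size $D\times T$ per channel.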

To reconstruct the original signals from the generated scalogram representation, we first denormalize the predicted wavelet components $\hat{W}^{(i)}(j,k)\in\mathbb{R}^{T\times C}$ of each $i^{\text{th}}$ ROI. The coefficients are then downsampled according to their corresponding dyadic scales, and the inverse DWT (IDWT) is computed to obtain the fMRI BOLD signals as follows:

$$\hat{x}(n)=\frac{1}{N}\sum_{k=1}^{C}\sum_{j=1}^{T}W^{(i)}(j,k)\,\psi_{j,k}[n]. \tag{2}$$

Finally, these wavelet subbands are reconstructed through a hierarchical combination of approximation and detail components across all scales to obtain the reconstructed time-domain signal $\hat{x}_{s}$ for each subject. This process ensures that the inherent spectral-temporal characteristics of the original fMRI BOLD signals are well preserved.
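The hierarchical recombination of approximation and detail components can be illustrated with a Haar round trip. The sketch below is an assumption-laden example, not the paper's code: it uses a hand-written orthonormal Haar filter, pads odd-length levels by repeating the last sample, and omits the denormalization and downsampling steps that precede the IDWT. The names `haar_forward` and `haar_inverse` are hypothetical.

```python
import numpy as np

def haar_forward(x, levels):
    """Return (approximation, [d_1..d_levels], [input length at each level])."""
    a = np.asarray(x, dtype=float)
    details, lengths = [], []
    for _ in range(levels):
        lengths.append(a.size)              # remember the length so padding can be undone
        if a.size % 2:
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        a = (even + odd) / np.sqrt(2.0)
    return a, details, lengths

def haar_inverse(approx, details, lengths):
    """Rebuild the signal coarse-to-fine from approximation and detail subbands."""
    a = np.asarray(approx, dtype=float)
    for d, n in zip(reversed(details), reversed(lengths)):
        out = np.empty(2 * d.size)
        out[0::2] = (a + d) / np.sqrt(2.0)  # even samples
        out[1::2] = (a - d) / np.sqrt(2.0)  # odd samples
        a = out[:n]                         # drop the padding sample, if any
    return a

signal = np.random.default_rng(1).standard_normal(232)   # toy series with T = 232
approx, details, lengths = haar_forward(signal, levels=5)
recon = haar_inverse(approx, details, lengths)
print(np.max(np.abs(recon - signal)))
```

Because the Haar analysis and synthesis steps are exact inverses of each other, any reconstruction error in a full pipeline would come from the generated coefficients and the resampling, not from the transform itself.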

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/img_visualization_1.png)

Figure 2: Original (Rows 1 & 3) vs. synthetic BOLD signals (Rows 2 & 4) and generated normalized scalograms. Our framework generates new synthetic BOLD signals as opposed to correlation matrices or functional connectivity, with distributional statistics that closely match the original samples.

### 2.2 Discrete Cosine Transform for BOLD Signals

To extract localized energy compactions of low-frequency spontaneous BOLD coefficients, we divide each subband map $\hat{W}^{(k)}(i,j)\in\mathbb{R}^{D\times T_{\psi}}$ of each $k^{\text{th}}$ wavelet scale into non-overlapping 2D blocks of size $B\times B$, resulting in a set of blocks (patches):

$$W^{(k)}\equiv\left\{W^{(k)}_{p}(x,y)\in\mathbb{R}^{B\times B}\right\}_{p=1}^{P}, \tag{3}$$

where $P$ is the number of blocks per subband image and $B$ is the block size. Each block is then transformed via the 2D type-II DCT, as follows:

$$D^{(k)}(u,v)=\alpha(u)\,\alpha(v)\sum_{x=1}^{B}\sum_{y=1}^{B}W^{(k)}(x,y)\cos\!\left[\frac{(2x+1)u\pi}{2B}\right]\cos\!\left[\frac{(2y+1)v\pi}{2B}\right], \tag{4}$$

where $\alpha(u)=\sqrt{\frac{1}{B}}$ if $u=0$, and $\alpha(u)=\sqrt{\frac{2}{B}}$ otherwise. The resulting $D^{(k)}(u,v)\in\mathbb{R}^{B\times B}$ at each scale $k$ represents the DCT coefficients within each block.

To recover the full image representation, we apply the inverse 2D DCT (IDCT) to each block $D^{(k)}(u,v)$. The signal block is reconstructed via the following inverse transform:

$$\hat{W}^{(k)}(x,y)=\sum_{u=1}^{B}\sum_{v=1}^{B}\alpha(u)\,\alpha(v)\,D^{(k)}(u,v)\cos\!\left[\frac{(2x+1)u\pi}{2B}\right]\cos\!\left[\frac{(2y+1)v\pi}{2B}\right]. \tag{5}$$

Once all blocks have been transformed back into the original spatial domain, we stitch the patches to recover the full subband map. Since the DCT is applied to non-overlapping blocks, the reconstruction involves simply tiling the inverse-transformed blocks back into their original positions within the subband image. This blockwise DCT preserves localized low-frequency structure in the ROI–time space, while high-frequency components can optionally be truncated to suppress noise. The resulting set of filtered subbands can then be passed back into the IDWT to recover the time-domain fMRI BOLD signal $\hat{x}_{s}(n)$, ensuring global and local spectral characteristics are retained.
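The blockwise transform and its tiling-based inverse can be sketched with an explicit orthonormal DCT-II matrix. This example is illustrative only: it uses zero-based indices $x,u\in\{0,\dots,B-1\}$ (the usual DCT-II convention), assumes the image dimensions are divisible by $B$, and picks $B=4$ purely so that a $116\times232$ map tiles evenly. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(B):
    """Orthonormal DCT-II matrix C (rows index frequency u), so that C @ C.T = I."""
    x = np.arange(B)
    C = np.cos((2 * x[None, :] + 1) * x[:, None] * np.pi / (2 * B))
    C *= np.sqrt(2.0 / B)
    C[0] *= np.sqrt(0.5)                    # alpha(0) = sqrt(1/B)
    return C

def blockwise_dct(img, B):
    """Split an (H, W) image into non-overlapping B x B blocks and apply the 2D DCT to each."""
    H, W = img.shape
    assert H % B == 0 and W % B == 0, "sketch assumes dimensions divisible by B"
    C = dct_matrix(B)
    blocks = img.reshape(H // B, B, W // B, B).transpose(0, 2, 1, 3)   # (nH, nW, B, B)
    return C @ blocks @ C.T

def blockwise_idct(coeffs, B):
    """Invert each block and tile the blocks back into their original positions."""
    C = dct_matrix(B)
    blocks = C.T @ coeffs @ C
    nH, nW = blocks.shape[:2]
    return blocks.transpose(0, 2, 1, 3).reshape(nH * B, nW * B)

subband = np.random.default_rng(0).standard_normal((116, 232))   # toy subband map
coeffs = blockwise_dct(subband, B=4)
restored = blockwise_idct(coeffs, B=4)
print(coeffs.shape, np.max(np.abs(restored - subband)))
```

Optional noise suppression would zero selected high-frequency entries of `coeffs` before calling `blockwise_idct`; which entries to drop is a design choice the text leaves open.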

### 2.3 Spectral Flow Matching in DCT Domain

Recent studies have empirically demonstrated that pixel-based diffusion models exhibit approximate autoregressive behavior in the frequency domain (Dieleman, [2024](https://arxiv.org/html/2605.30387#bib.bib22); Falck et al., [2025](https://arxiv.org/html/2605.30387#bib.bib24)). Specifically, diffusion models (Ho et al., [2020](https://arxiv.org/html/2605.30387#bib.bib26); Song et al., [2021](https://arxiv.org/html/2605.30387#bib.bib25)) tend to eliminate high-frequency components early in the forward process, followed by progressively lower-frequency components as the diffusion timestep advances. While prior studies focus on the Fourier basis, this property also holds in the DCT domain (Skorokhodov et al., [2025](https://arxiv.org/html/2605.30387#bib.bib29); Ning et al., [2025](https://arxiv.org/html/2605.30387#bib.bib28)), which offers practical advantages: real-valued orthogonality, energy compaction in low-frequency bands, and compatibility with block-wise architectures.

Modeling diffusion directly in the frequency domain enables the exploitation of spectral sparsity for designing frequency-aware noise schedules. However, existing frequency-domain generative models (Hoogeboom and Salimans, [2023](https://arxiv.org/html/2605.30387#bib.bib30); Rissanen et al., [2023](https://arxiv.org/html/2605.30387#bib.bib31)) remain constrained to the diffusion framework, which relies on stochastic differential equation (SDE) sampling and typically requires hundreds to thousands of steps for high-quality synthesis. In contrast, flow-matching approaches based on ordinary differential equations (ODEs) provide a deterministic alternative with significantly lower sampling complexity. In this work, we introduce a spectral flow-matching framework that extends frequency-based generative modeling beyond the diffusion paradigm.

First, consider a forward-time heat dissipation process (Rissanen et al., [2023](https://arxiv.org/html/2605.30387#bib.bib31)) as an alternative to the conventional isotropic diffusion, described by the following stochastic partial differential equation (SPDE):

$$dx_{t}(c)=\eta(t)\,\Delta_{c}\,x_{t}(c)\,dt+G(t)\,dW(t), \tag{6}$$

where $x_{t}:\mathbb{R}^{2}\times\mathbb{R}_{+}\rightarrow\mathbb{R}$ is an idealized continuous-space representation of a single image channel at time $t\in[0,1]$, and $\Delta_{c}=\nabla_{c}\cdot\nabla_{c}$ is the Laplace operator with respect to the spatial image coordinates $c$; $\eta(t)$ and $G(t)$ are time-dependent scalar drift and matrix-valued diffusion coefficients, respectively. The corresponding reverse-time probability flow ODE (Song et al., [2021](https://arxiv.org/html/2605.30387#bib.bib25)) is given by:

$$\frac{dx_{t}}{dt}=\eta(t)\,\Delta_{c}\,x_{t}(c)-\frac{1}{2}\,G(t)G(t)^{T}\,\nabla_{x_{t}}\log p(x_{t}). \tag{7}$$

Subsequently, define the forward and inverse DCT transforms formally as

$$z_{t}=V^{T}x_{t}=\mathrm{DCT}(x_{t}),\quad x_{t}=Vz_{t}=\mathrm{IDCT}(z_{t}), \tag{8}$$

where $V$ denotes the matrix of orthonormal DCT basis eigenvectors. It then follows that the Laplacian operator in equation [6](https://arxiv.org/html/2605.30387#S2.E6) can be diagonalized via the eigendecomposition $\Delta_{c}=V\,\Lambda\,V^{T}$, where $\Lambda$ denotes the diagonal matrix of DCT mode-specific Laplacian eigenvalues. Applying the DCT to the forward-time SPDE in equation [6](https://arxiv.org/html/2605.30387#S2.E6) yields

$$dz_{t}=-\eta(t)\,\Lambda\,z_{t}\,dt+G(t)\,dW(t), \tag{9}$$

where $W(t)$ is a standard Wiener process, but in the DCT domain. To obtain a frequency-ordered representation, we apply zig-zag flattening, which maps the two-dimensional DCT coefficient grid into a one-dimensional sequence sorted from low (upper-left) to high (bottom-right) frequencies. Subsequently, we apply per-DCT-mode signal-to-noise scaling to ensure that the DCT coefficients conform to the forward perturbation process of the proposed DCT flow governed by the SPDEs in equation [6](https://arxiv.org/html/2605.30387#S2.E6) and equation [9](https://arxiv.org/html/2605.30387#S2.E9). Moreover, given that $V$ is orthonormal, the following change of variables holds for any differentiable function $f$:

$$V^{T}\nabla_{x}f(x)=V^{T}\left(\frac{\partial z}{\partial x}\right)^{T}\nabla_{z}f(z)=\underbrace{V^{T}V}_{I}\,\nabla_{z}f(\underbrace{V^{T}x}_{z})=\nabla_{z}f(z).$$

By letting $f(x_{t})=\log p(x_{t})$, the score transforms as $V^{T}\nabla_{x_{t}}\log p(x_{t})=\nabla_{z_{t}}\log p(z_{t})$. Since $\Lambda$ is diagonal and the DCT basis orthogonalizes the frequency modes, applying the DCT to equation [7](https://arxiv.org/html/2605.30387#S2.E7) shows that the reverse-time probability flow ODE admits the following mode-wise decomposition:

$$\frac{dz_{t}[k]}{dt}=-\eta(t)\,\lambda_{k}\,z_{t}[k]-\frac{1}{2}\,g(t,k)^{2}\,\nabla_{z_{t}[k]}\log p(z_{t}), \tag{10}$$

where $\lambda_{k}$ is the $k$-th diagonal entry of $\Lambda$, corresponding to the Laplacian eigenvalue of the $k^{\text{th}}$ DCT basis component, which evolves independently under the ODE dynamics.
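As a rough illustration of the zig-zag flattening and the mode-wise eigenvalues $\lambda_{k}$, the sketch below orders a $B\times B$ coefficient grid in JPEG-style zig-zag order and pairs each mode with a Laplacian eigenvalue. The eigenvalue formula $\lambda_{u,v}=\pi^{2}(u^{2}+v^{2})/B^{2}$ is an assumption (the standard spectrum of the Laplacian under Neumann boundary conditions on a grid of side $B$); the source does not state the form used, and the helper names are hypothetical.

```python
import numpy as np

def zigzag_indices(B):
    """(u, v) index pairs of a B x B grid in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * B - 1):              # s = u + v labels one anti-diagonal
        diag = [(u, s - u) for u in range(B) if 0 <= s - u < B]
        order.extend(diag if s % 2 else diag[::-1])   # alternate traversal direction
    return order

def zigzag_flatten(block):
    """Flatten a B x B coefficient block from low to high frequency."""
    return np.array([block[u, v] for u, v in zigzag_indices(block.shape[0])])

def laplacian_eigenvalues(B):
    """Assumed mode-wise eigenvalues: pi^2 * (u^2 + v^2) / B^2."""
    u = np.arange(B)
    return (np.pi / B) ** 2 * (u[:, None] ** 2 + u[None, :] ** 2)

B = 4
lam = zigzag_flatten(laplacian_eigenvalues(B))   # lambda_k in zig-zag order
tau = 0.5                                        # arbitrary dissipation time for illustration
print(np.round(np.exp(-lam * tau), 3))           # per-mode attenuation exp(-lambda_k * tau)
```

The printed attenuation factors show the intended behavior: the DC mode is untouched while higher-frequency modes decay faster, which is what makes generation proceed coarse to fine.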

The following proposition bridges this DCT mode-wise probability flow ODE and the conditional velocity (vector) field in flow matching (Lipman et al., [2023](https://arxiv.org/html/2605.30387#bib.bib27)).

###### Proposition 1\.

A mode\-wise conditional perturbation kernel \(isotropic in each DCT mode\) is

$$p(z_{t}[k]\mid z_{0}[k])=\mathcal{N}\big(\mu(t,k)\,z_{0}[k],\,\sigma(t,k)^{2}\big), \tag{11}$$

with mean and standard deviation (std) schedules

$$\mu(t,k)=\alpha(t)\,\omega(t,k),\quad\omega(t,k)=e^{-\lambda_{k}\tau(t)},\quad\sigma(t,k)^{2}=1-\mu(t,k)^{2}, \tag{12}$$

where $\alpha(t)$ is the mean schedule of a variance-preserving (VP) diffusion process and $\tau(t)=\int_{0}^{t}\eta(s)\,ds$. This kernel satisfies the heat dissipation process ([6](https://arxiv.org/html/2605.30387#S2.E6)) and ([9](https://arxiv.org/html/2605.30387#S2.E9)). The mode-wise diffusion coefficients are then given by

$$g(t,k)^{2}=2\,\sigma(t,k)\,\big(\dot{\sigma}(t,k)-f(t,k)\,\sigma(t,k)\big), \tag{13}$$

where $f(t,k)=\frac{\dot{\alpha}(t)}{\alpha(t)}-\eta(t)\,\lambda_{k}$, and $\dot{\mu}(t,k)$, $\dot{\sigma}(t,k)$ denote time derivatives of the mean and std schedules in ([12](https://arxiv.org/html/2605.30387#S2.E12)).

###### Proof\.

Refer to the Supplementary Material\. ∎
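To make the schedules in Proposition 1 concrete, the following sketch evaluates $\mu$, $\sigma$, $f$, and $g^{2}$ numerically. It assumes $\alpha(t)=\cos(\pi t/2)$ as the VP cosine mean schedule (the source names the schedule but not its formula) and uses $\tau(t)=\sigma_{\max}\sin^{2}(\pi t/2)$ with $\sigma_{\max}=20$ as reported for the experiments; the function name `schedules` and the eigenvalues passed in are hypothetical.

```python
import numpy as np

SIGMA_MAX = 20.0   # value reported for the experiments

def schedules(t, lam):
    """Evaluate the mode-wise schedules of Eqs. (12)-(13) at time t for eigenvalues lam."""
    alpha = np.cos(0.5 * np.pi * t)                       # assumed VP cosine mean schedule
    alpha_dot = -0.5 * np.pi * np.sin(0.5 * np.pi * t)
    tau = SIGMA_MAX * np.sin(0.5 * np.pi * t) ** 2        # accumulated dissipation time
    eta = SIGMA_MAX * 0.5 * np.pi * np.sin(np.pi * t)     # eta(t) = d tau / dt
    omega = np.exp(-lam * tau)
    mu = alpha * omega
    f = alpha_dot / alpha - eta * lam                     # f(t, k) as defined under Eq. (13)
    mu_dot = f * mu                                       # since mu = alpha * exp(-lam * tau)
    sigma = np.sqrt(1.0 - mu ** 2)
    sigma_dot = -mu * mu_dot / sigma                      # from sigma^2 = 1 - mu^2
    g2 = 2.0 * sigma * (sigma_dot - f * sigma)            # Eq. (13)
    return {"mu": mu, "mu_dot": mu_dot, "sigma": sigma,
            "sigma_dot": sigma_dot, "f": f, "g2": g2}

lam = np.array([0.0, 0.1, 1.0])                           # illustrative eigenvalues
out = schedules(0.3, lam)
print(np.round(out["mu"], 4), np.round(out["g2"], 4))
```

Under these definitions $\dot{\mu}=f\mu$ and $\sigma\dot{\sigma}=-\mu\dot{\mu}$, so Eq. (13) simplifies algebraically to $g(t,k)^{2}=-2f(t,k)$; for $\lambda_{k}=0$ this is the usual VP coefficient $-2\dot{\alpha}/\alpha$. The schedules are singular at $t=1$, where $\alpha(t)=0$, so the sketch should only be evaluated for $t\in(0,1)$.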

Table 1: Comparison of unconditional NetSim dataset generation across SOTA and proposed models.

###### Proposition 2\.

A mode\-wise conditional velocity field

$$\left.\frac{dz_{t}[k]}{dt}\right|_{z_{0}[k]}=v(z_{t}\mid z_{0};t,k)=\dot{\mu}(t,k)\,z_{0}[k]+\dot{\sigma}(t,k)\,\epsilon, \tag{14}$$

where $\epsilon\sim\mathcal{N}(0,1)$, is equivalent to the conditional probability flow ODE

$$\left.\frac{dz_{t}[k]}{dt}\right|_{z_{0}[k]}=-\eta(t)\,\lambda_{k}\,z_{t}[k]+\frac{1}{2}\,g(t)^{2}\,\nabla_{z_{t}[k]}\log p(z_{t}\mid z_{0}). \tag{15}$$

Furthermore, it follows that the marginal velocity field

$$\frac{dz_{t}[k]}{dt}=v(z_{t};t,k)=\mathbb{E}_{p_{\text{data}}(z_{0}\mid z_{t})}\big[v(z_{t}\mid z_{0};t,k)\,\big|\,z_{t}\big], \tag{16}$$

given by the law of the unconscious statistician (Lipman et al., [2024](https://arxiv.org/html/2605.30387#bib.bib32)), satisfies the marginal mode-wise probability flow ODE ([10](https://arxiv.org/html/2605.30387#S2.E10)).

###### Proof\.

Refer to the Supplementary Material\. ∎

Given this correspondence between the probability flow ODE from diffusion models and flow matching, we parameterize the velocity field $v_{\theta}$ using a deep neural network (U-ViT (Bao et al., [2023](https://arxiv.org/html/2605.30387#bib.bib33))) and train it via the following conditional spectral flow matching (CSFM) loss:

$$\mathcal{L}^{\text{CSFM}}(\theta)=\mathbb{E}_{t,\,p(z_{t}\mid z_{0})\,p_{\text{data}}(z_{0})}\big\|v_{\theta}(z_{t};t,k)-v(z_{t}\mid z_{0};t,k)\big\|^{2}, \tag{17}$$

where $v(z_{t}\mid z_{0};t,k)$ is the conditional velocity field in ([14](https://arxiv.org/html/2605.30387#S2.E14)), with $z_{t}$ sampled from the per-mode conditional perturbation kernel ([11](https://arxiv.org/html/2605.30387#S2.E11)), and $t\sim\mathcal{U}(0,1)$ is uniformly sampled. Notably, this CSFM loss recovers the standard flow matching loss under the OT-CFM schedules $\mu(t)=1-t$ and $\sigma(t)=t$, where the time convention adopted here is the reverse of that in (Lipman et al., [2023](https://arxiv.org/html/2605.30387#bib.bib27)). Hence, our framework generalizes flow matching to a heat dissipation process in the DCT domain. In our experiments, we use $\alpha(t)$ from the variance-preserving (VP) cosine schedule and set $\tau(t)=\sigma_{\max}\sin^{2}\left(\tfrac{\pi}{2}t\right)$ following (Hoogeboom and Salimans, [2023](https://arxiv.org/html/2605.30387#bib.bib30)), which observes optimal performance with $\sigma_{\max}=20$.
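A single Monte Carlo evaluation of the CSFM objective can be sketched as follows. This is a NumPy illustration under stated assumptions, not the training code: `v_theta` is a hypothetical stand-in for the U-ViT velocity model, $\alpha(t)=\cos(\pi t/2)$ is assumed for the VP cosine schedule, $t$ is sampled slightly inside $(0,1)$ to avoid the endpoints, and the eigenvalues `lam` are arbitrary.

```python
import numpy as np

def ot_schedule(t, lam):
    """OT-CFM special case: mu = 1 - t and sigma = t for every mode."""
    one = np.ones_like(lam)
    return (1.0 - t) * one, -one, t * one, one            # mu, mu_dot, sigma, sigma_dot

def heat_schedule(t, lam, sigma_max=20.0):
    """Heat-dissipation schedules of Eq. (12) with an assumed cosine alpha(t)."""
    alpha = np.cos(0.5 * np.pi * t)
    alpha_dot = -0.5 * np.pi * np.sin(0.5 * np.pi * t)
    tau = sigma_max * np.sin(0.5 * np.pi * t) ** 2
    eta = sigma_max * 0.5 * np.pi * np.sin(np.pi * t)
    omega = np.exp(-lam * tau)
    mu = alpha * omega
    mu_dot = (alpha_dot - alpha * eta * lam) * omega
    sigma = np.sqrt(np.maximum(1.0 - mu ** 2, 1e-12))     # clip for numerical safety
    sigma_dot = -mu * mu_dot / sigma
    return mu, mu_dot, sigma, sigma_dot

def csfm_loss(v_theta, z0, lam, schedule, rng):
    """One-batch estimate of Eq. (17); z0 has shape (batch, modes)."""
    t = rng.uniform(1e-3, 1.0 - 1e-3, size=(z0.shape[0], 1))
    eps = rng.standard_normal(z0.shape)
    mu, mu_dot, sigma, sigma_dot = schedule(t, lam[None, :])
    z_t = mu * z0 + sigma * eps                           # sample from the kernel in Eq. (11)
    target = mu_dot * z0 + sigma_dot * eps                # conditional velocity, Eq. (14)
    loss = np.mean((v_theta(z_t, t) - target) ** 2)
    return loss, z_t, target, eps, t

rng = np.random.default_rng(0)
z0 = rng.standard_normal((8, 16))                         # toy zig-zag-ordered DCT coefficients
lam = np.linspace(0.0, 2.0, 16)                           # illustrative eigenvalues
zero_model = lambda z, t: np.zeros_like(z)                # hypothetical placeholder network
print(csfm_loss(zero_model, z0, lam, heat_schedule, rng)[0])
```

Passing `ot_schedule` instead of `heat_schedule` reduces the regression target to $\epsilon-z_{0}$, which is the standard flow matching target under the reversed time convention noted above.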

To enable class-conditioned generation, we employ classifier-free guidance (Ho and Salimans, [2021](https://arxiv.org/html/2605.30387#bib.bib50)) by conditioning the velocity model on the class label $c$, i.e., $v_{\theta}(z_{t};t,k,c)$, and set $c=\varnothing$ for the unconditional model. The conditional and unconditional models are trained jointly by randomly replacing the class label $c$ with the null token $\varnothing$ with probability $p_{\varnothing}$. During sampling, the classifier-free guided velocity is obtained as a weighted combination of the model outputs (Zheng et al., [2023](https://arxiv.org/html/2605.30387#bib.bib34)). Finally, DCT samples are generated by numerically integrating the learned flow velocity using an adaptive ODE solver.
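The guided sampling step can be sketched with toy velocity fields. Several choices here are assumptions rather than details from the source: the guided velocity is written as $(1-w)\,v_{\varnothing}+w\,v_{c}$ with guidance weight $w$, a fixed-step Euler integrator replaces the adaptive ODE solver, and the velocities are closed-form stand-ins for the trained model (the exact marginal velocity for a single data point under the OT schedule). All function names are hypothetical.

```python
import numpy as np

def toy_velocity(z0_target):
    """v(z, t) = (z - z0_target) / t: exact velocity for one data point with mu = 1 - t, sigma = t."""
    return lambda z, t: (z - z0_target) / t

def guided_velocity(v_cond, v_uncond, w):
    """Assumed classifier-free guidance form: (1 - w) * unconditional + w * conditional."""
    return lambda z, t: (1.0 - w) * v_uncond(z, t) + w * v_cond(z, t)

def euler_sample(v, z1, n_steps):
    """Integrate dz/dt = v(z, t) from t = 1 (noise) down to t = 0 (data)."""
    z = np.array(z1, dtype=float)
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * h                     # current time, never exactly zero
        z = z - h * v(z, t)                 # step backwards in time
    return z

z_class = np.array([1.0, -2.0])             # toy conditional target
z_null = np.array([0.0, 0.5])               # toy unconditional target
noise = np.random.default_rng(0).standard_normal(2)
v = guided_velocity(toy_velocity(z_class), toy_velocity(z_null), w=1.5)
print(euler_sample(v, noise, n_steps=20))   # n_steps plays the role of the NFE budget
```

Integration runs from $t=1$ to $t=0$ because, in the time convention used here, $t=0$ corresponds to data and $t=1$ to noise.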

Table 2: The best generation quality and classification performance on the MDD dataset for different generative models using the AAL atlas parcellation, trained on ground-truth data augmented at three levels. "–" denotes FC-based generation. For full results, refer to the Supplementary Material.

Table 3: The best generation quality and classification performance on the ABIDE dataset for different generative models using the Schaefer parcellation, trained on ground-truth data augmented at three levels. "–" denotes FC-based generation. For full results, refer to the Supplementary Material.

## 3 Experiment

### 3.1 Settings

Data Acquisition and Pre-processing. In our experiments, we evaluated the proposed method on one simulated and two real-world brain disorder datasets. (1) Major Depressive Disorder (MDD): We preprocessed the resting-state fMRI (rs-fMRI) dataset from the REST-meta-MDD Consortium database (Yan et al., [2019](https://arxiv.org/html/2605.30387#bib.bib36)) using the Data Processing Assistant for Resting-State fMRI (DPARSF) (Yan and Zang, [2010](https://arxiv.org/html/2605.30387#bib.bib37)). This dataset comprises 250 healthy control (HC) subjects and 227 individuals diagnosed with MDD. All scans were acquired using a Siemens Tim Trio 3T scanner with TR/TE = 2000/30 ms and a slice thickness of 3 mm. The brain was parcellated into 116 ROIs, covering cortical and subcortical areas, using the Automated Anatomical Labeling (AAL) atlas, and the mean BOLD signal for each ROI was extracted across 232 time points. (2) Autism Brain Imaging Data Exchange (ABIDE): We preprocessed rs-fMRI scans from multiple international sites. This dataset includes 488 Autism Spectrum Disorder (ASD) patients and 537 normal controls (NC) from the ABIDE database. The brain was parcellated into 100 ROIs using the Schaefer atlas, and the mean BOLD signal for each ROI was extracted over 200 time points (Di Martino et al., [2014](https://arxiv.org/html/2605.30387#bib.bib60)). (3) NetSim: We used the NetSim dataset, a simulated benchmark for evaluating causal discovery in neuroimaging that provides biologically realistic simulations of BOLD time series. We selected Simulation 4, with 50 channels, from the original dataset (Smith et al., [2011](https://arxiv.org/html/2605.30387#bib.bib61)).

### 3\.2 Implementation Details

DSFM Training\. The proposed DSFM framework generates fMRI signals corresponding to each subject’s condition, and the classifiers then discriminate between control and clinical groups for each dataset\. We train DSFM using the AdamW optimizer with a learning rate of $2\times10^{-4}$ for 300k iterations\. All experiments employ a Haar wavelet basis with a 5\-level decomposition, yielding real\-valued images of size $116\times232$ \(MDD\) and $100\times200$ \(ABIDE\)\. We compare numbers of function evaluations \(NFE\) of 20, 50, and 100 steps\.

Connectivity Network Construction\. The subject\-specific functional connectivity \(FC\) is estimated with the Ledoit\-Wolf \(LDW\) regularized shrinkage covariance estimator and then thresholded to preserve the strongest $\tau=40\%$ of connections, with all other connections set to zero\. This results in sparse $116\times116$ \(MDD\) and $100\times100$ \(ABIDE\) FC matrices\.

Data Augmentation and Classifier Training\. The trained DSFM is used to augment the real fMRI signals by factors of $1\times$, $2\times$, and $3\times$\. For our classifier, the L2 regularization weight decay ranges from $10^{-8}$ to $10^{-2}$, the learning\-rate scheduler reduction factor from 0\.1 to 0\.9, and the batch size from 5 to 16, the same as in \(Tew et al\., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)\)\. All hyperparameters are selected based on 5\-fold stratified cross\-validation\.
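The connectivity construction step can be illustrated with a minimal sketch. It assumes that "strongest" means largest absolute weight, that only off-diagonal edges are ranked, and that the diagonal is zeroed; none of these details are stated in the text. The function name `sparsify_fc` is hypothetical, and `np.corrcoef` on random data is only a stand-in for a Ledoit-Wolf estimate (which could, for instance, be obtained with scikit-learn's `LedoitWolf` estimator).

```python
import numpy as np


def sparsify_fc(fc: np.ndarray, keep_fraction: float = 0.4) -> np.ndarray:
    """Keep the strongest off-diagonal edges by absolute weight; zero the rest."""
    n_rois = fc.shape[0]
    rows, cols = np.triu_indices(n_rois, k=1)      # each undirected edge once
    strength = np.abs(fc[rows, cols])
    n_keep = int(round(keep_fraction * strength.size))
    sparse = np.zeros_like(fc)
    if n_keep > 0:
        top = np.argsort(strength)[-n_keep:]       # indices of the strongest edges
        sparse[rows[top], cols[top]] = fc[rows[top], cols[top]]
        sparse[cols[top], rows[top]] = fc[rows[top], cols[top]]  # keep symmetry
    return sparse


rng = np.random.default_rng(0)
bold = rng.standard_normal((232, 116))   # synthetic stand-in: time points x ROIs
fc = np.corrcoef(bold, rowvar=False)     # stand-in for the LDW-regularized estimate
fc_sparse = sparsify_fc(fc, keep_fraction=0.4)
print(fc_sparse.shape, np.count_nonzero(np.triu(fc_sparse, k=1)))
```

With 116 ROIs there are 116 × 115 / 2 = 6670 unique edges, so a 40% threshold retains 2668 of them.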

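Table 4: Ablation analysis of frequency\-specific FC classification by incorporating individual and grouped wavelet subbands on the MDD dataset\. Drop \(%\) is relative to the full\-band setting\.

| Setting | LH1 | LH2 | LH3 | LH4 | LH5 | LL | Accuracy ↑ | Drop \(%\) | Precision ↑ | Drop \(%\) | F1\-Score ↑ | Drop \(%\) | ROC ↑ | Drop \(%\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full\-band | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 70\.84 | – | 70\.99 | – | 70\.77 | – | 71\.49 | – |
| Low\-pass | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | 66\.89 | \-5\.58 | 66\.96 | \-5\.68 | 66\.77 | \-5\.65 | 65\.79 | \-7\.97 |
| Mid\-pass | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | 63\.30 | \-10\.64 | 63\.74 | \-10\.21 | 63\.05 | \-10\.91 | 60\.41 | \-15\.50 |
| High\-pass | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | 65\.40 | \-7\.68 | 65\.55 | \-7\.66 | 65\.18 | \-7\.90 | 63\.66 | \-11\.0 |
| Band\-pass 1 | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | 66\.45 | \-6\.20 | 66\.53 | \-6\.28 | 66\.39 | \-6\.19 | 68\.38 | \-4\.35 |
| Band\-pass 2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | 66\.66 | \-5\.90 | 66\.88 | \-5\.79 | 66\.60 | \-5\.89 | 66\.74 | \-6\.64 |
| Band\-pass 3 | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | 66\.88 | \-5\.59 | 66\.76 | \-5\.96 | 67\.06 | \-5\.24 | 67\.77 | \-5\.20 |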

### 3\.3 Overall Performance

We first trained our proposed DSFM unconditionally to produce outputs comparable to those reported in Table [1](https://arxiv.org/html/2605.30387#S2.T1)\. We then followed the standard setting for evaluating the quality and classification performance of conditional time\-series generation, as described in Section [D](https://arxiv.org/html/2605.30387#A4)\. Our primary goal is to achieve a better cFID score, which reflects how well the model captures complex spatiotemporal patterns and the conditional distribution over time\.

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/gen_metrics_2.png)

Figure 3: We plot the 2D t\-SNE embedding of HC and MDD synthetic data generated by our method \(top left\)\. We then compare the real and synthetic distributions using the Jensen\-Shannon divergence and probability density functions \(top right and bottom\)\.

Classification Score\. To validate the fidelity of the generated samples, we evaluate the classification performance of BrainNetCNN \(Kawahara et al\., [2017](https://arxiv.org/html/2605.30387#bib.bib46)\), comparing DSFM with GAN\- and diffusion\-based baselines on our fMRI datasets\. We use NFE = 100 in all subsequent downstream analyses, as supported by the quality metrics for distinguishing HC and MDD subjects in Table [7](https://arxiv.org/html/2605.30387#A5.T7)\. Tables [2](https://arxiv.org/html/2605.30387#S2.T2) and [3](https://arxiv.org/html/2605.30387#S2.T3) report the classification results on the 5\-fold cross\-validation test sets\. Notably, DSFM achieves the highest accuracy under the $1\times$ data augmentation setting for both MDD and ASD\. Moreover, our model exhibits lower variance across increased augmentation levels, indicating strong generalization and robustness\. These results indicate that DSFM not only enriches sample diversity but also preserves the discriminative neurophysiological functional patterns critical for clinical tasks\. Figure [3](https://arxiv.org/html/2605.30387#S3.F3) further shows that DSFM generates class\-conditioned synthetic data whose statistical distribution closely matches that of the original samples\.

### 3\.4 Ablation Studies

We first conducted an ablation study on six wavelet subbands, i\.e\., five detail bands \(LH1: 0\.125–0\.250 Hz, LH2: 0\.0625–0\.125 Hz, LH3: 0\.03125–0\.0625 Hz, LH4: 0\.015625–0\.03125 Hz, LH5: 0\.0078125–0\.015625 Hz\) and a coarse approximation \(LL: 0–0\.0078125 Hz\), contrasting each setting with the full 0–0\.25 Hz spectrum\. Table [4](https://arxiv.org/html/2605.30387#S3.T4) reports the impact of the different wavelet subbands on classification performance\. The steepest decline occurred when the mid\-frequency LH3–LH4 pair was removed, highlighting the pivotal role of oscillations at approximately 0\.016–0\.06 Hz in capturing disease\-specific interactions; without them, the classifier lacks sufficient contextual information\. Suppressing either the highest \(LH1–LH2\) or the very lowest \(LL and LH5\) components produced a smaller but still substantial degradation \(roughly 5–8% in accuracy\), indicating that both rapid fluctuations and slow drifts provide complementary cues\. Similarly, removing individual bands such as LH1 or LL reduced performance by roughly 4–7%, indicating that long\-range, slow drifts carry global synchrony patterns essential for classification\. Interestingly, although BOLD fluctuations predominantly lie in the low\-frequency band, removing any subband impaired performance, indicating that disease\-related features are distributed across the entire frequency spectrum\.
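The band edges quoted above follow from dyadic halving of the Nyquist frequency. The sketch below reproduces them, assuming the repetition time TR = 2 s reported for the MDD scans in Section 3.1; the helper name `dwt_band_edges` is hypothetical. The edges are nominal, since real wavelet filters (Haar in particular) do not have sharp cut-offs and neighbouring bands overlap.

```python
def dwt_band_edges(tr_seconds: float, levels: int) -> dict:
    """Nominal (low, high) edges in Hz of each sub-band of a dyadic DWT."""
    nyquist = 0.5 / tr_seconds  # highest frequency resolvable at this sampling rate
    bands = {}
    for j in range(1, levels + 1):
        # The level-j detail band nominally spans [nyquist / 2^j, nyquist / 2^(j-1)].
        bands[f"LH{j}"] = (nyquist / 2**j, nyquist / 2 ** (j - 1))
    # The final approximation keeps everything below the last detail band.
    bands["LL"] = (0.0, nyquist / 2**levels)
    return bands


bands = dwt_band_edges(tr_seconds=2.0, levels=5)
for name, (low, high) in bands.items():
    print(f"{name}: {low:.7f} - {high:.7f} Hz")
```

For TR = 2 s and five levels, the lowest detail band starts at 0.25 / 32 = 0.0078125 Hz, which is also the upper edge of the LL approximation.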

Table 5: Ablation studies of block sizes, wavelet bases, normalization strategies, different spectral representations, and generative models\. Full results are provided in the Supplementary Material\.

Table [5](https://arxiv.org/html/2605.30387#S3.T5) presents ablation analyses of how normalization strategies, block sizes, and wavelet bases influence the generative quality of our dual\-spectral representation\. In particular, 1\) and 2\) compare MinMax normalization \(MM\) with Entropy\-Consistent Scaling \(ECS\)\. Notably, MinMax scales each wavelet coefficient independently, broadening the distribution of high\-frequency coefficients, which results in slower training and reduced performance\. In contrast, ECS preserves the global spectral coefficient distribution by normalizing the DCT frequency components with a percentile\-trimmed bound derived from the lowest frequency component, yielding better cFID and correlation scores\. Experiments with smaller and larger block sizes $B$ in 3\) and 4\) achieve comparable generation performance, with a trade\-off: a smaller $B$ leads to slower training, whereas a larger $B$ loses fine\-grained local dependencies\. Finally, the ablations in 4\) and 5\) using different mother wavelets produce similar generation results on the MDD dataset, suggesting that the underlying fMRI signals are not strongly sensitive to the choice of wavelet basis\. Additional ablations in Section [E\.2](https://arxiv.org/html/2605.30387#A5.SS2) compare different spectral representations \(Fourier and wavelet transforms\) and types of generative models \(flow matching and diffusion models\) against our proposed DSFM\.

Table 6: Similarity between synthetic and real FC networks across FC edges, node strength, and edge betweenness centrality\. Higher values indicate better preservation of real FC topology\.

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/neuro_p.png)

Figure 4: Visualization of the average resting\-state hemodynamic response function \(rsHRF\) and power spectral density \(PSD\) of real and synthetic BOLD signals in the Medial Prefrontal Cortex \(mPFC\) and Posterior Cingulate Cortex \(PCC\) regions of the Default Mode Network \(DMN\)\. The highlighted L2 norm quantifies how closely the synthetic results resemble the real physiological profiles\.

## 4 Neurophysiological Plausibility Analysis

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/testing.png)

Figure 5: a\) Group\-averaged connectivity patterns of real and synthetic HC/MDD data and their differences\. b\) and c\) Subject\-level connectivity patterns of real and synthetic data from HC and MDD, respectively\. d\) 3D cortical surface and brain network visualizations showing node strength \(top\) and network organization \(bottom\) for both real and synthetic HC/MDD data\.

Figure [4](https://arxiv.org/html/2605.30387#S3.F4) presents qualitative and quantitative comparisons of the resting\-state hemodynamic response function \(rsHRF\) and power spectral density \(PSD\) between real and synthetic signals in two key hubs of the Default Mode Network \(DMN\)\. The near\-perfect overlap of the HRF plots indicates that DSFM preserves the canonical temporal dynamics of the hemodynamic process, rather than merely matching marginal statistics\. Likewise, the close alignment of the PSD curves indicates that the synthetic samples exhibit realistic fMRI spectral characteristics, accurately capturing the dominant low\-frequency peaks and the spectral decay from low\- to high\-frequency components\. The low L2 error for both HRF and PSD provides evidence that DSFM effectively learns the underlying spectral\-temporal dynamics of the BOLD signals\. Overall, these analyses suggest that our model generates neurophysiologically plausible synthetic signals that are suitable for different downstream tasks, as further supported by the brain disorder classification performance in Tables [2](https://arxiv.org/html/2605.30387#S2.T2) and [3](https://arxiv.org/html/2605.30387#S2.T3)\.
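One way such a PSD comparison could be computed is sketched below. The text does not specify the spectral estimator or the normalization, so the plain periodogram, the unit-sum normalization, and the function names are all assumptions made for illustration; the signals are synthetic stand-ins rather than BOLD data.

```python
import numpy as np


def periodogram(x, tr_seconds):
    """One-sided periodogram of a mean-removed signal sampled every tr_seconds."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=tr_seconds)
    return freqs, power


def psd_l2_distance(real, synth, tr_seconds=2.0):
    """L2 distance between unit-sum-normalized periodograms (equal lengths assumed)."""
    _, p_real = periodogram(real, tr_seconds)
    _, p_synth = periodogram(synth, tr_seconds)
    p_real = p_real / p_real.sum()
    p_synth = p_synth / p_synth.sum()
    return float(np.linalg.norm(p_real - p_synth))


rng = np.random.default_rng(0)
t = np.arange(232) * 2.0                                   # 232 time points, TR = 2 s
slow = np.sin(2 * np.pi * 0.03 * t)                        # toy low-frequency oscillation
real = slow + 0.3 * rng.standard_normal(t.size)
synth_good = slow + 0.3 * rng.standard_normal(t.size)      # same spectrum, new noise
synth_bad = rng.standard_normal(t.size)                    # white noise, no slow peak
d_good = psd_l2_distance(real, synth_good)
d_bad = psd_l2_distance(real, synth_bad)
print(d_good, d_bad)
```

A sample that shares the dominant low-frequency peak yields a much smaller distance than white noise. A smoothed estimator such as Welch's method would give less noisy curves for short series.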

## 5 Functional Connectivity \(FC\) Analysis and Visualization

Table [6](https://arxiv.org/html/2605.30387#S3.T6) further evaluates the fidelity of the generated data by comparing FC matrices derived from real and synthetic fMRI BOLD signals\. Across all graph similarity metrics, DSFM shows higher Pearson correlation with the real data than the GAN\-based models, indicating more realistic synthesis of FC networks in both connectivity edges and network topology\. These results demonstrate that DSFM not only reproduces plausible spectral\-temporal BOLD signal dynamics but also faithfully captures higher\-order network transformations, reflecting more coherent interdependencies among FC edges than existing GAN\-based generative models\. Figure [5](https://arxiv.org/html/2605.30387#S4.F5) visualizes group\-averaged connectivity, thresholded at 0\.6 to highlight significant edge connections\. Our analysis reveals that the synthetic FC closely aligns with the functional changes observed in the real FC distribution\. Furthermore, the HC and MDD connectograms of both real and synthetic FC indicate a reduction in intra\-network connectivity within the left superior frontal gyrus \(FrontalSupL\) and weakened coupling between the left middle frontal gyrus \(FrontalMidL\) and the left anterior cingulate cortex \(CingulumAntL\)\. These changes are consistent with impaired cognitive functions associated with difficulties in decision\-making and emotion regulation, supporting the biological plausibility of the generated data\.
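Two of the graph similarity metrics mentioned above, the Pearson correlation of FC edges and of node strength, can be sketched as follows. This is an illustration under assumptions: node strength is taken as the sum of absolute off-diagonal weights, the function names are hypothetical, and the matrices are synthetic stand-ins. Edge betweenness centrality is omitted because it needs a graph library.

```python
import numpy as np


def node_strength(fc):
    """Sum of absolute edge weights attached to each node, excluding self-loops."""
    weights = np.abs(fc)             # np.abs returns a new array, so fc is untouched
    np.fill_diagonal(weights, 0.0)
    return weights.sum(axis=1)


def fc_similarity(fc_real, fc_synth):
    """Pearson correlation between real and synthetic FC edges and node strengths."""
    upper = np.triu_indices_from(fc_real, k=1)        # unique undirected edges
    edge_r = np.corrcoef(fc_real[upper], fc_synth[upper])[0, 1]
    node_r = np.corrcoef(node_strength(fc_real), node_strength(fc_synth))[0, 1]
    return float(edge_r), float(node_r)


rng = np.random.default_rng(0)
fc_real = np.corrcoef(rng.standard_normal((200, 100)), rowvar=False)
noise = 0.01 * rng.standard_normal((100, 100))
fc_synth = fc_real + (noise + noise.T) / 2            # slightly perturbed, symmetric copy
edge_r, node_r = fc_similarity(fc_real, fc_synth)
print(edge_r, node_r)
```

Both correlations approach 1 when the synthetic FC reproduces the real edge weights and hub structure.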

## 6 Conclusions and Future Work

In this paper, we propose DSFM, which effectively captures both the temporal dynamics and the spectral evolution underlying the ground\-truth data distribution for accurate brain signal generation\. Future work will further validate MDD and ASD classification using graph\-based deep learning models \(Tew et al\., [2024](https://arxiv.org/html/2605.30387#bib.bib54); [2025a](https://arxiv.org/html/2605.30387#bib.bib55)\) and incorporate energy\-based models to identify out\-of\-distribution \(OOD\) patterns in brain spectrograms/scalograms associated with neurological disorders \(Loo et al\., [2025](https://arxiv.org/html/2605.30387#bib.bib53)\)\.

#### Acknowledgments

This work was supported in part by Monash University Malaysia and the Ministry of Higher Education, Malaysia under Fundamental Research Grant Scheme FRGS/1/2023/ICT02/MUSM/02/1, and by the King Abdullah University of Science and Technology \(KAUST\) Grant\. The authors also acknowledge the support of the Advanced Computing Platform \(ACP\), Monash University Malaysia, for providing computational resources\.

## References

- F\. Bao, S\. Nie, K\. Xue, Y\. Cao, C\. Li, H\. Su, and J\. Zhu \(2023\) All are worth words: a vit backbone for diffusion models\. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition \(CVPR\), pp\. 22669–22679\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p6.1)\.
- B\. B\. Biswal and L\. Q\. Uddin \(2025\) The history and future of resting\-state functional magnetic resonance imaging\. Nature 641\(8065\), pp\. 1121–1131\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p2.1), [§1](https://arxiv.org/html/2605.30387#S1.p3.1)\.
- S\. Bollmann and M\. Barth \(2021\) New acquisition techniques and their prospects for the achievable resolution of fmri\. Progress in Neurobiology 207, pp\. 101936\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- A\. Coletta, S\. Gopalakrishnan, D\. Borrajo, and S\. Vyetrenko \(2023\) On the constrained time\-series generation problem\. Advances in Neural Information Processing Systems 36, pp\. 61048–61059\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- A\. Desai, C\. Freeman, Z\. Wang, and I\. Beaver \(2021\) Timevae: a variational auto\-encoder for multivariate time series generation\. arXiv preprint arXiv:2111\.08095\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p3.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- A\. Di Martino, C\. Yan, Q\. Li, E\. Denio, F\. X\. Castellanos, K\. Alaerts, J\. S\. Anderson, M\. Assaf, S\. Y\. Bookheimer, M\. Dapretto, et al\. \(2014\) The autism brain imaging data exchange: towards a large\-scale evaluation of the intrinsic brain architecture in autism\. Molecular Psychiatry 19\(6\), pp\. 659–667\. Cited by: [§3\.1](https://arxiv.org/html/2605.30387#S3.SS1.p1.1)\.
- S\. Dieleman \(2024\) Diffusion is spectral autoregression\. External Links: [Link](https://sander.ai/2024/09/02/spectral-autoregression.html)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1)\.
- F\. Falck, T\. Pandeva, K\. Zahirnia, R\. Lawrence, R\. E\. Turner, E\. Meeds, J\. Zazo, and S\. Karmalkar \(2025\) A fourier space perspective on diffusion models\. External Links: [Link](https://arxiv.org/abs/2505.11278)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1)\.
- I\. Goodfellow, J\. Pouget\-Abadie, M\. Mirza, B\. Xu, D\. Warde\-Farley, S\. Ozair, A\. Courville, and Y\. Bengio \(2020\) Generative adversarial networks\. Communications of the ACM 63\(11\), pp\. 139–144\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- I\. Gulrajani, F\. Ahmed, M\. Arjovsky, V\. Dumoulin, and A\. C\. Courville \(2017\) Improved training of wasserstein gans\. Advances in Neural Information Processing Systems 30\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- M\. Heusel, H\. Ramsauer, T\. Unterthiner, B\. Nessler, and S\. Hochreiter \(2017\) Gans trained by a two time\-scale update rule converge to a local nash equilibrium\. Advances in Neural Information Processing Systems 30\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- J\. Ho, A\. Jain, and P\. Abbeel \(2020\) Denoising diffusion probabilistic models\. In Advances in Neural Information Processing Systems, H\. Larochelle, M\. Ranzato, R\. Hadsell, M\.F\. Balcan, and H\. Lin \(Eds\.\), Vol\. 33, pp\. 6840–6851\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1)\.
- J\. Ho and T\. Salimans \(2021\) Classifier\-free diffusion guidance\. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications\. External Links: [Link](https://openreview.net/forum?id=qw8AKxfYbI)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p7.6)\.
- E\. Hoogeboom and T\. Salimans \(2023\) Blurring diffusion models\. In The Eleventh International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=OjDkC57x5sz)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p2.1), [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p6.9)\.
- Y\. Hu, X\. Wang, L\. Wu, H\. Zhang, S\. Z\. Li, S\. Wang, and T\. Chen \(2024\) Fm\-ts: flow matching for time series generation\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p3.1)\.
- P\. Jeha, M\. Bohlke\-Schneider, P\. Mercado, S\. Kapoor, R\. S\. Nirwan, V\. Flunkert, J\. Gasthaus, and T\. Januschowski \(2022\) PSA\-gan: progressive self attention gans for synthetic time series\. In The Tenth International Conference on Learning Representations\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1), [§D\.2](https://arxiv.org/html/2605.30387#A4.SS2.p3.1)\.
- J\. Kawahara, C\. J\. Brown, S\. P\. Miller, B\. G\. Booth, V\. Chau, R\. E\. Grunau, J\. G\. Zwicker, and G\. Hamarneh \(2017\) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment\. NeuroImage 146, pp\. 1038–1049\. Cited by: [§3\.3](https://arxiv.org/html/2605.30387#S3.SS3.p2.2)\.
- Z\. Kong, W\. Ping, J\. Huang, K\. Zhao, and B\. Catanzaro \(2020\) Diffwave: a versatile diffusion model for audio synthesis\. arXiv preprint arXiv:2009\.09761\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- S\. Liao, H\. Ni, L\. Szpruch, M\. Wiese, M\. Sabate\-Vidales, and B\. Xiao \(2020\) Conditional sig\-wasserstein gans for time series generation\. arXiv preprint arXiv:2006\.05421\. Cited by: [§D\.2](https://arxiv.org/html/2605.30387#A4.SS2.p4.1)\.
- Y\. Lipman, R\. T\. Q\. Chen, H\. Ben\-Hamu, M\. Nickel, and M\. Le \(2023\) Flow matching for generative modeling\. In The Eleventh International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=PqvMRDCJT9t)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p5.1), [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p6.9)\.
- Y\. Lipman, M\. Havasi, P\. Holderrieth, N\. Shaul, M\. Le, B\. Karrer, R\. T\. Q\. Chen, D\. Lopez\-Paz, H\. Ben\-Hamu, and I\. Gat \(2024\) Flow matching guide and code\. External Links: [Link](https://arxiv.org/abs/2412.06264)\. Cited by: [Proposition 2](https://arxiv.org/html/2605.30387#Thmtheorem2.p1.4.1)\.
- J\. Y\. Loo, L\. F\. Yu, M\. Adeline, J\. K\. Lau, H\. H\. Tew, A\. Pal, V\. M\. Baskaran, C\. Ting, and R\. C\. Phan \(2025\) Learning energy\-based generative models via potential flow: a variational principle approach to probability density homotopy matching\. Transactions on Machine Learning Research\. Cited by: [§6](https://arxiv.org/html/2605.30387#S6.p1.1)\.
- M\. M\. N\. Murad, M\. Aktukmak, and Y\. Yilmaz \(2025\) Wpmixer: efficient multi\-resolution mixing for long\-term time series forecasting\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 19581–19588\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1)\.
- I\. Naiman, N\. Berman, I\. Pemper, I\. Arbiv, G\. Fadlon, and O\. Azencot \(2024\) Utilizing image transforms and diffusion models for generative modeling of short and long time series\. Advances in Neural Information Processing Systems 37, pp\. 121699–121730\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1), [§D\.2](https://arxiv.org/html/2605.30387#A4.SS2.p1.1), [§1](https://arxiv.org/html/2605.30387#S1.p3.1)\.
- M\. Ning, M\. Li, J\. Su, J\. Haozhe, L\. Liu, M\. Benes, W\. Chen, A\. A\. Salah, and I\. O\. Ertugrul \(2025\) DCTdiff: intriguing properties of image generative modeling in the DCT space\. In Forty\-second International Conference on Machine Learning\. External Links: [Link](https://openreview.net/forum?id=vDoAA8xKXL)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1)\.
- F\. Noman, C\. Ting, H\. Kang, R\. C\.\-W\. Phan, and H\. Ombao \(2024\) Graph autoencoders for embedding learning in brain networks and major depressive disorder identification\. IEEE Journal of Biomedical and Health Informatics 28\(3\), pp\. 1644–1655\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- F\. Noman, S\. Yap, R\. C\.\-W\. Phan, H\. Ombao, and C\. Ting \(2022\) Graph autoencoder\-based embedded learning in dynamic brain networks for autism spectrum disorder identification\. In 2022 IEEE International Conference on Image Processing \(ICIP\), pp\. 2891–2895\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- J\. D\. Power, A\. Mitra, T\. O\. Laumann, A\. Z\. Snyder, B\. L\. Schlaggar, and S\. E\. Petersen \(2014\) Methods to detect, characterize, and remove motion artifact in resting state fmri\. NeuroImage 84, pp\. 320–341\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- A\. Radford, L\. Metz, and S\. Chintala \(2015\) Unsupervised representation learning with deep convolutional generative adversarial networks\. arXiv preprint arXiv:1511\.06434\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- S\. Rissanen, M\. Heinonen, and A\. Solin \(2023\) Generative modelling with inverse heat dissipation\. In The Eleventh International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=4PJUBT9f2Ol)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p2.1), [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p3.7)\.
- P\. S\. Shabestari, H\. H\. Behjat, D\. Van De Ville, C\. R\. Cederroth, N\. K\. Edvall, A\. Naas, T\. Kleinjung, and P\. Neff \(2025\) Frequency\-specific resting\-state meg network characteristics of tinnitus patients revealed by graph learning\. bioRxiv, pp\. 2025–03\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p2.1)\.
- I\. Skorokhodov, S\. Girish, B\. Hu, W\. Menapace, Y\. Li, R\. Abdal, S\. Tulyakov, and A\. Siarohin \(2025\) Improving the diffusability of autoencoders\. In Forty\-second International Conference on Machine Learning\. External Links: [Link](https://openreview.net/forum?id=2hEDcA7xy4)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1)\.
- S\. M\. Smith, K\. L\. Miller, G\. Salimi\-Khorshidi, M\. Webster, C\. F\. Beckmann, T\. E\. Nichols, J\. D\. Ramsey, and M\. W\. Woolrich \(2011\) Network modelling methods for fmri\. NeuroImage 54\(2\), pp\. 875–891\. Cited by: [§3\.1](https://arxiv.org/html/2605.30387#S3.SS1.p1.1)\.
- Y\. Song, J\. Sohl\-Dickstein, D\. P\. Kingma, A\. Kumar, S\. Ermon, and B\. Poole \(2021\) Score\-based generative modeling through stochastic differential equations\. In International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=PxTIG12RRHS)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p1.1), [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p3.6)\.
- Y\. Tan, J\. Loo, C\. Ting, F\. Noman, R\. C\. Phan, and H\. Ombao \(2024a\) Brainfc\-cgan: a conditional generative adversarial network for brain functional connectivity augmentation and aging synthesis\. In ICASSP 2024\-2024 IEEE International Conference on Acoustics, Speech and Signal Processing \(ICASSP\), pp\. 1511–1515\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p2.1)\.
- Y\. Tan, C\. Ting, F\. Noman, R\. C\. Phan, and H\. Ombao \(2024b\) FMRI functional connectivity augmentation using convolutional generative adversarial networks for brain disorder classification\. In 2024 IEEE International Symposium on Biomedical Imaging \(ISBI\), pp\. 1–5\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1), [§1](https://arxiv.org/html/2605.30387#S1.p2.1)\.
- H\. H\. Tew, F\. Ding, G\. Li, J\. Y\. Loo, C\. Ting, Z\. Y\. Ding, and C\. P\. Tan \(2025a\) ST\-hcss: deep spatio\-temporal hypergraph convolutional neural network for soft sensing\. In ICASSP 2025\-2025 IEEE International Conference on Acoustics, Speech and Signal Processing \(ICASSP\), pp\. 1–5\. Cited by: [§6](https://arxiv.org/html/2605.30387#S6.p1.1)\.
- H\. H\. Tew, G\. Li, F\. Ding, X\. Luo, J\. Y\. Loo, C\. Ting, Z\. Y\. Ding, and C\. P\. Tan \(2024\) KANS: knowledge discovery graph attention network for soft sensing in multivariate industrial processes\. In 2024 IEEE International Conference on Systems, Man, and Cybernetics \(SMC\), pp\. 4377–4383\. Cited by: [§6](https://arxiv.org/html/2605.30387#S6.p1.1)\.
- H\. H\. Tew, J\. Y\. Loo, Y\. Tan, X\. Tang, H\. Ombao, F\. Noman, R\. C\.\-W\. Phan, and C\. Ting \(2025b\) T2I\-diff: fmri signal generation via time\-frequency image transform and classifier\-free denoising diffusion models\. In International Conference on Medical Image Computing and Computer\-Assisted Intervention, pp\. 640–650\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1), [§1](https://arxiv.org/html/2605.30387#S1.p1.1), [§1](https://arxiv.org/html/2605.30387#S1.p3.1), [§3\.2](https://arxiv.org/html/2605.30387#S3.SS2.p1.11)\.
- C\. Ting, J\. I\. Skipper, F\. Noman, S\. L\. Small, and H\. Ombao \(2022\) Separating stimulus\-induced and background components of dynamic functional connectivity in naturalistic fmri\. IEEE Transactions on Medical Imaging 41\(6\), pp\. 1431–1442\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- H\. Wu, M\. I\. Knight, K\. W\. Cooper, N\. J\. Fortin, and H\. Ombao \(2025\) Wavelet canonical coherence for nonstationary signals: neurips 2025 \(spotlight\)\. In NeurIPS 2025 Proceedings\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p4.1)\.
- T\. Xu, L\. K\. Wenliang, M\. Munn, and B\. Acciaio \(2020\) Cot\-gan: generating sequential data via causal optimal transport\. Advances in Neural Information Processing Systems 33, pp\. 8798–8809\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p2.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1)\.
- C\. Yan, X\. Chen, L\. Li, F\. X\. Castellanos, T\. Bai, Q\. Bo, J\. Cao, G\. Chen, N\. Chen, W\. Chen, et al\. \(2019\) Reduced default mode network functional connectivity in patients with recurrent major depressive disorder\. Proceedings of the National Academy of Sciences 116\(18\), pp\. 9078–9083\. Cited by: [§3\.1](https://arxiv.org/html/2605.30387#S3.SS1.p1.1)\.
- C\. Yan and Y\. Zang \(2010\) DPARSF: a matlab toolbox for “pipeline” data analysis of resting\-state fmri\. Frontiers in Systems Neuroscience 4, pp\. 1377\. Cited by: [§3\.1](https://arxiv.org/html/2605.30387#S3.SS1.p1.1)\.
- S\. Yap, J\. Y\. Loo, C\. Ting, F\. Noman, R\. C\.\-W\. Phan, A\. Razi, and D\. L\. Dowe \(2024\) A deep probabilistic spatiotemporal framework for dynamic graph representation learning with application to brain disorder identification\. In Proceedings of the Thirty\-Third International Joint Conference on Artificial Intelligence, IJCAI\-24, pp\. 5353–5361\. Cited by: [§1](https://arxiv.org/html/2605.30387#S1.p1.1)\.
- J\. Yoon, D\. Jarrett, and M\. Van der Schaar \(2019\) Time\-series generative adversarial networks\. Advances in Neural Information Processing Systems 32\. Cited by: [§B\.1](https://arxiv.org/html/2605.30387#A2.SS1.p2.1), [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1), [§D\.2](https://arxiv.org/html/2605.30387#A4.SS2.p2.1)\.
- X\. Yuan and Y\. Qiao \(2024\) Diffusion\-ts: interpretable diffusion for general time series generation\. arXiv preprint arXiv:2403\.01742\. Cited by: [§D\.1](https://arxiv.org/html/2605.30387#A4.SS1.p1.1), [§1](https://arxiv.org/html/2605.30387#S1.p3.1)\.
- Q\. Zheng, M\. Le, N\. Shaul, Y\. Lipman, A\. Grover, and R\. T\. Q\. Chen \(2023\) Guided flows for generative modeling and decision making\. External Links: [Link](https://arxiv.org/abs/2311.13443)\. Cited by: [§2\.3](https://arxiv.org/html/2605.30387#S2.SS3.p7.6)\.

## Appendix A Appendix

This appendix provides self\-contained additional material for the submission titled “Functional MRI Time Series Generation via Wavelet\-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification”\. It includes a detailed discussion of related works, proofs and derivations, the evaluation metrics, full experimental results, limitations, a reproducibility statement, and a statement on the use of large language models \(LLMs\)\.

## Appendix B Related Works

### B\.1 Generative Modeling of fMRI Time Series

Synthesizing fMRI BOLD signals is challenging due to the complex spatiotemporal dependencies, non\-stationarity, and interferences arising from neurophysiological fluctuations\. Existing time\-series generation is principally based on generative adversarial networks \(GANs\), variational autoencoders \(VAEs\), and diffusion\-based frameworks\.

GAN\-based approaches: Yoon et al\. \([2019](https://arxiv.org/html/2605.30387#bib.bib43)\) propose TimeGAN, which extends the GAN framework with an embedding function and a supervised loss to better capture temporal dynamics, successfully preserving both the static and dynamic characteristics of time\-series data in the synthetic samples\. COT\-GAN introduces a causality\-aware optimal transport cost that further aligns real and synthetic samples over time and reduces the time\-dependent discrepancy between them \(Xu et al\., [2020](https://arxiv.org/html/2605.30387#bib.bib42)\)\.

VAE\-based approaches: TimeVAE incorporates temporal components into its encoder\-decoder network, improving the interpretability of generated time series\. It also reduces overall training time compared with adversarial methods \(Desai et al\., [2021](https://arxiv.org/html/2605.30387#bib.bib44)\)\.

Diffusion\-based approaches: DiffTime improves time\-series generation by applying hard constraints that enforce fixed points and global minima, alongside soft constraints that introduce penalties to guide the model towards desired temporal trends \(Coletta et al\., [2023](https://arxiv.org/html/2605.30387#bib.bib41)\)\. DiffWave achieves high\-fidelity time\-series generation by replacing autoregressive dependencies with a diffusion denoising chain \(Kong et al\., [2020](https://arxiv.org/html/2605.30387#bib.bib40)\)\. More recently, ImagenTime and T2I\-Diff demonstrated their capability on long\-term time\-series benchmarks by converting signals into short\-time Fourier transform \(STFT\) image representations, offering an alternative for modelling longer continuous signals using spectral components \(Naiman et al\., [2024](https://arxiv.org/html/2605.30387#bib.bib56); Tew et al\., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)\)\. In contrast, the wavelet transform provides multi\-resolution bands by using adaptive windows that narrow at high frequencies and widen at low frequencies\. This adaptivity makes the wavelet transform better at capturing short transients in continuous signals while still capturing slower trends \(Murad et al\., [2025](https://arxiv.org/html/2605.30387#bib.bib59); Wu et al\., [2025](https://arxiv.org/html/2605.30387#bib.bib62)\)\.
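The multi-resolution property described above can be illustrated with a small, self-contained Haar transform. This is a sketch rather than the implementation used in this work: the function names are hypothetical, the signal is a toy example, and the length is required to be divisible by 2^levels to avoid boundary handling (a library such as PyWavelets deals with arbitrary lengths through its signal extension modes).

```python
import numpy as np


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT. Returns [cA_L, cD_L, ..., cD_1]."""
    approx = np.asarray(signal, dtype=float)
    if approx.size % (2**levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # detail: local differences
        approx = (even + odd) / np.sqrt(2)          # approximation: local averages
    return [approx] + details[::-1]


def haar_idwt(coeffs):
    """Invert haar_dwt by undoing one level at a time, coarsest first."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx


rng = np.random.default_rng(0)
t = np.arange(256)
x = np.sin(2 * np.pi * t / 64) + 0.1 * rng.standard_normal(t.size)  # toy signal
coeffs = haar_dwt(x, levels=5)
print([c.size for c in coeffs])            # coefficients per band, coarse to fine
print(np.allclose(haar_idwt(coeffs), x))   # the transform is exactly invertible
```

Each level halves the number of coefficients, so the finest detail band keeps one coefficient per two samples (short windows for fast transients), while the coarsest bands summarize 32 samples each (long windows for slow trends).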

## Appendix C Proofs and Derivations

### C\.1 Proof of Proposition 1

###### Proof\.

The forward-time SPDE (9) in the DCT domain admits the following mode-wise decomposition:

$$
dz_t[k] = \eta(t)\,\lambda_k\,z_t[k]\,dt + g(t,k)\,dW_t[k] \tag{18}
$$

where $W_t[k]$ is the per-mode standard Wiener process. Subsequently, introduce the variance-preserving (VP) scaling

$$
z_t[k] = \alpha(t)\,\tilde{z}_t[k] \tag{19}
$$

where $\alpha(t)$ is a scalar applied equally to every mode, and the DCT basis remains unchanged, i.e., the scaled $z_t$ still obeys the heat dissipation SPDE. Substituting this into ([18](https://arxiv.org/html/2605.30387#A3.E18)) and applying Itô's lemma gives

$$
dz_t[k] = f(t,k)\,z_t[k]\,dt + g(t,k)\,dW_t[k] \tag{20}
$$

where we have defined

$$
f(t,k) = \frac{\dot{\alpha}(t)}{\alpha(t)} - \eta(t)\,\lambda_k \tag{21}
$$

Taking the conditional expectation of the drift term in ([20](https://arxiv.org/html/2605.30387#A3.E20)) and integrating with respect to time yields

$$
\begin{split}
\frac{d}{dt}\,\mathbb{E}\big[z_t[k]\,|\,z_0[k]\big] &= f(t,k)\,\mathbb{E}\big[z_t[k]\,|\,z_0[k]\big] \\
\mathbb{E}\big[z_t[k]\,|\,z_0[k]\big] &= \int_0^t f(t,k)\,\mu(t,k)\,dt = \alpha(t)\,e^{-\lambda_k \tau(t)} = \mu(t,k)
\end{split} \tag{22}
$$

which is exactly the mean schedule defined in (12). From ([22](https://arxiv.org/html/2605.30387#A3.E22)), we also have

$$
\dot{\mu}(t,k) = f(t,k)\,\mu(t,k) \tag{23}
$$

which we will use to derive the standard deviation.
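As a worked step, the linear ODE (23) can be integrated in closed form. The sketch below assumes $\mu(0,k)=1$, $\alpha(0)=1$, $\alpha(t)>0$, and $\tau(t)=\int_0^t \eta(s)\,ds$; these conventions are not restated in this appendix and are introduced here only so that the result lines up with the mean schedule quoted in (22).

$$
\begin{split}
\mu(t,k) &= \exp\!\Big(\int_0^t f(s,k)\,ds\Big)
= \exp\!\Big(\int_0^t \frac{\dot{\alpha}(s)}{\alpha(s)}\,ds - \lambda_k \int_0^t \eta(s)\,ds\Big) \\
&= \frac{\alpha(t)}{\alpha(0)}\,e^{-\lambda_k \tau(t)} = \alpha(t)\,e^{-\lambda_k \tau(t)}
\end{split}
$$

Under the same assumptions, multiplying by the initial value gives the conditional mean $\mu(t,k)\,z_0[k]$ that appears in the reparameterization (30).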

Applying Itô's lemma once again to the square of ([20](https://arxiv.org/html/2605.30387#A3.E20)), and taking conditional expectations yields

$$
\frac{d}{dt}\,\mathbb{E}[z_t[k]^2] = 2\,f(t,k)\,\mathbb{E}\big[z_t[k]^2\big] + g(t,k)^2 \tag{24}
$$

Additionally, taking the time derivative of

$$
\sigma(t,k)^2 = \mathrm{Var}\big[z_t[k]\,|\,z_0[k]\big] = \mathbb{E}\big[z_t[k]^2\big] - \mu(t,k)^2 \tag{25}
$$

and substituting $\dot{\mu} = f(t,k)\,\mu$ from ([23](https://arxiv.org/html/2605.30387#A3.E23)), we have

$$
\dot{\sigma}^2 = 2\,f(t,k)\,\sigma^2 + g(t,k)^2 \tag{26}
$$

where we use the shorthand notations $\mu$, $\sigma$ and $\dot{\mu}$, $\dot{\sigma}$ for brevity. Since the conditional perturbation kernel is variance-preserving, we also have

$$
\sigma(t,k)^2 = 1 - \mu(t,k)^2 \tag{27}
$$

Differentiating this gives

$$
\dot{\sigma}^2 = -2\,\mu\,\dot{\mu} = -2\,f(t,k)\,\mu^2 = -2\,f(t,k)\,(1-\sigma^2) \tag{28}
$$

Equating ([26](https://arxiv.org/html/2605.30387#A3.E26)) and ([28](https://arxiv.org/html/2605.30387#A3.E28)) gives

$$
g(t,k)^2 = 2\,\sigma(t,k)\,\big(\dot{\sigma}(t,k) - f(t,k)\,\sigma(t,k)\big) \tag{29}
$$

which is exactly (13). This completes the proof. ∎
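The moment equations above can be sanity-checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical schedule $\alpha(t)=e^{-At}$ with constant $\eta$, reads $\dot{\sigma}^2$ in (26) as $d(\sigma^2)/dt = 2\sigma\dot{\sigma}$, and integrates the mode-wise SDE (20) with Euler-Maruyama. The constants `A`, `ETA`, `LAM` and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical schedule, chosen only for illustration (not from the paper):
# alpha(t) = exp(-A * t), eta(t) = ETA (constant), so tau(t) = ETA * t.
A, ETA, LAM = 0.5, 1.0, 0.3

def drift_coeff():
    """f(t, k) from (21); constant in t for this toy schedule."""
    return -A - ETA * LAM

def mean_schedule(t):
    """mu(t, k) = alpha(t) * exp(-lambda_k * tau(t))."""
    return np.exp(-A * t) * np.exp(-LAM * ETA * t)

def diffusion_sq(t):
    """g(t, k)^2 from (29), written as d(sigma^2)/dt - 2 f sigma^2."""
    f = drift_coeff()
    mu = mean_schedule(t)
    sigma2 = 1.0 - mu**2              # VP constraint (27)
    dsigma2_dt = -2.0 * f * mu**2     # derivative of (27), using (23)
    return dsigma2_dt - 2.0 * f * sigma2

def simulate(z0, t_end=1.0, n_steps=1000, n_paths=20000, seed=0):
    """Euler-Maruyama integration of the mode-wise SDE (20)."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    z = np.full(n_paths, z0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        noise = rng.standard_normal(n_paths)
        z = z + drift_coeff() * z * dt + np.sqrt(diffusion_sq(t) * dt) * noise
    return z

z_end = simulate(z0=1.5)
mu_1 = mean_schedule(1.0)
print("empirical mean:", z_end.mean(), "expected:", mu_1 * 1.5)
print("empirical var: ", z_end.var(), "expected:", 1.0 - mu_1**2)
```

For this toy schedule the VP constraint makes (29) collapse to $g^2 = -2f$, so the noise level is constant in time; other schedules would generally give a time-dependent $g$.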

### C.2 Proof of Proposition 2

###### Proof\.

The Gaussian reparameterization trick

$$
z_t[k]|_{z_0[k]} = \mu(t,k)\,z_0[k] + \sigma(t,k)\,\epsilon \tag{30}
$$

follows from the mode-wise conditional perturbation kernel (11), and its time derivative gives the conditional vector field (14). Using the results ([21](https://arxiv.org/html/2605.30387#A3.E21)), ([23](https://arxiv.org/html/2605.30387#A3.E23)) and ([29](https://arxiv.org/html/2605.30387#A3.E29)) from the proof of Proposition 1, and substituting ([30](https://arxiv.org/html/2605.30387#A3.E30)), we can reformulate the conditional vector field (14) as follows:

$$
\begin{split}
\frac{dz_t[k]}{dt}\bigg|_{z_0[k]} &= v(z_t\,|\,z_0;t,k) \\
&= \dot{\mu}\,z_0[k] + \dot{\sigma}\,\epsilon \\
&= \frac{\dot{\mu}}{\mu}\,\big(z_t[k] - \sigma\,\epsilon\big) + \dot{\sigma}\,\epsilon \\
&= f(t,k)\,\big(z_t[k]|_{z_0[k]} - \sigma\,\epsilon\big) + \dot{\sigma}\,\epsilon \\
&= f(t,k)\,z_t[k]|_{z_0[k]} + \big(\dot{\sigma} - f(t,k)\,\sigma\big)\,\epsilon \\
&= f(t,k)\,z_t[k]|_{z_0[k]} + \frac{1}{2}\,g(t,k)^2\,\frac{\epsilon}{\sigma} \\
&= f(t,k)\,z_t[k]|_{z_0[k]} + \frac{1}{2}\,g(t,k)^2\,\nabla_{\!z_t[k]}\log p(z_t[k]\,|\,z_0[k])
\end{split} \tag{31}
$$

which arrives at the conditional probability flow ODE (15). Here, we again use the shorthand notations for brevity.
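The algebra in (31) can be checked numerically up to the $\epsilon/\sigma$ form. The sketch below assumes a hypothetical VP schedule $\mu(t)=e^{-Ct}$ (the constant `C` and the function names are illustrative, not taken from the paper) and compares the direct time derivative of (30) with the drift-plus-noise form; it does not touch the score identity in the last line.

```python
import numpy as np

# Hypothetical toy schedule (an assumption for illustration only):
# mu(t) = exp(-C * t) and sigma(t) = sqrt(1 - mu(t)^2), i.e. the VP kernel (27).
C = 0.8

def schedule(t):
    mu = np.exp(-C * t)
    mu_dot = -C * mu
    sigma = np.sqrt(1.0 - mu**2)
    sigma_dot = -mu * mu_dot / sigma             # from differentiating (27)
    f = mu_dot / mu                              # (23): mu_dot = f * mu
    g2 = 2.0 * sigma * (sigma_dot - f * sigma)   # (29)
    return mu, mu_dot, sigma, sigma_dot, f, g2

def velocity_direct(z0, eps, t):
    """Second line of (31): time derivative of the reparameterization (30)."""
    _, mu_dot, _, sigma_dot, _, _ = schedule(t)
    return mu_dot * z0 + sigma_dot * eps

def velocity_drift_form(z0, eps, t):
    """Sixth line of (31): f * z_t + (g^2 / 2) * eps / sigma."""
    mu, _, sigma, _, f, g2 = schedule(t)
    z_t = mu * z0 + sigma * eps
    return f * z_t + 0.5 * g2 * eps / sigma

rng = np.random.default_rng(0)
z0 = rng.standard_normal(5)
eps = rng.standard_normal(5)
t = 0.4
print(np.allclose(velocity_direct(z0, eps, t), velocity_drift_form(z0, eps, t)))
```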

Applying the law of the unconscious statistician from (16)

$$
\mathbb{E}_{p_{\text{data}}(z_0|z_t)}\big[v(z_t|z_0;t,k)\,|\,z_t\big] \tag{32}
$$

to the score $\nabla_{\!z_t}\log p(z_t\,|\,z_0)$, we have

$$
\begin{split}
&\int_{\mathbb{R}} \nabla_{\!z_t}\log p(z_t\,|\,z_0)\,p_{\text{data}}(z_0|z_t)\;dz_0 \\
&= \int_{\mathbb{R}} \nabla_{\!z_t}\log p(z_t\,|\,z_0)\,\frac{p(z_t\,|\,z_0)\,p_{\text{data}}(z_0)}{\int_{\mathbb{R}} p(z_t\,|\,z_0)\,p_{\text{data}}(z_0)\;dz_0}\;dz_0 \\
&= \int_{\mathbb{R}} \frac{\nabla_{\!z_t}p(z_t\,|\,z_0)}{p(z_t\,|\,z_0)}\,\frac{p(z_t\,|\,z_0)\,p_{\text{data}}(z_0)}{p(z_t)}\;dz_0 \\
&= \frac{1}{p(z_t)}\nabla_{\!z_t}\!\!\int_{\mathbb{R}} p(z_t\,|\,z_0)\,p_{\text{data}}(z_0)\;dz_0 \\
&= \frac{1}{p(z_t)}\nabla_{\!z_t}p(z_t) = \nabla_{\!z_t}\log p(z_t)
\end{split} \tag{33}
$$

where we have repeatedly applied the log-derivative trick $\frac{1}{p(z)}\nabla p(z) = \nabla\log p(z)$. This gives us the marginal score, and the same applies to the drift term $f(t,k)\,z_t[k]|_{z_0[k]}$ in ([31](https://arxiv.org/html/2605.30387#A3.E31)), thus completing the proof. ∎
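The identity in (33) can be illustrated on a toy problem where everything is computable. The sketch below assumes a discrete data distribution (three point masses with hypothetical values and weights) and a Gaussian kernel $\mathcal{N}(\mu z_0, \sigma^2)$; it compares the posterior average of the conditional score with a finite-difference estimate of the marginal score. All names and numbers are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(z, mean, sigma):
    return np.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def marginal_density(z, data, weights, mu, sigma):
    """p(z_t) = sum_i w_i * N(z_t; mu * x_i, sigma^2) for a discrete data distribution."""
    return np.sum(weights * gaussian_pdf(z, mu * data, sigma))

def posterior_mean_of_conditional_score(z, data, weights, mu, sigma):
    """E_{p(z_0 | z_t)}[ grad_z log p(z_t | z_0) ], the left-hand side of (33)."""
    lik = gaussian_pdf(z, mu * data, sigma)
    posterior = weights * lik / np.sum(weights * lik)   # Bayes' rule
    cond_score = -(z - mu * data) / sigma**2            # score of a Gaussian kernel
    return np.sum(posterior * cond_score)

def marginal_score_fd(z, data, weights, mu, sigma, h=1e-5):
    """Central finite difference of log p(z_t), the right-hand side of (33)."""
    lp = lambda x: np.log(marginal_density(x, data, weights, mu, sigma))
    return (lp(z + h) - lp(z - h)) / (2.0 * h)

# Hypothetical toy values, chosen only for illustration.
data = np.array([-2.0, 0.5, 1.5])
weights = np.array([0.2, 0.5, 0.3])
mu, sigma = 0.6, 0.8
z = 0.3
lhs = posterior_mean_of_conditional_score(z, data, weights, mu, sigma)
rhs = marginal_score_fd(z, data, weights, mu, sigma)
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error, which is the content of (33) for this one-dimensional example.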

## Appendix D Evaluation Protocol

### D.1 Baselines.

Time-series Generative Models. Our proposed DSFM model is first assessed in the unconditional setting, using the standard metrics adopted by T2I-Diff (Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)), against seven time-series and time-frequency generative baselines: CoT-GAN (Xu et al., [2020](https://arxiv.org/html/2605.30387#bib.bib42)), DiffTime (Coletta et al., [2023](https://arxiv.org/html/2605.30387#bib.bib41)), DiffWave (Kong et al., [2020](https://arxiv.org/html/2605.30387#bib.bib40)), TimeVAE (Desai et al., [2021](https://arxiv.org/html/2605.30387#bib.bib44)), TimeGAN (Yoon et al., [2019](https://arxiv.org/html/2605.30387#bib.bib43)), Diffusion-TS (Yuan and Qiao, [2024](https://arxiv.org/html/2605.30387#bib.bib45)), and T2I-Diff (Tew et al., [2025b](https://arxiv.org/html/2605.30387#bib.bib57)). In the conditional setting, we computed the image-domain FID score on subject-specific DCT and DWT image representations (Heusel et al., [2017](https://arxiv.org/html/2605.30387#bib.bib38)). To assess image-to-signal reconstruction quality, we evaluate the time domain using the context-FID (cFID) score (Jeha et al., [2022](https://arxiv.org/html/2605.30387#bib.bib39)).

fMRI Generative Models. We further compared DSFM with FC-based GAN models, namely Vanilla-GAN (Goodfellow et al., [2020](https://arxiv.org/html/2605.30387#bib.bib48)), 1D-DCGAN (Radford et al., [2015](https://arxiv.org/html/2605.30387#bib.bib47)), 2D-DCGAN (Tan et al., [2024b](https://arxiv.org/html/2605.30387#bib.bib7)), WGAN and WGAN-GP (Gulrajani et al., [2017](https://arxiv.org/html/2605.30387#bib.bib49)).

### D.2 Time-Series Metrics.

We utilize the standard time-series generation metrics from Naiman et al. ([2024](https://arxiv.org/html/2605.30387#bib.bib56)). We employ the following four metrics and provide their mathematical formulations to ensure comparable evaluation across multiple aspects:

Discriminative (Disc.) & Predictive score (Pred.). We adopt the same experimental setup as Yoon et al. ([2019](https://arxiv.org/html/2605.30387#bib.bib43)) for both the discriminative and predictive scores. Both the classifier and the sequence-prediction model use a two-layer GRU-based architecture. The discriminative score is computed as $|\text{accuracy} - 0.5|$, where lower scores indicate that real and synthetic samples are harder to tell apart, and higher scores reflect greater divergence. The predictive score is the mean absolute error (MAE) between the one-step-ahead predictions and the ground-truth values.
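Both scores reduce to simple formulas once the GRU models have produced their outputs. The sketch below covers only that final step; the function names are hypothetical, and training the two-layer GRU classifier and predictor is deliberately left out.

```python
import numpy as np

def discriminative_score(labels, predicted_labels):
    """|accuracy - 0.5| for a real-vs-synthetic classifier on held-out data."""
    labels = np.asarray(labels)
    predicted_labels = np.asarray(predicted_labels)
    accuracy = np.mean(labels == predicted_labels)
    return abs(accuracy - 0.5)

def predictive_score(targets, predictions):
    """Mean absolute error between one-step-ahead predictions and ground truth."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean(np.abs(targets - predictions))

# Toy example: 1 = real, 0 = synthetic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
guesses = [1, 0, 1, 0, 0, 1, 0, 1]
print(discriminative_score(labels, guesses))
print(predictive_score([0.2, 0.4, 0.6], [0.1, 0.4, 0.8]))
```

In the toy example the classifier is right on half of the samples, so the discriminative score is zero, which is the ideal value for a generator.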

Context-FID score (cFID). The context-FID score is a time-series adaptation of the image-based Fréchet Inception Distance (FID) that measures how close the distribution of synthetic data is to that of the real data in a learned embedding space (Jeha et al., [2022](https://arxiv.org/html/2605.30387#bib.bib39)). Instead of image features, it uses a trained encoder called TS2Vec to capture temporal context. Lower scores indicate higher fidelity and have been shown to correlate with better downstream task performance.
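The Fréchet distance underlying this score can be sketched as follows, assuming the embeddings have already been computed. The TS2Vec encoder is not implemented here; random arrays stand in for its outputs, and `frechet_distance` is a hypothetical helper name.

```python
import numpy as np

def frechet_distance(emb_real, emb_synth):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_real, emb_synth: arrays of shape (n_samples, n_features).
    """
    m1, m2 = emb_real.mean(axis=0), emb_synth.mean(axis=0)
    s1 = np.cov(emb_real, rowvar=False)
    s2 = np.cov(emb_synth, rowvar=False)
    # tr((S1 S2)^{1/2}) equals the sum of square roots of the eigenvalues of S1 S2,
    # which are real and non-negative for positive semi-definite S1 and S2.
    eigvals = np.linalg.eigvals(s1 @ s2)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1) + np.trace(s2) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))   # stand-in for encoder embeddings
shifted = real + 0.5                   # same covariance, shifted mean
print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

Because `shifted` differs from `real` only by a constant offset of 0.5 in each of the 8 dimensions, the distance reduces to the squared mean difference, 8 × 0.25 = 2.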

Correlational score (Corr.). Following Liao et al. ([2020](https://arxiv.org/html/2605.30387#bib.bib58)), we first estimate the covariance of the $i$th and $j$th features of the time series as follows:

$$
\mathrm{cov}_{i,j} = \frac{1}{T}\sum_{t=1}^{T} X_i^t X_j^t - \left(\frac{1}{T}\sum_{t=1}^{T} X_i^t\right)\left(\frac{1}{T}\sum_{t=1}^{T} X_j^t\right) \tag{34}
$$

Then, the correlation score is defined as the average absolute difference between corresponding pairwise correlations in the real ($r$) and synthetic ($s$) data:

$$
\mathrm{Corr} = \frac{1}{10}\sum_{i,j}\left|\frac{\mathrm{cov}^{r}_{i,j}}{\sqrt{\mathrm{cov}^{r}_{i,i}\,\mathrm{cov}^{r}_{j,j}}} - \frac{\mathrm{cov}^{s}_{i,j}}{\sqrt{\mathrm{cov}^{s}_{i,i}\,\mathrm{cov}^{s}_{j,j}}}\right| \tag{35}
$$
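Equations (34) and (35) translate directly into array operations. The sketch below assumes a single multivariate series of shape `(T, d)` per data set and keeps the $1/10$ factor from (35) as a parameter; the function names and the random test data are illustrative assumptions.

```python
import numpy as np

def covariance(x):
    """Equation (34) for a series x of shape (T, d): returns the (d, d) matrix cov_{i,j}."""
    mean = x.mean(axis=0)
    return (x.T @ x) / x.shape[0] - np.outer(mean, mean)

def correlation(x):
    """Normalize the covariance by the per-feature standard deviations."""
    cov = covariance(x)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def correlation_score(real, synth, scale=10.0):
    """Equation (35): scaled sum of absolute differences of pairwise correlations."""
    return float(np.sum(np.abs(correlation(real) - correlation(synth))) / scale)

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 4))
synth = real + 0.1 * rng.standard_normal((200, 4))   # toy "synthetic" series
print(correlation_score(real, real), correlation_score(real, synth))
```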

### D.3 Classification Metrics.

We quantify classification performance using accuracy, precision, recall, F1-score, and the area under the ROC curve, with larger values indicating better performance; their definitions are given in Equations ([36](https://arxiv.org/html/2605.30387#A4.E36))-([40](https://arxiv.org/html/2605.30387#A4.E40)).

$$
\text{Accuracy (ACC)} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{36}
$$

$$
\text{Precision (PRE)} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{37}
$$

$$
\text{Recall (REC)} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{38}
$$

$$
\text{F1-score} = 2 \cdot \frac{\text{PRE} \times \text{REC}}{\text{PRE} + \text{REC}} \tag{39}
$$

$$
\text{ROC} = \int_0^1 \text{TPR}(\tau)\,d\bigl(\text{FPR}(\tau)\bigr) \tag{40}
$$
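As a small worked example of (36)-(40), the sketch below computes the metrics from confusion-matrix counts and approximates the ROC integral with the trapezoidal rule. The counts and ROC points are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def classification_metrics(tp, tn, fp, fn):
    """Equations (36)-(39) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2.0 * pre * rec / (pre + rec)
    return {"ACC": acc, "PRE": pre, "REC": rec, "F1": f1}

def roc_auc(fpr, tpr):
    """Trapezoidal approximation of (40); fpr must be sorted in increasing order."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

metrics = classification_metrics(tp=8, tn=7, fp=3, fn=2)
auc = roc_auc([0.0, 0.5, 1.0], [0.0, 1.0, 1.0])
print(metrics, auc)
```

With these counts, accuracy is 15/20 = 0.75, precision is 8/11, recall is 8/10, and the F1-score is 16/21.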

## Appendix E Additional Experimental Results

Table 7: Evaluation of our proposed DSFM with class-conditional (HC vs MDD) generation under varying NFE.

### E.1 Conditional Generation Quality across time and frequency domains.

Table [7](https://arxiv.org/html/2605.30387#A5.T7) compares the generative fidelity of our DSFM framework across three domains: frequency (DCT), time-scale (DWT), and the raw time-series representations. Overall, DSFM demonstrates competitive performance in the DWT domain, achieving the lowest FID across HC and MDD subjects at NFE = 100, which indicates precise reconstruction of scale-specific BOLD dynamics. Consistently low cFID values in the time domain further confirm that the synthetic signals remain well aligned with in-distribution temporal patterns, indicating that the model is complementary to the additional spectral features. We also observe that increasing the NFE from 20 to 100 consistently reduces error across subjects. These results validate DSFM as an effective time-series-to-image framework for synthesizing biologically plausible, frequency-aligned fMRI signals across representations.
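One generic reason a larger NFE can help is that each function evaluation corresponds to one step of a numerical ODE solver, so more steps may reduce discretization error. The sketch below illustrates this with a plain Euler integrator on a toy velocity field with a known solution. The solver choice and the field are assumptions made for illustration; they are not the sampler or the trained model used for Table 7.

```python
import numpy as np

def euler_sample(velocity, z_start, nfe):
    """Integrate dz/dt = velocity(z, t) from t = 0 to t = 1 using `nfe` Euler steps."""
    z = np.asarray(z_start, dtype=float)
    dt = 1.0 / nfe
    for i in range(nfe):
        z = z + dt * velocity(z, i * dt)   # one function evaluation per step
    return z

# Toy velocity field with a known solution z(1) = z(0) * exp(-1).
velocity = lambda z, t: -z
z0 = np.array([1.0, -2.0, 0.5])
exact = z0 * np.exp(-1.0)
errors = {}
for nfe in (20, 100):
    errors[nfe] = float(np.max(np.abs(euler_sample(velocity, z0, nfe) - exact)))
    print(nfe, errors[nfe])
```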

### E.2 Full Results of Ablation Studies.

Table 8: Ablation of block size, wavelet basis, normalization strategy, spectral representation (Fourier and wavelet transforms), and type of generative model (flow matching and diffusion).

Table [8](https://arxiv.org/html/2605.30387#A5.T8) presents the complete ablation results on the MDD dataset, with additional experiments on different spectral representations and types of generative models. In comparison, the Fourier representation achieves a better context-FID score, indicating strength in capturing global distributional characteristics, but it falls short on the other metrics relative to the wavelet representation. These observations suggest that the time-frequency localization property of the wavelet representation is more effective at preserving local and multi-scale structural information. Moreover, the better results of the flow matching model with the wavelet representation across all metrics suggest a smoother and more stable noise-to-data distribution alignment than that of the diffusion-based wavelet approach. Overall, these results demonstrate that DSFM outperformed all the configurations across all time-series generative metrics with the advantage of dual-spectral transformation and spectral flow matching.

### E.3 Full Results of Classification Performance.

Table 9: Classification performance (MDD) of different generative models trained on the ground-truth data with an increasing amount of augmented time series data using our proposed model.

Table 10: Classification performance (ABIDE) of different generative models trained on the ground-truth data with an increasing amount of augmented time series data using our proposed model.

Tables [9](https://arxiv.org/html/2605.30387#A5.T9) and [10](https://arxiv.org/html/2605.30387#A5.T10) present the complete classification results on the MDD and ABIDE datasets across three augmentation levels. The consistent performance gains at each augmentation level indicate that the synthesized functional connectivity (FC) matrices accurately capture brain connectivity patterns and that the data augmentation strategy significantly improves the classifier's generalization to unseen samples. Notably, DSFM achieved better performance even with only a single augmentation level.

## Appendix F Additional Visualization

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/suppfig1.png)

Figure 6: Comparison of univariate and multivariate spectral representations: ImagenTime/T2I-Diff and our proposed DSFM.

### F.1 Spectral Image Transformations

Figure [6](https://arxiv.org/html/2605.30387#A6.F6) illustrates the forward and inverse processes of ImagenTime/T2I-Diff and DSFM applied to our fMRI signals. The top row shows univariate real-valued STFT coefficients, and the bottom row presents a single subband (Detail 1) of the multivariate DWT coefficient map. Our framework directly transforms multivariate BOLD signals into a single image representation.
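The idea of turning a multivariate signal into one coefficient map and back can be sketched with a single-level Haar transform. This is an illustration under stated assumptions: the Haar basis, the side-by-side layout of approximation and Detail-1 coefficients, and the random stand-in for BOLD data are all choices made here, not the wavelet basis or image layout used by DSFM.

```python
import numpy as np

def haar_dwt(signals):
    """Single-level Haar DWT along time for signals of shape (regions, T), T even."""
    even, odd = signals[:, 0::2], signals[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt; returns signals of shape (regions, 2 * approx.shape[1])."""
    signals = np.empty((approx.shape[0], 2 * approx.shape[1]))
    signals[:, 0::2] = (approx + detail) / np.sqrt(2.0)
    signals[:, 1::2] = (approx - detail) / np.sqrt(2.0)
    return signals

def to_image(signals):
    """Place approximation and Detail-1 coefficients side by side as one 2-D map."""
    approx, detail = haar_dwt(signals)
    return np.concatenate([approx, detail], axis=1)

def from_image(image):
    """Split the map back into its two bands and invert the transform."""
    half = image.shape[1] // 2
    return haar_idwt(image[:, :half], image[:, half:])

rng = np.random.default_rng(0)
bold = rng.standard_normal((6, 64))   # synthetic stand-in: 6 regions, 64 time points
image = to_image(bold)
print(image.shape, np.allclose(from_image(image), bold))
```

All regions share one image here, which is the multivariate aspect the paragraph above contrasts with per-channel STFT images.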

### F.2 Frequency-Specific FC Analysis

Figure [7](https://arxiv.org/html/2605.30387#A6.F7) compares the HC and MDD FC matrices against the ground-truth data correlation across different wavelet subbands. Consistent with the full-band correlation, removing the highest-frequency subbands (D1 and D2), or combining D1 with the lowest band (A5), preserves dense edge connections near the main diagonal. In contrast, removing the mid-frequency subbands (D3 and D4) results in sparser connectivity, particularly in the lower-right region of the matrices.

![Refer to caption](https://arxiv.org/html/2605.30387v1/AnonymousSubmission/LaTeX/brain_network_fig1.png)

Figure 7: Frequency-specific functional connectivity (FC) matrices for healthy controls (HC) and patients with major depressive disorder (MDD), alongside their differences. The FCs are shown under four different conditions: full-band; removal of the highest-frequency subbands (D1 + D2) and the lowest-frequency component (A5), which both yield the two highest classification scores; and removal of the mid-band subbands (D3 + D4), which produces the greatest deviations and the lowest score.
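A subband-removal analysis of this kind can be sketched as: decompose each regional signal into five levels, zero the chosen detail bands, reconstruct, and compute the Pearson correlation between regions. The code below is a minimal illustration assuming a Haar basis and random stand-in data; the wavelet basis, preprocessing, and FC estimator behind Figure 7 may differ, and removal of the approximation band (A5) is not covered.

```python
import numpy as np

def haar_decompose(signals, levels):
    """Multi-level Haar DWT along time; returns [A_L, D_L, ..., D_1]."""
    approx, details = signals, []
    for _ in range(levels):
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return [approx] + details[::-1]

def haar_reconstruct(coeffs):
    """Invert haar_decompose, starting from the coarsest level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty((approx.shape[0], 2 * approx.shape[1]))
        out[:, 0::2] = (approx + detail) / np.sqrt(2.0)
        out[:, 1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def remove_bands(signals, levels, drop):
    """Zero the detail bands named in `drop` (e.g. {1, 2} for D1 and D2) and reconstruct."""
    coeffs = haar_decompose(signals, levels)
    for level in drop:
        idx = levels - level + 1          # position of D_level in [A_L, D_L, ..., D_1]
        coeffs[idx] = np.zeros_like(coeffs[idx])
    return haar_reconstruct(coeffs)

def functional_connectivity(signals):
    """Pearson correlation between regions; signals has shape (regions, T)."""
    return np.corrcoef(signals)

rng = np.random.default_rng(0)
bold = rng.standard_normal((6, 128))      # synthetic stand-in: 6 regions, 128 time points
fc_full = functional_connectivity(bold)
fc_no_high = functional_connectivity(remove_bands(bold, levels=5, drop={1, 2}))
fc_no_mid = functional_connectivity(remove_bands(bold, levels=5, drop={3, 4}))
print(np.abs(fc_full - fc_no_high).mean(), np.abs(fc_full - fc_no_mid).mean())
```

On white-noise input the printed differences carry no physiological meaning; the sketch only shows the mechanics of comparing FC matrices with and without selected subbands.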

## Appendix G Computational cost

Training the proposed DSFM required 22 hours, 40 minutes, and 52.698 seconds of wall-clock time, while inference for generating the full set of samples took 48 minutes and 48.98 seconds, both on a single A100 GPU. The model contains 130,844,352 parameters.

## Appendix H Limitations

Currently, DSFM is specifically designed for the generation of resting-state fMRI signals. This opens a valuable opportunity to extend our work to other human brain activity signals, such as electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), and magnetoencephalography (MEG). Our spectral flow matching framework offers the flexibility to capture the spectral-temporal dynamics of other neural signals with frequency-specific representations.

## Appendix I Reproducibility statement

We provide the datasets, source code, and configurations for all key experiments, including instructions on how to preprocess data and train the models at[https://github\.com/htew0001/DSFM\.git](https://github.com/htew0001/DSFM.git)\.

## Appendix J The Use of Large Language Models (LLMs)

We used LLMs solely for grammar correction\. All ideas, analyses, and results are by the authors\.
