PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation
Summary
PrismFlow introduces a Flow Matching method with Koopman-inspired dynamical experts to handle multimodal and multiscale time-series data, achieving state-of-the-art performance with significant improvements in Context-FID and Discriminative Score.
# PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation
Source: [https://arxiv.org/html/2605.28867](https://arxiv.org/html/2605.28867)
Junru Zhang¹, Lang Feng², Jinbo Wang¹, Xu Guo², Yucheng Wang³, Han Yu², Min Wu³, Yabo Dong¹, Duanqing Xu¹
¹Zhejiang University, China; ²Nanyang Technological University, Singapore; ³I2R, Agency for Science, Technology and Research (A\*STAR), Singapore
###### Abstract
Generating high-quality time-series data is challenging because real-world signals often exhibit multimodal patterns and multiscale dynamics, including oscillations and high-frequency variations. Flow Matching (FM) offers an efficient alternative to diffusion models, but practical implementations typically rely on a single finite-capacity global vector-field estimator. In such heterogeneous temporal distributions, distinct regimes may pass through nearby flow states while requiring incompatible conditional velocities. A monolithic estimator trained with the standard $\ell_2$ velocity-matching objective may therefore learn an overly smoothed approximation of the local transport field. This estimator-level smoothing can attenuate branch-specific dynamics, leading to spectral distortion and poor mode coverage. To address this, we propose PrismFlow, a new FM method with Koopman-inspired dynamical experts. Each expert learns residual corrections in a latent space where local nonlinear temporal evolution can be approximated by linear transitions. We further propose a confidence-aware Winner-Take-All (WTA) objective that updates only the expert best aligned with each sample while masking gradients to the others, encouraging mode-specific specialization. During sampling, the selected expert adds a residual dynamical correction to the global transport field, preserving FM stability while recovering fine-grained and high-frequency temporal structures. Across various benchmarks, PrismFlow effectively mitigates the spectral contraction in standard FM and achieves state-of-the-art performance, with a 15.6% gain in Context-FID and a 38.6% improvement in Discriminative Score, while remaining robust in low-data settings and effective for forecasting and imputation.
## 1 Introduction
Time\-series data underpin decision\-making in domains such as healthcare\[[1](https://arxiv.org/html/2605.28867#bib.bib1),[2](https://arxiv.org/html/2605.28867#bib.bib2)\], finance\[[3](https://arxiv.org/html/2605.28867#bib.bib3),[4](https://arxiv.org/html/2605.28867#bib.bib4),[5](https://arxiv.org/html/2605.28867#bib.bib5)\], and environmental monitoring\[[6](https://arxiv.org/html/2605.28867#bib.bib6),[7](https://arxiv.org/html/2605.28867#bib.bib7),[8](https://arxiv.org/html/2605.28867#bib.bib8)\]\. However, the acquisition of high\-fidelity signals is frequently restricted by stringent privacy regulations and prohibitive costs\[[9](https://arxiv.org/html/2605.28867#bib.bib9),[10](https://arxiv.org/html/2605.28867#bib.bib10)\]\. This scarcity necessitates generative models capable of synthesizing sequences that are not only statistically consistent but also preserve the underlying temporal evolution of real\-world phenomena\.
Recent advances in time-series generation have shifted from adversarial frameworks \[[11](https://arxiv.org/html/2605.28867#bib.bib11),[12](https://arxiv.org/html/2605.28867#bib.bib12)\] toward simulation-free continuous transport methods, most notably Flow Matching (FM) \[[13](https://arxiv.org/html/2605.28867#bib.bib13),[14](https://arxiv.org/html/2605.28867#bib.bib14)\]. While FM offers superior stability and sampling efficiency compared to diffusion models \[[15](https://arxiv.org/html/2605.28867#bib.bib15),[16](https://arxiv.org/html/2605.28867#bib.bib16)\], its practical implementation typically relies on a single global vector-field estimator to approximate the transport dynamics \[[17](https://arxiv.org/html/2605.28867#bib.bib17)\]. Although the exact FM transport field is well-defined in principle, learning it from finite data with a monolithic estimator introduces statistical and representational challenges. In multi-modal temporal scenarios, such as those with varying frequencies or transient responses, heterogeneous temporal patterns may occupy nearby regions of the flow state space while requiring distinct sequence evolutions. Under the standard $\ell_2$ velocity-matching objective, the estimator is therefore encouraged to predict the conditional mean of incompatible target velocities, which tends to smooth out mode-specific time-series patterns. As a result, rich temporal structure can collapse into an overly smooth learned flow with constrained spectral diversity, leading to mode collapse \[[18](https://arxiv.org/html/2605.28867#bib.bib18),[19](https://arxiv.org/html/2605.28867#bib.bib19),[20](https://arxiv.org/html/2605.28867#bib.bib20),[21](https://arxiv.org/html/2605.28867#bib.bib21),[22](https://arxiv.org/html/2605.28867#bib.bib22)\]. This degrades local trajectory fidelity and restricts the expressive capacity of flow-based generators by limiting their ability to reproduce diverse dynamical patterns.
To mitigate this mode collapse, we propose PrismFlow, a new FM method that corrects the global transport field using a bank of Koopman-inspired dynamical experts. Drawing on Koopman operator theory \[[23](https://arxiv.org/html/2605.28867#bib.bib23),[24](https://arxiv.org/html/2605.28867#bib.bib24)\], our method maps nonlinear temporal evolutions into a latent space where they can be modeled as local linear transitions. Rather than forcing a single learned estimator to capture all dynamics, we treat generation as a *routing* problem: at each intermediate state, the model dynamically identifies the local temporal pattern and assigns responsibility to the expert best suited to model that specific evolution. To ensure these experts learn distinct modes, we introduce a Winner-Take-All (WTA) training objective with competitive selection. Unlike standard mixtures with soft gating \[[25](https://arxiv.org/html/2605.28867#bib.bib25),[26](https://arxiv.org/html/2605.28867#bib.bib26)\], which can reintroduce averaging effects, our approach enforces clearer expert specialization. For a given state, all experts predict candidate residual velocities, but only the expert with the best confidence-aware WTA score receives the specialization update, while gradients to the others are masked. This mechanism reduces the averaging of incompatible modes, which is the primary cause of regression-to-the-mean behavior in the learned monolithic estimator. During sampling, PrismFlow integrates the selected expert as a residual dynamical correction term added to the global transport field, thus preserving the stability of the global flow while allowing the model to recover diverse spectral components across multiple scales.
Our main contributions are summarized as follows:
- We characterize velocity averaging in learned FM models as a key bottleneck for time-series generation. Our analysis shows that single-field estimators can induce conditional-mean behavior, leading to spectral distortion and mode collapse in generated time series.
- We propose PrismFlow, a novel method that mitigates mode collapse in time-series generation. By integrating Koopman-inspired experts as residual dynamical correction terms to the global flow, we capture multi-scale dynamics that standard monolithic estimators often fail to represent.
- We propose training these experts with a WTA competitive selection rule. By assigning each flow state to a single expert and masking gradients to the others, this objective reduces regression-to-the-mean behavior under the standard $\ell_2$ loss and promotes clear mode-specific specialization.
- Empirical evaluations of PrismFlow demonstrate strong performance across diverse time-series generation tasks. PrismFlow effectively recovers diverse modes, improving Context-FID by 15.6% and Discriminative Score by 38.6%, while remaining robust in low-data settings and high-fidelity for forecasting and imputation.
## 2 Related Work
Time-Series Generation. Time-series generation has progressed from early adversarial \[[27](https://arxiv.org/html/2605.28867#bib.bib27),[28](https://arxiv.org/html/2605.28867#bib.bib28),[12](https://arxiv.org/html/2605.28867#bib.bib12),[29](https://arxiv.org/html/2605.28867#bib.bib29),[30](https://arxiv.org/html/2605.28867#bib.bib30)\] and latent-variable \[[31](https://arxiv.org/html/2605.28867#bib.bib31),[32](https://arxiv.org/html/2605.28867#bib.bib32)\] models toward iterative refinement methods that deliver higher fidelity. Representative baselines such as TimeGAN \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] and TimeVAE \[[31](https://arxiv.org/html/2605.28867#bib.bib31)\] capture global temporal dependencies, but they often suffer from training instability and tend to blur local, phase-sensitive dynamics. More recently, diffusion-based models \[[33](https://arxiv.org/html/2605.28867#bib.bib33),[34](https://arxiv.org/html/2605.28867#bib.bib34),[35](https://arxiv.org/html/2605.28867#bib.bib35),[36](https://arxiv.org/html/2605.28867#bib.bib36)\], including Diffwave \[[35](https://arxiv.org/html/2605.28867#bib.bib35)\] and Diffusion-TS \[[36](https://arxiv.org/html/2605.28867#bib.bib36)\], have achieved strong synthesis quality by casting generation as progressive denoising. SDformer \[[37](https://arxiv.org/html/2605.28867#bib.bib37)\] further explores discrete sequence modeling with a large-parameter diffusion model. While diffusion models provide impressive fidelity and distributional alignment, their many-step inference remains computationally expensive, limiting real-time and large-scale deployment.
Flow Matching. Flow Matching (FM) is a simulation-free framework for training continuous normalizing flows, combining stable objectives with efficient ordinary differential equation (ODE)-based sampling \[[13](https://arxiv.org/html/2605.28867#bib.bib13),[14](https://arxiv.org/html/2605.28867#bib.bib14)\]. By regressing time-dependent velocity fields along predefined probability paths, FM avoids expensive trajectory simulations during training and enables deterministic generation via standard solvers. This efficiency has yielded strong results in image synthesis \[[38](https://arxiv.org/html/2605.28867#bib.bib38)\], video generation \[[39](https://arxiv.org/html/2605.28867#bib.bib39)\], stable neural ODE dynamics \[[40](https://arxiv.org/html/2605.28867#bib.bib40)\], time-series foundation modeling \[[41](https://arxiv.org/html/2605.28867#bib.bib41),[42](https://arxiv.org/html/2605.28867#bib.bib42)\], and probabilistic forecasting \[[43](https://arxiv.org/html/2605.28867#bib.bib43),[44](https://arxiv.org/html/2605.28867#bib.bib44)\]. Recently, TimeMCL \[[45](https://arxiv.org/html/2605.28867#bib.bib45)\] introduced a multiple-choice learning approach for forecasting diverse futures. Complementary to such output-level diversity methods, our work focuses on estimator-level averaging in monolithic FM implementations. When heterogeneous temporal modes pass through nearby flow states, the standard $\ell_2$ objective may drive a single estimator toward the conditional mean of incompatible velocities, yielding over-smoothed trajectories and reduced spectral diversity. Rather than treating this as a flaw of the exact FM transport field, we address the practical limitation of its finite-sample approximation by augmenting the global estimator with dynamically routed Koopman-inspired residual experts.
## 3 Preliminaries
Problem Setting. Let $x\in\mathcal{X}=\mathbb{R}^{S\times D}$ denote a multivariate time series with $S$ temporal steps and $D$ channels. Each sequence $x$ is drawn from an unknown data distribution $q(x)$ over $\mathcal{X}$. The goal of generative modeling is to learn a parameterized distribution $p_\theta$ that approximates $q$, enabling the synthesis of sequences $\hat{x}$ that preserve both the statistical properties and temporal dynamics of real-world data.
Flow Matching. Flow Matching (FM) \[[13](https://arxiv.org/html/2605.28867#bib.bib13)\] is a simulation-free framework for training Continuous Normalizing Flows. It generates samples by transporting a simple source distribution $p_0$ to the data distribution through a time-dependent vector field $v_t^\theta:[0,1]\times\mathcal{X}\to\mathcal{X}$. The transformation is governed by the ODE:
$$\frac{\mathrm{d}}{\mathrm{d}t}x_t=v_t^\theta(x_t,t),\qquad x_{t=0}=x_0, \tag{1}$$
which induces a probability path $\{p_t\}_{t\in[0,1]}$ with $x_t\sim p_t$ for $x_0\sim p_0$. We parameterize $v_t^\theta(x_t,t)$ with an encoder-decoder network whose parameters are denoted by $\theta=(\phi_\eta,\phi_\zeta)$. Following standard Conditional Flow Matching (CFM), we adopt the linear interpolation between $x_0\sim p_0=\mathcal{N}(\mathbf{0},\mathbf{I})$ and $x_1\sim q$: $x_t=(1-t)x_0+tx_1$, whose target velocity is constant:
$$\frac{\mathrm{d}}{\mathrm{d}t}x_t=x_1-x_0. \tag{2}$$
The model is trained by minimizing:
$$\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t,x_0,x_1}\left[\left\|v_t^\theta(x_t,t)-(x_1-x_0)\right\|_2^2\right], \tag{3}$$
where $t\sim\mathcal{U}[0,1]$. To generate a sample, we draw $x_0\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and integrate the learned ODE from $t=0$ to $t=1$:
$$x_{t+\Delta t}=x_t+v_t^\theta(x_t,t)\,\Delta t, \tag{4}$$
using a numerical solver to obtain $\hat{x}=x_{t=1}$.
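To make Eqs. (2)-(4) concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a toy setting in which the data distribution is a single fixed sequence `x1_star`, so the exact marginal velocity has the closed form $(x_1^\star-x)/(1-t)$ and no network is needed. The helper names `cfm_pair`, `cfm_loss`, and `euler_sample` are hypothetical.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Linear interpolant x_t and its constant target velocity (Eq. 2)."""
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def cfm_loss(v_pred, u_t):
    """Squared l2 velocity-matching error of Eq. (3) for a single sample."""
    return float(np.sum((v_pred - u_t) ** 2))

def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with the update of Eq. (4)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for n in range(num_steps):
        t = n * dt  # t never reaches 1, so the toy field below stays finite
        x = x + velocity(x, t) * dt
    return x

# Toy assumption: all data mass sits on one sequence x1_star (S=4 steps, D=2 channels),
# so the exact marginal velocity is (x1_star - x) / (1 - t).
rng = np.random.default_rng(0)
x1_star = rng.normal(size=(4, 2))
exact_velocity = lambda x, t: (x1_star - x) / (1.0 - t)

x0 = rng.normal(size=(4, 2))
x_t, u_t = cfm_pair(x0, x1_star, t=0.3)
loss_at_exact = cfm_loss(exact_velocity(x_t, 0.3), u_t)
x_hat = euler_sample(exact_velocity, x0, num_steps=10)
```

In this toy case the closed-form field attains (numerically) zero CFM loss and the Euler iterates land on `x1_star` at $t=1$. In practice a learned $v_t^\theta$ would take the place of `exact_velocity`.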
Mode Collapse. While the $\ell_2$ objective in Eq. ([3](https://arxiv.org/html/2605.28867#S3.E3)) is effective, practical single-field estimators can become mean-seeking in multi-modal temporal settings. In standard FM, when trajectories from different temporal regimes pass through nearby flow states $x_t$, the local target velocity distribution $q(u_t\mid x_t)$ can become highly heterogeneous, where $u_t=x_1-x_0$ denotes the CFM target velocity. Under the $\ell_2$ loss, a finite-capacity estimator may learn a smoothed approximation around the conditional-average trend, $v_t^*(x_t,t)=\mathbb{E}[u_t\mid x_t]$, rather than effectively preserving branch-specific velocity directions. When nearby samples correspond to incompatible temporal regimes, this estimator-level averaging can reduce the effective velocity energy, $\|\mathbb{E}[u_t\mid x_t]\|_2^2\leq\mathbb{E}[\|u_t\|_2^2\mid x_t]$, and may attenuate transient branches and high-frequency components after ODE integration. From a Dynamic Mode Decomposition (DMD) perspective \[[46](https://arxiv.org/html/2605.28867#bib.bib46)\], this practical smoothing manifests as spectral contraction, where energy concentrates into a few slowly varying modes while faster or weaker modes vanish. We study this specific form of mode collapse \[[19](https://arxiv.org/html/2605.28867#bib.bib19),[20](https://arxiv.org/html/2605.28867#bib.bib20)\] in time-series generation, which is further validated by a Gaussian-mixture diagnostic in the Appendix.
Figure 1: Overall architecture of PrismFlow. Given a flow state $x_t$, a flow matching backbone predicts a global transport velocity $v_t^\theta$. In parallel, a shared encoder and projector map $x_t$ into a latent Koopman space, where a bank of linear experts models different temporal modes. Each expert evolves its latent state through a linear velocity, which is decoded by $g_\psi$ into a candidate residual velocity $\{v_t^{\psi,k}\}_{k=1}^{K}$. A router produces expert probabilities $\pi_t$, and a WTA score selects the dominant expert $k_t^*$. During training, $\mathcal{L}_{\mathrm{WTA}}$ assigns the specialization signal to the selected expert, $\mathcal{L}_{\mathrm{CFM}}$ aligns the global transport with the target distribution, and $\mathcal{L}_{\mathrm{BAL}}$ encourages balanced expert usage. During sampling, the selected expert provides a residual correction to the global transport field.
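The velocity-energy inequality in this paragraph can be checked on a hand-built example. The sketch below assumes two equally likely branches that share one flow state but require opposite velocities. The numbers are illustrative and are not taken from the paper.

```python
# Two incompatible regimes pass through the same flow state x_t:
# one requires velocity u = (+1, +2), the other u = (-1, -2), equally likely.
branch_velocities = [(1.0, 2.0), (-1.0, -2.0)]
weights = [0.5, 0.5]

def sq_norm(v):
    """Squared l2 norm of a vector given as a tuple."""
    return sum(c * c for c in v)

dim = len(branch_velocities[0])
# The l2-optimal single-field prediction is the conditional mean E[u | x_t].
mean_velocity = tuple(
    sum(w * u[d] for w, u in zip(weights, branch_velocities)) for d in range(dim)
)

energy_of_mean = sq_norm(mean_velocity)  # ||E[u | x_t]||^2
mean_of_energy = sum(                    # E[||u||^2 | x_t]
    w * sq_norm(u) for w, u in zip(weights, branch_velocities)
)
```

Here the $\ell_2$-optimal prediction is the zero vector, while each branch carries squared velocity norm 5. The gap between the two quantities equals the trace of the conditional covariance of $u_t$, so it grows with how strongly the branches disagree.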
## 4 Methodology
We introduce PrismFlow, a new flow matching method designed to mitigate estimator-level mean seeking. As shown in Fig. [1](https://arxiv.org/html/2605.28867#S3.F1), PrismFlow preserves the global transport process of FM while augmenting it with residual dynamical corrections. The method contains two core components: (i) Mode-Specific Experts, a bank of Koopman-inspired experts that captures distinct temporal regimes and allows local dynamics to deviate from the global average; and (ii) Residual Routing, a Winner-Take-All (WTA) mechanism that assigns each flow state to a dominant expert. By learning residual velocities that complement the global estimator, PrismFlow recovers multi-scale temporal structures while retaining the simulation-free efficiency of standard FM.
In the following, we detail the components of PrismFlow by \(1\) defining Koopman\-based linear experts that provide mode\-specific residual dynamics \(Sec\.[4\.1](https://arxiv.org/html/2605.28867#S4.SS1)\); \(2\) introducing the WTA routing policy for hard expert selection \(Sec\.[4\.2](https://arxiv.org/html/2605.28867#S4.SS2)\); \(3\) describing the multi\-objective training that enforces gradient\-masked specialization while balancing expert utilization \(Sec\.[4\.3](https://arxiv.org/html/2605.28867#S4.SS3)\); and \(4\) presenting the sampling procedure where expert\-informed residuals complement the global transport estimator to preserve spectral diversity \(Sec\.[4\.4](https://arxiv.org/html/2605.28867#S4.SS4)\)\.
### 4.1 Koopman Experts for Structured Mode-Seeking
To mitigate estimator-level averaging across incompatible temporal regimes, such as oscillations and sharp transients, we introduce a bank of $K$ experts that capture distinct evolution patterns. When multiple temporal modes can explain nearby flow states, different experts should take responsibility for different regimes rather than forcing a single estimator to absorb them through a mean-seeking prediction. To model such structured dynamics efficiently, we adopt Koopman theory \[[23](https://arxiv.org/html/2605.28867#bib.bib23)\], which suggests that nonlinear temporal evolution in the original state space can admit an approximately linear description in a latent representation. Formally, for $x^{(s)}\rightarrow x^{(s+1)}$, there exist an observable map $f$ and a linear operator $A$ such that $f(x^{(s+1)})=A\circ f(x^{(s)})$. We therefore learn a latent mapping with a shared encoder and parameterize each expert as a finite-dimensional linear generator.
Latent linear experts. We construct a finite-dimensional Koopman embedding using a shared encoder $\phi_\eta$ and a projector $\xi$. Given a flow state $x_t$, we compute $h_t=\phi_\eta(x_t)$ and project it into the Koopman latent space as $z_t=\xi(h_t)$. PrismFlow maintains $K$ experts, each parameterized by a learnable matrix $A^k\in\mathbb{R}^{d_z\times d_z}$. Conditioned on expert $k$, the latent velocity field is:
$$\frac{\mathrm{d}z_t}{\mathrm{d}t}=A^k z_t. \tag{5}$$
Each $A^k$ defines a coherent latent transport mode along the flow path. Its spectrum controls the rotational and contractive components of the latent velocity over flow time, allowing different experts to specialize in distinct local transport regimes while retaining linear latent evolution.
Expert velocity decoding. To correct the data-space flow, we project the latent velocity $A^k z_t$ back to $\mathcal{X}$ using an expert velocity decoder $g_\psi$:
$$v_t^{\psi,k}(x_t,t)=g_\psi\!\left(z_t,\,A^k z_t\right). \tag{6}$$
The decoder maps structured linear dynamics in the Koopman space into a nonlinear data-space residual velocity.
Spectral constraints and plasticity. Unconstrained learning of $A^k$ can lead to unstable dynamics and unbounded time-series data. To encourage stable trajectories while preserving expressive temporal structure, we enforce dissipativity by requiring $\frac{1}{2}\bigl(A^k+(A^k)^\top\bigr)\preceq-\delta I$ with $\delta\geq 0$. We impose this constraint by parameterizing:
$$A^k=\bigl(S^k-(S^k)^\top\bigr)-(R^k)^\top R^k-\delta I, \tag{7}$$
where $S^k\in\mathbb{R}^{d_z\times d_z}$ generates the skew-symmetric component that supports oscillatory modes through latent rotations, and $R^k\in\mathbb{R}^{d_z\times d_z}$ parameterizes the dissipative term that damps transient dynamics. The margin $-\delta I$ ensures spectral dissipativity. This construction keeps $S^k$ and $R^k$ unconstrained during optimization while guaranteeing stable latent dynamics and preserving broad spectral diversity.
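The parameterization in Eq. (7) can be verified numerically. The following NumPy sketch builds one expert matrix from random, unconstrained $S$ and $R$ and checks the dissipativity condition. The latent size `d_z = 6`, the margin `delta = 0.1`, and the function name `dissipative_generator` are assumptions chosen for illustration.

```python
import numpy as np

def dissipative_generator(S, R, delta):
    """Build A = (S - S^T) - R^T R - delta * I as in Eq. (7)."""
    d = S.shape[0]
    return (S - S.T) - R.T @ R - delta * np.eye(d)

rng = np.random.default_rng(0)
d_z, delta = 6, 0.1              # hypothetical latent size and margin
S = rng.normal(size=(d_z, d_z))  # unconstrained parameters
R = rng.normal(size=(d_z, d_z))
A = dissipative_generator(S, R, delta)

# The skew-symmetric term cancels in the symmetric part, leaving -R^T R - delta * I,
# whose largest eigenvalue is at most -delta because R^T R is positive semidefinite.
sym_part = 0.5 * (A + A.T)
max_sym_eig = float(np.linalg.eigvalsh(sym_part).max())

# The real part of every eigenvalue of A is bounded by the largest eigenvalue
# of its symmetric part, so the latent dynamics dz/dt = A z cannot grow.
max_real_part = float(np.linalg.eigvals(A).real.max())
```

Because the bound holds for any real `S` and `R`, a gradient-based optimizer can update both freely without a projection step, which is the design benefit the text attributes to this construction.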
### 4.2 WTA Competitive Routing for Flow Matching
While Koopman experts provide a structured way to model distinct temporal modes, PrismFlow still needs to associate each intermediate flow state with an appropriate expert. In multi-modal regions, several experts may induce plausible local corrections for the same $x_t$, while soft mixing can average their velocities and reintroduce regression-to-the-mean behavior. We therefore introduce a Winner-Take-All (WTA) routing rule that assigns each flow state to a dominant expert and encourages mode-specific residual dynamics along the trajectory.
Flow routing policy. Along the flow-matching probability path, we use a learnable routing network $\mathcal{R}^{\omega}$ to assign flow states to experts. Given flow time $t$ and state $x_t$, the router outputs a categorical distribution over $K$ experts:
$$\pi_t^{\,k}=\mathcal{R}^{\omega}(t,\phi_\eta(x_t)),\quad\text{s.t.}\ \pi_t^{\,k}\geq 0,\ \sum\nolimits_{k=1}^{K}\pi_t^{\,k}=1. \tag{8}$$
Here, $\pi_t^{\,k}$ denotes the probability of selecting expert $k$ at flow time $t$. The dependence on $t$ allows routing decisions to adapt to different noise-to-data stages of the transport process.
Confidence-aware WTA selection. During training, the target endpoint $x_1$ provides a natural signal for expert assignment. For each expert $k$, we estimate a candidate endpoint $\hat{x}_1^{\,k}$ using the expert-informed velocity from the current state toward the endpoint. We then define a confidence-aware score:
$$\mathcal{S}_t^{k}(t,x_t,x_1)=\|\hat{x}_1^{\,k}-x_1\|_2^2-\beta\log\bigl(\pi_t^{\,k}+\varepsilon\bigr), \tag{9}$$
where $\beta$ controls the confidence regularization and $\varepsilon>0$ ensures numerical stability. The winning expert is selected by a hard WTA rule:
$$k_t^{*}=\arg\min_{k\in\{1,\ldots,K\}}\mathcal{S}_t^{k}(t,x_t,x_1). \tag{10}$$
So, at each flow time $t$, a single dominant expert provides the primary residual correction for the current state. This hard assignment avoids soft velocity averaging and promotes specialization across temporal regimes. The score in Eq. ([9](https://arxiv.org/html/2605.28867#S4.E9)) also stabilizes routing by combining endpoint error with router confidence, reducing noisy early assignments and allowing specialization to develop more smoothly.
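A minimal plain-Python sketch of Eqs. (9)-(10) follows. It takes the candidate endpoints $\hat{x}_1^{\,k}$ as given inputs, since the exact way they are computed from the expert velocities is not spelled out in this section. The function names, the toy vectors, and the value `beta=0.1` are assumptions for illustration only.

```python
import math

def wta_scores(endpoint_candidates, x1, router_probs, beta=0.1, eps=1e-8):
    """Confidence-aware score of Eq. (9) for every expert (lower is better)."""
    scores = []
    for x_hat, pi_k in zip(endpoint_candidates, router_probs):
        sq_err = sum((a - b) ** 2 for a, b in zip(x_hat, x1))
        scores.append(sq_err - beta * math.log(pi_k + eps))
    return scores

def select_winner(scores):
    """Hard WTA rule of Eq. (10): index of the smallest score."""
    return min(range(len(scores)), key=scores.__getitem__)

# Toy inputs: a flattened target endpoint, three expert endpoint estimates,
# and the router's probabilities for the same flow state.
x1 = [1.0, 0.0, -1.0]
candidates = [[0.9, 0.1, -1.0], [1.0, 0.0, -0.8], [0.0, 0.0, 0.0]]
probs = [0.2, 0.7, 0.1]

winner = select_winner(wta_scores(candidates, x1, probs))
winner_error_only = select_winner(wta_scores(candidates, x1, probs, beta=0.0))
```

With these numbers expert 0 has the smallest endpoint error, yet expert 1 wins once the confidence term is included, because the router already assigns it most of the probability mass. Setting `beta=0.0` recovers selection by endpoint error alone.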
### 4.3 Learning Mode-Specific Dynamical Experts
In this part, we describe the training pipeline of PrismFlow, which learns mode-specific experts to complement the global FM transport in multi-modal regimes. The global transport is supervised by the CFM loss $\mathcal{L}_{\mathrm{CFM}}$ in Eq. ([3](https://arxiv.org/html/2605.28867#S3.E3)), which anchors the overall noise-to-data mapping and preserves distributional alignment. To mitigate estimator-level mean seeking and the resulting mode collapse, we introduce a WTA loss that updates only the selected expert for each $(t,x_t)$. Given the winner $k_t^{*}$ from Eq. ([10](https://arxiv.org/html/2605.28867#S4.E10)), we optimize
$$\mathcal{L}_{\mathrm{WTA}}=\mathbb{E}_{t,x_0,x_1}\!\left[\lambda_t\,\mathcal{S}_t^{k_t^{*}}(t,x_t,x_1)\right], \tag{11}$$
where $\lambda_t$ is a time-dependent weighting schedule along the flow trajectory. Gradients from $\mathcal{L}_{\mathrm{WTA}}$ are explicitly masked for all non-selected experts, so specialization is driven only by the expert that best explains the current sample.
To avoid routing collapse and encourage diverse expert usage, we add a load\-balancing regularizer:
$$\bar{\pi}=\frac{1}{B}\sum_{i=1}^{B}\pi_{t^{(i)}}^{(i)},\qquad\mathcal{L}_{\mathrm{BAL}}=\mathrm{KL}\!\left(u\,\middle\|\,\bar{\pi}\right), \tag{12}$$
where $u=(1/K,\ldots,1/K)$ is the uniform prior and $B$ is the batch size. Since hard routing can overuse a few experts early in training, $\mathcal{L}_{\mathrm{BAL}}$ penalizes batch-level imbalance, mitigating dead experts and promoting sufficient specialization signals for all experts.
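As a small illustration of Eq. (12), the sketch below computes $\mathrm{KL}(u\,\|\,\bar{\pi})$ for two hand-made batches of router outputs. The function name `load_balance_loss` and the `eps` clamp are assumptions added here for numerical safety. They are not stated in the text.

```python
import math

def load_balance_loss(router_probs_batch, eps=1e-8):
    """KL(u || pi_bar) from Eq. (12); pi_bar is the batch-mean routing distribution."""
    batch_size = len(router_probs_batch)
    num_experts = len(router_probs_batch[0])
    pi_bar = [
        sum(row[k] for row in router_probs_batch) / batch_size
        for k in range(num_experts)
    ]
    u = 1.0 / num_experts
    # eps is an added safeguard (an assumption) against log(0) for unused experts.
    return sum(u * math.log(u / max(p, eps)) for p in pi_bar)

# Each sample routes confidently, but to a different expert: balanced on average.
balanced_batch = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]
# Every sample prefers expert 0: routing has collapsed.
collapsed_batch = [[0.98, 0.01, 0.01]] * 3

balanced_loss = load_balance_loss(balanced_batch)
collapsed_loss = load_balance_loss(collapsed_batch)
```

The balanced batch incurs essentially zero penalty even though each individual routing decision is peaked, which matches the stated intent: the regularizer targets batch-level imbalance and leaves per-sample confidence free for the WTA score to use.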
Thus, the overall objective is
$$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{CFM}}(\theta)+\alpha_W\,\mathcal{L}_{\mathrm{WTA}}(\eta,\psi,\xi,\omega,A)+\alpha_B\,\mathcal{L}_{\mathrm{BAL}}(\omega). \tag{13}$$
Here, $\mathcal{L}_{\mathrm{CFM}}$ learns the shared global transport estimator, $\mathcal{L}_{\mathrm{WTA}}$ promotes mode-specific residual corrections through hard expert assignment, and $\mathcal{L}_{\mathrm{BAL}}$ prevents degenerate routing.
Table 1: Results of all methods on all datasets (↓ indicates that lower is better; "-" marks results that are not reported).

| Metric | Method | Sines | Stocks | ETTh | MuJoCo | Energy | fMRI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Context-FID Score ↓ | Ours | 0.003±.000 | 0.027±.008 | 0.015±.001 | 0.006±.005 | 0.022±.002 | 0.155±.002 |
| | SMF \[[44](https://arxiv.org/html/2605.28867#bib.bib44)\] | 0.005±.001 | 0.023±.003 | 0.059±.007 | 0.019±.002 | 0.049±.007 | 0.116±.006 |
| | Diffusion-TS \[[36](https://arxiv.org/html/2605.28867#bib.bib36)\] | 0.014±.001 | 0.208±.052 | 0.132±.010 | 0.015±.001 | 0.092±.020 | 0.108±.006 |
| | DiffTime \[[7](https://arxiv.org/html/2605.28867#bib.bib7)\] | 0.008±.002 | 0.236±.074 | 0.299±.044 | 0.188±.028 | 0.289±.045 | 0.324±.045 |
| | Diffwave \[[35](https://arxiv.org/html/2605.28867#bib.bib35)\] | 0.014±.002 | 0.232±.032 | 0.873±.061 | 0.393±.041 | 1.031±.131 | 0.244±.018 |
| | TimeGAN \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.101±.014 | 0.105±.025 | 0.300±.013 | 0.565±.028 | 0.742±.133 | 1.292±.218 |
| | TimeVAE \[[31](https://arxiv.org/html/2605.28867#bib.bib31)\] | 0.307±.060 | 0.215±.035 | 0.805±.186 | 0.251±.015 | 1.631±.142 | 14.449±.969 |
| | Cot-GAN \[[29](https://arxiv.org/html/2605.28867#bib.bib29)\] | 1.337±.068 | 0.408±.086 | 0.980±.071 | 1.094±.079 | 1.039±.028 | 7.813±.550 |
| Correlational Score ↓ | Ours | 0.017±.004 | 0.007±.003 | 0.022±.010 | 0.203±.030 | 0.495±.033 | 0.898±.025 |
| | SMF \[[44](https://arxiv.org/html/2605.28867#bib.bib44)\] | 0.027±.012 | 0.010±.007 | 0.034±.006 | 0.210±.032 | 0.900±.277 | 0.774±.017 |
| | Diffusion-TS \[[36](https://arxiv.org/html/2605.28867#bib.bib36)\] | 0.060±.008 | 0.013±.009 | 0.059±.008 | 0.188±.035 | 0.859±.183 | 1.416±.028 |
| | DiffTime \[[7](https://arxiv.org/html/2605.28867#bib.bib7)\] | 0.017±.004 | 0.006±.002 | 0.067±.005 | 0.238±.031 | 1.158±.095 | 1.701±.048 |
| | Diffwave \[[35](https://arxiv.org/html/2605.28867#bib.bib35)\] | 0.022±.005 | 0.030±.020 | 0.175±.006 | 0.479±.018 | 5.001±.154 | 3.927±.049 |
| | TimeGAN \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.045±.010 | 0.063±.005 | 0.210±.006 | 0.886±.035 | 4.010±.104 | 23.502±.039 |
| | TimeVAE \[[31](https://arxiv.org/html/2605.28867#bib.bib31)\] | 0.131±.010 | 0.095±.008 | 0.111±.020 | 0.388±.041 | 1.728±.226 | 17.296±.526 |
| | Cot-GAN \[[29](https://arxiv.org/html/2605.28867#bib.bib29)\] | 0.049±.010 | 0.087±.004 | 0.249±.009 | 1.042±.007 | 3.164±.061 | 26.824±.449 |
| Discriminative Score ↓ | Ours | 0.004±.003 | 0.030±.004 | 0.005±.004 | 0.004±.008 | 0.039±.008 | 0.129±.053 |
| | SMF \[[44](https://arxiv.org/html/2605.28867#bib.bib44)\] | 0.005±.006 | 0.027±.014 | 0.017±.009 | 0.005±.004 | 0.150±.018 | 0.136±.139 |
| | Diffusion-TS \[[36](https://arxiv.org/html/2605.28867#bib.bib36)\] | 0.006±.005 | 0.086±.024 | 0.077±.010 | 0.012±.006 | 0.139±.012 | 0.200±.083 |
| | DiffTime \[[7](https://arxiv.org/html/2605.28867#bib.bib7)\] | 0.013±.006 | 0.107±.016 | 0.110±.007 | 0.154±.045 | 0.485±.002 | 0.245±.051 |
| | Diffwave \[[35](https://arxiv.org/html/2605.28867#bib.bib35)\] | 0.017±.008 | 0.232±.061 | 0.190±.008 | 0.203±.096 | 0.473±.004 | 0.402±.029 |
| | TimeGAN \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.011±.008 | 0.102±.021 | 0.114±.055 | 0.238±.068 | 0.236±.012 | 0.484±.042 |
| | TimeVAE \[[31](https://arxiv.org/html/2605.28867#bib.bib31)\] | 0.041±.044 | 0.145±.120 | 0.209±.058 | 0.230±.102 | 0.499±.000 | 0.476±.044 |
| | Cot-GAN \[[29](https://arxiv.org/html/2605.28867#bib.bib29)\] | 0.254±.137 | 0.230±.016 | 0.325±.099 | 0.426±.022 | 0.498±.002 | 0.492±.018 |
| | RNN-AR \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.495±.001 | 0.226±.035 | - | - | 0.483±.004 | - |
| Predictive Score ↓ | Ours | 0.093±.000 | 0.037±.000 | 0.122±.003 | 0.006±.003 | 0.251±.003 | 0.100±.000 |
| | SMF \[[44](https://arxiv.org/html/2605.28867#bib.bib44)\] | 0.093±.000 | 0.037±.000 | 0.123±.005 | 0.008±.001 | 0.193±.045 | 0.100±.000 |
| | Diffusion-TS \[[36](https://arxiv.org/html/2605.28867#bib.bib36)\] | 0.095±.000 | 0.037±.000 | 0.118±.008 | 0.007±.001 | 0.251±.000 | 0.100±.000 |
| | DiffTime \[[7](https://arxiv.org/html/2605.28867#bib.bib7)\] | 0.093±.000 | 0.038±.001 | 0.121±.004 | 0.010±.001 | 0.252±.000 | 0.100±.000 |
| | Diffwave \[[35](https://arxiv.org/html/2605.28867#bib.bib35)\] | 0.093±.000 | 0.047±.000 | 0.130±.001 | 0.013±.000 | 0.251±.000 | 0.101±.000 |
| | TimeGAN \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.093±.019 | 0.038±.001 | 0.124±.001 | 0.025±.003 | 0.273±.004 | 0.126±.002 |
| | TimeVAE \[[31](https://arxiv.org/html/2605.28867#bib.bib31)\] | 0.093±.000 | 0.039±.000 | 0.126±.004 | 0.012±.002 | 0.292±.000 | 0.113±.003 |
| | Cot-GAN \[[29](https://arxiv.org/html/2605.28867#bib.bib29)\] | 0.100±.000 | 0.047±.001 | 0.129±.000 | 0.068±.009 | 0.259±.000 | 0.185±.003 |
| | RNN-AR \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] | 0.150±.022 | 0.038±.001 | - | - | 0.315±.005 | - |
| | Original | 0.094±.001 | 0.036±.001 | 0.121±.005 | 0.007±.001 | 0.250±.003 | 0.090±.001 |
### 4\.4Sampling with Expert\-Informed Residual Dynamics
During inference, PrismFlow follows the globally learned FM transport field while the router selects a dominant expert $k_t^{*}=\arg\max_{k}\pi_t^{k}$ for each state. The selected expert provides a residual correction that steers the trajectory toward mode-consistent local evolution:

$$x_{t+\Delta t}=x_t+\Big(v_t^{\theta}(x_t,t)+\gamma\lambda_t\,v_t^{\psi,k_t^{*}}(x_t,t)\Big)\Delta t, \tag{14}$$

This update preserves spectral patterns that tend to be lost under global averaging, yielding samples that are statistically consistent and dynamically expressive.
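The update in Eq. (14) can be read as a plain Euler step with a hard-routed residual term. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `global_field`, `experts`, `router`, and `lam` are hypothetical callables standing in for $v_t^{\theta}$, $v_t^{\psi,k}$, $\pi_t$, and $\lambda_t$, and integration over $t\in[0,1]$ with a fixed step size is assumed.

```python
import numpy as np

def sample_with_residual_experts(x0, global_field, experts, router, lam,
                                 gamma=1.0, n_steps=50):
    """Euler integration of the Eq. (14) update (all callables are hypothetical).

    global_field(x, t) -> array like x   # stands in for v_t^theta
    experts[k](x, t)   -> array like x   # stands in for v_t^{psi,k}
    router(x, t)       -> array of K routing weights pi_t^k
    lam(t)             -> scalar schedule lambda_t
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps  # assumed integration interval [0, 1]
    for i in range(n_steps):
        t = i * dt
        pi = router(x, t)
        k_star = int(np.argmax(pi))  # hard selection of the dominant expert
        residual = experts[k_star](x, t)
        x = x + (global_field(x, t) + gamma * lam(t) * residual) * dt
    return x
```

Setting `gamma=0.0` in this sketch disables the residual term and recovers sampling from the global field alone, which corresponds to the "Sampling $\gamma=0$" ablation variant.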
## 5Empirical Evaluation
We evaluate PrismFlow on real\-world and synthetic datasets for unconditional and conditional generation, measuring sample quality, spectral diversity, and multi\-scale temporal preservation\. We also run ablations to verify each component and examine how experts partition the data manifold\.
Datasets & Metrics. Following \[[12](https://arxiv.org/html/2605.28867#bib.bib12),[36](https://arxiv.org/html/2605.28867#bib.bib36)\], we evaluate our model on four real-world benchmarks (Stocks, ETTh, Energy, and fMRI) and two synthetic datasets (Sines and MuJoCo), covering varying degrees of dynamical complexity. For evaluation, we adopt Context-FID \[[30](https://arxiv.org/html/2605.28867#bib.bib30)\] to measure distributional alignment and the Correlational Score \[[47](https://arxiv.org/html/2605.28867#bib.bib47)\] to assess temporal dependencies. We additionally use the Discriminative Score \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] and Predictive Score \[[12](https://arxiv.org/html/2605.28867#bib.bib12)\] to evaluate statistical fidelity.
### 5\.1Evaluation of Mode Diversity
Figure 2: Comparison of DMD eigenvalues between real and generated time series for PrismFlow and vanilla FM. The imaginary axis represents the oscillation frequency (larger absolute values indicate faster oscillations), while the distance from the origin indicates system stability.

We first examine mode collapse by comparing PrismFlow with vanilla Flow Matching. Figure [2](https://arxiv.org/html/2605.28867#S5.F2) illustrates the DMD eigenvalue distributions on the Stocks and fMRI datasets. The imaginary axis reflects oscillation frequency, and a wider spread indicates richer temporal dynamics across slow and fast variations. The real data (blue circles) exhibit broad spectral support and clear geometric structure, suggesting complex multi-scale dynamics rather than a single dominant time scale.
Vanilla FM shows evident spectral contraction: its eigenvalues (gray squares) cluster near the real axis and span a narrower range along the imaginary direction. This pattern indicates that the learned single-field estimator favors a few slow modes while attenuating high-frequency and transient components after ODE integration. Such contraction is consistent with the estimator-level regression-to-the-mean behavior discussed in Sec. [3](https://arxiv.org/html/2605.28867#S3), and results in overly smooth generated trajectories. In contrast, PrismFlow (pink triangles) achieves much denser spectral overlap with the real data. By using expert-informed residual dynamics to complement the global transport estimator, PrismFlow reduces mean-seeking effects and preserves a wider range of oscillatory modes. Consequently, the generated signals retain broader spectral diversity and multi-scale temporal structure, better matching the dynamical patterns of the empirical distribution.
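For readers who want to reproduce this kind of spectral comparison, the sketch below computes DMD eigenvalues with the standard SVD-based procedure. It is illustrative only: `dmd_eigenvalues` is a hypothetical helper, and the rank truncation, any delay embedding, and the eigenvalue convention used for Figure 2 are not specified in this section, so the discrete-time eigenvalues returned here are an assumption.

```python
import numpy as np

def dmd_eigenvalues(series, rank=None):
    """Discrete-time DMD eigenvalues of a (T, D) series whose rows are time steps."""
    X = series[:-1].T  # snapshots x_0 ... x_{T-2}, shape (D, T-1)
    Y = series[1:].T   # shifted snapshots x_1 ... x_{T-1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        # Keep only numerically nonzero singular directions.
        rank = int(np.sum(s > 1e-10 * s[0]))
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced linear operator: the projection of Y X^+ onto the leading modes.
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(A_tilde)

# Toy check: a damped rotation has eigenvalues 0.95 * exp(+/- 0.3i).
theta, rho = 0.3, 0.95
A = rho * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
traj = [np.array([1.0, 0.0])]
for _ in range(49):
    traj.append(A @ traj[-1])
eigs = dmd_eigenvalues(np.stack(traj))
```

In this toy case the eigenvalue angle encodes the oscillation frequency and the modulus encodes decay, so a generator that loses fast modes would produce eigenvalues whose angles collapse toward zero.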
Table 2: Performance under different training data sizes on the Stocks dataset.

| Metric | Method | 20% | 40% | 60% | 80% |
| --- | --- | --- | --- | --- | --- |
| Context-FID Score ↓ | Ours | 1.920±.574 | 1.890±.251 | 1.789±.437 | 0.824±.434 |
| | SMF | 6.406±.222 | 3.900±.455 | 2.613±.567 | 0.887±.054 |
| | Diffusion-TS | 5.197±.583 | 3.469±.389 | 2.854±.583 | 0.860±.204 |
| Discriminative Score ↓ | Ours | 0.303±.004 | 0.242±.012 | 0.194±.005 | 0.084±.005 |
| | SMF | 0.347±.008 | 0.259±.014 | 0.202±.006 | 0.097±.006 |
| | Diffusion-TS | 0.351±.006 | 0.267±.010 | 0.221±.008 | 0.158±.018 |

Figure 3:Comparison of unconditional time\-series generation on the Stocks dataset \(left\) and imputation \(top row\) and forecasting \(bottom row\) on the fMRI dataset \(right\) between our PrismFlow and Diffusion\-TS\.
### 5\.2Unconditional Time Series Generation
We evaluate PrismFlow on unconditional time\-series generation under comparable parameter budgets for all baselines\. The compared methods include generative adversarial networks \(GANs\)\[[12](https://arxiv.org/html/2605.28867#bib.bib12),[29](https://arxiv.org/html/2605.28867#bib.bib29)\], variational autoencoders \(VAEs\)\[[31](https://arxiv.org/html/2605.28867#bib.bib31)\], diffusion\-based models\[[35](https://arxiv.org/html/2605.28867#bib.bib35),[36](https://arxiv.org/html/2605.28867#bib.bib36),[7](https://arxiv.org/html/2605.28867#bib.bib7)\], and FM\-based models\[[44](https://arxiv.org/html/2605.28867#bib.bib44)\]\.
Table [1](https://arxiv.org/html/2605.28867#S4.T1) reports generation performance across diverse datasets. PrismFlow achieves the best results in 16 out of 24 cases, outperforming recent baselines such as SMF and Diffusion-TS. These results are consistent with the spectral analysis above: by using routed residual experts to complement the global estimator, PrismFlow alleviates estimator-level mean seeking and better preserves multi-scale temporal modes. The benefit is especially clear on complex datasets such as Energy, where PrismFlow reduces Context-FID by over 55% and Discriminative Score by about 74% compared with the second-best method. On the high-dimensional fMRI dataset, PrismFlow also achieves the lowest Discriminative Score, indicating that it better preserves rhythmic and structured dynamics that are often attenuated by over-smoothed learned flows. Overall, the results show that PrismFlow improves both statistical fidelity and temporal realism by maintaining a broader spectral range and capturing multi-scale structures in time-series data.
We further visualize generation quality using t-SNE projections and kernel density estimation (KDE). As shown in Figure [3](https://arxiv.org/html/2605.28867#S5.F3)(a), samples generated by PrismFlow overlap closely with real data in both views. In the t-SNE plot, generated trajectories follow the global structure of real samples and cover both central regions and more complex outer areas. In contrast, Diffusion-TS leaves visible gaps in the sample space, suggesting partial mode dropping. The KDE plots show a similar trend: PrismFlow closely matches the real density curves and captures the multiple peaks and sharp variations in the Stocks dataset. Our appendix further shows that PrismFlow achieves this performance with high sampling efficiency.
Small-Scale Settings. We also evaluate PrismFlow under severe data scarcity by training on reduced subsets of the training data. As shown in Tab. [2](https://arxiv.org/html/2605.28867#S5.T2), PrismFlow remains stable across all low-data settings and achieves the best Context-FID and Discriminative Scores among competing baselines. With only 20% of the training data, PrismFlow still yields clear gains over the strongest baseline on each metric, improving Context-FID by 63.1% and Discriminative Score by 12.7%. We attribute this robustness to expert-specific residual modeling: by assigning distinct temporal regimes to specialized experts, PrismFlow preserves subtle temporal variations that are easily averaged out in low-data regimes. As a result, it maintains fine-grained dynamics and high generation quality even with substantially fewer training examples.
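As a worked check, the 20% figures appear to be relative reductions against the best-performing baseline in each row of Table 2 (Diffusion-TS for Context-FID, SMF for Discriminative Score). This reading is an inference from the table values rather than a definition stated in the text:

$$\frac{5.197-1.920}{5.197}\approx 0.631 \;\;(63.1\%), \qquad \frac{0.347-0.303}{0.347}\approx 0.127 \;\;(12.7\%).$$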
### 5\.3Conditional Time Series Generation
Beyond unconditional generation, we extend PrismFlow to conditional tasks, including forecasting and imputation, using flow-matching guidance \[[48](https://arxiv.org/html/2605.28867#bib.bib48)\]. Specifically, we add a guidance gradient that enforces consistency with observed values and use it as an additional correction to the sampling velocity. As shown in Figure [3](https://arxiv.org/html/2605.28867#S5.F3)(b), PrismFlow better follows sharp transitions and sudden spikes where Diffusion-TS often falls short. This advantage comes from the Koopman-inspired experts, which provide structured temporal inductive biases, and the WTA routing mechanism, which promotes specialization over fine-grained temporal modes. As a result, the model can recover more faithful trajectories from limited local observations.
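To make the idea of a guidance correction concrete, the sketch below adds the gradient of a masked squared error to an arbitrary sampling velocity. This is a deliberately simplified stand-in and should not be read as the guidance rule of \[[48](https://arxiv.org/html/2605.28867#bib.bib48)\] or of PrismFlow: the function name, the `scale` parameter, and the choice to penalize the current state directly are all assumptions made for illustration.

```python
import numpy as np

def guided_velocity(x, t, velocity, x_obs, mask, scale=1.0):
    """Velocity plus a correction pulling observed entries of x toward x_obs.

    velocity(x, t) is any sampling velocity (hypothetical callable);
    mask is 1 where a value is observed (imputation or forecasting context)
    and 0 where it must be generated.
    """
    # Gradient of 0.5 * ||mask * (x - x_obs)||^2 with respect to x.
    grad = mask * (x - x_obs)
    # Descend the consistency loss while following the base velocity.
    return velocity(x, t) - scale * grad
```

Unobserved positions receive no correction in this sketch, so they evolve under the base velocity alone, while observed positions are steered toward their known values.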
### 5\.4Ablation Study
Table 3: Ablation study on the model design.

| Metric | Variant | Stocks | fMRI |
| --- | --- | --- | --- |
| Context-FID Score ↓ | Ours | 0.027±.008 | 0.155±.002 |
| | Vanilla FM | 0.116±.012 | 0.324±.030 |
| | Training w/o $\mathcal{L}_{\mathrm{WTA}}$ | 0.107±.011 | 0.294±.018 |
| | Training w/o $\mathcal{L}_{\mathrm{BAL}}$ | 0.088±.015 | 0.178±.007 |
| | Training $\beta=0$ | 0.095±.010 | 0.248±.019 |
| | Sampling $\gamma=0$ | 0.091±.010 | 0.185±.014 |
| Discriminative Score ↓ | Ours | 0.030±.004 | 0.129±.053 |
| | Vanilla FM | 0.186±.005 | 0.245±.050 |
| | Training w/o $\mathcal{L}_{\mathrm{WTA}}$ | 0.112±.014 | 0.402±.029 |
| | Training w/o $\mathcal{L}_{\mathrm{BAL}}$ | 0.106±.045 | 0.218±.020 |
| | Training $\beta=0$ | 0.102±.074 | 0.229±.058 |
| | Sampling $\gamma=0$ | 0.075±.012 | 0.275±.029 |

Table [3](https://arxiv.org/html/2605.28867#S5.T3) further shows the contribution of each component. PrismFlow consistently outperforms vanilla FM, confirming the benefit of augmenting the global transport estimator with mode-specific residual dynamics. Among all components, removing the WTA objective $\mathcal{L}_{\mathrm{WTA}}$ causes the largest performance drop, especially on the high-dimensional fMRI dataset. Without competitive specialization, experts are less clearly separated across temporal regimes, and the learned dynamics become more mean-seeking, producing smoother trajectories that miss fine-grained structures. The load-balancing term $\mathcal{L}_{\mathrm{BAL}}$ and confidence weight $\beta$ further improve routing behavior by preventing a small subset of experts from dominating. This is particularly useful for datasets such as Stocks, where temporal patterns are noisy and shift over time. By encouraging broader expert utilization, these components help retain both frequent and less common temporal modes. Finally, performance degrades when $\gamma=0$, indicating that expert-informed residual dynamics during sampling are essential for generation quality. Without this residual correction, the sampler relies mainly on the global estimator and becomes less adaptive to diverse temporal regimes, losing important local variations. This supports our design principle: combining stable global transport with competitively routed residual experts mitigates estimator-level averaging and better captures complex temporal structure.
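The competitive specialization that the WTA objective provides can be illustrated with a generic winner-take-all loss: per sample, only the expert with the smallest error contributes, so gradients to the other experts are masked. The sketch below shows that generic mechanism only. The confidence weighting $\beta$ and the load-balancing term $\mathcal{L}_{\mathrm{BAL}}$ are omitted because their exact forms are not given in this section, and the function name and array shapes are assumptions.

```python
import numpy as np

def wta_loss(expert_preds, target):
    """Generic winner-take-all regression loss (illustrative, not the paper's exact form).

    expert_preds: (K, B, D) outputs of K experts for a batch of B samples.
    target:       (B, D) regression target for each sample.
    Returns the scalar loss, the winning expert index per sample, and a
    (K, B) mask that is 1 only for the winning expert of each sample.
    """
    errors = np.mean((expert_preds - target[None]) ** 2, axis=-1)  # (K, B)
    winners = np.argmin(errors, axis=0)                            # (B,)
    mask = np.zeros_like(errors)
    mask[winners, np.arange(errors.shape[1])] = 1.0
    # Only winning experts contribute, so the others would receive no gradient.
    loss = np.sum(mask * errors) / errors.shape[1]
    return loss, winners, mask
```

Because each sample updates a single expert, experts that win on different regimes drift apart over training, which is the separation effect the ablation attributes to $\mathcal{L}_{\mathrm{WTA}}$.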
### 5\.5Analysis of Mode\-Specific Experts
Figure 4: (a) DMD eigenvalues for Expert 1 and Expert 2 on fMRI. (b) Effect of the number of experts on the unconditional task for the MuJoCo and Stocks datasets. Lower is better.

To better understand how PrismFlow reproduces the temporal modes in Figure [2](https://arxiv.org/html/2605.28867#S5.F2), we examine the DMD eigenvalue spectra of the trained fMRI experts in Figure [4](https://arxiv.org/html/2605.28867#S5.F4)(a). The expert spectra show a clear division of labor that mirrors the main structures of the real spectrum. Expert 1 mainly produces eigenvalues close to the real axis, corresponding to non-oscillatory components such as slow drifts and smooth trend-like dynamics. These modes match the band near the real axis in Figure [2](https://arxiv.org/html/2605.28867#S5.F2). In contrast, Expert 2 focuses on oscillatory components with larger imaginary parts, forming conjugate clusters above and below the real axis. This structure matches the symmetric clouds in the real DMD spectrum and reflects rhythmic fluctuations in fMRI signals. These results explain why PrismFlow improves spectral fidelity: rather than forcing a single learned estimator to average incompatible dynamics, the WTA routing mechanism assigns different temporal regimes to specialized experts. As a result, the generated sequences preserve diverse dynamics and yield spectra that better match the empirical distribution.
We further analyze the effect of the number of experts $K$ in Figure [4](https://arxiv.org/html/2605.28867#S5.F4)(b). Across datasets, $K=1$ performs the worst, confirming that a single expert lacks sufficient capacity to capture heterogeneous temporal structures. Using $K=4$ provides consistently strong performance and serves as our default setting, while the optimal capacity still depends on data complexity. For Stocks, increasing $K$ to 8 further improves performance, suggesting that additional experts help model its diverse and shifting temporal patterns. For MuJoCo, performance improves with a moderate number of experts but slightly drops when too many are used, likely because the expert pool exceeds the intrinsic complexity of the data and weakens clear specialization.
## 6Conclusion
In this paper, we introduce PrismFlow, a new FM method with residual dynamical experts for time-series generation. The method is motivated by a key limitation of learned single-field FM models: when time-series data contain multiple competing temporal regimes, a single learned transport field may suffer from estimator-level averaging, which smooths distinct dynamics and can lead to mode collapse. PrismFlow addresses this problem by decomposing the transport process into a stable global backbone and hard-routed Koopman-inspired residual experts. The global field captures shared temporal evolution, while the experts model regime-specific residual dynamics, such as oscillatory and high-frequency patterns. Empirical results on unconditional generation, forecasting, and imputation show that PrismFlow achieves strong performance and mitigates mode collapse by preserving diverse spectral patterns, which suggests that routed residual dynamics are a promising direction for robust generative modeling.
## References
- Kaushik et al\. \[2020\]Shruti Kaushik, Abhinav Choudhury, Pankaj Kumar Sheron, Nataraj Dasgupta, Sayee Natarajan, Larry A Pickett, and Varun Dutt\.Ai in healthcare: time\-series forecasting using statistical, neural, and ensemble architectures\.*Frontiers in big data*, 3:4, 2020\.
- Zhang et al\. \[2025\]Junru Zhang, Lang Feng, Xu Guo, Han Yu, Yabo Dong, and Duanqing Xu\.Diffusion\-guided diversity for single domain generalization in time series classification\.In*Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 2*, pages 3764–3773, 2025\.
- Huang et al\. \[2024\]Hongbin Huang, Minghua Chen, and Xiao Qiao\.Generative learning for financial time series with irregular and scale\-invariant patterns\.In*The Twelfth International Conference on Learning Representations*, 2024\.
- \[4\]Fatemeh Chitsaz and Saman Haratizadeh\.Dual adaptation of time\-series foundation models for financial forecasting\.In*1st ICML Workshop on Foundation Models for Structured Data*\.
- Liu et al\. \[2024\]Zhen Liu, Wenbin Pei, Disen Lan, and Qianli Ma\.Diffusion language\-shapelets for semi\-supervised time\-series classification\.In*Proceedings of the AAAI conference on artificial intelligence*, volume 38, pages 14079–14087, 2024\.
- He et al\. \[2024\]Yang He, Yuhan Wu, Junru Zhang, and Yabo Dong\.Ltcr: Long temporal characteristic reconstruction for segmentation in contrastive learning\.In*Joint European Conference on Machine Learning and Knowledge Discovery in Databases*, pages 355–371\. Springer, 2024\.
- Coletta et al\. \[2023\]Andrea Coletta, Sriram Gopalakrishnan, Daniel Borrajo, and Svitlana Vyetrenko\.On the constrained time\-series generation problem\.*Advances in Neural Information Processing Systems*, 36:61048–61059, 2023\.
- Wu et al\. \[2024\]Yuhan Wu, Xiyu Meng, Junru Zhang, Yang He, Joseph A Romo, Yabo Dong, and Dongming Lu\.Effective lstms with seasonal\-trend decomposition and adaptive learning and niching\-based backtracking search algorithm for time series forecasting\.*Expert Systems with Applications*, 236:121202, 2024\.
- Farayola et al\. \[2024\]Oluwatoyin Ajoke Farayola, Oluwabukunmi Latifat Olorunfemi, and Philip Olaseni Shoetan\.Data privacy and security in it: a review of techniques and challenges\.*Computer Science & IT Research Journal*, 5\(3\):606–615, 2024\.
- Gonen et al\. \[2025\]Tal Gonen, Itai Pemper, Ilan Naiman, Nimrod Berman, and Omri Azencot\.Time series generation under data scarcity: A unified generative modeling approach\.*arXiv preprint arXiv:2505\.20446*, 2025\.
- Goodfellow et al\. \[2014\]Ian J Goodfellow, Jean Pouget\-Abadie, Mehdi Mirza, Bing Xu, David Warde\-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio\.Generative adversarial nets\.*Advances in neural information processing systems*, 27, 2014\.
- Yoon et al\. \[2019\]Jinsung Yoon, Daniel Jarrett, and Mihaela Van der Schaar\.Time\-series generative adversarial networks\.*Advances in neural information processing systems*, 32, 2019\.
- Lipman et al\. \[2022\]Yaron Lipman, Ricky TQ Chen, Heli Ben\-Hamu, Maximilian Nickel, and Matt Le\.Flow matching for generative modeling\.*arXiv preprint arXiv:2210\.02747*, 2022\.
- Albergo et al\. \[2023\]Michael S Albergo, Nicholas M Boffi, and Eric Vanden\-Eijnden\.Stochastic interpolants: A unifying framework for flows and diffusions\.*arXiv preprint arXiv:2303\.08797*, 2023\.
- Ho et al\. \[2020\]Jonathan Ho, Ajay Jain, and Pieter Abbeel\.Denoising diffusion probabilistic models\.*Advances in neural information processing systems*, 33:6840–6851, 2020\.
- Ho and Salimans \[2022\]Jonathan Ho and Tim Salimans\.Classifier\-free diffusion guidance\.*arXiv preprint arXiv:2207\.12598*, 2022\.
- Schusterbauer et al\. \[2025\]Johannes Schusterbauer, Ming Gui, Frank Fundel, and Björn Ommer\.Diff2flow: Training flow matching models via diffusion model alignment\.In*Proceedings of the Computer Vision and Pattern Recognition Conference*, pages 28347–28357, 2025\.
- Arora et al\. \[2017\]Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang\.Generalization and equilibrium in generative adversarial nets \(gans\)\.In*International conference on machine learning*, pages 224–232\. PMLR, 2017\.
- Bang and Shim \[2018\]Duhyeon Bang and Hyunjung Shim\.Improved training of generative adversarial networks using representative features\.In*International conference on machine learning*, pages 433–442\. PMLR, 2018\.
- Abbahaddou and Aboussalah \[2025\]Yassine Abbahaddou and Amine M Aboussalah\.A geometry\-aware metric for mode collapse in time series generative models\.In*The Thirty\-ninth Annual Conference on Neural Information Processing Systems*, 2025\.URL[https://openreview\.net/forum?id=YAc0O13qMc](https://openreview.net/forum?id=YAc0O13qMc)\.
- Eide et al\. \[2020\]Aksel Wilhelm Wold Eide, Eilif Solberg, and Ingebjørg Kåsen\.Sample weighting as an explanation for mode collapse in generative adversarial networks\.*arXiv preprint arXiv:2010\.02035*, 2020\.
- Pan et al\. \[2022\]Ziqi Pan, Li Niu, and Liqing Zhang\.Unigan: Reducing mode collapse in gans using a uniform generator\.*Advances in neural information processing systems*, 35:37690–37703, 2022\.
- Koopman \[1931\]Bernard O Koopman\.Hamiltonian systems and transformation in hilbert space\.*Proceedings of the National Academy of Sciences*, 17\(5\):315–318, 1931\.
- Lan and Mezić \[2013\]Yueheng Lan and Igor Mezić\.Linearization in the large of nonlinear systems and koopman operator spectrum\.*Physica D: Nonlinear Phenomena*, 242\(1\):42–53, 2013\.
- Shazeer et al\. \[2017\]Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean\.Outrageously large neural networks: The sparsely\-gated mixture\-of\-experts layer\.*arXiv preprint arXiv:1701\.06538*, 2017\.
- Jacobs et al\. \[1991\]Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton\.Adaptive mixtures of local experts\.*Neural computation*, 3\(1\):79–87, 1991\.
- Mogren \[2016\]Olof Mogren\.C\-rnn\-gan: Continuous recurrent neural networks with adversarial training\.*arXiv preprint arXiv:1611\.09904*, 2016\.
- Esteban et al\. \[2017\]Cristóbal Esteban, Stephanie L Hyland, and Gunnar Rätsch\.Real\-valued \(medical\) time series generation with recurrent conditional gans\.*arXiv preprint arXiv:1706\.02633*, 2017\.
- Xu et al\. \[2020\]Tianlin Xu, Li Kevin Wenliang, Michael Munn, and Beatrice Acciaio\.Cot\-gan: Generating sequential data via causal optimal transport\.*Advances in neural information processing systems*, 33:8798–8809, 2020\.
- Jeha et al\. \[2022\]Paul Jeha, Michael Bohlke\-Schneider, Pedro Mercado, Shubham Kapoor, Rajbir Singh Nirwan, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski\.Psa\-gan: Progressive self attention gans for synthetic time series\.In*The tenth international conference on learning representations*, 2022\.
- Desai et al\. \[2021\]Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver\.Timevae: A variational auto\-encoder for multivariate time series generation\.*arXiv preprint arXiv:2111\.08095*, 2021\.
- Qian et al\. \[2021\]Hangwei Qian, Sinno Jialin Pan, and Chunyan Miao\.Latent independent excitation for generalizable sensor\-based cross\-person activity recognition\.In*Proceedings of the AAAI conference on artificial intelligence*, volume 35, pages 11921–11929, 2021\.
- Zhang et al\. \[2024\]Junru Zhang, Lang Feng, Zhidan Liu, Yuhan Wu, Yang He, Yabo Dong, and Duanqing Xu\.Diverse intra\-and inter\-domain activity style fusion for cross\-person generalization in activity recognition\.In*Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, pages 4213–4222, 2024\.
- Yang et al\. \[2024\]Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, et al\.A survey on diffusion models for time series and spatio\-temporal data\.*ACM Computing Surveys*, 2024\.
- Kong et al\. \[2020\]Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro\.Diffwave: A versatile diffusion model for audio synthesis\.*arXiv preprint arXiv:2009\.09761*, 2020\.
- Yuan and Qiao \[2024\]Xinyu Yuan and Yan Qiao\.Diffusion\-ts: Interpretable diffusion for general time series generation\.*arXiv preprint arXiv:2403\.01742*, 2024\.
- Chen et al\. \[2024\]Zhicheng Chen, FENG SHIBO, Zhong Zhang, Xi Xiao, Xingyu Gao, and Peilin Zhao\.Sdformer: Similarity\-driven discrete transformer for time series generation\.*Advances in Neural Information Processing Systems*, 37:132179–132207, 2024\.
- Esser et al\. \[2024\]Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al\.Scaling rectified flow transformers for high\-resolution image synthesis\.In*Forty\-first international conference on machine learning*, 2024\.
- Chen et al\. \[2025\]Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, et al\.Goku: Flow based video generative foundation models\.In*Proceedings of the Computer Vision and Pattern Recognition Conference*, pages 23516–23527, 2025\.
- Tamir et al\. \[2024\]Ella Tamir, Najwa Laabid, Markus Heinonen, Vikas Garg, and Arno Solin\.Conditional flow matching for time series modelling\.In*ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling*, 2024\.
- Ge et al\. \[2025\]Yunfeng Ge, Jiawei Li, Yiji Zhao, Haomin Wen, Zhao Li, Meikang Qiu, Hongyan Li, Ming Jin, and Shirui Pan\.T2s: High\-resolution time series generation with text\-to\-series diffusion models\.*arXiv preprint arXiv:2505\.02417*, 2025\.
- Liu et al\. \[2025\]Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, and Mingsheng Long\.Sundial: A family of highly capable time series foundation models\.*arXiv preprint arXiv:2502\.00816*, 2025\.
- Kollovieh et al\. \[2024\]Marcel Kollovieh, Marten Lienen, David Lüdke, Leo Schwinn, and Stephan Günnemann\.Flow matching with gaussian process priors for probabilistic time series forecasting\.*arXiv preprint arXiv:2410\.03024*, 2024\.
- Kim et al\. \[2025\]Jinwoo Kim, Max Beier, Petar Bevanda, Nayun Kim, and Seunghoon Hong\.Sequence modeling with spectral mean flows\.*arXiv preprint arXiv:2510\.15366*, 2025\.
- Cortés et al\. \[2025\]Adrien Cortés, Rémi Rehm, and Victor Letzelter\.Winner\-takes\-all for multivariate probabilistic time series forecasting\.*arXiv preprint arXiv:2506\.05515*, 2025\.
- Schmid \[2010\]Peter J Schmid\.Dynamic mode decomposition of numerical and experimental data\.*Journal of fluid mechanics*, 656:5–28, 2010\.
- Liao et al\. \[2020\]Shujian Liao, Hao Ni, Lukasz Szpruch, Magnus Wiese, Marc Sabate\-Vidales, and Baoren Xiao\.Conditional sig\-wasserstein gans for time series generation\.*arXiv preprint arXiv:2006\.05421*, 2020\.
- Feng et al\. \[2025\]Ruiqi Feng, Chenglei Yu, Wenhao Deng, Peiyan Hu, and Tailin Wu\.On the guidance of flow matching\.*arXiv preprint arXiv:2502\.02150*, 2025\.