Perron--Frobenius Operator Matching for Generative Modeling
Summary
Introduces Perron–Frobenius Operator Matching (PFOM), a generative framework that unifies flow, diffusion, and jump models via integral PF operator matching, proving KL divergence yields a practical loss equivalent to Koopman path matching, and develops Nesterov-accelerated training and sampling for improved efficiency.
# Perron–Frobenius Operator Matching for Generative Modeling
Source: [https://arxiv.org/html/2606.17465](https://arxiv.org/html/2606.17465)
Wuwei Wu, Jaemin Oh, Jie Chen, Xiaoning Qian

Texas A&M University, College Station, TX 77840, USA (e-mail: shiqizhang001@tamu.edu; jaemin_oh@tamu.edu; xqian@ece.tamu.edu)

City University of Hong Kong, Kowloon, Hong Kong SAR (e-mail: w.wu@my.cityu.edu.hk; jichen@cityu.edu.hk)
###### Abstract
We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only the Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. Empirically, PFOM achieves a faster decrease in KL, $W_2$, and MMD, together with improved wall-clock efficiency. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.
###### keywords:
Koopman and Perron\-Frobenius Operators, Flow Matching, Generative modeling
This work was supported in part by the Hong Kong RGC under Project CityU 11203321, CityU 11213322, CityU 11207823. XQ acknowledges the support from U.S. National Science Foundation (NSF) grants SHF-2215573 and IIS-2212419.

## 1 Introduction
Characterizing Markov processes is fundamental to stochastic analysis (Ross, [1995](https://arxiv.org/html/2606.17465#bib.bib3)), with wide-ranging applications in, e.g., finance (Rolski et al., [2009](https://arxiv.org/html/2606.17465#bib.bib15)), statistical physics (Van Kampen, [1992](https://arxiv.org/html/2606.17465#bib.bib14)), and signal processing (Oppenheim et al., [1997](https://arxiv.org/html/2606.17465#bib.bib13)). The recent surge of artificial intelligence and generative modeling has amplified interest in learnable Markovian dynamics (Ho et al., [2020](https://arxiv.org/html/2606.17465#bib.bib12); Yang et al., [2023](https://arxiv.org/html/2606.17465#bib.bib8); Lipman et al., [2024](https://arxiv.org/html/2606.17465#bib.bib22)), which is of interest to modeling and control of large-scale, complex systems, especially in the context of neural network-based control design (Katz et al., [2022](https://arxiv.org/html/2606.17465#bib.bib10)) and generative AI-driven automated control algorithms (Cui et al., [2025](https://arxiv.org/html/2606.17465#bib.bib11)).

A central challenge is to efficiently and accurately parameterize Markov processes. Operator-theoretic perspectives provide a principled route: the Markov transfer operator (Eisner et al., [2015](https://arxiv.org/html/2606.17465#bib.bib7)) offers a dominant characterization, and for (nonlinear) semidynamical systems, Perron–Frobenius theory grounds the Markov semigroup (Lemmens and Nussbaum, [2012](https://arxiv.org/html/2606.17465#bib.bib4); Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)), revealing the duality between Koopman and Perron–Frobenius operators. From a control-theoretic viewpoint, these operators encode the probabilistic evolution of closed-loop dynamics under stochastic policies and exogenous disturbances, enabling linear surrogates for stability analysis, constraint satisfaction, and performance verification of nonlinear systems. In safety-critical applications such as robotics, power systems, and networked infrastructures, learning and manipulating such operators from data is therefore crucial for risk-aware decision-making and robust control synthesis. Building on this view, data-driven identification methods, such as DMD (Proctor et al., [2016](https://arxiv.org/html/2606.17465#bib.bib2)) and EDMD (Li et al., [2017](https://arxiv.org/html/2606.17465#bib.bib20); Brunton et al., [2016](https://arxiv.org/html/2606.17465#bib.bib5)), have become standard.

Concurrently, modern generative models, such as diffusion (Ho et al., [2020](https://arxiv.org/html/2606.17465#bib.bib12); Yang et al., [2023](https://arxiv.org/html/2606.17465#bib.bib8)) and flow-based models (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1), [2024](https://arxiv.org/html/2606.17465#bib.bib22)), impose stronger demands: capturing multimodality and nonlinear density evolution with sample-conditioned efficiency. Traditional Koopman/Perron–Frobenius identification, designed primarily for prediction and control, does not directly address these generative objectives.
To bridge this gap, we introduce *Perron–Frobenius Operator Matching (PFOM)*. PFOM (i) generalizes diffusion and flow matching paradigms by matching the full density evolution, extending beyond first-order (velocity) descriptions to infinitely many orders, and (ii) strengthens operator learning for generative purposes by aligning density-level objectives with sample-conditioned criteria, thereby unifying operator-theoretic identification with modern generative modeling.

An important extension of PFOM is an *inertial* optimization/sampling scheme based on Nesterov's acceleration (Nesterov and others, [2018](https://arxiv.org/html/2606.17465#bib.bib6)). We employ a lookahead extrapolation on the operator-parameter iterates and an inertial update on sample trajectories. Concretely, PFOM alternates between (a) extrapolated evaluation of the PF loss at a momentum point and (b) corrective updates with restart/monotone safeguards. This yields (i) faster empirical convergence of the PF loss and density metrics (KL, $W_2$, MMD) and (ii) reduced discretization error in sample propagation due to lookahead stabilization.

The rest of the paper is organized as follows. In Section [2](https://arxiv.org/html/2606.17465#S2), we review relevant background on Koopman/Perron–Frobenius theory, Wasserstein and Bregman divergence measures, and generative modeling. In Section [3](https://arxiv.org/html/2606.17465#S3), we explain why and how (under what measure) we should look at Perron–Frobenius operator matching, and then convert it into the Koopman path matching problem for implementation. Section [3.5](https://arxiv.org/html/2606.17465#S3.SS5) presents a Nesterov momentum acceleration method for faster generation. Section [4](https://arxiv.org/html/2606.17465#S4) demonstrates the approach with simulations and Section [5](https://arxiv.org/html/2606.17465#S5) concludes the paper.
## 2 Preliminaries

### 2.1 Koopman and Perron–Frobenius Operators

Consider a nonlinear dynamical system $x_t = S_t(x_0)$, where $S_t\colon \mathbb{R}^n \to \mathbb{R}^n$ is a non-singular mapping. For $f \in L_\infty$, the Koopman operator $\mathcal{K}_\tau$ is defined as

$$(\mathcal{K}_\tau f)(x_t) = f(S_\tau(x_t)). \tag{1}$$

For $g \in L_1$, the Perron–Frobenius (PF) operator $\mathcal{P}_\tau$ is defined by

$$\int_{y \in A} (\mathcal{P}_\tau g)(y)\,\mathrm{d}y = \int_{x \in S_\tau^{-1}(A)} g(x)\,\mathrm{d}x, \quad \forall A \in \Sigma, \tag{2}$$

where $\Sigma$ denotes a $\sigma$-algebra on the space $\mathbb{R}^n$. When $g$ is a density function, the PF operator $\mathcal{P}_\tau$ is a *Markov operator* that pushes the present density forward to future densities (Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)).

By PF–Koopman duality, for any $f \in L_\infty$ and any density $g \in L_1$, we always have

$$\langle \mathcal{K}_\tau f, g \rangle = \langle f, \mathcal{P}_\tau g \rangle. \tag{3}$$

This means that the Koopman and PF operators form a dual pair.
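
The duality (3) can be checked numerically in one dimension. The sketch below is illustrative only: the affine map `S`, the Gaussian density `g`, and the observable `f = tanh` are assumptions chosen so that the pushed-forward density $\mathcal{P}_\tau g$ is available in closed form, and both pairings are evaluated by a plain Riemann sum.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical one-step map S(x) = a*x + b (invertible because a != 0).
a, b = 0.5, 1.0
S = lambda x: a * x + b

# Density g = N(m, s^2). For an affine map the pushed-forward density
# P g is N(a*m + b, (a*s)^2) in closed form.
m, s = 0.3, 1.2
g = lambda x: gaussian_pdf(x, m, s)
Pg = lambda y: gaussian_pdf(y, a * m + b, abs(a) * s)

# Bounded observable f in L_infinity.
f = np.tanh

# Riemann-sum quadrature on a wide uniform grid.
grid = np.linspace(-20.0, 20.0, 200_001)
dx = grid[1] - grid[0]

lhs = np.sum(f(S(grid)) * g(grid)) * dx   # <K f, g> = integral of f(S(x)) g(x)
rhs = np.sum(f(grid) * Pg(grid)) * dx     # <f, P g> = integral of f(y) (P g)(y)
print(lhs, rhs)
```

The two printed numbers should agree to quadrature accuracy, which is the change of variables $y = S(x)$ underlying (2) and (3).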
### 2.2 Generative Modeling

Consider two random vectors $X_0 \sim \mathcal{N}(0, I)$ and $X_1 \sim q(X_1)$, where $X_0$ is generated from the known prior distribution while $X_1$ is from some distribution $q(X_1)$ whose analytical form is not known *a priori*. The objective of *generative modeling* is to learn a generative model $\mathcal{M}_\theta(X_1)$ from observed data $\mathcal{D}(X_1)$ to generate samples following the distribution $q(X_1)$.

As shown in Fig. [1](https://arxiv.org/html/2606.17465#S2.F1), one such generative modeling strategy is *Flow Matching* (FM) (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1)). It constructs a probability path $(p_t)_{t \in [0,1]}$ from a known source distribution $p_0 = p$ to the target distribution $p_1 = q$, where each $p_t$ is a distribution over $\mathbb{R}^d$. Specifically, FM adopts a simple regression objective to train a velocity-field neural network that describes the instantaneous velocities of samples, which is later used to convert the source distribution $p_0$ into the target distribution $p_1$ along the probability path $p_t$. That is, FM minimizes the flow matching loss

$$\mathbb{E}_{X_t \sim p_t;\; t \sim \mathrm{Unif}[0,1]} \big\| u(X_t) - v_\theta(X_t) \big\|^2 \tag{4}$$

by minimizing its surrogate version (the one conditional on $X_0$ and $X_1$):

$$\mathbb{E}_{X_0 \sim p,\; X_1 \sim q,\; t \sim \mathrm{Unif}[0,1]} \Big\| u(X_t(X_1, X_0, t)) - v_\theta(X_t(X_1, X_0, t)) \Big\|^2. \tag{5}$$

Notice that for ([5](https://arxiv.org/html/2606.17465#S2.E5)) and ([4](https://arxiv.org/html/2606.17465#S2.E4)) to have the same optima, one has to use a *sample-level Bregman divergence* as the distance measure, of which the mean squared error (MSE) loss is a special choice. After training, we generate a novel sample from the target distribution $X_1 \sim q$ by (i) drawing a novel sample from the source distribution $X_0 \sim p$, and (ii) solving the ordinary differential equation (ODE) determined by the velocity field: $\dot{X}_t = v_\theta(X_t)$, $t \in [0,1]$.

In the discrete-time setting, FM is formulated as *Path Matching*. Meanwhile, the flow ODE $\dot{X}_t = v_\theta(X_t)$ is solved by simulating the discrete path equation $X_{k+1} = X_k + \tau v_\theta(X_k)$.
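
The discrete path equation can be sketched as a short Euler loop. Everything specific here is an assumption made for illustration: the 1-D Gaussian source and target, the linear path with coupling $X_1 = m + s X_0$, and the closed-form `velocity` that stands in for a trained $v_\theta$. The velocity is also given an explicit time argument, which the text above leaves implicit.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps):
    """Simulate X_{k+1} = X_k + tau * v(X_k, t_k) on the time interval [0, 1]."""
    tau = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + tau * velocity(x, k * tau)
    return x

# Hypothetical 1-D setup: source N(0, 1), target N(m, s^2), linear path
# X_t = (1 - t) X_0 + t X_1 with the coupling X_1 = m + s * X_0.
m, s = 2.0, 0.5

def velocity(x, t):
    # Closed-form velocity of this path; a stand-in for a trained v_theta.
    return (s - 1.0) * (x - t * m) / (1.0 - t + t * s) + m

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)              # samples from the source
x1 = euler_sample(velocity, x0, n_steps=50)  # generated samples
print(x1.mean(), x1.std())
```

Because the trajectories of this particular path are straight lines, the Euler scheme incurs no discretization error here; for a general learned velocity field the step size $\tau$ controls the accuracy.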
 
Figure 1: Demonstration for sample and noise (left) and the corresponding generation process (right) (Lipman et al., [2024](https://arxiv.org/html/2606.17465#bib.bib22)).

Flow matching and diffusion models control the *local* terms in ([6](https://arxiv.org/html/2606.17465#S3.E6)), namely drift (first order) and diffusion (second order), within an objective based on the Kolmogorov forward equation (KFE) (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1), [2024](https://arxiv.org/html/2606.17465#bib.bib22)). Recent generator matching extends this to include jump contributions (Holderrieth et al., [2024](https://arxiv.org/html/2606.17465#bib.bib18)). However, all these existing formulations operate at the *infinitesimal* level and characterize only the first-order, second-order, or jump terms of the differential approximation, which may fail to capture higher-order, multi-step transport effects crucial for complex, multi-modal density evolution.
## 3 Perron–Frobenius Operator Matching

We propose a new generative modeling framework, Perron–Frobenius operator matching (PFOM), which elevates generative modeling from matching local, infinitesimal dynamics to directly aligning the finite-time evolution of densities. Instead of constraining only the drift/diffusion terms in a Kolmogorov forward equation (KFE), as in flow matching and diffusion models, PFOM works at the level of the integral Perron–Frobenius operator, which encapsulates the full Markov semigroup and thus accounts for higher-order and multi-step transport effects that are critical for complex, multimodal distributions. By matching $\mathcal{P}_\tau \rho_t$ and $\rho_{t+\tau}$ at a finite step $\tau$, PFOM captures richer global evolution than purely velocity-based schemes, while remaining compatible with operator-theoretic tools such as Koopman and DMD/EDMD-based identification.

We further formulate PFOM with a practical, sample-conditioned training loss. We show that among separable Bregman divergences, KL is the unique choice that keeps the density-level PF loss exactly aligned with its conditional counterpart, thereby justifying a KL-based PFOM objective for generative training. Pushing this loss through the PF–Koopman duality yields an equivalent Koopman path-matching formulation that can be realized with neural operators or classical DMD/EDMD (Proctor et al., [2016](https://arxiv.org/html/2606.17465#bib.bib2); Li et al., [2017](https://arxiv.org/html/2606.17465#bib.bib20)). We also derive Nesterov-style inertial updates for faster and more stable optimization and sampling. In this section we first formalize PFOM, then establish its Koopman equivalences, and finally introduce a Nesterov-accelerated variant tailored for efficient training.
### 3.1 Why Perron–Frobenius Operators?

Let $(\mathcal{P}_\tau)_{\tau \geq 0}$ denote the Perron–Frobenius (PF) semigroup acting on densities, and $(\mathcal{K}_\tau)_{\tau \geq 0}$ the Koopman semigroup acting on test functions (observables). A sufficiently regular Markov process with drift $u_t(x)$, diffusion $\sigma_t(x)$, and jumps is governed by the KFE (Risken, [1989](https://arxiv.org/html/2606.17465#bib.bib17); Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)):

$$\partial_t \langle \rho_t, f \rangle = \langle \rho_t, \mathcal{L}^* f \rangle,$$

where $\mathcal{L}^*$ is the (Koopman) infinitesimal generator acting on $f$:

$$\mathcal{L}^* f(x) = \underbrace{u_t(x)^{\mathsf{T}} \nabla f(x)}_{\text{drift}} + \underbrace{\tfrac{1}{2}\,\mathrm{tr}\big(\sigma_t(x)\sigma_t(x)^{\mathsf{T}} \nabla^2 f(x)\big)}_{\text{diffusion}} + \underbrace{\text{(jump term)}}_{\text{if present}}. \tag{6}$$

Equivalently, on densities the adjoint generator $\mathcal{L}$ yields the Fokker–Planck form $\partial_t \rho_t = \mathcal{L}\rho_t$. The integral PF operator satisfies

$$\rho_{t+\tau} = \mathcal{P}_\tau \rho_t = e^{\tau\mathcal{L}} \rho_t, \qquad \langle \rho_{t+\tau}, f \rangle = \langle \rho_t, \mathcal{K}_\tau f \rangle, \tag{7}$$

so that $\mathcal{K}_\tau = e^{\tau\mathcal{L}^*}$ and $\mathcal{P}_\tau = e^{\tau\mathcal{L}}$ are dual.

In contrast to matching only the local terms in ([6](https://arxiv.org/html/2606.17465#S3.E6)), PFOM compares the *integral* evolution $\mathcal{P}_\tau \rho_t$ vs. $\rho_{t+\tau}$ for finite $\tau$, thereby capturing *all orders* in the expansion of $e^{\tau\mathcal{L}}$ (Risken, [1989](https://arxiv.org/html/2606.17465#bib.bib17)). Practically, this allows us to train against richer, multi-step transport phenomena that are invisible to purely infinitesimal matching.
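
A finite-state analogue makes the gap between the integral operator and its infinitesimal truncation concrete. The sketch below is a toy under stated assumptions: a hypothetical 3-state jump process whose generator `L` is a matrix, with $e^{\tau\mathcal{L}}$ computed by a truncated Taylor series rather than any method from the paper. It compares $e^{\tau\mathcal{L}}$ with the first-order truncation $I + \tau\mathcal{L}$ and checks the duality in (7).

```python
import numpy as np

def expm_taylor(A, n_terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small ||A||)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        result = result + term
    return result

# Hypothetical 3-state jump process. Columns of L sum to zero, so that
# d(rho)/dt = L rho conserves total probability.
L = np.array([[-1.0,  0.5,  0.2],
              [ 0.7, -0.9,  0.3],
              [ 0.3,  0.4, -0.5]])
tau = 0.5

P_tau = expm_taylor(tau * L)      # integral PF operator e^{tau L}
K_tau = P_tau.T                   # Koopman operator e^{tau L^*}
P_first = np.eye(3) + tau * L     # first-order (infinitesimal) truncation

rho_t = np.array([0.6, 0.3, 0.1])
f = np.array([1.0, -2.0, 0.5])    # an observable on the three states

rho_next = P_tau @ rho_t
duality_gap = abs(rho_next @ f - rho_t @ (K_tau @ f))
truncation_error = np.abs(P_tau - P_first).max()
print(duality_gap, truncation_error)
```

The duality gap vanishes up to rounding, while the truncation error is of order $\tau^2$: this is the part of the finite-step evolution that a purely first-order description does not see.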
### 3.2 Wasserstein-Divergence Guided PFOM

We denote by $\Pi(\rho_0, \rho_1)$ the set of all joint distributions with starting marginal density $\rho_0$ and ending marginal density $\rho_1$. The *Wasserstein-2 metric* is defined by

$$W_2^2(\rho_0, \rho_1) = \inf_{\pi \in \Pi(\rho_0, \rho_1)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{8}$$

Karimi and Georgiou ([2022](https://arxiv.org/html/2606.17465#bib.bib23)) took the Wasserstein-2 metric as the loss function to match the density flow $\rho_k(x)$ by learning the Perron–Frobenius operator $\mathcal{P}$ such that $(\mathcal{P}\rho_k)(x) = \rho_{k+1}(x)$:

$$W_2^2\big((\mathcal{P}\rho_k)(x), \rho_{k+1}(y)\big) = \inf_{\pi \in \Pi(\mathcal{P}\rho_k, \rho_{k+1})} \int \|x - y\|^2\, \pi(x, y)\,\mathrm{d}y\,\mathrm{d}x.$$

Consider a set of observables $\{\phi_k\}_{k=1}^{K} \subset L_\infty$ and define the dictionary $\Phi\colon \mathbb{R}^n \to \mathbb{R}^K$ as the vector-valued function

$$\Phi(x) = \begin{bmatrix} \phi_1(x) & \cdots & \phi_K(x) \end{bmatrix}^{\mathsf{T}}.$$

The Koopman operator $\mathcal{K}_\tau$ acts on this dictionary component-wise, yielding

$$\mathcal{K}_\tau \Phi = \begin{bmatrix} \mathcal{K}_\tau \phi_1 & \cdots & \mathcal{K}_\tau \phi_K \end{bmatrix}^{\mathsf{T}}.$$

The discrepancy between $\mathcal{P}_\tau \rho_t$ and $\rho_{t+\tau}$ on the observables $\{\phi_k\}_{k=1}^{K}$ can be measured by the Wasserstein-2 metric as follows:

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \coloneqq \inf_{\pi \in \Pi(\mathcal{P}_\tau \rho_t, \rho_{t+\tau})} \int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{9}$$

Similarly, we can define the discrepancy between $\mathcal{K}_\tau \Phi$ and $\Phi$ under the Wasserstein-2 metric as

$$W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}) \coloneqq \inf_{\pi \in \Pi(\rho_t, \rho_{t+\tau})} \int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\| \mathcal{K}_\tau \Phi(x) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{10}$$

The following theorem shows the equivalence between these two discrepancies.
###### Theorem 1
For any set of observables $\{\phi_k\}_{k=1}^{K} \subset L_\infty$, we have $W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) = W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau})$.
###### Proof\.
Throughout the proof we write $\mathcal{P}$, $\mathcal{K}$, and $S$ for $\mathcal{P}_\tau$, $\mathcal{K}_\tau$, and $S_\tau$. For any $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$, we first obtain

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}y \,|\, x)\,\mathrm{d}x \\
&= \int_{\mathbb{R}^n} \rho_t(x)\, \mathcal{K} \int_{\mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}y \,|\, x)\,\mathrm{d}x \\
&= \int_{\mathbb{R}^n} \rho_t(x) \int_{\mathbb{R}^n} \bigl\| \Phi(S(x)) - \Phi(y) \bigr\|^2\, \pi(\mathrm{d}y \,|\, S(x))\,\mathrm{d}x,
\end{aligned}$$

where the first equality comes from the disintegration theorem, the second is due to the duality between the PF operator $\mathcal{P}$ and the Koopman operator $\mathcal{K}$, and the third follows from the definition of Koopman operators. Let $\pi_1(\mathrm{d}x, \mathrm{d}y) = \rho_t(x)\, \pi(\mathrm{d}y \,|\, S(x))\,\mathrm{d}x$. It is clear that

$$\pi_1(\mathrm{d}x, \mathbb{R}^n) = \rho_t(x)\, \pi(\mathbb{R}^n \,|\, S(x))\,\mathrm{d}x = \rho_t(x)\,\mathrm{d}x,$$

which implies that the first marginal of $\pi_1$ has density $\rho_t$. On the other hand, for any measurable function $g\colon \mathbb{R}^n \to \mathbb{R}$,

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} g(y)\, \pi_1(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \rho_t(x) \int_{\mathbb{R}^n} g(y)\, \pi(\mathrm{d}y \,|\, S(x))\,\mathrm{d}x \\
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} g(y)\, \pi(\mathrm{d}y \,|\, x)\,\mathrm{d}x \\
&= \int_{\mathbb{R}^n \times \mathbb{R}^n} g(y)\, \pi(\mathrm{d}x, \mathrm{d}y) \\
&= \int_{\mathbb{R}^n} g(y)\, \rho_{t+\tau}(y)\,\mathrm{d}y,
\end{aligned}$$

where the last two equalities come from the fact that $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$. As a result, the second marginal of $\pi_1$ has density $\rho_{t+\tau}$, and thus $\pi_1 \in \Pi(\rho_t, \rho_{t+\tau})$. This gives rise to

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \geq W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}). \tag{11}$$

Conversely, for any $\pi_1 \in \Pi(\rho_t, \rho_{t+\tau})$, we have

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\| \mathcal{K}\Phi(x) - \Phi(y) \bigr\|^2\, \pi_1(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \rho_t(x)\, \mathcal{K} \int_{\mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\,\mathrm{d}x \\
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} \bigl\| \Phi(x) - \Phi(y) \bigr\|^2\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\,\mathrm{d}x.
\end{aligned}$$

Let $\pi(\mathrm{d}x, \mathrm{d}y) = \mathcal{P}\rho_t(x)\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\,\mathrm{d}x$. A similar argument shows that $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$, and in turn

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \leq W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}). \tag{12}$$

The proof is completed by combining the inequalities ([11](https://arxiv.org/html/2606.17465#S3.E11)) and ([12](https://arxiv.org/html/2606.17465#S3.E12)). ∎
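
For empirical measures with equally weighted atoms, the equivalence above reduces to the observation that the PF-side and Koopman-side cost matrices coincide. The sketch below illustrates this under assumed choices: the dictionary, the map `S`, and the sample sizes are hypothetical, and the optimal coupling is found by brute force over permutations, which is only feasible for a handful of atoms.

```python
import itertools
import numpy as np

def dictionary(x):
    """Hypothetical dictionary Phi(x) = [x, x^2, sin x] for scalar states."""
    return np.stack([x, x**2, np.sin(x)], axis=-1)

def S(x):
    """Hypothetical invertible one-step map."""
    return 0.8 * x + 0.5

def min_assignment_cost(cost):
    """Brute-force optimal assignment between equally weighted atoms."""
    n = cost.shape[0]
    rows = np.arange(n)
    return min(cost[rows, list(p)].mean() for p in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
x = rng.standard_normal(6)             # atoms of rho_t
y = S(rng.standard_normal(6)) + 0.1    # atoms of a (mismatched) rho_{t+tau}

# PF side: push the atoms forward, then compare dictionary features.
z = S(x)
cost_P = ((dictionary(z)[:, None, :] - dictionary(y)[None, :, :]) ** 2).sum(-1)

# Koopman side: keep the atoms and evaluate the composed observable Phi o S.
K_dictionary = lambda u: dictionary(S(u))
cost_K = ((K_dictionary(x)[:, None, :] - dictionary(y)[None, :, :]) ** 2).sum(-1)

W_P = min_assignment_cost(cost_P)
W_K = min_assignment_cost(cost_K)
print(W_P, W_K)
```

In practice one would replace the brute-force search with a proper assignment or optimal-transport solver; the point here is only that the two formulations return the same value.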
### 3.3 Bregman-Divergence Guided PFOM

In PFOM we match the finite-step density evolution $\rho_{t+\tau} \approx \mathcal{P}_\tau \rho_t$, yet training only accesses conditionals indexed by the data sample $X_1 \sim q$, whose mixture is the marginal, $\rho_{t+\tau} = \mathbb{E}_{X_1 \sim q}\, \rho_{t+\tau}(\cdot \,|\, X_1)$. We ask the density discrepancy $D$ to meet two requirements.

(P1) *(conditional–marginal consistency)*: for every data law $q$ and every family of conditional densities $\{\rho(\cdot \,|\, X_1)\}$ with marginal $\bar{\rho} = \mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1)$, and every density $\sigma$,

$$\mathbb{E}_{X_1 \sim q}\, D\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \sigma\bigr) = D(\bar{\rho} \,\|\, \sigma) + C, \tag{13}$$

$$C = C(q)\ \text{independent of } \sigma, \tag{14}$$

so the conditional loss differs from the marginal objective only by a parameter-free constant and is thus an exact training surrogate.

(P2) *(reparametrization invariance)*: for every nondegenerate coordinate change (diffeomorphism) $T$ with push-forward $T_\#$,

$$D\bigl(T_\# \rho \,\big\|\, T_\# \sigma\bigr) = D(\rho \,\|\, \sigma), \tag{15}$$

so the aligned operator is intrinsic to the densities, not an artifact of the chosen (and, for adaptive dictionaries, varying) observable coordinates. These two requirements single out the Kullback–Leibler divergence.
###### Theorem 2
[KL is the unique consistent, coordinate-invariant discrepancy] Let $D(\rho \,\|\, \sigma) = \int \delta(\rho(x), \sigma(x))\,\mathrm{d}x$ be a separable divergence with $\delta \in C^2((0,\infty)^2)$, $\delta(s,s) = 0$ and $\delta \geq 0$. Then $D$ satisfies both ([13](https://arxiv.org/html/2606.17465#S3.E13)) and ([15](https://arxiv.org/html/2606.17465#S3.E15)) if and only if $D = c\,\operatorname{KL}$ for some constant $c > 0$.
###### Proof\.
*If.* Let $D = c\,\operatorname{KL}$. For ([13](https://arxiv.org/html/2606.17465#S3.E13)), with $\bar{\rho} = \mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1)$,

$$\mathbb{E}_{X_1 \sim q} \operatorname{KL}\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \sigma\bigr) \tag{16}$$

$$= \mathbb{E}_{X_1 \sim q} \int \rho(\cdot \,|\, X_1) \log \frac{\rho(\cdot \,|\, X_1)}{\bar{\rho}} + \mathbb{E}_{X_1 \sim q} \int \rho(\cdot \,|\, X_1) \log \frac{\bar{\rho}}{\sigma} \tag{17}$$

$$= \underbrace{\mathbb{E}_{X_1 \sim q} \operatorname{KL}\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \bar{\rho}\bigr)}_{=:C,\ \sigma\text{-free}} + \operatorname{KL}(\bar{\rho} \,\|\, \sigma), \tag{18}$$

because $\log(\bar{\rho}/\sigma)$ does not depend on $X_1$ and $\mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1) = \bar{\rho}$. For ([15](https://arxiv.org/html/2606.17465#S3.E15)), a change of variables under any diffeomorphism $T$ gives $\operatorname{KL}(T_\# \rho \,\|\, T_\# \sigma) = \operatorname{KL}(\rho \,\|\, \sigma)$, the Jacobian cancelling because KL depends on $(\rho, \sigma)$ only through $\rho$ and the ratio $\rho/\sigma$.

*Only if.* *Step 1: ([13](https://arxiv.org/html/2606.17465#S3.E13)) $\Rightarrow$ Bregman.* Specialize $q$ to the two-point law placing mass $\lambda$ on $x_1^0$ and $1-\lambda$ on $x_1^1$, and write $\rho_0 = \rho(\cdot \,|\, x_1^0)$, $\rho_1 = \rho(\cdot \,|\, x_1^1)$, so $\bar{\rho} = \lambda\rho_0 + (1-\lambda)\rho_1$. Then ([13](https://arxiv.org/html/2606.17465#S3.E13)) states that $\lambda D(\rho_0 \,\|\, \sigma) + (1-\lambda) D(\rho_1 \,\|\, \sigma) - D(\bar{\rho} \,\|\, \sigma)$ is independent of $\sigma$. Pointwise (by separability), the map $s \mapsto \lambda\delta(r_0, s) + (1-\lambda)\delta(r_1, s) - \delta\bigl(\lambda r_0 + (1-\lambda) r_1, s\bigr)$ is constant for all $r_0, r_1 > 0$ and $\lambda \in (0,1)$; hence for any $s, s'$ the function $r \mapsto \delta(r, s) - \delta(r, s')$ has vanishing Jensen gap, i.e. is affine. Fix a reference $s_0$ and set $v(r) \coloneqq \delta(r, s_0)$; then $\delta(r, s) = v(r) + a(s)\,r + b(s)$. Imposing $\delta(s,s) = 0$ and $\partial_r \delta(r,s)|_{r=s} = 0$ (as $r = s$ minimizes $\delta(\cdot, s)$) gives $a(s) = -v'(s)$ and $b(s) = s\,v'(s) - v(s)$, hence

$$\delta(r, s) = v(r) - v(s) - v'(s)(r - s),$$

i.e. $D$ is a separable Bregman divergence with potential $v$, convex since $\delta \geq 0$.

*Step 2: ([15](https://arxiv.org/html/2606.17465#S3.E15)) $\Rightarrow$ KL.* Write $B_v \coloneqq \delta$. Take the scaling $T_\mu(x) = \mu x$ on $\mathbb{R}^n$ (with Jacobian $J = \mu^n$), whose push-forward is $({T_\mu}_\# \rho)(y) = \rho(y/\mu)/J$. Then ([15](https://arxiv.org/html/2606.17465#S3.E15)) and a change of variables give, for all densities, $\int J\, B_v(\rho/J, \sigma/J)\,\mathrm{d}x = \int B_v(\rho, \sigma)\,\mathrm{d}x$, hence pointwise $J\, B_v(r/J, s/J) = B_v(r, s)$ for all $J > 0$: $B_v$ is positively homogeneous of degree one, $B_v(\lambda r, \lambda s) = \lambda B_v(r, s)$. Differentiating twice in $r$ gives $\lambda^2 v''(\lambda r) = \lambda v''(r)$, so $\lambda\, v''(\lambda r) = v''(r)$; at $r = 1$, $v''(\lambda) = c/\lambda$ with $c = v''(1) > 0$. Integrating, $v(r) = c\, r \log r$ up to affine terms, whence $D = c\,\operatorname{KL}$. ∎
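
The two steps of the proof can be illustrated numerically on a 1-D grid. This is a sketch under assumed inputs: the Gaussian conditionals, the mixture weight, and the reference densities are hypothetical. It compares KL with the squared $L_2$ distance, which is the separable Bregman divergence with potential $v(r) = r^2$. Both satisfy (P1), but only KL survives the scaling test of (P2).

```python
import numpy as np

def kl(rho, sigma, dx):
    return np.sum(rho * np.log(rho / sigma)) * dx

def sq_l2(rho, sigma, dx):
    return np.sum((rho - sigma) ** 2) * dx

def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

# Two conditionals rho(.|x_1) with mixture weights lam and 1 - lam.
lam = 0.3
rho0, rho1 = gaussian(x, -1.0, 0.7), gaussian(x, 1.5, 1.0)
rho_bar = lam * rho0 + (1 - lam) * rho1

def consistency_gap(D, sigma):
    """E_q D(rho(.|X_1) || sigma) - D(rho_bar || sigma); (P1) says this is sigma-free."""
    return lam * D(rho0, sigma, dx) + (1 - lam) * D(rho1, sigma, dx) - D(rho_bar, sigma, dx)

sigma_a, sigma_b = gaussian(x, 0.0, 1.5), gaussian(x, 0.5, 2.0)
kl_gaps = (consistency_gap(kl, sigma_a), consistency_gap(kl, sigma_b))
l2_gaps = (consistency_gap(sq_l2, sigma_a), consistency_gap(sq_l2, sigma_b))

# (P2) under the scaling T(x) = mu * x: density values shrink by mu while
# the grid spacing grows by mu.
mu = 3.0
kl_scaled = kl(rho_bar / mu, sigma_a / mu, mu * dx)
l2_scaled = sq_l2(rho_bar / mu, sigma_a / mu, mu * dx)
print(kl_gaps, l2_gaps, kl_scaled, l2_scaled)
```

The gaps agree across the two choices of $\sigma$ for both divergences, consistent with Step 1. Under scaling, KL is unchanged while the squared $L_2$ distance shrinks by the factor $1/\mu$, consistent with Step 2.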
### 3.4 Connections with Flow Matching

Flow matching and diffusion-style training arise as the *Gaussian reduction* of PF matching: when the one-step conditional transitions are Gaussian with a shared noise schedule, a single least-squares loss controls *both* the marginal Wasserstein-2 and the marginal KL PF objectives. The mechanism is shared: both $W_2^2$ and $\operatorname{KL}$ are jointly convex, and for two Gaussians with the same covariance both reduce to the squared distance between means.
###### Theorem 3
[Flow matching is the common surrogate for marginal PF matching] Let $Z = X_1 \sim q$, and assume conditionally Gaussian one-step transitions with a shared isotropic covariance and a shared drift field $f^\theta_t$,

$$\rho_{t+\tau}(\cdot \,|\, Z) = \mathcal{N}\bigl(\mu_t(Z),\, g_t^2 \tau I_d\bigr), \tag{19}$$

$$\mathcal{P}^\theta_\tau \rho_t(\cdot \,|\, Z) = \mathcal{N}\bigl(f^\theta_t(X_t),\, g_t^2 \tau I_d\bigr), \tag{20}$$

with $g_t > 0$ fixed (independent of $\theta$) and $X_t$ the current state on the conditional path. Write the flow-matching loss

$$L_{\mathrm{FM}}(\theta) \coloneqq \mathbb{E}_Z \bigl\| \mu_t(Z) - f^\theta_t(X_t) \bigr\|^2.$$

Then the marginal Wasserstein and KL PF objectives obey

$$W_2^2\bigl(\mathcal{P}^\theta_\tau \rho_t, \rho_{t+\tau}\bigr) \leq L_{\mathrm{FM}}(\theta), \tag{21}$$

$$\operatorname{KL}\bigl(\rho_{t+\tau} \,\big\|\, \mathcal{P}^\theta_\tau \rho_t\bigr) \leq \frac{1}{2 g_t^2 \tau}\, L_{\mathrm{FM}}(\theta), \tag{22}$$

and $L_{\mathrm{FM}}$ equals the denoising least-squares (sample) loss up to a $\theta$-free constant,

$$L_{\mathrm{FM}}(\theta) = \mathbb{E}_{Z, X_{t+\tau}} \bigl\| X_{t+\tau} - f^\theta_t(X_t) \bigr\|^2 - d\, g_t^2 \tau. \tag{23}$$

Consequently the flow-matching loss is a common offline surrogate for both marginal objectives, and its unique minimizer over drift fields is the marginal drift $f_t^\star(x) = \mathbb{E}\bigl[\mu_t(Z) \,\big|\, X_t = x\bigr]$.
###### Proof\.
For two Gaussians with common covariance $\Sigma = g_t^2\tau I_d$, the Bures term vanishes and

$$W_2^2\bigl(\mathcal{N}(\mu, \Sigma),\, \mathcal{N}(m, \Sigma)\bigr) = \|\mu - m\|^2, \tag{24}$$

$$\operatorname{KL}\bigl(\mathcal{N}(\mu, \Sigma)\,\big\|\,\mathcal{N}(m, \Sigma)\bigr) = \frac{\|\mu - m\|^2}{2g_t^2\tau}. \tag{25}$$

Applying these to the conditionals and taking $\mathbb{E}_Z$,

$$\mathbb{E}_Z\,W_2^2\bigl(\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z),\, \rho_{t+\tau}(\cdot\,|\,Z)\bigr) = L_{\mathrm{FM}}(\theta), \tag{26}$$

$$\mathbb{E}_Z\,\operatorname{KL}\bigl(\rho_{t+\tau}(\cdot\,|\,Z)\,\big\|\,\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z)\bigr) = \frac{L_{\mathrm{FM}}(\theta)}{2g_t^2\tau}. \tag{27}$$

On the other hand, both $W_2^2$ and $\operatorname{KL}$ are jointly convex in their two arguments, so for any mixing law, $D\bigl(\mathbb{E}_Z\alpha_Z\,\|\,\mathbb{E}_Z\beta_Z\bigr) \le \mathbb{E}_Z D(\alpha_Z\|\beta_Z)$. Since $\rho_{t+\tau} = \mathbb{E}_Z\rho_{t+\tau}(\cdot\,|\,Z)$ and $\mathcal{P}^\theta_\tau\rho_t = \mathbb{E}_Z\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z)$, applying this to $D \in \{W_2^2, \operatorname{KL}\}$ and combining with ([26](https://arxiv.org/html/2606.17465#S3.E26)) and ([27](https://arxiv.org/html/2606.17465#S3.E27)) yields the two bounds ([21](https://arxiv.org/html/2606.17465#S3.E21)) and ([22](https://arxiv.org/html/2606.17465#S3.E22)). With $X_{t+\tau}(\cdot\,|\,Z) = \mu_t(Z) + g_t\sqrt{\tau}\,\varepsilon$, $\varepsilon \sim \mathcal{N}(0, I_d)$ independent of $(Z, X_t)$,

$$\mathbb{E}\bigl[\|X_{t+\tau} - f^\theta_t(X_t)\|^2\,\big|\,Z\bigr] = \|\mu_t(Z) - f^\theta_t(X_t)\|^2 + d\,g_t^2\tau;$$

taking $\mathbb{E}_Z$ gives ([23](https://arxiv.org/html/2606.17465#S3.E23)), whose additive constant $d\,g_t^2\tau$ is independent of $\theta$. Writing $L_{\mathrm{FM}}(\theta) = \mathbb{E}_{X_t}\mathbb{E}_{Z\mid X_t}\|\mu_t(Z) - f^\theta_t(X_t)\|^2$ and minimizing over fields pointwise in $x$, the optimum is the conditional mean $f_t^\star(x) = \mathbb{E}[\mu_t(Z)\mid X_t = x]$. ∎
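As a sanity check on the identity (23), the following Monte Carlo sketch compares the mean-matching loss with the sample loss minus $d\,g_t^2\tau$. Everything here is a toy assumption made for illustration: the distributions chosen for $\mu_t(Z)$ and $X_t$, the fixed drift `f = 0.5 * x_t`, and the function name `check_sample_loss_identity` are not taken from the paper.

```python
import numpy as np

def check_sample_loss_identity(n=200_000, d=2, g=0.7, tau=0.1, seed=0):
    """Estimate both sides of L_FM = E||X_{t+tau} - f(X_t)||^2 - d g^2 tau."""
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal((n, d)) + 1.0          # toy stand-in for mu_t(Z)
    x_t = mu + 0.6 * rng.standard_normal((n, d))    # toy stand-in for X_t
    f = 0.5 * x_t                                   # arbitrary fixed drift f(X_t)
    eps = rng.standard_normal((n, d))               # noise independent of (Z, X_t)
    x_next = mu + g * np.sqrt(tau) * eps            # X_{t+tau} = mu + g sqrt(tau) eps

    l_fm = np.mean(np.sum((mu - f) ** 2, axis=1))
    l_sample = np.mean(np.sum((x_next - f) ** 2, axis=1))
    return l_fm, l_sample - d * g**2 * tau

l_fm, l_sample_shifted = check_sample_loss_identity()
print(f"L_FM estimate: {l_fm:.4f}, shifted sample loss: {l_sample_shifted:.4f}")
```

The two estimates should agree up to Monte Carlo error, since the cross term between $\varepsilon$ and $\mu_t(Z) - f(X_t)$ averages to zero.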
### 3.5 Nesterov Momentum Acceleration for Generation
We incorporate Nesterov's acceleration (Nesterov and others, [2018](https://arxiv.org/html/2606.17465#bib.bib6)) at the *observable* level so that evaluation is performed at a look-ahead point. Let $\{\phi_k\}_{k\ge 1}$ be an observable basis, and denote the Koopman step by $\mathcal{K}_\tau^\theta$. Define the extrapolated (look-ahead) observable

$$\psi_k(x_t) = \phi_k(x_t) + \eta_t\bigl(\phi_k(x_t) - \phi_k(x_{t-\tau})\bigr), \quad \eta_t \in [0, 1).$$

In the training loss, the input observable $\phi_k(x_t)$ is replaced by this momentum look-ahead:

$$\mathcal{L}_{\text{KPM-Nes}}(\theta) = \sum_{k=1}^{K}\mathbb{E}_{X_0\sim p,\,X_1\sim q}\bigl\|\phi_k\bigl(x_{t+\tau}(X_0, X_1)\bigr) - \mathcal{K}_\tau^\theta\psi_k\bigl(x_t(X_0, X_1)\bigr)\bigr\|^2. \tag{28}$$

In coordinates, i.e., $\phi_k(x) = x^{(k)}$, ([28](https://arxiv.org/html/2606.17465#S3.E28)) reduces to the vector form

$$\mathbb{E}_{X_0\sim p,\,X_1\sim q}\Bigl\|x_{t+\tau}(X_0, X_1) - \hat{\mathcal{K}}_\tau^\theta\bigl(x_t(X_0, X_1) + \eta_t\bigl(x_t(X_0, X_1) - x_{t-\tau}(X_0, X_1)\bigr)\bigr)\Bigr\|^2, \tag{29}$$

where $\hat{\mathcal{K}}_\tau^\theta$ denotes the Koopman step $\mathcal{K}_\tau^\theta$ acting on the coordinate observables. Given a trained $\mathcal{K}_\tau^\theta$, we propagate with a look-ahead state:
$$y_t = x_t + \eta_t\,(x_t - x_{t-\tau}), \tag{30a}$$

$$x_{t+\tau} = \hat{\mathcal{K}}_\tau^\theta(y_t), \quad x_0, x_{-\tau} \sim \mathcal{N}(0, I). \tag{30b}$$

Following the Nesterov momentum method from optimization, we formalize the resulting Nesterov-KPM training and sampling procedures in the algorithms below.
Algorithm 1: Nesterov-KPM Training (mini-batch)

1. Inputs: step $\tau$, momentum $\eta$, bridge $x_s(X_1, X_0)$
2. for batches of pairs $(X_0^{(i)}, X_1^{(i)})$ do
3. build $x_{t-\tau}^{(i)}, x_t^{(i)}, x_{t+\tau}^{(i)}$ from the bridge
4. $y_t^{(i)} \leftarrow x_t^{(i)} + \eta\bigl(x_t^{(i)} - x_{t-\tau}^{(i)}\bigr)$
5. $\hat{z}_t^{(i)} \leftarrow \hat{\mathcal{K}}_\tau^\theta\bigl(y_t^{(i)}\bigr)$
6. $\mathcal{L}_{\mathrm{KPM\text{-}Nes}} \leftarrow \frac{1}{B}\sum_i \bigl\|x_{t+\tau}^{(i)} - \hat{z}_t^{(i)}\bigr\|_2^2$
7. Update $\theta$ by gradient descent on $\mathcal{L}_{\mathrm{KPM\text{-}Nes}}$
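The training loop above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not in the paper: the bridge is taken to be the linear interpolant $x_s = (1-s)X_0 + sX_1$, the operator $\hat{\mathcal{K}}_\tau^\theta$ is replaced by a time-independent affine map `y @ W + b`, the target is a shifted Gaussian, and all function names are hypothetical.

```python
import numpy as np

def bridge(x0, x1, s):
    """Assumed linear bridge x_s = (1 - s) x0 + s x1."""
    return (1.0 - s) * x0 + s * x1

def loss_and_grads(W, b, x0, x1, t, tau, eta):
    """Nesterov-KPM loss for an affine stand-in model, with analytic gradients."""
    x_prev, x_curr, x_next = (bridge(x0, x1, s) for s in (t - tau, t, t + tau))
    y = x_curr + eta * (x_curr - x_prev)        # look-ahead input (step 4)
    resid = y @ W + b - x_next                  # prediction minus target (step 5)
    batch = x0.shape[0]
    loss = np.sum(resid**2) / batch             # mini-batch loss (step 6)
    grad_W = 2.0 * (y.T @ resid) / batch
    grad_b = 2.0 * resid.sum(axis=0) / batch
    return loss, grad_W, grad_b

def sample_pairs(rng, batch, dim):
    """Toy source/target pair: standard Gaussian and a shifted Gaussian."""
    x0 = rng.standard_normal((batch, dim))
    x1 = rng.standard_normal((batch, dim)) + 3.0
    return x0, x1

def train_nesterov_kpm(steps=300, batch=256, dim=2, tau=0.05, eta=0.5,
                       lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    W, b = np.eye(dim), np.zeros(dim)
    x0_eval, x1_eval = sample_pairs(rng, 2048, dim)   # fixed evaluation batch
    loss_before, _, _ = loss_and_grads(W, b, x0_eval, x1_eval, 0.5, tau, eta)
    for _ in range(steps):
        x0, x1 = sample_pairs(rng, batch, dim)
        t = rng.uniform(tau, 1.0 - tau)               # keep t - tau, t + tau in [0, 1]
        _, grad_W, grad_b = loss_and_grads(W, b, x0, x1, t, tau, eta)
        W -= lr * grad_W                              # plain gradient descent (step 7)
        b -= lr * grad_b
    loss_after, _, _ = loss_and_grads(W, b, x0_eval, x1_eval, 0.5, tau, eta)
    return W, b, loss_before, loss_after

W, b, loss_before, loss_after = train_nesterov_kpm()
print(f"eval loss before: {loss_before:.5f}, after: {loss_after:.5f}")
```

An affine map is used only to keep the gradients explicit; in practice the model would be a neural network trained with automatic differentiation.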
Algorithm 2: Nesterov-KPM Sampling

1. Initialize $x_0, x_{-\tau} \sim \mathcal{N}(0, I)$, set $t = 0$
2. while $t < 1$ do
3. $y_t \leftarrow x_t + \eta(x_t - x_{t-\tau})$
4. $x_{t+\tau} \leftarrow \hat{\mathcal{K}}_\tau^\theta(y_t)$
5. $t \leftarrow t + \tau$
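A minimal sketch of the sampling loop is given below. The callable `step_fn` is a hypothetical placeholder for the trained operator $\hat{\mathcal{K}}_\tau^\theta$, and the loop runs a fixed number of steps `round(1 / tau)` instead of testing `t < 1`, an implementation choice made here to avoid floating-point drift in the accumulated time.

```python
import numpy as np

def nesterov_kpm_sample(step_fn, n_samples, dim, tau, eta, seed=0):
    """Propagate samples with a look-ahead state; returns the list of states."""
    rng = np.random.default_rng(seed)
    x_prev = rng.standard_normal((n_samples, dim))   # x_{-tau}
    x_curr = rng.standard_normal((n_samples, dim))   # x_0
    trajectory = [x_curr]
    n_steps = int(round(1.0 / tau))
    for _ in range(n_steps):
        y = x_curr + eta * (x_curr - x_prev)         # look-ahead state y_t
        x_next = step_fn(y)                          # x_{t+tau} = K(y_t)
        x_prev, x_curr = x_curr, x_next
        trajectory.append(x_curr)
    return trajectory

# Usage with a toy contraction in place of a trained model.
traj = nesterov_kpm_sample(lambda y: 0.5 * y, n_samples=8, dim=2, tau=0.25, eta=0.0)
print(len(traj), traj[-1].shape)
```

With `eta=0.0` the loop reduces to plain iteration of `step_fn`, which makes it easy to compare against an unaccelerated sampler.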
## 4 Numerical Simulations
In this section, we adopt the flow-matching training loss and represent the Koopman operator by a deep neural network $\mathrm{NN}_\theta$ parameterized by $\theta$, that is,

$$\min_\theta\,\mathbb{E}_{X_0\sim p,\,X_1\sim q,\,t\sim\mathrm{U}[0,1]}\bigl\|x_{t+\tau}(X_1, X_0) - \mathrm{NN}_\theta\bigl(t, x_t(X_1, X_0)\bigr)\bigr\|^2. \tag{31}$$

With the trained parameter $\theta$, we iterate over time $t$:

$$\hat{X}_{t+\tau} = \mathrm{NN}_\theta(t, \hat{X}_t), \quad \hat{X}_0 \sim \mathcal{N}(0, I).$$

The main hyperparameters are listed in Table [1](https://arxiv.org/html/2606.17465#S4.T1).
Table 1: Parameter settings

Fig. [2](https://arxiv.org/html/2606.17465#S4.F2) shows the generated and original samples for the GMM and Two-Moon models, respectively.


Figure 2: Original samples (blue) from the GMM (left) and Two-Moon (right) models, and samples generated by PFOM (red).

Moreover, we train with the Nesterov momentum loss in Algorithm [1](https://arxiv.org/html/2606.17465#alg1) and generate samples via Algorithm [2](https://arxiv.org/html/2606.17465#alg2) on the GMM benchmark. Figure [3](https://arxiv.org/html/2606.17465#S4.F3) compares the *rates of decrease* in KL divergence, Wasserstein-2 distance, and maximum mean discrepancy (MMD) between standard Koopman path matching and its Nesterov-accelerated variant. Across all three metrics, the Nesterov variant converges faster and to a better value. The reported curves correspond to a single representative run; multi-seed evaluation is left for future work.
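For readers who want to reproduce such a comparison, one common way to estimate MMD between two sample sets is the biased V-statistic with a Gaussian (RBF) kernel, sketched below. The kernel choice, the bandwidth, and the function names are assumptions made for illustration; the paper does not specify which estimator was used.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

rng = np.random.default_rng(0)
reference = rng.standard_normal((200, 2))
same_law = rng.standard_normal((200, 2))         # fresh draw from the same law
shifted = rng.standard_normal((200, 2)) + 3.0    # clearly different law

mmd_same = mmd2_biased(reference, same_law)
mmd_shifted = mmd2_biased(reference, shifted)
print(f"same law: {mmd_same:.4f}, shifted: {mmd_shifted:.4f}")
```

Tracking this quantity between generated and reference samples over training iterations would yield a curve of the kind compared in the MMD row of the figure.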



Figure 3: Comparison of the decrease in KL divergence (first row), $W_2$ metric (second row), and maximum mean discrepancy (third row).

Fig. [4](https://arxiv.org/html/2606.17465#S4.F4) illustrates the generating process of our Nesterov-KPM sampling method for the GMM and Two-Moon models.


Figure 4: Generating process of our Nesterov-KPM sampling.
## 5 Conclusions
We have introduced Perron–Frobenius Operator Matching (PFOM), an operator-theoretic framework that connects density-level Perron–Frobenius evolution, Koopman path matching, and sample-conditioned generative training. We showed that the Kullback–Leibler divergence has a distinguished role among separable Bregman divergences in preserving the alignment between marginal density objectives and conditional losses. We also developed a Nesterov-type inertial variant, which improves the empirical convergence behavior of the Koopman path-matching implementation on Gaussian mixture and two-moon benchmarks. The present experiments serve as low-dimensional proof-of-concept validations. Future work will focus on higher-dimensional benchmarks, adaptive observable dictionaries, latent-space image modeling, and controlled PFOM formulations with explicit input or feedback dependence. More systematic empirical evaluation, including multi-seed robustness, uncertainty bands, and comparisons with standard flow-matching and diffusion baselines, will also be important for assessing the practical scalability of the proposed approach.
## References
- S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15), pp. 3932–3937. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- C. Cui, J. Liu, P. Hui, P. Lin, and C. Zhang (2025) GenControl: generative AI-driven autonomous design of control algorithms. arXiv preprint arXiv:2506.12554. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- T. Eisner, B. Farkas, M. Haase, and R. Nagel (2015) Operator theoretic aspects of ergodic theory. Vol. 272, Springer. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- J. Ho, A. Jain, and P. Abbeel (2020) Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, pp. 6840–6851. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1).
- P. Holderrieth, M. Havasi, J. Yim, N. Shaul, I. Gat, T. Jaakkola, B. Karrer, R. T. Chen, and Y. Lipman (2024) Generator matching: generative modeling with arbitrary Markov processes. arXiv preprint arXiv:2410.20587. Cited by: [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- A. Karimi and T. T. Georgiou (2022) Data-driven approximation of the Perron-Frobenius operator using the Wasserstein metric. IFAC-PapersOnLine 55(30), pp. 341–346. Cited by: [§3.2](https://arxiv.org/html/2606.17465#S3.SS2.p1.6), [Remark 1](https://arxiv.org/html/2606.17465#Thmremark1.p1.1).
- S. M. Katz, A. L. Corso, C. A. Strong, and M. J. Kochenderfer (2022) Verification of image-based neural network controllers using generative models. Journal of Aerospace Information Systems 19(9), pp. 574–584. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- A. Lasota and M. C. Mackey (2013) Chaos, fractals, and noise: stochastic aspects of dynamics. Vol. 97, Springer Science & Business Media. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§2.1](https://arxiv.org/html/2606.17465#S2.SS1.p1.11), [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p1.7).
- B. Lemmens and R. Nussbaum (2012) Nonlinear Perron-Frobenius theory. Vol. 189, Cambridge University Press. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis (2017) Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science 27(10). Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§3](https://arxiv.org/html/2606.17465#S3.p2.1).
- Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le (2022) Flow matching for generative modeling. arXiv preprint arXiv:2210.02747. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p3.1), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p2.8), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- Y. Lipman, M. Havasi, P. Holderrieth, N. Shaul, M. Le, B. Karrer, R. T. Chen, D. Lopez-Paz, H. Ben-Hamu, and I. Gat (2024) Flow matching guide and code. arXiv preprint arXiv:2412.06264. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1), [Figure 1](https://arxiv.org/html/2606.17465#S2.F1.4.1), [Figure 1](https://arxiv.org/html/2606.17465#S2.F1.6.2), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- Y. Nesterov et al. (2018) Lectures on convex optimization. Vol. 137, Springer. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p5.1), [§3.5](https://arxiv.org/html/2606.17465#S3.SS5.p1.2).
- A. V. Oppenheim, A. S. Willsky, and S. H. Nawab (1997) Signals & systems. Pearson Educación. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- J. L. Proctor, S. L. Brunton, and J. N. Kutz (2016) Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems 15(1), pp. 142–161. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§3](https://arxiv.org/html/2606.17465#S3.p2.1).
- H. Risken (1989) Fokker-Planck equation. In The Fokker-Planck equation: methods of solution and applications, pp. 63–95. Cited by: [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p1.7), [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p2.3).
- T. Rolski, H. Schmidli, V. Schmidt, and J. L. Teugels (2009) Stochastic processes for insurance and finance. John Wiley & Sons. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- S. M. Ross (1995) Stochastic processes. John Wiley & Sons. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- N. G. Van Kampen (1992) Stochastic processes in physics and chemistry. Vol. 1, Elsevier. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M. Yang (2023) Diffusion models: a comprehensive survey of methods and applications. ACM Computing Surveys 56(4), pp. 1–39. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1).