Perron--Frobenius Operator Matching for Generative Modeling

arXiv cs.LG Papers

Summary

Introduces Perron–Frobenius Operator Matching (PFOM), a generative framework that unifies flow, diffusion, and jump models via integral PF operator matching, proving KL divergence yields a practical loss equivalent to Koopman path matching, and develops Nesterov-accelerated training and sampling for improved efficiency.

arXiv:2606.17465v1 Announce Type: new Abstract: We introduce Perron--Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback--Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. %On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

# Perron–Frobenius Operator Matching for Generative Modeling
Source: [https://arxiv.org/html/2606.17465](https://arxiv.org/html/2606.17465)
Wuwei Wu, Jaemin Oh, Jie Chen, Xiaoning Qian

Texas A&M University, College Station, TX 77840, USA (e-mail: shiqizhang001@tamu.edu; jaemin_oh@tamu.edu; xqian@ece.tamu.edu)

City University of Hong Kong, Kowloon, Hong Kong SAR (e-mail: w.wu@my.cityu.edu.hk; jichen@cityu.edu.hk)

###### Abstract

We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

###### keywords:

Koopman and Perron\-Frobenius Operators, Flow Matching, Generative modeling

This work was supported in part by the Hong Kong RGC under Projects CityU 11203321, CityU 11213322, and CityU 11207823. XQ acknowledges the support from U.S. National Science Foundation (NSF) grants SHF-2215573 and IIS-2212419.

## 1 Introduction

Characterizing Markov processes is fundamental to stochastic analysis (Ross, [1995](https://arxiv.org/html/2606.17465#bib.bib3)), with wide-ranging applications in, e.g., finance (Rolski et al., [2009](https://arxiv.org/html/2606.17465#bib.bib15)), statistical physics (Van Kampen, [1992](https://arxiv.org/html/2606.17465#bib.bib14)), and signal processing (Oppenheim et al., [1997](https://arxiv.org/html/2606.17465#bib.bib13)). The recent surge of artificial intelligence and generative modeling has amplified interest in learnable Markovian dynamics (Ho et al., [2020](https://arxiv.org/html/2606.17465#bib.bib12); Yang et al., [2023](https://arxiv.org/html/2606.17465#bib.bib8); Lipman et al., [2024](https://arxiv.org/html/2606.17465#bib.bib22)), which is of interest to the modeling and control of large-scale, complex systems, especially in the context of neural network-based control design (Katz et al., [2022](https://arxiv.org/html/2606.17465#bib.bib10)) and generative AI-driven automated control algorithms (Cui et al., [2025](https://arxiv.org/html/2606.17465#bib.bib11)).

A central challenge is to efficiently and accurately parameterize Markov processes. Operator-theoretic perspectives provide a principled route: the Markov transfer operator (Eisner et al., [2015](https://arxiv.org/html/2606.17465#bib.bib7)) offers a dominant characterization, and for (nonlinear) semidynamical systems, Perron–Frobenius theory grounds the Markov semigroup (Lemmens and Nussbaum, [2012](https://arxiv.org/html/2606.17465#bib.bib4); Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)), revealing the duality between Koopman and Perron–Frobenius operators. From a control-theoretic viewpoint, these operators encode the probabilistic evolution of closed-loop dynamics under stochastic policies and exogenous disturbances, enabling linear surrogates for stability analysis, constraint satisfaction, and performance verification of nonlinear systems. In safety-critical applications such as robotics, power systems, and networked infrastructures, learning and manipulating such operators from data is therefore crucial for risk-aware decision-making and robust control synthesis. Building on this view, data-driven identification methods, such as DMD (Proctor et al., [2016](https://arxiv.org/html/2606.17465#bib.bib2)) and EDMD (Li et al., [2017](https://arxiv.org/html/2606.17465#bib.bib20); Brunton et al., [2016](https://arxiv.org/html/2606.17465#bib.bib5)), have become standard.

Concurrently, modern generative models, such as diffusion (Ho et al., [2020](https://arxiv.org/html/2606.17465#bib.bib12); Yang et al., [2023](https://arxiv.org/html/2606.17465#bib.bib8)) and flow-based models (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1), [2024](https://arxiv.org/html/2606.17465#bib.bib22)), impose stronger demands: capturing multimodality and nonlinear density evolution with sample-conditioned efficiency. Traditional Koopman/Perron–Frobenius identification, designed primarily for prediction and control, does not directly address these generative objectives.

To bridge this gap, we introduce *Perron–Frobenius Operator Matching (PFOM)*. PFOM (i) generalizes diffusion and flow matching paradigms by matching full density evolution, extending beyond first-order (velocity) descriptions to infinitely many orders, and (ii) strengthens operator learning for generative purposes by aligning density-level objectives with sample-conditioned criteria, thereby unifying operator-theoretic identification with modern generative modeling.

An important extension of PFOM is an *inertial* optimization/sampling scheme based on Nesterov's acceleration (Nesterov and others, [2018](https://arxiv.org/html/2606.17465#bib.bib6)). We employ a lookahead extrapolation on the operator-parameter iterates and an inertial update on sample trajectories. Concretely, PFOM alternates between (a) extrapolated evaluation of the PF loss at a momentum point and (b) corrective updates with restart/monotone safeguards. This yields: (i) faster empirical convergence of the PF loss and density metrics (KL, $W_2$, MMD); (ii) reduced discretization error in sample propagation due to lookahead stabilization.
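To make the alternation between a lookahead evaluation and a safeguarded corrective update concrete, the following is a minimal generic sketch of Nesterov momentum with a monotone restart rule. It is not the paper's algorithm: the function `nesterov_with_restart`, the momentum schedule `k / (k + 3)`, and the toy quadratic standing in for the PF loss are all assumptions made for illustration.

```python
import numpy as np

def nesterov_with_restart(loss, grad, theta0, lr, n_iters):
    """Nesterov lookahead steps with a monotone safeguard (generic sketch)."""
    theta = np.array(theta0, dtype=float)
    theta_prev = theta.copy()
    k = 0                      # momentum counter, reset to 0 on restart
    history = [loss(theta)]
    for _ in range(n_iters):
        beta = k / (k + 3.0)   # assumed momentum schedule
        lookahead = theta + beta * (theta - theta_prev)
        # (a) gradient evaluated at the extrapolated (momentum) point
        candidate = lookahead - lr * grad(lookahead)
        if loss(candidate) > history[-1]:
            # (b) safeguard: drop the momentum and take a plain gradient step
            candidate = theta - lr * grad(theta)
            k = 0
        else:
            k += 1
        theta_prev, theta = theta, candidate
        history.append(loss(theta))
    return theta, history

# Hypothetical ill-conditioned quadratic used in place of the PF loss.
A = np.diag([1.0, 10.0, 100.0])
loss = lambda th: 0.5 * th @ A @ th
grad = lambda th: A @ th
theta, history = nesterov_with_restart(loss, grad, np.ones(3), lr=0.01, n_iters=500)
print(history[0], history[-1])
```

With the step size set to the inverse of the largest curvature, the fallback gradient step cannot increase the loss, so the recorded loss history is non-increasing by construction.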

The rest of the paper is organized as follows. In Section [2](https://arxiv.org/html/2606.17465#S2), we review relevant background on Koopman/Perron–Frobenius theory, Wasserstein and Bregman divergence measures, and generative modeling. In Section [3](https://arxiv.org/html/2606.17465#S3), we explain why and how (under what measure) to perform Perron–Frobenius operator matching, and then convert it into the Koopman path matching problem for implementation. Section [3.5](https://arxiv.org/html/2606.17465#S3.SS5) presents a Nesterov momentum acceleration method for faster generation. Section [4](https://arxiv.org/html/2606.17465#S4) presents simulation results, and Section [5](https://arxiv.org/html/2606.17465#S5) concludes the paper.

## 2 Preliminaries

### 2.1 Koopman and Perron–Frobenius Operators

Consider a nonlinear dynamical system $x_t = S_t(x_0)$, where $S\colon \mathbb{R}^n \to \mathbb{R}^n$ is a non-singular mapping. For some $f \in L_\infty$, the Koopman operator $\mathcal{K}_\tau$ is defined as

$$(\mathcal{K}_\tau f)(x_t) = f(S_\tau(x_t)). \tag{1}$$

For some $g \in L_1$, the Perron–Frobenius (PF) operator $\mathcal{P}_\tau$ is defined as

$$\int_{y \in A} (\mathcal{P}_\tau g)(y)\,\mathrm{d}y = \int_{x \in S_\tau^{-1}(A)} g(x)\,\mathrm{d}x, \quad \forall A \in \Sigma, \tag{2}$$

where $\Sigma$ denotes some $\sigma$-algebra corresponding to the space $\mathbb{R}^n$. When $g$ is a density function, the PF operator $\mathcal{P}_\tau$ is a *Markov operator* that pushes the present density forward to future densities (Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)).

By PF–Koopman duality, for any $f \in L_\infty$ and any density $g \in L_1$, we always have

$$\langle \mathcal{K}_\tau f, g \rangle = \langle f, \mathcal{P}_\tau g \rangle. \tag{3}$$

This means that the Koopman and PF operators form a dual pair.
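The duality can be checked numerically in one dimension. The sketch below assumes a hypothetical affine map $S(x) = ax + b$, for which the PF operator has the closed form $(\mathcal{P}g)(y) = g(S^{-1}(y))/|a|$, and compares both sides of the pairing with a simple Riemann sum. The map, observable, density, and grid are illustrative choices rather than anything taken from the paper.

```python
import numpy as np

# Hypothetical 1-D affine map S(x) = a*x + b (non-singular since a != 0).
a, b = 1.5, 0.3
S = lambda x: a * x + b
S_inv = lambda y: (y - b) / a

f = np.tanh  # bounded observable, f in L_inf
g = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # standard normal density

def koopman(f):
    """(K f)(x) = f(S(x))."""
    return lambda x: f(S(x))

def perron_frobenius(g):
    """(P g)(y) = g(S^{-1}(y)) / |S'(S^{-1}(y))|; here |S'| = |a| everywhere."""
    return lambda y: g(S_inv(y)) / abs(a)

# Riemann-sum quadrature on a wide uniform grid.
x = np.linspace(-30.0, 30.0, 120001)
dx = x[1] - x[0]
lhs = np.sum(koopman(f)(x) * g(x)) * dx            # <K f, g>
rhs = np.sum(f(x) * perron_frobenius(g)(x)) * dx   # <f, P g>
mass = np.sum(perron_frobenius(g)(x)) * dx         # P g should remain a density
print(lhs, rhs, mass)
```

The two pairings agree up to quadrature error, and the pushed-forward density still integrates to one, which is the Markov-operator property mentioned above.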

### 2.2 Generative Modeling

Consider two random vectors $X_0 \sim \mathcal{N}(0, I)$ and $X_1 \sim q(X_1)$, where $X_0$ is generated from the known prior distribution while $X_1$ is from some distribution $q(X_1)$ whose analytical form is not known *a priori*. The objective of *generative modeling* is to learn a generative model $\mathcal{M}_\theta(X_1)$ from observed data $\mathcal{D}(X_1)$ to generate samples following the distribution $q(X_1)$.

As shown in Fig. [1](https://arxiv.org/html/2606.17465#S2.F1), one such generative modeling strategy is *Flow Matching* (FM) (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1)). It constructs a probability path $(p_t)_{t \in [0,1]}$ from a known source distribution $p_0 = p$ to the target distribution $p_1 = q$, where each $p_t$ is a distribution over $\mathbb{R}^d$. Specifically, FM adopts a simple regression objective to train a velocity-field neural network describing the instantaneous velocities of samples, which is later used to convert the source distribution $p_0$ into the target distribution $p_1$ along the probability path $p_t$. That is, FM minimizes the flow matching loss

$$\mathbb{E}_{X_t \sim p_t;\, t \sim \mathrm{Unif}[0,1]} \big\| u(X_t) - v_\theta(X_t) \big\|^2 \tag{4}$$

by minimizing its surrogate version (the one conditional on $X_0$ and $X_1$):

$$\mathbb{E}_{X_0 \sim p,\, X_1 \sim q,\, t \sim \mathrm{U}[0,1]} \Big\| u(X_t(X_1, X_0, t)) - v_\theta(X_t(X_1, X_0, t)) \Big\|^2. \tag{5}$$

Notice that for ([5](https://arxiv.org/html/2606.17465#S2.E5)) and ([4](https://arxiv.org/html/2606.17465#S2.E4)) to have the same optima, one has to use a *Sample-Level Bregman divergence* as the distance measure, of which the mean squared error (MSE) loss is a special case. After training, we generate a novel sample from the target distribution $X_1 \sim q$ by (i) drawing a novel sample from the source distribution $X_0 \sim p$, and (ii) solving the ordinary differential equation (ODE) determined by the velocity field: $\dot{X}_t = v_\theta(X_t)$, $t \in [0,1]$.

In the discrete-time setting, FM is formulated as *Path Matching*. Meanwhile, the flow ODE $\dot{X}_t = v_\theta(X_t)$ is solved by simulating the discrete path equation $X_{k+1} = X_k + \tau v_\theta(X_k)$.
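As a small illustration of the conditional loss and the discrete path equation, the sketch below evaluates a Monte Carlo estimate of the conditional objective and then runs the Euler recursion $X_{k+1} = X_k + \tau v(X_k)$. Several pieces are assumptions for the sake of a runnable toy: the straight-line path $X_t = (1-t)X_0 + tX_1$, the paired coupling $X_1 = X_0 + \text{shift}$, and a hand-written constant velocity field in place of a trained network $v_\theta$.

```python
import numpy as np

def euler_path(v, x0, n_steps):
    """Simulate X_{k+1} = X_k + tau * v(X_k, t_k) on [0, 1] with tau = 1 / n_steps."""
    tau = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + tau * v(x, k * tau)
    return x

def conditional_fm_loss(v, x0, x1, t):
    """Monte Carlo conditional loss for the (assumed) straight-line path
    X_t = (1 - t) X_0 + t X_1, whose conditional velocity is X_1 - X_0."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0
    return float(np.mean(np.sum((target - v(xt, t)) ** 2, axis=1)))

rng = np.random.default_rng(0)
shift = np.array([2.0, -1.0])
# Hypothetical velocity field: a constant translation, standing in for v_theta.
v = lambda x, t: np.broadcast_to(shift, x.shape)

x0 = rng.standard_normal((1000, 2))   # source samples X_0 ~ N(0, I)
x1 = x0 + shift                       # paired target samples (toy coupling)
t = rng.uniform(size=1000)
fm_loss = conditional_fm_loss(v, x0, x1, t)

x0_new = rng.standard_normal((1000, 2))
samples = euler_path(v, x0_new, n_steps=10)
print(fm_loss, samples.mean(axis=0))
```

In this toy the velocity field matches the conditional target exactly, so the loss vanishes up to rounding and the Euler path simply translates the fresh source samples by `shift`.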

![Refer to caption](https://arxiv.org/html/2606.17465v1/Flow.png) ![Refer to caption](https://arxiv.org/html/2606.17465v1/Generating.png)

Figure 1: Demonstration for sample and noise (left) and the corresponding generation process (right) (Lipman et al., [2024](https://arxiv.org/html/2606.17465#bib.bib22)).

Flow matching and diffusion models control the *local* terms in ([6](https://arxiv.org/html/2606.17465#S3.E6)), namely drift (first order) and diffusion (second order), within a KFE-based objective (Lipman et al., [2022](https://arxiv.org/html/2606.17465#bib.bib1), [2024](https://arxiv.org/html/2606.17465#bib.bib22)). Recent generator matching extends this to include jump contributions (Holderrieth et al., [2024](https://arxiv.org/html/2606.17465#bib.bib18)). However, all these existing formulations operate at the *infinitesimal* level and characterize only the first-order, second-order, or jump terms of the differential approximation, which may fail to capture higher-order, multi-step transport effects crucial for complex, multi-modal density evolution.

## 3 Perron–Frobenius Operator Matching

We here propose a new generative modeling framework, Perron–Frobenius operator matching (PFOM), which elevates generative modeling from matching local, infinitesimal dynamics to directly aligning the finite-time evolution of densities. Instead of constraining only the drift/diffusion terms in a Kolmogorov forward equation (KFE), as in flow matching and diffusion models, PFOM works at the level of the integral Perron–Frobenius operator, which encapsulates the full Markov semigroup and accounts for higher-order and multi-step transport effects that are critical for complex, multimodal distributions. By matching $\mathcal{P}_\tau \rho_t$ and $\rho_{t+\tau}$ at a finite step $\tau$, PFOM captures richer global evolution than purely velocity-based schemes, while still remaining compatible with operator-theoretic tools such as Koopman and DMD/EDMD-based identification.

We further formulate PFOM with a practical, sample-conditioned training loss. We show that among separable Bregman divergences, KL is the unique choice that keeps the density-level PF loss exactly aligned with its conditional counterpart, thereby justifying a KL-based PFOM objective for generative training. Pushing this loss through the PF–Koopman duality yields an equivalent Koopman path-matching formulation that can be realized with neural operators or classical DMD/EDMD (Proctor et al., [2016](https://arxiv.org/html/2606.17465#bib.bib2); Li et al., [2017](https://arxiv.org/html/2606.17465#bib.bib20)). We also derive Nesterov-style inertial updates for faster and more stable optimization and sampling. In this section we first formalize PFOM, then establish its Koopman equivalences, and finally introduce a Nesterov-accelerated variant tailored for efficient training.
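For readers less familiar with the classical route, a generic EDMD fit can be sketched in a few lines: a finite Koopman matrix is estimated by least squares so that the dictionary evaluated at the next state is approximately a linear image of the dictionary at the current state. The dictionary $[x, x^2]$, the toy linear map $S(x) = ax$, and the helper names below are hypothetical; this is standard EDMD, not the PFOM training procedure.

```python
import numpy as np

def dictionary(x):
    """Hypothetical dictionary Phi(x) = [x, x^2] for scalar states."""
    return np.stack([x, x**2], axis=1)

def edmd(x_now, x_next):
    """Least-squares Koopman matrix K with Phi(x_next) ~= Phi(x_now) @ K."""
    K, *_ = np.linalg.lstsq(dictionary(x_now), dictionary(x_next), rcond=None)
    return K

rng = np.random.default_rng(0)
a = 0.8
x_now = rng.standard_normal(200)
x_next = a * x_now            # one step of the toy linear map S(x) = a * x
K = edmd(x_now, x_next)
print(np.round(K, 6))
```

Because $x \mapsto ax$ sends $x$ to $ax$ and $x^2$ to $a^2 x^2$, this dictionary is invariant under the Koopman operator and the fitted matrix is close to $\mathrm{diag}(a, a^2)$.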

### 3.1 Why Perron–Frobenius Operators?

Let $(\mathcal{P}_\tau)_{\tau \geq 0}$ denote the Perron–Frobenius (PF) semigroup acting on densities, and $(\mathcal{K}_\tau)_{\tau \geq 0}$ the Koopman semigroup acting on test functions (observables). A sufficiently regular Markov process with drift $u_t(x)$, diffusion $\sigma_t(x)$, and jumps is governed by the KFE (Risken, [1989](https://arxiv.org/html/2606.17465#bib.bib17); Lasota and Mackey, [2013](https://arxiv.org/html/2606.17465#bib.bib16)):

$$\partial_t \langle \rho_t, f \rangle = \langle \rho_t, \mathcal{L}^* f \rangle,$$

where $\mathcal{L}^*$ is the (Koopman) infinitesimal generator acting on $f$:

$$\mathcal{L}^* f(x) = \underbrace{u_t(x)^{\mathsf{T}} \nabla f(x)}_{\text{drift}} + \underbrace{\tfrac{1}{2}\,\mathrm{tr}\big(\sigma_t(x)\sigma_t(x)^{\mathsf{T}} \nabla^2 f(x)\big)}_{\text{diffusion}} + \underbrace{\text{(jump term)}}_{\text{if present}}. \tag{6}$$

Equivalently, on densities the adjoint generator $\mathcal{L}$ yields the Fokker–Planck form $\partial_t \rho_t = \mathcal{L}\rho_t$. The integral PF operator satisfies

$$\rho_{t+\tau} = \mathcal{P}_\tau \rho_t = e^{\tau\mathcal{L}} \rho_t, \qquad \langle \rho_{t+\tau}, f \rangle = \langle \rho_t, \mathcal{K}_\tau f \rangle, \tag{7}$$

so that $\mathcal{K}_\tau = e^{\tau\mathcal{L}^*}$ and $\mathcal{P}_\tau = e^{\tau\mathcal{L}}$ are dual.

In contrast to infinitesimal matching, PFOM compares the *integral* evolution $\mathcal{P}_\tau \rho_t$ vs. $\rho_{t+\tau}$ for finite $\tau$, thereby capturing *all orders* in the expansion of $e^{\tau\mathcal{L}}$ (Risken, [1989](https://arxiv.org/html/2606.17465#bib.bib17)). Practically, this allows us to train against richer, multi-step transport phenomena that are invisible to purely infinitesimal matching.
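The gap between a first-order description and the full finite-step operator is easy to see on a finite-state jump process, where the generator is a rate matrix $Q$ and $e^{\tau Q}$ can be summed directly. The sketch below uses a hypothetical three-state rate matrix and a hand-rolled truncated Taylor series (both assumptions for illustration) to compare the first- and second-order truncations against a high-order reference.

```python
import numpy as np

def expm_taylor(A, order):
    """Truncated Taylor series sum_{k=0}^{order} A^k / k!."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):
        term = term @ A / k
        out = out + term
    return out

# Hypothetical 3-state jump generator: off-diagonals >= 0, rows sum to zero.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.6, -1.0]])
tau = 0.5

P_ref = expm_taylor(tau * Q, 30)     # finite-step transition matrix e^{tau Q}
err_first = np.abs(expm_taylor(tau * Q, 1) - P_ref).max()    # I + tau Q only
err_second = np.abs(expm_taylor(tau * Q, 2) - P_ref).max()   # adds tau^2 Q^2 / 2
print(err_first, err_second)
```

The first-order truncation $I + \tau Q$ is what an infinitesimal match sees. Each additional term shrinks the discrepancy, and matching the finite-step operator itself amounts to keeping the whole series.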

### 3.2 Wasserstein-Divergence Guided PFOM

We denote by $\Pi(\rho_0, \rho_1)$ the set of all joint distributions with starting marginal density $\rho_0$ and ending marginal density $\rho_1$. The *Wasserstein-2 metric* is defined by

$$W_2^2(\rho_0, \rho_1) = \inf_{\pi \in \Pi(\rho_0, \rho_1)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{8}$$

In Karimi and Georgiou ([2022](https://arxiv.org/html/2606.17465#bib.bib23)), the authors took the Wasserstein-2 metric as the loss function to match the density flow $\rho_k(x)$ through learning the Perron–Frobenius operator $\mathcal{P}$ such that $(\mathcal{P}\rho_k)(x) = \rho_{k+1}(x)$:

$$W_2^2\big((\mathcal{P}\rho_k)(x), \rho_{k+1}(y)\big) = \inf_{\pi \in \Pi(\mathcal{P}\rho_k, \rho_{k+1})} \int \mathrm{d}y\, \mathrm{d}x\, \|x - y\|^2\, \pi(x, y).$$

Consider a set of observables $\{\phi_k\}_{k=1}^{K} \subset L_\infty$ and define the dictionary $\Phi\colon \mathbb{R}^n \to \mathbb{R}^K$ as the vector-valued function

$$\Phi(x) = \begin{bmatrix} \phi_1(x) & \cdots & \phi_K(x) \end{bmatrix}^{\mathsf{T}}.$$

The Koopman operator $\mathcal{K}_\tau$ acts on this dictionary component-wise, yielding

$$\mathcal{K}_\tau \Phi = \begin{bmatrix} \mathcal{K}_\tau \phi_1 & \cdots & \mathcal{K}_\tau \phi_K \end{bmatrix}^{\mathsf{T}}.$$

The discrepancy between $\mathcal{P}_\tau \rho_t$ and $\rho_{t+\tau}$ on the observables $\{\phi_k\}_{k=1}^{K}$ can be measured by the Wasserstein-2 metric as follows:

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \coloneqq \inf_{\pi \in \Pi(\mathcal{P}_\tau \rho_t, \rho_{t+\tau})} \int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{9}$$

Similarly, we can define the discrepancy between $\mathcal{K}_\tau \Phi$ and $\Phi$ under the Wasserstein-2 metric as

$$W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}) \coloneqq \inf_{\pi \in \Pi(\rho_t, \rho_{t+\tau})} \int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\|\mathcal{K}_\tau \Phi(x) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y). \tag{10}$$

The following theorem shows the equivalence between these two discrepancies.

###### Theorem 1

For any set of observables $\{\phi_k\}_{k=1}^{K} \subset L_\infty$, we have $W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) = W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau})$.

###### Proof.

Throughout the proof we write $\mathcal{P}$, $\mathcal{K}$, and $S$ for $\mathcal{P}_\tau$, $\mathcal{K}_\tau$, and $S_\tau$. For any $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$, we first obtain

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}y \,|\, x)\, \mathrm{d}x \\
&= \int_{\mathbb{R}^n} \rho_t(x)\, \mathcal{K} \int_{\mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}y \,|\, x)\, \mathrm{d}x \\
&= \int_{\mathbb{R}^n} \rho_t(x) \int_{\mathbb{R}^n} \bigl\|\Phi(S(x)) - \Phi(y)\bigr\|^2\, \pi(\mathrm{d}y \,|\, S(x))\, \mathrm{d}x,
\end{aligned}$$

where the first equality comes from the disintegration theorem, the second is due to the duality between the PF operator $\mathcal{P}$ and the Koopman operator $\mathcal{K}$, and the third follows from the definition of Koopman operators. Let $\pi_1(\mathrm{d}x, \mathrm{d}y) = \rho_t(x)\, \pi(\mathrm{d}y \,|\, S(x))\, \mathrm{d}x$. It is clear that

$$\pi_1(\mathrm{d}x, \mathbb{R}^n) = \rho_t(x)\, \pi(\mathbb{R}^n \,|\, S(x))\, \mathrm{d}x = \rho_t(x)\, \mathrm{d}x,$$

which implies that $\pi_1$ has first marginal density $\rho_t$. On the other hand, for any measurable function $g\colon \mathbb{R}^n \to \mathbb{R}$,

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} g(y)\, \pi_1(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \rho_t(x) \int_{\mathbb{R}^n} g(y)\, \pi(\mathrm{d}y \,|\, S(x))\, \mathrm{d}x \\
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} g(y)\, \pi(\mathrm{d}y \,|\, x)\, \mathrm{d}x \\
&= \int_{\mathbb{R}^n \times \mathbb{R}^n} g(y)\, \pi(\mathrm{d}x, \mathrm{d}y) \\
&= \int_{\mathbb{R}^n} g(y)\, \rho_{t+\tau}(y)\, \mathrm{d}y,
\end{aligned}$$

where the last two equalities come from the fact that $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$. As a result, $\pi_1$ has second marginal density $\rho_{t+\tau}$, and thus $\pi_1 \in \Pi(\rho_t, \rho_{t+\tau})$. This gives rise to

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \geq W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}). \tag{11}$$

On the other hand, for any $\pi_1 \in \Pi(\rho_t, \rho_{t+\tau})$, we have

$$\begin{aligned}
\int_{\mathbb{R}^n \times \mathbb{R}^n} \bigl\|\mathcal{K}\Phi(x) - \Phi(y)\bigr\|^2\, \pi_1(\mathrm{d}x, \mathrm{d}y)
&= \int_{\mathbb{R}^n} \rho_t(x)\, \mathcal{K} \int_{\mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\, \mathrm{d}x \\
&= \int_{\mathbb{R}^n} \mathcal{P}\rho_t(x) \int_{\mathbb{R}^n} \bigl\|\Phi(x) - \Phi(y)\bigr\|^2\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\, \mathrm{d}x.
\end{aligned}$$

Let $\pi(\mathrm{d}x, \mathrm{d}y) = \mathcal{P}\rho_t(x)\, \pi_1(\mathrm{d}y \,|\, S^{-1}(x))\, \mathrm{d}x$. A similar approach shows that $\pi \in \Pi(\mathcal{P}\rho_t, \rho_{t+\tau})$, and in turn

$$W_{\mathcal{P}_\tau}^2(\rho_t, \rho_{t+\tau}) \leq W_{\mathcal{K}_\tau}^2(\rho_t, \rho_{t+\tau}). \tag{12}$$

The proof is completed by combining the inequalities ([11](https://arxiv.org/html/2606.17465#S3.E11)) and ([12](https://arxiv.org/html/2606.17465#S3.E12)). ∎
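An empirical reading of the theorem: for a deterministic map $S$, samples of $\mathcal{P}_\tau \rho_t$ are obtained by pushing samples of $\rho_t$ through $S$, while $\mathcal{K}_\tau \Phi$ is obtained by composing the dictionary with $S$, so both discrepancies lead to the same matching problem. The sketch below checks this on tiny point clouds using a brute-force search over permutations. The map, the dictionary, and the sample sizes are hypothetical, and at realistic scale a linear assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force search.

```python
import itertools
import numpy as np

def dictionary(x):
    """Hypothetical bounded dictionary Phi(x) = [sin x, cos x, tanh x]."""
    return np.stack([np.sin(x), np.cos(x), np.tanh(x)], axis=1)

def empirical_w2_sq(feat_a, feat_b):
    """Optimal matching cost between two equal-size, uniformly weighted clouds."""
    cost = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(axis=2)
    n = cost.shape[0]
    best = min(sum(cost[i, p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))
    return best / n

S = lambda x: x + 0.5 * np.sin(x)            # toy non-singular map (S' > 0)
koopman_dictionary = lambda x: dictionary(S(x))   # (K Phi)(x) = Phi(S(x))

rng = np.random.default_rng(0)
x_t = rng.standard_normal(6)                 # samples of rho_t
y = 1.0 + 0.5 * rng.standard_normal(6)       # samples of rho_{t+tau}

w_pf = empirical_w2_sq(dictionary(S(x_t)), y_feat := dictionary(y))   # eq. (9)
w_koopman = empirical_w2_sq(koopman_dictionary(x_t), y_feat)          # eq. (10)
print(w_pf, w_koopman)
```

The two numbers coincide because the cost matrices are identical. This only illustrates the statement for empirical measures and does not replace the measure-theoretic argument above.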

### 3.3 Bregman-Divergence Guided PFOM

In PFOM we match the finite-step density evolution $\rho_{t+\tau} \approx \mathcal{P}_\tau \rho_t$, yet training only accesses conditionals indexed by the data sample $X_1 \sim q$, whose mixture is the marginal, $\rho_{t+\tau} = \mathbb{E}_{X_1 \sim q}\, \rho_{t+\tau}(\cdot \,|\, X_1)$. We ask the density discrepancy $D$ to meet two requirements.

(P1) *(conditional–marginal consistency)*: for every data law $q$ and every family of conditional densities $\{\rho(\cdot \,|\, X_1)\}$ with marginal $\bar{\rho} = \mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1)$, and every density $\sigma$,

$$\mathbb{E}_{X_1 \sim q}\, D\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \sigma\bigr) = D(\bar{\rho} \,\|\, \sigma) + C, \tag{13}$$

$$C = C(q)\ \text{independent of } \sigma, \tag{14}$$

so the conditional loss differs from the marginal objective only by a parameter-free constant and is thus an exact training surrogate.

(P2) *(reparametrization invariance)*: for every nondegenerate coordinate change (diffeomorphism) $T$ with push-forward $T_\#$,

$$D\bigl(T_\# \rho \,\big\|\, T_\# \sigma\bigr) = D(\rho \,\|\, \sigma), \tag{15}$$

so the aligned operator is intrinsic to the densities, not an artifact of the chosen (and, for adaptive dictionaries, varying) observable coordinates. These two requirements single out the Kullback–Leibler divergence.
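Requirement (P1) can be probed numerically on a discrete analogue, where densities become probability vectors. In the sketch below, the data law, the conditionals, and the candidate targets $\sigma$ are all random draws chosen for illustration. The quantity of interest is the gap between the averaged conditional KL and the marginal KL, which should not change with $\sigma$.

```python
import numpy as np

def kl(p, s):
    """KL(p || s) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / s)))

rng = np.random.default_rng(0)
n_cond, n_bins = 4, 6
q = rng.dirichlet(np.ones(n_cond))                   # data law over X_1 values
cond = rng.dirichlet(np.ones(n_bins), size=n_cond)   # rows: rho(. | X_1 = i)
marginal = q @ cond                                  # rho_bar = E_q rho(. | X_1)

def gap(sigma):
    """E_q KL(rho(.|X_1) || sigma) - KL(rho_bar || sigma)."""
    conditional = sum(q[i] * kl(cond[i], sigma) for i in range(n_cond))
    return conditional - kl(marginal, sigma)

# Evaluate the gap for three unrelated targets sigma.
gaps = [gap(rng.dirichlet(np.ones(n_bins))) for _ in range(3)]
print(gaps)
```

All three gaps agree up to rounding: the common value is $\mathbb{E}_{X_1 \sim q} \operatorname{KL}(\rho(\cdot \,|\, X_1) \,\|\, \bar{\rho})$, the $\sigma$-free constant $C$ in (13).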

###### Theorem 2

[KL is the unique consistent, coordinate-invariant discrepancy] Let $D(\rho \,\|\, \sigma) = \int \delta(\rho(x), \sigma(x))\, \mathrm{d}x$ be a separable divergence with $\delta \in C^2((0,\infty)^2)$, $\delta(s,s) = 0$ and $\delta \geq 0$. Then $D$ satisfies both ([13](https://arxiv.org/html/2606.17465#S3.E13)) and ([15](https://arxiv.org/html/2606.17465#S3.E15)) if and only if $D = c\,\operatorname{KL}$ for some constant $c > 0$.

###### Proof.

*If.* Let $D = c\,\operatorname{KL}$. For ([13](https://arxiv.org/html/2606.17465#S3.E13)), with $\bar{\rho} = \mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1)$,

$$\mathbb{E}_{X_1 \sim q} \operatorname{KL}\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \sigma\bigr) \tag{16}$$

$$= \mathbb{E}_{X_1 \sim q} \int \rho(\cdot \,|\, X_1) \log \frac{\rho(\cdot \,|\, X_1)}{\bar{\rho}} + \mathbb{E}_{X_1 \sim q} \int \rho(\cdot \,|\, X_1) \log \frac{\bar{\rho}}{\sigma} \tag{17}$$

$$= \underbrace{\mathbb{E}_{X_1 \sim q} \operatorname{KL}\bigl(\rho(\cdot \,|\, X_1) \,\big\|\, \bar{\rho}\bigr)}_{=:\,C,\ \sigma\text{-free}} + \operatorname{KL}(\bar{\rho} \,\|\, \sigma), \tag{18}$$

because $\log(\bar{\rho}/\sigma)$ does not depend on $X_1$ and $\mathbb{E}_{X_1 \sim q}\, \rho(\cdot \,|\, X_1) = \bar{\rho}$. For ([15](https://arxiv.org/html/2606.17465#S3.E15)), a change of variables under any diffeomorphism $T$ gives $\operatorname{KL}(T_\# \rho \,\|\, T_\# \sigma) = \operatorname{KL}(\rho \,\|\, \sigma)$, the Jacobian cancelling because KL depends on $(\rho, \sigma)$ only through $\rho$ and the ratio $\rho/\sigma$.

*Only if.* *Step 1: ([13](https://arxiv.org/html/2606.17465#S3.E13)) $\Rightarrow$ Bregman.* Specialize $q$ to the two-point law placing mass $\lambda$ on $x_1^0$ and $1-\lambda$ on $x_1^1$, and write $\rho_0 = \rho(\cdot \,|\, x_1^0)$, $\rho_1 = \rho(\cdot \,|\, x_1^1)$, so $\bar{\rho} = \lambda \rho_0 + (1-\lambda)\rho_1$. Then ([13](https://arxiv.org/html/2606.17465#S3.E13)) states that $\lambda D(\rho_0 \,\|\, \sigma) + (1-\lambda) D(\rho_1 \,\|\, \sigma) - D(\bar{\rho} \,\|\, \sigma)$ is independent of $\sigma$. Pointwise (by separability), the map $s \mapsto \lambda \delta(r_0, s) + (1-\lambda)\delta(r_1, s) - \delta\bigl(\lambda r_0 + (1-\lambda) r_1, s\bigr)$ is constant for all $r_0, r_1 > 0$ and $\lambda \in (0,1)$; hence for any $s, s'$ the function $r \mapsto \delta(r, s) - \delta(r, s')$ has vanishing Jensen gap, i.e. is affine. Fix a reference $s_0$ and set $v(r) \coloneqq \delta(r, s_0)$; then $\delta(r, s) = v(r) + a(s)\, r + b(s)$. Imposing $\delta(s, s) = 0$ and $\partial_r \delta(r, s)|_{r=s} = 0$ (as $r = s$ minimizes $\delta(\cdot, s)$) gives $a(s) = -v'(s)$ and $b(s) = s\, v'(s) - v(s)$, hence

$$\delta(r, s) = v(r) - v(s) - v'(s)(r - s),$$

i.e. $D$ is a separable Bregman divergence with potential $v$, convex since $\delta \geq 0$.

*Step 2: ([15](https://arxiv.org/html/2606.17465#S3.E15)) $\Rightarrow$ KL.* Write $B_v \coloneqq \delta$. Take the scaling $T_\mu(x) = \mu x$ on $\mathbb{R}^n$ ($J = \mu^n$), whose push-forward is $({T_\mu}_\# \rho)(y) = \rho(y/\mu)/J$. Then ([15](https://arxiv.org/html/2606.17465#S3.E15)) and a change of variables give, for all densities, $\int J\, B_v(\rho/J, \sigma/J)\, \mathrm{d}x = \int B_v(\rho, \sigma)\, \mathrm{d}x$, hence pointwise $J\, B_v(r/J, s/J) = B_v(r, s)$ for all $J > 0$: $B_v$ is positively homogeneous of degree one, $B_v(\lambda r, \lambda s) = \lambda B_v(r, s)$. Differentiating twice in $r$ gives $\lambda^2 v''(\lambda r) = \lambda v''(r)$, so $\lambda\, v''(\lambda r) = v''(r)$; at $r = 1$, $v''(\lambda) = c/\lambda$ with $c = v''(1) > 0$. Integrating, $v(r) = c\, r \log r$ up to affine terms, whence $D = c\,\operatorname{KL}$. ∎
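The scaling argument in Step 2 can be illustrated numerically. The sketch below compares KL with the squared-$L_2$ discrepancy, which is the separable Bregman divergence with potential $v(r) = r^2$, for two one-dimensional Gaussians before and after the scaling $T_\mu(x) = \mu x$. The specific means, the value of $\mu$, and the Riemann-sum quadrature are illustrative assumptions.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def kl_and_l2(mean_r, mean_s, std, x):
    """Riemann-sum KL(rho || sigma) and squared-L2 gap for two 1-D Gaussians."""
    dx = x[1] - x[0]
    log_r = log_normal_pdf(x, mean_r, std)
    log_s = log_normal_pdf(x, mean_s, std)
    r, s = np.exp(log_r), np.exp(log_s)
    kl = np.sum(r * (log_r - log_s)) * dx
    l2 = np.sum((r - s) ** 2) * dx      # Bregman divergence with v(r) = r^2
    return kl, l2

x = np.linspace(-60.0, 60.0, 240001)
mu = 3.0   # T(x) = mu * x maps N(m, 1) to N(mu * m, mu^2)
kl_before, l2_before = kl_and_l2(0.0, 1.0, 1.0, x)
kl_after, l2_after = kl_and_l2(0.0, mu * 1.0, mu, x)
print(kl_before, kl_after, l2_before, l2_after)
```

KL is unchanged by the scaling, whereas the squared-$L_2$ discrepancy shrinks by the Jacobian factor $1/\mu$. That is the failure of degree-one homogeneity that the proof uses to rule out other Bregman potentials.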

### 3.4 Connections with Flow Matching

Flow matching and diffusion-style training arise as the *Gaussian reduction* of PF matching: when the one-step conditional transitions are Gaussian with a shared noise schedule, a single least-squares loss controls *both* the marginal Wasserstein-2 and the marginal KL PF objectives. The mechanism is shared: both $W_2^2$ and $\operatorname{KL}$ are jointly convex, and for two Gaussians with the same covariance both reduce to a multiple of the squared distance between means.

###### Theorem 3

[Flow matching is the common surrogate for marginal PF matching] Let $Z=X_1\sim q$, and assume conditionally Gaussian one-step transitions with a shared isotropic covariance and a shared drift field $f^\theta_t$,

$$\rho_{t+\tau}(\cdot\,|\,Z)=\mathcal{N}\bigl(\mu_t(Z),\,g_t^2\tau I_d\bigr),\tag{19}$$

$$\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z)=\mathcal{N}\bigl(f^\theta_t(X_t),\,g_t^2\tau I_d\bigr),\tag{20}$$

with $g_t>0$ fixed (independent of $\theta$) and $X_t$ the current state on the conditional path. Write the flow-matching loss

$$L_{\mathrm{FM}}(\theta)\coloneqq\mathbb{E}_Z\bigl\|\mu_t(Z)-f^\theta_t(X_t)\bigr\|^2.$$

Then the marginal Wasserstein and KL PF objectives obey

$$W_2^2\bigl(\mathcal{P}^\theta_\tau\rho_t,\rho_{t+\tau}\bigr)\leq L_{\mathrm{FM}}(\theta),\tag{21}$$

$$\operatorname{KL}\bigl(\rho_{t+\tau}\,\big\|\,\mathcal{P}^\theta_\tau\rho_t\bigr)\leq\frac{1}{2g_t^2\tau}\,L_{\mathrm{FM}}(\theta),\tag{22}$$

and $L_{\mathrm{FM}}$ equals the denoising least-squares (sample) loss up to a $\theta$-free constant,

$$L_{\mathrm{FM}}(\theta)=\mathbb{E}_{Z,X_{t+\tau}}\bigl\|X_{t+\tau}-f^\theta_t(X_t)\bigr\|^2-d\,g_t^2\tau.\tag{23}$$

Consequently the flow-matching loss is a common offline surrogate for both marginal objectives, and its unique minimizer over drift fields is the marginal drift $f_t^\star(x)=\mathbb{E}\bigl[\mu_t(Z)\,\big|\,X_t=x\bigr]$.

###### Proof.

For two Gaussians with common covariance $\Sigma=g_t^2\tau I_d$, the Bures term vanishes and

$$W_2^2\bigl(\mathcal{N}(\mu,\Sigma),\mathcal{N}(m,\Sigma)\bigr)=\|\mu-m\|^2,\tag{24}$$

$$\operatorname{KL}\bigl(\mathcal{N}(\mu,\Sigma)\,\big\|\,\mathcal{N}(m,\Sigma)\bigr)=\frac{\|\mu-m\|^2}{2g_t^2\tau}.\tag{25}$$

Applying these to the conditionals and taking $\mathbb{E}_Z$,

$$\mathbb{E}_Z\,W_2^2\bigl(\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z),\rho_{t+\tau}(\cdot\,|\,Z)\bigr)=L_{\mathrm{FM}}(\theta),\tag{26}$$

$$\mathbb{E}_Z\,\operatorname{KL}\bigl(\rho_{t+\tau}(\cdot\,|\,Z)\,\big\|\,\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z)\bigr)=\frac{L_{\mathrm{FM}}(\theta)}{2g_t^2\tau}.\tag{27}$$

On the other hand, both $W_2^2$ and $\operatorname{KL}$ are jointly convex in their two arguments, so for any mixing law, $D\bigl(\mathbb{E}_Z\alpha_Z\,\|\,\mathbb{E}_Z\beta_Z\bigr)\leq\mathbb{E}_Z D(\alpha_Z\|\beta_Z)$. Since $\rho_{t+\tau}=\mathbb{E}_Z\rho_{t+\tau}(\cdot\,|\,Z)$ and $\mathcal{P}^\theta_\tau\rho_t=\mathbb{E}_Z\mathcal{P}^\theta_\tau\rho_t(\cdot\,|\,Z)$, applying this to $D\in\{W_2^2,\operatorname{KL}\}$ and combining with ([26](https://arxiv.org/html/2606.17465#S3.E26)) and (27) yields the two bounds ([21](https://arxiv.org/html/2606.17465#S3.E21)) and (22). With $X_{t+\tau}(\cdot\,|\,Z)=\mu_t(Z)+g_t\sqrt{\tau}\,\varepsilon$, $\varepsilon\sim\mathcal{N}(0,I_d)$ independent of $(Z,X_t)$, the cross term vanishes because $\varepsilon$ has zero mean, and $\mathbb{E}\|g_t\sqrt{\tau}\,\varepsilon\|^2=d\,g_t^2\tau$, so

$$\mathbb{E}\bigl[\|X_{t+\tau}-f^\theta_t(X_t)\|^2\,\big|\,Z\bigr]=\|\mu_t(Z)-f^\theta_t(X_t)\|^2+d\,g_t^2\tau;$$

taking $\mathbb{E}_Z$ gives ([23](https://arxiv.org/html/2606.17465#S3.E23)), whose additive constant $d\,g_t^2\tau$ is independent of $\theta$. Writing $L_{\mathrm{FM}}(\theta)=\mathbb{E}_{X_t}\mathbb{E}_{Z\mid X_t}\|\mu_t(Z)-f^\theta_t(X_t)\|^2$ and minimizing over fields pointwise in $x$, the optimum is the conditional mean $f_t^\star(x)=\mathbb{E}[\mu_t(Z)\mid X_t=x]$. ∎
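A small numerical check can make the KL part of Theorem 3 concrete. The Python sketch below is an illustration under stated assumptions, not the authors' code: it works in one dimension ($d=1$), lets $Z$ take two equally likely values, and uses hypothetical target means `mu` and model means `f`. It verifies the closed form (25) on a grid and then compares the marginal KL between the two Gaussian mixtures with the bound $L_{\mathrm{FM}}/(2g_t^2\tau)$. The $W_2$ bound is not checked here.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Density of the one-dimensional Gaussian N(mean, var)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def kl_on_grid(p, q, dx):
    """Riemann-sum approximation of KL(p || q) for densities sampled on a grid."""
    return np.sum(p * np.log(p / q)) * dx

g, tau = 1.0, 0.25                 # assumed noise scale g_t and step size tau
var = g ** 2 * tau                 # shared conditional variance g_t^2 * tau

weights = np.array([0.5, 0.5])     # law of Z (two equally likely values)
mu = np.array([-1.0, 2.0])         # target conditional means mu_t(Z)
f = np.array([-0.5, 1.5])          # model conditional means f_t^theta(X_t)

x = np.linspace(-8.0, 10.0, 20001)
dx = x[1] - x[0]

# Conditional KLs on the grid versus the closed form ||mu - m||^2 / (2 g^2 tau).
kl_cond = np.array([
    kl_on_grid(gauss_pdf(x, m, var), gauss_pdf(x, fm, var), dx)
    for m, fm in zip(mu, f)
])
closed_form = (mu - f) ** 2 / (2.0 * var)

# Flow-matching loss and the resulting upper bound on the marginal KL.
L_FM = np.sum(weights * (mu - f) ** 2)
bound = L_FM / (2.0 * var)

# Marginal densities are mixtures of the conditionals.
rho_next = sum(w * gauss_pdf(x, m, var) for w, m in zip(weights, mu))
rho_model = sum(w * gauss_pdf(x, fm, var) for w, fm in zip(weights, f))
kl_marginal = kl_on_grid(rho_next, rho_model, dx)

print("conditional KL (grid):  ", kl_cond)
print("conditional KL (closed):", closed_form)
print("marginal KL:", kl_marginal, " bound:", bound)
```

In this toy setting the marginal KL falls slightly below the bound, as joint convexity predicts; the gap shrinks as the mixture components become better separated.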

### 3.5 Nesterov Momentum Acceleration for Generation

We incorporate Nesterov's acceleration (Nesterov and others, [2018](https://arxiv.org/html/2606.17465#bib.bib6)) at the *observable* level so that evaluation is performed at a look-ahead point. Let $\{\phi_k\}_{k\geq 1}$ be an observable basis, and denote the Koopman step by $\mathcal{K}_\tau^\theta$. Define the extrapolated (look-ahead) observable

$$\psi_k(x_t)=\phi_k(x_t)+\eta_t\bigl(\phi_k(x_t)-\phi_k(x_{t-\tau})\bigr),\quad\eta_t\in[0,1).$$

We replace $\phi_k(x_t)$ with a momentum look-ahead on the input observables:

$$\mathcal{L}_{\text{KPM-Nes}}(\theta)=\sum_{k=1}^{K}\mathbb{E}_{X_0\sim p,X_1\sim q}\bigl\|\phi_k\bigl(x_{t+\tau}(X_0,X_1)\bigr)-\mathcal{K}_\tau^\theta\psi_k\bigl(x_t(X_0,X_1)\bigr)\bigr\|^2.\tag{28}$$

In coordinates, i.e., $\phi_k(x)=x^{(k)}$, ([28](https://arxiv.org/html/2606.17465#S3.E28)) reduces to the vector form

$$\mathbb{E}_{X_0\sim p,X_1\sim q}\Bigl\|x_{t+\tau}(X_0,X_1)-\hat{\mathcal{K}}_\tau^\theta\bigl(x_t(X_0,X_1)+\eta_t(x_t(X_0,X_1)-x_{t-\tau}(X_0,X_1))\bigr)\Bigr\|^2,\tag{29}$$

where $\hat{\mathcal{K}}_\tau^\theta$ is constructed as the Koopman operator. Given a trained $\mathcal{K}_\tau^\theta$, we propagate with a look-ahead state:

$$y_t=x_t+\eta_t\,(x_t-x_{t-\tau}),\tag{30a}$$

$$x_{t+\tau}=\hat{\mathcal{K}}_\tau^\theta(y_t),\quad x_0,x_\tau\sim\mathcal{N}(0,I).\tag{30b}$$

Following the Nesterov momentum method from optimization theory, we formally introduce the following Nesterov-KPM training and sampling algorithms.

Algorithm 1: Nesterov-KPM Training (mini-batch)

1: Inputs: step $\tau$, momentum $\eta$, bridge $x_s(X_1,X_0)$

2: for batches of $B$ pairs $(X_0^{(i)},X_1^{(i)})$ do

3: build $x_{t-\tau}^{(i)},x_t^{(i)},x_{t+\tau}^{(i)}$ from the bridge

4: $y_t^{(i)}\leftarrow x_t^{(i)}+\eta\bigl(x_t^{(i)}-x_{t-\tau}^{(i)}\bigr)$

5: $\hat{z}_t^{(i)}\leftarrow\hat{\mathcal{K}}_\tau^\theta\bigl(y_t^{(i)}\bigr)$

6: $\mathcal{L}_{\mathrm{KPM\text{-}Nes}}\leftarrow\frac{1}{B}\sum_i\bigl\|x_{t+\tau}^{(i)}-\hat{z}_t^{(i)}\bigr\|_2^2$

7: Update $\theta$ by gradient descent on $\mathcal{L}_{\mathrm{KPM\text{-}Nes}}$
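The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not taken from the paper: the bridge is a linear interpolation between $X_0$ and $X_1$, the learned map $\hat{\mathcal{K}}_\tau^\theta$ is replaced by a linear stand-in $y\mapsto Ay+b$ with hand-written gradients, and the target law $q$ is a toy Gaussian. All function names and hyperparameter values are hypothetical.

```python
import numpy as np

def bridge(x0, x1, s):
    """Linear interpolation bridge x_s = (1 - s) x0 + s x1 (an assumed choice)."""
    return (1.0 - s) * x0 + s * x1

def nesterov_kpm_loss_and_grads(A, b, x_prev, x_cur, x_next, eta):
    """Steps 4-6 of Algorithm 1 for the linear stand-in K(y) = A y + b."""
    y = x_cur + eta * (x_cur - x_prev)       # step 4: look-ahead input y_t
    z_hat = y @ A.T + b                      # step 5: one-step prediction
    resid = z_hat - x_next
    batch = x_cur.shape[0]
    loss = np.sum(resid ** 2) / batch        # step 6: mini-batch loss
    grad_A = 2.0 * resid.T @ y / batch       # d loss / d A
    grad_b = 2.0 * resid.sum(axis=0) / batch # d loss / d b
    return loss, grad_A, grad_b

def train(num_steps=500, batch=256, d=2, tau=0.05, eta=0.5, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    A, b = np.zeros((d, d)), np.zeros(d)
    losses = []
    for _ in range(num_steps):
        x0 = rng.standard_normal((batch, d))              # X_0 ~ p = N(0, I)
        x1 = 0.3 * rng.standard_normal((batch, d)) + 2.0  # X_1 ~ q (toy target)
        # Sample t so that t - tau and t + tau both stay inside [0, 1].
        t = rng.uniform(tau, 1.0 - tau, size=(batch, 1))
        # Step 3: build the three bridge states.
        x_prev, x_cur, x_next = (bridge(x0, x1, s) for s in (t - tau, t, t + tau))
        loss, grad_A, grad_b = nesterov_kpm_loss_and_grads(
            A, b, x_prev, x_cur, x_next, eta
        )
        A -= lr * grad_A                                  # step 7: gradient descent
        b -= lr * grad_b
        losses.append(loss)
    return A, b, losses

A, b, losses = train()
print(f"loss: first {losses[0]:.4f}, last {losses[-1]:.4f}")
```

A neural network with automatic differentiation would normally replace the linear stand-in; the structure of the loop (bridge triple, look-ahead, one-step regression) stays the same.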

Algorithm 2: Nesterov-KPM Sampling

1: Initialize $x_0,x_{-\tau}\sim\mathcal{N}(0,I)$, set $t=0$

2: while $t<1$ do

3: $y_t\leftarrow x_t+\eta(x_t-x_{t-\tau})$

4: $x_{t+\tau}\leftarrow\hat{\mathcal{K}}_\tau^\theta(y_t)$

5: $t\leftarrow t+\tau$
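Algorithm 2 translates into a short rollout loop. The Python sketch below is illustrative only: `step_fn` is a hypothetical placeholder for the trained map $\hat{\mathcal{K}}_\tau^\theta$, and the loop runs a fixed number of `round(1 / tau)` steps instead of testing $t<1$, an assumed implementation choice that avoids floating-point drift in $t$.

```python
import numpy as np

def nesterov_kpm_sample(step_fn, num_samples, d, tau, eta, seed=0):
    """Roll out x_{t+tau} = step_fn(y_t) with look-ahead y_t, as in Algorithm 2."""
    rng = np.random.default_rng(seed)
    x_prev = rng.standard_normal((num_samples, d))  # x_{-tau} ~ N(0, I)
    x_cur = rng.standard_normal((num_samples, d))   # x_0 ~ N(0, I)
    num_steps = int(round(1.0 / tau))               # t = 0, tau, ..., 1 - tau
    for _ in range(num_steps):
        y = x_cur + eta * (x_cur - x_prev)          # look-ahead state y_t
        x_prev, x_cur = x_cur, step_fn(y)           # x_{t+tau} <- K(y_t)
    return x_cur

# Toy stand-in for the trained map: a contraction toward a fixed target point.
target = np.array([2.0, -1.0])
toy_step = lambda y: y + 0.2 * (target - y)

samples = nesterov_kpm_sample(toy_step, num_samples=128, d=2, tau=0.05, eta=0.5)
print("mean of generated samples:", samples.mean(axis=0))
```

With the toy contraction, every sample is pulled close to `target`; a trained model would instead transport the Gaussian initial states toward the data distribution.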

## 4 Numerical Simulations

In this section, we follow the flow-matching training loss and represent the Koopman operator by a deep neural network parameterized by $\theta$, that is,

$$\min_\theta\mathbb{E}_{X_0\sim p,X_1\sim q,t\sim\mathrm{U}[0,1]}\bigl\|x_{t+\tau}(X_1,X_0)-\mathrm{NN}_\theta(t,x_t(X_1,X_0))\bigr\|^2.\tag{31}$$

After obtaining the optimal parameter $\theta$, we iterate over time $t$:

$$\hat{X}_{t+\tau}=\mathrm{NN}_\theta(t,\hat{X}_t),\quad\hat{X}_0\sim N(0,I).$$

Some important hyperparameters are listed in Table [1](https://arxiv.org/html/2606.17465#S4.T1).

Table 1: Parameter settings

Fig. [2](https://arxiv.org/html/2606.17465#S4.F2) shows the generated and original samples of the GMM model and the Two-Moon model, respectively.

![Refer to caption](https://arxiv.org/html/2606.17465v1/GMM.png)
![Refer to caption](https://arxiv.org/html/2606.17465v1/MOONS.png)

Figure 2: Original samples (blue) from the GMM (left) and Two-Moon (right) models and generated samples (red) by PFOM.

Moreover, we train with the Nesterov momentum loss in Algorithm [1](https://arxiv.org/html/2606.17465#alg1) and generate samples via Algorithm [2](https://arxiv.org/html/2606.17465#alg2) on the GMM benchmark. Figure [3](https://arxiv.org/html/2606.17465#S4.F3) compares the *rates of decrease* in KL divergence, Wasserstein-2, and maximum mean discrepancy (MMD) between standard Koopman path matching and its Nesterov-accelerated variant. The Nesterov method consistently achieves faster and better convergence. The reported curves correspond to a representative run; multi-seed evaluation is left for future work.

![Refer to caption](https://arxiv.org/html/2606.17465v1/KL_compare.png)
![Refer to caption](https://arxiv.org/html/2606.17465v1/Wasserstein_compare.png)
![Refer to caption](https://arxiv.org/html/2606.17465v1/MMD_compare.png)

Figure 3: Comparison of the rate of decrease of the KL divergence (first row), the $W_2$ metric (second row), and the maximum mean discrepancy (third row).

Fig. [4](https://arxiv.org/html/2606.17465#S4.F4) illustrates the generating process of our Nesterov-KPM sampling method for the GMM and Two-Moon models.

![Refer to caption](https://arxiv.org/html/2606.17465v1/process.png)
![Refer to caption](https://arxiv.org/html/2606.17465v1/process2.png)

Figure 4: Generating process of our Nesterov-KPM sampling.

## 5 Conclusions

We have introduced Perron–Frobenius Operator Matching (PFOM), an operator-theoretic framework that connects density-level Perron–Frobenius evolution, Koopman path matching, and sample-conditioned generative training. We showed that the Kullback–Leibler divergence has a distinguished role among separable Bregman divergences in preserving the alignment between marginal density objectives and conditional losses. We also developed a Nesterov-type inertial variant, which improves the empirical convergence behavior of the Koopman path-matching implementation on Gaussian mixture and two-moon benchmarks. The present experiments serve as low-dimensional proof-of-concept validations. Future work will focus on higher-dimensional benchmarks, adaptive observable dictionaries, latent-space image modeling, and controlled PFOM formulations with explicit input or feedback dependence. More systematic empirical evaluation, including multi-seed robustness, uncertainty bands, and comparisons with standard flow-matching and diffusion baselines, will also be important for assessing the practical scalability of the proposed approach.

## References

- S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15), pp. 3932–3937. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- C. Cui, J. Liu, P. Hui, P. Lin, and C. Zhang (2025) GenControl: generative AI-driven autonomous design of control algorithms. arXiv preprint arXiv:2506.12554. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- T. Eisner, B. Farkas, M. Haase, and R. Nagel (2015) Operator theoretic aspects of ergodic theory. Vol. 272, Springer. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- J. Ho, A. Jain, and P. Abbeel (2020) Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, pp. 6840–6851. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1).
- P. Holderrieth, M. Havasi, J. Yim, N. Shaul, I. Gat, T. Jaakkola, B. Karrer, R. T. Chen, and Y. Lipman (2024) Generator matching: generative modeling with arbitrary Markov processes. arXiv preprint arXiv:2410.20587. Cited by: [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- A. Karimi and T. T. Georgiou (2022) Data-driven approximation of the Perron-Frobenius operator using the Wasserstein metric. IFAC-PapersOnLine 55(30), pp. 341–346. Cited by: [§3.2](https://arxiv.org/html/2606.17465#S3.SS2.p1.6), [Remark 1](https://arxiv.org/html/2606.17465#Thmremark1.p1.1).
- S. M. Katz, A. L. Corso, C. A. Strong, and M. J. Kochenderfer (2022) Verification of image-based neural network controllers using generative models. Journal of Aerospace Information Systems 19(9), pp. 574–584. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- A. Lasota and M. C. Mackey (2013) Chaos, fractals, and noise: stochastic aspects of dynamics. Vol. 97, Springer Science & Business Media. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§2.1](https://arxiv.org/html/2606.17465#S2.SS1.p1.11), [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p1.7).
- B. Lemmens and R. Nussbaum (2012) Nonlinear Perron-Frobenius theory. Vol. 189, Cambridge University Press. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1).
- Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis (2017) Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science 27(10). Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§3](https://arxiv.org/html/2606.17465#S3.p2.1).
- Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le (2022) Flow matching for generative modeling. arXiv preprint arXiv:2210.02747. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p3.1), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p2.8), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- Y. Lipman, M. Havasi, P. Holderrieth, N. Shaul, M. Le, B. Karrer, R. T. Chen, D. Lopez-Paz, H. Ben-Hamu, and I. Gat (2024) Flow matching guide and code. arXiv preprint arXiv:2412.06264. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1), [Figure 1](https://arxiv.org/html/2606.17465#S2.F1.4.1), [Figure 1](https://arxiv.org/html/2606.17465#S2.F1.6.2), [§2.2](https://arxiv.org/html/2606.17465#S2.SS2.p4.1).
- Y. Nesterov et al. (2018) Lectures on convex optimization. Vol. 137, Springer. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p5.1), [§3.5](https://arxiv.org/html/2606.17465#S3.SS5.p1.2).
- A. V. Oppenheim, A. S. Willsky, and S. H. Nawab (1997) Signals & systems. Pearson Educación. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- J. L. Proctor, S. L. Brunton, and J. N. Kutz (2016) Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems 15(1), pp. 142–161. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p2.1), [§3](https://arxiv.org/html/2606.17465#S3.p2.1).
- H. Risken (1989) Fokker-Planck equation. In The Fokker-Planck equation: methods of solution and applications, pp. 63–95. Cited by: [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p1.7), [§3.1](https://arxiv.org/html/2606.17465#S3.SS1.p2.3).
- T. Rolski, H. Schmidli, V. Schmidt, and J. L. Teugels (2009) Stochastic processes for insurance and finance. John Wiley & Sons. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- S. M. Ross (1995) Stochastic processes. John Wiley & Sons. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- N. G. Van Kampen (1992) Stochastic processes in physics and chemistry. Vol. 1, Elsevier. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1).
- L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M. Yang (2023) Diffusion models: a comprehensive survey of methods and applications. ACM Computing Surveys 56(4), pp. 1–39. Cited by: [§1](https://arxiv.org/html/2606.17465#S1.p1.1), [§1](https://arxiv.org/html/2606.17465#S1.p3.1).
