A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
Summary
This arXiv preprint proposes a unified measure-theoretic framework for understanding diffusion, score-based, and flow matching generative models. It establishes connections between these methods via continuity/Fokker-Planck equations and analyzes their sampling schemes and theoretical guarantees.
# A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
Source: [https://arxiv.org/html/2605.06829](https://arxiv.org/html/2605.06829)
Aditya Ranganath (ranganath2@llnl.gov), Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA 94551, USA

Mukesh Singhal (msinghal@ucmerced.edu), Department of Electrical Engineering, University of California, Merced, CA 95343, USA
###### Abstract
We survey continuous-time generative modeling methods based on transporting a simple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals $(\rho_t)_{t\in[0,1]}$ governed by a continuity/Fokker–Planck equation. Within this framework, we (i) derive reverse-time sampling for diffusion/score models as controlled stochastic dynamics, (ii) show the probability flow ODE yields identical marginals and connects diffusion to likelihood-based normalizing flows, and (iii) interpret flow matching as direct regression of the velocity field under a chosen interpolation, clarifying when it coincides with (or differs from) score-based training. We compare objectives, sampling schemes, and discretization errors under unified notation, discuss connections to Schrödinger bridges and entropic optimal transport, and summarize theoretical guarantees and open problems on approximation, stability, and scalability.
Keywords: Generative models; deep generative models; diffusion models; score-based models; flow matching; probabilistic modeling; stochastic differential equations; continuous normalizing flows; optimal transport; sampling methods; inverse problems; machine learning theory
## 1 Introduction and Reading Guide
Generative modeling seeks to learn mechanisms for producing samples from a complex data distribution $\rho_{\mathrm{data}}$ on $\mathbb{R}^d$. Beyond unconditional synthesis, modern generative models serve as priors and proposal mechanisms in downstream tasks, including conditional generation and inverse problems. Over the past decade, the field has progressed through several paradigms, namely variational autoencoders, generative adversarial networks, and likelihood-based normalizing flows (Kingma and Welling, [2014](https://arxiv.org/html/2605.06829#bib.bib17); Rezende et al., [2014](https://arxiv.org/html/2605.06829#bib.bib18); Goodfellow et al., [2014](https://arxiv.org/html/2605.06829#bib.bib19); Dinh et al., [2015](https://arxiv.org/html/2605.06829#bib.bib20), [2017](https://arxiv.org/html/2605.06829#bib.bib21); Papamakarios et al., [2021](https://arxiv.org/html/2605.06829#bib.bib29)), toward a family of methods that construct samples by evolving a simple reference distribution through time-dependent dynamics. Diffusion models and closely related score-based generative models have become influential approaches to high-fidelity synthesis in high dimensions, with diffusion probabilistic modeling popularized in modern deep learning by denoising diffusion probabilistic models (DDPM) (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)) and its successors (Song et al., [2020a](https://arxiv.org/html/2605.06829#bib.bib14); Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28); Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)).
More recent work has broadened this design space further through deterministic degradations, stochastic interpolants, consistency-style models, Bayesian flow networks, and discrete diffusion alternatives (Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44); Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Kim et al., [2024a](https://arxiv.org/html/2605.06829#bib.bib45); Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38); Lou et al., [2024](https://arxiv.org/html/2605.06829#bib.bib46)).
#### What does it mean for a model to be *generative*?
A model is called *generative* if it specifies, explicitly or implicitly, a mechanism for producing new samples that resemble draws from an unknown data distribution $\rho_{\mathrm{data}}$. Formally, given data $x\in\mathbb{R}^d$ (or more general sample spaces), the goal is to learn a distribution $p_\theta(x)$ such that $p_\theta\approx\rho_{\mathrm{data}}$ in a meaningful sense. However, being generative does not require an explicit closed-form density. What matters is that the model defines a sampling procedure that maps randomness to data, for example

$$x=G_\theta(z),\qquad z\sim p(z),$$

where $p(z)$ is a simple base distribution, such as a standard Gaussian, and $G_\theta$ is a learned transformation. This view emphasizes that generative modeling can be interpreted as a problem of learning *probability transport*: pushing a simple reference distribution through learned dynamics to match the data distribution (Goodfellow et al., [2016](https://arxiv.org/html/2605.06829#bib.bib25); Papamakarios et al., [2021](https://arxiv.org/html/2605.06829#bib.bib29)).
#### Background: a brief map of generative modeling paradigms\.
Before diffusion and score-based methods became widely adopted, three families of deep generative models shaped the modern literature. These paradigms differ mainly in how they represent and learn the model distribution $p_\theta$.
*Latent-variable likelihood models* such as variational autoencoders (VAEs) learn an explicit generative model together with an amortized inference network by maximizing a variational lower bound on the data log-likelihood (Kingma and Welling, [2014](https://arxiv.org/html/2605.06829#bib.bib17); Rezende et al., [2014](https://arxiv.org/html/2605.06829#bib.bib18)). VAEs offer stable training and likelihood-based evaluation, but classic formulations can trade sample sharpness for generalizability, which has led to richer decoders and narrower, yet expressive, priors.
*Implicit generative models* such as generative adversarial networks (GANs) learn to generate samples through a minimax objective against a discriminator (Goodfellow et al., [2014](https://arxiv.org/html/2605.06829#bib.bib19)). GANs produce high-fidelity samples and, unlike latent-variable likelihood models, largely avoid explicit likelihood computation, but training can be unstable and evaluation can be challenging (Arjovsky and Bottou, [2017](https://arxiv.org/html/2605.06829#bib.bib22); Mescheder et al., [2018](https://arxiv.org/html/2605.06829#bib.bib23); Salimans et al., [2016](https://arxiv.org/html/2605.06829#bib.bib24)). Mode dropping and sensitivity to hyperparameters are recurring issues.
*Normalizing flows* learn an invertible map that transports a simple base distribution (e.g., Gaussian) to the data distribution, enabling exact likelihood computation via change-of-variables (Dinh et al., [2015](https://arxiv.org/html/2605.06829#bib.bib20), [2017](https://arxiv.org/html/2605.06829#bib.bib21); Papamakarios et al., [2021](https://arxiv.org/html/2605.06829#bib.bib29)). Flows provide a clear connection to transport and Jacobian-based likelihoods, but architectural constraints required for tractable Jacobians can limit expressivity or increase compute in high dimensions.
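To make the change-of-variables idea concrete, the following is a minimal sketch rather than anything taken from the survey: a one-dimensional affine map stands in for a learned invertible flow. The names `base_logpdf`, `flow_forward`, and `flow_logpdf` and the values of `a` and `b` are illustrative assumptions.

```python
import numpy as np

def base_logpdf(z):
    """Log-density of the standard normal base distribution N(0, 1)."""
    return -0.5 * (z**2 + np.log(2.0 * np.pi))

def flow_forward(z, a, b):
    """Invertible map G(z) = a * z + b (requires a != 0)."""
    return a * z + b

def flow_logpdf(x, a, b):
    """Change of variables: log p_x(x) = log p_z(G^{-1}(x)) - log|dG/dz|."""
    z = (x - b) / a
    return base_logpdf(z) - np.log(np.abs(a))

# Hypothetical parameters of the toy flow (assumed for illustration).
a, b = 2.0, -1.0
x = np.linspace(-4.0, 4.0, 9)

# For this affine map the pushforward of N(0, 1) is N(b, a^2),
# so the flow log-density can be compared against the Gaussian formula.
reference = -0.5 * (((x - b) / a) ** 2 + np.log(2.0 * np.pi * a**2))
max_gap = float(np.max(np.abs(flow_logpdf(x, a, b) - reference)))

# Sampling is just pushing base noise through the map.
samples = flow_forward(np.random.default_rng(0).standard_normal(5), a, b)
```

In a real flow the scalar `a` is replaced by the Jacobian of a neural network, and the `log|a|` term becomes a log-determinant, which is where the architectural constraints mentioned above come from.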
Taken together, these paradigms highlight recurring design choices: whether the model has an explicit likelihood, whether generation is defined through an invertible map or an implicit sampler, and how training balances sample quality, coverage, and tractability (Goodfellow et al., [2016](https://arxiv.org/html/2605.06829#bib.bib25); Papamakarios et al., [2021](https://arxiv.org/html/2605.06829#bib.bib29)). Diffusion and score-based models can be viewed as inheriting aspects of all three traditions: like flows, they admit a transport interpretation; like implicit models, they emphasize flexible samplers and high sample quality; and like latent-variable likelihood models, they often come with likelihood-based objectives or variational bounds (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)). By contrast, more recent literature has explored substantially different ways of instantiating the same generative goal, including deterministic degradations in place of Gaussian noising, stochastic interpolants, consistency-based one-step or few-step generation, Bayesian flow-network formulations, and ratio-based discrete diffusion methods (Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44); Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Kim et al., [2024a](https://arxiv.org/html/2605.06829#bib.bib45); Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38); Lou et al., [2024](https://arxiv.org/html/2605.06829#bib.bib46)).
A second example, especially relevant to scientific and engineering applications, is solving inverse problems such as deblurring, inpainting, MRI/CT reconstruction, or more general linear and nonlinear measurement models of the form $y=\mathcal{A}(x)+\varepsilon$. Here, generative models serve as priors for recovering $x$ from partial or noisy observations $y$. In diffusion and score-based formulations, the learned score field provides a principled way to construct posterior samplers by combining a prior score with a likelihood term, leading to algorithms for conditional generation and reconstruction. These algorithms can be interpreted as predictor–corrector schemes or approximate posterior samplers (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Jalal et al., [2021](https://arxiv.org/html/2605.06829#bib.bib31); Chung and Ye, [2022](https://arxiv.org/html/2605.06829#bib.bib32); Chung et al., [2023](https://arxiv.org/html/2605.06829#bib.bib34); Kawar et al., [2022](https://arxiv.org/html/2605.06829#bib.bib33); Tewari et al., [2023](https://arxiv.org/html/2605.06829#bib.bib97); Rout et al., [2023](https://arxiv.org/html/2605.06829#bib.bib98); Li and Pereira, [2024](https://arxiv.org/html/2605.06829#bib.bib103); Janati et al., [2024](https://arxiv.org/html/2605.06829#bib.bib102)). Closely related ideas have also been used for guided image editing and inpainting, where the conditioning signal acts as a soft constraint along the transport path rather than as an explicit post-transport correction (Meng et al., [2022](https://arxiv.org/html/2605.06829#bib.bib95); Lugmayr et al., [2022](https://arxiv.org/html/2605.06829#bib.bib96); Corneanu et al., [2024](https://arxiv.org/html/2605.06829#bib.bib100); Zhang et al., [2023](https://arxiv.org/html/2605.06829#bib.bib99)).
In this setting, the choice of sampler \(reverse\-time SDE versus probability\-flow ODE\), the time discretization, and the weighting induced by the training objective can strongly affect reconstruction fidelity and stability\. This bolsters our motivation for a unified theory of objectives and samplers\. The same learned model can behave very differently in inverse problems depending on how its dynamics are discretized and how conditioning is imposed\.
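As a worked illustration of "combining a prior score with a likelihood term," Bayes' rule gives the decomposition below. The second identity assumes a linear operator $A$ and Gaussian noise with variance $\sigma^2$; both are assumptions made here for illustration and are not required by the general measurement model above.

```latex
\nabla_x \log p(x \mid y)
  = \nabla_x \log p(x) + \nabla_x \log p(y \mid x),
\qquad
\nabla_x \log p(y \mid x) = \frac{1}{\sigma^{2}}\, A^{\top}\,(y - A x)
\quad \text{for } y = A x + \varepsilon,\ \ \varepsilon \sim \mathcal{N}(0, \sigma^{2} I).
```

At intermediate noise levels the corresponding likelihood of $y$ given a noised state is generally not available in closed form, so practical samplers rely on approximations of this term.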
#### Historical narrative\.
Modern diffusion modeling in deep learning was popularized by DDPM, which framed generation as reversing a discrete-time Markov noising process trained by a variational/denoising objective (Sohl-Dickstein et al., [2015](https://arxiv.org/html/2605.06829#bib.bib9); Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). A continuous-time reformulation then unified several diffusion-like constructions by expressing the forward noising process as an SDE and deriving a reverse-time SDE whose drift depends on the time-dependent score. This SDE perspective also made explicit a deterministic probability flow ODE that shares the same marginals as the stochastic diffusion and connects diffusion sampling to continuous normalizing flows (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)). More recently, flow matching reframed training as simulation-free regression of vector fields for prescribed probability paths, including diffusion paths as special cases, enabling scalable training of continuous-normalizing-flow-style generators with standard ODE solvers (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). In parallel, straightening variants such as rectified flow learn transport dynamics whose trajectories are as close to straight lines as possible, yielding accurate generation with very coarse discretization (Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)).
Other work proposed alternative routes to transport-based generation, including autoregressive diffusion, non-Gaussian degradation processes, and path-space constructions tailored to discrete or constrained settings (Hoogeboom et al., [2022](https://arxiv.org/html/2605.06829#bib.bib58); Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59)).
[Figure 1 diagram. Probability transport view: a forward path from data $\rho_0$ through intermediate $\rho_t$ to noise $\rho_1$. Diffusion / score view: forward SDE $dX_t=f(X_t,t)\,dt+g(t)\,dW_t$, learned score field $s_\theta(x,t)\approx\nabla_x\log\rho_t(x)$, reverse-time SDE sampling. Deterministic transport view: probability-flow ODE with $v_{\mathrm{PF}}(x,t)=f(x,t)-\tfrac{1}{2}g(t)^2 s_t(x)$, which has the same one-time marginals as the forward SDE, and flow matching, which learns a velocity $v_\theta(x,t)$ for ODE transport. Main design choices: (1) *path choice* $(\rho_t)$, (2) *learned field* (score $s_t$ versus velocity $v_t$), (3) *sampling dynamics* (reverse-time SDE versus deterministic ODE). Diffusion learns a score field and can sample via either a reverse-time SDE or the associated probability-flow ODE; flow matching specifies a path directly and learns the corresponding velocity field.]

Figure 1: Unified view of diffusion, score-based, and flow-matching generative models as probability transport between a data distribution $\rho_0$ and a reference distribution $\rho_1$.

Figure [1](https://arxiv.org/html/2605.06829#S1.F1) summarizes the common transport perspective that motivates this survey. It shows how diffusion, score-based sampling, probability-flow ODEs, and flow matching can be organized around three design choices: the probability path, the learned field, and the sampling dynamics.
### 1.1 Unifying theme: probability transport through learned fields
This survey develops a unified technical view: diffusion models, score-based generative models, and flow matching can be interpreted as instances of learning probability transport (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). The central elements of this view are:
1. A family of intermediate distributions $(\rho_t)_{t\in[0,1]}$ connecting a complex target $\rho_0$ (data) to a tractable reference $\rho_1$ (often Gaussian).
2. A time-dependent field that determines how probability mass evolves along this path:
   - a *score field* $s_t(x)=\nabla_x\log\rho_t(x)$, used prominently in score-based and diffusion formulations, or
   - a *velocity field* $v_t(x)$ defining deterministic transport via a continuity equation, used prominently in CNF and flow-matching formulations.

In score-based generative modeling, one learns the time-dependent scores and samples by integrating a reverse-time SDE; in addition, an associated deterministic probability flow ODE can be constructed that shares the same marginals as the SDE and enables likelihood computation via change-of-variables along the ODE flow (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)). The roots of this perspective trace back to score matching, a method for learning unnormalized models by matching score functions, together with denoising score matching, which connects score estimation to denoising objectives (Hyvärinen, [2005](https://arxiv.org/html/2605.06829#bib.bib4); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Song et al., [2020b](https://arxiv.org/html/2605.06829#bib.bib30); Song and Ermon, [2020](https://arxiv.org/html/2605.06829#bib.bib13)). These connections are explicitly leveraged in modern diffusion-model training objectives, including the DDPM formulation (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
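The two samplers described above can be compared on a toy problem where the score is known exactly. The sketch below is illustrative only: it assumes one-dimensional Gaussian data, a forward SDE $dX_t=-\tfrac{1}{2}\beta X_t\,dt+\sqrt{\beta}\,dW_t$ with constant $\beta$, and plain Euler(-Maruyama) steps. The constants `M0`, `S0`, `BETA` and all function names are hypothetical, and the analytic `score` stands in for a trained network.

```python
import numpy as np

# Assumed toy setup (not from the survey): data ~ N(M0, S0^2) and the
# forward SDE dX = -0.5*BETA*X dt + sqrt(BETA) dW with constant BETA.
M0, S0, BETA = 2.0, 0.5, 4.0

def marginal(t):
    """Mean and variance of the Gaussian marginal rho_t of the forward SDE."""
    decay = np.exp(-BETA * t)
    return M0 * np.sqrt(decay), S0**2 * decay + 1.0 - decay

def score(x, t):
    """Exact score of rho_t; a learned s_theta(x, t) would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def sample(x1, n_steps, stochastic, rng):
    """Integrate from t=1 down to t=0 with Euler(-Maruyama) steps."""
    x = x1.copy()
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * h
        f = -0.5 * BETA * x
        if stochastic:
            # Reverse-time SDE drift: f - g^2 * score, plus fresh noise.
            drift = f - BETA * score(x, t)
            x = x - h * drift + np.sqrt(BETA * h) * rng.standard_normal(x.shape)
        else:
            # Probability flow ODE velocity: f - 0.5 * g^2 * score.
            drift = f - 0.5 * BETA * score(x, t)
            x = x - h * drift
    return x

rng = np.random.default_rng(0)
m1, v1 = marginal(1.0)
# Start from the exact rho_1 to isolate sampler error from prior mismatch.
x1 = m1 + np.sqrt(v1) * rng.standard_normal(20_000)
x_sde = sample(x1, 500, True, rng)
x_ode = sample(x1, 500, False, rng)
print(x_sde.mean(), x_sde.std(), x_ode.mean(), x_ode.std())
```

Both samplers should return samples whose mean and standard deviation are close to `M0` and `S0`, even though individual trajectories differ: the ODE maps each starting point deterministically, while the SDE injects fresh noise at every step.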
### 1.2 Why a unifying survey is needed now
A unifying survey is timely for at least three reasons.
#### (i) Methodological convergence.
The boundary between diffusion/score and flow-based methods has blurred. Score-based modeling through SDEs explicitly derives both a reverse-time SDE sampler and an equivalent deterministic ODE sampler (probability flow ODE), thereby connecting diffusion-style modeling with continuous normalizing flows and ODE-based likelihood computation; flow matching makes this connection even more explicit by learning velocity fields directly along prescribed probability paths (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). More recent conditional and inverse-problem formulations further suggest that these methods are converging operationally even when they differ in parameterization (Tewari et al., [2023](https://arxiv.org/html/2605.06829#bib.bib97); Li and Pereira, [2024](https://arxiv.org/html/2605.06829#bib.bib103)).
#### \(ii\) Fragmented notation and competing derivations\.
Diffusion models are often introduced through discrete-time Markov chains and variational training objectives, while score-based approaches are presented through SDE time reversal, and flow matching is presented through regression of vector fields over probability paths. These presentations can obscure the fact that many techniques differ mainly by (a) the choice of path $\rho_t$, (b) whether one learns scores or velocities, and (c) whether one samples by SDE or ODE integration (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
#### \(iii\) Practical stakes: sampling, stability, and compute\.
Once methods are viewed as learned dynamics, numerical analysis and modeling choices become critical: discretization bias, stiffness, solver choice, and the role of stochasticity (SDE versus ODE) materially affect sample quality, cost, and robustness. The neural ODE/CNF literature provides a framework for understanding likelihood computation and the numerical behavior of continuous-time flows, which becomes directly relevant when diffusion models are sampled with deterministic ODEs (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)). The same issues become even more visible in inverse-problem settings, where conditioning constraints can magnify small numerical or modeling errors (Rout et al., [2023](https://arxiv.org/html/2605.06829#bib.bib98); Pandey et al., [2024](https://arxiv.org/html/2605.06829#bib.bib104)).
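Discretization bias can be seen in isolation on a toy problem where the exact ODE flow is known. The sketch below is an illustration under stated assumptions, not the survey's analysis: it reuses a one-dimensional Gaussian setting with forward SDE $dX_t=-\tfrac{1}{2}\beta X_t\,dt+\sqrt{\beta}\,dW_t$, for which the probability flow ODE maps a point at $t=1$ to $t=0$ by an affine map between the Gaussian marginals. The constants, the starting point `x1`, and the function names are hypothetical.

```python
import numpy as np

# Assumed toy values: data ~ N(M0, S0^2), constant noise rate BETA.
M0, S0, BETA = 2.0, 0.5, 4.0

def marginal(t):
    """Mean and variance of the Gaussian marginal rho_t."""
    decay = np.exp(-BETA * t)
    return M0 * np.sqrt(decay), S0**2 * decay + 1.0 - decay

def pf_velocity(x, t):
    """Probability flow ODE velocity f - 0.5 * g^2 * score (exact score)."""
    mean, var = marginal(t)
    score = -(x - mean) / var
    return -0.5 * BETA * x - 0.5 * BETA * score

def euler_backward_in_time(x1, n_steps):
    """Explicit Euler integration of the ODE from t=1 down to t=0."""
    x, h = x1, 1.0 / n_steps
    for k in range(n_steps):
        x = x - h * pf_velocity(x, 1.0 - k * h)
    return x

x1 = -1.0  # arbitrary starting point at t = 1
m1, v1 = marginal(1.0)
# For Gaussian marginals the exact flow rescales the deviation from the mean
# by the ratio of standard deviations, sigma_0 / sigma_1.
exact_x0 = M0 + S0 * (x1 - m1) / np.sqrt(v1)

errors = {n: abs(euler_backward_in_time(x1, n) - exact_x0)
          for n in (10, 20, 40, 80, 160)}
for n, err in errors.items():
    print(n, err)
```

With a first-order solver the error is expected to shrink roughly in proportion to the step size; higher-order solvers or noise-aware step schedules change this trade-off between cost and accuracy.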
### 1.3 Scope and goals
#### Covered.
We focus on continuous-time formulations of:
- diffusion probabilistic models and score-based generative models (forward SDEs, reverse-time SDE sampling, denoising score matching objectives, and probability flow ODE sampling) (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28));
- flow matching as a method for training CNFs by regressing vector fields of fixed probability paths, including diffusion paths as special instances (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)); and
- connections to Schrödinger bridges / entropic optimal transport on path space as an interpretation and generalization of score-based diffusion methods (De Bortoli et al., [2021](https://arxiv.org/html/2605.06829#bib.bib8); Léonard, [2014](https://arxiv.org/html/2605.06829#bib.bib35)).
#### Not covered in depth\.
We do not attempt a comprehensive catalog of architectures or application domains; our emphasis is on mathematical unification, objective equivalences, and sampling dynamics, together with the consequences of design choices such as paths and parameterizations\.
### 1.4 Related surveys and tutorials
Several recent survey and tutorial papers partially overlap with our scope, but each emphasizes a different slice of the landscape. Broad diffusion surveys by Yang et al., Cao et al., and Ahsan et al. focus primarily on diffusion-model foundations, algorithmic variants, and applications across modalities and domains (Yang et al., [2022](https://arxiv.org/html/2605.06829#bib.bib105); Cao et al., [2022](https://arxiv.org/html/2605.06829#bib.bib106); Ahsan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib107)). Efficiency-oriented reviews by Ma et al. and Shen et al. concentrate on acceleration, efficient training and inference, and deployment considerations for diffusion models (Ma et al., [2024](https://arxiv.org/html/2605.06829#bib.bib108); Shen et al., [2025](https://arxiv.org/html/2605.06829#bib.bib109)). Tang and Zhao provide a technical tutorial centered specifically on score-based diffusion models through the stochastic-differential-equation formulation (Tang and Zhao, [2024](https://arxiv.org/html/2605.06829#bib.bib110)), while Lipman et al. give a comprehensive review of flow matching and its extensions (Lipman et al., [2024](https://arxiv.org/html/2605.06829#bib.bib47)). Holderrieth and Erives offer a first-principles tutorial spanning both diffusion and flow matching, with emphasis on practical construction of modern generators (Holderrieth and Erives, [2025](https://arxiv.org/html/2605.06829#bib.bib111)).
Our survey differs in emphasis and organization. Instead of focusing only on diffusion models, only on flow matching, or only on efficiency, we adopt a unified measure-theoretic probability-transport umbrella in which diffusion, score-based models, probability-flow ODEs, and flow matching are treated within a common path–field–sampler framework. We also emphasize formal equivalence statements, comparative structure, and supporting appendices on reverse-time diffusions, objective equivalences, path measures, and Schrödinger bridges. In that sense, the present paper is intended to complement existing diffusion-heavy surveys and method-specific tutorials by offering a more explicitly unifying and mathematically organized perspective (Yang et al., [2022](https://arxiv.org/html/2605.06829#bib.bib105); Cao et al., [2022](https://arxiv.org/html/2605.06829#bib.bib106); Ahsan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib107); Tang and Zhao, [2024](https://arxiv.org/html/2605.06829#bib.bib110); Lipman et al., [2024](https://arxiv.org/html/2605.06829#bib.bib47); Holderrieth and Erives, [2025](https://arxiv.org/html/2605.06829#bib.bib111)).
### 1.5 A map of design choices
We organize the landscape into four separate choices.
1. Path $\rho_t$. Diffusion/score methods often define $\rho_t$ as marginals of a forward noising SDE; flow matching defines $\rho_t$ through chosen probability paths (interpolations or couplings) between endpoints (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)).
2. Learned object. Score-based methods learn $s_t=\nabla\log\rho_t$, typically via denoising score matching; flow matching learns $v_t$ directly as a regression target (Hyvärinen, [2005](https://arxiv.org/html/2605.06829#bib.bib4); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
3. Sampling dynamics. Sampling may proceed via reverse-time SDE integration (stochastic sampling) or deterministic ODE integration (probability flow ODE / CNF-style sampling) (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)).
4. Objective weighting and numerical error. Practical performance depends on how losses weight time or noise levels and on numerical discretization effects in SDE/ODE solvers, issues that become salient when translating continuous-time formulations into finite-step algorithms (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28); Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)).
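The "learned object" choice for flow matching can be sketched as a plain regression problem. The example below is a minimal illustration under assumptions not taken from the survey: one-dimensional Gaussian data, an independent coupling with a standard normal reference, the linear path $x_t=(1-t)x_0+t\,x_1$, and a linear model fitted by least squares in place of a neural network. In this Gaussian case the marginal velocity $\mathbb{E}[x_1-x_0\mid x_t=x]$ is exactly linear in $x$, so the fit can be checked against a closed form. All names and constants are hypothetical.

```python
import numpy as np

# Assumed toy setup: data x0 ~ N(M0, S0^2), reference x1 ~ N(0, 1),
# independent coupling, linear path x_t = (1 - t) * x0 + t * x1.
M0, S0 = 2.0, 0.5

def analytic_velocity_coeffs(t):
    """Slope and intercept of v_t(x) = E[x1 - x0 | x_t = x] (Gaussian case)."""
    var_t = (1.0 - t) ** 2 * S0**2 + t**2
    slope = (t - (1.0 - t) * S0**2) / var_t
    intercept = -M0 - slope * (1.0 - t) * M0
    return slope, intercept

def fit_velocity(t, n, rng):
    """Least-squares regression of the conditional target x1 - x0 on x_t."""
    x0 = M0 + S0 * rng.standard_normal(n)
    x1 = rng.standard_normal(n)
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0  # time derivative of the path for this sample pair
    design = np.stack([xt, np.ones(n)], axis=1)
    (slope, intercept), *_ = np.linalg.lstsq(design, target, rcond=None)
    return slope, intercept

rng = np.random.default_rng(0)
t = 0.3
fitted = fit_velocity(t, 200_000, rng)
exact = analytic_velocity_coeffs(t)
print(fitted, exact)
```

The regression never simulates an ODE or SDE during training: it only needs sample pairs and the chosen interpolation, which is the sense in which the training is simulation-free. With the time convention used here, generation would integrate the fitted velocity backward from $t=1$ to $t=0$.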
### 1.6 Contributions of this survey
This survey makes four main contributions.
1. A unified transport viewpoint. We present diffusion, score-based, and flow-matching methods within a common framework of probability transport, using SDE/ODE dynamics and their associated PDEs (Fokker–Planck and continuity equations) (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
2. An equivalence map for samplers. We clarify the relationship between reverse-time SDE sampling and deterministic probability-flow ODE sampling, identifying the sense in which they share the same one-time marginals and where they differ in path-space behavior and numerical properties (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
3. A unified view of training objectives. We relate DDPM-style losses, denoising score matching, and continuous-time score-SDE objectives as weighted score-regression problems, and connect these to flow matching as velocity regression under prescribed probability paths (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
4. A theory–practice bridge. We organize approximation, estimation, discretization, and path-mismatch effects into a common error decomposition, and use this perspective to highlight open problems in path design, robustness, conditioning, and fast sampling.
### 1.7 Reading guide: three tracks
The survey is designed for three complementary reading styles.
- Applied ML track (implementation-first). Read Section 2 for notation, then prioritize the sampler and comparison sections: reverse-time SDE versus probability flow ODE sampling, flow matching training, and practical tradeoffs such as solver choice, stability, and cost. The ODE/CNF literature is particularly relevant for understanding likelihood computation and numerical sensitivity (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
- ML theory track (equivalences + error decomposition). Read the full main text, focusing on the boxed propositions connecting (i) time reversal and scores, (ii) SDE/ODE marginal equivalence, and (iii) objective equivalences (score matching / denoising score matching / diffusion losses) and their weighting (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
- Math/stat track (PDE/SDE foundations). Read Section 2 for notation, then consult the appendices for formal time-reversal results and PDE derivations. Classical reverse-time diffusion modeling provides a rigorous foundation for the reverse-time SDE in modern score-based generative modeling (Anderson, [1982](https://arxiv.org/html/2605.06829#bib.bib7)).
### 1.8 Organization of the paper
Section 2 introduces unified notation and collects preliminaries on SDEs/ODEs and their associated PDEs. Sections 3–5 present diffusion and score-based modeling (forward noising, reverse-time sampling, and training objectives). Section 6 discusses probability flow ODEs and their relationship to CNFs and likelihood computation (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)). Section 7 presents flow matching and clarifies its relationship to diffusion paths and score-based approaches (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). Section 8 synthesizes comparisons, and Section 9 discusses theory and error sources. Section 10 concludes with open problems, with appendices providing full derivations and more details.
## 2 Unified Notation and Preliminaries
This section fixes notation and collects the dynamical identities reused throughout the survey. Our goal is to state the modeling objects and governing equations in a way that can accommodate (i) discrete-time diffusion models, such as DDPM (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)), (ii) continuous-time score-based modeling via SDEs (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)), and (iii) deterministic transport formulations, such as continuous normalizing flows and flow matching (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). Technical conditions (existence and uniqueness of solutions, smoothness of densities, and boundary behavior) are deferred to the appendices.
### 2.1 Time convention, endpoint distributions, and marginals
We use time $t\in[0,1]$. Unless explicitly stated otherwise, we adopt the *forward* convention

$$t=0:\ \rho_0\ \text{(data)},\qquad t=1:\ \rho_1\ \text{(noise/prior)},$$

and generation runs in reverse time from $\rho_1$ to $\rho_0$. Let $X_t\in\mathbb{R}^d$ denote a time-indexed random variable. We write $\mu_t$ for its probability law and $\rho_t$ for its density when that density exists. In particular, $\mu_0\equiv\mu_{\mathrm{data}}$ and, when convenient, $\rho_0\equiv\rho_{\mathrm{data}}$. The terminal distribution $\mu_1$ is taken to be a tractable reference law, typically $\mathcal{N}(0,I)$.
Two modeling choices recur throughout the paper:
1. The probability path $(\mu_{t})_{t\in[0,1]}$: how intermediate marginals are defined (e.g., as SDE marginals in diffusion/score models, or via an explicit interpolation/coupling in flow matching).
2. The learned field: whether we learn a *score field* $s_{t}(x)=\nabla_{x}\log\rho_{t}(x)$ or a *velocity field* $v_{t}(x)$ that transports mass along the path.

Throughout the main text we use density notation when it is available, but many statements are more fundamentally statements about probability measures. For example, when we later say that two dynamics have the “same marginals,” this should be understood as equality of the one-time laws $\mu_{t}$ for each $t$.
### 2.2 Discrete-time diffusion notation
In discrete-time diffusion models, one often writes a forward noising chain

$$q(x_{t}\mid x_{t-1})=\mathcal{N}\!\big(\sqrt{1-\beta_{t}}\,x_{t-1},\,\beta_{t}I\big),$$

with variance schedule $(\beta_{t})_{t=1}^{T}$ (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). It is standard to define

$$\alpha_{t}=1-\beta_{t},\qquad\bar{\alpha}_{t}=\prod_{s=1}^{t}\alpha_{s},$$

so that the perturbed sample admits the closed-form reparameterization

$$x_{t}=\sqrt{\bar{\alpha}_{t}}\,x_{0}+\sqrt{1-\bar{\alpha}_{t}}\,\varepsilon,\qquad\varepsilon\sim\mathcal{N}(0,I).$$

These discrete-time quantities will reappear in later sections when we compare DDPM training objectives with their continuous-time counterparts.
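As a concrete illustration of the closed-form reparameterization, the following NumPy sketch builds $\bar{\alpha}_{t}$ from a variance schedule and draws $x_{t}$ directly from $x_{0}$. The linear schedule endpoints (`1e-4` to `0.02` over `T = 1000` steps) and the helper names `make_alpha_bar` and `sample_xt` are assumptions made for illustration; they are not fixed by the notation above.

```python
import numpy as np

def make_alpha_bar(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s), for t = 1..T."""
    return np.cumprod(1.0 - betas)

def sample_xt(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 1-based step index."""
    a_bar = alpha_bar[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

# Assumed schedule: linear betas (illustrative values only).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = make_alpha_bar(betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))      # a toy batch standing in for data
xt, eps = sample_xt(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Because $\bar{\alpha}_{t}$ is precomputed, a perturbed sample at any step costs one Gaussian draw rather than $t$ chain transitions.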
### 2.3 Forward diffusions as SDEs and the Fokker–Planck equation
A forward diffusion is commonly expressed as an Itô SDE on $\mathbb{R}^{d}$,

$$dX_{t}=f(X_{t},t)\,dt+g(t)\,dW_{t},\tag{1}$$

where $f:\mathbb{R}^{d}\times[0,1]\to\mathbb{R}^{d}$ is a drift field, $g:[0,1]\to\mathbb{R}_{+}$ is a scalar diffusion coefficient (extensions to matrix-valued diffusion are standard), and $(W_{t})_{t\in[0,1]}$ is a standard Wiener process. When $\mu_{t}$ admits a density $\rho_{t}$, the latter satisfies the forward Kolmogorov (Fokker–Planck) equation

$$\partial_{t}\rho_{t}(x)=-\nabla\cdot\!\big(\rho_{t}(x)\,f(x,t)\big)+\tfrac{1}{2}g(t)^{2}\,\Delta\rho_{t}(x).\tag{2}$$

Diffusion modeling typically chooses $(f,g)$ so that if $X_{0}\sim\mu_{0}$, then $\mu_{1}$ is close to a known reference law, often Gaussian, enabling generation by reversing the dynamics (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
### 2.4 Deterministic transport as ODEs and the continuity equation
A deterministic transport model evolves particles via an ODE,

$$\frac{dX_{t}}{dt}=v(X_{t},t),\tag{3}$$

where $v:\mathbb{R}^{d}\times[0,1]\to\mathbb{R}^{d}$ is a velocity field. If $X_{0}\sim\mu_{0}$ and the flow map is well-defined, the induced marginals satisfy the continuity equation

$$\partial_{t}\rho_{t}(x)+\nabla\cdot\!\big(\rho_{t}(x)\,v(x,t)\big)=0\tag{4}$$

whenever $\mu_{t}$ admits a density $\rho_{t}$. Continuous normalizing flows use ([3](https://arxiv.org/html/2605.06829#S2.E3)) to construct invertible transport between endpoints while enabling likelihood evaluation through instantaneous change-of-variables (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6)). In particular, along a trajectory $X_{t}$ one has

$$\frac{d}{dt}\log\rho_{t}(X_{t})=-\nabla\cdot v(X_{t},t).\tag{5}$$

Flow matching can be viewed as learning a velocity field $v_{\theta}$ so that ([4](https://arxiv.org/html/2605.06829#S2.E4)) holds along a prescribed probability path (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
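The instantaneous change-of-variables identity (5) can be checked numerically on a case where everything is known in closed form. The sketch below assumes a linear velocity field $v(x,t)=a\,x$ (so $\nabla\cdot v=a\,d$) and a standard normal initial law; both are illustrative choices, and the function names are hypothetical. It integrates the state and the log-density jointly with Euler steps and compares the result with the exact pushforward density $\mathcal{N}(0,e^{2at}I)$.

```python
import numpy as np

def log_density_along_flow(x0, a, n_steps=1000):
    """Euler-integrate dX/dt = a*X together with d(log rho)/dt = -div v = -a*d on [0, 1]."""
    d = x0.shape[-1]
    dt = 1.0 / n_steps
    x = x0.copy()
    # log rho_0(x0) for a standard normal initial law
    logp = -0.5 * np.sum(x0**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)
    for _ in range(n_steps):
        x = x + a * x * dt            # Euler step of the ODE (3)
        logp = logp - a * d * dt      # Euler step of identity (5); div(a*x) = a*d
    return x, logp

def exact_log_density(x, a, t=1.0):
    """Exact pushforward: X_t = exp(a*t) * X_0 ~ N(0, exp(2*a*t) * I)."""
    d = x.shape[-1]
    var = np.exp(2 * a * t)
    return -0.5 * np.sum(x**2, axis=-1) / var - 0.5 * d * np.log(2 * np.pi * var)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 3))
x1, logp1 = log_density_along_flow(x0, a=0.7)
```

For a learned velocity field the divergence is not constant and is typically estimated numerically, but the bookkeeping is the same: one extra scalar is integrated alongside the state.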
### 2.5 Score functions and score models
When $\mu_{t}$ admits a differentiable and strictly positive density $\rho_{t}$, the *score* is defined by

$$s_{t}(x)\;=\;\nabla_{x}\log\rho_{t}(x).\tag{6}$$

Score-based generative modeling learns a neural approximation $s_{\theta}(x,t)\approx s_{t}(x)$ for a range of noise levels (times) and uses it to define reverse-time sampling dynamics (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). In practice, the score is learned via regression losses derived from score matching and denoising score matching (Hyvärinen, [2005](https://arxiv.org/html/2605.06829#bib.bib4); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5)), leveraging tractable perturbation kernels that relate clean samples $x_{0}\sim\mu_{0}$ to noisy samples $x_{t}$.
The dependence of score\-based methods on densities is worth emphasizing: score notation is available only when the intermediate laws admit sufficiently regular densities\. This is one reason diffusive perturbations are convenient: they tend to regularize the distribution enough that density\-based objects become well\-defined for positive times\.
### 2.6 Fields on probability space: score vs. velocity
Many modern generative methods can be described by learning one of two field types:
1. Score fields $s_{\theta}(x,t)$, which enter reverse-time SDE samplers and the deterministic probability-flow ODE derived from the same forward diffusion (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
2. Velocity fields $v_{\theta}(x,t)$, which directly specify deterministic transport via ([3](https://arxiv.org/html/2605.06829#S2.E3))–([4](https://arxiv.org/html/2605.06829#S2.E4)) and are learned in flow matching as regression targets under a path distribution (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
Later sections make precise when these parameterizations coincide \(e\.g\., via the probability\-flow construction for a given forward diffusion\) and when they differ \(e\.g\., because flow matching permits broader path choices and training targets\)\.
### 2.7 Notations and Symbols
Table [1](https://arxiv.org/html/2605.06829#S2.T1) summarizes the principal symbols used throughout the paper.
Table 1: Unified notation used throughout the survey.
## 3 Forward Processes and Probability Paths
Diffusion and score-based generative models specify a *forward* corruption process that maps data $x_{0}\sim\mu_{0}$ to a tractable reference distribution $\mu_{1}$ (typically Gaussian). This process induces a probability path $(\mu_{t})_{t\in[0,1]}$, or densities $(\rho_{t})_{t\in[0,1]}$ when they exist, that is later traversed in reverse for generation. We summarize the two most common constructions: (i) discrete-time Markov chains (DDPM-style) (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10)) and (ii) continuous-time SDEs (score-SDE view) (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). We emphasize the objects that recur in training, namely the marginal perturbation kernel and the resulting time-dependent signal-to-noise ratio. Recent work shows that this design space extends well beyond standard Gaussian forward noising, including structured discrete-state denoising, continuous-time jump-process corruptions, and physics-inspired alternatives to diffusion transport (Austin et al., [2021](https://arxiv.org/html/2605.06829#bib.bib75); Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59); Xu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib60), [2023](https://arxiv.org/html/2605.06829#bib.bib61)).
### 3.1 Discrete-time forward noising (DDPM)
In discrete-time diffusion models, the forward process can be described as a Markov chain $(x_{t})_{t=0}^{T}$ with Gaussian transitions

$$q(x_{t}\mid x_{t-1})=\mathcal{N}\!\big(\sqrt{1-\beta_{t}}\,x_{t-1},\ \beta_{t}I\big),\tag{7}$$

where $(\beta_{t})_{t=1}^{T}$ is a variance schedule (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). This choice yields a closed-form marginal perturbation kernel

$$q(x_{t}\mid x_{0})=\mathcal{N}\!\big(\sqrt{\bar{\alpha}_{t}}\,x_{0},\ (1-\bar{\alpha}_{t})I\big),\qquad\bar{\alpha}_{t}=\prod_{s=1}^{t}(1-\beta_{s}),\tag{8}$$

so that a noisy sample can be generated as

$$x_{t}=\sqrt{\bar{\alpha}_{t}}\,x_{0}+\sqrt{1-\bar{\alpha}_{t}}\,\varepsilon,\qquad\varepsilon\sim\mathcal{N}(0,I).\tag{9}$$

The schedule $(\beta_{t})$ determines how quickly information about $x_{0}$ is erased and therefore controls both training difficulty and the numerical behavior of reverse-time sampling. Practical improvements often modify the schedule or the parameterization of the reverse model while preserving the same basic forward kernel (Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10)).
Although ([7](https://arxiv.org/html/2605.06829#S3.E7)) uses Gaussian transitions, the same denoising template has been extended beyond Gaussian noise on continuous state spaces. D3PMs replace Gaussian noising by structured discrete-state transition kernels (Austin et al., [2021](https://arxiv.org/html/2605.06829#bib.bib75)), while star-shaped diffusion models alter the geometry of the forward process itself (Okhotin et al., [2023](https://arxiv.org/html/2605.06829#bib.bib76)). These variants reinforce a central point of this survey: the forward process should be regarded as a modeling choice, not as a fixed Gaussian recipe.
### 3.2 Continuous-time forward noising (score-SDE view)
A continuous-time forward process is specified by an SDE of the form

$$dX_{t}=f(X_{t},t)\,dt+g(t)\,dW_{t},\qquad t\in[0,1],$$

chosen so that $X_{0}\sim\mu_{0}$ and $X_{1}\sim\mu_{1}$ for a tractable reference law $\mu_{1}$ (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Two widely used families are:
#### Variance-preserving (VP) SDE.
A common choice is

$$dX_{t}=-\tfrac{1}{2}\beta(t)\,X_{t}\,dt+\sqrt{\beta(t)}\,dW_{t},\tag{10}$$

where $\beta(t)>0$ is a continuous noise schedule. The corresponding conditional marginals remain Gaussian:

$$X_{t}\mid X_{0}=x_{0}\sim\mathcal{N}\!\big(\alpha(t)\,x_{0},\ \sigma(t)^{2}I\big),\tag{11}$$

with $\alpha(t)$ and $\sigma(t)$ determined by $\beta(t)$ (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). This family can be viewed as the continuous-time limit of DDPM-style chains under appropriate scaling.
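To make the dependence of $\alpha(t)$ and $\sigma(t)$ on $\beta(t)$ concrete: for the VP SDE (10), the standard closed forms are $\alpha(t)=\exp\!\big(-\tfrac{1}{2}\int_{0}^{t}\beta(s)\,ds\big)$ and $\sigma(t)^{2}=1-\exp\!\big(-\int_{0}^{t}\beta(s)\,ds\big)$. The sketch below assumes a linear schedule with endpoints 0.1 and 20 (illustrative values, with hypothetical function names) and cross-checks the closed forms against a direct Euler integration of the conditional moment equations $\dot m=-\tfrac{1}{2}\beta m$ and $\dot v=\beta(1-v)$ implied by (10).

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Assumed linear noise schedule beta(t) on [0, 1]."""
    return beta_min + t * (beta_max - beta_min)

def int_beta(t, beta_min=0.1, beta_max=20.0):
    """Closed form of int_0^t beta(s) ds for the linear schedule."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t**2

def vp_alpha_sigma(t):
    """alpha(t) = exp(-0.5 * int beta), sigma(t) = sqrt(1 - exp(-int beta))."""
    B = int_beta(t)
    return np.exp(-0.5 * B), np.sqrt(1.0 - np.exp(-B))

def vp_moments_by_ode(t_end, n_steps=20000):
    """Euler-integrate dm/dt = -0.5*beta*m and dv/dt = beta*(1 - v) from m=1, v=0."""
    m, v = 1.0, 0.0                   # conditioning on X_0 = x_0: unit signal, zero variance
    dt = t_end / n_steps
    for k in range(n_steps):
        b = beta(k * dt)
        m, v = m - 0.5 * b * m * dt, v + b * (1.0 - v) * dt
    return m, np.sqrt(v)

alpha_half, sigma_half = vp_alpha_sigma(0.5)
m_half, sd_half = vp_moments_by_ode(0.5)
```

The identity $\alpha(t)^{2}+\sigma(t)^{2}=1$ is what the name "variance-preserving" refers to: unit-variance data keeps unit variance along the path.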
#### Variance-exploding (VE) SDE.
Another common choice is

$$dX_{t}=\sqrt{\frac{d\,\sigma(t)^{2}}{dt}}\,dW_{t},\tag{12}$$

which keeps the mean fixed while increasing the noise level from $\sigma(0)\approx 0$ to a large $\sigma(1)$. Again, $X_{t}\mid X_{0}=x_{0}$ is Gaussian with mean $x_{0}$ and variance $\sigma(t)^{2}I$ (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). VE processes are often convenient for score estimation across a wide range of noise scales.
Other variants, such as sub-VP SDEs and critically damped Langevin diffusions, adjust the drift and diffusion to trade off likelihood estimation properties and sampling behavior (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Dockhorn et al., [2022](https://arxiv.org/html/2605.06829#bib.bib57)). Although these variants modify the dynamics, they share the same basic premise: a tractable forward process that carries the data law to a known reference law.
### 3.3 Perturbation kernels and reparameterizations
Most training objectives for diffusion and score-based models reduce to expectations over *perturbed* samples $x_{t}$ together with the time index $t$. The central analytical convenience is that many forward processes yield Gaussian perturbation kernels of the form

$$x_{t}=m(t)\,x_{0}+s(t)\,\varepsilon,\qquad\varepsilon\sim\mathcal{N}(0,I),\tag{13}$$

for scalar functions $m(t)$ (signal coefficient) and $s(t)$ (noise scale). For discrete-time DDPMs, $m(t)=\sqrt{\bar{\alpha}_{t}}$ and $s(t)=\sqrt{1-\bar{\alpha}_{t}}$ (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). For VP/VE SDEs, analogous expressions follow from the Gaussian conditional laws above (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
A useful derived quantity is the *signal-to-noise ratio* (SNR), which, up to conventions, scales like $(m(t)/s(t))^{2}$. Schedules that allocate more training mass to either high-SNR (light noise) or low-SNR (heavy noise) regimes can materially affect sample quality and the stability of reverse-time solvers. Modern analyses of diffusion design often emphasize precisely this schedule/path viewpoint (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)).
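The SNR profile of a given schedule is easy to inspect directly. The following sketch evaluates $(m(t)/s(t))^{2}$ for a DDPM-style kernel; the linear $\beta$ schedule and its endpoint values are assumptions chosen for illustration, and `snr` is a hypothetical helper name.

```python
import numpy as np

def snr(m, s):
    """Signal-to-noise ratio (m/s)^2 for a kernel x_t = m*x_0 + s*eps."""
    return (m / s) ** 2

# DDPM-style kernel with an assumed linear beta schedule (illustrative values).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
snr_ddpm = snr(np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar))
log_snr = np.log(snr_ddpm)            # spans many orders of magnitude, so logs are easier to read
```

For this kernel the ratio reduces to $\bar{\alpha}_{t}/(1-\bar{\alpha}_{t})$, which decreases monotonically in $t$; comparing `log_snr` curves is one simple way to compare schedules.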
### 3.4 From discrete-time DDPM to continuous-time VP SDE
The discrete-time chain ([7](https://arxiv.org/html/2605.06829#S3.E7)) can be viewed as a time discretization of a continuous-time diffusion. This bridge is useful because it explains why (i) DDPM training objectives admit continuous-time limits and (ii) reverse-time sampling can be derived cleanly via SDE time reversal (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Sohl-Dickstein et al., [2015](https://arxiv.org/html/2605.06829#bib.bib9)).
#### DDPM as a discretization of a VP (Ornstein–Uhlenbeck-type) SDE.
Consider the DDPM forward transition

$$x_{k+1}=\sqrt{1-\beta_{k+1}}\,x_{k}+\sqrt{\beta_{k+1}}\,\varepsilon_{k+1},\qquad\varepsilon_{k+1}\sim\mathcal{N}(0,I),$$

with schedule $(\beta_{k})_{k=1}^{T}$ (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). Introduce a time step $\Delta t=1/T$ and associate $t_{k}=k\Delta t$. Suppose

$$\beta_{k+1}=\beta(t_{k})\,\Delta t+o(\Delta t),$$

for some smooth function $\beta(t)>0$. Using the Taylor approximation $\sqrt{1-u}=1-\tfrac{1}{2}u+o(u)$, we obtain

$$\sqrt{1-\beta_{k+1}}=1-\tfrac{1}{2}\beta(t_{k})\Delta t+o(\Delta t),\qquad\sqrt{\beta_{k+1}}=\sqrt{\beta(t_{k})}\,\sqrt{\Delta t}+o(\sqrt{\Delta t}).$$

Substituting into the update yields

$$x_{k+1}-x_{k}=-\tfrac{1}{2}\beta(t_{k})\,x_{k}\,\Delta t+\sqrt{\beta(t_{k})}\,\sqrt{\Delta t}\,\varepsilon_{k+1}+o(\Delta t).$$

Since $\sqrt{\Delta t}\,\varepsilon_{k+1}$ has the same distribution as a Brownian increment $\Delta W_{k}\sim\mathcal{N}(0,\Delta t\,I)$, this is precisely an Euler–Maruyama discretization of the VP SDE

$$dX_{t}=-\tfrac{1}{2}\beta(t)\,X_{t}\,dt+\sqrt{\beta(t)}\,dW_{t}.$$

In this sense, DDPM forward noising converges to a VP SDE as $T\to\infty$ under the above scaling, and discrete-time diffusion objectives can be interpreted as discretizations of continuous-time score-based objectives.
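The limit above can be observed numerically at the level of the signal coefficient. Under the scaling $\beta_{k+1}=\beta(t_{k})\Delta t$, the discrete product $\bar{\alpha}_{T}=\prod_{k}(1-\beta_{k})$ should approach the continuous-time value $\exp\!\big(-\int_{0}^{1}\beta(s)\,ds\big)$, which is the squared signal coefficient of the VP SDE at $t=1$. The sketch below assumes a linear $\beta(t)$ with endpoints 0.1 and 20; these values and the function names are illustrative only.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0        # assumed linear schedule endpoints

def beta_fn(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_bar_discrete(T):
    """alpha_bar_T for a DDPM chain with beta_{k+1} = beta(t_k) * dt and dt = 1/T."""
    dt = 1.0 / T
    t_k = np.arange(T) * dt
    return np.prod(1.0 - beta_fn(t_k) * dt)

# Continuous-time counterpart: exp(-int_0^1 beta(s) ds) for the linear schedule.
alpha_sq_continuous = np.exp(-(BETA_MIN + 0.5 * (BETA_MAX - BETA_MIN)))

# Relative gap between the discrete chain and the SDE as the chain is refined.
rel_errors = [abs(alpha_bar_discrete(T) / alpha_sq_continuous - 1.0)
              for T in (100, 1000, 10000)]
```

The relative gap shrinks roughly in proportion to $1/T$, consistent with a first-order (Euler–Maruyama) discretization.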
### 3.5 Marginals versus path measures
A forward diffusion specifies more than the family of one-time marginals $(\mu_{t})$: it induces a full *path measure* on trajectories $(X_{t})_{t\in[0,1]}$ through the SDE dynamics (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Many equivalence statements used later in the survey should be read with this distinction in mind. For example, the probability-flow ODE associated with a forward SDE is constructed to match the *same marginals* $(\mu_{t})$ while generally inducing a different distribution over paths. Conversely, different stochastic processes can share the same endpoint laws but define distinct intermediate marginals and trajectory laws.
This perspective becomes especially important when comparing diffusion and score-based modeling to flow matching: flow matching begins by specifying a coupling/interpolation scheme, and hence a path measure, and then learns a velocity field consistent with that chosen path (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). Relatedly, Schrödinger bridge formulations make the path-measure viewpoint explicit by posing transport as an optimization over stochastic path measures subject to endpoint constraints (De Bortoli et al., [2021](https://arxiv.org/html/2605.06829#bib.bib8)). We return to this distinction when discussing sampler equivalences and conditioning methods in later sections.
### 3.6 Probability paths beyond diffusions
Flow matching makes the dependence on the probability path explicit: rather than taking $(\mu_{t})$ as the marginals of a fixed forward SDE, one may define $(\mu_{t})$ through a coupling/interpolation scheme and then learn a velocity field consistent with that path (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). From this perspective, diffusion models correspond to one important stochastic path family, while alternative paths (for example, the “straightened” trajectories emphasized in rectified flow) aim to improve numerical properties such as the effectiveness of coarse discretization (Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)).
This broader path-based viewpoint is also useful beyond continuous Gaussian perturbations. For discrete data, continuous-time denoising models define diffusion-like corruption and recovery processes using jump dynamics rather than Gaussian noise (Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59)). Related approaches such as Analog Bits and masked diffusion adapt diffusion-style training to noncontinuous state spaces by embedding or masking discrete structure in ways that preserve tractable learning targets (Chen et al., [2023b](https://arxiv.org/html/2605.06829#bib.bib62); Sahoo et al., [2024](https://arxiv.org/html/2605.06829#bib.bib80)). Similar ideas have also been explored for discrete-state graph generation (Xu et al., [2024](https://arxiv.org/html/2605.06829#bib.bib81)). At the same time, physics-inspired alternatives such as Poisson Flow Generative Models and PFGM++ show that the same endpoint-generation problem can be approached through transport constructions that differ substantially from standard Gaussian diffusion while preserving a closely related generative objective (Xu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib60), [2023](https://arxiv.org/html/2605.06829#bib.bib61)). We return to these connections when comparing broader families of generative transports in later sections.
## 4 Reverse-Time Dynamics and Sampling
Given a forward corruption process (Section [3](https://arxiv.org/html/2605.06829#S3)) that maps the data law $\mu_{0}$ to a tractable reference law $\mu_{1}$, generation proceeds by approximately simulating a *reverse-time* dynamic that transports samples from $\mu_{1}$ back to $\mu_{0}$. For diffusion and score-based models, the reverse-time sampler can be derived from stochastic time reversal and depends on the *score* of the intermediate marginals. This section states the reverse-time SDE and explains how a learned score model gives rise to practical samplers.
### 4.1 Reverse-time SDE: where the score enters
Consider the forward Itô diffusion

$$dX_{t}=f(X_{t},t)\,dt+g(t)\,dW_{t},\qquad t\in[0,1],$$

and assume that its one-time laws admit sufficiently smooth positive densities $\rho_{t}$. Under suitable regularity assumptions, the time-reversed process satisfies an SDE whose drift contains the score $\nabla_{x}\log\rho_{t}(x)$ (Anderson, [1982](https://arxiv.org/html/2605.06829#bib.bib7); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Informally, the reverse-time dynamics can be written as

$$dX_{t}=\Big(f(X_{t},t)-g(t)^{2}\,\nabla_{x}\log\rho_{t}(X_{t})\Big)\,dt+g(t)\,d\bar{W}_{t},\tag{14}$$

run from $t=1$ down to $t=0$, where $\bar{W}_{t}$ denotes a reverse-time Wiener process. (Different conventions absorb sign changes into the direction of integration. We follow the common convention of writing the reverse-time SDE with drift evaluated at time $t$ while integrating from $1$ to $0$; see Song et al. ([2021](https://arxiv.org/html/2605.06829#bib.bib2)) for a precise formulation.)
###### Proposition 1 (Reverse-time SDE and the score)
Let $(X_{t})_{t\in[0,1]}$ solve the forward SDE

$$dX_{t}=f(X_{t},t)\,dt+g(t)\,dW_{t},$$

and suppose the marginals admit sufficiently smooth positive densities $\rho_{t}$. Then, under standard time-reversal regularity assumptions, the reverse-time dynamics are given by an SDE whose drift contains the score of the forward marginals:

$$dX_{t}=\Big(f(X_{t},t)-g(t)^{2}\nabla_{x}\log\rho_{t}(X_{t})\Big)\,dt+g(t)\,d\bar{W}_{t},$$

interpreted in reverse time.
Proposition [1](https://arxiv.org/html/2605.06829#Thmtheorem1) is the mathematical reason score estimation suffices for generation: once the score field $s_{t}(x)=\nabla\log\rho_{t}(x)$ is approximated along the forward path, reverse-time sampling can be defined directly. A full derivation goes back to classical time-reversal results for diffusions (Anderson, [1982](https://arxiv.org/html/2605.06829#bib.bib7)); the score-SDE formulation of Song et al. ([2021](https://arxiv.org/html/2605.06829#bib.bib2)) makes this usable in modern generative modeling.
### 4.2 Learning the score field
In practice, $\rho_{t}$ is unknown, so we learn a neural network $s_{\theta}(x,t)$ that approximates $s_{t}(x)$. Training is typically based on denoising score matching: one samples $x_{0}\sim\mu_{0}$, chooses a time $t$, generates a perturbed sample $x_{t}$ from the forward perturbation kernel (Section [3.3](https://arxiv.org/html/2605.06829#S3.SS3)), and regresses $s_{\theta}(x_{t},t)$ toward the analytically available conditional score $\nabla_{x_{t}}\log q(x_{t}\mid x_{0})$ (Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Section 5 discusses these objectives and their equivalences in detail. For the purposes of the present section, the key point is that once $s_{\theta}$ has been trained, it can be substituted for the unknown score in ([14](https://arxiv.org/html/2605.06829#S4.E14)).
### 4.3 Practical samplers: discretizing the reverse SDE
To generate samples, one initializes $x_{1}\sim\mu_{1}$ (typically $\mathcal{N}(0,I)$) and numerically integrates the reverse-time SDE from $t=1$ to $t=0$ using the learned score $s_{\theta}$. The simplest discretization is an Euler–Maruyama step of the schematic form

$$x_{t-\Delta t}=x_{t}-\Big(f(x_{t},t)-g(t)^{2}s_{\theta}(x_{t},t)\Big)\Delta t+g(t)\sqrt{\Delta t}\,z,\qquad z\sim\mathcal{N}(0,I),$$

with step size $\Delta t>0$; the minus sign in front of the drift reflects that time decreases along the sampling trajectory. Exact implementations differ according to time parameterization and sign convention, but the essential structure is the same: a deterministic drift term driven by the learned score together with a stochastic diffusion term. More accurate or stable samplers use higher-order SDE solvers or *predictor–corrector* schemes that alternate a predictor step (SDE discretization) with a corrector step based on Langevin dynamics (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). In discrete-time DDPMs, the reverse-time sampler is often implemented as an ancestral reverse Markov chain whose parameters are predicted by a neural network (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)).
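A minimal sketch of an Euler–Maruyama reverse-time sampler is given below, under strong simplifying assumptions that are not part of the text: the data law is a one-dimensional Gaussian $\mathcal{N}(2,0.5^{2})$, the forward process is a VP SDE with a linear $\beta(t)$ schedule, and the exact marginal score (available in closed form for Gaussian data) stands in for a trained network $s_{\theta}$. All names and constants are hypothetical. Time is stepped backward with $\Delta t>0$, so the reverse drift is subtracted.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0        # assumed linear VP schedule
MU0, S0 = 2.0, 0.5                    # assumed 1-D Gaussian data law N(MU0, S0^2)

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_sigma2(t):
    """VP kernel coefficients: alpha(t) and sigma(t)^2."""
    B = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2
    return np.exp(-0.5 * B), 1.0 - np.exp(-B)

def score(x, t):
    """Exact score of rho_t = N(alpha*MU0, alpha^2*S0^2 + sigma^2); stands in for s_theta."""
    a, s2 = alpha_sigma2(t)
    return -(x - a * MU0) / (a**2 * S0**2 + s2)

def reverse_sde_sample(n, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = rng.standard_normal(n)        # x_1 ~ N(0, 1), the reference law
    for i in range(n_steps):
        t = 1.0 - i * dt
        f = -0.5 * beta(t) * x        # VP drift f(x, t)
        g2 = beta(t)                  # g(t)^2
        drift = f - g2 * score(x, t)  # reverse-time drift from (14)
        # Step backward in time by dt > 0: the drift enters with a minus sign.
        x = x - drift * dt + np.sqrt(g2 * dt) * rng.standard_normal(n)
    return x

samples = reverse_sde_sample(20000)
```

With the exact score, the empirical mean and standard deviation of `samples` should be close to the assumed data values, up to discretization and Monte Carlo error; with a learned score, the score approximation error adds to both.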
### 4.4 SDE versus ODE sampling (preview)
The reverse-time SDE sampler is stochastic: even for a fixed initial sample $x_{1}$, the noise injected at each discretization step influences the output. A complementary deterministic alternative is obtained by constructing a *probability flow ODE* whose solution shares the same one-time marginals as the forward SDE (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). This ODE viewpoint connects diffusion sampling to continuous normalizing flows and enables likelihood computation via instantaneous change-of-variables, while generally inducing a different path measure from the SDE. We discuss this construction in Section 6.
## 5 Training Objectives and Equivalences
Sections [3](https://arxiv.org/html/2605.06829#S3)–[4](https://arxiv.org/html/2605.06829#S4) described how forward noising processes induce a probability path $(\rho_{t})$ and how reverse-time sampling depends on the unknown score field $s_{t}=\nabla\log\rho_{t}$. We now describe how modern diffusion/score models *learn* the score (or an equivalent parameterization) from data. The key unifying theme is that most objectives reduce to *weighted regression* of analytically available targets under perturbed data, and many seemingly different losses (DDPM noise prediction, score matching, and continuous-time objectives) are equivalent up to time-dependent weights and parameterizations (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
### 5.1 Score matching and denoising score matching
Classical score matching estimates the score of an unknown density by minimizing the Fisher divergence between a model score and the data score (Hyvärinen, [2005](https://arxiv.org/html/2605.06829#bib.bib4)). In generative modeling, the most widely used variant is *denoising score matching* (DSM), which leverages a tractable corruption kernel $q_{\sigma}(x\mid x_{0})$ (often Gaussian) and the identity

$$\nabla_{x}\log q_{\sigma}(x)\;=\;\mathbb{E}\big[\nabla_{x}\log q_{\sigma}(x\mid x_{0})\mid x\big],$$

to train a score network without ever evaluating $q_{\sigma}(x)$ directly (Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5)). In the diffusion setting, the “noise level” $\sigma$ is replaced by time $t$, and DSM is applied to the family of perturbed marginals $\rho_{t}$ induced by the forward process (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
For Gaussian perturbations of the form $x_{t}=m(t)x_{0}+s(t)\varepsilon$ (Section [3.3](https://arxiv.org/html/2605.06829#S3.SS3)), the conditional density $q(x_{t}\mid x_{0})$ is Gaussian and its score with respect to $x_{t}$ is available in closed form:

$$\nabla_{x_{t}}\log q(x_{t}\mid x_{0})\;=\;-\frac{1}{s(t)^{2}}\Big(x_{t}-m(t)\,x_{0}\Big).\tag{15}$$

DSM then regresses the model score $s_{\theta}(x_{t},t)$ toward ([15](https://arxiv.org/html/2605.06829#S5.E15)) in expectation over $(x_{0},t,\varepsilon)$.
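The conditional-expectation identity behind DSM can be seen in a small experiment. The sketch below assumes one-dimensional Gaussian data and fixed kernel coefficients $m$ and $s$ at a single time (all values illustrative), so that the true marginal score is linear in $x$. Fitting a linear model to the *conditional* score targets (15) by least squares, which is the DSM loss restricted to this model class, should then recover the *marginal* score coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
mu0, s0 = 2.0, 0.5                    # assumed Gaussian data law N(mu0, s0^2)
m, s = 0.8, 0.6                       # assumed kernel coefficients m(t), s(t) at one fixed t

x0 = mu0 + s0 * rng.standard_normal(n)
eps = rng.standard_normal(n)
xt = m * x0 + s * eps

# Conditional-score target from (15): -(x_t - m*x_0) / s^2, which equals -eps / s.
target = -(xt - m * x0) / s**2

# Fit a linear score model s_theta(x) = a*x + b by least squares (DSM loss at this t).
A = np.stack([xt, np.ones(n)], axis=1)
(a_hat, b_hat), *_ = np.linalg.lstsq(A, target, rcond=None)

# Exact marginal score of N(m*mu0, m^2*s0^2 + s^2) is a_true*x + b_true.
var_t = m**2 * s0**2 + s**2
a_true, b_true = -1.0 / var_t, m * mu0 / var_t
```

The regression never evaluates the marginal density, yet `a_hat` and `b_hat` approach `a_true` and `b_true` as the sample size grows, which is the practical content of the identity.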
### 5.2 DDPM training as noise prediction (and score prediction)
DDPMs are often trained by predicting the noise $\varepsilon$ in the reparameterization ([9](https://arxiv.org/html/2605.06829#S3.E9)):

$$\mathcal{L}_{\varepsilon}(\theta)=\mathbb{E}_{t,x_{0},\varepsilon}\Big[\big\lVert\varepsilon-\varepsilon_{\theta}(x_{t},t)\big\rVert^{2}\Big],\qquad x_{t}=\sqrt{\bar{\alpha}_{t}}\,x_{0}+\sqrt{1-\bar{\alpha}_{t}}\,\varepsilon,\tag{16}$$

possibly with time-dependent weights (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). This objective is equivalent to score regression under a change of variables: for the Gaussian kernel ([8](https://arxiv.org/html/2605.06829#S3.E8)), the conditional score ([15](https://arxiv.org/html/2605.06829#S5.E15)) can be expressed in terms of $\varepsilon$, and a predictor $\varepsilon_{\theta}$ induces a score model via

$$s_{\theta}(x_{t},t)\;\approx\;-\frac{1}{\sqrt{1-\bar{\alpha}_{t}}}\,\varepsilon_{\theta}(x_{t},t)\quad\text{(up to known scalings).}\tag{17}$$

Thus “noise prediction” and “score prediction” are largely different parameterizations of the same regression problem, with different implicit weightings across time/noise levels. Improved DDPM variants discuss alternative parameterizations and weighting choices (Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10)).
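The change of variables between noise and score parameterizations amounts to a rescaling, as the short sketch below shows. The helper names `eps_to_score` and `score_to_eps` and the value of $\bar{\alpha}_{t}$ are assumptions made for illustration. Plugging the true noise into the conversion reproduces the conditional score (15) for the kernel (8).

```python
import numpy as np

def eps_to_score(eps_pred, alpha_bar_t):
    """Convert a noise prediction into a score estimate via (17)."""
    return -eps_pred / np.sqrt(1.0 - alpha_bar_t)

def score_to_eps(score_pred, alpha_bar_t):
    """Inverse map: recover the implied noise prediction from a score estimate."""
    return -np.sqrt(1.0 - alpha_bar_t) * score_pred

rng = np.random.default_rng(0)
alpha_bar_t = 0.3                     # assumed value of alpha_bar at some step t
x0 = rng.standard_normal((8, 2))
eps = rng.standard_normal((8, 2))
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Conditional score (15) with m = sqrt(alpha_bar_t) and s^2 = 1 - alpha_bar_t.
cond_score = -(xt - np.sqrt(alpha_bar_t) * x0) / (1.0 - alpha_bar_t)
score_from_eps = eps_to_score(eps, alpha_bar_t)
```

Because the scaling factor $1/\sqrt{1-\bar{\alpha}_{t}}$ grows as the noise level shrinks, an unweighted loss on $\varepsilon$ and an unweighted loss on the score emphasize different times, which is the source of the implicit weighting mentioned above.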
### 5.3 Continuous-time objectives and weighting
In the continuous-time score-SDE framework, one typically samples $t\sim\mathrm{Unif}[0,1]$ (or another distribution) and minimizes a time-weighted DSM objective of the form

$$\mathcal{L}(\theta)=\mathbb{E}_{t}\,\mathbb{E}_{x_{0},\varepsilon}\Big[\lambda(t)\,\big\lVert s_{\theta}(x_{t},t)-\nabla_{x_{t}}\log q(x_{t}\mid x_{0})\big\rVert^{2}\Big],\tag{18}$$

where $\lambda(t)$ is a weighting function (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Different choices of $\lambda(t)$ correspond to emphasizing different SNR regimes along the path, and can strongly affect both sample quality and the stiffness/discretization sensitivity of reverse-time solvers (cf. the schedule/path viewpoint in Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)).
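A Monte Carlo estimate of the weighted objective (18) takes only a few lines. The sketch below assumes a toy kernel with $m(t)=\cos(\pi t/2)$ and $s(t)=\sin(\pi t/2)$ and uses a hypothetical stand-in `eps_model` in place of a trained network; none of these choices come from the text. It also illustrates one particular weighting, $\lambda(t)=s(t)^{2}$, under which the weighted score loss coincides sample-by-sample with the unweighted noise-prediction loss.

```python
import numpy as np

def m_fn(t):
    return np.cos(0.5 * np.pi * t)    # assumed signal coefficient m(t)

def s_fn(t):
    return np.sin(0.5 * np.pi * t)    # assumed noise scale s(t)

def eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor."""
    return 0.5 * x

def score_model(x, t):
    """Score model induced by eps_model through s_theta = -eps_theta / s(t)."""
    return -eps_model(x, t) / s_fn(t)[:, None]

def weighted_dsm_loss(score_fn, x0, t, eps, lam_fn):
    """Monte Carlo estimate of (18) for given samples (x0, t, eps)."""
    m, s = m_fn(t)[:, None], s_fn(t)[:, None]
    xt = m * x0 + s * eps
    target = -eps / s                                  # conditional score (15)
    sq_err = np.sum((score_fn(xt, t) - target) ** 2, axis=1)
    return np.mean(lam_fn(t) * sq_err)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1024, 2))
eps = rng.standard_normal((1024, 2))
t = rng.uniform(0.05, 1.0, size=1024)                  # stay away from s(t) = 0 at t = 0

loss_score = weighted_dsm_loss(score_model, x0, t, eps, lam_fn=lambda u: s_fn(u) ** 2)
xt = m_fn(t)[:, None] * x0 + s_fn(t)[:, None] * eps
loss_eps = np.mean(np.sum((eps - eps_model(xt, t)) ** 2, axis=1))
```

Other choices of `lam_fn` reweight the same per-sample errors, shifting emphasis between high-SNR and low-SNR times without changing the regression target.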
### 5.4 Equivalence view: weighted Fisher divergence (informal)
Many diffusion/score objectives can be interpreted as minimizing a *time-integrated Fisher divergence* between the learned score and the true score of $\rho_{t}$ (up to constants and weighting). Informally,

$$\mathcal{L}(\theta)\ \approx\ \int_{0}^{1}\lambda(t)\,\mathbb{E}_{x\sim\rho_{t}}\big[\lVert s_{\theta}(x,t)-s_{t}(x)\rVert^{2}\big]\,dt,$$

where the relationship between $\lambda(t)$ and the forward process depends on the chosen parameterization and discretization (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)). We defer precise equivalence statements (including the connection to ELBO-style objectives for discrete-time diffusion models) to Appendix C and focus here on the practical implications: *different “losses” often differ primarily by the time/noise weighting they induce.*
###### Proposition 2 (Diffusion objectives as weighted score regression)
For Gaussian perturbation kernels and the usual diffusion parameterizations, the standard training objectives used in DDPMs, denoising score matching, and continuous-time score-SDE training can all be written, up to constants and schedule-dependent weights, as minimizing a time-integrated squared error between a model score and the true score of the perturbed marginals:
$$\int_{0}^{1}\lambda(t)\,\mathbb{E}_{x\sim\rho_{t}}\!\left[\|s_{\theta}(x,t)-s_{t}(x)\|^{2}\right]dt.$$
Equivalently, common noise-prediction objectives are reparameterized score-regression objectives under Gaussian perturbations.
Proposition [2](https://arxiv.org/html/2605.06829#Thmtheorem2) formalizes the unifying claim of this subsection: DDPM noise-prediction losses, denoising score matching, and continuous-time score-SDE objectives differ mainly in parameterization and weighting, rather than in the statistical object they estimate. In the main text we keep this statement informal; the derivational details are deferred to Appendix C.
### 5.5 Practice note: parameterizations and what changes
A recurring source of confusion is that implementations may train networks to predict different quantities ($\varepsilon$, $x_{0}$, the mean of $p_{\theta}(x_{t-1}\mid x_{t})$, or the score) while describing them as different methods. For Gaussian forward kernels, these parameterizations are algebraically linked by known scalings (e.g., ([17](https://arxiv.org/html/2605.06829#S5.E17))), but the induced optimization problem can change due to weighting and normalization. From a unifying viewpoint, the question is not “which variable is predicted?” but rather: *what score/velocity field is implied by the parameterization and what time-dependent weighting does training apply along the path?*
## 6 Probability Flow ODE and Likelihood Connections
The reverse-time sampler in Section [4](https://arxiv.org/html/2605.06829#S4) is stochastic because it integrates an SDE whose drift depends on the score. A complementary deterministic viewpoint, central to unifying diffusion models with continuous normalizing flows, is that the same family of one-time marginals can be generated by an ODE whose vector field also depends on the score. This *probability flow ODE* enables deterministic sampling and, in principle, likelihood computation via instantaneous change-of-variables (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6); Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)).
### 6.1 From Fokker–Planck to a deterministic flow
Consider the forward SDE
$$dX_{t}=f(X_{t},t)\,dt+g(t)\,dW_{t},$$
and suppose its one-time laws admit densities $\rho_{t}$ satisfying the Fokker–Planck equation ([2](https://arxiv.org/html/2605.06829#S2.E2)). Song et al. ([2021](https://arxiv.org/html/2605.06829#bib.bib2)) observed that one can construct a deterministic ODE
$$\frac{dX_{t}}{dt}=f(X_{t},t)-\tfrac{1}{2}g(t)^{2}\,\nabla_{x}\log\rho_{t}(X_{t}), \tag{19}$$
whose induced density evolution matches the same Fokker–Planck marginals. In other words, integrating ([19](https://arxiv.org/html/2605.06829#S6.E19)) from $t=0$ to $t=1$ produces the same family of one-time marginals as the stochastic forward SDE (and likewise in reverse time for generation), even though the path measures differ (Section [3.5](https://arxiv.org/html/2605.06829#S3.SS5)). This marginal-equivalence property is the formal bridge between diffusion models and deterministic transport.
###### Proposition 3 (Probability-flow ODE has the same marginals)
Assume the forward SDE marginals admit sufficiently smooth densities $\rho_{t}$ so that the score $s_{t}(x)=\nabla_{x}\log\rho_{t}(x)$ is well-defined. Then the ODE
$$\frac{dX_{t}}{dt}=f(X_{t},t)-\tfrac{1}{2}g(t)^{2}\,s_{t}(X_{t})$$
induces the same one-time marginals as the forward SDE, and hence the same densities $(\rho_{t})_{t\in[0,1]}$ whenever those densities exist.
The proof is obtained by matching the PDEs governing density evolution: substituting the ODE velocity into the continuity equation reproduces the same Fokker–Planck evolution as the SDE (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Proposition [3](https://arxiv.org/html/2605.06829#Thmtheorem3) should be read as a *marginal* equivalence statement; the ODE and SDE generally induce different path measures.
In practice, $\nabla_{x}\log\rho_{t}$ is replaced by the learned score network $s_{\theta}(x,t)$, yielding the approximate probability flow ODE
$$\frac{dX_{t}}{dt}=f(X_{t},t)-\tfrac{1}{2}g(t)^{2}\,s_{\theta}(X_{t},t). \tag{20}$$
Sampling is then performed by solving ([20](https://arxiv.org/html/2605.06829#S6.E20)) backward in time from $t=1$ to $t=0$ with an ODE solver.
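A small worked example may help make the backward solve concrete. The sketch below assumes a one-dimensional variance-preserving forward SDE with $f(x,t)=-\tfrac{1}{2}\beta x$ and $g(t)^{2}=\beta$, a Gaussian data law $\rho_{0}=\mathcal{N}(m_{0},s_{0}^{2})$, and the exact Gaussian score standing in for $s_{\theta}$. The constants `BETA`, `M0`, `S0` and all function names are hypothetical choices for illustration.

```python
import math

BETA = 4.0          # hypothetical constant noise rate: f(x, t) = -0.5*BETA*x, g(t)^2 = BETA
M0, S0 = 2.0, 0.5   # toy data law rho_0 = N(M0, S0^2)

def marginal(t):
    """Mean and variance of rho_t for the assumed VP forward SDE started at rho_0."""
    a = math.exp(-0.5 * BETA * t)
    return a * M0, a * a * S0 * S0 + 1.0 - a * a

def score(x, t):
    """Exact score of the Gaussian marginal rho_t (stands in for s_theta)."""
    mean, var = marginal(t)
    return -(x - mean) / var

def pf_velocity(x, t):
    """Right-hand side of the probability flow ODE (19)/(20)."""
    return -0.5 * BETA * x - 0.5 * BETA * score(x, t)

def solve_backward(x1, n_steps=200):
    """Integrate the ODE from t = 1 down to t = 0 with classical RK4."""
    h = -1.0 / n_steps
    x, t = x1, 1.0
    for _ in range(n_steps):
        k1 = pf_velocity(x, t)
        k2 = pf_velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = pf_velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = pf_velocity(x + h * k3, t + h)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x

# In 1-D the flow is monotone, so the point z standard deviations from the mean
# of rho_1 should land z standard deviations from the mean of rho_0.
mean1, var1 = marginal(1.0)
z = 1.5
x0 = solve_backward(mean1 + z * math.sqrt(var1))
expected = M0 + z * S0
```

In this toy case the deterministic map sends quantiles of $\rho_{1}$ to the matching quantiles of $\rho_{0}$, which is one way to see the marginal equivalence of Proposition 3 without simulating any noise. With a learned score the same loop applies, but the endpoint is only approximately distributed as $\rho_{0}$.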
### 6.2 Deterministic sampling and numerical considerations
Compared to SDE sampling, ODE sampling has two practical consequences\.
#### \(i\) Determinism\.
Given an initial draw $x_{1}\sim\mu_{1}$, the ODE trajectory is deterministic up to numerical solver tolerances. This can be advantageous for reproducibility and for downstream tasks where one wants a deterministic mapping from a latent seed to a sample.
#### \(ii\) Solver error and stiffness\.
ODE-based sampling replaces stochastic discretization error with deterministic solver error. In high dimensions, stiffness can arise near endpoint regions where the effective score magnitude changes rapidly, making coarse step sizes unstable or biased. This is one reason step-size schedules and solver choices matter in practice; diffusion-design analyses often emphasize the interaction between noise schedules, SNR allocation, and discretization error (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)). When adaptive solvers are used, computational cost is controlled indirectly by tolerance parameters rather than by an explicit step count.
### 6.3 Likelihood computation via instantaneous change-of-variables
Because ([20](https://arxiv.org/html/2605.06829#S6.E20)) defines an invertible flow under standard regularity conditions, one can in principle compute log-likelihoods using the instantaneous change-of-variables formula for ODE flows (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6)). Concretely, if $X_{t}$ follows ([20](https://arxiv.org/html/2605.06829#S6.E20)), then
$$\frac{d}{dt}\log\rho_{t}(X_{t})=-\nabla\cdot\Big(f(X_{t},t)-\tfrac{1}{2}g(t)^{2}s_{\theta}(X_{t},t)\Big), \tag{21}$$
and integrating ([21](https://arxiv.org/html/2605.06829#S6.E21)) along trajectories relates $\log\rho_{0}$ and $\log\rho_{1}$. Computing the divergence term exactly is expensive in high dimensions, and scalable CNF implementations use stochastic trace estimators (e.g., Hutchinson estimators), as in FFJORD (Grathwohl et al., [2019](https://arxiv.org/html/2605.06829#bib.bib11)). In diffusion models, this likelihood connection is typically only approximate because both the score model and the numerical solution are approximate. Nonetheless, the probability-flow viewpoint provides a principled bridge between diffusion samplers and likelihood-based continuous flows (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
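The stochastic trace estimator mentioned above can be illustrated on a toy field. The following sketch estimates the divergence in (21) as the average of $e^{\top}J e$ over Rademacher probes $e$, where $J$ is the Jacobian of the velocity field. The field $v(x)=Ax+\tanh(x)$, the matrix `A`, and the use of a central difference for the Jacobian-vector product are all assumptions made for a self-contained example; practical implementations would typically obtain the product from automatic differentiation.

```python
import math
import random

A = [[0.5, -1.0, 0.2],
     [0.3, -0.4, 0.7],
     [-0.6, 0.1, 0.9]]  # arbitrary matrix for a toy field, not from the paper

def field(x):
    """Toy velocity field v(x) = A x + tanh(x)."""
    return [sum(a * xj for a, xj in zip(row, x)) + math.tanh(xi)
            for row, xi in zip(A, x)]

def exact_divergence(x):
    """Trace of the Jacobian of `field`, available in closed form for this toy case."""
    return sum(A[i][i] + 1.0 - math.tanh(x[i]) ** 2 for i in range(len(x)))

def hutchinson_divergence(v, x, n_probes=4000, h=1e-4, seed=0):
    """Estimate div v(x) = tr(J) as the mean of e^T J e over Rademacher probes e.

    The Jacobian-vector product J e is approximated by a central difference.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        e = [rng.choice((-1.0, 1.0)) for _ in x]
        plus = v([xi + h * ei for xi, ei in zip(x, e)])
        minus = v([xi - h * ei for xi, ei in zip(x, e)])
        jvp = [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]
        total += sum(ei * ji for ei, ji in zip(e, jvp))
    return total / n_probes

x = [0.3, -1.2, 0.8]
div_exact = exact_divergence(x)
div_estimate = hutchinson_divergence(field, x)
```

Each probe costs one Jacobian-vector product instead of a full Jacobian, which is why the estimator scales to high dimensions; the price is Monte Carlo variance that enters the log-likelihood estimate.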
### 6.4 How ODE and SDE samplers relate (summary)
Both reverse-time SDE sampling (Section [4](https://arxiv.org/html/2605.06829#S4)) and probability-flow ODE sampling ([20](https://arxiv.org/html/2605.06829#S6.E20)) use the same learned score field. In the idealized setting of an exact score and exact numerical integration, they produce samples from the same target distribution via matching one-time marginals, but they differ in (i) the distribution over intermediate trajectories and (ii) how numerical error manifests. This distinction becomes especially important when comparing diffusion to flow matching: diffusion learns a score field tied to a stochastic forward process and then induces an ODE velocity through ([20](https://arxiv.org/html/2605.06829#S6.E20)), whereas flow matching learns the velocity field directly along a chosen path (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
## 7 Flow Matching and Rectified Flows
Flow matching provides a complementary route to generative modeling that emphasizes *deterministic transport* and makes the choice of probability path explicit. Rather than defining $(\mu_{t})$ as the marginals of a fixed forward SDE and learning a score field, flow matching specifies a family of intermediate distributions (typically through a coupling and an interpolation rule) and learns a *velocity field* whose induced ODE transports mass along that path (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). This viewpoint connects naturally to continuous normalizing flows and helps explain recent progress on few-step generation via path straightening, as in rectified flow (Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)). More recent work also shows that the same path-based transport principle extends beyond the standard continuous Euclidean setting, including discrete-state, graph-valued, function-valued, and Wasserstein-space formulations (Gat et al., [2024](https://arxiv.org/html/2605.06829#bib.bib77); Cheng et al., [2024](https://arxiv.org/html/2605.06829#bib.bib64); Davis et al., [2024](https://arxiv.org/html/2605.06829#bib.bib78); Eijkelboom et al., [2024](https://arxiv.org/html/2605.06829#bib.bib79); Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66)).
### 7.1 Path-based formulation and conditional velocities
Let $\pi(x_{0},x_{1})$ be a coupling between data $x_{0}\sim\mu_{0}$ and noise $x_{1}\sim\mu_{1}$. A *path sampler* specifies a random intermediate state $x_{t}$ given $(x_{0},x_{1},t)$, for example by an affine interpolation
$$x_{t}=a(t)\,x_{0}+b(t)\,x_{1}, \tag{22}$$
with scalar functions $a(t)$ and $b(t)$. The induced law of $x_{t}$ defines a probability path $(\mu_{t})_{t\in[0,1]}$, with densities $(\rho_{t})$ when they exist.
If the interpolation is differentiable in $t$, then along a sampled endpoint pair $(x_{0},x_{1})$ the path derivative is
$$\dot{x}_{t}:=\frac{d}{dt}x_{t}=a^{\prime}(t)\,x_{0}+b^{\prime}(t)\,x_{1}. \tag{23}$$
This object is naturally a *conditional* target: it depends on the sampled endpoints $(x_{0},x_{1})$, not only on the current state $x_{t}$.
By contrast, the velocity field that appears in the continuity equation
$$\partial_{t}\rho_{t}+\nabla\cdot(\rho_{t}v_{t})=0$$
is a function of the current state and time. For a given path construction, the marginally correct velocity is
$$v^{\ast}(x,t)=\mathbb{E}\!\left[\dot{x}_{t}\mid x_{t}=x,\,t\right]. \tag{24}$$
Equation ([24](https://arxiv.org/html/2605.06829#S7.E24)) is the key mathematical object: it is the velocity field whose flow is consistent with the prescribed probability path (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). In this sense, flow matching can be understood as learning the conditional expectation of the path derivative given the current state.
###### Proposition 4 (Optimal flow-matching velocity)
Let a path sampler induce intermediate states $x_{t}$ and conditional path derivative $\dot{x}_{t}$. Then the population minimizer of the conditional flow-matching loss
$$\mathbb{E}\big[\|v_{\theta}(x_{t},t)-\dot{x}_{t}\|^{2}\big]$$
over measurable functions of $(x_{t},t)$ is
$$v^{\ast}(x,t)=\mathbb{E}[\dot{x}_{t}\mid x_{t}=x,\,t].$$
Equivalently, the optimal unconditional velocity field is the conditional expectation of the path derivative given the current state and time.
Proposition [4](https://arxiv.org/html/2605.06829#Thmtheorem4) is the key mathematical bridge between endpoint-conditioned supervision and the marginal velocity field appearing in the continuity equation. It explains why endpoint-conditioned regression can still recover a state-dependent field suitable for deterministic transport (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
### 7.2 Conditional and marginal flow matching objectives
A practical training objective samples $(x_{0},x_{1})\sim\pi$, chooses a time $t$, constructs $x_{t}$, and regresses the model velocity against the conditional target ([23](https://arxiv.org/html/2605.06829#S7.E23)):
$$\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t}\,\mathbb{E}_{(x_{0},x_{1})\sim\pi}\Big[\|v_{\theta}(x_{t},t)-\dot{x}_{t}\|^{2}\Big]. \tag{25}$$
For the affine path ([22](https://arxiv.org/html/2605.06829#S7.E22)), this becomes
$$\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t}\,\mathbb{E}_{(x_{0},x_{1})\sim\pi}\Big[\|v_{\theta}(x_{t},t)-\big(a^{\prime}(t)x_{0}+b^{\prime}(t)x_{1}\big)\|^{2}\Big].$$
Although ([25](https://arxiv.org/html/2605.06829#S7.E25)) is written using endpoint-conditioned targets, its population minimizer is the marginal velocity field ([24](https://arxiv.org/html/2605.06829#S7.E24)). This follows from the $L^{2}$ projection principle: minimizing squared error against $\dot{x}_{t}$ under the joint law of $(x_{t},t,x_{0},x_{1})$ yields the conditional expectation $\mathbb{E}[\dot{x}_{t}\mid x_{t},t]$ (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). This is the basic reason conditional flow matching is mathematically compatible with learning an unconditional field $v_{\theta}(x,t)$.
An equivalent marginal formulation is therefore
$$\mathcal{L}_{\mathrm{FM}}(\theta)=\mathbb{E}_{t}\,\mathbb{E}_{x\sim\rho_{t}}\Big[\|v_{\theta}(x,t)-v^{\ast}(x,t)\|^{2}\Big], \tag{26}$$
where $v^{\ast}(x,t)$ is given by ([24](https://arxiv.org/html/2605.06829#S7.E24)). In practice, the conditional version ([25](https://arxiv.org/html/2605.06829#S7.E25)) is preferred because it provides a tractable supervised target via sampled endpoint pairs.
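The $L^{2}$ projection argument can be checked on a case where the conditional expectation is known in closed form. The sketch below assumes scalar Gaussian endpoints, an independent coupling, the linear path $a(t)=1-t$, $b(t)=t$, and a single fixed time; these choices, the constants, and the helper names are illustrative assumptions. Because $(x_{t},\dot{x}_{t})$ is then jointly Gaussian, $\mathbb{E}[\dot{x}_{t}\mid x_{t}=x]$ is affine in $x$, so an affine least-squares fit of the conditional target should recover it.

```python
import random

M0, S0 = 2.0, 0.5   # toy data law mu_0 = N(M0, S0^2); mu_1 = N(0, 1) is the noise law
T = 0.3             # a single fixed time, to keep the regression one-dimensional

def sample_pair(rng):
    """Independent coupling pi = mu_0 x mu_1 with the linear path a(t)=1-t, b(t)=t."""
    x0 = rng.gauss(M0, S0)
    x1 = rng.gauss(0.0, 1.0)
    x_t = (1.0 - T) * x0 + T * x1     # intermediate state, as in (22)
    xdot = x1 - x0                    # conditional target a'(t) x0 + b'(t) x1, as in (23)
    return x_t, xdot

def fit_affine(pairs):
    """Least-squares fit of v(x) = w * x + c, the minimizer of (25) over affine v."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    w = cov_xy / var_x
    return w, mean_y - w * mean_x

rng = random.Random(0)
w_fit, c_fit = fit_affine([sample_pair(rng) for _ in range(200000)])

# Closed-form E[xdot | x_t = x] = w_star * x + c_star for jointly Gaussian (x_t, xdot).
var_xt = (1.0 - T) ** 2 * S0 ** 2 + T ** 2
w_star = (T - (1.0 - T) * S0 ** 2) / var_xt
c_star = -M0 - w_star * (1.0 - T) * M0
```

The regression never sees $v^{\ast}$ directly, only endpoint-conditioned targets, yet the fitted coefficients approach those of the marginal velocity in (24) as the sample size grows.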
### 7.3 How flow matching relates to diffusion and probability-flow ODEs
The probability-flow ODE of Section [6](https://arxiv.org/html/2605.06829#S6) defines a deterministic velocity field
$$v_{\mathrm{PF}}(x,t)=f(x,t)-\tfrac{1}{2}g(t)^{2}\,s_{t}(x),$$
tied to a specific forward diffusion and its score field (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Flow matching can be viewed as learning a velocity field directly, without going through an explicit score parameterization. These viewpoints coincide in special cases:
- If the chosen path $(\mu_{t})$ matches the marginals of a forward diffusion and the target velocity equals the associated probability-flow velocity, then flow matching recovers the same deterministic sampler as the diffusion probability-flow ODE.
- More generally, flow matching allows broader path families and couplings, potentially yielding velocities that are easier to integrate numerically, less stiff, or better suited to coarse discretization.
From the transport viewpoint developed in this survey, diffusion and flow matching differ less in their ultimate goal than in how they parameterize transport: diffusion learns a score field and *induces* a velocity through the probability-flow construction, whereas flow matching learns the velocity field directly along a prescribed path.
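The first special case can be made explicit for the affine path (22) under additional assumptions that are not stated in the text: $x_{1}\sim\mathcal{N}(0,I)$ is independent of $x_{0}$, $a(t),b(t)>0$, and the forward diffusion is the linear SDE whose conditional kernel is $\mathcal{N}(a(t)x_{0},\,b(t)^{2}I)$, so that $f(x,t)=\tfrac{a'(t)}{a(t)}x$ and $g(t)^{2}=2b(t)b'(t)-2\tfrac{a'(t)}{a(t)}b(t)^{2}$. A sketch of the calculation, using the identity $a(t)\,\mathbb{E}[x_{0}\mid x_{t}=x]=x+b(t)^{2}s_{t}(x)$ for Gaussian kernels:

```latex
\begin{aligned}
\dot{x}_t &= a' x_0 + b' x_1
           = \frac{b'}{b}\,x_t + \Big(a' - \frac{a\,b'}{b}\Big)x_0
           && \text{since } x_1 = (x_t - a\,x_0)/b, \\
v^{\ast}(x,t) &= \frac{b'}{b}\,x + \Big(\frac{a'}{a} - \frac{b'}{b}\Big)\big(x + b^{2} s_t(x)\big)
           && \text{conditional expectation given } x_t = x, \\
          &= \frac{a'}{a}\,x + \Big(\frac{a'}{a} - \frac{b'}{b}\Big) b^{2}\, s_t(x) \\
          &= f(x,t) - \tfrac{1}{2}\,g(t)^{2}\, s_t(x)
           && \text{since } -\tfrac{1}{2}g^{2} = \frac{a'}{a}\,b^{2} - b\,b'.
\end{aligned}
```

Under these assumptions the flow-matching minimizer (24) and the probability-flow velocity coincide; with a different coupling or a non-Gaussian $\mu_{1}$ the last step no longer applies.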
### 7.4 Gaussian probability paths and tractable supervision
A major practical advantage of flow matching is that many useful path families admit simple conditional sampling formulas and explicit endpoint-conditioned velocities. In particular, Gaussian probability paths provide tractable supervision analogous to the Gaussian perturbation kernels used in diffusion training (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). This makes the method scalable: the training loop only needs to sample endpoint pairs, generate an intermediate state $x_{t}$, and evaluate the explicit conditional target $\dot{x}_{t}$.
At the same time, the dependence on the coupling $\pi$ should not be overlooked. Two different couplings with the same endpoint laws can induce different conditional targets and therefore different learned velocity fields. Thus, unlike diffusion, where the forward SDE largely fixes the path, flow matching makes the coupling and interpolation rule explicit modeling choices.
### 7.5 Beyond Gaussian paths and Euclidean state spaces
Recent work shows that the flow-matching principle extends well beyond finite-dimensional Euclidean state spaces. On the discrete side, discrete flow matching, categorical flow matching on statistical manifolds, and Fisher flow matching all adapt the same basic velocity-learning viewpoint to noncontinuous state spaces (Gat et al., [2024](https://arxiv.org/html/2605.06829#bib.bib77); Cheng et al., [2024](https://arxiv.org/html/2605.06829#bib.bib64); Davis et al., [2024](https://arxiv.org/html/2605.06829#bib.bib78)). On structured domains such as graphs, variational flow matching shows that path-based transport ideas can be combined with latent-variable or graph-specific inductive structures (Eijkelboom et al., [2024](https://arxiv.org/html/2605.06829#bib.bib79)). On infinite-dimensional or nonstandard spaces, functional flow matching and Wasserstein flow matching illustrate that the learned object can be interpreted as a transport field over functions or even over families of probability distributions (Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66)).
These extensions are conceptually important because they show that flow matching is not merely a particular recipe for images or continuous vectors, but a general strategy for learning transport dynamics in appropriately structured state spaces\.
### 7.6 Rectified flows and path straightening
Rectified flow proposes to learn transport dynamics whose trajectories are “as straight as possible,” with the goal of enabling accurate generation using very few integration steps (Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)). From the path-design perspective, rectified flow emphasizes that *choosing or learning a good path* can be as important as the choice between score and velocity parameterizations: straighter paths tend to reduce stiffness and discretization error for coarse solvers, which is crucial for fast sampling.
This viewpoint is conceptually important for the broader comparison developed in the paper\. If diffusion highlights the role of score estimation under stochastic noising and probability\-flow ODEs highlight marginally equivalent deterministic transport, rectified flow highlights a third axis: the geometry of the path itself\.
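The link between straightness and coarse-solver accuracy can be seen in a toy comparison. The sketch below integrates two hand-picked scalar velocity fields with explicit Euler: one with constant velocity along each trajectory (a straight path) and one whose velocity depends on the state (a curved path). Both fields, the constants, and the integration direction from $t=0$ to $t=1$ are illustrative assumptions and do not represent rectified flow training itself.

```python
import math

def euler(velocity, x_start, n_steps):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1."""
    h = 1.0 / n_steps
    x, t = x_start, 0.0
    for _ in range(n_steps):
        x += h * velocity(x, t)
        t += h
    return x

x_start = 1.0
shift = 2.0  # hypothetical displacement between paired endpoints

def straight(x, t):
    """Constant velocity: the exact trajectory is x(t) = x_start + shift * t."""
    return shift

def curved(x, t):
    """State-dependent velocity: the exact trajectory is x(t) = x_start * exp(-3 t)."""
    return -3.0 * x

errors = {}
for n_steps in (1, 2, 8):
    err_straight = abs(euler(straight, x_start, n_steps) - (x_start + shift))
    err_curved = abs(euler(curved, x_start, n_steps) - x_start * math.exp(-3.0))
    errors[n_steps] = (err_straight, err_curved)
```

For the straight field a single Euler step already lands on the exact endpoint, whereas the curved field needs many more steps to reach comparable accuracy. This is the sense in which path geometry, and not only field accuracy, governs few-step sampling.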
### 7.7 Practice note: what changes when you change the path
Because flow matching targets the velocity under the path distribution induced by $(\pi,\text{path sampler})$, changing the path changes both the regression target and the distribution over $x_{t}$ on which the model is trained. This is analogous to changing the noise schedule in diffusion, but more general: the coupling $\pi$ can encode correlations between $x_{0}$ and $x_{1}$, and different interpolations can emphasize different geometric aspects of the data manifold.
For downstream tasks such as conditional generation or inverse problems, this path dependence can interact strongly with conditioning mechanisms, since conditioning effectively perturbs the learned velocity or score field along the path\. One of the broader lessons of flow matching is therefore that path choice is not merely a technicality: it is a central modeling decision that shapes both training and sampling behavior\.
## 8 Unified Comparison
We now summarize diffusion/score models and flow matching under a common probability-transport lens. The comparisons in this section are intentionally *structural*: they focus on what object is learned (score versus velocity), what path is assumed, what dynamics are solved at sampling time (SDE versus ODE), and where the dominant sources of error arise. These axes are the ones that most directly influence both practice (compute, stability, controllability) and theory (approximation, discretization, and identifiability).
The discussion is anchored by the central formal statements from the preceding sections: Proposition [1](https://arxiv.org/html/2605.06829#Thmtheorem1) (reverse-time SDE), Proposition [2](https://arxiv.org/html/2605.06829#Thmtheorem2) (objective equivalence), Proposition [3](https://arxiv.org/html/2605.06829#Thmtheorem3) (probability-flow ODE marginal equivalence), and Proposition [4](https://arxiv.org/html/2605.06829#Thmtheorem4) (optimal flow-matching velocity).
### 8.1 At-a-glance taxonomy
Table [2](https://arxiv.org/html/2605.06829#S8.T2) summarizes the main method families through the transport lens developed in this survey: what field is learned, how the path of intermediate distributions is defined, what sampler is used, and where the main numerical or modeling tradeoffs arise.

Table 2: Master comparison of diffusion, score-based, probability-flow, and flow-matching methods under the unified transport viewpoint. The most important distinctions are: (i) what field is learned, (ii) how the probability path is specified, and (iii) whether sampling is performed with stochastic or deterministic dynamics.

Viewed this way, diffusion and flow matching are less usefully distinguished by broad labels than by a small set of structural decisions: path choice, field parameterization, sampler type, and numerical regime. Recent variants such as cold diffusion, stochastic interpolants, Bayesian Flow Networks, consistency models, and discrete diffusion alternatives suggest that these structural axes extend beyond the original DDPM versus score-SDE versus flow-matching taxonomy, rather than replacing it (Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44); Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38); Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Kim et al., [2024a](https://arxiv.org/html/2605.06829#bib.bib45); Lou et al., [2024](https://arxiv.org/html/2605.06829#bib.bib46)).
### 8.2 Design axes and tradeoffs
The methods above can be understood as different choices along a small set of design axes\.
#### \(1\) What is learned: score versus velocity\.
Diffusion and score-SDE methods learn a score field $s_{\theta}(x,t)$ and define a sampler via reverse-time dynamics (SDE) or by inducing a velocity through the probability-flow ODE (Section [6](https://arxiv.org/html/2605.06829#S6)). Flow matching and rectified flow learn a velocity field $v_{\theta}(x,t)$ directly along a chosen path (Section [7](https://arxiv.org/html/2605.06829#S7)). Score parameterizations are natural for time reversal of diffusions, whereas velocity parameterizations are natural for deterministic transport and can offer more direct numerical control. More recent path-based frameworks such as stochastic interpolants and Bayesian Flow Networks further reinforce this distinction by making the field parameterization itself an explicit part of the modeling design space (Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44); Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38); Xue et al., [2024](https://arxiv.org/html/2605.06829#bib.bib51)).
#### \(2\) What path is assumed\.
Diffusion models tie the path $(\rho_{t})$ to the marginals of a forward noising process, whereas flow matching makes the path an explicit modeling choice through the coupling/interpolation mechanism. Path choice influences what regions of space and time the model must fit accurately and interacts with time-weighting in the loss (Section [5](https://arxiv.org/html/2605.06829#S5)). Path design is therefore a primary knob for trading off sample quality, stability, and speed (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)). The same point is illustrated by recent alternatives that change the path itself rather than merely the sampler, including cold diffusion and stochastic interpolants (Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44)). Related physics-inspired variants such as PFGM++ reinforce the broader point that path design can be treated as a modeling degree of freedom rather than a fixed consequence of Gaussian noising (Xu et al., [2023](https://arxiv.org/html/2605.06829#bib.bib61)).
#### \(3\) What dynamics are solved at sampling time: SDE versus ODE\.
Reverse-time SDE sampling injects noise at each step, which can aid exploration and sometimes improve perceptual quality, but it also introduces variance and additional discretization considerations. ODE sampling is deterministic given the initial noise draw and can use adaptive solvers, but may be sensitive to stiffness and solver tolerances. Both rely on accurate learned fields and can fail under distribution shift or poor time-weighting (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)). Newer fast-generation methods such as consistency models and consistency trajectory models can be interpreted as attempts to make this axis less costly by learning transports that remain accurate under very coarse discretization (Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Kim et al., [2024a](https://arxiv.org/html/2605.06829#bib.bib45)).
#### \(4\) Where error comes from\.
Across methods, generation error can be decomposed into: (i) *approximation error* in representing the true field (score or velocity), (ii) *estimation error* from finite data and optimization, (iii) *numerical error* from discretizing SDEs/ODEs, and (iv) *path mismatch* when training and sampling use inconsistent assumptions (e.g., different schedules or couplings). We develop these decompositions more systematically in Section 9. Recent generator-matching and unification perspectives suggest that these error sources may be analyzable within a single broad framework encompassing diffusion, flow matching, and several of their newer variants (Patel et al., [2024](https://arxiv.org/html/2605.06829#bib.bib52); Xue et al., [2024](https://arxiv.org/html/2605.06829#bib.bib51)).
### 8.3 Practical guidance (qualitative)
The comparison above does not imply a universal ranking, but it does suggest a few qualitative rules of thumb:
- If you need a stochastic sampler (diversity, exploration, certain conditioning schemes), reverse-time SDE sampling provides a principled route tied to time reversal (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
- If you want deterministic sampling and a direct connection to CNF-style likelihood tools, probability-flow ODE sampling provides a natural bridge (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6)).
- If you want to treat the probability path as a design choice, and potentially reduce stiffness for fast sampling, flow matching and rectified flow offer a direct velocity-learning approach (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15); Lipman et al., [2024](https://arxiv.org/html/2605.06829#bib.bib47)).
- If your application departs substantially from the standard Gaussian-noising setting (for example, through non-Gaussian degradations, discrete state spaces, or explicitly iterative uncertainty updates), then newer alternatives such as cold diffusion, discrete diffusion, stochastic interpolants, and Bayesian Flow Networks may be better viewed as neighboring points in the same transport design space rather than as entirely separate paradigms (Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Lou et al., [2024](https://arxiv.org/html/2605.06829#bib.bib46); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44); Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38)).
These statements are deliberately high\-level, and later sections and appendices detail when they hold and where they can fail\.
## 9 Theory and Error Decomposition
A unified view of diffusion/score models and flow matching as *learned transport* suggests a common set of theoretical questions. What statistical object does the training objective estimate, and under what weighting? How do approximation and estimation errors in the learned field propagate through reverse-time dynamics? How does numerical discretization bias interact with model error? This section organizes these questions into an error decomposition that is useful for both analysis and practice. We emphasize qualitative structure rather than exhaustive formal results, while pointing to representative recent technical results where appropriate (Oko et al., [2023](https://arxiv.org/html/2605.06829#bib.bib67); Chen et al., [2023a](https://arxiv.org/html/2605.06829#bib.bib68); Zhang et al., [2024](https://arxiv.org/html/2605.06829#bib.bib70)).
### 9.1 A generic “field + solver” abstraction
Both diffusion/score models and flow matching ultimately generate samples by integrating dynamics driven by a learned field:
- SDE sampler: $dX_t = b_\theta(X_t,t)\,dt + \sigma(t)\,dW_t$, where $b_\theta$ depends on a learned score (Section [4](https://arxiv.org/html/2605.06829#S4)).
- ODE sampler: $\frac{dX_t}{dt} = v_\theta(X_t,t)$, where $v_\theta$ is either induced by a score (probability-flow ODE, Section [6](https://arxiv.org/html/2605.06829#S6)) or learned directly (flow matching, Section [7](https://arxiv.org/html/2605.06829#S7)).
This abstraction separates the overall generative-modeling problem into two coupled components: a *statistical problem*, namely learning an accurate field, and a *numerical problem*, namely integrating the resulting dynamics accurately.
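The separation into a field and a solver can be sketched in a few lines. The following is a minimal illustration, not an implementation from any cited work: the function names `ode_sample` and `sde_sample`, the toy linear field, and the uniform time grid are all assumptions made for the example. Any callable can stand in for the learned field, and the solver never inspects how that field was trained.

```python
import numpy as np

def ode_sample(v, x0, ts):
    """Explicit Euler for dX/dt = v(X, t) along the time grid ts."""
    x = np.array(x0, dtype=float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * v(x, t0)
    return x

def sde_sample(b, sigma, x0, ts, rng):
    """Euler-Maruyama for dX = b(X, t) dt + sigma(t) dW along the time grid ts.

    The grid may be increasing or decreasing; the noise scale uses |dt|.
    """
    x = np.array(x0, dtype=float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        noise = rng.standard_normal(x.shape)
        x = x + h * b(x, t0) + sigma(t0) * np.sqrt(abs(h)) * noise
    return x

# Hypothetical stand-in for a learned field: linear contraction toward the origin.
field = lambda x, t: -x
ts = np.linspace(0.0, 1.0, 101)

x_ode = ode_sample(field, np.ones(3), ts)
# With sigma = 0 the SDE sampler reduces to the ODE sampler.
x_sde = sde_sample(field, lambda t: 0.0, np.ones(3), ts, np.random.default_rng(0))
```

In this sketch the statistical problem lives entirely in `field`, and the numerical problem lives entirely in the choice of `ts` and the update rule, which mirrors the two-component view described above.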
### 9.2 Error decomposition: approximation, estimation, and numerics
Let $\mu_\theta$ denote the model’s implicit sample law obtained by integrating the learned dynamics, exactly or numerically, and let $\mu_0$ denote the target data law. A useful conceptual decomposition of the gap between $\mu_\theta$ and $\mu_0$ is:
1. Approximation error. Even with infinite data and perfect optimization, the function class for $s_\theta$ or $v_\theta$ may not contain the true field $s_t$ or $v_t$ along the path. This includes limitations due to network architecture, conditioning, and regularity assumptions.
2. Estimation/optimization error. With finite data and stochastic optimization, the learned field deviates from the population minimizer of the training objective (Section [5](https://arxiv.org/html/2605.06829#S5)). Time weighting can amplify this error in poorly represented SNR regimes.
3. Discretization (numerical) error. Sampling requires discretizing SDE or ODE dynamics. Even with an exact field, finite step sizes introduce bias; with an approximate field, solver error can interact nonlinearly with model error (Sections [4](https://arxiv.org/html/2605.06829#S4) and [6](https://arxiv.org/html/2605.06829#S6)).
4. Path/objective mismatch. Differences between the path implied by training (noise schedule, coupling, interpolation) and the dynamics used at sampling time (solver choice, step-size schedule, stochastic versus deterministic sampling) can introduce additional error (Sections [3](https://arxiv.org/html/2605.06829#S3) and [8](https://arxiv.org/html/2605.06829#S8)).
This decomposition should be understood as conceptual rather than strictly additive: in practice, these errors interact. Nevertheless, it is useful because it isolates the main mechanisms by which generative quality degrades in finite-compute regimes. Recent statistical analyses of diffusion models have begun to make parts of this picture precise, for example by establishing minimax or near-minimax guarantees under suitable assumptions (Oko et al., [2023](https://arxiv.org/html/2605.06829#bib.bib67); Zhang et al., [2024](https://arxiv.org/html/2605.06829#bib.bib70)).
### 9.3 Propagation of field error through dynamics
A core theoretical challenge is that small local errors in the learned field can accumulate along trajectories. For ODE sampling, standard stability theory suggests bounds in terms of Lipschitz constants or related conditioning quantities: if $v_\theta$ is close to the true velocity in a suitable norm and the dynamics are well-conditioned, then trajectory error can be controlled. For SDE sampling, one may instead study weak and strong convergence of discretizations together with perturbation bounds for diffusion processes. In either case, the relevant constants can be large in high dimensions or near endpoint regions where the field changes rapidly, which aligns with empirical observations of stiffness and sensitivity to schedules (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16)).
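As a worked instance of the ODE stability statement, consider the following standard Grönwall-type argument. It is a sketch under assumptions that are ours, not the survey's: the true velocity $v$ is $L$-Lipschitz in $x$ uniformly in $t$, the learned field satisfies $\sup_{x,t}\|v_\theta(x,t)-v(x,t)\|\le\varepsilon$, and both trajectories $x_\theta(t)$ and $x(t)$ start from the same initial point. Here $t$ denotes elapsed integration time.

```latex
\begin{aligned}
\frac{d}{dt}\big(x_\theta(t)-x(t)\big)
  &= \big[v_\theta(x_\theta,t)-v(x_\theta,t)\big]
   + \big[v(x_\theta,t)-v(x,t)\big],\\
\frac{d}{dt}\,\big\|x_\theta(t)-x(t)\big\|
  &\le \varepsilon + L\,\big\|x_\theta(t)-x(t)\big\|
  \quad\text{(for almost every } t\text{)},\\
\big\|x_\theta(t)-x(t)\big\|
  &\le \frac{\varepsilon}{L}\,\big(e^{Lt}-1\big).
\end{aligned}
```

The exponential factor $e^{Lt}$ shows why a large Lipschitz constant, for example near endpoint regions where the field changes rapidly, can amplify a small pointwise field error into a large trajectory error.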
Recent theory has begun to sharpen this picture for score-based models. For example, score approximation, estimation, and distribution recovery have been analyzed on low-dimensional data and manifold-like settings (Chen et al., [2023a](https://arxiv.org/html/2605.06829#bib.bib68); Tang and Yang, [2024](https://arxiv.org/html/2605.06829#bib.bib69)), while related work suggests that diffusion models encode nontrivial geometric information about the intrinsic dimension of the underlying data manifold (Stanczuk et al., [2024](https://arxiv.org/html/2605.06829#bib.bib71)). These results remain far from a complete practical theory, but they indicate that analyses of field-error propagation can yield meaningful geometric and statistical guarantees.
### 9.4 Discretization bias and solver choice
Sampling error is often dominated by discretization. Euler–Maruyama introduces $O(\Delta t)$ weak error for SDEs under regularity assumptions, while higher-order methods can reduce this at increased computational cost. For ODEs, higher-order Runge–Kutta solvers or adaptive solvers can reduce local truncation error, but may still struggle with stiffness (Section [6.2](https://arxiv.org/html/2605.06829#S6.SS2)). In practice, the cost–quality tradeoff is governed by (i) how rapidly the learned field varies across time and space and (ii) how strongly error concentrates near endpoint regions. This is one reason few-step generation is difficult: it amounts to controlling numerical error under deliberately coarse discretization. These considerations motivate work on better schedules and better path design, as in EDM and rectified flow (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)).
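The effect of solver order on the ODE side can be checked on a toy problem. The sketch below is only an illustration under assumptions of our own: the field $v(x,t)=-x$ is a smooth, non-stiff stand-in for a learned velocity, and the helper name `integrate` is hypothetical. It compares explicit Euler (first order) with Heun's method (a second-order Runge–Kutta scheme) as the step count doubles.

```python
import numpy as np

def integrate(v, x0, n_steps, method):
    """Integrate dx/dt = v(x, t) on [0, 1] with a fixed step size."""
    h = 1.0 / n_steps
    x, t = float(x0), 0.0
    for _ in range(n_steps):
        k1 = v(x, t)
        if method == "euler":
            x = x + h * k1
        else:  # Heun: average the slope at both ends of the step
            k2 = v(x + h * k1, t + h)
            x = x + 0.5 * h * (k1 + k2)
        t += h
    return x

v = lambda x, t: -x        # toy field with exact solution x(1) = exp(-1)
exact = np.exp(-1.0)
step_counts = (20, 40, 80)
errors = {
    method: [abs(integrate(v, 1.0, n, method) - exact) for n in step_counts]
    for method in ("euler", "heun")
}
# Halving the step size roughly halves the Euler error and quarters the Heun error.
ratios = {m: [e[i] / e[i + 1] for i in range(2)] for m, e in errors.items()}
```

On this benign example the higher-order method wins clearly per step. The text's caveat still applies: Heun costs two field evaluations per step, and on stiff or rapidly varying fields the asymptotic rates may not be reached at few-step budgets.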
More recent one-step and few-step models make this issue especially explicit. Consistency models, one-step score distillation, and EM-style distillation can all be viewed as attempts to reshape the learning problem so that numerically coarse samplers remain accurate (Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Luo et al., [2024](https://arxiv.org/html/2605.06829#bib.bib72); Xie et al., [2024](https://arxiv.org/html/2605.06829#bib.bib73)). This suggests that fast sampling is not merely an implementation concern, but a core theoretical question about how field learning and discretization interact.
### 9.5 Identifiability and what the objective recovers
An often underemphasized point is that training objectives recover a field *under the training distribution*. For diffusion and score-based models, DSM targets the score of the perturbed marginals $\rho_t$, not the score of $\rho_0$ directly, and success depends on accurately learning $s_t$ over the time region that most influences reverse-time sampling (Section [5](https://arxiv.org/html/2605.06829#S5)). For flow matching, the learned velocity is tied to the chosen coupling and path sampler (Section [7](https://arxiv.org/html/2605.06829#S7)), and changing the coupling can change the learned velocity even when the endpoint laws are fixed. Thus identifiability is path-dependent: there is typically no unique “best” field without first fixing the path measure.
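A two-point toy example makes the coupling dependence concrete. This is an illustrative sketch with assumptions of our own: both endpoint laws are uniform on $\{-1,+1\}$, the path is the linear interpolation $x_t=(1-t)x_0+t\,x_1$, and the helper `conditional_velocity` is hypothetical. Only the pairing of $x_0$ with $x_1$ changes between the two cases.

```python
import numpy as np

# Both endpoint laws are uniform on {-1, +1}; only the coupling differs.
x0 = np.array([-1.0, 1.0])
couplings = {
    "identity": x0.copy(),  # each point is paired with itself
    "swap": -x0,            # each point is paired with its mirror image
}

def conditional_velocity(x0, x1):
    """Time derivative of the linear path x_t = (1 - t) * x0 + t * x1."""
    return x1 - x0

velocities = {
    name: conditional_velocity(x0, x1) for name, x1 in couplings.items()
}
# identity -> [0., 0.]   (nothing moves)
# swap     -> [2., -2.]  (the two points exchange places)
```

Away from $t=1/2$, where the two swapped paths cross, each location is visited by a single path, so the regression target of flow matching coincides with these conditional velocities. The endpoint laws are identical in both cases, yet the fields differ, which is the path dependence described above.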
This observation becomes even more important in more recent variants. For example, one-step distillation, Gaussian-mixture flow matching, and manifold-aware diffusion analyses all reinforce the idea that identifiability depends not only on the endpoint law, but also on the chosen state representation, path geometry, and objective weighting (Chen et al., [2025](https://arxiv.org/html/2605.06829#bib.bib74); Stanczuk et al., [2024](https://arxiv.org/html/2605.06829#bib.bib71)).
### 9.6 What theory is still missing (and why it is hard)
Despite rapid empirical progress, several fundamental questions remain open or only partially addressed:
- High-dimensional guarantees. How do approximation and sampling complexity scale with dimension for realistic, low-dimensional data manifolds embedded in high-dimensional spaces (Oko et al., [2023](https://arxiv.org/html/2605.06829#bib.bib67); Zhang et al., [2024](https://arxiv.org/html/2605.06829#bib.bib70))?
- Generalization of learned fields. When does a learned score or velocity generalize off the training support, and how does this affect sampling stability?
- Coupled model–solver analysis. Most existing analyses isolate either learning error (assuming exact sampling) or numerical error (assuming exact fields). Practical systems require joint bounds.
- Principled path design. Which path choices optimize stability and sample quality under compute constraints? Rectified flow, Gaussian-mixture flow matching, and diffusion-design work all suggest the importance of this question, but a unifying theory is still developing (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15); Chen et al., [2025](https://arxiv.org/html/2605.06829#bib.bib74)).
- Geometry-aware theory. To what extent should diffusion and flow-matching theory be stated relative to ambient Euclidean dimension, and to what extent can it adapt to intrinsic manifold structure (Tang and Yang, [2024](https://arxiv.org/html/2605.06829#bib.bib69); Stanczuk et al., [2024](https://arxiv.org/html/2605.06829#bib.bib71))?
We view these gaps as opportunities rather than merely limitations\. Once diffusion, score\-based models, and flow matching are all understood as instances of learned transport, they can be analyzed using a common language of fields, paths, and solvers\. In that sense, theoretical progress on any one of these paradigms is likely to inform the others as well\.
## 10 Conclusion: Open Problems and Research Directions
We close by summarizing research directions suggested by the unified transport viewpoint developed throughout this survey. These problems are intentionally phrased in a method-agnostic way: they apply across diffusion and score-based models, probability-flow ODE samplers, flow matching, rectified flows, and a broader family of discrete-state, graph-based, and non-Euclidean generative transports (Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59); Chen et al., [2023b](https://arxiv.org/html/2605.06829#bib.bib62); Gat et al., [2024](https://arxiv.org/html/2605.06829#bib.bib77); Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66)). They also interact strongly with the question of *conditioning*, since constrained generation, inverse problems, editing, and restoration all modify transport dynamics in ways that are sensitive to both model geometry and numerical implementation (Meng et al., [2022](https://arxiv.org/html/2605.06829#bib.bib95); Lugmayr et al., [2022](https://arxiv.org/html/2605.06829#bib.bib96); Tewari et al., [2023](https://arxiv.org/html/2605.06829#bib.bib97)).
### 10.1 Principled path and schedule design
Path choice, whether implemented as a diffusion noise schedule or as an explicit coupling/interpolation in flow matching, appears to be a first-order determinant of numerical stability and sampling efficiency. A major open problem is to develop *principled* criteria for path design under compute constraints. Examples include choosing paths that reduce stiffness, concentrate modeling capacity where it matters most for generation, or yield robust conditioning behavior for inverse problems. Recent empirical and conceptual proposals, including EDM, rectified flow, critically damped Langevin diffusion, cold diffusion, Poisson-flow-style transports, and alternative discrete-state noising schemes, highlight the importance of this degree of freedom, but a comprehensive theory remains incomplete (Karras et al., [2022](https://arxiv.org/html/2605.06829#bib.bib16); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15); Dockhorn et al., [2022](https://arxiv.org/html/2605.06829#bib.bib57); Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Xu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib60); Austin et al., [2021](https://arxiv.org/html/2605.06829#bib.bib75); Okhotin et al., [2023](https://arxiv.org/html/2605.06829#bib.bib76)).
### 10.2 Coupled learning–sampling analysis
Practical generative modeling combines an estimated field (score or velocity) with an approximate numerical solver. Developing bounds that jointly capture *learning error* and *solver error*, together with their interaction, is essential for explaining when few-step sampling succeeds and when it fails. This issue becomes even more pressing as newer methods explicitly target coarse discretization, direct trajectory learning, or alternative forward processes. A broader theoretical challenge is therefore to understand diffusion, flow matching, autoregressive diffusion, and related generator families within a common field-plus-solver framework, rather than analyzing each architecture in isolation (Hoogeboom et al., [2022](https://arxiv.org/html/2605.06829#bib.bib58); Dockhorn et al., [2022](https://arxiv.org/html/2605.06829#bib.bib57); Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15); Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)).
### 10.3 Generalization and robustness of learned fields
Because training objectives estimate fields under a particular path distribution (Section [9.5](https://arxiv.org/html/2605.06829#S9.SS5)), it remains unclear when learned scores or velocities generalize to off-support regions encountered during sampling, guidance, or conditioning. Understanding the geometry of learned vector fields, their regularity, and their behavior under perturbations could yield both better theoretical guarantees and practical diagnostics for robust generation. This issue is likely to become even sharper as the state spaces under consideration broaden, for example to discrete domains, graph spaces, function spaces, or spaces of probability distributions (Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59); Xu et al., [2024](https://arxiv.org/html/2605.06829#bib.bib81); Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66)).
### 10.4 Fast sampling and distillation beyond heuristics
Fast generation remains a central practical objective. Current progress relies heavily on improved discretizations, solver choices, path design, and distillation-like procedures. A promising direction is to formalize fast sampling itself as a constrained transport problem: given a fixed compute budget (for example, a limit on the number of function evaluations), choose a path and field parameterization that minimizes distributional error. Recent one-step diffusion methods, score-implicit distillation, and EM-style distillation all suggest that this problem can be attacked at the level of objective design rather than only at the level of numerical implementation (Song et al., [2023](https://arxiv.org/html/2605.06829#bib.bib39); Luo et al., [2024](https://arxiv.org/html/2605.06829#bib.bib72); Xie et al., [2024](https://arxiv.org/html/2605.06829#bib.bib73)). A unified theory of this tradeoff is still missing.
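One schematic way to write down the budgeted problem described above is the following. Every symbol here is an assumption introduced for illustration and does not appear in the survey: $\pi$ denotes the training path (schedule or coupling), $\theta$ the field parameters, $t_0<\dots<t_N$ the solver grid, $\mu_\theta^{\pi,N}$ the law of the samples produced by an $N$-step solver, $D$ an unspecified divergence or distance between laws, and $B$ the budget on function evaluations.

```latex
\min_{\pi,\;\theta,\;0=t_0<\dots<t_N=1}\;
  D\big(\mu_\theta^{\pi,N},\,\mu_0\big)
\quad\text{subject to}\quad
  \mathrm{NFE}(N)\le B .
```

Written this way, schedule design, path design, and distillation appear as different ways of searching over $(\pi,\theta,t_{0:N})$ under the same constraint, which is the sense in which fast sampling becomes an objective-design question.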
### 10.5 Conditioning, inverse problems, and constrained transport
For inverse problems and conditional generation, conditioning can be interpreted as modifying the transport dynamics by incorporating a likelihood or constraint term along the path (Section [1.8](https://arxiv.org/html/2605.06829#S1.SS8)). The transport lens therefore suggests viewing these methods as instances of *constrained* or *controlled* transport. This raises natural questions about stability, bias, and optimality of conditioning schemes under different sampler choices (SDE versus ODE) and path designs. Schrödinger bridge formulations provide one principled route by optimizing over stochastic path measures under endpoint constraints (De Bortoli et al., [2021](https://arxiv.org/html/2605.06829#bib.bib8)). More recent work on guided image editing, inpainting, restoration, forward-model-based inverse problems, and diffusion posterior sampling suggests that this area is likely to become an important point of convergence between diffusion, bridge, and flow-matching approaches (Meng et al., [2022](https://arxiv.org/html/2605.06829#bib.bib95); Lugmayr et al., [2022](https://arxiv.org/html/2605.06829#bib.bib96); Tewari et al., [2023](https://arxiv.org/html/2605.06829#bib.bib97); Rout et al., [2023](https://arxiv.org/html/2605.06829#bib.bib98); Zhang et al., [2023](https://arxiv.org/html/2605.06829#bib.bib99); Chung et al., [2023](https://arxiv.org/html/2605.06829#bib.bib34)).
A particularly promising direction is to understand inverse problems through the joint lens of posterior sampling, latent diffusion priors, expectation-maximization, and diffusion optimal control (Rout et al., [2023](https://arxiv.org/html/2605.06829#bib.bib98); Rozet et al., [2024](https://arxiv.org/html/2605.06829#bib.bib101); Li and Pereira, [2024](https://arxiv.org/html/2605.06829#bib.bib103)). Related work also suggests that the computational bottleneck in inverse problems is often not only learning the prior, but performing posterior transport efficiently under strong data constraints (Janati et al., [2024](https://arxiv.org/html/2605.06829#bib.bib102); Pandey et al., [2024](https://arxiv.org/html/2605.06829#bib.bib104)). This points toward a richer theory of conditioned transport in which path design, solver choice, and constraint handling are treated jointly rather than separately.
### 10.6 Beyond Gaussian, Euclidean, and continuous-state formulations
A major emerging direction is to understand how far the transport viewpoint can be pushed beyond the standard setting of continuous Gaussian noising in Euclidean data spaces. Recent work has explored autoregressive diffusion (Hoogeboom et al., [2022](https://arxiv.org/html/2605.06829#bib.bib58)), continuous-time denoising models for discrete data (Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59)), structured discrete-state diffusion (Austin et al., [2021](https://arxiv.org/html/2605.06829#bib.bib75)), discrete diffusion via continuous embeddings and masking mechanisms (Chen et al., [2023b](https://arxiv.org/html/2605.06829#bib.bib62); Sahoo et al., [2024](https://arxiv.org/html/2605.06829#bib.bib80)), graph-oriented discrete-state transports (Xu et al., [2024](https://arxiv.org/html/2605.06829#bib.bib81); Eijkelboom et al., [2024](https://arxiv.org/html/2605.06829#bib.bib79)), and flow matching in discrete, functional, and Wasserstein spaces (Gat et al., [2024](https://arxiv.org/html/2605.06829#bib.bib77); Cheng et al., [2024](https://arxiv.org/html/2605.06829#bib.bib64); Davis et al., [2024](https://arxiv.org/html/2605.06829#bib.bib78); Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66)). These developments suggest that the basic design axes emphasized in this survey (path, learned field, and sampler) survive far beyond the original DDPM setting. A major open problem is to identify which parts of the existing theory and practice genuinely generalize and which depend crucially on Gaussian or Euclidean structure.
### 10.7 Evaluation: what should we measure?
Finally, evaluation remains a persistent challenge. In practice, generative models are often judged by metrics such as the Fréchet Inception Distance (FID) (Heusel et al., [2017](https://arxiv.org/html/2605.06829#bib.bib82)), precision–recall style metrics (Sajjadi et al., [2018](https://arxiv.org/html/2605.06829#bib.bib83); Kynkäänniemi et al., [2019](https://arxiv.org/html/2605.06829#bib.bib84); Simon et al., [2019](https://arxiv.org/html/2605.06829#bib.bib85); Cheema and Urner, [2023](https://arxiv.org/html/2605.06829#bib.bib88); Liang et al., [2024](https://arxiv.org/html/2605.06829#bib.bib92)), and related fidelity/diversity decompositions (Naeem et al., [2020](https://arxiv.org/html/2605.06829#bib.bib86)). While these metrics are useful, they often conflate distinct notions of quality, including sample fidelity, coverage, perceptual realism, and support overlap. More recent work has therefore proposed sample-level, likelihood-like, or attribute-based alternatives that attempt to evaluate generalization, interpretability, or faithfulness more directly (Alaa et al., [2022](https://arxiv.org/html/2605.06829#bib.bib87); Jiralerspong et al., [2023](https://arxiv.org/html/2605.06829#bib.bib91); Kim et al., [2024b](https://arxiv.org/html/2605.06829#bib.bib93)).
From the perspective of this survey, the central problem is that standard endpoint metrics may be insensitive to failure modes that matter for downstream tasks such as inverse problems, controlled generation, or scientific applications. In addition, several recent papers argue that widely used metrics can behave asymmetrically or unfairly, especially in high-dimensional settings or when comparing different classes of samplers such as diffusion and flow-based models (Khayatkhoei and Abdalmageed, [2023](https://arxiv.org/html/2605.06829#bib.bib89); Stein et al., [2023](https://arxiv.org/html/2605.06829#bib.bib90); Räisä et al., [2025](https://arxiv.org/html/2605.06829#bib.bib94)). A transport-based perspective therefore suggests evaluation criteria that reflect not only endpoint distributional accuracy, but also the stability, regularity, controllability, and solver sensitivity of the learned dynamics. This challenge becomes even sharper as the design space broadens to include autoregressive diffusion, non-Gaussian degradations, discrete-state diffusion, non-Euclidean transport formulations, and conditioning-intensive inverse-problem solvers (Hoogeboom et al., [2022](https://arxiv.org/html/2605.06829#bib.bib58); Bansal et al., [2022](https://arxiv.org/html/2605.06829#bib.bib36); Campbell et al., [2022](https://arxiv.org/html/2605.06829#bib.bib59); Chen et al., [2023b](https://arxiv.org/html/2605.06829#bib.bib62); Sahoo et al., [2024](https://arxiv.org/html/2605.06829#bib.bib80); Kerrigan et al., [2024](https://arxiv.org/html/2605.06829#bib.bib65); Haviv et al., [2025](https://arxiv.org/html/2605.06829#bib.bib66); Pandey et al., [2024](https://arxiv.org/html/2605.06829#bib.bib104)).
#### Takeaway\.
Diffusion and score\-based models, probability\-flow ODEs, and flow matching can all be understood as complementary parameterizations of probability transport\. The open problems above—especially those concerning path design, coupled learning–sampling theory, robust conditioning, and extensions beyond Gaussian Euclidean settings—are therefore not isolated technical questions, but shared challenges across multiple generative\-modeling paradigms\. Progress on these questions is likely to clarify which aspects of performance are intrinsic to the learned field and which are artifacts of the chosen path and numerical solver\.
## Appendix A Reverse-Time Diffusions and the Score Term
This appendix provides background for Proposition [1](https://arxiv.org/html/2605.06829#Thmtheorem1). We state the reverse-time diffusion formula at a formal level and explain why the score $\nabla_x \log \rho_t(x)$ appears naturally in the reverse drift.
### A.1 Setup and assumptions
Consider the forward Itô diffusion
$$dX_t = f(X_t,t)\,dt + g(t)\,dW_t, \qquad t\in[0,1], \tag{27}$$
where $f:\mathbb{R}^d\times[0,1]\to\mathbb{R}^d$ is a drift field, $g:[0,1]\to\mathbb{R}_+$ is a scalar diffusion coefficient, and $(W_t)_{t\in[0,1]}$ is a standard Wiener process. Let $\mu_t$ denote the law of $X_t$, and assume that for each $t\in(0,1]$, $\mu_t$ admits a sufficiently smooth positive density $\rho_t$.
We do not attempt to state minimal assumptions here. Informally, one needs enough regularity to justify both the Fokker–Planck equation and the time-reversal argument, including existence of smooth transition densities and suitable decay or boundary behavior. Classical references include Anderson ([1982](https://arxiv.org/html/2605.06829#bib.bib7)); the formulation used in modern score-based modeling is presented by Song et al. ([2021](https://arxiv.org/html/2605.06829#bib.bib2)).
### A.2 Forward density evolution
When densities exist, the forward marginals satisfy the Fokker–Planck equation
$$\partial_t\rho_t(x) = -\nabla\cdot\big(\rho_t(x)\,f(x,t)\big) + \tfrac{1}{2}g(t)^2\,\Delta\rho_t(x). \tag{28}$$
It is useful to rewrite the diffusion term using
$$\Delta\rho_t = \nabla\cdot\big(\rho_t\nabla\log\rho_t\big),$$
which gives
$$\partial_t\rho_t = -\nabla\cdot\Big(\rho_t f - \tfrac{1}{2}g(t)^2\rho_t\nabla\log\rho_t\Big) - \tfrac{1}{2}g(t)^2\,\nabla\cdot\big(\rho_t\nabla\log\rho_t\big) + \tfrac{1}{2}g(t)^2\,\Delta\rho_t. \tag{29}$$
The last two terms cancel by the identity above, so (29) is simply (28) with the score-dependent flux written out explicitly. Formally, this makes visible the velocity-like correction involving the score.
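For completeness, the identity used in this rewriting follows in one line from the chain rule, assuming $\rho_t$ is strictly positive and twice differentiable so that $\log\rho_t$ is well defined:

```latex
\nabla\cdot\big(\rho_t\,\nabla\log\rho_t\big)
  = \nabla\cdot\Big(\rho_t\,\frac{\nabla\rho_t}{\rho_t}\Big)
  = \nabla\cdot\nabla\rho_t
  = \Delta\rho_t .
```

This is the only place where positivity of the density is needed in the formal argument.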
### A.3 Reverse-time dynamics
Let $\widetilde{X}_t := X_{1-t}$ denote the time-reversed process. Under suitable regularity assumptions, the reverse-time process is again a diffusion. Its drift is not simply the negative of the forward drift; instead, it contains an additional correction term involving the score of the forward marginals. Formally, the reverse-time SDE can be written as
$$dX_t = \Big(f(X_t,t) - g(t)^2\,\nabla_x\log\rho_t(X_t)\Big)\,dt + g(t)\,d\bar{W}_t, \tag{30}$$
interpreted in reverse time, where $\bar{W}_t$ is a reverse-time Wiener process.
Equation ([30](https://arxiv.org/html/2605.06829#A1.E30)) is the key fact underlying score-based generative modeling. It shows that reverse-time sampling can be defined once the score field
$$s_t(x) = \nabla_x\log\rho_t(x)$$
is known or accurately approximated.
### A.4 Why the score appears
A formal way to understand the appearance of the score is to compare the forward Fokker–Planck equation with the continuity equation that would govern reversed mass transport. The diffusion term in ([28](https://arxiv.org/html/2605.06829#A1.E28)) cannot be reversed simply by negating time; the stochastic spreading of mass must be compensated by an additional drift that points toward regions of higher probability. That compensating drift is precisely the score term.
Equivalently, one may think of the reverse drift as the forward drift corrected by the logarithmic gradient of the evolving density:
$$\text{reverse drift} = f(x,t) - g(t)^2\,\nabla_x\log\rho_t(x).$$
The factor $g(t)^2$ reflects the strength of the diffusion in the forward process. Larger forward noise requires a stronger score correction in reverse time.
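The reverse drift can be exercised on a case where the score is known in closed form. The sketch below is illustrative only and rests on assumptions of our own: a one-dimensional forward process $dX_t=-\tfrac12\beta X_t\,dt+\sqrt{\beta}\,dW_t$ with constant $\beta$, Gaussian data $X_0\sim\mathcal{N}(0,\texttt{var0})$ so that every marginal is Gaussian, and a plain Euler–Maruyama discretization run from $t=1$ to $t=0$. The names `marginal_var`, `score`, and `reverse_sde_sample` are hypothetical.

```python
import numpy as np

beta, var0 = 4.0, 0.25  # assumed constant noise rate and data variance

def marginal_var(t):
    """Variance of X_t under dX = -0.5*beta*X dt + sqrt(beta) dW, X_0 ~ N(0, var0)."""
    decay = np.exp(-beta * t)
    return var0 * decay + (1.0 - decay)

def score(x, t):
    """Exact score of the Gaussian marginal N(0, marginal_var(t))."""
    return -x / marginal_var(t)

def reverse_sde_sample(n_samples=20000, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    h = 1.0 / n_steps
    # Start from the exact terminal marginal rho_1.
    x = np.sqrt(marginal_var(1.0)) * rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = 1.0 - k * h
        # Reverse drift: f - g^2 * score, with f = -0.5*beta*x and g^2 = beta.
        drift = -0.5 * beta * x - beta * score(x, t)
        # Step backward in time (dt = -h); the noise scale is g * sqrt(h).
        x = x - h * drift + np.sqrt(beta * h) * rng.standard_normal(n_samples)
    return x

samples = reverse_sde_sample()
recovered_var = samples.var()  # should be close to var0, up to O(h) bias and Monte Carlo noise
```

Starting from variance close to 1 at $t=1$, the score correction contracts the samples back toward the data variance, which is the compensation for stochastic spreading described above.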
### A.5 Connection to score-based generative modeling
Modern score-based models replace the unknown score $s_t(x) = \nabla_x\log\rho_t(x)$ with a neural approximation $s_\theta(x,t)$. Substituting $s_\theta$ into ([30](https://arxiv.org/html/2605.06829#A1.E30)) gives the practical reverse-time sampler used in score-SDE modeling:
$$dX_t = \Big(f(X_t,t) - g(t)^2\,s_\theta(X_t,t)\Big)\,dt + g(t)\,d\bar{W}_t.$$
This is the mathematical reason score estimation suffices for generation: one need not represent the density $\rho_t$ itself, only its logarithmic gradient along the forward path.
### A.6 Remark on conventions
Different references write the reverse-time diffusion with different sign conventions, depending on whether time is parameterized forward or backward and how the reversed Brownian motion is defined. These formulations are equivalent after a change of variables, but care is needed when comparing formulas across sources. Throughout this paper, we follow the convention of Song et al. ([2021](https://arxiv.org/html/2605.06829#bib.bib2)), in which the reverse-time drift is written at time $t$ and sampling proceeds from $t=1$ to $t=0$.
## Appendix B Fokker–Planck, Continuity Equations, and the Probability-Flow ODE
This appendix provides background for Proposition [3](https://arxiv.org/html/2605.06829#Thmtheorem3). The goal is to make explicit the PDE relationship between stochastic diffusion dynamics, deterministic transport, and the probability-flow ODE. The key point is that the probability-flow ODE is constructed so that its one-time marginals satisfy the same density evolution equation as the forward SDE, even though the induced path measures generally differ.
### B.1 Forward diffusion and the Fokker–Planck equation
Consider the forward Itô diffusion
$$dX_t = f(X_t,t)\,dt + g(t)\,dW_t, \tag{31}$$
where $f:\mathbb{R}^d\times[0,1]\to\mathbb{R}^d$ is a drift field, $g:[0,1]\to\mathbb{R}_+$ is a scalar diffusion coefficient, and $(W_t)_{t\in[0,1]}$ is a standard Wiener process. Let $\mu_t$ denote the law of $X_t$, and assume that $\mu_t$ admits a sufficiently smooth density $\rho_t$.
Under standard regularity assumptions, $\rho_t$ satisfies the Fokker–Planck equation
$$\partial_t\rho_t(x) = -\nabla\cdot\big(\rho_t(x)\,f(x,t)\big) + \tfrac{1}{2}g(t)^2\,\Delta\rho_t(x). \tag{32}$$
This equation describes the evolution of the one-time marginals of the stochastic process ([31](https://arxiv.org/html/2605.06829#A2.E31)).
### B.2 Deterministic transport and the continuity equation
Now consider a deterministic flow defined by the ODE
$$\frac{dX_t}{dt} = v(X_t,t), \tag{33}$$
where $v:\mathbb{R}^d\times[0,1]\to\mathbb{R}^d$ is a velocity field. If $X_0\sim\mu_0$ and the flow is well-defined, then the corresponding densities evolve according to the continuity equation
$$\partial_t\rho_t(x) + \nabla\cdot\big(\rho_t(x)\,v(x,t)\big) = 0. \tag{34}$$
Equivalently,
$$\partial_t\rho_t(x) = -\nabla\cdot\big(\rho_t(x)\,v(x,t)\big). \tag{35}$$
The central question is therefore: can one choose a deterministic velocity field $v$ so that ([35](https://arxiv.org/html/2605.06829#A2.E35)) matches the Fokker–Planck evolution ([32](https://arxiv.org/html/2605.06829#A2.E32))?
### B.3 Rewriting the diffusion term
The answer rests on the identity
$$\Delta\rho_t = \nabla\cdot\big(\rho_t\nabla\log\rho_t\big), \tag{36}$$
valid whenever $\rho_t$ is positive and sufficiently smooth. Substituting ([36](https://arxiv.org/html/2605.06829#A2.E36)) into the Fokker–Planck equation gives
$$\partial_t\rho_t = -\nabla\cdot\big(\rho_t f\big) + \tfrac{1}{2}g(t)^2\,\nabla\cdot\big(\rho_t\nabla\log\rho_t\big). \tag{37}$$
Factoring out the divergence yields
$$\partial_t\rho_t = -\nabla\cdot\left(\rho_t f - \tfrac{1}{2}g(t)^2\rho_t\nabla\log\rho_t\right). \tag{38}$$
Comparing ([38](https://arxiv.org/html/2605.06829#A2.E38)) with the continuity equation ([35](https://arxiv.org/html/2605.06829#A2.E35)), one is led to define the deterministic velocity field
$$v_{\mathrm{PF}}(x,t) = f(x,t) - \tfrac{1}{2}g(t)^2\,\nabla_x\log\rho_t(x). \tag{39}$$
### B.4 Probability-flow ODE
The corresponding deterministic dynamics are
$$\frac{dX_t}{dt} = f(X_t,t) - \tfrac{1}{2} g(t)^2\,\nabla_x\log\rho_t(X_t). \tag{40}$$
By construction, the continuity equation associated with ([40](https://arxiv.org/html/2605.06829#A2.E40)) is exactly ([38](https://arxiv.org/html/2605.06829#A2.E38)), which is the same density evolution equation satisfied by the forward SDE. Therefore the ODE ([40](https://arxiv.org/html/2605.06829#A2.E40)) and the SDE ([31](https://arxiv.org/html/2605.06829#A2.E31)) share the same one-time marginals.
This proves Proposition [3](https://arxiv.org/html/2605.06829#Thmtheorem3) at a formal level: the probability-flow ODE is defined precisely so that its density evolution matches the Fokker–Planck evolution of the SDE.
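The marginal-matching claim can be checked numerically in a case where everything is available in closed form. The sketch below assumes a one-dimensional VP-type SDE $dX = -\tfrac{1}{2}\beta X\,dt + \sqrt{\beta}\,dW$ with constant $\beta$ and a Gaussian initial law; these choices, and the names `var_sde` and `v_pf`, are illustrative assumptions rather than anything specified in the text. It integrates the probability-flow ODE with explicit Euler and compares the sample variance at $t=1$ with the SDE's analytic marginal variance.

```python
import numpy as np

# Assumed toy setup (not from the text): 1-D VP-type SDE
#   dX = -0.5*beta*X dt + sqrt(beta) dW,  X_0 ~ N(0, sigma0^2),
# whose marginals stay Gaussian: rho_t = N(0, var_sde(t)).
beta, sigma0 = 2.0, 0.3

def var_sde(t):
    """Closed-form marginal variance of the SDE at time t."""
    return 1.0 + (sigma0**2 - 1.0) * np.exp(-beta * t)

def v_pf(x, t):
    """Probability-flow velocity f - 0.5*g^2*score, with score = -x / var_sde(t)."""
    f = -0.5 * beta * x
    score = -x / var_sde(t)
    return f - 0.5 * beta * score

rng = np.random.default_rng(0)
x = sigma0 * rng.standard_normal(50_000)   # samples of X_0
n_steps = 2_000
dt = 1.0 / n_steps
for k in range(n_steps):                   # explicit Euler on the probability-flow ODE
    x = x + dt * v_pf(x, k * dt)

var_ode = x.var()                          # empirical variance of the ODE samples at t = 1
print(var_ode, var_sde(1.0))
```

The two printed numbers should agree up to Monte Carlo and discretization error, even though no noise was injected along the ODE trajectories.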
### B.5 Marginals versus path measures
It is important to emphasize what Proposition [3](https://arxiv.org/html/2605.06829#Thmtheorem3) does and does not say. The proposition states that the SDE and ODE have the same one-time marginals:
$$\mu_t^{\mathrm{SDE}} = \mu_t^{\mathrm{ODE}} \qquad \text{for each } t\in[0,1].$$
It does *not* say that they induce the same law on trajectories. The SDE defines a stochastic path measure, while the ODE defines a deterministic flow map. Thus the two models agree at the level of marginals but generally differ at the level of paths. This is the distinction emphasized in Section [3.5](https://arxiv.org/html/2605.06829#S3.SS5).
### B.6 Learned probability-flow ODE
In practice, the score $\nabla_x\log\rho_t(x)$ is unknown and is replaced by a learned approximation $s_\theta(x,t)$. This yields the practical probability-flow ODE
$$\frac{dX_t}{dt} = f(X_t,t) - \tfrac{1}{2} g(t)^2\,s_\theta(X_t,t). \tag{41}$$
When $s_\theta \approx \nabla_x\log\rho_t$, the dynamics ([41](https://arxiv.org/html/2605.06829#A2.E41)) approximately preserve the intended one-time marginals. The quality of the resulting sampler then depends on both score approximation error and numerical integration error.
### B.7 Instantaneous change-of-variables
Suppose that the learned velocity field
$$v_\theta(x,t) = f(x,t) - \tfrac{1}{2} g(t)^2\,s_\theta(x,t)$$
defines an invertible flow. Then the corresponding density evolves according to the instantaneous change-of-variables formula
$$\frac{d}{dt}\log\rho_t(X_t) = -\nabla\cdot v_\theta(X_t,t). \tag{42}$$
This follows directly from the continuity equation and is the continuous-time analogue of the standard change-of-variables rule used in normalizing flows (Chen et al., [2018](https://arxiv.org/html/2605.06829#bib.bib6)). Integrating ([42](https://arxiv.org/html/2605.06829#A2.E42)) along a trajectory yields
$$\log\rho_0(X_0) = \log\rho_1(X_1) + \int_0^1 \nabla\cdot v_\theta(X_t,t)\,dt, \tag{43}$$
up to the direction-of-time convention used to parameterize the flow. This is the basis for the likelihood connection discussed in Section [6](https://arxiv.org/html/2605.06829#S6).
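As a sanity check on the integrated change-of-variables formula, the following sketch uses a deliberately simple assumed velocity $v(x,t)=a\,x$ in one dimension, for which the flow pushes $\mathcal{N}(0,1)$ to $\mathcal{N}(0,e^{2a})$ and the divergence is the constant $a$. The constant `a`, the helper names, and the Euler integrator are illustrative assumptions. The code accumulates the divergence integral along each trajectory and compares both sides of the identity.

```python
import numpy as np

a = 0.7  # assumed linear velocity v(x, t) = a * x in 1-D, so div v = a everywhere

def log_normal_pdf(x, var):
    """Log-density of N(0, var) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * x**2 / var

def velocity(x, t):
    return a * x

def divergence(x, t):
    return a * np.ones_like(x)

x0 = np.linspace(-2.0, 2.0, 9)            # a few starting points X_0
x = x0.copy()
div_integral = np.zeros_like(x0)
n_steps = 20_000
dt = 1.0 / n_steps
for k in range(n_steps):                   # Euler on the state and on the divergence integral
    t = k * dt
    div_integral += dt * divergence(x, t)
    x += dt * velocity(x, t)
x1 = x

# rho_0 = N(0, 1) is transported to rho_1 = N(0, exp(2a)) by this linear flow.
lhs = log_normal_pdf(x0, 1.0)
rhs = log_normal_pdf(x1, np.exp(2.0 * a)) + div_integral
max_gap = np.max(np.abs(lhs - rhs))
print(max_gap)
```

For a learned field the divergence is not constant, and in high dimensions it is usually estimated rather than computed exactly; the bookkeeping of integrating it alongside the state is the same.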
### B.8 Remark on regularity
The derivations above are formal and assume sufficient smoothness, positivity of densities, and existence of the required derivatives\. In particular, the identity \([36](https://arxiv.org/html/2605.06829#A2.E36)\) and the change\-of\-variables formula \([42](https://arxiv.org/html/2605.06829#A2.E42)\) both require regularity conditions that may fail in singular or degenerate settings\. For the purposes of this survey, the main point is conceptual: the probability\-flow ODE is obtained by rewriting the Fokker–Planck equation as a continuity equation with a score\-corrected deterministic velocity\.
### B.9 Takeaway
The probability\-flow ODE does not represent a different endpoint problem from the forward SDE\. Rather, it is a deterministic reparameterization of the same marginal evolution\. This is why it provides a natural bridge between diffusion models and deterministic transport methods such as continuous normalizing flows: the score field learned for stochastic reverse\-time sampling can also be used to define a deterministic ODE sampler with the same one\-time marginals\.
## Appendix C Objective Equivalences: Denoising Score Matching, DDPM Losses, and Weighted Fisher Divergence
This appendix provides background for Proposition [2](https://arxiv.org/html/2605.06829#Thmtheorem2). The goal is to make explicit why denoising score matching (DSM), DDPM-style noise-prediction losses, and continuous-time score-SDE objectives can all be viewed as weighted score-regression objectives under Gaussian perturbations. We keep the derivations at a survey level, emphasizing the common structure rather than the most general technical formulation (Hyvärinen, [2005](https://arxiv.org/html/2605.06829#bib.bib4); Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5); Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
### C.1 Classical score matching and Fisher divergence
Let $\rho$ be a target density on $\mathbb{R}^d$ and let $s_\theta:\mathbb{R}^d\to\mathbb{R}^d$ be a model score field. The ideal Fisher-divergence objective is
$$\mathcal{J}_{\mathrm{F}}(\theta) = \frac{1}{2}\,\mathbb{E}_{x\sim\rho}\left[\|s_\theta(x) - \nabla_x\log\rho(x)\|^2\right]. \tag{44}$$
At first sight, ([44](https://arxiv.org/html/2605.06829#A3.E44)) appears intractable because the true score $\nabla_x\log\rho(x)$ is unknown. The key observation of Hyvärinen ([2005](https://arxiv.org/html/2605.06829#bib.bib4)) is that, under suitable regularity and boundary conditions, ([44](https://arxiv.org/html/2605.06829#A3.E44)) can be rewritten by integration by parts into a form that depends only on samples from $\rho$ and the model score:
$$\mathcal{J}_{\mathrm{F}}(\theta) \equiv \mathbb{E}_{x\sim\rho}\left[\nabla\cdot s_\theta(x) + \frac{1}{2}\|s_\theta(x)\|^2\right] + \text{constant}, \tag{45}$$
where the constant is independent of $\theta$.
Thus score matching estimates the logarithmic gradient of a density without ever requiring direct evaluation of the density itself. In modern generative modeling, this basic idea is not applied to a single density $\rho$, but to a family of perturbed marginals $(\rho_t)_{t\in[0,1]}$.
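The equivalence of the explicit and implicit forms of the Fisher objective can be illustrated on a toy problem. The sketch below assumes $\rho=\mathcal{N}(0,1)$ in one dimension and a hypothetical one-parameter model $s_\theta(x)=-\theta x$ (both chosen here for illustration). It evaluates the two objectives by Monte Carlo for several values of $\theta$ and checks that their difference does not depend on $\theta$; for this example the constant is $\tfrac{1}{2}\mathbb{E}[x^2]=\tfrac{1}{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # samples from rho = N(0, 1); the true score is -x

def explicit_fisher(theta):
    """Explicit Fisher objective for the assumed model s_theta(x) = -theta * x."""
    return 0.5 * np.mean((-theta * x - (-x)) ** 2)

def implicit_score_matching(theta):
    """Integration-by-parts form without the constant: E[div s + 0.5 * |s|^2]."""
    div_s = -theta                  # d/dx of -theta * x
    return np.mean(div_s + 0.5 * (theta * x) ** 2)

thetas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
gaps = np.array([explicit_fisher(th) - implicit_score_matching(th) for th in thetas])
print(gaps)   # approximately the same value for every theta
```

Only the implicit form is computable when the true score is unknown; the explicit form is available here solely because the toy density was chosen to be Gaussian.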
### C.2 Denoising score matching under a corruption kernel
Let $q_t(x_t\mid x_0)$ be a tractable corruption kernel and let $\rho_0$ denote the data density. The corresponding perturbed marginal is
$$\rho_t(x_t) = \int q_t(x_t\mid x_0)\,\rho_0(x_0)\,dx_0. \tag{46}$$
A key identity underlying denoising score matching is
$$\nabla_{x_t}\log\rho_t(x_t) = \mathbb{E}\left[\nabla_{x_t}\log q_t(x_t\mid x_0)\,\middle|\,x_t\right]. \tag{47}$$
This identity shows that the score of the perturbed marginal can be recovered as a conditional expectation of the conditional score. Therefore one may train a model score field by regressing against the analytically available conditional target rather than the intractable marginal score (Vincent, [2011](https://arxiv.org/html/2605.06829#bib.bib5)).
The denoising score-matching objective is
$$\mathcal{L}_{\mathrm{DSM}}(\theta) = \mathbb{E}_{x_0\sim\rho_0}\,\mathbb{E}_{x_t\sim q_t(\cdot\mid x_0)}\Big[\|s_\theta(x_t,t) - \nabla_{x_t}\log q_t(x_t\mid x_0)\|^2\Big]. \tag{48}$$
By the $L^2$ projection identity,
$$\begin{aligned}
\mathcal{L}_{\mathrm{DSM}}(\theta) &= \mathbb{E}\left[\big\|s_\theta(x_t,t) - \mathbb{E}[\nabla_{x_t}\log q_t(x_t\mid x_0)\mid x_t]\big\|^2\right] + \text{constant} \\
&= \mathbb{E}\left[\|s_\theta(x_t,t) - \nabla_{x_t}\log\rho_t(x_t)\|^2\right] + \text{constant}.
\end{aligned} \tag{49}$$
Hence the population minimizer of DSM is the score of the perturbed marginal:
$$s_\theta^{*}(x_t,t) = \nabla_{x_t}\log\rho_t(x_t).$$
This is the key sense in which DSM is already a score-regression objective on perturbed data.
### C.3 Gaussian perturbations
Now assume the perturbation kernel is Gaussian:
$$x_t = m(t)\,x_0 + s(t)\,\varepsilon, \qquad \varepsilon\sim\mathcal{N}(0,I). \tag{50}$$
Then
$$q_t(x_t\mid x_0) = \mathcal{N}\big(m(t)\,x_0,\ s(t)^2 I\big),$$
and its conditional score is
$$\nabla_{x_t}\log q_t(x_t\mid x_0) = -\frac{1}{s(t)^2}\big(x_t - m(t)\,x_0\big). \tag{51}$$
Using ([50](https://arxiv.org/html/2605.06829#A3.E50)), we also have
$$x_t - m(t)\,x_0 = s(t)\,\varepsilon, \qquad\text{hence}\qquad \nabla_{x_t}\log q_t(x_t\mid x_0) = -\frac{1}{s(t)}\,\varepsilon. \tag{52}$$
This is the basic algebraic reason that Gaussian DSM can be written either as score regression or as noise regression.
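To see the noise-regression target recover the marginal score, consider a minimal sketch with assumed one-dimensional Gaussian data $\rho_0=\mathcal{N}(0,1)$ and fixed illustrative values of $m(t)$ and $s(t)$. The perturbed marginal is then $\mathcal{N}(0, m^2+s^2)$ with score $-x/(m^2+s^2)$. Fitting a hypothetical linear model $s_\theta(x)=\theta x$ to the conditional target $-\varepsilon/s(t)$ by least squares should return the marginal slope, not anything that depends on individual $x_0$ values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
m_t, s_t = 0.8, 0.5                 # assumed values of m(t) and s(t) at one fixed t

x0 = rng.standard_normal(n)         # assumed data law rho_0 = N(0, 1)
eps = rng.standard_normal(n)
xt = m_t * x0 + s_t * eps           # Gaussian perturbation of the data
target = -eps / s_t                 # conditional score written as scaled noise

# Least-squares fit of the linear score model s_theta(x) = theta * x to the DSM target.
theta_dsm = np.sum(xt * target) / np.sum(xt * xt)

# The marginal is N(0, m^2 + s^2), so the true marginal score is -x / (m^2 + s^2).
theta_true = -1.0 / (m_t**2 + s_t**2)
print(theta_dsm, theta_true)
```

A linear model is exact here only because the marginal is Gaussian; for general data the same regression would need a flexible function class.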
### C.4 Continuous-time weighted DSM
In the continuous-time setting, one samples $t$ from a distribution on $[0,1]$ and minimizes a weighted objective of the form
$$\mathcal{L}_{\mathrm{CT}}(\theta) = \mathbb{E}_t\,\mathbb{E}_{x_0,\varepsilon}\Big[\lambda(t)\,\|s_\theta(x_t,t) - \nabla_{x_t}\log q_t(x_t\mid x_0)\|^2\Big], \tag{53}$$
where $x_t$ is generated according to ([50](https://arxiv.org/html/2605.06829#A3.E50)). Substituting ([51](https://arxiv.org/html/2605.06829#A3.E51)) gives
$$\mathcal{L}_{\mathrm{CT}}(\theta) = \mathbb{E}_t\,\mathbb{E}_{x_0,\varepsilon}\Big[\lambda(t)\,\Big\|s_\theta(x_t,t) + \frac{1}{s(t)^2}\big(x_t - m(t)\,x_0\big)\Big\|^2\Big], \tag{54}$$
or equivalently, using ([52](https://arxiv.org/html/2605.06829#A3.E52)),
$$\mathcal{L}_{\mathrm{CT}}(\theta) = \mathbb{E}_t\,\mathbb{E}_{x_0,\varepsilon}\Big[\lambda(t)\,\Big\|s_\theta(x_t,t) + \frac{1}{s(t)}\,\varepsilon\Big\|^2\Big]. \tag{55}$$
Because the population minimizer is the score of the perturbed marginal $\rho_t$, this is a weighted score-regression objective along the diffusion path (Song et al., [2021](https://arxiv.org/html/2605.06829#bib.bib2)).
### C.5 Weighted Fisher-divergence interpretation
The previous subsection implies the weighted Fisher-divergence viewpoint used in Section [5](https://arxiv.org/html/2605.06829#S5). Indeed, applying ([49](https://arxiv.org/html/2605.06829#A3.E49)) pointwise in time yields
$$\mathcal{L}_{\mathrm{CT}}(\theta) \equiv \int_0^1 \lambda(t)\,\mathbb{E}_{x\sim\rho_t}\big[\|s_\theta(x,t) - s_t(x)\|^2\big]\,dt + \text{constant}, \tag{56}$$
where
$$s_t(x) = \nabla_x\log\rho_t(x).$$
Thus the continuous-time objective is, up to an additive constant independent of $\theta$, a time-integrated Fisher divergence between the learned score and the true score of the perturbed marginals.
This is the formal content behind Proposition [2](https://arxiv.org/html/2605.06829#Thmtheorem2): different diffusion and score-based objectives are unified because they estimate the same statistical object, namely the score field of $(\rho_t)$, with different parameterizations and weights.
### C.6 DDPM: ELBO versus simplified noise prediction
DDPM is derived from a variational lower bound on the reverse Markov-chain likelihood (Sohl-Dickstein et al., [2015](https://arxiv.org/html/2605.06829#bib.bib9); Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1)). In discrete time, the full objective can be written as a sum of KL and reconstruction terms:
$$\begin{aligned}
\mathcal{L}_{\mathrm{ELBO}} = \mathbb{E}\Big[&\mathrm{KL}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big) + \sum_{t=2}^{T}\mathrm{KL}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big) \\
&- \log p_\theta(x_0\mid x_1)\Big].
\end{aligned} \tag{57}$$
In practice, however, the most widely used training loss is the simplified noise-prediction objective
$$\mathcal{L}_{\varepsilon}(\theta) = \mathbb{E}_{t,x_0,\varepsilon}\big[\|\varepsilon - \varepsilon_\theta(x_t,t)\|^2\big], \qquad x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon. \tag{58}$$
This practical loss is a reweighted surrogate of the variational objective rather than a term-by-term identical rewriting of ([57](https://arxiv.org/html/2605.06829#A3.E57)). Later work clarified this weighting structure and its continuous-time limit (Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
### C.7 DDPM noise prediction as score regression
In DDPM notation, one identifies
$$m(t) = \sqrt{\bar{\alpha}_t}, \qquad s(t) = \sqrt{1-\bar{\alpha}_t},$$
so the conditional score becomes
$$\nabla_{x_t}\log q_t(x_t\mid x_0) = -\frac{1}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon. \tag{59}$$
Now let $\varepsilon_\theta(x_t,t)$ denote a model that predicts the noise. Define the induced score model by
$$s_\theta(x_t,t) = -\frac{1}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon_\theta(x_t,t). \tag{60}$$
Then minimizing ([58](https://arxiv.org/html/2605.06829#A3.E58)) is equivalent, up to a schedule-dependent scaling, to minimizing squared error between $s_\theta(x_t,t)$ and the conditional score ([59](https://arxiv.org/html/2605.06829#A3.E59)): at each $t$, the squared score error equals the squared noise error divided by $1-\bar{\alpha}_t$. Thus DDPM noise prediction is a reparameterized form of score regression (Ho et al., [2020](https://arxiv.org/html/2605.06829#bib.bib1); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28)).
### C.8 Parameterization equivalences
For Gaussian perturbations, one may move freely between three common parameterizations:
- Noise prediction: predict $\varepsilon$ in $x_t = m(t)\,x_0 + s(t)\,\varepsilon$.
- Clean-sample prediction: solve for $x_0 = \dfrac{x_t - s(t)\,\varepsilon}{m(t)}$.
- Score prediction: use $s_t(x_t\mid x_0) = -\dfrac{1}{s(t)}\,\varepsilon$.
These parameterizations are algebraically equivalent in the Gaussian setting, but they lead to different optimization geometries because the effective scaling with respect to $t$ changes. This is one reason alternative parameterizations can show different empirical behavior despite corresponding to the same underlying score field (Nichol and Dhariwal, [2021](https://arxiv.org/html/2605.06829#bib.bib10); Song et al., [2020a](https://arxiv.org/html/2605.06829#bib.bib14); Kingma et al., [2021](https://arxiv.org/html/2605.06829#bib.bib28); Salimans and Ho, [2022](https://arxiv.org/html/2605.06829#bib.bib53)).
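The three parameterizations listed above can be written as small conversion helpers. The sketch below is illustrative only: the function names, the chosen value of $\bar{\alpha}_t$, and the perturbed "prediction" standing in for a trained network are all assumptions. It checks the round trips and the $1/s(t)^2$ rescaling between noise-space and score-space squared errors.

```python
import numpy as np

def eps_to_x0(xt, eps, m_t, s_t):
    """Clean-sample estimate implied by a noise estimate."""
    return (xt - s_t * eps) / m_t

def eps_to_score(eps, s_t):
    """Conditional-score value implied by a noise estimate."""
    return -eps / s_t

def score_to_eps(score, s_t):
    """Inverse of eps_to_score."""
    return -s_t * score

rng = np.random.default_rng(0)
alpha_bar = 0.7                                   # assumed value of bar-alpha_t
m_t, s_t = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
x0 = rng.standard_normal((4, 3))
eps = rng.standard_normal((4, 3))
xt = m_t * x0 + s_t * eps

x0_rec = eps_to_x0(xt, eps, m_t, s_t)
eps_rec = score_to_eps(eps_to_score(eps, s_t), s_t)

# A noise-prediction error and the induced score error differ by the factor 1 / s(t)^2.
eps_pred = eps + 0.1 * rng.standard_normal(eps.shape)   # stand-in for eps_theta(x_t, t)
noise_loss = np.sum((eps - eps_pred) ** 2)
score_loss = np.sum((eps_to_score(eps_pred, s_t) - eps_to_score(eps, s_t)) ** 2)
print(score_loss, noise_loss / s_t**2)
```

Because $s(t)\to 0$ near the data end of the path, the $1/s(t)^2$ factor grows without bound there, which is one concrete way the "effective scaling" differs between parameterizations.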
### C.9 Takeaway
The main conclusion of this appendix is that many objectives used in diffusion and score\-based generative modeling are not fundamentally different estimators of different quantities\. Rather, they are different weighted and reparameterized ways of estimating the score field of the perturbed marginals\. This is the sense in which DDPM noise\-prediction losses, DSM, and continuous\-time score\-SDE objectives are unified by Proposition[2](https://arxiv.org/html/2605.06829#Thmtheorem2)\.
## Appendix D Flow Matching: Conditional Targets and Marginal Optimal Velocity
This appendix provides background for Proposition [4](https://arxiv.org/html/2605.06829#Thmtheorem4). The goal is to make explicit the relationship between endpoint-conditioned regression targets in flow matching and the marginal velocity field that appears in the continuity equation. The key point is that the practical conditional flow-matching objective is an $L^2$ regression problem whose population minimizer is the conditional expectation of the path derivative given the current state and time (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3), [2024](https://arxiv.org/html/2605.06829#bib.bib47); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44)).
### D.1 Path samplers and induced probability paths
Let $\mu_0$ denote the data law and $\mu_1$ a reference law, typically Gaussian. Let
$$\pi(x_0,x_1)$$
be a coupling between $\mu_0$ and $\mu_1$. A path sampler specifies, for each $t\in[0,1]$, a random intermediate state
$$x_t = \Phi_t(x_0,x_1),$$
where $\Phi_t$ is measurable in $(x_0,x_1)$ and differentiable in $t$. The induced law of $x_t$ defines a probability path
$$\mu_t = (\Phi_t)_{\#}\pi,$$
with density $\rho_t$ when it exists.
A standard example is the affine path
$$x_t = a(t)\,x_0 + b(t)\,x_1, \tag{61}$$
where $a(t)$ and $b(t)$ are scalar interpolation functions. In this case, the path derivative is
$$\dot{x}_t := \frac{d}{dt}x_t = a'(t)\,x_0 + b'(t)\,x_1. \tag{62}$$
This affine-path view is representative of the broader path-based perspective shared by flow matching and stochastic interpolants (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3); Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44)).
### D.2 Conditional versus marginal velocities
The object in ([62](https://arxiv.org/html/2605.06829#A4.E62)) is an endpoint-conditioned velocity: it depends on the sampled pair $(x_0,x_1)$ as well as on time. By contrast, the velocity field appearing in the continuity equation must be a function of the current state and time:
$$\partial_t\rho_t + \nabla\cdot(\rho_t v_t) = 0. \tag{63}$$
The corresponding marginally correct velocity field is
$$v^{*}(x,t) = \mathbb{E}[\dot{x}_t \mid x_t = x,\ t]. \tag{64}$$
This distinction is essential. Flow matching is implemented using endpoint-conditioned samples and endpoint-conditioned targets, but the learned model is an unconditional function of $(x,t)$. The rest of this appendix explains why these two views are compatible.
### D.3 Conditional flow-matching objective
The practical conditional flow-matching objective is
$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}\big[\|v_\theta(x_t,t) - \dot{x}_t\|^2\big], \tag{65}$$
where the expectation is taken over the sampling procedure
$$t\sim p(t), \qquad (x_0,x_1)\sim\pi, \qquad x_t = \Phi_t(x_0,x_1).$$
For the affine path ([61](https://arxiv.org/html/2605.06829#A4.E61)), this becomes
$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}\Big[\big\|v_\theta(x_t,t) - \big(a'(t)\,x_0 + b'(t)\,x_1\big)\big\|^2\Big]. \tag{66}$$
This objective is practical because the target $\dot{x}_t$ is explicitly computable from sampled endpoint pairs.
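A Monte Carlo estimate of the affine-path objective takes only a few lines. The sketch below assumes the linear schedule $a(t)=1-t$, $b(t)=t$, an independent coupling, shifted Gaussian samples standing in for data, and $t\sim\mathrm{Uniform}[0,1]$; the function name `cfm_loss` and the two hand-written candidate fields are hypothetical and replace a trained network.

```python
import numpy as np

def a(t):
    return 1.0 - t      # assumed schedule a(t) = 1 - t, so a'(t) = -1

def b(t):
    return t            # assumed schedule b(t) = t, so b'(t) = 1

def cfm_loss(v_theta, x0, x1, t):
    """Monte Carlo estimate of the conditional flow-matching loss on an affine path."""
    xt = a(t) * x0 + b(t) * x1
    target = -x0 + x1                       # a'(t) x0 + b'(t) x1 for this schedule
    return np.mean(np.sum((v_theta(xt, t) - target) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d = 10_000, 2
x0 = 2.0 + 0.5 * rng.standard_normal((n, d))   # stand-in "data" samples
x1 = rng.standard_normal((n, d))               # Gaussian reference samples
t = rng.uniform(size=(n, 1))                   # t ~ Uniform[0, 1]

zero_field = lambda x, t: np.zeros_like(x)
drift_field = lambda x, t: np.full_like(x, -2.0)   # constant field equal to E[x1 - x0]

loss_zero = cfm_loss(zero_field, x0, x1, t)
loss_drift = cfm_loss(drift_field, x0, x1, t)
print(loss_zero, loss_drift)
```

The constant field already lowers the loss because it captures the mean displacement; a state-dependent model can reduce it further, down to the irreducible conditional variance of the target.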
### D.4 Why the population minimizer is a conditional expectation
We now derive the optimal regression target. Let
$$Z := (x_t,t), \qquad Y := \dot{x}_t.$$
Then the conditional flow-matching loss can be written abstractly as
$$\mathcal{L}(v) = \mathbb{E}\big[\|v(Z) - Y\|^2\big],$$
where the minimization is over measurable functions of $Z$.
By the standard $L^2$ projection identity (the cross term vanishes by the tower property of conditional expectation),
$$\mathbb{E}\big[\|v(Z) - Y\|^2\big] = \mathbb{E}\big[\|v(Z) - \mathbb{E}[Y\mid Z]\|^2\big] + \mathbb{E}\big[\|\mathbb{E}[Y\mid Z] - Y\|^2\big]. \tag{67}$$
The second term is independent of $v$, so the minimum is attained at
$$v^{*}(Z) = \mathbb{E}[Y\mid Z]. \tag{68}$$
Substituting back the definitions of $Z$ and $Y$ yields
$$v^{*}(x,t) = \mathbb{E}[\dot{x}_t \mid x_t = x,\ t]. \tag{69}$$
This proves Proposition [4](https://arxiv.org/html/2605.06829#Thmtheorem4).
### D.5 Interpretation of the projection formula
The decomposition ([67](https://arxiv.org/html/2605.06829#A4.E67)) shows that conditional flow matching is an ordinary regression problem. The model cannot recover the full endpoint-conditioned derivative $\dot{x}_t$ because it only observes $(x_t,t)$, not $(x_0,x_1)$ directly. The best it can do in mean-squared error is therefore the conditional expectation of $\dot{x}_t$ given the information available to it.
This is the basic reason endpoint-conditioned supervision is mathematically compatible with learning an unconditional field $v_\theta(x,t)$ (Lipman et al., [2022](https://arxiv.org/html/2605.06829#bib.bib3)). More recent work has emphasized that this regression viewpoint can be generalized into broader path-based frameworks for fast and consistency-style generative modeling (Boffi et al., [2025](https://arxiv.org/html/2605.06829#bib.bib48)).
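The regression interpretation can be made concrete in a case where the conditional expectation is known. The sketch below assumes one-dimensional standard Gaussian endpoints with an independent coupling and the linear path $x_t=(1-t)x_0+t\,x_1$, all chosen for illustration. Then $(x_t,\dot{x}_t)$ is jointly Gaussian and $\mathbb{E}[\dot{x}_t\mid x_t=x] = \frac{2t-1}{(1-t)^2+t^2}\,x$, so a least-squares line through the origin should recover that slope from endpoint-conditioned targets alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200_000, 0.8
x0 = rng.standard_normal(n)          # assumed mu_0 = N(0, 1)
x1 = rng.standard_normal(n)          # assumed mu_1 = N(0, 1), independent coupling
xt = (1.0 - t) * x0 + t * x1         # linear path
xdot = x1 - x0                       # endpoint-conditioned regression target

# Best linear predictor of xdot from xt; for jointly Gaussian variables this
# coincides with the conditional expectation E[xdot | xt].
slope_fit = np.sum(xt * xdot) / np.sum(xt * xt)
slope_exact = (2.0 * t - 1.0) / ((1.0 - t) ** 2 + t**2)

# Residual variance of xdot around v*(xt): the part of the loss no model can remove.
irreducible = np.mean((xdot - slope_exact * xt) ** 2)
print(slope_fit, slope_exact, irreducible)
```

The nonzero residual illustrates that a perfect model still has positive conditional flow-matching loss; only the excess over this floor reflects model error.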
### D.6 Connection to the continuity equation
The velocity field ([69](https://arxiv.org/html/2605.06829#A4.E69)) is not merely the best regression target in an abstract statistical sense. It is also the velocity field that is consistent with the induced probability path.
Let $\varphi:\mathbb{R}^d\to\mathbb{R}$ be a smooth test function. Then
\begin{align}
\frac{d}{dt}\mathbb{E}[\varphi(x_t)] &= \mathbb{E}\big[\nabla\varphi(x_t)\cdot\dot{x}_t\big] \tag{70} \\
&= \mathbb{E}\big[\nabla\varphi(x_t)\cdot\mathbb{E}[\dot{x}_t\mid x_t,t]\big] \tag{71} \\
&= \mathbb{E}\big[\nabla\varphi(x_t)\cdot v^{*}(x_t,t)\big]. \tag{72}
\end{align}
Equation ([72](https://arxiv.org/html/2605.06829#A4.E72)) is the weak form of the continuity equation
$$\partial_t\rho_t + \nabla\cdot(\rho_t v^{*}) = 0.$$
Thus the same conditional expectation that arises from the $L^2$ regression argument is also the correct transport velocity for the induced marginal path.
### D.7 Affine paths
For the affine interpolation ([61](https://arxiv.org/html/2605.06829#A4.E61)), the target derivative is
$$\dot{x}_t = a'(t)\,x_0 + b'(t)\,x_1.$$
Therefore the marginal optimal velocity is
$$v^{*}(x,t) = \mathbb{E}[a'(t)\,x_0 + b'(t)\,x_1 \mid x_t = x,\ t]. \tag{73}$$
This expression is typically not available in closed form for arbitrary couplings $\pi$, which is precisely why the conditional objective ([66](https://arxiv.org/html/2605.06829#A4.E66)) is useful: it avoids having to compute the marginal conditional expectation analytically.
### D.8 Dependence on the coupling
A crucial point is that the optimal velocity depends on the coupling $\pi$. Two different couplings may have the same endpoint laws $\mu_0$ and $\mu_1$ but induce different conditional expectations
$$\mathbb{E}[\dot{x}_t \mid x_t = x,\ t].$$
Thus the learned velocity is path-dependent and coupling-dependent. This is one of the major conceptual differences from diffusion models, where the forward SDE more tightly constrains the path.
In this sense, flow matching turns path design into an explicit modeling decision rather than a byproduct of a fixed diffusion.
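A small experiment makes the coupling dependence visible. The sketch below assumes both endpoint laws are $\mathcal{N}(0,1)$ in one dimension and compares three couplings with those same marginals under the linear path: independent, $x_1=x_0$, and $x_1=-x_0$. In each case the pair $(x_t,\dot{x}_t)$ is either jointly Gaussian or linearly related, so a least-squares slope through the origin recovers the conditional expectation; the helper name `fitted_slope` and the choice $t=0.8$ are assumptions for illustration.

```python
import numpy as np

def fitted_slope(x0, x1, t):
    """Least-squares slope of xdot = x1 - x0 regressed on xt = (1 - t)*x0 + t*x1."""
    xt = (1.0 - t) * x0 + t * x1
    xdot = x1 - x0
    return np.sum(xt * xdot) / np.sum(xt * xt)

rng = np.random.default_rng(0)
n, t = 200_000, 0.8
x0 = rng.standard_normal(n)

# Three couplings whose marginals are both N(0, 1).
slope_indep = fitted_slope(x0, rng.standard_normal(n), t)  # independent endpoints
slope_same = fitted_slope(x0, x0.copy(), t)                # x1 = x0
slope_anti = fitted_slope(x0, -x0, t)                      # x1 = -x0
print(slope_indep, slope_same, slope_anti)
```

The three slopes differ markedly even though the endpoint laws are identical, so the velocity field a model is asked to learn is determined by the coupling and the path, not by $\mu_0$ and $\mu_1$ alone.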
### D.9 Relation to diffusion, probability-flow ODEs, and neighboring frameworks
Section [6](https://arxiv.org/html/2605.06829#S6) showed that a forward diffusion induces a probability-flow ODE with velocity
$$v_{\mathrm{PF}}(x,t) = f(x,t) - \tfrac{1}{2} g(t)^2\,s_t(x).$$
Flow matching instead begins by specifying a path and then learns the corresponding velocity field directly. The two viewpoints coincide when the chosen path matches the diffusion marginals and the conditional expectation in ([69](https://arxiv.org/html/2605.06829#A4.E69)) equals the probability-flow velocity.
More broadly, this path-first perspective overlaps substantially with stochastic interpolants, which explicitly treat flows and diffusions within a common interpolation framework (Albergo et al., [2025](https://arxiv.org/html/2605.06829#bib.bib44)). It also overlaps conceptually with Bayesian Flow Networks, which provide another iterative probabilistic route to generation and have recently been related to diffusion-style SDE formulations (Graves et al., [2023](https://arxiv.org/html/2605.06829#bib.bib38); Xue et al., [2024](https://arxiv.org/html/2605.06829#bib.bib51)). Thus flow matching is best understood not as an isolated method, but as one point in a larger family of path-based transport models.
### D.10 Path geometry, rectification, and guidance
The practical success of flow matching depends strongly on the geometry of the chosen path. Rectified flow highlights this explicitly by seeking trajectories that are as straight as possible, thereby reducing stiffness and making coarse ODE discretizations more effective (Liu et al., [2022](https://arxiv.org/html/2605.06829#bib.bib15)). In the same spirit, recent work on guided flow matching shows that conditioning and guidance can interact substantially with path geometry and with the stability of the learned transport (Feng et al., [2025](https://arxiv.org/html/2605.06829#bib.bib49)). This reinforces one of the central themes of the paper: path choice is not merely a technical implementation detail, but a core modeling decision.
### D.11 Takeaway
The main conclusion of this appendix is that the practical flow-matching loss is not merely a heuristic regression objective. Its population minimizer is exactly the marginal velocity field
$$v^{*}(x,t) = \mathbb{E}[\dot{x}_t \mid x_t = x,\ t],$$
which is both the optimal $L^2$ predictor of the path derivative and the velocity field consistent with the continuity equation for the induced path. This is the formal content of Proposition [4](https://arxiv.org/html/2605.06829#Thmtheorem4), and it explains why endpoint-conditioned supervision suffices to learn a state-dependent transport field.
## Appendix E Measure-Theoretic Background: Probability Laws, Pushforwards, and Path Measures
This appendix clarifies the measure-theoretic language used throughout the survey. The main purpose is not to develop a full abstract framework, but rather to explain why it is useful to distinguish probability *laws* from *densities*, and why many statements in diffusion, score-based modeling, and flow matching are most naturally formulated in terms of probability measures and their pushforwards.
### E.1 Probability laws versus densities
Let $(\mathcal{X},\mathcal{B})$ be a measurable space, typically $\mathcal{X}=\mathbb{R}^d$ equipped with its Borel $\sigma$-algebra. A probability distribution on $\mathcal{X}$ is formally a probability measure
$$\mu:\mathcal{B}\to[0,1].$$
If $\mu$ is absolutely continuous with respect to Lebesgue measure, then there exists a density $\rho$ such that
$$\mu(dx) = \rho(x)\,dx.$$
In that case, one often identifies the distribution with its density. However, this identification is only valid when such a density exists.
This distinction matters because many constructions in generative modeling are naturally defined at the level of measures, even when the law is singular (so that no density exists) or when the density is unavailable in closed form or only defined after smoothing. For this reason, throughout the paper we use
$$\mu_t \quad\text{for a probability law}, \qquad \rho_t \quad\text{for its density when it exists}.$$
### E.2 Why measure language is useful in generative modeling
There are three recurring reasons to use measure-theoretic language.
#### (i) Pushforwards are fundamentally measure-valued.
A generative model often maps a simple random input to a more complex sample. If
$$z\sim\nu, \qquad x = G_\theta(z),$$
then the induced distribution of $x$ is the pushforward measure
$$\mu_\theta = (G_\theta)_{\#}\nu.$$
This definition makes sense whether or not $\mu_\theta$ admits a density.
#### (ii) Pathwise statements concern laws, not only densities.
When we say that an SDE or ODE induces a family of marginals $(\mu_t)$, this is a statement about the one-time laws of a stochastic or deterministic process. If those laws admit densities, we may write $(\rho_t)$ instead; but the primary object is the probability measure.
#### (iii) Different dynamics may share marginals but not path measures.
The probability-flow ODE and the forward SDE may have the same one-time marginals while inducing different distributions over trajectories. This distinction is naturally expressed in terms of measures on path space rather than pointwise densities.
### E.3 Pushforward measures
Let $T:\mathcal{X}\to\mathcal{Y}$ be measurable, and let $\mu$ be a probability measure on $\mathcal{X}$. The pushforward of $\mu$ by $T$, denoted $T_{\#}\mu$, is the probability measure on $\mathcal{Y}$ defined by
$$(T_{\#}\mu)(A)=\mu(T^{-1}(A)) \qquad \text{for all measurable } A\subseteq\mathcal{Y}. \tag{74}$$
Equivalently, for any bounded measurable test function $\varphi$,
$$\int_{\mathcal{Y}}\varphi(y)\,(T_{\#}\mu)(dy)=\int_{\mathcal{X}}\varphi(T(x))\,\mu(dx). \tag{75}$$
This notation is ubiquitous in transport\-based generative modeling\. For example:
- in normalizing flows, the learned map transports a base law to a data law;
- in deterministic ODE flows, the time-$t$ flow map transports $\mu_0$ to $\mu_t$;
- in flow matching, a path sampler induces intermediate laws by pushing forward a coupling measure.
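As a small numerical illustration of (74) and (75), the following sketch pushes a standard Gaussian through a non-injective map and checks both identities by Monte Carlo. The map `T`, the test function `phi`, the set $A=[0,1]$, and the sample sizes are illustrative choices, not taken from the survey.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def T(x):
    # Hypothetical measurable map; it is not injective, so no change-of-variables
    # formula with a Jacobian is used anywhere below.
    return x ** 2

def phi(y):
    # Bounded test function for the integral identity (75).
    return np.cos(y)

x = rng.standard_normal(200_000)      # samples from mu = N(0, 1)
y = T(x)                              # samples from the pushforward T_# mu

# Definition (74) with A = [0, 1]: T^{-1}(A) = [-1, 1], so mu(T^{-1}(A)) = erf(1/sqrt(2)).
pushforward_mass = np.mean((y >= 0.0) & (y <= 1.0))
preimage_mass = math.erf(1.0 / math.sqrt(2.0))

# Identity (75): integrate phi against T_# mu using independent chi-square(1) draws,
# and phi(T(x)) against mu using the Gaussian draws.
lhs = phi(rng.chisquare(df=1, size=200_000)).mean()
rhs = phi(y).mean()
print(pushforward_mass, preimage_mass, lhs, rhs)
```

Both pairs of numbers should agree up to Monte Carlo error. The check only needs samples and preimages, never a density of $T_{\#}\mu$.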
### E.4 Probability paths
A probability path is simply a family of probability measures
$$(\mu_t)_{t\in[0,1]}$$
connecting an initial law $\mu_0$ and a terminal law $\mu_1$. In diffusion and score-based models, this path is usually defined as the family of one-time laws of a forward SDE. In flow matching, the path is often defined more directly through a coupling and an interpolation rule.
If the path is induced by an interpolation map
$$x_t=\Phi_t(x_0,x_1),$$
with coupling $\pi(x_0,x_1)$ between $\mu_0$ and $\mu_1$, then the time-$t$ law is
$$\mu_t=(\Phi_t)_{\#}\pi. \tag{76}$$
This formula is naturally measure-theoretic: it does not require densities, Jacobians, or smoothness.
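A minimal sampling sketch of (76), under assumptions that are ours rather than the survey's: an independent coupling, the linear interpolation $\Phi_t(x_0,x_1)=(1-t)x_0+t x_1$, a Gaussian $\mu_0$, and a two-atom $\mu_1$ that has no density. The helper names `sample_coupling` and `Phi` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_coupling(n):
    # Hypothetical independent coupling pi = mu_0 x mu_1, with mu_0 = N(0, 1)
    # and mu_1 uniform on the two atoms {-2, +2} (a law without a density).
    x0 = rng.standard_normal(n)
    x1 = rng.choice([-2.0, 2.0], size=n)
    return x0, x1

def Phi(t, x0, x1):
    # Linear interpolation rule (one common choice, assumed here).
    return (1.0 - t) * x0 + t * x1

x0, x1 = sample_coupling(100_000)
for t in (0.0, 0.5, 1.0):
    xt = Phi(t, x0, x1)               # samples from mu_t = (Phi_t)_# pi
    print(t, xt.mean(), xt.var())     # variance should be near (1 - t)^2 + 4 t^2
```

Sampling from $\mu_t$ requires only a draw from the coupling and one evaluation of $\Phi_t$, even at $t=1$ where the law is purely atomic.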
### E.5 Path measures
A one-time marginal path $(\mu_t)$ does not by itself determine a unique law on full trajectories. A *path measure* is a probability measure on a suitable path space, for example
$$C([0,1],\mathbb{R}^{d}),$$
the space of continuous trajectories. An SDE induces a probability measure on this space, and an ODE with random initialization does as well.
This distinction is important because different dynamics may share the same one-time marginals, or only the same endpoint laws, while inducing different path measures. Examples include:
- a forward SDE and its associated probability-flow ODE, which share all one-time marginals;
- different couplings in flow matching, which share endpoint laws but generally differ in their intermediate marginals and trajectories;
- Schrödinger bridge formulations, which optimize over path-space laws with prescribed endpoints.
Thus, when we say that two models are equivalent, we must specify the level of equivalence:
- equality of endpoint laws,
- equality of all one-time marginals,
- or equality of path measures.
These notions are listed in order of increasing strength.
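A toy case makes the gap between marginal and path equivalence concrete. Take the stationary Ornstein–Uhlenbeck SDE $dX_t=-X_t\,dt+\sqrt{2}\,dW_t$ with $X_0\sim\mathcal{N}(0,1)$, so that every one-time marginal is $\mathcal{N}(0,1)$. Assuming the standard probability-flow drift $f-\tfrac{1}{2}g^{2}\nabla\log\rho_t$, the associated ODE has velocity $-x-\tfrac{1}{2}\cdot 2\cdot(-x)=0$, so its trajectories are constant. The sketch below (our example, not the survey's) compares the two through the joint law of $(X_0,X_1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x0 = rng.standard_normal(n)          # X_0 ~ N(0, 1), the stationary law

# SDE path: exact OU transition over unit time,
# X_1 = e^{-1} X_0 + sqrt(1 - e^{-2}) Z with Z ~ N(0, 1).
x1_sde = np.exp(-1.0) * x0 + np.sqrt(1.0 - np.exp(-2.0)) * rng.standard_normal(n)

# Probability-flow ODE path: zero velocity, so each trajectory stays at X_0.
x1_ode = x0.copy()

# Same one-time marginals: both terminal variances are close to 1.
print(x1_sde.var(), x1_ode.var())

# Different path measures: the endpoint correlation differs.
corr_sde = np.corrcoef(x0, x1_sde)[0, 1]   # close to exp(-1)
corr_ode = np.corrcoef(x0, x1_ode)[0, 1]   # equal to 1 up to rounding
print(corr_sde, corr_ode)
```

The two dynamics agree at the second level of equivalence listed above but not at the third.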
### E.6 Weak formulations and continuity equations
Measure-theoretic language is also useful because transport equations can be written in weak form even when densities are unavailable. Suppose $(\mu_t)_{t\in[0,1]}$ is a probability path and $v_t$ is a velocity field. The continuity equation
$$\partial_t\rho_t+\nabla\cdot(\rho_t v_t)=0$$
can be interpreted weakly as
$$\frac{d}{dt}\int\varphi(x)\,\mu_t(dx)=\int\nabla\varphi(x)\cdot v_t(x)\,\mu_t(dx) \tag{77}$$
for all smooth compactly supported test functions $\varphi$. This formulation remains meaningful even if $\mu_t$ does not admit a density.
Similarly, the Fokker–Planck equation associated with an SDE can be understood weakly through the infinitesimal generator\. This is one reason measure\-theoretic formulations are natural in stochastic transport problems\.
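For orientation, the weak forms can be recovered formally when a smooth density exists. The first line below multiplies the continuity equation by a test function $\varphi$ and integrates by parts, with boundary terms vanishing by compact support. The second line is the corresponding weak Fokker–Planck identity, written under the assumption that the SDE has the form $dX_t=f(X_t,t)\,dt+g(t)\,dW_t$ with scalar diffusion coefficient.

```latex
\begin{aligned}
\frac{d}{dt}\int \varphi\,\rho_t\,dx
  &= \int \varphi\,\partial_t\rho_t\,dx
   = -\int \varphi\,\nabla\cdot(\rho_t v_t)\,dx
   = \int \nabla\varphi\cdot v_t\,\rho_t\,dx, \\
\frac{d}{dt}\int \varphi(x)\,\mu_t(dx)
  &= \int \Big( f(x,t)\cdot\nabla\varphi(x)
     + \tfrac{1}{2}\,g(t)^{2}\,\Delta\varphi(x) \Big)\,\mu_t(dx).
\end{aligned}
```

In both lines all derivatives fall on $\varphi$, which is why the right-hand sides make sense for a general probability measure $\mu_t$.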
### E.7 Scores require densities
Unlike pushforwards or weak continuity equations, the score
$$s_t(x)=\nabla_x\log\rho_t(x)$$
requires that the law $\mu_t$ admit a sufficiently smooth positive density $\rho_t$. Thus score-based modeling is inherently a density-based construction, even if the broader transport framework can be expressed at the level of measures.
This is one reason diffusive perturbations are so useful: they regularize the distribution enough that densities become well\-defined for positive times in many settings\. By contrast, a deterministic transport map may push a measure onto a lower\-dimensional set or otherwise create singular structures unless additional regularity is imposed\.
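The regularizing effect of noise can be seen on an empirical measure $\frac{1}{n}\sum_i\delta_{x_i}$, which has no density and hence no score. After convolution with $\mathcal{N}(0,\sigma^{2})$ it becomes a Gaussian mixture whose score is available in closed form. The one-dimensional sketch below uses hypothetical atoms and function names, and checks the closed form against a finite difference of the log density.

```python
import numpy as np

def smoothed_log_density(x, data, sigma):
    # log of (1/n) sum_i N(x; data_i, sigma^2), computed with a log-sum-exp shift.
    z = -0.5 * ((x - data) / sigma) ** 2
    m = z.max()
    return m + np.log(np.exp(z - m).mean()) - np.log(sigma * np.sqrt(2.0 * np.pi))

def smoothed_score(x, data, sigma):
    # Gradient of the log density above: a softmax-weighted average
    # of (data_i - x) / sigma^2.
    z = -0.5 * ((x - data) / sigma) ** 2
    w = np.exp(z - z.max())
    w /= w.sum()
    return np.sum(w * (data - x)) / sigma ** 2

data = np.array([-1.0, 0.5, 2.0])    # hypothetical atoms of an empirical measure
x, sigma, h = 0.3, 0.4, 1e-5

# Central finite difference of the log density as an independent check.
fd = (smoothed_log_density(x + h, data, sigma)
      - smoothed_log_density(x - h, data, sigma)) / (2.0 * h)
print(smoothed_score(x, data, sigma), fd)
```

As $\sigma\to 0$ the factor $1/\sigma^{2}$ makes the score blow up near the atoms, consistent with the unsmoothed law having no score at all.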
### E.8 Marginal equivalence versus path equivalence
Several equivalence statements in the paper should be interpreted carefully:
- The probability-flow ODE and the forward SDE are equivalent at the level of one-time marginals, not generally at the level of path measures.
- Conditional flow matching learns the marginally correct velocity field for a prescribed path, but this depends on the chosen coupling and interpolation rule.
- Schrödinger bridges are naturally formulated as optimization problems over path-space measures, not merely over endpoint densities.
In this sense, a measure\-theoretic viewpoint helps prevent overinterpretation of formal similarities between methods\. Two models may share the same endpoint law or the same marginal path while differing substantially in how probability mass moves over time\.
### E.9 Takeaway
The term*measure\-theoretic*in the title of this paper does not mean that the survey develops generative modeling from full axiomatic probability theory\. Rather, it means that we systematically distinguish between laws and densities, formulate transport using pushforwards and path measures, and interpret equivalence statements at the correct level of generality\. This viewpoint is especially useful for unifying diffusion, score\-based models, probability\-flow ODEs, and flow matching, because these methods differ not only in their objectives and samplers, but also in the sense in which their underlying transports should be considered equivalent\.
## Appendix F Schrödinger Bridges, Entropic Transport, and Connections to Diffusion Models
This appendix provides background on Schrödinger bridges \(SBs\) and explains why they are relevant to the unified transport viewpoint developed in this survey\. At a high level, Schrödinger bridge problems seek the most likely stochastic evolution between prescribed endpoint laws, relative to a reference diffusion\. This places them naturally between diffusion\-based generative modeling, stochastic control, and entropy\-regularized optimal transport\.
### F.1 From optimal transport to stochastic bridges
Classical optimal transport asks for a map or coupling that moves a source law $\mu_0$ to a target law $\mu_1$ at minimal transport cost. In its dynamic formulation, one seeks a probability path $(\mu_t)$ and velocity field $v_t$ minimizing a kinetic-energy functional subject to the continuity equation. In this deterministic setting, transport is described by a path of measures and an associated deterministic flow.
Schrödinger bridges modify this picture by introducing stochasticity\. Rather than searching over deterministic transports, one searches over probability measures on path space that match prescribed endpoint laws while remaining as close as possible to a chosen*reference path measure*\. The resulting problem may be viewed as an entropy\-regularized version of dynamic optimal transport\.
### F.2 Reference process and path-space KL minimization
Let $\mathbb{Q}$ denote a reference path measure on trajectory space, for example the law of a diffusion process
$$dX_t=f(X_t,t)\,dt+g(t)\,dW_t. \tag{78}$$
The Schrödinger bridge problem seeks a new path measure $\mathbb{P}$ solving
$$\min_{\mathbb{P}}\ \mathrm{KL}(\mathbb{P}\,\|\,\mathbb{Q}) \qquad \text{subject to} \qquad X_0\sim\mu_0,\quad X_1\sim\mu_1. \tag{79}$$
Thus the goal is to find the path-space law that satisfies the endpoint constraints while deviating as little as possible, in relative entropy, from the reference process.
This formulation makes the path\-measure viewpoint explicit\. Unlike a problem stated only in terms of endpoint densities, \([79](https://arxiv.org/html/2605.06829#A6.E79)\) is an optimization over full stochastic evolutions\.
### F.3 Interpretation
The optimization problem \([79](https://arxiv.org/html/2605.06829#A6.E79)\) admits several complementary interpretations\.
#### \(i\) Most likely bridge\.
Among all stochastic processes that match the prescribed endpoints, the Schrödinger bridge is the one that is most likely relative to the reference diffusion\.
#### \(ii\) Entropic optimal transport\.
The path\-space KL penalty plays the role of an entropy regularizer\. As the noise level of the reference process tends to zero, one formally recovers deterministic optimal transport in suitable regimes\.
#### \(iii\) Controlled diffusion\.
The solution may be interpreted as a controlled version of the reference diffusion, where the control modifies the drift so as to satisfy the endpoint constraints\.
These viewpoints help explain why Schrödinger bridges connect naturally to both stochastic control and diffusion\-based generative modeling\.
### F.4 Controlled-diffusion formulation
Under appropriate regularity assumptions, the Schrödinger bridge can be represented as a controlled diffusion
$$dX_t=\big(f(X_t,t)+u_t(X_t)\big)\,dt+g(t)\,dW_t, \tag{80}$$
where the control field $u_t$ is chosen so that the resulting process matches the prescribed endpoint marginals while minimizing the path-space relative entropy. Informally, the bridge is obtained by altering the drift of the reference process as little as possible, measured in an entropy or control-energy sense.
This makes SBs conceptually close to diffusion\-based generative models: in both cases, one works with a stochastic process and modifies or reverses its drift to induce a desired distributional evolution\.
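A toy instance of (80) where the control is known in closed form: take $f=0$, $g=1$, $\mu_0=\delta_0$ (so the reference is standard Brownian motion with $X_1\sim\mathcal{N}(0,1)$), and target $\mu_1=\mathcal{N}(m,1)$. In this special case the bridge control is the constant $u_t(x)=m$, and by Girsanov's theorem the path-space KL is $\tfrac{1}{2}\int_0^1 u_t^{2}\,dt=m^{2}/2$, which equals $\mathrm{KL}(\mathcal{N}(m,1)\,\|\,\mathcal{N}(0,1))$. The Euler–Maruyama sketch below is an illustration under these assumptions; the function name and parameters are ours.

```python
import numpy as np

def simulate_controlled(f, u, g, x0, n_steps, rng):
    # Euler-Maruyama for dX = (f(X, t) + u(X, t)) dt + g(t) dW on [0, 1].
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        noise = rng.standard_normal(x.shape)
        x = x + (f(x, t) + u(x, t)) * dt + g(t) * np.sqrt(dt) * noise
    return x

rng = np.random.default_rng(0)
m = 1.5                                   # hypothetical target mean
x1 = simulate_controlled(
    f=lambda x, t: np.zeros_like(x),      # reference drift: Brownian motion
    u=lambda x, t: np.full_like(x, m),    # constant control for this toy case
    g=lambda t: 1.0,
    x0=np.zeros(100_000),                 # mu_0 is a point mass at 0
    n_steps=100,
    rng=rng,
)
control_energy = 0.5 * m ** 2             # (1/2) * integral of u^2 / g^2 over [0, 1]
print(x1.mean(), x1.var(), control_energy)
```

For general endpoint laws the control depends on both $t$ and $x$ and has no closed form, which is where learned parameterizations enter.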
### F.5 Connection to score-based diffusion modeling
The connection between Schrödinger bridges and score\-based diffusion models is most transparent when the reference process is itself a diffusion with tractable forward dynamics\. In that case:
- the reference process provides a forward path measure;
- the bridge solution modifies the drift to satisfy endpoint constraints;
- reverse-time and score-based quantities naturally appear in the description of the resulting dynamics.
This is one reason SBs are often viewed as a generalization of diffusion-style generative modeling. Standard score-based diffusion models typically fix a forward corruption process and learn reverse-time dynamics from data. Schrödinger bridge methods instead treat the endpoint-matching problem itself as a path-space optimization problem relative to a reference diffusion (De Bortoli et al., [2021](https://arxiv.org/html/2605.06829#bib.bib8)).
From the perspective of this survey, diffusion and SB methods share the same broad language:
1. a reference stochastic process,
2. a path of intermediate laws,
3. drift corrections or controls,
4. and generation by transporting probability mass between endpoints.
### F.6 Why SBs matter for this survey
Schrödinger bridges matter for at least three reasons\.
#### \(i\) They make the path\-measure viewpoint explicit\.
Many statements in diffusion modeling are phrased in terms of one\-time marginals\. SBs remind us that a stochastic generative model is more fundamentally a law on trajectories, not only a family of endpoint or marginal densities\.
#### \(ii\) They connect diffusion modeling to entropic transport\.
The bridge formulation clarifies that diffusion\-like models can be interpreted not merely as reverse\-time simulators, but as approximate solutions to a regularized transport problem on path space\.
#### \(iii\) They suggest principled conditioning mechanisms\.
Because SBs impose endpoint constraints directly at the level of path measures, they provide a natural conceptual framework for conditional generation and inverse problems, where one wants to transport between constrained distributions rather than merely sample unconditionally\.
### F.7 Relation to probability-flow ODEs and flow matching
Schrödinger bridges are not identical to probability\-flow ODEs or flow matching, but they sit nearby in the conceptual landscape\.
#### Probability\-flow ODEs\.
The probability\-flow ODE provides a deterministic dynamics with the same one\-time marginals as a diffusion\. By contrast, an SB is intrinsically a stochastic path\-space object\. Thus SBs are closer in spirit to diffusion models than to deterministic ODE transports, even though marginal\-equivalence ideas remain relevant\.
#### Flow matching\.
Flow matching begins by selecting a probability path and then learning the corresponding velocity field directly\. Schrödinger bridges instead define a stochastic optimality problem over path measures relative to a reference process\. Both emphasize path design, but they do so in different ways: flow matching treats the path as a modeling choice, while SBs derive a path from an optimization principle\.
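To make "learning the corresponding velocity field directly" concrete, the sketch below regresses the conditional velocity $x_1-x_0$ on $x_t$ at a single fixed time. The setup is our own toy choice: Gaussian endpoints, an independent coupling, and the linear interpolation $x_t=(1-t)x_0+t x_1$. In this jointly Gaussian case the regression target $\mathbb{E}[x_1-x_0\mid x_t]$ is linear in $x_t$, so ordinary least squares can be compared against a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200_000, 0.3
m, s = 2.0, 0.5                       # hypothetical target mu_1 = N(m, s^2)

x0 = rng.standard_normal(n)           # mu_0 = N(0, 1)
x1 = m + s * rng.standard_normal(n)   # independent coupling (an assumption)
xt = (1.0 - t) * x0 + t * x1          # linear interpolation at time t
target = x1 - x0                      # conditional velocity d/dt x_t

# Least-squares fit of v(x) = a * x + b at this fixed time t.
A = np.stack([xt, np.ones(n)], axis=1)
(a_hat, b_hat), *_ = np.linalg.lstsq(A, target, rcond=None)

# Closed-form E[x1 - x0 | x_t] for jointly Gaussian variables:
# slope = Cov(x1 - x0, x_t) / Var(x_t), intercept fixed by the means.
a_true = (t * s**2 - (1.0 - t)) / ((1.0 - t) ** 2 + t**2 * s**2)
b_true = m - a_true * t * m
print(a_hat, a_true, b_hat, b_true)
```

Changing the coupling (for example, correlating `x0` and `x1`) changes `a_true` and `b_true`, which illustrates how the learned field depends on the chosen path.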
### F.8 Computational viewpoint
In practice, exact solution of the Schrödinger bridge problem is often intractable, and modern algorithms rely on iterative approximations, alternating updates, or score\-based parameterizations of forward and backward quantities\. The resulting methods can be seen as blending ideas from:
- diffusion modeling,
- iterative proportional fitting or Sinkhorn-like updates on path space,
- stochastic control,
- and score estimation.
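The iterative-proportional-fitting idea is easiest to see in a static, finite-state analogue: minimize $\mathrm{KL}(\pi\,\|\,R)$ over couplings $\pi$ with prescribed marginals, where $R_{ij}\propto\exp(-C_{ij}/\varepsilon)$ plays the role of the reference measure. The sketch below is this discrete Sinkhorn iteration, not a path-space algorithm; the grid, cost, marginals, and $\varepsilon$ are illustrative assumptions.

```python
import numpy as np

def sinkhorn(mu0, mu1, cost, eps, n_iters=500):
    # Unnormalized reference coupling: K_ij = exp(-C_ij / eps).
    K = np.exp(-cost / eps)
    a = np.ones_like(mu0)
    b = np.ones_like(mu1)
    for _ in range(n_iters):
        a = mu0 / (K @ b)        # rescale rows to match the first marginal
        b = mu1 / (K.T @ a)      # rescale columns to match the second marginal
    return a[:, None] * K * b[None, :]

x = np.linspace(-1.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2          # squared-distance cost (assumed)
mu0 = np.full(5, 0.2)
mu1 = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
plan = sinkhorn(mu0, mu1, cost, eps=1.0)
print(plan.sum(axis=1), plan.sum(axis=0))      # should reproduce mu0 and mu1
```

Each half-step is a KL projection onto one marginal constraint. Path-space methods replace these closed-form rescalings with learned forward and backward drifts.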
For the purposes of this survey, the key point is not the algorithmic taxonomy but the conceptual role of SBs: they provide a principled path\-space formulation that helps organize a broader family of diffusion\-inspired transport methods\.
### F.9 Limitations of the analogy
Although Schrödinger bridges and diffusion\-based generative models are closely related, they should not be conflated\. In particular:
- a standard diffusion model is not automatically solving an SB problem exactly;
- SB formulations involve explicit endpoint constraints relative to a reference path measure;
- and the optimization criterion is path-space KL minimization, not simply score matching or likelihood maximization.
Thus the relationship is best viewed as one of conceptual overlap and possible algorithmic approximation, rather than literal identity\.
### F.10 Takeaway
Schrödinger bridges provide a useful conceptual extension of the transport viewpoint in this survey. They make explicit that stochastic generative modeling can be formulated as optimization over path measures relative to a reference diffusion, subject to endpoint constraints. This strengthens the connections between diffusion models, stochastic control, and entropic optimal transport, and helps explain why path design, conditioning, and drift correction play such central roles across modern generative modeling methods (De Bortoli et al., [2021](https://arxiv.org/html/2605.06829#bib.bib8)).
## References
- A comprehensive survey on diffusion models and their applications\.arXiv preprint arXiv:2408\.10207\.External Links:[Link](https://arxiv.org/abs/2408.10207)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1)\.
- A\. Alaa, B\. Van Breugel, E\. S\. Saveliev, and M\. van der Schaar \(2022\)How faithful is your synthetic data? sample\-level metrics for evaluating and auditing generative models\.InProceedings of the 39th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.162,pp\. 290–306\.External Links:[Link](https://proceedings.mlr.press/v162/alaa22a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- M\. S\. Albergo, N\. M\. Boffi, and E\. Vanden\-Eijnden \(2025\)Stochastic interpolants: a unifying framework for flows and diffusions\.Journal of Machine Learning Research26\(209\),pp\. 1–80\.External Links:[Link](https://jmlr.org/papers/v26/23-1605.html)Cited by:[§D\.1](https://arxiv.org/html/2605.06829#A4.SS1.p2.4),[§D\.9](https://arxiv.org/html/2605.06829#A4.SS9.p2.1),[Appendix D](https://arxiv.org/html/2605.06829#A4.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[4th item](https://arxiv.org/html/2605.06829#S8.I1.i4.p1.1),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px1.p1.2),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1)\.
- B\. D\. O\. Anderson \(1982\)Reverse\-time diffusion equation models\.Stochastic Processes and their Applications12\(3\),pp\. 313–326\.External Links:[Document](https://dx.doi.org/10.1016/0304-4149%2882%2990051-5)Cited by:[§A\.1](https://arxiv.org/html/2605.06829#A1.SS1.p2.1),[3rd item](https://arxiv.org/html/2605.06829#S1.I5.i3.p1.1),[§4\.1](https://arxiv.org/html/2605.06829#S4.SS1.p1.2),[§4\.1](https://arxiv.org/html/2605.06829#S4.SS1.p2.1)\.
- M\. Arjovsky and L\. Bottou \(2017\)Towards principled methods for training generative adversarial networks\.arXiv preprint arXiv:1701\.04862\.External Links:[Link](https://arxiv.org/abs/1701.04862)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p3.1)\.
- J\. Austin, D\. D\. Johnson, J\. Ho, D\. Tarlow, and R\. van den Berg \(2021\)Structured denoising diffusion models in discrete state\-spaces\.InAdvances in Neural Information Processing Systems,Vol\.34\.External Links:[Link](https://proceedings.neurips.cc/paper/2021/hash/958c530554f78bcd8e97125b70e6973d-Abstract.html)Cited by:[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§3\.1](https://arxiv.org/html/2605.06829#S3.SS1.p2.1),[§3](https://arxiv.org/html/2605.06829#S3.p1.4)\.
- A\. Bansal, E\. Borgnia, H\. Chu, J\. S\. Li, H\. Kazemi, F\. Huang, M\. Goldblum, J\. Geiping, and T\. Goldstein \(2022\)Cold diffusion: inverting arbitrary image transforms without noise\.arXiv preprint arXiv:2208\.09392\.External Links:[Link](https://arxiv.org/abs/2208.09392)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[4th item](https://arxiv.org/html/2605.06829#S8.I1.i4.p1.1),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1)\.
- N\. M\. Boffi, M\. S\. Albergo, and E\. Vanden\-Eijnden \(2025\)Flow map matching with stochastic interpolants: a mathematical framework for consistency models\.Transactions on Machine Learning Research\.External Links:[Link](https://mlanthology.org/tmlr/2025/boffi2025tmlr-flow/)Cited by:[§D\.5](https://arxiv.org/html/2605.06829#A4.SS5.p2.1)\.
- A\. Campbell, J\. Benton, V\. De Bortoli, T\. Rainforth, G\. Deligiannidis, and A\. Doucet \(2022\)A continuous time framework for discrete denoising models\.InAdvances in Neural Information Processing Systems,Vol\.35\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2022/hash/b5b528767aa35f5b1a60fe0aaeca0563-Abstract-Conference.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§10\.3](https://arxiv.org/html/2605.06829#S10.SS3.p1.1),[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1),[§3](https://arxiv.org/html/2605.06829#S3.p1.4)\.
- H\. Cao, C\. Tan, Z\. Gao, Y\. Xu, G\. Chen, P\. Heng, and S\. Z\. Li \(2022\)A survey on generative diffusion model\.arXiv preprint arXiv:2209\.02646\.External Links:[Link](https://arxiv.org/abs/2209.02646)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1)\.
- F\. Cheema and R\. Urner \(2023\)Precision recall cover: a method for assessing generative models\.InProceedings of The 26th International Conference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, Vol\.206,pp\. 6571–6594\.External Links:[Link](https://proceedings.mlr.press/v206/cheema23a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- H\. Chen, K\. Zhang, H\. Tan, Z\. Xu, F\. Luan, L\. Guibas, G\. Wetzstein, and S\. Bi \(2025\)Gaussian mixture flow matching models\.InProceedings of the 42nd International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.267,pp\. 9783–9802\.External Links:[Link](https://proceedings.mlr.press/v267/chen25cl.html)Cited by:[4th item](https://arxiv.org/html/2605.06829#S9.I3.i4.p1.1),[§9\.5](https://arxiv.org/html/2605.06829#S9.SS5.p2.1)\.
- M\. Chen, K\. Huang, T\. Zhao, and M\. Wang \(2023a\)Score approximation, estimation and distribution recovery of diffusion models on low\-dimensional data\.InProceedings of the 40th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.202,pp\. 4672–4712\.External Links:[Link](https://proceedings.mlr.press/v202/chen23o.html)Cited by:[§9\.3](https://arxiv.org/html/2605.06829#S9.SS3.p2.1),[§9](https://arxiv.org/html/2605.06829#S9.p1.1)\.
- R\. T\. Q\. Chen, Y\. Rubanova, J\. Bettencourt, and D\. Duvenaud \(2018\)Neural ordinary differential equations\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://arxiv.org/abs/1806.07366)Cited by:[§B\.7](https://arxiv.org/html/2605.06829#A2.SS7.p1.3),[item 3](https://arxiv.org/html/2605.06829#S1.I3.i3.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I5.i1.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§1\.8](https://arxiv.org/html/2605.06829#S1.SS8.p1.1),[§2\.4](https://arxiv.org/html/2605.06829#S2.SS4.p1.5),[§2](https://arxiv.org/html/2605.06829#S2.p1.1),[§6\.3](https://arxiv.org/html/2605.06829#S6.SS3.p1.1),[§6](https://arxiv.org/html/2605.06829#S6.p1.1),[2nd item](https://arxiv.org/html/2605.06829#S8.I1.i2.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.3.3.2.1.1)\.
- T\. Chen, R\. Zhang, and G\. Hinton \(2023b\)Analog bits: generating discrete data using diffusion models with self\-conditioning\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/2208.04202)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1)\.
- C\. Cheng, J\. Li, J\. Peng, and G\. Liu \(2024\)Categorical flow matching on statistical manifolds\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/62a58f2130894e44e8a272c563a2c6f1-Abstract-Conference.html)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- H\. Chung, J\. Kim, M\. T\. McCann, M\. L\. Klasky, and J\. C\. Ye \(2023\)Diffusion posterior sampling for general noisy inverse problems\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=OnD9zGAGT0k)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1)\.
- H\. Chung and J\. C\. Ye \(2022\)Score\-based diffusion models for accelerated mri\.Medical Image Analysis80,pp\. 102479\.External Links:[Link](https://pubmed.ncbi.nlm.nih.gov/35696876/)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3)\.
- C\. Corneanu, R\. Gadde, and A\. M\. Martinez \(2024\)LatentPaint: image inpainting in latent space with diffusion models\.InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,pp\. 4334–4343\.External Links:[Link](https://openaccess.thecvf.com/content/WACV2024/html/Corneanu_LatentPaint_Image_Inpainting_in_Latent_Space_With_Diffusion_Models_WACV_2024_paper.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3)\.
- O\. Davis, S\. Kessler, M\. Petrache, I\. I\. Ceylan, M\. Bronstein, and A\. J\. Bose \(2024\)Fisher flow matching for generative modeling over discrete data\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/fadec8f2e65f181d777507d1df69b92f-Abstract-Conference.html)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- V\. De Bortoli, J\. Thornton, J\. Heng, and A\. Doucet \(2021\)Diffusion schrödinger bridge with applications to score\-based generative modeling\.arXiv preprint arXiv:2106\.01357\.External Links:[Link](https://arxiv.org/abs/2106.01357)Cited by:[§F\.10](https://arxiv.org/html/2605.06829#A6.SS10.p1.1),[§F\.5](https://arxiv.org/html/2605.06829#A6.SS5.p2.1),[3rd item](https://arxiv.org/html/2605.06829#S1.I2.i3.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1),[§3\.5](https://arxiv.org/html/2605.06829#S3.SS5.p2.1)\.
- L\. Dinh, D\. Krueger, and Y\. Bengio \(2015\)NICE: non\-linear independent components estimation\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/1410.8516)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p4.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- L\. Dinh, J\. Sohl\-Dickstein, and S\. Bengio \(2017\)Density estimation using real NVP\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/1605.08803)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p4.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- T\. Dockhorn, A\. Vahdat, and K\. Kreis \(2022\)Score\-based generative modeling with critically\-damped langevin diffusion\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/2112.07068)Cited by:[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§10\.2](https://arxiv.org/html/2605.06829#S10.SS2.p1.1),[§3\.2](https://arxiv.org/html/2605.06829#S3.SS2.SSS0.Px2.p2.1)\.
- F\. Eijkelboom, G\. Bartosh, C\. A\. Naesseth, M\. Welling, and J\. van de Meent \(2024\)Variational flow matching for graph generation\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/15b780350b302a1bf9a3bd273f5c15a4-Abstract-Conference.html)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- R\. Feng, C\. Yu, W\. Deng, P\. Hu, and T\. Wu \(2025\)On the guidance of flow matching\.InProceedings of the 42nd International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.267,pp\. 16993–17029\.External Links:[Link](https://proceedings.mlr.press/v267/feng25s.html)Cited by:[§D\.10](https://arxiv.org/html/2605.06829#A4.SS10.p1.1)\.
- I\. Gat, T\. Remez, N\. Shaul, F\. Kreuk, R\. T\. Q\. Chen, G\. Synnaeve, Y\. Adi, and Y\. Lipman \(2024\)Discrete flow matching\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/f0d629a734b56a642701bba7bc8bb3ed-Abstract-Conference.html)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- I\. Goodfellow, Y\. Bengio, and A\. Courville \(2016\)Deep learning\.MIT Press\.External Links:[Link](https://www.deeplearningbook.org/)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px1.p1.6),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1)\.
- I\. Goodfellow, J\. Pouget\-Abadie, M\. Mirza, B\. Xu, D\. Warde\-Farley, S\. Ozair, A\. Courville, and Y\. Bengio \(2014\)Generative adversarial nets\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://arxiv.org/abs/1406.2661)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p3.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- W\. Grathwohl, R\. T\. Q\. Chen, J\. Bettencourt, I\. Sutskever, and D\. Duvenaud \(2019\)FFJORD: free\-form continuous dynamics for scalable reversible generative models\.arXiv preprint arXiv:1810\.01367\.External Links:[Link](https://arxiv.org/abs/1810.01367)Cited by:[item 3](https://arxiv.org/html/2605.06829#S1.I3.i3.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I5.i1.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§1\.8](https://arxiv.org/html/2605.06829#S1.SS8.p1.1),[§6\.3](https://arxiv.org/html/2605.06829#S6.SS3.p1.3),[§6](https://arxiv.org/html/2605.06829#S6.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.3.3.2.1.1)\.
- A\. Graves, R\. K\. Srivastava, T\. Atkinson, and F\. Gomez \(2023\)Bayesian flow networks\.arXiv preprint arXiv:2308\.07037\.External Links:[Link](https://arxiv.org/abs/2308.07037)Cited by:[§D\.9](https://arxiv.org/html/2605.06829#A4.SS9.p2.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[4th item](https://arxiv.org/html/2605.06829#S8.I1.i4.p1.1),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px1.p1.2)\.
- D\. Haviv, A\. Pooladian, D\. Pe’Er, and B\. Amos \(2025\)Wasserstein flow matching: generative modeling over families of distributions\.InProceedings of the 42nd International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.267,pp\. 22238–22258\.External Links:[Link](https://proceedings.mlr.press/v267/haviv25a.html)Cited by:[§10\.3](https://arxiv.org/html/2605.06829#S10.SS3.p1.1),[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- M\. Heusel, H\. Ramsauer, T\. Unterthiner, B\. Nessler, G\. Klambauer, and S\. Hochreiter \(2017\)GANs trained by a two time\-scale update rule converge to a local nash equilibrium\.InAdvances in Neural Information Processing Systems,Vol\.30\.External Links:[Link](https://proceedings.neurips.cc/paper/2017/hash/8a1d694707eb0fefe65871369074926d-Abstract.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- J\. Ho, A\. Jain, and P\. Abbeel \(2020\)Denoising diffusion probabilistic models\.arXiv preprint arXiv:2006\.11239\.External Links:[Link](https://arxiv.org/abs/2006.11239)Cited by:[§C\.6](https://arxiv.org/html/2605.06829#A3.SS6.p1.1),[§C\.7](https://arxiv.org/html/2605.06829#A3.SS7.p1.2),[Appendix C](https://arxiv.org/html/2605.06829#A3.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I2.i1.p1.1),[item 4](https://arxiv.org/html/2605.06829#S1.I3.i4.p1.1),[item 3](https://arxiv.org/html/2605.06829#S1.I4.i3.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px2.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§2\.2](https://arxiv.org/html/2605.06829#S2.SS2.p1.1),[§2\.3](https://arxiv.org/html/2605.06829#S2.SS3.p1.9),[§2](https://arxiv.org/html/2605.06829#S2.p1.1),[§3\.1](https://arxiv.org/html/2605.06829#S3.SS1.p1.2),[§3\.3](https://arxiv.org/html/2605.06829#S3.SS3.p1.6),[§3\.4](https://arxiv.org/html/2605.06829#S3.SS4.SSS0.Px1.p1.3),[§3](https://arxiv.org/html/2605.06829#S3.p1.4),[§4\.3](https://arxiv.org/html/2605.06829#S4.SS3.p1.6),[§5\.2](https://arxiv.org/html/2605.06829#S5.SS2.p1.3),[§5\.4](https://arxiv.org/html/2605.06829#S5.SS4.p1.2),[§5](https://arxiv.org/html/2605.06829#S5.p1.2),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.5.7.1.1.1.1)\.
- P\. Holderrieth and E\. Erives \(2025\)An introduction to flow matching and diffusion models\.arXiv preprint arXiv:2506\.02070\.External Links:[Link](https://arxiv.org/abs/2506.02070)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1)\.
- E\. Hoogeboom, A\. A\. Gritsenko, J\. Bastings, B\. Poole, R\. van den Berg, and T\. Salimans \(2022\)Autoregressive diffusion models\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/2110.02037)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§10\.2](https://arxiv.org/html/2605.06829#S10.SS2.p1.1),[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1)\.
- A\. Hyvärinen \(2005\)Estimation of non\-normalized statistical models by score matching\.Journal of Machine Learning Research6,pp\. 695–709\.External Links:[Link](https://jmlr.org/papers/v6/hyvarinen05a.html)Cited by:[§C\.1](https://arxiv.org/html/2605.06829#A3.SS1.p1.5),[Appendix C](https://arxiv.org/html/2605.06829#A3.p1.1),[item 2](https://arxiv.org/html/2605.06829#S1.I3.i2.p1.2),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§2\.5](https://arxiv.org/html/2605.06829#S2.SS5.p1.5),[§5\.1](https://arxiv.org/html/2605.06829#S5.SS1.p1.1)\.
- A\. Jalal, M\. Arvinte, G\. Daras, E\. Price, A\. G\. Dimakis, and J\. I\. Tamir \(2021\)Robust compressed sensing MRI with deep generative priors\.InAdvances in Neural Information Processing Systems,Vol\.34\.External Links:[Link](https://proceedings.neurips.cc/paper/2021/hash/7d6044e95a16761171b130dcb476a43e-Abstract.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3)\.
- Y\. Janati, B\. Moufad, A\. Durmus, E\. Moulines, and J\. Olsson \(2024\)Divide\-and\-conquer posterior sampling for denoising diffusion priors\.InAdvances in Neural Information Processing Systems,Vol\.37,pp\. 97408–97444\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/file/b0ae046e198a5e43141519868a959c74-Paper-Conference.pdf)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p2.1)\.
- M\. Jiralerspong, J\. Bose, I\. Gemp, C\. Qin, Y\. Bachrach, and G\. Gidel \(2023\)Feature likelihood divergence: evaluating the generalization of generative models using samples\.InAdvances in Neural Information Processing Systems,Vol\.36,pp\. 33095–33119\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/68b138608ef80b08d65b1bd9594d9559-Paper-Conference.pdf)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- T\. Karras, M\. Aittala, T\. Aila, and S\. Laine \(2022\)Elucidating the design space of diffusion\-based generative models\.arXiv preprint arXiv:2206\.00364\.External Links:[Link](https://arxiv.org/abs/2206.00364)Cited by:[item 4](https://arxiv.org/html/2605.06829#S1.I3.i4.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§3\.3](https://arxiv.org/html/2605.06829#S3.SS3.p2.1),[§5\.3](https://arxiv.org/html/2605.06829#S5.SS3.p1.3),[§6\.2](https://arxiv.org/html/2605.06829#S6.SS2.SSS0.Px2.p1.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1),[4th item](https://arxiv.org/html/2605.06829#S9.I3.i4.p1.1),[§9\.3](https://arxiv.org/html/2605.06829#S9.SS3.p1.1),[§9\.4](https://arxiv.org/html/2605.06829#S9.SS4.p1.1)\.
- B\. Kawar, M\. Elad, S\. Ermon, and J\. Song \(2022\)Denoising diffusion restoration models\.InAdvances in Neural Information Processing Systems,Vol\.35\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2022/hash/95504595b6169131b6ed6cd72eb05616-Abstract-Conference.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3)\.
- G\. Kerrigan, G\. Migliorini, and P\. Smyth \(2024\)Functional flow matching\.InProceedings of The 27th International Conference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, Vol\.238,pp\. 3934–3942\.External Links:[Link](https://proceedings.mlr.press/v238/kerrigan24a.html)Cited by:[§10\.3](https://arxiv.org/html/2605.06829#S10.SS3.p1.1),[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1),[§7\.5](https://arxiv.org/html/2605.06829#S7.SS5.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1)\.
- M\. Khayatkhoei and W\. Abdalmageed \(2023\)Emergent asymmetry of precision and recall for measuring fidelity and diversity of generative models in high dimensions\.InProceedings of the 40th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.202,pp\. 16326–16343\.External Links:[Link](https://proceedings.mlr.press/v202/khayatkhoei23a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1)\.
- D\. Kim, C\. Lai, W\. Liao, N\. Murata, Y\. Takida, T\. Uesaka, Y\. He, Y\. Mitsufuji, and S\. Ermon \(2024a\)Consistency trajectory models: learning probability flow ODE trajectory of diffusion\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=ymjI8feDTD)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px3.p1.1)\.
- D\. Kim, M\. Kwon, and Y\. Uh \(2024b\)Attribute based interpretable evaluation metrics for generative models\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 24271–24293\.External Links:[Link](https://proceedings.mlr.press/v235/kim24t.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- D\. P\. Kingma, T\. Salimans, B\. Poole, and J\. Ho \(2021\)Variational diffusion models\.arXiv preprint arXiv:2107\.00630\.External Links:[Link](https://arxiv.org/abs/2107.00630)Cited by:[§C\.6](https://arxiv.org/html/2605.06829#A3.SS6.p1.3),[§C\.7](https://arxiv.org/html/2605.06829#A3.SS7.p1.2),[§C\.8](https://arxiv.org/html/2605.06829#A3.SS8.p1.1),[Appendix C](https://arxiv.org/html/2605.06829#A3.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I2.i1.p1.1),[item 4](https://arxiv.org/html/2605.06829#S1.I3.i4.p1.1),[item 3](https://arxiv.org/html/2605.06829#S1.I4.i3.p1.1),[2nd item](https://arxiv.org/html/2605.06829#S1.I5.i2.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px2.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§5\.4](https://arxiv.org/html/2605.06829#S5.SS4.p1.2)\.
- D\. P\. Kingma and M\. Welling \(2014\)Auto\-encoding variational Bayes\.arXiv preprint arXiv:1312\.6114\.External Links:[Link](https://arxiv.org/abs/1312.6114)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p2.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- T\. Kynkäänniemi, T\. Karras, S\. Laine, J\. Lehtinen, and T\. Aila \(2019\)Improved precision and recall metric for assessing generative models\.InAdvances in Neural Information Processing Systems,Vol\.32\.External Links:[Link](https://proceedings.neurips.cc/paper/2019/hash/0234c510bc6d908b28c70ff313743079-Abstract.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- C\. Léonard \(2014\)A survey of the Schrödinger problem and some of its connections with optimal transport\.Discrete and Continuous Dynamical Systems \- Series A34\(4\),pp\. 1533–1574\.External Links:[Link](https://arxiv.org/abs/1308.0215)Cited by:[3rd item](https://arxiv.org/html/2605.06829#S1.I2.i3.p1.1)\.
- H\. Li and M\. Pereira \(2024\)Solving inverse problems via diffusion optimal control\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/file/86655bc516148e311bcfcf88f1744de7-Paper-Conference.pdf)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p2.1)\.
- Y\. Liang, J\. Wu, Y\. Lai, and Y\. Qin \(2024\)Efficient precision and recall metrics for assessing generative models using hubness\-aware sampling\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 29682–29699\.External Links:[Link](https://proceedings.mlr.press/v235/liang24f.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- Y\. Lipman, R\. T\. Q\. Chen, H\. Ben\-Hamu, M\. Nickel, and M\. Le \(2022\)Flow matching for generative modeling\.arXiv preprint arXiv:2210\.02747\.External Links:[Link](https://arxiv.org/abs/2210.02747)Cited by:[§D\.1](https://arxiv.org/html/2605.06829#A4.SS1.p2.4),[§D\.5](https://arxiv.org/html/2605.06829#A4.SS5.p2.1),[Appendix D](https://arxiv.org/html/2605.06829#A4.p1.1),[2nd item](https://arxiv.org/html/2605.06829#S1.I2.i2.p1.1),[item 1](https://arxiv.org/html/2605.06829#S1.I3.i1.p1.3),[item 2](https://arxiv.org/html/2605.06829#S1.I3.i2.p1.2),[item 1](https://arxiv.org/html/2605.06829#S1.I4.i1.p1.1),[item 3](https://arxiv.org/html/2605.06829#S1.I4.i3.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px2.p1.1),[§1\.8](https://arxiv.org/html/2605.06829#S1.SS8.p1.1),[§10\.2](https://arxiv.org/html/2605.06829#S10.SS2.p1.1),[item 2](https://arxiv.org/html/2605.06829#S2.I2.i2.p1.1),[§2\.4](https://arxiv.org/html/2605.06829#S2.SS4.p1.6),[§2](https://arxiv.org/html/2605.06829#S2.p1.1),[§3\.5](https://arxiv.org/html/2605.06829#S3.SS5.p2.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p1.2),[§6\.4](https://arxiv.org/html/2605.06829#S6.SS4.p1.1),[§7\.1](https://arxiv.org/html/2605.06829#S7.SS1.p3.3),[§7\.1](https://arxiv.org/html/2605.06829#S7.SS1.p4.1),[§7\.2](https://arxiv.org/html/2605.06829#S7.SS2.p2.5),[§7\.4](https://arxiv.org/html/2605.06829#S7.SS4.p1.2),[§7](https://arxiv.org/html/2605.06829#S7.p1.1),[3rd item](https://arxiv.org/html/2605.06829#S8.I1.i3.p1.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.4.4.2.1.1)\.
- Y\. Lipman, M\. Havasi, P\. Holderrieth, N\. Shaul, M\. Le, B\. Karrer, R\. T\. Q\. Chen, D\. Lopez\-Paz, H\. Ben\-Hamu, and I\. Gat \(2024\)Flow matching guide and code\.arXiv preprint arXiv:2412\.06264\.External Links:[Link](https://arxiv.org/abs/2412.06264)Cited by:[Appendix D](https://arxiv.org/html/2605.06829#A4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1),[3rd item](https://arxiv.org/html/2605.06829#S8.I1.i3.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.4.4.2.1.1)\.
- X\. Liu, C\. Gong, and Q\. Liu \(2022\)Flow straight and fast: learning to generate and transfer data with rectified flow\.arXiv preprint arXiv:2209\.03003\.External Links:[Link](https://arxiv.org/abs/2209.03003)Cited by:[§D\.10](https://arxiv.org/html/2605.06829#A4.SS10.p1.1),[item 1](https://arxiv.org/html/2605.06829#S1.I3.i1.p1.3),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§10\.2](https://arxiv.org/html/2605.06829#S10.SS2.p1.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p1.2),[§7\.6](https://arxiv.org/html/2605.06829#S7.SS6.p1.1),[§7](https://arxiv.org/html/2605.06829#S7.p1.1),[3rd item](https://arxiv.org/html/2605.06829#S8.I1.i3.p1.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.5.5.2.1.1),[4th item](https://arxiv.org/html/2605.06829#S9.I3.i4.p1.1),[§9\.4](https://arxiv.org/html/2605.06829#S9.SS4.p1.1)\.
- A\. Lou, C\. Meng, and S\. Ermon \(2024\)Discrete diffusion modeling by estimating the ratios of the data distribution\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 32819–32848\.External Links:[Link](https://proceedings.mlr.press/v235/lou24a.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[4th item](https://arxiv.org/html/2605.06829#S8.I1.i4.p1.1),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1)\.
- A\. Lugmayr, M\. Danelljan, A\. Romero, F\. Yu, R\. Timofte, and L\. Van Gool \(2022\)RePaint: inpainting using denoising diffusion probabilistic models\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 11461–11471\.External Links:[Link](https://openaccess.thecvf.com/content/CVPR2022/html/Lugmayr_RePaint_Inpainting_Using_Denoising_Diffusion_Probabilistic_Models_CVPR_2022_paper.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1)\.
- W\. Luo, Z\. Huang, Z\. Geng, J\. Z\. Kolter, and G\. Qi \(2024\)One\-step diffusion distillation through score implicit matching\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2024/hash/d107ca794d83c8242e357e6a43a068f4-Abstract-Conference.html)Cited by:[§10\.4](https://arxiv.org/html/2605.06829#S10.SS4.p1.1),[§9\.4](https://arxiv.org/html/2605.06829#S9.SS4.p2.1)\.
- Z\. Ma, Y\. Zhang, G\. Jia, L\. Zhao, Y\. Ma, M\. Ma, G\. Liu, K\. Zhang, J\. Li, and B\. Zhou \(2024\)Efficient diffusion models: a comprehensive survey from principles to practices\.arXiv preprint arXiv:2410\.11795\.External Links:[Link](https://arxiv.org/abs/2410.11795)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1)\.
- C\. Meng, Y\. He, Y\. Song, J\. Song, J\. Wu, J\. Zhu, and S\. Ermon \(2022\)SDEdit: guided image synthesis and editing with stochastic differential equations\.InInternational Conference on Learning Representations,External Links:[Link](https://arxiv.org/abs/2108.01073)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1)\.
- L\. Mescheder, A\. Geiger, and S\. Nowozin \(2018\)Which training methods for GANs do actually converge?\.InProceedings of the 35th International Conference on Machine Learning \(ICML\),Proceedings of Machine Learning Research, Vol\.80,pp\. 3481–3490\.External Links:[Link](https://proceedings.mlr.press/v80/mescheder18a.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p3.1)\.
- M\. F\. Naeem, S\. Oh, Y\. Uh, Y\. Choi, and J\. Yoo \(2020\)Reliable fidelity and diversity metrics for generative models\.InProceedings of the 37th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.119,pp\. 7176–7185\.External Links:[Link](https://proceedings.mlr.press/v119/naeem20a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- A\. Q\. Nichol and P\. Dhariwal \(2021\)Improved denoising diffusion probabilistic models\.arXiv preprint arXiv:2102\.09672\.External Links:[Link](https://arxiv.org/abs/2102.09672)Cited by:[§C\.6](https://arxiv.org/html/2605.06829#A3.SS6.p1.3),[§C\.8](https://arxiv.org/html/2605.06829#A3.SS8.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§3\.1](https://arxiv.org/html/2605.06829#S3.SS1.p1.4),[§3](https://arxiv.org/html/2605.06829#S3.p1.4),[§5\.2](https://arxiv.org/html/2605.06829#S5.SS2.p1.4),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.5.7.1.1.1.1)\.
- A\. Okhotin, D\. Molchanov, V\. Arkhipkin, G\. Bartosh, V\. Ohanesian, A\. Alanov, and D\. P\. Vetrov \(2023\)Star\-shaped denoising diffusion probabilistic models\.InAdvances in Neural Information Processing Systems,Vol\.36\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/hash/1fcefa894924bb1688041b7a26fb8aea-Abstract-Conference.html)Cited by:[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§3\.1](https://arxiv.org/html/2605.06829#S3.SS1.p2.1)\.
- K\. Oko, S\. Akiyama, and T\. Suzuki \(2023\)Diffusion models are minimax optimal distribution estimators\.InProceedings of the 40th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.202,pp\. 26517–26582\.External Links:[Link](https://proceedings.mlr.press/v202/oko23a.html)Cited by:[1st item](https://arxiv.org/html/2605.06829#S9.I3.i1.p1.1),[§9\.2](https://arxiv.org/html/2605.06829#S9.SS2.p1.5),[§9](https://arxiv.org/html/2605.06829#S9.p1.1)\.
- K\. Pandey, R\. Yang, and S\. Mandt \(2024\)Fast samplers for inverse problems in iterative refinement models\.InAdvances in Neural Information Processing Systems,Vol\.37,pp\. 26872–26914\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/file/2f46ef5725a8eca24f7f24a17955ad1a-Paper-Conference.pdf)Cited by:[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p2.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1)\.
- G\. Papamakarios, E\. Nalisnick, D\. J\. Rezende, S\. Mohamed, and B\. Lakshminarayanan \(2021\)Normalizing flows for probabilistic modeling and inference\.Journal of Machine Learning Research22\(57\),pp\. 1–64\.External Links:[Link](https://jmlr.org/papers/v22/19-1028.html)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px1.p1.6),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p4.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- Z\. Patel, J\. DeLoye, and L\. Mathias \(2024\)Exploring diffusion and flow matching under generator matching\.arXiv preprint arXiv:2412\.11024\.External Links:[Link](https://arxiv.org/abs/2412.11024)Cited by:[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px4.p1.1)\.
- O\. Räisä, B\. Van Breugel, and M\. Van Der Schaar \(2025\)Position: all current generative fidelity and diversity metrics are flawed\.InProceedings of the 42nd International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.267,pp\. 82016–82050\.External Links:[Link](https://proceedings.mlr.press/v267/raisa25a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1)\.
- D\. J\. Rezende, S\. Mohamed, and D\. Wierstra \(2014\)Stochastic backpropagation and approximate inference in deep generative models\.arXiv preprint arXiv:1401\.4082\.External Links:[Link](https://arxiv.org/abs/1401.4082)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p2.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- L\. Rout, N\. Raoof, G\. Daras, C\. Caramanis, A\. Dimakis, and S\. Shakkottai \(2023\)Solving linear inverse problems provably via posterior sampling with latent diffusion models\.InAdvances in Neural Information Processing Systems,Vol\.36,pp\. 49960–49990\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/9c70cfa2e7d9328c649c94d50cbf8faf-Paper-Conference.pdf)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p2.1)\.
- F\. Rozet, G\. Andry, F\. Lanusse, and G\. Louppe \(2024\)Learning diffusion priors from observations by expectation maximization\.InAdvances in Neural Information Processing Systems,Vol\.37,pp\. 87647–87682\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/file/9f94298bac4668db4dc77ddb0a244301-Paper-Conference.pdf)Cited by:[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p2.1)\.
- S\. S\. Sahoo, M\. Arriola, Y\. Schiff, A\. Gokaslan, E\. Marroquin, J\. T\. Chiu, A\. Rush, and V\. Kuleshov \(2024\)Simple and effective masked diffusion language models\.InAdvances in Neural Information Processing Systems,Vol\.37\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/eb0b13cc515724ab8015bc978fdde0ad-Abstract-Conference.html)Cited by:[§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1),[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1)\.
- M\. S\. M\. Sajjadi, O\. Bachem, M\. Lucic, O\. Bousquet, and S\. Gelly \(2018\)Assessing generative models via precision and recall\.InAdvances in Neural Information Processing Systems,Vol\.31\.External Links:[Link](https://proceedings.neurips.cc/paper/2018/hash/f7696a9b362ac5a51c3dc8f098b73923-Abstract.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- T\. Salimans, I\. Goodfellow, W\. Zaremba, V\. Cheung, A\. Radford, and X\. Chen \(2016\)Improved techniques for training GANs\.arXiv preprint arXiv:1606\.03498\.External Links:[Link](https://arxiv.org/abs/1606.03498)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p3.1)\.
- T\. Salimans and J\. Ho \(2022\)Progressive distillation for fast sampling of diffusion models\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=TIdIXIpzhoI)Cited by:[§C\.8](https://arxiv.org/html/2605.06829#A3.SS8.p1.1)\.
- H\. Shen, J\. Zhang, B\. Xiong, R\. Hu, S\. Chen, Z\. Wan, X\. Wang, Y\. Zhang, Z\. Gong, G\. Bao, C\. Tao, Y\. Huang, Y\. Yuan, and M\. Zhang \(2025\)Efficient diffusion models: a survey\.Transactions on Machine Learning Research\.External Links:[Link](https://arxiv.org/abs/2502.06805)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1)\.
- L\. Simon, R\. Webster, and J\. Rabin \(2019\)Revisiting precision recall definition for generative modeling\.InProceedings of the 36th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.97,pp\. 5799–5808\.External Links:[Link](https://proceedings.mlr.press/v97/simon19a.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p1.1)\.
- J\. Sohl\-Dickstein, E\. Weiss, N\. Maheswaranathan, and S\. Ganguli \(2015\)Deep unsupervised learning using nonequilibrium thermodynamics\.InInternational Conference on Machine Learning,External Links:[Link](https://arxiv.org/abs/1503.03585)Cited by:[§C\.6](https://arxiv.org/html/2605.06829#A3.SS6.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§3\.4](https://arxiv.org/html/2605.06829#S3.SS4.p1.1)\.
- J\. Song, C\. Meng, and S\. Ermon \(2020a\)Denoising diffusion implicit models\.arXiv preprint arXiv:2010\.02502\.External Links:[Link](https://arxiv.org/abs/2010.02502)Cited by:[§C\.8](https://arxiv.org/html/2605.06829#A3.SS8.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2)\.
- Y\. Song, P\. Dhariwal, M\. Chen, and I\. Sutskever \(2023\)Consistency models\.arXiv preprint arXiv:2303\.01469\.External Links:[Link](https://arxiv.org/abs/2303.01469)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.p1.2),[§10\.4](https://arxiv.org/html/2605.06829#S10.SS4.p1.1),[§8\.1](https://arxiv.org/html/2605.06829#S8.SS1.p2.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px3.p1.1),[§9\.4](https://arxiv.org/html/2605.06829#S9.SS4.p2.1)\.
- Y\. Song and S\. Ermon \(2020\)Improved techniques for training score\-based generative models\.arXiv preprint arXiv:2006\.09011\.External Links:[Link](https://arxiv.org/abs/2006.09011)Cited by:[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1)\.
- Y\. Song, S\. Garg, J\. Shi, and S\. Ermon \(2020b\)Sliced score matching: a scalable approach to density and score estimation\.InProceedings of the 36th Conference on Uncertainty in Artificial Intelligence,Proceedings of Machine Learning Research, Vol\.115,pp\. 574–584\.External Links:[Link](https://proceedings.mlr.press/v115/song20a.html)Cited by:[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1)\.
- Y\. Song, J\. Sohl\-Dickstein, D\. P\. Kingma, A\. Kumar, S\. Ermon, and B\. Poole \(2021\)Score\-based generative modeling through stochastic differential equations\.arXiv preprint arXiv:2011\.13456\.External Links:[Link](https://arxiv.org/abs/2011.13456)Cited by:[§A\.1](https://arxiv.org/html/2605.06829#A1.SS1.p2.1),[§A\.6](https://arxiv.org/html/2605.06829#A1.SS6.p1.3),[§C\.4](https://arxiv.org/html/2605.06829#A3.SS4.p2.1),[Appendix C](https://arxiv.org/html/2605.06829#A3.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I2.i1.p1.1),[item 1](https://arxiv.org/html/2605.06829#S1.I3.i1.p1.3),[item 3](https://arxiv.org/html/2605.06829#S1.I3.i3.p1.1),[item 4](https://arxiv.org/html/2605.06829#S1.I3.i4.p1.1),[item 1](https://arxiv.org/html/2605.06829#S1.I4.i1.p1.1),[item 2](https://arxiv.org/html/2605.06829#S1.I4.i2.p1.1),[1st item](https://arxiv.org/html/2605.06829#S1.I5.i1.p1.1),[2nd item](https://arxiv.org/html/2605.06829#S1.I5.i2.p1.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p5.1),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px2.p1.1),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px3.p1.1),[§1\.8](https://arxiv.org/html/2605.06829#S1.SS8.p1.1),[item 
1](https://arxiv.org/html/2605.06829#S2.I2.i1.p1.1),[§2\.3](https://arxiv.org/html/2605.06829#S2.SS3.p1.9),[§2\.5](https://arxiv.org/html/2605.06829#S2.SS5.p1.5),[§2](https://arxiv.org/html/2605.06829#S2.p1.1),[§3\.2](https://arxiv.org/html/2605.06829#S3.SS2.SSS0.Px1.p1.4),[§3\.2](https://arxiv.org/html/2605.06829#S3.SS2.SSS0.Px2.p1.5),[§3\.2](https://arxiv.org/html/2605.06829#S3.SS2.SSS0.Px2.p2.1),[§3\.2](https://arxiv.org/html/2605.06829#S3.SS2.p1.3),[§3\.3](https://arxiv.org/html/2605.06829#S3.SS3.p1.6),[§3\.4](https://arxiv.org/html/2605.06829#S3.SS4.p1.1),[§3\.5](https://arxiv.org/html/2605.06829#S3.SS5.p1.3),[§3](https://arxiv.org/html/2605.06829#S3.p1.4),[§4\.1](https://arxiv.org/html/2605.06829#S4.SS1.p1.2),[§4\.1](https://arxiv.org/html/2605.06829#S4.SS1.p2.1),[§4\.2](https://arxiv.org/html/2605.06829#S4.SS2.p1.9),[§4\.3](https://arxiv.org/html/2605.06829#S4.SS3.p1.6),[§4\.4](https://arxiv.org/html/2605.06829#S4.SS4.p1.1),[§5\.1](https://arxiv.org/html/2605.06829#S5.SS1.p1.5),[§5\.3](https://arxiv.org/html/2605.06829#S5.SS3.p1.3),[§5\.4](https://arxiv.org/html/2605.06829#S5.SS4.p1.2),[§5](https://arxiv.org/html/2605.06829#S5.p1.2),[§6\.1](https://arxiv.org/html/2605.06829#S6.SS1.p1.1),[§6\.1](https://arxiv.org/html/2605.06829#S6.SS1.p2.1),[§6\.3](https://arxiv.org/html/2605.06829#S6.SS3.p1.3),[§6\.4](https://arxiv.org/html/2605.06829#S6.SS4.p1.1),[§6](https://arxiv.org/html/2605.06829#S6.p1.1),[§7\.3](https://arxiv.org/html/2605.06829#S7.SS3.p1.2),[1st item](https://arxiv.org/html/2605.06829#S8.I1.i1.p1.1),[2nd item](https://arxiv.org/html/2605.06829#S8.I1.i2.p1.1),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px3.p1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.2.2.3.1.1),[Table 2](https://arxiv.org/html/2605.06829#S8.T2.3.3.2.1.1),[§9\.3](https://arxiv.org/html/2605.06829#S9.SS3.p1.1),[footnote 1](https://arxiv.org/html/2605.06829#footnote1)\.
- J\. P\. Stanczuk, G\. Batzolis, T\. Deveney, and C\. Schönlieb \(2024\)Diffusion models encode the intrinsic dimension of data manifolds\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 46412–46440\.External Links:[Link](https://proceedings.mlr.press/v235/stanczuk24a.html)Cited by:[5th item](https://arxiv.org/html/2605.06829#S9.I3.i5.p1.1),[§9\.3](https://arxiv.org/html/2605.06829#S9.SS3.p2.1),[§9\.5](https://arxiv.org/html/2605.06829#S9.SS5.p2.1)\.
- G\. Stein, J\. Cresswell, R\. Hosseinzadeh, Y\. Sui, B\. Ross, V\. Villecroze, Z\. Liu, A\. L\. Caterini, E\. Taylor, and G\. Loaiza\-Ganem \(2023\)Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models\.InAdvances in Neural Information Processing Systems,Vol\.36\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/0bc795afae289ed465a65a3b4b1f4eb7-Abstract-Conference.html)Cited by:[§10\.7](https://arxiv.org/html/2605.06829#S10.SS7.p2.1)\.
- R\. Tang and Y\. Yang \(2024\)Adaptivity of diffusion models to manifold structures\.InProceedings of The 27th International Conference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, Vol\.238,pp\. 1648–1656\.External Links:[Link](https://proceedings.mlr.press/v238/tang24a.html)Cited by:[5th item](https://arxiv.org/html/2605.06829#S9.I3.i5.p1.1),[§9\.3](https://arxiv.org/html/2605.06829#S9.SS3.p2.1)\.
- W\. Tang and H\. Zhao \(2024\)Score\-based diffusion models via stochastic differential equations: a technical tutorial\.arXiv preprint arXiv:2402\.07487\.External Links:[Link](https://arxiv.org/abs/2402.07487)Cited by:[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1),[§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1)\.
- A\. Tewari, T\. Yin, G\. Cazenavette, S\. Rezchikov, J\. B\. Tenenbaum, F\. Durand, W\. T\. Freeman, and V\. Sitzmann \(2023\)Diffusion with forward models: solving stochastic inverse problems without direct supervision\.InAdvances in Neural Information Processing Systems,Vol\.36,pp\. 12349–12362\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/28e4ee96c94e31b2d040b4521d2b299e-Paper-Conference.pdf)Cited by:[§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3),[§1\.2](https://arxiv.org/html/2605.06829#S1.SS2.SSS0.Px1.p1.1),[§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1),[§10](https://arxiv.org/html/2605.06829#S10.p1.1)\.
- P\. Vincent \(2011\)A connection between score matching and denoising autoencoders\.InNeural Computation,Vol\.23,pp\. 1661–1674\.External Links:[Link](https://doi.org/10.1162/NECO_a_00142)Cited by:[§C\.2](https://arxiv.org/html/2605.06829#A3.SS2.p1.4),[Appendix C](https://arxiv.org/html/2605.06829#A3.p1.1),[item 2](https://arxiv.org/html/2605.06829#S1.I3.i2.p1.2),[item 3](https://arxiv.org/html/2605.06829#S1.I4.i3.p1.1),[§1\.1](https://arxiv.org/html/2605.06829#S1.SS1.p2.1),[§2\.5](https://arxiv.org/html/2605.06829#S2.SS5.p1.5),[§4\.2](https://arxiv.org/html/2605.06829#S4.SS2.p1.9),[§5\.1](https://arxiv.org/html/2605.06829#S5.SS1.p1.5),[§5](https://arxiv.org/html/2605.06829#S5.p1.2)\.
- S\. Xie, Z\. Xiao, D\. P\. Kingma, T\. Hou, Y\. N\. Wu, K\. Murphy, T\. Salimans, B\. Poole, and R\. Gao \(2024\)EM distillation for one\-step diffusion models\.InAdvances in Neural Information Processing Systems,Vol\.37,pp\. 45073–45104\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2024/hash/4fac0e32088db2fd2948cfaacc4fe108-Abstract-Conference.html)Cited by:[§10\.4](https://arxiv.org/html/2605.06829#S10.SS4.p1.1),[§9\.4](https://arxiv.org/html/2605.06829#S9.SS4.p2.1)\.
- Y\. Xu, Z\. Liu, M\. Tegmark, and T\. Jaakkola \(2022\)Poisson flow generative models\.InAdvances in Neural Information Processing Systems,Vol\.35\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2022/hash/6ad68a54eaa8f9bf6ac698b02ec05048-Abstract-Conference.html)Cited by:[§10\.1](https://arxiv.org/html/2605.06829#S10.SS1.p1.1),[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1),[§3](https://arxiv.org/html/2605.06829#S3.p1.4)\.
- Y\. Xu, Z\. Liu, Y\. Tian, S\. Tong, M\. Tegmark, and T\. Jaakkola \(2023\)PFGM\+\+: unlocking the potential of physics\-inspired generative models\.InProceedings of the 40th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.202,pp\. 38566–38591\.External Links:[Link](https://proceedings.mlr.press/v202/xu23m.html)Cited by:[§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1),[§3](https://arxiv.org/html/2605.06829#S3.p1.4),[§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px2.p1.1)\.
- Z\. Xu, R\. Qiu, Y\. Chen, H\. Chen, X\. Fan, M\. Pan, Z\. Zeng, M\. Das, and H\. Tong \(2024\) Discrete\-state continuous\-time diffusion for graph generation\. In Advances in Neural Information Processing Systems, Vol\. 37\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/91813e5ddd9658b99be4c532e274b49c-Abstract-Conference.html) Cited by: [§10\.3](https://arxiv.org/html/2605.06829#S10.SS3.p1.1), [§10\.6](https://arxiv.org/html/2605.06829#S10.SS6.p1.1), [§3\.6](https://arxiv.org/html/2605.06829#S3.SS6.p2.1)\.
- K\. Xue, Y\. Zhou, S\. Nie, X\. Min, X\. Zhang, J\. Zhou, and C\. Li \(2024\) Unifying Bayesian flow networks and diffusion models through stochastic differential equations\. arXiv preprint arXiv:2404\.15766\. External Links: [Link](https://arxiv.org/abs/2404.15766) Cited by: [§D\.9](https://arxiv.org/html/2605.06829#A4.SS9.p2.1), [§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px1.p1.2), [§8\.2](https://arxiv.org/html/2605.06829#S8.SS2.SSS0.Px4.p1.1)\.
- L\. Yang, Z\. Zhang, Y\. Song, S\. Hong, R\. Xu, Y\. Zhao, W\. Zhang, B\. Cui, and M\. Yang \(2022\) Diffusion models: a comprehensive survey of methods and applications\. arXiv preprint arXiv:2209\.00796\. External Links: [Link](https://arxiv.org/abs/2209.00796) Cited by: [§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p1.1), [§1\.4](https://arxiv.org/html/2605.06829#S1.SS4.p2.1)\.
- K\. Zhang, H\. Yin, F\. Liang, and J\. Liu \(2024\) Minimax optimality of score\-based diffusion models: beyond the density lower bound assumptions\. In Proceedings of the 41st International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol\. 235, pp\. 60134–60178\. External Links: [Link](https://proceedings.mlr.press/v235/zhang24bv.html) Cited by: [1st item](https://arxiv.org/html/2605.06829#S9.I3.i1.p1.1), [§9\.2](https://arxiv.org/html/2605.06829#S9.SS2.p1.5), [§9](https://arxiv.org/html/2605.06829#S9.p1.1)\.
- Y\. Zhang, X\. Shi, D\. Li, X\. Wang, J\. Wang, and H\. Li \(2023\) A unified conditional framework for diffusion\-based image restoration\. In Advances in Neural Information Processing Systems, Vol\. 36\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/9bf0810a4a1597a36d27ceea58667d92-Paper-Conference.pdf) Cited by: [§1](https://arxiv.org/html/2605.06829#S1.SS0.SSS0.Px2.p6.3), [§10\.5](https://arxiv.org/html/2605.06829#S10.SS5.p1.1)\.