# Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling
Source: [https://arxiv.org/html/2605.12754](https://arxiv.org/html/2605.12754)
Jacob K. Christopher, University of Virginia, csk4sr@virginia.edu; James E. Warner, NASA Langley Research Center, james.e.warner@nasa.gov; Ferdinando Fioretto, University of Virginia, fioretto@virginia.edu
## Abstract
Deep generative models provide state-of-the-art performance across a wide array of applications, with recent studies showing increasing applicability for science and engineering. Despite a growing corpus of literature focused on the integration of physics-based constraints into the generation process, existing approaches fail to enforce strict constraint satisfaction while maintaining sample quality. In particular, training-free constrained sampling methods, while providing per-sample feasibility guarantees, introduce a fundamental mismatch between the training objective and the constrained sampling procedure, often leading to performance degradation. Identifying this training-sampling misalignment as a central limitation of current constrained generative modeling approaches, this paper proposes Constraint-Aware Flow Matching, a novel end-to-end framework that explicitly incorporates constraint projections into the training objective. By aligning the model's learned dynamics with the constrained sampling process, the proposed method mitigates the distributional shift induced by projection-based corrections, enabling high-quality constrained generation. The proposed approach is evaluated on three challenging real-world benchmarks, illustrating the generality and efficacy of the method.
## 1 Introduction
Flow matching and diffusion generative models provide state-of-the-art performance across a wide array of settings, representing the forefront of content creation for image and video generation [[49](https://arxiv.org/html/2605.12754#bib.bib34), [30](https://arxiv.org/html/2605.12754#bib.bib45), [31](https://arxiv.org/html/2605.12754#bib.bib46)], engineering [[55](https://arxiv.org/html/2605.12754#bib.bib29)], material science [[18](https://arxiv.org/html/2605.12754#bib.bib35), [14](https://arxiv.org/html/2605.12754#bib.bib47), [58](https://arxiv.org/html/2605.12754#bib.bib49)], and other scientific applications [[16](https://arxiv.org/html/2605.12754#bib.bib9), [3](https://arxiv.org/html/2605.12754#bib.bib48), [28](https://arxiv.org/html/2605.12754#bib.bib50)]. Despite the promising potential of these models, their stochastic nature often results in generations which resemble the training distribution but lack integral requirements of real-world data. Particularly within scientific domains, where samples must precisely satisfy domain-specific constraints and physical laws to retain any real meaning, a critical barrier to widespread adoption is the inability of generative models to adhere to these requirements.
In response to these challenges, recent literature has started exploring constrained generative modeling for scientific domains. This work can be partitioned into three classes: (1) Constraint Guidance approaches condition the generation on the constraint set, but these methods provide no formal certificate of feasibility, thus relying heavily on rejection sampling [[12](https://arxiv.org/html/2605.12754#bib.bib30), [42](https://arxiv.org/html/2605.12754#bib.bib31), [10](https://arxiv.org/html/2605.12754#bib.bib51)]; (2) Physics-Informed Training augments generative model training with a physical residual loss, aligning the sampling distribution with the constraint set but failing to enforce per-sample satisfaction [[56](https://arxiv.org/html/2605.12754#bib.bib1), [4](https://arxiv.org/html/2605.12754#bib.bib2)]; (3) Constrained Sampling provides per-sample guarantees for constraint satisfaction, and, thus, these approaches have been most widely adopted [[15](https://arxiv.org/html/2605.12754#bib.bib3), [35](https://arxiv.org/html/2605.12754#bib.bib5), [53](https://arxiv.org/html/2605.12754#bib.bib6), [59](https://arxiv.org/html/2605.12754#bib.bib4), [13](https://arxiv.org/html/2605.12754#bib.bib26)]. However, while constrained sampling shows significant promise, generation quality often degrades compared to unconstrained sampling approaches.
This paper argues that this performance trade-off stems from the misalignment between the training objective and the constrained sampling process. Existing constrained sampling approaches are training-free, modifying only the sampling phase and leveraging pretrained flow matching or diffusion models. While this makes the integration of constraints inexpensive, requiring no model retraining, it also results in a model which is trained to sample in a fundamentally different way from how it is used at deployment. Generally, state-of-the-art methods rely on the integration of projections onto the feasible set during sampling, pushing the samples towards low-density regions which may not have been learned during the training process [[15](https://arxiv.org/html/2605.12754#bib.bib3), [35](https://arxiv.org/html/2605.12754#bib.bib5), [53](https://arxiv.org/html/2605.12754#bib.bib6)]. Figure [1](https://arxiv.org/html/2605.12754#S1.F1)(b) provides an illustration of this: while the initial predicted sample falls in a high-density region of the data distribution, the projection pushes the sample to lower-density regions when restoring feasibility. The paper addresses this gap by introducing a constraint-aware training procedure, ensuring alignment between training and sampling.
Figure 1: Visualization of Constraint-Aware Flow Matching compared to standard flow matching. While the clean state prediction from standard flow matching, $\hat{z}_1$, falls in a high-density region of the distribution, the projection degrades fidelity. Conversely, our constraint-aware objective optimizes the downstream task, learning to predict $\hat{z}_1$ such that the projection falls in high-density regions.

Contributions. This work addresses the disconnect between the training and sampling processes of constrained flow matching approaches by making the following contributions: (1) It presents a novel end-to-end formulation of the constrained generation task, analyzing the failure modes of training-free constrained modeling (visualized in Figure [1](https://arxiv.org/html/2605.12754#S1.F1)). (2) It derives a constraint-aware training objective, inspired by recent progress in the areas of differentiable optimization and decision-focused learning [[57](https://arxiv.org/html/2605.12754#bib.bib12), [54](https://arxiv.org/html/2605.12754#bib.bib55), [23](https://arxiv.org/html/2605.12754#bib.bib58), [21](https://arxiv.org/html/2605.12754#bib.bib56), [1](https://arxiv.org/html/2605.12754#bib.bib16), [5](https://arxiv.org/html/2605.12754#bib.bib60), [8](https://arxiv.org/html/2605.12754#bib.bib61), [22](https://arxiv.org/html/2605.12754#bib.bib59), [9](https://arxiv.org/html/2605.12754#bib.bib62), [40](https://arxiv.org/html/2605.12754#bib.bib57)], aligning the training process with the downstream generative task and providing the first end-to-end training approach for constrained samplers. (3) It presents an empirical evaluation on three real-world scientific settings encompassing PDE-constrained generation, microweather forecasting, and microstructure generation, with the introduced Constraint-Aware Flow Matching (CAFM) reporting state-of-the-art performance across these domains.
## 2 Related Work
Constraint Guidance and Physics-Informed Training. Early applications of generative models for constrained domains sought to control sample dynamics using model conditioning approaches, relying on learning-based approaches which either trained an independent classifier or leveraged the diffusion model directly to steer the generation [[26](https://arxiv.org/html/2605.12754#bib.bib27), [27](https://arxiv.org/html/2605.12754#bib.bib28)]. As these methods were increasingly applied to scientific tasks [[55](https://arxiv.org/html/2605.12754#bib.bib29), [12](https://arxiv.org/html/2605.12754#bib.bib30), [42](https://arxiv.org/html/2605.12754#bib.bib31)], more formal notions of constraint-aware training began to be explored within the literature, applying principles from physics-informed neural networks to generative contexts [[48](https://arxiv.org/html/2605.12754#bib.bib32), [47](https://arxiv.org/html/2605.12754#bib.bib33)]. [Bastek et al.](https://arxiv.org/html/2605.12754#bib.bib2) ([2024](https://arxiv.org/html/2605.12754#bib.bib2)) proposed incorporating a physical constraint directly into the training objective of the diffusion model. [Warner et al.](https://arxiv.org/html/2605.12754#bib.bib1) ([2026](https://arxiv.org/html/2605.12754#bib.bib1)) demonstrated that physical and statistical constraints could be adopted as a residual loss when learning a latent space for latent flow matching [[19](https://arxiv.org/html/2605.12754#bib.bib52)]. However, while these works provide a stronger alternative to conditioning-based constraint enforcement, they fail to provide per-sample guarantees, align the sampling process only at a distributional level, and rely heavily on neural networks to approximate the true constraints.
Constrained Sampling. More recently, [Christopher et al.](https://arxiv.org/html/2605.12754#bib.bib3) ([2024](https://arxiv.org/html/2605.12754#bib.bib3)) introduced constrained sampling approaches for diffusion models, incorporating constraint correction during the generation process by projecting onto the feasible set. Subsequent studies have demonstrated that constrained sampling provides state-of-the-art performance in a broad range of domains including robotics [[35](https://arxiv.org/html/2605.12754#bib.bib5)], biology [[16](https://arxiv.org/html/2605.12754#bib.bib9)], and material science [[18](https://arxiv.org/html/2605.12754#bib.bib35)]. Sampling-time approaches have been extended to latent diffusion models [[59](https://arxiv.org/html/2605.12754#bib.bib4)], as well as to flow matching parameterizations [[53](https://arxiv.org/html/2605.12754#bib.bib6), [36](https://arxiv.org/html/2605.12754#bib.bib36)]. However, while this class of methods contributes the strongest performance for constrained generative modeling tasks, they remain training-free by design, and this disconnect between training and sampling often results in degraded sample quality by conventional metrics.
Differentiable Optimization. Our method builds on the general principle that optimization procedures can be embedded within differentiable computational pipelines. Differentiation through constrained optimization problems has long been recognized as significant for machine learning settings, dating back to [Gould et al.](https://arxiv.org/html/2605.12754#bib.bib37) ([2016](https://arxiv.org/html/2605.12754#bib.bib37)). This line of research has been predominantly applied in end-to-end learning settings, enabling machine learning models to integrate constraint programming into their training processes. [Amos and Kolter](https://arxiv.org/html/2605.12754#bib.bib15) ([2017](https://arxiv.org/html/2605.12754#bib.bib15)) proposed using a quadratic programming layer for solving constrained optimization problems during the forward pass of the network. While this approach leveraged implicit differentiation of KKT conditions for the backward pass, differentiable optimization layers have since been widely generalized [[1](https://arxiv.org/html/2605.12754#bib.bib16), [33](https://arxiv.org/html/2605.12754#bib.bib7), [57](https://arxiv.org/html/2605.12754#bib.bib12), [39](https://arxiv.org/html/2605.12754#bib.bib17), [24](https://arxiv.org/html/2605.12754#bib.bib18), [46](https://arxiv.org/html/2605.12754#bib.bib19), [50](https://arxiv.org/html/2605.12754#bib.bib20), [43](https://arxiv.org/html/2605.12754#bib.bib21)]. While many of these methods were limited to convex constraint sets, [Kotary et al.](https://arxiv.org/html/2605.12754#bib.bib7) ([2023](https://arxiv.org/html/2605.12754#bib.bib7)) and [Blondel et al.](https://arxiv.org/html/2605.12754#bib.bib63) ([2022](https://arxiv.org/html/2605.12754#bib.bib63)) demonstrated applicability to nonconvex constraints using sequential quadratic programming and fixed-point differentiation, enabling greater generality of applicable use cases.
## 3 Preliminaries: Flow Matching
Flow Matching [[37](https://arxiv.org/html/2605.12754#bib.bib39)]. Flow matching is a generative modeling framework where a neural network learns a time-dependent velocity field $v_\theta : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$ that defines a deterministic flow $\psi_t : \mathbb{R}^d \to \mathbb{R}^d$ transporting samples from a base distribution $p_0$ to a target distribution $p_1$. Given an initial condition $z_0 \sim p_0$, the generative trajectory is obtained by solving the ODE
$$\frac{d}{dt}\psi_t(z_0) = v_\theta\bigl(\psi_t(z_0), t\bigr), \qquad \psi_0(z_0) = z_0, \tag{1}$$

and the generated sample is $z_1 = \psi_1(z_0)$. Training in flow matching is performed using a *reference transport path* constructed from paired endpoints $(\mathbf{z}_0, \mathbf{z}_1)$, where $\mathbf{z}_0 \sim p_0$ and $\mathbf{z}_1 \sim p_1$ are coupled (e.g., via an optimal transport plan). A time-$t$ point on this reference path is denoted $\mathbf{z}_t$ (e.g., $\mathbf{z}_t = (1-t)\mathbf{z}_0 + t\mathbf{z}_1$, optionally with small noise), and the corresponding reference velocity for the path is $\mathbf{z}_1 - \mathbf{z}_0$. The velocity network $v_\theta$ is then trained by regressing to this reference velocity:
$$\mathcal{L}_{FM} = \bigl\|\overbrace{v_\theta(\mathbf{z}_t, t)}^{\text{predicted}} - \overbrace{(\mathbf{z}_1 - \mathbf{z}_0)}^{\text{true velocity}}\bigr\|^2. \tag{2}$$
Notation. Bold variables $\mathbf{z}_0, \mathbf{z}_t, \mathbf{z}_1$ denote states induced by the *training* coupling/path (constructed directly from paired endpoints), whereas non-bold variables $z_0, z_t, z_1$ denote states induced by the *learned generative flow* in ([1](https://arxiv.org/html/2605.12754#S3.E1)). This distinction is important because training evaluates $v_\theta$ on $\mathbf{z}_t$ sampled from the reference path, while at generation time the model is queried on $z_t$ produced by integrating the learned dynamics.
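For concreteness, the following is a minimal PyTorch sketch (ours, not from the paper) of the training objective in Equation (2), assuming a linear reference path and a hypothetical velocity network `v_theta(z_t, t)` that accepts a batch of states and per-sample times:

```python
import torch

def flow_matching_loss(v_theta, z0, z1):
    """Standard flow matching loss (Eq. 2) on a linear reference path.

    v_theta: hypothetical network mapping (z_t, t) -> predicted velocity.
    z0, z1: coupled endpoint batches drawn from p0 and p1.
    """
    # One random time per sample, broadcastable over the data dimensions.
    t = torch.rand(z0.shape[0], *([1] * (z0.dim() - 1)))
    z_t = (1 - t) * z0 + t * z1     # point on the reference path
    target = z1 - z0                # reference velocity
    pred = v_theta(z_t, t)
    return ((pred - target) ** 2).mean()
```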
## 4 Constraint-Aware Sampling
The goal of constrained generation is to produce samples $z_1 \sim p_{\text{data}}$ subject to a set of feasibility requirements $C$. While the true data distribution $p_{\text{data}}$ may often satisfy the relevant constraints by construction (e.g., physical systems typically respect the underlying governing laws), the practical challenge is that a learned generative model $v_\theta$, despite approximating $p_{\text{data}}$, may assign probability mass to infeasible regions of the ambient space. If $\mathbb{1}(\cdot)$ denotes an indicator function, the *idealized constrained target* is:
$$p_C(z_1) \propto p_{\text{data}}(z_1)\,\mathbb{1}\{z_1 \in C\}, \tag{3}$$
with the practical objective obtained by replacing $p_{\text{data}}$ with the learned approximation given by $v_\theta$ [[16](https://arxiv.org/html/2605.12754#bib.bib9)]. Across applications, the constraint set $C$ is selected on the basis of domain-specific criteria, but, to ensure generality in the presentation, it can be arbitrarily defined as the intersection of a series of equality constraints $\mathbf{h}(z_1) = 0$ and inequality constraints $\mathbf{g}(z_1) \le 0$, imposed on the ambient space.
As discussed in Section [2](https://arxiv.org/html/2605.12754#S2), constrained sampling approaches provide state-of-the-art performance for this class of generation task. However, these methods suffer from their own set of challenges. Seminal works focus on correcting intermediate or partially denoised states to allow constraint enforcement to guide the generation process [[15](https://arxiv.org/html/2605.12754#bib.bib3), [35](https://arxiv.org/html/2605.12754#bib.bib5)]. Yet, enforcing constraints on noisy representations can bias the sampling trajectory, distort the learned dynamics, or over-correct due to high-variance noise [[6](https://arxiv.org/html/2605.12754#bib.bib10)]. More recent studies have demonstrated that enforcing feasibility on clean state predictions reduces this bias, particularly in high-dimensional settings with more restrictive feasibility criteria [[53](https://arxiv.org/html/2605.12754#bib.bib6), [16](https://arxiv.org/html/2605.12754#bib.bib9)]. Consequently, this study adopts the final state projection method formalized below:
$$\hat{z}_1 = \text{ODESolve}\bigl(z_t, v_\theta, t, 1\bigr) \qquad \text{(Forward Solve)}$$
$$z_1 = \mathcal{P}_C\bigl(\hat{z}_1\bigr) \qquad \text{(Constraint Projection)}$$
$$z_s = \text{ODESolve}\bigl(z_1, -(z_1 - z_0), 1, s\bigr) \qquad \text{(Reverse Update)}$$

where the projection operator $\mathcal{P}_C(y) := \arg\min_{x \in C} \|x - y\|_2^2$ returns the nearest feasible point, ODESolve follows the trajectory defined in Equation ([1](https://arxiv.org/html/2605.12754#S3.E1)), and timestep $s = t + \Delta$ for a step size $\Delta > 0$. This formulation follows the sampling approach proposed by [[53](https://arxiv.org/html/2605.12754#bib.bib6)], which removes the intermediate feasibility assumption presented in prior work, solely enforcing terminal state feasibility. At each sampling stage, the Forward Solve predicts a clean $z_1$, which is projected onto the constraint manifold. Then, the learned flow is updated such that the endpoint enforces feasibility.
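A minimal sketch of this sampling loop follows (ours, not the paper's implementation), assuming single-step Euler extrapolation in place of a full ODESolve and a hypothetical `project` callable approximating $\mathcal{P}_C$:

```python
import torch

def constrained_sample(v_theta, project, z0, n_steps=50):
    """Terminal-state constrained sampling sketch: Forward Solve /
    Constraint Projection / Reverse Update at each stage.

    project: hypothetical callable approximating P_C, the Euclidean
    projection onto the feasible set C.
    """
    dt = 1.0 / n_steps
    z, t = z0.clone(), 0.0
    for _ in range(n_steps):
        # Forward Solve: extrapolate the learned flow from t to 1
        # (a single Euler step here for brevity).
        z1_hat = z + (1.0 - t) * v_theta(z, torch.tensor(t))
        # Constraint Projection onto C.
        z1 = project(z1_hat)
        # Reverse Update: walk back along the straight path between
        # z0 and the feasible endpoint z1, landing at s = t + dt.
        s = t + dt
        z = z1 - (1.0 - s) * (z1 - z0)
        t = s
    return z
```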
## 5 Aligning Training and Inference
Within the existing constrained sampling literature, the training-free property of the sampling approaches is often regarded as a merit of the methodology. However, existing diffusion and flow matching objectives are designed to sample from $p_{\text{data}}$, thus remaining misaligned with the downstream task of sampling from $p_C$. Indeed, the standard training objective optimizes the unconstrained generative path (Figure [1](https://arxiv.org/html/2605.12754#S1.F1)(a)), while the actual deployment objective is strongly influenced by the feasibility correction (Figure [1](https://arxiv.org/html/2605.12754#S1.F1)(b)). These observations suggest that sampling-time enforcement alone creates a disconnect between what the model is trained to optimize and what is ultimately evaluated in deployment.
While this challenge remains unaddressed for constrained generative modeling, machine learning for optimization is an area of the literature where end-to-end learning has been explored extensively [[41](https://arxiv.org/html/2605.12754#bib.bib11)]. This area of research formalizes a downstream task:
$$\mathbf{x}^\star(\mathbf{c}) = \underset{\mathbf{x}}{\arg\min}\; f(\mathbf{x}, \mathbf{c}) \tag{4a}$$
$$\text{s.t.}\quad \mathbf{g}(\mathbf{x}, \mathbf{c}) \le 0; \quad \mathbf{h}(\mathbf{x}, \mathbf{c}) = 0 \tag{4b}$$

where $\mathbf{x}^\star(\mathbf{c})$ is an optimal decision of a constrained optimization problem parameterized by $\mathbf{c}$, a prediction by an ML model. While earlier works focused on prediction-focused learning [[29](https://arxiv.org/html/2605.12754#bib.bib14)], minimizing the mean squared error between the predicted parameters $\hat{\mathbf{c}}$ and the true parameters $\mathbf{c}$, later studies showed that decision-focused learning [[57](https://arxiv.org/html/2605.12754#bib.bib12)], minimizing the downstream loss,
$$\mathcal{L}_{\text{DFL}} := f(\mathbf{x}^\star(\hat{\mathbf{c}}), \mathbf{c}) - f(\mathbf{x}^\star(\mathbf{c}), \mathbf{c}), \tag{5}$$

results in improved downstream performance. Theoretically, the improvement provided by decision-focused learning, Equation ([5](https://arxiv.org/html/2605.12754#S5.E5)), can be characterized by a discontinuity between the downstream and the prediction-focused loss; while the global optimum of the two losses is aligned, sharp changes in $\mathbf{x}^\star(\hat{\mathbf{c}})$ can result in suboptimality increasing even when prediction quality improves [[20](https://arxiv.org/html/2605.12754#bib.bib13)].
### 5.1 Constraint-Aware Flow Matching Objective
Building from the problem formulation presented in Equation ([4](https://arxiv.org/html/2605.12754#S5.E4)), this section extends the end-to-end formulation leveraged by decision-focused learning to the setting of constrained generation. Bridging the decision-focused learning formulation to the constrained generation task in Equation ([3](https://arxiv.org/html/2605.12754#S4.E3)), let $\mathbf{x}^\star(\mathbf{c}) := z_1$, $\mathbf{c} := \mathbf{z}_1$, and $f(\mathbf{x}, \mathbf{c}) := \|\mathbf{x} - \mathbf{c}\|^2$, the objective of the projection operator, recovering the optimization:
$$\mathbf{x}^\star(\mathbf{c}) = \underset{\mathbf{x}}{\arg\min}\; \|\mathbf{x} - \mathbf{c}\|^2 \tag{6a}$$
$$\text{s.t.}\quad \mathbf{g}(\mathbf{x}) \le 0; \quad \mathbf{h}(\mathbf{x}) = 0 \tag{6b}$$

Assuming $\mathbf{z}_1 \in C$, this formulation recovers the training sample exactly, analogous to decision-focused learning under the true parameterization of $\mathbf{c}$. At training time, then, the standard flow matching objective aligns with prediction-focused learning; specifically, it optimizes the prediction task directly, as we define $\hat{\mathbf{c}} := \hat{z}_1$, the unconstrained prediction.
As a result, training-free constrained sampling approaches realize a gap between the prediction-focused task and the downstream decision-focused task, mirroring the challenges that have been exhaustively studied within decision-focused learning. While this discontinuity especially emerges when applying nonconvex constraints, where small changes in the prediction task can result in large changes in the minimizer of the projection, the gap is present even in convex cases, as illustrated in Figure [2](https://arxiv.org/html/2605.12754#S5.F2). Indeed, the example shows that even in simple convex cases, predictions $\hat{z}_1$ with identical mean square error can result in very different downstream decisions $z_1$.

Figure 2: Example of two predictions with identical prediction-focused losses resulting in substantially different decision-focused losses.
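To make this gap concrete, the following toy example (our own illustration, not taken from the paper) projects two predictions with identical MSE onto a simple halfspace constraint; one projection recovers the true endpoint while the other does not:

```python
import torch

def project_halfspace(y, a, b):
    """Euclidean projection onto the convex set {x : a^T x <= b}."""
    slack = a @ y - b
    return y if slack <= 0 else y - (slack / (a @ a)) * a

a, b = torch.tensor([1.0, 1.0]), 0.0   # constraint: x1 + x2 <= 0
c = torch.tensor([0.0, 0.0])           # true (feasible) endpoint z_1
p1 = torch.tensor([1.0, -1.0])         # prediction along the boundary
p2 = torch.tensor([1.0, 1.0])          # prediction across the boundary

for p in (p1, p2):
    mse = ((p - c) ** 2).sum().item()                      # prediction loss
    dec = ((project_halfspace(p, a, b) - c) ** 2).sum().item()  # decision loss
    print(f"prediction {p.tolist()}: mse={mse:.1f}, decision loss={dec:.1f}")
```

Both predictions have MSE 2.0, but the projected decision loss is 2.0 for the first and 0.0 for the second, exactly the behavior depicted in Figure 2.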
A natural objection could be that different parameterizations of generative models may allow training-free constrained sampling to address this gap. For instance, considering the loss landscape in Figure [2](https://arxiv.org/html/2605.12754#S5.F2), one may propose adopting a proximal method to sample from high-density regions while satisfying the constraint. As such, prior works have leveraged score matching as an approach for estimating the density function gradient, enabling proximal gradient optimization [[15](https://arxiv.org/html/2605.12754#bib.bib3), [6](https://arxiv.org/html/2605.12754#bib.bib10)]. However, the disconnect between the training and sampling process remains the central challenge in this case as well; as the proximal algorithm diverges from the forward diffusion trajectory, score estimation becomes increasingly inaccurate due to the absence of training data in these low-density regions [[16](https://arxiv.org/html/2605.12754#bib.bib9), [52](https://arxiv.org/html/2605.12754#bib.bib42)]. Hence, a critical step to closing the gap with standard generative methods remains defining an aligned, end-to-end training process for constrained samplers.
In order to derive this end\-to\-end training objective, we begin by revisiting the standard flow matching loss function:
$$\mathcal{L}_{FM} = \bigl\|\overbrace{v_\theta(\mathbf{z}_t, t)}^{\text{predicted}} - \overbrace{(\mathbf{z}_1 - \mathbf{z}_0)}^{\text{true velocity}}\bigr\|^2 = \bigl\|\bigl(\mathbf{z}_0 + v_\theta(\mathbf{z}_t, t)\bigr) - \mathbf{z}_1\bigr\|^2 = \|\hat{z}_1 - \mathbf{z}_1\|^2 = \|\hat{\mathbf{c}} - \mathbf{c}\|^2 \qquad \text{(Prediction-Focused Loss)}$$
To align the flow matching loss with the downstream task, the mean square error optimization is imposed on the optimal solution $\mathbf{x}^\star(\hat{\mathbf{c}}) := z_1$ rather than $\hat{\mathbf{c}} := \hat{z}_1$, restoring alignment with the true endpoint. Equation ([2](https://arxiv.org/html/2605.12754#S3.E2)) is thus extended under this framing to yield the constraint-aware flow matching objective:
$$\mathcal{L}_{CAFM} = \Bigl\|\overbrace{\bigl(\mathcal{P}_C\bigl(\mathbf{z}_0 + v_\theta(\mathbf{z}_t, t)\bigr) - \mathbf{z}_0\bigr)}^{\text{projected velocity prediction}} - \overbrace{\bigl(\mathbf{z}_1 - \mathbf{z}_0\bigr)}^{\text{true velocity}}\Bigr\|^2 \tag{8a}$$
$$= \Bigl\|\overbrace{\mathcal{P}_C\bigl(\mathbf{z}_0 + v_\theta(\mathbf{z}_t, t)\bigr)}^{\mathcal{P}_C(\hat{z}_1)\,=\,z_1} - \mathbf{z}_1\Bigr\|^2 \tag{8b}$$

Notably, this recovers the decision-focused loss in Equation ([5](https://arxiv.org/html/2605.12754#S5.E5)),
$$\mathcal{L}_{CAFM} = \Bigl\|\overbrace{\mathcal{P}_C\bigl(\mathbf{z}_0 + v_\theta(\mathbf{z}_t, t)\bigr)}^{\mathbf{x}} - \overbrace{\mathbf{z}_1}^{\mathbf{c}}\Bigr\|^2 = \|\mathbf{x} - \mathbf{c}\|^2 - 0 = f(\mathbf{x}^\star(\hat{\mathbf{c}}), \mathbf{c}) - f(\mathbf{x}^\star(\mathbf{c}), \mathbf{c}) \qquad \text{(Decision-Focused Loss)}$$
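A minimal sketch of this objective (ours, under the same assumptions as the earlier flow matching sketch) differs from the standard loss only in routing the clean-state prediction through a differentiable projection before the regression target is applied:

```python
import torch

def cafm_loss(v_theta, project, z0, z1):
    """Constraint-aware flow matching loss (Eq. 8), assuming `project`
    is a differentiable approximation of P_C, e.g. an unrolled
    optimization routine as in Section 5.2.
    """
    t = torch.rand(z0.shape[0], *([1] * (z0.dim() - 1)))
    z_t = (1 - t) * z0 + t * z1
    z1_hat = z0 + v_theta(z_t, t)   # clean-state prediction (c_hat)
    z1_proj = project(z1_hat)       # P_C(z1_hat); gradients flow through
    return ((z1_proj - z1) ** 2).mean()
```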
### 5.2 Differentiable Projection Operators
Having justified the adoption of the constraint-aware flow matching objective, the next challenge is defining the projection problem such that it can be treated as a differentiable operation. The key observation is that, when implemented by an iterative optimization algorithm, the projection can be unrolled into the computational graph. To facilitate end-to-end training, the operator $\mathcal{P}_C$ is integrated as a differentiable projection layer: a parameterized mapping from the predicted clean state $\hat{z}_1$ to the projected state $z_1 = \mathcal{P}_C(\hat{z}_1)$. However, rather than requiring a closed-form projector, $\mathcal{P}_C$ is implemented by an iterative optimization routine. Let $\Phi$ denote a single update of this optimization algorithm, starting from the initialization $\mathbf{x}^{(0)}(\hat{\mathbf{c}}) = \hat{z}_1$, and
$$\mathbf{x}^{(k+1)}(\hat{\mathbf{c}}) = \Phi\bigl(\mathbf{x}^{(k)}, \hat{\mathbf{c}}\bigr), \qquad k = 0, \ldots, K-1.$$

Then, the projection layer used during training is composed of a series of iterative updates:
$$\mathcal{P}_C\bigl(\hat{z}_1\bigr) = \mathbf{x}^\star(\hat{\mathbf{c}}) \approx \mathbf{x}^{(K)}(\hat{\mathbf{c}}) = \bigl(\Phi \circ \cdots \circ \Phi\bigr)\bigl(\hat{z}_1\bigr)$$

where $\mathbf{x}^\star(\hat{\mathbf{c}})$ denotes the fixed point of the update map, satisfying
$$\mathbf{x}^\star(\hat{\mathbf{c}}) = \Phi\bigl(\mathbf{x}^\star, \hat{\mathbf{c}}\bigr).$$

Because the optimization routine is run for only $K$ iterations, the layer uses $\mathbf{x}^{(K)}$ as a finite-step approximation to this fixed point. Under this view, the unrolled operations are added to the graph via the chain rule, enabling gradient flow from the downstream objective.
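A sketch of such an unrolled layer (ours, with a hypothetical update map `phi` supplied by the chosen optimizer) makes the construction explicit:

```python
import torch

def unrolled_projection(z1_hat, phi, n_iters=10):
    """Unrolled projection layer: K applications of a single optimization
    update `phi`, kept on the autograd graph so gradients reach z1_hat.

    phi: hypothetical callable (x, c_hat) -> next iterate, e.g. a
    projected-gradient or SQP step.
    """
    x = z1_hat
    for _ in range(n_iters):
        x = phi(x, z1_hat)   # each update is recorded in the graph
    return x                 # finite-step approximation of the fixed point
```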
Projecting with Sequential Quadratic Programming. To accommodate constraint sets defined by general nonlinear relations, including nonconvex structure, the optimization update $\Phi$ is practically implemented using Sequential Quadratic Programming (SQP) correction steps. In contrast to proximal gradient updates, which are often more applicable to simple convex sets, SQP addresses nonlinearity by solving a sequence of local quadratic subproblems obtained from linearized constraints and a quadratic model of the Lagrangian

$$\mathcal{L}(z_1, \lambda, \nu) = \tfrac{1}{2}\|\hat{z}_1 - z_1\|^2 + \lambda^\top h(z_1) + \nu^\top g(z_1),$$

where $\nu \ge 0$. This follows the rationale provided by [Kotary et al.](https://arxiv.org/html/2605.12754#bib.bib7) ([2023](https://arxiv.org/html/2605.12754#bib.bib7)), who use SQP for nonconvex nonlinear constrained optimization layers, where feasibility is not enforced through a simple convex proximal map. Appendix [B](https://arxiv.org/html/2605.12754#A2) provides further details of the implementation.
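To illustrate the shape of one such update, the following sketch (ours, not the paper's implementation) performs a single SQP step for the simpler equality-constrained case of the projection; handling inequality constraints would additionally require a QP solve with active-set or interior-point machinery. The callables `h` and `jac_h` are hypothetical, supplied by the application:

```python
import torch

def sqp_step(x, y, h, jac_h):
    """One SQP step for  min_x 0.5*||x - y||^2  s.t.  h(x) = 0.

    h: constraint function R^d -> R^m; jac_h: its Jacobian (m x d).
    Solves the KKT system of the local quadratic subproblem
        min_d 0.5*||x + d - y||^2  s.t.  h(x) + J d = 0.
    """
    J = jac_h(x)                      # linearize constraints at x
    m, d = J.shape
    # KKT system:  [ I  J^T ] [delta ]   [ y - x ]
    #              [ J   0  ] [lambda] = [ -h(x) ]
    K = torch.zeros(d + m, d + m)
    K[:d, :d] = torch.eye(d)
    K[:d, d:] = J.T
    K[d:, :d] = J
    rhs = torch.cat([y - x, -h(x)])
    sol = torch.linalg.solve(K, rhs)
    return x + sol[:d]                # updated primal iterate
```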
## 6 Experiments
Three real\-world problems are selected for the downstream evaluation of the proposed methodology:
1. Partial Differential Equation constrained physical systems, comprising four distinct problems,
2. Wind Velocity Field Estimation over spatiotemporal horizons under a prescribed coherence,
3. Microstructure Inverse-Design for material discovery under porosity constraints.
For transparent evaluation, CAFM uses an identical sampling setup to Physics-Constrained Flow Matching (PCFM) [[53](https://arxiv.org/html/2605.12754#bib.bib6)] in all settings, isolating the impact of end-to-end training. Additional constrained sampling approaches are included when applicable, and Functional Flow Matching (FFM) [[32](https://arxiv.org/html/2605.12754#bib.bib38)] serves as a reference for unconstrained generative performance. Method performance is evaluated by accuracy and feasibility of the outputs, with domain-specific evaluation metrics selected for each task. Extended details are provided in Appendix [A](https://arxiv.org/html/2605.12754#A1).
Figure 3: Visualization of baseline performance on the Reaction–Diffusion IC task.

### 6.1 Application 1: Partial Differential Equations
The first setting benchmarks CAFM across a set of challenging partial differential equations. Evaluations on PDE-governed systems have recently been adopted as a standard stress test for constrained generative models, providing real-world scientific complexity that remains difficult for neural operators and generative models [[53](https://arxiv.org/html/2605.12754#bib.bib6), [4](https://arxiv.org/html/2605.12754#bib.bib2)].
The selected settings introduce increasing constraint complexity, as implemented by [[53](https://arxiv.org/html/2605.12754#bib.bib6)]. Navier–Stokes imposes initial vorticity (IC) and linear global mass conservation constraints (CL). Reaction–Diffusion introduces curvature to the constraint manifold through a nonlinear conservation law (CL). Burgers BC incorporates localized, pointwise boundary conditions increasing structural rigidity, while Burgers IC enforces initial conditions (IC) and both global nonlinear mass conservation and local conservation updates (CL). Following [Utkarsh et al.](https://arxiv.org/html/2605.12754#bib.bib6) ([2025](https://arxiv.org/html/2605.12754#bib.bib6)), solution subsets are constrained on held-out IC or BC, allowing the experiment to evaluate the performance of constraint-aware training when the test-time constraints have not appeared in the training set.
Table 1: Generative performance for zero-shot methods on constrained PDEs with linear and nonlinear constraints. Navier–Stokes enforces global conservation laws (CL) as linear constraints, along with initial condition (IC) constraints. In contrast, Burgers and Reaction–Diffusion apply CL as nonlinear constraints along with IC or boundary condition (BC) constraints. Lower values indicate better performance, with best results in bold and second best underlined.

Several clear trends emerge across the settings reported in Table [1](https://arxiv.org/html/2605.12754#S6.T1). First, while FFM [[32](https://arxiv.org/html/2605.12754#bib.bib38)] reports strong reconstruction metrics in applications like Navier–Stokes, the high constraint residuals demonstrate that unconstrained approaches are not viable for these complex PDE settings. Next, while the ECI framework [[13](https://arxiv.org/html/2605.12754#bib.bib26)] is generally competitive with the other constrained samplers for constraint satisfaction, it generally reports less competitive accuracy metrics as compared to the other constrained approaches. Finally, and most notably, the table shows that CAFM consistently outperforms its constrained sampling counterpart PCFM, illustrating that the end-to-end training improves both accuracy and feasibility (Figure [3](https://arxiv.org/html/2605.12754#S6.F3)). These results offer strong evidence that CAFM can effectively scale across increasingly difficult constraint sets. Furthermore, the strong performance illustrates that this method generalizes well to out-of-distribution constraint sets, given the evaluation is conducted over unseen initial conditions and boundary conditions.
### 6.2 Application 2: Microweather Wind Velocity Field Estimation
The next evaluation assesses the proposed method for wind velocity field prediction. Motivated by the real-world gap in microweather prediction (localized weather conditions which are computationally prohibitive to compute explicitly with fluid dynamics), this experiment requires the generative model to predict mean and variance wind fields while maintaining realistic spatiotemporal coherence [[51](https://arxiv.org/html/2605.12754#bib.bib54), [44](https://arxiv.org/html/2605.12754#bib.bib53)]. Due to the prohibitive cost of collecting wind velocity data, which depends on the deployment of sensors, the training set is composed of sparse synthetic measurements to mimic the challenges of collecting spatially dense wind velocity data with deployed sensors.
The evaluation targets a fixed-grid version of the wind velocity field estimation setting described by [Warner et al.](https://arxiv.org/html/2605.12754#bib.bib1) ([2026](https://arxiv.org/html/2605.12754#bib.bib1)), where the goal is to generate realistic wind samples from sparse spatiotemporal observations while preserving a prescribed coherence structure. As in the original setup, wind velocity is modeled using the standard decomposition [[11](https://arxiv.org/html/2605.12754#bib.bib43)]:
$$\mathbf{U}(\mathbf{r}, \omega) = \bm{\mu}_u(\mathbf{r}) + \mathbf{W}(\mathbf{r}, \omega)$$

where $\mathbf{r} = [x_1, x_2, x_3, t]$ represents a static spatiotemporal point, $\bm{\mu}_u$ is a deterministic mean function over $u \in \mathbb{R}^{x_1 \times x_2 \times x_3 \times t}$, and $\mathbf{W}(\mathbf{r}, \omega)$ is a zero-mean, stationary Gaussian process. The application considers a spatiotemporal correlation constraint, which is defined between two grid locations $\mathbf{r}$ and $\mathbf{r}'$ by the coherence function:
$$\mathbf{Coh}(\mathbf{r}, \mathbf{r}', n) = \exp\Bigl(-\frac{n\,\|\mathbf{d}^T(\mathbf{r} - \mathbf{r}')\|}{\|\bm{\mu}_u(\mathbf{r}) - \bm{\mu}_u(\mathbf{r}')\|}\Bigr)$$

where $n$ is frequency and $\mathbf{d}$ contains directional decay coefficients. Following [Warner et al.](https://arxiv.org/html/2605.12754#bib.bib1) ([2026](https://arxiv.org/html/2605.12754#bib.bib1)), the evaluation focuses on a two-dimensional spatial plane over time, $\mathbf{r} = [x_2, x_3, t]$. In contrast to the continuous functional representation used in the original setup, our implementation operates on a fixed grid with $N_{x_2} = 10$, $N_{x_3} = 10$, and $N_t = 256$, so each wind sample is represented as an array $u \in \mathbb{R}^{N_{x_2} \times N_{x_3} \times N_t} = \mathbb{R}^{10 \times 10 \times 256}$.
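A direct transcription of this coherence function (ours, assuming the mean wind values at the two points are given as vectors and the coordinates as 1-D tensors) reads:

```python
import torch

def coherence(r1, r2, mu1, mu2, d, n):
    """Coherence between two spatiotemporal grid points, following the
    equation above; all arguments except the scalar frequency n are
    hypothetical 1-D tensors supplied by the application.
    """
    num = n * torch.abs(d @ (r1 - r2))        # n * |d^T (r - r')|
    den = torch.linalg.norm(mu1 - mu2)        # ||mu_u(r) - mu_u(r')||
    return torch.exp(-num / den)
```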
Empirical results are reported in Table [2(a)](https://arxiv.org/html/2605.12754#S6.T2.st1). The unconstrained baseline, Functional Flow Matching (FFM) [[32](https://arxiv.org/html/2605.12754#bib.bib38)], is ineffective in this experimental setting, reporting the worst performance across all metrics; the higher MMSE and Variance MSE suggest that poor adherence to the coherence constraint results in wind velocities which fail to align with the training data. Conversely, the Projected Diffusion Model (PDM) [[15](https://arxiv.org/html/2605.12754#bib.bib3)], which we extend to operate with the flow matching backbone as in [[53](https://arxiv.org/html/2605.12754#bib.bib6)], and the PCFM model report the strongest results for Variance MSE, while improving the MMSE and CV metrics over FFM. In contrast, CAFM reports the strongest MMSE and constraint violation, lowering the coherence residual by an order of magnitude. While the variance trade-off is interesting to note, as it suggests that the constraint-aware objective may suppress high-variance structures to avoid projection instability, the significant improvements in feasibility and reconstruction accuracy mark a pronounced improvement over existing constrained sampling approaches.
(a) Fixed-grid wind modeling. (b) Microstructure inverse design.

Table 2: Final metric summaries for the two generative modeling settings. Lower values indicate better performance, with best results in bold and second-best results underlined.
### 6.3 Application 3: Microstructure Inverse-Design
The final setting benchmarks the proposed method on a material science inverse-design task. Due to the prohibitive cost of collecting microstructure imaging data, generative models have been viewed as a highly promising direction for obtaining necessary samples for discovering structure-property linkages [[15](https://arxiv.org/html/2605.12754#bib.bib3), [59](https://arxiv.org/html/2605.12754#bib.bib4), [17](https://arxiv.org/html/2605.12754#bib.bib40)]. For this evaluation, a sparse dataset of Bentheimer sandstone imaging data is adopted [[34](https://arxiv.org/html/2605.12754#bib.bib44)] and subsampled to create $256 \times 256$ patches rescaled to $[-1, 1]$.
This experiment adopts a convex constraint set, following the definition provided by [[59](https://arxiv.org/html/2605.12754#bib.bib4)]. Specifically, let $\mathbf{r}_{i,j}$ be the pixel value for row $i$ and column $j$, where $\mathbf{r}_{i,j} \in [-1, 1]$ for all values of $i$ and $j$. The porosity constraint is then,
$$\textit{porosity} = \tfrac{1}{n \times m}\sum_{i=1}^{n}\sum_{j=1}^{m}\mathbb{1}\bigl(\mathbf{r}_{i,j} < 0\bigr),$$

where our constraint enforces a strict target porosity value for each sample (e.g., $\textit{porosity} = 0.5$).
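For intuition, a sketch of the projection onto this constraint (ours, not the paper's implementation) forces exactly the target fraction of pixels below zero while moving each pixel as little as possible; the pixels with the smallest values are assigned as pores, since any other assignment would require larger shifts:

```python
import torch

def project_porosity(x, target=0.5, eps=1e-6):
    """Nearest-point sketch for the porosity constraint: exactly a
    `target` fraction of pixels must be below 0 (strict inequality is
    handled with a small eps margin).
    """
    flat = x.flatten()
    k = round(target * flat.numel())     # required number of pore pixels
    order = torch.argsort(flat)          # pixel values, ascending
    out = flat.clone()
    out[order[:k]] = torch.clamp(out[order[:k]], max=-eps)  # pores: < 0
    out[order[k:]] = torch.clamp(out[order[k:]], min=0.0)   # solid: >= 0
    porosity = (out < 0).float().mean()  # sanity check: equals target
    return out.reshape(x.shape), porosity
```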
The downstream evaluation metrics are reported in Table [2(b)](https://arxiv.org/html/2605.12754#S6.T2.st2). The results for this task parallel the previous two settings in their general trends: FFM struggles to adhere to the prescribed constraint, and its performance suffers because of this; PDM and PCFM remain competitive with one another, consistently improving over the FFM baseline. Meanwhile, CAFM outperforms both PCFM and all other methods across accuracy metrics, while maintaining perfect constraint satisfaction. The empirical evaluation for this setting reiterates the benefit of aligning model training with the downstream objective, providing state-of-the-art performance when utilizing this end-to-end training paradigm.
## 7 Discussion
Connection to Multistage Optimization. While classical decision-focused learning formulations consider a single downstream optimization problem, many real-world systems are more accurately framed as sequential or multistage decision-making, where predictions inform a sequence of optimization steps rather than a single decision. Recent work has targeted such applications, leveraging a predictive model which, rather than parameterizing a single optimization problem, induces a policy through repeated applications of an optimization operator over a temporal horizon [[45](https://arxiv.org/html/2605.12754#bib.bib41)]. Formally, let $\mathbf{x}_{(k)}$ denote the state at iteration $k$, and let $\hat{\mathbf{c}}_{(k)}$ denote the predicted parameters at that stage. A sequential decision process can be written:
$$\mathbf{x}_{(k+1)} = \mathcal{O}\bigl(\mathbf{x}_{(k)}, \hat{\mathbf{c}}_{(k)}\bigr)$$

where $\mathcal{O}$ is a forward operator, coupling the prediction and optimization steps. The final decision is then given by the trajectory endpoint. This perspective generalizes Equation ([5](https://arxiv.org/html/2605.12754#S5.E5)) by replacing a single optimization problem with a policy induced by the repeated optimization. This formulation naturally extends to our methodology, which we believe is a fitting perspective for understanding how the single-stage optimization in Equation ([6](https://arxiv.org/html/2605.12754#S5.E6)), which is isolated at training time, generalizes to the multistage sampling process.
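In code, this recursion is simply a rollout over the forward operator; the sketch below (ours, with `predict` and `optimize` as hypothetical stand-ins for the model and $\mathcal{O}$) shows how the trajectory endpoint becomes the final decision:

```python
def multistage_rollout(x0, predict, optimize, horizon):
    """Sequential decision process: each stage's prediction parameterizes
    one application of the forward operator O; the endpoint of the
    resulting trajectory is the final decision.
    """
    x = x0
    for k in range(horizon):
        c_hat = predict(x, k)     # stage-k predicted parameters
        x = optimize(x, c_hat)    # forward operator O
    return x                      # trajectory endpoint
```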
Out-of-Distribution Generation. One of the strongest arguments for the adoption of constrained sampling algorithms, as opposed to constraint-aware training, is the native ability of sampling-time methods to generalize to arbitrary constraint sets. The evaluation has demonstrated that CAFM effectively handles held-out constraint specifications (Section [6.1](https://arxiv.org/html/2605.12754#S6.SS1)) and under-represented design properties (Section [6.3](https://arxiv.org/html/2605.12754#S6.SS3)), demonstrating that our training-aligned sampler remains robust to out-of-distribution tasks and constraints. Furthermore, it is worth noting that additional sampling-time constraints can be seamlessly incorporated into CAFM, creating a hybrid between CAFM and PCFM which remains effective due to the sampling-time projections. As such, the proposed approach remains general in its handling of domain shift, avoiding this pitfall of existing constraint-aware training methods.
Training Complexity. Analogous to decision-focused learning approaches, CAFM's improved downstream performance is obtained by incorporating the optimization problem into the training objective. While the existing decision-focused learning literature has provided an array of efficiency improvements that are leveraged by our implementation, the additional computational overhead introduced by solving the projection for each forward pass during training should be acknowledged. To reduce the additional training requirements, the CAFM objective can be applied as a finetuning step, initializing from the pretrained flow matching weights and decreasing the total number of training steps needed to reach convergence. Appendix [C](https://arxiv.org/html/2605.12754#A3) provides an additional analysis of this finetuning approach to illustrate the efficiency gains provided by warm starting with pretrained weights.
## 8 Conclusion
Motivated by the transformative potential of constraint-aware diffusion and flow matching models for scientific applications, this paper focuses on addressing a fundamental misalignment between the training and sampling processes of constrained sampling approaches. To this end, Constraint-Aware Flow Matching is proposed, a novel end-to-end framework for constrained generative modeling that directly addresses the mismatch between the learned dynamics and the downstream objective. The resulting model is able to generate samples that satisfy constraints while improving data fidelity. The empirical results on three challenging real-world benchmarks demonstrate both the effectiveness and the generality of the proposed approach, reporting state-of-the-art performance across increasingly complex constraint sets and data distributions. Beyond improving performance in the studied settings, these findings highlight the importance of explicitly accounting for constraint-enforcement mechanisms during training, rather than treating them solely as a post hoc correction at inference time. More broadly, this paper marks a promising step towards building generative models that are both physically consistent and practically reliable in scientific and engineering applications.
## Acknowledgments
This research is partially supported by NSF awards 2533631, 2401285, 2334936, and by DARPA under Contract No. #HR0011252E005. The authors would like to thank the AiTHENA Program at NASA Langley Research Center for providing partial funding support for this research. The authors acknowledge Research Computing at the University of Virginia. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DARPA.
## References
- [1] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter (2019). Differentiable convex optimization layers. Advances in Neural Information Processing Systems 32.
- [2] B. Amos and J. Z. Kolter (2017). OptNet: differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pp. 136–145.
- [3] N. Anand and T. Achim (2022). Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. arXiv preprint arXiv:2205.15019.
- [4] J. Bastek, W. Sun, and D. M. Kochmann (2024). Physics-informed diffusion models. arXiv preprint arXiv:2403.14404.
- [5] Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. Vert, and F. Bach (2020). Learning with differentiable perturbed optimizers. In Advances in Neural Information Processing Systems, Vol. 33, pp. 9508–9519.
- [6] M. Blanke, Y. Qu, S. Shamekh, and P. Gentine (2025). Strictly constrained generative modeling via split augmented Langevin sampling. arXiv preprint arXiv:2505.18017.
- [7] M. Blondel, Q. Berthet, M. Cuturi, R. Frostig, S. Hoyer, F. Llinares-López, F. Pedregosa, and J. Vert (2022). Efficient and modular implicit differentiation. Advances in Neural Information Processing Systems 35, pp. 5230–5242.
- [8] M. Blondel, A. F. Martins, and V. Niculae (2020). Learning with Fenchel–Young losses. Journal of Machine Learning Research 21(35), pp. 1–69.
- [9] M. Blondel, O. Teboul, Q. Berthet, and J. Djolonga (2020). Fast differentiable sorting and ranking. In International Conference on Machine Learning, pp. 950–959.
- [10] N. Botteghi, F. Califano, M. Poel, and C. Brune (2023). Trajectory generation, control, and safety with denoising diffusion probabilistic models. arXiv preprint arXiv:2306.15512.
- [11] L. Carassale and G. Solari (2006). Monte Carlo simulation of wind velocity fields on complex structures. Journal of Wind Engineering and Industrial Aerodynamics 94(5), pp. 323–339.
- [12] J. Carvalho, A. T. Le, M. Baierl, D. Koert, and J. Peters (2023). Motion planning diffusion: learning and planning of robot motions with diffusion models. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1916–1923.
- [13] C. Cheng, B. Han, D. C. Maddix, A. F. Ansari, A. Stuart, M. W. Mahoney, and Y. Wang (2024). Gradient-free generation for hard-constrained systems. arXiv preprint arXiv:2412.01786.
- [14] J. K. Christopher, S. Baek, and F. Fioretto (2025). Physics-aware diffusion models for micro-structure material design. In AI for Materials Science Workshop at NeurIPS 2024.
- [15] J. K. Christopher, S. Baek, and N. Fioretto (2024). Constrained synthesis with projected diffusion models. Advances in Neural Information Processing Systems 37, pp. 89307–89333.
- [16] J. K. Christopher, A. Seamann, J. Cui, S. Khare, and F. Fioretto (2025). Constrained diffusion for protein design with hard structural constraints. arXiv preprint arXiv:2510.14989.
- [17] S. Chun, S. Roy, Y. T. Nguyen, J. B. Choi, H. S. Udaykumar, and S. S. Baek (2020). Deep learning for synthetic microstructure generation in a materials-by-design framework for heterogeneous energetic materials. Scientific Reports 10(1), pp. 13307.
- [18] J. Cui, J. K. Christopher, A. Biswas, P. V. Balachandran, and F. Fioretto (2026). Constrained diffusion for accelerated structure relaxation of inorganic solids with point defects. arXiv preprint arXiv:2602.19153.
- [19] Q. Dao, H. Phung, B. Nguyen, and A. Tran (2023). Flow matching in latent space. arXiv preprint arXiv:2307.08698.
- [20] E. Demirović, P. J. Stuckey, J. Bailey, J. Chan, C. Leckie, K. Ramamohanarao, and T. Guns (2019). An investigation into prediction + optimisation for the knapsack problem. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 241–257.
- [21] A. N. Elmachtoub and P. Grigas (2022). Smart "predict, then optimize". Management Science 68(1), pp. 9–26.
- [22] A. M. Ferber, T. Huang, D. Zha, M. Schubert, B. Steiner, B. Dilkina, and Y. Tian (2023). SurCo: learning linear surrogates for combinatorial nonlinear optimization problems. In International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 202, pp. 10034–10052.
- [23] A. M. Ferber, B. Wilder, B. Dilkina, and M. Tambe (2020). MIPaaL: mixed integer program as a layer. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 1504–1511.
- [24] A. Ferber, B. Wilder, B. Dilkina, and M. Tambe (2020). MIPaaL: mixed integer program as a layer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 1504–1511.
- [25] S. Gould, B. Fernando, A. Cherian, P. Anderson, R. S. Cruz, and E. Guo (2016). On differentiating parameterized argmin and argmax problems with application to bi-level optimization. arXiv preprint arXiv:1607.05447.
- [26] J. Ho, A. Jain, and P. Abbeel (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, pp. 6840–6851.
- [27] J. Ho and T. Salimans (2022). Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.
- [28] E. Hoogeboom, V. G. Satorras, C. Vignac, and M. Welling (2022). Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning, pp. 8867–8887.
- [29] W. Hu, P. Wang, and H. B. Gooi (2016). Toward optimal energy management of microgrids via robust two-stage optimization. IEEE Transactions on Smart Grid 9(2), pp. 1161–1174.
- [30] T. B. Ifriqi, J. Nguyen, K. Alahari, J. Verbeek, and R. T. Chen (2025). Flowception: temporally expansive flow matching for video generation. arXiv preprint arXiv:2512.11438.
- [31] Y. Jin, Z. Sun, N. Li, K. Xu, H. Jiang, N. Zhuang, Q. Huang, Y. Song, Y. Mu, and Z. Lin (2024). Pyramidal flow matching for efficient video generative modeling. arXiv preprint arXiv:2410.05954.
- [32] G. Kerrigan, G. Migliorini, and P. Smyth (2023). Functional flow matching. arXiv preprint arXiv:2305.17209.
- [33] J. Kotary, J. Christopher, M. H. Dinh, and F. Fioretto (2023). Analyzing and enhancing the backward-pass convergence of unrolled optimization. arXiv preprint arXiv:2312.17394.
- [34] R. Li, I. Shikhov, and C. Arns (2022). Bentheimer sandstone image data. Digital Porous Media Portal. Dataset. [https://doi.org/10.17612/1J6K-SH07](https://doi.org/10.17612/1J6K-SH07).
- [35] J. Liang, J. K. Christopher, S. Koenig, and F. Fioretto (2025). Simultaneous multi-robot motion planning with projected diffusion models. arXiv preprint arXiv:2502.03607.
- [36] J. Liang, Y. Sun, A. Samaddar, S. Madireddy, and F. Fioretto (2025). Chance-constrained flow matching for high-fidelity constraint-aware generation. arXiv preprint arXiv:2509.25157.
- [37] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le (2022). Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.
- [38] L. Lu, P. Jin, and G. E. Karniadakis (2019). DeepONet: learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193.
- [39] J. Mandi and T. Guns (2020). Interior point solving for LP-based prediction + optimisation. Advances in Neural Information Processing Systems 33, pp. 7272–7282.
- [40] J. Mandi, J. Kotary, S. Berden, M. Mulamba, V. Bucarey, T. Guns, and F. Fioretto (2024). Decision-focused learning: foundations, state of the art, benchmark and future opportunities. Journal of Artificial Intelligence Research 81, pp. 1623–1701. [doi:10.48550/arXiv.2307.13565](https://dx.doi.org/10.48550/arXiv.2307.13565).
- [41] J. Mandi, J. Kotary, S. Berden, M. Mulamba, V. Bucarey, T. Guns, and F. Fioretto (2024). Decision-focused learning: foundations, state of the art, benchmark and future opportunities. Journal of Artificial Intelligence Research 80, pp. 1623–1701.
- \[42\]F\. Mazé and F\. Ahmed\(2023\)Diffusion models beat gans on topology optimization\.InProceedings of the AAAI conference on artificial intelligence,Vol\.37,pp\. 9108–9116\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p2.1),[§2](https://arxiv.org/html/2605.12754#S2.p1.1)\.
- \[43\]M\. Niepert, P\. Minervini, and L\. Franceschi\(2021\)Implicit mle: backpropagating through discrete exponential family distributions\.Advances in Neural Information Processing Systems34,pp\. 14567–14579\.Cited by:[§2](https://arxiv.org/html/2605.12754#S2.p3.1)\.
- \[44\]D\. S\. Nithya, G\. Quaranta, V\. Muscarello, and M\. Liang\(2024\)Review of wind flow modelling in urban environments to support the development of urban air mobility\.Drones8\(4\)\.External Links:[Document](https://dx.doi.org/10.3390/drones8040147),ISSN 2504\-446X,[Link](https://www.mdpi.com/2504-446X/8/4/147)Cited by:[§6\.2](https://arxiv.org/html/2605.12754#S6.SS2.p1.1)\.
- \[45\]E\. Peršak and M\. F\. Anjos\(2024\)Decision\-focused forecasting: a differentiable multistage optimisation architecture\.arXiv preprint arXiv:2405\.14719\.Cited by:[§7](https://arxiv.org/html/2605.12754#S7.p1.3)\.
- \[46\]M\. V\. Pogančić, A\. Paulus, V\. Musil, G\. Martius, and M\. Rolinek\(2019\)Differentiation of blackbox combinatorial solvers\.InInternational conference on learning representations,Cited by:[§2](https://arxiv.org/html/2605.12754#S2.p3.1)\.
- \[47\]M\. Raissi and G\. E\. Karniadakis\(2018\)Hidden physics models: machine learning of nonlinear partial differential equations\.Journal of Computational Physics357,pp\. 125–141\.Cited by:[§2](https://arxiv.org/html/2605.12754#S2.p1.1)\.
- \[48\]M\. Raissi, P\. Perdikaris, and G\. E\. Karniadakis\(2019\)Physics\-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations\.Journal of Computational physics378,pp\. 686–707\.Cited by:[§2](https://arxiv.org/html/2605.12754#S2.p1.1)\.
- \[49\]R\. Rombach, A\. Blattmann, D\. Lorenz, P\. Esser, and B\. Ommer\(2022\)High\-resolution image synthesis with latent diffusion models\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 10684–10695\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p1.1)\.
- \[50\]S\. S\. Sahoo, A\. Paulus, M\. Vlastelica, V\. Musil, V\. Kuleshov, and G\. Martius\(2022\)Backpropagation through combinatorial algorithms: identity with projection works\.arXiv preprint arXiv:2205\.15213\.Cited by:[§2](https://arxiv.org/html/2605.12754#S2.p3.1)\.
- \[51\]T\. A\. Shah, M\. C\. Stanley, and J\. E\. Warner\(2025\)Generative modeling of microweather wind velocities for urban air mobility\.In2025 IEEE Aerospace Conference,Vol\.,pp\. 1–17\.External Links:[Document](https://dx.doi.org/10.1109/AERO63441.2025.11068624)Cited by:[§6\.2](https://arxiv.org/html/2605.12754#S6.SS2.p1.1)\.
- \[52\]Y\. Song and S\. Ermon\(2019\)Generative modeling by estimating gradients of the data distribution\.Advances in neural information processing systems32\.Cited by:[§5\.1](https://arxiv.org/html/2605.12754#S5.SS1.p4.1)\.
- \[53\]U\. Utkarsh, P\. Cai, A\. Edelman, R\. Gomez\-Bombarelli, and C\. V\. Rackauckas\(2025\)Physics\-constrained flow matching: sampling generative models with hard constraints\.arXiv preprint arXiv:2506\.04171\.Cited by:[1st item](https://arxiv.org/html/2605.12754#A1.I2.i1.p1.1.1),[1st item](https://arxiv.org/html/2605.12754#A1.I4.i1.p1.1.1),[1st item](https://arxiv.org/html/2605.12754#A1.I6.i1.p1.1.1),[§A\.1](https://arxiv.org/html/2605.12754#A1.SS1.SSS0.Px1.p1.1),[§A\.1](https://arxiv.org/html/2605.12754#A1.SS1.SSS0.Px3.p1.1),[§A\.1](https://arxiv.org/html/2605.12754#A1.SS1.SSS0.Px3.p2.8),[§1](https://arxiv.org/html/2605.12754#S1.p2.1),[§1](https://arxiv.org/html/2605.12754#S1.p3.1),[§2](https://arxiv.org/html/2605.12754#S2.p2.1),[§4](https://arxiv.org/html/2605.12754#S4.p3.5),[§4](https://arxiv.org/html/2605.12754#S4.p3.6),[§6\.1](https://arxiv.org/html/2605.12754#S6.SS1.p1.1),[§6\.1](https://arxiv.org/html/2605.12754#S6.SS1.p2.1),[§6\.2](https://arxiv.org/html/2605.12754#S6.SS2.p3.1),[§6](https://arxiv.org/html/2605.12754#S6.p1.2)\.
- \[54\]P\. Wang, P\. Donti, B\. Wilder, and Z\. Kolter\(2019\)Satnet: bridging deep learning and logical reasoning using a differentiable satisfiability solver\.InInternational Conference on Machine Learning,pp\. 6545–6554\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p4.1)\.
- \[55\]T\. J\. Wang, J\. Zheng, P\. Ma, Y\. Du, B\. Kim, A\. Spielberg, J\. Tenenbaum, C\. Gan, and D\. Rus\(2023\)Diffusebot: breeding soft robots with physics\-augmented generative diffusion models\.Advances in Neural Information Processing Systems36,pp\. 44398–44423\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p1.1),[§2](https://arxiv.org/html/2605.12754#S2.p1.1)\.
- \[56\]J\. E\. Warner, T\. A\. Shah, P\. E\. Leser, G\. F\. Bomarito, J\. D\. Pribe, and M\. C\. Stanley\(2026\)Latent generative modeling of random fields from limited training data\.External Links:2505\.13007,[Link](https://arxiv.org/abs/2505.13007)Cited by:[§A\.2](https://arxiv.org/html/2605.12754#A1.SS2.SSS0.Px1.p1.1),[§A\.2](https://arxiv.org/html/2605.12754#A1.SS2.SSS0.Px3.p1.2),[§1](https://arxiv.org/html/2605.12754#S1.p2.1),[§2](https://arxiv.org/html/2605.12754#S2.p1.1),[§6\.2](https://arxiv.org/html/2605.12754#S6.SS2.p2.13),[§6\.2](https://arxiv.org/html/2605.12754#S6.SS2.p2.14)\.
- \[57\]B\. Wilder, B\. Dilkina, and M\. Tambe\(2019\)Melding the data\-decisions pipeline: decision\-focused learning for combinatorial optimization\.InProceedings of the AAAI conference on artificial intelligence,Vol\.33,pp\. 1658–1665\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p4.1),[§2](https://arxiv.org/html/2605.12754#S2.p3.1),[§5](https://arxiv.org/html/2605.12754#S5.p2.4)\.
- \[58\]L\. Wu, R\. Jiao, Q\. Li, M\. Li, S\. Li, S\. Jin, and W\. Huang\(2026\)DMFlow: disordered materials generation by flow matching\.arXiv preprint arXiv:2602\.04734\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p1.1)\.
- \[59\]S\. Zampini, J\. K\. Christopher, L\. Oneto, D\. Anguita, and F\. Fioretto\(2025\)Training\-free constrained generation with stable diffusion models\.arXiv preprint arXiv:2502\.05625\.Cited by:[§1](https://arxiv.org/html/2605.12754#S1.p2.1),[§2](https://arxiv.org/html/2605.12754#S2.p2.1),[§6\.3](https://arxiv.org/html/2605.12754#S6.SS3.p1.2),[§6\.3](https://arxiv.org/html/2605.12754#S6.SS3.p2.6)\.
## Appendix A Experimental Setups and Reproducibility
This section documents the evaluation settings, hyperparameters, and reproducibility details for each experiment. All experiments were conducted on a single NVIDIA A100-80GB GPU. Across all settings, CAFM trains the flow model with the same sampling-time projection mechanism used during constrained inference. Unless otherwise stated, optimization is performed using Adam with $\beta = (0.9, 0.999)$ and no weight decay. The projection is implemented as a differentiable equality-constrained SQP layer: given a predicted endpoint $u$, the projector computes the constraint residual $h(u)$, forms the damped normal matrix $JJ^{\top} + \epsilon I$, solves for the Lagrange multiplier update, and applies the correction. The default damping is $\epsilon = 10^{-6}$, with an adaptive Cholesky fallback. The projected velocity target is computed as
$$\frac{u_{1}^{\mathrm{proj}} - u_{t}}{\max(1 - t,\, 10^{-3})}.$$
For projected training losses, we use a residual regularization weight of $10^{-3}$, sample times $t \in [0, 0.95]$, and a projection clamp of $10^{-3}$.
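To make the projection layer concrete, the following PyTorch sketch implements a damped Gauss–Newton/SQP correction step and the projected velocity target described above. It is a minimal illustration under our own assumptions (flattened 1-D samples, a residual callable `h`, and hypothetical names such as `sqp_project`), not the authors' released implementation:

```python
import torch

def sqp_project(u, h, eps=1e-6, n_steps=1, create_graph=True):
    """Damped Gauss-Newton/SQP steps toward the set {u : h(u) = 0}.

    Sketch only: `u` is a flat 1-D tensor and `h` returns an m-dimensional
    residual vector. `create_graph=True` keeps each step differentiable so
    the projection can be unrolled inside the CAFM training loss.
    """
    for _ in range(n_steps):
        r = h(u)  # constraint residual h(u), shape (m,)
        J = torch.autograd.functional.jacobian(h, u, create_graph=create_graph)  # (m, d)
        # Damped normal matrix J J^T + eps*I; solve for the multiplier update.
        A = J @ J.T + eps * torch.eye(J.shape[0], dtype=u.dtype, device=u.device)
        lam = torch.linalg.solve(A, r)
        u = u - J.T @ lam  # apply the correction to the iterate
    return u

def projected_velocity_target(u1_proj, u_t, t):
    """Projected velocity target (u1^proj - u_t) / max(1 - t, 1e-3)."""
    return (u1_proj - u_t) / max(1.0 - t, 1e-3)
```

In practice, the adaptive Cholesky fallback mentioned above would wrap the linear solve, retrying with a larger $\epsilon$ when factorization fails.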
### A.1 Partial Differential Equations
#### Evaluation Metrics.
The selected evaluation criteria follow the metrics used by [Utkarsh et al. (2025)](https://arxiv.org/html/2605.12754#bib.bib6). Each is described below:
- **Mean Squared Error of the Mean (MMSE):** Measures the squared error between the generated mean field and the ground-truth mean field. It evaluates how accurately the model reproduces the expected, or average, behavior of the target distribution.
- **Mean Squared Error of the Standard Deviation (SMSE):** Measures the squared error between the generated standard-deviation field and the ground-truth standard-deviation field. It evaluates how well the model captures the spatial distribution of uncertainty or variability.
- **Constraint Violation of Boundary/Initial Conditions (CV (BC/IC)):** Measures the violation of the boundary or initial conditions, depending on which are relevant for the problem. Navier–Stokes, Reaction–Diffusion IC, and Burgers IC enforce initial conditions, while Burgers BC enforces boundary conditions.
- **Constraint Violation of Conservation Laws (CV (CL)):** Measures the violation of the conservation laws determined by the specific problem. These are linear for Navier–Stokes and nonlinear for Burgers and Reaction–Diffusion IC.
#### Baselines.
The following baselines are adopted for comparison in this experimental setting:
- **Physics-Constrained Flow Matching (PCFM) [[53]](https://arxiv.org/html/2605.12754#bib.bib6):** Uses an identical sampling setup to CAFM, isolating the impact of end-to-end training.
- **Extrapolation, Correction, and Interpolation (ECI) [[13]](https://arxiv.org/html/2605.12754#bib.bib26):** Uses a final-state correction similar to PCFM and CAFM, but operates in a gradient-free manner. Strong performance has been reported for this method in PDE settings, making it a relevant baseline.
- **Functional Flow Matching (FFM) [[32]](https://arxiv.org/html/2605.12754#bib.bib38):** Unconstrained generative flow matching baseline, adapted from standard flow matching for spatiotemporal function spaces.
#### Implementation.
The PDE experiments evaluate constrained generation across Reaction–Diffusion, Navier–Stokes, Burgers BC, and Burgers IC systems. Unless otherwise specified, all implementation details follow the implementation from [[53]](https://arxiv.org/html/2605.12754#bib.bib6). The flow model uses an FNO-based encoder for PDE fields, and the differentiable projection layer uses `max_iter=3` SQP/Gauss–Newton correction steps with exact Jacobians.
For the one-dimensional Reaction–Diffusion setting, models are trained for 20,000 iterations with batch size 256 and learning rate $3 \times 10^{-5}$. The training data is generated following [[53]](https://arxiv.org/html/2605.12754#bib.bib6); samples have dimensions $128 \times 100$. For Navier–Stokes, models are trained for 50,000 iterations with batch size 16 and learning rate $3 \times 10^{-5}$; samples have dimensions $32 \times 32 \times 25$. Because of the larger dimensionality of this setting, projection uses `max_iter=1` with `jacobian_mode=penalty`, `penalty_steps=5`, and `penalty_step_size=1e-2`. For Burgers with boundary-condition constraints, models are trained for 20,000 iterations with batch size 256 and learning rate $3 \times 10^{-5}$, using exact Jacobians and `max_iter=3`; samples have dimensions $101 \times 101$. For Burgers with initial-condition constraints, models are trained identically (20,000 iterations, batch size 256, learning rate $3 \times 10^{-5}$, exact Jacobians, `max_iter=3`); samples again have dimensions $101 \times 101$.
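The per-system settings above can be collected into a compact configuration summary. The dictionary below simply restates them for quick reference; the key names are illustrative, not the authors' code:

```python
# Hypothetical summary of the per-system PDE training settings above.
PDE_CONFIGS = {
    "reaction_diffusion": dict(iters=20_000, batch=256, lr=3e-5,
                               dims=(128, 100), max_iter=3, jacobian="exact"),
    "navier_stokes":      dict(iters=50_000, batch=16,  lr=3e-5,
                               dims=(32, 32, 25), max_iter=1, jacobian="penalty",
                               penalty_steps=5, penalty_step_size=1e-2),
    "burgers_bc":         dict(iters=20_000, batch=256, lr=3e-5,
                               dims=(101, 101), max_iter=3, jacobian="exact"),
    "burgers_ic":         dict(iters=20_000, batch=256, lr=3e-5,
                               dims=(101, 101), max_iter=3, jacobian="exact"),
}
```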
### A.2 Microweather Wind Velocity Field Estimation
#### Evaluation Metrics.
The selected evaluation criteria follow the metrics used by [Warner et al. (2026)](https://arxiv.org/html/2605.12754#bib.bib1). Each is described below:
- **Mean Squared Error of the Mean (MMSE):** Measures the squared error between the generated mean field and the ground-truth mean field. It evaluates how accurately the model reproduces the expected, or average, behavior of the target distribution.
- **Mean Squared Error of the Variance (Variance MSE):** Measures the squared error between the generated variance field and the ground-truth variance field. It evaluates how accurately the model reproduces the second-order statistics (variance structure) of the target distribution.
- **Constraint Violation of Coherence Function (CV (Coherence)):** Measures the violation of the prescribed coherence function.
#### Baselines.
The following baselines are adopted for comparison in this experimental setting:
- **Physics-Constrained Flow Matching (PCFM) [[53]](https://arxiv.org/html/2605.12754#bib.bib6):** Uses an identical sampling setup to CAFM, isolating the impact of end-to-end training.
- **Projected Diffusion Models (PDM) [[15]](https://arxiv.org/html/2605.12754#bib.bib3):** Constrained sampler that applies projections at intermediate steps rather than at the final state. We adapt the methodology to use a flow-matching backbone rather than score-based diffusion, keeping the sampler implementation otherwise consistent.
- **Functional Flow Matching (FFM) [[32]](https://arxiv.org/html/2605.12754#bib.bib38):** Unconstrained generative flow matching baseline, adapted from standard flow matching for spatiotemporal function spaces.
#### Implementation.
The wind velocity experiments use a fixed-grid version of the microweather field estimation problem. Unlike the continuous DeepONet representation [[38]](https://arxiv.org/html/2605.12754#bib.bib64) used in prior work, this implementation models wind samples directly as fixed-grid tensors [[56]](https://arxiv.org/html/2605.12754#bib.bib1). Each sample is represented on a $10 \times 10$ spatial grid with a temporal horizon of length 256. Wind values are normalized to the range $[-2, 47]$, and conditioning is provided through the sparse SODAR observation pattern.
The flow model is a data-space flow matching network with hidden size 128, four hidden layers, minimum noise scale $\sigma_{\min} = 10^{-2}$, and 12 ODE integration steps. The constraint enforces wind coherence over all point pairs, computed across the entire grid, with a constraint tolerance of $10^{-3}$. The baseline flow model is trained with batch size 8, 25 epochs, Adam learning rate $10^{-3}$, and gradient clipping at norm 1.0. The CAFM setting is trained with projection enabled, batch size 8, 25 epochs, Adam learning rate $10^{-4}$, and gradient clipping at norm 1.0. Runs are evaluated over seeds 0, 1, and 2.
Training-time projection is performed in data space using an unrolled augmented Lagrangian method rather than SQP. Both the training projector and the evaluation-time projection use a full-grid augmented Lagrangian method with Adam learning rate $10^{-2}$, `inner_iters=64`, `outer_iters=8`, `rho_init=1.0`, `rho_scale=2.0`, `rho_max=128.0`, and `beta=0.0`. Projection is applied at every sampling step.
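A minimal sketch of such an augmented Lagrangian projector is given below, assuming equality constraints $c(z) = 0$ returned by a `constraint_fn` callable and a proximity objective to the input sample; the function name and exact objective are our assumptions (the role of `beta=0.0` is not specified above, so it is omitted):

```python
import torch

def al_project(z_init, constraint_fn, lr=1e-2, inner_iters=64, outer_iters=8,
               rho_init=1.0, rho_scale=2.0, rho_max=128.0):
    """Augmented Lagrangian projection onto {z : c(z) = 0} (sketch).

    Minimizes 0.5*||z - z_init||^2 + lam^T c(z) + 0.5*rho*||c(z)||^2 with
    Adam inner iterations and dual/penalty updates in the outer loop. This
    is the evaluation-time variant; the training-time projector unrolls
    the same updates differentiably.
    """
    z0 = z_init.detach()
    z = z0.clone().requires_grad_(True)
    lam = torch.zeros_like(constraint_fn(z0))
    rho = rho_init
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            opt.zero_grad()
            c = constraint_fn(z)
            loss = (0.5 * (z - z0).pow(2).sum()
                    + (lam * c).sum() + 0.5 * rho * c.pow(2).sum())
            loss.backward()
            opt.step()
        with torch.no_grad():
            lam = lam + rho * constraint_fn(z)  # dual ascent on the multipliers
        rho = min(rho * rho_scale, rho_max)     # geometric penalty schedule
    return z.detach()
```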
### A.3 Microstructure Inverse-Design
#### Evaluation Metrics.
- **Mean Squared Error (MSE):** Measures the average squared error between each sample and the ground-truth mean field. This metric is adopted rather than MMSE because the inverse-design targets shift the generated mean field out-of-distribution, making it not directly comparable to the training data.
- **Nearest Neighbor Mean Squared Error (NNMSE):** Measures the average squared error between each sample and its nearest neighbor in the training set, benchmarking how close generated samples are to the ground-truth data.
- **Constraint Violation of Porosity (CV (Porosity)):** Measures the percentage of samples violating the exact porosity target.
#### Baselines.
The following baselines are adopted for comparison in this experimental setting:
- **Physics-Constrained Flow Matching (PCFM) [[53]](https://arxiv.org/html/2605.12754#bib.bib6):** Uses an identical sampling setup to CAFM, isolating the impact of end-to-end training.
- **Projected Diffusion Models (PDM) [[15]](https://arxiv.org/html/2605.12754#bib.bib3):** Constrained sampler that applies projections at intermediate steps rather than at the final state. We adapt the methodology to use a flow-matching backbone rather than score-based diffusion, keeping the sampler implementation otherwise consistent.
- **Flow Matching (FM) [[37]](https://arxiv.org/html/2605.12754#bib.bib39):** Unconstrained generative flow matching baseline.
#### Implementation.
The microstructure experiments evaluate RGB Bentheimer sandstone generation under a target porosity constraint. Images are generated at resolution $256 \times 256$ using the dataset of [[34]](https://arxiv.org/html/2605.12754#bib.bib44). During training, images are randomly horizontally flipped and rescaled to $[-1, 1]$.
The generative backbone is a UNet with base channel count 128, channel multipliers $[1, 1, 2, 2, 4, 4]$, two residual blocks per resolution, attention at resolution 16, dropout 0, and EMA decay 0.999. The flow process uses 1000 training flow steps. The baseline flow matching model is trained with batch size 16, Adam learning rate $2 \times 10^{-5}$, gradient clipping at norm 1.0, snapshots every 5000 steps, and validation every 250 steps. The CAFM run is trained with batch size 8, a maximum of 35,000 steps, and Adam learning rate $2 \times 10^{-5}$. Sampling uses Euler integration with 250 steps and batch size 8. Evaluation uses 1000 generated samples. In PCFM, the predicted endpoint is projected at each Euler step. Both PCFM and PDM also apply a hard save-time projection with $\epsilon = 0$ to enforce the exact target porosity/count constraint after sampling.
The microstructure constraint is a porosity/count constraint. RGB samples are first converted to grayscale using weights $[0.299, 0.587, 0.114]$. Pixels are thresholded at 0.0, and the target fraction is set to $k = 0.6251$. The projection operator uses an unrolled heuristic top-$k$/count projector with a differentiable surrogate update. Projection uses step size 0.1, `n_iter=5`, and the fixed-point iteration backpropagation rule.
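One plausible reading of this projector is a damped quantile-shift update: subtracting the $k$-quantile of the grayscale image moves the fraction of pixels below the 0.0 threshold toward $k$. The sketch below follows this reading and is our own reconstruction under those assumptions, not the authors' code:

```python
import torch

def porosity_project(x, k=0.6251, step=0.1, n_iter=5):
    """Heuristic top-k/count projection toward a target pore fraction.

    Assumed reconstruction: pixels whose grayscale value falls below 0.0
    count as pore; subtracting the k-quantile shifts the below-threshold
    fraction toward k. torch.quantile is differentiable, so this acts as
    a surrogate update, with gradients taken via the fixed-point rule.
    """
    w = torch.tensor([0.299, 0.587, 0.114], device=x.device).view(3, 1, 1)
    for _ in range(n_iter):
        gray = (w * x).sum(dim=0)                # RGB -> grayscale, (H, W)
        q_k = torch.quantile(gray.flatten(), k)  # value at target fraction k
        x = x - step * q_k                       # damped shift toward target
    return x
```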
## Appendix B Backward-Pass Implementation Details
Standard unrolling approaches often suffer from high memory cost and backward-pass instability. Instead, following [Kotary et al. (2023)](https://arxiv.org/html/2605.12754#bib.bib7), folded optimization is adopted, which extends unrolling by differentiating the solver's fixed-point conditions, providing gradients via a single linear solve rather than explicitly unrolling $K$ iterations [[33]](https://arxiv.org/html/2605.12754#bib.bib7). This enables compatibility with highly optimized blackbox solvers for computing the fixed point, as it is only necessary to differentiate $\mathbf{x}^{(K)}(\hat{\mathbf{c}})$ rather than unrolling previous iterates.
#### Efficient Backward-Pass.
To propagate gradients through this projection layer, consider the chain rule:
$$\frac{\partial \mathcal{L}}{\partial v_{\theta}} = \frac{\partial \mathcal{L}}{\partial \mathcal{P}_{C}(\hat{z})} \cdot \underbrace{\frac{\partial \mathcal{P}_{C}(\hat{z})}{\partial \hat{z}}}_{\text{Jacobian of projection}} \cdot \frac{\partial \hat{z}}{\partial v_{\theta}}.$$
However, materializing the Jacobian $\frac{\partial \mathcal{P}_{C}(\hat{z})}{\partial \hat{z}}$ explicitly results in a high memory footprint, leading to challenges when scaling to higher-dimensional settings and models. Rather than explicitly constructing this matrix, observe that the full Jacobian is never required; only the vector-Jacobian product (VJP) is needed:
$$\Bigl(\frac{\partial \mathcal{P}_{C}(\hat{z})}{\partial \hat{z}}\Bigr)^{\top} u, \qquad u = \frac{\partial \mathcal{L}}{\partial \mathcal{P}_{C}(\hat{z})},$$
where $u$ is the upstream gradient. As shown by [Kotary et al. (2023)](https://arxiv.org/html/2605.12754#bib.bib7), when the projection layer is defined by the fixed point of an iterative solver, with $z_{1} = \mathbf{z}^{\star}(\hat{z}_{1}) = \Phi(\mathbf{z}^{\star}, \hat{z}_{1})$, differentiation with respect to $\hat{z}_{1}$ yields
$$\frac{\partial z_{1}}{\partial \hat{z}_{1}} = \frac{\partial \Phi}{\partial \mathbf{z}^{\star}} \frac{\partial z_{1}}{\partial \hat{z}_{1}} + \frac{\partial \Phi}{\partial \hat{z}_{1}}.$$
The system can be rearranged as
$$\Bigl(I - \frac{\partial \Phi}{\partial \mathbf{z}^{\star}}\Bigr) \frac{\partial z_{1}}{\partial \hat{z}_{1}} = \frac{\partial \Phi}{\partial \hat{z}_{1}}.$$
The VJP can then be computed implicitly by solving the linear system
$$\Bigl(I - \frac{\partial \Phi}{\partial \mathbf{z}^{\star}}\Bigr)^{\top} w = u$$
and obtaining
$$\Bigl(\frac{\partial \mathcal{P}_{C}(\hat{z})}{\partial \hat{z}}\Bigr)^{\top} u = \Bigl(\frac{\partial \Phi}{\partial \hat{z}_{1}}\Bigr)^{\top} w,$$
where $w$ is the solution to the linear system.
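For small problems where the partial Jacobians of $\Phi$ fit in memory, this backward pass reduces to a single linear solve, as in the sketch below (our illustration only; at scale, matrix-free VJPs built on `torch.autograd.grad` avoid materializing these matrices):

```python
import torch

def folded_vjp(Phi, z_star, z_hat, u):
    """VJP through a fixed point z* = Phi(z*, z_hat) via one linear solve.

    Solves (I - dPhi/dz*)^T w = u, then returns (dPhi/dz_hat)^T w, matching
    the equations above. Materializes the Jacobians for clarity only.
    """
    J_z = torch.autograd.functional.jacobian(lambda z: Phi(z, z_hat), z_star)
    J_c = torch.autograd.functional.jacobian(lambda c: Phi(z_star, c), z_hat)
    I = torch.eye(J_z.shape[0], dtype=z_star.dtype, device=z_star.device)
    w = torch.linalg.solve((I - J_z).T, u)   # (I - dPhi/dz*)^T w = u
    return J_c.T @ w                         # (dPhi/dz_hat)^T w
```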
## Appendix C Warm Starting Constraint-Aware Training
As mentioned in Section [7](https://arxiv.org/html/2605.12754#S7), the CAFM objective can be warm started by initializing from pretrained flow matching model weights. While warm starting is not a fundamental requirement of the methodology, we view it as a practical adaptation of CAFM that reduces the training complexity introduced by unrolling the projection operator. This is particularly valuable because including projections in the training forward pass results in roughly a 2.3× increase in runtime per training step on the PDE suite. The following ablation tests how warm starting reduces the number of CAFM training steps needed to reach convergence.

| | FFM | CAFM |
| --- | --- | --- |
| Fwd. | 111.6 ms | 212.4 ms |
| Bwd. | 12.8 ms | 74.6 ms |
| Total | 124.4 ms | 287.0 ms |
Figure 4: **Left:** Warm starting CAFM training from various points in standard flow matching training on Burgers BC. Additional warm-starting steps yield faster convergence of the CAFM objective, with later transitions quickly converging to the same point as earlier transitions. **Right:** Average runtime of the forward-pass and backward-pass operations, reporting the overall runtime impact of CAFM training.

Figure [4](https://arxiv.org/html/2605.12754#A3.F4) provides empirical evidence that warm starting is useful: runs adapted to the CAFM objective later in the standard flow matching training process converge more quickly to similar levels of MMSE and SMSE error. This provides strong practical evidence that training complexity can be mitigated through warm-starting adaptations, and we defer further exploration of this behavior to subsequent studies.
## Appendix D Unrolling Steps Ablation
Section [5.2](https://arxiv.org/html/2605.12754#S5.SS2) defines the differentiable projection operator as an iterative optimization layer. In particular, starting from the predicted clean endpoint $\mathbf{x}^{(0)}(\hat{\mathbf{c}}) = \hat{z}_{1}$, the projection is approximated by repeatedly applying an update map $\Phi$,
$$\mathbf{x}^{(k+1)}(\hat{\mathbf{c}}) = \Phi(\mathbf{x}^{(k)}, \hat{\mathbf{c}}), \qquad k = 0, \ldots, K-1,$$
so that the projected endpoint used by CAFM is given by
$$\mathcal{P}_{C}(\hat{z}_{1}) \approx \mathbf{x}^{(K)}(\hat{\mathbf{c}}).$$
The number of iterations $K$ dictates the depth of the differentiable projection layer used during training. To study the sensitivity of CAFM to this approximation, we perform an ablation over the number of unrolling steps $K$ in the fixed-grid wind setting.
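In code, the unrolled operator amounts to $K$ applications of $\Phi$ (a minimal sketch; gradients flow via the folded fixed-point rule of Appendix B rather than by storing every iterate):

```python
def unrolled_projection(z_hat, c_hat, Phi, K):
    """Approximate P_C(z_hat) by K applications of the update map Phi."""
    x = z_hat                # x^(0) = z_hat
    for _ in range(K):
        x = Phi(x, c_hat)    # x^(k+1) = Phi(x^(k), c_hat)
    return x                 # x^(K) approximates the projection
```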
Table 3: One-seed unrolling-steps ablation for constraint-aware fixed-grid wind sampling. For each value of $K$, we evaluate the downstream-selected checkpoint trained with that number of projection iterations. During inference, the augmented Lagrangian projection is run with a fixed number of iterations, isolating the results to the training setup. The corresponding results are reported in Table [3](https://arxiv.org/html/2605.12754#A4.T3).
The ablation suggests that only a few unrolled iterations are necessary for this setting. Increasing the number of projection-layer iterations from $K=1$ to $K=2$ sharply reduces coherence violation, and $K=4$ reaches the training-data floor for this seed under the stronger inference projection. Beyond this point, constraint violation does not improve monotonically, suggesting that the feasibility benefit of additional unrolling steps has largely saturated. Larger values of $K$, however, preserve or slightly improve MMSE and substantially improve Variance MSE for $K \geq 8$.