Constrained Flow Optimization via Sequential Fine-Tuning for Molecular Design
Summary
Introduces Constrained Flow Optimization (CFO), a framework for fine-tuning generative flow models to maximize rewards while satisfying constraints in molecular design, with theoretical guarantees and experimental validation.
# Constrained Flow Optimization via Sequential Fine-Tuning for Molecular Design
Source: [https://arxiv.org/html/2605.30610](https://arxiv.org/html/2605.30610)
###### Abstract
Adapting generative foundation models, in particular diffusion and flow models, to optimize given reward functions (e.g., binding affinity) while satisfying constraints (e.g., molecular synthesizability) is fundamental for their adoption in real-world scientific discovery applications such as molecular design or protein engineering. While recent works have introduced scalable methods for reward-guided fine-tuning of such models via reinforcement learning and control schemes, it remains an open problem how to algorithmically trade off reward maximization and constraint satisfaction in a reliable and predictable manner. Motivated by this challenge, we first present a rigorous framework for *Constrained Generative Optimization*, which brings an optimization viewpoint to the introduced adaptation problem and retrieves the relevant task of constrained generation as a sub-case. Then, we introduce Constrained Flow Optimization (CFO), an algorithm that automatically and provably balances reward maximization and constraint satisfaction by reducing the original problem to sequential fine-tuning via established, scalable methods. We provide convergence guarantees for constrained generative optimization and constrained generation via CFO. Ultimately, we present an experimental evaluation of CFO on both synthetic, yet illustrative, settings, and a molecular design task. Across these evaluations, CFO achieves consistent increases in reward while ensuring high constraint satisfaction, showcasing its practical utility for constrained generative optimization.
Machine Learning, ICML
## 1 Introduction
Recent advances in generative modeling, particularly the advent of diffusion (Ho et al., [2020](https://arxiv.org/html/2605.30610#bib.bib22); Song et al., [2020](https://arxiv.org/html/2605.30610#bib.bib43), [2022](https://arxiv.org/html/2605.30610#bib.bib44)) and flow models (Lipman et al., [2022](https://arxiv.org/html/2605.30610#bib.bib31)), have led to state-of-the-art performance in several fields such as image synthesis (Rombach et al., [2022](https://arxiv.org/html/2605.30610#bib.bib41)), biology (Corso et al., [2022](https://arxiv.org/html/2605.30610#bib.bib11); Wohlwend et al., [2025](https://arxiv.org/html/2605.30610#bib.bib52)), and chemistry (Hoogeboom et al., [2022](https://arxiv.org/html/2605.30610#bib.bib23)). In particular, they have been applied to the design of protein structures (Wu et al., [2024](https://arxiv.org/html/2605.30610#bib.bib53)), drug-like molecules (Dunn and Koes, [2024](https://arxiv.org/html/2605.30610#bib.bib16)), and DNA sequences (Stark et al., [2024](https://arxiv.org/html/2605.30610#bib.bib45)), among others. These generative models excel at capturing complex data distributions and generating realistic samples. However, approximately sampling from the data distribution is insufficient for most real-world discovery applications, where one typically wishes to generate candidates maximizing task-specific *rewards*, a problem recently denoted by *generative optimization* (De Santi et al., [2025a](https://arxiv.org/html/2605.30610#bib.bib13); Li et al., [2024](https://arxiv.org/html/2605.30610#bib.bib29)). Examples of rewards of interest include binding affinity in drug discovery (Pantsar and Poso, [2018](https://arxiv.org/html/2605.30610#bib.bib37)), or drug-likeness (Bickerton et al., [2012](https://arxiv.org/html/2605.30610#bib.bib5)).
To tackle the generative optimization problem, recent works have introduced scalable fine-tuning methods that adapt a pre-trained flow or diffusion model to maximize a given reward function under KL-regularization from the pre-trained model, via reinforcement learning (RL) or control theoretic methods (e.g., Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15); Uehara et al., [2024b](https://arxiv.org/html/2605.30610#bib.bib49); Tang and Zhou, [2024](https://arxiv.org/html/2605.30610#bib.bib46)).
The importance of known constraints in generative optimization. Many generative design and scientific discovery problems require generated samples to satisfy explicit, domain-specific constraints, e.g., bounded toxicity (Amorim et al., [2024](https://arxiv.org/html/2605.30610#bib.bib1)), synthetic accessibility (Ertl and Schuffenhauer, [2009](https://arxiv.org/html/2605.30610#bib.bib17); Neeser et al., [2024](https://arxiv.org/html/2605.30610#bib.bib35)), or biophysical plausibility of docking poses (Buttenschoen et al., [2024](https://arxiv.org/html/2605.30610#bib.bib9)). Even though current fine-tuning schemes regularize toward a pre-trained model (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15); Uehara et al., [2024b](https://arxiv.org/html/2605.30610#bib.bib49); Tang and Zhou, [2024](https://arxiv.org/html/2605.30610#bib.bib46)), which limits the distributional drift, they cannot certify that hard constraints are satisfied (Uehara et al., [2024a](https://arxiv.org/html/2605.30610#bib.bib48)). This limitation arises because task-specific constraints may not be encoded in the original dataset or may be learned only imperfectly from finite training data. A naive approach to address such explicit constraints would be to include them as rewards, i.e., as another term in a manually weighted objective function. However, this approach is unreliable in practice, as the appropriate weighting between rewards and constraints varies across tasks and training phases, and needs to be determined through inefficient trial and error. Furthermore, as optimization explores high-reward regions, the chosen weights can unexpectedly favor reward at the expense of constraint satisfaction, yielding samples with attractive rewards that nevertheless violate the domain-specific constraints. Driven by these limitations of current flow adaptation methods for constraint satisfaction, we pose the following question:
*How can we fine-tune a pre-trained flow or diffusion model to reliably and predictably trade off reward optimization and constraint satisfaction?*
Our approach. A growing body of work demonstrates that classical optimization ideas can be meaningfully adapted to the fine-tuning of flow and diffusion models, including formulations motivated by mirror descent (Nemirovskij and Yudin, [1983](https://arxiv.org/html/2605.30610#bib.bib36); De Santi et al., [2025a](https://arxiv.org/html/2605.30610#bib.bib13)), chance constraints (Ben-Tal and Nemirovski, [2000](https://arxiv.org/html/2605.30610#bib.bib4); Zhang et al., [2025a](https://arxiv.org/html/2605.30610#bib.bib55)), and bilevel optimization (Bracken and McGill, [1973](https://arxiv.org/html/2605.30610#bib.bib8); Xiao et al., [2025](https://arxiv.org/html/2605.30610#bib.bib54)). Analogously, in this work, we aim to tackle this question by introducing a formal framework for *Constrained Generative Optimization* (Sec. [3](https://arxiv.org/html/2605.30610#S3)) via flow model fine-tuning, which entails adapting a pre-trained flow model to generate samples maximizing a reward function while satisfying arbitrary constraints. Moreover, the proposed formulation retrieves the relevant task of constrained generative modeling as the sub-case where the reward function is constant. Next, we introduce Constrained Flow Optimization (CFO), a dual approach based on the augmented Lagrangian scheme (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)) that turns the constrained objective into a sequence of ordinary generative optimization subproblems. At a high level, CFO alternates between two steps: solving a KL-regularized fine-tuning problem (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15); Uehara et al., [2024b](https://arxiv.org/html/2605.30610#bib.bib49)) to maximize an augmented reward function, and updating the parameters of the augmented reward using estimated constraint violations on generated samples (see Sec. [4](https://arxiv.org/html/2605.30610#S4)).
This sequentially tunes the penalty on constraint violations, thereby avoiding manual selection of the trade-off weight. CFO makes it possible to adapt a pre-trained flow model to maximize expected rewards while enforcing satisfaction of arbitrary constraints and preserving closeness to the pre-trained model. We provide guarantees that ensure constraint satisfaction under the realistic assumption of an approximate solver, and that achieve reward maximization under a more idealized setting (Sec. [5](https://arxiv.org/html/2605.30610#S5)). Finally, we evaluate CFO on both constrained generative optimization and modeling problems, showcasing its performance in visually interpretable settings and in molecular design tasks involving constrained optimization of quantum mechanical properties (Sec. [6](https://arxiv.org/html/2605.30610#S6)).
Our contributions. We present the following contributions:
- We formulate *Constrained Generative Optimization* via flow fine-tuning, capturing the practically relevant task of reward-guided adaptation under given constraints (Sec. [3](https://arxiv.org/html/2605.30610#S3)).
- We introduce Constrained Flow Optimization (CFO), an augmented Lagrangian-based method that provably tackles the introduced problem via sequential fine-tuning (Sec. [4](https://arxiv.org/html/2605.30610#S4)).
- We provide guarantees for constrained generation and optimization via CFO under two oracle assumptions, leveraging augmented Lagrangian theory (Sec. [5](https://arxiv.org/html/2605.30610#S5)).
- We demonstrate CFO's ability to trade off reward maximization and constraint satisfaction in both a visually interpretable setting and a high-dimensional molecular design task (Sec. [6](https://arxiv.org/html/2605.30610#S6)).
Figure 1: (a) Constrained generative optimization via fine-tuning: pre-trained and fine-tuned policies inducing the density $p_1^{\text{pre}}$ and the optimal density $p_1^{*}$ w.r.t. a reward $r$ that increases downwards, with a high-cost area shown in red. (b) Adaptation to the low-cost area within the black line: the pre-trained model $p_1^{\text{pre}}$ adapts into $p_1^{*}$ to maximize $r$ and stay within the constraint region inside the black line.
## 2 Background and Notation
Flow Models. Flow-based generative models constitute a prominent class of approaches for transforming a simple base distribution $p^{\text{base}}$ (e.g., Gaussian) into a complex data distribution $p_{\rm data}$ (Song et al., [2022](https://arxiv.org/html/2605.30610#bib.bib44), [2020](https://arxiv.org/html/2605.30610#bib.bib43); Lipman et al., [2022](https://arxiv.org/html/2605.30610#bib.bib31)). Formally, a flow is a time-dependent map $\psi:[0,1]\times\mathbb{R}^{d}\to\mathbb{R}^{d}$, where $\psi_{t}(x_{0})$ denotes the position at time $t$ of a sample that started at $x_{0}$. The trajectory of $x_{t}$ is governed by a time-dependent velocity field $u:[0,1]\times\mathbb{R}^{d}\to\mathbb{R}^{d}$ through the ordinary differential equation (ODE):

$$\tfrac{\mathrm{d}}{\mathrm{d}t}\psi_{t}(x_{0})=u_{t}(\psi_{t}(x_{0})),\quad\psi_{0}(x_{0})=x_{0}. \tag{1}$$

A *generative* flow model defines a continuous-time Markov process $\{X_{t}\}_{t\in[0,1]}$ by sampling an initial value $X_{0}\sim p^{\text{base}}$ and evolving it according to the flow map, $X_{t}=\psi_{t}(X_{0})$. The terminal state $X_{1}=\psi_{1}(X_{0})$ is required to follow the target distribution, i.e., $X_{1}\sim p_{\rm data}$. Equivalently, the flow induces a family of intermediate marginal densities $p_{t}$ describing the law of $X_{t}$ at each time $t\in[0,1]$. We say that a velocity field $u$ generates the probability path $\{p_{t}\}_{t\in[0,1]}$ if the random variable $X_{t}=\psi_{t}(X_{0})\sim p_{t}$ for all $t<1$. In practice, choosing $p^{\text{base}}=\mathcal{N}(0,I)$ makes sampling tractable while $u_{t}$ provides the complexity needed to reach $p_{\rm data}$.
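As a minimal sketch of how Eq. 1 is used for sampling, the snippet below integrates the flow ODE with forward Euler in one dimension. The linear velocity field, its coefficients `a` and `b`, and the step count are illustrative assumptions, not quantities from the paper; the field is chosen only because its flow map has a closed form that the numerical integration can be checked against.

```python
import math
import random

def velocity(x, t, a=-1.0, b=2.0):
    """Hypothetical velocity field u_t(x) = a*x + b (time-independent for simplicity)."""
    return a * x + b

def simulate_flow(x0, n_steps=1000):
    """Integrate d/dt psi_t(x0) = u_t(psi_t(x0)) from t=0 to t=1 with forward Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x += dt * velocity(x, i * dt)
    return x

def exact_flow(x0, a=-1.0, b=2.0):
    """Closed-form psi_1(x0) for the linear field above, used only as a check."""
    return (x0 + b / a) * math.exp(a) - b / a

rng = random.Random(0)
x0_samples = [rng.gauss(0.0, 1.0) for _ in range(5)]    # X_0 ~ p_base = N(0, 1)
x1_samples = [simulate_flow(x0) for x0 in x0_samples]   # X_1 = psi_1(X_0)
max_err = max(abs(x1 - exact_flow(x0)) for x0, x1 in zip(x0_samples, x1_samples))
```

In a learned model the hand-written `velocity` would be replaced by the neural network $u_\theta$, and a higher-order ODE solver is a common alternative to Euler steps.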
Flow Matching. Flow Matching (Lipman et al., [2022](https://arxiv.org/html/2605.30610#bib.bib31)) is a simulation-free algorithm to learn a vector field $u_{\theta}$ such that the induced marginal densities $p^{u_{\theta}}_{t}$ coincide with a prescribed probability path $\{p_{t}\}_{t\in[0,1]}$, satisfying $p^{u_{\theta}}_{0}=p^{\text{base}}$ and $p^{u_{\theta}}_{1}=p_{\rm data}$. Lipman et al. ([2022](https://arxiv.org/html/2605.30610#bib.bib31)) demonstrate that the Flow Matching and Conditional Flow Matching objectives share identical gradients, ensuring they converge to the same optimal vector field. In practice, this is achieved by introducing a reference flow and regressing the learned field $u_{\theta}(x_{t},t)$ against the reference velocity:

$$\min_{\theta}\mathbb{E}_{t,p(x_{0},x_{1})}\left[\left\lVert u_{\theta}(x_{t},t)-\tfrac{\mathrm{d}}{\mathrm{d}t}\psi^{\text{ref}}_{t}(x)\right\rVert^{2}\right]. \tag{2}$$

With an appropriate choice of the reference flow, specifically one that follows a diffusion trajectory, the Flow Matching framework recovers diffusion models as a particular case, showing that diffusion training objectives can be viewed as special instances of flow-based learning (Lipman et al., [2022](https://arxiv.org/html/2605.30610#bib.bib31); Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)). In practice, $u_{\theta}$ is parameterized by a neural network, and sampling from $p^{u_{\theta}}_{1}$ ($\approx p_{\rm data}$) is performed by simulating the ODE in Eq. [1](https://arxiv.org/html/2605.30610#S2.E1).
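The regression in Eq. 2 can be illustrated with a Monte Carlo estimate of the loss. This sketch assumes a particular reference flow that the text does not fix: linear interpolation $x_t=(1-t)x_0+t\,x_1$ with independently drawn endpoints, whose time derivative is $x_1-x_0$. The point-mass data distribution and the two candidate fields are hypothetical and serve only to show that the loss vanishes for the matching velocity field.

```python
import random

def cfm_loss(u_theta, data_sampler, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the Eq. 2 objective for a scalar field u_theta(x, t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()                  # t ~ U[0, 1)
        x0 = rng.gauss(0.0, 1.0)          # x0 ~ p_base = N(0, 1)
        x1 = data_sampler(rng)            # x1 ~ p_data
        xt = (1.0 - t) * x0 + t * x1      # assumed reference flow: linear interpolation
        target = x1 - x0                  # time derivative of the reference flow
        total += (u_theta(xt, t) - target) ** 2
    return total / n_samples

M = 3.0                                   # hypothetical data distribution: point mass at M

def point_mass(rng):
    return M

def matching_field(x, t):
    """For point-mass data, x1 - x0 = (M - x_t) / (1 - t), so this field has zero loss."""
    return (M - x) / (1.0 - t)

def zero_field(x, t):
    return 0.0

loss_matching = cfm_loss(matching_field, point_mass)
loss_zero = cfm_loss(zero_field, point_mass)
```

In practice `u_theta` would be a neural network and the loss would be minimized by stochastic gradient descent rather than merely evaluated.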
Reinforcement Learning in continuous-time. Finite-horizon continuous-time reinforcement learning (RL) (Wang et al., [2020](https://arxiv.org/html/2605.30610#bib.bib51); Treven et al., [2023](https://arxiv.org/html/2605.30610#bib.bib47); Zhao et al., [2025](https://arxiv.org/html/2605.30610#bib.bib57)) provides a framework for decision-making in dynamical systems and can be cast as an instance of optimal control. The state space is $\mathcal{X}\coloneqq\mathbb{R}^{d}\times[0,1]$ and actions are taken from an action space $\mathcal{A}$. A policy $\pi:\mathcal{X}\to\mathcal{A}$ prescribes an action for each state $(x,t)\in\mathcal{X}$, yielding the dynamics:

$$\tfrac{\mathrm{d}}{\mathrm{d}t}\psi_{t}(x)=a_{t}(\psi_{t}(x)),\quad a_{t}=\pi(X_{t},t),\quad X_{0}\sim p^{\text{base}}. \tag{3}$$

The resulting process $\{X_{t}\}_{t\in[0,1]}$ induces a family of marginals $\{p_{t}^{\pi}\}_{t\in[0,1]}$. The aim is to optimize the expected performance, typically expressed through an integral reward accumulated along the trajectory and a terminal reward at $t=1$ (Wang et al., [2020](https://arxiv.org/html/2605.30610#bib.bib51)). In our setting, we focus solely on the terminal reward. We use RL notation to emphasize its generality and connection to standard practice, while noting that the setting coincides with deterministic optimal control since both the dynamics and the objective are known.
Pre-trained Flow Models as RL Policy. A pre-trained flow can be viewed as a feedback policy: at each time $t$ and state $x$, the velocity field $u^{\text{pre}}(x,t)$ prescribes the action that determines how the system evolves. Defining $a_{t}=\pi^{\text{pre}}(X_{t},t)\coloneqq u^{\text{pre}}(X_{t},t)$ for a policy $\pi^{\text{pre}}:\mathcal{X}\to\mathcal{A}$ (De Santi et al., [2025b](https://arxiv.org/html/2605.30610#bib.bib12)), and substituting into Eq. [3](https://arxiv.org/html/2605.30610#S2.E3), yields deterministic closed-loop dynamics. Starting from $X_{0}\sim p_{0}$, rolling out $\pi^{\text{pre}}$ produces a trajectory $\{X_{t}\}_{t\in[0,1]}$ with induced marginals $\{p_{t}^{\pi^{\text{pre}}}\}_{t\in[0,1]}$. Intuitively, the policy selects the direction and speed to steer samples so that their distribution progressively matches the data, with the terminal marginal $p_{1}^{\text{pre}}\coloneqq p_{1}^{\pi^{\text{pre}}}\approx p_{\text{data}}$. Viewing flow models through this policy lens not only unifies flow-based generation and control theory but also enables downstream fine-tuning as policy improvement with a terminal reward. For brevity, we refer to the pre-trained flow by its implicit policy $\pi^{\text{pre}}$.
## 3 Constrained Generative Optimization via Flow Fine-Tuning
In this work, we aim to fine-tune a pre-trained flow model $\pi^{\text{pre}}$ to obtain a new model $\pi^{*}$ inducing a process:

$$\tfrac{\mathrm{d}}{\mathrm{d}t}\psi_{t}(x)=a_{t}^{\text{fine}}(\psi_{t}(x)),\quad\text{with }a_{t}^{\text{fine}}=\pi^{*}(x_{t},t), \tag{4}$$

such that its induced distribution $p^{*}_{1}\coloneqq p_{1}^{\pi^{*}}$ maximizes the expected value of a property of interest, while satisfying arbitrary constraints and preserving prior information from $\pi^{\text{pre}}$. We denote this problem by *constrained generative optimization via fine-tuning*, illustrated in Figure [1](https://arxiv.org/html/2605.30610#S1.F1) and defined as:

Constrained Generative Optimization via Flow Fine-Tuning:

$$\begin{split}\operatorname*{arg\,max}_{\pi}\;&\mathbb{E}_{x\sim p^{\pi}_{1}}[r(x)]-\alpha D_{KL}(p^{\pi}_{1}\,\|\,p^{\text{pre}}_{1})\\ \text{s.t.}\;&\mathbb{E}_{x\sim p^{\pi}_{1}}[c(x)]\leq B\end{split} \tag{5}$$
Here, $r:\mathcal{X}\to\mathbb{R}$ and $c:\mathcal{X}\to\mathbb{R}$ are scalar reward and constraint functions, $\alpha\in\mathbb{R}_{+}$ determines the KL-regularization strength, and $B\in\mathbb{R}$ is the upper bound on the constraint. At the level of the problem statement, no differentiability of $r$ or $c$ is assumed: Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) is posed for arbitrary $r,c$. Any additional regularity arises only from the particular inner solver we instantiate later, not from the problem itself. Setting the reward term $r$ to be constant (e.g., $r=0$) in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) reduces the objective to a formulation of *constrained generation* as minimization of a KL divergence between the fine-tuned model density $p^{\pi}_{1}$ and the pre-trained model density $p^{\text{pre}}_{1}$, while satisfying the expected constraint bound in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5):

$$\operatorname*{arg\,min}_{\pi}\;\alpha D_{KL}(p^{\pi}_{1}\,\|\,p^{\text{pre}}_{1})\quad\text{s.t.}\quad\mathbb{E}_{x\sim p^{\pi}_{1}}[c(x)]\leq B \tag{6}$$

This problem has been studied before by Chamon et al. ([2024](https://arxiv.org/html/2605.30610#bib.bib10)); Khalafi et al. ([2025](https://arxiv.org/html/2605.30610#bib.bib25)). A first approach to tackle Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) is to optimize a fixed-weight Lagrangian (Chamon et al., [2024](https://arxiv.org/html/2605.30610#bib.bib10); Zhang et al., [2025b](https://arxiv.org/html/2605.30610#bib.bib56)):

$$\begin{split}\max_{\pi}\;\mathcal{L}_{\mu}(\pi)\;=\;&\mathbb{E}_{x\sim p^{\pi}_{1}}[r(x)]-\alpha D_{KL}(p^{\pi}_{1}\,\|\,p^{\text{pre}}_{1})\\ &-\mu\left(\mathbb{E}_{x\sim p^{\pi}_{1}}[c(x)]-B\right)\quad\text{s.t.}\;\;\mu\geq 0\end{split} \tag{7}$$

Here, $\mu\in\mathbb{R}_{\geq 0}$ denotes the Lagrange multiplier that penalizes constraint violations. However, optimizing $\mathcal{L}_{\mu}$ with a fixed $\mu$ is unreliable for enforcing the constraint. First, feasibility (i.e., $\mathbb{E}_{x\sim p^{\pi}_{1}}[c(x)]\leq B$) is not guaranteed for any given $\mu$, unless it exceeds an unknown, problem-dependent threshold. Second, $\mu$ must be tuned by hand, and there is no guaranteed or monotone mapping from $\mu$ to the resulting violation, so trial-and-error often leads to either infeasible or overly conservative solutions. Finally, if $r$ is unbounded or approximate (e.g., a learned proxy reward function), maximizing $\mathcal{L}_{\mu}$ may shift probability mass toward high-reward regions, yielding invalid designs.
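To make the role of $\mu$ concrete, the following worked identity is a sketch under an idealizing assumption that the text does not make: the policy class can realize any terminal density that is absolutely continuous with respect to $p_1^{\text{pre}}$, and the normalizer $Z_\mu$ is finite. Under that assumption, the standard Gibbs variational principle gives the maximizer of Eq. 7 in closed form:

```latex
p_1^{\mu}(x) \;=\; \frac{1}{Z_\mu}\, p_1^{\text{pre}}(x)\,
  \exp\!\left(\frac{r(x) - \mu\, c(x)}{\alpha}\right),
\qquad
Z_\mu \;=\; \mathbb{E}_{x \sim p_1^{\text{pre}}}\!\left[
  \exp\!\left(\frac{r(x) - \mu\, c(x)}{\alpha}\right)\right].
```

The bound $B$ enters Eq. 7 only through the additive constant $\mu B$, which does not affect the maximizer. The constraint value $\mathbb{E}_{x\sim p_1^{\mu}}[c(x)]$ is therefore fixed by the chosen $\mu$ alone, which is one way to see why a hand-picked multiplier gives no feasibility guarantee.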
Toward overcoming such limitations, in the next section, we propose an algorithm that can provably tackle the constrained generative optimization problem introduced in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) by sequentially fine-tuning the initial pre-trained model via established methods (e.g., Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)).
## 4 Constrained Flow Optimization (CFO)
In the following, we introduce Constrained Flow Optimization (Alg. [1](https://arxiv.org/html/2605.30610#alg1)), which addresses the *constrained generative optimization* problem as formulated in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) by solving a sequence of unconstrained entropy-regularized fine-tuning subproblems, each with a different objective function computed via an augmented Lagrangian (AL) scheme (Rockafellar, [1976](https://arxiv.org/html/2605.30610#bib.bib40); Fortin, [1975](https://arxiv.org/html/2605.30610#bib.bib19); Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)). Intuitively, CFO tackles the problem by embedding the given constraint into an *augmented* reward via two adaptive dual parameters, so that at each iteration, a standard entropy-regularized fine-tuning solver steers the model toward feasibility while improving reward. Concretely, CFO maintains two dual variables, the penalty parameter $\rho_{k}$ and the Lagrange multiplier $\lambda_{k}$, whose updates effectively realize a proximal-point-style scheme on the dual variables (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)).
Overview of the Algorithm. CFO (Alg. [1](https://arxiv.org/html/2605.30610#alg1)) takes as input a pre-trained model $\pi_{\text{pre}}$, a number of iterations $K$, a minimal Lagrange multiplier $\lambda_{\text{min}}<0$, an initial penalty parameter $\rho_{\text{init}}>0$, a penalty growth rate $\eta\geq 1$, and a contraction value $0<\tau<1$. At each iteration $k$, CFO performs $5$ main steps:
**Algorithm 1** Constrained Flow Optimization (CFO)

1. **Input:** $\pi_{\text{pre}}$: pre-trained model; $K$: number of iterations; $\lambda_{\text{min}}<0$: min. Lagrange multiplier; $\rho_{\text{init}}>0$: initial penalty parameter; $\eta\geq 1$: growth rate; $0<\tau<1$: contraction value
2. **Init:** Set initial Lagrange multiplier $\lambda_{1}=0$ and penalty parameter $\rho_{1}=\rho_{\text{init}}$
3. **for** $k=1,2,\dots,K$ **do**
4. **Step 1:** Update fine-tuning AL objective:
   $$f_{k}(x)\coloneqq r(x)-\frac{\rho_{k}}{2}\left[\max\left(0,c(x)-B-\frac{\lambda_{k}}{\rho_{k}}\right)\right]^{2} \tag{8}$$
5. **Step 2:** Compute $\pi_{k}$ via fine-tuning:
   $$\pi_{k}\leftarrow\textsc{FineTuningSolver}(f_{k},\pi_{\text{pre}}) \tag{9}$$
6. **Step 3:** Set the empirical constraint gap $G_{k}$ and contraction statistic $V_{k}$:
   $$\begin{split}G_{k}&=\mathbb{E}_{x\sim p^{\pi_{k}}_{1}}[c(x)]-B\\ V_{k}&=\min\left\{G_{k},-\lambda_{k}/\rho_{k}\right\}\end{split} \tag{10}$$
7. **Step 4:** Compute Lagrange multiplier proposal:
   $$\lambda_{k+1}\leftarrow\max\left\{\lambda_{\text{min}},\min\left\{0,\lambda_{k}-\rho_{k}G_{k}\right\}\right\} \tag{11}$$
8. **Step 5:** Set the new penalty:
   $$\rho_{k+1}=\begin{cases}\rho_{k},&\text{if }k=1\text{ or }V_{k}\leq\tau V_{k-1},\\ \eta\rho_{k},&\text{otherwise}\end{cases} \tag{12}$$
9. **end for**
10. **Return:** $\pi_{K}$
Step 1: An augmented objective $f_{k}$ (Eq. [8](https://arxiv.org/html/2605.30610#S4.E8)) is formed as the difference between the reward and a penalty term:

$$f_{k}(x)\;=\;r(x)-\frac{\rho_{k}}{2}\left[\max\left(0,c(x)-B-\frac{\lambda_{k}}{\rho_{k}}\right)\right]^{2},$$

where the offset $\lambda_{k}/\rho_{k}\leq 0$ shifts the term toward the current expected constraint (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)).
Step 2: A FineTuningSolver (e.g., Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) computes $\pi_{k}$ by solving a standard KL-regularized control (or RL) subproblem, with the current objective $f_{k}$:

$$\pi_{k}\;\in\;\arg\max_{\pi}\;\mathbb{E}_{x\sim p^{\pi}_{1}}\left[f_{k}(x)\right]-\alpha D_{KL}(p^{\pi}_{1}\,\|\,p^{\text{pre}}_{1}). \tag{13}$$

For completeness, we report a detailed implementation of this *oracle* step in Appendix [A](https://arxiv.org/html/2605.30610#A1). Other established fine-tuning schemes can be used in place of Adjoint Matching, including gradient-free choices such as DiffusionNFT (Zheng et al., [2025](https://arxiv.org/html/2605.30610#bib.bib85)) and Flow-GRPO (Liu et al., [2026](https://arxiv.org/html/2605.30610#bib.bib83)), which additionally enable non-differentiable rewards and constraints.
Step 3: CFO computes a Monte Carlo estimate of the constraint $c$ under the current policy $\pi_{k}$ (see Eq. [10](https://arxiv.org/html/2605.30610#S4.E10)), and subtracts the user-defined bound $B$, thus obtaining the *empirical constraint gap* $G_{k}$. Furthermore, a *contraction statistic* $V_{k}$ is computed, which measures the current progress toward feasibility by comparing the recent estimate $G_{k}$ of the constraint gap with the offset term $\lambda_{k}/\rho_{k}\leq 0$.
Step 4: Next, CFO uses the empirical constraint gap $G_{k}$ to apply a projected dual update to the Lagrange multiplier (Eq. [11](https://arxiv.org/html/2605.30610#S4.E11)). If $G_{k}>0$, i.e., the constraint is violated, the multiplier $\lambda_{k+1}$ is decreased, which strengthens the penalty. If instead $G_{k}<0$, i.e., the constraint is fulfilled, the multiplier $\lambda_{k+1}$ is increased toward 0 to relax the penalty strength.
Step 5: The contraction statistic $V_{k}$ (Eq. [10](https://arxiv.org/html/2605.30610#S4.E10)) assesses progress toward feasibility. If $V_{k}$ does not contract sufficiently, i.e., $V_{k}>\tau V_{k-1}$, where $\tau$ is a user-defined contraction rate, then CFO infers that the penalty is not sufficiently high and thus increases it by a factor $\eta$. If instead $V_{k}$ is contracting, $\rho$ is kept fixed (Eq. [12](https://arxiv.org/html/2605.30610#S4.E12)). Ultimately, CFO returns the fine-tuned policy $\pi_{K}$.
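The five steps can be traced end to end on a toy problem. The sketch below is not the paper's implementation: it works on a hypothetical finite sample space, computes expectations exactly instead of by Monte Carlo, and swaps the FineTuningSolver (Adjoint Matching in the paper) for the closed-form maximizer $p\propto p^{\text{pre}}\exp(f_k/\alpha)$, which is available only because the toy density can be enumerated. All names and default hyperparameters are illustrative assumptions; the updates follow Eqs. 8 and 10 to 12 as written.

```python
import math

XS = list(range(11))                      # hypothetical finite sample space {0, ..., 10}
P_PRE = [1.0 / len(XS)] * len(XS)         # toy "pre-trained" terminal density (uniform)

def reward(x):
    return float(x)                       # toy reward r(x)

def cost(x):
    return float(x)                       # toy constraint function c(x)

def exact_solver(f, alpha):
    """Stand-in for FineTuningSolver: p proportional to p_pre * exp(f / alpha)."""
    logits = [math.log(p) + f(x) / alpha for x, p in zip(XS, P_PRE)]
    top = max(logits)                     # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

def cfo(K=20, B=3.0, alpha=1.0, lam_min=-10.0, rho_init=1.0, eta=2.0, tau=0.9):
    lam, rho, v_prev, gaps = 0.0, rho_init, None, []
    for k in range(1, K + 1):
        # Step 1: augmented reward f_k (Eq. 8); defaults freeze the current lam, rho.
        def f_k(x, lam=lam, rho=rho):
            return reward(x) - 0.5 * rho * max(0.0, cost(x) - B - lam / rho) ** 2
        # Step 2: solve the KL-regularized subproblem (exactly, on this toy space).
        p_k = exact_solver(f_k, alpha)
        # Step 3: constraint gap G_k and contraction statistic V_k (Eq. 10).
        g_k = sum(p * cost(x) for x, p in zip(XS, p_k)) - B
        v_k = min(g_k, -lam / rho)
        # Step 5 test (Eq. 12), evaluated with rho_k before any variable is overwritten.
        keep_rho = (k == 1) or (v_k <= tau * v_prev)
        # Step 4: projected multiplier update (Eq. 11), using rho_k.
        lam = max(lam_min, min(0.0, lam - rho * g_k))
        rho = rho if keep_rho else eta * rho
        v_prev = v_k
        gaps.append(g_k)
    return p_k, lam, rho, gaps

p_final, lam_final, rho_final, gaps = cfo()
```

Inspecting `gaps` shows how the constraint gap evolves across iterations on this toy instance; with a neural flow model, `exact_solver` would be replaced by an actual fine-tuning run and `g_k` by a sample average.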
A discussion of hyperparameters appears in Appendix [D](https://arxiv.org/html/2605.30610#A4). Nevertheless, it is a priori unclear whether CFO is guaranteed to solve the *constrained generative optimization* problem (Eq. [5](https://arxiv.org/html/2605.30610#S3.E5)). In the next section, we answer this affirmatively by showing that, under oracle assumptions, CFO achieves reward optimality and arbitrary constraint satisfaction.
Figure 2: Panels: (a) $p^{\text{pre}}_{1}$ samples; (b) CFO samples; (c) AM samples; (d) Evaluation; (e) $p^{\text{pre}}_{1}$ samples; (f) CFO with $B=0$; (g) CFO with $B=1$; (h) Evaluation. *Top*, Constrained Generative Optimization: Samples from the pre-trained policy (a) and policies fine-tuned with CFO (with AM as FineTuningSolver) (b) and Adjoint Matching (AM) (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) (c). *Bottom*, Constrained generation: Samples from the pre-trained policy (e) and policies fine-tuned with CFO for $B=0$ (f) and $B=1$ (g). The constraint-free area is inside the red triangles. Tables (d) and (h) present numerical results for their row.
## 5 Guarantees for CFO
Before presenting the convergence properties of CFO, we first establish a mild and realistic assumption on the FineTuningSolver used in Alg. [1](https://arxiv.org/html/2605.30610#alg1), which formalizes the approximate nature of its optimization steps and serves as the foundation for the theoretical guarantees that follow.
###### Assumption 5.1 (Approx. Solver).
At every iteration $k$, the solver outputs a policy $\pi_{k}$ satisfying:

$$L_{\rho_{k}}(\pi_{k},\lambda_{k})\geq L_{\rho_{k}}(\pi,\lambda_{k})-\epsilon_{k},\quad\forall\pi \tag{14}$$

where $L_{\rho_{k}}(\pi,\lambda_{k})=\mathbb{E}_{x\sim p^{\pi}_{1}}\left[f_{k}(x)\right]-\alpha D_{KL}(p^{\pi}_{1}\,\|\,p^{\text{pre}}_{1})$ and the sequence $\{\epsilon_{k}\}\subseteq\mathbb{R}_{+}$ is bounded.
This assumption captures the approximate nature of practical fine-tuning or optimization oracles, and is standard in augmented Lagrangian (AL) frameworks. It has been adopted in recent works (e.g., De Santi et al., [2025b](https://arxiv.org/html/2605.30610#bib.bib12)). The key requirement is that the approximation error remains bounded.
We define the infeasibility of a policy $\pi$ as:

$$G(\pi)=\mathbb{E}_{x\sim p^{\pi}_{1}}[c(x)]-B. \tag{15}$$

If the infeasibility $G(\pi)$ of a given policy is positive, the policy is infeasible, i.e., its average constraint is larger than the permissible bound. If $G(\pi)$ is non-positive, the policy is feasible and thus fulfills the constraint. Using Assumption [5.1](https://arxiv.org/html/2605.30610#S5.Thmtheorem1) and Eq. [15](https://arxiv.org/html/2605.30610#S5.E15), we state our main convergence results for CFO. The proofs are in Appendix [E](https://arxiv.org/html/2605.30610#A5) and draw on the analysis developed by Birgin and Martínez ([2014](https://arxiv.org/html/2605.30610#bib.bib6)).
###### Theorem 5.2 (Feasibility of CFO).
Let $\{\pi_k\}$ be a sequence generated by Alg. [1](https://arxiv.org/html/2605.30610#alg1) under Assumption [5.1](https://arxiv.org/html/2605.30610#S5.Thmtheorem1) on the FineTuningSolver. Let $\bar{\pi}$ be a limit of the sequence $\{\pi_k\}$. Then, we have:

$$\left\langle G(\bar{\pi})\right\rangle_+ \leq \left\langle G(\pi)\right\rangle_+ \quad \forall \pi \tag{16}$$

where $G(\pi)$ is defined in Eq. [15](https://arxiv.org/html/2605.30610#S5.E15) and $\left\langle\cdot\right\rangle_+ := \max\{0,\cdot\}$.
Concretely, Theorem [5.2](https://arxiv.org/html/2605.30610#S5.Thmtheorem2) states that CFO returns a policy that minimizes the positive part of the introduced infeasibility measure (Eq. [15](https://arxiv.org/html/2605.30610#S5.E15)). It thus returns either a feasible policy or one whose infeasibility is as small as possible.
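As a minimal sketch of how the quantities in Eq. 15 and Eq. 16 could be estimated from samples, assume that constraint values $c(x)$ have already been evaluated on draws from $p^{\pi}_1$. The function names and the four sample values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_infeasibility(constraint_values, bound):
    """Monte Carlo estimate of G(pi) = E[c(x)] - B (Eq. 15)."""
    return float(np.mean(constraint_values)) - bound


def positive_part(value):
    """<v>_+ = max{0, v}, the quantity compared in Eq. 16."""
    return max(0.0, value)


# Hypothetical constraint values c(x) for four samples from a policy.
c_values = np.array([0.0, 0.0, 0.5, 1.5])

# The sample mean is 0.5, so the policy is infeasible for B = 0
# and feasible for B = 1.
g_strict = estimate_infeasibility(c_values, bound=0.0)
g_relaxed = estimate_infeasibility(c_values, bound=1.0)

print(g_strict, positive_part(g_strict))
print(g_relaxed, positive_part(g_relaxed))
```

The positive part vanishes exactly when the estimated expectation respects the bound, which is why Eq. 16 treats all feasible policies as equally good with respect to infeasibility.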
###### Corollary 5.3 (Feasibility of the Limiting Policy).
Under the same conditions as Theorem [5.2](https://arxiv.org/html/2605.30610#S5.Thmtheorem2), if the underlying problem admits a feasible policy, then the limiting policy $\bar{\pi}$ is feasible, i.e., $G(\bar{\pi}) \leq 0$.
Theorem [5.2](https://arxiv.org/html/2605.30610#S5.Thmtheorem2) and Corollary [5.3](https://arxiv.org/html/2605.30610#S5.Thmtheorem3) establish constraint satisfiability of CFO but do not yet show optimality of the returned policy. To achieve optimality, CFO requires a stronger assumption on the FineTuningSolver, namely that the approximation error vanishes asymptotically, i.e., $\epsilon_k \to 0$.
###### Theorem 5.4 (Optimality of CFO).
Let $\{\pi_k\}$ be the sequence generated by Alg. [1](https://arxiv.org/html/2605.30610#alg1) under Assumption [5.1](https://arxiv.org/html/2605.30610#S5.Thmtheorem1) with $\lim_{k\to\infty}\epsilon_k = 0$ (see Eq. [14](https://arxiv.org/html/2605.30610#S5.E14)). Let $\bar{\pi}$ be a limit of the sequence $\{\pi_k\}$. If the problem in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) is feasible, i.e., $\left\langle G(\bar{\pi})\right\rangle_+ = 0$, then the limiting policy $\bar{\pi}$ is a global maximizer.
In practice, a FineTuningSolver achieving $\epsilon_k \to 0$ is rarely available, and the attainable error bounds required by our guarantees depend strongly on the experimental setting. Nevertheless, we show in our experiments (Sec. [6](https://arxiv.org/html/2605.30610#S6)) that using Adjoint Matching (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) is sufficient in practice, consistently yielding near-optimal rewards while satisfying the constraints. The convergence guarantees of CFO do not require $r$ or $c$ to be differentiable. Any differentiability assumptions are inherited from the underlying FineTuningSolver. Consequently, if a gradient-free FineTuningSolver is used, such as DiffusionNFT (Zheng et al., [2025](https://arxiv.org/html/2605.30610#bib.bib85)) or Flow-GRPO (Liu et al., [2026](https://arxiv.org/html/2605.30610#bib.bib83)), then CFO can be applied in settings where $r$ and $c$ are available only through function evaluations.
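To make the interplay between an inexact inner solver and the outer multiplier updates concrete, the following is a minimal sketch of a textbook augmented Lagrangian loop on a scalar toy problem. It is not Alg. 1: the decision variable is a single number rather than a policy, the KL term is omitted, the penalty $\rho$ is held fixed, and all names (`inexact_inner_solver`, `augmented_lagrangian`) as well as the toy reward $r(x) = -(x-2)^2$ and constraint $c(x) = x \leq B = 1$ are assumptions made for illustration.

```python
def reward_grad(x):
    """Gradient of the toy reward r(x) = -(x - 2)^2."""
    return -2.0 * (x - 2.0)


def infeasibility(x, bound):
    """g(x) = c(x) - B with the toy constraint c(x) = x."""
    return x - bound


def inexact_inner_solver(x, lam, rho, bound, num_steps, step_size=0.1):
    """A few gradient-ascent steps on the augmented Lagrangian.

    Stand-in for a fine-tuning oracle: it is warm-started and stopped
    early, so each call returns only an approximate maximizer.
    """
    for _ in range(num_steps):
        shifted = max(0.0, lam + rho * infeasibility(x, bound))
        grad = reward_grad(x) - shifted  # dc/dx = 1 for c(x) = x
        x += step_size * grad
    return x


def augmented_lagrangian(num_outer, num_inner, bound=1.0, rho=1.0):
    """Alternate approximate inner solves with projected multiplier updates."""
    x, lam = 0.0, 0.0
    for _ in range(num_outer):
        x = inexact_inner_solver(x, lam, rho, bound, num_inner)
        lam = max(0.0, lam + rho * infeasibility(x, bound))
    return x, lam


x_final, lam_final = augmented_lagrangian(num_outer=30, num_inner=10)
print(x_final, lam_final)
```

In this toy problem the unconstrained maximizer $x = 2$ is infeasible, and the iterates approach the constrained solution $x = 1$ with multiplier $\lambda = 2$ even though no inner solve is run to convergence.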
## 6 Experimental Evaluation
\(a\)Dipole \(D\),CFOand AM
\(b\)Energy \(Ha\),CFOand AM
\(c\)Dipole \(D\),CFOand NFT
\(d\)Energy \(Ha\),CFOand NFT
\(e\)Evaluation across bothFineTuningSolverchoices \(95%95\\%CI\)\.
Figure 3: Energy-constrained dipole moment maximization of FlowMol (Dunn and Koes, [2024](https://arxiv.org/html/2605.30610#bib.bib16)) on GEOM Drugs (Axelrod and Gomez-Bombarelli, [2022](https://arxiv.org/html/2605.30610#bib.bib2)). CFO attains a dipole moment comparable to the unconstrained baselines, but unlike AM and NFT keeps the expected energy inside the feasible region. ([3(a)](https://arxiv.org/html/2605.30610#S6.F3.sf1)-[3(b)](https://arxiv.org/html/2605.30610#S6.F3.sf2)): Evolution of the constraint and reward during CFO fine-tuning ($K=6$, $N=10$) in comparison to AM (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) ($N=60$); the final iterate is shown in yellow. ([3(c)](https://arxiv.org/html/2605.30610#S6.F3.sf3)-[3(d)](https://arxiv.org/html/2605.30610#S6.F3.sf4)): Analogous trajectories with DiffusionNFT (Zheng et al., [2025](https://arxiv.org/html/2605.30610#bib.bib85)) as the inner FineTuningSolver; CFO ($K=5$) drives the expected energy below the $-80$ Ha bound that NFT alone fails to meet, while preserving a comparable dipole moment.

We demonstrate the ability of Constrained Flow Optimization (Alg. [1](https://arxiv.org/html/2605.30610#alg1)) to solve the *constrained generative optimization* problem (see Eq. [5](https://arxiv.org/html/2605.30610#S3.E5)) on both low-dimensional illustrative settings and molecular design tasks. In particular, we evaluate (i) the performance of CFO in solving Problem [5](https://arxiv.org/html/2605.30610#S3.E5) given visually interpretable reward and constraint functions, and (ii) the sub-case of constrained generation, recovered via a constant reward (see Eq. [6](https://arxiv.org/html/2605.30610#S3.E6)). We further show that (iii) CFO scales to high-dimensional molecular design tasks, and that (iv) it shows promising performance even with an approximate FineTuningSolver, or when run with a limited number of iterations $K$.
CFOreliably solves constrained generative optimization low\-dimensional tasks\.We evaluateCFOto solve the*constrained generative optimization*problem \(Eq\.[5](https://arxiv.org/html/2605.30610#S3.E5)\) on a visually interpretable setting, wherep1prep^\{\\text\{pre\}\}\_\{1\}is a mixture of two non\-overlapping Gaussians as shown in Figure[2\(a\)](https://arxiv.org/html/2605.30610#S4.F2.sf1), enabling visualization of constraint satisfaction during fine\-tuning\. In this setting, the rewardrris the negative squared distance to the white cross in Figures[2\(a\)](https://arxiv.org/html/2605.30610#S4.F2.sf1)\-[2\(c\)](https://arxiv.org/html/2605.30610#S4.F2.sf3)\(color\-coding in Figure[2\(b\)](https://arxiv.org/html/2605.30610#S4.F2.sf2)and[2\(c\)](https://arxiv.org/html/2605.30610#S4.F2.sf3)\)\. The constraintccis zero within the red triangles in Figures[2\(a\)](https://arxiv.org/html/2605.30610#S4.F2.sf1)\-[2\(c\)](https://arxiv.org/html/2605.30610#S4.F2.sf3), and increases linearly outside \(color\-coding in Figure[2\(a\)](https://arxiv.org/html/2605.30610#S4.F2.sf1)\)\. As shown in Figure[2\(b\)](https://arxiv.org/html/2605.30610#S4.F2.sf2),CFO, run withK=20K\\\!=\\\!20, andρinit=0\.5\\rho\_\{\\text\{init\}\}\\\!=\\\!0\.5, steers the pre\-trained flow model so its induced densityp∗p^\{\*\}is located predominantly within the valid regions \(i\.e\., red triangles\) where the constraint is fulfilled, while simultaneously optimizing the reward by moving samples toward the inner boundaries of both triangles\.CFOincreases the mean reward from−7\.62\-7\.62to−4\.75\-4\.75compared to the base model, while it reduces estimated constraint violations from0\.580\.58to0\.120\.12, as reported in Table[2\(d\)](https://arxiv.org/html/2605.30610#S4.F2.sf4)\. The minor residual violations ofCFO, in Figure[2\(b\)](https://arxiv.org/html/2605.30610#S4.F2.sf2), are likely due to Monte Carlo approximation errors during fine\-tuning\. 
In contrast toCFO, Adjoint Matching\(Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\), a well\-established reward\-guided fine\-tuning scheme, which does not take into account any constraint, raises the expected reward to−2\.93\-2\.93, but significantly degrades the model’s ability to satisfy the given constraints, increasing constraint violations from0\.580\.58to2\.472\.47\(Figure[2\(c\)](https://arxiv.org/html/2605.30610#S4.F2.sf3)\)\. The same pattern holds when swapping the innerFineTuningSolverwith a gradient\-free alternative, such as DiffusionNFT\(Zhenget al\.,[2025](https://arxiv.org/html/2605.30610#bib.bib85)\)\. DiffusionNFT\(Zhenget al\.,[2025](https://arxiv.org/html/2605.30610#bib.bib85)\)alone increases the reward to−3\.59\-3\.59but violates the constraint \(1\.761\.76\), whereasCFONFT\{\}\_\{\\text\{NFT\}\}attains−5\.28±0\.14\-5\.28\\pm 0\.14reward at0\.06±0\.010\.06\\pm 0\.01expected constraint \(see Table[2\(d\)](https://arxiv.org/html/2605.30610#S4.F2.sf4); qualitative samples in Apx\. Figure[7](https://arxiv.org/html/2605.30610#A2.F7)\), confirming thatCFOtransfers cleanly across first\-order \(AM\) and zeroth\-order \(NFT\) solvers\. We also benchmarked against DiffOpt\(Konget al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib26)\), a recent inference\-time constrained\-generation method that keeps model weights fixed: on this task, DiffOpt\(Konget al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib26)\)exhibits a per\-sample violation rate of1010–21%21\\%versus8\.5%8\.5\\%forCFO, while incurring4444–55×55\\timeshigher per\-sample sampling cost\.
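The reward and constraint of this illustrative setting (negative squared distance to a target, and a constraint that is zero inside a triangle and grows linearly outside) can be sketched as follows. The triangle vertices, the target location, and the choice of measuring violation by the largest violated half-plane inequality are all assumptions for illustration; the exact geometry used in Figure 2 is not reproduced here.

```python
import numpy as np

# Hypothetical triangle with vertices (0, 0), (1, 0), (0, 1),
# written as the intersection of half-planes A @ x <= b.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])

# Hypothetical stand-in for the "white cross" reward target.
TARGET = np.array([1.0, 1.0])


def reward(x):
    """Negative squared Euclidean distance to the target."""
    return -float(np.sum((x - TARGET) ** 2))


def constraint(x):
    """Zero inside the triangle, piecewise linear in x outside of it."""
    return max(0.0, float(np.max(A @ x - b)))


inside = np.array([0.25, 0.25])
outside = np.array([2.0, 0.0])
print(reward(inside), constraint(inside))
print(reward(outside), constraint(outside))
print(reward(TARGET), constraint(TARGET))
```

In this sketch the reward maximizer lies outside the triangle, so pure reward maximization and constraint satisfaction pull in different directions, mirroring the tension visible between the CFO and unconstrained AM results.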
Constant reward recovers Constrained Generation. To illustrate the constrained generation (see Eq. [6](https://arxiv.org/html/2605.30610#S3.E6)) capabilities of CFO, we consider a correlated Gaussian base density $p^{\text{pre}}_1$, visualized in Figure [2(e)](https://arxiv.org/html/2605.30610#S4.F2.sf5), and a constraint $c$ penalizing samples outside the red central triangle (see Figure [2(e)](https://arxiv.org/html/2605.30610#S4.F2.sf5)). In the following, we vary the bound $B \in \{0.0, 1.0\}$ (see Eq. [5](https://arxiv.org/html/2605.30610#S3.E5)) to obtain diverse flow models inducing fine-tuned distributions $p^*$. As shown in Figures [2(f)](https://arxiv.org/html/2605.30610#S4.F2.sf6)–[2(g)](https://arxiv.org/html/2605.30610#S4.F2.sf7), increasing $B$ causes the resulting densities to visibly expand beyond the zero-constraint region, illustrating the relaxation of constraint enforcement. Quantitatively, the selected degree of permissible violation (i.e., the value of $B$) is reflected in the mean constraint violations incurred by the respective flow models, obtained by running CFO with $K=20$ and $\rho_{\text{init}}=0.5$. As shown in Table [2(h)](https://arxiv.org/html/2605.30610#S4.F2.sf8), while setting $B=0.0$ leads to an expected constraint value of $0.01$, choosing $B=1.0$ renders CFO less restrictive, inducing a policy $\pi^*$ with a mean constraint of $0.48$. Relative to the base model, which exhibits $\mathbb{E}_{p^{\text{pre}}_1}[c(x)] = 0.57$, the violation thus decreases to $0.48$ under $B=1.0$ and further to $0.01$ under $B=0.0$. These results illustrate how the choice of $B$ controls tolerance to constraint violations, offering a mechanism to adapt CFO to domain-specific requirements.
CFOscales to high\-dimensional molecular design tasks\.To demonstrate the practical relevance ofCFOin high\-dimensional settings, we applyCFOto a molecular design task, where satisfying constraints is critical\. Specifically, we adapt a pre\-trained flow model, FlowMol\(Dunn and Koes,[2024](https://arxiv.org/html/2605.30610#bib.bib16)\), on GEOM Drugs\(Axelrod and Gomez\-Bombarelli,[2022](https://arxiv.org/html/2605.30610#bib.bib2)\), and maximize the dipole moment\(Minkin,[2012](https://arxiv.org/html/2605.30610#bib.bib34)\)under constraint fulfillment\. As constraints, we impose an upper bound on the totalxTBenergy \(i\.e\.,−80\-80Ha\), to be used as a proxy for chemical stability\. Further details on the constraint and reward functions employed are provided in Appendix[B](https://arxiv.org/html/2605.30610#A2)\. Both functions are computed via GNN\-based predictors \(see Appendix[B](https://arxiv.org/html/2605.30610#A2)\) trained onGFN2\-xTB\(Bannwarthet al\.,[2019](https://arxiv.org/html/2605.30610#bib.bib3)\)\. We employ differentiable rewards and constraints, because the specificFineTuningSolverwe use in our implementation, namely Adjoint Matching\(Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\), requires first\-order access to these functions\. Our method would also be compatible with non\-differentiable rewards and constraints \(see Sec\.[5](https://arxiv.org/html/2605.30610#S5)\)\.
In Figure[3](https://arxiv.org/html/2605.30610#S6.F3), we show the performance ofCFOfor the energy\-constrained dipole moment maximization molecular design task\. The optimal policyπ∗\\pi^\{\*\}computed byCFO\(K=6,N=10K\\\!\\\!=\\\!\\\!6,N\\\!\\\!=\\\!\\\!10\) increases the dipole moment from6\.556\.55Debye of the pre\-trained model to8\.398\.39Debye \(Figure[3\(a\)](https://arxiv.org/html/2605.30610#S6.F3.sf1)\)\. Simultaneously,π∗\\pi^\{\*\}shifts the flow model density to generate predominantly low\-energy samples, effectively achieving an expected energy of−82\.28\-82\.28Ha, thus satisfying the upper bound B of−80\-80Ha\. In Figure[4\(a\)](https://arxiv.org/html/2605.30610#S6.F4.sf1)\-[4\(c\)](https://arxiv.org/html/2605.30610#S6.F4.sf3), we present drug\-like samples from the fine\-tuned model, together with their ground\-truth reward and constraint values\. For reference, running Adjoint Matching \(N=60N\\\!\\\!=\\\!\\\!60\)\(Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\)purely for reward maximization, without enforcing the constraint, achieves a similar reward of8\.378\.37Debye, yet results in an expected energy of−78\.31\-78\.31Ha, thus not fulfilling the constraint \(see Table[3\(e\)](https://arxiv.org/html/2605.30610#S6.F3.sf5)\)\. In Appendix[B](https://arxiv.org/html/2605.30610#A2), we show that the GNN predictors are accurate throughout the optimization, with ground truth values of reward and constraint being optimized to the same extent\.
\(a\)15\.715\.7D /−86\.9\-86\.9Ha
\(b\)9\.19\.1D /−83\.4\-83\.4Ha
\(c\)12\.512\.5D /−93\.8\-93\.8Ha
\(d\)Molecular Statistic
Figure 4: ([4(a)](https://arxiv.org/html/2605.30610#S6.F4.sf1)-[4(c)](https://arxiv.org/html/2605.30610#S6.F4.sf3)): Drug-like molecules sampled from the fine-tuned model, together with ground-truth dipole moments (D) and energies (Ha). ([4(d)](https://arxiv.org/html/2605.30610#S6.F4.sf4)): Molecular statistics for 2000 molecules sampled from policies fine-tuned with CFO and AM: validity (definition in Apx. [C](https://arxiv.org/html/2605.30610#A3)), RDKit sanitization (Landrum, [2025](https://arxiv.org/html/2605.30610#bib.bib27)), QED (Ertl and Schuffenhauer, [2009](https://arxiv.org/html/2605.30610#bib.bib17)), Lipinski's rule of 5 (Lipinski, [2004](https://arxiv.org/html/2605.30610#bib.bib30)), and $\log$P (PubChem, [2010](https://arxiv.org/html/2605.30610#bib.bib38)). CFO preserves FineTuningSolver-level molecular statistics (e.g., QED, Lipinski) while additionally satisfying the energy budget, i.e., constraint satisfaction comes at essentially no cost in chemical quality compared to pure reward maximization.

To showcase that CFO is agnostic to the inner fine-tuning solver, we additionally run CFO with DiffusionNFT (Zheng et al., [2025](https://arxiv.org/html/2605.30610#bib.bib85)) ($K=5$) (Figures [3(c)](https://arxiv.org/html/2605.30610#S6.F3.sf3)-[3(d)](https://arxiv.org/html/2605.30610#S6.F3.sf4)). CFO with NFT increases the dipole moment from $6.55$ to $8.27$ Debye while reducing the expected energy to $-80.72$ Ha, thus satisfying the $-80$ Ha bound. In contrast, unconstrained NFT alone achieves a comparable reward of $8.30$ Debye but only $-78.67$ Ha in expected energy, violating the constraint. This confirms that CFO transfers across both first-order (AM) and gradient-free (NFT) solvers, while consistently steering the policy into the feasible region.
Optimizing molecular properties reduces validity, from34%34\\%for PRE to9%9\\%forCFOand to4%4\\%for AM\. Since validity is not explicitly enforced but only implicitly learned, fine\-tuning steers the model toward sparsely represented regions of chemical space where this notion degrades under our stringent validity criteria \(see definition in Apx\.[C](https://arxiv.org/html/2605.30610#A3)\);CFOstill maintains higher validity than AM at comparable reward\. In Appendix[C](https://arxiv.org/html/2605.30610#A3), we discuss how base model improvements and differentiable geometry relaxation could increase the validity of generated molecules\. We also report standard molecular statistics forCFOand AM to contextualize reward\-guided fine\-tuning\. Although not optimization targets, these metrics reflect shifts from the initial model, which is trained on GEOM Drugs and then fine\-tuned to maximize the dipole under energy constraints\. As shown in Table[4\(d\)](https://arxiv.org/html/2605.30610#S6.F4.sf4),CFOexhibits slightly smaller shifts compared to AM, e\.g\., a QED of0\.370\.37forCFO, which is closer to the0\.450\.45of PRE than the0\.340\.34for AM\. Beyond the expectation, we also report per\-sample feasibility:CFOsatisfies the energy constraint on61\.4%61\.4\\%of generated molecules, compared to40\.6%40\.6\\%for unconstrained AM\. To further illustrateCFO’s flexibility, in Appendix[C](https://arxiv.org/html/2605.30610#A3)we report an additional experiment where the energy constraint is replaced by a learnedPoseBusters\-based validity criterion\(Buttenschoenet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib9)\), whichCFOuses to reduce the predicted violation rate from53%53\\%to39%39\\%while still increasing the dipole moment\.
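The gap between feasibility in expectation and per-sample feasibility can be illustrated with a short sketch. The five energy values below are hypothetical and are not taken from the experiments; they only show that a mean below the bound is compatible with a minority of individual samples satisfying it.

```python
import numpy as np


def expected_and_per_sample_feasibility(constraint_values, bound):
    """Return (mean <= bound, fraction of samples with value <= bound)."""
    values = np.asarray(constraint_values, dtype=float)
    feasible_in_expectation = bool(values.mean() <= bound)
    per_sample_rate = float(np.mean(values <= bound))
    return feasible_in_expectation, per_sample_rate


# Hypothetical predicted energies (Ha) for five generated molecules.
energies_ha = [-95.0, -90.0, -78.0, -77.0, -70.0]

in_expectation, rate = expected_and_per_sample_feasibility(
    energies_ha, bound=-80.0
)
print(in_expectation, rate)
```

Here the mean energy is $-82$ Ha, so the expectation constraint holds, yet only two of the five samples lie below $-80$ Ha. This is why the per-sample rate is reported separately from the expected constraint value.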
CFO outperforms a manually tuned penalty baseline. We compare CFO to a manually tuned fixed-$\mu$ penalty baseline, which optimizes the Lagrangian with a fixed constraint weight $\mu$ (Eq. [7](https://arxiv.org/html/2605.30610#S3.E7)). To select $\mu$, we run the baseline over 18 values covering 12 orders of magnitude, $\mu \in [10^{-6}, 10^{6}]$ (Figure [5(a)](https://arxiv.org/html/2605.30610#S6.F5.sf1)). We observe high sensitivity to $\mu$. For small $\mu$ (e.g., $\mu \leq 0.01$), the baseline attains high reward but fails to satisfy the constraint (e.g., $8.34$ D, $-78.94$ Ha for $\mu=0.01$). Conversely, for large $\mu$ ($\geq 1.0$), the constraint is enforced, but reward degrades substantially ($6.69$ Debye for $\mu=50.0$), falling below CFO. In contrast, CFO satisfies the constraint across all tested hyperparameter settings, while the achieved reward remains stable around $8.39$ Debye (ablation study in Apx. [D](https://arxiv.org/html/2605.30610#A4)). Overall, only two of the 18 values of $\mu$ achieve a reward comparable to CFO while satisfying the constraint (Figure [5(c)](https://arxiv.org/html/2605.30610#S6.F5.sf3)), confirming that manual tuning is unreliable and inefficient (Sec. [3](https://arxiv.org/html/2605.30610#S3)). Since both methods use the same number of gradient steps per run, this exhaustive tuning makes the baseline about $18\times$ more expensive. In contrast, CFO adapts the parameters online, yielding a more robust trade-off between reward maximization and constraint satisfaction.
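The cost argument for the fixed-$\mu$ sweep can be sketched in a few lines. The log-spaced grid is an assumption made purely for illustration: the text reports values such as $0.01$, $0.5$, and $50$, so the grid actually used is not a plain log-spaced one and is not reproduced here. Only the count of 18 candidates and the 60 gradient steps per run are taken from the text.

```python
import numpy as np

NUM_MU_VALUES = 18
STEPS_PER_RUN = 60  # gradient steps per run, shared by CFO and each baseline run

# Assumption: a log-spaced grid between the reported endpoints 1e-6 and 1e6.
mu_grid = np.logspace(-6, 6, NUM_MU_VALUES)

# The sweep needs one full fine-tuning run per candidate weight,
# whereas CFO needs a single run (K * N = 6 * 10 steps).
sweep_steps = NUM_MU_VALUES * STEPS_PER_RUN
cfo_steps = STEPS_PER_RUN
cost_ratio = sweep_steps / cfo_steps

print(mu_grid[0], mu_grid[-1], len(mu_grid))
print(sweep_steps, cfo_steps, cost_ratio)
```

The ratio counts gradient steps only; wall-clock ratios can differ because CFO performs additional sampling and constraint evaluation between fine-tuning rounds.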
(a) CFO and fixed-$\mu$ baseline

| Method | $\mathbb{E}[r(x)]\uparrow$ | $\mathbb{E}[c(x)]\leq B$ |
|---|---|---|
| PRE | $6.55_{\pm 0.07}$ | ✗ |
| CFO | $8.39_{\pm 0.10}$ | ✓ |
| AM ($\mu=0.5$) | $8.38_{\pm 0.14}$ | ✓ |
| AM ($\mu=0.1$) | $8.33_{\pm 0.13}$ | ✓ |

| Method | Total runtime |
|---|---|
| CFO | $\approx 44$ min |
| AM$_\mu$ | $\approx 727$ min |
\(b\)Evaluation
\(c\)Counts vs\. Dipole
Figure 5: ([5(a)](https://arxiv.org/html/2605.30610#S6.F5.sf1)-[5(b)](https://arxiv.org/html/2605.30610#S6.F5.sf2)): Pareto plot comparing CFO ($K=6$, $N=10$) against fixed-$\mu$ baselines (Eq. [7](https://arxiv.org/html/2605.30610#S3.E7)) run with AM ($N=60$) (i.e., same number of gradient steps). The baseline is run with $18$ values of $\mu$ spread uniformly across $[10^{-6}, 10^{6}]$. Manual $\mu$-tuning is unreliable: out of $18$ values of $\mu$, only $2$ simultaneously achieve CFO-level dipole moments and satisfy the energy constraint. The histogram shows dipole moments for all baseline runs, separated (in color) by feasibility and compared to CFO, indicated by the purple dashed line, which is feasible without any tuning. [5(b)](https://arxiv.org/html/2605.30610#S6.F5.sf2): Numeric evaluation of ([5(a)](https://arxiv.org/html/2605.30610#S6.F5.sf1)) ($95\%$ CI), $B=-80$ Ha. [5(c)](https://arxiv.org/html/2605.30610#S6.F5.sf3): Distribution of dipole moments for fixed-$\mu$ baselines separated by constraint satisfaction, with the CFO result indicated.

CFO can run with approximate fine-tuning oracles and a limited number of iterations $K$. While CFO performs $K$ outer iterations, standard fine-tuning solvers (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15); Uehara et al., [2024c](https://arxiv.org/html/2605.30610#bib.bib50)) require $N$ inner steps. To avoid a double loop, we fix the total solver budget to $M = K \cdot N$ in all experiments. Increasing $K$ reallocates compute from a more accurate inner solver to more frequent dual updates, trading solver precision for update frequency.
Under a fixed budgetM=6000M\\\!=\\\!6000\(Figures[2\(a\)](https://arxiv.org/html/2605.30610#S4.F2.sf1)–[2\(c\)](https://arxiv.org/html/2605.30610#S4.F2.sf3)\), varyingKKshows a clear reward\-constraint trade\-off\. Few updates \(K=3K\\\!=\\\!3,N=2000N\\\!=\\\!2000\) yield high reward but large constraint violation \(0\.400\.40\), while frequent updates \(K=100K\\\!=\\\!100,N=60N\\\!=\\\!60\) nearly eliminate violations \(0\.100\.10\) at the cost of reward \(−5\.91\-5\.91\)\. An intermediate setting \(K=20K\\\!=\\\!20\) achieves both low violation \(0\.120\.12\) and high reward \(−4\.75\-4\.75\), see Figure[6](https://arxiv.org/html/2605.30610#A2.F6)\. Overall,CFOeffectively allocates a fixed compute budget, balancing solver accuracy and dual update frequency, to match the computational cost of theFineTuningSolver\.
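The budget split $M = K \cdot N$ can be made explicit with a small helper. The function name is hypothetical; the budget $M = 6000$ and the values of $K$ are the ones reported above, and the inner step count for $K=20$ is simply what the fixed budget implies.

```python
def inner_steps(total_budget, outer_iterations):
    """Inner solver steps N per outer iteration under a fixed budget M = K * N."""
    if outer_iterations <= 0 or total_budget % outer_iterations != 0:
        raise ValueError("budget must be a positive multiple of the outer iterations")
    return total_budget // outer_iterations


TOTAL_BUDGET = 6000

# Map each reported number of outer iterations K to the implied N.
allocations = {k: inner_steps(TOTAL_BUDGET, k) for k in (3, 20, 100)}
print(allocations)
```

Larger $K$ leaves fewer inner steps per round, so each inner solve is less accurate while the dual variables are updated more often.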
Importantly, this observation also holds for the molecular design task in Figure [3](https://arxiv.org/html/2605.30610#S6.F3). CFO ($K=6$, $N=10$) and AM ($N=60$) have comparable computational cost, as both perform $60$ gradient steps. Concretely, CFO has a total runtime of $44.5$ min, compared to $40.25$ min for AM. This increase of roughly $10\%$ arises from the extra sampling and constraint evaluation performed in Step $3$ of Alg. [1](https://arxiv.org/html/2605.30610#alg1). This demonstrates that CFO can operate effectively in high-dimensional domains even with an approximate oracle. In Appendix [C](https://arxiv.org/html/2605.30610#A3), we additionally report molecular-design results on QM9 (Ramakrishnan et al., [2014](https://arxiv.org/html/2605.30610#bib.bib39)) using a differentiable simulator (dxTB (Friede et al., [2024](https://arxiv.org/html/2605.30610#bib.bib18))) as exact reward and constraint functions, complementing the GEOM Drugs experiments shown here.
## 7 Related Work
Control\-based fine\-tuning of flow and diffusion models\.Recent works have tackled fine\-tuning of diffusion and flow models to maximize expected rewards under KL regularization as an entropy\-regularized optimal control problem\(e\.g\., Ueharaet al\.,[2024b](https://arxiv.org/html/2605.30610#bib.bib49); Tang and Zhou,[2024](https://arxiv.org/html/2605.30610#bib.bib46); Ueharaet al\.,[2024c](https://arxiv.org/html/2605.30610#bib.bib50); Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\)\. Such methods have been successfully applied to real\-world domains such as image generation\(Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\), molecular design\(Ueharaet al\.,[2024c](https://arxiv.org/html/2605.30610#bib.bib50)\), or protein engineering\(Ueharaet al\.,[2024c](https://arxiv.org/html/2605.30610#bib.bib50)\)\. These methods have also been adopted as subroutines to tackle settings beyond reward maximization, such as manifold exploration\(De Santiet al\.,[2025b](https://arxiv.org/html/2605.30610#bib.bib12),[2026b](https://arxiv.org/html/2605.30610#bib.bib81)\)or optimization of distributional objectives, such as conditional value at risk\(De Santiet al\.,[2025a](https://arxiv.org/html/2605.30610#bib.bib13); Wanget al\.,[2026](https://arxiv.org/html/2605.30610#bib.bib82)\)or reward\-guided model merging\(De Santiet al\.,[2026a](https://arxiv.org/html/2605.30610#bib.bib80)\)\.CFOextends fine\-tuning methods for reward maximization to leverage known constraint functions and can be straightforwardly used as a plug\-in oracle in more complex settings \(e\.g\., exploration and distributional fine\-tuning\)\. Importantly,CFOis agnostic to the underlying data modality \(continuous, discrete, or mixed\): the choice of innerFineTuningSolverdetermines this\. 
First\-order solvers such as Adjoint Matching\(Domingo\-Enrichet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib15)\)are well\-suited to differentiable rewards and constraints, while gradient\-free schemes such as DiffusionNFT\(Zhenget al\.,[2025](https://arxiv.org/html/2605.30610#bib.bib85)\)and Flow\-GRPO\(Liuet al\.,[2026](https://arxiv.org/html/2605.30610#bib.bib83)\)enable non\-differentiable objectives, and discrete\-space solvers such as DRAKES\(Wanget al\.,[2025](https://arxiv.org/html/2605.30610#bib.bib84)\)or SEPO\(Zekri and Boullé,[2026](https://arxiv.org/html/2605.30610#bib.bib87)\)unlock direct application to fully discrete generative models \(e\.g\., for peptide or protein design\)\.
Constrained Generative Modeling and Optimization\.Most prior work addresses constraint\-aware generative modeling, developing tools for handling linear\(Graikoset al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib20)\), differentiable\(Khalafiet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib24)\), and black\-box\(Konget al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib26)\)constraints\. Enforcement spans training\-time dual/penalty formulations\(Khalafiet al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib24)\)and inference\-time strategies such as reward\-weighted denoising for non\-differentiable objectives\(Konget al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib26)\)and classifier or classifier\-free guidance for differentiable surrogates\(Dhariwal and Nichol,[2021](https://arxiv.org/html/2605.30610#bib.bib14); Ho and Salimans,[2022](https://arxiv.org/html/2605.30610#bib.bib21)\)\. These techniques have been applied in domains such as molecular design\(Konget al\.,[2024](https://arxiv.org/html/2605.30610#bib.bib26)\)and constrained planning\(Maet al\.,[2025](https://arxiv.org/html/2605.30610#bib.bib32)\)\. The closest work to ours is arguablyKhalafiet al\.\([2024](https://arxiv.org/html/2605.30610#bib.bib24)\), with the main difference that our setting is for post\-training, i\.e\., at fine\-tuning time, constrained generative optimization rather than a training\-time scheme enforcing given constraints\. Concretely,Khalafiet al\.\([2024](https://arxiv.org/html/2605.30610#bib.bib24)\)keep the pre\-trained model weights fixed and instead enforce the constraint through inference\-time guidance, whereasCFO*fine\-tunes*the model so the constraint is internalized into the weights, and inference proceeds as standard sampling at base\-model cost\.
Augmented Lagrangian and Dual Methods in Constrained Sampling. Augmented Lagrangian and dual formulations turn equality and inequality constraints into auxiliary updates that run with the sampler, enabling draws from unnormalized targets while enforcing feasibility either per-sample or in expectation (Khalafi et al., [2025](https://arxiv.org/html/2605.30610#bib.bib25); Blanke et al., [2025](https://arxiv.org/html/2605.30610#bib.bib7); Chamon et al., [2024](https://arxiv.org/html/2605.30610#bib.bib10); Smith et al., [2025](https://arxiv.org/html/2605.30610#bib.bib42)). For example, Zhang et al. ([2025b](https://arxiv.org/html/2605.30610#bib.bib56)) employ an augmented Lagrangian method to steer diffusion rollouts toward time-varying safety sets without retraining the base model. Dual schemes similarly maintain physical invariants during sampling or data assimilation while still retaining sufficient exploration of feasible states (Blanke et al., [2025](https://arxiv.org/html/2605.30610#bib.bib7)). In addition to constrained generation or sampling, CFO also performs reward-driven optimization with the augmented formulation.
## 8 Conclusion
This work tackles the problem of *constrained generative optimization* via fine-tuning of pre-trained flow and diffusion models, a relevant and challenging task in discovery applications such as drug discovery. After proposing a constrained optimization formulation of the problem, we introduced Constrained Flow Optimization, a method that transforms the constrained objective into a sequence of fine-tuning steps and provides feasibility and optimality guarantees. Empirical results on both illustrative settings and molecular design tasks demonstrate the ability of CFO to steer pre-trained flow models toward high-reward regions while satisfying the given constraints. Promising directions include adding zero-order oracles to CFO beyond the current first-order choice, developing inference-time constraint handling rather than fine-tuning, and testing on protein engineering tasks.
## Acknowledgments
This publication was made possible by the ETH AI Center doctoral fellowship to Riccardo De Santi\. The project has received funding from the Swiss National Science Foundation under NCCR Catalysis \(grant number 180544 and 225147\) and NCCR Automation \(grant agreement 51NF40 180545\), a National Centre of Competence in Research funded by the Swiss National Science Foundation\. This work was supported by an ETH Zurich Research Grant\.
## Impact Statement
This paper presents work whose goal is to advance the field of generative optimization\. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here\.
## References
- A\. M\. Amorim, L\. F\. Piochi, A\. T\. Gaspar, A\. J\. Preto, N\. Rosario\-Ferreira, and I\. S\. Moreira \(2024\)Advancing drug safety in drug development: bridging computational predictions for enhanced toxicity prediction\.Chemical research in toxicology37\(6\),pp\. 827–849\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p2.1)\.
- S\. Axelrod and R\. Gomez\-Bombarelli \(2022\)GEOM, energy\-annotated molecular conformations for property prediction and molecular generation\.Scientific Data9\(1\),pp\. 185\.Cited by:[Figure 3](https://arxiv.org/html/2605.30610#S6.F3),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3.10.5),[§6](https://arxiv.org/html/2605.30610#S6.p4.1)\.
- C\. Bannwarth, S\. Ehlert, and S\. Grimme \(2019\)GFN2\-xtb—an accurate and broadly parametrized self\-consistent tight\-binding quantum chemical method with multipole electrostatics and density\-dependent dispersion contributions\.Journal of chemical theory and computation15\(3\),pp\. 1652–1671\.Cited by:[§6](https://arxiv.org/html/2605.30610#S6.p4.1)\.
- A\. Ben\-Tal and A\. Nemirovski \(2000\)Robust solutions of linear programming problems contaminated with uncertain data\.Mathematical programming88\(3\),pp\. 411–424\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p4.1)\.
- G\. R\. Bickerton, G\. V\. Paolini, J\. Besnard, S\. Muresan, and A\. L\. Hopkins \(2012\)Quantifying the chemical beauty of drugs\.Nature chemistry4\(2\),pp\. 90–98\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- E\. G\. Birgin and J\. M\. Martínez \(2014\)Practical augmented lagrangian methods for constrained optimization\.SIAM\.Cited by:[Appendix D](https://arxiv.org/html/2605.30610#A4.p2.5),[Appendix E](https://arxiv.org/html/2605.30610#A5.p2.2),[Appendix E](https://arxiv.org/html/2605.30610#A5.p5.1),[Appendix E](https://arxiv.org/html/2605.30610#A5.p7.1),[Appendix E](https://arxiv.org/html/2605.30610#A5.p9.4),[§1](https://arxiv.org/html/2605.30610#S1.p4.1),[§4](https://arxiv.org/html/2605.30610#S4.p1.2),[§4](https://arxiv.org/html/2605.30610#S4.p3.2),[§5](https://arxiv.org/html/2605.30610#S5.p3.3)\.
- M\. Blanke, Y\. Qu, S\. Shamekh, and P\. Gentine \(2025\)Strictly constrained generative modeling via split augmented langevin sampling\.arXiv preprint arXiv:2505\.18017\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p3.1)\.
- J\. Bracken and J\. T\. McGill \(1973\)Mathematical programs with optimization problems in the constraints\.Operations research21\(1\),pp\. 37–44\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p4.1)\.
- M\. Buttenschoen, G\. M\. Morris, and C\. M\. Deane \(2024\)PoseBusters: ai\-based docking methods fail to generate physically valid poses or generalise to novel sequences\.Chemical Science15\(9\),pp\. 3130–3139\.Cited by:[Appendix C](https://arxiv.org/html/2605.30610#A3.p4.8),[§1](https://arxiv.org/html/2605.30610#S1.p2.1),[§6](https://arxiv.org/html/2605.30610#S6.p7.10)\.
- L\. F\. Chamon, M\. R\. Karimi, and A\. Korba \(2024\)Constrained sampling with primal\-dual langevin monte carlo\.Advances in Neural Information Processing Systems37,pp\. 29285–29323\.Cited by:[§3](https://arxiv.org/html/2605.30610#S3.p3.21),[§7](https://arxiv.org/html/2605.30610#S7.p3.1)\.
- G\. Corso, H\. Stärk, B\. Jing, R\. Barzilay, and T\. Jaakkola \(2022\)Diffdock: diffusion steps, twists, and turns for molecular docking\.arXiv preprint arXiv:2210\.01776\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- R\. De Santi, M\. Franke, Y\. Hsieh, and A\. Krause \(2026a\)A unified density operator view of flow control and merging\.arXiv preprint arXiv:2602\.08012\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- R\. De Santi, K\. Protopapas, Y\. Hsieh, and A\. Krause \(2026b\)Verifier\-constrained flow expansion for discovery beyond the data\.arXiv preprint arXiv:2602\.15984\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- R\. De Santi, M\. Vlastelica, Y\. Hsieh, Z\. Shen, N\. He, and A\. Krause \(2025a\)Flow density control: generative optimization beyond entropy\-regularized fine\-tuning\.Advances in Neural Information Processing Systems \(NeurIPS\)\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§1](https://arxiv.org/html/2605.30610#S1.p4.1),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- R\. De Santi, M\. Vlastelica, Y\. Hsieh, Z\. Shen, N\. He, and A\. Krause \(2025b\)Provable maximum entropy manifold exploration via diffusion models\.InICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy,Cited by:[§2](https://arxiv.org/html/2605.30610#S2.p4.11),[§5](https://arxiv.org/html/2605.30610#S5.p2.1),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- P\. Dhariwal and A\. Nichol \(2021\)Diffusion models beat gans on image synthesis\.Advances in neural information processing systems34,pp\. 8780–8794\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- C\. Domingo\-Enrich, M\. Drozdzal, B\. Karrer, and R\. T\. Chen \(2024\)Adjoint matching: fine\-tuning flow and diffusion generative models with memoryless stochastic optimal control\.arXiv preprint arXiv:2409\.08861\.Cited by:[Appendix A](https://arxiv.org/html/2605.30610#A1),[Appendix A](https://arxiv.org/html/2605.30610#A1.p1.1),[Appendix A](https://arxiv.org/html/2605.30610#A1.p2.3),[Appendix A](https://arxiv.org/html/2605.30610#A1.p3.1),[Appendix B](https://arxiv.org/html/2605.30610#A2.p4.3),[Appendix C](https://arxiv.org/html/2605.30610#A3.p3.7),[Figure 11](https://arxiv.org/html/2605.30610#A4.F11),[Figure 11](https://arxiv.org/html/2605.30610#A4.F11.4.2),[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.10),[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.15),[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.6),[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§1](https://arxiv.org/html/2605.30610#S1.p2.1),[§1](https://arxiv.org/html/2605.30610#S1.p4.1),[§2](https://arxiv.org/html/2605.30610#S2.p2.9),[§3](https://arxiv.org/html/2605.30610#S3.p4.1),[Figure 2](https://arxiv.org/html/2605.30610#S4.F2),[Figure 2](https://arxiv.org/html/2605.30610#S4.F2.4.2),[§4](https://arxiv.org/html/2605.30610#S4.p4.2),[§5](https://arxiv.org/html/2605.30610#S5.p6.5),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3.10.5),[§6](https://arxiv.org/html/2605.30610#S6.p2.23),[§6](https://arxiv.org/html/2605.30610#S6.p4.1),[§6](https://arxiv.org/html/2605.30610#S6.p5.10),[§6](https://arxiv.org/html/2605.30610#S6.p9.5),[§7](https://arxiv.org/html/2605.30610#S7.p1.1),[Algorithm 2](https://arxiv.org/html/2605.30610#alg2),[4](https://arxiv.org/html/2605.30610#alg2.l4),[6](https://arxiv.org/html/2605.30610#alg2.l6),[7](https://arxiv.org/html/2605.30610#alg2.l7)\.
- I\. Dunn and D\. R\. Koes \(2024\)Mixed continuous and categorical flow matching for 3d de novo molecule generation\.arXiv preprint arXiv:2404\.19739\.Cited by:[Appendix C](https://arxiv.org/html/2605.30610#A3.p1.1),[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3.10.5),[§6](https://arxiv.org/html/2605.30610#S6.p4.1)\.
- P\. Ertl and A\. Schuffenhauer \(2009\)Estimation of synthetic accessibility score of drug\-like molecules based on molecular complexity and fragment contributions\.Journal of Cheminformatics1\(1\),pp\. 8\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p2.1),[Figure 4](https://arxiv.org/html/2605.30610#S6.F4),[Figure 4](https://arxiv.org/html/2605.30610#S6.F4.2.1)\.
- M\. Fortin \(1975\)Minimization of some non\-differentiable functionals by the augmented lagrangian method of hestenes and powell\.Applied Mathematics and Optimization2\(3\),pp\. 236–250\.Cited by:[§4](https://arxiv.org/html/2605.30610#S4.p1.2)\.
- M\. Friede, C\. Hölzer, S\. Ehlert, and S\. Grimme \(2024\)Dxtb—an efficient and fully differentiable framework for extended tight\-binding\.The Journal of Chemical Physics161\(6\)\.Cited by:[Figure 9](https://arxiv.org/html/2605.30610#A3.F9),[Figure 9](https://arxiv.org/html/2605.30610#A3.F9.4.2),[Appendix C](https://arxiv.org/html/2605.30610#A3.p3.7),[§6](https://arxiv.org/html/2605.30610#S6.p11.8)\.
- A\. Graikos, N\. Jojic, and D\. Samaras \(2024\)Fast constrained sampling in pre\-trained diffusion models\.arXiv preprint arXiv:2410\.18804\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- J\. Ho, A\. Jain, and P\. Abbeel \(2020\)Denoising diffusion probabilistic models\.Advances in neural information processing systems33,pp\. 6840–6851\.Cited by:[Appendix A](https://arxiv.org/html/2605.30610#A1.p2.3),[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- J\. Ho and T\. Salimans \(2022\)Classifier\-free diffusion guidance\.arXiv preprint arXiv:2207\.12598\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- E\. Hoogeboom, V\. G\. Satorras, C\. Vignac, and M\. Welling \(2022\)Equivariant diffusion for molecule generation in 3d\.InInternational conference on machine learning,pp\. 8867–8887\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- S\. Khalafi, D\. Ding, and A\. Ribeiro \(2024\)Constrained diffusion models via dual training\.Advances in Neural Information Processing Systems37,pp\. 26543–26576\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- S\. Khalafi, I\. Hounie, D\. Ding, and A\. Ribeiro \(2025\)Composition and alignment of diffusion models using constrained learning\.arXiv preprint arXiv:2508\.19104\.Cited by:[§3](https://arxiv.org/html/2605.30610#S3.p3.21),[§7](https://arxiv.org/html/2605.30610#S7.p3.1)\.
- L\. Kong, Y\. Du, W\. Mu, K\. Neklyudov, V\. De Bortoli, D\. Wu, H\. Wang, A\. Ferber, Y\. Ma, C\. P\. Gomes,et al\.\(2024\)Diffusion models as constrained samplers for optimization with unknown constraints\.arXiv preprint arXiv:2402\.18012\.Cited by:[Figure 7](https://arxiv.org/html/2605.30610#A2.F7),[Figure 7](https://arxiv.org/html/2605.30610#A2.F7.5.2),[Table 1](https://arxiv.org/html/2605.30610#A2.T1),[Table 1](https://arxiv.org/html/2605.30610#A2.T1.4.2),[Appendix B](https://arxiv.org/html/2605.30610#A2.p5.9.1),[§6](https://arxiv.org/html/2605.30610#S6.p2.23),[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- G\. Landrum \(2025\)RDKit: open\-source cheminformatics\. https://www\.rdkit\.org\.Cited by:[Figure 4](https://arxiv.org/html/2605.30610#S6.F4),[Figure 4](https://arxiv.org/html/2605.30610#S6.F4.2.1)\.
- Z\. Li, H\. Yuan, K\. Huang, C\. Ni, Y\. Ye, M\. Chen, and M\. Wang \(2024\)Diffusion model for data\-driven black\-box optimization\.arXiv preprint arXiv:2403\.13219\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- C\. A\. Lipinski \(2004\)Lead\-and drug\-like compounds: the rule\-of\-five revolution\.Drug discovery today: Technologies1\(4\),pp\. 337–341\.Cited by:[Figure 4](https://arxiv.org/html/2605.30610#S6.F4),[Figure 4](https://arxiv.org/html/2605.30610#S6.F4.2.1)\.
- Y\. Lipman, R\. T\. Chen, H\. Ben\-Hamu, M\. Nickel, and M\. Le \(2022\)Flow matching for generative modeling\.arXiv preprint arXiv:2210\.02747\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§2](https://arxiv.org/html/2605.30610#S2.p1.8),[§2](https://arxiv.org/html/2605.30610#S2.p2.6),[§2](https://arxiv.org/html/2605.30610#S2.p2.9)\.
- J\. Liu, G\. Liu, J\. Liang, Y\. Li, J\. Liu, X\. Wang, P\. Wan, D\. Zhang, and W\. Ouyang \(2026\)Flow\-grpo: training flow matching models via online rl\.Advances in neural information processing systems38,pp\. 40783–40818\.Cited by:[§4](https://arxiv.org/html/2605.30610#S4.p4.3),[§5](https://arxiv.org/html/2605.30610#S5.p6.5),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- H\. Ma, S\. Bodmer, A\. Carron, M\. Zeilinger, and M\. Muehlebach \(2025\)Constraint\-aware diffusion guidance for robotics: real\-time obstacle avoidance for autonomous racing\.arXiv preprint arXiv:2505\.13131\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p2.1)\.
- V\. I\. Minkin \(2012\)Dipole moments in organic chemistry\.Springer Science & Business Media\.Cited by:[§6](https://arxiv.org/html/2605.30610#S6.p4.1)\.
- R\. M\. Neeser, B\. Correia, and P\. Schwaller \(2024\)FSscore: a personalized machine learning\-based synthetic feasibility score\.Chemistry\-Methods4\(11\),pp\. e202400024\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p2.1)\.
- A\. S\. Nemirovskij and D\. B\. Yudin \(1983\)Problem complexity and method efficiency in optimization\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p4.1)\.
- T\. Pantsar and A\. Poso \(2018\)Binding affinity via docking: fact and fiction\.Molecules23\(8\),pp\. 1899\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- PubChem \(2010\)AID 19623 \- Partition coefficient \(logP\) \- PubChem\.External Links:[Link](https://pubchem.ncbi.nlm.nih.gov/bioassay/19623)Cited by:[Figure 4](https://arxiv.org/html/2605.30610#S6.F4),[Figure 4](https://arxiv.org/html/2605.30610#S6.F4.2.1)\.
- R\. Ramakrishnan, P\. O\. Dral, M\. Rupp, and O\. A\. Von Lilienfeld \(2014\)Quantum chemistry structures and properties of 134 kilo molecules\.Scientific data1\(1\),pp\. 1–7\.Cited by:[Figure 9](https://arxiv.org/html/2605.30610#A3.F9),[Figure 9](https://arxiv.org/html/2605.30610#A3.F9.4.2),[Appendix C](https://arxiv.org/html/2605.30610#A3.p3.7),[§6](https://arxiv.org/html/2605.30610#S6.p11.8)\.
- R\. T\. Rockafellar \(1976\)Augmented lagrangians and applications of the proximal point algorithm in convex programming\.Mathematics of operations research1\(2\),pp\. 97–116\.Cited by:[§4](https://arxiv.org/html/2605.30610#S4.p1.2)\.
- R\. Rombach, A\. Blattmann, D\. Lorenz, P\. Esser, and B\. Ommer \(2022\)High\-resolution image synthesis with latent diffusion models\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 10684–10695\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- H\. D\. Smith, N\. L\. Diamant, and B\. L\. Trippe \(2025\)Calibrating generative models\.arXiv preprint arXiv:2510\.10020\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p3.1)\.
- J\. Song, C\. Meng, and S\. Ermon \(2022\)Denoising diffusion implicit models\.arXiv preprint arXiv:2010\.02502\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§2](https://arxiv.org/html/2605.30610#S2.p1.8)\.
- Y\. Song, J\. Sohl\-Dickstein, D\. P\. Kingma, A\. Kumar, S\. Ermon, and B\. Poole \(2020\)Score\-based generative modeling through stochastic differential equations\.arXiv preprint arXiv:2011\.13456\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§2](https://arxiv.org/html/2605.30610#S2.p1.8)\.
- H\. Stark, B\. Jing, C\. Wang, G\. Corso, B\. Berger, R\. Barzilay, and T\. Jaakkola \(2024\)Dirichlet flow matching with applications to dna sequence design\.arXiv preprint arXiv:2402\.05841\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- W\. Tang and F\. Zhou \(2024\)Fine\-tuning of diffusion models via stochastic control: entropy regularization and beyond\.arXiv preprint arXiv:2403\.06279\.Cited by:[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.15),[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.6),[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§1](https://arxiv.org/html/2605.30610#S1.p2.1),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- L\. Treven, J\. Hübotter, B\. Sukhija, F\. Dorfler, and A\. Krause \(2023\)Efficient exploration in continuous\-time model\-based reinforcement learning\.Advances in Neural Information Processing Systems36,pp\. 42119–42147\.Cited by:[§2](https://arxiv.org/html/2605.30610#S2.p3.4)\.
- M\. Uehara, Y\. Zhao, T\. Biancalani, and S\. Levine \(2024a\)Understanding reinforcement learning\-based fine\-tuning of diffusion models: a tutorial and review\.arXiv preprint arXiv:2407\.13734\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p2.1)\.
- M\. Uehara, Y\. Zhao, K\. Black, E\. Hajiramezanali, G\. Scalia, N\. L\. Diamant, A\. M\. Tseng, T\. Biancalani, and S\. Levine \(2024b\)Fine\-tuning of continuous\-time diffusion models as entropy\-regularized control\.arXiv preprint arXiv:2402\.15194\.Cited by:[Appendix E](https://arxiv.org/html/2605.30610#A5.p1.15),[§1](https://arxiv.org/html/2605.30610#S1.p1.1),[§1](https://arxiv.org/html/2605.30610#S1.p2.1),[§1](https://arxiv.org/html/2605.30610#S1.p4.1),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- M\. Uehara, Y\. Zhao, K\. Black, E\. Hajiramezanali, G\. Scalia, N\. L\. Diamant, A\. M\. Tseng, S\. Levine, and T\. Biancalani \(2024c\)Feedback Efficient Online Fine\-Tuning of Diffusion Models\.InProceedings of the 41st International Conference on Machine Learning,pp\. 48892–48918\.Cited by:[§6](https://arxiv.org/html/2605.30610#S6.p9.5),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- C\. Wang, M\. Uehara, Y\. He, A\. Wang, A\. Lal, T\. Jaakkola, S\. Levine, A\. Regev, H\. Wang, and T\. Biancalani \(2025\)Fine\-tuning discrete diffusion models via reward optimization with applications to dna and protein design\.InInternational Conference on Learning Representations,Vol\.2025,pp\. 47871–47899\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- H\. Wang, T\. Zariphopoulou, and X\. Y\. Zhou \(2020\)Reinforcement learning in continuous time and space: a stochastic control approach\.Journal of Machine Learning Research21\(198\),pp\. 1–34\.Cited by:[§2](https://arxiv.org/html/2605.30610#S2.p3.4),[§2](https://arxiv.org/html/2605.30610#S2.p3.7)\.
- Z\. Wang, R\. De Santi, X\. Mo, M\. M\. Zavlanos, A\. Krause, and K\. H\. Johansson \(2026\)Efficient tail\-aware generative optimization via flow model fine\-tuning\.arXiv preprint arXiv:2602\.16796\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- J\. Wohlwend, G\. Corso, S\. Passaro, N\. Getz, M\. Reveiz, K\. Leidal, W\. Swiderski, L\. Atkinson, T\. Portnoi, I\. Chinn,et al\.\(2025\)Boltz\-1 democratizing biomolecular interaction modeling\.BioRxiv,pp\. 2024–11\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- K\. E\. Wu, K\. K\. Yang, R\. van den Berg, S\. Alamdari, J\. Y\. Zou, A\. X\. Lu, and A\. P\. Amini \(2024\)Protein structure generation via folding diffusion\.Nature communications15\(1\),pp\. 1059\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p1.1)\.
- Q\. Xiao, H\. Yuan, A\. Saif, G\. Liu, R\. Kompella, M\. Wang, and T\. Chen \(2025\)A first\-order generative bilevel optimization framework for diffusion models\.arXiv preprint arXiv:2502\.08808\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p4.1)\.
- O\. Zekri and N\. Boullé \(2026\)Fine\-tuning discrete diffusion models with policy gradient methods\.Advances in Neural Information Processing Systems38,pp\. 152868–152906\.Cited by:[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
- B\. Zhang, Z\. Wang, and Y\. Liu \(2025a\)A gradient guided diffusion framework for chance constrained programming\.arXiv preprint arXiv:2510\.12238\.Cited by:[§1](https://arxiv.org/html/2605.30610#S1.p4.1)\.
- J\. Zhang, L\. Zhao, A\. Papachristodoulou, and J\. Umenberger \(2025b\)Constrained diffusers for safe planning and control\.arXiv preprint arXiv:2506\.12544\.Cited by:[§3](https://arxiv.org/html/2605.30610#S3.p3.21),[§7](https://arxiv.org/html/2605.30610#S7.p3.1)\.
- H\. Zhao, H\. Chen, J\. Zhang, D\. Yao, and W\. Tang \(2025\)Score as action: fine tuning diffusion generative models by continuous\-time reinforcement learning\.InForty\-second International Conference on Machine Learning,Cited by:[§2](https://arxiv.org/html/2605.30610#S2.p3.4)\.
- K\. Zheng, H\. Chen, H\. Ye, H\. Wang, Q\. Zhang, K\. Jiang, H\. Su, S\. Ermon, J\. Zhu, and M\. Liu \(2025\)Diffusionnft: online diffusion reinforcement with forward process\.arXiv preprint arXiv:2509\.16117\.Cited by:[Figure 7](https://arxiv.org/html/2605.30610#A2.F7),[Figure 7](https://arxiv.org/html/2605.30610#A2.F7.5.2),[§4](https://arxiv.org/html/2605.30610#S4.p4.3),[§5](https://arxiv.org/html/2605.30610#S5.p6.5),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3),[Figure 3](https://arxiv.org/html/2605.30610#S6.F3.10.5),[§6](https://arxiv.org/html/2605.30610#S6.p2.23),[§6](https://arxiv.org/html/2605.30610#S6.p6.7),[§7](https://arxiv.org/html/2605.30610#S7.p1.1)\.
## Appendix A Implementation of FineTuningSolver - Adjoint Matching (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15))
To ensure completeness, below we provide pseudocode for one concrete realization of a FineTuningSolver as in Eq. [9](https://arxiv.org/html/2605.30610#S4.E9). We describe exactly the version employed in Sec. [6](https://arxiv.org/html/2605.30610#S6), which builds on the Adjoint Matching framework (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)), casting linear fine-tuning as a stochastic optimal control problem and tackling it via regression.

Let $u^{\text{pre}}$ be the initial, pre-trained vector field, and $u^{\text{finetuned}}$ its fine-tuned counterpart. We also use $\bar{\alpha}$ to refer to the accumulated noise schedule from Ho et al. ([2020](https://arxiv.org/html/2605.30610#bib.bib22)), effectively following the flow models notation introduced by Adjoint Matching (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15), Sec. 5.2). The full procedure is in Alg. [2](https://arxiv.org/html/2605.30610#alg2).
Algorithm 2 FineTuningSolver - Adjoint Matching (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15))

1: Input: $N$: number of iterations, $u^{k}$: current fine-tuned flow vector field, $u^{\text{pre}}$: pre-trained flow vector field, $\alpha$: regularization coefficient (Eq. [5](https://arxiv.org/html/2605.30610#S3.E5)), $\nabla f$: objective function gradient, $m$: batch size, $h$: step size

2: Init: $u^{\text{finetuned}}\coloneqq u^{k}$ with parameter $\theta$

3: for $n=0,1,2,\ldots,N-1$ do

4: Sample $m$ trajectories $\{X_{t}\}_{0\leq t\leq 1}$ via a memoryless noise schedule $\sigma(t)$ (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)), e.g.,

$$\text{sample }\epsilon_{t}\sim\mathcal{N}(0,I),\;X_{0}\sim\mathcal{N}(0,I)\text{, then:}\tag{17}$$

$$X_{t+h}=X_{t}+h\left(2u_{\theta}^{\text{finetuned}}(X_{t},t)-\frac{\bar{\alpha}_{t}}{\alpha_{t}}X_{t}\right)+\sqrt{h}\,\sigma(t)\,\epsilon_{t}\tag{18}$$

5: Use objective function gradient: $\tilde{a}_{1}=-\frac{1}{\alpha}\nabla_{X_{1}}f(X_{1})$

6: For each trajectory, solve the lean adjoint ODE (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15), Eq. 38-39) from $t=1$ to $0$:

$$\tilde{a}_{t-h}=\tilde{a}_{t}+h\,\tilde{a}_{t}^{\top}\nabla_{X_{t}}\left(2u^{\text{pre}}(X_{t},t)-\frac{\bar{\alpha}_{t}}{\alpha_{t}}X_{t}\right)\tag{19}$$

7: Here $X_{t}$ and $\tilde{a}_{t}$ are computed without gradients, i.e., $X_{t}=\texttt{stopgrad}(X_{t})$, $\tilde{a}_{t}=\texttt{stopgrad}(\tilde{a}_{t})$. For each trajectory, compute the Adjoint Matching objective (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15), Eq. 37):

$$\mathcal{L}_{\theta}=\sum_{t\in\{0,h,\dots,1-h\}}\left\lVert\frac{2}{\sigma(t)}\left(u_{\theta}^{\text{finetuned}}(X_{t},t)-u^{\text{pre}}(X_{t},t)\right)+\sigma(t)\tilde{a}_{t}\right\rVert^{2}\tag{20}$$

8: Compute the gradient $\nabla_{\theta}\mathcal{L}(\theta)$ and update $\theta$.

9: end for

10: Output: Fine-tuned flow vector field $u_{\theta}^{\text{finetuned}}$
For further implementation details, we refer to Domingo-Enrich et al. ([2024](https://arxiv.org/html/2605.30610#bib.bib15), Appendix G).
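To make Steps 4-7 of Alg. 2 concrete, the following is a minimal NumPy sketch of the rollout (Eq. 18), the lean adjoint recursion (Eq. 19), and the regression objective (Eq. 20). It is an illustration under stated assumptions, not the implementation used in the experiments: the names `simulate`, `lean_adjoint`, `adjoint_matching_loss`, `ratio` (standing in for $\bar{\alpha}_{t}/\alpha_{t}$), and `jac_u_pre` are hypothetical, the toy vector field is linear so that its Jacobian is available in closed form, the schedules are constant placeholders, and the loss is averaged over the batch. Step 8 (the update of $\theta$) requires automatic differentiation and is omitted.

```python
import numpy as np


def simulate(u_ft, ratio, sigma, h, m, d, rng):
    """Euler-Maruyama rollout of Eq. (18); returns X with shape (T + 1, m, d)."""
    T = int(round(1.0 / h))
    X = np.empty((T + 1, m, d))
    X[0] = rng.standard_normal((m, d))  # X_0 ~ N(0, I)
    for k in range(T):
        t = k * h
        eps = rng.standard_normal((m, d))
        drift = 2.0 * u_ft(X[k], t) - ratio(t) * X[k]
        X[k + 1] = X[k] + h * drift + np.sqrt(h) * sigma(t) * eps
    return X


def lean_adjoint(X, grad_f, jac_u_pre, ratio, alpha, h):
    """Backward recursion of Eq. (19), started from the terminal value of Step 5."""
    T, d = X.shape[0] - 1, X.shape[2]
    a = np.empty_like(X)
    a[T] = -grad_f(X[T]) / alpha
    for k in range(T, 0, -1):
        t = k * h
        # Jacobian of the base drift 2 u_pre(x, t) - ratio(t) x, shape (m, d, d).
        J = 2.0 * jac_u_pre(X[k], t) - ratio(t) * np.eye(d)
        # Vector-Jacobian product a^T J, one per trajectory.
        a[k - 1] = a[k] + h * np.einsum("mi,mij->mj", a[k], J)
    return a


def adjoint_matching_loss(X, a, u_ft, u_pre, sigma, h):
    """Eq. (20) summed over t in {0, h, ..., 1 - h}, averaged over the batch."""
    T = X.shape[0] - 1
    per_traj = np.zeros(X.shape[1])
    for k in range(T):
        t = k * h
        resid = (2.0 / sigma(t)) * (u_ft(X[k], t) - u_pre(X[k], t)) + sigma(t) * a[k]
        per_traj += np.sum(resid ** 2, axis=-1)
    return per_traj.mean()


# Toy setup (all choices below are placeholders for illustration only).
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
u_pre = lambda x, t: x @ A.T                              # linear vector field
jac_u_pre = lambda x, t: np.broadcast_to(A, (x.shape[0], 2, 2))
ratio = lambda t: 1.0                                     # stands in for the schedule ratio
sigma = lambda t: 1.0                                     # stands in for sigma(t)
grad_f = lambda x: 2.0 * x                                # gradient of f(x) = ||x||^2

X = simulate(u_pre, ratio, sigma, h=0.1, m=8, d=2, rng=rng)
a = lean_adjoint(X, grad_f, jac_u_pre, ratio, alpha=1.0, h=0.1)
loss = adjoint_matching_loss(X, a, u_pre, u_pre, sigma, h=0.1)
```

Because NumPy does not track gradients, `X` and `a` act as constants in the loss, which mirrors the `stopgrad` convention of Step 7. In an autodiff framework only the fine-tuned field inside `adjoint_matching_loss` would carry gradients.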
## Appendix B Further Experiments and Details - Illustrative Examples
Reward-only rejection sampling. We also compare against a simple rejection-sampling baseline, complementary to the fixed-$\mu$ baseline in Eq. [7](https://arxiv.org/html/2605.30610#S3.E7). We fine-tune a policy purely on the reward signal using Adjoint Matching and then enforce feasibility only by rejecting samples that violate the constraint. On the example in Figure [2(c)](https://arxiv.org/html/2605.30610#S4.F2.sf3), this reward-only policy attains a constraint satisfaction rate of $13.40\%$, compared to $84.40\%$ for the policy fine-tuned with CFO, which accounts for the constraint during fine-tuning. Inspecting the samples further reveals that (1) violations under CFO occur predominantly near the constraint boundary, and (2) rejection sampling is ineffective when the reward optimum and the constraint region are poorly aligned.
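The rejection step of this baseline can be sketched as follows. This is a minimal illustration, assuming samples are drawn i.i.d. from the fine-tuned policy; the function names and the unit-disk constraint are hypothetical and are not taken from the experiments.

```python
import numpy as np


def rejection_filter(samples, is_feasible):
    """Keep only feasible samples; also return the empirical satisfaction rate."""
    samples = np.asarray(samples, dtype=float)
    mask = np.array([bool(is_feasible(x)) for x in samples])
    return samples[mask], float(mask.mean())


def expected_draws_per_accept(satisfaction_rate):
    """Mean number of i.i.d. draws until one is accepted (geometric distribution)."""
    return 1.0 / satisfaction_rate


# Hypothetical constraint for illustration: stay inside the unit disk.
pts = [[0.0, 0.0], [2.0, 0.0], [0.5, 0.5], [3.0, 3.0]]
kept, rate = rejection_filter(pts, lambda x: np.linalg.norm(x) <= 1.0)
```

Under the i.i.d. assumption, a satisfaction rate of 13.40% corresponds to roughly 7.5 draws per accepted sample, whereas a rate of 84.40% corresponds to roughly 1.2 draws.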
Details for visually interpretable settings (Figure [2](https://arxiv.org/html/2605.30610#S4.F2)). The Mixture of Gaussians (Figure [2(a)](https://arxiv.org/html/2605.30610#S4.F2.sf1)) is generated by

$$p(x)=\frac{1}{2}\left(\mathcal{N}\left(x\mid\begin{bmatrix}-7\\ -2\end{bmatrix},\bm{\Sigma}\right)+\mathcal{N}\left(x\mid\begin{bmatrix}2\\ 7\end{bmatrix},\bm{\Sigma}\right)\right),\quad\text{with}\quad\bm{\Sigma}=\begin{bmatrix}3&0\\ 0&3\end{bmatrix}.$$

We sample $20\text{k}$ points ($80/20$ train/validation split) and train an MLP with $3$ hidden layers, each with $256$ nodes, for the vector field $v$. The same setting is used for the experiment on the correlated Gaussian (Figure [2(e)](https://arxiv.org/html/2605.30610#S4.F2.sf5)), with:

$$p(x)=\mathcal{N}\left(x\mid\begin{bmatrix}0.5\\ 0.5\end{bmatrix},\begin{bmatrix}1&0.5\\ 0.5&1\end{bmatrix}\right)$$
The constraint triangles have the following vertices:

1. MoG: $\triangle^{I}:\left(\begin{bmatrix}-10\\ -4\end{bmatrix},\begin{bmatrix}-5\\ -4\end{bmatrix},\begin{bmatrix}-5\\ 2\end{bmatrix}\right)$ and $\triangle^{II}:\left(\begin{bmatrix}4\\ -1\end{bmatrix},\begin{bmatrix}10\\ 2\end{bmatrix},\begin{bmatrix}5\\ 4\end{bmatrix}\right)$
2. Correlated Gaussian: $\triangle:\left(\begin{bmatrix}-1\\ -0.5\end{bmatrix},\begin{bmatrix}1\\ -0.5\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix}\right)$
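The data-generating distribution and the triangle geometry above can be reproduced with a short NumPy sketch. The function names are hypothetical, and the membership test only reports whether a point lies inside a triangle (boundary included); how that membership enters the constraint function is defined in the main text and is not assumed here.

```python
import numpy as np


def sample_mog(n, rng):
    """Draw n points from the equal-weight two-component mixture with covariance 3*I."""
    means = np.array([[-7.0, -2.0], [2.0, 7.0]])
    comp = rng.integers(0, 2, size=n)  # component index per sample
    return means[comp] + np.sqrt(3.0) * rng.standard_normal((n, 2))


def in_triangle(p, tri):
    """Return a boolean array: True where rows of p (n, 2) lie in triangle tri (3, 2)."""
    p = np.asarray(p, dtype=float)
    a, b, c = np.asarray(tri, dtype=float)

    def cross(u, v, w):
        # z-component of (v - u) x (w - u) for every row of w.
        return (v[0] - u[0]) * (w[..., 1] - u[1]) - (v[1] - u[1]) * (w[..., 0] - u[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    # Inside (or on the boundary) when the three signs do not disagree.
    return ~(has_neg & has_pos)


TRI_MOG_I = [[-10.0, -4.0], [-5.0, -4.0], [-5.0, 2.0]]
TRI_MOG_II = [[4.0, -1.0], [10.0, 2.0], [5.0, 4.0]]
TRI_CORR = [[-1.0, -0.5], [1.0, -0.5], [0.0, 1.0]]

rng = np.random.default_rng(0)
data = sample_mog(20_000, rng)
frac_in_I = in_triangle(data, TRI_MOG_I).mean()  # share of data inside triangle I
```

The sign-of-cross-product test is independent of the vertex ordering, so the vertices can be listed clockwise or counter-clockwise.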
Computational cost of CFO compared to AM. We plot the computational cost of CFO for a fixed budget $M=6000$ and varying $K\in\{3,15,20,100\}$, and compare it to AM (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) with $N=6000$ (see Sec. [6](https://arxiv.org/html/2605.30610#S6)).
Figure 6: Reward and constraint for different values of (K/N).

Comparison to DiffOpt (Kong et al., [2024](https://arxiv.org/html/2605.30610#bib.bib26)), an inference-time constrained-generation method. DiffOpt is a recent inference-time scheme that keeps the pre-trained model weights fixed and instead modifies the sampling procedure to satisfy constraints (a fundamentally different setting from CFO, which fine-tunes the model). For completeness, we report a comparison on the illustrative MoG task (Figure [2(a)](https://arxiv.org/html/2605.30610#S4.F2.sf1)-[2(c)](https://arxiv.org/html/2605.30610#S4.F2.sf3)); the qualitative sample distributions are shown in Figure [7](https://arxiv.org/html/2605.30610#A2.F7), and numerical results are summarized in Table [1](https://arxiv.org/html/2605.30610#A2.T1). Across three settings of DiffOpt’s guidance and Langevin-MCMC hyperparameters ($\beta_{r},\beta_{c},L,\eta$), DiffOpt achieves a per-sample violation rate of $10$–$50\%$ versus $8.5\%$ for CFO, and incurs a substantially higher per-sample sampling cost (between $44\times$ and $55\times$ slower than CFO) because every sample requires repeated guidance steps. Tuning DiffOpt is also essential: *aggressive* settings ($\beta_{c}\geq 50$) caused sample divergence and were discarded. Even when tuned to favor either reward (“high reward”) or constraint satisfaction (“gentle+long”), DiffOpt either violates the constraint substantially (twice the rate of CFO) or, when constraint-tight, sacrifices both reward and sampling efficiency. CFO, in contrast, fine-tunes the policy once, retains the per-sample inference cost of the base model, and reaches a reward of $-4.75$ at $8.5\%$ violation without any per-sample guidance.
Figure 7: Qualitative comparison of samples on the MoG task. Panels: (a) CFO (NFT), (b) DiffOpt (default), (c) DiffOpt (high reward), (d) DiffOpt (gentle+long). ([7(a)](https://arxiv.org/html/2605.30610#A2.F7.sf1)) CFO with DiffusionNFT (Zheng et al., [2025](https://arxiv.org/html/2605.30610#bib.bib85)) as the inner FineTuningSolver, as a fine-tuning-based reference. ([7(b)](https://arxiv.org/html/2605.30610#A2.F7.sf2)-[7(d)](https://arxiv.org/html/2605.30610#A2.F7.sf4)) DiffOpt (Kong et al., [2024](https://arxiv.org/html/2605.30610#bib.bib26)) under three hyperparameter regimes; see Table [1](https://arxiv.org/html/2605.30610#A2.T1) for the corresponding numerical results.

Table 1: DiffOpt (Kong et al., [2024](https://arxiv.org/html/2605.30610#bib.bib26)) on the MoG task across three hyperparameter regimes, compared against CFO. All values are mean $\pm$ $95\%$ CI.
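Summary statistics of the form "mean ± 95% CI" can be computed as in the sketch below. The text does not state which interval construction was used, so the normal-approximation interval here is an assumption, and `mean_ci95` is a hypothetical helper name.

```python
import numpy as np


def mean_ci95(values):
    """Return (mean, half-width) of a normal-approximation 95% confidence interval."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    # Standard error of the mean with the unbiased sample standard deviation.
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(n)
    return float(mean), float(half_width)


# Example: per-sample violation indicators (1 = constraint violated).
violations = [0, 1, 0, 1]
rate, half = mean_ci95(violations)
```

For small sample sizes or rates close to 0 or 1, a Student-t or Wilson interval would be a more careful choice than the normal approximation.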
## Appendix C Further Results on Molecular Design Experiments
Molecular Design. For the molecular design task, we fine-tune FlowMol (Dunn and Koes, [2024](https://arxiv.org/html/2605.30610#bib.bib16)), which jointly models continuous atomic coordinates and discrete categorical variables (atom types, formal charges, bond orders). We refer to Dunn and Koes ([2024](https://arxiv.org/html/2605.30610#bib.bib16)) for the sampling of categorical and initial values. We use Gaussian sampling for the experiments on GEOM-Drugs and CTMC for the experiments on QM9.
GNN Details and Generalization. To verify that optimization targets the intended physical objective rather than exploiting the surrogate, we evaluate the ground-truth xTB values for every molecule sampled during the execution of CFO and compare them to the GNN predictions. For the energy (used as a constraint), surrogate predictions are essentially indistinguishable from xTB, indicating faithful approximation within the explored region. For the dipole moment (the maximization target), the surrogate systematically underestimates the true xTB values by $10\%$, yet the two remain strongly correlated and move in lockstep throughout fine-tuning. Consequently, improvements under the surrogate translate to larger gains under xTB. Overall, these checks indicate that CFO does not exploit model artifacts and remains within the training distribution.
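A check of this kind can be summarized with two numbers: the correlation between surrogate and ground-truth values, and the relative bias of the surrogate. The sketch below is one simple way to compute both; it is not the evaluation code used for the paper, `surrogate_agreement` is a hypothetical name, and the arrays are synthetic placeholders rather than experimental data.

```python
import numpy as np


def surrogate_agreement(pred, truth):
    """Return (Pearson correlation, relative bias of the mean prediction)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    corr = float(np.corrcoef(pred, truth)[0, 1])
    # Negative values mean the surrogate underestimates the ground truth on average.
    rel_bias = float(pred.mean() / truth.mean() - 1.0)
    return corr, rel_bias


# Synthetic placeholder values: a surrogate that is 10% too low everywhere.
truth = np.array([1.0, 2.0, 3.0, 4.0])
pred = 0.9 * truth
corr, rel_bias = surrogate_agreement(pred, truth)
```

A correlation near 1 together with a stable negative bias matches the situation described above, in which the surrogate is miscalibrated in scale but still ranks molecules consistently with the ground truth.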
Figure 8: Energy-constrained dipole moment maximization for molecular design (MD). Panels: (a) Dipole moment (in D), (b) Energy (in Ha), (c) Evaluation. ([8(a)](https://arxiv.org/html/2605.30610#A3.F8.sf1)-[8(b)](https://arxiv.org/html/2605.30610#A3.F8.sf2)): Evolution of the constraint and reward during CFO compared to the true xtb value. [8(c)](https://arxiv.org/html/2605.30610#A3.F8.sf3): Numerical comparison between CFO and xtb.

Figure 9: Energy-constrained dipole moment maximization on QM9 (Ramakrishnan et al., [2014](https://arxiv.org/html/2605.30610#bib.bib39)) using dxtb (Friede et al., [2024](https://arxiv.org/html/2605.30610#bib.bib18)) as reward and constraint functions, with exact gradients of the simulation. Panels: (a) Dipole moment (in D), (b) Energy (in Ha).

Additional results with exact rewards and constraints using dxtb. In a complementary experiment, we employ dxtb (Friede et al., [2024](https://arxiv.org/html/2605.30610#bib.bib18)) instead of neural approximators to obtain rewards and constraints, which offers exact gradients with respect to atomic positions. For this experiment, we fine-tune FlowMol pre-trained on QM9 (Ramakrishnan et al., [2014](https://arxiv.org/html/2605.30610#bib.bib39)). We again maximize the dipole moment while constraining the total energy to remain below $-18$ Ha, a value that differs from the constraint in the main paper due to the different atomic number distribution. As shown in Table [2](https://arxiv.org/html/2605.30610#A3.T2), the pre-trained model $\pi^{\text{pre}}$ violates this constraint for $65\%$ of samples. In contrast, the model fine-tuned via CFO achieves zero constraint violation (30 Monte Carlo samples, all below the threshold) while increasing the average norm of the dipole moment from $3.43\pm 3.45$ to $8.66\pm 4.50$, as shown in Fig. [9(a)](https://arxiv.org/html/2605.30610#A3.F9.sf1). As a baseline, Adjoint Matching alone (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)) increases the dipole to $9.04$ D but also raises the energy to $-15.5$ Ha, which is above the $-18$ Ha threshold.
Results using the posebuster validity score function. To further highlight CFO's flexibility, we replace the energy constraint with a molecular-validity criterion based on posebuster (Buttenschoen et al., [2024](https://arxiv.org/html/2605.30610#bib.bib9)), while keeping the dipole moment as reward. We train a GNN on a custom validation score that equals $0$ when a molecule is connected and passes the basic posebuster checks, and $1$ otherwise, and run CFO with $K=2$, $N=50$, and $B=0.3$. The pre-trained model attains a dipole moment of $6.92$ D but has a $53\%$ constraint-violation rate. CFO increases the reward to $9.60$ D while reducing the predicted violation rate to $39\%$. Unlike for the energy constraints presented in the main text, the predicted violation rate here differs from the ground-truth violation rate, which might be mitigated by learning the constraint function online.
Additional Discussion on Validity of Molecules. For the molecular design experiments on drug-like molecules presented in the main text, we further apply an RDKit validation step, including stereochemistry reassignment, hydrogen count correction, and full sanitization (valences, kekulization, bond orders). Approximately $7\%$ of final molecules pass, which can be attributed to several reasons. First, already in the base FlowMol model, only $34\%$ of molecules pass the RDKit validation step, highlighting the need for more diverse pre-training datasets and further base model improvements. Second, the FlowMol-generated geometries used during optimization are not geometrically relaxed, which can lead to invalid bond lengths or angles (see examples in Figure [10](https://arxiv.org/html/2605.30610#A3.F10)). This motivates the development of fully differentiable geometry relaxation methods for molecular design or the extension of CFO to different solvers.


Figure 10: Generated drug-like molecules failing the validity test and showing unreasonable bond lengths and angles, highlighted with red circles.

Definition of stringent validity criteria. We evaluate validity with a more stringent criterion than pure RDKit sanitization. Namely, we add two checks before it, so that the workflow is as follows:

$$\text{Is the sample connected?} \;\to\; \text{Can the generated conformer be embedded?} \;\to\; \text{RDKit sanitization}$$
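The three-stage workflow above can be read as a chain of gates in which a sample is rejected at the first failing check. The following is a minimal sketch of that control flow only. The function names, the dictionary-based sample format, and the `can_embed` and `passes_sanitization` placeholders are assumptions made for illustration; a real pipeline would call a conformer-embedding routine and RDKit sanitization at those two points.

```python
from collections import deque


def is_connected(n_atoms, bonds):
    """Return True if the bond graph on n_atoms atoms forms a single component."""
    if n_atoms == 0:
        return False
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Breadth-first search starting from atom 0.
    seen, queue = {0}, deque([0])
    while queue:
        for nb in neighbors[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n_atoms


def first_failed_check(sample, checks):
    """Apply the checks in order; return the name of the first failure, or None."""
    for name, check in checks:
        if not check(sample):
            return name
    return None


# Hypothetical placeholders (assumptions): stand-ins for conformer embedding
# and RDKit sanitization, which are not implemented here.
def can_embed(sample):
    return sample.get("embeddable", False)


def passes_sanitization(sample):
    return sample.get("sanitizable", False)


CHECKS = [
    ("connected", lambda s: is_connected(s["n_atoms"], s["bonds"])),
    ("embeddable", can_embed),
    ("sanitization", passes_sanitization),
]
```

Ordering the cheap connectivity test first means that fragmented samples never reach the more expensive later stages.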
Table 2: Numeric results for CFO on QM9 using dxtb for dipole and energy.
## Appendix D Parameter Details, Ablation Studies, and Algorithmic Extensions for CFO and Adjoint Matching
(a) Dipole (in D) for different $N$.
(b) Energy (in Ha) for different $N$.
Figure 11: Unconstrained dipole maximization with AM (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)), i.e., $\mu=0$ in Eq. [7](https://arxiv.org/html/2605.30610#S3.E7), for different $N$.

Discussion of the most important hyperparameters of CFO and FineTuningSolver:
- Initial penalty $\rho_{\text{init}}$. A larger $\rho_{\text{init}}$ penalizes constraint violations more strongly, effectively reducing early exploration within the base distribution. A smaller $\rho_{\text{init}}$ does the opposite.
- Penalty growth rate $\eta \geq 1$. Controls how fast the penalty grows across updates. A larger $\eta$ accelerates enforcement and can thus reduce exploration of high-reward regions. A smaller $\eta$ tightens feasibility more gradually, allowing for early reward progress but potentially slower constraint satisfaction.
- Contraction rate $\tau \in (0,1)$. Determines when to update the penalty parameter $\rho$. A smaller $\tau$ triggers more frequent updates, whereas values near one update conservatively.
- Multiplier lower bound $\lambda_{\min} < 0$. Safeguards the Lagrange multiplier via clipping. A smaller $\lambda_{\min}$ permits larger corrective signals of the offset, see Sec. [4](https://arxiv.org/html/2605.30610#S4). If set to a large negative value, its influence on the final output is typically small, since $\lambda_{\min}$ is not reached.
- FineTuningSolver regularization $\alpha$. Trades off staying close to the base distribution against reallocating mass. A larger $\alpha$ enforces stronger KL regularization of the policy, whereas a smaller $\alpha$ allows greater deviation from the base policy.
- Sampling for constraint estimation (sample count/batch size). Larger samples reduce estimator variance, stabilizing updates and improving feasibility. A sample size that is too small yields volatile or biased estimates that can steer CFO to off-target solutions.
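To make the roles of $\rho_{\text{init}}$, $\eta$, $\tau$, and $\lambda_{\min}$ concrete, the following is a minimal sketch of a safeguarded augmented-Lagrangian outer loop in the style of Birgin and Martínez (2014), using the sign convention $\lambda \le 0$ of Eq. 24. It is not a reproduction of Alg. 1. The toy scalar problem, the ternary-search `inner_solver` standing in for FineTuningSolver, and the exact multiplier and penalty update rules are assumptions chosen for illustration.

```python
def augmented_reward(theta, lam, rho, F, G):
    """L_rho(theta, lam) = F(theta) - rho/2 * max(0, G(theta) - lam/rho)^2."""
    return F(theta) - 0.5 * rho * max(0.0, G(theta) - lam / rho) ** 2


def inner_solver(objective, lo=-10.0, hi=10.0, iters=200):
    """Assumed stand-in for FineTuningSolver: ternary search, valid for a
    strictly concave scalar objective on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def run_outer_loop(F, G, rho_init=1.0, eta=2.0, tau=0.5, lam_min=-10.0, K=30):
    """Outer loop: solve the subproblem, then update multiplier and penalty."""
    lam, rho, prev_v = 0.0, rho_init, float("inf")
    theta = 0.0
    for _ in range(K):
        theta = inner_solver(lambda t: augmented_reward(t, lam, rho, F, G))
        g = G(theta)
        # Combined feasibility/complementarity measure for the current iterate.
        v = abs(min(-g, -lam / rho))
        # Multiplier step, clipped to the safeguard interval [lam_min, 0].
        lam_next = min(0.0, max(lam_min, lam - rho * g))
        # Grow the penalty only if the measure did not contract by a factor tau.
        rho_next = eta * rho if v > tau * prev_v else rho
        lam, rho, prev_v = lam_next, rho_next, v
    return theta, lam, rho
```

For instance, with the toy choices `F = lambda t: -(t - 2.0) ** 2` and `G = lambda t: t - 1.0`, the constrained maximizer is $\theta = 1$, and the loop should approach it together with a multiplier near $-2$. How quickly the penalty grows along the way is governed by $\eta$ and $\tau$.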
Multi-Constraint Extension of CFO. CFO extends straightforwardly to multiple constraints $\{c^{j}\}_{j=1}^{J}$ with bounds $\{B^{j}\}_{j=1}^{J}$ by introducing one Lagrange multiplier $\lambda_{k}^{j}$ and one penalty $\rho_{k}^{j}$ per constraint, yielding the multi-constraint augmented reward

$$f_{k}(x) = r(x) - \sum_{j=1}^{J} \frac{\rho_{k}^{j}}{2}\left[\max\!\left(0,\; c^{j}(x) - B^{j} - \frac{\lambda_{k}^{j}}{\rho_{k}^{j}}\right)\right]^{2}.$$

All steps in Alg. [1](https://arxiv.org/html/2605.30610#alg1) apply coordinate-wise to $(\lambda_{k}^{j}, \rho_{k}^{j})$, and the feasibility and optimality guarantees of Sec. [5](https://arxiv.org/html/2605.30610#S5) carry over by standard arguments in the augmented Lagrangian literature (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)). This enables, for instance, jointly enforcing an energy constraint and a validity constraint on the molecular design task.
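As a small illustration, the multi-constraint augmented reward above can be evaluated for a single sample as follows. This is a sketch under stated assumptions: the function name and the convention of passing precomputed values $r(x)$ and $c^{j}(x)$ as plain floats are illustrative and not taken from the paper's code.

```python
def multi_constraint_augmented_reward(r_x, c_x, bounds, lams, rhos):
    """Evaluate f_k(x) = r(x) - sum_j rho_j/2 * max(0, c_j(x) - B_j - lam_j/rho_j)^2.

    r_x:    reward value r(x)
    c_x:    constraint values c^j(x), one per constraint
    bounds: constraint bounds B^j
    lams:   Lagrange multipliers lambda_k^j (non-positive)
    rhos:   penalty parameters rho_k^j (positive)
    """
    penalty = 0.0
    for c_j, b_j, lam_j, rho_j in zip(c_x, bounds, lams, rhos):
        shifted = max(0.0, c_j - b_j - lam_j / rho_j)
        penalty += 0.5 * rho_j * shifted ** 2
    return r_x - penalty
```

With zero multipliers, only violated constraints contribute: for `r_x=1.0`, `c_x=[0.5, 2.0]`, `bounds=[1.0, 1.0]`, and `rhos=[2.0, 4.0]`, the first constraint is satisfied and the second adds a penalty of $\tfrac{4}{2}\cdot 1^2 = 2$, giving $f_k(x) = -1$.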
Table 3: Hyperparameters for CFO and Adjoint Matching.

Ablation study for $\rho_{\text{init}}$, $\eta$, $\tau$, and $\lambda_{\text{min}}$. In the following, we provide an ablation study for the molecular design task (Figure [3(a)](https://arxiv.org/html/2605.30610#S6.F3.sf1)-[3(b)](https://arxiv.org/html/2605.30610#S6.F3.sf2)) as well as the MoG task (Figure [2(b)](https://arxiv.org/html/2605.30610#S4.F2.sf2)).
Table 4: Ablation study for MoG ([2(b)](https://arxiv.org/html/2605.30610#S4.F2.sf2)) and molecular design tasks ([3(a)](https://arxiv.org/html/2605.30610#S6.F3.sf1)-[3(b)](https://arxiv.org/html/2605.30610#S6.F3.sf2)).

Across tasks, CFO's sensitivity to hyperparameters varies: while the MoG task exhibits clear shifts in reward and constraint satisfaction across settings, the molecular design task remains highly robust, with only minor fluctuations. A larger initial $\rho_{\text{init}}$ and a higher $\eta$ consistently tighten constraint satisfaction at the cost of modestly reduced reward, whereas $\lambda_{\text{min}}$ and $\tau$ have a smaller effect. The smaller effect of $\lambda_{\text{min}}$ likely stems from $\lambda$ rarely reaching its lower bound, and the contraction parameter barely impacts updates. A separate batch-size ablation on MoG shows that larger batches significantly improve constraint satisfaction and reward maximization.
Table 5: Ablation study for the MoG task with different batch sizes.
## Appendix E Proofs
Before we present proofs of the theorems in Section [5](https://arxiv.org/html/2605.30610#S5), we transform the main problem in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) into a simpler form. First, we recall that the policy $\pi$ is a vector field. It has been shown that the ODE in Eq. [1](https://arxiv.org/html/2605.30610#S2.E1) and a stochastic differential equation (SDE) of the form

$$dX_{t} = b(X_{t},t)\,dt + \sigma(t)\,dB_{t}, \quad X_{0} \sim p_{0}, \tag{21}$$

with drift $b:\mathbb{R}^{d}\times[0,1]\to\mathbb{R}^{d}$, diffusion coefficient $\sigma:[0,1]\to\mathbb{R}_{\geq 0}$, and Brownian motion $B_{t}$ induce the same marginals $\{p_{t}\}$. For an exact definition of $b$ and a proof of this statement, we refer to (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)). This SDE can be controlled by adjusting the drift as follows (Tang and Zhou, [2024](https://arxiv.org/html/2605.30610#bib.bib46); Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)):

$$dX_{t} = \left(b(X_{t},t) + \sigma(t)\,u(X_{t},t)\right)dt + \sigma(t)\,dB_{t}, \quad X_{0} \sim p_{0},$$

where $u:\mathbb{R}^{d}\times[0,1]\to\mathbb{R}^{d}$ is a control vector field. In particular, the pre-trained model corresponds to the controlled model with $u \equiv 0$. With these notational changes, we reformulate the optimization problem in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5) in terms of the controlled diffusion process $\mathbf{X}^{u} \sim p^{u}$:
$$\begin{split}\max_{u\in\mathcal{U}}\quad & \mathbb{E}_{\mathbf{X}^{u}\sim p^{u}}\left[r(X_{1})\right] - \alpha D_{\text{KL}}\left(p^{u}_{1}(\cdot)\,\|\,p^{\text{pre}}_{1}(\cdot)\right)\\ \text{s.t.}\quad & \mathbb{E}_{\mathbf{X}^{u}\sim p^{u}}\left[c(X_{1})\right] \leq B\end{split}\tag{22}$$

Eq. [22](https://arxiv.org/html/2605.30610#A5.E22) has the same form as Eq. [5](https://arxiv.org/html/2605.30610#S3.E5), but it is stated in terms of a diffusion process. This allows us to compute the KL term efficiently, see (Eq. 18, Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)), by using Girsanov's theorem, which relates the control process $u$ to the KL divergence:

$$D_{\text{KL}}\left(p^{u}(\mathbf{X}\mid X_{0})\,\|\,p^{\text{pre}}(\mathbf{X}\mid X_{0})\right) = \mathbb{E}_{\mathbf{X}^{u}\sim p^{u}}\left[\int_{0}^{1}\frac{1}{2}\left\lVert u(X_{t},t)\right\rVert^{2}dt\right]$$

That is, if both processes have the same initial value $X_{0}$, the KL divergence between the controlled and uncontrolled processes equals the expected time integral of half the squared norm of the control $u$ (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15); Uehara et al., [2024b](https://arxiv.org/html/2605.30610#bib.bib49); Tang and Zhou, [2024](https://arxiv.org/html/2605.30610#bib.bib46)). This dependence on the initial value can be dropped when using a specific noise schedule (Domingo-Enrich et al., [2024](https://arxiv.org/html/2605.30610#bib.bib15)). Recalling that the marginal at time $t$ is $p_{t}$, i.e., $X_{t}\sim p_{t}$, we can equivalently write the optimization problem as:
$$\begin{split}\max_{u\in\mathcal{U}}\quad & \mathbb{E}_{\mathbf{X}^{u}\sim p^{u}}\left[r(X_{1})\right] - \alpha\,\mathbb{E}\left[\int_{0}^{1}\frac{1}{2}\|u(X_{t}^{u},t)\|^{2}dt\right]\\ \text{s.t.}\quad & \mathbb{E}_{\mathbf{X}^{u}\sim p^{u}}\left[c(X_{1})\right] \leq B\end{split}$$

where the expectation is taken over the controlled process $\mathbf{X}^{u}$. For numerical optimization, we now assume that the control $u$ is a parametric model, typically a neural network, with parameters $\theta$. The resulting optimization problem is then:

$$\begin{split}\max_{\theta\in\mathbb{R}^{m}}\quad F(\theta) :=&\; F_{r}(\theta) - \alpha F_{KL}(\theta)\\ =&\; \mathbb{E}_{x\sim p^{u_{\theta}}_{1}}[r(x)] - \alpha\,\mathbb{E}\left[\int_{0}^{1}\frac{1}{2}\|u_{\theta}(X_{t},t)\|^{2}dt\right]\\ \text{s.t.}\quad G(\theta) :=&\; \mathbb{E}_{x\sim p^{u_{\theta}}_{1}}[c(x)] - B \leq 0\end{split}\tag{23}$$

for some functions $F:\mathbb{R}^{m}\to\mathbb{R}$ and $G:\mathbb{R}^{m}\to\mathbb{R}$. This is a finite-dimensional optimization problem over $\theta$.
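The quantities in Eq. 23 can be estimated by simulating the controlled SDE. The sketch below uses an Euler-Maruyama discretization on $[0,1]$ and accumulates the running cost $\int_0^1 \frac{1}{2}\|u\|^2 dt$ along each path. It rests on several assumptions made only for illustration: a one-dimensional state, a standard normal initial distribution $p_0$, and user-supplied callables `b`, `sigma`, `u`, `r`, and `c`. The function names are hypothetical.

```python
import math
import random


def simulate_controlled_sde(b, sigma, u, x0, n_steps=100, rng=random):
    """Euler-Maruyama for dX = (b + sigma*u) dt + sigma dB on [0, 1] (1-D).

    Returns the terminal state X_1 and the running cost int_0^1 0.5*u^2 dt.
    """
    dt = 1.0 / n_steps
    x, cost = x0, 0.0
    for i in range(n_steps):
        t = i * dt
        u_t = u(x, t)
        cost += 0.5 * u_t ** 2 * dt
        noise = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += (b(x, t) + sigma(t) * u_t) * dt + sigma(t) * noise
    return x, cost


def estimate_objectives(b, sigma, u, r, c, B, alpha, n_samples=2000, seed=0):
    """Monte Carlo estimates of F = F_r - alpha*F_KL, G, and F_KL."""
    rng = random.Random(seed)
    f_r = f_kl = c_mean = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)  # assumption: p_0 is a standard normal
        x1, cost = simulate_controlled_sde(b, sigma, u, x0, rng=rng)
        f_r += r(x1)
        f_kl += cost
        c_mean += c(x1)
    f_r, f_kl, c_mean = f_r / n_samples, f_kl / n_samples, c_mean / n_samples
    return f_r - alpha * f_kl, c_mean - B, f_kl
```

For a constant control $u \equiv a$, the running cost is $\frac{1}{2}a^{2}$ on every path, which gives a quick sanity check of the discretization. The pre-trained model corresponds to $u \equiv 0$ and incurs zero cost.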
Next, we present a proof that Alg. [1](https://arxiv.org/html/2605.30610#alg1) can find a parameterized policy $\pi_{\theta}$, with $\theta\in\mathbb{R}^{m}$, that minimizes the infeasibility while maximizing the reward. The proof is adapted from “Practical Augmented Lagrangian Methods for Constrained Optimization” (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6), Chapter 5).
The augmented Lagrangian objective in Eq. [13](https://arxiv.org/html/2605.30610#S4.E13) becomes:

$$L_{\rho}(\theta,\lambda) = F(\theta) - \frac{\rho}{2}\left[\max\left(0,\; G(\theta) - \frac{\lambda}{\rho}\right)\right]^{2}, \tag{24}$$

where $\lambda\in\mathbb{R}_{\leq 0}$ is the Lagrange multiplier and $\rho>0$ is a penalty parameter.
With this notation, the assumption on the solver becomes:
###### Assumption E.1 (Solver).
For all $k\in\mathbb{N}$, we obtain $\theta_{k}$ such that

$$L_{\rho_{k}}(\theta_{k},\lambda_{k}) \geq L_{\rho_{k}}(\theta,\lambda_{k}) - \epsilon_{k} \quad \forall\;\theta\in\mathbb{R}^{m}, \tag{25}$$

where the sequence $\{\epsilon_{k}\}\subseteq\mathbb{R}_{+}$ is bounded.
This corresponds to Assumption 5.1 from (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6)). Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1) states that the solver can find an $\epsilon_{k}$-approximate global maximizer of each subproblem.
Next, we state and prove the main result for the algorithm: in the limit, we obtain a minimizer of the infeasibility measure.
###### Theorem E.2 (Feasibility of Constrained Flow Optimization).
Let $\{\theta_{k}\}$ be a sequence generated by Alg. [1](https://arxiv.org/html/2605.30610#alg1) under the solver Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1). Let $\bar{\theta}$ be a limit point of the sequence $\{\theta_{k}\}$. Then, we have:

$$\left\langle G(\bar{\theta})\right\rangle_{+} \leq \left\langle G(\theta)\right\rangle_{+} \quad \forall\,\theta\in\mathbb{R}^{m}, \tag{26}$$

where $G(\theta) := \mathbb{E}_{x\sim p^{u_{\theta}}_{1}}[c(x)] - B$ and $\left\langle\cdot\right\rangle_{+} := \max\{0,\cdot\}$.
###### Proof\.
Since $\mathbb{R}^{m}$ is closed and $\theta_{k}\in\mathbb{R}^{m}$ for all $k$, we have $\bar{\theta}\in\mathbb{R}^{m}$. We consider two cases: $\{\rho_{k}\}$ bounded and $\rho_{k}\to\infty$. First, assume that $\{\rho_{k}\}$ is bounded. Then there exists $k_{0}$ such that $\rho_{k}=\rho_{k_{0}}$ for all $k\geq k_{0}$. Therefore, for all $k\geq k_{0}$, the upper bracket of Eq. [12](https://arxiv.org/html/2605.30610#S4.E12) holds. This implies that $|V_{k}|\to 0$, so $\left\langle G(\theta_{k})\right\rangle_{+}\to 0$. Thus, the limit point is feasible.
Now, assume that $\rho_{k}\to\infty$. Let $K\subseteq\mathbb{N}$ be such that

$$\theta_{k}\rightarrow\bar{\theta}\;\text{ for }\;k\in K\;\text{ and }\;k\rightarrow\infty.$$

Assume by contradiction that there exists $\theta\in\mathbb{R}^{m}$ such that

$$\left\langle G(\bar{\theta})\right\rangle_{+}^{2} > \left\langle G(\theta)\right\rangle_{+}^{2}.$$

By the continuity of $G$, the boundedness of $\{\lambda_{k}\}$, and the fact that $\rho_{k}\rightarrow\infty$, there exist $c>0$ and $k_{0}\in\mathbb{N}$ such that for all $k\in K$, $k\geq k_{0}$:

$$\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2} > \left\langle G(\theta) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2} + c.$$

Therefore, for all $k\in K$, $k\geq k_{0}$:

$$F(\theta_{k}) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] < F(\theta) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] - \frac{\rho_{k}c}{2} + F(\theta_{k}) - F(\theta).$$

Since $\lim_{k\in K}\theta_{k}=\bar{\theta}$, $F$ is continuous, $\{\epsilon_{k}\}$ is bounded, and $\rho_{k}\to\infty$, there exists $k_{1}\geq k_{0}$ such that, for $k\in K$, $k\geq k_{1}$:

$$\frac{\rho_{k}c}{2} - F(\theta_{k}) + F(\theta) > \epsilon_{k}.$$

Therefore,

$$F(\theta_{k}) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] < F(\theta) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] - \epsilon_{k}$$

for $k\in K$, $k\geq k_{1}$. This contradicts Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1). ∎
Theorem [E.2](https://arxiv.org/html/2605.30610#A5.Thmtheorem2) and its proof correspond to (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6), Sec. 5.1). Theorem [E.2](https://arxiv.org/html/2605.30610#A5.Thmtheorem2) establishes that Alg. [1](https://arxiv.org/html/2605.30610#alg1), with iterates satisfying Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1), identifies minimizers of the infeasibility measure

$$\left\langle G(\theta)\right\rangle_{+} := \left\langle\mathbb{E}_{x\sim p^{u_{\theta}}_{1}}[c(x)] - B\right\rangle_{+}.$$

Consequently, if the original optimization problem is feasible, then every limit point of the sequence produced by the algorithm is also feasible.
Next, we show that, if $\epsilon_{k}$ additionally tends to zero, then in the feasible case the algorithm asymptotically finds a global maximizer of the problem in Eq. [5](https://arxiv.org/html/2605.30610#S3.E5).
###### Theorem E.3 (Optimality of Constrained Flow Optimization).
Let $\{\theta_{k}\}\subset\mathbb{R}^{m}$ be a sequence generated by Alg. [1](https://arxiv.org/html/2605.30610#alg1) under Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1) and $\lim_{k\rightarrow\infty}\epsilon_{k}=0$. Let $\bar{\theta}\in\mathbb{R}^{m}$ be a limit point of the sequence $\{\theta_{k}\}$. Suppose that the problem is feasible, i.e., $\left\langle G(\theta)\right\rangle_{+}=0$ for some $\theta\in\mathbb{R}^{m}$. Then $\bar{\theta}$ is a global maximizer of Eq. [5](https://arxiv.org/html/2605.30610#S3.E5).
###### Proof\.
Let $K\subseteq\mathbb{N}$ be such that

$$\theta_{k}\rightarrow\bar{\theta}\;\text{ for }\;k\in K\;\text{ and }\;k\rightarrow\infty.$$

By assumption, the problem is feasible. Thus, by Theorem [E.2](https://arxiv.org/html/2605.30610#A5.Thmtheorem2), $\bar{\theta}$ is feasible. Let $\theta\in\mathbb{R}^{m}$ be such that $G(\theta)\leq 0$. By the definition of the algorithm, we have that

$$F(\theta_{k}) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] \geq F(\theta) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] - \epsilon_{k} \tag{27}$$

for all $k\in\mathbb{N}$. Moreover, since $G(\theta)\leq 0$ and $-\lambda_{k}/\rho_{k}\geq 0$, we have $0\leq\left\langle G(\theta)-\lambda_{k}/\rho_{k}\right\rangle_{+}\leq-\lambda_{k}/\rho_{k}$, and hence

$$\left\langle G(\theta) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2} \leq \left(\frac{\lambda_{k}}{\rho_{k}}\right)^{2}. \tag{28}$$

We again consider the two cases: $\rho_{k}\rightarrow\infty$ and $\{\rho_{k}\}$ bounded.
In the first case, we assume $\rho_{k}\rightarrow\infty$. By Eq. [27](https://arxiv.org/html/2605.30610#A5.E27) and Eq. [28](https://arxiv.org/html/2605.30610#A5.E28), we have

$$F(\theta_{k}) \geq F(\theta_{k}) - \frac{\rho_{k}}{2}\left[\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k}}\right\rangle_{+}^{2}\right] \geq F(\theta) - \frac{(\lambda_{k})^{2}}{2\rho_{k}} - \epsilon_{k}.$$

Since $\{\lambda_{k}\}$ is bounded and $\rho_{k}\to\infty$, we have $\lim_{k\in K}(\lambda_{k})^{2}/\rho_{k}=0$, and by assumption $\lim_{k\in K}\epsilon_{k}=0$. Taking limits for $k\in K$ and using the continuity of $F$ and the convergence $\theta_{k}\rightarrow\bar{\theta}$, we get

$$F(\bar{\theta}) \geq F(\theta).$$

Since $\theta$ is an arbitrary feasible element of $\mathbb{R}^{m}$, $\bar{\theta}$ is a global maximizer.
For the second case, we assume that $\{\rho_{k}\}$ is bounded. Then there exists $k_{0}\in\mathbb{N}$ such that $\rho_{k}=\rho_{k_{0}}$ for all $k\geq k_{0}$. Therefore, by Assumption [E.1](https://arxiv.org/html/2605.30610#A5.Thmtheorem1), Eq. [27](https://arxiv.org/html/2605.30610#A5.E27) holds for all $k\geq k_{0}$, and Eq. [28](https://arxiv.org/html/2605.30610#A5.E28) holds with $\rho_{k}=\rho_{k_{0}}$. Thus,

$$F(\theta_{k}) - \frac{\rho_{k_{0}}}{2}\left[\left\langle G(\theta_{k}) - \frac{\lambda_{k}}{\rho_{k_{0}}}\right\rangle_{+}^{2}\right] \geq F(\theta) - \frac{(\lambda_{k})^{2}}{2\rho_{k_{0}}} - \epsilon_{k}$$

for all $k\geq k_{0}$. Let $K_{1}\subseteq K$ be an infinite index set and $\lambda^{*}\in\mathbb{R}_{\leq 0}$ be such that $\lim_{k\in K_{1}}\lambda_{k}=\lambda^{*}$. By the feasibility of $\bar{\theta}$, taking limits in the inequality above for $k\in K_{1}$, we get
$$F(\bar{\theta}) - \frac{\rho_{k_{0}}}{2}\left[\left\langle G(\bar{\theta}) - \frac{\lambda^{*}}{\rho_{k_{0}}}\right\rangle_{+}^{2}\right] \geq F(\theta) - \frac{(\lambda^{*})^{2}}{2\rho_{k_{0}}} - \lim_{k\in K_{1}}\epsilon_{k}. \tag{29}$$

Since $\lim_{k\rightarrow\infty}\epsilon_{k}=0$, Eq. [29](https://arxiv.org/html/2605.30610#A5.E29) gives

$$F(\bar{\theta}) - \frac{\rho_{k_{0}}}{2}\left[\left\langle G(\bar{\theta}) - \frac{\lambda^{*}}{\rho_{k_{0}}}\right\rangle_{+}^{2}\right] \geq F(\theta) - \frac{(\lambda^{*})^{2}}{2\rho_{k_{0}}}. \tag{30}$$

Now, if $G(\bar{\theta})=0$, since $-\lambda^{*}/\rho_{k_{0}}\geq 0$, we have that

$$\left\langle G(\bar{\theta}) - \frac{\lambda^{*}}{\rho_{k_{0}}}\right\rangle_{+}^{2} = \left(\frac{\lambda^{*}}{\rho_{k_{0}}}\right)^{2},$$

so the penalty terms on both sides of Eq. [30](https://arxiv.org/html/2605.30610#A5.E30) coincide and $F(\bar{\theta})\geq F(\theta)$. If instead $G(\bar{\theta})<0$, then, by Eq. [10](https://arxiv.org/html/2605.30610#S4.E10), $\lim_{k\in K_{1}}\min\{-G(\theta_{k}),-\lambda_{k}/\rho_{k_{0}}\}=0$, so we necessarily have $\lambda^{*}=0$. Both penalty terms in Eq. [30](https://arxiv.org/html/2605.30610#A5.E30) then vanish, and Eq. [30](https://arxiv.org/html/2605.30610#A5.E30) again implies that $F(\bar{\theta})\geq F(\theta)$. Since $\theta$ is an arbitrary feasible element of $\mathbb{R}^{m}$, $\bar{\theta}$ is a global maximizer. ∎
We make two remarks about Theorem [E.3](https://arxiv.org/html/2605.30610#A5.Thmtheorem3). First, as mentioned in Sec. [5](https://arxiv.org/html/2605.30610#S5), having access to such a solver is difficult and, in practice, rarely the case. Second, we refer the reader to (Birgin and Martínez, [2014](https://arxiv.org/html/2605.30610#bib.bib6), Sec. 5.2) for a discussion of the sets $K$ and $K_{1}$, how they are connected to the convexity of $F$ and $G$, and the corresponding theorem and proof.