SNAP-FM: Sparse Nonlinear Accelerated Projection for Physics-Constrained Generative Modeling

arXiv cs.LG 07/02/26, 04:00 AM Papers
Summary
Proposes SNAP-FM, a method that leverages sparse GPU nonlinear optimization to accelerate constraint projection in physics-constrained generative modeling, achieving faster inference while preserving exact physical constraint satisfaction.
arXiv:2607.00095v1 Announce Type: new Abstract: Generative models have emerged as scalable surrogates for physical simulation, yet they offer no guarantee that their outputs respect the conservation laws, boundary conditions, and nonlinear invariants that govern the underlying physics. Constrained sampling closes this gap, enforcing such constraints exactly at inference time without retraining, but at a computational cost: projection, correction, and trajectory-optimization steps are repeated during sampling, with these steps becoming expensive for nonlinear constraints. Standard ML frameworks exacerbate this: their dense tensor algebra and limited sparse solver composability obscure the structure that physical constraints naturally induce, making efficient batched nonlinear optimization difficult to realize in practice. We address this bottleneck by exploiting the structure that sample-wise batching and local PDE couplings induce in the projection subproblems -- namely, block-sparse Jacobian and KKT systems -- exposing this structure using ExaModels.jl and solving the resulting sparse nonlinear programs with MadNLP.jl and GPU sparse factorization. Applied to Physics-Constrained Flow Matching (PCFM), on PDE benchmarks with linear, nonlinear, one-dimensional, and two-dimensional constraints, this approach accelerates nonlinear constraint projection while maintaining constraint satisfaction. These results show that sparse GPU nonlinear optimization is a practical foundation for constrained generative sampling in scientific machine learning.
Original Article
View Cached Full Text
Cached at: 07/02/26, 05:35 AM
# SNAP-FM: Sparse Nonlinear Accelerated Projection for Physics-Constrained Generative Modeling
Source: [https://arxiv.org/html/2607.00095](https://arxiv.org/html/2607.00095)
Theodoros XenakisUtkarsh UtkarshPengfei CaiRafael Gómez\-BombarelliAlan EdelmanChristopher V\. Rackauckas

###### Abstract

Generative models have emerged as scalable surrogates for physical simulation, yet they offer no guarantee that their outputs respect the conservation laws, boundary conditions, and nonlinear invariants that govern the underlying physics\. Constrained sampling closes this gap, enforcing such constraints exactly at inference time without retraining, but at a computational cost: projection, correction and trajectory\-optimization steps are repeated during sampling, with these steps becoming expensive for nonlinear constraints\. Standard ML frameworks exacerbate this: their dense tensor algebra and limited sparse solver composability obscure the structure that physical constraints naturally induce, making efficient batched nonlinear optimization difficult to realize in practice\. We address this bottleneck by exploiting the structure that sample\-wise batching and local PDE couplings induce in the projection subproblems – namely, block\-sparse Jacobian and KKT systems – exposing this structure usingExaModels\.jland solving the resulting sparse nonlinear programs withMadNLP\.jland GPU sparse factorization\. Applied to Physics\-Constrained Flow Matching \(PCFM\), on PDE benchmarks with linear, nonlinear, one\-dimensional, and two\-dimensional constraints, this approach accelerates nonlinear constraint projection while maintaining constraint satisfaction\. These results show that sparse GPU nonlinear optimization is a practical foundation for constrained generative sampling in scientific machine learning\.

Machine Learning, ICML

## 1Introduction

Generative models have emerged as flexible surrogates for physical simulation, learning solution distributions for partial differential equations \(PDEs\) and amortizing inference across varying physical conditions\(Priceet al\.,[2023](https://arxiv.org/html/2607.00095#bib.bib22); Yuanet al\.,[2023](https://arxiv.org/html/2607.00095#bib.bib23); Huanget al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib21); Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1)\)\. Yet their deployment in scientific settings is limited by a fundamental gap: unconstrained generative models do not, by themselves, guarantee physical fidelity\. Conservation of mass, momentum, and energy, nonlinear boundary conditions, and invariants tied to the governing equations are central to classical numerical simulation, but are routinely violated by learned surrogates unless explicitly enforced\(Raissiet al\.,[2019](https://arxiv.org/html/2607.00095#bib.bib15); Liet al\.,[2021](https://arxiv.org/html/2607.00095#bib.bib19)\)\. Closing this gap without sacrificing the amortized inference advantage of generative models is the central challenge of physics\-constrained generative modeling\.

Constraint enforcement in generative sampling can be broadly divided into soft and hard approaches\. Soft methods, including training\-time penalty losses\(Baldanet al\.,[2025](https://arxiv.org/html/2607.00095#bib.bib29); Huanget al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib21)\), PINN\-style residual regularization, physics\-informed neural operators, and architecture\-level inductive biases\(Greydanuset al\.,[2019](https://arxiv.org/html/2607.00095#bib.bib27); Richter\-Powellet al\.,[2022](https://arxiv.org/html/2607.00095#bib.bib28)\), encourage constraint satisfaction approximately\. These approaches are computationally attractive and scale naturally within modern deep learning pipelines, but they generally provide no exact feasibility guarantees and can exhibit increased constraint violation under distribution shift\.

Hard\-constrained methods instead aim to enforce feasibility exactly, either through inference\-time correction and optimization\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1); Christopheret al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib24); Chenget al\.,[2025](https://arxiv.org/html/2607.00095#bib.bib17); Römeret al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib25); Yuanet al\.,[2023](https://arxiv.org/html/2607.00095#bib.bib23)\)or through end\-to\-end constrained formulations\(Utkarshet al\.,[2025b](https://arxiv.org/html/2607.00095#bib.bib31)\)\. In the simplest case, one can apply a final post\-hoc projection that maps a generated sample onto the feasible manifold, but such a late correction can introduce substantial distributional distortion\. To reduce this mismatch, many methods interleave corrections with the sampling dynamics, repeatedly guiding or projecting intermediate states toward feasibility\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1); Chenget al\.,[2025](https://arxiv.org/html/2607.00095#bib.bib17); Christopheret al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib24); Ben\-Hamuet al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib18)\)\. Relaxed variants further delay or soften early corrections, reflecting the fact that early iterates often remain noise\-like and may be harmed by overly strict constraint enforcement\.

Despite these algorithmic differences, hard\-constrained test\-time methods share a common computational burden: feasibility is enforced through repeated optimization during sampling\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1); Chenget al\.,[2025](https://arxiv.org/html/2607.00095#bib.bib17); Christopheret al\.,[2024](https://arxiv.org/html/2607.00095#bib.bib24)\)\. In projection\-based methods, this burden is especially explicit: each correction maps an intermediate state onto, or closer to, the constraint manifold by solving a constrained optimization problem\. For linear constraints, this projection often reduces to a single matrix solve\. For the nonlinear conservation laws governing many physical systems, including flux constraints, integral invariants, and nonlinear boundary conditions, the same step requires iterative nonlinear optimization\. Since this optimization must be performed repeatedly across sampling steps and over batches of generated samples, it can dominate the total cost of hard\-constrained generation\. The central question is therefore not merely whether to enforce hard constraints, but how to make the resulting projection and correction steps fast enough to be practical at scale\.

The algorithmic response is well known from large\-scale optimization: exploit the sparsity induced by the constraints, and solve the resulting KKT systems using sparse linear algebra\(Nocedal and Wright,[2006](https://arxiv.org/html/2607.00095#bib.bib8); Pacaud and Shin,[2024](https://arxiv.org/html/2607.00095#bib.bib12); Rennichet al\.,[2014](https://arxiv.org/html/2607.00095#bib.bib32); Lu and Yang,[2025](https://arxiv.org/html/2607.00095#bib.bib33)\)\. However, this strategy is difficult to deploy inside modern generative sampling pipelines\. Standard ML frameworks are optimized primarily for dense batched tensor algebra, and although sparse tensor support exists, it does not yet provide a mature end\-to\-end stack for batched nonlinear programming, interior\-point globalization, GPU\-resident sparse KKT factorization, and integration with the sampling loop\(Paszkeet al\.,[2019](https://arxiv.org/html/2607.00095#bib.bib34); Bradburyet al\.,[2018](https://arxiv.org/html/2607.00095#bib.bib35)\)\. This gap between constrained optimization practice and deep generative modeling infrastructure motivates our approach\.

The projection NLPs arising in batched hard\-constrained sampling are highly structured\. Across the batch, the constraint Jacobian is block diagonal because each sample’s constraints are independent of the others\. Within each block, the Jacobian is sparse because physical conservation laws and PDE discretizations couple only local spatial or temporal degrees of freedom\. This two\-level structure yields sparse KKT systems that are amenable to GPU\-resident sparse factorization, avoiding the dense tensorization that would otherwise make nonlinear projection prohibitively expensive\.

We realize this idea through SNAP\-FM, a sparse nonlinear projection framework based on Physics\-Constrained Flow Matching \(PCFM\)\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1)\)\. SNAP\-FM usesExaModels\.jlto symbolically compile the structured projection NLP,MadNLP\.jlas the interior\-point solver, and GPU sparse factorization for the resulting KKT systems\. Built on PCFM, which enforces arbitrary nonlinear constraints zero\-shot in pretrained flow models, SNAP\-FM targets the main computational bottleneck of constraint enforcement: repeated nonlinear projection during sampling\. Across heat, reaction\-diffusion, Burgers, and two\-dimensional Navier–Stokes benchmarks, SNAP\-FM accelerates the projection step relative to generic optimization baselines while maintaining constraint satisfaction\.

## 2Physics Constrained Flow Matching

This work builds directly on the Physics\-Constrained Flow Matching framework introduced by Utkarsh et al\.\([2025a](https://arxiv.org/html/2607.00095#bib.bib1)\), with the goal of extending it to a domain the current implementation doesn’t support: the efficient enforcement of nonlinear constraints at scale\. Among recent constraint\-aware generative models, PCFM is the natural foundation for our work, also showing promising extension to different scientific domains such as atomistic generative models\(Caiet al\.,[2026](https://arxiv.org/html/2607.00095#bib.bib36)\)\. It ensures constraint satisfaction exactly at inference time, requiring no retraining or architectural modifications of the underlying flow model, operating entirely post\-hoc\. It projects intermediate flow states onto constraint manifolds at inference time, without requiring gradient information during training, enabling zero\-shot constraint enforcement up to machine precision for pretrained flow matching models\. Unlike prior zero\-shot approaches restricted to linear or non\-overlapping constraints, it admits arbitrary nonlinear and coupled constraints within a single framework\. These characteristics make PCFM a solid foundation on which to build a more general\-purpose constrained sampling mechanism\.

### 2\.1Sampling

The sampling algorithm utilized in PCFM is a constraint\-guided algorithm that interleaves lightweight constraint corrections with marginally consistent flow updates\. The procedure consists of four main steps: forward shooting, Gauss\-Newton projection, reverse updating, and relaxed constraint corrections\.

Letvθ\(u,τ\)v\_\{\\theta\}\(u,\\tau\)denote the pretrained flow model, which defines an ODE transporting samplesu0∼π0u\_\{0\}\\sim\\pi\_\{0\}from a tractable prior to solution\-like outputsu1∼π1u\_\{1\}\\sim\\pi\_\{1\}over flow timeτ∈\[0,1\]\\tau\\in\[0,1\]\. Given a constraint functionh\(u\)h\(u\), PCFM enforcesh\(u1\)=0h\(u\_\{1\}\)=0by discretizing\[0,1\]\[0,1\]intoNNuniform substeps\. At each timestep,τ→τ′=τ\+Δτ\\tau\\rightarrow\\tau^\{\\prime\}=\\tau\+\\Delta\\tau, three operations are performed: a one\-step extrapolation to the terminal time using the learned velocity, a projection of the resulting candidate onto the constraint manifoldℳ=\{u:h\(u\)=0\}\\mathcal\{M\}=\\\{u:h\(u\)=0\\\}, and a linear interpolation back toτ′\\tau^\{\\prime\}along the optimal\-transport displacement\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1)\):

u^1\\displaystyle\\hat\{u\}\_\{1\}=uτ\+\(1−τ\)vθ\(uτ,τ\),\\displaystyle=u\_\{\\tau\}\+\(1\-\\tau\)\\,v\_\{\\theta\}\(u\_\{\\tau\},\\tau\),\(1\)u1\\displaystyle u\_\{1\}=arg⁡minu⁡12‖u−u^1‖2s\.t\.h\(u\)=0,\\displaystyle=\\arg\\min\_\{u\}\\;\\tfrac\{1\}\{2\}\\\|u\-\\hat\{u\}\_\{1\}\\\|^\{2\}\\quad\\text\{s\.t\.\}\\quad h\(u\)=0,\(2\)uτ′\\displaystyle u\_\{\\tau^\{\\prime\}\}=u0\+τ′\(u1−u0\)\.\\displaystyle=u\_\{0\}\+\\tau^\{\\prime\}\(u\_\{1\}\-u\_\{0\}\)\.\(3\)
The full PCFM algorithm of\(Utkarshet al\.,[2025a](https://arxiv.org/html/2607.00095#bib.bib1)\)additionally permits a penalized correction at each step,arg⁡minu⁡‖u−u^τ′‖2\+λ‖h\(u\+\(1−τ′\)vθ\(u,τ′\)\)‖2\\arg\\min\_\{u\}\\\|u\-\\hat\{u\}\_\{\\tau^\{\\prime\}\}\\\|^\{2\}\+\\lambda\\\|h\(u\+\(1\-\\tau^\{\\prime\}\)v\_\{\\theta\}\(u,\\tau^\{\\prime\}\)\)\\\|^\{2\}, to compensate for nonlinearity under coarse discretization\. However, we omit it, as in our experiments, the streamlined three\-step variant is sufficient\.

The key structural feature exploited in this work is that the loop above is constraint\-agnostic\. Every operation except the projection in step 2 is a fixed arithmetic kernel, learning the entire specifications of the problem to be funneled into the projection subproblem\. The cost of a single sampling pass therefore can be divided into two main terms: the constraint projections and the velocity evaluations\. The forward shoot and OT pullback are independent of the constraints being enforced, whereas the projection step scales with the number of imposed constraints and their complexities, causing the projection step to be the bottleneck of the sampling, especially in these more complex cases\. The remainder of this work focuses on optimizing the projection step via exploitation of the sparsity in the Jacobians, since constraints are independent across samples, and temporal couplings are typically local\.

## 3Theory

In this section, we discuss common constraints imposed on PDE systems and derive structural properties of the associated constraint Jacobians\.

We adopt notation similar to that of Utkarsh et al\.\([2025a](https://arxiv.org/html/2607.00095#bib.bib1)\)\. Specifically, we consider partial differential equations on a bounded spatiotemporal domainΩ×\[0,T\]\\Omega\\times\[0,T\], whereΩ⊂ℝd\\Omega\\subset\\mathbb\{R\}^\{d\}\. The true PDE solution is a fieldu:Ω×\[0,T\]→ℝu:\\Omega\\times\[0,T\]\\to\\mathbb\{R\}, which we approximate using a discretized tensor𝐮∈ℝNx×Nt\\mathbf\{u\}\\in\\mathbb\{R\}^\{N\_\{x\}\\times N\_\{t\}\}, whereNxN\_\{x\}andNtN\_\{t\}are the number of spatial and temporal grid points, respectively\.

In addition to satisfying the governing PDE, the solution must also satisfy a collection of physical constraints\. These constraints are represented as

\(ℋk\(u\)\)k=𝟎\\displaystyle\(\\mathcal\{H\}\_\{k\}\(u\)\)\_\{k\}=\\mathbf\{0\}whereℋk\\mathcal\{H\}\_\{k\}is thekk\-th constraint acting on the solutionuu\. For convenience, we letℋ\(u\)\\mathcal\{H\}\(u\)denote the concatenation of all imposed constraints\. Further, it is implied that the constraint operator can also work on the discretized solution𝐮\\mathbf\{u\}\.

Although the discussion in this section focuses on one\-dimensional PDEs, the main ideas extend naturally to higher\-dimensional settings\.

### 3\.1Constraints

Constraints imposed for PDEs typically arise from physical principles such as conservation laws, initial conditions, and boundary conditions\(LeVeque,[2004](https://arxiv.org/html/2607.00095#bib.bib11),[1992](https://arxiv.org/html/2607.00095#bib.bib10)\)\. These constraints may be linear or nonlinear inuu, and may act locally or globally in space and time\. A selection of commonly encountered constraints is summarized in table[1](https://arxiv.org/html/2607.00095#S3.T1)\.

Table 1:Overview of common constraint types for PDEs\.Most constraints encountered in practice are local in time\. Meaning, they depend only on a single time step or a small number of neighboring time steps\. For instance, an initial value constraint applies only att=0t=0, while a mass conservation constraint typically applies to each time step independently\. Even nonlinear conservation laws involving temporal derivatives can often be discretized using finite\-difference schemes that couple only a small number of neighboring time steps, thereby preserving temporal locality\(Quarteroniet al\.,[2007](https://arxiv.org/html/2607.00095#bib.bib9)\)\. The general Dirichlet IC/BC constraint formulated asAu−bAu\-bin[1](https://arxiv.org/html/2607.00095#S3.T1)could, in principle, define a global coupling, depending on the structure ofAA\. In practice, however,AAis usually sparse and only couples nearby spatial or temporal nodes\.

### 3\.2Jacobian Structure

The Gauss\-Newton projection step presented in[2\.1](https://arxiv.org/html/2607.00095#S2.SS1)can be reformulated as solving the constrained optimization problem

uproj=minu‖u−u1‖22subject toℋ\(u\)=0\\displaystyle u\_\{\\text\{proj\}\}=\\min\_\{u\}\|\|u\-u\_\{1\}\|\|\_\{2\}^\{2\}\\quad\\text\{subject to \}\\mathcal\{H\}\(u\)=0\(4\)where the variablesuucan be interpreted as flattened vectors\. Solving such a constrained optimization problem generally involves repeatedly solving large linear systems derived from the Karush\-Kuhn\-Tucker \(KKT\) conditions\(Nocedal and Wright,[2006](https://arxiv.org/html/2607.00095#bib.bib8)\)\. These systems typically take the form

\[WJℋTJℋ0\]\\displaystyle\\begin\{bmatrix\}W&J\_\{\\mathcal\{H\}\}^\{T\}\\\\ J\_\{\\mathcal\{H\}\}&0\\end\{bmatrix\}\(5\)whereJℋ=∂ℋ∂uJ\_\{\\mathcal\{H\}\}=\\frac\{\\partial\\mathcal\{H\}\}\{\\partial u\}is the Jacobian of the constraintsℋ\\mathcal\{H\}, andWWis the Hessian of the Lagrangian\. We defer a detailed discussion of the KKT derivation to Appendix[B](https://arxiv.org/html/2607.00095#A2)\. Regardless of the specific optimization method employed, computing the Jacobian \- or an approximation to it \- is generally a necessary part of the solution procedure\(Nocedal and Wright,[2006](https://arxiv.org/html/2607.00095#bib.bib8)\)\.

A key observation is that physically motivated PDE constraints typically induce highly sparse Jacobians after discretization\. Exploiting this sparsity is crucial for efficiently solving the associated linear systems\. Moreover, for a fixed set of constraints, the sparsity pattern remains constant throughout the optimization process, allowing repeated linear solves to reuse structural information\.

To understand whyJℋJ\_\{\\mathcal\{H\}\}is sparse, consider the discretized solution as a flattened vector𝐮∈ℝNxNt\\mathbf\{u\}\\in\\mathbb\{R\}^\{N\_\{x\}N\_\{t\}\}\. The Jacobian then has dimensionsJℋ∈ℝNc×NxNtJ\_\{\\mathcal\{H\}\}\\in\\mathbb\{R\}^\{N\_\{c\}\\times N\_\{x\}N\_\{t\}\}, whereNcN\_\{c\}denotes the number of constraints\.

If a constraint depends only on a single time step, at mostNxN\_\{x\}entries of the corresponding Jacobian row will be nonzero\. Similarly, if a constraint involves only a few spatial steps, but is global in time, then each row contains only a small number of nonzero entries per time step, yielding at most𝒪\(Nt\)\\mathcal\{O\}\(N\_\{t\}\)non\-zero entries overall\. Furthermore, constraints that are global in one domain but local in the other often induce repeated block structures in the Jacobian, since the same constraint operator is applied repeatedly across spatial or temporal points\. For an involved example involving both mass and an initial value constraint, we refer to Appendix[C](https://arxiv.org/html/2607.00095#A3)\.

#### 3\.2\.1Solution Batches

It is also useful to consider the case where a batch ofNsN\_\{s\}samples is generated simultaneously\. In that setting, the Jacobian has dimensionJℋ∈ℝNcNs×NxNtNsJ\_\{\\mathcal\{H\}\}\\in\\mathbb\{R\}^\{N\_\{c\}N\_\{s\}\\times N\_\{x\}N\_\{t\}N\_\{s\}\}\. Since each sample is independent, the resulting Jacobian exhibits a block\-diagonal structure, with each block corresponding to the constraints of an individual sample\.

### 3\.3Sparse Solvers and GPU acceleration

Several optimization frameworks have recently been developed to exploit the structure and sparsity of the objective and constraint functions in large\-scale nonlinear programs \(NLP\)\. Two notable libraries areExaModels\.jl\(Shinet al\.,[2024a](https://arxiv.org/html/2607.00095#bib.bib5)\)andMadNLP\.jl\(Shinet al\.,[2024b](https://arxiv.org/html/2607.00095#bib.bib4)\)\.

ExaModels is an algebraic modeling and automatic differentiation framework designed for high\-performance nonlinear optimization\(Shinet al\.,[2024a](https://arxiv.org/html/2607.00095#bib.bib5)\)\. When using a general\-purpose modeling framework such asJuMP\.jl\(Lubinet al\.,[2023a](https://arxiv.org/html/2607.00095#bib.bib13)\), it has been found that automatic differentiation may constitute a significant fraction of the total solver time\(Shinet al\.,[2024b](https://arxiv.org/html/2607.00095#bib.bib4)\)\. ExaModels is designed to mitigate this issue by generating sparsity\-aware computational graphs and compiling them into SIMD\-parallel kernels throughExaCore\. This allows objectives, constraints, Jacobians and Hessians to be evaluated efficiently in a single sparsity\-preserving pass, while exploiting the parallelism available on modern GPU architectures\.\(Shinet al\.,[2024b](https://arxiv.org/html/2607.00095#bib.bib4)\)\.

MadNLP implements a primal\-dual interior point solver, working together with ExaModels to form and solve sparse KKT systems efficiently\. Since the sparsity pattern of the Jacobian typically remains fixed throughout the optimization process, MadNLP can leverage this to optimize the linear system solves, significantly reducing overhead associated with repeated factorizations of the Jacobian\(Shinet al\.,[2024b](https://arxiv.org/html/2607.00095#bib.bib4)\)\.

Furthermore, MadNLP works across both CPU and GPU\. On GPUs, MadNLP interfaces with NVIDIA’s GPU sparse linear solver, cuDSS\([2025](https://arxiv.org/html/2607.00095#bib.bib7)\)\. cuDSS solves sparse linear systems using variants ofLULU,LDLTLDL^\{T\}or Cholesky factorization, and is specifically designed for exploiting the parallelism of NVIDIA GPU architectures\. Large, sparse KKT systems can thus be solved substantially faster than with traditional CPU\-based solvers\(Pacaud and Shin,[2024](https://arxiv.org/html/2607.00095#bib.bib12)\)\.

## 4Methodology and Setup

We evaluate SNAP\-FM via benchmarking of the PCFM sampling loop across a set of PDE problems chosen to span the constraint regimes of practical interest, and for each problem, compare the runtime and constraint\-satisfaction behavior of numerous optimization backends inside the projection step\. For implementation details and scripts used to reproduce the experiments, we provide the code at[https://github\.com/xenakistheo/PCFM\.jl](https://github.com/xenakistheo/PCFM.jl)\. Additional details on the experimental setup, hardware, and benchmarking procedure are given in Appendix[E](https://arxiv.org/html/2607.00095#A5)\.

### 4\.1Optimization Methods

Our main proposed solver,ExaModels \+ MadNLP on GPUis benchmarked against five baselines that utilize different algebraic modeling layers, NLP solvers, and execution hardware\. We chose this set of methods such that comparisons against the adjacent baselines allow for isolation of contributions of each of the three axes\.

##### ExaModels \+ MadNLP \(GPU\)

As mentioned earlier,ExaModels\.jl\(Shinet al\.,[2024a](https://arxiv.org/html/2607.00095#bib.bib5)\)compiles the projection NLP into SIMD\-parallel GPU kernels viaExaCore, allowing for evaluation of objectives, constraints and ADD in a single sparsity\-preserving pass\. The resulting structured KKT system is then solved entirely on the GPU byMadNLP\.jl\(Shinet al\.,[2024b](https://arxiv.org/html/2607.00095#bib.bib4)\), an interior\-point \(IP\) solver, with the sparse linear systems broken down via cuDSS\. This combination is a promising direction for large\-scale batches where the full Jacobian structure can be exploited for GPU\-accelerated NLP\.

##### ExaModels \+ MadNLP \(CPU\)

This optimization method is of identical modeling and solver to our main method, but instead executed on the CPU\. This baseline isolates the contribution of GPU acceleration\. Namely, any performance gap that is found between this method and our above approach is purely attributed to hardware utilization\.

##### JuMP \+ MadNLP \(CPU\)

We utilize the same NLP solver, but with the model instead expressed inJuMP\.jl\(Lubinet al\.,[2023b](https://arxiv.org/html/2607.00095#bib.bib38)\), a general\-purpose algebraic modeling package\. JuMP differs from ExaModels in that it does not preserve SIMD structure or compile to GPU kernels\. Therefore, this allows for isolation of the contribution of the modeling layer, specifically the value of structure\-preserving compilation\.

##### JuMP \+ Ipopt

Ipopt\(Wächter and Biegler,[2006](https://arxiv.org/html/2607.00095#bib.bib39)\)is one of the most widely deployed NLP solvers in scientific computing\. We include it as a standard benchmark any practitioner would reach for when including nonlinear constraints inPCFM\.jl, and serves as empirical evidence that projection remains the sampling bottleneck in the nonlinear regime\.

##### Optimization\.jl \+ IPNewton

We implemented Optimization\.jl with IPNewton to benchmark with a second interior\-point baseline through theOptimization\.jl\(Dixit and Rackauckas,[2023](https://arxiv.org/html/2607.00095#bib.bib3)\)unified interface\. This ensures that any observed performance improvements are attributed correctly to MadNLP implementations of interior\-point methods, not just the IP method itself\.

##### Optimization\.jl \+ L\-BFGS

We include L\-BFGS as an unconstrained optimization baseline, ensuring the structured\-NLP framing is truly advantageous and necessary\. Unlike the other methods, L\-BFGS does not impose hard equality constraints\. Instead, the constrained projection problem is converted into a different mathematical problem in which the constraint residuals enter through a penalty term\. This distinction is important when interpreting runtime and feasibility results, since a faster L\-BFGS should not be read as a faster solution of the hard\-constrained projection problem\.

### 4\.2Test Problems

We consider generative modeling for a collection of six different PDE benchmark problems with varying physical constraints\. These problems are chosen to evaluate how different constraint sets influence both inference quality and computational cost\. More importantly, we are also interested in how these results vary across the different solvers\. The benchmark problems are as follows

- •One dimensional heat equation with initial value and mass\-conservation constraints\.
- •One dimensional heat equation with initial value, mass\-conservation, and energy evolution constraints\.
- •One dimensional reaction diffusion equation with Neumann boundary conditions and a nonlinear conservation constraint\.
- •One dimensional Burgers’ equation with boundary and mass\-conservation constraints\.
- •One dimensional Burgers’ equation with initial value, mass\-conservation, and local flux constraints\.
- •Two\-dimensional Navier\-Stokes equation in vorticity form with initial\-value, total\-vorticity conservation, and enstrophy constraints\.

The test problems with their constraints are described in detail in Appendix[A](https://arxiv.org/html/2607.00095#A1)\.

For both the heat equation and Burgers’ equation, the paired test problems use identical pretrained FFM models\. In each case, the training dataset and network architecture are unchanged, and only the inference\-time constraints differ\.

For the Burgers’ with initial value constraint, we also experiment with removing the mass\-conservation, and the flux constraints in a scaling study\.

The discretization procedures used to enforce the constraints are described in Appendix[D](https://arxiv.org/html/2607.00095#A4)\.

## 5Experiments and Results

We evaluate the proposed methods on the benchmark problems outlined in Section[4\.2](https://arxiv.org/html/2607.00095#S4.SS2)\. These problems span constraint regimes of increasing difficulty, including linear and nonlinear one\-dimensional constraints, as well as a nonlinear two\-dimensional problem\. For each benchmark, we compare all optimization backends from Section[4\.1](https://arxiv.org/html/2607.00095#S4.SS1)in terms of runtime, and constraint violation\. Beyond the main benchmark results, we include two scaling studies to better characterize the computational behavior of the projection step\. First, we consider the heat equation with fixed initial condition and mass\-conservation constraints, and vary the number of generated samples to study scaling with batch size\. Second, we consider the Burgers equation under increasingly rich constraint sets, allowing us to assess how solver performance changes as additional nonlinear constraints are imposed\.

Unless otherwise stated, all reported runtimes are averaged over four repetitions with3232generated samples — an exception being the two\-dimensional Navier\-Stokes problem for which we generated22samples due to the substantially larger state dimension\. Wall\-clock timings are reported usingBenchmarkTools\.jl’s@btime±\\pma standard deviation\.DNFdenotes runs that did not converge within the allotted runtime budget of six hours\.

### 5\.1Runtime

Tables[2](https://arxiv.org/html/2607.00095#S5.T2),[3](https://arxiv.org/html/2607.00095#S5.T3),[4](https://arxiv.org/html/2607.00095#S5.T4),[5](https://arxiv.org/html/2607.00095#S5.T5),[6](https://arxiv.org/html/2607.00095#S5.T6)&[7](https://arxiv.org/html/2607.00095#S5.T7)present the central results of this paper: end\-to\-end sampling time for a full pass of PCFM under each method and constraint regime, broken down into the model\-evaluation and projection components for specification\. The fastest solver in each constraint regime is highlighted in bold\.

Table 2:Heat Equation \- 1Table 3:Heat Equation \- 2Table 4:Burgers’ \(BC\)Table 5:Burgers’ \(IC, Mass, Flux\)Table 6:Reaction\-Diffusion EquationTable 7:Navier\-Stokes’ EquationA few observations worth highlighting\. Notably, the ExaModels \+ MadNLP combination beat the other methods in all six test problems\. Moreover, the gap between the GPU and CPU implementation isolates the contribution of the hardware\. This difference between the two was especially pronounced in the case of the heat equation with an energy\-evolution constraint, as well as the Burgers’ PDE with Godunov’s flux constraints\. We hypothesize that the nonlinearity of these two sets of constraints made it significantly harder to solve as an optimization problem\. Hence, the difference in performance became more pronounced\. Secondly, the difference in the results of JuMP \+ MadNLP and ExaModels \+ MadNLP both run on CPU isolates the modeling layer, which is our main focus here\. What we observe is that there is a consistent speedup from the exploitation of the sparsity structure of the Jacobian matrices at compile time\. Even for the simple constraints in the heat equation,[2](https://arxiv.org/html/2607.00095#S5.T2), the difference is significant\.

We also note that missing entries correspond to runs that did not complete within the runtime budget\. These failures are themselves informative: all methods except for the ExaModels \+ MadNLP failed to converge on the Burgers’ Equation constraint that included IC and flux constraints within a reasonable amount of time\. IPNewton also failed on Burgers’ with boundary constraints, as well as on the Navier\-Stokes’ equation\.

### 5\.2Constraint Violation

Beyond analyzing the runtimes, we want to examine constraint violations along the sampling trajectory\.

Constraint satisfaction is quantified using a metric that measures the average magnitude of the constraint violations across all samples in a batch\. Violations are normalized within each constraint type before being aggregated, ensuring that constraints with different scales or numbers of instances contribute comparably\. The resulting scalar score provides a fair basis for comparing optimization backends on a given set of constraints\. We refer to this metric asInfeasibility\.

Figure[1](https://arxiv.org/html/2607.00095#S5.F1)follows the mass\-conservation residual along the sampling trajectory for our Burgers’ equation constraint under each method\. The structure\-exploiting methods, namelyExaModels \+ MadNLPon GPU and CPU as well asJuMP \+ MadNLPproduce residuals that oscillate around zero with a magnitude∼10−3\\sim 10^\{\-3\}\. To contrast, the unstructured methods produce residuals that appear biased\. IPNewton undershoots the constraint in the early steps and overshoots in the latter, while L\-BFGS decays negatively throughout\. While the overall magnitudes are relatively similar, the qualitative behavior of our methods differs\.

![Refer to caption](https://arxiv.org/html/2607.00095v1/images/mass_violation_burgers_grid.png)Figure 1:Mass\-conservation residual along the PCFM sampling trajectory for Burgers’ equation\. Structure\-exploiting methods \(top row\) produce residuals that oscillate around zero, consistent with constraint satisfaction modulo numerical noise\. Unstructured methods \(bottom row\) produce systematically biased residuals:IPNewtonundershoots then overshoots,L\-BFGSdrifts monotonically\.The infeasibility values in our results also highlight an important nuance\. The constraints are solved up to the numerical tolerances of the nonlinear optimizer\. Thus, the reported infeasibility should be interpreted as the residual constraint violation remaining after convergence, rather than as evidence that the constraints are ignored\. In all experiments, we used the solver’s default feasibility tolerance of10−410^\{\-4\}\. Tighter or problem\-specific tolerances may further reduce constraint violation, but could also increase runtime or affect solver robustness\. Investigating how to tune these tolerances, and how this trade\-off impacts both feasibility and sampling cost, is an important direction for future work\.

#### 5\.2\.1Scaling Study \- Number of Samples

Figure[2](https://arxiv.org/html/2607.00095#S5.F2)shows the runtime of each optimization backend as the number of generated samples is increased\. The runtime grows approximately linearly with the number of samples for all methods\. This is consistent with the block\-diagonal structure of the batched projection problem: increasing the batch size adds independent constraint blocks, but does not introduce additional cross\-sample coupling\. The relative ordering of the methods is also largely preserved across batch sizes\.

![Refer to caption](https://arxiv.org/html/2607.00095v1/images/nsample_scale.png)Figure 2:Runtime scaling with the number of generated samples for the first heat\-equation constraint regime\. Time scales linearly with number of samples, and ordering is preserved\.
#### 5\.2\.2Scaling Study \- Number of Constraints

Table[8](https://arxiv.org/html/2607.00095#S5.T8)shows the runtime and infeasibility values as additional constraints are imposed on the Burgers equation\. Even for the smaller nonlinear constraint sets, MadNLP\-based methods are substantially faster than the alternative baselines\. As local flux constraints are added, the projection problem becomes significantly more difficult\. In this regime, GPU acceleration becomes increasingly important\. The ExaModels \+ MadNLP GPU backend is the only method that remains tractable for the largest tested constraint set\.

The constraint sets increase in complexity\.IC\-onlyfixes the initial condition, whileIC, Massadditionally enforces conservation of the spatial integral over time\. The Flux\(kk\) constraints addkklocal Godunov update residuals – see Appendix[D](https://arxiv.org/html/2607.00095#A4)\.

Table 8:Runtime scaling with the number of imposed constraints for the Burgers equation\. DNF denotes runs that did not finish within the runtime budget\.

## 6Conclusion

We present an acceleration of the projection subproblem in Physics\-Constrained Flow Matching by exploiting the sparsity structure inherent in the constraints\. This work is motivated by the observation that the computational complexity of the projection step scales poorly during the PCFM sampling process as the number and complexity of constraints increase\. By analyzing the Jacobian sparsity pattern, we reformulate the problem as a collection of independent per\-sample subproblems while additionally leveraging the fixed sparsity structure arising from the discretized conservation constraints\. To exploit this structure efficiently on the GPU, we employedExaModels\.jlas the modeling layer together withMadNLP\.jlas the nonlinear programming solver\. Across our benchmarks, this approach yields a significant speedup for computationally demanding nonlinear constraints, while not compromising on correctness\.

## Acknowledgments

We thank the reviewers for their thoughtful comments and constructive suggestions, which helped improve the clarity and quality of this paper\.

This material is based upon work supported by the U\.S\. National Science Foundation under award Nos CNS\-2346520, RISE\-2425761, and DMS\-2325184, by the Defense Advanced Research Projects Agency \(DARPA\) under Agreement No\. HR00112490488, by the Department of Energy, National Nuclear Security Administration under Award Number DE\-NA0004266 and by the United States Air Force Research Laboratory under Cooperative Agreement Number FA8750\-19\-2\-1000\. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights\. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof\. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof\.

## Impact Statement

This paper presents work whose goal is to make physically consistent generative simulation more practical for scientific applications\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.

## References

- G\. Baldan, Q\. Liu, A\. Guardone, and N\. Thuerey \(2025\)Flow matching meets pdes: a unified framework for physics\-constrained generation\.arXiv preprint arXiv:2506\.08604\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p2.1)\.
- H\. Ben\-Hamu, O\. Puny, I\. Gat, B\. Karrer, U\. Singer, and Y\. Lipman \(2024\)D\-flow: differentiating through flows for controlled generation\.External Links:2402\.14017,[Link](https://arxiv.org/abs/2402.14017)Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p3.1)\.
- J\. Bradbury, R\. Frostig, P\. Hawkins, M\. J\. Johnson, C\. Leary, D\. Maclaurin, G\. Necula, A\. Paszke, J\. VanderPlas, S\. Wanderman\-Milne,et al\.\(2018\)JAX: composable transformations of python\+ numpy programs\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1)\.
- P\. Cai, U\. Utkarsh, N\. Segal, A\. Subramanian, K\. Jager, E\. Pan, A\. Edelman, C\. V\. Rackauckas, and R\. Gomez\-Bombarelli \(2026\)Enforcing constraints in molecular and crystalline generative models via physics\-constrained flow matching\.InAI for Accelerated Materials Design \- ICLR 2026,External Links:[Link](https://openreview.net/forum?id=QVQjDmIFg0)Cited by:[§2](https://arxiv.org/html/2607.00095#S2.p1.1)\.
- C\. Cheng, B\. Han, D\. C\. Maddix, A\. F\. Ansari, A\. Stuart, M\. W\. Mahoney, and Y\. Wang \(2025\)Gradient\-free generation for hard\-constrained systems\.External Links:2412\.01786,[Link](https://arxiv.org/abs/2412.01786)Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p3.1),[§1](https://arxiv.org/html/2607.00095#S1.p4.1)\.
- J\. K\. Christopher, S\. Baek, and F\. Fioretto \(2024\)Constrained synthesis with projected diffusion models\.Advances in Neural Information Processing Systems37,pp\. 89307–89333\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p3.1),[§1](https://arxiv.org/html/2607.00095#S1.p4.1)\.
- V\. Dixit and C\. Rackauckas \(2023\)Optimization\.jl: a unified interface for mathematical optimization in julia\.InJuliaCon Proceedings,Cited by:[§4\.1](https://arxiv.org/html/2607.00095#S4.SS1.SSS0.Px5.p1.1)\.
- S\. Greydanus, M\. Dzamba, and J\. Yosinski \(2019\)Hamiltonian neural networks\.Advances in neural information processing systems32\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p2.1)\.
- J\. Huang, G\. Yang, Z\. Wang, and J\. J\. Park \(2024\)DiffusionPDE: generative pde\-solving under partial observation\.Advances in Neural Information Processing Systems37,pp\. 130291–130323\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1),[§1](https://arxiv.org/html/2607.00095#S1.p2.1)\.
- R\. J\. LeVeque \(1992\)Numerical methods for conservation laws\.Lectures in Mathematics ETH Zürich,Birkhäuser,Basel\.Cited by:[§3\.1](https://arxiv.org/html/2607.00095#S3.SS1.p1.1)\.
- R\. J\. LeVeque \(2004\)Finite\-volume methods for hyperbolic problems\.Cambridge Texts in Applied Mathematics,Cambridge University Press\.Cited by:[§A\.2](https://arxiv.org/html/2607.00095#A1.SS2.p2.1),[§3\.1](https://arxiv.org/html/2607.00095#S3.SS1.p1.1)\.
- Z\. Li, N\. Kovachki, K\. Azizzadenesheli, B\. Liu, K\. Bhattacharya, A\. Stuart, and A\. Anandkumar \(2021\)Fourier neural operator for parametric partial differential equations\.External Links:2010\.08895,[Link](https://arxiv.org/abs/2010.08895)Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1)\.
- H\. Lu and J\. Yang \(2025\)CuPDLP\. jl: a gpu implementation of restarted primal\-dual hybrid gradient for linear programming in julia\.Operations Research73\(6\),pp\. 3440–3452\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1)\.
- M\. Lubin, O\. Dowson, J\. Dias Garcia, J\. Huchette, B\. Legat, and J\. P\. Vielma \(2023a\)JuMP 1\.0: Recent improvements to a modeling language for mathematical optimization\.Mathematical Programming Computation15,pp\. 581–589\.External Links:[Document](https://dx.doi.org/10.1007/s12532-023-00239-3)Cited by:[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p2.1)\.
- M\. Lubin, O\. Dowson, J\. Dias Garcia, J\. Huchette, B\. Legat, and J\. P\. Vielma \(2023b\)JuMP 1\.0: Recent improvements to a modeling language for mathematical optimization\.Mathematical Programming Computation15,pp\. 581–589\.External Links:[Document](https://dx.doi.org/10.1007/s12532-023-00239-3)Cited by:[§4\.1](https://arxiv.org/html/2607.00095#S4.SS1.SSS0.Px3.p1.1)\.
- J\. Nocedal and S\. Wright \(2006\)Numerical optimization\.Springer\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1),[§3\.2](https://arxiv.org/html/2607.00095#S3.SS2.p1.1),[§3\.2](https://arxiv.org/html/2607.00095#S3.SS2.p2.3)\.
- NVIDIA Corporation \(2025\)NVIDIA cuDSS: a high\-performance CUDA library for direct sparse solvers\.Note:Accessed: 2026\-05\-07External Links:[Link](https://docs.nvidia.com/cuda/cudss/)Cited by:[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p4.2)\.
- F\. Pacaud and S\. Shin \(2024\)GPU\-accelerated dynamic nonlinear optimization with ExaModels and MadNLP\.In2024 IEEE 63rd Conference on Decision and Control \(CDC\),pp\. 5963–5968\.External Links:[Document](https://dx.doi.org/10.1109/CDC56724.2024.10886720)Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1),[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p4.2)\.
- A\. Paszke, S\. Gross, F\. Massa, A\. Lerer, J\. Bradbury, G\. Chanan, T\. Killeen, Z\. Lin, N\. Gimelshein, L\. Antiga,et al\.\(2019\)Pytorch: an imperative style, high\-performance deep learning library\.Advances in neural information processing systems32\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1)\.
- I\. Price, A\. Sanchez\-Gonzalez, F\. Alet, T\. R\. Andersson, A\. El\-Kadi, D\. Masters, T\. Ewalds, J\. Stott, S\. Mohamed, P\. Battaglia,et al\.\(2023\)Gencast: diffusion\-based ensemble forecasting for medium\-range weather\.arXiv preprint arXiv:2312\.15796\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1)\.
- A\. Quarteroni, R\. Sacco, and F\. Saleri \(2007\)Numerical mathematics\.Springer\.Cited by:[§3\.1](https://arxiv.org/html/2607.00095#S3.SS1.p2.4)\.
- M\. Raissi, P\. Perdikaris, and G\.E\. Karniadakis \(2019\)Physics\-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations\.Journal of Computational Physics378,pp\. 686–707\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jcp.2018.10.045),[Link](https://www.sciencedirect.com/science/article/pii/S0021999118307125)Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1)\.
- S\. C\. Rennich, D\. Stosic, and T\. A\. Davis \(2014\)Accelerating sparse cholesky factorization on gpus\.In2014 4th Workshop on Irregular Applications: Architectures and Algorithms \(IAˆ 3\),pp\. 9–16\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p5.1)\.
- J\. Richter\-Powell, Y\. Lipman, and R\. T\. Chen \(2022\)Neural conservation laws: a divergence\-free perspective\.Advances in Neural Information Processing Systems35,pp\. 38075–38088\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p2.1)\.
- R\. Römer, A\. von Rohr, and A\. P\. Schoellig \(2024\)Diffusion predictive control with constraints\.arXiv preprint arXiv:2412\.09342\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p3.1)\.
- S\. Shin, M\. Anitescu, and F\. Pacaud \(2024a\)Accelerating optimal power flow with GPUs\.Note:ExaModels\.jl \+ MadNLP\.jl v0\.8 release, Julia Discourse, March 2024Cited by:[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p1.1),[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p2.1),[§4\.1](https://arxiv.org/html/2607.00095#S4.SS1.SSS0.Px1.p1.1)\.
- S\. Shin, F\. Pacaud, and M\. Anitescu \(2024b\)Accelerating optimal power flow with GPUs: SIMD abstraction of nonlinear programs and condensed\-space interior\-point methods\.Electric Power Systems Research236\.Cited by:[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p1.1),[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p2.1),[§3\.3](https://arxiv.org/html/2607.00095#S3.SS3.p3.1),[§4\.1](https://arxiv.org/html/2607.00095#S4.SS1.SSS0.Px1.p1.1)\.
- U\. Utkarsh, P\. Cai, A\. Edelman, R\. Gomez\-Bombarelli, and C\. Rackauckas \(2025a\)Physics\-constrained flow matching: sampling generative models with hard constraints\.NeurIPS\.External Links:2506\.04171Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1),[§1](https://arxiv.org/html/2607.00095#S1.p3.1),[§1](https://arxiv.org/html/2607.00095#S1.p4.1),[§1](https://arxiv.org/html/2607.00095#S1.p7.1),[§2\.1](https://arxiv.org/html/2607.00095#S2.SS1.p2.11),[§2\.1](https://arxiv.org/html/2607.00095#S2.SS1.p3.1),[§2](https://arxiv.org/html/2607.00095#S2.p1.1),[§3](https://arxiv.org/html/2607.00095#S3.p2.6)\.
- U\. Utkarsh, D\. C\. Maddix, R\. Ma, M\. W\. Mahoney, and Y\. Wang \(2025b\)End\-to\-end probabilistic framework for learning with hard constraints\.arXiv preprint arXiv:2506\.07003\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p3.1)\.
- A\. Wächter and L\. T\. Biegler \(2006\)On the implementation of an interior\-point filter line\-search algorithm for large\-scale nonlinear programming\.Mathematical Programming106\(1\),pp\. 25–57\.External Links:[Document](https://dx.doi.org/10.1007/s10107-004-0559-y)Cited by:[§4\.1](https://arxiv.org/html/2607.00095#S4.SS1.SSS0.Px4.p1.1)\.
- Y\. Yuan, J\. Song, U\. Iqbal, A\. Vahdat, and J\. Kautz \(2023\)Physdiff: physics\-guided human motion diffusion model\.InProceedings of the IEEE/CVF international conference on computer vision,pp\. 16010–16021\.Cited by:[§1](https://arxiv.org/html/2607.00095#S1.p1.1),[§1](https://arxiv.org/html/2607.00095#S1.p3.1)\.

## Appendix AModel Problems

Many of the constraints considered in this work are formulated in terms of conserved integral quantities\. We therefore define the mass and energy of a solutionuuat timettby

m\(t\)\\displaystyle m\(t\):=∫Ωu\(x,t\)𝑑x\\displaystyle:=\\int\_\{\\Omega\}u\(x,t\)\\,dx\(6\)E\(t\)\\displaystyle E\(t\):=12∫Ωu2\(x,t\)𝑑x\\displaystyle:=\\frac\{1\}\{2\}\\int\_\{\\Omega\}u^\{2\}\(x,t\)\\,dx\(7\)
### A\.1Heat Equation Problems

We first consider the one\-dimensional heat equation

ut=αuxx,Ω=\[0,2π\],t∈\[0,1\]\\displaystyle u\_\{t\}=\\alpha u\_\{xx\},\\quad\\Omega=\[0,2\\pi\],\\quad t\\in\[0,1\]\(8\)with periodic boundary conditionsu\(0,t\)=u\(2π,t\)u\(0,t\)=u\(2\\pi,t\), and initial values given by

u\(x,0\)=uICH\(x\):=sin⁡\(x\+ϕ\)\\displaystyle u\(x,0\)=u\_\{IC\}^\{\\text\{H\}\}\(x\):=\\sin\(x\+\\phi\)The spatiotemporal gridsizes areNx=Nt=100N\_\{x\}=N\_\{t\}=100\.

For this specific problem, there is an analytical solution given by

uanalytic\(x,t\)=exp⁡\(−αt\)sin⁡\(x\+ϕ\)\\displaystyle u\_\{\\text\{analytic\}\}\(x,t\)=\\exp\(\-\\alpha t\)\\sin\(x\+\\phi\)\(9\)
The FFM model was trained using a dataset constructed by computing the analytical solution for different diffusion coefficients and phases\. Namely, the parameters were sampled from the following distributions\.

α∼𝒰\(1,5\),ϕ∼𝒰\(0,π\)\\displaystyle\\alpha\\sim\\mathcal\{U\}\(1,5\),\\quad\\phi\\sim\\mathcal\{U\}\(0,\\pi\)The training dataset contained10,00010,000such solutions\.

During inference, we evaluate on two different constraint regimes\.

In the first regime, we fixϕ=π/4\\phi=\\pi/4to keep the initial condition constant, and we enforce the initial value and mass conservation constraints\.

ℋH,1\(u\)=\[u\(x,0\)−uICH\(x\)m\(t\)−m\(0\)\]\\displaystyle\\mathcal\{H\}^\{\\text\{H,1\}\}\(u\)=\\begin\{bmatrix\}u\(x,0\)\-u\_\{IC\}^\{\\text\{H\}\}\(x\)\\\\ m\(t\)\-m\(0\)\\\\ \\end\{bmatrix\}\(10\)
In the second set of constraints, we additionally impose an energy\-evolution constraint together with a local PDE constraint\. The resulting constraint operator is given by

ℋH,2\(u\)=\[u\(x,0\)−uICH\(x\)m\(t\)−m\(0\)ut−αuxxddt12∫Ω\(u\(x,t\)\)2𝑑x\+α∫Ω\(ux\)2𝑑x\]\\displaystyle\\mathcal\{H\}^\{\\text\{H,2\}\}\(u\)=\\begin\{bmatrix\}u\(x,0\)\-u\_\{IC\}^\{\\text\{H\}\}\(x\)\\\\ m\(t\)\-m\(0\)\\\\ u\_\{t\}\-\\alpha u\_\{xx\}\\\\ \\frac\{d\}\{dt\}\\frac\{1\}\{2\}\\int\_\{\\Omega\}\(u\(x,t\)\)^\{2\}\\;dx\+\\alpha\\int\_\{\\Omega\}\(u\_\{x\}\)^\{2\}\\;dx\\end\{bmatrix\}\(11\)
A derivation and motivation for the energy\-evolution constraint are provided in Appendix[D](https://arxiv.org/html/2607.00095#A4)\.

### A\.2Burgers’

Secondly, consider the one\-dimensional inviscid Burgers’ equation

ut\+12\(u2\)x=0,Ω=\[0,1\],t∈\[0,1\]\\displaystyle u\_\{t\}\+\\frac\{1\}\{2\}\(u^\{2\}\)\_\{x\}=0,\\quad\\Omega=\[0,1\],\\quad t\\in\[0,1\]\(12\)
The training dataset was constructed by generating solutions with the Godunov finite volume method\(LeVeque,[2004](https://arxiv.org/html/2607.00095#bib.bib11)\), on aNx=Nt=101N\_\{x\}=N\_\{t\}=101grid\. For the training set, the following three conditions were imposed

1. 1\.Initial value: u\(x,0\)=uICB\(x;ploc\):=11\+exp⁡\(x−plocϵ\)u\(x,0\)=u\_\{\\text\{IC\}\}^\{\\text\{B\}\}\(x;p\_\{\\text\{loc\}\}\):=\\frac\{1\}\{1\+\\exp\(\\frac\{x\-p\_\{\\text\{loc\}\}\}\{\\epsilon\}\)\}withϵ=0\.02\\epsilon=0\.02fixed, and the location sampled randomly fromploc∼𝒰\(0\.2,0\.8\)p\_\{\\text\{loc\}\}\\sim\\mathcal\{U\}\(0\.2,0\.8\)\.
2. 2\.Dirichlet condition on the left boundary u\(0,t\)=ubcwithubc∼𝒰\(0,1\)u\(0,t\)=u\_\{\\text\{bc\}\}\\quad\\text\{with \}u\_\{\\text\{bc\}\}\\sim\\mathcal\{U\}\(0,1\)
3. 3\.Von Neumann condition on the right ddxu\(1,t\)=0\\frac\{d\}\{dx\}u\(1,t\)=0

8080different values ofplocp\_\{\\text\{loc\}\}andubcu\_\{\\text\{bc\}\}were sampled each, generating a total of64006400solutions for the training dataset\.

The first regime of constraints considered for the Burgers’ equation imposes a Dirichlet condition on the left boundary, and a Neumann condition on the right boundary\. The Dirichlet value is not fixed, but varies across samples \- as in the training data\. There is also a mass conservation constraint\. Together the constraints are given by

ℋB,BC\(u\)=\[u\(0,t\)−uLm\(t\)−m0∂nu\(1,t\)\]\\displaystyle\\mathcal\{H\}^\{\\text\{B,BC\}\}\(u\)=\\begin\{bmatrix\}u\(0,t\)\-u\_\{L\}\\\\ m\(t\)\-m\_\{0\}\\\\ \\partial\_\{n\}u\(1,t\)\\end\{bmatrix\}\(13\)
In the second regime, we impose an initial value constraint, withplocp\_\{\\text\{loc\}\}sampled as in the training set\. In addition to the typical mass constraint we also impose a sequence of local conservation updates based on Godunov’s flux method\. The constraint function then becomes

ℋB,IC\(u\)=\[u\(x,0\)−uICB\(x;ploc\)\)m\(t\)−m\(0\)RFlux\(k\)\(u\)\(k=1,\.\.,5\)\]\\displaystyle\\mathcal\{H\}^\{\\text\{B,IC\}\}\(u\)=\\begin\{bmatrix\}u\(x,0\)\-u\_\{\\text\{IC\}\}^\{\\text\{B\}\}\(x;p\_\{\\text\{loc\}\}\)\)\\\\ m\(t\)\-m\(0\)\\\\ R^\{\(k\)\}\_\{\\text\{Flux\}\}\(u\)\\;\(k=1,\.\.,5\)\\end\{bmatrix\}\(14\)A derivation and description of the flux constraint based on the Godunov method are provided in Appendix[D](https://arxiv.org/html/2607.00095#A4)\.

### A\.3Reaction\-Diffusion Equation

Next, we consider a nonlinear reaction\-diffusion equation given by

ut=ρu\(1−u\)−νuxx\\displaystyle u\_\{t\}=\\rho u\(1\-u\)\-\\nu u\_\{xx\}\(15\)Ω=\[0,1\],t∈\[0,1\]\\displaystyle\\Omega=\[0,1\],\\quad t\\in\[0,1\]\(16\)with parameters\(ρ,ν\)=\(0\.01,0\.005\)\(\\rho,\\nu\)=\(0\.01,0\.005\)\. The equation is discretized on a grid with spatial and temporal resolutionsNx=128N\_\{x\}=128andNt=100N\_\{t\}=100, respectively\.

The initial conditions are sampled from randomized combinations of sinusoidal and localized bump functions\. The training dataset is generated by pairing8080distinct initial conditions with8080boundary conditions, resulting in a total of64006400PDE solutions\. The solutions used for training are computed using a semi\-implicit finite difference scheme\.

ℋRD\(u\)=\[u\(x,0\)−uIC\(x\)m\(t\)−\(m0\+∫0tρu\(1−u\)𝑑τ\+∫0t\(gL\(τ\)−g\(τ\)\)dτ\)\]\\displaystyle\\mathcal\{H\}^\{\\text\{RD\}\}\(u\)=\\begin\{bmatrix\}u\(x,0\)\-u\_\{\\text\{IC\}\}\(x\)\\\\\[3\.00003pt\] \\begin\{aligned\} m\(t\)\-\\Bigl\(m\_\{0\}&\+\\int\_\{0\}^\{t\}\\rho u\(1\-u\)\\,d\\tau\\\\ &\+\\int\_\{0\}^\{t\}\(g\_\{L\}\(\\tau\)\-g\(\\tau\)\)\\,d\\tau\\Bigr\)\\end\{aligned\}\\end\{bmatrix\}\(17\)

### A\.4Navier\-Stokes Equation

Lastly we consider the Navier\-Stokes equation in two dimensions, given in its vorticity form with periodic boundary conditions as

wt\+u⋅∇w=ν∇2w\+f\(x\)\\displaystyle w\_\{t\}\+u\\cdot\\nabla w=\\nu\\nabla^\{2\}w\+f\(x\)\(18\)Ω=\[0,1\]2,t∈\[0,49\]\\displaystyle\\Omega=\[0,1\]^\{2\},\\quad t\\in\[0,49\]\(19\)wherew:=∇×uw:=\\nabla\\times uis the vorticity, and we set the viscosityν=10−3\\nu=10^\{\-3\}\. We let the forcing functionf\(x\)=0\.12sin⁡\(2π\(x\+y\)\+ϕ\)f\(x\)=0\.1\\sqrt\{2\}\\sin\(2\\pi\(x\+y\)\+\\phi\)\. The spatiotemporal resolutions are as followsNx=Ny=16N\_\{x\}=N\_\{y\}=16,Nt=50N\_\{t\}=50\. The training dataset was constructed by solving the governing problem with a Crank–Nicolson spectral solver\. Specifically,10,00010,000simulations were generated from100100randomly sampled initial vorticitiesw0w\_\{0\}, drawn from a Gaussian random field, and100100forcing phasesϕ∼𝒰\(0,π/2\)\\phi\\sim\\mathcal\{U\}\(0,\\pi/2\)\. With periodic boundary conditions the vorticity\-mass is conserved\. We further impose an initial value constraint, and an energy conservation constraint on the vorticity\. In total, these are the constraints:

ℋNS\(w\)=\[w\(x,y,0\)−wIC\(x,y\)∫Ωw\(x,y,t\)𝑑x𝑑y−∫ΩwIC\(x,y\)𝑑x𝑑y∫Ωw\(x,y,t\)2𝑑x𝑑y−∫ΩwIC\(x,y\)2𝑑x𝑑y\]\\displaystyle\\mathcal\{H\}^\{\\text\{NS\}\}\(w\)=\\begin\{bmatrix\}w\(x,y,0\)\-w\_\{\\text\{IC\}\}\(x,y\)\\\\\[3\.99994pt\] \\displaystyle\\int\_\{\\Omega\}w\(x,y,t\)\\,dx\\,dy\-\\int\_\{\\Omega\}w\_\{\\text\{IC\}\}\(x,y\)\\,dx\\,dy\\\\\[3\.99994pt\] \\displaystyle\\int\_\{\\Omega\}w\(x,y,t\)^\{2\}\\,dx\\,dy\-\\int\_\{\\Omega\}w\_\{\\text\{IC\}\}\(x,y\)^\{2\}\\,dx\\,dy\\end\{bmatrix\}\(20\)

## Appendix BNewton\-KKT system

Consider the equality constrained optimisation problem

minx∈ℝn⁡f\(x\)subject tog\(x\)=0\\displaystyle\\min\_\{x\\in\\mathbb\{R\}^\{n\}\}f\(x\)\\quad\\text\{subject to \}g\(x\)=0\(21\)
for some objective functionf:ℝn→ℝf:\\mathbb\{R\}^\{n\}\\to\\mathbb\{R\}, and constraint functiong:ℝn→ℝmg:\\mathbb\{R\}^\{n\}\\to\\mathbb\{R\}^\{m\}

Our goal is to show that approximating this solution can be reduced to repeatedly solving linear systems of the formAu=bAu=b, whereAAhas a characteristic sparse structure that is highly dependent on the Jacobian ofgg\.

The key idea to solving \([21](https://arxiv.org/html/2607.00095#A2.E21)\) is that at the optimum, we cannot improve the objective while remaining in the feasible set\. Mathematically, this means that if we suppose thatx∗x^\{\*\}solves

Supposex∗x^\{\*\}solves \([21](https://arxiv.org/html/2607.00095#A2.E21)\)\. Consider a small perturbationΔx\\Delta x\. To remain within the feasible set we must require

g\(x∗\+Δx\)=0g\(x^\{\*\}\+\\Delta x\)=0Linearization gives

g\(x∗\)\+J\(x∗\)Δx=0g\(x^\{\*\}\)\+J\(x^\{\*\}\)\\Delta x=0where we have introduced the Jacobian ofgg,J\(u\):=∇xg\(u\)∈ℝm×nJ\(u\):=\\nabla\_\{x\}g\(u\)\\in\\mathbb\{R\}^\{m\\times n\}\. Asx∗x^\{\*\}by assumption is in the feasible set, we have thatJ\(x∗\)Δx=0J\(x^\{\*\}\)\\Delta x=0\. In other words, all feasible perturbations must lie in the nullspace of the Jacobian\.

Moreover, asx∗x^\{\*\}is an optimum, one cannot decrease the objective function in any feasible direction\. This implies that

∇f\(x∗\)TΔx=0,∀ΔxsatisfyingJ\(x∗\)Δx=0\\nabla f\(x^\{\*\}\)^\{T\}\\Delta x=0,\\quad\\forall\\Delta x\\text\{ satisfying \}J\(x^\{\*\}\)\\Delta x=0
A basic linear algebra result is that a vectorvvwhich is orthogonal to the nullspace of a matrixQQ, is necessarily in the rowspace ofQQ\. This implies that we can write

∇f\(x∗\)=−J\(x∗\)Tλ\\nabla f\(x^\{\*\}\)=\-J\(x^\{\*\}\)^\{T\}\\lambdafor someλ∈ℝm\\lambda\\in\\mathbb\{R\}^\{m\}\.

These two conditions together form the Karush\-Kuhn\-Tucker \(KKT\) conditions that we require the solution to satisfy

∇f\(x\)\+\(J\(x\)\)Tλ\\displaystyle\\nabla f\(x\)\+\(J\(x\)\)^\{T\}\\lambda=0\\displaystyle=0g\(x\)\\displaystyle g\(x\)=0\\displaystyle=0By introducing the Lagrangianℒ\(x,λ\)=f\(x\)\+\(J\(x\)\)Tλ\\mathcal\{L\}\(x,\\lambda\)=f\(x\)\+\(J\(x\)\)^\{T\}\\lambdawe can also reformulate the problem of findingx∗x^\{\*\}so thatF\(x,λ\)=0F\(x,\\lambda\)=0, where we define

F:ℝn\+m→ℝn\+m,F\(x,λ\):=\[∇xℒ\(x,λ\)g\(x\)\]\\displaystyle F:\\mathbb\{R\}^\{n\+m\}\\to\\mathbb\{R\}^\{n\+m\},\\quad F\(x,\\lambda\):=\\begin\{bmatrix\}\\nabla\_\{x\}\\mathcal\{L\}\(x,\\lambda\)\\\\ g\(x\)\\end\{bmatrix\}\(22\)
One can recall that a single iteration of the multivariate Newton\-Raphson method for solving the problemG\(u\):ℝp→ℝqG\(u\):\\mathbb\{R\}^\{p\}\\to\\mathbb\{R\}^\{q\}takes the form

u\(k\+1\)=u\(k\)−\(∇G\(u\(k\)\)\)−1G\(u\(k\)\)\\displaystyle u^\{\(k\+1\)\}=u^\{\(k\)\}\-\(\\nabla G\(u^\{\(k\)\}\)\)^\{\-1\}G\(u^\{\(k\)\}\)which can be simplified to solving the linear equation

∇G\(u\(k\)\)Δk=−G\(u\(k\)\),Δk:=u\(k\+1\)−u\(k\)\\displaystyle\\nabla G\(u^\{\(k\)\}\)\\Delta\_\{k\}=\-G\(u^\{\(k\)\}\),\\quad\\Delta\_\{k\}:=u^\{\(k\+1\)\}\-u^\{\(k\)\}\(23\)
Hence, to solveF\(x,λ\)=0F\(x,\\lambda\)=0using the Newton\-Raphson method, we need to compute∇\(x,λ\)F\(x,λ\)\\nabla\_\{\(x,\\lambda\)\}F\(x,\\lambda\)\.

A straightforward computation yields the linear system

\[WJTJ0\]\[ΔxnΔλn\]=−\[∇f\(xn\)\+\(J\(xn\)\)Tλng\(xn\)\],W:=∇xx2ℒ\(xn,λn\)\\displaystyle\\begin\{bmatrix\}W&J^\{T\}\\\\ J&0\\end\{bmatrix\}\\begin\{bmatrix\}\\Delta x\_\{n\}\\\\ \\Delta\\lambda\_\{n\}\\end\{bmatrix\}=\-\\begin\{bmatrix\}\\nabla f\(x\_\{n\}\)\+\(J\(x\_\{n\}\)\)^\{T\}\\lambda\_\{n\}\\\\ g\(x\_\{n\}\)\\end\{bmatrix\},\\quad W:=\\nabla\_\{xx\}^\{2\}\\mathcal\{L\}\(x\_\{n\},\\lambda\_\{n\}\)\(24\)
The matrix on the left\-hand side of \([24](https://arxiv.org/html/2607.00095#A2.E24)\) is called theNewton\-KKT system\. Its sparse block structure plays a central role in large\-scale constrained optimisation algorithms\.

## Appendix CJacobian Structure Example

This appendix presents an example illustrating the sparsity structure in the Jacobian arising from a common set of PDE constraints\.

Consider a PDE subject to the following constraints\.

ℋ\(u\)=\[u\(x,0\)−f\(x\)∀x∈Ω∫Ωu\(x,t\)𝑑x−∫Ωu\(x,0\)𝑑x∀t∈\[0,T\]\]=𝟎\\displaystyle\\mathcal\{H\}\(u\)=\\begin\{bmatrix\}u\(x,0\)\-f\(x\)\\quad\\forall x\\in\\Omega\\\\ \\int\_\{\\Omega\}u\(x,t\)\\;dx\-\\int\_\{\\Omega\}u\(x,0\)\\;dx\\quad\\forall t\\in\[0,T\]\\\\ \\end\{bmatrix\}=\\mathbf\{0\}Assume the solution is discretized on a spatiotemporal grid with\(Nx,Nt\)=\(3,3\)\(N\_\{x\},N\_\{t\}\)=\(3,3\)grid points:x1,x2,x3x\_\{1\},x\_\{2\},x\_\{3\}, andt1,t2,t3t\_\{1\},t\_\{2\},t\_\{3\}\. Let𝐮i\(k\)\\mathbf\{u\}\_\{i\}^\{\(k\)\}denote the approximation ofu\(xi,tk\)u\(x\_\{i\},t\_\{k\}\)\.

The initial value constraint then becomes

𝐮i\(1\)−fi=0,fi:=f\(xi\),i=1,2,3\\displaystyle\\mathbf\{u\}\_\{i\}^\{\(1\)\}\-f\_\{i\}=0,\\qquad f\_\{i\}:=f\(x\_\{i\}\),\\quad i=1,2,3Using a simple left Riemann sum approximation to the integral, the mass conservation constraint takes the form

∑i=12𝐮i\(k\)−∑i=12𝐮i\(1\)=0,k=2,3\\displaystyle\\sum\_\{i=1\}^\{2\}\\mathbf\{u\}\_\{i\}^\{\(k\)\}\-\\sum\_\{i=1\}^\{2\}\\mathbf\{u\}\_\{i\}^\{\(1\)\}=0,\\quad k=2,3where the constraint at the first timestep is omitted\.

In total, this yields55constraints\. For convenience, we represent the approximation𝐮\\mathbf\{u\}as the flattened vector

𝐮=\(𝐮1\(1\),𝐮2\(1\),𝐮3\(1\),…,𝐮1\(3\),𝐮2\(3\),𝐮3\(3\)\)T∈ℝ9\\mathbf\{u\}=\(\\mathbf\{u\}\_\{1\}^\{\(1\)\},\\mathbf\{u\}\_\{2\}^\{\(1\)\},\\mathbf\{u\}\_\{3\}^\{\(1\)\},\.\.\.,\\mathbf\{u\}\_\{1\}^\{\(3\)\},\\mathbf\{u\}\_\{2\}^\{\(3\)\},\\mathbf\{u\}\_\{3\}^\{\(3\)\}\)^\{T\}\\in\\mathbb\{R\}^\{9\}\.

The Jacobian of the constraint operator,Jℋ∈ℝ5×9J\_\{\\mathcal\{H\}\}\\in\\mathbb\{R\}^\{5\\times 9\}has entries\(Jℋ\)i,j=∂ℋi∂𝐮j\(J\_\{\\mathcal\{H\}\}\)\_\{i,j\}=\\frac\{\\partial\\mathcal\{H\}\_\{i\}\}\{\\partial\\mathbf\{u\}\_\{j\}\}, where the constraints are ordered such that the initial value constraints appear first\.

A straightforward computation gives

Jℋ=\[100000000010000000001000000\-1\-10110000\-1\-10000110\]\\displaystyle J\_\{\\mathcal\{H\}\}=\\begin\{bmatrix\}\\textbf\{1\}&0&0&0&0&0&0&0&0\\\\ 0&\\textbf\{1\}&0&0&0&0&0&0&0\\\\ 0&0&\\textbf\{1\}&0&0&0&0&0&0\\\\ \\textbf\{\-1\}&\\textbf\{\-1\}&0&\\textbf\{1\}&\\textbf\{1\}&0&0&0&0\\\\ \\textbf\{\-1\}&\\textbf\{\-1\}&0&0&0&0&\\textbf\{ 1\}&\\textbf\{1\}&0\\\\ \\end\{bmatrix\}\(25\)
The3×33\\times 3identity block in the upper\-left corner arises from the initial value constraints\. More generally, initial value constraints typically contribute block\-diagonal structure to the Jacobian\.

Similarly, regardless of the number of grid points, a mass conservation constraint discretized using a left Riemann sum produces rows containing a sequence ofNx−1N\_\{x\}\-1nonzero entries followed by a zero, reflecting the structure of the underlying quadrature rule\.

## Appendix DConstraints

### D\.1Energy Evolution Constraint

In generating solutions for the heat equation we considered an energy evolution constraint\. Here, ”energy” is meant as

E\(t\):=12∫Ω\(u\(x,t\)\)2𝑑xE\(t\):=\\frac\{1\}\{2\}\\int\_\{\\Omega\}\(u\(x,t\)\)^\{2\}\\;dxnot to be confused with the physical ”energy” of the system\. The energy referred to here is more of a measure of variance ofuu\. We might want to enforce that the temperature profile,uuis smoothed out over time\. In other words, that the temporal derivativeddtE\(t\)≤0\\frac\{d\}\{dt\}E\(t\)\\leq 0\. If one assumes zero flux on the boundary, then this is a purely mathematical artifact of the heat equation\.

ddt12∫Ω\(u\(x,t\)\)2𝑑x\\displaystyle\\frac\{d\}\{dt\}\\frac\{1\}\{2\}\\int\_\{\\Omega\}\(u\(x,t\)\)^\{2\}\\;dx=∫Ωuut𝑑x\\displaystyle=\\int\_\{\\Omega\}uu\_\{t\}\\;dx=∫Ωαuuxx𝑑x\\displaystyle=\\int\_\{\\Omega\}\\alpha uu\_\{xx\}\\;dx=−α∫Ω\(ux\)2𝑑x\\displaystyle=\-\\alpha\\int\_\{\\Omega\}\(u\_\{x\}\)^\{2\}\\;dxwhere the second\-to\-last equality comes from substituting in the heat equation, and the last equality comes from integration by parts \- assuming zero flux on boundary\. Importantly, the last expression must be negative, hence, we can enforce

ddt12∫Ω\(u\(x,t\)\)2𝑑x=−α∫Ω\(ux\)2𝑑x\\frac\{d\}\{dt\}\\frac\{1\}\{2\}\\int\_\{\\Omega\}\(u\(x,t\)\)^\{2\}\\;dx=\-\\alpha\\int\_\{\\Omega\}\(u\_\{x\}\)^\{2\}\\;dxThis constraint enforces that the energy of the solution,E\(t\)E\(t\), which is to be interpreted as the variability of the solution in space, decreases over time \- much as one would expect from a diffusive process such as the heat equation\.

### D\.2Godunov’s Flux Conservation

For the Burgers’ equation with fixed initial condition, we additionally imposed a sequence of local conservation constraints based on Godunov’s finite\-volume method\. The purpose of this constraint was to encourage the neural solution to satisfy not only global mass conservation, but also the local conservative transport structure of the PDE\.

Specifically, the constraint operator included the residual terms

RFlux\(k\)\(u\),k=1,…,5,R^\{\(k\)\}\_\{\\text\{Flux\}\}\(u\),\\qquad k=1,\\dots,5,where each residual corresponds to one conservative update step of a finite\-volume discretization\.

Let the spatial domain be partitioned into cells indexed byii, with cell averages

uik≈1Δx∫xixi\+1u\(x,tk\)𝑑x\.u\_\{i\}^\{k\}\\approx\\frac\{1\}\{\\Delta x\}\\int\_\{x\_\{i\}\}^\{x\_\{i\+1\}\}u\(x,t\_\{k\}\)\\,dx\.For Burgers’ equation

ut\+12\(u2\)x=0,u\_\{t\}\+\\frac\{1\}\{2\}\(u^\{2\}\)\_\{x\}=0,Godunov’s method updates the solution according to

uik\+1=νik−ΔtΔx\(Fi\+12k−Fi−12k\),u\_\{i\}^\{k\+1\}=\\nu\_\{i\}^\{k\}\-\\frac\{\\Delta t\}\{\\Delta x\}\\left\(F\_\{i\+\\frac\{1\}\{2\}\}^\{k\}\-F\_\{i\-\\frac\{1\}\{2\}\}^\{k\}\\right\),whereFi\+12kF\_\{i\+\\frac\{1\}\{2\}\}^\{k\}denotes the numerical flux through the interface between neighboring cells\.

For Burgers’ equation, the Godunov flux is given by

F\(a,b\)=\{minu∈\[a,b\]⁡u22,a≤b,maxu∈\[b,a\]⁡u22,a\>b,F\(a,b\)=\\begin\{cases\}\\min\_\{u\\in\[a,b\]\}\\frac\{u^\{2\}\}\{2\},&a\\leq b,\\\\\[3\.00003pt\] \\max\_\{u\\in\[b,a\]\}\\frac\{u^\{2\}\}\{2\},&a\>b,\\end\{cases\}witha=uika=u\_\{i\}^\{k\}andb=ui\+1kb=u\_\{i\+1\}^\{k\}\.

Using the neural network prediction evaluated on the spatial grid, we defined the flux residual as the mismatch between consecutive predicted states and a single Godunov update:

RFlux\(k\)\(u\)=uk\+1−\[uk−ΔtΔx\(Fi\+12k−Fi−12k\)\]\.R\_\{\\text\{Flux\}\}^\{\(k\)\}\(u\)=u^\{k\+1\}\-\\left\[u^\{k\}\-\\frac\{\\Delta t\}\{\\Delta x\}\\left\(F\_\{i\+\\frac\{1\}\{2\}\}^\{k\}\-F\_\{i\-\\frac\{1\}\{2\}\}^\{k\}\\right\)\\right\]\.
Thus, the constraint enforces that the predicted solution approximately evolves according to a locally conservative finite\-volume scheme over several successive time steps\.

### D\.3Discretization

To evaluate the conservation constraints in practice, the continuous space\-time domain was discretized on a uniform grid

xi=iΔx,tk=kΔt,x\_\{i\}=i\\Delta x,\\qquad t\_\{k\}=k\\Delta t,whereΔx\\Delta xandΔt\\Delta tdenote the spatial and temporal step sizes, respectively\. Using the representation that𝐮ik\\mathbf\{u\}\_\{i\}^\{k\}is the approximation tou\(xi,tk\)u\(x\_\{i\},t\_\{k\}\), the continuous conservation operators could be approximated through discrete sums and local update relations\.

Global conservation quantities were approximated through Riemann sums over the spatial grid\. For example, the total mass at timetkt\_\{k\}was approximated as

m\(tk\)≈∑iuikΔx\.m\(t\_\{k\}\)\\approx\\sum\_\{i\}u\_\{i\}^\{k\}\\Delta x\.
#### D\.3\.1Neumann Boundary Conditions

For problems with homogeneous Neumann boundary conditions, the continuous condition

∂u∂n=0\\frac\{\\partial u\}\{\\partial n\}=0was enforced discretely by requiring the normal derivative at the boundary to vanish\. In one spatial dimension, this corresponds to

u2k−u1kΔx=0,uNxk−uNx−1kΔx=0,\\frac\{u\_\{2\}^\{k\}\-u\_\{1\}^\{k\}\}\{\\Delta x\}=0,\\qquad\\frac\{u\_\{N\_\{x\}\}^\{k\}\-u\_\{N\_\{x\}\-1\}^\{k\}\}\{\\Delta x\}=0,which implies that the boundary values mirror their neighboring interior values,

u1k=u2k,uNxk=uNx−1k\.u\_\{1\}^\{k\}=u\_\{2\}^\{k\},\\qquad u\_\{N\_\{x\}\}^\{k\}=u\_\{N\_\{x\}\-1\}^\{k\}\.
This prevents artificial fluxes through the boundary and ensures consistency with the conservation constraints\.

#### D\.3\.2Discretization of the Conservation Constraint

Consider the continuous conservation constraint

ddt∫Ωρ\(u\)𝑑x−C=0,\\frac\{d\}\{dt\}\\int\_\{\\Omega\}\\rho\(u\)\\,dx\-C=0,whereρ\(u\)\\rho\(u\)denotes some quantity whose integral in space we’d like to conserve, andCCis some constant\.

Using the spatial discretization, the integral term was approximated through a Riemann sum,

∫Ωρ\(u\)𝑑x≈∑iρ\(uik\)Δx\.\\int\_\{\\Omega\}\\rho\(u\)\\,dx\\approx\\sum\_\{i\}\\rho\(u\_\{i\}^\{k\}\)\\Delta x\.
The temporal derivative was then approximated using a finite difference,

ddt∫Ωρ\(u\)𝑑x≈∑iρ\(uik\+1\)Δx−∑iρ\(uik\)ΔxΔt\.\\frac\{d\}\{dt\}\\int\_\{\\Omega\}\\rho\(u\)\\,dx\\approx\\frac\{\\sum\_\{i\}\\rho\(u\_\{i\}^\{k\+1\}\)\\Delta x\-\\sum\_\{i\}\\rho\(u\_\{i\}^\{k\}\)\\Delta x\}\{\\Delta t\}\.
Hence, the fully discretized conservation constraint becomes

∑iρ\(uik\+1\)Δx−∑iρ\(uik\)ΔxΔt−C\.\\frac\{\\sum\_\{i\}\\rho\(u\_\{i\}^\{k\+1\}\)\\Delta x\-\\sum\_\{i\}\\rho\(u\_\{i\}^\{k\}\)\\Delta x\}\{\\Delta t\}\-C\.

## Appendix EImplementation Details

All sampling runs were initialized from white noise –iidiidGaussian entries – matched to the grid of the target PDE as a starting point for our models\.

For all the methods, the forward pass of the pretrained neural operator is executed on the GPU\. This choice is intentional to isolate the cost of the projection step, rather than benchmark CPU versus GPU neural\-network inference\. A fully CPU\-resident implementation would likely incur additional cost from evaluating the neural operator on the CPU, and should therefore be expected to have larger end\-to\-end runtimes than those reported here\.

### E\.1Hardware

All experiments were conducted on a high\-performance computing cluster with 1 × NVIDIA L40S GPU \(46 GB VRAM\), 4 CPU cores, and 64 GB RAM\.

### E\.2Test for Correctness

We verify that all methods produce solutions to the projection subproblem of comparable quality\.

Figure[3](https://arxiv.org/html/2607.00095#A5.F3)shows generated samples under each backend, as compared with an analytic solve for reference\. TheExaModels \+ MadNLPvariants \(both CPU and GPU\) reproduce the analytic sample to visual accuracy, demonstrating that the GPU execution does not compromise the correctness of the solution\. The remaining backends,JuMP,IPNewton, andL\-BFGS, produce samples that satisfy the constraints nominally but are visibly distinct from the reference\.

As PCFM is a generative method, even with exact constraint satisfaction, different optimization trajectories can occur\. Therefore, the figure highlights differences in reproducibility rather than correctness\.

![Refer to caption](https://arxiv.org/html/2607.00095v1/images/samples_heat_grid.png)Figure 3:Generated heat\-equation samples under each optimization backend, alongside an analytic reference\.ExaModels\+MadNLP\(CPU and GPU\) reproduces the analytic sample;JuMP,IPNewton, andL\-BFGSproduce visibly different samples despite using the same input noise, indicating that solver choice affects the generated sample beyond runtime\.
SNAP-FM: Sparse Nonlinear Accelerated Projection for Physics-Constrained Generative Modeling

Similar Articles

Physics-informed generative AI for semiconductor manufacturing: Enforcing hard physical constraints in generative models by construction

@plugyawn: Introducing: Megaprop: a library for efficient preconditioned optimization across GPUs! Megaprop is a fork of Megatron …

Accelerated Fourier SAT (AFSAT): Fully Realising a GPU-based Symmetric Pseudo-Boolean SAT Solver

Enforcing Constraints in Generative Sampling via Adaptive Correction Scheduling

FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes

Submit Feedback

Similar Articles

Physics-informed generative AI for semiconductor manufacturing: Enforcing hard physical constraints in generative models by construction
@plugyawn: Introducing: Megaprop: a library for efficient preconditioned optimization across GPUs! Megaprop is a fork of Megatron …
Accelerated Fourier SAT (AFSAT): Fully Realising a GPU-based Symmetric Pseudo-Boolean SAT Solver
Enforcing Constraints in Generative Sampling via Adaptive Correction Scheduling
FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes