Differentiable Parameter Optimization for DAEs with State-Dependent Events

arXiv cs.LG Papers

Summary

This paper presents methods for differentiable parameter optimization of differential-algebraic equations (DAEs) with state-dependent events, comparing automatic differentiation through simulation with explicit discrete-adjoint methods.

arXiv:2605.05395v1 Announce Type: new Abstract: Differential-algebraic equations (DAEs) with state-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations. Gradient-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities. This paper studies differentiable parameter optimization for semi-explicit DAEs with events. We formulate the learning problem as a constrained least-squares problem with DAE dynamics, algebraic constraints, guard equations, and reset maps. We then develop two complementary gradient-computation strategies. The first is an automatic-differentiation-through-simulation method that solves algebraic variables inside the vector field, differentiates the algebraic solve using the implicit function theorem, and handles events through segmented differentiable integration. The second is an explicit discrete-adjoint method that represents the forward simulation as an event-split residual system and computes gradients by solving for the Lagrange multipliers of smooth-segment and event residuals. The formulation clarifies that residual terms in the adjoint method are equality constraints, not heuristic penalties. We compare the two approaches in terms of gradient interpretation, event-time handling, implementation complexity, and local validity. Both methods provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings.
Original Article
View Cached Full Text

Cached at: 05/08/26, 07:17 AM

# Differentiable Parameter Optimization for DAEs with State-Dependent Events
Source: [https://arxiv.org/html/2605.05395](https://arxiv.org/html/2605.05395)
Ion Matei Fujitsu Research of America imatei@fujitsu\.com & Maksym Zhenirovskyy Fujitsu Research of America mzhenirovskyy@fujitsu\.com & Anthony Wong Fujitsu Research of America awong@fujitsu\.com

###### Abstract

Differential\-algebraic equations \(DAEs\) with state\-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations\. Gradient\-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities\. This paper studies differentiable parameter optimization for semi\-explicit DAEs with events\. We formulate the learning problem as a constrained least\-squares problem with DAE dynamics, algebraic constraints, guard equations, and reset maps\. We then develop two complementary gradient\-computation strategies\. The first is an automatic\-differentiation\-through\-simulation method that solves algebraic variables inside the vector field, differentiates the algebraic solve using the implicit function theorem, and handles events through segmented differentiable integration\. The second is an explicit discrete\-adjoint method that represents the forward simulation as an event\-split residual system and computes gradients by solving for the Lagrange multipliers of smooth\-segment and event residuals\. The formulation clarifies that residual terms in the adjoint method are equality constraints, not heuristic penalties\. We compare the two approaches in terms of gradient interpretation, event\-time handling, implementation complexity, and local validity\. Both methods provide gradients for the event path selected by the forward simulation and are valid under fixed event ordering and transversal guard crossings\.

## 1Introduction

Differential\-algebraic equations \(DAEs\) are a common modeling formalism for physical and engineered systems whose dynamics combine differential evolution with algebraic constraints\. They arise in electrical circuits, constrained mechanical systems, chemical processes, power networks, and equation\-based modeling environments\. Classical references on numerical DAE integration and sensitivity analysis provide the foundation for this setting\(Ascher and Petzold,[1998](https://arxiv.org/html/2605.05395#bib.bib6); Hairer and Wanner,[1996](https://arxiv.org/html/2605.05395#bib.bib5)\)\. In many of these applications, the continuous dynamics are interrupted by discrete events: switches open or close, controllers change modes, constraints become active, impacts occur, or state variables are reinitialized when thresholds are crossed\. The resulting models are hybrid DAEs: they evolve continuously between events, but their state and algebraic variables must remain consistent across event\-triggered discontinuities\. This view is consistent with the Modelica DAE representation, where equations may define DAEs with discontinuities, variable structure, and discrete\-event control\(Modelica Association,[2025](https://arxiv.org/html/2605.05395#bib.bib4)\)\.

Parameter estimation and design optimization for hybrid DAEs are challenging because the simulator is not simply a smooth map from parameters to outputs\. The algebraic variables are implicitly defined by constraints and must be recomputed during integration and after reinitialization\. Event times are also parameter\-dependent; changing a parameter can shift the instant at which a guard condition is reached, thereby changing the time intervals over which the continuous dynamics are integrated\. Finally, events introduce discontinuities through reset maps\. Even when the event sequence is fixed, gradients must account for continuous dynamics, event\-time sensitivities, reset maps, and post\-event algebraic consistency\.

Recent differentiable ODE solvers have made gradient\-based learning practical for continuous\-time models\. Libraries such astorchdiffeq\(Chen,[2018](https://arxiv.org/html/2605.05395#bib.bib24)\)anddiffrax\(Kidger,[2021](https://arxiv.org/html/2605.05395#bib.bib22)\)support backpropagation through ODE solvers and include differentiable event handling throughodeint\_event, with standard examples such as the bouncing ball model\(Chenet al\.,[2021](https://arxiv.org/html/2605.05395#bib.bib25),[2018](https://arxiv.org/html/2605.05395#bib.bib26)\)\. Related Neural ODE work also considers event\-sequence modeling, where events serve as observations or inputs to a latent continuous\-time model rather than as hard physical constraints\(Kuleshovet al\.,[2024](https://arxiv.org/html/2605.05395#bib.bib23)\)\. These developments show that differentiable simulation with events is useful and increasingly accessible\. However, most existing work is formulated for ODEs or latent ODE architectures\. It does not directly address parameter learning for DAEs, where algebraic constraints must be solved and differentiated consistently throughout the event\-driven simulation\.

This paper studies differentiable parameter learning for semi\-explicit DAEs with state\-dependent events\. We consider systems of the form

x˙​\(t\)=f​\(t,x​\(t\),z​\(t\),p\),0=g​\(t,x​\(t\),z​\(t\),p\),y​\(t\)=h​\(t,x​\(t\),z​\(t\),p\),\\dot\{x\}\(t\)=f\(t,x\(t\),z\(t\),p\),\\qquad 0=g\(t,x\(t\),z\(t\),p\),\\qquad y\(t\)=h\(t,x\(t\),z\(t\),p\),wherexxare differential states,zzare algebraic variables, andppare parameters to be learned\. Events are specified by guard functions

ϕe​\(t,x,z,p\)=0,\\phi\_\{e\}\(t,x,z,p\)=0,together with reinitialization rules that map pre\-event variables to post\-event variables\. The learning problem is posed as output matching under the DAE dynamics, algebraic equations, event guards, and reset maps\.

We develop and compare two gradient\-computation strategies\. The first is an automatic\-differentiation \(AD\)\-through\-simulation method\. It treats the simulator as a reduced map from parameters to predicted outputs and differentiates the composed loss directly\. For DAEs, this requires differentiating through the algebraic solve used inside the vector field\. We implement this using JAX anddiffrax\(Kidger,[2021](https://arxiv.org/html/2605.05395#bib.bib22)\), with a custom JVP for the algebraic solve based on the implicit function theorem, segmented event handling, composite guard detection, and right\-continuous target\-time evaluation\.

The second strategy is an explicit discrete\-adjoint method\. Instead of differentiating through the simulator as a black\-box computation graph, it constructs an event\-split discrete residual system\. Smooth segments are constrained by DAE integration residuals, while event blocks enforce guard conditions, reinitialization equations, continuity of unchanged variables, and post\-event algebraic consistency\. The residuals are not added as heuristic penalty terms\. They are equality constraints, or equivalently Lagrange terms that vanish on feasible trajectories\. Their multipliers define the adjoint variables, and the resulting backward sweep computes parameter gradients without explicitly forming full trajectory and event\-time sensitivity matrices\.

The contribution of this paper is a unified treatment of differentiable parameter learning for DAEs with state\-dependent events\. Specifically, we:

1. 1\.formulate parameter learning for hybrid DAEs as a constrained least\-squares problem with algebraic constraints, event guards, and reset maps;
2. 2\.present an AD\-through\-simulation implementation in which algebraic variables are differentiated through an implicit\-function\-theorem custom JVP and events are handled by segmented differentiable integration;
3. 3\.derive an explicit discrete\-adjoint formulation in which residuals are equality constraints and the associated multipliers are solved by a backward event\-split sweep;
4. 4\.compare the two approaches in terms of gradient interpretation, event\-time handling, implementation complexity, and local validity under a fixed event sequence\.

Both approaches compute gradients that are local to the event path selected by the forward simulation\. They assume transversal event crossings and no change in the number, identity, or ordering of events under sufficiently small parameter perturbations\. These assumptions are standard for differentiating piecewise\-smooth hybrid trajectories and clarify the limitations of gradient\-based optimization near grazing events, simultaneous events, or event sequence changes\. This local\-path interpretation is closely related to sensitivity updates in hybrid systems, often expressed through jump conditions or saltation matrices at event times\(Konget al\.,[2024](https://arxiv.org/html/2605.05395#bib.bib3); Sacconet al\.,[2014](https://arxiv.org/html/2605.05395#bib.bib2)\)\. These conditions describe how first\-order perturbations are propagated across a state\-triggered discontinuity, and they provide a useful conceptual link between hybrid\-system sensitivity analysis and the event\-time terms appearing in the gradient formulations studied here\.

We applied the two approaches to two examples: an electrical circuit with events and to a planar bouncing ball examples, where the events are triggered by ball impacts\. The code that generated the experimental results summarized in this paper can be found athttps://github\.com/ionmatei/diff\-dae\-events/\.

Paper structure:Section[2](https://arxiv.org/html/2605.05395#S2)formulates parameter learning for DAEs with state\-dependent events as a constrained least\-squares problem\. Section[3](https://arxiv.org/html/2605.05395#S3)presents the AD\-through\-simulation approach, including implicit differentiation of the algebraic solve and segmented event handling\. Section[4](https://arxiv.org/html/2605.05395#S4)develops the explicit discrete\-adjoint formulation based on event\-split residual constraints\. Section[6](https://arxiv.org/html/2605.05395#S6)compares the proposed approaches with related differentiable ODE and event\-learning methods\. Section[7](https://arxiv.org/html/2605.05395#S7)concludes the paper and outlines future directions\. The appendices include additional mathematical and implementation details\.

## 2Problem Formulation

Let

0=τ0<τ1<⋯<τM<τM\+1=T0=\\tau\_\{0\}<\\tau\_\{1\}<\\cdots<\\tau\_\{M\}<\\tau\_\{M\+1\}=Tdenote the event times generated by the hybrid DAE\. On each interval\(τm,τm\+1\)\(\\tau\_\{m\},\\tau\_\{m\+1\}\), the system satisfies

x˙m​\(t\)=f​\(t,xm​\(t\),zm​\(t\),p\),0=g​\(t,xm​\(t\),zm​\(t\),p\)\.\\dot\{x\}\_\{m\}\(t\)=f\(t,x\_\{m\}\(t\),z\_\{m\}\(t\),p\),\\qquad 0=g\(t,x\_\{m\}\(t\),z\_\{m\}\(t\),p\)\.At an event timeτm\\tau\_\{m\}, an active guard satisfies

ϕem​\(τm,xm−,zm−,p\)=0,\\phi\_\{e\_\{m\}\}\(\\tau\_\{m\},x\_\{m\}^\{\-\},z\_\{m\}^\{\-\},p\)=0,and a reset map or reinitialization equation relates the pre\-event and post\-event variables,

\(xm\+,zm\+\)=Ψem​\(τm,xm−,zm−,p\),\(x\_\{m\}^\{\+\},z\_\{m\}^\{\+\}\)=\\Psi\_\{e\_\{m\}\}\(\\tau\_\{m\},x\_\{m\}^\{\-\},z\_\{m\}^\{\-\},p\),with the understanding that algebraic variables may be recomputed after the differential state is reset\.

Given output datayidatay\_\{i\}^\{\\mathrm\{data\}\}at observation timestit\_\{i\}, the parameter\-estimation problem can be written as

minp,\{xm,zm\},\{τm\}\\displaystyle\\min\_\{p,\\\{x\_\{m\},z\_\{m\}\\\},\\\{\\tau\_\{m\}\\\}\}12​∑i=1Nobs‖h​\(ti,x​\(ti\),z​\(ti\),p\)−yidata‖22\\displaystyle\\frac\{1\}\{2\}\\sum\_\{i=1\}^\{N\_\{\\mathrm\{obs\}\}\}\\left\\\|h\(t\_\{i\},x\(t\_\{i\}\),z\(t\_\{i\}\),p\)\-y\_\{i\}^\{\\mathrm\{data\}\}\\right\\\|\_\{2\}^\{2\}s\.t\.\\displaystyle\\mathrm\{s\.t\.\}x˙m​\(t\)−f​\(t,xm​\(t\),zm​\(t\),p\)=0,\\displaystyle\\dot\{x\}\_\{m\}\(t\)\-f\(t,x\_\{m\}\(t\),z\_\{m\}\(t\),p\)=0,g​\(t,xm​\(t\),zm​\(t\),p\)=0,\\displaystyle g\(t,x\_\{m\}\(t\),z\_\{m\}\(t\),p\)=0,Em​\(τm,wm−,wm\+,p\)=0,m=1,…,M,\\displaystyle E\_\{m\}\(\\tau\_\{m\},w\_\{m\}^\{\-\},w\_\{m\}^\{\+\},p\)=0,\\qquad m=1,\\ldots,M,wherew=\(x,z\)w=\(x,z\)andEmE\_\{m\}collects the guard condition, reset equations, continuity conditions for variables not modified by the event, and post\-event algebraic consistency\.

This constrained formulation is the common starting point for both methods studied in this paper\. The AD\-through\-simulation method eliminates the trajectory variables by executing the simulator and differentiating the reduced map from parameters to outputs\. The discrete\-adjoint method keeps the residual constraints explicit after discretization and solves for the corresponding Lagrange multipliers\.

## 3AD\-through\-Simulation for DAEs with Events

The AD\-through\-simulation approach treats the event\-driven DAE simulator as a differentiable program\. For a selected parameter vectorpoptp\_\{\\mathrm\{opt\}\}, the simulator produces predicted outputs

Y^​\(popt\)=\[y^​\(t1;popt\)⋮y^​\(tN;popt\)\]\.\\widehat\{Y\}\(p\_\{\\mathrm\{opt\}\}\)=\\begin\{bmatrix\}\\hat\{y\}\(t\_\{1\};p\_\{\\mathrm\{opt\}\}\)\\\\ \\vdots\\\\ \\hat\{y\}\(t\_\{N\};p\_\{\\mathrm\{opt\}\}\)\\end\{bmatrix\}\.The reduced learning problem is then

minpopt⁡𝒥^​\(popt\)=𝒥​\(Y^​\(popt\)\),\\min\_\{p\_\{\\mathrm\{opt\}\}\}\\widehat\{\\mathcal\{J\}\}\(p\_\{\\mathrm\{opt\}\}\)=\\mathcal\{J\}\(\\widehat\{Y\}\(p\_\{\\mathrm\{opt\}\}\)\),and the gradient is computed by automatic differentiation:

∇popt𝒥^=AD​\[𝒥​\(Y^​\(popt\)\)\]\.\\nabla\_\{p\_\{\\mathrm\{opt\}\}\}\\widehat\{\\mathcal\{J\}\}=\\mathrm\{AD\}\\left\[\\mathcal\{J\}\(\\widehat\{Y\}\(p\_\{\\mathrm\{opt\}\}\)\)\\right\]\.
The main DAE\-specific issue is the algebraic variable\. During integration,zzis defined implicitly by

g​\(t,x,z,p\)=0\.g\(t,x,z,p\)=0\.We write the solution as

z=ζ​\(t,x,p\),g​\(t,x,ζ​\(t,x,p\),p\)=0\.z=\\zeta\(t,x,p\),\\qquad g\(t,x,\\zeta\(t,x,p\),p\)=0\.The differential state is advanced using the reduced vector field

x˙=f​\(t,x,ζ​\(t,x,p\),p\)\.\\dot\{x\}=f\(t,x,\\zeta\(t,x,p\),p\)\.In the implementation, the algebraic solve is performed by chord Newton iterations and differentiated using a custom JVP derived from the implicit function theorem\. Thus, gradients propagate through the algebraic constraint without differentiating through every Newton iteration\.

Events are handled by segmented integration\. At the beginning of each segment, all guard functions are evaluated and guards that are already active are masked out to avoid immediate retriggering\. The remaining active guards are combined into a scalar event condition,

Φ​\(t,x,p\)=mine∈𝒜⁡ϕe​\(t,x,ζ​\(t,x,p\),p\),\\Phi\(t,x,p\)=\\min\_\{e\\in\\mathcal\{A\}\}\\phi\_\{e\}\(t,x,\\zeta\(t,x,p\),p\),where𝒜\\mathcal\{A\}is the set of active guards\. A root finder locates the first zero crossing of this composite guard\. The corresponding event index is selected, the reset map is applied, the algebraic variables are recomputed, and the next segment starts from the post\-event state\.

To make this process compatible with JIT compilation, the simulator is written as fixed\-length scans over a prescribed maximum number of event\-bounded segments\. Physical segments are recorded until the terminal time is reached; remaining iterations are padding segments\. Target\-time evaluation is performed by reintegrating each recorded segment with the requested target times clipped to that segment\. The output at each target time is selected from the latest real segment whose closed interval contains the target, giving the right\-continuous convention

y^​\(τm\)=h​\(τm,x​\(τm\+\),z​\(τm\+\),p\)\.\\hat\{y\}\(\\tau\_\{m\}\)=h\(\\tau\_\{m\},x\(\\tau\_\{m\}^\{\+\}\),z\(\\tau\_\{m\}^\{\+\}\),p\)\.
The resulting loss and gradient are evaluated with a JIT\-compiledvalue\_and\_gradcall\. These gradients pass through the algebraic solve, ODE integration, event\-time computation, reset maps, target\-time selection, and output reconstruction\. They are local to the event sequence selected by the forward simulation\.

A useful way to interpret this method is that it differentiates the numerical algorithm actually executed by the simulator\. This includes solver steps, algebraic solves, event\-time refinement, reset maps, and target\-time selection\. The advantage is that the gradient is tightly coupled to the solver tolerance and implementation used for the forward pass\. The drawback is that reverse\-mode AD must retain or reconstruct the computational trace of the segmented simulation\. This can become expensive when the number of states, events, or solver steps grows\. More details can be found in Appendix[A](https://arxiv.org/html/2605.05395#A1)\.

## 4Discrete\-Adjoint Parameter Learning for DAEs with Events

The AD\-through\-simulation approach requires a simulator whose numerical operations are visible to the automatic\-differentiation system\. Many mature DAE solvers, however, are available only as external numerical engines\. The discrete\-adjoint approach targets this setting\. It uses the external solver to generate the event\-split forward trajectory at the current parameter values, including the event times and pre\- and post\-event states\. These quantities are then used to construct a differentiable discrete residual system, from which the Lagrange multipliers and parameter gradient are computed explicitly\.

The discrete\-adjoint method starts from the same constrained parameter estimation problem, but keeps the constraints explicit after discretization\. The forward solve produces an event\-split trajectory: smooth segments between events and event records containing the event time, active event index, pre\-event state, and post\-event state\.

Let segmentmmbe bounded by event timesτm\\tau\_\{m\}andτm\+1\\tau\_\{m\+1\}\. The implementation stores normalized time coordinates

0=ηm,0<ηm,1<⋯<ηm,Nm−1=1,0=\\eta\_\{m,0\}<\\eta\_\{m,1\}<\\cdots<\\eta\_\{m,N\_\{m\}\-1\}=1,and reconstructs physical node times as

tm,k=τm\+ηm,k​\(τm\+1−τm\)\.t\_\{m,k\}=\\tau\_\{m\}\+\\eta\_\{m,k\}\(\\tau\_\{m\+1\}\-\\tau\_\{m\}\)\.Thus, event times determine both the segment boundaries and the physical time steps used in the residuals\.

For

wm,k=\[xm,kzm,k\],w\_\{m,k\}=\\begin\{bmatrix\}x\_\{m,k\}\\\\ z\_\{m,k\}\\end\{bmatrix\},the smooth\-step residual is the trapezoidal DAE residual

Rm,k=\[−xm,k\+1\+xm,k\+hm,k2​\[f​\(tm,k,xm,k,zm,k,p\)\+f​\(tm,k\+1,xm,k\+1,zm,k\+1,p\)\]g​\(tm,k\+1,xm,k\+1,zm,k\+1,p\)\]=0,R\_\{m,k\}=\\begin\{bmatrix\}\-x\_\{m,k\+1\}\+x\_\{m,k\}\+\\frac\{h\_\{m,k\}\}\{2\}\\left\[f\(t\_\{m,k\},x\_\{m,k\},z\_\{m,k\},p\)\+f\(t\_\{m,k\+1\},x\_\{m,k\+1\},z\_\{m,k\+1\},p\)\\right\]\\\\ g\(t\_\{m,k\+1\},x\_\{m,k\+1\},z\_\{m,k\+1\},p\)\\end\{bmatrix\}=0,wherehm,k=tm,k\+1−tm,kh\_\{m,k\}=t\_\{m,k\+1\}\-t\_\{m,k\}\. Event blocks are represented by residuals of the form

Em​\(τm,wm−,wm\+,p\)=0,E\_\{m\}\(\\tau\_\{m\},w\_\{m\}^\{\-\},w\_\{m\}^\{\+\},p\)=0,which enforce the guard condition, reset equations, continuity of unchanged variables, and algebraic consistency after the event\.

The finite\-dimensional constrained problem is therefore

minW,τ,p\\displaystyle\\min\_\{W,\\tau,p\}𝒥h​\(W,τ,p\)\\displaystyle\\mathcal\{J\}\_\{h\}\(W,\\tau,p\)s\.t\.\\displaystyle\\mathrm\{s\.t\.\}Rm,k​\(W,τ,p\)=0,\\displaystyle R\_\{m,k\}\(W,\\tau,p\)=0,Em​\(W,τ,p\)=0,\\displaystyle E\_\{m\}\(W,\\tau,p\)=0,whereWWdenote the collection of the state and algebraic variables\. Problem \([4](https://arxiv.org/html/2605.05395#S4.Ex21)\) is not a penalty relaxation\. It is the discrete constrained form of the same parameter\-estimation problem\. The corresponding discrete Lagrangian is

ℒh=𝒥h​\(W,τ,p\)\+∑m,kλm,k⊤​Rm,k​\(W,τ,p\)\+∑mμm⊤​Em​\(W,τ,p\)\.\\mathcal\{L\}\_\{h\}=\\mathcal\{J\}\_\{h\}\(W,\\tau,p\)\+\\sum\_\{m,k\}\\lambda\_\{m,k\}^\{\\top\}R\_\{m,k\}\(W,\\tau,p\)\+\\sum\_\{m\}\\mu\_\{m\}^\{\\top\}E\_\{m\}\(W,\\tau,p\)\.Because the residuals vanish on feasible trajectories,ℒh=𝒥h\\mathcal\{L\}\_\{h\}=\\mathcal\{J\}\_\{h\}for any multipliers\. The multipliers are chosen so that the stationarity conditions with respect toWWandτ\\taueliminate the need to compute explicit trajectory and event\-time sensitivities\. The gradient is then obtained from

d​𝒥hd​p=∂ℒh∂p\.\\frac\{d\\mathcal\{J\}\_\{h\}\}\{dp\}=\\frac\{\\partial\\mathcal\{L\}\_\{h\}\}\{\\partial p\}\.
Computationally, the gradient is evaluated by a reverse block sweep\. Smooth segment blocks solve local adjoint systems associated with the trapezoidal DAE residuals\. Event blocks solve event\-adjoint equations associated with the guard and reinitialization residuals\. Because event times define adjacent segment boundaries, event\-time stationarity couples an event block with the segment immediately before it\. The implementation handles this by storing an affine pending\-event adjoint and resolving its scalar coefficient when the preceding segment is swept\. The discrete\-adjoint route trades implementation convenience for control over the gradient calculation\. Its accuracy is tied to the residual discretization used for the adjoint calculation, not necessarily to the tolerance of the forward DAE solver\. This distinction explains why a high\-accuracy forward solve does not automatically imply a high\-accuracy discrete\-adjoint gradient: the adjoint residual mesh and quadrature must be sufficiently accurate for the optimization problem\. Conversely, because the method does not require differentiating through the solver internals, it can be paired with mature external DAE solvers and can expose smaller linear systems when symbolic preprocessing or tearing is available\. More details can be found in Appendix[B](https://arxiv.org/html/2605.05395#A2)\.

## 5Experiments

We evaluate the proposed gradient pipeline on two hybrid\-system parameter identification problems\. The first is an electrical ladder network with a state\-dependent reinit event, and is used to compare the implicit\-AD route \(diffrax with a JAX\-AD\-friendly algebraic solver\) against the discrete\-adjoint route \(Sundials/IDA forward solve, padded backward sweep for computing the multipliers\)\. The second is a planar multi\-ball system with contact\-driven reinit events; we use it to study how each route scales with the number of ballsNNand to position our JAX\-AD pipeline against the established PyTorch\-AD baseline\. The code for generating the experimental results can be found athttps://github\.com/ionmatei/diff\-dae\-events/\.

### 5\.1Cauer ladder with a state\-dependent event

The Cauer DAE has 7 differential states \(capacitor voltagesC1C\_\{1\}–C5C\_\{5\}and inductor currentsL1L\_\{1\},L2L\_\{2\}\), 5 algebraic states, and 7 positive\-valued parametersθ=\(C1,C2,C3,C4,C5,L1,L2\)\\theta=\(C\_\{1\},C\_\{2\},C\_\{3\},C\_\{4\},C\_\{5\},L\_\{1\},L\_\{2\}\)\. A singlewhenclause resetsC3C\_\{3\}when its voltage crosses0\.50\.5V\. The trajectory over 20 seconds exhibits1111event firings\. We initialize the parameters from the truth via a log\-uniform multiplicative bias of half\-width0\.10\.1\(so each component lies in\[θ⋆⋅e−0\.1,θ⋆⋅e\+0\.1\]\[\\theta^\{\\star\}\\\!\\cdot\\\!e^\{\-0\.1\},\\theta^\{\\star\}\\\!\\cdot\\\!e^\{\+0\.1\}\], i\.e\. a±10%\\pm 10\\%band\) and run Adam for500500iterations with step size10−210^\{\-2\}on500500uniform target points\. Both routes use the same sigmoid blending sharpnessβ=150\\beta=150across event boundaries to widen the loss basin by smoothing the effects of the events \(see[A\.5](https://arxiv.org/html/2605.05395#A1.SS5)\)\.

Table[1](https://arxiv.org/html/2605.05395#S5.T1)reports the relative error per parameter, the final loss of the chosen snapshot, and the wall\-clock cost\. The implicit\-AD route attains a lower final loss than the discrete\-adjoint route \(1\.69×10−61\.69\\times 10^\{\-6\}vs\.4\.00×10−44\.00\\times 10^\{\-4\}\)\. It also recovers several parameters to below one percent relative error, although the remaining error onC5C\_\{5\}indicates that some parameter directions are weakly identifiable from the chosen output and time grid\. The discrete\-adjoint route recoversC3C\_\{3\},C4C\_\{4\}, andL1L\_\{1\}accurately, but retains larger residual errors onC1C\_\{1\},C5C\_\{5\}, andL2L\_\{2\}\. Figure[1](https://arxiv.org/html/2605.05395#S5.F1)overlays the optimized trajectories on the ground truth, and Figures[2](https://arxiv.org/html/2605.05395#S5.F2)shows the loss histories\.

Table 1:Cauer ladder with a state\-dependent reinit event\. Per\-parameter relative error\|θ−θ⋆\|/\|θ⋆\|\|\\theta\-\\theta^\{\\star\}\|/\|\\theta^\{\\star\}\|, expressed in percent, after Adam optimization with a±10%\\pm 10\\%initial bias and500500iterations\. Best\-loss snapshot reported\. AD: implicit differentiation through diffrax with a JAX\-AD\-friendly algebraic solver\. DA: discrete adjoint via Sundials/IDA forward and padded backward sweep\.![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/optimization_cauer_events_adam_outputs.png)\(a\)Implicit\-AD route\.
![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/optimization_cauer_events_da_outputs.png)\(b\)Discrete\-adjoint route\.

Figure 1:Cauer ladder: optimized vs\. true trajectories at the best\-loss snapshot\. Both routes capture the qualitative event structure; the AD route additionally matches the post\-reset oscillation envelope\.![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/optimization_cauer_events_adam_loss.png)\(a\)AD loss history\.
![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/optimization_cauer_events_da_loss.png)\(b\)DA loss history\.

Figure 2:Cauer ladder: Adam loss histories on log scale\. The AD curve descends∼\\sim5 decades whereas the DA curve plateaus at∼10−4\\sim 10^\{\-4\}; the gap is consistent across the last∼100\\sim 100iterations\.The parameter errors should be interpreted in the context of output sensitivity and identifiability\. Different parameter combinations can produce nearly indistinguishable output trajectories, especially along weakly identifiable directions\. Therefore, matching the output loss is the primary optimization objective, while exact parameter recovery depends on the informativeness of the measured outputs and the time grid\.

From the optimization perspective any parameter combination that minimize the loss is acceptable \(i\.e\., there may be multiple local minima\)\. The accuracy gap between the two routes on this benchmark traces back to two effects\. First, the discrete\-adjoint pipeline constructs the Jacobian based on a trapezoidal\-step approach, so its gradient quality is bounded by the adjoint mesh density rather than by the forward solver tolerance\. Second, the loss surface near the truth is anisotropic and dominated by long\-time\-scale plateaus between events; with a uniform target grid the trapezoidal Jacobians of the discrete adjoint accumulate a small but coherent bias along the soft directions, observable here as the residual error onC5C\_\{5\}andC1C\_\{1\}\. The implicit\-AD pipeline gradient inherits the diffrax tolerance \(rtol=atol=10−6\\mathrm\{rtol\}\{=\}\\mathrm\{atol\}\{=\}10^\{\-6\}\) directly\. The price is that AD must hold the entire diffrax tape per segment in JAX memory; on this 7\-parameter system this is negligible, but it scales with the number of state dimensions and the length of the dense\-output tape, a distinction that drives the next experiment\. The reported loss function values are computed based on the best parameters without the segment blending that has a smoothing effect\.

### 5\.2Multi\-ball planar bouncing system: scaling withNN

We consider a planar system ofNNidentical bouncing balls in a square box of half\-width1010, with elastic collisions against the four walls \(restitutionebe\_\{b\}\) and inelastic ground impacts \(restitutionege\_\{g\}\)\. The dynamics is an ODE with state\-dependent events,4​N\+\(N2\)4N\+\\binom\{N\}\{2\}event clauses forN∈\{3,7,15\}N\\in\\\{3,7,15\\\}\. We optimize three identifiable parametersθ=\(g,eg,eb\)\\theta=\(g,e\_\{g\},e\_\{b\}\)from a randomly biased start \(±10%\\pm 10\\%log\-uniform\)\. All three identification routes share the same500500points uniform target grid, the same Adam step size, and the same convergence tolerance; the only differences are how the gradient is obtained and which framework holds the autograd tape:

jax\_adour JAX/diffrax pipeline with implicit AD, same algebraic\-solver custom\-jvp as in[5\.1](https://arxiv.org/html/2605.05395#S5.SS1)\.

jax\_dathe JAX padded\-discrete\-adjoint pipeline of[5\.1](https://arxiv.org/html/2605.05395#S5.SS1), ported to ODE\-with\-events\.

pytorch\_adPyTorch baseline usingtorchdiffeqwith hand\-written event\-time root\-finding, AD through the entire segmented integration tape\.

Table[2](https://arxiv.org/html/2605.05395#S5.T2)summarizes the result\. Per\-parameter relative errors are reported as percentages, the final validation loss is on a denser held\-out target grid, and timing is total wall\-clock for the budget of iterations actually used\.

Table 2:Multi\-ball bouncing system withN∈\{3,7,15\}N\\in\\\{3,7,15\\\}balls\. Per\-parameter relative error in %, final validation loss, and total wall\-clock time\. Budget:300300iters forN∈\{3,7\}N\\in\\\{3,7\\\}and600600iters forN=15N=15\. Times measured on a single CPU thread\.![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/benchmark_three_N3_prediction_error_vs_time.png)\(a\)N=3N=3\.
![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/benchmark_three_N7_prediction_error_vs_time.png)\(b\)N=7N=7\.
![Refer to caption](https://arxiv.org/html/2605.05395v1/figures/benchmark_three_N15_prediction_error_vs_time.png)\(c\)N=15N=15\.

Figure 3:Bouncing balls: prediction error vs\. wall\-clock time\. The horizontal axis exposes the throughput differential: the JAX\-AD curve reaches the same accuracy plateau as the PyTorch\-AD curve in5×5\\\!\\timesto35×35\\\!\\timesless wall\-clock time across the three problem sizes\.Three observations stand out\.

*\(i\) Per\-iteration scaling withNN\.*The JAX\-AD pipeline is the flattest:74→71→11074\\\!\\to\\\!71\\\!\\to\\\!110ms/iter asNNgoes from33to1515\. JIT compilation amortizes across iterations, so the marginal cost is dominated by the diffrax forward and the segment\-wise solve; both stay vectorized inNN\. The PyTorch\-AD baseline scales steeply \(554→1740→6652554\\\!\\to\\\!1740\\\!\\to\\\!6652ms/iter, roughly3×3\\timesperN→2​NN\\\!\\to\\\!2N\), reflecting the unfused per\-event\-segment work and the overhead oftorchdiffeq’s adaptive stepper rebuilding its graph each call\. The JAX\-DA pipeline sits between the two and inherits the discrete\-adjoint cost of building per\-step Jacobians, which scales with the segment count and hence with the event density\.

*\(ii\) Accuracy/throughput frontier\.*AtN∈\{3,7\}N\\\!\\in\\\!\\\{3,7\\\}, our JAX\-AD pipeline strictly dominates: it reaches a lower loss \(by∼2×\\sim\\\!2\\\!\\times\) in∼5×–16×\\sim\\\!5\\\!\\times\\text\{\-\-\}16\\\!\\timesless wall time\. AtN=15N=15the comparison flips: PyTorch\-AD reaches3\.9×10−83\.9\\\!\\times\\\!10^\{\-8\}versus JAX\-AD’s1\.4×10−51\.4\\\!\\times\\\!10^\{\-5\}but at37×37\\\!\\timesthe cost\. Whether the extra accuracy is worth it depends on the downstream task; for parameter identification under noisy measurements, the JAX\-AD result is well within the noise floor of any realistic measurement model\.

*\(iii\) When the discrete adjoint trails\.*The JAX\-DA results mirror the Cauer experiment: the route is competitive on small problems but the gradient bias induced by the per\-segment trapezoidal mesh and the uniform target grid grows withNN, ending at∼4%\\sim\\\!4\\%onebe\_\{b\}atN=15N=15\. This is not a fundamental property of the discrete adjoint, finer per\-segment meshes would close the gap, but it is a property of the padded fixed\-shape JIT regime that makes the route fast on small problems\.

## 6Discussion and Comparison with Existing Approaches

Most differentiable continuous\-time learning methods are formulated for ODEs\. Neural ODEs introduced the view of a neural network layer as the solution of an initial\-value problem and popularized adjoint sensitivity methods for memory\-efficient training\(Chenet al\.,[2018](https://arxiv.org/html/2605.05395#bib.bib26)\)\. Implementations such astorchdiffeqprovide differentiable ODE solvers, support direct backpropagation through solver operations, and expose an adjoint interface for constant\-memory reverse\-mode differentiation\(Chen,[2018](https://arxiv.org/html/2605.05395#bib.bib24)\)\. These tools are effective when the model is an ODE and the state evolves continuously between observation times\.

Event\-aware ODE methods extend this setting by adding event functions and reset maps\. They allow models such as bouncing balls or switching systems to be represented as smooth ODE flows separated by event\-triggered state updates\(Chenet al\.,[2021](https://arxiv.org/html/2605.05395#bib.bib25)\)\. In such methods, gradients can pass through event times, terminal states, and reset maps, provided the event sequence is fixed locally\. This is closely related to our AD\-through\-simulation approach\. The key difference is that our setting includes algebraic constraints: the simulator must solve for algebraic variables during integration, after events, and during output reconstruction\.

Other recent work uses ODEs to model irregular event sequences\. For example, COTODE constructs continuous latent trajectories for event\-sequence modeling by interpolating event embeddings and integrating a Neural ODE\-type model\(Kuleshovet al\.,[2024](https://arxiv.org/html/2605.05395#bib.bib23)\)\. This addresses a different problem\. In such models, events are observations or inputs to a latent representation\. In our setting, events are physical or logical constraints generated by guard equations; they trigger reinitialization rules that must remain consistent with algebraic equations\.

Classical DAE solvers and equation\-based simulation tools already provide robust mechanisms for forward simulation of DAEs with events, including root finding, state reinitialization, and consistency restoration\. However, they are not usually designed for end\-to\-end differentiable parameter learning\. Sensitivity analysis for DAEs is well established, but events add additional terms through event\-time shifts and reset maps\. Practical simulation tools also often expose the forward trajectory without exposing the residual\-level structure needed for efficient gradient\-based optimization\.

Our work occupies the intersection of these directions\. Compared with differentiable ODE solvers, we treat algebraic variables as implicit functions of time, differential states, and parameters, and differentiate them through the implicit function theorem\. Compared with event\-sequence Neural ODE models, we treat events as constraints of the physical model rather than as data tokens\. Compared with classical DAE simulation, we focus on differentiable parameter learning and provide both an AD\-through\-simulation implementation and an explicit discrete\-adjoint formulation\. Table[3](https://arxiv.org/html/2605.05395#S6.T3)summarizes the conceptual differences\.

Table 3:Comparison of differentiable event\-learning approaches\.The AD\-through\-simulation method is closest in spirit to differentiable ODE solvers with event handling\. Its advantage is implementation simplicity: once the DAE is reduced by solving the algebraic equations, the event\-driven simulation can be written as a differentiable program\. Its limitation is that the returned gradient is the derivative of the computational path selected by the forward event solver\.

The explicit discrete\-adjoint method is closer to constrained optimization\. It keeps the residual equations visible and solves for their Lagrange multipliers\. This gives direct control over the discrete residuals, event blocks, and event\-time stationarity conditions\. It also clarifies the theoretical role of residuals: they are not loss penalties, but equality constraints whose multipliers produce the adjoint gradient\.

The discrete\-adjoint construction used here is closely related to earlier work in the control and sensitivity\-analysis communities\. In power\-system and hybrid systems analysis, trajectory sensitivities for differential\-algebraic\-discrete models were developed with explicit jump conditions at switching and reset events\(Galánet al\.,[1999](https://arxiv.org/html/2605.05395#bib.bib13); Hiskens and Pai,[2000](https://arxiv.org/html/2605.05395#bib.bib14)\)\. Adjoint sensitivity methods were also developed for DAEs, including the derivation and numerical solution of adjoint DAE systems\(Caoet al\.,[2003](https://arxiv.org/html/2605.05395#bib.bib34)\)\. More recently, discrete\-adjoint methods have been implemented for hybrid dynamical systems with state\- and time\-dependent switching, with applications to preventive and corrective control\(Zhanget al\.,[2017](https://arxiv.org/html/2605.05395#bib.bib12)\)\. Our formulation follows this same first\-discretize\-then\-adjoint philosophy, but focuses on parameter learning for DAEs with event\-triggered reinitialization and makes the event\-split residual structure explicit\. he distinction in this paper is not the existence of adjoint methods for hybrid systems, but the combination of an event\-split DAE residual construction with a practical parameter\-learning workflow and a comparison against AD\-through\-simulation implementations\.

The algebraic solve used in our AD\-through\-simulation implementation should be viewed as a baseline realization of implicit differentiation rather than as a state\-of\-the\-art Modelica\-like compilation strategy\(Fritzson,[2014](https://arxiv.org/html/2605.05395#bib.bib7)\)\. We solve the algebraic equations with a Newton\-type iteration and differentiate the solution map using the implicit function theorem\. This is mathematically standard and sufficient for exposing the role of algebraic constraints in gradient computation\. However, production DAE solvers and Modelica tools typically perform substantial symbolic and structural preprocessing before any numerical nonlinear solve is called\. Classical DAE solvers such as DASSL and the SUNDIALS family provide robust implicit integration and nonlinear\-solver infrastructure for DAE/ODE systems\(Hindmarshet al\.,[2005](https://arxiv.org/html/2605.05395#bib.bib19); Petzold,[1982](https://arxiv.org/html/2605.05395#bib.bib20)\)\. In Modelica\-based workflows, the compiler first analyzes the equation structure, performs matching and sorting, identifies algebraic loops, and applies tearing to reduce the number of variables that must be solved simultaneously\(Casella,[2015](https://arxiv.org/html/2605.05395#bib.bib16); Elmqvist and Otter,[1994](https://arxiv.org/html/2605.05395#bib.bib18); Täuberet al\.,[2014](https://arxiv.org/html/2605.05395#bib.bib17); Zimmer,[2009](https://arxiv.org/html/2605.05395#bib.bib15)\)\. Thus, our Newton/implicit\-function approach should be understood as a clear differentiable reference implementation\. A production implementation could replace the full algebraic solve by the smaller torn algebraic loops produced by a Modelica compiler, thereby reducing both the forward nonlinear solve and the backward implicit linear solve\. Related work on equation\-based algorithmic differentiation also emphasizes that exploiting the equation structure of DAE models can be more memory efficient than differentiating a fully expanded simulation trace\(Elsheikhet al\.,[2015](https://arxiv.org/html/2605.05395#bib.bib1)\)\.

Julia and the SciML ecosystem are highly relevant to differentiable simulation\. Julia was designed to combine high\-level numerical programming with performance comparable to traditional compiled languages\(Bezansonet al\.,[2017](https://arxiv.org/html/2605.05395#bib.bib9)\), and DifferentialEquations\.jl provides a feature\-rich ecosystem for ODEs, DAEs, and hybrid differential equations\(Rackauckas and Nie,[2017](https://arxiv.org/html/2605.05395#bib.bib10)\)\. Therefore, a Julia/SciML implementation could be faster for some solver\-heavy or AD\-heavy workloads, especially when the solver, event handling, and sensitivity calculation are compiled and specialized together\. Our focus on JAX and PyTorch is motivated instead by their broader adoption in the machine learning community and the larger surrounding ecosystem of tooling, examples, GPU workflows, and developer support\(Bradburyet al\.,[2018](https://arxiv.org/html/2605.05395#bib.bib38); Paszkeet al\.,[2019](https://arxiv.org/html/2605.05395#bib.bib11)\)\. This difference also makes a direct runtime comparison potentially misleading: performance would depend not only on the programming language, but also on the chosen solver, event\-location strategy, algebraic\-loop handling, AD mode, compiler specialization, batching strategy, and hardware backend\. For this reason, we treat Julia/SciML as an important complementary implementation path, but not as a direct baseline unless both implementations are engineered to the same level of solver and compiler optimization\.

## 7Conclusion

This paper studied differentiable parameter optimization for semi\-explicit DAEs with state\-dependent events\. The central difficulty is that the parameter\-to\-output map is shaped not only by the continuous DAE dynamics, but also by algebraic consistency constraints, parameter\-dependent event times, and event\-triggered reset maps\. These features make direct application of standard ODE\-based differentiable simulation methods insufficient without additional care\.

We presented two complementary approaches\. The first is an AD\-through\-simulation method that reduces the DAE by solving the algebraic variables inside the vector field and differentiates the algebraic solve using the implicit function theorem\. Events are handled through segmented integration, composite guard detection, reset maps, and right\-continuous target\-time evaluation\. This approach is relatively simple to implement in modern AD frameworks and naturally fits JAX/PyTorch\-style differentiable programming\.

The second approach is an explicit discrete\-adjoint method\. It represents the forward simulation as an event\-split residual system, with trapezoidal DAE residuals on smooth segments and event residuals at guard crossings\. The residuals are treated as equality constraints rather than penalty terms\. Their Lagrange multipliers define the discrete adjoint variables, and a backward event\-split sweep computes the parameter gradient while avoiding explicit trajectory and event\-time sensitivity matrices\.

The two methods offer different tradeoffs\. AD\-through\-simulation benefits from the flexibility of differentiable programming and is easier to integrate with machine\-learning toolchains\. The explicit discrete\-adjoint method does not require an AD\-through simulation engine, it gives more control over the numerical residuals, event\-time stationarity conditions, and linear algebra used in the gradient computation\. In both cases, the gradients are local to the event path selected by the forward simulation and rely on transversal guard crossings and unchanged event ordering under small parameter perturbations\.

Future work will focus on improving robustness near nonsmooth event transitions, including grazing contacts, simultaneous events, and event\-order changes\. A second direction is to integrate more advanced DAE compilation techniques, such as algebraic\-loop detection and tearing, so that both the forward algebraic solve and the backward implicit solve operate on smaller systems\. Finally, larger\-scale comparisons across AD frameworks, Modelica\-based toolchains, and production DAE solvers would help clarify the practical performance and scalability limits of differentiable optimization for hybrid DAE models\.

## References

- \[1\]\(1998\)Computer methods for ordinary differential equations and differential\-algebraic equations\.Society for Industrial and Applied Mathematics,Philadelphia, PA\.External Links:[Document](https://dx.doi.org/10.1137/1.9781611971392)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p1.1)\.
- \[2\]J\. Bezanson, A\. Edelman, S\. Karpinski, and V\. B\. Shah\(2017\)Julia: a fresh approach to numerical computing\.SIAM Review59\(1\),pp\. 65–98\.External Links:[Document](https://dx.doi.org/10.1137/141000671)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p10.1)\.
- \[3\]J\. Bradbury, R\. Frostig, P\. Hawkins, M\. J\. Johnson, C\. Leary, D\. Maclaurin, G\. Necula, A\. Paszke, J\. VanderPlas, S\. Wanderman\-Milne, and Q\. Zhang\(2018\)JAX: composable transformations of Python\+NumPy programs\.External Links:[Link](http://github.com/google/jax)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p10.1)\.
- \[4\]Y\. Cao, S\. Li, L\. Petzold, and R\. Serban\(2003\)Adjoint sensitivity analysis for differential\-algebraic equations: the adjoint DAE system and its numerical solution\.SIAM Journal on Scientific Computing24\(3\),pp\. 1076–1089\.Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p8.1)\.
- \[5\]F\. Casella\(2015\)Simulation of large\-scale models in Modelica: state of the art and future perspectives\.InProceedings of the 11th International Modelica Conference,Versailles, France,pp\. 459–468\.External Links:[Document](https://dx.doi.org/10.3384/ecp15118459)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[6\]R\. T\. Q\. Chen, B\. Amos, and M\. Nickel\(2021\)Learning neural event functions for ordinary differential equations\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=kW_zpEmMLdP)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p3.1),[§6](https://arxiv.org/html/2605.05395#S6.p2.1)\.
- \[7\]R\. T\. Q\. Chen, Y\. Rubanova, J\. Bettencourt, and D\. K\. Duvenaud\(2018\)Neural ordinary differential equations\.InAdvances in Neural Information Processing Systems,Vol\.31,pp\. 6571–6583\.External Links:[Link](https://papers.neurips.cc/paper_files/paper/2018/hash/69386f6bb1dfed68692a24c8686939b9-Abstract.html)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p3.1),[§6](https://arxiv.org/html/2605.05395#S6.p1.1)\.
- \[8\]R\. T\. Q\. Chen\(2018\)torchdiffeq: differentiable ode solvers with full gpu support and o\(1\)\-memory backpropagation\.Note:https://github\.com/rtqichen/torchdiffeqGitHub repositoryCited by:[§1](https://arxiv.org/html/2605.05395#S1.p3.1),[§6](https://arxiv.org/html/2605.05395#S6.p1.1)\.
- \[9\]H\. Elmqvist and M\. Otter\(1994\)Methods for tearing systems of equations in object\-oriented modeling\.InProceedings of the European Simulation Multiconference,Barcelona, Spain,pp\. 326–332\.Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[10\]A\. Elsheikh, F\. Casella, D\. Zimmer, and W\. Schamai\(2015\)An equation\-based algorithmic differentiation technique for differential algebraic equations\.Journal of Computational and Applied Mathematics281,pp\. 135–151\.External Links:[Document](https://dx.doi.org/10.1016/j.cam.2014.12.006)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[11\]P\. Fritzson\(2014\)Principles of object\-oriented modeling and simulation with modelica 3\.3: a cyber\-physical approach\.John Wiley & Sons\.External Links:ISBN 978\-1118859124,[Link](https://onlinelibrary.wiley.com/doi/book/10.1002/9781118989166)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[12\]S\. Galán, W\. F\. Feehery, and P\. I\. Barton\(1999\)Parametric sensitivity functions for hybrid discrete/continuous systems\.Applied Numerical Mathematics31\(1\),pp\. 17–47\.External Links:[Document](https://dx.doi.org/10.1016/S0168-9274%2898%2900125-1)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p8.1)\.
- \[13\]E\. Hairer and G\. Wanner\(1996\)Solving ordinary differential equations ii: stiff and differential\-algebraic problems\.2 edition,Springer Series in Computational Mathematics, Vol\.14,Springer,Berlin, Heidelberg\.External Links:[Document](https://dx.doi.org/10.1007/978-3-642-05221-7)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p1.1)\.
- \[14\]A\. C\. Hindmarsh, P\. N\. Brown, K\. E\. Grant, S\. L\. Lee, R\. Serban, D\. E\. Shumaker, and C\. S\. Woodward\(2005\)SUNDIALS: suite of nonlinear and differential/algebraic equation solvers\.ACM Transactions on Mathematical Software31\(3\),pp\. 363–396\.External Links:[Document](https://dx.doi.org/10.1145/1089014.1089020)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[15\]I\. A\. Hiskens and M\. A\. Pai\(2000\)Trajectory sensitivity analysis of hybrid systems\.IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications47\(2\),pp\. 204–220\.External Links:[Document](https://dx.doi.org/10.1109/81.828574)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p8.1)\.
- \[16\]P\. Kidger\(2021\)Diffrax: numerical differential equation solvers in JAX\.Note:https://github\.com/patrick\-kidger/diffraxSoftware libraryCited by:[§1](https://arxiv.org/html/2605.05395#S1.p3.1),[§1](https://arxiv.org/html/2605.05395#S1.p5.1)\.
- \[17\]N\. J\. Kong, J\. J\. Payne, J\. Zhu, and A\. M\. Johnson\(2024\)Saltation matrices: the essential tool for linearizing hybrid dynamical systems\.Proceedings of the IEEE\.External Links:[Document](https://dx.doi.org/10.1109/JPROC.2024.3433157)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p8.1)\.
- \[18\]I\. Kuleshov, G\. Boeva, V\. Zhuzhel, E\. Romanenkova, E\. Vorsin, and A\. Zaytsev\(2024\)COTODE: COntinuous Trajectory neural Ordinary Differential Equations for modelling event sequences\.External Links:2408\.08055,[Link](https://arxiv.org/abs/2408.08055)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p3.1),[§6](https://arxiv.org/html/2605.05395#S6.p3.1)\.
- \[19\]Modelica Association\(2025\)Modelica language specification: appendix b, modelica dae representation\.Modelica Association\.Note:Accessed 2026\-05\-04External Links:[Link](https://specification.modelica.org/master/modelica-dae-representation.html)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p1.1)\.
- \[20\]A\. Paszke, S\. Gross, F\. Massa, A\. Lerer, J\. Bradbury, G\. Chanan, T\. Killeen, Z\. Lin, N\. Gimelshein, L\. Antiga, A\. Desmaison, A\. Köpf, E\. Yang, Z\. DeVito, M\. Raison, A\. Tejani, S\. Chilamkurthy, B\. Steiner, L\. Fang, J\. Bai, and S\. Chintala\(2019\)PyTorch: an imperative style, high\-performance deep learning library\.InAdvances in Neural Information Processing Systems,Vol\.32\.External Links:[Link](https://papers.neurips.cc/paper_files/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p10.1)\.
- \[21\]L\. R\. Petzold\(1982\-09\)Description of DASSL: a differential/algebraic system solver\.Technical reportSandia National Laboratories\.External Links:[Link](https://www.osti.gov/biblio/5882821)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[22\]C\. Rackauckas and Q\. Nie\(2017\)DifferentialEquations\.jl – a performant and feature\-rich ecosystem for solving differential equations in Julia\.Journal of Open Research Software5\(1\),pp\. 15\.External Links:[Document](https://dx.doi.org/10.5334/jors.151)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p10.1)\.
- \[23\]A\. Saccon, N\. van de Wouw, and H\. Nijmeijer\(2014\)Sensitivity analysis of hybrid systems with state jumps with application to trajectory tracking\.InProceedings of the 53rd IEEE Conference on Decision and Control,pp\. 3065–3070\.External Links:[Document](https://dx.doi.org/10.1109/CDC.2014.7039877)Cited by:[§1](https://arxiv.org/html/2605.05395#S1.p8.1)\.
- \[24\]P\. Täuber, L\. Ochel, W\. Braun, and B\. Bachmann\(2014\)Practical realization and adaptation of Cellier’s tearing method\.InProceedings of the 6th International Workshop on Equation\-Based Object\-Oriented Modeling Languages and Tools,pp\. 11–19\.External Links:[Document](https://dx.doi.org/10.1145/2666202.2666204)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.
- \[25\]H\. Zhang, S\. Abhyankar, E\. M\. Constantinescu, and M\. Anitescu\(2017\)Discrete adjoint sensitivity analysis of hybrid dynamical systems with switching\.IEEE Transactions on Circuits and Systems I: Regular Papers64\(5\),pp\. 1247–1259\.External Links:[Document](https://dx.doi.org/10.1109/TCSI.2017.2651683)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p8.1)\.
- \[26\]D\. Zimmer\(2009\)Module\-preserving compilation of Modelica models\.InProceedings of the 7th International Modelica Conference,Como, Italy,pp\. 556–565\.External Links:[Link](https://ep.liu.se/en/conference-article.aspx?Article_No=104&issue=43&series=ecp)Cited by:[§6](https://arxiv.org/html/2605.05395#S6.p9.1)\.

## Appendix ADetails of the AD\-through\-Simulation DAE Method

This appendix gives the mathematical and implementation details of the AD\-through\-simulation method for DAEs with state\-dependent events\. The method treats the event\-driven simulator as a differentiable program\. A loss is evaluated by simulating the DAE at prescribed observation times, and the parameter gradient is obtained by automatic differentiation through the simulation path selected during the forward pass\.

### A\.1DAE specification and parameter layout

The model is specified by differential statesx​\(t\)∈ℝnxx\(t\)\\in\\mathbb\{R\}^\{n\_\{x\}\}, algebraic variablesz​\(t\)∈ℝnzz\(t\)\\in\\mathbb\{R\}^\{n\_\{z\}\}, parametersp∈ℝnpp\\in\\mathbb\{R\}^\{n\_\{p\}\}, and outputsy​\(t\)∈ℝnyy\(t\)\\in\\mathbb\{R\}^\{n\_\{y\}\}\. The semi\-explicit DAE has the form

x˙=f​\(t,x,z,p\),0=g​\(t,x,z,p\),y=h​\(t,x,z,p\),\\dot\{x\}=f\(t,x,z,p\),\\qquad 0=g\(t,x,z,p\),\\qquad y=h\(t,x,z,p\),whereffis the differential\-state vector field,ggis the algebraic constraint, andhhis the output map\.

Only a subset of the full parameter vector may be optimized\. Letpopt∈ℝnoptp\_\{\\mathrm\{opt\}\}\\in\\mathbb\{R\}^\{n\_\{\\mathrm\{opt\}\}\}denote the optimized parameter vector, and letIoptI\_\{\\mathrm\{opt\}\}denote the corresponding indices inside the full parameter vectorpp\. During loss evaluation, the full parameter vector is reconstructed as

p=insert⁡\(pbase,Iopt,popt\),p=\\operatorname\{insert\}\\left\(p\_\{\\mathrm\{base\}\},I\_\{\\mathrm\{opt\}\},p\_\{\\mathrm\{opt\}\}\\right\),wherepbasep\_\{\\mathrm\{base\}\}contains the fixed baseline values for all parameters not being optimized\. If no explicit output maphhis specified, the differential statexxis used as the output\.

The equations definingff,gg,hh, and the event guards are compiled into JAX\-compatible numerical functions\. This avoids per\-equation Python evaluation inside the JIT\-compiled simulation and gives fixed\-shape functions of\(t,x,z,p\)\(t,x,z,p\)\.

### A\.2Algebraic variables as implicit functions

The numerical ODE solver advances only the differential statexx\. At each vector\-field evaluation, the algebraic variablezzis obtained by solving

g​\(t,x,z,p\)=0\.g\(t,x,z,p\)=0\.When this algebraic equation has a locally unique solution, we denote it by

z=ζ​\(t,x,p\),g​\(t,x,ζ​\(t,x,p\),p\)=0\.z=\\zeta\(t,x,p\),\\qquad g\(t,x,\\zeta\(t,x,p\),p\)=0\.The DAE is then integrated through the reduced vector field

x˙=F​\(t,x,p\)=f​\(t,x,ζ​\(t,x,p\),p\),\\dot\{x\}=F\(t,x,p\)=f\(t,x,\\zeta\(t,x,p\),p\),whereFFis the vector field passed to the ODE solver\.

In the implementation,ζ​\(t,x,p\)\\zeta\(t,x,p\)is computed by a chord\-Newton iteration\. Letz\(0\)z^\{\(0\)\}denote the initial algebraic guess and letQQdenote the number of chord\-Newton iterations\. A regularized algebraic Jacobian is formed as

Gz=∂g∂z​\(t,x,z\(0\),p\)\+ϵz​I,G\_\{z\}=\\frac\{\\partial g\}\{\\partial z\}\(t,x,z^\{\(0\)\},p\)\+\\epsilon\_\{z\}I,whereϵz\>0\\epsilon\_\{z\}\>0is a small regularization constant andIIis the identity matrix of dimensionnzn\_\{z\}\. The iteration is

z\(q\+1\)=z\(q\)−Gz−1​g​\(t,x,z\(q\),p\),q=0,…,Q−1\.z^\{\(q\+1\)\}=z^\{\(q\)\}\-G\_\{z\}^\{\-1\}g\(t,x,z^\{\(q\)\},p\),\\qquad q=0,\\ldots,Q\-1\.The same factorization ofGzG\_\{z\}is reused across these chord iterations\.

The algebraic solve is differentiated using a custom JVP based on the implicit function theorem\. Let

gt=∂g∂t,gx=∂g∂x,gz=∂g∂z,gp=∂g∂p,g\_\{t\}=\\frac\{\\partial g\}\{\\partial t\},\\qquad g\_\{x\}=\\frac\{\\partial g\}\{\\partial x\},\\qquad g\_\{z\}=\\frac\{\\partial g\}\{\\partial z\},\\qquad g\_\{p\}=\\frac\{\\partial g\}\{\\partial p\},all evaluated at\(t,x,ζ​\(t,x,p\),p\)\(t,x,\\zeta\(t,x,p\),p\)\. For perturbations\(d​t,d​x,d​p\)\(dt,dx,dp\)in\(t,x,p\)\(t,x,p\), the corresponding algebraic perturbation isd​zdz\. Differentiating

g​\(t,x,ζ​\(t,x,p\),p\)=0g\(t,x,\\zeta\(t,x,p\),p\)=0gives

gz​d​z\+gt​d​t\+gx​d​x\+gp​d​p=0\.g\_\{z\}\\,dz\+g\_\{t\}\\,dt\+g\_\{x\}\\,dx\+g\_\{p\}\\,dp=0\.Therefore,

d​z=−gz−1​\(gt​d​t\+gx​d​x\+gp​d​p\)\.dz=\-g\_\{z\}^\{\-1\}\\left\(g\_\{t\}\\,dt\+g\_\{x\}\\,dx\+g\_\{p\}\\,dp\\right\)\.\(A\.1\)Equation \([A\.1](https://arxiv.org/html/2605.05395#A1.Ex11)\) is the JVP used for the algebraic solve\. It allows gradients to propagate through the algebraic constraint without differentiating through every Newton iteration\.

### A\.3Event guards and reset maps

Events are specified by guard conditions and reinitialization rules\. Letnen\_\{e\}denote the number of event guards\. Each guard is represented as a scalar function

ϕe​\(t,x,z,p\),e=1,…,ne\.\\phi\_\{e\}\(t,x,z,p\),\\qquad e=1,\\ldots,n\_\{e\}\.The implementation uses the sign convention

ϕe\>0before the event,ϕe=0at the event,ϕe<0after the event\.\\phi\_\{e\}\>0\\quad\\text\{before the event\},\\qquad\\phi\_\{e\}=0\\quad\\text\{at the event\},\\qquad\\phi\_\{e\}<0\\quad\\text\{after the event\}\.Thus, all events are represented as positive\-to\-nonpositive crossings\. For example, a conditiona\>ba\>bis rewritten asϕ=b−a\\phi=b\-a, while a conditiona<ba<bis rewritten asϕ=a−b\\phi=a\-b\.

The event vector is

Φ​\(t,x,z,p\)=\[ϕ1​\(t,x,z,p\)⋮ϕne​\(t,x,z,p\)\]\.\\Phi\(t,x,z,p\)=\\begin\{bmatrix\}\\phi\_\{1\}\(t,x,z,p\)\\\\ \\vdots\\\\ \\phi\_\{n\_\{e\}\}\(t,x,z,p\)\\end\{bmatrix\}\.Since guards may depend on algebraic variables, the event vector used by the solver is evaluated after solving the algebraic equation:

Φ​\(t,x,p\)=Φ​\(t,x,ζ​\(t,x,p\),p\)\.\\Phi\(t,x,p\)=\\Phi\(t,x,\\zeta\(t,x,p\),p\)\.
Each event typeeehas an associated reset map

x\+=Ψe​\(t,x−,z−,p\),x^\{\+\}=\\Psi\_\{e\}\(t,x^\{\-\},z^\{\-\},p\),wherex−x^\{\-\}andz−z^\{\-\}are the pre\-event differential and algebraic variables, andx\+x^\{\+\}is the post\-event differential state\. The implementation supports direct reinitialization of differential states\. Algebraic variables are not reset directly\. Instead, afterx\+x^\{\+\}is computed, the post\-event algebraic variablez\+z^\{\+\}is recomputed from

g​\(t,x\+,z\+,p\)=0\.g\(t,x^\{\+\},z^\{\+\},p\)=0\.
For a fixed event indexee, the reset map is differentiated as an ordinary JAX function\. Its local differential is

d​x\+=Ψe,x​d​x−\+Ψe,z​d​z−\+Ψe,p​d​p\+Ψe,t​d​t,dx^\{\+\}=\\Psi\_\{e,x\}\\,dx^\{\-\}\+\\Psi\_\{e,z\}\\,dz^\{\-\}\+\\Psi\_\{e,p\}\\,dp\+\\Psi\_\{e,t\}\\,dt,whereΨe,x\\Psi\_\{e,x\},Ψe,z\\Psi\_\{e,z\},Ψe,p\\Psi\_\{e,p\}, andΨe,t\\Psi\_\{e,t\}denote the partial derivatives ofΨe\\Psi\_\{e\}with respect tox−x^\{\-\},z−z^\{\-\},pp, andtt, respectively\.

### A\.4Segmented event simulation

The simulation is represented as a sequence of smooth segments separated by events\. Lettst\_\{s\}andxsx\_\{s\}denote the start time and initial differential state of a segment\. The algebraic state at the segment start is

zs=ζ​\(ts,xs,p\)\.z\_\{s\}=\\zeta\(t\_\{s\},x\_\{s\},p\)\.The event vector is evaluated at the segment start, and an active\-event mask is formed:

αe=𝟏​\{ϕe​\(ts,xs,zs,p\)\>εϕ\},e=1,…,ne\.\\alpha\_\{e\}=\\mathbf\{1\}\\\{\\phi\_\{e\}\(t\_\{s\},x\_\{s\},z\_\{s\},p\)\>\\varepsilon\_\{\\phi\}\\\},\\qquad e=1,\\ldots,n\_\{e\}\.Hereαe∈\{0,1\}\\alpha\_\{e\}\\in\\\{0,1\\\}indicates whether guardeeis active for event detection on the current segment, andεϕ\>0\\varepsilon\_\{\\phi\}\>0is a small threshold\. Guards already active at the segment boundary are masked out to avoid immediate retriggering after a reset\.

The active guards are combined into a scalar event condition

Φmin​\(t,x,p;α\)=mine=1,…,ne⁡\{ϕe​\(t,x,ζ​\(t,x,p\),p\),αe=1,C,αe=0,\\Phi\_\{\\min\}\(t,x,p;\\alpha\)=\\min\_\{e=1,\\ldots,n\_\{e\}\}\\begin\{cases\}\\phi\_\{e\}\(t,x,\\zeta\(t,x,p\),p\),&\\alpha\_\{e\}=1,\\\\ C,&\\alpha\_\{e\}=0,\\end\{cases\}whereC\>0C\>0is a large constant\. The ODE solver advances the reduced system

x˙=F​\(t,x,p\)\\dot\{x\}=F\(t,x,p\)until either the terminal timeTTis reached or the composite guardΦmin\\Phi\_\{\\min\}crosses zero\. A bracketing root finder is then used to refine the event time\.

If an event is detected at timeτ\\tau, the active event index is selected as

e⋆=arg⁡mine:αe=1⁡ϕe​\(τ,x​\(τ−\),ζ​\(τ,x​\(τ−\),p\),p\)\.e^\{\\star\}=\\arg\\min\_\{e:\\alpha\_\{e\}=1\}\\phi\_\{e\}\(\\tau,x\(\\tau^\{\-\}\),\\zeta\(\\tau,x\(\\tau^\{\-\}\),p\),p\)\.Herex​\(τ−\)x\(\\tau^\{\-\}\)denotes the pre\-event differential state\. The pre\-event algebraic variable is

z−=ζ​\(τ,x​\(τ−\),p\),z^\{\-\}=\\zeta\(\\tau,x\(\\tau^\{\-\}\),p\),and the reset gives

x​\(τ\+\)=Ψe⋆​\(τ,x​\(τ−\),z−,p\)\.x\(\\tau^\{\+\}\)=\\Psi\_\{e^\{\\star\}\}\(\\tau,x\(\\tau^\{\-\}\),z^\{\-\},p\)\.The next segment starts from the post\-event pair\(τ,x​\(τ\+\)\)\(\\tau,x\(\\tau^\{\+\}\)\)\.

For JIT compilation, event detection is implemented as a fixed\-length scan over a prescribed maximum number of segments, denoted byKmaxK\_\{\\max\}\. Each scan step records

\(ak,bk,xk,rk\),k=1,…,Kmax\.\(a\_\{k\},b\_\{k\},x\_\{k\},r\_\{k\}\),\\qquad k=1,\\ldots,K\_\{\\max\}\.Hereaka\_\{k\}is the start time of segmentkk,bkb\_\{k\}is the segment end time \(either an event time or the terminal time\),xkx\_\{k\}is the differential state ataka\_\{k\}, andrk∈\{0,1\}r\_\{k\}\\in\\\{0,1\\\}indicates whether the segment is a real physical segment or padding\. If the terminal time is reached beforeKmaxK\_\{\\max\}segments are used, the remaining scan iterations are degenerate padding segments\. If the scan reachesKmaxK\_\{\\max\}before the terminal time, the trajectory is marked as saturated and the loss is made non\-finite to avoid optimizing against an invalid simulation\.

### A\.5Target\-time evaluation and output reconstruction

After event detection, each real segment is reintegrated to evaluate the trajectory at the observation times\. Let

t1,…,tNt\_\{1\},\\ldots,t\_\{N\}denote the observation or target times\. For segmentkkwith interval\[ak,bk\]\[a\_\{k\},b\_\{k\}\], each target timetit\_\{i\}is clipped into the segment interval:

t~k,i=min⁡\{max⁡\{ti,ak\},bk\}\.\\tilde\{t\}\_\{k,i\}=\\min\\\{\\max\\\{t\_\{i\},a\_\{k\}\\\},b\_\{k\}\\\}\.The segment is then integrated with output saved at all clipped times, giving

Xk,i≈x​\(t~k,i\)\.X\_\{k,i\}\\approx x\(\\tilde\{t\}\_\{k,i\}\)\.The arrayXk,iX\_\{k,i\}contains candidate state values for all segment\-target pairs\. Under hard selection, only the value associated with the segment that containstit\_\{i\}is used\.

With right\-continuous hard selection, the selected segment for target timetit\_\{i\}is

k⋆​\(i\)=max⁡\{k:rk=1,ti∈\[ak,bk\]\}\.k^\{\\star\}\(i\)=\\max\\left\\\{k:r\_\{k\}=1,\\;t\_\{i\}\\in\[a\_\{k\},b\_\{k\}\]\\right\\\}\.The predicted differential state is

x^​\(ti;p\)=Xk⋆​\(i\),i\.\\hat\{x\}\(t\_\{i\};p\)=X\_\{k^\{\\star\}\(i\),i\}\.Thus, iftit\_\{i\}is exactly an event time, the target is assigned to the post\-event segment\.

The implementation also supports optional soft segment selection\. For each segment\-target pair, define

ωk,i=σ​\(β​\(ti−ak\)\)​σ​\(β​\(bk−ti\)\)​rk,\\omega\_\{k,i\}=\\sigma\\\!\\left\(\\beta\(t\_\{i\}\-a\_\{k\}\)\\right\)\\sigma\\\!\\left\(\\beta\(b\_\{k\}\-t\_\{i\}\)\\right\)r\_\{k\},whereσ​\(s\)=1/\(1\+exp⁡\(−s\)\)\\sigma\(s\)=1/\(1\+\\exp\(\-s\)\)is the logistic sigmoid andβ\>0\\beta\>0controls the sharpness of the segment\-membership transition\. The normalized weights are

ω¯k,i=ωk,i∑ℓ=1Kmaxωℓ,i\+ϵω,\\bar\{\\omega\}\_\{k,i\}=\\frac\{\\omega\_\{k,i\}\}\{\\sum\_\{\\ell=1\}^\{K\_\{\\max\}\}\\omega\_\{\\ell,i\}\+\\epsilon\_\{\\omega\}\},whereϵω\>0\\epsilon\_\{\\omega\}\>0prevents division by zero\. The blended state is

x^​\(ti;p\)=∑k=1Kmaxω¯k,i​Xk,i\.\\hat\{x\}\(t\_\{i\};p\)=\\sum\_\{k=1\}^\{K\_\{\\max\}\}\\bar\{\\omega\}\_\{k,i\}X\_\{k,i\}\.\(A\.2\)Asβ→∞\\beta\\to\\infty, this approaches hard segment selection\. For finiteβ\\beta, the predicted state changes more smoothly near event boundaries\.

Oncex^​\(ti;p\)\\hat\{x\}\(t\_\{i\};p\)is available, the algebraic variable and output are reconstructed as

z^​\(ti;p\)=ζ​\(ti,x^​\(ti;p\),p\),\\hat\{z\}\(t\_\{i\};p\)=\\zeta\(t\_\{i\},\\hat\{x\}\(t\_\{i\};p\),p\),and

y^​\(ti;p\)=h​\(ti,x^​\(ti;p\),z^​\(ti;p\),p\)\.\\hat\{y\}\(t\_\{i\};p\)=h\(t\_\{i\},\\hat\{x\}\(t\_\{i\};p\),\\hat\{z\}\(t\_\{i\};p\),p\)\.

### A\.6Loss, gradient, and outer optimization

Letyidatay\_\{i\}^\{\\mathrm\{data\}\}denote the measured or reference output at target timetit\_\{i\}\. The loss is

𝒥​\(popt\)=∑i=1N‖y^​\(ti;popt\)−yidata‖22,\\mathcal\{J\}\(p\_\{\\mathrm\{opt\}\}\)=\\sum\_\{i=1\}^\{N\}\\left\\\|\\hat\{y\}\(t\_\{i\};p\_\{\\mathrm\{opt\}\}\)\-y\_\{i\}^\{\\mathrm\{data\}\}\\right\\\|\_\{2\}^\{2\},or the corresponding mean loss\. The full loss evaluation reconstructs the full parameter vectorpp, simulates the event\-driven DAE, evaluatesy^​\(ti;popt\)\\hat\{y\}\(t\_\{i\};p\_\{\\mathrm\{opt\}\}\)at all target times, and computes the output mismatch\.

The implementation computes

\(𝒥​\(popt\),∇popt𝒥​\(popt\)\)\\left\(\\mathcal\{J\}\(p\_\{\\mathrm\{opt\}\}\),\\nabla\_\{p\_\{\\mathrm\{opt\}\}\}\\mathcal\{J\}\(p\_\{\\mathrm\{opt\}\}\)\\right\)with a JIT\-compiledvalue\_and\_gradcall\. Gradients propagate through the algebraic solve, reduced ODE integration, event\-time computation, reset map, target\-time selection, and output reconstruction\.

The resulting gradient can be used by a first\-order method such as Adam or by a quasi\-Newton method such as L\-BFGS\-B\. In both cases, the optimizer uses the same JIT\-compiled loss and gradient\. The optimization driver records the loss, gradient norm, parameter values, and iteration time, and stops if the loss or gradient becomes non\-finite\.

### A\.7Local validity

The AD\-through\-simulation gradient is local to the event path selected during the forward pass\. It assumes that small parameter perturbations do not change the number of events, their active indices, or their ordering\. It also assumes transversal guard crossings\. For eventeeat timeτ\\tau, this means

dd​t​ϕe​\(t,x​\(t\),z​\(t\),p\)\|t=τ−≠0,\\frac\{d\}\{dt\}\\phi\_\{e\}\(t,x\(t\),z\(t\),p\)\\bigg\|\_\{t=\\tau^\{\-\}\}\\neq 0,whereτ−\\tau^\{\-\}denotes the left limit before the event\.

If an event becomes grazing, two events become simultaneous, or the event ordering changes, the reduced simulation map can be nonsmooth\. In that case, the returned gradient should be interpreted as the derivative of the selected computational path rather than as a globally smooth sensitivity\.

## Appendix BDetails of the Explicit Discrete\-Adjoint Method

This appendix describes the discrete\-adjoint method\. Unlike the AD\-through\-simulation approach, this method keeps the discretized DAE and event constraints visible and computes the gradient by solving for the corresponding Lagrange multipliers\.

### B\.1Event\-split discrete trajectory

The forward simulation produces an event\-split trajectory\. Smooth segmentmmstores discrete values

\{tm,k,xm,k,zm,k,x˙m,k\}k=0Nm−1,\\\{t\_\{m,k\},x\_\{m,k\},z\_\{m,k\},\\dot\{x\}\_\{m,k\}\\\}\_\{k=0\}^\{N\_\{m\}\-1\},wheretm,kt\_\{m,k\}is thekk\-th time node in segmentmm,xm,kx\_\{m,k\}is the differential state,zm,kz\_\{m,k\}is the algebraic state,x˙m,k\\dot\{x\}\_\{m,k\}is the state derivative, andNmN\_\{m\}is the number of nodes in the segment\.

Each event record stores

τm,em,wm−,wm\+\.\\tau\_\{m\},\\qquad e\_\{m\},\\qquad w\_\{m\}^\{\-\},\\qquad w\_\{m\}^\{\+\}\.Hereτm\\tau\_\{m\}is themm\-th event time,em∈\{1,…,ne\}e\_\{m\}\\in\\\{1,\\ldots,n\_\{e\}\\\}is the index of the event guard that fired, andnen\_\{e\}is the number of event guards\. The variableswm−w\_\{m\}^\{\-\}andwm\+w\_\{m\}^\{\+\}denote the pre\- and post\-event DAE variables:

wm−=\[xm−zm−\],wm\+=\[xm\+zm\+\]\.w\_\{m\}^\{\-\}=\\begin\{bmatrix\}x\_\{m\}^\{\-\}\\\\ z\_\{m\}^\{\-\}\\end\{bmatrix\},\\qquad w\_\{m\}^\{\+\}=\\begin\{bmatrix\}x\_\{m\}^\{\+\}\\\\ z\_\{m\}^\{\+\}\\end\{bmatrix\}\.The indexeme\_\{m\}selects the event\-specific guard, reset equations, and continuity equations used in the event residual\.

For fixed\-shape compilation, the event\-split trajectory is stored in padded block arrays\. Each block is either padding, a smooth segment, or an event:

0=padding,1=segment,2=event\.\\texttt\{0\}=\\text\{padding\},\\qquad\\texttt\{1\}=\\text\{segment\},\\qquad\\texttt\{2\}=\\text\{event\}\.For a segment bounded by event timesτm\\tau\_\{m\}andτm\+1\\tau\_\{m\+1\}, physical node times are reconstructed from normalized coordinates:

tm,k=τm\+ηm,k​\(τm\+1−τm\),0≤ηm,k≤1\.t\_\{m,k\}=\\tau\_\{m\}\+\\eta\_\{m,k\}\(\\tau\_\{m\+1\}\-\\tau\_\{m\}\),\\qquad 0\\leq\\eta\_\{m,k\}\\leq 1\.\(B\.1\)Thus, event times determine both the segment boundaries and the physical time grid used in the residuals\.

### B\.2Loss derivatives and adjoint loads

The discrete loss is evaluated by interpolating the event\-split trajectory at the observation times:

𝒥h​\(W,τ,p\)=1Nobs​∑i=1Nobs‖y^​\(ti;W,τ,p\)−yidata‖22\.\\mathcal\{J\}\_\{h\}\(W,\\tau,p\)=\\frac\{1\}\{N\_\{\\mathrm\{obs\}\}\}\\sum\_\{i=1\}^\{N\_\{\\mathrm\{obs\}\}\}\\left\\\|\\hat\{y\}\(t\_\{i\};W,\\tau,p\)\-y\_\{i\}^\{\\mathrm\{data\}\}\\right\\\|\_\{2\}^\{2\}\.HereWWdenotes all discrete differential and algebraic variables over all segments,τ=\(τ1,…,τM\)\\tau=\(\\tau\_\{1\},\\ldots,\\tau\_\{M\}\)denotes the event times, andppdenotes the model parameters\.

Automatic differentiation is used only for this loss\-evaluation map, producing the direct derivatives

ℓW=∂𝒥h∂W,ℓp=∂𝒥h∂p,ℓτ=∂𝒥h∂τ\.\\ell\_\{W\}=\\frac\{\\partial\\mathcal\{J\}\_\{h\}\}\{\\partial W\},\\qquad\\ell\_\{p\}=\\frac\{\\partial\\mathcal\{J\}\_\{h\}\}\{\\partial p\},\\qquad\\ell\_\{\\tau\}=\\frac\{\\partial\\mathcal\{J\}\_\{h\}\}\{\\partial\\tau\}\.For an individual nodewm,kw\_\{m,k\}, we write

ℓm,k=∂𝒥h∂wm,k\.\\ell\_\{m,k\}=\\frac\{\\partial\\mathcal\{J\}\_\{h\}\}\{\\partial w\_\{m,k\}\}\.The vectorℓp\\ell\_\{p\}is the direct parameter derivative of the loss\. The vectorℓτ\\ell\_\{\\tau\}contains the direct derivatives of the loss with respect to the event times\.

During the backward sweep, we use the following accumulated quantities\. The symbolaadenotes an adjoint load with respect to a node variableww\. The symbolqpq\_\{p\}denotes the running accumulator for the parameter gradient\. The symbolsqtsq\_\{t\_\{s\}\}andqteq\_\{t\_\{e\}\}denote accumulated sensitivities with respect to the start and end times of a smooth segment\. When a segment boundary is an event time, these boundary\-time sensitivities contribute to stationarity with respect to that event time\.

### B\.3Smooth\-step residuals

For one smooth step in segmentmm, define

wc=wm,k,wn=wm,k\+1,tc=tm,k,tn=tm,k\+1\.w\_\{c\}=w\_\{m,k\},\\qquad w\_\{n\}=w\_\{m,k\+1\},\\qquad t\_\{c\}=t\_\{m,k\},\\qquad t\_\{n\}=t\_\{m,k\+1\}\.The subscriptsccandnndenote the current and next nodes in the forward time direction\. With

wc=\[xczc\],wn=\[xnzn\],w\_\{c\}=\\begin\{bmatrix\}x\_\{c\}\\\\ z\_\{c\}\\end\{bmatrix\},\\qquad w\_\{n\}=\\begin\{bmatrix\}x\_\{n\}\\\\ z\_\{n\}\\end\{bmatrix\},the trapezoidal DAE residual is

Rk​\(wc,wn,tc,tn,p\)=\[−xn\+xc\+tn−tc2​\[f​\(tc,xc,zc,p\)\+f​\(tn,xn,zn,p\)\]g​\(tn,xn,zn,p\)\]\.R\_\{k\}\(w\_\{c\},w\_\{n\},t\_\{c\},t\_\{n\},p\)=\\begin\{bmatrix\}\-x\_\{n\}\+x\_\{c\}\+\\frac\{t\_\{n\}\-t\_\{c\}\}\{2\}\\left\[f\(t\_\{c\},x\_\{c\},z\_\{c\},p\)\+f\(t\_\{n\},x\_\{n\},z\_\{n\},p\)\\right\]\\\\ g\(t\_\{n\},x\_\{n\},z\_\{n\},p\)\\end\{bmatrix\}\.\(B\.2\)The first block is the trapezoidal residual for the differential state\. The second block enforces the algebraic constraint at the end of the step\.

The adjoint sweep uses the Jacobians

Jn=∂Rk∂wn,Jc=∂Rk∂wc,Jp=∂Rk∂p,J\_\{n\}=\\frac\{\\partial R\_\{k\}\}\{\\partial w\_\{n\}\},\\qquad J\_\{c\}=\\frac\{\\partial R\_\{k\}\}\{\\partial w\_\{c\}\},\\qquad J\_\{p\}=\\frac\{\\partial R\_\{k\}\}\{\\partial p\},and the time derivatives

rtc=∂Rk∂tc,rtn=∂Rk∂tn\.r\_\{t\_\{c\}\}=\\frac\{\\partial R\_\{k\}\}\{\\partial t\_\{c\}\},\\qquad r\_\{t\_\{n\}\}=\\frac\{\\partial R\_\{k\}\}\{\\partial t\_\{n\}\}\.All derivatives are evaluated at the stored forward trajectory and current parameter value\.

### B\.4Backward sweep through a smooth segment

Letana\_\{n\}be the adjoint load entering nodewnw\_\{n\}from later times\. This load contains the direct loss derivative atwnw\_\{n\}and all contributions propagated backward from future steps and events\. The local multiplierλk\\lambda\_\{k\}for the step residualRk=0R\_\{k\}=0is found from

Jn⊤​λk=−an\.J\_\{n\}^\{\\top\}\\lambda\_\{k\}=\-a\_\{n\}\.The dimension ofλk\\lambda\_\{k\}matches the dimension of the residualRkR\_\{k\}\.

After solving forλk\\lambda\_\{k\}, the load propagated to the previous node is

ac=ℓc\+Jc⊤​λk,a\_\{c\}=\\ell\_\{c\}\+J\_\{c\}^\{\\top\}\\lambda\_\{k\},where

ℓc=∂𝒥h∂wc\.\\ell\_\{c\}=\\frac\{\\partial\\mathcal\{J\}\_\{h\}\}\{\\partial w\_\{c\}\}\.The parameter\-gradient accumulator is updated by

qp←qp\+Jp⊤​λk\.q\_\{p\}\\leftarrow q\_\{p\}\+J\_\{p\}^\{\\top\}\\lambda\_\{k\}\.At the beginning of the complete backward sweep,qpq\_\{p\}is initialized with the direct loss derivativeℓp\\ell\_\{p\}\.

The same multiplier contributes to derivatives with respect to the physical timestct\_\{c\}andtnt\_\{n\}\. Define

dc=rtc⊤​λk,dn=rtn⊤​λk\.d\_\{c\}=r\_\{t\_\{c\}\}^\{\\top\}\\lambda\_\{k\},\\qquad d\_\{n\}=r\_\{t\_\{n\}\}^\{\\top\}\\lambda\_\{k\}\.These are scalar sensitivities of the Lagrangian contributionλk⊤​Rk\\lambda\_\{k\}^\{\\top\}R\_\{k\}with respect to the current and next node times\.

Lettst\_\{s\}andtet\_\{e\}denote the start and end times of the segment\. From \([B\.1](https://arxiv.org/html/2605.05395#A2.Ex5)\),

tc=ts\+ηc​\(te−ts\),tn=ts\+ηn​\(te−ts\),t\_\{c\}=t\_\{s\}\+\\eta\_\{c\}\(t\_\{e\}\-t\_\{s\}\),\\qquad t\_\{n\}=t\_\{s\}\+\\eta\_\{n\}\(t\_\{e\}\-t\_\{s\}\),whereηc\\eta\_\{c\}andηn\\eta\_\{n\}are the normalized coordinates associated withtct\_\{c\}andtnt\_\{n\}\. Therefore, the step contributes to the segment\-boundary sensitivities as

qts←qts\+dc​\(1−ηc\)\+dn​\(1−ηn\),q\_\{t\_\{s\}\}\\leftarrow q\_\{t\_\{s\}\}\+d\_\{c\}\(1\-\\eta\_\{c\}\)\+d\_\{n\}\(1\-\\eta\_\{n\}\),and

qte←qte\+dc​ηc\+dn​ηn\.q\_\{t\_\{e\}\}\\leftarrow q\_\{t\_\{e\}\}\+d\_\{c\}\\eta\_\{c\}\+d\_\{n\}\\eta\_\{n\}\.This is how event\-time dependence enters the smooth\-segment adjoint sweep: iftst\_\{s\}ortet\_\{e\}is an event time, then the corresponding boundary sensitivity contributes to stationarity with respect to that event time\.

### B\.5Event residuals

At event timeτm\\tau\_\{m\}, the active event index iseme\_\{m\}\. To simplify notation in this subsection, we writeτ=τm\\tau=\\tau\_\{m\}ande=eme=e\_\{m\}\. The event residual is

E​\(τ,w\+,w−,p\)=\[ϕe​\(τ,w−,p\)ρe​\(τ,w\+,w−,p\)ce​\(w\+,w−\)g​\(τ,x\+,z\+,p\)\]=0\.E\(\\tau,w^\{\+\},w^\{\-\},p\)=\\begin\{bmatrix\}\\phi\_\{e\}\(\\tau,w^\{\-\},p\)\\\\ \\rho\_\{e\}\(\\tau,w^\{\+\},w^\{\-\},p\)\\\\ c\_\{e\}\(w^\{\+\},w^\{\-\}\)\\\\ g\(\\tau,x^\{\+\},z^\{\+\},p\)\\end\{bmatrix\}=0\.\(B\.3\)Hereϕe\\phi\_\{e\}is the active guard function,ρe\\rho\_\{e\}contains the event\-specific reset equations,cec\_\{e\}contains continuity equations for variables not changed by the event, and the final block enforces post\-event algebraic consistency\.

Define the event Jacobians

A=∂E∂w\+,B=∂E∂w−,C=∂E∂p,eτ=∂E∂τ\.A=\\frac\{\\partial E\}\{\\partial w^\{\+\}\},\\qquad B=\\frac\{\\partial E\}\{\\partial w^\{\-\}\},\\qquad C=\\frac\{\\partial E\}\{\\partial p\},\\qquad e\_\{\\tau\}=\\frac\{\\partial E\}\{\\partial\\tau\}\.These derivatives are evaluated at the stored event data\(τ,w−,w\+,p\)\(\\tau,w^\{\-\},w^\{\+\},p\)\.

### B\.6Event adjoint and pending\-event representation

When the reverse scan reaches an event block, the post\-event segment has already been processed\. Leta\+a^\{\+\}denote the adjoint load arriving from the post\-event side, i\.e\., the accumulated derivative with respect tow\+w^\{\+\}\. The event multiplierμ\\musatisfies

A⊤​μ=−a\+\.A^\{\\top\}\\mu=\-a^\{\+\}\.This equation enforces stationarity with respect to the post\-event variablesw\+w^\{\+\}\.

The event timeτ\\taualso appears in the event residual and in the neighboring segment time grids\. Therefore, stationarity with respect toτ\\taumust also be enforced\. In the implementation, the solution of the event\-adjoint equation is represented in affine form:

μ​\(c\)=μ0\+c​v,A⊤​v=0\.\\mu\(c\)=\\mu\_\{0\}\+cv,\\qquad A^\{\\top\}v=0\.Hereμ0\\mu\_\{0\}is a particular solution ofA⊤​μ=−a\+A^\{\\top\}\\mu=\-a^\{\+\},vvis a null vector ofA⊤A^\{\\top\}, andccis a scalar coefficient that will be determined from event\-time stationarity\. This affine representation is used because the event block alone does not determinecc\.

The affine event multiplier induces an affine load on the pre\-event variables:

a−​\(c\)=B⊤​μ​\(c\)=a0−\+c​av−,a^\{\-\}\(c\)=B^\{\\top\}\\mu\(c\)=a^\{\-\}\_\{0\}\+ca^\{\-\}\_\{v\},where

a0−=B⊤​μ0,av−=B⊤​v\.a^\{\-\}\_\{0\}=B^\{\\top\}\\mu\_\{0\},\\qquad a^\{\-\}\_\{v\}=B^\{\\top\}v\.The vectora−​\(c\)a^\{\-\}\(c\)is the terminal adjoint load passed to the segment before the event\.

The event contribution to the parameter\-gradient accumulator is

qpE​\(c\)=C⊤​μ​\(c\)=qp,0E\+c​qp,vE,q\_\{p\}^\{E\}\(c\)=C^\{\\top\}\\mu\(c\)=q\_\{p,0\}^\{E\}\+cq\_\{p,v\}^\{E\},where

qp,0E=C⊤​μ0,qp,vE=C⊤​v\.q\_\{p,0\}^\{E\}=C^\{\\top\}\\mu\_\{0\},\\qquad q\_\{p,v\}^\{E\}=C^\{\\top\}v\.The superscriptEEdenotes an event\-block contribution\. The subscript0denotes the contribution from the particular multiplierμ0\\mu\_\{0\}, while the subscriptvvdenotes the contribution from the null directionvv\.

The event contribution to stationarity with respect to the event time is

qτE​\(c\)=ℓτ,m\+qτ\+\+eτ⊤​μ​\(c\)=qτ,0E\+c​qτ,vE\.q\_\{\\tau\}^\{E\}\(c\)=\\ell\_\{\\tau,m\}\+q\_\{\\tau\}^\{\+\}\+e\_\{\\tau\}^\{\\top\}\\mu\(c\)=q\_\{\\tau,0\}^\{E\}\+cq\_\{\\tau,v\}^\{E\}\.Hereℓτ,m\\ell\_\{\\tau,m\}is the direct derivative of the loss with respect to the event timeτm\\tau\_\{m\}, obtained fromℓτ\\ell\_\{\\tau\}\. The scalarqτ\+q\_\{\\tau\}^\{\+\}is the boundary\-time sensitivity contributed by the already\-processed post\-event segment\. The termeτ⊤​μ​\(c\)e\_\{\\tau\}^\{\\top\}\\mu\(c\)is the event\-residual contribution with respect toτ\\tau\. Therefore,

qτ,0E=ℓτ,m\+qτ\+\+eτ⊤​μ0,qτ,vE=eτ⊤​v\.q\_\{\\tau,0\}^\{E\}=\\ell\_\{\\tau,m\}\+q\_\{\\tau\}^\{\+\}\+e\_\{\\tau\}^\{\\top\}\\mu\_\{0\},\\qquad q\_\{\\tau,v\}^\{E\}=e\_\{\\tau\}^\{\\top\}v\.
The scalarccis not determined at the event block itself because the segment before the event also contributes to the same event\-time stationarity condition\. The event block is therefore stored as a pending event containing the affine quantities

a0−,av−,qp,0E,qp,vE,qτ,0E,qτ,vE\.a^\{\-\}\_\{0\},\\quad a^\{\-\}\_\{v\},\\quad q\_\{p,0\}^\{E\},\\quad q\_\{p,v\}^\{E\},\\quad q\_\{\\tau,0\}^\{E\},\\quad q\_\{\\tau,v\}^\{E\}\.

### B\.7Resolving the pending event

When the reverse scan reaches the segment before the pending event, the segment end time is the same event timeτm\\tau\_\{m\}\. Consequently, that segment contributes to the same event\-time stationarity condition used to determinecc\.

The segment is swept twice\. The first sweep uses terminal loada0−a^\{\-\}\_\{0\}and the true loss loads on the segment\. It produces three quantities:

as,0,qp,0S,qτ,0S\.a\_\{s,0\},\\qquad q\_\{p,0\}^\{S\},\\qquad q\_\{\\tau,0\}^\{S\}\.Hereas,0a\_\{s,0\}is the adjoint load propagated to the start of the segment,qp,0Sq\_\{p,0\}^\{S\}is the parameter\-gradient contribution from the segment, andqτ,0Sq\_\{\\tau,0\}^\{S\}is the segment contribution to the derivative with respect to the segment end time, which is the event time\.

The second sweep uses terminal loadav−a^\{\-\}\_\{v\}and zero loss loads\. It produces

as,v,qp,vS,qτ,vS\.a\_\{s,v\},\\qquad q\_\{p,v\}^\{S\},\\qquad q\_\{\\tau,v\}^\{S\}\.These quantities measure how the segment contribution changes with the null\-direction componentc​vcvof the event multiplier\. The superscriptSSdenotes a smooth\-segment contribution\.

Together, the two sweeps define affine segment quantities:

as​\(c\)=as,0\+c​as,v,a\_\{s\}\(c\)=a\_\{s,0\}\+ca\_\{s,v\},qpS​\(c\)=qp,0S\+c​qp,vS,q\_\{p\}^\{S\}\(c\)=q\_\{p,0\}^\{S\}\+cq\_\{p,v\}^\{S\},and

qτS​\(c\)=qτ,0S\+c​qτ,vS\.q\_\{\\tau\}^\{S\}\(c\)=q\_\{\\tau,0\}^\{S\}\+cq\_\{\\tau,v\}^\{S\}\.
The event time is an internal variable of the event\-split trajectory\. Hence the total derivative of the Lagrangian with respect to this event time must vanish:

0=qτE​\(c\)\+qτS​\(c\)\.0=q\_\{\\tau\}^\{E\}\(c\)\+q\_\{\\tau\}^\{S\}\(c\)\.Substituting the affine forms gives

0=qτ,0E\+c​qτ,vE\+qτ,0S\+c​qτ,vS\.0=q\_\{\\tau,0\}^\{E\}\+cq\_\{\\tau,v\}^\{E\}\+q\_\{\\tau,0\}^\{S\}\+cq\_\{\\tau,v\}^\{S\}\.Therefore,

c=−qτ,0E\+qτ,0Sqτ,vE\+qτ,vS\.c=\-\\frac\{q\_\{\\tau,0\}^\{E\}\+q\_\{\\tau,0\}^\{S\}\}\{q\_\{\\tau,v\}^\{E\}\+q\_\{\\tau,v\}^\{S\}\}\.\(B\.4\)A small numerical regularization may be added to the denominator in the implementation\.

Afterccis known, the affine quantities are evaluated\. The load propagated to the start of the preceding segment is

as=as,0\+c​as,v\.a\_\{s\}=a\_\{s,0\}\+ca\_\{s,v\}\.The parameter\-gradient accumulator is updated by

qp←qp\+qp,0E\+c​qp,vE\+qp,0S\+c​qp,vS\.q\_\{p\}\\leftarrow q\_\{p\}\+q\_\{p,0\}^\{E\}\+cq\_\{p,v\}^\{E\}\+q\_\{p,0\}^\{S\}\+cq\_\{p,v\}^\{S\}\.The pending event is cleared, and the reverse scan continues to earlier blocks\.

### B\.8Total gradient

The gradient returned by the discrete\-adjoint method has the form

∇p𝒥h=ℓp\+∑m,kJp,m,k⊤​λm,k\+∑mCm⊤​μm\.\\nabla\_\{p\}\\mathcal\{J\}\_\{h\}=\\ell\_\{p\}\+\\sum\_\{m,k\}J\_\{p,m,k\}^\{\\top\}\\lambda\_\{m,k\}\+\\sum\_\{m\}C\_\{m\}^\{\\top\}\\mu\_\{m\}\.The first term,ℓp\\ell\_\{p\}, is the direct loss derivative with respect to the parameters\. The second term accumulates smooth\-step residual contributions, whereJp,m,kJ\_\{p,m,k\}is the parameter Jacobian of the step residual andλm,k\\lambda\_\{m,k\}is the corresponding step multiplier\. The third term accumulates event\-residual contributions, whereCmC\_\{m\}is the parameter Jacobian of themm\-th event residual andμm\\mu\_\{m\}is the corresponding event multiplier\.

In the implementation, this expression is accumulated in the running quantityqpq\_\{p\}\. Event\-time effects enter through the boundary\-time sensitivities and through the scalar stationarity solve in \([B\.4](https://arxiv.org/html/2605.05395#A2.Ex40)\); they do not appear as a separate final term because stationarity with respect to each event time has already been enforced\.

Thus, the method computes the gradient of the residual\-constrained discrete problem by solving the Lagrange multipliers explicitly\. The residuals are enforced as equality constraints, not as loss penalties\.

Similar Articles

From Noise to Control: Parameterized Diffusion Policies

arXiv cs.AI

This paper introduces Parameterized Diffusion Policy (PDP), a framework that makes diffusion policies controllable by conditioning on low-dimensional latent parameters, enabling smooth behavior interpolation and adaptation without retraining. It demonstrates improved performance on complex multimodal robot tasks in simulation and real-world experiments.

Dual Advantage Fields

arXiv cs.LG

Dual Advantage Fields (DAF) is a policy-extraction method for offline goal-conditioned RL that converts a bilinear dual value model into a local advantage signal by learning an action-effect model predicting feature displacement and scoring actions by alignment with the goal direction. Accepted at the ICML 2026 Workshop on Decision Making, DAF shows improved performance on OGBench locomotion, manipulation, and puzzle tasks.