A Bayesian Filtering Approach for Learning Lagrangian Dynamics from Noisy Measurements

arXiv cs.LG 07/01/26, 04:00 AM Papers
Summary
This paper presents a Bayesian filtering approach to learn Lagrangian dynamics from partial, noisy measurements by parameterizing kinetic and potential energies with neural networks and jointly estimating states and parameters via maximum likelihood.
arXiv:2606.31137v1 Announce Type: new Abstract: This paper proposes a Bayesian filtering-based approach for learning the dynamics of a physical system from partial, noisy measurements. We model the system dynamics using a Lagrangian mechanics formulation. As in Lagrangian neural networks (LNNs), we parameterize the kinetic and potential energies with neural networks. The unknown external forces in the Lagrangian formulation are modeled as white Gaussian noise. The corresponding Euler--Lagrange equations then yield a continuous-time stochastic state-space model (SSM) that describes the system dynamics. The neural network parameters and system states are then jointly learned via a maximum-likelihood method using Gaussian-approximation-based Bayesian filters. The effectiveness of the proposed method is demonstrated on pendulum and Duffing oscillator examples, and its performance is compared with conventional LNNs and with approximate Bayesian filters using known system models.
# A Bayesian Filtering Approach for Learning Lagrangian Dynamics from Noisy Measurements
Source: [https://arxiv.org/html/2606.31137](https://arxiv.org/html/2606.31137)
Kundan Kumar∗, Shreya Das∗, and Simo Särkkä

∗Equal contribution. K. Kumar is with the School of Engineering and Applied Science, Ahmedabad University, India (e-mail: kundan.kumar@ahduni.edu.in). S. Das is with the Department of Electrical Engineering, Indian Institute of Technology (BHU) Varanasi, India (e-mail: shreyadas.eee@iitbhu.ac.in). K. Kumar and S. Särkkä are with the ELLIS Institute Finland and the Department of Electrical Engineering and Automation, Aalto University, Finland (e-mail: {kundan.kumar; simo.sarkka}@aalto.fi).

###### Abstract

This paper proposes a Bayesian filtering\-based approach for learning the dynamics of a physical system from partial, noisy measurements\. We model the system dynamics using a Lagrangian mechanics formulation\. As in Lagrangian neural networks \(LNNs\), we parameterize the kinetic and potential energies with neural networks\. The unknown external forces in the Lagrangian formulation are modeled as white Gaussian noise\. The corresponding Euler–Lagrange equations then yield a continuous\-time stochastic state\-space model \(SSM\) that describes the system dynamics\. The neural network parameters and system states are then jointly learned via a maximum\-likelihood method using Gaussian\-approximation\-based Bayesian filters\. The effectiveness of the proposed method is demonstrated on pendulum and Duffing oscillator examples, and its performance is compared with conventional LNNs and with approximate Bayesian filters using known system models\.

###### Index Terms:

Lagrangian neural networks, Euler\-Lagrange equation, Bayesian filtering, state and parameter estimation, partial measurements\.

## I Introduction

Accurate modeling of the dynamics of a physical system is crucial in many applications, including robotics \[[30](https://arxiv.org/html/2606.31137#bib.bib43)\], industrial automation \[[25](https://arxiv.org/html/2606.31137#bib.bib40)\], navigation \[[12](https://arxiv.org/html/2606.31137#bib.bib47)\], health technology \[[27](https://arxiv.org/html/2606.31137#bib.bib46),[28](https://arxiv.org/html/2606.31137#bib.bib36)\], and target tracking \[[20](https://arxiv.org/html/2606.31137#bib.bib45)\]. In many applications, constructing the model from first principles is not possible because the physics are either partially or fully unknown. In those cases, it is possible to treat the system as unknown and learn the dynamic model from data using approaches such as neural networks (NNs) \[[24](https://arxiv.org/html/2606.31137#bib.bib21),[6](https://arxiv.org/html/2606.31137#bib.bib25)\], basis function expansions \[[31](https://arxiv.org/html/2606.31137#bib.bib22)\], or Gaussian process regression \[[32](https://arxiv.org/html/2606.31137#bib.bib23),[29](https://arxiv.org/html/2606.31137#bib.bib24)\]. However, purely data-driven approaches typically require large amounts of training data and often fail to obey known physical laws. Physics-informed neural networks (PINNs) address these limitations by combining data-driven learning with physical constraints, thereby improving generalization and interpretability while reducing data requirements \[[15](https://arxiv.org/html/2606.31137#bib.bib26),[5](https://arxiv.org/html/2606.31137#bib.bib27),[13](https://arxiv.org/html/2606.31137#bib.bib28)\].

![Refer to caption](https://arxiv.org/html/2606.31137v1/x1.png)

Figure 1: Schematic diagram of the proposed method. The kinetic energy ($T$) and potential energy ($V$) of the Lagrangian ($L$) are parameterized using neural networks (NNs). The Euler–Lagrange equation with stochastic input is rewritten as a state-space model. The state and the neural network parameters are jointly estimated with an approximate Bayesian filter.

Lagrangian neural networks (LNNs) are a class of PINNs that use a Lagrangian formulation to learn system dynamics \[[7](https://arxiv.org/html/2606.31137#bib.bib2)\]. By leveraging the Lagrangian formulation, LNNs incorporate physical laws, such as energy conservation, directly into the learning process. Deep Lagrangian neural networks (DeLaN) were introduced in \[[23](https://arxiv.org/html/2606.31137#bib.bib20)\] as a physics-informed approach for learning the equations of motion of robotic systems in real time. Subsequent work extended this framework to model system energy and friction via inverse learning and demonstrated its effectiveness in real-world robotic tasks and reinforcement learning \[[21](https://arxiv.org/html/2606.31137#bib.bib4),[22](https://arxiv.org/html/2606.31137#bib.bib6),[8](https://arxiv.org/html/2606.31137#bib.bib49)\].

Existing formulations of LNN typically assume access to the full system states, including the generalized velocities, which is often unrealistic in real\-world settings\. When only generalized coordinates are measured, the generalized velocities can, in principle, be obtained via numerical differentiation\[[3](https://arxiv.org/html/2606.31137#bib.bib41)\]\. However, this approach is limited in generality, amplifies measurement noise, and can significantly degrade performance\. In this paper, we propose an alternative approach, where an approximate Bayesian filter, such as an extended Kalman filter \(EKF\)\[[4](https://arxiv.org/html/2606.31137#bib.bib32),[26](https://arxiv.org/html/2606.31137#bib.bib9)\]or a sigma\-point filter\[[2](https://arxiv.org/html/2606.31137#bib.bib29),[34](https://arxiv.org/html/2606.31137#bib.bib30),[14](https://arxiv.org/html/2606.31137#bib.bib31)\]is used to estimate the full system states from partial and noisy measurements jointly with the LNN learning task\. In this approach, the neural networks used to parameterize the Lagrangian are trained via a parameter estimation procedure implemented on top of the filter, so the training explicitly accounts for measurement partiality and noise\.

Specifically, the main contributions of this paper are as follows \(see also Fig\.[1](https://arxiv.org/html/2606.31137#S1.F1)\):

- •We formulate the Lagrangian dynamics learning problem from partial and noisy measurements within a Bayesian state estimation framework\.
- •We model the unknown forces and uncertainties entering the Euler–Lagrange equations as white Gaussian noises, which leads to a continuous\-time stochastic state space model \(SSM\)\.
- •The kinetic and potential energies of the Lagrangian are parameterized using neural networks and embedded into the Euler–Lagrange equation\.
- •We employ an approximate Bayesian filtering method to jointly estimate the system states and learn the neural network parameters from partial observations\.

Unlike conventional LNNs that rely on full\-state observations, the proposed approach uses noisy partial observations\. Its effectiveness is demonstrated through numerical experiments, and its performance is compared with that of traditional LNNs and Gaussian\-approximated Bayesian filters using the known model\.

## II Lagrangian Formulation

Lagrangian mechanics is a way to formulate the dynamics of a physical system by expressing them in terms of the system’s kinetic and potential energies \[[10](https://arxiv.org/html/2606.31137#bib.bib42),[17](https://arxiv.org/html/2606.31137#bib.bib48)\]. Consider a system with generalized coordinates $q=(q_{1},q_{2},\ldots,q_{n})$ and the corresponding generalized velocities $\dot{q}=\frac{dq}{dt}$. Let us assume that the potential energy is a scalar function $V(q)$ defined on the configuration space and depends only on the generalized coordinates $q$. The kinetic energy $T(q,\dot{q})$ can be a function of both the generalized coordinates $q$ and the velocities $\dot{q}$, although it often only depends on $\dot{q}$. The Lagrangian can then be written as follows \[[10](https://arxiv.org/html/2606.31137#bib.bib42), p. 35\]:

$$L(q,\dot{q})=T(q,\dot{q})-V(q).\tag{1}$$

The Euler–Lagrange equations for the corresponding system are given by \[[10](https://arxiv.org/html/2606.31137#bib.bib42), p. 48\]

$$\frac{d}{dt}\frac{\partial L(q,\dot{q})}{\partial\dot{q}}-\frac{\partial L(q,\dot{q})}{\partial q}=\eta_{\tau},\tag{2}$$

where the term $\eta_{\tau}$ accounts for the non-conservative effects, including external forces. By applying the chain rule, ([2](https://arxiv.org/html/2606.31137#S2.E2)) becomes

$$\frac{\partial^{2}L(q,\dot{q})}{\partial q\,\partial\dot{q}}\dot{q}+\frac{\partial^{2}L(q,\dot{q})}{\partial\dot{q}\,\partial\dot{q}}\ddot{q}-\frac{\partial L(q,\dot{q})}{\partial q}=\eta_{\tau}.\tag{3}$$

We assume that the Hessian matrix, $\frac{\partial^{2}L(q,\dot{q})}{\partial\dot{q}\,\partial\dot{q}}$, is non-singular. The above equation ([3](https://arxiv.org/html/2606.31137#S2.E3)) can then be explicitly solved for the generalized acceleration, $\ddot{q}$, as follows:

$$\ddot{q}=\Big[\frac{\partial^{2}L(q,\dot{q})}{\partial\dot{q}\,\partial\dot{q}}\Big]^{-1}\Big[\eta_{\tau}+\frac{\partial L(q,\dot{q})}{\partial q}-\frac{\partial^{2}L(q,\dot{q})}{\partial q\,\partial\dot{q}}\dot{q}\Big].\tag{4}$$

We can now model the following term as white noise:

$$\eta\approx\Big[\frac{\partial^{2}L(q,\dot{q})}{\partial\dot{q}\,\partial\dot{q}}\Big]^{-1}\eta_{\tau},\tag{5}$$

where $\eta$ is a Gaussian white noise with a spectral density $Q_{c}$. Using ([4](https://arxiv.org/html/2606.31137#S2.E4)) and ([5](https://arxiv.org/html/2606.31137#S2.E5)), the continuous-time dynamics of a system can be expressed in first-order stochastic state space form in terms of the state vector $\begin{bmatrix}q&\dot{q}\end{bmatrix}^{\top}$ as follows:

$$\frac{d}{dt}\begin{bmatrix}q\\ \dot{q}\end{bmatrix}=\begin{bmatrix}\dot{q}\\ \Big[\frac{\partial^{2}L(q,\dot{q})}{\partial\dot{q}\,\partial\dot{q}}\Big]^{-1}\Big[\frac{\partial L(q,\dot{q})}{\partial q}-\frac{\partial^{2}L(q,\dot{q})}{\partial q\,\partial\dot{q}}\dot{q}\Big]\end{bmatrix}+\begin{bmatrix}0\\ \eta\end{bmatrix}.\tag{6}$$
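As a quick sanity check of (4), the following minimal sketch evaluates the generalized acceleration of a scalar system directly from a Lagrangian. It is not taken from the paper: the function names `lagrangian` and `acceleration`, the step size `h`, and the use of central finite differences in place of automatic differentiation are all illustrative assumptions. The example Lagrangian is that of a simple pendulum, for which (4) should reduce to $\ddot{q}=-\frac{g}{l}\sin q$.

```python
import math

def lagrangian(q, qd, m=1.0, l=1.0, g=9.81):
    """Pendulum Lagrangian L = T - V with T = 0.5*m*l^2*qd^2 and V = -m*g*l*cos(q)."""
    return 0.5 * m * l**2 * qd**2 + m * g * l * math.cos(q)

def acceleration(L, q, qd, eta_tau=0.0, h=1e-4):
    """Evaluate Eq. (4) for a scalar coordinate using central finite differences of L."""
    # dL/dq
    dL_dq = (L(q + h, qd) - L(q - h, qd)) / (2 * h)
    # d^2 L / d qd^2 (the 1x1 "Hessian" that must be non-singular)
    d2L_dqd2 = (L(q, qd + h) - 2 * L(q, qd) + L(q, qd - h)) / h**2
    # mixed derivative d^2 L / (dq d qd)
    d2L_dq_dqd = (L(q + h, qd + h) - L(q + h, qd - h)
                  - L(q - h, qd + h) + L(q - h, qd - h)) / (4 * h**2)
    return (eta_tau + dL_dq - d2L_dq_dqd * qd) / d2L_dqd2

# Compare with the closed-form pendulum acceleration -(g/l)*sin(q).
print(acceleration(lagrangian, 0.5, 0.3), -9.81 * math.sin(0.5))
```

In a learning setting the derivatives would more plausibly come from automatic differentiation of the energy networks; finite differences are used here only to keep the sketch dependency-free.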

## III Proposed Methodology

In this section, we develop a Bayesian estimation framework that jointly learns the system dynamics and estimates the system states from partial and noisy measurements within the Lagrangian formulation\.

### III-A LNN learning and Bayesian filtering

Let us assume that the kinetic and potential energies,T\(q,q˙\)T\(q,\\,\\dot\{q\}\)andV\(q\)V\(q\), are not available a priori and hence the LagrangianL\(q,q˙\)L\(q,\\dot\{q\}\)of the system is unknown\. To learn the system dynamics, our aim is now to infer the unknown energy functions from the available measurements using a suitable class of function approximators\. In this work, we employ a neural network\-based approach to approximate the energy functions\. This is the same approach as is used in Lagrangian neural networks \(LNNs\)\[[7](https://arxiv.org/html/2606.31137#bib.bib2),[23](https://arxiv.org/html/2606.31137#bib.bib20),[21](https://arxiv.org/html/2606.31137#bib.bib4),[22](https://arxiv.org/html/2606.31137#bib.bib6)\]\.

The unknown $T(q,\dot{q})$ and $V(q)$ are approximated using neural networks as follows:

$$\begin{split}T(q,\dot{q})&\approx T_{\text{NN}}(q,\dot{q};\mu),\\ V(q)&\approx V_{\text{NN}}(q;\mu),\end{split}\tag{7}$$

where $T_{\text{NN}}$ and $V_{\text{NN}}$ denote neural networks (NNs), and $\mu$ represents the learnable parameters (weights and biases). The acceleration term in ([6](https://arxiv.org/html/2606.31137#S2.E6)) can then be approximated as

$$\ddot{q}\approx f(q,\dot{q};\mu)+\eta,\tag{8}$$

where

$$\begin{split}L_{\text{NN}}(q,\dot{q};\mu)&=T_{\text{NN}}(q,\dot{q};\mu)-V_{\text{NN}}(q;\mu),\\ f(q,\dot{q};\mu)&=\Bigg[\frac{\partial^{2}L_{\text{NN}}(q,\dot{q};\mu)}{\partial\dot{q}\,\partial\dot{q}}\Bigg]^{-1}\Bigg[\frac{\partial L_{\text{NN}}(q,\dot{q};\mu)}{\partial q}-\frac{\partial^{2}L_{\text{NN}}(q,\dot{q};\mu)}{\partial q\,\partial\dot{q}}\dot{q}\Bigg].\end{split}$$

In state estimation \[[26](https://arxiv.org/html/2606.31137#bib.bib9)\], the state of the dynamical system is typically denoted by $x$. Defining the state as $x=\begin{bmatrix}x_{1}&x_{2}\end{bmatrix}^{\top}=\begin{bmatrix}q&\dot{q}\end{bmatrix}^{\top}\in\mathbb{R}^{n_{x}}$, ([6](https://arxiv.org/html/2606.31137#S2.E6)) can be rewritten as

$$\begin{bmatrix}\frac{dx_{1}}{dt}\\ \frac{dx_{2}}{dt}\end{bmatrix}=\begin{bmatrix}x_{2}\\ f(x_{1},x_{2};\mu)\end{bmatrix}+\begin{bmatrix}0\\ \eta\end{bmatrix}.\tag{9}$$

To enable state and parameter estimation using Bayesian filtering, the continuous-time dynamics can be discretized using, for example, the Euler–Maruyama method \[[28](https://arxiv.org/html/2606.31137#bib.bib36),[26](https://arxiv.org/html/2606.31137#bib.bib9)\]. The resulting discrete-time process model then has the form

$$\begin{bmatrix}x_{1,k}\\ x_{2,k}\end{bmatrix}=\underbrace{\begin{bmatrix}x_{1,k-1}+x_{2,k-1}\Delta t\\ x_{2,k-1}+f(x_{1,k-1},x_{2,k-1};\mu)\Delta t\end{bmatrix}}_{\tilde{f}(x_{k-1};\mu)}+\tilde{\eta}_{k-1},$$

where $\Delta t$ is the sampling interval and $\tilde{\eta}_{k-1}\sim\mathcal{N}(0,Q_{k-1})$ is a Gaussian process noise with $Q_{k-1}=\mathrm{diag}(0,Q_{c}\Delta t)$. The process model thus has the compact form

$$x_{k}=\tilde{f}(x_{k-1};\mu)+\tilde{\eta}_{k-1}.\tag{10}$$

Note that in practice, it is often beneficial to use more sophisticated discretization methods than the plain Euler–Maruyama (see, e.g., \[[28](https://arxiv.org/html/2606.31137#bib.bib36),[26](https://arxiv.org/html/2606.31137#bib.bib9)\]).

We further assume that the available data can be modeled as the following measurement model:

$$y_{k}=h(x_{k})+\nu_{k},\tag{11}$$

where $\nu_{k}\sim\mathcal{N}(0,R_{k})$ is the Gaussian measurement noise. For example, if we only measure the generalized coordinates (but not the velocities), we have $h(x_{k})=x_{1,k}$, but substantially more general measurement models can also be used.

Equations \([10](https://arxiv.org/html/2606.31137#S3.E10)\) and \([11](https://arxiv.org/html/2606.31137#S3.E11)\) now define a standard Bayesian state estimation problem\[[26](https://arxiv.org/html/2606.31137#bib.bib9)\], where the unknown parameters of the system are the neural network weightsμ\\mu\. Thus, this formulation enables the joint state and parameter estimation within the Bayesian filtering framework\.
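To make the state-space model (10)–(11) concrete, here is a minimal simulation sketch for a scalar coordinate with position-only measurements. It is an illustration under stated assumptions rather than the authors' code: the helper names `f_tilde`, `simulate`, and `pendulum` are hypothetical, a known pendulum acceleration stands in for the learned $f(q,\dot{q};\mu)$, and the values of `dt` and `Qc` are arbitrary choices (only the measurement variance 0.01 appears in the experiments).

```python
import math
import random

def f_tilde(x, f, dt):
    """Drift part of the Euler-Maruyama step in Eq. (10) for x = (q, q_dot)."""
    q, qd = x
    return (q + qd * dt, qd + f(q, qd) * dt)

def simulate(f, x0, dt, steps, Qc, R, seed=0):
    """Draw states from (10) and position-only measurements from (11)."""
    rng = random.Random(seed)
    x, xs, ys = x0, [], []
    for _ in range(steps):
        q, qd = f_tilde(x, f, dt)
        # Process noise enters the velocity only: Q = diag(0, Qc * dt).
        x = (q, qd + rng.gauss(0.0, math.sqrt(Qc * dt)))
        xs.append(x)
        # h(x) = x1: only the generalized coordinate is observed.
        ys.append(x[0] + rng.gauss(0.0, math.sqrt(R)))
    return xs, ys

def pendulum(q, qd, g=9.81, l=1.0):
    """Known acceleration used here in place of the learned f(q, q_dot; mu)."""
    return -(g / l) * math.sin(q)

xs, ys = simulate(pendulum, (1.0, 0.0), dt=0.01, steps=500, Qc=0.01, R=0.01)
print(len(ys), ys[:3])
```

The list `ys` is the kind of partial, noisy record that the filter and the parameter estimation procedure take as input; the velocities in `xs` are never observed.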

### III-B Affine approximation

The state and parameter estimation problem for the nonlinear state space model \(SSM\) \([10](https://arxiv.org/html/2606.31137#S3.E10)\)\-\([11](https://arxiv.org/html/2606.31137#S3.E11)\) cannot be solved exactly in closed form\. Therefore, we adopt a Gaussian approximated Bayesian filtering framework\[[26](https://arxiv.org/html/2606.31137#bib.bib9),[18](https://arxiv.org/html/2606.31137#bib.bib16),[9](https://arxiv.org/html/2606.31137#bib.bib33),[33](https://arxiv.org/html/2606.31137#bib.bib50)\]and construct an affine approximation of the stochastic SSM \([10](https://arxiv.org/html/2606.31137#S3.E10)\)\-\([11](https://arxiv.org/html/2606.31137#S3.E11)\) as follows:

$$\begin{split}x_{k}&\approx A_{k-1}x_{k-1}+a_{k-1}+\acute{\eta}_{k-1},\\ y_{k}&\approx H_{k}x_{k}+b_{k}+\acute{\nu}_{k},\end{split}\tag{12}$$

where $\acute{\eta}_{k-1}\sim\mathcal{N}(0,\Lambda_{k-1})$ and $\acute{\nu}_{k}\sim\mathcal{N}(0,\Omega_{k})$. The approximation parameters are $A_{k-1}\in\mathbb{R}^{n_{x}\times n_{x}}$, $a_{k-1}\in\mathbb{R}^{n_{x}}$, $H_{k}\in\mathbb{R}^{n_{y}\times n_{x}}$, and $b_{k}\in\mathbb{R}^{n_{y}}$. Various approaches exist in the literature for computing these parameters. In this work, we employ 1\) sigma-point-based statistical linear regression (SLR) and 2\) the first-order Taylor series approximation (the extended Kalman filter, EKF). These approaches are discussed in detail below.

1\) In statistical linear regression, the approximation parameters $A_{k-1},\;a_{k-1},\;\Lambda_{k-1}$ are computed as \[[26](https://arxiv.org/html/2606.31137#bib.bib9),[9](https://arxiv.org/html/2606.31137#bib.bib33),[1](https://arxiv.org/html/2606.31137#bib.bib34)\]

$$\begin{aligned}A_{k-1}&=\Gamma^{\top}P_{k-1\mid k-1}^{-1},\\ a_{k-1}&=\bar{x}-A_{k-1}\hat{x}_{k-1\mid k-1},\\ \Lambda_{k-1}&=\Phi-A_{k-1}P_{k-1\mid k-1}A_{k-1}^{\top},\end{aligned}$$

where $\hat{x}_{k-1\mid k-1}\approx E[x_{k-1}\mid y_{1:k-1}]$ is the approximate conditional mean of $x_{k-1}$ given measurements up to time step $k-1$, and $P_{k-1\mid k-1}\approx E[(x_{k-1}-\hat{x}_{k-1\mid k-1})(x_{k-1}-\hat{x}_{k-1\mid k-1})^{\top}\mid y_{1:k-1}]$ is the corresponding error covariance. Above,

$$\begin{aligned}\bar{x}&\approx\sum_{j=1}^{m}w_{j}\,\tilde{f}(\zeta_{j,k-1\mid k-1};\mu),\\ \Gamma&=\sum_{j=1}^{m}w_{j}(\zeta_{j,k-1\mid k-1}-\hat{x}_{k-1\mid k-1})(\mathcal{Z}_{j}-\bar{x})^{\top},\\ \Phi&=\sum_{j=1}^{m}w_{j}(\mathcal{Z}_{j}-\bar{x})(\mathcal{Z}_{j}-\bar{x})^{\top},\end{aligned}$$

where $\zeta_{j,k-1\mid k-1}=S_{k-1\mid k-1}\xi_{j}+\hat{x}_{k-1\mid k-1}$, $S_{k-1\mid k-1}$ is the lower triangular Cholesky factor of $P_{k-1\mid k-1}$, $m$ denotes the number of sigma points, $w_{j}$ is the weight corresponding to the $j$th unit sigma point $\xi_{j}$, and $\mathcal{Z}_{j}=\tilde{f}(\zeta_{j,k-1\mid k-1};\mu)$. The approximation parameters for the measurement model $(H_{k},b_{k},\Omega_{k})$ are computed in a similar manner (see \[[26](https://arxiv.org/html/2606.31137#bib.bib9)\]).

2\) In the Taylor series approximation, first-order Taylor series approximations \[[4](https://arxiv.org/html/2606.31137#bib.bib32)\] of the nonlinear functions $\tilde{f}(x_{k-1};\mu)$ and $h(x_{k})$ are employed around the available $\hat{x}_{k-1\mid k-1}$ and $\hat{x}_{k\mid k-1}$, respectively. Here, $\hat{x}_{k\mid k-1}\approx E[x_{k}\mid y_{1:k-1}]$ is the approximate mean of $x_{k}$ given $y_{1:k-1}$. The resulting approximation parameters are given by

$$\begin{aligned}A_{k-1}&=F_{j}(\hat{x}_{k-1\mid k-1}),&a_{k-1}&=\tilde{f}(\hat{x}_{k-1\mid k-1};\mu)-F_{j}(\hat{x}_{k-1\mid k-1})\,\hat{x}_{k-1\mid k-1},\\ H_{k}&=H_{j}(\hat{x}_{k\mid k-1}),&b_{k}&=h(\hat{x}_{k\mid k-1})-H_{j}(\hat{x}_{k\mid k-1})\,\hat{x}_{k\mid k-1},\\ \Lambda_{k-1}&=Q_{k-1},&\Omega_{k}&=R_{k}.\end{aligned}$$

Here, $F_{j}(\hat{x}_{k-1\mid k-1})=\frac{\partial\tilde{f}(x_{k-1};\mu)}{\partial x_{k-1}}\Big|_{x_{k-1}=\hat{x}_{k-1\mid k-1}}$ and $H_{j}(\hat{x}_{k\mid k-1})=\frac{\partial h(x_{k})}{\partial x_{k}}\Big|_{x_{k}=\hat{x}_{k\mid k-1}}$ denote the Jacobians of the process and measurement models, respectively.
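The SLR formulas in item 1) can be sketched in a few lines of NumPy. The block below is a hedged illustration, not the authors' implementation: the function names `cubature_points` and `slr` are hypothetical, and the third-degree spherical cubature rule (unit points $\pm\sqrt{n}\,e_i$ with equal weights $1/(2n)$) is assumed as the sigma-point set, in line with the cubature Kalman filter used later in the experiments.

```python
import numpy as np

def cubature_points(n):
    """Unit sigma points xi_j (columns) and weights w_j of the cubature rule."""
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])   # shape (n, 2n)
    w = np.full(2 * n, 1.0 / (2 * n))
    return xi, w

def slr(f_tilde, x_hat, P):
    """Statistical linear regression of f_tilde around N(x_hat, P)."""
    n = x_hat.size
    xi, w = cubature_points(n)
    S = np.linalg.cholesky(P)                 # lower triangular factor of P
    zeta = S @ xi + x_hat[:, None]            # sigma points zeta_j
    Z = np.stack([f_tilde(zeta[:, j]) for j in range(2 * n)], axis=1)
    x_bar = Z @ w                             # propagated mean
    dz = zeta - x_hat[:, None]
    dZ = Z - x_bar[:, None]
    Gamma = (dz * w) @ dZ.T                   # cross-covariance
    Phi = (dZ * w) @ dZ.T                     # propagated covariance
    A = np.linalg.solve(P, Gamma).T           # Gamma^T P^{-1}, P symmetric
    a = x_bar - A @ x_hat
    Lam = Phi - A @ P @ A.T
    return A, a, Lam

# For an affine map the regression is exact: A = M, a = c, Lambda = 0.
M = np.array([[1.0, 0.1], [-0.5, 0.9]])
c = np.array([0.2, -0.3])
A, a, Lam = slr(lambda x: M @ x + c, np.array([0.3, -0.1]),
                np.array([[0.5, 0.1], [0.1, 0.2]]))
print(np.allclose(A, M), np.allclose(a, c), np.allclose(Lam, 0.0))
```

The affine test case is a convenient correctness check because the residual covariance $\Lambda_{k-1}$ must vanish; for a nonlinear $\tilde{f}$ it instead captures the linearization error.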

### III-C Gaussian approximated Bayesian filter

In this section, we present the approximate Bayesian filtering algorithm based on the affine approximation in ([12](https://arxiv.org/html/2606.31137#S3.E12)). The filtering procedure is performed in two steps \[[26](https://arxiv.org/html/2606.31137#bib.bib9)\]: prediction step and update step. On the prediction step, the predictive density of the state $x_{k}$ given the measurements $y_{1:k-1}$ is computed as $p(x_{k}\mid y_{1:k-1})=\mathcal{N}(x_{k}\mid\hat{x}_{k\mid k-1},P_{k\mid k-1})$, where

$$\hat{x}_{k\mid k-1}=A_{k-1}\hat{x}_{k-1\mid k-1}+a_{k-1},\tag{13}$$

$$P_{k\mid k-1}=A_{k-1}P_{k-1\mid k-1}A_{k-1}^{\top}+\Lambda_{k-1},\tag{14}$$

and the linearization parameters are computed as described in the previous section. Upon receiving the new measurement, the posterior distribution of $x_{k}$ conditioned on $y_{1:k}$ is given by $p(x_{k}\mid y_{1:k})=\mathcal{N}(x_{k}\mid\hat{x}_{k\mid k},P_{k\mid k})$, where

$$K_{k}=P_{k\mid k-1}H_{k}^{\top}\left(P^{yy}_{k\mid k-1}\right)^{-1},\tag{15}$$

$$\hat{x}_{k\mid k}=\hat{x}_{k\mid k-1}+K_{k}(y_{k}-\hat{y}_{k\mid k-1}),\tag{16}$$

$$P_{k\mid k}=P_{k\mid k-1}-K_{k}P^{yy}_{k\mid k-1}K_{k}^{\top}.\tag{17}$$

Above, the linearization parameters are again computed as described in the previous section, $\hat{y}_{k\mid k-1}=H_{k}\hat{x}_{k\mid k-1}+b_{k}$ is the predicted mean of measurement, and $P^{yy}_{k\mid k-1}=H_{k}P_{k\mid k-1}H_{k}^{\top}+\Omega_{k}$ is its error covariance.

The neural network parameters $\mu$ appearing in the SSM in ([10](https://arxiv.org/html/2606.31137#S3.E10)) are estimated by minimizing the negative log-likelihood, also known as the energy function, of the measurements $y_{1:T_{s}}$ given $\mu$ \[[26](https://arxiv.org/html/2606.31137#bib.bib9),[16](https://arxiv.org/html/2606.31137#bib.bib12)\]. The energy function accumulated over $T_{s}$ time steps can be written as

$$E_{T_{s}}(\mu)=\sum_{k=1}^{T_{s}}\frac{1}{2}\Big[\log\big((2\pi)^{n_{y}}|P^{yy}_{k\mid k-1}|\big)+(y_{k}-\hat{y}_{k\mid k-1})^{\top}(P^{yy}_{k\mid k-1})^{-1}(y_{k}-\hat{y}_{k\mid k-1})\Big].$$

The implementation of the estimation algorithm, which also evaluates the energy function, is summarized in Algorithm [1](https://arxiv.org/html/2606.31137#alg1).

Algorithm 1: Estimation algorithm for affine SSM

1. function $[\{\hat{x}_{k\mid k},P_{k\mid k}\}_{k=1}^{T_{s}},E_{T_{s}}(\mu)]=\text{EST}(y_{1:T_{s}},\mu)$
2. Initialize: Start from $\hat{x}_{0\mid 0}$, $P_{0\mid 0}$, and $E_{0}(\mu)=0$.
3. for $k=1,\ldots,T_{s}$ do
4. Compute $[A_{k-1},\,a_{k-1},\,\Lambda_{k-1}]$ using SLR or Taylor series approximation.
5. Compute $\hat{x}_{k\mid k-1}$ and $P_{k\mid k-1}$ following ([13](https://arxiv.org/html/2606.31137#S3.E13)) and ([14](https://arxiv.org/html/2606.31137#S3.E14)).
6. Evaluate $[H_{k},\,b_{k},\,\Omega_{k}]$ using SLR or Taylor series approximation.
7. Update $\hat{x}_{k\mid k}$ and $P_{k\mid k}$ using ([16](https://arxiv.org/html/2606.31137#S3.E16)) and ([17](https://arxiv.org/html/2606.31137#S3.E17)).
8. Evaluate $E_{k}(\mu)=E_{k-1}(\mu)+\frac{1}{2}\Big(\log\big((2\pi)^{n_{y}}|P_{k\mid k-1}^{yy}|\big)+(y_{k}-\hat{y}_{k\mid k-1})^{\top}(P_{k\mid k-1}^{yy})^{-1}(y_{k}-\hat{y}_{k\mid k-1})\Big)$.
9. end for
10. end function
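A compact NumPy sketch of Algorithm 1 with the Taylor series (EKF) linearization is given below. It is only an illustration under several assumptions that are not in the paper: the function names `est_ekf` and `pendulum` are hypothetical, the state is two-dimensional with a scalar position measurement ($H_k=[1\;0]$, $b_k=0$), the Jacobian is obtained by central differences rather than automatic differentiation, and a pendulum with a single scalar parameter $\mu=g/l$ stands in for the neural-network model.

```python
import numpy as np

def est_ekf(y, mu, f, dt, Qc, R, x0, P0, eps=1e-6):
    """Algorithm 1 (EKF variant): filtered means and energy E(mu).

    f(q, qd, mu) is the acceleration model; only q is measured.
    """
    def f_tilde(x):
        return np.array([x[0] + x[1] * dt, x[1] + f(x[0], x[1], mu) * dt])

    H = np.array([[1.0, 0.0]])
    Q = np.diag([0.0, Qc * dt])
    x, P, E = np.asarray(x0, float), np.asarray(P0, float), 0.0
    means = []
    for yk in y:
        # Jacobian of f_tilde by central differences (stand-in for autodiff).
        A = np.column_stack([(f_tilde(x + eps * e) - f_tilde(x - eps * e))
                             / (2 * eps) for e in np.eye(2)])
        # Prediction, Eqs. (13)-(14); for the EKF, A x + a equals f_tilde(x).
        x_pred = f_tilde(x)
        P_pred = A @ P @ A.T + Q
        # Innovation and its variance P^{yy}.
        v = yk - (H @ x_pred)[0]
        S = (H @ P_pred @ H.T)[0, 0] + R
        # Update, Eqs. (15)-(17).
        K = (P_pred @ H.T)[:, 0] / S
        x = x_pred + K * v
        P = P_pred - S * np.outer(K, K)
        # Energy increment for a scalar measurement (n_y = 1).
        E += 0.5 * (np.log(2 * np.pi * S) + v**2 / S)
        means.append(x)
    return np.array(means), E

def pendulum(q, qd, mu):
    """Toy acceleration model with a single parameter mu = g / l."""
    return -mu * np.sin(q)
```

Calling `est_ekf(ys, mu, pendulum, dt, Qc, R, x0, P0)` on a measurement record `ys` returns the filtered means together with the energy; a parameter value that matches the data-generating dynamics should yield a lower energy than a badly mismatched one, which is what the optimization in (18) exploits.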

The parameter estimate is then obtained as\[[26](https://arxiv.org/html/2606.31137#bib.bib9)\]

$$\hat{\mu}=\arg\min_{\mu}E_{T_{s}}(\mu),\tag{18}$$

which in practice can be found by using an iterative optimization method such as gradient descent or the Adam optimizer \[[11](https://arxiv.org/html/2606.31137#bib.bib13)\].

The pseudo\-code for parameter estimation is presented in Algorithm[2](https://arxiv.org/html/2606.31137#alg2)\. For simplicity, the update step in the algorithm is formulated using gradient descent\[[11](https://arxiv.org/html/2606.31137#bib.bib13)\]\. However, the proposed framework is general and can be implemented with other optimization methods, such as Adam\.

Algorithm 2: Parameter estimation for the proposed method

1. Initialize: Start from an initial guess $\mu$.
2. for epoch $=1,\ldots,\textrm{MaxEpochs}$ do
3. Run the filter over the entire dataset to compute the energy function $[\cdot,\,E_{T_{s}}(\mu)]=\text{EST}(y_{1:T_{s}},\mu)$.
4. Compute the gradient of $E_{T_{s}}$ with respect to $\mu$, that is, $\nabla E_{T_{s}}(\mu)$.
5. Update the parameters $\mu\leftarrow\mu-\gamma\nabla E_{T_{s}}(\mu)$.
6. end for
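The outer loop of Algorithm 2 can be sketched generically as follows. This is a minimal illustration, not the authors' training code: `gradient_descent` is a hypothetical helper, `energy` is any callable that returns $E_{T_s}(\mu)$ for a parameter list (for instance a wrapper around the filter of Algorithm 1), and the gradient is approximated by central differences, whereas an actual neural-network implementation would presumably use automatic differentiation.

```python
def gradient_descent(energy, mu0, gamma, max_epochs, h=1e-6):
    """Minimize energy(mu) with plain gradient descent (Algorithm 2)."""
    mu = list(mu0)
    for _ in range(max_epochs):
        # Step 3-4: evaluate the energy around mu to form a numerical gradient.
        grad = []
        for i in range(len(mu)):
            up = mu[:i] + [mu[i] + h] + mu[i + 1:]
            dn = mu[:i] + [mu[i] - h] + mu[i + 1:]
            grad.append((energy(up) - energy(dn)) / (2 * h))
        # Step 5: mu <- mu - gamma * grad E(mu).
        mu = [m - gamma * g for m, g in zip(mu, grad)]
    return mu

# Toy quadratic energy standing in for the filter-based E_{T_s}(mu).
mu_hat = gradient_descent(lambda m: (m[0] - 3.0) ** 2 + (m[1] + 1.0) ** 2,
                          [0.0, 0.0], gamma=0.1, max_epochs=200)
print(mu_hat)
```

The step size `gamma` and the number of epochs are problem dependent; the filter-based energy is generally non-convex in the network weights, so convergence to a global minimizer is not guaranteed.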

## IV Experimental Results

In this section, we evaluate the proposed method on a simple pendulum and a Duffing oscillator. The kinetic energy is assumed symmetric in velocity, $T(\dot{q})=T(-\dot{q})$, and we enforce this property by parameterizing it as a function of the squared velocity, using $\dot{q}^{2}$ as the input to the network. Additionally, we impose the condition $T(0)=0$ by subtracting the network output evaluated at zero velocity, thereby ensuring that the kinetic energy is zero when $\dot{q}=0$. The kinetic and potential energy networks are fully connected neural networks with one and two hidden layers of 32 and 64 neurons, respectively. Both use the softplus activation function, followed by a linear output layer without bias. To fix the reference level of the potential energy, we set $V(0)=0$ by subtracting the network output at $q=0$. Both networks are trained using the Adam optimizer with a learning rate of $5\times 10^{-3}$. The dataset is divided into training and testing subsets, with 70% of the data used for model training and the remaining 30% reserved for testing and performance evaluation.
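The structural constraints on the energy networks described above can be sketched as follows. This is an illustrative reading, not the authors' code: the helper names (`softplus`, `mlp`, `init`, `kinetic`, `potential`), the random initialization, and the interpretation of the layer sizes (one hidden layer of 32 units for the kinetic network, two hidden layers of 64 units for the potential network) are assumptions.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)          # numerically stable log(1 + exp(z))

def mlp(params, u):
    """Fully connected net: softplus hidden layers, linear output without bias."""
    a = np.atleast_1d(u).astype(float)
    for W, b in params["hidden"]:
        a = softplus(W @ a + b)
    return float(params["out"] @ a)

def init(sizes, rng):
    """Random weights for layer widths `sizes` (input first); hypothetical scheme."""
    hidden = [(rng.normal(0.0, 0.5, (m, n)), np.zeros(m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    return {"hidden": hidden, "out": rng.normal(0.0, 0.5, sizes[-1])}

def kinetic(params, qd):
    # Feeding qd**2 makes T even in qd; subtracting the value at 0 gives T(0) = 0.
    return mlp(params, qd**2) - mlp(params, 0.0)

def potential(params, q):
    # Subtracting the value at q = 0 fixes the reference level V(0) = 0.
    return mlp(params, q) - mlp(params, 0.0)

rng = np.random.default_rng(0)
T_params = init([1, 32], rng)            # one hidden layer of 32 units (assumed)
V_params = init([1, 64, 64], rng)        # two hidden layers of 64 units (assumed)
print(kinetic(T_params, 0.7) == kinetic(T_params, -0.7), potential(V_params, 0.0))
```

Note that these constraints alone do not guarantee that $\partial^{2}L_{\text{NN}}/\partial\dot{q}\,\partial\dot{q}$ is non-singular, which the derivation of (4) assumes.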

We implement the proposed EKF (PrEKF) and proposed cubature Kalman filter (PrCKF), and compare their performance with LNNs, the traditional EKF (TrEKF), and the traditional CKF (TrCKF), assuming the model is known. In the traditional LNNs, the velocity $\dot{q}$ is obtained via numerical differentiation, and $(q,\dot{q})$ are used as input to the LNNs.

For the simple pendulum problem, the states $(q,\dot{q})$ represent the angular position and angular velocity, respectively. The ground truth angular acceleration is given by $\ddot{q}=-\frac{g}{l}\sin(q)$ \[[26](https://arxiv.org/html/2606.31137#bib.bib9)\], where $g$ is gravitational acceleration and $l$ is pendulum length. The mass and length of the pendulum are set to unity, and $Q_{k}$ is the same as in \[[26](https://arxiv.org/html/2606.31137#bib.bib9), pp. 67–68\]. The noisy measurement model is $y_{k}=q_{k}+\nu_{k}$, where $\nu_{k}\sim\mathcal{N}(0,0.01)$.

The results summarized in Table [I](https://arxiv.org/html/2606.31137#S4.T1) and Table [II](https://arxiv.org/html/2606.31137#S4.T2) indicate that the proposed methods (PrEKF and PrCKF) achieve RMSE values comparable to those of standard estimators (TrEKF and TrCKF) with a known model for both noisy and noise-free measurements. On the other hand, the RMSE obtained using LNNs is significantly higher for noisy $y_{k}$ because the measurement noise in $q$ is amplified during numerical differentiation used to obtain $\dot{q}$. However, as reported in Table [II](https://arxiv.org/html/2606.31137#S4.T2), in the absence of measurement noise in $q$, the LNNs also provide results comparable to the traditional and proposed filters.
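The noise amplification mentioned above can be made explicit with a short calculation. As an illustrative assumption (the paper does not state which differentiation scheme is used), suppose the velocity is obtained by a backward difference of measurements $y_{k}=q_{k}+\nu_{k}$ with independent noise $\nu_{k}\sim\mathcal{N}(0,\sigma^{2})$:

$$\frac{y_{k}-y_{k-1}}{\Delta t}=\frac{q_{k}-q_{k-1}}{\Delta t}+\frac{\nu_{k}-\nu_{k-1}}{\Delta t},\qquad \mathrm{Var}\Big[\frac{\nu_{k}-\nu_{k-1}}{\Delta t}\Big]=\frac{2\sigma^{2}}{\Delta t^{2}}.$$

The velocity estimate therefore carries noise with standard deviation $\sqrt{2}\,\sigma/\Delta t$, which grows as the sampling interval shrinks, whereas the filter-based approach treats $\nu_{k}$ through the measurement model (11) instead of differentiating it.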

TABLE I: RMSE values of different methods under noisy measurements for the simple pendulum and Duffing oscillator.

TABLE II: RMSE values of different methods for the simple pendulum and Duffing oscillator under noise-free $y_{k}$.

In the Duffing oscillator problem \[[19](https://arxiv.org/html/2606.31137#bib.bib37)\], the states $(q,\dot{q})$ denote the position and velocity, respectively. The acceleration is given by $\ddot{q}=-\alpha q-\beta q^{3}$, where $\alpha=-1$ N/m and $\beta=1$ N/m$^{3}$, and $Q_{k}=10^{-5}I_{2\times 2}$. The measurement equation is $y_{k}=q_{k}+\nu_{k}$, where $\nu_{k}\sim\mathcal{N}(0,0.01)$. The RMSE values reported in Table [I](https://arxiv.org/html/2606.31137#S4.T1) and Table [II](https://arxiv.org/html/2606.31137#S4.T2) show trends similar to those observed in the simple pendulum problem, with PrEKF and PrCKF achieving performance comparable to TrEKF and TrCKF with a known model, while the LNNs exhibit degraded performance for noisy measurements but comparable performance in the noise-free case.

## V Conclusion

In this paper, we have proposed a Bayesian filtering\-based framework for learning system dynamics from partial, noisy measurements using a Lagrangian mechanics formulation\. The kinetic and potential energies were parameterized using neural networks, and unknown forces were modeled as white Gaussian noise\. A continuous\-time stochastic state\-space model was derived from the Lagrangian formulation\. The maximum\-likelihood approach was employed to jointly estimate the system states and neural network parameters using Gaussian approximation\-based Bayesian filters\. Numerical examples involving a simple pendulum and a Duffing oscillator demonstrated the effectiveness of the proposed approach and highlighted its advantages over conventional Lagrangian neural networks\.

## References

- \[1\] (2007) Discrete-time nonlinear filtering algorithms using Gauss–Hermite quadrature. Proceedings of the IEEE 95(5), pp. 953–977.
- \[2\] I. Arasaratnam and S. Haykin (2009) Cubature Kalman filters. IEEE Transactions on Automatic Control 54(6), pp. 1254–1269.
- \[3\] K. E. Atkinson (2008) An introduction to numerical analysis. John Wiley & Sons.
- \[4\] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan (2001) Estimation with applications to tracking and navigation: theory algorithms and software. John Wiley & Sons.
- \[5\] S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis (2021) Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta Mechanica Sinica 37(12), pp. 1727–1738.
- \[6\] S. R. Chu, R. Shoureshi, and M. Tenorio (2002) Neural networks for system identification. IEEE Control Systems Magazine 10(3), pp. 31–35.
- \[7\] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho (2020) Lagrangian neural networks. In Proceedings of the 8th International Conference on Learning Representations, pp. 1–7.
- \[8\] S. Das, K. Kumar, M. Iqbal, O. Savolainen, D. Baumann, L. Ruotsalainen, and S. Särkkä (2026) Integrating Lagrangian neural networks into the Dyna framework for reinforcement learning. To appear in Proceedings of 34th European Signal Processing Conference (EUSIPCO).
- \[9\] Á. F. García-Fernández, L. Svensson, M. R. Morelande, and S. Särkkä (2015) Posterior linearization filter: principles and implementation using sigma points. IEEE Transactions on Signal Processing 63(20), pp. 5561–5573.
- \[10\] H. Goldstein, C. P. Poole, and J. Safko (1950) Classical mechanics. Addison-Wesley Reading.
- \[11\] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press.
- \[12\] M. S. Grewal, L. R. Weill, and A. P. Andrews (2007) Global positioning systems, inertial navigation, and integration. John Wiley & Sons.
- \[13\] S. Greydanus, M. Dzamba, and J. Yosinski (2019) Hamiltonian neural networks. In Proceedings of the 33rd Conference on Neural Information Processing Systems, pp. 15379–15389.
- \[14\] K. Ito and K. Xiong (2002) Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control 45(5), pp. 910–927.
- \[15\] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang (2021) Physics-informed machine learning. Nature Reviews Physics 3(6), pp. 422–440.
- \[16\] J. Kokkala, A. Solin, and S. Särkkä (2016) Sigma-point filtering and smoothing based parameter estimation in nonlinear dynamic systems. Journal of Advances in Information Fusion 11(1), pp. 15–30.
- \[17\] T. Lee, M. Leok, and N. H. McClamroch (2018) Global formulations of Lagrangian and Hamiltonian dynamics on manifolds. Springer.
- \[18\] T. Lefebvre, H. Bruyninckx, and J. De Schutter (2002) Comment on “A new method for the nonlinear transformation of means and covariances in filters and estimators” \[with authors’ reply\]. IEEE Transactions on Automatic Control 47(8), pp. 1406–1409.
- \[19\] G. Li, L. Zeng, L. Zhang, and Q. J. Wu (2017) State identification of Duffing oscillator based on extreme learning machine. IEEE Signal Processing Letters 25(1), pp. 25–29.
- \[20\] X. R. Li and V. P. Jilkov (2003) Survey of maneuvering target tracking. Part I: dynamic models. IEEE Transactions on Aerospace and Electronic Systems 39(4), pp. 1333–1364.
- \[21\] M. Lutter, K. Listmann, and J. Peters (2019) Deep Lagrangian networks for end-to-end learning of energy-based control for under-actuated systems. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7718–7725. [doi:10.1109/IROS40897.2019.8968268](https://dx.doi.org/10.1109/IROS40897.2019.8968268)
- \[22\] M. Lutter and J. Peters (2023) Combining physics and deep learning to learn continuous-time dynamics models. The International Journal of Robotics Research 42(3), pp. 83–107. [doi:10.1177/02783649231169492](https://dx.doi.org/10.1177/02783649231169492)
- \[23\] M. Lutter, C. Ritter, and J. Peters (2019) Deep Lagrangian networks: using physics as model prior for deep learning. In International Conference on Learning Representations, pp. 1–17.
- \[24\] K. S. Narendra and K. Parthasarathy (1990) Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1(1), pp. 4–27.
- \[25\] G. Rigatos (2011) Modelling and control for intelligent industrial systems: adaptive algorithms in robotics and industrial engineering. Springer Science & Business Media.
- \[26\] S. Särkkä and L. Svensson (2023) Bayesian filtering and smoothing. 2nd edition, Cambridge University Press.
- \[27\] S. Särkkä, A. Solin, A. Nummenmaa, A. Vehtari, T. Auranen, S. Vanni, and F. H. Lin (2012) Dynamic retrospective filtering of physiological noise in BOLD fMRI: DRIFTER. NeuroImage 60(2), pp. 1517–1527.
- \[28\] S. Särkkä and A. Solin (2019) Applied stochastic differential equations. Cambridge University Press.
- \[29\] S. Särkkä (2019) The use of Gaussian processes in system identification. In Encyclopedia of Systems and Control, pp. 1–10.
- \[30\] B. Siciliano, O. Khatib, and T. Kröger (2008) Springer handbook of robotics. New York, NY, USA: Springer.
- \[31\] A. Svensson, T. B. Schön, A. Solin, and S. Särkkä (2015) Nonlinear state space model identification using a regularized basis function expansion. In 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 481–484.
- \[32\] A. Svensson, A. Solin, S. Särkkä, and T. Schön (2016) Computationally efficient Bayesian learning of Gaussian process state space models. In Proceedings of Artificial Intelligence and Statistics, pp. 213–221.
- \[33\] F. Tronarp, Á. F. García-Fernández, and S. Särkkä (2018) Iterative filtering and smoothing in nonlinear and non-Gaussian systems using conditional moments. IEEE Signal Processing Letters 25(3), pp. 408–412.
- \[34\] E. A. Wan and R. Van Der Merwe (2000) The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pp. 153–158.