EMMA: Extracting Multiple physical parameters from Multimodal Data

Hugging Face Daily Papers 05/21/26, 12:00 AM Papers

Summary

EMMA is a physics-informed multimodal framework that recovers dynamical parameters from raw video, audio, and image data using a Liquid Time-Constant network and physics-constrained loss, outperforming existing baselines across diverse benchmarks.

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, or assumptions about known initial conditions and coordinate frames, EMMA performs joint inference of explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model. EMMA leverages a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities while a physics-constrained loss enforces consistency with the governing differential equations. A unified feature pipeline enables consistent alignment across video trajectories, acoustic signatures, and chart-derived measurements, allowing EMMA to estimate parameters under forced, implicit, and multivariate dynamics without requiring segmentation masks, differentiable rendering, or specialized sensors. Across 100+ scenarios including five standard dynamical benchmarks (75 Delfys videos), real-world rover and quadrotor systems with hidden inputs, and simulation-chart case studies spanning biological and chaotic systems, EMMA delivers robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. Our results establish EMMA as a general, scalable solution for physics-consistent model extraction from opportunistic multimodal data. Code and data are available at: https://github.com/ImpactLabASU/EMMA-CVPR2026
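To make the idea of a physics-constrained loss concrete, here is a minimal sketch, not EMMA's implementation. It assumes a hypothetical damped oscillator, x'' + c x' + k x = 0, whose trajectory has already been extracted from some modality. It scores candidate parameters (c, k) by the mean squared residual of that differential equation. The system, the function names, and the least-squares solver are all illustrative assumptions. The abstract describes an LTC network learning the latent dynamics, which this sketch replaces with plain finite differences on a clean trajectory.

```python
import numpy as np


def damped_oscillator(t, c, k, x0=1.0):
    """Closed-form underdamped solution of x'' + c x' + k x = 0, x(0)=x0, x'(0)=0."""
    omega = np.sqrt(k - 0.25 * c**2)
    envelope = x0 * np.exp(-0.5 * c * t)
    return envelope * (np.cos(omega * t) + (0.5 * c / omega) * np.sin(omega * t))


def derivatives(x, dt):
    """Finite-difference x' and x''; trim two samples per end, where stencils are one-sided."""
    xd = np.gradient(x, dt)
    xdd = np.gradient(xd, dt)
    return x[2:-2], xd[2:-2], xdd[2:-2]


def physics_loss(params, x, xd, xdd):
    """Mean squared residual of the assumed governing equation."""
    c, k = params
    residual = xdd + c * xd + k * x
    return float(np.mean(residual**2))


def fit_parameters(x, xd, xdd):
    """The residual is linear in (c, k), so its minimizer is a least-squares solve."""
    design = np.column_stack([xd, x])
    solution, *_ = np.linalg.lstsq(design, -xdd, rcond=None)
    return solution


# Hypothetical "observed" trajectory with ground truth c=0.5, k=4.0.
dt = 0.01
t = np.arange(0.0, 10.0, dt)
x_obs = damped_oscillator(t, c=0.5, k=4.0)

x, xd, xdd = derivatives(x_obs, dt)
c_hat, k_hat = fit_parameters(x, xd, xdd)
print(f"recovered c={c_hat:.4f}, k={k_hat:.4f}")
print("loss at recovered parameters:", physics_loss((c_hat, k_hat), x, xd, xdd))
```

Because the residual here is linear in the unknown parameters, a direct solve suffices. With a learned latent model or nonlinear dynamics, the same residual would more plausibly serve as one term in a gradient-based training loss.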

Cached at: 06/09/26, 08:41 AM

Source: https://huggingface.co/papers/2605.24047

Similar Articles

EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

Deep Temporal Modeling and Ensemble Fusion for Multimodal Emotion Recognition from Physiological Signals

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding

Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs
