LaWAM enables efficient robot control by predicting compact latent visual subgoals rather than generating full videos, which is expensive. It achieves state-of-the-art success rates with up to 24× lower latency than pixel-space world action models.
This paper argues that large language models struggle with causal reasoning and long-horizon planning because of a mismatch between sequence prediction and reasoning over latent environment dynamics. It introduces the Latent Dynamics Inference perspective and the Flux environment for studying these limitations.
GPLD introduces a gradient-penalized latent dynamics regularizer for DreamerV3 that enforces local smoothness in transition learning, improving sample efficiency on continuous control tasks, especially complex locomotion tasks.
EMMA is a physics-informed multimodal framework that recovers dynamical parameters from raw video, audio, and image data using a Liquid Time-Constant network and physics-constrained loss, outperforming existing baselines across diverse benchmarks.
NormWear-2 is a world model that encodes multivariate physiological signals and clinical interventions into a shared latent space and uses chaos-theoretic balancing to improve long-horizon forecasting across daily-life, point-of-care, and clinical settings.
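The GPLD summary says only that a gradient penalty enforces local smoothness in the learned transition. One common way to express that idea, assumed here and not taken from the paper, is to penalize the squared Frobenius norm of the Jacobian of the next-latent prediction with respect to the current latent. This pure-Python sketch uses a hypothetical one-layer `tanh` transition so the Jacobian can be written analytically. The names `transition`, `W`, `U`, `b`, and `lam` are illustrative only and are not GPLD or DreamerV3 code.

```python
import math


def transition(z, a, W, U, b):
    """Hypothetical toy latent transition: z_next = tanh(W z + U a + b)."""
    pre = [
        sum(W[i][j] * z[j] for j in range(len(z)))
        + sum(U[i][k] * a[k] for k in range(len(a)))
        + b[i]
        for i in range(len(b))
    ]
    return [math.tanh(p) for p in pre]


def jacobian_wrt_latent(z, a, W, U, b):
    """Analytic Jacobian dz_next/dz = diag(1 - tanh(pre)^2) @ W."""
    z_next = transition(z, a, W, U, b)
    return [
        [(1.0 - z_next[i] ** 2) * W[i][j] for j in range(len(z))]
        for i in range(len(z_next))
    ]


def gradient_penalty(z, a, W, U, b):
    """Squared Frobenius norm of the Jacobian. Large values mean the predicted
    next latent changes sharply under small perturbations of the current one."""
    J = jacobian_wrt_latent(z, a, W, U, b)
    return sum(x * x for row in J for x in row)


def regularized_loss(z, a, target, W, U, b, lam=0.1):
    """Assumed objective: prediction MSE plus lam times the smoothness penalty."""
    pred = transition(z, a, W, U, b)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return mse + lam * gradient_penalty(z, a, W, U, b)


if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3]]
    U = [[0.2], [-0.4]]
    b = [0.0, 0.1]
    z, a = [0.3, -0.1], [1.0]
    target = [0.2, 0.0]
    print("penalty:", gradient_penalty(z, a, W, U, b))
    print("loss:", regularized_loss(z, a, target, W, U, b))
```

In a learned world model the Jacobian would come from automatic differentiation through the dynamics network rather than a closed form, and the penalty weight would be tuned alongside DreamerV3's other loss terms.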