@BasileTerv987: Accepted to TMLR, with reproducibility certification v2 of our JEPA-WM study (arXiv:2512.24497) is out, with new data-s…
Summary
Basile Terver and colleagues' paper on Joint-Embedding Predictive World Models (JEPA-WM) for robotics has been accepted to TMLR with a reproducibility certification. The updated version includes new data-scaling experiments, a Lipschitz analysis of multistep rollout training, and extended discussions.
Cached at: 05/25/26, 02:40 PM
Accepted to TMLR, with reproducibility certification
v2 of our JEPA-WM study (arXiv:2512.24497) is out, with new data-scaling experiments, a Lipschitz analysis of multistep rollout training, and extended discussions.
Recap + what’s new
w/ @JimmyTYYang1, Jean Ponce, @AdrienBardes, @ylecun
Basile Terver (@BasileTerv987): My first PhD paper is out! 🎓
“What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?”
tl;dr: JEPA-WMs for robotics: learn dynamics on top of visual encoders, optimize actions towards a goal 👇
w/ @JimmyTYYang1, Jean Ponce, @AdrienBardes, @ylecun
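To make the tl;dr concrete, here is a minimal sketch of the general idea (encode observations, roll a dynamics model forward in latent space, and search for actions whose predicted final latent lands near the goal's latent). Everything in it is an illustrative assumption rather than the paper's code: `encode` and `predict` are hypothetical toy stand-ins for a frozen visual encoder and a learned predictor, and the cross-entropy method (CEM) is used here only as one common sampling-based planner.

```python
import numpy as np

def encode(obs):
    # Hypothetical stand-in for a frozen visual encoder (identity on a 2-D state).
    return np.asarray(obs, dtype=float)

def predict(z, a):
    # Hypothetical stand-in for a learned latent dynamics model: z_next = z + a.
    return z + a

def rollout_cost(z0, actions, z_goal):
    # Unroll the predictor over the action sequence, then measure the
    # distance between the final predicted latent and the goal latent.
    z = z0
    for a in actions:
        z = predict(z, a)
    return float(np.linalg.norm(z - z_goal))

def plan_cem(obs, goal_obs, horizon=5, pop=200, elites=20, iters=10, seed=0):
    # Cross-entropy method: sample action sequences, keep the lowest-cost
    # ones, refit a Gaussian to them, and repeat.
    rng = np.random.default_rng(seed)
    z0, z_goal = encode(obs), encode(goal_obs)
    dim = z0.shape[0]
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon, dim))
        samples = np.clip(samples, -1.0, 1.0)  # assumed per-step action bounds
        costs = np.array([rollout_cost(z0, s, z_goal) for s in samples])
        elite = samples[np.argsort(costs)[:elites]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean  # planned action sequence, shape (horizon, dim)

if __name__ == "__main__":
    plan = plan_cem([0.0, 0.0], [2.0, -1.0])
    print("planned actions:\n", plan)
    print("final latent distance to goal:",
          rollout_cost(encode([0.0, 0.0]), plan, encode([2.0, -1.0])))
```

In a real system the encoder would be a pretrained vision model and the predictor a trained network; only the planning loop would keep roughly this shape.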
Similar Articles
@artemZholus: thanks! in the second paper (https://arxiv.org/abs/2605.06388) we used your (and RAE's) recipe and it worked.
This paper systematically compares reconstruction-based and semantic latent spaces for action-conditioned latent diffusion world models in robotics. It finds that semantic encoders like V-JEPA 2.1 generally outperform reconstruction encoders on policy-relevant metrics, advocating for semantic latent spaces as a stronger foundation for robotics world models.
So, what is Yann LeCun's "World Models" and JEPA and is it Really a Replacement for LLMs?
Discusses Yann LeCun's 'World Models' and JEPA from a recent arXiv paper, clarifying that JEPA is not a replacement for LLMs but an architecture optimized for visual processing in robotics, self-driving, and industrial controls.
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
This paper audits joint-embedding predictive architectures (JEPA) for LLM fine-tuning on a natural-language-to-regex task, testing twenty-two auxiliary objectives. The results show that hidden-state representation improvements are only weakly coupled to decoded-task accuracy, with no auxiliary objective surviving family-wise correction.
Sub-JEPA: Subspace Gaussian Regularization for Stable End-to-End World Models
The authors introduce Sub-JEPA, a method using Subspace Gaussian Regularization to improve the stability of end-to-end world models like LeWM, showing consistent performance gains on continuous-control benchmarks.
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
LeWorldModel introduces a stable, end-to-end Joint-Embedding Predictive Architecture that trains directly from pixels with minimal hyperparameters and provable anti-collapse guarantees. It achieves significant speedups in planning compared to foundation models while maintaining competitive performance on robotic manipulation tasks.