PhysiFormer: Learning to Simulate Mechanics in World Space
Summary
PhysiFormer uses coordinate-space diffusion to generate physically-plausible 3D object motions without explicit inductive biases, enabling efficient multi-object reasoning and generalization to complex materials and geometries.
Cached at: 06/26/26, 02:04 AM
Source: https://huggingface.co/papers/2606.27364
Abstract
We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as object material type, rigid or elastic, the model samples future vertex trajectories. While related neural physics approaches build on ad-hoc latent spaces or explicitly enforce rigidity and causality, PhysiFormer shows that excellent results can be obtained without any such inductive biases, by casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates. The probabilistic formulation captures uncertainty in the learned dynamics, enabling diverse plausible futures from initial conditions, making this framework potentially useful for applications with unobserved uncertainty. The model features attention factorised over time, space, and objects for efficiency, enabling permutation-invariant multi-object reasoning without needing explicit object encoding. Trained on over 100k simulated trajectories, PhysiFormer generates rigid and elastic mechanics, and generalises to mixed-material settings, unseen real-world geometries, and larger object counts. It substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency. Our results position coordinate-space diffusion as a promising step toward view-invariant, geometry-aware world modelling for robotics, graphics, and physical design. Visualisations, code, and models are available at https://yimingc9.github.io/physiformer.
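The abstract names rigidity preservation and momentum-based physical consistency as evaluation criteria but does not define them. The sketch below is one plausible way to score a vertex trajectory on both, not the paper's metric: the function names (`rigidity_error`, `momentum_drift`), the equal-mass default, and the finite-difference velocities are all assumptions made for illustration. Total linear momentum is only expected to stay constant when no external force such as gravity acts on the system, so the second check applies to that case only.

```python
import math


def pairwise_distances(points):
    """All pairwise Euclidean distances between the vertices of one frame."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def rigidity_error(trajectory):
    """Largest absolute change of any pairwise vertex distance relative to frame 0.

    trajectory: list of frames; each frame is a list of (x, y, z) vertex positions.
    A perfectly rigid motion returns 0.0 (hypothetical metric, not from the paper).
    """
    reference = pairwise_distances(trajectory[0])
    worst = 0.0
    for frame in trajectory[1:]:
        for d_ref, d in zip(reference, pairwise_distances(frame)):
            worst = max(worst, abs(d - d_ref))
    return worst


def momentum_drift(trajectory, dt, masses=None):
    """Largest deviation of total linear momentum from its value at the first step.

    Velocities are finite differences between consecutive frames, so the
    trajectory needs at least two frames. Vertex masses default to 1.0 each
    (an assumption; real meshes would need per-vertex masses).
    """
    masses = masses or [1.0] * len(trajectory[0])
    momenta = []
    for prev, curr in zip(trajectory, trajectory[1:]):
        total = [0.0, 0.0, 0.0]
        for m, a, b in zip(masses, prev, curr):
            for k in range(3):
                total[k] += m * (b[k] - a[k]) / dt
        momenta.append(total)
    return max(math.dist(p, momenta[0]) for p in momenta)


# Toy trajectory: a triangle translating along x at constant velocity.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rigid = [[(x + 0.5 * t, y, z) for x, y, z in triangle] for t in range(4)]

print(rigidity_error(rigid))
print(momentum_drift(rigid, dt=1.0))
```

Because both scores are computed directly from world-space vertex positions, they can be applied to any sampled trajectory without rendering or a camera model.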
Similar Articles
PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions
PhyGenHOI is a novel framework that generates physically accurate 4D human-object interactions by coupling motion diffusion models with material point method simulations using 3D Gaussian representations.
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
PhysForge is a two-stage framework that generates interactive 3D assets with grounded physics and kinematic parameters, addressing the bottleneck of static geometry in virtual worlds.
PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation
PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks, achieving state-of-the-art results on benchmarks.
EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
EgoPhys introduces a framework to construct deformable physical digital twins from egocentric RGB video using generalizable priors and a compact codebook, enabling zero-shot generalization to unseen objects without per-spring optimization. The system is demonstrated on a real robot, showing that egocentric human play video can serve as internal world representation for deformable-object planning.
@ziqi_huang_: An interesting work on Physical AI: PhysX-Omni. First unified sim-ready generation framework for rigid, deformable, and…
PhysX-Omni is a unified framework for simulation-ready physical 3D generation covering rigid, deformable, and articulated objects, with a new dataset (PhysXVerse) and benchmark (PhysX-Bench).