Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Hugging Face Daily Papers 05/19/26, 12:00 AM Papers

Summary

A training-free 4D mesh generation approach using Spatio-Temporal Attention Chains accelerates creation to 9 seconds (13x speedup) while improving temporal consistency and scaling to longer sequences, with zero-shot capabilities for tracking and camera estimation.

4D mesh generation has recently emerged as a powerful paradigm for recovering dynamic 3D structure from videos, but existing methods remain slow, computationally expensive, and difficult to scale to longer sequences. We introduce a training-free approach that accelerates 4D mesh generation while improving temporal correspondence quality. Our key observation is that temporal correspondences emerge inside a 4D backbone long before its generated meshes become visually accurate. We exploit this with a general framework we call Spatio-Temporal Attention Chain, which propagates information across space and time. Starting from vertices on an anchor mesh, the chain maps vertices to latent tokens. It then follows temporal correspondences in latent space and recovers frame-specific vertices through latent-to-vertex attention. This design avoids expensive explicit matching while preserving anchor mesh details, thereby improving dynamic mesh geometry and temporal consistency. Compared to the state of the art, our method generates a 4D mesh in 9 seconds, achieving a 13× speedup while producing higher-quality results. Moreover, our approach scales to videos up to 16× longer without degrading mesh quality. Beyond generation, the improved correspondences enable competitive zero-shot performance on two downstream tasks: 2D object tracking and 4D tracking. We further show that our framework enables reliable camera estimation, a capability not supported by prior 4D mesh generation methods.
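The three hops of the chain described above (vertex to latent token, latent token to latent token across time, latent token back to vertex) can be illustrated as a product of attention matrices. The following is only a minimal sketch under stated assumptions, not the paper's implementation: it assumes each hop is a plain softmax over scaled dot products, and the feature arrays, the function names `softmax_attention` and `attention_chain`, and the random inputs are all hypothetical stand-ins for whatever the 4D backbone actually provides.

```python
import numpy as np


def softmax_attention(queries, keys):
    """Row-stochastic attention: softmax over keys of scaled dot products."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def attention_chain(anchor_vertex_feats, anchor_tokens, frame_tokens,
                    frame_vertex_feats, frame_vertex_positions):
    """Hypothetical three-hop chain; all inputs are assumed, not from the paper.

    Returns estimated positions of the anchor vertices in the target frame
    and the composed (anchor vertex -> frame vertex) attention matrix.
    """
    # Hop 1: anchor-mesh vertices attend to latent tokens of the anchor frame.
    vertex_to_latent = softmax_attention(anchor_vertex_feats, anchor_tokens)
    # Hop 2: follow temporal correspondences between latent tokens.
    latent_to_latent = softmax_attention(anchor_tokens, frame_tokens)
    # Hop 3: latent tokens of the target frame attend to that frame's vertices.
    latent_to_vertex = softmax_attention(frame_tokens, frame_vertex_feats)
    # Composing row-stochastic matrices yields another row-stochastic matrix.
    chain = vertex_to_latent @ latent_to_latent @ latent_to_vertex
    # Each anchor vertex becomes a convex combination of frame positions.
    return chain @ frame_vertex_positions, chain


rng = np.random.default_rng(0)
dim, n_anchor, n_tokens, n_frame = 8, 5, 4, 6
frame_positions = rng.normal(size=(n_frame, 3))
positions, chain = attention_chain(
    rng.normal(size=(n_anchor, dim)),   # anchor vertex features (assumed)
    rng.normal(size=(n_tokens, dim)),   # anchor-frame latent tokens (assumed)
    rng.normal(size=(n_tokens, dim)),   # target-frame latent tokens (assumed)
    rng.normal(size=(n_frame, dim)),    # target-frame vertex features (assumed)
    frame_positions,
)
print(positions.shape)  # (5, 3)
```

Because every hop is row-stochastic, no explicit nearest-neighbor matching between meshes is needed in this sketch: the correspondence is carried entirely by the matrix product, and the output keeps one row per anchor vertex, so the anchor mesh's connectivity could be reused for every frame.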

Original Article

Cached at: 05/20/26, 06:39 PM

Source: https://huggingface.co/papers/2605.19786 Published on May 19

Submitted by Samuel (https://huggingface.co/Dvir) on May 20

Abstract

A training-free 4D mesh generation approach uses spatio-temporal attention chains to accelerate mesh creation while improving temporal correspondence quality and enabling scalable long-sequence processing.

4D mesh generation has recently emerged as a powerful paradigm for recovering dynamic 3D structure from videos, but existing methods remain slow, computationally expensive, and difficult to scale to longer sequences. We introduce a training-free approach that accelerates 4D mesh generation while improving temporal correspondence quality. Our key observation is that temporal correspondences emerge inside a 4D backbone long before its generated meshes become visually accurate. We exploit this with a general framework we call Spatio-Temporal Attention Chain, which propagates information across space and time. Starting from vertices on an anchor mesh, the chain maps vertices to latent tokens. It then follows temporal correspondences in latent space and recovers frame-specific vertices through latent-to-vertex attention. This design avoids expensive explicit matching while preserving anchor mesh details, thereby improving dynamic mesh geometry and temporal consistency. Compared to the state of the art, our method generates a 4D mesh in 9 seconds, achieving a 13× speedup while producing higher-quality results. Moreover, our approach scales to videos up to 16× longer without degrading mesh quality. Beyond generation, the improved correspondences enable competitive zero-shot performance on two downstream tasks: 2D object tracking and 4D tracking. We further show that our framework enables reliable camera estimation, a capability not supported by prior 4D mesh generation methods.

Get this paper in your agent:

hf papers read 2605.19786

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Helix4D: Complex 4D Mesh Generation

D4RT: Teaching AI to see the world in four dimensions

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

NeuROK: Generative 4D Neural Object Kinematics
