Helix4D: Complex 4D Mesh Generation

Hugging Face Daily Papers 05/25/26, 12:00 AM Papers

Summary

Helix4D introduces a framework for high-quality dynamic 4D mesh generation from video by extending Trellis2 with cross-frame attention and a 4D temporal encoding that repurposes redundant spatial RoPE bands without adding parameters.

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework that inherits the expressive representation of Trellis2 and adapts it from image-to-3D to video-conditioned 4D generation. Our design arises from two key questions: (a) how to enable Trellis2's frame-local attention to share information across frames while preserving its pretrained quality on rare cases such as transparent objects and inner surfaces, and (b) how to inject temporal information into a purely 3D positional encoding without breaking pretrained capabilities. We address (a) with sliding-window cross-frame attention anchored on the first frame. The first frame is generated by the base Trellis2 model and injected into our model, letting it inherit Trellis2's quality in rare cases through cross-frame attention. We address (b) with a 4D temporal encoding that repurposes redundant low-frequency spatial RoPE bands for time, extending the encoding from 3D with no additional parameters. Extensive experiments show the effectiveness of Helix4D for high-quality dynamic mesh generation on ActionBench and our own challenging complex dynamics set.
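As a rough illustration of the anchored sliding window described in (a), the frame-level visibility pattern could be sketched as below. This is not the authors' implementation: the function name `frame_attention_mask`, the `window` parameter, and the choice of a symmetric (non-causal) window are assumptions made for illustration. The abstract does not state the window size or whether the window is causal.

```python
def frame_attention_mask(num_frames: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumed rule (hypothetical, for illustration only):
      - frame i sees frames within `window` steps of itself, and
      - every frame additionally sees frame 0, the anchor frame.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if window < 0:
        raise ValueError("window must be non-negative")

    mask = []
    for i in range(num_frames):
        row = [abs(i - j) <= window or j == 0 for j in range(num_frames)]
        mask.append(row)
    return mask


# Small usage example: 6 frames, each seeing its direct neighbours
# plus the anchor frame.
mask = frame_attention_mask(num_frames=6, window=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a full model each entry of this frame-level mask would presumably be expanded to a block covering all token pairs of the two frames. The first column stays set for every row, which is what lets later frames read from the anchor frame regardless of how far the window has moved.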

Original Article

Cached at: 05/26/26, 06:42 AM

Paper page - Helix4D: Complex 4D Mesh Generation

Source: https://huggingface.co/papers/2605.26109

Abstract

Helix4D enables high-quality dynamic mesh generation by adapting Trellis2’s frame-local attention across frames and extending 3D positional encoding with 4D temporal information.

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework that inherits the expressive representation of Trellis2 and adapts it from image-to-3D to video-conditioned 4D generation. Our design arises from two key questions: (a) how to enable Trellis2’s frame-local attention to share information across frames while preserving its pretrained quality on rare cases such as transparent objects and inner surfaces, and (b) how to inject temporal information into a purely 3D positional encoding without breaking pretrained capabilities. We address (a) with sliding-window cross-frame attention anchored on the first frame. The first frame is generated by the base Trellis2 model and injected into our model, letting it inherit Trellis2’s quality in rare cases through cross-frame attention. We address (b) with a 4D temporal encoding that repurposes redundant low-frequency spatial RoPE bands for time, extending the encoding from 3D with no additional parameters. Extensive experiments show the effectiveness of Helix4D for high-quality dynamic mesh generation on ActionBench and our own challenging complex dynamics set.
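One way to picture the encoding in (b) is the sketch below, which reassigns the lowest-frequency rotary pair of each spatial axis to the time coordinate. Everything concrete here is an assumption rather than the paper's design: the per-axis split of rotary pairs, the geometric frequency schedule with base 10000, the number of reassigned pairs, the reuse of each slot's original frequency for time, and the names `rope_angles_4d` and `apply_rope`. The abstract states only that redundant low-frequency spatial bands are repurposed for time and that no parameters are added.

```python
import math


def axis_frequencies(num_pairs: int, base: float = 10000.0) -> list[float]:
    """Geometric RoPE frequencies, highest first, lowest last."""
    return [base ** (-k / num_pairs) for k in range(num_pairs)]


def rope_angles_4d(x, y, z, t, pairs_per_axis=4, time_pairs_per_axis=1,
                   base=10000.0) -> list[float]:
    """Rotation angle for every rotary pair of one token.

    Assumption: the last `time_pairs_per_axis` (lowest-frequency) pairs of
    each spatial axis are driven by t instead of the spatial coordinate.
    The number of pairs is unchanged, so nothing is added to the model.
    """
    if not 0 <= time_pairs_per_axis <= pairs_per_axis:
        raise ValueError("time_pairs_per_axis must be in [0, pairs_per_axis]")
    freqs = axis_frequencies(pairs_per_axis, base)
    spatial_pairs = pairs_per_axis - time_pairs_per_axis
    angles = []
    for coord in (x, y, z):
        for k, freq in enumerate(freqs):
            position = coord if k < spatial_pairs else t
            angles.append(position * freq)
    return angles


def apply_rope(vec: list[float], angles: list[float]) -> list[float]:
    """Rotate consecutive (even, odd) component pairs by the given angles."""
    if len(vec) != 2 * len(angles):
        raise ValueError("vec must hold exactly two components per angle")
    out = []
    for k, angle in enumerate(angles):
        u, v = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(angle), math.sin(angle)
        out.extend((u * c - v * s, u * s + v * c))
    return out


# Usage: a 24-dimensional query at voxel (3, 1, 2) in frame 5.
query = [1.0, 0.5] * 12
angles = rope_angles_4d(3, 1, 2, t=5)
rotated = apply_rope(query, angles)
```

Because every angle is linear in its coordinate, the dot product between two rotated vectors still depends only on their relative offset, now including the frame offset. Changing `t` affects only the reassigned slots, so the remaining spatial pairs behave exactly as in the 3D case.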

Similar Articles

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
