HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Hugging Face Daily Papers 04/15/26, 12:00 AM Papers

3d-world-model 3d-gaussian-splatting multi-modal text-to-3d scene-generation open-source

Summary

HY-World 2.0 is a multi-modal world model framework that generates high-fidelity 3D Gaussian Splatting scenes from text, images, and videos through specialized modules for panorama generation, trajectory planning, and scene composition, achieving state-of-the-art performance among open-source approaches.

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the model performs world generation, synthesizing high-fidelity, navigable 3D Gaussian Splatting (3DGS) scenes. This is achieved through a four-stage method: a) Panorama Generation with HY-Pano 2.0, b) Trajectory Planning with WorldNav, c) World Expansion with WorldStereo 2.0, and d) World Composition with WorldMirror 2.0. Specifically, we introduce key innovations to enhance panorama fidelity, enable 3D scene understanding and planning, and upgrade WorldStereo, our keyframe-based view generation model with consistent memory. We also upgrade WorldMirror, a feed-forward model for universal 3D prediction, by refining model architecture and learning strategy, enabling world reconstruction from multi-view images or videos. Also, we introduce WorldLens, a high-performance 3DGS rendering platform featuring a flexible engine-agnostic architecture, automatic IBL lighting, efficient collision detection, and training-rendering co-design, enabling interactive exploration of 3D worlds with character support. Extensive experiments demonstrate that HY-World 2.0 achieves state-of-the-art performance on several benchmarks among open-source approaches, delivering results comparable to the closed-source model Marble. We release all model weights, code, and technical details to facilitate reproducibility and support further research on 3D world models.
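The routing described above (generation for text or single-view image inputs, reconstruction for multi-view images or videos) can be sketched as a small lookup. This is a minimal illustration, not the released code: the stage and module names come from the abstract, while `InputKind`, `plan_stages`, `describe`, and the "World Reconstruction" label are hypothetical names introduced here. It also assumes reconstruction is handled by WorldMirror 2.0 alone, which the abstract implies but does not state outright.

```python
from enum import Enum, auto
from typing import List, Tuple


class InputKind(Enum):
    """Input modalities listed in the abstract (enum itself is hypothetical)."""
    TEXT = auto()
    SINGLE_IMAGE = auto()
    MULTI_VIEW = auto()
    VIDEO = auto()


# (stage, module) pairs in the order given by the abstract's four-stage method.
GENERATION_STAGES: List[Tuple[str, str]] = [
    ("Panorama Generation", "HY-Pano 2.0"),
    ("Trajectory Planning", "WorldNav"),
    ("World Expansion", "WorldStereo 2.0"),
    ("World Composition", "WorldMirror 2.0"),
]

# Assumption: multi-view and video inputs go straight to the feed-forward model.
RECONSTRUCTION_STAGES: List[Tuple[str, str]] = [
    ("World Reconstruction", "WorldMirror 2.0"),
]


def plan_stages(kind: InputKind) -> List[Tuple[str, str]]:
    """Return the ordered (stage, module) pairs for one input kind."""
    if kind in (InputKind.TEXT, InputKind.SINGLE_IMAGE):
        return list(GENERATION_STAGES)
    return list(RECONSTRUCTION_STAGES)


def describe(kind: InputKind) -> str:
    """Render the planned stages as a readable chain."""
    return " -> ".join(f"{stage} [{module}]" for stage, module in plan_stages(kind))


if __name__ == "__main__":
    for kind in InputKind:
        print(f"{kind.name}: {describe(kind)}")
```

Keeping the stage order in plain data makes the contrast visible: generation needs all four stages to turn a prompt into a 3DGS scene, whereas reconstruction starts from observed views and skips panorama synthesis, trajectory planning, and view expansion.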

Original Article


Cached at: 04/20/26, 08:29 AM


Source: https://huggingface.co/papers/2604.14268 Published on Apr 15

#1 Paper of the day (https://huggingface.co/papers/date/2026-04-17)

Abstract

HY-World 2.0 is a multi-modal world model framework that generates high-fidelity 3D Gaussian Splatting scenes from diverse inputs using specialized modules for panorama generation, trajectory planning, world expansion, and composition, along with an enhanced rendering platform for interactive 3D exploration.


arXiv page (https://arxiv.org/abs/2604.14268)
PDF (https://arxiv.org/pdf/2604.14268)
Project page (https://3d-models.hunyuan.tencent.com/world/)
GitHub, 1.3k (https://github.com/Tencent-Hunyuan/HY-World-2.0)

Get this paper in your agent:

hf papers read 2604.14268

Don’t have the latest CLI? Install it with:

curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

tencent/HY-World-2.0

DreamX-World 1.0: A General-Purpose Interactive World Model

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

Holo-World: Unified Camera, Object and Weather Control for Video World Model
