tencent/HY-World-2.0


Summary

HY-World 2.0 is Tencent's open-source multi-modal 3D world model that reconstructs and generates 3D worlds from text, images, and videos, producing editable 3D assets (meshes/Gaussian Splatting) comparable to closed-source methods.

Task: image-to-3d Tags: hy-world-2.0, safetensors, worldmodel, 3d, hy-world, image-to-3d, en, zh, license:other, region:us

Source: https://huggingface.co/tencent/HY-World-2.0

# HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

English | 简体中文

HY-World-2.0 Teaser

“What Is Now Proved Was Once Only Imagined”

## 🎥 Video

## 🔥 News

## 📋 Table of Contents

## 📖 Introduction

HY-World 2.0 is a multi-modal world model framework built around two core capabilities: world generation and world reconstruction. It accepts diverse input modalities — text, single-view images, multi-view images, and videos — and produces 3D world representations (meshes / Gaussian Splatting).

HY-World 2.0 is the first open-source state-of-the-art 3D world model, delivering results comparable to closed-source methods such as Marble. We will release all model weights, code, and technical details to facilitate reproducibility and advance research in this field.

### Why 3D World Models?

Existing world models, such as Genie 3, Cosmos, and HY-World 1.5 (WorldPlay + WorldCompass), generate pixel-level videos — essentially "watching a movie" that vanishes once playback ends. HY-World 2.0 takes a fundamentally different approach: it directly produces editable, persistent 3D assets (meshes / 3DGS) that can be imported into engines and tools such as Blender, Unity, Unreal Engine, and Isaac Sim — more like "building a playable game" than recording a clip. This paradigm shift natively resolves many long-standing pain points of video world models:

| | Video World Models | 3D World Model (HY-World 2.0) |
|---|---|---|
| Output | Pixel videos (non-editable) | Real 3D assets — meshes / 3DGS (fully editable) |
| Playable Duration | Limited (typically < 1 min) | Unlimited — assets persist permanently |
| 3D Consistency | Poor (flickering, artifacts across views) | Native — inherently consistent in 3D |
| Real-Time Rendering | Requires per-frame inference; high latency | Consumer GPUs can render in real time |
| Controllability | Weak (imprecise character control, no real physics) | Precise — zero-error control, real physics collision, accurate lighting |
| Inference Cost | Accumulates with every interaction | One-time generation; rendering cost ≈ 0 |
| Engine Compatibility | ✗ Video files only | ✓ Directly importable into Blender / UE / Isaac Engine |
| | *Watch a video, then it's gone* | **Build a world, keep it forever** |

All of the above are real 3D assets (not generated videos) and entirely created by HY-World 2.0 -- captured from live real-time interaction.

## ✨ Highlights

  • **Real 3D Worlds, Not Just Videos.** Unlike video-only world models (e.g., Genie 3, HY-World 1.5), HY-World 2.0 generates real 3D assets — 3DGS, meshes, and point clouds — that are freely explorable, editable, and directly importable into Unity / Unreal Engine / Isaac. From a single text prompt or image, create navigable 3D worlds in diverse styles: realistic, cartoon, game, and more.

  • **Instant 3D Reconstruction from Photos & Videos.** Powered by WorldMirror 2.0, a unified feed-forward model that predicts dense point clouds, depth maps, surface normals, camera parameters, and 3DGS from multi-view images or casual videos in a single forward pass. Supports flexible-resolution inference (50K–500K pixels) with SOTA accuracy. Capture a video, get a digital twin (see the sketch after this list).

  • **Interactive Character Exploration.** Go beyond viewing — play inside your generated worlds. HY-World 2.0 supports first-person navigation and a third-person character mode, enabling users to freely explore AI-generated streets, buildings, and landscapes with physics-based collision. Go to our product page for a free trial.
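As a rough sketch of the photo/video-to-3D workflow described above, the snippet below extracts frames from a casual video with OpenCV and hands the resulting folder to the WorldMirrorPipeline API from the Get Started section. The frame-extraction helper, file names, and sampling rate are assumptions made for illustration; only the pipeline call itself comes from the usage example later on this page.

```python
# Sketch only: turn a casual video into WorldMirror 2.0 inputs.
# The WorldMirrorPipeline call mirrors the Get Started example below;
# frame extraction via OpenCV and all paths are assumptions.
import os
import cv2  # pip install opencv-python

from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

def extract_frames(video_path: str, out_dir: str, every_n: int = 10) -> str:
    """Dump every n-th frame of a video into out_dir as JPEGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:04d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return out_dir

frames_dir = extract_frames("capture.mp4", "frames/")            # assumed input video
pipeline = WorldMirrorPipeline.from_pretrained("tencent/HY-World-2.0")
result = pipeline(frames_dir)                                     # single feed-forward reconstruction
```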

## 🧩 Architecture

  • A systematic pipeline of HY-World 2.0 — Panorama Generation (HY-Pano-2.0) → Trajectory Planning (WorldNav) → World Expansion (WorldStereo 2.0) → World Composition (WorldMirror 2.0 + 3DGS) — that automatically transforms text or a single image into a high-fidelity, navigable 3D world (3DGS / mesh outputs). Refer to our tech report for more details.
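For intuition only, the pseudocode below strings the four stages together in the order given above. None of these function names exist in the released code; they are hypothetical placeholders for the corresponding components.

```python
# Hypothetical sketch of the HY-World 2.0 generation pipeline.
# None of these function names correspond to the released API;
# they only mirror the four stages named above.

def generate_world(prompt_or_image):
    pano = hy_pano_generate(prompt_or_image)    # 1. Panorama Generation (HY-Pano-2.0)
    trajectory = plan_trajectory(pano)          # 2. Trajectory Planning (WorldNav)
    views = expand_world(pano, trajectory)      # 3. World Expansion (WorldStereo 2.0)
    world_3d = compose_world(views)             # 4. World Composition (WorldMirror 2.0 + 3DGS)
    return world_3d                             # navigable 3DGS / mesh world
```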

## 📝 Open-Source Plan

  • ✅ Technical Report
  • ✅ WorldMirror 2.0 Code & Model Checkpoints
  • ⬜ Full Inference Code for World Generation (WorldNav + World Composition)
  • ⬜ Panorama Generation (HY-Pano 2.0) Model & Code — HunyuanWorld 1.0 available as an interim alternative
  • ⬜ World Expansion (WorldStereo 2.0) Model & Code — WorldStereo available as an interim alternative

## 🎁 Model Zoo

### World Reconstruction — WorldMirror Series

| Model | Description | Params | Date | Hugging Face |
|---|---|---|---|---|
| WorldMirror 2.0 | Multi-view / video → 3D reconstruction | ~1.2B | 2026 | Download |
| WorldMirror 1.0 | Multi-view / video → 3D reconstruction (legacy) | ~1.2B | 2025 | Download |

### Panorama Generation

| Model | Description | Params | Date | Hugging Face |
|---|---|---|---|---|
| HY-PanoGen | Text / image → 360° panorama | — | Coming Soon | — |

### World Generation

| Model | Description | Params | Date | Hugging Face |
|---|---|---|---|---|
| WorldStereo 2.0 | Panorama → navigable 3DGS world | — | Coming Soon | — |

We recommend referring to our previous works, WorldStereo and WorldMirror, for background knowledge on world generation and reconstruction.

## 🤗 Get Started

### Install Requirements

We recommend CUDA 12.4 for installation.

```bash
# 1. Clone the repository
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0

# 2. Create conda environment
conda create -n hyworld2 python=3.10
conda activate hyworld2

# 3. Install PyTorch (CUDA 12.4)
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# 4. Install dependencies
pip install -r requirements.txt

# 5. Install FlashAttention
# (Recommended) Install FlashAttention-3
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../../
rm -rf flash-attention

# For simpler installation, you can also use FlashAttention-2
pip install flash-attn --no-build-isolation
```
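After installation, a quick sanity check can confirm that the CUDA-enabled PyTorch build and FlashAttention import correctly before running the pipeline. The commands below only use standard torch attributes and the flash_attn package installed above; if you built FlashAttention-3 from the hopper directory instead, its import name may differ from flash_attn.

```bash
# Optional: verify the PyTorch CUDA build and the FlashAttention-2 install
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import flash_attn; print('flash-attn', flash_attn.__version__)"
```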

### Code Usage — Panorama Generation (HY-Pano-2)

Coming soon.

### Code Usage — World Generation (WorldNav, WorldStereo-2, and 3DGS)

Coming soon.

We recommend referring to our previous work, WorldStereo, for the open-source preview version of WorldStereo-2.

### Code Usage — WorldMirror 2.0

WorldMirror 2.0 can be used in three ways: through a Python API, from the command line, or via the Gradio web demo described in the next section.

We provide a `diffusers`-like Python API for WorldMirror 2.0. Model weights are automatically downloaded from Hugging Face on first run.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

# Load the pipeline; weights are fetched from Hugging Face on first run
pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')
# Run reconstruction on a folder of input images
result = pipeline('path/to/images')
```

With Prior Injection (Camera & Depth):

```python
# Optionally inject known camera parameters and depth maps as priors
result = pipeline(
    'path/to/images',
    prior_cam_path='path/to/prior_camera.json',   # camera prior (JSON file)
    prior_depth_path='path/to/prior_depth/',      # depth prior (directory)
)
```

For the detailed structure of camera/depth priors and how to prepare them, see the Prior Preparation Guide.

CLI:

```bash
# Single GPU
python -m hyworld2.worldrecon.pipeline --input_path path/to/images

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```

**Important:** In multi-GPU mode, the number of input images must be >= the number of GPUs. For example, with `--nproc_per_node=8`, provide at least 8 images.
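As a concrete illustration of this requirement, an 8-GPU launch might look like the sketch below (paths are placeholders; the flags are the ones documented above), after checking that the folder holds at least 8 images:

```bash
# Assumed example: 8 GPUs require >= 8 input images
ls path/to/images | wc -l   # should print 8 or more
torchrun --nproc_per_node=8 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```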

### Gradio App — WorldMirror 2.0

We provide an interactive Gradio web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.

```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

For the full list of Gradio app arguments (port, share, local checkpoints, etc.), see DOCUMENTATION.md.

## 🔮 Performance

For full benchmark results, please refer to the technical report.

### WorldStereo 2.0 — Camera Control

Camera metrics: RotErr, TransErr, ATE. Visual quality: Q-Align, CLIP-IQA+, Laion-Aes, CLIP-I.

| Methods | RotErr ↓ | TransErr ↓ | ATE ↓ | Q-Align ↑ | CLIP-IQA+ ↑ | Laion-Aes ↑ | CLIP-I ↑ |
|---|---|---|---|---|---|---|---|
| SEVA | 1.690 | 1.578 | 2.879 | 3.232 | 0.479 | 4.623 | 77.16 |
| Gen3C | 0.944 | 1.580 | 2.789 | 3.353 | 0.489 | 4.863 | 82.33 |
| WorldStereo | 0.762 | 1.245 | 2.141 | 4.149 | 0.547 | 5.257 | 89.05 |
| WorldStereo 2.0 | 0.492 | 0.968 | 1.768 | 4.205 | 0.544 | 5.266 | 89.43 |

### WorldStereo 2.0 — Single-View-Generated Reconstruction

T&T = Tanks-and-Temples; M360 = MipNeRF360.

| Methods | T&T Precision ↑ | T&T Recall ↑ | T&T F1-Score ↑ | T&T AUC ↑ | M360 Precision ↑ | M360 Recall ↑ | M360 F1-Score ↑ | M360 AUC ↑ |
|---|---|---|---|---|---|---|---|---|
| SEVA | 33.59 | 35.34 | 36.73 | 51.03 | 22.38 | 55.63 | 28.75 | 46.81 |
| Gen3C | 46.73 | 25.51 | 31.24 | 42.44 | 23.28 | 75.37 | 35.26 | 52.10 |
| Lyra | 50.38 | 28.67 | 32.54 | 43.05 | 30.02 | 58.60 | 36.05 | 49.89 |
| FlashWorld | 26.58 | 20.72 | 22.29 | 30.45 | 35.97 | 53.77 | 42.60 | 53.86 |
| WorldStereo 2.0 | 43.62 | 41.02 | 41.43 | 58.19 | 43.19 | 65.32 | 51.27 | 65.79 |
| WorldStereo 2.0 (DMD) | 40.41 | 44.41 | 43.16 | 60.09 | 42.34 | 64.83 | 50.52 | 65.64 |

### WorldMirror 2.0 — Point Map Reconstruction

**Point Map Reconstruction on 7-Scenes, NRGBD, and DTU.** We report the mean Accuracy and Completeness of WorldMirror under different input configurations. **Bold** results are best. "L / M / H" denote low / medium / high inference resolution; "+ all priors" denotes injection of camera extrinsics, camera intrinsics, and depth priors.

7-Scenes and NRGBD are scene-level benchmarks; DTU is object-level.

| Method | Config | 7-Scenes Acc. ↓ | 7-Scenes Comp. ↓ | NRGBD Acc. ↓ | NRGBD Comp. ↓ | DTU Acc. ↓ | DTU Comp. ↓ |
|---|---|---|---|---|---|---|---|
| WorldMirror 1.0 | L | 0.043 | 0.055 | 0.046 | 0.049 | 1.476 | 1.768 |
| | L + all priors | 0.021 | 0.026 | 0.022 | 0.020 | 1.347 | 1.392 |
| | M | 0.043 | 0.049 | 0.041 | 0.045 | 1.017 | 1.780 |
| | M + all priors | 0.018 | 0.023 | 0.016 | 0.014 | 0.735 | 0.935 |
| | H | 0.079 | 0.087 | 0.077 | 0.093 | 2.271 | 2.113 |
| | H + all priors | 0.042 | 0.041 | 0.078 | 0.082 | 1.773 | 1.478 |
| WorldMirror 2.0 | L | 0.041 | 0.052 | 0.047 | 0.058 | 1.352 | 2.009 |
| | L + all priors | 0.019 | 0.024 | 0.017 | 0.015 | 1.100 | 1.201 |
| | M | 0.033 | 0.046 | 0.039 | 0.047 | 1.005 | 1.892 |
| | M + all priors | 0.013 | 0.017 | **0.013** | **0.013** | 0.690 | 0.876 |
| | H | 0.037 | 0.040 | 0.046 | 0.053 | 0.845 | 1.904 |
| | H + all priors | **0.012** | **0.016** | 0.015 | 0.016 | **0.554** | **0.771** |

### WorldMirror 2.0 — Prior Comparison

**Comparison with Pow3R and MapAnything under Different Prior Conditions.** Results are averaged over the 7-Scenes, NRGBD, and DTU datasets. Pow3R (pro) refers to the original Pow3R with Procrustes alignment.

## 🎬 More Examples

## 📖 Documentation

For detailed usage guides, parameter references, output format specifications, and prior injection instructions, see **DOCUMENTATION.md**.

## 📚 Citation

If you find HY-World 2.0 useful for your research, please cite:

```bibtex
@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Tencent HY-World Team},
  journal={arXiv preprint},
  year={2026}
}

@article{hunyuanworld2025tencent,
  title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
  author={Team HunyuanWorld},
  journal={arXiv preprint},
  year={2025}
}
```

## 📧 Contact

Please send emails to [email protected] for questions or feedback.

## 🙏 Acknowledgements

We would like to thank HunyuanWorld 1.0, WorldMirror, WorldPlay, WorldStereo, and HunyuanImage for their great work.
