SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Hugging Face Daily Papers 05/19/26, 12:00 AM Papers

Summary

SceneCode converts natural language prompts into executable code to generate interactive, simulation-ready indoor scenes with articulated objects, enabling fine-grained controllability and on-demand asset creation.

Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing pipelines, however, typically represent generated content as static meshes and inherit articulation only from curated asset libraries, which limits object-level controllability and prevents new interactable assets from being produced on demand. We address this gap by formulating physically interactable indoor scene synthesis as programmatic world generation, and present SceneCode, a framework that compiles a natural language prompt into an executable, code-driven indoor world rather than a collection of opaque meshes. A room-level agentic backbone first turns the prompt into a structured house layout and emits per-object AssetRequests through a planner-designer-critic loop. Each request is then routed to one of five code-generation strategies and converted into synthesized part-wise Blender Python programs that are validated through an execution-guided repair-and-refine loop. The resulting programs are compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry links object requests, executable programs, rendered geometry, and simulation assets, turning scene assembly into a traceable and locally editable world-building process. We evaluate SceneCode across scene-level synthesis, object-level asset quality, human judgment, and downstream robot interaction. Results show that executable world programs improve prompt-faithful indoor scene generation and produce assets with cleaner mesh structure and simulator-loadable articulation metadata. Project page: https://scene-code.github.io/.
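The execution-guided repair-and-refine loop mentioned above can be illustrated with a minimal Python sketch. This is not SceneCode's implementation: the names `run_program`, `repair_loop`, and `toy_repair` are hypothetical, the candidate program is a plain Python snippet rather than a Blender script, and the repair step is a hard-coded string fix standing in for whatever model-based repair the paper uses. The sketch only shows the general pattern of running a generated program, capturing the error, and feeding it back into a repair step until the program executes or a budget runs out.

```python
from typing import Callable, Optional, Tuple


def run_program(source: str) -> Tuple[bool, str]:
    """Execute candidate source in a fresh namespace; return (ok, error_text)."""
    namespace: dict = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception as exc:  # syntax and runtime errors both count as failures
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


def repair_loop(
    source: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Run the program; on failure pass (source, error) to `repair` and retry.

    Returns (working_source, repairs_used), or (None, max_rounds) if the
    budget is exhausted without a successful execution.
    """
    for attempt in range(max_rounds + 1):
        ok, error = run_program(source)
        if ok:
            return source, attempt
        if attempt < max_rounds:
            source = repair(source, error)
    return None, max_rounds


def toy_repair(source: str, error: str) -> str:
    """Hypothetical stand-in for a model-based repair step."""
    if "NameError" in error and "'part'" in error:
        return source.replace("part[0]", "parts[0]")
    return source


# A candidate program with a deliberate bug: `part` should be `parts`.
BROKEN = "parts = [('door', 0.4)]\nwidth = part[0][1]\n"
fixed, rounds = repair_loop(BROKEN, toy_repair)
print(rounds)
print(fixed)
```

In a real pipeline the execution step would presumably run inside Blender and the check would cover more than "did it raise", for example geometry or articulation validity; those details are not specified in the abstract.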

Original Article

Cached at: 05/20/26, 02:35 AM

Paper page - SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Source: https://huggingface.co/papers/2605.19587

Abstract

SceneCode enables programmable indoor scene generation by converting natural language prompts into executable code that produces interactive, simulation-ready environments with structured object representations.

Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing pipelines, however, typically represent generated content as static meshes and inherit articulation only from curated asset libraries, which limits object-level controllability and prevents new interactable assets from being produced on demand. We address this gap by formulating physically interactable indoor scene synthesis as programmatic world generation, and present SceneCode, a framework that compiles a natural language prompt into an executable, code-driven indoor world rather than a collection of opaque meshes. A room-level agentic backbone first turns the prompt into a structured house layout and emits per-object AssetRequests through a planner-designer-critic loop. Each request is then routed to one of five code-generation strategies and converted into synthesized part-wise Blender Python programs that are validated through an execution-guided repair-and-refine loop. The resulting programs are compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry links object requests, executable programs, rendered geometry, and simulation assets, turning scene assembly into a traceable and locally editable world-building process. We evaluate SceneCode across scene-level synthesis, object-level asset quality, human judgment, and downstream robot interaction. Results show that executable world programs improve prompt-faithful indoor scene generation and produce assets with cleaner mesh structure and simulator-loadable articulation metadata. Project page: https://scene-code.github.io/.
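To make "simulator-loadable articulation metadata" concrete, here is a small Python sketch that writes an SDF model with two links and one revolute joint, the kind of description a hinged cabinet door would need. It is an illustration under assumptions, not the exporter used by SceneCode: the function name `articulated_model_sdf`, the link names, and the joint limits are hypothetical, and the visual, collision, and inertial elements a usable asset requires are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def articulated_model_sdf(
    model_name: str,
    parent_link: str,
    child_link: str,
    lower: float,
    upper: float,
    axis: Tuple[float, float, float] = (0.0, 0.0, 1.0),
) -> str:
    """Build a bare-bones SDF string: two links joined by one revolute joint."""
    sdf = ET.Element("sdf", version="1.9")
    model = ET.SubElement(sdf, "model", name=model_name)

    # Links carry no geometry here; a real asset would add visual/collision.
    for link_name in (parent_link, child_link):
        ET.SubElement(model, "link", name=link_name)

    # The joint records which part moves, about which axis, and how far.
    joint = ET.SubElement(
        model, "joint", name=f"{child_link}_hinge", type="revolute"
    )
    ET.SubElement(joint, "parent").text = parent_link
    ET.SubElement(joint, "child").text = child_link
    axis_el = ET.SubElement(joint, "axis")
    ET.SubElement(axis_el, "xyz").text = " ".join(str(v) for v in axis)
    limit = ET.SubElement(axis_el, "limit")
    ET.SubElement(limit, "lower").text = str(lower)  # radians
    ET.SubElement(limit, "upper").text = str(upper)  # radians

    return ET.tostring(sdf, encoding="unicode")


# Hypothetical example: a cabinet door that opens up to about 90 degrees.
cabinet_sdf = articulated_model_sdf("cabinet", "body", "door", 0.0, 1.57)
print(cabinet_sdf)
```

Because the joint is stated explicitly rather than baked into a mesh, changing the opening range or hinge axis is a one-argument edit, which is in the spirit of the object-level controllability the abstract describes.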

Similar Articles

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

Coding Agent Is Good As World Simulator

Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
