SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects
Summary
SceneCode converts natural language prompts into executable code to generate interactive, simulation-ready indoor scenes with articulated objects, enabling fine-grained controllability and on-demand asset creation.
Cached at: 05/20/26, 02:35 AM
Source: https://huggingface.co/papers/2605.19587
Abstract
SceneCode enables programmable indoor scene generation by converting natural language prompts into executable code that produces interactive, simulation-ready environments with structured object representations.
Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing pipelines, however, typically represent generated content as static meshes and inherit articulation only from curated asset libraries, which limits object-level controllability and prevents new interactable assets from being produced on demand. We address this gap by formulating physically interactable indoor scene synthesis as programmatic world generation, and present SceneCode, a framework that compiles a natural language prompt into an executable, code-driven indoor world rather than a collection of opaque meshes. A room-level agentic backbone first turns the prompt into a structured house layout and emits per-object AssetRequests through a planner-designer-critic loop. Each request is then routed to one of five code-generation strategies and converted into a synthesized part-wise Blender Python program that is validated through an execution-guided repair-and-refine loop. The resulting programs are compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry links object requests, executable programs, rendered geometry, and simulation assets, turning scene assembly into a traceable and locally editable world-building process. We evaluate SceneCode across scene-level synthesis, object-level asset quality, human judgment, and downstream robot interaction. Results show that executable world programs improve prompt-faithful indoor scene generation and produce assets with cleaner mesh structure and simulator-loadable articulation metadata. Project page: https://scene-code.github.io/.
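To make the execution-guided repair-and-refine idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `AssetRequest` with two fields, a hypothetical `RegistryEntry` that records the program and its errors, and stub `stub_generate` / `stub_repair` functions standing in for the code-generation model. Plain `exec` replaces Blender here so the example runs anywhere; none of these names or signatures come from the paper.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AssetRequest:
    # Hypothetical schema: the abstract does not specify AssetRequest fields.
    object_id: str
    description: str


@dataclass
class RegistryEntry:
    # Hypothetical record linking a request to its program and repair history.
    request: AssetRequest
    program: Optional[str] = None
    attempts: int = 0
    errors: List[str] = field(default_factory=list)


def run_program(source: str) -> Optional[str]:
    """Execute a candidate program; return None on success, else the error text."""
    try:
        exec(compile(source, "<asset_program>", "exec"), {})
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def repair_loop(
    request: AssetRequest,
    generate: Callable[[AssetRequest], str],
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> RegistryEntry:
    """Generate a program, run it, and feed any error back to the repair step."""
    entry = RegistryEntry(request=request)
    source = generate(request)
    for _ in range(max_attempts):
        entry.attempts += 1
        error = run_program(source)
        if error is None:
            entry.program = source  # keep only a program that actually executed
            return entry
        entry.errors.append(error)
        source = repair(source, error)
    return entry  # program stays None if every attempt failed


def stub_generate(request: AssetRequest) -> str:
    # Stand-in for a code-generation model; deliberately omits `height`.
    return "width = 1.0\nvolume = width * height\n"


def stub_repair(source: str, error: str) -> str:
    # Stand-in for a model-driven fix; defines the missing name on NameError.
    if "NameError" in error:
        return "height = 2.0\n" + source
    return source


request = AssetRequest("cabinet_01", "cabinet with a hinged door")
entry = repair_loop(request, stub_generate, stub_repair)
print(entry.attempts, entry.program is not None)
```

In this sketch the execution error itself is the feedback signal: the loop stores each traceback on the entry, so a failed asset can be inspected or regenerated without touching the rest of the scene.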
Similar Articles
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes
WorldAct is a framework that converts static 3D generated environments into editable and interactive object-centric scenes using multimodal agents and geometric reconstruction, enabling object-level editing and embodied task execution.
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
SimWorld Studio is an open-source platform that uses an evolving coding agent to automatically generate and refine 3D environments for embodied agent learning. It leverages self-evolution and co-evolution mechanisms to create adaptive training scenarios, significantly improving agent performance.
Coding Agent Is Good As World Simulator
This paper presents an agentic framework that uses coding agents to generate physically plausible world simulations from natural language prompts, outperforming video-based models in physical accuracy and instruction fidelity.
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
A novel MLLM-based agentic framework called Code-as-Room generates 3D indoor rooms by converting top-down images into executable Blender code through a structured execution harness with cross-stage memory to maintain context.
Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
This paper introduces SEIG, a framework that uses pretrained vision-language models to reconstruct 3D scenes from single images as editable Blender programs through progressive refinement of geometry, materials, composition, and lighting.