Multi-scale Mixture of World Models for Embodied Agents in Evolving Environments

arXiv cs.AI 07/02/26, 04:00 AM Papers

mixture-of-experts world-models embodied-agents multi-scale-reasoning dynamic-adaptation arxiv

Summary

This paper introduces MuSix, a framework for embodied agents that combines a scale-aware mixture of world models with scale-dependent knowledge updates. It targets multi-scale reasoning and dynamic adaptation in evolving environments, and the authors report improvements over state-of-the-art baselines on EmbodiedBench and HAZARD.

arXiv:2607.00457v1 (announce type: new)

Abstract: Embodied agents operating in the real world require multi-scale reasoning and knowledge adaptation as conditions change. We identify two challenges in applying Mixture of Experts (MoE) to this setting: routing lacks an explicit notion of scale, preventing targeted updates at specific scales, and a uniform update policy cannot accommodate the different rates at which knowledge at each scale becomes outdated. We present MuSix, a framework that addresses both challenges through scale-aware world model mixture and evolution. A two-stage routing mechanism grounds scale selection in experiential distance, a measure of situational novelty inspired by Construal Level Theory: a meta-router first maps this quantity to a weight over continuous scale space, then per-scale base routers select world models within the identified scale. For adaptation, scale-dependent forgetting rates allow low-scale knowledge to refresh rapidly while high-scale abstractions persist, and gated inter-scale transfer maintains coherence across the hierarchy. Experiments on EmbodiedBench and HAZARD show that MuSix improves over state-of-the-art baselines on multi-scale reasoning and dynamic adaptation.

# Multi-scale Mixture of World Models for Embodied Agents in Evolving Environments
Source: [https://arxiv.org/abs/2607.00457](https://arxiv.org/abs/2607.00457)
[View PDF](https://arxiv.org/pdf/2607.00457)

> Abstract: Embodied agents operating in the real world require multi-scale reasoning and knowledge adaptation as conditions change. We identify two challenges in applying Mixture of Experts (MoE) to this setting: routing lacks an explicit notion of scale, preventing targeted updates at specific scales, and a uniform update policy cannot accommodate the different rates at which knowledge at each scale becomes outdated. We present MuSix, a framework that addresses both challenges through scale-aware world model mixture and evolution. A two-stage routing mechanism grounds scale selection in experiential distance, a measure of situational novelty inspired by Construal Level Theory: a meta-router first maps this quantity to a weight over continuous scale space, then per-scale base routers select world models within the identified scale. For adaptation, scale-dependent forgetting rates allow low-scale knowledge to refresh rapidly while high-scale abstractions persist, and gated inter-scale transfer maintains coherence across the hierarchy. Experiments on EmbodiedBench and HAZARD show that MuSix improves over state-of-the-art baselines on multi-scale reasoning and dynamic adaptation.
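
The routing and adaptation ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. The function names (`meta_route`, `base_route`, `two_stage_route`, `decay_update`, `gated_transfer`) are hypothetical, the continuous scale space is discretized into a few scale centers, the meta-router is a Gaussian-kernel softmax over experiential distance, forgetting is a plain exponential moving average, and each piece of "knowledge" is reduced to a single number.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def meta_route(distance, scale_centers, temperature=0.1):
    """Stage 1 (assumed form): map experiential distance to weights over scales.

    Scales whose center lies close to the distance receive more weight.
    """
    logits = [-((distance - c) ** 2) / temperature for c in scale_centers]
    return softmax(logits)


def base_route(scores):
    """Stage 2 (assumed form): weights over the world models within one scale."""
    return softmax(scores)


def two_stage_route(distance, scale_centers, per_scale_scores):
    """Combine both stages: weight[s][k] = scale weight * within-scale weight."""
    scale_weights = meta_route(distance, scale_centers)
    return [
        [sw * mw for mw in base_route(scores)]
        for sw, scores in zip(scale_weights, per_scale_scores)
    ]


def decay_update(old, new, forget_rate):
    """Scale-dependent forgetting as an exponential moving average.

    A larger forget_rate lets new observations overwrite old knowledge faster.
    """
    return (1.0 - forget_rate) * old + forget_rate * new


def gated_transfer(high, low, gate):
    """Move high-scale knowledge toward low-scale knowledge by a gate in [0, 1]."""
    return high + gate * (low - high)


# Hypothetical setup: three scales (low, mid, high), two world models per scale.
scale_centers = [0.0, 0.5, 1.0]
per_scale_scores = [[2.0, 0.0], [0.0, 0.0], [1.0, -1.0]]
forget_rates = [0.5, 0.2, 0.05]  # low scale refreshes fastest (assumed values)

# A familiar situation (small experiential distance) favors the low scale.
weights = two_stage_route(0.1, scale_centers, per_scale_scores)
print([round(sum(row), 3) for row in weights])

# The same new observation changes low-scale knowledge more than high-scale.
print([decay_update(1.0, 0.0, rate) for rate in forget_rates])
```

In this sketch the combined weights form a distribution over all world models across all scales, and the per-scale forgetting rates make the low-scale entry move further toward a new observation than the high-scale entry does. How MuSix actually computes experiential distance, parameterizes its routers, or learns its gates is not stated in the abstract.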

## Submission history

From: Jinwoo Jang [[view email](https://arxiv.org/show-email/3b4d85fb/2607.00457)] **[v1]** Wed, 1 Jul 2026 05:23:56 UTC (9,191 KB)

Similar Articles

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Multi-Agent World Models (3 minute read)

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

tencent/HY-Embodied-0.5

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model
