Tag
This paper proposes a strategic robustness objective for learning simulators in model-based reinforcement learning, formulated as a minimax game between a model player and an adversarial policy player. It provides theoretical guarantees and a provably convergent algorithm, and its experiments show reduced prediction error and improved real-world policy transfer.
This paper proposes Model-Based Diffusion Policy Optimization (MBDPO), a framework that unifies search and policy optimization in world models using diffusion policy representations, achieving consistent scaling behavior and superior performance across offline and online reinforcement learning tasks.
GPLD introduces a gradient-penalized latent dynamics regularizer for DreamerV3 to enforce local smoothness in transition learning, improving sample efficiency on continuous control tasks, especially complex locomotion.
Summary of David Silver's Reinforcement Learning Lecture 8 on integrating learning and planning, covering model-based RL and AlphaGo's use of policy and value networks with Monte Carlo Tree Search.
This paper introduces the DR.Q algorithm, which improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in continuous control tasks.
This paper analyzes the 'training in imagination' paradigm in model-based reinforcement learning, deriving optimal sample allocation strategies and characterizing how dynamics and reward model errors affect policy returns.