model-based-rl

#model-based-rl

Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning

arXiv cs.LG · 2026-05-29

This paper proposes a strategic robustness objective for learning simulators in model-based reinforcement learning, formulated as a minimax game between a model player and an adversarial policy player. Theoretical guarantees and a provably convergent algorithm are provided, with experiments showing reduced prediction error and improved real-world policy transfer.

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

arXiv cs.LG · 2026-05-27

This paper proposes Model-Based Diffusion Policy Optimization (MBDPO), a framework that unifies search and policy optimization in world models using diffusion policy representations. It reports consistent scaling behavior and superior performance across offline and online reinforcement learning tasks.

Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics

arXiv cs.LG · 2026-05-25

GPLD introduces a gradient-penalized latent dynamics regularizer for DreamerV3 to enforce local smoothness in transition learning, improving sample efficiency on continuous control tasks, especially complex locomotion.

@ickma2311: David Silver RL Course (Lecture 8): Integrating Learning and Planning AlphaGo is a beautiful example of integrating lea…

X AI KOLs Timeline · 2026-05-16

Summary of David Silver's Reinforcement Learning Lecture 8 on integrating learning and planning, covering model-based RL and AlphaGo's use of policy and value networks with Monte Carlo Tree Search.

Debiased Model-based Representations for Sample-efficient Continuous Control

Hugging Face Daily Papers · 2026-05-12

This paper introduces the DR.Q algorithm, which improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in continuous control tasks.

On Training in Imagination

arXiv cs.LG · 2026-05-11

This paper analyzes the 'training in imagination' paradigm in model-based reinforcement learning, deriving optimal sample allocation strategies and characterizing how dynamics and reward model errors affect policy returns.
