Tag
DiPOD stabilizes diffusion policy optimization by interleaving self-distillation with policy-gradient updates to maintain a tight ELBO, preventing the double-drift phenomenon and achieving higher rewards in both language and continuous control tasks.
This paper develops a geometric dynamical framework to model how predictive AI assistance alters exploratory cognition by stabilizing trajectories before self-generated exploration, leading to reduced exploratory responsiveness, hysteresis, and delayed recovery upon assistance withdrawal.