elbo

Diffusion Policy Optimization without Drifting Apart

arXiv cs.LG · 18h ago

DiPOD stabilizes diffusion policy optimization by interleaving self-distillation with policy-gradient updates to maintain a tight ELBO, preventing the double-drift phenomenon and achieving higher rewards in both language and continuous control tasks.
