Tag
Introduces TruDi, a method that uses a trust-region optimization rule to enforce KL constraints, enabling diffusion policies to be trained with massively parallel on-policy reinforcement learning and achieving strong performance across 73 tasks.