@svlevine: A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion late…

Summary

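A new method for off-policy reinforcement learning with diffusion and flow policies. It uses "flow reversal" to handle off-policy data: the current policy's diffusion process is run backward from each logged action to recover the latent steps that would have produced it.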

A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion latent steps for it would be with our *current* policy (not the one that collected it), so this requires reversing the diffusion process on off-policy data.
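A minimal sketch of the reversal step follows. It assumes a flow policy sampled with K forward Euler steps of a velocity field v(x, t | s). The `velocity` function is a hypothetical toy stand-in for a learned network. Inverting each Euler step by fixed-point iteration is also an assumed choice, not necessarily how the authors implement flow reversal. Given a state and a logged action, the code recovers the latent noise and intermediate flow points that the *current* sampler would map to that exact action.

```python
import numpy as np


def velocity(x, t, s):
    """Hypothetical velocity field v(x, t | s) of a flow policy.

    Toy stand-in for a learned network (assumes s has the same shape as x).
    It is Lipschitz in x with constant <= 0.5, which the inversion relies on.
    """
    return np.tanh(s) - 0.5 * x * (1.0 - t)


def sample_action(z, s, n_steps=10):
    """Forward Euler: map noise z (t=0) to an action (t=1).

    Returns the full path [x_0, ..., x_K] of intermediate flow points.
    """
    dt = 1.0 / n_steps
    path = [np.asarray(z, dtype=float)]
    for k in range(n_steps):
        x = path[-1]
        path.append(x + dt * velocity(x, k * dt, s))
    return path


def reverse_flow(a, s, n_steps=10, n_iters=50):
    """Invert the forward Euler sampler for a logged (s, a) pair.

    Each backward step solves x_k = x_{k+1} - dt * v(x_k, t_k, s) by
    fixed-point iteration, so the recovered path is one the current
    sampler would actually produce. Returns [x_0, ..., x_K] with x_K = a.
    """
    dt = 1.0 / n_steps
    path = [np.asarray(a, dtype=float)]
    for k in reversed(range(n_steps)):
        x_next = path[0]
        x = x_next  # initial guess for x_k
        for _ in range(n_iters):
            x = x_next - dt * velocity(x, k * dt, s)
        path.insert(0, x)
    return path


# Usage with a hypothetical state and logged (off-policy) action:
s = np.array([0.3, -1.2])
a_logged = np.array([0.8, -0.4])
path = reverse_flow(a_logged, s)
z = path[0]  # latent noise the current policy would need
print(np.allclose(sample_action(z, s)[-1], a_logged))  # True
```

Each backward step inverts the discrete Euler update rather than integrating the ODE backward. As a result, re-running `sample_action` from the recovered noise reproduces the logged action up to floating-point error. The fixed-point loop converges here because `dt` times the toy field's Lipschitz constant (0.05) is well below 1.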

Cached at: 06/18/26, 04:03 AM

Aditya Oberai (@aditya_oberai): What if we treat flow steps as RL actions?

Combined with our “flow reversal” technique, this leads to a really clean & powerful recipe for flow offline RL!

Thread 🧵
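The quoted thread frames each flow step as an RL action. That idea can be sketched as a data transformation. A flow path [x_0, ..., x_K] for one state is split into K step-level transitions, each with "state" (s, x_k, t_k) and "action" x_{k+1}. Such a path could come, for example, from reversing the sampler on a logged action. `FlowStepTransition` and `flow_steps_to_transitions` are hypothetical names, and uniform Euler time steps are assumed. This is one plausible reading of the idea, not the authors' recipe.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FlowStepTransition:
    """One flow integration step viewed as an RL transition (hypothetical).

    The inner "state" is (env state, current flow point, flow time);
    the "action" is the next flow point chosen by the policy.
    """
    env_state: np.ndarray
    x_t: np.ndarray
    t: float
    x_next: np.ndarray
    is_last: bool  # True when x_next is the action executed in the environment


def flow_steps_to_transitions(env_state, path):
    """Split a flow path [x_0, ..., x_K] into K step-level transitions.

    Assumes uniform time steps t_k = k / K, as in a forward Euler sampler.
    """
    n_steps = len(path) - 1
    return [
        FlowStepTransition(
            env_state=env_state,
            x_t=path[k],
            t=k / n_steps,
            x_next=path[k + 1],
            is_last=(k == n_steps - 1),
        )
        for k in range(n_steps)
    ]
```

Under this framing, only the last step's `x_next` is executed in the environment. One natural setup (an assumption here) gives intermediate steps zero reward and lets a step-level critic bootstrap toward the environment critic's Q(s, a) at the final step.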

Similar Articles

Drift Q-Learning

arXiv cs.LG

Proposes DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement for offline RL, outperforming diffusion and flow methods on D4RL and OGBench while maintaining simplicity and efficiency.