@seohong_park: RQL is a new, clean algorithm for (offline) flow RL! The main idea is to treat flow steps as MDP steps, and use "revers…


Summary

RQL (Reversal Q-Learning) is a new offline reinforcement learning algorithm for flow policies. It treats each flow step as an MDP step and uses "reversed" flows to generate hindsight flow trajectories for off-policy data.


Cached at: 06/18/26, 12:01 AM

RQL is a new, clean algorithm for (offline) flow RL!

The main idea is to treat flow steps as MDP steps, and use “reversed” flows to generate hindsight flow trajectories for off-policy data.
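To make the idea concrete, here is a minimal sketch of one way "flow steps as MDP steps" and "flow reversal" could fit together. It is not the paper's implementation. Everything in it is an assumption made for illustration: the scalar action, the hand-written `velocity` field standing in for a learned one, the Euler integrator, and the choice to place the environment reward on the last flow step. The environment state is left out to keep the example short.

```python
from typing import Callable, List, Tuple

# (flow state, flow step, next flow state, reward), where a flow state is (x, t).
Transition = Tuple[Tuple[float, float], float, Tuple[float, float], float]


def velocity(x: float, t: float) -> float:
    """Hypothetical stand-in for a learned velocity field v(x, t)."""
    return 1.0 - 2.0 * x


def reverse_flow(action: float, num_steps: int,
                 v: Callable[[float, float], float] = velocity) -> List[float]:
    """Integrate backward from a dataset action at t=1 down to t=0 (Euler)."""
    dt = 1.0 / num_steps
    xs = [action]
    for k in range(num_steps, 0, -1):
        t = k * dt
        xs.append(xs[-1] - dt * v(xs[-1], t))
    xs.reverse()  # xs[0] is the t=0 point, xs[-1] is the dataset action
    return xs


def hindsight_transitions(action: float, env_reward: float,
                          num_steps: int) -> List[Transition]:
    """Turn the reversed path into per-flow-step transitions.

    Assumption: the flow step (the increment in x) plays the role of the
    MDP action, and the environment reward arrives on the final flow step.
    """
    xs = reverse_flow(action, num_steps)
    dt = 1.0 / num_steps
    transitions: List[Transition] = []
    for k in range(num_steps):
        state = (xs[k], k * dt)
        step = xs[k + 1] - xs[k]
        next_state = (xs[k + 1], (k + 1) * dt)
        reward = env_reward if k == num_steps - 1 else 0.0
        transitions.append((state, step, next_state, reward))
    return transitions


if __name__ == "__main__":
    for tr in hindsight_transitions(action=0.5, env_reward=1.0, num_steps=4):
        print(tr)
```

Because the path is built backward from the dataset action, the resulting chain of flow steps ends at that action by construction. This is what lets off-policy data be expressed as a flow trajectory in this sketch.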

Aditya Oberai (@aditya_oberai): What if we treat flow steps as RL actions?

Combined with our “flow reversal” technique, this leads to a really clean & powerful recipe for flow offline RL!

Thread 🧵

Similar Articles

Reversal Q-Learning

arXiv cs.LG

This paper proposes Reversal Q-Learning (RQL), an offline reinforcement learning algorithm that trains a flow policy using an expanded Markov decision process framework and techniques to enable off-policy RL without backpropagation through time. It achieves state-of-the-art performance on challenging simulated robotic tasks.

Drift Q-Learning

arXiv cs.LG

Proposes DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement for offline RL, outperforming diffusion and flow methods on D4RL and OGBench while maintaining simplicity and efficiency.