@seohong_park: RQL is a new, clean algorithm for (offline) flow RL! The main idea is to treat flow steps as MDP steps, and use "revers…

X AI KOLs Following 06/17/26, 05:39 PM Papers

rql flow-rl offline-rl reinforcement-learning algorithm research

Summary

RQL is a new algorithm for offline flow reinforcement learning that treats flow steps as MDP steps and uses reversed flows to generate hindsight trajectories.

Original Article

Cached at: 06/18/26, 12:01 AM

RQL is a new, clean algorithm for (offline) flow RL!

The main idea is to treat flow steps as MDP steps, and use “reversed” flows to generate hindsight flow trajectories for off-policy data.
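
To make the idea concrete, here is a minimal sketch of what "flow steps as MDP steps" plus a reversed flow might look like. It is an illustration under stated assumptions, not the RQL implementation: the velocity field `toy_velocity`, the Euler integrator, the transition layout, and the choice to put the environment reward only on the last flow step are all hypothetical and do not come from the post.

```python
import numpy as np

def toy_velocity(state, x, t):
    # Hypothetical stand-in for a learned velocity field v(s, x, t).
    return state - x

def reverse_flow(state, action, velocity, num_steps):
    """Integrate the flow backward from a dataset action with Euler steps.

    Returns the points in forward order [x_0, ..., x_K], where x_K is the action.
    """
    dt = 1.0 / num_steps
    xs = [np.asarray(action, dtype=float)]
    for k in range(num_steps, 0, -1):
        x = xs[-1]
        xs.append(x - dt * velocity(state, x, k * dt))
    return xs[::-1]

def hindsight_transitions(state, action, reward, velocity, num_steps):
    """Turn one dataset (state, action, reward) into per-flow-step transitions."""
    xs = reverse_flow(state, action, velocity, num_steps)
    transitions = []
    for k in range(num_steps):
        is_last = k == num_steps - 1
        transitions.append({
            "obs": (state, xs[k], k),            # env state, flow point, flow step index
            "flow_action": xs[k + 1] - xs[k],    # the step taken along the flow
            "reward": reward if is_last else 0.0,  # assumption: reward only at the end
            "next_obs": (state, xs[k + 1], k + 1),
            "flow_done": is_last,
        })
    return transitions

state = np.array([0.5, -1.0])
action = np.array([0.2, 0.3])  # an action taken from off-policy data
steps = hindsight_transitions(state, action, reward=1.0,
                              velocity=toy_velocity, num_steps=4)
```

Replaying the stored `flow_action` values from the first flow point lands on the dataset action, which is the sense in which the reversed trajectory is a hindsight trajectory for that action. Each transition could then be fed to an ordinary off-policy learner; how RQL actually does this is not described in the post.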

Aditya Oberai (@aditya_oberai): What if we treat flow steps as RL actions?

Combined with our “flow reversal” technique, this leads to a really clean & powerful recipe for flow offline RL!

Thread 🧵

Similar Articles

Reversal Q-Learning

arXiv cs.LG

This paper proposes Reversal Q-Learning (RQL), an offline reinforcement learning algorithm that trains a flow policy using an expanded Markov decision process framework and techniques to enable off-policy RL without backpropagation through time. It achieves state-of-the-art performance on challenging simulated robotic tasks.

@aditya_oberai: What if we treat flow steps as RL actions? Combined with our “flow reversal” technique, this leads to a really clean & …

X AI KOLs Timeline

Proposes treating flow steps as RL actions combined with a 'flow reversal' technique for flow offline reinforcement learning.

Drift Q-Learning

arXiv cs.LG

Proposes DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement for offline RL, outperforming diffusion and flow methods on D4RL and OGBench while maintaining simplicity and efficiency.

@svlevine: A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion late…

X AI KOLs Following

A new method for off-policy reinforcement learning with diffusion models that handles off-policy data by running the diffusion process in reverse on that data (flow reversal).

@svlevine: Diffusion (or flow) makes for excellent policies, but training them with RL is notoriously hard: BPTT is unstable, RL o…