Drift Q-Learning
Summary
Proposes DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement for offline RL, outperforming diffusion and flow methods on D4RL and OGBench while maintaining simplicity and efficiency.
# Drift Q-Learning

Source: [https://arxiv.org/abs/2606.00350](https://arxiv.org/abs/2606.00350) [View PDF](https://arxiv.org/pdf/2606.00350)

> Abstract: Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: [this https URL](https://driftql.github.io/)

## Submission history

From: Mohamad H Danesh [[view email](https://arxiv.org/show-email/b71ab235/2606.00350)]

**[v1]** Fri, 29 May 2026 20:42:30 UTC (1,995 KB)
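The abstract describes three forces acting on generated actions: attraction toward dataset actions, repulsion among generated actions, and a push from the critic. The abstract does not give the actual objective, so the toy below is only an illustrative sketch and not the paper's formulation. It assumes 1-D actions, a Gaussian-kernel mean shift for both attraction and repulsion, and a hand-supplied value gradient. The names `mean_shift`, `drift`, `step_actions`, `bandwidth`, and `alpha` are hypothetical. It also moves the actions directly, whereas the abstract describes a policy network trained to produce them in one forward pass.

```python
import math


def mean_shift(x, targets, bandwidth):
    """Kernel-weighted mean displacement from x toward `targets` (1-D actions)."""
    if not targets:
        return 0.0
    # Gaussian kernel in log space; subtract the max for numerical stability.
    logits = [-((t - x) ** 2) / (2.0 * bandwidth ** 2) for t in targets]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)  # >= 1, since the largest weight is exp(0)
    return sum(w * (t - x) for w, t in zip(weights, targets)) / total


def drift(i, generated, data, q_grad, bandwidth=0.5, alpha=0.1):
    """Toy drift for generated action i (assumed form, not taken from the paper)."""
    x = generated[i]
    others = generated[:i] + generated[i + 1:]
    attraction = mean_shift(x, data, bandwidth)    # pull toward dataset actions
    repulsion = mean_shift(x, others, bandwidth)   # pull toward other generated actions
    # Subtracting the second term pushes action i away from its neighbours;
    # alpha scales the critic's gradient dQ/da at x.
    return attraction - repulsion + alpha * q_grad(x)


def step_actions(generated, data, q_grad, step_size=0.5, **kwargs):
    """Move every generated action one step along its drift."""
    return [
        x + step_size * drift(i, generated, data, q_grad, **kwargs)
        for i, x in enumerate(generated)
    ]


if __name__ == "__main__":
    data = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]   # two behaviour modes
    q_grad = lambda a: -2.0 * (a - 1.0)        # assumed critic: Q(a) = -(a - 1)^2
    actions = [-0.2, 0.0, 0.2]
    for _ in range(50):
        actions = step_actions(actions, data, q_grad)
    print(actions)
```

In this sketch `alpha` plays the role of the trade-off between value improvement and staying close to the data: with `alpha=0` only the attraction and repulsion terms act on the actions.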
Similar Articles
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization
This paper proposes DRIFT, a framework that combines offline trajectories with importance-weighted supervised fine-tuning to efficiently achieve multi-turn interactive learning performance comparable to reinforcement learning.
Debiased Model-based Representations for Sample-efficient Continuous Control
This paper introduces the DR.Q algorithm, which improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in continuous control tasks.
Reinforcement Learning via Value Gradient Flow
Value Gradient Flow (VGF) presents a scalable approach to behavior-regularized reinforcement learning by formulating it as an optimal transport problem solved through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization while enabling adaptive test-time scaling by controlling transport budget.
Drifting Objectives for Refining Discrete Diffusion Language Models
This paper introduces TokenDrift, a drifting objective that refines discrete diffusion language models by lifting categorical predictions to a continuous semantic space for anti-symmetric drifting, significantly improving generation quality under a fixed number of denoising steps.
@probablynotaz9: Solo-author ICML paper alert Ever wanted to post-train your diffusion LLM with good old policy gradients, without havin…
This solo-author ICML paper introduces Amortized Group Relative Policy Optimization (AGRPO) to enable effective reinforcement learning post-training for diffusion language models.