DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

Summary

This paper proposes DRIFT, a framework that pairs offline trajectories with importance-weighted supervised fine-tuning to reach multi-turn interactive performance comparable to reinforcement learning while keeping the training efficiency of supervised fine-tuning.

Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Optimizing such behavior poses a sharp practical dilemma: online reinforcement learning handles multi-turn dynamics effectively but is prohibitively expensive because it must generate full correction trajectories at every update, whereas offline supervised fine-tuning (SFT) is efficient but suffers from distribution shift and behavioral collapse. We propose DRIFT (Decoupled Rollouts and Importance-Weighted Fine-Tuning), a framework that operationalizes the theoretical insight that the KL-regularized RL objective is equivalent to importance-weighted supervised learning. DRIFT decouples rollout from optimization in three steps: it samples offline interaction trajectories from a fixed reference policy, derives return-based importance weights, and optimizes the policy via weighted SFT on the resulting dataset. Empirically, DRIFT matches or exceeds the performance of multi-turn reinforcement learning baselines while retaining the training efficiency and simplicity of standard supervised fine-tuning. Code is available at https://github.com/2020-qqtcg/DRIFT.
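The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes the textbook form of the equivalence: for the objective "expected return minus beta times KL to the reference policy", the optimal policy is proportional to pi_ref(tau) * exp(R(tau) / beta), so fitting it by maximum likelihood on trajectories sampled from pi_ref amounts to SFT with weights proportional to exp(R / beta). The abstract does not give DRIFT's exact weight formula, so the temperature `beta`, the self-normalization, and the function names `importance_weights` and `weighted_sft_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def importance_weights(returns, beta=1.0):
    """Self-normalized weights w_i proportional to exp(R_i / beta).

    Assumed form (not taken from the paper): the standard solution of
    the KL-regularized objective, evaluated on offline trajectories.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    scaled = [r / beta for r in returns]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sft_loss(traj_logprobs, weights):
    """Weighted negative log-likelihood over whole trajectories.

    traj_logprobs[i] is the summed log-probability that the policy
    being trained assigns to the model turns of trajectory i.
    """
    if len(traj_logprobs) != len(weights):
        raise ValueError("one weight per trajectory is required")
    return -sum(w * lp for w, lp in zip(weights, traj_logprobs))

# Toy data: three offline trajectories from a fixed reference policy.
returns = [1.0, 0.0, 0.5]          # hypothetical trajectory returns
traj_logprobs = [-2.0, -1.0, -3.0]  # hypothetical log-probs under the policy
weights = importance_weights(returns, beta=0.5)
loss = weighted_sft_loss(traj_logprobs, weights)
```

In an actual training loop the log-probabilities would come from the model with gradients attached, and the weights would be computed once from the fixed offline dataset. A smaller `beta` concentrates the weight on high-return trajectories, while a very large `beta` approaches unweighted SFT.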

Original Article

Cached at: 06/01/26, 07:18 AM

Source: https://huggingface.co/papers/2605.31455 Published on May 29

Submitted by https://huggingface.co/mujianijan (mj) on Jun 1

Similar Articles

Drift Q-Learning

DRIFT: Refining Instruction Data via On-Policy Data Attribution

Process Reward Informed Tree Rollout for Effective Multi-Turn RL

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
