ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

Hugging Face Daily Papers 05/06/26, 12:00 AM Papers

Summary

ReflectDrive-2 is a new discrete diffusion planner for autonomous driving that uses reinforcement learning to enable self-editing of trajectory tokens, achieving high performance and low latency on the NAVSIM benchmark.
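Neither the summary nor the abstract says how a trajectory is turned into tokens. The following is a minimal sketch that assumes a uniform grid quantizing each (x, y) waypoint into one token per axis. `NUM_BINS`, the metric ranges, `tokenize` and `detokenize` are all hypothetical and are not the paper's tokenizer.

```python
import numpy as np

# Hypothetical tokenizer settings; the paper's real vocabulary is not described here.
NUM_BINS = 256           # tokens per coordinate axis (assumed)
X_RANGE = (-10.0, 70.0)  # longitudinal extent in metres (assumed)
Y_RANGE = (-20.0, 20.0)  # lateral extent in metres (assumed)
RANGES = (X_RANGE, Y_RANGE)


def tokenize(traj: np.ndarray) -> np.ndarray:
    """Map a (T, 2) array of (x, y) waypoints to a (T, 2) array of integer bin ids."""
    tokens = np.empty(traj.shape, dtype=np.int64)
    for axis, (lo, hi) in enumerate(RANGES):
        # Scale to [0, NUM_BINS) and clip waypoints that fall outside the assumed range.
        scaled = (traj[:, axis] - lo) / (hi - lo) * NUM_BINS
        tokens[:, axis] = np.clip(np.floor(scaled), 0, NUM_BINS - 1).astype(np.int64)
    return tokens


def detokenize(tokens: np.ndarray) -> np.ndarray:
    """Map bin ids back to waypoints placed at the bin centres."""
    traj = np.empty(tokens.shape, dtype=np.float64)
    for axis, (lo, hi) in enumerate(RANGES):
        width = (hi - lo) / NUM_BINS
        traj[:, axis] = lo + (tokens[:, axis] + 0.5) * width
    return traj


# Toy usage: a short, gently curving "expert" plan (made-up numbers).
expert = np.array([[2.0, 0.0], [4.1, 0.1], [6.3, 0.3], [8.6, 0.6]])
tokens = tokenize(expert)
recon = detokenize(tokens)
```

Because every token indexes a fixed bin, revising a plan reduces to overwriting a few integer ids. Within the assumed ranges, the round-trip error is bounded by half a bin width per axis.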

We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision-draft-reflect rollout with reinforcement learning (RL), assigning terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most 0.3, whereas RL raises that gain to 1.9. We also co-design an efficient reflective decoding stack for the decision-draft-reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves 91.0 PDMS with camera-only input and 94.8 PDMS in a best-of-6 oracle setting, while running at 31.8 ms average latency on NVIDIA Thor.
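The central training idea is that a single terminal reward, computed on the post-edit trajectory, scores every sampled step of the rollout. Below is a minimal REINFORCE-style sketch of that credit assignment. It rests on strong simplifying assumptions. The policy is a toy table of logits, not the ReflectDrive-2 network, and each token position is an independent categorical. The reflect step re-samples the least confident positions from the same distribution. `rollout_and_grad`, the three-way decision head, `match_reward` and the constant baseline are all hypothetical. The paper's actual RL objective, baseline and edit-selection rule are not given here.

```python
from typing import Callable

import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def grad_log_prob(logits: np.ndarray, idx: int) -> np.ndarray:
    """Gradient of log softmax(logits)[idx] w.r.t. logits: onehot(idx) - probs."""
    g = -softmax(logits)
    g[idx] += 1.0
    return g


def rollout_and_grad(dec_logits: np.ndarray, tok_logits: np.ndarray,
                     reward_fn: Callable[[np.ndarray], float], num_edits: int,
                     baseline: float, rng: np.random.Generator):
    """One decision -> draft -> reflect rollout plus its REINFORCE gradient (toy policy).

    dec_logits: (D,) logits over hypothetical high-level decisions.
    tok_logits: (D, L, V) per-position token logits, conditioned on the decision.
    """
    g_dec = np.zeros_like(dec_logits)
    g_tok = np.zeros_like(tok_logits)

    # Decision: sample one high-level decision.
    d = rng.choice(len(dec_logits), p=softmax(dec_logits))
    g_dec += grad_log_prob(dec_logits, d)

    # Draft: sample every token position in parallel, conditioned on the decision.
    probs = softmax(tok_logits[d])  # (L, V)
    num_pos, vocab = probs.shape
    tokens = np.array([rng.choice(vocab, p=probs[i]) for i in range(num_pos)])
    for i in range(num_pos):
        g_tok[d, i] += grad_log_prob(tok_logits[d, i], tokens[i])

    # Reflect: rewrite the least confident drafted positions with the same policy.
    conf = probs[np.arange(num_pos), tokens]
    for i in np.argsort(conf)[:num_edits]:
        tokens[i] = rng.choice(vocab, p=probs[i])
        g_tok[d, i] += grad_log_prob(tok_logits[d, i], tokens[i])

    # Terminal reward on the post-edit trajectory; one advantage scales every transition.
    advantage = reward_fn(tokens) - baseline
    return tokens, advantage * g_dec, advantage * g_tok


def match_reward(toks: np.ndarray) -> float:
    """Hypothetical stand-in for a driving score: fraction of tokens matching a target."""
    return float(np.mean(toks == np.arange(len(toks))))


rng = np.random.default_rng(0)
dec_logits = np.zeros(3)           # 3 hypothetical decisions
tok_logits = np.zeros((3, 8, 16))  # 8 token positions, 16-way vocabulary
for _ in range(200):
    _, g_dec, g_tok = rollout_and_grad(dec_logits, tok_logits, match_reward, 2, 0.1, rng)
    dec_logits += 0.5 * g_dec      # gradient ascent on expected terminal reward
    tok_logits += 0.5 * g_tok
```

The same advantage multiplies the decision, draft and edit log-probabilities. As a result, the drafting distribution is also credited for trajectories that only became good after editing. That is one plausible reading of why full-rollout RL couples drafting and editing better than supervised recovery training alone.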

Original Article

Cached at: 05/08/26, 07:19 AM

Paper page - ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

Source: https://huggingface.co/papers/2605.04647

Abstract

ReflectDrive-2 employs a masked discrete diffusion planner with parallel decoding for autonomous driving, enabling in-place trajectory revision through token rewriting and achieving high performance with efficient reflective decoding.
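The summary describes parallel decoding followed by in-place revision through token rewriting. Below is a minimal sketch of both steps, under two assumptions. Decoding uses confidence-ranked unmasking, a common heuristic for masked discrete diffusion that is not necessarily the paper's schedule. AutoEdit is modeled as re-masking the lowest-confidence tokens of a finished draft and re-decoding them with the same model. `MASK`, `parallel_unmask`, `auto_edit` and `toy_model` are hypothetical names.

```python
from typing import Callable

import numpy as np

MASK = -1  # hypothetical id for a masked slot
# A "model" maps an (L,) array of token ids (MASK allowed) to (L, V) logits.
Model = Callable[[np.ndarray], np.ndarray]


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def parallel_unmask(model: Model, tokens: np.ndarray, per_step: int) -> np.ndarray:
    """Fill every MASK slot, committing the `per_step` most confident guesses per step."""
    assert per_step >= 1
    tokens = tokens.copy()
    while (tokens == MASK).any():
        probs = softmax(model(tokens))  # one forward pass scores all slots
        pred, conf = probs.argmax(-1), probs.max(-1)
        masked = np.flatnonzero(tokens == MASK)
        chosen = masked[np.argsort(-conf[masked])[:per_step]]
        tokens[chosen] = pred[chosen]   # unmask several slots in parallel
    return tokens


def auto_edit(model: Model, tokens: np.ndarray, num_edits: int, per_step: int) -> np.ndarray:
    """Re-mask the tokens the same model trusts least, then decode them again."""
    assert not (tokens == MASK).any(), "expects a fully decoded draft"
    probs = softmax(model(tokens))
    conf_current = probs[np.arange(len(tokens)), tokens]
    edit_pos = np.argsort(conf_current)[:num_edits]
    remasked = tokens.copy()
    remasked[edit_pos] = MASK
    return parallel_unmask(model, remasked, per_step)


# Toy usage with a hypothetical model that always prefers one token sequence.
V = 10
target = np.array([3, 1, 4, 1, 5, 9])


def toy_model(tokens: np.ndarray) -> np.ndarray:
    logits = np.zeros((len(tokens), V))
    logits[np.arange(len(tokens)), target] = 5.0
    return logits


draft = parallel_unmask(toy_model, np.full(len(target), MASK), per_step=2)
noisy = draft.copy()
noisy[[1, 4]] = 0  # corrupt two drafted tokens
revised = auto_edit(toy_model, noisy, num_edits=2, per_step=2)
```

No second network is involved. The draft and the revision both call the same `model`, which mirrors the abstract's point that AutoEdit needs no auxiliary refinement network.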

Similar Articles

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
