Recovering Hidden Reward in Diffusion-Based Policies

Hugging Face Daily Papers, 05/01/26, 12:00 AM

Summary

This paper studies how to recover the reward implicit in a diffusion-based policy. It ties the policy's learned score function to the gradient of the expert's soft Q-function, so a reward signal can be extracted without adversarial training.

This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement learning that outperforms both adversarial IRL methods and likelihood-based alternatives. These results show that the structural constraints required for valid reward extraction simultaneously serve as beneficial inductive biases for policy generalization. The code is available at https://github.com/sotaagi/EnergyFlow.
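The link between a conservative score field and a recoverable scalar can be illustrated with a toy example. The sketch below is not the EnergyFlow implementation: it assumes the standard maximum-entropy form pi(a|s) proportional to exp(Q(s,a)/alpha), uses a hypothetical quadratic `soft_q` over a 2-D action with the state held fixed, and substitutes the analytic score for one learned by denoising score matching. All names (`ALPHA`, `soft_q`, `score`, `recover_q_difference`) are illustrative assumptions. Because the field is the gradient of a scalar, integrating it along any path between two actions returns the same Q difference.

```python
ALPHA = 0.5  # assumed entropy temperature (hypothetical value)


def soft_q(action):
    """Hypothetical soft Q-function over a 2-D action, state held fixed."""
    x, y = action
    return -((x - 1.0) ** 2) - 0.5 * (y + 0.5) ** 2


def score(action):
    """Score of pi(a) proportional to exp(soft_q(a) / ALPHA).

    This equals grad_a soft_q(a) / ALPHA. In practice it would be a learned
    network; the analytic gradient stands in for it here.
    """
    x, y = action
    return (-2.0 * (x - 1.0) / ALPHA, -(y + 0.5) / ALPHA)


def recover_q_difference(score_fn, a_start, a_end, steps=1000):
    """Recover soft_q(a_end) - soft_q(a_start) from the score alone.

    Integrates the score along the straight segment from a_start to a_end
    with the midpoint rule, then rescales by ALPHA.
    """
    dx = a_end[0] - a_start[0]
    dy = a_end[1] - a_start[1]
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        point = (a_start[0] + t * dx, a_start[1] + t * dy)
        sx, sy = score_fn(point)
        total += (sx * dx + sy * dy) / steps
    return ALPHA * total


if __name__ == "__main__":
    a0, a1 = (0.0, 0.0), (1.0, -0.5)
    print("recovered:", recover_q_difference(score, a0, a1))
    print("true:     ", soft_q(a1) - soft_q(a0))
```

Only differences are recovered in this toy setting: adding any action-independent constant to `soft_q` leaves `score` unchanged, so the absolute level cannot be read off the field.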

Original Article

Cached at: 05/08/26, 07:12 AM

Source: https://huggingface.co/papers/2605.00623

Similar Articles

Hierarchical Variational Policies for Reward-Guided Diffusion

Adaptive Order Policies for Masked Diffusion

@svlevine: A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion late…

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training
