policy-extraction

#policy-extraction

QPILOTS: Efficient Test-Time Q-Steering for Flow Policies

arXiv cs.LG · 5d ago

QPILOTS steers flow policies at inference time using critic gradients projected from noisy intermediate states. It reports state-of-the-art performance on offline-to-online RL benchmarks and improves pretrained vision-language-action (VLA) models without modifying the base policy.
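
The summary does not spell out the projection or the steering rule, so the following is only a generic sketch of critic-gradient guidance for a flow sampler, not the QPILOTS algorithm. It assumes the noisy intermediate state x_t is projected to a clean-action estimate with a one-step Euler jump, a_hat = x_t + (1 - t) * v(x_t, t), and that the critic gradient at a_hat, scaled by a guidance weight lam, is added to the base velocity. The constant velocity field, the quadratic critic, and every function name below are hypothetical toys chosen so the effect is easy to check.

```python
import numpy as np

def base_velocity(x, t, c):
    """Hypothetical toy flow policy: a constant velocity field (ignores x and t),
    so the unguided policy maps noise x0 to the action x0 + c."""
    return np.broadcast_to(c, x.shape)

def q_value(a, a_good):
    """Hypothetical critic for a fixed state: Q(s, a) = -||a - a_good||^2."""
    return -np.sum((a - a_good) ** 2, axis=-1)

def q_grad(a, a_good):
    """Analytic gradient of the toy critic with respect to the action."""
    return -2.0 * (a - a_good)

def steered_sample(x0, c, a_good, lam=1.0, n_steps=10):
    """Euler-integrate the flow from t = 0 to t = 1 with critic guidance (assumed rule)."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = base_velocity(x, t, c)
        # Assumed projection: jump from the noisy state straight to t = 1
        # to get a clean-action estimate the critic can evaluate.
        a_hat = x + (1.0 - t) * v
        # Steer: add the scaled critic gradient at a_hat to the base velocity.
        x = x + dt * (v + lam * q_grad(a_hat, a_good))
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)       # initial noise sample
c = np.array([1.0, -0.5])         # toy base-policy drift
a_good = np.array([2.0, 0.0])     # action the toy critic prefers
base = steered_sample(x0, c, a_good, lam=0.0)     # unguided flow: x0 + c
steered = steered_sample(x0, c, a_good, lam=1.0)  # critic-guided flow
print(q_value(base, a_good), q_value(steered, a_good))
```

Guidance enters only through the velocity during sampling, so the base policy's parameters are never updated; with lam = 0 the sampler reproduces the unguided action exactly.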



Dual Advantage Fields

arXiv cs.LG · 2026-06-04

Dual Advantage Fields (DAF) is a policy-extraction method for offline goal-conditioned RL that converts a bilinear dual value model into a local advantage signal. It learns an action-effect model that predicts the feature displacement each action causes, then scores actions by how well that displacement aligns with the goal direction. The paper, accepted at the ICML 2026 Workshop on Decision Making, reports improved performance on OGBench locomotion, manipulation, and puzzle tasks.
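
The summary describes the scoring step only at a high level. One reading, assumed here: if the dual value is bilinear, V(s, g) = phi(s) . psi(g), then the gradient of V with respect to phi(s) is psi(g), so the dot product of a predicted displacement with psi(g) is the first-order change in value, i.e. a local advantage. The encoders, the action-effect model, and all function names below are hypothetical toys (identity maps and a plain shift), and the optional cosine normalization is just one possible interpretation of "alignment", not something the source confirms.

```python
import numpy as np

def phi(s):
    """Hypothetical state encoder; identity map for this toy example."""
    return np.asarray(s, dtype=float)

def psi(g):
    """Hypothetical goal encoder; identity map for this toy example."""
    return np.asarray(g, dtype=float)

def dual_value(s, g):
    """Bilinear value V(s, g) = phi(s) . psi(g)."""
    return float(phi(s) @ psi(g))

def action_effect(s, a):
    """Hypothetical stand-in for a learned model of phi(s') - phi(s).
    In this toy, an action shifts the feature vector by exactly a."""
    return np.asarray(a, dtype=float)

def advantage_scores(s, g, actions, normalize=False):
    """Score each action by the alignment of its predicted displacement with psi(g)."""
    disp = np.stack([action_effect(s, a) for a in actions])  # (n_actions, d)
    goal_dir = psi(g)  # dV/dphi(s) for the bilinear model
    scores = disp @ goal_dir  # first-order change in V per action
    if normalize:
        # Assumed cosine variant: direction only, ignoring step size.
        norms = np.linalg.norm(disp, axis=1) * np.linalg.norm(goal_dir)
        scores = scores / np.maximum(norms, 1e-8)
    return scores

s = [0.0, 0.0]
g = [0.0, 2.0]
actions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
scores = advantage_scores(s, g, actions)
best = int(np.argmax(scores))  # greedy policy extraction over candidates
cos_scores = advantage_scores(s, g, actions, normalize=True)
print(scores, best, cos_scores)
```

Because the toy encoders are linear, the dot-product score equals the exact value change V(s + a, g) - V(s, g); with learned nonlinear encoders it would only be a local approximation.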
