QPILOTS steers flow policies at inference time using critic gradients projected from noisy intermediate states. It achieves state-of-the-art performance on offline-to-online RL benchmarks and improves pretrained vision-language-action (VLA) models without modifying the base policy.
Dual Advantage Fields (DAF) is a policy-extraction method for offline goal-conditioned RL. It converts a bilinear dual value model into a local advantage signal: it learns an action-effect model that predicts feature displacement, then scores actions by how well that displacement aligns with the goal direction. Accepted at the ICML 2026 Workshop on Decision Making, DAF shows improved performance on OGBench locomotion, manipulation, and puzzle tasks.
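As a rough illustration of the DAF scoring step, the sketch below assumes the bilinear value takes the form V(s, g) = φ(s)·ψ(g), so that a predicted feature displacement Δφ changes the value by Δφ·ψ(g). That form, the toy numbers, and the names `advantage_scores`, `predicted_displacement`, and `goal_direction` are illustrative assumptions, not taken from the DAF paper; in the method itself the displacement would come from the learned action-effect model.

```python
import numpy as np

def advantage_scores(displacement, goal_dir):
    """Score candidate actions by alignment with the goal direction.

    displacement: (num_actions, feature_dim) predicted feature change per action
    goal_dir:     (feature_dim,) goal embedding, assumed to play the role of psi(g)
    """
    # Inner product of each predicted displacement with the goal direction.
    return displacement @ goal_dir

# Hypothetical toy data: three candidate actions in a 2-D feature space.
predicted_displacement = np.array([
    [1.0, 0.0],   # moves along the first feature axis
    [0.0, 1.0],   # moves along the second feature axis
    [-1.0, 0.0],  # moves against the first feature axis
])
goal_direction = np.array([1.0, 0.5])

scores = advantage_scores(predicted_displacement, goal_direction)
best_action = int(np.argmax(scores))  # index of the best-aligned action
```

Under these assumptions, the action whose predicted displacement points most strongly along the goal direction receives the highest score, which is the sense in which the signal is a local advantage.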