Tag
QPILOTS steers flow policies at inference time using critic gradients projected from noisy intermediate states. It achieves state-of-the-art performance on offline-to-online RL benchmarks and improves pretrained vision-language-action (VLA) models without modifying the base policy.
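
To make the idea of inference-time steering concrete, here is a toy sketch in Python. It is not the QPILOTS algorithm itself; the summary above does not specify the update rule. Everything in it is an assumption for illustration: the straight-line velocity field `base_velocity`, the quadratic `critic`, the constants `TARGET` and `A_STAR`, the `guidance_scale` parameter, and the choice to project a noisy intermediate action to a final-action estimate via `a + (1 - t) * v` before taking the critic gradient.

```python
import numpy as np

# All names and values below are hypothetical, chosen only for illustration.
TARGET = np.array([1.0, 0.0])   # action the toy frozen base policy produces
A_STAR = np.array([0.0, 1.0])   # maximizer of the toy critic


def base_velocity(a, t):
    """Toy frozen flow policy: straight-line velocity field toward TARGET."""
    return (TARGET - a) / max(1.0 - t, 1e-3)


def critic(a):
    """Toy critic Q(a) = -||a - A_STAR||^2."""
    return -float(np.sum((a - A_STAR) ** 2))


def critic_grad(a):
    """Analytic gradient of the toy critic with respect to the action."""
    return -2.0 * (a - A_STAR)


def sample_action(a0, num_steps=10, guidance_scale=0.0):
    """Euler-integrate the flow from noise a0, optionally adding critic guidance."""
    a = np.array(a0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = base_velocity(a, t)
        # Assumed projection: estimate the final action from the noisy state.
        a1_hat = a + (1.0 - t) * v
        # Steer the velocity with the critic gradient at the projected action.
        # The base policy itself is never modified.
        v = v + guidance_scale * critic_grad(a1_hat)
        a = a + dt * v
    return a


rng = np.random.default_rng(0)
a0 = rng.standard_normal(2)
base = sample_action(a0, guidance_scale=0.0)
steered = sample_action(a0, guidance_scale=1.0)
print("Q(base) =", critic(base), "Q(steered) =", critic(steered))
```

In this toy setting the steered sample scores higher under the critic than the unsteered one, while `base_velocity` stays untouched. A real flow policy would be a learned, state-conditioned network, and the critic gradient would come from automatic differentiation rather than a closed form.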