Tag
QGF is a reinforcement learning (RL) algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance.
This paper explains the root cause of reward hacking in reward-guided flow and diffusion models, attributing it to finite-particle plug-in estimation of the Doob h-function, and proposes a reward damping schedule to correct within-mode bias without additional computational cost.
This paper introduces Constrained Flow Optimization (CFO), a framework for fine-tuning generative flow models to maximize rewards while satisfying constraints in molecular design, with theoretical guarantees and experimental validation.
The paper identifies off-manifold drift in guided flow models under compositional rewards and proposes Conflict-Aware Additive Guidance (CAR), a lightweight method that dynamically resolves gradient conflicts to improve generation fidelity without retraining.
Flow-Direct introduces a non-parametric guidance field for flow-based generative models that accumulates reward feedback persistently, improving feedback efficiency and enabling reuse of collected samples to guide generation for multiple objectives without additional reward evaluations.
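
Several of the summaries above describe steering a pre-trained flow model with a gradient signal at sampling time. As a rough illustration only, and not the procedure of any paper listed here, the toy sketch below adds the gradient of a value function to a velocity field during Euler integration. Everything in it is an assumption made for the example: `base_velocity` is a hand-written stand-in for a learned flow, `value` is a hypothetical quadratic objective, and `guidance_scale` is an arbitrary weight.

```python
import numpy as np

# Hypothetical high-value target used only for this toy example.
GOAL = np.array([2.0, 2.0])


def base_velocity(x, t):
    """Stand-in for a pre-trained flow's velocity field (assumption):
    it simply contracts samples toward the origin."""
    return -x


def value(x):
    """Hypothetical value function: larger when x is closer to GOAL."""
    return -np.sum((x - GOAL) ** 2, axis=-1)


def value_grad(x):
    """Analytic gradient of `value` with respect to x."""
    return -2.0 * (x - GOAL)


def guided_sample(n_samples=256, n_steps=50, guidance_scale=0.5, seed=0):
    """Euler-integrate dx/dt = v(x, t) + guidance_scale * grad value(x)
    from t = 0 to t = 1, starting from standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, 2))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * (base_velocity(x, t) + guidance_scale * value_grad(x))
    return x
```

Calling `guided_sample(guidance_scale=0.0)` recovers the unguided toy flow, so comparing `value(...).mean()` for the two settings gives a quick check that the guidance term moves samples toward higher value. In this toy setting the base field is never retrained; only the sampling dynamics change.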