This paper introduces PNAPO, an offline preference optimization framework for rectified flow models. PNAPO augments preference data with noise samples and applies dynamic regularization, improving both training efficiency and sample efficiency.
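As a rough illustration of the ingredients named above, the sketch below adapts a Diffusion-DPO-style preference loss to a rectified flow velocity model. It is not the PNAPO objective; it only shows, under stated assumptions, where stored noise samples and a changing regularization strength could plug in. Assumptions: the straight path x_t = t * x_data + (1 - t) * noise with target velocity x_data - noise; each preference pair carries the noise it was paired with (noise_win, noise_lose), standing in for "noise-augmented" data; and the helper dynamic_beta is a hypothetical linear schedule with arbitrary default values, not the paper's dynamic regularization.

```python
import numpy as np


def interpolate(x_data, noise, t):
    # Assumed rectified-flow convention: t=0 is pure noise, t=1 is data.
    t = t.reshape(-1, *([1] * (x_data.ndim - 1)))
    return t * x_data + (1.0 - t) * noise


def flow_matching_error(model, x_data, noise, t):
    # Per-sample mean squared error between predicted and target velocity.
    x_t = interpolate(x_data, noise, t)
    target_v = x_data - noise
    pred_v = model(x_t, t)
    return ((pred_v - target_v) ** 2).reshape(len(x_data), -1).mean(axis=1)


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -np.logaddexp(0.0, -z)


def dpo_flow_loss(model, ref_model, x_win, x_lose, noise_win, noise_lose, t, beta):
    # Diffusion-DPO-style loss on velocity errors (illustrative, not PNAPO).
    # noise_win / noise_lose: noise stored alongside each preference sample
    # (hypothetical stand-in for the paper's noise-augmented data).
    err_w = flow_matching_error(model, x_win, noise_win, t)
    err_l = flow_matching_error(model, x_lose, noise_lose, t)
    ref_w = flow_matching_error(ref_model, x_win, noise_win, t)
    ref_l = flow_matching_error(ref_model, x_lose, noise_lose, t)
    # Negative margin means the policy improved on the preferred sample
    # more than on the rejected one, relative to the frozen reference.
    margin = (err_w - ref_w) - (err_l - ref_l)
    return -log_sigmoid(-beta * margin).mean()


def dynamic_beta(step, total_steps, beta_start=5000.0, beta_end=2000.0):
    # HYPOTHETICAL schedule: linearly interpolate the regularization strength.
    frac = min(step / max(total_steps, 1), 1.0)
    return beta_start + frac * (beta_end - beta_start)


# Usage on toy data: when the policy equals the reference, the margin is 0
# and the loss reduces to log(2).
rng = np.random.default_rng(0)
x_w, x_l = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
n_w, n_l = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
t = rng.uniform(size=4)
ref = lambda x_t, t: 0.5 * x_t  # toy velocity field, ignores t
loss = dpo_flow_loss(ref, ref, x_w, x_l, n_w, n_l, t, beta=dynamic_beta(0, 1000))
print(loss)
```

Sharing the same noise sample between the policy and reference terms for a given image keeps the margin a paired comparison, which is the main reason to store noise with the data in this sketch; how PNAPO actually constructs and uses its noise samples may differ.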