Tag
This paper theoretically identifies and mitigates context distribution shift in multi-turn dialogue RL, proposing Calibrated Interactive RL that couples interactive RL with simulator alignment to reduce the sim-to-real gap and achieve state-of-the-art performance.