Tag
ReflectMT introduces a two-stage RL method that trains LRMs to internalize reflection, enabling single-pass high-quality translation with 94% fewer tokens than multi-step reasoning models like DeepSeek-R1.