seq2seq


Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective

arXiv cs.CL · 2026-05-18

This paper applies Group Relative Policy Optimization (GRPO) to fine-tune encoder-decoder Seq2Seq models for machine translation, using reference-free rewards (LaBSE and COMET-Kiwi) that require no parallel data, and reports consistent improvements across 13 languages.
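
As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of several translations sampled for the same source sentence against that group's own mean and standard deviation. It is not the paper's implementation: the function name `group_relative_advantages`, the `eps` term, the choice of population standard deviation, and the reward values are all assumptions made for illustration. In practice the rewards would come from a reference-free scorer such as LaBSE or COMET-Kiwi.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled translations.

    Each advantage is (reward - group mean) / (group std + eps), so a
    sample is credited only for being better than its siblings.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical reference-free scores for four sampled translations
# of a single source sentence (values invented for illustration).
rewards = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(rewards)
```

Because the baseline is computed from the group itself, no learned value function and no reference translation is needed to decide which samples to reinforce.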
