CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

arXiv cs.CL · 2026-05-27

This paper introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses. It shows that a reward model trained on English preferences can effectively rank responses in other languages, which improves model performance across 14 languages without language-specific annotations.
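The summary does not spell out the pipeline, so the following is only a minimal sketch of one plausible step: turning a model's own generations into chosen/rejected pairs by ranking them with a reward model. The names `build_preference_pairs`, `generate`, and `score` are hypothetical, as are the toy stand-ins at the bottom; in practice `score` would wrap a reward model trained on English preferences and `generate` would sample from the policy model in the target language. The training objective applied to these pairs is not stated in the summary and is not shown here.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # hypothetical: samples n responses from the policy model
    score: Callable[[str, str], float],         # hypothetical: reward-model score for (prompt, response)
    num_samples: int = 4,
) -> List[Dict[str, str]]:
    """Rank self-generated responses and keep the best and worst as a pair."""
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt, num_samples)
        if len(candidates) < 2:
            continue  # a contrast needs at least two responses
        # Score every candidate once, then sort from highest to lowest reward.
        scored = sorted(
            ((score(prompt, response), response) for response in candidates),
            key=lambda item: item[0],
            reverse=True,
        )
        best_score, best = scored[0]
        worst_score, worst = scored[-1]
        if best_score == worst_score:
            continue  # no preference signal when the scores tie
        pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} " + "ok" * (i + 1) for i in range(n)]


def toy_score(prompt: str, response: str) -> float:
    return float(len(response))  # placeholder: longer responses score higher


if __name__ == "__main__":
    for pair in build_preference_pairs(["Hola"], toy_generate, toy_score, num_samples=3):
        print(pair)
```

The resulting list of `prompt`/`chosen`/`rejected` records is the shape that common preference-tuning objectives consume; which objective CroCo uses would need to be checked against the paper itself.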
