Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
Summary
This paper proposes a reinforcement learning approach to enable large language models to translate unseen languages by leveraging in-context linguistic knowledge, outperforming in-context learning and supervised fine-tuning.
Cached at: 06/05/26, 06:07 AM
Source: https://huggingface.co/papers/2606.06428
Abstract
A reinforcement learning approach enables large language models to translate unseen languages by leveraging in-context linguistic knowledge rather than memorizing specific languages.
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
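To make the reward concrete, here is a minimal sketch of a chrF-style score used as a scalar reward. It is a simplified illustration rather than the authors' implementation: the function names `char_ngrams` and `chrf_reward` are hypothetical, the defaults (character n-grams up to order 6, beta = 2, whitespace stripped) are assumptions based on common chrF settings, and the result is not guaranteed to match a maintained implementation such as sacreBLEU's.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (an assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style reward in [0, 1]: an F-beta score over character n-grams."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders for which either string is too short
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta with beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)
```

In an outcome-based RL loop, a score like this would be computed between each sampled translation and its reference and passed to the policy update as the reward. Because it only compares surface characters, it needs no learned reward model, which is what makes it lightweight.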
Get this paper in your agent:
hf papers read 2606.06428
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Datasets citing this paper: 1
HanxuHU/rl-new-language
Similar Articles
Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
The paper introduces Self-Consolidating Language Models (SCoL), a framework that uses meta-reinforcement learning to write current context into model weights for continual knowledge incorporation. It demonstrates improved acquisition and retention over baselines in both QA and long-context consolidation tasks.
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
This paper investigates how using diverse self-generated data during mid-training improves the effectiveness of Reinforcement Learning in Large Language Models, particularly for reasoning tasks.
Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax
This paper proposes using reinforcement learning with semantic rewards (via GRPO) to expand LLMs to low-resource languages without the typical alignment tax of catastrophic forgetting, showing improved semantic quality and transferability over supervised fine-tuning.
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
This paper proposes mid-training language models on self-generated diverse reasoning traces before reinforcement learning, showing improved RL performance on math benchmarks by exposing models to multiple valid solution approaches.
Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning
Translate-R1 introduces a reinforcement learning approach for cost-aware translation tool use in LLMs, where the model learns to decide when to translate inputs based on its own comprehension and a cost-sensitivity parameter, achieving Pareto-optimal trade-offs across multiple languages.