Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Hugging Face Daily Papers 06/02/26, 12:00 AM Papers

Summary

Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training.

Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), on Xibe and Chintang as test cases. Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.
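The abstract does not specify how the trace-generation pipeline works, so the following is only a minimal sketch of the general idea: turning one CoNLL-U (Universal Dependencies) sentence, a dictionary, and a grammar-rule bank into a step-by-step textual trace. Everything here is an assumption made for illustration: the names `Token`, `parse_conllu`, `build_trace`, `TOY_DICT`, and `TOY_RULES` are hypothetical, the trace wording is invented, and the sample tokens are made-up placeholders, not Xibe or Chintang data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    idx: int
    form: str
    lemma: str
    upos: str
    feats: dict
    head: int
    deprel: str


def parse_conllu(block: str) -> list[Token]:
    """Parse a single CoNLL-U sentence (10 tab-separated columns per token)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        feats = {}
        if cols[5] != "_":
            feats = dict(kv.split("=", 1) for kv in cols[5].split("|"))
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], feats, int(cols[6]), cols[7])
        )
    return tokens


def build_trace(tokens: list[Token], dictionary: dict, rules: dict) -> str:
    """Emit one reasoning step per token: gloss, syntactic role, matching rules."""
    by_id = {t.idx: t for t in tokens}
    steps = []
    for t in tokens:
        gloss = dictionary.get(t.lemma, "<unknown>")
        head = "ROOT" if t.head == 0 else by_id[t.head].form
        step = (
            f"Step {t.idx}: '{t.form}' (lemma '{t.lemma}', {t.upos}) "
            f"means '{gloss}'; relation '{t.deprel}' to {head}."
        )
        for key, value in sorted(t.feats.items()):
            note = rules.get((key, value))
            if note:
                step += f" {note}"
        steps.append(step)
    return "\n".join(steps)


# Hypothetical toy resources; the forms below are placeholders, not real language data.
TOY_DICT = {"foo": "book", "bar": "read"}
TOY_RULES = {
    ("Case", "Acc"): "Rule: accusative marks the direct object.",
    ("Tense", "Past"): "Rule: translate with English past tense.",
}
SAMPLE = "\n".join(
    [
        "# text = foo-s bar",
        "\t".join(["1", "foo-s", "foo", "NOUN", "_", "Case=Acc", "2", "obj", "_", "_"]),
        "\t".join(["2", "bar", "bar", "VERB", "_", "Tense=Past", "0", "root", "_", "_"]),
    ]
)

tokens = parse_conllu(SAMPLE)
trace = build_trace(tokens, TOY_DICT, TOY_RULES)
print(trace)
```

In an ICL setting, a trace like this would be placed in the prompt alongside the source sentence. The sketch builds the trace from a gold analysis, which mirrors the abstract's point that traces help most when the underlying analysis is reliable.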

Original Article


Cached at: 06/09/26, 12:41 PM


Source: https://huggingface.co/papers/2606.03782

Abstract

Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), on Xibe and Chintang as test cases. Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.


Get this paper in your agent:

hf papers read 2606.03782

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Datasets citing this paper: 1

#### OLAResearchX/LingReason (updated 5 days ago)



Similar Articles

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

Enhanced and Efficient Reasoning in Large Learning Models

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

Decoding the Critique Mechanism in Large Reasoning Models
