Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Hugging Face Daily Papers 04/20/26, 12:00 AM Papers

reasoning self-play language-models reinforcement-learning transfer-learning game-ai mathematical-reasoning

Summary

STRATAGEM is a new framework for improving reasoning transferability in language models by using game self-play with a Reasoning Transferability Coefficient and Reasoning Evolution Reward to reinforce abstract, domain-agnostic reasoning patterns over game-specific heuristics. Experiments show strong improvements on mathematical reasoning, general reasoning, and code generation benchmarks.

Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabilistic inference, and adaptive decision-making. However, existing self-play approaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics. We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer: domain specificity, where learned patterns remain anchored in game semantics, and contextual stasis, where static game contexts fail to cultivate progressive reasoning. STRATAGEM selectively reinforces trajectories exhibiting abstract, domain-agnostic reasoning through a Reasoning Transferability Coefficient, while incentivizing adaptive reasoning development via a Reasoning Evolution Reward. Experiments across mathematical reasoning, general reasoning, and code generation benchmarks demonstrate substantial improvements, with particularly strong gains on competition-level mathematics where multi-step reasoning is critical. Ablation studies and human evaluation confirm that both components contribute to transferable reasoning.

Original Article Export to Word Export to PDF

View Cached Full Text

Cached at: 04/21/26, 07:21 AM

Paper page - Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Source: https://huggingface.co/papers/2604.17696 Authors:

Abstract

STRATAGEM addresses limitations in reasoning transfer for language models by using a reasoning transferability coefficient and evolution reward to promote abstract, domain-agnostic patterns over game-specific heuristics.

Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demandstrategic planning,probabilistic inference, andadaptive decision-making. However, existingself-playapproaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics. We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer:domain specificity, where learned patterns remain anchored in game semantics, andcontextual stasis, where static game contexts fail to cultivate progressive reasoning. STRATAGEM selectively reinforces trajectories exhibiting abstract, domain-agnostic reasoning through aReasoning Transferability Coefficient, while incentivizing adaptive reasoning development via aReasoning Evolution Reward. Experiments across mathematical reasoning, general reasoning, and code generation benchmarks demonstrate substantial improvements, with particularly strong gains on competition-level mathematics wheremulti-step reasoningis critical. Ablation studies and human evaluation confirm that both components contribute to transferable reasoning.

View arXiv page View PDF GitHub0 Add to collection

Get this paper in your agent:

hf papers read 2604\.17696

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2604.17696 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2604.17696 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2604.17696 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Paper page - Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Abstract

Models citing this paper0

Datasets citing this paper0

Spaces citing this paper0

Collections including this paper0

Similar Articles

STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning

Submit Feedback

Similar Articles

STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning