This paper introduces SPARK, a self-play reinforcement learning framework that leverages knowledge graphs derived from scientific literature to improve relational reasoning in vision-language models.
Cornell researchers propose POP, a self-play framework that lets an LLM generate its own rubrics and training pairs for open-ended tasks, boosting Qwen-2.5-7B on healthcare QA, creative writing, and instruction following without human labels.
Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell formal verification to train LLMs on semantic-equivalence reasoning, releasing the OpInstruct-HSx dataset (28k programs) and achieving 13.3pp accuracy gains on EquiBench.
STRATAGEM is a new framework for improving reasoning transferability in language models by using game self-play with a Reasoning Transferability Coefficient and Reasoning Evolution Reward to reinforce abstract, domain-agnostic reasoning patterns over game-specific heuristics. Experiments show strong improvements on mathematical reasoning, general reasoning, and code generation benchmarks.
OpenAI Five became the first AI system to defeat Dota 2 world champions using large-scale deep reinforcement learning with self-play, demonstrating superhuman performance on a complex game with long time horizons and imperfect information.
OpenAI demonstrates that agents trained in a hide-and-seek environment discover six distinct emergent strategies and tool-use behaviors through multi-agent competition, without explicit incentives for object interaction. This work suggests multi-agent co-adaptation can produce complex intelligent behavior through self-supervised learning.
OpenAI Five is a reinforcement learning agent that masters Dota 2 through self-play training with curriculum learning and strategic randomization, progressing from random behavior to executing complex human-level strategies.
OpenAI demonstrates that competitive self-play in simulated 3D robot environments enables AI agents to discover complex physical behaviors like tackling, ducking, and faking without explicit instruction, suggesting self-play will be fundamental to future powerful AI systems.
OpenAI describes iterative improvements to their Dota 2 bot during The International tournament, combining coaching with self-play to enhance agent performance through rapid training cycles and strategic refinements discovered during professional matches.
OpenAI created a bot that defeats world-class Dota 2 professionals in 1v1 matches using only self-play learning, without imitation learning or tree search. The achievement demonstrates progress toward AI systems that can accomplish complex goals in dynamic, multi-agent environments.
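The common thread across these entries is the self-play loop: an agent improves by competing against frozen snapshots of itself, with the opponent pool refreshed as the learner improves. Below is a minimal, hypothetical sketch of that idea, not any system above, using REINFORCE on the toy zero-sum game matching pennies (all names and hyperparameters are illustrative assumptions):

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Matching pennies: learner gets +1 if actions match, -1 otherwise.
def payoff(a, b):
    return 1.0 if a == b else -1.0

logits = [0.0, 0.0]        # learner's policy parameters (2 actions)
opponent = list(logits)    # frozen snapshot the learner trains against
lr = 0.05

for step in range(5000):
    p = softmax(logits)
    a = sample(p)                     # learner's move
    b = sample(softmax(opponent))     # frozen self-copy's move
    r = payoff(a, b)
    # REINFORCE update: push up log-prob of the taken action, scaled by reward
    for i in range(len(logits)):
        grad = (1.0 if i == a else 0.0) - p[i]
        logits[i] += lr * r * grad
    if step % 500 == 499:
        opponent = list(logits)       # periodically refresh the opponent snapshot

probs = softmax(logits)
print(probs)
```

In a zero-sum game like this, the equilibrium strategy is uniform randomization; self-play with periodic snapshot refreshes tends to orbit that equilibrium rather than exploit a fixed opponent, which is the same pressure that drives strategy discovery in the large-scale systems summarized above.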