scalable-rl

#scalable-rl

Towards Scalable Multi-Task Reinforcement Learning with Large Decision Models

arXiv cs.LG · 2026-06-25

This paper introduces LDM-v0, a large decision model trained offline on trajectories from thousands of diverse reinforcement learning environments, demonstrating that a single transformer policy can match the performance of task-specific policies across robotics, autonomous driving, inventory management, cybersecurity, trading, and video games.

#scalable-rl

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

arXiv cs.CL · 2026-05-25

ARES proposes a framework for automatically constructing rubric-based RL data from pretraining documents, generating question-answer pairs and weighted rubrics to enable instance-level reward supervision for open-ended LLM responses, outperforming existing methods on multi-dimensional open-ended tasks.
