@SergioPaniego: https://x.com/SergioPaniego/status/2067270222671741360
Summary
OpenReward environments now integrate directly into TRL's GRPOTrainer via a single OpenRewardSpec, allowing zero-glue-code training against a catalog of RL environments. The integration is experimental and part of a broader effort to make environment and agent RL first-class in TRL.
Cached at: 06/18/26, 02:06 AM
Train against a live reward environment in TRL, now with OpenReward
TL;DR: OpenReward environments now plug straight into TRL’s GRPOTrainer. One OpenRewardSpec wires an ORS environment (its tasks, tools, and reward) into the trainer’s three slots, so you can train against the OpenReward catalog (or a self-hosted or local ORS server) with no glue code. To get started: pip install trl.
OpenReward is an open ecosystem of RL environments built on the Open Reward Standard (ORS), a public HTTP/SSE protocol for how an environment exposes its tasks, tools, sessions, and rewards. Because ORS is just a protocol, the same environment can run on the hosted openreward.ai catalog, self-hosted on your own infra, or locally while you develop it.
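To make the "SSE" half of HTTP/SSE concrete, here is a minimal sketch of parsing a Server-Sent Events stream in plain Python. It illustrates only the generic SSE wire format (`event:` and `data:` fields, with a blank line ending each event). The event names and JSON payloads below are hypothetical and are not taken from the ORS specification, so check the spec for the real message shapes.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the default SSE event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it carried data).
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream: these event names and payloads are illustrative only.
stream = [
    ": keep-alive\n",
    "event: tool_result\n",
    'data: {"output": "ok"}\n',
    "\n",
    "event: reward\n",
    'data: {"value": 1.0}\n',
    "\n",
]

events = [(name, json.loads(payload)) for name, payload in parse_sse(stream)]
```

In practice the TRL integration handles the transport for you; the sketch is only meant to show why a protocol this simple can be served from a hosted catalog, your own infrastructure, or a local process alike.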
One OpenRewardSpec resolves an environment into the trainer’s three slots, so you pick one from the catalog, hand it over, and train:
```python
from trl import GRPOConfig, GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

# Resolves the env, its tasks, and its ORS-computed reward into the three trainer slots.
spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",
    args=GRPOConfig(num_generations=8, max_tool_calling_iterations=20),
    train_dataset=spec.train_dataset,  # the ORS task list
    environment_factory=spec.environment_factory,  # one ORS session per rollout
    reward_funcs=spec.reward_funcs,  # the ORS-computed reward
)
trainer.train()
```
That runs today. The policy calls the environment’s tools turn by turn, the environment scores the outcome, and GRPO trains on it. The harness (the tool surface and the loop) comes from the environment’s tools; the only part being trained is the policy. Point the spec at a catalog name (set OPENREWARD_API_KEY) or at a URL for a self-hosted or local server. A full runnable script is seta.py.
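The loop described above (tool calls turn by turn, a score from the environment, then a GRPO update) can be sketched with a toy stand-in. Everything here is hypothetical and independent of TRL and ORS: `ToyEnv`, its single `add` tool, and `random_policy` are made up for illustration. The advantage computation uses the group-relative normalization that GRPO is known for, with the population standard deviation as a simplifying assumption; TRL's exact normalization and epsilon may differ.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical stand-in for one environment session: one tool, one final reward."""

    def __init__(self, target):
        self.target = target
        self.total = 0

    def add(self, amount):
        """The only 'tool': add to the running total and return the new state."""
        self.total += amount
        return self.total

    def reward(self):
        """The environment, not the trainer, scores the outcome."""
        return 1.0 if self.total == self.target else 0.0


def rollout(env, policy, max_iterations):
    """Let the policy call the tool turn by turn, capped at max_iterations calls."""
    for _ in range(max_iterations):
        action = policy(env.total, env.target)
        if action is None:  # the policy decides it is done
            break
        env.add(action)
    return env.reward()


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


rng = random.Random(0)


def random_policy(total, target):
    """Untrained toy policy: take random steps of 1 or 2 until reaching the target."""
    if total >= target:
        return None
    return rng.choice([1, 2])


# One group of 8 rollouts on the same task, mirroring num_generations=8 above.
rewards = [rollout(ToyEnv(target=5), random_policy, max_iterations=20) for _ in range(8)]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average get a positive advantage and the rest a negative one, which is the signal that drives the policy update. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.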
Install, set your key, and launch (single node, vLLM colocate, as in the example):
```bash
pip install "trl[vllm,openreward]"
export OPENREWARD_API_KEY=…

# Terminal 1: vLLM server (2 GPUs)
CUDA_VISIBLE_DEVICES=2,3 trl vllm-serve \
    --model Qwen/Qwen3-4B \
    --tensor-parallel-size 2 \
    --port 8000

# Terminal 2: training (2 GPUs)
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate_configs/deepspeed_zero2.yaml \
    --num_processes 2 \
    examples/scripts/openreward/seta.py \
    --vllm-mode server \
    --vllm-server-base-url http://localhost:8000
```
NOTE: OpenReward support is experimental (it lives under trl.experimental), so expect the API to keep evolving. It is one step in a broader direction to make environment and agent RL first-class in TRL, with the design being worked out in the open: environment-owned reward (#5912), environment-owned dataset (#5903), and a single rollout-source contract that unifies environment and agent rollouts (#5974).
TRL also integrates OpenEnv, the open environment standard. For the wider landscape of RL environment frameworks beyond TRL, see The ultimate guide to RL environments.
Resources
- TRL OpenReward guide: https://huggingface.co/docs/trl/openreward
- Runnable example (seta.py): https://github.com/huggingface/trl/blob/main/examples/scripts/openreward/seta.py
- OpenReward catalog: https://openreward.ai
- Open Reward Standard (ORS): https://openrewardstandard.io
- The ultimate guide to RL environments: https://huggingface.co/spaces/AdithyaSK/rl-environments-guide
- Agent Glossary (the vocabulary used here): https://huggingface.co/blog/agent-glossary
- TRL OpenEnv integration: https://huggingface.co/docs/trl/openenv