@SergioPaniego: https://x.com/SergioPaniego/status/2067270222671741360


Summary

OpenReward environments now integrate directly into TRL's GRPOTrainer via a single OpenRewardSpec, allowing zero-glue-code training against a catalog of RL environments. The integration is experimental and part of a broader effort to make environment and agent RL first-class in TRL.

https://t.co/AKHNVGmBPz

Cached at: 06/18/26, 02:06 AM

Train against a live reward environment in TRL, now with OpenReward

TL;DR: OpenReward environments now plug straight into TRL’s GRPOTrainer. One OpenRewardSpec wires an ORS environment (its tasks, tools, and reward) into the trainer’s three slots, so you can train against the OpenReward catalog (or a self-hosted or local ORS server) with no glue code. pip install trl.

OpenReward is an open ecosystem of RL environments built on the Open Reward Standard (ORS), a public HTTP/SSE protocol for how an environment exposes its tasks, tools, sessions, and rewards. Because ORS is just a protocol, the same environment can run on the hosted openreward.ai catalog, self-hosted on your own infra, or locally while you develop it.
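Since ORS is described as an HTTP/SSE protocol, a client ultimately reads Server-Sent Events. Here is a minimal sketch of generic SSE framing only. The event names and JSON keys in the sample stream (`tool_result`, `reward`, `done`) are hypothetical placeholders, not taken from the ORS specification, and `parse_sse` is an illustrative helper rather than part of any OpenReward client.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines.

    Implements only the basic framing: `event:` and `data:` fields,
    `:` comment lines, and a blank line terminating each event.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # dispatch only events that carried data
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; the event names and JSON keys are illustrative only.
stream = [
    ": keep-alive",
    "event: tool_result",
    'data: {"output": "ok"}',
    "",
    "event: reward",
    'data: {"reward": 1.0, "done": true}',
    "",
]
events = list(parse_sse(stream))
final = json.loads(events[-1]["data"])
```

In practice an HTTP client library would supply the line iterator; the parser itself stays the same whichever server (hosted, self-hosted, or local) produces the stream.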

One OpenRewardSpec resolves an environment into the trainer’s three slots, so you pick one from the catalog, hand it over, and train:

```python
from trl import GRPOConfig, GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

# Resolves the env, its tasks, and its ORS-computed reward into the three trainer slots.
spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",
    args=GRPOConfig(num_generations=8, max_tool_calling_iterations=20),
    train_dataset=spec.train_dataset,              # the ORS task list
    environment_factory=spec.environment_factory,  # one ORS session per rollout
    reward_funcs=spec.reward_funcs,                # the ORS-computed reward
)
trainer.train()
```

That runs today. The policy calls the environment’s tools turn by turn, the environment scores the outcome, and GRPO trains on it. The harness (the tool surface and the loop) comes from the environment’s tools; the only part being trained is the policy. Point the spec at a catalog name (set OPENREWARD_API_KEY) or at a URL for a self-hosted or local server. A full runnable script is seta.py.
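The loop described above can be sketched in plain Python. This is a toy illustration, not TRL's implementation: `ToyEnv`, `toy_policy`, `rollout`, and `group_advantages` are hypothetical names, and the policy is a random stand-in for a language model. The advantage formula is the commonly described group-relative normalization (reward minus group mean, divided by group standard deviation), which may differ in detail from what GRPOTrainer computes.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical environment: one tool, reward 1.0 if the target is guessed."""

    def __init__(self, target: int):
        self.target = target
        self.solved = False

    def guess(self, value: int) -> str:  # the environment's only "tool"
        self.solved = value == self.target
        return "correct" if self.solved else "wrong"

    def reward(self) -> float:  # the environment scores the outcome
        return 1.0 if self.solved else 0.0


def toy_policy(rng: random.Random) -> int:
    """Stand-in for the model: picks a tool argument at random."""
    return rng.randint(0, 3)


def rollout(env: ToyEnv, rng: random.Random, max_iterations: int) -> float:
    """Call tools turn by turn until solved or the iteration cap is hit."""
    for _ in range(max_iterations):
        if env.guess(toy_policy(rng)) == "correct":
            break
    return env.reward()


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages: (r - mean) / (std + eps) within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


rng = random.Random(0)
# One group = several rollouts of the same task, each in a fresh environment.
rewards = [rollout(ToyEnv(target=2), rng, max_iterations=2) for _ in range(8)]
advantages = group_advantages(rewards)
```

The group size of 8 and the per-rollout iteration cap mirror the roles of `num_generations` and `max_tool_calling_iterations` in the config above. Rollouts that score above their group's mean get positive advantages, and a group in which every rollout scores the same contributes no learning signal.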

Install, set your key, and launch (single node, with vLLM running in server mode, as in the example):

```bash
pip install "trl[vllm,openreward]"
export OPENREWARD_API_KEY=…

# Terminal 1: vLLM server (2 GPUs)
CUDA_VISIBLE_DEVICES=2,3 trl vllm-serve \
    --model Qwen/Qwen3-4B \
    --tensor-parallel-size 2 \
    --port 8000

# Terminal 2: training (2 GPUs)
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate_configs/deepspeed_zero2.yaml \
    --num_processes 2 \
    examples/scripts/openreward/seta.py \
    --vllm-mode server \
    --vllm-server-base-url http://localhost:8000
```
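Before launching, it can help to check the one prerequisite the text names for catalog environments. The sketch below is a hypothetical helper (`preflight` is not part of TRL or OpenReward). It assumes that a target starting with `http://` or `https://` is a self-hosted or local ORS server and that anything else is a catalog name requiring `OPENREWARD_API_KEY`. Whether a self-hosted server also needs a key is not stated here, so the helper does not check for one in that case. The port in the example URL is likewise made up.

```python
import os
from urllib.parse import urlparse


def preflight(target: str, environ=os.environ) -> str:
    """Classify `target` and fail early if a catalog name has no API key."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"  # assumed: self-hosted or local ORS server
    if not environ.get("OPENREWARD_API_KEY"):
        raise RuntimeError("Set OPENREWARD_API_KEY to use a catalog environment.")
    return "catalog"


# Hypothetical local server URL, checked against an empty environment.
kind = preflight("http://localhost:8080", environ={})
```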

NOTE: OpenReward support is experimental (it lives under trl.experimental), so expect the API to keep evolving. It is one step in a broader direction to make environment and agent RL first-class in TRL, with the design being worked out in the open: environment-owned reward (#5912), environment-owned dataset (#5903), and a single rollout-source contract that unifies environment and agent rollouts (#5974).

TRL also integrates OpenEnv, the open environment standard. For the wider landscape of RL environment frameworks beyond TRL, see The ultimate guide to RL environments.

Resources

  • TRL OpenReward guide: https://huggingface.co/docs/trl/openreward

  • Runnable example (seta.py): https://github.com/huggingface/trl/blob/main/examples/scripts/openreward/seta.py

  • OpenReward catalog: https://openreward.ai

  • Open Reward Standard (ORS): https://openrewardstandard.io

  • The ultimate guide to RL environments: https://huggingface.co/spaces/AdithyaSK/rl-environments-guide

  • Agent Glossary (the vocabulary used here): https://huggingface.co/blog/agent-glossary

  • TRL OpenEnv integration: https://huggingface.co/docs/trl/openenv
