Tag
This paper introduces EnvSimBench, a benchmark for evaluating Large Language Models' ability to simulate environments for agent training. It identifies a 'state change cliff' in current LLMs and proposes a constraint-driven pipeline to reduce hallucinations and costs.
Huggingface introduces EcomRLVE-GYM, a framework providing eight verifiable environments for training reinforcement learning agents on complex e-commerce tasks. The tool features adaptive difficulty curricula and algorithmic rewards to improve task completion in shopping assistants, demonstrated by training a Qwen 3 8B model.