Tag
Mercor announces that it is joining the OpenEnv committee alongside Meta, PyTorch, NVIDIA, PrimeIntellect, and Hugging Face to guide the open foundation for agentic environments.
PatchWorld introduces a gradient-free framework that transforms offline trajectories into executable Python world models via counterexample-guided code repair, enabling interpretable and inspectable belief-state programs for planning in partially observable environments.
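To make the idea of counterexample-guided repair concrete, here is a minimal sketch of the general loop, not PatchWorld's actual implementation. It assumes a fully observed, deterministic toy corridor, and the names `find_counterexample`, `patch`, and `repair_loop` are hypothetical. The `patch` step simply special-cases the failing transition; it stands in for whatever code-repair mechanism the framework really uses.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration: a transition is (state, action, next_state).
Transition = Tuple[int, str, int]
Model = Callable[[int, str], int]


def find_counterexample(model: Model, data: List[Transition]) -> Optional[Transition]:
    """Return the first logged transition the model mispredicts, or None."""
    for state, action, next_state in data:
        if model(state, action) != next_state:
            return (state, action, next_state)
    return None


def patch(model: Model, cex: Transition) -> Model:
    """Stand-in repair step: wrap the model so it agrees with the counterexample."""
    s0, a0, next0 = cex

    def patched(state: int, action: str) -> int:
        if (state, action) == (s0, a0):
            return next0
        return model(state, action)

    return patched


def repair_loop(model: Model, data: List[Transition], max_iters: int = 100) -> Model:
    """Repeatedly find a counterexample and repair until the data is explained."""
    for _ in range(max_iters):
        cex = find_counterexample(model, data)
        if cex is None:
            break  # model is consistent with every logged transition
        model = patch(model, cex)
    return model


def initial_model(state: int, action: str) -> int:
    """Naive first guess: moves are never blocked by walls."""
    return state + 1 if action == "right" else state - 1


# Toy offline data from a corridor with cells 0..3 and walls at both ends.
data: List[Transition] = [
    (0, "right", 1),
    (1, "right", 2),
    (3, "right", 3),  # blocked by the right wall
    (0, "left", 0),   # blocked by the left wall
    (2, "left", 1),
]

repaired = repair_loop(initial_model, data)
```

The loop needs no gradients: each iteration only executes the candidate program against logged data and edits it where it fails. This sketch terminates only because the toy data is deterministic; conflicting transitions would exhaust `max_iters` without converging, and partial observability is not modeled here at all.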
SkillEvolBench is a diagnostic benchmark for evaluating whether large language model agents can distill episodic experience into reusable procedural skills. Spanning 180 tasks across six environments, it finds that current agents often struggle to form robust reusable skills: reusing raw trajectories frequently outperforms using distilled skills.