This paper introduces FATE, an on-policy framework that leverages failure trajectories to enhance the safety and performance of tool-using LLM agents through self-evolution and Pareto-aware optimization.
OpenAI proposes POLO (Plan Online, Learn Offline), a framework that combines model-based trajectory optimization with global value function learning and coordinated exploration, enabling efficient learning on complex control tasks such as humanoid locomotion and dexterous manipulation with minimal real-world experience.
OpenAI introduces a method for learning complex nonlinear system dynamics with deep generative models over temporal segments, yielding stable long-horizon predictions and supporting differentiable trajectory optimization for model-based control.