Tag
GAGPO proposes a critic-free RL method that uses a non-parametric grouped value proxy for step-level credit assignment in multi-turn agentic tasks, outperforming strong baselines on ALFWorld and WebShop.
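This summary does not spell out how GAGPO's grouped value proxy is computed. The sketch below shows one plausible reading. It assumes that steps are grouped by a hashable state key shared across sampled rollouts of the same task, and that each group's mean discounted return replaces a learned critic. The function names and the grouping rule are hypothetical, not the paper's specification.

```python
from collections import defaultdict
from statistics import mean


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each step."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def grouped_step_advantages(trajectories, gamma=0.95):
    """Critic-free step-level advantages (hypothetical illustration).

    trajectories: rollouts for the same task, each a list of
    (state_key, reward) pairs. Steps sharing a state_key across rollouts
    form a group, and the group's mean return is the non-parametric value proxy.
    """
    returns = [discounted_returns([r for _, r in traj], gamma) for traj in trajectories]

    # Pool the returns of every step that shares the same state key.
    groups = defaultdict(list)
    for traj, rets in zip(trajectories, returns):
        for (key, _), g in zip(traj, rets):
            groups[key].append(g)
    value_proxy = {key: mean(gs) for key, gs in groups.items()}

    # Step advantage = step return minus the grouped value proxy of its state.
    return [
        [g - value_proxy[key] for (key, _), g in zip(traj, rets)]
        for traj, rets in zip(trajectories, returns)
    ]
```

In this sketch, a state that appears in only one rollout gets an advantage of zero. A practical method would probably need a fallback for that case, such as an episode-level baseline. That fallback is not shown here.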
SynAE is a framework for evaluating the quality of synthetic data used in tool-calling agent evaluations, assessing that data along three axes: validity, fidelity, and diversity. It targets settings where real data is insufficient or sensitive, and it provides metrics to guide synthetic data generation.
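The summary names the three axes but not the metrics SynAE actually uses. To make the axes concrete, here is a minimal sketch with hypothetical stand-in metrics. Validity is the share of calls that name a known tool and supply its required arguments. Fidelity is one minus the total variation distance between synthetic and real tool-usage frequencies. Diversity is the share of distinct call signatures. The schema format and all function names are assumptions for illustration.

```python
from collections import Counter


def validity(calls, schema):
    """Hypothetical: fraction of calls naming a known tool with all required args."""
    ok = sum(
        1 for c in calls
        if c["tool"] in schema and schema[c["tool"]] <= set(c["args"])
    )
    return ok / len(calls) if calls else 0.0


def diversity(calls):
    """Hypothetical: share of distinct (tool, argument-name set) signatures."""
    sigs = {(c["tool"], frozenset(c["args"])) for c in calls}
    return len(sigs) / len(calls) if calls else 0.0


def fidelity(synthetic, real):
    """Hypothetical: 1 - total variation distance between tool-name frequencies."""
    def freqs(calls):
        counts = Counter(c["tool"] for c in calls)
        total = sum(counts.values())
        return {tool: n / total for tool, n in counts.items()}

    p, q = freqs(synthetic), freqs(real)
    tools = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)
    return 1.0 - tvd
```

Here `schema` maps each tool name to the set of its required argument names. Each call is a dict with `"tool"` and `"args"` keys. Both conventions are assumptions of this sketch.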
The tweet highlights Skill0, a paper from the Meituan team presenting an RL recipe for skill internalization, and references a related paper on self-distilled agentic RL.
SDAR improves multi-turn agent training by combining self-distillation with a sigmoid gate. The gate selectively strengthens positive token-level guidance from the teacher while damping the effect of tokens the teacher rejects. SDAR reports significant improvements over GRPO across multiple benchmarks.
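The exact gating formula is not given in this summary. The sketch below shows one way a sigmoid gate can keep positive token-level guidance while suppressing negative guidance. It assumes the teacher and the student each assign a log-probability to every sampled token, and it treats their gap as the guidance signal. The function names and the temperature `tau` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_token_guidance(student_logp, teacher_logp, tau=1.0):
    """Hypothetical per-token guidance for gated self-distillation.

    gap = log p_teacher(token) - log p_student(token)
      gap > 0: the teacher endorses the token more than the student (positive guidance)
      gap < 0: the teacher "rejects" the token (negative guidance)
    Weighting the gap by sigmoid(gap / tau) keeps positive guidance near full
    strength and shrinks negative guidance toward zero.
    """
    if len(student_logp) != len(teacher_logp):
        raise ValueError("student and teacher sequences must have the same length")
    out = []
    for s, t in zip(student_logp, teacher_logp):
        gap = t - s
        out.append(sigmoid(gap / tau) * gap)
    return out
```

With `tau=1.0`, the gated term `sigmoid(gap) * gap` has the SiLU (swish) shape. Large positive gaps pass through almost linearly, while large negative gaps are squashed toward zero. How such a signal would be combined with a GRPO-style advantage is left open here.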