multi-turn-agents

GAGPO: Generalized Advantage Grouped Policy Optimization

arXiv cs.AI · yesterday

GAGPO proposes a critic-free RL method that uses a non-parametric grouped value proxy for step-level credit assignment in multi-turn agentic tasks, outperforming strong baselines on ALFWorld and WebShop.
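The summary does not spell out how the grouped value proxy is computed. As a rough illustration only, the sketch below assumes one plausible reading: each state is identified by a hashable key, the proxy value of a state is the mean discounted return-to-go over every visit to that key across a group of rollouts for the same task, and that proxy replaces a learned critic inside standard generalized advantage estimation (GAE). The state keys, the averaging rule, and all function names are hypothetical and are not GAGPO's published algorithm.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical step format: (state_key, reward). A trajectory is a list of steps,
# and a group is several trajectories sampled for the same task.
Step = Tuple[Hashable, float]


def returns_to_go(rewards: List[float], gamma: float) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def grouped_value_proxy(group: List[List[Step]], gamma: float) -> Dict[Hashable, float]:
    """Non-parametric value estimate (assumed form): mean return-to-go over
    every visit to the same state key across all trajectories in the group."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for traj in group:
        rtg = returns_to_go([r for _, r in traj], gamma)
        for (key, _), g in zip(traj, rtg):
            totals[key] += g
            counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}


def gae_with_proxy(traj: List[Step], values: Dict[Hashable, float],
                   gamma: float, lam: float) -> List[float]:
    """Standard GAE recursion, with V(s) read from the grouped proxy
    instead of a learned critic. The value after the final step is 0."""
    adv = [0.0] * len(traj)
    running = 0.0
    for t in reversed(range(len(traj))):
        key, reward = traj[t]
        v_next = values[traj[t + 1][0]] if t + 1 < len(traj) else 0.0
        delta = reward + gamma * v_next - values[key]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


if __name__ == "__main__":
    # Two toy rollouts from the same start state: one reaches the goal, one does not.
    group = [
        [("start", 0.0), ("room_a", 0.0), ("goal", 1.0)],
        [("start", 0.0), ("room_b", 0.0), ("dead_end", 0.0)],
    ]
    values = grouped_value_proxy(group, gamma=1.0)
    print(values["start"])  # 0.5
    print(gae_with_proxy(group[0], values, gamma=1.0, lam=1.0))  # [0.5, 0.0, 0.0]
    print(gae_with_proxy(group[1], values, gamma=1.0, lam=1.0))  # [-0.5, 0.0, 0.0]
```

In the toy group both rollouts share the start state, so its proxy value is 0.5; the first step of the successful rollout then receives advantage 0.5 and the first step of the failed one receives -0.5, which is the kind of step-level credit a single trajectory-level reward cannot separate. Because the values come from averaging rather than a trained network, no critic is needed, at the cost of requiring repeated state keys within a group.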

SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

arXiv cs.CL · 2026-05-22

SynAE is a framework for evaluating the quality of synthetic data used in tool-calling agent evaluations, assessing it along multiple axes, including validity, fidelity, and diversity. It targets settings where real data is insufficient or sensitive, providing metrics to guide synthetic data generation.
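The summary does not list SynAE's actual metrics. Purely to make two of the named axes concrete, the sketch below assumes a synthetic sample is a list of `(tool_name, arguments)` calls, scores validity as the fraction of calls that name a known tool and supply its required arguments, and scores diversity with a distinct-n ratio over tool-name sequences. The data layout, the toy `schema`, and both scoring rules are hypothetical stand-ins; fidelity, which needs a comparison against real data, is not covered.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: a synthetic sample is a list of tool calls,
# each a (tool_name, arguments) pair.
ToolCall = Tuple[str, Dict[str, object]]


def validity_rate(samples: List[List[ToolCall]], schema: Dict[str, Set[str]]) -> float:
    """Fraction of calls that name a known tool and supply all of its required arguments."""
    total = valid = 0
    for sample in samples:
        for name, args in sample:
            total += 1
            if name in schema and schema[name].issubset(args.keys()):
                valid += 1
    return valid / total if total else 0.0


def distinct_n(samples: List[List[ToolCall]], n: int = 2) -> float:
    """Distinct-n diversity: unique tool-name n-grams divided by all n-grams."""
    grams: List[Tuple[str, ...]] = []
    for sample in samples:
        names = [name for name, _ in sample]
        grams.extend(tuple(names[i:i + n]) for i in range(len(names) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0


if __name__ == "__main__":
    # Toy schema: tool name -> required argument names.
    schema = {"search": {"query"}, "buy": {"item_id"}}
    samples = [
        [("search", {"query": "mug"}), ("buy", {"item_id": "42"})],
        [("search", {"query": "lamp"}), ("buy", {})],             # missing item_id
        [("search", {"q": "desk"}), ("refund", {"order": "7"})],  # wrong arg, unknown tool
    ]
    print(validity_rate(samples, schema))      # 0.5
    print(round(distinct_n(samples, n=2), 3))  # 0.667
```

Scores like these pull in different directions: a synthetic set can be fully valid yet repetitive, so a single number would hide the trade-off.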

@maximelabonne: That's so cool! The same team at @Meituan_LongCat wrote Skill0, where they propose an RL recipe for skill internalizati…

X AI KOLs Following · 2026-05-17

The tweet highlights a paper by the Meituan team on Skill0, an RL recipe for skill internalization, and references a related paper on self-distilled agentic RL.

Self-Distilled Agentic Reinforcement Learning

Hugging Face Daily Papers · 2026-05-14

SDAR enhances multi-turn agent training by combining self-distillation with a sigmoid gate that selectively strengthens positive token-level guidance while mitigating negative teacher rejections. It reports significant improvements over GRPO across multiple benchmarks.
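The summary names the ingredients (a self-distilled teacher signal, a sigmoid gate, token-level granularity) but not the exact formula. The sketch below assumes one simple form: take the per-token gap between teacher and student log-probabilities, multiply it by a sigmoid of that same gap, and add the result with a small weight to a shared sequence-level, GRPO-style advantage. `beta`, `alpha`, and the additive combination are hypothetical choices, not SDAR's published objective.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_token_signal(teacher_logp: List[float], student_logp: List[float],
                       beta: float = 1.0) -> List[float]:
    """Hypothetical gated self-distillation signal, one value per token.

    gap > 0: the teacher assigns the sampled token more probability (endorsement).
    gap < 0: the teacher assigns it less (rejection).
    The gate is above 0.5 for endorsements and below 0.5 for rejections,
    approaching 0 for strong rejections, so negative gaps are damped.
    """
    signal = []
    for lt, ls in zip(teacher_logp, student_logp):
        gap = lt - ls
        signal.append(sigmoid(beta * gap) * gap)
    return signal


def shaped_advantages(seq_advantage: float, signal: List[float],
                      alpha: float = 0.1) -> List[float]:
    """Add the gated token signal to a shared sequence-level (GRPO-style) advantage."""
    return [seq_advantage + alpha * s for s in signal]


if __name__ == "__main__":
    # Log-probs of the same three sampled tokens under teacher and student.
    teacher = [0.0, -1.0, -3.0]
    student = [-2.0, -1.0, -1.0]  # gaps: +2, 0, -2
    signal = gated_token_signal(teacher, student)
    print([round(s, 3) for s in signal])  # [1.762, 0.0, -0.238]
    print(len(shaped_advantages(1.0, signal)))  # 3
```

Because the gated term is `gap * sigmoid(beta * gap)` (the swish function, which reduces to SiLU when `beta = 1`), a +2 gap contributes about 1.76 while a -2 gap contributes only about -0.24: teacher endorsements pass through nearly unchanged, while teacher rejections are damped toward zero rather than amplified.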
