hierarchical-skills

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

arXiv cs.CL · 4d ago

OPID is a framework for reinforcement learning of language agents that extracts dense token-level supervision from completed on-policy trajectories, using hierarchical skills (episode-level and step-level) to improve sample efficiency and robustness.

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Hugging Face Daily Papers · 5d ago

OPID proposes an on-policy skill distillation framework that extracts dense hindsight supervision from completed trajectories, combining outcome-based RL with token-level self-distillation to improve language agent training efficiency and performance on multi-turn tasks.
