hierarchical-skills

Tag
#hierarchical-skills

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

arXiv cs.CL · 4d ago

OPID is a framework that extracts dense token-level supervision from completed on-policy trajectories for reinforcement learning of language agents, using hierarchical skills (episode-level and step-level) to improve sample efficiency and robustness.
#hierarchical-skills

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Hugging Face Daily Papers · 5d ago

OPID proposes an on-policy skill distillation framework that extracts dense hindsight supervision from completed trajectories, combining outcome-based RL with token-level self-distillation to improve both the training efficiency of language agents and their performance on multi-turn tasks.