#beacon-framework

Milestone-Guided Policy Learning for Long-Horizon Language Agents

arXiv cs.CL · 5d ago

This paper introduces BEACON, a milestone-guided policy learning framework designed to improve credit assignment and sample efficiency for long-horizon language agents. It demonstrates significant performance improvements over GRPO and GiGPO on benchmarks like ALFWorld, WebShop, and ScienceWorld.
