This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.
OpenWebRL is an open framework for training visual web agents with online multi-turn reinforcement learning on real websites, achieving state-of-the-art performance with minimal initial supervision. Its 4B-parameter model outperforms prior open agents and is competitive with proprietary systems such as OpenAI CUA and Gemini CUA.