hierarchical-advantage-weighting

#hierarchical-advantage-weighting

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Hugging Face Daily Papers ↗ · 2026-06-15 Cached

This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.

0 favorites 0 likes

hierarchical-advantage-weighting

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Submit Feedback