state-action-modeling

GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

arXiv cs.LG · 2026-05-21

GROW is a reinforcement learning framework that adapts GRPO (Group Relative Policy Optimization) to multi-turn vision-language model (VLM) agent tasks. It decomposes each trajectory into state-action pairs and computes advantages between those pairs rather than over whole trajectories, and it reports state-of-the-art performance on more than 800 Minecraft tasks.
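To make the idea of decomposing trajectories and comparing state-action pairs concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say how GROW forms comparison groups or assigns rewards to individual steps. The sketch assumes (hypothetically) that pairs sharing the same state key form a group, that each pair inherits its trajectory's final reward, and that advantages use the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The names `decompose`, `group_relative_advantages`, and the `steps`/`reward` dictionary keys are illustrative only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def decompose(trajectories):
    """Flatten multi-turn trajectories into per-step (state, action) records.

    Assumption: every step inherits the final reward of its trajectory.
    """
    pairs = []
    for traj in trajectories:
        for state, action in traj["steps"]:
            pairs.append({"state": state, "action": action, "reward": traj["reward"]})
    return pairs


def group_relative_advantages(pairs, eps=1e-8):
    """Compute GRPO-style normalized advantages within each state group.

    Assumption: pairs with an identical state key are compared with each other.
    """
    groups = defaultdict(list)
    for index, pair in enumerate(pairs):
        groups[pair["state"]].append(index)

    advantages = [0.0] * len(pairs)
    for indices in groups.values():
        rewards = [pairs[i]["reward"] for i in indices]
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population std; 0.0 for a single-element group
        for i in indices:
            advantages[i] = (pairs[i]["reward"] - mu) / (sigma + eps)
    return advantages


if __name__ == "__main__":
    # Two hypothetical rollouts that start from the same state but diverge.
    rollouts = [
        {"steps": [("spawn", "chop_tree"), ("has_log", "craft_planks")], "reward": 1.0},
        {"steps": [("spawn", "dig_down")], "reward": 0.0},
    ]
    pairs = decompose(rollouts)
    for pair, adv in zip(pairs, group_relative_advantages(pairs)):
        print(pair["state"], pair["action"], round(adv, 3))
```

In this sketch a state visited by only one rollout gets an advantage of zero, because there is nothing to compare it against. How the actual method handles such cases is not stated in the summary.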
