Researchers propose APT, a two-stage training method for Vision-Language-Action (VLA) policies that first pretrains action experts on vision-action pairs and only then adds language conditioning, significantly improving generalization to out-of-distribution instructions.