Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

Hugging Face Daily Papers 05/14/26, 12:00 AM Papers

Summary

Proposes a training-free inference-time method that corrects the pace and path of Vision-Language-Action model actions, improving success rates by up to 28.8 percentage points (absolute) in dynamic environments.

Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an orthogonal spatial offset, jointly absorbing the perceived dynamics within the chunk window. We evaluate our approach on a comprehensive diagnostic benchmark MoveBench designed to isolate motion as the sole controlled variable. Empirical results demonstrate that our framework consistently outperforms state-of-the-art training-free wrappers and dynamic-adaptive methods and improves success rates by up to 28.8% and 25.9% in absolute terms over foundational VLA models in dynamic-only and static-dynamic mixed environments, respectively.

Original Article

Cached at: 05/15/26, 08:24 AM

Source: https://huggingface.co/papers/2605.11459

Abstract

Vision-Language-Action models suffer from temporal blindness in dynamic environments, but a training-free correction method using quadratic optimization improves performance by addressing pace and path dynamics simultaneously.

Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an orthogonal spatial offset, jointly absorbing the perceived dynamics within the chunk window. We evaluate our approach on a comprehensive diagnostic benchmark MoveBench designed to isolate motion as the sole controlled variable. Empirical results demonstrate that our framework consistently outperforms state-of-the-art training-free wrappers and dynamic-adaptive methods and improves success rates by up to 28.8% and 25.9% in absolute terms over foundational VLA models in dynamic-only and static-dynamic mixed environments, respectively.
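The abstract does not state the cost function, so the following is only a geometric illustration of what an orthogonal pace/path split could look like, not the paper's operator. It assumes a straight planned segment and a single perceived drift vector for the target over the chunk window; the function name `decompose_drift` and its arguments are hypothetical. The split shown is the least-squares projection: the scalar `along` minimizes the quadratic `||drift - s * unit||^2` over `s`, and the residual is orthogonal to the planned direction.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose_drift(
    plan_start: Sequence[float],
    plan_end: Sequence[float],
    drift: Sequence[float],
) -> Tuple[float, List[float]]:
    """Split a perceived drift into an along-plan scalar and an orthogonal offset.

    Hypothetical helper for illustration only:
    - `along` is the drift component along the planned direction
      (the part a pace-style adjustment would absorb),
    - `offset` is the remaining component orthogonal to the plan
      (the part a path-style adjustment would absorb).
    """
    direction = [e - s for s, e in zip(plan_start, plan_end)]
    length = math.sqrt(dot(direction, direction))
    if length == 0.0:
        # Degenerate plan with no direction: treat all drift as spatial offset.
        return 0.0, list(drift)
    unit = [c / length for c in direction]
    along = dot(drift, unit)  # minimizer of ||drift - s * unit||^2 over s
    offset = [d - along * u for d, u in zip(drift, unit)]
    return along, offset


if __name__ == "__main__":
    along, offset = decompose_drift([0.0, 0.0], [2.0, 0.0], [0.3, 0.4])
    print("along-plan component:", along)
    print("orthogonal offset:", offset)
```

In this toy setting the two components never interfere: changing the along-plan scalar leaves the orthogonal offset untouched, which is the sense in which such a decomposition separates "how fast" from "where".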

Similar Articles

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models

Closed-Loop Neural Activation Control in Vision-Language-Action Models

VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies
