Robotic Policy Adaptation via Weight-Space Meta-Learning
Summary
Introduces WIZARD, a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies from language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.
View Cached Full Text
Cached at: 06/10/26, 12:08 AM
Paper page - Robotic Policy Adaptation via Weight-Space Meta-Learning
Source: https://huggingface.co/papers/2606.07217
Abstract
WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.
Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We propose WIZARD, aweight-space meta-learningframework that sidesteps task-specific fine-tuning by generating task-specificLoRA parametersfor afrozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. Duringmeta-training, WIZARD learns to maptask evidencedirectly toexpert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda, WIZARD consistently improves over a real-domain adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.
View arXiv pageView PDFProject pageGitHub0Add to collection
Get this paper in your agent:
hf papers read 2606\.07217
Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash
Models citing this paper0
No model linking this paper
Cite arxiv.org/abs/2606.07217 in a model README.md to link it from this page.
Datasets citing this paper0
No dataset linking this paper
Cite arxiv.org/abs/2606.07217 in a dataset README.md to link it from this page.
Spaces citing this paper0
No Space linking this paper
Cite arxiv.org/abs/2606.07217 in a Space README.md to link it from this page.
Collections including this paper0
No Collection including this paper
Add this paper to acollectionto link it from this page.
Similar Articles
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving state-of-the-art success rates with up to 24x lower latency than pixel-space world action models.
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.
Policy and World Modeling Co-Training for Language Agents
This paper introduces PaW, a co-training framework that adds auxiliary world modeling supervision to policy learning during on-policy RL rollouts, improving language agent training without additional computational overhead.
Video2LoRA: Parametric Video Internalization for Vision-Language Models
This paper introduces Video2LoRA, a method that predicts Low-Rank Adaptation (LoRA) weights directly from video representations, enabling efficient video processing in frozen vision-language models. It reduces visual token load by up to 1500x and query TTFT by 6-80x while maintaining performance on video summarization and captioning benchmarks.
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
This paper proposes WORC, a weak-link optimization framework for multi-agent LLM systems that identifies and reinforces underperforming agents through meta-learning-based weight prediction and uncertainty-driven resource allocation, achieving 82.2% accuracy on reasoning benchmarks while improving system stability.