PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Hugging Face Daily Papers

Summary

PANDO is a web agent framework that improves efficiency through online skill distillation, reducing token usage by 58-61% while outperforming baselines on VisualWebArena tasks.

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raises a central question: can a web agent become more efficient as it accumulates experience, rather than more expensive? We first analyze trajectories from VisualWebArena and identify three recurring sources of inefficiency: repeat-action loops, hidden discovery costs, and low prompt-cache reuse. We then introduce PANDO, a single-rollout online skill-distillation framework that maintains a structured Skill Library and combines progress reflection, confidence-based skill demotion, hierarchical routing, visual compression, and cache-aware prompting. On the full set of 910 VisualWebArena tasks, PANDO achieves a 58.3% success rate, outperforming SGV (54.0%) and our WALT reproduction (45.2%), while using 58% fewer tokens than SGV and 61% fewer tokens than WALT, without any pre-evaluation discovery budget. A 300-task ablation further shows that rules and routines provide most of the success gains, while routing, compression, and cache-aware prompting convert the larger skill library into lower marginal token cost. Finally, we introduce three trajectory-level efficiency metrics -- Action Repetition Rate, Step Overhead Ratio, and Prompt Cache Utilization -- to make efficiency visible beyond terminal success.
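
The abstract names the three trajectory-level metrics but does not define them here. The following is a minimal sketch under assumed definitions: Action Repetition Rate as the share of steps that repeat the immediately preceding action, Step Overhead Ratio as the extra steps relative to a reference step count, and Prompt Cache Utilization as the share of prompt tokens served from a cache. The `Step` record, the function names, and the `reference_steps` parameter are hypothetical and may differ from the paper's formulations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent step (hypothetical record, not the paper's schema)."""
    action: str          # serialized action, e.g. "click(3)"
    prompt_tokens: int   # prompt tokens sent at this step
    cached_tokens: int   # prompt tokens served from a prefix cache


def action_repetition_rate(steps: List[Step]) -> float:
    """Assumed definition: fraction of transitions that repeat the previous action."""
    if len(steps) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(steps, steps[1:]) if prev.action == cur.action)
    return repeats / (len(steps) - 1)


def step_overhead_ratio(steps: List[Step], reference_steps: int) -> float:
    """Assumed definition: extra steps taken, relative to a reference step count."""
    if reference_steps <= 0:
        raise ValueError("reference_steps must be positive")
    return (len(steps) - reference_steps) / reference_steps


def prompt_cache_utilization(steps: List[Step]) -> float:
    """Assumed definition: cached prompt tokens divided by all prompt tokens."""
    total = sum(s.prompt_tokens for s in steps)
    if total == 0:
        return 0.0
    return sum(s.cached_tokens for s in steps) / total


# Toy trajectory with one repeated click; all numbers are made up for illustration.
trajectory = [
    Step("click(3)", 1000, 0),
    Step("click(3)", 1200, 800),
    Step("type(5, 'shoes')", 1300, 1000),
    Step("stop()", 1500, 1200),
]

print(action_repetition_rate(trajectory))   # 1 repeat over 3 transitions
print(step_overhead_ratio(trajectory, 2))   # 4 steps against a 2-step reference
print(prompt_cache_utilization(trajectory)) # 3000 cached of 5000 prompt tokens
```

Under these assumptions, two agents with the same terminal success can still differ on all three numbers, which is the sense in which such metrics make efficiency visible beyond success rate.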

Cached at: 05/29/26, 11:04 PM

Source: https://huggingface.co/papers/2605.24785

Abstract

PANDO is a web agent framework that improves efficiency through experience accumulation by reducing redundant actions, optimizing skill discovery, and enhancing prompt caching without sacrificing performance.
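
One common way to improve prompt-cache reuse, sketched here as an illustration and not as PANDO's actual implementation, is to order prompt segments from most stable to most volatile so that consecutive steps share the longest possible prefix. The sketch assumes a cache that reuses only the leading part of a prompt that matches an earlier request. The segment layout and the `build_prompt` and `shared_prefix_len` helpers are hypothetical, and the comparison is done per segment, not per token.

```python
from typing import List, Sequence


def build_prompt(system: str, skills: Sequence[str],
                 history: Sequence[str], observation: str) -> List[str]:
    """Order segments from stable to volatile (illustrative layout, an assumption).

    Skills are sorted so the same set always serializes identically,
    history is append-only, and the fresh observation goes last.
    """
    return [system, *sorted(skills), *history, observation]


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count leading segments that are identical in both prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Two consecutive steps: same skills (given in a different order), one new action.
step1 = build_prompt("SYS", ["search", "add_to_cart"], [], "obs1")
step2 = build_prompt("SYS", ["add_to_cart", "search"], ["click(3)"], "obs2")

# The system text and both skills line up; the prompts diverge at the history.
print(shared_prefix_len(step1, step2))  # 3
```

Placing the observation first instead would make the two prompts diverge at the first segment, leaving nothing for a prefix cache to reuse.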

Get this paper in your agent:

hf papers read 2605.24785

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
