web-agents

#web-agents

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

arXiv cs.CL · 6d ago

Introduces Ko-WideSearch, a Korean breadth-search benchmark for web agents that evaluates exhaustive set enumeration across 228 tables. Findings show agents have high item recall but struggle with row completion, especially for open-ended cells.

@dair_ai: If you build web agents, this one is worth your time. It's on how to make agent skills reusable. (bookmark it) LLM web …

X AI KOLs Following · 2026-06-18

This paper introduces SkillMigrator, an LLM web agent that learns reusable skills and transfers them across websites by matching layout structure rather than domain-specific metadata, reducing LLM action count by 8-10% on WebArena and Mind2Web benchmarks.

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

arXiv cs.AI · 2026-06-17

This paper introduces SkillMigrator, an agent that learns reusable web skills as transferable interaction patterns (TIPs) and transfers them across websites by matching layout structure, reducing LLM action counts by 8-10% on benchmarks.

@rsalakhu: Congrats to the @browser_use team for taking the #1 spot on Odysseys, a highly challenging benchmark for long-horizon w…

X AI KOLs Following · 2026-06-16

The browser_use team achieved the #1 spot on the Odysseys benchmark, a challenging evaluation for long-horizon web agents, outperforming models like Opus 4.6 and GPT-5.4.

Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

arXiv cs.CL · 2026-06-16

This paper investigates whether online skill and memory modules for web agents are worth their token cost under a fixed inference budget, finding that a budget-matched vanilla baseline often matches or outperforms augmented methods across three domains and models.

Benchmarking Web Agent Safety under E-commerce Deceptive Interfaces

arXiv cs.CL · 2026-06-15

This paper introduces WebDecept, a framework for injecting deceptive interface patterns into web environments to evaluate the safety of autonomous web agents. Experiments show current agents are highly susceptible to such manipulations, highlighting safety challenges for real-world deployment.

Signal-Driven Observation for Long-Horizon Web Agents

arXiv cs.CL · 2026-06-08

The paper proposes Signal-Driven Observation (SDO), a method for web agents to avoid context degradation by only reading task-relevant parts of the DOM and re-invoking observation only when triggered by specific signals, rather than reading the full page state at every action step.

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

arXiv cs.LG · 2026-06-05

AsyncWebRL introduces an asynchronous multi-step reinforcement learning system for vision-language web agents, achieving up to a 2.9x training speedup and setting a new state of the art on WebGym. It also replaces per-trajectory normalization with a constant to reduce the inefficiency caused by varying trajectory lengths.

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Hugging Face Daily Papers · 2026-06-05

SlimSearcher is a framework that improves efficiency in deep research agents by combining Pareto-efficient trajectory filtering and adaptive reward shaping, reducing tool-call rounds by 17-58% while maintaining accuracy on benchmarks like GAIA, BrowseComp, and XBenchDeepSearch.

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

arXiv cs.AI · 2026-06-04

This paper proposes SGDR (State-Grounded Dynamic Retrieval), an online skill learning method for web agents that enables stepwise, state-aware skill reuse rather than static task-level retrieval. Experiments on WebArena show SGDR achieves a 37.5% success rate with GPT-4.1, a relative gain of about 10.6% over strong baselines.

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

arXiv cs.AI · 2026-06-01

Proposes SCALE, a framework for self-improving web agents that uses cognitive-aware exploration with three adversarial roles and a graph exploration strategy. Also introduces SCALE-20k, a large-scale dataset built from real websites, and reports significant improvements for MLLM-based web agents.

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Hugging Face Daily Papers · 2026-06-01

OpenWebRL presents an open framework for training visual web agents using online multi-turn reinforcement learning on real websites, achieving state-of-the-art performance with minimal initial supervision. Its 4B-parameter model outperforms prior open agents and competes with proprietary systems such as OpenAI CUA and Gemini CUA.

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

arXiv cs.AI · 2026-05-29

This paper introduces GTA, a scalable framework for automatically generating long-horizon, multi-hop web agent tasks with executable trajectories, addressing the lack of process-level supervision in web agent benchmarks. The framework integrates crawling, retrieval-based seeding, and automated quality control to produce realistic tasks across multiple websites.

@googledevs: Modern Web Guidance + Chrome DevTools for agents = A powerful new workflow. Matthias Rohmer takes you inside the #Googl…

X AI KOLs Following · 2026-05-26

At Google I/O, Google demonstrated a new workflow that pairs Chrome DevTools with AI agents. It includes APIs such as WebMCP and HTML-in-Canvas, and aims to make it easy for developers to expose web page functionality to AI agents while preserving semantics, accessibility, and security boundaries.

DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning

arXiv cs.AI · 2026-05-26

DRIVE proposes a dual-level skill modeling framework that separates reasoning knowledge from interaction knowledge for web agents under continual learning, achieving a 52.8% task success rate on WebArena, outperforming the skill-free baseline by 7.3 percentage points.

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

arXiv cs.LG · 2026-05-21

Weasel is a trajectory selection method for offline training of web agents that improves out-of-domain generalization by balancing importance and diversity. It achieves up to 12.5x training speedups and improved performance across several benchmarks.

Skim: Speculative Execution for Fast and Efficient Web Agents

arXiv cs.AI · 2026-05-19

Skim is a speculative execution framework that reduces cost and latency for web agents by combining offline site-structure profiling with online selection of fast paths, achieving a 1.9x reduction in per-task cost and a 33.4% latency reduction while maintaining accuracy.

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents

arXiv cs.AI · 2026-05-18

ShopGym is a framework that converts live e-commerce storefronts into self-contained sandbox shops for realistic, controllable, and reproducible benchmarking of web agents, with synthetic tasks across seven skill categories.

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

arXiv cs.AI · 2026-05-15

SimPersona learns discrete buyer personas from raw clickstreams using a VQ-VAE and maps them to persona tokens for LLM-based web agents, achieving high conversion-rate alignment across many live storefronts.

WebHarbor - We "dock" the real websites into local for web agents! [R]

Reddit r/MachineLearning · 2026-05-14

WebHarbor packages 15 real websites (Amazon, GitHub, BBC, etc.) as self-contained Flask+SQLite apps in a single Docker image with sub-second reset, designed for reproducible web agent evaluation and training. The project invites community contributions to expand to 100+ sites, with co-authorship opportunities.
