web-agents

#web-agents

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

Hugging Face Daily Papers · 2026-05-12

This paper introduces LongMemEval-V2, a benchmark for evaluating long-term memory systems in web agents, along with two memory methods: AgentRunbook-R and AgentRunbook-C.

@AdinaYakup: Qwen released WebWorld an open world model series for web agents 8B/14B/32B+Dataset Apache2.0 +9.9% MiniWob++, +10.9% W…

X AI KOLs Following · 2026-05-11

Qwen released WebWorld, an open world model series for web agents (8B/14B/32B) with an accompanying dataset under Apache 2.0. The announcement reports a +9.9% gain on MiniWob++ and improved performance on WebArena.

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

arXiv cs.AI · 2026-05-11

Apple Research introduces Weblica, a framework for creating scalable and reproducible training environments for visual web agents using HTTP caching and LLM-based synthesis.

Region4Web: Rethinking Observation Space Granularity for Web Agents

arXiv cs.CL · 2026-05-11

This paper introduces Region4Web, a framework that organizes a web agent's observation space into functional regions rather than individual elements. The authors report that this shortens observations and raises task success rates on the WebArena benchmark.

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

Hugging Face Daily Papers · 2026-04-08

This paper introduces WebStep, a benchmark and framework for process-level evaluation of web agents using semantic state tracking. It exposes performance differences and localizes errors that terminal success metrics alone do not capture.

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Papers with Code Trending · 2025-07-20

WebShaper is a formalization-driven framework that synthesizes information-seeking datasets using set theory and Knowledge Projections. Among open-source agents, it achieves state-of-the-art performance on the GAIA and WebWalkerQA benchmarks.
