interactive-agents

#interactive-agents

Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents

arXiv cs.AI · 2026-06-09

Proposes Online Agent-as-a-Judge, an evaluation framework in which an evaluator agent placed inside the environment actively generates situations for testing interactive social agents, improving coverage and reliability over passive evaluation methods.

#interactive-agents

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

Hugging Face Daily Papers · 2026-05-13

The paper proposes the Map-then-Act Paradigm (MAP), a plug-and-play framework for interactive LLM agents that moves environmental understanding ahead of execution, so the agent builds an understanding of its environment before it acts. MAP achieves consistent gains across benchmarks and lets frontier models rise above their near-zero baseline performance in 22 of 25 game environments.