MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning
Summary
The paper proposes the Map-then-Act Paradigm (MAP), a plug-and-play framework that moves environmental understanding ahead of execution in interactive LLM agents, achieving consistent gains across benchmarks and enabling frontier models to surpass near-zero baseline performance in 22 of 25 game environments.
Source: https://huggingface.co/papers/2605.13037
Abstract
Interactive LLM agents suffer from delayed environmental perception and epistemic bottlenecks due to reactive understanding during execution, which the proposed Map-then-Act Paradigm (MAP) addresses by acquiring environmental knowledge beforehand through global exploration, task-specific mapping, and knowledge-augmented execution.
Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial-and-error, resulting in an Epistemic Bottleneck that traps them in inefficient failure cycles. Inspired by human affordance perception and cognitive map theory, we propose the Map-then-Act Paradigm (MAP), a plug-and-play framework that shifts environment understanding before execution. MAP consists of three stages: (1) Global Exploration, acquiring environment-general priors; (2) Task-Specific Mapping, constructing a structured cognitive map; and (3) Knowledge-Augmented Execution, solving tasks grounded on the map. Experiments show consistent gains across benchmarks and LLMs. On ARC-AGI-3, MAP enables frontier models to surpass near-zero baseline performance in 22 of 25 game environments. We further introduce MAP-2K, a dataset of map-then-act trajectories, and show that training on it outperforms expert execution traces, suggesting that understanding environments is more fundamental than imitation.
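The three stages described in the abstract can be sketched as a minimal control loop. This is a hypothetical illustration, not the paper's implementation: the function names mirror the stage names, but their bodies are stand-ins for LLM-driven exploration, map construction, and planning.

```python
# Hypothetical sketch of the Map-then-Act loop (stage names from the paper;
# all function bodies are illustrative stand-ins, not the authors' code).

def global_exploration(env_actions):
    """Stage 1: probe each available action once to gather
    environment-general priors before any task execution."""
    return {a: f"observed effect of {a}" for a in env_actions}

def task_specific_mapping(priors, task):
    """Stage 2: distill the priors into a structured cognitive map
    relevant to the given task."""
    return {"task": task, "affordances": sorted(priors)}

def knowledge_augmented_execution(cognitive_map):
    """Stage 3: plan actions grounded on the map rather than
    discovering constraints by trial-and-error."""
    return [f"use {a}" for a in cognitive_map["affordances"]]

def map_then_act(env_actions, task):
    priors = global_exploration(env_actions)
    cmap = task_specific_mapping(priors, task)
    return knowledge_augmented_execution(cmap)

plan = map_then_act(["push", "pull"], "open door")
```

The key ordering constraint is that stages 1 and 2 finish before any task-directed action is taken, which is what distinguishes map-then-act from reactive stepwise planning.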
Get this paper in your agent:
hf papers read 2605.13037
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
TMAS introduces a multi-agent framework that enhances large language model reasoning by scaling test-time compute through structured collaboration and hierarchical memory systems. The approach uses specialized agents, cross-trajectory information flow, and hybrid reward reinforcement learning to improve iterative scaling and stability on challenging reasoning benchmarks.
AIPO: Learning to Reason from Active Interaction
This paper introduces AIPO, a reinforcement learning framework that enhances LLM reasoning by allowing the model to actively consult collaborative agents during exploration to overcome capability boundaries.
Agentick: A Unified Benchmark for General Sequential Decision-Making Agents
This paper introduces Agentick, a unified benchmark for evaluating general sequential decision-making agents across RL, LLM, and VLM paradigms. It provides 37 procedurally generated tasks and reveals that no single approach currently dominates, highlighting significant room for improvement in agent autonomy.
Learning Agentic Policy from Action Guidance
The paper proposes ActGuide-RL, a method for training agentic policies in LLMs by using human action data as guidance to overcome exploration barriers in reinforcement learning without extensive supervised fine-tuning.
Tools as Continuous Flow for Evolving Agentic Reasoning
This paper introduces FlowAgent, a novel framework that reconceptualizes tool chaining as continuous trajectory generation using conditional flow matching to improve robustness in long-horizon agentic reasoning.