Bridging the Agent-World Gap: Text World Models for LLM-based Agents

Hugging Face Daily Papers

Summary

This paper systematically reviews text world models for LLM-based agents, covering foundations, construction paradigms, applications in planning and training, and evaluation methods.

Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive, mapping observations to actions without an explicit model of how these environments are structured and evolve. This motivates text world models (TWMs): transition models over textual states that, given a state and a candidate action, predict the resulting webpage, terminal output, API response, or user reply, thereby supporting planning, efficient learning, and principled evaluation. We systematically review text world models for LLM-based agents, organized around a formal framework and the agent lifecycle: (1) Foundations, defining text world models and characterizing them by state representation and grounding domain; (2) Construction, taxonomizing LLM-as-WM and code-as-WM paradigms and reviewing methods for building them; (3) Application, examining how world models support agents at training time through experience synthesis and at inference time through planning, verification, and adaptation; and (4) Evaluation, covering both evaluation of the world model itself and its use as an evaluation environment for agents. We aim to consolidate this rapidly developing area, clarify its design space, and highlight open challenges for future research.
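
To make the transition-model view concrete, the following is a minimal sketch, not the survey's own method. It assumes a toy textual state of the form "files: a.txt b.txt" and two shell-like actions (touch, rm). The names ToyShellWorldModel, predict, and plan_one_step are hypothetical and introduced only for illustration. The model is hand-written, in the spirit of the code-as-WM paradigm, and the planner performs a one-step lookahead by querying the model instead of acting in the real environment.

```python
from typing import Callable, List, Optional


def parse_files(state: str) -> List[str]:
    # Assumed toy state format: "files: a.txt b.txt"
    _, _, rest = state.partition("files:")
    return rest.split()


def render(files) -> str:
    # Render the file set back into the textual state format.
    return ("files: " + " ".join(sorted(files))).strip()


class ToyShellWorldModel:
    """Hand-written transition model over textual states (illustrative only)."""

    def predict(self, state: str, action: str) -> str:
        # Given a textual state and a candidate action, predict the next state.
        files = set(parse_files(state))
        cmd, _, arg = action.partition(" ")
        if cmd == "touch" and arg:
            files.add(arg)
        elif cmd == "rm" and arg:
            # Simplification: a missing file is silently ignored here.
            files.discard(arg)
        # Unknown actions leave the state unchanged.
        return render(files)


def plan_one_step(
    wm: ToyShellWorldModel,
    state: str,
    candidates: List[str],
    goal: Callable[[str], bool],
) -> Optional[str]:
    # Simulate each candidate with the world model and return the first
    # action whose predicted next state satisfies the goal.
    for action in candidates:
        if goal(wm.predict(state, action)):
            return action
    return None


if __name__ == "__main__":
    wm = ToyShellWorldModel()
    chosen = plan_one_step(
        wm,
        "files: a.txt",
        ["rm a.txt", "touch b.txt"],
        goal=lambda s: "b.txt" in s,
    )
    print(chosen)
```

In an LLM-as-WM setting, predict would instead prompt a language model with the state and action and return its generated next observation; the planning loop around it could stay the same.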

Cached at: 06/10/26, 05:45 AM


Source: https://huggingface.co/papers/2606.09032

Abstract

Text world models serve as transition models for LLM-based agents in interactive environments, enabling planning and efficient learning by predicting environmental changes from textual states and actions.


