Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

Hugging Face Daily Papers, 04/20/26, 12:00 AM

Summary

This paper proposes a method to train LLM agents with an intrinsic meta-evolution capability, enabling spontaneous self-improvement without external rewards at inference time. Applied to Qwen3-30B and Seed-OSS-36B, the approach yields a 20% performance increase on the WebVoyager and WebWalker web navigation benchmarks, and the self-generated world knowledge lets a compact 14B Qwen3 model outperform the unassisted Gemini-2.5-Flash.

Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about unseen environments prior to task execution. To instill this ability, we design an outcome-based reward mechanism that measures how much an agent's self-generated world knowledge improves its success rate on downstream tasks. This reward signal is used exclusively during the training phase to teach the model how to explore and summarize effectively. At inference time, the agent requires no external rewards or human instructions. It spontaneously performs native self-evolution to adapt to unknown environments using its internal parameters. When applied to Qwen3-30B and Seed-OSS-36B, this shift to native evolution yields a 20% performance increase on WebVoyager and WebWalker. Most strikingly, the generated world knowledge even enables a compact 14B Qwen3 model to outperform the unassisted Gemini-2.5-Flash, establishing a new paradigm for truly evolving agents.
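The abstract describes the training reward only in words: it "measures how much an agent's self-generated world knowledge improves its success rate on downstream tasks." A minimal sketch, assuming that reward is the success rate with the knowledge minus the success rate without it on the same task batch. The names `run_task`, `success_rate`, `knowledge_reward`, and `toy_runner` are hypothetical illustrations, not the authors' code, and the toy runner replaces real agent rollouts.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: run_task(task, knowledge) returns True on success.
# `knowledge` is the agent's self-generated world-knowledge text, or None for
# the unassisted baseline run.
TaskRunner = Callable[[str, Optional[str]], bool]


def success_rate(tasks: Sequence[str], run_task: TaskRunner,
                 knowledge: Optional[str]) -> float:
    """Fraction of downstream tasks solved under one knowledge context."""
    if not tasks:
        raise ValueError("need at least one downstream task")
    return sum(run_task(t, knowledge) for t in tasks) / len(tasks)


def knowledge_reward(tasks: Sequence[str], run_task: TaskRunner,
                     knowledge: str) -> float:
    """ASSUMED reward form: gain in success rate from the knowledge over the
    no-knowledge baseline. Computed during training only, never at inference."""
    return (success_rate(tasks, run_task, knowledge)
            - success_rate(tasks, run_task, None))


def toy_runner(task: str, knowledge: Optional[str]) -> bool:
    """Toy stand-in for an agent rollout: a task succeeds only if the
    knowledge text mentions the page that task needs."""
    return knowledge is not None and task in knowledge


tasks = ["login", "search", "checkout", "profile"]
print(knowledge_reward(tasks, toy_runner, "site map: login, search"))  # 0.5
```

Real rollouts are stochastic, so a training setup would likely average several runs per task before taking the difference. Subtracting the no-knowledge baseline rewards only the improvement the knowledge causes, which is one plausible reading of the abstract rather than a confirmed detail of the paper.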


Cached at: 04/21/26, 07:20 AM

Source: https://huggingface.co/papers/2604.18131

Abstract

Agents equipped with intrinsic meta-evolution capabilities demonstrate improved performance on web navigation tasks through self-generated world knowledge without external supervision.
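At inference time, the described behavior is an explore-then-act loop: the agent first explores an unseen environment without any reward, summarizes what it found into world knowledge, and then solves the task with that knowledge in context. The toy sketch below illustrates only that shape, under loud assumptions: the website is a hypothetical dict `SITE`, breadth-first search stands in for the trained LLM's exploration policy, and `explore`, `summarize`, and `solve` are illustrative names, not components from the paper.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy website: page -> pages reachable with one click.
SITE: Dict[str, List[str]] = {
    "home": ["docs", "blog"],
    "docs": ["api", "install"],
    "blog": ["post-1"],
    "api": [], "install": [], "post-1": [],
}


def explore(start: str, budget: int) -> Dict[str, List[str]]:
    """Reward-free exploration: visit up to `budget` pages breadth-first and
    record the links each exposes. BFS is a stand-in for the LLM's policy."""
    seen: Dict[str, List[str]] = {}
    frontier = deque([start])
    while frontier and len(seen) < budget:
        page = frontier.popleft()
        if page in seen:
            continue
        seen[page] = SITE[page]
        frontier.extend(SITE[page])
    return seen


def summarize(observations: Dict[str, List[str]]) -> str:
    """Compress observations into a plain-text world-knowledge note."""
    return "\n".join(f"{page} -> {', '.join(links) or '(no links)'}"
                     for page, links in observations.items())


def solve(start: str, goal: str, knowledge: str) -> List[str]:
    """Task phase: plan a click path to `goal` using only the note."""
    graph: Dict[str, List[str]] = {}
    for line in knowledge.splitlines():
        page, links = line.split(" -> ")
        graph[page] = [] if links == "(no links)" else links.split(", ")
    paths = deque([[start]])
    visited = {start}  # guards against cycles on real sites
    while paths:
        path = paths.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                paths.append(path + [nxt])
    return []  # goal not covered by the gathered knowledge


notes = summarize(explore("home", budget=4))
print(solve("home", "api", notes))  # ['home', 'docs', 'api']
```

No reward is consulted anywhere in this loop; the knowledge note is the only thing carried from exploration into task execution. In this toy, a goal outside the explored region simply yields an empty path.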

Get this paper in your agent:

hf papers read 2604.18131

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

OpenSkill: Open-World Self-Evolution for LLM Agents

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
