EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

Hugging Face Daily Papers 06/09/26, 05:57 PM Papers

Summary

EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization, achieving significant improvements over existing methods across multiple benchmarks.

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, which limits their practical applicability: real-world applications require models to handle heterogeneous input streams drawn from multiple datasets, domains, and task distributions. To mitigate cross-dataset interference, EEVEE introduces a router that partitions incoming inputs into task clusters and assigns each cluster to a suitable prompt configuration. The router and the prompts are optimized jointly via a router-prompt co-evolution strategy, which interleaves router learning and prompt learning phases to address their mutual dependency. Experiments across multiple datasets demonstrate that the framework improves robustness under heterogeneous data streams while maintaining single-benchmark learning capability and efficiency. Specifically, EEVEE improves average multi-benchmark scores by 10.38 and 24.32 points over Qwen3-4B-Instruct and DeepSeek-V3.2, respectively, surpassing the SOTA methods GEPA and ACE by up to 37.2% and 48.2%, respectively.
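The abstract gives no implementation details, so the following is only a toy sketch of the general idea of a router that assigns inputs to task clusters, with router and prompt updates interleaved. Everything in it is an assumption made for illustration: the names `KeywordRouter`, `revise_prompt`, `score`, and `co_evolve` are hypothetical, the keyword-overlap routing stands in for whatever router EEVEE actually learns, and the two helper functions stand in for LLM calls. It should not be read as the authors' algorithm.

```python
from collections import defaultdict


class KeywordRouter:
    """Toy router (hypothetical): picks the cluster whose keyword set overlaps the input most."""

    def __init__(self, keywords):
        self.keywords = {c: set(ws) for c, ws in keywords.items()}

    def route(self, text):
        tokens = set(text.lower().split())
        # Iterate clusters in sorted order so ties resolve deterministically.
        return max(sorted(self.keywords), key=lambda c: len(tokens & self.keywords[c]))

    def update(self, text, cluster):
        # Router learning step: absorb the input's tokens into the chosen cluster.
        self.keywords[cluster] |= set(text.lower().split())


def revise_prompt(prompt, inputs):
    # Stand-in for an LLM-driven prompt rewrite: remember the cluster's vocabulary.
    vocab = set(prompt.split())
    for text in inputs:
        vocab |= set(text.lower().split())
    return " ".join(sorted(vocab))


def score(prompt, text):
    # Stand-in for task feedback: token overlap between prompt and input.
    return len(set(prompt.split()) & set(text.lower().split()))


def co_evolve(stream, router, prompts, rounds=2):
    for _ in range(rounds):
        # Prompt phase (router frozen): revise each cluster's prompt on its routed inputs.
        buckets = defaultdict(list)
        for text in stream:
            buckets[router.route(text)].append(text)
        for cluster, texts in buckets.items():
            prompts[cluster] = revise_prompt(prompts[cluster], texts)
        # Router phase (prompts frozen): move each input toward its best-scoring prompt.
        for text in stream:
            best = max(sorted(prompts), key=lambda c: score(prompts[c], text))
            router.update(text, best)
    return router, prompts
```

A possible usage, again with made-up data: start from `KeywordRouter({"math": {"solve"}, "lang": {"translate"}})` and empty prompts for both clusters, then call `co_evolve` on a small mixed stream of "solve ..." and "translate ..." inputs. The alternation is the point of the sketch: each phase holds one component fixed while the other is updated, which is one simple way to handle the mutual dependency the abstract mentions.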

Original Article

Cached at: 06/10/26, 05:44 AM

Source: https://huggingface.co/papers/2606.11182

Get this paper in your agent:

hf papers read 2606.11182

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
