EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
Summary
EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization, achieving significant improvements over existing methods across multiple benchmarks.
Cached at: 06/10/26, 05:44 AM
Source: https://huggingface.co/papers/2606.11182
Abstract
In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle heterogeneous input streams drawn from multiple datasets, domains, and task distributions, limiting their practical applicability. To mitigate cross-dataset interference, EEVEE introduces a router that partitions incoming inputs into task clusters and assigns them to suitable prompt configurations. This design is optimized via a router-prompt co-evolution strategy, which employs interleaved router and prompt learning phases to address their mutual dependency. Experiments across multiple datasets demonstrate that the framework improves robustness under heterogeneous data streams while maintaining single-benchmark learning capability and efficiency. Specifically, EEVEE improves average multi-benchmark scores by 10.38 and 24.32 points over Qwen3-4B-Instruct and DeepSeek-V3.2, surpassing SOTA methods GEPA and ACE by up to 37.2% and 48.2%.
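The interleaved router and prompt learning phases described in the abstract can be illustrated with a toy alternating loop. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not say how the router or the prompts are actually updated. The function `co_evolve`, the keyword-based `toy_score`, the fixed pool of candidate prompts, and the round-robin initialization are all hypothetical stand-ins chosen to keep the example deterministic.

```python
from typing import Callable


def co_evolve(
    inputs: list[str],
    candidate_prompts: list[str],
    score: Callable[[str, str], float],
    n_clusters: int = 2,
    n_rounds: int = 3,
) -> tuple[dict[str, int], list[str]]:
    """Alternate between a prompt phase and a router phase (toy illustration)."""
    # Hypothetical starting point: round-robin routing, same prompt everywhere.
    route = {x: i % n_clusters for i, x in enumerate(inputs)}
    prompts = [candidate_prompts[0]] * n_clusters

    for _ in range(n_rounds):
        # Prompt phase: hold the router fixed and pick, for each cluster,
        # the candidate prompt with the highest total score on its members.
        for c in range(n_clusters):
            members = [x for x in inputs if route[x] == c]
            if members:
                prompts[c] = max(
                    candidate_prompts,
                    key=lambda p: sum(score(p, x) for x in members),
                )
        # Router phase: hold the prompts fixed and send each input to the
        # cluster whose current prompt scores best on that input.
        route = {
            x: max(range(n_clusters), key=lambda c: score(prompts[c], x))
            for x in inputs
        }
    return route, prompts


def toy_score(prompt: str, task: str) -> float:
    """Assumed scorer: 1.0 when the prompt mentions the task's domain tag."""
    domain = task.split(":")[0]
    return 1.0 if domain in prompt.lower() else 0.0


# A mixed stream from two made-up "datasets" (math and code).
stream = [
    "math: 2+2", "code: sort a list", "math: 3*3",
    "math: 5-1", "code: reverse a string", "code: join words",
]
candidates = ["Solve the math problem step by step.", "Write Python code."]
route, prompts = co_evolve(stream, candidates, toy_score)
```

In this toy run the two phases depend on each other in the way the abstract describes: the prompt chosen for a cluster depends on which inputs the router sent there, and the routing decision depends on the prompts currently assigned. A real system would replace `toy_score` with task feedback from the agent and would generate or edit prompts rather than select from a fixed pool.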
Similar Articles
Environment-Grounded Automated Prompt Optimization for LLM Game Agents
Introduces an automated prompt optimization framework for LLM game agents that decomposes the observation-to-action pipeline into two agents and iteratively refines prompts via an evolutionary loop guided by environment returns. Evaluated on BabyAI tasks, it significantly improves success rates (e.g., from 0% to 72.5% on PutNext) without updating model weights.
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
EvoTest introduces J-TTL, a benchmark for measuring agent test-time learning capabilities, and proposes an evolutionary framework where an Actor Agent plays games while an Evolver Agent iteratively improves the system's prompts, memory, and hyperparameters without fine-tuning. The method demonstrates superior performance compared to reflection and memory-based baselines on complex text-based games.
EVE-Agent: Evidence-Verifiable Self-Evolving Agents
EVE-Agent introduces a framework for self-evolving search agents that ensure evidence verifiability by generating questions, answers, and evidence spans, and training on marginal accuracy gain of evidence. This improves grounded correctness without human annotations.
PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
The paper introduces PACEvolve++, a reinforcement learning framework that improves test-time policy adaptation for evolutionary search agents by decoupling hypothesis generation from execution.
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
This paper proposes a method to train LLM agents with intrinsic meta-evolution capabilities, enabling spontaneous self-improvement without external rewards at inference time. Applied to Qwen3-30B and Seed-OSS-36B, the approach yields a 20% performance boost on web navigation benchmarks, with a 14B model outperforming Gemini-2.5-Flash.