PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Hugging Face Daily Papers

Summary

The paper introduces PACEvolve++, a reinforcement learning framework that improves test-time policy adaptation for evolutionary search agents by decoupling hypothesis generation from execution.

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.

Cached at: 05/12/26, 02:50 AM

Source: https://huggingface.co/papers/2605.07039

Abstract

PACEvolve++ enables adaptive policy selection in evolutionary search through a reinforcement learning framework that decouples hypothesis generation from execution while adapting optimization strategies across evolutionary phases.

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
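The phase-adaptive training signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so the function names, the standardized (GRPO-style) form of the group-relative feedback, the "credit only the group's best sample" reading of best-of-k frontier contribution, and the reward-spread threshold used to switch phases are all assumptions made for illustration.

```python
import statistics
from typing import List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Early phase (assumed form): standardize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def best_of_k_frontier_credit(rewards: List[float], frontier_best: float) -> List[float]:
    """Late phase (assumed form): only the best of the k samples earns credit,
    equal to how far it pushes past the best score found so far."""
    top = max(range(len(rewards)), key=rewards.__getitem__)
    gain = max(0.0, rewards[top] - frontier_best)
    return [gain if i == top else 0.0 for i in range(len(rewards))]


def phase_adaptive_signal(
    rewards: List[float],
    frontier_best: float,
    spread_threshold: float = 0.05,  # hypothetical switching rule, not from the paper
) -> Tuple[str, List[float]]:
    """Pick the training signal based on how compressed the reward gaps are."""
    spread = max(rewards) - min(rewards)
    if spread >= spread_threshold:
        return "group_relative", group_relative_advantages(rewards)
    return "best_of_k", best_of_k_frontier_credit(rewards, frontier_best)


if __name__ == "__main__":
    # Early evolution: hypotheses differ widely in reward.
    print(phase_adaptive_signal([0.2, 0.5, 0.8], frontier_best=0.5))
    # Late evolution: rewards cluster near the frontier.
    print(phase_adaptive_signal([0.90, 0.91, 0.93], frontier_best=0.92))
```

Under these assumptions, the standardized signal would carry little information once rewards cluster tightly, since small differences are dominated by noise after normalization. Crediting only a sample that improves on the current best keeps the late-phase update tied to actual frontier progress.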

Get this paper in your agent:

hf papers read 2605.07039

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

arXiv cs.CL

EvoTest introduces J-TTL, a benchmark for measuring agent test-time learning capabilities, and proposes an evolutionary framework where an Actor Agent plays games while an Evolver Agent iteratively improves the system's prompts, memory, and hyperparameters without fine-tuning. The method demonstrates superior performance compared to reflection and memory-based baselines on complex text-based games.