PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Hugging Face Daily Papers 05/07/26, 12:00 AM Papers

Summary

The paper introduces PACEvolve++, a reinforcement learning framework that improves test-time policy adaptation for evolutionary search agents by decoupling hypothesis generation from execution.

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
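The phase-adaptive training signal described above can be illustrated with a minimal sketch. The abstract does not give formulas, so everything here is an assumption: the function names, the GRPO-style normalization used for "group-relative feedback", the reading of "best-of-k frontier contribution" as crediting only the best sample in a group by its improvement over the current best score, and the reward-spread threshold used to switch phases.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Early phase (assumed form): center and scale each reward by its group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def best_of_k_frontier_advantages(rewards, frontier_best):
    """Late phase (assumed form): credit only the group's best sample,
    by how much it improves on the best score found so far."""
    top = max(range(len(rewards)), key=rewards.__getitem__)
    gain = max(0.0, rewards[top] - frontier_best)
    return [gain if i == top else 0.0 for i in range(len(rewards))]


def phase_adaptive_advantages(rewards, frontier_best, gap_threshold=0.05):
    """Hypothetical switch: use group-relative feedback while rewards are spread out,
    and best-of-k frontier contribution once the spread compresses."""
    spread = max(rewards) - min(rewards)
    if spread > gap_threshold:
        return group_relative_advantages(rewards)
    return best_of_k_frontier_advantages(rewards, frontier_best)
```

With widely spread rewards such as `[0.1, 0.5, 0.9]` the sketch returns normalized group-relative values; with compressed rewards such as `[0.90, 0.91, 0.92]` and a current best of `0.905`, only the last sample receives credit. The paper may define the phase switch and both signals differently.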

Original Article

Cached at: 05/12/26, 02:50 AM

Source: https://huggingface.co/papers/2605.07039

Abstract

PACEvolve++ enables adaptive policy selection in evolutionary search through a reinforcement learning framework that decouples hypothesis generation from execution while adapting optimization strategies across evolutionary phases.

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-k frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
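The division of labor between advisor and frontier model can be sketched as a simple search loop. This is an illustrative outline only, not the authors' implementation: `evolve` and its four callables are hypothetical names, the advisor's assess-and-select step is reduced to picking the highest-scoring hypothesis, and the reinforcement learning update of the advisor is omitted.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def evolve(
    propose: Callable[[History], List[str]],   # advisor: generate hypotheses
    score_hypothesis: Callable[[str], float],  # advisor: assess a hypothesis
    implement: Callable[[str], str],           # frontier model: hypothesis -> candidate
    evaluate: Callable[[str], float],          # task evaluator (the expensive step)
    steps: int,
) -> History:
    """Run `steps` rounds of advisor-guided search and return (candidate, score) pairs."""
    history: History = []
    for _ in range(steps):
        hypotheses = propose(history)
        # Advisor selects one hypothesis; here simply the one it scores highest.
        chosen = max(hypotheses, key=score_hypothesis)
        # The stronger model turns the selected hypothesis into an executable candidate.
        candidate = implement(chosen)
        history.append((candidate, evaluate(candidate)))
    return history
```

In this arrangement only the advisor's callables would be trained at test time, while `implement` stays fixed, which matches the decoupling the abstract describes.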

Get this paper in your agent:

hf papers read 2605.07039

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

PACE: Two-Timescale Self-Evolution for Small Language Model Agents

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery
