What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search

Hugging Face Daily Papers

Summary

Large-scale study of 15 LLMs across 8 tasks reveals that optimization success hinges on maintaining localized search trajectories rather than initial problem-solving ability or solution novelty.

Recent work has demonstrated the promise of orchestrating large language models (LLMs) within evolutionary and agentic optimization systems. However, the mechanisms driving these optimization gains remain poorly understood. In this work, we present a large-scale study of LLM-guided evolutionary search, collecting optimization trajectories for 15 LLMs across 8 tasks. Although zero-shot problem-solving ability correlates with final optimization outcomes, it explains only part of the variance: models with similar initial capability often induce dramatically different search trajectories and outcomes. By analyzing these trajectories, we find that strong LLM optimizers behave as local refiners, producing frequent incremental improvements while progressively localizing the search in semantic space. Conversely, weaker optimizers exhibit large semantic drift, with sporadic breakthroughs followed by stagnation. Notably, various measures of solution novelty do not predict final performance; novelty is beneficial only when the search remains sufficiently localized around high-performing regions of the solution space. Our results highlight the importance of trajectory analysis for understanding and improving LLM-based optimization systems and provide actionable insights for their design and training.
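The abstract's trajectory metrics (step-to-step semantic drift and how localized the search stays) can be made concrete with a small sketch. The paper does not give its exact definitions, so the metrics below are illustrative assumptions: drift as the mean cosine distance between consecutive solution embeddings, and localization as the spread of recent solutions around their centroid.

```python
import numpy as np

def semantic_drift(embeddings):
    """Mean step-to-step cosine distance along a search trajectory.

    `embeddings` is a (T, d) array of solution embeddings in generation
    order. The embedding model and this particular drift measure are
    illustrative assumptions, not the paper's definitions.
    """
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    # cosine distance between each solution and its successor
    step_dist = 1.0 - np.sum(E[:-1] * E[1:], axis=1)
    return float(step_dist.mean())

def localization(embeddings, window=5):
    """Spread of the most recent `window` solutions around their centroid.

    Smaller values mean the search has localized; larger values mean it
    is still wandering through semantic space.
    """
    E = np.asarray(embeddings, dtype=float)[-window:]
    centroid = E.mean(axis=0)
    return float(np.linalg.norm(E - centroid, axis=1).mean())
```

Under these definitions, a strong "local refiner" would show low drift and shrinking localization over generations, while a weak optimizer would show high drift with no sustained localization.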

Cached at: 04/22/26, 02:41 PM

Paper page - What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search

Source: https://huggingface.co/papers/2604.19440

Abstract

LLM-guided evolutionary search shows that optimization success depends on search trajectory characteristics rather than initial problem-solving ability alone, with strong optimizers refining locally while weak ones show semantic drift.



Get this paper in your agent:

hf papers read 2604.19440

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

Evaluating LLMs as Human Surrogates in Controlled Experiments

arXiv cs.CL

This paper evaluates whether off-the-shelf LLMs can reliably simulate human responses in controlled behavioral experiments by comparing LLM-generated data with human survey responses on accuracy perception. The findings show that while LLMs capture directional effects and aggregate belief-updating patterns, they do not consistently match human-scale effect magnitudes, clarifying when synthetic LLM data can serve as behavioral proxies.

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

Hugging Face Daily Papers

Researchers introduce BEHEMOTH benchmark and CluE cluster-based prompt optimization to enable LLMs to extract and retain heterogeneous memory across diverse tasks, achieving 9% gains over prior self-evolving frameworks.