What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search
Summary
Large-scale study of 15 LLMs across 8 tasks reveals that optimization success hinges on maintaining localized search trajectories rather than initial problem-solving ability or solution novelty.
Cached at: 04/22/26, 02:41 PM
Source: https://huggingface.co/papers/2604.19440
Abstract
LLM-guided evolutionary search shows that optimization success depends on search trajectory characteristics rather than initial problem-solving ability alone, with strong optimizers refining locally while weak ones show semantic drift.
Recent work has demonstrated the promise of orchestrating large language models (LLMs) within evolutionary and agentic optimization systems. However, the mechanisms driving these optimization gains remain poorly understood. In this work, we present a large-scale study of LLM-guided evolutionary search, collecting optimization trajectories for 15 LLMs across 8 tasks. Although zero-shot problem-solving ability correlates with final optimization outcomes, it explains only part of the variance: models with similar initial capability often induce dramatically different search trajectories and outcomes. By analyzing these trajectories, we find that strong LLM optimizers behave as local refiners, producing frequent incremental improvements while progressively localizing the search in semantic space. Conversely, weaker optimizers exhibit large semantic drift, with sporadic breakthroughs followed by stagnation. Notably, various measures of solution novelty do not predict final performance; novelty is beneficial only when the search remains sufficiently localized around high-performing regions of the solution space. Our results highlight the importance of trajectory analysis for understanding and improving LLM-based optimization systems and provide actionable insights for their design and training.
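To make the trajectory vocabulary in the abstract concrete, here is a minimal sketch of how one might summarize a single search trajectory. It is an illustration under assumptions, not the paper's method: the function names (`cosine_distance`, `trajectory_stats`), the use of cosine distance between consecutive solution embeddings as a stand-in for "semantic drift", and the early-versus-late spread as a stand-in for "localization" are all hypothetical choices. The embeddings are assumed to come from some text or code embedding model that is not shown here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; treat a zero vector as orthogonal (1.0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def spread(points):
    """Mean cosine distance of each point to the centroid of the group."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(cosine_distance(p, centroid) for p in points) / len(points)


def trajectory_stats(embeddings, scores):
    """Summarize one trajectory (hypothetical metrics, higher score = better).

    embeddings: one vector per candidate, in the order they were generated.
    scores:     the fitness of each candidate, same order.
    Requires at least four candidates so both halves have two points.
    """
    # Proxy for semantic drift: how far each candidate moves from the previous one.
    steps = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]

    # Proxy for "frequent incremental improvements": share of steps that
    # strictly beat the best score seen so far.
    best = scores[0]
    improvements = 0
    for s in scores[1:]:
        if s > best:
            improvements += 1
            best = s

    # Proxy for progressive localization: compare spread early vs. late.
    half = len(embeddings) // 2
    return {
        "mean_step_distance": sum(steps) / len(steps),
        "improvement_rate": improvements / (len(scores) - 1),
        "early_spread": spread(embeddings[:half]),
        "late_spread": spread(embeddings[half:]),
    }


if __name__ == "__main__":
    # Toy 2-D "embeddings", invented purely for illustration.
    refiner = trajectory_stats(
        [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.97, 0.03]], [1, 2, 3, 4]
    )
    drifter = trajectory_stats(
        [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [1, 5, 5, 5]
    )
    print("refiner:", refiner)
    print("drifter:", drifter)
```

Under these assumed proxies, a local refiner would show small step distances, a high improvement rate, and a late spread smaller than its early spread, while a drifting optimizer would show large steps and an early breakthrough followed by no further gains.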
Similar Articles
Local LLM Inference Optimization: The Complete Guide
A comprehensive guide to optimizing local LLM inference on consumer hardware, covering tools like llama.cpp, vLLM, and LM Studio, with practical advice on memory hierarchy, layer placement, and common failure modes.
Large Language Models as Optimizers: A Survey of Direct vs. Tool-Augmented Approaches and Their Performance Frontiers
This survey categorizes LLM-based optimization into three paradigms—direct, tool-augmented, and tool-creating—and reviews their performance frontiers and limitations.
EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics
EvoMD-LLM reformulates reactive molecular dynamics trajectories as symbolic temporal sequences, enabling LLMs to model species evolution over time through fine-tuning and temporal scaffolding, achieving up to 66.14% accuracy and interpretable predictions.
LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design
The paper introduces LEAPBench, a 55-task framework for trajectory-level evaluation of LLMs in iterative scientific design, revealing that outcome-based scoring misses efficiency gains and that domain-agnostic prompting can outperform domain-aware prompting in matching published best designs.
@Kevin_GuoweiXu: How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produ…
Introduces BES (Bidirectional Evolutionary Search), a search framework for LLMs that combines forward candidate evolution with backward goal decomposition to improve sampling on hard reasoning problems during post-training and inference.