SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration

arXiv cs.CL Papers

Summary

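Introduces SPO (Stochastic Prompt Optimization), a stochastic search framework for automatic prompt optimization, and compares three strategies: error-informed random search, a genetic algorithm, and SAGE, an agent-guided multi-agent pipeline with diagnostic code execution. No single strategy dominates across the three benchmarks. Deployed on a mental-health chatbot under continuous optimization, SAGE compounds eight A/B-test cycles into a statistically robust gain in next-day retention.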
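
To make the black-box framing concrete, here is a minimal sketch of the simplest kind of strategy, an error-informed random search. It is an illustration, not the paper's implementation: `random_search`, `toy_run`, `toy_mutate`, and the toy data are all hypothetical names and values. In a real setting `run` would call an LLM and `mutate` would ask an LLM to rewrite the prompt after reading the failing examples.

```python
import random

# Toy task (hypothetical): each pair is (input text, expected output).
DATA = [(" hi ", "HI"), ("ok", "OK"), (" a", "A")]


def toy_run(prompt, x):
    """Stand-in for an LLM call: behavior depends only on keywords in the prompt."""
    out = x
    if "Trim" in prompt:
        out = out.strip()
    if "uppercase" in prompt:
        out = out.upper()
    return out


def toy_mutate(prompt, errors, rng):
    """Error-informed proposal: inspect one failing example, append one hint."""
    if not errors:
        return prompt
    x, y = rng.choice(errors)
    hints = ["Be concise."]  # an uninformative hint, so some proposals fail
    if y.isupper():
        hints.append("Answer in uppercase.")
    if x != x.strip():
        hints.append("Trim whitespace.")
    return prompt + " " + rng.choice(hints)


def random_search(seed_prompt, run, mutate, data, steps=50, rng_seed=0):
    """Greedy stochastic search: keep a candidate only if it scores higher."""
    rng = random.Random(rng_seed)

    def evaluate(prompt):
        errors = [(x, y) for x, y in data if run(prompt, x) != y]
        return 1 - len(errors) / len(data), errors

    best = seed_prompt
    best_score, best_errors = evaluate(best)
    for _ in range(steps):
        candidate = mutate(best, best_errors, rng)
        score, errors = evaluate(candidate)
        if score > best_score:  # no gradient is used, only the score
            best, best_score, best_errors = candidate, score, errors
    return best, best_score


best_prompt, best_score = random_search(
    "Answer the question.", toy_run, toy_mutate, DATA
)
print(best_prompt, best_score)
```

The optimizer only ever sees scores and failing examples, which is what treating prompt optimization as black-box search amounts to. A genetic algorithm or an agentic pipeline would replace the single `mutate` proposal with a richer candidate generator.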

arXiv:2606.18902v1 Announce Type: new

# SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration
Source: [https://arxiv.org/abs/2606.18902](https://arxiv.org/abs/2606.18902)
[View PDF](https://arxiv.org/pdf/2606.18902)

> Abstract: Context engineering has emerged as a primary lever for improving AI systems without parameter updates. Recent work showing that textual gradients do not function as real gradients motivates treating automatic prompt optimization (APO) as black-box search. We introduce SPO (Stochastic Prompt Optimization), a framework for stochastic search over prompt space, and compare three strategies of increasing sophistication: error-informed random search, a genetic algorithm with evolutionary operators, and SAGE (SPO via Agent-Guided Exploration), a multi-agent pipeline with diagnostic code execution. Across three benchmarks, no single strategy dominates; effectiveness depends on the interaction of landscape structure with error type. We further deploy SAGE on a mental-health chatbot under a continuous optimization paradigm, where it compounds eight cycles of individually-noisy A/B tests into a statistically robust gain in next-day retention. We argue that coupling qualitative diagnosis with quantitative validation is what makes agentic optimization effective for open-ended task-oriented dialogue.
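
The abstract does not say how the eight A/B cycles are combined statistically. As one possible illustration of how individually inconclusive tests can compound into a robust result, the sketch below applies a two-proportion z-test per cycle and then Stouffer's method across cycles. The choice of Stouffer's method is an assumption, and the retention counts are synthetic placeholders, not figures from the paper.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(ret_a, n_a, ret_b, n_b):
    """z-statistic for retention rate of variant B minus control A."""
    p_a, p_b = ret_a / n_a, ret_b / n_b
    pooled = (ret_a + ret_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def one_sided_p(z):
    """P(Z >= z) under the standard normal null."""
    return 1 - NormalDist().cdf(z)


def stouffer(zs):
    """Combine independent z-scores with equal weights."""
    return sum(zs) / sqrt(len(zs))


# SYNTHETIC data (assumption): eight cycles, each with
# (retained_control, n_control, retained_variant, n_variant).
cycles = [(300, 1000, 330, 1000)] * 8

zs = [two_proportion_z(*c) for c in cycles]
per_cycle_p = [one_sided_p(z) for z in zs]
combined_z = stouffer(zs)
combined_p = one_sided_p(combined_z)

print("per-cycle p-values:", [round(p, 3) for p in per_cycle_p])
print("combined z:", round(combined_z, 2), "combined p:", combined_p)
```

With these placeholder counts, no single cycle clears a one-sided 0.05 threshold, yet the combined statistic does so comfortably, because the combined z-score grows with the square root of the number of cycles when the effect is consistent.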

## Submission history

From: Ziyi Zhu \[[view email](https://arxiv.org/show-email/99e3891a/2606.18902)\] **\[v1\]** Wed, 17 Jun 2026 10:25:25 UTC (739 KB)

Similar Articles

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

arXiv cs.CL

SePO (Self-Evolving Prompt Optimization) proposes a self-referential prompt agent that optimizes both task agents' system prompts and its own system prompt through an evolutionary search, outperforming Manual-CoT, TextGrad, and MetaSPO across five benchmarks including AIME'25, ARC-AGI-1, and GPQA.

Self-Supervised Prompt Optimization

Papers with Code Trending

This paper introduces Self-Supervised Prompt Optimization (SPO), a framework that optimizes prompts for LLMs without external references by using output comparisons, significantly reducing costs and data requirements.

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

arXiv cs.CL

Introduces an automated prompt optimization framework for LLM game agents that decomposes the observation-to-action pipeline into two agents and iteratively refines prompts via an evolutionary loop guided by environment returns. Evaluated on BabyAI tasks, it significantly improves success rates (e.g., from 0% to 72.5% on PutNext) without updating model weights.

SPEAR: Code-Augmented Agentic Prompt Optimization

arXiv cs.CL

SPEAR is a code-augmented agentic prompt optimizer that uses a Python sandbox for structural error analysis, achieving state-of-the-art performance on multiple LLM evaluation suites including industrial judge tasks, BBH, and GSM8K.