SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration
Summary
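Introduces SPO, a stochastic search framework for automatic prompt optimization, and compares three strategies: error-informed random search, a genetic algorithm, and SAGE, a multi-agent pipeline with diagnostic code execution. Across three benchmarks no single strategy dominates. Deployed on a mental-health chatbot under continuous optimization, SAGE compounds eight cycles of A/B tests into a statistically robust gain in next-day retention.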
# SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration

Source: [https://arxiv.org/abs/2606.18902](https://arxiv.org/abs/2606.18902)

[View PDF](https://arxiv.org/pdf/2606.18902)

> Abstract: Context engineering has emerged as a primary lever for improving AI systems without parameter updates. Recent work showing that textual gradients do not function as real gradients motivates treating automatic prompt optimization (APO) as black-box search. We introduce SPO (Stochastic Prompt Optimization), a framework for stochastic search over prompt space, and compare three strategies of increasing sophistication: error-informed random search, a genetic algorithm with evolutionary operators, and SAGE (SPO via Agent-Guided Exploration), a multi-agent pipeline with diagnostic code execution. Across three benchmarks, no single strategy dominates; effectiveness depends on the interaction of landscape structure with error type. We further deploy SAGE on a mental-health chatbot under a continuous optimization paradigm, where it compounds eight cycles of individually-noisy A/B tests into a statistically robust gain in next-day retention. We argue that coupling qualitative diagnosis with quantitative validation is what makes agentic optimization effective for open-ended task-oriented dialogue.

## Submission history

From: Ziyi Zhu [[view email](https://arxiv.org/show-email/99e3891a/2606.18902)]

**[v1]** Wed, 17 Jun 2026 10:25:25 UTC (739 KB)
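The abstract frames automatic prompt optimization as black-box search, with error-informed random search as the simplest of the three strategies. The following is a minimal sketch of that general idea, not the paper's implementation. Every name here (`error_informed_random_search`, `edits_by_error`, `toy_evaluate`) is hypothetical, and the toy scorer stands in for a real benchmark evaluation that would call an LLM.

```python
import random
from typing import Callable, Dict, List, Tuple

# (score, error counts by type); an assumed evaluation interface for illustration.
Evaluation = Tuple[float, Dict[str, int]]


def error_informed_random_search(
    seed_prompt: str,
    edits_by_error: Dict[str, List[str]],
    evaluate: Callable[[str], Evaluation],
    n_iters: int = 50,
    seed: int = 0,
) -> Tuple[str, float]:
    """Greedy black-box search: sample an edit aimed at an observed error
    type and keep it only if the measured score strictly improves."""
    rng = random.Random(seed)
    best_prompt = seed_prompt
    best_score, errors = evaluate(best_prompt)
    for _ in range(n_iters):
        # Only error types that still occur and have candidate edits.
        open_errors = [e for e, n in errors.items() if n > 0 and edits_by_error.get(e)]
        if not open_errors:
            break
        # Sample an error type in proportion to how often it occurs.
        weights = [errors[e] for e in open_errors]
        error_type = rng.choices(open_errors, weights=weights, k=1)[0]
        edit = rng.choice(edits_by_error[error_type])
        candidate = best_prompt + "\n" + edit
        score, candidate_errors = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score, errors = candidate, score, candidate_errors
    return best_prompt, best_score


def toy_evaluate(prompt: str) -> Evaluation:
    """Hypothetical scorer: counts two made-up error types out of 10 cases."""
    errors = {
        "format": 0 if "JSON" in prompt else 3,
        "arithmetic": 0 if "step by step" in prompt else 5,
    }
    return 1.0 - sum(errors.values()) / 10.0, errors


edits = {
    "format": ["Reply in JSON.", "Be brief."],
    "arithmetic": ["Work step by step.", "Double-check your answer."],
}
best_prompt, best_score = error_informed_random_search(
    "Answer the question.", edits, toy_evaluate
)
print(best_prompt)
print(best_score)
```

The search never inspects the model's internals: it only proposes text edits and reads back a score, which is what makes it black-box. In a real setting the evaluation is noisy, so the strict-improvement acceptance rule would likely need repeated measurements or a statistical test before a candidate is kept.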
Similar Articles
SePO: Self-Evolving Prompt Agent for System Prompt Optimization
SePO (Self-Evolving Prompt Optimization) proposes a self-referential prompt agent that optimizes both task agents' system prompts and its own system prompt through an evolutionary search, outperforming Manual-CoT, TextGrad, and MetaSPO across five benchmarks including AIME'25, ARC-AGI-1, and GPQA.
Self-Supervised Prompt Optimization
This paper introduces Self-Supervised Prompt Optimization (SPO), a framework that optimizes prompts for LLMs without external references by using output comparisons, significantly reducing costs and data requirements.
Environment-Grounded Automated Prompt Optimization for LLM Game Agents
Introduces an automated prompt optimization framework for LLM game agents that decomposes the observation-to-action pipeline into two agents and iteratively refines prompts via an evolutionary loop guided by environment returns. Evaluated on BabyAI tasks, it significantly improves success rates (e.g., from 0% to 72.5% on PutNext) without updating model weights.
SocraticPO: Policy Optimization via Interactive Guidance
SocraticPO augments RL rollouts with Socratic-style natural language guidance and reward decay to improve scientific reasoning in LLMs, outperforming strong baselines on SciKnowEval benchmarks.
SPEAR: Code-Augmented Agentic Prompt Optimization
SPEAR is a code-augmented agentic prompt optimizer that uses a Python sandbox for structural error analysis, achieving state-of-the-art performance on multiple LLM evaluation suites including industrial judge tasks, BBH, and GSM8K.