bbh


SPEAR: Code-Augmented Agentic Prompt Optimization

arXiv cs.CL · 2026-05-27

SPEAR is a code-augmented agentic prompt optimizer that uses a Python sandbox for structural error analysis, achieving state-of-the-art performance on multiple LLM evaluation suites including industrial judge tasks, BBH, and GSM8K.
