LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Summary
This paper introduces AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies for LLMs by formulating it as controller synthesis. It demonstrates improved accuracy-cost tradeoffs on mathematical reasoning benchmarks with minimal computational overhead.
Source: https://huggingface.co/papers/2605.08083
Published on May 8
#2 Paper of the day
Abstract
AutoTTS automates test-time scaling strategy discovery by formulating it as controller synthesis over reasoning trajectories and probe signals, achieving improved accuracy-cost tradeoffs with minimal computational overhead.
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable and fine-grained execution trace feedback to improve discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery costs only $39.9 and 160 minutes. Our data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.
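To make the replay idea concrete, here is a minimal Python sketch of evaluating a controller against cached trajectories. It is an illustration under stated assumptions, not the AutoTTS implementation: the `Trajectory` format, the `majority_stop_controller` heuristic, the `agree_frac` parameter, and the toy data are all hypothetical. The sketch models only branch, continue, probe, and stop; pruning and probe cost are omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One pre-collected reasoning branch (hypothetical format).

    probe_answers[i] is the answer a probe would return after step i + 1.
    tokens_per_step is the cost charged for each replayed step.
    """
    probe_answers: list
    tokens_per_step: int = 100


def majority_stop_controller(pool, width, agree_frac):
    """Replay `width` branches in lockstep and stop once enough probes agree.

    Returns (answer, tokens_spent). No LLM is called: every decision reads
    only from the cached trajectories in `pool`.
    """
    active = pool[:width]                          # branch: open `width` branches
    depth = max(len(t.probe_answers) for t in active)
    tokens = 0
    answer = None
    for step in range(depth):                      # continue: advance one step
        votes = []
        for t in active:
            if step < len(t.probe_answers):
                tokens += t.tokens_per_step        # pay only for replayed steps
                votes.append(t.probe_answers[step])    # probe the branch
            else:
                votes.append(t.probe_answers[-1])  # finished branch keeps its last answer
        answer, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= agree_frac:       # stop: early exit on agreement
            break
    return answer, tokens


def evaluate(dataset, controller, **params):
    """Return (accuracy, mean tokens per problem) for a controller."""
    correct = 0
    total_tokens = 0
    for pool, gold in dataset:
        answer, tokens = controller(pool, **params)
        correct += int(answer == gold)
        total_tokens += tokens
    n = len(dataset)
    return correct / n, total_tokens / n


# Toy cached data: two problems, three branches each, three steps per branch.
dataset = [
    ([Trajectory(["7", "12", "12"]),
      Trajectory(["12", "12", "12"]),
      Trajectory(["5", "12", "12"])], "12"),
    ([Trajectory(["3", "3", "3"]),
      Trajectory(["3", "3", "3"]),
      Trajectory(["4", "4", "3"])], "3"),
]

for frac in (1.0, 0.6):
    acc, cost = evaluate(dataset, majority_stop_controller, width=3, agree_frac=frac)
    print(f"agree_frac={frac}: accuracy={acc:.2f}, mean tokens={cost:.0f}")
```

Because every candidate controller is scored by replaying the same cached data, a search procedure can compare many settings (here, two values of `agree_frac`) at the cost of a few list lookups rather than fresh model generations.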
Similar Articles
Agentic Test-Time Scaling (GitHub Repo)
AutoTTS is an open-source tool that uses agentic discovery to automatically find optimal test-time scaling strategies for LLMs, significantly reducing token usage and cost through replay-based evaluation.
LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling
This paper proposes a metacognitive harness that separates monitoring from reasoning in LLMs, using pre-solve feeling-of-knowing and post-solve judgment-of-learning signals to control when to trust, retry, or aggregate answers, improving accuracy on text, code, and multimodal benchmarks without parameter updates.
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive
This paper introduces AutoLLMResearch, an agentic framework that automates the configuration of expensive LLM experiments by learning from low-fidelity environments and extrapolating to high-cost settings. It aims to reduce computational waste and reliance on expert intuition in scalable LLM research.
@ihtesham2005: If you still think AI agents can't do real research, this paper will end that argument. Researchers from Google and Met…
Researchers from Google and Meta propose AutoTTS, a framework using AI agents to automatically discover and refine test-time scaling strategies for LLMs without human intervention. The agent successfully identified complex, coordinated reasoning mechanisms that outperformed manual baselines at a low computational cost.
Researchers let AI Agents Optimize LLM Reasoning and Cut Tokens by 70%
Researchers developed AutoTTS, a framework where AI agents automatically design control policies to optimize LLM inference, cutting token consumption by approximately 70% while maintaining high reasoning accuracy.