Tag
ReM-MoA introduces a memory-augmented Mixture-of-Agents (MoA) framework that sustains scaling through ranked reasoning memory and curated, diversified memory routing, outperforming prior MoA variants on five reasoning benchmarks.
Sakana AI introduces AB-MCTS (Adaptive Branching Monte Carlo Tree Search), an inference-time scaling algorithm that lets multiple frontier models (Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) cooperate; the combination significantly outperforms each individual model on the ARC-AGI-2 benchmark.
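The summary does not say how the search decides which model to query next. A common bandit rule for that kind of choice is Thompson sampling, sketched below purely as an assumption: the function name `pick_model` and the per-model win/loss bookkeeping are hypothetical and not taken from Sakana's implementation.

```python
import random


def pick_model(stats: dict[str, tuple[int, int]], rng: random.Random) -> str:
    """Thompson sampling over models (hypothetical helper, not Sakana's code).

    `stats` maps a model name to (wins, losses) observed so far. For each model we
    draw a plausible success rate from Beta(1 + wins, 1 + losses) and query the
    model with the highest draw.
    """
    draws = {name: rng.betavariate(1 + w, 1 + l) for name, (w, l) in stats.items()}
    return max(draws, key=draws.get)
```

After each call, the chosen model's counts would be updated according to whether its answer scored well, so models that keep producing good candidates get queried more often while weaker ones still receive occasional exploratory calls.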
This paper formulates LLM inference budget allocation as a constrained optimization problem and proposes CLEAR, which reallocates compute from low-utility queries to queries near emergence thresholds, achieving up to a 3× accuracy improvement under tight budgets.
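To make the reallocation idea concrete, here is a minimal sketch that is not CLEAR itself. It assumes each query has a known per-sample success probability `p`, models a query's accuracy after `b` samples as the coverage 1 - (1 - p)^b, and hands out unit samples greedily by marginal gain. Because that utility is concave in `b`, greedy allocation is optimal for this simplified separable model. `coverage` and `greedy_allocate` are hypothetical names.

```python
import heapq


def coverage(p: float, b: int) -> float:
    """Assumed utility: chance that at least one of b independent samples succeeds."""
    return 1.0 - (1.0 - p) ** b


def greedy_allocate(probs: list[float], budget: int) -> list[int]:
    """Give out `budget` unit samples, each to the query with the largest marginal gain."""
    alloc = [0] * len(probs)
    # heapq is a min-heap, so store negative marginal gains to pop the largest gain first.
    heap = [(-coverage(p, 1), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    for _ in range(budget):
        neg_gain, i = heapq.heappop(heap)
        if neg_gain >= 0:  # best remaining gain is zero: no query benefits from more samples
            break
        alloc[i] += 1
        b = alloc[i]
        next_gain = coverage(probs[i], b + 1) - coverage(probs[i], b)
        heapq.heappush(heap, (-next_gain, i))
    return alloc
```

For example, `greedy_allocate([0.0, 0.3, 0.99], 6)` returns `[0, 5, 1]`: the hopeless query and the nearly solved one receive little, and the mid-difficulty query absorbs most of the budget.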
This paper introduces a method that predicts the gains from best-of-N inference scaling using cheap statistics gathered in a single sampling pass over a labeled validation set. A compact predictor built on three core features reaches a Spearman correlation of ρ = 0.90 with the actual gains, so configurations can be screened before expensive reward-model scoring.
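The paper's three features are not named in this summary. As a related, well-known statistic that a single sampling pass already provides, the sketch below uses the standard unbiased pass@k estimator (n samples per question, c of them correct) to compute an upper bound on best-of-N gain, assuming an oracle verifier. A real reward model selects less reliably, so realized gains would typically be smaller. `oracle_bon_gain` is a hypothetical helper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct) from n samples, c correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def oracle_bon_gain(counts: list[tuple[int, int]], k: int) -> float:
    """Mean gain of oracle best-of-k over a single sample, across (n, c) pairs per question."""
    gains = [pass_at_k(n, c, k) - pass_at_k(n, c, 1) for n, c in counts]
    return sum(gains) / len(gains)
```

For a question with 1 correct answer out of 4 samples, `pass_at_k(4, 1, 2)` is 0.5 versus 0.25 for a single sample, so `oracle_bon_gain([(4, 1)], 2)` is 0.25.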
A new paper shows that a weak model proposing k = 8 candidates, combined with a critic-comparator selection loop, can match frontier-model performance on SWE-bench Verified, reaching 76.4% accuracy. The key insight is that a correct patch is often already among the weak model's top-k candidates; the challenge is selecting it, which execution-based verification makes effective.
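The generate-then-select pattern can be sketched as below. This is an illustration under assumptions, not the paper's loop: `passes_tests` and `prefer` are hypothetical hooks standing in for an execution harness and an LLM critic-comparator, and surviving patches are compared one pair at a time, king-of-the-hill style.

```python
from typing import Callable, Optional


def select_patch(
    candidates: list[str],
    passes_tests: Callable[[str], bool],
    prefer: Callable[[str, str], str],
) -> Optional[str]:
    """Pick one patch from k proposals (hypothetical helper).

    Step 1: execution filter keeps only candidates that pass the available tests.
    Step 2: a pairwise comparator keeps whichever of (current best, challenger) it prefers.
    Returns None when no candidate survives execution.
    """
    survivors = [c for c in candidates if passes_tests(c)]
    if not survivors:
        return None
    best = survivors[0]
    for challenger in survivors[1:]:
        best = prefer(best, challenger)  # comparator returns the preferred patch
    return best
```

Filtering by execution first keeps the comparator, which is the expensive and error-prone step, focused on the few candidates that actually run correctly.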
This paper studies verifier-backed committee search as a form of inference-time boosting for reasoning language models, showing that a committee of weak reasoning models can match much stronger models on code-repair tasks such as SWE-bench Verified.
This paper introduces Test-Time Personalization (TTP), a framework that improves LLM personalization by scaling inference-time computation through candidate sampling and reward-based selection. It diagnoses failure modes in standard reward models and proposes a probabilistic personalized reward model to mitigate them.
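The summary does not say how the probabilistic reward model's output is used at selection time. One plausible option, shown here purely as an assumption, is best-of-N selection with a pessimistic score (mean minus `beta` times the predicted standard deviation), which penalizes candidates whose reward the model is unsure about. `Scored` and `pick_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    text: str
    mean: float  # predicted reward for this user (hypothetical reward-model output)
    std: float   # predicted uncertainty of that reward


def pick_response(cands: list[Scored], beta: float = 1.0) -> str:
    """Best-of-N selection using the lower-confidence score mean - beta * std."""
    return max(cands, key=lambda s: s.mean - beta * s.std).text
```

With `beta = 1.0`, a candidate scored 0.9 ± 0.5 loses to one scored 0.7 ± 0.05; with `beta = 0.0` the rule reduces to plain reward maximization and the first candidate wins.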
This paper introduces Distributional Process Reward Models, which use conditional optimal transport to calibrate process reward models (PRMs) so that they give more accurate success-probability estimates for inference-time scaling. It demonstrates improved calibration and downstream performance on mathematical reasoning benchmarks such as MATH-500 and AIME.
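Calibration here means that predicted success probabilities match observed success rates. A standard way to measure this, independent of the paper's optimal-transport method, is the binned expected calibration error (ECE), sketched below for binary outcomes; the summary does not state whether the paper reports this exact metric.

```python
def expected_calibration_error(
    probs: list[float], outcomes: list[int], n_bins: int = 10
) -> float:
    """Binned ECE for predicted success probabilities vs. 0/1 outcomes.

    Predictions are grouped into equal-width bins on [0, 1]; each bin contributes
    |mean prediction - empirical success rate|, weighted by its share of samples.
    """
    assert probs and len(probs) == len(outcomes)
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        bins[min(int(p * n_bins), n_bins - 1)].append(i)  # p == 1.0 goes in the last bin
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        conf = sum(probs[i] for i in idx) / len(idx)
        acc = sum(outcomes[i] for i in idx) / len(idx)
        ece += len(idx) / len(probs) * abs(conf - acc)
    return ece
```

A PRM that predicts 0.25 for four steps of which exactly one succeeds scores an ECE of 0, while one that predicts 0.9 for two failing steps scores 0.9.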
RAO (Recursive Agent Optimization) is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves, turning recursive inference into a learned capability.
This paper introduces Recursive Language Models (RLMs), an inference strategy that lets LLMs process arbitrarily long prompts by treating them as an external environment and recursively calling themselves over prompt snippets. RLMs handle inputs two orders of magnitude beyond their context windows and outperform base LLMs on long-context tasks at comparable cost.
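A much simplified sketch of the recursive idea follows. Unlike the RLMs described above, where the model itself decides how to inspect and split the prompt, this version assumes a fixed rule: halve oversized context, answer each half recursively, then merge the two partial answers with one more model call. `llm` is a hypothetical callable and `max_chars` a stand-in for the context window.

```python
from typing import Callable


def recursive_answer(
    query: str,
    context: str,
    llm: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Answer `query` over a `context` that may exceed the (assumed) window size."""
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    # Base case: the prompt fits, or the context cannot be split any further.
    if len(prompt) <= max_chars or len(context) < 2:
        return llm(prompt)
    mid = len(context) // 2
    left = recursive_answer(query, context[:mid], llm, max_chars)
    right = recursive_answer(query, context[mid:], llm, max_chars)
    # Merge step: one more model call over the two partial answers.
    return llm(f"Question: {query}\nCombine these partial answers.\n1: {left}\n2: {right}")
```

The merge prompt is not length-checked here, so a practical version would also need to bound the length of partial answers; a fixed midpoint split can also cut relevant text in two, which is one reason letting the model choose its own decomposition is attractive.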