inference-time-scaling

#inference-time-scaling

ReM-MoA: Reasoning Memory Sustains Mixture-of-Agents Scaling

arXiv cs.AI · 5d ago

ReM-MoA introduces a memory-augmented Mixture-of-Agents framework that sustains scaling through ranked reasoning memory and curated diversified memory routing, outperforming prior MoA variants across five reasoning benchmarks.

Sakana Fugu (3 minute read)

TLDR AI · 2026-06-22

Sakana AI introduces AB-MCTS, an inference-time scaling algorithm that enables multiple frontier AI models (Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate, significantly outperforming individual models on the ARC-AGI-2 benchmark.

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

arXiv cs.AI · 2026-06-03

This paper formulates LLM inference budget allocation as a constrained optimization problem and proposes CLEAR, which reallocates resources from low-utility queries to those near emergence thresholds, achieving up to a 3× accuracy improvement under tight budgets.

Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv cs.CL · 2026-06-03

This paper introduces a method to predict best-of-N inference scaling gains for language models using cheap statistics from a single labeled validation-set sampling pass. A compact predictor with three core features achieves Spearman ρ=0.90 with actual gains, enabling screening of configurations before expensive reward-model scoring.
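
To make "best-of-N gain" concrete, here is a minimal sketch of one statistic that can be read off a single labeled sampling pass: the oracle best-of-k gain, computed with the standard unbiased pass@k estimator. This is not the paper's three-feature predictor, whose features are not listed in this summary. The function names and the toy counts are assumptions for illustration only.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given that c of n drawn samples were labeled correct."""
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def oracle_best_of_k_gain(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Mean pass@k minus mean pass@1 over questions.

    This is an upper bound that assumes a perfect selector; a real
    reward model would typically recover only part of it.
    """
    num_q = len(correct_counts)
    p1 = sum(c / n for c in correct_counts) / num_q
    pk = sum(pass_at_k(n, c, k) for c in correct_counts) / num_q
    return pk - p1


# Toy data (hypothetical): 3 questions, 4 samples each, with 0, 2, and 4 correct.
gain = oracle_best_of_k_gain([0, 2, 4], n=4, k=2)
```

Questions the model always or never solves contribute nothing to the gain; only the middle case, where samples disagree, leaves room for selection to help.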

@dair_ai: NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration loop hits 76.4% on SWE-bench Verified, mat…

X AI KOLs Following · 2026-05-18

A new paper shows that a weak model generating k=8 proposals, paired with a critic-comparator selection loop, can match frontier-model performance on SWE-bench Verified, reaching 76.4% accuracy. The key insight is that a correct patch is often already present among a weak model's top-k candidates; the hard part is selecting it, which the approach does using execution verification.
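
The selection idea in this summary can be illustrated with a minimal sketch. It is not the paper's critic-comparator loop: it assumes the candidates are plain Python functions and that "execution verification" simply means running each one against a few test cases. `select_by_execution`, `passes`, and the toy candidates are hypothetical names introduced here.

```python
from typing import Callable, Sequence, Tuple

Candidate = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def passes(candidate: Candidate, case: TestCase) -> bool:
    """Run one candidate on one test case; a crash counts as a failure."""
    x, expected = case
    try:
        return candidate(x) == expected
    except Exception:
        return False


def select_by_execution(
    candidates: Sequence[Candidate], tests: Sequence[TestCase]
) -> Tuple[int, int]:
    """Return (index, pass count) of the candidate passing the most tests.

    Ties go to the earliest candidate.
    """
    scores = [sum(passes(c, t) for t in tests) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, scores[best]


# Toy proposals for "square the input"; only the second one is correct.
candidates = [lambda x: x * 2, lambda x: x ** 2, lambda x: x + x]
tests = [(2, 4), (3, 9), (0, 0)]
best_index, best_score = select_by_execution(candidates, tests)
```

Here the two wrong proposals still pass two of the three tests, which is why selection quality depends on how discriminating the verification is.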

Agentic Systems as Boosting Weak Reasoning Models

arXiv cs.AI · 2026-05-15

This paper studies verifier-backed committee search as inference-time boosting for reasoning language models, showing that a committee of weak reasoning models can match the performance of much stronger models on code repair tasks like SWE-bench Verified.

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

arXiv cs.LG · 2026-05-13

This paper introduces Test-Time Personalization (TTP), a framework that improves LLM personalization by scaling inference-time computation through candidate sampling and reward-based selection. It diagnoses failure modes in standard reward models and proposes a probabilistic personalized reward model to mitigate them.

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

arXiv cs.LG · 2026-05-11

This paper introduces Distributional Process Reward Models, using conditional optimal transport to calibrate PRMs for more accurate success probability estimates in inference-time scaling. It demonstrates improved calibration and downstream performance on mathematical reasoning benchmarks like MATH-500 and AIME.

@apurvasgandhi: Sub-agents are a promising inference-time scaling primitive: • Expand an agent's working memory • Divide-and-conquer ha…

X AI KOLs Timeline · 2026-05-08

RAO (Recursive Agent Optimization) is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves, turning recursive inference into a learned capability.

Recursive Language Models

Papers with Code Trending · 2025-12-31

This paper introduces Recursive Language Models (RLMs), an inference strategy that enables LLMs to process arbitrarily long prompts by treating them as external environments and recursively calling themselves over prompt snippets. RLMs handle inputs two orders of magnitude beyond context windows and outperform base LLMs on long-context tasks with comparable cost.
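
The recursive-call idea can be sketched as a simple divide-and-conquer over the prompt text. This is a deliberately reduced illustration, not the RLM method itself: the summary describes treating the prompt as an external environment, whereas this sketch only halves the text and merges partial answers. `recursive_answer`, the `LLM` callable type, and `fake_llm` are all hypothetical stand-ins; no real model API is used.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def recursive_answer(llm: LLM, query: str, text: str, max_chars: int = 2000) -> str:
    """Answer `query` over `text`, recursing on halves when text is too long."""
    if len(text) <= max_chars:
        return llm(f"{query}\n\n{text}")
    mid = len(text) // 2
    left = recursive_answer(llm, query, text[:mid], max_chars)
    right = recursive_answer(llm, query, text[mid:], max_chars)
    # Assumes the two partial answers together fit in a single call.
    return llm(f"{query}\n\n{left}\n{right}")


def fake_llm(prompt: str) -> str:
    """Stub model for the demo: counts 'x' in raw text, sums partial counts."""
    body = prompt.split("\n\n", 1)[1]
    lines = body.split("\n")
    if all(line.isdigit() for line in lines):
        return str(sum(int(line) for line in lines))
    return str(body.count("x"))


long_text = "xa" * 50  # 100 characters, longer than the 30-character limit below
result = recursive_answer(fake_llm, "Count the letter x.", long_text, max_chars=30)
```

Each individual call sees at most `max_chars` of source text, so the total input can exceed what a single call could hold; the cost is additional calls for splitting and merging.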
