@askalphaxiv: A fascinating paper supervised by Yoshua Bengio "Generative Recursive Reasoning" Test time compute should scale not jus…


Summary

The paper 'Generative Recursive Reasoning' introduces a method that scales test-time compute by sampling multiple latent reasoning trajectories in parallel, so the model can explore diverse hypotheses instead of collapsing onto a single deterministic answer. The approach improves performance on Sudoku, ARC-AGI, N-Queens, and graph coloring, and can also generate valid Sudoku boards and MNIST digits.


Cached at: 05/21/26, 07:39 PM

A fascinating paper supervised by Yoshua Bengio

“Generative Recursive Reasoning”

Test time compute should scale not just by thinking deeper, but by thinking wider.

This paper makes recursion generative. It samples many latent reasoning trajectories, letting the model explore multiple hypotheses in parallel, so it doesn’t follow one deterministic path and collapse to a single answer.

It improves results on Sudoku, ARC-AGI, N-Queens, and graph coloring, while also generating valid Sudoku boards and MNIST digits from scratch.
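
To make the "deeper versus wider" distinction concrete, here is a minimal toy sketch in Python. It is not the paper's model: there is no neural network and no learned latent state. It only illustrates the general idea of running several stochastic refinement trajectories on N-Queens and keeping the best one. The names `conflicts`, `sample_trajectory`, and `solve_wide`, the noisy min-conflicts update, and the "fewest conflicts wins" selection rule are all assumptions made for illustration. The trajectories are also sampled in a plain loop here, standing in for parallel sampling.

```python
import random


def conflicts(cols):
    """Count attacking queen pairs; cols[r] is the column of the queen in row r."""
    n = len(cols)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if cols[a] == cols[b] or abs(cols[a] - cols[b]) == b - a
    )


def sample_trajectory(n, depth, rng):
    """One stochastic trajectory: random start, then up to `depth` noisy updates.

    `depth` plays the role of "thinking deeper" (more refinement steps).
    """
    cols = [rng.randrange(n) for _ in range(n)]
    for _ in range(depth):
        if conflicts(cols) == 0:
            break
        row = rng.randrange(n)
        # Score every column for the chosen row, then pick at random among the best.
        scores = [conflicts(cols[:row] + [c] + cols[row + 1:]) for c in range(n)]
        best = min(scores)
        cols[row] = rng.choice([c for c, s in enumerate(scores) if s == best])
    return cols


def solve_wide(n, depth, width, seed=0):
    """Sample `width` independent trajectories and keep the one with fewest conflicts.

    `width` plays the role of "thinking wider" (more sampled hypotheses).
    """
    rng = random.Random(seed)
    candidates = [sample_trajectory(n, depth, rng) for _ in range(width)]
    return min(candidates, key=conflicts)


if __name__ == "__main__":
    for width in (1, 4, 16):
        board = solve_wide(n=8, depth=40, width=width)
        print(width, board, conflicts(board))
```

Because every width shares the same seed, the wider runs include the single-trajectory run as their first candidate, so in this sketch adding width can never make the selected board worse. Selection is easy here only because N-Queens has a cheap validity check; how candidates are scored or aggregated in the actual paper is not described in this post.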

Similar Articles

Generative Recursive Reasoning

arXiv cs.AI

This paper introduces Generative Recursive reAsoning Models (GRAM), a probabilistic framework that extends recursive reasoning models by enabling stochastic latent trajectories, multiple hypotheses, and inference-time scaling through depth and parallel sampling.

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

arXiv cs.AI

This paper evaluates three approaches (pure chain-of-thought reasoning, single-shot code execution, and iterative code execution) on 1,000 GSM-Symbolic problems using Claude Haiku 4.5, finding that chain-of-thought is the most robust to perturbation, while code execution does not improve reasoning robustness on grade-school math problems.

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Hugging Face Daily Papers

TMAS introduces a multi-agent framework that enhances large language model reasoning by scaling test-time compute through structured collaboration and hierarchical memory systems. The approach uses specialized agents, cross-trajectory information flow, and hybrid reward reinforcement learning to improve iterative scaling and stability on challenging reasoning benchmarks.

Reasoning Models Don't Just Think Longer, They Move Differently

arXiv cs.CL

This paper investigates whether reasoning-trained language models simply allocate more compute (longer chains of thought) or follow qualitatively different internal trajectories, by analyzing hidden-state trajectory geometry across code, math, and SAT domains. After correcting for generation length, the authors find that reasoning-trained models exhibit distinct trajectory geometry, most clearly in code, indicating that reasoning training changes how computation unfolds, not just how much is used.