@Kevin_GuoweiXu: How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produ…

X AI KOLs Timeline 05/28/26, 03:38 PM Papers

llm search reasoning evolutionary-search post-training inference bidirectional

Summary

Introduces BES (Bidirectional Evolutionary Search), a search framework for LLMs that combines forward candidate evolution with backward goal decomposition to improve sampling on hard reasoning problems during post-training and inference.

How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer?

Best-of-N (e.g., GRPO) and tree search share two limitations:

- Verification signals are sparse
- Candidates stay within the model's own distribution

We introduce BES (Bidirectional Evolutionary Search), a search framework that couples forward candidate evolution with backward goal decomposition. Works for both post-training and inference.
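The sparse-verification limitation can be made concrete with a small numeric sketch. This is not BES or any code from the post; it only illustrates why direct rollouts give little signal on hard problems. The function names `pass_at_n` and `group_relative_advantages` are hypothetical, rollouts are assumed independent with a fixed per-rollout success probability, and rewards are assumed binary (1 for a verified answer, 0 otherwise). The advantage formula is the commonly described GRPO-style normalization (reward minus group mean, divided by group standard deviation), used here as an assumption.

```python
import statistics


def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts is correct,
    given per-rollout success probability p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** n


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style group-relative advantages (a sketch, not a reference
    implementation): center each reward on the group mean and scale by the
    group standard deviation. eps avoids division by zero."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # On a hard problem where a single rollout succeeds 0.1% of the time,
    # even 16 samples rarely contain a correct answer.
    print(f"chance of any success in 16 rollouts: {pass_at_n(0.001, 16):.4f}")

    # When every rollout in the group fails, all advantages are zero,
    # so the group contributes no learning signal.
    print(group_relative_advantages([0.0] * 8))

    # A single success is enough to separate the group.
    print(group_relative_advantages([0.0, 0.0, 0.0, 1.0]))
```

Under these assumptions, a group in which every rollout fails yields identical rewards and therefore zero advantage for every sample, which is one way to read "verification signals are sparse" in the post-training setting.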


Similar Articles

@lateinteraction: very cool work !!

X AI KOLs Timeline

Guowei Xu discusses limitations of Best-of-N and tree search methods for LLMs on hard reasoning problems, noting sparse verification signals and that candidates remain within the model's distribution.

Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs

arXiv cs.CL

Deliberate Evolution (DE) is an agentic framework that improves LLM-based symbolic regression by decoupling candidate generation from search control, using adaptive operators, structural diagnosis tools, and reflective memory to achieve better results with only 40% of the standard sample budget.

@burny_tech: A Survey on Latent Reasoning "Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, especia…

X AI KOLs Timeline

This survey provides a comprehensive overview of latent reasoning in LLMs, exploring methods that perform multi-step inference in continuous hidden states without explicit token-level supervision.

@rasbt: How can an LLM switch between low-, medium-, and high-effort reasoning? And how does an LLM learn to reason more or les…

X AI KOLs Timeline

An article explaining how LLMs can switch between low-, medium-, and high-effort reasoning during inference and training.

Counterexample Guided Learning in the Large using Reasoning Agents

arXiv cs.LG

This paper proposes using counterexample-guided learning for LLMs to perform regular-expression induction, where a verifier provides counterexamples to refine candidate expressions. The method significantly improves sample efficiency and success rates on challenging tasks, demonstrating that LLMs can benefit from structured feedback beyond treating it as additional data.
