@Kevin_GuoweiXu: How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produ…

X AI KOLs Timeline Papers

Summary

Introduces BES (Bidirectional Evolutionary Search), a search framework for LLMs that combines forward candidate evolution with backward goal decomposition to improve sampling on hard reasoning problems during post-training and inference.

How should LLMs sample on hard reasoning problems during post-training and inference, where direct rollouts rarely produce a correct answer? Best-of-N (e.g., GRPO) and tree search share two limitations: (1) verification signals are sparse, and (2) candidates stay within the model's own distribution.

We introduce BES (Bidirectional Evolutionary Search), a search framework that couples forward candidate evolution with backward goal decomposition. It works for both post-training and inference.
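
To make the sparse-signal limitation concrete, here is a minimal Python sketch. It is not BES and not taken from the post. It assumes rollouts are independent with a fixed per-rollout success probability, and it uses a simplified GRPO-style group normalization. The function names `p_any_correct` and `group_relative_advantages` are hypothetical.

```python
def p_any_correct(p: float, n: int) -> float:
    """Probability that at least one of n rollouts is correct.

    Assumption: rollouts are independent, each correct with probability p.
    """
    return 1.0 - (1.0 - p) ** n


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Simplified GRPO-style advantages: standardize rewards within one group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # On a hard problem, even many rollouts rarely contain a correct answer.
    for n in (8, 64, 512):
        print(n, round(p_any_correct(0.001, n), 4))

    # If every rollout fails, all advantages are zero: no learning signal.
    print(group_relative_advantages([0.0] * 8))

    # A single success in the group is enough to produce a nonzero signal.
    print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
```

Under these assumptions, a group in which every rollout fails yields all-zero advantages, which is one way to read "verification signals are sparse" in the post-training setting.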

Similar Articles

@lateinteraction: very cool work !!

X AI KOLs Timeline

Guowei Xu discusses the limitations of Best-of-N and tree search methods for LLMs on hard reasoning problems, noting that verification signals are sparse and that candidates remain within the model's own distribution.

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

arXiv cs.LG

This paper investigates when chain-of-thought reasoning is beneficial for LLMs, showing that early-stage entropy dynamics reliably indicate reasoning utility, and introduces EDRM, a lightweight, training-free framework that adaptively selects inference strategies to achieve significant token savings while maintaining or improving accuracy.