@Kevin_GuoweiXu: How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produ…
Summary
Introduces BES (Bidirectional Evolutionary Search), a search framework for LLMs that combines forward candidate evolution with backward goal decomposition to improve sampling on hard reasoning problems during post-training and inference.
Similar Articles
@lateinteraction: very cool work !!
Guowei Xu discusses the limitations of Best-of-N and tree search methods for LLMs on hard reasoning problems, noting that verification signals are sparse and that candidates remain within the model's distribution.
Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs
Deliberate Evolution (DE) is an agentic framework that improves LLM-based symbolic regression by decoupling candidate generation from search control, using adaptive operators, structural diagnosis tools, and reflective memory to achieve better results with only 40% of the standard sample budget.
@jiqizhixin: Awesome blog! State of RL for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
A comprehensive blog post reviewing the state of reinforcement learning for reasoning LLMs, covering methods from REINFORCE and PPO to GRPO and beyond, with connections to key models like InstructGPT and DeepSeek-R1.
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
This paper investigates when chain-of-thought reasoning is beneficial for LLMs and shows that early-stage entropy dynamics reliably indicate reasoning utility. It introduces EDRM, a lightweight, training-free framework that adaptively selects inference strategies to achieve significant token savings while maintaining or improving accuracy.
What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search
A large-scale study of 15 LLMs across 8 tasks reveals that optimization success hinges on maintaining localized search trajectories rather than on initial problem-solving ability or solution novelty.