Active-GRPO is a framework for molecular optimization that combines adaptive imitation with self-improving reasoning: it dynamically decides when to imitate references and when to reinforce the model's own discoveries. It achieves statistically significant improvements over previous methods on the TOMG-Bench-MolOpt benchmark.
This paper introduces BALAR, a training-free Bayesian agentic loop algorithm that enables large language models to actively reason and ask clarifying questions in multi-turn interactions. It demonstrates significant performance improvements over baselines on detective, puzzle, and clinical diagnosis benchmarks.