Tag
SocraticPO augments RL rollouts with Socratic-style natural language guidance and reward decay to improve scientific reasoning in LLMs, outperforming strong baselines on SciKnowEval benchmarks.