reward-decay


SocraticPO: Policy Optimization via Interactive Guidance

arXiv cs.LG · 6d ago

SocraticPO augments RL rollouts with Socratic-style natural language guidance and reward decay to improve scientific reasoning in LLMs, outperforming strong baselines on SciKnowEval benchmarks.
