GraphPO: Graph-based Policy Optimization for Reasoning Models

arXiv cs.CL · yesterday

GraphPO is a graph-based reinforcement learning framework for large reasoning models. It represents rollouts as a directed acyclic graph (DAG) and merges semantically equivalent reasoning paths, which reduces redundant exploration and improves credit assignment.
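To make the idea of merging rollouts into a DAG concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the summary above does not say how semantic equivalence is decided or how credit is assigned. The names `normalize` and `build_rollout_graph` are hypothetical. Equivalence is approximated by case- and whitespace-insensitive string matching, and each node's value is simply the mean reward of the rollouts passing through it.

```python
from collections import defaultdict


def normalize(step: str) -> str:
    # Hypothetical stand-in for a semantic-equivalence check:
    # lowercase the step and collapse runs of whitespace.
    return " ".join(step.lower().split())


def build_rollout_graph(rollouts):
    """Merge rollouts into a DAG (illustrative only).

    rollouts: list of (steps, reward) pairs; steps is a list of strings.
    A node is keyed by (depth, normalized step). Edges only go from
    depth d to depth d + 1, so the resulting graph cannot contain a cycle.
    """
    root = (0, "<root>")
    edges = defaultdict(set)     # node -> set of child nodes
    rewards = defaultdict(list)  # node -> rewards of rollouts through it
    for steps, reward in rollouts:
        prev = root
        rewards[root].append(reward)
        for depth, step in enumerate(steps, start=1):
            node = (depth, normalize(step))
            edges[prev].add(node)
            rewards[node].append(reward)
            prev = node
    # Assumed credit signal: mean reward over rollouts sharing the node.
    values = {node: sum(r) / len(r) for node, r in rewards.items()}
    return edges, values


# Toy rollouts: the first two share an opening step up to formatting,
# and the first and third converge on the same final step.
rollouts = [
    (["Let x = 2", "x + 3 = 5"], 1.0),
    (["let  x = 2", "x + 3 = 6"], 0.0),
    (["Set x to 2", "x + 3 = 5"], 1.0),
]
edges, values = build_rollout_graph(rollouts)
```

In this toy example the six reasoning steps collapse into four step nodes plus a root. The shared opening step receives a value of 0.5 because one of its two rollouts succeeded, while the final step `x + 3 = 5` is reached from two different parents, which is what makes the structure a DAG instead of a tree.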
