graph-based-methods


GraphPO: Graph-based Policy Optimization for Reasoning Models

arXiv cs.CL · yesterday

GraphPO is a graph-based reinforcement learning framework that represents rollouts as a directed acyclic graph. It merges semantically equivalent reasoning paths, which reduces redundant exploration and improves credit assignment for large reasoning models.
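To make the idea of merging rollouts into a graph concrete, here is a minimal sketch. It is not GraphPO's actual algorithm, which the summary does not specify. It assumes that two reasoning steps are "equivalent" when their whitespace- and case-normalized text matches at the same depth, and that a node's credit is the mean reward of all rollouts passing through it. The names `normalize`, `build_rollout_graph`, and `ROOT` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sentinel for the shared starting point (the prompt).
ROOT = (-1, "<root>")


def normalize(step: str) -> str:
    """Stand-in equivalence key: lowercase and collapse whitespace.

    Assumption: a real system would need a semantic equivalence check here.
    """
    return " ".join(step.lower().split())


def build_rollout_graph(rollouts):
    """Merge rollouts into a graph and score each node.

    rollouts: list of (steps, reward) pairs, where steps is a list of strings.
    A node is (depth, normalized step). Edges only go from depth d to d + 1,
    so the resulting graph cannot contain a cycle.
    """
    edges = defaultdict(set)      # node -> set of child nodes
    rewards = defaultdict(list)   # node -> rewards of rollouts through it
    for steps, reward in rollouts:
        prev = ROOT
        rewards[prev].append(reward)
        for depth, step in enumerate(steps):
            node = (depth, normalize(step))
            edges[prev].add(node)
            rewards[node].append(reward)
            prev = node
    # Assumed credit rule: mean reward over all rollouts visiting the node.
    values = {node: sum(r) / len(r) for node, r in rewards.items()}
    return dict(edges), values


rollouts = [
    (["Let x = 2", "x + 3 = 5"], 1.0),
    (["let  x = 2", "x + 3 = 6"], 0.0),   # first step merges with the one above
    (["Guess 7", "x + 3 = 5"], 1.0),      # last step merges with the first rollout
]
edges, values = build_rollout_graph(rollouts)
for node in sorted(values):
    print(node, round(values[node], 3))
```

In this toy input, three rollouts with six steps in total collapse to four step nodes plus the root. The shared first step receives a credit of 0.5 because one of the two rollouts through it succeeded, which is the kind of per-node signal that separate, unmerged rollouts would not expose.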
