strategy-distillation


Beyond Trajectory Imitation: Strategy-Guided Policy Optimization for LLM Reasoning

arXiv cs.AI · 2d ago

Introduces Strategy-Guided Policy Optimization (SGPO) for LLM reasoning, which replaces trajectory imitation with strategy distillation, improving generalization on math benchmarks.
