reusable-rubric

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

arXiv cs.CL · 2026-06-03

ARBOR introduces a reusable rubric buffer to provide online process rewards for LLM-based search agents, improving training efficiency when outcome-only rewards are insufficient. It outperforms GRPO and DAPO on multi-hop QA benchmarks, converting up to 42% of zero-gradient training groups into informative ones.
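To make the "zero-gradient group" idea concrete, here is a minimal sketch. It assumes the standard GRPO-style group-normalized advantage, (r - mean) / (std + eps). When every rollout in a group gets the same outcome reward, every advantage is zero and the group contributes no learning signal. The per-rollout process scores and the additive way they are combined with the outcome reward are hypothetical illustrations, not ARBOR's actual rubric buffer or reward formula.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_gradient(rewards):
    """A group carries no signal when all of its rewards are identical."""
    return max(rewards) == min(rewards)


# Outcome-only rewards: all four rollouts failed, so the group is uninformative.
outcome = [0.0, 0.0, 0.0, 0.0]

# Hypothetical per-rollout process scores (assumed values, for illustration only).
process = [0.2, 0.0, 0.5, 0.1]

# Assumed combination rule: simple addition of outcome and process reward.
combined = [o + p for o, p in zip(outcome, process)]

print(is_zero_gradient(outcome), group_advantages(outcome))
print(is_zero_gradient(combined), group_advantages(combined))
```

Under these assumptions the outcome-only group yields all-zero advantages, while the combined rewards separate the rollouts, which is the sense in which a process reward can turn a zero-gradient group into an informative one.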
