reusable-rubric

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

arXiv cs.CL · 2026-06-03

ARBOR introduces a reusable rubric buffer to provide online process rewards for LLM-based search agents, improving training efficiency when outcome-only rewards are insufficient. It outperforms GRPO and DAPO on multi-hop QA benchmarks, converting up to 42% of zero-gradient training groups into informative ones.
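To make the "zero-gradient training groups" point concrete, here is a minimal sketch of the standard group-normalized advantage used in GRPO-style training. When every rollout in a group gets the same outcome reward, all advantages are zero and the group contributes no gradient. The sketch then adds a per-rollout process score to show how the group can become informative again. The process scores, the weight `beta`, and the additive combination are illustrative assumptions only; the summary does not say how ARBOR computes or combines its rubric-based rewards.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_gradient_group(rewards, tol=1e-9):
    """True when all rewards in the group are equal, so every advantage is zero."""
    return max(rewards) - min(rewards) <= tol


# Outcome-only rewards: every rollout in the group failed the question.
outcome = [0.0, 0.0, 0.0, 0.0]

# Hypothetical per-rollout process scores (assumption, not from the paper),
# e.g. the fraction of rubric items a search trajectory satisfied.
process = [0.2, 0.5, 0.0, 0.3]

# Hypothetical weight and additive combination (assumption, not from the paper).
beta = 0.5
combined = [o + beta * p for o, p in zip(outcome, process)]

print(is_zero_gradient_group(outcome))   # True: no learning signal
print(group_advantages(outcome))         # all zeros
print(is_zero_gradient_group(combined))  # False: rollouts are now ranked
print(group_advantages(combined))
```

Under these assumptions, the rollout with the highest process score receives a positive advantage even though no rollout answered correctly, which is the kind of signal an outcome-only reward cannot provide.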
