Tag
ReCrit introduces a transition-aware reinforcement learning framework for scientific critic reasoning, decomposing initial-to-critic behavior into four quadrants (Correction, Sycophancy, Robustness, Boundary) and using dynamic asynchronous rollout. It improves critic accuracy significantly on Qwen models across multiple scientific benchmarks.