Tag
Critic-R introduces a framework using a critic model to provide introspective feedback between the reasoning agent and retriever, improving agentic search performance at both inference and training time without requiring retraining the agent.
OpenAI introduced CriticGPT, a GPT-4-based model designed to catch errors in ChatGPT's code output. When human trainers use CriticGPT for code review, they outperform those without assistance 60% of the time, addressing a fundamental limitation of RLHF as models become increasingly capable.