Tag
Proposes on-policy critique distillation (OPCD), which uses weak models as critics that supply revision directions to strong models. This improves the strong models' reasoning and alignment without requiring the weak models to solve the tasks themselves.