critique-distillation


Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

arXiv cs.AI · 2026-06-02

Proposes on-policy critique distillation (OPCD), in which weak models act as critics that supply revision directions to strong models, improving reasoning and alignment without requiring the weak models to solve the tasks themselves.
