weak-to-strong

#weak-to-strong

Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

arXiv cs.AI ↗ · 2026-06-02 Cached

Proposes on-policy critique distillation (OPCD), which uses weak models as critics to provide revision directions for strong models, improving reasoning and alignment without requiring the weak models to solve the tasks themselves.


#weak-to-strong

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

arXiv cs.CL ↗ · 2026-05-19 Cached

Proposes using mismatched wrong drafts from a weaker model to elicit better reasoning in a stronger learner via GRPO, reporting state-of-the-art results for Mathstral-7B on the MATH-500 and AIME benchmarks.


#weak-to-strong

@dair_ai: NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration loop hits 76.4% on SWE-bench Verified, mat…

X AI KOLs Following ↗ · 2026-05-18 Cached

A new paper shows that a weak model generating k=8 proposals, combined with a critic-comparator selection loop, can match frontier-model performance on SWE-bench Verified, reaching 76.4% accuracy. The key insight is that a correct patch is often already present among the weak model's top-k candidates, so the challenge is selecting it effectively using execution verification.


#weak-to-strong

@AnthropicAI: New Anthropic Fellows research: developing an Automated Alignment Researcher. We ran an experiment to learn whether Cla…

X AI KOLs ↗ · 2026-04-14

Anthropic Fellows research describes an experiment that uses Claude Opus 4.6 to accelerate alignment research on weak-to-strong supervision, the question of whether weaker AI models can effectively supervise stronger ones during training.

