weak-to-strong

#weak-to-strong

Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

arXiv cs.AI · 2026-06-02

Proposes on-policy critique distillation (OPCD), in which weak models act as critics that supply revision directions for strong models, improving reasoning and alignment without requiring the weak models to solve the tasks themselves.

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

arXiv cs.CL · 2026-05-19

The paper proposes using mismatched wrong drafts from a weaker model to elicit superior reasoning in a stronger learner trained with GRPO, reporting state-of-the-art results for Mathstral-7B on the MATH-500 and AIME benchmarks.

@dair_ai: NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration loop hits 76.4% on SWE-bench Verified, mat…

X AI KOLs Following · 2026-05-18

A new paper shows that a weak model generating k=8 proposals, combined with a critic-comparator selection loop, can match frontier-model performance on SWE-bench Verified, reaching 76.4%. The key insight is that a correct patch is often already among the weak model's top-k candidates, so the challenge is selecting it effectively using execution verification.

@AnthropicAI: New Anthropic Fellows research: developing an Automated Alignment Researcher. We ran an experiment to learn whether Cla…

X AI KOLs · 2026-04-14

Anthropic Fellows research describes an experiment that uses Claude Opus 4.6 to accelerate alignment research on weak-to-strong supervision, which asks whether weaker AI models can effectively supervise stronger ones during training.
