Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

Hugging Face Daily Papers 05/31/26, 12:00 AM Papers

Summary

Trust functions enable near-lossless weak-to-strong generalization by identifying reliable weak labels for training, achieving performance comparable to ground-truth supervision across multiple domains.

Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher when reliable labels are scarce. We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we introduce trust functions that assign each weak label a scalar trust score and use these scores to filter weak supervision. Across several domains, including world knowledge, quantitative reasoning, and strategy games, trust filtering yields students that match and sometimes surpass ground-truth supervision, achieving near-lossless weak-to-strong generalization. Moreover, trust functions enable an iterative weak-to-strong chain that compounds gains by training a student and then reusing it as the next teacher. The advantage of trust functions can be attributed to several mechanisms.
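
The filtering step described above can be illustrated with a minimal sketch. The abstract does not say how trust scores are computed, so everything here is an assumption made for illustration: the names `trust_filter` and `confidence_trust`, the threshold of 0.8, the toy data, and the choice to reuse the teacher's self-reported confidence as the trust score. It shows only the general shape of the idea (score each weak label, keep the ones that clear a threshold), not the paper's actual method.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Hashable
Label = Hashable
TrustFn = Callable[[Example, Label], float]


def trust_filter(
    examples: Sequence[Example],
    weak_labels: Sequence[Label],
    trust_fn: TrustFn,
    threshold: float,
) -> List[Tuple[Example, Label]]:
    """Keep only the (example, weak label) pairs whose trust score clears the threshold."""
    return [
        (x, y)
        for x, y in zip(examples, weak_labels)
        if trust_fn(x, y) >= threshold
    ]


# Toy, made-up teacher output: example id -> (weak label, self-reported confidence).
teacher_output = {
    "q1": ("A", 0.95),
    "q2": ("B", 0.40),
    "q3": ("C", 0.85),
}


def confidence_trust(x: Example, y: Label) -> float:
    # Hypothetical trust function: reuse the teacher's confidence as the trust score.
    # The paper's trust functions may be defined quite differently.
    label, confidence = teacher_output[x]
    return confidence if label == y else 0.0


examples = list(teacher_output)
weak_labels = [teacher_output[x][0] for x in examples]

# Only the filtered pairs would be passed on to train the strong student.
train_set = trust_filter(examples, weak_labels, confidence_trust, threshold=0.8)
print(train_set)  # [('q1', 'A'), ('q3', 'C')]
```

In this sketch the low-confidence label for `q2` is dropped, and the student would be trained only on the two retained pairs. Passing the trust function in as a plain callable keeps the filter independent of how the scores are produced.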

Cached at: 06/10/26, 12:08 AM

Source: https://huggingface.co/papers/2606.01000

Get this paper in your agent:

hf papers read 2606.01000

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Trust-Region Behavior Blending for On-Policy Distillation

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

When Can LLMs Learn to Reason with Weak Supervision?

Weak-to-strong generalization

Trust Region On-Policy Distillation
