Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating

arXiv cs.LG Papers

Summary

This paper identifies structural failure modes in tabular fair semi-supervised learning under confidence gating and proposes Online Primal-Dual Allocation (OPDA) to mitigate them without per-dataset tuning.

arXiv:2605.16446v1 Announce Type: new Abstract: Semi-supervised learning (SSL) enables prediction with limited labels, but high-stakes tabular applications (medical, credit, recidivism) require statistical fairness guarantees. We identify a structural conflict in tabular fair SSL through a diagnostic stress test: under confidence-gated pseudo-labeling, moment-matching fairness regularizers can trigger two failure modes -- Masking Collapse (fairness erodes confidence, starving pseudo-labels) and Trivial Saturation (drift to constant predictors). We propose Online Primal-Dual Allocation (OPDA), an online controller that schedules fairness and entropy-based stability penalties using violation, risk, and pseudo-label health signals, avoiding per-dataset selection of a fixed fairness weight within this diagnostic regime. On the evaluated tabular benchmarks (Adult, ACSIncome, COMPAS), OPDA mitigates the degenerate regimes observed under static weighting and simple single-signal adaptive baselines. On Adult and COMPAS, it yields non-degenerate operating points competitive with the empirical static-$\lambda$ frontier; on ACSIncome, it preserves utility with a wider fairness-utility spread. Relative to OPDA-lite, the full controller mainly shifts the operating point toward higher utility on ACSIncome, while Adult highlights the fairness-utility trade-off between the two variants. These results position OPDA as a calibration-free controller for non-degenerate operating points in tabular fair SSL without per-dataset tuning.


# Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating
Source: [https://arxiv.org/html/2605.16446](https://arxiv.org/html/2605.16446)
1 College of Computer Science and Technology, Jilin University, China
2 Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, China

###### Abstract

Semi-supervised learning (SSL) enables prediction with limited labels, but high-stakes tabular applications (medical, credit, recidivism) require statistical fairness guarantees. We identify a structural conflict in tabular fair SSL through a diagnostic stress test: under confidence-gated pseudo-labeling, moment-matching fairness regularizers can trigger two failure modes, Masking Collapse (fairness erodes confidence, starving pseudo-labels) and Trivial Saturation (drift to constant predictors). We propose Online Primal-Dual Allocation (OPDA), an online controller that schedules fairness and entropy-based stability penalties using violation, risk, and pseudo-label health signals, avoiding per-dataset selection of a fixed fairness weight within this diagnostic regime. On the evaluated tabular benchmarks (Adult, ACSIncome, COMPAS), OPDA mitigates the degenerate regimes observed under static weighting and simple single-signal adaptive baselines. On Adult and COMPAS, it yields non-degenerate operating points competitive with the empirical static-$\lambda$ frontier; on ACSIncome, it preserves utility with a wider fairness-utility spread. Relative to OPDA-lite, the full controller mainly shifts the operating point toward higher utility on ACSIncome, while Adult highlights the fairness-utility trade-off between the two variants. These results position OPDA as a calibration-free controller for non-degenerate operating points in tabular fair SSL without per-dataset tuning.

## 1 Introduction

Consider high-stakes tabular applications such as medical decision support, credit scoring, and recidivism prediction, where labeled data is scarce due to expensive expert annotations, yet regulatory compliance mandates statistical fairness guarantees across demographic groups. Semi-supervised learning (SSL) enables prediction with limited labels, but modern pseudo-label SSL relies on *confidence gating*: pseudo-labels are retained only when predictions exceed a threshold $\tau$ \[[22](https://arxiv.org/html/2605.16446#bib.bib1)\]. However, fairness regularizers enforce moment matching across groups, which can systematically suppress confidence and disable the pseudo-labeling mechanism.

##### Scope\.

This work focuses on *tabular multi-output* fair SSL, motivated by three factors: (1) Prevalence in high-stakes applications: tabular data dominates medical decision support, credit scoring, and recidivism prediction, where both label scarcity and fairness constraints are critical \[[16](https://arxiv.org/html/2605.16446#bib.bib27),[9](https://arxiv.org/html/2605.16446#bib.bib28),[1](https://arxiv.org/html/2605.16446#bib.bib38)\]; (2) Isolation of failure mechanisms: the multi-label structure ($L > 1$) allows us to isolate and verify the predicted failure modes without confounding factors from high-dimensional representations; (3) Computational efficiency: tabular benchmarks enable extensive ablations and sensitivity analyses that would be prohibitively expensive on image data. Accordingly, our empirical evaluation uses a diagnostic stress-test setting to isolate and verify the failure modes, but the identified mechanisms (Masking Collapse, Trivial Saturation) are fundamental to confidence-gated fair SSL, and OPDA's multi-signal design is applicable to standard single-label settings.

We identify two structural failure modes. Type I (Masking Collapse): fairness pressure concentrates predictions near decision boundaries and pseudo-label coverage $q_t$ collapses, yielding gradient starvation. Type II (Trivial Saturation): training drifts toward constant predictors with near-zero fairness violation but severely degraded utility. Both failures arise from *non-stationary* coupling between pseudo-label selection and constraint evaluation, rendering static weighting brittle. Figure [1](https://arxiv.org/html/2605.16446#S1.F1) illustrates these causal pathways.

![Refer to caption](https://arxiv.org/html/2605.16446v1/x1.png)

Figure 1: Causal mechanisms of structural failures in fair SSL. High fairness pressure diverges into two pathways: Type I (Masking Collapse) erodes confidence and starves gradients; Type II (Trivial Saturation) collapses predictions to constants with degraded utility.

We formalize these pathologies from three perspectives. First, under confidence-gated SSL, sufficiently strong moment-matching penalties admit constant solutions with vanishing gradients (Proposition 1; Appendix A). Second, for SimFair \[[18](https://arxiv.org/html/2605.16446#bib.bib13)\], fairness enforcement is strictly sign-conflicting with entropy-based confidence sharpening in logit space (Proposition 2; Appendix B). Third, we provide a convex-surrogate analysis of OPDA's outer-loop dynamics with sublinear-regret guarantees (Appendix G).

This distinguishes our approach from multi-objective methods (PCGrad \[[24](https://arxiv.org/html/2605.16446#bib.bib22)\], MGDA \[[21](https://arxiv.org/html/2605.16446#bib.bib20)\]) that assume stationary objectives. Fair SSL under confidence gating exhibits endogenous non-stationarity: the constraint functions $V_t(\theta)$ and $H_t(\theta)$ depend on the evolving pseudo-label mask $M_t(\theta)$. OPDA tracks a moving equilibrium by adapting dual weights epoch by epoch using online observables.

We propose Online Primal-Dual Allocation (OPDA) to resolve this structural deadlock. OPDA uses a bilevel budget-allocation parameterization: $\lambda_v^{(t)} = B_t \pi_t$ and $\lambda_h^{(t)} = B_t (1 - \pi_t)$. We update $(B_t, \pi_t)$ using online observables including the fairness violation ($v_t$), a risk proxy ($r_t$), pseudo-label health signals ($q_t, p_t, \mathrm{ESS}_t$), and gradient alignment. Budget dynamics implement noise-robust knee-seeking; allocation dynamics are conflict-aware and carry an anti-starvation guarantee (Lemma 1). OPDA runs with the same default configuration across all datasets (supplementary Appendix D).

##### Contributions\.

- Mechanism diagnosis. We identify two failure modes in confidence-gated tabular fair SSL and formalize sufficient local mechanisms: vanishing-gradient neighborhoods (Proposition 1; Appendix A) and exact logit-space sign conflict (Proposition 2; Appendix B).
- Pseudo-label-health-aware controller. We propose OPDA, an epoch-level budget-allocation controller that schedules fairness and stability pressure using online observables, with a single default configuration across the evaluated tabular settings.
- Empirical evidence in tabular stress tests. Across Adult, ACSIncome, and COMPAS, OPDA mitigates degenerate regimes observed under static weighting and naive single-signal controllers, while producing competitive operating points in a diagnostic stress-test setting.

##### Code availability\.

## 2 Related Work

##### Semi\-Supervised Learning\.

Modern SSL uses consistency regularization with confidence-gated pseudo-labeling \[[22](https://arxiv.org/html/2605.16446#bib.bib1),[26](https://arxiv.org/html/2605.16446#bib.bib2),[7](https://arxiv.org/html/2605.16446#bib.bib3)\]. The unsupervised loss is masked when predictions fall below a threshold $\tau$, introducing a switch-like dependency. In multi-label settings, gating is applied element-wise \[[23](https://arxiv.org/html/2605.16446#bib.bib4),[5](https://arxiv.org/html/2605.16446#bib.bib5)\]. Most SSL methods assume that external regularizers do not systematically disable the unlabeled objective. Our analysis reveals that moment-matching pressure can suppress confidence and deactivate pseudo-label masks.

##### Statistical Fairness via Moment Matching\.

Group fairness notions such as demographic parity (DP) and equal opportunity (EOp) \[[25](https://arxiv.org/html/2605.16446#bib.bib9),[12](https://arxiv.org/html/2605.16446#bib.bib8)\] control disparities across sensitive groups. Differentiable surrogates use moment matching or distributional discrepancy \[[11](https://arxiv.org/html/2605.16446#bib.bib11),[15](https://arxiv.org/html/2605.16446#bib.bib12)\]. We adopt SimFair \[[18](https://arxiv.org/html/2605.16446#bib.bib13)\] as a representative instantiation. Crucially, constraint evaluation and pseudo-label selection share evolving predictions, making the constraint functions *endogenously non-stationary*. OPDA schedules dual pressure using online observables rather than assuming a fixed constraint landscape.

##### Fairness in Semi\-Supervised Learning\.

Prior fair SSL work uses primal-dual updates \[[6](https://arxiv.org/html/2605.16446#bib.bib15)\], group-aware re-weighting \[[14](https://arxiv.org/html/2605.16446#bib.bib16),[20](https://arxiv.org/html/2605.16446#bib.bib17)\], or representation-level fairness \[[19](https://arxiv.org/html/2605.16446#bib.bib18),[3](https://arxiv.org/html/2605.16446#bib.bib19)\]. These approaches focus on reducing bias given pseudo-labels. We address an orthogonal concern: strong moment-matching pressure can destabilize pseudo-labeling itself, yielding Masking Collapse or Trivial Saturation.

##### Dynamic Weighting and Feedback Control\.

Multi-objective methods (GradNorm \[[8](https://arxiv.org/html/2605.16446#bib.bib21)\], PCGrad \[[24](https://arxiv.org/html/2605.16446#bib.bib22)\], MGDA \[[21](https://arxiv.org/html/2605.16446#bib.bib20)\]) adjust weights based on gradient statistics. However, they operate on gradients alone and do not monitor mechanism health. In fair SSL, pseudo-label availability can degrade catastrophically when fairness pressure suppresses confidence. OPDA operates at epoch granularity and explicitly tracks pseudo-label health signals ($q_t, p_t, \mathrm{ESS}_t$) to detect SSL mechanism failure before gradient pathology.

We frame OPDA as feedback control \[[27](https://arxiv.org/html/2605.16446#bib.bib24),[4](https://arxiv.org/html/2605.16446#bib.bib25)\] that tracks a moving equilibrium (Proposition 3; Appendix C of the supplementary material). OPDA separates the total dual budget from its allocation, updating both using online observables. This provides a bridge to an online-optimization interpretation with regret guarantees (Appendix G).

## 3 The OPDA Framework

To resolve the structural deadlock between moment-matching fairness and confidence-gated pseudo-labeling, we propose Online Primal-Dual Allocation (OPDA). OPDA treats each epoch as an online round and adaptively schedules two dual weights, a fairness penalty and an entropy-based stability penalty, using online observables ($v_t, r_t, q_t, p_t, \mathrm{ESS}_t$, gradient alignment).

##### Why entropy\-based stability?

FixMatch-style confidence gating retains pseudo-labels only when predictions are sufficiently confident \[[22](https://arxiv.org/html/2605.16446#bib.bib1)\]. To preserve pseudo-labeling feasibility, we introduce teacher-view entropy minimization as a classical SSL regularizer \[[10](https://arxiv.org/html/2605.16446#bib.bib36)\]. OPDA does not assume this channel is always beneficial: its allocation weight can be driven to a small floor when mask starvation is not dominant.

##### Signals\.

OPDA monitors multiple online observables to detect both fairness violations and pseudo-label health degradation. At epoch $t$: $v_t$ is the training-time fairness penalty (SimFair DP-style moment matching, Appendix E); $r_t = 1 - \mathrm{MacroF1}_{\mathrm{val}} \in [0,1]$ is the risk proxy; $q_t \in [0,1]$ is the pseudo-label pass ratio; $p_t \in [0,1]$ is the proxy accuracy; $\mathrm{ESS}_t > 0$ is the effective sample size. We track pseudo-label health because of known self-training pathologies \[[2](https://arxiv.org/html/2605.16446#bib.bib37)\].
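As a rough illustration of the pass-ratio signal, the sketch below applies an element-wise confidence gate to binary multi-label probabilities and reports the fraction of entries that pass. The gate form `max(p, 1 - p) >= tau`, the function names, and the example values are assumptions made for illustration; the exact definitions of $q_t$, $p_t$, and $\mathrm{ESS}_t$ are left to the paper's appendices and are not reproduced here.

```python
from typing import List


def confidence_mask(probs: List[List[float]], tau: float) -> List[List[int]]:
    """Element-wise gate: 1 where max(p, 1 - p) >= tau, else 0 (assumed gate form)."""
    return [[1 if max(p, 1.0 - p) >= tau else 0 for p in row] for row in probs]


def pass_ratio(mask: List[List[int]]) -> float:
    """Fraction of (sample, label) entries that pass the gate; a stand-in for q_t."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Hypothetical predicted probabilities: 3 unlabeled samples, 2 binary labels.
probs = [[0.97, 0.55], [0.02, 0.60], [0.50, 0.99]]
mask = confidence_mask(probs, tau=0.95)
q_t = pass_ratio(mask)
```

Entries near 0.5 are rejected regardless of which side of the boundary they fall on, which is why pressure toward the decision boundary lowers this ratio.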

##### Risk proxy as an outer\-loop monitor\.

We define $r_t = 1 - \mathrm{MacroF1}_{\mathrm{val}}$ on a held-out validation split as a bounded scalar that tracks utility degradation under class imbalance. The validation Macro-F1 used in $r_t$ is binarized with a fixed threshold of $0.5$ during training; final reported test Macro-F1 uses the per-label rescaling thresholds described in Sec. [4](https://arxiv.org/html/2605.16446#S4). Crucially, $r_t$ is *not* added to the inner-loop training objective, and we do *not* backpropagate through Macro-F1; it is used only by OPDA's outer-loop controller to gate budget growth against excessive utility loss, avoiding overfitting to validation performance.
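A minimal sketch of how such a risk proxy could be computed from validation probabilities follows. The helper names, the `>=` tie-breaking at the 0.5 threshold, and the convention that F1 is 0 when a label has no true or predicted positives are assumptions, not details taken from the paper.

```python
from typing import List


def f1_binary(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1; returns 0.0 when there are no positives at all (assumed convention)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def risk_proxy(y_true: List[List[int]], probs: List[List[float]],
               threshold: float = 0.5) -> float:
    """r_t = 1 - Macro-F1, with Macro-F1 averaged over label dimensions."""
    n_labels = len(y_true[0])
    f1s = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_pred = [1 if row[j] >= threshold else 0 for row in probs]
        f1s.append(f1_binary(col_true, col_pred))
    return 1.0 - sum(f1s) / n_labels


# Hypothetical validation batch: 4 samples, 2 binary labels.
y_val = [[1, 0], [0, 1], [1, 1], [0, 0]]
p_val = [[0.9, 0.2], [0.4, 0.7], [0.3, 0.8], [0.1, 0.6]]
r_t = risk_proxy(y_val, p_val)
```

The value is a plain Python float computed outside any autograd graph, mirroring the statement that $r_t$ feeds only the outer-loop controller.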

##### Fairness notion\.

The optimized constraint $V_t(\theta)$ is DP-style group moment matching (Appendix E) \[[25](https://arxiv.org/html/2605.16446#bib.bib9)\]. We use the DP gap as the primary fairness metric; EOp/EOd are evaluation-only metrics \[[12](https://arxiv.org/html/2605.16446#bib.bib8)\].

### 3.1 Fair SSL as Non-stationary Constrained Optimization

Let $F_t(\theta)$ denote the base SSL objective in epoch $t$:

$$F_t(\theta) = \mathcal{L}_{\mathrm{sup}}(\theta) + \lambda_u \, \mathcal{L}_{\mathrm{unsup}}\big(\theta; M_t(\theta)\big), \tag{1}$$

where $M_t(\theta)$ is the selection mask under confidence thresholding \[[22](https://arxiv.org/html/2605.16446#bib.bib1)\]. We consider two constraint functions: the fairness violation $V_t(\theta)$ (SimFair DP) and the entropy-based stability $H_t(\theta)$ on weak views. Crucially, both are non-stationary through their dependence on $M_t(\theta)$:

$$V_t(\theta) = V(\theta; M_t(\theta)), \qquad H_t(\theta) = H(\theta; M_t(\theta)). \tag{2}$$

Proxy constrained formulation:

$$\min_{\theta} \; F_t(\theta) \quad \text{s.t.} \quad V_t(\theta) \leq \epsilon_v, \qquad H_t(\theta) \leq \epsilon_h, \tag{3}$$

with time-varying Lagrangian

$$\mathcal{L}_t(\theta, \lambda_v^{(t)}, \lambda_h^{(t)}) = F_t(\theta) + \lambda_v^{(t)} V_t(\theta) + \lambda_h^{(t)} H_t(\theta). \tag{4}$$

OPDA avoids selecting a static fairness weight via validation; instead, it steers the dual variables using online improvement-vs-risk signals.

### 3.2 Dual Re-parameterization: Budget and Allocation

OPDA re-parameterizes the dual variables into a total budget $B_t$ and an allocation ratio $\pi_t$:

$$\lambda_v^{(t)} = B_t \, \pi_t, \qquad \lambda_h^{(t)} = B_t \, (1 - \pi_t), \qquad \text{where } B_t \in [B_{\min}, B_{\max}^{\mathrm{soft}}], \quad \pi_t \in [0,1]. \tag{5}$$

This decouples enforcement intensity ($B_t$) from conflict resolution ($\pi_t$). In the released implementation, the log-domain update uses a numerical floor $\varepsilon_B$ when $B_{\min} = 0$, and the non-binding cap is implemented as $B_{\max}^{\mathrm{soft}}$ (Table [1](https://arxiv.org/html/2605.16446#S4.T1)).

OPDA uses fixed internal constants across datasets (Table [1](https://arxiv.org/html/2605.16446#S4.T1); supplementary Appendix D). Figure [2](https://arxiv.org/html/2605.16446#S3.F2) illustrates the signal flow.

![Refer to caption](https://arxiv.org/html/2605.16446v1/x2.png)

Figure 2: OPDA architecture. Outer loop updates total budget $B_t$ using gain-cost signals; inner loop allocates budget between fairness and stability via alignment-gated urgency.

#### 3.2.1 Knee-Seeking Budget Dynamics

We update the total enforcement budget once per epoch in the numerically stabilized log domain $u_t = \log\big(\max(B_t,\; B_{\min} + \varepsilon_B)\big)$. Let $\bar{v}_t, \bar{r}_t$ denote the EMA-smoothed fairness penalty and risk proxy. OPDA forms a signed improvement-vs-risk signal

$$\begin{aligned} s_t &= G_t - C_t, && \text{(6)} \\ G_t &= \big[-\Delta\bar{v}_t - m_v(t)\big]_+, && \text{(7)} \\ C_t &= \big[\Delta\bar{r}_t - m_r(t)\big]_+, && \text{(8)} \end{aligned}$$

where $m_v(t)$ and $m_r(t)$ are noise margins (supplementary Appendix D). Thus, $s_t > 0$ only when fairness improves beyond its noise margin and that gain outweighs any risk deterioration beyond noise. The controller applies stabilization to obtain $\tilde{s}_t$ and updates

$$\begin{aligned} u_{t+1} &= \Pi_{\mathcal{U}}\Big(u_t + \eta_t \, \tilde{s}_t\Big), && \text{(9)} \\ B_{t+1} &= \Pi_{[B_{\min}, B_{\max}^{\mathrm{soft}}]}\big(\exp(u_{t+1})\big). && \text{(10)} \end{aligned}$$
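One epoch of the budget update in Eqs. (6)-(10) might be sketched as below. Several pieces are assumptions because the source defers them to Appendix D: $\Delta$ is taken as the epoch-to-epoch difference of the EMA-smoothed signal, the stabilization that yields $\tilde{s}_t$ is replaced by simple clipping, the set $\mathcal{U}$ is taken to be the log-image of the budget interval, and all default constants are illustrative placeholders rather than the values in Table 1.

```python
import math
from typing import Tuple


def budget_step(B_t: float, v_bar_prev: float, v_bar: float,
                r_bar_prev: float, r_bar: float,
                m_v: float = 0.01, m_r: float = 0.01, eta: float = 0.5,
                B_min: float = 0.0, B_max_soft: float = 10.0,
                eps_B: float = 1e-8, s_clip: float = 1.0) -> Tuple[float, float]:
    """Return (B_{t+1}, s_t) for one epoch of the log-domain budget update."""
    u_lo = math.log(B_min + eps_B)            # assumed lower end of the set U
    u_hi = math.log(B_max_soft)               # assumed upper end of the set U
    u_t = math.log(max(B_t, B_min + eps_B))   # stabilized log budget

    G_t = max(-(v_bar - v_bar_prev) - m_v, 0.0)   # Eq. (7): fairness gain beyond noise
    C_t = max((r_bar - r_bar_prev) - m_r, 0.0)    # Eq. (8): risk cost beyond noise
    s_t = G_t - C_t                               # Eq. (6)

    s_tilde = max(-s_clip, min(s_clip, s_t))      # assumed stabilization: clipping
    u_next = min(max(u_t + eta * s_tilde, u_lo), u_hi)            # Eq. (9)
    B_next = min(max(math.exp(u_next), B_min), B_max_soft)        # Eq. (10)
    return B_next, s_t


# Fairness penalty drops while risk is flat: the budget grows.
B_up, s_up = budget_step(1.0, v_bar_prev=0.30, v_bar=0.25, r_bar_prev=0.40, r_bar=0.40)
# Fairness penalty is flat while risk rises: the budget shrinks.
B_down, s_down = budget_step(1.0, v_bar_prev=0.30, v_bar=0.30, r_bar_prev=0.40, r_bar=0.50)
```

Working in the log domain makes each step a multiplicative change to $B_t$, so the same step size behaves sensibly whether the budget is small or large.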

#### 3.2.2 Conflict-Aware Allocation

OPDA allocates the budget $B_t$ between penalties using the allocation ratio $\pi_t \in [0,1]$. Let $g_{\mathrm{base}} = \nabla_\theta F_t(\theta)$, $g_v = \nabla_\theta V_t(\theta)$, $g_h = \nabla_\theta H_t(\theta)$. OPDA computes alignment cosines

$$c_t^v = \cos(g_v, g_{\mathrm{base}}), \qquad c_t^h = \cos(g_h, g_{\mathrm{base}}), \tag{11}$$

and uses a sigmoid gate $\phi(c)$ to suppress urgency when the constraint gradient is anti-aligned.
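The alignment cosines of Eq. (11) and a gate of this kind can be sketched on flattened gradient vectors as follows. The specific gate $\phi(c) = \sigma(kc)$ with slope `k`, the small `eps` guard, and the toy gradients are assumptions; the paper's gate is specified in its supplementary Appendix D.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def sigmoid_gate(c: float, k: float = 5.0) -> float:
    """Assumed gate phi(c) = sigmoid(k * c): near 0 when anti-aligned, near 1 when aligned."""
    return 1.0 / (1.0 + math.exp(-k * c))


# Hypothetical flattened gradients.
g_base = [1.0, 0.0]
g_v = [-1.0, 0.1]   # fairness gradient, nearly opposite to the base gradient
g_h = [0.8, 0.6]    # stability gradient, mostly aligned with the base gradient

c_v = cosine(g_v, g_base)
c_h = cosine(g_h, g_base)
gate_v = sigmoid_gate(c_v)
gate_h = sigmoid_gate(c_h)
```

In this toy case the fairness urgency would be scaled down sharply while the stability urgency passes almost unchanged.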

OPDA forms two urgency scalars $d_v(t)$ and $d_h(t)$. $d_v(t)$ summarizes noise-filtered fairness improvement and current violation; $d_h(t)$ summarizes pseudo-label health deficits ($q_t, p_t, \mathrm{ESS}_t$) relative to warmup baselines. Both are EMA-smoothed and gated by $\phi(\cdot)$ (supplementary Appendix D).

The target allocation minimizes a convex quadratic surrogate:

$$\begin{aligned} \pi_t^\star &= \arg\min_{\pi \in [0,1]} \Big(d_v(t)(\pi - 1)^2 + d_h(t)\pi^2\Big) && \text{(12)} \\ &= \frac{d_v(t)}{d_v(t) + d_h(t) + \varepsilon}. && \text{(13)} \end{aligned}$$

To prevent starvation, OPDA enforces an ESS-adaptive floor $\pi_{\min}(t) > 0$ with EMA and clipping:

$$\begin{aligned} \pi_t &= \Pi_{[\pi_{\min}(t),\, 1 - \pi_{\min}(t)]}\Big(\mathrm{EMA}(\pi_t^\star)\Big), \\ \lambda_v^{(t)} &= B_t \pi_t, \quad \lambda_h^{(t)} = B_t (1 - \pi_t). && \text{(14)} \end{aligned}$$
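The allocation step of Eqs. (12)-(14) can be sketched in a few lines. Setting the derivative of the quadratic to zero gives $d_v/(d_v + d_h)$, so the $\varepsilon$ in Eq. (13) is read here as a numerical guard against a zero denominator. The function names, the fixed `pi_min` (the paper's floor is ESS-adaptive), and the example urgencies are assumptions; the EMA momentum of 0.9 follows the value the paper reports for OPDA's EMA.

```python
from typing import Tuple


def target_allocation(d_v: float, d_h: float, eps: float = 1e-8) -> float:
    """Closed-form minimizer of d_v*(pi-1)^2 + d_h*pi^2, stabilized by eps (Eq. 13)."""
    return d_v / (d_v + d_h + eps)


def allocation_step(pi_ema_prev: float, d_v: float, d_h: float,
                    pi_min: float, B_t: float,
                    rho: float = 0.9) -> Tuple[float, float, float, float]:
    """Return (pi_t, pi_ema, lambda_v, lambda_h) for one epoch (Eq. 14)."""
    pi_star = target_allocation(d_v, d_h)
    pi_ema = rho * pi_ema_prev + (1.0 - rho) * pi_star     # EMA smoothing
    pi_t = min(max(pi_ema, pi_min), 1.0 - pi_min)          # clip to [pi_min, 1 - pi_min]
    return pi_t, pi_ema, B_t * pi_t, B_t * (1.0 - pi_t)


# Fairness urgency dominates: allocation drifts toward the fairness penalty.
pi_t, pi_ema, lam_v, lam_h = allocation_step(0.5, d_v=3.0, d_h=1.0, pi_min=0.05, B_t=2.0)

# Zero fairness urgency: the floor keeps lambda_v strictly positive.
pi_floor, _, lam_v_floor, lam_h_floor = allocation_step(0.0, d_v=0.0, d_h=1.0,
                                                        pi_min=0.05, B_t=2.0)
```

The two dual weights always sum to $B_t$, so the allocation step redistributes pressure without changing its total.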

### 3.3 Formal Characterization of Failure Modes

We formalize the two structural failures identified in Sec. [1](https://arxiv.org/html/2605.16446#S1).

##### Scope\.

The following propositions analyze failure modes under convex surrogate losses. In practice, OPDA operates on non-convex deep networks. We view these propositions as providing qualitative guidance: they identify structural pathologies (vanishing gradients, gradient conflict) that motivate OPDA's multi-signal design. Empirical validation is provided in Sec. [4](https://arxiv.org/html/2605.16446#S4).

###### Proposition 1 (Masking Collapse: vanishing-gradient neighborhoods)

Consider confidence-gated SSL with threshold $\tau \in (1/2, 1)$ and a moment-matching fairness penalty $V(\theta)$ (e.g., squared DP). Suppose the model can realize constant predictors, and let $\theta^*$ satisfy $f_{\theta^*}(x) \equiv c$ with $V(\theta^*) = 0$. If $c \in (1 - \tau, \tau)$ (low-confidence region), then there exists an open neighborhood $\mathcal{N}(\theta^*)$ such that for all $\theta \in \mathcal{N}(\theta^*)$, $\mathcal{L}_{\mathrm{unsup}}(\theta) = 0$ and $\nabla_\theta \mathcal{L}_{\mathrm{unsup}}(\theta) = 0$.

Intuition. Strong fairness pressure drives predictions toward the decision boundary ($\hat{y} \approx 0.5$), causing the confidence gate to reject most samples. This creates vanishing-gradient neighborhoods that prevent learning from unlabeled data. Proof in Appendix A.
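The statement can be checked numerically on a toy model. The sketch below uses a one-dimensional logistic predictor on bounded inputs and a simplified single-view masked loss; both are assumptions chosen for illustration and are not the paper's setup or proof. Around the constant predictor $c = 0.5$, every small perturbation of the parameters keeps all predictions inside $(1 - \tau, \tau)$, so the masked loss is identically zero there and its gradient vanishes.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def masked_unsup_loss(w: float, b: float, xs: List[float], tau: float) -> float:
    """Mean cross-entropy against hard pseudo-labels, counted only where the gate passes."""
    total = 0.0
    for x in xs:
        p = sigmoid(w * x + b)
        conf = max(p, 1.0 - p)
        if conf >= tau:
            total += -math.log(conf)   # cross-entropy against the argmax pseudo-label
    return total / len(xs)


tau = 0.95
xs = [i / 10.0 for i in range(-10, 11)]    # bounded inputs in [-1, 1]

# theta* = (w=0, b=0) is the constant predictor c = 0.5; probe a grid around it.
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
losses_near = [masked_unsup_loss(w, b, xs, tau) for w in grid for b in grid]

# Far from theta*, confident predictions reappear and the loss becomes active.
loss_far = masked_unsup_loss(10.0, 0.0, xs, tau)
```

Within the probed grid the largest logit magnitude is 1, giving a confidence of about 0.73, which is below the gate of 0.95 for every input.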

###### Proposition 2 (Gradient conflict under SimFair)

Consider binary classification with two groups ($K = 2$). Let $V_{\mathrm{SimFair}}(\theta) = \|\mu_0(\theta) - \mu_1(\theta)\|_2$ denote the SimFair DP penalty, and let $H(\theta)$ denote the entropy-minimization objective. Assume: (i) imbalanced groups, (ii) distinct group means, (iii) differentiable model. Then logit-space gradients exhibit sign conflict: fairness pushes toward mean equalization while entropy minimization pushes predictions away from $0.5$. Whether this translates to parameter-space negative alignment depends on the network Jacobian and is monitored online via cosine alignment in OPDA.

Intuition. SimFair's moment matching reduces $\|\mu_0 - \mu_1\|$ (reducing inter-group variance). Entropy minimization drives predictions toward $\{0,1\}$ (increasing prediction variance). Under imbalanced groups, these create persistent gradient conflict. Proof in Appendix B.
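A toy calculation makes the logit-space conflict concrete. The sketch below treats per-sample logits as free variables, uses a scalar output so that the penalty reduces to $|\mu_0 - \mu_1|$, and picks a configuration where the two groups sit on opposite sides of $0.5$; all of these are illustrative assumptions rather than the setting of the paper's proof, and other configurations need not conflict on every coordinate. The entropy gradient uses the identity $\partial h / \partial z = -z \, p (1 - p)$ for binary entropy $h$ of $p = \sigma(z)$.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fairness_grad(z0: List[float], z1: List[float]) -> List[float]:
    """Gradient of |mu0 - mu1| w.r.t. each logit (group 0 entries first)."""
    p0 = [sigmoid(z) for z in z0]
    p1 = [sigmoid(z) for z in z1]
    mu0 = sum(p0) / len(p0)
    mu1 = sum(p1) / len(p1)
    sign = 1.0 if mu0 >= mu1 else -1.0
    g0 = [sign * p * (1.0 - p) / len(p0) for p in p0]
    g1 = [-sign * p * (1.0 - p) / len(p1) for p in p1]
    return g0 + g1


def entropy_grad(z_all: List[float]) -> List[float]:
    """Gradient of the mean binary entropy w.r.t. each logit."""
    n = len(z_all)
    return [-z * sigmoid(z) * (1.0 - sigmoid(z)) / n for z in z_all]


# Imbalanced groups (3 vs 1) with distinct means on opposite sides of 0.5.
z_group0 = [2.0, 1.0, 1.5]
z_group1 = [-1.0]

g_fair = fairness_grad(z_group0, z_group1)
g_ent = entropy_grad(z_group0 + z_group1)
sign_products = [a * b for a, b in zip(g_fair, g_ent)]
```

Every entry of `sign_products` is negative here: descending the fairness penalty pulls each group's probabilities toward the other group, while descending the entropy pushes them further from $0.5$.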

### 3.4 Theoretical Justification (Sketch)

We emphasize that these statements apply to convex surrogates; they do not claim global optimality for deep non-convex training.

###### Lemma 1 (Anti-starvation of dual weights)

Let $\underline{B} > 0$ denote the effective budget lower bound, and assume $B_t \geq \underline{B}$ and $\pi_t \in [\pi_{\min}(t), 1 - \pi_{\min}(t)]$ with $\pi_{\min}(t) > 0$. Then at every epoch, $\lambda_v^{(t)} \geq \underline{B}\,\pi_{\min}(t)$ and $\lambda_h^{(t)} \geq \underline{B}\,\pi_{\min}(t)$, so neither penalty can be fully turned off.
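Under the lemma's stated assumptions, the bound follows in one line from the re-parameterization in Eq. (5) and the clipping interval for $\pi_t$; a short derivation:

```latex
\begin{aligned}
\lambda_v^{(t)} &= B_t \, \pi_t
  \;\geq\; \underline{B} \, \pi_{\min}(t), \\
\lambda_h^{(t)} &= B_t \, (1 - \pi_t)
  \;\geq\; \underline{B} \, \bigl(1 - (1 - \pi_{\min}(t))\bigr)
  \;=\; \underline{B} \, \pi_{\min}(t).
\end{aligned}
```

The first line uses $\pi_t \geq \pi_{\min}(t)$ and the second uses $\pi_t \leq 1 - \pi_{\min}(t)$; both use $B_t \geq \underline{B} > 0$.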

Empirical validation. Figure [5](https://arxiv.org/html/2605.16446#S4.F5) (Sec. [4](https://arxiv.org/html/2605.16446#S4)) shows that maintaining $\lambda_h^{(t)} > 0$ correlates with pseudo-label coverage $q_t$ remaining above a critical threshold.

Scope. This characterizes OPDA's controller-level equilibrium under a deterministic drift approximation (Appendix G). It does not claim global optimality but provides a precise interpretation of knee-seeking behavior in controller units.

## 4 Experiments

We evaluate OPDA under semi-supervised training with SimFair regularization \[[18](https://arxiv.org/html/2605.16446#bib.bib13)\] to answer:

- (RQ1) Diagnosis: Do the two structural failure modes manifest under fixed dual weighting?
- (RQ2) Trade-off: Can OPDA produce a competitive utility-fairness operating point without manual tuning?
- (RQ3) Mechanism: Do the induced dual schedules evolve consistently with OPDA's design?
- (RQ4) Comparison: Does OPDA's multi-signal design offer advantages over single-signal adaptive baselines?

### 4.1 Experimental Settings

##### Diagnostic design rationale\.

Our experimental setting is a diagnostic stress test designed to isolate and expose pseudo-label-health failures under shared fairness pressure. On Adult, we construct a 23-dimensional label vector (income + workclass + occupation one-hot blocks) and apply fairness constraints to all dimensions. This design choice serves two purposes: (i) Correlated attributes: workclass and occupation are predictive of income and may themselves exhibit demographic disparities, so constraining only income could allow bias to propagate through auxiliary predictions. (ii) Multi-label stress test: applying fairness constraints to all dimensions creates a more challenging optimization landscape that exercises OPDA's conflict-aware allocation mechanism. Standard single-label deployment would use simpler configurations; we adopt this multi-label structure specifically to stress-test the controller's ability to avoid structural failures. To verify that improvements are not artifacts of this multi-label aggregation, we report per-dimension decomposition metrics (Tables [5](https://arxiv.org/html/2605.16446#S4.T5) and [6](https://arxiv.org/html/2605.16446#S4.T6)) that isolate the primary income prediction task.

##### Datasets\.

Adult \[[16](https://arxiv.org/html/2605.16446#bib.bib27)\]: sensitive attribute sex ($K = 2$), $L = 23$ dimensions (income + workclass + occupation one-hot blocks). ACSIncome \[[9](https://arxiv.org/html/2605.16446#bib.bib28)\]: sensitive attribute RAC1P ($K = 9$ race groups), $L = 4$ binary targets (income\_over\_50k, employed, pubcov, migrated). COMPAS \[[1](https://arxiv.org/html/2605.16446#bib.bib38)\]: sensitive attribute sex ($K = 2$), $L = 2$ binary targets (general\_recid, violent\_recid).

##### Metrics\.

Macro-F1 (macro-averaged over $L$ dimensions) is the primary utility metric, using per-label rescaling thresholds (empirical prevalence). The DP gap is the primary fairness metric: $\mathrm{DP\ gap} \triangleq \sum_{k=1}^{K} \left\lVert \mathbb{E}[\hat{y}] - \mathbb{E}[\hat{y} \mid a = k] \right\rVert_2$. We additionally report the EOp gap for completeness. Per-dimension decomposition metrics in Tables [5](https://arxiv.org/html/2605.16446#S4.T5) and [6](https://arxiv.org/html/2605.16446#S4.T6) isolate primary income prediction. Those decomposition tables additionally report Binary EOd on the primary task, so the main tables and the decomposition tables use different secondary fairness diagnostics.
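The DP gap formula above translates directly into code. In the sketch below, the function name, the use of binarized predictions as input, and the choice to sum only over groups that actually occur in the data are assumptions made for illustration.

```python
import math
from typing import List


def dp_gap(preds: List[List[float]], groups: List[int]) -> float:
    """Sum over groups of the L2 distance between overall and per-group mean predictions."""
    n_labels = len(preds[0])
    overall = [sum(row[j] for row in preds) / len(preds) for j in range(n_labels)]
    gap = 0.0
    for k in sorted(set(groups)):
        rows = [row for row, a in zip(preds, groups) if a == k]
        group_mean = [sum(row[j] for row in rows) / len(rows) for j in range(n_labels)]
        gap += math.sqrt(sum((o - g) ** 2 for o, g in zip(overall, group_mean)))
    return gap


# Hypothetical binarized predictions: 4 samples, 2 labels, 2 groups.
preds = [[1, 0], [1, 1], [0, 0], [0, 1]]
groups = [0, 0, 1, 1]
gap_unfair = dp_gap(preds, groups)

# Identical prediction rates in both groups give a zero gap.
gap_fair = dp_gap([[1, 0], [0, 1], [1, 0], [0, 1]], groups)
```

In the first case the groups differ only on the first label (rates 1.0 versus 0.0 against an overall rate of 0.5), so each group contributes 0.5 to the sum.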

##### Methods\.

- FixMatch (Base): Standard SSL \[[22](https://arxiv.org/html/2605.16446#bib.bib1)\].
- Static-$\lambda$: FixMatch + SimFair with a fixed $\lambda_{\mathrm{fair}}$. We run the grid $\lambda_{\mathrm{fair}} \in \{0.1, 1, 10, 20, \ldots, 100\}$ as the empirical frontier.
- Adaptive baselines: Three single-signal controllers (EMA-P, PI, DualAsc) that update $\lambda_{\mathrm{fair}}$ once per epoch using only the EMA-smoothed fairness violation $\bar{v}_t$ (with momentum $\rho = 0.9$, matching OPDA's EMA). Unlike OPDA, none monitors pseudo-label health ($q_t, p_t, \mathrm{ESS}_t$), gradient alignment, or the risk proxy; this isolation tests whether fairness-only feedback suffices to avoid structural failures. All three use a single pre-specified configuration with the target violation $v_{\mathrm{tgt}}$ calibrated deterministically from warmup-epoch violations (no tuning required; supplementary Appendix I). Single-signal controllers face a fundamental dilemma: they cannot distinguish between fairness violations caused by insufficient model capacity (requiring higher $\lambda_{\mathrm{fair}}$) and those caused by pseudo-label starvation (requiring lower $\lambda_{\mathrm{fair}}$), leading to oscillation or premature saturation.
- OPDA: Schedules $(\lambda_v^{(t)}, \lambda_h^{(t)})$ via $(B_t, \pi_t)$ with fixed constants (Table [1](https://arxiv.org/html/2605.16446#S4.T1)).
- MOO baselines: PCGrad \[[24](https://arxiv.org/html/2605.16446#bib.bib22)\] and CAGrad \[[17](https://arxiv.org/html/2605.16446#bib.bib39)\] for gradient-based comparison.

Table 1: Fixed OPDA controller configuration used throughout the paper (pre-specified; not tuned). These quantities are *internal constants* of the controller implementation and are held fixed across datasets/backbones unless explicitly perturbed for diagnostics in supplementary Appendix H. The released implementation additionally uses a numerical log-floor $\varepsilon_B = 10^{-8}$ in the log/exp budget parameterization to keep $u_t$ well-defined even when $B_{\min} = 0$.

### 4.2 Main Results

##### RQ1: Failure mode diagnosis\.

Figure [3](https://arxiv.org/html/2605.16446#S4.F3) illustrates two structural failures under excessive fairness pressure. On Adult, Static-$\lambda = 100$ exhibits Masking Collapse: $q_t \to 0.02$, Macro-F1 $\to 0.15$ (near base-rate). On ACSIncome, aggressive single-signal control can induce Trivial Saturation; supplementary Appendix I shows PI reaches 100% saturation for 5 of 6 target-violation settings across all five pseudo-labeling backbones. Both failures validate Propositions [1](https://arxiv.org/html/2605.16446#Thmproposition1) and [2](https://arxiv.org/html/2605.16446#Thmproposition2).

![Refer to caption](https://arxiv.org/html/2605.16446v1/x3.png)

Figure 3: Illustrative failure modes under excessive fairness pressure. Adult: Type I, coverage collapse under static $\lambda_{\mathrm{fair}}$. ACSIncome: Type II, constant predictor drift under aggressive single-signal control.
##### RQ2: Trade\-off performance\.

Table[2](https://arxiv.org/html/2605.16446#S4.T2)summarizes results across three datasets\. On Adult and COMPAS, OPDA generates non\-degenerate operating points that are competitive with the empirical static\-λ\\lambdafrontier \(Figure[4](https://arxiv.org/html/2605.16446#S4.F4)\)\. On ACSIncome, OPDA preserves utility while exhibiting wider fairness\-utility spread in the more complexK=9K\{=\}9,L=4L\{=\}4setting, without guaranteeing uniform fairness improvement\. Decomposition metrics \(Tables[5](https://arxiv.org/html/2605.16446#S4.T5),[6](https://arxiv.org/html/2605.16446#S4.T6)\) show that the same utility–fairness trade\-off appears on the primary income prediction task\.

Table 2: Main results across three tabular benchmarks. Mean $\pm$ std over 10 randomly selected seeds. OPDA is positioned as a calibration-free operating-point generator rather than a guarantee of uniform dominance over every fixed static weight.

![Refer to caption](https://arxiv.org/html/2605.16446v1/x4.png) Figure 4: Empirical Pareto frontiers. OPDA operating points (red stars) lie near the favorable region of the static-$\lambda$ frontier on Adult and COMPAS; broader spread on ACSIncome.
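
An empirical frontier of this kind can be extracted from a sweep of operating points by discarding dominated ones. The sketch below assumes each point is a (Macro-F1, DP gap) pair, with higher Macro-F1 and lower DP gap preferred. The sample values are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (macro_f1, dp_gap)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points not dominated by any other point.

    A point is dominated if another point is at least as good on both
    axes (F1 no lower, DP gap no higher) and strictly better on one.
    """
    front = []
    for f1, dp in points:
        dominated = any(
            (f2 >= f1 and d2 <= dp) and (f2 > f1 or d2 < dp)
            for f2, d2 in points
        )
        if not dominated:
            front.append((f1, dp))
    return sorted(front)


# Hypothetical sweep over static weights (illustrative numbers only).
sweep = [(0.50, 0.05), (0.60, 0.10), (0.55, 0.12), (0.70, 0.20)]
front = pareto_front(sweep)
```

Here `(0.55, 0.12)` is dropped because `(0.60, 0.10)` is better on both axes; the remaining three points form the frontier.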
##### RQ3: Mechanism validation\.

Figure [5](https://arxiv.org/html/2605.16446#S4.F5) visualizes how representative fairness levels reshape prediction variance and pseudo-label coverage over training. On Adult, larger fixed fairness pressure drives pseudo-label coverage $q_t$ downward after warmup, consistent with mask starvation. On ACSIncome, coverage can remain high even while prediction variance is suppressed, consistent with constant-predictor drift. These trajectories match the two failure mechanisms that motivate OPDA’s non-zero stability channel.

![Refer to caption](https://arxiv.org/html/2605.16446v1/x5.png) Figure 5: Failure-mechanism trajectories under representative fairness levels. Larger fairness pressure reduces pseudo-label coverage on Adult, while ACSIncome can retain high coverage even as prediction variance is suppressed.
##### RQ4: Comparison with adaptive baselines\.

EMA-P and DualAsc are reasonable single-signal baselines at the default target setting, but supplementary Appendix I shows that their target-violation sensitivity is mild on Adult and much stronger on ACSIncome. PI is the least robust baseline: supplementary Appendix I shows 100% saturation for 5 of 6 target settings across all five pseudo-labeling backbones. DualAsc lacks OPDA’s anti-starvation guarantee and pseudo-label-health monitoring. OPDA’s multi-signal design reduces saturation risk while maintaining competitive trade-offs. Table [3](https://arxiv.org/html/2605.16446#S4.T3) adds a compact empirical comparison with gradient-based MOO. On Adult, OPDA reaches lower DP than PCGrad/CAGrad, with utility comparable to CAGrad and below PCGrad; on ACSIncome, PCGrad/CAGrad recover utility but remain less fair; on COMPAS, gradient-only methods exhibit non-zero saturation, while OPDA remains at Sat.% $= 0$ and is more fair than both PCGrad and CAGrad. Table [4](https://arxiv.org/html/2605.16446#S4.T4) summarizes ranges over five pseudo-labelers under one fixed configuration. The controller remains in the same qualitative regime across backbones, so the observed behavior is not tied to a single SSL pipeline.

Table 3: Compact empirical comparison with gradient-based MOO methods on the FixMatch backbone. Cells report mean Macro-F1 / DP / Sat.%. Sat.% is omitted for Static-$\lambda$ because it is a fixed-weight baseline. Full mean $\pm$ std tables are provided in the supplementary PDF.

Table 4: Cross-backbone ranges over five pseudo-labelers (FixMatch, MeanTeacher, NoisyStudent, PseudoLabel, UDA). Cells report the range of mean Macro-F1 / mean DP obtained with a single fixed configuration per controller. This summarizes whether OPDA’s operating regime depends on a specific SSL backbone.
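
For readers unfamiliar with the gradient-based baselines, the following sketch shows the standard PCGrad projection step (Yu et al., 2020) for two objectives, here named task and fairness. It operates on plain Python lists for clarity. How the paper wires this step into the FixMatch training loop is not described in this section, so the function names and the two-objective setup are assumptions.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflict(g_i: Vector, g_j: Vector) -> Vector:
    """If g_i conflicts with g_j (negative inner product), remove from g_i
    its component along g_j; otherwise return a copy of g_i unchanged."""
    inner = dot(g_i, g_j)
    norm_sq = dot(g_j, g_j)
    if inner >= 0.0 or norm_sq == 0.0:
        return list(g_i)
    scale = inner / norm_sq
    return [x - scale * y for x, y in zip(g_i, g_j)]


def pcgrad_two_task(g_task: Vector, g_fair: Vector) -> Vector:
    """Sum of the two gradients after each is projected against the other."""
    p_task = project_conflict(g_task, g_fair)
    p_fair = project_conflict(g_fair, g_task)
    return [a + b for a, b in zip(p_task, p_fair)]


# Conflicting gradients: the inner product is -1.
update = pcgrad_two_task([1.0, 0.0], [-1.0, 1.0])
```

The projection uses only gradient geometry. It has no access to pseudo-label coverage or violation signals, which is the distinction the comparison in Table 3 draws against OPDA's multi-signal controller.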

### 4.3 Ablation and Sensitivity

##### OPDA\-lite ablation\.

OPDA\-lite removes gradient alignment gating and replaces OPDA’s full multi\-signal budget logic with a simplified budget\-and\-allocation controller\. Relative to OPDA\-lite, full OPDA shifts the operating point toward utility preservation when pseudo\-label feasibility is fragile\. On Adult, the two methods achieve essentially the same Macro\-F1, while OPDA\-lite attains a lower DP gap\. On ACSIncome, full OPDA recovers higher Macro\-F1 at the cost of a larger DP gap\. Detailed ablation results are provided in supplementary Appendix I\.

##### Sensitivity analysis\.

We evaluate robustness to controller constants: smoothing parameters ($\rho_{\mathrm{EMA}}$), noise margins ($m_v, m_r$), budget bounds ($B_{\min}, B_{\max}$), and allocation floor ($\pi_{\min}$). OPDA remains stable under moderate perturbations around defaults. Detailed sensitivity results are provided in supplementary Appendix H.

### 4.4 Per-Dimension Decomposition

Tables [5](https://arxiv.org/html/2605.16446#S4.T5) and [6](https://arxiv.org/html/2605.16446#S4.T6) report per-dimension metrics isolating primary income prediction. In addition to per-dimension DP gap and F1, these tables report Binary DP and Binary EOd on the primary task. On the Adult income dimension, OPDA achieves F1 $= 0.69$, DP gap $= 0.15$ vs. Static-$\lambda = 10$ with F1 $= 0.49$, DP gap $= 0.08$. On the ACS income dimension, OPDA achieves F1 $= 0.68$, DP gap $= 0.39$ vs. Static-$\lambda = 10$ with F1 $= 0.66$, DP gap $= 0.23$. These show that OPDA’s utility-preserving operating point on the aggregate metrics is not an artifact of multi-label aggregation, while the utility–fairness trade-off remains visible on the primary task.

Table 5: Label-block decomposition on Adult (income dimension). Per-block DP gap and Macro-F1, plus Binary DP and Binary EOd on the primary income task.

Table 6: Label-block decomposition on ACSIncome (target1 dimension). Target1 is the primary income prediction task; Binary DP and Binary EOd are reported on that task.
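
For reference, the binary fairness metrics named in these tables can be computed as in the sketch below. It assumes a binary sensitive attribute coded 0/1 and hard 0/1 predictions. Equalized-odds conventions vary; this sketch takes the larger of the TPR gap and the FPR gap, which is an assumption and may differ from the paper's exact definition. The sample arrays are made up.

```python
from typing import Sequence


def rate(preds: Sequence[int]) -> float:
    """Positive-prediction rate; 0.0 for an empty group."""
    return sum(preds) / len(preds) if preds else 0.0


def dp_gap(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|P(yhat=1 | a=0) - P(yhat=1 | a=1)| for a binary sensitive attribute."""
    g0 = [p for p, a in zip(y_pred, group) if a == 0]
    g1 = [p for p, a in zip(y_pred, group) if a == 1]
    return abs(rate(g0) - rate(g1))


def eod_gap(y_true: Sequence[int], y_pred: Sequence[int],
            group: Sequence[int]) -> float:
    """Larger of the TPR gap and the FPR gap across the two groups."""
    gaps = []
    for label in (1, 0):  # label=1 yields the TPR gap, label=0 the FPR gap
        sel_pred = [p for p, t in zip(y_pred, y_true) if t == label]
        sel_group = [a for a, t in zip(group, y_true) if t == label]
        gaps.append(dp_gap(sel_pred, sel_group))
    return max(gaps)


# Illustrative data: four samples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
dp = dp_gap(y_pred, group)
eod = eod_gap(y_true, y_pred, group)
constant_dp = dp_gap([0] * 8, group)  # a constant predictor has zero DP gap
```

The last line shows why a low DP gap has to be read together with F1: a constant predictor attains a DP gap of zero, which is the Trivial Saturation regime rather than a fair and useful operating point.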

## 5 Conclusion

We studied fair semi-supervised training under confidence gating and moment-matching regularization. We identified two structural failure modes, Masking Collapse and Trivial Saturation, arising from the non-stationary coupling between the pseudo-label mask and the fairness constraint. We proposed OPDA, an online controller that adaptively schedules dual weights via a budget–allocation reparameterization. In tabular multi-output stress tests, OPDA provides a calibration-free path to controlled operating points under label scarcity. While our evaluation focuses on diagnostic stress tests to isolate failure mechanisms, the identified pathologies (Masking Collapse, Trivial Saturation) are inherent to confidence-gated fair SSL, and OPDA’s design principles extend to standard single-label deployment scenarios. While not universally superior across all metrics and datasets, OPDA’s value lies in diagnosing and managing failure regimes without per-dataset tuning. Comparisons with single-signal adaptive baselines provide additional evidence: PI exhibits trivial saturation across most target-violation settings, while EMA-P and DualAsc are comparatively robust on Adult but substantially more sensitive on ACSIncome. Relative to the OPDA-lite ablation, full OPDA mainly shifts the operating point toward higher utility on ACSIncome, whereas Adult highlights the fairness–utility trade-off between the two variants.

##### Limitations\.

Our study focuses on tabular multi-output benchmarks (Adult, ACSIncome, COMPAS). Whether the identified failure modes manifest in high-dimensional image/text fair SSL remains open. The adaptive baselines are intentionally minimal (single-signal); comparing against richer Pareto-based strategies is a natural next step. On ACSIncome, OPDA exhibits larger DP-gap variance on the multi-label aggregate, which illustrates its positioning as a controller that avoids per-dataset weight selection rather than as a guarantee of uniform fairness improvement. OPDA requires pre-specifying controller constants such as the warmup length $T_{\mathrm{warm}}$ and the budget bounds, though we demonstrate robustness across a diagnostic range.

##### Future directions\.

\(i\) Extending OPDA to high\-dimensional image/text benchmarks; \(ii\) systematic comparison with Pareto\-based adaptive strategies; \(iii\) investigating dimension\-weighted fairness variants where practitioners specify label\-specific fairness relevance\.

##### Use of Generative AI\.

We used generative AI tools only for language polishing and grammar correction\. All technical content, experiments, analysis, and final wording were verified by the authors, who take full responsibility for the manuscript\.

## References

- \[1\] J. Angwin, J. Larson, S. Mattu, and L. Kirchner (2016-05-23). Machine bias: there’s software used across the country to predict future criminals. and it’s biased against blacks. *ProPublica*. Accessed: 2026-03-13. [Link](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
- \[2\] E. Arazo, D. Ortego, P. Albert, N. E. O’Connor, and K. McGuinness (2020). Pseudo-labeling and confirmation bias in deep semi-supervised learning. In International Joint Conference on Neural Networks (IJCNN), pp. 1–8. [Document](https://dx.doi.org/10.1109/IJCNN48605.2020.9207304)
- \[3\] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz (2019). Invariant risk minimization. arXiv preprint arXiv:1907.02893.
- \[4\] A. G. Baydin, R. Cornish, D. M. Rubio, M. Schmidt, and F. Wood (2018). Online learning rate adaptation with hypergradient descent. In International Conference on Learning Representations (ICLR).
- \[5\] D. Berthelot, N. Carlini, E. D. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel (2019). MixMatch: a holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS).
- \[6\] B. Brubach, D. Chakrabarti, J. P. Dickerson, A. Srinivasan, and L. Tsepenekas (2021). Fairness, semi-supervised learning, and more: a general framework for clustering with stochastic pairwise constraints. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 6822–6830.
- \[7\] H. Chen, R. Tao, Y. Fan, Z. Zhang, and X. Jing (2023). SoftMatch: addressing the quantity-quality tradeoff in semi-supervised learning. In International Conference on Learning Representations (ICLR).
- \[8\] Z. Chen, V. Badrinarayanan, C. Lee, and A. Rabinovich (2018). GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 794–803. [Link](https://proceedings.mlr.press/v80/chen18a.html)
- \[9\] F. Ding, M. Hardt, J. P. Miller, and L. Schmidt (2021). Retiring adult: new datasets for fair machine learning. In Advances in Neural Information Processing Systems (NeurIPS).
- \[10\] Y. Grandvalet and Y. Bengio (2004). Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 17, pp. 529–536.
- \[11\] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola (2012). A kernel two-sample test. Journal of Machine Learning Research 13, pp. 723–773.
- \[12\] M. Hardt, E. Price, and N. Srebro (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (NeurIPS).
- \[13\] E. Hazan (2016). Introduction to online convex optimization. Foundations and Trends in Optimization 2(3–4), pp. 157–325. [Document](https://dx.doi.org/10.1561/2400000013)
- \[14\] F. Kamiran and T. Calders (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33(1).
- \[15\] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma (2011). Fairness-aware learning through regularization approach. In IEEE International Conference on Data Mining Workshops (ICDMW).
- \[16\] R. Kohavi (1996). Census income. UCI Machine Learning Repository. [Document](https://dx.doi.org/10.24432/C5GP7S)
- \[17\] B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu (2021). CAGrad: conflict-averse gradient descent for multi-task learning. In Advances in Neural Information Processing Systems (NeurIPS).
- \[18\] T. Liu, X. Liu, W. Jiang, V. Saligrama, C. Rudin, and Y. Wu (2023). SimFair: a unified framework for fairness-aware multi-label classification. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). [Document](https://dx.doi.org/10.1609/aaai.v37i12.26677)
- \[19\] D. Madras, E. Creager, T. Pitassi, and R. Zemel (2018). Learning adversarially fair and transferable representations. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 3384–3393. [Link](https://proceedings.mlr.press/v80/madras18a.html)
- \[20\] H. Schmutz, O. Humbert, and P. Mattei (2023). Don’t fear the unlabelled: safe semi-supervised learning via debiasing. In The Eleventh International Conference on Learning Representations. [Link](https://openreview.net/forum?id=TN9gQ4x0Ep3)
- \[21\] O. Sener and V. Koltun (2018). Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems (NeurIPS).
- \[22\] K. Sohn, D. Berthelot, C. Li, Z. Zhang, N. Carlini, E. D. Cubuk, A. Kurakin, H. Zhang, and C. Raffel (2020). FixMatch: simplifying semi-supervised learning with consistency and confidence. In Advances in Neural Information Processing Systems (NeurIPS).
- \[23\] A. Tarvainen and H. Valpola (2017). Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems (NeurIPS).
- \[24\] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn (2020). Gradient surgery for multi-task learning. In Advances in Neural Information Processing Systems (NeurIPS).
- \[25\] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi (2017). Fairness constraints: mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, A. Singh and J. Zhu (Eds.), Proceedings of Machine Learning Research, Vol. 54, pp. 962–970. [Link](https://proceedings.mlr.press/v54/zafar17a.html)
- \[26\] B. Zhang, Y. Wang, W. Hou, H. Wu, J. Wang, and M. Okumura (2021). FlexMatch: boosting semi-supervised learning with curriculum pseudo labeling. In Advances in Neural Information Processing Systems (NeurIPS).
- \[27\] J. Zhang, I. Mitliagkas, and C. Ré (2019). YellowFin and the art of momentum tuning. In Proceedings of Machine Learning and Systems (MLSys).
- \[28\] M. Zinkevich (2003). Online convex programming and generalized infinitesimal gradient ascent. Technical Report CMU-CS-03-110, Carnegie Mellon University.
