Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization
Summary
This paper establishes a characterization of the sum-of-squares degree barriers for the reweighted-hinge method in robust halfspace learning using the Christoffel function, revealing a margin-degree tradeoff and explicit outlier barriers.
View Cached Full Text
Cached at: 06/17/26, 05:36 AM
# Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization
Source: [https://arxiv.org/html/2606.17215](https://arxiv.org/html/2606.17215)
###### Abstract
A certificate that removes outliers sees the data only through its low\-degree moments, and an adversary exploits exactly this, hiding corruption where the clean data already looks typical, in the blind spot no bounded\-degree test resolves\. That blind spot turns out to have an exact size: the Christoffel function of the clean marginal, the very quantity modern data analysis thresholds to*detect*outliers, here read from the adversary’s side as the corruption a bounded\-degree certificate cannot*remove*\. We turn this inversion into the organizing principle of the reweighted\-hinge approach to robustly learning \-margin halfspaces under malicious noise\(Shen25;ZengShen25\): the governing resource is the Sum\-of\-Squares*degree*of the outlier\-removal certificate, and the*resolution principle*states that the maximal corruption mass which can hide at a centerccfrom a degree\-2t2tcertificate is exactly the Christoffel function\(c\)t\+1\{\}\_\{t\+1\}\(c\)of the clean marginal\. Three consequences follow, all against the certificate method \(not information\-theoretic\)\. A*margin–degree tradeoff*: certifying the dense pancake to error costs SoS degree\(log\(1/\)\)\\Omega\(\\log\(1/\\varepsilon\)\)or margin\(log\(1/\)/d\)\\Omega\(\\sqrt\{\\log\(1/\\varepsilon\)\}/\\sqrt\{d\}\), explaining why thelog\(1/\)\\log\(1/\\varepsilon\)marginShen25records is*forced*, with a weighted\-Chebyshev reduction making the threshold2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\\lvert c\\rvert/s\)^\{2\}\)tight modulo one classical weighted\-extremal estimate\. A*degree\-22outlier barrier*: the resolution principle realized as an explicit instance on which degree22is stuck at1/2while degree44escapes, locating the method’s small breakdown rate in the degree, not the analysis\. And a*degree\-2t2talgorithm*tracing the frontier1\-1/2t\(recoveringShen25att=1t=1\), whose gain is an explicit constant, capped by the pancake density and shown unimprovable by the degree\-22barrier\.
cc\(c\)t\+1\{\}\_\{t\+1\}\(c\)large herekeep the point\(it is typical\)small : outlierdata analysislarge==typical \(keep the point\)*samet\+1,**opposite reading*cc\(c\)t\+1\{\}\_\{t\+1\}\(c\)large hereadversary hides\(impossible to clear\)small : certifiablethis paperlarge==the adversary can hide here
Figure 1:The inversion\.The same Christoffel curve\(c\)t\+1\{\}\_\{t\+1\}\(c\)carries two opposite meanings\. In moment\-based data analysis a large value marks a typical point to keep; here a large value is exactly the corruption mass an adversary can hide atccthat no degree\-2t2tcertificate can clear\. One object, read as a detector on the left and as a barrier on the right\.
###### Contents
1. [1Introduction](https://arxiv.org/html/2606.17215#S1)1. [1\.1Technical overview](https://arxiv.org/html/2606.17215#S1.SS1) 2. [1\.2Related work](https://arxiv.org/html/2606.17215#S1.SS2)
2. [2Preliminaries](https://arxiv.org/html/2606.17215#S2)1. [2\.1Auxiliary results](https://arxiv.org/html/2606.17215#S2.SS1)
3. [3The Christoffel function and the certificate ceiling](https://arxiv.org/html/2606.17215#S3)1. [3\.1The certificate ceiling is a truncated moment problem](https://arxiv.org/html/2606.17215#S3.SS1) 2. [3\.2The Christoffel function governs the resolution](https://arxiv.org/html/2606.17215#S3.SS2) 3. [3\.3The degree of robustness is the Christoffel degree](https://arxiv.org/html/2606.17215#S3.SS3) 4. [3\.4From barrier to law: a weighted\-Chebyshev reduction](https://arxiv.org/html/2606.17215#S3.SS4)
4. [4Consequences: barriers and a matching algorithm](https://arxiv.org/html/2606.17215#S4)1. [4\.1The margin–degree tradeoff](https://arxiv.org/html/2606.17215#S4.SS1) 2. [4\.2The degree\-2 outlier barrier](https://arxiv.org/html/2606.17215#S4.SS2) 3. [4\.3The degree\-2t2treweighted hinge](https://arxiv.org/html/2606.17215#S4.SS3) 4. [4\.4The information\-theoretic floor](https://arxiv.org/html/2606.17215#S4.SS4) 5. [4\.5Supporting lemmas](https://arxiv.org/html/2606.17215#S4.SS5)
5. [5Discussion](https://arxiv.org/html/2606.17215#S5)1. [5\.1Context and comparisons](https://arxiv.org/html/2606.17215#S5.SS1) 2. [5\.2Open problems](https://arxiv.org/html/2606.17215#S5.SS2) 3. [5\.3Conclusion](https://arxiv.org/html/2606.17215#S5.SS3)
6. [References](https://arxiv.org/html/2606.17215#bib)
7. [ANotation and map of results](https://arxiv.org/html/2606.17215#A1)1. [A\.1Notation](https://arxiv.org/html/2606.17215#A1.SS1) 2. [A\.2Map of results](https://arxiv.org/html/2606.17215#A1.SS2)
8. [BDeferred proofs](https://arxiv.org/html/2606.17215#A2)1. [B\.1The margin–degree tradeoff](https://arxiv.org/html/2606.17215#A2.SS1) 2. [B\.2Minorant reduction and Gauss quadrature](https://arxiv.org/html/2606.17215#A2.SS2) 3. [B\.3The largest\-root bound](https://arxiv.org/html/2606.17215#A2.SS3) 4. [B\.4Log\-concave quantiles and margin geometry](https://arxiv.org/html/2606.17215#A2.SS4) 5. [B\.5The degree\-2t2treweighted hinge](https://arxiv.org/html/2606.17215#A2.SS5) 6. [B\.6The degree\-2 outlier barrier](https://arxiv.org/html/2606.17215#A2.SS6) 7. [B\.7The breakdown floor](https://arxiv.org/html/2606.17215#A2.SS7)
9. [CAuxiliary results](https://arxiv.org/html/2606.17215#A3)1. [C\.1Robust learning and log\-concave inputs](https://arxiv.org/html/2606.17215#A3.SS1) 2. [C\.2Orthogonal polynomials and the moment problem](https://arxiv.org/html/2606.17215#A3.SS2)
10. [DFrom sum\-of\-squares certificates to a semidefinite program](https://arxiv.org/html/2606.17215#A4)
11. [EExtension to nasty noise](https://arxiv.org/html/2606.17215#A5)
12. [FAdditional related work](https://arxiv.org/html/2606.17215#A6)
13. [GA closer look atShen25](https://arxiv.org/html/2606.17215#A7)1. [G\.1The margin–accuracy coupling](https://arxiv.org/html/2606.17215#A7.SS1) 2. [G\.2The breakdown constant of the degree\-22analysis](https://arxiv.org/html/2606.17215#A7.SS2) 3. [G\.3A natural reconstruction of the reweighting program](https://arxiv.org/html/2606.17215#A7.SS3)
## 1Introduction
The problem is to learn a halfspace efficiently when a constant fraction of the training data has been corrupted by an adversary\. In the malicious\-noise model ofKearnsLi93, an adversary observes the clean sample, then replaces an \-fraction of the \(instance, label\) pairs by arbitrary points of its choosing, with full knowledge of the learner and the clean distribution; the goal is to outputw^\\hat\{w\}whose error on the*clean*inliers is at most \. This is the strongest of the standard corruption models, and the central question is how large a noise rate a polynomial\-time learner can tolerate\.
One influential line of work answers this through a single algorithmic device: hinge\-loss minimization, later reweighted to suppress outliers\.Talwar20observed that plain hinge\-loss minimization already resists noise when the clean marginal places dense mass in a band \(a “pancake”\) around every point’s projection onto the separating direction\.Shen25turned this into a constant\-noise\-rate guarantee for log\-concave\-mixture marginals by*reweighting*the hinge: the weights come from a degree\-22outlier\-removal program \(a variance constraint\), and the analysis controls the dirty contribution to the hinge gradient through the quantityLSN\(q∘SD\)=sup∥w∥≤1∑i∈SDqi\|w⋅xi\|\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)=\\sup\_\{\\lVert w\\rVert\\leq 1\}\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\. The resulting algorithm tolerates a constant noise rate, but the constant comes with two limitations that the authors state plainly\. The tolerable rate is tiny \(=02−32\{\}\_\{0\}=2^\{\-32\}in the published version\), andShen25notes it “cannot be made close to the optimal breakdown point of12\\tfrac\{1\}\{2\}due to inherent limitations of our approach”; and the required margin carries an extra logarithmic factor,=\(log\(1/\)/d\)\\gamma=\\Omega\(\\log\(1/\\varepsilon\)/\\sqrt\{d\}\), a “logarithmic dependence on1/1/\\varepsilonthat appears less natural” than the\(1/d\)\\Omega\(1/\\sqrt\{d\}\)margin one would hope for\. The same two features recur in the sparse follow\-up ofZengShen25\.
The margin’s dependence on is a chicken\-and\-egg problem\. A margin is a fixed property of the data distribution, whereas the accuracy is the learner’s target, set at will once the problem is in hand; requiring=\(log\(1/\)/d\)\\gamma=\\Omega\(\\log\(1/\\varepsilon\)/\\sqrt\{d\}\)couples the two backwards, demanding*better\-separated data*exactly when the learner asks for smaller error, a property the data cannot supply on request\. One would rather fix the geometry once and buy accuracy with computation, not with margin\.
This paper explains both limitations\. We argue that they are not artifacts of a particular constant or a loose inequality, but consequences of a single resource that governs the whole approach: the polynomial, equivalently Sum\-of\-Squares \(SoS\),*degree*of the outlier\-removal certificate\. The reweighted\-hinge method needs two distributional facts about the clean marginal, and both are supplied by certificates that read off only the low\-order moments: a*lower*bound on the mass of the dense pancake \(so the hinge has signal\), and an*upper*bound on a reweighted moment of the dirty points \(so the outlier program can shrinkLSN\\mathrm\{LSN\}\)\. Each is a degree\-2t2tSoS proof from the first2t2tmoments\. We show that the degree these proofs require is exactly what controls the margin factor and the noise rate, and thatShen25’s degree\-22program sits at a corner of a tight frontier\.
We frame the contribution, throughout, in terms of degree: degree is to this method what the approximate degree of a predicate is to the polynomial method in algorithm design – the parameter that simultaneously buys the guarantee and bounds it\. Our two lower bounds are degree barriers; our algorithm is a degree\-2t2tprogram that traces the frontier the barriers cut out\. As with any barrier of this kind, the scope is essential and we state it up front: the lower bounds are against the*moment/SoS\-certificate method*– any procedure whose only distributional input is the degree\-≤2t\\leq 2tcertified moments of the clean marginal\. They are not information\-theoretic, and they are not unconditional Statistical\-Query hardness\. The margin–degree barrier strengthens, per direction, to a genuine degree\-2t2tSum\-of\-Squares lower bound \([Remark˜3\.1](https://arxiv.org/html/2606.17215#S3.Thmtheorem1)\); lifting it to the fulldd\-dimensional reweighting program jointly is open fort≥2t\\geq 2\([Section˜5\.2](https://arxiv.org/html/2606.17215#S5.SS2)\)\. This is precisely the class of methods to whichShen25’s “our approach” belongs, which is why the barriers explain that line’s admitted limits rather than contradicting them\.
#### The margin–degree tradeoff\.
Our first result \([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\) is a barrier on certifying band mass\. Fix a directionwwand letU=w⋅xU=w\\cdot xbe the \(centered, log\-concave\) projection with standard deviationss\. Consider a band of half\-width=\(s\)\\tau=\\Theta\(s\)centered atcc, a distance\|c\|\\lvert c\\rvertoff the origin\. Any degree\-2t2tSoS certificate that provesPr\[\|U−c\|≤\]≥=\(1\)\\Pr\[\\,\\lvert U\-c\\rvert\\leq\\tau\\,\]\\geq\\rho=\\Omega\(1\)from the first2t2tmoments ofUUrequires
2t=\(\(\|c\|/s\)2\)\(sub\-Gaussian\),2t=\(\|c\|/s\)\(sub\-exponential\)\.2t=\\Omega\\big\(\(\\lvert c\\rvert/s\)^\{2\}\\big\)\\quad\\text\{\(sub\-Gaussian\)\},\\qquad 2t=\\Omega\\big\(\\lvert c\\rvert/s\\big\)\\quad\\text\{\(sub\-exponential\)\}\.A certificate learner that reaches error must certify the pancake around all but an \-fraction of the sample projections, whose worst case is an \-quantile of the marginal, an off\-center band at\|c\|/s=\(log\(1/\)\)\\lvert c\\rvert/s=\\Theta\(\\log\(1/\\varepsilon\)\)for the sub\-exponential tails of a general log\-concave law\. Substituting gives the dichotomy: such a learner must pay*either*SoS degreet=\(log\(1/\)\)t=\\Omega\(\\log\(1/\\varepsilon\)\), hence running timen\(log\(1/\)\)n^\{\\Omega\(\\log\(1/\\varepsilon\)\)\},*or*margin=\(log\(1/\)/d\)\\gamma=\\Omega\(\\sqrt\{\\log\(1/\\varepsilon\)\}/\\sqrt\{d\}\)\. Thelog\(1/\)\\log\(1/\\varepsilon\)inShen25’s margin is therefore*conserved*: it can be traded against SoS degree, but inside the one\-shot certificate model it cannot be removed\. This is the structural reason the margin is “less natural\.”
The same calculation explains a finer point: whyShen25payslog\(1/\)\\log\(1/\\varepsilon\)rather thanlog\(1/\)\\sqrt\{\\log\(1/\\varepsilon\)\}\. The square root appears only for genuinely sub\-Gaussian marginals; for the sub\-exponential tails \(=1\\alpha=1\) of a general log\-concave mixture, the worst band sits at\|c\|/s=\(log\(1/\)\)\\lvert c\\rvert/s=\\Theta\(\\log\(1/\\varepsilon\)\)and the degree bound is linear in\|c\|/s\\lvert c\\rvert/s, so the margin pays the full logarithm\.
#### The degree\-22outlier barrier\.
Our second result \([Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\) is a barrier on the outlier\-removal program itself, and it pinsShen25’s degree\-22choice as the bottleneck\. There is a clean log\-concave mixture \(⪯I2\\Sigma\\preceq\{\}^\{2\}I\) and a malicious set of rate≥\(1/d\)\\eta\\geq\\Omega\(1/d\)on which*every label\-oblivious degree\-22\(variance\) reweighting*\(any feasibleqqproduced without ground\-truth labels\) leaves, in expectation over the adversary’s choice of dirty points,
E\[LSN\(q∘SD\)\]=\(n1/2¯\)\.\\mdmathbb\{E\}\\big\[\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\big\]=\\Omega\\big\(\{\}^\{1/2\}\\,n\\,\\bar\{\\sigma\}\\big\)\.So a degree\-22outlier program is stuck at the1/2rate, exactly the rate ofShen25\. The barrier is label\-oblivious in expectation by necessity: the literal “for all feasibleqq” statement is false once the removal budget≥\\xi\\geq\\eta, since the feasible polytope then contains aqqthat simply zeroes out the spike; what no label\-oblivious program can do is*find*thatqq, because it cannot tell the dirty points from genuine clean points at the same locations\. The construction needs≥\(1/d\)\\eta\\geq\\Omega\(1/d\)so that two genuine log\-concave clusters fit inside the covariance budget; we state this regime in the theorem\.
#### The degree\-2t2talgorithm\.
Our third result \([Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\) shows the frontier the two barriers cut out is achievable, by replacingShen25’s variance program with a degree\-2t2tSoS program\. The degree\-2t2treweighting certifies, for every feasibleqq,
LSN\(q∘SD\)≤n\(Ct\)¯,1−12t\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\;\\leq\\;n\\,\(Ct\)\\,\\bar\{\\sigma\}\\,\{\}^\{1\-\\frac\{1\}\{2t\}\},hence tolerates
≤\(t\)0=\(min\{4,\(\(Ct\)¯\)2t2t−1\}\)\\eta\\;\\leq\\;\{\}\_\{0\}\(t\)\\;=\\;\\Omega\\\!\\left\(\\min\\\!\\left\\\{\\frac\{\\rho\}\{4\},\\;\\left\(\\frac\{\\gamma\\rho\}\{\(Ct\)\\bar\{\\sigma\}\}\\right\)^\{\\\!\\frac\{2t\}\{2t\-1\}\}\\right\\\}\\right\)with clean inlier error≤\\leq\\varepsilon, in timenO\(t\)n^\{O\(t\)\}\. Att=1t=1this is exactlyShen25’s1/2rate; asttgrows, the exponent1−1/2t→11\-1/2t\\to 1removes the degree\-22*exponent*loss\. The gain is that removal of an exponent \(∼0c2→c\{\}\_\{0\}\\sim c^\{2\}\\to cin the outlier bottleneck\), bought with the sub\-exponential certifiable\-moment constant\(Ct\)2t\(Ct\)^\{2t\}, not\(Ct\)t\(Ct\)^\{t\}, which is false for general log\-concave marginals \(for the Laplace lawm2t/m2t=\(\(t\)\)2tm\_\{2t\}/m\_\{2\}^\{t\}=\(\\Theta\(t\)\)^\{2t\}\)\. The base is\(Ct\)\(Ct\), not its square root; the rate is capped by the pancake density/4\\rho/4and does*not*approach1/21/2\. SoShen25’s degree\-22program is the degree\-22corner of a tight frontier:[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)shows degree22cannot beat1/2, and[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)shows degree2t2treaches1\-1/2tand no degree does better against the matching barrier\.
#### The breakdown floor\.
Finally \([Proposition˜4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)\), a two\-point argument shows the noise tolerance has a hard ceiling that no algorithm of any kind can cross: under malicious noise with a \-margin log\-concave mixture, no learner attains clean inlier error below/\(2\(1−\)\)≥/2\\eta/\(2\(1\-\\eta\)\)\\geq\\eta/2\. This is rate\-tight against[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)’s\(\)\\Theta\(\\eta\)guarantee, and confirms that the\(\)\\Theta\(\\eta\)dependence is correct\. It is a*rate*floor, not a threshold: the construction \(Le Cam’s two points, through the Kearns–Li envelopeTV≤/\(1−\)\\mathrm\{TV\}\\leq\\eta/\(1\-\\eta\)and the/\\theta/\\pirandom\-hyperplane disagreement identity\) certifies the exponent but not the exact breakdown constant\. The tight, margin\-dependent breakdown rate=⋆\(/\)\{\}^\{\\star\}=\\Theta\(\\gamma/\\sigma\)remains open\.
#### Contributions\.
- •A Christoffel\-function framework\([Section˜3](https://arxiv.org/html/2606.17215#S3)\), which we put first because it organizes everything else\. The maximal corruption mass that can hide at a centerccfrom a degree\-2t2tmoment certificate is exactly the Christoffel function\(c\)t\+1\{\}\_\{t\+1\}\(c\)of the clean marginal \(*resolution principle*,[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)\)\. This ties the SoS*degree*to the order at which the Christoffel function resolves a corruption\. The same function modern data analysis uses to*keep*a typical point \(a large Christoffel value reads as “in the bulk”\) is, on the population, exactly the mass an adversary can*hide*: one object, opposite sign, detection turning into impossibility \([Section˜3\.3](https://arxiv.org/html/2606.17215#S3.SS3)\)\. The barriers and the algorithm below are three readings of it\.
- •A margin–degree tradeoff\([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\): reading the resolution principle off an off\-center band, certifying a dense band at\|c\|/s\\lvert c\\rvert/sstandard deviations off center costs SoS degree2t=\(\(\|c\|/s\)2\)2t=\\Omega\(\(\\lvert c\\rvert/s\)^\{2\}\)\(sub\-Gaussian;\(\|c\|/s\)\\Omega\(\\lvert c\\rvert/s\)sub\-exponential\), so a certificate learner of error pays degree\(log\(1/\)\)\\Omega\(\\log\(1/\\varepsilon\)\)or margin\(log\(1/\)/d\)\\Omega\(\\sqrt\{\\log\(1/\\varepsilon\)\}/\\sqrt\{d\}\)\. Thelog\(1/\)\\log\(1/\\varepsilon\)inShen25’s margin is conserved within the certificate model, and the weighted\-Chebyshev reduction \([Proposition˜3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)\) closes the bound to an exact law2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\\lvert c\\rvert/s\)^\{2\}\)\([Theorem˜4\.5](https://arxiv.org/html/2606.17215#S4.Thmtheorem5)\) modulo one classical weighted\-extremal estimate\.
- •A degree\-22outlier barrier\([Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\): the resolution principle atc=/c=\\sigma/\\sqrt\{\\eta\}predicts a degree\-22/degree\-44gap, which we realize as a clean log\-concave mixture on which every label\-oblivious degree\-22reweighting is stuck at1/2, matching the algorithm att=1t=1\.
- •A degree\-2t2treweighted\-hinge algorithm\([Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\): the order\-\(t\+1\)\(t\{\+\}1\)Christoffel detector run as a certificate, givingLSN\(q∘SD\)≤n\(Ct\)¯1−1/2t\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\leq n\(Ct\)\\bar\{\\sigma\}\\,\{\}^\{1\-1/2t\}in timenO\(t\)n^\{O\(t\)\}, removing the degree\-22exponent loss and tracing the frontier the barriers cut out\. The gain is an explicit constant capped by/4\\rho/4, not a march to1/21/2\.
- •A breakdown floor\([Proposition˜4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)\): a rate\-tight/2\\eta/2ceiling on the achievable inlier error, matching the algorithm’s\(\)\\Theta\(\\eta\)dependence\.
Together these say that for the reweighted\-hinge approach,*degree is the resource*and the Christoffel function is the object that sets it:[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)names it,[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)and[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)bracket the noise rate as a tight function of degree,[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)bounds the margin as a function of degree, and the two admitted limitations ofShen25are thet=1t=1shadow of one Christoffel\-degree frontier\.
Table 1:Where the reweighted\-hinge line stands, and what this paper adds\. “Cert\. degree” is the SoS degree of the outlier\-removal certificate; “Tolerated rate” is the malicious noise rate the analysis tolerates\. ForThis paper, the tolerated rate is the explicit constant\(t\)0\{\}\_\{0\}\(t\)of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9): the degree\-2t2talgorithm succeeds for≤\(t\)0\\eta\\leq\{\}\_\{0\}\(t\), which*rises with the degree*fromShen25’s tiny∼\(/¯\)2\\sim\(\\gamma\\rho/\\bar\{\\sigma\}\)^\{2\}\(the degree\-22squaring loss\) toward the pancake budget/4\\rho/4, never approaching1/21/2; the underlying dirty\-gradient bound decays as1\-1/2t\(Shen25’s1/2att=1t=1, tending to asttgrows\)\. TheTalwar20margin is listed as\(log\(1/\)/d\)\\Omega\(\\log\(1/\\varepsilon\)/\\sqrt\{d\}\)for uniformity with the column; that paper writes it\(logd/d\)\\Omega\(\\log d/\\sqrt\{d\}\), which is the same bound at inverse\-polynomial accuracy=1/poly\(d\)\\varepsilon=1/\\operatorname\{poly\}\(d\), wherelog\(1/\)=\(logd\)\\log\(1/\\varepsilon\)=\\Theta\(\\log d\)\. Our guarantees hold for both malicious and*nasty*noise, with the same frontier and only a constant\-factor smaller tolerable rate under nasty noise \([Appendix˜E](https://arxiv.org/html/2606.17215#A5)\)\.WorkNoise modelMarginalMarginTolerated rateCert\. degreeLower boundsTalwar20adversarial label,maliciouslog\-concave mix\.\(log\(1/\)d\)\\Omega\\left\(\\dfrac\{\\log\(1/\\varepsilon\)\}\{\\sqrt\{d\}\}\\right\)label\(1\)\\Omega\(1\), mal\.\(\)\\Omega\(\\gamma\)22noneShen25maliciouslog\-concave mix\.\(log\(1/\)d\)\\Omega\\left\(\\dfrac\{\\log\(1/\\varepsilon\)\}\{\\sqrt\{d\}\}\\right\)≤1232\\eta\\leq\\dfrac\{1\}\{2^\{32\}\}22noneZengShen25malicious,sparselog\-concave mix\.\(log\(1/\)d\)\\Omega\\left\(\\dfrac\{\\log\(1/\\varepsilon\)\}\{\\sqrt\{d\}\}\\right\)≤1232\\eta\\leq\\dfrac\{1\}\{2^\{32\}\}22noneThis papermalicious,nastylog\-concave mix\.\(1d\)\\Theta\\left\(\\dfrac\{1\}\{\\sqrt\{d\}\}\\right\)≤\(t\)0\\eta\\leq\{\}\_\{0\}\(t\)2t2t2 barriers\(Thm[4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1),[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\)*\(tradeoff\)**\(at most/4\\rho/4asttgrows\)**\(tunable\)*tight \(Thm[4\.5](https://arxiv.org/html/2606.17215#S4.Thmtheorem5)\); floor \(Prop[4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)\)
[Table˜1](https://arxiv.org/html/2606.17215#S1.T1)locates the contribution\. We are the first in this line to treat the certificate*degree*as the resource, and the first to prove*lower bounds*\(barriers\) for it; the upper\-bound noise gain is deliberately modest and is matched by our own degree\-22barrier\.
### 1\.1Technical overview
Three ideas drive the results; we sketch each before the formal proofs in[Sections˜3](https://arxiv.org/html/2606.17215#S3)and[4](https://arxiv.org/html/2606.17215#S4)and the appendix\. All three are facets of the same Christoffel\-function picture \([Section˜3](https://arxiv.org/html/2606.17215#S3)\): Gauss quadrature locates where the Christoffel function resolves a band, the moment cloak realizes the smallest unresolved spike, and the degree\-2t2tHölder bound is that detector run as a certificate\.
#### \(a\) Gauss quadrature beats the moment problem\.
A degree\-2t2tcertificate of band mass is, after a minorant reduction, a polynomialppof degree≤2t\\leq 2twithp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}together with an SoS proof, from the moments alone, that∫p≥\\int p\\geq\\rho\. Because the proof uses onlym0,…,m2tm\_\{0\},\\dots,m\_\{2t\}, its conclusion must hold for*every*nonnegative measure matching those moments\. Gauss quadrature produces one such measure explicitly: the\(t\+1\)\(t\+1\)\-point measure on the roots of the degree\-\(t\+1\)\(t\+1\)orthogonal polynomial of reproduces all moments up to order2t\+12t\+1, and it is supported on justt\+1t\+1atoms\. If the bandBBcontains*no*quadrature node, this witness assigns it mass zero, so the best certifiable value is zero and no degree\-2t2tcertificate can exist\. A band stays node\-free as long as it lies beyond the largest root, which sits at\(st\)\\Theta\(s\\sqrt\{t\}\)for sub\-Gaussian weights and\(st\)\\Theta\(s\\,t\)for sub\-exponential ones\. An off\-center band at distance\|c\|\\lvert c\\rvertis thus invisible below degree\(\|c\|/s\)2\(\\lvert c\\rvert/s\)^\{2\}, the source of the tradeoff\. The largest\-root bound is elementary in the two cases we name \(Jacobi\-matrix eigenvalues with Gershgorin for the sub\-Gaussian2st\+12s\\sqrt\{t\+1\}and sub\-exponentialO\(st\)O\(st\)endpoints\); the fully general log\-concave exponent is the one place we invoke a black box, the exponential\-weight asymptotics ofLevinLubinsky01, and we flag it\.
#### \(b\) The moment\-cloaked spike\.
The degree\-22barrier needs an instance that a variance program cannot clean but a higher\-degree program can\. The construction is a*moment\-cloaked spike*: two genuine log\-concave clusters at±\(/\)e1\\pm\(\\sigma/\\sqrt\{\\eta\}\)\\,e\_\{1\}, to which the adversary addsn\\eta ndirty points drawn from the same clusters\. The dirty points are engineered to be invisible to the second moment \(they spend exactly the variance budget,1n∑SD\(e1⋅xi\)2=⋅\(/\)2=2\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{2\}=\\eta\\cdot\(\\sigma/\\sqrt\{\\eta\}\)^\{2\}=\{\}^\{2\}\) while being first\-moment\-aligned, so they contribute∑SD\|e1⋅xi\|=n\\sum\_\{S\_\{D\}\}\\lvert e\_\{1\}\\cdot x\_\{i\}\\rvert=\\sqrt\{\\eta\}\\,n\\sigmatoLSN\\mathrm\{LSN\}\. Because the variance constraints see only the multiset of points, a label\-oblivious program cannot distinguish then\\eta ndirty points from the2n2\\eta ngenuine clean points at the same locations; removing its budget of≤n\\leq\\xi nleaves at least a third of the spike weight, henceE\[LSN\]≥13n\\mdmathbb\{E\}\[\\mathrm\{LSN\}\]\\geq\\frac\{1\}\{3\}\\sqrt\{\\eta\}\\,n\\sigma\. A degree\-44program escapes, because the spike inflates the fourth moment to¯4/≫¯4\\bar\{\\sigma\}^\{4\}/\\eta\\gg\\bar\{\\sigma\}^\{4\}and is no longer cloaked, which is exactly why the barrier is specific to degree22\.
#### \(c\) Degree\-2t2tHölder onLSN\\mathrm\{LSN\}\.
The algorithm is the dual move\. Given a degree\-2t2tSoS certificate that the reweighted2t2t\-th moment is bounded,1n∑iqi\(w⋅xi\)2t≤\(Ct\)2t¯2t\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\leq\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}for allww, Hölder’s inequality with exponent2t2tsplitsLSN\\mathrm\{LSN\}as
∑SDqi\|w⋅xi\|≤\(∑SDqi\)1−12t\(∑SDqi\(w⋅xi\)2t\)12t≤\(n\)1−12t\(n\(Ct\)2t¯2t\)12t=n1−12t\(Ct\)¯\.\\sum\_\{S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\leq\\Big\(\\sum\_\{S\_\{D\}\}q\_\{i\}\\Big\)^\{1\-\\frac\{1\}\{2t\}\}\\Big\(\\sum\_\{S\_\{D\}\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\Big\)^\{\\frac\{1\}\{2t\}\}\\leq\(\\eta n\)^\{1\-\\frac\{1\}\{2t\}\}\\big\(n\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}\\big\)^\{\\frac\{1\}\{2t\}\}=\{\}^\{1\-\\frac\{1\}\{2t\}\}n\(Ct\)\\bar\{\\sigma\}\.The even power\(w⋅x\)2t\(w\\cdot x\)^\{2t\}is a polynomial, so the moment bound is SoS\-certifiable; feasibility of the clean\-indicator weights is the certifiable\-hypercontractivity bound on the empirical clean measure\. The reason the exponent stops at11and never reaches the constant rate is the certifiable\-moment constant: for general \(sub\-exponential\) log\-concave marginals it is\(Ct\)2t\(Ct\)^\{2t\}, contributing the base\(Ct\)\(Ct\)to the final bound, so largerttimproves the \-exponent at a multiplicative cost in the constant, a genuine tradeoff, not a free lunch\. This is the sense in which degree buys noise tolerance, and the reason the purchase is bounded\.
#### Scope and open questions\.
Two cautions are built into the results above and we restate them together\. The barriers are*method\-specific*: they rule out the degree\-2t2tmoment/SoS\-certificate family, not all efficient algorithms, and not learners outside the one\-shot certificate model \(an iterative, localization\-style filter sidesteps one\-shot certification and is not covered\)\. And the algorithm’s gain is*modest*: it removes the degree\-22exponent loss but stays capped by/4\\rho/4and does not march toward1/21/2\. Several questions are left open, in increasing order of reach\. The breakdown floor is rate\-tight but not threshold\-tight; the conjectured margin\-dependent rate=⋆\(/\)\{\}^\{\\star\}=\\Theta\(\\gamma/\\sigma\)needs a multi\-point or band\-mass argument beyond two points\. The degree barrier is a lower bound; a matching*upper*certificate of degreeO\(\(\|c\|/s\)2\)O\(\(\\lvert c\\rvert/s\)^\{2\}\)would make the tradeoff exactly2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\\lvert c\\rvert/s\)^\{2\}\)\. The general\-log\-concave largest\-root bound still rests onLevinLubinsky01; removing that black box would make the barrier self\-contained\. And whether an iterative localization scheme provably escapes the margin–degree barrier \(by leaving the one\-shot certificate model the barrier scopes\) is the question whose answer would change the picture rather than refine it\.
### 1\.2Related work
The hinge approach we analyze originates withTalwar20and is sharpened by reweighting inShen25, thenZengShen25\.KlivansSTV25reach error2\+2\\eta\+\\varepsilonfor halfspaces under contamination, but for Gaussian marginals, without a margin, and by iterative filtering rather than a moment certificate, so they are a motivation for the certificate model and not subsumed by it\. The SoS machinery we import is from robust statistics\(HopkinsLi18;KothariSteinhardtSteurer18;KlivansKothariMeka18;DiakonikolasHopkinsPensiaTiegel24\); we use it to certify a moment*upper*bound and, for the barrier, to scope the certificate model precisely\. The dual direction \(certifiable*anti*\-concentration, an SoS lower bound on slab mass\(BakshiKothariRTV24\)\) is exactly what a degree\-2t2tband\-mass certificate would need and exactly what our[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)shows the low\-order moments cannot supply\. Lower bounds for nasty and agnostic halfspaces\(DiakonikolasKaneStewart18;DiakonikolasKaneZarifis20\)address different noise models and do not give a degree barrier forLSN\\mathrm\{LSN\}\. The orthogonal\-polynomial facts we use are classical\(Szego39;Gautschi04;LevinLubinsky01;KriecherbauerMcLaughlin99\), as are the log\-concave\(BrascampLieb76;LovaszVempala07\)and malicious\-noise\(KearnsLi93\)ingredients\.
## 2Preliminaries
This section fixes the noise model, the distributional assumptions, the reweighted\-hinge program whose dirty gradient we control, and the two algebraic objects that the rest of the paper turns on: the dense pancake and the degree\-2t2tmoment certificate of its band mass\. The reader should leave with one picture in mind\. Outlier removal in this line of work is a*polynomial*certificate of a band\-mass lower bound, its*degree*is the resource, and the distributional input that powers it is a degree\-2t2tcertifiable moment bound \([Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\); every barrier we prove is a lower bound on that degree\.
#### Notation\.
Vectors live inRd\\mdmathbb\{R\}^\{d\};∥⋅∥\\lVert\\cdot\\rVertis the Euclidean norm and⟨u,v⟩=u⋅v\\langle u,v\\rangle=u\\cdot vthe inner product\. For a halfspace directionwwwe writeU=w⋅xU=w\\cdot xfor the one\-dimensional projection\. A sample isS=\{\(xi,yi\)\}i=1nS=\\\{\(x\_\{i\},y\_\{i\}\)\\\}\_\{i=1\}^\{n\}withyi∈\{±1\}y\_\{i\}\\in\\\{\\pm 1\\\}; we split it into a clean partSCS\_\{C\}and a dirty partSDS\_\{D\}\(defined below\)\. Reweighting weights areq=\(q1,…,qn\)∈\[0,1\]nq=\(q\_\{1\},\\dots,q\_\{n\}\)\\in\[0,1\]^\{n\}, andq∘Sq\\circ Sdenotes the sample carrying those weights\. We use¯2\(w\)≔E𝒟X\[\(w⋅\(x−¯\)\)2\]\\bar\{\\sigma\}^\{2\}\(w\)\\coloneqq\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{X\}\}\\big\[\(w\\cdot\(x\-\\bar\{\\mu\}\)\)^\{2\}\\big\]for the projected variance of the clean marginal, with¯\\bar\{\\mu\}its mean\. Throughout,t≥1t\\geq 1is the degree parameter: the certificate is a degree\-2t2tobject and the SoS program runs in timenO\(t\)n^\{O\(t\)\}\. We write𝟏B\\mathbf\{1\}\_\{B\}for the indicator of a setB⊆RB\\subseteq\\mdmathbb\{R\}, andhinge\(w;x,y\)≔max\{0,1−y\(w⋅x\)\}\\operatorname\{hinge\}\(w;x,y\)\\coloneqq\\max\\\{0,\\,1\-y\\,\(w\\cdot x\)\\\}for the hinge loss\. Constants denotedCCare absolute and may change line to line; is the target error\.
#### Malicious noise\.
We work in the malicious\-noise model ofKearnsLi93, the strongest sample\-corruption model for which positive results of this kind are stated\. An adversary with full knowledge of the learner, the clean distribution, and the realized clean sample selects an \-fraction of thennexamples and replaces those\(instance,label\)\(\\text\{instance\},\\text\{label\}\)pairs by*arbitrary*pairs of its choosing; the remaining\(1−\)n\(1\-\\eta\)nexamples, the*inliers*, are drawn i\.i\.d\. from the clean distribution\. We writeSCS\_\{C\}for the inliers andSDS\_\{D\}for the at mostn\\eta ncorrupted examples\. The objective is stated against the clean distribution only: the learner outputsw^\\hat\{w\}and is charged its error on the inlier law,
err𝒟X\(w^\)≔P\(x,y\)∼𝒟\[sign\(w^⋅x\)≠y\]≤,\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\ \\coloneqq\\ \\mdmathbb\{P\}\_\{\(x,y\)\\sim\\mathcal\{D\}\}\\big\[\\operatorname\{sign\}\(\\hat\{w\}\\cdot x\)\\neq y\\big\]\\ \\leq\\ \\varepsilon,where𝒟\\mathcal\{D\}is the clean joint distribution with marginal𝒟X\\mathcal\{D\}\_\{X\}on instances\. This is the goal of robustly learning the inlier halfspace, not of fitting the corrupted points\. We state everything for malicious noise; the results extend, with a constant\-factor smaller tolerable rate, to the stronger*nasty\-noise*model\(BshoutyEironKushilevitz02\)in which the adversary may also delete clean inliers \([Appendix˜E](https://arxiv.org/html/2606.17215#A5)\)\.
#### Distributional assumptions\.
The clean marginal is a balanced mixture
𝒟X=1K∑j=1K𝒟j,\\mathcal\{D\}\_\{X\}\\ =\\ \\frac\{1\}\{K\}\\sum\_\{j=1\}^\{K\}\\mathcal\{D\}\_\{j\},where each component𝒟j\\mathcal\{D\}\_\{j\}is*log\-concave*onRd\\mdmathbb\{R\}^\{d\}with meanjsatisfying∥∥j≤r\\lVert\{\}\_\{j\}\\rVert\\leq rand covariance⪯jI2\{\}\_\{j\}\\preceq\{\}^\{2\}I, in the regimer=O\(d\)r=O\(\\sigma\\sqrt\{d\}\)\. The clean labels are realized by a \-margin halfspace: there is a unitw⋆w^\{\\star\}withy\(w⋆⋅x\)≥y\\,\(w^\{\\star\}\\cdot x\)\\geq\\gammafor every clean\(x,y\)\(x,y\)\. This is exactly the setting ofShen25and, for the single\-component case, ofTalwar20; our barriers are about the certificates one can build over precisely this class\.
Two features of the assumption are load\-bearing, and we flag them now because the constants later depend on them\. First, the mixture𝒟X\\mathcal\{D\}\_\{X\}is in general*not*log\-concave, even though each𝒟j\\mathcal\{D\}\_\{j\}is; the certificates must survive the mixing\. Second, a log\-concave law is only*sub\-exponential*, not sub\-Gaussian: a centered one\-dimensional log\-concaveZZobeys the linear moment growth∥Z∥L2t≤Ct∥Z∥L2\\lVert Z\\rVert\_\{L^\{2t\}\}\\leq Ct\\,\\lVert Z\\rVert\_\{L^\{2\}\}, soE\[Z2t\]≤\(Ct\)2t\(E\[Z2\]\)t\\mdmathbb\{E\}\[Z^\{2t\}\]\\leq\(Ct\)^\{2t\}\\,\(\\mdmathbb\{E\}\[Z^\{2\}\]\)^\{t\}, and this order inttcannot be improved \(the Laplace law hasE\[Z2t\]/E\[Z2\]t=\(2t\)\!/2t=\(\(t\)\)2t\\mdmathbb\{E\}\[Z^\{2t\}\]/\\mdmathbb\{E\}\[Z^\{2\}\]^\{t\}=\(2t\)\!/2^\{t\}=\(\\Theta\(t\)\)^\{2t\}\)\(LovaszVempala07\)\. The exponent2t2tin\(Ct\)2t\(Ct\)^\{2t\}, rather than the sub\-Gaussian\(Ct\)t\(Ct\)^\{t\}, is the reasonShen25’s margin carries a factorlog\(1/\)\\log\(1/\\varepsilon\)and notlog\(1/\)\\sqrt\{\\log\(1/\\varepsilon\)\}, and we track it faithfully through[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\.
#### Reweighted hinge and the dirty\-gradient quantity\.
The algorithmic template ofTalwar20andShen25, which our degree\-2t2talgorithm refines, minimizes a*reweighted*hinge loss over a ball of radius1/1/\\gamma:
min∥w∥≤1/∑i=1nqihinge\(w;xi,yi\),\\min\_\{\\lVert w\\rVert\\leq 1/\\gamma\}\\ \\sum\_\{i=1\}^\{n\}q\_\{i\}\\,\\operatorname\{hinge\}\(w;x\_\{i\},y\_\{i\}\),where the weightsq∈\[0,1\]nq\\in\[0,1\]^\{n\}are produced by an outlier\-removal program that downweights the corrupted examples\. The only way the dirty examples can corrupt the minimizer is through the gradient they contribute, and a subgradient ofqihinge\(w;xi,yi\)q\_\{i\}\\,\\operatorname\{hinge\}\(w;x\_\{i\},y\_\{i\}\)has norm at mostqi∥xi∥q\_\{i\}\\,\\lVert x\_\{i\}\\rVertwith the sign of−yi\-y\_\{i\}\. The quantity that bounds the dirty contribution is therefore the weighted one\-dimensional spread of the corrupted points: for a setS′S^\{\\prime\}of examples,
LSN\(q∘S′\)≔sup∥w∥≤1∑i∈S′qi\|w⋅xi\|\.\\mathrm\{LSN\}\(q\\circ S^\{\\prime\}\)\\ \\coloneqq\\ \\sup\_\{\\lVert w\\rVert\\leq 1\}\\ \\sum\_\{i\\in S^\{\\prime\}\}q\_\{i\}\\,\\lvert w\\cdot x\_\{i\}\\rvert\.\(2\.1\)By the subgradient bound above and the triangle inequality,LSN\(q∘SD\)\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)is an upper bound on the norm of the dirty subgradient∥∑i∈SDqi∂whinge\(w;xi,yi\)∥\\lVert\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\,\\partial\_\{w\}\\operatorname\{hinge\}\(w;x\_\{i\},y\_\{i\}\)\\rVert, uniformly over∥w∥≤1\\lVert w\\rVert\\leq 1\. ControllingLSN\(q∘SD\)\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)is thus the entire game on the dirty side; the weightsqqare good exactly when this quantity is small\. Talwar’s andShen25’s degree\-22\(variance\) program drives it to\(n¯\)1/2\\Theta\(n\\bar\{\\sigma\}\\,\{\}^\{1/2\}\), and[Section˜4](https://arxiv.org/html/2606.17215#S4)both raises the degree to beat the exponent and proves a degree\-22barrier matching it\.
#### Dense pancakes\.
The clean \(inlier\) side is controlled through the dense\-pancake structure ofTalwar20\. For a directionww, a center point’s projection, and a half\-width , the*pancake*\(band\)
𝒫w\(x\)=\{x′∈Rd:\|w⋅x′−w⋅x\|≤\}\\mathcal\{P\}\_\{w\}\(x\)\\ =\\ \\big\\\{\\,x^\{\\prime\}\\in\\mdmathbb\{R\}^\{d\}:\\ \\lvert w\\cdot x^\{\\prime\}\-w\\cdot x\\rvert\\leq\\tau\\,\\big\\\}is the slab of width22\\tauorthogonal towwaround the center’s projection\. We call it*\-dense*if it carries at least a \-fraction of the clean mass,P𝒟X\[𝒫w\]≥\\mdmathbb\{P\}\_\{\\mathcal\{D\}\_\{X\}\}\[\\mathcal\{P\}\_\{w\}\]\\geq\\rho\. The role of density is that a band carrying enough clean mass cannot be emptied by the adversary’sn\\eta npoints, which is what lets the reweighted hinge recover the inlier direction; the achievable noise rate is capped by the pancake density at/4\\rho/4\. The technical content is to*certify*a band\-mass lower bound from moments alone, which is the next object\.
#### Degree\-2t2tmoment certificates\.
This is the central object of the paper\. An outlier\-removal program of the SoS family does not see the distribution; its only distributional input is a bounded list of certified moments of the clean marginal, and whatever band\-mass lower bound it can deduce from them by a Positivstellensatz proof\. We make this precise via the standard polynomial*minorant*reduction\.
###### Definition 2\.1\(Degree\-2t2tmoment certificate of band mass\)\.
Let be the law of the projectionU=w⋅xU=w\\cdot xunder the clean marginal, with momentsmk:=E\[Uk\]m\_\{k\}:=\\mdmathbb\{E\}\[U^\{k\}\]for0≤k≤2t0\\leq k\\leq 2t; letB⊆RB\\subseteq\\mdmathbb\{R\}be a band; and letR\[u\]≤2t\\mdmathbb\{R\}\[u\]\_\{\\leq 2t\}denote the real univariate polynomials inuuof degree at most2t2t\. A*degree\-2t2tpseudo\-expectation consistent with*is a linear functionalE~:R\[u\]≤2t→R\\tilde\{\\mdmathbb\{E\}\}\\colon\\mdmathbb\{R\}\[u\]\_\{\\leq 2t\}\\to\\mdmathbb\{R\}satisfying
E~\[1\]=1,E~\[q2\]≥0for allq∈R\[u\]≤t,E~\[uk\]=mkfor0≤k≤2t\.\\tilde\{\\mdmathbb\{E\}\}\[1\]=1,\\qquad\\tilde\{\\mdmathbb\{E\}\}\[q^\{2\}\]\\geq 0\\ \\ \\text\{for all \}q\\in\\mdmathbb\{R\}\[u\]\_\{\\leq t\},\\qquad\\tilde\{\\mdmathbb\{E\}\}\[u^\{k\}\]=m\_\{k\}\\ \\ \\text\{for \}0\\leq k\\leq 2t\.A*degree\-2t2tmoment certificate*of the boundP\[U∈B\]≥\\mdmathbb\{P\}\[U\\in B\]\\geq\\rhois a polynomialp∈R\[u\]≤2tp\\in\\mdmathbb\{R\}\[u\]\_\{\\leq 2t\}, with coefficient vector\(p0,…,p2t\)∈R2t\+1\(p\_\{0\},\\dots,p\_\{2t\}\)\\in\\mdmathbb\{R\}^\{2t\+1\}so thatp\(u\)=∑k=02tpkukp\(u\)=\\sum\_\{k=0\}^\{2t\}p\_\{k\}\\,u^\{k\}, that is a*minorant*of the band indicator,
p\(u\)≤1B\(u\)for allu∈R,p\(u\)\\ \\leq\\ \\mathbf\{1\}\_\{B\}\(u\)\\qquad\\text\{for all \}u\\in\\mdmathbb\{R\},together with a degree\-2t2tsum\-of\-squares \(Positivstellensatz\) derivation, from the three axioms above alone, of
E~\[p\]=∑k=02tpkmk≥for every consistentE~,\\tilde\{\\mdmathbb\{E\}\}\[p\]\\ =\\ \\sum\_\{k=0\}^\{2t\}p\_\{k\}\\,m\_\{k\}\\ \\geq\\ \\rho\\qquad\\text\{for every consistent \}\\tilde\{\\mdmathbb\{E\}\},the equality being linearity ofE~\\tilde\{\\mdmathbb\{E\}\}together withE~\[uk\]=mk\\tilde\{\\mdmathbb\{E\}\}\[u^\{k\}\]=m\_\{k\}\(so the valueE~\[p\]=∑kpkmk\\tilde\{\\mdmathbb\{E\}\}\[p\]=\\sum\_\{k\}p\_\{k\}m\_\{k\}is common to all consistentE~\\tilde\{\\mdmathbb\{E\}\}\)\. Sincep≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}pointwise gives∑k=02tpkmk=E\[p\]≤P\[U∈B\]\\sum\_\{k=0\}^\{2t\}p\_\{k\}\\,m\_\{k\}=\\mdmathbb\{E\}\[p\]\\leq\\mdmathbb\{P\}\[U\\in B\], any such certificate transfers its bound∑k=02tpkmk≥\\sum\_\{k=0\}^\{2t\}p\_\{k\}\\,m\_\{k\}\\geq\\rhoto a genuine lower boundP\[U∈B\]≥\\mdmathbb\{P\}\[U\\in B\]\\geq\\rhoon the band mass\.
The minorant reduction is what makes[Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1)the right notion of “what a degree\-2t2tprogram can prove”: a degree\-2t2tmoment\-only certificate ofP\[U∈B\]≥\\mdmathbb\{P\}\[U\\in B\]\\geq\\rho*exists*if and only if such a minorantppexists\. We record this equivalence, together with the Gauss\-quadrature dual that drives the barrier of[Section˜4](https://arxiv.org/html/2606.17215#S4), as[Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14); both are textbook moment theory\(Szego39;Gautschi04\)and the proof is in the appendix\. The upshot, used repeatedly, is that the strongest band\-mass lower bound a degree\-2t2tprogram can certify is governed by how the first2t2tmoments constrain a measure, and a\(t\+1\)\(t\{\+\}1\)\-node quadrature rule matching those moments can push all mass outside a far band, capping the certifiable density at zero\. That is the engine of the margin–degree tradeoff\.
#### Certifiable hypercontractivity\.
The clean side needs the reverse: a degree\-2t2t*upper*bound on projected moments, proved by SoS inww, so that the program can act uniformly over directions\. This is certifiable hypercontractivity, and for our log\-concave mixture it holds with the sub\-exponential constant\.
Formally \(stated and proved with the main results as[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\), for every integert≥1t\\geq 1there is a degree\-2t2tSoS proof, in the variableww, of
Ex∼𝒟X\[\(w⋅\(x−¯\)\)2t\]≤\(Ct\)2t\(¯2\(w\)\)t,\\mdmathbb\{E\}\_\{x\\sim\\mathcal\{D\}\_\{X\}\}\\big\[\(w\\cdot\(x\-\\bar\{\\mu\}\)\)^\{2t\}\\big\]\\ \\leq\\ \(Ct\)^\{2t\}\\,\\big\(\\bar\{\\sigma\}^\{2\}\(w\)\\big\)^\{t\},withCCan absolute constant; that is,𝒟X\\mathcal\{D\}\_\{X\}is2t2t\-certifiably hypercontractive with constant\(Ct\)2t\(Ct\)^\{2t\}\.
The constant here is\(Ct\)2t\(Ct\)^\{2t\}, the sub\-exponential \(Lovász–VempalaLovaszVempala07\) constant, not the sub\-Gaussian\(Ct\)t\(Ct\)^\{t\}\. The latter is genuinely false for general log\-concave laws: a one\-line Laplace check gives
E\[Z2t\]E\[Z2\]t=\(2t\)\!2t=\(\(t\)\)2t,\\frac\{\\mdmathbb\{E\}\[Z^\{2t\}\]\}\{\\mdmathbb\{E\}\[Z^\{2\}\]^\{t\}\}=\\frac\{\(2t\)\!\}\{2^\{t\}\}=\(\\Theta\(t\)\)^\{2t\},which exceeds any\(Ct\)t\(Ct\)^\{t\}fort≥Ct\\geq C, so no sub\-Gaussian certificate exists in this generality\. For sub\-Gaussian components \(Gaussian, strongly log\-concave, or uniform\-on\-the\-cube\) the sharper\(Ct\)t\(Ct\)^\{t\}does hold and is SoS\-certifiable\(KlivansKothariMeka18;DiakonikolasHopkinsPensiaTiegel24\); we do not assume that, and we commit to the honest\(Ct\)2t\(Ct\)^\{2t\}throughout\. The single\-component certificate for the general log\-concave case is obtained by a scalar\-to\-SoS lift; its self\-contained derivation, and the mixing and averaging steps that carry it to[Equation˜4\.7](https://arxiv.org/html/2606.17215#S4.E7), are deferred to[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\.
### 2\.1Auxiliary results
Our proofs draw on a handful of external results: the log\-concave and malicious\-noise ingredients named above, the orthogonal\-polynomial facts behind the Gauss\-quadrature barrier, and the moment\-theory facts the Christoffel perspective of[Section˜3](https://arxiv.org/html/2606.17215#S3)invokes\. To keep them in one place rather than scattered across the proofs, every external result we use is collected and restated, with hypotheses and a citation, in[Appendix˜C](https://arxiv.org/html/2606.17215#A3); we cross\-reference each at its point of use by its label there\.
## 3The Christoffel function and the certificate ceiling
Start from the adversary’s vantage point\. A degree\-2t2toutlier\-removal certificate reads the clean marginal only through its first2t2tmoments, so a corruption is invisible to it exactly when*some*nonnegative measure that still carries the corruption matches those moments\. One quantity then decides everything: among all measures sharing the clean marginal’s first2t2tmoments, how much mass can be piled at a single centercc? A rate\- corruption planted atccstays beyond the certificate’s reach for as long as that maximum exceeds \.
We did not have to invent this maximum\. The most mass a moment\-matching measure can place atccis exactly the*Christoffel function*\(c\)t\+1\{\}\_\{t\+1\}\(c\)of the marginal \([Fact˜3\.2](https://arxiv.org/html/2606.17215#S3.Thmtheorem2)\), the reciprocal of the Christoffel–Darboux kernel on its diagonal\. The certificate model thus hands us the Christoffel function unbidden, as the*resolution threshold*of its own moment cone; the*resolution principle*\([Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)\) makes the equivalence exact\. Everything below is a reading of where\(c\)t\+1\{\}\_\{t\+1\}\(c\)falls relative to the corruption it must resolve: it is large at a tail center for small degree \(a corruption hides\), it collapses past the node edge \(a band is forced empty\), and it crosses at a computable order \(the corruption is exposed\)\. The learning consequences of these three readings are drawn in[Section˜4](https://arxiv.org/html/2606.17215#S4)\.
Two literatures meet here\. On one side, the certificate ceiling is an instance of the*truncated moment problem*\(CurtoFialkow91;DetteStudden97\)and of the moment–SOS hierarchy\(LasserrePauwelsCDK\)\. On the other, the same Christoffel function is, in modern data analysis, exactly the moment\-based*outlier detector*\(LasserrePauwels19;PauwelsLasserreCDOutliers\), run \(as we will see\) with the opposite sign\. We quantify the learning\-theoretic limits of that detector\. The three figures of this section are one Christoffel curve seen from three distances: a far band swept empty past the node edge \([Figure˜2](https://arxiv.org/html/2606.17215#S3.F2)\), the curve crossing the corruption rate at a computable center \([Figure˜3](https://arxiv.org/html/2606.17215#S3.F3)\), and the same curve carrying the detector’s meaning in reverse \([Figure˜1](https://arxiv.org/html/2606.17215#S0.F1)\)\.
### 3\.1The certificate ceiling is a truncated moment problem
Make the opening’s question precise, in the dual form the certificate actually faces\. To defend a bandBBthe certificate must*lower*\-bound the clean mass inside it, and that bound is only as strong as the worst moment\-matching measure allows\. Fix the projected clean marginal onR\\mdmathbb\{R\}with momentsmk=E\[Uk\]m\_\{k\}=\\mdmathbb\{E\}\[U^\{k\}\]\. By[Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1), the strongest band\-mass lower bound a degree\-2t2tcertificate can establish is the moment\-LP value
m¯\(2t\)=min\{\(B\):≥0,∫ukd=mk\(0≤k≤2t\)\},\\displaystyle\\underline\{m\}\(2t\)\\;=\\;\\min\\Big\\\{\\nu\(B\):\\ \\nu\\geq 0,\\ \\textstyle\\int u^\{k\}\\,d\\nu=m\_\{k\}\\ \\ \(0\\leq k\\leq 2t\)\\Big\\\},\(3\.1\)the minimum band mass over the*moment variety*ℳ2t\(\)≔\{≥0:∫ukd=mk,k≤2t\}\\mathcal\{M\}\_\{2t\}\(\\mu\)\\coloneqq\\\{\\nu\\geq 0:\\int u^\{k\}d\\nu=m\_\{k\},\\ k\\leq 2t\\\}of nonnegative measures sharing ’s first2t2tmoments\. This is the truncated Stieltjes moment problem with a linear objective; its extreme points are the finitely\-supported*principal representations*, of which Gauss quadrature is one\(DetteStudden97\)\. The governing framework fact is already visible here: for an off\-center band the varietyℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\)contains a representation supported away fromBB, som¯\(2t\)=0\\underline\{m\}\(2t\)=0and no degree\-2t2tcertificate can force mass intoBB\([Figure˜2](https://arxiv.org/html/2606.17215#S3.F2)\)\. Symmetrically, certifiable*anti*\-concentration\(BakshiKothariRTV24\)is the*upper*envelopemax\{\(B\):∈ℳ2t\(\)\}\\max\\\{\\nu\(B\):\\nu\\in\\mathcal\{M\}\_\{2t\}\(\\mu\)\\\}; the two are the two faces of the same moment cone, which is why one is certifiable at low degree and the other \(the pro\-concentration we need\) is not\.
clean marginalbandB=\[c−,c\+\]B=\[c\-\\tau,\\,c\+\\tau\]planted massatccnode range\[−Rt\+1,Rt\+1\]\[\-R\_\{t\+1\},\\,R\_\{t\+1\}\],Rt\+1≍stR\_\{t\+1\}\\asymp s\\sqrt\{t\}Gauss nodesu1,…,ut\+1u\_\{1\},\\dots,u\_\{t\+1\}no node inBB⇒\\Rightarrowcertifiable mass=0=0Figure 2:The impossibility mechanism\.A degree\-2t2tcertificate sees the clean marginal only through its first2t2tmoments, which are reproduced exactly by the\(t\+1\)\(t\{\+\}1\)\-node Gauss measure supported on the nodesu1,…,ut\+1∈\[−Rt\+1,Rt\+1\]u\_\{1\},\\dots,u\_\{t\+1\}\\in\[\-R\_\{t\+1\},R\_\{t\+1\}\]withRt\+1≍stR\_\{t\+1\}\\asymp s\\sqrt\{t\}\. A bandBBpushed past the node range contains no node, so every measure matching the moments places zero mass there, and a corruption of any rate planted atccstays invisible to the certificate\.
### 3\.2The Christoffel function governs the resolution
Let\{\}kk≥0\\\{\{\}\_\{k\}\\\}\_\{k\\geq 0\}be the orthonormal polynomials of and
Kn\(x,y\)=∑k=0n−1\(x\)k\(y\)k,\(x\)n≔Kn\(x,x\)−1=\(∑k=0n−1\(x\)2k\)−1\\displaystyle K\_\{n\}\(x,y\)=\\sum\_\{k=0\}^\{n\-1\}\{\}\_\{k\}\(x\)\{\}\_\{k\}\(y\),\\qquad\{\}\_\{n\}\(x\)\\coloneqq K\_\{n\}\(x,x\)^\{\-1\}=\\Big\(\\sum\_\{k=0\}^\{n\-1\}\{\}\_\{k\}\(x\)^\{2\}\\Big\)^\{\-1\}\(3\.2\)the Christoffel–Darboux kernel and the Christoffel function of ordernn\. The single fact we need is classical and givesnits meaning as a*maximal mass*\.
###### Fact 3\.2\(Christoffel function as maximal mass;Nevai86;DetteStudden97\)\.
For everyx0∈Rx\_\{0\}\\in\\mdmathbb\{R\},\(x0\)t\+1=min\{∫p2d:degp≤t,p\(x0\)=1\}=max\{\(\{x0\}\):∈ℳ2t\(\)\}\.\{\}\_\{t\+1\}\(x\_\{0\}\)=\\min\\\{\\int p^\{2\}\\,d\\mu:\\ \\deg p\\leq t,\\ p\(x\_\{0\}\)=1\\\}=\\max\\\{\\nu\(\\\{x\_\{0\}\\\}\):\\ \\nu\\in\\mathcal\{M\}\_\{2t\}\(\\mu\)\\\}\.The two equalities are dual: the left is the extremal problem definingt\+1, and the right is the largest atom a measure with ’s first2t2tmoments may carry atx0x\_\{0\}\. In particular the Gauss weight at a nodeuiu\_\{i\}oft\+1equals\(ui\)t\+1\{\}\_\{t\+1\}\(u\_\{i\}\)\. Equivalently, in the finite\-dimensional reproducing\-kernel space\(R\[u\]≤t,⟨⋅,⋅⟩\)\(\\mdmathbb\{R\}\[u\]\_\{\\leq t\},\\langle\\cdot,\\cdot\\rangle\)with reproducing kernelKtK\_\{t\},\(x0\)t\+1=∥∥−2x0\{\}\_\{t\+1\}\(x\_\{0\}\)=\\\|\{\}\_\{x\_\{0\}\}\\\|^\{\-2\}is the reciprocal squared norm of point\-evaluationp↦p\(x0\)p\\mapsto p\(x\_\{0\}\): the resolution threshold is the point\-evaluation energy atx0x\_\{0\}\.
###### Proof\.
For the right equality, let∈ℳ2t\(\)\\nu\\in\\mathcal\{M\}\_\{2t\}\(\\mu\)carry masswwatx0x\_\{0\}, and letpphavedegp≤t\\deg p\\leq t,p\(x0\)=1p\(x\_\{0\}\)=1\. Sincedegp2≤2t\\deg p^\{2\}\\leq 2t,∫p2d=∫p2d\\int p^\{2\}\\,d\\nu=\\int p^\{2\}\\,d\\mu; dropping all of but the atom atx0x\_\{0\}givesw=wp\(x0\)2≤∫p2d=∫p2dw=w\\,p\(x\_\{0\}\)^\{2\}\\leq\\int p^\{2\}\\,d\\nu=\\int p^\{2\}\\,d\\mu\. Minimizing the right side over admissibleppyieldsw≤\(x0\)t\+1w\\leq\{\}\_\{t\+1\}\(x\_\{0\}\), and the Gauss measure withx0x\_\{0\}adjoined as a node attains it\. ∎
The maximal\-mass reading turns both barriers into one inequality\. A corruption is hidden from a degree\-2t2tmoment program exactly when the corrupted sample still lies inℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\), i\.e\. when its moments are consistent with the clean ones\. Placing a planted mass at a tail centerccis possible withinℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\)up to\(c\)t\+1\{\}\_\{t\+1\}\(c\), and no further\.
###### Proposition 3\.3\(Christoffel resolution principle\)\.
Let a corruption of mass be planted at centercc\(an atom, or a width\-O\(s\)O\(s\)cluster, atcc\)\. It is consistent with the clean marginal’s first2t2tmoments \(hence invisible to every degree\-2t2tmoment certificate\) if and only if
≤\(c\)t\+1\.\\displaystyle\\eta\\ \\leq\\ \{\}\_\{t\+1\}\(c\)\.\(3\.3\)Equivalently, the least certificate degree that*resolves*a mass\- corruption at centerccis2t⋆\(c,\)2t^\{\\star\}\(c,\\eta\)witht⋆=min\{t:\(c\)t\+1<\}t^\{\\star\}=\\min\\\{t:\{\}\_\{t\+1\}\(c\)<\\eta\\\}\.
###### Proof\.
Immediate from[Fact˜3\.2](https://arxiv.org/html/2606.17215#S3.Thmtheorem2): a representation inℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\)carrying the planted mass atccexists iff≤\(c\)t\+1\\eta\\leq\{\}\_\{t\+1\}\(c\), and such a representation is precisely a clean\-moment\-consistent explanation of the corrupted sample, against which no degree\-2t2tcertificate can act\. \(For a width\-O\(s\)O\(s\)cluster the same holds up to the\(1\)\\Theta\(1\)factor by whicht\+1varies over the cluster\.\) ∎
Both barriers are already visible in[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3): they are the two regimes of where\(c\)t\+1\{\}\_\{t\+1\}\(c\)sits relative to the planted mass\. The first is sharp enough to record on its own\.
###### Fact 3\.4\(The degree\-22/degree\-44gap at the \-tail\)\.
At the threshold centerc=/c=\\sigma/\\sqrt\{\\eta\}of a variance\-2marginal,
\(c\)2=11\+c2/2=1\+=\(\),\(c\)3≤1\(c\)22=\(\)2\.\\displaystyle\{\}\_\{2\}\(c\)=\\frac\{1\}\{1\+c^\{2\}/\{\}^\{2\}\}=\\frac\{\\eta\}\{1\+\\eta\}=\\Theta\(\\eta\),\\qquad\{\}\_\{3\}\(c\)\\leq\\frac\{1\}\{\{\}\_\{2\}\(c\)^\{2\}\}=\\Theta\(\{\}^\{2\}\)\.\(3\.4\)Hence a mass\- corruption atccis invisible to degree22\(≤\(c\)2\\eta\\leq\{\}\_\{2\}\(c\)\) but resolved by degree44\(\>\(c\)3\\eta\>\{\}\_\{3\}\(c\)\): a gap of one moment order, with a sharp constant, in and the certificate order alone\.
###### Proof\.
Apply[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)with the first orthonormal polynomials of a variance\-2law,≡01\{\}\_\{0\}\\equiv 1,\(u\)1=u/\{\}\_\{1\}\(u\)=u/\\sigma,\(u\)2=\(u2−\)2/\(22\)\{\}\_\{2\}\(u\)=\(u^\{2\}\-\{\}^\{2\}\)/\(\{\}^\{2\}\\sqrt\{2\}\)\(up to the\(1\)\\Theta\(1\)fourth\-moment constant\), and evaluate\(c\)k\+1=\(∑j≤k\(c\)2j\)−1\{\}\_\{k\+1\}\(c\)=\\big\(\\sum\_\{j\\leq k\}\{\}\_\{j\}\(c\)^\{2\}\\big\)^\{\-1\}atc=/c=\\sigma/\\sqrt\{\\eta\}\. ∎
The same principle fixes the ceiling for an off\-center*band*: it carries forced certificate mass exactly when\(c\)t\+1\{\}\_\{t\+1\}\(c\)has dropped to the target , i\.e\. exactly whenccenters the node range\|c\|Rt\+1\|c\|\\lesssim R\_\{t\+1\}, since the smallest Christoffel value at the edge is\(\)\\Theta\(\\rho\)exactly at the largest node\. Below the node range the ceiling is0unconditionally; inside it,\(\(B\)\)\\Theta\(\\mu\(B\)\)modulo the weighted\-extremal estimate isolated in[Section˜3\.4](https://arxiv.org/html/2606.17215#S3.SS4)\. Both regimes are forced on a learner in[Section˜4](https://arxiv.org/html/2606.17215#S4)\.
###### Example 3\.5\(The hiding game\)\.
Make the numbers concrete\. Let be a clean, centered, unit\-variance log\-concave law \(s=1s=1\), and let the adversary corrupt a rate=10−4\\eta=10^\{\-4\}of the data\. The resolution principle says the adversary’s best move is to plant that mass as far out as it can still hide: the threshold center isc=/=100c=\\sigma/\\sqrt\{\\eta\}=100standard deviations from the origin, a spike of weight=10−4\\eta=10^\{\-4\}atu=100u=100\. Audit it the way a degree\-22certificate does, through the mean and the variance\. The spike shifts the mean byc=10−2\\eta c=10^\{\-2\}\(absorbed into recentering\) and addsc2=1\\eta c^\{2\}=1to the variance: it spends*exactly one unit of variance budget*, the whole budget the clean law already owns\. Through the first two moments the corrupted sample is indistinguishable from a clean unit\-variance law, so\(c\)2=\(\)\{\}\_\{2\}\(c\)=\\Theta\(\\eta\)and no degree\-22outlier program can so much as name the spike\. Degree44changes the ledger\. The spike contributesc4=104\\eta c^\{4\}=10^\{4\}to the fourth moment, while a clean unit\-variance log\-concave law carriesE\[U4\]=\(1\)\\mdmathbb\{E\}\[U^\{4\}\]=\\Theta\(1\): the fourth moment is inflated by a factor∼1/\\sim 1/\\eta, a discrepancy the first two moments hid completely\. This is exactly the\(c\)3=\(\)2≪\{\}\_\{3\}\(c\)=\\Theta\(\{\}^\{2\}\)\\ll\\etaof[Fact˜3\.4](https://arxiv.org/html/2606.17215#S3.Thmtheorem4): the atom that saturates the degree\-22ceiling sits four orders*above*the degree\-44ceiling, and a degree\-44certificate exposes it\. One spike, two verdicts; all that changed is the order of the moments one is allowed to look through\.
centercc\(c\)t\+1\{\}\_\{t\+1\}\(c\)023cdeg4⋆c^\{\\star\}\_\{\\deg 4\}c=/c=\\sigma/\\sqrt\{\\eta\}\(c\)2=\(\)\{\}\_\{2\}\(c\)=\\Theta\(\\eta\), above:*hidden from degree*22\(c\)3=\(\)2\{\}\_\{3\}\(c\)=\\Theta\(\{\}^\{2\}\), below:*exposed by degree*44Figure 3:The resolution threshold\.The certificate ceiling\(c\)t\+1\{\}\_\{t\+1\}\(c\)decays from the bulk, and a corruption of rate at centerccis exposed exactly where the curve drops below the dotted line \. Raising the degree from22to44pulls the curve down so that at the tail centerc=/c=\\sigma/\\sqrt\{\\eta\}the value falls from\(\)\\Theta\(\\eta\)\(still hidden\) to\(\)2\\Theta\(\{\}^\{2\}\)\(now resolved\): the degree\-22/degree\-44gap of[Example˜3\.5](https://arxiv.org/html/2606.17215#S3.Thmtheorem5)\.
### 3\.3The degree of robustness is the Christoffel degree
[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)is the lower\-bound counterpart of a tool already in use\.LasserrePauwels19andPauwelsLasserreCDOutliersdetect outliers and recover supports by thresholding the*empirical*Christoffel function: a pointxxis flagged when\(x\)n\{\}\_\{n\}\(x\)\(built from the empirical moments\) is small, i\.e\. when degree\-nnmoments placexxoutside the bulk\.[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)says that this is exactly the mechanism, and the only mechanism, available to a degree\-2t2toutlier\-removal program: it can down\-weight a planted corruption atccif and only if the order\-\(t\+1\)\(t\{\+\}1\)Christoffel function already resolves it,\(c\)t\+1<\{\}\_\{t\+1\}\(c\)<\\eta\. The reweighted\-hinge programs ofTalwar20;Shen25run this detector at degree22; raising the hinge program to degree2t2tis raising the Christoffel detector to ordert\+1t\{\+\}1\. We can now state the law the framework yields, the*degree of robustness*: the SoS degree at which a certificate first tolerates corruption rate at margin scale equals the Christoffel order that resolves a mass\- cluster at the \-tail of the marginal,
t⋆\(\)=min\{t:\(/\)t\+1<\}\.t^\{\\star\}\(\\eta\)\\ =\\ \\min\\bigl\\\{\\,t:\\ \{\}\_\{t\+1\}\(\\sigma/\\sqrt\{\\eta\}\)\\,<\\,\\eta\\,\\bigr\\\}\.It is a property of and the order, which is why the resource is degree and not sample size: more samples sharpen the*empirical*Christoffel function toward its population limit, but the population value at the planted center is fixed by the marginal and the order, so an infinite sample at fixed degree buys nothing against a hidden spike\. The only knob that lowers\(c\)t\+1\{\}\_\{t\+1\}\(c\)past is the order itself\. The degree\-2t2talgorithm of[Section˜4](https://arxiv.org/html/2606.17215#S4)is this law made operational\.
#### Relationship to the Christoffel\-function program of Lasserre, Pauwels, and Putinar\.
The Christoffel function is by now a central object in moment\-based data analysis\(PauwelsLasserre16;LasserrePauwels19;PauwelsPutinarLasserre21;LasserrePauwelsPutinar22\), where the*empirical*Christoffel function serves as a*detector*: a large value\(x\)n\{\}\_\{n\}\(x\)marksxxas typical, a small value flags it as an outlier, and sublevel sets recover the shape of a point cloud\(PauwelsLasserre16;PauwelsLasserreCDOutliers\)\. The algebraic engine, the identification of the moment–SoS hierarchy with the Christoffel–Darboux kernel\(LasserrePauwelsCDK;Lasserre23PosCert\), is the same one we use for the SoS dual of the moment LP\. Our reading, however, runs the sign the other way: a*large*\(c\)t\+1\{\}\_\{t\+1\}\(c\)is bad, not good\. By[Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)it is exactly the corruption mass an adversary can hide atccfrom a degree\-2t2tcertificate, so the centers a detector finds hardest to clear are precisely the centers a robust learner cannot defend\. In one line, the*detector’s blind spot*is the*adversary’s hiding place*:
H≔\{c:\(c\)t\+1≥\},H\\ \\coloneqq\\ \\\{\\,c:\\ \{\}\_\{t\+1\}\(c\)\\geq\\eta\\,\\\},the set of centers at which a rate\- corruption survives every degree\-2t2tcertificate; raising the degree is the only way to shrink it\. Where that program asks “isxxtypical, and at what order does the empirical detector see it?”, we ask “how much malicious mass can survive atcc, and at what order is it forced out?”, and we turn the answer into an*impossibility*statement for the certificate method, rather than a detection rule\. To our knowledge the Christoffel function has served this line as a constructive tool, for detection, support and density estimation, and typicality, and has not previously been used as a barrier from which to derive learning\-theoretic lower bounds; that inversion \([Figure˜1](https://arxiv.org/html/2606.17215#S0.F1)\), and the margin–degree law it yields, is what is new here\.
### 3\.4From barrier to law: a weighted\-Chebyshev reduction
The framework’s lower edge leaves one quantity open: a*matching*upper certificate, a degree\-2t2tminorant whose band mass meets the moment\-LP ceiling\. We can say precisely what closing it requires, and reduce it to one classical question about orthogonal polynomials\. The key is to parametrize the minorant by a square\. Forq∈R\[u\]≤tq\\in\\mdmathbb\{R\}\[u\]\_\{\\leq t\}with\|q\|≥1\|q\|\\geq 1onR∖B\\mdmathbb\{R\}\\setminus B, the degree\-2t2tpolynomialp=1−q2p=1\-q^\{2\}is automatically a minorant of𝟏B\\mathbf\{1\}\_\{B\}\(onBB,p≤1p\\leq 1; offBB,q2≥1q^\{2\}\\geq 1forcesp≤0p\\leq 0\), andE\[p\]=1−∫q2d\\mdmathbb\{E\}\[p\]=1\-\\int q^\{2\}\\,d\\mu\. Optimizing overqqtherefore yields a certifiable band mass governed by the*weighted\-Chebyshev extremal value*
𝖤t\(B\)≔min\{∫q2d:q∈R\[u\]≤t,\|q\|≥1onR∖B\}\.\\displaystyle\\mathsf\{E\}\_\{t\}\(B\)\\ \\coloneqq\\ \\min\\Big\\\{\\textstyle\\int q^\{2\}\\,d\\mu:\\ q\\in\\mdmathbb\{R\}\[u\]\_\{\\leq t\},\\ \|q\|\\geq 1\\text\{ on \}\\mdmathbb\{R\}\\setminus B\\Big\\\}\.\(3\.5\)
###### Proposition 3\.6\(Reduction to a weighted extremal problem\)\.
The degree\-2t2tcertificate ceiling for the bandBBis at least1−𝖤t\(B\)1\-\\mathsf\{E\}\_\{t\}\(B\)\. Moreover the Gauss\-node interpolantq0q\_\{0\}of degreett, defined byq0\(uj\)=𝟏\[uj∉B\]q\_\{0\}\(u\_\{j\}\)=\\mathbf\{1\}\[u\_\{j\}\\notin B\]on the nodesu1,…,ut\+1u\_\{1\},\\dots,u\_\{t\+1\}oft\+1, has∫q02d=1−WB\\int q\_\{0\}^\{2\}\\,d\\mu=1\-W\_\{B\}, whereWB≔∑uj∈B\(uj\)t\+1W\_\{B\}\\coloneqq\\sum\_\{u\_\{j\}\\in B\}\{\}\_\{t\+1\}\(u\_\{j\}\)is the Gauss\-quadrature \(Christoffel\) mass of the nodes insideBB\. Thus the certifiable band mass is controlled byWBW\_\{B\}, the discrete Christoffel content of the band, up to the admissibility ofq0q\_\{0\}\.
###### Proof\.
The first claim is the constructionp=1−q2p=1\-q^\{2\}above: each admissibleqqgives a degree\-2t2tminorant of value1−∫q2d1\-\\int q^\{2\}\\,d\\mu, and the supremum of these values is1−𝖤t\(B\)1\-\\mathsf\{E\}\_\{t\}\(B\), a lower bound on the ceiling\. For the identity,degq02=2t≤2t\+1\\deg q\_\{0\}^\{2\}=2t\\leq 2t\+1, so Gauss quadrature is exact:
∫q02d=∑j\(uj\)t\+1q0\(uj\)2=∑uj∉B\(uj\)t\+1=1−WB,\\int q\_\{0\}^\{2\}\\,d\\mu=\\sum\_\{j\}\{\}\_\{t\+1\}\(u\_\{j\}\)\\,q\_\{0\}\(u\_\{j\}\)^\{2\}=\\sum\_\{u\_\{j\}\\notin B\}\{\}\_\{t\+1\}\(u\_\{j\}\)=1\-W\_\{B\},using∑j\(uj\)t\+1=∫1d=1\\sum\_\{j\}\{\}\_\{t\+1\}\(u\_\{j\}\)=\\int 1\\,d\\mu=1\([Fact˜3\.2](https://arxiv.org/html/2606.17215#S3.Thmtheorem2)for the weights\)\. ∎
The reduction isolates the one missing estimate: an*admissible*polynomial achieving the node\-interpolant value, up to the spectral edge\.
###### Lemma 3\.7\(Weighted\-extremal value at the edge; the remaining estimate\)\.
Let be centered log\-concave of standard deviationssandB=\[c−,c\+\]B=\[c\-\\tau,c\+\\tau\],=s\\tau=\\theta s,=\(1\)\\theta=\\Theta\(1\)\. For every∈\(0,1\)\\delta\\in\(0,1\)there isc\>0c\>0such that
\|c\|≤\(1−\)Rt\+1⟹𝖤t\(B\)≤1−c\(B\)\.\|c\|\\leq\(1\-\\delta\)R\_\{t\+1\}\\ \\Longrightarrow\\ \\mathsf\{E\}\_\{t\}\(B\)\\leq 1\-c\\,\\mu\(B\)\.
[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)is exactly the weighted\-extremal estimate that classical potential theory for exponential weights supplies: in the bulk, Christoffel asymptotics\(MateNevaiTotik91;Totik00\)giveWB=\(B\)\(1\+o\(1\)\)W\_\{B\}=\\mu\(B\)\(1\+o\(1\)\), while the infinite–finite range inequality\(LevinLubinsky01;KriecherbauerMcLaughlin99\)controls the polynomial mass beyond the Mhaskar–Rakhmanov–Saff numberat\+1≍Rt\+1a\_\{t\+1\}\\asymp R\_\{t\+1\}, so the cost stays below11up to the edge\. We have not derived the uniform non\-asymptotic constantccfrom scratch, so we state[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)as the residual input; granting it, the barrier becomes an exact law\.
The reduction \([Proposition˜3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)\) together with this residual estimate \([Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)\) supplies the matching upper certificate2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\|c\|/s\)^\{2\}\)to the framework’s lower edge, modulo a single weighted\-extremal statement that is purely about orthogonal polynomials, decoupled from the learning model, and supported by the cited edge asymptotics\.[Section˜4](https://arxiv.org/html/2606.17215#S4)draws the learning consequence as an exact law\. We regard deriving its uniform constant as the natural next target\.
## 4Consequences: barriers and a matching algorithm
The framework of[Section˜3](https://arxiv.org/html/2606.17215#S3)has already done the work; the robust\-learning statements below are its readings at specific centers\. The certificate ceiling is\(\(B\)\)\\Theta\(\\mu\(B\)\)inside the node range and0beyond it \([Proposition˜3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)\), and each result is one corner of that dichotomy\. The single resource is the*degree*of the Sum\-of\-Squares \(SoS\) outlier\-removal certificate, the order at which the Christoffel function resolves a corruption\. The reweighted\-hinge approach ofTalwar20;Shen25runs on a degree\-22certificate \(a variance bound on the reweighted sample\); we show what raising the degree to2t2tbuys, and what it does not\.[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)reads the resolution principle off an off\-center band, tying the certifiable band mass to the SoS degree through Gauss quadrature and yielding a margin–degree dichotomy that pins down where thelog\(1/\)\\log\(1/\\varepsilon\)inShen25’s margin comes from;[Theorem˜4\.5](https://arxiv.org/html/2606.17215#S4.Thmtheorem5)closes it to an exact law\.[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)realizes the degree\-22corner as one explicit instance, on which degree\-22reweighting is stuck at the1/2rate while degree\-44escapes, exactly as\(c\)2\{\}\_\{2\}\(c\)versus\(c\)3\{\}\_\{3\}\(c\)predicts in \([3\.4](https://arxiv.org/html/2606.17215#S3.E4)\)\.[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)is the matching algorithm: a degree\-2t2tprogram, the order\-\(t\+1\)\(t\{\+\}1\)empirical Christoffel detector, that traces the achievable frontier1\-1/2twitht=1t=1recoveringShen25exactly\.[Proposition˜4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)is the information\-theoretic floor/2\\eta/2that the algorithm matches in rate\. The two barriers \([Theorems˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)and[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\) are against the certificate method, not against all algorithms; we state the model each time\. Throughout, the marginal is the \-margin log\-concave mixture of[Section˜2](https://arxiv.org/html/2606.17215#S2),¯2\(w\)\\bar\{\\sigma\}^\{2\}\(w\)is its reweighted directional variance, andCCis an absolute constant whose value may change between displays\. Full proofs are deferred to[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\.
### 4\.1The margin–degree tradeoff
The first result reads the certificate ceiling of[Section˜3\.1](https://arxiv.org/html/2606.17215#S3.SS1)off an off\-center band\. By \([3\.1](https://arxiv.org/html/2606.17215#S3.E1)\) the strongest band\-mass bound a degree\-2t2tcertificate can prove is the moment\-LP valuem¯\(2t\)\\underline\{m\}\(2t\); the tradeoff is the contrapositive of the ceiling, once the band clears the node rangem¯\(2t\)=0\\underline\{m\}\(2t\)=0and no certificate exists\. The scope is the moment/SoS\-certificate model of[Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1),Shen25’s “our approach” made formal: it is*not*an information\-theoretic or unconditional\-SQ lower bound, and a learner that reads the marginal in any other way is outside it\.
###### Theorem 4\.1\(Margin–degree tradeoff; certificate method\)\.
LetU=w⋅xU=w\\cdot xfor a fixed directionww, withUUcentered log\-concave of standard deviationss, and letB=\[c−,c\+\]B=\[c\-\\tau,c\+\\tau\]be a band of half\-width=s\\tau=\\theta s,=\(1\)\\theta=\\Theta\(1\), at centercc\. Suppose there is a degree\-2t2tSoS certificate that lower\-bounds the band mass, a proof ofPrU\[U∈B\]≥=\(1\)\\Pr\_\{U\}\[U\\in B\]\\geq\\rho=\\Omega\(1\)whose only input is the first2t2tmoments ofUU\(made precise in[Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)\)\. Then
2t=\(\(\|c\|/s\)2\)\(sub\-Gaussian marginal\),2t=\(\|c\|/s\)\(sub\-exponential marginal\)\.\\displaystyle 2t\\;=\\;\\Omega\\big\(\(\\lvert c\\rvert/s\)^\{2\}\\big\)\\quad\\text\{\(sub\-Gaussian marginal\)\},\\qquad 2t\\;=\\;\\Omega\\big\(\\lvert c\\rvert/s\\big\)\\quad\\text\{\(sub\-exponential marginal\)\}\.\(4\.1\)Consequently, any certificate learner attaining inlier error on the marginal of[Section˜2](https://arxiv.org/html/2606.17215#S2)must pay*either*SoS degree
t=\(log\(1/\)\)\(runtimen\(log\(1/\)\)\),or=\(log\(1/\)/d\)\.\\displaystyle t\\;=\\;\\Omega\\big\(\\log\(1/\\varepsilon\)\\big\)\\quad\\text\{\(runtime \}n^\{\\Omega\(\\log\(1/\\varepsilon\)\)\}\\text\{\)\},\\qquad\\text\{or\}\\qquad\\gamma\\;=\\;\\Omega\\big\(\\sqrt\{\\log\(1/\\varepsilon\)\}\\,/\\sqrt\{d\}\\big\)\.\(4\.2\)Thelog\(1/\)\\log\(1/\\varepsilon\)inShen25’s margin is conserved: it is exchangeable against SoS degree, and removable only by leaving the one\-shot moment\-certificate model\.
###### Proof sketch\.
A degree\-2t2tmoment\-only certificate ofPrU\[U∈B\]≥\\Pr\_\{U\}\[U\\in B\]\\geq\\rhois, by the minorant reduction \([Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)\(i\)\), a polynomialppof degree≤2t\\leq 2twithp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}pointwise together with a degree\-2t2tproof, from the moment axioms alone, that∫pd≥\\int p\\,d\\nu\\geq\\rho\. Because the proof reads only the moments, its conclusion must hold for*every*nonnegative measure matching the momentsm0,…,m2tm\_\{0\},\\dots,m\_\{2t\},mk=E\[Uk\]m\_\{k\}=\\mdmathbb\{E\}\[U^\{k\}\]\. Weak moment\-LP duality then caps the best certifiable value byinf\{\(B\):≥0,∫ukd=mk\(k≤2t\)\}\\inf\\\{\\nu\(B\):\\nu\\geq 0,\\ \\int u^\{k\}\\,d\\nu=m\_\{k\}\\ \(k\\leq 2t\)\\\}, so a single matching measure that places*no*mass onBBkills the certificate\. Gauss quadrature \([Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)\(ii\)\) supplies one: the\(t\+1\)\(t\{\+\}1\)\-node Gauss measure matches all2t2tmoments and is supported only on thet\+1t\{\+\}1zerosu1,…,ut\+1u\_\{1\},\\dots,u\_\{t\+1\}of the orthonormal polynomialt\+1\. WithRt\+1≔maxi\|ui\|R\_\{t\+1\}\\coloneqq\\max\_\{i\}\\lvert u\_\{i\}\\rvertthe largest absolute zero, a band with\|c\|−\>Rt\+1\\lvert c\\rvert\-\\tau\>R\_\{t\+1\}contains no node, hence carries zero Gauss mass, hence admits no certificate\. The largest\-root bound \([Lemma˜4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15)\) givesRt\+1=O\(st\)R\_\{t\+1\}=O\(s\\sqrt\{t\}\)in the sub\-Gaussian case andRt\+1=O\(st\)R\_\{t\+1\}=O\(st\)in the sub\-exponential case; pushing the band out to\|c\|−s\>Rt\+1\\lvert c\\rvert\-\\theta s\>R\_\{t\+1\}yields \([4\.1](https://arxiv.org/html/2606.17215#S4.E1)\) by contraposition\. The dichotomy \([4\.2](https://arxiv.org/html/2606.17215#S4.E2)\) follows by settingccto the worst \-quantile\|c⋆\|\\lvert c^\{\\star\}\\rvertwith and reading off the log\-concave tail \([Lemma˜4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16)\)\. The full proof, including the geometric step relating a standardized half\-width to the physical margin, is in[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\. ∎
###### Corollary 4\.4\(Freud\-exponent family; the margin exponent is1/1/\\alpha\)\.
Suppose the clean marginal has a Freud\-type taile−\|u/s\|e^\{\-\\lvert u/s\\rvert\}with exponent≥1\\alpha\\geq 1\(=2\\alpha=2sub\-Gaussian,=1\\alpha=1sub\-exponential\)\. Then the dichotomy of[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)reads: any certificate learner of inlier error must pay SoS degreet=\(log\(1/\)\)t=\\Omega\(\\log\(1/\\varepsilon\)\)*or*margin=\(\(log\(1/\)\)1//d\)\\gamma=\\Omega\\big\(\(\\log\(1/\\varepsilon\)\)^\{1/\\alpha\}/\\sqrt\{d\}\\big\)\. The degree floor is*universal*in ; the margin exponent1/1/\\alphais the reciprocal Freud exponent, equal to the growth rate of the equilibrium\-measure \(MRS\) numberat≍st1/a\_\{t\}\\asymp s\\,t^\{1/\\alpha\}\([Fact˜C\.4](https://arxiv.org/html/2606.17215#A3.Thmtheorem4)\), which is where the orthonormal\-polynomial zeros end\. The cases=2,1\\alpha=2,1recover \([4\.2](https://arxiv.org/html/2606.17215#S4.E2)\) and thelog\(1/\)\\log\(1/\\varepsilon\)marginShen25records\.
###### Proof\.
By[Fact˜C\.4](https://arxiv.org/html/2606.17215#A3.Thmtheorem4)the largest zero isRt\+1=\(1\+o\(1\)\)at\+1≍st1/R\_\{t\+1\}=\(1\+o\(1\)\)\\,a\_\{t\+1\}\\asymp s\\,t^\{1/\\alpha\}, so the no\-node condition\|c\|−s\>Rt\+1\\lvert c\\rvert\-\\theta s\>R\_\{t\+1\}of[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)forces2t=\(\(\|c\|/s\)\)2t=\\Omega\(\(\\lvert c\\rvert/s\)\)\. The \-quantile of thee−\|u/s\|e^\{\-\\lvert u/s\\rvert\}tail is\|c⋆\|=\(s\(log\(1/\)\)1/\)\\lvert c^\{\\star\}\\rvert=\\Theta\\big\(s\(\\log\(1/\\varepsilon\)\)^\{1/\\alpha\}\\big\), whence2t=\(\(\|c⋆\|/s\)\)=\(log\(1/\)\)2t=\\Omega\(\(\\lvert c^\{\\star\}\\rvert/s\)\)=\\Omega\(\\log\(1/\\varepsilon\)\)at fixed standardized half\-width; and the geometric correspondence of[Lemma˜4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16)turns the escape condition into=\(\|c⋆\|/\(sd\)\)=\(\(log\(1/\)\)1//d\)\\gamma=\\Omega\(\\lvert c^\{\\star\}\\rvert/\(s\\sqrt\{d\}\)\)=\\Omega\\big\(\(\\log\(1/\\varepsilon\)\)^\{1/\\alpha\}/\\sqrt\{d\}\\big\)\. ∎
So far the tradeoff is a lower bound on degree\. The weighted\-Chebyshev reduction of[Section˜3](https://arxiv.org/html/2606.17215#S3)\([Proposition˜3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)\) upgrades it to a*matching*ceiling, so the threshold is not merely necessary but exact, conditional on the single residual orthogonal\-polynomial estimate[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)\.
###### Theorem 4\.5\(Tight margin–degree threshold; conditional on[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)\)\.
Assume[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)\. For a sub\-Gaussian log\-concave marginal, the degree\-2t2tcertificate ceiling for a band of half\-width=\(s\)\\tau=\\Theta\(s\)at centerccis\(\(B\)\)\\Theta\(\\mu\(B\)\)for\|c\|≤\(1−\)Rt\+1\\lvert c\\rvert\\leq\(1\-\\delta\)R\_\{t\+1\}and0for\|c\|\>Rt\+1\\lvert c\\rvert\>R\_\{t\+1\}\([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\)\. Hence the threshold of[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)is tight:2t=\(\(\|c\|/s\)2\)\.2t=\\Theta\\big\(\(\\lvert c\\rvert/s\)^\{2\}\\big\)\.
###### Proof\.
Upper: the reduction[Proposition˜3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)and the edge estimate[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)give a minorant of value≥c\(B\)\\geq c\\,\\mu\(B\)for\|c\|≤\(1−\)Rt\+1\\lvert c\\rvert\\leq\(1\-\\delta\)R\_\{t\+1\}\. Lower:[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)gives ceiling0once\|c\|\>Rt\+1\\lvert c\\rvert\>R\_\{t\+1\}, i\.e\. once2t<\(\(\|c\|/s\)/C1\)22t<\(\(\\lvert c\\rvert/s\)/C\_\{1\}\)^\{2\}\. WithRt\+1=\(st\)R\_\{t\+1\}=\\Theta\(s\\sqrt\{t\}\)\([Lemma˜4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15)\) the two meet at2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\\lvert c\\rvert/s\)^\{2\}\)\. ∎
### 4\.2The degree\-2 outlier barrier
The degree\-22/degree\-44gap \([Fact˜3\.4](https://arxiv.org/html/2606.17215#S3.Thmtheorem4)\) is a statement about the marginal; here it is forced on a learner\. At the \-tail center the certificate ceiling is\(c\)2=\(\)\{\}\_\{2\}\(c\)=\\Theta\(\\eta\)at degree22and\(c\)3=\(\)2\{\}\_\{3\}\(c\)=\\Theta\(\{\}^\{2\}\)at degree44\([Fact˜3\.4](https://arxiv.org/html/2606.17215#S3.Thmtheorem4), and[Example˜3\.5](https://arxiv.org/html/2606.17215#S3.Thmtheorem5)for the moment audit\), so any degree\-22outlier program must leave the spike that a degree\-44program removes\. The following instance makes the gap a concrete rate separation: degree22cannot beat1/2on it, while degree44does\. Where the tradeoff was about which bands a certificate can see, this is about which outliers a low\-degree program can remove\.
The relevant quantity is the dirty contribution to the hinge gradient,
LSN\(q∘SD\)=sup∥w∥≤1∑i∈SDqi\|w⋅xi\|,\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)=\\sup\_\{\\lVert w\\rVert\\leq 1\}\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert,whereq∈\[0,1\]nq\\in\[0,1\]^\{n\}are the outlier\-removal weights andSDS\_\{D\}is the corrupted set\. The barrier is*label\-oblivious in expectation*: it holds for any reweighting computed without ground\-truth labels, on average over which cluster points the adversary corrupts\. The literal “for all feasibleqq” fails once the removal budget reaches \(the polytope then contains the weight that zeros the spike\); we state the honest in\-expectation form\.
###### Theorem 4\.7\(Degree\-2 outlier barrier; label\-oblivious\)\.
Let≥\(1/d\)\\eta\\geq\\Omega\(1/d\)\. There is a clean log\-concave mixture with covariance⪯I2\\Sigma\\preceq\{\}^\{2\}Iand an \-fraction malicious setSDS\_\{D\}such that every*label\-oblivious degree\-2 \(variance\) reweighting*—anyq∈\[0,1\]nq\\in\[0,1\]^\{n\}with∑iqi≥\(1−\)n\\sum\_\{i\}q\_\{i\}\\geq\(1\-\\xi\)nfeasible for the directional\-variance budget1n∑iqi\(w⋅xi\)2≤¯2\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2\}\\leq\\bar\{\\sigma\}^\{2\}for allww\(here¯2=\(\)2\\bar\{\\sigma\}^\{2\}=\\Theta\(\{\}^\{2\}\)is the*empirical*directional variance of the corrupted sample, which a degree\-2 program cannot certify below without detecting the spike\), produced by a procedure not given the labels—leaves, in expectation over the adversary’s choice of which cluster points are corrupted,
E\[LSN\(q∘SD\)\]=\(n1/2¯\)\.\\displaystyle\\mdmathbb\{E\}\\big\[\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\big\]\\;=\\;\\Omega\\big\(\{\}^\{1/2\}\\,n\\,\\bar\{\\sigma\}\\big\)\.\(4\.3\)In contrast, the degree\-2t2tprogram of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)achievesLSN\(q∘SD\)/n≤C¯1−1/2t\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)/n\\leq C\\bar\{\\sigma\}\\,\{\}^\{1\-1/2t\}; in particular degree44\(t=2t=2\) reaches<3/41/2\{\}^\{3/4\}<\{\}^\{1/2\}\. Degree\-22outlier removal is stuck at the1/2rate, i\.e\.\(/¯\)20\{\}\_\{0\}\\lesssim\(\\gamma\\rho/\\bar\{\\sigma\}\)^\{2\}\.
\(a\) the construction𝒞−\\mathcal\{C\}\_\{\-\}𝒞\+\\mathcal\{C\}\_\{\+\}e1e\_\{1\}−/\-\\sigma/\\sqrt\{\\eta\}\+/\+\\sigma/\\sqrt\{\\eta\}label\-oblivious programsees*one*multisetclean cluster𝒞±\\mathcal\{C\}\_\{\\pm\}dirtySDS\_\{D\}\(\-fraction\)\(b\) the moment ledgerm2=2m\_\{2\}=\{\}^\{2\}\(unchanged\)ccc\+dc\{\+\}d×1/\\times\\,1/\\etam4m\_\{4\}:\(\)4→/4\\Theta\(\{\}^\{4\}\)\{\\to\}\{\}^\{4\}/\\etaccc\+dc\{\+\}dcc: clean onlyc\+dc\{\+\}d: clean\+\+dirtydegree22blind, degree44seesFigure 4:The*moment\-cloaked spike*behind[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\. \(a\) Two genuine log\-concave clusters𝒞±\\mathcal\{C\}\_\{\\pm\}at±\(/\)e1\\pm\(\\sigma/\\sqrt\{\\eta\}\)e\_\{1\}; the adversary’s dirty points \(filled\) are drawn from the*same*clusters and sit at the clean points’ \(hollow\) locations, so a label\-oblivious variance program sees one indistinguishable multiset\. \(b\) The dirty mass spends exactly the variance budget, so the second momentm2=2m\_\{2\}=\{\}^\{2\}is identical with or without it \(degree22is blind\), while the fourth moment is inflated by1/1/\\eta, the tell a degree\-44certificate sees\.###### Proof sketch\.
The construction is a*moment\-cloaked spike*\([Figure˜4](https://arxiv.org/html/2606.17215#S4.F4)\): two genuine log\-concave clusters𝒞±\\mathcal\{C\}\_\{\\pm\}at means±\(/\)e1\\pm\(\\sigma/\\sqrt\{\\eta\}\)e\_\{1\}\([Lemma˜4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17); using two unimodal bumps rather than one±a\\pm aatom keeps each component log\-concave, at the cost of the regime≥\(1/d\)\\eta\\geq\\Omega\(1/d\)\)\. The adversary addsn\\eta ndirty points drawn from the same𝒞±\\mathcal\{C\}\_\{\\pm\}, so the sample carries≈3n\\approx 3\\eta npoints near±ae1\\pm ae\_\{1\}witha=/a=\\sigma/\\sqrt\{\\eta\}\. The dirty points are engineered to be*second\-moment invisible*\(they spend exactly the variance budget,1n∑SD\(e1⋅xi\)2=a2=2\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{2\}=\\eta a^\{2\}=\{\}^\{2\}\) but*first\-moment aligned*,∑SD\|e1⋅xi\|=na=n\\sum\_\{S\_\{D\}\}\\lvert e\_\{1\}\\cdot x\_\{i\}\\rvert=\\eta n\\,a=\\sqrt\{\\eta\}\\,n\\sigma\. Because the variance constraints depend only on the multiset of points, a label\-oblivious program cannot tell then\\eta ndirty points from the2n2\\eta nclean points of𝒞±\\mathcal\{C\}\_\{\\pm\}at the same locations; removing at mostn\\xi npoints total, it must keep at least a13\\tfrac\{1\}\{3\}share of the combined cluster in expectation, henceE\[LSN\]≥13na=13n\\mdmathbb\{E\}\[\\mathrm\{LSN\}\]\\geq\\tfrac\{1\}\{3\}\\eta n\\,a=\\tfrac\{1\}\{3\}\\sqrt\{\\eta\}\\,n\\sigma\. The sameSDS\_\{D\}has an inflated fourth moment,1n∑SD\(e1⋅xi\)4=¯4/≫¯4\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{4\}=\\bar\{\\sigma\}^\{4\}/\\eta\\gg\\bar\{\\sigma\}^\{4\}, which a degree\-44certificate sees and drives down to the3/4rate, so the separation is exactly one of moment degree\. Details are in[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\. ∎
### 4\.3The degree\-2t2treweighted hinge
The algorithm is the degree\-of\-robustness law of[Section˜3\.3](https://arxiv.org/html/2606.17215#S3.SS3)run forward\. Its outlier\-removal step*is*the order\-\(t\+1\)\(t\{\+\}1\)empirical Christoffel detector ofLasserrePauwels19run as a certificate rather than a threshold, by the bridge raising the detector order is the only knob, and the two barriers fix exactly which rates it can and cannot reach\.[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)reweights the sample with this degree\-2t2tSum\-of\-Squares program and then minimizes the reweighted hinge loss\. The single parameter is the degreett\. Att=1t=1the degree\-2t2tconstraint is exactly the empirical directional variance, and[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)coincides with the reweighted\-hinge method ofShen25:*their algorithm is the degree\-22\(t=1t=1\) special case of ours*, and raisingttis the only change: thed×dd\\times dGram matrix of their degree\-22program becomes the\(d\+t−1t\)×\(d\+t−1t\)\\binom\{d\+t\-1\}\{t\}\\times\\binom\{d\+t\-1\}\{t\}moment matrix of the degree\-2t2tprogram \([Appendix˜D](https://arxiv.org/html/2606.17215#A4)\), and nothing else in the method moves\. The gain is*modest and explicit*: degree2t2tremoves the degree\-22exponent loss in the noise tolerance \(c2↦cc^\{2\}\\mapsto c\) at the price of the sub\-exponential prefactor\(Ct\)\(Ct\); it does not march to1/21/2\(the tolerance is capped by the pancake density/4\\rho/4\)\. The value of the result is its*tightness*:[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)shows the degree\-22corner cannot be improved within the method\.
Algorithm 1Degree\-2t2trobust reweighted hinge1:sample
S=\{\(xi,yi\)\}i=1nS=\\\{\(x\_\{i\},y\_\{i\}\)\\\}\_\{i=1\}^\{n\}; margin ; degree
t≥1t\\geq 1; variance bound
¯2\\bar\{\\sigma\}^\{2\}; removal budget
=\(\)\\xi=\\Theta\(\\eta\)
2:unit halfspace
w^\\hat\{w\}
3:Prune\.Discard each
\(xi,yi\)\(x\_\{i\},y\_\{i\}\)whose
∥xi∥\\lVert x\_\{i\}\\rVertexceeds the clean\-sample radius
Rcl=C¯dR\_\{\\mathrm\{cl\}\}=C\\bar\{\\sigma\}\\sqrt\{d\}\.⊳\\trianglerightremoves gross\-norm corruptions; every clean∥x∥≤Rcl\\lVert x\\rVert\\leq R\_\{\\mathrm\{cl\}\}w\.h\.p\. \(log\-concave concentration\)
4:Reweight\(degree\-
2t2tsoft outlier removal\)\. Find
q∈\[0,1\]nq\\in\[0,1\]^\{n\}with
∑iqi≥\(1−\)n\\sum\_\{i\}q\_\{i\}\\geq\(1\-\\xi\)nadmitting a degree\-
2t2tSoS certificate, in the indeterminate
ww, of
5:1n∑iqi\(w⋅xi\)2t≤\(Ct\)2t¯2tfor allw\\displaystyle\\rule\{0\.0pt\}\{15\.49997pt\}\\tfrac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\\,\(w\\cdot x\_\{i\}\)^\{2t\}\\ \\leq\\ \(Ct\)^\{2t\}\\,\\bar\{\\sigma\}^\{2t\}\\qquad\\text\{for all \}w
6:⊳\\trianglerighta semidefinite program of sizedO\(t\)d^\{O\(t\)\}\([Appendix˜D](https://arxiv.org/html/2606.17215#A4)\); feasible by[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\. Att=1t=1: the variance LP ofShen25\.
7:Reweighted hinge\.
w^←argmin∥w∥≤1/∑iqihinge\(w;xi,yi\)\\displaystyle\\hat\{w\}\\leftarrow\\arg\\min\_\{\\lVert w\\rVert\\leq 1/\\gamma\}\\ \\sum\_\{i\}q\_\{i\}\\,\\operatorname\{hinge\}\(w;x\_\{i\},y\_\{i\}\)\.⊳\\trianglerightconvex program
8:return
w^/∥w^∥\\hat\{w\}/\\lVert\\hat\{w\}\\rVert\.
###### Theorem 4\.9\(Degree\-2t2trobust hinge\)\.
Run[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)with degreet≥1t\\geq 1\. Its degree\-2t2toutlier\-removal program is feasible \(the clean\-indicatorqqis certified by[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\), and any feasibleqqsatisfies the Hölder bound
LSN\(q∘SD\)=sup∥w∥≤1∑i∈SDqi\|w⋅xi\|≤n\(Ct\)¯\.1−12t\\displaystyle\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\;=\\;\\sup\_\{\\lVert w\\rVert\\leq 1\}\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\;\\leq\\;n\\,\(Ct\)\\,\\bar\{\\sigma\}\\,\{\}^\{1\-\\frac\{1\}\{2t\}\}\.\(4\.4\)Consequently, under the \-margin log\-concave mixture of[Section˜2](https://arxiv.org/html/2606.17215#S2), the outputw^\\hat\{w\}has inlier errorerr𝒟X\(w^\)≤\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\leq\\varepsilonfor every malicious rate
≤\(t\)0=\(min\{4,\(/\(\(Ct\)¯\)\)2t2t−1\}\),\\displaystyle\\eta\\;\\leq\\;\{\}\_\{0\}\(t\)\\;=\\;\\Omega\\Big\(\\min\\Big\\\{\\tfrac\{\\rho\}\{4\},\\,\\big\(\\gamma\\rho/\(\(Ct\)\\bar\{\\sigma\}\)\\big\)^\{\\frac\{2t\}\{2t\-1\}\}\\Big\\\}\\Big\),\(4\.5\)in timenO\(t\)n^\{O\(t\)\}with sample sizen=dO\(t\)poly\(1/\)n=d^\{O\(t\)\}\\operatorname\{poly\}\(1/\\varepsilon\)\. Att=1t=1this is exactlyShen25’s1/2rate; ast→∞t\\to\\inftythe exponent in \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) tends to11\.
###### Proof sketch\.
The bound \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) is one application of Hölder with exponent2t2t:∑SDqi\|w⋅xi\|≤\(∑SDqi\)1−12t\(∑SDqi\|w⋅xi\|2t\)12t\\sum\_\{S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\leq\(\\sum\_\{S\_\{D\}\}q\_\{i\}\)^\{1\-\\frac\{1\}\{2t\}\}\(\\sum\_\{S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert^\{2t\}\)^\{\\frac\{1\}\{2t\}\}, then∑SDqi≤n\\sum\_\{S\_\{D\}\}q\_\{i\}\\leq\\eta nand, since\|w⋅x\|2t=\(w⋅x\)2t\\lvert w\\cdot x\\rvert^\{2t\}=\(w\\cdot x\)^\{2t\}is an even polynomial and the sum overSDS\_\{D\}is dominated by the sum over all ofSS, the certified bound1n∑Sqi\(w⋅xi\)2t≤\(Ct\)2t¯2t\\frac\{1\}\{n\}\\sum\_\{S\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\leq\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}caps the second factor\. Feasibility is[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)applied to the empirical clean measure\. The noise tolerance \([4\.5](https://arxiv.org/html/2606.17215#S4.E5)\) comes from feeding \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) into the robustness condition ofShen25: the clean hinge gradient along the empirical optimumw^\\hat\{w\}is lower\-bounded by the dense\-pancake density , and requiring4\(−2\)\>LSN\(q∘SD\)/n\\frac\{\\gamma\}\{4\}\(\\rho\-2\\xi\)\>\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)/nsolves to\(t\)0\{\}\_\{0\}\(t\)\. The full assembly is in[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\. ∎
### 4\.4The information\-theoretic floor
The algorithm’s tolerance is bounded above by the barriers; its accuracy is bounded below by information theory\. The next proposition is a breakdown floor that no algorithm, of any computational power, can beat, and it matches[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)in rate\.
###### Proposition 4\.11\(Breakdown floor\)\.
Under malicious noise \(the adversary sees the clean sample and replaces an \-fraction of instance–label pairs with full knowledge\) and a \-margin log\-concave mixture with⪯I2\\Sigma\\preceq\{\}^\{2\}I, no algorithm can outputw^\\hat\{w\}with inlier error below
err𝒟X\(w^\)≥2\(1−\)≥2\.\\displaystyle\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\;\\geq\\;\\frac\{\\eta\}\{2\(1\-\\eta\)\}\\;\\geq\\;\\frac\{\\eta\}\{2\}\.\(4\.6\)The bound is margin\-independent and rate\-tight against the\(\)\\Theta\(\\eta\)upper bound of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\. The two\-halfspace instance realizing it is genuinely \-margin separable in the regime≤e−/222\\eta\\leq e^\{\-\{\}^\{2\}/2\{\}^\{2\}\}\(the rotation by angle=/\(1−\)\\theta=\\pi\\eta/\(1\-\\eta\)preserves the margin there\); the floor itself does not depend on \.
###### Proof sketch\.
Le Cam two\-point\. Take an isotropic log\-concave instance and two halfspacesw1⋆,w2⋆w\_\{1\}^\{\\star\},w\_\{2\}^\{\\star\}at angle , inducing clean joint lawsP1,P2P\_\{1\},P\_\{2\}\. The Kearns–Li malicious envelope\(KearnsLi93\)givesTV\(P1,P2\)≤/\(1−\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)\\leq\\eta/\(1\-\\eta\), so an \-malicious adversary makes the two instances indistinguishable whenever
/=Pr\[sgn\(w1⋆⋅x\)≠sgn\(w2⋆⋅x\)\]≤/\(1−\)\\theta/\\pi=\\Pr\[\\operatorname\{sgn\}\(w\_\{1\}^\{\\star\}\\\!\\cdot x\)\\neq\\operatorname\{sgn\}\(w\_\{2\}^\{\\star\}\\\!\\cdot x\)\]\\leq\\eta/\(1\-\\eta\)\(the random\-hyperplane disagreement identity\)\. Any singlew^\\hat\{w\}errs by at least/\(2\)\\theta/\(2\\pi\)on one of the two instances, giving \([4\.6](https://arxiv.org/html/2606.17215#S4.E6)\)\. The covariance control under halfspace truncation usesBrascampLieb76\. Details in[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\. ∎
### 4\.5Supporting lemmas
We collect the four lemmas the theorems rest on\. Each is stated here and proved in[Appendix˜B](https://arxiv.org/html/2606.17215#A2)\.
###### Lemma 4\.13\(Certifiable hypercontractivity\)\.
Let𝒟X=1K∑j=1K𝒟j\\mathcal\{D\}\_\{X\}=\\frac\{1\}\{K\}\\sum\_\{j=1\}^\{K\}\\mathcal\{D\}\_\{j\}onRd\\mdmathbb\{R\}^\{d\}, each𝒟j\\mathcal\{D\}\_\{j\}log\-concave with meanj\(∥∥j≤r\\lVert\{\}\_\{j\}\\rVert\\leq r\) and⪯jI2\{\}\_\{j\}\\preceq\{\}^\{2\}I, in the regimer=O\(d\)r=O\(\\sigma\\sqrt\{d\}\); write¯=1K∑jj\\bar\{\\mu\}=\\frac\{1\}\{K\}\\sum\_\{j\}\{\}\_\{j\}and¯2\(w\)=E𝒟X\[\(w⋅\(x−¯\)\)2\]\\bar\{\\sigma\}^\{2\}\(w\)=\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{X\}\}\[\(w\\cdot\(x\-\\bar\{\\mu\}\)\)^\{2\}\]\. Then for everyt≥1t\\geq 1there is a degree\-2t2tSoS proof inwwof
Ex∼𝒟X\[\(w⋅\(x−¯\)\)2t\]≤\(Ct\)2t\(¯2\(w\)\)t,\\displaystyle\\mdmathbb\{E\}\_\{x\\sim\\mathcal\{D\}\_\{X\}\}\\big\[\(w\\cdot\(x\-\\bar\{\\mu\}\)\)^\{2t\}\\big\]\\;\\leq\\;\(Ct\)^\{2t\}\\,\\big\(\\bar\{\\sigma\}^\{2\}\(w\)\\big\)^\{t\},\(4\.7\)withCCabsolute\. The constant is the sub\-exponential\(Ct\)2t\(Ct\)^\{2t\}; the sub\-Gaussian\(Ct\)t\(Ct\)^\{t\}holds only under added sub\-Gaussianity of the components\.
###### Lemma 4\.14\(Minorant reduction and Gauss quadrature\)\.
Let be a measure with all moments and infinite support,mk=∫ukdm\_\{k\}=\\int u^\{k\}\\,d\\mu\.\(i\)A degree\-2t2tmoment\-only certificate ofPr\[U∈B\]≥\\Pr\[U\\in B\]\\geq\\rhois equivalent to a polynomialpp,degp≤2t\\deg p\\leq 2t, withp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}pointwise, together with a degree\-2t2tSoS proof from the moment axioms\{∫ukd=mk\}k≤2t\\\{\\int u^\{k\}\\,d\\nu=m\_\{k\}\\\}\_\{k\\leq 2t\}that∫pd≥\\int p\\,d\\nu\\geq\\rho\.\(ii\)The\(t\+1\)\(t\{\+\}1\)\-node Gauss measure=GQ∑iuii\{\}\_\{\\mathrm\{GQ\}\}=\\sum\_\{i\}\{\}\_\{i\}\{\}\_\{u\_\{i\}\}\(nodes the zeros of the orthonormal polynomialt\+1, Christoffel weights\>i0\{\}\_\{i\}\>0\) is nonnegative, atomic on thet\+1t\{\+\}1nodes, and exact on degrees≤2t\+1\\leq 2t\+1:∫gd=GQ∫gd\\int g\\,d\{\}\_\{\\mathrm\{GQ\}\}=\\int g\\,d\\mu\.
###### Lemma 4\.15\(Largest\-root bound\)\.
For centered log\-concave at scaless, the largest absolute zeroRt\+1=maxi\|ui\|R\_\{t\+1\}=\\max\_\{i\}\\lvert u\_\{i\}\\rvertof the orthonormal polynomialt\+1satisfies:Rt\+1≤2st\+1R\_\{t\+1\}\\leq 2s\\sqrt\{t\+1\}for a sub\-Gaussian tail \(e−\|u/s\|2e^\{\-\\lvert u/s\\rvert^\{2\}\}, exact for Hermite\), andRt\+1=O\(st\)R\_\{t\+1\}=O\(s\\,t\)for a sub\-exponential tail \(e−\|u/s\|e^\{\-\\lvert u/s\\rvert\}, Freud exponent=1\\alpha=1\)\. More generally, for a Freud exponent∈\[1,2\]\\alpha\\in\[1,2\],Rt\+1≍st1/R\_\{t\+1\}\\asymp s\\,t^\{1/\\alpha\}\.
###### Lemma 4\.16\(Log\-concave quantiles and margin geometry\)\.
For centered log\-concave at scaless, the \-quantile satisfies\|c⋆\|=\(slog\(1/\)\)\\lvert c^\{\\star\}\\rvert=\\Theta\(s\\sqrt\{\\log\(1/\\beta\)\}\)in the sub\-Gaussian case and\(slog\(1/\)\)\\Theta\(s\\log\(1/\\beta\)\)in the sub\-exponential case\. Under near\-isotropy \(∥w∥≤1/\\lVert w\\rVert\\leq 1/\\gamma,s=\(1\)s=\\Theta\(1\)\), a standardized half\-width=\(s\)\\tau=\\Theta\(s\)corresponds to physical margin=\(d\)\\gamma=\\Theta\(\\tau\\sqrt\{d\}\)\.
The cloak construction behind[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)is also deferred; we state it as a lemma for the appendix\.
###### Lemma 4\.17\(Two\-cluster moment cloak\)\.
For≥\(1/d\)\\eta\\geq\\Omega\(1/d\)there exist two log\-concave components𝒞±\\mathcal\{C\}\_\{\\pm\}onRd\\mdmathbb\{R\}^\{d\}with⪯𝒞±I2\{\}\_\{\\mathcal\{C\}\_\{\\pm\}\}\\preceq\{\}^\{2\}Iand means±\(/\)e1\\pm\(\\sigma/\\sqrt\{\\eta\}\)e\_\{1\}\(so∥∥±=/≤r\\lVert\{\}\_\{\\pm\}\\rVert=\\sigma/\\sqrt\{\\eta\}\\leq rfits the regimer=O\(d\)r=O\(\\sigma\\sqrt\{d\}\)exactly when≥\(1/d\)\\eta\\geq\\Omega\(1/d\)\) such that the combined3n3\\eta n\-point cluster \(clean2n2\\eta nplus dirtyn\\eta n, both drawn from𝒞±\\mathcal\{C\}\_\{\\pm\}\) spends exactly thee1e\_\{1\}\-variance budget2and is invariant under permuting dirty and clean cluster points\.
## 5Discussion
### 5\.1Context and comparisons
#### The reweighted\-hinge cluster, and the two limitations we explain\.
Hinge\-loss minimization is robust because the loss is Lipschitz: a corrupted point can tilt the gradient by only so much\.Talwar20made this quantitative through the dense\-pancake condition and a first\-order \(KKT\) analysis, tolerating a malicious fraction=\(\)\\eta=\\Omega\(\\gamma\)at margin=\(logd/d\)\\gamma=\\Omega\(\\log d/\\sqrt\{d\}\)\.Shen25sharpened the constant to0a fixed absolute constant by*reweighting*the hinge: an outlier\-removal program assigns weightsq∈\[0,1\]nq\\in\[0,1\]^\{n\}and the only distributional fact it uses about the clean marginal is a*degree\-22*\(empirical\-variance\) bound on the reweighted directional moment, through the controlling quantityLSN\(q∘SD\)\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\.ZengShen25carry the same degree\-22reweighting program into the attribute\-efficient setting, inheriting the same two features, and those two features are exactly what we name\. The multiclass work ofAdhikariZeng26instead removes outliers by cluster\-based pruning with a*standard*multiclass hinge rather than by reweighting, so it lies outside the reweighted\-hinge program our barriers target\.Shen25states only that the breakdown rate “cannot be made close to the optimal breakdown point of12\\tfrac\{1\}\{2\}due to inherent limitations of our approach,” without identifying the limitation, and that the margin has a “logarithmic dependence on1/1/\\varepsilonthat appears less natural,” which he conjectures “can be improved to=\(1/d\)\\gamma=\\Omega\(1/\\sqrt\{d\}\)” through an active\-learning subroutine\. We make the limitation precise: it is the*degree*of the certificate\.[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)gives a clean log\-concave instance on which degree22is*provably*stuck at the1/2rate that forcesShen25’s small constant, while degree44already reaches3/4; the constant is small because the exponent is12\\tfrac\{1\}\{2\}, a property of degree22, and raising the degree lifts it toward the pancake cap \([Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9),1\-1/2t, not a march to1/21/2\)\. The margin he calls less natural is, we show,*forced*:[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)proves thelog\(1/\)\\log\(1/\\varepsilon\)is conserved by every one\-shot certificate of fixed degree, so it is the intrinsic price of the degree, not an artifact of his analysis, removable only by raising the degree \(inn\(log1/\)n^\{\\Omega\(\\log 1/\\varepsilon\)\}time\), enlarging the margin, or leaving the one\-shot model by iterating, the very active\-learning route he conjectures \([Section˜5\.2](https://arxiv.org/html/2606.17215#S5.SS2)\)\. The companion[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)raises the program to degree2t2tand recovers the matching upper side1\-1/2t, sot=1t=1\(Shen25\) is the exact degree\-22corner of a frontier rather than an isolated bound\.
#### The machinery we import\.
Degree raising is borrowed wholesale from the sum\-of\-squares \(SoS\) literature on robust statistics\. The mechanism that converts a degree\-2t2tcertified\-moment bound into an outlier\-removal guarantee at rate1\-1/2tis the proofs\-to\-algorithms paradigm ofHopkinsLi18andKothariSteinhardtSteurer18, and the property that makes it apply \(certifiable hypercontractivity, an SoS proof thatE\[\(w⋅x\)2t\]\\mdmathbb\{E\}\[\(w\\cdot x\)^\{2t\}\]is bounded by a power ofE\[\(w⋅x\)2\]\\mdmathbb\{E\}\[\(w\\cdot x\)^\{2\}\]\) is due toKlivansKothariMeka18, with the general subgaussian case settled byDiakonikolasHopkinsPensiaTiegel24\. For us the certifiable constant is\(Ct\)2t\(Ct\)^\{2t\}, the sub\-exponential rate of a general log\-concave marginal \(a single Laplace coordinate already forcesm2t/m2t=\(\(t\)\)2tm\_\{2t\}/m\_\{2\}^\{t\}=\(\\Theta\(t\)\)^\{2t\}, so the often\-quoted\(Ct\)t\(Ct\)^\{t\}is false at this generality\); this is what couples the achievable to the SoS degree in[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\.
#### The dual direction we provably do not have\.
The same SoS toolkit can certify*anti*\-concentration:BakshiKothariRTV24give quasi\-polynomial SoS certificates that a slab carries*at most*a prescribed mass, beyond the Gaussian case\. Our dense\-pancake premise needs the opposite \(a degree\-2t2tcertificate that an off\-center band carries*at least*mass\), and[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)is precisely the statement that no such*pro*\-concentration certificate exists below degree\(\(\|c\|/s\)2\)\\Omega\(\(\|c\|/s\)^\{2\}\)\. This is the cone asymmetry of[Section˜3\.1](https://arxiv.org/html/2606.17215#S3.SS1)read at the learning scale: the two faces ofℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\)are certifiable anti\-concentration \(the upper envelope\) and the pro\-concentration we need \(the lower envelope\), and only the former is low\-degree\. The conservedlog\(1/\)\\log\(1/\\varepsilon\)is that asymmetry’s price\.
#### What does not subsume our result\.
KlivansSTV25achieve a2\+2\\eta\+\\varepsilonguarantee for learning halfspaces under heavy contamination by iterative filtering, anO\(1\)O\(1\)noise rate that may look like it dominates our modest constant\. It does not reach our setting on four counts: it is Gaussian, not general log\-concave mixture; it assumes*no*margin and so has nolog\(1/\)\\log\(1/\\varepsilon\)to conserve; it is a filtering algorithm, not a moment/SoS certificate, so the degree\-cost question we study does not arise for it; and it targets agnostic \(boundary\-volume\) error rather than the clean\-inlier errorerr𝒟X\(w^\)\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)of the malicious model\. We cite it as a reference point for what*is*achievable absent these constraints, not as a result that absorbs the reweighted\-hinge line\. The agnostic and nasty\-noise halfspace tradition,DiakonikolasKaneStewart18via iterative spectral filtering and the near\-optimal SQ lower bounds ofDiakonikolasKaneZarifis20, lives in a genuinely different model \(OPT\+\+\\varepsilonerror, not the margin\-plus\-malicious model here\) and provides our methodological foil below\.
#### Method\-specific barriers, and the SQ contrast\.
Our lower bounds rule out a specific class of methods\.[Theorems˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)and[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)are barriers against the*moment/SoS\-certificate method*, any learner whose only distributional input is the degree\-≤2t\\leq 2tcertified moments of the clean marginal\. They are not information\-theoretic, and they are not statistical\-query hardness\. This scoping is what turnsShen25’s informal “limitations of our approach” into a theorem about the approach, while leaving open that a procedure outside the model does better\. The contrast with the unconditional/SQ tradition is thus between two kinds of impossibility: a SQ lower bound\(DiakonikolasKaneZarifis20\)says no efficient query algorithm of a broad class succeeds, whereas our degree barrier says no certificate of a bounded resource succeeds, and names the resource \(SoS degree\) whose increase provably buys the way out \(traced by[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\)\. The barrier and the algorithm meet at the same exponent, which is why we state them together\.
### 5\.2Open problems
- •The tight breakdown=⋆\(/\)\{\}^\{\\star\}=\\Theta\(\\gamma/\\sigma\)\.[Proposition˜4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)is rate\-tight \(err≥/2\\mathrm\{err\}\\geq\\eta/2\) but not threshold\-tight: it certifies the\(\)\\Theta\(\\eta\)floor, not the constant at which learning fails\. Under a hard margin the two\-point construction provably saturates \(the disagreement wedge is exponentially evacuated,≤sat12e−/222\{\}\_\{\\mathrm\{sat\}\}\\leq\\frac\{1\}\{2\}e^\{\-\{\}^\{2\}/2\{\}^\{2\}\}\), so closing the gap to the upper cap=0/4\{\}\_\{0\}=\\rho/4requires a multi\-point or SQ band\-mass argument that exploits the low boundary density=\(/\)\\rho=\\Theta\(\\gamma/\\sigma\)\. We conjecture the tight breakdown is=⋆\(/\)\{\}^\{\\star\}=\\Theta\(\\gamma/\\sigma\)\.
- •The matching certificate\-degree upper bound \(now reduced to one estimate\)\.[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)lower\-bounds the degree of an off\-center pro\-concentration certificate by\(\(\|c\|/s\)2\)\\Omega\(\(\|c\|/s\)^\{2\}\)\.[Proposition˜3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)turns the matching upper bound into the weighted\-Chebyshev extremal problem𝖤t\(B\)\\mathsf\{E\}\_\{t\}\(B\), and[Theorem˜4\.5](https://arxiv.org/html/2606.17215#S4.Thmtheorem5)makes the threshold tight,2t=\(\(\|c\|/s\)2\)2t=\\Theta\(\(\|c\|/s\)^\{2\}\),*modulo*a single estimate,[Lemma˜3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7): that the extremal cost stays below11up to the Mhaskar–Rakhmanov–Saff edge\. Classical Christoffel and finite–infinite\-range asymptotics\(MateNevaiTotik91;Totik00;LevinLubinsky01\)deliver this asymptotically; deriving its uniform non\-asymptotic constant is the residual open problem, and by[Remark˜3\.9](https://arxiv.org/html/2606.17215#S3.Thmtheorem9)it genuinely requires weighted potential theory rather than an elementary construction\.
- •A self\-contained general\-log\-concave largest\-root bound\.The Gauss\-quadrature obstruction needs the largest orthogonal\-polynomial root for the corrupted weight\. We prove the two endpoints elementarily \(sub\-Gaussian2st\+12s\\sqrt\{t\+1\}and sub\-exponentialO\(st\)O\(st\), via Jacobi\-matrix eigenvalues and Gershgorin\), but the fully general log\-concave case currently invokes the exponential\-weight asymptotics ofLevinLubinsky01\. A self\-contained largest\- root bound for general log\-concave weights would remove that black box\.
- •Does iterative localization escape the barrier?Our barriers are scoped to one\-shot certificates\.Talwar20and the line after it raise the possibility of an*iterative*localization in the style ofAwasthiBalcanLong17that re\-certifies on a shrinking band\. Such a procedure leaves the one\-shot\-certificate model by construction, so[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)does not forbid it; whether iteration provably escapes the margin–degree barrier, and at what compute, is open and is the most likely route to removing thelog\(1/\)\\log\(1/\\varepsilon\)without payingn\(log1/\)n^\{\\Omega\(\\log 1/\\varepsilon\)\}\.
- •A degree\-2t2tSoS lower bound for the joint program\.By[Remark˜3\.1](https://arxiv.org/html/2606.17215#S3.Thmtheorem1), the margin–degree barrier is a degree\-2t2tSum\-of\-Squares lower bound for the*one\-dimensional*, per\-direction band\-mass certification\. Lifting it to a degree\-2t2tSoS lower bound for the fulldd\-dimensional reweighting program of[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)over all directions*jointly*is open fort≥2t\\geq 2: the per\-direction reduction is lossy \(one degree\-2t2tpseudo\-moment matrix couples all directions, and the axiswise Gauss witnesses are not jointly realizable\), and the one\-dimensional nonnegativity\-equals\-SoS exactness that powers[Remark˜3\.1](https://arxiv.org/html/2606.17215#S3.Thmtheorem1)fails inRd\\mdmathbb\{R\}^\{d\}\. A singledd\-dimensional pseudo\-expectation that defeats the program in every direction at once would turn the certificate barrier into a genuine SoS lower bound for the method itself\.
- •Statistical\-query hardness for the learning problem\.Our barriers limit the moment/SoS\-certificate method; they are not algorithm\-independent\. Whether margin\-halfspace learning under malicious noise is hard for the broad statistical\-query class\(DiakonikolasKaneZarifis20\)—a different object, needing a family of nearly\-indistinguishable hard instances rather than a single certification barrier—is open, and would be the unconditional counterpart to the method\-specific barriers proved here\.
### 5\.3Conclusion
Read through the degree\-cost lens, robust margin\-halfspace learning by reweighted hinge has a single governing quantity: the SoS degree of the outlier\-removal certificate\. Degree22is theShen25corner; degree2t2tbuys outlier tolerance1\-1/2tand costsnO\(t\)n^\{O\(t\)\}time; and two barriers fence the achievable region from below: one \([Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\) showing degree22cannot beat1/2, the other \([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\) showing any fixed degree conserves thelog\(1/\)\\log\(1/\\varepsilon\)margin, with the algorithm of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)tracing the frontier they bound\. The barriers are method\-specific, against the certificate itself; making the degree the resource is what lets the algorithm and the limits be stated to the same exponent\. The quantity that sets that degree is not new: it is the Christoffel function of the marginal, the same object data analysis reads to spot an outlier, here measuring instead the room an adversary has to hide one\.
## References
## Appendix ANotation and map of results
For convenience we collect the recurring notation and chart how the results depend on one another\.[Table˜2](https://arxiv.org/html/2606.17215#A1.T2)lists the symbols;[Figure˜5](https://arxiv.org/html/2606.17215#A1.F5)is the dependency graph of the theorems, lemmas, and external facts\.
### A\.1Notation
SymbolMeaningSymbolMeaning*Noise model and algorithm*malicious noise rate\(t\)0\{\}\_\{0\}\(t\)tolerated rate at degree2t2ttarget inlier errorerr𝒟X\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}inlier error ofw^\\hat\{w\}S=SC∪SDS=S\_\{C\}\\cup S\_\{D\}clean / dirty sampleq∈\[0,1\]nq\\in\[0,1\]^\{n\}reweighting weightshinge\\operatorname\{hinge\}hinge lossLSN\(q∘SD\)\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)dirty\-gradient quantity \([2\.1](https://arxiv.org/html/2606.17215#S2.E1)\)ttdegree parameter \(certificate degree2t2t\)nnsample size*Clean distribution*𝒟X=1K∑j𝒟j\\mathcal\{D\}\_\{X\}=\\tfrac\{1\}\{K\}\\sum\_\{j\}\\mathcal\{D\}\_\{j\}clean marginal𝒟j\\mathcal\{D\}\_\{j\}log\-concave component⪯jI2\{\}\_\{j\}\\preceq\{\}^\{2\}Icomponent covariancerrmean bound∥∥j≤r\\lVert\{\}\_\{j\}\\rVert\\leq rmarginw⋆w^\{\\star\}target halfspaceU=w⋅xU=w\\cdot xone\-dimensional projections=w⊤ws=\\sqrt\{w^\{\\top\}\\Sigma w\}projected std\. dev\.¯2\(w\)\\bar\{\\sigma\}^\{2\}\(w\)reweighted directional variancelaw ofUU*Band, certificate, moments*B=\[c−,c\+\]B=\[c\-\\tau,c\+\\tau\]band,c\\tau,\\ chalf\-width, centermk=E\[Uk\]m\_\{k\}=\\mdmathbb\{E\}\[U^\{k\}\]momentsR\[u\]≤2t\\mdmathbb\{R\}\[u\]\_\{\\leq 2t\}polynomials of degree≤2t\\leq 2tE~\\tilde\{\\mdmathbb\{E\}\}degree\-2t2tpseudo\-expectationpppolynomial minorant \(p≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}\)pancake density𝒫w\\mathcal\{P\}\_\{w\}dense pancake \(band\)𝖤t\(B\)\\mathsf\{E\}\_\{t\}\(B\)weighted\-Chebyshev valueLEAK\\mathrm\{LEAK\}Christoffel–Darboux leakage*Orthogonal polynomials \([Section˜3](https://arxiv.org/html/2606.17215#S3)\)*korthonormal polynomials ofKn\(x,y\)K\_\{n\}\(x,y\)Christoffel–Darboux kernel\(x\)n=Kn\(x,x\)−1\{\}\_\{n\}\(x\)=K\_\{n\}\(x,x\)^\{\-1\}Christoffel functionuju\_\{j\}Gauss\(–Hermite\) nodes=j\(uj\)t\+1\{\}\_\{j\}=\{\}\_\{t\+1\}\(u\_\{j\}\)Christoffel \(Gauss\) weightsℓj\\ell\_\{j\}fundamental Lagrange polysRt\+1R\_\{t\+1\}largest root oft\+1eqequilibrium densityWB=∑uj∈BjW\_\{B\}=\\sum\_\{u\_\{j\}\\in B\}\{\}\_\{j\}Christoffel band masse0=∑uj∈Bℓje\_\{0\}=\\sum\_\{u\_\{j\}\\in B\}\\ell\_\{j\}in\-band block sum
Table 2:Summary of notations\.
### A\.2Map of results
Arrows in[Figure˜5](https://arxiv.org/html/2606.17215#A1.F5)point from each ingredient*up*to the result that uses it: the external facts feed the lemmas, which feed the headline results\. The Christoffel framework of[Section˜3](https://arxiv.org/html/2606.17215#S3)\(Prop[3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3),Prop[3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)\) is the organizing layer from which the barriers and the algorithm of[Section˜4](https://arxiv.org/html/2606.17215#S4)are read off\.
Thm[4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)margin–degreeThm[4\.5](https://arxiv.org/html/2606.17215#S4.Thmtheorem5)tight \(cond\.\)Prop[3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3)resolutionThm[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)degree\-2 barrierThm[4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)algorithmProp[4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)floorLem[B\.1](https://arxiv.org/html/2606.17215#A2.Thmtheorem1)dualityLem[B\.2](https://arxiv.org/html/2606.17215#A2.Thmtheorem2)witnessProp[3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)reductionLem[3\.7](https://arxiv.org/html/2606.17215#S3.Thmtheorem7)wtd\.\-extremalLem[4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16)quantilesLem[4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)GaussLem[4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15)rootLem[4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17)cloakLem[4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)hypercontr\.Fact[C\.4](https://arxiv.org/html/2606.17215#A3.Thmtheorem4)Levin–LubinskyFact[C\.7](https://arxiv.org/html/2606.17215#A3.Thmtheorem7)Christoffel==max massFact[C\.1](https://arxiv.org/html/2606.17215#A3.Thmtheorem1)Lovász–VempalaFact[C\.5](https://arxiv.org/html/2606.17215#A3.Thmtheorem5)hypercontr\.\(certif\.\)Fact[C\.2](https://arxiv.org/html/2606.17215#A3.Thmtheorem2)Kearns–LiFact[C\.3](https://arxiv.org/html/2606.17215#A3.Thmtheorem3)Brascamp–Lieb
Figure 5:Map of results: arrows point from each ingredient*up*to the result that uses it\. External facts \(bottom, restated in[Appendix˜C](https://arxiv.org/html/2606.17215#A3)\) feed the lemmas, which feed the headline results \(top\)\. The §[3](https://arxiv.org/html/2606.17215#S3)framework \(Prop[3\.3](https://arxiv.org/html/2606.17215#S3.Thmtheorem3),Prop[3\.6](https://arxiv.org/html/2606.17215#S3.Thmtheorem6)\) is the Christoffel\-function layer that organizes the margin–degree results, which are proved from the lemmas below\.
## Appendix BDeferred proofs
This appendix reproduces the full proofs\. Each result is restated verbatim \(via the starred macro, so the appendix copy cannot drift from[Section˜4](https://arxiv.org/html/2606.17215#S4)\) and then proved\. We proceed in dependency order: the margin–degree barrier and its supporting lemmas first, then the algorithm and the certifiable\-hypercontractivity lemma it needs, then the degree\-22barrier and its cloak, and finally the breakdown floor\. Where a step rests on an external theorem or a not\-yet\-self\-contained sub\-lemma, a remark after the proof says so explicitly\.
### B\.1The margin–degree tradeoff
Throughout this subsection is a centered probability measure onR\\mdmathbb\{R\}with all moments finite and infinite support,mk=∫ukdm\_\{k\}=\\int u^\{k\}\\,d\\muits moments andssits standard deviation;B=\[c−,c\+\]B=\[c\-\\tau,c\+\\tau\]is a band of half\-width=s\\tau=\\theta s,=\(1\)\\theta=\\Theta\(1\), and=\(1\)\\rho=\\Omega\(1\)a target mass\. A*degree\-2t2tmoment\-only certificate*of\(B\)≥\\mu\(B\)\\geq\\rhois a polynomial minorantp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\},degp≤2t\\deg p\\leq 2t, together with a degree\-2t2tSoS proof*from the moment axioms*\{∫ukd=mk\}k≤2t\\\{\\int u^\{k\}\\,d\\nu=m\_\{k\}\\\}\_\{k\\leq 2t\}alone that∫pd≥\\int p\\,d\\nu\\geq\\rho\([Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)\(i\)\)\. Reading only the moments, such a proof certifies\(B\)≥E\[p\]≥\\mu\(B\)\\geq\\mdmathbb\{E\}\[p\]\\geq\\rhofor the true law and, simultaneously, for*every*nonnegative matchingm0,…,m2tm\_\{0\},\\dots,m\_\{2t\}; hence the best certifiable value is
¯\(2t\)≔supp:degp≤2tp≤𝟏Binf≥0∫ukd=mk\(k≤2t\)∫pd,\\displaystyle\\overline\{\\rho\}\(2t\)\\;\\coloneqq\\;\\sup\_\{\\begin\{subarray\}\{c\}p:\\ \\deg p\\leq 2t\\\\ p\\leq\\mathbf\{1\}\_\{B\}\\end\{subarray\}\}\\;\\inf\_\{\\begin\{subarray\}\{c\}\\nu\\geq 0\\\\ \\int u^\{k\}\\,d\\nu=m\_\{k\}\\,\(k\\leq 2t\)\\end\{subarray\}\}\\;\\int p\\,d\\nu,\(B\.1\)and a certificate exists only if¯\(2t\)≥\\overline\{\\rho\}\(2t\)\\geq\\rho\. The next two lemmas isolate the model\-independent core \(weak duality and a Gauss\-quadrature witness\) after which[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)is one per\-class root estimate\.
###### Lemma B\.1\(weak moment–LP duality\)\.
The certifiable ceiling is at most the moment\-LP band minimum,¯\(2t\)≤m¯\(2t\)\\overline\{\\rho\}\(2t\)\\leq\\underline\{m\}\(2t\), where
m¯\(2t\)≔inf\{\(B\):≥0,∫ukd=mk∀k≤2t\}\.\\displaystyle\\underline\{m\}\(2t\)\\;\\coloneqq\\;\\inf\\big\\\{\\nu\(B\):\\nu\\geq 0,\\ \\textstyle\\int u^\{k\}\\,d\\nu=m\_\{k\}\\ \\forall k\\leq 2t\\big\\\}\.\(B\.2\)In particular, if some nonnegative matchesm0,…,m2tm\_\{0\},\\dots,m\_\{2t\}and has\(B\)=0\\nu\(B\)=0, thenm¯\(2t\)=¯\(2t\)=0\\underline\{m\}\(2t\)=\\overline\{\\rho\}\(2t\)=0and no degree\-2t2tcertificate of\(B\)≥\\mu\(B\)\\geq\\rhoexists\.
###### Proof\.
For any admissiblepp\(degp≤2t\\deg p\\leq 2t,p≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}\) and any admissible , the integral∫pd=∑k≤2tpkmk=E\[p\]\\int p\\,d\\nu=\\sum\_\{k\\leq 2t\}p\_\{k\}m\_\{k\}=\\mdmathbb\{E\}\[p\]is the*same*for all admissible , since it depends on only through the matched moments\. Let⋆attain \([B\.2](https://arxiv.org/html/2606.17215#A2.E2)\)\. Then
E\[p\]=∫pd≤⋆∫𝟏Bd=⋆\(B\)⋆=m¯\(2t\),\\displaystyle\\mdmathbb\{E\}\[p\]\\;=\\;\\int p\\,d\{\}\_\{\\star\}\\;\\leq\\;\\int\\mathbf\{1\}\_\{B\}\\,d\{\}\_\{\\star\}\\;=\\;\{\}\_\{\\star\}\(B\)\\;=\\;\\underline\{m\}\(2t\),usingp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}and≥⋆0\{\}\_\{\\star\}\\geq 0; taking the supremum overppgives the inequality\. The “in particular” is the casem¯\(2t\)=0\\underline\{m\}\(2t\)=0via \([B\.1](https://arxiv.org/html/2606.17215#A2.E1)\)\. Only this direction \(weak duality\) is used; the reverse inequality is a separate tightness question, not load\-bearing here\. ∎
###### Lemma B\.2\(Gauss quadrature empties a root\-free band\)\.
Letu1<⋯<ut\+1u\_\{1\}<\\dots<u\_\{t\+1\}be the real simple zeros of the degree\-\(t\+1\)\(t\{\+\}1\)orthonormal polynomialt\+1of , andRt\+1≔max1≤i≤t\+1\|ui\|R\_\{t\+1\}\\coloneqq\\max\_\{1\\leq i\\leq t\+1\}\\lvert u\_\{i\}\\rvert\. If the band avoids them all,\|c\|−s\>Rt\+1\\lvert c\\rvert\-\\theta s\>R\_\{t\+1\}, thenm¯\(2t\)=0\\underline\{m\}\(2t\)=0; hence, by[Lemma˜B\.1](https://arxiv.org/html/2606.17215#A2.Thmtheorem1), no degree\-2t2tcertificate of\(B\)≥\\mu\(B\)\\geq\\rhoexists\.
###### Proof\.
Let=GQ∑iuii\{\}\_\{\\mathrm\{GQ\}\}=\\sum\_\{i\}\{\}\_\{i\}\{\}\_\{u\_\{i\}\}be the Gauss measure with Christoffel weights\>i0\{\}\_\{i\}\>0\. By[Lemma˜4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)\(ii\) it is nonnegative, atomic on thet\+1t\{\+\}1nodes, and exact on degrees≤2t\+1\\leq 2t\+1; in particular it matchesm0,…,m2tm\_\{0\},\\dots,m\_\{2t\}and is admissible in \([B\.2](https://arxiv.org/html/2606.17215#A2.E2)\)\. Therefore\(B\)GQ=∑i:ui∈Bi\{\}\_\{\\mathrm\{GQ\}\}\(B\)=\\sum\_\{i:\\,u\_\{i\}\\in B\}\{\}\_\{i\}, which is0*iff no node lies inBB*\. A node can lie inB=\[c−s,c\+s\]B=\[c\-\\theta s,c\+\\theta s\]only if\|ui\|≥\|c\|−s\\lvert u\_\{i\}\\rvert\\geq\\lvert c\\rvert\-\\theta s, so\|c\|−s\>Rt\+1\\lvert c\\rvert\-\\theta s\>R\_\{t\+1\}leavesBBnode\-free and\(B\)GQ=0\{\}\_\{\\mathrm\{GQ\}\}\(B\)=0; asGQis admissible in \([B\.2](https://arxiv.org/html/2606.17215#A2.E2)\),m¯\(2t\)=0\\underline\{m\}\(2t\)=0\. ∎
See[4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)
###### Proof\.
Fix one directionww; a lower bound needs only one bad pair\(w,c\)\(w,c\), so no uniformity\-over\-wwissue arises\. Apply the setup above to=\\mu=the centered log\-concave law ofU=w⋅xU=w\\cdot x\(standard deviationss; all moments finite and the law of infinite support, as log\-concave laws are sub\-exponential\), with bandB=\[c−s,c\+s\]B=\[c\-\\theta s,c\+\\theta s\]\. By[Lemmas˜B\.1](https://arxiv.org/html/2606.17215#A2.Thmtheorem1)and[B\.2](https://arxiv.org/html/2606.17215#A2.Thmtheorem2), a degree\-2t2tcertificate of\>0\\rho\>0forcesBBto*contain*a zero oft\+1, i\.e\.Rt\+1≥\|c\|−sR\_\{t\+1\}\\geq\\lvert c\\rvert\-\\theta s\. It remains to boundRt\+1R\_\{t\+1\}in each marginal class, the one place the marginal’s tail enters\.
*Sub\-Gaussian marginal\.*By[Lemma˜4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15), ane−\|u/s\|2e^\{\-\\lvert u/s\\rvert^\{2\}\}tail \(exactly Hermite for the Gaussian\) hasRt\+1≤C1st\+1R\_\{t\+1\}\\leq C\_\{1\}s\\sqrt\{t\+1\}\. Combined withRt\+1≥\|c\|−sR\_\{t\+1\}\\geq\\lvert c\\rvert\-\\theta sthis forcest\+1≥\(\|c\|/s−\)/C1\\sqrt\{t\+1\}\\geq\(\\lvert c\\rvert/s\-\\theta\)/C\_\{1\}, i\.e\.2t=\(\(\|c\|/s\)2\)2t=\\Omega\\big\(\(\\lvert c\\rvert/s\)^\{2\}\\big\)for\|c\|/s≥2\\lvert c\\rvert/s\\geq 2\\theta, the first half of \([4\.1](https://arxiv.org/html/2606.17215#S4.E1)\)\.
*Sub\-exponential marginal\.*By[Lemma˜4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15), ane−\|u/s\|e^\{\-\\lvert u/s\\rvert\}tail \(Freud exponent=1\\alpha=1, e\.g\. Laplace\) hasRt\+1≤C1′stR\_\{t\+1\}\\leq C\_\{1\}^\{\\prime\}s\\,t, soRt\+1≥\|c\|−sR\_\{t\+1\}\\geq\\lvert c\\rvert\-\\theta sforces2t=\(\|c\|/s\)2t=\\Omega\(\\lvert c\\rvert/s\), the second half of \([4\.1](https://arxiv.org/html/2606.17215#S4.E1)\)\. The power differs, but the dichotomy below shows both feed the samelog\(1/\)\\log\(1/\\varepsilon\)\.
*The margin–degree dichotomy\.*A certificate learner with inlier error must certify a dense pancake around all but a fraction of the clean sample projections; the worst center is the \-quantile\|c⋆\|\\lvert c^\{\\star\}\\rvert\. By the log\-concave tail bounds of[Lemma˜4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16),\|c⋆\|=\(slog\(1/\)\)\\lvert c^\{\\star\}\\rvert=\\Theta\(s\\sqrt\{\\log\(1/\\beta\)\}\)in the sub\-Gaussian case and\|c⋆\|=\(slog\(1/\)\)\\lvert c^\{\\star\}\\rvert=\\Theta\(s\\log\(1/\\beta\)\)in the sub\-exponential case\. Substituting into \([4\.1](https://arxiv.org/html/2606.17215#S4.E1)\):
sub\-Gaussian:2t=\(\(\|c⋆\|/s\)2\)=\(log\(1/\)\),\\displaystyle 2t=\\Omega\\big\(\(\\lvert c^\{\\star\}\\rvert/s\)^\{2\}\\big\)=\\Omega\(\\log\(1/\\varepsilon\)\),sub\-exponential:2t=\(\|c⋆\|/s\)=\(log\(1/\)\),\\displaystyle 2t=\\Omega\(\\lvert c^\{\\star\}\\rvert/s\)=\\Omega\(\\log\(1/\\varepsilon\)\),the same floort=\(log\(1/\)\)t=\\Omega\(\\log\(1/\\varepsilon\)\)at fixed standardized half\-width=\(s\)\\tau=\\Theta\(s\)\. The only way out is to widen \. Under near\-isotropy \(∥w∥≤1/\\lVert w\\rVert\\leq 1/\\gamma,s=\(1\)s=\\Theta\(1\)\) the geometric correspondence of[Lemma˜4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16)turns a standardized half\-width=\(s\)\\tau=\\Theta\(s\)into a physical margin=\(d\)\\gamma=\\Theta\(\\tau\\sqrt\{d\}\), so escaping the degree floor requires=\(log\(1/\)/d\)\\gamma=\\Omega\(\\sqrt\{\\log\(1/\\varepsilon\)\}/\\sqrt\{d\}\)\. This is the dichotomy \([4\.2](https://arxiv.org/html/2606.17215#S4.E2)\): degree\(log\(1/\)\)\\Omega\(\\log\(1/\\varepsilon\)\)or margin\(log\(1/\)/d\)\\Omega\(\\sqrt\{\\log\(1/\\varepsilon\)\}/\\sqrt\{d\}\)\.
∎
### B\.2Minorant reduction and Gauss quadrature
See[4\.14](https://arxiv.org/html/2606.17215#S4.Thmtheorem14)
###### Proof\.
\(i\) For the lower\-bound direction: ifp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}with a degree\-2t2tSoS proof from the moment axioms that∫pd≥\\int p\\,d\\nu\\geq\\rho, then applying that proof to the genuine pseudo\-expectation induced by \(which is a valid degree\-2t2tpseudo\-expectation, as has all moments\) and using soundness givesPr\[U∈B\]=E\[𝟏B\]≥E\[p\]≥\\Pr\[U\\in B\]=\\mdmathbb\{E\}\[\\mathbf\{1\}\_\{B\}\]\\geq\\mdmathbb\{E\}\[p\]\\geq\\rho\. Conversely, any degree\-2t2tmoment\-only Positivstellensatz lower bound onPr\[U∈B\]\\Pr\[U\\in B\]produces such a minorantppas the polynomial certificate it manipulates: the certificate cannot evaluate𝟏B\\mathbf\{1\}\_\{B\}itself, so it must bound it below by a degree\-≤2t\\leq 2tpolynomial and argue about that polynomial’s moment integral\. This is the standard indicator\-sandwich characterization of moment lower bounds\.
\(ii\) The zeros oft\+1are real and simple, the Christoffel weights are positive, and the\(t\+1\)\(t\{\+\}1\)\-point Gauss rule integrates every polynomial of degree≤2t\+1\\leq 2t\+1exactly against ; these are the classical properties of Gaussian quadrature\. In particular∫gd=GQ∫gd\\int g\\,d\{\}\_\{\\mathrm\{GQ\}\}=\\int g\\,d\\mufordegg≤2t\+1\\deg g\\leq 2t\+1, so the momentsm0,…,m2tm\_\{0\},\\dots,m\_\{2t\}are matched, and≥GQ0\{\}\_\{\\mathrm\{GQ\}\}\\geq 0is supported on exactly thet\+1t\{\+\}1nodes\. ∎
### B\.3The largest\-root bound
See[4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15)
###### Proof\.
The zeros oft\+1are exactly the eigenvalues of the\(t\+1\)×\(t\+1\)\(t\{\+\}1\)\\times\(t\{\+\}1\)Jacobi \(symmetric tridiagonal\) truncationJt\+1J\_\{t\+1\}of multiplication\-by\-uuin the orthonormal basis\. Writing the three\-term recurrence as
u=kbk\+k\+1ak\+kbk−1,k−1ak=∫udk2,bk=∫udkk\+1\>0,\\displaystyle u\\,\{\}\_\{k\}\\;=\\;b\_\{k\}\\,\{\}\_\{k\+1\}\+a\_\{k\}\\,\{\}\_\{k\}\+b\_\{k\-1\}\\,\{\}\_\{k\-1\},\\qquad a\_\{k\}=\\int u\\,\{\}\_\{k\}^\{2\}\\,d\\mu,\\quad b\_\{k\}=\\int u\\,\{\}\_\{k\}\{\}\_\{k\+1\}\\,d\\mu\>0,\(B\.3\)the matrixJt\+1J\_\{t\+1\}has diagonal\(a0,…,at\)\(a\_\{0\},\\dots,a\_\{t\}\)and off\-diagonal\(b0,…,bt−1\)\(b\_\{0\},\\dots,b\_\{t\-1\}\)\. By Gershgorin’s theorem, every eigenvalue lies within\|ak\|\+bk\+bk−1\\lvert a\_\{k\}\\rvert\+b\_\{k\}\+b\_\{k\-1\}of some diagonal entryaka\_\{k\}, so
Rt\+1=maxi\|ui\|≤maxk\(\|ak\|\+bk\+bk−1\)\.\\displaystyle R\_\{t\+1\}\\;=\\;\\max\_\{i\}\\lvert u\_\{i\}\\rvert\\;\\leq\\;\\max\_\{k\}\\big\(\\lvert a\_\{k\}\\rvert\+b\_\{k\}\+b\_\{k\-1\}\\big\)\.\(B\.4\)It remains to bound the recurrence coefficients in each case\.
*Sub\-Gaussian \(Hermite\)\.*For=𝒩\(0,s2\)\\mu=\\mathcal\{N\}\(0,s^\{2\}\)the orthonormal polynomials are the scaled Hermite polynomials, withak=0a\_\{k\}=0andbk=s\(k\+1\)/2b\_\{k\}=s\\sqrt\{\(k\+1\)/2\}\. Then \([B\.4](https://arxiv.org/html/2606.17215#A2.E4)\) givesRt\+1≤bt\+bt−1≤2s\(t\+1\)/2≤2st\+1R\_\{t\+1\}\\leq b\_\{t\}\+b\_\{t\-1\}\\leq 2s\\sqrt\{\(t\+1\)/2\}\\leq 2s\\sqrt\{t\+1\}\. \(The exact largest Hermite zero iss2\(t\+1\)\(1−o\(1\)\)s\\sqrt\{2\(t\+1\)\}\\,\(1\-o\(1\)\), confirming thet\\sqrt\{t\}scaling\.\) For a general sub\-Gaussian Freud weighte−\|u/s\|2e^\{\-\\lvert u/s\\rvert^\{2\}\}the same conclusion holds with the recurrence coefficients controlled at scalesks\\sqrt\{k\}, givingRt\+1≤C1st\+1R\_\{t\+1\}\\leq C\_\{1\}s\\sqrt\{t\+1\}\.
*Sub\-exponential \(Laplace,=1\\alpha=1\)\.*For a Freud weighte−\|u/s\|e^\{\-\\lvert u/s\\rvert\}the recurrence coefficients scale asak,bk≍ska\_\{k\},b\_\{k\}\\asymp s\\,k, so \([B\.4](https://arxiv.org/html/2606.17215#A2.E4)\) givesRt\+1=O\(st\)R\_\{t\+1\}=O\(s\\,t\)\. The Mhaskar–Rakhmanov–Saff \(MRS\) numberan≍sna\_\{n\}\\asymp s\\,nis the right scale here\.
*General Freud exponent∈\(1,2\)\\alpha\\in\(1,2\)\.*For a convex weighte−\|u/s\|e^\{\-\\lvert u/s\\rvert\}the recurrence coefficients satisfyak,bk=O\(sk1/\)a\_\{k\},b\_\{k\}=O\(s\\,k^\{1/\\alpha\}\), where the genuineO\(st1/\)O\(s\\,t^\{1/\\alpha\}\)scaling \(rather than the cruderO\(st2\)O\(s\\,t^\{2\}\)that a naive telescoping of \([B\.3](https://arxiv.org/html/2606.17215#A2.E3)\) would give\) is the MRS number, supplied by the exponential\-weight asymptotics ofLevinLubinsky01\([Fact˜C\.4](https://arxiv.org/html/2606.17215#A3.Thmtheorem4); see alsoKriecherbauerMcLaughlin99for the Riemann–Hilbert treatment of the zero distribution\)\. This yieldsRt\+1≍st1/R\_\{t\+1\}\\asymp s\\,t^\{1/\\alpha\}\. ∎
### B\.4Log\-concave quantiles and margin geometry
See[4\.16](https://arxiv.org/html/2606.17215#S4.Thmtheorem16)
###### Proof\.
*Quantiles\.*For a centered log\-concave law at scaless, the upper tail satisfiesPr\[\|U\|≥ts\]≤e−\(t\)\\Pr\[\\lvert U\\rvert\\geq ts\]\\leq e^\{\-\\Omega\(t\)\}fort≥1t\\geq 1\(sub\-exponential\), with the strongere−\(t2\)e^\{\-\\Omega\(t^\{2\}\)\}under sub\-Gaussianity; these are the standard one\-dimensional log\-concave tail bounds\(LovaszVempala07\)\. Inverting at level gives the \-quantile\|c⋆\|=\(slog\(1/\)\)\\lvert c^\{\\star\}\\rvert=\\Theta\(s\\log\(1/\\beta\)\)in the sub\-exponential case and\(slog\(1/\)\)\\Theta\(s\\sqrt\{\\log\(1/\\beta\)\}\)in the sub\-Gaussian case\.
*Margin geometry\.*Homogenizex↦\(x,1\)x\\mapsto\(x,1\)so that a halfspace with margin becomes a linear threshold, and normalize∥w∥≤1/\\lVert w\\rVert\\leq 1/\\gamma\. Under⪯I2\\Sigma\\preceq\{\}^\{2\}Iwith=1/d\\sigma=1/\\sqrt\{d\}\(near\-isotropy,s=\(1\)s=\\Theta\(1\)\), a standardized half\-width=\(s\)\\tau=\\Theta\(s\)in the projected coordinateU=w⋅xU=w\\cdot xcorresponds to a physical band of width\(\)\\Theta\(\\tau\)scaled by∥w∥\\lVert w\\rVert, i\.e\. a margin=\(d\)\\gamma=\\Theta\(\\tau\\sqrt\{d\}\)\. ∎
### B\.5The degree\-2t2treweighted hinge
We first prove the certifiable\-hypercontractivity lemma the algorithm needs, then the algorithm itself\.
See[4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)
###### Proof\.
*Step 1 \(the correct scalar constant\)\.*A centered one\-dimensional log\-concaveZ=w⋅zZ=w\\cdot zhas∥Z∥L2t≤a⋅2t∥Z∥L2\\lVert Z\\rVert\_\{L^\{2t\}\}\\leq a\\cdot 2t\\,\\lVert Z\\rVert\_\{L^\{2\}\}withaaabsolute \(the1/ sub\-exponential moment growth,*linear*in the order, of a log\-concave law;[Fact˜C\.1](https://arxiv.org/html/2606.17215#A3.Thmtheorem1),LovaszVempala07\)\. Hencem2t≤\(C0t\)2t\(s2\)tm\_\{2t\}\\leq\(C\_\{0\}t\)^\{2t\}\(s^\{2\}\)^\{t\}withC0=2aC\_\{0\}=2a\. This constant is tight in the worst case: for the Laplace law,m2t/m2t=\(2t\)\!/2t=\(\(t\)\)2tm\_\{2t\}/m\_\{2\}^\{t\}=\(2t\)\!/2^\{t\}=\(\\Theta\(t\)\)^\{2t\}, so*no*\(Ct\)t\(Ct\)^\{t\}bound can hold for alltt\. This is why the sub\-exponential constant\(Ct\)2t\(Ct\)^\{2t\}, and not the sub\-Gaussian\(Ct\)t\(Ct\)^\{t\}, is the correct one for general log\-concave marginals; it is consistent with the=1\\alpha=1root scaling in[Lemma˜4\.15](https://arxiv.org/html/2606.17215#S4.Thmtheorem15)\.
*Step 2 \(single\-component SoS certificate\)\.*For a single component𝒟j\\mathcal\{D\}\_\{j\}, the tensor inequality
⟨Mj\(2t\),w⊗2t⟩≤\(C0t\)2t\(w⊤wj\)t\\langle M\_\{j\}^\{\(2t\)\},w^\{\\otimes 2t\}\\rangle\\leq\(C\_\{0\}t\)^\{2t\}\(w^\{\\top\}\{\}\_\{j\}w\)^\{t\}is degree\-2t2tSoS\-certifiable inww, the right\-hand side being att\-th power of a PSD quadratic and hence SoS\. For sub\-Gaussian components this certificate is given byKlivansKothariMeka18;DiakonikolasHopkinsPensiaTiegel24\([Fact˜C\.5](https://arxiv.org/html/2606.17215#A3.Thmtheorem5)\); for general log\-concave components we use the scalar→\\toSoS lift \([Remark˜B\.8](https://arxiv.org/html/2606.17215#A2.Thmtheorem8)below\), the SoS Cauchy–Schwarz/Hölder pyramid that bootstraps the scalar bound of Step 1 through the certified moment tensorsM\(2k\)M^\{\(2k\)\},k≤tk\\leq t\.
*Step 3 \(the mixture, via a binomial split\)\.*The mixture is not log\-concave, so we split each component contribution\. WithZj=w⋅\(x−\)jZ\_\{j\}=w\\cdot\(x\-\{\}\_\{j\}\)and the bounded spread⋅jw\{\}\_\{j\}\\cdot wwhere=j−j¯\{\}\_\{j\}=\{\}\_\{j\}\-\\bar\{\\mu\}\(so\|⋅jw\|≤r\\lvert\{\}\_\{j\}\\cdot w\\rvert\\leq rfor∥w∥=1\\lVert w\\rVert=1\),
\(w⋅\(x−¯\)\)2t=\(Zj\+⋅jw\)2t≤22t−1\(Zj2t\+\(⋅jw\)2t\),\\displaystyle\\big\(w\\cdot\(x\-\\bar\{\\mu\}\)\\big\)^\{2t\}\\;=\\;\(Z\_\{j\}\+\{\}\_\{j\}\\cdot w\)^\{2t\}\\;\\leq\\;2^\{2t\-1\}\\big\(Z\_\{j\}^\{2t\}\+\(\{\}\_\{j\}\\cdot w\)^\{2t\}\\big\),\(B\.5\)by convexity ofu↦u2tu\\mapsto u^\{2t\}\. The gap in \([B\.5](https://arxiv.org/html/2606.17215#A2.E5)\) is, for fixedjj, a univariate nonnegative polynomial in the linear forms together with a positive multiple of the certified tensor term, hence \([B\.5](https://arxiv.org/html/2606.17215#A2.E5)\) is an SoS inequality inww\.
*Step 4 \(averaging preserves SoS\)\.*Averaging \([B\.5](https://arxiv.org/html/2606.17215#A2.E5)\) over the components, and using that a nonnegative combination of SoS proofs is SoS,
E𝒟X\[\(w⋅\(x−¯\)\)2t\]≤22t−11K∑j\(E𝒟j\[Zj2t\]\+\(⋅jw\)2t\)=22t−1\(\(w\)t\+\(w\)t\),\\displaystyle\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{X\}\}\\big\[\(w\\cdot\(x\-\\bar\{\\mu\}\)\)^\{2t\}\\big\]\\;\\leq\\;2^\{2t\-1\}\\,\\frac\{1\}\{K\}\\sum\_\{j\}\\Big\(\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{j\}\}\[Z\_\{j\}^\{2t\}\]\+\(\{\}\_\{j\}\\cdot w\)^\{2t\}\\Big\)\\;=\\;2^\{2t\-1\}\\big\(\{\}\_\{t\}\(w\)\+\{\}\_\{t\}\(w\)\\big\),with the within\-component term\(w\)t=1K∑jE𝒟j\[Zj2t\]\{\}\_\{t\}\(w\)=\\frac\{1\}\{K\}\\sum\_\{j\}\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{j\}\}\[Z\_\{j\}^\{2t\}\]and the spread term\(w\)t=1K∑j\(⋅jw\)2t\{\}\_\{t\}\(w\)=\\frac\{1\}\{K\}\\sum\_\{j\}\(\{\}\_\{j\}\\cdot w\)^\{2t\}\. The total\-variance identity
¯2\(w\)=1K∑jw⊤wj\+1K∑j\(⋅jw\)2\\displaystyle\\bar\{\\sigma\}^\{2\}\(w\)\\;=\\;\\frac\{1\}\{K\}\\sum\_\{j\}w^\{\\top\}\{\}\_\{j\}w\+\\frac\{1\}\{K\}\\sum\_\{j\}\(\{\}\_\{j\}\\cdot w\)^\{2\}\(B\.6\)bounds both terms by¯2\(w\)t\\bar\{\\sigma\}^\{2\}\(w\)^\{t\}: the term\(w\)t\{\}\_\{t\}\(w\)by Step 2 applied componentwise \(eachE𝒟j\[Zj2t\]≤\(C0t\)2t\(w⊤wj\)t\\mdmathbb\{E\}\_\{\\mathcal\{D\}\_\{j\}\}\[Z\_\{j\}^\{2t\}\]\\leq\(C\_\{0\}t\)^\{2t\}\(w^\{\\top\}\{\}\_\{j\}w\)^\{t\}\) and the PSD\-dominationw⊤wj≤¯2\(w\)⋅O\(1\)w^\{\\top\}\{\}\_\{j\}w\\leq\\bar\{\\sigma\}^\{2\}\(w\)\\cdot O\(1\)supplied by the isotropy constant=O\(1\)\\kappa=O\(1\)in the regimer=O\(d\)r=O\(\\sigma\\sqrt\{d\}\); the term\(w\)t\{\}\_\{t\}\(w\)by\(⋅jw\)2t=\(\(⋅jw\)2\)t≤¯2\(w\)t\(\{\}\_\{j\}\\cdot w\)^\{2t\}=\(\(\{\}\_\{j\}\\cdot w\)^\{2\}\)^\{t\}\\leq\\bar\{\\sigma\}^\{2\}\(w\)^\{t\}from \([B\.6](https://arxiv.org/html/2606.17215#A2.E6)\)\. Assembling, withC=4C0C=4C\_\{0\}\\sqrt\{\\kappa\}, gives the SoS\-certified bound \([4\.7](https://arxiv.org/html/2606.17215#S4.E7)\)\. ∎
See[4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)
###### Proof\.
*TheLSN\\mathrm\{LSN\}bound \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\)\.*Fixwwwith∥w∥≤1\\lVert w\\rVert\\leq 1\. By Hölder’s inequality with conjugate exponents2t2t−1\\tfrac\{2t\}\{2t\-1\}and2t2t,
∑i∈SDqi\|w⋅xi\|≤\(∑i∈SDqi\)1−12t\(∑i∈SDqi\|w⋅xi\|2t\)12t\.\\displaystyle\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\;\\leq\\;\\Big\(\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\Big\)^\{1\-\\frac\{1\}\{2t\}\}\\Big\(\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert^\{2t\}\\Big\)^\{\\frac\{1\}\{2t\}\}\.The first factor is at most\(n\)1−12t\(\\eta n\)^\{1\-\\frac\{1\}\{2t\}\}since∑i∈SDqi≤\|SD\|≤n\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\leq\\lvert S\_\{D\}\\rvert\\leq\\eta n\. For the second,\|w⋅x\|2t=\(w⋅x\)2t\\lvert w\\cdot x\\rvert^\{2t\}=\(w\\cdot x\)^\{2t\}is an even \(hence polynomial\) power, the summand is nonnegative, andSD⊆SS\_\{D\}\\subseteq S, so
∑i∈SDqi\(w⋅xi\)2t≤∑i∈Sqi\(w⋅xi\)2t≤n\(Ct\)2t¯2t\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\ \\leq\\ \\sum\_\{i\\in S\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\ \\leq\\ n\\,\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}by the degree\-2t2tcertificate thatqqsatisfies\. Hence
∑i∈SDqi\|w⋅xi\|≤\(n\)1−12t\(n\(Ct\)2t¯2t\)12t=n\(Ct\)¯,1−12t\\displaystyle\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\;\\leq\\;\(\\eta n\)^\{1\-\\frac\{1\}\{2t\}\}\\big\(n\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}\\big\)^\{\\frac\{1\}\{2t\}\}\\;=\\;n\\,\(Ct\)\\,\\bar\{\\sigma\}\\,\{\}^\{1\-\\frac\{1\}\{2t\}\},and taking the supremum over∥w∥≤1\\lVert w\\rVert\\leq 1gives \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\)\. Att=1t=1this isnC¯1/2n\\,C\\bar\{\\sigma\}\\,\{\}^\{1/2\}, exactlyShen25’s Lemma 6; ast→∞t\\to\\inftythe exponent of tends to11\.
*Feasibility\.*The clean\-indicator weight \(qi=1q\_\{i\}=1onSCS\_\{C\},qi=0q\_\{i\}=0onSDS\_\{D\}, so=\\xi=\\eta\) has the required degree\-2t2tcertificate: it is exactly the SoS bound of[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)applied to the empirical clean measure, which holds with high probability oncen≥dO\(t\)n\\geq d^\{O\(t\)\}\(enough samples for the empirical moment tensors to concentrate around the population tensors that[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)certifies\)\. The program is thus nonempty, and the SDP has sizedO\(t\)d^\{O\(t\)\}and runtimenO\(t\)n^\{O\(t\)\}\.
*The noise tolerance \([4\.5](https://arxiv.org/html/2606.17215#S4.E5)\)\.*Plug \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) into the robustness condition ofShen25\. Along the empirical optimumw^\\hat\{w\}, the clean hinge gradient is lower\-bounded by the dense\-pancake density \(the in\-regime log\-concave band mass alongw^\\hat\{w\}\); the condition for the dirty gradient not to overwhelm it is4\(−2\)\>LSN\(q∘SD\)/n\\frac\{\\gamma\}\{4\}\(\\rho\-2\\xi\)\>\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)/n\. Using \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) this reads4\(−2\)\>\(Ct\)¯1−1/2t\\frac\{\\gamma\}\{4\}\(\\rho\-2\\xi\)\>\(Ct\)\\bar\{\\sigma\}\\,\{\}^\{1\-1/2t\}\. Balancing the two bottlenecks \(the pancake budget , independent oftt, and the outlier term\) solves to
≤\(t\)0=\(min\{4,\(/\(\(Ct\)¯\)\)2t2t−1\}\),\\displaystyle\\eta\\;\\leq\\;\{\}\_\{0\}\(t\)\\;=\\;\\Omega\\Big\(\\min\\Big\\\{\\tfrac\{\\rho\}\{4\},\\,\\big\(\\gamma\\rho/\(\(Ct\)\\bar\{\\sigma\}\)\\big\)^\{\\frac\{2t\}\{2t\-1\}\}\\Big\\\}\\Big\),which is \([4\.5](https://arxiv.org/html/2606.17215#S4.E5)\)\. Under this , theShen25analysis yieldserr𝒟X\(w^\)≤\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\leq\\varepsilon, in timenO\(t\)n^\{O\(t\)\}withn=dO\(t\)poly\(1/\)n=d^\{O\(t\)\}\\operatorname\{poly\}\(1/\\varepsilon\)\. ∎
### B\.6The degree\-2 outlier barrier
We first record the cloak, then prove the barrier\.
See[4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17)
###### Proof\.
Take each𝒞±\\mathcal\{C\}\_\{\\pm\}to be an isotropic log\-concave bump of scale centered at±ae1\\pm ae\_\{1\}witha=/a=\\sigma/\\sqrt\{\\eta\}; a unimodal scale\- density is log\-concave, which is the reason for using two bumps rather than a single±a\\pm aatom \(the latter would be bimodal, hence not log\-concave\)\. The mean offset is∥∥±=/\\lVert\{\}\_\{\\pm\}\\rVert=\\sigma/\\sqrt\{\\eta\}, which is≤r\\leq rwithin the certified regimer=O\(d\)r=O\(\\sigma\\sqrt\{d\}\)exactly when≥\(1/d\)\\eta\\geq\\Omega\(1/d\)\. Draw2n2\\eta nclean andn\\eta ndirty points from the same𝒞±\\mathcal\{C\}\_\{\\pm\}, so the combined cluster has≈3n\\approx 3\\eta npoints near±ae1\\pm ae\_\{1\}\. The empirical second moment in thee1e\_\{1\}direction over the dirty points is1n∑SD\(e1⋅xi\)2≈⋅a2=⋅/2=2\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{2\}\\approx\\eta\\cdot a^\{2\}=\\eta\\cdot\{\}^\{2\}/\\eta=\{\}^\{2\}, so the cluster spends exactly thee1e\_\{1\}\-variance budget\. Since clean and dirty points are drawn from the identical law at identical locations, the multiset\{xi\}\\\{x\_\{i\}\\\}is invariant under any permutation that exchanges dirty and clean cluster points\. ∎
See[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)
###### Proof\.
Use the construction of[Lemma˜4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17): two log\-concave clusters𝒞±\\mathcal\{C\}\_\{\\pm\}at means±ae1\\pm ae\_\{1\},a=/a=\\sigma/\\sqrt\{\\eta\}, with2n2\\eta nclean andn\\eta ndirty points drawn from the same law\. The adversary’s only freedom is*which*of the≈3n\\approx 3\\eta ncluster points are labeled dirty\.
*The cloak\.*By[Lemma˜4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17), \(a\) the dirty points are*second\-moment invisible*:1n∑SD\(e1⋅xi\)2=a2=2\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{2\}=\\eta a^\{2\}=\{\}^\{2\}, which is within the directional\-variance budget¯2\\bar\{\\sigma\}^\{2\}and so does not flag them; and \(b\) they are*first\-moment aligned*:∑SD\|e1⋅xi\|=n⋅a=n⋅/=n\\sum\_\{S\_\{D\}\}\\lvert e\_\{1\}\\cdot x\_\{i\}\\rvert=\\eta n\\cdot a=\\eta n\\cdot\\sigma/\\sqrt\{\\eta\}=\\sqrt\{\\eta\}\\,n\\sigma, which is exactly theLSN\\mathrm\{LSN\}mass the program must somehow remove\.
*The label\-oblivious lower bound\.*The literal “for all feasibleqq” claim is*false*once the budget≥\\xi\\geq\\eta: the feasibility polytope then contains the weightqqthat zeros all ofSDS\_\{D\}\(and indeed the whole spike\), which costs at most3n≤n3\\eta n\\leq\\xi nremoved points if≥3\\xi\\geq 3\\eta, and second\-moment feasibility is unaffected\. So a budget\-aware adversary cannot force*every*feasibleqqto keep the spike\. The correct statement is in expectation over which cluster points are dirty\. The degree\-22variance constraints depend only on the multiset\{xi\}\\\{x\_\{i\}\\\}, which by[Lemma˜4\.17](https://arxiv.org/html/2606.17215#S4.Thmtheorem17)is invariant under exchanging dirty and clean cluster points; hence a*label\-oblivious*program \(one not given the ground\-truth corruption labels\) must treat all≈3n\\approx 3\\eta ncluster points identically\. It removes at mostn=n\\xi n=\\kappa\\eta npoints total, so the combined cluster retains weight≥\(3−\)n\\geq\(3\-\\kappa\)\\eta n\. By symmetry, in expectation over whichn\\eta nof the3n3\\eta ncluster points are dirty, the dirty share of the retained weight is13\\tfrac\{1\}\{3\}, so the expected retained dirty weight is≥13\(3−\)n≥13n\\geq\\tfrac\{1\}\{3\}\(3\-\\kappa\)\\eta n\\geq\\tfrac\{1\}\{3\}\\eta nfor≤2\\kappa\\leq 2\. Each retained dirty point contributes\|e1⋅xi\|≈a\\lvert e\_\{1\}\\cdot x\_\{i\}\\rvert\\approx atoLSN\\mathrm\{LSN\}alongw=e1w=e\_\{1\}, so
E\[LSN\(q∘SD\)\]≥13n⋅a=13n⋅=13n=\(n1/2¯\),\\displaystyle\\mdmathbb\{E\}\\big\[\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\big\]\\;\\geq\\;\\tfrac\{1\}\{3\}\\eta n\\cdot a\\;=\\;\\tfrac\{1\}\{3\}\\eta n\\cdot\\frac\{\\sigma\}\{\\sqrt\{\\eta\}\}\\;=\\;\\tfrac\{1\}\{3\}\\sqrt\{\\eta\}\\,n\\sigma\\;=\\;\\Omega\\big\(\{\}^\{1/2\}n\\bar\{\\sigma\}\\big\),which is \([4\.3](https://arxiv.org/html/2606.17215#S4.E3)\) \(using¯=\(\)\\bar\{\\sigma\}=\\Theta\(\\sigma\)\)\. Comparing with the1\-1/2trate of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)att=1t=1shows degree22is stuck at1/2, i\.e\.\(/¯\)20\{\}\_\{0\}\\lesssim\(\\gamma\\rho/\\bar\{\\sigma\}\)^\{2\}\.
*Degree44escapes\.*The sameSDS\_\{D\}has an*inflated*empirical fourth moment in thee1e\_\{1\}direction:1n∑SD\(e1⋅xi\)4≈⋅a4=⋅/4=2/4≫¯4\\frac\{1\}\{n\}\\sum\_\{S\_\{D\}\}\(e\_\{1\}\\cdot x\_\{i\}\)^\{4\}\\approx\\eta\\cdot a^\{4\}=\\eta\\cdot\{\}^\{4\}/\{\}^\{2\}=\{\}^\{4\}/\\eta\\gg\\bar\{\\sigma\}^\{4\}\. A degree\-44certificate sees this; degree\-44feasibility forces the reweighted fourth moment down, which by thet=2t=2instance of \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\) drivesLSN\(q∘SD\)/n≤C′¯<3/4C′¯1/2\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)/n\\leq C^\{\\prime\}\\bar\{\\sigma\}\\,\{\}^\{3/4\}<C^\{\\prime\}\\bar\{\\sigma\}\\,\{\}^\{1/2\}\. So the same instance defeats degree22but not degree44, and the separation is exactly one of moment degree\. ∎
### B\.7The breakdown floor
See[4\.11](https://arxiv.org/html/2606.17215#S4.Thmtheorem11)
###### Proof\.
Le Cam two\-point\. Consider an isotropic log\-concave instance \(K=1K=1,=I2\\Sigma=\{\}^\{2\}I\) and two halfspacesw1⋆,w2⋆w\_\{1\}^\{\\star\},w\_\{2\}^\{\\star\}at angle ; letP1,P2P\_\{1\},P\_\{2\}be the two clean joint laws they induce\. The exact Kearns–Li malicious envelope \([Fact˜C\.2](https://arxiv.org/html/2606.17215#A3.Thmtheorem2),KearnsLi93\) bounds the total variation between the two clean distributions an \-malicious adversary can induce:TV\(P1,P2\)≤/\(1−\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)\\leq\\eta/\(1\-\\eta\)\. Therefore the adversary can make the two instances information\-theoretically indistinguishable whenever the disagreement probability satisfies
=Pr\[sgn\(w1⋆⋅x\)≠sgn\(w2⋆⋅x\)\]≤1−,\\displaystyle\\frac\{\\theta\}\{\\pi\}\\;=\\;\\Pr\\big\[\\operatorname\{sgn\}\(w\_\{1\}^\{\\star\}\\\!\\cdot x\)\\neq\\operatorname\{sgn\}\(w\_\{2\}^\{\\star\}\\\!\\cdot x\)\\big\]\\;\\leq\\;\\frac\{\\eta\}\{1\-\\eta\},\(B\.7\)where the left equality is the random\-hyperplane disagreement identity for a spherically symmetric \(here isotropic log\-concave\) marginal\. Pick at the boundary of \([B\.7](https://arxiv.org/html/2606.17215#A2.E7)\),=/\(1−\)\\theta=\\pi\\eta/\(1\-\\eta\)\. Any single outputw^\\hat\{w\}disagrees with at least one ofw1⋆,w2⋆w\_\{1\}^\{\\star\},w\_\{2\}^\{\\star\}on at least half of their mutual disagreement region, so it errs by at least/\(2\)\\theta/\(2\\pi\)on that instance, giving
err𝒟X\(w^\)≥2=12⋅1−=2\(1−\)≥2\.\\displaystyle\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\;\\geq\\;\\frac\{\\theta\}\{2\\pi\}\\;=\\;\\frac\{1\}\{2\}\\cdot\\frac\{\\eta\}\{1\-\\eta\}\\;=\\;\\frac\{\\eta\}\{2\(1\-\\eta\)\}\\;\\geq\\;\\frac\{\\eta\}\{2\}\.The covariance control⪯I2\\Sigma\\preceq\{\}^\{2\}Ifor the \(halfspace\-truncated\) two\-point instances uses that halfspace truncation does not increase covariance for log\-concave laws \([Fact˜C\.3](https://arxiv.org/html/2606.17215#A3.Thmtheorem3),BrascampLieb76\)\. ∎
## Appendix CAuxiliary results
For self\-containedness we collect, in one place, every external result the paper relies on, each with full hypotheses, conclusion, and citation, and a one\-line proof or pointer where it is short\.[Section˜C\.1](https://arxiv.org/html/2606.17215#A3.SS1)gathers the robust\-learning and log\-concave inputs;[Section˜C\.2](https://arxiv.org/html/2606.17215#A3.SS2)gathers the orthogonal\-polynomial and moment\-problem facts used in[Section˜3](https://arxiv.org/html/2606.17215#S3)\. The labels are the ones cited from the main text, so each cross\-reference there resolves to its statement here\.
### C\.1Robust learning and log\-concave inputs
###### Fact C\.1\(Log\-concave moment growth;LovaszVempala07\)\.
There is an absolute constantC0C\_\{0\}such that every centered one\-dimensional log\-concave random variableZZwithE\[Z2\]=s2\\mdmathbb\{E\}\[Z^\{2\}\]=s^\{2\}satisfiesE\[\|Z\|p\]1/p≤C0ps\\mdmathbb\{E\}\[\\lvert Z\\rvert^\{p\}\]^\{1/p\}\\leq C\_\{0\}\\,p\\,sfor allp≥2p\\geq 2; in particularE\[Z2t\]≤\(C0t\)2ts2t\\mdmathbb\{E\}\[Z^\{2t\}\]\\leq\(C\_\{0\}t\)^\{2t\}s^\{2t\}\. The growth is linear inpp\(sub\-exponential\), and the exponent is tight: for the Laplace law,E\[Z2t\]/\(E\[Z2\]\)t=\(2t\)\!/2t=\(\(t\)\)2t\\mdmathbb\{E\}\[Z^\{2t\}\]/\(\\mdmathbb\{E\}\[Z^\{2\}\]\)^\{t\}=\(2t\)\!/2^\{t\}=\(\\Theta\(t\)\)^\{2t\}\.
###### Fact C\.2\(Malicious indistinguishability;KearnsLi93\)\.
Under malicious noise at rate , two clean modelsP1,P2P\_\{1\},P\_\{2\}admit a common corrupted law \(hence are information\-theoretically indistinguishable\) for some adversary strategy if and only ifTV\(P1,P2\)≤/\(1−\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)\\leq\\eta/\(1\-\\eta\)\.
###### Proof\.
The corrupted law of modeliiis\(1−\)Pi\+Ai\(1\-\\eta\)P\_\{i\}\+\\eta A\_\{i\}for an arbitrary noise distributionAiA\_\{i\}, so a common corrupted law exists iff\(1−\)P1\+A1=\(1−\)P2\+A2\(1\-\\eta\)P\_\{1\}\+\\eta A\_\{1\}=\(1\-\\eta\)P\_\{2\}\+\\eta A\_\{2\}is solvable in distributionsA1,A2A\_\{1\},A\_\{2\}\. Rearranging,\(A1−A2\)=\(1−\)\(P2−P1\)\\eta\(A\_\{1\}\-A\_\{2\}\)=\(1\-\\eta\)\(P\_\{2\}\-P\_\{1\}\), and a signed measure\(1−\)\(P2−P1\)\(1\-\\eta\)\(P\_\{2\}\-P\_\{1\}\)of total variation\(1−\)TV\(P1,P2\)\(1\-\\eta\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)is the \-scaled difference of two probability measures iff its total mass is zero \(it is\) and its variation is at most \. Hence solvability holds iff\(1−\)TV\(P1,P2\)≤\(1\-\\eta\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)\\leq\\eta, i\.e\.TV\(P1,P2\)≤/\(1−\)\\mathrm\{TV\}\(P\_\{1\},P\_\{2\}\)\\leq\\eta/\(1\-\\eta\)\. ∎
###### Fact C\.3\(Truncation does not increase covariance;BrascampLieb76\)\.
If is log\-concave onRd\\mdmathbb\{R\}^\{d\}andK⊆RdK\\subseteq\\mdmathbb\{R\}^\{d\}is convex with\(K\)\>0\\mu\(K\)\>0, then the conditional law\(⋅∣K\)\\mu\(\\cdot\\mid K\)is log\-concave andCov\(\(⋅∣K\)\)⪯Cov\(\)\\operatorname\{Cov\}\\big\(\\mu\(\\cdot\\mid K\)\\big\)\\preceq\\operatorname\{Cov\}\(\\mu\)\.
###### Fact C\.4\(Largest zero of Freud orthogonal polynomials;LevinLubinsky01;KriecherbauerMcLaughlin99\)\.
For the weightW\(u\)=e−\|u/s\|W\(u\)=e^\{\-\\lvert u/s\\rvert\}with\>0\\alpha\>0, the largest zero of the degree\-nnorthonormal polynomial equals\(1\+o\(1\)\)an\(1\+o\(1\)\)\\,a\_\{n\}, where the Mhaskar–Rakhmanov–Saff \(MRS\) number satisfiesan≍sn1/a\_\{n\}\\asymp s\\,n^\{1/\\alpha\}\. In particular=2\\alpha=2gives\(sn\)\\Theta\(s\\sqrt\{n\}\)and=1\\alpha=1gives\(sn\)\\Theta\(s\\,n\)\.
###### Fact C\.5\(Certifiable hypercontractivity;KlivansKothariMeka18;DiakonikolasHopkinsPensiaTiegel24\)\.
For a centered sub\-Gaussian law there is a degree\-2t2tSoS proof, in the indeterminateww, ofE\[\(w⋅z\)2t\]≤\(Ct\)t\(w⊤w\)t\\mdmathbb\{E\}\[\(w\\cdot z\)^\{2t\}\]\\leq\(Ct\)^\{t\}\\,\(w^\{\\top\}\\Sigma w\)^\{t\}\. For a centered sub\-exponential \(general log\-concave\) law the analogous degree\-2t2tSoS proof holds with the weaker constant\(Ct\)2t\(Ct\)^\{2t\}\([Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)and its sub\-lemma\)\.
### C\.2Orthogonal polynomials and the moment problem
Throughout, is a measure onR\\mdmathbb\{R\}with all momentsmk=∫ukdm\_\{k\}=\\int u^\{k\}\\,d\\muand infinite support,\{\}kk≥0\\\{\{\}\_\{k\}\\\}\_\{k\\geq 0\}are its orthonormal polynomials,\(x\)n=\(∑k<n\(x\)2k\)−1\{\}\_\{n\}\(x\)=\\big\(\\sum\_\{k<n\}\{\}\_\{k\}\(x\)^\{2\}\\big\)^\{\-1\}is its order\-nnChristoffel function, andℳ2t\(\)=\{≥0:∫ukd=mk,0≤k≤2t\}\\mathcal\{M\}\_\{2t\}\(\\mu\)=\\\{\\nu\\geq 0:\\int u^\{k\}\\,d\\nu=m\_\{k\},\\ 0\\leq k\\leq 2t\\\}is the moment variety of nonnegative measures matching the first2t2tmoments\.
###### Fact C\.6\(Gauss quadrature;Szego39;Gautschi04\)\.
The\(t\+1\)\(t\{\+\}1\)\-node Gauss rule for is exact onR\[u\]≤2t\+1\\mdmathbb\{R\}\[u\]\_\{\\leq 2t\+1\}: there are nodesu1<⋯<ut\+1u\_\{1\}<\\dots<u\_\{t\+1\}and weights,1…,t\+1\{\}\_\{1\},\\dots,\{\}\_\{t\+1\}with∫gd=∑i=1t\+1gi\(ui\)\\int g\\,d\\mu=\\sum\_\{i=1\}^\{t\+1\}\{\}\_\{i\}\\,g\(u\_\{i\}\)for every polynomialggof degree≤2t\+1\\leq 2t\+1\. The nodes are exactly the zeros oft\+1\(all real and simple\), and the Christoffel weights are positive,=i\(ui\)t\+1\>0\{\}\_\{i\}=\{\}\_\{t\+1\}\(u\_\{i\}\)\>0\. In particular=GQ∑i∈iuiℳ2t\(\)\{\}\_\{\\mathrm\{GQ\}\}=\\sum\_\{i\}\{\}\_\{i\}\{\}\_\{u\_\{i\}\}\\in\\mathcal\{M\}\_\{2t\}\(\\mu\)\.
###### Fact C\.7\(Christoffel function as maximal mass;Nevai86;DetteStudden97\)\.
For everyx0∈Rx\_\{0\}\\in\\mdmathbb\{R\},\(x0\)t\+1=min\{∫p2d:degp≤t,p\(x0\)=1\}=max\{\(\{x0\}\):∈ℳ2t\(\)\};\{\}\_\{t\+1\}\(x\_\{0\}\)=\\min\\\{\\int p^\{2\}\\,d\\mu:\\ \\deg p\\leq t,\\ p\(x\_\{0\}\)=1\\\}=\\max\\\{\\nu\(\\\{x\_\{0\}\\\}\):\\ \\nu\\in\\mathcal\{M\}\_\{2t\}\(\\mu\)\\\};i\.e\.\(x0\)t\+1\{\}\_\{t\+1\}\(x\_\{0\}\)is the largest atom a measure with ’s first2t2tmoments may carry atx0x\_\{0\}\. This is[Fact˜3\.2](https://arxiv.org/html/2606.17215#S3.Thmtheorem2), proved in[Section˜3\.2](https://arxiv.org/html/2606.17215#S3.SS2); restated here for the collection\.
###### Fact C\.8\(Christoffel\-function bulk/edge asymptotics;MateNevaiTotik91;Totik00\)\.
For measures regular in the bulk \(in particular Freud weightse−\|u/s\|e^\{\-\\lvert u/s\\rvert\},≥1\\alpha\\geq 1, on their support\),n\(x\)n→d/d\(x\)n\\,\{\}\_\{n\}\(x\)\\to\\mathrm\{d\}\\mu/\\mathrm\{d\}\\omega\\,\(x\)asn→∞n\\to\\inftyat every bulk pointxx, where is the equilibrium \(arcsine\-type\) measure of the support; the Christoffel function at ordernnthus resolves mass on the scale1/n1/nin the bulk and degrades toward the spectral edgex≍anx\\asymp a\_\{n\}\.
###### Fact C\.9\(Truncated moment problem and the moment cone;CurtoFialkow91;DetteStudden97\)\.
The set of moment sequences\(m0,…,m2t\)\(m\_\{0\},\\dots,m\_\{2t\}\)realizable by some nonnegative measure onR\\mdmathbb\{R\}is a closed convex cone, cut out by the positive\-semidefiniteness of the Hankel matrix\(mi\+j\)0≤i,j≤t\(m\_\{i\+j\}\)\_\{0\\leq i,j\\leq t\}; a sequence on its boundary is realized by a unique finitely\-supported measure \(a*principal representation*\), of which the Gauss measure of[Fact˜C\.6](https://arxiv.org/html/2606.17215#A3.Thmtheorem6)is one\. Linear functionals overℳ2t\(\)\\mathcal\{M\}\_\{2t\}\(\\mu\)\(in particular↦\(B\)\\nu\\mapsto\\nu\(B\)for a bandBB\) attain their extrema at these finitely\-supported representations \(canonical\-moments / Markov–Krein theory\)\.
###### Fact C\.10\(Infinite–finite range inequality for exponential weights;LevinLubinsky01\)\.
For a Freud weightW=e−\|u/s\|W=e^\{\-\\lvert u/s\\rvert\},≥1\\alpha\\geq 1, and every polynomialppof degree≤n\\leq n, theLq\(W\)L^\{q\}\(W\)\-mass ofpWpWoutside the MRS interval\[−an,an\]\[\-a\_\{n\},a\_\{n\}\]is exponentially small:
∫\|u\|\>an\|p\(u\)W\(u\)\|q𝑑u≤e−cn∫\|u\|≤an\|p\(u\)W\(u\)\|q𝑑u\\int\_\{\\lvert u\\rvert\>a\_\{n\}\}\\lvert p\(u\)W\(u\)\\rvert^\{q\}\\,du\\ \\leq\\ e^\{\-cn\}\\int\_\{\\lvert u\\rvert\\leq a\_\{n\}\}\\lvert p\(u\)W\(u\)\\rvert^\{q\}\\,dufor an absolutec=c\(,q\)\>0c=c\(\\alpha,q\)\>0\. Equivalently, polynomial mass beyond the MRS numberan≍sn1/a\_\{n\}\\asymp s\\,n^\{1/\\alpha\}ise−\(n\)e^\{\-\\Omega\(n\)\}\.
## Appendix DFrom sum\-of\-squares certificates to a semidefinite program
Both the outlier\-removal step of[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)and the minorant of[Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1)ask for a*degree\-2t2tsum\-of\-squares \(SoS\) certificate*\. For completeness we recall why each is a semidefinite program \(SDP\) of sizedO\(t\)d^\{O\(t\)\}, hence solvable in polynomial time for fixedtt\. The correspondence is standard\(Parrilo03;Lasserre01\); we specialize it to the two programs we use\.
#### SoS as a positive\-semidefinite Gram matrix\.
Letvt\(w\)∈RNv\_\{t\}\(w\)\\in\\mdmathbb\{R\}^\{N\}, withN=\(d\+t−1t\)=dO\(t\)N=\\binom\{d\+t\-1\}\{t\}=d^\{O\(t\)\}, be the vector of all monomials inw=\(w1,…,wd\)w=\(w\_\{1\},\\dots,w\_\{d\}\)of degree exactlytt\. A real formP\(w\)P\(w\)homogeneous of degree2t2tis a*sum of squares*,P=∑ℓsℓ2P=\\sum\_\{\\ell\}s\_\{\\ell\}^\{2\}with eachsℓs\_\{\\ell\}a form of degreett, if and only if
P\(w\)=vt\(w\)⊤Mvt\(w\)for some symmetricM⪰0\.P\(w\)\\;=\\;v\_\{t\}\(w\)^\{\\top\}M\\,v\_\{t\}\(w\)\\qquad\\text\{for some symmetric \}M\\succeq 0\.\(D\.1\)\(“Only if”: writingsℓ\(w\)=⟨cℓ,vt\(w\)⟩s\_\{\\ell\}\(w\)=\\langle c\_\{\\ell\},v\_\{t\}\(w\)\\ranglegivesM=∑ℓcℓcℓ⊤⪰0M=\\sum\_\{\\ell\}c\_\{\\ell\}c\_\{\\ell\}^\{\\top\}\\succeq 0\. “If”: a Cholesky factorizationM=∑ℓcℓcℓ⊤M=\\sum\_\{\\ell\}c\_\{\\ell\}c\_\{\\ell\}^\{\\top\}recovers the squares\.\) The matrixvt\(w\)⊤Mvt\(w\)v\_\{t\}\(w\)^\{\\top\}Mv\_\{t\}\(w\)is a form whose coefficient of each monomialww\(\|\|=2t\\lvert\\alpha\\rvert=2t\) is a fixed*linear*functional⟨A,M⟩\\langle A,M\\rangleofMM, where theAAare the constant symmetric matrices that collect the index pairs\(,\)\(\\beta,\\gamma\)with\+=\\beta\+\\gamma=\\alpha\. Matching these to the coefficientsPPofPP, the statement “PPis SoS” becomes the SDP feasibility problem
∃M⪰0:⟨A,M⟩=Pfor all\|\|=2t\.\\exists\\,M\\succeq 0:\\qquad\\langle A,M\\rangle=P\\quad\\text\{for all \}\\lvert\\alpha\\rvert=2t\.\(D\.2\)An SoS certificate is sufficient forP\(w\)≥0P\(w\)\\geq 0on all ofRd\\mdmathbb\{R\}^\{d\}, and it is what we mean by a*degree\-2t2tcertificate*; it is also necessary whend=1d=1or2t=22t=2, while ford≥2d\\geq 2and2t≥42t\\geq 4it is a relaxation, and the relaxation*degree*2t2tis precisely the resource this paper studies \([Theorems˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)and[4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\)\. Solving \([D\.2](https://arxiv.org/html/2606.17215#A4.E2)\) \(with any linear objective, to additive accuracy \) takespoly\(N,log\(1/\)\)=poly\(dO\(t\)\)\\operatorname\{poly\}\(N,\\log\(1/\\delta\)\)=\\operatorname\{poly\}\(d^\{O\(t\)\}\)time by interior\-point methods\.
#### The outlier\-removal program of[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)\.
Step 2 seeks weightsq∈\[0,1\]nq\\in\[0,1\]^\{n\}with∑iqi≥\(1−\)n\\sum\_\{i\}q\_\{i\}\\geq\(1\-\\xi\)ncertifying1n∑iqi\(w⋅xi\)2t≤\(Ct\)2t¯2t\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\leq\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}for allww\. Homogenizing by∥w∥2t\\lVert w\\rVert^\{2t\}, this asks the degree\-2t2tform
Pq\(w\)≔\(Ct\)2t¯2t∥w∥2t−1n∑i=1nqi\(w⋅xi\)2tP\_\{q\}\(w\)\\ \\coloneqq\\ \(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}\\,\\lVert w\\rVert^\{2t\}\\;\-\\;\\tfrac\{1\}\{n\}\\sum\_\{i=1\}^\{n\}q\_\{i\}\\,\(w\\cdot x\_\{i\}\)^\{2t\}\(D\.3\)to be nonnegative for allww, which we certify by requiringPqP\_\{q\}to be SoS\. The coefficients ofPqP\_\{q\}are*linear*inqq: the form\(w⋅xi\)2t\(w\\cdot x\_\{i\}\)^\{2t\}has a fixed coefficient vectorbib\_\{i\}\(the symmetrized2t2t\-fold tensor power ofxix\_\{i\}\), so\[Pq\]=\(Ct\)2t¯2t\[∥w∥2t\]−1n∑iqi\[bi\]\[P\_\{q\}\]=\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}\[\\lVert w\\rVert^\{2t\}\]\-\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\\,\[b\_\{i\}\]\. Combining \([D\.2](https://arxiv.org/html/2606.17215#A4.E2)\) with the \(SDP\-representable\) box and budget constraints gives a single semidefinite program in the variables\(q,M\)\(q,M\):
findq∈\[0,1\]n,M⪰0:∑iqi≥\(1−\)n,⟨A,M⟩=\[Pq\]\(\|\|=2t\)\.\\text\{find\}\\quad q\\in\[0,1\]^\{n\},\\ M\\succeq 0:\\qquad\\textstyle\\sum\_\{i\}q\_\{i\}\\geq\(1\-\\xi\)n,\\quad\\langle A,M\\rangle=\[P\_\{q\}\]\\ \\ \(\\lvert\\alpha\\rvert=2t\)\.\(D\.4\)HereMMisN×NN\\times NwithN=dO\(t\)N=d^\{O\(t\)\}and there arennscalar variablesqiq\_\{i\}, so \([D\.4](https://arxiv.org/html/2606.17215#A4.E4)\) is solvable inpoly\(n,dO\(t\)\)\\operatorname\{poly\}\(n,d^\{O\(t\)\}\)time\. Att=1t=1,v1\(w\)=\(w1,…,wd\)v\_\{1\}\(w\)=\(w\_\{1\},\\dots,w\_\{d\}\),MMisd×dd\\times d, and \([D\.3](https://arxiv.org/html/2606.17215#A4.E3)\) is the quadratic formw⊤\(C2¯2I−1n∑iqixixi⊤\)ww^\{\\top\}\\\!\\big\(C^\{2\}\\bar\{\\sigma\}^\{2\}I\-\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}x\_\{i\}x\_\{i\}^\{\\top\}\\big\)w; SoS now coincides with positive\-semidefiniteness, so \([D\.4](https://arxiv.org/html/2606.17215#A4.E4)\) reduces to the reweighted\-covariance bound1n∑iqixixi⊤⪯C2¯2I\\frac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}x\_\{i\}x\_\{i\}^\{\\top\}\\preceq C^\{2\}\\bar\{\\sigma\}^\{2\}I, the soft\-outlier\-removal program ofShen25\. This is the precise sense in which their degree\-22program is thet=1t=1instance of ours; raisingttenlarges the Gram matrix fromd×dd\\times dto\(d\+t−1t\)×\(d\+t−1t\)\\binom\{d\+t\-1\}\{t\}\\times\\binom\{d\+t\-1\}\{t\}and nothing else\. The robust\-statistics use of exactly this SoS\-reweighting SDP is due toHopkinsLi18;KothariSteinhardtSteurer18\.
#### The dual side: pseudo\-expectations and the minorant of[Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1)\.
The lower\-bound programs are the SDP duals\. A degree\-2t2tpseudo\-expectationE~\\tilde\{\\mdmathbb\{E\}\}\([Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1)\) on the one\-dimensional projectionUUis determined by its pseudo\-momentsE~\[uk\]\\tilde\{\\mdmathbb\{E\}\}\[u^\{k\}\],0≤k≤2t0\\leq k\\leq 2t; the axiomsE~\[1\]=1\\tilde\{\\mdmathbb\{E\}\}\[1\]=1,E~\[q2\]≥0\\tilde\{\\mdmathbb\{E\}\}\[q^\{2\}\]\\geq 0fordegq≤t\\deg q\\leq t, andE~\[uk\]=mk\\tilde\{\\mdmathbb\{E\}\}\[u^\{k\}\]=m\_\{k\}say exactly that the Hankel*moment matrix*ℳ=\(E~\[ui\+j\]\)0≤i,j≤t⪰0\\mathcal\{M\}=\\big\(\\tilde\{\\mdmathbb\{E\}\}\[u^\{i\+j\}\]\\big\)\_\{0\\leq i,j\\leq t\}\\succeq 0, that its top\-left entry is11, and that its first2t2tanti\-diagonals equal ’s moments\. Searching for a minorantp≤𝟏Bp\\leq\\mathbf\{1\}\_\{B\}that maximizesE\[p\]=∑kpkmk\\mdmathbb\{E\}\[p\]=\\sum\_\{k\}p\_\{k\}m\_\{k\}is the SDP dual to minimizing\(B\)\\nu\(B\)over nonnegative measures with these moments, the moment–SoS hierarchy of Lasserre\(Lasserre01;LasserrePauwelsCDK\)for the truncated moment problem \([Appendix˜C](https://arxiv.org/html/2606.17215#A3)\)\. BecauseUUis one\-dimensional,ℳ\\mathcal\{M\}is a\(t\+1\)×\(t\+1\)\(t\{\+\}1\)\\times\(t\{\+\}1\)Hankel matrix and the dual SDP is small; its extreme points are the Gauss\-quadrature measures \(flat extensions ofℳ\\mathcal\{M\}\) that drive the barrier of[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\.
## Appendix EExtension to nasty noise
The whole picture \(both barriers and the degree\-2t2talgorithm\) transfers from malicious noise to the stronger*nasty\-noise*model ofBshoutyEironKushilevitz02, at the cost of a constant in the tolerable rate\. We record the statement and the \(short\) modifications here\.
#### The model\.
A nasty adversary draws thennclean examples i\.i\.d\. from𝒟\\mathcal\{D\}, inspects them \(knowing the learner,𝒟\\mathcal\{D\}, andw⋆w^\{\\star\}\), then*deletes*up ton\\eta nof them and*adds*up ton\\eta narbitrary\(instance,label\)\(\\text\{instance\},\\text\{label\}\)pairs, keeping the sample sizenn\. The learner seesSC∪SDS\_\{C\}\\cup S\_\{D\}, where nowSCS\_\{C\}is the*retained*clean subset \(\|SC\|≥\(1−\)n\\lvert S\_\{C\}\\rvert\\geq\(1\-\\eta\)n\) andSDS\_\{D\}the≤n\\leq\\eta nadded points\. This subsumes malicious noise \(the adversary may add without deleting\) and is the sample analogue of strong contamination in robust statistics\(DiakonikolasKaneStewart18;DKKLMS16\); the new power is the adversary’s ability to*remove*clean inliers it finds inconvenient\.
#### Both barriers transfer a fortiori\.
[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)is a lower bound on the certificate method for certifying band mass of the*clean*marginal; it is a statement about and the SoS degree, and is unaffected by how the sample is corrupted\.[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)is realized by a malicious instance \(it adds cloaked points\), which a nasty adversary can reproduce verbatim; since nasty noise subsumes malicious, the degree\-22barrier holds here as well\. So the only thing to check is that the*upper*bound survives deletion\.
###### Proposition E\.1\(Degree\-2t2talgorithm under nasty noise\)\.
Under the conditions of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9), run[Algorithm˜1](https://arxiv.org/html/2606.17215#alg1)on the observed nasty sample with degreet≥1t\\geq 1and budget=\(\)\\xi=\\Theta\(\\eta\)\. There is a constantcc\(one may takec=5c=5\) such that for every nasty\-noise rate
≤\(t\)0nasty=\(min\{c,\(/\(\(Ct\)¯\)\)2t2t−1\}\),\\eta\\ \\leq\\ \{\}\_\{0\}^\{\\mathrm\{nasty\}\}\(t\)\\ =\\ \\Omega\\Big\(\\min\\Big\\\{\\tfrac\{\\rho\}\{c\},\\,\\big\(\\gamma\\rho/\(\(Ct\)\\bar\{\\sigma\}\)\\big\)^\{\\frac\{2t\}\{2t\-1\}\}\\Big\\\}\\Big\),the output satisfieserr𝒟X\(w^\)≤\\mathrm\{err\}\_\{\\mathcal\{D\}\_\{X\}\}\(\\hat\{w\}\)\\leq\\varepsilonin timenO\(t\)n^\{O\(t\)\}withn=dO\(t\)poly\(1/\)n=d^\{O\(t\)\}\\operatorname\{poly\}\(1/\\varepsilon\)\. The frontier is the same1\-1/2tas in[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9); only the pancake cap loosens from/4\\rho/4to/c\\rho/c\.
###### Proof\.
Three of the four ingredients of[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)are unchanged; only the clean signal is reduced by deletion\.
*\(i\) The reweighting stays feasible: the certificate survives deletion\.*WriteS~\\widetilde\{S\}for the originalnnclean i\.i\.d\. points \(before the adversary deletes\), soSC⊆S~S\_\{C\}\\subseteq\\widetilde\{S\}\. The clean\-indicator weightq⋆q^\{\\star\}\(qi⋆=1q^\{\\star\}\_\{i\}=1onSCS\_\{C\},0onSDS\_\{D\}, hence=\\xi=\\eta\) admits a degree\-2t2tcertificate of1n∑iqi⋆\(w⋅xi\)2t≤\(Ct\)2t¯2t\\tfrac\{1\}\{n\}\\sum\_\{i\}q^\{\\star\}\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}\\leq\(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\}:
1n∑i∈SC\(w⋅xi\)2t≤1n∑i∈S~\(w⋅xi\)2t≤\(Ct\)2t¯2t,\\tfrac\{1\}\{n\}\\textstyle\\sum\_\{i\\in S\_\{C\}\}\(w\\cdot x\_\{i\}\)^\{2t\}\\ \\leq\\ \\tfrac\{1\}\{n\}\\sum\_\{i\\in\\widetilde\{S\}\}\(w\\cdot x\_\{i\}\)^\{2t\}\\ \\leq\\ \(Ct\)^\{2t\}\\bar\{\\sigma\}^\{2t\},where the first inequality holds because the dropped terms∑i∈S~∖SC\(w⋅xi\)2t\\sum\_\{i\\in\\widetilde\{S\}\\setminus S\_\{C\}\}\(w\\cdot x\_\{i\}\)^\{2t\}are a sum of2t2t\-th powers, hence SoS, so the inequality has a degree\-2t2tSoS proof; and the second is the certificate of[Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)for the clean empirical measure \(valid w\.h\.p\. oncen≥dO\(t\)n\\geq d^\{O\(t\)\}\)\. Composing the two SoS proofs certifiesq⋆q^\{\\star\}\. Deletion therefore cannot break feasibility: removing clean points only*lowers*the reweighted moment, which is the direction the certificate needs\.
*\(ii\) TheLSN\\mathrm\{LSN\}bound is unchanged\.*The dirty contribution comes from the added setSDS\_\{D\}\(\|SD\|≤n\\lvert S\_\{D\}\\rvert\\leq\\eta n\); deleted points are simply absent and contribute nothing\. The Hölder argument behind \([4\.4](https://arxiv.org/html/2606.17215#S4.E4)\), applied toSDS\_\{D\}, gives, for every feasibleqq,
LSN\(q∘SD\)≤n\(Ct\)¯,1−12t\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)\\ \\leq\\ n\\,\(Ct\)\\,\\bar\{\\sigma\}\\,\{\}^\{1\-\\frac\{1\}\{2t\}\},exactly as before\.
*\(iii\) The pancake loses an \-fraction\.*This is the one genuine change\. A \-dense pancake carries≥n\\geq\\rho nclean projections; deleting at mostn\\eta nof them leaves≥\(−\)n\\geq\(\\rho\-\\eta\)n, so the retained clean band mass is≥′−\{\}^\{\\prime\}\\geq\\rho\-\\eta\. Plugging′into the gradient\-domination condition ofShen25along the empirical optimum, the clean signal beats the dirty term once
4\(−−2\)\>\(Ct\)¯\.1−12t\\tfrac\{\\gamma\}\{4\}\\,\(\\rho\-\\eta\-2\\xi\)\\ \>\\ \(Ct\)\\,\\bar\{\\sigma\}\\,\{\}^\{1\-\\frac\{1\}\{2t\}\}\.With=\\xi=\\kappa\\eta\(≤2\\kappa\\leq 2\) the left side is positive for</\(2\+2\)≥/c\\eta<\\rho/\(2\+2\\kappa\)\\geq\\rho/c, and balancing the two bottlenecks \(the pancake budget and the outlier term\) solves to\(t\)0nasty\{\}\_\{0\}^\{\\mathrm\{nasty\}\}\(t\)above, with the same exponent2t2t−1\\tfrac\{2t\}\{2t\-1\}and the same base\(Ct\)\(Ct\)\. ∎
## Appendix FAdditional related work
The barriers in this paper sit inside a large literature on learning halfspaces under noise, much of which predates and motivates the reweighted\-hinge line we analyze\. We collect here, with thanks to these lines of work, the threads most relevant to our setting; the survey is necessarily partial, and we have surely omitted important contributions\. This appendix expands the brief positioning of[Section˜5\.1](https://arxiv.org/html/2606.17215#S5.SS1)\.
#### Halfspaces under malicious and adversarial noise\.
The malicious\-noise model is due toKearnsLi93\. Efficient distribution\-specific learners begin withKlivansLongServedio09, whose smoothing/averaging analysis tolerates a noise rate scaling with the margin under isotropic log\-concave marginals, and with the agnostic halfspace algorithms ofKalaiKlivansMansourServedio08andKaneKlivansMeka13via polynomial regression and moment matching\. The localization technique ofAwasthiBalcanLong17raised the tolerable malicious rate to a constant\. Most directly,Shen21sample;Shen21perceptron;Shen23linear;Shen25develop the reweighted\-hinge and localized\-perceptron approaches we study, with attribute\-efficient, sparse, and multiclass extensions byShenZhang21;ZengShen25;AdhikariZeng26\(the last by cluster\-based pruning rather than reweighting\)\. Our degree barriers are meant to*explain*, rather than improve on, the constants these works report\.
#### Massart, random\-classification, and bounded noise\.
A parallel and influential line studies semi\-random label noise\.DiakonikolasGouleakisTzamos19gave the first distribution\-independent halfspace learner under Massart noise, extended to general Massart noise under Gaussian marginals byDiakonikolasMassartGaussian22; closest in spirit to our margin\-degree tradeoff,DiakonikolasMarginRCN23study the information\-computation tradeoff for*margin*halfspaces under random classification noise\. The accompanying hardness theory, statistical\-query and cryptographic, is developed byDiakonikolasMassartSQ22;DiakonikolasMassartCrypto22;DiakonikolasKaneZarifis20\. On the active side, efficient learning under bounded, Massart, and Tsybakov noise is due toAwasthiBalcanHaghtalabUrner15;AwasthiBalcanHaghtalabZhang16;ZhangLi21\.
#### Active and label\-efficient halfspace learning\.
The margin\-based active\-learning framework ofBalcanBroderZhang07and its log\-concave analysis byBalcanLong13, together with the perceptron\-based algorithms ofYanZhang17and the sparse, noise\-tolerant active learners ofZhang18;ZhangShenAwasthi20, share our reliance on the geometry of log\-concave marginals and on label\-efficient outlier handling, though their resource of interest is label complexity rather than certificate degree\.
#### Sum\-of\-squares, moment methods, and robust statistics\.
Our algorithm imports the proofs\-to\-algorithms paradigm ofHopkinsLi18;KothariSteinhardtSteurer18, whose moment\-certification machinery underlies our degree\-2t2toutlier removal\. Closely related are the outlier\-robust and list\-decodable regression algorithms ofKlivansKothariMeka18;KarmalkarKlivansKothari19and the sum\-of\-squares robust\-estimation results ofDiakonikolasSparseSoS22;BakshiDiakonikolasJKKV22; attribute\-efficient and list\-decodable variants for sparse settings appear inZengShen22listdec;ZengShen23PTF\. These build on the foundational program of algorithmic high\-dimensional robust statistics initiated byDKKLMS16;DKKLSSsever19and surveyed byDiakonikolasKaneSurvey19\. Our contribution is complementary: rather than a new robust algorithm, we isolate the certificate*degree*as the resource these methods spend, and show that spending it is, for the moment/SoS\-certificate family, sometimes unavoidable\.
## Appendix GA closer look atShen25
We collect three closer readings ofShen25, in each case to explain a feature rather than to fault it: why its required margin must track the target accuracy \([Section˜G\.1](https://arxiv.org/html/2606.17215#A7.SS1)\), the fragility of its breakdown constant \([Section˜G\.2](https://arxiv.org/html/2606.17215#A7.SS2)\), and a reconstruction of its reweighting program from the moment cone \([Section˜G\.3](https://arxiv.org/html/2606.17215#A7.SS3)\)\.
### G\.1The margin–accuracy coupling
On its own termsShen25is a clean and substantial result: it is the first to learn \-margin halfspaces under a*constant*malicious\-noise rate for general log\-concave\-mixture marginals, through a single reweighted\-hinge program\. Our purpose here is not to dispute that achievement but to look closely at one feature of its guarantee, the margin requirement=\(log\(1/\)/d\)\\gamma=\\Omega\(\\log\(1/\\varepsilon\)/\\sqrt\{d\}\), and to argue that its dependence on is a genuine conceptual wrinkle rather than a cosmetic one, before locating exactly where it comes from and how the paper proposes to pay it\.
#### The coupling runs backwards\.
A margin is a property of the*distribution*: it is fixed the moment the problem is posed, and the learner has no lever to enlarge it\. The accuracy is a property of the learner’s*intent*: it is chosen freely, and a learner may legitimately ask for as small as it likes\. A guarantee whose hypothesis≥\(log\(1/\)/d\)\\gamma\\geq\\Omega\(\\log\(1/\\varepsilon\)/\\sqrt\{d\}\)ties the two reads from goal to assumption rather than from assumption to conclusion: to*permit*a smaller target error it requires the*data*to have been better separated all along\. The dependency points the wrong way\. The learner cannot meet the hypothesis by adjusting the one quantity it controls, , because lowering only tightens the hypothesis it was meant to satisfy\. This is the chicken\-and\-egg: the accuracy one is allowed to demand is governed by a margin one cannot choose\.
#### An error floor on every fixed instance\.
The wrinkle is quantitative, not merely rhetorical\. Suppose a guarantee certifies inlier error≤\\leq\\varepsilonwhenever≥f\(\)\\gamma\\geq f\(\\varepsilon\)for an increasingff; heref\(\)=clog\(1/\)/df\(\\varepsilon\)=c\\,\\log\(1/\\varepsilon\)/\\sqrt\{d\}\. On any*fixed*instance the margin takes some value0, so the guarantee is available only for those withf\(\)≤0f\(\\varepsilon\)\\leq\{\}\_\{0\}, that is, for
≥\(\)0⋆=f−1\(\)0=exp\(−\(d0\)\)\.\\varepsilon\\;\\geq\\;\{\}\_\{\\star\}\(\{\}\_\{0\}\)\\;=\\;f^\{\-1\}\(\{\}\_\{0\}\)\\;=\\;\\exp\\\!\\big\(\-\\Theta\(\{\}\_\{0\}\\sqrt\{d\}\)\\big\)\.The quantity\(\)0⋆\{\}\_\{\\star\}\(\{\}\_\{0\}\)is an*error floor*fixed by the margin: below it the hypothesis fails to hold and the theorem is silent, so on a fixed\-margin distribution one cannot ask this analysis for an arbitrarily accurate classifier\. The floor is not an artifact of the write\-up that a sharper analysis would dissolve\. By the margin–degree barrier \([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\) thelog\(1/\)\\log\(1/\\varepsilon\)is*conserved*across every one\-shot certificate of fixed degree: a degree\-22program cannot lower the floor at all without enlarging the margin, and a higher\-degree program lowers it only by spending degree\. The coupling is therefore a property of the certificate method, not of one paper’s bookkeeping\.
#### An example: the natural\-margin mixture\.
Take the natural margin=0\(1/d\)\{\}\_\{0\}=\\Theta\(1/\\sqrt\{d\}\), the separation one assumes when two clusters sit a unit\-scale gap apart in their worst one\-dimensional projection, and the regime in which margin\-based halfspace learning is usually posed\. Then=⋆exp\(−\(1\)\)\{\}\_\{\\star\}=\\exp\(\-\\Theta\(1\)\), a*constant*: applied to a natural\-margin instance, the guarantee certifies only a constant inlier error and cannot certify=o\(1\)\\varepsilon=o\(1\)at all\. The margin one must assume in order to certify a target grows steadily as the target shrinks:
target accuracymargin the guarantee requires\(1\)\\Theta\(1\)\(1/d\)\\Theta\(1/\\sqrt\{d\}\)\(the natural margin\)1/poly\(d\)1/\\operatorname\{poly\}\(d\)\(logd/d\)\\Theta\(\\log d/\\sqrt\{d\}\)exp\(−da\)\\exp\(\-d^\{a\}\),0<a<12\\ 0<a<\\tfrac\{1\}\{2\}\(da−1/2\)\\Theta\(d^\{\\,a\-1/2\}\)exp\(−\(d\)\)\\exp\(\-\\Theta\(\\sqrt\{d\}\)\)\(1\)\\Theta\(1\)\(a constant margin\)By the last row the hypothesis has become a constant\-margin condition, the classes separated by a constant fraction of their spread, which is a far stronger demand than the1/d1/\\sqrt\{d\}one began with and one that most high\-dimensional instances do not meet\. A learner who wants exponentially small error must, under this guarantee, assume exponentially better\-separated data, a bargain it is in no position to strike\.
#### The resolution: pay in degree, not in margin\.
The paper’s first result says this is the whole of the trade, and that the learner has a second currency\. The conservedlog\(1/\)\\log\(1/\\varepsilon\)may be carried by the margin*or*by the Sum\-of\-Squares degree, but inside the one\-shot certificate model it is never waived \([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\)\. Choosing to pay it in degree,[Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)holds the margin at the natural\(1/d\)\\Theta\(1/\\sqrt\{d\}\)and certifies inlier error with a degree\-2t=\(log\(1/\)\)2t=\\Theta\(\\log\(1/\\varepsilon\)\)program, in timen\(log\(1/\)\)n^\{\\Theta\(\\log\(1/\\varepsilon\)\)\}\. Accuracy is then bought with computation, a resource the learner controls, instead of with a margin it does not\. The chicken\-and\-egg is not a paradox but a question of which resource pays: degree, which the learner can spend, or margin, which the data fixes in advance, and the \-dependent margin ofShen25is the consequence of having only the degree\-22, i\.e\. cheapest, certificate at hand\.
### G\.2The breakdown constant of the degree\-22analysis
Our barriers concern the*exponent*of the breakdown rate \(the1/2that degree22forces,[Theorem˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)\), not the particular constant0a given analysis reports\. The distinction matters because the constant is fragile in a way the exponent is not, as the published degree\-22analysis already shows\.Shen25proves the main guarantee \(his Theorem 2\) for≤02−32\{\}\_\{0\}\\leq 2^\{\-32\}\. The sampling step underlying the pruning subroutine, however \(his Lemma 21, invoked through Proposition 22\), is stated and proved only for∈0\[18,14\]\{\}\_\{0\}\\in\[\\tfrac\{1\}\{8\},\\tfrac\{1\}\{4\}\], and its proof uses the*lower*bound≥018\{\}\_\{0\}\\geq\\tfrac\{1\}\{8\}: the dirty\-count Chernoff estimatePr\[\|S¯D\|≥2\|S¯\|0\]≤exp\(−\|S¯\|0/3\)\\Pr\[\\,\\lvert\\bar\{S\}\_\{D\}\\rvert\\geq 2\{\}\_\{0\}\\lvert\\bar\{S\}\\rvert\\,\]\\leq\\exp\(\-\{\}\_\{0\}\\lvert\\bar\{S\}\\rvert/3\)is driven below by the stated sample size\|S¯\|≥32log\(1/\)\\lvert\\bar\{S\}\\rvert\\geq 32\\log\(1/\\delta\)only because32≥3/032\\geq 3/\{\}\_\{0\}when≥018\{\}\_\{0\}\\geq\\tfrac\{1\}\{8\}\. At the rate the main theorem actually uses,≤02−32≪18\{\}\_\{0\}\\leq 2^\{\-32\}\\ll\\tfrac\{1\}\{8\}, the two regimes are disjoint and the fixed sample size no longer certifies the step\.
The gap is benign, and we record it not to dispute the theorem, which we expect holds, but because it is the symptom our framework predicts\. When degree22squeezes the tolerable rate to a tiny constant, the modular constant\-chasing that carries it through the analysis becomes correspondingly delicate: building blocks calibrated to a constant\-order0coexist with a final bound that drives0far below them\. Pinning the phenomenon to the certificate*degree*rather than to any one constant \([Theorems˜4\.7](https://arxiv.org/html/2606.17215#S4.Thmtheorem7)and[4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\) is what frees our account from this calibration: the exponent1−12t1\-\\tfrac\{1\}\{2t\}is a property of the moment certificate, whatever route the constants take\.
#### The repair\.
The lower bound≥018\{\}\_\{0\}\\geq\\tfrac\{1\}\{8\}enters at exactly one point: the substitution32≥3/032\\geq 3/\{\}\_\{0\}in the proof ofShen25’s Lemma 21\. Replacing the hypothesis∈0\[18,14\]\{\}\_\{0\}\\in\[\\tfrac\{1\}\{8\},\\tfrac\{1\}\{4\}\]by≤014\{\}\_\{0\}\\leq\\tfrac\{1\}\{4\}, and the sample\-size requirement\|S¯\|≥32log\(1/\)\\lvert\\bar\{S\}\\rvert\\geq 32\\log\(1/\\delta\)by
\|S¯\|≥30log1,\\lvert\\bar\{S\}\\rvert\\;\\geq\\;\\frac\{3\}\{\{\}\_\{0\}\}\\,\\log\\frac\{1\}\{\\delta\},restores the Chernoff stepexp\(−\|S¯\|0/3\)≤\\exp\(\-\{\}\_\{0\}\\lvert\\bar\{S\}\\rvert/3\)\\leq\\deltaverbatim, while≤014\{\}\_\{0\}\\leq\\tfrac\{1\}\{4\}still gives\|S¯C\|≥\(1−2\)0\|S¯\|≥12\|S¯\|\\lvert\\bar\{S\}\_\{C\}\\rvert\\geq\(1\-2\{\}\_\{0\}\)\\lvert\\bar\{S\}\\rvert\\geq\\tfrac\{1\}\{2\}\\lvert\\bar\{S\}\\rvert\. The factor−10\{\}\_\{0\}^\{\-1\}propagates through Proposition 22 into the sample complexity; because=02−32\{\}\_\{0\}=2^\{\-32\}is a fixed constant, it is absorbed into the absolute constant, so the complexity keeps its order indd,1/1/\\varepsilon,1/1/\\delta, andkkand only the hidden constant grows \(by≈232\\approx 2^\{32\}\)\. No other step invokes≥018\{\}\_\{0\}\\geq\\tfrac\{1\}\{8\}, soShen25’s Theorem 2 holds as stated at≤02−32\{\}\_\{0\}\\leq 2^\{\-32\}, with the sample\-size constant now made explicit\.
### G\.3A natural reconstruction of the reweighting program
The degree\-22program ofShen25reaches its guarantee through a sequence of inequalities, each tight enough to carry the argument but none visibly forced upon us: weightsq∈\[0,1\]nq\\in\[0,1\]^\{n\}are introduced to suppress outliers, a quantityLSN\(q∘SD\)\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)is bounded, an empirical rate=20\\xi=2\{\}\_\{0\}is substituted, and a chain of absolute constants is chased down to≤02−32\{\}\_\{0\}\\leq 2^\{\-32\}\. The result is correct, but the route is the hammer and chisel: the shell is opened by force\. We record here the same content read the other way\. In Grothendieck’s image of the*rising sea*one does not crack the problem; one chooses the ambient object so large and so well\-fitted that the result is already dissolved on arrival\. That object is the one this paper is built on — the degree\-2t2tmoment cone of the clean marginal and its Christoffel function \([Section˜3](https://arxiv.org/html/2606.17215#S3)\)\. Three changes of*definition*, not of proof, turn each ofShen25’s steps into a reading of it\.
#### The weights solve a moment\-certification program, not a heuristic downweighting\.
Shen25presentsq∈\[0,1\]nq\\in\[0,1\]^\{n\}as a soft indicator that suppresses suspected outliers, to be justified after the fact\. Read it instead as the variable of a certification program tied to the clean marginal’s moment cone \([Section˜3\.1](https://arxiv.org/html/2606.17215#S3.SS1)\):qqranges over the reweightings for which that cone certifies a bound on the reweighted directional moment1n∑iqi\(w⋅xi\)2t\\tfrac\{1\}\{n\}\\sum\_\{i\}q\_\{i\}\(w\\cdot x\_\{i\}\)^\{2t\}, uniformly inww\. The program is then not “remove the outliers” but “keep the largest subsample whose reweighted moments the clean cone can vouch for,” and nothing heuristic is chosen: the feasibleqqform a convex, semidefinite\-representable set, its non\-emptiness is certifiable hypercontractivity \([Lemma˜4\.13](https://arxiv.org/html/2606.17215#S4.Thmtheorem13)\), and the outlier\-removal certificate is that set’s separating functional \([Definition˜2\.1](https://arxiv.org/html/2606.17215#S2.Thmtheorem1)\)\.
#### LSN\\mathrm\{LSN\}is a support radius, and1\-1/2tis Hölder, not a constant\.
The controlling quantity enters as an opaque expression,LSN\(q∘SD\)=sup∥w∥≤1∑i∈SDqi\|w⋅xi\|\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)=\\sup\_\{\\lVert w\\rVert\\leq 1\}\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\. It is, in fact, the Euclidean radius of the zonotope generated by the reweighted dirty points: using\|w⋅xi\|=max=i±1\(w⋅xi\)i\\lvert w\\cdot x\_\{i\}\\rvert=\\max\_\{\{\}\_\{i\}=\\pm 1\}\{\}\_\{i\}\(w\\cdot x\_\{i\}\),
LSN\(q∘SD\)=sup∥w∥≤1max∈\{±1\}SD⟨w,∑i∈SDqixii⟩=maxz∈Z∥z∥,Z=\{∑i∈SDqixii:∈\{±1\}SD\}\.\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)=\\sup\_\{\\lVert w\\rVert\\leq 1\}\\ \\max\_\{\\sigma\\in\\\{\\pm 1\\\}^\{S\_\{D\}\}\}\\Big\\langle w,\\ \\textstyle\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\{\}\_\{i\}x\_\{i\}\\Big\\rangle=\\max\_\{z\\in Z\}\\lVert z\\rVert,\\qquad Z=\\Big\\\{\\textstyle\\sum\_\{i\\in S\_\{D\}\}q\_\{i\}\{\}\_\{i\}x\_\{i\}:\\ \\sigma\\in\\\{\\pm 1\\\}^\{S\_\{D\}\}\\Big\\\}\.The only distributional input is then a bound on one moment ofZZ, and the degree appears as an interpolation exponent: Hölder with the conjugate pair\(2t2t−1,2t\)\(\\tfrac\{2t\}\{2t\-1\},2t\)gives
∑SDqi\|w⋅xi\|≤\(∑SDqi\)1−12t\(∑SDqi\|w⋅xi\|2t\)12t,\\sum\_\{S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert\\leq\\Big\(\\sum\_\{S\_\{D\}\}q\_\{i\}\\Big\)^\{1\-\\frac\{1\}\{2t\}\}\\Big\(\\sum\_\{S\_\{D\}\}q\_\{i\}\\lvert w\\cdot x\_\{i\}\\rvert^\{2t\}\\Big\)^\{\\frac\{1\}\{2t\}\},so the mass∑qi≤n\\sum q\_\{i\}\\leq\\eta nand the certified2t2t\-th moment combine into the rate1\-1/2t\([Theorem˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)\)\. The exponent1−12t1\-\\tfrac\{1\}\{2t\}interpolates between the0\-th and the2t2t\-th moment of the reweighted dirty sum — its mass and its certified2t2t\-th moment — so it is fixed by the degree, andShen25’s1/2is itst=1t=1corner\. No constant enters the exponent\.
#### The breakdown threshold is the resolution principle; the constants are slack\.
Shen25’s robustness step asks4\(−2\)\>LSN\(q∘SD\)/n\\tfrac\{\\gamma\}\{4\}\(\\rho\-2\\xi\)\>\\mathrm\{LSN\}\(q\\circ S\_\{D\}\)/nwith=20\\xi=2\{\}\_\{0\}, and the factors2,42,4and the bound≤02−32\{\}\_\{0\}\\leq 2^\{\-32\}are read off from Chernoff and union bounds\. Separate the structure from the slack\. The left side is the clean hinge signal the dense pancake guarantees, being a*lower*bound on the boundary Christoffel mass \([Facts˜C\.4](https://arxiv.org/html/2606.17215#A3.Thmtheorem4)and[3\.2](https://arxiv.org/html/2606.17215#S3.Thmtheorem2)\); the right side is the corruption a degree\-2t2tcertificate cannot account for, which by the resolution principle is governed byt\+1\([Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\)\. The success/failure threshold is thus set by the Christoffel function, and the numbers2,4,2−322,4,2^\{\-32\}are the finite\-sample slack between this population balance and its empirical form \(the doubling=20\\xi=2\{\}\_\{0\}is the Chernoff step of[Section˜G\.2](https://arxiv.org/html/2606.17215#A7.SS2)\)\. They fix the absolute constant alone; they do not enter the rate1\-1/2tor the marginlog\(1/\)\\log\(1/\\varepsilon\), both of which are degree phenomena \([Theorems˜4\.9](https://arxiv.org/html/2606.17215#S4.Thmtheorem9)and[4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\)\.
#### The nut opens\.
Under these definitions the degree\-2t2talgorithm is no longer a construction audited inequality by inequality\. Feasibility is membership in the moment cone; the tolerated rate is the cone’s Hölder interpolation exponent; the breakdown threshold is the resolution principle\. The two featuresShen25flags as needing explanation, the small breakdown constant and thelog\(1/\)\\log\(1/\\varepsilon\)margin, are then the two corners of one statement aboutt\+1: the constant is thet=1t=1, exponent\-12\\tfrac\{1\}\{2\}corner \([Fact˜3\.4](https://arxiv.org/html/2606.17215#S3.Thmtheorem4)\), and the margin is the degree\-conserved price of[Theorem˜4\.1](https://arxiv.org/html/2606.17215#S4.Thmtheorem1)\. Nothing is cracked; the same theorem, set in the moment cone, opens on its own\.Similar Articles
Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning
This paper proposes a robust subspace-constrained quadratic model for learning low-dimensional structures from high-dimensional data, accommodating heavy-tailed noise. A gradient-based algorithm with backtracking line search is developed, and experiments show improved robustness and reconstruction accuracy.
Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback
This paper proves that online gradient descent achieves optimal √T regret for hidden-convex losses under a Hessian compatibility condition, resolving open questions in adversarial online learning. It also extends results to one-point bandit feedback with a T^{3/4} expected regret bound.
A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
This paper derives a closed-form upper bound for admissible learning-rate steps in belief-space dynamics using KL divergence and Bregman geometry, focusing on cross-entropy classification.
Task-Restricted Symmetries in Recurrent Weight Space
This paper studies functional redundancy in recurrent neural networks by using ordered real Schur coordinates to identify structured ablations that preserve task performance, finding that task-restricted symmetries vary across tasks and trained solutions.
Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity
This paper reveals that Mirror Descent with non-quadratic regularizers can be exponentially more sensitive to initialization than Gradient Descent, even under well-conditioned settings, which has implications for reproducibility in RL and LLM post-training.