Certification of Machine Learning Models via Directional Sharpness

arXiv cs.LG 06/25/26, 04:00 AM Papers
machine-learning model-certification generalization sharpness robustness verification zero-knowledge-proofs
Summary
This paper introduces directional sharpness, a new metric for certifying the generalization performance of machine learning models that is both efficient to compute and more reliable than existing proxies like test accuracy or traditional sharpness, even when training deviates from prescribed procedures.
arXiv:2606.25004v1 Announce Type: new
# Certification of Machine Learning Models via Directional Sharpness
Source: [https://arxiv.org/html/2606.25004](https://arxiv.org/html/2606.25004)
Adrià Gascón, Google, adriag@google.com

Sarah Meiklejohn, University College London, s.meiklejohn@ucl.ac.uk (Portions of this work were done while the author was at Google.)

Mariana Raykova, Google, marianar@google.com

###### Abstract

In machine learning, model certification has been identified as an important method for gaining assurance about a model’s trustworthiness and quality. A model’s quality is largely determined by its ability to generalize, i.e., to perform well on data beyond what it was trained on. It is not possible to certify generalization directly, however, as it depends on unknown data and is not directly measurable. Proxies such as test accuracy can be misleading when the training process is perturbed (intentionally or accidentally), and metrics such as *sharpness*, which has an empirically supported link to generalization, are computationally expensive and can also serve as unreliable signals when training deviates from a prescribed procedure. In this work, we propose *directional sharpness*, a metric designed to efficiently and reliably indicate generalization despite potential training deviations. We provide empirical and analytical evidence that directional sharpness (1) correlates more strongly with generalization than existing metrics and (2) identifies models with poor generalization more reliably than existing metrics. Furthermore, directional sharpness is efficiently computable in model auditing settings, where the verifier has access to training data, and via zero-knowledge proofs that certify quality without revealing training data.

## 1 Introduction

There are many properties of machine learning (ML) models that model providers and external stakeholders may wish to verify before those models are deployed. Arguably, the most universal property for such *model certification* is the *quality* of the model, in terms of its performance on downstream tasks. A natural and widely accepted notion of model quality is *generalization* \[[2](https://arxiv.org/html/2606.25004#bib.bib20), [40](https://arxiv.org/html/2606.25004#bib.bib21)\]: the ability of the model to perform well on new, unseen data.

In practice, model quality is typically estimated using a holdout test set, and a model is said to “generalize well” if it achieves high test accuracy. Test accuracy is not a reliable metric, however, as test sets can contain errors \[[64](https://arxiv.org/html/2606.25004#bib.bib79)\], fail to detect overfitting \[[19](https://arxiv.org/html/2606.25004#bib.bib78), [31](https://arxiv.org/html/2606.25004#bib.bib43), [35](https://arxiv.org/html/2606.25004#bib.bib72)\], or miss real-world distribution shifts \[[68](https://arxiv.org/html/2606.25004#bib.bib77), [48](https://arxiv.org/html/2606.25004#bib.bib76), [8](https://arxiv.org/html/2606.25004#bib.bib75), [78](https://arxiv.org/html/2606.25004#bib.bib74)\]. In adversarial settings, research has shown that a model can achieve high accuracy on test data while being vulnerable to adversarial inputs \[[32](https://arxiv.org/html/2606.25004#bib.bib84)\] or containing backdoors that activate only on specific triggers \[[34](https://arxiv.org/html/2606.25004#bib.bib16), [54](https://arxiv.org/html/2606.25004#bib.bib83), [28](https://arxiv.org/html/2606.25004#bib.bib50)\].

In statistical learning theory, generalization is captured by the gap between the empirical risk on the training set and population risk on fresh samples from the same (unknown) data distribution. Ideally, a certificate of generalization would show that this gap is small. However, the true data distribution is unknown, and thus generalization cannot be measured directly. Researchers have instead focused on generalization bounds: upper bounds on this gap computable from the training set and model \[[2](https://arxiv.org/html/2606.25004#bib.bib20), [83](https://arxiv.org/html/2606.25004#bib.bib52), [58](https://arxiv.org/html/2606.25004#bib.bib57), [59](https://arxiv.org/html/2606.25004#bib.bib63), [40](https://arxiv.org/html/2606.25004#bib.bib21), [62](https://arxiv.org/html/2606.25004#bib.bib51)\]. While theoretically appealing, existing bounds are often numerically vacuous or too loose to track actual performance for modern, over-parameterized neural networks, rendering them unsuitable for certification.

These gaps have motivated empirically grounded *generalization metrics*: quantities computable from the learned model and training data that reliably predict generalization. Such metrics can serve as a principled proxy for generalization without relying on a specific holdout test set.

Recently, the *sharpness* of the loss landscape has emerged as a compelling generalization metric, with empirical and theoretical evidence that models in wide “flat” basins generalize better than those in narrow “sharp” minima \[[45](https://arxiv.org/html/2606.25004#bib.bib27), [43](https://arxiv.org/html/2606.25004#bib.bib24), [67](https://arxiv.org/html/2606.25004#bib.bib29), [36](https://arxiv.org/html/2606.25004#bib.bib25), [14](https://arxiv.org/html/2606.25004#bib.bib35)\]. One particularly effective approach measures the *sensitivity* of training loss to small parameter perturbations \[[25](https://arxiv.org/html/2606.25004#bib.bib23), [50](https://arxiv.org/html/2606.25004#bib.bib22), [45](https://arxiv.org/html/2606.25004#bib.bib27), [67](https://arxiv.org/html/2606.25004#bib.bib29), [21](https://arxiv.org/html/2606.25004#bib.bib58)\]. The seminal work of Foret et al. \[[25](https://arxiv.org/html/2606.25004#bib.bib23)\] introduces sharpness-aware minimization (SAM), an optimizer that explicitly minimizes the model’s sharpness. SAM has been shown to significantly improve generalization \[[25](https://arxiv.org/html/2606.25004#bib.bib23), [50](https://arxiv.org/html/2606.25004#bib.bib22), [14](https://arxiv.org/html/2606.25004#bib.bib35)\], robustness to noisy labels \[[6](https://arxiv.org/html/2606.25004#bib.bib32)\], and adversarial robustness \[[90](https://arxiv.org/html/2606.25004#bib.bib53)\] across various benchmarks and model architectures. SAM training is also associated with improved resistance to membership inference attacks \[[56](https://arxiv.org/html/2606.25004#bib.bib33)\], backdoor defense \[[100](https://arxiv.org/html/2606.25004#bib.bib38)\], and improvements in machine unlearning \[[80](https://arxiv.org/html/2606.25004#bib.bib80), [22](https://arxiv.org/html/2606.25004#bib.bib31)\]. These results provide strong evidence that perturbation-based sharpness is a promising proxy for model generalization.

Despite this explanatory power, existing sharpness metrics have not been explored in the context of certification, likely due to two fundamental barriers\. First, these metrics have been validated only via correlation with test accuracy on models trained honestly on clean data\. Given that test accuracy itself can fail to detect poor generalization, it remains unclear whether or not sharpness can reliably predict generalization under \(intentional or unintentional\) training deviations\. Second, existing metrics incur prohibitive computational cost even with direct access to the model and training data, let alone the overhead required in a setting where an external stakeholder would like to verify \(without access to the training data\) a cryptographic proof computed by the model provider\.

This motivates our goal: to *design a generalization metric that is strongly predictive of generalization, remains informative under training deviations, and is efficient enough to support privacy-preserving verification.*

Our results. In this paper, we introduce *directional sharpness*, a generalization metric designed specifically for model certification. Unlike static sharpness, which measures the loss landscape geometry at a single point, directional sharpness measures *dynamic stability*: the extent to which a model remains in a flat region under continuous, stochastic perturbation. We show that directional sharpness (1) outperforms existing metrics in terms of empirical correlation with generalization; (2) remains predictive of generalization even under training deviations, making it more sensitive in detecting poor generalization than existing measures; and (3) is significantly more efficient to compute. Directional sharpness thus provides an efficient and robust metric for evaluating generalization when the verifier has access to the training data. We also show how to extend this metric to the setting where the verifier does not have access to the training dataset, via zero-knowledge proofs for directional sharpness.

### 1.1 Technical Overview

We now outline the key ideas behind directional sharpness\. We first explain the shift from static to dynamic measurement, then discuss its theoretical grounding, and finally present empirical validation showing the metric is efficient and enables practical certification\.

From static sharpness to dynamic stability. Our starting point is perturbation-based sharpness, which shows strong empirical correlation with generalization in honestly trained models \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\]. However, certification demands more than correlation in benign settings: a metric must detect low-quality models that are adversarially chosen to evade detection.

We identify a fundamental limitation of existing sharpness metrics: they are *static*, and evaluate a single, averaged perturbation around the final model to measure sharpness. This one-shot view can miss two failure modes that matter for certification: (i) *incoherent per-example geometry*, where loss landscapes are sharp only on data subsets but appear flat when averaged over the full dataset, and (ii) *unstable flat minima*, where a model sits in an apparently flat, narrow pocket adjacent to sharp regions that a single perturbation may not reliably detect. Both failure modes are linked to models with poor generalization \[[38](https://arxiv.org/html/2606.25004#bib.bib71), [4](https://arxiv.org/html/2606.25004#bib.bib30)\], overfitting, and backdoors \[[95](https://arxiv.org/html/2606.25004#bib.bib37), [29](https://arxiv.org/html/2606.25004#bib.bib39)\].

Our core insight is that a reliable metric should *verify sharpness dynamically*: a model in a wide, flat basin will remain well-behaved under a *sequence* of small, stochastic perturbations, while models with incoherent or unstable geometry will eventually be pushed into sharp regions. We call this property *dynamic stability*. Convergent evidence from sharpness-aware optimization supports this view: the effectiveness of SAM depends critically on *continuous, stochastic* perturbations computed on small mini-batches, while perturbations using the full dataset are substantially less effective \[[4](https://arxiv.org/html/2606.25004#bib.bib30), [53](https://arxiv.org/html/2606.25004#bib.bib46), [6](https://arxiv.org/html/2606.25004#bib.bib32)\].

Measuring dynamic stability via directional sharpness. Building on this insight, we design directional sharpness, a generalization metric that measures the dynamic stability of a model. Given a trained model, we apply a short sequence of SAM-style, stochastic perturbations and record the resulting sharpness after each step. A model in a genuinely flat basin absorbs these perturbations and maintains stable sharpness values; a model in a narrow, unstable region will be pushed toward sharp neighbors, producing detectable fluctuations. As we show in Sections [4.5](https://arxiv.org/html/2606.25004#S4.SS5) and [5](https://arxiv.org/html/2606.25004#S5), this shift from static to dynamic measurement yields significantly more reliable predictions of generalization, even under training deviations.
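As a rough illustration of the procedure described above, the following Python sketch applies a short sequence of SAM-style mini-batch perturbations to a toy least-squares model and records the sharpness at each step. It is not the paper's definition of directional sharpness, which is not reproduced in this part of the text: the function name `sharpness_trace`, the update rule (a SAM-like descent step after each measurement), the step size `eta`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def batch_loss(w, batch):
    """Mean of 0.5 * (w . x - y)^2 over a mini-batch."""
    return sum(0.5 * (sum(a * b for a, b in zip(w, x)) - y) ** 2
               for x, y in batch) / len(batch)


def batch_grad(w, batch):
    """Gradient of batch_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in batch:
        residual = sum(a * b for a, b in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj / len(batch)
    return grad


def sharpness_trace(w0, data, rho, eta, steps, batch_size, seed):
    """Record per-step mini-batch sharpness along a sequence of perturbations.

    Assumed procedure (hypothetical, for illustration only): at each step,
    sample a mini-batch, perturb w by rho along the normalized batch gradient,
    record the loss increase, then take one descent step using the gradient
    evaluated at the perturbed point.
    """
    rng = random.Random(seed)
    w = list(w0)
    trace = []
    for _ in range(steps):
        xi = rng.sample(data, batch_size)
        g = batch_grad(w, xi)
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:
            trace.append(0.0)
            continue
        eps = [rho * v / norm for v in g]
        w_adv = [a + b for a, b in zip(w, eps)]
        trace.append(batch_loss(w_adv, xi) - batch_loss(w, xi))
        g_adv = batch_grad(w_adv, xi)
        w = [a - eta * b for a, b in zip(w, g_adv)]
    return trace


# Synthetic data (made up for this example).
data_rng = random.Random(0)
data = [([data_rng.gauss(0, 1), data_rng.gauss(0, 1)], data_rng.gauss(0, 1))
        for _ in range(64)]
trace = sharpness_trace([0.5, -0.3], data, rho=0.05, eta=0.05,
                        steps=10, batch_size=8, seed=1)
print([round(s, 4) for s in trace])
```

Under these assumptions, a model in a stable basin would be expected to produce a trace that stays small and steady, while a growing or erratic trace is the kind of signal a dynamic measurement is meant to surface.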

Theoretical grounding. We provide theoretical analysis linking directional sharpness to a formal notion of stability. Under standard assumptions, we show that if the initial model $\bm{w}_0$ is near a stable minimum, then the expected squared sharpness at step $t$ under continuous perturbation is uniformly bounded:

$$\mathbb{E}[s_t^2] \leq \overline{\kappa}_\rho \cdot C \cdot \mathcal{L}_D(\bm{w}_0) \qquad \text{for all } t,$$

where $C > 0$ and $\overline{\kappa}_\rho$ are constants and $\mathcal{L}_D(\bm{w}_0)$ is the training loss of the initial model. Conversely, if the expected squared sharpness grows exponentially, then $\bm{w}_0$ resides in an unstable region, which makes directional sharpness a good indicator of apparently flat but dynamically unstable regions.

We further analyze the robustness of directional sharpness under training deviations. We formalize a class of deviations that induce low gradient coherence, a phenomenon linked to poor generalization, overfitting, and backdoors \[[12](https://arxiv.org/html/2606.25004#bib.bib42), [13](https://arxiv.org/html/2606.25004#bib.bib40), [70](https://arxiv.org/html/2606.25004#bib.bib41)\]. For this class of deviations, we show that directional sharpness can detect instability signals that static measures may miss.

We complement our analysis with empirical evaluations showing that directional sharpness \(i\) correlates with generalization more strongly than static sharpness metrics, and \(ii\) more consistently detects low\-quality models under training deviations like overfitting and backdoors\.

Empirical validation and practical certification. We conduct extensive experiments validating directional sharpness for practical model certification. We consider two scenarios: *auditing*, where a trusted verifier has access to training data and computes directional sharpness directly; and *certification*, where an external verifier, without accessing training data, checks a zero-knowledge proof that a potentially adversarial provider’s claimed directional sharpness value was computed correctly. We validate three properties that make directional sharpness suitable for both scenarios:

- *strong correlation with generalization*: In Section [4.5](https://arxiv.org/html/2606.25004#S4.SS5), we show that directional sharpness correlates significantly more strongly with generalization than even the best-performing static metrics.
- *reliability under training deviations*: Existing generalization metrics have been evaluated only in the context of honestly trained models. However, for many certification settings, it is important that directional sharpness remain correlated with generalization even under training deviations. In Section [5](https://arxiv.org/html/2606.25004#S5), we evaluate directional sharpness on models with both intentional deviations (e.g., injected backdoors) and unintentional ones (e.g., overfitting). Directional sharpness reliably flags these models even when both test accuracy and static sharpness fail, confirming that dynamic stability measurement is more sensitive to subtle forms of poor generalization.
- *efficiency*: In the auditing setting, directional sharpness is $4\times$ faster to compute than test accuracy. In certification, its zero-knowledge proof is up to $80{,}000\times$ faster than proving training correctness step-by-step (88.87 minutes versus an estimated 118,671 hours for VGG-11 over 150 epochs).
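The speedup quoted for certification can be checked against the two timings given in the efficiency item. The short Python snippet below only redoes that arithmetic; the variable names are hypothetical and the inputs are the figures stated in the text.

```python
# Timings as stated in the text (VGG-11, 150 epochs).
zk_proof_minutes = 88.87          # zero-knowledge proof of directional sharpness
stepwise_proof_hours = 118_671    # estimated cost of proving training step by step

# Convert hours to minutes so both quantities share a unit, then take the ratio.
speedup = stepwise_proof_hours * 60 / zk_proof_minutes
print(f"speedup is roughly {speedup:,.0f}x")
```

The ratio comes out slightly above 80,000, in line with the stated figure.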

### 1.2 Related Work

ML model auditing and certification. We distinguish between the terms *model auditing* and *model certification* based on the verifier’s ability to access the training data. Specifically, we define *model auditing* as a setting where the verifier has access to the training data. Many existing techniques for checking specific trustworthy properties of ML models fall under this category: *fairness* auditing ensures models do not discriminate against particular groups \[[37](https://arxiv.org/html/2606.25004#bib.bib93)\]; *differential privacy* auditing \[[5](https://arxiv.org/html/2606.25004#bib.bib104)\] empirically tests the theoretical guarantee that a model’s output hides the presence of individual training examples; and *robustness* auditing \[[16](https://arxiv.org/html/2606.25004#bib.bib100), [89](https://arxiv.org/html/2606.25004#bib.bib101), [52](https://arxiv.org/html/2606.25004#bib.bib102), [44](https://arxiv.org/html/2606.25004#bib.bib103)\] aims to provide provable guarantees against adversarial examples.

Conversely, we define *model certification* as a setting where the verifier has no access to the training data. Many works in privacy-preserving ML adopt this setting, where cryptographic techniques like zero-knowledge proofs allow the verifier to *certify* trustworthy properties while preserving the privacy of the training data. Existing works include certification of properties such as fairness \[[72](https://arxiv.org/html/2606.25004#bib.bib13), [26](https://arxiv.org/html/2606.25004#bib.bib94), [92](https://arxiv.org/html/2606.25004#bib.bib95), [97](https://arxiv.org/html/2606.25004#bib.bib99), [47](https://arxiv.org/html/2606.25004#bib.bib97), [65](https://arxiv.org/html/2606.25004#bib.bib98), [81](https://arxiv.org/html/2606.25004#bib.bib96)\], differential-privacy guarantees \[[71](https://arxiv.org/html/2606.25004#bib.bib14)\], and vicinity to an optimum \[[79](https://arxiv.org/html/2606.25004#bib.bib89)\]. Beyond property-specific guarantees, zero-knowledge proofs of training (zkPoT) allow the verifier to certify that a model was obtained by faithfully running a prescribed training algorithm \[[77](https://arxiv.org/html/2606.25004#bib.bib91), [1](https://arxiv.org/html/2606.25004#bib.bib81), [30](https://arxiv.org/html/2606.25004#bib.bib90)\].

Generalization and sharpness. While existing methods can certify the *integrity* of the training process or specific properties, they do not directly certify the most fundamental measure of model quality: its ability to generalize to unseen data. Theoretically, a generalization certificate is a tight, computable upper bound on the population risk \[[66](https://arxiv.org/html/2606.25004#bib.bib62)\]. Many works have focused on deriving such generalization bounds for modern ML models \[[58](https://arxiv.org/html/2606.25004#bib.bib57), [59](https://arxiv.org/html/2606.25004#bib.bib63), [2](https://arxiv.org/html/2606.25004#bib.bib20), [20](https://arxiv.org/html/2606.25004#bib.bib36), [36](https://arxiv.org/html/2606.25004#bib.bib25), [24](https://arxiv.org/html/2606.25004#bib.bib59), [63](https://arxiv.org/html/2606.25004#bib.bib61), [83](https://arxiv.org/html/2606.25004#bib.bib52), [61](https://arxiv.org/html/2606.25004#bib.bib60), [21](https://arxiv.org/html/2606.25004#bib.bib58)\]. However, existing generalization bounds are often vacuous or too loose when applied to practical neural networks, and thus cannot certify generalization in practice.

A parallel line of research has studied empirical proxies for generalization. Among these, the geometry of the loss landscape, specifically its *sharpness*, has long been linked to generalization, with flatter minima generalizing better \[[45](https://arxiv.org/html/2606.25004#bib.bib27), [43](https://arxiv.org/html/2606.25004#bib.bib24)\]. This connection has motivated optimization techniques, such as SAM, that explicitly seek flatter minima to improve generalization \[[25](https://arxiv.org/html/2606.25004#bib.bib23), [50](https://arxiv.org/html/2606.25004#bib.bib22)\]. While sharpness has been studied extensively as a generalization proxy, to the best of our knowledge, it has not been explored as a practical certificate of generalization.

## 2 Preliminaries

### 2.1 Machine Learning

In this paper, we consider a generalized framework for supervised learning where a model is parameterized by $\bm{w} \in \mathbb{R}^p$. Let $D = \{(\bm{x}_i, y_i)\}_{i=1}^N$ denote the training dataset with inputs $\bm{x}_i \in \mathbb{R}^d$ and labels $y_i \in \mathbb{R}$. A loss function $\ell(\bm{w}, (\bm{x}, y))$ measures the quality of the model’s prediction on a data point. The training loss is $\mathcal{L}_D(\bm{w}) = \frac{1}{N}\sum_{i=1}^N \ell(\bm{w}, (\bm{x}_i, y_i))$; when the dataset is clear from context, we abbreviate $\mathcal{L}(\bm{w}) := \mathcal{L}_D(\bm{w})$ and $\ell_i(\bm{w}) := \ell(\bm{w}, (\bm{x}_i, y_i))$. A training algorithm $\mathsf{Train}_{\mathcal{L}_D} : \mathcal{R} \rightarrow \mathbb{R}^p$ takes a random seed $r \in \mathcal{R}$ and outputs a model $\bm{w}$ that approximately minimizes $\mathcal{L}_D(\bm{w})$. We write $g(\bm{w}) := \nabla\mathcal{L}_D(\bm{w})$ for the full-dataset gradient and $H(\bm{w})$ for the Hessian, dropping $\bm{w}$ when clear from context.

In practice, optimization uses mini-batches $\xi \subset D$ of size $B = |\xi|$ sampled uniformly at random. We write $\mathcal{L}_\xi(\bm{w})$ for the loss on $\xi$, and define the stochastic gradient as $g_\xi(\bm{w}) := \nabla\mathcal{L}_\xi(\bm{w})$. The gradient noise is $\zeta(\bm{w}, \xi) := g_\xi(\bm{w}) - \nabla\mathcal{L}_D(\bm{w})$.
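To make the notation concrete, the sketch below computes the full-dataset gradient $g(\bm{w})$, a stochastic gradient $g_\xi(\bm{w})$, and the gradient noise $\zeta(\bm{w}, \xi)$ for a toy least-squares loss in plain Python. The squared loss, the synthetic dataset, and the helper names are assumptions chosen for illustration; the paper does not fix a particular loss here.

```python
import random


def per_example_grad(w, x, y):
    """Gradient of l(w, (x, y)) = 0.5 * (w . x - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def mean_grad(w, examples):
    """Average per-example gradient over a dataset or a mini-batch."""
    total = [0.0] * len(w)
    for x, y in examples:
        for j, gj in enumerate(per_example_grad(w, x, y)):
            total[j] += gj
    return [t / len(examples) for t in total]


rng = random.Random(0)
D = [([rng.gauss(0, 1), rng.gauss(0, 1)], rng.gauss(0, 1)) for _ in range(64)]
w = [0.5, -0.3]
B = 8

g_full = mean_grad(w, D)                         # g(w): full-dataset gradient
xi = rng.sample(D, B)                            # one uniformly sampled mini-batch
g_xi = mean_grad(w, xi)                          # g_xi(w): stochastic gradient
zeta = [a - b for a, b in zip(g_xi, g_full)]     # gradient noise zeta(w, xi)

# Sanity check: over a partition of D into equal-size batches,
# the noise vectors sum to zero, so the stochastic gradient is unbiased.
batches = [D[i:i + B] for i in range(0, len(D), B)]
noise_sum = [0.0, 0.0]
for batch in batches:
    g_b = mean_grad(w, batch)
    noise_sum = [s + (a - b) for s, a, b in zip(noise_sum, g_b, g_full)]
print(zeta, noise_sum)
```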

### 2.2 Generalization and Sharpness

Following the standard statistical learning framework \[[2](https://arxiv.org/html/2606.25004#bib.bib20), [62](https://arxiv.org/html/2606.25004#bib.bib51)\], we assume training examples $z_i := (\bm{x}_i, y_i)$ are drawn i.i.d. from a fixed but unknown distribution $\mathcal{P}$ over an instance space $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$. The *population risk* of a model $\bm{w}$ is $R(\bm{w}) = \mathbb{E}_{z \sim \mathcal{P}}[\ell(\bm{w}, z)]$, which cannot be computed directly since $\mathcal{P}$ is unknown. Instead, we estimate it using the *empirical risk*:

$$\hat{R}(\bm{w}) = \frac{1}{N}\sum_{i=1}^N \ell(\bm{w}, z_i),$$

which can be computed directly on the training set.

The core challenge in learning is that a model with low empirical risk might not necessarily have a low population risk. The *generalization gap*, $\operatorname{Gap}(\bm{w}) = R(\bm{w}) - \hat{R}(\bm{w})$, quantifies how well training performance predicts test performance. A model generalizes well when this gap is small. In Section [3](https://arxiv.org/html/2606.25004#S3), we discuss how to certify generalization given that the true gap is unobservable.
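The gap is unobservable in practice because $\mathcal{P}$ is unknown, but it can be computed in a synthetic setting where the distribution is chosen by hand. The Python sketch below does this for a one-dimensional linear model with squared loss; the data-generating process, its constants, and all function names are assumptions made purely for illustration.

```python
import random

TRUE_SLOPE = 2.0   # assumed ground-truth relation y = 2x + noise
NOISE_STD = 0.5    # assumed label-noise standard deviation


def sample(rng, n):
    """Draw n i.i.d. examples with x ~ N(0, 1) and y = TRUE_SLOPE * x + noise."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        out.append((x, TRUE_SLOPE * x + rng.gauss(0.0, NOISE_STD)))
    return out


def empirical_risk(w, data):
    """R_hat(w): mean squared error on the given sample."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def population_risk(w):
    """R(w) in closed form: (w - slope)^2 * E[x^2] + noise variance, with E[x^2] = 1."""
    return (w - TRUE_SLOPE) ** 2 + NOISE_STD ** 2


rng = random.Random(0)
train = sample(rng, 10)

# Least-squares fit on the (small) training set.
w_hat = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

gap = population_risk(w_hat) - empirical_risk(w_hat, train)
print(f"w_hat={w_hat:.3f}  R_hat={empirical_risk(w_hat, train):.3f}  "
      f"R={population_risk(w_hat):.3f}  gap={gap:.3f}")
```

With only ten training points, the fitted model typically achieves a lower empirical risk than the true slope does on the same sample, while its population risk can never fall below the noise variance.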

Generalization metrics and sharpness. A generalization metric, or *complexity measure* \[[43](https://arxiv.org/html/2606.25004#bib.bib24), [36](https://arxiv.org/html/2606.25004#bib.bib25), [20](https://arxiv.org/html/2606.25004#bib.bib36), [84](https://arxiv.org/html/2606.25004#bib.bib82)\], is a quantity that monotonically relates to generalization and is computable without access to a test set. Among the many proposed generalization metrics, one of the most prominent and empirically successful is the *sharpness* of the model \[[45](https://arxiv.org/html/2606.25004#bib.bib27), [25](https://arxiv.org/html/2606.25004#bib.bib23), [50](https://arxiv.org/html/2606.25004#bib.bib22)\]. In this paper, we focus on perturbation-based (worst-case) sharpness:

###### Definition 1 (Worst-Case Sharpness).

Given a loss function $\mathcal{L}_D$ parameterized by the training data $D$, a radius $\rho > 0$, and norm $\|\cdot\|_p$, the *worst-case sharpness* of parameters $\bm{w}$ is defined as:

$$\mathsf{S}_{\rho,p}(\bm{w}, D) := \max_{\|\bm{\epsilon}\|_p \leq \rho} \left(\mathcal{L}_D(\bm{w} + \bm{\epsilon}) - \mathcal{L}_D(\bm{w})\right).$$
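The inner maximization in Definition 1 is generally not solvable in closed form. One common approximation, used by SAM for $p = 2$, replaces the maximizer with the first-order choice $\bm{\epsilon} \approx \rho\, g(\bm{w}) / \|g(\bm{w})\|_2$. The Python sketch below evaluates the loss increase at that perturbation for a toy least-squares loss. Since this $\bm{\epsilon}$ is feasible, the result is a lower bound on $\mathsf{S}_{\rho,2}$, not its exact value; the loss, data, and function names are assumptions for illustration.

```python
import math
import random


def loss(w, data):
    """L_D(w): mean of 0.5 * (w . x - y)^2 over the dataset."""
    return sum(0.5 * (sum(a * b for a, b in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)


def grad(w, data):
    """Full-dataset gradient of the loss above."""
    g = [0.0] * len(w)
    for x, y in data:
        residual = sum(a * b for a, b in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / len(data)
    return g


def sharpness_first_order(w, data, rho):
    """Loss increase at eps = rho * g / ||g||_2 (a feasible point of the l2 ball)."""
    g = grad(w, data)
    norm = math.sqrt(sum(v * v for v in g))
    if norm == 0.0:
        return 0.0  # no first-order ascent direction at a stationary point
    eps = [rho * v / norm for v in g]
    w_perturbed = [a + b for a, b in zip(w, eps)]
    return loss(w_perturbed, data) - loss(w, data)


rng = random.Random(0)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], rng.gauss(0, 1)) for _ in range(32)]
w = [0.5, -0.3]

s_small = sharpness_first_order(w, data, rho=0.05)
s_large = sharpness_first_order(w, data, rho=0.5)
print(s_small, s_large)
```

For this convex quadratic loss the estimate is non-negative and grows with the radius $\rho$, as expected from the definition.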

This definition, however, is sensitive to parameter re-scaling \[[17](https://arxiv.org/html/2606.25004#bib.bib56), [50](https://arxiv.org/html/2606.25004#bib.bib22)\], which can be exploited to make a sharp minimum appear artificially flat. This led to a magnitude-aware worst-case version invariant to such re-scaling:

###### Definition 2 (Magnitude-Aware Worst-Case Sharpness \[[50](https://arxiv.org/html/2606.25004#bib.bib22)\]).

Given a loss function $\mathcal{L}_D$ parameterized by the training data $D$, a radius $\rho > 0$, and norm $\|\cdot\|_p$, denote $\mathbf{T}_{\bm{w}} := \mathrm{diag}(|\bm{w}|)$. The *magnitude-aware worst-case sharpness* of parameters $\bm{w}$ is:

$$\mathsf{S}^{\mathsf{mag}}_{\rho,p}(\bm{w}, D) := \max_{\|\mathbf{T}^{-1}_{\bm{w}}\bm{\epsilon}\|_p \leq \rho} \left(\mathcal{L}_D(\bm{w} + \bm{\epsilon}) - \mathcal{L}_D(\bm{w})\right).$$
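The effect of the $\mathbf{T}_{\bm{w}}$ scaling can be seen on a toy model whose loss depends on two parameters only through their product, so that $(w_1, w_2)$ and $(a w_1, w_2 / a)$ represent the same function. The Python sketch below compares first-order estimates of both definitions for $p = 2$; for the magnitude-aware version, substituting $\bm{\epsilon} = \mathbf{T}_{\bm{w}}\bm{\delta}$ with $\|\bm{\delta}\|_2 \leq \rho$ gives the first-order perturbation $\bm{\epsilon} = \rho\, \mathbf{T}_{\bm{w}}^2 g / \|\mathbf{T}_{\bm{w}} g\|_2$. The toy loss, the chosen parameter values, and the function names are assumptions for illustration.

```python
import math


def loss(w):
    """Toy loss that depends on w only through the product w[0] * w[1]."""
    return (w[0] * w[1] - 1.0) ** 2


def grad(w):
    c = 2.0 * (w[0] * w[1] - 1.0)
    return [c * w[1], c * w[0]]


def plain_sharpness(w, rho):
    """First-order estimate of Definition 1 (p = 2): eps = rho * g / ||g||."""
    g = grad(w)
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        return 0.0
    eps = [rho * v / norm for v in g]
    return loss([a + b for a, b in zip(w, eps)]) - loss(w)


def magnitude_aware_sharpness(w, rho):
    """First-order estimate of Definition 2 (p = 2): eps = rho * T^2 g / ||T g||."""
    g = grad(w)
    tg = [abs(wi) * gi for wi, gi in zip(w, g)]        # T_w g
    norm = math.hypot(tg[0], tg[1])
    if norm == 0.0:
        return 0.0
    eps = [rho * abs(wi) * t / norm for wi, t in zip(w, tg)]  # T_w (T_w g) / ||T_w g||
    return loss([a + b for a, b in zip(w, eps)]) - loss(w)


w_a = [2.0, 0.25]      # product 0.5
w_b = [8.0, 0.0625]    # same product after re-scaling by a = 4
rho = 0.1

plain_a, plain_b = plain_sharpness(w_a, rho), plain_sharpness(w_b, rho)
mag_a, mag_b = magnitude_aware_sharpness(w_a, rho), magnitude_aware_sharpness(w_b, rho)
print(plain_a, plain_b)   # differ although both points define the same function
print(mag_a, mag_b)       # agree up to floating-point error
```

In this toy case the plain estimate changes substantially under re-scaling, while the magnitude-aware estimate does not, which is the invariance the definition is designed to provide.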

### 2.3 Zero-Knowledge Proofs

A zero-knowledge proof (ZKP) of circuit satisfiability allows a prover holding a witness $w$ to prove to a verifier that $\mathcal{C}(w) = 1$ for some public circuit $\mathcal{C}$ without revealing any information about $w$. Many existing ZKP protocols can be used for proving directional sharpness, including zkSNARKs \[[33](https://arxiv.org/html/2606.25004#bib.bib8), [15](https://arxiv.org/html/2606.25004#bib.bib7), [27](https://arxiv.org/html/2606.25004#bib.bib9), [51](https://arxiv.org/html/2606.25004#bib.bib15)\] and proofs based on vector oblivious linear evaluation (VOLE) \[[87](https://arxiv.org/html/2606.25004#bib.bib2), [11](https://arxiv.org/html/2606.25004#bib.bib6), [18](https://arxiv.org/html/2606.25004#bib.bib1), [10](https://arxiv.org/html/2606.25004#bib.bib5), [9](https://arxiv.org/html/2606.25004#bib.bib3), [93](https://arxiv.org/html/2606.25004#bib.bib4)\].

Zero-knowledge proofs of training. Zero-knowledge proofs of training (zkPoTs) \[[77](https://arxiv.org/html/2606.25004#bib.bib91), [1](https://arxiv.org/html/2606.25004#bib.bib81), [30](https://arxiv.org/html/2606.25004#bib.bib90), [79](https://arxiv.org/html/2606.25004#bib.bib89)\] allow a prover to prove that a model is trained correctly on a committed dataset without revealing any additional information about the model or the dataset. Applications of zkPoTs include distributed model training, privacy auditing, and proving ownership \[[91](https://arxiv.org/html/2606.25004#bib.bib88), [1](https://arxiv.org/html/2606.25004#bib.bib81), [71](https://arxiv.org/html/2606.25004#bib.bib14)\].

## 3 Certification of Model Generalization

### 3.1 Certification Threat Model

We now motivate and formalize what it means to certify generalization. The goal of certification is to generate a (short) certificate that enables efficient verification of the certified property. Thus, there are two main steps in the certification process: computation of the certificate (proof) and verification of the certificate. The roles, the threat models, and the available inputs for the parties that execute these steps may differ in different applications. We distinguish between two main settings: *model auditing* and *model certification*.

In model auditing, the verifier is trusted by the model provider to have access to the training data\. This includes settings where the model provider uses generalization certificates as a quality metric for candidate model checkpoints and to detect unintentional training issues such as overfitting or memorization\. Since the model provider is the natural consumer of the generalization certificates, the threat model for the soundness of the certificates assumes that training has been done honestly\.

In external auditing applications, the verifier is a third party, but the trust framework for the auditor provides access to training data as well\. In this case, the model provider may or may not be trusted to execute the training honestly\.

On the other hand, in model certification, the verifier and the model provider (acting now as a cryptographic *prover*) are mutually distrustful. Thus, the verifier is not given access to the training data, and the prover is not trusted to follow any prescribed steps for the training. In this setting, the prover certifies a claim about the model relative to a committed training dataset $D$. We assume that $D$ is authenticated, so the prover cannot arbitrarily craft, replace, or modify the dataset used for certification. This assumption is consistent with existing proof-of-training literature \[[77](https://arxiv.org/html/2606.25004#bib.bib91), [1](https://arxiv.org/html/2606.25004#bib.bib81), [30](https://arxiv.org/html/2606.25004#bib.bib90), [79](https://arxiv.org/html/2606.25004#bib.bib89)\] and is natural in practical certification settings, where the authenticity of $D$ can be enforced by external authentication, trusted data sources, or additional proofs. Attacks in which the provider can freely choose or forge the committed dataset are therefore outside our threat model.

Bearing in mind the above certification applications, we proceed to describe our approach to certifying generalization and our security goals\. The first required step towards computing a generalization certificate is the ability to measure this property\. Next, we discuss the challenges to quantifying generalization\.

### 3.2 Quantifying Generalization

As discussed in Section [1](https://arxiv.org/html/2606.25004#S1), for “certifying generalization” to have practical meaning, it cannot mean proving the exact generalization gap or a loose theoretical bound. Instead, we seek to certify an empirically motivated *generalization metric* that is computable and has direct, actionable implications in practice. We now formalize what it means to certify generalization via such metrics.

Metric-based certification. In our solution, we adopt a metric-based approach in which the generalization certificate for the model is based on a generalization metric computed on the final model. Specifically, a *generalization metric* is an efficiently computable (randomized) function $\gamma_{\mathcal{L}}$ mapping $(\bm{w}, D, r)$ to a value $\gamma_{\mathcal{L}}(\bm{w}, D, r) \in \mathbb{R}_{\geq 0}$, where $D$ and $\bm{w}$ denote the training set and trained model held by the model provider (prover), $\mathcal{L}$ is a public loss function, and $r$ is a public random seed. Smaller values of $\gamma_{\mathcal{L}}(\bm{w}, D, r)$ are interpreted as stronger evidence of generalization.

Therefore, by *certificate of generalization* we mean a certificate (proof) of a predicate on this metric:

$$\mathsf{Pred}_\tau(\bm{w}, D, r) \equiv [\gamma_{\mathcal{L}}(\bm{w}, D, r) \leq \tau],$$

where the threshold $\tau$ and seed $r$ are public. When instantiated with a ZKP with knowledge soundness, this means that any computationally bounded prover who convinces the verifier must know a model $\bm{w}$ and dataset $D$ satisfying $\mathsf{Pred}_\tau(\bm{w}, D, r)$. Thus, the prover cannot falsely claim a lower metric value for the model $\bm{w}$ and its committed dataset $D$. Importantly, the role of certification here is to verify the value of $\gamma_{\mathcal{L}}(\bm{w}, D, r)$ and not to prove the link between $\gamma_{\mathcal{L}}$ and the true generalization gap.

### 3.3 Desiderata of Generalization Metrics in Certification

Having defined metric\-based certification, we now ask what properties a generalization metric must satisfy to serve as the basis of a certificate\.

Existing evaluation method. Because generalization does not have an observable “ground truth”, evaluating how well a metric predicts generalization must be done through some proxy. Most empirical work evaluates a candidate metric by training a large grid of models, computing the metric on each model, and measuring its correlation with an empirical proxy for generalization, most commonly the *empirical generalization gap* measured on a holdout test set (e.g., \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\]). Metrics with stronger and more consistent correlation are viewed as more predictive of generalization.

Evaluating metrics for certification. However, existing metrics are primarily proposed as *explanatory* tools for understanding generalization in settings where models are trained honestly and evaluation data is benign. In certification, the training procedure may deviate from the nominal one, either accidentally (e.g., label noise, shortcut learning, or distributional artifacts) or intentionally (e.g., data poisoning or backdoors). Thus, a metric used for certification requires stronger evidence than correlation on honestly trained models alone.

Specifically, we propose the following desiderata that a metric should satisfy for the resulting certificate to be reliable\.

1. Correlated with generalization. As an empirical generalization metric, it must correlate strongly with the empirical generalization gap across a diverse set of models. This is the established baseline for any generalization metric.
2. Predictive despite training deviations. The metric should continue to track generalization even when the training process deviates from the nominal procedure (intentionally or accidentally). Concretely, this means it should be able to identify training deviations that preserve low training loss but degrade generalization, such as overfitting, memorization, spurious shortcuts, or adversarial behaviors that are only triggered under specific conditions.
3. Efficiently computable. As a practical certification tool, it must be computationally efficient, requiring at most a handful of mini-batch operations and avoiding full-dataset or Hessian-level costs.

On robustness to training deviations. While a generalization metric used for certification should ideally provide guarantees against any training deviations, we note that formalizing such guarantees faces fundamental challenges. First, the relationship between training deviations and generalization is not well defined. Generalization, in the strict learning-theoretic sense, is defined with respect to a fixed data distribution $\mathcal{P}$. For example, a backdoor model might behave maliciously only on attacker-chosen trigger inputs that lie outside the support of $\mathcal{P}$; such behavior may not increase the generalization gap as classically defined. Second, ruling out *all* strategies that fool a metric would require quantifying over unconstrained adversarial objectives. Some objectives may be statistically independent of the training set and therefore cannot be detected from the model and training data alone; without additional assumptions, universal detection can be impossible in a strong sense.

Given these challenges, we instead take a principled approach to validate our metric. In Section [4.4](https://arxiv.org/html/2606.25004#S4.SS4), we provide analysis of a class of deviations characterized by *low gradient coherence* (a property empirically linked to poor generalization and known deviations) and show that such deviations can remain invisible to existing metrics yet can be detected by directional sharpness. In Section [5](https://arxiv.org/html/2606.25004#S5), we complement our analysis with extensive experimental evidence that directional sharpness reliably flags models with known deviations even when test accuracy and static sharpness do not.

## 4 Defining and Validating Directional Sharpness

Section [3](https://arxiv.org/html/2606.25004#S3) formalized certifying generalization and identified three desiderata for generalization metrics: strong correlation, reliability under training deviations, and computational efficiency. In this section, we develop such a metric based on the dynamic stability of the loss landscape. We begin by reviewing the stochastic behavior of the SAM optimizer and deriving our metric from its dynamics (Sections [4.1](https://arxiv.org/html/2606.25004#S4.SS1) and [4.2](https://arxiv.org/html/2606.25004#S4.SS2)), then provide theoretical grounding (Sections [4.3](https://arxiv.org/html/2606.25004#S4.SS3) and [4.4](https://arxiv.org/html/2606.25004#S4.SS4)), and present empirical validation (Section [4.5](https://arxiv.org/html/2606.25004#S4.SS5)).

### 4.1 From SAM to Dynamic Stability

The starting point of our generalization metric is Sharpness-Aware Minimization (SAM) \[[25](https://arxiv.org/html/2606.25004#bib.bib23)\]. SAM seeks flat minima by minimizing both the loss and its worst-case sharpness (Def. [1](https://arxiv.org/html/2606.25004#Thmdefinition1)), yielding the min-max objective:

$$\min_{\bm{w}} \max_{\|\bm{\epsilon}\|_{p} \leq \rho} \mathcal{L}_{D}(\bm{w} + \bm{\epsilon}). \tag{1}$$

Solving the inner maximization exactly is intractable, so SAM approximates it via a first-order Taylor expansion: $\mathcal{L}_{D}(\bm{w} + \bm{\epsilon}) \approx \mathcal{L}_{D}(\bm{w}) + \bm{\epsilon}^{\top} \nabla\mathcal{L}_{D}(\bm{w})$. The inner problem then reduces to $\max_{\|\bm{\epsilon}\|_{p} \leq \rho} \bm{\epsilon}^{\top} \nabla\mathcal{L}_{D}(\bm{w})$, which is a dual norm problem with the closed-form solution (for $1 < p < \infty$ and $\nabla\mathcal{L}_{D}(\bm{w}) \neq 0$):

$$\hat{\bm{\epsilon}}(\bm{w}) = \rho\, \frac{\text{sign}(\nabla\mathcal{L}_{D}(\bm{w}))\, |\nabla\mathcal{L}_{D}(\bm{w})|^{q-1}}{\|\nabla\mathcal{L}_{D}(\bm{w})\|_{q}^{\,q-1}}, \tag{2}$$

where $q$ satisfies $\frac{1}{p} + \frac{1}{q} = 1$. Substituting this back yields:

###### Definition 3 (Approximated Sharpness \[[25](https://arxiv.org/html/2606.25004#bib.bib23)\]).

Let $\mathcal{L}_{D}$ be the loss, and let $\nabla\mathcal{L}_{D}(\bm{w})$ be its gradient. Given a radius $\rho > 0$ and a pair of dual $p, q$ norms (where $\frac{1}{p} + \frac{1}{q} = 1$), the approximated sharpness or SAM sharpness of model $\bm{w}$ is:

$$\mathsf{S}_{\rho,p}^{\mathsf{sam}}(\bm{w}, D) := \nabla\mathcal{L}_{D}(\bm{w})^{\top} \hat{\bm{\epsilon}}(\bm{w}) = \rho \|\nabla\mathcal{L}_{D}(\bm{w})\|_{q}.$$
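The closed form in Eq. (2) and the identity in Definition 3 can be checked numerically. The following is a minimal sketch, assuming a hypothetical gradient vector and the illustrative choices $p = 3$ and $\rho = 0.05$; the helper names `lp_norm`, `sam_perturbation`, and `sam_sharpness` are ours and do not come from any library.

```python
import math

def lp_norm(v, p):
    """Return the l_p norm of a list of floats (1 <= p < infinity)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def sam_perturbation(grad, rho, p):
    """Closed-form maximizer of eps^T grad subject to ||eps||_p <= rho (Eq. 2)."""
    q = p / (p - 1.0)  # dual exponent, so that 1/p + 1/q = 1
    denom = lp_norm(grad, q) ** (q - 1.0)
    return [rho * math.copysign(abs(g) ** (q - 1.0), g) / denom for g in grad]

def sam_sharpness(grad, rho, p):
    """First-order SAM sharpness rho * ||grad||_q (Definition 3)."""
    q = p / (p - 1.0)
    return rho * lp_norm(grad, q)

grad = [0.3, -1.2, 0.5, 2.0]  # hypothetical gradient vector (assumption)
rho, p = 0.05, 3.0            # illustrative radius and norm (assumption)

eps = sam_perturbation(grad, rho, p)
inner = sum(g * e for g, e in zip(grad, eps))  # grad^T eps_hat

print(lp_norm(eps, p), rho)                    # the perturbation lies on the l_p sphere
print(inner, sam_sharpness(grad, rho, p))      # both sides of Definition 3
```

The two printed pairs should agree up to floating-point error: the perturbation saturates the constraint $\|\hat{\bm{\epsilon}}\|_{p} = \rho$, and the inner product equals $\rho \|\nabla\mathcal{L}_{D}(\bm{w})\|_{q}$.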

ASAM \[[50](https://arxiv.org/html/2606.25004#bib.bib22)\] proposes using the magnitude-aware worst-case sharpness (Def. [2](https://arxiv.org/html/2606.25004#Thmdefinition2)) as a more robust objective; this yields:

###### Definition 4 (Approximated Magnitude-Aware Sharpness \[[50](https://arxiv.org/html/2606.25004#bib.bib22)\]).

Let $\mathbf{T}_{\bm{w}}$ denote $\textrm{diag}(|\bm{w}|)$. Given a radius $\rho > 0$ and a pair of dual $p, q$ norms (where $\frac{1}{p} + \frac{1}{q} = 1$), the magnitude-aware approximated sharpness or ASAM sharpness of $\bm{w}$ is:

$$\mathsf{S}_{\rho,p}^{\mathsf{asam}}(\bm{w}, D) := \max_{\|\mathbf{T}^{-1}_{\bm{w}} \bm{\epsilon}\|_{p} \leq \rho} \left(\nabla\mathcal{L}_{D}(\bm{w})^{\top} \bm{\epsilon}\right) = \rho \|\mathbf{T}_{\bm{w}} \nabla\mathcal{L}_{D}(\bm{w})\|_{q}.$$
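For $p = q = 2$, the maximizer in Definition 4 has the explicit form $\bm{\epsilon} = \rho\, \mathbf{T}_{\bm{w}}^{2} \nabla\mathcal{L}_{D}(\bm{w}) / \|\mathbf{T}_{\bm{w}} \nabla\mathcal{L}_{D}(\bm{w})\|_{2}$, obtained by substituting $\bm{\epsilon} = \mathbf{T}_{\bm{w}} \bm{u}$ with $\|\bm{u}\|_{2} \leq \rho$. The sketch below checks this on hypothetical weights and gradients (all values and function names are illustrative assumptions, and the weights are assumed nonzero so that $\mathbf{T}_{\bm{w}}^{-1}$ exists). It also illustrates the magnitude-aware property: scaling the weights by $c$ while the gradient scales by $1/c$ leaves the value unchanged.

```python
import math

def asam_sharpness_l2(weights, grad, rho):
    """ASAM sharpness rho * ||T_w grad||_2 with T_w = diag(|w|) (Definition 4, p = q = 2)."""
    return rho * math.sqrt(sum((abs(w) * g) ** 2 for w, g in zip(weights, grad)))

def asam_perturbation_l2(weights, grad, rho):
    """Maximizer eps = rho * T_w^2 grad / ||T_w grad||_2 under ||T_w^{-1} eps||_2 <= rho."""
    scale = math.sqrt(sum((abs(w) * g) ** 2 for w, g in zip(weights, grad)))
    return [rho * (w * w) * g / scale for w, g in zip(weights, grad)]

weights = [2.0, -0.5, 1.0]  # hypothetical nonzero weights (assumption)
grad = [0.1, 0.4, -0.3]     # hypothetical gradient (assumption)
rho = 0.1

eps = asam_perturbation_l2(weights, grad, rho)
inner = sum(g * e for g, e in zip(grad, eps))                               # grad^T eps
constraint = math.sqrt(sum((e / abs(w)) ** 2 for w, e in zip(weights, eps)))  # ||T_w^{-1} eps||_2

# Rescale weights by c and the gradient by 1/c; T_w grad is unchanged.
c = 10.0
rescaled = asam_sharpness_l2([c * w for w in weights], [g / c for g in grad], rho)

print(inner, asam_sharpness_l2(weights, grad, rho), rescaled, constraint)
```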

SAM effectively makes two approximations: (i) replacing the worst-case perturbation with the gradient direction, and (ii) replacing full-batch loss with mini-batch loss. Recent work reveals that these approximations are not merely computational shortcuts but are *essential* for improving generalization: using full-batch sharpness significantly degrades performance \[[25](https://arxiv.org/html/2606.25004#bib.bib23), [6](https://arxiv.org/html/2606.25004#bib.bib32)\]. This suggests that SAM succeeds not merely because it finds points of low sharpness, but because it finds minima that remain stable under *continuous*, *stochastic* perturbation, two properties that existing static measures lack:

- Continuous perturbation. Static sharpness measures evaluate the loss increase from a single, bounded perturbation, which can be misleading when a minimum is locally flat but adjacent to sharp regions. A sequence of perturbations can incrementally push parameters out of such narrow pockets, revealing the adjacent sharpness through detectable increases in sharpness. This is confirmed by \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\]: applying SAM to an SGD-trained model pushes it from the SGD minimum to a much flatter SAM minimum.
- Stochastic perturbation. Static measures compute sharpness on the full training set and can potentially mask high sharpness on specific subsets. A model that appears flat on the full dataset but exhibits high sharpness on particular data points would likely lead to poor generalization.

These findings motivate our central proposal: we define a generalization metric that measures sharpness not as a static property at a single point, but as a measure of *dynamic stability*, i.e., how consistently flat a model remains under continuous, sharpness-aware perturbation. We detail our approach next.

### 4.2 Our Approach: Dynamic Stability

Algorithm 1: Computing directional sharpness.

Input: Initial model parameters $\bm{w}_{0}$, dataset $D$, public random seed $r$, configuration $\mathcal{C} = (T, B, \Phi, \mathsf{S}, \mathsf{fluc}, \mathcal{L}, \Theta)$.

Output: Directional sharpness measure $\mathcal{F}$.

1. $\bm{w} \leftarrow \bm{w}_{0}$
2. $\mathcal{H} \leftarrow [\,]$
3. for $t \leftarrow 0$ to $T-1$ do
   1. Sample a mini-batch $\xi_{t} \subset D$ of size $B$ via $(r, t)$
   2. $\mathcal{L}_{\xi_{t}} \leftarrow \mathcal{L}_{\xi_{t}}(\bm{w})$
   3. $s_{t} \leftarrow \mathsf{S}(\bm{w}, \xi_{t}, \Theta)$
   4. Append $(\bm{w}, \xi_{t}, s_{t}, \mathcal{L}_{\xi_{t}})$ to $\mathcal{H}$
   5. $\bm{w} \leftarrow \Phi(\bm{w}, \xi_{t}, \Theta)$
4. end for
5. $\mathcal{F} \leftarrow \mathsf{fluc}(\mathcal{H})$
6. return $\mathcal{F}$

We define our generalization metric, directional sharpness, as a measure of a model’s geometric stability under a series of sharpness-aware perturbations.

###### Definition 5 (Directional Sharpness).

Given a model $\bm{w}_{0}$, a dataset $D$, a public random seed $r$, batch size $B$, number of steps $T$, a loss function $\mathcal{L}(\cdot)$, a sharpness function $\mathsf{S}(\cdot)$, a SAM update operator $\Phi_{\Theta}(\cdot)$ with hyperparameters $\Theta$, and a fluctuation statistic $\mathsf{fluc}(\cdot)$, we first generate a probe history $\mathcal{H} = \{h_{0}, \dots, h_{T-1}\}$ by iteratively computing, for $t \in \{0, \dots, T-1\}$, $h_{t} := (\bm{w}_{t}, \xi_{t}, s_{t}, \mathcal{L}_{\xi_{t}})$, where $\xi_{t}$ is a mini-batch of size $B$ drawn uniformly from $D$ using seed $r$, $s_{t} = \mathsf{S}(\bm{w}_{t}, \xi_{t}, \Theta)$, $\mathcal{L}_{\xi_{t}} = \mathcal{L}_{\xi_{t}}(\bm{w}_{t})$, and $\bm{w}_{t+1} := \Phi_{\Theta}(\bm{w}_{t}, \xi_{t})$. The directional sharpness of model $\bm{w}_{0}$ parameterized by $\mathcal{C} = (T, B, \Phi, \mathsf{S}, \mathsf{fluc}, \mathcal{L}, \Theta)$ is defined as:

$$\mathsf{S}^{\mathsf{d}}_{\mathcal{C}}(\bm{w}_{0}, D, r) := \mathsf{fluc}(\mathcal{H}).$$

Algorithm [1](https://arxiv.org/html/2606.25004#algorithm1) shows the procedure for computing directional sharpness. Given a trained model $\bm{w}_{0}$, we apply $T$ steps of SAM-style updates using freshly sampled mini-batches. At each step $t$, we record a per-step sharpness $s_{t}$ on mini-batch $\xi_{t}$, e.g., the SAM sharpness $s_{t} = \mathsf{S}_{\rho,p}^{\mathsf{sam}}(\bm{w}_{t}, \xi_{t})$ or its ASAM counterpart. This probing process yields a time series of sharpness values $\{s_{0}, s_{1}, \dots, s_{T-1}\}$. We define the directional sharpness metric using a statistic that quantifies the stability of this sequence.

Choice of fluctuation statistic. Here $\mathsf{fluc}$ is a function that outputs a final non-negative number quantifying the amount of fluctuation in $\mathcal{H}$. In practice, we found that tracking the raw sharpness values $\{s_{t}\}$ alone can be sensitive to the overall loss scale across models and datasets. We therefore use an *empirical* normalization strategy via $r_{t} := s^{2}_{t} / \mathcal{L}_{\xi_{t}}$ (the subscripted $r_{t}$ denotes this ratio and is distinct from the seed $r$) and set $\mathsf{fluc}(\mathcal{H}) := \mathrm{Std}(\{\log(r_{t} + \varepsilon)\}_{t=0}^{T-1})$ for a small $\varepsilon > 0$. Empirically, this ratio-based statistic is slightly more correlated with generalization than using $\{s_{t}\}$ alone, so we adopt it by default.
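To make the probing loop concrete, the following is a minimal sketch of Algorithm 1 on a toy least-squares problem, using only the Python standard library. Several choices here are assumptions for illustration and are not specified by the text: the squared-error loss, the specific SAM operator $\Phi$ (an $\ell_2$ ascent step of radius $\rho$ followed by a gradient step with learning rate `lr`), the way the mini-batch is derived from $(r, t)$, the use of the population standard deviation, and the guard against a zero mini-batch loss.

```python
import math
import random
import statistics

def batch_loss_and_grad(w, batch):
    """Mean squared-error loss 0.5 * (x.w - y)^2 over a batch, and its gradient."""
    d = len(w)
    loss, grad = 0.0, [0.0] * d
    for x, y in batch:
        resid = sum(xi * wi for xi, wi in zip(x, w)) - y
        loss += 0.5 * resid * resid
        for j in range(d):
            grad[j] += resid * x[j]
    n = len(batch)
    return loss / n, [g / n for g in grad]

def directional_sharpness(w0, data, seed, T=50, B=8, rho=0.05, lr=0.01, eps=1e-12):
    """Toy version of Algorithm 1 with l2 SAM sharpness and fluc = Std(log(r_t + eps))."""
    w = list(w0)
    log_ratios = []
    for t in range(T):
        # Derive the mini-batch from the public pair (seed, t); this derivation is an assumption.
        rng = random.Random(seed * 1_000_003 + t)
        batch = rng.sample(data, B)
        loss, grad = batch_loss_and_grad(w, batch)
        gnorm = math.sqrt(sum(g * g for g in grad))
        s_t = rho * gnorm                       # per-step SAM sharpness (p = q = 2)
        r_t = s_t * s_t / max(loss, 1e-30)      # loss-normalized ratio; the max() guard is ours
        log_ratios.append(math.log(r_t + eps))
        # Assumed SAM step: ascend to w + rho * g / ||g||, then descend with the gradient taken there.
        w_adv = [wi + rho * g / (gnorm + 1e-30) for wi, g in zip(w, grad)]
        _, grad_adv = batch_loss_and_grad(w_adv, batch)
        w = [wi - lr * g for wi, g in zip(w, grad_adv)]
    return statistics.pstdev(log_ratios)

# Hypothetical data: noisy linear targets in 3 dimensions.
data_rng = random.Random(0)
true_w = [1.0, -2.0, 0.5]
data = []
for _ in range(64):
    x = [data_rng.gauss(0.0, 1.0) for _ in range(3)]
    y = sum(a * b for a, b in zip(x, true_w)) + data_rng.gauss(0.0, 0.1)
    data.append((x, y))

score = directional_sharpness(true_w, data, seed=7)
print(score)
```

Because every mini-batch is a deterministic function of the public seed and the step index, two parties running this sketch on the same $(\bm{w}_{0}, D, r)$ obtain the same score, which is the property a verifier relies on when checking $\mathsf{Pred}_{\tau}$.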

The key semantic difference from static sharpness is that directional sharpness tests how a model *responds* to SAM dynamics rather than measuring geometry at a single point. If $\bm{w}_{0}$ resides in a truly flat basin (the type SAM converges to), then applying SAM perturbations will produce stable sharpness values. Conversely, if $\bm{w}_{0}$ resides in a sharp or unstable region, SAM will push the model toward neighboring sharp areas, producing an unstable series. Thus, a low directional sharpness value suggests that the model remains geometrically stable under the SAM probing dynamics, while a high value indicates instability along this trajectory. We validate empirically that this dynamic-stability signal correlates with generalization and provide a theoretical analysis of this connection in the next section.

### 4.3 A Theoretical Foundation for Directional Sharpness

In this section, we provide a theoretical analysis for our directional sharpness metric. We proceed by defining a notion of stability based on recent results on SAM dynamics \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\], then deriving bounds that connect our sharpness measurements to this stability.

Formalizing stability. Recall that the goal of directional sharpness is to verify whether a trained model exhibits stability properties that are predictive of good generalization. We formalize this stability property using the concept of SAM-stability from recent optimization literature, as recent work suggests that SAM’s dynamics capture a generalization-relevant bias stronger than static sharpness measures: among multiple minima of equal static sharpness, SAM exhibits an implicit bias favoring those that generalize better \[[99](https://arxiv.org/html/2606.25004#bib.bib28), [3](https://arxiv.org/html/2606.25004#bib.bib70), [85](https://arxiv.org/html/2606.25004#bib.bib26), [76](https://arxiv.org/html/2606.25004#bib.bib69)\].

Concretely, we consider the model we are certifying to have a small training loss and be near some global minimum $\bm{w}^{*}$. In this regime, the model can be approximated by the local linearization of its prediction function $f$:

$$f_{\text{lin}}(\bm{x}; \bm{w}) = f(\bm{x}; \bm{w}^{*}) + \langle \nabla_{\bm{w}} f(\bm{x}; \bm{w}^{*}),\, \bm{w} - \bm{w}^{*} \rangle.$$

The corresponding linearized loss is:

$$\mathcal{L}_{\text{lin}}(\bm{w}) = \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(f_{\text{lin}}(\bm{x}_{i}; \bm{w}),\, y_{i}\bigr).$$

This linearization allows us to analyze the local geometry of the loss landscape near convergence. Throughout our analysis, we work in this linearized model and write $\mathcal{L}(\bm{w}) := \mathcal{L}_{\text{lin}}(\bm{w})$. We define stability in terms of SAM’s behavior on $\mathcal{L}$:

###### Definition 6 (SAM-Stability).

A minimum $\bm{w}^{*}$ is (linearly) *SAM-stable* if there exists $C > 0$ such that for any starting point $\bm{w}_{0}$ near $\bm{w}^{*}$ where the linearization is valid, applying the SAM optimizer to the loss $\mathcal{L}$ yields a trajectory $\{\bm{w}_{t}\}$ satisfying $\mathbb{E}[\mathcal{L}(\bm{w}_{t})] \leq C \cdot \mathbb{E}[\mathcal{L}(\bm{w}_{0})]$ for all $t \geq 0$. A minimum that is not SAM-stable is called *SAM-unstable*.

Intuitively, a SAM\-stable minimum is one where SAM’s perturbation\-based dynamics remain bounded—the optimizer cannot escape, and the loss stays controlled\. Our central claim is that directional sharpness provides an efficient, empirical test for this stability\. We show that bounded SAM loss implies bounded expected sharpness, while exponential sharpness growth certifies instability\. Thus, a stable basin can yield controlled sharpness along the probing trajectory, and exponential growth rules out a stable basin\.

Linking directional sharpness to stability. We establish this claim in two steps. First, we derive a “sandwich” bound showing that the sharpness $s_{t}$ is tightly coupled to the loss $\mathcal{L}(\bm{w}_{t})$ at each step $t$. Second, we use this bound to characterize how sharpness evolves at stable versus unstable minima.

We first adopt the following assumptions from \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\] and establish that per-step SAM sharpness measurements are coupled to the loss:

###### Assumption 1 (Smoothness and Polyak-Łojasiewicz Condition \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\]).

There exist $\mu > 0$, $L_{s} > 0$ such that $\|H(\bm{w})\|_{2} \leq L_{s}$ and $\|g(\bm{w})\|_{2}^{2} \geq 2\mu\, \mathcal{L}(\bm{w})$ for all $\bm{w}$.

###### Assumption 2 (Bounded Gradient Noise \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\]).

The variance of the gradient noise $\zeta(\bm{w}, \xi) = g_{\xi}(\bm{w}) - g(\bm{w})$ is bounded by the full-dataset loss. That is, there exists a constant $\sigma > 0$ such that for all $\bm{w}$, $\mathbb{E}[\|\zeta(\bm{w}, \xi)\|_{2}^{2}] \leq \sigma^{2} \mathcal{L}(\bm{w})$.

Assumption [1](https://arxiv.org/html/2606.25004#Thmassumption1) is standard and frequently used in non-convex optimization. Assumption [2](https://arxiv.org/html/2606.25004#Thmassumption2) states that mini-batch noise is controlled by the loss value, a phenomenon that has been empirically observed \[[60](https://arxiv.org/html/2606.25004#bib.bib49), [23](https://arxiv.org/html/2606.25004#bib.bib47), [88](https://arxiv.org/html/2606.25004#bib.bib44)\].

###### Lemma 1 (Linear Sandwich Bound).

Let $\{\bm{w}_{t}\}_{t \geq 0}$ be a sequence of models where Assumptions [1](https://arxiv.org/html/2606.25004#Thmassumption1) and [2](https://arxiv.org/html/2606.25004#Thmassumption2) hold. Let $s_{t}$ denote the SAM sharpness (Def. [3](https://arxiv.org/html/2606.25004#Thmdefinition3)) of model $\bm{w}_{t}$ with $p = q = 2$ and radius $\rho$, evaluated over a mini-batch of size $B$ sampled uniformly without replacement from $D$. Then for all $t$:

$$\underline{\kappa}_{\rho} \cdot \mathcal{L}(\bm{w}_{t}) \leq \mathbb{E}\bigl[s_{t}^{2} \,\big|\, \bm{w}_{t}\bigr] \leq \overline{\kappa}_{\rho} \cdot \mathcal{L}(\bm{w}_{t}),$$

where $\underline{\kappa}_{\rho} = 2\rho^{2}\mu$ and $\overline{\kappa}_{\rho} = \rho^{2}\left(2L_{s} + \frac{\sigma^{2}}{B}\right)$.

###### Proof.

See Appendix [B.1](https://arxiv.org/html/2606.25004#A2.SS1). ∎

This lemma shows that the expected squared sharpness value is bounded between constant multiples of the loss, with constants depending on gradient noise and local curvature\. We now present our main theorem showing the link between directional sharpness and SAM\-stability\.
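As a brief sketch of where the lower constant comes from (the full argument, including the upper bound, is in the appendix), assume the mini-batch gradient is unbiased, i.e., $\mathbb{E}_{\xi}[g_{\xi}(\bm{w}_{t})] = g(\bm{w}_{t})$, which holds for uniform sampling. With $p = q = 2$ we have $s_{t} = \rho \|g_{\xi_{t}}(\bm{w}_{t})\|_{2}$, so Jensen's inequality followed by the Polyak-Łojasiewicz condition of Assumption 1 gives:

```latex
\[
\begin{aligned}
\mathbb{E}\bigl[s_t^2 \mid \bm{w}_t\bigr]
  &= \rho^2\, \mathbb{E}_{\xi}\bigl[\|g_{\xi}(\bm{w}_t)\|_2^2\bigr] \\
  &\ge \rho^2\, \bigl\|\mathbb{E}_{\xi}[g_{\xi}(\bm{w}_t)]\bigr\|_2^2
   = \rho^2\, \|g(\bm{w}_t)\|_2^2 \\
  &\ge 2\rho^2 \mu\, \mathcal{L}(\bm{w}_t)
   = \underline{\kappa}_{\rho}\, \mathcal{L}(\bm{w}_t).
\end{aligned}
\]
```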

###### Theorem 1 (Sharpness Dynamics and SAM-Stability).

Let $\{\bm{w}_{t}\}_{t \geq 0}$ be the trajectory generated by applying SAM to $\mathcal{L}$, starting from $\bm{w}_{0}$ near a global minimum $\bm{w}^{*}$. Suppose Assumptions [1](https://arxiv.org/html/2606.25004#Thmassumption1) and [2](https://arxiv.org/html/2606.25004#Thmassumption2) hold, and let $s_{t}$ denote the mini-batch SAM sharpness at step $t$. Then the following statements hold:

(a) Stability implies bounded sharpness. If $\bm{w}^{*}$ is SAM-stable in the sense of Def. [6](https://arxiv.org/html/2606.25004#Thmdefinition6) with constant $C > 0$, then

$$\mathbb{E}[s_{t}^{2}] \leq \overline{\kappa}_{\rho}\, C\, \mathcal{L}(\bm{w}_{0}) \qquad \text{for all } t \geq 0.$$

(b) Exponential sharpness growth certifies instability. Assume $\mathcal{L}(\bm{w}_{0}) > 0$. If there exist $K > 0$, $\alpha > 1$, and $t_{0} \geq 0$ such that for all $t \geq t_{0}$,

$$\mathbb{E}[s_{t}^{2}] \geq K \alpha^{t} \mathcal{L}(\bm{w}_{0}),$$

then $\bm{w}^{*}$ is SAM-unstable.

###### Proof.

See Appendix [B.2](https://arxiv.org/html/2606.25004#A2.SS2). ∎

Interpretation. Theorem [1](https://arxiv.org/html/2606.25004#Thmtheorem1) provides theoretical support for our claim that directional sharpness is directly linked to SAM optimizer dynamics. If the iterate lies in a SAM-stable basin, then the expected sharpness is uniformly bounded (Part (a)). Conversely, exponential growth of the expected sharpness is a sufficient certificate that the iterate is not SAM-stable (Part (b)). While the theorem is one-way, empirically we find that poor generalization and training deviations consistently lead to large directional sharpness scores (Section [5](https://arxiv.org/html/2606.25004#S5)).
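One way to see how the two parts relate (a sketch that need not match the appendix proof) is that Part (b) follows from Part (a) by contradiction. Suppose $\bm{w}^{*}$ were SAM-stable with some constant $C$ while the growth condition of Part (b) also held. Combining the two bounds and dividing by $\mathcal{L}(\bm{w}_{0}) > 0$ gives:

```latex
\[
K \alpha^{t}\, \mathcal{L}(\bm{w}_0)
  \;\le\; \mathbb{E}[s_t^2]
  \;\le\; \overline{\kappa}_{\rho}\, C\, \mathcal{L}(\bm{w}_0)
  \quad \text{for all } t \ge t_0
\qquad \Longrightarrow \qquad
\alpha^{t} \;\le\; \frac{\overline{\kappa}_{\rho}\, C}{K}
  \quad \text{for all } t \ge t_0 .
\]
```

Since $\alpha > 1$, the left-hand side is unbounded in $t$, so the inequality fails once $t > \log(\overline{\kappa}_{\rho} C / K) / \log \alpha$. Hence no finite $C$ can satisfy Definition 6.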

Remark (Extension to ASAM). For clarity, our analysis is stated for SAM sharpness (Def. [3](https://arxiv.org/html/2606.25004#Thmdefinition3)). The same argument extends to ASAM (Def. [4](https://arxiv.org/html/2606.25004#Thmdefinition4)), where the perturbation is constrained by $\|\mathbf{T}_{\bm{w}}^{-1} \bm{\epsilon}\|_{p} \leq \rho$ for a scaling operator $\mathbf{T}_{\bm{w}} := \textrm{diag}(|\bm{w}|)$. In particular, ASAM uses $\mathbf{T}'_{\bm{w}} = \mathbf{T}_{\bm{w}} + \epsilon_{0} I$ with $\epsilon_{0} > 0$ \[[50](https://arxiv.org/html/2606.25004#bib.bib22)\], so the per-coordinate scaling is bounded away from $0$. Assume each weight magnitude is upper bounded, i.e., $\max_{i}(|(\bm{w}_{t})_{i}| + \epsilon_{0}) \leq M$ for all $t \in \{0, \dots, T-1\}$. Then Lemma [1](https://arxiv.org/html/2606.25004#Thmlemma1) holds for ASAM with $\underline{\kappa}_{\rho} = 2\epsilon_{0}^{2}\rho^{2}\mu$ and $\overline{\kappa}_{\rho} = M^{2}\rho^{2}\left(2L_{s} + \frac{\sigma^{2}}{B}\right)$.

### 4.4 Detecting Training Deviations

We now analyze why directional sharpness can detect poor generalization under training deviations that full-batch sharpness measurements miss. Specifically, we formalize a class of training deviations with low gradient coherence, which has been empirically linked to poor generalization behavior and observed in known deviations; we then show that these deviations can be invisible to the full-batch sharpness metric but detectable by directional sharpness.

Gradient coherence measure. Recent work \[[12](https://arxiv.org/html/2606.25004#bib.bib42)\] links flat minima and generalizable solutions to the notion of data coherence: how consistently different training examples “agree” on the local geometry. Given per-example Hessians $H_{i} := \nabla^{2}\ell_{i}(\bm{w})$, the authors define a coherence score over the matrix $\mathbf{S} \in \mathbb{R}^{N \times N}$ with entries $\mathbf{S}_{ij} := \sqrt{\mathrm{Tr}(H_{i}H_{j})}$. Intuitively, $\mathbf{S}_{ij}$ is large when examples $i$ and $j$ have aligned curvature.

Data coherence has also been studied in terms of gradient alignment \[[13](https://arxiv.org/html/2606.25004#bib.bib40), [70](https://arxiv.org/html/2606.25004#bib.bib41)\], which can be captured using the gradient Gram matrix $G$ with entries $G_{ij} := \langle g_{i}(\bm{w}), g_{j}(\bm{w}) \rangle$, where $g_{i}(\bm{w}) := \nabla\ell_{i}(\bm{w})$ is the per-example gradient. Intuitively, large off-diagonal entries in $G$ indicate that two examples have aligned gradients. We can define a coherence score on $G$ by measuring how strongly per-example gradients agree on the average direction:

###### Definition 7 (Gradient Coherence).

Let $g_{i}(\bm{w}) := \nabla\ell_{i}(\bm{w})$ and $g(\bm{w}) := \frac{1}{N} \sum_{i=1}^{N} g_{i}(\bm{w})$. The *gradient coherence* of model $\bm{w}$ on dataset $D$ is:

$$c_{g}(\bm{w}) := \frac{\frac{1}{N^{2}} \sum_{ij} G_{ij}}{\frac{1}{N} \mathrm{Tr}(G)} = \frac{\|g(\bm{w})\|_{2}^{2}}{\frac{1}{N} \sum_{i=1}^{N} \|g_{i}(\bm{w})\|_{2}^{2}} \in [0, 1].$$

Intuitively, high $c_{g}$ means per-example gradients are largely aligned, while low $c_{g}$ means substantial cancellation happens across gradients.
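The two extremes of Definition 7 are easy to reproduce on small hand-built examples. The sketch below computes $c_{g}$ from a list of per-example gradients; the function name and the three toy gradient sets are illustrative assumptions, and the function is undefined when all per-example gradients are zero.

```python
def gradient_coherence(per_example_grads):
    """c_g = ||mean gradient||_2^2 / mean of ||g_i||_2^2 (Definition 7)."""
    n = len(per_example_grads)
    d = len(per_example_grads[0])
    mean = [sum(g[j] for g in per_example_grads) / n for j in range(d)]
    mean_sq_norm = sum(sum(x * x for x in g) for g in per_example_grads) / n
    return sum(m * m for m in mean) / mean_sq_norm

aligned = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]                   # identical gradients
cancelling = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # gradients sum to zero
orthogonal = [[1.0, 0.0], [0.0, 1.0]]                            # two orthogonal unit gradients

print(gradient_coherence(aligned))     # 1.0: perfect agreement
print(gradient_coherence(cancelling))  # 0.0: full cancellation despite unit-norm gradients
print(gradient_coherence(orthogonal))  # 0.5: partial agreement
```

The `cancelling` case is the situation of interest in what follows: the full-batch gradient vanishes even though every per-example gradient has norm one.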

Training deviations induce low coherence. Empirically, training deviations often manifest as poorly aligned (incoherent) gradients/curvature across examples. In backdoor attacks, Yuan et al. \[[95](https://arxiv.org/html/2606.25004#bib.bib37)\] observe a substantially more dispersed distribution of activation gradients within the backdoor target class, indicating reduced within-class alignment of per-example geometry. Separately, Garg et al. \[[29](https://arxiv.org/html/2606.25004#bib.bib39)\] show that memorized examples correlate with high per-example local curvature, suggesting strong per-example heterogeneity rather than shared curvature structure. Motivated by these observations, we formalize the class of training deviations using gradient coherence:

###### Assumption 3 (Low-Coherence Deviations).

A training deviation produces a model $\bm{w}$ that achieves low training loss on a clean dataset $D$ of size $N$ with small gradient $\|g(\bm{w})\|_{2} \leq \varepsilon$ and low gradient coherence $c_{g}(\bm{w}) \leq c_{0} \ll 1$.

Our assumption says deviations produce models with low training loss and small gradient, yet these models have hidden poor generalization behavior manifested through low gradient coherence\.

Directional sharpness detects low gradient coherence. We now demonstrate that directional sharpness can detect low-coherence deviations in cases where static full-batch sharpness cannot. Consider a single step of directional sharpness measuring the mini-batch SAM sharpness $\mathsf{S}_{\xi} = \rho \|g_{\xi}(\bm{w})\|_{2}$ and a static, full-batch SAM sharpness metric $\mathsf{S} = \rho \|g(\bm{w})\|_{2}$. To compare the scale of the random mini-batch signal with the deterministic full-batch signal, define the root-mean-square (RMS) mini-batch sharpness $\mathsf{S}_{\mathrm{rms}}(\bm{w}) := \sqrt{\mathbb{E}_{\xi}[\mathsf{S}_{\xi}^{2}]}$. The next theorem shows that under low-coherence deviations, the full-batch sharpness $\mathsf{S}$ can be small while the RMS mini-batch sharpness remains large. Moreover, the squared RMS gap $\mathsf{S}_{\mathrm{rms}}^{2} - \mathsf{S}^{2}$ grows as the batch size $B$ decreases.

###### Theorem 2 (Detection Gap).

Let $\bm{w}$ be a model produced by a training deviation satisfying Assumption [3](https://arxiv.org/html/2606.25004#Thmassumption3). Let $\xi$ be a mini-batch of size $B$ sampled uniformly from $[N]$ without replacement and $g_{i} := \nabla\ell_{i}(\bm{w})$. Then the squared RMS gap between mini-batch sharpness and full-batch sharpness is:

$$\mathsf{S}_{\mathrm{rms}}^{2} - \mathsf{S}^{2} \;=\; \Bigl(\rho^{2} \cdot \frac{N-B}{B(N-1)} \cdot \frac{1}{N} \sum_{i=1}^{N} \|g_{i}\|_{2}^{2}\Bigr) \cdot \bigl(1 - c_{g}(\bm{w})\bigr). \tag{3}$$

In particular, when $B \ll N$ and $c_{g}(\bm{w}) \ll 1$, this gap is dominated by

$$\mathsf{S}_{\mathrm{rms}}^{2} - \mathsf{S}^{2} \;\approx\; \rho^{2} \cdot \frac{1}{B} \cdot \frac{1}{N} \sum_{i=1}^{N} \|g_{i}\|_{2}^{2}. \tag{4}$$

Moreover, if $\|g(\bm{w})\|_{2} \leq \varepsilon$ and $\varepsilon^{2} \ll \frac{1}{B} \cdot \frac{1}{N} \sum_{i=1}^{N} \|g_{i}\|_{2}^{2}$, then the full-batch sharpness is negligible relative to the RMS mini-batch sharpness, i.e., $\mathsf{S} \ll \mathsf{S}_{\mathrm{rms}}$.

The proof appears in Appendix[B\.3](https://arxiv.org/html/2606.25004#A2.SS3)\.
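The identity in Eq. (3) can be checked numerically on a toy problem. The sketch below is not the authors' code: it draws synthetic per-example gradients, enumerates every size-$B$ subset to compute the exact expectation, and compares the result with the closed form. It assumes the gradient coherence is $c_{g}(\bm{w})=\|g(\bm{w})\|_{2}^{2}\big/\tfrac{1}{N}\sum_{i}\|g_{i}\|_{2}^{2}$, which is the definition under which Eq. (3) reduces to the standard finite-population variance of a sample mean; the helper names are hypothetical.

```python
import itertools
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)


def mean_vec(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def detection_gap_check(grads, B, rho):
    """Return (exact gap, closed-form gap) for S_rms^2 - S^2."""
    N = len(grads)
    full = mean_vec(grads)
    S_sq = rho ** 2 * sq_norm(full)

    # Exact expectation over all mini-batches of size B (uniform, without replacement).
    batches = list(itertools.combinations(range(N), B))
    S_rms_sq = sum(
        rho ** 2 * sq_norm(mean_vec([grads[i] for i in idx])) for idx in batches
    ) / len(batches)

    mean_sq = sum(sq_norm(g) for g in grads) / N
    c_g = sq_norm(full) / mean_sq  # ASSUMED coherence definition (see lead-in)
    closed = rho ** 2 * (N - B) / (B * (N - 1)) * mean_sq * (1 - c_g)
    return S_rms_sq - S_sq, closed


rng = random.Random(0)
grads = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]  # synthetic gradients
for B in (1, 2, 4):
    exact, closed = detection_gap_check(grads, B, rho=0.05)
    print(B, exact, closed)
```

On such synthetic data the two columns agree up to floating-point error, and the gap shrinks as $B$ grows, in line with the $\frac{N-B}{B(N-1)}$ factor.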

Implications for directional sharpness. Intuitively, Theorem [2](https://arxiv.org/html/2606.25004#Thmtheorem2) shows that full-batch sharpness only sees the *average* gradient, while mini-batch sharpness is influenced by the *typical per-example* gradient magnitude. Thus, if per-example gradients disagree, they can cancel in the full-batch average, making $\|g(\bm{w})\|_{2}$ small even when the per-example gradients themselves have large norm. In other words, full-batch sharpness can average away high per-example gradient signals, making low-coherence deviations difficult to detect.

In contrast, these deviations can be detected by mini-batch sharpness. Eq. [4](https://arxiv.org/html/2606.25004#S4.E4) shows that under low-coherence deviations, the squared RMS gap is controlled by the average per-example gradient scale $\frac{1}{N}\sum_{i}\|g_{i}\|_{2}^{2}$ and grows as $\approx 1/B$. Equivalently, the RMS mini-batch sharpness scale grows as $\approx 1/\sqrt{B}$. Thus, even when $\mathsf{S}$ is small, the mini-batch second-moment signal $\mathbb{E}_{\xi}[\mathsf{S}_{\xi}^{2}]$ can remain large, and using a smaller batch size makes this gap easier to detect.
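The $1/\sqrt{B}$ scaling follows from Eq. (4) in one step once the full-batch term is negligible. A short worked version, using only the symbols of Theorem 2:

```latex
\begin{align*}
\mathsf{S}_{\mathrm{rms}}^{2}
  &= \mathsf{S}^{2} + \bigl(\mathsf{S}_{\mathrm{rms}}^{2}-\mathsf{S}^{2}\bigr)
   \approx \mathsf{S}^{2} + \frac{\rho^{2}}{B}\cdot\frac{1}{N}\sum_{i=1}^{N}\|g_{i}\|_{2}^{2}, \\
\mathsf{S}_{\mathrm{rms}}
  &\approx \frac{\rho}{\sqrt{B}}\,\Bigl(\frac{1}{N}\sum_{i=1}^{N}\|g_{i}\|_{2}^{2}\Bigr)^{1/2}
  \quad\text{when } \mathsf{S}^{2}\ll \frac{\rho^{2}}{B}\cdot\frac{1}{N}\sum_{i=1}^{N}\|g_{i}\|_{2}^{2}.
\end{align*}
```

Halving the batch size therefore raises the RMS mini-batch sharpness by a factor of about $\sqrt{2}$ in this regime, while the full-batch value $\mathsf{S}$ is unchanged.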

Finally, directional sharpness can amplify this one-step RMS gap by aggregating mini-batch sharpness over $T$ steps. Results from \[[12](https://arxiv.org/html/2606.25004#bib.bib42)\] suggest that low-coherence solutions are less stable under SAM-like dynamics; since directional sharpness includes a SAM-style probing trajectory, low-coherence models are more likely to exhibit larger fluctuation or escape behavior than coherent ones, making them more detectable.

### 4.5 Empirical Analysis of Directional Sharpness

Our theoretical analysis provides evidence that by measuring dynamic stability rather than a static snapshot, directional sharpness could serve as a more robust and reliable predictor of generalization. We now validate directional sharpness against Desideratum [1](https://arxiv.org/html/2606.25004#S3.I1.i1) by evaluating its empirical correlation with the generalization gap.

Setup. We mimic the experimental setup of the large-scale experiment by Jiang et al. \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\] and train a large grid of models on CIFAR-10 and CIFAR-100 \[[49](https://arxiv.org/html/2606.25004#bib.bib54)\] with different optimizers, model architectures, and hyperparameters. For CIFAR-10, we use VGG with different sizes: VGG-13-BN, VGG-16-BN, and VGG-19-BN \[[74](https://arxiv.org/html/2606.25004#bib.bib67)\]; for CIFAR-100, we use WideResNet28-10 \[[96](https://arxiv.org/html/2606.25004#bib.bib68)\]. We train these models using the optimizers SGD, Adam, SAM \[[25](https://arxiv.org/html/2606.25004#bib.bib23)\], and ASAM \[[50](https://arxiv.org/html/2606.25004#bib.bib22)\]. For Adam, we choose learning rates $\{0.001, 0.0005, 0.00032, 0.0001\}$, while for the other optimizers we use $\{0.1, 0.032, 0.01, 0.05\}$. We sweep weight decay in $\{0.0001, 0.00005\}$, batch size in $\{32, 64, 128\}$, and dropout rate in $\{0.0, 0.25, 0.5\}$. All models are trained to a 0.01 cross-entropy loss (estimated on 100 mini-batches), and we discard any that fail to converge within 200 epochs. For each converged model, we estimate the generalization gap using the difference between the test and training losses, $\operatorname{Gap}(\bm{w})\approx\mathcal{L}_{D_{test}}(\bm{w})-\mathcal{L}_{D_{train}}(\bm{w})$, and compute its correlation with directional sharpness, ASAM sharpness (ASAM), and the best-performing magnitude-aware worst-case sharpness (Worst-Case) reported by Jiang et al. \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\]. Following \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\], we report Spearman’s $\rho$, Kendall’s $\tau$, and the granulated Kendall’s coefficient $\Psi$.
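To make the size of this sweep concrete, the following sketch enumerates the configurations listed above as a full Cartesian product. The dictionary layout and the name `build_grid` are illustrative assumptions, and the setup does not say whether further factors (such as random seeds) enter the grid; under the full-product assumption the count comes to 1,152.

```python
import itertools

# Values copied from the setup description; the structure is an assumption.
ARCHITECTURES = {
    "CIFAR-10": ["VGG-13-BN", "VGG-16-BN", "VGG-19-BN"],
    "CIFAR-100": ["WideResNet28-10"],
}
OPTIMIZERS = ["SGD", "Adam", "SAM", "ASAM"]
LEARNING_RATES = {
    "Adam": [0.001, 0.0005, 0.00032, 0.0001],
    "default": [0.1, 0.032, 0.01, 0.05],  # used for SGD, SAM, and ASAM
}
WEIGHT_DECAYS = [0.0001, 0.00005]
BATCH_SIZES = [32, 64, 128]
DROPOUT_RATES = [0.0, 0.25, 0.5]


def build_grid():
    """Enumerate every (dataset, architecture, optimizer, hyperparameter) combination."""
    configs = []
    for dataset, archs in ARCHITECTURES.items():
        for arch, opt in itertools.product(archs, OPTIMIZERS):
            lrs = LEARNING_RATES.get(opt, LEARNING_RATES["default"])
            for lr, wd, bs, p in itertools.product(
                lrs, WEIGHT_DECAYS, BATCH_SIZES, DROPOUT_RATES
            ):
                configs.append(
                    {
                        "dataset": dataset,
                        "arch": arch,
                        "optimizer": opt,
                        "lr": lr,
                        "weight_decay": wd,
                        "batch_size": bs,
                        "dropout": p,
                    }
                )
    return configs


grid = build_grid()
print(len(grid))  # 4 architectures x 4 optimizers x 4 x 2 x 3 x 3 = 1152
```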

Table 1: Correlation between different sharpness measures and generalization gap on a grid of $1{,}152$ models with different architectures, optimizers, and hyperparameters. $B$ denotes the batch size used for the sharpness computation. Our directional sharpness measure has a significantly stronger correlation than the static magnitude-aware worst-case sharpness (Worst-Case) and ASAM sharpness (ASAM) baselines.

Results and discussion. Table [1](https://arxiv.org/html/2606.25004#S4.T1) shows all three correlation coefficients for our directional sharpness measure (Ours) compared against several existing sharpness measures. Our directional sharpness measure shows a significantly stronger correlation with the generalization gap than all other sharpness measures. These findings provide empirical support for our theoretical analysis and validate our core insight: reframing sharpness from a static geometric property to a measure of dynamic stability yields a metric that is more predictive of generalization than existing approaches.

This section confirms that directional sharpness meets Desideratum [1](https://arxiv.org/html/2606.25004#S3.I1.i1), showing a stronger correlation with generalization than any static baseline. We next proceed to validate it against Desideratum [2](https://arxiv.org/html/2606.25004#S3.I1.i2) (robustness under training deviations).
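For readers unfamiliar with the rank statistics reported here, the sketch below gives minimal pure-Python versions of Kendall's $\tau$ and Spearman's $\rho$. It is a simplified illustration rather than the evaluation code: it uses the tau-a form and the no-ties Spearman formula, it does not implement the granulated coefficient $\Psi$, and the sharpness and gap values are made up.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def ranks(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical sharpness values and generalization gaps for five models.
sharpness = [0.8, 1.5, 1.1, 2.3, 3.0]
gap = [0.10, 0.22, 0.25, 0.31, 0.40]
print(kendall_tau(sharpness, gap), spearman_rho(sharpness, gap))
```

In this toy example one of the ten model pairs is ordered differently by the two quantities, which is what pulls both coefficients below 1.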

## 5 Directional Sharpness for Model Certification

The previous section demonstrated the high correlation between directional sharpness and generalization, but a certification metric must also satisfy our two remaining desiderata: it must be robust against training deviations (Desideratum [2](https://arxiv.org/html/2606.25004#S3.I1.i2)) and efficiently computable (Desideratum [3](https://arxiv.org/html/2606.25004#S3.I1.i3)). In this section, we explore these two properties. First, in Section [5.1](https://arxiv.org/html/2606.25004#S5.SS1), we address Desideratum [2](https://arxiv.org/html/2606.25004#S3.I1.i2) by showing that directional sharpness consistently detects model failures that result from training deviations, including ones that cannot be detected by test accuracy or existing sharpness measures.

Next, in Sections [5.2](https://arxiv.org/html/2606.25004#S5.SS2) and [5.3](https://arxiv.org/html/2606.25004#S5.SS3), we explore efficiency in the context of two related applications: *model auditing* and *model certification*. We define model auditing using a single algorithm, $\mathsf{Audit}$, that takes as input a model and its training data and outputs a binary decision indicating whether or not the model is satisfactory. (Alternatively, we could imagine $\mathsf{Audit}$ outputting a confidence score or other numeric value. In Section [5.3](https://arxiv.org/html/2606.25004#S5.SS3), we discuss a methodology for identifying thresholds for directional sharpness that verifiers could then use in predicates for producing a binary decision.) In our experiments, we show that this algorithm can be run up to $4\times$ faster with directional sharpness than with test-accuracy evaluation, while also obtaining better correlation with generalization and robustness.

The $\mathsf{Audit}$ algorithm can be run by a model provider wanting to audit its own model before releasing it, but it is not appropriate for an external verifier to whom the model provider might not want to reveal the training data. Instead, we capture this setting of model *certification* as a pair of algorithms, $\mathsf{Prove}$ and $\mathsf{Verify}$, where $\mathsf{Prove}$ takes as input the model and training data and outputs a proof, and $\mathsf{Verify}$ takes as input the model and the proof and again outputs either a binary decision (in keeping with the traditional model for zero knowledge and other cryptographic primitives) or a numeric score. Here, we demonstrate that proving directional sharpness in zero knowledge is up to $80{,}000\times$ more efficient than proving the entire training process.

Testbed. All of our experiments are implemented in Python using PyTorch and run on a Google Colab A100 runtime with 83.5 GB of RAM and a single NVIDIA A100 GPU. For SAM training, we adopt the code provided by \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\]. For backdoor attacks, we use the toolbox from Li et al. \[[55](https://arxiv.org/html/2606.25004#bib.bib55)\].

Implementation details of sharpness measures. Unless specified otherwise, we set $B=16$, $T=10$, $\mathit{lr}=0.0001$, $\rho=0.05$, and use ASAM sharpness as the per-step sharpness function in directional sharpness. The static sharpness baselines we consider are also magnitude-aware. Specifically, we consider magnitude-aware worst-case sharpness, which we compute using the gradient-ascent algorithm from Jiang et al. \[[43](https://arxiv.org/html/2606.25004#bib.bib24), Algorithm 3\]; we also consider ASAM sharpness (Def. [4](https://arxiv.org/html/2606.25004#Thmdefinition4)). We sometimes refer to these sharpness measures as “static” to contrast them with the dynamic nature of our directional sharpness measure.

### 5.1 Directional Sharpness is More Reliable under Training Deviations

To evaluate reliability under training deviations, we run a set of *distinguishability* experiments that mimic common auditing and certification settings. Each experiment involves a benign model $M_{b}$ paired with a faulty model $M_{f}$, matched to (nearly) the same *observed accuracy*, i.e., the accuracy signal the verifier would naturally inspect in that scenario (e.g., standard test accuracy, pre-quantization accuracy, or accuracy on a small test set). We then ask whether sharpness metrics can flag the faulty model when observed accuracy fails. For each pair, we evaluate observed accuracy together with two metrics computed on the *clean* training set: full-dataset ASAM sharpness (Def. [4](https://arxiv.org/html/2606.25004#Thmdefinition4)) and directional sharpness. We summarize each metric’s distinguishability by its benign/faulty ratio (its value on $M_{b}$ divided by its value on $M_{f}$); a smaller ratio indicates better separation. The failure settings we consider are:

Noisy labels. This setting mimics an auditor who observes similar holdout accuracy, yet one model is trained on mislabeled data. We train a VGG-16-BN $M_{b}$ on clean CIFAR-10 and a faulty $M_{f}$ on CIFAR-10 with 10% random label corruption (each corrupted label replaced by a uniformly sampled incorrect class), stopping both runs at 90% training accuracy. Observed accuracy is measured on the standard test set.

Overfitting to spurious features. This setting mimics a scenario where two models appear equally accurate on the standard test set, but one has learned a non-robust shortcut that may fail under distribution shift. We train a VGG-16-BN $M_{b}$ on clean CIFAR-10 and a faulty $M_{f}$ on a modified training set where, with probability $0.5$, we overlay a top-left square whose color is deterministically tied to the label. This spurious feature \[[31](https://arxiv.org/html/2606.25004#bib.bib43),[42](https://arxiv.org/html/2606.25004#bib.bib86),[98](https://arxiv.org/html/2606.25004#bib.bib87)\] is highly predictive during training but non-causal. Both runs are stopped at 90% training accuracy, and observed accuracy is standard test accuracy.

Backdoors. A backdoor attack \[[82](https://arxiv.org/html/2606.25004#bib.bib18),[75](https://arxiv.org/html/2606.25004#bib.bib17),[34](https://arxiv.org/html/2606.25004#bib.bib16),[54](https://arxiv.org/html/2606.25004#bib.bib83)\] implants a hidden trigger that causes the model to behave normally on benign inputs but switch to a malicious target behavior when the trigger is present. We train a VGG-16-BN $M_{b}$ for 200 epochs on the clean CIFAR-10 dataset and a faulty $M_{f}$ on CIFAR-10 with 1% poisoned images, following BadNets \[[34](https://arxiv.org/html/2606.25004#bib.bib16)\]. Observed accuracy is clean test accuracy.

Post-quantization model quality. This setting mimics a scenario where two models have similar pre-quantization accuracy, but one is more robust to quantization-induced utility degradation \[[57](https://arxiv.org/html/2606.25004#bib.bib92)\]. To construct two models with similar pre-quantization accuracy but different post-quantization performance, we train two VGG-16-BN models, $M_{b}$ with SAM and $M_{f}$ with standard SGD, and then perform 4-bit quantization on both models; $M_{b}$ sees a minimal accuracy drop ($0.920-0.914=0.006$), while $M_{f}$ suffers a much larger degradation ($0.918-0.893=0.025$). We report the pre-quantization accuracy as observed accuracy.

Small test sets. In many real-world scenarios, test sets can be unreliable, unrepresentative, or maliciously crafted. We simulate a scenario where the available test set is too small to reliably reflect true performance. We train a VGG-16-BN $M_{b}$ on clean CIFAR-10 and a faulty $M_{f}$ with $10\%$ label corruption, both for 200 epochs, and craft a 100-sample test set on which both models achieve 90% accuracy. On the full CIFAR-10 test set, $M_{f}$ performs substantially worse ($75\%$ vs. $88\%$). Observed accuracy is the small test set accuracy.
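The label-corruption step used in the noisy-label and small-test-set settings can be sketched as follows. This is an illustration, not the authors' data pipeline: the function name is hypothetical, and it assumes that exactly a fixed fraction of labels is corrupted (the text does not say whether corruption is applied to a fixed fraction or independently per example).

```python
import random


def corrupt_labels(labels, num_classes, rate, seed=0):
    """Replace a `rate` fraction of labels with a uniformly sampled *incorrect* class.

    Returns the noisy label list and the sorted indices that were changed.
    """
    rng = random.Random(seed)
    n_corrupt = round(rate * len(labels))
    chosen = rng.sample(range(len(labels)), n_corrupt)
    noisy = list(labels)
    for i in chosen:
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong_classes)  # never equal to the original label
    return noisy, sorted(chosen)


# Toy stand-in for a 10-class label vector (CIFAR-10 has 10 classes).
labels = [i % 10 for i in range(1000)]
noisy, flipped = corrupt_labels(labels, num_classes=10, rate=0.10)
print(len(flipped))  # number of corrupted labels
```

Sampling from the incorrect classes only guarantees that every selected label actually changes, so the effective noise rate equals the nominal one.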

The results for all five categories are in Table[2](https://arxiv.org/html/2606.25004#S5.T2), and clearly demonstrate the limitations of static metrics\. In all five scenarios, observed accuracy is \(nearly\) identical for the benign and faulty models, and the static sharpness ratios are much closer to 1 than the directional\-sharpness ratios, indicating weaker separation\.
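As a small worked example of the benign/faulty ratio, the sketch below applies it to the accuracy figures quoted in the quantization and small-test-set settings above. Only accuracy values stated in the text are used; the sharpness ratios of Table 2 are not reproduced here, and the function name is hypothetical.

```python
def benign_faulty_ratio(benign_value, faulty_value):
    """Metric value on the benign model divided by its value on the faulty model."""
    return benign_value / faulty_value


# (benign, faulty) accuracies quoted in the settings above.
observed = {
    "quantization: pre-quantization accuracy": (0.920, 0.918),
    "small test set: 100-sample accuracy": (0.90, 0.90),
}
hidden = {
    "quantization: post-quantization accuracy": (0.914, 0.893),
    "small test set: full test accuracy": (0.88, 0.75),
}

for name, (mb, mf) in {**observed, **hidden}.items():
    print(f"{name}: ratio = {benign_faulty_ratio(mb, mf):.3f}")
```

The observed-accuracy ratios sit at or very near 1, so the signal available to the verifier does not separate the two models, whereas the accuracies the verifier cannot see differ noticeably.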

Experiments on subtler and stronger deviations. To further test whether directional sharpness remains informative beyond the settings in Table [2](https://arxiv.org/html/2606.25004#S5.T2), we run additional distinguishability experiments in Appendix [A](https://arxiv.org/html/2606.25004#A1) covering more diverse and challenging scenarios: (i) finer-grained label-noise rates; (ii) an adaptive, sharpness-aware backdoor attack \[[39](https://arxiv.org/html/2606.25004#bib.bib11)\]; and (iii) memorization/backdoor failures on language models. Across these settings, directional sharpness remains the strongest signal for benign/faulty separation.

Table 2: Distinguishability of faulty models under observed accuracy, static ASAM sharpness, and directional sharpness. Each entry reports the benign/faulty ratio. Lower ratios indicate stronger separation. Observed accuracy denotes the natural accuracy signal available to the verifier and is experiment-specific.

### 5.2 Model Quality Auditing

Table 3: Runtime of evaluating different metrics and their correlation with the generalization gap. Directional sharpness is not only the most efficient metric, faster than even plain test accuracy, but also offers the strongest Kendall’s $\tau$ correlation with generalization.

We now turn to the problem of model auditing, in which a verifier wants to measure the quality of a model given access to both the model and its training data. We show that an auditing algorithm based on standard test accuracy fails to detect certain types of flaws that directional sharpness detects, and that prior sharpness metrics are too inefficient for this use case.

Efficiency comparison with existing sharpness metrics. We measure the runtime of computing directional sharpness and compare it against test accuracy and magnitude-aware worst-case sharpness (Def. [2](https://arxiv.org/html/2606.25004#Thmdefinition2)) computed using the gradient-ascent method described by Jiang et al. \[[43](https://arxiv.org/html/2606.25004#bib.bib24)\]. Table [3](https://arxiv.org/html/2606.25004#S5.T3) shows the runtime and correlation for different metrics. When computed on a small batch ($B=8$, $T=5$), directional sharpness is over $1{,}000\times$ faster than static sharpness and even $4\times$ faster than computing accuracy over the full test set (0.21 vs. 0.92 seconds). Consistent with Section [4.5](https://arxiv.org/html/2606.25004#S4.SS5), directional sharpness also provides the strongest correlation with generalization.

Auditing model privacy via membership inference. Membership inference attacks (MIAs) \[[73](https://arxiv.org/html/2606.25004#bib.bib12)\] are a standard auditing tool for assessing privacy leakage: if an attacker can reliably distinguish training points (“members”) from unseen points (“non-members”), then the model reveals information about its training data. Prior work links MIA vulnerability to poor generalization and overfitting \[[94](https://arxiv.org/html/2606.25004#bib.bib34),[7](https://arxiv.org/html/2606.25004#bib.bib66)\], and shows that training deviations can further amplify membership leakage \[[86](https://arxiv.org/html/2606.25004#bib.bib65),[41](https://arxiv.org/html/2606.25004#bib.bib64)\]. Motivated by our finding that directional sharpness provides a more reliable signal of poor generalization under deviations, we ask whether it can also serve as a stronger membership score (i.e., yield better separability between members and non-members) and thus improve privacy auditing.

Most black-box MIAs use per-sample loss or confidence as the decision signal, based on the observation that memorized training examples tend to incur lower loss. We evaluate three per-example scores (loss, static ASAM sharpness, and directional sharpness) on a VGG-16-BN model trained on CIFAR-10, using a balanced set of 1,000 members (training points) and 1,000 non-members (test points). Per-example directional sharpness is computed by treating the example as the only training data and setting $B=1$. We compare attack performance using threshold-independent AUC and best-threshold accuracy (ACC). As shown in Table [4](https://arxiv.org/html/2606.25004#S5.T4), directional sharpness achieves the highest AUC and ACC, indicating that our dynamic stability test strengthens loss-based MIA privacy auditing by improving member/non-member separability.
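The two attack metrics can be computed from per-example scores as sketched below. This is a minimal illustration with made-up scores, not the evaluation code; it assumes that a *lower* score indicates membership (as for loss), so a score with the opposite orientation would need its sign flipped first. Function names are hypothetical.

```python
def auc_lower_is_member(member_scores, nonmember_scores):
    """AUC = P(member score < non-member score) + 0.5 * P(tie)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def best_threshold_accuracy(member_scores, nonmember_scores):
    """Best accuracy of the rule 'predict member iff score <= t' over all thresholds t."""
    total = len(member_scores) + len(nonmember_scores)
    # Threshold below every score: everything is predicted as non-member.
    best = len(nonmember_scores) / total
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        correct = sum(s <= t for s in member_scores) + sum(
            s > t for s in nonmember_scores
        )
        best = max(best, correct / total)
    return best


# Hypothetical per-example scores for four members and four non-members.
members = [0.05, 0.10, 0.20, 0.40]
nonmembers = [0.30, 0.50, 0.70, 0.90]
print(auc_lower_is_member(members, nonmembers))      # 0.9375
print(best_threshold_accuracy(members, nonmembers))  # 0.875
```

AUC summarizes separability across all thresholds, while ACC reports the single most favorable operating point on the balanced set.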

Table 4: MIA performance when using different per-sample scores as the attack signal. Higher AUC indicates better separability.

### 5.3 Model Quality Certification

We finally turn our attention to the problem of model certification, in which a prover wants to convince a verifier of the quality of a model to which the verifier has access \(but, crucially, the verifier does not have access to the training data\)\. Given that we have demonstrated the strong correlation between our directional sharpness metric and model generalization \(and thus quality\), we show how to achieve this by generating a zero\-knowledge proof \(ZKP\) of directional sharpness\.

![Refer to caption](https://arxiv.org/html/2606.25004v1/x1.png)

Figure 1: Static ASAM sharpness vs. directional sharpness probe trajectories on models trained with SGD and SAM. Static sharpness incorrectly suggests the SGD model generalizes better at epochs 130, 140, and 150, while directional sharpness trajectories reveal the SAM model is more stable under perturbations.

Table 5: Static ASAM sharpness vs. directional sharpness values on two models trained with SAM and SGD. Static sharpness incorrectly suggests the SGD model has better generalization, while directional sharpness correctly identifies the SAM model as the better-generalizing one.

Effectiveness of directional sharpness. We first demonstrate the effectiveness of directional sharpness compared to existing static sharpness metrics in model certification. In particular, we build on our previous results by considering the specific use case in which a malicious model trainer wants to convince a verifier that a model was trained using SAM (which is expensive) but, in reality, trained the model using standard SGD (which is much cheaper). To simulate this scenario, we train two VGG-16-BN models on CIFAR-10 for 150 epochs. One model is trained with standard SGD, representing the malicious trainer’s model, while the other is honestly trained with SAM. After training, the SAM model has a lower generalization gap (0.205) than the SGD model (0.273) and thus better generalization.

Figure [1](https://arxiv.org/html/2606.25004#S5.F1) plots the trajectory of both sharpness metrics over the last 50 epochs of training. The static ASAM sharpness metric (Def. [4](https://arxiv.org/html/2606.25004#Thmdefinition4), dashed lines) proves to be an unreliable signal, showing both models converging to similar values and incorrectly suggesting the SGD model has better generalization (Table [5](https://arxiv.org/html/2606.25004#S5.T5)). In contrast, directional sharpness (solid lines) is effective at distinguishing SAM and SGD models at all epochs. The SAM model (orange lines) consistently displays low and stable sharpness, indicating it resides in a uniformly flat basin. The SGD model (blue lines) exhibits high and erratic sharpness spikes. This shows that while its full-dataset sharpness may appear low, it is stuck in a sharp basin that is immediately revealed by directional sharpness.

Efficiency of directional sharpness. Having established that directional sharpness is more effective for model certification, we now evaluate the end-to-end performance of proving directional sharpness in zero knowledge to demonstrate its efficiency. We compare this cost against a *zero-knowledge proof of training (zkPoT)* \[[1](https://arxiv.org/html/2606.25004#bib.bib81),[30](https://arxiv.org/html/2606.25004#bib.bib90)\].

To obtain realistic and concrete cost estimates, we extrapolate from the reported results of Kaizen \[[1](https://arxiv.org/html/2606.25004#bib.bib81)\]. Kaizen proves mini-batch gradient-descent training using an optimized GKR-style proof for each gradient update. Its instantiation uses 64-bit fixed-point encoding with 32 fractional bits over $\mathbb{F}_{p^{2}}$, where $p=2^{61}-1$. Mini-batches are sampled by applying a public pseudorandom permutation to the committed dataset indices and taking consecutive blocks of size $B$. Since both the zkPoT baseline and our ZKP of directional sharpness use the same Kaizen instantiation with parameters chosen for $\lambda=100$-bit security, both inherit a knowledge-soundness error of at most $2^{-100}$ for the corresponding statements. We refer to \[[1](https://arxiv.org/html/2606.25004#bib.bib81)\] for their full protocol and implementation details.
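The mini-batch sampling rule described above (permute the dataset indices with a public seed, then cut consecutive blocks of size $B$) can be sketched as follows. This is only an illustration of the block structure: Python's `random.Random` stands in for the public pseudorandom permutation and is not cryptographically secure, the function name is hypothetical, and keeping a final partial block is an assumption not taken from the text.

```python
import random


def public_minibatches(num_examples, batch_size, public_seed):
    """Permute indices with a publicly known seed and split into consecutive blocks."""
    indices = list(range(num_examples))
    # Stand-in for the public pseudorandom permutation; NOT a cryptographic PRP.
    random.Random(public_seed).shuffle(indices)
    return [
        indices[start:start + batch_size]
        for start in range(0, num_examples, batch_size)
    ]


batches = public_minibatches(num_examples=50, batch_size=16, public_seed=2024)
print([len(b) for b in batches])  # [16, 16, 16, 2]
```

Because the seed is public, prover and verifier derive the same batch assignment, so the prover cannot choose which committed examples enter each step.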

Let $P_{B}^{\text{SGD}}$ be the cost of proving one SGD iteration on batch size $B$, with prover runtime and proof size taken from \[[1](https://arxiv.org/html/2606.25004#bib.bib81)\]. Since SAM requires twice the computation per step versus SGD \[[25](https://arxiv.org/html/2606.25004#bib.bib23)\], we set $P_{B}^{\text{SAM}}\approx 2\cdot P_{B}^{\text{SGD}}$. To avoid over-penalizing our baseline, we adopt the finding that 1–5 epochs of SAM at the late stage achieve comparable generalization to full SAM training \[[99](https://arxiv.org/html/2606.25004#bib.bib28)\] and consider a “LATE-SAM” baseline that proves 145 SGD epochs followed by 5 SAM epochs with batch size $B=16$. Counting each SAM epoch as two SGD epochs gives $145+2\cdot 5=155$ SGD-epoch equivalents, so the total cost for the LATE-SAM baseline is $\frac{155|D|}{B}\cdot P_{B}^{\text{SGD}}$. Our ZKP of directional sharpness proves $T$ SAM steps plus a small circuit for the fluctuation function. Since the cost of the fluctuation function is small compared with $P_{B}^{\text{SAM}}$, we estimate our total cost as $(2T)\cdot P_{B}^{\text{SGD}}$.

Table [6](https://arxiv.org/html/2606.25004#S5.T6) reports these cost estimates. A ZKP of directional sharpness (running for $T=5$ steps with a $B=8$ batch) is up to $80{,}000\times$ faster than proving the full 150-epoch LATE-SAM training run: a ZKP of LATE-SAM training for VGG-11 would take an estimated $118{,}671$ hours (over 13 years), while our ZKP of directional sharpness takes under 90 minutes.
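The arithmetic behind these estimates can be reproduced in a few lines. The sketch below counts SGD-iteration equivalents for both statements under the cost model above and recomputes the speedups from the times reported in Table 6; the function names are hypothetical, and the per-iteration proving costs themselves are not modeled.

```python
def late_sam_sgd_equivalents(dataset_size, batch_size, sgd_epochs=145, sam_epochs=5):
    """SGD-iteration equivalents for LATE-SAM, counting one SAM step as two SGD steps."""
    iterations_per_epoch = dataset_size / batch_size
    return (sgd_epochs + 2 * sam_epochs) * iterations_per_epoch


def directional_sharpness_sgd_equivalents(num_steps):
    """SGD-iteration equivalents for T SAM probing steps (fluctuation circuit ignored)."""
    return 2 * num_steps


print(late_sam_sgd_equivalents(50_000, 16))      # 155 * 3125 = 484375.0
print(directional_sharpness_sgd_equivalents(5))  # 10

# (directional-sharpness time in minutes, training time in hours) from Table 6.
TABLE6_TIMES = {
    "LeNet": (23.63, 26_016),
    "AlexNet": (55.43, 63_830),
    "VGG-11": (88.87, 118_671),
}
for model, (ds_minutes, train_hours) in TABLE6_TIMES.items():
    speedup = train_hours * 60 / ds_minutes
    years = train_hours / (24 * 365)
    print(f"{model}: speedup ~ {speedup:,.0f}x, training proof ~ {years:.1f} years")
```

The recomputed speedups match the table to within the rounding of the published times. Note that the two statements use different batch sizes ($B=16$ versus $B=8$), so the ratio of iteration counts alone does not equal the speedup; the per-iteration costs $P_{16}^{\text{SGD}}$ and $P_{8}^{\text{SGD}}$ also differ.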

| Models | ZKP of Directional Sharpness ($B=8$, $T=5$): Time (min.) | ZKP of Directional Sharpness ($B=8$, $T=5$): Comm. (MB) | Speedup ($\times$) | ZKP of Training ($B=16$): Time (hr.) | ZKP of Training ($B=16$): Comm. (GB) |
| --- | --- | --- | --- | --- | --- |
| LeNet | 23.63 | 9.85 | $\mathbf{66{,}063}$ | 26,016 | 471.64 |
| AlexNet | 55.43 | 11.59 | $\mathbf{69{,}091}$ | 63,830 | 579.72 |
| VGG-11 | 88.87 | 15.28 | $\mathbf{80{,}119}$ | 118,671 | 751.59 |

Table 6: Cost estimates for ZKPs of directional sharpness vs. ZKPs of LATE-SAM training. For the zkPoT baseline, all models are trained for 145 epochs using SGD and switched to SAM for the final 5 epochs. $|D|=50{,}000$.

Certification decision based on directional sharpness. Our experiment shows that directional sharpness provides a robust signal for distinguishing high-quality models from low-quality ones. In practice, however, a verifier must still choose a threshold or decision rule to convert this signal into a binary “accept/reject” decision.

We note that this threshold selection is a limitation of any metric-based certification approach, since the true generalization gap is unobservable. Thus, the goal is not to find a universal constant, but rather a metric reliable enough that practical calibration procedures can be effective. Our theoretical and empirical results show that directional sharpness is well suited for this purpose: theoretically, it provides a principled measure of instability that is robust under training deviations (Section [4.3](https://arxiv.org/html/2606.25004#S4.SS3) and Section [4.4](https://arxiv.org/html/2606.25004#S4.SS4)); empirically, it exhibits stronger correlation with generalization (Section [4.5](https://arxiv.org/html/2606.25004#S4.SS5)) and wider separation margins between benign and faulty models (Table [2](https://arxiv.org/html/2606.25004#S5.T2)) compared to existing metrics.

These properties enable a straightforward calibration procedure: (i) construct a small calibration set of models of known quality for the task family (e.g., a verifier may already have baseline models trained with SGD or SAM); (ii) select a threshold $\tau$ that separates these models based on an acceptable risk level; (iii) apply $\tau$ to evaluate new models within the same task family. In certification settings where a binary decision is not required, verifiers can also treat the directional sharpness value directly as a continuous confidence score, with lower values indicating higher model quality.
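One way to instantiate steps (i) to (iii) is sketched below. It is a hypothetical decision rule, not a procedure prescribed by the paper: the risk level is interpreted as a maximum false-accept rate on the known-bad calibration models, models are accepted when their directional sharpness is at most $\tau$ (lower is better), and all score values are invented for illustration.

```python
def calibrate_threshold(good_scores, bad_scores, max_false_accept_rate=0.0):
    """Largest tau whose false-accept rate on known-bad models stays within the risk level.

    Returns None if even the smallest candidate threshold accepts too many bad models.
    """
    best = None
    for tau in sorted(set(good_scores) | set(bad_scores)):
        false_accept_rate = sum(s <= tau for s in bad_scores) / len(bad_scores)
        if false_accept_rate <= max_false_accept_rate:
            best = tau  # candidates are ascending, so this keeps the largest valid tau
    return best


def certify(score, tau):
    """Accept a model iff its directional sharpness does not exceed the threshold."""
    return score <= tau


# (i) Hypothetical calibration set: directional sharpness of known-good and known-bad models.
good = [0.8, 1.0, 1.3]
bad = [2.4, 3.1, 2.9]

# (ii) Select the threshold at zero tolerated false accepts.
tau = calibrate_threshold(good, bad, max_false_accept_rate=0.0)
print(tau)  # 1.3

# (iii) Apply it to new models from the same task family.
print(certify(1.1, tau), certify(2.0, tau))  # True False
```

A verifier willing to tolerate some false accepts can raise `max_false_accept_rate`, which moves $\tau$ upward and reduces false rejections of good models.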

## 6 Conclusion

Model quality certification is essential for providing transparency and accountability in machine learning systems\. In this work, we use generalization as a proxy for overall model quality and develop a new, efficient methodology for its certification\. Our central contribution is a metric we call directional sharpness\. This measure strongly correlates with generalization, is reliable under existing training deviations, and is efficiently computable\. These properties make directional sharpness a suitable ingredient in ML model certification\. Therefore, directional sharpness can play an important role in mitigating attacks in adversarial machine learning\. Using directional sharpness to devise rigorous defenses against adversarial models, and incorporating these defenses into existing systems, is a natural continuation of this research\.

## Acknowledgments

Gefei Tan is supported by a Google PhD Fellowship\.

## References

- \[1\] K. Abbaszadeh, C. Pappas, J. Katz, and D. Papadopoulos (2024) Zero-knowledge proofs of training for deep neural networks. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pp. 4316–4330.
- \[2\] P. Alquier (2024) User-friendly introduction to pac-bayes bounds. Foundations and Trends® in Machine Learning 17(2), pp. 174–303. External Links: ISSN 1935-8245, [Document](https://dx.doi.org/10.1561/2200000100)
- \[3\] M. Andriushchenko, D. Bahri, H. Mobahi, and N. Flammarion (2023) Sharpness-aware minimization leads to low-rank features. Advances in Neural Information Processing Systems 36, pp. 47032–47051.
- \[4\] M. Andriushchenko and N. Flammarion (2022) Towards understanding sharpness-aware minimization. In International conference on machine learning, pp. 639–668.
- \[5\] M. S. M. S. Annamalai, B. Balle, J. Hayes, G. Kaissis, and E. De Cristofaro (2025) The hitchhiker’s guide to efficient, end-to-end, and tight dp auditing. arXiv preprint arXiv:2506.16666.
- \[6\] C. Baek, J. Z. Kolter, and A. Raghunathan (2024) Why is sam robust to label noise?. In The Twelfth International Conference on Learning Representations.
- \[7\] T. Baluta, S. Shen, S. Hitarth, S. Tople, and P. Saxena (2022) Membership inference attacks and generalization: a causal perspective. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp. 249–262.
- \[8\] B. Barz and J. Denzler (2020) Do we train on test data? purging cifar of near-duplicates. Journal of Imaging 6(6), pp. 41.
- \[9\] C. Baum, L. Braun, A. Munch-Hansen, B. Razet, and P. Scholl (2021-11) Appenzeller to brie: efficient zero-knowledge proofs for mixed-mode arithmetic and Z2k. In ACM CCS 2021, G. Vigna and E. Shi (Eds.), pp. 192–211.
- \[10\] C. Baum, L. Braun, A. Munch-Hansen, and P. Scholl (2022-08) Moz$\mathbb{Z}_{2^{k}}$arella: efficient vector-OLE and zero-knowledge proofs over $\mathbb{Z}_{2^{k}}$. In CRYPTO 2022, Part IV, Y. Dodis and T. Shrimpton (Eds.), LNCS, Vol. 13510, pp. 329–358.
- \[11\] C. Baum, A. J. Malozemoff, M. B. Rosen, and P. Scholl (2021) Mac’n’cheese: zero-knowledge proofs for boolean and arithmetic circuits with nested disjunctions. In Annual International Cryptology Conference, pp. 92–122.
- \[12\] W. Chang and R. Khanna (2025) A unified stability analysis of SAM vs SGD: role of data coherence and emergence of simplicity bias. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- \[13\] S. Chatterjee (2020) Coherent gradients: an approach to understanding generalization in gradient descent-based optimization. In International Conference on Learning Representations.
- \[14\] Z. Chen, J. Zhang, Y. Kou, X. Chen, C. Hsieh, and Q. Gu (2023) Why does sharpness-aware minimization generalize better than SGD?. In Thirty-seventh Conference on Neural Information Processing Systems.
- \[15\] A. Chiesa, Y. Hu, M. Maller, P. Mishra, P. Vesely, and N. P. Ward (2020-05) Marlin: preprocessing zkSNARKs with universal and updatable SRS. In EUROCRYPT 2020, Part I, A. Canteaut and Y. Ishai (Eds.), LNCS, Vol. 12105, pp. 738–768.
- \[16\] J. Cohen, E. Rosenfeld, and Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. In international conference on machine learning, pp. 1310–1320.
- \[17\] L. Dinh, R. Pascanu, S. Bengio, and Y. Bengio (2017) Sharp minima can generalize for deep nets. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pp. 1019–1028.
- \[18\] S. Dittmer, Y. Ishai, S. Lu, and R. Ostrovsky (2022-11) Improving line-point zero knowledge: two multiplications for the price of one. In ACM CCS 2022, H. Yin, A. Stavrou, C. Cremers, and E. Shi (Eds.), pp. 829–841.
- \[19\]C\. Dwork, V\. Feldman, M\. Hardt, T\. Pitassi, O\. Reingold, and A\. L\. Roth\(2015\)Preserving statistical validity in adaptive data analysis\.InProceedings of the forty\-seventh annual ACM symposium on Theory of computing,pp\. 117–126\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[20\]G\. K\. Dziugaite, A\. Drouin, B\. Neal, N\. Rajkumar, E\. Caballero, L\. Wang, I\. Mitliagkas, and D\. M\. Roy\(2020\)In search of robust measures of generalization\.Advances in Neural Information Processing Systems33,pp\. 11723–11733\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1)\.
- \[21\]G\. K\. Dziugaite and D\. M\. Roy\(2017\)Computing nonvacuous generalization bounds for deep \(stochastic\) neural networks with many more parameters than training data\.InProceedings of the Thirty\-Third Conference on Uncertainty in Artificial Intelligence,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[22\]C\. Fan, J\. Jia, Y\. Zhang, A\. Ramakrishna, M\. Hong, and S\. Liu\(2025\)Towards llm unlearning resilient to relearning attacks: a sharpness\-aware minimization perspective and beyond\.InInternational Conference on Machine Learning,pp\. 15762–15778\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[23\]Y\. Feng and Y\. Tu\(2021\)The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima\.Proceedings of the National Academy of Sciences118\(9\),pp\. e2015617118\.Cited by:[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p7.1)\.
- \[24\]A\. Foong, W\. Bruinsma, D\. Burt, and R\. Turner\(2021\)How tight can pac\-bayes be in the small data regime?\.Advances in Neural Information Processing Systems34,pp\. 4093–4105\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1)\.
- \[25\]P\. Foret, A\. Kleiner, H\. Mobahi, and B\. Neyshabur\(2021\)Sharpness\-aware minimization for efficiently improving generalization\.InInternational Conference on Learning Representations,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p4.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1),[§4\.1](https://arxiv.org/html/2606.25004#S4.SS1.p1.7),[§4\.1](https://arxiv.org/html/2606.25004#S4.SS1.p3.1),[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9),[§5\.3](https://arxiv.org/html/2606.25004#S5.SS3.p6.8),[Definition 3](https://arxiv.org/html/2606.25004#Thmdefinition3)\.
- \[26\]O\. Franzese, A\. Shahin Shamsabadi, C\. Luck, and H\. Haddadi\(2025\)Secure and confidential certificates of online fairness\.Advances in Neural Information Processing Systems38,pp\. 40077–40107\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[27\]A\. Gabizon, Z\. J\. Williamson, and O\. Ciobotaru\(2019\)PLONK: permutations over Lagrange\-bases for oecumenical noninteractive arguments of knowledge\.Note:Cryptology ePrint Archive, Report 2019/953External Links:[Link](https://eprint.iacr.org/2019/953)Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p1.4)\.
- \[28\]Y\. Gao, B\. G\. Doan, Z\. Zhang, S\. Ma, J\. Zhang, A\. Fu, S\. Nepal, and H\. Kim\(2020\)Backdoor attacks and countermeasures on deep learning: a comprehensive review\.arXiv preprint arXiv:2007\.10760\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[29\]I\. Garg, D\. Ravikumar, and K\. Roy\(2024\)Memorization through the lens of curvature of loss function around samples\.InProceedings of the 41st International Conference on Machine Learning,pp\. 15083–15101\.Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p3.1),[§4\.4](https://arxiv.org/html/2606.25004#S4.SS4.p5.1)\.
- \[30\]S\. Garg, A\. Goel, S\. Jha, S\. Mahloujifar, M\. Mahmoody, G\. Policharla, and M\. Wang\(2023\)Experimenting with zero\-knowledge proofs of training\.InProceedings of the 2023 ACM SIGSAC conference on computer and communications security,pp\. 1880–1894\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p2.1),[§3\.1](https://arxiv.org/html/2606.25004#S3.SS1.p4.3),[§5\.3](https://arxiv.org/html/2606.25004#S5.SS3.p4.1)\.
- \[31\]R\. Geirhos, J\. Jacobsen, C\. Michaelis, R\. Zemel, W\. Brendel, M\. Bethge, and F\. A\. Wichmann\(2020\)Shortcut learning in deep neural networks\.Nature Machine Intelligence2\(11\),pp\. 665–673\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1),[item](https://arxiv.org/html/2606.25004#S5.I1.i2.p1.3)\.
- \[32\]I\. J\. Goodfellow, J\. Shlens, and C\. Szegedy\(2015\)Explaining and harnessing adversarial examples\.InInternational Conference on Learning Representations,Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[33\]J\. Groth\(2016\-05\)On the size of pairing\-based non\-interactive arguments\.InEUROCRYPT 2016, Part II,M\. Fischlin and J\. Coron \(Eds\.\),LNCS, Vol\.9666,,pp\. 305–326\.Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p1.4)\.
- \[34\]T\. Gu, K\. Liu, B\. Dolan\-Gavitt, and S\. Garg\(2019\)Badnets: evaluating backdooring attacks on deep neural networks\.Ieee Access7,pp\. 47230–47244\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1),[item](https://arxiv.org/html/2606.25004#S5.I1.i3.p1.2)\.
- \[35\]S\. Gururangan, S\. Swayamdipta, O\. Levy, R\. Schwartz, S\. R\. Bowman, and N\. A\. Smith\(2018\)Annotation artifacts in natural language inference data\.InProceedings of NAACL\-HLT,pp\. 107–112\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[36\]M\. Haddouche, P\. Viallard, U\. Şimşekli, and B\. Guedj\(2025\)A pac\-bayesian link between generalisation and flat minima\.InALT 2025\-36th International Conference on Algorithmic Learning Theory,pp\. 1–31\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1)\.
- \[37\]M\. Hardt, E\. Price, and N\. Srebro\(2016\)Equality of opportunity in supervised learning\.InProceedings of the 30th International Conference on Neural Information Processing Systems,NIPS’16,Red Hook, NY, USA,pp\. 3323–3331\.External Links:ISBN 9781510838819Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p1.1)\.
- \[38\]H\. He, G\. Huang, and Y\. Yuan\(2019\)Asymmetric valleys: beyond sharp and flat local minima\.Advances in neural information processing systems32\.Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p3.1)\.
- \[39\]P\. He, H\. Xu, J\. Ren, Y\. Cui, S\. Zeng, H\. Liu, C\. C\. Aggarwal, and J\. Tang\(2024\)Sharpness\-aware data poisoning attack\.InThe Twelfth International Conference on Learning Representations,Cited by:[Appendix A](https://arxiv.org/html/2606.25004#A1.p4.3),[§5\.1](https://arxiv.org/html/2606.25004#S5.SS1.p3.1)\.
- \[40\]F\. Hellström, G\. Durisi, B\. Guedj, and M\. Raginsky\(2025\)Generalization bounds: perspectives from information theory and pac\-bayes\.Foundations and Trends in Machine Learning18\(1\),pp\. 1–223\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p1.1),[§1](https://arxiv.org/html/2606.25004#S1.p3.1)\.
- \[41\]H\. Hu, Z\. Salcic, G\. Dobbie, J\. Chen, L\. Sun, and X\. Zhang\(2022\)Membership inference via backdooring\.InInternational Joint Conference on Artificial Intelligence \(31st: 2022\),pp\. 3832–3838\.Cited by:[§5\.2](https://arxiv.org/html/2606.25004#S5.SS2.p3.1)\.
- \[42\]P\. Izmailov, P\. Kirichenko, N\. Gruver, and A\. G\. Wilson\(2022\)On feature learning in the presence of spurious correlations\.InProceedings of the 36th International Conference on Neural Information Processing Systems,NIPS ’22,Red Hook, NY, USA\.External Links:ISBN 9781713871088Cited by:[item](https://arxiv.org/html/2606.25004#S5.I1.i2.p1.3)\.
- \[43\]Y\. Jiang, B\. Neyshabur\*, H\. Mobahi, D\. Krishnan, and S\. Bengio\(2020\)Fantastic generalization measures and where to find them\.InInternational Conference on Learning Representations,Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p2.1),[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p4.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1),[§3\.3](https://arxiv.org/html/2606.25004#S3.SS3.p2.1),[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9),[§5\.2](https://arxiv.org/html/2606.25004#S5.SS2.p2.4),[Table 3](https://arxiv.org/html/2606.25004#S5.T3.1.1.2.1),[§5](https://arxiv.org/html/2606.25004#S5.p5.4)\.
- \[44\]M\. Jordan, J\. Lewis, and A\. G\. Dimakis\(2019\)Provable certificates for adversarial examples: fitting a ball in the union of polytopes\.Advances in neural information processing systems32\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p1.1)\.
- \[45\]N\. S\. Keskar, D\. Mudigere, J\. Nocedal, M\. Smelyanskiy, and P\. T\. P\. Tang\(2017\)On large\-batch training for deep learning: generalization gap and sharp minima\.InInternational Conference on Learning Representations,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p4.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1)\.
- \[46\]A\. Khaled and P\. Richtárik\(2023\)Better theory for SGD in the nonconvex world\.Transactions on Machine Learning Research\.External Links:ISSN 2835\-8856Cited by:[Appendix B](https://arxiv.org/html/2606.25004#A2.p1.1)\.
- \[47\]N\. Kilbertus, A\. Gascón, M\. Kusner, M\. Veale, K\. Gummadi, and A\. Weller\(2018\)Blind justice: fairness with encrypted sensitive attributes\.InInternational Conference on Machine Learning,pp\. 2630–2639\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[48\]P\. W\. Koh, S\. Sagawa, H\. Marklund, S\. M\. Xie, M\. Zhang, A\. Balsubramani, W\. Hu, M\. Yasunaga, R\. L\. Phillips, I\. Gao,et al\.\(2021\)Wilds: a benchmark of in\-the\-wild distribution shifts\.InInternational conference on machine learning,pp\. 5637–5664\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[49\]A\. Krizhevsky\(2009\)Learning multiple layers of features from tiny images\.Cited by:[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9)\.
- \[50\]J\. Kwon, J\. Kim, H\. Park, and I\. K\. Choi\(2021\)Asam: adaptive sharpness\-aware minimization for scale\-invariant learning of deep neural networks\.InInternational conference on machine learning,pp\. 5905–5914\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p4.1),[§1](https://arxiv.org/html/2606.25004#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p4.1),[§4\.1](https://arxiv.org/html/2606.25004#S4.SS1.p2.1),[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p10.9),[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9),[Definition 2](https://arxiv.org/html/2606.25004#Thmdefinition2),[Definition 4](https://arxiv.org/html/2606.25004#Thmdefinition4)\.
- \[51\]S\. Lab\.\(2017\)Libsnark: a C\+\+ library for zkSNARK proofs\.External Links:[Link](https://github.com/scipr-lab/libsnark)Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p1.4)\.
- \[52\]M\. Lecuyer, V\. Atlidakis, R\. Geambasu, D\. Hsu, and S\. Jana\(2019\)Certified robustness to adversarial examples with differential privacy\.In2019 IEEE symposium on security and privacy \(SP\),pp\. 656–672\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p1.1)\.
- \[53\]T\. Li, P\. Zhou, Z\. He, X\. Cheng, and X\. Huang\(2024\)Friendly sharpness\-aware minimization\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 5631–5640\.Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p4.1)\.
- \[54\]Y\. Li, Y\. Jiang, Z\. Li, and S\. Xia\(2024\)Backdoor learning: a survey\.IEEE Transactions on Neural Networks and Learning Systems35\(1\),pp\. 5–22\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1),[item](https://arxiv.org/html/2606.25004#S5.I1.i3.p1.2)\.
- \[55\]Y\. Li, M\. Ya, Y\. Bai, Y\. Jiang, and S\. Xia\(2023\)BackdoorBox: a python toolbox for backdoor learning\.InICLR Workshop,Cited by:[§5](https://arxiv.org/html/2606.25004#S5.p4.1)\.
- \[56\]J\. Liu, S\. Oya, and F\. Kerschbaum\(2021\)Generalization techniques empirically outperform differential privacy against membership inference\.arXiv preprint arXiv:2110\.05524\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[57\]J\. Liu, J\. Cai, and B\. Zhuang\(2021\)Sharpness\-aware quantization for deep neural networks\.arXiv preprint arXiv:2111\.12273\.Cited by:[item](https://arxiv.org/html/2606.25004#S5.I1.i4.p1.6)\.
- \[58\]D\. A\. McAllester\(1998\)Some pac\-bayesian theorems\.InProceedings of the Eleventh Annual Conference on Computational Learning Theory,COLT’ 98,New York, NY, USA,pp\. 230–234\.External Links:ISBN 1581130570Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§1](https://arxiv.org/html/2606.25004#S1.p3.1)\.
- \[59\]D\. A\. McAllester\(1999\)PAC\-bayesian model averaging\.InProceedings of the twelfth annual conference on Computational learning theory,pp\. 164–170\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§1](https://arxiv.org/html/2606.25004#S1.p3.1)\.
- \[60\]T\. Mori, L\. Ziyin, K\. Liu, and M\. Ueda\(2022\)Power\-law escape rate of sgd\.InInternational Conference on Machine Learning,pp\. 15959–15975\.Cited by:[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p7.1)\.
- \[61\]V\. Nagarajan and Z\. Kolter\(2019\)Deterministic PAC\-bayesian generalization bounds for deep networks via generalizing noise\-resilience\.InInternational Conference on Learning Representations,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1)\.
- \[62\]J\. Negrea, M\. Haghifam, G\. K\. Dziugaite, A\. Khisti, and D\. M\. Roy\(2019\)Information\-theoretic generalization bounds for sgld via data\-dependent estimates\.Advances in Neural Information Processing Systems32\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p1.6)\.
- \[63\]G\. Neu, G\. K\. Dziugaite, M\. Haghifam, and D\. M\. Roy\(2021\)Information\-theoretic generalization bounds for stochastic gradient descent\.InConference on Learning Theory,pp\. 3526–3545\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1)\.
- \[64\]C\. G\. Northcutt, A\. Athalye, and J\. Mueller\(2021\)Pervasive label errors in test sets destabilize machine learning benchmarks\.InProceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks,Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[65\]S\. Park, S\. Kim, and Y\. Lim\(2022\)Fairness audit of machine learning models with confidential computing\.InProceedings of the ACM Web Conference 2022,pp\. 3488–3499\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[66\]M\. Pérez\-Ortiz, O\. Rivasplata, J\. Shawe\-Taylor, and C\. Szepesvári\(2021\)Tighter risk certificates for neural networks\.Journal of Machine Learning Research22\(227\),pp\. 1–40\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1)\.
- \[67\]H\. Petzka, M\. Kamp, L\. Adilova, C\. Sminchisescu, and M\. Boley\(2021\)Relative flatness and generalization\.InAdvances in Neural Information Processing Systems,A\. Beygelzimer, Y\. Dauphin, P\. Liang, and J\. W\. Vaughan \(Eds\.\),Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[68\]B\. Recht, R\. Roelofs, L\. Schmidt, and V\. Shankar\(2019\)Do imagenet classifiers generalize to imagenet?\.InInternational conference on machine learning,pp\. 5389–5400\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[69\]M\. Sakarvadia, A\. Ajith, A\. M\. Khan, N\. C\. Hudson, C\. Geniesse, K\. Chard, Y\. Yang, I\. Foster, and M\. W\. Mahoney\(2025\)Mitigating memorization in language models\.InThe Thirteenth International Conference on Learning Representations,Cited by:[Appendix A](https://arxiv.org/html/2606.25004#A1.p5.14)\.
- \[70\]K\. A\. Sankararaman, S\. De, Z\. Xu, W\. R\. Huang, and T\. Goldstein\(2020\)The impact of neural network overparameterization on gradient confusion and stochastic gradient descent\.InInternational conference on machine learning,pp\. 8469–8479\.Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p7.1),[§4\.4](https://arxiv.org/html/2606.25004#S4.SS4.p3.5)\.
- \[71\]A\. S\. Shamsabadi, G\. Tan, T\. I\. Cebere, A\. Bellet, H\. Haddadi, N\. Papernot, X\. Wang, and A\. Weller\(2024\)Confidential\-dpproof: confidential proof of differentially private training\.InInternational Conference on Learning Representations \(ICLR\),Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p2.1)\.
- \[72\]A\. S\. Shamsabadi, S\. C\. Wyllie, N\. Franzese, N\. Dullerud, S\. Gambs, N\. Papernot, X\. Wang, and A\. Weller\(2023\)Confidential\-profitt: confidential proof of fair training of trees\.InThe Eleventh International Conference on Learning Representations,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[73\]R\. Shokri, M\. Stronati, C\. Song, and V\. Shmatikov\(2017\)Membership inference attacks against machine learning models\.InIEEE Symposium on Security and Privacy,pp\. 3–18\.Cited by:[§5\.2](https://arxiv.org/html/2606.25004#S5.SS2.p3.1)\.
- \[74\]K\. Simonyan and A\. Zisserman\(2015\)Very deep convolutional networks for large\-scale image recognition\.InInternational Conference on Learning Representations,Cited by:[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9)\.
- \[75\]H\. Souri, L\. Fowl, R\. Chellappa, M\. Goldblum, and T\. Goldstein\(2022\)Sleeper agent: scalable hidden trigger backdoors for neural networks trained from scratch\.Advances in Neural Information Processing Systems35,pp\. 19165–19178\.Cited by:[item](https://arxiv.org/html/2606.25004#S5.I1.i3.p1.2)\.
- \[76\]J\. M\. Springer, V\. Nagarajan, and A\. Raghunathan\(2024\)Sharpness\-aware minimization enhances feature quality via balanced learning\.InThe Twelfth International Conference on Learning Representations,Cited by:[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p2.1)\.
- \[77\]H\. Sun, T\. Bai, J\. Li, and H\. Zhang\(2024\)Zkdl: efficient zero\-knowledge proofs of deep learning training\.IEEE Transactions on Information Forensics and Security\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p2.1),[§3\.1](https://arxiv.org/html/2606.25004#S3.SS1.p4.3)\.
- \[78\]I\. E\. Tampu, A\. Eklund, and N\. Haj\-Hosseini\(2022\)Inflation of test accuracy due to data leakage in deep learning\-based classification of oct images\.Scientific Data9\(1\),pp\. 580\.Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p2.1)\.
- \[79\]G\. Tan, A\. Gascón, S\. Meiklejohn, M\. Raykova, X\. Wang, and N\. Luo\(2025\)Founding zero\-knowledge proof of training on optimum vicinity\.InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security,pp\. 1173–1187\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p2.1),[§3\.1](https://arxiv.org/html/2606.25004#S3.SS1.p4.3)\.
- \[80\]H\. Tang and R\. Khanna\(2026\)Sharpness\-aware machine unlearning\.InThe Fourteenth International Conference on Learning Representations,Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[81\]E\. Toreini, M\. Mehrnezhad, and A\. Van Moorsel\(2023\)Verifiable fairness: privacy–preserving computation of fairness for machine learning systems\.InEuropean symposium on research in computer security,pp\. 569–584\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[82\]A\. Turner, D\. Tsipras, and A\. Madry\(2019\)Label\-consistent backdoor attacks\.arXiv preprint arXiv:1912\.02771\.Cited by:[item](https://arxiv.org/html/2606.25004#S5.I1.i3.p1.2)\.
- \[83\]G\. Valle\-Pérez and A\. A\. Louis\(2020\)Generalization bounds for deep learning\.External Links:2012\.04115Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p3.1),[§1](https://arxiv.org/html/2606.25004#S1.p3.1)\.
- \[84\]P\. Viallard, R\. Emonet, A\. Habrard, E\. Morvant, and V\. Zantedeschi\(2024\)Leveraging pac\-bayes theory and gibbs distributions for generalization bounds with complexity measures\.InInternational conference on artificial intelligence and statistics,pp\. 3007–3015\.Cited by:[§2\.2](https://arxiv.org/html/2606.25004#S2.SS2.p3.1)\.
- \[85\]K\. Wen, Z\. Li, and T\. Ma\(2023\)Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization\.Advances in Neural Information Processing Systems36,pp\. 1024–1035\.Cited by:[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p2.1)\.
- \[86\]Y\. Wen, L\. Marchyok, S\. Hong, J\. Geiping, T\. Goldstein, and N\. Carlini\(2024\)Privacy backdoors: enhancing membership inference through poisoning pre\-trained models\.Advances in Neural Information Processing Systems37,pp\. 83374–83396\.Cited by:[§5\.2](https://arxiv.org/html/2606.25004#S5.SS2.p3.1)\.
- \[87\]C\. Weng, K\. Yang, Z\. Yang, X\. Xie, and X\. Wang\(2022\-11\)AntMan: interactive zero\-knowledge proofs with sublinear communication\.InACM CCS 2022,H\. Yin, A\. Stavrou, C\. Cremers, and E\. Shi \(Eds\.\),,pp\. 2901–2914\.Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p1.4)\.
- \[88\]S\. Wojtowytsch\(2024\)Stochastic gradient descent with noise of machine learning type part ii: continuous time analysis\.Journal of Nonlinear Science34\(1\),pp\. 16\.Cited by:[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p7.1)\.
- \[89\]E\. Wong, F\. Schmidt, J\. H\. Metzen, and J\. Z\. Kolter\(2018\)Scaling provable adversarial defenses\.Advances in Neural Information Processing Systems31\.Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p1.1)\.
- \[90\]D\. Wu, S\. Xia, and Y\. Wang\(2020\)Adversarial weight perturbation helps robust generalization\.InNeurIPS,Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.
- \[91\]Z\. Xing, Z\. Zhang, Z\. Zhang, Z\. Li, M\. Li, J\. Liu, Z\. Zhang, Y\. Zhao, Q\. Sun, L\. Zhu,et al\.\(2025\)Zero\-knowledge proof\-based verifiable decentralized machine learning in communication network: a comprehensive survey\.IEEE Communications Surveys & Tutorials\.Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p2.1)\.
- \[92\]C\. Yadav, A\. Roy\-Chowdhury, D\. Boneh, and K\. Chaudhuri\(2024\)FairProof: confidential and certifiable fairness for neural networks\.InProceedings of the 41st International Conference on Machine Learning,Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[93\]K\. Yang, P\. Sarkar, C\. Weng, and X\. Wang\(2021\-11\)QuickSilver: efficient and affordable zero\-knowledge proofs for circuits and polynomials over any field\.InACM CCS 2021,G\. Vigna and E\. Shi \(Eds\.\),,pp\. 2986–3001\.Cited by:[§2\.3](https://arxiv.org/html/2606.25004#S2.SS3.p1.4)\.
- \[94\]S\. Yeom, I\. Giacomelli, M\. Fredrikson, and S\. Jha\(2018\)Privacy risk in machine learning: analyzing the connection to overfitting\.In2018 IEEE 31st computer security foundations symposium \(CSF\),pp\. 268–282\.Cited by:[§5\.2](https://arxiv.org/html/2606.25004#S5.SS2.p3.1)\.
- \[95\]D\. Yuan, M\. Zhang, S\. Wei, L\. Liu, and B\. Wu\(2025\)Activation gradient based poisoned sample detection against backdoor attacks\.InThe Thirteenth International Conference on Learning Representations,Cited by:[§1\.1](https://arxiv.org/html/2606.25004#S1.SS1.p3.1),[§4\.4](https://arxiv.org/html/2606.25004#S4.SS4.p5.1)\.
- \[96\]S\. Zagoruyko and N\. Komodakis\(2016\)Wide residual networks\.InProceedings of the British Machine Vision Conference \(BMVC\),pp\. 87\.1–87\.12\.External Links:[Document](https://dx.doi.org/10.5244/C.30.87)Cited by:[§4\.5](https://arxiv.org/html/2606.25004#S4.SS5.p2.9)\.
- \[97\]T\. Zhang, S\. Dong, O\. D\. Kose, Y\. Shen, and Y\. Zhang\(2025\)FairZK: a scalable system to prove machine learning fairness in zero\-knowledge\.In2025 IEEE Symposium on Security and Privacy \(SP\),Vol\.,pp\. 3460–3478\.External Links:[Document](https://dx.doi.org/10.1109/SP61157.2025.00205)Cited by:[§1\.2](https://arxiv.org/html/2606.25004#S1.SS2.p2.1)\.
- \[98\]L\. Zhao, Q\. Liu, L\. Yue, W\. Chen, L\. Chen, R\. Sun, and C\. Song\(2024\)Comi: correct and mitigate shortcut learning behavior in deep neural networks\.InProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval,pp\. 218–228\.Cited by:[item](https://arxiv.org/html/2606.25004#S5.I1.i2.p1.3)\.
- \[99\]Z\. Zhou, M\. Wang, Y\. Mao, B\. Li, and J\. Yan\(2025\)Sharpness\-aware minimization efficiently selects flatter minima late in training\.InInternational Conference on Learning Representations,Cited by:[1st item](https://arxiv.org/html/2606.25004#S4.I1.i1.p1.1),[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p1.1),[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p2.1),[§4\.3](https://arxiv.org/html/2606.25004#S4.SS3.p6.1),[§5\.3](https://arxiv.org/html/2606.25004#S5.SS3.p6.8),[§5](https://arxiv.org/html/2606.25004#S5.p4.1),[Assumption 1](https://arxiv.org/html/2606.25004#Thmassumption1),[Assumption 2](https://arxiv.org/html/2606.25004#Thmassumption2)\.
- \[100\]M\. Zhu, S\. Wei, L\. Shen, Y\. Fan, and B\. Wu\(2023\)Enhancing fine\-tuning based backdoor defense with sharpness\-aware minimization\.InICCV,Cited by:[§1](https://arxiv.org/html/2606.25004#S1.p5.1)\.

## Appendix

## Appendix A Additional Experiments

Table 7: Results of additional distinguishability experiments. Each entry reports the benign/faulty ratio; lower values indicate better separation. Directional sharpness provides the strongest separation.

We provide additional distinguishability experiments extending Section [5.1](https://arxiv.org/html/2606.25004#S5.SS1), evaluating whether directional sharpness remains informative under subtler, more challenging deviations and on non-vision models. We follow the same evaluation protocol as in Section [5.1](https://arxiv.org/html/2606.25004#S5.SS1): for each setting, we train benign/faulty model pairs and ask whether sharpness metrics can separate them when observed accuracy fails. Table [7](https://arxiv.org/html/2606.25004#A1.T7) reports the benign/faulty ratios for the same metrics as in Section [5.1](https://arxiv.org/html/2606.25004#S5.SS1).

Backdoor on ResNet-18 (ResNet-BD). We repeat the BadNets backdoor experiment on ResNet-18. The faulty model is trained on CIFAR-10 with 1% poisoned images. Both benign and faulty models are trained for 160 epochs. The observed accuracy reported is the clean test accuracy.

Varying label noise (Noise). We repeat the label-noise experiment with varying noise rates to evaluate the sensitivity of directional sharpness to weaker forms of data-quality degradation. The observed accuracy reported is the clean test accuracy.

Sharpness-aware poisoning attack (SAPA). SAPA \[[39](https://arxiv.org/html/2606.25004#bib.bib11)\] uses SAM-style perturbations during poison generation to optimize the adversarial objective, making the poison more robust to retraining uncertainty. We train a benign ResNet-18 on clean CIFAR-10 using standard SGD, and a faulty ResNet-18 using SAPA's hidden-trigger backdoor attack with $1\%$ of the training set poisoned, perturbation budget $\epsilon = 16/255$, and sharpness radius $\rho = 0.05$. Both models are trained for 160 epochs. The observed accuracy is the clean test accuracy.

Backdoor (LM-BD) and memorization (LM-Mem) on language models. We extend our distinguishability tests to transformer language models using TinyMem \[[69](https://arxiv.org/html/2606.25004#bib.bib10)\], a suite of lightweight GPT-2-style models. Using the 4-layer multiplicative architecture, we train a benign model $M_b$ on clean data and a faulty model $M_f$ on data poisoned with either the backdoor artifact \[[69](https://arxiv.org/html/2606.25004#bib.bib10), Def. 2.3\] or the noise artifact \[[69](https://arxiv.org/html/2606.25004#bib.bib10), Def. 2.2\], which inject a trigger pattern and a random perturbation, respectively, into a small fraction of the training sequences (see \[[69](https://arxiv.org/html/2606.25004#bib.bib10)\] for details). In both settings, $M_f$ memorizes the injected artifacts while matching the observed (clean-test) accuracy of $M_b$: the backdoored $M_f$ attains $99.3\%$ backdoor memorization at $96.92\%$ observed accuracy (vs. $97.41\%$ for $M_b$), and the noise-trained $M_f$ memorizes $37.4\%$ of the artifacts at $97.11\%$ observed accuracy (vs. $97.40\%$ for $M_b$).

## Appendix B Proofs of Lemmas and Theorems

We first establish two direct consequences of our assumptions. Under Assumption [1](https://arxiv.org/html/2606.25004#Thmassumption1), we have

$$2\mu\,\mathcal{L}(\bm{w}) \leq \|g(\bm{w})\|_2^2 \leq 2L_s\,\mathcal{L}(\bm{w}), \tag{5}$$

where the upper bound is a standard result from smoothness \[[46](https://arxiv.org/html/2606.25004#bib.bib85), Lemma 1\] with $\inf_{\bm{w}}\mathcal{L}(\bm{w}) = 0$.
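
As a sanity check, Eq. (5) can be verified numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper: it assumes that Assumption 1 amounts to a $\mu$-PL condition together with $L_s$-smoothness (as the two sides of Eq. (5) suggest) and uses a realizable least-squares loss, for which $\mu$ and $L_s$ are the extreme eigenvalues of the Hessian. All names (`loss`, `grad`, `eq5_holds`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 5
X = rng.normal(size=(N, d))
w_star = rng.normal(size=d)
y = X @ w_star  # realizable targets, so inf_w L(w) = 0

# Hessian of L(w) = (1 / 2N) * ||X w - y||^2; its extreme eigenvalues
# serve as the PL constant mu and the smoothness constant L_s here.
H = X.T @ X / N
eigvals = np.linalg.eigvalsh(H)  # ascending order
mu, L_s = eigvals[0], eigvals[-1]


def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)


def grad(w):
    return X.T @ (X @ w - y) / N


def eq5_holds(w, rel_tol=1e-9):
    """Check 2*mu*L(w) <= ||g(w)||^2 <= 2*L_s*L(w) up to rounding."""
    sq_norm = float(grad(w) @ grad(w))
    slack = rel_tol * (1.0 + sq_norm)
    return 2 * mu * loss(w) - slack <= sq_norm <= 2 * L_s * loss(w) + slack


# Probe random points at several distances from the minimizer.
checks = [
    eq5_holds(w_star + scale * rng.normal(size=d))
    for scale in (0.1, 1.0, 10.0)
    for _ in range(100)
]
print(all(checks))
```

For this quadratic the bounds follow from $\mu H \preceq H^2 \preceq L_s H$; other losses satisfying Assumption 1 would need their own constants.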

Additionally, under Assumption [2](https://arxiv.org/html/2606.25004#Thmassumption2), the gradient noise for a single sample $i$ satisfies $\mathbb{E}_i\|\zeta_i^1(\bm{w})\|_2^2 \leq \sigma^2\mathcal{L}(\bm{w})$. For a mini-batch $\xi$ of size $B$ sampled uniformly without replacement, we have

$$\mathbb{E}_\xi\|\zeta^B(\bm{w},\xi)\|_2^2 = \frac{N-B}{B(N-1)}\,\mathbb{E}_i\|\zeta_i^1(\bm{w})\|_2^2.$$

Since $(N-B)/(N-1) \leq 1$, we obtain

$$\mathbb{E}_\xi\|\zeta^B(\bm{w},\xi)\|_2^2 \leq \frac{\sigma^2}{B}\,\mathcal{L}(\bm{w}). \tag{6}$$
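
The finite-population factor $(N-B)/(B(N-1))$ can be confirmed by brute force. The following sketch is an illustration under stated assumptions rather than the authors' code: it treats arbitrary random vectors as stand-ins for per-sample gradients and enumerates every size-$B$ subset, so the expectation over mini-batches is exact. The function names are hypothetical.

```python
import itertools

import numpy as np


def minibatch_noise_second_moment(per_sample_grads, batch_size):
    """Exact E_xi ||zeta^B||^2, enumerating all size-B subsets (no replacement)."""
    n = per_sample_grads.shape[0]
    full_grad = per_sample_grads.mean(axis=0)
    sq_norms = [
        np.sum((per_sample_grads[list(idx)].mean(axis=0) - full_grad) ** 2)
        for idx in itertools.combinations(range(n), batch_size)
    ]
    return float(np.mean(sq_norms))


def single_sample_noise_second_moment(per_sample_grads):
    """E_i ||zeta_i^1||^2 for a uniformly drawn single sample."""
    full_grad = per_sample_grads.mean(axis=0)
    return float(np.mean(np.sum((per_sample_grads - full_grad) ** 2, axis=1)))


rng = np.random.default_rng(0)
N, B, d = 6, 3, 4
grads = rng.normal(size=(N, d))  # stand-in per-sample gradients (assumption)

single = single_sample_noise_second_moment(grads)
lhs = minibatch_noise_second_moment(grads, B)
rhs = (N - B) / (B * (N - 1)) * single

print(np.isclose(lhs, rhs))  # identity preceding Eq. (6)
print(lhs <= single / B)     # the relaxation used to obtain Eq. (6)
```

Enumeration is only feasible for small $N$; for realistic dataset sizes a Monte Carlo estimate over sampled mini-batches would replace the exact average.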
### B.1 Proof of Lemma [1](https://arxiv.org/html/2606.25004#Thmlemma1)

###### Proof\.

For $p=2$, the SAM sharpness on mini-batch $\xi_t$ is

$$s_t = \rho\,\|\nabla\mathcal{L}_{\xi_t}(\bm{w}_t)\|_2 = \rho\,\|g(\bm{w}_t)+\zeta_t\|_2,$$

where $\zeta_t := g_{\xi_t}(\bm{w}_t) - g(\bm{w}_t)$. Conditioning on $\bm{w}_t$, and using the fact that the mini-batch gradient is unbiased, we have

$$\mathbb{E}[s_t^2 \mid \bm{w}_t] = \rho^2\left(\|g(\bm{w}_t)\|_2^2 + \mathbb{E}[\|\zeta_t\|_2^2 \mid \bm{w}_t]\right).$$

For the lower bound, drop the nonnegative noise term and apply the PL lower bound in Eq. [5](https://arxiv.org/html/2606.25004#A2.E5):

$$\mathbb{E}[s_t^2 \mid \bm{w}_t] \geq \rho^2\,\|g(\bm{w}_t)\|_2^2 \geq 2\rho^2\mu\,\mathcal{L}(\bm{w}_t).$$

For the upper bound, apply the upper bound in Eq. [5](https://arxiv.org/html/2606.25004#A2.E5) and the mini-batch noise bound Eq. [6](https://arxiv.org/html/2606.25004#A2.E6):

$$\mathbb{E}[s_t^2 \mid \bm{w}_t] \leq \rho^2\left(2L_s + \frac{\sigma^2}{B}\right)\mathcal{L}(\bm{w}_t).$$

This completes the proof. ∎
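The sandwich bound can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not the paper's experimental setup: it uses a two-dimensional least-squares loss with zero targets (so that $\inf_{\bm{w}}\mathcal{L}(\bm{w})=0$), takes $\mu$ and $L_s$ to be the extreme eigenvalues of the Hessian, and picks $\sigma^2 = 2\max_i\|x_i\|_2^2$, which is one valid noise constant for this particular loss. The function name `sandwich_check` and all parameter values are hypothetical. The conditional expectation $\mathbb{E}[s_t^2 \mid \bm{w}_t]$ is computed exactly by enumerating every mini-batch of size $B$.

```python
import itertools
import math
import random


def sandwich_check(N=8, B=3, rho=0.05, seed=0):
    """Return (lower, E[s^2 | w], upper) for a toy least-squares problem."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    w = (rng.gauss(0, 1), rng.gauss(0, 1))

    # Per-sample loss l_i(w) = 0.5 * (x_i . w)^2, so the infimum of L is 0.
    r = [x[0] * w[0] + x[1] * w[1] for x in X]
    loss = sum(0.5 * ri * ri for ri in r) / N
    grads = [(ri * x[0], ri * x[1]) for ri, x in zip(r, X)]  # g_i = (x_i . w) x_i

    # Hessian H = (1/N) sum_i x_i x_i^T; closed-form eigenvalues of a 2x2 matrix.
    a = sum(x[0] * x[0] for x in X) / N
    b = sum(x[0] * x[1] for x in X) / N
    c = sum(x[1] * x[1] for x in X) / N
    disc = math.sqrt((a - c) ** 2 + 4 * b * b)
    mu, L_s = (a + c - disc) / 2, (a + c + disc) / 2  # assumed PL / smoothness constants

    # Assumed noise constant: E_i ||g_i - g||^2 <= E_i ||g_i||^2 <= 2 max_i ||x_i||^2 * L(w).
    sigma2 = 2 * max(x[0] ** 2 + x[1] ** 2 for x in X)

    # Exact E[s^2 | w] over all mini-batches drawn without replacement.
    batches = list(itertools.combinations(range(N), B))
    total = 0.0
    for batch in batches:
        gx = sum(grads[i][0] for i in batch) / B
        gy = sum(grads[i][1] for i in batch) / B
        total += rho ** 2 * (gx * gx + gy * gy)
    expected_s2 = total / len(batches)

    lower = 2 * rho ** 2 * mu * loss
    upper = rho ** 2 * (2 * L_s + sigma2 / B) * loss
    return lower, expected_s2, upper


if __name__ == "__main__":
    lo, mid, hi = sandwich_check()
    print(f"lower={lo:.3e}  E[s^2|w]={mid:.3e}  upper={hi:.3e}")
```

Exact enumeration is used instead of Monte Carlo sampling so that the comparison carries no sampling error; it is only feasible because $N$ is tiny here.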

### B.2 Proof of Theorem [1](https://arxiv.org/html/2606.25004#Thmtheorem1)

###### Proof\.

Taking expectations in Lemma [1](https://arxiv.org/html/2606.25004#Thmlemma1) and using the tower rule gives

$$\underline{\kappa}_\rho\,\mathbb{E}[\mathcal{L}(\bm{w}_t)] \leq \mathbb{E}[s_t^2] \leq \overline{\kappa}_\rho\,\mathbb{E}[\mathcal{L}(\bm{w}_t)]. \tag{7}$$

(a) Stability implies bounded sharpness. SAM-stability gives $\mathbb{E}[\mathcal{L}(\bm{w}_t)] \leq C\,\mathcal{L}(\bm{w}_0)$ for all $t$. Substituting into the upper bound in Eq. [7](https://arxiv.org/html/2606.25004#A2.E7) gives

$$\mathbb{E}[s_t^2] \leq \overline{\kappa}_\rho\,C\,\mathcal{L}(\bm{w}_0).$$

(b) Exponential sharpness growth certifies instability. The upper side of the sandwich bound in Eq. [7](https://arxiv.org/html/2606.25004#A2.E7) implies

$$\mathbb{E}[\mathcal{L}(\bm{w}_t)] \geq \frac{1}{\overline{\kappa}_\rho}\,\mathbb{E}[s_t^2].$$

Therefore, if $\mathbb{E}[s_t^2] \geq K\alpha^t\,\mathcal{L}(\bm{w}_0)$ for all $t\geq t_0$, then

$$\mathbb{E}[\mathcal{L}(\bm{w}_t)] \geq \frac{K}{\overline{\kappa}_\rho}\,\alpha^t\,\mathcal{L}(\bm{w}_0).$$

Since $\alpha>1$ and $\mathcal{L}(\bm{w}_0)>0$, this loss trajectory is unbounded relative to $\mathcal{L}(\bm{w}_0)$ and violates the condition in Def. [6](https://arxiv.org/html/2606.25004#Thmdefinition6). Hence $\bm{w}^*$ is SAM-unstable. ∎

### B.3 Proof of Theorem [2](https://arxiv.org/html/2606.25004#Thmtheorem2)

We first prove the following lemma for mini\-batch gradients\.

###### Lemma 2\.

Let $\xi$ be a mini-batch of size $B$ sampled uniformly without replacement from a dataset of size $N$. Write $S_g^2 := \frac{1}{N}\sum_{i=1}^{N}\|g_i(\bm{w})\|_2^2$ and define $c_g(\bm{w}) := \|g(\bm{w})\|_2^2 / S_g^2$ (Def. [7](https://arxiv.org/html/2606.25004#Thmdefinition7)). Assuming $S_g^2>0$,

$$\mathbb{E}_\xi\|g_\xi(\bm{w})\|_2^2 = \|g(\bm{w})\|_2^2 + \frac{N-B}{B(N-1)}\bigl(1-c_g(\bm{w})\bigr)S_g^2.$$

###### Proof\.

Expanding the squared norm and using $\Pr[i\in\xi]=B/N$ and $\Pr[i,j\in\xi]=B(B-1)/(N(N-1))$ for $i\neq j$,

$$\begin{aligned}
\mathbb{E}_\xi\|g_\xi\|_2^2 &= \frac{1}{B^2}\left(\frac{B}{N}\sum_i\|g_i\|_2^2 + \frac{B(B-1)}{N(N-1)}\sum_{i\neq j}\langle g_i,g_j\rangle\right) \\
&= \frac{1}{B}S_g^2 + \frac{B-1}{B}\cdot\frac{1}{N(N-1)}\sum_{i\neq j}\langle g_i,g_j\rangle.
\end{aligned}$$

Since

$$\sum_{i\neq j}\langle g_i,g_j\rangle = N^2\|g\|_2^2 - N S_g^2,$$

substitution gives

$$\mathbb{E}_\xi\|g_\xi\|_2^2 = \|g\|_2^2 + \frac{N-B}{B(N-1)}\bigl(S_g^2 - \|g\|_2^2\bigr).$$

Plugging in $S_g^2 - \|g\|_2^2 = \bigl(1-c_g(\bm{w})\bigr)S_g^2$ yields the lemma. ∎
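Because Lemma 2 is an exact identity, it can be verified by brute force on a small dataset. The following sketch (function names such as `lemma2_sides` are hypothetical, and the per-sample gradients are arbitrary random vectors rather than gradients of any real model) averages $\|g_\xi\|_2^2$ over every size-$B$ subset and compares the result with the closed-form right-hand side.

```python
import itertools
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a sequence of floats."""
    return sum(x * x for x in v)


def mean_vec(vectors):
    """Coordinate-wise mean of a collection of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lemma2_sides(grads, B):
    """Return (lhs, rhs) of the Lemma 2 identity for per-sample gradients `grads`."""
    N = len(grads)
    g = mean_vec(grads)                              # full-batch gradient
    S_g2 = sum(sq_norm(gi) for gi in grads) / N      # mean squared per-sample norm
    c_g = sq_norm(g) / S_g2                          # gradient coherence, requires S_g2 > 0

    # Left side: exact expectation over all mini-batches without replacement.
    batches = list(itertools.combinations(grads, B))
    lhs = sum(sq_norm(mean_vec(b)) for b in batches) / len(batches)

    # Right side: closed form with the finite-population factor.
    rhs = sq_norm(g) + (N - B) / (B * (N - 1)) * (1 - c_g) * S_g2
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(1)
    grads = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(7)]
    for B in range(1, 8):
        lhs, rhs = lemma2_sides(grads, B)
        print(B, lhs, rhs)
```

Two boundary cases are worth noting: for $B=N$ the correction factor vanishes and both sides reduce to $\|g\|_2^2$, while for $B=1$ the factor equals one and both sides reduce to $S_g^2$.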

###### Proof of Theorem [2](https://arxiv.org/html/2606.25004#Thmtheorem2).

For $p=q=2$,

$$\mathsf{S} = \rho\,\|g(\bm{w})\|_2, \qquad \mathsf{S}_\xi = \rho\,\|g_\xi(\bm{w})\|_2, \qquad \mathsf{S}_{\mathrm{rms}}^2 = \mathbb{E}_\xi[\mathsf{S}_\xi^2].$$

Therefore,

$$\mathsf{S}_{\mathrm{rms}}^2 - \mathsf{S}^2 = \mathbb{E}_\xi[\mathsf{S}_\xi^2] - \mathsf{S}^2 = \rho^2\left(\mathbb{E}_\xi\|g_\xi(\bm{w})\|_2^2 - \|g(\bm{w})\|_2^2\right).$$

Applying Lemma [2](https://arxiv.org/html/2606.25004#Thmlemma2) yields Eq. [3](https://arxiv.org/html/2606.25004#S4.E3):

$$\mathsf{S}_{\mathrm{rms}}^2 - \mathsf{S}^2 = \rho^2\cdot\frac{N-B}{B(N-1)}\cdot\bigl(1-c_g(\bm{w})\bigr)\cdot S_g^2.$$

If $B\ll N$, then $\frac{N-B}{B(N-1)}\approx\frac{1}{B}$. If also $c_g(\bm{w})\ll 1$, then $1-c_g(\bm{w})\approx 1$, and hence we have Eq. [4](https://arxiv.org/html/2606.25004#S4.E4):

$$\mathsf{S}_{\mathrm{rms}}^2 - \mathsf{S}^2 \approx \rho^2\,\frac{1}{B}\,S_g^2 = \rho^2\,\frac{1}{B}\cdot\frac{1}{N}\sum_{i=1}^{N}\|g_i(\bm{w})\|_2^2.$$

Finally, if $\|g(\bm{w})\|_2\leq\varepsilon$, then $\mathsf{S}^2\leq\rho^2\varepsilon^2$. Therefore, whenever $\varepsilon^2\ll S_g^2/B$, we have

$$\mathsf{S}^2 \ll \rho^2 S_g^2/B \approx \mathsf{S}_{\mathrm{rms}}^2 - \mathsf{S}^2.$$

Hence $\mathsf{S}^2\ll\mathsf{S}_{\mathrm{rms}}^2$ and $\mathsf{S}\ll\mathsf{S}_{\mathrm{rms}}$. ∎
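To make the final regime concrete, the sketch below builds a synthetic set of per-sample gradients that almost cancel, so the full-batch gradient is tiny while individual gradients are not. This is an illustrative construction, not data from the paper; the helper names `make_cancelling_grads` and `sharpness_gap` and the values of $N$, $B$, $\rho$, and $\varepsilon$ are all assumptions. It evaluates $\mathsf{S}$, computes $\mathsf{S}_{\mathrm{rms}}$ from the exact gap formula (Eq. 3), and also returns the approximate gap $\rho^2 S_g^2/B$ (Eq. 4) for comparison.

```python
import math


def make_cancelling_grads(N=1000, eps=1e-3):
    """Half the samples point along +e1, half along -e1, with a small shared offset eps."""
    half = N // 2
    return [[1.0 + eps, 0.0]] * half + [[-1.0 + eps, 0.0]] * (N - half)


def sharpness_gap(grads, B, rho):
    """Return (S, S_rms, approx_gap) for p = q = 2 using the exact gap formula."""
    N = len(grads)
    d = len(grads[0])
    g = [sum(gi[k] for gi in grads) / N for k in range(d)]    # full-batch gradient
    g2 = sum(x * x for x in g)                                # ||g||^2
    S_g2 = sum(sum(x * x for x in gi) for gi in grads) / N    # mean ||g_i||^2
    fpc = (N - B) / (B * (N - 1))                             # finite-population factor

    S = rho * math.sqrt(g2)
    # (1 - c_g) * S_g2 equals S_g2 - ||g||^2, so the exact gap is rho^2 * fpc * (S_g2 - g2).
    S_rms = math.sqrt(S ** 2 + rho ** 2 * fpc * (S_g2 - g2))
    approx_gap = rho ** 2 * S_g2 / B                          # small-batch, low-coherence approximation
    return S, S_rms, approx_gap


if __name__ == "__main__":
    S, S_rms, approx_gap = sharpness_gap(make_cancelling_grads(), B=10, rho=0.05)
    print(f"S={S:.3e}  S_rms={S_rms:.3e}  exact gap={S_rms**2 - S**2:.3e}  approx gap={approx_gap:.3e}")
```

In this construction $\varepsilon^2$ is several orders of magnitude below $S_g^2/B$, which is the condition under which the proof concludes $\mathsf{S}\ll\mathsf{S}_{\mathrm{rms}}$.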