Few-Shot Resampling for Scalable Statistically-Sound Data Mining
Summary
Introduces FewRS, a resampling-based approach that drastically reduces the number of resampled datasets required for statistically-sound data mining, achieving up to two orders of magnitude speedup while maintaining rigorous false discovery control and high statistical power.
View Cached Full Text
Cached at: 06/11/26, 01:44 PM
# Few-Shot Resampling for Scalable Statistically-Sound Data Mining Source: [https://arxiv.org/html/2606.11235](https://arxiv.org/html/2606.11235) Leonardo PellegrinaDepartment of Information Engineering, University of PadovaVia Gradenigo 6bPadovaItaly35129[leonardo\.pellegrina@unipd\.it](https://arxiv.org/html/2606.11235v1/mailto:[email protected])Fabio VandinDepartment of Information Engineering, University of PadovaVia Gradenigo 6bPadovaItaly35129[fabio\.vandin@unipd\.it](https://arxiv.org/html/2606.11235v1/mailto:[email protected]) \(2026\) ###### Abstract\. A key step in knowledge discovery is the evaluation of data mining results\. In several applications, including pattern mining, graph analysis, and others, this step includes the evaluation of the statistical significance of the results, to avoid spurious discoveries due only to noise or random fluctuations in the data\. While specialized procedures have been developed for some specific applications, resampling\-based approaches are widely used, in particular for complex analyses where analytical results cannot be derived\. However, current resampling\-based approaches require the generation and analysis of thousands of resampled datasets, and are therefore impractical for large datasets or computationally intensive analyses\. In this paper, we introduceFewRS, a simple and effective resampling\-based approach to assess the statistical significance of data mining results with rigorous guarantees on the probability of false discoveries\. Our approach can be used in every situation where resampling\-based approaches are applied\.FewRSbuilds on our derivation of a novel bound to the supremum deviation of test statistics representing the quality of data mining results\. We prove thatFewRSneeds to generate and analyze an extremely small number of resampled datasets, leading to a highly scalable approach with wide applicability\. We test our approach on common tasks such as pattern mining and network analysis\. In all cases, our approach results in a reduction of up to two orders of magnitude in running time compared to the state of the art, while preserving high statistical power, enabling the statistical validation of data mining results on large\-scale real\-world datasets\. statistically\-sound data mining, resampling, pattern mining, graph mining, scalable algorithm, FWER ††journalyear:2026††copyright:cc††conference:Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\.2; August 09–13, 2026; Jeju Island, Republic of Korea††booktitle:Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\.2 \(KDD ’26\), August 09–13, 2026, Jeju Island, Republic of Korea††doi:10\.1145/3770855\.3817752††isbn:979\-8\-4007\-2259\-2/2026/08††ccs:Information systems Data mining††ccs:Mathematics of computing Probability and statistics## 1\.Introduction Knowledge discovery is a complex process consisting of several steps whose goal is to extract useful information from complex and large datasets\. A crucial component of this process is the mining step\(Hanet al\.,[2022](https://arxiv.org/html/2606.11235#bib.bib8); Leskovecet al\.,[2020](https://arxiv.org/html/2606.11235#bib.bib7)\), where the dataset is analyzed to extract patterns or other characteristics \(e\.g\., clusters\) of the data which provide actionable information to the end user or inform subsequent analyses\. A key issue in the mining step is to avoid uninteresting or spurious results\(Tanet al\.,[2016](https://arxiv.org/html/2606.11235#bib.bib9)\)that do not provide useful knowledge and are instead due to factors such noise or random fluctuations of the data\. A common approach to avoid spurious discoveries is the evaluation of the statistical significance of the data mining results within the statistical hypothesis testing framework\. In this framework, the data mining analyses are summarized by one or multiple*statistics*, i\.e\., scores quantitatively capturing the aspects of interest\. The analyst then formulates a*null hypothesis*on the underlying process that generated the data, corresponding to the situation in which results are not interesting\. The data mining results are evaluated by comparing the statistics with their distribution under the null hypothesis \(i\.e\., when the null hypothesis holds\): results with large deviations from the null distribution are unlikely to be due to random fluctuations, and are therefore flagged as*significant*; results that are likely under the null hypothesis are instead discarded\. Example: market basket analysis\.A common task in market basket analysis is to mine a set of transactions, each corresponding to the items bought by a customer, with the goal of identifying sets of items, or itemsets, which are frequently bought together\. To avoid uninteresting results, we want to discard itemsets whose selling frequency is explained by the frequency of the single items\. Therefore, we define the*null hypothesis*in which single items are placed into transactions independently with probability equal to their frequency in the dataset\. \(Note that under the null hypothesis no itemset is interesting, since items are placed independently in transactions\.\) As test statistic, we choose the number of itemsets of sizeccthat appear in a fraction at leastθ\\thetaof the transactions of the dataset\. Assume that the value of the statistic isss\. The significance of the results is then given by the probability that, under the null hypothesis,≥s\\geq sitemsets of sizeccappear in a fraction at leastθ\\thetaof the transactions\. If such probability, calledpp\-value, is small, then the itemsets are significant, otherwise they are not\. While the statistical hypothesis testing framework cannot certify the*validity*of data mining outcomes, it provides formal guarantees on the avoidance of spurious discoveries, thus reducing the risk of investing resources on invalid or not meaningful results\. More in detail, for a given statisticA\(𝒟\)A\(\\mathcal\{D\}\)computed by analyzing the dataset𝒟\\mathcal\{D\}, letpAp\_\{A\}be its*pp\-value*, that is, the probability that the statisticA\(𝒟0\)A\(\\mathcal\{D\}^\{0\}\)computed by analyzing a dataset𝒟0\\mathcal\{D\}^\{0\}generated under the null hypothesis is*more extreme*\(e\.g\., higher\) thanA\(𝒟\)A\(\\mathcal\{D\}\)\. For a given*significance threshold*α∈\(0,1\]\\alpha\\in\(0,1\], flagging the results as significant whenpA≤αp\_\{A\}\\leq\\alphaguarantees that the probability of reporting a false discovery \(i\.e\., flagging the results as significant when they are not\) is bounded byα\\alpha\. When multiple analyses, described by statisticsA1\(𝒟\),A2\(𝒟\),…,Ak\(𝒟\)A\_\{1\}\(\\mathcal\{D\}\),A\_\{2\}\(\\mathcal\{D\}\),\\dots,A\_\{k\}\(\\mathcal\{D\}\), are performed, often one wants to bound the*Family\-Wise Error Rate*\(FWER\), that is the probability of even a single false discovery\. Several approaches\(Bonferroni,[1936](https://arxiv.org/html/2606.11235#bib.bib44); Holm,[1979](https://arxiv.org/html/2606.11235#bib.bib35); Hochberg,[1988](https://arxiv.org/html/2606.11235#bib.bib10)\)have been developed for such a*multiple hypothesis testing*scenario\. In essence, such approaches require to compute an*adjusted*significant thresholdα∗\\alpha^\{\*\}, which depends on the desired boundα\\alphaon the FWER and the numberkkof hypotheses\. Results withpp\-value belowα∗\\alpha^\{\*\}are then flagged as significant\. For some statistic and null hypotheses, thepp\-valuepAp\_\{A\}for a statisticA\(𝒟\)A\(\\mathcal\{D\}\)can be computed analytically\. For example, in market basket analysis, ifA\(𝒟\)A\(\\mathcal\{D\}\)is the frequency of a given itemset in𝒟\\mathcal\{D\}and the null hypothesis is the one described in our example above \(where items are placed independently in each transaction\), then it is easy to see that the frequencyA\(𝒟0\)A\(\\mathcal\{D\}^\{0\}\)of the itemset in a dataset𝒟0\\mathcal\{D\}^\{0\}from the null hypothesis follows a binomial distribution \(whose parameters are the product of the frequency of the single items in the itemset and the number of transactions in the dataset\)\. Several approaches have been designed to identify significant results for specific data mining tasks, such as significant pattern mining\(Webb,[2006](https://arxiv.org/html/2606.11235#bib.bib42),[2007](https://arxiv.org/html/2606.11235#bib.bib41),[2008](https://arxiv.org/html/2606.11235#bib.bib40); Teradaet al\.,[2013](https://arxiv.org/html/2606.11235#bib.bib185); Minatoet al\.,[2014](https://arxiv.org/html/2606.11235#bib.bib39)\), when the null hypothesis allows the analytical computation ofpp\-values\. However, analytical results cannot be exploited for more complex analyses or null hypotheses\. In such scenarios, resampling\-based approaches\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)are widely used\. Such methods assess the statistical significance of the results by comparingA\(𝒟\)A\(\\mathcal\{D\}\)with the*empirical distribution*of the statisticA\(𝒟0\)A\(\\mathcal\{D\}^\{0\}\)under the null hypothesis\. The latter is obtained by generating a large number of*resampled datasets*, or resamples, which are datasets generated according to the null hypothesis\. Example: network polarization\.Given a graphGGrepresenting a social network, we are interested in studying its polarization\(Conoveret al\.,[2011](https://arxiv.org/html/2606.11235#bib.bib272); Cinelliet al\.,[2021](https://arxiv.org/html/2606.11235#bib.bib273); Garimellaet al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib271); González\-Bailónet al\.,[2023](https://arxiv.org/html/2606.11235#bib.bib274); Pretiet al\.,[2025b](https://arxiv.org/html/2606.11235#bib.bib6)\)using one of the many measures available as our statisticA\(G\)A\(G\)\. These measures depend on the graph and on the colors of the vertices, where the color of a vertex represent the different sides or groups in an argument or debate\. The null hypothesis is that the edges in the graph are independent of the colors of the vertices, but the total number of edges that connect vertices of different colors are fixed; moreover, to capture the highly\-skewed degree distribution of real networks, in the null hypothesis the degree sequence of vertices is fixed\. For such an example, thepp\-value cannot be computed analytically, even for simple statistics such as the average, over the vertices, of the fraction of neighbors of a vertex that have the same color of the vertex\. However, one can use recent methods\(Fosdicket al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib266); Pretiet al\.,[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\)to generate \(independent\) resampled networks from the null distribution\. Resampling\-based approaches are also widely used in the multiple hypothesis testing scenario, even when thepp\-values can be computed analytically\. In fact, when the analytical computation ofpp\-values is possible, the*adjusted*significant thresholdα∗\\alpha^\{\*\}can be accurately estimated from resampled datasets\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)\. Compared to analytical corrections, resampling\-based approaches result in higher*statistical power*\(i\.e\., the ability to correctly flag significant hypotheses\)\. This is a consequence of the fact that they inherently consider the correlation structure among hypotheses, while analytical adjustments are affected by worst\-case assumptions\. Given their usefulness and wide applicability, several methods leverage a resampling scheme \(see Section[2](https://arxiv.org/html/2606.11235#S2)\)\. However, they all require to generate*thousands*of resampled datasets for common values ofα\\alpha\(i\.e\.,α=0\.1\\alpha=0\.1orα=0\.05\\alpha=0\.05\)\. For complex analyses and/or large datasets, mining or generating even a*single*resampled dataset may take hours or longer\. Indeed, this is the case for the network analysis example described above\. Therefore, currently available resampling\-based approaches are impractical for modern\-sized datasets whenever the data mining step or the resampling procedure is computationally intensive\. Our contributions\.In this paper, we focus on resampling\-based approaches for assessing the statistical significance of data mining results\. Our contributions are fourfold\. Firstly, we introduceFewRS, a simple and effective resampling\-based approach to assess the statistical significance of data mining results while controlling the FWER\.FewRSuses a few\-shot resampling approach, that is, it analyzes a small number of resampled datasets, leading to a highly scalable method\.FewRScan leverage*any*procedure to generate resampled datasets and can be applied to any data mining analysis where the results are summarized by a statistic\. Therefore, our approach is applicable to a wide range of situations and data mining tasks\. Second, we derive a novel bound to the supremum deviation of test statistics representing the quality of data mining results\. This result is crucial in making our method practical, since it implies that mining a small number of resampled datasets is enough to identify statistically significant results\. Third, we prove thatFewRSalso provides guarantees on its statistical power \(i\.e\., the ability to correctly report significant hypotheses\)\. Remarkably, our analysis does not require any assumption on the distribution of significant patterns\. Fourth, we perform an extensive empirical evaluation on several analyses for common data mining tasks such as pattern mining and network analysis\. In all cases,FewRSresults in a reduction of up to two orders of magnitude in running time compared to the state of the art, while preserving high statistical power, enabling the statistical validation of data mining results on large\-scale real\-world datasets\. ## 2\.Related Works Our work focuses on efficient resampling\-based approaches to evaluate the statistical significance of data mining results\. We now describe the works most related to ours, and point the reader to recent reviews and tutorials\(Hämäläinen and Webb,[2019](https://arxiv.org/html/2606.11235#bib.bib196); Pellegrinaet al\.,[2019a](https://arxiv.org/html/2606.11235#bib.bib195)\)for a wider introduction to statistically\-sound data mining algorithms, and to the book byWestfall and Young \([1993](https://arxiv.org/html/2606.11235#bib.bib43)\)for an introduction to resampling\-based approaches\. Gioniset al\.\([2007](https://arxiv.org/html/2606.11235#bib.bib267)\)introduced the use of permutation strategies to assess data mining results\. Their approach focuses on datasets with binary features, represented as0\-11matrices, and on a specific null hypothesis, where every column and every row in a random dataset must have the same number of11’s as the original dataset\.Kirschet al\.\([2012](https://arxiv.org/html/2606.11235#bib.bib184)\)introduced a method to identify a set of statistically significant patterns with a small false discovery rate \(FDR\)\(Benjamini and Hochberg,[1995](https://arxiv.org/html/2606.11235#bib.bib45)\)\. Their method is specific for itemsets mining and for the null hypothesis where items are placed independently into transactions while preserving the expected frequency of each item\.Dalleiger and Vreeken \([2022](https://arxiv.org/html/2606.11235#bib.bib1)\)focused on mining non\-redundant sets of statistically significant patterns while controlling the FWER or the FDR\. Their method is specific for itemsets and a maximum entropy null\-distribution\. General resampling\-based approaches\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)may be used to assess the statistical significance of data mining results, but their straightforward application requires to generate and mine a large number of resampled datasets\. A large body of recent work has focused on providing computationally efficient procedures for implementing some permutation\-based approaches in specific scenarios, or for the efficient generation of resampled datasets\. Several works\(Teradaet al\.,[2013](https://arxiv.org/html/2606.11235#bib.bib185); Llinares\-Lópezet al\.,[2015](https://arxiv.org/html/2606.11235#bib.bib57); Pellegrinaet al\.,[2019b](https://arxiv.org/html/2606.11235#bib.bib217); Pellegrina and Vandin,[2020](https://arxiv.org/html/2606.11235#bib.bib56); Teradaet al\.,[2015](https://arxiv.org/html/2606.11235#bib.bib38)\)have focused on*significant pattern mining*, where the goal is to identify patterns significantly associated with the value of a given binary feature \(the target\) in a transactional dataset\.\(Llinares\-Lópezet al\.,[2015](https://arxiv.org/html/2606.11235#bib.bib57); Pellegrina and Vandin,[2020](https://arxiv.org/html/2606.11235#bib.bib56); Teradaet al\.,[2015](https://arxiv.org/html/2606.11235#bib.bib38)\)use the Westfall\-Young \(WY\) permutation test\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)\. The WY permutation procedure requires to estimate the quantile of the distribution of the minimum \(over all patterns\)pp\-value \(or, equivalently, of the distribution of the maximum deviation, over all patterns, of the measure of significance\)\. The use of such quantiles makes WY permutation testing more powerful than previous analytical approaches \(e\.g\., LAMP\(Teradaet al\.,[2013](https://arxiv.org/html/2606.11235#bib.bib185)\)\), but also more computationally expensive, since the estimation of such quantiles requires to mine a large number of permuted datasets\. The work most related to ours is\(Pellegrina and Vandin,[2024](https://arxiv.org/html/2606.11235#bib.bib11)\), which introduced FSR, a few\-shot resampling approach for significant pattern mining with rigorous guarantees on the FWER\. While FSR can be applied to several pattern types \(e\.g\., itemsets, sequential patterns, subgroups\) and to two different null hypotheses, it is still limited to the task of significant pattern mining described above, and cannot be applied to other data mining tasks \(e\.g\., more general pattern mining analyses or to graph mining, such as the example in Section[1](https://arxiv.org/html/2606.11235#S1)\)\. Our approach instead directly supports any scenario where the results are summarized by a statistic and resampling is possible\. Different lines of work have studied i\) efficient procedures to generate resampled datasets for various data types, including sequential data\(Tonon and Vandin,[2019](https://arxiv.org/html/2606.11235#bib.bib54); Jenkinset al\.,[2022](https://arxiv.org/html/2606.11235#bib.bib53)\), graphs\(Pretiet al\.,[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\), and general binary transactional and sequence datasets\(Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263); Abuissaet al\.,[2023](https://arxiv.org/html/2606.11235#bib.bib268)\), and ii\) pattern sampling or mining with constraints\(Al Hasan and Zaki,[2009](https://arxiv.org/html/2606.11235#bib.bib32); Boleyet al\.,[2011](https://arxiv.org/html/2606.11235#bib.bib5); Diopet al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib4); Giacometti and Soulet,[2018](https://arxiv.org/html/2606.11235#bib.bib3); Diop,[2022](https://arxiv.org/html/2606.11235#bib.bib2)\)\(to reduce the cost of mining a single dataset\)\. These procedures are orthogonal to our method, which can be used to reduce the number of resampled datasets required in any application and can therefore exploit any method for the efficient generation of resampled datasets and any method to compute/bound the maximum test statistic\. ## 3\.Preliminaries We are given a dataset𝒟∈𝒳\\mathcal\{D\}\\in\\mathcal\{X\}from a domain𝒳\\mathcal\{X\}\. We assume that𝒟\\mathcal\{D\}comes from an \(unknown\) distributionγ\\gamma, that is,𝒟∼γ\\mathcal\{D\}\\sim\\gamma\. We define a set of analyses𝒜=\{A1,A2,…,Ak\}\\mathcal\{A\}=\\\{A\_\{1\},A\_\{2\},\\dots,A\_\{k\}\\\}as bounded real\-valued functionsAi:𝒳→ℝA\_\{i\}:\\mathcal\{X\}\\rightarrow\\mathbb\{R\}\. Every analysisAiA\_\{i\}corresponds to a data mining task or analysis, andAi\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)is the*statistic*corresponding to the result of analysisAiA\_\{i\}on the dataset𝒟\\mathcal\{D\}\. For example, ifAiA\_\{i\}corresponds to frequent pattern mining,Ai\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)may be defined as the number of frequent patterns mined from𝒟\\mathcal\{D\}\. Our goal is to find statistically significant results from analyses𝒜=\{A1,A2,…,Ak\}\\mathcal\{A\}=\\\{A\_\{1\},A\_\{2\},\\dots,A\_\{k\}\\\}\. To this end, we need to define a*null hypothesis*corresponding to the situation in which results are not interesting\. We define the null hypothesis through a corresponding*null distribution*γ0\\gamma\_\{0\}on datasets in the domain𝒳\\mathcal\{X\}\.γ0\\gamma\_\{0\}defines the probability with which datasets under the null hypothesis \(i\.e\., with no significant results\) are generated\. We define the setℋ0⊆𝒜=\{A1,A2,…,Ak\}\\mathcal\{H\}\_\{0\}\\subseteq\\mathcal\{A\}=\\\{A\_\{1\},A\_\{2\},\\dots,A\_\{k\}\\\}of*true null hypotheses*as the subset of\{A1,A2,…,Ak\}\\\{A\_\{1\},A\_\{2\},\\dots,A\_\{k\}\\\}for which the null hypothesis holds\. We denote with𝒟0\\mathcal\{D\}^\{0\}a \(random\) dataset sampled from the null distributionγ0\\gamma\_\{0\}\. Given the dataset𝒟\\mathcal\{D\}generated from the \(unknown\) distributionγ\\gamma, we then define the*null hypotheses*for the analysesAi∈𝒜A\_\{i\}\\in\\mathcal\{A\}as the hypotheses that observed resultsAi\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)*well conform*to the distribution of the statisticsAi\(𝒟0\)A\_\{i\}\(\\mathcal\{D\}^\{0\}\), where the dataset𝒟0\\mathcal\{D\}^\{0\}comes from the null distributionγ0\\gamma\_\{0\}\. We test the null hypothesis forAiA\_\{i\}using the dataset𝒟\\mathcal\{D\}; for simplicity, we assume that larger values ofAi\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)corresponds to higher evidence of significance for theii\-th hypothesis, but our approach easily extends \(e\.g\., by changing the sign of test statistic\) to the case where lower values ofAi\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)provide higher evidence of significance\. More precisely, we reject theii\-th null hypothesis if we are able to conclude that the observed valueAi\(𝒟\)A\_\{i\}\(\\mathcal\{D\}\)is*unlikely*under the distributionγ0\\gamma\_\{0\}, that isPr𝒟0∼γ0\(Ai\(𝒟0\)≥Ai\(𝒟\)\)≤t\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(A\_\{i\}\(\\mathcal\{D\}^\{0\}\)\\geq A\_\{i\}\(\\mathcal\{D\}\)\\right\)\\leq t, for some small value oftt\. Such a probability is the*pp\-value*ofAiA\_\{i\}\. More generally, we want to identify a subset of𝒜\\mathcal\{A\}that is significant, i\.e\., the elements of𝒜\\mathcal\{A\}that significantly deviate from what is expected under the null distributionγ0\\gamma\_\{0\}\. Ideally, we want to reject the largest set of hypotheses while controlling the*Family\-Wise Error Rate*\(FWER\) below a valueα\\alphadefined by the user, where the FWER is the probability that at least one element ofℋ0\\mathcal\{H\}\_\{0\}is reported as significant \(i\.e\., of at least one false discovery\)\. A procedure to assess the statistical significance while controlling the FWER is the Bonferroni correction; this approach computes thepp\-valuepi=Pr𝒟0∼γ0\(Ai\(𝒟0\)≥Ai\(𝒟\)\)p\_\{i\}=\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(A\_\{i\}\(\\mathcal\{D\}^\{0\}\)\\geq A\_\{i\}\(\\mathcal\{D\}\)\\right\)for eachii, and then outputs the set\{Ai:pi≤α/k\}\\left\\\{A\_\{i\}:p\_\{i\}\\leq\\alpha/k\\right\\\}\. While theα/k\\alpha/kmultiplicity correction is straightforward, this procedure features two key issues\. The first issue is that, as discussed previously, in most data mining tasks the null distributionγ0\\gamma\_\{0\}is quite complex, and the probabilitiespip\_\{i\}do not admit a closed\-form expression; in these settings,pip\_\{i\}cannot be computed, nor estimated efficiently\. This is the case for all the real\-world settings we consider in this work\. The second issue is that, even when thepip\_\{i\}’s are available, this analytic correction is in almost all situations too conservative and restrictive, resulting in a loss of*statistical power*, that is, few or no significant patterns are reported in output\. This is due to the inherent worst\-case assumption of independence among the analyses of𝒜\\mathcal\{A\}made by the Bonferroni correction, which is typically not met in practice\. To address such issues, state\-of\-the\-art methods adopt resampling to assess the statistical significance of their results\. ### 3\.1\.WY Resampling We now describe the Westfall\-Young \(WY\) resampling approach\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)to identify significant elements of𝒜\\mathcal\{A\}while controlling the FWER belowα\\alpha\. The WY resampling procedure inherently considers the correlation structure among analyses𝒜=\{A1,A2,…,Ak\}\\mathcal\{A\}=\\\{A\_\{1\},A\_\{2\},\\dots,A\_\{k\}\\\}, resulting in \(asymptotically\) optimal statistical power\(Meinshausenet al\.,[2011](https://arxiv.org/html/2606.11235#bib.bib46)\)\. The analysis of\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)leverages a technical condition, called*subset pivotality*, that is, the assumption that joint distribution of the statistics for which the null hypothesis is true \(i\.e\., analysesAiA\_\{i\}inℋ0\\mathcal\{H\}\_\{0\}\) is the same when*all*hypotheses are true null \(i\.e\.,ℋ0=𝒜\\mathcal\{H\}\_\{0\}=\\mathcal\{A\}\)\. While this assumption is sufficient to establish guarantees, it is still open whether it is necessary\(Westfall and Troendle,[2008](https://arxiv.org/html/2606.11235#bib.bib275)\)\. Analogously to previous work on significant pattern mining\(Llinares\-Lópezet al\.,[2015](https://arxiv.org/html/2606.11235#bib.bib57); Pellegrina and Vandin,[2020](https://arxiv.org/html/2606.11235#bib.bib56),[2024](https://arxiv.org/html/2606.11235#bib.bib11)\), we assume subset pivotality\. Note that this assumption is useful only when more than one analysis is performed \(k\>1k\>1\), and not needed whenk=1k=1\. Defineδ\(α\)\\delta\(\\alpha\)as δ\(α\)=argminx∈ℝ\{Pr𝒟0∼γ0\(maxA∈𝒜A\(𝒟0\)\>x\)≤α\}\.\\displaystyle\\delta\(\\alpha\)=\\operatorname\*\{arg\\,min\}\_\{x\\in\\mathbb\{R\}\}\\left\\\{\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\Bigl\(\\max\_\{A\\in\\mathcal\{A\}\}A\(\\mathcal\{D\}^\{0\}\)\>x\\Bigr\)\\leq\\alpha\\right\\\}\.The following result, from\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\), guarantees that, if one first identifiesδ\(α\)\\delta\(\\alpha\)and then reports in output all analysesAiA\_\{i\}withAi\(𝒟\)\>δ\(α\)A\_\{i\}\(\\mathcal\{D\}\)\>\\delta\(\\alpha\), then the FWER is bounded byα\\alpha\. That is, the following holds\. ###### Lemma 1\. LetR=\{Ai:Ai\(𝒟\)\>δ\(α\)\}R=\\\{A\_\{i\}:A\_\{i\}\(\\mathcal\{D\}\)\>\\delta\(\\alpha\)\\\}\. It holds Pr𝒟∼γ\(R∩ℋ0≠∅\)=Pr𝒟∼γ\(maxAi∈ℋ0\{Ai\(𝒟\)\}\>δ\(α\)\)≤α\.\\displaystyle\\Pr\_\{\\mathcal\{D\}\\sim\\gamma\}\(R\\cap\\mathcal\{H\}\_\{0\}\\neq\\emptyset\)=\\Pr\_\{\\mathcal\{D\}\\sim\\gamma\}\\Bigl\(\\max\_\{A\_\{i\}\\in\\mathcal\{H\}\_\{0\}\}\\\{A\_\{i\}\(\\mathcal\{D\}\)\\\}\>\\delta\(\\alpha\)\\Bigr\)\\leq\\alpha\. Despite the interesting theoretical properties ofδ\(α\)\\delta\(\\alpha\), i\.e\., that it allows reporting a set of hypothesesRRas significant with control of the FWER, computingδ\(α\)\\delta\(\\alpha\)is in almost all practical cases not possible, since no closed form of the joint distribution of𝒜\\mathcal\{A\}is typically available\. Therefore,δ\(α\)\\delta\(\\alpha\)is estimated using the following resampling procedure\. For an integerr≥1r\\geq 1, define𝒮=\{𝒟10,…,𝒟r0\}\\mathcal\{S\}=\\\{\\mathcal\{D\}^\{0\}\_\{1\},\\dots,\\mathcal\{D\}^\{0\}\_\{r\}\\\}as a set ofrrresamples taken i\.i\.d\. from the null distributionγ0\\gamma\_\{0\}\. For alli∈\[1,r\]i\\in\[1,r\], definedid\_\{i\}as the maximum test statistic, over the set𝒜\\mathcal\{A\}, evaluated on the resample𝒟i0\\mathcal\{D\}^\{0\}\_\{i\}:di=maxA∈𝒜\{A\(𝒟i0\)\}d\_\{i\}=\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\_\{i\}\)\\\}\. Assume, w\.l\.o\.g\., that the values\{d1,…,dr\}\\\{d\_\{1\},\\dots,d\_\{r\}\\\}are sorted in non\-increasing order, such thatdi≤dj,∀i\>jd\_\{i\}\\leq d\_\{j\},\\forall i\>j\. The estimateδ~\(α,𝒮\)\\tilde\{\\delta\}\(\\alpha,\\mathcal\{S\}\)ofδ\(α\)\\delta\(\\alpha\)is defined as theα\\alpha\-quantile of the set\{d1,…,dr\}\\\{d\_\{1\},\\dots,d\_\{r\}\\\}:δ~\(α,𝒮\)=d⌈αr⌉\\tilde\{\\delta\}\(\\alpha,\\mathcal\{S\}\)=d\_\{\\lceil\\alpha r\\rceil\}\. From the definition ofδ\(α\)\\delta\(\\alpha\), the estimateδ~\(α,𝒮\)\\tilde\{\\delta\}\(\\alpha,\\mathcal\{S\}\)converges toδ\(α\)\\delta\(\\alpha\)asr→\+∞r\\rightarrow\+\\infty, i\.e\., it holdslimr→\+∞𝔼𝒮\[δ~\(α,𝒮\)\]=δ\(α\)\\lim\_\{r\\rightarrow\+\\infty\}\\mathop\{\\mathbb\{E\}\}\_\{\\mathcal\{S\}\}\[\\tilde\{\\delta\}\(\\alpha,\\mathcal\{S\}\)\]=\\delta\(\\alpha\)\. The WY resampling approach provides a powerful technique to identify significant results whenever one or more analyses are considered, and it is widely applicable since it only requires the ability of resampling datasets from the null distributionγ0\\gamma\_\{0\}\. However, in practice its application can be extremely computationally intensive, in particular for large datasets or complex data mining tasks, for three reasons\. First, in order to obtain accurate estimates ofδ\(α\)\\delta\(\\alpha\), the numberrrof resampled datasets must be very large even for reasonably large values ofα\\alpha\. In fact, for commonly used values ofα\\alpha\(i\.e\.,α=0\.1\\alpha=0\.1orα=0\.05\\alpha=0\.05\),rrshould be of the order of10410^\{4\}\. Second, mining a dataset to computedi=maxA∈𝒜\{A\(𝒟i0\)\}d\_\{i\}=\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\_\{i\}\)\\\}can be very expensive for large datasets, even when only one complex analysis is performed\. Third, in many applications generating even one resample from the null distributionγ0\\gamma\_\{0\}is very expensive \(e\.g\., for generating networks where the degrees of vertices are fixed\)\. ## 4\.FewRSAlgorithm In this section we describe our algorithmFewRSto identify statistically significant results with rigorous guarantees on the FWER\.FewRSuses a*few\-shot*resampling approach, and can be used in every situation where resampling\-based approaches apply, providing a significant advantage even when a single analysis is performed\.FewRSbuilds on a novel probabilistic bound to the supremum deviation of statistics that is estimated on an extremely small number of resampled datasets \(see Section[4\.1](https://arxiv.org/html/2606.11235#S4.SS1)\)\. FewRSis described in Algorithm[1](https://arxiv.org/html/2606.11235#algorithm1)\. At a high level,FewRScomputes an upper boundδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)to the ideal thresholdδ\(α\)\\delta\(\\alpha\)defined by the WY resampling approach, and reports as significant all analyses with values \(in the real dataset\) aboveδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\. The numbermmof resampled datasets used to computeδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)is set to the minimum value which guarantees thatδ^\(𝒮\)≥δ\(α\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\\geq\\delta\(\\alpha\)with probability at least1−α1\-\\alpha, which leads to the required FWER guarantees \(see Section[4\.1](https://arxiv.org/html/2606.11235#S4.SS1)\)\. More in details,FewRSstarts by computing the numbermmof resampled datasets, that depends on the FWER upper boundα\\alphaprovided in input by the user \(line[1](https://arxiv.org/html/2606.11235#algorithm1)\)\. It then generates a set𝒮=\{𝒟10,𝒟20,…,𝒟m0\}\\mathcal\{S\}=\\\{\\mathcal\{D\}^\{0\}\_\{1\},\\mathcal\{D\}^\{0\}\_\{2\},\\dots,\\mathcal\{D\}^\{0\}\_\{m\}\\\}ofmmresampled datasets, that are drawn i\.i\.d\. from the null distributionγ0\\gamma\_\{0\}\(line[1](https://arxiv.org/html/2606.11235#algorithm1)\)\. The resampled datasets are generated with the procedureResample\(γ0,m\\gamma\_\{0\},m\)\. Then,FewRScomputes a thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)by evaluating the analyses𝒜\\mathcal\{A\}on themmresampled datasets𝒮\\mathcal\{S\}\(line[1](https://arxiv.org/html/2606.11235#algorithm1)\)\. The definition ofδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)is extremely simple, sinceδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)is the maximum value observed in the resampled datasets over the performed analyses\. The setRRof results is then computed by finding the analyses with values in the real dataset that exceed the thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\(line[1](https://arxiv.org/html/2606.11235#algorithm1)\)\. Input:Set of analyses 𝒜\\mathcal\{A\}; dataset 𝒟\\mathcal\{D\}; null distr\. γ0\\gamma\_\{0\}; α∈\(0,1\)\\alpha\\in\(0,1\)\. Output:SetR⊆𝒜R\\subseteq\\mathcal\{A\}of significant analyses with FWER≤α\\leq\\alpha\. 1 m←⌈ln\(1α\)ln\(11−α\)⌉m\\leftarrow\\left\\lceil\\frac\{\\ln\\bigl\(\\frac\{1\}\{\\alpha\}\\bigr\)\}\{\\ln\\bigl\(\\frac\{1\}\{1\-\\alpha\}\\bigr\)\}\\right\\rceil; 2 𝒮←\\mathcal\{S\}\\leftarrowResample\(γ0,m\\gamma\_\{0\},m\); 3 δ^\(𝒮\)←max𝒟0∈𝒮,A∈𝒜A\(𝒟0\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\\leftarrow\\max\_\{\\mathcal\{D\}^\{0\}\\in\\mathcal\{S\},A\\in\\mathcal\{A\}\}A\(\\mathcal\{D\}^\{0\}\); 4 R←\{A∈𝒜:A\(𝒟\)\>δ^\(𝒮\)\}R\\leftarrow\\bigl\\\{A\\in\\mathcal\{A\}:A\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)\\bigr\\\}; 5return RR; Algorithm 1FewRSNote that the numbermmof resampled dataset required byFewRSis much smaller than current approaches, which is of the order of10410^\{4\}\. For example,FewRSrequires onlym=22m=22resampled datasets forα=0\.1\\alpha=0\.1, andm=59m=59resampled datasets forα=0\.05\\alpha=0\.05, providing a significant reduction over the state\-of\-the\-art\. FewRSmakes use of the procedureResample\(γ0,m\\gamma\_\{0\},m\)to generate themmresampled datasets from the null distributionγ0\\gamma\_\{0\}\. The specific resampling procedure depends on the datasets, data mining tasks, and null hypothesis of interest\. These must be specified by the user and are captured by the null distributionγ0\\gamma\_\{0\}provided in input toFewRS\. However,FewRScan employ any resampling procedure, with no restriction, and is therefore applicable to any scenario where resampling approaches are employed\. ### 4\.1\.Analysis We now prove thatFewRSprovides the required guarantees on the FWER\. Due to space constraints, all proofs are in the Appendix\. We start by proving the following technical result, which shows that, under the null hypothesisγ0\\gamma\_\{0\}, the maximum deviation has probability at leastα\\alphato have a value greater*or equal*than the ideal thresholdδ\(α\)\\delta\(\\alpha\)defined by the WY resampling approach\. Intuitively, this is trivially true whenmaxA∈𝒜\{A\(𝒟0\)\}\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}is a continuous random variable, while it follows from the definition ofδ\(α\)\\delta\(\\alpha\)\(see Section[3\.1](https://arxiv.org/html/2606.11235#S3.SS1)\) whenmaxA∈𝒜\{A\(𝒟0\)\}\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}is a discrete random variable\. ###### Lemma 1\. Let𝒟0\\mathcal\{D\}^\{0\}be a resample taken fromγ0\\gamma\_\{0\}\. Then, it holds Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}≥δ\(α\)\)≥α\.\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\Bigl\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\geq\\delta\(\\alpha\)\\Bigr\)\\geq\\alpha\. Now, consider again them≥1m\\geq 1resamples𝒮=\{𝒟10,𝒟20,…,𝒟m0\}\\mathcal\{S\}=\\\{\\mathcal\{D\}^\{0\}\_\{1\},\\mathcal\{D\}^\{0\}\_\{2\},\\dots,\\mathcal\{D\}^\{0\}\_\{m\}\\\}drawn i\.i\.d\. from the null distributionγ0\\gamma\_\{0\}byFewRS\. Recall thatδ^\(𝒮\)=max𝒟0∈𝒮,A∈𝒜\{A\(𝒟0\)\}\\hat\{\\delta\}\(\\mathcal\{S\}\)=\\max\_\{\\mathcal\{D\}^\{0\}\\in\\mathcal\{S\},A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}is the maximum test statistic computed over all resamples\. The following result proves that, ifmmis large enough, thenδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)provides an upper bound toδ\(α\)\\delta\(\\alpha\)with probability at least1−α1\-\\alpha\. ###### Lemma 2\. Letm≥⌈ln\(1α\)ln\(11−α\)⌉m\\geq\\Bigl\\lceil\\frac\{\\ln\\left\(\\frac\{1\}\{\\alpha\}\\right\)\}\{\\ln\\left\(\\frac\{1\}\{1\-\\alpha\}\\right\)\}\\Bigr\\rceil\. ThenPr𝒮\(δ^\(𝒮\)≥δ\(α\)\)≥1−α\\Pr\_\{\\mathcal\{S\}\}\\bigl\(\\hat\{\\delta\}\(\\mathcal\{S\}\)\\geq\\delta\(\\alpha\)\\bigr\)\\geq 1\-\\alpha\. Lemma[2](https://arxiv.org/html/2606.11235#S4.Thmtheorem2)implies that, whenmmis set as inFewRS, the setR=\{Ai:Ai\(𝒟\)\>δ^\(𝒮\)\}R=\\\{A\_\{i\}:A\_\{i\}\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)\\\}can be flagged as significant with FWER≤α\\leq\\alpha, i\.e\., it holdsPr\(R∩ℋ0≠∅\)≤α\\Pr\(R\\cap\\mathcal\{H\}\_\{0\}\\neq\\emptyset\)\\leq\\alpha\(sinceR⊆\{Ai:Ai\(𝒟\)\>δ\(α\)\}R\\subseteq\\\{A\_\{i\}:A\_\{i\}\(\\mathcal\{D\}\)\>\\delta\(\\alpha\)\\\}\)\. This directly implies the guarantees ofFewRS\. ###### Corollary 3\. The output ofFewRShas FWER≤α\\leq\\alpha\. To better interpret the bound tommgiven in Lemma[2](https://arxiv.org/html/2606.11235#S4.Thmtheorem2), we observe that, from the factex≥1\+xe^\{x\}\\geq 1\+x, it holds1/ln\(11−α\)≤1/α1/\\ln\(\\frac\{1\}\{1\-\\alpha\}\)\\leq 1/\\alpha; therefore,m≤⌈ln\(1/α\)/α⌉m\\leq\\left\\lceil\\ln\(1/\\alpha\)/\\alpha\\right\\rceil\. This implies that the numbermmof resamples used byFewRSscales nicely w\.r\.t\.α\\alpha\. Finally, we remark that the valuemmused byFewRSis the*minimum*number of resamples to control the FWER; this property is clearly useful for computational reasons, but also to maximize the statistical power, as increasing the size of𝒮\\mathcal\{S\}leads to higher values of the thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\. Normalization for Multiple Analyses\.While the guarantees ofFewRShold for any set𝒜\\mathcal\{A\}of test statistics, a common issue of resampling\-based approaches is that the statistical power may be reduced when the functionsA∈𝒜A\\in\\mathcal\{A\}have drastically different scale \(and/or variance\)\. For instance, consider two functionsA1,A2A\_\{1\},A\_\{2\}with𝔼𝒟0\[A1\(𝒟0\)\]≫A2\(𝒟\)\\mathop\{\\mathbb\{E\}\}\_\{\\mathcal\{D\}\_\{0\}\}\[A\_\{1\}\(\\mathcal\{D\}^\{0\}\)\]\\gg A\_\{2\}\(\\mathcal\{D\}\): it is highly unlikely thatA2A\_\{2\}will be returned as significant, regardless of how farA2\(𝒟\)A\_\{2\}\(\\mathcal\{D\}\)is from what expected under the null distribution, sinceδ\(α\)≥𝔼𝒟0\[A1\(𝒟0\)\]\\delta\(\\alpha\)\\geq\\mathop\{\\mathbb\{E\}\}\_\{\\mathcal\{D\}\_\{0\}\}\[A\_\{1\}\(\\mathcal\{D\}^\{0\}\)\]\. To improve the power in the case of heterogeneous test statistics, we show that they can be normalized without affecting any theoretical guarantee\. For anyA∈𝒜A\\in\\mathcal\{A\}, if its expected valueμA=𝔼𝒟0\[A\(𝒟0\)\]\\mu\_\{A\}=\\mathop\{\\mathbb\{E\}\}\_\{\\mathcal\{D\}\_\{0\}\}\[A\(\\mathcal\{D\}^\{0\}\)\]and its varianceσA2=Var𝒟0\[A\(𝒟0\)\]\\sigma^\{2\}\_\{A\}=\\text\{Var\}\_\{\\mathcal\{D\}\_\{0\}\}\[A\(\\mathcal\{D\}^\{0\}\)\]are known, we normalizeAAasA⊙\(𝒟\)=\(A\(𝒟\)−μA\)/σAA^\{\\odot\}\(\\mathcal\{D\}\)=\(A\(\\mathcal\{D\}\)\-\\mu\_\{A\}\)/\\sigma\_\{A\}\. \(A⊙A^\{\\odot\}is typically known as the*standard score*forAA\.\) Alternatively, we may replaceσA\\sigma\_\{A\}with the rangeΔA\\Delta\_\{A\}ofAA, i\.e\.,ΔA=max𝒟0∈𝒳A\(𝒟0\)−min𝒟0∈𝒳A\(𝒟0\)\\Delta\_\{A\}=\\max\_\{\\mathcal\{D\}^\{0\}\\in\\mathcal\{X\}\}A\(\\mathcal\{D\}^\{0\}\)\-\\min\_\{\\mathcal\{D\}^\{0\}\\in\\mathcal\{X\}\}A\(\\mathcal\{D\}^\{0\}\)\. When none of the moments ofAAare known, we propose the following simple and practical solution, also used in our experiments\. Let𝒮′\\mathcal\{S\}^\{\\prime\}be a set ofm′m^\{\\prime\}i\.i\.d\. resamples taken fromγ0\\gamma\_\{0\}\. Then,μA\\mu\_\{A\}andσA2\\sigma^\{2\}\_\{A\}can be replaced by empirical estimates computed on𝒮′\\mathcal\{S\}^\{\\prime\}\(details are in Appendix[A\.2](https://arxiv.org/html/2606.11235#A1.SS2)\)\. Note that one may be tempted to estimate the moments ofAAusing the resamples𝒮\\mathcal\{S\}generated byFewRS; however, this choice may violate the FWER guarantees due to the correlation between the definition ofA⊙A^\{\\odot\}and the boundδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)to itsα\\alpha\-quantile\. In our experiments, when normalization was needed, we use an extremely small resampled set𝒮′\\mathcal\{S\}^\{\\prime\}\(m′=10m^\{\\prime\}=10\), which always provided accurate results\. In Appendix[A\.3](https://arxiv.org/html/2606.11235#A1.SS3)we test other choices ofm′m^\{\\prime\}\. Given the set𝒜⊙=\{A1⊙,…,Ak⊙\}\\mathcal\{A\}^\{\\odot\}=\\\{A\_\{1\}^\{\\odot\},\\dots,A\_\{k\}^\{\\odot\}\\\}of normalized test statistics,FewRScan be applied to𝒜⊙\\mathcal\{A\}^\{\\odot\}instead of𝒜\\mathcal\{A\}, without affecting the FWER guarantees, as stated in the following result\. ###### Corollary 4\. The output ofFewRS\(𝒜⊙,𝒟,γ0,α\)\(\\mathcal\{A\}^\{\\odot\},\\mathcal\{D\},\\gamma\_\{0\},\\alpha\)has FWER≤α\\leq\\alpha\. Power ofFewRS\.The previous section focuses on the false positive guarantees ofFewRS\. We also present several guarantees on the*power*ofFewRS\. Remarkably, our results do not require any assumption on the distribution of significant patterns\. In particular, we show that the thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)computed byFewRSis close to the \(ideal\) thresholdδ\(α\)\\delta\(\\alpha\); more formally,δ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)does not exceedδ\(α/k\)\\delta\(\\alpha/k\)with probability≥1−t\\geq 1\-t, wherekkdepends onα\\alphaandtt\. ###### Lemma 5\. For anyt∈\(0,1\)t\\in\(0,1\), letk=mαln\(11−t\)k=\\frac\{m\\alpha\}\{\\ln\(\\frac\{1\}\{1\-t\}\)\}\. Then, it holds Pr\(δ^\(𝒮\)\>δ\(α/k\)\)≤t\.\\displaystyle\\Pr\(\\hat\{\\delta\}\(\\mathcal\{S\}\)\>\\delta\(\\alpha/k\)\)\\leq t\. Interestingly, by settingt=1/2t=1/2, we obtain that the*median*value of the thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)computed byFewRSwill be within the ideal thresholdδ\(α\)\\delta\(\\alpha\)by the modest factork≈1\.44⋅ln\(1/α\)k\\approx 1\.44\\cdot\\ln\(1/\\alpha\), which isk=3\.17k=3\.17whenα=0\.1\\alpha=0\.1, andk=4\.26k=4\.26whenα=0\.05\\alpha=0\.05\. This guarantees thatFewRSwill robustly identify a significance threshold with guarantees on false discoveries, without a significant decrease in terms of power\. We prove Lemma[5](https://arxiv.org/html/2606.11235#S4.Thmtheorem5)in Appendix[A\.1](https://arxiv.org/html/2606.11235#A1.SS1), where we also describe and analyze a generalization ofFewRS: such approach computesδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\(defined as the maximum test statistic over the resamples in Algorithm[1](https://arxiv.org/html/2606.11235#algorithm1)\) as the empirical quantile over an higher number of resamples\. This generalization improves power \(i\.e\., by reducing the factorkk\) at the cost of a higher running time, so it is useful when the generation of a higher number of resamples is possible, and it allows a more flexible exploration of the trade\-off between computational resources and accuracy of the procedure\. ## 5\.Experimental evaluation This section presents the results of our experiments\. The goals of our experimental evaluation are to assess the statistical power and scalability ofFewRSon synthetic datasets, where a ground truth is known, and to compare it with state\-of\-the\-art methods; we also applyFewRSto three realistic data mining tasks: frequent pattern mining from transactional datasets, the analysis of the diversity of interactions in labelled networks, and mining patterns associated to a target label\(Pellegrina and Vandin,[2024](https://arxiv.org/html/2606.11235#bib.bib11)\)\. For the first two scenarios, our resampling scheme considers expressive null models for transactional datasets\(Gioniset al\.,[2007](https://arxiv.org/html/2606.11235#bib.bib267); Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263)\)and random graphs\(Fosdicket al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib266); Pretiet al\.,[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\)\. Experimental setup\.We implementedFewRSinPython\.111The code is available online[https://github\.com/leonardopellegrina/FewRS](https://github.com/leonardopellegrina/FewRS)All the experiments were run on a machine with 2\.30 GHz Intel Xeon CPU, using up to3232cores, 1 TB of RAM, on Ubuntu 22\.04\. To generate resamples for transactional datasets and labelled graphs, we used efficient implementations from previous works\(Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263),[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\)\. Synthetic experiments\.In this set of experiments we assesFewRSon synthetic data, evaluating its statistical power and scalability over a wide choice of parameters\. We consider a null distributionγ0\\gamma\_\{0\}that generates datasets𝒟0∈\{0,1\}n×d\\mathcal\{D\}^\{0\}\\in\\\{0,1\\\}^\{n\\times d\}withn=104n=10^\{4\}samples andddfeatures, where each entry𝒟j,i0\\mathcal\{D\}^\{0\}\_\{j,i\}of𝒟0\\mathcal\{D\}^\{0\}is a Bernoulli random variable with parameter1/21/2, i\.e\.,Pr\(𝒟j,i0=1\)=Pr\(𝒟j,i0=0\)=1/2\\Pr\(\\mathcal\{D\}^\{0\}\_\{j,i\}=1\)=\\Pr\(\\mathcal\{D\}^\{0\}\_\{j,i\}=0\)=1/2, for allj,ij,i\. While the entries of the same feature are independent, we control the degree of correlation among theddfeatures by first setting all featuresi\>1i\>1as copies of the first feature, and then randomly flipping each entry of𝒟0\\mathcal\{D\}^\{0\}independently at random with probabilityρ∈\[0,1/2\]\\rho\\in\[0,1/2\]; note that whenρ=0\\rho=0all the features are identical, and whenρ=1/2\\rho=1/2they are mutually independent\. For eachi∈\[1,d\]i\\in\[1,d\], we defineAiA\_\{i\}as the average value of theii\-th feature, i\.e\.,Ai\(𝒟0\)=1n∑j=1n𝒟j,i0A\_\{i\}\(\\mathcal\{D\}^\{0\}\)=\\frac\{1\}\{n\}\\sum\_\{j=1\}^\{n\}\\mathcal\{D\}^\{0\}\_\{j,i\}\. Thus, for eachiiwe test the hypothesis that theii\-th feature has mean≤1/2\\leq 1/2\. We fixα=0\.1\\alpha=0\.1, and vary the correlation, by varyingρ\\rho, and the numberddof features\. For WY, we use10310^\{3\}resamples\. Instead,FewRSonly requires2222resamples\. First, we evaluate the statistical power of each method by estimating its empirical FWER with the following procedure: for each method, we compute its corrected significance thresholdδ\\delta; then, we draw100100resamples fromγ0\\gamma\_\{0\}and compute the fraction of times that the maximum statisticmaxiAi\\max\_\{i\}A\_\{i\}is\>δ\>\\delta\. We repeat these operations1010times, and report the average empirical FWER and its standard deviation\. For the Bonferroni correction, we consider two variants that differ in the computation of the thresholdδ\\delta\. The first variant, Bonferroni\-E, setsδ\\deltausing the exact quantile of the Binomial distribution; note that this approach assumes full knowledge of the distribution of each test statistic, i\.e\., a way to exactly compute its quantile, which is usually not available\. The thresholdδ\\deltais defined using the exact Binomial distribution, and a union bound overddevents:δ=min\{x∈\[1,n\]:∑j=x\+1n\(ni\)12n≤αd\}\\delta=\\min\\left\\\{x\\in\[1,n\]:\\sum\_\{j=x\+1\}^\{n\}\\binom\{n\}\{i\}\\frac\{1\}\{2\}^\{n\}\\leq\\frac\{\\alpha\}\{d\}\\right\\\}\. The second variant, Bonferroni\-H, uses Hoeffding’s bound\(Mitzenmacher and Upfal,[2017](https://arxiv.org/html/2606.11235#bib.bib141)\), which only assumes that each test statistic is an average ofnnrandom variables bounded∈\[0,1\]\\in\[0,1\], and a union bound overddevents, thus is more widely applicable\. We haveδ=12\+ln\(d/α\)/\(2n\)\\delta=\\frac\{1\}\{2\}\+\\sqrt\{\\ln\(d/\\alpha\)/\(2n\)\}\. Figure[5](https://arxiv.org/html/2606.11235#A1.F5)shows the results for this experiment, for four values ofρ∈\[0,1/2\]\\rho\\in\[0,1/2\]\. Note that higher values of the empirical FWER \(below the nominal targetα\\alpha\) denote that the method is less conservative, thus more powerful\. The results clearly show that the empirical FWER of the WY resampling procedure is always very close to its nominal valueα\\alpha, confirming that it is \(asymptotically\) optimal\. Furthermore, the empirical FWER ofFewRSis always≤α\\leq\\alpha, confirming the theoretical guarantees from Section[4\.1](https://arxiv.org/html/2606.11235#S4.SS1)\. While the power ofFewRSis slightly lower than WY, it is still comparable and stable over all choices ofρ\\rhoanddd\. Remarkably, we observe that the average empirical FWER ofFewRSis≥α/3\\geq\\alpha/3, confirming the bound to the power ofFewRSdiscussed in our theoretical analysis\. For Bonferroni\-E, while its empirical FWER is very close toα\\alphawhend=1d=1, or when all test statistics are independent \(ρ=0\.5\\rho=0\.5\), it is overly conservative when they are correlated, resulting in subpar statistical power, particularly when the numberddof tests grows\. Bonferroni\-H, which uses less precise information on the distribution of each test statistic, is in all cases the most conservative approach\. We also varyα∈\[0\.1,0\.01\]\\alpha\\in\[0\.1,0\.01\]\(Figures[7](https://arxiv.org/html/2606.11235#A1.F7)and[8](https://arxiv.org/html/2606.11235#A1.F8)\); the results are analogous to the caseα=0\.1\\alpha=0\.1, confirming that, as expected,FewRSpreserves power and guarantees on the FWER in all cases\. Regarding computational resources, we note that we ran the experiments for the WY procedure only ford≤103d\\leq 10^\{3\}, due to high running times \(as running all the experiments would need≈\\approxa week to complete, we only report estimates of the running time of WY ford\>103d\>10^\{3\}\), while both Bonferroni andFewRSrun very fast\. We remark that the WY procedure poorly scales to higher values of the datasets sizenn, or to a higher number of resamples \(i\.e\., to achieve a more accurate estimate of the quantile for lower values ofα\\alpha\)\. Figure[6](https://arxiv.org/html/2606.11235#A1.F6)shows the running time of each method for severaldd, and we compare it with the achieved statistical power \(i\.e\., the empirical FWER\)\. As expected,FewRSis50×50\\timesfaster than WY, as it drastically reduces the number of resamples\. Overall, these experiments show thatFewRSachieves a substantially better trade\-off between scalability, allowing to handle large high\-dimensional datasets using reasonable computational resources, and statistical power, that is comparable to the optimal WY resampling and significantly better than Bonferroni in realistic settings \(ρ<0\.5\\rho<0\.5\)\. Analysis of transactional datasets\.In this experiment, we applyFewRSto evaluate the statistical significance of collections of frequent patterns from transactional datasets\(Agrawalet al\.,[1993](https://arxiv.org/html/2606.11235#bib.bib270)\)\. We consider1414real\-world datasets222From[https://archive\.ics\.uci\.edu](https://archive.ics.uci.edu/)and[http://fimi\.uantwerpen\.be](http://fimi.uantwerpen.be/)often used as benchmark in previous works; the statistics are shown in Table[1](https://arxiv.org/html/2606.11235#A1.T1)\(in the Appendix\)\. We represent each dataset𝒟∈\{0,1\}n×d\\mathcal\{D\}\\in\\\{0,1\\\}^\{n\\times d\}as a binary matrix, where each entry𝒟i,j\\mathcal\{D\}\_\{i,j\}is11when thejj\-th items is contained in theii\-th transaction,0otherwise\. An itemsetX⊆\[1,d\]X\\subseteq\[1,d\]is contained in theii\-th transaction of𝒟\\mathcal\{D\}if all the items ofXXare contained in theii\-th row of𝒟\\mathcal\{D\}, i\.e\., when∏j∈X𝒟i,j=1\\prod\_\{j\\in X\}\\mathcal\{D\}\_\{i,j\}=1\. The frequencyf\(X\)f\(X\)ofXXis defined asf\(X\)=1n∑i=1n∏j∈X𝒟i,jf\(X\)=\\frac\{1\}\{n\}\\sum\_\{i=1\}^\{n\}\\prod\_\{j\\in X\}\\mathcal\{D\}\_\{i,j\}, i\.e\., is the fraction of transactions of𝒟\\mathcal\{D\}that containsXX\. Given a frequency thresholdθ\\theta, the setFI\(𝒟,θ\)\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)contains all itemsets of𝒟\\mathcal\{D\}withf\(X\)≥θf\(X\)\\geq\\theta:FI\(𝒟,θ\)=\{X⊆ℐ:f\(X\)≥θ\}\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)=\\\{X\\subseteq\\mathcal\{I\}:f\(X\)\\geq\\theta\\\}\. For allc∈\[1,d\]c\\in\[1,d\]we define the setFI\(𝒟,θ,c\)\\textsc\{FI\}\(\\mathcal\{D\},\\theta,c\)of frequent itemsets with cardinalityccasFI\(𝒟,θ,c\)=\{X⊆ℐ:f\(X\)≥θ,\|X\|=c\}\\textsc\{FI\}\(\\mathcal\{D\},\\theta,c\)=\\\{X\\subseteq\\mathcal\{I\}:f\(X\)\\geq\\theta,\|X\|=c\\\}\. In this experiment we evaluate the significance of the set of frequent patternsFI\(𝒟,θ\)\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)w\.r\.t\. two different null models for the dataset𝒟\\mathcal\{D\}\. The first model, called GMMT\(Gioniset al\.,[2007](https://arxiv.org/html/2606.11235#bib.bib267)\), definesγ0\\gamma\_\{0\}as the uniform distribution over the set of all datasets with the same marginals of𝒟\\mathcal\{D\}: for each𝒟0\\mathcal\{D\}^\{0\}sampled fromγ0\\gamma\_\{0\}, the number of11’s in each row and column is the same as in𝒟\\mathcal\{D\}\. The second model\(Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263)\), denoted as ALICE, in addition to the column and row marginals, preserves the Bipartite Joint Degree Matrix of the graph representation of𝒟\\mathcal\{D\}\. To evaluate the statistical significance of collections of frequent patterns, we define the test statisticA\(𝒟\)=\|FI\(𝒟,θ\)\|A\(\\mathcal\{D\}\)=\|\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)\|, and we employFewRSto generate resamples fromγ0\\gamma\_\{0\}to evaluate if the number of frequent patterns in the data is higher than what expected under the null distribution\. To do so,FewRSgenerates resamples𝒮=\{𝒟10,…,𝒟m0\}\\mathcal\{S\}=\\\{\\mathcal\{D\}^\{0\}\_\{1\},\\dots,\\mathcal\{D\}^\{0\}\_\{m\}\\\}fromγ0\\gamma\_\{0\}and computes the significance thresholdδ^\(𝒮\)=maxiA\(𝒟i0\)\\hat\{\\delta\}\(\\mathcal\{S\}\)=\\max\_\{i\}A\(\\mathcal\{D\}^\{0\}\_\{i\}\)\. We fixα=0\.05\\alpha=0\.05\. Recall that, whenA\(𝒟\)\>δ^\(𝒮\)A\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\), we conclude that this measure is significant with FWER≤α\\leq\\alpha\. For both GMMT and ALICE, sampling fromγ0\\gamma\_\{0\}is performed using algorithms based on the Markov Chain Monte Carlo \(MCMC\) and Metropolis\-Hastings methods\(Peskun,[1973](https://arxiv.org/html/2606.11235#bib.bib265); Mitzenmacher and Upfal,[2017](https://arxiv.org/html/2606.11235#bib.bib141); Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263)\)\. For the MCMC, we use default setting for the number of iterations \(which is in the order of𝒪\(\|𝒟\|\)\\mathcal\{O\}\(\|\\mathcal\{D\}\|\), see Table[1](https://arxiv.org/html/2606.11235#A1.T1)\), and impose a time limit of44days to traverse the chain; when the time limit is exceeded, the current dataset is returned in output\. We use the values ofθ\\thetashown in Table[1](https://arxiv.org/html/2606.11235#A1.T1)\. We remark that for this task there are no analytical procedures to compute the significance of the results, since no closed form of the distribution of\|FI\(𝒟,θ\)\|\|\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)\|under these complex null models is available\. Figure[1](https://arxiv.org/html/2606.11235#S5.F1)shows the results for this experiment\. The plots compare the number of frequent patterns\|FI\(𝒟,θ\)\|\|\\textsc\{FI\}\(\\mathcal\{D\},\\theta\)\|in the dataset𝒟\\mathcal\{D\}and the values of\|FI\(𝒟i0,θ\)\|\|\\textsc\{FI\}\(\\mathcal\{D\}^\{0\}\_\{i\},\\theta\)\|computed from the resamples drawn from the null distributions, using both the GMMT and ALICE models\. The table of Figure[9](https://arxiv.org/html/2606.11235#A1.F9)in Appendix reports the significance thresholdsδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)computed byFewRSand the averageFI¯=1m∑i=1mA\(𝒟i0\)\\overline\{\\text\{FI\}\}=\\frac\{1\}\{m\}\\sum\_\{i=1\}^\{m\}A\(\\mathcal\{D\}^\{0\}\_\{i\}\)of the test statistics over the resamples\. First, we observe that in both models the values of the test statistic measured on the resamples are very concentrated and stable across different random draws\. Regarding GMMT, from Figure[1](https://arxiv.org/html/2606.11235#S5.F1)we clearly conclude that almost all datasets have a significant number of frequent patterns \(sinceA\(𝒟\)\>δ^\(𝒮\)A\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)\)\. The only exception is the dataset retail, which contains a lower number of frequent patterns than what expected under the null\. As observed by previous work\(Gioniset al\.,[2007](https://arxiv.org/html/2606.11235#bib.bib267)\), this dataset contains structure that may be explained by the row and column marginals alone\. Considering the ALICE model, we observed similar results, with some interesting differences\. First, the distribution of\|FI\(𝒟i0,θ\)\|\|\\textsc\{FI\}\(\\mathcal\{D\}^\{0\}\_\{i\},\\theta\)\|changes in some cases, e\.g\., for the dataset kosarak\. Furthermore, the number of frequent patterns in retail is significant w\.r\.t\. ALICE; in such a case, we may conclude that preserving additional properties of the input dataset in the null distributionγ0\\gamma\_\{0\}highlights additional structure that could not be detected with GMMT\. Finally, note thatFewRSis in perfect agreement with the WY procedure; all hypotheses that WY would reject are also rejected byFewRS, since their statisticsA\(𝒟\)A\(\\mathcal\{D\}\)are sensibly higher than the tresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\), which provides a guaranteed upper bound to the quantileδ\(α\)\\delta\(\\alpha\)\. Moreover, all hypotheses that would not be rejected by WY are also not provided in output by our approach, since they fall well below the average resampled valuesFI¯\\overline\{\\text\{FI\}\}\(thus also well below theα\\alpha\-quantile\)\. We then analyze the distribution of frequent patterns with a higher granularity by considering different itemsets’ cardinalities\. For eachc∈\[1,10\]c\\in\[1,10\], we define the statisticAc\(𝒟\)=\|FI\(𝒟,θ,c\)\|A\_\{c\}\(\\mathcal\{D\}\)=\|\\textsc\{FI\}\(\\mathcal\{D\},\\theta,c\)\|which measures the number of frequent itemsets with sizecc, i\.e\., containingccitems\. As discussed in Section[4\.1](https://arxiv.org/html/2606.11235#S4.SS1), computingδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)using the test statisticsAcA\_\{c\}directly may not be very effective, as we expect such functions to have different scale and range; therefore, we consider the centered and range\-normalized test statisticsAc⊙A\_\{c\}^\{\\odot\}from Equation \([1](https://arxiv.org/html/2606.11235#A1.E1)\) \(Appendix[A\.2](https://arxiv.org/html/2606.11235#A1.SS2)\), where𝒮′\\mathcal\{S\}^\{\\prime\}is an independent set ofm′=10m^\{\\prime\}=10resamples fromγ0\\gamma\_\{0\}\. Thus, we defineδ^\(𝒮\)=maxi,cAc⊙\(𝒟i0\)\\hat\{\\delta\}\(\\mathcal\{S\}\)=\\max\_\{i,c\}A\_\{c\}^\{\\odot\}\(\\mathcal\{D\}\_\{i\}^\{0\}\)and report as significant allccwithAc⊙\(𝒟\)\>δ^\(𝒮\)A\_\{c\}^\{\\odot\}\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)\.      Figure 1\.Left: testing the significance of the number of FI under the GMMT and ALICE models\. Right: testing the significance ofM\(G\)M\(G\)under the CM and Polaris models\.Left: testing the significance of the number of FI under the GMMT and ALICE models\. Right: testing the significance of $M\(G\)$ under the CM and Polaris models\.     Figure 2\.Testing the significance of the number of FI for different cardinalities under the GMMT and ALICE models\.Testing the significance of the number of FI for different cardinalities under the GMMT and ALICE models\.   Figure 3\.Running times to generate and analyze all samples withFewRSand WY for the GMMT and CM models\. The label \(p\) denotes parallelized approaches with3232cores\.Running times to generate and analyze all samples with FewRS and WY for the GMMT and CM models\. The label \(p\) denotes parallelized approaches with $32$ cores\.Figure[2](https://arxiv.org/html/2606.11235#S5.F2)reports the results for this experiment for the dataset retail \(all other datasets are shown in Figures[13](https://arxiv.org/html/2606.11235#A1.F13)\-[16](https://arxiv.org/html/2606.11235#A1.F16)in Appendix due to space constraints\)\. The plots compare the distribution of the numberAc\(𝒟\)A\_\{c\}\(\\mathcal\{D\}\)of frequent patterns with cardinalityccin the dataset𝒟\\mathcal\{D\}with the valuesAc\(𝒟i0\)A\_\{c\}\(\\mathcal\{D\}^\{0\}\_\{i\}\)observed from the resamples, under the GMMT and ALICE models\. Under GMMT, itemsets with cardinality55are significant for retail, an observation that may be overlooked by only considering the global number of frequent patterns\. Then, we observe that the structure of the frequent patterns for lower cardinalities is more interesting under the ALICE model\. We also show the values of the normalized test statisticsAc⊙A\_\{c\}^\{\\odot\}and the values of the significance thresholdδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\); the plots confirm that the employed normalization is very effective in detecting deviations from the null distribution that are comparable across different scales and ranges\. For running times, Figure[3](https://arxiv.org/html/2606.11235#S5.F3)shows the resources required byFewRSand \(an estimate of\) WY to generate all resamples and assess the statistical significance of the FI under GMMT \(ALICE is similar, see Figure[10](https://arxiv.org/html/2606.11235#A1.F10)in the Appendix\)\. Generating even one resample from the null models is quite costly \(see Figure[10](https://arxiv.org/html/2606.11235#A1.F10)in the Appendix\), in particular for large datasets, due to a high number of MCMC iterations needed to reach convergence and each iteration being fairly expensive\. The immediate consequence is that generating and mining a high number of resamples, as done by the WY procedure, is extremely expensive, and infeasible for larger instances\. In fact, WY would need several months to analyze77out of the1414datasets we considered, using10310^\{3\}resamples \(and parallelizing over3232cores\)\. Using the recommended amount of resamples \(10410^\{4\}\) would require more than a year of computation\. Instead,FewRSis faster by at least one order of magnitude, up to two orders, thus it enables the analysis of these instances by requiring at most a few days of computation\. Therefore, this experiment clearly shows thatFewRSis very effective and powerful in assessing significance for frequent pattern mining tasks, while greatly reducing the computational burden of state\-of\-the\-art resampling procedures\. Analysis of labelled networks\.Next, we applyFewRSto analyze large real\-world labelled networks\. We consider1515graphs, where each nodevvhas a categorial labelc\(v\)c\(v\), i\.e\., a color\. For instance, in graphs from online social networks the color of a node denotes the view of the corresponding profile w\.r\.t\. a polarizing topic \(e\.g\., the graph Guns encodes the opinion of each user w\.r\.t\. gun control\(Garimellaet al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib271)\)\)\. Table[2](https://arxiv.org/html/2606.11235#A1.T2)in the Appendix shows their statistics\.      Figure 4\.Testing the significance ofMv\(G\)M\_\{v\}\(G\)of the1010nodes with highest degree under the Polaris model\.Testing the significance of $M\_\{v\}\(G\)$ of the $10$ nodes with highest degree under the Polaris model\.For each graphG=\(V,E\)G=\(V,E\)we consider a relatively simple measureM\(G\)M\(G\)of the level of polarization of the network, which computes the average fraction of neighbors of a nodevvthat has the same color ofvv\. More precisely, letN\(v\)N\(v\)be the multiset of neighbors of the nodev∈Vv\\in V\. We defineM\(G\)=1\|V\|∑v∈V\|\{u∈N\(v\):c\(u\)=c\(v\)\}\|\|N\(v\)\|M\(G\)=\\frac\{1\}\{\|V\|\}\\sum\_\{v\\in V\}\\frac\{\|\\\{u\\in N\(v\):c\(u\)=c\(v\)\\\}\|\}\{\|N\(v\)\|\}\. A large value ofM\(G\)M\(G\)denotes that most nodes of the network are connected to nodes of the same color, e\.g\., that have the same side on a polarizing topic, meaning that they have a low exposure to diverse opinions\. We also evaluate a more local measure of neighbor diversity by testing the connections of the most relevant nodes of the graph\. We consider the1010nodes with highest degree in the graphGG; for each such avv, we define the measureMv\(G\)=\|\{u∈N\(v\):c\(u\)=c\(v\)\}\|/\|N\(v\)\|M\_\{v\}\(G\)=\|\\\{u\\in N\(v\):c\(u\)=c\(v\)\\\}\|/\|N\(v\)\|which only considers the colors of the neighbors ofvv\. We evaluate the statistical significance of these measures using resamples from two null distributionsγ0\\gamma\_\{0\}\. First, we consider the Configuration Model \(CM\)\(Bollobás,[1980](https://arxiv.org/html/2606.11235#bib.bib264); Fosdicket al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib266)\), where a random graphG0∼γ0G^\{0\}\\sim\\gamma\_\{0\}is taken uniformly at random from the set of all graphs with the same degree sequence of the input graphGG\. Then, we consider Polaris\(Pretiet al\.,[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\), that is an extension of CM to labelled graphs which preserves also the color assortativity of the graph, that is the total number of edges between all pairs of colors\(Newman,[2003](https://arxiv.org/html/2606.11235#bib.bib262)\)\. As for the previous experiment, we sample from the null models with state\-of\-the\-art MCMC algorithms\(Peskun,[1973](https://arxiv.org/html/2606.11235#bib.bib265); Mitzenmacher and Upfal,[2017](https://arxiv.org/html/2606.11235#bib.bib141); Fosdicket al\.,[2018](https://arxiv.org/html/2606.11235#bib.bib266); Pretiet al\.,[2025a](https://arxiv.org/html/2606.11235#bib.bib261)\)\. To evaluate the statistical significance ofM\(G\)M\(G\), for each graphGG, we applyFewRSwithα=0\.05\\alpha=0\.05, which generates resamples𝒮=\{G10,…,Gm0\}\\mathcal\{S\}=\\\{G^\{0\}\_\{1\},\\dots,G^\{0\}\_\{m\}\\\}fromγ0\\gamma\_\{0\}and computes the significance thresholdδ^\(𝒮\)=maxiM\(Gi0\)\\hat\{\\delta\}\(\\mathcal\{S\}\)=\\max\_\{i\}M\(G^\{0\}\_\{i\}\)\. WhenM\(G\)\>δ^\(𝒮\)M\(G\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\), we conclude that this measure is significant with FWER≤α\\leq\\alpha\. RegardingMv\(G\)M\_\{v\}\(G\), we define centralized and range\-normalized statisticsAv⊙\(G\)A\_\{v\}^\{\\odot\}\(G\)using an independent set of resamples𝒮′\\mathcal\{S\}^\{\\prime\}as done for the frequent patterns setting\. While we fixm′=10m^\{\\prime\}=10, we test other choices in Appendix[A\.3](https://arxiv.org/html/2606.11235#A1.SS3)\. Note that also for this task we can’t use any analytical correction procedure, since no closed form of the distribution ofM\(G0\)M\(G^\{0\}\)andMv\(G\)M\_\{v\}\(G\)under these complex null models is available\. Figure[1](https://arxiv.org/html/2606.11235#S5.F1)shows the results for this experiment\. The plots show a comparison between the values ofM\(G\)M\(G\)and the values ofM\(G0\)M\(G^\{0\}\)from the resamplesG0G^\{0\}generated fromγ0\\gamma\_\{0\}, for both the CM and Polaris models\. The table in Figure[9](https://arxiv.org/html/2606.11235#A1.F9)in Appendix reports the values ofM\(G\)M\(G\), the meansM¯=1m∑i=1mM\(Gi0\)\\overline\{M\}=\\frac\{1\}\{m\}\\sum\_\{i=1\}^\{m\}M\(G^\{0\}\_\{i\}\), and the thresholdsδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)computed byFewRS\. Note that, for all graphs and under CM, it holdsM\(G\)\>δ^\(𝒮\)M\(G\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\); in all networksM\(G\)M\(G\)is significantly higher compared to random graphs from CM\. This may be a sign that the fraction of neighbors with the same color is not explained by the degree sequence of the nodes\. Then, we clearly observe that preserving the color assortativity with Polaris has a significant impact on the distribution ofM\(G0\)M\(G^\{0\}\): the values ofM\(G0\)M\(G^\{0\}\)are considerably higher in Polaris compared to the CM model\. Nevertheless, in all but33graphs \(Cite, US\-Elect, and Com\-Orkut\) we observeM\(G\)\>δ^\(𝒮\)M\(G\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)also under the Polaris model\. In such33cases, we observedM\(G\)<M¯M\(G\)<\\overline\{M\}, i\.e\., a fraction of mono\-chromatic neighbors lower than what expended under the Polaris null model\. These findings imply that the users in most of the networks we considered are exposed to less diverse content than expected under the null; on the other hand, in some cases the color assortativity of the network may explain the observed overall fraction of mono\-chromatic neighbors\. Figure[4](https://arxiv.org/html/2606.11235#S5.F4)reports the valuesMv\(G\)M\_\{v\}\(G\), that quantify the the fraction of neighbors with the same color for the nodes with highest degree in the graphs, and compare them withMv\(Gi0\)M\_\{v\}\(G^\{0\}\_\{i\}\)computed from the resampled graphs, for US\-Elect and Com\-Orkut under Polaris \(all other results are in Figures[17](https://arxiv.org/html/2606.11235#A1.F17)\-[20](https://arxiv.org/html/2606.11235#A1.F20)in Appendix due to space constraints\)\. Interestingly, the fraction of mono\-chromatic neighbors of the top nodes of US\-Elect aligns with the corresponding global average, i\.e\., it is lower than expected; on the other hand, for Com\-Orkut the top nodes have a significantly higher fraction of mono\-chromatic neighbors compared to the overall network\. These observations provide additional insights on the structure of the overall network, and the role played by the most relevant nodes\. Finally, note that, as in the previous setting,FewRSleads to the same results of the WY procedure: all hypotheses that WY would reject are also rejected byFewRS, and all hypotheses that would not be rejected by WY are also not provided in output byFewRS\. For computational resources, Figures[3](https://arxiv.org/html/2606.11235#S5.F3),[11](https://arxiv.org/html/2606.11235#A1.F11), and[12](https://arxiv.org/html/2606.11235#A1.F12)\(in the Appendix\) show the time to generate one resample under CM and Polaris models, the time needed byFewRS, and the time for the WY procedure\. Similarly to previous experiments, sampling from the null distribution is costly even for a single resample, e\.g\.,≈3\\approx 3hours for the largest graph\. This implies that WY does not scale to the most challenging instances, even when parallelized\. In fact, for the two largest graphs, generating10310^\{3\}resamples requires almost a week and more than one month, respectively\. Drawing10410^\{4\}resamples, which is the suggested amount to accurately estimate theα\\alpha\-quantile, is clearly infeasible: for the largest graph more than11year of computation would be necessary\. Instead,FewRSanalyzes such instances in at most a couple of days, with a speedup of at least11order of magnitude, up to22orders\. It is clear that considering larger instances, or more expensive analysis \(e\.g\., evaluating measures more complex thanM\(G\)M\(G\)\), is not possible with WY\. Therefore, we conclude thatFewRSsignificantly accelerates the statistical assessment of graph analysis tasks, while yielding rigorous guarantees and high statistical power; moreover, it enables the analysis of challenging instances, that are out of reach for state\-of\-the\-art methods\. Comparison with FSR\.Finally, we compare our method with FSR\(Pellegrina and Vandin,[2024](https://arxiv.org/html/2606.11235#bib.bib11)\), a few\-shot resampling approach tailored to discover significant patterns from transactional datasets, where each transaction is labelled with a binary target\. The set of analyses𝒜=\{AP:P∈ℒ\}\\mathcal\{A\}=\\\{A\_\{P\}:P\\in\\mathcal\{L\}\\\}contains one test statisticAPA\_\{P\}for each patternPPfrom the languageℒ\\mathcal\{L\}of*subgroups*; eachPPis defined as \(at mostzz\) conjunctions of logical conditions over theddfeatures of the dataset𝒟\\mathcal\{D\}\(statistics shown in Table[3](https://arxiv.org/html/2606.11235#A1.T3)\)\. Note that the size of𝒜\\mathcal\{A\}isΘ\(dz\)\\Theta\\left\(d^\{z\}\\right\)\. Each test statisticAPA\_\{P\}corresponds to the11\-quality of the subgroupPP, which quantifies its association with the target variable\(Pellegrina and Vandin,[2024](https://arxiv.org/html/2606.11235#bib.bib11)\)\(more details in Appendix[A\.4](https://arxiv.org/html/2606.11235#A1.SS4)\)\. We fixα=0\.05\\alpha=0\.05, and usemmresamples for bothFewRSand FSR; each resample is obtained by randomly permuting the target labels, while the maximum test statistic is computed with a depth\-first search\. Figure[22](https://arxiv.org/html/2606.11235#A1.F22)\(in the Appendix\) displays the respective thresholds, the number of significant patterns reported in output, and the running time\. We observed that the thresholdsδ^\(𝒮\)\\hat\{\\delta\}\(\\mathcal\{S\}\)obtained byFewRSare sensibly smaller than the thresholdsε\\varepsiloncomputed by FSR, regardless of the fact that FSR is tailored to the considered task, whileFewRShas wide applicability\. Consequently,FewRSreports in output a much higher number of significant patterns, for the same work \(and running time\)\. While FSR works with any number of resamples, asε\\varepsilonis computed with advanced concentration bounds, our results indicate that it is more conservative thanFewRS\. Overall,FewRSachieves an excellent trade\-off between power and computation resources, even in a high\-dimensional setting where a large number of analyses is performed\. ## 6\.Conclusions We introducedFewRS, a few\-shot resampling approach to assess the statistical significance of data mining results with rigorous guarantees on the probability of false discoveries\.FewRSbuilds on a novel probabilistic bound to the supremum deviation of statistics representing the quality of data mining results, and can be used in every scenario where resampling\-based approaches are applied\.FewRSgenerates and mines an extremely small number of resamples, providing a highly scalable approach with wide applicability\. Our experiments show thatFewRSresults in a reduction of up to two orders of magnitude in running time compared to state\-of\-the\-art approaches, enabling the identification of statistically\-sound data mining results on large\-scale real\-world datasets\. ###### Acknowledgements\. This work is supported by the STARS@UNIPD 2025 program, project ”AtHeNA: Algorithms for Heterogeneous Network Analysis”, with the support of the University of Padova and Fondazione Cassa di Risparmio di Padova e Rovigo, and by the Italian Ministry of University and Research \(MUR\), project “National Center for HPC, Big Data, and Quantum Computing” CN00000013\. ## References - M\. Abuissa, A\. Lee, and M\. Riondato \(2023\)ROhAN: row\-order agnostic null models for statistically\-sound knowledge discovery\.Data Mining and Knowledge Discovery37\(4\),pp\. 1692–1718\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - R\. Agrawal, T\. Imieliński, and A\. Swami \(1993\)Mining association rules between sets of items in large databases\.InProceedings of the 1993 ACM SIGMOD international conference on Management of data,pp\. 207–216\.Cited by:[§5](https://arxiv.org/html/2606.11235#S5.p7.50)\. - M\. Al Hasan and M\. J\. Zaki \(2009\)Output space sampling for graph patterns\.Proceedings of the VLDB Endowment2\(1\),pp\. 730–741\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - Y\. Benjamini and Y\. Hochberg \(1995\)Controlling the false discovery rate: a practical and powerful approach to multiple testing\.Journal of the Royal statistical society: series B \(Methodological\)57\(1\),pp\. 289–300\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p2.3)\. - M\. Boley, C\. Lucchese, D\. Paurat, and T\. Gärtner \(2011\)Direct local pattern sampling by efficient two\-step random procedures\.InProceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining,pp\. 582–590\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - B\. Bollobás \(1980\)A probabilistic proof of an asymptotic formula for the number of labelled regular graphs\.European Journal of Combinatorics1\(4\),pp\. 311–316\.Cited by:[§5](https://arxiv.org/html/2606.11235#S5.p13.16)\. - C\. Bonferroni \(1936\)Teoria statistica delle classi e calcolo delle probabilita\.Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze8,pp\. 3–62\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p4.16)\. - M\. Cinelli, G\. De Francisci Morales, A\. Galeazzi, W\. Quattrociocchi, and M\. Starnini \(2021\)The echo chamber effect on social media\.Proceedings of the national academy of sciences118\(9\),pp\. e2023301118\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3)\. - M\. Conover, J\. Ratkiewicz, M\. Francisco, B\. Gonçalves, F\. Menczer, and A\. Flammini \(2011\)Political polarization on twitter\.InProceedings of the international aaai conference on web and social media,Vol\.5,pp\. 89–96\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3)\. - S\. Dalleiger and J\. Vreeken \(2022\)Discovering significant patterns under sequential false discovery control\.InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,pp\. 263–272\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p2.3)\. - L\. Diop, C\. T\. Diop, A\. Giacometti, D\. Li, and A\. Soulet \(2018\)Sequential pattern sampling with norm constraints\.In2018 IEEE International Conference on Data Mining \(ICDM\),pp\. 89–98\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - L\. Diop \(2022\)High average\-utility itemset sampling under length constraints\.InPacific\-Asia Conference on Knowledge Discovery and Data Mining,pp\. 134–148\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - B\. K\. Fosdick, D\. B\. Larremore, J\. Nishimura, and J\. Ugander \(2018\)Configuring random graph models with fixed degree sequences\.Siam Review60\(2\),pp\. 315–355\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3),[§5](https://arxiv.org/html/2606.11235#S5.p1.1),[§5](https://arxiv.org/html/2606.11235#S5.p13.16)\. - K\. Garimella, G\. D\. F\. Morales, A\. Gionis, and M\. Mathioudakis \(2018\)Quantifying controversy on social media\.ACM Transactions on Social Computing1\(1\),pp\. 1–27\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3),[§5](https://arxiv.org/html/2606.11235#S5.p12.3)\. - A\. Giacometti and A\. Soulet \(2018\)Dense neighborhood pattern sampling in numerical data\.InProceedings of the 2018 SIAM International Conference on Data Mining,pp\. 756–764\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - A\. Gionis, H\. Mannila, T\. Mielikäinen, and P\. Tsaparas \(2007\)Assessing data mining results via swap randomization\.ACM Transactions on Knowledge Discovery from Data \(TKDD\)1\(3\),pp\. 14–es\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p2.3),[§5](https://arxiv.org/html/2606.11235#S5.p1.1),[§5](https://arxiv.org/html/2606.11235#S5.p7.50),[§5](https://arxiv.org/html/2606.11235#S5.p8.13)\. - S\. González\-Bailón, D\. Lazer, P\. Barberá, M\. Zhang, H\. Allcott, T\. Brown, A\. Crespo\-Tenorio, D\. Freelon, M\. Gentzkow, A\. M\. Guess,et al\.\(2023\)Asymmetric ideological segregation in exposure to political news on facebook\.Science381\(6656\),pp\. 392–398\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3)\. - W\. Hämäläinen and G\. I\. Webb \(2019\)A tutorial on statistically sound pattern discovery\.Data Mining and Knowledge Discovery33,pp\. 325–377\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p1.1)\. - J\. Han, J\. Pei, and H\. Tong \(2022\)Data mining: concepts and techniques\.Morgan kaufmann\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p1.1)\. - Y\. Hochberg \(1988\)A sharper bonferroni procedure for multiple tests of significance\.Biometrika75\(4\),pp\. 800–802\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p4.16)\. - S\. Holm \(1979\)A simple sequentially rejective multiple test procedure\.Scandinavian journal of statistics,pp\. 65–70\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p4.16)\. - S\. Jenkins, S\. Walzer\-Goldfeld, and M\. Riondato \(2022\)SPEck: mining statistically\-significant sequential patterns efficiently with exact sampling\.Data Mining and Knowledge Discovery36\(4\),pp\. 1575–1599\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - A\. Kirsch, M\. Mitzenmacher, A\. Pietracaprina, G\. Pucci, E\. Upfal, and F\. Vandin \(2012\)An efficient rigorous approach for identifying statistically significant frequent itemsets\.Journal of the ACM \(JACM\)59\(3\),pp\. 1–22\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p2.3)\. - J\. Leskovec, A\. Rajaraman, and J\. D\. Ullman \(2020\)Mining of massive data sets\.Cambridge university press\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p1.1)\. - F\. Llinares\-López, M\. Sugiyama, L\. Papaxanthos, and K\. Borgwardt \(2015\)Fast and memory\-efficient significant pattern mining via permutation testing\.InProceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining,pp\. 725–734\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p3.1),[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9)\. - N\. Meinshausen, M\. H\. Maathuis, and P\. Bühlmann \(2011\)Asymptotic optimality of the westfall\-young permutation procedure for multiple testing under dependence\.The Annals of Statistics,pp\. 3369–3391\.Cited by:[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9)\. - S\. Minato, T\. Uno, K\. Tsuda, A\. Terada, and J\. Sese \(2014\)A fast method of statistical assessment for combinatorial hypotheses based on frequent itemset enumeration\.InECML PKDD 2014,Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p5.10)\. - M\. Mitzenmacher and E\. Upfal \(2017\)Probability and computing: randomization and probabilistic techniques in algorithms and data analysis\.Cambridge university press\.Cited by:[§5](https://arxiv.org/html/2606.11235#S5.p13.16),[§5](https://arxiv.org/html/2606.11235#S5.p4.15),[§5](https://arxiv.org/html/2606.11235#S5.p7.50)\. - M\. E\. Newman \(2003\)Mixing patterns in networks\.Physical review E67\(2\),pp\. 026126\.Cited by:[§5](https://arxiv.org/html/2606.11235#S5.p13.16)\. - L\. Pellegrina, M\. Riondato, and F\. Vandin \(2019a\)Hypothesis testing and statistically\-sound pattern mining\.InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,KDD ’19,New York, NY, USA,pp\. 3215–3216\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p1.1)\. - L\. Pellegrina, M\. Riondato, and F\. Vandin \(2019b\)SPuManTE: significant pattern mining with unconditional testing\.InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,KDD ’19,New York, NY, USA,pp\. 1528–1538\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p3.1)\. - L\. Pellegrina and F\. Vandin \(2020\)Efficient mining of the most significant patterns with permutation testing\.Data Mining and Knowledge Discovery34,pp\. 1201–1234\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p3.1),[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9)\. - L\. Pellegrina and F\. Vandin \(2024\)Efficient discovery of significant patterns with few\-shot resampling\.Proceedings of the VLDB Endowment17\(10\),pp\. 2668–2680\.Cited by:[§A\.4](https://arxiv.org/html/2606.11235#A1.SS4.p2.18),[§2](https://arxiv.org/html/2606.11235#S2.p3.1),[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9),[§5](https://arxiv.org/html/2606.11235#S5.p1.1),[§5](https://arxiv.org/html/2606.11235#S5.p17.18)\. - P\. H\. Peskun \(1973\)Optimum monte\-carlo sampling using markov chains\.Biometrika60\(3\),pp\. 607–612\.Cited by:[§5](https://arxiv.org/html/2606.11235#S5.p13.16),[§5](https://arxiv.org/html/2606.11235#S5.p7.50)\. - G\. Preti, G\. de Francisci Morales, and M\. Riondato \(2024\)Alice and the caterpillar: a more descriptive null model for assessing data mining results\.Knowledge and Information Systems66\(3\),pp\. 1917–1954\.Cited by:[Table 1](https://arxiv.org/html/2606.11235#A1.T1),[Table 1](https://arxiv.org/html/2606.11235#A1.T1.10.5),[§2](https://arxiv.org/html/2606.11235#S2.p4.1),[§5](https://arxiv.org/html/2606.11235#S5.p1.1),[§5](https://arxiv.org/html/2606.11235#S5.p2.1),[§5](https://arxiv.org/html/2606.11235#S5.p7.50)\. - G\. Preti, M\. Riondato, A\. Gionis, and G\. De Francisci Morales \(2025a\)Polaris: sampling from the multigraph configuration model with prescribed color assortativity\.InProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining,pp\. 30–39\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3),[§2](https://arxiv.org/html/2606.11235#S2.p4.1),[§5](https://arxiv.org/html/2606.11235#S5.p1.1),[§5](https://arxiv.org/html/2606.11235#S5.p13.16),[§5](https://arxiv.org/html/2606.11235#S5.p2.1)\. - G\. Preti, M\. Riondato, A\. Gionis, and G\. D\. F\. Morales \(2025b\)DSP: a statistically\-principled structural polarization measure\.arXiv preprint arXiv:2512\.03937\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p6.3)\. - P\. Tan, M\. Steinbach, and V\. Kumar \(2016\)Introduction to data mining\.Pearson Education India\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p1.1)\. - A\. Terada, H\. Kim, and J\. Sese \(2015\)High\-speed westfall\-young permutation procedure for genome\-wide association studies\.InProceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics,pp\. 17–26\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p3.1)\. - A\. Terada, M\. Okada\-Hatakeyama, K\. Tsuda, and J\. Sese \(2013\)Statistical significance of combinatorial regulations\.Proceedings of the National Academy of Sciences110\(32\),pp\. 12996–13001\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p5.10),[§2](https://arxiv.org/html/2606.11235#S2.p3.1)\. - A\. Tonon and F\. Vandin \(2019\)Permutation strategies for mining significant sequential patterns\.In2019 IEEE International Conference on Data Mining \(ICDM\),pp\. 1330–1335\.Cited by:[§2](https://arxiv.org/html/2606.11235#S2.p4.1)\. - G\. I\. Webb \(2006\)Discovering significant rules\.InProceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining,pp\. 434–443\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p5.10)\. - G\. I\. Webb \(2007\)Discovering significant patterns\.Machine learning68,pp\. 1–33\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p5.10)\. - G\. I\. Webb \(2008\)Layered critical values: a powerful direct\-adjustment approach to discovering significant patterns\.Machine Learning71,pp\. 307–323\.Cited by:[§1](https://arxiv.org/html/2606.11235#S1.p5.10)\. - P\. H\. Westfall and J\. F\. Troendle \(2008\)Multiple testing with minimal assumptions\.Biometrical Journal: Journal of Mathematical Methods in Biosciences50\(5\),pp\. 745–755\.Cited by:[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9)\. - P\. H\. Westfall and S\. S\. Young \(1993\)Resampling\-based multiple testing: examples and methods for p\-value adjustment\.John Wiley & Sons\.Cited by:[Appendix A](https://arxiv.org/html/2606.11235#A1.1),[§1](https://arxiv.org/html/2606.11235#S1.p5.10),[§1](https://arxiv.org/html/2606.11235#S1.p7.3),[§2](https://arxiv.org/html/2606.11235#S2.p1.1),[§2](https://arxiv.org/html/2606.11235#S2.p3.1),[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.13),[§3\.1](https://arxiv.org/html/2606.11235#S3.SS1.p1.9)\. ## Appendix AAppendix In this appendix we provide additional theoretical results, proofs, and experimental results that could not fit in the main manuscript\. ###### Proof of Lemma[1](https://arxiv.org/html/2606.11235#S3.Thmtheorem1)\(Westfall and Young,[1993](https://arxiv.org/html/2606.11235#bib.bib43)\)\. From the definition of the null hypotheses, and from subset pivotality, we have that Pr𝒟∼γ\(maxA∈ℋ0\{A\(𝒟\)\}\>δ\(α\)\)≤Pr𝒟0∼γ0\(maxA∈ℋ0\{A\(𝒟0\)\}\>δ\(α\)\)\.\\displaystyle\\Pr\_\{\\mathcal\{D\}\\sim\\gamma\}\\left\(\\max\_\{A\\in\\mathcal\{H\}\_\{0\}\}\\\{A\(\\mathcal\{D\}\)\\\}\>\\delta\(\\alpha\)\\right\)\\leq\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\max\_\{A\\in\\mathcal\{H\}\_\{0\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\>\\delta\(\\alpha\)\\right\)\.Thus, from the factℋ0⊆𝒜\\mathcal\{H\}\_\{0\}\\subseteq\\mathcal\{A\}, and the definition ofδ\(α\)\\delta\(\\alpha\), it holds Pr𝒟0∼γ0\(maxA∈ℋ0\{A\(𝒟0\)\}\>δ\(α\)\)≤Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}\>δ\(α\)\)≤α,\\displaystyle\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\\!\\left\(\\max\_\{A\\in\\mathcal\{H\}\_\{0\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\\!\>\\\!\\delta\(\\alpha\)\\right\)\\\!\\leq\\\!\\\!\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\\!\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\\!\>\\\!\\delta\(\\alpha\)\\right\)\\leq\\alpha,which concludes the proof\. ∎ ###### Proof of Lemma[1](https://arxiv.org/html/2606.11235#S4.Thmtheorem1)\. By definition ofδ\(α\)\\delta\(\\alpha\), for anyε\>0\\varepsilon\>0it holds Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}\>δ\(α\)−ε\)\>α\.\\displaystyle\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\>\\delta\(\\alpha\)\-\\varepsilon\\right\)\>\\alpha\.Taking the limitε→0\+\\varepsilon\\rightarrow 0^\{\+\}on both sides, we have α≤\\displaystyle\\alpha\\leqlimε→0\+Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}\>δ\(α\)−ε\)\\displaystyle\\lim\_\{\\varepsilon\\rightarrow 0^\{\+\}\}\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\>\\delta\(\\alpha\)\-\\varepsilon\\right\)=1−limε→0\+Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}≤δ\(α\)−ε\)\.\\displaystyle=1\-\\lim\_\{\\varepsilon\\rightarrow 0^\{\+\}\}\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\leq\\delta\(\\alpha\)\-\\varepsilon\\right\)\.Letε1,ε2,…\\varepsilon\_\{1\},\\varepsilon\_\{2\},\\dotsbe a decreasing sequenceεi\>εi\+1\\varepsilon\_\{i\}\>\\varepsilon\_\{i\+1\}converging to0\. Then, letXi=“maxA∈𝒜\{A\(𝒟0\)\}≤δ\(α\)−εi”X\_\{i\}=\\text\{\`\`\}\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\leq\\delta\(\\alpha\)\-\\varepsilon\_\{i\}\\text\{''\}\. These form an increasing sequence of measurable setsXi⊆Xi\+1X\_\{i\}\\subseteq X\_\{i\+1\}, whose limit is limi→\+∞Xi=“maxA∈𝒜\{A\(𝒟0\)\}<δ\(α\)”\.\\displaystyle\\lim\_\{i\\rightarrow\+\\infty\}X\_\{i\}=\\text\{\`\`\}\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}<\\delta\(\\alpha\)\\text\{''\}\.Moreover, sinceXiX\_\{i\}is an increasing sequence, it holds limi→\+∞Pr𝒟0∼γ0\(Xi\)=Pr𝒟0∼γ0\(limi→\+∞Xi\)\.\\displaystyle\\lim\_\{i\\rightarrow\+\\infty\}\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\(X\_\{i\}\)=\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\lim\_\{i\\rightarrow\+\\infty\}X\_\{i\}\\right\)\.Therefore, α≤1−limε→0\+Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}≤δ\(α\)−ε\)\\displaystyle\\alpha\\leq 1\-\\lim\_\{\\varepsilon\\rightarrow 0^\{\+\}\}\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\leq\\delta\(\\alpha\)\-\\varepsilon\\right\)=1−Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}<δ\(α\)\)=Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}≥δ\(α\)\),\\displaystyle=1\-\\\!\\\!\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\\!\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\\!<\\\!\\delta\(\\alpha\)\\right\)\\\!=\\\!\\\!\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\\!\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\\!\\geq\\\!\\delta\(\\alpha\)\\right\),obtaining the statement\. ∎ ###### Proof of Lemma[2](https://arxiv.org/html/2606.11235#S4.Thmtheorem2)\. We prove thatPr𝒮\(δ^\(𝒮\)<δ\(α\)\)≤α\\Pr\_\{\\mathcal\{S\}\}\(\\hat\{\\delta\}\(\\mathcal\{S\}\)<\\delta\(\\alpha\)\)\\leq\\alpha\. First, note that the event “δ^\(𝒮\)<δ\(α\)\\hat\{\\delta\}\(\\mathcal\{S\}\)<\\delta\(\\alpha\)” is equivalent to the intersection of themmevents “maxA∈𝒜𝒟i0<δ\(α\)\\max\_\{A\\in\\mathcal\{A\}\}\\mathcal\{D\}^\{0\}\_\{i\}<\\delta\(\\alpha\)”,∀i∈\[1,m\]\\forall i\\in\[1,m\]; moreover, these events are mutually independent, since each resample𝒟i0\\mathcal\{D\}^\{0\}\_\{i\}is taken i\.i\.d\. fromγ0\\gamma\_\{0\}\. Then, from Lemma[1](https://arxiv.org/html/2606.11235#S4.Thmtheorem1), it holds Pr𝒟i0\(maxA∈𝒜𝒟i0<δ\(α\)\)≤1−α,\\Pr\_\{\\mathcal\{D\}^\{0\}\_\{i\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\mathcal\{D\}^\{0\}\_\{i\}<\\delta\(\\alpha\)\\right\)\\leq 1\-\\alpha,for alli∈\[1,m\]i\\in\[1,m\]\. Therefore, we have Pr𝒮\(δ^\(𝒮\)<δ\(α\)\)=∏i=1mPr𝒟i0\(maxA∈𝒜𝒟i0<δ\(α\)\)≤\(1−α\)m\.\\displaystyle\\Pr\_\{\\mathcal\{S\}\}\\left\(\\hat\{\\delta\}\(\\mathcal\{S\}\)<\\delta\(\\alpha\)\\right\)=\\prod\_\{i=1\}^\{m\}\\Pr\_\{\\mathcal\{D\}^\{0\}\_\{i\}\}\\left\(\\max\_\{A\\in\\mathcal\{A\}\}\\mathcal\{D\}^\{0\}\_\{i\}<\\delta\(\\alpha\)\\right\)\\leq\(1\-\\alpha\)^\{m\}\.We now show that, ifmmis set as in the statement, then\(1−α\)m≤α\(1\-\\alpha\)^\{m\}\\leq\\alpha\. This follow from the fact \(1−α\)m≤α⇔m≥ln\(1α\)ln\(11−α\),\\displaystyle\(1\-\\alpha\)^\{m\}\\leq\\alpha\\iff m\\geq\\frac\{\\ln\\bigl\(\\frac\{1\}\{\\alpha\}\\bigr\)\}\{\\ln\\bigl\(\\frac\{1\}\{1\-\\alpha\}\\bigr\)\},completing the proof\. ∎ ### A\.1\.Power Analysis ofFewRS This section presents additional results on the guarantees to the statistical power ofFewRS\. ###### Proof of Lemma[5](https://arxiv.org/html/2606.11235#S4.Thmtheorem5)\. From the definition ofδ\(α\)\\delta\(\\alpha\), it holds Pr\(maxA∈𝒜A\(𝒟0\)\>δ\(α/k\)\)≤α/k\.\\displaystyle\\Pr\\left\(\\max\_\{A\\in\\mathcal\{A\}\}A\(\\mathcal\{D\}^\{0\}\)\>\\delta\(\\alpha/k\)\\right\)\\leq\\alpha/k\.Therefore, we have Pr\(δ^\(𝒮\)≤δ\(α/k\)\)=∏i=1mPr\(maxA∈𝒜A\(𝒟i0\)≤δ\(α/k\)\)\\displaystyle\\Pr\\left\(\\hat\{\\delta\}\(\\mathcal\{S\}\)\\leq\\delta\(\\alpha/k\)\\right\)=\\prod\_\{i=1\}^\{m\}\\Pr\\left\(\\max\_\{A\\in\\mathcal\{A\}\}A\(\\mathcal\{D\}^\{0\}\_\{i\}\)\\leq\\delta\(\\alpha/k\)\\right\)=∏i=1m\(1−Pr\(maxA∈𝒜A\(𝒟i0\)\>δ\(α/k\)\)\)≥∏i=1m\(1−αk\)=\(1−αk\)m\.\\displaystyle=\\\!\\prod\_\{i=1\}^\{m\}\\\!\\left\(1\-\\Pr\\left\(\\max\_\{A\\in\\mathcal\{A\}\}A\(\\mathcal\{D\}^\{0\}\_\{i\}\)\\\!\>\\\!\\delta\(\\alpha/k\)\\right\)\\right\)\\geq\\prod\_\{i=1\}^\{m\}\\\!\\left\(1\-\\frac\{\\alpha\}\{k\}\\right\)\\\!=\\\!\\left\(1\-\\frac\{\\alpha\}\{k\}\\right\)^\{m\}\\\!\.Setting\(1−αk\)m≥1−t\(1\-\\frac\{\\alpha\}\{k\}\)^\{m\}\\geq 1\-tyields m≤ln\(11−t\)ln\(kk−α\)≤kln\(11−t\)α,\\displaystyle m\\leq\\frac\{\\ln\\left\(\\frac\{1\}\{1\-t\}\\right\)\}\{\\ln\\left\(\\frac\{k\}\{k\-\\alpha\}\\right\)\}\\leq\\frac\{k\\ln\\left\(\\frac\{1\}\{1\-t\}\\right\)\}\{\\alpha\},which always holds whenk=mαln\(11−t\)k=\\frac\{m\\alpha\}\{\\ln\(\\frac\{1\}\{1\-t\}\)\}, proving the statement\. ∎ The following results provide a generalization ofFewRSto improve the statistical power when the generation of a higher number of random resamples is possible\. Recall that𝒮=\{𝒟10,𝒟20,…,𝒟m0\}\\mathcal\{S\}=\\\{\\mathcal\{D\}^\{0\}\_\{1\},\\mathcal\{D\}^\{0\}\_\{2\},\\dots,\\mathcal\{D\}^\{0\}\_\{m\}\\\}, and definedi=maxA∈𝒜𝒟i0d\_\{i\}=\\max\_\{A\\in\\mathcal\{A\}\}\\mathcal\{D\}^\{0\}\_\{i\}\. For anyℓ∈\[1,m\]\\ell\\in\[1,m\], letr\(𝒮,ℓ\)r\(\\mathcal\{S\},\\ell\)be theℓ\\ell\-th largest value of\{di\}\\\{d\_\{i\}\\\}, i\.e\.,r\(𝒮,ℓ\)=max\{x:∑i=1m𝟙\[di≥x\]≥ℓ\}r\(\\mathcal\{S\},\\ell\)=\\max\\\{x:\\sum\_\{i=1\}^\{m\}\\mathds\{1\}\\left\[d\_\{i\}\\geq x\\right\]\\geq\\ell\\\}\. Finally, defineB\(i,m,p\)B\(i,m,p\)as the \(left\) tail of the Binomial distribution: B\(i,m,p\)=∑j=0i\(mj\)pj\(1−p\)m−j\.\\displaystyle B\(i,m,p\)=\\sum\_\{j=0\}^\{i\}\\binom\{m\}\{j\}p^\{j\}\(1\-p\)^\{m\-j\}\. ###### Lemma 1\. Letm≥⌈ln\(1α\)/ln\(11−α\)⌉m\\geq\\left\\lceil\\ln\(\\frac\{1\}\{\\alpha\}\)/\\ln\(\\frac\{1\}\{1\-\\alpha\}\)\\right\\rceil, and define ℓ=max\{i∈\[1,m\]:B\(i−1,m,α\)≤α\}\.\\displaystyle\\ell=\\max\\left\\\{i\\in\[1,m\]:B\(i\-1,m,\\alpha\)\\leq\\alpha\\right\\\}\.If line[1](https://arxiv.org/html/2606.11235#algorithm1)ofFewRSis replaced by “δ^\(𝒮\)←r\(𝒮,ℓ\)\\hat\{\\delta\}\(\\mathcal\{S\}\)\\leftarrow r\(\\mathcal\{S\},\\ell\)”, then the output ofFewRShas FWER≤α\\leq\\alpha\. ###### Proof\. First, note that the set\{i∈\[1,m\]:B\(i−1,m,α\)≤α\}\\left\\\{i\\in\[1,m\]:B\(i\-1,m,\\alpha\)\\leq\\alpha\\right\\\}is always not empty sinceB\(0,m,α\)≤αB\(0,m,\\alpha\)\\leq\\alphagiven the choice ofmm\. Recall that, from Lemma[1](https://arxiv.org/html/2606.11235#S4.Thmtheorem1), it holds Pr𝒟0∼γ0\(maxA∈𝒜\{A\(𝒟0\)\}≥δ\(α\)\)≥α\.\\Pr\_\{\\mathcal\{D\}^\{0\}\\sim\\gamma\_\{0\}\}\\Bigl\(\\max\_\{A\\in\\mathcal\{A\}\}\\\{A\(\\mathcal\{D\}^\{0\}\)\\\}\\geq\\delta\(\\alpha\)\\Bigr\)\\geq\\alpha\.Fori∈\[1,m\]i\\in\[1,m\], define the i\.i\.d\. Bernoulli random variablesYiY\_\{i\}with𝔼\[Yi\]=α\\mathop\{\\mathbb\{E\}\}\[Y\_\{i\}\]=\\alpha\. We have Pr\(r\(𝒮,ℓ\)<δ\(α\)\)=Pr\(∑i=1m𝟙\[di≥δ\(α\)\]<ℓ\)\\displaystyle\\Pr\\Bigl\(r\(\\mathcal\{S\},\\ell\)<\\delta\(\\alpha\)\\Bigr\)=\\Pr\\Bigl\(\\sum\_\{i=1\}^\{m\}\\mathds\{1\}\\left\[d\_\{i\}\\geq\\delta\(\\alpha\)\\right\]<\\ell\\Bigr\)≤Pr\(∑i=1mYi<ℓ\)=∑j=0ℓ−1\(mj\)αj\(1−α\)m−j=B\(ℓ−1,m,α\)≤α,\\displaystyle\\leq\\Pr\\Bigl\(\\sum\_\{i=1\}^\{m\}Y\_\{i\}<\\ell\\Bigr\)=\\sum\_\{j=0\}^\{\\ell\-1\}\\binom\{m\}\{j\}\\alpha^\{j\}\(1\-\\alpha\)^\{m\-j\}=B\(\\ell\-1,m,\\alpha\)\\leq\\alpha,where the first inequality follows from observing that𝟙\[di≥δ\(α\)\]\\mathds\{1\}\\left\[d\_\{i\}\\geq\\delta\(\\alpha\)\\right\]is a Bernoulli random variable with𝔼\[𝟙\[di≥δ\(α\)\]\]≥𝔼\[Yi\]=α\\mathop\{\\mathbb\{E\}\}\[\\mathds\{1\}\\left\[d\_\{i\}\\geq\\delta\(\\alpha\)\\right\]\]\\geq\\mathop\{\\mathbb\{E\}\}\[Y\_\{i\}\]=\\alpha, obtaining the statement\. ∎ The following result shows how to bound the quantileδ\(α\)\\delta\(\\alpha\)by the estimater\(𝒮,ℓ\)r\(\\mathcal\{S\},\\ell\), while simultaneously guaranteeing thatr\(𝒮,ℓ\)r\(\\mathcal\{S\},\\ell\)does not exceedδ\(α/k\)\\delta\(\\alpha/k\)for anyk\>1k\>1\. ###### Lemma 2\. Givenk\>1k\>1, letm,ℓ≥1m,\\ell\\geq 1, such that B\(ℓ−1,m,α\)≤α/2,B\(ℓ−1,m,α/k\)≥1−α/2\.\\displaystyle B\(\\ell\-1,m,\\alpha\)\\leq\\alpha/2,\\text\{\\hskip 8\.5359pt\}B\(\\ell\-1,m,\\alpha/k\)\\geq 1\-\\alpha/2\.Then, it holds Pr\(δ\(α\)≤r\(𝒮,ℓ\)≤δ\(α/k\)\)≥1−α\.\\displaystyle\\Pr\(\\delta\(\\alpha\)\\leq r\(\\mathcal\{S\},\\ell\)\\leq\\delta\(\\alpha/k\)\)\\geq 1\-\\alpha\. ###### Proof\. Fori∈\[1,m\]i\\in\[1,m\], defineYi,XiY\_\{i\},X\_\{i\}as mutually independent Bernoulli random variables with𝔼\[Yi\]=α\\mathop\{\\mathbb\{E\}\}\[Y\_\{i\}\]=\\alphaand𝔼\[Xi\]=α/k\\mathop\{\\mathbb\{E\}\}\[X\_\{i\}\]=\\alpha/k\. Recall thatPr\(di≥δ\(α\)\)≥α\\Pr\(d\_\{i\}\\geq\\delta\(\\alpha\)\)\\geq\\alphaand thatPr\(di\>δ\(α/k\)\)≤α/k\\Pr\(d\_\{i\}\>\\delta\(\\alpha/k\)\)\\leq\\alpha/k\. It holds Pr\(r\(𝒮,ℓ\)<δ\(α\)∨r\(𝒮,ℓ\)\>δ\(α/k\)\)\\displaystyle\\Pr\(r\(\\mathcal\{S\},\\ell\)<\\delta\(\\alpha\)\\vee r\(\\mathcal\{S\},\\ell\)\>\\delta\(\\alpha/k\)\)=Pr\(r\(𝒮,ℓ\)<δ\(α\)\)\+Pr\(r\(𝒮,ℓ\)\>δ\(α/k\)\)\\displaystyle=\\Pr\(r\(\\mathcal\{S\},\\ell\)<\\delta\(\\alpha\)\)\+\\Pr\(r\(\\mathcal\{S\},\\ell\)\>\\delta\(\\alpha/k\)\)=Pr\(∑i=1m𝟙\[di≥δ\(α\)\]<ℓ\)\+Pr\(∑i=1m𝟙\[di\>δ\(α/k\)\]≥ℓ\)\\displaystyle=\\Pr\\Bigl\(\\sum\_\{i=1\}^\{m\}\\mathds\{1\}\\left\[d\_\{i\}\\geq\\delta\(\\alpha\)\\right\]<\\ell\\Bigr\)\+\\Pr\\Bigl\(\\sum\_\{i=1\}^\{m\}\\mathds\{1\}\\left\[d\_\{i\}\>\\delta\(\\alpha/k\)\\right\]\\geq\\ell\\Bigr\)≤Pr\(∑i=1mYi<ℓ\)\+Pr\(∑i=1mXi≥ℓ\)\\displaystyle\\leq\\Pr\(\\sum\_\{i=1\}^\{m\}Y\_\{i\}<\\ell\)\+\\Pr\(\\sum\_\{i=1\}^\{m\}X\_\{i\}\\geq\\ell\)=B\(ℓ−1,m,α\)\+1−B\(ℓ−1,m,α/k\)≤α,\\displaystyle=B\(\\ell\-1,m,\\alpha\)\+1\-B\(\\ell\-1,m,\\alpha/k\)\\leq\\alpha,obtaining the statement\. ∎ We remark that, for any choice ofk\>1k\>1, there always exist a sufficiently large value ofmm\(and thus, a valid choice ofℓ\\ell\), such that the conditions in the statement of Lemma[2](https://arxiv.org/html/2606.11235#A1.Thmtheorem2)holds\. Table 1\.Statistics of the datasets considered in our experiments\.nnis the number of transactions,ddis the number of items,\|𝒟\|\|\\mathcal\{D\}\|is the sum of the transaction lengths,ℓ\\ellis the average transaction length,θ\\thetais the frequency threshold used to mine frequent itemsets \(set similarly to previous works\(Pretiet al\.,[2024](https://arxiv.org/html/2606.11235#bib.bib263)\)\)\.Table 2\.Statistics of the graphs considered in our experiments\.\|V\|\|V\|is the number of vertices,\|E\|\|E\|is the number of edges,\|ℒ\|\|\\mathcal\{L\}\|is the number of distinct nodes’ colors\.Table 3\.Statistics of the labelled datasets considered in our experiments\.nnis the number of transactions,ddis the number of features \(categorical/continuous\),μ\(𝒟\)\\mu\(\\mathcal\{D\}\)is the fraction of transactions with target equal to11,zzis the maximum number of conjunction terms in the languageℒ\\mathcal\{L\}\. ### A\.2\.Normalization We describe two types of normalizations based on the holdout set of resampled datasets𝒮′\\mathcal\{S\}^\{\\prime\}, withm′=\|𝒮′\|m^\{\\prime\}=\|\\mathcal\{S\}^\{\\prime\}\|\. The first type of normalization, that we use in our experiments, is based on the empirical standard score forAA; defineμ~A\(𝒮′\)\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)andσ~A\(𝒮′\)\\tilde\{\\sigma\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)as μ~A\(𝒮′\)\\displaystyle\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)=1m′∑𝒟′∈𝒮′A\(𝒟′\),\\displaystyle=\\frac\{1\}\{m^\{\\prime\}\}\\sum\_\{\\mathcal\{D\}^\{\\prime\}\\in\\mathcal\{S\}^\{\\prime\}\}A\(\\mathcal\{D\}^\{\\prime\}\),σ~A2\(𝒮′\)\\displaystyle\\tilde\{\\sigma\}^\{2\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)=1m′∑𝒟′∈𝒮′\(A\(𝒟′\)−μ~A\(𝒮′\)\)2,\\displaystyle=\\frac\{1\}\{m^\{\\prime\}\}\\sum\_\{\\mathcal\{D\}^\{\\prime\}\\in\\mathcal\{S\}^\{\\prime\}\}\\left\(A\(\\mathcal\{D\}^\{\\prime\}\)\-\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)\\right\)^\{2\},as, respectively, empirical estimates of the expectation and variance ofAA\. The normalized statisticA⊙\(𝒟\)A^\{\\odot\}\(\\mathcal\{D\}\)is defined as: \(1\)A⊙\(𝒟\)=A\(𝒟\)−μ~A\(𝒮′\)σ~A2\(𝒮′\)\+τ,\\displaystyle A^\{\\odot\}\(\\mathcal\{D\}\)=\\frac\{A\(\\mathcal\{D\}\)\-\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)\}\{\\sqrt\{\\tilde\{\\sigma\}^\{2\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)\}\+\\tau\},whereτ\\tauis a small constant \(e\.g\.,τ=0\.01\\tau=0\.01\)\. The second type uses an estimate of the range ofAA: A⊙\(𝒟\)=A\(𝒟\)−μ~A\(𝒮′\)max𝒟′∈𝒮′A\(𝒟′\)−min𝒟′∈𝒮′A\(𝒟′\)\+τ\.\\displaystyle A^\{\\odot\}\(\\mathcal\{D\}\)=\\frac\{A\(\\mathcal\{D\}\)\-\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)\}\{\\max\_\{\\mathcal\{D\}^\{\\prime\}\\in\\mathcal\{S\}^\{\\prime\}\}A\(\\mathcal\{D\}^\{\\prime\}\)\-\\min\_\{\\mathcal\{D\}^\{\\prime\}\\in\\mathcal\{S\}^\{\\prime\}\}A\(\\mathcal\{D\}^\{\\prime\}\)\+\\tau\}\. ### A\.3\.Effect of normalization In this experiment we evaluated the accuracy of the normalization employed in our experiments\. Note that this step is needed by bothFewRSand WY only when the number of analyses is\>1\>1\. To do so, we vary the sizem′m^\{\\prime\}of the set𝒮′\\mathcal\{S\}^\{\\prime\}of independently generated resamples, showing the behavior of the estimatesμ~A\(𝒮′\)\\tilde\{\\mu\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\)andσ~A2\(𝒮′\)\\tilde\{\\sigma\}^\{2\}\_\{A\}\(\\mathcal\{S\}^\{\\prime\}\), that are, respectively, empirical estimates of the expectation and variance of the statisticAA\(see Section[4\.1](https://arxiv.org/html/2606.11235#S4.SS1)and Appendix[A\.2](https://arxiv.org/html/2606.11235#A1.SS2)\)\. We consider the labelled network experiment, in particular the test statisticMvM\_\{v\}used for evaluating the fraction of same\-color neighbors of the highest degree nodevv, under the Polaris model\. In Figure[21](https://arxiv.org/html/2606.11235#A1.F21), we show the results of this experiment\. Each plot shows the behavior ofμ~\\tilde\{\\mu\}andσ~2\\tilde\{\\sigma\}^\{2\}form′∈\[5,25\]m^\{\\prime\}\\in\[5,25\], displaying the mean and std over1010runs\. Overall, we observe that both empirical estimates were very stable; therefore, our choice ofm′=10m^\{\\prime\}=10is a good trade\-off between accuracy of the estimates and computational overhead\. ### A\.4\.Comparison with FSR This section provides additional details regarding the experimental comparison ofFewRSwith FSR\. We are given a dataset𝒟=\{\(t1,ℓ1\),\(t2,ℓ2\),…,\(tn,ℓn\)\}\\mathcal\{D\}=\\\{\(t\_\{1\},\\ell\_\{1\}\),\(t\_\{2\},\\ell\_\{2\}\),\\dots,\(t\_\{n\},\\ell\_\{n\}\)\\\}withnnlabelled transactions, where eachtit\_\{i\}is a realization ofddfeatures \(each either categorical or continuous\), andℓi∈\{0,1\}\\ell\_\{i\}\\in\\\{0,1\\\}is a binary label\. A patternPPis defined as \(at mostzz\) conjunctions of logical conditions, where each condition is over one the features of the dataset\. We say that a patternPPis observed on a transactiontit\_\{i\}ifP∈tiP\\in t\_\{i\}\. The pattern languageℒ\\mathcal\{L\}contains all patterns that may be observed on a dataset𝒟\\mathcal\{D\}\. For each patternP∈ℒP\\in\\mathcal\{L\}we define the test statisticAPA\_\{P\}, that corresponds to the11\-quality of the subgroupPP, which quantifies its association with the target variable\(Pellegrina and Vandin,[2024](https://arxiv.org/html/2606.11235#bib.bib11)\)\. Letμ\(𝒟\)=1n∑i=1nℓi\\mu\(\\mathcal\{D\}\)=\\frac\{1\}\{n\}\\sum\_\{i=1\}^\{n\}\\ell\_\{i\}be the mean target value\. The test statisticAP\(𝒟\)A\_\{P\}\(\\mathcal\{D\}\)is defined as AP\(𝒟\)=1n∑i=1n𝟙\[P∈ti\]\(ℓi−μ\(𝒟\)\)\.\\displaystyle A\_\{P\}\(\\mathcal\{D\}\)=\\frac\{1\}\{n\}\\sum\_\{i=1\}^\{n\}\\mathds\{1\}\\left\[P\\in t\_\{i\}\\right\]\(\\ell\_\{i\}\-\\mu\(\\mathcal\{D\}\)\)\.The set of test statistics is𝒜=\{AP:P∈ℒ\}\\mathcal\{A\}=\\\{A\_\{P\}:P\\in\\mathcal\{L\}\\\}\. A resampled dataset𝒟0\\mathcal\{D\}^\{0\}is drawn from the null distributionγ0\\gamma\_\{0\}by randomly permuting the labels of𝒟\\mathcal\{D\}, i\.e\.,𝒟0=\{\(t1,ℓσ1\),\(t2,ℓσ2\),…,\(tn,ℓσn\)\}\\mathcal\{D\}^\{0\}=\\\{\(t\_\{1\},\\ell\_\{\\sigma\_\{1\}\}\),\(t\_\{2\},\\ell\_\{\\sigma\_\{2\}\}\),\\dots,\(t\_\{n\},\\ell\_\{\\sigma\_\{n\}\}\)\\\}, whereσ1,σ2,…,σn\\sigma\_\{1\},\\sigma\_\{2\},\\dots,\\sigma\_\{n\}is a uniform random permutation of\[1,n\]\[1,n\]\. In our experiment we focus on the setT\(k,𝒟\)T\(k,\\mathcal\{D\}\)ofkkpatterns with highest test statistic on𝒟\\mathcal\{D\}, withk=105k=10^\{5\}\. We report as significant all patternsP∈T\(k,𝒟\)P\\in T\(k,\\mathcal\{D\}\)withAP\(𝒟\)\>δ^\(𝒮\)A\_\{P\}\(\\mathcal\{D\}\)\>\\hat\{\\delta\}\(\\mathcal\{S\}\)forFewRS, and all patternsP∈T\(k,𝒟\)P\\in T\(k,\\mathcal\{D\}\)withAP\(𝒟\)\>εA\_\{P\}\(\\mathcal\{D\}\)\>\\varepsilonfor FSR\.      Figure 5\.Statistical power \(empirical FWER\) of resampling\-based methods \(WY andFewRS\) and Bonferroni w\.r\.t\. the correlationρ\\rhoand the number of testsdd\.Statistical power \(empirical FWER\) of resampling\-based methods \(WY and FewRS\) and Bonferroni w\.r\.t\. the correlation $\\rho$ and the number of tests $d$\.     Figure 6\.Left: running time of resampling\-based methods \(WY andFewRS\) and Bonferroni forρ=0\.5\\rho=0\.5\(other values are very similar\) vs\.dd\. Other plots: trade\-off between running time and statistical power between all methods for different values ofρ\\rho\.Left: running time of resampling\-based methods \(WY and FewRS\) and Bonferroni for $\\rho=0\.5$ \(other values are very similar\) vs\. $d$\. Other plots: trade\-off between running time and statistical power between all methods for different values of $\\rho$\.     Figure 7\.Statistical power \(empirical FWER\) of resampling\-based methods \(WY andFewRS\) and Bonferroni w\.r\.t\. the correlationρ\\rhoand the FWER boundα\\alpha\.Statistical power \(empirical FWER\) of resampling\-based methods \(WY and FewRS\) and Bonferroni w\.r\.t\. the correlation $\\rho$ and the FWER bound $\\alpha$\.     Figure 8\.Left: running time of resampling\-based methods \(WY andFewRS\) and Bonferroni forρ=0\.5\\rho=0\.5\(other values are very similar\) vs\.α\\alpha\. WY uses10310^\{3\}resamples forα=0\.1\\alpha=0\.1, and10410^\{4\}for other values\. Other plots: trade\-off between running time and statistical power between all methods for different values ofα\\alpha\.Left: running time of resampling\-based methods \(WY and FewRS\) and Bonferroni for $\\rho=0\.5$ \(other values are very similar\) vs\. $\\alpha$\. WY uses $10^\{3\}$ resamples for $\\alpha=0\.1$, and $10^\{4\}$ for other values\. Other plots: trade\-off between running time and statistical power between all methods for different values of $\\alpha$\. Figure 9\.Left: results for FI\. Right: results for labelled networks\.Left: results for FI\. Right: results for labelled networks\.    Figure 10\.Left: time to generate one sample from the GMMT and ALICE models\. Right: time to generate and analyze all samples withFewRSand WY under ALICE model\. The label \(p\) denotes parallelized approaches with3232cores\.Left: time to generate one sample from the GMMT and ALICE models\. Right: time to generate and analyze all samples with FewRS and WY under ALICE model\. The label \(p\) denotes parallelized approaches with $32$ cores\.   Figure 11\.Running times for the CM model\. Left: time to generate one sample\. Right: time to generate and analyze all samples withFewRSand WY\. The label \(p\) denotes parallelized approaches with up to3232cores\.Running times for the CM model\. Left: time to generate one sample\. Right: time to generate and analyze all samples with FewRS and WY\. The label \(p\) denotes parallelized approaches with up to $32$ cores\.   Figure 12\.Running times for the Polaris model\. Left: time to generate one sample\. Right: time to generate and analyze all samples withFewRSand WY\. The label \(p\) denotes parallelized approaches with3232cores, apart from Com\-LJ and Com\-Orkut, where only1515and55parallel executions can be performed due to high memory usage\.Running times for the Polaris model\. Left: time to generate one sample\. Right: time to generate and analyze all samples with FewRS and WY\. The label \(p\) denotes parallelized approaches with $32$ cores, apart from Com\-LJ and Com\-Orkut, where only $15$ and $5$ parallel executions can be performed due to high memory usage\.               Figure 13\.Testing the significance of the number of FI for different cardinalities under the GMMT model\.Testing the significance of the number of FI for different cardinalities under the GMMT model\.               Figure 14\.Testing the significance of the number of FI for different cardinalities under the GMMT model\.Testing the significance of the number of FI for different cardinalities under the GMMT model\.               Figure 15\.Testing the significance of the number of FI for different cardinalities under the ALICE model\.Testing the significance of the number of FI for different cardinalities under the ALICE model\.               Figure 16\.Testing the significance of the number of FI for different cardinalities under the ALICE model\.Testing the significance of the number of FI for different cardinalities under the ALICE model\.                Figure 17\.Testing the significance ofMv\(G\)M\_\{v\}\(G\)of the1010nodes with highest degree under the CM model\.Testing the significance of $M\_\{v\}\(G\)$ of the $10$ nodes with highest degree under the CM model\.                Figure 18\.Testing the significance ofMv\(G\)M\_\{v\}\(G\)of the1010nodes with highest degree under the CM model\.Testing the significance of $M\_\{v\}\(G\)$ of the $10$ nodes with highest degree under the CM model\.                Figure 19\.Testing the significance ofMv\(G\)M\_\{v\}\(G\)of the1010nodes with highest degree under the Polaris model\.Testing the significance of $M\_\{v\}\(G\)$ of the $10$ nodes with highest degree under the Polaris model\.                Figure 20\.Testing the significance ofMv\(G\)M\_\{v\}\(G\)of the1010nodes with highest degree under the Polaris model\.Testing the significance of $M\_\{v\}\(G\)$ of the $10$ nodes with highest degree under the Polaris model\.                Figure 21\.Normalization parameters of the functionMvM\_\{v\}for the nodevvwith highest degree, under the Polaris model\.Normalization parameters of the function $M\_\{v\}$ for the node $v$ with highest degree, under the Polaris model\.    Figure 22\.Comparison ofFewRSwith FSR in terms of significance threshold, number of significant patterns reported in output, and running time\.Comparison of FewRS with FSR in terms of significance threshold, number of significant patterns reported in output, and running time\.
Similar Articles
FederatedRSF : Federated Random Survival Forests for Partially Overlapping Medical Data
This paper presents FederatedRSF, a Python package for federated random survival forests that handles partially overlapping medical data across institutions without sharing raw data, and demonstrates comparable performance to centralized training on breast cancer data.
FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.
A Resampling-Based Framework for Network Structure Learning in High-Dimensional Data
RSNet is an open-source R package that provides a resampling-based framework for robust and interpretable network inference in high-dimensional data, supporting partial correlation networks and conditional Gaussian Bayesian networks with graphlet-based topology analysis.
Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
This paper investigates the 'small-vs-large gap', where training on fewer samples with more repetitions can lead to faster learning and compute savings compared to using larger datasets, attributing the speedup to layer-wise growth enabled by sampling biases. The findings suggest that smaller datasets with repetition can be proactively leveraged as favorable inductive biases, particularly in reasoning tasks.
RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings
RADS uses reinforcement learning to pick the most informative samples for few-shot fine-tuning, boosting transfer-learning accuracy on low-resource, highly imbalanced clinical datasets.