Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

arXiv cs.LG 06/17/26, 04:00 AM Papers
Summary
This paper presents a generalized adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models to improve uncertainty quantification in multi-output engineering structures, balancing variance contribution and spatial exploration.
arXiv:2606.17233v1 Announce Type: new Abstract: In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g. finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized for vector valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.
Original Article
View Cached Full Text
Cached at: 06/17/26, 05:37 AM
# Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning
Source: [https://arxiv.org/html/2606.17233](https://arxiv.org/html/2606.17233)
###### Abstract

In many engineering applications, a single high\-fidelity model produces multiple quantities of interest \(QoIs\) under the same input parameters, e\.g\. finite element models of complex physical systems\. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses\. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design \(ED\)\. However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables\. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost\. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency\. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized for vector valued QoIs\. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance\-based exploration of the input space and exploitation of aggregated variance information across all outputs\. Its performance is compared with non\-sequential Latin Hypercube Sampling through several numerical examples from engineering problems\. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second\-order statistics\.

###### keywords:

Polynomial Chaos Expansion , Uncertainty Quantification, Adaptive Sampling , Sequential Sampling

††journal:Engineering Structures\{highlights\}

Generalization and application of multi\-output sequential adaptive sampling are presented\.

A normalized variance aggregation is introduced to evaluate multi\-output PCE models\.

The utilized criterion balances variance contribution and spatial exploration of the samples for multi\-output problems\.

Numerical results of engineering examples demonstrate superior accuracy and stability of the active learning\.

## 1Introduction

Uncertainty is an inseparable part of all engineering systems and can affect structural behavior in unpredictable ways\. This phenomenon arises from numerous factors such as environmental conditions and construction errors, which influence the applied loads, material properties, and the dimensions of structural members\. Therefore, by determining the precise value of these uncertainties, an accurate estimation of structural performance and behavior over their lifetime can be obtained\. This estimation includes the safety level and the most probable failure modes, allowing for necessary actions to be taken on time when required\. The main goal is thus to determine the amount, understand the impact, describe, and propagate the uncertainties present in engineering structures using mathematical models\. In the real world, uncertainty quantification \(UQ\) for engineering structures requires complex simulations to model structural responses using finite element based models, which usually have a very high computational cost\. As a result, using numerical and traditional Monte Carlo \(MC\) simulation methods is often not feasible due to very high computational burden\. Therefore, the use of surrogate models for UQ in engineering structures has attracted significant attention\.

Moreover, in structural and mechanical analysis, a high\-fidelity numerical simulation often provides not one but several quantities of interest simultaneously\. This multi\-output nature becomes particularly challenging in uncertainty\-related analyses, where repeated model evaluations are required and the associated computational cost may become prohibitive\. Surrogate modeling therefore provides an effective way to alleviate this burden while maintaining acceptable prediction accuracy\. Among the available surrogate modeling techniques, polynomial chaos expansion \(PCE\)\[Wiener\_PCE\]has become a well\-established approach for UQ\[LuthenReview\], sensitivity analysis\[SUDRET,CRESTAUX20091161\], reliability analysis\[Zhou2020,Marelli2018\], and recently scientific machine learning\[NOVAK2024112926,SHARMA2024117314\]or operator learning\[SHARMA2026118796\]\. Owing to its natural mathematical structure, PCE provides not only an efficient approximation of random system responses, but also direct access to statistical quantities\[NOVAK2022106808\], such as moments, and sensitivity indicators through the expansion coefficients\[PCEMoments\]\. This property makes non\-intrusive PCE particularly attractive for applications with expensive numerical models, where the total number of admissible model evaluations is strongly limited, e\. g\. complex non\-linear finite element model of a prestressed concrete bridge presented by Slowik et al\.\[ApplicationPCE\_bridge\]\.

However, for non\-intrusive PCE constructed by regression, the quality of the surrogate highly depends on the ED\[PCEOLSreview\], i\.e\., quality of the training data set\. Various work has been devoted to the design of sampling strategies for surrogate construction\. Classical static sampling designs include MC sampling\[McKayConovBeck:three:1979\], Latin hypercube sampling \(LHS\)\[Stein:87:LargeSampleLHS\], and quasi\-Monte Carlo methods\[Nieder:RandNumGen\_AND\_QMC\_1992\]based on low\-discrepancy sequences\. In addition, from a geometric and distributional perspective, space\-filling principles have led to a variety of sampling criteria, such as miniMax, Maximin\[JohMooYlv:MixiMinMiniMax:JSPI:90\], generalized versions of distance\-based criteria\[MorMit:JStatPLanInf:95,VorEli:Technometrics:20\], discrepancy\-based criteria\[fang2001wrap\], and entropy\-based criteria\[Shewry1987MaximumES\]\. From a view of regression, several optimality criteria, including D\-optimality, A\-optimality, E\-optimality, and I\-optimality\[AtkinsonDonev1992,Duarte2025\], have also been employed to improve the stability and accuracy of surrogate model by optimizing various characteristics of information matrix\. These approaches are attractive because of their ease of implementation\. However, since all design points are selected a priori at once, these strategies cannot exploit the information extracted from existing evaluated samples\. Moreover, their efficiency may deteriorate when the response becomes complex or non\-uniform, especially in multi\-output settings where different outputs may show different levels of complexity across the input space\. In order to make full use of the information provided by the existing surrogate and available responses, adaptive sequential strategies, often referenced as active learning techniques, have been investigated in surrogate modeling and reliability analysis\[Zhou2020\]\. In these methods, the ED is iteratively refined by selecting new points according to the information provided by the continuously updated surrogate model\. The criteria are often based on the approximation error, statistical indicator, or local geometry of the input space\. For PCE, adaptive sequential design is typically guided by criteria tailored to the structure of the expansion\[AdaptiveCoherence,SequentialPCEThapa\]\. In practice, the criteria often rely on error indicators, basis adaptivity, or variance\-related measures that exploit the information contained in the PCE coefficients\[PCESoptimSeq,NOVAK2021114105\]\. Within this context, the variance\-based adaptive sequential sampling strategy proposed by Novák et al\. represents an important development for PCE\-based surrogate construction, where the associated criterion combines a variance\-related term with a distance\-based term to achieve a balance between exploitation and exploration\[NOVAK2021114105\]\. This balance has proved particularly appealing in engineering applications\[Theta\_application1,ThetaApplication\_2\]because it avoids both excessive concentration of samples and poor coverage in parts of the input space, while favoring areas that are expected to contribute significantly to the total response variance\. As a result, the criterion has shown promising performance for the adaptive construction of PCE surrogates\.

Despite these merits, the original criterion is formulated for single\-output problems\. This restriction may be limiting in many engineering applications, where one simulation simultaneously produces several outputs, and all of them must be approximated under a limited and shared computational budget\. The straightforward single\-output criterion is generally unsatisfactory for the following reasons\. First, using different ED for different outputs may make it difficult to maintain consistency across the surrogate construction, which is undesirable when multiple responses are to be approximated jointly in a unified multi\-output setting\. Second, the repeated sampling procedure for different training sets may become computationally inefficient when the underlying model is expensive\. Third, different outputs may exhibit different levels of variability, smoothness, and approximation difficulty, so that a sample selected as informative for one output may be much less informative for other outputs\. Therefore, the extension from single\-output to multi\-output criterion is not merely a formal generalization but a necessary step toward an efficient sampling strategy for practical multi\-output problems\.

Motivated by this limitation, this paper proposes a multi\-output criterion for the adaptive sequential sampling for non\-intrusive PCE\. The proposed approach aggregates variance and distance information from multiple outputs into a unified criterion, so that each additional sample is selected to maximize the overall improvement of the surrogate model\. The proposed method is verified through several numerical examples, including engineering problems, and is compared to LHS, as a standard sampling method for practical applications, in terms of approximation accuracy, statistical estimation performance, and model stability\. The results show that the proposed method provides an effective and efficient sampling strategy for multi\-output PCE surrogate models\.

## 2Non\-intrusive Polynomial Chaos Expansion

LetY=g\(𝑿\)Y=g\\left\(\\bm\{X\}\\right\)denote a mathematical model, where𝑿=\(X1,X2,…,XM\)\\bm\{X\}=\(X\_\{1\},X\_\{2\},\\dots,X\_\{M\}\)represents a set ofMMindependent random input variables\. In the framework of standard PCE, the input variables are first mapped to a standardized random germ=\(,1,2…,\)M\{\\bm\{\\xi\}\}=\(\{\}\_\{1\},\{\}\_\{2\},\.\.\.,\{\}\_\{M\}\), which enables construction of orthogonal polynomial bases according to the Wiener–Askey scheme\[Askey\]\. The orthonormal condition is defined as

⟨,j⟩k=∫\(\)j\(\)kp\(\)d=0,j≠k\.\\langle\{\}\_\{j\},\{\}\_\{k\}\\rangle=\\int\{\}\_\{j\}\(\\xi\)\{\}\_\{k\}\(\\xi\)p\(\\xi\)d\\xi=0,\\quad j\\neq k\.\(1\)where⟨⋅,⋅⟩\\langle\\cdot,\\cdot\\rangledenotes the inner product,jandkare orthogonal polynomial basis functions, andp\(\)p\(\\xi\)is the probability density function of the germ \. The corresponding multivariate polynomial basis function\(\)\\Psi\(\{\\bm\{\\xi\}\}\)is constructed as a tensor product of univariate orthonormal polynomials

\(\)=∏i=1M\(\)ii,\{\}\_\{\{\\bm\{\\alpha\}\}\}\\left\(\{\\bm\{\\xi\}\}\\right\)=\\prod\_\{i=1\}^\{M\}\{\}\_\{\{\}\_\{i\}\}\\left\(\{\}\_\{i\}\\right\),\(2\)where∈NM\{\{\\bm\{\\alpha\}\}\}\\in\\mdmathbb\{N\}^\{M\}is a multi\-index indicating the polynomial degree associated with each dimensioni\. Accordingly, the responseYYcan be further represented using these multivariate basis functions as\[ghanem2003stochastic\]

Y=g\(𝑿\)=∑∈NM\(\),Y=g\\left\(\\bm\{X\}\\right\)=\\sum\_\{\{\\bm\{\\alpha\}\}\\in\\mdmathbb\{N\}^\{M\}\}\{\}\_\{\{\\bm\{\\alpha\}\}\}\{\}\_\{\{\\bm\{\\alpha\}\}\}\\left\(\{\\bm\{\\xi\}\}\\right\),\(3\)whereare the deterministic coefficients of multivariate polynomial basis\. These coefficients can be estimated by minimizing the approximation error using regression techniques such as ordinary least squares \(OLS\) or least angle regression \(LAR\)\.

For practical computation, the PCE model must be truncated to a finite number of terms\. A commonly used truncation strategy is the total\-degree truncation, where only multivariate polynomial terms with total degree not exceeding a prescribed orderppwill be retained\. Then, the resulting number of basis terms is given by

P=\(M\+p\)\!M\!p\!\.P=\\frac\{\(M\+p\)\!\}\{M\!\\,p\!\}\.\(4\)AsMMorppincreases, the number of basis termsPPgrows rapidly, which leads to a significant increase in computational cost \- commonly known phenomenon referenced as a curse of dimensionality\. Therefore, an appropriate application of the basis selection strategy becomes crucial\. The OLS is commonly used when the number of basis functions is moderate, LAR and other advanced basis selection algorithms\[OMP,BCS\]are often adopted for high\-dimensional sparse problems where the number of candidate basis functions becomes far larger than necessary\. In the case of OLS, the deterministic coefficients can be easily estimated based on the ED, expressed as

=\(T\)−1T𝐲,\{\\bm\{\\beta\}\}=\\left\(\{\\bm\{\\Psi\}\}^\{T\}\{\\bm\{\\Psi\}\}\\right\)^\{\-1\}\\ \{\\bm\{\\Psi\}\}^\{T\}\\mathbf\{y\},\(5\)where𝐲\\mathbf\{y\}represents the output of the training set, and is the data matrix constructed by evaluating the basis functions at the input samples in the ED\.

Once a PCE surrogate model is established, its predictive capability must be quantified\. For this purpose, the leave\-one\-out \(LOO\) cross\-validation errorQ2Q^\{2\}is commonly employed, which is less prone to overfitting compared to the coefficient of determinationR2R^\{2\}\. The LOO error is obtained by evaluating the prediction error when each sample in the ED is excluded once from the model construction\. For each exclusion, a surrogate model is built using the remaining samples and the prediction error is computed at the omitted point\. Repeating this procedure for all samples provides an estimate of the overall prediction error\. Although a direct implementation of this procedure would require a large computational cost, the LOO errorQ2Q^\{2\}can also be computed analytically from a single PCE model as follows:

Q2=1−1nsim∑i=1nsim\[g\(x\(i\)\)−gPCE\(x\(i\)\)1−hi\]22Y,EDQ^\{2\}=1\-\\frac\{\\frac\{1\}\{n\_\{\\mathrm\{sim\}\}\}\\sum\_\{i=1\}^\{n\_\{\\mathrm\{sim\}\}\}\\left\[\\frac\{g\\\!\\left\(x^\{\(i\)\}\\right\)\-g^\{\\mathrm\{PCE\}\}\\\!\\left\(x^\{\(i\)\}\\right\)\}\{1\-h\_\{i\}\}\\right\]^\{2\}\}\{\{\}\_\{Y,\\mathrm\{ED\}\}^\{2\}\}\(6\)where2Y,ED\{\}\_\{Y,\\mathrm\{ED\}\}^\{2\}is the variance of the model responses evaluated at the ED points using the original model andhih\_\{i\}denotes theiith diagonal term of the matrix𝐇=\(T\)−1T\.\\mathbf\{H\}=\\mathbf\{\\Psi\}\\left\(\\mathbf\{\\Psi\}^\{T\}\\mathbf\{\\Psi\}\\right\)^\{\-1\}\\mathbf\{\\Psi\}^\{T\}\.

Moreover, the specific structure of the PCE facilitates the calculation of the statistical moments of the response\. The rawmmth order statistical moment can be rewritten in terms of the PCE as follows:

⟨Ym⟩\\displaystyle\\langle Y^\{m\}\\rangle=∫\[g\(𝐗\)\]mp𝐗\(𝐗\)𝑑𝐗\\displaystyle=\\int\[g\(\\mathbf\{X\}\)\]^\{m\}p\_\{\\mathbf\{X\}\}\(\\mathbf\{X\}\)\\,d\\mathbf\{X\}\(7\)=∫\[∑∈NM\(\)\]mp\(\)d\\displaystyle=\\int\\left\[\\sum\_\{\\alpha\\in\\mdmathbb\{N\}^\{M\}\}\(\\bm\{\\xi\}\)\\right\]^\{m\}p\_\{\\bm\{\\xi\}\}\(\\bm\{\\xi\}\)\\,d\\bm\{\\xi\}=∫∑∈1NM⋯∑∈mNM⋯1\(\)m1⋯\(\)mp\(\)d\\displaystyle=\\int\\sum\_\{\{\}\_\{1\}\\in\\mdmathbb\{N\}^\{M\}\}\\cdots\\sum\_\{\{\}\_\{m\}\\in\\mdmathbb\{N\}^\{M\}\}\{\}\_\{\{\}\_\{1\}\}\\cdots\{\}\_\{\{\}\_\{m\}\}\{\}\_\{\{\}\_\{1\}\}\(\\bm\{\\xi\}\)\\cdots\{\}\_\{\{\}\_\{m\}\}\(\\bm\{\\xi\}\)p\_\{\\bm\{\\xi\}\}\(\\bm\{\\xi\}\)\\,d\\bm\{\\xi\}=∑∈1NM⋯∑∈mNM⋯1∫m\(\)1⋯\(\)mp\(\)d\.\\displaystyle=\\sum\_\{\{\}\_\{1\}\\in\\mdmathbb\{N\}^\{M\}\}\\cdots\\sum\_\{\{\}\_\{m\}\\in\\mdmathbb\{N\}^\{M\}\}\{\}\_\{\{\}\_\{1\}\}\\cdots\{\}\_\{\{\}\_\{m\}\}\\int\{\}\_\{\{\}\_\{1\}\}\(\\bm\{\\xi\}\)\\cdots\{\}\_\{\{\}\_\{m\}\}\(\\bm\{\\xi\}\)p\_\{\\bm\{\\xi\}\}\(\\bm\{\\xi\}\)\\,d\\bm\{\\xi\}\.As an alternative to integrating the original modelg\(𝐗\)g\(\\mathbf\{X\}\), the integration now is performed only on the basis functions, greatly simplifying the computation\. Specifically, due to the orthonormality of the basis functions, the first and second statistical moments can be further simplified as

⟨Y1⟩==Y0\\langle Y^\{1\}\\rangle=\{\}\_\{Y\}=\{\}\_\{0\}\(8\)and

⟨Y2⟩=∑∈𝒜2⟨,⟩\.\\langle Y^\{2\}\\rangle=\\sum\_\{\\alpha\\in\\mathcal\{A\}\}^\{2\}\\langle,\\rangle\.\(9\)Then, the variance of the response=Y2⟨Y2⟩−Y2\{\}\_\{Y\}^\{2\}=\\langle Y^\{2\}\\rangle\-\{\}\_\{Y\}^\{2\}can be readily obtained from these two moments as

=Y2∑∈𝒜≠02\.\{\}\_\{Y\}^\{2\}=\\sum\_\{\\begin\{subarray\}\{c\}\\alpha\\in\\mathcal\{A\}\\\\ \\alpha\\neq 0\\end\{subarray\}\}^\{2\}\.\(10\)Owing to the convenient variance evaluation enabled by the PCE representation, and motivated by the connection between variance\-based measures and the overall variation of a function, such as the variation in the sense of Hardy and Krause\[Koksma:ineq\_1942\], the response variance is adopted as an important component of the subsequent \-criterion sampling strategy\. However, for adaptive sampling, a global variance is not sufficient, since it does not indicate where new samples should be placed\. It is necessary to identify the regions of the input space that contribute most significantly to the total variance\. To this end, the total variance is further localized by interpreting it as an integral of pointwise contributions over the input space\. Based on the PCE representation, a local variance measure can then be defined to quantify the contribution of a given point to the total variance, expressed as

\(\)𝒜2=\[∑∈𝒜≠0\(\)\]2p\(\),\{\}\_\{\\mathcal\{A\}\}^\{2\}\(\\bm\{\\xi\}\)=\\left\[\\sum\_\{\\begin\{subarray\}\{c\}\\alpha\\in\\mathcal\{A\}\\\\ \\alpha\\neq 0\\end\{subarray\}\}\\,\(\\bm\{\\xi\}\)\\right\]^\{2\}\\,p\_\{\\bm\{\\xi\}\}\(\\bm\{\\xi\}\),\(11\)where a large value of\(\)𝒜2\{\}\_\{\\mathcal\{A\}\}^\{2\}\(\\bm\{\\xi\}\)indicates that the corresponding region contributes more to the total variance of the surrogate response and is therefore important for adaptive sampling\.

## 3Adaptive Sequential Sampling

When the coefficients of a non\-intrusive PCE are estimated using regression techniques, the accuracy and stability of the surrogate model depend strongly on the quality of the ED\. In many practical engineering problems, the costs associated with the construction of ED, i\.e\. evaluations of the original numerical model, are often very high\. This issue becomes particularly significant among multi\-output problems, where samples may contribute unevenly to approximation of different outputs, requiring a larger number of training samples to achieve satisfactory accuracy across all outputs\. Therefore, an appropriate sampling strategy is essential for constructing a reliable surrogate model when only a limited number of training samples is available\.

Traditional ED approaches typically rely on static sampling strategies, in which all training points are selected in advance based solely on the input space and the assumed design objective, without exploiting the outputs of previously evaluated samples or structures of existing surrogates\[GARUD201771\]\. Owing to their generality and ease of implementation, static designs have been widely adopted as a common choice for surrogate construction, particularly when prior knowledge of the response is limited\. One of the earliest and most often used choice is MC sampling, which is attractive because of its simplicity and its natural consistency with the prescribed probability distribution of the inputs\. However, purely random sampling for low to mid\-size ED may suffer from poor geometric coverage \(clustering and large empty regions in design space\) and limited sampling efficiency when the available training budget is small\. To mitigate the problem caused by uneven sample distribution, subsequent developments introduce more structured sampling strategies to improve the coverage of the input domain\. A classic representative is LHS, which partitions each input dimension into multiple adjacent, non\-overlapping subintervals to improve the coverage of the input space and reduce the risk that some regions remain insufficiently explored\[Stein:87:LargeSampleLHS\]\. For this reason, LHS becomes one of the most widely used static sampling methods in surrogate modeling, sensitivity analysis, and UQ, especially for high\-dimensional problems\. Beyond marginal stratification, another important design idea is to improve the uniformity of samples in the global domain\. This consideration motivates the use of Quasi Monte Carlo\[Nieder:RandNumGen\_AND\_QMC\_1992\]methods based on low discrepancy sequences, such as Sobol, Halton, and Hammersley sequances, which aim to distribute samples more uniformly over the input domain than purely random sampling\.

Another related but explicitly geometric strategy is space\-filling design, under which the design problem is no longer regarded merely as sampling from the input distribution, but rather as selecting points that fill the admissible region effectively\. This line of development led to various criterial, including miniMax, Maximin\[JohMooYlv:MixiMinMiniMax:JSPI:90\], generalized versions of distance\-based criteria\[MorMit:JStatPLanInf:95,VorEli:Technometrics:20\], discrepancy\-based criteria\[fang2001wrap\], and entropy\-based criteria\[Shewry1987MaximumES\]\.

Importantly, space\-filling is one of the possible design principle for static ED\. When a surrogate model is constructed through regression, another important consideration is the prediction accuracy, based on which the model\-based optimal criteria are proposed\. In this class, the ED is selected to improve the conditioning of the regression, thereby enhancing the accuracy and stability of the surrogate model\. Representative examples include D\-optimality, A\-optimality, and E\-optimality\[Kieferoptimal\], which are defined based on the information of the covariance matrix, whereas I\-optimality\[Ioptimal\]focuses on minimizing the average prediction variance over the design domain\.

Despite their differences, all above strategies share a common limitation: the ED is selected before any response is observed\. Consequently, they cannot exploit the information revealed by the existing evaluated samples\. This issue becomes more critical in multi\-output problems, where different outputs may exhibit different levels of complexity, and a sample that is informative for one output may contribute less to the others\. As a result, static designs may fail to improve all outputs in a balanced manner, and more training samples are required to ensure satisfactory accuracy of the most complicated output\. Therefore, it is desirable to develop an adaptive sequential sampling criterion to fully utilize the information contained in the current training samples\. To this end, a multi\-output criterion is developed from the single\-output criterion originally proposed in\[NOVAK2021114105\]to guide the sequential selection of samples for PCE surrogate\.

We propose an adaptive sequential sampling strategy aimed at improving the estimation of the PCE coefficients for multi\-output problems\. The strategy relies on a candidate pool consisting ofnpooln\_\{\\mathrm\{pool\}\}realizations of the random input vector generated by an arbitrary sampling technique\. The enrichment process is initialized using a surrogate model constructed by a relatively small initial ED\. To account for the spatial distribution of the samples and the variance contribution of the model response, the criterion is extended to the multi\-output case as follows:

\(\(c\)\)=\(∑r=1Nout\(\(c\)\)r2maxi\(c\)∈𝒮c\(i\(c\)\)r2\)\(∑r=1Nout\(\(s\)\)r2maxi\(s\)∈𝒮s\(i\(s\)\)r2\)⏟ave normalized variance aggregation×lc,sM\\Theta\(\{\\bm\{\\xi\}\}^\{\(\\mathrm\{c\}\)\}\)=\\underbrace\{\\sqrt\{\\left\(\\sum\_\{r=1\}^\{N\_\{\\mathrm\{out\}\}\}\\frac\{\{\}\_\{\{r\}\}^\{2\}\(\{\\bm\{\\xi\}\}^\{\(\\mathrm\{c\}\)\}\)\}\{\\displaystyle\\max\_\{\{\\bm\{\\xi\}\}\_\{i\}^\{\(\\mathrm\{c\}\)\}\\in\\mathcal\{S\}\_\{\\mathrm\{c\}\}\}\{\}\_\{r\}^\{2\}\(\{\\bm\{\\xi\}\}\_\{i\}^\{\(\\mathrm\{c\}\)\}\)\}\\right\)\\left\(\\sum\_\{r=1\}^\{N\_\{\\mathrm\{out\}\}\}\\frac\{\{\}\_\{r\}^\{2\}\(\{\\bm\{\\xi\}\}^\{\(\\mathrm\{s\}\)\}\)\}\{\\displaystyle\\max\_\{\{\\bm\{\\xi\}\}\_\{i\}^\{\(\\mathrm\{s\}\)\}\\in\\mathcal\{S\}\_\{\\mathrm\{s\}\}\}\{\}\_\{r\}^\{2\}\(\{\\bm\{\\xi\}\}\_\{i\}^\{\(\\mathrm\{s\}\)\}\)\}\\right\)\}\}\_\{\\text\{ave normalized variance aggregation\}\}\\;\\times l\_\{\\mathrm\{c\},\\mathrm\{s\}\}^\{M\}\(12\)where\(c\)\{\\bm\{\\xi\}\}^\{\(\\mathrm\{c\}\)\}denotes a candidate point from the candidate pool𝒮c\\mathcal\{S\}\_\{\\mathrm\{c\}\}, and\(s\)\{\\bm\{\\xi\}\}^\{\(\\mathrm\{s\}\)\}denotes its nearest neighbor in the current ED\.\(\)r2\{\}\_\{r\}^\{2\}\(\{\\bm\{\\xi\}\}\)denotes the local variance associated with therrth \(r=1,…,Noutr=1,\\dots,N\_\{\\mathrm\{out\}\}\) output evaluated at point \. The normalization factors correspond to the maximum local variance of therrth output component evaluated over the candidate set𝒮c\\mathcal\{S\}\_\{\\mathrm\{c\}\}and over the corresponding nearest\-neighbor set𝒮s\\mathcal\{S\}\_\{\\mathrm\{s\}\}, respectively\. The distancelc,sl\_\{\\mathrm\{c\},\\mathrm\{s\}\}represents the Euclidean distance between the candidate point and its nearest neighbor in the input space, defined as

lc,s=∑i=1M\|−i\(c\)\|2i\(s\)\.l\_\{\\mathrm\{c\},\\mathrm\{s\}\}=\\sqrt\{\\sum\_\{i=1\}^\{M\}\|\{\}\_\{i\}^\{\(\\mathrm\{c\}\)\}\-\{\}\_\{i\}^\{\(\\mathrm\{s\}\)\}\|^\{2\}\}\.\(13\)whereMMis the dimension of the input vector\. Furthermore, we introduce an abbreviated notation by dropping the explicit dependence on the polynomial basis index set𝒜\\mathcal\{A\}, which may change during the adaptive construction of the PCE model as the ED is enriched\. Multiplication of these two independent contributions maintains the optimal balance between exploitation \(denoted as “ave normalized variance aggregation”\) and exploration \(the distance termlc,sl\_\{\\mathrm\{c\},\\mathrm\{s\}\}\)\.

The exploitation component of the proposed criterion is motivated by the idea of refining regions that contribute significantly to the overall variance of the model response\. For a PCE surrogate, the variance of each output can be interpreted as the integral of local variance contributions over the input domain\. This interpretation allows the introduction of a local variance measure that quantifies the contribution of a given location in the input space to the overall model variability\. When evaluating a candidate point\(c\)\{\\bm\{\\xi\}\}^\{\(\\mathrm\{c\}\)\}, it is natural to consider the region between this point and its nearest neighbor\(s\)\{\\bm\{\\xi\}\}^\{\(\\mathrm\{s\}\)\}in the current ED\. The variance contribution of this region can be approximated by combining the local variance measures at the two points\. In particular, the geometric mean of the normalized variance aggregations at the candidate and its nearest neighbor provides an estimate of the average variance level within this region\. Multiplying this quantity by the distance termlc,sMl\_\{\\mathrm\{c\},\\mathrm\{s\}\}^\{M\}yields an approximation of the variance contained in the neighborhood between the two points\. Consequently, the criterion can be interpreted as an estimate of the amount of unresolved variance that would be refined by selecting the candidate\. In this way, the criterion naturally favors candidates located in regions with both high variance contribution and low sampling density\.

In the particular case where the variance density is constant over the input domain, the proposed criterion is simplified to a purely space\-filling strategy, since the variance aggregation term becomes uniform for all candidates and the selection is driven solely by the distance termlc,sMl\_\{\\mathrm\{c\},\\mathrm\{s\}\}^\{M\}\. In this situation, the criterion behaves similarly to a miniMax design\[JohMooYlv:MixiMinMiniMax:JSPI:90,EliVorSad:miniMax:ADES:20\], which promotes a uniform distribution of samples over the design domain\. The geometric mean is used instead of the arithmetic mean in order to avoid selecting candidates located extremely far from the region of significant probability mass\. In problems with unbounded input domains \(e\.g\., Gaussian variables\), candidates far from the mean may exhibit very large distances but negligible variance contribution\. The geometric mean naturally penalizes such cases and prevents the selection of candidates with vanishing variance density\. Therefore, maximization of the criterion identifies candidates that are expected to refine regions with high variance contribution and insufficient sampling density, enabling efficient sequential enrichment of the ED\.

## 4Numerical Examples and Engineering Applications

The proposed criterion is numerically tested on several multi\-output examples and compared with LHS\. To comprehensively assess the model accuracy, various error measures are employed, including the mean absolute error \(MAE\), the maximum absolute errorAEmax\\mathrm\{AE\}\_\{\\max\}, and the relative variance errorvardefined as

MAE=∑i=1ntest\|yipce−yiref\|ntest,AEmax=maxi=1,…,ntest⁡\|yipce−yiref\|,=var\|−2\|y22y\.\\displaystyle\\begin\{aligned\} &\\mathrm\{MAE\}=\\frac\{\\sum\_\{i=1\}^\{n\_\{\\mathrm\{test\}\}\}\\lvert y\_\{i\}^\{pce\}\-y\_\{i\}^\{ref\}\\rvert\}\{n\_\{test\}\},\\quad\\mathrm\{AE\}\_\{\\max\}=\\max\_\{i=1,\\dots,n\_\{\\mathrm\{test\}\}\}\\left\|y\_\{i\}^\{\\mathrm\{pce\}\}\-y\_\{i\}^\{\\mathrm\{ref\}\}\\right\|,\\\\ &\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\mathrm\{\{\}\_\{var\}\}=\\frac\{\\lvert\{\}^\{2\}\-\{\}\_\{y\}^\{2\}\\rvert\}\{\{\}\_\{y\}^\{2\}\}\.\\end\{aligned\}\(14\)To further investigate the stability of the surrogate model, the LOO error is also considered\. For a consistent visualization, we use1−Q21\-Q^\{2\}in the following plots\. For all examples, the calculations are repeated in order to get statistical information, i\.e\. the mean values of above error measures as well as the corresponding±\\pm\\sigmaintervals\.

### 4\.1Illustrative Example: 2D Mirror Line Singularities

To illustrate the performance of the proposed criterion for sequential sampling, we first consider a 2D line\-singularity function exhibiting point symmetry about the center of input space,\(0\.5,0\.5\)\(0\.5,0\.5\)\. The mathematical formulation of the model is given by

Y=f\(𝑿\)=1\|0\.3−X12−X22\|\+−1\|0\.3−\(1−X1\)2−\(1−X2\)2\|\+,\(X1,X2\)∈\[0,1\]2\.Y=f\(\\bm\{X\}\)=\\frac\{1\}\{\\lvert 0\.3\-X\_\{1\}^\{2\}\-X\_\{2\}^\{2\}\\rvert\+\\delta\}\-\\frac\{1\}\{\\lvert 0\.3\-\(1\-X\_\{1\}\)^\{2\}\-\(1\-X\_\{2\}\)^\{2\}\\rvert\+\\delta\},\\qquad\(X\_\{1\},X\_\{2\}\)\\in\[0,1\]^\{2\}\.\(15\)This model contains two terms whose denominators vanish along circular curves in the input domain\. Singular behavior localized near these curves induces steep gradients confined to narrow regions, which makes the problem challenging for the PCE approximation, particularly when the parameter is small\. In this study, we focus on the case of=0\.1\\delta=0\.1, for which the variance of the response is approximately≈Y213\.070 477 042\{\}\_\{Y\}^\{2\}\\approx 13\.070\\,477\\,042\. Due to the narrow singular regions, space\-filling sampling strategies may fail to allocate sufficient samples in these regions\. Consequently, a global PCE surrogate model trained on such samples may not adequately capture the localized features\. This indicates the potential applicability of a variance\-based criterion to guide adaptive sampling toward regions that contribute most significantly to predictive uncertainty\. Moreover, for the purpose of constructing a multi\-output task, we introduce a diagonal partitionx1=x2x\_\{1\}=x\_\{2\}and treat the response in each subregion as a distinct output component, thereby reformulating the original single\-output problem into a multi\-output task:

Y1\(𝑿\)=\{f\(𝑿\),X1≥X2,0,X1<X2,Y2\(𝑿\)=\{0,X1\>X2,f\(𝑿\),X1≤X2\.Y\_\{1\}\(\\bm\{X\}\)=\\begin\{cases\}f\(\\bm\{X\}\),&X\_\{1\}\\geq X\_\{2\},\\\\ 0,&X\_\{1\}<X\_\{2\},\\end\{cases\}\\qquad Y\_\{2\}\(\\bm\{X\}\)=\\begin\{cases\}0,&X\_\{1\}\>X\_\{2\},\\\\ f\(\\bm\{X\}\),&X\_\{1\}\\leq X\_\{2\}\.\\end\{cases\}\(16\)
As shown in Fig\.[1](https://arxiv.org/html/2606.17233#S4.F1)\(b\), in a representative run, the training points selected by criterion are primarily concentrated near the circular singularity curves\. In contrast to the uniform distribution of the space\-filling LHS design in Fig\.[1](https://arxiv.org/html/2606.17233#S4.F1)\(a\), the \-adapted samples exhibit a clear clustering behavior along regions with a steep variance\. Based on these samples, Fig\.[2](https://arxiv.org/html/2606.17233#S4.F2)presents a comparison of the mean and standard deviation of the MAE, relative variance error, and LOO error over 100 independent runs in the log\-transformed space\. Figure[2](https://arxiv.org/html/2606.17233#S4.F2)\(a\) shows that, when the number of training samples is small, the criterion reduces the MAE more rapidly and reaches a plateau with fewer samples compared to LHS\. Although a slight increase in MAE is observed as the number of training samples continues to grow, the \-based approach maintains a consistently narrower standard deviation band and less sensitive to the randomness of repeated trials\. A similar convergence trend can be observed in Fig\.[2](https://arxiv.org/html/2606.17233#S4.F2)\(b\) for the relative variance error\. The \-based design exhibits a faster decay and converges to a noticeably lower error, while maintaining variability comparable to that of LHS in the log\-scale\. This suggests that adaptive sampling in high\-variance regions enhances the capability of the PCE surrogate to capture second\-order statistics more accurately\. For LOO error, Fig\.[2](https://arxiv.org/html/2606.17233#S4.F2)\(c\) shows that the criterion decreases slightly faster than LHS and ultimately achieves a lower error level\. The standard deviation band of the criterion is significantly narrow compared to LHS, and the influence of different realizations on the LOO error becomes negligible as the sample increases\. This reflects the stronger robustness of the \-adapted sampling strategy against variability across repeated realizations\. Overall, the results demonstrate that the criterion improves both accuracy and robustness of the PCE surrogate model with only marginal additional computational cost in this case\.

![Refer to caption](https://arxiv.org/html/2606.17233v1/x1.png)Figure 1:Sample distributions for the 2D mirror\-line singularity problem: \(a\) samples generated by LHS and \(b\) initial and \-adapted samples in the input space\.![Refer to caption](https://arxiv.org/html/2606.17233v1/x2.png)Figure 2:2D mirror line singularities: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of \(a\) MAE, \(b\) relative variance error, and \(c\) LOO error\.
### 4\.2Reinforced Concrete Beam Problem

Reinforced concrete structures require rigorous design optimization to satisfy numerous code\-based constraints safely and efficiently\[zhang2023influence\]\. However, evaluating these constraints often involves complex finite element\-based simulations, which are computationally demanding—especially within iterative optimization frameworks\. To demonstrate the effectiveness of the proposed sampling criterion for PCE surrogate model, a three\-span continuous reinforced concrete beam is analyzed as the second example \(see Fig\.[3](https://arxiv.org/html/2606.17233#S4.F3)\), with parameters adopted from\[ozturk2024research\]\. Due to the statically indeterminate nature of the beam, structural responses must be evaluated at multiple critical sections to determine the required reinforcement and dimensions\. Here, the PCE surrogate model is trained to estimate 15 primary structural responses: the bending moments at three locations per span \(both ends and mid\-span\), totaling 9 values, and the shear forces at both ends of each span, totaling 6 values\. These outputs are influenced by eight random variables, including dead and live loads, as well as material propertiesfcdf\_\{cd\}andfydf\_\{yd\}, whose statistical characteristics are summarized in Table[1](https://arxiv.org/html/2606.17233#S4.T1)\. Finally, the design constraints are formulated based on the flexural and shear capacities as presented in Equations \(17a\) and \(17b\)\.

Mr\(i\)\\displaystyle M\_\{r\}\(i\)=Asfyd\(d−Asfyd2bwfcd\)≥Md\(i\)\\displaystyle=A\_\{s\}f\_\{yd\}\\left\(d\-\\frac\{A\_\{s\}f\_\{yd\}\}\{2b\_\{w\}f\_\{cd\}\}\\right\)\\geq M\_\{d\}\(i\)\(17a\)Vr\(j\)\\displaystyle V\_\{r\}\(j\)=Aswsfywdd≥Vd\(j\)\\displaystyle=\\frac\{A\_\{sw\}\}\{s\}f\_\{ywd\}d\\geq V\_\{d\}\(j\)\(17b\)whereMd\(i\)M\_\{d\}\(i\)andMr\(i\)M\_\{r\}\(i\)represent the design bending moment and moment resistance at locationii\(includingnnsupports andkkmid\-spans\), whileVd\(j\)V\_\{d\}\(j\)andVr\(j\)V\_\{r\}\(j\)denote the design shear force and shear resistance at the ends of each span\.

In addition to the error metrics used previously, we further introduce theAEmax\\mathrm\{AE\}\_\{\\max\}in this example to provide a more comprehensive evaluation, which is particularly relevant in engineering applications, as it represents the worst\-case prediction error\. As shown in Fig\.[5](https://arxiv.org/html/2606.17233#S4.F5), when the initial surrogate models of two sampling strategies are at comparable levels of MAE, the criterion rapidly exhibits a pronounced advantage asnsamplen\_\{\\mathrm\{sample\}\}increases\. The results in terms of theAEmax\\mathrm\{AE\}\_\{\\max\}exhibit similar trends \(in Fig\.[4](https://arxiv.org/html/2606.17233#S4.F4)\) indicating that the worst\-case prediction error is also effectively reduced\. In terms of the relative variance error, for outputs 7, 8, 9, 14, and 15, the criterion demonstrates superior performance in the early stages of the sampling process, while approaching a performance comparable to LHS in the later stages\. For the remaining outputs, the criterion remains comparable to LHS throughout the entire sampling process \(in Figure[6](https://arxiv.org/html/2606.17233#S4.F6)\)\. A possible explanation for this behavior is that the normalization step in the criterion reduces the relative contribution of low\-variance regions within each output\. Consequently, regions with dominant local variance tend to govern the sampling process, while other regions become less influential\. Nevertheless, from a global perspective, this behavior is still acceptable, as the criterion still shows a generally superior performance\. Moreover, the criterion tends to produce narrower standard deviation regions in the later stages, indicating lower variability and improved robustness against repeats\. For the stability of the surrogate model, the criterion exhibits a clear advantage over LHS, as illustrated in Fig\.[7](https://arxiv.org/html/2606.17233#S4.F7)\. As the number of samples increases, the criterion rapidly outperforms LHS, and the gap continues to widen\. Overall, this example confirm that the proposed criterion provides a more effective sampling trategy for multi\-output PCE modeling in terms of model accuracy, stability, and aggregate performance of variance estimation\.

![Refer to caption](https://arxiv.org/html/2606.17233v1/x3.png)Figure 3:Geometry and loading configuration of the three\-span reinforced concrete beamTable 1:Reinforced concrete beam: statistical properties of random variablesVariableSymbolDistributionMeanCoV \(%\)Dead load \(KN/m\)X1X\_\{1\}Normal3010X2X\_\{2\}Normal1010X3X\_\{3\}Normal3010Live load \(KN/m\)X4X\_\{4\}Gumbel2025X5X\_\{5\}Gumbel1025X6X\_\{6\}Gumbel2025Reinforcement yield strength \(MPa\)X7X\_\{7\}Normal43510Concrete strength \(MPa\)X8X\_\{8\}Log\-normal1\.4615![Refer to caption](https://arxiv.org/html/2606.17233v1/x4.png)Figure 4:Reinforced concrete beam: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of maximum absolute error\.![Refer to caption](https://arxiv.org/html/2606.17233v1/x5.png)Figure 5:Reinforced concrete beam: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of MAE\.![Refer to caption](https://arxiv.org/html/2606.17233v1/x6.png)Figure 6:Reinforced concrete beam: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of relative variance error\.![Refer to caption](https://arxiv.org/html/2606.17233v1/x7.png)Figure 7:Reinforced concrete beam: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of LOO error\.
### 4\.3Offshore Wind Turbine Problem

Offshore wind turbines \(OWTs\) operate under complex environmental conditions, where structural responses are influenced by various external factors such as wind and wave\. The interaction between these environmental loads and the structural system can lead to nonlinear and coupled behavior, which poses challenges to accurate response prediction and reliability assessment\[FANG2026100035,ZHANG2025113309\]\. In this section, two representative cases are considered\. The first case focuses on wind\-induced structural responses, where the system is driven by a single environmental input, allowing the application of high polynomial orders for the surrogate without incurring excessive computational cost\. The second case addresses a more realistic condition involving multiple environmental inputs and their coupled effects on the structural response\. To this end, the NREL 5 MW monopile OWT from the OC3 project\[jonkman2009definition\]is adopted as the benchmark structure for both cases\.

#### 4\.3\.1Offshore wind turbine moment responses under wind excitation

This case investigates the structural response of an OWT under stochastic wind loading\. The mean wind speed is treated as a random input following a Weibull distribution with a mean value of 12 m/s and a coefficient of variation \(CoV\) of 16\.7%, whereas the associated turbulent wind field leads to variability in the response\[sehrawat2016dynamic,FANG2026100035\]\. The response contains two bending moments, namely the edgewise and flapwise moments, which are closely related to fatigue and ultimate load effects\.

Similarly to the previous example, four metrics \(i\.e\.,AEmax\\mathrm\{AE\}\_\{\\max\}, MAE, variance, and LOO error\) are considered to evaluate the performance of the criterion\. Fig\.[8](https://arxiv.org/html/2606.17233#S4.F8)\(a\) presents the evolution of theAEmax\\mathrm\{AE\}\_\{\\max\}for the two moments\. The criterion consistently achieves a lowerAEmax\\mathrm\{AE\}\_\{\\max\}than LHS throughout the sampling process\. AlthoughAEmax\\mathrm\{AE\}\_\{\\max\}under LHS decreases asnsamplen\_\{\\mathrm\{sample\}\}increases, it remains consistently worse\. In contrast, the criterion effectively controls the worst\-case prediction error with only a limited number of additional samples\. For MAE \(Fig\.[8](https://arxiv.org/html/2606.17233#S4.F8)\(b\)\), the LHS method exhibits substantial variability at small sample sizes, particularly for the first output\. As the sample size increases, the uncertainty decreases and both methods converge toward stable error levels\. For the first output, the two methods achieve comparable accuracy, whereas for the second output, LHS consistently yields a slightly lower MAE\. For variance, the reference valueref2\{\}^\{2\}\_\{ref\}is estimated using1×1051\\times 10^\{5\}Monte Carlo samples, which serves as the benchmark to assess the convergence of both sampling strategies\. As shown in Fig\.[8](https://arxiv.org/html/2606.17233#S4.F8)\(c\), both methods eventually converge toref2\{\}^\{2\}\_\{\\mathrm\{ref\}\}\. However, the criterion requires only a small number of training samples to rapidly improve the accuracy and stability of the variance prediction, and consistently outperforms LHS throughout the sampling process\. In terms of the stability of the surrogate, Fig\.[8](https://arxiv.org/html/2606.17233#S4.F8)\(d\) shows that the criterion exhibits relatively higher values of1−Q21\-Q^\{2\}in the early stages\. Whennsamplen\_\{\\mathrm\{sample\}\}increases, the stability improves progressively, eventually becoming better than that of LHS\. These observations collectively reflect the intrinsic balance of the criterion: in the early stages, candidate points located in remote regions with extreme local variance may represent aggressive sampling choices, potentially compromising global approximation accuracy and space\-filling, while effectively targeting regions associated with extreme responses; however, once selected, these points contribute to improving the global coverage of the input space through the distance\-based component, thereby enhancing the overall stability and convergence behavior of the surrogate model\.

![Refer to caption](https://arxiv.org/html/2606.17233v1/x8.png)Figure 8:OWT under wind excitation: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of maximum absolute error\.
#### 4\.3\.2Offshore wind turbine displacement responses under multiphysics coupling

This case investigates the structural response of an OWT under coupled environmental loading conditions\. Five environmental input variables are considered, including water depth, significant wave height, wind speed, wind direction, and wave direction\. These variables jointly govern the coupled aerodynamic and hydrodynamic effects on the structure, leading to complex and nonlinear interactions within the system\. The response is described by two displacement components in orthogonal directions, reflecting the global structural behavior under multiphysics coupling\. Figure[9](https://arxiv.org/html/2606.17233#S4.F9)provides a simplified representation of these environmental interactions affecting the OWT\. The statistical properties of the environmental random variables are listed in Table[2](https://arxiv.org/html/2606.17233#S4.T2)\[sehrawat2016dynamic\]\. Due to the increased number of input variables and the inherent nonlinearity of the responses, the number of basis functions in the PCE grows rapidly, resulting in a substantial computational cost of each simulation\. It should be emphasized that this cost is dictated by the intrinsic complexity of the surrogate itself\. The additional cost introduced by the criterion is almost negligible\. Therefore, instead of individual simulations for each realization, a set of 1000 samples is generated in advance using LHS, among which 100 selected samples are fixed as the testing set, while the remaining 900 samples are used as the initial training set and candidate pool for subsequent sampling\. For comparison, the original LHS sampling is replaced by randomly selecting subsets from the LHS candidate pool in this case\.

![Refer to caption](https://arxiv.org/html/2606.17233v1/x9.png)Figure 9:Schematic of multiphysics environmental interactions affecting the offshore wind turbineTable 2:Multiphysics OWT: statistical properties of environmental random variablesThe results of this case in Fig\.[10](https://arxiv.org/html/2606.17233#S4.F10)exhibit similar trends to those observed in the previous case\. The criterion consistently achieves improved control of the worst\-case prediction error, while maintaining comparable accuracy in terms of MAE\. For variance, both sampling strategies converge towards a stable asymptotic value, with the criterion showing a rapid decrease in the early stages\. It should be noted that all the convergence behavior is influenced by the finite size of the candidate pool, which inherently limits the available sampling space and leads to similar asymptotic values for both methods towards the end of the sampling process\. In particular, this effect is more evident for the1−Q21\-Q^\{2\}\. Nevertheless, the criterion exhibits a faster improvement in the early and intermediate stages, characterized by a more rapid decrease in1−Q21\-Q^\{2\}\. Aroundnsample=200n\_\{\\mathrm\{sample\}\}=200, its performance becomes comparable to that of the LHS subset, and subsequently surpasses it by achieving lower values for output 1\. Overall, the results from both cases demonstrate that the criterion provides improved sampling efficiency across multiple outputs, particularly under limited sampling budgets\. It enables more effective identification of regions associated with extreme responses, while also reducing variability in the surrogate predictions\. Such characteristics are especially valuable for computationally demanding engineering applications, including reliability analysis and limit\-state evaluation, where accurate characterization of extreme events is critical\.

![Refer to caption](https://arxiv.org/html/2606.17233v1/x10.png)Figure 10:Multiphysics OWT: comparison of criterion and LHS for PCE surrogate model construction: mean and standard deviation of maximum absolute error\.

## 5Discussion & Further Work

The results from the numerical examples shows that the proposed criterion successfully improves sampling efficiency in multi\-output problems\. By aggregating variance contributions across multiple outputs, the method enables a joint utilization of multi\-output information, thereby avoiding the computational cost and potential inconsistency caused by separate additional sampling\. In addition, the multi\-output criterion exhibits reduced variability across different repeats, suggesting lower sensitivity to random sampling and improved robustness of the surrogate model\. Despite these advantages, several limitations of the proposed approach should be acknowledged\. First, the method relies on a predefined candidate pool, from which new samples are selected during the sequential enrichment process\. In some high\-dimensional engineering applications, such as the multiphysics OWT, constructing a sufficiently large and representative candidate pool generated by advanced sampling techniques\[VorMas:Nbody2:ADES:20\]may become computationally prohibitive\. As a result, the quality of the candidate pool may limit the achievable performance of the adaptive sampling strategy\. Second, the computational cost associated with model updating may become significant in high\-dimensional problems\. In particular, when sparse regression techniques such as LAR are employed for PCE coefficient estimation, the iterative selection of basis functions can introduce additional computational overhead\. While LAR has advantages for handling large candidate basis sets in high\-dimensional cases, its cost may become non\-negligible when the surrogate model is updated frequently during the sequential sampling process\. These highlight the balance between sampling efficiency and increased computational cost, which should be considered in practical applications\. A combination of the high computational burden of iterative sparse solvers with models containing large number of QoIs may be not feasible in practical applications\. Possible solution can be seen in sparse solvers developed specifically for vector valued PCEs\[LOUKREZIS2025115746\]\.

A possible direction for future work is to further explore the correlation among multiple outputs\. While the proposed approach already performs well, explicitly accounting for dependencies between outputs may be beneficial, especially when strong correlations are present\. One way to achieve this is to employ covariance\-based decomposition techniques, such as principal component analysis \(PCA\), to project the original outputs onto a set of orthogonal and uncorrelated components\. The variance\-based multi\-output criterion can then be formulated in this transformed space, where each component after projection represents an independent contribution to the overall variability\. This transformation may help reduce redundancy among outputs and guide the sampling process toward the dominant modes\. The results can also be mapped back to the original output space through an inverse transformation\. Such an approach could be particularly useful for high\-dimensional problems\. The presented multi\-output active learning can be also further combined with various existing techniques based on multiple PCE surrogates, such as multi\-element PCEs\[MultielementPCE,NOVAK2023110728\]for approximation of stochastic PDEs or other regression\-based PCEs\[NOVAK2024112926,LOUKREZIS2025115746\]\.

## 6Conclusion

This paper proposes a novel adaptive sequential sampling strategy for the construction of multi\-output PCE surrogate models\. The generalized criterion extends the original single\-output formulation by incorporating a normalized aggregation of variance contributions across multiple outputs, while maintaining a balance between variance\-driven exploitation and distance\-based exploration\. The capability of the proposed approach has been demonstrated through several numerical examples, including both a synthetic function and engineering applications\. The results consistently show that the criterion improves surrogate accuracy and convergence behavior compared to LHS\. In particular, it achieves more effective control of the worst\-case prediction error, enhances the estimation of second\-order statistics, and exhibits reduced variability across repeated runs, indicating improved robustness of the surrogate model\. From an engineering perspective, the proposed method provides an efficient sampling strategy for multi\-output systems under limited computational budgets\. It enables more effective identification of regions associated with high variance or extreme responses, which is particularly beneficial for reliability analysis and UQ in complex systems\. Overall, the proposed multi\-output criterion offers a practical and robust framework for adaptive sampling in PCE surrogate modeling and shows strong potential for applications in computationally demanding engineering problems\.

## Data Availability

The data and scripts that support the findings of this study are openly available in https://github\.com/NovakLBUT/MultTheta\_engineering at doi\.org/10\.5281/zenodo\.20707012\.

## Acknowledgments

This work was supported by the Czech Science Foundation under project No\. 26\-20456S and project INODIN \(CZ\.02\.01\.01/00/23\_020/0008487, Innovative methods of materials diagnostics and monitoring of engineering infrastructure to increase its durability and service time\) co\-funded by European Union\.

## References
Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

Similar Articles

Structure-Preserving Neural Surrogates with Tractable Uncertainty Quantification

Are you sure? A Comprehensive and Comprehensible Survey of Uncertainty Quantification in Symbolic Regression

Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo

Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning

Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo

Submit Feedback

Similar Articles

Structure-Preserving Neural Surrogates with Tractable Uncertainty Quantification
Are you sure? A Comprehensive and Comprehensible Survey of Uncertainty Quantification in Symbolic Regression
Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo
Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning
Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo