Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation
Summary
This paper introduces Mask-Morph Graph U-Net (MMGUNet), a graph neural network-based surrogate model for crashworthiness field prediction that addresses geometric generalisability via coarse-graph morphing and masked pretraining.
# Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation Source: [https://arxiv.org/html/2605.15231](https://arxiv.org/html/2605.15231) Tobias LehrerYingxue ZhaoHaosu ZhouPhilipp StockerTobias PfaffNan Li[n\.li09@imperial\.ac\.uk](https://arxiv.org/html/2605.15231v1/mailto:[email protected]) ###### Abstract Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation\. Machine\-learning surrogate models based on graph neural networks \(GNNs\) offer a faster alternative\. Message\-passing GNNs are widely used for mesh simulation, and their shared node and edge update functions are relatively generalisable across varying graph structures\. By contrast, non\-shareable edge\-specific aggregation layers can capture nonlinear relationships more accurately but usually require fixed graph connectivity, which limits generalisability\. This paper presents Mask\-Morph Graph U\-Net \(MMGUNet\), a practical approach to addressing the limitation of hierarchical Graph U\-Net architectures that use edge\-specific downsampling and upsampling layers\. Fixed coarse graph connectivity is required for edge\-specific layers\. To retain this while improving spatial correspondence, the proposed method morphs the coarsened graph hierarchy to each input mesh using feature\-aligned barycentric parameterisation before constructing cross\-graph edges\. It further applies node masking during supervised pretraining, followed by parameter\-efficient fine\-tuning in which high\-parameter edge\-specific layers are frozen\. The proposed approach is evaluated in in\-distribution, out\-of\-distribution, and cross\-component transfer settings using mean Euclidean distance and maximum intrusion percentage error\. 
Results show that coarse-graph morphing improves test accuracy relative to a fixed-coarse-graph baseline, while masked supervised pretraining reduces the train-test discrepancy and improves data efficiency during transfer. The proposed model also achieves lower prediction error than the external baselines. These results demonstrate a practical route toward reusable, data-efficient mesh-based surrogate modelling for crashworthiness design exploration.

###### Keywords

Surrogate modelling, Graph neural networks, Crashworthiness analysis, Geometric generalisability, Transfer learning

Affiliations:

1. Dyson School of Design Engineering, Imperial College London, London, UK
2. TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
3. Faculty of Mechanical Engineering, OTH Regensburg, Regensburg, Germany
4. NVIDIA, UK

## 1 Introduction

Crashworthiness is an important performance criterion in the structural design of safety-critical vehicle components, as it measures their ability to protect passengers during vehicle accidents. Crashworthiness analysis is traditionally performed via nonlinear finite element (FE) simulations that capture complex crash modes with large deformations Wu [[2006](https://arxiv.org/html/2605.15231#bib.bib1)], Chang et al. [[2007](https://arxiv.org/html/2605.15231#bib.bib2)]. Despite their accuracy, such FE analyses are computationally expensive, which limits their application in iterative design optimisation workflows. This motivates the development of machine-learning surrogate models.
Early surrogate models for crashworthiness largely focused on predicting scalar responses such as peak crushing force and specific energy absorption Albak [[2023](https://arxiv.org/html/2605.15231#bib.bib8)], Xiong et al. [[2018](https://arxiv.org/html/2605.15231#bib.bib13)], Ahmadi Dastjerdi et al. [[2019](https://arxiv.org/html/2605.15231#bib.bib11)], Zende and Dalir [[2022](https://arxiv.org/html/2605.15231#bib.bib12)], using simple architectures such as multilayer perceptrons (MLPs) and recurrent neural networks (RNNs) Rogala et al. [[2020](https://arxiv.org/html/2605.15231#bib.bib6)], Sakaridis et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib7)], Kohar et al. [[2020](https://arxiv.org/html/2605.15231#bib.bib9)], Guo et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib10)]. These models are limited to scalar inputs and outputs and therefore struggle to capture the detailed spatial behaviour of complex simulations. Field prediction is important because crash response depends not only on global metrics but also on the spatial distribution of deformation, intrusion, and load transfer across the structure. Convolutional neural networks (CNNs) have been used for field prediction by mapping simulation data to 2D or 3D image representations Kohar et al. [[2021](https://arxiv.org/html/2605.15231#bib.bib14)], Li et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib15)]. Although these approaches can predict detailed fields, pixel- or voxel-based representations can struggle to encode complex geometries with irregular discretisations. To address these limitations, recent work has increasingly used graph neural networks (GNNs) as surrogate models that encode vehicle components directly as graphs Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)], Wen et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib16)].
Wen et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib16)] used segment-based graphs to predict the dynamic behaviour of regularly structured vehicle components. Le Guennec et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib47)] proposed a neural field surrogate model for crash dynamics prediction of vehicle components, showing superior performance over traditional reduced-order models. André et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib50)] combined neural networks with FE simulations to model mechanical joints in large-scale crash analyses, showing how learned component models can reduce the cost of full-vehicle simulation. Thel et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib49), [2025](https://arxiv.org/html/2605.15231#bib.bib48)] introduced Finite Element Method Integrated Networks (FEMIN), a framework that embeds neural networks within the FE pipeline to replace selected parts of the simulation while preserving its physics-based structure. Zhang et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib51)] proposed a mesh-based GNN framework that compresses large multi-component FE assemblies into smaller graph representations for rapid response prediction. Nabian et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib52)] applied MeshGraphNet Pfaff et al. [[2021](https://arxiv.org/html/2605.15231#bib.bib3)] and Transolver Wu et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib53)], Luo et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib54)] to a body-in-white crash dataset with more than 200 components using NVIDIA PhysicsNeMo PhysicsNeMo Contributors [[2023](https://arxiv.org/html/2605.15231#bib.bib55)]. Together, these studies demonstrate the feasibility of learned surrogate modelling, including GNN-based approaches, for multi-component crash dynamics.
GNN surrogate models have also been developed for mesh-based prediction in related domains Pfaff et al. [[2021](https://arxiv.org/html/2605.15231#bib.bib3)], Deshpande et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib17)], He et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib18)], Fu et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib19)], Chen et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib20)], Zhou et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib26)]. The most commonly adopted architecture is the encoder–processor–decoder design of the Graph Network-based Simulators (GNS) proposed by Sanchez-Gonzalez et al. [[2020](https://arxiv.org/html/2605.15231#bib.bib4)], which applies multiple MLP-based graph-network blocks for iterative edge and node updates. MeshGraphNet (MGN) adapts this architecture to mesh simulation and augments mesh-edge interactions with additional world edges to better capture non-local contact and collision effects Pfaff et al. [[2021](https://arxiv.org/html/2605.15231#bib.bib3)]. To improve computational efficiency on large graphs, several multiscale models Fortunato et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib25)], Cao et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib24)] perform message passing on hierarchies of coarsened graphs, reducing the number of message-passing steps needed to propagate information over long distances. MGN provides a strong foundation for modelling mesh data with message passing neural networks (MPNNs). It uses MLPs as edge and node update functions, which typically transfer well across meshes with different topologies. Because these trainable update functions (weights) are shared across all nodes and edges within the graph, we refer to this approach as shared-weight message passing.
By contrast, Deshpande et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib17)] proposed the Multi-channel Aggregation Network (MAgNET), consisting of multi-channel aggregation layers that assign non-shareable, edge-specific weights to each edge in each channel. This can improve nonlinear approximation accuracy but requires a fixed or topology-consistent graph structure during training. As a result, compared with shared-weight message passing models, its application is limited when the input mesh topology changes. Prior mesh-based crashworthiness surrogates such as the Recurrent Graph U-Net (ReGUNet) Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)], Zhao et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib27)] have shown that hierarchical Graph U-Net architectures with fixed coarsened graphs and edge-specific coarse-level operations can achieve accurate and efficient deformation prediction for vehicle panel components. However, the same mechanism that provides high prediction capacity also creates a generalisability bottleneck. Edge-specific layers require fixed coarse connectivity, while cross-graph edges are commonly constructed using spatial proximity between the input mesh and a shared coarse template. When geometric variation becomes large, spatial proximity alone can connect non-corresponding structural regions and degrade prediction performance. When the target distribution or component family changes more substantially, this limitation can also reduce transferability by increasing the amount of target data required for adaptation. Here, generalisability refers to prediction on unseen geometries without retraining or fine-tuning, whereas transferability refers to efficient adaptation using limited target data.
As shown in Figure [1](https://arxiv.org/html/2605.15231#S1.F1), spatial-proximity-based cross-graph connections can become insufficient under large shape variation, leaving many nodes unconnected and reducing predictive performance. This limitation reflects a fundamental trade-off: models with edge-specific operations can achieve higher predictive accuracy, but their reliance on fixed graph connectivity limits generalisability. This motivates a generalisable mesh-based surrogate that preserves fixed coarse topology for high-capacity edge-specific aggregation while adapting the coarse graph geometry to each input shape, together with a transferable pretraining strategy for efficient target adaptation.

Figure 1: Illustration of spatial-proximity-based cross-graph edge construction (legend: fine node, fine edge, coarse node, coarse edge, cross-graph edge). When the fine and coarse graphs have similar geometry, as in (a), nearest-neighbour fine-to-coarse edges provide meaningful local correspondence. Under large shape variation, as in (b), the fixed coarse graph becomes geometrically misaligned, causing some fine nodes to connect to inappropriate coarse regions.

In this paper, we address the trade-off between topological flexibility and predictive capacity in mesh-based GNN surrogates for crashworthiness analysis. The proposed Mask-Morph Graph U-Net retains fixed-topology coarsened graphs so that edge-specific multiscale aggregation layers can be used, but morphs the coarsened graph hierarchy to each input geometry before constructing fine-to-coarse cross-graph edges. This improves spatial correspondence under shape variation while preserving the trainable edge-specific structure at coarse levels. We further adopt a masked pretraining and parameter-efficient fine-tuning strategy to improve robustness, data efficiency, and transferability across tasks. The resulting framework is termed Mask-Morph Graph U-Net (MMGUNet).
We evaluate MMGUNet on multiple crashworthiness scenarios, including B-pillar side-impact cases and a U-channel dynamic-loading case, and demonstrate improved predictive performance and cross-component transfer performance. The main contributions of this work are as follows:

1. We propose Mask-Morph Graph U-Net, a multiscale mesh-based GNN surrogate that combines topology-preserving coarse-graph morphing with edge-specific downsampling and upsampling layers for crashworthiness field prediction.
2. We introduce a feature-aligned barycentric morphing procedure that allows fixed-topology, edge-specific multiscale graph operators to be reused across geometrically varying finite-element meshes.
3. We adopt supervised masked pretraining and a parameter-efficient fine-tuning strategy for crashworthiness surrogate modelling to further improve generalisability and training efficiency.
4. We construct and evaluate a multi-geometry crashworthiness case-study suite, including four B-pillar shape variants and one U-channel case, and provide comprehensive cross-task transfer learning results.

The remainder of this paper is organised as follows. Section [2](https://arxiv.org/html/2605.15231#S2) reviews related work. Section [3](https://arxiv.org/html/2605.15231#S3) defines the task and presents the network architecture. Section [4](https://arxiv.org/html/2605.15231#S4) describes dataset generation and the case-study setup. Section [5](https://arxiv.org/html/2605.15231#S5) introduces feature-aligned morphing for shell meshes. Section [6](https://arxiv.org/html/2605.15231#S6) presents the training strategy, including masked pretraining and parameter-efficient fine-tuning. Section [7](https://arxiv.org/html/2605.15231#S7) reports and discusses the experimental results. Finally, Section [8](https://arxiv.org/html/2605.15231#S8) concludes the paper.
## 2 Related work

### 2.1 Graph morphing

For a generalisable fixed-topology graph surrogate, a key requirement is to maintain consistent coarse-level connectivity while improving the geometric correspondence between fine and coarse graph levels. Better correspondence yields more meaningful cross-graph edges, which improves generalisability, while the edge-specific layers preserve high predictive accuracy Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]. In this setting, the task can be formulated as template-to-target mesh morphing with fixed connectivity: nodal coordinates are updated to follow the target surface while the source topology is preserved and remeshing is avoided. In engineering practice, a commonly used morphing strategy is control-point-driven morphing de Boer et al. [[2007](https://arxiv.org/html/2605.15231#bib.bib38)]. This approach defines a set of control points or control regions, prescribes translations at those locations, and then computes the displacements of the remaining nodes so that the geometry transitions smoothly. For example, to morph a template mesh to the shape of a target mesh, the boundary nodes of the template can be constrained to coincide with the target boundary. For interior nodes, a common choice is radial basis function (RBF) interpolation, which propagates boundary displacements to interior nodes and can handle large deformations without depending explicitly on mesh connectivity de Boer et al. [[2007](https://arxiv.org/html/2605.15231#bib.bib38)]. Other widely used techniques include free-form deformation (FFD), which controls smooth global shape changes through a low-dimensional embedding lattice Sederberg and Parry [[1986](https://arxiv.org/html/2605.15231#bib.bib39)]. While these methods are efficient for geometry warping, they do not explicitly optimise surface-to-surface correspondence.
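To make the RBF step concrete, the following minimal NumPy sketch interpolates displacements prescribed at boundary (control) nodes onto interior nodes. It is a generic illustration of the background technique, not the morphing method used in this paper; the cubic kernel $\varphi(r) = r^3$, the linear polynomial augmentation, and the function name `rbf_morph` are our assumptions.

```python
import numpy as np

def rbf_morph(ctrl_pts, ctrl_disp, query_pts):
    """Interpolate displacements prescribed at control points onto query points.

    Kernel phi(r) = r**3 plus a linear polynomial term, so constant and affine
    displacement fields are reproduced exactly (an illustrative choice).
    """
    n = ctrl_pts.shape[0]
    # Kernel matrix Phi_ij = ||p_i - p_j||^3 between control points.
    r = np.linalg.norm(ctrl_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    P = np.hstack([np.ones((n, 1)), ctrl_pts])  # polynomial basis [1, x, y, (z)]
    m = P.shape[1]
    # Saddle-point system [[Phi, P], [P^T, 0]] [alpha; beta] = [d; 0].
    A = np.zeros((n + m, n + m))
    A[:n, :n] = r ** 3
    A[:n, n:] = P
    A[n:, :n] = P.T
    rhs = np.zeros((n + m, ctrl_disp.shape[1]))
    rhs[:n] = ctrl_disp
    coeffs = np.linalg.solve(A, rhs)
    alpha, beta = coeffs[:n], coeffs[n:]
    # Evaluate s(q) = sum_i alpha_i * phi(||q - p_i||) + [1, q] @ beta.
    rq = np.linalg.norm(query_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    Pq = np.hstack([np.ones((query_pts.shape[0], 1)), query_pts])
    return rq ** 3 @ alpha + Pq @ beta

# Example: boundary nodes of a unit square translate rigidly; interior nodes follow.
boundary = np.array([[0, 0], [0.5, 0], [1, 0], [1, 0.5],
                     [1, 1], [0.5, 1], [0, 1], [0, 0.5]])
shift = np.tile([0.1, 0.2], (len(boundary), 1))
interior = np.array([[0.5, 0.5], [0.25, 0.75]])
moved = interior + rbf_morph(boundary, shift, interior)
```

Because the interpolant only needs point positions, no mesh connectivity enters the solve, which matches the connectivity-free property described above; nothing in it, however, encourages the morphed surface to match a target surface.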
Parameterisation-based morphing is better suited to this setting. Cross-parameterisation and inter-surface mapping methods provide a stronger foundation by directly seeking bijective maps between meshes Kraevoy and Sheffer [[2004](https://arxiv.org/html/2605.15231#bib.bib40)], Schreiner et al. [[2004](https://arxiv.org/html/2605.15231#bib.bib41)]. A standard mesh-morphing pipeline maps the source and target surfaces to a common parameter domain, establishes correspondence there, and then interpolates nodal coordinates to obtain the transformed shape Alexa et al. [[2000](https://arxiv.org/html/2605.15231#bib.bib31)]. Traditional parameterisation approaches include Tutte's barycentric embedding Tutte [[1963](https://arxiv.org/html/2605.15231#bib.bib28)]. In this formulation, the boundary is fixed to a convex polygon and each interior vertex is placed at the barycentric average of its neighbouring vertices, which yields a crossing-free embedding under standard conditions on the mesh graph (for example, a 3-connected planar graph). Because the interior vertex constraints are linear, the embedding for a given boundary is the unique solution of a sparse linear system and can be computed efficiently. Later advances, including Floater-style barycentric variants and mean-value coordinates, improve mapping quality while retaining the robustness of linear barycentric formulations Floater [[1997](https://arxiv.org/html/2605.15231#bib.bib29), [2003](https://arxiv.org/html/2605.15231#bib.bib30)]. Conformal methods such as least-squares conformal maps (LSCM) further reduce angular distortion Lévy et al. [[2002](https://arxiv.org/html/2605.15231#bib.bib42)]. More recent injective optimisation frameworks such as scalable locally injective mappings Rabinovich et al. [[2017](https://arxiv.org/html/2605.15231#bib.bib43)] improve distortion control and reduce foldovers under larger shape variation.
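Tutte's embedding with uniform weights can be sketched in a few lines of NumPy. This sketch pins an ordered boundary loop to points on the unit circle (a convex polygon) and solves the linear interior constraints $\deg(i)\,\mathbf{u}_i - \sum_{j \in \mathcal{N}(i)} \mathbf{u}_j = 0$. A dense solve is used for brevity where a sparse solver would be preferred on real meshes; the function name `tutte_embedding` and the circular boundary placement are illustrative assumptions, and the feature-aligned variant used in this paper is not reproduced here.

```python
import numpy as np

def tutte_embedding(n_vertices, edges, boundary_loop):
    """Uniform-weight Tutte embedding of a disc-topology mesh into the unit disc.

    edges: undirected (i, j) pairs; boundary_loop: boundary vertices in loop order.
    Returns (n_vertices, 2) parameter-domain coordinates.
    """
    nbrs = [set() for _ in range(n_vertices)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    uv = np.zeros((n_vertices, 2))
    # Pin the boundary loop, in order, to evenly spaced points on the unit circle.
    t = 2 * np.pi * np.arange(len(boundary_loop)) / len(boundary_loop)
    uv[boundary_loop] = np.column_stack([np.cos(t), np.sin(t)])
    is_boundary = np.zeros(n_vertices, dtype=bool)
    is_boundary[boundary_loop] = True
    interior = np.flatnonzero(~is_boundary)
    row = {int(v): k for k, v in enumerate(interior)}
    # Each interior vertex equals the average of its neighbours:
    # deg(i) * u_i - sum_{interior j} u_j = sum_{boundary j} u_j.
    A = np.zeros((len(interior), len(interior)))
    b = np.zeros((len(interior), 2))
    for v in interior:
        r = row[int(v)]
        A[r, r] = len(nbrs[v])
        for w in nbrs[v]:
            if is_boundary[w]:
                b[r] += uv[w]
            else:
                A[r, row[w]] -= 1.0
    if len(interior):
        uv[interior] = np.linalg.solve(A, b)
    return uv

# 3 x 3 quad grid: vertex 4 is the only interior vertex.
grid_edges = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (7, 8),
              (0, 3), (3, 6), (1, 4), (4, 7), (2, 5), (5, 8)]
uv = tutte_embedding(9, grid_edges, [0, 1, 2, 5, 8, 7, 6, 3])
```

In a morphing pipeline of the kind described above, both the template and the target would be embedded into the same disc, and target coordinates would then be interpolated at the template's parameter locations.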
### 2.2 Masked pretraining and fine-tuning

Another way to enhance generalisability is masked pretraining followed by task-specific fine-tuning. In computer vision, He et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib32)] proposed the masked autoencoder (MAE), which introduced high-ratio masking with an asymmetric encoder–decoder design. Its self-supervised pretraining procedure trains the encoder–decoder architecture on randomly masked inputs, then introduces mask tokens into the latent representation before a lightweight decoder reconstructs the output He et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib32)], Devlin et al. [[2019](https://arxiv.org/html/2605.15231#bib.bib36)]. In graph learning, masked graph autoencoder frameworks such as GraphMAE Hou et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib33)] pretrain encoders by reconstructing masked node information, thereby improving the generalisability of the fine-tuned model on downstream tasks. A subsequent decoding-enhanced variant further improves transfer performance by strengthening the decoder design and reconstruction objective, yielding more informative latent representations for fine-tuning across tasks Hou et al. [[2023](https://arxiv.org/html/2605.15231#bib.bib34)]. For mesh-based physics simulation, MeshMask extends this idea by masking a random subset of mesh nodes during pretraining; the pretrained encoder can then be fine-tuned on target simulation tasks with improved long-horizon accuracy and robustness Garnier et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib35)]. Garnier et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib35)] also examined the transfer-learning ability of the proposed model and training strategy, indicating improved data efficiency when transferring a pretrained model to unseen tasks.
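As a hedged illustration of random node masking in a supervised setting, the sketch below replaces a random fraction of node feature vectors with a constant mask token, while the training target remains the full nodal displacement field. The mask ratio, the constant token, and the helper names `mask_nodes` and `supervised_loss` are hypothetical choices for this example; they are not taken from MeshMask or from this paper's training configuration.

```python
import numpy as np

def mask_nodes(node_feats, mask_ratio, mask_token, rng):
    """Replace a random subset of node feature vectors with a mask token.

    Returns the masked features and a boolean array marking the masked nodes.
    """
    n = node_feats.shape[0]
    n_mask = int(round(mask_ratio * n))
    idx = rng.choice(n, size=n_mask, replace=False)  # sample nodes without replacement
    masked = node_feats.copy()                        # leave the caller's array untouched
    masked[idx] = mask_token                          # broadcast the token to each selected row
    is_masked = np.zeros(n, dtype=bool)
    is_masked[idx] = True
    return masked, is_masked

def supervised_loss(pred_disp, true_disp):
    """Mean squared nodal displacement error over all nodes (assumed supervised target)."""
    return float(np.mean(np.sum((pred_disp - true_disp) ** 2, axis=1)))

rng = np.random.default_rng(0)
feats = np.arange(20, dtype=float).reshape(10, 2)   # toy node features
masked_feats, is_masked = mask_nodes(feats, 0.3, np.zeros(2), rng)
```

A fresh mask would typically be drawn at every training step, so the model cannot rely on any single node's input features.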
More generally, the benefits of pretraining and fine-tuning have also been examined in the broader transfer-learning literature for GNN surrogates. Whalen and Mueller [[2022](https://arxiv.org/html/2605.15231#bib.bib37)] conducted a transfer-learning study on GNN-based surrogate models for trusses, showing that pretraining on similar datasets can effectively reduce data requirements and improve training efficiency when adapting to new tasks.

## 3 Task definition and Network Architecture

### 3.1 Task definition

This study considers component-level crashworthiness field prediction under fixed loading, boundary, material, and contact settings. Only the geometric design of the component is varied, and the surrogate model predicts the terminal nodal displacement field after impact. This single-step formulation is appropriate for early-stage design studies, in which the final deformed state and the intrusion response are the primary quantities of interest. Under this controlled setup, the model's generalisation behaviour is evaluated by increasing the magnitude of shape variation, changing mesh density and component scale, and transferring across related component geometries.

We formulate the prediction problem as a single-step prediction task, similar to Li et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib15)], Deshpande et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib17)]: the input is the component shape and the output is the final-step displacement field in the x, y, and z directions. This formulation aligns well with practical design optimisation, where the terminal deformed state is typically sufficient for early-stage crashworthiness assessment and decision-making.

### 3.2 Graph definition

We represent each FE mesh as a graph

$$\mathcal{G} = (V, E), \tag{1}$$

where $V$ is the set of nodes and $E$ is the set of edges.
The input node features are denoted by $\mathbf{V} = \{\mathbf{v}_i\}_{i=1}^{N_v}$, and the edge features by $\mathbf{E} = \{\mathbf{e}_k\}_{k=1}^{N_e}$. Here, $\mathbf{v}_i$ is the feature vector of node $i$, $\mathbf{e}_k$ is the feature vector of edge $k$, and $N_v$ and $N_e$ denote the numbers of nodes and edges, respectively. Each graph node corresponds to a mesh node, and each graph edge corresponds to an element edge in the mesh. Node $i$ has nodal coordinates $\mathbf{x}_i = [x_i, y_i, z_i]$. For an edge connecting nodes $(i, j)$, the edge feature consists of the relative position vector and its length:

$$\mathbf{e}_{ij} = \left[\mathbf{r}_{ij},\; \lVert\mathbf{r}_{ij}\rVert_2\right] \in \mathbb{R}^4, \qquad \mathbf{r}_{ij} = \mathbf{x}_i - \mathbf{x}_j, \tag{2}$$

where $\lVert\mathbf{r}_{ij}\rVert_2$ is the Euclidean distance between nodes $i$ and $j$. For each undirected mesh edge, both directed edges $(i, j)$ and $(j, i)$ are included so that the signed relative-position vector is defined consistently during message passing. These edge features are sufficient to encode mesh shape and local geometric relations, and because relative positions are unchanged by a global translation, absolute nodal coordinates are not required. Node features can therefore be treated as optional positional encodings. The effect of different node feature designs is analysed in Section [7.1.1](https://arxiv.org/html/2605.15231#S7.SS1.SSS1).
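Eq. (2) and the bidirectional edge convention can be sketched in NumPy as follows, assuming the mesh is given as nodal coordinates plus a list of shell elements (triangles or quadrilaterals) whose node indices are ordered around each element boundary. The function name `build_edge_features` and this input layout are hypothetical.

```python
import numpy as np

def build_edge_features(coords, elements):
    """Directed mesh edges and features e_ij = [x_i - x_j, ||x_i - x_j||_2], as in Eq. (2).

    coords: (N_v, 3) nodal coordinates.
    elements: node-index lists of shell elements, ordered around each element.
    Returns edge_index of shape (2, N_e) with rows (i, j), and features (N_e, 4).
    """
    undirected = set()
    for elem in elements:
        for a in range(len(elem)):
            i, j = int(elem[a]), int(elem[(a + 1) % len(elem)])  # consecutive element nodes
            undirected.add((min(i, j), max(i, j)))                # shared edges stored once
    und = np.array(sorted(undirected))
    # Include both directions (i, j) and (j, i) so that r_ji = -r_ij is available.
    i_nodes = np.concatenate([und[:, 0], und[:, 1]])
    j_nodes = np.concatenate([und[:, 1], und[:, 0]])
    r = coords[i_nodes] - coords[j_nodes]
    feats = np.hstack([r, np.linalg.norm(r, axis=1, keepdims=True)])
    return np.stack([i_nodes, j_nodes]), feats

# One unit-square quad element: 4 undirected edges -> 8 directed edges.
coords = np.array([[0.0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
edge_index, edge_feats = build_edge_features(coords, [[0, 1, 2, 3]])
```

Only element edges are created, so a quad contributes its four sides and no diagonals, consistent with the one-to-one correspondence between graph edges and element edges stated above.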
The model output is the predicted nodal displacement:

$$\hat{\mathbf{y}}_i = [\Delta\hat{x}_i, \Delta\hat{y}_i, \Delta\hat{z}_i], \quad i = 1, \dots, N_v. \tag{3}$$

To improve message-passing efficiency on large meshes, we adopt a hierarchical graph representation by coarsening the fine graph to multiple lower-resolution levels, which reduces the number of nodes processed at deeper layers. To keep the coarse-level topology constant, as required by edge-specific layers, the coarsened graphs are constructed from software-generated template meshes and shared across samples Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]. Cross-graph message passing is then performed by connecting each fine node to its $k$ nearest coarse nodes. Because the coarsened graphs are shared, this connection scheme is only reliable when the fine graph has a shape similar to the coarse graph. As shown in Figure [1](https://arxiv.org/html/2605.15231#S1.F1), when the shapes differ substantially, fine nodes may be connected to inappropriate coarse nodes, limiting model performance. In the zoomed view of Figure [1](https://arxiv.org/html/2605.15231#S1.F1)(b), fine-mesh nodes at the top-right corner are connected to coarse-mesh nodes in the T-joint region, whereas under correct correspondence they should connect to the top-right nodes of the coarse mesh. To address this limitation, we morph the coarsened graphs to match each input fine graph and construct the cross-graph edges after morphing the coarse hierarchy. The morphing procedure is detailed in Section [5](https://arxiv.org/html/2605.15231#S5).

### 3.3 Network architecture

MMGUNet is built on a hierarchical Graph U-Net backbone for multiscale field prediction on FE meshes Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)].
Unlike recurrent temporal graph surrogates designed for rollout prediction, the present architecture targets terminal crashworthiness field prediction under large geometric variation. As illustrated in Figure [2](https://arxiv.org/html/2605.15231#S3.F2), the model adopts a hierarchical design based on the multi-level graph representation. Its main design objective is to combine topology-flexible, MLP-based shared-weight operations at fine levels with high-capacity edge-specific operations on fixed coarsened graphs. The encoder maps node and edge inputs into latent states with MLP-based operators. At both the finest and coarsest levels, in-graph message passing (IG-MP) also uses MLP operators, which can be written as

$$\mathbf{m}_{ij}^{(l)} = \phi_e^{(l)}\!\left([\mathbf{h}_i^{(l)}, \mathbf{h}_j^{(l)}, \mathbf{e}_{ij}]\right), \qquad \mathbf{h}_i^{(l+1)} = \phi_v^{(l)}\!\left(\Big[\mathbf{h}_i^{(l)}, \sum_{j \in \mathcal{N}(i)} \mathbf{m}_{ij}^{(l)}\Big]\right), \tag{4}$$

where $\mathbf{m}_{ij}^{(l)}$ is the message on edge $\mathbf{e}_{ij}$, $\mathbf{h}_i^{(l)}$ is the latent feature of node $i$ at layer $l$, $\phi_e^{(l)}$ and $\phi_v^{(l)}$ are the shared-weight MLP operators for edges and nodes, respectively, at layer $l$, and $\mathcal{N}(i)$ denotes the set of nodes connected to node $i$.

Figure 2: Overview of the proposed MMGUNet architecture. Shared-weight MLP-based operations are used for input encoding, in-graph message passing, and topology-flexible cross-graph operations, while edge-specific operations are used on fixed-topology coarsened graphs. IG denotes in-graph, DS downsampling, US upsampling, MP message passing, and ES edge-specific.

A similar message-passing layer is used for the first downsampling step, i.e., from the input fine graph to the first shared coarsened graph. Because this layer can be applied to graphs with any number of edges, it allows the model to accommodate inputs of arbitrary topology. At coarser levels, we use edge-specific downsampling and upsampling layers (DS-ES and US-ES) on the fixed-topology coarsened graphs Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]. The key idea is to assign edge-specific parameters to each channel, enabling more accurate approximation of nonlinear relationships. In an equivalent channel-wise form, the coarse-node update is

$$\mathbf{h}_i^{(l+1)} = \mathrm{LeakyReLU}\!\left(\sum_{j \in \mathcal{N}(i)} \sum_{c=1}^{C_{in}} h_j^{(c,l)}\, \mathbf{w}_{ij}^{(c,l)}\right), \tag{5}$$

where $\mathbf{w}_{ij}^{(c,l)} \in \mathbb{R}^{C_{out}}$ is the $c$-th channel component of the edge-specific, non-shareable weight for edge $e_{ij}$ at layer $l$, $C_{in}$ and $C_{out}$ are the numbers of input and output channels, respectively, and $h_j^{(c,l)}$ is the $c$-th channel of the latent feature of node $j$ at layer $l$. Compared with shared-weight message passing, this edge-specific channel-wise parameterisation substantially increases representation capacity on fixed graphs, but it also increases the trainable parameter count Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]: each edge carries its own $C_{in} \times C_{out}$ weight block, so the count grows linearly with the number of coarse-graph edges. The decoder maps the final latent node states to the displacement prediction $\hat{\mathbf{y}}_i$ using MLP-based operators similar to those in the encoder.
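The sketch below contrasts the operator types discussed in Sections 3.2 and 3.3 in NumPy: `knn_cross_edges` builds fine-to-coarse nearest-neighbour connections, `shared_weight_mp` implements the update in Eq. (4), and `edge_specific_layer` implements Eq. (5). It is an illustration under stated assumptions (two-layer ReLU MLPs, a LeakyReLU slope of 0.01, random weights, and all function names are ours), not the authors' implementation. The parameter counts at the end show why shared-weight layers are independent of graph size while edge-specific layers scale with the number of edges.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer MLP (illustrative initialisation)."""
    return (rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(x, params):
    """Two-layer ReLU MLP; one parameter set is shared by every edge or node."""
    w1, b1, w2, b2 = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def knn_cross_edges(fine_xyz, coarse_xyz, k):
    """Connect each fine node to its k nearest coarse nodes (brute-force distances).

    Returns a (2, N_fine * k) array of [fine index, coarse index] pairs.
    """
    d = np.linalg.norm(fine_xyz[:, None, :] - coarse_xyz[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.stack([np.repeat(np.arange(len(fine_xyz)), k), nearest.ravel()])

def shared_weight_mp(h, e, edge_index, phi_e, phi_v):
    """Eq. (4): m_ij = phi_e([h_i, h_j, e_ij]), h_i' = phi_v([h_i, sum_j m_ij])."""
    i, j = edge_index
    m = mlp(np.hstack([h[i], h[j], e]), phi_e)
    agg = np.zeros((h.shape[0], m.shape[1]))
    np.add.at(agg, i, m)  # sum the messages arriving at each receiving node i
    return mlp(np.hstack([h, agg]), phi_v)

def edge_specific_layer(h_src, edge_index, W, n_dst, slope=0.01):
    """Eq. (5): h_i' = LeakyReLU(sum_j sum_c h_j^(c) * w_ij^(c)).

    edge_index[0] indexes output nodes i, edge_index[1] indexes input nodes j.
    W has shape (num_edges, C_in, C_out): one non-shared weight block per edge,
    so the edge list (topology) must be identical for every sample.
    """
    i, j = edge_index
    contrib = np.einsum("kc,kco->ko", h_src[j], W)  # contract over input channels c
    out = np.zeros((n_dst, W.shape[2]))
    np.add.at(out, i, contrib)
    return np.where(out > 0, out, slope * out)  # LeakyReLU

# Toy usage on a 4-node path graph with both edge directions.
rng = np.random.default_rng(0)
edge_index = np.array([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
h = rng.normal(size=(4, 3))                       # C_in = 3 latent channels
e = rng.normal(size=(edge_index.shape[1], 4))     # 4-D edge features as in Eq. (2)
phi_e = init_mlp(rng, 3 + 3 + 4, 16, 8)
phi_v = init_mlp(rng, 3 + 8, 16, 3)
W = rng.normal(size=(edge_index.shape[1], 3, 5))  # C_out = 5

h_mp = shared_weight_mp(h, e, edge_index, phi_e, phi_v)
h_es = edge_specific_layer(h, edge_index, W, n_dst=4)
n_shared = sum(p.size for p in phi_e + phi_v)     # independent of the edge count
n_edge_specific = W.size                          # num_edges * C_in * C_out
```

Because `W` is indexed by edge position, reusing it on a mesh with a different edge list would be meaningless, which is why the coarse topology is held fixed; `shared_weight_mp`, by contrast, runs on any edge list. For large meshes, the dense distance matrix in `knn_cross_edges` would typically be replaced by a KD-tree query such as `scipy.spatial.cKDTree.query`.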
## 4 Dataset Generation

This section describes the case studies constructed for this work. We focus on B-pillar-like thin-walled components under side-impact loading, and we design the dataset to systematically evaluate the generalisability of the surrogate model to shape variations. In total, we consider seven case studies: six B-pillar cases and one U-channel case.

##### B-pillar case studies

We generate four base B-pillar shapes with increasing scale and geometric complexity. B-pillars A and B are approximately one-third of full scale. B-pillar C is a larger, more complex geometry at approximately two-thirds scale, and B-pillar D is full scale with the richest local geometric features. Figure [3](https://arxiv.org/html/2605.15231#S4.F3)(a) shows that B-pillars C and D contain more complex local geometric features, such as stiffeners in the middle and wrinkles in the lower region. B-pillars A and B are created directly in computer-aided design (CAD) software, whereas B-pillars C and D are generated through hot-stamping simulations and therefore provide more realistic component geometries. This design allows us to examine the model's ability to generalise across different component scales and levels of geometric complexity.

Figure 3: Case-study geometries and B-pillar morphing directions. (a) Base component geometries: B-pillar A, B-pillar B, B-pillar C, B-pillar D, and U-channel. (b) B-pillar design variation: top-region translations in the x- and y-directions and local morphing in the z-direction at one of three control points.

For all B-pillar cases, boundary and loading conditions are fixed, as shown in Figure [4](https://arxiv.org/html/2605.15231#S4.F4).
The FE setup simulates an experimental component-level B-pillar side crash test Zhang et al. [[2020](https://arxiv.org/html/2605.15231#bib.bib56)], in which the top and bottom regions are constrained in all degrees of freedom and a cylindrical impactor strikes the component with a fixed impact velocity at a fixed location, one-third of the component height above the bottom. FE simulations were run with Virtual Performance Solution; details can be found in Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]. Only geometry is varied, allowing us to isolate the effect of shape design on crashworthiness response. Shape variation is achieved by control-point mesh morphing in Blender, followed by remeshing in HyperMesh to avoid mesh distortion and to create a different mesh topology for each sample. In the full setting, morphing is applied in all three directions ($x, y, z$). As illustrated in Figure [3](https://arxiv.org/html/2605.15231#S4.F3)(b), the top region is translated in both the $x$ and $y$ directions: $x$ is varied in $[-5\%, 5\%]$ and $y$ in $[0, 10\%]$. The whole component is also morphed in $z$ through one of three control points over the range $[-10\%, 10\%]$. Appendix A presents extreme morphed B-pillar cases to illustrate the extent of the geometric variation considered in this study. We sample the morphing parameters with Latin Hypercube Sampling (LHS) McKay et al. [[1979](https://arxiv.org/html/2605.15231#bib.bib57)]. Beyond the four base B-pillar studies, we define two reduced-direction variants of B-pillar A: B-pillar A1 with $z$-direction morphing only and B-pillar A2 with $xz$-direction morphing, while the full $xyz$-direction case is denoted B-pillar A3. This results in six B-pillar case studies in total.
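Sampling of the B-pillar morphing parameters can be sketched with a basic NumPy LHS. This is a minimal illustration, not the authors' pipeline: the parameter names, the choice of encoding the z control point as an extra LHS column, and holding inactive directions at zero for the A1/A2 variants are all assumptions.

```python
import numpy as np

# Morphing ranges from the text, written as fractions. The reference length each
# percentage applies to is handled by the morphing tool and is not modelled here.
RANGES = {"x_top": (-0.05, 0.05), "y_top": (0.0, 0.10), "z_morph": (-0.10, 0.10)}
AXIS = {"x_top": "x", "y_top": "y", "z_morph": "z"}


def latin_hypercube(n, d, rng):
    """Basic LHS on [0, 1)^d: each column has one sample per stratum [k/n, (k+1)/n)."""
    strata = np.argsort(rng.random((n, d)), axis=0)  # independent permutation per column
    return (strata + rng.random((n, d))) / n          # uniform jitter inside each stratum


def sample_bpillar_designs(n, directions="xyz", n_control_points=3, seed=0):
    """Hypothetical sampler: directions="z" mimics A1, "xz" A2 and "xyz" A3.

    Inactive directions are held at zero (an assumption, not stated in the text).
    """
    rng = np.random.default_rng(seed)
    u = latin_hypercube(n, len(RANGES) + 1, rng)
    designs = {}
    for k, (name, (lo, hi)) in enumerate(RANGES.items()):
        if AXIS[name] in directions:
            designs[name] = lo + u[:, k] * (hi - lo)
        else:
            designs[name] = np.zeros(n)
    # the extra LHS column selects which of the z control points is moved
    designs["z_control_point"] = np.floor(u[:, -1] * n_control_points).astype(int)
    return designs
```

For example, `sample_bpillar_designs(300, seed=1)` and `sample_bpillar_designs(50, seed=2)` would give independent training and test designs of the sizes used here. SciPy's `scipy.stats.qmc.LatinHypercube` offers a library alternative to the hand-written sampler.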
The reduced-direction variants are introduced to examine the model's ability to generalise across morphing directions, for example when it is trained on smaller directional variation and evaluated on larger variation. In addition, each B-pillar has a different mesh density, as detailed in Table [1](https://arxiv.org/html/2605.15231#S4.T1). For instance, B-pillar B contains approximately twice as many mesh nodes as B-pillar A despite being defined at the same geometric scale. Examining these cases enables assessment of the model's ability to generalise across different mesh densities. Together, these B-pillar case studies allow us to systematically evaluate the model's generalisation behaviour with respect to shape, scale, mesh density, shape-variation range, and local geometric complexity. The geometric details are summarised in Table [1](https://arxiv.org/html/2605.15231#S4.T1).

Figure 4: B-pillar side-impact simulation setup. (a) Fixed boundary regions and impactor direction. (b) Example deformed B-pillar response, coloured by nodal displacement norm in millimetres.

##### U-channel case study

To further test cross-component generalisation, we include one U-channel case from the U-Channel-2 part class of the U-Channel sheet metal (UCSM) dataset Lehrer et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib44)]. Like the B-pillar, the U-channel is a stamped thin-walled sheet-metal component, so the two cases share a similar manufacturing origin and basic structural characteristics. In addition, the U-channel retains a U-shaped profile similar to the middle region of a B-pillar, which makes the comparison interpretable while still involving a distinct overall component geometry. Together, these components are related enough to make cross-component transfer physically meaningful, yet different enough to provide a stringent test of how well the model generalises beyond the original component family.
For the U-channel case, shape variation is generated by sampling CAD parameters rather than by direct morphing. Specifically, we vary 16 geometric parameters with LHS McKay et al. [[1979](https://arxiv.org/html/2605.15231#bib.bib57)], including length, width, and height controls for the addendum, slant, and mid-plane regions. This leads to a broader and less constrained design space than in the B-pillar cases. Details of the dataset parameters are provided in Appendix [A](https://arxiv.org/html/2605.15231#A1). Table [1](https://arxiv.org/html/2605.15231#S4.T1) summarises the seven case studies in terms of scale, approximate node count, variation definition, and geometric complexity. For each B-pillar case study, we generate 300 training samples and 50 test samples, with each set covering the full design space through an independent LHS design. For the U-channel case, owing to its larger design space, we use 1000 training samples and 200 test samples; the training and test sets are again sampled independently so that both cover the full design space.

Table 1: Summary of the seven case studies used in this work.

## 5 Parameterisation-based morphing with feature alignment

The purpose of the morphing procedure is to improve fine-to-coarse graph correspondence while preserving the fixed topology required by the edge-specific coarse-level layers. For each input mesh, the coarsened template graph is geometrically morphed toward the input shape before nearest-neighbour cross-graph edges are constructed. This allows the same coarse connectivity and edge-specific parameters to be retained, while reducing the risk that fine nodes are connected to geometrically inappropriate coarse regions. In this section, we introduce the morphing algorithm adopted in this study to improve geometric shape alignment between the fixed-topology hierarchical downsampled meshes and the varying input fine meshes.
The method first computes a barycentric embedding of each mesh onto a shared parameter domain (UV domain). The template mesh is then morphed by transferring the target 3D nodal coordinates through linear interpolation in the UV domain. To improve feature alignment, we further anchor the UV boundary to a case-specific polygon defined by the detected corner nodes, replacing the traditional circular boundary Floater [[1997](https://arxiv.org/html/2605.15231#bib.bib29)].

### 5.1 Barycentric embedding

The objective of this step is to map the template and target shell meshes to a shared 2D domain so that node correspondence can be established and the nodal displacement required for the shape transformation can be calculated. We follow the classic idea of Tutte's barycentric embedding theorem Tutte [[1963](https://arxiv.org/html/2605.15231#bib.bib28)], because computer-aided engineering (CAE) meshes of vehicle components are typically better structured than the arbitrary meshes encountered in computer vision. Therefore, advanced anti-distortion algorithms are not required for the meshes considered in this study. In practice, we partition each mesh into boundary and interior nodes. We then prescribe the boundary positions on a predefined convex shape, referred to here as the UV domain. The boundary nodes must be identified with a consistent ordering and orientation: the boundary loop is defined in the clockwise direction, starting from the bottom-left node of the component. We then solve for the interior node coordinates in the UV domain using uniform barycentric averaging, i.e., each interior node is placed at the mean of its neighbouring nodes. Because these interior constraints are linear, the mapping is obtained by solving a sparse linear system, yielding a unique embedded configuration for each mesh under fixed boundary conditions. This formulation is also physically interpretable as the equilibrium state of a pinned spring system.
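The embedding step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mesh is given as an undirected edge list, the boundary loop is already ordered (clockwise from the bottom-left node) and starts at a detected corner, and boundary nodes between two corners are spaced evenly by loop order along the corresponding polygon edge. The helper names `polygon_boundary` and `tutte_embedding` are hypothetical.

```python
import numpy as np


def polygon_boundary(n_boundary, anchor_pos, polygon):
    """UV positions for an ordered boundary loop on a convex polygon.

    anchor_pos : increasing loop positions of the corner nodes; assumes
                 anchor_pos[0] == 0, i.e. the loop starts at a corner
    polygon    : (K, 2) polygon vertices, one per anchor, in loop order
    """
    uv = np.zeros((n_boundary, 2))
    n_anchor = len(anchor_pos)
    for k in range(n_anchor):
        start = anchor_pos[k]
        stop = anchor_pos[k + 1] if k + 1 < n_anchor else n_boundary
        a, b = polygon[k], polygon[(k + 1) % n_anchor]
        t = np.arange(stop - start) / (stop - start)  # even spacing by loop order
        uv[start:stop] = a + t[:, None] * (b - a)
    return uv


def tutte_embedding(n_nodes, edges, boundary_loop, boundary_uv):
    """Uniform-weight (Tutte) embedding with a fixed boundary.

    Each interior node v satisfies deg(v) * u_v - sum_{w in N(v)} u_w = 0.
    """
    nbrs = [set() for _ in range(n_nodes)]
    for a, b in edges:
        nbrs[int(a)].add(int(b))
        nbrs[int(b)].add(int(a))
    uv = np.zeros((n_nodes, 2))
    uv[boundary_loop] = boundary_uv
    on_boundary = np.zeros(n_nodes, dtype=bool)
    on_boundary[boundary_loop] = True
    interior = np.flatnonzero(~on_boundary)
    row = {int(v): r for r, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    rhs = np.zeros((len(interior), 2))
    for r, v in enumerate(interior):
        A[r, r] = len(nbrs[v])
        for w in nbrs[v]:
            if on_boundary[w]:
                rhs[r] += uv[w]          # known boundary positions go to the right-hand side
            else:
                A[r, row[w]] -= 1.0
    uv[interior] = np.linalg.solve(A, rhs)
    return uv
```

Evenly spaced points on a circle passed as `boundary_uv` would reproduce the disk baseline, while four or eight anchors give the square or octagon options. For realistic mesh sizes, the dense matrix would be replaced by a sparse one, for example solved with `scipy.sparse.linalg.spsolve`.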
Further algorithmic details are provided in Appendix [B](https://arxiv.org/html/2605.15231#A2).

### 5.2 Boundary construction and feature alignment

Although the interior UV coordinates are determined by the linear system, the global correspondence pattern of the UV map is largely determined by how the boundary vertices are placed in the UV domain, for example whether they are distributed uniformly or anchored non-uniformly to specific feature locations such as corners. This matters for morphing, because interpolation in UV space implicitly assumes that semantically similar regions (e.g., corners and flanges) occupy comparable UV locations across different shapes. A generic boundary can introduce phase ambiguity, whereas feature-anchored boundaries help stabilise correspondence Kraevoy and Sheffer [[2004](https://arxiv.org/html/2605.15231#bib.bib40)], Schreiner et al. [[2004](https://arxiv.org/html/2605.15231#bib.bib41)]. A common choice is to place boundary vertices uniformly on a circular disk. This guarantees convexity and is easy to apply across diverse geometries, which is why it is widely used in standard applications of barycentric embedding Floater [[1997](https://arxiv.org/html/2605.15231#bib.bib29)]. In this work, the circular boundary is treated as a baseline for morphing-quality comparison. For vehicle-component shell meshes, the boundary geometry usually contains a set of detectable landmarks; for example, the corner nodes of a typical B-pillar or U-channel component can be detected. Therefore, instead of relying only on a circular boundary, we exploit these landmarks to improve feature alignment:

- 1\. Square option (4 anchors): when four principal corners are detected, the boundary loop is mapped to a square by assigning these corners to the square vertices, while intermediate boundary nodes are distributed along each side according to boundary-loop order.
- 2\. Octagon option (8 anchors): when eight corners are available (as in the B-pillar components), the boundary loop is mapped to a convex octagon. The eight detected corners are assigned to the octagon vertices, and the remaining boundary nodes are linearly distributed along each octagon edge.

In this study, we use the octagon boundary for the B-pillar case studies and the square boundary for the U-channel case study. This geometric alignment step improves UV-domain consistency for downstream interpolation Kraevoy and Sheffer [[2004](https://arxiv.org/html/2605.15231#bib.bib40)], Schreiner et al. [[2004](https://arxiv.org/html/2605.15231#bib.bib41)]. Figure [5](https://arxiv.org/html/2605.15231#S5.F5) illustrates the three boundary embeddings (disk, square, and octagon).

Figure 5: Feature-aligned UV parameterisation of a representative B-pillar mesh. (a) Graph of the representative B-pillar, with detected corner nodes highlighted in red. (b) to (d) Corresponding UV-domain graphs mapped onto different convex boundary domains: unit disk, square, and octagon, respectively.

### 5.3 Shape morphing via UV domain interpolation

After mapping the template and target meshes to a common UV domain, we establish node-wise correspondence in UV space and compute the new 3D position of each template node by linear interpolation of the coordinates of the corresponding nodes on the target mesh. In other words, the updated coordinates of each template node are interpolated from the target-mesh coordinates. Because only nodal coordinates change, mesh connectivity, and hence topology, is preserved throughout morphing.
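The UV-domain interpolation described above can be sketched in NumPy. The sketch assumes the target mesh is triangulated in UV (quadrilateral shell elements would first be split into two triangles) and uses a brute-force search for the enclosing element; `barycentric_weights` and `morph_template` are hypothetical helper names.

```python
import numpy as np


def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    lam_b = (d11 * d20 - d01 * d21) / denom
    lam_c = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - lam_b - lam_c, lam_b, lam_c])


def morph_template(template_uv, target_uv, target_xyz, target_tris):
    """Place every template node at the 3D point interpolated from the target
    element that encloses it in the shared UV domain (brute-force search)."""
    morphed = np.empty((len(template_uv), 3))
    for i, s in enumerate(template_uv):
        best_tri, best_lam = None, None
        for tri in target_tris:
            lam = barycentric_weights(s, *target_uv[tri])
            if best_lam is None or lam.min() > best_lam.min():
                best_tri, best_lam = tri, lam
            if lam.min() >= 0.0:  # s lies inside (or on an edge of) this element
                break
        # if round-off leaves s just outside every element, the element with
        # the least negative weight is used
        morphed[i] = best_lam @ target_xyz[best_tri]
    return morphed
```

Because barycentric interpolation reproduces affine functions exactly, a planar target is recovered without error, which gives a simple sanity check. For real meshes, a spatial index over the target elements would replace the nested loop.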
Let $\Phi_{\mathrm{temp}}: V_{\mathrm{temp}}\rightarrow\mathbb{R}^{2}$ denote the parameterisation map of the template mesh, which assigns each template node its UV coordinate in the common parameter domain. For template node $i$, the UV coordinate is therefore

$$\boldsymbol{s}_{i}^{\mathrm{temp}}=\Phi_{\mathrm{temp}}(i). \tag{6}$$

Next, let $T(\boldsymbol{s}_{i}^{\mathrm{temp}})$ denote the target UV element that contains the point $\boldsymbol{s}_{i}^{\mathrm{temp}}$. If the vertices of this target-space element are indexed by $q$ and the barycentric weights of $\boldsymbol{s}_{i}^{\mathrm{temp}}$ with respect to that element are $\lambda_{q}$, the corresponding target-space coordinate is obtained by barycentric interpolation:

$$\tilde{\mathbf{x}}_{i}^{\mathrm{tgt}}=\sum_{q\in T(\boldsymbol{s}_{i}^{\mathrm{temp}})}\lambda_{q}\,\mathbf{x}_{q}^{\mathrm{tgt}}. \tag{7}$$

This makes the interpolation step explicit: $\Phi_{\mathrm{temp}}$ maps the template node into the common UV domain, $T(\cdot)$ identifies the enclosing target UV element, and the barycentric weights then interpolate the 3D target coordinates of that element. The morphing can be visualised by calculating intermediate morphing steps. Let $\mathbf{x}_{i}^{\mathrm{temp}}\in\mathbb{R}^{3}$ and $\tilde{\mathbf{x}}_{i}^{\mathrm{tgt}}\in\mathbb{R}^{3}$ denote the coordinates of node $i$ on the original and morphed template meshes, respectively.
For an intermediate interpolation parameter $\alpha\in[0,1]$, the morphed coordinate is defined as

$$\mathbf{x}_{i}(\alpha)=(1-\alpha)\,\mathbf{x}_{i}^{\mathrm{temp}}+\alpha\,\tilde{\mathbf{x}}_{i}^{\mathrm{tgt}}, \tag{8}$$

where $\alpha=0$ recovers the template and $\alpha=1$ gives the morphed template coordinate $\tilde{\mathbf{x}}_{i}^{\mathrm{tgt}}$. In the visual comparison in Figure [6](https://arxiv.org/html/2605.15231#S5.F6), we show four evenly spaced interpolation states from the smaller B-pillar A configuration to the larger B-pillar D configuration, obtained by sampling $\alpha$ uniformly from 0 to 1 to illustrate the full transition. We compare the four-step morphing trajectories obtained from circular and octagonal boundary parameterisation; for clearer visualisation, different regions of the component are labelled with different colours. The colour distribution reveals clear distortion of the morphed shape under circular mapping, indicating inconsistent feature alignment between template and target. Such misalignment can cause coarse nodes from one region to connect to inappropriate fine-level nodes during cross-graph edge construction, thereby reducing the effectiveness of message passing. By contrast, octagonal mapping provides feature-anchored boundary alignment through consistent corner indexing, which removes the arbitrary phase ambiguity of the circular domain. As a result, nodes are morphed to their correct target locations (e.g., top-region nodes remain in the top region), so cross-graph edges constructed from spatial proximity are physically meaningful and support effective message passing.

Figure 6: Four-step visualisation of template-to-target mesh morphing in a shared UV domain using (a) circular boundary mapping and (b) octagonal feature-aligned boundary mapping.
The circular mapping exhibits a visible phase mismatch between corresponding regions, whereas the octagonal mapping preserves feature alignment and produces smoother template-to-target transitions.

## 6 Training strategy

We train the model in two stages: masked pretraining followed by parameter-efficient fine-tuning. In the masked pretraining stage, a subset of nodes and their associated edges is randomly masked, and the model is trained to predict the full response from the partially observed graph. In the subsequent fine-tuning stage, the pretrained model is adapted to the target task while only a restricted subset of parameters is updated. The motivation for combining these two stages is twofold. First, models with complex hierarchical architectures tend to have an increased risk of overfitting Fortunato et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib25)], Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)], and prior masked representation learning studies show that randomly masking nodes acts as a strong stochastic regulariser, improving generalisability and transferability Hou et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib33), [2023](https://arxiv.org/html/2605.15231#bib.bib34)]. Second, the edge-specific downsampling layers introduce a large number of trainable parameters and therefore a high training cost. Masking reduces the number of active nodes and edges processed per iteration, which lowers computation and accelerates training. The parameter-efficient fine-tuning policy further limits overfitting and enables more efficient training.

### 6.1 Masked pretraining

In contrast to masked graph autoencoder frameworks and MeshMask-style pretraining pipelines that optimise separate encoder and decoder networks, we train a single architecture throughout both masked pretraining and fine-tuning. We pretrain the model in a supervised manner, with the full displacement field as the model output.
In each training iteration, we apply a random mask to a subset of graph nodes and remove their associated edges from message passing. The masking is propagated across the first downsampling layer: when a fine-level node is masked, its fine-graph edges are masked, and so are the corresponding fine-to-coarse cross-graph edges. This design follows the general pipeline used in prior masked graph pretraining studies Hou et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib33)], Garnier et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib35)], adapted to our hierarchical graph setting. Masked nodes nevertheless remain in the output graph after upsampling, and the supervised loss is computed on all nodes.

To preserve physically critical constraints, we define a protected node set that is never masked. This set includes (i) nodes constrained by boundary conditions and (ii) nodes in the contact region with the impactor. Masking is therefore applied only to nodes whose motion is not directly prescribed by supports or immediate contact constraints. This constrained masking policy prevents the model from discarding the key boundary and contact information that governs the global deformation response. The mask ratio is a predefined hyperparameter and is fixed during training for a given experiment. Ablation results and a sensitivity analysis with respect to the mask ratio are provided in Section [7.1.2](https://arxiv.org/html/2605.15231#S7.SS1.SSS2). The term pretraining in this work is not restricted to any single case study or downstream task; the model can be pretrained on any set of structurally similar case studies and then fine-tuned for the target application.
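The constrained masking policy can be sketched with NumPy index arrays. This is a minimal illustration with hypothetical helper names; in particular, it assumes the mask ratio is applied to the unprotected nodes only, which the text does not specify. Masked nodes are only removed from message passing here; they would still be predicted and included in the loss.

```python
import numpy as np


def sample_node_mask(n_nodes, protected, mask_ratio, rng):
    """Random node mask (True = masked) that never touches protected nodes.

    Assumption: mask_ratio is applied to the unprotected nodes only.
    """
    candidates = np.setdiff1d(np.arange(n_nodes), protected)
    n_mask = int(round(mask_ratio * len(candidates)))
    masked = np.zeros(n_nodes, dtype=bool)
    masked[rng.choice(candidates, size=n_mask, replace=False)] = True
    return masked


def drop_masked_fine_edges(edges, masked):
    """Fine-graph edges survive only if neither endpoint is masked."""
    keep = ~(masked[edges[:, 0]] | masked[edges[:, 1]])
    return edges[keep]


def drop_masked_cross_edges(cross_edges, masked):
    """Fine-to-coarse edges, stored as (fine, coarse) pairs, are dropped for masked fine nodes."""
    return cross_edges[~masked[cross_edges[:, 0]]]
```

Here `protected` would hold the boundary-condition and impactor-contact node indices, and a fresh mask would be drawn each iteration with the 20% ratio selected in the ablation.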
Pretraining can therefore be carried out in a single-case setting, in which pretraining and fine-tuning are conducted on the same case study, or in a cross-case setting, in which the model is pretrained on one or more related case studies and subsequently fine-tuned on other, unseen cases. In this sense, pretraining serves as a general representation-learning stage that is decoupled from a specific evaluation task. The cross-case protocols are defined in more detail in Section [6.4](https://arxiv.org/html/2605.15231#S6.SS4).

### 6.2 Parameter-efficient fine-tuning

After pretraining, the model is fine-tuned on unmasked graphs using a parameter-efficient update policy. Specifically, we freeze all edge-specific downsampling and upsampling layers, as well as the coarse-level IG-MP layer, and update only the remaining modules. The main rationale is that the edge-specific downsampling and upsampling layers primarily encode how information is transferred across graph levels in the hierarchy. Because these layers operate on the shared coarsened graph topology, the learned inter-level propagation pattern is reusable across components that follow the same downsampling strategy. We also freeze the coarse-level IG-MP layer because the considered components belong to the same structural family and are subject to the same boundary conditions; under this setting, the coarse-scale relationship between input geometry and output deformation is expected to remain largely consistent across cases. By contrast, the main source of variation lies in the input shape, which is encoded mainly through the encoder, the first fine-level IG-MP layer, and the DS-MP layer. These parts are therefore left trainable during fine-tuning, so that the pretrained model can adapt to new geometries while preserving the reusable multiscale propagation patterns learned during pretraining. This policy substantially reduces the number of trainable parameters.
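The freezing policy amounts to a name-based split of the model's modules. The framework-free sketch below uses hypothetical module names (`ds_es_*`, `us_es_*`, `ig_mp_coarse`, and so on) that are illustrative only and not taken from the authors' code; the toy parameter counts in the usage line are likewise invented and unrelated to the reported model.

```python
# Hypothetical module names for a Graph U-Net laid out as described above.
FROZEN_PREFIXES = ("ds_es", "us_es", "ig_mp_coarse")


def split_for_finetuning(param_counts, frozen_prefixes=FROZEN_PREFIXES):
    """Split modules into trainable and frozen groups and report the trainable share.

    param_counts maps a module name to its number of parameters.
    """
    frozen = [name for name in param_counts if name.startswith(frozen_prefixes)]
    trainable = [name for name in param_counts if name not in frozen]
    share = sum(param_counts[name] for name in trainable) / sum(param_counts.values())
    return trainable, frozen, share


# toy example with invented counts
toy = {"encoder": 10, "ig_mp_fine": 20, "ds_mp": 10, "ds_es_1": 500,
       "ds_es_2": 400, "ig_mp_coarse": 50, "us_es_1": 400, "decoder": 10}
trainable, frozen, share = split_for_finetuning(toy)
```

In a PyTorch implementation, the frozen group would typically be disabled with `module.requires_grad_(False)`, and only parameters that still require gradients would be passed to the optimiser.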
In an example architecture used in this work, the full model contains 37.02M parameters, of which only 0.18M remain trainable during fine-tuning, corresponding to approximately 0.48% of the full parameter count. As a result, fine-tuning is more efficient in both training time and GPU memory consumption while still allowing task-specific adaptation.

### 6.3 Training details

Unless otherwise stated, we use a batch size of 4 and a learning rate of $4\times 10^{-4}$. The encoded fine-level hidden channel dimension is set to 32, and each DS-ES layer doubles the channel dimension so that the coarse-level hidden channel dimension reaches 128. We use two fine-level message-passing steps and 15 coarse-level message-passing steps. The model outputs node-wise 3D displacement fields after impact, and training is supervised using the mean squared error (MSE) over all nodal displacement components. For evaluation, we report two metrics. The first is the mean nodal Euclidean distance (MED), which measures how well the overall deformed shape is predicted:

$$\mathrm{MED}=\frac{1}{N_{v}}\sum_{i=1}^{N_{v}}\lVert\hat{\mathbf{y}}_{i}-\mathbf{y}_{i}\rVert_{2}, \tag{9}$$

where $\mathbf{y}$ and $\hat{\mathbf{y}}$ denote the ground-truth and predicted nodal displacement fields, respectively, and $N_{v}$ is the number of nodes.
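The training loss and MED aggregate the same nodal errors differently. The minimal NumPy sketch below (function names are hypothetical) assumes displacement arrays of shape `(N_v, 3)`.

```python
import numpy as np


def mse_loss(y_pred, y_true):
    """Training loss: mean squared error over all nodes and all three components."""
    return float(np.mean((y_pred - y_true) ** 2))


def mean_euclidean_distance(y_pred, y_true):
    """Eq. (9): average over the N_v nodes of the per-node Euclidean error norm."""
    return float(np.mean(np.linalg.norm(y_pred - y_true, axis=1)))
```

For two nodes, one with error vector (3, 4, 0) mm and one exact, MED is (5 + 0) / 2 = 2.5 mm, whereas MSE is (9 + 16) / 6 ≈ 4.17 mm². MED therefore keeps the units of the displacement field, which makes it the more interpretable reporting metric.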
The second is the percentage error of maximum intrusion (MIPE), which measures the relative error in the peak absolute intrusion magnitude along the $z$ direction:

$$I(\mathbf{y})=\max_{i}\left|y_{i,z}\right|, \tag{10}$$

$$\mathrm{MIPE}(\%)=\frac{\left|I(\hat{\mathbf{y}})-I(\mathbf{y})\right|}{I(\mathbf{y})+\epsilon}\times 100\%, \tag{11}$$

where $y_{i,z}$ denotes the $z$-component of the nodal displacement at node $i$, $I(\cdot)$ denotes the maximum absolute intrusion over all nodes, and $\epsilon$ is a small positive constant added for numerical stability. In this study, the $z$-axis is aligned with the intrusion direction used for evaluating the terminal displacement response.

### 6.4 Generalisability study training strategy

To evaluate different aspects of model generalisation, we train the model under four protocols using different combinations of case studies. The first protocol is *single-case* training, in which the model is trained using only the target case. This setting evaluates model performance on a specific case study; all ablation studies and baseline comparisons are conducted under this protocol. The second protocol is *all cases*, in which the model is trained using all available case studies, including the target case. This setting is used to assess whether additional training cases can improve predictive accuracy on the target case. The third protocol is *all but target*, in which the model is trained using all available source cases except the target case. This provides a direct test of the model's ability to generalise to unseen out-of-distribution cases. The fourth protocol is *transfer learning*, in which the model is first pretrained on the source cases and then fine-tuned using the target case only. For the first three protocols, the models are trained for 2000 epochs.
When masked pretraining and parameter-efficient fine-tuning are applied, even for these non-transfer protocols, the model is pretrained for 1000 epochs and then fine-tuned for an additional 1000 epochs on the same dataset split. For the transfer learning protocol, the models are pretrained for 1000 epochs on the source cases and then fine-tuned for 1000 epochs on the target case. For the B-pillar studies, we define four trial groups to evaluate different generalisation regimes:

- 1\. Trial A (design-space generalisation): the target case is B-pillar A3, while the source cases are B-pillars A1 and A2. This evaluates generalisation to a wider design space (morphing range).
- 2\. Trial B (shape generalisation): the target case is B-pillar B, while the source cases are B-pillars A1, A2, and A3. These cases are at a similar scale but differ in geometry, thereby testing generalisation to an unseen shape as well as to a different mesh density.
- 3\. Trial C (scale/complexity generalisation I): the target case is B-pillar C, while the source cases include all other B-pillar cases. This evaluates generalisation to differences in scale and geometric complexity.
- 4\. Trial D (scale/complexity generalisation II): the target case is B-pillar D, while the source cases include all other B-pillar cases. This further evaluates generalisation to the full-scale and most geometrically complex case.

Table 2: Summary of the trial settings used in the generalisability and transfer studies. \* B-pillar A includes A1, A2, and A3.

Table [2](https://arxiv.org/html/2605.15231#S6.T2) summarises all trial settings considered in this study. These trials are designed to evaluate the model's cross-case generalisability under different forms of distribution shift, and their quantitative results are presented in Section 7.3. We additionally consider cross-component transfer learning using the U-channel case.
In this setting, the model is pretrained on the B-pillar cases and then fine-tuned on the U-channel case, thereby evaluating transfer across component families. For this trial, the B-pillar and U-channel cases share a common coarse graph hierarchy, so the same coarse-level topology and associated edge-specific parameters are retained during transfer. The coarse-level node coordinates are morphed to the target U-channel geometry, using the square UV domain, before the cross-graph edges are constructed. This is referred to as Trial E in later sections. Together, these settings provide a structured comparison of in-distribution training, out-of-distribution generalisation, and transfer learning across variation range, shape/mesh density, scale/complexity, and component type.

## 7 Results and Discussion

In this section, we evaluate the predictive performance and generalisation capability of the proposed method across the B-pillar and U-channel case studies. We first perform a series of ablation experiments to identify suitable design choices and hyperparameters for the proposed architecture and training strategy. We then compare MMGUNet with representative baseline methods under *single-case* training. After that, we study the out-of-distribution and cross-case generalisation performance of MMGUNet. The main finding is that the proposed combination of morphing-based multiscale graph modelling, masked pretraining, and parameter-efficient fine-tuning improves accuracy while also strengthening generalisation to unseen shape variations and transfer settings.

### 7.1 Ablation study

#### 7.1.1 Positional encoding

As discussed in Section [3.2](https://arxiv.org/html/2605.15231#S3.SS2), geometric information is primarily encoded through edge features, while node features are optional. In this ablation, we evaluate whether adding positional node features improves prediction quality.
Performance is assessed on B-pillar A3 with Morph-GUNet (MGUNet) without masking, using MED and MIPE. We compare four positional-encoding components:

- 1\. Zeros: node features are initialised to zeros (no explicit positional encoding).
- 2\. One-hot (1H): one-hot node-type encoding with three classes: boundary nodes (fixed at the top/bottom boundaries), contact nodes (in contact with the impactor), and free nodes (all remaining nodes).
- 3\. Laplacian eigenvector (LE): the first 16 nontrivial Laplacian eigenvectors of the input graph Dwivedi and Bresson [[2021](https://arxiv.org/html/2605.15231#bib.bib45)].
- 4\. Distance to critical regions (DTC): Euclidean distances from each node to the critical regions (the closest fixed boundary and the impactor-contact region).

Figure [7](https://arxiv.org/html/2605.15231#S7.F7) illustrates representative positional encodings used in this study: Figure [7](https://arxiv.org/html/2605.15231#S7.F7)(a) shows the first five Laplacian eigenvectors (LE) of an example component, Figure [7](https://arxiv.org/html/2605.15231#S7.F7)(b) shows the one-hot node-type encoding (1H), and Figure [7](https://arxiv.org/html/2605.15231#S7.F7)(c) shows the two distance-to-critical-region channels (DTC). For DTC, the value assigned to each node is its distance to the nearest node belonging to the corresponding critical region.

Figure 7: Illustration of positional encodings: (a) the first five Laplacian eigenvectors (LE); (b) one-hot node types (1H), where the three subfigures correspond to free nodes, boundary nodes, and contact nodes, respectively; and (c) distance-to-critical-region channels (DTC), where the two subfigures correspond to distance to the loading region and distance to the boundary, respectively.

These encoding components are expected to provide complementary information to the model.
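The three non-trivial encodings can be sketched with NumPy. This is a minimal illustration, not the authors' code: the class order of the one-hot encoding, the dense pairwise distances, and the use of the symmetric normalised Laplacian for LE are assumptions (the text does not state which Laplacian is used). Note that `np.linalg.eigh` returns each eigenvector with an arbitrary sign, which is exactly the sign ambiguity discussed below.

```python
import numpy as np

FREE, BOUNDARY, CONTACT = 0, 1, 2  # class order is an assumption


def one_hot_node_type(n_nodes, boundary, contact):
    """1H: three-class one-hot encoding (free / boundary / contact)."""
    cls = np.full(n_nodes, FREE)
    cls[boundary] = BOUNDARY
    cls[contact] = CONTACT
    return np.eye(3)[cls]


def distance_to_critical(xyz, contact, boundary):
    """DTC: distance from each node to the nearest contact node and nearest boundary node."""
    def nearest(region):
        d = np.linalg.norm(xyz[:, None, :] - xyz[region][None, :, :], axis=-1)
        return d.min(axis=1)
    return np.stack([nearest(contact), nearest(boundary)], axis=1)


def laplacian_pe(n_nodes, edges, k):
    """LE: eigenvectors of I - D^-1/2 A D^-1/2 for the k smallest nonzero eigenvalues.

    Assumes a connected graph without isolated nodes.
    """
    A = np.zeros((n_nodes, n_nodes))
    A[edges[:, 0], edges[:, 1]] = 1.0
    A[edges[:, 1], edges[:, 0]] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(n_nodes) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:k + 1]      # drop the trivial eigenvector
```

The 1H + DTC configuration would then be `np.hstack([one_hot_node_type(...), distance_to_critical(...)])`. For large meshes, a KD-tree such as `scipy.spatial.cKDTree` would replace the dense distance matrix.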
LE provides an intrinsic description of the graph structure by representing each node in a low-frequency spectral basis of the mesh connectivity. This can help the model distinguish nodes according to their global topological location, especially when local edge features alone are insufficient to identify long-range structural context. The one-hot node-type encoding explicitly informs the model of the physical role of each node under the prescribed loading and boundary conditions, allowing it to distinguish constrained, loaded, and freely deforming regions. DTC further introduces a problem-specific geometric descriptor by quantifying each node's proximity to the main source of deformation and to the constrained supports. This is expected to help the model learn spatially varying crash responses, such as local indentation near the impactor and the decay of deformation towards the fixed boundaries. Overall, these positional features are intended to complement the edge-based geometric representation by combining intrinsic graph position, boundary/loading semantics, and proximity to critical regions. Table [3](https://arxiv.org/html/2605.15231#S7.T3) summarises the results for different positional-encoding combinations. Although LE improves performance compared with Zeros and 1H, combining LE with 1H + DTC leads to worse results than using 1H + DTC alone. Several factors may explain this. First, Laplacian eigenvectors are subject to sign ambiguity, since each eigenvector can be multiplied by $-1$ while representing the same spectral mode. As a result, the same geometric region may receive inconsistent signs across different samples, making it harder for the model to learn a stable correspondence between the spectral channels and the physical deformation response. Second, when the component geometry varies across samples, the ordering and spatial pattern of higher-order eigenvectors may also change, particularly when eigenvalues are close.
This can introduce sample-dependent variations in the LE channels that are unrelated to the crash response. Finally, LE increases the node-feature dimension, which adds computational cost and may make optimisation more difficult under the same training budget. In contrast, the 1H + DTC setting provides more directly interpretable, problem-specific information: the node type identifies the loading and boundary regions, while the distance features describe each node's proximity to these critical regions. This configuration achieves the best performance on both metrics and is therefore used as the default node-feature configuration in the rest of this paper.

Table 3: Positional-encoding ablation on B-pillar A3. Best values are shown in bold.

#### 7.1.2 Mask ratio

To quantify the effect of masking, we conduct a mask-ratio ablation in which the MGUNet model is trained with mask ratios from 0% to 80%, using the B-pillar A3 case. Figure [8](https://arxiv.org/html/2605.15231#S7.F8) reports the corresponding MED and MIPE.

Figure 8: Mask-ratio ablation: (a) MED (mm) versus mask ratio and (b) MIPE (%) versus mask ratio.

The results show that moderate masking improves generalisation compared with no masking (0%), with the best performance in the lower masking range: MED reaches its minimum at 20%, while MIPE is lowest at 10% and remains competitive at 20%. As the mask ratio increases further, both metrics degrade, indicating that removing too much input information harms displacement prediction quality. Considering the trade-off across both metrics, we select a 20% mask ratio for all subsequent experiments in this paper.

### 7.2 Baseline comparisons

We benchmark MMGUNet on two case studies, *B-pillar A3* and *U-channel*, comparing six models in total. These case studies are chosen to cover two complementary levels of geometric difficulty.
As summarised in Table [6](https://arxiv.org/html/2605.15231#A1.T6), B-pillar A3 is a relatively simple case with lower shape complexity but still sufficiently large shape variation. The U-channel is a more challenging case with the largest design-space variation, so samples differ more strongly from one another. The three external baseline models are MGN (Pfaff et al. [[2021](https://arxiv.org/html/2605.15231#bib.bib3)]), MS-MGN (Fortunato et al. [[2022](https://arxiv.org/html/2605.15231#bib.bib25)]), and Multigrid (Garnier et al. [[2024](https://arxiv.org/html/2605.15231#bib.bib46)]). For MGN, the model hyperparameters follow the settings suggested in the original paper. MS-MGN is implemented as a hierarchical variant with three downsampling layers to provide a fair comparison with our multiscale setting. Multigrid is implemented as a W-cycle model with self-attention pooling, again using the parameter settings suggested in the original work.

The remaining three models serve as model ablations: GUNet-fix, MGUNet, and the final MMGUNet. GUNet-fix constructs cross-graph edges directly from the shared coarsened graphs without morphing, so the coarsened graphs generally remain geometrically mismatched with the input graphs. MGUNet instead establishes the cross-graph edges after morphing the coarsened graphs to match the shape of each input graph. MMGUNet further extends MGUNet with masked pretraining and parameter-efficient fine-tuning. The comparison therefore isolates the two proposed mechanisms for handling large geometric variation. MGUNet tests whether morphing the coarse hierarchy improves geometric correspondence while preserving fixed topology, and thereby improves generalisation without retraining. MMGUNet then tests whether masked supervised pretraining further improves robustness and transferability.
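The only difference between GUNet-fix and MGUNet is where the coarse nodes sit before the cross-graph edges are built. A minimal sketch of that morphing step follows, under explicit assumptions that are not taken from the paper: the input mesh already carries a feature-aligned 2-D parameterisation (for example a Tutte- or Floater-style embedding with boundary and feature landmarks pinned), each coarse node stores a $(u, v)$ coordinate in that shared parameter domain, and fine-to-coarse edges link each fine node to its $k$ nearest morphed coarse nodes. The function names and the nearest-neighbour edge rule are hypothetical stand-ins.

```python
import numpy as np
from scipy.spatial import cKDTree


def barycentric_coords(p, a, b, c):
    """Barycentric coordinates of 2-D point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    lam_b = (d11 * d20 - d01 * d21) / denom
    lam_c = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - lam_b - lam_c, lam_b, lam_c])


def morph_coarse_nodes(coarse_uv, mesh_uv, mesh_xyz, triangles, tol=1e-9):
    """Map coarse nodes from the shared (u, v) domain onto one input mesh.

    Each coarse node is located in the parameterised triangle that contains it,
    and its 3-D position is the barycentric blend of that triangle's vertices.
    """
    morphed = np.empty((len(coarse_uv), 3))
    for i, p in enumerate(coarse_uv):
        for tri in triangles:  # brute-force point location, adequate for a sketch
            lam = barycentric_coords(p, *mesh_uv[tri])
            if np.all(lam >= -tol):
                morphed[i] = lam @ mesh_xyz[tri]
                break
        else:
            raise ValueError(f"coarse node {i} lies outside the parameter domain")
    return morphed


def cross_graph_edges(fine_xyz, coarse_xyz, k=3):
    """Fine-to-coarse edges linking each fine node to its k nearest morphed coarse nodes.

    Returns a (2, n_fine * k) integer array of [fine index, coarse index] pairs.
    """
    _, idx = cKDTree(coarse_xyz).query(fine_xyz, k=k)
    idx = np.asarray(idx).reshape(len(fine_xyz), -1)
    senders = np.repeat(np.arange(len(fine_xyz)), idx.shape[1])
    return np.stack([senders, idx.ravel()])
```

In this sketch only the coarse node positions change from sample to sample; the coarse connectivity is untouched, which is what lets edge-specific layers keep a fixed set of weights while the cross-graph edges follow each geometry.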
Table [4](https://arxiv.org/html/2605.15231#S7.T4) shows that MMGUNet achieves the highest overall prediction accuracy on both cases, and Figure [9](https://arxiv.org/html/2605.15231#S7.F9) shows that it attains the lowest mean error on the B-pillar case with a comparatively tight error spread. These results can be interpreted through the trade-off between message-propagation capacity and generalisation. MGN exhibits a relatively small train–test gap, but its absolute performance remains limited, likely because too few message-passing steps on the fine graph restrict how far information can propagate across the component. By contrast, MS-MGN, Multigrid, and GUNet-fix partially relieve this bottleneck through multiscale pathways, but these richer mechanisms also make the models more prone to overfitting, leading to a larger train–test discrepancy. For GUNet-fix in particular, the problem is compounded by the use of fixed downsampled graphs: when shape variation is large, the resulting cross-graph edges become geometrically mismatched, which weakens information propagation between graph levels and degrades test performance (Li et al. [[2026](https://arxiv.org/html/2605.15231#bib.bib5)]). Comparing GUNet-fix and MGUNet, the benefit of coarsened-graph morphing is clear: MGUNet achieves substantially lower errors by establishing more meaningful cross-graph edges. However, MGUNet still exhibits a relatively large generalisation gap, suggesting that improved cross-graph alignment alone is insufficient to prevent overfitting. Relative to MGUNet, MMGUNet further reduces both the median and mean errors while maintaining compact interquartile ranges, indicating gains not only in average accuracy but also in robustness across samples.
This improvement is likely because masked pretraining regularises representation learning, while parameter-efficient fine-tuning restricts task adaptation to a small subset of parameters; together they reduce the tendency to overfit while preserving the transferable multiscale propagation patterns learned during pretraining.

Table 4: Performance metrics on the B-pillar and U-channel case studies (lower is better). Results are reported as mean ± standard deviation, and best values are shown in bold.

Figure 9: Baseline comparison on the B-pillar A3 case study. (a) Distribution of MED (mm). (b) Distribution of MIPE (%). For each model, grey and teal boxplots denote training and test distributions, respectively.

Figure 10: Baseline comparison on the U-channel case study. (a) Distribution of MED (mm). (b) Distribution of MIPE (%). Training/test splits are shown by grey/teal boxplots.

The U-channel case is more challenging, with larger shape variation and a greater deformation scale than B-pillar A3, so all methods show larger absolute errors in Figure [10](https://arxiv.org/html/2605.15231#S7.F10). However, the ranking remains similar. The external baselines show the highest mean prediction errors and the broadest tails, indicating limited robustness. Among the ablations, MGUNet generalises better than GUNet-fix, and MMGUNet further shifts the test distributions downward with comparatively tighter dispersion, especially for maximum intrusion prediction. Across both the B-pillar A3 and U-channel case studies, MMGUNet therefore achieves the lowest test errors among the evaluated models, together with a comparatively small train–test discrepancy.
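The two ingredients credited for this behaviour, node masking during pretraining and freezing of the edge-specific layers during fine-tuning, can be sketched without committing to a deep-learning framework. This NumPy version is an assumption-laden illustration rather than the authors' training code: masked nodes have their input features overwritten by a constant token (zeros here), 20% of nodes are masked per sample as selected in Section 7.1.2, and edge-specific parameters are identified by a hypothetical name prefix `edge_specific` so that fine-tuning updates only the remaining parameter groups. Whether the supervised loss covers all nodes or only masked ones is not stated in this section, so the returned mask is left for the caller to use.

```python
import numpy as np


def mask_nodes(x, mask_ratio=0.2, rng=None, mask_token=0.0):
    """Hide a random fraction of nodes by overwriting their input features.

    x: (n_nodes, n_features) node-feature matrix; the input is not modified.
    Returns the masked copy and a boolean array that is True for masked nodes.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_nodes = x.shape[0]
    n_masked = int(round(mask_ratio * n_nodes))
    mask = np.zeros(n_nodes, dtype=bool)
    mask[rng.choice(n_nodes, size=n_masked, replace=False)] = True
    x_masked = x.copy()
    x_masked[mask] = mask_token
    return x_masked, mask


def trainable_names(params, frozen_prefixes=("edge_specific",)):
    """Names of parameters still updated during fine-tuning (frozen groups excluded)."""
    return {name for name in params if not name.startswith(frozen_prefixes)}


def sgd_step(params, grads, trainable, lr=1e-3):
    """Plain SGD update applied only to trainable parameters; frozen ones are returned unchanged."""
    return {name: ((p - lr * grads[name]) if name in trainable else p)
            for name, p in params.items()}


# Usage with hypothetical parameter names: only "node_mlp.weight" is updated.
rng = np.random.default_rng(0)
x_masked, mask = mask_nodes(np.ones((100, 5)), mask_ratio=0.2, rng=rng)
params = {"edge_specific.down0.weight": np.ones((64, 64)), "node_mlp.weight": np.ones((5, 64))}
grads = {name: np.ones_like(p) for name, p in params.items()}
params = sgd_step(params, grads, trainable_names(params))
```

Because the large edge-specific blocks keep their pretrained values, the number of parameters adapted to a new target case stays small, which is the mechanism the text links to reduced overfitting.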
### 7.3 Generalisability and transferability across cases

This subsection presents the generalisation results of MMGUNet across the four trial groups under the four training protocols defined in Section [6](https://arxiv.org/html/2605.15231#S6). We report trial-wise comparisons to evaluate how different training protocols affect cross-case generalisation performance. Figure [11](https://arxiv.org/html/2605.15231#S7.F11) compares the training outcomes across all four B-pillar trials (A–D) using MED and MIPE; in each trial, all four training protocols are shown for a consistent comparison.

Figure 11: Trial-wise training-strategy comparison: (a) MED and (b) MIPE. Each trial reports the four training protocols for direct comparison. Broken y-axes are used where necessary to display the substantially larger *all-but-target* errors.

As shown in Figure [11](https://arxiv.org/html/2605.15231#S7.F11), a consistent pattern emerges. Direct out-of-distribution generalisation (*all-but-target*) remains challenging, especially in trials with larger distribution shifts (Trials B–D). The *all-cases* protocol yields performance broadly comparable to target-only training: it provides small improvements in some trials and similar accuracy in others, showing that the model can be trained jointly on multiple related cases while maintaining strong predictive performance across them. For the transfer-learning protocol, models are fine-tuned with 300 target samples to keep the comparison with the other training protocols fair. Transfer learning consistently improves robustness and achieves better or at least comparable accuracy relative to target-only training. For simpler tasks such as Trial A, where the target case is the same component as the source cases, transfer learning achieves better prediction accuracy than all other training strategies.
For trials with larger distribution shifts, transfer learning yields slightly higher but comparable prediction errors relative to target-only and all-cases training. When the target problem belongs to a similar component family and shares comparable simulation conditions, boundary conditions, and material modelling assumptions, the pretrained model can be reused effectively. In this regime, the transfer strategy is also more training-efficient, requiring fewer optimisation steps and less wall-clock training time to reach similar or better performance.

We further analyse the data efficiency of transfer learning for varying fine-tuning-set sizes. Figure [12](https://arxiv.org/html/2605.15231#S7.F12) shows an example result for Trial B, comparing the test-error distributions of pretrained and non-pretrained models trained with different numbers of samples. As defined in Section [6.4](https://arxiv.org/html/2605.15231#S6.SS4), the pretrained transfer-learning model is first pretrained for 1000 epochs and then fine-tuned for a further 1000 epochs, whereas the no-pretrain model is trained from scratch for 1000 epochs only, without masking-based pretraining or parameter freezing. The no-pretrain setting should therefore not be interpreted as equivalent to the *single-case* training protocol. This comparison evaluates target-data efficiency rather than equal total training compute, because the pretrained model benefits from a reusable trained backbone. The key result is that, once the backbone has been pretrained, adaptation to a new target case requires substantially fewer target samples than training from scratch. Moreover, fine-tuning with only 50 target samples can outperform the no-pretrain baseline trained with 300 samples, indicating that pretraining markedly reduces target-data requirements while improving final accuracy.
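Each curve in Figure 12, like each boxplot in Figures 9 to 11, summarises per-sample test errors. The formal MED and MIPE definitions appear earlier in the paper and are not reproduced in this section, so the sketch below relies on assumed readings: MED is the nodewise Euclidean norm of the displacement error averaged over all nodes, and MIPE is the relative error of the peak intrusion, taken here (hypothetically) as the largest absolute displacement along the $z$ axis. The function names are illustrative.

```python
import numpy as np


def med(pred_disp, true_disp):
    """Mean Euclidean distance between predicted and true nodal displacements, shape (n_nodes, 3)."""
    return float(np.linalg.norm(pred_disp - true_disp, axis=1).mean())


def mipe(pred_disp, true_disp, axis=2):
    """Percentage error of the maximum intrusion, assumed to be the peak |displacement| along `axis`."""
    true_max = np.abs(true_disp[:, axis]).max()
    pred_max = np.abs(pred_disp[:, axis]).max()
    return float(abs(pred_max - true_max) / true_max * 100.0)


def summarise(errors):
    """Mean, median and interquartile range of per-sample errors, as read off the boxplots."""
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    return {"mean": float(np.mean(errors)), "median": float(median), "iqr": float(q3 - q1)}


def evaluate(samples):
    """samples: iterable of (pred_disp, true_disp) pairs, one per test geometry."""
    samples = list(samples)
    return (summarise([med(p, t) for p, t in samples]),
            summarise([mipe(p, t) for p, t in samples]))
```

Computing the metrics per sample before aggregating is what makes distributional statements (median, spread, tails) possible, rather than a single pooled error.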
Figure [13](https://arxiv.org/html/2605.15231#S7.F13) provides a qualitative comparison between ground truth and prediction under these four settings. Fine-tuning a pretrained model with 300 samples gives the closest agreement with the ground truth, while fine-tuning with only 50 samples still agrees slightly better than training from scratch with 300 samples. By contrast, training from scratch with 50 samples shows the poorest agreement and the largest visible prediction errors. This indicates that an MMGUNet backbone pretrained on multiple datasets can be reused for new cases within a similar component family, and that fine-tuning requires far fewer training samples to reach comparable accuracy.

Figure 12: Trial B (shape transfer learning) data-requirement analysis: (a) MED and (b) MIPE, comparing pretrained fine-tuning and no-pretrain baselines under different target-data sizes. The y-axis shows probability density, with each curve normalised to unit area; taller peaks indicate that errors are more concentrated around that value.

Figure 13: Trial B qualitative comparison of ground-truth and predicted $z$-displacement fields for pretrained fine-tuning and training from scratch under two target-data budgets: (a) pretrained, fine-tuned with 300 samples; (b) pretrained, fine-tuned with 50 samples; (c) trained from scratch with 300 samples; (d) trained from scratch with 50 samples. Each group shows the ground truth, the prediction, and the pointwise prediction error.

Table [5](https://arxiv.org/html/2605.15231#S7.T5) summarises the overall effect of pretraining across all trials by averaging the results obtained under the available training sample sizes within each trial. For Trials A–D, the reported MED and MIPE values are the means of the results for the two sample sizes, 300 and 50. For Trial E, only one training sample size (200 samples) is evaluated.
The percentage improvement of the pretrained model is computed relative to the no-pretrain baseline as the reduction in error:

$$\mathrm{Improvement}\,(\%) = \frac{E_{\text{no-pretrain}} - E_{\text{pretrain}}}{E_{\text{no-pretrain}}} \times 100\%, \tag{12}$$

where $E$ is the prediction error (MED or MIPE). A positive percentage therefore indicates that pretraining reduces the prediction error.

Overall, Table [5](https://arxiv.org/html/2605.15231#S7.T5) shows that pretraining consistently improves performance for both MED and MIPE across all trials. The largest gains are observed in Trials A and B, where pretraining reduces MED by 58.73% and 48.86% and MIPE by 48.11% and 47.85%, respectively. Pretraining therefore provides substantial benefits in the easier transfer settings, removing roughly half of the baseline error. Trial C also shows clear, though more moderate, improvement, with reductions of 25.78% in MED and 38.90% in MIPE. In Trial D, the improvement remains positive but smaller, at 16.37% for MED and 22.17% for MIPE, suggesting that the benefit of pretraining diminishes as the transfer task becomes more challenging. Trial E exhibits the smallest relative gains, with reductions of 8.41% in MED and 10.06% in MIPE, but still shows that pretraining is beneficial even in the cross-component setting. Taken together, these results show a consistent advantage of pretraining across all trials, with a magnitude that depends on the difficulty of the transfer scenario. A detailed summary of the prediction accuracy of all trials with different fine-tuning-set budgets is given in Appendix [C](https://arxiv.org/html/2605.15231#A3).

Table 5: Overall trial-level summary of pretraining vs. no pretraining.
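Eq. (12), combined with the averaging over fine-tuning budgets described above, can be reproduced in a few lines. This sketch assumes that each model's errors are averaged over the available budgets first and the relative improvement is then computed from those averages; the error values in the example are illustrative placeholders, not results from Table 5, and the function names are hypothetical.

```python
import numpy as np


def improvement_pct(e_no_pretrain, e_pretrain):
    """Eq. (12): relative error reduction (%) of the pretrained model; positive means pretraining helps."""
    return (e_no_pretrain - e_pretrain) / e_no_pretrain * 100.0


def trial_improvement(no_pretrain_by_budget, pretrain_by_budget):
    """Average each model's error over the shared fine-tuning budgets, then apply Eq. (12)."""
    budgets = sorted(set(no_pretrain_by_budget) & set(pretrain_by_budget))
    e_np = np.mean([no_pretrain_by_budget[b] for b in budgets])
    e_p = np.mean([pretrain_by_budget[b] for b in budgets])
    return float(improvement_pct(e_np, e_p))


# Placeholder MED values (mm) for one trial with budgets of 300 and 50 samples.
print(trial_improvement({300: 6.0, 50: 10.0}, {300: 3.0, 50: 4.0}))  # 56.25
```

For a trial with a single budget, such as the 200-sample setting of Trial E, the same function reduces to Eq. (12) applied directly to that budget's errors.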
## 8 Conclusion

This paper presents Mask-Morph Graph U-Net (MMGUNet), a generalisable mesh-based surrogate for crashworthiness field prediction under large geometric variation. The method addresses a key limitation of fixed-hierarchy, edge-specific graph surrogates: a fixed coarse topology is desirable for high-capacity edge-specific aggregation, but it can lead to poor fine-to-coarse correspondence when component shape, scale, or complexity varies substantially. MMGUNet resolves this tension by preserving fixed coarse connectivity while morphing the coarsened graph hierarchy to each input geometry. In addition, supervised masked pretraining followed by parameter-efficient fine-tuning reduces overfitting and improves target-data efficiency in transfer-learning settings.

In *single-case* training, the proposed model achieves higher prediction accuracy and a smaller train–test gap than existing and ablated baselines, indicating reduced overfitting and improved robustness. In transfer scenarios, masked pretraining with parameter-efficient fine-tuning consistently outperforms the no-pretrain baseline at the same data budget. In Trials A and B, fine-tuning with 50 target samples surpasses the no-pretrain baseline trained with 300 samples. Overall, pretraining improves both MED and MIPE in every transfer case study, with the largest gains in Trial A (58.73% MED reduction and 48.11% MIPE reduction).

In practical crashworthiness design, engineers often need rapid estimates of deformation fields and intrusion measures for many geometric variants before committing to expensive nonlinear FE simulations. The proposed surrogate supports this workflow by predicting nodal displacement fields directly on irregular FE meshes and by reducing the amount of target data required when adapting to related geometries.
The method is therefore intended as a decision-support tool for early-stage design exploration rather than a replacement for final certification-level crash simulation. The current study is limited to terminal displacement prediction under fixed loading, material, contact, and boundary conditions. The proposed morphing strategy also assumes identifiable geometric landmarks and a shared coarse graph hierarchy across the evaluated component families. Direct prediction for strongly shifted target geometries without target-domain fine-tuning remains challenging. Future work will extend the framework to time-dependent crash responses, variable loading and material conditions, broader component families, and larger vehicle-level assemblies, and will integrate the surrogate into a broader design-optimisation platform for vehicle panel components.

## Acknowledgements

The authors acknowledge funding support from UKRI (UKRI221: AI-Driven Design for Forming High-Performance Vehicle Parts), as well as PhD scholarships from Imperial College London. They would also like to thank ESI Group for providing technical support with the Virtual-Performance Solution (VPS). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

## Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## References

- A. Ahmadi Dastjerdi, M. Moshref-Javadi, H. Ahmadian, and M. Gholampour (2019). Crushing analysis and multi-objective optimization of different length bi-thin walled cylindrical structures under axial impact loading. Engineering Optimization 51, pp. 1–18.
- E. İ. Albak (2023). Optimization design for circular multi-cell thin-walled tubes with discrete and continuous design variables. Mechanics of Advanced Materials and Structures 30(24), pp. 5091–5105.
- M. Alexa, D. Cohen-Or, and D. Levin (2000). As-rigid-as-possible shape interpolation. In Proceedings of EUROGRAPHICS 2000, pp. 157–164. DOI: [10.1111/1467-8659.00339](https://dx.doi.org/10.1111/1467-8659.00339).
- V. André, M. Costas, M. Langseth, and D. Morin (2023). Neural network modelling of mechanical joints for the application in large-scale crash analyses. International Journal of Impact Engineering 177, pp. 104490.
- Y. Cao, S. Bhat, A. Subhash, J. Li, S. Ovie, K. Dvijotham, L. N. Chelliah, K. Ramakrishnan, A. Kalinowska, M. J. Glueck, A. G. Schwing, A. Anandkumar, and G. Cecchi (2022). Efficient learning of mesh-based physical simulation with BSMS-GNN. arXiv preprint arXiv:2210.02573.
- J. Chang, T. Tyan, M. El-Bkaily, J. Cheng, A. Marpu, Q. Zeng, and J. Santini (2007). Implicit and explicit finite element methods for crash safety analysis. Technical Report 2007-01-0982, SAE International. DOI: [10.4271/2007-01-0982](https://dx.doi.org/10.4271/2007-01-0982).
- Q. Chen, X. Wen, and Y. Zhang (2024). Predicting dynamic responses of continuous deformable bodies: a graph-based learning approach. Computer Methods in Applied Mechanics and Engineering 420, pp. 116669.
- A. de Boer, M. S. van der Schoot, and H. Bijl (2007). Mesh deformation based on radial basis function interpolation. Computers & Structures 85(11–14), pp. 784–795. DOI: [10.1016/j.compstruc.2007.01.013](https://dx.doi.org/10.1016/j.compstruc.2007.01.013).
- S. Deshpande, J. Lengiewicz, and S. Bordas (2022). MAgNET: a graph U-Net architecture for mesh-based simulations. arXiv preprint arXiv:2211.00713.
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019). BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- V. P. Dwivedi and X. Bresson (2021). A generalization of transformer networks to graphs. arXiv preprint [arXiv:2012.09699](https://arxiv.org/abs/2012.09699).
- M. S. Floater (1997). Parametrization and smooth approximation of surface triangulations. Computer Aided Geometric Design 14(3), pp. 231–250. DOI: [10.1016/S0167-8396(96)00031-3](https://dx.doi.org/10.1016/S0167-8396%2896%2900031-3).
- M. S. Floater (2003). Mean value coordinates. Computer Aided Geometric Design 20(1), pp. 19–27. DOI: [10.1016/S0167-8396(03)00002-5](https://dx.doi.org/10.1016/S0167-8396%2803%2900002-5).
- M. Fortunato, T. Pfaff, K. Stachenfeld, and P. Battaglia (2022). Multiscale MeshGraphNets. In 2nd AI4Science Workshop at the 39th International Conference on Machine Learning (ICML).
- X. Fu, J. Hu, X. Zhang, and H. Feng (2023). An finite element analysis surrogate model with boundary oriented graph embedding approach for rapid design. Journal of Computational Design and Engineering 10(3), pp. 1026–1046.
- P. Garnier, V. Lannelongue, J. Viquerat, and E. Hachem (2025). MeshMask: pretraining mesh simulators with masked autoencoding. arXiv preprint arXiv:2502.10841.
- P. Garnier, J. Viquerat, and E. Hachem (2024). Multi-grid graph neural networks with self-attention for computational mechanics. arXiv preprint [arXiv:2409.11899](https://arxiv.org/abs/2409.11899).
- W. Guo, P. Xu, C. Yang, J. Guo, L. Yang, and S. Yao (2023). Machine learning-based crashworthiness optimization for the square cone energy-absorbing structure of the subway vehicle. Structural and Multidisciplinary Optimization 66(8), pp. 182.
- J. He, Y. Jiao, and S. P. A. Bordas (2023). On the use of graph neural networks and shape-function-based gradient computation in the deep energy method. International Journal for Numerical Methods in Engineering 124(4), pp. 864–879.
- K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009.
- Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, and J. Tang (2022). GraphMAE: self-supervised masked graph autoencoders. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 594–604.
- Z. Hou, X. Liu, Y. Cen, Y. Zhao, Y. Dong, H. Yang, C. Wang, and J. Tang (2023). GraphMAE2: a decoding-enhanced masked self-supervised graph learner. In Proceedings of the ACM Web Conference (WWW), pp. 737–746.
- C. P. Kohar, T. K. Eller, D. S. Connolly, and K. Inal (2020). Using artificial intelligence to aid vehicle lightweighting in crashworthiness with aluminum. MATEC Web of Conferences 326, pp. 01006.
- C. P. Kohar, L. Greve, T. K. Eller, D. S. Connolly, and K. Inal (2021). A machine learning framework for accelerating the design process using CAE simulations: an application to finite element analysis in structural crashworthiness. Computer Methods in Applied Mechanics and Engineering 385, pp. 114008. DOI: [10.1016/j.cma.2021.114008](https://dx.doi.org/10.1016/j.cma.2021.114008).
- V. Kraevoy and A. Sheffer (2004). Cross-parameterization and compatible remeshing of 3D models. ACM Transactions on Graphics 23(3), pp. 861–869. DOI: [10.1145/1015706.1015811](https://dx.doi.org/10.1145/1015706.1015811).
- Y. Le Guennec, T. Defoort, J. V. Aguado, and D. Borzacchiello (2025). Comparing traditional surrogate modelling and neural fields for vehicle crash simulation data. In SIA Simulation numérique 2025.
- T. Lehrer, P. Stocker, F. Duddeck, and M. Wagner (2025). UCSM: dataset of U-shaped parametric CAD geometries and real-world sheet metal meshes for deep drawing. Computer-Aided Design 188, pp. 103924. DOI: [10.1016/j.cad.2025.103924](https://dx.doi.org/10.1016/j.cad.2025.103924).
- B. Lévy, S. Petitjean, N. Ray, and J. Maillot (2002). Least squares conformal maps for automatic texture atlas generation. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 362–371. DOI: [10.1145/566654.566590](https://dx.doi.org/10.1145/566654.566590).
- H. Li, Y. Zhao, H. Zhou, T. Pfaff, and N. Li (2026). A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components. Results in Engineering. DOI: [10.1016/j.rineng.2026.110925](https://dx.doi.org/10.1016/j.rineng.2026.110925).
- H. Li, H. Zhou, and N. Li (2024). An integrated convolutional neural network-based surrogate model for crashworthiness performance prediction of hot-stamped vehicle panel components. In MATEC Web of Conferences, Vol. 401, pp. 03013. DOI: [10.1051/matecconf/202440103013](https://dx.doi.org/10.1051/matecconf/202440103013).
- H. Luo, H. Wu, H. Zhou, L. Xing, Y. Di, J. Wang, and M. Long (2025). Transolver++: an accurate neural solver for PDEs on million-scale geometries. arXiv preprint arXiv:2502.02414.
- M. D. McKay, R. J. Beckman, and W. J. Conover (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2), pp. 239–245.
- M. A. Nabian, S. Chavare, D. Akhare, R. Ranade, R. Cherukuri, and S. Tadepalli (2025). Automotive crash dynamics modeling accelerated with machine learning. SAE Technical Paper 2026-01-0568. DOI: [10.4271/2026-01-0568](https://dx.doi.org/10.4271/2026-01-0568).
- T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. Battaglia (2021). Learning mesh-based simulation with graph networks. In International Conference on Learning Representations (ICLR). arXiv:2010.03409.
- PhysicsNeMo Contributors (2023). NVIDIA PhysicsNeMo: an open-source framework for physics-based deep learning in science and engineering. Released February 24, 2023. GitHub.
- M. Rabinovich, R. Poranne, D. Panozzo, and O. Sorkine-Hornung (2017). Scalable locally injective mappings. ACM Transactions on Graphics 36(2), pp. 16:1–16:16. DOI: [10.1145/2977606](https://dx.doi.org/10.1145/2977606).
- M. Rogala, J. Gajewski, and M. Ferdynus (2020). The effect of geometrical non-linearity on the crashworthiness of thin-walled conical energy-absorbers. Materials 13(21), pp. 4857.
- E. Sakaridis, N. Karathanasopoulos, and D. Mohr (2022). Machine-learning based prediction of crash response of tubular structures. International Journal of Impact Engineering 166.
- A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. Battaglia (2020). Learning to simulate complex physics with graph networks. In International Conference on Machine Learning (ICML).
- J. Schreiner, A. Asirvatham, E. Praun, and H. Hoppe (2004). Inter-surface mapping. ACM Transactions on Graphics 23(3), pp. 870–877. DOI: [10.1145/1015706.1015812](https://dx.doi.org/10.1145/1015706.1015812).
- T. W. Sederberg and S. R. Parry (1986). Free-form deformation of solid geometric models. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 151–160. DOI: [10.1145/15922.15903](https://dx.doi.org/10.1145/15922.15903).
- S. Thel, L. Greve, M. Karl, and P. van der Smagt (2025). Accelerating crash simulations with finite element method integrated networks (FEMIN): comparing two approaches to replace large portions of a FEM simulation. Computer Methods in Applied Mechanics and Engineering 443, pp. 118046.
- S. Thel, L. Greve, B. van de Weg, and P. van der Smagt (2024). Introducing finite element method integrated networks (FEMIN). Computer Methods in Applied Mechanics and Engineering 427, pp. 117073.
- W. T. Tutte (1963). How to draw a graph. Proceedings of the London Mathematical Society s3-13(1), pp. 743–767. DOI: [10.1112/plms/s3-13.1.743](https://dx.doi.org/10.1112/plms/s3-13.1.743).
- Z. Wen, Y. Li, H. Wang, and Y. Peng (2023). Data-driven spatiotemporal modeling for structural dynamics on irregular domains by stochastic dependency neural estimation. Computer Methods in Applied Mechanics and Engineering 404, pp. 115831. DOI: [10.1016/j.cma.2022.115831](https://dx.doi.org/10.1016/j.cma.2022.115831).
- E. Whalen and C. Mueller (2022). Toward reusable surrogate models: graph-based transfer learning on truss structures. Journal of Mechanical Design 144(2), pp. 021704. ASME Digital Collection.
- H. Wu, H. Luo, H. Wang, J. Wang, and M. Long (2024). Transolver: a fast transformer solver for PDEs on general geometries. arXiv preprint arXiv:2402.02366.
- S. R. Wu (2006). Convergence study on explicit finite element for crashworthiness analysis. Technical Report 2006-01-0672, SAE International. DOI: [10.4271/2006-01-0672](https://dx.doi.org/10.4271/2006-01-0672).
- F. Xiong, D. Wang, Z. Ma, L. Yang, Z. Li, and B. Song (2018). Multi-objective lightweight and crashworthiness optimization for the side structure of an automobile body. Structural and Multidisciplinary Optimization 58(4), pp. 1823–1843.
- P. Zende and H. Dalir (2022). Multi-objective optimization of composite square tube for minimizing peak crushing force and maximizing specific energy absorption using artificial neural network and genetic algorithm. In ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE).
- G. Zhang, Y. Liu, Y. Quan, and J. Yan (2026). A mesh-based geometric deep learning framework for rapid response prediction of large-scale and multi-component mechanical structures in engineering. Computer Methods in Applied Mechanics and Engineering 448, pp. 118435. DOI: [10.1016/j.cma.2025.118435](https://dx.doi.org/10.1016/j.cma.2025.118435).
- X. Zhang, J. Zhou, and Z. Feng (2020) B-pillar collision test method. Technical report, Aiways Automobile Shanghai Co Ltd, China.
- Y. Zhao, Q. Chen, H. Li, H. Zhou, H. R. Attar, T. Pfaff, T. Wu, and N. Li (2026) Recurrent U-Net-based graph neural network (RUGNN) for accurate deformation predictions in sheet material forming. Advanced Engineering Informatics 69, pp. 104021. DOI: [10.1016/j.aei.2025.104021](https://dx.doi.org/10.1016/j.aei.2025.104021).
- H. Zhou, Y. Zhao, H. Li, T. Pfaff, and N. Li (2025) A multi-level graph-based surrogate model for real-time high-fidelity sheet forming simulations. Advanced Engineering Informatics 66, pp. 103458. DOI: [10.1016/j.aei.2025.103458](https://dx.doi.org/10.1016/j.aei.2025.103458).

## Appendix A Dataset parameters

In this appendix, we provide the detailed parameter definitions used to generate the B-pillar and U-channel datasets.

For the B-pillar case, as illustrated in Figure [3](https://arxiv.org/html/2605.15231#S4.F3), geometric variation is introduced by morphing the top region of the component in the $x$ and $y$ directions, as well as morphing in the $z$ direction at one of the three control points. The magnitude of each morph is defined relative to the characteristic length of the component in the corresponding direction. For example, the variation in the $x$ direction is specified as $\pm 5\%$ of the total B-pillar width, i.e. its length in the $x$ direction. The full set of B-pillar parameters and bounds is listed in Table [6](https://arxiv.org/html/2605.15231#A1.T6).

For the U-channel case, the geometry can be decomposed into three main regions: the middle plane, the left addendum, and the right addendum, as shown in Figure [14](https://arxiv.org/html/2605.15231#A1.F14).
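The following short sketch illustrates the relative-magnitude convention used for the B-pillar morphs. It is a hypothetical illustration, not the authors' data-generation code. Only the $\pm 5\%$ fraction of the width in the $x$ direction comes from the text. The helper names `absolute_bounds` and `sample_morph`, the 250 mm width, and the independent uniform sampling are assumptions.

```python
import random


def absolute_bounds(rel_fractions, char_lengths_mm):
    """Convert relative morph magnitudes into absolute bounds in mm.

    rel_fractions: {direction: fraction}, e.g. {"x": 0.05} for +/-5 %.
    char_lengths_mm: {direction: characteristic length of the part in mm}.
    Returns {direction: (lower_mm, upper_mm)}, symmetric about zero.
    """
    bounds = {}
    for direction, fraction in rel_fractions.items():
        delta = fraction * char_lengths_mm[direction]
        bounds[direction] = (-delta, delta)
    return bounds


def sample_morph(bounds, rng):
    """Hypothetical sampler: one independent uniform draw per direction."""
    return {d: rng.uniform(lo, hi) for d, (lo, hi) in bounds.items()}


# Hypothetical B-pillar width of 250 mm in the x direction
bounds = absolute_bounds({"x": 0.05}, {"x": 250.0})
offsets = sample_morph(bounds, random.Random(0))
```

With these assumed values, the $x$ morph is bounded to ±12.5 mm (0.05 × 250 mm), and `offsets["x"]` is one draw from that interval.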
We set the length of the middle plane, $x_m$, to 200 mm as the reference parameter. A total of 16 geometric parameters are varied within predefined bounds, as summarised in Table [6](https://arxiv.org/html/2605.15231#A1.T6). These parameters describe the lengths, widths, and heights of the different regions, as well as the slant and plane angles. Because it has more parameters with broader ranges, the U-channel case spans a wider design space than the B-pillar case and is therefore inherently more challenging for the surrogate model to learn. Representative examples of geometric variation for the B-pillar and U-channel components are shown in Figure [15](https://arxiv.org/html/2605.15231#A1.F15).

Figure 14: Illustration of the parameters for U-channel 2 in the UCSM dataset (Lehrer et al. [[2025](https://arxiv.org/html/2605.15231#bib.bib44)]).

Table 6: Case study parameters, bounds, and descriptions.

Figure 15: Representative variation of the component shapes: (a) B-pillar variations; (b) U-channel variations.

## Appendix B Barycentric parameterisation foundations

##### Graph setting and 3-connected planarity

Let $G=(V,E)$ be a simple undirected graph whose vertices are mesh nodes and whose edges follow mesh adjacency. A graph is *planar* if it admits a drawing in $\mathbb{R}^2$ with non-crossing edges. A graph is *3-vertex-connected* if removing any set of at most two vertices leaves the graph connected. Planarity and 3-vertex-connectivity are the hypotheses of Tutte's spring-embedding theorem. The theorem provides existence, uniqueness, and non-self-intersection guarantees for barycentric embeddings with convex boundary constraints. In practice, the shell meshes used in this study are processed to satisfy the required boundary-loop ordering before parameterisation.

##### Boundary and interior vertices

Assume the shell mesh has disk topology with a single outer boundary loop.
Let
$$B=(b_0,\dots,b_{k-1})\subset V \tag{13}$$
denote the ordered boundary cycle, and let
$$I:=V\setminus B \tag{14}$$
denote the interior vertices. The parameterisation is obtained by prescribing UV positions on $B$ and solving for UV positions on $I$.

##### Embedding map with fixed convex boundary

We seek an embedding (parameterisation) map
$$\Phi:V\rightarrow\mathbb{R}^{2},\qquad i\mapsto\mathbf{u}_{i}=(u_{i},v_{i}), \tag{15}$$
such that boundary vertices in $B$ are mapped to a convex polygonal curve, while interior vertices satisfy a barycentric equilibrium condition. This is the classical Tutte (barycentric) embedding. It can be interpreted as the equilibrium of a spring system with a pinned boundary.

##### Harmonic (barycentric) condition

For an interior vertex $i\in I$, let $N(i)$ be its one-ring neighbour set and $d_i=|N(i)|$ its degree. Using uniform barycentric weights,
$$w_{ij}=\begin{cases}\dfrac{1}{d_{i}}, & j\in N(i),\\[4pt] 0, & \text{otherwise},\end{cases}\qquad\Rightarrow\qquad\sum_{j\in V}w_{ij}=1,\quad w_{ij}\geq 0. \tag{16}$$
The discrete harmonic condition is
$$\mathbf{u}_{i}=\sum_{j\in N(i)}w_{ij}\,\mathbf{u}_{j},\qquad\forall i\in I, \tag{17}$$
so each interior UV position is the weighted average of its neighbours.

##### Laplacian form and partitioned linear system

Let $A\in\mathbb{R}^{N\times N}$ be the symmetric adjacency matrix of $G$, with $N=|V|$, and define the degree matrix $D=\mathrm{diag}(d_1,\dots,d_N)$. The uniform graph Laplacian is
$$L=D-A. \tag{18}$$
Row $i$ of $LU$ is then $d_i\mathbf{u}_i-\sum_{j\in N(i)}\mathbf{u}_j$, which vanishes exactly when Eq. (17) holds. Stack the UV coordinates into $U\in\mathbb{R}^{N\times 2}$, with columns corresponding to $u$ and $v$.
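The full construction can be sketched in a few lines of NumPy: a pinned convex boundary, uniform weights, and the interior block solve of Eq. (21). This is a minimal dense illustration under stated assumptions, not the authors' implementation. The function name `tutte_embedding` is hypothetical. The boundary loop is pinned to equally spaced points on the unit circle, which is one valid choice of convex polygon. The uniform Laplacian is taken as $L = D - A$. For production-size shell meshes, a sparse matrix and a sparse solver (for example `scipy.sparse.linalg.spsolve`) would replace the dense arrays.

```python
import numpy as np


def tutte_embedding(n_vertices, edges, boundary):
    """Uniform-weight Tutte (barycentric) embedding of a disk-topology mesh graph.

    n_vertices: number of mesh nodes N.
    edges: iterable of undirected (i, j) index pairs from mesh adjacency.
    boundary: ordered boundary loop (b_0, ..., b_{k-1}).
    Returns an (N, 2) array of UV coordinates.
    """
    # Symmetric adjacency A, degree matrix D and uniform Laplacian L = D - A
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A

    # Dirichlet boundary: pin the loop, in order, to equally spaced points on
    # the unit circle, which traces a convex polygon
    B = np.asarray(boundary, dtype=int)
    theta = 2.0 * np.pi * np.arange(len(B)) / len(B)
    U = np.zeros((n_vertices, 2))
    U[B] = np.column_stack([np.cos(theta), np.sin(theta)])

    # Interior vertices I = V \ B
    is_boundary = np.zeros(n_vertices, dtype=bool)
    is_boundary[B] = True
    I = np.flatnonzero(~is_boundary)

    # Solve L_II U_I = -L_IB U_B; the u and v columns share one matrix.
    # L_II is non-singular when the mesh graph is connected and B is non-empty.
    L_II = L[np.ix_(I, I)]
    L_IB = L[np.ix_(I, B)]
    U[I] = np.linalg.solve(L_II, -L_IB @ U[B])
    return U


# Example: four boundary vertices around a single interior hub (vertex 4)
fan_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4), (2, 4), (3, 4)]
U_fan = tutte_embedding(5, fan_edges, boundary=[0, 1, 2, 3])
```

In the example, the hub vertex lands at the average of the four pinned boundary points, which is the origin up to rounding, as Eq. (17) requires.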
The interior barycentric constraints become
$$(LU)_{i}=\mathbf{0}\in\mathbb{R}^{2},\qquad\forall i\in I. \tag{19}$$
Impose Dirichlet boundary values $U_B\in\mathbb{R}^{|B|\times 2}$ on $B$, and denote the unknown interior UVs by $U_I\in\mathbb{R}^{|I|\times 2}$. Partitioning the indices as $(I,B)$ gives
$$U=\begin{bmatrix}U_{I}\\ U_{B}\end{bmatrix},\qquad L=\begin{bmatrix}L_{II}&L_{IB}\\ L_{BI}&L_{BB}\end{bmatrix}. \tag{20}$$
The interior harmonic constraints then yield the sparse linear system
$$L_{II}U_{I}+L_{IB}U_{B}=0\quad\Longrightarrow\quad L_{II}U_{I}=-L_{IB}U_{B}, \tag{21}$$
which is solved for the $u$ and $v$ coordinates as two right-hand sides sharing the same matrix $L_{II}$.

## Appendix C Full trial-wise comparison details

Tables [7](https://arxiv.org/html/2605.15231#A3.T7) and [8](https://arxiv.org/html/2605.15231#A3.T8) report the detailed trial-wise comparison between pretrained and non-pretrained models under different training sample sizes. Table 7 uses MED and Table 8 uses MIPE, both reported as mean ± standard deviation. The pretrained model shows consistent accuracy improvements over the model without pretraining.

Table 7: Full trial-wise comparison of pretrained and non-pretrained models using mean Euclidean distance (MED, mm). Results are reported as mean ± standard deviation.

Table 8: Full trial-wise comparison of pretrained and non-pretrained models using maximum intrusion percentage error (MIPE, %). Results are reported as mean ± standard deviation.
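The following sketch shows how a per-sample mean Euclidean distance and a trial-wise mean ± standard deviation could be computed. It rests on assumptions not stated in this appendix:

- MED is taken as the Euclidean distance between predicted and reference nodal positions in mm, averaged over nodes.
- The spread is the sample standard deviation (`ddof=1`) over trials.
- The function names are hypothetical.

MIPE is not reproduced because its definition lies outside this appendix.

```python
import numpy as np


def mean_euclidean_distance(pred_xyz, true_xyz):
    """Assumed MED: Euclidean distance per node, averaged over all nodes (mm)."""
    pred_xyz = np.asarray(pred_xyz, dtype=float)
    true_xyz = np.asarray(true_xyz, dtype=float)
    return float(np.linalg.norm(pred_xyz - true_xyz, axis=1).mean())


def summarise_trials(per_trial_values, ddof=1):
    """Mean and (sample, by default) standard deviation over repeated trials."""
    values = np.asarray(per_trial_values, dtype=float)
    return float(values.mean()), float(values.std(ddof=ddof))


# Toy check: two nodes, one displaced by (3, 4, 0) mm and one exact
med = mean_euclidean_distance([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
                              [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
# Three hypothetical per-trial MED values
mean, std = summarise_trials([1.0, 2.0, 3.0])
```

In this toy case, `med` is 2.5 mm (node distances of 5 mm and 0 mm), and the three hypothetical trial values summarise to 2.0 ± 1.0.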