Diversity-Driven Offline Multi-Objective Optimization via Nested Pareto Set Learning

arXiv cs.LG Papers

Summary

This paper proposes DOMOO, a diversity-driven offline multi-objective optimization method that uses accumulative risk control and nested Pareto set learning to address out-of-distribution issues, achieving superior convergence and diversity on benchmarks.

arXiv:2606.15115v1 Announce Type: new Abstract: Multi-objective optimization (MOO) has emerged as a powerful approach to solving complex optimization problems involving multiple objectives. In many practical scenarios, function evaluations are unavailable or prohibitively expensive, necessitating optimization solely based on a fixed offline dataset. In this setting, known as offline MOO, the goal is to find out the Pareto set without access to the true objective functions. This setting suffers from the out-of-distribution (OOD) issue, where the surrogate model is not accurate for unseen designs. Due to the OOD issue, surrogate errors may cause the optimizer to select solutions that do not lie on the true Pareto front and are biased toward its extremes. To address this, this paper proposes Diversity-driven Offline Multi-Objective Optimization (DOMOO), which aims to find out a diverse and high-quality set of solutions. First, DOMOO incorporates an accumulative risk control module that estimates the potential risk of candidate solutions and alleviates the OOD issue between the training data and the generated solutions. In addition, a nested Pareto set learning (PSL) strategy is proposed to jointly learn preference and PSL parameters, then optimize them, enabling adaptation to diverse Pareto front geometries. To further enhance solution quality, we design a diversity-driven selection strategy that extracts a representative and well-distributed set of final solutions. To achieve this diversity-driven selection strategy, we propose $\text{IGD}_\text{offline}$, a tailored indicator for the offline setting that considers both diversity and convergence, and avoids the bias of hypervolume indicator. Extensive experiments on synthetic and real-world benchmarks show that DOMOO achieves the best average rank across tasks in both convergence and diversity among the compared methods.

Cached at: 06/16/26, 11:37 AM

# Diversity-Driven Offline Multi-Objective Optimization via Nested Pareto Set Learning
Source: [https://arxiv.org/html/2606.15115](https://arxiv.org/html/2606.15115)
Yaolin Wen, Xiang Xia, Xin An, Hanyi Si, Xiang Shu, Yangde Fu, Liang Dou, Hong Qian

###### Abstract

Multi-objective optimization (MOO) has emerged as a powerful approach to solving complex optimization problems involving multiple objectives. In many practical scenarios, function evaluations are unavailable or prohibitively expensive, necessitating optimization based solely on a fixed offline dataset. In this setting, known as offline MOO, the goal is to find the Pareto set without access to the true objective functions. This setting suffers from the out-of-distribution (OOD) issue, where the surrogate model is not accurate for unseen designs. Due to the OOD issue, surrogate errors may cause the optimizer to select solutions that do not lie on the true Pareto front and are biased toward its extremes. To address this, this paper proposes Diversity-driven Offline Multi-Objective Optimization (DOMOO), which aims to find a diverse and high-quality set of solutions. First, DOMOO incorporates an accumulative risk control module that estimates the potential risk of candidate solutions and alleviates the OOD issue between the training data and the generated solutions. In addition, a nested Pareto set learning (PSL) strategy is proposed to jointly learn preference and PSL parameters, then optimize them, enabling adaptation to diverse Pareto front geometries. To further enhance solution quality, we design a diversity-driven selection strategy that extracts a representative and well-distributed set of final solutions. To achieve this diversity-driven selection strategy, we propose $\text{IGD}_{\text{offline}}$, a tailored indicator for the offline setting that considers both diversity and convergence, and avoids the bias of the hypervolume indicator. Extensive experiments on synthetic and real-world benchmarks show that DOMOO achieves the best average rank across tasks in both convergence and diversity among the compared methods.

Offline Optimization, Black\-Box Optimization, Multi\-Objective Optimization, Pareto Set Learning

## 1 Introduction

Multi-objective optimization (MOO) is widely used in fields ranging from neural architecture search (Lu et al., [2020](https://arxiv.org/html/2606.15115#bib.bib68)) to antenna structure design (Yu et al., [2019](https://arxiv.org/html/2606.15115#bib.bib70)), where practitioners must balance conflicting goals, for example, developing a drug (Nicolaou and Brown, [2013](https://arxiv.org/html/2606.15115#bib.bib84)) that is both highly effective and minimally toxic. MOO seeks to discover the complete set of Pareto optimal solutions, where no objective can be improved without degrading others (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)). Many existing methods rely on surrogate models to approximate the true objectives. However, to maintain the accuracy of the surrogates, they typically require actively querying new function evaluations with the true objectives during training (Li et al., [2025](https://arxiv.org/html/2606.15115#bib.bib71)). In many real-world applications, such as protein engineering and molecular design (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), evaluating true objective functions can be prohibitively expensive or hazardous (Yuan et al., [2025](https://arxiv.org/html/2606.15115#bib.bib13)), making function evaluations difficult. Fortunately, these domains often provide historical data (i.e., an offline dataset) in the form of solutions and their corresponding true objective function values. This motivates the offline MOO setting, where the goal is to recommend a set of solutions that represent the best trade-offs among multiple objectives, using only an offline dataset without any active evaluation.

A common approach to solving offline MOO is to train surrogate models (e.g., Gaussian processes or deep neural networks) on the offline dataset. Then, optimization algorithms (e.g., evolutionary algorithms) explore the solution space under the guidance of surrogate models to identify solutions expected to perform well (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64); Yuan et al., [2025](https://arxiv.org/html/2606.15115#bib.bib13)). However, the trained surrogates are susceptible to the out-of-distribution (OOD) issue, often producing unreliable predictions for solutions that lie far from the training distribution (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43); Brookes et al., [2019](https://arxiv.org/html/2606.15115#bib.bib24); Chen et al., [2023](https://arxiv.org/html/2606.15115#bib.bib32); Yun et al., [2024](https://arxiv.org/html/2606.15115#bib.bib4)). The left part of Figure [1](https://arxiv.org/html/2606.15115#S1.F1) visualizes this with an offline single-objective optimization example. In this setting, the surrogate model trained on an offline dataset tends to underestimate the true objective far from the dataset. As a result, the optimizer selects solutions that appear promising under the surrogate but perform poorly under the true objective due to the OOD issue. In the multi-objective setting, the OOD issue can cause the surrogates to underestimate a few solutions, making them incorrectly dominate many others. This leads to a severely imbalanced Pareto front (as illustrated by the dark blue solutions in the right part of Figure [1](https://arxiv.org/html/2606.15115#S1.F1)), where most solutions are eliminated and the diversity, as well as convergence, drops sharply (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)).

![Refer to caption](https://arxiv.org/html/2606.15115v1/pics/Motivation.png)

Figure 1: Motivation illustration. The left figure illustrates the OOD issue in offline single-objective optimization, while the right figure highlights that OOD can lead to reduced diversity and convergence in offline multi-objective optimization.

Despite its significance, the OOD issue in offline MOO remains largely underexplored. Although several methods have been proposed to address OOD in single-objective offline settings (Qi et al., [2022](https://arxiv.org/html/2606.15115#bib.bib17); Kumar and Levine, [2020](https://arxiv.org/html/2606.15115#bib.bib29); Trabucco et al., [2021](https://arxiv.org/html/2606.15115#bib.bib23)), for instance, by incorporating conservatism into surrogate models to intentionally lower the predictions of potentially overestimated OOD solutions (Yu et al., [2021](https://arxiv.org/html/2606.15115#bib.bib33)), these techniques cannot be directly applied to MOO due to the intricate structure of Pareto dominance among multiple objectives. As a result, they often exhibit poorer diversity in their solutions when naively extended to the multi-objective case.

Moreover, existing online MOO methods, such as multi-objective Bayesian optimization (Ozaki et al., [2024](https://arxiv.org/html/2606.15115#bib.bib73)) and evolutionary algorithms (Li et al., [2015](https://arxiv.org/html/2606.15115#bib.bib72)), are typically immune to the OOD issue in their native setting, as they can actively query new data. However, when these methods are directly applied to the offline scenario, where no additional data can be obtained, they often suffer from severe OOD-induced errors, leading to degraded optimization performance. This highlights the urgent need for principled methods that explicitly address the OOD issue in offline MOO.

Contribution. To address the aforementioned problem in offline MOO, this paper proposes Diversity-Driven Offline Multi-Objective Optimization (DOMOO), a Nested Pareto Set Learning (NPSL) framework designed to improve the diversity and convergence of the candidate solutions. Our key contributions are:

- Nested PSL with Risk Control. We propose an NPSL framework that simultaneously learns preference-conditioned mappings and optimizes preference vectors with accumulative risk control. To tackle OOD uncertainty, DOMOO embeds risk suppression within the preference update mechanism, ensuring a robust balance between diversity and reliability.
- Diversity-Driven Selection with $\text{IGD}_{\text{offline}}$. We design a diversity-driven solution selection strategy with a novel $\text{IGD}_{\text{offline}}$ indicator tailored for the offline setting. This indicator replaces the unavailable true Pareto front with a shifted offline reference, avoiding the bias of hypervolume toward extreme solutions and enabling reliable diversity evaluation without active queries.
- Strong Empirical Performance. Extensive experiments on synthetic and real-world benchmarks show that DOMOO achieves the best average rank across tasks in both convergence and diversity among the compared methods.

The subsequent sections present the related work and preliminaries, describe the proposed DOMOO method, show the empirical results, and conclude the paper\.

## 2 Related Work

Offline single-objective optimization methods alleviating the OOD issue fall into three types: forward approaches (e.g., COMs (Trabucco et al., [2021](https://arxiv.org/html/2606.15115#bib.bib23)), NEMO (Fu and Levine, [2021](https://arxiv.org/html/2606.15115#bib.bib5)), COOREM (Zhu et al., [2025](https://arxiv.org/html/2606.15115#bib.bib7))), generative models (e.g., MIN (Kumar and Levine, [2020](https://arxiv.org/html/2606.15115#bib.bib29)), CbAS (Brookes et al., [2019](https://arxiv.org/html/2606.15115#bib.bib24))), and trajectory-based methods (e.g., BONET (Mashkaria et al., [2023](https://arxiv.org/html/2606.15115#bib.bib56)), PGS (Chemingui et al., [2024](https://arxiv.org/html/2606.15115#bib.bib57))). These methods respectively focus on surrogate robustness, distribution learning with regularization, and leveraging synthetic trajectories to explore quality solutions beyond the offline dataset. While these methods mitigate the OOD issue, extending them to the multi-objective setting is challenging as it requires balancing diversity and convergence across conflicting objectives. Benchmarking efforts such as Design-Bench (Trabucco et al., [2022](https://arxiv.org/html/2606.15115#bib.bib35)) and SOO-Bench (Qian et al., [2025](https://arxiv.org/html/2606.15115#bib.bib83)) have provided standardized evaluation protocols for offline single-objective optimization; however, no comparable benchmark framework existed for the multi-objective case until Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)).

Offline Multi-Objective Optimization. Offline MOO typically adopts three main approaches: evolutionary algorithms, Bayesian optimization, and deep neural network-based methods. Population-based search strategies are commonly used in evolutionary algorithms, where a trained surrogate model serves as an oracle to guide the optimization process. Representative methods following this paradigm include DDMOEA/GAN (Zhang et al., [2024](https://arxiv.org/html/2606.15115#bib.bib60)), MS-RV (Yang et al., [2020](https://arxiv.org/html/2606.15115#bib.bib59)), and IBEA-MS (Liu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib58)). Similarly, Bayesian optimization also employs a surrogate model as an oracle, but selects candidate solutions via acquisition functions and updates the selection iteratively. Various methods and enhancements have been proposed under the multi-objective Bayesian optimization (MOBO) framework, including MOBO-qNEHVI (Daulton et al., [2021](https://arxiv.org/html/2606.15115#bib.bib61)), MOBO-qParEGO (Knowles, [2006](https://arxiv.org/html/2606.15115#bib.bib62)), and MOBO-JES (Hvarfner et al., [2022](https://arxiv.org/html/2606.15115#bib.bib63)), among others. Unlike the previous two categories, which struggle to effectively address the OOD issue, neural network-based methods can mitigate this problem by replacing traditional surrogate models with those adopted in forward approaches from offline single-objective optimization (e.g., COMs (Trabucco et al., [2021](https://arxiv.org/html/2606.15115#bib.bib23)), IOMs (Qi et al., [2022](https://arxiv.org/html/2606.15115#bib.bib17)), Tri-Mentoring (Chen et al., [2023](https://arxiv.org/html/2606.15115#bib.bib32))), and extending them using multiple models (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)) to handle offline MOO. While these methods achieve strong convergence properties, they do not consider how to maintain solution diversity across the Pareto front (PF).

Pareto Set Learning. PSL is a recently proposed model-based approach that learns a mapping from preference vectors to Pareto optimal solutions by training a neural network. PSL-MOBO (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)), which is the first method to integrate PSL with MOBO, enables efficient approximation of black-box PFs by learning a preference-conditioned solution generator based on surrogate models. EPS (Ye et al., [2024](https://arxiv.org/html/2606.15115#bib.bib79)) combines evolutionary algorithms with PSL, enabling faster convergence and broader PF coverage through adaptive evolution of preference vectors. CDM-PSL (Li et al., [2025](https://arxiv.org/html/2606.15115#bib.bib71)) introduces diffusion models into Pareto set learning for MOBO, achieving improved solution quality and diversity under limited evaluations through conditional sampling and entropy-based guidance. However, PSL-MOBO heavily relies on Gaussian process surrogates, which were primarily developed for online evaluation. When applied to offline optimization, they often encounter severe OOD issues.

## 3 Preliminaries

### 3.1 Offline Multi-Objective Optimization

In offline MOO, the goal is to optimize multiple conflicting objectives simultaneously using only a fixed, static dataset $\mathcal{D}=\{(\bm{x}_i,\bm{y}_i)\}_{i=1}^{N}$, where $\bm{x}_i\in\mathcal{X}\subset\mathbb{R}^{D}$ denotes a candidate solution and $\bm{y}_i$ is the associated objective vector. The MOO problem can be formally stated as $\min_{\bm{x}\in\mathcal{X}}\bm{f}(\bm{x})=(f_1(\bm{x}),f_2(\bm{x}),\dots,f_M(\bm{x}))$, where $\bm{f}:\mathcal{X}\to\mathbb{R}^{M}$ is composed of $M$ individual objective functions.

###### Definition 3.1 (Pareto-Optimal Solution (Ehrgott, [2005](https://arxiv.org/html/2606.15115#bib.bib52))).

A solution $\bm{x}^{\ast}\in\mathcal{X}$ is called Pareto-optimal if there exists no other solution $\bm{x}^{\prime}\in\mathcal{X}$ such that $\forall i\in\{1,2,\dots,M\}$, $f_i(\bm{x}^{\prime})\leq f_i(\bm{x}^{\ast})$, with at least one strict inequality, i.e., $\exists j\in\{1,2,\dots,M\}$ such that $f_j(\bm{x}^{\prime})<f_j(\bm{x}^{\ast})$.

###### Definition 3.2 (Pareto Set and Pareto Front (Li et al., [2015](https://arxiv.org/html/2606.15115#bib.bib72))).

The set of all Pareto-optimal solutions is called the Pareto set, denoted by $\mathcal{M}_{\text{ps}}$, and its image under the mapping $\bm{f}$, $\bm{f}(\mathcal{M}_{\text{ps}})=\{\bm{f}(\bm{x})\mid\bm{x}\in\mathcal{M}_{\text{ps}}\}$, is called the Pareto front.

However, in MOO no single solution can optimize all objectives concurrently, and trade-offs among conflicting objectives are inevitable (Qian et al., [2013](https://arxiv.org/html/2606.15115#bib.bib2); Bian et al., [2025](https://arxiv.org/html/2606.15115#bib.bib1)). Therefore, the primary goal in offline MOO can be viewed as the pursuit of the Pareto solutions (i.e., solutions for which no other solution can improve some objectives without causing detriment to at least one other objective, as defined in Definition [3.1](https://arxiv.org/html/2606.15115#S3.Thmtheorem1)) and the effective approximation of the Pareto front (as in Definition [3.2](https://arxiv.org/html/2606.15115#S3.Thmtheorem2)).
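The two definitions above can be made concrete with a small sketch. It assumes minimization and a finite set of objective vectors stored as rows of a NumPy array; the function names `dominates` and `non_dominated_mask` are illustrative and do not come from the paper.

```python
import numpy as np

def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # No worse in every objective and strictly better in at least one.
    return bool(np.all(a <= b) and np.any(a < b))

def non_dominated_mask(Y):
    """Boolean mask marking the rows of Y (N x M) that no other row dominates."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and dominates(Y[j], Y[i]):
                mask[i] = False
                break
    return mask

# Toy objective vectors: [3, 3] is dominated by [2, 2]; the rest trade off.
Y = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = Y[non_dominated_mask(Y)]
```

Applied to the objective vectors of an offline dataset, the same filter yields an empirical front of the recorded data only, which need not coincide with the true Pareto front.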

### 3.2 Pareto Set Learning for Offline MOO

In MOO, the preference $\bm{\lambda}$ reflects the relative importance or priority of each objective. To learn a mapping from all valid preferences $\Lambda=\{\bm{\lambda}\in\mathbb{R}^{M}_{+}\mid\sum\lambda_i=1\}$ to their corresponding Pareto solutions, Pareto set learning (PSL) (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)) trains a Pareto set model through scalarization methods, which bridge preferences and Pareto solutions by transforming the multi-objective problem into a single-objective one for each preference. Specifically, PSL (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)) uses the scalarization based on the augmented Tchebycheff approach (Kaliszewski, [1987](https://arxiv.org/html/2606.15115#bib.bib54)):

$$
\begin{aligned}
g_{\text{tch\_aug}}(\bm{x}\mid\bm{\lambda}) &= \max_{1\leq i\leq M}\left\{\lambda_i\left(f_i(\bm{x})-(z_i^{\ast}-\varepsilon)\right)\right\} \\
&\quad+\rho\sum_{i=1}^{M}\lambda_i f_i(\bm{x}),\quad\forall\bm{\lambda}\in\Lambda,
\end{aligned}
\tag{1}
$$

where $\bm{z}^{\ast}=(z_1^{\ast},\cdots,z_M^{\ast})$ is the ideal vector for the objective $\bm{f}(\bm{x})$, defined as $z_i^{\ast}=\min_{\bm{x}\in\mathcal{D}}f_i(\bm{x})$ for each $i=1,\dots,M$, $\varepsilon$ is a small positive scalar, and $\rho$ is a small positive scalar that depends on the problem and the current solution location.
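As a rough illustration, the augmented Tchebycheff scalarization above can be evaluated for a single objective vector as follows. The function name `tch_aug` and the default values of `eps` and `rho` are assumptions made for this sketch; the paper only states that both are small positive scalars.

```python
import numpy as np

def tch_aug(f_vals, lam, z_star, eps=1e-2, rho=1e-2):
    """Augmented Tchebycheff value for one objective vector (minimization).

    f_vals : objective values f_1(x), ..., f_M(x)
    lam    : preference vector on the simplex
    z_star : ideal vector (per-objective minimum over the dataset)
    eps, rho : small positive scalars (illustrative defaults)
    """
    f_vals = np.asarray(f_vals, dtype=float)
    lam = np.asarray(lam, dtype=float)
    z_star = np.asarray(z_star, dtype=float)
    # Weighted worst-case gap to the slightly shifted ideal point.
    tch_term = np.max(lam * (f_vals - (z_star - eps)))
    # Small weighted-sum augmentation term.
    aug_term = rho * np.sum(lam * f_vals)
    return float(tch_term + aug_term)

# With eps = 0 and rho = 0.1: max(0.5 * 1, 0.5 * 2) + 0.1 * (0.5 + 1.0)
value = tch_aug([1.0, 2.0], [0.5, 0.5], [0.0, 0.0], eps=0.0, rho=0.1)
```

In the offline setting the true values $f_i(\bm{x})$ are unavailable, so `f_vals` would hold surrogate predictions instead.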

During the training process, for each sampled preference $\bm{\lambda}$, the Pareto set model outputs a solution $h_{\bm{\phi}}(\bm{\lambda})$ and is optimized to minimize the scalarized objective $g_{\text{tch\_aug}}(h_{\bm{\phi}}(\bm{\lambda})\mid\bm{\lambda})$ over all valid preferences: $\bm{\phi}^{\ast}=\arg\min_{\bm{\phi}}\mathbb{E}_{\bm{\lambda}\sim\Lambda}\,g_{\text{tch\_aug}}(\bm{x}=h_{\bm{\phi}}(\bm{\lambda})\mid\bm{\lambda})$. However, in offline MOO, solutions cannot be evaluated during the optimization process. Therefore, $M$ surrogate models $\hat{f}_i$, one per objective, are built from the offline dataset $\mathcal{D}$ to predict objective values when calculating Equation [1](https://arxiv.org/html/2606.15115#S3.E1). With the trained Pareto set model $h_{\bm{\phi}^{\ast}}$, we can obtain the Pareto set: $\mathcal{M}_{\text{ps}}=\{\bm{x}=h_{\bm{\phi}^{\ast}}(\bm{\lambda})\mid\bm{\lambda}\in\Lambda\}$, where $h_{\bm{\phi}^{\ast}}(\bm{\lambda})=\arg\min_{\bm{x}\in\mathcal{X}}g_{\text{tch\_aug}}(\bm{x}\mid\bm{\lambda}),\ \forall\bm{\lambda}\in\Lambda$.

### 3.3 Energy Model

In offline MOO, the objective function cannot be evaluated during the optimization process, so $M$ surrogate models, one per objective, are constructed from the offline dataset $\mathcal{D}$ to predict the objective values for any candidate solution. However, most existing surrogate models ignore OOD risk, which can lead to performance degradation or unsafe decisions in high-stakes applications. Therefore, explicit risk modeling and suppression are necessary in offline multi-objective optimization. To mitigate the negative impact of OOD solutions, ARCOO (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43)) introduces the energy model $E_{\bm{\omega}}$ to assign an energy value $E_{\bm{\omega}}(\bm{x})$ to each solution $\bm{x}$. It is realized as a neural network that maps solutions $\bm{x}\in\mathbb{R}^{D}$ to their associated energy $E_{\bm{\omega}}(\bm{x})\in\mathbb{R}$. The energy model is trained via contrastive divergence with Langevin dynamics negative sampling, and a risk suppression factor $R(\bm{x})$ is computed to dynamically weight optimization updates, suppressing OOD solutions while emphasizing in-distribution (ID) ones. The detailed formulation and training process for the energy model $E_{\bm{\omega}}(\bm{x})$ and the risk suppression factor $R(\bm{x})$ are deferred to Appendix [A](https://arxiv.org/html/2606.15115#A1).

## 4 The Proposed Method

In this section, we first provide an overview of the proposed method, diversity-driven offline multi-objective optimization (DOMOO), and then describe in detail the nested Pareto set learning with accumulative risk control and the diversity-driven solution selection strategy.

### 4.1 Methodology Overview

Offline MOO struggles to alleviate the OOD issue, which results in a severely imbalanced PF (i.e., solutions cluster in high-density regions, failing to cover the entire PF), damaging both the diversity and convergence of the solutions. To alleviate this issue, we propose DOMOO, a risk-aware offline MOO method via nested Pareto set learning. We provide the framework of our algorithm in Figure [2](https://arxiv.org/html/2606.15115#S4.F2) and the corresponding pseudo-code in Appendix [B](https://arxiv.org/html/2606.15115#A2). Specifically, we first train $M$ surrogate models, one for each objective. Based on these surrogate models, we perform nested Pareto set learning with accumulative risk control to obtain a Pareto set model. Finally, candidate solutions are generated by both the trained Pareto set model and the trained surrogate model, and then the proposed diversity-driven solution selection strategy is employed, resulting in a solution set with balanced diversity and convergence.

![Refer to caption](https://arxiv.org/html/2606.15115v1/x1.png)

Figure 2: The framework of diversity-driven offline multi-objective optimization via nested Pareto set learning. (a) Surrogate models are trained for each objective and an energy model is trained for risk control. (b) A nested Pareto set learning process with risk control is conducted to obtain a Pareto set model. (c) Candidate solutions are generated and then sequentially selected using the $\text{IGD}_{\text{offline}}$ indicator to ensure diversity, followed by the HV indicator to guarantee convergence.

### 4.2 Nested Pareto Set Learning with Risk Control

As described in Section [3.2](https://arxiv.org/html/2606.15115#S3.SS2), PSL (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)) trains a Pareto set model to map any valid preference $\bm{\lambda}\in\Lambda=\{\bm{\lambda}\in\mathbb{R}^{M}_{+}\mid\sum\lambda_i=1\}$ to its corresponding Pareto solution via scalarization. However, in offline settings, the OOD issue can mislead the Pareto set model by promoting solutions with unreliably estimated high performance, creating an unexpected diversity on the PF. To mitigate the OOD issue, we propose a nested Pareto set learning approach with risk control. This approach addresses the OOD-induced diversity loss by jointly optimizing the Pareto set model parameters and preferences in a nested manner, where the inner loop preference optimization explores underrepresented regions of the PF and incorporates risk control, while the outer loop model optimization improves solution quality under these risk-guided preferences.

While ARCOO (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43)) provides a principled risk control framework for single-objective offline optimization, extending it to the multi-objective setting is non-trivial for two fundamental reasons. First, preference-risk coupling: in Pareto set learning, the optimization landscape is explicitly conditioned on preference vectors, and OOD risk does not act uniformly over the solution space; it varies across objectives and interacts with preference gradients. Simply bounding the scalarized error without modeling how risk shifts preference dynamics is insufficient to prevent Pareto front distortion. Second, multi-objective dominance structure: surrogate overestimation in one objective can artificially dominate others under the Pareto dominance criterion, collapsing solution diversity. Our extension shows that, by incorporating the risk suppression factor $R(\bm{x})$ into the preference gradient update (Eq. ([2](https://arxiv.org/html/2606.15115#S4.E2))), DOMOO effectively damps optimization steps toward unreliable OOD regions while preserving gradient flow toward well-supported trade-offs.

Surrogate Model Training. In offline MOO, the true objective functions are inaccessible during optimization. Therefore, before the nested Pareto set learning begins, we construct $M$ surrogate models $\hat{f}_1,\dots,\hat{f}_M$ from the offline dataset $\mathcal{D}$, one for each objective. The complete surrogate model is then given by $\hat{\bm{f}}(\bm{x})=(\hat{f}_1(\bm{x};\bm{\theta}_1^{\ast}),\dots,\hat{f}_M(\bm{x};\bm{\theta}_M^{\ast}))$.

Modeling and Suppressing Accumulative Risk. In offline optimization, the risk of OOD is non-negligible, and neglecting this risk may result in performance degradation (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43)). Therefore, explicit risk modeling and suppression are necessary in offline MOO to mitigate OOD risk. Specifically, as shown in Figure [2](https://arxiv.org/html/2606.15115#S4.F2)(a), an energy model $E_{\bm{\omega}}$ is trained following ARCOO (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43)) to measure the risk of solutions, and then a risk suppression factor is computed as $R(\bm{x})=c\,(E_{\widetilde{Q}}-E_{\bm{\omega}}(\bm{x}))\big/(E_{\widetilde{Q}}-E_{\widetilde{P}})$, where $E_{\widetilde{Q}}=\mathbb{E}_{\bm{x}^{\prime}\sim\widetilde{Q}}[E_{\bm{\omega}}(\bm{x}^{\prime})]$, $E_{\widetilde{P}}=\mathbb{E}_{\bm{x}^{\prime}\sim\widetilde{P}}[E_{\bm{\omega}}(\bm{x}^{\prime})]$, and $c$ denotes the initial momentum (consistent with ARCOO). Here, $\widetilde{P}$ is the empirical distribution over the high-quality batch of solutions in the offline dataset, and $\widetilde{Q}$ is the high-risk distribution sampled by Langevin dynamics starting from $\widetilde{P}$. For more details about the energy model $E_{\bm{\omega}}$, please refer to Section [3.3](https://arxiv.org/html/2606.15115#S3.SS3).
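To see how the risk suppression factor behaves, the sketch below evaluates the formula on precomputed energy values. Training the energy model and sampling $\widetilde{Q}$ with Langevin dynamics are not shown; the energy arrays are hypothetical stand-ins, and the function name `risk_suppression` is an assumption of this example.

```python
import numpy as np

def risk_suppression(energy_x, energies_p, energies_q, c=1.0):
    """R(x) = c * (E_Q - E(x)) / (E_Q - E_P), computed from energy values.

    energy_x   : energy of the candidate solution(s)
    energies_p : energies of samples from the in-distribution batch (P tilde)
    energies_q : energies of samples from the high-risk distribution (Q tilde)
    c          : scaling constant (the "initial momentum" in the text)
    """
    e_p = float(np.mean(energies_p))  # Monte Carlo estimate of E_P
    e_q = float(np.mean(energies_q))  # Monte Carlo estimate of E_Q
    return c * (e_q - np.asarray(energy_x, dtype=float)) / (e_q - e_p)

# Hypothetical energies: low for in-distribution data, high for risky samples.
energies_p = np.array([0.0, 0.2])   # mean 0.1
energies_q = np.array([1.0, 1.2])   # mean 1.1
R = risk_suppression([0.1, 1.1, 0.6], energies_p, energies_q)
```

A candidate whose energy equals the in-distribution mean receives $R=c$, one at the high-risk mean receives $R=0$, and energies in between are scaled linearly, so update steps shrink as a solution drifts toward the high-risk region.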

Nested Pareto Set Learning. The nested Pareto set learning process consists of three phases: pretraining, exploration, and preference gradient update. In each iteration, the preferences are updated first (inner loop), and then the Pareto set model $h_{\bm{\phi}}$ is trained to convergence using these updated preferences as targets (outer loop).

In the pretraining phase, we leverage the offline PF $(\bm{X}_{\text{off}},\bm{Y}_{\text{off}})$ to provide a better initialization for the subsequent training process. Specifically, during pretraining, we sample preferences from the offline preferences $\Lambda_{\text{offline}}=\left\{\bm{\lambda}_{\text{off}}^{(i)}={\bm{\lambda}_{\text{off}}^{(i)}}^{\prime}\big/\left\|{\bm{\lambda}_{\text{off}}^{(i)}}^{\prime}\right\|_1\right\}_{i=1}^{n}$, where $n$ is the number of solutions in the offline PF and ${\bm{\lambda}_{\text{off}}^{(i)}}^{\prime}=\left(1/(y_{\text{off},1}^{(i)}-z_1^{\ast}),\cdots,1/(y_{\text{off},M}^{(i)}-z_M^{\ast})\right)$. Here, $\bm{z}^{\ast}=(z_1^{\ast},\cdots,z_M^{\ast})$ is the ideal vector for the objective $\bm{f}(\bm{x})$ and $\bm{y}_{\text{off}}^{(i)}$ is the objective vector of the $i$-th solution in the offline PF. By sampling preferences in this way, the pretraining process exploits the structure of the offline PF, so the Pareto set model starts closer to the optimal solution distribution.
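The construction of the offline preferences can be sketched as below. One caveat is labeled as an assumption: if $\bm{z}^{\ast}$ is taken as the per-objective minimum over the dataset, some offline PF points have a zero gap in one objective, and the source does not say how that case is handled. The small floor `tiny` used here is a guard chosen for this example, and `offline_preferences` is an illustrative name.

```python
import numpy as np

def offline_preferences(Y_off, z_star, tiny=1e-8):
    """Map offline PF objective vectors (n x M) to preferences on the simplex.

    Each raw preference is the element-wise reciprocal of the gap to the
    ideal vector, then normalized by its L1 norm.
    tiny : floor on the gap to avoid division by zero (assumption of this sketch)
    """
    Y_off = np.asarray(Y_off, dtype=float)
    z_star = np.asarray(z_star, dtype=float)
    gap = np.maximum(Y_off - z_star, tiny)
    lam_raw = 1.0 / gap
    return lam_raw / np.sum(np.abs(lam_raw), axis=1, keepdims=True)

# Toy offline front with a toy ideal vector at the origin.
Y_off = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
lam_off = offline_preferences(Y_off, z_star=[0.0, 0.0])
# lam_off rows: [0.75, 0.25], [0.5, 0.5], [0.25, 0.75]
```

A point that is close to the ideal value in objective $i$ receives a large weight $\lambda_i$, which matches the Tchebycheff view in which the weighted gaps $\lambda_i\,(y_i - z_i^{\ast})$ are equal across objectives at the solution associated with that preference.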

Then, in the exploration phase, the preferences are sampled from the valid preferences: $\Lambda_t=\{\bm{\lambda}_t^{(b)}\sim\text{Dirichlet}(\alpha)\subset\Lambda\}_{b=1}^{B}$, where $B$ is the batch size of the solutions in each iteration. Dirichlet($\alpha$) with $\alpha=\bm{1}_M$, where $\bm{1}_M$ denotes the $M$-dimensional all-ones vector, is defined on the simplex $\{\bm{\lambda}\in\mathbb{R}^{M}_{+}\mid\sum\lambda_i=1\}$, enabling diverse trade-off sampling and preventing overfitting to a narrow set of preferences (Navon et al., [2021](https://arxiv.org/html/2606.15115#bib.bib15)). This stage serves as a pure exploration phase, enabling the model to be trained over the entire preference space and thus improving its generalization across different preferences.
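Sampling a batch of exploration preferences from the flat Dirichlet distribution takes only a few lines with NumPy. The helper name `sample_preferences` and the `seed` argument are conveniences of this sketch, not part of the method description.

```python
import numpy as np

def sample_preferences(batch_size, num_objectives, seed=None):
    """Draw batch_size preference vectors from Dirichlet(1_M) on the simplex."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(num_objectives)  # all-ones concentration: uniform on the simplex
    return rng.dirichlet(alpha, size=batch_size)

# A batch of B = 8 preferences for M = 3 objectives.
lams = sample_preferences(batch_size=8, num_objectives=3, seed=0)
```

Every row is non-negative and sums to one, so each sample is a valid preference in $\Lambda$.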

Finally, in the preference gradient update phase, preferences are adaptively updated using gradient information. To mitigate OOD risk, we incorporate explicit risk modeling and suppression into the preference update. The preference gradient update phase with accumulative risk control is defined as follows:

$$\bm{\lambda}_{t}^{(b)} = \bm{\lambda}_{t-1}^{(b)} - \eta_{\text{pref}}\, R\!\left(\bm{x} = h_{\bm{\phi}}\!\left(\bm{\lambda}_{t-1}^{(b)}\right)\right) \cdot \nabla_{\bm{\lambda}} \hat{g}_{\text{tch\_aug}}\!\left(\bm{x} = h_{\bm{\phi}}(\bm{\lambda}) \mid \bm{\lambda}\right)\bigg|_{\bm{\lambda}_{t-1}^{(b)}}, \quad b = 1, 2, \dots, B, \tag{2}$$

where $\eta_{\text{pref}}$ is the learning rate for preference optimization, $R(\bm{x})$ is a risk suppression factor (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43)) that controls the OOD risk of solution $\bm{x}$, and $\hat{g}_{\text{tch\_aug}}(\cdot)$ is the augmented Tchebycheff scalarization with the trained surrogate models. Specifically, the augmented Tchebycheff scalarization is defined as

$$\hat{g}_{\text{tch\_aug}}(\bm{x} \mid \bm{\lambda}) = \max_{1 \leq i \leq M}\left\{\lambda_{i}\left(\hat{f}_{i}(\bm{x}; \bm{\theta}_{i}^{\ast}) - (z_{i}^{\ast} - \varepsilon)\right)\right\} + \rho \sum_{i=1}^{M} \lambda_{i} \hat{f}_{i}(\bm{x}; \bm{\theta}_{i}^{\ast}),$$

in which $\hat{f}_{i}(\cdot; \bm{\theta}_{i}^{\ast})$ denotes the trained surrogate model for the $i$-th objective. Although Eq. ([2](https://arxiv.org/html/2606.15115#S4.E2)) minimizes $\hat{g}_{\text{tch\_aug}}$, its gradient is evaluated on $\bm{x} = h_{\bm{\phi}}(\bm{\lambda})$ from the current model. Hence, preferences leading to poor solutions produce larger gradients and are updated more, implicitly shifting exploration toward underrepresented regions. The outer loop (Eq. ([3](https://arxiv.org/html/2606.15115#S4.E3))) then improves solution quality under these updated preferences, jointly balancing diversity and quality.
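The following sketch shows one risk-scaled preference step in the spirit of Eq. (2). It rests on several assumptions that are not from the paper: the Pareto set model `h_phi`, the surrogate, and the constant risk factor are toy stand-ins, the defaults for `eps` and `rho` are illustrative, and the gradient is approximated by central finite differences to keep the example dependency-free (an autodiff framework would normally be used). Any renormalization of the updated preference is not specified in this section and is therefore omitted.

```python
import numpy as np

def g_tch_aug(f_hat, lam, z_star, eps=0.1, rho=0.01):
    """Augmented Tchebycheff value of predicted objectives f_hat under preference lam."""
    return np.max(lam * (f_hat - (z_star - eps))) + rho * np.sum(lam * f_hat)

def num_grad(fn, lam, h=1e-6):
    """Central finite-difference gradient of the scalar function fn at lam."""
    grad = np.zeros_like(lam)
    for i in range(lam.size):
        step = np.zeros_like(lam)
        step[i] = h
        grad[i] = (fn(lam + step) - fn(lam - step)) / (2 * h)
    return grad

def preference_step(lam, h_phi, surrogate, risk, z_star, eta_pref=0.05):
    """lam <- lam - eta_pref * R(h_phi(lam)) * grad_lam g_tch_aug(h_phi(lam) | lam)."""
    objective = lambda l: g_tch_aug(surrogate(h_phi(l)), l, z_star)
    return lam - eta_pref * risk(h_phi(lam)) * num_grad(objective, lam)

# Hypothetical stand-ins for the trained components.
h_phi = lambda lam: lam[:1]                                   # 1-D design from lambda_1
surrogate = lambda x: np.array([x[0] ** 2, (x[0] - 1.0) ** 2])
risk = lambda x: 0.5                                          # constant placeholder for R(x)
z_star = np.zeros(2)

lam_prev = np.array([0.3, 0.7])
lam_next = preference_step(lam_prev, h_phi, surrogate, risk, z_star)
print(lam_next)
```

Because the step is multiplied by the risk factor, a factor of zero leaves the preference unchanged, while larger values let the scalarization gradient move it further.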

![Refer to caption](https://arxiv.org/html/2606.15115v1/x2.png)

Figure 3: Visualization of the nested PSL. The solutions generated by DOMOO after each preference update phase in the IN-1K/MOP7 task are visualized.

As shown in Figure [3](https://arxiv.org/html/2606.15115#S4.F3), after the exploration and preference gradient update phases, the solutions generated by the Pareto set model become more uniformly distributed, which better ensures the diversity of the solution set.

After updating the preferences, gradient descent is used to train the Pareto set model $h_{\bm{\phi}}$ with the trained surrogate model $\hat{\bm{f}}(\cdot)$, where $\eta_{\text{psl}}$ is the learning rate for the PSL model:

$$\bm{\phi} = \bm{\phi} - \frac{\eta_{\text{psl}}}{B} \sum_{b=1}^{B} \nabla_{\bm{\phi}} \hat{g}_{\text{tch\_aug}}\!\left(\bm{x}_{t}^{(b)} = h_{\bm{\phi}}\!\left(\bm{\lambda}_{t}^{(b)}\right) \,\big|\, \bm{\lambda}_{t}^{(b)}\right). \tag{3}$$
Through the nested Pareto set learning approach, we obtain the trained Pareto set model $h_{\bm{\phi}^{\ast}}$, which can adapt to diverse PF geometries and approximate the Pareto set effectively. Notably, the candidate solutions are guided by risk-modulated preferences from Eq. ([2](https://arxiv.org/html/2606.15115#S4.E2)) throughout training, so the final selection stage can focus on the diversity–convergence trade-off.
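To make the outer update of Eq. (3) concrete, the sketch below runs it on a toy problem. Everything specific here is an assumption for illustration: the affine Pareto set model, the two quadratic surrogates, the values of `EPS` and `RHO`, and the use of finite differences in place of backpropagation. The preference batch is held fixed for brevity, whereas in the method it changes across iterations.

```python
import numpy as np

EPS, RHO = 0.1, 0.01             # illustrative values for epsilon and rho
Z_STAR = np.zeros(2)             # ideal vector of the toy problem

def surrogate(x):
    """Toy stand-in for the trained surrogates: f1 = x^2, f2 = (x - 1)^2."""
    return np.stack([x ** 2, (x - 1.0) ** 2], axis=-1)

def h_phi(phi, lam):
    """Toy Pareto set model: a scalar design that is affine in lambda_1."""
    return phi[0] + phi[1] * lam[:, 0]

def batch_loss(phi, lam):
    """(1/B) * sum_b g_tch_aug(h_phi(lam_b) | lam_b)."""
    f_hat = surrogate(h_phi(phi, lam))                       # shape (B, M)
    tch = np.max(lam * (f_hat - (Z_STAR - EPS)), axis=1)
    return np.mean(tch + RHO * np.sum(lam * f_hat, axis=1))

def psl_step(phi, lam, eta_psl=0.1, h=1e-6):
    """One gradient-descent update of phi using a finite-difference gradient."""
    grad = np.zeros_like(phi)
    for i in range(phi.size):
        step = np.zeros_like(phi)
        step[i] = h
        grad[i] = (batch_loss(phi + step, lam) - batch_loss(phi - step, lam)) / (2 * h)
    return phi - eta_psl * grad

lam_1 = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
lam = np.stack([lam_1, 1.0 - lam_1], axis=1)                 # fixed batch, B = 5
phi = np.array([2.0, 0.0])
loss_before = batch_loss(phi, lam)
for _ in range(200):
    phi = psl_step(phi, lam)
loss_after = batch_loss(phi, lam)
print(loss_before, loss_after)
```

On this toy problem the averaged scalarization drops as the model learns to map each preference to a different point between the two single-objective optima.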

### 4.3 Diversity-Driven Solution Selection Strategy

Table 1: Comparison of average HV ranks achieved by different offline MOO methods across different tasks in Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)). For each task, the top three methods are highlighted using (1st), (2nd), and (3rd) formatting. $\mathcal{D}(\text{best})$ denotes the best subset in the offline dataset (i.e., with the highest HV), and the last column reports the average rank across all tasks. “N/A” marks cases where a method cannot complete on a task, either because it fails to return feasible solutions or because it exceeds practical limits in runtime or GPU memory on that task.

After the nested Pareto set learning, we have obtained a practical Pareto set model $h_{\bm{\phi}^{\ast}}$ that can easily approximate the Pareto set with the valid preferences $\Lambda$. However, in real offline MOO scenarios, the deployment of solution sets is often constrained by scale limitations, e.g., only a limited number of solutions can be evaluated. Therefore, how to select the optimal subset from the learned Pareto solution set becomes a key challenge. In this paper, we propose a diversity-driven solution selection strategy that combines two indicators, the offline inverted generational distance ($\text{IGD}_{\text{offline}}$) and hypervolume (HV), to better balance diversity and convergence.

The traditional inverted generational distance (IGD) assumes access to the true PF to evaluate how well a solution set covers it.

In offline MOO, however, the true front is not observable since no additional evaluations are permitted. Therefore, we adapt IGD to the offline regime by replacing the unknown true front with an offline PF estimated from the dataset and by introducing a shift to form a stricter reference. The full definition is given by

$$\text{IGD}_{\text{offline}} = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \leq j \leq |\bm{X}_{\text{cand}}|} \left\|\bm{y}_{\text{off}}^{(i)} - \beta y^{\prime} \bm{1}_{M} - \hat{\bm{y}}_{\text{cand}}^{(j)}\right\|_{2}, \tag{4}$$

where $n$ is the number of solutions in the offline PF, $\bm{y}_{\text{off}}^{(i)}$ is the objective vector of the $i$-th solution in the offline PF, $|\bm{X}_{\text{cand}}|$ denotes the number of candidate solutions in $\bm{X}_{\text{cand}}$, and $\hat{\bm{y}}_{\text{cand}}^{(j)}$ is the objective vector of the $j$-th solution in the candidate solutions $\bm{X}_{\text{cand}}$ predicted by the surrogate model. Here, $\beta$ is a scaling factor and $y^{\prime}$ is a shift value, defined as $y^{\prime} = \max_{1 \leq i \leq n} \min_{1 \leq m \leq M} y_{\text{off},m}^{(i)}$, where $y_{\text{off},m}^{(i)}$ denotes the $m$-th objective value of the $i$-th solution in the offline PF. The shift value $y^{\prime}$ is introduced to construct a more challenging reference front, allowing a stricter evaluation of optimization performance in terms of convergence and diversity. All objective values are min-max normalized per objective based on the offline dataset, ensuring fair scale-invariant comparisons. It is worth noting that the construction of $\text{IGD}_{\text{offline}}$ does not favor solutions that stay close to the offline data, as the reference front is normalized and shifted toward the ideal point, encouraging exploration and broad Pareto-front coverage rather than conservative interpolation.
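Eq. (4) translates directly into a few lines of NumPy. The sketch below assumes the objective vectors have already been min-max normalized as described above; the function name `igd_offline` and the toy inputs are illustrative.

```python
import numpy as np

def igd_offline(y_off, y_cand, beta):
    """Mean distance from each shifted offline-PF point to its nearest candidate.

    y_off : (n, M) offline-PF objective vectors (assumed already normalized).
    y_cand: (K, M) surrogate-predicted objective vectors of the candidates.
    beta  : scaling factor applied to the shift value y'.
    """
    y_shift = np.max(np.min(y_off, axis=1))       # y' = max_i min_m y_off,m^(i)
    ref = y_off - beta * y_shift                  # subtract beta * y' * 1_M from every point
    dist = np.linalg.norm(ref[:, None, :] - y_cand[None, :, :], axis=2)   # (n, K)
    return dist.min(axis=1).mean()

y_off = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
print(igd_offline(y_off, y_off, beta=0.2))          # candidates sitting on the offline PF
print(igd_offline(y_off, y_off - 0.1, beta=0.2))    # candidates on the shifted front
```

In this example $y^{\prime} = 0.5$, so with $\beta = 0.2$ the reference front lies $0.1$ below the offline PF in every objective. Candidates that merely reproduce the offline PF score $0.1\sqrt{2} \approx 0.141$, while candidates that reach the shifted front score $0$, which illustrates why the indicator does not reward staying on the offline data.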

Before performing the solution selection strategy, the trained Pareto set model $h_{\bm{\phi}^{\ast}}$ is employed to generate $K$ candidate solutions $\bm{X}_{\text{ps}} = \{\bm{x}_{\text{ps}}^{(k)} = h_{\bm{\phi}^{\ast}}(\bm{\lambda}_{\text{ps}}^{(k)})\}_{k=1}^{K}$, where $\bm{\lambda}_{\text{ps}}^{(k)} \sim \text{Dirichlet}(\alpha) \subset \Lambda$. To further enhance the diversity of the candidate solution set, we combine the $K$ solutions generated by our trained Pareto set model $h_{\bm{\phi}^{\ast}}$ with another $K$ solutions produced by the surrogate model $\hat{\bm{f}}$, thereby obtaining the complete candidate solutions $\bm{X}_{\text{cand}}$.

Diversity-Driven Solution Selection. To address the diversity challenge posed by the HV indicator in offline settings, which is demonstrated in Appendix [I](https://arxiv.org/html/2606.15115#A9), we select solutions based on both $\text{IGD}_{\text{offline}}$ and HV indicators. The core issue is that, due to limited offline data, surrogates often extrapolate a spurious Pareto front wider than the true one in OOD regions. HV’s marginal-volume mechanism then uniformly picks solutions along this illusory front, thereby selecting many tightly clustered, low-quality solutions despite acceptable surrogate scores. In contrast, $\text{IGD}_{\text{offline}}$ penalizes deviation from the offline Pareto front, acting as a conservative filter against OOD artifacts.

Notably, $\text{IGD}_{\text{offline}}$ and HV are complementary indicators: $\text{IGD}_{\text{offline}}$ emphasizes diversity and the uniform coverage of the PF, whereas HV focuses more on solution quality. Therefore, combining $\text{IGD}_{\text{offline}}$ with HV allows us to better balance diversity and convergence while mitigating the limitations of using HV alone.

Concretely, we first use the $\text{IGD}_{\text{offline}}$ indicator to greedily select up to 128 solutions from the candidates $\bm{X}_{\text{cand}}$. This stage encourages the selection of solutions that cover different regions of the offline PF, thereby enhancing the diversity of the solutions. The budget of 128 solutions was selected through hyperparameter analysis rather than heuristically: varying this budget from 0 to 256 (see Appendix [K](https://arxiv.org/html/2606.15115#A11)) shows stable performance with a clear peak around 128, indicating that this split balances diversity preservation and convergence refinement. Subsequently, we select the remaining solutions from the candidates $\bm{X}_{\text{cand}}$ using the HV indicator, which maximizes the hypervolume in the objective space and serves as a convergence-oriented filling step. With this diversity-driven strategy, combining $\text{IGD}_{\text{offline}}$ for screening and HV for filling, we obtain the final solution set of 256 solutions, which balances convergence and diversity.
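The two-stage selection can be sketched as follows for a two-objective minimization problem. Several details are assumptions made for the example: the text states only that the first stage is greedy, so a simple forward greedy rule is used in both stages; the hypervolume routine handles two objectives only; and `ref_front` stands for the shifted offline PF of Eq. (4). The budgets are scaled down from 128 and 256 to keep the toy run small.

```python
import numpy as np

def igd(ref, pts):
    """Mean distance from each reference point to its nearest point in pts."""
    d = np.linalg.norm(ref[:, None, :] - pts[None, :, :], axis=2)
    return d.min(axis=1).mean()

def hv_2d(pts, ref_point):
    """Hypervolume of a 2-objective minimization set w.r.t. ref_point."""
    pts = pts[np.all(pts < ref_point, axis=1)]    # keep points that dominate ref_point
    pts = pts[np.argsort(pts[:, 0])]              # sweep in increasing f1
    hv, best_f2 = 0.0, ref_point[1]
    for f1, f2 in pts:
        if f2 < best_f2:                          # point adds a new rectangle
            hv += (ref_point[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv

def greedy_select(score, n_pick, chosen, n_cand):
    """Forward greedy: repeatedly add the index that maximizes score(subset)."""
    chosen = list(chosen)
    for _ in range(n_pick):
        remaining = [j for j in range(n_cand) if j not in chosen]
        if not remaining:
            break
        chosen.append(max(remaining, key=lambda j: score(chosen + [j])))
    return chosen

def diversity_driven_selection(y_cand, ref_front, ref_point, n_igd, n_total):
    """Stage 1 minimizes IGD to ref_front; stage 2 fills the rest by HV."""
    n = len(y_cand)
    stage1 = greedy_select(lambda idx: -igd(ref_front, y_cand[idx]), n_igd, [], n)
    fill = lambda idx: hv_2d(y_cand[idx], ref_point)
    return greedy_select(fill, n_total - len(stage1), stage1, n)

y_cand = np.array([[0.1, 0.9], [0.3, 0.6], [0.5, 0.5],
                   [0.6, 0.3], [0.9, 0.1], [0.7, 0.7]])
ref_front = np.array([[0.1, 0.7], [0.4, 0.4], [0.7, 0.1]])
selected = diversity_driven_selection(y_cand, ref_front, np.array([1.0, 1.0]),
                                      n_igd=2, n_total=4)
print(selected)
```

The first stage fixes a spread of solutions near the reference front before the hypervolume criterion is allowed to act, so the HV fill can only add to, not replace, the coverage obtained from $\text{IGD}_{\text{offline}}$.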

## 5 Experiment

Table 2: Comparison of average $\text{IGD}_{\text{offline}}$ ranks. Details are the same as Table [1](https://arxiv.org/html/2606.15115#S4.T1).

In this section, we conduct a comprehensive empirical evaluation of DOMOO against a series of existing offline MOO approaches across multiple benchmark tasks. We begin by outlining the experimental setup, encompassing tasks, compared methods, training settings, and evaluation metrics. We then report the experimental results and perform an ablation study and a hyper-parameter analysis. The experiments are designed to answer the following four questions:

1. Q1: Can DOMOO achieve better performance than other offline MOO methods in terms of convergence?
2. Q2: Can DOMOO balance diversity and convergence?
3. Q3: How do core modules contribute to diversity and convergence?
4. Q4: How do hyper-parameters affect the diversity of the solution set obtained by DOMOO?

The four questions are answered sequentially in this section. The full implementation is available at [https://github.com/YaolinWen/DOMOO](https://github.com/YaolinWen/DOMOO).

Benchmark and Tasks. We evaluate DOMOO on Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), which includes five categories of offline multi-objective tasks: Synthetic functions, MO-NAS, MORL, Sci-Design, and RE. These tasks span diverse domains, objective dimensionalities, and optimization difficulties, providing a comprehensive testbed for offline MOO. Task details are provided in Appendix [C](https://arxiv.org/html/2606.15115#A3).

Compared Methods. Our evaluation primarily adopts the baselines from Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), encompassing both DNN-based and GP-based approaches. To broaden the scope of method categories, we additionally evaluate ParetoFlow (Yuan et al., [2025](https://arxiv.org/html/2606.15115#bib.bib13)), a recent flow-based generative method. Detailed descriptions of all baselines are provided in Appendix [D](https://arxiv.org/html/2606.15115#A4).

Evaluation. Following Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), we evaluate each method by generating 256 solutions and querying the true objective functions. We report HV (Yuan et al., [2025](https://arxiv.org/html/2606.15115#bib.bib13)), which measures the dominated volume with respect to a reference point (i.e., Nadir Point in Figure [3](https://arxiv.org/html/2606.15115#S4.F3)), where a higher HV indicates better performance. To address the bias of HV toward extreme solutions in offline settings, we also report $\text{IGD}_{\text{offline}}$ as in Section [4.3](https://arxiv.org/html/2606.15115#S4.SS3).

### 5.1 Experimental Settings

Table 3: Ablation Study on the HV and $\text{IGD}_{\text{offline}}$ Indicator Performance of DOMOO.

### 5.2 The Performance of DOMOO

About Superiority and Convergence (To Q1). Table [1](https://arxiv.org/html/2606.15115#S4.T1) reports the average HV rank of all compared offline MOO methods. Detailed results at the $100^{\text{th}}$ and $50^{\text{th}}$ percentiles are provided in Appendix [G](https://arxiv.org/html/2606.15115#A7). We make the following observations:

(1) As shown in Table [1](https://arxiv.org/html/2606.15115#S4.T1), DOMOO achieves the best average rank across all tasks, verifying its effectiveness and convergence.

(2) End-to-End, Multi-Head, and Multiple Models consistently outperform $\mathcal{D}(\text{best})$, highlighting the effectiveness of learned surrogates and generative models in discovering solutions beyond the offline dataset.

(3) GP-based methods are often less competitive. This is partly because they are primarily designed for online optimization and may struggle in offline settings. Moreover, their high computational cost and long runtime make them impractical for complex tasks, sometimes leading to failure to produce any solution within the time budget (i.e., N/A in Table [1](https://arxiv.org/html/2606.15115#S4.T1)).

(4) DOMOO performs worse on a few extremely discrete tasks (e.g., C-10/MOP1, C-10/MOP2, IN-1K/MOP5), mainly because these tasks require very high-dimensional one-hot encodings, resulting in extremely sparse inputs that are difficult for neural Pareto-set models to learn. The core challenge lies in the mismatch between continuous optimization and one-hot discrete spaces: valid data lie only on sparse vertices, while intermediate regions are largely OOD. Although the risk model mitigates this issue, its effectiveness is limited by the vast OOD space. Consequently, baseline methods relying on evolutionary search are less impacted by such discrete optimization tasks. Importantly, NAS tasks do not exhibit such extreme sparsity, as their discrete operations have low cardinality and structured choices; therefore, DOMOO still ranks among the top methods on most NAS subtasks.

(5) On several real-world tasks (e.g., MORL and Sci-Design), methods such as MultipleModels+COMs or MultipleModels+IOM occasionally achieve higher surrogate-based HV scores than DOMOO. We attribute this to a difference in design philosophy: these baselines incorporate conservatism primarily at the per-objective surrogate level, which reduces but does not eliminate the risk of surrogate overestimation under the multi-objective Pareto dominance structure, where overestimation in a single objective can still cause a solution to incorrectly dominate others. DOMOO, in contrast, integrates risk suppression directly into the preference gradient update (Eq. ([2](https://arxiv.org/html/2606.15115#S4.E2))), providing preference-dependent, multi-objective-aware risk modulation. The Pareto set model is then optimized via standard gradient descent (Eq. ([3](https://arxiv.org/html/2606.15115#S4.E3))) under these risk-guided preferences. This design trades a small amount of surrogate HV for higher reliability under the true objectives. In real-world applications such as molecular design and robotics control, deploying solutions with falsely predicted high efficacy can lead to costly experimental failures or safety risks; thus, robustness to surrogate errors often outweighs nominal surrogate scores. Importantly, DOMOO achieves the best average rank across all benchmarks, indicating superior holistic performance rather than overfitting to specific tasks.

In a nutshell, the results verify that DOMOO handles offline MOO tasks well and achieves superior optimization performance compared to other offline MOO methods, which answers Q1.

About Diversity (To Q2). Table [2](https://arxiv.org/html/2606.15115#S5.T2) reports the average $\text{IGD}_{\text{offline}}$ ranks based on the $100^{\text{th}}$ percentile. Detailed results, including the $50^{\text{th}}$ percentile, are provided in Appendix [H.1](https://arxiv.org/html/2606.15115#A8.SS1) and Appendix [H.2](https://arxiv.org/html/2606.15115#A8.SS2). We make the following observations: (1) As shown in Table [2](https://arxiv.org/html/2606.15115#S5.T2), DOMOO achieves the best average ranks on most tasks, highlighting its strong solution diversity. (2) On RE tasks, most methods outperform the offline dataset in terms of HV, yet many perform worse when evaluated by $\text{IGD}_{\text{offline}}$. This discrepancy highlights practical limitations of HV in offline settings: inaccurate reference-point estimation and model-induced errors can make HV fail to faithfully reflect the diversity of the solution set. $\text{IGD}_{\text{offline}}$ penalizes unbalanced distributions, providing a more informative assessment of overall coverage. Overall, the results indicate that DOMOO makes a better trade-off between the convergence and diversity of the solutions, which answers Q2.

### 5.3 Ablation Study

About the Contribution of Key Modules (To Q3). We evaluate the contribution of DOMOO’s components by comparing the full method against five ablated variants: w.o. ARC (removing accumulative risk control), w.o. NPSL (removing nested Pareto set learning), w.o. PSMG (removing Pareto set model generation), w.o. SMG (removing surrogate model generation), and w.o. DDSS (removing diversity-driven solution selection, replacing it with standard HV selection). Detailed configurations for these variants are provided in Appendix [J](https://arxiv.org/html/2606.15115#A10). The results are reported in Table [3](https://arxiv.org/html/2606.15115#S5.T3). We observe that removing nested Pareto set learning (w.o. NPSL) leads to the most significant performance drop, confirming the importance of preference-conditioned solution refinement. Excluding surrogate model generation (w.o. SMG) also notably degrades solution quality. Furthermore, the drops in w.o. PSMG and w.o. DDSS verify their roles in maintaining solution diversity, while the drop in w.o. ARC supports the necessity of risk-aware optimization. In summary, all modules contribute meaningfully to DOMOO’s overall performance, which answers Q3.

### 5.4 Hyper-Parameter Analysis

About the Impact of Hyper-Parameters in DOMOO (To Q4). We found that the chosen hyper-parameters, namely the number of exploration steps $T_{\text{exp}}$ in nested Pareto set learning, the scaling factor $\beta$ in $\text{IGD}_{\text{offline}}$, $K$ in DDSS, and the risk ratio of the energy models, yield robust performance across most experiments, with only a few discrete problems necessitating fine-tuning. For further details, refer to Appendix [K](https://arxiv.org/html/2606.15115#A11).

## 6 Conclusion and Discussion

This paper focuses on achieving better diversity while maintaining satisfactory convergence of the solution set in offline MOO and proposes DOMOO, an offline MOO method via Nested Pareto Set Learning\. DOMOO integrates nested Pareto set learning with risk control and the proposed diversity\-driven solution selection strategy to efficiently generate diverse and reliable solutions in offline MOO\.

Despite its promising performance, DOMOO has several limitations that point to directions for future work\. First, DOMOO is relatively less effective on highly discrete tasks with high\-cardinality one\-hot encodings, where the continuous optimization paradigm struggles with the sparse, vertex\-only structure of the search space\. A promising remedy is to adopt more compact latent representations \(e\.g\., VAEs\) or to develop hybrid architectures that can directly operate on categorical variables\. Second, the risk control mechanism relies on the energy model’s ability to coarsely distinguish ID from OOD regions; while our sensitivity analysis shows robustness to model capacity, developing calibration methods with formal guarantees on risk estimation quality is an important direction\. Third, extending DOMOO to higher\-dimensional objective spaces remains an open challenge due to the increased complexity of Pareto front representation and the growing difficulty of preference sampling in high\-dimensional simplices\. Finally, while DOMOO shows strong performance across benchmarks, further validation on a broader range of real\-world deployment scenarios, particularly in safety\-critical applications, is needed\.

## Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments\. This work is supported by the National Key Research and Development Program of China under Grant 2024YFC3308503, the National Natural Science Foundation of China under Grant 62476091, and Ant Group\.

## Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.

## References

- C. Bian, Y. Zhou, M. Li, and C. Qian (2025) Stochastic population update can provably be helpful in multi-objective evolutionary algorithms. Artificial Intelligence 341, pp. 104308.
- D. H. Brookes, H. Park, and J. Listgarten (2019) Conditioning by adaptive sampling for robust design. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, pp. 773–782.
- Y. Chemingui, A. Deshwal, T. N. Hoang, and J. R. Doppa (2024) Offline model-based optimization via policy-guided gradient search. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, pp. 11230–11239.
- C. Chen, C. Beckham, Z. Liu, X. Liu, and C. Pal (2023) Parallel-mentoring for offline model-based optimization. In Advances in Neural Information Processing Systems 36, New Orleans, LA.
- Z. Chen, V. Badrinarayanan, C. Lee, and A. Rabinovich (2018) GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 793–802.
- S. Daulton, M. Balandat, and E. Bakshy (2021) Parallel Bayesian optimization of multiple noisy objectives with expected hypervolume improvement. In Advances in Neural Information Processing Systems 34, Virtual Event, pp. 2187–2200.
- K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), pp. 182–197.
- Y. Du and I. Mordatch (2019) Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems 32, Vancouver, Canada, pp. 3603–3613.
- M. Ehrgott (2005) Multicriteria optimization (2nd ed.). Springer.
- J. Fu and S. Levine (2021) Offline model-based optimization via normalized maximum likelihood estimation. In Proceedings of the 9th International Conference on Learning Representations, Virtual Event.
- C. J. Geyer (1992) Practical Markov chain Monte Carlo. Statistical Science 7(4), pp. 473–483.
- G. E. Hinton (2002) Training products of experts by minimizing contrastive divergence. Neural Computation 14(8), pp. 1771–1800.
- C. Hvarfner, F. Hutter, and L. Nardi (2022) Joint entropy search for maximally-informed Bayesian optimization. In Advances in Neural Information Processing Systems 35, New Orleans, LA, pp. 11494–11506.
- I. Kaliszewski (1987) A modified weighted Tchebycheff metric for multiple objective programming. Computers & Operations Research 14(4), pp. 315–323.
- J. D. Knowles (2006) ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation 10(1), pp. 50–66.
- A. Kumar and S. Levine (2020) Model inversion networks for model-based optimization. In Advances in Neural Information Processing Systems 33, Virtual Event, pp. 5126–5137.
- B. Li, Z. Di, Y. Lu, H. Qian, F. Wang, P. Yang, K. Tang, and A. Zhou (2025) Expensive multi-objective Bayesian optimization based on diffusion models. In Proceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, PA, pp. 27063–27071.
- B. Li, J. Li, K. Tang, and X. Yao (2015) Many-objective evolutionary algorithms: A survey. ACM Computing Surveys 48(1), pp. 13:1–13:35.
- X. Lin, Z. Yang, X. Zhang, and Q. Zhang (2022) Pareto set learning for expensive multi-objective optimization. In Advances in Neural Information Processing Systems 35, New Orleans, LA.
- Z. Liu, H. Wang, and Y. Jin (2023) Performance indicator-based adaptive model selection for offline data-driven multiobjective evolutionary optimization. IEEE Transactions on Cybernetics 53(10), pp. 6263–6276.
- H. Lu, H. Qian, Y. Wu, Z. Liu, Y. Zhang, A. Zhou, and Y. Yu (2023) Degradation-resistant offline optimization via accumulative risk control. In Proceedings of the 26th European Conference on Artificial Intelligence, Kraków, Poland, pp. 1609–1616.
- Z. Lu, I. Whalen, Y. D. Dhebar, K. Deb, E. D. Goodman, W. Banzhaf, and V. N. Boddeti (2020) NSGA-Net: Neural architecture search using multi-objective genetic algorithm (extended abstract). In Proceedings of the 29th International Joint Conference on Artificial Intelligence, Virtual Event, pp. 4750–4754.
- S. M. Mashkaria, S. Krishnamoorthy, and A. Grover (2023) Generative pretraining for black-box optimization. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, pp. 24173–24197.
- A. Navon, A. Shamsian, E. Fetaya, and G. Chechik (2021) Learning the Pareto front with hypernetworks. In Proceedings of the 9th International Conference on Learning Representations, Virtual Event.
- C. A. Nicolaou and N. Brown (2013) Multi-objective optimization methods in drug design. Drug Discov. Today: Technologies 10(3), pp. e427–e435.
- E. Nijkamp, M. Hill, S. Zhu, and Y. N. Wu (2019) Learning non-convergent non-persistent short-run MCMC toward energy-based model. In Advances in Neural Information Processing Systems 32, Vancouver, Canada, pp. 5233–5243.
- R. Ozaki, K. Ishikawa, Y. Kanzaki, S. Takeno, I. Takeuchi, and M. Karasuyama (2024) Multi-objective Bayesian optimization with active preference learning. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, pp. 14490–14498.
- H. Qi, Y. Su, A. Kumar, and S. Levine (2022) Data-driven offline decision-making via invariant representation learning. In Advances in Neural Information Processing Systems 35, New Orleans, LA, pp. 13226–13237.
- C. Qian, Y. Yu, and Z. Zhou (2013) An analysis on recombination in multi-objective evolutionary optimization. Artificial Intelligence 204, pp. 99–119.
- H. Qian, Y. Zhu, X. Shu, S. Liu, Y. Wen, X. An, H. Lu, A. Zhou, K. Tang, and Y. Yu (2025) SOO-Bench: Benchmarks for evaluating the stability of offline black-box optimization. In Proceedings of the 13th International Conference on Learning Representations, Singapore City, Singapore.
- B. Trabucco, X. Geng, A. Kumar, and S. Levine (2022) Design-Bench: Benchmarks for data-driven offline model-based optimization. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, pp. 21658–21676.
- B. Trabucco, A. Kumar, X. Geng, and S. Levine (2021) Conservative objective models for effective offline model-based optimization. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, pp. 10358–10368.
- M. Welling and Y. W. Teh (2011) Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, Washington, pp. 681–688.
- K. Xue, R. Tan, X. Huang, and C. Qian (2024) Offline multi-objective optimization. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, pp. 55595–55624.
- C\. Yang, J\. Ding, Y\. Jin, and T\. Chai \(2020\)Offline data\-driven multiobjective optimization: knowledge transfer between surrogates and generation of final solutions\.IEEE Transactions on Evolutionary Computation24\(3\),pp\. 409–423\.Cited by:[§2](https://arxiv.org/html/2606.15115#S2.p2.1)\.
- R\. Ye, L\. Chen, J\. Zhang, and H\. Ishibuchi \(2024\)Evolutionary preference sampling for Pareto set learning\.InProceedings of the 26th Genetic and Evolutionary Computation Conference,Melbourne, Australia,pp\. 630–638\.Cited by:[Appendix M](https://arxiv.org/html/2606.15115#A13.p1.1),[§2](https://arxiv.org/html/2606.15115#S2.p3.1)\.
- S\. Yu, S\. Ahn, L\. Song, and J\. Shin \(2021\)RoMA: Robust model adaptation for offline model\-based optimization\.InAdvances in Neural Information Processing Systems 34,Virtual Event,pp\. 4619–4631\.Cited by:[Appendix D](https://arxiv.org/html/2606.15115#A4.p2.2),[§1](https://arxiv.org/html/2606.15115#S1.p3.1)\.
- T\. Yu, S\. Kumar, A\. Gupta, S\. Levine, K\. Hausman, and C\. Finn \(2020\)Gradient surgery for multi\-task learning\.InAdvances in Neural Information Processing Systems 33,Virtual Event,pp\. 5824–5836\.Cited by:[Appendix D](https://arxiv.org/html/2606.15115#A4.p2.2)\.
- Z\. Yu, V\. Ramakrishnan, and C\. Meinzer \(2019\)Simulation optimization for Bayesian multi\-arm multi\-stage clinical trial with binary endpoints\.Journal of Biopharmaceutical Statistics29\(2\),pp\. 306–317\.Cited by:[§1](https://arxiv.org/html/2606.15115#S1.p1.1)\.
- Y\. Yuan, C\. Chen, Z\. Liu, W\. Neiswanger, and X\. \(\. Liu \(2023\)Importance\-aware co\-teaching for offline model\-based optimization\.InAdvances in Neural Information Processing Systems 36,New Orleans, LA\.Cited by:[Appendix D](https://arxiv.org/html/2606.15115#A4.p2.2)\.
- Y\. Yuan, C\. Chen, C\. Pal, and X\. Liu \(2025\)ParetoFlow: Guided flows in multi\-objective optimization\.InProceedings of the 13th International Conference on Learning Representations,Singapore City, Singapore\.Cited by:[Appendix D](https://arxiv.org/html/2606.15115#A4.p4.1),[§1](https://arxiv.org/html/2606.15115#S1.p1.1),[§1](https://arxiv.org/html/2606.15115#S1.p2.1),[§5](https://arxiv.org/html/2606.15115#S5.p3.1),[§5](https://arxiv.org/html/2606.15115#S5.p4.1)\.
- T\. Yun, S\. Yun, J\. Lee, and J\. Park \(2024\)Guided trajectory generation with diffusion models for offline model\-based optimization\.InAdvances in Neural Information Processing Systems 38,Vancouver, Canada\.Cited by:[§1](https://arxiv.org/html/2606.15115#S1.p2.1)\.
- Y\. Zhang, W\. Hu, W\. Yao, L\. Lian, and G\. G\. Yen \(2024\)Offline data\-driven multiobjective optimization evolutionary algorithm based on generative adversarial network\.IEEE Transactions on Evolutionary Computation28\(2\),pp\. 293–306\.Cited by:[§2](https://arxiv.org/html/2606.15115#S2.p2.1)\.
- Y\. Zhu, H\. Lu, Y\. Wu, S\. Liu, J\. Yang, and H\. Qian \(2025\)Constrained offline black\-box optimization via risk evaluation and management\.InProceedings of the 39th AAAI Conference on Artificial Intelligence,Philadelphia, PA,pp\. 23063–23071\.Cited by:[§2](https://arxiv.org/html/2606.15115#S2.p1.1)\.

## Appendix A Energy Model Training and Risk Suppression Factor

### A.1 Train the Energy Model

To train the energy model to identify low-risk and high-risk solutions, ARCOO employs Contrastive Divergence (CD) (Hinton, [2002](https://arxiv.org/html/2606.15115#bib.bib80)):

$$\mathcal{L}_{\text{CD}}(\bm{\omega})=\mathbb{E}_{\bm{x}\sim\mathcal{P}}[E_{\bm{\omega}}(\bm{x})]-\mathbb{E}_{\bm{x}\sim\mathcal{Q}}[E_{\bm{\omega}}(\bm{x})]\,,\tag{5}$$

where $\mathcal{P}$ denotes the low-risk distribution and $\mathcal{Q}$ denotes the high-risk distribution. Minimizing this loss lowers the energy assigned to low-risk samples and raises the energy assigned to high-risk samples.
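
As an illustration of Equation 5, the loss can be estimated from two finite batches by replacing each expectation with a sample mean. The sketch below is only a minimal example under stated assumptions: the function name `cd_loss`, the toy quadratic energy, and the two small batches are hypothetical and are not taken from ARCOO or DOMOO.

```python
from statistics import fmean


def cd_loss(energy, low_risk_batch, high_risk_batch):
    """Monte Carlo estimate of E_{x~P}[E(x)] - E_{x~Q}[E(x)] (Equation 5)."""
    mean_low = fmean(energy(x) for x in low_risk_batch)
    mean_high = fmean(energy(x) for x in high_risk_batch)
    return mean_low - mean_high


def toy_energy(x):
    # Hypothetical stand-in for E_omega: squared Euclidean norm of the design.
    return sum(v * v for v in x)


# Hypothetical batches: P near the origin, Q pushed farther away.
low_risk = [[0.0, 0.0], [1.0, 0.0]]
high_risk = [[2.0, 0.0], [0.0, 2.0]]
loss = cd_loss(toy_energy, low_risk, high_risk)
```

A more negative value means the energy model already separates the two batches in the intended direction (low energy on $\mathcal{P}$, high energy on $\mathcal{Q}$).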

Before the energy model can be trained, samples from the high-risk distribution $\mathcal{Q}$ must first be constructed. Since $\mathcal{Q}$ is intended to represent OOD solutions that are prone to overestimation, ARCOO adopts Markov Chain Monte Carlo (MCMC) methods (Geyer, [1992](https://arxiv.org/html/2606.15115#bib.bib75); Welling and Teh, [2011](https://arxiv.org/html/2606.15115#bib.bib76)) with a Langevin dynamics kernel $\mathrm{LD}_{\bm{\psi}}$ (Nijkamp et al., [2019](https://arxiv.org/html/2606.15115#bib.bib77); Du and Mordatch, [2019](https://arxiv.org/html/2606.15115#bib.bib78)) to sample such solutions. Let $\mathcal{Q}=\mathrm{LD}_{\bm{\psi}}(\mathcal{P};K_{\text{LD}})$, $\bm{x}_{0}\sim\mathcal{P}$, and $\bm{x}_{k}\sim\mathcal{Q}^{k}$, where $\mathcal{Q}^{k}$ is sampled as:

$$\bm{x}_{k}\leftarrow\bm{x}_{k-1}+\eta\nabla_{\bm{x}}\hat{f}_{\bm{\psi}}(\bm{x}_{k-1})+\bm{\alpha}_{k},\quad k=1,\ldots,K_{\text{LD}}\,,\tag{6}$$

where $\alpha_{k,i}$ denotes the $i$-th element of $\bm{\alpha}_{k}$, sampled independently as $\alpha_{k,i}\sim\mathcal{N}(0,\eta)$, and $K_{\text{LD}}$ is the total number of steps. Starting from a sample $\bm{x}_{0}$ drawn from the low-risk distribution $\mathcal{P}$, the Langevin dynamics $\mathrm{LD}_{\bm{\psi}}(\mathcal{P};K_{\text{LD}})$ performs $K_{\text{LD}}$ iterations of noisy gradient ascent on the surrogate to approximate a distribution $\mathcal{Q}$ that concentrates on overestimated OOD solutions.
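
The update in Equation 6 can be sketched in a few lines. This is an illustrative example, not the authors' implementation: `langevin_sample` and `toy_grad` are hypothetical names, the surrogate gradient is a hand-written toy rather than a trained network, and $\mathcal{N}(0,\eta)$ is read here as a Gaussian with variance $\eta$ (so the standard deviation is $\sqrt{\eta}$), which is an assumption about the notation.

```python
import math
import random


def langevin_sample(x0, grad_f, eta, num_steps, rng):
    """Run num_steps of noisy gradient ascent (Equation 6) starting from x0."""
    x = list(x0)
    # Assumption: N(0, eta) is parameterized by its variance.
    noise_std = math.sqrt(eta)
    for _ in range(num_steps):
        g = grad_f(x)
        x = [xi + eta * gi + rng.gauss(0.0, noise_std) for xi, gi in zip(x, g)]
    return x


def toy_grad(x):
    # Hypothetical surrogate f_hat(x) = sum(x_i); its gradient is all ones,
    # so the surrogate value keeps increasing as the chain drifts away.
    return [1.0 for _ in x]


rng = random.Random(0)
x0 = [0.0, 0.0]
xk = langevin_sample(x0, toy_grad, eta=0.01, num_steps=64, rng=rng)
```

Because the chain follows the surrogate gradient without any constraint, the samples tend to move toward regions where the surrogate predicts high values, which is exactly where overestimation is most likely.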

### A.2 Risk Suppression Factor

After training the energy model $E_{\bm{\omega}}$, we use its output $E_{\bm{\omega}}(\bm{x})$ to compute a risk suppression factor $R(\bm{x})$, defined as follows:

$$R(\bm{x})=\frac{c\left(E_{\widetilde{Q}}-E_{\bm{\omega}}(\bm{x})\right)}{E_{\widetilde{Q}}-E_{\widetilde{P}}}\,,\tag{7}$$

where $E_{\widetilde{Q}}=\mathbb{E}_{\bm{x}^{\prime}\sim\widetilde{Q}}[E_{\bm{\omega}}(\bm{x}^{\prime})]$, $E_{\widetilde{P}}=\mathbb{E}_{\bm{x}^{\prime}\sim\widetilde{P}}[E_{\bm{\omega}}(\bm{x}^{\prime})]$, and $c$ denotes the initial momentum. Here, $\widetilde{P}$ is the empirical distribution over the high-quality batch of solutions in the offline dataset, and $\widetilde{Q}$ is the high-risk distribution sampled by Langevin dynamics starting from $\widetilde{P}$. By construction, $R(\bm{x})=c$ when the energy of $\bm{x}$ equals the average low-risk energy $E_{\widetilde{P}}$, and $R(\bm{x})=0$ when it equals the average high-risk energy $E_{\widetilde{Q}}$. With the risk suppression factor, we can suppress the risk to a corresponding level in each iteration of nested Pareto set learning.
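
Equation 7 is a linear rescaling of the energy between the two batch averages, which the following sketch makes concrete. The names `risk_suppression_factor` and `toy_energy`, the toy batches, and the value `c = 2.0` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import fmean


def risk_suppression_factor(energy, x, low_risk_batch, high_risk_batch, c):
    """Evaluate R(x) = c * (E_Q - E(x)) / (E_Q - E_P) as in Equation 7."""
    e_p = fmean(energy(v) for v in low_risk_batch)   # average energy on P-tilde
    e_q = fmean(energy(v) for v in high_risk_batch)  # average energy on Q-tilde
    # Note: undefined if e_q == e_p, i.e. if the energy model fails to separate the batches.
    return c * (e_q - energy(x)) / (e_q - e_p)


def toy_energy(x):
    # Hypothetical stand-in for the trained energy model E_omega.
    return sum(v * v for v in x)


low_risk = [[0.0, 0.0], [1.0, 0.0]]    # average energy 0.5
high_risk = [[2.0, 0.0], [0.0, 2.0]]   # average energy 4.0
r_mid = risk_suppression_factor(toy_energy, [1.5, 0.0], low_risk, high_risk, c=2.0)
r_high = risk_suppression_factor(toy_energy, [2.0, 0.0], low_risk, high_risk, c=2.0)
```

In this toy setting, a design whose energy lies halfway between the two averages receives half of `c`, and a design at the high-risk average receives zero.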

## Appendix B Pseudo-Code of DOMOO

The pseudo-code of DOMOO is shown in Algorithm [1](https://arxiv.org/html/2606.15115#alg1). The algorithm aims to solve offline multi-objective optimization problems and obtain a solution set with satisfactory diversity and convergence. Given an offline dataset $\mathcal{D}$, DOMOO trains the surrogate objectives $\hat{\bm{f}}$ and learns a Pareto set model that maps diverse preference vectors to corresponding Pareto-optimal solutions. The inputs to the algorithm include the offline dataset $\mathcal{D}$, valid preferences $\Lambda$, offline preferences $\Lambda_{\text{offline}}$, the number of objectives $M$, total optimization steps $T$, the number of pretraining steps $T_{\text{pre}}$, the number of exploration steps $T_{\text{exp}}$, the number of candidate solutions $K$, and batch size $B$.

At the beginning, DOMOO trains $M$ surrogate models $\hat{f}_{i}$, one for each objective, using the offline dataset $\mathcal{D}$ and initializes a Pareto set model $h_{\bm{\phi}}$, as shown in lines 1-2. The Pareto set learning is performed in a nested manner. In the inner loop (lines 5-14), DOMOO updates the preferences depending on the current phase. During the pretraining phase ($t\leq T_{\text{pre}}$), preferences are sampled directly from the offline preference set $\Lambda_{\text{offline}}$, providing a better initialization for the subsequent training stages, as shown in line 7. In the exploration phase ($T_{\text{pre}}<t\leq T_{\text{pre}}+T_{\text{exp}}$), preferences are sampled from the Dirichlet distribution over the preference set $\Lambda$, i.e., $\bm{\lambda}_{t}^{(b)}\sim\mathrm{Dirichlet}(\alpha)\subset\Lambda$, enabling the model to be trained over the entire preference space, as shown in line 9. In the later stage ($t>T_{\text{pre}}+T_{\text{exp}}$), preferences are updated via gradient descent according to Equation [2](https://arxiv.org/html/2606.15115#S4.E2) to guide the Pareto set model to focus on regions where its performance is lacking, as shown in lines 11-13.

In the outer loop \(lines 15\-19\), the updated preferences are used to train the Pareto set model via gradient descent according to Equation[3](https://arxiv.org/html/2606.15115#S4.E3)\. After the nested Pareto set learning, diverse preferences are sampled again, and the trained Pareto set model generates candidate solutions \(lines 21\-22\)\. These candidate solutions are merged with solutions generated by the surrogate model to form a comprehensive candidate set \(line 23\)\.

Finally, DOMOO selects the final set of Pareto solutions using a two-stage selection strategy: it first applies the $\text{IGD}_{\text{offline}}$ indicator to select solutions and then uses the HV indicator to fill the remaining solutions, as shown in lines 24-27. This selection mechanism ensures both diversity and convergence of the final solution set.

Algorithm 1: Diversity-Driven Offline Multi-Objective Optimization via Nested Pareto Set Learning

Input: Offline dataset $\mathcal{D}$, valid preferences $\Lambda$, offline preferences $\Lambda_{\text{offline}}$, objective number $M$, total steps $T$, pretraining steps $T_{\text{pre}}$, exploration steps $T_{\text{exp}}$, candidate number $K$, batch size $B$

1: Train surrogate model $\hat{\bm{f}}(\bm{x})=(\hat{f}_{1}(\bm{x};\bm{\theta}_{1}^{\ast}),\cdots,\hat{f}_{M}(\bm{x};\bm{\theta}_{M}^{\ast}))$ using $\mathcal{D}$

2: Initialize Pareto set model $h_{\bm{\phi}}:\bm{\lambda}\mapsto\bm{x}$

3: /\* Nested Pareto Set Learning \*/

4: for $t=1$ to $T$ do

5: /\* Inner-Loop Preference Update \*/

6: if $t\leq T_{\text{pre}}$ then

7: Sample preferences $\Lambda_{t}=\{\bm{\lambda}_{t}^{(b)}\sim\Lambda_{\text{offline}}\}_{b=1}^{B}$ $\triangleright$ Pretraining phase

8: else if $T_{\text{pre}}<t\leq T_{\text{pre}}+T_{\text{exp}}$ then

9: Sample preferences $\Lambda_{t}=\{\bm{\lambda}_{t}^{(b)}\sim\text{Dirichlet}(\alpha)\subset\Lambda\}_{b=1}^{B}$ $\triangleright$ Exploration phase

10: else

11: Generate $\bm{X}_{t-1}=\{\bm{x}_{t-1}^{(b)}=h_{\bm{\phi}}(\bm{\lambda}_{t-1}^{(b)})\}_{b=1}^{B}$ $\triangleright$ Preference gradient update phase

12: Evaluate objective values via the surrogate model

13: Find preference vectors $\Lambda_{t}=\{\bm{\lambda}_{t}^{(b)}\}_{b=1}^{B}$ via gradient descent according to Equation [2](https://arxiv.org/html/2606.15115#S4.E2)

14: end if

15: /\* Outer-Loop Set Model Update \*/

16: Generate $\bm{X}_{t}=\{\bm{x}_{t}^{(b)}=h_{\bm{\phi}}(\bm{\lambda}_{t}^{(b)})\}_{b=1}^{B}$

17: Evaluate objective values via the surrogate model

18: Update Pareto set model parameters $\bm{\phi}$ via gradient descent according to Equation [3](https://arxiv.org/html/2606.15115#S4.E3)

19: end for

20: /\* Candidate Solution Generation \*/

21: Sample diverse candidate preferences $\Lambda_{\text{ps}}=\{\bm{\lambda}_{\text{ps}}^{(k)}\sim\text{Dirichlet}(\alpha)\subset\Lambda\}_{k=1}^{K}$

22: Generate $K$ candidates via the trained Pareto set model $h_{\bm{\phi^{\ast}}}$: $\bm{X}_{\text{ps}}=\{\bm{x}_{\text{ps}}^{(k)}=h_{\bm{\phi^{\ast}}}(\bm{\lambda}_{\text{ps}}^{(k)})\}_{k=1}^{K}$

23: Merge $\bm{X}_{\text{ps}}$ with the $K$ solutions generated by the surrogate model $\hat{\bm{f}}(\cdot)$ and obtain the final $\bm{X}_{\text{cand}}$

24: /\* Solution Selection based on Two Indicators \*/

25: Use the $\text{IGD}_{\text{offline}}$ indicator to greedily select solutions from $\bm{X}_{\text{cand}}$ for initial screening

26: Use the HV indicator to select remaining solutions from $\bm{X}_{\text{cand}}$ for final filling

27: return the solution set of the selected Pareto solutions
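
The three-phase preference schedule in lines 6-14 of Algorithm 1 can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, sampling from $\Lambda_{\text{offline}}$ is assumed to be uniform with replacement (the text does not specify this), and the gradient-based update of Equation 2 is not reproduced, so the third phase is only signalled by returning `None`.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one preference vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def phase(t, t_pre, t_exp):
    """Map step t (1-indexed) to the phase used in Algorithm 1, lines 6-14."""
    if t <= t_pre:
        return "pretraining"
    if t <= t_pre + t_exp:
        return "exploration"
    return "preference_gradient"


def sample_preferences(t, t_pre, t_exp, batch_size, offline_prefs, alpha, rng):
    """Return a preference batch for the first two phases, or None for the third."""
    current = phase(t, t_pre, t_exp)
    if current == "pretraining":
        # Assumption: uniform sampling with replacement from the offline preference set.
        return [list(rng.choice(offline_prefs)) for _ in range(batch_size)]
    if current == "exploration":
        return [sample_dirichlet(alpha, rng) for _ in range(batch_size)]
    return None  # preferences come from the gradient update of Equation 2 instead


rng = random.Random(0)
offline_prefs = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]  # hypothetical two-objective set
batch = sample_preferences(150, 100, 400, 4, offline_prefs, [1.0, 1.0], rng)
```

With `t_pre = 100` and `t_exp = 400`, step 150 falls in the exploration phase, so `batch` holds four Dirichlet samples on the probability simplex.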

## Appendix C Task Descriptions

In this section, we describe the tasks included in the benchmark in detail.¹ We benchmark our method on Off-MOO-Bench tasks (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), which include diverse real-world and synthetic tasks. We focus on five distinct task categories.² An overview of the tasks is provided in Table [4](https://arxiv.org/html/2606.15115#A3.T4).

¹ In this study, we focus on tasks with up to three objectives. This choice is motivated by the significantly increased complexity and computational cost associated with high-dimensional Pareto fronts. To ensure fair comparison and reproducibility under a limited computational budget, we do not evaluate tasks with more than three objectives. Extending our method to higher-dimensional objective spaces is left for future work.

² We communicated with the original authors and used updated benchmark data to complete the experimental results for all tasks, rather than relying on those reported in the original paper. As a result, some discrepancies may exist.

Table 4: Properties of the tasks.

| Task Name | Dataset size | Dimensions | # Objectives | Search space |
|---|---|---|---|---|
| Synthetic | 60000 | 2-30 | 2-3 | Continuous |
| MO-NAS | 9735-60000 | 5-34 | 2-3 | Categorical |
| MORL | 8571 | 9734 | 2 | Continuous |
| | 4500 | 10184 | 2 | Continuous |
| Sci-Design | 49001 | 32 | 3 | Continuous |
| | 42048 | 4 | 2 | Sequence |
| | 4937 | 4 | 2 | Sequence |
| | 48000 | 4 | 2 | Sequence |
| RE | 60000 | 3-6 | 2-6 | Continuous & Mixed |

Table 5: Problem information and reference point for synthetic functions.

Table 6: An overview of the search spaces in MO-NAS tasks.

- Synthetic Function (Synthetic): This task comprises 16 subtasks, each with 2-3 objectives, aiming to identify potential solutions across the offline dataset. All synthetic problems feature continuous solution spaces. Table [5](https://arxiv.org/html/2606.15115#A3.T5) provides detailed information about each problem, including the shape of the Pareto front and the reference point.
- Multi-Objective Neural Architecture Search (MO-NAS): This task involves 14 subtasks, each aiming to optimize 2-3 objectives in neural architecture design, including prediction error, parameter count, edge GPU latency, and so on. Detailed information on these search spaces $\mathcal{X}$ can be found in Table [6](https://arxiv.org/html/2606.15115#A3.T6).
- Multi-Objective Reinforcement Learning (MORL): This task encompasses two subtasks: (a) MO-Swimmer: finding a control policy in a 9,734-dimensional space to optimize both speed and energy efficiency for a robot. (b) MO-Hopper: finding a control policy in a 10,184-dimensional space to optimize 2 objectives related to running and jumping for a single-legged robot.
- Scientific Design (Sci-Design): This task includes four representative subtasks: (a) Molecule: design optimization in a pretrained 32-dimensional latent space to improve activity against GSK3$\beta$ and JNK3; (b) Regex: maximizing bigram frequencies in protein sequences; (c) ZINC: optimizing molecular properties (logP and QED) on a small-scale dataset; (d) RFP: large-scale optimization of red fluorescent protein variants for solvent-accessible surface area and stability.
- Real-World Application (RE): This task includes many real-world multi-objective engineering design problems, such as four bar truss design, pressure vessel design, disc brake design, and so on.

## Appendix D Details of Compared Methods

In line with Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), our evaluation primarily includes two categories of methods: deep neural network (DNN)-based and Gaussian process (GP)-based approaches. Additionally, to broaden the scope of comparison, we also evaluate a flow-based generative modeling technique.

DNN-Based Methods. These methods employ surrogate DNN models combined with evolutionary algorithms for solution optimization. We evaluate three configurations: (a) End-to-End Model (E2E): directly outputs an $m$-dimensional objective vector for a given design $\bm{x}$, enhanced by multi-task learning (Chen et al., [2018](https://arxiv.org/html/2606.15115#bib.bib11); Yu et al., [2020](https://arxiv.org/html/2606.15115#bib.bib12)) for improved objective performance. (b) Multi-Head Model (MH): uses multi-task learning with a single surrogate model, with the same enhancements as the E2E model. (c) Multiple Models (MM): maintains $m$ independent surrogates, each trained with OOD-mitigating techniques, such as COMs (Trabucco et al., [2021](https://arxiv.org/html/2606.15115#bib.bib23)), RoMA (Yu et al., [2021](https://arxiv.org/html/2606.15115#bib.bib33)), IOM (Qi et al., [2022](https://arxiv.org/html/2606.15115#bib.bib17)), ICT (Yuan et al., [2023](https://arxiv.org/html/2606.15115#bib.bib34)), and Tri-mentoring (Chen et al., [2023](https://arxiv.org/html/2606.15115#bib.bib32)). Following the original study (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)), we adopt NSGA-II (Deb et al., [2002](https://arxiv.org/html/2606.15115#bib.bib66)) as the default evolutionary algorithm.

GP-Based Methods. Bayesian optimization computes an acquisition function to guide the selection of solutions, which are then evaluated using a surrogate model. We consider three representative techniques: hypervolume-based qNEHVI (Daulton et al., [2021](https://arxiv.org/html/2606.15115#bib.bib61)), scalarization-based qParEGO (Knowles, [2006](https://arxiv.org/html/2606.15115#bib.bib62)), and information-theoretic JES (Hvarfner et al., [2022](https://arxiv.org/html/2606.15115#bib.bib63)).

Generative Methods. We additionally include ParetoFlow (Yuan et al., [2025](https://arxiv.org/html/2606.15115#bib.bib13)) as a comparison. It is a flow-based preference-conditioned generator that employs classifier-guided generation and thus trains one surrogate predictor per objective, while conditioning the flow-based generator on uniformly sampled preference weights to produce solutions along the Pareto front.

## Appendix E Training Details

For fair comparison, we adopt the same experimental settings as in Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)). In our method, the predictor network is a multilayer perceptron (MLP) with the following architecture:

$$\text{Input}\rightarrow\text{MLP(2048)}\rightarrow\text{LeakyReLU}\rightarrow\text{MLP(2048)}\rightarrow\text{LeakyReLU}\rightarrow\text{MLP(1)}.$$

We use mean squared error (MSE) as the loss function and optimize the network using Adam with a learning rate of $\eta=0.001$ and exponential learning rate decay $\gamma=0.98$. The model is trained on the offline dataset for 100 epochs with a batch size of 256. Additionally, we apply data pruning to alleviate model collapse on certain tasks.

For the energy-based model, we use a separate MLP with the following architecture:

$$\text{Input}\rightarrow\text{MLP(512)}\rightarrow\text{LeakyReLU}\rightarrow\text{MLP(512)}\rightarrow\text{LeakyReLU}\rightarrow\text{MLP(1)}.$$

The energy-based model is trained using the Adam optimizer with the same learning rate. The energy head is updated via contrastive loss, where negative samples are generated using Langevin dynamics. This model is trained for 50 epochs with a batch size of 256.
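
To give a sense of scale for the two networks, the sketch below counts their trainable parameters. It assumes that "MLP(n)" denotes a fully connected layer with n output units and a bias term, which the text does not state explicitly; the helper names are hypothetical, and the input dimension of 32 is borrowed from the Molecule latent space in Appendix C purely as an example.

```python
def mlp_layer_shapes(input_dim, hidden_width, output_dim=1):
    """(fan_in, fan_out) of Input -> MLP(h) -> MLP(h) -> MLP(1); activations add no parameters."""
    return [
        (input_dim, hidden_width),
        (hidden_width, hidden_width),
        (hidden_width, output_dim),
    ]


def count_parameters(shapes):
    """Weights plus biases, assuming every layer is dense with a bias vector."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


input_dim = 32  # example only: dimension of the Molecule latent space
predictor_params = count_parameters(mlp_layer_shapes(input_dim, 2048))
energy_params = count_parameters(mlp_layer_shapes(input_dim, 512))
```

Under these assumptions, the width-2048 predictor is roughly fifteen times larger than the width-512 energy model for a 32-dimensional input, and DOMOO trains one such predictor per objective.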

We adopt task-specific hyper-parameters for different categories in Off-MOO-Bench. For MO-NAS tasks, the energy model uses $K=64$ Langevin steps, and the Pareto set model is pre-trained for 100 steps, followed by 400 steps of optimization with randomly sampled preferences and 400 steps of nested PSL optimization. For MORL tasks, due to the extremely high-dimensional input space, we use a smaller configuration with $K=8$ Langevin steps, 100 pre-training steps, and only 5 steps each for random preference optimization and nested PSL. For all other tasks, we set $K=42$ for the energy model, and perform 200 steps of pre-training, 200 steps of random preference optimization, and 100 steps of nested PSL.
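
The task-specific settings above can be collected in a small configuration table. The sketch below is illustrative: the class and field names are hypothetical, and treating the total number of optimization steps as the sum of the three phases is an assumption based on the phase boundaries described in Appendix B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    langevin_steps: int      # Langevin steps used when training the energy model
    pretrain_steps: int      # steps with preferences drawn from the offline set
    random_pref_steps: int   # steps with randomly sampled preferences
    nested_psl_steps: int    # steps with gradient-updated preferences

    @property
    def total_steps(self):
        # Assumption: the total step budget is the sum of the three phases.
        return self.pretrain_steps + self.random_pref_steps + self.nested_psl_steps


SCHEDULES = {
    "MO-NAS": Schedule(64, 100, 400, 400),
    "MORL": Schedule(8, 100, 5, 5),
    "default": Schedule(42, 200, 200, 100),  # all other task categories
}


def schedule_for(category):
    """Look up the schedule for a task category, falling back to the default."""
    return SCHEDULES.get(category, SCHEDULES["default"])


mo_nas = schedule_for("MO-NAS")
synthetic = schedule_for("Synthetic")
```

Keeping the fallback explicit mirrors the text, where only MO-NAS and MORL deviate from the shared configuration.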

## Appendix F Computational Cost

All experiments are conducted on a workstation equipped with an Intel\(R\) Xeon\(R\) Gold 6354 CPU \(3\.00GHz\) and an NVIDIA RTX 3090 GPU\. The total computational cost of our method consists of five main components: training the surrogate model, training the energy model, initializing the Pareto set model, training the Pareto set model, and performing data selection\. The corresponding runtime \(measured in seconds\) is provided in Table[7](https://arxiv.org/html/2606.15115#A6.T7)\. Our method is efficient, completing most tasks within 10 minutes\.

Table 7: Time cost of DOMOO.

Table 8: The runtime for each method to complete model training and optimization on the C-10/MOP1 and MO-Hopper tasks (unit: minutes).

As shown in Table [8](https://arxiv.org/html/2606.15115#A6.T8), DOMOO takes longer than some baseline methods because of the additional accumulative risk control module for handling OOD issues, whose components, such as the energy model, are itemized in Table [7](https://arxiv.org/html/2606.15115#A6.T7). The overall runtime nevertheless remains moderate: DOMOO is only slightly slower than lightweight surrogate-based baselines (typically within one minute of them), while remaining competitive with or even faster than several existing methods. More importantly, in offline optimization the quality of the obtained Pareto set is far more critical than marginal differences in runtime, since no additional evaluations or online interactions are permitted. The modest overhead introduced by the risk-control module therefore represents a reasonable and practical trade-off.

## Appendix G HV Experiment Results

### G.1 The $100^{th}$ Percentile Results

As shown in Table [9](https://arxiv.org/html/2606.15115#A7.T9), Table [10](https://arxiv.org/html/2606.15115#A7.T10), Table [11](https://arxiv.org/html/2606.15115#A7.T11), Table [12](https://arxiv.org/html/2606.15115#A7.T12), and Table [13](https://arxiv.org/html/2606.15115#A7.T13), we report the $100^{th}$ percentile hypervolume results with 256 solutions. DOMOO consistently performs well across tasks. Methods within one standard deviation of the best are highlighted in bold.

Table 9: Hypervolume results for synthetic functions with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 10: Hypervolume results for MO-NAS with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 11: Hypervolume results for MORL with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 12: Hypervolume results for RE with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 13: Hypervolume results for scientific design with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

### G.2 The $50^{th}$ Percentile Results

As shown in Table [14](https://arxiv.org/html/2606.15115#A7.T14), we report the $50^{th}$ percentile HV average ranks with 256 solutions. As shown in Table [15](https://arxiv.org/html/2606.15115#A7.T15), Table [16](https://arxiv.org/html/2606.15115#A7.T16), Table [17](https://arxiv.org/html/2606.15115#A7.T17), Table [18](https://arxiv.org/html/2606.15115#A7.T18), and Table [19](https://arxiv.org/html/2606.15115#A7.T19), we report the $50^{th}$ percentile hypervolume results with 256 solutions. DOMOO consistently performs well across tasks. Methods within one standard deviation of the best are highlighted in bold.

Table 14: Comparison of average HV ranks at the $50^{th}$ percentile achieved by different offline MOO methods across different tasks in Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)). For each task, the top three methods are highlighted using (1st), (2nd), and (3rd) formatting. $\mathcal{D}(\text{best})$ denotes the best subset in the offline dataset (i.e., with the highest HV), and the last column reports the average rank across all tasks.

Table 15: Hypervolume results for synthetic functions with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 16: Hypervolume results for MO-NAS with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 17: Hypervolume results for MORL with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 18: Hypervolume results for RE with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 19: Hypervolume results for scientific design with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

## Appendix H $\text{IGD}_{\text{offline}}$ Experiment Results

### H.1 The $100^{th}$ Percentile Results

As shown in Table [20](https://arxiv.org/html/2606.15115#A8.T20), Table [21](https://arxiv.org/html/2606.15115#A8.T21), Table [22](https://arxiv.org/html/2606.15115#A8.T22), Table [23](https://arxiv.org/html/2606.15115#A8.T23), and Table [24](https://arxiv.org/html/2606.15115#A8.T24), we report the $100^{th}$ percentile $\text{IGD}_{\text{offline}}$ results with 256 solutions. DOMOO consistently performs well across tasks. Methods within one standard deviation of the best are highlighted in bold.

Table 20: $\text{IGD}_{\text{offline}}$ results for synthetic functions with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 21: $\text{IGD}_{\text{offline}}$ results for MO-NAS with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 22: $\text{IGD}_{\text{offline}}$ results for MORL with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 23: $\text{IGD}_{\text{offline}}$ results for RE with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 24: $\text{IGD}_{\text{offline}}$ results for scientific design with 256 solutions and $100^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

### H.2 The $50^{th}$ Percentile Results

As shown in Table [25](https://arxiv.org/html/2606.15115#A8.T25), we report the $50^{th}$ percentile $\text{IGD}_{\text{offline}}$ average ranks with 256 solutions. As shown in Table [26](https://arxiv.org/html/2606.15115#A8.T26), Table [27](https://arxiv.org/html/2606.15115#A8.T27), Table [28](https://arxiv.org/html/2606.15115#A8.T28), Table [29](https://arxiv.org/html/2606.15115#A8.T29), and Table [30](https://arxiv.org/html/2606.15115#A8.T30), we report the $50^{th}$ percentile $\text{IGD}_{\text{offline}}$ results with 256 solutions. DOMOO consistently performs well across tasks. Methods within one standard deviation of the best are highlighted in bold.

Table 25: Comparison of average $\text{IGD}_{\text{offline}}$ ranks at the $50^{th}$ percentile achieved by different offline MOO methods across different tasks in Off-MOO-Bench (Xue et al., [2024](https://arxiv.org/html/2606.15115#bib.bib64)). Details are the same as Table [14](https://arxiv.org/html/2606.15115#A7.T14).

Table 26: $\text{IGD}_{\text{offline}}$ results for synthetic functions with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 27: $\text{IGD}_{\text{offline}}$ results for MO-NAS with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 28: $\text{IGD}_{\text{offline}}$ results for MORL with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 29: $\text{IGD}_{\text{offline}}$ results for RE with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

Table 30: $\text{IGD}_{\text{offline}}$ results for scientific design with 256 solutions and $50^{th}$ percentile evaluations. For each task, algorithms within one standard deviation of having the highest performance are bolded.

## Appendix I Results of Selection Indicators for Diversity

About the Impact of the Selection Indicator on Diversity. The core issue with HV-based selection in offline MOO arises from surrogate-induced spurious Pareto fronts in OOD regions: due to limited offline data, surrogates often extrapolate a much wider front than the true one. Candidates far beyond the true front typically lie in poorly calibrated regions, i.e., they have low true quality and tend to cluster tightly in objective space. When HV is used for selection, its marginal-volume mechanism tends to pick solutions uniformly along this wide spurious front. Consequently, many of the solutions selected by HV turn out to be tightly clustered and of low quality once evaluated on the true objectives, which harms true diversity despite acceptable surrogate-based scores. In contrast, $\text{IGD}_{\text{offline}}$ uses the offline Pareto front as the reference and computes average distances to it. This inherently penalizes solutions that deviate from reliable data regions, acting as a conservative filter against OOD artifacts in a setting where no active queries can correct surrogate errors.
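To make the role of the offline reference front concrete, the following sketch computes the plain inverted generational distance (IGD) of a solution set against the non-dominated points of an offline dataset. It is not the paper's $\text{IGD}_{\text{offline}}$, whose exact definition (including the scaling factor $\beta$) is given in the main text; minimization of all objectives, the helper names, and the toy data are assumptions made for illustration.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def igd(reference, solutions):
    """Plain IGD: mean distance from each reference point to its nearest solution."""
    return sum(min(math.dist(r, s) for s in solutions) for r in reference) / len(reference)


# Toy offline objective values; (0.8, 0.8) is dominated by (0.5, 0.5).
offline_objs = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0), (0.8, 0.8)]
ref = pareto_front(offline_objs)

spread = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]       # covers the reference front
clustered = [(0.0, 1.0), (0.05, 0.95), (0.1, 0.9)]  # crowds one end of it

igd_spread = igd(ref, spread)
igd_clustered = igd(ref, clustered)
```

With the same budget of three solutions, the clustered set leaves most of the reference front uncovered and therefore receives a larger (worse) IGD than the spread set.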

The resulting solution distributions are shown in Figure [4](https://arxiv.org/html/2606.15115#A9.F4). HV selection leads to a poorly distributed set, with solutions clustered in a narrow region. In contrast, $\text{IGD}_{\text{offline}}$ selection produces a well-distributed front that covers the entire spectrum of known trade-offs, underscoring its effectiveness in preserving diversity.

![Refer to caption](https://arxiv.org/html/2606.15115v1/x3.png) (a) Solution distribution using the HV indicator.
![Refer to caption](https://arxiv.org/html/2606.15115v1/x4.png) (b) Solution distribution using the $\text{IGD}_{\text{offline}}$ indicator.

Figure 4: Comparison of final solution sets selected by different indicators.

## Appendix J Additional Details for Ablation Study

### J.1 Detailed Settings of Ablation Variants

In Section [5.3](https://arxiv.org/html/2606.15115#S5.SS3) of the main paper, we evaluate the contribution of each essential module in DOMOO by comparing the full version with five ablated variants. The detailed implementation settings for these variants are defined as follows:

- Without Accumulative Risk Control (w.o. ARC): In this version, we replace the accumulative risk control (as shown in Equation [2](https://arxiv.org/html/2606.15115#S4.E2)) with a learning rate in gradient descent.

- Without Nested Pareto Set Learning (w.o. NPSL): In this version, we remove the "Preference Update" part and randomly sample $\bm{\lambda}$ from the set of all valid preferences $\Lambda$ at the beginning of each iteration.

- Without Pareto Set Model Generation (w.o. PSMG): In this version, we remove the candidate generation step of the Pareto set model $h_{\bm{\phi}^{\ast}}$ and use the surrogate model alone to generate all candidate solutions.

- Without Surrogate Model Generation (w.o. SMG): In this version, we remove the candidates generated by the surrogate model and rely solely on the Pareto set model to generate all candidate solutions.

- Without Diversity-Driven Solution Selection (w.o. DDSS): In this version, we omit the proposed solution selection strategy and select all $256$ solutions by HV.

All other settings in the five variants are kept similar to those of the original version.

### J.2 Effectiveness of Diversity-Driven Selection Mechanism

About the Effectiveness of Diversity-Driven Selection. To demonstrate the necessity of our proposed DDSS, we compare the solution distributions with and without this mechanism on a representative benchmark task, as shown in Figure [5](https://arxiv.org/html/2606.15115#A10.F5). Without DDSS, the solutions form a poorly diversified front along the $f_{2}$ axis, whereas with DDSS the resulting Pareto front is well distributed. This contrast highlights the crucial role of DDSS in balancing diversity and convergence under OOD constraints.

![Refer to caption](https://arxiv.org/html/2606.15115v1/x5.png) (a) Solution distribution without DDSS.
![Refer to caption](https://arxiv.org/html/2606.15115v1/x6.png) (b) Solution distribution with DDSS.

Figure 5: Comparison of solution distributions with and without the DDSS mechanism.

## Appendix K Hyper-Parameter Analysis

About the Impact of Hyper-Parameters. To explore the sensitivity of DOMOO to different hyper-parameters, we analyze the number of exploration steps in nested Pareto set learning, $T_{\text{exp}}$, on three representative tasks, with results shown in Tables [31](https://arxiv.org/html/2606.15115#A11.T31) and [32](https://arxiv.org/html/2606.15115#A11.T32). DOMOO is robust on continuous and sequence-based tasks, but shows higher sensitivity on discrete tasks, likely due to the difficulty of optimizing over high-cardinality categorical spaces. Nonetheless, performance remains stable when $T_{\text{exp}}$ is set within a reasonable range.

About the $K$ in Diversity-Driven Solution Selection. To examine the effect of the DDSS selection budget on performance, we analyze the maximum number of solutions selected by the $\text{IGD}_{\text{offline}}$-based stage before HV filling. As described in Section 4.3, DOMOO first selects at most 128 solutions from $\mathbf{X}_{\text{cand}}$ using $\text{IGD}_{\text{offline}}$, and then uses HV to fill the remaining slots to obtain 256 solutions for final evaluation. This maximum number therefore plays a key role in balancing diversity and convergence. We evaluate different settings of this hyper-parameter on several representative tasks, with results reported in Tables [33](https://arxiv.org/html/2606.15115#A11.T33) and [34](https://arxiv.org/html/2606.15115#A11.T34). The results show that DOMOO remains stable across a broad range of values, and setting the maximum number to 128 provides a good balance between convergence quality and front coverage.
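The two-stage budget described above can be sketched as follows. This is a scaled-down illustration under stated assumptions, not the procedure of Section 4.3: stage 1 uses plain IGD against a reference front in place of $\text{IGD}_{\text{offline}}$, both stages are greedy, the early-stopping rule stands in for "at most", HV is computed for two minimized objectives only, and all names (`two_stage_select`, `k_igd`, `budget`) are hypothetical.

```python
import math


def igd(reference, chosen):
    """Plain IGD: mean distance from each reference point to its nearest chosen point."""
    return sum(min(math.dist(r, s) for s in chosen) for r in reference) / len(reference)


def hv_2d(points, ref_point):
    """Hypervolume of 2-D minimization points with respect to ref_point."""
    pts = sorted(p for p in points if p[0] < ref_point[0] and p[1] < ref_point[1])
    area, best_f2 = 0.0, ref_point[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # only non-dominated points add area
            area += (ref_point[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def two_stage_select(cands, reference, ref_point, k_igd, budget):
    chosen, pool = [], list(cands)
    # Stage 1: greedily add the candidate that lowers IGD most, at most k_igd times.
    while pool and len(chosen) < min(k_igd, budget):
        best = min(pool, key=lambda c: igd(reference, chosen + [c]))
        if chosen and igd(reference, chosen + [best]) >= igd(reference, chosen):
            break  # assumption: stop early once IGD no longer improves
        chosen.append(best)
        pool.remove(best)
    # Stage 2: fill the remaining slots by greedy hypervolume gain.
    while pool and len(chosen) < budget:
        best = max(pool, key=lambda c: hv_2d(chosen + [c], ref_point))
        chosen.append(best)
        pool.remove(best)
    return chosen


# Toy predicted objective values and a toy reference front.
cands = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0), (0.1, 0.9), (1.5, 1.5)]
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
selected = two_stage_select(cands, reference, (2.0, 2.0), k_igd=2, budget=3)
```

In the paper's setting the corresponding values would be `k_igd=128` and `budget=256`; the toy call uses 2 and 3 only to keep the example small.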

About the Robustness to the Scaling Factor in the $\text{IGD}_{\text{offline}}$ Indicator. As shown in Table [35](https://arxiv.org/html/2606.15115#A11.T35), we further investigate the sensitivity of $\text{IGD}_{\text{offline}}$ to the scaling factor $\beta$. When $\beta$ is increased from 0.5 to 5.0, the average ranks of all methods exhibit only minor fluctuations, and their relative order remains largely unchanged. Within a reasonable range, the choice of the scaling value does not substantially affect the comparative evaluation results under $\text{IGD}_{\text{offline}}$, verifying the robustness of this indicator with respect to the scaling hyper-parameter. In addition, DOMOO consistently achieves the best average rank across all choices of $\beta$.

About the Robustness of DOMOO to the Energy Model Risk-Ratio Hyper-parameter in Energy-Based Tasks. As shown in Tables [36](https://arxiv.org/html/2606.15115#A11.T36) and [37](https://arxiv.org/html/2606.15115#A11.T37), we further analyze the sensitivity of DOMOO to the risk ratio used in the construction of the energy models across different tasks. When the risk ratio varies from 0.2 to 1.6, most tasks (e.g., `re24`, `re25`, `re34`, `dtlz4`) exhibit almost unchanged HV and $\text{IGD}_{\text{offline}}$, indicating very low sensitivity and strong robustness to this hyper-parameter. For tasks such as `in1kmop7` and `mo_hopper_v2`, the performance shows only mild and smooth variation without any abrupt degradation, suggesting controlled and predictable sensitivity rather than instability. Overall, these results demonstrate that DOMOO maintains stable performance under a wide range of risk ratios, verifying the robustness of the algorithm with respect to the risk-ratio hyper-parameter in energy-based tasks.

Table 31: HV results under different $T_{\text{exp}}/T$ values.

Table 32: $\text{IGD}_{\text{offline}}$ results under different $T_{\text{exp}}/T$ values.

Table 33: HV results under different maximum numbers.

Table 34: $\text{IGD}_{\text{offline}}$ results under different maximum numbers.

Table 35: Comparison of average $\text{IGD}_{\text{offline}}$ ranks under different $\beta$.

Table 36: Comparison of average HV ranks across different energy model risk ratios in Off-MOO-Bench.

Table 37: Comparison of average $\text{IGD}_{\text{offline}}$ ranks across different energy model risk ratios in Off-MOO-Bench.

## Appendix L How DOMOO Performance Varies with Different Training Set Sizes

To further examine the robustness of DOMOO with respect to the amount of available training data, we conduct an additional sensitivity analysis in which the dataset is randomly subsampled to 25%, 50%, 75%, and 100% of its original size. As reported in Tables [38](https://arxiv.org/html/2606.15115#A12.T38) and [39](https://arxiv.org/html/2606.15115#A12.T39), DOMOO maintains highly stable performance across all data scales. For most tasks (e.g., `in1kmop7`, `regex`, `re24`), both HV and $\mathrm{IGD}_{\mathrm{offline}}$ vary only marginally as the amount of training data changes, indicating that the method does not rely on large datasets to achieve strong performance.

Interestingly, the HV metric for `mo_hopper_v2` exhibits a slight downward trend as data size increases, while its $\mathrm{IGD}_{\mathrm{offline}}$ values remain consistent across all subsampling ratios. This suggests that the convergence behavior of DOMOO is not significantly affected by the available data volume. Overall, these results demonstrate that DOMOO is robust and sample-efficient, and its effectiveness persists even when the training data is substantially reduced.
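The subsampling protocol can be reproduced with a few lines of code. The sketch below assumes the offline dataset is held as two parallel lists of designs and objective vectors and that subsampling is uniform without replacement; the function name `subsample`, the seed, and the toy data are illustrative and not taken from the paper.

```python
import random


def subsample(designs, objectives, fraction, seed=0):
    """Uniformly subsample paired (design, objective) data without replacement."""
    n = len(designs)
    k = max(1, round(fraction * n))
    idx = sorted(random.Random(seed).sample(range(n), k))
    return [designs[i] for i in idx], [objectives[i] for i in idx]


# Toy dataset of 200 one-dimensional designs with two objectives each.
toy_designs = [[i / 200] for i in range(200)]
toy_objectives = [(x[0], 1.0 - x[0]) for x in toy_designs]

# One training set per data scale used in the sensitivity analysis.
subsets = {f: subsample(toy_designs, toy_objectives, f) for f in (0.25, 0.5, 0.75, 1.0)}
sizes = {f: len(xs) for f, (xs, _) in subsets.items()}
```

Fixing the seed keeps each subset reproducible across runs, so differences between data scales are not confounded with a different random draw.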

To further investigate the performance of DOMOO under varying levels of OOD severity, we prune the dataset by removing some high-quality data to simulate different OOD levels. The results, reported in Tables [40](https://arxiv.org/html/2606.15115#A12.T40)–[45](https://arxiv.org/html/2606.15115#A12.T45), show that DOMOO can effectively balance diversity and quality across different OOD levels. Notably, even under the most severe OOD condition, where only data between the 0th and 50th quality percentiles are retained (Tables [40](https://arxiv.org/html/2606.15115#A12.T40) and [41](https://arxiv.org/html/2606.15115#A12.T41)), DOMOO still maintains strong performance.
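A sketch of this kind of percentile-based pruning is given below. It assumes each data point carries a scalar quality score where higher means better; how the paper derives such scores for multi-objective data is not stated in this appendix, so the scoring, the function name `keep_up_to_percentile`, and the toy scores are assumptions.

```python
import math


def keep_up_to_percentile(scores, upper_percentile):
    """Indices of the points whose quality score lies in the lowest
    `upper_percentile` percent of the data (higher score = better, assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, math.ceil(len(scores) * upper_percentile / 100))
    return sorted(order[:k])


# Toy quality scores for four data points.
toy_scores = [0.9, 0.1, 0.5, 0.7]

severe = keep_up_to_percentile(toy_scores, 50)     # 0th to 50th percentile
moderate = keep_up_to_percentile(toy_scores, 75)   # 0th to 75th percentile
full = keep_up_to_percentile(toy_scores, 100)      # full dataset
```

The lower the cut-off, the further the best retained data lie from the true Pareto front, which is what makes the 0th to 50th percentile setting the most severe OOD case.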

Table 38: Comparison of average HV ranks across different tasks in Off-MOO-Bench under varying training dataset sizes (25%, 50%, 75%, and 100% of the full training data).

Table 39: Comparison of average $\text{IGD}_{\text{offline}}$ ranks across different tasks in Off-MOO-Bench under varying training dataset sizes (25%, 50%, 75%, and 100% of the full training data).

Table 40: Results on the subset of data with quality scores between the 0th and 50th percentiles. HV values are reported; higher HV indicates better performance.

Table 41: Results on the subset of data with quality scores between the 0th and 50th percentiles. $\text{IGD}_{\text{offline}}$ values are reported; lower $\text{IGD}_{\text{offline}}$ indicates better performance.

Table 42: Results on the subset of data with quality scores between the 0th and 75th percentiles. HV values are reported; higher HV indicates better performance.

Table 43: Results on the subset of data with quality scores between the 0th and 75th percentiles. $\text{IGD}_{\text{offline}}$ values are reported; lower $\text{IGD}_{\text{offline}}$ indicates better performance.

Table 44: Results on the full dataset (quality scores from the 0th to the 100th percentile). HV values are reported; higher HV indicates better performance.

Table 45: Results on the full dataset (quality scores from the 0th to the 100th percentile). $\text{IGD}_{\text{offline}}$ values are reported; lower $\text{IGD}_{\text{offline}}$ indicates better performance.

## Appendix M Performance of Online Pareto Set Learning Methods under Offline Optimization

As shown in Table [46](https://arxiv.org/html/2606.15115#A13.T46), online Pareto set learning methods, namely EPS (Ye et al., [2024](https://arxiv.org/html/2606.15115#bib.bib79)) and PSL-MOBO (Lin et al., [2022](https://arxiv.org/html/2606.15115#bib.bib16)), are not well-suited for offline optimization. When applied in the offline setting, they often encounter severe out-of-distribution (OOD) issues (Lu et al., [2023](https://arxiv.org/html/2606.15115#bib.bib43); Brookes et al., [2019](https://arxiv.org/html/2606.15115#bib.bib24)), i.e., they yield solutions on which the surrogate model is overconfident, leading to significant deterioration or even invalidation of the solutions.

Table 46: Hypervolume results of online Pareto set learning methods under offline optimization.
