Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization

arXiv cs.AI 06/02/26, 04:00 AM Papers
Summary
ATOM is a multi-agent framework that formulates molecular optimization as a tree-structured search with specialized agents along paths, enabling exploration of alternative molecular trajectories and improving Pareto coverage in multi-objective benchmarks.
arXiv:2606.00008v1 Announce Type: new Abstract: Multi-objective molecular optimization requires searching vast chemical spaces under conflicting objectives, where early design decisions strongly constrain downstream outcomes. Existing methods typically rely on a single policy or fixed scalarization, which limits their ability to represent diverse trade-offs and to explore multiple promising design trajectories. We propose ATOM, a multi-agent framework that formulates molecular optimization as a tree-structured search. Each node corresponds to an atomic operation and hosts an agent specialized for a particular objective or decision context. Agents coordinate along different paths of the tree rather than enforcing a global consensus, enabling the method to maintain and compare alternative molecular evolution trajectories. A global memory of past optimization behaviors further supports balanced exploration and exploitation across objectives. This tree-structured interaction enables reasoning over long-horizon dependencies inherent in molecular design. Experiments on challenging multi-objective benchmarks involving activity, synthesizability, and ADMET-related properties show that ATOM consistently achieves improved Pareto coverage and hypervolume over strong baselines. These results demonstrate the effectiveness of pathwise multi-agent coordination for molecular optimization. Code is available at https://anonymous.4open.science/r/ATOM-41CE.
# Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
Source: [https://arxiv.org/html/2606.00008](https://arxiv.org/html/2606.00008)
###### Abstract

Multi\-objective molecular optimization requires searching vast chemical spaces under conflicting objectives, where early design decisions strongly constrain downstream outcomes\. Existing methods typically rely on a single policy or fixed scalarization, which limits their ability to represent diverse trade\-offs and to explore multiple promising design trajectories\. We propose ATOM, a multi\-agent framework that formulates molecular optimization as a tree\-structured search\. Each node corresponds to an atomic operation and hosts an agent specialized for a particular objective or decision context\. Agents coordinate along different paths of the tree rather than enforcing a global consensus, enabling the method to maintain and compare alternative molecular evolution trajectories\. A global memory of past optimization behaviors further supports balanced exploration and exploitation across objectives\. This tree\-structured interaction enables reasoning over long\-horizon dependencies inherent in molecular design\. Experiments on challenging multi\-objective benchmarks involving activity, synthesizability, and ADMET\-related properties show that ATOM consistently achieves improved Pareto coverage and hypervolume over strong baselines\. These results demonstrate the effectiveness of pathwise multi\-agent coordination for molecular optimization\. Code is available at https://anonymous\.4open\.science/r/ATOM\-41CE\.

Machine Learning, ICML

## 1 Introduction

![Refer to caption](https://arxiv.org/html/2606.00008v1/x1.png)

Figure 1: (a) Correlations between representative molecular properties are weak or conflicting, illustrating the intrinsic difficulty of balancing multiple objectives. (b) Our tree-structured framework coordinates specialized agents along different search paths, enabling the exploration of alternative molecular evolution trajectories without enforcing a single global policy.

Multi-objective molecular optimization is a central task in early-stage drug discovery (De Rycker et al., [2018](https://arxiv.org/html/2606.00008#bib.bib49); Yang et al., [2024](https://arxiv.org/html/2606.00008#bib.bib35); Liu et al., [2025b](https://arxiv.org/html/2606.00008#bib.bib36)), where the goal is to refine lead compounds to simultaneously satisfy multiple, often conflicting, properties such as bioactivity, drug-likeness, and synthesizability (He et al., [2021](https://arxiv.org/html/2606.00008#bib.bib1)). Traditional approaches, including high-throughput screening (HTS) (Graff et al., [2021](https://arxiv.org/html/2606.00008#bib.bib53)) and simulation-based methods (Hsu et al., [2017](https://arxiv.org/html/2606.00008#bib.bib51)), are effective but typically require substantial time and computational resources, limiting their scalability.

Driven by advances in artificial intelligence, machine learning has emerged as a powerful paradigm for accelerating molecular discovery (Hoffman et al., [2022](https://arxiv.org/html/2606.00008#bib.bib113)). Many existing methods for multi-objective molecular design reduce the problem to a single-objective formulation by assigning predefined weights to individual objectives (Maziarka et al., [2020](https://arxiv.org/html/2606.00008#bib.bib60); Ji et al., [2021](https://arxiv.org/html/2606.00008#bib.bib61); Xia et al., [2024](https://arxiv.org/html/2606.00008#bib.bib58)). While this strategy can be effective in practice, it relies heavily on expert-designed weightings that are often difficult to calibrate and may bias the search toward suboptimal trade-offs when objectives are strongly conflicting (Xie et al., [2021](https://arxiv.org/html/2606.00008#bib.bib63); Fromer and Coley, [2023](https://arxiv.org/html/2606.00008#bib.bib59)). Alternatively, Pareto-based approaches attempt to approximate the Pareto front through large-scale sampling followed by non-dominated sorting (Yasonik, [2020](https://arxiv.org/html/2606.00008#bib.bib114); Verhellen, [2022](https://arxiv.org/html/2606.00008#bib.bib115)). However, such two-stage pipelines are computationally expensive and scale poorly as the number of objectives and candidate molecules increases. To improve sample efficiency, Bayesian optimization and Monte Carlo Tree Search (MCTS)-based methods have been widely adopted for de novo molecular generation and multi-objective property optimization (Yang et al., [2024](https://arxiv.org/html/2606.00008#bib.bib35); [Southiratn et al.](https://arxiv.org/html/2606.00008#bib.bib3); Xie et al., [2021](https://arxiv.org/html/2606.00008#bib.bib63); Gao et al., [2022](https://arxiv.org/html/2606.00008#bib.bib71)). Despite their principled treatment of uncertainty, these methods often suffer from scalability issues in high-dimensional chemical spaces, as well as the computational overhead associated with Gaussian process inference or deep tree expansions. These limitations hinder their practical deployment in realistic multi-objective molecular design settings.

More recently, the rapid progress of large language models (LLMs) (OpenAI et al., [2024](https://arxiv.org/html/2606.00008#bib.bib66); Bai et al., [2023](https://arxiv.org/html/2606.00008#bib.bib5); Dubey et al., [2024](https://arxiv.org/html/2606.00008#bib.bib109)) has sparked growing interest in their application to molecular generation (Brahmavar et al., [2024](https://arxiv.org/html/2606.00008#bib.bib116); Wang et al., [2025](https://arxiv.org/html/2606.00008#bib.bib117)) and optimization (Yu et al., [2025](https://arxiv.org/html/2606.00008#bib.bib118); Ye et al., [2025](https://arxiv.org/html/2606.00008#bib.bib6)). LLMs provide a flexible and scalable framework for goal-conditioned generation and reasoning across heterogeneous molecular properties (Nguyen and Grover, [2024](https://arxiv.org/html/2606.00008#bib.bib7)). However, existing LLM-based approaches typically formulate multi-objective optimization as a monolithic generation problem, lacking explicit mechanisms for coordinating trade-offs among conflicting objectives (Liu et al., [2025a](https://arxiv.org/html/2606.00008#bib.bib8)).

In this work, we view multi-objective molecular optimization not as learning a single optimal policy, but as coordinating multiple specialized decision-makers along distinct optimization paths. Based on this perspective, we propose ATOM (Agents on a Tree for multi-Objective Molecular optimization), a multi-agent framework that formulates molecular optimization as a tree-structured search. Each node corresponds to an atomic-level operation on a molecular population and hosts an agent specialized for a particular objective or decision context. Agents coordinate pathwise along different branches of the tree rather than enforcing global consensus, enabling explicit comparison of alternative molecular evolution trajectories. To support long-horizon coordination, ATOM incorporates a global memory that aggregates historical optimization behaviors and high-quality candidates across paths. This shared context preserves agent specialization while balancing exploration and exploitation under competing objectives. The resulting tree-structured interaction facilitates long-horizon reasoning and credit assignment in path-dependent chemical spaces.

In summary, our contributions are as follows: \(i\) We introduce a pathwise, tree\-structured formulation of multi\-objective molecular optimization that explicitly models alternative molecular evolution trajectories\. \(ii\) We propose ATOM, a multi\-agent framework in which specialized agents coordinate along different paths rather than collapsing into a single global policy\. \(iii\) We demonstrate empirically and theoretically that this structure leads to superior Pareto coverage and hypervolume on challenging multi\-objective benchmarks\.

## 2 Related Work

### 2.1 Molecular Optimization

Molecular optimization is a core problem in drug discovery and materials science, and has gradually shifted from manual experimentation to data-driven computational methods (Gao et al., [2022](https://arxiv.org/html/2606.00008#bib.bib71)). Existing approaches can be broadly categorized into two classes. (1) Combinatorial Optimization. Traditional approaches treat molecular design as a search problem over a discrete, exponentially large chemical space (Bohacek et al., [1996](https://arxiv.org/html/2606.00008#bib.bib9); Stumpfe and Bajorath, [2012](https://arxiv.org/html/2606.00008#bib.bib10)). Common techniques include Monte Carlo Tree Search (MCTS) (Yang et al., [2023](https://arxiv.org/html/2606.00008#bib.bib11)), Genetic Algorithms (GA) (Jensen, [2019](https://arxiv.org/html/2606.00008#bib.bib76); Fu et al., [2022](https://arxiv.org/html/2606.00008#bib.bib12)), and Reinforcement Learning (RL) (Bohacek et al., [1996](https://arxiv.org/html/2606.00008#bib.bib9); Stumpfe and Bajorath, [2012](https://arxiv.org/html/2606.00008#bib.bib10)). While these methods explore the structural space iteratively, they often struggle with high-dimensional search landscapes and the prohibitive computational cost of evaluating complex objectives. (2) Generative Models in Molecular Design. To mitigate the challenges of discrete search, recent research has moved toward generative modeling (Du et al., [2024](https://arxiv.org/html/2606.00008#bib.bib13)). These models learn the implicit probability distribution of chemical data to propose valid molecular candidates, effectively concentrating the search space. Various architectures have been explored, including Variational Autoencoders (VAEs) (Gómez-Bombarelli et al., [2018](https://arxiv.org/html/2606.00008#bib.bib83); Jin et al., [2018](https://arxiv.org/html/2606.00008#bib.bib84)), Generative Adversarial Networks (GANs) (Guimaraes et al., [2017](https://arxiv.org/html/2606.00008#bib.bib14)), flow-based models (Shi et al., [2020](https://arxiv.org/html/2606.00008#bib.bib15)), and diffusion models (Hoogeboom et al., [2022](https://arxiv.org/html/2606.00008#bib.bib16); Schneuing et al., [2024](https://arxiv.org/html/2606.00008#bib.bib17)).

### 2.2 LLMs for Molecular Optimization

Large language models have recently been applied to molecule-centered tasks, including property prediction and generation (Luo et al., [2022](https://arxiv.org/html/2606.00008#bib.bib89); Li et al., [2023](https://arxiv.org/html/2606.00008#bib.bib18); Han et al., [2023](https://arxiv.org/html/2606.00008#bib.bib91); Fang et al., [2023](https://arxiv.org/html/2606.00008#bib.bib19); Wu et al., [2024](https://arxiv.org/html/2606.00008#bib.bib92)). Several works adapt LLMs for optimization: MOLLEO uses LLMs as genetic operators to improve crossover and mutation (Wang et al., [2024](https://arxiv.org/html/2606.00008#bib.bib72)); LICO applies context-aware prompting for in-context molecule refinement without retraining (Nguyen and Grover, [2024](https://arxiv.org/html/2606.00008#bib.bib7)); DrugAssist presents a human-in-the-loop optimization framework that combines human insight with LLM reasoning (Ye et al., [2025](https://arxiv.org/html/2606.00008#bib.bib6)). Despite these advances, existing LLM-based approaches typically treat multi-objective optimization as a single unified generation task or instantiate agents for individual properties without mechanisms to coordinate across objectives. Consequently, inter-objective conflicts are seldom modeled explicitly and these methods cannot reliably discover coordinated, globally effective optimization trajectories. In contrast, our work frames each objective as an autonomous chemistry-aware agent and uses Monte Carlo Tree Search to plan coordinated agent actions, enabling explicit trade-off reasoning and automated discovery of optimization paths in molecular design.

## 3 Preliminary

We study multi-objective molecular optimization over a discrete chemical space. Let $\mathcal{X}$ denote the set of all chemically valid molecules, where each molecule $x \in \mathcal{X}$ is represented by a valid SMILES string. Each molecule $x$ is associated with a $K$-dimensional objective vector $\mathbf{f}(x) = [f_1(x), \ldots, f_K(x)] \in \mathbb{R}^K$, where each objective function $f_k : \mathcal{X} \rightarrow \mathbb{R}$ evaluates a molecular property of interest, such as target-specific bioactivity, drug-likeness (QED), or synthetic accessibility (SA). These scoring functions are generally non-differentiable and treated as black-box evaluators. Given an initial molecule $x_0 \in \mathcal{X}$, the goal is to generate a set of candidate molecules that jointly optimize the objectives. This is formulated as a $K$-objective maximization problem:

$$\max_{x \in \mathcal{X}} \; \mathbf{f}(x). \tag{1}$$

Due to conflicting objectives, no single solution is optimal for all criteria. Instead, we aim to identify a diverse set of trade-off solutions.

###### Definition 3.1 (Pareto Dominance).

Let $\mathcal{S} \subset \mathbb{R}^K$ denote a non-empty set of objective vectors obtained from candidate molecules. For any $X, Y \in \mathcal{S}$, we say that $Y$ *dominates* $X$, denoted by $Y \succ X$, if

$$Y \succ X \quad \Longleftrightarrow \quad \left\{ \begin{aligned} &Y_k \geq X_k, \quad \forall k \in \{1, \ldots, K\}, \\ &\exists k' \in \{1, \ldots, K\}\ \text{s.t.}\ Y_{k'} > X_{k'}. \end{aligned} \right. \tag{2}$$

An objective vector $X \in \mathcal{S}$ is *non-dominated* if there exists no $Y \in \mathcal{S}$ such that $Y \succ X$.

###### Definition 3.2 (Pareto Fronts).

The *first Pareto front*, also known as the *Pareto optimal set*, consists of all non-dominated solutions:

$$\mathcal{S}_1 = \left\{ X \in \mathcal{S} \;:\; \nexists Y \in \mathcal{S}\ \text{s.t.}\ Y \succ X \right\}. \tag{3}$$

Subsequent Pareto fronts are defined recursively by removing solutions in the preceding fronts. The $k$-th Pareto front is given by

$$\mathcal{S}_k = \left\{ X \in \mathcal{S} \setminus \bigcup_{i=1}^{k-1} \mathcal{S}_i \;:\; \nexists Y \in \mathcal{S} \setminus \bigcup_{i=1}^{k-1} \mathcal{S}_i\ \text{s.t.}\ Y \succ X \right\}. \tag{4}$$
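The two definitions above translate directly into code. The following is a minimal Python sketch, not the authors' implementation: `dominates` and `pareto_fronts` are hypothetical helper names, and the objective vectors are toy values chosen for illustration. It peels fronts one at a time, as in Eqs. (3) and (4), using the maximization convention of Eq. (2).

```python
from typing import List, Sequence


def dominates(y: Sequence[float], x: Sequence[float]) -> bool:
    """Return True if y Pareto-dominates x under maximization (Eq. 2)."""
    no_worse = all(yk >= xk for yk, xk in zip(y, x))
    strictly_better = any(yk > xk for yk, xk in zip(y, x))
    return no_worse and strictly_better


def pareto_fronts(points: List[Sequence[float]]) -> List[List[int]]:
    """Peel successive Pareto fronts (Eqs. 3-4); each front is a list of indices."""
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point is in the current front if nothing still remaining dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Toy two-objective vectors (illustrative values, not from the paper).
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
fronts = pareto_fronts(scores)
print(fronts)
```

This quadratic-time peeling is adequate for small populations; larger candidate sets would normally call for a faster non-dominated sorting routine.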

## 4 Method

![Refer to caption](https://arxiv.org/html/2606.00008v1/x2.png)

Figure 2: Agents-on-a-Tree framework for multi-objective molecular optimization, combining pathwise MCTS planning with knowledge-mediated coordination among specialized agents to improve Pareto coverage under conflicting objectives.

### 4.1 Algorithmic Framework of ATOM

Multi-Agent Attribute-Specific Optimization. As illustrated in Figure 1, we instantiate a collection of expert agents within the proposed ATOM framework, where each agent is explicitly specialized for optimizing a particular molecular property. Each expert is instantiated as an LLM, such as GPT-4o mini (OpenAI, [2024](https://arxiv.org/html/2606.00008#bib.bib34)), and assigned a well-defined domain role corresponding to a specific optimization objective. Concretely, these roles include a QED expert, an SA expert, and target-specific experts for GSK3β and JNK3, which are closely associated with Alzheimer's disease.

Recent studies have shown that LLM performance on biochemical and molecular reasoning tasks can be substantially enhanced through domain-aware prompt engineering (Luo et al., [2025](https://arxiv.org/html/2606.00008#bib.bib111); Li et al., [2025](https://arxiv.org/html/2606.00008#bib.bib122)). Motivated by these findings, ATOM adopts expert-specific prompt templates that explicitly specify the task scope, optimization objective, input representation, and expected output format for each agent. This design enforces functional disentanglement across experts while enabling focused and interpretable decision-making.

To further improve specialization and practical effectiveness, each expert agent in ATOM is equipped with tool\-calling capabilities, enabling on\-demand access to domain\-specific tools such as RDKit and learned oracles for molecular property evaluation\. These tools allow agents to ground their reasoning in quantitative feedback, thereby supporting informed optimization actions rather than purely textual heuristics\. Detailed descriptions of the tool interfaces and invocation protocols are provided in Appendix A\.

Adaptive Trajectory Selection via UCT-Style Scoring. We employ a Monte Carlo Tree Search (MCTS) framework to adaptively select optimization trajectories in the molecular search space. Each node in the tree represents a population of molecules, and its value is designed to reflect both task-oriented optimization quality and multi-objective structural diversity. Specifically, the intrinsic value of a node $N$ is defined as

$$V(N) = \lambda \cdot S_{\mathrm{attr}}(N) + (1 - \lambda) \cdot \widehat{\mathrm{HV}}(N), \tag{5}$$

where $S_{\mathrm{attr}}(N)$ denotes an attribute-weighted score over the top-performing molecules in the node, $\widehat{\mathrm{HV}}(N)$ is the normalized hypervolume of the Pareto front induced by the node, and $\lambda \in [0, 1]$ controls the trade-off between directional optimization and diversity preservation. The attribute score is computed as

$$S_{\mathrm{attr}}(N) = \frac{1}{|\mathrm{Top}\text{-}k(N)|} \sum_{m \in \mathrm{Top}\text{-}k(N)} \sum_{i=1}^{K} w_i \tilde{f}_i(m), \tag{6}$$

where $\tilde{f}_i(m)$ denotes the normalized value of the $i$-th molecular property and $w_i$ represents its corresponding importance weight provided by expert agents.
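As an illustration of Eqs. (5) and (6), the sketch below computes a node value from a small population of normalized property vectors. It is a hedged example rather than the paper's code: the function names are hypothetical, the normalized hypervolume is passed in as a precomputed number, and the Top-k set is assumed to be the molecules with the highest weighted property sum, since the text does not state how the top performers are ranked.

```python
from typing import List, Sequence


def attribute_score(population: List[Sequence[float]],
                    weights: Sequence[float],
                    top_k: int) -> float:
    """Eq. 6: mean weighted property sum over the top-k molecules of a node."""
    # Assumption: "top-performing" means the highest weighted sum of normalized properties.
    weighted = sorted(
        (sum(w * f for w, f in zip(weights, props)) for props in population),
        reverse=True,
    )
    top = weighted[:top_k]
    return sum(top) / len(top)  # population is assumed to be non-empty


def node_value(population: List[Sequence[float]],
               weights: Sequence[float],
               hv_normalized: float,
               lam: float = 0.5,
               top_k: int = 2) -> float:
    """Eq. 5: blend of the attribute score and the normalized hypervolume."""
    return lam * attribute_score(population, weights, top_k) + (1.0 - lam) * hv_normalized


# Toy node with three molecules and two normalized properties (illustrative values).
population = [(0.8, 0.6), (0.4, 0.9), (0.2, 0.2)]
value = node_value(population, weights=(0.5, 0.5), hv_normalized=0.5, lam=0.6, top_k=2)
print(round(value, 3))
```

With these toy inputs the two best weighted sums are 0.70 and 0.65, so the attribute score is 0.675 and the node value is about 0.605. Raising `lam` shifts the value toward directional optimization, lowering it toward diversity.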

During the selection phase, child nodes are chosen by maximizing a UCT\-style score:

$$\mathrm{UCT}(N) = V(N) + c \sqrt{\frac{\log N_{\mathrm{parent}}}{N_{\mathrm{visit}}(N)}}, \tag{7}$$

where $N_{\mathrm{visit}}(N)$ denotes the number of visits to node $N$, and $c$ controls the degree of exploration. This formulation prioritizes nodes with high intrinsic value while explicitly encouraging exploration of under-visited regions.
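A minimal sketch of the selection rule in Eq. (7) follows. It assumes, as in standard UCT, that $N_{\mathrm{parent}}$ is the visit count of the parent node and that unvisited children are tried first; neither detail is spelled out in the text. The `Node` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    value: float                      # intrinsic value V(N)
    visits: int = 0                   # N_visit(N)
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """Eq. 7: intrinsic value plus an exploration bonus."""
    if child.visits == 0:
        # Assumption: unvisited children get priority, as in standard UCT.
        return math.inf
    # parent_visits is assumed to be at least 1 so that the logarithm is defined.
    return child.value + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = 1.0) -> Node:
    """Pick the child that maximizes the UCT-style score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


# Toy example: a well-explored strong child versus a rarely visited weaker one.
parent = Node(value=0.0, visits=10, children=[Node(0.6, 8), Node(0.5, 2)])
explore = select_child(parent, c=1.0)   # the exploration bonus favors the second child
exploit = select_child(parent, c=0.0)   # pure exploitation favors the first child
```

Setting `c` to zero reduces the rule to greedy selection on $V(N)$, which makes the role of the exploration term easy to see.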

After rollout termination, node values are propagated upward using a hierarchical averaging rule:

$$V_{\mathrm{parent}} \leftarrow \frac{1}{2} \left( V_{\mathrm{parent}} + \frac{1}{|\mathcal{C}|} \sum_{i=1}^{|\mathcal{C}|} V_{\mathrm{child}}^{(i)} \right), \tag{8}$$

where $\mathcal{C}$ denotes the set of child nodes. This update strategy stabilizes value estimation by integrating the parent's prior estimate with aggregated feedback from its descendants, enabling robust and adaptive trajectory selection throughout the search process.
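The averaging rule in Eq. (8) can be sketched as follows. The example assumes the update is applied to every ancestor on the path from the expanded leaf back to the root, which is the usual MCTS backup pattern but is not stated explicitly here. The `Node` class and `backpropagate` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    value: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, value: float) -> "Node":
        child = Node(value=value, parent=self)
        self.children.append(child)
        return child


def backpropagate(leaf: Node) -> None:
    """Apply Eq. 8 to each ancestor of leaf, from its parent up to the root."""
    node = leaf.parent
    while node is not None:
        child_mean = sum(ch.value for ch in node.children) / len(node.children)
        node.value = 0.5 * (node.value + child_mean)
        node = node.parent


# Toy tree: root -> {a, b}, a -> {leaf}. Values are illustrative.
root = Node(value=0.4)
a = root.add_child(0.6)
b = root.add_child(0.2)
leaf = a.add_child(1.0)
backpropagate(leaf)
print(round(a.value, 3), round(root.value, 3))
```

Here `a` moves from 0.6 to 0.8, and the root moves from 0.4 to 0.45 because it averages its prior value with the mean of its children (0.8 and 0.2). The factor of one half means a single strong rollout shifts an ancestor only part of the way.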

### 4.2 Theoretical Analysis of ATOM

The theoretical complexity of searching for optimal molecular structures typically grows exponentially with the decision depth $L$ (molecule length) and the branching factor $K$ (chemical space dimensionality), rendering exhaustive search computationally infeasible (Polishchuk et al., [2013](https://arxiv.org/html/2606.00008#bib.bib27)). Furthermore, traditional Monte Carlo Tree Search (MCTS) is often restricted to sequential decision-making for a single molecule, which fails to capture the evolutionary characteristics of molecular populations under complex multi-objective distributions. To address these challenges, we map the molecular generation process to the ATOM framework. We propose a population-based search paradigm that leverages theoretical sample complexity bounds for value and policy networks (Silver et al., [2017](https://arxiv.org/html/2606.00008#bib.bib28)); the assumptions underpinning our theoretical analysis are detailed in Appendix F. This framework ensures that the method remains computationally tractable in high-dimensional spaces while maintaining practical relevance through population diversity.

Escape via Agent Synergy. This step demonstrates how the synergy among specialized agents ensures the search escapes single-objective local optima by concentrating the sample budget on orthogonal descent directions. In our framework, the policy network acts as an orchestrator that assigns selection weights to $K$ specialized agents. Let the candidate child populations generated from state $s_d$ by the set of agents $\{Ag_1, \dots, Ag_K\}$ be denoted as $r_1, \dots, r_K$. According to Assumption 1, if the current population is trapped in a local optimum for objective $i$, there exists at least one orthogonal agent (denoted as $Ag_1$) that provides a significant value improvement $\delta$. Specifically, for any non-orthogonal agent $Ag_k$ ($k > 1$), we have:

$$\mathbb{E}[V_{r_1} - V_{r_k}] \geq \delta. \tag{9}$$

Within the selection logic of ATOM, the expected sample complexity is governed by the algorithm's capability to identify and track this synergistic direction. The synergy manifests in the concentration of the policy distribution $p_r$: as search depth $d$ increases, the policy network adaptively assigns higher prior probabilities to specialized agents that resolve the current "property bottlenecks" of the population based on evaluation feedback. The probability that the search fails to escape the local optimum, by "mis-selecting" a sub-optimal agent $Ag_k$ that offers no marginal gain, is bounded by the likelihood that its noisy score exceeds that of the optimal agent $Ag_1$:

$$\mathbb{P}(\text{Select } Ag_k) \approx \mathbb{P}(V_{r_1} - V_{r_k} + X^{\pi}_1 - X^{\pi}_k \leq \text{UCB Bias}). \tag{10}$$

Given the lower bound $\delta$ on the value gap from Assumption 1 and the decaying noise variance $\sigma_d$ from Assumption 2, this mis-selection probability decays at a rate of $O(\exp(-\delta^2/\sigma_d^2))$. Consequently, the multi-agent framework ensures that the search does not stall at single-property local optima. Instead, it adaptively switches to an orthogonal agent branch, leveraging the steepest descent direction toward the Pareto front. This mechanism effectively reduces the branching factor to only those agents contributing to the joint objective improvement, thereby significantly enhancing search efficiency.
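One way to see where a rate of this form can come from is the following sketch. It is an illustration under extra assumptions that are not stated in the text: the realized value gap satisfies $V_{r_1} - V_{r_k} \geq \delta$, the policy noises $X^{\pi}_1$ and $X^{\pi}_k$ are independent, zero-mean, and $\sigma_d$-sub-Gaussian, and the UCB bias is a constant $b < \delta$.

```latex
\begin{aligned}
\mathbb{P}(\text{Select } Ag_k)
  &\approx \mathbb{P}\big(V_{r_1} - V_{r_k} + X^{\pi}_1 - X^{\pi}_k \le b\big) \\
  &\le \mathbb{P}\big(X^{\pi}_k - X^{\pi}_1 \ge \delta - b\big)
     && \text{since } V_{r_1} - V_{r_k} \ge \delta \\
  &\le \exp\!\left(-\frac{(\delta - b)^2}{4\sigma_d^2}\right)
     && \text{sub-Gaussian tail with variance proxy } 2\sigma_d^2 .
\end{aligned}
```

When $b$ is small relative to $\delta$, the bound behaves like $\exp(-\delta^2/(4\sigma_d^2))$, which has the same form as the stated rate up to the constant in the exponent. The formal assumptions used by the paper are those referenced in Appendix F.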

Pruning via UCB. Expansion is restricted to nodes whose optimistic upper bounds are at least the true optimal value $V^*$, thereby pruning provably sub-optimal branches:

$$U(s') + c_d \geq V^* \iff V(s') + X_{s'} + c_d \geq V^*. \tag{11}$$

Rearranging terms, this is equivalent to the event where the noise and exploration bonus exceed the optimality gap $\Delta_{s'} = V^* - V(s')$:

$$\mathbb{P}(\text{Expand } s') = \mathbb{P}(\Delta_{s'} - X_{s'} - c_d \leq 0). \tag{12}$$

Under Assumption 2 (Optimistic Pruning), $c_d$ is chosen such that $|X_{s'}| \leq c_d$ with high probability. For "wrong" branches (Agent Mismatch) where the chosen agent does not address the current molecular bottleneck (e.g., optimizing QED when SA is invalid), the gap $\Delta_{s'}$ remains large. Consequently, the probability $\mathbb{P}(\Delta_{s'} \leq X_{s'} + c_d)$ becomes negligible, ensuring that the number of sub-optimal expansions in the "SubOptimal" term of the main theorem remains sparse.

Contraction via Knowledge Base. The standard sample complexity for MCTS is polynomial in the tree depth $D$ under decaying noise models:

$$\mathbb{E}[N] \approx O((KD)^C), \tag{13}$$

where $C$ depends on the noise decay rate. However, in de novo molecular generation, the physical edit distance $L$ (number of atoms/bonds added) to reach a high-value molecule from scratch is large, making $K^L$ intractable. Assumption 3 asserts that conditioning on the Knowledge Base $\mathcal{K}$ allows the generation of a target molecule in $L_{KB}$ steps. This effectively replaces the physical depth $D = L$ with the effective depth $D_{eff} = L_{KB}$ in the summation of the sample complexity bound:

$$\sum_{d=0}^{L} (\dots) \longrightarrow \sum_{d=0}^{L_{KB}} (\dots). \tag{14}$$

Since the complexity grows polynomially (or exponentially in worst cases) with depth, the reduction $L_{KB} < L$ provided by the KB prompts results in a significant contraction of the search space, enabling the sample complexity to remain bounded even for complex, multi-constraint objectives.

#### 4.2.1 Knowledge-Mediated Agent Coordination

We introduce a knowledge-mediated coordination mechanism to enable structured information sharing among specialized expert agents within the MCTS framework. Each tree node $N$ maintains a molecular population $P_N$ and is associated with a set of expert agents $\{A_1, \dots, A_K\}$, each responsible for optimizing a specific molecular property.

##### Lateral Knowledge Exchange\.

At the same tree depth, expert agents exchange compact, attribute-ranked summaries to provide complementary guidance while preserving a designated lead objective. For node $N$, each agent $A_k$ extracts a top-ranked subset $S_k(N)$ from $P_N$ according to its objective. When agent $A_q$ is selected as the lead optimizer, its effective scoring integrates auxiliary suggestions from other agents, such as

$$\tilde{S}_q(N) = S_q(N) \cup \bigcup_{j \neq q} \omega_{j \rightarrow q} S_j(N), \tag{15}$$

where $\omega_{j \rightarrow q}$ controls the influence of auxiliary agents. This mechanism enables constraint-aware optimization without diluting the primary objective.
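Eq. (15) leaves the meaning of a weighted union open, so the sketch below adopts one possible reading and should be taken as an assumption rather than the authors' definition: the merged set is a mapping from molecule to influence weight, lead-agent picks carry weight 1, auxiliary picks carry $\omega_{j \rightarrow q}$, and a molecule proposed by several agents keeps its largest weight. All names and scores are hypothetical.

```python
from typing import Dict, List

Population = Dict[str, Dict[str, float]]  # SMILES -> {objective name: normalized score}


def top_subset(population: Population, objective: str, size: int) -> List[str]:
    """S_k(N): the molecules ranked highest by a single agent's objective."""
    return sorted(population, key=lambda m: population[m][objective], reverse=True)[:size]


def lateral_exchange(population: Population,
                     lead: str,
                     aux_weights: Dict[str, float],
                     size: int = 2) -> Dict[str, float]:
    """One reading of Eq. 15: lead picks get weight 1, auxiliary picks get omega."""
    merged = {m: 1.0 for m in top_subset(population, lead, size)}
    for objective, omega in aux_weights.items():
        for m in top_subset(population, objective, size):
            # Assumption: a molecule suggested by several agents keeps its largest weight.
            merged[m] = max(merged.get(m, 0.0), omega)
    return merged


# Toy population with made-up scores, for illustration only.
population = {
    "CCO": {"qed": 0.9, "sa": 0.2},
    "c1ccccc1": {"qed": 0.7, "sa": 0.5},
    "CCN": {"qed": 0.3, "sa": 0.8},
}
merged = lateral_exchange(population, lead="qed", aux_weights={"sa": 0.3})
print(merged)
```

In this toy run the QED lead keeps its own two picks at full weight, while the SA agent contributes one additional molecule at weight 0.3, so auxiliary advice is visible to the lead agent without overriding its ranking.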

##### Hierarchical Knowledge Propagation\.

To bias expansion toward promising regions, parent nodes propagate high-quality molecules to their children. Specifically, given a parent node $N_p$ and its Pareto front $\mathcal{P}(N_p)$, a subset of top-performing molecules is selected and injected into a child node $N_c$:

$$\begin{aligned} \Pi(N_p \to N_c) &= \text{Top-}r(\mathcal{P}(N_p)), \\ P_{N_c} &\leftarrow P_{N_c} \cup \Pi(N_p \to N_c). \end{aligned} \tag{16}$$

This hierarchical propagation transfers favorable structural motifs and multi-objective trade-offs along the search trajectory.
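The propagation step in Eq. (16) can be sketched as below. The ranking used for Top-$r$ within the Pareto front is not given in the text, so this example assumes the sum of normalized objectives. The function names and molecule labels are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Population = Dict[str, Tuple[float, ...]]  # molecule id -> normalized objective vector


def dominates(y: Tuple[float, ...], x: Tuple[float, ...]) -> bool:
    """Pareto dominance under maximization."""
    return all(a >= b for a, b in zip(y, x)) and any(a > b for a, b in zip(y, x))


def pareto_front(population: Population) -> List[str]:
    """P(N_p): molecules whose objective vectors are not dominated within the node."""
    return [
        m for m, fm in population.items()
        if not any(dominates(fo, fm) for o, fo in population.items() if o != m)
    ]


def propagate(parent: Population, child: Population, r: int) -> Population:
    """Eq. 16: inject the top-r Pareto-front molecules of the parent into the child."""
    # Assumption: Top-r ranks front members by the sum of their normalized objectives.
    elite = sorted(pareto_front(parent), key=lambda m: sum(parent[m]), reverse=True)[:r]
    merged = dict(child)
    for m in elite:
        merged.setdefault(m, parent[m])
    return merged


# Toy populations with illustrative scores.
parent_pop = {"A": (0.9, 0.3), "B": (0.5, 0.6), "C": (0.4, 0.4), "D": (0.2, 0.8)}
child_pop = {"E": (0.3, 0.3)}
new_child = propagate(parent_pop, child_pop, r=2)
print(sorted(new_child))
```

Molecule C is dominated by B and never reaches the child, while D is on the front but is cut by the `r=2` limit, so the child receives A and B alongside its own member E.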

##### Global Knowledge Integration\.

We further maintain a dynamic global memory $\mathcal{M}$ that aggregates high-quality molecules discovered across all search trajectories. Each node periodically contributes its best candidates to $\mathcal{M}$, while agents retrieve relevant exemplars during local optimization:

$$\mathcal{M} \leftarrow \mathrm{Top}\text{-}M\big(\mathcal{M} \cup \{(m, s_m)\}\big), \tag{17}$$

where $s_m$ denotes a composite attribute score. Retrieved molecules are incorporated into the agent's scoring function as global guidance:

$$S_{\mathrm{attr}}^{(\mathcal{M})}(N) = (1 - \gamma)\, S_{\mathrm{attr}}(N) + \gamma\, S_{\mathrm{attr}}\big(\mathrm{Retrieve}(\mathcal{M})\big), \tag{18}$$

with $\gamma$ controlling the influence of global knowledge.
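The memory update in Eq. (17) and the blended score in Eq. (18) can be illustrated with a small bounded store. This is a sketch under assumptions: `GlobalMemory` is a hypothetical class, retrieval simply returns the highest-scoring entries, and the attribute score of the retrieved set is approximated by the mean of their composite scores.

```python
import heapq
from typing import Dict, List, Tuple


class GlobalMemory:
    """Keeps the M best (molecule, composite score) pairs seen so far (Eq. 17)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Dict[str, float] = {}

    def add(self, molecule: str, score: float) -> None:
        # Keep the best score observed for a molecule, then trim back to capacity.
        self.items[molecule] = max(score, self.items.get(molecule, float("-inf")))
        if len(self.items) > self.capacity:
            keep = heapq.nlargest(self.capacity, self.items.items(), key=lambda kv: kv[1])
            self.items = dict(keep)

    def retrieve(self, n: int) -> List[Tuple[str, float]]:
        # Assumption: retrieval returns the n highest-scoring entries.
        return heapq.nlargest(n, self.items.items(), key=lambda kv: kv[1])


def blended_score(local_score: float, retrieved_score: float, gamma: float) -> float:
    """Eq. 18: mix the node's own attribute score with the retrieved exemplars' score."""
    return (1.0 - gamma) * local_score + gamma * retrieved_score


# Toy usage with placeholder molecule ids and made-up scores.
memory = GlobalMemory(capacity=2)
for mol, s in [("m1", 0.5), ("m2", 0.8), ("m3", 0.6)]:
    memory.add(mol, s)
exemplars = memory.retrieve(2)
retrieved_mean = sum(s for _, s in exemplars) / len(exemplars)
score = blended_score(local_score=0.4, retrieved_score=retrieved_mean, gamma=0.25)
print(exemplars, round(score, 3))
```

After the three insertions the memory holds only m2 and m3, and with `gamma=0.25` the node's score rises from 0.4 to about 0.475. A larger `gamma` pulls local scoring harder toward the global exemplars.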

Together, these mechanisms enable coordinated optimization across agents and search trajectories, preserving agent specialization while promoting efficient multi\-objective molecular exploration\.

## 5 Experiments

Implementation Details. We consider four molecular property objectives and analyze their pairwise correlations on the ZINC20 (Irwin et al., [2020](https://arxiv.org/html/2606.00008#bib.bib29)) dataset. As shown in Figure [1](https://arxiv.org/html/2606.00008#S1.F1), most objective pairs exhibit negative correlation, indicating substantial inter-objective conflicts. Based on this, we design five task settings with varying conflict intensities: 1) QED+SA (non-biological objectives): Drug-likeness (QED) and synthetic accessibility (SA) measure the developability and synthesizability of molecules, computed using RDKit. 2) GSK3β+JNK3 (biological objectives): The inhibition of GSK3β and JNK3, two kinase targets associated with Alzheimer's disease, predicted using random forest models. 3) QED+SA+GSK3β/JNK3: Optimization of either GSK3β or JNK3 inhibition under constraints of good QED and SA properties. 4) GSK3β+JNK3+QED: Simultaneous optimization of GSK3β and JNK3 inhibition with QED constraints, without explicit control over synthetic accessibility. 5) QED+SA+GSK3β+JNK3: Joint optimization across all four objectives to balance activity, drug-likeness, and synthesizability.

Benchmarks and Evaluation Metrics. To ensure a diverse chemical initialization for multi-objective optimization, we leverage the ZINC20 dataset. Our optimization objectives span four critical dimensions: GSK3β inhibition, JNK3 inhibition, QED, and SA. To rigorously quantify the performance across this multi-dimensional frontier, we adopt the hypervolume (HV) indicator (Zitzler et al., [2003](https://arxiv.org/html/2606.00008#bib.bib33)) as our primary metric. HV measures the volume of the objective space dominated by the attained solution set relative to a predefined reference point, thereby providing a joint characterization of both Pareto convergence and distributional diversity. Furthermore, we supplement our analysis with standard molecular generation metrics, including novelty and diversity, to provide a holistic assessment of the generative performance.
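To make the hypervolume indicator concrete, the sketch below computes it for the two-objective maximization case only. The benchmarks here involve up to four objectives, for which a general-purpose hypervolume routine would be needed. The function name, reference point, and sample points are illustrative assumptions.

```python
from typing import Iterable, Tuple


def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   ref: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by the points (maximization) and bounded below by ref."""
    # Only points strictly better than the reference in both objectives contribute.
    pts = sorted(
        (p for p in points if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # New horizontal strip between the previous best y and this point's y.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Three mutually non-dominated toy points plus one dominated point.
front = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
hv = hypervolume_2d(front)
hv_with_dominated = hypervolume_2d(front + [(0.4, 0.4)])
print(round(hv, 3), round(hv_with_dominated, 3))
```

Both calls give an area of about 0.41: the dominated point adds nothing, which is why HV rewards solution sets that are both close to the Pareto front and spread along it.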

Baselines\. We evaluate ATOM against a comprehensive suite of baselines, grouped into traditional molecular optimization frameworks and recent Large Language Model \(LLM\)\-based methods\. The traditional baselines comprise: \(1\) SMILES LSTM \(Bjerrum, [2017](https://arxiv.org/html/2606.00008#bib.bib30)\), a recurrent generative model that optimizes SMILES sequences through reinforcement learning; \(2\) SMILES GA \(Brown et al\., [2019](https://arxiv.org/html/2606.00008#bib.bib101)\), which employs a genetic algorithm to evolve SMILES representations; \(3\) GRAPH GA \(Jensen, [2019](https://arxiv.org/html/2606.00008#bib.bib76)\), a genetic algorithm operating directly on molecular graph structures; \(4\) STONED \(Nigam et al\., [2021](https://arxiv.org/html/2606.00008#bib.bib31)\), an efficient algorithm designed for rapid chemical space exploration via random string\-level mutations; and \(5\) GP BO \(Tripp et al\., [2021](https://arxiv.org/html/2606.00008#bib.bib102)\), a Bayesian optimization approach leveraging Gaussian processes\. For LLM\-based baselines, we compare against: \(1\) Drugassist \(Ye et al\., [2025](https://arxiv.org/html/2606.00008#bib.bib6)\), which formulates molecule optimization as an interactive dialogue with an LLM; \(2\) a direct GPT\-4o mini \(Achiam et al\., [2023](https://arxiv.org/html/2606.00008#bib.bib4)\)\-based optimization baseline, which applies a single\-agent LLM to optimize multiple objectives simultaneously; and \(3\) EAG \(Gu et al\., [2025](https://arxiv.org/html/2606.00008#bib.bib32)\), a multi\-agent framework that decomposes complex optimization tasks into coordinated pipeline stages\.

### 5\.1Result

Performance Comparison on Multi\-Objective Tasks\. Table [1](https://arxiv.org/html/2606.00008#S5.T1) reports the HV scores of different methods across six multi\-objective molecular optimization tasks\. ATOM achieves the best performance, yielding the highest HV score in each objective combination and the largest overall HV sum \(4\.351\)\. This demonstrates its ability to explore diverse high\-quality solutions and to balance conflicting objectives\. On simpler dual\-objective tasks such as QED\+SA and GSK3β\+JNK3, most baselines perform reasonably well, especially GRAPH GA and SMILES GA\. However, their performance declines substantially as the number of objectives increases\. Notably, SMILES LSTM and STONED perform poorly on tasks involving biological targets, highlighting their limited scalability in high\-dimensional objective spaces\. Among LLM\-based approaches, single\-agent methods such as GPT\-4o mini \(HV sum: 3\.678\) show limited capability in handling objective conflicts\. While the multi\-agent framework EAG \(HV sum: 3\.752\) demonstrates improved coordination, ATOM’s specialized expert collaboration and dynamic scheduling outperform all of these LLM\-based alternatives\.

Table 1: Performance comparison on multi\-objective molecular optimization tasks\. Higher values indicate better performance\.

![Refer to caption](https://arxiv.org/html/2606.00008v1/figure/figure_3_academic_new.png)Figure 3: Normalized distributions of QED, SA, GSK3β, and JNK3 scores of the generated molecules\.

Multi\-Objective Results and Pareto Front Analysis\. To evaluate the optimization efficiency and trade\-off management of our proposed method, we analyze the property distributions and Pareto frontiers of the generated molecules\. As illustrated in Figure [3](https://arxiv.org/html/2606.00008#S5.F3), ATOM consistently shifts the probability density of key molecular properties \(GSK3β, JNK3, QED, and SA\) towards the high\-score regions\. While baseline methods such as SMILES GA and GRAPH GA often exhibit inconsistent performance or produce a high density of low\-quality candidates, ATOM reliably identifies high\-quality molecules across all metrics\. This distributional advantage translates into a more effective exploration of the objective space\. As shown in the Pareto front analysis in Figure [4](https://arxiv.org/html/2606.00008#S5.F4), ATOM yields non\-dominated solution sets that demonstrate competitive convergence and superior trade\-off balance, particularly in drug\-likeness\-oriented tasks\. In the challenging GSK3β\+QED and JNK3\+QED scenarios, ATOM significantly outperforms baselines by maintaining high biological activity scores \(\>0\.8\) even as the QED values approach the high\-quality region \(∼0\.9\)\. While other LLM\-based methods such as Drugassist and EAG experience a sharp decay in target affinity when optimizing for molecular properties, ATOM maintains broader coverage near the “knee region” of the frontier\. These results validate the effectiveness of ATOM’s dynamic coordination mechanism, which adaptively manages the interdependencies between objectives to preserve a higher\-quality and more robust solution distribution\.
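
The non\-dominated sets discussed in the Pareto front analysis can be extracted with a simple dominance filter. The sketch below assumes that every objective is maximized (with SA normalized accordingly) and uses the hypothetical helper names `dominates` and `pareto_front`. It is a quadratic\-time illustration, not the authors' implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, a molecule scoring (0.4, 0.4) on two objectives is removed when another molecule scores (0.5, 0.5), while molecules at (0.9, 0.2) and (0.2, 0.9) both remain because each is better on one objective.
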

![Refer to caption](https://arxiv.org/html/2606.00008v1/x3.png)Figure 4: Non\-dominated solutions of various methods on the GSK3β\+JNK3, QED\+SA, GSK3β\+QED, and JNK3\+QED objectives\.

## 6Case Studies

### 6\.1Visualization of potential dual\-inhibitors for JNK3\-GSK3β

![Refer to caption](https://arxiv.org/html/2606.00008v1/figure/case_study_new_2.png)Figure 5: Examples of molecules generated by ATOM on the JNK3\-GSK3β target pair\.

Figure [5](https://arxiv.org/html/2606.00008#S6.F5) presents two representative dual\-target inhibitors \(Compound 1 and Compound 2\) generated by ATOM to jointly optimize binding to GSK3β \(PDB: 6Y9S\) and JNK3 \(PDB: 4WHZ\)\. Docking analysis indicates that both molecules stably occupy the ATP\-binding pockets of their respective targets and reproduce canonical kinase interaction motifs reported in prior structural studies, demonstrating ATOM’s ability to learn and transfer conserved binding patterns across targets\.

Compound 1 achieved strong predicted affinities for both kinases \(−140\.95 kcal·mol⁻¹ for GSK3β; −115\.64 kcal·mol⁻¹ for JNK3\)\. Its binding mode is characterized by hinge hydrogen bonds to VAL135 \(GSK3β\) and MET149 \(JNK3\), complemented by extensive hydrophobic interactions with VAL70 and LEU132 in GSK3β and ILE70 and VAL78 in JNK3\. In addition, a halogen bond with LEU206 in JNK3 further stabilizes the complex, highlighting ATOM’s capacity to exploit target\-specific auxiliary interactions\.

Compound 2 shows comparable dual\-target potency \(−139\.18 kcal·mol⁻¹ for GSK3β; −118\.42 kcal·mol⁻¹ for JNK3\)\. Binding is driven by conserved hinge interactions with VAL135 \(GSK3β\) and dual hydrogen bonding to MET149 \(JNK3\), together with hydrophobic packing involving ILE62 \(GSK3β\) and VAL78 \(JNK3\)\. Notably, Compound 2 uniquely forms metal\-mediated coordination with a cerium ion in the GSK3β active site, illustrating ATOM’s flexibility in accommodating noncanonical yet energetically favorable interaction patterns\.

Beyond binding affinity, both compounds exhibit balanced drug\-like properties and low synthetic complexity, indicating that ATOM jointly optimizes binding performance and chemical feasibility\. Collectively, these results demonstrate that ATOM can generate dual\-target kinase inhibitors that preserve key structural interaction motifs while satisfying fundamental drug discovery constraints\.

### 6\.2Interpretable Optimization Path

![Refer to caption](https://arxiv.org/html/2606.00008v1/figure/figure3.png)Figure 6: Molecular optimization trajectory with ATOM\.

Figure [6](https://arxiv.org/html/2606.00008#S6.F6) shows a multi\-step optimization trajectory starting from a benzene\-centered scaffold bearing a heteroaryl amine and a polar amino\-alcohol side chain \(initial SMILES: NCC\(O\)c1ccc\(\-c2ccnc\(NC\(C\)c3ccccc3\)n2\)cc1\)\. ATOM applies targeted, structure\-aware edits guided by atomic attribution maps\. Early modifications reorganize the polar side chain into a more rigid, amide\-like motif and tune peripheral substituents to increase hydrogen\-bond directionality, improving predicted activity against both JNK3 and GSK3β while only modestly increasing structural complexity\. Subsequent edits simplify highly basic functionality and introduce a carboxylate to rebalance lipophilicity and reduce overall complexity\. Finally, a JNK3\-specialized agent makes the terminal phenyl ring more electron\-deficient to enhance complementarity in the JNK3 pocket, restoring potency without degrading QED or synthetic accessibility\. This trajectory demonstrates that ATOM attains balanced, explainable multi\-objective optimization via fine\-grained, chemically coherent substituent and functional\-group modifications rather than coarse scaffold replacement\.

## Conclusion

We introduced ATOM, a tree\-structured multi\-agent framework for multi\-objective molecular optimization that formulates molecular design as pathwise coordination over alternative evolution trajectories\. By assigning specialized agents to atomic operations along different branches and integrating a global memory for cross\-path information sharing, ATOM effectively captures conflicting trade\-offs and long\-horizon dependencies in chemical spaces\. Extensive experiments on challenging benchmarks demonstrate consistent improvements in Pareto coverage and hypervolume over strong baselines, highlighting the promise of tree\-structured multi\-agent coordination for complex molecular optimization tasks\.

## Impact Statement

This paper presents work whose goal is to advance the field of machine learning\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.

## References

- J\. Achiam, S\. Adler, S\. Agarwal, L\. Ahmad, I\. Akkaya, F\. L\. Aleman, D\. Almeida, J\. Altenschmidt, S\. Altman, S\. Anadkat,et al\.\(2023\)Gpt\-4 technical report\.arXiv preprint arXiv:2303\.08774\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1.9)\.
- L\. N\. Alegre, A\. L\. Bazzan, D\. M\. Roijers, A\. Nowé, and B\. C\. da Silva \(2023\)Sample\-efficient multi\-objective learning via generalized policy improvement prioritization\.arXiv preprint arXiv:2301\.07784\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p1.5)\.
- J\. Bai, S\. Bai, Y\. Chu, Z\. Cui, K\. Dang, X\. Deng, Y\. Fan, W\. Ge, Y\. Han, F\. Huang,et al\.\(2023\)Qwen technical report\.arXiv preprint arXiv:2309\.16609\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- E\. J\. Bjerrum \(2017\)SMILES enumeration as data augmentation for neural network modeling of molecules\.arXiv preprint arXiv:1703\.07076\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- R\. S\. Bohacek, C\. McMartin, and W\. C\. Guida \(1996\)The art and practice of structure\-based drug design: a molecular modeling perspective\.Medicinal research reviews16\(1\),pp\. 3–50\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- S\. B\. Brahmavar, A\. Srinivasan, T\. Dash, S\. R\. Krishnan, L\. Vig, A\. Roy, and R\. Aduri \(2024\)Generating novel leads for drug discovery using llms with logical feedback\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.38,pp\. 21–29\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- N\. Brown, M\. Fiscato, M\. H\. Segler, and A\. C\. Vaucher \(2019\)GuacaMol: benchmarking models for de novo molecular design\.Journal of Chemical Information and Modeling59\(3\),pp\. 1096–1108\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- M\. De Rycker, B\. Baragaña, S\. L\. Duce, and I\. H\. Gilbert \(2018\)Challenges and recent progress in drug discovery for tropical diseases\.Nature559\(7715\),pp\. 498–506\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1)\.
- Y\. Du, A\. R\. Jamasb, J\. Guo, T\. Fu, C\. Harris, Y\. Wang, C\. Duan, P\. Liò, P\. Schwaller, and T\. L\. Blundell \(2024\)Machine learning\-aided generative molecular design\.Nature Machine Intelligence6\(6\),pp\. 589–604\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al\-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Yang, A\. Fan,et al\.\(2024\)The llama 3 herd of models\.arXiv preprint arXiv:2407\.21783\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- Y\. Fang, X\. Liang, N\. Zhang, K\. Liu, R\. Huang, Z\. Chen, X\. Fan, and H\. Chen \(2023\)Mol\-instructions: a large\-scale biomolecular instruction dataset for large language models\.arXiv preprint arXiv:2306\.08018\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- J\. C\. Fromer and C\. W\. Coley \(2023\)Computer\-aided multi\-objective optimization in small molecule discovery\.Patterns4\(2\)\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- T\. Fu, W\. Gao, C\. Coley, and J\. Sun \(2022\)Reinforced genetic algorithm for structure\-based drug design\.Advances in Neural Information Processing Systems35,pp\. 12325–12338\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- W\. Gao, T\. Fu, J\. Sun, and C\. Coley \(2022\)Sample efficiency matters: a benchmark for practical molecular optimization\.Advances in Neural Information Processing Systems35,pp\. 21342–21357\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- R\. Gómez\-Bombarelli, J\. N\. Wei, D\. Duvenaud, J\. M\. Hernández\-Lobato, B\. Sánchez\-Lengeling, D\. Sheberla, J\. Aguilera\-Iparraguirre, T\. D\. Hirzel, R\. P\. Adams, and A\. Aspuru\-Guzik \(2018\)Automatic chemical design using a data\-driven continuous representation of molecules\.ACS Central Science4\(2\),pp\. 268–276\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- D\. E\. Graff, E\. I\. Shakhnovich, and C\. W\. Coley \(2021\)Accelerating high\-throughput virtual screening through molecular pool\-based active learning\.Chemical Science12\(22\),pp\. 7866–7881\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1)\.
- W\. Gu, J\. Han, H\. Wang, X\. Li, and B\. Cheng \(2025\)Explain\-analyze\-generate: a sequential multi\-agent collaboration method for complex reasoning\.InProceedings of the 31st International Conference on Computational Linguistics,pp\. 7127–7140\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- G\. L\. Guimaraes, B\. Sanchez\-Lengeling, C\. Outeiral, P\. L\. C\. Farias, and A\. Aspuru\-Guzik \(2017\)Objective\-reinforced generative adversarial networks \(organ\) for sequence generation models\.arXiv preprint arXiv:1705\.10843\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- T\. Han, L\. C\. Adams, J\. Papaioannou, P\. Grundmann, T\. Oberhauser, A\. Löser, D\. Truhn, and K\. K\. Bressem \(2023\)MedAlpaca–an open\-source collection of medical conversational ai models and training data\.arXiv preprint arXiv:2304\.08247\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- J\. He, H\. You, E\. Sandström, E\. Nittinger, E\. J\. Bjerrum, C\. Tyrchan, W\. Czechtizky, and O\. Engkvist \(2021\)Molecular optimization by capturing chemist’s intuition using deep neural networks\.Journal of cheminformatics13\(1\),pp\. 26\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1)\.
- S\. C\. Hoffman, V\. Chenthamarakshan, K\. Wadhawan, P\. Chen, and P\. Das \(2022\)Optimizing molecules using efficient queries from property evaluations\.Nature Machine Intelligence4\(1\),pp\. 21–31\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- E\. Hoogeboom, V\. G\. Satorras, C\. Vignac, and M\. Welling \(2022\)Equivariant diffusion for molecule generation in 3d\.InInternational conference on machine learning,pp\. 8867–8887\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- H\. Hsu, Y\. Hsu, L\. Chang, and J\. Yang \(2017\)An integrated approach with new strategies for qsar models and lead optimization\.BMC Genomics18,pp\. 1–9\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1)\.
- J\. J\. Irwin, K\. G\. Tang, J\. Young, C\. Dandarchuluun, B\. R\. Wong, M\. Khurelbaatar, Y\. S\. Moroz, J\. Mayfield, and R\. A\. Sayle \(2020\)ZINC20—a free ultralarge\-scale chemical database for ligand discovery\.Journal of chemical information and modeling60\(12\),pp\. 6065–6073\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p1.7)\.
- J\. H\. Jensen \(2019\)A graph\-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space\.Chemical Science10\(12\),pp\. 3567–3572\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1),[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- C\. Ji, Y\. Zheng, R\. Wang, Y\. Cai, and H\. Wu \(2021\)Graph polish: a novel graph generation paradigm for molecular optimization\.IEEE Transactions on Neural Networks and Learning Systems34\(5\),pp\. 2323–2337\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- W\. Jin, R\. Barzilay, and T\. Jaakkola \(2018\)Junction tree variational autoencoder for molecular graph generation\.InInternational Conference on Machine Learning,pp\. 2323–2332\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- L\. Kocsis and C\. Szepesvári \(2006\)Bandit based monte\-carlo planning\.InEuropean conference on machine learning,pp\. 282–293\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p2.3)\.
- J\. Li, W\. Liu, Z\. Ding, W\. Fan, Y\. Li, and Q\. Li \(2025\)Large language models are in\-context molecule learners\.IEEE Transactions on Knowledge and Data Engineering\.Cited by:[§4\.1](https://arxiv.org/html/2606.00008#S4.SS1.p2.1)\.
- Y\. Li, Z\. Li, K\. Zhang, R\. Dan, S\. Jiang, and Y\. Zhang \(2023\)Chatdoctor: a medical chat model fine\-tuned on a large language model meta\-ai \(llama\) using medical domain knowledge\.Cureus15\(6\)\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- Q\. Liu, J\. Ruan, H\. Li, H\. Zhao, D\. Wang, J\. Chen, W\. Guanglu, X\. Cai, Z\. Zheng, and T\. Xu \(2025a\)AMoPO: adaptive multi\-objective preference optimization without reward models and reference models\.ACL\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- Y\. Liu, J\. Yang, X\. Ren, Z\. Xinyi, Y\. Liu, B\. Song, X\. Zeng, and H\. Ishibuchi \(2025b\)Multi\-objective molecular design through learning latent pareto set\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 19006–19014\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1)\.
- F\. Luo, J\. Zhang, Q\. Wang, and C\. Yang \(2025\)Leveraging prompt engineering in large language models for accelerating chemical research\.ACS Central Science\.Cited by:[§4\.1](https://arxiv.org/html/2606.00008#S4.SS1.p2.1)\.
- R\. Luo, L\. Sun, Y\. Xia, T\. Qin, S\. Zhang, H\. Poon, and T\. Liu \(2022\)BioGPT: generative pre\-trained transformer for biomedical text generation and mining\.Briefings in Bioinformatics23\(6\),pp\. bbac409\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- Ł\. Maziarka, A\. Pocha, J\. Kaczmarczyk, K\. Rataj, T\. Danel, and M\. Warchoł \(2020\)Mol\-cyclegan: a generative model for molecular optimization\.Journal of Cheminformatics12\(1\),pp\. 2\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- R\. Munos et al\.\(2014\)From bandits to monte\-carlo tree search: the optimistic principle applied to optimization and planning\.Foundations and Trends® in Machine Learning7\(1\),pp\. 1–129\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p2.3)\.
- T\. Nguyen and A\. Grover \(2024\)Lico: large language models for in\-context molecular optimization\.arXiv preprint arXiv:2406\.18851\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- A\. Nigam, R\. Pollice, M\. Krenn, G\. dos Passos Gomes, and A\. Aspuru\-Guzik \(2021\)Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery \(stoned\) algorithm for molecules using selfies\.Chemical science12\(20\),pp\. 7079–7090\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- J\. A\. OpenAI, S\. Adler, S\. Agarwal, L\. Ahmad, I\. Akkaya, F\. L\. Aleman, D\. Almeida, J\. Altenschmidt, S\. Altman, S\. Anadkat,et al\.\(2024\)Gpt\-4 technical report\.URL https://arxiv\.org/abs/2303\.08774\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- OpenAI \(2024\)GPT\-4o mini\.Note:[https://openai\.com](https://openai.com/)Accessed: 2024Cited by:[§4\.1](https://arxiv.org/html/2606.00008#S4.SS1.p1.1)\.
- P\. G\. Polishchuk, T\. I\. Madzhidov, and A\. Varnek \(2013\)Estimation of the size of drug\-like chemical space based on gdb\-17 data\.Journal of computer\-aided molecular design27\(8\),pp\. 675–679\.Cited by:[§4\.2](https://arxiv.org/html/2606.00008#S4.SS2.p1.2)\.
- A\. Schneuing, C\. Harris, Y\. Du, K\. Didi, A\. Jamasb, I\. Igashov, W\. Du, C\. Gomes, T\. L\. Blundell, P\. Lio,et al\.\(2024\)Structure\-based drug design with equivariant diffusion models\.Nature Computational Science4\(12\),pp\. 899–909\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- C\. Shi, M\. Xu, Z\. Zhu, W\. Zhang, M\. Zhang, and J\. Tang \(2020\)Graphaf: a flow\-based autoregressive model for molecular graph generation\.arXiv preprint arXiv:2001\.09382\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- D\. Silver, J\. Schrittwieser, K\. Simonyan, I\. Antonoglou, A\. Huang, A\. Guez, T\. Hubert, L\. Baker, M\. Lai, A\. Bolton,et al\.\(2017\)Mastering the game of go without human knowledge\.nature550\(7676\),pp\. 354–359\.Cited by:[§4\.2](https://arxiv.org/html/2606.00008#S4.SS2.p1.2)\.
- T\. Southiratn, B\. Koo, Y\. Lu, and S\. Kim \(2025\)CombiMOTS: combinatorial multi\-objective tree search for dual\-target molecule generation\.InForty\-second International Conference on Machine Learning,Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- D\. Stumpfe and J\. Bajorath \(2012\)Exploring activity cliffs in medicinal chemistry: miniperspective\.Journal of medicinal chemistry55\(7\),pp\. 2932–2942\.Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- T\. Suzuki, D\. Ma, N\. Yasuo, and M\. Sekijima \(2024\)Mothra: multiobjective de novo molecular generation using monte carlo tree search\.Journal of Chemical Information and Modeling64\(19\),pp\. 7291–7302\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p1.6)\.
- A\. Tripp, G\. N\. Simm, and J\. M\. Hernández\-Lobato \(2021\)A fresh look at de novo molecular design benchmarks\.InNeurIPS 2021 AI for Science Workshop,Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- K\. Van Moffaert, M\. M\. Drugan, and A\. Nowé \(2013\)Scalarized multi\-objective reinforcement learning: novel design techniques\.In2013 IEEE symposium on adaptive dynamic programming and reinforcement learning \(ADPRL\),pp\. 191–199\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p1.5)\.
- J\. Verhellen \(2022\)Graph\-based molecular pareto optimisation\.Chemical Science13\(25\),pp\. 7526–7535\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- H\. Wang, M\. Skreta, C\. Ser, W\. Gao, L\. Kong, F\. Strieth\-Kalthoff, C\. Duan, Y\. Zhuang, Y\. Yu, Y\. Zhu,et al\.\(2024\)Efficient evolutionary search over chemical space with large language models\.arXiv preprint arXiv:2406\.16976\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- R\. Wang, M\. Yang, and Y\. Shen \(2025\)Bridging molecular graphs and large language models\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 21234–21242\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- Z\. Wang, W\. Nie, Z\. Qiao, C\. Xiao, R\. Baraniuk, and A\. Anandkumar \(2022\)Retrieval\-based controllable molecule generation\.arXiv preprint arXiv:2208\.11126\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p3.3)\.
- C\. Wu, W\. Lin, X\. Zhang, Y\. Zhang, W\. Xie, and Y\. Wang \(2024\)PMC\-llama: toward building open\-source language models for medicine\.Journal of the American Medical Informatics Association,pp\. ocae045\.Cited by:[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1)\.
- X\. Xia, Y\. Liu, C\. Zheng, X\. Zhang, Q\. Wu, X\. Gao, X\. Zeng, and Y\. Su \(2024\)Evolutionary multiobjective molecule optimization in an implicit chemical space\.Journal of Chemical Information and Modeling64\(13\),pp\. 5161–5174\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- Y\. Xie, C\. Shi, H\. Zhou, Y\. Yang, W\. Zhang, Y\. Yu, and L\. Li \(2021\)Mars: markov molecular sampling for multi\-objective drug discovery\.arXiv preprint arXiv:2103\.10432\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- C\. Yang, X\. Wang, Y\. Lu, H\. Liu, Q\. V\. Le, D\. Zhou, and X\. Chen \(2023\)Large language models as optimizers\.InThe Twelfth International Conference on Learning Representations,Cited by:[§2\.1](https://arxiv.org/html/2606.00008#S2.SS1.p1.1)\.
- Y\. Yang, G\. Chen, J\. Li, J\. Li, O\. Zhang, X\. Zhang, L\. Li, J\. Hao, E\. Wang, and P\. Heng \(2024\)Enabling target\-aware molecule generation to follow multi objectives with pareto mcts\.Communications Biology7\(1\),pp\. 1074\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p1.1),[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- J\. Yasonik \(2020\)Multiobjective de novo drug design with recurrent neural networks and nondominated sorting\.Journal of Cheminformatics12\(1\),pp\. 14\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p2.1)\.
- G\. Ye, X\. Cai, H\. Lai, X\. Wang, J\. Huang, L\. Wang, W\. Liu, and X\. Zeng \(2025\)Drugassist: a large language model for molecule optimization\.Briefings in Bioinformatics26\(1\),pp\. bbae693\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.00008#S2.SS2.p1.1),[§5](https://arxiv.org/html/2606.00008#S5.p3.1)\.
- J\. Yu, Y\. Zheng, H\. Y\. Koh, S\. Pan, T\. Wang, and H\. Wang \(2025\)Collaborative expert llms guided multi\-objective molecular optimization\.arXiv preprint arXiv:2503\.03503\.Cited by:[§1](https://arxiv.org/html/2606.00008#S1.p3.1)\.
- P\. Zhang, X\. Peng, R\. Han, T\. Chen, and J\. Ma \(2025\)Rag2Mol: structure\-based drug design based on retrieval augmented generation\.Briefings in Bioinformatics26\(3\)\.Cited by:[Appendix F](https://arxiv.org/html/2606.00008#A6.p3.3)\.
- E\. Zitzler, L\. Thiele, M\. Laumanns, C\. M\. Fonseca, and V\. G\. Da Fonseca \(2003\)Performance assessment of multiobjective optimizers: an analysis and review\.IEEE Transactions on evolutionary computation7\(2\),pp\. 117–132\.Cited by:[§5](https://arxiv.org/html/2606.00008#S5.p2.1)\.

## Appendix AExpert Prompt Design

This section provides a detailed description of the prompt design templates for each expert agent in the ATOM framework, as well as the auxiliary tool components employed during optimization\. When constructing prompts, we closely follow the example formats provided in the original open\-source implementations to ensure that the large language model \(LLM\) can reliably interpret task instructions and execute the specified operations correctly\. To ensure stable system behavior and facilitate downstream parsing, the output format of each expert agent is strictly standardized\.

Figures 7, 8, and 9 present the prompt templates used for the GSK3β expert, the JNK3 expert, and the synthetic accessibility \(SA\) expert, respectively\. For the SA property, we apply a normalization procedure to rescale its value range to \[0,1\] and reformulate the original minimization objective as a maximization problem\. Specifically, an SA value closer to 1 indicates higher synthetic accessibility, whereas values closer to 0 correspond to increased synthetic difficulty\.

The ATOM framework integrates two primary tool components: RDKit and Oracle\. RDKit is a widely used open\-source cheminformatics library that supports molecular representation, manipulation, and analysis across multiple formats, including SMILES, SMARTS, and SDF\. It serves as the core backend for molecular construction and property computation during the optimization process\. The Oracle library provides a unified interface for molecular property evaluation and similarity assessment, enabling efficient screening of candidate molecules based on predicted properties and structural similarity\. Together, these tools support reliable and scalable molecular evaluation within the ATOM framework\.

GSK3β Expert Prompt

Role Description: You are an expert computational chemist and molecular optimization specialist\. You have extensive knowledge of medicinal chemistry, SMILES notation, and structure–activity relationships \(SAR\)\. Your task is to design novel molecules that show improved GSK3beta inhibitory activity while maintaining good drug\-likeness and chemical validity\. Your goal is to optimize these molecules to generate 50 new compounds with improved GSK3beta inhibition activity\.

Please follow these guidelines carefully:

- Input: A list of starting SMILES strings \(example below\)\.
- Objective: Increase the predicted or expected GSK3beta inhibitory activity\.
- Output: Generate exactly 50 valid SMILES strings representing new molecules\.
- Constraints: The generated molecules should retain the core scaffold or key pharmacophores of the starting compounds \(do not generate entirely unrelated structures\)\. Ensure synthetic accessibility \(reasonable SA scores\) and good drug\-likeness \(high QED or Lipinski compliance\)\. Each SMILES must be syntactically valid and parseable by RDKit\.
- Optimization strategies: \{Adjusted according to reference molecular dynamics\}
- reference molecular: \{Select from parent node, sibling nodes, and dynamic knowledge base\}
- Output Requirements: Output the 50 optimized SMILES only, one per line, numbered from 1 to 50\. Do not include explanations, commentary, or non\-SMILES text\.

Figure 7: Prompt design template for the GSK3β expert agent\.

JNK3 Expert Prompt

Role Description: You are an expert computational chemist and molecular optimization specialist\. You have extensive knowledge of medicinal chemistry, SMILES notation, and structure–activity relationships \(SAR\)\. Your task is to design novel molecules that show improved JNK3 inhibitory activity while maintaining good drug\-likeness and chemical validity\. Your goal is to optimize these molecules to generate 50 new compounds with improved JNK3 inhibition activity\.

Please follow these guidelines carefully:

- Input: A list of starting SMILES strings \(example below\)\.
- Objective: Increase the predicted or expected JNK3 inhibitory activity\.
- Output: Generate exactly 50 valid SMILES strings representing new molecules\.
- Constraints: The generated molecules should retain the core scaffold or key pharmacophores of the starting compounds \(do not generate entirely unrelated structures\)\. Ensure synthetic accessibility \(reasonable SA scores\) and good drug\-likeness \(high QED or Lipinski compliance\)\. Each SMILES must be syntactically valid and parseable by RDKit\.
- Optimization strategies: \{Adjusted according to reference molecular dynamics\}
- reference molecular: \{Select from parent node, sibling nodes, and dynamic knowledge base\}
- Output Requirements: Output the 50 optimized SMILES only, one per line, numbered from 1 to 50\. Do not include explanations, commentary, or non\-SMILES text\.

SA Expert Prompt

Role Description: You are an expert computational chemist and molecular optimization specialist\. You have extensive knowledge of medicinal chemistry, SMILES notation, synthesis planning, and structure–activity relationships \(SAR\)\. Your task is to design novel molecules that improve synthetic accessibility \(lower SA score\) while maintaining stable JNK3 and GSK3beta activities and preserving the drug\-like properties\.

Please follow these guidelines carefully:

- Input: A list of starting SMILES strings \(example below\)\.
- Objective: Decrease the predicted SA score \(make the molecule easier to synthesize\)\.
- Output: Generate exactly 50 valid SMILES strings representing new molecules\.
- Constraints:
  - Exactly 50 output SMILES\.
  - Each SMILES must be valid and RDKit\-parseable\.
  - Modifications should be mild and chemically meaningful: simplify overly complex ring linkages, replace exotic fragments with classical synthetic handles, reduce steric congestion or macrocyclic motifs, use robust bioisosteric substitutions, avoid exotic heterocycles or unusual protecting\-group\-like fragments\.
- Optimization strategies: \{Adjusted according to reference molecular dynamics\}
- reference molecular: \{Select from parent node, sibling nodes, and dynamic knowledge base\}
- Output Requirements: Output the 50 optimized SMILES only, one per line, numbered from 1 to 50\. Do not include explanations, commentary, or non\-SMILES text\.

Figure 8: Prompt design template for the SA expert agent\.

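
Because every expert prompt requests exactly 50 SMILES, one per line and numbered from 1 to 50, the downstream parsing step can be sketched as follows. The exact numbering punctuation that the LLM emits is not specified in the paper, so the accepted forms (`1.`, `1)`, `1:`, or a bare `1`, each followed by whitespace) and the function name `parse_numbered_smiles` are assumptions made for illustration.

```python
import re
from typing import List

# Assumed line shape: an index, optional punctuation, whitespace, then one SMILES token.
_LINE = re.compile(r"^\s*(\d+)[.):]?\s+(\S+)\s*$")


def parse_numbered_smiles(text: str, expected: int = 50) -> List[str]:
    """Parse a numbered, one-per-line SMILES list and enforce the expected count."""
    smiles: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"not a numbered SMILES line: {line!r}")
        index, token = int(match.group(1)), match.group(2)
        if index != len(smiles) + 1:
            raise ValueError(f"expected index {len(smiles) + 1}, got {index}")
        smiles.append(token)
    if len(smiles) != expected:
        raise ValueError(f"expected {expected} SMILES, got {len(smiles)}")
    return smiles
```

Rejecting any line that does not match makes commentary or missing entries fail loudly instead of silently shrinking the candidate pool. Chemical validity is a separate check; with RDKit installed, each token could then be passed to `Chem.MolFromSmiles`, which returns `None` for unparseable input.
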
## Appendix B Introduction to Related Properties

We consider the following molecular objectives used throughout this work: inhibition potency against GSK3β and JNK3, and two widely used drug-likeness / synthesizability metrics (QED and SA).

##### GSK3β.

Glycogen synthase kinase 3 beta (GSK3β) is a serine/threonine protein kinase involved in glycogen metabolism, cell proliferation, differentiation, and apoptosis. It has attracted substantial interest in neurodegenerative disease research because it regulates tau phosphorylation, amyloid precursor protein processing, and neuronal survival. Consequently, GSK3β inhibition is considered an important molecular objective in drug discovery targeting neurodegeneration.

##### JNK3\.

c\-Jun N\-terminal kinase 3 \(JNK3\) is a member of the MAPK family primarily expressed in the central nervous system\. JNK3 mediates cellular stress responses including apoptosis and inflammation; inhibiting JNK3 is therefore a relevant objective for mitigating neuronal cell death and inflammatory processes in neurodegenerative disease models\.

##### QED\.

Quantitative Estimate of Drug-likeness (QED) aggregates several physicochemical properties (molecular weight, lipophilicity/logP, topological polar surface area, counts of H-bond donors/acceptors, aromatic ring count, and rotatable bonds) into a single score on the interval $[0,1]$. Higher QED values indicate molecules that are more likely to exhibit favorable drug-like properties.

##### SA and normalized SA\.

Synthetic Accessibility (SA) is a heuristic score used to estimate the ease of laboratory synthesis; SA typically ranges from 1 (easy) to 10 (difficult). To treat synthetic feasibility as a maximization objective, we follow prior work and convert SA to a normalized score in $[0,1]$ via

$$\mathrm{Normalized\_SA} \;=\; 1 - \frac{\mathrm{SA} - 1}{9}, \tag{19}$$

so that larger values correspond to higher synthetic accessibility.
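As a quick illustration of Eq. (19), the sketch below maps a raw SA score to the normalized value. The function name `normalized_sa` and the range check are assumptions added for illustration; the raw SA score itself is taken as given.

```python
def normalized_sa(sa: float) -> float:
    """Map a raw SA score in [1, 10] to [0, 1] following Eq. (19).

    Larger return values mean the molecule is easier to synthesize.
    The range check is an added safeguard, not part of the equation.
    """
    if not 1.0 <= sa <= 10.0:
        raise ValueError(f"SA score must lie in [1, 10], got {sa}")
    return 1.0 - (sa - 1.0) / 9.0
```

For instance, the easiest score SA = 1 maps to 1, the hardest score SA = 10 maps to 0, and the midpoint SA = 5.5 maps to 0.5.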

## Appendix C Details of Correlation Calculation Between Different Objectives

To characterize the relationships among the molecular objectives considered in this work, we conducted a correlation analysis on a random subset of molecules sampled from the ZINC20 database. Specifically, we evaluated the pairwise dependencies between drug-likeness (QED), synthetic accessibility (SA), and predicted binding affinities to JNK3 and GSK3β.

For each molecule, QED and SA scores were computed using RDKit, while JNK3 and GSK3β scores were obtained from the corresponding oracle models. We then computed pairwise Spearman rank correlation coefficients, which capture monotonic relationships without assuming linearity and are robust to non-Gaussian score distributions commonly observed in molecular property spaces.
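To make the rank-correlation step concrete, here is a minimal standard-library sketch: Spearman's coefficient is the Pearson correlation of the rank-transformed scores, with tied values given their average rank. The function names are illustrative assumptions; in practice a library routine such as SciPy's `spearmanr` would serve the same purpose.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the computation, any strictly increasing transformation of a score (for example, the SA normalization, up to its sign flip) leaves the magnitude of the coefficient unchanged.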

Figure [1](https://arxiv.org/html/2606.00008#S1.F1) presents the Pearson correlation coefficients between optimization objectives. We observe that QED and SA are nearly decoupled ($r = 0.051$), while the two kinase objectives (GSK3β and JNK3) show a moderate positive correlation ($r = 0.351$), likely reflecting shared binding motifs. The consistently weak correlations between biological activity and chemical feasibility metrics ($|r| \leq 0.220$) suggest that improvements in binding affinity do not inherently translate to superior drug-likeness, confirming the necessity of a multi-objective optimization approach.

These results highlight the intrinsic multi\-objective nature of the optimization problem and motivate the use of explicit multi\-objective optimization strategies in this work\.

## Appendix D Performance on Top-ranked Candidate Molecules

We further evaluate method effectiveness by reporting top-10 and top-50 average scores for each objective, together with the overall average score and average rank (Tables 2 and 3). For the biological objective pair (GSK3β + JNK3), ATOM achieves the best overall ranking (average rank = 1.0), with consistently strong top-10 and top-50 scores on both targets (GSK3β: 0.953 / 0.943; JNK3: 0.792 / 0.754). In contrast, while STONED performs well on GSK3β (top-50: 0.897), its performance on JNK3 is substantially lower (average: 0.342), indicating difficulty in jointly optimizing correlated bioactivity objectives. These results highlight ATOM's advantage in coordinating expert agents under interdependent biological constraints.

For the non-biological task (QED + SA), multiple methods achieve competitive results. ATOM attains the highest average QED score (0.794) and SA score (0.819). SMILES GA slightly outperforms ATOM on top-10 SA (0.997) but lags behind on QED. Overall, these results suggest that while some baselines excel at individual non-biological objectives, ATOM maintains more balanced performance across objectives, particularly in challenging biological settings.
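The top-10 and top-50 averages discussed in this appendix can be computed per objective as in the minimal sketch below. The function name `top_k_mean` is an illustrative assumption, and the sketch assumes that higher scores are better for every objective (as holds for the normalized SA score).

```python
from typing import Sequence


def top_k_mean(scores: Sequence[float], k: int) -> float:
    """Average of the k highest scores (higher is assumed to be better)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(scores) < k:
        raise ValueError(f"need at least {k} scores, got {len(scores)}")
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / k
```

For a method's generated library, one would call `top_k_mean(scores, 10)` and `top_k_mean(scores, 50)` separately for each objective's oracle scores.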

Table 2: Comparison of GSK3β and JNK3 performance across different methods.

Table 3: Comparison of QED and SA performance across different methods.

## Appendix E Diversity and Novelty Analysis

In Table 4, we present the diversity (Div) and novelty (Nov) scores of molecular libraries generated by different methods under six multi-objective optimization settings. ATOM demonstrates consistently strong and balanced performance across nearly all experimental configurations. Notably, ATOM achieves exceptionally high novelty scores, reaching 0.9994 or above in most settings and approaching 0.9999 in several cases. This substantially outperforms many conventional generative models (e.g., SMILES LSTM, GB PO, DrugAssist) and several reinforcement-learning or evolutionary algorithm baselines, indicating that ATOM is highly effective at exploring previously under-sampled regions of chemical space and generating structurally novel molecules.

At the same time, ATOM maintains competitive diversity scores, typically falling in the 0.75–0.85 range across most settings. In certain combinations (e.g., QED+SA, JNK3+QED+SA), its diversity is comparable to or even superior to some strong baselines, showing that it can preserve reasonable molecular diversity while satisfying drug-likeness and synthesizability constraints. Overall, ATOM achieves a favorable trade-off between diversity and novelty in multi-objective molecule generation. Its ability to deliver extremely high novelty while still maintaining acceptable diversity makes it particularly promising for drug discovery scenarios that require both exploration of novel chemical matter and adherence to desired property distributions.
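This section does not spell out how Div and Nov are computed, so the following sketch uses one common convention as an explicit assumption: diversity as one minus the mean pairwise Tanimoto similarity of molecular fingerprints, and novelty as the fraction of generated molecules absent from a reference set. Fingerprints are represented here as plain sets of on-bit indices (a hypothetical stand-in for real fingerprints), and the novelty comparison assumes SMILES strings have already been canonicalized.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Assumption: two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def diversity(fingerprints: Sequence[FrozenSet[int]]) -> float:
    """One minus the mean pairwise Tanimoto similarity (assumed definition)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # a library with fewer than two molecules has no spread
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def novelty(generated: Sequence[str], reference: Iterable[str]) -> float:
    """Fraction of generated SMILES not found in the reference set."""
    if not generated:
        return 0.0
    known = set(reference)
    return sum(1 for smi in generated if smi not in known) / len(generated)
```

Under these assumed definitions, a novelty near 0.9999 would mean that almost none of the generated molecules appear in the reference set.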

Table 4: Diversity (Div) and Novelty (Nov) of generated molecules under different multi-objective settings. All values are rounded to 4 decimal places.

## Appendix F Assumption of Theoretical Analysis

Assumption 1 (Orthogonal Descent Capability). For any state $s$ that is a local optimum for objective $i$ (where $\nabla f_{i}(s) \approx 0$) but not Pareto optimal, there exists at least one agent $Ag_{j}$ ($j \neq i$) such that the expected improvement in the joint value function is lower-bounded (Van Moffaert et al., [2013](https://arxiv.org/html/2606.00008#bib.bib21); Alegre et al., [2023](https://arxiv.org/html/2606.00008#bib.bib22)):

$$\mathbb{E}\left[V(T(s, Ag_{j})) - V(s)\right] \geq \delta > 0. \tag{20}$$

This implies that the agents provide "orthogonal" gradient directions to escape single-objective local optima (Suzuki et al., [2024](https://arxiv.org/html/2606.00008#bib.bib20)).

Assumption 2 (Optimistic Pruning Condition). The exploration bonus $c_{d}$ is calibrated such that with high probability $1-\beta$, the true value $V(s)$ is bounded by the optimistic estimate (Munos et al., [2014](https://arxiv.org/html/2606.00008#bib.bib25); Kocsis and Szepesvári, [2006](https://arxiv.org/html/2606.00008#bib.bib26)):

$$|V(s) - U(s)| \leq c_{d}. \tag{21}$$

Furthermore, the noise $\sigma_{d}$ (and thus $c_{d}$) decays as a function of depth $d$ (e.g., $\sigma_{d} \propto d^{-\gamma}$ or $e^{-\alpha d}$), reflecting that evaluations become more certain closer to leaf nodes (refined molecules).
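To illustrate Assumption 2, the sketch below implements the two example decay schedules for the bonus $c_d$ and the resulting interval $[U(s) - c_d,\, U(s) + c_d]$ that contains $V(s)$ with probability $1-\beta$. The constant `c0`, the default rates, and in particular the pruning rule in `can_prune` are hypothetical choices made for illustration; this section does not specify them.

```python
import math
from typing import Tuple


def exploration_bonus(depth: int, c0: float = 1.0,
                      schedule: str = "power", rate: float = 0.5) -> float:
    """Depth-dependent bonus c_d using one of the two example decays.

    "power": c_d = c0 * d^(-rate)      (rate plays the role of gamma)
    "exp":   c_d = c0 * exp(-rate * d) (rate plays the role of alpha)
    c0 and the default rate are illustrative assumptions.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if schedule == "power":
        return c0 * depth ** (-rate)
    if schedule == "exp":
        return c0 * math.exp(-rate * depth)
    raise ValueError(f"unknown schedule: {schedule!r}")


def value_interval(u: float, depth: int, **kwargs) -> Tuple[float, float]:
    """Interval [U - c_d, U + c_d] implied by Eq. (21) for estimate U."""
    c = exploration_bonus(depth, **kwargs)
    return u - c, u + c


def can_prune(u: float, depth: int, best_lower: float, **kwargs) -> bool:
    """Hypothetical rule: prune a node whose optimistic upper bound is
    still below the best lower bound found elsewhere in the tree."""
    _, upper = value_interval(u, depth, **kwargs)
    return upper < best_lower
```

Because $c_d$ shrinks with depth, the same estimate `u` that is too uncertain to prune near the root can become prunable at a deeper node, which matches the intuition that evaluations are more reliable for refined molecules.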

Assumption 3 (Knowledge Base Guidance). The Dynamic Knowledge Base $\mathcal{K}$ reduces the effective search horizon (Zhang et al., [2025](https://arxiv.org/html/2606.00008#bib.bib23); Wang et al., [2022](https://arxiv.org/html/2606.00008#bib.bib24)). For a target property profile requiring $L$ edit steps from $s_{0}$, the conditional generation probability using KB prompts satisfies:

$$\mathbb{P}(\text{target in } L_{KB} \text{ steps} \mid \mathcal{K}) \geq \mathbb{P}(\text{target in } L \text{ steps} \mid \emptyset), \tag{22}$$

where $L_{KB} < L$. This effectively contracts the required search depth for complex objectives.

## Appendix G Baseline Details

In this work, we compare our method against several established baselines on molecular optimization tasks\. The implementations used are listed below, prioritizing official code repositories or standardized benchmark suites wherever available\.

- SMILES LSTM: A sequence-based LSTM model operating on SMILES strings, typically combined with hill-climbing or reinforcement learning strategies for property optimization. We use the official implementation from the ChemLactica Test Suite: [https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt](https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt) (specifically `smiles_lstm_hc`).
- SMILES GA: A genetic algorithm operating directly on SMILES strings, performing crossover and mutation in string space. We use the official implementation from the ChemLactica Test Suite: [https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt](https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt) (specifically `smiles_ga`).
- Graph GA: A genetic algorithm that operates on molecular graph representations, enabling more chemically meaningful edit operations. We use the official implementation from the ChemLactica Test Suite: [https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt](https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt) (specifically `graph_ga`).
- STONED: A fragment-based genetic algorithm using SELFIES representations for efficient local exploration of chemical space. We use the official implementation from the ChemLactica Test Suite: [https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt](https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt) (specifically `stoned`).
- GB PO: A graph-based or Bayesian optimization method for property-guided molecular improvement (as implemented in the benchmark suite, closely aligned with GPBO-style approaches). We use the official implementation from the ChemLactica Test Suite: [https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt](https://github.com/YerevaNN/ChemLacticaTestSuite/tree/master/mol_opt).
- DrugAssist: A large language model-based approach for molecule optimization guided by natural language prompts. We use the official code repository: [https://github.com/blazerye/DrugAssist](https://github.com/blazerye/DrugAssist).
- EAG: A sequential multi-agent collaboration method for complex reasoning (details based on the original description). Since no official open-source implementation is available, we re-implemented the core components based on the method's published algorithmic description.