Reading the Cell, Designing the Cure: Perturbation-Conditioned Molecular Diffusion for Function-Oriented Drug Design

arXiv cs.LG 05/18/26, 04:00 AM Papers
Summary
This paper formalizes transcriptome-based drug design (TBDD) as a generative inverse problem and proposes CURE, a multi-resolution transcriptome-guided diffusion framework that generates drug molecules conditioned on desired transcriptomic state transitions.
arXiv:2605.15243v1 Announce Type: new
Cached at: 05/18/26, 06:38 AM
# Reading the Cell, Designing the Cure: Perturbation-Conditioned Molecular Diffusion for Function-Oriented Drug Design
Source: [https://arxiv.org/html/2605.15243](https://arxiv.org/html/2605.15243)
###### Abstract

When reliable target structures are unavailable at scale or phenotypes arise from dysregulated pathways, transcriptomic perturbations provide a system-level functional readout for drug action. In this work, we formalize *Transcriptome-based Drug Design (TBDD)* as a generative inverse problem: designing drug molecules conditioned on desired transcriptomic state transitions. We analyze the inherently ill-posed nature of this task, which is further complicated by the profound domain gap between biology and chemistry and by the sparsity of transcriptomic signals. To address these challenges, we propose CURE (A CellUlar Response Engine), a multi-resolution transcriptome-guided diffusion framework. CURE features a specialized Transcriptome Perturbation Functional Feature Extractor (TFE) that (1) distills function-oriented perturbation embeddings from pre/post states, (2) aligns these signatures to dual chemical views to bridge the cross-modal gap, and (3) performs heterogeneity-aware aggregation to extract robust state-specific signals from noisy transcriptomic data. Extensive evaluations on both standard benchmarks and rigorous out-of-distribution protocols demonstrate that CURE consistently outperforms strong baselines in structural quality and functional consistency. Furthermore, we validate its practical utility via a zero-shot gene-inhibitor design task, highlighting the potential of phenotype-driven generative discovery.

Transcriptome\-Guided Drug Design, Single\-Cell Transcriptomics, Molecular Generation, Graph Diffusion Models

![Figure 1](https://arxiv.org/html/2605.15243v1/x1.png)

Figure 1: Performance overview across diverse evaluation metrics. CURE achieves strong overall performance across structural metrics and function-consistency proxies in both in-distribution and out-of-distribution settings.

![Figure 2](https://arxiv.org/html/2605.15243v1/x2.png)

Figure 2: Schematic illustration of TBDD and its relationships with existing paradigms. TBDD is the *reverse* (design) direction complementary to *Perturbation Prediction*, and serves as a function-oriented complement to SBDD.

## 1 Introduction

Drug discovery remains a costly and failure-prone process (Sadybekov and Katritch, [2023](https://arxiv.org/html/2605.15243#bib.bib20)). While computational pipelines have long aimed to accelerate this process, the field has been dominated by *Structure-Based Drug Design (SBDD)* (Bai et al., [2024](https://arxiv.org/html/2605.15243#bib.bib36); Saini et al., [2025](https://arxiv.org/html/2605.15243#bib.bib21)). SBDD relies on the lock-and-key principle, using three-dimensional (3D) protein target structures to design high-affinity ligands. However, this reductionist paradigm faces inherent bottlenecks: it falters when target structures are unknown (e.g., intrinsically disordered proteins) or when disease phenotypes emerge from dysregulated multi-pathway networks rather than from a single actionable target (Munson et al., [2024](https://arxiv.org/html/2605.15243#bib.bib48)). Consequently, there is an urgent need for a complementary, *function-oriented* design paradigm that can bypass explicit target structural constraints and directly address cellular phenotypic shifts.

Transcriptomic perturbation signatures offer precisely this functional blueprint. Unlike static structural data, the transition from a pre-perturbation state to a post-perturbation state (i.e., $\mathbf{T}_{\mathrm{pre}} \to \mathbf{T}_{\mathrm{post}}$) captures the global functional impact of a molecule on a cellular system (Bunne et al., [2024](https://arxiv.org/html/2605.15243#bib.bib25); Ji et al., [2021](https://arxiv.org/html/2605.15243#bib.bib8)). This differential profile integrates pathway-level interactions and network effects, and thereby reflects the *molecule's mechanism of action (MoA)*. Despite the richness of these data, existing transcriptomics-driven machine learning methods predominantly address the *forward* problem: *predicting cellular responses to known compounds* (Hsieh et al., [2023](https://arxiv.org/html/2605.15243#bib.bib43); Wei et al., [2022](https://arxiv.org/html/2605.15243#bib.bib45)). This asymmetry leaves much of the potential of perturbation data untapped. We argue that, to truly complement SBDD, we must invert this workflow, using phenotypic signatures not as prediction targets but as *generative conditions* that guide the design of molecules inducing desired functional states (Figure [2](https://arxiv.org/html/2605.15243#S0.F2)).

To this end, we focus on *Transcriptome-based Drug Design (TBDD)*. Although preliminary explorations have touched upon this question, the field lacks a rigorous problem formulation and a systematic evaluation framework. We formalize TBDD as an inverse problem: given a target functional transition $(\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}})$ representing a therapeutic goal, the objective is to learn a conditional generator $p(\mathbf{G} \mid \mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}})$ over drug molecules. This setting is (i) *orthogonal* to SBDD, conditioning on functional outcomes rather than physical constraints, and (ii) *inverse* to perturbation prediction. Crucially, TBDD is inherently ill-posed: transcriptomes encode functional effects rather than a unique atomic blueprint, and many distinct structures can yield similar signatures. We embrace this reality with a distributional view: instead of seeking a unique inverse, we aim to sample diverse, *functionally consistent candidates*.

Despite its promise, TBDD is difficult in practice for three reasons. (1) Cross-modality domain gap: transcriptomic profiles and molecular graphs differ fundamentally in information density and inductive biases, making naïve direct conditioning unstable (Xiao et al., [2024](https://arxiv.org/html/2605.15243#bib.bib46); Zhou et al., [2025](https://arxiv.org/html/2605.15243#bib.bib47)). (2) Sparse, noisy single-cell signals: single-cell RNA-seq offers access to heterogeneous drug responses, but dropout, batch effects, and high-dimensional noise make conditioning brittle. Meanwhile, compatibility with bulk transcriptomics is essential to exploit vast, high-value legacy datasets (Hafemeister and Halbritter, [2023](https://arxiv.org/html/2605.15243#bib.bib35); Van de Sande et al., [2023](https://arxiv.org/html/2605.15243#bib.bib42)). (3) Evaluation under limited ground truth: large-scale wet-lab validation is expensive, so evaluation requires careful proxy metrics, strong retrieval baselines, and audit-friendly split protocols to mitigate leakage and memorization concerns.

To address these challenges, we present CURE (A CellUlar Response Engine for Transcriptome-based Drug Design), a multi-resolution transcriptome-guided diffusion framework for *de novo* molecular generation. CURE introduces a Transcriptome Perturbation Functional Feature Extractor (TFE) that (i) distills a function-oriented perturbation embedding via the Bidirectional Transcriptome Perturbation Signal Interaction module (TFE-I) and maps it into *dual-view aligned chemical domains* (graph-topology and fingerprint views) through the Dual-View Molecular Domain Alignment module (TFE-A); and (ii) leverages sparse scRNA-seq data with the Heterogeneity-Aware Transcriptome Aggregation module (TFE-H) to suppress technical noise while preserving subpopulation variation. Finally, CURE employs a Graph Diffusion Transformer as the generative backbone, which iteratively reconstructs molecular graphs by conditioning on the extracted perturbation representations via Adaptive Layer Normalization (AdaLN).

Across multiple datasets and evaluation axes (distributional quality, structural sanity/diversity, and function-consistency proxies assessed by independent perturbation estimators), CURE outperforms strong baselines on most metrics (Figure [1](https://arxiv.org/html/2605.15243#S0.F1)). We further showcase a zero-shot gene-inhibitor design scenario, illustrating the practical utility of transcriptome-guided generation. Our contributions are as follows:

- We **formalize the task** of TBDD and provide a **systematic analysis** of its unique challenges.
- We propose **CURE**, a multi-resolution diffusion framework that enables robust conditioning by aligning functional signals with chemical domains and suppressing noise in sparse transcriptomic data.
- We design a **comprehensive evaluation suite** incorporating rigorous out-of-distribution and zero-shot protocols, demonstrating CURE's consistent superiority over baselines in both structural and functional metrics.

#### Conflict of Interest Disclosure\.

The authors declare no financial conflicts of interest related to the work presented in this paper\.

## 2 Related Work

**Machine-Learning-Based Molecular Design.** Deep molecular design has evolved from SMILES sequence models to graph-based approaches that preserve molecular topology (Wang et al., [2025](https://arxiv.org/html/2605.15243#bib.bib14); Gómez-Bombarelli et al., [2018](https://arxiv.org/html/2605.15243#bib.bib49); Hu et al., [2025](https://arxiv.org/html/2605.15243#bib.bib57)). Hierarchical generators (Jin et al., [2020](https://arxiv.org/html/2605.15243#bib.bib2); You et al., [2024](https://arxiv.org/html/2605.15243#bib.bib28); Weller and Rohs, [2024](https://arxiv.org/html/2605.15243#bib.bib40)) efficiently construct large molecules in a coarse-to-fine manner. Yet unconditional generation cannot be steered toward specific drug-design goals. Transformer-based graph diffusion models (Liu et al., [2024](https://arxiv.org/html/2605.15243#bib.bib1); Peng et al., [2023](https://arxiv.org/html/2605.15243#bib.bib15); Hoogeboom et al., [2022](https://arxiv.org/html/2605.15243#bib.bib16); Schneuing et al., [2024](https://arxiv.org/html/2605.15243#bib.bib26)) enable multi-conditional generation via mechanisms such as AdaLN that inject external signals. *Structure-based drug design* (SBDD) remains a classical conditional paradigm that uses 3D pocket structures to guide ligand generation (Alakhdar et al., [2024](https://arxiv.org/html/2605.15243#bib.bib38); Guan et al., [2024](https://arxiv.org/html/2605.15243#bib.bib39)), but its single-target perspective limits performance on multi-pathway diseases, and it relies on high-quality protein structures (Isert et al., [2023](https://arxiv.org/html/2605.15243#bib.bib22); Wang et al., [2018](https://arxiv.org/html/2605.15243#bib.bib23); Fahim, [2025](https://arxiv.org/html/2605.15243#bib.bib24)).

**Cellular-Perturbation Transcriptomics.** Transcriptomics offers a comprehensive snapshot of cellular function. Large perturbational resources (Subramanian et al., [2017](https://arxiv.org/html/2605.15243#bib.bib3); Gao et al., [2019](https://arxiv.org/html/2605.15243#bib.bib29); Zhang et al., [2025](https://arxiv.org/html/2605.15243#bib.bib4)) provide massive collections of gene-expression profiles under chemical or genetic perturbations. Building on them, predictive models (Qi et al., [2024](https://arxiv.org/html/2605.15243#bib.bib5); Hetzel et al., [2022](https://arxiv.org/html/2605.15243#bib.bib6); Lotfollahi et al., [2019](https://arxiv.org/html/2605.15243#bib.bib7); Roohani et al., [2024](https://arxiv.org/html/2605.15243#bib.bib9)) integrate chemistry and baseline state to forecast single-cell or bulk responses, while frameworks such as that of Adduri et al. ([2025](https://arxiv.org/html/2605.15243#bib.bib13)) target heterogeneity and batch effects. Although useful for simulating responses, such models are predictive rather than generative. Emerging *transcriptome-guided generation* methods (Li and Yamanishi, [2025](https://arxiv.org/html/2605.15243#bib.bib53); Kaitoh and Yamanishi, [2021](https://arxiv.org/html/2605.15243#bib.bib11); Cheng et al., [2024](https://arxiv.org/html/2605.15243#bib.bib12)) condition on explicit summary statistics of the expression profiles, which risks losing information, and they still face the ill-posedness of mapping macroscopic signals to complete structures. These issues underline the need for function-centric conditioning and architectural decomposition, which we pursue in CURE.

## 3 Setting and Problem Formulation

We consider the following spaces. The *chemical space* $\mathcal{G}$ contains molecules represented as attributed graphs $\mathbf{G} = (\mathcal{V}, \mathcal{E})$. The *transcriptome space* $\mathcal{T} \subset \mathbb{R}^{d}$ contains gene-expression states $\mathbf{T} \in \mathbb{R}^{d}$, where $d$ is the number of measured genes (bulk) or a harmonized feature dimension (single-cell).

A *perturbation signature* can be specified as $(\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}}) \in \mathcal{T} \times \mathcal{T}$ or as a derived representation $\mathbf{z} = g(\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}})$ (e.g., a log-fold change or a learned embedding). Given an optional cellular context $c$ (cell type, state, batch, etc.), the goal of TBDD is to learn a conditional distribution over molecules

$$
p(\mathbf{G} \mid \mathbf{z}, c) = p(\mathbf{G} \mid \mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}}, c), \tag{1}
$$

from which we can sample candidate molecules whose induced cellular responses are functionally consistent with the target signature.
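
As a concrete instance of the derived representation $g$, the log-fold change mentioned above can be computed per gene from pre- and post-perturbation expression. This is an illustrative sketch only: the pseudocount `eps`, the base-2 logarithm, and the function name `log_fold_change` are assumptions, not details taken from CURE.

```python
import numpy as np

def log_fold_change(t_pre, t_post, eps=1.0):
    """Hypothetical g(T_pre, T_post): per-gene log2 fold change.

    t_pre, t_post: non-negative expression vectors of length d (same gene order).
    eps: pseudocount that keeps zero counts finite (an assumed value).
    """
    t_pre = np.asarray(t_pre, dtype=float)
    t_post = np.asarray(t_post, dtype=float)
    if t_pre.shape != t_post.shape:
        raise ValueError("pre and post profiles must share the same gene dimension")
    return np.log2(t_post + eps) - np.log2(t_pre + eps)

# Toy 4-gene example: gene 0 is induced, gene 2 is repressed, genes 1 and 3 are unchanged.
t_pre = np.array([1.0, 5.0, 15.0, 0.0])
t_post = np.array([7.0, 5.0, 3.0, 0.0])
z = log_fold_change(t_pre, t_post)
print(z)  # approximately [2, 0, -2, 0]
```

A fixed transform like this discards information that a learned embedding could retain, which is the motivation the text gives for learning $g$ inside the TFE instead.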

## 4 Multi-Resolution Transcriptome-Guided Diffusion Model

### 4.1 Model Architecture

![Figure 3](https://arxiv.org/html/2605.15243v1/x3.png)

Figure 3: Overall architecture of CURE. The model consists of a Transcriptome Perturbation Functional Feature Extractor (TFE), which processes transcriptome expression data $(\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}})$ to produce a conditional embedding $\mathbf{C}$, and a Perturbation feature-guided Molecular graph Diffusion model (PMD), which uses this condition to generate a target molecule.

Our proposed CURE method builds a graph diffusion model conditioned on transcriptome perturbation signals in gene expression profiles for controlled molecule generation. The model consists of two main parts: a Transcriptome Perturbation Functional Feature Extractor (TFE) and a Perturbation feature-guided Molecular graph Diffusion model (PMD). The TFE fuses transcriptome information from before and after perturbation and aligns it with the drug molecule feature space. The PMD guides drug molecule generation by injecting the perturbation signals into the conditional diffusion process. To our knowledge, CURE is the first drug molecule generation method to integrate multi-resolution cellular perturbation data while preserving heterogeneity information. Furthermore, the generated molecules can be used directly in various downstream tasks, such as gene inhibitor discovery (Figure [3](https://arxiv.org/html/2605.15243#S4.F3)).

### 4.2 Perturbation Feature-Guided Molecular Graph Diffusion Model

We use a conditional molecular graph diffusion model guided by the perturbation representations from the TFE. The core architecture is based on the Diffusion Transformer (Peebles and Xie, [2023](https://arxiv.org/html/2605.15243#bib.bib17)), into which the conditional representations are injected to guide the denoising process.
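
Because the backbone follows the Diffusion Transformer, the condition is injected through adaptive layer normalization (AdaLN): the condition vector is mapped to a per-channel scale and shift that modulate normalized hidden states. The NumPy sketch below shows only this modulation step. The single linear projection, the `1 + scale` parameterization, and all names (`adaln`, `w_mod`, `b_mod`) are illustrative assumptions rather than CURE's exact layer.

```python
import numpy as np

def layer_norm(h, eps=1e-6):
    """LayerNorm over the last axis without learned affine parameters."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def adaln(h, c, w_mod, b_mod):
    """Modulate hidden states h (n_tokens, dim) with a condition vector c (cond_dim,).

    w_mod (cond_dim, 2 * dim) and b_mod (2 * dim,) form a hypothetical linear map
    from the condition to a per-channel (scale, shift) pair.
    """
    scale, shift = np.split(c @ w_mod + b_mod, 2)
    # With zero scale and shift, AdaLN reduces to plain LayerNorm.
    return layer_norm(h) * (1.0 + scale) + shift

rng = np.random.default_rng(0)
n_tokens, dim, cond_dim = 5, 8, 16          # e.g. 5 graph nodes, toy widths
h = rng.normal(size=(n_tokens, dim))        # node features inside a transformer block
c = rng.normal(size=cond_dim)               # stand-in for the perturbation condition C
w_mod = rng.normal(scale=0.1, size=(cond_dim, 2 * dim))
b_mod = np.zeros(2 * dim)

out = adaln(h, c, w_mod, b_mod)
print(out.shape)  # (5, 8)
```

The original DiT "adaLN-Zero" variant additionally regresses a gate on each residual branch and initializes the modulation to zero; that detail is omitted here for brevity.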

**Molecular Graph Diffusion Model.** The graph diffusion model uses a Markov-chain forward process to progressively add noise to the molecular graph's discrete features (atom and bond types):

$$
q\left(X_{G}^{t} \mid X_{G}^{t-1}\right) = \operatorname{Cat}\left(X_{G}^{t};\ \tilde{p} = X_{G}^{t-1}\mathbf{Q}_{G}^{t}\right), \tag{2}
$$

where $X_{G}$ is the matrix representing the graph $G$ and $\mathbf{Q}_{G}^{t}$ is the graph transition matrix at step $t$. A reverse process parameterized by a neural network reconstructs the graph from noise by iteratively removing the noise. The reverse process learns to predict the original graph:

$$
p_{\theta}\left(\tilde{G}^{0} \mid G^{t}\right) = \prod_{t \in T} p_{\theta}\left(G^{t-1} \mid G^{t}\right). \tag{3}
$$

The prediction $p_{\theta}(\tilde{G}^{0} \mid G^{t})$ is combined with the posterior $q(G^{t-1} \mid G^{t}, G^{0})$ to obtain the reverse transition:

$$
p_{\theta}\left(G^{t-1} \mid G^{t}\right) = \sum_{\tilde{G}} q\left(G^{t-1} \mid G^{t}, \tilde{G}\right) p_{\theta}\left(\tilde{G} \mid G^{t}\right). \tag{4}
$$

The training objective is to minimize the negative log-likelihood:

$$
\mathcal{L} = \mathbb{E}_{q(G^{0})}\,\mathbb{E}_{q(G^{t} \mid G^{0})}\left[-\mathbb{E}_{\mathbf{x} \in G^{0}} \log p_{\theta}\left(\mathbf{x} \mid G^{t}\right)\right]. \tag{5}
$$

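
The forward process, reverse step, and loss above can be made concrete for a single categorical feature (for example, one node's atom type) with $K$ classes. The NumPy sketch below assumes a uniform transition matrix $\mathbf{Q}^t = \alpha_t \mathbf{I} + (1-\alpha_t)\mathbf{1}\mathbf{1}^\top/K$ and a hand-picked schedule `ALPHAS`; both are assumptions of this illustration (transitions based on the data's marginal class distribution are a common alternative). It shows forward noising, the reverse step obtained by marginalizing the posterior over the denoiser's prediction as in Eq. (4), and one cross-entropy term of Eq. (5).

```python
import numpy as np

K = 4                                      # toy number of atom types
ALPHAS = np.array([0.9, 0.8, 0.6, 0.4])    # hypothetical keep-probabilities for t = 1..4

def q_step(t):
    """Uniform transition matrix Q^t (rows sum to 1); t is 1-indexed."""
    a = ALPHAS[t - 1]
    return a * np.eye(K) + (1.0 - a) * np.ones((K, K)) / K

def q_bar(t):
    """Cumulative transition Q^1 Q^2 ... Q^t (identity for t = 0)."""
    m = np.eye(K)
    for s in range(1, t + 1):
        m = m @ q_step(s)
    return m

def forward_sample(x0, t, rng):
    """Sample a one-hot x_t from Cat(x_0 Qbar^t), the marginal of t steps of Eq. (2)."""
    probs = x0 @ q_bar(t)
    return np.eye(K)[rng.choice(K, p=probs)]

def posterior(xt, x0, t):
    """q(x_{t-1} | x_t, x_0), proportional to (x_t Q^t^T) * (x_0 Qbar^{t-1})."""
    unnorm = (xt @ q_step(t).T) * (x0 @ q_bar(t - 1))
    return unnorm / unnorm.sum()

def reverse_step(xt, p_x0, t):
    """Eq. (4): marginalize the posterior over the denoiser's prediction p_theta(x_0 | x_t)."""
    return sum(p_x0[k] * posterior(xt, np.eye(K)[k], t) for k in range(K))

def cross_entropy(x0, p_x0):
    """One term -log p_theta(x | G^t) of Eq. (5) for a single node feature."""
    return float(-np.log(p_x0[np.argmax(x0)] + 1e-12))

rng = np.random.default_rng(0)
x0 = np.eye(K)[2]                          # clean atom type: class 2
xt = forward_sample(x0, t=3, rng=rng)      # noised atom type after 3 steps
p_x0 = np.array([0.1, 0.1, 0.7, 0.1])      # stand-in for the network's prediction of x_0
print(reverse_step(xt, p_x0, t=3))         # a distribution over x_{t-1}
print(cross_entropy(x0, p_x0))             # -log 0.7, about 0.357
```

In a full graph model the same computation runs in parallel over every node and every edge, with separate transition matrices for atom and bond types.
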
**Transcriptome Perturbation Conditioned Molecular Generation.** The biological-domain perturbation representation produced by the TFE is passed through a multidimensional cluster embedder and injected into the molecular graph diffusion backbone via AdaLN. We use Classifier-Free Guidance (CFG) (Ho and Salimans, [2022](https://arxiv.org/html/2605.15243#bib.bib18)) to implement conditional generation:

$$
\begin{aligned}
\log \hat{p}_{\theta}\left(G^{t-1} \mid G^{t}, \mathbf{C}\right) ={}& \log p_{\theta}\left(G^{t-1} \mid G^{t}\right) \\
&+ \mathbf{s}\left(\log p_{\theta}\left(G^{t-1} \mid G^{t}, \mathbf{C}\right) - \log p_{\theta}\left(G^{t-1} \mid G^{t}\right)\right),
\end{aligned}
\tag{6}
$$

where $\mathbf{s}$ is the guidance scale and $\mathbf{C}$ is the condition. During training, we use dynamic feature dropping and noise injection:

$$
\mathbf{C} = \begin{cases} E_{\theta}(\mathbf{z}^{t}) + \boldsymbol{\epsilon} & \text{with probability } 1-p, \\ \mathbf{e}_{\text{drop}} + \boldsymbol{\epsilon} & \text{with probability } p, \end{cases} \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(0, \mathbf{I}). \tag{7}
$$

With probability $p$, the embedding $E_{\theta}(\mathbf{z}^{t})$ is replaced by a learnable dropout vector $\mathbf{e}_{\text{drop}}$; otherwise, $\mathbf{z}^{t}$ is encoded by the embedder $E_{\theta}$. In both cases, isotropic Gaussian noise $\boldsymbol{\epsilon}$ is then added.
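
Equations (6) and (7) can be illustrated together for one categorical feature. The sketch assumes per-class log-probabilities, a scalar guidance scale `s`, and a noise standard deviation `sigma` whose default of 1 matches $\mathcal{N}(0, \mathbf{I})$ literally. The names and values are hypothetical, and the final log-sum-exp renormalization is an addition of this sketch: a linear combination of log-probabilities is not normalized in general.

```python
import numpy as np

def make_condition(z_emb, e_drop, p, rng, sigma=1.0):
    """Training-time condition C of Eq. (7).

    z_emb: embedder output E_theta(z^t); e_drop: learnable "null" vector.
    With probability p the embedding is replaced by e_drop; Gaussian noise with
    standard deviation sigma is then added in both cases.
    """
    base = e_drop if rng.random() < p else z_emb
    return base + sigma * rng.normal(size=base.shape)

def guided_log_probs(log_p_uncond, log_p_cond, s):
    """Classifier-free guidance in log space (Eq. 6), renormalized over classes."""
    logits = log_p_uncond + s * (log_p_cond - log_p_uncond)
    m = logits.max()
    return logits - (m + np.log(np.exp(logits - m).sum()))

rng = np.random.default_rng(0)
cond = make_condition(np.ones(4), np.zeros(4), p=0.1, rng=rng)

log_u = np.log(np.full(4, 0.25))                    # unconditional prediction for one atom
log_c = np.log(np.array([0.10, 0.20, 0.60, 0.10]))  # prediction given the condition C
for s in (0.0, 1.0, 3.0):
    print(s, np.round(np.exp(guided_log_probs(log_u, log_c, s)), 3))
```

With $s = 0$ the unconditional prediction is recovered, $s = 1$ gives the conditional one, and $s > 1$ sharpens the distribution toward the class the condition favors. Training with the dropped condition is what makes the unconditional term available from the same network at sampling time.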

### 4.3 Transcriptome Perturbation Functional Feature Extractor

To efficiently extract perturbation functional signals from the transcriptome space for conditional molecule generation, we designed a Transcriptome Perturbation Functional Feature Extractor (TFE) that combines heterogeneity preservation, perturbation signal interaction, and molecular domain alignment. As illustrated in Figure [3](https://arxiv.org/html/2605.15243#S4.F3), the TFE comprises three parts: a Heterogeneity-Aware Transcriptome Aggregation module (TFE-H), a Bidirectional Transcriptome Perturbation Signal Interaction module (TFE-I), and a Dual-View Molecular Domain Alignment module (TFE-A). The modules are applied in sequence to extract transcriptome perturbation functional signals and align them to the drug molecule domains.

Compared to existing methods, CURE can perform feature interaction and extraction on both bulk and single-cell data while preserving heterogeneity information. To preserve heterogeneity in single-cell transcriptome data, we designed the TFE-H, which encodes single-cell data efficiently and thereby makes better use of the information they contain. In the TFE-I, paired pre- and post-perturbation profiles interact to extract the biological functional signal of the perturbation. Finally, these functional perturbation signals are aligned with features from multiple molecular domains for the subsequent conditional molecular generation task.

**Heterogeneity-Aware Transcriptome Aggregation.** To address the inherent trade-off between preserving population heterogeneity and mitigating sequencing noise when *conditioning* molecular generation on sparse single-cell data, we propose the TFE-H. Conventional approaches typically collapse complex cellular populations into a single mean vector, inevitably obscuring subpopulation-specific drug responses. In contrast, our method constructs a robust, fine-grained representation of the cellular state distribution, so that the generative process is conditioned on subtle, phenotype-driven perturbation signatures rather than homogenized signals. Specifically, we first use SCimilarity (Heimberg et al., [2025](https://arxiv.org/html/2605.15243#bib.bib34)) to project high-dimensional, sparse raw expression profiles into a dense, biologically rich latent space. We then apply a Cycle-Stratified Structured Aggregation strategy: we partition the population by cell-cycle phase (G1, S, G2/M) and transcriptional cluster, and perform hierarchical sampling and local aggregation within these biologically coherent groups. This mechanism acts as a structured denoiser, smoothing out random technical noise while preserving the distributional variance and subpopulation heterogeneity on which transcriptome-based drug design depends.
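
The cycle-stratified aggregation can be sketched as grouping cell embeddings by (cell-cycle phase, transcriptional cluster), drawing a bounded random subsample inside each group, and averaging within it, so that each subpopulation contributes its own denoised vector instead of being merged into one global mean. This is a hypothetical simplification: the grouping keys, the per-group cap `max_cells`, and the function name are assumptions, the exact hierarchical sampling used by CURE is not specified in this section, and the embeddings are assumed to be precomputed (for example, by SCimilarity).

```python
import numpy as np
from collections import defaultdict

def stratified_aggregate(emb, phases, clusters, max_cells=64, seed=0):
    """Return ({(phase, cluster): mean embedding}, {(phase, cluster): cells used}).

    emb: (n_cells, dim) cell embeddings; phases/clusters: per-cell labels.
    Each group is subsampled to at most max_cells cells before averaging,
    which smooths noise within a group while keeping groups separate.
    """
    rng = np.random.default_rng(seed)
    groups = defaultdict(list)
    for i, key in enumerate(zip(phases, clusters)):
        groups[key].append(i)
    means, sizes = {}, {}
    for key, idx in sorted(groups.items()):
        idx = np.array(idx)
        if len(idx) > max_cells:
            idx = rng.choice(idx, size=max_cells, replace=False)
        means[key] = emb[idx].mean(axis=0)
        sizes[key] = len(idx)
    return means, sizes

# Toy population: two transcriptional clusters with clearly different embeddings.
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0.0, 0.1, size=(30, 3)), rng.normal(5.0, 0.1, size=(10, 3))])
phases = ["G1"] * 20 + ["S"] * 10 + ["G2/M"] * 10
clusters = [0] * 30 + [1] * 10
means, sizes = stratified_aggregate(emb, phases, clusters)
print(sorted(means))     # [('G1', 0), ('G2/M', 1), ('S', 0)]
print(emb.mean(axis=0))  # a single global mean (about 1.25) blurs both clusters
```

The resulting set of group vectors, rather than one averaged profile, is what a downstream interaction module could attend over.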

**Bidirectional Transcriptome Perturbation Signal Interaction.** To distill the drug-induced perturbation signature from the complex cellular background, we design the TFE-I, which employs a dual-stream architecture to explicitly model the functional shift between the pre-perturbation ($\mathbf{T}_{\mathrm{pre}}$) and post-perturbation ($\mathbf{T}_{\mathrm{post}}$) states. The TFE-I consists of three stacked interaction blocks, each maintaining separate processing streams for the unperturbed and perturbed representations derived from the TFE-H step. Within each block, a symmetric cross-attention mechanism lets $\mathbf{T}_{\mathrm{pre}}$ and $\mathbf{T}_{\mathrm{post}}$ each serve as queries that attend to the other's features. This design forces the model to dynamically align the two states and highlight the differential gene expression patterns driven by the drug. Following the cross-stream interaction, a self-attention layer within each stream refines the intra-state feature dependencies. Finally, the processed streams are integrated via an adaptive fusion unit, yielding a compact, function-oriented perturbation embedding intended to capture the net drug effect while discounting basal cellular variation.
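
The interaction pattern described above (cross-attention in both directions, per-stream self-attention, three stacked blocks, then fusion) can be sketched in NumPy. This is a hypothetical simplification: it uses single-head attention with random weights, residual connections, mean pooling over tokens, and a sigmoid gate standing in for the "adaptive fusion unit"; CURE's actual head counts, normalization, and fusion design are not specified here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_in, kv_in, w):
    """Single-head scaled dot-product attention; w holds (Wq, Wk, Wv)."""
    wq, wk, wv = w
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def interaction_block(pre, post, params):
    """Symmetric cross-attention followed by per-stream self-attention (with residuals)."""
    pre_x = pre + attention(pre, post, params["pre_cross"])     # pre queries post
    post_x = post + attention(post, pre, params["post_cross"])  # post queries pre
    pre_x = pre_x + attention(pre_x, pre_x, params["pre_self"])
    post_x = post_x + attention(post_x, post_x, params["post_self"])
    return pre_x, post_x

def gated_fusion(pre, post, w_gate):
    """Hypothetical fusion: a sigmoid gate mixes the pooled shift and the pooled post state."""
    p_pre, p_post = pre.mean(axis=0), post.mean(axis=0)
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([p_pre, p_post]) @ w_gate)))
    return gate * (p_post - p_pre) + (1.0 - gate) * p_post

rng = np.random.default_rng(0)
dim = 8

def make_w():
    return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

blocks = [{k: make_w() for k in ("pre_cross", "post_cross", "pre_self", "post_self")}
          for _ in range(3)]                  # three stacked interaction blocks
pre = rng.normal(size=(6, dim))               # e.g. 6 subpopulation tokens before treatment
post = rng.normal(size=(6, dim))              # 6 tokens after treatment
for params in blocks:
    pre, post = interaction_block(pre, post, params)
embedding = gated_fusion(pre, post, rng.normal(scale=0.1, size=(2 * dim, dim)))
print(embedding.shape)  # (8,)
```

Keeping two streams, rather than attending over a concatenation, lets each state keep its own representation until the fusion step decides how much of the shift to retain.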

**Dual-View Molecular Domain Alignment.** The perturbation feature $z$, extracted by the TFE-I, resides in a continuous biological manifold. To guide molecular generation effectively, this feature must be mapped to a chemical space. However, a single chemical representation is often insufficient to capture the full complexity of drug-like molecules: graph embeddings excel at encoding global topology and validity, while molecular fingerprints are explicitly designed to capture local pharmacophoric features linked to bioactivity. To bridge this semantic gap and help ensure that the generated molecules are both structurally valid and functionally specific, we propose a Dual-View Molecular Domain Alignment module that projects the transcriptomic features into two complementary chemical domains.

**View 1: Global Topological Alignment.** To promote the structural validity of the generated molecules, we align the perturbation feature with the latent space of a pretrained hierarchical graph autoencoder (Jin et al., [2020](https://arxiv.org/html/2605.15243#bib.bib2)). This latent manifold encodes essential chemical rules and topological constraints (e.g., valency and ring structures). By constraining the transcriptomic feature $z$ to map into this valid chemical space, we encourage the generative process to respect fundamental molecular topology. The global topological alignment objective is defined as:

$$
\mathcal{L}_{\text{global}} = \mathcal{L}_{\text{ELBO}} + \mathcal{L}_{\text{align}}, \tag{8}
$$

where $\mathcal{L}_{\text{ELBO}}$ is the standard VAE evidence lower bound loss (the negative ELBO) and $\mathcal{L}_{\text{align}}$ aligns the mean and variance of the transcriptome-derived features with the latent space.
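
The text states only that $\mathcal{L}_{\text{align}}$ matches the mean and variance of the transcriptome-derived features to the latent space. One plausible instantiation, shown here as an assumption rather than the paper's definition, treats both sides as diagonal Gaussians and uses their closed-form KL divergence. The function name, the pairing with the graph VAE posterior of the same drug, and the direction of the divergence are all choices of this sketch.

```python
import numpy as np

def gaussian_kl(mu_t, logvar_t, mu_g, logvar_g):
    """KL( N(mu_t, diag exp(logvar_t)) || N(mu_g, diag exp(logvar_g)) ), summed over dims.

    mu_t, logvar_t: moments predicted from the transcriptomic feature z (hypothetical head).
    mu_g, logvar_g: posterior moments of the pretrained graph VAE for the paired drug.
    """
    var_t, var_g = np.exp(logvar_t), np.exp(logvar_g)
    return float(0.5 * np.sum(logvar_g - logvar_t + (var_t + (mu_t - mu_g) ** 2) / var_g - 1.0))

mu_g, logvar_g = np.array([0.5, -1.0]), np.log(np.array([0.2, 0.3]))
print(gaussian_kl(mu_g, logvar_g, mu_g, logvar_g))              # 0.0 when moments match
print(gaussian_kl(mu_g + 0.3, logvar_g, mu_g, logvar_g) > 0.0)  # True: a mean shift is penalized
```

Because the KL term penalizes mismatches in both the mean and the variance, it is one natural way to realize "aligning mean and variance"; a simpler moment-matching MSE would also fit the description.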

**View 2: Local Bioactivity Alignment.** Global topology alone does not guarantee specific biological interactions. To explicitly encode functional groups and pharmacophores, we introduce an alignment view based on Morgan fingerprints. Unlike graph embeddings, fingerprints provide a fixed-dimensional, sparse vectorization of local chemical environments, which correlate strongly with bioactivity. To map the continuous feature $z$ to this high-dimensional, sparse, discrete space, we employ a Sparse-Aware Bioactivity Constraint. This objective combines sparse regression with label-guided contrastive learning to handle the sparsity of the fingerprint space:

$$
\mathcal{L}_{\text{local}} = \mathcal{L}_{\text{InfoNCE}} + \mathcal{L}_{\text{sparse}}. \tag{9}
$$

By jointly minimizing $\mathcal{L} = \mathcal{L}_{\text{global}} + \gamma \mathcal{L}_{\text{local}}$, our model learns a unified latent representation that satisfies both the structural-validity constraints imposed by the graph domain and the functional specificity required by the fingerprint domain. This dual-view strategy is designed to narrow the cross-modality domain gap by grounding generation in chemically robust priors. Details are in [Section A.5](https://arxiv.org/html/2605.15243#A1.SS5).
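
The local view and the combined objective can be sketched in the same hedged way. Here the InfoNCE term treats each (transcriptomic projection, fingerprint embedding) pair in a batch as a positive and all other pairings as negatives, and the sparse term is a binary cross-entropy on fingerprint bits that up-weights the rare on-bits, plus a small L1 penalty on predicted bit probabilities. The temperature `tau`, `pos_weight`, the L1 weight, the value of $\gamma$, and all function names are assumptions; the label-guided choice of positives described in the text is not reproduced.

```python
import numpy as np

def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def info_nce(a, b, tau=0.1):
    """InfoNCE where row i of a (transcriptome) and row i of b (fingerprint) form a positive pair."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau  # cosine similarities scaled by the temperature
    return float(np.mean(logsumexp(logits, axis=1) - np.diag(logits)))

def sparse_bit_loss(pred_logits, bits, pos_weight=10.0, l1=1e-3):
    """Weighted BCE on fingerprint bits (rare on-bits up-weighted) plus an L1 penalty."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))
    bce = -(pos_weight * bits * np.log(p + 1e-12) + (1.0 - bits) * np.log(1.0 - p + 1e-12))
    return float(bce.mean() + l1 * p.mean())

def total_loss(l_global, l_infonce, l_sparse, gamma=0.5):
    """L = L_global + gamma * L_local, with L_local = L_InfoNCE + L_sparse (Eq. 9)."""
    return l_global + gamma * (l_infonce + l_sparse)

rng = np.random.default_rng(0)
fp_emb = rng.normal(size=(8, 16))                   # fingerprint-view embeddings for 8 drugs
aligned = fp_emb + 0.05 * rng.normal(size=(8, 16))  # transcriptome projections close to their drug
misaligned = np.roll(aligned, 1, axis=0)            # every row now paired with the wrong drug
print(info_nce(aligned, fp_emb) < info_nce(misaligned, fp_emb))  # True
```

The contrastive term rewards retrieving the right drug among the batch, while the bit-level term forces the representation to predict which local substructures are present; the two address different failure modes of a sparse target.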

## 5 Experiments

Following the structure of our evaluation metrics, we assess the model from three angles: (i) macroscopic evaluation of the relationship between the generated and target molecular sets, (ii) the chemical and medicinal properties of the generated set itself, and (iii) microscopic evaluation of the effectiveness and accuracy of CURE's conditional generation. To demonstrate the model's generalization ability and the functional effects of the generated drugs, we further designed three evaluation experiments: zero-shot prediction of gene inhibitors, characterization of the functional effects of generated drugs, and accuracy assessment of the drug screener. For reproducibility, we provide the hyperparameter settings in [Section D.1](https://arxiv.org/html/2605.15243#A4.SS1).

Table 1: Comprehensive evaluation of CURE on both bulk and single-cell data. We report generalization performance and microscopic similarity across three generalization splits. To address the lack of dedicated single-cell baselines, we established a fair benchmark by adapting bulk models via pseudo-bulk profiling. Preliminary Evaluation: Coverage, Unique, Similarity, Distance, QED. Transcriptome-Guided Evaluation: Fraggle Sim., Morgan Sim., PRnet MSE. ↑: higher is better; ↓: lower is better.

| Data Type | Split | Method | Coverage ↑ | Unique ↑ | Similarity ↑ | Distance ↓ | QED ↑ | Fraggle Sim. ↑ | Morgan Sim. ↑ | PRnet MSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| Bulk | In-Distribution | GexMolGen | 54.55% | 0.7646 | 0.8919 | 35.4027 | 0.5127 | 0.7428 | 0.6939 | 4.6504 |
| | | Gx2Mol | 72.73% | 0.8360 | 0.9405 | 17.8963 | 0.6041 | 0.6203 | 0.5920 | 2.5987 |
| | | TRIOMPHE | 72.73% | 0.8809 | 0.7270 | 48.0169 | 0.3071 | 0.5842 | 0.5372 | 7.4599 |
| | | CURE | 100.00% | 0.8906 | 0.9576 | 6.7856 | 0.5665 | 0.8892 | 0.8228 | 0.2328 |
| | Out-of-Distribution (Unseen Cells) | GexMolGen | 54.55% | 0.7622 | 0.8876 | 42.5445 | 0.5173 | 0.7285 | 0.6805 | 4.2724 |
| | | Gx2Mol | 45.45% | 0.7321 | 0.7123 | 65.9671 | 0.6203 | 0.6015 | 0.5849 | 3.7071 |
| | | TRIOMPHE | 63.64% | 0.8786 | 0.6637 | 56.0078 | 0.3209 | 0.5694 | 0.5138 | 8.6310 |
| | | CURE | 90.90% | 0.8864 | 0.8238 | 13.6113 | 0.5736 | 0.9449 | 0.9125 | 0.2932 |
| | Out-of-Distribution (Unseen Drugs) | GexMolGen | 54.55% | 0.7609 | 0.9013 | 40.0122 | 0.5098 | 0.7194 | 0.6903 | 5.0482 |
| | | Gx2Mol | 45.45% | 0.7280 | 0.7106 | 64.6032 | 0.6232 | 0.6113 | 0.5882 | 2.7208 |
| | | TRIOMPHE | 63.64% | 0.8829 | 0.7324 | 54.3125 | 0.3589 | 0.5704 | 0.5081 | 7.4666 |
| | | CURE | 90.90% | 0.9018 | 0.9576 | 9.5265 | 0.5725 | 0.8592 | 0.7722 | 0.4866 |
| Single-cell | In-Distribution | GexMolGen | 54.55% | 0.6521 | 0.5342 | 41.2201 | 0.3520 | 0.1988 | 0.2245 | 4.8549 |
| | | Gx2Mol | 45.45% | 0.7012 | 0.6102 | 61.4421 | 0.4541 | 0.2105 | 0.2884 | 4.1419 |
| | | TRIOMPHE | 63.64% | 0.7521 | 0.5002 | 55.2215 | 0.3984 | 0.1541 | 0.2011 | 7.7024 |
| | | CURE | 90.90% | 0.8771 | 0.8137 | 27.6223 | 0.4946 | 0.7310 | 0.6114 | 0.4829 |
| | Out-of-Distribution (Unseen Cells) | GexMolGen | 54.55% | 0.5954 | 0.5769 | 47.6539 | 0.3362 | 0.2363 | 0.2621 | 5.1874 |
| | | Gx2Mol | 45.45% | 0.6758 | 0.4629 | 63.8786 | 0.4031 | 0.2130 | 0.2479 | 4.4892 |
| | | TRIOMPHE | 54.55% | 0.7310 | 0.4314 | 57.4050 | 0.3085 | 0.1430 | 0.1921 | 8.5492 |
| | | CURE | 90.90% | 0.8474 | 0.7954 | 25.8473 | 0.4834 | 0.7023 | 0.6091 | 0.5392 |
| | Out-of-Distribution (Unseen Drugs) | GexMolGen | 54.55% | 0.5945 | 0.5858 | 46.0079 | 0.3313 | 0.1898 | 0.2280 | 5.6923 |
| | | Gx2Mol | 45.45% | 0.6732 | 0.4618 | 62.9920 | 0.4050 | 0.2429 | 0.2976 | 4.9385 |
| | | TRIOMPHE | 54.55% | 0.6738 | 0.4760 | 58.3031 | 0.3332 | 0.1535 | 0.2198 | 9.9462 |
| | | CURE | 90.90% | 0.8361 | 0.7824 | 26.1922 | 0.4983 | 0.7164 | 0.6038 | 0.6482 |

### 5.1 Experimental Setup

**Datasets.** *Bulk cell data:* We used the L1000 Level 3 dataset (Subramanian et al., [2017](https://arxiv.org/html/2605.15243#bib.bib3); Gao et al., [2019](https://arxiv.org/html/2605.15243#bib.bib29)), which profiles the expression of 978 landmark genes across nearly 20,000 drugs and various cell lines. For training, we split the data 85:10:5 (train:test:val) using three strategies: random, mask drug, and mask cell. *Single-cell data:* We also used the Tahoe-100M dataset (Zhang et al., [2025](https://arxiv.org/html/2605.15243#bib.bib4)), the largest single-cell perturbation dataset available. It contains responses to over 300 drugs across 50 cancer cell lines, together with their untreated states. *Gene inhibitor dataset:* For evaluation, we built a gene inhibitor dataset from the ExCape database (Sun et al., [2017](https://arxiv.org/html/2605.15243#bib.bib50)). It covers 10 selected human genes, each with between 1,200 and 23,000 known inhibitors, enabling comparison with gene knockout expression profiles. To support a fair comparison, we describe the training data in detail in [Section D.3](https://arxiv.org/html/2605.15243#A4.SS3).
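
The three split strategies differ only in what is held out. In the hypothetical sketch below, the random split assigns individual profiles to train/test/val in the stated 85:10:5 ratio, while "mask drug" and "mask cell" apply the ratio to unique drug or cell-line identifiers so that held-out groups never appear in training. Whether CURE's masked splits apply the ratio to identifiers or to profiles is not stated here; grouping by identifier is the assumption of this sketch, and the record fields are illustrative.

```python
import random

def split_records(records, strategy="random", ratios=(0.85, 0.10, 0.05), seed=0):
    """Split records (dicts with 'drug' and 'cell' keys) into (train, test, val) lists.

    strategy: 'random' splits individual profiles; 'mask_drug' / 'mask_cell' split
    unique drug or cell identifiers, so held-out groups are unseen during training.
    """
    rng = random.Random(seed)
    if strategy == "random":
        units = list(range(len(records)))
        key_of = lambda i: i
    else:
        field = {"mask_drug": "drug", "mask_cell": "cell"}[strategy]
        units = sorted({r[field] for r in records})
        key_of = lambda i: records[i][field]
    rng.shuffle(units)
    n_train = int(round(ratios[0] * len(units)))
    n_test = int(round(ratios[1] * len(units)))
    part = {u: 0 for u in units[:n_train]}
    part.update({u: 1 for u in units[n_train:n_train + n_test]})
    part.update({u: 2 for u in units[n_train + n_test:]})
    splits = ([], [], [])
    for i, r in enumerate(records):
        splits[part[key_of(i)]].append(r)
    return splits

records = [{"drug": f"d{d}", "cell": f"c{c}"} for d in range(40) for c in range(5)]
train, test, val = split_records(records, strategy="mask_drug")
print(len(train), len(test), len(val))  # 170 20 10
```

Splitting by identifier is what makes an "unseen drugs" or "unseen cells" evaluation meaningful: a random split would let the same drug appear on both sides.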

Evaluation Metrics. We used three types of metrics to assess the model's generative capabilities:

- Macroscopic metrics, which reflect properties of the entire set of generated molecules: (1) heavy-atom type coverage (Coverage); (2) uniqueness of structures within a single generated batch (Unique); (3) fragment-based similarity to a reference set (Similarity); (4) Fréchet ChemNet Distance to a reference set (Distance); (5) Quantitative Estimate of Drug-likeness (QED); (6) synthetic accessibility (SA); (7) validity of the generated molecules (Validity).
- Microscopic metrics, which assess how closely the generated molecules match the ground-truth drug for a given gene-expression perturbation: (1) Fraggle-based molecular scaffold similarity (Fraggle Sim.); (2) Morgan fingerprint-based atomic-environment similarity (Morgan Sim.) (Grant and Sit, [2021](https://arxiv.org/html/2605.15243#bib.bib31); Wang et al., [2022](https://arxiv.org/html/2605.15243#bib.bib32)).
- Experimental design metrics, which we designed to reflect the functional effects of generated drugs: (1) the difference in cellular gene-expression effects between the generated drug and the ground-truth drug (PRnet MSE); (2) on zero-shot gene-inhibitor data, the similarity between the generated molecules and known inhibitors of the target gene (Gene Inhibitor Sim.) (Méndez-Lucio et al., [2020](https://arxiv.org/html/2605.15243#bib.bib33)).
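
Several of these metrics reduce to simple set computations once molecules have been parsed, canonicalized, and fingerprinted (for example with RDKit, which we assume happens upstream). The sketch below uses common definitions that we assume here, since the section does not spell out exact formulas; the function names and the input representations (canonical SMILES strings, sets of heavy-atom symbols, sets of fingerprint on-bit indices) are hypothetical.

```python
def uniqueness(canonical_smiles):
    """Unique: fraction of distinct molecules in one generated batch.
    Assumes canonical SMILES, so identical molecules compare equal."""
    if not canonical_smiles:
        return 0.0
    return len(set(canonical_smiles)) / len(canonical_smiles)


def validity(parsed):
    """Validity: fraction of generated outputs that parsed into a molecule.
    Failed parses are represented as None."""
    return sum(m is not None for m in parsed) / len(parsed)


def heavy_atom_type_coverage(generated_atom_types, reference_types):
    """Coverage: share of reference heavy-atom types (e.g. C, N, O, S, F)
    that occur in at least one generated molecule."""
    seen = set().union(*generated_atom_types)
    ref = set(reference_types)
    return len(seen & ref) / len(ref)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit
    indices, e.g. Morgan fingerprint bits (the basis of Morgan Sim.)."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    # Convention assumed here: two empty fingerprints score 0.0.
    return len(a & b) / len(union) if union else 0.0
```

For instance, the batch `["CCO", "CCO", "c1ccccc1", "CCN"]` has uniqueness 0.75, and fingerprints `{1, 2, 3}` and `{2, 3, 4}` have Tanimoto similarity 2/4 = 0.5.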

Baselines. For the bulk data experiments, we compared against baseline models widely recognized in the TBDD task: GexMolGen (Cheng et al., [2024](https://arxiv.org/html/2605.15243#bib.bib12)), TRIOMPHE (Kaitoh and Yamanishi, [2021](https://arxiv.org/html/2605.15243#bib.bib11)), and Gx2Mol (Li and Yamanishi, [2025](https://arxiv.org/html/2605.15243#bib.bib53)). For the single-cell domain, no existing generative method is specialized for single-cell-resolution inputs. To bridge this gap and establish a rigorous comparative benchmark, we adapted these strong bulk baselines using high-quality pseudo-bulk profiling, a standard computational biology technique that aggregates heterogeneous single-cell populations into a single representation by averaging. Because these baseline architectures are designed to process macroscopic transcriptomic signals, applying them to pseudo-bulked single-cell data is a methodologically valid comparison. To ensure strict experimental fairness, all methods were trained on identical underlying data sources: for bulk benchmarks, all models shared the same training splits; for single-cell benchmarks, the baselines were trained on exactly the same single-cell dataset after pseudo-bulk aggregation, whereas our model consumes the cell-level data directly.
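
Pseudo-bulk profiling, as used for the single-cell baselines, amounts to averaging the expression vectors of all cells that share a perturbation condition. A minimal sketch, assuming cells arrive as `(condition, vector)` pairs with a shared gene order; the function name `pseudo_bulk` is ours. The usage line also shows the signal dilution that TFE-H is meant to avoid: two sub-populations responding in opposite directions cancel in the mean.

```python
from collections import defaultdict


def pseudo_bulk(cells):
    """Average single-cell expression vectors per perturbation condition.

    cells: iterable of (condition_id, expression_vector) pairs; all vectors
    share the same gene order. Returns {condition_id: mean_vector}.
    """
    sums = {}
    counts = defaultdict(int)
    for cond, vec in cells:
        acc = sums.setdefault(cond, [0.0] * len(vec))
        for g, x in enumerate(vec):
            acc[g] += x
        counts[cond] += 1
    return {cond: [s / counts[cond] for s in acc] for cond, acc in sums.items()}


# Two sub-populations react to drugA in opposite directions on gene 0.
cells = [("drugA", [2.0, 1.0]), ("drugA", [-2.0, 1.0]), ("drugB", [0.5, 0.5])]
print(pseudo_bulk(cells))  # {'drugA': [0.0, 1.0], 'drugB': [0.5, 0.5]}
```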

Evaluation Protocols. To rigorously assess OOD generalization and rule out data leakage from training interpolation, we established two strict zero-shot protocols:

1. Unseen Cell Lines (Hierarchical Biological Split): We enforce strict hierarchical separation across tumor types, tissues, and cell lines. Because the biological contexts in training and testing are disjoint, this protocol challenges the model to disentangle intrinsic drug mechanisms from cellular heterogeneity and probes its robustness to shifts in transcriptomic background.
2. Unseen Drugs (Scaffold-level Split): We partition the dataset by *Bemis-Murcko scaffolds* rather than at random. This segregates distinct molecular frameworks and prevents information leakage from structural analogs, so the model must learn transferable structure-function mappings and valid pharmacophores instead of memorizing specific molecular templates (Appendix [B](https://arxiv.org/html/2605.15243#A2)).
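
Both protocols hold out whole groups rather than individual profiles. The sketch below, written under our own assumptions, shows a tumor-type hold-out with a leakage check across all three hierarchy levels, and a greedy scaffold split in which every scaffold family lands entirely in one partition. The record fields (`tumor_type`, `tissue`, `cell_line`, `scaffold`) and function names are hypothetical; in practice the scaffold key would typically be computed with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`, which we assume has been run upstream.

```python
from collections import defaultdict

LEVELS = ("tumor_type", "tissue", "cell_line")


def hierarchical_holdout(records, test_tumor_types):
    """Unseen cells: hold out whole tumor types with their nested tissues and cell lines."""
    train = [r for r in records if r["tumor_type"] not in test_tumor_types]
    test = [r for r in records if r["tumor_type"] in test_tumor_types]
    return train, test


def leaked_levels(train, test, levels=LEVELS):
    """Return the hierarchy levels whose values occur in both partitions."""
    return [lvl for lvl in levels
            if {r[lvl] for r in train} & {r[lvl] for r in test}]


def scaffold_split(records, test_frac=0.1):
    """Unseen drugs: greedy Bemis-Murcko scaffold split.

    Records are grouped by a precomputed scaffold string. The largest
    scaffold families fill the training set first; families that would
    overflow it go to the test set, so no scaffold spans both partitions.
    """
    families = defaultdict(list)
    for r in records:
        families[r["scaffold"]].append(r)
    train_budget = (1.0 - test_frac) * len(records)
    train, test = [], []
    for fam in sorted(families.values(), key=len, reverse=True):
        if len(train) + len(fam) <= train_budget:
            train.extend(fam)
        else:
            test.extend(fam)
    return train, test
```

Running `leaked_levels` after any split is a cheap guard: it returns an empty list only when no tumor type, tissue, or cell line is shared between partitions.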

### 5.2 Preliminary Evaluation of Drug Generation

This experiment evaluates CURE from a macroscopic, distributional perspective, assessing the quality, diversity, and chemical validity of the generated molecular space. As detailed in Table [1](https://arxiv.org/html/2605.15243#S5.T1), CURE outperforms the baselines in both the bulk and single-cell modalities: it attains the highest Coverage and Unique scores and the lowest Fréchet ChemNet Distance in all six settings. This combination of high uniqueness and low distance supports the view that the model performs genuine generative exploration, synthesizing novel, chemically valid structures, rather than merely reconstructing or memorizing training scaffolds. The advantage is most pronounced in the single-cell setting. Although we adapted the baselines with high-quality pseudo-bulk profiles to ensure a fair comparison, their performance degrades severely, which we attribute to signal dilution from naive averaging. In sharp contrast, CURE maintains robust distributional alignment. This indicates that TFE-H extracts heterogeneity-aware pharmacological signals from noisy cellular environments where conventional aggregation fails, and the advantage holds even in the challenging Unseen Drugs split.
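
For reference, the Fréchet ChemNet Distance fits a Gaussian to the ChemNet activations of each molecule set and compares the two Gaussians with the squared Fréchet distance $\lVert\mu_a-\mu_b\rVert^2 + \mathrm{Tr}\big(\Sigma_a+\Sigma_b-2(\Sigma_a\Sigma_b)^{1/2}\big)$. The sketch below evaluates this expression for feature vectors that are already given, under the simplifying assumption of diagonal covariances, in which case the trace term reduces to $\sum_j(\sigma_{a,j}-\sigma_{b,j})^2$. It is illustrative only: real FCD implementations use ChemNet features and full covariance matrices, and the function name is hypothetical.

```python
import math
from statistics import fmean, variance


def frechet_distance_diag(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets,
    assuming diagonal covariances.

    feats_a, feats_b: lists of equal-length feature vectors (at least two
    vectors per set, as required for the sample variance).
    """
    total = 0.0
    for j in range(len(feats_a[0])):
        col_a = [f[j] for f in feats_a]
        col_b = [f[j] for f in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        # Mean term ||mu_a - mu_b||^2, accumulated one dimension at a time.
        total += (mu_a - mu_b) ** 2
        # Trace term for diagonal covariances: (sigma_a - sigma_b)^2.
        sd_a = math.sqrt(variance(col_a, mu_a))
        sd_b = math.sqrt(variance(col_b, mu_b))
        total += (sd_a - sd_b) ** 2
    return total
```

Two sets that match in mean and spread score 0; shifting every feature of one set by 1 while keeping its spread adds exactly 1 per dimension.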

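Table 2: Zero-shot gene inhibitor similarity and affinity. Gex. = GexMolGen, TRIO. = TRIOMPHE, Gx2. = Gx2Mol; affinity in kcal/mol.

| Target gene | Morgan ↑ (Gex.) | Morgan ↑ (TRIO.) | Morgan ↑ (Gx2.) | Morgan ↑ (CURE) | Affinity ↓ (Gex.) | Affinity ↓ (TRIO.) | Affinity ↓ (Gx2.) | Affinity ↓ (CURE) |
|---|---|---|---|---|---|---|---|---|
| AKT1 | 0.728 | 0.540 | 0.743 | 0.804 | -7.45 | -5.67 | -7.00 | -8.63 |
| AKT2 | 0.712 | 0.515 | 0.706 | 0.754 | -7.48 | -5.82 | -7.39 | -8.59 |
| AURKB | 0.744 | 0.553 | 0.719 | 0.760 | -7.40 | -6.49 | -7.38 | -8.79 |
| CTSK | 0.749 | 0.535 | 0.699 | 0.751 | -7.55 | -6.38 | -7.29 | -8.69 |
| EGFR | 0.747 | 0.540 | 0.738 | 0.782 | -7.30 | -6.11 | -6.91 | -9.11 |
| HDAC1 | 0.720 | 0.519 | 0.697 | 0.772 | -7.00 | -5.85 | -7.27 | -8.68 |
| MTOR | 0.794 | 0.527 | 0.745 | 0.808 | -7.09 | -5.91 | -7.02 | -8.74 |
| PIK3CA | 0.764 | 0.524 | 0.726 | 0.809 | -7.33 | -5.80 | -7.47 | -9.15 |
| SMAD3 | 0.845 | 0.590 | 0.843 | 0.881 | -7.28 | -5.99 | -7.35 | -9.07 |
| TP53 | 0.809 | 0.588 | 0.793 | 0.816 | -7.09 | -5.76 | -7.43 | -8.28 |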

Table 3: Performance of TFE as a drug screener.

| Top-K | Morgan Sim. ↑ | Hit Rate ↑ | PRnet MSE ↓ |
|---|---|---|---|
| 5 | 0.9753 | 0.6766 | 0.1668 |
| 10 | 0.9653 | 0.9192 | 0.1353 |
| 15 | 0.9568 | 0.9694 | 0.1228 |
| 20 | 0.9195 | 0.9790 | 0.1093 |

Table 4: Ablation study of the TFE. *Note: w/o TFE is trained with L1000 Level 5.*

| Dataset | TFE-H | TFE-I | TFE-A (Global) | TFE-A (Local) | Validity ↑ | Coverage ↑ | Diversity ↑ | Distance ↓ | SA ↓ | QED ↑ | Morgan Sim. ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| L1000 | × | × | × | × | 0.8775 | 90.91% | 0.7504 | 8.4982 | 0.8355 | 0.5426 | 0.1824 |
| | × | ✓ | × | × | 0.3000 | 63.64% | 0.7662 | 82.7183 | 0.7651 | 0.4556 | 0.0886 |
| | × | × | ✓ | ✓ | 0.2400 | 36.36% | 0.6982 | 51.0830 | 0.6933 | 0.4400 | 0.2527 |
| | × | ✓ | ✓ | ✓ | 0.9350 | 100.00% | 0.8906 | 6.7856 | 0.6386 | 0.5665 | 0.8228 |
| Tahoe | ✓ | ✓ | × | ✓ | 0.9650 | 81.82% | 0.8693 | 43.0227 | 0.6043 | 0.4588 | 0.2674 |
| | ✓ | ✓ | ✓ | × | 0.9400 | 81.82% | 0.8600 | 29.6223 | 0.6357 | 0.3048 | 0.3219 |
| | × | ✓ | ✓ | ✓ | 0.9450 | 90.91% | 0.8671 | 30.6223 | 0.6193 | 0.4645 | 0.4924 |
| | ✓ | ✓ | ✓ | ✓ | 0.9800 | 90.91% | 0.8771 | 27.6223 | 0.5994 | 0.4946 | 0.6114 |

### 5.3 Evaluation of Transcriptome-guided Drug Molecular Generation

To rigorously evaluate conditional control, we assess performance in terms of structural accuracy and functional fidelity. Structurally, CURE achieves the highest Fraggle and Morgan similarity to the ground-truth drugs in every split, far ahead of the baselines. Even in the Unseen Drugs split, CURE maintains high similarity, suggesting that it has learned to identify and generate the essential pharmacophores and functional groups required to induce specific transcriptomic changes. Functionally, to quantify biological efficacy in this target-free setting, we introduce PRnet as a functional proxy. PRnet is a predictive model that takes the raw basal transcriptome and a drug molecule as inputs and predicts the resulting drug-induced perturbation profile. We compute the MSE between the phenotypic vector predicted for the generated molecule and the corresponding vector for the ground-truth drug; a lower MSE indicates closer functional proximity to the desired therapeutic effect. To preserve the integrity of this metric, PRnet was trained on a strictly independent partition of the dataset to prevent data leakage. The results (Table [1](https://arxiv.org/html/2605.15243#S5.T1)) show that CURE achieves high functional consistency, generating candidates that are predicted to induce the target transcriptomic state.
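
The PRnet MSE metric compares two predicted perturbation profiles for the same basal state. A minimal sketch, assuming a trained response predictor is available as a callable `predict(basal, smiles)` that returns a profile vector; that interface is our assumption, not PRnet's actual API, and `toy_predict` is a stand-in that exists only to exercise the code.

```python
from typing import Callable, Sequence

# Hypothetical interface: (basal transcriptome, SMILES) -> predicted profile.
Predictor = Callable[[Sequence[float], str], Sequence[float]]


def functional_mse(predict: Predictor, basal, generated_smiles, reference_smiles):
    """MSE between the profiles predicted for a generated molecule and for
    the ground-truth drug, both applied to the same basal state."""
    pred_gen = predict(basal, generated_smiles)
    pred_ref = predict(basal, reference_smiles)
    if len(pred_gen) != len(pred_ref):
        raise ValueError("predicted profiles must cover the same genes")
    return sum((g - r) ** 2 for g, r in zip(pred_gen, pred_ref)) / len(pred_ref)


def mean_functional_mse(predict: Predictor, cases):
    """Average over (basal, generated_smiles, reference_smiles) test cases."""
    scores = [functional_mse(predict, *case) for case in cases]
    return sum(scores) / len(scores)


def toy_predict(basal, smiles):
    # Illustration only: shift every gene by an amount that depends on the
    # molecule. A trained response model would replace this function.
    return [x + 0.5 * smiles.count("N") for x in basal]


print(functional_mse(toy_predict, [1.0, 2.0], "CCN", "CCNN"))  # 0.25
```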

### 5.4 Gene Inhibitor Prediction

To assess CURE's utility in drug development and test whether it captures functional biological mechanisms, we established a rigorous zero-shot benchmark targeting 10 canonical genes with extensive inhibitor libraries (Sun et al., [2017](https://arxiv.org/html/2605.15243#bib.bib50)). To ensure a fair comparison, we enforced a strict protocol: models trained exclusively on standard drug-perturbation transcriptomes were tasked with generating molecules conditioned on unseen gene knockout (KO) signatures. We use KO profiles as "phenotypic anchors" representing the desired loss-of-function state, positing that a functionally aware model should generate structures that not only resemble known inhibitors but also show potential to bind the target proteins. Therefore, in addition to structural similarity (Morgan fingerprints), we performed molecular docking to estimate the average binding affinity (kcal/mol) of the generated candidates against the target protein structures.
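
The similarity side of this protocol can be sketched as follows. Because the section does not specify the aggregation, we assume that each generated molecule is scored by its best Tanimoto match among the known inhibitors of the target gene and that these scores are averaged; docking scores are averaged in the same way, with more negative values indicating stronger predicted binding. Fingerprints are represented as sets of on-bit indices, and all names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def inhibitor_similarity(generated_fps, inhibitor_fps):
    """Mean, over generated molecules, of the best Tanimoto match to any known inhibitor."""
    best = [max(tanimoto(g, ref) for ref in inhibitor_fps) for g in generated_fps]
    return sum(best) / len(best)


def mean_affinity(docking_scores):
    """Average docking score in kcal/mol; lower (more negative) means stronger predicted binding."""
    return sum(docking_scores) / len(docking_scores)
```

Taking the best match per molecule rewards candidates that resemble any one known inhibitor, which suits inhibitor libraries that span several unrelated chemotypes.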

As detailed in Table [2](https://arxiv.org/html/2605.15243#S5.T2), CURE outperforms all baselines on every target in both evaluation dimensions. In terms of chemical structure, our method achieves the highest Morgan similarity on all ten targets (though the margin over GexMolGen is small for CTSK, 0.751 versus 0.749), suggesting that the generated molecules share critical pharmacophores with known inhibitors. More importantly, in terms of physical interaction, CURE achieves the strongest binding affinities (lowest docking scores) on every target, ranging from -9.15 to -8.28 kcal/mol, whereas no baseline scores below -7.55 kcal/mol. This agreement between high structural similarity and strong binding potential suggests that CURE does not merely memorize chemical patterns but extracts mechanism-specific signals from transcriptomic perturbations. By translating phenotypic knockout signals into high-affinity inhibitor structures without explicit protein-structure training, CURE shows strong potential for function-oriented de novo drug design. Details are in [Section A.4](https://arxiv.org/html/2605.15243#A1.SS4).

### 5.5 Performance Evaluation of TFE

To evaluate the effectiveness of TFE and its translational value, we used it as a drug screener: the molecular features $\mathbf{z}$ extracted by TFE serve as structural probes to query molecular fingerprints in a large-scale drug database via a top-$k$ nearest-neighbor search. Table [3](https://arxiv.org/html/2605.15243#S5.T3) reports screening performance at different search thresholds $k$ using three metrics: the average structural similarity (Morgan) between $\mathbf{z}$ and the top-$k$ retrieved candidates; the probability that the ground-truth drug appears among the retrieved candidates (Hit Rate); and the functional difference in predicted gene-expression effects between the retrieved candidates and the ground-truth drug (PRnet MSE). Increasing $k$ reveals a typical trade-off: structural similarity decreases as more distant neighbors are included, while the Hit Rate rises and functional alignment improves (lower PRnet MSE). Notably, at $k=10$ the model achieves a Hit Rate above 90% (0.9192), demonstrating a strong ability to identify target drugs from functional transcriptome input. CURE can therefore narrow a large chemical library to a clinically manageable shortlist (e.g., 10-20 compounds) with high structural and functional fidelity, offering a pragmatic option for time-sensitive therapeutic applications. Details are in [Section A.3](https://arxiv.org/html/2605.15243#A1.SS3).
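
The screening procedure is a top-$k$ nearest-neighbor search followed by a hit check. A minimal sketch, assuming both the probe $\mathbf{z}$ and the library entries are plain numeric vectors compared by cosine similarity; the similarity measure, the dict-based library, and the function names are our assumptions rather than details given in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query, library, k):
    """Names of the k library entries most similar to the query vector."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]


def hit_rate(queries, true_drugs, library, k):
    """Fraction of queries whose ground-truth drug is among the top-k hits."""
    hits = sum(truth in top_k(q, library, k) for q, truth in zip(queries, true_drugs))
    return hits / len(queries)
```

Sorting the whole library costs O(n log n) per query, which is fine for a sketch; a large database would more likely sit behind an approximate nearest-neighbor index.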

### 5.6 Biochemical Interpretability Analysis

To systematically evaluate the interpretability of CURE, we analyzed the model’s learned representations from two complementary dimensions: the biological relevance of the functional latent space and the chemical structural fidelity of the generated molecules\.

Biological Interpretability Analysis. Because CURE relies on phenotypic changes without explicit affinity supervision, we used stratified UMAP projections to check whether its latent space reflects mechanistic principles. First, projecting distinct inhibitors within fixed cellular backgrounds (Figure [6](https://arxiv.org/html/2605.15243#A3.F6), top) revealed discrete clusters by inhibitor type. This separation suggests that the model encodes mechanism-specific signatures, mapping perturbations to their mechanism of action (MoA) rather than fitting noise. Second, visualizing identical inhibitors across diverse cell lines (Figure [6](https://arxiv.org/html/2605.15243#A3.F6), bottom) showed stratification by cellular identity, indicating that the model adapts its functional representations to the biological context rather than overfitting. Together, these results suggest that the latent space disentangles functional drug effects from cellular backgrounds, supporting function-oriented drug discovery.

Chemical Structural Interpretability Analysis. For the chemical structural analysis, we examined the generative diversity and structural logic of the output molecules through stochastic multi-sampling, visualizing multiple molecules generated from the same transcriptomic condition (Figure [5](https://arxiv.org/html/2605.15243#A3.F5) and Table [7](https://arxiv.org/html/2605.15243#A3.T7)). Although the generated molecules keep high similarity scores (Fraggle/Morgan) to the reference drugs, their SMILES representations differ substantially. The generated molecules are not copies of the training targets, yet they share critical functional groups (pharmacophores) and local chemical environments with them. This suggests that the model has learned how specific chemical substructures drive transcriptomic changes.
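
The two observations above, high similarity to the reference drug but distinct structures across samples, can be quantified with a pair of Tanimoto-based scores. A minimal sketch, assuming fingerprints are given as sets of on-bit indices and defining internal diversity as one minus the mean pairwise similarity among samples drawn for the same condition; these definitions and names are our assumptions.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_to_reference(sample_fps, reference_fp):
    """Mean Tanimoto similarity of each sampled molecule to the reference drug."""
    return sum(tanimoto(f, reference_fp) for f in sample_fps) / len(sample_fps)


def internal_diversity(sample_fps):
    """1 - mean pairwise Tanimoto similarity among samples for one condition."""
    pairs = list(combinations(sample_fps, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For example, samples `{1, 2, 3}`, `{1, 2, 4}`, and `{1, 3, 4}` each score 0.75 against the reference `{1, 2, 3, 4}`, yet every pair shares only half its bits, giving an internal diversity of 0.5.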

### 5.7 Ablation Studies

We validated the contribution of each module, as detailed in Table [4](https://arxiv.org/html/2605.15243#S5.T4). On the single-cell Tahoe dataset, removing the Heterogeneity-Aware Aggregator (TFE-H) caused a marked decline in both distributional alignment and structural similarity: Distance rises from 27.6223 to 30.6223 and Morgan Sim. falls from 0.6114 to 0.4924. This indicates that TFE-H is essential for distilling robust perturbation signals from noisy, heterogeneous cellular data. The Interaction (TFE-I) and Alignment (TFE-A) modules proved foundational: on L1000, removing either one led to functional model collapse, with Validity dropping from 0.9350 to 0.3000 without TFE-A and to 0.2400 without TFE-I. The dual-view alignment is also critical: on Tahoe, removing the local fingerprint constraint reduced Morgan Sim. from 0.6114 to 0.3219 (and removing the global view reduced it to 0.2674), a clear loss of pharmacophore fidelity that supports our multi-domain alignment strategy for preserving biological activity.

## 6 Conclusion

In this work, we presented CURE, a TBDD framework that suppresses transcriptomic noise through heterogeneity-aware aggregation and bridges the cross-modal domain gap inherent in target-free generation through dual-view alignment. Extensive validation shows that CURE achieves superior structural accuracy and high functional consistency.

## Acknowledgments

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA0480102) and the National Science and Technology Major Project (2023ZD0120901).

## Impact Statement

This paper presents work whose goal is to advance the field of machine learning for drug discovery\. The proposed framework generates candidate drug molecules conditioned on transcriptomic perturbation data, which could accelerate early\-stage therapeutic discovery\. While the generated molecules require extensive experimental validation before any clinical consideration, we acknowledge the dual\-use potential inherent in generative molecular design\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.

## Reproducibility Statement

To ensure the reproducibility of our work, we have made our source code available at [https://github.com/EdwardCurry/CURE_TBDD](https://github.com/EdwardCurry/CURE_TBDD). Our experiments used exclusively open-access data: the L1000 dataset (bulk transcriptomic profiles) (Subramanian et al., [2017](https://arxiv.org/html/2605.15243#bib.bib3); Gao et al., [2019](https://arxiv.org/html/2605.15243#bib.bib29)), Tahoe-100M (single-cell data) (Zhang et al., [2025](https://arxiv.org/html/2605.15243#bib.bib4)), and ExCAPE (gene inhibitor information) (Sun et al., [2017](https://arxiv.org/html/2605.15243#bib.bib50)). All hyperparameters used for training are documented in the configuration files within the code repository and in [Section D.1](https://arxiv.org/html/2605.15243#A4.SS1). For detailed implementation and reproduction steps, please refer to the provided code and README documentation.

## References

- A. K. Adduri, D. Gautam, B. Bevilacqua, A. Imran, R. Shah, M. Naghipourfar, N. Teyssier, R. Ilango, S. Nagaraj, M. Dong, et al. (2025) Predicting cellular responses to perturbation across diverse contexts with state. bioRxiv, pp. 2025–06.
- A. Alakhdar, B. Poczos, and N. Washburn (2024) Diffusion models in de novo drug design. Journal of Chemical Information and Modeling 64(19), pp. 7238–7256.
- Q. Bai, T. Xu, J. Huang, and H. Perez-Sanchez (2024) Geometric deep learning methods and applications in 3d structure-based drug design. Drug Discovery Today 29(7), pp. 104024.
- Y. Baran, A. Bercovich, A. Sebe-Pedrós, Y. Lubling, A. Giladi, E. Chomsky, Z. Meir, M. Hoichman, A. Midber, and A. Tanay (2019) MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions. Genome Biology 20(1), pp. 206.
- C. Bunne, Y. Roohani, Y. Rosen, A. Gupta, X. Zhang, M. Roed, T. Alexandrov, M. AlQuraishi, P. Brennan, D. B. Burkhardt, et al. (2024) How to build the virtual cell with artificial intelligence: priorities and opportunities. Cell 187(25), pp. 7045–7063.
- J. Cheng, X. Pan, Y. Fang, K. Yang, Y. Xue, Q. Yan, and Y. Yuan (2024) GexMolGen: cross-modal generation of hit-like molecules via large language model encoding of gene expression signatures. Briefings in Bioinformatics 25(6), pp. bbae525.
- A. M. Fahim (2025) Structure-based drug design; computational strategies in drug discovery; antihypertensive agents; antiviral drugs; molecular docking; qsar; pharmacological insights. Computational Biology and Chemistry, pp. 108663.
- L. Fu, S. Shi, J. Yi, N. Wang, Y. He, Z. Wu, J. Peng, Y. Deng, W. Wang, C. Wu, et al. (2024) ADMETlab 3.0: an updated comprehensive online admet prediction platform enhanced with broader coverage, improved performance, api functionality and decision support. Nucleic Acids Research 52(W1), pp. W422–W431.
- Y. Gao, S. Kim, Y. Lee, and J. Lee (2019) Cellular stress-modulating drugs can potentially be identified by in silico screening with connectivity map (cmap). International Journal of Molecular Sciences 20(22), pp. 5601.
- R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science 4(2), pp. 268–276.
- L. L. Grant and C. S. Sit (2021) De novo molecular drug design benchmarking. RSC Medicinal Chemistry 12(8), pp. 1273–1280.
- J. Guan, X. Zhou, Y. Yang, Y. Bao, J. Peng, J. Ma, Q. Liu, L. Wang, and Q. Gu (2024) DecompDiff: diffusion models with decomposed priors for structure-based drug design. arXiv preprint arXiv:2403.07902.
- C. Hafemeister and F. Halbritter (2023) Single-cell rna-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates. bioRxiv, pp. 2023–03.
- G. Heimberg, T. Kuo, D. J. DePianto, O. Salem, T. Heigl, N. Diamant, G. Scalia, T. Biancalani, S. J. Turley, J. R. Rock, et al. (2025) A cell atlas foundation model for scalable search of similar human cells. Nature 638(8052), pp. 1085–1094.
- L. Hetzel, S. Boehm, N. Kilbertus, S. Günnemann, F. Theis, et al. (2022) Predicting cellular responses to novel drug perturbations at a single-cell resolution. Advances in Neural Information Processing Systems 35, pp. 26711–26722.
- J. Ho and T. Salimans (2022) Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.
- E. Hoogeboom, V. G. Satorras, C. Vignac, and M. Welling (2022) Equivariant diffusion for molecule generation in 3d. In International Conference on Machine Learning, pp. 8867–8887.
- C. Hsieh, J. Wen, S. Lin, T. Tseng, J. Huang, H. Huang, and H. Juan (2023) scDrug: from single-cell rna-seq to drug response prediction. Computational and Structural Biotechnology Journal 21, pp. 150–157.
- F. Hu, D. Chen, Q. Liu, and S. Wu (2025) Improving multi-task gnns for molecular property prediction via missing label imputation. Machine Intelligence Research 22(1), pp. 131–144.
- C. Isert, K. Atz, and G. Schneider (2023) Structure-based drug design with geometric deep learning. Current Opinion in Structural Biology 79, pp. 102548.
- Y. Ji, M. Lotfollahi, F. A. Wolf, and F. J. Theis (2021) Machine learning for perturbational single-cell omics. Cell Systems 12(6), pp. 522–537.
- W. Jin, R. Barzilay, and T. Jaakkola (2020) Hierarchical generation of molecular graphs using structural motifs. In International Conference on Machine Learning, pp. 4839–4848.
- K. Kaitoh and Y. Yamanishi (2021) TRIOMPHE: transcriptome-based inference and generation of molecules with desired phenotypes by machine learning. Journal of Chemical Information and Modeling 61(9), pp. 4303–4320.
- C. Li and Y. Yamanishi (2025) Gx2Mol: de novo generation of hit-like molecules from gene expression profiles. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 333–349.
- G. Liu, J. Xu, T. Luo, and M. Jiang (2024) Graph Diffusion Transformers for multi-conditional molecular generation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
- M. Lotfollahi, F. A. Wolf, and F. J. Theis (2019) scGen predicts single-cell perturbation responses. Nature Methods 16(8), pp. 715–721.
- O. Méndez-Lucio, B. Baillif, D. Clevert, D. Rouquié, and J. Wichard (2020) De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nature Communications 11(1), pp. 10.
- B. P. Munson, M. Chen, A. Bogosian, J. F. Kreisberg, K. Licon, R. Abagyan, B. M. Kuenzi, and T. Ideker (2024) De novo generation of multi-target compounds using deep generative chemistry. Nature Communications 15(1), pp. 3636.
- W. Peebles and S. Xie (2023) Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4195–4205.
- X. Peng, J. Guan, Q. Liu, and J. Ma (2023) MolDiff: addressing the atom-bond inconsistency problem in 3d molecule diffusion generation. arXiv preprint arXiv:2305.07508.
- X. Qi, L. Zhao, C. Tian, Y. Li, Z. Chen, P. Huo, R. Chen, X. Liu, B. Wan, S. Yang, et al. (2024) Predicting transcriptional responses to novel chemical perturbations using deep generative model for drug discovery. Nature Communications 15(1), pp. 9256.
- Y. Roohani, K. Huang, and J. Leskovec (2024) Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nature Biotechnology 42(6), pp. 927–935.
- A. V. Sadybekov and V. Katritch (2023) Computational approaches streamlining drug discovery. Nature 616(7958), pp. 673–685.
- M. Saini, N. Mehra, G. Kumar, R. Paul, and B. Kovács (2025) Molecular and structure-based drug design: from theory to practice. In Advances in Pharmacology, Vol. 103, pp. 121–138.
- A. Schneuing, C. Harris, Y. Du, K. Didi, A. Jamasb, I. Igashov, W. Du, C. Gomes, T. L. Blundell, P. Lio, et al. (2024) Structure-based drug design with equivariant diffusion models. Nature Computational Science 4(12), pp. 899–909.
- S. R. Srivatsan, J. L. McFaline-Figueroa, V. Ramani, L. Saunders, J. Cao, J. Packer, H. A. Pliner, D. L. Jackson, R. M. Daza, L. Christiansen, et al. (2020) Massively multiplex chemical transcriptomics at single-cell resolution. Science 367(6473), pp. 45–51.
- A. Subramanian, R. Narayan, S. M. Corsello, D. D. Peck, T. E. Natoli, X. Lu, J. Gould, J. F. Davis, A. A. Tubelli, J. K. Asiedu, et al. (2017) A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171(6), pp. 1437–1452.
- J. Sun, N. Jeliazkova, V. Chupakhin, J. Golib-Dzib, O. Engkvist, L. Carlsson, J. Wegner, H. Ceulemans, I. Georgiev, V. Jeliazkov, et al. (2017) ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics. Journal of Cheminformatics 9(1), pp. 17.
- B. Van de Sande, J. S. Lee, E. Mutasa-Gottgens, B. Naughton, W. Bacon, J. Manning, Y. Wang, J. Pollard, M. Mendez, J. Hill, et al. (2023) Applications of single-cell rna sequencing in drug discovery and development. Nature Reviews Drug Discovery 22(6), pp. 496–520.
- L. Wang, C. Song, Z. Liu, Y. Rong, Q. Liu, and S. Wu (2025) Diffusion models for molecules: a survey of methods and tasks. arXiv preprint arXiv:2502.09511.
- M. Wang, Z. Wang, H. Sun, J. Wang, C. Shen, G. Weng, X. Chai, H. Li, D. Cao, and T. Hou (2022) Deep learning approaches for de novo drug design: an overview. Current Opinion in Structural Biology 72, pp. 135–144.
- X. Wang, K. Song, L. Li, and L. Chen (2018) Structure-based drug design strategies and challenges. Current Topics in Medicinal Chemistry 18(12), pp. 998–1006.
- X. Wei, J. Dong, and F. Wang (2022) scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38(13), pp. 3377–3384.
- J. A. Weller and R. Rohs (2024) Structure-based drug design with a deep hierarchical generative model. Journal of Chemical Information and Modeling 64(16), pp. 6450–6463.
- T. Xiao, C. Cui, H. Zhu, and V. G. Honavar (2024) MolBind: multimodal alignment of language, molecules, and proteins. arXiv preprint arXiv:2403.08167.
- Y. You, R. Zhou, J. Park, H. Xu, C. Tian, Z. Wang, and Y. Shen (2024) Latent 3d graph diffusion.
- J. Zhang, A. A. Ubas, R. de Borja, V. Svensson, N. Thomas, N. Thakar, I. Lai, A. Winters, U. Khan, M. G. Jones, et al. (2025) Tahoe-100M: a giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv, pp. 2025–02.
- Z. Zhou, Y. Li, P. Hong, and H. Xu (2025) Multimodal fusion with relational learning for molecular property prediction. Communications Chemistry 8(1), pp. 200.

## Appendix A Methodology Details

### A.1 Heterogeneity-Aware Transcriptome Aggregation Details

Current molecular generation methods struggle to make effective use of single-cell transcriptome data because of its high sparsity and substantial technical noise. Naive aggregation (e.g., global averaging) tends to collapse cellular heterogeneity, so the model learns homogenized signals rather than robust biological responses. To address this, we designed the Heterogeneity-Aware Transcriptome Aggregation (TFE-H) module. It functions as a structured denoiser, using biological priors to smooth out technical noise while preserving fine-grained sub-population distributions. The process has two stages:

1. Manifold Projection via SCimilarity: To mitigate the curse of dimensionality and dropout-induced sparsity in the raw profiles ($D > 60{,}000$ genes), we use the SCimilarity framework to project raw expression profiles onto a dense, biologically informed latent manifold. SCimilarity is a deep metric-learning foundation model trained on approximately 7.9 million single-cell profiles from 56 studies spanning multiple tissues and diseases. It maps transcriptionally similar cells to nearby points in a low-dimensional embedding space ($d = 128$). Using this pre-trained encoder, we perform semantic compression, extracting cell-state features that are robust to technical noise before any aggregation takes place.
2. Cycle-Stratified Structured Aggregation: Instead of indiscriminate global pooling, we construct a Structured Feature Matrix to represent the cell population. For a given perturbation condition, we partition the $d$-dimensional cell embeddings by cell-cycle phase: G1 (Gap 1), S (Synthesis), and G2/M (Gap 2/Mitosis). Because the sub-populations in each phase differ and can respond to a drug in phase-specific ways, this stratification retains heterogeneity information that global pooling would discard. To balance the signal-to-noise ratio, we sample hierarchically: we first pool cells locally within these biologically coherent clusters to reduce variance, then sample a fixed total of $N = 128$ representative feature vectors in proportion to the population's cell-cycle composition, and merge them into a distribution-level condition embedding. The input to the diffusion model therefore captures the shape of the cellular response distribution rather than a single mean vector.
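The two-stage procedure above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the SCimilarity embeddings and per-cell cycle labels are taken as given inputs, "local pooling" is assumed to mean averaging a small random subset of same-phase cells (`pool_size` is a hypothetical parameter), and the proportional allocation of the $N = 128$ slots uses a largest-remainder rule chosen here for illustration.

```python
import math
import random
from collections import defaultdict


def allocate_counts(phase_sizes, n_total):
    """Split n_total slots across phases in proportion to their cell counts
    (largest-remainder rounding, so the counts always sum to n_total)."""
    total = sum(phase_sizes.values())
    quotas = {p: n_total * s / total for p, s in phase_sizes.items()}
    counts = {p: math.floor(q) for p, q in quotas.items()}
    leftover = n_total - sum(counts.values())
    # Hand the remaining slots to the phases with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - counts[p], reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1
    return counts


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def cycle_stratified_aggregate(embeddings, phases, n_total=128, pool_size=8, seed=0):
    """Turn n per-cell embeddings into n_total pooled tokens.

    embeddings: one d-dim list per cell (e.g. 128-d SCimilarity vectors, assumed given)
    phases:     per-cell cell-cycle label such as "G1", "S", "G2/M" (assumed given)
    """
    rng = random.Random(seed)
    by_phase = defaultdict(list)
    for vec, phase in zip(embeddings, phases):
        by_phase[phase].append(vec)

    counts = allocate_counts({p: len(c) for p, c in by_phase.items()}, n_total)
    tokens = []
    for phase in sorted(by_phase):
        cells = by_phase[phase]
        for _ in range(counts[phase]):
            # Local pooling (assumed): average a random subset of same-phase cells.
            group = rng.sample(cells, min(pool_size, len(cells)))
            tokens.append(mean_vector(group))
    return tokens  # n_total x d, grouped by phase
```

Allocating slots by phase proportion keeps rare phases represented in the token set, while the within-phase averaging is what reduces per-cell noise.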

TFE-H Ablation Study. To validate both the architectural design of TFE-H and the premise behind heterogeneity-aware aggregation, we conducted ablation studies. First, we compared our Cycle-Stratified Structured Aggregation against a Naive Bulk Averaging baseline, in which single-cell data are simply averaged into a pseudo-bulk vector. As shown in Table [5](https://arxiv.org/html/2605.15243#A1.T5), naive averaging degrades performance (e.g., the Fréchet Distance rises from 27.62 to 30.62 and Morgan similarity drops from 0.61 to 0.49), indicating that collapsing the population distribution discards critical pharmacological signal. In contrast, our structured approach, which preserves cycle-specific variance and sub-population structure, yields better structural fidelity and functional alignment. This supports the view that CURE's gains stem from explicitly modeling cellular heterogeneity, bridging the resolution gap between noisy single-cell profiles and precise molecular generation. Second, comparisons against MLP and scratch-trained CNN (Conv) baselines (Table [5](https://arxiv.org/html/2605.15243#A1.T5)) indicate that TFE-H provides the inductive bias needed to extract information-dense features from these structured inputs and to bridge the modality gap between transcriptomic profiles and molecular structures.

Table 5: Ablation study of TFE-H. We compared different architectures and strategies.

| Metric | w/o TFE-H | MLP | Conv | Naive Bulk Averaging | Ours |
|---|---|---|---|---|---|
| Coverage ↑ | 63.6% | 63.6% | 81.8% | 90.9% | 90.9% |
| Unique ↑ | 0.45 | 0.79 | 0.84 | 0.86 | 0.88 |
| Similarity ↑ | 0.58 | 0.78 | 0.70 | 0.74 | 0.81 |
| Distance ↓ | 73.45 | 38.26 | 33.15 | 30.62 | 27.62 |
| QED ↑ | 0.39 | 0.44 | 0.42 | 0.46 | 0.49 |
| Fraggle Sim. ↑ | 0.53 | 0.58 | 0.63 | 0.64 | 0.73 |
| Morgan Sim. ↑ | 0.49 | 0.47 | 0.57 | 0.49 | 0.61 |

### A.2 Formal Specification of TFE Modules

We provide the complete mathematical specification with tensor dimensions for each TFE sub\-module\.

TFE-H (Heterogeneity-Aware Aggregation). For single-cell input $X_{sc} \in \mathbb{R}^{n \times D}$ ($n$ cells, $D > 60{,}000$ genes):

1. Manifold Projection: $H = f_{\mathrm{SCimilarity}}(X_{sc}) \in \mathbb{R}^{n \times 128}$, using a frozen pre-trained encoder.
2. Cycle-Stratified Aggregation: Partition $H$ into $\lbrace H_{G1}, H_{S}, H_{G2/M} \rbrace$ by cell-cycle phase. Perform local pooling within each phase, then proportionally sample $N = 128$ representative vectors to form $T_{pre}, T_{post} \in \mathbb{R}^{128 \times d}$.

For bulk input, TFE-H is bypassed and $T_{pre}, T_{post} \in \mathbb{R}^{1 \times d}$ are used directly.

TFE-I (Bidirectional Perturbation Signal Interaction). Input: $T_{pre}, T_{post} \in \mathbb{R}^{N \times d}$. The module consists of 3 stacked interaction blocks, each applying symmetric cross-attention followed by self-attention:

$$
\begin{aligned}
T'_{pre} &= \mathrm{SelfAttn}\left(\mathrm{CrossAttn}(Q = T_{pre}, K = T_{post}, V = T_{post})\right), \\
T'_{post} &= \mathrm{SelfAttn}\left(\mathrm{CrossAttn}(Q = T_{post}, K = T_{pre}, V = T_{pre})\right).
\end{aligned}
\tag{10}
$$

After 3 blocks, the two streams are concatenated and fused via self-attention followed by mean pooling to produce the perturbation representation $z \in \mathbb{R}^{d_z}$.
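The interaction pattern of Eq. (10) can be illustrated with a toy, parameter-free version. This sketch assumes single-head scaled dot-product attention with no learned projections, residual connections, or normalization (all of which a real implementation would normally include), and it assumes the two streams are concatenated along the token axis before the final self-attention and mean pooling. The function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention on lists of token vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def interaction_block(t_pre, t_post):
    # Symmetric cross-attention: each stream queries the other stream ...
    pre_x = attention(t_pre, t_post, t_post)
    post_x = attention(t_post, t_pre, t_pre)
    # ... followed by self-attention within each stream, as in Eq. (10).
    return attention(pre_x, pre_x, pre_x), attention(post_x, post_x, post_x)


def tfe_i(t_pre, t_post, n_blocks=3):
    for _ in range(n_blocks):
        t_pre, t_post = interaction_block(t_pre, t_post)
    fused = t_pre + t_post                  # concatenate along the token axis (assumed)
    fused = attention(fused, fused, fused)  # fusion self-attention
    d = len(fused[0])
    return [sum(tok[c] for tok in fused) / len(fused) for c in range(d)]  # mean pool -> z
```

Without learned projections, the output dimension $d_z$ in this toy simply equals $d$; in the actual module, projection layers would set $d_z$ independently.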

TFE-A (Dual-View Molecular Domain Alignment). View 1 (Global): A pre-trained HierVAE encoder $Q$ provides targets $(\mu_{enc}, \sigma_{enc}) = Q(X_G)$. TFE projects $z$ to $(\mu_f, \sigma_f) = g_{\mathrm{proj}}(z)$, and the two are aligned via $\mathcal{L}_{\text{align}} = \lVert \mu_{enc} - \mu_f \rVert^2 + \lVert \sigma^2_{enc} - \sigma^2_f \rVert^2$. View 2 (Local): A projection head produces $A = h_{\mathrm{proj}}(z) \in \mathbb{R}^{2048}$, which is matched against the target Morgan fingerprint $B = \mathrm{MorganFP}(G) \in \mathbb{R}^{2048}$. The loss combines a masked InfoNCE term (positives are the diagonal pairs; off-diagonal entries that share the same SMILES label are masked to $-\infty$; $\tau = 0.1$) with sparse regression (non-zero positions weighted by $w = \log(1 + B_{pos})$; zero positions penalized with $\alpha = 0.4$ and $\lambda = 0.15$). Total: $\mathcal{L} = \mathcal{L}_{\text{global}} + \gamma \mathcal{L}_{\text{local}}$. Training protocol: TFE is trained for approximately 30k steps and then frozen; PMD is trained for approximately 40k steps.

### A.3 Details of the TFE Evaluation

The evaluation workflow begins with transcriptomic data from the pre- and post-perturbation states, at either the bulk or the single-cell level. The TFE learns to encode the gene-expression changes caused by the disease and outputs a perturbation representation $\mathbf{z}$.

The perturbation representation $\mathbf{z}$, which carries the relevant therapeutic information, is then used as a query. We designed a cascaded filter with Top-K retrieval to rapidly identify compounds in a large library that are structurally similar to this representation. The core of the filter is a pre-built molecular fingerprint database searched by Top-K nearest neighbors, with the Tanimoto similarity coefficient quantifying the similarity between the query $\mathbf{z}$ and each database molecule's fingerprint vector $\mathbf{f}_d$. This yields a set of known compounds most similar to $\mathbf{z}$, which are then treated as potential drug candidates for the specific patient or disease state.
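A minimal sketch of the Top-K retrieval step, assuming the query has already been mapped into the same 2048-dimensional fingerprint space as the database (for example by a projection head such as $h_{\mathrm{proj}}$ in Appendix A.2; this mapping is an assumption, not stated in the text above). Because such a query is real-valued, the sketch uses the continuous generalization of the Tanimoto coefficient, $T(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b} / (\lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - \mathbf{a} \cdot \mathbf{b})$, which reduces to the usual bit-vector Tanimoto for 0/1 inputs. The cascaded pre-filter is omitted and the dictionary database layout is hypothetical.

```python
import heapq


def tanimoto(a, b):
    """Continuous Tanimoto similarity; equals bit-vector Tanimoto for 0/1 inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom > 0 else 0.0


def top_k(query, database, k=10):
    """Return the k (molecule_id, similarity) pairs most similar to the query.

    database: dict mapping a molecule id to its fingerprint vector (hypothetical layout)
    """
    scored = ((mol_id, tanimoto(query, fp)) for mol_id, fp in database.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

`heapq.nlargest` keeps only `k` candidates in memory, which matters when the library is much larger than `k`.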

### A.4 Details of the Gene Inhibitor (Extended Analysis of Molecular Docking and Binding Affinity)

To assess whether the molecules generated by CURE exhibit physical binding potential consistent with transcriptomic evidence, we moved beyond chemical similarity metrics to physics\-based simulations\. We employed molecular docking to calculate the binding affinity between the generated candidates and the crystal structures of the target proteins\. This section details the quantitative results across the benchmark set and provides a qualitative visual analysis of the docking poses\.

Quantitative Analysis: Surpassing Baselines in Physical Binding. As presented in Table [2](https://arxiv.org/html/2605.15243#S5.T2), CURE achieves the lowest binding energy (i.e., the strongest predicted affinity) across all 10 target genes. For targets such as EGFR and PIK3CA, its margin over the baseline methods is especially large. This advantage is non-trivial: the baseline models (e.g., TRIOMPHE, GexMolGen) often fail to generate valid pharmacophores that fit tightly into specific protein pockets, resulting in higher docking energies (weaker binding). In contrast, CURE, driven by the heterogeneity-aware aggregation module (TFE-H), appears to capture the structural constraints required to induce the target phenotype, which translates into high-affinity binding structures.

Qualitative Case Study: Visualizing the EGFR Pocket. To build intuition for these results, we visualized the docking poses for the Epidermal Growth Factor Receptor (EGFR) target. Figure [4](https://arxiv.org/html/2605.15243#A1.F4) compares the ground-truth inhibitor, Erlotinib, with a molecule generated by CURE in the zero-shot setting.

Ground Truth (Left): The known inhibitor Erlotinib docks into the ATP-binding pocket of EGFR with an affinity of −7.302 kcal/mol. Its quinazoline scaffold is well positioned to interact with the hinge-region residues.

Generated Molecule (Right): The molecule randomly sampled from CURE occupies the same orthosteric binding site and achieves an even stronger binding affinity of −9.164 kcal/mol. Visual inspection shows that its spatial configuration is highly compatible with the pocket geometry: it adopts a conformation that maximizes contact with the hydrophobic back pocket while positioning polar groups to potentially form stabilizing hydrogen bonds.

Crucially, although the generated molecule differs in exact chemical composition (SMILES) from Erlotinib, it preserves the essential 3D topology required for inhibition. This suggests that the model has performed scaffold hopping, identifying a novel chemical entity that plausibly fulfills the same functional role.

Bridging TBDD and SBDD: Function Informing Structure. A notable implication of these results lies in the relationship between TBDD and SBDD. Conventionally, achieving high docking scores is the primary objective of SBDD, which explicitly conditions generation on the 3D geometry of the protein pocket. CURE, however, never sees the protein structure; it is trained solely on transcriptomic perturbation data. The fact that CURE achieves docking scores comparable to, or even exceeding, those of known inhibitors suggests that the transcriptomic fingerprint of a perturbation may contain implicit information about the target's structural requirements. By learning to invert the gene-expression signature, the model appears to capture the pharmacophoric features required to trigger that signature. Function-oriented design may therefore serve as a viable pathway to structurally competent ligands, even when crystal structures are unavailable or difficult to obtain.

![Refer to caption](https://arxiv.org/html/2605.15243v1/x4.png) Figure 4: Molecular docking validation against the EGFR kinase domain. The binding affinity of the CURE-generated candidate was evaluated with AutoDock Vina and compared with that of the inhibitor Erlotinib (ground truth). The generated molecule, randomly sampled from CURE, exhibited a binding affinity of −9.164 kcal/mol, surpassing the reference Erlotinib (−7.302 kcal/mol) under identical simulation conditions. This indicates more favorable predicted binding within the pocket.
### A.5 Pseudocode for Training Loss

Algorithm 1: Pseudocode for the View-1 loss $\mathcal{L}_{\text{global}}$

1. Input: graph matrix $\mathbf{X}_G$, TFE $F$, transcription signals $\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}}$, VAE encoder $Q(z \mid \mathbf{X}_G)$, VAE decoder $P(\mathbf{X}_G \mid z)$, KL weight $\lambda_{\text{KL}}$
2. Output: $\mathcal{L}_{\text{global}}$
3. $(\mu_{\text{enc}}, \sigma_{\text{enc}}) \leftarrow z_{\text{enc}} \leftarrow Q(\mathbf{X}_G)$
4. $(\mu_f, \sigma_f) \leftarrow z_f \leftarrow F(\mathbf{T}_{\mathrm{pre}}, \mathbf{T}_{\mathrm{post}})$
5. $\mathcal{L}_{\text{ELBO}} \leftarrow -\mathbb{E}_{z_{\text{enc}} \sim Q}[\log P(\mathbf{X}_G \mid z_{\text{enc}})] + \lambda_{\text{KL}} D_{\text{KL}}[Q(z_{\text{enc}} \mid \mathbf{X}_G) \Vert P(z_{\text{enc}})]$ ▷ standard ELBO
6. $\mathcal{L}_{\text{align}} \leftarrow \lVert \mu_{\text{enc}} - \mu_f \rVert^2 + \lVert \sigma_{\text{enc}}^2 - \sigma_f^2 \rVert^2$
7. $\mathcal{L}_{\text{global}} \leftarrow \mathcal{L}_{\text{ELBO}} + \mathcal{L}_{\text{align}}$
8. return $\mathcal{L}_{\text{global}}$
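Lines 5–7 of Algorithm 1 reduce to a few closed-form terms when the encoder outputs a diagonal Gaussian. The sketch below assumes a standard-normal prior $P(z) = \mathcal{N}(0, I)$ (the usual VAE choice; the prior is not specified above) and takes the reconstruction negative log-likelihood as a precomputed number, since that term depends on the HierVAE decoder. The default `kl_weight=1.0` is only a placeholder for $\lambda_{\text{KL}}$.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions.
    Assumes a standard-normal prior."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))


def align_loss(mu_enc, sigma_enc, mu_f, sigma_f):
    """L_align = ||mu_enc - mu_f||^2 + ||sigma_enc^2 - sigma_f^2||^2 (Algorithm 1, line 6)."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_enc, mu_f))
    var_term = sum((a * a - b * b) ** 2 for a, b in zip(sigma_enc, sigma_f))
    return mean_term + var_term


def global_loss(recon_nll, mu_enc, sigma_enc, mu_f, sigma_f, kl_weight=1.0):
    """L_global = L_ELBO + L_align (Algorithm 1, line 7).

    recon_nll: -E_Q[log P(X_G | z_enc)], assumed to be computed by the decoder elsewhere.
    """
    neg_elbo = recon_nll + kl_weight * kl_to_standard_normal(mu_enc, sigma_enc)
    return neg_elbo + align_loss(mu_enc, sigma_enc, mu_f, sigma_f)
```

Matching variances rather than standard deviations in `align_loss` follows line 6 literally; it weights mismatches in wide latent dimensions more heavily.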

Algorithm 2: Pseudocode for the View-2 loss $\mathcal{L}_{\text{local}}$

1. Input: predicted vector $\mathbf{A}$, target fingerprint $\mathbf{B}$, SMILES label list $\mathbf{S}$, temperature $\tau = 0.1$, sparse weight $\lambda = 0.15$, $\alpha = 0.4$
2. Output: $\mathcal{L}_{\text{local}}$
3. $\mathcal{L}_{\text{sparse}} \leftarrow \text{RegressionLoss}(\mathbf{A}, \mathbf{B}, \alpha)$
4. $\mathcal{L}_{\text{InfoNCE}} \leftarrow \text{ContrastLoss}(\mathbf{A}, \mathbf{B}, \mathbf{S}, \tau, \lambda)$
5. $\mathcal{L}_{\text{local}} \leftarrow \mathcal{L}_{\text{sparse}} + \mathcal{L}_{\text{InfoNCE}}$
6. return $\mathcal{L}_{\text{local}}$
7. function ContrastLoss($\mathbf{A}, \mathbf{B}, \mathbf{S}, \tau, \lambda$)
8. Input: $\mathbf{A} \in \mathbb{R}^{b \times 2048}$, $\mathbf{B} \in \mathbb{R}^{b \times 2048}$ (non-negative integers), $\mathbf{S}$ (length $b$), $\tau$, $\lambda$
9. Output: $\mathcal{L}_{\text{InfoNCE}}$
10. $\mathbf{A} \leftarrow \text{Normalize}(\mathbf{A})$, $\mathbf{B} \leftarrow \text{Normalize}(\mathbf{B})$
11. $\mathbf{Matrix} \leftarrow \mathbf{A} \mathbf{B}^{\top} / \tau$ ▷ similarity matrix $\in \mathbb{R}^{b \times b}$
12. $\mathbf{L} \leftarrow [0, 1, \dots, b-1]$ ▷ labels for diagonal elements
13. $\mathbf{Mask}_{\text{same}} \leftarrow \text{Boolean}(S_i = S_j \text{ for all } i, j)$ ▷ mask for identical SMILES labels
14. $\mathbf{Mask}_{\text{same}}[\text{diagonal}] \leftarrow \text{False}$ ▷ exclude diagonal
15. $\mathbf{Matrix}[\mathbf{Mask}_{\text{same}}] \leftarrow -\infty$ ▷ mask off-diagonal same-label entries
16. $\mathcal{L}_{\text{InfoNCE}} \leftarrow \text{CrossEntropy}(\mathbf{Matrix}, \mathbf{L})$
17. if $\lambda > 0$ then
18. $\mathbf{Mask}_{\text{zero}} \leftarrow (\mathbf{B} = 0)$ ▷ mask for zero positions in $\mathbf{B}$
19. $\mathcal{L}_{\text{spa}} \leftarrow \text{Mean}((\mathbf{A} \odot \mathbf{Mask}_{\text{zero}})^2)$
20. $\mathcal{L}_{\text{InfoNCE}} \leftarrow \mathcal{L}_{\text{InfoNCE}} + \lambda \cdot \mathcal{L}_{\text{spa}}$
21. end if
22. return $\mathcal{L}_{\text{InfoNCE}}$
23. end function

24. function RegressionLoss($\mathbf{A}, \mathbf{B}, \alpha$)
25. Input: $\mathbf{A} \in \mathbb{R}^{b \times 2048}$, $\mathbf{B} \in \mathbb{R}^{b \times 2048}$ (non-negative integers), $\alpha$
26. Output: $\mathcal{L}_{\text{sparse}}$
27. $\mathbf{W} \leftarrow \text{ZerosLike}(\mathbf{B})$ ▷ initialize weight matrix
28. $\mathbf{Mask}_{\text{pos}} \leftarrow (\mathbf{B} > 0)$ ▷ non-zero position mask
29. $\mathbf{Mask}_{\text{neg}} \leftarrow (\mathbf{B} = 0)$ ▷ zero position mask
30. $\mathbf{W}[\mathbf{Mask}_{\text{pos}}] \leftarrow \log(1 + \mathbf{B}[\mathbf{Mask}_{\text{pos}}])$ ▷ weights for non-zero positions
31. $\mathcal{L}_{\text{pos}} \leftarrow \text{Sum}(\mathbf{W} \odot (\mathbf{A} - \mathbf{B})^2 \odot \mathbf{Mask}_{\text{pos}}) / (\text{Sum}(\mathbf{Mask}_{\text{pos}}) + \epsilon)$
32. $\mathcal{L}_{\text{neg}} \leftarrow \text{Sum}(\mathbf{A}^2 \odot \mathbf{Mask}_{\text{neg}}) / (\text{Sum}(\mathbf{Mask}_{\text{neg}}) + \epsilon)$
33. $\mathcal{L}_{\text{sparse}} \leftarrow \mathcal{L}_{\text{pos}} + \alpha \cdot \mathcal{L}_{\text{neg}}$
34. return $\mathcal{L}_{\text{sparse}}$
35. end function
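To check the arithmetic of Algorithm 2, here is a dependency-free transcription of `ContrastLoss`, `RegressionLoss`, and their sum. It is a sketch, not the authors' code: it works on plain Python lists (one fingerprint-length row per sample) instead of tensors, assumes L2 normalization for `Normalize`, and uses a small `eps` in both regression denominators. A real training loop would use an autodiff framework.

```python
import math


def l2_normalize(rows, eps=1e-12):
    # "Normalize" in Algorithm 2 is assumed to be row-wise L2 normalization.
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r))
        out.append([x / max(norm, eps) for x in r])
    return out


def contrast_loss(A, B, S, tau=0.1, lam=0.15):
    """Masked InfoNCE plus sparsity penalty (Algorithm 2, lines 7-23)."""
    A, B = l2_normalize(A), l2_normalize(B)
    b = len(A)
    total = 0.0
    for i in range(b):
        logits = []
        for j in range(b):
            if j != i and S[i] == S[j]:
                logits.append(-math.inf)  # same SMILES off the diagonal: not a negative
            else:
                logits.append(sum(x * y for x, y in zip(A[i], B[j])) / tau)
        m = max(logits)  # finite, because the diagonal logit is never masked
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]  # cross-entropy with the diagonal as target
    loss = total / b
    if lam > 0:
        # Penalize predicted mass where the target fingerprint is zero.
        sq = [x * x if y == 0 else 0.0 for ra, rb in zip(A, B) for x, y in zip(ra, rb)]
        loss += lam * sum(sq) / len(sq)
    return loss


def regression_loss(A, B, alpha=0.4, eps=1e-8):
    """Count-weighted regression on on-bits plus alpha-weighted off-bit penalty
    (Algorithm 2, lines 24-35)."""
    pos_sum = neg_sum = 0.0
    pos_cnt = neg_cnt = 0
    for ra, rb in zip(A, B):
        for x, y in zip(ra, rb):
            if y > 0:
                pos_sum += math.log(1 + y) * (x - y) ** 2
                pos_cnt += 1
            elif y == 0:
                neg_sum += x * x
                neg_cnt += 1
    return pos_sum / (pos_cnt + eps) + alpha * neg_sum / (neg_cnt + eps)


def local_loss(A, B, S, tau=0.1, lam=0.15, alpha=0.4):
    """L_local = L_sparse + L_InfoNCE (Algorithm 2, lines 3-5)."""
    return regression_loss(A, B, alpha) + contrast_loss(A, B, S, tau, lam)
```

Masking duplicate SMILES to $-\infty$ means two rows that describe the same molecule are never pushed apart as negatives; when every off-diagonal entry is masked, the row contributes zero loss.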

## Appendix B Data Construction and Splitting Protocols

In this section, we provide a granular description of the data processing pipelines and the splitting strategies used to construct the evaluation benchmarks described in Section [5](https://arxiv.org/html/2605.15243#S5). All preprocessing steps used the Python libraries RDKit (for molecular informatics) and Scanpy (for transcriptomic data).

### B.1 Dataset Preprocessing

Bulk Cell Data (L1000). We sourced the Level 3 profiles from the LINCS L1000 project (Subramanian et al., [2017](https://arxiv.org/html/2605.15243#bib.bib3)). To ensure data quality, we filtered out experimental instances with low transcriptional consistency scores (distinct specificity below 0.8).

Single-Cell Data (Tahoe-100M). For the Tahoe-100M dataset (Zhang et al., [2025](https://arxiv.org/html/2605.15243#bib.bib4)), we performed standard quality control: cells with a mitochondrial gene percentage above 5% or with fewer than 500 total gene counts were excluded. Raw counts were normalized by library size and log-transformed.
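The single-cell filtering and normalization steps can be sketched without dependencies. Two readings here are assumptions: "total gene counts" is interpreted as total UMI counts per cell, and mitochondrial genes are identified by the human `MT-` prefix. The scale factor `target_sum=1e4` is also an illustrative choice, since the text only says "normalized by library size". In Scanpy, the same steps are usually written with `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)` (after marking `adata.var["mt"]`), boolean filtering on `adata.obs`, `sc.pp.normalize_total`, and `sc.pp.log1p`.

```python
import math


def qc_and_normalize(counts, gene_names, max_pct_mito=5.0, min_total=500, target_sum=1e4):
    """Filter cells by QC thresholds, then library-size normalize and log1p-transform.

    counts:     list of per-cell raw count vectors (cells x genes)
    gene_names: gene symbols aligned with the count vectors
    """
    # Assumption: mitochondrial genes carry the human "MT-" prefix.
    is_mito = [name.upper().startswith("MT-") for name in gene_names]
    kept = []
    for cell in counts:
        total = sum(cell)
        if total < min_total:
            continue  # too few counts (assumed to mean total UMIs per cell)
        pct_mito = 100.0 * sum(c for c, m in zip(cell, is_mito) if m) / total
        if pct_mito > max_pct_mito:
            continue  # high mitochondrial fraction: likely damaged cell
        # Scale each cell to the same library size, then apply log(1 + x).
        kept.append([math.log1p(c * target_sum / total) for c in cell])
    return kept
```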

### B.2 Implementation of OOD Splitting Strategies

To systematically evaluate generalization, we curated three distinct dataset configurations. For each configuration, the final training/validation/test ratio was kept at approximately 85:10:5, but the unit of splitting differed fundamentally.

#### 1\) Random Split \(IID Setting\)\.

All drug\-cell response pairs were pooled and randomly shuffled\. This setting assumes Independent and Identically Distributed \(IID\) data and serves as a baseline to measure the model’s capacity for interpolation within the training distribution\.

#### 2\) Unseen Drugs: The Scaffold\-Split Protocol\.

To strictly enforce the Scaffold-level Split described in the main text, we employed the Bemis-Murcko scaffold decomposition algorithm provided by RDKit. The procedure is as follows:

1. Scaffold Extraction: For every unique drug in the dataset (L1000 and Tahoe-100M), we extracted its molecular scaffold by removing side chains while keeping the ring systems and linkers.
2. Cluster Grouping: Drugs sharing exactly the same Bemis-Murcko scaffold were grouped into clusters.
3. Stratified Partitioning: Instead of splitting individual drugs, we split the scaffold clusters. This ensures that if a scaffold is assigned to the test set, none of the drugs sharing that scaffold appear in the training set.
4. Filtering: Trivial scaffolds (e.g., a single benzene ring) that contain an overwhelming number of drugs were downsampled in the training set to prevent class imbalance, while ensuring that the test set contains complex, structurally distinct scaffolds.
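The cluster-level partitioning in steps 2–3 can be sketched as follows. This illustrates the idea under assumptions rather than reproducing the exact protocol: `scaffold_of` is any caller-supplied function returning a scaffold key for a drug (with RDKit one would typically use `MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)` from `rdkit.Chem.Scaffolds`), clusters are assigned greedily from largest to smallest (a common convention, e.g. in DeepChem's scaffold splitter), and the downsampling of step 4 is left out. Because whole clusters are assigned, no scaffold can appear in more than one partition.

```python
from collections import defaultdict


def scaffold_split(drugs, scaffold_of, frac_train=0.85, frac_valid=0.10):
    """Partition drugs so that every scaffold cluster lands in exactly one split.

    scaffold_of: caller-supplied function drug -> scaffold key (e.g. Bemis-Murcko SMILES)
    """
    clusters = defaultdict(list)
    for drug in drugs:
        clusters[scaffold_of(drug)].append(drug)

    n = len(drugs)
    train, valid, test = [], [], []
    # Largest clusters first: common scaffolds fill training, rarer ones reach test.
    for group in sorted(clusters.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

The realized ratios only approximate 85:10:5, because clusters cannot be split to hit the targets exactly.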

This process guarantees that the test set requires the model to generalize to new chemical spaces rather than recalling neighbors from the training set\.

#### 3\) Unseen Cell Lines: The Hierarchical Biological Split Protocol\.

To implement the Unseen Cell Lines setting, we built a Tissue → Tumor Type → Cell Line hierarchy.

1. Hierarchy Construction: Each cell line was mapped to its primary tissue of origin (e.g., Lung, Kidney, Breast) and its specific disease subtype.
2. Disjoint Separation: We held out entire tissue groups or distinct tumor types for the test set. For instance, if "Lung Tissue" is selected for testing, all cell lines derived from lung tissue are strictly excluded from the training set.
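A minimal sketch of the disjoint separation in step 2, assuming each record carries a `cell_line` field and that the cell-line-to-tissue mapping from step 1 is available as a dictionary (both layouts are hypothetical; the same function works with a tumor-type mapping). The built-in check mirrors the guarantee stated above: no held-out tissue contributes any training record.

```python
def tissue_holdout_split(records, tissue_of, test_tissues):
    """Send every record whose cell line comes from a held-out tissue to the test set.

    records:   list of dicts with a "cell_line" key (hypothetical layout)
    tissue_of: dict cell_line -> tissue (or tumor type), built in step 1
    """
    held_out = set(test_tissues)
    train, test = [], []
    for rec in records:
        target = test if tissue_of[rec["cell_line"]] in held_out else train
        target.append(rec)
    # Sanity check: training and held-out tissues must be disjoint.
    assert not {tissue_of[r["cell_line"]] for r in train} & held_out
    return train, test
```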

## Appendix C More Experimental Results and Discussions

### C.1 Impact of CFG Guidance Strength

We explored the effect of the CFG guidance strength on the generation results (Table [6](https://arxiv.org/html/2605.15243#A3.T6)). As the guidance strength increases, the molecular-generation metrics first rise and then slightly decline. This reveals a trade-off between potency and drug-likeness: low strength yields insufficient potency, whereas high strength causes structural distortion. In this sweep, a strength of 3 performs best overall, and the observed trend offers practical guidance for selecting this hyperparameter in drug-generation tasks. Unlike the previous experiments, which were evaluated on the entire dataset, this sweep was run on a single batch (batch size = 200) to illustrate the trend of the metrics.
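For reference, classifier-free guidance is commonly implemented as $\tilde{\epsilon} = \epsilon_{\text{uncond}} + s(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})$, so that $s = 0$ gives the unconditional prediction and $s = 1$ the purely conditional one. We assume CURE follows this convention, which would be consistent with the weak scale-0 results in Table 6, but the exact parameterization is not given in this section. The sketch also tallies, for each scale, how many Table 6 metrics it wins; this voting rule is our own illustration, not the paper's selection procedure.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance (assumed convention): extrapolate from the
    unconditional toward the conditional noise prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Table 6 values for guidance scales 0..9: (values, higher_is_better). Coverage in percent.
TABLE6 = {
    "Validity": ([0.0550, 0.7025, 0.7250, 0.7500, 0.6950, 0.6950, 0.6800, 0.6750, 0.6750, 0.6300], True),
    "Coverage": ([54.55, 72.73, 72.73, 90.91, 81.82, 81.82, 81.82, 72.73, 72.73, 54.55], True),
    "Unique": ([0.7222, 0.7946, 0.8007, 0.8013, 0.7984, 0.8027, 0.7998, 0.8009, 0.8011, 0.8003], True),
    "Similarity": ([0.7664, 0.9598, 0.9589, 0.9581, 0.9571, 0.9573, 0.9564, 0.9573, 0.9564, 0.9574], True),
    "Distance": ([45.4423, 10.5358, 9.3103, 9.2467, 10.4150, 9.8974, 10.2288, 9.9570, 10.0778, 10.7401], False),
    "Fraggle Sim.": ([0.3289, 0.8734, 0.8708, 0.8896, 0.8785, 0.8705, 0.8843, 0.8800, 0.8654, 0.8600], True),
    "Morgan Sim.": ([0.1281, 0.8036, 0.8076, 0.8279, 0.8187, 0.8083, 0.8239, 0.8024, 0.7906, 0.8029], True),
    "QED": ([0.4435, 0.5435, 0.5629, 0.5673, 0.5635, 0.5650, 0.5664, 0.5671, 0.5672, 0.5632], True),
}


def wins_per_scale(table, n_scales=10):
    """Count, for each scale, the metrics on which it is best (first scale wins ties)."""
    wins = [0] * n_scales
    for values, higher_is_better in table.values():
        best = max(values) if higher_is_better else min(values)
        wins[values.index(best)] += 1
    return wins
```

On the single-batch numbers of Table 6, scale 3 wins six of the eight metrics; scale 1 wins Similarity and scale 5 wins Unique.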

Table 6: Performance of the model trained without CFG ("train w/o cfg") and under different CFG guidance scales (0–9). ↑ means higher is better; ↓ means lower is better.

| Metric | train w/o cfg | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Validity ↑ | 0.1934 | 0.0550 | 0.7025 | 0.7250 | 0.7500 | 0.6950 | 0.6950 | 0.6800 | 0.6750 | 0.6750 | 0.6300 |
| Coverage ↑ | 63.64% | 54.55% | 72.73% | 72.73% | 90.91% | 81.82% | 81.82% | 81.82% | 72.73% | 72.73% | 54.55% |
| Unique ↑ | 0.7942 | 0.7222 | 0.7946 | 0.8007 | 0.8013 | 0.7984 | 0.8027 | 0.7998 | 0.8009 | 0.8011 | 0.8003 |
| Similarity ↑ | 0.8691 | 0.7664 | 0.9598 | 0.9589 | 0.9581 | 0.9571 | 0.9573 | 0.9564 | 0.9573 | 0.9564 | 0.9574 |
| Distance ↓ | 28.9203 | 45.4423 | 10.5358 | 9.3103 | 9.2467 | 10.4150 | 9.8974 | 10.2288 | 9.9570 | 10.0778 | 10.7401 |
| Fraggle Sim. ↑ | 0.3441 | 0.3289 | 0.8734 | 0.8708 | 0.8896 | 0.8785 | 0.8705 | 0.8843 | 0.8800 | 0.8654 | 0.8600 |
| Morgan Sim. ↑ | 0.2090 | 0.1281 | 0.8036 | 0.8076 | 0.8279 | 0.8187 | 0.8083 | 0.8239 | 0.8024 | 0.7906 | 0.8029 |
| QED ↑ | 0.4952 | 0.4435 | 0.5435 | 0.5629 | 0.5673 | 0.5635 | 0.5650 | 0.5664 | 0.5671 | 0.5672 | 0.5632 |

![Refer to caption](https://arxiv.org/html/2605.15243v1/x5.png) Figure 5: Molecular structures generated through multiple sampling. Note: The scores shown are, from left to right, Fraggle, Morgan, and Scaffold similarity.
### C.2 Molecular Structure Visualization

To further confirm that the high similarity does not result from memorization, we randomly sampled real drugs from the test set and compared them with model-generated candidates (Figure [5](https://arxiv.org/html/2605.15243#A3.F5), Table [7](https://arxiv.org/html/2605.15243#A3.T7)). Crucially, while the fingerprint-based metrics (Fraggle/Morgan) show high similarity, the Scaffold Similarity (based on Bemis-Murcko decomposition) is notably lower. This divergence indicates that the generated molecules have distinct chemical backbones despite sharing key functional properties. The high fingerprint similarity therefore stems from the preservation of pharmacophores rather than structural replication, demonstrating the model's capacity for scaffold hopping and effective function extraction.

Table 7: Target and generated SMILES from multiple sampling, with similarity scores.

| Target SMILES | Generated SMILES | Fraggle ↑ | Morgan ↑ | Scaffold ↓ |
|---|---|---|---|---|
| `Cc1cccc(c1)S(=O)(=O)N[C@@H]1CC[C@@H](CCNC(=O)c2cnccn2)O[C@@H]1CO` | `Cc1cccc(S(=O)(=O)NC2CCC(CCNC(=O)c3cnccn3)PC2CO)c1` | 1.0000 | 0.7027 | 0.6875 |
| | `O=C(NCCC1CCC(NS(=O)c2cccc(CO)c2)C(CO)O1)c1cnccn1` | 0.9477 | 0.6026 | 0.6875 |
| | `O=C(NC1CCC(CCNC(=O)c2cnccn2)OC1CO)c1ccc2c(c1)OCO2` | 0.8839 | 0.5769 | 0.5857 |
| `CN(C)CC(=O)Nc1ccc2O[C@@H]3[C@@H](C[C@@H](CC(=O)N4CCCCC4)O[C@@H]3CO)c2c1` | `CN(C)CC(=O)Nc1ccc2c(c1)C1CC(CC(=O)NC3CCCCC3)OC(CO)C1O2` | 0.9927 | 0.7429 | 0.6364 |
| | `O=C(CC1CC2c3cc(NC(=O)CC4CC4)ccc3OC2C(CO)O1)NCC1CC1` | 0.8874 | 0.5753 | 0.4118 |
| | `O=C(Nc1ccc2c(c1)C1CC(CC(=O)N3CCCCC3)OC(CO)C1O2)NC1CCCC1` | 0.9870 | 0.7183 | 0.6190 |
| `COCCNC(=O)C[C@@H]1CC[C@@H]2[C@H](COC[C@H](O)CN2C(=O)c2cc(Cl)cc(Cl)c2)O1` | `COCCNC(=O)CC1CCC(NC(=O)c2cc(Cl)cc(Cl)c2)C(COCC(C)O)O1` | 0.9580 | 0.4691 | 0.4490 |
| | `COCCNC(=O)CC1CCC2C(COCC(O)CN2[N+](=O)c2cc(Cl)cc(Cl)c2)O1` | 0.9999 | 0.7183 | 0.5600 |
| | `CCC(=O)N1CC(O)COCC2OC(CC(=O)NCc3cc(F)cc(F)c3)CCC21` | 0.7389 | 0.5195 | 0.3333 |

### C.3 Latent Space Analysis and Visualization of Biological Interpretability

Since CURE operates within the framework of TBDD, it lacks explicit indicators for target binding affinity\. To rigorously investigate the biological interpretability of the model and verify whether the learned latent representations capture authentic mechanistic principles rather than mere statistical artifacts, we conducted a stratified visualization analysis using Uniform Manifold Approximation and Projection \(UMAP\) on the human gene inhibitor dataset\. Our visualization strategy was designed to probe the latent space from two complementary perspectives: functional specificity and biological context sensitivity\.

First, to test whether the model encodes mechanism-specific functional signatures, we controlled for cellular background by projecting the latent embeddings of distinct gene inhibitors within a single cell line (U251MG, HT29, or A549). As illustrated in the top row of Figure [6](https://arxiv.org/html/2605.15243#A3.F6), samples form discrete, tight clusters according to inhibitor type. This separation implies that the model extracts and encodes the transcriptomic perturbations associated with specific therapeutic targets, mapping phenotypic changes to their underlying mechanism of action (MoA).

Second, to show that the model remains sensitive to cellular heterogeneity rather than overfitting to a generic drug signature, we visualized the embeddings of the same inhibitors (MTOR, CTSK, SMAD3) across diverse cell lines. The bottom row of Figure [6](https://arxiv.org/html/2605.15243#A3.F6) shows clear stratification by cellular identity, indicating that the model adapts its functional representations to the biological context.

Collectively, these visualization results provide strong evidence for the model’s validity\. The ability to simultaneously achieve high intra\-class compactness for inhibitors \(demonstrating mechanistic understanding\) and inter\-class separability for cell lines \(demonstrating context awareness\) strongly suggests that the intermediate latent space operates as a biologically meaningful manifold\. This indicates that our framework effectively disentangles the specific functional impact of drug perturbations from complex cellular background effects, establishing a robust foundation for function\-oriented drug discovery\.

![Refer to caption](https://arxiv.org/html/2605.15243v1/x6.png) Figure 6: Latent space visualization using the human gene inhibitor dataset. UMAP projections reveal the model's ability to disentangle biological mechanisms from cellular contexts. (Top) Distinct clustering of different inhibitors within the same cell line demonstrates the encoding of mechanism-specific functional signatures. (Bottom) Stratification of identical inhibitors across diverse cell lines confirms the model's sensitivity to cellular heterogeneity.
### C.4 Evaluation of Toxicity Properties

To further assess the pharmacological viability and safety profile of the generated compounds, we extended our evaluation to toxicity-related properties using the ADMETlab predictor (Fu et al., [2024](https://arxiv.org/html/2605.15243#bib.bib54)). We compared molecules generated by CURE with those from the baseline methods (GexMolGen, Gx2Mol, TRIOMPHE) and with the ground-truth drugs from the L1000 test set. The evaluation covers a broad spectrum of toxicity risks, including mutagenicity (Ames), cardiotoxicity (hERG), and organ-specific toxicities. The results, summarized in Table [8](https://arxiv.org/html/2605.15243#A3.T8), show that CURE achieves a competitive safety profile. Although our model was not explicitly optimized for these toxicity endpoints during training, the generated molecules have toxicity scores consistently within a reasonable range, often matching or outperforming both the baseline methods and the ground-truth reference drugs (e.g., in Eye Irritation and Rat Oral Acute Toxicity).

Table 8: Comparison of predicted toxicity properties across generative models and the ground truth (L1000). Arrows indicate whether lower (↓) or higher (↑) scores are desirable.

| Metric | GexMolGen | Gx2Mol | TRIOMPHE | CURE | L1000 (GT) |
|---|---|---|---|---|---|
| Ames Mutagenicity ↓ | 0.5003 | 0.4632 | 0.5965 | 0.4850 | 0.5527 |
| hERG Blockers (10 μM) ↓ | 0.4361 | 0.4236 | 0.4003 | 0.3983 | 0.2973 |
| Hematotoxicity ↓ | 0.4435 | 0.3436 | 0.3993 | 0.3590 | 0.5149 |
| Respiratory Toxicity ↓ | 0.4755 | 0.5332 | 0.8141 | 0.4684 | 0.4715 |
| Carcinogenicity ↓ | 0.4523 | 0.4511 | 0.5439 | 0.4714 | 0.5345 |
| DILI (Liver Injury) ↓ | 0.6968 | 0.6923 | 0.6623 | 0.6506 | 0.6733 |
| ROA (Rat Oral Acute Tox.) ↓ | 0.3475 | 0.3331 | 0.6658 | 0.3214 | 0.3414 |
| FDAMDD (Max Daily Dose) ↑ | 0.4875 | 0.5498 | 0.6136 | 0.5918 | 0.5272 |
| Eye Irritation ↓ | 0.1983 | 0.2082 | 0.2682 | 0.0942 | 0.2238 |
| Eye Corrosion ↓ | 0.0311 | 0.0264 | 0.1485 | 0.0141 | 0.0124 |

### C.5 Scaffold Novelty Analysis

To address concerns regarding whether high similarity metrics reflect structural duplication rather than genuine pharmacophoric capture, we performed a systematic scaffold novelty analysis on the bulk in\-distribution setting\. We report three complementary metrics: Unique Scaffold Ratio \(fraction of unique Bemis\-Murcko scaffolds among generated molecules\), Scaffold Novelty \(fraction of generated scaffolds absent from the training set\), and Internal Diversity \(mean pairwise Tanimoto distance among generated molecules\)\.

Table 9: Scaffold novelty analysis on bulk in-distribution data.

| Method | Unique Scaffold Ratio ↑ | Scaffold Novelty ↑ | Internal Diversity ↑ |
|---|---|---|---|
| GexMolGen | 0.5373 | 0.6796 | 0.7646 |
| Gx2Mol | 0.4975 | 0.6337 | 0.8360 |
| TRIOMPHE | 0.5694 | 0.6637 | 0.8809 |
| CURE | 0.6360 | 0.7264 | 0.8906 |

As shown in Table [9](https://arxiv.org/html/2605.15243#A3.T9), CURE achieves the highest scores on all three metrics. The high Scaffold Novelty (0.7264) shows that over 72% of generated scaffolds are absent from the training set, while the co-occurrence of high fingerprint similarity (Table [1](https://arxiv.org/html/2605.15243#S5.T1)) and lower scaffold similarity (Figure [5](https://arxiv.org/html/2605.15243#A3.F5)) indicates successful scaffold hopping: preserving pharmacophores while exploring novel chemical backbones.
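The three metrics defined above can be computed as follows once each generated molecule has a scaffold key and a fingerprint. In this sketch, scaffolds are plain strings (for example Bemis-Murcko SMILES) and fingerprints are sets of on-bit indices. Scaffold Novelty is computed over the set of unique generated scaffolds, which is our reading of the definition rather than a detail confirmed by the text; Internal Diversity averages $1 - \text{Tanimoto}$ over all unordered pairs.

```python
from itertools import combinations


def unique_scaffold_ratio(gen_scaffolds):
    return len(set(gen_scaffolds)) / len(gen_scaffolds)


def scaffold_novelty(gen_scaffolds, train_scaffolds):
    # Assumption: novelty is measured over unique generated scaffolds.
    unique = set(gen_scaffolds)
    return len(unique - set(train_scaffolds)) / len(unique)


def tanimoto_bits(a, b):
    """Tanimoto similarity of two on-bit sets (two empty sets count as identical)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def internal_diversity(fingerprints):
    """Mean pairwise Tanimoto distance over all unordered pairs."""
    pairs = list(combinations(fingerprints, 2))
    return sum(1.0 - tanimoto_bits(a, b) for a, b in pairs) / len(pairs)
```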

### C.6 Alternative Functional Proxy Validation

To verify that our evaluation conclusions are robust to the choice of functional proxy, we replaced PRnet with chemCPA (Hetzel et al., [2022](https://arxiv.org/html/2605.15243#bib.bib6)), an independently validated perturbation predictor. As shown in Table [10](https://arxiv.org/html/2605.15243#A3.T10), the method ranking is fully preserved under chemCPA, indicating that CURE's advantage in functional consistency is not an artifact of the PRnet evaluator.

Table 10: Functional proxy validation: PRnet vs. chemCPA. Method ranking is preserved.

| Method | PRnet MSE ↓ | chemCPA MSE ↓ |
|---|---|---|
| GexMolGen | 4.6504 | 5.0821 |
| Gx2Mol | 2.5987 | 2.9487 |
| TRIOMPHE | 7.4599 | 7.8536 |
| CURE | 0.2328 | 0.3415 |

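Rank preservation between two evaluators can be quantified with Kendall's rank correlation. The sketch below computes tau-a (it assumes no tied scores) on the Table 10 values. It is an illustrative check only and not part of the reported protocol; the function and constant names are hypothetical. With four methods there are six pairwise orderings, and tau = 1 means all six agree.

```python
from itertools import combinations
from typing import Dict


def kendall_tau_a(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Kendall's tau-a between two score maps over the same keys (assumes no ties)."""
    keys = sorted(x)
    concordant = discordant = 0
    for a, b in combinations(keys, 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(keys) * (len(keys) - 1) // 2
    return (concordant - discordant) / n_pairs


# MSE values transcribed from Table 10 (lower is better).
PRNET = {"GexMolGen": 4.6504, "Gx2Mol": 2.5987, "TRIOMPHE": 7.4599, "CURE": 0.2328}
CHEMCPA = {"GexMolGen": 5.0821, "Gx2Mol": 2.9487, "TRIOMPHE": 7.8536, "CURE": 0.3415}

if __name__ == "__main__":
    print(sorted(PRNET, key=PRNET.get))      # ['CURE', 'Gx2Mol', 'GexMolGen', 'TRIOMPHE']
    print(sorted(CHEMCPA, key=CHEMCPA.get))  # same order
    print(kendall_tau_a(PRNET, CHEMCPA))     # 1.0
```

A rank statistic fits the claim being made here, since the two proxies produce MSEs on different absolute scales.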
### C.7 Single-Cell Baseline Robustness with Metacell Aggregation

No existing TBDD method natively handles single-cell input, and pseudo-bulk averaging is the standard adaptation strategy. To further strengthen the fairness of our comparison, we additionally tested metacell aggregation (Baran et al., [2019](https://arxiv.org/html/2605.15243#bib.bib56)), which constructs representative cellular profiles via $k$-nearest-neighbor graph partitioning.
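To make the two adaptation strategies concrete, the sketch below contrasts pseudo-bulk averaging (one mean profile per population) with a deliberately simplified metacell-style aggregation: cells are linked in a mutual $k$-nearest-neighbor graph, and each connected component is averaged into one profile. This is a stand-in for illustration only. The MetaCell method of Baran et al. (2019) uses a more elaborate balanced $k$-NN graph with resampled partitioning, and all function names here are hypothetical.

```python
import math
from typing import Dict, List, Set

Profile = List[float]


def pseudo_bulk(cells: List[Profile]) -> Profile:
    """Gene-wise mean over all cells: one bulk-like profile per population."""
    n = len(cells)
    return [sum(gene) / n for gene in zip(*cells)]


def knn_sets(cells: List[Profile], k: int) -> List[Set[int]]:
    """Indices of each cell's k nearest neighbors by Euclidean distance."""
    neighbors = []
    for i, x in enumerate(cells):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(cells) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def metacell_profiles(cells: List[Profile], k: int = 5) -> List[Profile]:
    """Simplified metacell stand-in: average each connected component of the
    mutual k-NN graph (edge i-j only if each is among the other's k nearest)."""
    nbrs = knn_sets(cells, k)
    parent = list(range(len(cells)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, ns in enumerate(nbrs):
        for j in ns:
            if i in nbrs[j]:  # keep mutual edges only
                parent[find(i)] = find(j)

    groups: Dict[int, List[Profile]] = {}
    for i, cell in enumerate(cells):
        groups.setdefault(find(i), []).append(cell)
    return [pseudo_bulk(members) for members in groups.values()]


if __name__ == "__main__":
    # Two well-separated toy sub-populations measured over two genes.
    cells = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    print(pseudo_bulk(cells))                     # one profile that blurs both groups
    print(sorted(metacell_profiles(cells, k=2)))  # two profiles, one per group
```

In the toy data the pseudo-bulk profile (about 5.03 per gene) matches neither sub-population, whereas the metacell profiles recover both. This is the kind of heterogeneity a single aggregated vector discards.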

Table 11: Single-cell baseline robustness: pseudo-bulk vs. metacell aggregation (in-distribution).

| Aggregation | Method | Coverage ↑ | Morgan Sim. ↑ | PRnet MSE ↓ |
|---|---|---|---|---|
| Pseudo-bulk | GexMolGen | 54.55% | 0.2245 | 4.8549 |
| | Gx2Mol | 45.45% | 0.2884 | 4.1419 |
| | TRIOMPHE | 63.64% | 0.2011 | 7.7024 |
| Metacell | GexMolGen | 54.55% | 0.2492 | 4.3120 |
| | Gx2Mol | 54.55% | 0.3118 | 3.9315 |
| | TRIOMPHE | 63.64% | 0.2468 | 7.2836 |
| TFE-H | CURE | 90.90% | 0.6114 | 0.4829 |

As shown in Table [11](https://arxiv.org/html/2605.15243#A3.T11), metacell aggregation yields only marginal gains over pseudo-bulk for the baselines (PRnet MSE improves by at most 0.54, for GexMolGen: 4.8549 → 4.3120), while CURE maintains a substantial lead (PRnet MSE 0.4829). The limitation is architectural: baseline methods accept a single aggregated vector and cannot model within-population heterogeneity, whereas TFE-H preserves sub-population structure.

### C.8 Generator Architecture Ablation

To isolate the contribution of the Graph Diffusion backbone from that of TFE conditioning, we replaced the generator with two alternatives (a SMILES decoder and a graph VAE) while keeping TFE fixed.

Table 12: Generator architecture ablation with identical TFE conditioning (bulk in-distribution).

| Generator | Similarity ↑ | Coverage ↑ | Unique ↑ | Morgan Sim. ↑ | PRnet MSE ↓ |
|---|---|---|---|---|---|
| SMILES Decoder (Cheng et al., [2024](https://arxiv.org/html/2605.15243#bib.bib12)) | 0.7406 | 72.73% | 0.6406 | 0.5192 | 4.6732 |
| Graph VAE (Jin et al., [2020](https://arxiv.org/html/2605.15243#bib.bib2)) | 0.8176 | 72.73% | 0.7450 | 0.6845 | 2.5620 |
| Graph Diffusion | 0.9576 | 100.00% | 0.8906 | 0.8228 | 0.2328 |

Graph Diffusion outperforms both alternatives on every metric (Table [12](https://arxiv.org/html/2605.15243#A3.T12)). The PRnet MSE gap is especially striking (0.23 for Graph Diffusion vs. 2.56 for the Graph VAE and 4.67 for the SMILES decoder), indicating that the graph diffusion backbone is critical for functional fidelity. Notably, even the weaker generators produce condition-responsive molecules when guided by TFE, supporting the effectiveness of our feature extraction.
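In relative terms, simple arithmetic on the Table 12 PRnet MSE values gives the error reduction of Graph Diffusion over each alternative generator:

$$
1 - \frac{0.2328}{2.5620} \approx 0.909 \;\;\text{(vs. Graph VAE)}, \qquad
1 - \frac{0.2328}{4.6732} \approx 0.950 \;\;\text{(vs. SMILES decoder)},
$$

that is, roughly 91% and 95% lower functional error under identical TFE conditioning.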

### C.9 Extended TFE Input Processing Ablation

We compare different input processing strategies to demonstrate the co-dependence of TFE-I and TFE-A. The baseline uses the officially pre-processed L1000 Level 5 data, while the TFE variants operate on the less processed Level 3 data, with the pre- and post-perturbation states $T_{pre}$ and $T_{post}$ kept separate.

Table 13: TFE input processing ablation (bulk in-distribution).

| Input Processing | Validity ↑ | Coverage ↑ | Diversity ↑ | Morgan Sim. ↑ |
|---|---|---|---|---|
| w/o TFE (Level 5) | 0.8775 | 90.91% | 0.7504 | 0.1824 |
| Concat($T_{pre}$, $T_{post}$) (Level 3) | 0.2150 | 36.36% | 0.7180 | 0.0752 |
| TFE-I only (Level 3) | 0.3000 | 63.64% | 0.7662 | 0.0886 |
| TFE-A only (Level 3) | 0.2400 | 36.36% | 0.6982 | 0.2527 |
| Full TFE (Level 3) | 0.9350 | 100.00% | 0.8906 | 0.8228 |

As shown in Table [13](https://arxiv.org/html/2605.15243#A3.T13), naive concatenation of Level 3 data severely degrades performance, and using TFE-I or TFE-A alone provides only partial recovery. Only the full TFE pipeline surpasses the Level 5 baseline by a large margin (e.g., Morgan similarity 0.8228 vs. 0.1824), confirming that TFE-I and TFE-A are co-dependent: TFE-I distills perturbation signals (which remain noisy without alignment), while TFE-A aligns them to the chemical domain (without distillation, the alignment targets are incoherent).

### C.10 Docking Protocol and Controls

We provide the complete molecular docking protocol used for the zero-shot gene inhibitor evaluation (Section 5.4). All docking simulations were performed with AutoDock Vina using a 25 × 25 × 25 Å search box (30 × 30 × 30 Å for MTOR and SMAD3, which have larger binding sites).
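The box settings above, combined with the grid centers listed in Table 14, map directly onto an AutoDock Vina configuration file. The sketch below renders one such file per target. It is an illustration, not the authors' script. The receptor, ligand, and output file names are hypothetical placeholders, receptor and ligand preparation to PDBQT is not shown, and search settings the protocol does not report (such as `exhaustiveness` or `num_modes`) are left out so that Vina's defaults apply.

```python
from typing import Dict, Tuple

# Grid centers (x, y, z) in Å, transcribed from Table 14.
GRID_CENTERS: Dict[str, Tuple[float, float, float]] = {
    "AKT1": (8.37, -6.83, 12.62),
    "AKT2": (21.81, 1.88, 42.42),
    "AURKB": (23.21, 0.30, 32.79),
    "CTSK": (-2.72, 24.01, 6.33),
    "EGFR": (22.01, 0.25, 52.79),
    "HDAC1": (-46.76, 16.29, -7.79),
    "MTOR": (51.81, 0.00, -46.93),
    "PIK3CA": (-1.25, -9.01, 17.46),
    "SMAD3": (-12.87, 36.04, 81.32),
    "TP53": (124.68, 105.07, -43.12),
}

# Targets with larger binding sites get a 30 Å cube; all others use 25 Å.
LARGE_SITE_TARGETS = {"MTOR", "SMAD3"}


def box_edge(target: str) -> float:
    """Edge length (Å) of the cubic search box for a target."""
    return 30.0 if target in LARGE_SITE_TARGETS else 25.0


def vina_config(target: str, receptor: str, ligand: str, out: str) -> str:
    """Render an AutoDock Vina config file as text (paths are placeholders)."""
    cx, cy, cz = GRID_CENTERS[target]
    edge = box_edge(target)
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {edge}",
        f"size_y = {edge}",
        f"size_z = {edge}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names for a single MTOR docking run.
    print(vina_config("MTOR", "4JT6_receptor.pdbqt", "ligand_001.pdbqt", "MTOR_ligand_001_out.pdbqt"))
```

Saved to a file (say `mtor.conf`), such a configuration would be run with `vina --config mtor.conf`. Keeping the centers in one table-derived mapping makes it easy to confirm that every generated molecule and control for a target is docked into the same box.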

Table 14: Docking protocol: PDB structures, grid centers, and known inhibitor controls.

| Target | PDB ID | Known Inhibitors | Grid Center (x, y, z) |
|---|---|---|---|
| AKT1 | 3O96 | MK-2206 | (8.37, −6.83, 12.62) |
| AKT2 | 2JDR | A-443654 | (21.81, 1.88, 42.42) |
| AURKB | 4C2V | Barasertib | (23.21, 0.30, 32.79) |
| CTSK | 1VSN | Odanacatib | (−2.72, 24.01, 6.33) |
| EGFR | 1M17 | Erlotinib | (22.01, 0.25, 52.79) |
| HDAC1 | 4BKX | Vorinostat | (−46.76, 16.29, −7.79) |
| MTOR | 4JT6 | Torkinib | (51.81, 0.00, −46.93) |
| PIK3CA | 7PG6 | Alpelisib | (−1.25, −9.01, 17.46) |
| SMAD3 | 1U7F | SIS3 | (−12.87, 36.04, 81.32) |
| TP53 | 2VUK | PhiKan083 | (124.68, 105.07, −43.12) |

Table 15: Binding affinity comparison (kcal/mol). Lower values indicate stronger binding.

| Target | GexMolGen ↓ | CURE ↓ | Known Inhib. ↓ |
|---|---|---|---|
| AKT1 | −7.45 | −8.63 | −9.52 |
| AKT2 | −7.48 | −8.59 | −8.14 |
| AURKB | −7.40 | −8.79 | −10.20 |
| CTSK | −7.55 | −8.69 | −8.58 |
| EGFR | −7.30 | −9.11 | −7.33 |
| HDAC1 | −7.00 | −8.68 | −8.48 |
| MTOR | −7.09 | −8.74 | −8.22 |
| PIK3CA | −7.33 | −9.15 | −8.74 |
| SMAD3 | −7.28 | −9.07 | −8.17 |
| TP53 | −7.09 | −8.28 | −7.69 |

As shown in Table [15](https://arxiv.org/html/2605.15243#A3.T15), CURE obtains more favorable docking scores than GexMolGen on all 10 targets and better scores than the known inhibitors on 8 of 10, trailing them only on AKT1 (−8.63 vs. −9.52) and AURKB (−8.79 vs. −10.20). These results suggest that transcriptomics-guided generation can produce physically plausible binders.
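The per-target comparison behind this summary can be tallied directly from Table 15, as in the short script below. It is an illustrative check that uses only the reported numbers, and the names are hypothetical. Vina scores are predicted, not measured, affinities.

```python
from typing import Dict, List, Tuple

# (GexMolGen, CURE, known inhibitor) Vina scores in kcal/mol, from Table 15.
SCORES: Dict[str, Tuple[float, float, float]] = {
    "AKT1": (-7.45, -8.63, -9.52),
    "AKT2": (-7.48, -8.59, -8.14),
    "AURKB": (-7.40, -8.79, -10.20),
    "CTSK": (-7.55, -8.69, -8.58),
    "EGFR": (-7.30, -9.11, -7.33),
    "HDAC1": (-7.00, -8.68, -8.48),
    "MTOR": (-7.09, -8.74, -8.22),
    "PIK3CA": (-7.33, -9.15, -8.74),
    "SMAD3": (-7.28, -9.07, -8.17),
    "TP53": (-7.09, -8.28, -7.69),
}


def tally(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Compare CURE with the baseline and with the known inhibitor per target."""
    beats_baseline = [t for t, (gex, cure, _) in scores.items() if cure < gex]
    beats_known = [t for t, (_, cure, known) in scores.items() if cure < known]
    # Positive gap = CURE's score is less favorable than the known inhibitor's.
    gaps_vs_known = {
        t: round(cure - known, 2)
        for t, (_, cure, known) in scores.items()
        if cure >= known
    }
    return beats_baseline, beats_known, gaps_vs_known


if __name__ == "__main__":
    baseline_wins, known_wins, gaps = tally(SCORES)
    print(len(baseline_wins), len(known_wins))  # 10 8
    print(gaps)                                 # {'AKT1': 0.89, 'AURKB': 1.41}
```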

## Appendix D More Experimental Details

### D.1 Model Training Setup

Tables [16](https://arxiv.org/html/2605.15243#A4.T16) and [17](https://arxiv.org/html/2605.15243#A4.T17) summarize the hyperparameter settings used in each training phase and the parameter counts of each model component. For full reproducibility, we refer readers to the configuration files in our source code repository.

### D.2 Details for PRnet

To quantify biological efficacy, we used PRnet as a functional proxy. PRnet is a flexible and scalable perturbation-conditioned generative model that predicts transcriptional responses to novel complex perturbations at both bulk and single-cell levels. To preserve evaluation integrity, we trained the bulk PRnet model on the L1000 dataset (Subramanian et al., [2017](https://arxiv.org/html/2605.15243#bib.bib3); Gao et al., [2019](https://arxiv.org/html/2605.15243#bib.bib29)) using exactly the same training split as our generative framework, so the evaluator never sees test-set data. For the single-cell domain, the predictor was trained on the sci-Plex dataset (Srivatsan et al., [2020](https://arxiv.org/html/2605.15243#bib.bib55)), so that single-cell functional consistency is scored by a model trained on single-cell perturbation data.
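The exact PRnet MSE formula is not spelled out here. One plausible reading, assumed for the sketch below, is the mean squared error across genes between the proxy's predicted post-perturbation profile for a generated molecule and the desired post-perturbation profile, averaged over test conditions. `predict_post` is a hypothetical stand-in for a trained proxy such as PRnet, not its real interface.

```python
from typing import Callable, Sequence

Profile = Sequence[float]


def mse(pred: Profile, target: Profile) -> float:
    """Mean squared error across genes for one condition."""
    if len(pred) != len(target):
        raise ValueError("profiles must cover the same genes")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def proxy_mse(
    molecules: Sequence[str],
    pre_states: Sequence[Profile],
    desired_post_states: Sequence[Profile],
    predict_post: Callable[[str, Profile], Profile],
) -> float:
    """Average over conditions of the MSE between the proxy's predicted
    post-perturbation profile and the desired one.

    `predict_post` is a hypothetical stand-in for a trained proxy such as PRnet.
    """
    errors = [
        mse(predict_post(mol, pre), post)
        for mol, pre, post in zip(molecules, pre_states, desired_post_states)
    ]
    return sum(errors) / len(errors)


def toy_proxy(mol: str, pre: Profile) -> Profile:
    """Toy proxy for illustration: every molecule shifts each gene up by 1.0."""
    return [x + 1.0 for x in pre]


if __name__ == "__main__":
    print(proxy_mse(["CCO"], [[0.0, 0.0]], [[1.0, 3.0]], toy_proxy))  # 2.0
```

Because the same trained proxy scores every method, lower values indicate generated molecules whose predicted effect lies closer to the requested transcriptomic transition.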

### D.3 Data Integrity and Prevention of Leakage

To ensure a valid evaluation and an honest measure of the model's generalization, we enforced data isolation across all learnable modules. The TFE modules were trained exclusively on the designated training splits of the TBDD dataset, with no exposure to molecules or transcriptomes from the validation or test sets. SCimilarity serves solely as a generic, frozen dimensionality-reduction tool: it was pre-trained on a broad human cell atlas for general cell-state embedding and was not fine-tuned on our L1000, Tahoe-100M, or ExCAPE datasets, so it carries no task-specific supervision about drug-perturbation mappings. Similarly, the Morgan fingerprint alignment relies on deterministic RDKit computations with no learned parameters. Together, these measures ensure that the model's performance reflects learned structure-function mappings rather than data leakage or memorization.
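One way to audit split isolation of this kind is a pairwise overlap check on canonical identifiers, for example InChIKeys for molecules and signature or sample IDs for transcriptomes. The helper below is a hypothetical sketch, not part of the released code. It detects exact identifier overlap only. Near-duplicates such as close analogs would need a separate similarity-threshold check.

```python
from typing import Dict, Iterable, Set


def pairwise_overlaps(splits: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Identifiers shared by each pair of splits; all-empty means no overlap."""
    names = sorted(splits)
    sets = {name: set(splits[name]) for name in names}
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlaps[f"{a}/{b}"] = sets[a] & sets[b]
    return overlaps


def check_disjoint(splits: Dict[str, Iterable[str]]) -> None:
    """Raise if any identifier appears in more than one split."""
    leaks = {pair: ids for pair, ids in pairwise_overlaps(splits).items() if ids}
    if leaks:
        raise ValueError(f"split leakage detected: {leaks}")


if __name__ == "__main__":
    # Hypothetical identifiers, e.g. InChIKeys of the molecules in each split.
    splits = {"train": ["KEY-A", "KEY-B"], "val": ["KEY-C"], "test": ["KEY-D"]}
    check_disjoint(splits)  # passes silently
    print(pairwise_overlaps(splits))
```

Canonical identifiers matter here. Raw SMILES strings can spell the same molecule in different ways, which would let a duplicate slip past a set-based check.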

Table 16: Hyperparameter settings for training phases.

| Hyperparameter | TFE | PMD |
|---|---|---|
| Hardware | NVIDIA A100 (40GB) | NVIDIA A100 (40GB) |
| Total Training Time | ∼15 GPU hours | ∼48 GPU hours |
| Training Steps | 30k | 40k |
| Batch Size | 64 | 400 |
| Learning Rate | $1\times 10^{-4}$ | $2\times 10^{-4}$ |
| Optimizer | Adam | Adam |
| Diffusion Steps ($T$) | – | 500 |

Table 17: Summary of model parameters.

| Component | Description | Parameters |
|---|---|---|
| Diffusion Model | Graph Diffusion Transformer | ∼501.0 M |
| TFE-I | Feature extraction and alignment modules | ∼7.8 M |
| TFE-A | Encodes and reconstructs molecular graphs | ∼5.3 M |

Similar Articles

Generating Developable 3D Molecules via Pocket-Conditioned Diffusion and Property-Aware Optimization

TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

DrugGen 2: A disease-aware language model for enhancing drug discovery

Gene Expression-Informed Jointly Controlled Generative Modeling for Precision Molecular Design

Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion
