Co-folding model guided by structural proteomics
Summary
Introduces AIMS-Fold, an inference-time guided-diffusion framework that integrates cross-linking mass spectrometry (XL-MS) and hydrogen-deuterium exchange (HDX-MS) data to improve protein co-folding predictions for induced proximity drug targets.
# Co-folding model guided by structural proteomics
Source: [https://arxiv.org/html/2605.26192](https://arxiv.org/html/2605.26192)
Alon Shtrikman1∗, Nitzan Simchi1∗, Michal Ran Shchory1∗, Sagie Brodsky1∗, Eran Seger1, Kirill Pevzner1✉
1Protai Bio, kirill@protai.bio
∗Equal Contribution, ✉Corresponding Author
###### Abstract
Protein structure generative models excel at predicting static structures of single proteins from sequence, but routinely fail to capture the correct conformational state of protein complexes, which is critical for protein design and induced proximity modalities such as antibodies and PROTACs. While structural proteomics techniques like Cross-Linking Mass Spectrometry (XL-MS) and Hydrogen-Deuterium Exchange (HDX-MS) offer valuable spatial and dynamic insights, integrating these sparse, heterogeneous measurements into these models remains an open challenge. Here, we bridge this gap by combining structural proteomics data with the rich biophysical priors learned by pretrained diffusion models. We introduce AIMS-Fold, an inference-time guided-diffusion framework that actively steers the generative sampling trajectory using differentiable physical potentials derived from XL-MS spatial restraints and HDX-MS solvent accessibility profiles. We demonstrate that these structural methods individually enhance predictive accuracy, and that their integration yields synergistic improvement. Crucially, by leveraging these experimental restraints, AIMS-Fold achieves higher accuracy on challenging induced proximity targets than purely computational, unguided state-of-the-art models like Boltz-2. This establishes our framework as a powerful, integrative computational approach for the structure-based drug design of induced proximity drugs. Evaluation code will be made publicly available upon publication.
## 1 Introduction
Proximity\-inducing drugs, including proteolysis\-targeting chimeras \(PROTACs\) and molecular glues, represent a new class of therapeutic modalities\[[4](https://arxiv.org/html/2605.26192#bib.bib13),[31](https://arxiv.org/html/2605.26192#bib.bib14)\]\. Unlike classical small molecule inhibitors whose efficacy is dictated by the binary binding affinity to a single target, the activity of proximity inducers is driven by the assembly and dynamic behavior of a ternary complex \(e\.g\., a target protein, the bridging molecule, and an effector like an E3 ligase\)\[[40](https://arxiv.org/html/2605.26192#bib.bib33),[43](https://arxiv.org/html/2605.26192#bib.bib32),[29](https://arxiv.org/html/2605.26192#bib.bib31),[13](https://arxiv.org/html/2605.26192#bib.bib30)\]\. For rational drug design, using the correct protein complex state is critical\[[14](https://arxiv.org/html/2605.26192#bib.bib42)\]\. These complexes require a balance between structural stability and the flexibility to execute biological functions, such as optimal ubiquitination geometry\[[21](https://arxiv.org/html/2605.26192#bib.bib23),[5](https://arxiv.org/html/2605.26192#bib.bib2)\]\.
Recently, sequence\-to\-structure deep learning models, most notably AlphaFold3\[[1](https://arxiv.org/html/2605.26192#bib.bib24)\]and Boltz\-2\[[23](https://arxiv.org/html/2605.26192#bib.bib10)\], have impacted structural biology by providing highly accurate proteome\-wide structure predictions\. Despite these breakthroughs, such models are predominantly trained to map a sequence to a single, static structural state\[[20](https://arxiv.org/html/2605.26192#bib.bib43)\]\. Since dynamic protein\-protein interactions and drug\-induced complexes are sparsely represented in training repositories like the Protein Data Bank \(PDB\), these models frequently suffer from overconfidence in predicting one static state\[[16](https://arxiv.org/html/2605.26192#bib.bib3)\]\. Consequently, they fail to capture the conformational shifts driven by induced proximity drugs\[[24](https://arxiv.org/html/2605.26192#bib.bib44),[9](https://arxiv.org/html/2605.26192#bib.bib45)\]\.
Structural proteomics solves this by capturing the dynamics of protein complexes\[[17](https://arxiv.org/html/2605.26192#bib.bib21)\]\. Cross\-linking mass spectrometry \(XL\-MS\) provides spatial constraints\[[16](https://arxiv.org/html/2605.26192#bib.bib3),[17](https://arxiv.org/html/2605.26192#bib.bib21)\], while hydrogen\-deuterium exchange \(HDX\-MS\) captures solvent accessibility\[[19](https://arxiv.org/html/2605.26192#bib.bib16)\]\. Integrating raw MS data directly into structure generative models to actively guide structure prediction remains a challenge\[[39](https://arxiv.org/html/2605.26192#bib.bib46)\]\.
In this work, we bridge this gap by introducing AIMS\-Fold, a novel diffusion\-based generative model that actively uses sparse structural proteomics data to guide structure generation\. Rather than relying on model weights alone for static structure predictions or post\-hoc filtering using experimental data\[[38](https://arxiv.org/html/2605.26192#bib.bib34),[36](https://arxiv.org/html/2605.26192#bib.bib35),[27](https://arxiv.org/html/2605.26192#bib.bib36)\], AIMS\-Fold applies inference\-time steering\[[6](https://arxiv.org/html/2605.26192#bib.bib17)\]to a pretrained atomic diffusion model\. By translating XL\-MS and HDX\-MS data into differentiable physical potentials, our method actively alters the probability landscape during the reverse diffusion process, guiding the sampling trajectory toward biologically compatible conformations that satisfy the experimental constraints\. We demonstrate that integrating positive and negative spatial restraints \(XL\-MS\) with solvent accessibility patterns \(HDX\-MS\) yields significantly improved performance compared to unconstrained generation or post\-hoc filtering\.
Figure 1: AIMS-Fold is an inference-time guided-diffusion framework that actively steers the generative sampling trajectory using experimentally derived constraints. While Boltz-2 inputs mainly include SMILES and protein sequences, AIMS-Fold also receives HDX-MS, XL-MS positive, and XL-MS negative constraints to improve model prediction accuracy.
## 2 Background
### 2.1 Diffusion-based structure generation
Recent advances in biomolecular modeling, such as AlphaFold3\[[1](https://arxiv.org/html/2605.26192#bib.bib24)\] and Boltz-2\[[23](https://arxiv.org/html/2605.26192#bib.bib10)\], frame structure prediction as a continuous-time generative diffusion process. The model operates directly in the space of 3D atomic coordinates, where a protein structure of $N$ atoms is represented as $\mathbf{x} \in \mathbb{R}^{N \times 3}$.
The forward diffusion process gradually noises the data $\mathbf{x}_0 \sim p_{\text{data}}$ into a standard Gaussian distribution over a time variable $t \in [0, T]$. This destruction of signal is governed by a stochastic differential equation (SDE)\[[32](https://arxiv.org/html/2605.26192#bib.bib27)\]:
$$\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w} \tag{1}$$
In this formulation, the drift and diffusion coefficients are denoted by $f(\mathbf{x}, t)$ and $g(t)$, respectively, while $\mathrm{d}\mathbf{w}$ characterizes a standard Wiener process.
To sample from the target distribution and generate novel structures, the model samples pure noise $\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})$ and simulates the reverse-time SDE (reverse diffusion process)\[[32](https://arxiv.org/html/2605.26192#bib.bib27),[2](https://arxiv.org/html/2605.26192#bib.bib9)\]:
$$\mathrm{d}\mathbf{x} = \left[f(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}} \tag{2}$$
Because the true marginal score function $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is intractable, a neural network $s_\theta(\mathbf{x}, t)$ is trained via denoising score matching\[[37](https://arxiv.org/html/2605.26192#bib.bib5)\] to approximate it. At each sampling timestep, the model predicts the fully denoised ground-truth structure, denoted as $\hat{\mathbf{x}}_0(\mathbf{x}, t)$, which drives the trajectory toward a folded protein state.
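To make the reverse-time SDE in Eq. (2) concrete, the following is a minimal sketch on a one-dimensional toy problem, not the sampler used by Boltz-2 or AlphaFold3. It assumes a zero drift, a constant diffusion coefficient, and Gaussian data, so that the exact score is available in closed form and no network is needed. The constants `MU`, `S0`, `G`, and `T` and the function names are illustrative assumptions.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions, not from the paper):
MU, S0 = 2.0, 0.5   # data distribution N(MU, S0^2)
G, T = 1.0, 25.0    # constant diffusion coefficient g(t) = G, drift f = 0


def score(x, t):
    """Exact score of p_t = N(MU, S0^2 + G^2 * t) under the toy forward SDE."""
    return -(x - MU) / (S0**2 + G**2 * t)


def reverse_sde_sample(n_samples=5000, n_steps=500, seed=0):
    """Euler-Maruyama integration of Eq. (2) from t = T down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Approximate prior: zero-mean Gaussian with the terminal variance.
    x = rng.normal(0.0, np.sqrt(S0**2 + G**2 * T), size=n_samples)
    for k in range(n_steps):
        t = T - k * dt
        drift = 0.0 - G**2 * score(x, t)  # f - g^2 * score, with f = 0
        # Time runs backward, so the drift term enters with a minus sign.
        x = x - drift * dt + G * np.sqrt(dt) * rng.normal(size=n_samples)
    return x


samples = reverse_sde_sample()
print(samples.mean(), samples.std())
```

With the exact score, the samples drift back toward the toy data distribution. In a learned model the call to `score` would be replaced by the trained network $s_\theta(\mathbf{x}, t)$.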
### 2.2 Inference-time steering via energy potentials
A major innovation integrated into Boltz-2 is the steering mechanism, an inference-time method that applies physics-based potentials to correct non-physical predictions and guide the model toward specific conformational basins. Crucially, this alters the probability landscape without requiring any retraining of the base neural network. Mathematically, a differentiable energy potential $U(\mathbf{x})$ is defined to represent the desired structural constraint. Using Tweedie’s formula\[[10](https://arxiv.org/html/2605.26192#bib.bib7),[6](https://arxiv.org/html/2605.26192#bib.bib17)\], the potential is evaluated on the network’s current denoised prediction $\hat{\mathbf{x}}_0(\mathbf{x}, t)$. The gradient of this potential is then injected directly into the score function to guide the sampling trajectory\[[6](https://arxiv.org/html/2605.26192#bib.bib17)\]:
$$\tilde{s}_\theta(\mathbf{x}, t) = s_\theta(\mathbf{x}, t) - \lambda(t)\,\nabla_{\mathbf{x}} U(\hat{\mathbf{x}}_0(\mathbf{x}, t)) \tag{3}$$
where $\lambda(t)$ is a time-dependent scaling factor that dictates the strength of the guidance. Boltz-steering utilizes flat-bottomed penalty functions. This means the potential applies zero gradient penalty as long as the generated structure satisfies the condition, but applies an increasingly severe penalty when boundaries are violated. Natively, Boltz utilizes this mechanism to enforce physical plausibility, applying potentials to resolve steric clashes and correct stereochemistry errors during the generation.
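The guided score in Eq. (3) with a flat-bottomed penalty can be sketched as follows. This is a simplified illustration under stated assumptions, not the Boltz-2 implementation: the potential is a single upper-bound distance restraint between two points, and the Jacobian of $\hat{\mathbf{x}}_0$ with respect to $\mathbf{x}$ is ignored, so the gradient is taken directly with respect to the denoised coordinates. The function names and the numeric values are hypothetical.

```python
import numpy as np


def flat_bottom_energy_and_grad(x0_hat, i, j, upper):
    """Flat-bottomed restraint: zero inside the bound, quadratic outside.

    U = max(0, d_ij - upper)^2, evaluated on the denoised prediction x0_hat.
    """
    diff = x0_hat[i] - x0_hat[j]
    d = np.linalg.norm(diff)
    violation = max(0.0, d - upper)
    grad = np.zeros_like(x0_hat)
    if violation > 0.0:
        g = 2.0 * violation * diff / d  # dU/dx_i
        grad[i] = g
        grad[j] = -g
    return violation**2, grad


def guided_score(score, x0_hat, lam, i, j, upper):
    """Eq. (3), assuming d(x0_hat)/dx is approximated by the identity."""
    _, grad = flat_bottom_energy_and_grad(x0_hat, i, j, upper)
    return score - lam * grad


x0_hat = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
base_score = np.zeros_like(x0_hat)
print(guided_score(base_score, x0_hat, lam=0.1, i=0, j=1, upper=5.0))
```

When the restraint is already satisfied the gradient is zero and the base score passes through unchanged, which is the defining property of a flat-bottomed potential.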
## 3 Methods
### 3.1 MS-guided diffusion and steering
AIMS\-Fold is a diffusion\-based generative model for biomolecular structure prediction built upon the Boltz\-2 architecture\[[23](https://arxiv.org/html/2605.26192#bib.bib10)\]\. This work uses and extends the Boltz\-2 implementation, available under the MIT License\. To better support structural proteomics constraints, we extend inference time guidance \(Boltz Steering\) to steer the generation process toward biologically compatible geometries\. Rather than relying solely on the neural network to predict the denoising step, AIMS\-Fold calculates energy potentials based on intermediate atomic coordinates\. The gradients of these physics\-informed potentials are injected into the sampling trajectory\[[6](https://arxiv.org/html/2605.26192#bib.bib17)\], actively altering the probability landscape to guide the model into conformational basins that satisfy experimental data\.
### 3.2 XL-MS distance guidance and negative constraints
Following standard quality control and normalization, XL-MS data are incorporated as distance constraints. Cross-links identified by XL-MS yield two types of spatial constraints: positive constraints, which dictate that target residues reside within a specified proximity distance under a given experimental condition, and negative constraints, which infer that the distance between the residues exceeds the physical reach of the cross-linker. For positive constraints, we utilize the existing Boltz-2 distance potentials to attract specified residues.
Crucially, XL-MS data collected under differential experimental conditions can yield negative constraints (e.g., a cross-link present in multiple treatments but missing in a specific state). To support this, we introduce a negation flag that establishes a repulsive potential, pushing the specified residues beyond a user-defined distance threshold $d_{\min}$ to satisfy the differential missing-link data. For a set of negatively constrained residue pairs $\mathcal{N}_{\text{neg}}$, the repulsive potential is formulated as:
$$U_{\text{neg}}(\mathbf{x}) = \sum_{(i,j) \in \mathcal{N}_{\text{neg}}} \max(0, d_{\min} - d_{ij})^2 \tag{4}$$
This applies a quadratic penalty only when the Euclidean distance $d_{ij}$ falls below the required threshold, forcing the two residues apart during the reverse diffusion steps.
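As an illustration of Eq. (4), the sketch below evaluates the repulsive potential and its analytic gradient for a set of negatively constrained pairs. It assumes one representative coordinate per residue and a hypothetical function name; it is not taken from the AIMS-Fold code.

```python
import numpy as np


def repulsive_potential(coords, neg_pairs, d_min):
    """Eq. (4): sum over pairs of max(0, d_min - d_ij)^2, with its gradient.

    coords: (N, 3) array, one representative point per residue (assumption).
    neg_pairs: iterable of (i, j) index pairs that should be farther than d_min.
    """
    energy = 0.0
    grad = np.zeros_like(coords)
    for i, j in neg_pairs:
        diff = coords[i] - coords[j]
        d = np.linalg.norm(diff)
        gap = d_min - d
        if gap > 0.0 and d > 1e-12:  # penalize only pairs that are too close
            energy += gap**2
            g = -2.0 * gap * diff / d  # dU/dx_i
            grad[i] += g
            grad[j] -= g
    return energy, grad


coords = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
energy, grad = repulsive_potential(coords, [(0, 1)], d_min=10.0)
stepped = coords - 0.1 * grad  # one gradient-descent step pushes the pair apart
print(energy, np.linalg.norm(stepped[0] - stepped[1]))
```

A descent step along the negative gradient increases the pair distance, and pairs already beyond $d_{\min}$ contribute neither energy nor gradient.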
### 3.3 Integrating HDX-MS protection data
Hydrogen\-deuterium exchange mass spectrometry \(HDX\-MS\) captures proximate, dynamic physical interactions, such as interface flexibility, that standard spatial constraints cannot fully resolve\[[45](https://arxiv.org/html/2605.26192#bib.bib37)\]\. Experimental HDX\-MS relative fractional uptake values encode the difference in deuterium uptake between states, where negative values indicate protection upon complex formation\. To translate these protection signals into active guidance during the diffusion process, AIMS\-Fold employs two strategies: a distance proxy and a physically differentiable burial potential\.
#### Distance\-based proxy constraints
We map protection data to spatial geometry\. Each protected residue generates an independent contact constraint against all residues of the other chain\. The maximum distance threshold is dynamically scaled by the magnitude of the experimental protection:
$$d_{\max}(i) = d_{\text{base}}\left(1 - |\Delta_i|\, w_s\right) \tag{5}$$
where $d_{\max}(i)$ is the adjusted maximum distance boundary for residue $i$, $d_{\text{base}}$ is the default baseline interaction distance, $|\Delta_i|$ is the absolute magnitude of the experimental HDX-MS protection signal (derived from the relative fractional uptake difference, see subsections 3.5 and B.2), and $w_s$ is a tunable scaling weight defining the sensitivity of the threshold to the experimental signal. This calculated threshold is clamped to a minimum of 3 Å. This formulation ensures that the spatial constraint remains proportional to the biological signal: residues exhibiting a large HDX-MS protection upon complex formation (a large $|\Delta_i|$) strictly shrink the $d_{\max}(i)$ boundary, receiving tighter distance bounds and forcing the model to bury them closer to the interaction interface during generation.
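A small numerical sketch of Eq. (5) with the 3 Å clamp is shown below. The values chosen for `d_base` and `w_s` are hypothetical, since the section does not state them, and the function name is illustrative.

```python
import numpy as np


def hdx_distance_threshold(delta, d_base, w_s, d_floor=3.0):
    """Eq. (5) with the lower clamp: d_max = max(d_floor, d_base * (1 - |delta| * w_s)).

    delta: HDX-MS protection signal(s), negative values indicating protection.
    d_base, w_s: baseline distance and sensitivity weight (hypothetical values below).
    """
    delta = np.asarray(delta, dtype=float)
    return np.maximum(d_floor, d_base * (1.0 - np.abs(delta) * w_s))


# Stronger protection (larger |delta|) gives a tighter distance bound,
# until the 3 Angstrom floor is reached.
print(hdx_distance_threshold([-0.1, -0.2, -0.9], d_base=10.0, w_s=2.0))
```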
#### Differentiable SASA\-based protection guidance
To model the solvent accessibility underlying HDX-MS protection more directly, we implement a differentiable burial potential. For each protected residue $i$, we compute a Gaussian-weighted neighbor count $\text{burial}_i$ as a differentiable proxy for Solvent Accessible Surface Area (SASA):
$$\text{burial}_i = \sum_{j \neq i} e^{-\frac{d_{ij}^2}{2\sigma^2}} \tag{6}$$
where $d_{ij}$ is the distance between atoms, and $\sigma$ is the width of the Gaussian kernel. A smaller $\sigma$ yields a tighter burial definition, while a larger value provides a broader receptive field. The burial value is then converted to a pseudo-SASA metric by using an exponential decay function relative to a reference burial constant $\text{burial}_{\text{ref}}$:
$$\text{SASA}_i = e^{-\frac{\text{burial}_i}{\text{burial}_{\text{ref}}}} \tag{7}$$
For each experimentally protected residue $i$, a quadratic loss $\mathcal{L}_i$ is applied if $\text{SASA}_i$ exceeds the protection threshold $\tau$:
$$\mathcal{L}_i = k \max(0, \text{SASA}_i - \tau)^2 \tag{8}$$
The total loss $\mathcal{L}$ is the sum of these penalties:
$$\mathcal{L} = \sum_{i \in \text{protected}} \mathcal{L}_i \tag{9}$$
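Equations (6) to (9) can be combined into a single function, sketched below with NumPy. This is an illustrative reading of the formulas, assuming one point per residue; the values of `sigma`, `burial_ref`, `tau`, and `k` are hypothetical placeholders, and an actual guidance implementation would need the gradient of this loss (for example via automatic differentiation).

```python
import numpy as np


def burial_loss(coords, protected, sigma, burial_ref, tau, k):
    """Pseudo-SASA burial loss following Eqs. (6)-(9).

    coords: (N, 3) array, one point per residue (simplifying assumption).
    protected: indices of experimentally protected residues.
    Returns the total loss and the per-residue pseudo-SASA values.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff**2).sum(axis=-1)
    weights = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(weights, 0.0)           # exclude j == i
    burial = weights.sum(axis=1)             # Eq. (6)
    sasa = np.exp(-burial / burial_ref)      # Eq. (7)
    excess = np.maximum(0.0, sasa[protected] - tau)
    return float(k * (excess**2).sum()), sasa  # Eqs. (8) and (9)


# An isolated residue is fully "exposed" (pseudo-SASA near 1) and is penalized.
exposed = np.array([[0.0, 0.0, 0.0], [1000.0, 0.0, 0.0]])
loss, sasa = burial_loss(exposed, [0], sigma=5.0, burial_ref=5.0, tau=0.5, k=2.0)
print(loss, sasa[0])
```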
### 3.4 Guidance scheduling
To ensure that the injection of physical priors does not destabilize the protein folding, we employ a piecewise timestep schedule over the reverse diffusion trajectory, progressing from pure noise to a fully denoised structure\. Applying strong guidance too early in the denoising process corrupts the generation, while applying it too late fails to influence the global topology\. Because cross\-linking mass spectrometry and hydrogen\-deuterium exchange influence the structure at different spatial resolutions, we decouple their respective guidance schedules\. Initially, the HDX\-MS guidance is disabled when the structure is too noisy for solvent accessible surface area to be structurally meaningful\. It is then applied to seed the burial of protected residues while the global fold remains highly fluid, and eventually ramps up to full strength to rigidly reinforce the experimental solvent accessibility profile as the global structure consolidates\. Conversely, spatial constraints derived from cross\-linking dictate the global arrangement of the complex\. To allow the base model to first establish local secondary structures without interference, this spatial guidance is delayed and is applied more sparsely to optimize inference speed\. It remains disabled during the initial unconstrained folding phase, is applied at a partial strength to gently draw the cross\-linked domains toward each other, and reaches full strength only in the final stages of diffusion to stabilize the interaction interfaces and satisfy the rigid distance boundaries\. Full timestep boundaries and hyperparameter configurations for these schedules are detailed in Appendix A\.
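The decoupled schedules described above could be expressed as piecewise-constant weights, as in the following sketch. The actual timestep boundaries and strengths are given in Appendix A and are not reproduced here, so every number below (`start`, `full`, `partial`, `XL_EVERY`) is a hypothetical placeholder chosen only to show the shape of the schedule.

```python
XL_EVERY = 2  # hypothetical: apply XL-MS guidance only every second step


def piecewise_weight(progress, start, full, partial):
    """Three-phase weight: off, partial strength, then full strength.

    progress: fraction of the reverse trajectory completed (0 = pure noise,
    1 = fully denoised).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    if progress < start:
        return 0.0
    if progress < full:
        return partial
    return 1.0


def guidance_weights(step, n_steps):
    """Return (hdx_weight, xl_weight) for a given sampling step."""
    progress = step / (n_steps - 1)
    # HDX-MS guidance switches on earlier than XL-MS guidance (placeholder values).
    hdx = piecewise_weight(progress, start=0.2, full=0.6, partial=0.3)
    xl = piecewise_weight(progress, start=0.4, full=0.8, partial=0.5)
    if step % XL_EVERY != 0:
        xl = 0.0  # sparser application of the spatial guidance
    return hdx, xl


for step in (0, 50, 51, 100):
    print(step, guidance_weights(step, n_steps=101))
```

Keeping the two schedules as separate calls makes it straightforward to tune the HDX-MS and XL-MS phases independently.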
### 3.5 Constraints derivation from raw data
To effectively guide the diffusion trajectory, raw mass spectrometry measurements must be translated into actionable constraints\. We implemented processing pipelines for XL\-MS and HDX\-MS data, followed by an iterative subsetting strategy to resolve contradictory signals\.
Cross\-linking mass spectrometry constraints are derived directly from normalized, high\-confidence MS intensity data\. Positive constraints are assigned to residue pairs that exhibit significantly enriched intensity in the target biological state\. Conversely, negative constraints are assigned to residue pairs that exhibit significantly lower intensity in one experimental condition compared to another\. These act as repulsive spatial bounds, applying a gradient penalty to push the respective residues beyond the maximal physical length of the cross\-linker\. Hydrogen\-deuterium exchange constraints are derived by comparing the fractional uptake between the protein complex and the isolated binary or apo states\.
Because experimental MS data inherently contains noise, and because it captures dynamic ensembles where it is unknown a priori which specific constraints originate from the same discrete conformation, we implemented an iterative constraint subsetting strategy\. Rather than applying all constraints simultaneously, the constraint pool is partitioned into numerous subsets\. The model performs parallel guided generations across these subsets, and we assess constraint satisfaction for each resulting structure \(more details in section 4\.1\)\. Subsets that yield high satisfaction rates are preserved and combined\. This iterative process of generation, evaluation, and recombination effectively prunes noisy or contradictory signals, ultimately converging on a consistent constraint subset for the final, high\-accuracy structure generation\.
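The generate, evaluate, and recombine loop can be sketched as below. This is a loose illustration of the idea only: the partitioning rule, the keep threshold, and the stopping criterion are assumptions, and `satisfaction_fn` is a hypothetical stand-in for "run guided generation with this subset and return the fraction of its constraints that the resulting structure satisfies".

```python
import random


def iterative_subsetting(constraints, satisfaction_fn, subset_size=2,
                         keep_threshold=1.0, max_rounds=5, seed=0):
    """Prune a constraint pool by repeatedly partitioning and re-scoring it.

    satisfaction_fn(subset) -> float in [0, 1]; in practice this would wrap a
    guided generation run (hypothetical callback, not an AIMS-Fold API).
    """
    rng = random.Random(seed)
    pool = list(constraints)
    for _ in range(max_rounds):
        rng.shuffle(pool)
        subsets = [pool[k:k + subset_size] for k in range(0, len(pool), subset_size)]
        kept = [s for s in subsets if satisfaction_fn(s) >= keep_threshold]
        survivors = [c for s in kept for c in s]
        # Stop if nothing passes (keep current pool) or everything passes (converged).
        if not survivors or len(survivors) == len(pool):
            break
        pool = survivors
    return pool


def toy_satisfaction(subset):
    """Toy scorer: positive integers stand for mutually consistent constraints."""
    return sum(c > 0 for c in subset) / len(subset)


print(sorted(iterative_subsetting([1, 2, 3, 4, -1, -2], toy_satisfaction)))
```

In this toy setting a subset containing an inconsistent constraint is discarded as a whole, which also drops its consistent members; a fuller scheme would presumably re-partition the discarded constraints and retry them.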
### 3.6 Related work
Historically, MS-derived experimental restraints have been integrated using docking and integrative modeling platforms such as HADDOCK\[[7](https://arxiv.org/html/2605.26192#bib.bib25)\], RosettaHDX\[[36](https://arxiv.org/html/2605.26192#bib.bib35)\], DOT2\[[27](https://arxiv.org/html/2605.26192#bib.bib36)\] and HDXRank\[[38](https://arxiv.org/html/2605.26192#bib.bib34)\]. These methods typically use XL-MS distance bounds and HDX-MS protection factors as scoring functions to filter and rank large numbers of generated candidate structures\[[28](https://arxiv.org/html/2605.26192#bib.bib28)\]. These approaches rely heavily on rigid-body docking or limited flexible refinement and scale poorly for highly dynamic, multi-state complexes such as PROTAC ternary structures\[[4](https://arxiv.org/html/2605.26192#bib.bib13)\].
To bridge the gap between AI structure prediction and MS data, recent work has attempted to integrate experimental restraints directly into neural network architectures\. AlphaLink1\[[33](https://arxiv.org/html/2605.26192#bib.bib4)\]and AlphaLink2\[[34](https://arxiv.org/html/2605.26192#bib.bib1)\]successfully integrate XL\-MS data to improve the prediction of challenging protein\-protein interactions and antibody\-antigen complexes\. However, these methods treat cross\-links as explicit input features, requiring extensive retraining or fine\-tuning of the underlying AlphaFold architecture \(e\.g\., modifying the pair representation\), which limits their flexibility\. In contrast, AIMS\-Fold requires no model retraining as it is an inference\-time guidance method\.
The concept of using a pretrained diffusion model as a structural regularizer for experimental data was recently demonstrated by CryoBoltz\[[26](https://arxiv.org/html/2605.26192#bib.bib19)\]\. CryoBoltz applies inference\-time guidance to Boltz\-1, steering the sampling trajectory to minimize the distance between the predicted structure and a 3D point cloud representation of a cryo\-EM density map\.
## 4 Results
To evaluate AIMS\-Fold, we conducted benchmarking across both synthetic and experimental datasets\. Our evaluation framework is designed to first establish the MS\-guided diffusion on clean cases where the data is ideal, followed by validation on noisy, heterogeneous experimental data captured from challenging induced proximity complexes\. Furthermore, we evaluated our MS\-guided generation against a post\-hoc filtering approach, where dozens of unconstrained candidates are generated and subsequently ranked by their agreement with the structural MS data\.
Table 1: Quantitative evaluation of AIMS-Fold and Boltz-2 (Unguided) on different cases. The table details the percentage of satisfied constraints, categorized by constraint type, for each evaluated case. To identify the best-of-5 prediction, we ranked the generated models by the highest percentage of fulfilled experimental constraints. In the event of a tie, the model with the highest DockQ score was selected. If a reference structure was unavailable, we defaulted to the first generated model among the tied candidates. The average percentage of satisfied constraints across all 5 generated samples is also indicated together with the standard deviation. Bold values highlight a substantial improvement over the baseline model.

| Case | Model | Best of 5: HDX-MS satisfied (%) | Best of 5: XL-MS satisfied (%) | Best of 5: DockQ ↑ | Best of 5: lRMSD (Å, ↓) | Best of 5: iRMSD (Å, ↓) | Avg-5: HDX-MS satisfied (%) | Avg-5: XL-MS satisfied (%) |
|---|---|---|---|---|---|---|---|---|
| BRD4-CRBN | Unguided | 20 | - | 0.09 (Incorrect) | 20.3 | 9 | 20 | - |
| BRD4-CRBN | HDX-MS | 100 | - | 0.28 (Acceptable) | 6.5 | 3 | 92 ± 16 | - |
| WDR5-DCAF1 | Unguided | - | 25 | 0.05 (Incorrect) | 26.7 | 10.9 | - | 20 ± 11.2 |
| WDR5-DCAF1 | XL-MS | - | 100 | 0.44 (Acceptable) | 6.3 | 2 | - | 82.6 ± 11.2 |
| KAT6-CRBN | Unguided | 40 | 63 | - | no available reference | - | 40 | 51 ± 8.9 |
| KAT6-CRBN | HDX-MS | 70 | 50 | | | | 68 ± 4.2 | 49 ± 3.8 |
| KAT6-CRBN | XL-MS | 60 | 88 | | | | 50 ± 6.7 | 83 ± 6.7 |
| KAT6-CRBN | HDX-MS + XL-MS | 70 | 88 | | | | 67 ± 4.8 | 84 ± 6.3 |
| PTPN2-CRBN | Unguided | 0 | - | 0.03 (Incorrect) | 31.5 | 20.0 | 0 | - |
| PTPN2-CRBN | HDX-MS | 50 | - | 0.31 (Acceptable) | 7.0 | 3.6 | 40 ± 20 | - |
| PD1-Nivolumab | Unguided | 0 | 0 | 0.21 (Incorrect) | 68.3 | 28 | 0 | 0 |
| PD1-Nivolumab | HDX-MS | 100 | 100 | 0.58 (Medium) | 4.4 | 2.3 | 78 ± 5.8 | 80 ± 17.1 |
| PD1-Nivolumab | HDX-MS + XL-MS | 100 | 100 | 0.70 (Medium) | 3.4 | 1.6 | 100 | 100 |
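The best-of-5 selection rule stated in the Table 1 caption can be written as a short function. The dictionary keys (`name`, `satisfied`, `dockq`) are hypothetical; the logic follows the caption: rank by constraint satisfaction, break ties by DockQ when a reference exists, and otherwise take the first generated model among the tied candidates.

```python
def select_best(models):
    """Pick the best prediction from a list given in generation order.

    Each model is a dict with 'satisfied' (percent of constraints fulfilled)
    and 'dockq' (float, or None when no reference structure is available).
    """
    top = max(m["satisfied"] for m in models)
    tied = [m for m in models if m["satisfied"] == top]
    if all(m.get("dockq") is not None for m in tied):
        return max(tied, key=lambda m: m["dockq"])  # tie broken by DockQ
    return tied[0]  # no reference: first generated model among the ties


with_reference = [
    {"name": "m1", "satisfied": 80, "dockq": 0.30},
    {"name": "m2", "satisfied": 100, "dockq": 0.40},
    {"name": "m3", "satisfied": 100, "dockq": 0.60},
]
print(select_best(with_reference)["name"])
```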
### 4.1 Experimental setup and data curation
For the synthetic benchmarks, we curated a set of protein complexes, including PROTAC ternary complexes and protein\-antibody complexes, from the Protein Data Bank \(PDB\)\. We selected structures deposited after June 1, 2023 to ensure they were excluded from the Boltz\-2 training set to prevent data leakage\. Using these ground\-truth structures, we simulated both XL\-MS and HDX\-MS data\. Synthetic cross\-links were generated by identifying target residue pairs \(e\.g\., Lysine\-Lysine\) that fall within a strictly defined spatial threshold, accurately mimicking physical cross\-linker arm lengths\. Synthetic HDX\-MS protection factors were derived by calculating the theoretical Solvent Accessible Surface Area \(SASA\) directly from the experimental coordinates\. For our experimental evaluations, we utilized processed XL\-MS and HDX\-MS data, combining datasets curated from the literature with novel data generated in\-house\. We then translated these discrete experimental measurements into continuous distance constraints and surface protection patterns\.
To evaluate experimental agreement and structural integrity of our models, we use several metrics that capture both global agreement and local satisfaction of constraints. When a ground-truth structure is available, we assess docking accuracy using DockQ\[[3](https://arxiv.org/html/2605.26192#bib.bib12)\]. For highly dynamic complexes, such as PROTAC-induced ternary structures, DockQ can be misleading, so we also report Ligand Root Mean Square Deviation (lRMSD) and Interface Root Mean Square Deviation (iRMSD) to provide a general view of the structural accuracy. To evaluate the protection patterns of proteins, we calculate the change in the Relative Solvent Accessible Surface Area (ΔRSA), where RSA is the absolute SASA divided by the maximum SASA for each amino acid\[[35](https://arxiv.org/html/2605.26192#bib.bib38)\]. We assess the change in protection by calculating the RSA difference between the monomeric and complex states. An HDX-MS constraint is considered satisfied when ΔRSA ≥ 0.05 (a higher delta means more protection). The overall score is the percentage of constrained residues meeting this threshold. For XL-MS data, we calculate the Euclidean distance between the Cα atoms of each pair of residues. Thresholds for satisfaction were set based on the cross-linker type. A positive constraint is satisfied when the measured distance is less than or equal to the threshold, while a negative constraint is satisfied when the measured distance is larger than the defined threshold. We evaluate overall satisfaction as the percentage of pairs that meet their respective thresholds.
The prediction tasks were run on an AWS EC2 g5.8xlarge instance with 32 vCPUs, 64 GiB RAM, and an NVIDIA A10G GPU.
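The two satisfaction scores defined above can be computed as in the sketch below. Function names, the tuple layout of the constraints, and the example thresholds are assumptions made for illustration; RSA values are taken as precomputed inputs, and ΔRSA is computed as monomer minus complex so that a larger value means more protection.

```python
import numpy as np


def xl_satisfaction(ca_coords, constraints):
    """Percent of XL-MS constraints satisfied.

    constraints: list of (i, j, threshold, is_negative) tuples (assumed layout).
    Positive: C-alpha distance <= threshold. Negative: distance > threshold.
    """
    satisfied = 0
    for i, j, threshold, is_negative in constraints:
        d = np.linalg.norm(ca_coords[i] - ca_coords[j])
        ok = d > threshold if is_negative else d <= threshold
        satisfied += bool(ok)
    return 100.0 * satisfied / len(constraints)


def hdx_satisfaction(rsa_monomer, rsa_complex, residues, min_delta=0.05):
    """Percent of protected residues with delta-RSA >= min_delta."""
    delta = np.asarray(rsa_monomer)[residues] - np.asarray(rsa_complex)[residues]
    return 100.0 * float(np.mean(delta >= min_delta))


ca = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [40.0, 0.0, 0.0]])
xl = [(0, 1, 30.0, False), (0, 2, 30.0, True), (1, 2, 20.0, False)]
print(xl_satisfaction(ca, xl))
print(hdx_satisfaction([0.5, 0.5, 0.2], [0.3, 0.49, 0.2], residues=[0, 1]))
```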
### 4.2 Performance on synthetic MS data
BRD4-PROTAC-CRBN. The ternary structure of BRD4-CRBN was shown to adopt distinct conformations depending on the PROTAC used. While earlier crystal structures such as PDB: 6BOY (with the dBET6 PROTAC) established a specific conformation\[[21](https://arxiv.org/html/2605.26192#bib.bib23)\], a more recent crystal structure, PDB: 8RQ9, confirmed that CFT1297 stabilizes a different conformation\[[15](https://arxiv.org/html/2605.26192#bib.bib26)\]. Despite the experimental evidence, Boltz-2 defaults to the incorrect PDB: 6BOY pose when predicting the complex with CFT1297. We created synthetic HDX-MS data based on the difference between the two crystal structures (6BOY vs. 8RQ9). Using 5 residues chosen as constraints by the constraint subsetting algorithm, the model was able to predict the correct ternary orientation while satisfying all of the HDX-MS constraints. The guidance improved the lRMSD from 20.3 Å to 6.5 Å (Table 1 and Figure S2).
WDR5-PROTAC-DCAF1. Recent structural data demonstrate that active and inactive PROTACs induce distinctly different orientations of WDR5 relative to DCAF1\[[18](https://arxiv.org/html/2605.26192#bib.bib22)\]. We found that Boltz-2 fails to capture this activity-dependent shift, incorrectly defaulting to an inactive conformation even when tasked with predicting an active ternary complex (Figure S1). While applying either positive or negative constraints improved the performance of the model, guiding the prediction with a combination of both yielded the best scores and successfully rescued the active target state, driving the iRMSD down from 10.9 Å to 2 Å (Figure S1).
### 4.3 Performance on experimental MS data
KAT6A-PROTAC-CRBN. Since this complex lacks solved experimental structures, we generated in-house HDX-MS and XL-MS data, differentiating between active and inactive compounds, to steer our model toward the productive conformation. We found that using HDX-MS constraints alone successfully satisfies the local protection data, improving the HDX-MS satisfaction of the best prediction from 40% to 70%, but does not improve the global XL-MS fit (Table 1). Guiding our model with XL-MS constraints alone improves satisfaction of both sets of data. Layering both HDX-MS and XL-MS together best satisfies the constraints, with the top model fulfilling 70% of the HDX-MS and 88% of the XL-MS constraints, an improvement that holds up consistently across the 5-model averages.
Figure 2: PTPN2-PROTAC-CRBN co-folding. (a) The AIMS-Fold predicted conformation partially agrees with the HDX-MS data. (b) The ground truth selected is the cryo-EM initialized MD simulation frame that best agrees with the HDX-MS data. (c) Boltz-2 predicts an incorrect conformation that does not agree with the experimental data.
PTPN2-PROTAC-CRBN. We used the cryo-EM ternary complex (PDB: 8UH6) as a starting structural model. Static cryo-EM structures do not always capture intrinsic protein flexibility in solution, leading to discrepancies with solution-phase HDX-MS data. Since the investigators in\[[12](https://arxiv.org/html/2605.26192#bib.bib11)\] note substantial flexibility of the complex and low agreement between cryo-EM and HDX-MS data, they performed MD simulations to resolve the discrepancies. We evaluated several frames to identify the most representative one, by screening for the frame that satisfied the highest number of HDX-MS protected residues (defining a residue as protected if its Buried Surface Area was > 5 Å²). This best-fit frame served as the reference for the quantitative benchmarks, including DockQ, lRMSD, and iRMSD calculations.
While an unguided Boltz-2 prediction failed to capture the correct orientation (iRMSD = 19.8 Å), AIMS-Fold successfully utilized the HDX-MS constraints to find the correct conformation, yielding an iRMSD of 3.6 Å against the selected reference frame (Figure 2, Table 1). However, as noted in Figure 2a and Table 1, the guided model satisfies only 50% of constraints, highlighting a strong inherent training bias that resists extreme conformational shifts, and a specific area for future refinement.
PD-1-Nivolumab. Nivolumab is a cornerstone of cancer immunotherapy that functions by binding to the Programmed Cell Death Protein 1 (PD-1), preventing the deactivation of the host immune response. Boltz-2 fails to accurately predict the binding interface between the two, incorrectly positioning PD-1 on the wrong side of Nivolumab (Figure 3). These spatial errors result in poor DockQ and high iRMSD and lRMSD values when compared to the crystal structure (PDB: 5WT9, Table 1), underscoring the complexity of antibody-antigen complex predictions.
To address this, we derived HDX-MS constraint subsets (as described in Methods Section 3.5) as well as XL-MS constraints from previously published data\[[46](https://arxiv.org/html/2605.26192#bib.bib18)\]. While one of the five HDX-MS-guided predictions satisfied all of the constraints, incorporating both HDX-MS and XL-MS data refined the modeling to satisfy 100% of the constraints in all 5 generated models, with DockQ scores reaching up to 0.70 and iRMSD values of 1.6 Å (Table 1 and Figures S3, S4).
Figure 3: PD-1-Nivolumab co-folding. The guided model covers the correct interface area specified by the HDX-MS data and agrees with the crystal structure. The unconstrained model, on the other hand, places the interface on the opposite side.
### 4.4 Guided generation vs. post-hoc filtering
The default strategy for integrating experimental data into computational modeling has been to use structural constraints to score and filter a large ensemble of naive predictions after generation. However, this post-hoc filtering strategy relies on the assumption that the unconstrained generative model actually sampled the correct conformation. In vast and highly flexible conformational spaces, such as PROTAC ternary structures, unconstrained models can collapse into global energy minima or static states memorized during training. If the model never explores the structural basin indicated by the proteomics data, generating and filtering thousands of decoy models will only yield invalid results. Overcoming this fundamental limitation requires guided generation. Rather than passively relying on stochastic sampling to stumble upon the correct structure, inference-time steering reshapes the probability landscape throughout the generation process, directing sampling toward biologically valid conformational basins that satisfy the experimental constraints.
Figure 4: Post-hoc filtering is insufficient for detecting the correct conformation. (A) For the KAT6A-PROTAC-CRBN complex, 10 guided models are compared to 50 Boltz-2 predictions across 5 different seeds. Rows represent models, columns represent constraint types, and color indicates the percentage of constraints fulfilled. The models are sorted by the average percentage of constraints satisfied. All naive models are ranked below the 10 guided ones. (B) Swarm plot of DockQ scores for the PD-1-Nivolumab complex, comparing 5 guided AIMS-Fold models against 100 naive Boltz-2 predictions generated across 10 different seeds, evaluated against the ground-truth crystal structure (PDB: 5WT9).

To demonstrate this, we compared 10 KAT6A-PROTAC-CRBN predictions generated by our model against 50 unguided predictions generated by Boltz-2 across 5 different random seeds. These models were ranked by their average satisfaction across three experimental constraint types: positive XL-MS, negative XL-MS, and HDX-MS. The 10 guided models occupied the top 10 positions, clearly outperforming all 50 unconstrained predictions (Figure 4A). However, because no ground-truth experimental structure exists for the KAT6A ternary complex, and constraint satisfaction alone cannot definitively prove structural accuracy, we sought to confirm this fundamental limitation of unguided generation against a known structural reference. Returning to the PD-1-Nivolumab complex, where our guided models successfully captured the crystal-structure conformation, we tested whether extensive unguided sampling could eventually stumble upon the correct state. We generated 100 unguided Boltz-2 predictions across 10 independent seeds. Consistent with the KAT6A results, the naive model completely failed to sample the correct structural basin: the 5 AIMS-Fold models scored substantially higher than all 100 unguided predictions (Figure 4B).
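The ranking used for Figure 4A can be sketched in a few lines. The example below assumes that the percentage of satisfied constraints has already been computed per model and per constraint type; the dictionary keys, model names, and numbers are made up for illustration and do not reproduce the paper's data.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical keys for the three constraint types named in the text.
CONSTRAINT_TYPES = ("xl_positive", "xl_negative", "hdx")


def rank_by_satisfaction(models: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort models by their mean % of satisfied constraints, best first."""
    scored = [
        (name, mean(sat[c] for c in CONSTRAINT_TYPES))
        for name, sat in models.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def best_after_filtering(ranked: List[Tuple[str, float]], pool_prefix: str) -> Tuple[str, float]:
    """Best-ranked model from one pool: the most that post-hoc filtering can return."""
    return next(item for item in ranked if item[0].startswith(pool_prefix))


# Made-up satisfaction percentages, for illustration only.
toy_models = {
    "guided_1": {"xl_positive": 100, "xl_negative": 90, "hdx": 80},
    "guided_2": {"xl_positive": 90, "xl_negative": 90, "hdx": 60},
    "unguided_1": {"xl_positive": 20, "xl_negative": 70, "hdx": 30},
    "unguided_2": {"xl_positive": 0, "xl_negative": 60, "hdx": 30},
}
ranked = rank_by_satisfaction(toy_models)
best_unguided = best_after_filtering(ranked, "unguided")
```

The second helper makes the argument of this section concrete: filtering can only return the best member of the pool that was sampled, so if every unguided model scores poorly, enlarging the pool does not help unless the sampler itself changes.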
## 5 Discussion
In this work, we introduced AIMS\-Fold, demonstrating the first systematic approach to combining structural proteomics and diffusion\-based AI models for protein structure solving, enabling rational drug design\. Across our benchmarks, AIMS\-Fold significantly increased structural prediction accuracy for highly flexible, induced proximity and antibody based systems\. Crucially, we demonstrated that unconstrained AI models, such as Boltz\-2, predict static states that are heavily biased by their PDB training distributions\[[16](https://arxiv.org/html/2605.26192#bib.bib3),[30](https://arxiv.org/html/2605.26192#bib.bib39)\]\. Consequently, they can overlook biologically active conformations\[[41](https://arxiv.org/html/2605.26192#bib.bib47)\]\. By actively steering the generative trajectory, the model successfully identifies these overlooked conformations\. Furthermore, we showed that XL\-MS and HDX\-MS can provide complementary constraints\[[4](https://arxiv.org/html/2605.26192#bib.bib13)\]\.
The implications of this framework for rational drug design are profound, particularly for PROTACs and molecular glues, because their efficacy is driven by the dynamic assembly of a ternary complex (e.g., POI-E3-PROTAC) rather than by the binary affinity of the compound for the protein \[[31](https://arxiv.org/html/2605.26192#bib.bib14),[21](https://arxiv.org/html/2605.26192#bib.bib23)\]. Traditional experimental approaches, such as X-ray crystallography and cryo-EM, are low-throughput, costly, and static, freezing the dynamics of the complex \[[11](https://arxiv.org/html/2605.26192#bib.bib6),[4](https://arxiv.org/html/2605.26192#bib.bib13)\]. Likewise, computational methods such as molecular docking or molecular dynamics suffer from insufficient sampling and high computational costs \[[8](https://arxiv.org/html/2605.26192#bib.bib20),[42](https://arxiv.org/html/2605.26192#bib.bib8)\].
AIMS\-Fold overcomes these limitations by utilizing MS data to capture proteins in their native, dynamic states at a higher throughput\. For instance, in the rational design of PROTACs, it is well established that even minor modifications to a linker can completely obliterate target degradation by altering ternary complex cooperativity and by disrupting the ubiquitination zone\[[21](https://arxiv.org/html/2605.26192#bib.bib23)\]\. Standard, unconstrained AI prediction models fail to capture these subtle structural modifications, while AIMS\-Fold accurately models these PROTAC\-dependent conformational shifts\.
Despite these advancements, our approach has notable limitations regarding both experimental data acquisition and the underlying computational model. Experimentally, AIMS-Fold is constrained by MS peptide affinity and sequence coverage. XL-MS applicability depends heavily on the prevalence and accessibility of specific residues, particularly lysines, at the protein-protein interface \[[47](https://arxiv.org/html/2605.26192#bib.bib48),[44](https://arxiv.org/html/2605.26192#bib.bib49)\]. If these residues are absent, XL-MS utilization is restricted. Furthermore, XL-MS requires the cross-linked residues to fall within a restrictive spatial distance (typically < 30 Å) \[[19](https://arxiv.org/html/2605.26192#bib.bib16)\]. Similarly, HDX-MS is limited by peptide-level sequence coverage and resolution \[[25](https://arxiv.org/html/2605.26192#bib.bib50)\]. Because HDX-MS yields peptide-level rather than atomic-level resolution, it can introduce noise into the guidance gradients: protection patterns may reflect adjacent residues rather than those directly involved in binding, and vice versa. Computationally, AIMS-Fold remains inherently limited by the trained weights and biases of the base Boltz architecture. If the experimental MS data point to a conformational state that deviates too far from the model's training data, the underlying neural network can resist the applied steering potentials, leading to suboptimal constraint satisfaction. Lastly, while AIMS-Fold is potentially more accurate than Boltz-2, it requires the generation of complex experimental data, which may not always be feasible.
Mitigating these limitations will be the focus of future research\. The reliance on lysine\-specific cross\-linking can be addressed by incorporating orthogonal chemistries that target acidic residues or provide non\-specific mapping\[[17](https://arxiv.org/html/2605.26192#bib.bib21),[22](https://arxiv.org/html/2605.26192#bib.bib41)\]\. Computationally, while the current iteration of AIMS\-Fold generates highly accurate singular structures, the comprehensive understanding of ternary complexes requires exploring broad conformational ensembles\. In future work, we aim to leverage the continuous and dynamic nature of HDX\-MS data to reveal multiple distinct, dynamic conformations of these complexes\.
## 6 Declaration of Interests
All authors are employees and shareholders of Protai Bio, Ramat Gan, Israel.
## References
- \[1\]J\. Abramson, J\. Adler, J\. Dunger, R\. Evans, T\. Green, A\. Pritzel, O\. Ronneberger, L\. Willmore, A\. J\. Ballard, J\. Bambrick, S\. W\. Bodenstein, D\. A\. Evans, C\. Hung, M\. O’Neill, D\. Reiman, K\. Tunyasuvunakool, Z\. Wu, A\. Žemgulytė, E\. Arvaniti, C\. Beattie, O\. Bertolli, A\. Bridgland, A\. Cherepanov, M\. Congreve, A\. I\. Cowen\-Rivers, A\. Cowie, M\. Figurnov, F\. B\. Fuchs, H\. Gladman, R\. Jain, Y\. A\. Khan, C\. M\. R\. Low, K\. Perlin, A\. Potapenko, P\. Savy, S\. Singh, A\. Stecula, A\. Thillaisundaram, C\. Tong, S\. Yakneen, E\. D\. Zhong, M\. Zielinski, A\. Žídek, V\. Bapst, P\. Kohli, M\. Jaderberg, D\. Hassabis, and J\. M\. Jumper\(2024\-06\)Accurate structure prediction of biomolecular interactions with AlphaFold 3\.Nature630\(8016\),pp\. 493–500\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p1.2)\.
- \[2\]\(1982\-05\)Reverse\-time diffusion equation models\.Stoch\. Process\. Their Appl\.12\(3\),pp\. 313–326\(en\)\.Cited by:[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p3.1)\.
- \[3\]S\. Basu and B\. Wallner\(2016\-08\)DockQ: a quality measure for Protein\-Protein docking models\.PLOS ONE11\(8\),pp\. e0161879\.Cited by:[§4\.1](https://arxiv.org/html/2605.26192#S4.SS1.p2.4)\.
- \[4\]M\. Békés, D\. R\. Langley, and C\. M\. Crews\(2022\-03\)PROTAC targeted protein degraders: the past is prologue\.Nat Rev Drug Discov21\(3\),pp\. 181–200\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1),[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1),[§5](https://arxiv.org/html/2605.26192#S5.p1.1),[§5](https://arxiv.org/html/2605.26192#S5.p2.1)\.
- \[5\]P\. P\. Chamberlain and L\. G\. Hamann\(2019\-10\)Development of targeted protein degradation therapeutics\.Nat Chem Biol15\(10\),pp\. 937–944\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[6\]H\. Chung, J\. Kim, M\. T\. Mccann, M\. L\. Klasky, and J\. C\. Ye\(2024\-05\)Diffusion posterior sampling for general noisy inverse problems\.External Links:2209\.14687Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p4.1),[§2\.2](https://arxiv.org/html/2605.26192#S2.SS2.p1.2),[§3\.1](https://arxiv.org/html/2605.26192#S3.SS1.p1.1)\.
- \[7\]C\. Dominguez, R\. Boelens, and A\. M\. J\. J\. Bonvin\(2003\-02\)HADDOCK: a protein\-protein docking approach based on biochemical or biophysical information\.J Am Chem Soc125\(7\),pp\. 1731–1737\(en\)\.Cited by:[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1)\.
- \[8\]M\. L\. Drummond and C\. I\. Williams\(2019\-04\)In silico modeling of PROTAC\-Mediated ternary complexes: validation and application\.J Chem Inf Model59\(4\),pp\. 1634–1644\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p2.1)\.
- \[9\]N\. Dunlop, F\. Erazo, F\. Jalalypour, and R\. Mercado\(2025\)Predicting PROTAC\-mediated ternary complexes with AlphaFold3 and boltz\-1\.Digit\. Discov\.4\(12\),pp\. 3782–3809\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p2.1)\.
- \[10\]B\. Efron\(2011\)Tweedie’s formula and selection bias\.J Am Stat Assoc106\(496\),pp\. 1602–1614\(en\)\.Cited by:[§2\.2](https://arxiv.org/html/2605.26192#S2.SS2.p1.2)\.
- \[11\]M\. S\. Gadd, A\. Testa, X\. Lucas, K\. Chan, W\. Chen, D\. J\. Lamont, M\. Zengerle, and A\. Ciulli\(2017\-05\)Structural basis of PROTAC cooperative recognition for selective protein degradation\.Nat Chem Biol13\(5\),pp\. 514–521\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p2.1)\.
- \[12\]Q\. Hao, M\. K\. Rathinaswamy, K\. L\. Klinge, M\. Bratkowski, A\. Mafi, C\. K\. Baumgartner, K\. M\. Hamel, G\. K\. Veits, R\. Jain, C\. Catalano, M\. Fitzgerald, A\. W\. Hird, E\. Park, H\. U\. Vora, J\. A\. Henderson, K\. Longenecker, C\. W\. Hutchins, W\. Qiu, G\. Scapin, Q\. Sun, V\. S\. Stoll, C\. Sun, P\. Li, D\. Eaton, D\. Stokoe, S\. L\. Fisher, C\. G\. Nasveschuk, M\. Paddock, and M\. E\. Kort\(2024\-08\)Mechanistic insights into a heterobifunctional degrader\-induced PTPN2/N1 complex\.Commun Chem7\(1\),pp\. 183\(en\)\.Cited by:[§4\.3](https://arxiv.org/html/2605.26192#S4.SS3.p2.2)\.
- \[13\]S\. J\. Hughes and A\. Ciulli\(2017\-11\)Molecular recognition of ternary complexes: a new dimension in the structure\-guided design of chemical degraders\.Essays Biochem\.61\(5\),pp\. 505–516\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[14\]M\. Ignatov, A\. Jindal, S\. Kotelnikov, D\. Beglov, G\. Posternak, X\. Tang, P\. Maisonneuve, G\. Poda, R\. A\. Batey, F\. Sicheri, A\. Whitty, P\. J\. Tonge, S\. Vajda, and D\. Kozakov\(2023\-04\)High accuracy prediction of PROTAC complex structures\.J\. Am\. Chem\. Soc\.145\(13\),pp\. 7123–7135\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[15\]A\. Kroupova, V\. A\. Spiteri, Z\. J\. Rutter, H\. Furihata, D\. Darren, S\. Ramachandran, S\. Chakraborti, K\. Haubrich, J\. Pethe, D\. Gonzales, A\. J\. Wijaya, M\. Rodriguez\-Rios, M\. Sturbaut, D\. M\. Lynch, W\. Farnaby, M\. A\. Nakasone, D\. Zollman, and A\. Ciulli\(2024\-10\)Design of a cereblon construct for crystallographic and biophysical studies of protein degraders\.Nature Communications15\(1\),pp\. 8885\(en\)\.Cited by:[§4\.2](https://arxiv.org/html/2605.26192#S4.SS2.p1.1)\.
- \[16\]T\. J\. Lane\(2023\-02\)Protein structure prediction has reached the single\-structure frontier\.Nat Methods20\(2\),pp\. 170–173\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p2.1),[§1](https://arxiv.org/html/2605.26192#S1.p3.1),[§5](https://arxiv.org/html/2605.26192#S5.p1.1)\.
- \[17\]K\. Lee and F\. J\. O’Reilly\(2023\-03\)Cross\-linking mass spectrometry for mapping protein complex topologies in situ\.Essays Biochem67\(2\),pp\. 215–228\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p3.1),[§5](https://arxiv.org/html/2605.26192#S5.p5.1)\.
- \[18\]M\. F\. Mabanglo, B\. Wilson, M\. Noureldin, S\. W\. Kimani, A\. Mamai, C\. Krausser, H\. González\-Álvarez, S\. Srivastava, M\. Mohammed, L\. Hoffer, M\. Chan, J\. Avrumutsoae, A\. S\. M\. Li, T\. Hajian, S\. Tucker, S\. Green, M\. Szewczyk, D\. Barsyte\-Lovejoy, V\. Santhakumar, S\. Ackloo, P\. Loppnau, Y\. Li, A\. Seitova, T\. Kiyota, J\. G\. Wang, G\. G\. Privé, D\. A\. Kuntz, B\. Patel, V\. Rathod, A\. Vala, B\. Rout, A\. Aman, G\. Poda, D\. Uehling, J\. Ramnauth, L\. Halabelian, R\. Marcellus, R\. Al\-Awar, and M\. Vedadi\(2024\-11\)Crystal structures of DCAF1\-PROTAC\-WDR5 ternary complexes provide insight into DCAF1 substrate specificity\.Nat Commun15\(1\),pp\. 10165\(en\)\.Cited by:[§4\.2](https://arxiv.org/html/2605.26192#S4.SS2.p2.1)\.
- \[19\]G\. R\. Masson, J\. E\. Burke, N\. G\. Ahn, G\. S\. Anand, C\. Borchers, S\. Brier, G\. M\. Bou\-Assaf, J\. R\. Engen, S\. W\. Englander, J\. Faber, R\. Garlish, P\. R\. Griffin, M\. L\. Gross, M\. Guttman, Y\. Hamuro, A\. J\. R\. Heck, D\. Houde, R\. E\. Iacob, T\. J\. D\. Jørgensen, I\. A\. Kaltashov, J\. P\. Klinman, L\. Konermann, P\. Man, L\. Mayne, B\. D\. Pascal, D\. Reichmann, M\. Skehel, J\. Snijder, T\. S\. Strutzenberg, E\. S\. Underbakke, C\. Wagner, T\. E\. Wales, B\. T\. Walters, D\. D\. Weis, D\. J\. Wilson, P\. L\. Wintrode, Z\. Zhang, J\. Zheng, D\. C\. Schriemer, and K\. D\. Rand\(2019\-07\)Recommendations for performing, interpreting and reporting hydrogen deuterium exchange mass spectrometry \(HDX\-MS\) experiments\.Nat Methods16\(7\),pp\. 595–602\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p3.1),[§5](https://arxiv.org/html/2605.26192#S5.p4.1)\.
- \[20\]K\. Ngo, P\. Yang, V\. Yarov\-Yarovoy, C\. E\. Clancy, and I\. Vorobyov\(2025\-07\)Harnessing AlphaFold to reveal hERG channel conformational state secrets\.Elife13\(RP104901\),pp\. RP104901\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p2.1)\.
- \[21\]R\. P\. Nowak, S\. L\. DeAngelo, D\. Buckley, Z\. He, K\. A\. Donovan, J\. An, N\. Safaee, M\. P\. Jedrychowski, C\. M\. Ponthier, M\. Ishoey, T\. Zhang, J\. D\. Mancias, N\. S\. Gray, J\. E\. Bradner, and E\. S\. Fischer\(2018\-07\)Plasticity in binding confers selectivity in ligand\-induced protein degradation\.Nat Chem Biol14\(7\),pp\. 706–714\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1),[§4\.2](https://arxiv.org/html/2605.26192#S4.SS2.p1.1),[§5](https://arxiv.org/html/2605.26192#S5.p2.1),[§5](https://arxiv.org/html/2605.26192#S5.p3.1)\.
- \[22\]F\. J\. O’Reilly and J\. Rappsilber\(2018\-11\)Cross\-linking mass spectrometry: methods and applications in structural, molecular and systems biology\.Nat\. Struct\. Mol\. Biol\.25\(11\),pp\. 1000–1008\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p5.1)\.
- \[23\]S\. Passaro, G\. Corso, J\. Wohlwend, M\. Reveiz, S\. Thaler, V\. R\. Somnath, N\. Getz, T\. Portnoi, J\. Roy, H\. Stark, D\. Kwabi\-Addo, D\. Beaini, T\. Jaakkola, and R\. Barzilay\(2025\-06\)Boltz\-2: towards accurate and efficient binding affinity prediction\.\(en\)\.Cited by:[§A\.1](https://arxiv.org/html/2605.26192#A1.SS1.p1.6),[§1](https://arxiv.org/html/2605.26192#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p1.2),[§3\.1](https://arxiv.org/html/2605.26192#S3.SS1.p1.1)\.
- \[24\]G\. P\. Pereira, C\. Gouzien, P\. C\. T\. Souza, and J\. Martin\(2025\-03\)Challenges in predicting PROTAC\-mediated protein\-protein interfaces with AlphaFold reveal a general limitation on small interfaces\.Bioinform\. Adv\.5\(1\),pp\. vbaf056\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p2.1)\.
- \[25\]W\. Puchała, M\. Kistowski, L\. Zhukova, M\. Burdukiewicz, and M\. Dadlez\(2025\-04\)HRaDeX: R package and web server for computing high\-resolution deuterium uptake rates for HDX\-MS data\.J\. Proteome Res\.24\(4\),pp\. 1688–1700\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p4.1)\.
- \[26\]R\. Raghu, A\. Levy, G\. Wetzstein, and E\. D\. Zhong\(2025\-12\)Multiscale guidance of protein structure prediction with heterogeneous cryo\-EM data\.External Links:2506\.04490Cited by:[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p3.1)\.
- \[27\]V\. A\. Roberts, E\. E\. Thompson, M\. E\. Pique, M\. S\. Perez, and L\. F\. Ten Eyck\(2013\-07\)DOT2: macromolecular docking with improved biophysical models\.J\. Comput\. Chem\.34\(20\),pp\. 1743–1758\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p4.1),[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1)\.
- \[28\]M\. P\. Rout and A\. Sali\(2019\-05\)Principles for integrative structural biology studies\.Cell177\(6\),pp\. 1384–1403\(en\)\.Cited by:[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1)\.
- \[29\]H\. Rui, K\. S\. Ashton, J\. Min, C\. Wang, and P\. R\. Potts\(2023\-03\)Protein\-protein interfaces in molecular glue\-induced ternary complexes: classification, characterization, and prediction\.RSC Chem\. Biol\.4\(3\),pp\. 192–215\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[30\]T\. Saldaño, N\. Escobedo, J\. Marchetti, D\. J\. Zea, J\. Mac Donagh, A\. J\. Velez Rueda, E\. Gonik, A\. García Melani, J\. Novomisky Nechcoff, M\. N\. Salas, T\. Peters, N\. Demitroff, S\. Fernandez Alberti, N\. Palopoli, M\. S\. Fornasari, and G\. Parisi\(2022\-04\)Impact of protein conformational diversity on AlphaFold predictions\.Bioinformatics38\(10\),pp\. 2742–2748\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p1.1)\.
- \[31\]S\. L\. Schreiber\(2021\-01\)The rise of molecular glues\.Cell184\(1\),pp\. 3–9\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1),[§5](https://arxiv.org/html/2605.26192#S5.p2.1)\.
- \[32\]Y\. Song, J\. Sohl\-Dickstein, D\. P\. Kingma, A\. Kumar, S\. Ermon, and B\. Poole\(2020\-11\)Score\-based generative modeling through stochastic differential equations\.External Links:2011\.13456Cited by:[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p2.2),[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p3.1)\.
- \[33\]K\. Stahl, A\. Graziadei, T\. Dau, O\. Brock, and J\. Rappsilber\(2023\-12\)Protein structure prediction with in\-cell photo\-crosslinking mass spectrometry and deep learning\.Nat Biotechnol41\(12\),pp\. 1810–1819\(en\)\.Cited by:[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p2.1)\.
- \[34\]K\. Stahl, R\. Warneke, L\. Demann, R\. Bremenkamp, B\. Hormes, O\. Brock, J\. Stülke, and J\. Rappsilber\(2024\-09\)Modelling protein complexes with crosslinking mass spectrometry and deep learning\.Nat Commun15\(1\),pp\. 7866\(en\)\.Cited by:[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p2.1)\.
- \[35\]M\. Z\. Tien, A\. G\. Meyer, D\. K\. Sydykova, S\. J\. Spielman, and C\. O\. Wilke\(2013\-11\)Maximum allowed solvent accessibilites of residues in proteins\.PLoS One8\(11\),pp\. e80635\(en\)\.Cited by:[§4\.1](https://arxiv.org/html/2605.26192#S4.SS1.p2.4)\.
- \[36\]M\. H\. Tran, C\. E\. Martina, R\. Moretti, M\. Nagel, K\. L\. Schey, and J\. Meiler\(2025\-03\)RosettaHDX: predicting antibody\-antigen interaction from hydrogen\-deuterium exchange mass spectrometry data\.J\. Struct\. Biol\.217\(1\),pp\. 108166\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p4.1),[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1)\.
- \[37\]P\. Vincent\(2011\-07\)A connection between score matching and denoising autoencoders\.Neural Comput23\(7\),pp\. 1661–1674\(en\)\.Cited by:[§2\.1](https://arxiv.org/html/2605.26192#S2.SS1.p3.4)\.
- \[38\]L\. Wang, A\. Tučs, S\. Ding, K\. Tsuda, and A\. Sljoka\(2025\-07\)HDXRank: a deep learning framework for ranking protein complex predictions with hydrogen\-deuterium exchange data\.J\. Chem\. Theory Comput\.21\(14\),pp\. 7173–7187\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p4.1),[§3\.6](https://arxiv.org/html/2605.26192#S3.SS6.p1.1)\.
- \[39\]Y\. Wang and M\. Chen\(2026\-01\)Extrapolating foundation generative models with physics: a case study of exploring peptide conformations under protein\-environment interactions\.J\. Phys\. Chem\. Lett\.17\(2\),pp\. 456–465\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p3.1)\.
- \[40\]J\. A\. Ward, C\. Perez\-Lopez, and C\. Mayor\-Ruiz\(2023\-05\)Biophysical and computational approaches to study ternary complexes: a ’cooperative relationship’ to rationalize targeted protein degradation\.Chembiochem24\(10\),pp\. e202300163\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[41\]H\. K\. Wayment\-Steele, A\. Ojoawo, R\. Otten, J\. M\. Apitz, W\. Pitsawong, M\. Hömberger, S\. Ovchinnikov, L\. Colwell, and D\. Kern\(2024\-01\)Predicting multiple conformations via sequence clustering and AlphaFold2\.Nature625\(7996\),pp\. 832–839\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p1.1)\.
- \[42\]G\. Weng, J\. Gao, Z\. Wang, E\. Wang, X\. Hu, X\. Yao, D\. Cao, and T\. Hou\(2020\-06\)Comprehensive evaluation of fourteen docking programs on Protein\-Peptide complexes\.J Chem Theory Comput16\(6\),pp\. 3959–3969\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p2.1)\.
- \[43\]R\. P\. Wurz, H\. Rui, K\. Dellamaggiore, S\. Ghimire\-Rijal, K\. Choi, K\. Smither, A\. Amegadzie, N\. Chen, X\. Li, A\. Banerjee, Q\. Chen, D\. Mohl, and A\. Vaish\(2023\-07\)Affinity and cooperativity modulate ternary complex formation to drive targeted protein degradation\.Nat\. Commun\.14\(1\),pp\. 4177\(en\)\.Cited by:[§1](https://arxiv.org/html/2605.26192#S1.p1.1)\.
- \[44\]B\. Zhang, Z\. Gong, B\. Zhong, Z\. Liang, Y\. Zhang, Q\. Zhao, and L\. Zhang\(2025\-05\)Ultrafiltration\-enhanced cross\-linking mass spectrometry for comprehensive analysis of low molecular weight protein cross\-links\.Anal\. Chem\.97\(18\),pp\. 9606–9612\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p4.1)\.
- \[45\]J\. Zhang, J\. L\. Balsbaugh, S\. Gao, N\. G\. Ahn, and J\. P\. Klinman\(2020\-05\)Hydrogen deuterium exchange defines catalytically linked regions of protein flexibility in the catechol o\-methyltransferase reaction\.Proc\. Natl\. Acad\. Sci\. U\. S\. A\.117\(20\),pp\. 10797–10805\(en\)\.Cited by:[§3\.3](https://arxiv.org/html/2605.26192#S3.SS3.p1.1)\.
- \[46\]M\. M\. Zhang, R\. Y\. Huang, B\. R\. Beno, E\. G\. Deyanova, J\. Li, G\. Chen, and M\. L\. Gross\(2020\-05\)Epitope and paratope mapping of PD\-1/Nivolumab by mass Spectrometry\-Based Hydrogen–Deuterium exchange, cross\-linking, and molecular docking\.Analytical Chemistry\(en\)\.Cited by:[§4\.3](https://arxiv.org/html/2605.26192#S4.SS3.p5.1)\.
- \[47\]X\. Zhang, J\. Wang, D\. Tan, Q\. Li, M\. Li, Z\. Gong, C\. Tang, Z\. Liu, M\. Dong, and X\. Lei\(2018\-01\)Carboxylate\-selective chemical cross\-linkers for mass spectrometric analysis of protein structures\.Anal\. Chem\.90\(2\),pp\. 1195–1201\(en\)\.Cited by:[§5](https://arxiv.org/html/2605.26192#S5.p4.1)\.
## Appendix A Supplemental background
### A.1 Distance and contact constraints in Boltz-2
To allow extended user control over the generated structures, Boltz-2 introduced contact and pocket conditioning, allowing users to specify distance constraints derived from experimental methods or expert knowledge \[[23](https://arxiv.org/html/2605.26192#bib.bib10)\]. For a specified pair of atoms $i$ and $j$, the experimental constraint defines an allowable distance range $[d_{\min}, d_{\max}]$. At a given diffusion timestep, the Euclidean distance $d_{ij}$ is calculated from the predicted denoised coordinates $\hat{\mathbf{x}}_{0}(\mathbf{x}, t)$. Boltz-2 applies a flat-bottomed distance potential $U_{\text{dist}}$, formulated as a quadratic penalty on boundary violations:
$$U_{\text{dist}}(\mathbf{x}) = \sum_{(i,j)} \left( \max(0, d_{ij} - d_{\max})^{2} + \max(0, d_{\min} - d_{ij})^{2} \right) \tag{1}$$

If $d_{ij}$ falls outside the allowed bounds, $\nabla_{\mathbf{x}} U_{\text{dist}}$ yields a non-zero gradient. During the reverse sampling process, this gradient continuously pushes or pulls the specified coordinates. This mechanism forms the mathematical foundation necessary to integrate sparse spatial proteomics data, such as XL-MS, actively steering the global topology of the complex until the distance requirements are satisfied.
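As a concrete illustration of Equation (1), the following standard-library sketch evaluates the flat-bottomed potential and its analytic gradient for a list of restrained atom pairs. It is not the Boltz-2 implementation; the function name, the restraint tuple layout, and the toy coordinates are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Restraint = Tuple[int, int, float, float]  # (i, j, d_min, d_max), hypothetical layout


def flat_bottom_energy_and_grad(coords: Sequence[Coord],
                                restraints: Sequence[Restraint]):
    """Return U_dist and dU/dx for every atom, following Equation (1)."""
    energy = 0.0
    grad: List[List[float]] = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d_min, d_max in restraints:
        diff = [a - b for a, b in zip(coords[i], coords[j])]  # x_i - x_j
        d = math.sqrt(sum(c * c for c in diff))
        over = max(0.0, d - d_max)    # violation of the upper bound
        under = max(0.0, d_min - d)   # violation of the lower bound
        energy += over ** 2 + under ** 2
        if d == 0.0:
            continue  # direction undefined for coincident atoms; skip the gradient
        # dU/dd = 2*over - 2*under, and dd/dx_i = (x_i - x_j) / d
        scale = 2.0 * (over - under) / d
        for k in range(3):
            grad[i][k] += scale * diff[k]
            grad[j][k] -= scale * diff[k]
    return energy, grad


# Two atoms 10 Å apart with an allowed range of [2, 6] Å: the upper bound is violated by 4 Å.
toy_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_energy, toy_grad = flat_bottom_energy_and_grad(toy_coords, [(0, 1, 2.0, 6.0)])
```

Moving each atom against its gradient pulls the pair together when the upper bound is violated and pushes it apart when the lower bound is violated; inside the allowed range both the energy and the gradient are zero, which is what makes the potential flat-bottomed.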
## Appendix B Supplemental methods
### B.1 Guidance scheduling
The piecewise timestep schedules are defined over the reverse diffusion trajectory bounded by $t=1$, representing pure noise, and $t=0$, representing the fully denoised structure.

For the hydrogen-deuterium exchange guidance, the potential is evaluated at every diffusion step, corresponding to an interval of 1. Continuous evaluation prevents the structure from prematurely committing to specific conformations, so that the gradient can intervene before the sampling trajectory becomes fixed. We define a maximum guidance weight of 2.0, raised empirically from weaker defaults so that the computed gradient is forceful enough to compete with the primary diffusion score network. The weight is scaled across three discrete stages. For $t\in(0.95,1]$, the guidance is disabled entirely. For $t\in(0.7,0.95]$, the potential is applied at 25% strength, equating to a weight of 0.5. Finally, for $t\in[0,0.7]$, the potential is applied at the full strength of 2.0.

Associated hyperparameters include a penalty scalar of $k=20$ and a Gaussian kernel width of $\sigma=5$. We significantly increased the penalty weight so that it is strong enough to force protected residues into the core of the protein rather than being overpowered by the base model. We also narrowed the search radius to 5.0 Å so that the potential only counts atoms that are directly next to each other. This gives a much cleaner signal and prevents distant atoms from confusing the potential while the structure is still noisy and forming. Finally, we set the exposure tolerance to zero, meaning that even the slightest surface exposure is penalized.

For the cross-linking mass spectrometry spatial guidance, the potential is computed more sparsely, at an interval of 4 diffusion steps. The spatial guidance weight follows a distinct three-stage step function: it remains disabled for $t\in(0.75,1]$, is applied at 50% strength for $t\in(0.25,0.75]$, and is applied at 100% strength in the final stages of diffusion, $t\in[0,0.25]$. Additionally, the union lambda parameter for these spatial constraints uses an exponential interpolation starting at 8.0 and ending at 0.0 with an alpha of -2.0.
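The two step schedules above can be written down directly; the sketch below does so with plain Python functions. The function names are hypothetical. The XL-MS schedule is returned as a fraction because the text does not state its maximum weight. The closed form used for the exponential interpolation is an assumption (one common parameterization that hits the stated endpoints), and the mapping between its progress variable `s` and the diffusion time $t$ is not specified in the text.

```python
import math

HDX_MAX_WEIGHT = 2.0  # maximum HDX guidance weight stated in the text


def hdx_weight(t: float) -> float:
    """HDX guidance weight at diffusion time t (t=1 is noise, t=0 is denoised)."""
    if t > 0.95:
        return 0.0                    # guidance disabled
    if t > 0.7:
        return 0.25 * HDX_MAX_WEIGHT  # 25% strength, i.e. 0.5
    return HDX_MAX_WEIGHT             # full strength


def xl_strength(t: float) -> float:
    """XL-MS guidance strength as a fraction of its (unstated) maximum weight."""
    if t > 0.75:
        return 0.0
    if t > 0.25:
        return 0.5
    return 1.0


def evaluate_xl_at(step_index: int, interval: int = 4) -> bool:
    """XL-MS potential is evaluated every `interval` diffusion steps."""
    return step_index % interval == 0


def exponential_interpolation(s: float, start: float = 8.0,
                              end: float = 0.0, alpha: float = -2.0) -> float:
    """ASSUMED form: start at s=0, end at s=1, curvature set by alpha."""
    if alpha == 0.0:
        return start + (end - start) * s  # linear limit
    return start + (end - start) * (math.exp(alpha * s) - 1.0) / (math.exp(alpha) - 1.0)


# Tabulate both schedules on a coarse grid of diffusion times.
schedule_table = [(t / 20, hdx_weight(t / 20), xl_strength(t / 20)) for t in range(20, -1, -1)]
```

Both schedules leave the earliest, noisiest steps unguided, and the HDX potential switches on earlier than the XL-MS one, consistent with the stated goal of intervening before the trajectory commits to a conformation.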
### B.2 Constraints derivation from raw data
Cross\-linking mass spectrometry constraints are derived directly from normalized, high\-confidence MS intensity data\. Identified residue pairs are clustered based on their relative cross\-link intensities across differential experimental conditions \(e\.g\., active versus inactive proximity\-inducing compounds\)\.
Hydrogen\-deuterium exchange constraints are derived by comparing the fractional uptake between the ternary complex and the isolated binary or apo states\. To estimate individual amino acid uptake, we applied a standard deviation\-based peptide filtering protocol\. Briefly, individual peptides are excluded if their uptake standard deviation \(SD\) exceeds 2\.5 times the mean peptide SD of the sample\. The uptake for a given amino acid is calculated as the weighted mean of all overlapping peptides covering the residue\. To translate these uptake values into actionable metrics for structural evaluation, residues are categorized into protection regions \(Ternary\-Apo protection \> 5%\) and exposed regions \(protection < 5%\)\. For each generated ternary model, the change in solvent\-accessible surface area \(SASA\) upon complex formation is calculated\.
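The HDX-MS derivation steps above (SD-based peptide filtering, per-residue averaging over overlapping peptides, and thresholding at 5%) can be sketched as follows. The text does not specify the weights of the weighted mean, so this example assumes weights inversely proportional to peptide length; that choice, the `Peptide` record, and the toy values are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, NamedTuple


class Peptide(NamedTuple):
    start: int         # first residue covered (inclusive)
    end: int           # last residue covered (inclusive)
    protection: float  # protection of the ternary complex relative to apo, in %
    sd: float          # standard deviation of the uptake measurement


def filter_peptides(peptides: List[Peptide], factor: float = 2.5) -> List[Peptide]:
    """Drop peptides whose SD exceeds `factor` times the mean peptide SD."""
    cutoff = factor * mean(p.sd for p in peptides)
    return [p for p in peptides if p.sd <= cutoff]


def residue_protection(peptides: List[Peptide]) -> Dict[int, float]:
    """Weighted mean over overlapping peptides; 1/length weights are an ASSUMPTION."""
    num: Dict[int, float] = {}
    den: Dict[int, float] = {}
    for p in peptides:
        w = 1.0 / (p.end - p.start + 1)
        for r in range(p.start, p.end + 1):
            num[r] = num.get(r, 0.0) + w * p.protection
            den[r] = den.get(r, 0.0) + w
    return {r: num[r] / den[r] for r in num}


def classify(per_residue: Dict[int, float], threshold: float = 5.0) -> Dict[int, str]:
    """Label residues as protected (> threshold) or exposed (otherwise)."""
    return {r: ("protected" if v > threshold else "exposed")
            for r, v in per_residue.items()}


# Made-up peptides; the last one is noisy and should be filtered out.
toy_peptides = [
    Peptide(1, 4, 10.0, 1.0),
    Peptide(3, 6, 2.0, 1.0),
    Peptide(7, 8, 0.0, 1.0),
    Peptide(5, 8, 50.0, 9.0),
]
kept = filter_peptides(toy_peptides)
labels = classify(residue_protection(kept))
```

The text leaves a residue at exactly 5% unassigned; this sketch places it in the exposed class. Shorter peptides receive larger weights here because they localize the signal to fewer residues, but a uniform or redundancy-based weighting would fit the description equally well.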
## Appendix C Supplemental results
Figure S1: WDR5-PROTAC-DCAF1 co-folding. Using positive and negative XL-MS constraints, our model successfully predicts the right orientation of WDR5 relative to DCAF1. All structures are aligned on DCAF1, with red and orange stretches visualizing the relative rotation of WDR5.

Figure S2: BRD4-PROTAC-CRBN co-folding. (a) Boltz-2 predicts a conformation that differs from the crystal structure and does not cover the protected residues. (b) AIMS-Fold predicts the right conformation when guided by HDX-MS data.

Figure S3: Heatmap detailing the change in Relative Solvent Accessible Surface Area (RSA) across the interaction interface for the PD-1-Nivolumab case, visualizing the HDX-MS protection patterns captured by AIMS-Fold.

Figure S4: XL-MS distance satisfaction for the best-performing constraint subset in the PD-1-Nivolumab case. The guided model pulls cross-linked regions within the physical threshold of the linker, whereas unguided models heavily violate these spatial boundaries.