Deep Learning for Protein Complex Prediction and Design

Summary

This PhD thesis introduces deep learning methods for protein complex prediction and design, including GLINTER for contact prediction, ESMPair for homolog pairing, and RedNet for binder design.

arXiv:2605.11189v1

Source: [https://arxiv.org/html/2605.11189](https://arxiv.org/html/2605.11189)
Deep Learning for Protein Complex Prediction and Design

by Ziwei Xie

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

at the TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

Chicago, Illinois

March 2026

Thesis Committee: Jinbo Xu \(Thesis Advisor\), Madhur Tulsiani, Aly Azeem Khan, Gabriel Rocklin

Copyright © 2026 by Ziwei Xie

All Rights Reserved

###### Abstract

Accurately modeling and designing protein complex structures is a central problem in computational structural biology, with broad implications for understanding cellular function and developing therapeutics\. This thesis investigates two fundamental aspects of this problem using deep learning: domain\-specific architectures that capture the hierarchical nature of protein structures, and search algorithms that efficiently navigate the vast sequence spaces of protein complexes to identify interacting homologs for improving complex structure prediction and to design protein sequences\.

I first develop GLINTER, a graph neural network\-based method for predicting interfacial contacts between proteins\. GLINTER combines structural representations from monomeric structures with co\-evolutionary signals extracted via a transformer, outperforming existing methods on both heterodimeric and homodimeric targets and effectively guiding protein–protein docking\. I then address a key bottleneck in complex structure prediction: identifying interacting homologs across species\. I propose ESMPair, which uses protein language models to pair homologs from individual chains\. ESMPair significantly improves structure prediction accuracy for heterodimers\. Finally, I introduce RedNet, a multiscale graph transformer for fixed\-backbone protein binder design\. RedNet integrates backbone and side\-chain information with a contrastive decoding algorithm that optimizes binding affinity and specificity, generating binders with improved thermodynamic properties that can discriminate between highly structurally similar targets\.

Together, these contributions demonstrate that domain\-specific deep learning architectures, combined with principled search strategies, can extract complementary information from protein structures, evolutionary data, and experimental measurements to advance both protein complex structure prediction and its inverse problem, fixed\-backbone protein binder design\.

###### Acknowledgments

I would like to thank my advisor, Prof\. Jinbo Xu, for his support, guidance, and insightful feedback throughout my PhD\. I am grateful to my committee members — Prof\. Madhur Tulsiani, Prof\. Aly Khan, and Prof\. Gabriel Rocklin — for their feedback on my proposal and thesis and for helping me navigate the various processes\. I also thank Prof\. Avrim Blum for serving as my local advisor in 2024, and Prof\. Greg Shakhnarovich for helping me navigate the final steps toward my thesis\.

I thank my lab mates for many inspiring discussions, and the staff, colleagues, and faculty for making TTIC a wonderful place to learn and conduct research\.

Finally, I want to thank my parents, my grandparents, and the rest of my family for their unconditional love, trust, and support throughout my life\. This thesis would not have been possible without them\.

###### Contents

1. [Acknowledgments](https://arxiv.org/html/2605.11189#Chx1)
2. [1 Introduction](https://arxiv.org/html/2605.11189#Ch1)
   1. [1\.1 Protein Structure Prediction](https://arxiv.org/html/2605.11189#Ch1.S1)
      1. [1\.1\.1 Experimental Protein Structure Determination](https://arxiv.org/html/2605.11189#Ch1.S1.SS1)
      2. [1\.1\.2 Computational Protein Structure Prediction](https://arxiv.org/html/2605.11189#Ch1.S1.SS2)
   2. [1\.2 Computational Protein Design](https://arxiv.org/html/2605.11189#Ch1.S2)
      1. [1\.2\.1 Experimental Protein Engineering Methods](https://arxiv.org/html/2605.11189#Ch1.S2.SS1)
      2. [1\.2\.2 Computational Protein Design](https://arxiv.org/html/2605.11189#Ch1.S2.SS2)
   3. [1\.3 Deep Learning for Protein Modeling](https://arxiv.org/html/2605.11189#Ch1.S3)
   4. [1\.4 Contributions](https://arxiv.org/html/2605.11189#Ch1.S4)
   5. [References](https://arxiv.org/html/2605.11189#bib)
3. [2 Graph Learning of Protein Interfacial Contacts](https://arxiv.org/html/2605.11189#Ch2)
   1. [2\.1 Introduction](https://arxiv.org/html/2605.11189#Ch2.S1)
   2. [2\.2 Data and methods](https://arxiv.org/html/2605.11189#Ch2.S2)
      1. [2\.2\.1 Network architecture](https://arxiv.org/html/2605.11189#Ch2.S2.SS1)
      2. [2\.2\.2 Features](https://arxiv.org/html/2605.11189#Ch2.S2.SS2)
      3. [2\.2\.3 Datasets](https://arxiv.org/html/2605.11189#Ch2.S2.SS3)
      4. [2\.2\.4 Training and evaluation](https://arxiv.org/html/2605.11189#Ch2.S2.SS4)
      5. [2\.2\.5 Methods to compare](https://arxiv.org/html/2605.11189#Ch2.S2.SS5)
   3. [2\.3 Results](https://arxiv.org/html/2605.11189#Ch2.S3)
      1. [2\.3\.1 Evaluation of interfacial contact prediction](https://arxiv.org/html/2605.11189#Ch2.S3.SS1)
      2. [2\.3\.2 Ablation study](https://arxiv.org/html/2605.11189#Ch2.S3.SS2)
      3. [2\.3\.3 Application to selection of docking decoys](https://arxiv.org/html/2605.11189#Ch2.S3.SS3)
   4. [2\.4 Conclusion](https://arxiv.org/html/2605.11189#Ch2.S4)
   5. [References](https://arxiv.org/html/2605.11189#biba)
4. [3 Improved Protein Heterodimer Structure Prediction with Protein Language Models](https://arxiv.org/html/2605.11189#Ch3)
   1. [3\.1 Introduction](https://arxiv.org/html/2605.11189#Ch3.S1)
   2. [3\.2 Data and Methods](https://arxiv.org/html/2605.11189#Ch3.S2)
      1. [3\.2\.1 The PLM\-enhanced MSA pairing pipeline](https://arxiv.org/html/2605.11189#Ch3.S2.SS1)
      2. [3\.2\.2 Settings](https://arxiv.org/html/2605.11189#Ch3.S2.SS2)
      3. [3\.2\.3 Baselines](https://arxiv.org/html/2605.11189#Ch3.S2.SS3)
      4. [3\.2\.4 Time/Memory requirement analysis](https://arxiv.org/html/2605.11189#Ch3.S2.SS4)
   3. [3\.3 Results](https://arxiv.org/html/2605.11189#Ch3.S3)
      1. [3\.3\.1 ESMPair overview](https://arxiv.org/html/2605.11189#Ch3.S3.SS1)
      2. [3\.3\.2 ESMPair outperforms other MSA pairing methods on heterodimer predictions](https://arxiv.org/html/2605.11189#Ch3.S3.SS2)
      3. [3\.3\.3 Ensemble improves the prediction accuracy](https://arxiv.org/html/2605.11189#Ch3.S3.SS3)
      4. [3\.3\.4 Factors influencing prediction accuracy](https://arxiv.org/html/2605.11189#Ch3.S3.SS4)
   4. [3\.4 Conclusion](https://arxiv.org/html/2605.11189#Ch3.S4)
   5. [References](https://arxiv.org/html/2605.11189#bibb)
5. [4 Redesign Selective Protein Binders Using Contrastive Decoding](https://arxiv.org/html/2605.11189#Ch4)
   1. [4\.1 Introduction](https://arxiv.org/html/2605.11189#Ch4.S1)
      1. [4\.1\.1 Related Works](https://arxiv.org/html/2605.11189#Ch4.S1.SS1)
   2. [4\.2 Data and Methods](https://arxiv.org/html/2605.11189#Ch4.S2)
      1. [4\.2\.1 Protein Graph Representation](https://arxiv.org/html/2605.11189#Ch4.S2.SS1)
      2. [4\.2\.2 Network Architectures](https://arxiv.org/html/2605.11189#Ch4.S2.SS2)
      3. [4\.2\.3 Contrastive Decoding and Scoring](https://arxiv.org/html/2605.11189#Ch4.S2.SS3)
      4. [4\.2\.4 Datasets](https://arxiv.org/html/2605.11189#Ch4.S2.SS4)
   3. [4\.3 Results](https://arxiv.org/html/2605.11189#Ch4.S3)
      1. [4\.3\.1 All\-atom graph transformer improves sequence recovery of monomeric and dimeric structures](https://arxiv.org/html/2605.11189#Ch4.S3.SS1)
      2. [4\.3\.2 Contrastive scoring improves zero\-shot binding affinity prediction](https://arxiv.org/html/2605.11189#Ch4.S3.SS2)
      3. [4\.3\.3 Contrastive decoding improves structural self\-consistency of binders](https://arxiv.org/html/2605.11189#Ch4.S3.SS3)
      4. [4\.3\.4 Contrastive decoding improves binding specificities of binders](https://arxiv.org/html/2605.11189#Ch4.S3.SS4)
      5. [4\.3\.5 Structural analysis of redesigned selective binder](https://arxiv.org/html/2605.11189#Ch4.S3.SS5)
   4. [4\.4 Conclusion](https://arxiv.org/html/2605.11189#Ch4.S4)
   5. [References](https://arxiv.org/html/2605.11189#bibc)
6. [5 Conclusions and Future Directions](https://arxiv.org/html/2605.11189#Ch5)
   1. [5\.1 Future improvements](https://arxiv.org/html/2605.11189#Ch5.S1)
   2. [References](https://arxiv.org/html/2605.11189#bibd)

###### List of Figures

1. [1\.1 An illustration of spatial resolution of different methods\. AFM, atomic force microscopy; EM, electron microscopy; FRET, Förster resonance energy transfer; NMR, nuclear magnetic resonance\. Reproduced from \[23\], licensed under the Attribution–Noncommercial–Share Alike 3\.0 Unported License \(http://creativecommons\.org/licenses/by\-nc\-sa/3\.0/\)\.](https://arxiv.org/html/2605.11189#Ch1.F1)
2. [1\.2 Overview of the deep learning protein design pipeline, illustrating backbone generation, sequence design, and structure prediction\-based filtering\. Figure adapted from \[18\]\.](https://arxiv.org/html/2605.11189#Ch1.F2)
3. [2\.1 Overview of the GLINTER architecture\. $L_1$ and $L_2$ are the lengths of the two protein chains, K is the number of channels in a CaConv layer, and 144 is the total number of heads in the row attention weights generated by Facebook’s MSA Transformer \(Rao et al\., 2021\)](https://arxiv.org/html/2605.11189#Ch2.F1)
4. [2\.2 CNN\+ESM\-Attention model](https://arxiv.org/html/2605.11189#Ch2.F2)
5. [2\.3 The x\-axis is the TMscore of the predicted monomer structures\. The y\-axis is the difference of the top 10 precision resulting from the experimental and predicted monomer structures\. \(A\) “Residue, D\-cut=8”\. \(B\) “Residue \+ Atom, D\-cut=8,6” \(C\) “Residue \+ Surface, D\-cut=8,6” \(D\) “Residue \+ Atom \+ Surface, D\-cut=8,6,6” \(E\) “Residue \+ Atom \+ Surface \+ ESM, D\-cut=8,6,6”\. In all the box plots, the upper edge of the box is the third quartile \(Q3\), and the lower edge of the box is the first quartile \(Q1\), the orange line is the median, the upper cap is the highest datum below Q3 \+ 1\.5\(Q3 \- Q1\), and the lower cap is the lowest datum above Q1 \- 1\.5\(Q3\-Q1\)\.](https://arxiv.org/html/2605.11189#Ch2.F3)
6. [2\.4 Comparison of top\-10 precision of three models: ESM, Residue\+Atom\+Surface and Residue\+Atom\+Surface\+ESM\. \(A\) compares Residue\+Atom\+Surface and ESM, \(B\) compares Residue\+Atom\+Surface\+ESM and ESM, and \(C\) compares Residue\+Atom\+Surface and Residue\+Atom\+Surface\+ESM](https://arxiv.org/html/2605.11189#Ch2.F4)
7. [2\.5 Correlation between $\ln(\text{Meff})$ \(x\-axis\) and the number of correct top\-10 predictions \(y\-axis\) of the ESM\-Attention model\. The targets without correct top\-10 predictions are excluded\. \($R^{2}=0.3093$\)](https://arxiv.org/html/2605.11189#Ch2.F5)
8. [2\.6 The average quality \(measured by TMscore\) of the selected decoys by top predicted contacts\. The x\-axis is the number of top decoys selected\. In the legend, “top\-10”, “top\-25” and “top\-50” represent that top 10, 25 and 50 predicted contacts are used to select docking decoys, respectively\. “best decoy” indicates the quality of the best decoys generated by HDOCK](https://arxiv.org/html/2605.11189#Ch2.F6)
9. [3\.1 Schematic illustration of ESMPair\. Given a pair of query sequences: \(1\) JackHMMER searches UniProt \[24\] to generate an MSA for each query, \(2\) homologs are grouped by species, \(3\) ESM\-MSA\-1b estimates column attention scores between each homolog and the query, \(4\) homologs from the same species with the same rank are paired and concatenated into interologs, and \(5\) AlphaFold\-Multimer takes the interolog MSA as input to predict the complex structure\.](https://arxiv.org/html/2605.11189#Ch3.F1)
10. [3\.2 Prediction performance across pConf score regions and taxonomic domains\. \(a–b\) Negative correlation between the relative improvement of ESMPair over AF\-Multimer and pConf score\. \(c–f\) DockQ score comparison among ESMPair, AF\-Multimer, and Genome on Eucaryote, Bacteria, and Eucaryote&Bacteria domains\. Eucaryote&Bacteria denotes heterodimers whose two chains belong to different domains\. Heterodimers in our dataset originate from Eucaryotes, Bacteria, Viruses, and Archaea; we group Bacteria, Viruses, and Archaea into the Bacteria domain\. Across all test sets, ESMPair significantly outperforms both baselines on Eucaryote targets\.](https://arxiv.org/html/2605.11189#Ch3.F2)
11. [3\.3 Comparison of ESMPair and AF\-Multimer on newly released targets \(a–f\) and an unresolved case \(g\)\. \(a–f\) Evaluations on 74 targets released after 30 April 2018\. \(a\) Bar chart showing the relative performance gap between ESMPair and AF\-Multimer across three categories: ESMPair outperforms AF\-Multimer, AF\-Multimer outperforms ESMPair, and equal performance\. \(b\) Interface and ligand RMSD distributions of structures predicted by ESMPair \(purple\) and AF\-Multimer \(yellow\)\. \(c–f\) Four representative cases: AF\-Multimer predicts incorrect ligand orientations for 7VSI and 7AQU, and incorrect binding sites for 7SL9 and 6FYH\. \(g\) The intermediate filament NFM–INA heterodimer predicted by ESMPair forms a four\-helix bundle\. Gray boxes indicate the interacting motifs of coil 1A, coil 1B, and coil 2 of the two proteins\.](https://arxiv.org/html/2605.11189#Ch3.F3)
12. [3\.4 Comparison of ESMPair with four alternative MSA pairing approaches \(a–d\) and ensemble strategies \(e\) on pConf70 targets\. \(a–d\) Each point shows the DockQ score of a target for ESMPair \(x\-axis\) versus the compared method \(y\-axis\)\. Points below the diagonal indicate ESMPair outperforms the alternative\. Highlighted regions denote incorrect \(white\), acceptable \(gray\), medium \(yellow\), and high\-quality \(purple\) predictions by DockQ score\. \(e\) Gray bars show single\-strategy performance, where G\. = Genome, A\. = AF\-Multimer, and E\. = ESMPair\. ESMPair achieves the best single\-strategy result \(0\.259 DockQ, 42\.4% success rate\)\. Yellow bars show pairwise ensembles, with ESMPair \+ Genome performing best \(0\.277 DockQ, 44\.6% success rate\)\. The purple bar shows the three\-strategy ensemble achieving the highest overall performance \(0\.285 DockQ, 46\.8% success rate\)\.](https://arxiv.org/html/2605.11189#Ch3.F4)
13. [3\.5 Factors affecting structure prediction performance\. Correlation between average Top\-5 Best DockQ score and \(a\) column attention score \(log\-scale\) predicted by ESM\-MSA\-1b, \(b\) number of effective sequences \(Meff\), \(c\) number of species, and \(d\) depth of paired MSA \(log\-scale\)\. \(e\) Distribution of column attention score versus the number of effective interologs\. The red curve shows the fitted linear regression model \(Pearson $r\approx-0.70$\), indicating that higher column attention scores correspond to fewer effective interologs\.](https://arxiv.org/html/2605.11189#Ch3.F5)
14. [4\.1 Overview of the RedNet architecture\. Graph neural networks encode protein structure into node and edge representations, which are then decoded by a causal transformer to autoregressively predict amino acid sequences\.](https://arxiv.org/html/2605.11189#Ch4.F1)
15. [4\.2 Structural analysis of the 6FOE–5WHJ selective binder pair\. \(A\) Interactions of redesigned binders \(red for the design chain of the on\-target complex and white for the design chain of the off\-target complex\) to their respective on\-target \(cyan\) and off\-target \(grey\) partners\. \(B\) Interactions of native binders to their respective on\-target and off\-target partners\.](https://arxiv.org/html/2605.11189#Ch4.F2)
16. [4\.3 Structural analysis of the 5FFN–1LW6 selective binder pair\. \(A\) Interactions of redesigned binders \(red for the design chain of the on\-target complex and white for the design chain of the off\-target complex\) to their respective on\-target \(cyan\) and off\-target \(grey\) partners\. \(B\) Interactions of native binders to their respective on\-target and off\-target partners\.](https://arxiv.org/html/2605.11189#Ch4.F3)

###### List of Tables

1. [2\.1 Features used in spatial graphs, where $L$ is the number of residues, $N$ is the number of atoms, $M$ is the number of sampled surface vertices, and $E$ is the number of edges in an atom graph\.](https://arxiv.org/html/2605.11189#Ch2.T1)
2. [2\.2 Average contact prediction precision \(%\) on the CASP\-CAPRI and PDB data](https://arxiv.org/html/2605.11189#Ch2.T2)
3. [2\.3 Average contact prediction precision \(%\) on the HomoPDB2018 test set, which includes 165 homodimers released to PDB after January 1, 2018\.](https://arxiv.org/html/2605.11189#Ch2.T3)
4. [2\.4 Average contact prediction precision \(%\) on the HeteroPDB2018 test set, which includes 72 heterodimers released to PDB after January 1, 2018\.](https://arxiv.org/html/2605.11189#Ch2.T4)
5. [2\.5 Average interfacial contact precision \(%\) of different deep learning models on the CASP\-CAPRI data when experimental monomer structures are used\.](https://arxiv.org/html/2605.11189#Ch2.T5)
6. [2\.6 Average interfacial contact precision \(%\) of different deep learning models on the CASP\-CAPRI data when monomer structures are predicted by AlphaFold\.](https://arxiv.org/html/2605.11189#Ch2.T6)
7. [2\.7 Average interfacial contact precision \(%\) of the ESM\-Attention, CNN\+ESM\-Attention and Residue\+ESM models on the CASP\-CAPRI data](https://arxiv.org/html/2605.11189#Ch2.T7)
8. [2\.8 Average top\-10 interfacial contact precision \(%\) of the ‘Residue\+Atom’ and ‘Residue\+Surface’ models on the CASP\-CAPRI data when experimental monomer structures are used](https://arxiv.org/html/2605.11189#Ch2.T8)
9. [3\.1 DockQ scores and success rate of PLM\-enhanced pairing methods and baselines\. We report the average of Top\-5 Best DockQ score, Top\-1 Best DockQ score and Success Rate \(DockQ $\geq 0.23$\) \(%\) on pConf70, DockQ49 and pConf80 test sets\. For one test target, we predicted five different structures using the five AlphaFold\-Multimer models\.](https://arxiv.org/html/2605.11189#Ch3.T1)
10. [3\.2 Comparisons between ESMPair and AF\-Multimer on targets across the full range of pConf scores\. We report the average of DockQ score, TMscore, ICS and IPS as the evaluation metrics \(larger values mean better performance\)\.](https://arxiv.org/html/2605.11189#Ch3.T2)
11. [3\.3 The Top\-1 Best DockQ performance of two groups with different sequence length \($\geq 100$ and $<100$\)\. The GAP value is the difference between the DockQ scores of the two length groups\.](https://arxiv.org/html/2605.11189#Ch3.T3)
12. [3\.4 The Top\-1 Best DockQ performance with or without full AF\-M features\.](https://arxiv.org/html/2605.11189#Ch3.T4)
13. [4\.1 Summary of input features\. Core atoms: N, C$\alpha$, C, O, pseudo\-C$\beta$ \($C=5$\)\. The residue graph is a $K$\-NN graph \($K=48$\) over C$\alpha$ distances\. The atom graph connects each C$\alpha$ to nearby atoms via a radius graph \($r=15$ Å, max $k=96$\)\. RBF: $\phi(d)=\exp(-(d-\mu_{i})^{2})$, $\mu_{i}$ linearly spaced in $[2,22]$, $D=16$ bins\. Atom type vocabulary $A=37$; residue type vocabulary $R=33$\. $N$: residues; $M$: atoms; $E$: atom graph edges\.](https://arxiv.org/html/2605.11189#Ch4.T1)
14. [4\.2 Performance comparison on monomers, homodimers, and heterodimers\. $\sigma$: backbone coordinate noise level\. NSR: Native Sequence Recovery\. LL: Log\-Likelihood\. PPL: Perplexity\. For RedNet and ProteinMPNN, we test performance at different noise levels $\sigma\in\{0.02,0.1,0.2,0.3\}$\. ESM\-IF and PiFold are tested at $\sigma=0$\.](https://arxiv.org/html/2605.11189#Ch4.T2)
15. [4\.3 Heterodimer self\-consistency results on all 107 targets\. $\sigma$: backbone coordinate noise level \(Å\)\. SR: Success Rate \(0–100%\), defined as pTM $>0.55$, ipTM $>0.5$, and Dsn pLDDT $>80$, following BindCraft\. Dsn pLDDT: AlphaFold3 predicted LDDT for the design chain \(0–100\)\. ipTM: interface predicted Template Modeling score \(0–1\)\. pTM: AlphaFold3 predicted TM\-score of the complex \(0–1\)\. Tgt pLDDT: AlphaFold3 predicted LDDT for the target chain \(0–100\)\. RedNet\-CD uses contrastive decoding with $\alpha=1$, $\beta=0.9$\. All models are sampled at temperature $=0.001$\. Higher is better for all metrics\. Bold: best; underline: second best\.](https://arxiv.org/html/2605.11189#Ch4.T3)
16. [4\.4 Energetics and geometric properties of designed binders\. Binding Score \(REU\): Rosetta binding score\. Int SC \(0–1\): interface shape complementarity\. Int Packstat \(0–1\): interface packing statistic\. Int dG \(REU\): interface free energy change\. Int dSASA \(Å²\): interface buried solvent\-accessible surface area\. REU: Rosetta Energy Units\. Due to Rosetta relaxation failures, we analyze 91 of 107 heterodimers that are successfully relaxed for all methods\. RedNet\-Ens combines RedNet and RedNet\-CD by selecting the design with the best binding score\. Bold: best; underline: second best\.](https://arxiv.org/html/2605.11189#Ch4.T4)
17. [4\.5 Hydrophobicity and hydrogen\-bond properties of designed interfaces\. Surf Hydro \(0–1\): surface hydrophobicity\. Int Nres: number of interface residues\. Int HBonds: number of interface hydrogen bonds\. Int HBond %: percentage of interface residues involved in hydrogen bonds\. Int dUnsat HB: number of unsatisfied interface hydrogen bonds\. Int dUnsat HB %: percentage of unsatisfied interface hydrogen bonds\. Due to Rosetta relaxation failures, we analyze 91 of 107 heterodimers that are successfully relaxed for all methods\. RedNet\-Ens combines RedNet and RedNet\-CD by selecting the design with the best binding score\. Bold: best; underline: second best\.](https://arxiv.org/html/2605.11189#Ch4.T5)
18. [4\.6 Selectivity measured by Rosetta binding score difference \(on\-target $-$ off\-target\)\. A negative value indicates the binder prefers the on\-target\. SR \(Diff $<$ X\): percentage of cases where the on\-target preference exceeds the energy gap threshold X\. $\sigma$: backbone coordinate noise level \(Å\)\. Higher is better for all metrics\. Bold: best; underline: second best\.](https://arxiv.org/html/2605.11189#Ch4.T6)
19. [4\.7 Selectivity success measured by AlphaFold3 cofolding\. $\sigma$: backbone coordinate noise level \(Å\)\. Selectivity: proportion where on\-target ipTM $>0.55$ and off\-target ipTM $<0.55$\. On\-Target: proportion where on\-target ipTM $>0.55$\. Off\-Target: proportion where off\-target ipTM $>0.55$\. Bold: best; underline: second best\.](https://arxiv.org/html/2605.11189#Ch4.T7)


## Chapter 1 Introduction

Proteins, composed of linear chains of amino acids that fold into three\-dimensional structures, perform a wide range of functions in many biological processes essential to life\. The biological functions of proteins are primarily determined by their dynamic structures and specific interactions with other molecules\. Protein\-protein interactions form the basis of complex cellular machinery, including the proteasome for protein degradation and protein complexes that regulate gene expression\. Interactions between proteins and small molecules enable cellular metabolism and signal transduction\. Protein\-nucleic acid interactions are essential for genome organization, transcriptional regulation, and epigenetic modifications that control gene expression programs\. Accurately modeling and designing protein complexes is therefore a central problem in biology and holds immense promise for biotechnological and therapeutic applications, from designing enzymes with novel catalytic properties to creating biologics that modulate disease\-associated targets\.

Experimental determination of protein complex structures through X\-ray crystallography, while remaining the gold standard, is constrained by technical challenges in protein purification and crystallization, often requiring months of optimization and substantial resources\. Similarly, directed evolution approaches for designing protein binders, though powerful, are fundamentally limited by the requirement for suitable starting templates, the development of robust high\-throughput screening assays, and extensive iterations of mutagenesis and selection\. These experimental techniques demand specialized equipment, domain expertise, and significant resources, making them challenging for rapid development\. Consequently, computational methods that can accurately predict protein complex structures and design protein binders de novo offer a promising avenue to circumvent the limitations of experimental approaches\.

Proteins are challenging molecular systems to model and design due to their vast conformational and sequence spaces, complex interatomic potentials, and long\-range and high\-order interactions between residues\. Physics\-based approaches using empirical force fields have succeeded in predicting structures of small proteins and designing sequences for idealized scaffolds, but struggle with large, multi\-domain protein complexes\.

Machine learning—particularly deep learning—offers a complementary approach that addresses many of these limitations\. These methods learn relationships between protein sequences, structures, and functions from large datasets, including the Protein Data Bank\[[9](https://arxiv.org/html/2605.11189#bib.bib84)\], UniProt\[[69](https://arxiv.org/html/2605.11189#bib.bib103)\], multiplexed assays of variant effects \(MAVEs\)\[[25](https://arxiv.org/html/2605.11189#bib.bib104)\], and ClinVar\[[44](https://arxiv.org/html/2605.11189#bib.bib105)\]\. AlphaFold2 exemplifies this approach: by leveraging evolutionary information from multiple sequence alignments, it predicts protein structures from sequence with unprecedented accuracy\[[36](https://arxiv.org/html/2605.11189#bib.bib4)\]\.

In this chapter, I provide an overview of the core tasks in predicting and designing protein complex structures, survey current machine learning approaches to these problems, and summarize the contributions of this thesis\.

### 1\.1 Protein Structure Prediction

#### 1\.1\.1 Experimental Protein Structure Determination

X\-ray crystallography has long served as the gold standard for determining protein structures at atomic resolution\. Since the determination of myoglobin and hemoglobin structures in the late 1950s, the field has expanded dramatically; the Protein Data Bank \(PDB\) now archives over 200,000 experimentally determined structures\[[9](https://arxiv.org/html/2605.11189#bib.bib84)\]\. The technique works by directing X\-rays through a protein crystal and analyzing the resulting diffraction pattern to reconstruct the three\-dimensional structure, frequently at a resolution of around 2 Å\[[59](https://arxiv.org/html/2605.11189#bib.bib88)\]\. However, obtaining well\-ordered, diffraction\-quality crystals remains a major bottleneck, particularly for flexible proteins, membrane proteins, and large complexes\[[65](https://arxiv.org/html/2605.11189#bib.bib87)\]\.

Recent advances in Cryo\-Electron Microscopy \(Cryo\-EM\) have ushered in a “resolution revolution”\[[41](https://arxiv.org/html/2605.11189#bib.bib89)\], enabling the determination of macromolecular structures at near\-atomic resolution without the need for crystallization\. Instead of being crystallized, samples are rapidly vitrified in a near\-native hydrated state and imaged using an electron beam\[[16](https://arxiv.org/html/2605.11189#bib.bib90)\]\. Single\-particle analysis, in particular, has become a powerful approach for resolving structures of complexes that resist crystallization\. However, Cryo\-EM still faces challenges with small proteins \(typically below 50 kDa\), and achieving resolutions better than 2 Å remains difficult for many targets\.

Nuclear Magnetic Resonance \(NMR\) spectroscopy complements these approaches by probing protein dynamics directly in solution, capturing conformational flexibility and transient interactions that are often lost during crystallization\[[76](https://arxiv.org/html/2605.11189#bib.bib92)\]\. NMR is particularly valuable for studying intrinsically disordered proteins and protein–ligand interactions, though it becomes increasingly challenging for proteins larger than 40 kDa\.

Several lower\-resolution methods provide additional structural insights\. A comparison of different structure determination methods is shown in [Figure 1\.1](https://arxiv.org/html/2605.11189#Ch1.F1)\. Small\-Angle X\-ray Scattering \(SAXS\) characterizes the overall shape and oligomeric state of macromolecules in solution\[[56](https://arxiv.org/html/2605.11189#bib.bib94)\]\. Förster Resonance Energy Transfer \(FRET\) tracks real\-time distance changes between labeled sites, revealing conformational dynamics\[[62](https://arxiv.org/html/2605.11189#bib.bib95)\]\. Cross\-linking mass spectrometry \(XL\-MS\) and hydrogen–deuterium exchange mass spectrometry \(HDX\-MS\) provide complementary information on spatial proximity and solvent accessibility, respectively\[[46](https://arxiv.org/html/2605.11189#bib.bib96)\]\.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/dror2011.jpeg)

Figure 1\.1: An illustration of spatial resolution of different methods\. AFM, atomic force microscopy; EM, electron microscopy; FRET, Förster resonance energy transfer; NMR, nuclear magnetic resonance\. Reproduced from \[[23](https://arxiv.org/html/2605.11189#bib.bib193)\], licensed under the Attribution–Noncommercial–Share Alike 3\.0 Unported License \(http://creativecommons\.org/licenses/by\-nc\-sa/3\.0/\)\.

#### 1\.1\.2 Computational Protein Structure Prediction

Physics\-based protein structure prediction rests on Anfinsen’s thermodynamic hypothesis: that a protein adopts the conformation corresponding to the global minimum of its free\-energy surface\[[6](https://arxiv.org/html/2605.11189#bib.bib83)\]\. Translating this principle into practice requires two components: a reliable energy function to describe the protein’s potential energy surface, and an efficient search algorithm to find its global minimum\.

Molecular dynamics \(MD\) simulations approach this problem by modeling proteins and their surrounding solvent as classical particles governed by empirically derived force fields, propagating the system through time via numerical integration of equations of motion at femtosecond resolution\[[37](https://arxiv.org/html/2605.11189#bib.bib194)\]\. While MD can in principle provide atomic\-resolution models of conformational equilibria and folding transitions, its effectiveness is limited by two persistent challenges: the accuracy of current force fields in reproducing true energy surfaces\[[55](https://arxiv.org/html/2605.11189#bib.bib195)\], and the difficulty of sampling conformational space sufficiently to observe folding events\. Specialized hardware such as the Anton supercomputer has helped address the sampling bottleneck, enabling equilibrium simulations long enough to observe the folding of small, fast\-folding proteins\[[49](https://arxiv.org/html/2605.11189#bib.bib196)\], but extending this to larger and slower\-folding proteins remains prohibitive\.
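To make the numerical-integration step concrete, here is a minimal sketch of velocity Verlet, a commonly used MD integrator, applied to a single particle in a one-dimensional harmonic potential. It is a toy illustration only: the potential, the function names (`harmonic_force`, `velocity_verlet`, `total_energy`), the reduced units, and all parameter values are assumptions chosen for readability, and are not taken from any force field or MD package discussed in this thesis.

```python
def harmonic_force(x, k=1.0):
    """Force from a toy harmonic potential U(x) = 0.5 * k * x**2 (an assumption)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Propagate one particle with the velocity Verlet scheme.

    Returns the position and velocity after n_steps steps of size dt.
    """
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update using the force at the current position.
        v_half = v + 0.5 * dt * f / mass
        # Full-step position update.
        x = x + dt * v_half
        # Recompute the force at the new position and finish the velocity update.
        f = force(x)
        v = v_half + 0.5 * dt * f / mass
    return x, v


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v**2 + 0.5 * k * x**2


if __name__ == "__main__":
    x0, v0 = 1.0, 0.0
    x1, v1 = velocity_verlet(x0, v0, harmonic_force)
    print(f"energy before: {total_energy(x0, v0):.6f}")
    print(f"energy after:  {total_energy(x1, v1):.6f}")
```

In a real simulation the force comes from a molecular force field evaluated over all atoms, and the time step is on the order of femtoseconds. That is why reaching the timescales of folding requires an enormous number of such steps.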

Ab initio modeling methods such as Rosetta take a different approach, directly searching for low\-energy conformations by assembling short fragments of known protein structures using Monte Carlo sampling guided by physically motivated energy functions that emphasize short\-range interactions—van der Waals, hydrogen bonding, and desolvation—while dampening long\-range electrostatics\[[11](https://arxiv.org/html/2605.11189#bib.bib106)\]\. Combined with all\-atom refinement, Rosetta has achieved near\-atomic accuracy for small proteins across all secondary structure classes\[[20](https://arxiv.org/html/2605.11189#bib.bib107)\]\. However, both MD and ab initio methods scale poorly to large proteins and multi\-component complexes, where the combinatorial growth of conformational space overwhelms current sampling strategies and energy function accuracy\[[42](https://arxiv.org/html/2605.11189#bib.bib14)\]\.
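
The Monte Carlo search at the heart of such methods can be sketched with the Metropolis acceptance rule. The example below is a toy, not Rosetta's protocol: the one-dimensional `toy_energy` landscape, the Gaussian move set, and the temperature `kT` are assumptions chosen for illustration, whereas Rosetta proposes fragment insertions and scores them with its own energy function.

```python
import math
import random

def metropolis_accept_prob(delta_e, kT):
    """Probability of accepting a move that changes the energy by delta_e."""
    return 1.0 if delta_e <= 0 else math.exp(-delta_e / kT)

def toy_energy(x):
    # Hypothetical rugged landscape: a quadratic well with sinusoidal ripples.
    return 0.1 * x * x + math.sin(3.0 * x)

def monte_carlo_search(x0, n_steps, kT, step_size, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, toy_energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_size)   # propose a random perturbation
        e_new = toy_energy(x_new)
        if rng.random() < metropolis_accept_prob(e_new - e, kT):
            x, e = x_new, e_new                  # accept the move
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo_search(x0=4.0, n_steps=5000, kT=0.5, step_size=0.5)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Downhill moves are always accepted, while uphill moves are accepted with a probability that decays exponentially in the energy increase, which lets the search escape shallow local minima.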

Recent advances in protein structure prediction have come from the effective use of evolutionary information\. Early approaches relied on homology modeling, which builds three\-dimensional models by copying structural fragments from evolutionarily related proteins with experimentally determined structures\[[38](https://arxiv.org/html/2605.11189#bib.bib108)\]\. As more structures were deposited in the PDB and sequence databases grew, homology modeling became increasingly powerful, but remained fundamentally limited to proteins with detectable sequence similarity to known structures\.

Residues that are close in three\-dimensional space often co\-evolve to maintain structural stability, leaving detectable patterns in multiple sequence alignments \(MSAs\)\. As early as 1999, Lapedes et al\. proposed using Markov Random Fields \(MRFs\) to model pairwise couplings between co\-evolving residues\[[45](https://arxiv.org/html/2605.11189#bib.bib114)\], but limited sequence data and computational resources meant this work went largely unnoticed\. A decade later, Weigt et al\. developed a message\-passing algorithm to infer direct co\-evolutionary couplings across protein–protein interfaces\[[74](https://arxiv.org/html/2605.11189#bib.bib52)\], and Marks et al\. showed that co\-evolutionary signals alone could determine accurate protein folds\[[52](https://arxiv.org/html/2605.11189#bib.bib109)\]\. These methods depended critically on large, diverse MSAs—a requirement increasingly met by the rapid growth of sequence databases\.
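
A common way to write such a pairwise model in the direct-coupling literature is the Potts form below. The notation is an assumption introduced here for illustration: $x_i$ is the amino acid at alignment column $i$, $L$ is the alignment length, $h_i$ are single-site fields, $J_{ij}$ are pairwise couplings, and $Z$ is the normalizing constant. The cited works may differ in parameterization and inference procedure.

```latex
P(x_1, \dots, x_L) = \frac{1}{Z} \exp\!\left( \sum_{i=1}^{L} h_i(x_i) + \sum_{1 \le i < j \le L} J_{ij}(x_i, x_j) \right)
```

Column pairs with strong fitted couplings $J_{ij}$ are then taken as candidate contacts, which helps separate direct couplings from correlations that arise only through intermediate residues.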

Deep learning further transformed contact prediction and protein structure prediction\. MetaPSICOV combined multiple co\-evolution methods within a neural network\[[35](https://arxiv.org/html/2605.11189#bib.bib111)\], and RaptorX\-Contact showed that deep residual networks could substantially improve accuracy\[[71](https://arxiv.org/html/2605.11189#bib.bib51)\]\. AlphaFold1 predicted inter\-residue distance distributions from co\-evolutionary features, achieving top performance at CASP13\[[63](https://arxiv.org/html/2605.11189#bib.bib45)\]\. AlphaFold2 shifted to end\-to\-end structure prediction, using an attention\-based architecture to directly output three\-dimensional coordinates with accuracy comparable to experimental methods\[[36](https://arxiv.org/html/2605.11189#bib.bib4)\]\. AlphaFold\-Multimer extended this to protein complexes\[[27](https://arxiv.org/html/2605.11189#bib.bib112)\], and most recently, AlphaFold3 and RoseTTAFold All\-Atom have expanded to incorporate nucleic acids, small molecules, ions, and covalent modifications within unified frameworks\[[1](https://arxiv.org/html/2605.11189#bib.bib12),[40](https://arxiv.org/html/2605.11189#bib.bib113)\]\.

Despite these advances, current methods still depend heavily on homologous sequences and perform poorly on proteins with shallow MSAs, such as orphan proteins with few known relatives\[[14](https://arxiv.org/html/2605.11189#bib.bib115)\]\. Most methods predict a single static structure and cannot capture the conformational ensembles that underlie protein function\[[61](https://arxiv.org/html/2605.11189#bib.bib116)\]\. Additionally, these models do not explicitly learn physical energetics, limiting their ability to predict how mutations affect stability or binding affinity\[[12](https://arxiv.org/html/2605.11189#bib.bib117)\]\.

### 1\.2Computational Protein Design

#### 1\.2\.1Experimental Protein Engineering Methods

Directed evolution mimics natural selection in the laboratory to engineer proteins with improved or new properties\. The process works by repeatedly generating sequence diversity \(through random mutagenesis or DNA recombination\) and then screening or selecting for improved variants\. A key challenge is linking each protein to the DNA that encodes it\. Display technologies solve this by attaching each variant to its genetic template\. Phage display presents variants on bacteriophage surfaces, enabling screening of libraries exceeding $10^{10}$ variants\[[64](https://arxiv.org/html/2605.11189#bib.bib98),[75](https://arxiv.org/html/2605.11189#bib.bib99)\], while yeast surface display enables quantitative selection using fluorescence\-activated cell sorting \(FACS\)\[[10](https://arxiv.org/html/2605.11189#bib.bib100)\]\. Continuous evolution platforms such as phage\-assisted continuous evolution \(PACE\) take this further by combining mutagenesis and selection into a single uninterrupted process, achieving hundreds of generations of evolution in days\[[26](https://arxiv.org/html/2605.11189#bib.bib101)\]\.

Despite its impact, directed evolution can screen only $\sim 10^{8}$–$10^{13}$ variants, a small fraction of the theoretical sequence space \($20^{N}$ for a protein of $N$ residues\), and generally requires a starting template with some level of the desired activity\. Rational design circumvents these limitations by using structural knowledge to introduce targeted mutations, though it depends on a detailed understanding of structure–function relationships\. Semi\-rational approaches bridge this gap by focusing directed evolution on specific regions, such as active\-site residues, using combinatorial saturation mutagenesis to reduce library size while enriching for functional variants\[[50](https://arxiv.org/html/2605.11189#bib.bib102)\]\.
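
The gap between screening capacity and sequence space can be made concrete with a few lines of arithmetic. The sketch below takes the library sizes quoted above and computes the longest protein whose full sequence space ($20^{N}$) fits within each; the function name and the 100-residue example are illustrative choices.

```python
import math

def max_fully_covered_length(library_size, alphabet_size=20):
    """Largest N such that alphabet_size**N <= library_size (exact integer arithmetic)."""
    n = 0
    while alphabet_size ** (n + 1) <= library_size:
        n += 1
    return n

for library_size in (10 ** 8, 10 ** 13):
    n = max_fully_covered_length(library_size)
    print(f"library of {library_size:.0e} variants covers all sequences up to N = {n}")

# Fraction of sequence space sampled for a 100-residue protein, in log10 units.
log10_fraction = 13 - 100 * math.log10(20)
print(f"log10(fraction sampled at N = 100) = {log10_fraction:.1f}")
```

Even the largest libraries exhaustively cover only peptides of about nine residues; for a 100-residue protein the sampled fraction is roughly $10^{-117}$.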

#### 1\.2\.2Computational Protein Design

Structure\-based protein design aims to identify amino acid sequences that fold into a desired three\-dimensional structure and perform a specific function\. The field was pioneered by physics\-based approaches that use energy functions to evaluate sequence compatibility with a target backbone\. Dahiyat and Mayo demonstrated the first fully automated computational design of a protein, screening $\sim 1.9 \times 10^{27}$ sequences using energy functions\[[19](https://arxiv.org/html/2605.11189#bib.bib118)\]\. Kuhlman et al\. subsequently used the Rosetta energy function to design Top7, a protein with a fold not found in nature, marking a milestone for the field\[[43](https://arxiv.org/html/2605.11189#bib.bib86)\]\. OSPREY introduced provable algorithms with guarantees of finding the global minimum energy conformation, enabling rigorous design with ensemble\-based methods\[[29](https://arxiv.org/html/2605.11189#bib.bib120)\]\. These physics\-based methods optimize sequences and side\-chain conformations \(rotamers\) to minimize energy, and have been extended to multistate design, where sequences are simultaneously optimized across multiple conformational states to achieve specificity, for example by stabilizing a desired binding interaction while destabilizing off\-target ones\[[22](https://arxiv.org/html/2605.11189#bib.bib121)\]\. Another key application of structure\-based design is protein binder engineering\. Cao et al\. demonstrated that miniprotein binders with nanomolar to picomolar affinity could be designed de novo from the target structure alone using Rosetta\[[13](https://arxiv.org/html/2605.11189#bib.bib122)\]\.

Deep learning is rapidly transforming each component of the conventional structure\-based protein design pipeline: backbone generation, sequence design, and filtering\[[18](https://arxiv.org/html/2605.11189#bib.bib131)\]\. The workflow is shown in [Figure 1\.2](https://arxiv.org/html/2605.11189#Ch1.F2)\.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/chu2024fig5.jpg)Figure 1\.2: Overview of the deep learning protein design pipeline, illustrating backbone generation, sequence design, and structure prediction\-based filtering\. Figure adapted from \[[18](https://arxiv.org/html/2605.11189#bib.bib131)\]\.

Early generative approaches used GANs\[[5](https://arxiv.org/html/2605.11189#bib.bib5)\] and VAEs such as Ig\-VAE\[[24](https://arxiv.org/html/2605.11189#bib.bib181)\], but struggled with structural plausibility and diversity\. Hallucination\-based methods offered an alternative by optimizing sequences through gradient descent on structure prediction networks to generate novel topologies\[[7](https://arxiv.org/html/2605.11189#bib.bib9)\], and inpainting approaches fill in sequence and structure around functional sites in a single forward pass\[[70](https://arxiv.org/html/2605.11189#bib.bib10)\]\. BindCraft applied AlphaFold2\-based hallucination to binder design, achieving experimental success rates of 10–100% across diverse targets\[[54](https://arxiv.org/html/2605.11189#bib.bib161)\]\. Diffusion models such as RFdiffusion\[[72](https://arxiv.org/html/2605.11189#bib.bib6)\] and Chroma\[[32](https://arxiv.org/html/2605.11189#bib.bib7)\] generate novel, designable protein backbones from noise\. A growing frontier extends this to all\-atom generation\. Protpardelle\[[17](https://arxiv.org/html/2605.11189#bib.bib125)\] and Pallatom\[[57](https://arxiv.org/html/2605.11189#bib.bib126)\] generate full side\-chain conformations alongside backbone and sequence, while RFdiffusion2 designs enzyme active sites directly from functional group geometries at atom\-level resolution\[[3](https://arxiv.org/html/2605.11189#bib.bib124)\]\. AlphaProteo similarly generates all\-atom protein binders conditioned on target structures\[[77](https://arxiv.org/html/2605.11189#bib.bib85)\]\.

Given a fixed backbone, structure\-conditioned methods such as ProteinMPNN use message passing on protein graphs to predict sequences that fold into the target structure, achieving higher experimental success rates than Rosetta\[[33](https://arxiv.org/html/2605.11189#bib.bib157),[21](https://arxiv.org/html/2605.11189#bib.bib158)\]\. Complementary to structure\-conditioned approaches, protein language models such as ProGen, ESM, and EvoDiff generate protein sequences without requiring a backbone template\[[60](https://arxiv.org/html/2605.11189#bib.bib8),[51](https://arxiv.org/html/2605.11189#bib.bib127),[4](https://arxiv.org/html/2605.11189#bib.bib128),[30](https://arxiv.org/html/2605.11189#bib.bib173)\]\.

Designed candidates must be filtered for foldability, stability, and other properties\. Structure prediction methods\[[36](https://arxiv.org/html/2605.11189#bib.bib4),[1](https://arxiv.org/html/2605.11189#bib.bib12)\] serve as in silico validation, verifying that designed sequences fold into intended structures before experimental testing\[[78](https://arxiv.org/html/2605.11189#bib.bib132),[66](https://arxiv.org/html/2605.11189#bib.bib129),[54](https://arxiv.org/html/2605.11189#bib.bib161)\]\. As discussed in the previous section, current structure prediction models produce single static structures and cannot capture conformational ensembles, and binding affinity and mutational effect prediction remain unreliable, limiting their utility as design filters\.

Together, advances in backbone generation, sequence design, and computational filtering have enabled the design of novel binders, enzymes, and therapeutic proteins\[[39](https://arxiv.org/html/2605.11189#bib.bib130),[18](https://arxiv.org/html/2605.11189#bib.bib131)\]\. Yet significant challenges remain\. Current workflows require generating and screening thousands of candidates to find a few that function experimentally\. Properties critical for therapeutic development—such as solubility, low immunogenicity, and manufacturability—are rarely optimized during design\. Achieving catalytic efficiency comparable to natural enzymes remains particularly difficult\. Furthermore, co\-design and all\-atom generative methods, while promising, currently lag behind the two\-stage pipeline of backbone diffusion followed by fixed\-backbone sequence design in both designability and structural accuracy\.

### 1\.3Deep Learning for Protein Modeling

Deep learning for protein modeling is a fast\-moving field\. We give a brief survey of recent progress across structure prediction, design, dynamics, self\-supervised representation learning, and other downstream applications\.

###### Structure Prediction\.

AlphaFold2 introduced an end\-to\-end architecture that predicts atomic coordinates directly from sequence and MSA inputs\[[36](https://arxiv.org/html/2605.11189#bib.bib4)\]\. Its core components, the Evoformer for joint sequence\-MSA representation learning and the structure module with Invariant Point Attention for SE\(3\)\-equivariant coordinate prediction, define the template that subsequent methods have built on\. AlphaFold\-Multimer extends this framework to protein complexes by training on multimeric structures and pairing chains across MSAs\[[27](https://arxiv.org/html/2605.11189#bib.bib112)\]\. Independent re\-implementations broadened the architecture: RoseTTAFold uses a three\-track network that jointly reasons over sequence, residue pairs, and atomic coordinates\[[8](https://arxiv.org/html/2605.11189#bib.bib66)\]; OpenFold reproduces AlphaFold2 with open training data and code\[[2](https://arxiv.org/html/2605.11189#bib.bib197)\]; ESMFold replaces MSAs with embeddings from a large protein language model, trading evolutionary signal for inference speed and applicability to orphan sequences\[[48](https://arxiv.org/html/2605.11189#bib.bib198)\]\. Recent work extends these architectures to all\-atom biomolecular assemblies\. RoseTTAFold All\-Atom and AlphaFold3 unify proteins, nucleic acids, small molecules, ions, and covalent modifications in a single framework, with AlphaFold3 additionally replacing the structure module with a diffusion head over atomic coordinates\[[40](https://arxiv.org/html/2605.11189#bib.bib113),[1](https://arxiv.org/html/2605.11189#bib.bib12)\]\.

###### Protein Design\.

Hallucination methods optimize sequences through a frozen predictor by gradient ascent on confidence or geometric objectives, generating structures with desired properties\[[7](https://arxiv.org/html/2605.11189#bib.bib9)\]\. BindCraft applies AlphaFold2 hallucination to binder design, achieving experimental success rates of ten to one hundred percent across diverse targets\[[54](https://arxiv.org/html/2605.11189#bib.bib161)\]\. RFdiffusion adapts the RoseTTAFold structure module into a denoising diffusion model that generates backbones from noise, supporting unconditional generation, motif scaffolding, and binder design\[[72](https://arxiv.org/html/2605.11189#bib.bib6)\]\. RFdiffusion2 and AlphaProteo extend these ideas to all\-atom and target\-conditioned binder generation\[[3](https://arxiv.org/html/2605.11189#bib.bib124),[78](https://arxiv.org/html/2605.11189#bib.bib132)\]\. Pretrained predictors also serve as in silico filters: designs from sequence\-only or backbone\-conditioned generators are scored by predictor confidence metrics \(pLDDT, pAE, ipTM\), and self\-consistency between the designed structure and the predictor’s reconstruction from the designed sequence is the primary success criterion before experimental testing\[[72](https://arxiv.org/html/2605.11189#bib.bib6),[54](https://arxiv.org/html/2605.11189#bib.bib161),[66](https://arxiv.org/html/2605.11189#bib.bib129)\]\. This filter\-and\-screen workflow now underlies most experimentally validated design pipelines\.
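
A minimal sketch of such a filter is shown below. The `Design` record, the field names, and the numeric cutoffs are hypothetical placeholders chosen for illustration; published pipelines use their own metrics and thresholds. The self-consistency RMSD is assumed to have been computed beforehand by superimposing the predictor's output on the designed backbone.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    plddt: float     # mean predicted lDDT, 0-100 (higher is better)
    pae: float       # mean predicted aligned error in angstroms (lower is better)
    sc_rmsd: float   # RMSD between designed and re-predicted backbone, angstroms

def passes_filter(d, min_plddt=80.0, max_pae=10.0, max_sc_rmsd=2.0):
    """Keep designs the predictor is confident about and reconstructs closely.

    The default cutoffs are illustrative assumptions, not values from the text.
    """
    return d.plddt >= min_plddt and d.pae <= max_pae and d.sc_rmsd <= max_sc_rmsd

candidates = [
    Design("design_a", plddt=91.2, pae=4.1, sc_rmsd=0.9),
    Design("design_b", plddt=88.0, pae=5.5, sc_rmsd=3.4),   # confident but not self-consistent
    Design("design_c", plddt=62.5, pae=14.8, sc_rmsd=1.7),  # low predictor confidence
]
selected = [d.name for d in candidates if passes_filter(d)]
print(selected)
```

Only designs that clear every cutoff move on to experimental testing; in this toy set that is `design_a` alone.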

###### Modeling Protein Dynamics\.

Structure predictors output single static conformations, while biological function depends on conformational ensembles\. A growing line of work adapts these architectures to capture dynamics\. AFCluster clusters and subsamples MSAs to elicit alternative conformations from AlphaFold2 at inference time without retraining, exposing functional states such as kinase active and inactive forms\[[73](https://arxiv.org/html/2605.11189#bib.bib199)\]\. AlphaFlow and ESMFlow fine\-tune AlphaFold2 and ESMFold with flow matching to sample diverse conformations from sequence, with training targets drawn from the PDB or short molecular dynamics trajectories\[[34](https://arxiv.org/html/2605.11189#bib.bib200)\]\. BioEmu fine\-tunes a structure prediction backbone on a large in\-house corpus of all\-atom molecular dynamics trajectories to approximate Boltzmann\-distributed ensembles in a single forward pass, at much lower cost than direct simulation\[[47](https://arxiv.org/html/2605.11189#bib.bib170)\]\. Together, these methods point toward end\-to\-end networks that approximate distributions over conformations rather than point estimates\.

###### Self\-Supervised Learning of Protein Representations\.

Protein language models learn representations of amino acid sequences from large protein sequence databases\. The ESM family trains off\-the\-shelf transformers with masked language modeling on UniRef, producing embeddings that transfer to structure, function, and variant effect prediction\[[60](https://arxiv.org/html/2605.11189#bib.bib8),[48](https://arxiv.org/html/2605.11189#bib.bib198)\]\. MSA Transformer extends this architecture to aligned sequences via axial attention, capturing co\-evolutionary signal directly in attention maps that prove useful for contact and interface prediction\[[58](https://arxiv.org/html/2605.11189#bib.bib41)\]\. Hybrid models inject structural information into the sequence vocabulary: ProstT5 conditions a sequence\-to\-sequence model on 3Di structural tokens, and SaProt tokenizes residues jointly with Foldseek structural alphabets\[[67](https://arxiv.org/html/2605.11189#bib.bib171),[31](https://arxiv.org/html/2605.11189#bib.bib201)\]\. Hybrid sequence\-structure models consistently outperform pure\-sequence models on variant effect prediction and structure\-based design\.

###### Other Applications\.

Structure prediction architectures and protein language models have been adapted to a range of other problems\. AlphaMissense fine\-tunes AlphaFold2 with population frequency signal to score missense pathogenicity at proteome scale\[[15](https://arxiv.org/html/2605.11189#bib.bib202)\], and zero\-shot scoring with PLM likelihoods correlates with deep mutational scanning data and clinical pathogenicity\[[53](https://arxiv.org/html/2605.11189#bib.bib203),[28](https://arxiv.org/html/2605.11189#bib.bib204)\]\. Variant effect prediction nonetheless remains challenging: accuracy degrades for low\-frequency variants, multi\-residue mutations, and proteins with shallow MSAs\. Predicted structures also accelerate experimental determination, serving as molecular replacement templates in X\-ray crystallography for targets that resist conventional phasing\[[68](https://arxiv.org/html/2605.11189#bib.bib205)\]\.
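
One widely used form of zero-shot scoring compares the model's probability of the mutant residue with that of the wild-type residue at the mutated position. The sketch below assumes the per-position probabilities have already been obtained from a protein language model; the probability table and the function name are made up for illustration and do not come from any particular model.

```python
import math

def log_odds_score(position_probs, wild_type, mutant):
    """Log-likelihood ratio log p(mutant) - log p(wild_type) at one position.

    Negative scores mean the model finds the mutant less plausible than the wild type.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])

# Hypothetical model output at a single position (three residues only, for brevity).
probs_at_site = {"A": 0.6, "G": 0.3, "V": 0.1}

score_a_to_v = log_odds_score(probs_at_site, wild_type="A", mutant="V")
score_a_to_g = log_odds_score(probs_at_site, wild_type="A", mutant="G")
print(f"A->V: {score_a_to_v:.3f}, A->G: {score_a_to_g:.3f}")
```

Under this score the A-to-V substitution ranks as more disruptive than A-to-G, because the model assigns valine a lower probability at that site.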

### 1\.4Contributions

In this thesis, I investigate core aspects of modeling and designing protein complexes using deep learning\.

In Chapter 2, we develop GLINTER, a supervised deep learning method for predicting interfacial contacts between proteins\. GLINTER combines structural representations from monomers with attention maps from the MSA Transformer \(ESM\-MSA\) derived from interologs\. We show that GLINTER outperforms existing methods such as ComplexContact and DeepHomo on both heterodimers and homodimers, and can effectively guide protein–protein docking\.

Chapter 3 addresses a related challenge: identifying interacting homologs \(interologs\) using protein language models \(PLMs\)\. We propose ESMPair, which uses column\-wise attention scores from the pretrained ESM\-MSA\-1b model to pair sequences from individual chains\. ESMPair significantly improves complex structure prediction, particularly for heterodimers and eukaryotic targets where phylogeny\-based pairing methods struggle\.

Chapter 4 introduces RedNet, a framework for fixed\-backbone binder design that builds on the graph neural network approach established in GLINTER\. RedNet uses a multiscale graph transformer that incorporates both backbone geometry and side\-chain information, along with a contrastive decoding algorithm that optimizes for binding affinity and specificity\. We demonstrate that RedNet generates binders with improved properties and can discriminate between highly similar targets\.

Chapter 5 concludes with future directions for improving biomolecular modeling and expanding its scientific applications\.

## References

- \[1\]J\. Abramson, J\. Adler, J\. Dunger, R\. Evans, T\. Green, A\. Pritzel, O\. Ronneberger, L\. Willmore, A\. J\. Ballard, J\. Bambrick,et al\.\(2024\)Accurate structure prediction of biomolecular interactions with AlphaFold 3\.Nature,pp\. 1–3\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1),[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p5.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1)\.
- \[2\]G\. Ahdritz, N\. Bouatta, C\. Floristean, S\. Kadyan, Q\. Xia, W\. Gerecke, T\. J\. O’Donnell, D\. Berenberg, I\. Fisk, N\. Zanichelli,et al\.\(2024\)OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization\.Nature Methods21\(8\),pp\. 1514–1524\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1)\.
- \[3\]\(2026\)Atom\-level enzyme active site scaffolding using RFdiffusion2\.Nature Methods23,pp\. 96–105\.External Links:[Document](https://dx.doi.org/10.1038/s41592-025-02975-x)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.
- \[4\]S\. Alamdari, N\. Thakkar, R\. van den Berg, A\. Lu,et al\.\(2023\)Protein generation with evolutionary diffusion: sequence is all you need\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2023.09.11.556673)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1)\.
- \[5\]N\. Anand and P\. Huang\(2018\)Generative modeling for protein structures\.Advances in neural information processing systems31\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[6\]C\. B\. Anfinsen\(1973\)Principles that govern the folding of protein chains\.Science181\(4096\),pp\. 223–230\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p1.1)\.
- \[7\]I\. Anishchenko, S\. J\. Pellock, T\. M\. Chidyausiku, T\. A\. Ramelot, S\. Ovchinnikov, J\. Hao, K\. Bafna, C\. Norn, A\. Kang, A\. K\. Bera,et al\.\(2021\)De novo protein design by deep network hallucination\.Nature600\(7889\),pp\. 547–552\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.
- \[8\]M\. Baek et al\.\(2021\)Accurate prediction of protein structures and interactions using a three\-track neural network\.Science373,pp\. 871–876\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1)\.
- \[9\]H\. M\. Berman, J\. Westbrook, Z\. Feng, G\. Gilliland, T\. N\. Bhat, H\. Weissig, I\. N\. Shindyalov, and P\. E\. Bourne\(2000\)The protein data bank\.Nucleic acids research28\(1\),pp\. 235–242\.Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p1.1),[Chapter 1](https://arxiv.org/html/2605.11189#Ch1.p4.1)\.
- \[10\]E\. T\. Boder and K\. D\. Wittrup\(1997\)Yeast surface display for screening combinatorial polypeptide libraries\.Nature Biotechnology15\(6\),pp\. 553–557\.External Links:[Document](https://dx.doi.org/10.1038/nbt0697-553)Cited by:[§1\.2\.1](https://arxiv.org/html/2605.11189#Ch1.S2.SS1.p1.1)\.
- \[11\]P\. Bradley, K\. M\. S\. Misura, and D\. Baker\(2005\)Toward high\-resolution de novo structure prediction for small proteins\.Science309\(5742\),pp\. 1868–1871\.External Links:[Document](https://dx.doi.org/10.1126/science.1113801)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p3.1)\.
- \[12\]G\. R\. Buel and K\. J\. Walters\(2022\)Can AlphaFold2 predict the impact of missense mutations on structure?\.Nature Structural & Molecular Biology29\(1\),pp\. 1–2\.External Links:[Document](https://dx.doi.org/10.1038/s41594-021-00714-2)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p7.1)\.
- \[13\]L\. Cao, I\. Goreshnik, B\. Coventry, J\. B\. Case, L\. Miller, L\. Kozodoy, R\. E\. Chen, L\. Carter, A\. C\. Walls, Y\. Park,et al\.\(2022\)Design of protein\-binding proteins from the target structure alone\.Nature605\(7910\),pp\. 551–560\.External Links:[Document](https://dx.doi.org/10.1038/s41586-022-04654-9)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p1.3)\.
- \[14\]D\. Chakravarty and L\. L\. Porter\(2022\)AlphaFold2 fails to predict protein fold switching\.Protein Science31\(6\),pp\. e4353\.External Links:[Document](https://dx.doi.org/10.1002/pro.4353)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p7.1)\.
- \[15\]J\. Cheng, G\. Novati, J\. Pan, C\. Bycroft, A\. Žemgulytė, T\. Applebaum, A\. Pritzel, L\. H\. Wong, M\. Zielinski, T\. Sargeant,et al\.\(2023\)Accurate proteome\-wide missense variant effect prediction with AlphaMissense\.Science381\(6664\),pp\. eadg7492\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px5.p1.1)\.
- \[16\]Y\. Cheng\(2018\)Single\-particle cryo\-EM—How did it get here and where will it go\.Science361\(6405\),pp\. 876–880\.External Links:[Document](https://dx.doi.org/10.1126/science.aat4346)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p2.1)\.
- \[17\]A\. E\. Chu, J\. Kim, L\. Cheng,et al\.\(2024\)An all\-atom protein generative model\.Proceedings of the National Academy of Sciences121\(27\),pp\. e2311500121\.External Links:[Document](https://dx.doi.org/10.1073/pnas.2311500121)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[18\]A\. E\. Chu, T\. Lu, and P\. Huang\(2024\)Sparks of function by de novo protein design\.Nature biotechnology42\(2\),pp\. 203–215\.Cited by:[Figure 1\.2](https://arxiv.org/html/2605.11189#Ch1.F2),[Figure 1\.2](https://arxiv.org/html/2605.11189#Ch1.F2.3.2),[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p2.1),[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p6.1)\.
- \[19\]B\. I\. Dahiyat and S\. L\. Mayo\(1997\)De novo protein design: fully automated sequence selection\.Science278\(5335\),pp\. 82–87\.External Links:[Document](https://dx.doi.org/10.1126/science.278.5335.82)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p1.3)\.
- \[20\]R\. Das, B\. Qian, S\. Raman, R\. Vernon, J\. Thompson, P\. Bradley, S\. Khare, M\. D\. Tyka, D\. Bhat, D\. Chivian,et al\.\(2007\)Structure prediction for CASP7 targets using extensive all\-atom refinement with Rosetta@home\.Proteins: Structure, Function, and Bioinformatics69\(S8\),pp\. 118–128\.External Links:[Document](https://dx.doi.org/10.1002/prot.21636)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p3.1)\.
- \[21\]J\. Dauparas, I\. Anishchenko, N\. Bennett,et al\.\(2022\)Robust deep learning\-based protein sequence design using ProteinMPNN\.Science378\(6615\),pp\. 49–56\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1)\.
- \[22\]J\. A\. Davey and R\. A\. Chica\(2012\)Multistate approaches in computational protein design\.Protein Science21\(9\),pp\. 1241–1252\.External Links:[Document](https://dx.doi.org/10.1002/pro.2128)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p1.3)\.
- \[23\]R\. O\. Dror, R\. M\. Dirks, J\. Grossman, H\. Xu, and D\. E\. Shaw\(2012\)Biomolecular simulation: a computational microscope for molecular biology\.Annual review of biophysics41,pp\. 429–452\.Cited by:[Figure 1\.1](https://arxiv.org/html/2605.11189#Ch1.F1),[Figure 1\.1](https://arxiv.org/html/2605.11189#Ch1.F1.3.2)\.
- \[24\]R\. R\. Eguchi, C\. A\. Choe, and P\. Huang\(2022\)Ig\-VAE: generative modeling of protein structure by direct 3D coordinate generation\.PLoS computational biology18\(6\),pp\. e1010271\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[25\]D\. Esposito, J\. Weile, J\. Shendure, L\. M\. Starita, A\. T\. Papenfuss, F\. P\. Roth, D\. M\. Fowler, and A\. F\. Rubin\(2019\)MaveDB: an open\-source platform to distribute and interpret data from multiplexed assays of variant effect\.Genome Biology20,pp\. 223\.External Links:[Document](https://dx.doi.org/10.1186/s13059-019-1845-6)Cited by:[Chapter 1](https://arxiv.org/html/2605.11189#Ch1.p4.1)\.
- \[26\]K\. M\. Esvelt, J\. C\. Carlson, and D\. R\. Liu\(2011\)A system for the continuous directed evolution of biomolecules\.Nature472\(7344\),pp\. 499–503\.External Links:[Document](https://dx.doi.org/10.1038/nature09929)Cited by:[§1\.2\.1](https://arxiv.org/html/2605.11189#Ch1.S2.SS1.p1.1)\.
- \[27\]R\. Evans, M\. O’Neill, A\. Pritzel, N\. Antropova, A\. Senior, T\. Green, A\. Žídek, R\. Bates, S\. Blackwell, J\. Yim,et al\.\(2022\)Protein complex prediction with AlphaFold\-Multimer\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2021.10.04.463034)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1)\.
- \[28\]J\. Frazer, P\. Notin, M\. Dias, A\. Gomez, J\. K\. Min, K\. Brock, Y\. Gal, and D\. S\. Marks\(2021\)Disease variant prediction with deep generative models of evolutionary data\.Nature599\(7883\),pp\. 91–95\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px5.p1.1)\.
- \[29\]M\. A\. Hallen, J\. W\. Martin, A\. Ojewole, J\. D\. Jou, A\. U\. Lowegard, M\. S\. Frenkel, P\. Gainza, H\. M\. Nisonoff, A\. Mukund, S\. Wang,et al\.\(2018\)OSPREY 3\.0: open\-source protein redesign for you, with powerful new features\.Journal of Computational Chemistry39\(30\),pp\. 2494–2507\.External Links:[Document](https://dx.doi.org/10.1002/jcc.25522)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p1.3)\.
- \[30\]T\. Hayes, R\. Rao, H\. Akin, N\. J\. Sofroniew, D\. Oktay, Z\. Lin, R\. Verkuil, V\. Q\. Tran, J\. Deaton, M\. Wiggert,et al\.\(2025\)Simulating 500 million years of evolution with a language model\.Science387\(6736\),pp\. 850–858\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1)\.
- \[31\]M\. Heinzinger, K\. Weissenow, J\. G\. Sanchez, A\. Henkel, M\. Mirdita, M\. Steinegger, and B\. Rost\(2024\)Bilingual language model for protein sequence and structure\.NAR Genomics and Bioinformatics6\(4\),pp\. lqae150\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px4.p1.1)\.
- \[32\]J\. B\. Ingraham, M\. Baranov, Z\. Costello, K\. W\. Barber, W\. Wang, A\. Ismail, V\. Frappier, D\. M\. Lord, C\. Ng\-Thow\-Hing, E\. R\. Van Vlack,et al\.\(2023\)Illuminating protein space with a programmable generative model\.Nature623\(7989\),pp\. 1070–1078\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[33\]J\. Ingraham, V\. Garg, R\. Barzilay, and T\. Jaakkola\(2019\)Generative models for graph\-based protein design\.Advances in Neural Information Processing Systems32\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1)\.
- \[34\]B\. Jing, B\. Berger, and T\. Jaakkola\(2024\)AlphaFold meets flow matching for generating protein ensembles\.InInternational Conference on Machine Learning,Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px3.p1.1)\.
- \[35\]D\. T\. Jones, T\. Singh, T\. Kosciolek, and S\. Tetchner\(2015\)MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins\.Bioinformatics31\(7\),pp\. 999–1006\.External Links:[Document](https://dx.doi.org/10.1093/bioinformatics/btu791)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1)\.
- \[36\]J\. Jumper, R\. Evans, A\. Pritzel, T\. Green, M\. Figurnov, O\. Ronneberger, K\. Tunyasuvunakool, R\. Bates, A\. Žídek, A\. Potapenko,et al\.\(2021\)Highly accurate protein structure prediction with alphafold\.nature596\(7873\),pp\. 583–589\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1),[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p5.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1),[Chapter 1](https://arxiv.org/html/2605.11189#Ch1.p4.1)\.
- \[37\]M\. Karplus and J\. A\. McCammon\(2002\)Molecular dynamics simulations of biomolecules\.Nature structural biology9\(9\),pp\. 646–652\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p2.1)\.
- \[38\]L\. A\. Kelley, S\. Mezulis, C\. M\. Yates, M\. N\. Wass, and M\. J\. E\. Sternberg\(2015\)The Phyre2 web portal for protein modeling, prediction and analysis\.Nature Protocols10\(6\),pp\. 845–858\.External Links:[Document](https://dx.doi.org/10.1038/nprot.2015.053)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p4.1)\.
- \[39\]T\. Kortemme\(2024\)De novo protein design—from new structures to programmable functions\.Cell187\(18\),pp\. 4934–4953\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p6.1)\.
- \[40\]R\. Krishna, J\. Wang, W\. Ahern, P\. Sturmfels, P\. Venkatesh, I\. Kalvet, G\. R\. Lee, F\. S\. Morey\-Burrows, I\. Anishchenko, I\. R\. Humphreys,et al\.\(2024\)Generalized biomolecular modeling and design with RoseTTAFold All\-Atom\.Science384\(6693\),pp\. eadl2528\.External Links:[Document](https://dx.doi.org/10.1126/science.adl2528)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1)\.
- \[41\]W\. Kühlbrandt\(2014\)The resolution revolution\.Science343\(6178\),pp\. 1443–1444\.External Links:[Document](https://dx.doi.org/10.1126/science.1251652)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p2.1)\.
- \[42\]B\. Kuhlman and P\. Bradley\(2019\)Advances in protein structure prediction and design\.Nat Rev Mol Cell Biol20\(11\),pp\. 681–697\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p3.1)\.
- \[43\]B\. Kuhlman, G\. Dantas, G\. C\. Ireton, G\. Varani, B\. L\. Stoddard, and D\. Baker\(2003\)Design of a novel globular protein fold with atomic\-level accuracy\.Science302\(5649\),pp\. 1364–1368\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p1.3)\.
- \[44\]M\. J\. Landrum, J\. M\. Lee, M\. Benson, G\. R\. Brown, C\. Chao, S\. Chitipiralla, B\. Gu, J\. Hart, D\. Hoffman, W\. Jang,et al\.\(2018\)ClinVar: improving access to variant interpretations and supporting evidence\.Nucleic Acids Research46\(D1\),pp\. D1062–D1067\.External Links:[Document](https://dx.doi.org/10.1093/nar/gkx1153)Cited by:[Chapter 1](https://arxiv.org/html/2605.11189#Ch1.p4.1)\.
- \[45\]A\. S\. Lapedes, B\. G\. Giraud, L\. Liu, and G\. D\. Stormo\(1999\)Correlated mutations in models of protein sequences: phylogenetic and structural effects\.InStatistics in Molecular Biology and Genetics,Lecture Notes–Monograph Series, Vol\.33,pp\. 236–256\.External Links:[Document](https://dx.doi.org/10.1214/lnms/1215455556)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p5.1)\.
- \[46\]A\. Leitner, M\. Faini, F\. Stengel, and R\. Aebersold\(2016\)Crosslinking and mass spectrometry: an integrated technology to understand the structure and function of molecular machines\.Trends in Biochemical Sciences41\(1\),pp\. 20–32\.External Links:[Document](https://dx.doi.org/10.1016/j.tibs.2015.10.008)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p4.1)\.
- \[47\]S\. Lewis, T\. Hempel, J\. Jiménez\-Luna, M\. Gastegger, Y\. Xie, A\. Y\. Foong, V\. G\. Satorras, O\. Abdin, B\. S\. Veeling, I\. Zaporozhets,et al\.\(2025\)Scalable emulation of protein equilibrium ensembles with generative deep learning\.Science389\(6761\),pp\. eadv9817\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px3.p1.1)\.
- \[48\]Z\. Lin, H\. Akin, R\. Rao, B\. Hie, Z\. Zhu, W\. Lu, N\. Smetanin, R\. Verkuil, O\. Kabeli, Y\. Shmueli,et al\.\(2023\)Evolutionary\-scale prediction of atomic\-level protein structure\.Science379\(6637\),pp\. 1123–1130\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px1.p1.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px4.p1.1)\.
- \[49\]K\. Lindorff\-Larsen, S\. Piana, R\. O\. Dror, and D\. E\. Shaw\(2011\)How fast\-folding proteins fold\.Science334\(6055\),pp\. 517–520\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p2.1)\.
- \[50\]S\. Lutz\(2010\)Beyond directed evolution–semi\-rational protein engineering and design\.Current Opinion in Biotechnology21\(6\),pp\. 734–743\.External Links:[Document](https://dx.doi.org/10.1016/j.copbio.2010.08.011)Cited by:[§1\.2\.1](https://arxiv.org/html/2605.11189#Ch1.S2.SS1.p2.4)\.
- \[51\]A\. Madani, B\. Krause, E\. R\. Greene, S\. Subramanian,et al\.\(2023\)Large language models generate functional protein sequences across diverse families\.Nature Biotechnology41\(8\),pp\. 1099–1106\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1)\.
- \[52\]D\. S\. Marks, L\. J\. Colwell, R\. Sheridan, T\. A\. Hopf, A\. Pagnani, R\. Zecchina, and C\. Sander\(2011\)Protein 3D structure computed from evolutionary sequence variation\.PLoS ONE6\(12\),pp\. e28766\.External Links:[Document](https://dx.doi.org/10.1371/journal.pone.0028766)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p5.1)\.
- \[53\]J\. Meier, R\. Rao, R\. Verkuil, J\. Liu, T\. Sercu, and A\. Rives\(2021\)Language models enable zero\-shot prediction of the effects of mutations on protein function\.InAdvances in Neural Information Processing Systems,Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px5.p1.1)\.
- \[54\]M\. Pacesa, L\. Nickel, C\. Schellhaas, J\. Schmidt, E\. Pyatova, L\. Kissling, P\. Barendse, J\. Choudhury, S\. Kapoor, A\. Alcaraz\-Serna,et al\.\(2025\)One\-shot design of functional protein binders with bindcraft\.Nature646\(8084\),pp\. 483–492\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1),[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p5.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.
- \[55\]S\. Piana, K\. Lindorff\-Larsen, and D\. E\. Shaw\(2011\)How robust are protein folding simulations with respect to force field parameterization?\.Biophysical journal100\(9\),pp\. L47–L49\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p2.1)\.
- \[56\]C\. D\. Putnam, M\. Hammel, G\. L\. Hura, and J\. A\. Tainer\(2007\)X\-ray solution scattering \(SAXS\) combined with crystallography and computation: defining accurate macromolecular structures, conformations and assemblies in solution\.Quarterly Reviews of Biophysics40\(3\),pp\. 191–285\.External Links:[Document](https://dx.doi.org/10.1017/S0033583507004635)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p4.1)\.
- \[57\]W\. Qu, J\. Guan, R\. Ma, and K\. Zhai\(2024\)P\(all\-atom\) is unlocking new path for protein design\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2024.08.16.608235)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[58\]R\. Raoet al\.\(2021\)MSA transformer\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2021.02.12.430858)Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px4.p1.1)\.
- \[59\]RCSB PDB\(2024\)Guide to understanding PDB data: crystallographic data\.Note:[https://pdb101\.rcsb\.org/learn/guide\-to\-understanding\-pdb\-data/crystallographic\-data](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/crystallographic-data)Accessed: 2025\-02\-20Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p1.1)\.
- \[60\]A\. Rives, J\. Meier, T\. Sercu, S\. Goyal, Z\. Lin, J\. Liu, D\. Guo, M\. Ott, C\. L\. Zitnick, J\. Ma,et al\.\(2021\)Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences\.Proceedings of the National Academy of Sciences118\(15\),pp\. e2016239118\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p4.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px4.p1.1)\.
- \[61\]T\. Saldaño, N\. Escobedo, J\. Marchetti, D\. J\. Zea, J\. Mac Donagh, A\. J\. Velez Rueda, E\. Gonik, A\. García Melani, J\. Novomisky Nechcoff, M\. N\. Salas,et al\.\(2022\)Impact of protein conformational diversity on AlphaFold predictions\.Bioinformatics38\(10\),pp\. 2742–2748\.External Links:[Document](https://dx.doi.org/10.1093/bioinformatics/btac202)Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p7.1)\.
- \[62\]B\. Schuler and W\. A\. Eaton\(2008\)Protein folding studied by single\-molecule FRET\.Current Opinion in Structural Biology18\(1\),pp\. 16–26\.External Links:[Document](https://dx.doi.org/10.1016/j.sbi.2007.12.003)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p4.1)\.
- \[63\]A\.W\. Senioret al\.\(2020\)Improved protein structure prediction using potentials from deep learning\.Nature577,pp\. 706–710\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1)\.
- \[64\]G\. P\. Smith\(1985\)Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface\.Science228\(4705\),pp\. 1315–1317\.External Links:[Document](https://dx.doi.org/10.1126/science.4001944)Cited by:[§1\.2\.1](https://arxiv.org/html/2605.11189#Ch1.S2.SS1.p1.1)\.
- \[65\]M\. S\. Smyth and J\. H\. J\. Martin\(2000\)X\-ray crystallography\.Molecular Pathology53\(1\),pp\. 8–14\.External Links:[Document](https://dx.doi.org/10.1136/mp.53.1.8)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p1.1)\.
- \[66\]H\. Stark, F\. Faltings, M\. Choi,et al\.\(2025\)BoltzGen: toward universal binder design\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2025.11.20.689494)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p5.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.
- \[67\]J\. Su, C\. Han, Y\. Zhou, J\. Shan, X\. Zhou, and F\. Yuan\(2023\)SaProt: protein language modeling with structure\-aware vocabulary\.bioRxiv\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px4.p1.1)\.
- \[68\]T\. C\. Terwilliger, D\. Liebschner, T\. I\. Croll, C\. J\. Williams, A\. J\. McCoy, B\. K\. Poon, P\. V\. Afonine, R\. D\. Oeffner, J\. S\. Richardson, R\. J\. Read, and P\. D\. Adams\(2024\)AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination\.Nature Methods21\(1\),pp\. 110–116\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px5.p1.1)\.
- \[69\]UniProt Consortium\(2023\)UniProt: the universal protein knowledgebase in 2023\.Nucleic Acids Research51\(D1\),pp\. D523–D531\.External Links:[Document](https://dx.doi.org/10.1093/nar/gkac1052)Cited by:[Chapter 1](https://arxiv.org/html/2605.11189#Ch1.p4.1)\.
- \[70\]J\. Wang, S\. Lisanza, D\. Juergens, D\. Tischer, J\. L\. Watson, K\. M\. Castro, R\. Ragotte, A\. Saragovi, L\. F\. Milles, M\. Baek,et al\.\(2022\)Scaffolding protein functional sites using deep learning\.Science377\(6604\),pp\. 387–394\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[71\]S\. Wanget al\.\(2017\)Accurate de novo prediction of protein contact map by ultra\-deep learning model\.PLoS Comput\. Biol\.13,pp\. e1005324\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p6.1)\.
- \[72\]J\. L\. Watson, D\. Juergens, N\. R\. Bennett, B\. L\. Trippe, J\. Yim, H\. E\. Eisenach, W\. Ahern, A\. J\. Borst, R\. J\. Ragotte, L\. F\. Milles,et al\.\(2023\)De novo design of protein structure and function with rfdiffusion\.Nature620\(7976\),pp\. 1089–1100\.Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.
- \[73\]H\. K\. Wayment\-Steele, A\. Ojoawo, R\. Otten, J\. M\. Apitz, W\. Pitsawong, M\. Hömberger, S\. Ovchinnikov, L\. Colwell, and D\. Kern\(2024\)Predicting multiple conformations via sequence clustering and AlphaFold2\.Nature625\(7996\),pp\. 832–839\.Cited by:[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px3.p1.1)\.
- \[74\]M\. Weigtet al\.\(2009\)Identification of direct residue contacts in protein\-protein interaction by message passing\.Proc\. Natl\. Acad\. Sci\. USA106,pp\. 67–72\.Cited by:[§1\.1\.2](https://arxiv.org/html/2605.11189#Ch1.S1.SS2.p5.1)\.
- \[75\]G\. P\. Winter\(2019\)Harnessing evolution to make medicines \(Nobel Lecture\)\.Angewandte Chemie International Edition58\(41\),pp\. 14438–14445\.External Links:[Document](https://dx.doi.org/10.1002/anie.201909343)Cited by:[§1\.2\.1](https://arxiv.org/html/2605.11189#Ch1.S2.SS1.p1.1)\.
- \[76\]K\. Wüthrich\(2003\)NMR studies of structure and function of biological macromolecules \(Nobel Lecture\)\.Angewandte Chemie International Edition42\(34\),pp\. 3340–3363\.External Links:[Document](https://dx.doi.org/10.1002/anie.200300595)Cited by:[§1\.1\.1](https://arxiv.org/html/2605.11189#Ch1.S1.SS1.p3.1)\.
- \[77\]V\. Zambaldi, D\. La, A\. E\. Chu, H\. Patani, A\. E\. Danson, T\. O\. C\. Kwan, T\. Frerix, R\. G\. Schneider, D\. Saxton, A\. Thillaisundaram, Z\. Wu, I\. Moraes, O\. Lange, E\. Papa, G\. Stanton, V\. Martin, S\. Singh, L\. H\. Wong, R\. Bates, S\. A\. Kohl, J\. Abramson, A\. W\. Senior, Y\. Alguel, M\. Y\. Wu, I\. M\. Aspalter, K\. Bentley, D\. L\. V\. Bauer, P\. Cherepanov, D\. Hassabis, P\. Kohli, R\. Fergus, and J\. Wang\(2024\)De novo design of high\-affinity protein binders with AlphaProteo\.arXiv preprint arXiv:2409\.08022\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2409.08022)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p3.1)\.
- \[78\]V\. Zambaldi, D\. La, A\. E\. Chu, H\. Patani, A\. E\. Danson, T\. O\. C\. Kwan, T\. Frerix, R\. G\. Schneider, D\. Saxton, A\. Thillaisundaram, Z\. Wu, I\. Moraes, O\. Lange, E\. Papa, G\. Stanton, V\. Martin, S\. Singh, L\. H\. Wong, R\. Bates, S\. A\. Kohl, J\. Abramson, A\. W\. Senior, Y\. Alguel, M\. Y\. Wu, I\. M\. Aspalter, K\. Bentley, D\. L\. V\. Bauer, P\. Cherepanov, D\. Hassabis, P\. Kohli, R\. Fergus, and J\. Wang\(2024\)De novo design of high\-affinity protein binders with AlphaProteo\.arXiv preprint arXiv:2409\.08022\.External Links:2409\.08022,[Link](https://arxiv.org/abs/2409.08022)Cited by:[§1\.2\.2](https://arxiv.org/html/2605.11189#Ch1.S2.SS2.p5.1),[§1\.3](https://arxiv.org/html/2605.11189#Ch1.S3.SS0.SSS0.Px2.p1.1)\.

## Chapter 2 Graph Learning of Protein Interfacial Contacts

This chapter presents work with Jinbo Xu\. It was published in Bioinformatics in 2022 \[[39](https://arxiv.org/html/2605.11189#biba.bib11)\]\. Code is available at [https://github\.com/zw2x/glinter](https://github.com/zw2x/glinter)\.

### 2\.1 Introduction

Proteins perform functions by interacting with other molecules or forming protein complexes\. As a result, the full characterization of protein–protein interactions with structural details is crucial to atom\-level understanding of protein functions\. The in silico structural characterization of protein complexes, or quaternary protein structure prediction, is a longstanding challenge in computational structural biology\. Given individual protein chains \(and possibly their structures\), interfacial contact prediction aims to predict which pairs of residues on the protein surface are geometrically close to each other after the protein chains bind together\. Interfacial contacts may facilitate generating and filtering docking decoys\[[1](https://arxiv.org/html/2605.11189#biba.bib18),[10](https://arxiv.org/html/2605.11189#biba.bib27),[13](https://arxiv.org/html/2605.11189#biba.bib30),[21](https://arxiv.org/html/2605.11189#biba.bib37)\], and reveal important biophysical properties and evolutionary information of protein interfaces\[[34](https://arxiv.org/html/2605.11189#biba.bib49)\]\. They are also useful for the redesign of protein–protein interfaces\[[18](https://arxiv.org/html/2605.11189#biba.bib34)\]and prediction of binding affinity\[[35](https://arxiv.org/html/2605.11189#biba.bib50)\]\.

Co\-evolution analysis by global statistical methods \[[3](https://arxiv.org/html/2605.11189#biba.bib20),[37](https://arxiv.org/html/2605.11189#biba.bib52)\] has been used for inter\-protein contact prediction\. A recent study \[[4](https://arxiv.org/html/2605.11189#biba.bib21)\] showed that co\-evolution\-based in silico protein–protein interaction screening methods produced more true protein–protein interactions than high\-throughput experimental techniques\. Nevertheless, accurate co\-evolution analysis needs a large number of sequence homologs and thus may not work well on the large portion of heterodimers for which it is very challenging to find a sufficient number of interacting paralogs \(interlogs\) \[[2](https://arxiv.org/html/2605.11189#biba.bib19),[11](https://arxiv.org/html/2605.11189#biba.bib28),[44](https://arxiv.org/html/2605.11189#biba.bib57)\]\. On the other hand, protein language models, which are trained on individual protein sequences or multiple sequence alignments \(MSAs\), have been shown to perform comparably to or better than global statistical methods on intra\-chain contact prediction when few sequence homologs are available \[[25](https://arxiv.org/html/2605.11189#biba.bib41),[26](https://arxiv.org/html/2605.11189#biba.bib8)\]\. It was also shown previously that a deep learning model trained on individual protein chains works well on protein complex contact prediction \[[44](https://arxiv.org/html/2605.11189#biba.bib57),[46](https://arxiv.org/html/2605.11189#biba.bib59)\]\. Therefore, we hypothesize that a deep language model trained on individual protein chains may also generalize well to protein–protein interactions, reducing the required number of interlogs\. Protein language models are also much faster since they require only a single forward pass during inference, and thus are more suitable for proteome\-scale screening of protein–protein interactions\.

RaptorX ComplexContact \[[44](https://arxiv.org/html/2605.11189#biba.bib57),[46](https://arxiv.org/html/2605.11189#biba.bib59)\] is possibly the first deep learning method for interfacial contact prediction\. It was developed mainly for heterodimers, although it can also be used for homodimers\. Nevertheless, its deep models are trained purely on individual protein chains instead of protein complexes\. Further, ComplexContact does not make use of any \(experimental or predicted\) structures of the constituent monomers of a dimer\. Recently, some deep learning methods have been developed specifically for contact prediction of homodimers, e\.g\. DNCON\_inter \[[24](https://arxiv.org/html/2605.11189#biba.bib40)\] and DeepHomo \[[42](https://arxiv.org/html/2605.11189#biba.bib56)\], both using the ResNet originally implemented in RaptorX \[[36](https://arxiv.org/html/2605.11189#biba.bib51)\]\. In addition to evolutionary information, DeepHomo uses docking maps, native intra\-chain contacts, and experimental structural features derived from monomers to achieve state\-of\-the\-art performance\. However, it is slow in calculating docking maps and thus cannot scale well to proteome\-scale prediction\. Some deep learning methods also use learned representations of tertiary structures, including voxels \[[7](https://arxiv.org/html/2605.11189#biba.bib24),[33](https://arxiv.org/html/2605.11189#biba.bib48)\] and radial/point cloud representations on protein surfaces \[[5](https://arxiv.org/html/2605.11189#biba.bib22),[9](https://arxiv.org/html/2605.11189#biba.bib26),[32](https://arxiv.org/html/2605.11189#biba.bib47)\]\. Meanwhile, some representations include anisotropy information in the structures \[[8](https://arxiv.org/html/2605.11189#biba.bib25),[23](https://arxiv.org/html/2605.11189#biba.bib39)\] while others do not\.

Given the tremendous progress in protein structure prediction \[[14](https://arxiv.org/html/2605.11189#biba.bib31),[16](https://arxiv.org/html/2605.11189#biba.bib4),[36](https://arxiv.org/html/2605.11189#biba.bib51),[41](https://arxiv.org/html/2605.11189#biba.bib13),[40](https://arxiv.org/html/2605.11189#biba.bib54)\] and the fast\-growing number of protein sequences, it is important to leverage predicted structures of constituent monomers and large sequence corpora to produce accurate, proteome\-scale interfacial contact predictions\. An interfacial contact prediction method should effectively extract coevolution signals from a small number of interlogs, and make use of predicted structures of constituent monomers\. Here, we propose a new supervised deep learning method, GLINTER, for interfacial contact prediction that integrates representations learned from \(experimental and predicted\) monomer structures and attention weights generated by the MSA Transformer \(ESM\-MSA\) \[[25](https://arxiv.org/html/2605.11189#biba.bib41)\] from interlogs of the dimer under prediction\. GLINTER applies to both heterodimers and homodimers, outperforming ComplexContact, DeepHomo and BIPSPI on the 13th and 14th CASP\-CAPRI datasets\. The contacts predicted by GLINTER may also improve the ranking of the HDOCK\-generated docking decoys \[[43](https://arxiv.org/html/2605.11189#biba.bib55)\]\. Further, our method runs very quickly, which makes it suitable for proteome\-scale study\.

### 2\.2 Data and methods

#### 2\.2\.1 Network architecture

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/arch-overview.png)

Figure 2\.1: Overview of the GLINTER architecture\. $L_1$ and $L_2$ are the lengths of the two protein chains, $K$ is the number of channels in a CaConv layer, and 144 is the total number of heads in the row attention weights generated by Facebook’s MSA Transformer \(Rao et al\., 2021\)\.

As shown in [Figure 2\.1](https://arxiv.org/html/2605.11189#Ch2.F1), our network, denoted as GLINTER, consists of two major modules: a Siamese graph convolutional network \(GCN\) and a 16\-block ResNet \[[12](https://arxiv.org/html/2605.11189#biba.bib29)\]\. The GCN extracts local features from three types of graphs derived from monomer structures\. The ResNet takes as input the outputs of the GCN module and the attention weights generated by the MSA Transformer \[[25](https://arxiv.org/html/2605.11189#biba.bib41)\] and yields interfacial contact prediction\. One ResNet block has two convolutional layers, each with 96 filters and a $3 \times 3$ kernel\. ELU and BatchNorm are used in each block\. The ResNet is connected to a fully connected layer and a softmax layer for contact probability prediction\. The pseudocode of the main architecture is shown in [Algorithm 1](https://arxiv.org/html/2605.11189#alg1)\.

$$
\begin{array}{l}
\#\ \mathcal{P}_r, \mathcal{P}_l \text{: monomer structures; } \mathbf{M} \text{: paired MSA} \\
\textbf{def}\ \text{GLINTER}(\mathcal{P}_r, \mathcal{P}_l, \mathbf{M}): \\
\quad \#\ \text{Construct graphs from each monomer} \\
\quad \mathcal{G}_r \leftarrow \text{BuildGraphs}(\mathcal{P}_r) \qquad \#\ \text{C}\alpha\text{, atom, surface graphs} \\
\quad \mathcal{G}_l \leftarrow \text{BuildGraphs}(\mathcal{P}_l) \\
\quad \#\ \text{Extract per-monomer features via Siamese GCN} \\
\quad \mathbf{h}_r \leftarrow \text{GCN}(\mathcal{G}_r) \\
\quad \mathbf{h}_l \leftarrow \text{GCN}(\mathcal{G}_l) \qquad \#\ \text{shared weights} \\
\quad \#\ \text{Form pairwise representation} \\
\quad \mathbf{P} \leftarrow \text{OuterConcat}(\mathbf{h}_r, \mathbf{h}_l) \qquad \#\ \mathbf{P}_{ij} = [\mathbf{h}_r^{(i)} \| \mathbf{h}_l^{(j)}] \\
\quad \#\ \text{Extract and symmetrize ESM row attention} \\
\quad \mathbf{A} \leftarrow \text{MSATransformer}(\mathbf{M}) \\
\quad \mathbf{A} \leftarrow \mathbf{A}_{[:N_r,\, N_r:]} + \mathbf{A}_{[N_r:,\, :N_r]}^{\top} \\
\quad \#\ \text{Concatenate and predict contacts} \\
\quad \mathbf{Z} \leftarrow \text{Concat}(\mathbf{P}, \mathbf{A}) \\
\quad \mathbf{Z} \leftarrow \text{ResNet}(\mathbf{Z}) \qquad \#\ \text{16 blocks, 96 filters, } 3 \times 3 \text{ kernel} \\
\quad \mathbf{C} \leftarrow \text{Softmax}(\text{FC}(\mathbf{Z})) \\
\quad \textbf{return}\ \mathbf{C}
\end{array}
$$

Algorithm 1: GLINTER Main Architecture

At each graph convolution layer \(denoted as CaConv\), we calculate the message for a graph edge and node as follows\. For an edge $e$, we feed its feature and the features of its two end nodes to a subnetwork to generate a message\. For a node $q$, we first aggregate all messages of its adjacent nodes using max pooling, and then pass the result to a subnetwork to generate a message of $q$, i\.e\.

$$
g(q) = g\left(\max_{v \in N_q} f([x_q, x_v, e(q,v)])\right) \tag{2.1}
$$

where $x_q$ is the feature of node $q$, $v$ is a node in the neighborhood $N_q$ of $q$, $x_v$ is the feature of $v$, $e(q,v)$ is the feature of edge $(q,v)$, and the non\-linear functions $g$ and $f$ are two fully connected layers of 128 hidden units with BatchNorm and ReLU\. The pseudocode of the GCN is shown in [Algorithm 2](https://arxiv.org/html/2605.11189#alg2)\.
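As a rough illustration of Equation 2\.1, the sketch below evaluates the max\-pooled message for a single node in plain Python\. It is not the GLINTER implementation: the name `node_message`, the dictionary\-based graph layout, and the toy stand\-ins for $f$ and $g$ \(identity and channel sum rather than fully connected layers\) are all assumptions made for the example\.

```python
def node_message(q, features, neighbors, edge_feature, f, g):
    """Evaluate g(max_{v in N_q} f([x_q, x_v, e(q, v)])) for one node q."""
    # One message per neighbor: f applied to the concatenated features.
    messages = [
        f(features[q] + features[v] + edge_feature(q, v))
        for v in neighbors[q]
    ]
    # Channel-wise max pooling over the neighborhood of q.
    pooled = [max(channel) for channel in zip(*messages)]
    return g(pooled)


# Toy inputs (hypothetical): 1-channel node features and a 1-channel edge feature.
features = {0: [1.0], 1: [3.0], 2: [2.0]}
neighbors = {0: [1, 2]}
edge_feature = lambda q, v: [float(abs(q - v))]

# Stand-ins for the learned subnetworks: f is the identity, g sums the channels.
message = node_message(
    0, features, neighbors, edge_feature,
    f=lambda x: x,
    g=lambda m: [sum(m)],
)
```

Because max pooling is applied per channel, the result does not depend on the order in which neighbors are visited\.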

$$
\begin{array}{l}
\#\ \mathbf{X} \text{: node features; } \mathbf{E} \text{: edge features; } \mathbf{p} \text{: positions; } \mathbf{R} \text{: local frames} \\
\textbf{def}\ \text{CaConv}(\mathbf{X}, \mathbf{E}, \mathbf{p}, \mathbf{R}): \\
\quad \textbf{for}\ q \in \mathcal{G}\ \textbf{do} \\
\qquad \#\ \text{Compute edge messages in local reference frame} \\
\qquad \textbf{for}\ v \in \mathcal{N}_q\ \textbf{do} \\
\qquad\quad \tilde{\mathbf{p}}_v \leftarrow \mathbf{R}_q^{\top}(\mathbf{p}_v - \mathbf{p}_q) \qquad \#\ \text{standardize coordinates} \\
\qquad\quad \mathbf{m}_{q,v} \leftarrow \text{MLP}([\mathbf{x}_q \| \mathbf{x}_v \| e(q,v) \| \tilde{\mathbf{p}}_v]) \\
\qquad \textbf{end for} \\
\qquad \#\ \text{Aggregate and update node feature} \\
\qquad \bar{\mathbf{m}}_q \leftarrow \text{MaxPool}_{v \in \mathcal{N}_q}(\mathbf{m}_{q,v}) \\
\qquad \mathbf{x}'_q \leftarrow \text{MLP}(\bar{\mathbf{m}}_q) \\
\quad \textbf{end for} \\
\quad \textbf{return}\ \mathbf{X}'
\end{array}
$$

Algorithm 2: CaConv: Graph Convolution with Local Reference Frames

Both coordinates and normals are used to represent the geometric properties of a monomer structure \[[32](https://arxiv.org/html/2605.11189#biba.bib47)\]\. We standardize the geometric features so that they are invariant to the coordinate system used by the monomer structure\. While calculating a message for any node $q$ \(i\.e\. computation of $f$\), all the adjacent nodes of $q$ are first translated using $q$ as the origin, and then rotated using its predefined local reference frame \[[22](https://arxiv.org/html/2605.11189#biba.bib38),[28](https://arxiv.org/html/2605.11189#biba.bib43)\]\. The standardized features are then concatenated with other features to form the actual inputs of function $f$\. We use a separate graph convolution network \(GCN\) module to process each graph\. When multiple graphs are used for a monomer, the outputs of all its GCN modules are concatenated to form a single output vector of this monomer\. The outputs of two monomers are then outer\-concatenated to form a pairwise representation of this dimer\. When the ESM row attention weight is used, the attention matrix generated by Facebook’s MSA Transformer is concatenated to the pairwise representation, which is then fed to the ResNet for interfacial contact probability prediction\.
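The outer concatenation and the handling of the inter\-chain attention block can be sketched on nested Python lists as follows\. This is an illustrative sketch under stated assumptions, not the released code: the function names are hypothetical, a single attention head is used instead of 144, and the inter\-chain block is symmetrized as in Algorithm 1 by adding the upper\-right block of the attention map to the transpose of the lower\-left block\.

```python
def outer_concat(h_r, h_l):
    """Pairwise representation P[i][j] = [h_r[i] || h_l[j]]."""
    return [[h_i + h_j for h_j in h_l] for h_i in h_r]


def inter_chain_attention(attn, n_r):
    """A[:n_r, n_r:] + A[n_r:, :n_r]^T for one (n_r + n_l) x (n_r + n_l) map."""
    n = len(attn)
    return [
        [attn[i][j] + attn[j][i] for j in range(n_r, n)]
        for i in range(n_r)
    ]


def pair_features(h_r, h_l, attn):
    """Append the symmetrized inter-chain attention as one extra channel."""
    pair = outer_concat(h_r, h_l)
    inter = inter_chain_attention(attn, len(h_r))
    return [
        [pair[i][j] + [inter[i][j]] for j in range(len(h_l))]
        for i in range(len(h_r))
    ]


# Toy dimer (hypothetical values): chain lengths 2 and 3, one feature channel each.
h_r = [[1.0], [2.0]]
h_l = [[10.0], [20.0], [30.0]]
attn = [[10 * i + j for j in range(5)] for i in range(5)]
z = pair_features(h_r, h_l, attn)
```

The result `z` has shape $L_1 \times L_2$ with one channel vector per residue pair, which is the layout a 2D ResNet would consume\.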

#### 2\.2\.2 Features

###### Graph representation of protein structures

We build three different graphs from one protein structure: residue graph, atom graph and surface graph\. In a residue graph, a node is a residue represented by its CA atom, and there is an edge between two residue nodes if and only if the Euclidean distance between their CA atoms is within a certain cutoff, e\.g\. 8 Å\. In an atom graph, a node is a heavy atom or a residue represented by its CA atom, and there is one edge between one residue node and one atom node if and only if their Euclidean distance is within a certain cutoff\.
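A minimal sketch of the residue graph construction is given below, assuming CA coordinates are supplied as a list of 3D tuples and using the 8 Å example cutoff from the text\. The function name `build_residue_graph` and the edge\-list output format are assumptions for illustration; a practical implementation would typically use a spatial index rather than the quadratic loop shown here\.

```python
import math


def build_residue_graph(ca_coords, cutoff=8.0):
    """Return undirected edges (i, j), i < j, between residues whose
    CA atoms lie within `cutoff` angstroms of each other."""
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy chain (hypothetical): four CA atoms spaced 3.8 angstroms apart on the x-axis.
toy_ca = [(3.8 * k, 0.0, 0.0) for k in range(4)]
toy_edges = build_residue_graph(toy_ca)
```

In this toy chain, residues 0 and 3 are 11\.4 Å apart and therefore not connected, while all other pairs fall within the cutoff\.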

We use Reduce\[[38](https://arxiv.org/html/2605.11189#biba.bib53)\], MSMS\[[27](https://arxiv.org/html/2605.11189#biba.bib42)\]and trimesh\[[6](https://arxiv.org/html/2605.11189#biba.bib23)\]to construct the triangulated surface of a protein structure\. To build a surface graph, we first use Reduce to add hydrogen atoms and then construct the triangulated surface of a protein structure using MSMS\. MSMS may generate a large number of vertices on the triangulated surface\. We use the “remove\_closest” algorithm in the trimesh library to sample a subset of vertices such that any two of them are at least 0\.8Å away from each other\. The cutoff is set to 0\.8Å, because the number of remaining vertices does not change much when it is less than 0\.8Å\. The surface can be essentially interpreted as a mesh enclosing the protein\. Two neighboring triangles in the surface share either one edge or at least one vertex\. In a surface graph, one node represents one residue or one vertex on the triangulated surface\. There is one edge between one residue node and one triangle vertex if and only if their Euclidean distance is within a certain cutoff\. It takes only a few seconds to build a surface graph and thus, our method scales well on large\-scale prediction\[[4](https://arxiv.org/html/2605.11189#biba.bib21)\]\.

###### Features

Table 2\.1: Features used in spatial graphs, where $L$ is the number of residues, $N$ is the number of atoms, $M$ is the number of sampled surface vertices, and $E$ is the number of edges in an atom graph\.

[Table 2\.1](https://arxiv.org/html/2605.11189#Ch2.T1) summarizes all the features\. The geometric features of a residue node include its coordinates and a local reference frame derived from the N\-CA\-C plane\. It uses the CA\-C bond as the x\-axis, the vector perpendicular to the plane formed by the N\-CA and CA\-C bonds as the z\-axis, and their cross\-product as the y\-axis\. Such a representation is rotation invariant and thus may generalize well without data augmentation, in contrast to networks that are not rotation invariant\. The other features of a residue node include the position\-specific scoring matrix \(PSSM\), the residue solvent accessible surface area \(the sum of the solvent accessible surface areas of all atoms in the residue\), the one\-hot encoding of the amino acid type, and the sequence index of the residue divided by the protein sequence length \(which provides order information for neural network architectures that are order invariant\) \[[14](https://arxiv.org/html/2605.11189#biba.bib31)\]\.

In an atom graph, an edge has a binary feature called ‘edge type’\. It is equal to 1 if the nodes of this edge belong to the same residue\. An atom is encoded by a 10\-dimensional one\-hot vector, indicating four backbone atom types \(CA, N, C, O\) and six side chain atom types \(CB, C, N, O, S, H\)\. In a surface graph, we use the coordinates and normals generated by MSMS as the features of a triangle vertex \[[9](https://arxiv.org/html/2605.11189#biba.bib26)\], which indicate the contour and orientation of local patches on the surface\. Normals are initially computed by MSMS, then validated by trimesh’s default protocol\.

###### Coevolution signals generated by Facebook’s MSA transformer

We use the row attention weights generated by the MSA Transformer as interfacial co\-evolution signals\. We build a joint MSA for a heterodimer using the protocol proposed by ComplexContact \[[44](https://arxiv.org/html/2605.11189#biba.bib57)\]\. For a homodimer, we simply concatenate each sequence in the MSA with itself\. We then select a diverse set of sequences from the joint MSA as the input of the MSA Transformer\. That is, we filter the MSA with HHfilter \[[31](https://arxiv.org/html/2605.11189#biba.bib46)\] and assign Henikoff weights to sequences\. We further symmetrize the generated inter\-chain attentions, following the MSA Transformer’s protocol \[[25](https://arxiv.org/html/2605.11189#biba.bib41)\]\.
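The two simplest steps in this paragraph, self\-concatenation for homodimers and sequence weighting, can be sketched as below\. The weighting shown is the standard position\-based scheme of Henikoff and Henikoff; that this exact variant is the one used here, that gaps are counted as an ordinary symbol, and the function names are all assumptions for illustration\. HHfilter itself is an external tool and is not reproduced\.

```python
from collections import Counter


def self_pair(msa):
    """Joint MSA for a homodimer: each aligned sequence concatenated with itself."""
    return [seq + seq for seq in msa]


def henikoff_weights(msa):
    """Position-based sequence weights, normalized to sum to 1.

    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct symbols in the column and n is the number of sequences
    sharing this sequence's symbol there.
    """
    weights = [0.0] * len(msa)
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa)
        k = len(counts)
        for s, seq in enumerate(msa):
            weights[s] += 1.0 / (k * counts[seq[col]])
    total = sum(weights)
    return [w / total for w in weights]


# Toy alignment (hypothetical): the third sequence is the most divergent.
toy_msa = ["AC", "AC", "AD"]
toy_weights = henikoff_weights(toy_msa)
toy_joint = self_pair(toy_msa)
```

Sequences that differ from the rest of the alignment receive larger weights, so the divergent third sequence is up\-weighted relative to the two identical ones\.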

#### 2\.2\.3Datasets

Following DeepHomo \[[42](https://arxiv.org/html/2605.11189#biba.bib56)\], we say there is a true contact between two residues \(of two monomers\) if, in the experimental complex structure, the minimal distance between their respective heavy atoms is less than 8 Å\. We define the interfacial contact density of a given dimer by $N/(L_1 L_2)$, where $N$ is the number of inter\-protein contacts and $L_1$ and $L_2$ are the respective lengths of the constituent monomers\.
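The contact definition and the contact density $N/(L_1 L_2)$ translate directly into code\. The sketch below assumes each residue is given as a list of heavy\-atom coordinates in Å; the function names and the data layout are illustrative choices, not part of the original pipeline\.

```python
import math

def min_heavy_atom_distance(res_a, res_b):
    """Smallest distance between any heavy atom of res_a and any of res_b."""
    return min(math.dist(p, q) for p in res_a for q in res_b)

def contact_map(chain_a, chain_b, cutoff=8.0):
    """L1 x L2 boolean map: True where the minimal heavy-atom distance < cutoff."""
    return [
        [min_heavy_atom_distance(ra, rb) < cutoff for rb in chain_b]
        for ra in chain_a
    ]

def contact_density(cmap):
    """Number of inter-protein contacts N divided by L1 * L2."""
    n_contacts = sum(sum(row) for row in cmap)
    return n_contacts / (len(cmap) * len(cmap[0]))
```

For a dimer with one contact between two chains of two residues each, `contact_density` returns 0\.25; real interfaces are far sparser, which is why the medians reported below are in the low percent range\.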

###### CASP\-CAPRI data

We use all 32 dimers \(23 homodimers and 9 heterodimers\) with at most 1000 residues in the 13th and 14th CASP\-CAPRI datasets 40 as our test set\. We do not include dimers with more than 1000 residues since Facebook’s MSA Transformer cannot handle such large proteins\. To avoid redundancy between our training and test sets and to fairly compare GLINTER with recently published methods, we do not use the 11th and 12th CASP\-CAPRI data\. We run HHblits on the ‘uniclust30\_2016\_09’ database to build MSAs for individual chains and then concatenate two MSAs to form a joint MSA for a heterodimer using the method described in ComplexContact \[[44](https://arxiv.org/html/2605.11189#biba.bib57)\]\. We use monomer \(bound\) experimental structures as inputs since their unbound structures are unavailable\. We also tested the 3D structure models of individual chains predicted by AlphaFold \[[15](https://arxiv.org/html/2605.11189#biba.bib32),[30](https://arxiv.org/html/2605.11189#biba.bib45)\] in CASP13 and 14, except for T0974s2, which did not have a predicted 3D model\. The median interfacial contact density of this dataset is 1\.79%\. Calculated by FreeSASA \[[20](https://arxiv.org/html/2605.11189#biba.bib36)\], the median buried solvent accessible surface area \(SASA\) of this dataset is 2507 Å²\.

###### 3D complex data

Our training set has 5306 homodimers and 1036 heterodimers derived from 3DComplex \[[19](https://arxiv.org/html/2605.11189#biba.bib35)\]\. We do not include dimers with more than 1000 residues due to the MSA Transformer’s limit \[[25](https://arxiv.org/html/2605.11189#biba.bib41)\]\. We say two dimers are at most $x\%$ similar if the maximum sequence identity between their constituent monomers is no more than $x\%$\. We build a joint MSA for each dimer as described in the previous subsection\. The median interfacial contact density of the training set is 0\.76%\. The median buried SASA of the training set is 2393 Å²\.

###### PDB2018 data

We build two more test sets from the complexes released to PDB after January 1, 2018\. One test set \(denoted as ‘HomoPDB2018’\) has 165 homodimers and the other one \(denoted as ‘HeteroPDB2018’\) has 72 heterodimers\. We define homodimers and heterodimers in the same way as the 3DComplex data\. We exclude dimers similar to the training set, judged by MMseqs2 E\-value < 1\. We cluster dimers using the 40% sequence identity threshold and also remove dimers with interfacial contact density $<0.7\%$, which is slightly lower than the median interfacial contact density of the training set\. The medians of the buried SASAs of ‘HomoPDB2018’ and ‘HeteroPDB2018’ are 2557 and 2346 Å², respectively\. The medians of the interfacial contact densities of ‘HomoPDB2018’ and ‘HeteroPDB2018’ are 2\.41% and 3\.52%, respectively\. It should be noted that although we remove dimers similar to our training set, there may be some redundancy between our test dimers and the training sets used by the other competing methods\. Therefore, the estimated performance of the competing methods on the PDB2018 data may be overly optimistic\.

#### 2\.2\.4Training and evaluation

We use weighted cross\-entropy as the loss function since the interfacial contact density is very small \(the median of the training set is 0\.76%\)\. We initially trained our network on a small training subset using weights 5, 10, 50 and 100 and found that the weight 5 yields the best average top\-10 precision in the first few epochs\. So in the formal training, we set the weight of a contact to be five times that of a non\-contact\. We trained our deep models using Adam as the optimizer \[[17](https://arxiv.org/html/2605.11189#biba.bib33)\], with the hyperparameters $\beta_1 = 0.9$, $\beta_2 = 0.9999$, $\epsilon = 10^{-8}$\. The learning rate is initialized to $0.0001$ and reduced by half every four epochs\. All models are trained for 20 epochs on two Titan X GPUs, with minibatch size 1 on each GPU\. It takes 20–40 min to train one epoch\. For a given hyperparameter setting, we select the model with the best top\-10 precision on the validation data as the final model\.
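The loss weighting and the learning rate schedule can be written down compactly\. The following is a framework\-free sketch under stated assumptions: the loss is taken to be a binary cross\-entropy averaged over residue pairs \(the normalization is our assumption\), epochs are counted from 0, and the function names are hypothetical\.

```python
import math

def weighted_bce(probs, labels, contact_weight=5.0):
    """Binary cross-entropy where a contact (label 1) counts contact_weight times
    as much as a non-contact (label 0). probs are predicted contact probabilities
    strictly between 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total += -contact_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(probs)

def learning_rate(epoch, base_lr=1e-4, halve_every=4):
    """Start at base_lr and halve it every halve_every epochs (epoch is 0-based)."""
    return base_lr * 0.5 ** (epoch // halve_every)

# Learning rate used in each of the 20 training epochs under these assumptions.
schedule = [learning_rate(epoch) for epoch in range(20)]
```

With these settings the rate is halved four times over the 20 epochs, so the final four epochs run at $0.0001 / 16$\.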

Since our deep network is rotation invariant, we do not augment the training set by rotating a monomer multiple times\. Nevertheless, we randomly rotate a monomer once before training to prevent our deep network from learning unexpected artifacts in the dataset\. For a heterodimer, we use both orders of its two proteins in training\. For evaluation, we predict two contact maps for one heterodimer by exchanging the order of its two proteins, and then compute the geometric average of the two predicted contact map probability matrices as the final prediction\. We evaluate contact prediction in terms of top\-$k$ precision, where $k \in \{10, 25, 50, L/10, L/5\}$ and $L$ is the length of the shorter protein in a dimer\. When the number of native contacts is less than $k$, we still use $k$ as the denominator while computing the top\-$k$ precision\. Inter\-chain contact maps are more sparse than intra\-chain contact maps, so we evaluate a smaller number of predicted inter\-chain contacts\.
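The evaluation protocol above has two pieces: combining the two order\-swapped predictions and computing top\-$k$ precision with $k$ as the denominator\. A minimal sketch, with hypothetical function names and plain nested lists standing in for probability matrices, might look like this:

```python
import math

def geometric_mean_maps(p_ab, p_ba):
    """Combine an L1 x L2 prediction with the L2 x L1 prediction obtained
    after swapping the two proteins, using an element-wise geometric mean."""
    return [
        [math.sqrt(p_ab[i][j] * p_ba[j][i]) for j in range(len(p_ab[0]))]
        for i in range(len(p_ab))
    ]

def top_k_precision(probs, native, k):
    """Fraction of the k highest-scoring residue pairs that are native contacts.
    The denominator is always k, even if fewer than k native contacts exist."""
    scored = sorted(
        ((probs[i][j], i, j)
         for i in range(len(probs))
         for j in range(len(probs[0]))),
        reverse=True,
    )
    hits = sum(1 for _, i, j in scored[:k] if native[i][j])
    return hits / k
```

Note the consequence of the fixed denominator: a target with a single native contact can reach at most 10% top\-10 precision, so sparse interfaces are penalized by construction\.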

#### 2\.2\.5Methods to compare

We compare GLINTER with DeepHomo, ComplexContact and BIPSPI\. DeepHomo is a ResNet\-based method developed for only homodimers\. ComplexContact is a sequence\-only and ResNet\-based method developed mainly for heterodimers\. Both DeepHomo and ComplexContact take as input the coevolution information computed by CCMpred\[[29](https://arxiv.org/html/2605.11189#biba.bib44)\]while GLINTER does not\. BIPSPI works for both homodimers and heterodimers and can take both structures and MSAs as input\.

### 2\.3Results

We test our method with the bound experimental structures while comparing it with BIPSPI, DeepHomo and ComplexContact, as mentioned in [Section 2\.2\.5](https://arxiv.org/html/2605.11189#Ch2.S2.SS5)\. We also study the impact of the quality of predicted structures on our method\.

#### 2\.3\.1Evaluation of interfacial contact prediction

Table 2\.2: Average contact prediction precision \(%\) on the CASP\-CAPRI and PDB data\. All values use native structures except GLINTER∗, which uses AlphaFold\-predicted structures\. ‘DH’ = DeepHomo, ‘CC’ = ComplexContact\. HomoCASP: 23 homodimers; HeteroCASP: 9 heterodimers from CASP\-CAPRI\. HomoPDB and HeteroPDB are from the PDB2018 test set\. Bold indicates best performance\.

###### Performance on the CASP\-CAPRI data

As shown in [Table 2\.2](https://arxiv.org/html/2605.11189#Ch2.T2), tested on the 23 test homodimers, GLINTER has 54% top\-10 precision and 51% top\-$L/10$ precision, where $L$ is the sum of the two monomer protein sequence lengths, while DeepHomo has 30% top\-10 precision and 27% top\-$L/10$ precision\. Tested on the nine heterodimers, GLINTER has 44% top\-10 precision and 48% top\-$L/10$ precision, while ComplexContact has 14% top\-10 precision and 14% top\-$L/10$ precision\. Even using the monomer structures predicted by AlphaFold\-1 and AlphaFold\-2 as input, GLINTER has 43% top\-10 precision on the homodimers and 24% top\-10 precision on the heterodimers\.

###### Performance on the PDB2018 data

Table 2\.3: Average contact prediction precision \(%\) on the HomoPDB2018 test set, which includes 165 homodimers released to PDB after January 1, 2018\. Note: All methods use experimental \(native\) monomer structures\.

Table 2\.4: Average contact prediction precision \(%\) on the HeteroPDB2018 test set, which includes 72 heterodimers released to PDB after January 1, 2018\.

As shown in [Table 2\.2](https://arxiv.org/html/2605.11189#Ch2.T2), tested on the 165 HomoPDB2018 homodimers, GLINTER has 48% top\-10 precision, while BIPSPI and DeepHomo have 20 and 24% top\-10 precision, respectively\. Tested on the 72 HeteroPDB2018 targets, GLINTER has 47% top\-10 precision, while BIPSPI and ComplexContact have 18 and 14% top\-10 precision, respectively\. See detailed results in [Tables 2\.3](https://arxiv.org/html/2605.11189#Ch2.T3) and [2\.4](https://arxiv.org/html/2605.11189#Ch2.T4)\.

In summary, GLINTER consistently outperforms DeepHomo and ComplexContact by a large margin no matter which test sets are evaluated and whether experimental or predicted monomer structures are used\.

#### 2\.3\.2Ablation study

Table 2\.5: Average interfacial contact precision \(%\) of different deep learning models on the CASP\-CAPRI data when experimental monomer structures are used\. Note: Column “D\-cut” shows the distance cutoffs used to define graph edges \(in Å\)\. For example, “8,6,6” indicates that the residue graph, atom graph and surface graph use 8 Å, 6 Å, and 6 Å to define edges, respectively\.

Table 2\.6: Average interfacial contact precision \(%\) of different deep learning models on the CASP\-CAPRI data when monomer structures are predicted by AlphaFold\. Note: Column “D\-cut” shows the distance cutoffs used to define graph edges \(in Å\)\. For example, “8,6,6” indicates that the residue graph, atom graph and surface graph use 8 Å, 6 Å, and 6 Å to define edges, respectively\. The performance of the ESM\-Attention model in this table differs from the native structure results because the predicted and experimental monomer structures do not have exactly the same set of residues\.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/sfig2.png)
Figure 2\.2: CNN\+ESM\-Attention model\.

We train the GLINTER models under eight different settings \(different sets of input features\)\. [Table 2\.5](https://arxiv.org/html/2605.11189#Ch2.T5) and [Table 2\.6](https://arxiv.org/html/2605.11189#Ch2.T6) show their test results with experimental monomer structures and AlphaFold\-predicted monomer structures, respectively\. The eight settings are the “Residue”, “Residue\+ESM”, “Residue\+Atom”, “Residue\+Atom\+ESM”, “Residue\+Surface”, “Residue\+Surface\+ESM”, “Residue\+Atom\+Surface” and “Residue\+Atom\+Surface\+ESM” models\. Here, “Residue”, “Atom” and “Surface” represent the residue, atom and surface graphs, respectively\. “ESM” means that the ESM row attention weights are used\. Using the ESM row attention weights does not change the network architecture, but increases the input dimension of the first ResNet block, as shown in [Figure 2\.1](https://arxiv.org/html/2605.11189#Ch2.F1)\.

To evaluate the contribution of the ESM row attention weights, we test a sequence\-only model called “ESM\-Attention” that uses only the ESM row attention weights as input\. Its major module is a 2D ResNet with the same architecture as the one used in the Residue\+ESM model\. To evaluate the contribution of the graph convolution module, we develop a sequence\-structure\-hybrid model denoted as “CNN\+ESM\-Attention”, which uses a 1D convolutional network \(CNN\) and the same set of input features\. Similar to the Residue\+ESM model, the CNN\+ESM\-Attention model consists of two major modules: a Siamese 1D CNN and a ResNet\. The 1D CNN has four convolution layers \(each with 128 filters and kernel size 5\) and the ResNet is the same as that used in the Residue\+ESM model \([Figure 2\.2](https://arxiv.org/html/2605.11189#Ch2.F2)\)\. Both the ESM\-Attention and the CNN\+ESM\-Attention models are trained on the same dataset using the same protocols as the GLINTER models\.
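One way to see what the 1D CNN baseline can and cannot capture is to compute its receptive field along the sequence\. The sketch below assumes stride 1 and no dilation, neither of which is stated in the text, and the function name is hypothetical\.

```python
def receptive_field(num_layers, kernel_size, stride=1, dilation=1):
    """Number of input positions that influence one output position of a
    stack of identical 1D convolution layers."""
    field, jump = 1, 1
    for _ in range(num_layers):
        field += (kernel_size - 1) * dilation * jump
        jump *= stride
    return field

# Four layers with kernel size 5, as in the CNN+ESM-Attention baseline.
cnn_window = receptive_field(num_layers=4, kernel_size=5)  # 1 + 4 * 4 = 17
```

Under these assumptions each residue embedding summarizes a window of 17 sequence\-adjacent residues, whereas a graph convolution over a residue graph aggregates spatial neighbors regardless of their sequence separation\.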

###### Contribution of the graph convolution module

Table 2\.7: Average interfacial contact precision \(%\) of the ESM\-Attention, CNN\+ESM\-Attention and Residue\+ESM models on the CASP\-CAPRI data\. Note: The ESM\-Attention model only uses MSAs as inputs, while the CNN\+ESM\-Attention and Residue\+ESM models use MSAs and experimental monomer structures as inputs\.

As shown in [Tables 2\.7](https://arxiv.org/html/2605.11189#Ch2.T7) and [2\.5](https://arxiv.org/html/2605.11189#Ch2.T5), the CNN\+ESM\-Attention model performs similarly to the ESM\-Attention model\. The best CNN\+ESM\-Attention model has 35% top\-10 precision and 24% top\-$L/10$ precision, while the ESM\-Attention model has 31% top\-10 precision and 29% top\-$L/10$ precision\. In contrast, the Residue\+ESM model has 43% top\-10 precision and 42% top\-$L/10$ precision, which suggests that the residue graph \(derived from monomer structures\) used by GLINTER is indeed very helpful for interfacial contact prediction\.

###### Dependency on distance cutoff

Table 2\.8: Average top\-10 interfacial contact precision \(%\) of the ‘Residue\+Atom’ and ‘Residue\+Surface’ models on the CASP\-CAPRI data when experimental monomer structures are used\. Note: The first row shows the distance cutoffs used to define graph edges\. For example, ‘8,6’ for ‘Residue\+Atom’ indicates that the residue graph and atom graph use 8 and 6 Å to define edges, respectively\.

The distance cutoff used to define graph edges is an important hyperparameter\. According to our observation, a model with a larger distance cutoff tends to have a lower training loss, although its prediction performance may not be as good\. A model with a smaller distance cutoff may have a higher training loss and much worse prediction performance\. As shown in [Tables 2\.8](https://arxiv.org/html/2605.11189#Ch2.T8) and [2\.5](https://arxiv.org/html/2605.11189#Ch2.T5), the top\-$k$ precision of GLINTER models increases along with the distance cutoff until reaching the optimal value\. For example, the top\-10 precision of the Residue\+Atom model increases from 22 to 33% as the distance cutoff increases from 4 to 6 Å, and then decreases to 27% when the distance cutoff is 8 Å\. This saturation effect on the distance cutoffs is also observed in \[[33](https://arxiv.org/html/2605.11189#biba.bib48)\]\. Different types of graphs may rely on distance cutoffs differently\. For example, the top\-10 precision of the Residue\+Surface model is around 33% when the distance cutoff defining the surface graph ranges from 4 to 10 Å, while the precision of the “Residue\+Atom” model changes a lot with respect to the distance cutoff\. Here, we determine the optimal distance cutoff using the experimental monomer structures, so the chosen cutoff may not be optimal when predicted monomer structures are used\.
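To make the role of the cutoff concrete, the sketch below builds graph edges from node coordinates by thresholding pairwise distances\. It is an illustration only: the function name is hypothetical, and whether the comparison is strict and whether edges are directed are assumptions not specified in the text\.

```python
import math
from itertools import combinations

def build_edges(coords, cutoff):
    """Connect every pair of nodes whose Euclidean distance is at most cutoff (in Å).
    Returns undirected edges as (i, j) index pairs with i < j."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(coords), 2)
        if math.dist(p, q) <= cutoff
    ]

# Three nodes on a line: the edge set grows as the cutoff increases.
nodes = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (8.5, 0.0, 0.0)]
edges_by_cutoff = {cutoff: build_edges(nodes, cutoff) for cutoff in (4.0, 6.0, 10.0)}
```

A larger cutoff gives each node more neighbors to aggregate over, which is consistent with the lower training loss noted above, but it also dilutes local geometry, which may explain why precision peaks at an intermediate value\.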

###### Dependency on the quality of predicted monomer structures

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/tmscore-ablation.png)
Figure 2\.3: The x\-axis is the TMscore of the predicted monomer structures\. The y\-axis is the difference in top\-10 precision between the experimental and predicted monomer structures\. \(A\) “Residue, D\-cut=8”\. \(B\) “Residue \+ Atom, D\-cut=8,6”\. \(C\) “Residue \+ Surface, D\-cut=8,6”\. \(D\) “Residue \+ Atom \+ Surface, D\-cut=8,6,6”\. \(E\) “Residue \+ Atom \+ Surface \+ ESM, D\-cut=8,6,6”\. In all the box plots, the upper edge of the box is the third quartile \(Q3\), the lower edge of the box is the first quartile \(Q1\), the orange line is the median, the upper cap is the highest datum below Q3 \+ 1\.5\(Q3 \- Q1\), and the lower cap is the lowest datum above Q1 \- 1\.5\(Q3 \- Q1\)\.

GLINTER models are trained with experimental monomer structures\. Here, we study their prediction performance when the AlphaFold\-predicted monomer structures are used\. We use the lower TMscore \[[45](https://arxiv.org/html/2605.11189#biba.bib58)\] of the two constituent monomer models to measure the structure quality of a dimer under test\. We exclude the test dimers without any correct top\-$k$ predicted contacts when their native structures are used as input\. Since there are only dozens of test targets, we divide them into four groups according to their TMscores: low quality \($0.2 \leq \text{TMscore} < 0.5$\), acceptable quality \($0.5 \leq \text{TMscore} < 0.7$\), medium quality \($0.7 \leq \text{TMscore} < 0.9$\) and high quality \($0.9 \leq \text{TMscore} < 1.0$\)\. [Figure 2\.3](https://arxiv.org/html/2605.11189#Ch2.F3) shows that even when trained on bound experimental structures, our methods work well on predicted structures with medium or high quality \(i\.e\., $\text{TMscore} > 0.7$\)\. When the predicted monomer structures have lower quality \($\text{TMscore} < 0.7$\), GLINTER models perform better with experimental structures than with predicted structures\. By comparing [Figure 2\.3](https://arxiv.org/html/2605.11189#Ch2.F3)D and E, we find that the ESM row attention weight may not be able to reduce the precision gap incurred by predicted structures\. This suggests that the ESM row attention weight derived purely from MSAs may not necessarily improve the robustness of our structure\-based models\.
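The grouping rule can be stated as a small function\. This is a sketch with a hypothetical function name; it follows the four half\-open intervals given in the text and returns `None` for values outside them, which is our own choice for handling cases the text does not cover\.

```python
def quality_group(tm_chain_a, tm_chain_b):
    """Assign a dimer to a quality group using the lower of the two
    monomer TMscores, following the intervals described in the text."""
    tm = min(tm_chain_a, tm_chain_b)
    bins = [
        (0.2, 0.5, "low"),
        (0.5, 0.7, "acceptable"),
        (0.7, 0.9, "medium"),
        (0.9, 1.0, "high"),
    ]
    for lower, upper, label in bins:
        if lower <= tm < upper:
            return label
    return None  # outside the ranges covered by the text
```

Using the minimum means that one poorly predicted chain is enough to place the whole dimer in a lower group, even if its partner is modeled almost perfectly\.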

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/fig2.png)
Figure 2\.4: Comparison of top\-10 precision of three models: ESM, Residue\+Atom\+Surface and Residue\+Atom\+Surface\+ESM\. \(A\) compares Residue\+Atom\+Surface and ESM, \(B\) compares Residue\+Atom\+Surface\+ESM and ESM, and \(C\) compares Residue\+Atom\+Surface and Residue\+Atom\+Surface\+ESM\.

###### Contribution of the ESM row attention weight

As shown in [Tables 2\.2](https://arxiv.org/html/2605.11189#Ch2.T2) and [2\.5](https://arxiv.org/html/2605.11189#Ch2.T5), on the 32 dimer targets, the ESM\-Attention model has top\-10 and top\-$L/10$ precision of 31 and 29%, respectively, greatly outperforming BIPSPI, which has top\-10 and top\-$L/10$ precision of 15 and 14%, respectively\. That is, even though the MSA Transformer is pre\-trained with the MSAs of single\-chain protein sequences, it works for inter\-chain contact prediction\. Over the nine heterodimer targets, the top\-10 precision of ComplexContact and ESM\-Attention is 14 and 28%, respectively\. As shown in [Tables 2\.5](https://arxiv.org/html/2605.11189#Ch2.T5) and [2\.6](https://arxiv.org/html/2605.11189#Ch2.T6), no matter whether native or predicted monomer structures are used, the ESM row attention weight consistently improves the performance of GLINTER models, which confirms that coevolution signals are very useful for inter\-chain contact prediction\. [Figure 2\.4](https://arxiv.org/html/2605.11189#Ch2.F4)A compares the performance of the ESM\-Attention model \(which is a sequence\-only model\) and the Residue\+Atom\+Surface model \(which is a structure\-only model\) when the native structures are used\. They have similar overall performance, but perform very differently on individual test targets, which suggests that the ESM row attention weight and structure information are highly complementary to each other\. On the majority of test targets, the Residue\+Atom\+Surface\+ESM model outperforms the ESM\-Attention model \([Figure 2\.4](https://arxiv.org/html/2605.11189#Ch2.F4)B\) and the Residue\+Atom\+Surface model \([Figure 2\.4](https://arxiv.org/html/2605.11189#Ch2.F4)C\)\. [Figure 2\.4](https://arxiv.org/html/2605.11189#Ch2.F4)A and B differ only in the y\-axis by an ESM feature, so their comparison shows the impact of the ESM features\. [Figure 2\.4](https://arxiv.org/html/2605.11189#Ch2.F4)A and C differ only in the x\-axis by Residue\+Atom\+Surf, so their comparison shows the impact of the Residue\+Atom\+Surf features\.

###### Case study of T0997

We study an interesting target, T0997, where the ESM\-Attention model and the Residue\+Atom\+Surface model perform much better than the Residue\+Atom\+Surface\+ESM model\. In Fig\. S5, the cluster of contacts correctly predicted by the ESM\-Attention model is different from the cluster correctly predicted by the Residue\+Atom\+Surface model, so we can hypothesize that the ESM\-Attention model \(MSA\-based\) and the Residue\+Atom\+Surface model \(structure\-based\) focus on very different patterns in T0997\. Therefore, it is possible that there are some structure patterns around the cluster correctly predicted by the ESM\-Attention model that make the Residue\+Atom\+Surface\+ESM model predictions misaligned with the ground truth contacts; see[Table˜2\.6](https://arxiv.org/html/2605.11189#Ch2.T6)\.

###### Dependency on the depth of MSAs

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/msa-ablation.png)
Figure 2\.5: Correlation between $\ln(\text{Meff})$ \(x\-axis\) and the number of correct top\-10 predictions \(y\-axis\) of the ESM\-Attention model\. The targets without correct top\-10 predictions are excluded\. \($R^2 = 0.3093$\)

It is known that intra\-chain contact prediction precision correlates with the depth of MSAs, denoted as Meff\. Given an MSA, we use CD\-HIT to cluster all the sequences in this MSA using 65% sequence identity as the cutoff\. The effective MSA depth is defined as the number of clusters\. Here, we study the impact of MSA depth on interfacial contact prediction when the ESM row attention weight is used\. To remove the impact of inaccurate predicted structures, we test GLINTER models with native monomer structures\. [Figure 2\.5](https://arxiv.org/html/2605.11189#Ch2.F5) shows that there is a certain correlation \($R^2 = 0.3093$\) between the number of correct top\-10 predictions by the ESM\-Attention model and the $\ln(\text{Meff})$ of the input MSA\.
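The definition of Meff as a cluster count can be illustrated with a toy greedy clustering\. This sketch is a simplified stand\-in and not CD\-HIT: it assumes equal\-length aligned sequences, measures identity over all columns \(gaps included\), and processes sequences in input order\. The function names and the toy alignment are hypothetical\.

```python
import math

def identity(seq_a, seq_b):
    """Fraction of identical columns between two equal-length aligned sequences."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def effective_depth(msa, cutoff=0.65):
    """Greedy clustering: a sequence starts a new cluster unless it is at least
    `cutoff` identical to an existing cluster representative. Returns the
    number of clusters, used here as Meff."""
    representatives = []
    for seq in msa:
        if not any(identity(seq, rep) >= cutoff for rep in representatives):
            representatives.append(seq)
    return len(representatives)

toy_msa = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "AAAACCCCCC"]
meff = effective_depth(toy_msa)   # the second sequence joins the first cluster
log_meff = math.log(meff)         # the quantity plotted on the x-axis
```

Near\-duplicate sequences collapse into one cluster, so Meff measures diversity rather than the raw number of rows in the alignment\.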

#### 2\.3\.3Application to selection of docking decoys

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/glinter/fig3-dock.png)
Figure 2\.6: The average quality \(measured by TMscore\) of the decoys selected by top predicted contacts\. The x\-axis is the number of top decoys selected\. In the legend, “top\-10”, “top\-25” and “top\-50” indicate that the top 10, 25 and 50 predicted contacts are used to select docking decoys, respectively\. “best decoy” indicates the quality of the best decoys generated by HDOCK\.

A simple application of predicted interfacial contacts is to select docking decoys\. We use the top $k$ \($k \in \{10, 25, 50\}$\) contacts predicted by the Residue\+Atom\+Surface\+ESM model to rank the docking decoys generated by HDOCK\. The quality of a docking decoy is calculated by comparing it with its experimental complex structure using MMalign \(Mukherjee and Zhang, 2009\)\. For each target, we select the top $N$ decoys ranked by the predicted interfacial contacts and define their highest TMscore as the “TMscore of the top $N$ decoys”\. In [Figure 2\.6](https://arxiv.org/html/2605.11189#Ch2.F6), the y\-axis shows the average TMscore of the top $N$ decoys over all the test dimers\. Generally speaking, predicted contacts may improve the quality of top decoys by 5\-8%\. Except when $N=10$, using more top predicted contacts may select better decoys than using only the top 10 predicted contacts\.
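The selection procedure can be sketched as follows\. The text does not specify how predicted contacts are turned into a decoy score, so this example assumes the simplest choice: count how many of the top predicted contacts are present in the decoy\. The function names and the dictionary layout of a decoy are hypothetical\.

```python
def contact_score(decoy_contacts, predicted_contacts):
    """Number of top predicted residue pairs that are in contact in the decoy
    (assumed scoring rule)."""
    return sum(1 for pair in predicted_contacts if pair in decoy_contacts)

def rank_decoys(decoys, predicted_contacts):
    """Sort decoys from most to fewest satisfied predicted contacts."""
    return sorted(
        decoys,
        key=lambda decoy: contact_score(decoy["contacts"], predicted_contacts),
        reverse=True,
    )

def tmscore_of_top_n(ranked_decoys, n):
    """Highest TMscore among the n best-ranked decoys, as defined in the text."""
    return max(decoy["tmscore"] for decoy in ranked_decoys[:n])
```

Averaging `tmscore_of_top_n` over all test dimers for each $N$ gives one curve of the kind plotted in the figure; the TMscores themselves would come from comparing each decoy with the experimental complex\.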

### 2\.4Conclusion

We have presented an interfacial contact prediction method, GLINTER, that predicts inter\-protein contacts by integrating attention information generated by protein language models and graph modeling of monomer \(experimental and predicted\) structures\. The attention may capture evolutionary and coevolutionary information encoded in MSAs\. We demonstrate that GLINTER outperforms existing methods and that, even if trained with experimental structures, it generalizes well to predicted structures\. The interfacial contacts predicted by our method may help improve the selection of docking decoys\. Our ablation study shows that the attention information and structural features are complementary and important for interfacial contact prediction\. The features used by GLINTER can be calculated very efficiently, and GLINTER is applicable to both heterodimers and homodimers\. GLINTER is therefore potentially applicable to the proteome\-scale study of protein–protein interactions and complexes\.

## References

- \[1\] C\. Baldassi et al\. \(2014\) Fast and accurate multivariate gaussian modeling of protein families: predicting residue contacts and protein\-interaction partners\. PLoS One 9, pp\. e92721\.
- \[2\] A\.\-F\. Bitbol et al\. \(2016\) Inferring interaction partners from protein sequences\. Proc\. Natl\. Acad\. Sci\. USA 113, pp\. 12180–12185\.
- \[3\] L\. Burger and E\. van Nimwegen \(2008\) Accurate prediction of protein–protein interactions from sequence alignments using a bayesian method\. Mol\. Syst\. Biol\. 4, pp\. 165\.
- \[4\] Q\. Cong et al\. \(2019\) Protein interaction networks revealed by proteome coevolution\. Science 365, pp\. 185–189\.
- \[5\] B\. Dai and C\. Bailey\-Kellogg \(2021\) Protein interaction interface region prediction by geometric deep learning\. Bioinformatics 37, pp\. 2580–2588\.
- \[6\] M\. Dawson\-Haggerty et al\. \(2019\) Trimesh\. Note: [https://github\.com/mikedh/trimesh](https://github.com/mikedh/trimesh)
- \[7\] G\. Derevyanko and G\. Lamoureux \(2019\) Protein–protein docking using learned three\-dimensional representations\. bioRxiv, pp\. 738690\.
- \[8\] A\. Fout et al\. \(2017\) Protein interface prediction using graph convolutional networks\. Adv\. Neural Inf\. Process\. Syst\., pp\. 6533–6542\.
- \[9\] P\. Gainza et al\. \(2020\) Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning\. Nat\. Methods 17, pp\. 184–192\.
- \[10\] C\. Geng et al\. \(2020\) IScore: a novel graph kernel\-based function for scoring protein–protein docking models\. Bioinformatics 36, pp\. 112–121\.
- \[11\] T\. Gueudre et al\. \(2016\) Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis\. Proc\. Natl\. Acad\. Sci\. USA 113, pp\. 12186–12191\.
- \[12\] K\. He et al\. \(2016\) Deep residual learning for image recognition\. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition \(CVPR\), pp\. 770–778\.
- \[13\] T\.A\. Hopf et al\. \(2014\) Sequence co\-evolution gives 3D contacts and structures of protein complexes\. eLife 3, pp\. e03430\.
- \[14\] X\. Jing and J\. Xu \(2021\) Fast and effective protein model refinement by deep graph neural networks\. Nat Comput Sci 1, pp\. 462–469\.
- \[15\] J\. Jumper et al\. \(2020\) High accuracy protein structure prediction using deep learning\. Fourteenth Crit\. Assess\. Tech\. Protein Struct\. Predict\. 22, pp\. 24\.
- \[16\] J\. Jumper, R\. Evans, A\. Pritzel, T\. Green, M\. Figurnov, O\. Ronneberger, K\. Tunyasuvunakool, R\. Bates, A\. Žídek, A\. Potapenko, et al\. \(2021\) Highly accurate protein structure prediction with AlphaFold\. Nature 596\(7873\), pp\. 583–589\.
- \[17\] D\.P\. Kingma and J\. Ba \(2014\) Adam: a method for stochastic optimization\. arXiv preprint arXiv:1412\.6980\.
- \[18\] E\. Laine and A\. Carbone \(2015\) Local geometry and evolutionary conservation of protein surfaces reveal the multiple recognition patches in protein–protein interactions\. PLoS Comput\. Biol\. 11, pp\. e1004580\.
- \[19\] E\.D\. Levy et al\. \(2006\) 3D complex: a structural classification of protein complexes\. PLoS Comput\. Biol\. 2, pp\. e155\.
- \[20\] S\. Mitternacht \(2016\) FreeSASA: an open source C library for solvent accessible surface area calculations\. F1000Res 5, pp\. 189\.
- \[21\] S\. Ovchinnikov et al\. \(2014\) Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information\. eLife 3, pp\. e02030\.
- \[22\] G\. Pagès et al\. \(2019\) Protein model quality assessment using 3D oriented convolutional neural networks\. Bioinformatics 35, pp\. 3313–3319\.
- \[23\] S\. Pittala and C\. Bailey\-Kellogg \(2020\) Learning context\-aware structural representations to predict antigen and antibody binding interfaces\. Bioinformatics 36, pp\. 3996–4003\.
- \[24\] F\. Quadir et al\. \(2021\) DNCON2\_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning\. Sci Rep 11, pp\. 12295\.
- \[25\] R\. Rao et al\. \(2021\) MSA transformer\. bioRxiv\. External Links: [Document](https://dx.doi.org/10.1101/2021.02.12.430858)
- \[26\] A\. Rives, J\. Meier, T\. Sercu, S\. Goyal, Z\. Lin, J\. Liu, D\. Guo, M\. Ott, C\. L\. Zitnick, J\. Ma, et al\. \(2021\) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences\. Proceedings of the National Academy of Sciences 118\(15\), pp\. e2016239118\.
- \[27\] M\.F\. Sanner et al\. \(1996\) Reduced surface: an efficient way to compute molecular surfaces\. Biopolymers 38, pp\. 305–320\.
- \[28\] S\. Sanyal et al\. \(2020\) ProteinGCN: protein model quality assessment using graph convolutional networks\. bioRxiv\. External Links: [Document](https://dx.doi.org/10.1101/2020.04.06.028266)
- \[29\] S\. Seemayer et al\. \(2014\) CCMpred–fast and precise prediction of protein residue–residue contacts from correlated mutations\. Bioinformatics 30, pp\. 3128–3130\.
- \[30\] A\.W\. Senior et al\. \(2020\) Improved protein structure prediction using potentials from deep learning\. Nature 577, pp\. 706–710\.
- \[31\] M\. Steinegger et al\. \(2019\) HH\-suite3 for fast remote homology detection and deep protein annotation\. BMC Bioinformatics 20, pp\. 473\.
- \[32\] F\. Sverrisson et al\. \(2020\) Fast end\-to\-end learning on protein surfaces\. pp\. 15272–15281\.
- \[33\] R\. Townshend et al\. \(2019\) End\-to\-end learning on 3D protein structure for interface prediction\. Adv\. Neural Inf\. Process\. Syst\. 32, pp\. 15642–15651\.
- \[34\] G\. Uguzzoni et al\. \(2017\) Large\-scale identification of coevolution signals across homo\-oligomeric protein interfaces by direct coupling analysis\. Proc\. Natl\. Acad\. Sci\. USA 114, pp\. E2662–E2671\.
- \[35\] A\. Vangone and A\.M\. Bonvin \(2015\) Contacts\-based prediction of binding affinity in protein–protein complexes\. eLife 4, pp\. e07454\.
- \[36\] S\. Wang et al\. \(2017\) Accurate de novo prediction of protein contact map by ultra\-deep learning model\. PLoS Comput\. Biol\. 13, pp\. e1005324\.
- \[37\] M\. Weigt et al\. \(2009\) Identification of direct residue contacts in protein\-protein interaction by message passing\. Proc\. Natl\. Acad\. Sci\. USA 106, pp\. 67–72\.
- \[38\] J\.M\. Word et al\. \(1999\) Asparagine and glutamine: using hydrogen atom contacts in the choice of side\-chain amide orientation\. J\. Mol\. Biol\. 285, pp\. 1735–1747\.
- \[39\] Z\. Xie and J\. Xu \(2022\) Deep graph learning of inter\-protein contacts\. Bioinformatics 38\(4\), pp\. 947–953\.
- \[40\] J\. Xu et al\. \(2021\) Improved protein structure prediction by deep learning irrespective of co\-evolution information\. Nature Machine Intelligence 3, pp\. 601–609\.
- \[41\] J\. Xu \(2019\) Distance\-based protein folding powered by deep learning\. Proceedings of the National Academy of Sciences 116\(34\), pp\. 16856–16865\.
- \[42\]Y\. Yan and S\.\-Y\. Huang\(2021\)Accurate prediction of inter\-protein residue–residue contacts for homo\-oligomeric protein complexes\.Brief\. Bioinform22\.Cited by:[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p3.1),[§2\.2\.3](https://arxiv.org/html/2605.11189#Ch2.S2.SS3.p1.4)\.
- \[43\]Y\. Yanet al\.\(2017\)HDOCK: a web server for protein–protein and protein–dna/rna docking based on a hybrid strategy\.Nucleic Acids Res\.45,pp\. W365–W373\.Cited by:[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p4.1)\.
- \[44\]H\. Zenget al\.\(2018\)ComplexContact: a web server for inter\-protein contact prediction using deep learning\.Nucleic Acids Res\.46,pp\. W432–W437\.Cited by:[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p2.1),[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p3.1),[§2\.2\.2](https://arxiv.org/html/2605.11189#Ch2.S2.SS2.SSS0.Px3.p1.1),[§2\.2\.3](https://arxiv.org/html/2605.11189#Ch2.S2.SS3.SSS0.Px1.p1.1)\.
- \[45\]Y\. Zhang and J\. Skolnick\(2005\)TM\-align: a protein structure alignment algorithm based on the tm\-score\.Nucleic Acids Res\.33,pp\. 2302–2309\.Cited by:[§2\.3\.2](https://arxiv.org/html/2605.11189#Ch2.S3.SS2.SSS0.Px3.p1.7)\.
- \[46\]T\.\-M\. Zhouet al\.\(2018\)Deep learning reveals many more inter\-protein residue–residue contacts than direct coupling analysis\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/240754)Cited by:[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p2.1),[§2\.1](https://arxiv.org/html/2605.11189#Ch2.S1.p3.1)\.

## Chapter 3 Improved Protein Heterodimer Structure Prediction with Protein Language Models

This chapter presents work with Bo Chen, Jiezhong Qiu, Zhaofeng Ye, Jinbo Xu, and Jie Tang\. It was published in Briefings in Bioinformatics in 2023 \[[4](https://arxiv.org/html/2605.11189#bibb.bib82)\]\. Code is available at [https://github\.com/zw2x/msa\_pair](https://github.com/zw2x/msa_pair)\.

### 3\.1 Introduction

While GLINTER demonstrates that graph neural networks can effectively predict interfacial contacts from monomeric structures and co\-evolutionary signals, its performance depends on the quality of the input co\-evolutionary information\. For heterodimeric complexes, constructing informative co\-evolutionary features requires identifying correct interacting homologs across species — a problem that remains a key bottleneck\. In this chapter, we address this challenge directly by developing a protein language model\-based approach to pair interacting homologs for improved complex structure prediction\.

Deep learning has made substantial progress in protein structure prediction by effectively leveraging evolutionary information \[[12](https://arxiv.org/html/2605.11189#bibb.bib4),[26](https://arxiv.org/html/2605.11189#bibb.bib54)\]\. These methods utilize the co\-evolutionary signals hidden in Multiple Sequence Alignments \(MSAs\) to infer inter\-residue interactions and three\-dimensional structures\. AlphaFold2 \[[12](https://arxiv.org/html/2605.11189#bibb.bib4)\] is the representative method, demonstrating unparalleled accuracy in monomer structure prediction\. Building on this, AlphaFold\-Multimer \[[8](https://arxiv.org/html/2605.11189#bibb.bib68)\] extended the capability to protein complexes, significantly outperforming previous systems \[[5](https://arxiv.org/html/2605.11189#bibb.bib60),[10](https://arxiv.org/html/2605.11189#bibb.bib112),[27](https://arxiv.org/html/2605.11189#bibb.bib67)\]\. However, compared to the breakthrough in monomer folding, the accuracy of AlphaFold\-Multimer on heterodimer prediction remains limited \(success rate $\sim 70\%$, mean DockQ $\sim 0.6$\), leaving substantial room for improvement \[[27](https://arxiv.org/html/2605.11189#bibb.bib67)\]\.

The most important input feature to AlphaFold\-Multimer is the multiple sequence alignment \(MSA\) \[[10](https://arxiv.org/html/2605.11189#bibb.bib112),[27](https://arxiv.org/html/2605.11189#bibb.bib67)\]\. Compared with AlphaFold2 \[[12](https://arxiv.org/html/2605.11189#bibb.bib4)\], which takes the MSA of a single protein as input, AlphaFold\-Multimer needs to build an MSA of interologs for protein complex structure prediction\. However, how to construct such an MSA is still an open problem for heteromers\. It requires the identification of interacting homologs in the MSAs of the constituent single chains, which can be challenging because one species may have multiple sequences similar to the target sequence \(paralogs\)\. Several algorithms have been proposed to identify putative interologs from genome data, such as profiling co\-evolved genes \[[16](https://arxiv.org/html/2605.11189#bibb.bib69)\] and comparing phylogenetic trees \[[15](https://arxiv.org/html/2605.11189#bibb.bib70)\]\. Genome co\-localization and species information are two commonly used heuristics to form interologs for co\-evolution\-based complex contact and structure prediction \[[8](https://arxiv.org/html/2605.11189#bibb.bib68),[28](https://arxiv.org/html/2605.11189#bibb.bib57)\]\. Genome co\-localization is based on the observation that, in bacteria, many interacting genes are coded in operons \[[21](https://arxiv.org/html/2605.11189#bibb.bib71),[7](https://arxiv.org/html/2605.11189#bibb.bib72)\] and are co\-transcribed to perform their functions\. However, this rule does not perform well for complexes in eukaryotes with a large number of paralogs, since it becomes more difficult to disambiguate correct interologs \[[28](https://arxiv.org/html/2605.11189#bibb.bib57),[3](https://arxiv.org/html/2605.11189#bibb.bib19)\]\.
The other heuristic, a phylogeny\-based method for identifying interologs, was first proposed in ComplexContact \[[28](https://arxiv.org/html/2605.11189#bibb.bib57)\], and similar ideas were later adopted by AlphaFold\-Multimer\. This method first identifies groups of paralogs \(sequences of the same species\) in the MSA of each chain, then ranks the paralogs by their sequence similarity to the corresponding primary chain, and finally pairs sequences of the same species and the same rank\. However, these are all hand\-crafted approaches that are effective only in specific domains\. In this chapter, we instead investigate general and automatic algorithms for effectively constructing MSAs of interologs for heterodimers\.

Protein language models \(PLMs\) \[[20](https://arxiv.org/html/2605.11189#bibb.bib8),[18](https://arxiv.org/html/2605.11189#bibb.bib41),[9](https://arxiv.org/html/2605.11189#bibb.bib75)\] have emerged as a powerful paradigm for protein representation learning, benefiting tasks such as contact prediction and mutation effect prediction \[[19](https://arxiv.org/html/2605.11189#bibb.bib76)\]\. By capturing biological constraints and co\-evolutionary information from vast sequence databases, PLMs offer a distinct advantage over traditional methods\. This raises a natural question: *Can we leverage the co\-evolutionary information captured by PLMs to build effective interologs?*

In this chapter, we focus on heterodimer structure prediction\. We propose ESMPair, a simple yet effective MSA pairing algorithm that leverages column\-wise attention scores from ESM\-MSA\-1b \[[18](https://arxiv.org/html/2605.11189#bibb.bib41)\] to identify and pair homologs\. Extensive experiments on three test sets \(pConf70, pConf80, and DockQ49\) demonstrate that ESMPair achieves state\-of\-the\-art accuracy, outperforming AlphaFold\-Multimer on heterodimer prediction \($+10.7\%$, $+7.3\%$, and $+3.7\%$ in Top\-5 Best DockQ score, respectively\)\. We also find that ensemble strategies combining ESMPair with other methods further improve the prediction accuracy\. Notably, ESMPair excels on eukaryotic targets and cross\-kingdom pairs \(eukaryote\-bacteria\), answering the challenge of identifying interologs in these difficult cases\. Furthermore, we show that the diversity of interologs correlates positively with prediction accuracy\. Overall, ESMPair effectively incorporates the strength of PLMs to address the challenge of MSA pairing for heterodimer prediction\.

### 3\.2Data and Methods

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/msa_pair/esmpair_overview.png)Figure 3\.1: Schematic illustration of ESMPair\. Given a pair of query sequences: \(1\) JackHMMER searches UniProt \[[24](https://arxiv.org/html/2605.11189#bibb.bib103)\] to generate an MSA for each query, \(2\) homologs are grouped by species, \(3\) ESM\-MSA\-1b estimates column attention scores between each homolog and the query, \(4\) homologs from the same species with the same rank are paired and concatenated into interologs, and \(5\) AlphaFold\-Multimer takes the interolog MSA as input to predict the complex structure\.

We introduce the framework of our proposed PLM\-enhanced MSA pairing method, i\.e\., ESMPair\. The overall framework of ESMPair is illustrated in [Figure 3\.1](https://arxiv.org/html/2605.11189#Ch3.F1)\.

#### 3\.2\.1The PLM\-enhanced MSA pairing pipeline

Previous works \[[20](https://arxiv.org/html/2605.11189#bibb.bib8),[18](https://arxiv.org/html/2605.11189#bibb.bib41),[9](https://arxiv.org/html/2605.11189#bibb.bib75)\] have confirmed that PLMs can capture the co\-evolutionary and inter\-residue contact signals encoded in protein sequences\. Moreover, MSA\-based PLMs \[[18](https://arxiv.org/html/2605.11189#bibb.bib41)\] offer explicit axial attention mechanisms to extract evolutionary information from MSAs \[[25](https://arxiv.org/html/2605.11189#bibb.bib79),[22](https://arxiv.org/html/2605.11189#bibb.bib44)\]\. In light of this, we adopt ESM\-MSA\-1b \[[18](https://arxiv.org/html/2605.11189#bibb.bib41)\] to build MSAs of interologs that improve protein complex prediction \(PCP\) with AlphaFold\-Multimer \[[8](https://arxiv.org/html/2605.11189#bibb.bib68)\]\.

###### Column attention \(ESMPair\)\.

The column attention weight matrix, calculated by ESM\-MSA\-1b, measures pairwise similarities between aligned residues in each column\. Formally, for each chain, we have the MSA $M \in \mathcal{A}^{N \times C}$, where $\mathcal{A}$ denotes the set of amino acid types, and $N$ and $C$ represent the number of sequences and residues, respectively\. The collection of column attention matrices is denoted as

$$\{A_{lhc} \in \mathbb{R}^{N \times N} : l \in [L],\ h \in [H],\ c \in [C]\}, \tag{3.1}$$

where $L$ is the number of layers in the PLM and $H$ is the number of attention heads in each layer\. We first symmetrize each column attention matrix, and then aggregate the symmetrized matrices along the dimensions of $L$, $H$ and $C$ to obtain the pairwise similarity matrix among the sequences of the MSA, denoted as $S \in \mathbb{R}^{N \times N}$:

$$S = \underset{l \in [L],\, h \in [H],\, c \in [C]}{\text{AGG}} \left(A_{lhc} + A_{lhc}^{\top}\right), \tag{3.2}$$

where $\text{AGG}$ is an entry\-wise aggregation operator, such as the entry\-wise mean $\text{MEAN}(\cdot)$ or the entry\-wise sum $\text{SUM}(\cdot)$\. Unless otherwise specified, $\text{AGG}$ is set to $\text{SUM}(\cdot)$ in this chapter\.
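As a minimal sketch of Eq. (3.2), assume the column attention maps have already been extracted from ESM-MSA-1b and flattened into a sequence of N × N matrices, one per (layer, head, column) triple. The function name `aggregate_column_attention` and the plain nested-list representation are illustrative assumptions, not the released implementation.

```python
from typing import Iterable, List, Sequence

Matrix = List[List[float]]


def aggregate_column_attention(
    mats: Iterable[Sequence[Sequence[float]]], agg: str = "sum"
) -> Matrix:
    """Symmetrize each N x N attention matrix (A + A^T) and aggregate entry-wise."""
    total: Matrix = []
    count = 0
    for a in mats:
        n = len(a)
        if not total:
            total = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # (A + A^T)[i][j] = A[i][j] + A[j][i]
                total[i][j] += a[i][j] + a[j][i]
        count += 1
    if count == 0:
        raise ValueError("need at least one attention matrix")
    if agg == "mean":
        return [[v / count for v in row] for row in total]
    if agg != "sum":
        raise ValueError(f"unsupported aggregation: {agg}")
    return total


# Toy input: two 2 x 2 attention matrices (made-up values).
a1 = [[1.0, 2.0], [3.0, 4.0]]
a2 = [[0.0, 1.0], [0.0, 0.0]]
S = aggregate_column_attention([a1, a2])
```

With the default `agg="sum"`, the result is symmetric by construction, so row 0 and column 0 carry the same query-to-hit scores.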

$S$ is symmetric and its first row, $S_1$, measures the similarity between the query sequence and its hit sequences in the MSA\. The MSA pairing strategy is as follows: for each constituent chain of the query heterodimer, we group hits from the MSA by their species, and, within each group, rank sequences according to their similarity score in $S_1$\. Finally, the sequences of each MSA with the same rank from the same species group are concatenated to form a paired MSA\.
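The grouping and rank-matching step can be sketched as below. This is an illustration under stated assumptions rather than the published code: index 0 of each MSA is taken to be the query, a higher score in the first row of the similarity matrix is taken to mean a better rank, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rank_hits_by_species(
    species: Sequence[str], query_sim: Sequence[float]
) -> Dict[str, List[int]]:
    """Group hit indices by species, sorted by decreasing similarity to the query."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx in range(1, len(species)):  # index 0 is the query itself
        groups[species[idx]].append(idx)
    for members in groups.values():
        members.sort(key=lambda i: query_sim[i], reverse=True)
    return dict(groups)


def pair_by_rank(
    species1: Sequence[str], sim1: Sequence[float],
    species2: Sequence[str], sim2: Sequence[float],
) -> List[Tuple[int, int]]:
    """Pair hits of the same species and the same within-species rank."""
    g1 = rank_hits_by_species(species1, sim1)
    g2 = rank_hits_by_species(species2, sim2)
    pairs = [(0, 0)]  # assumption: the two query sequences are always paired
    for sp, members1 in g1.items():
        if sp in g2:
            # zip stops at the shorter group, so unmatched ranks are dropped
            pairs.extend(zip(members1, g2[sp]))
    return pairs


# Toy example with made-up species labels and scores.
pairs = pair_by_rank(
    ["query", "A", "A", "B"], [9.0, 0.2, 0.8, 0.5],
    ["query", "A", "B", "A", "C"], [9.0, 0.1, 0.4, 0.7, 0.3],
)
```

Each returned pair `(i, j)` indexes one row of the first chain's MSA and one row of the second chain's MSA; concatenating those rows gives one row of the paired MSA.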

#### 3\.2\.2Settings

###### Evaluation metric\.

We evaluate the accuracy of predicted complex structures using DockQ \[[2](https://arxiv.org/html/2605.11189#bibb.bib80)\], a widely used metric in the computational structural biology community\. Specifically, for each protein complex target, we calculate the highest DockQ score among its top\-$N$ predicted models, selected by their predicted confidences from AlphaFold\-Multimer\. We refer to this metric as the best DockQ among top\-$N$ predictions\. In some experiments we also report other metrics, such as iRMS, TMscore, ICS, IPS, oligomer\-LDDT and QS\-global\.
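A small sketch of the best-DockQ-among-top-N metric follows. It assumes each predicted model is summarized as a `(confidence, DockQ)` tuple; the function names are illustrative, and the 0.23 success threshold is the one quoted in the caption of Table 3.1.

```python
from typing import Sequence, Tuple


def best_dockq_top_n(models: Sequence[Tuple[float, float]], n: int) -> float:
    """Highest DockQ among the n models with the highest predicted confidence."""
    if n < 1 or not models:
        raise ValueError("need n >= 1 and at least one model")
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    return max(dockq for _, dockq in ranked[:n])


def success_rate(best_scores: Sequence[float], threshold: float = 0.23) -> float:
    """Percentage of targets whose best DockQ reaches the threshold."""
    return 100.0 * sum(s >= threshold for s in best_scores) / len(best_scores)


# Toy target with three models as (confidence, DockQ); values are made up.
models = [(0.3, 0.7), (0.9, 0.1), (0.8, 0.5)]
top1 = best_dockq_top_n(models, 1)
top5 = best_dockq_top_n(models, 5)
```

Top-1 depends on how well the confidence ranks the models, whereas Top-5 with five models reduces to the best DockQ overall.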

###### Datasets\.

To investigate how better MSA pairing can improve the performance of AlphaFold\-Multimer, we construct a test set satisfying the following criteria:

1. \(i\) There are at least 100 sequences that can be paired given the species constraints\.
2. \(ii\) The two constituent chains of a heterodimeric target share $<90\%$ sequence identity\.

From the Protein Data Bank \(PDB\), as accessed on 3 March 2022, we select heterodimers whose chains have 20–1024 residues \(due to the constraint of ESM\-MSA\-1b and to exclude peptide\-protein complexes\) and whose total number of residues is less than 1600 \(due to the GPU memory constraint\)\. We use the default AlphaFold\-Multimer MSA search setting to search the UniProt database \[[6](https://arxiv.org/html/2605.11189#bibb.bib73)\] with JackHMMER \[[17](https://arxiv.org/html/2605.11189#bibb.bib74)\]; the resulting MSAs are used for MSA pairing\. We also search the Uniclust30 database \[[14](https://arxiv.org/html/2605.11189#bibb.bib81)\] with HHblits \[[23](https://arxiv.org/html/2605.11189#bibb.bib46)\]; the resulting MSAs are used for monomers, i\.e\., block diagonal pairing\. We further select those heterodimers with at least 100 sequences that can be paired by AlphaFold\-Multimer’s default pairing strategy\.

We define two dimers as at most $x\%$ similar if the maximum sequence identity between their constituent monomers is no more than $x\%$\. Overall, we select 801 heterodimeric targets from PDB that are at most 40% similar to any other target in the dataset and satisfy the aforementioned two criteria\. We then use AlphaFold\-Multimer \(with the default MSA matching algorithm\) to predict their complex structures and define test sets based on the predicted confidence scores \(pConf\) or DockQ scores\. The 92 targets with pConf less than 0\.7 form the pConf70 test set\. We select 0\.7 as the low\-confidence cutoff based on logistic regression models fitted over 7000 DockQ and pConf pairs: the conditional probability of a model having medium or better quality is slightly greater than 0\.5 \(around 0\.6\) when pConf equals 0\.7, whereas it is less than 0\.5 when pConf equals 0\.6\. For further comparison, we also use 0\.8 as the cutoff, which yields the pConf80 test set of 168 targets, and the 155 targets with predicted DockQ scores less than 0\.49 form the DockQ49 test set\.
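The three cutoffs can be written as simple filters. The sketch below assumes each target carries the pConf and DockQ score of its default AlphaFold-Multimer prediction; the `Target` record and the toy identifiers are hypothetical.

```python
from typing import Dict, List, NamedTuple, Sequence


class Target(NamedTuple):
    name: str     # hypothetical identifier
    pconf: float  # predicted confidence of the default AF-Multimer run
    dockq: float  # DockQ of the default AF-Multimer prediction


def split_test_sets(targets: Sequence[Target]) -> Dict[str, List[str]]:
    """Assign targets to the (overlapping) pConf70, pConf80 and DockQ49 test sets."""
    return {
        "pConf70": [t.name for t in targets if t.pconf < 0.7],
        "pConf80": [t.name for t in targets if t.pconf < 0.8],
        "DockQ49": [t.name for t in targets if t.dockq < 0.49],
    }


# Toy targets with made-up scores.
toy = [
    Target("t1", 0.65, 0.30),
    Target("t2", 0.75, 0.60),
    Target("t3", 0.92, 0.40),
]
sets = split_test_sets(toy)
```

Because the pConf80 cutoff is looser than the pConf70 cutoff, every pConf70 target is also a pConf80 target.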

#### 3\.2\.3Baselines

Several heuristic MSA pairing strategies have been developed for protein complex contact and 3D structure prediction \[[1](https://arxiv.org/html/2605.11189#bibb.bib66),[10](https://arxiv.org/html/2605.11189#bibb.bib112)\]\.

###### Phylogeny\-based method\.

This strategy was first proposed in ComplexContact \[[28](https://arxiv.org/html/2605.11189#bibb.bib57)\] for complex contact prediction and is widely adopted by the community\. It first groups sequences in an MSA by their species and then ranks sequences of the same species by their similarity to the query sequence\. When there is more than one sequence in a species group, it joins two sequences of the same rank within the same species group to form an interolog\. AlphaFold\-Multimer uses a similar strategy and shows state\-of\-the\-art accuracy in complex structure prediction \[[8](https://arxiv.org/html/2605.11189#bibb.bib68)\]\. In practice, we run AlphaFold\-Multimer with the default settings of the official repository \([https://github\.com/deepmind/alphafold](https://github.com/deepmind/alphafold)\)\. For time efficiency, we evaluate only the unrelaxed models and do not use template information \[[12](https://arxiv.org/html/2605.11189#bibb.bib4)\]\.

###### Genetic distances\.

In bacteria, interacting genes are sometimes co\-located in operons and co\-transcribed to form protein complexes \[[21](https://arxiv.org/html/2605.11189#bibb.bib71)\]\. Consequently, we can detect interologs by the genetic distance between two genes\. This strategy pairs sequences of the same species based on the distance between their positions in the contigs, which are retrieved from ENA\. In our implementation, given a sequence from the first chain, we pair it with the sequence from the second chain that is closest to it in terms of genetic distance\. If there is more than one closest sequence, we select the one that has the lowest e\-value to the query sequence of the second chain; the e\-value is calculated by the MSA search algorithm used to construct the chain MSA\.
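A minimal sketch of this nearest-gene pairing with the e-value tie-break is given below. It makes several assumptions that the text does not specify: genetic distance is the absolute difference of positions on the same contig, candidates must share species and contig, and a second-chain hit may be reused by several first-chain hits. The `Hit` record is hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Hit(NamedTuple):
    species: str
    contig: str
    position: int  # position of the gene on the contig
    evalue: float  # e-value against the query of the hit's own chain


def pair_by_genetic_distance(
    hits1: Sequence[Hit], hits2: Sequence[Hit]
) -> List[Tuple[int, int]]:
    """Pair each chain-1 hit with the closest chain-2 hit of the same species."""
    pairs = []
    for i, h1 in enumerate(hits1):
        candidates = [
            (abs(h1.position - h2.position), h2.evalue, j)
            for j, h2 in enumerate(hits2)
            if h2.species == h1.species and h2.contig == h1.contig
        ]
        if candidates:
            # tuple ordering: smallest distance first, then lowest e-value
            _, _, best = min(candidates)
            pairs.append((i, best))
    return pairs


# Toy hits with made-up coordinates and e-values.
chain1 = [Hit("A", "c1", 100, 1e-5), Hit("B", "c9", 50, 1e-3)]
chain2 = [
    Hit("A", "c1", 400, 1e-9),
    Hit("A", "c1", 130, 1e-2),
    Hit("A", "c1", 70, 1e-6),
    Hit("C", "c2", 100, 1e-8),
]
pairs = pair_by_genetic_distance(chain1, chain2)
```

In the toy data, two second-chain hits are equally close to the first hit of chain 1, so the one with the lower e-value is chosen; the species B hit has no partner and stays unpaired.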

###### Block diagonalization\.

This strategy pads each chain sequence with gaps to the full length of the complex \[[10](https://arxiv.org/html/2605.11189#bibb.bib112)\]\. Therefore, each sequence in the constructed joint MSA, except for the query sequence, includes non\-gap tokens in exactly one chain and gap tokens in the other chains\. By sorting sequences in the joint MSA, we can make non\-gap tokens appear only in the diagonal blocks, hence the name block diagonalization\. In our implementation, given a sequence from the first \(second\) chain, we append \(prepend\) gap tokens to it until the number of added gap tokens equals the length of the second \(first\) chain\.
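Block diagonalization is simple enough to sketch directly. The version below assumes both MSAs are lists of aligned strings with the query at index 0, that `-` is the gap token, and that the first row of the joint MSA is the concatenation of the two queries; the function name is illustrative.

```python
from typing import List, Sequence


def block_diagonal_msa(
    msa1: Sequence[str], msa2: Sequence[str], gap: str = "-"
) -> List[str]:
    """Build a joint MSA in which each hit covers exactly one chain."""
    len1, len2 = len(msa1[0]), len(msa2[0])
    joint = [msa1[0] + msa2[0]]  # query row spans both chains
    # chain-1 hits: original sequence followed by gaps over chain 2
    joint += [seq + gap * len2 for seq in msa1[1:]]
    # chain-2 hits: gaps over chain 1 followed by the original sequence
    joint += [gap * len1 + seq for seq in msa2[1:]]
    return joint


# Toy MSAs with made-up sequences.
joint = block_diagonal_msa(["ACD", "A-D"], ["KL", "KM"])
```

Every row has length `len1 + len2`, and no row other than the query carries residues from both chains, so the joint MSA contains no inter-chain co-evolutionary signal.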

#### 3\.2\.4Time/Memory requirement analysis

Because ESMPair uses column\-wise attention scores from ESM\-MSA\-1b to construct the MSA of interologs, most of its running time and memory consumption comes from ESM\-MSA\-1b\. In practice, a single V100 GPU with 32GB of memory can process a batch of 512 sequences, each with a maximum length of 1024, within a few seconds\. In contrast, the baseline methods, namely block diagonalization, Genome, and the default strategy of AlphaFold\-Multimer \(AF\-M\), do not rely on any additional machine learning model and therefore impose no additional memory requirement\.

In general, most of the running time and memory of end\-to\-end complex structure prediction is spent on the MSA search and the final AF\-M prediction\. Take the target with PDB ID ‘4rca’ as an example; it contains two chains of about 300 residues each\. Searching the UniProt database with JackHMMER takes about half an hour per chain and yields about 100K MSA sequences for each chain\. ESMPair then constructs the MSA of interologs within a few minutes, yielding 20K interologs\. Finally, AF\-M takes more than 20 minutes to predict the structure of the target with its 5 models from the MSA of interologs, with each model predicting once, without templates or AMBER relaxation\.

In summary, the running time of the MSA pairing step is negligible relative to the MSA search and AF\-M prediction steps of the end\-to\-end complex prediction pipeline\.

### 3\.3Results

Table 3\.1: DockQ scores and success rate of PLM\-enhanced pairing methods and baselines\. We report the average of Top\-5 Best DockQ score, Top\-1 Best DockQ score and Success Rate \(DockQ $\geq 0.23$\) \(%\) on the pConf70, DockQ49 and pConf80 test sets\. For each test target, we predicted five different structures using the five AlphaFold\-Multimer models\.

Table 3\.2: Comparisons between ESMPair and AF\-Multimer on targets across the full range of pConf scores\. We report the average of DockQ score, TMscore, ICS and IPS as the evaluation metrics \(larger values mean better performance\)\.

Table 3\.3: The Top\-1 Best DockQ performance of two groups with different sequence lengths \($\geq 100$ and $<100$\)\. The GAP value is the difference between the DockQ scores of the two length groups\.

Table 3\.4: The Top\-1 Best DockQ performance with or without full AF\-M features\.

In this section, we first briefly outline the framework of ESMPair for PCP\. Then, we discuss how our proposed method achieves better complex prediction accuracy than previous MSA pairing methods\. We find that the ensemble strategy performs better than the default single strategy\. We further quantitatively analyze several key factors and hyperparameters that may impact the performance of our method, and also explore the capability of different measurements to distinguish acceptable predictions from unacceptable ones\. Finally, we compare the performance of ESMPair and AlphaFold\-Multimer on CASP15 heteromers\.

#### 3\.3\.1ESMPair overview

The overall framework of ESMPair is illustrated in [Figure 3\.1](https://arxiv.org/html/2605.11189#Ch3.F1), with the details in Section 3\.2\. In complex structure prediction, predictors such as AlphaFold\-Multimer use inter\-chain co\-evolutionary signals by pairing sequences between the MSAs of the constituent single chains of the query complex\. Formally, given a query heterodimer, we obtain individual MSAs of its two constituent chains, denoted as $M_1 \in \mathcal{A}^{N_1 \times C_1}$ and $M_2 \in \mathcal{A}^{N_2 \times C_2}$, where $\mathcal{A}$ is the alphabet used by the PLM, $N_1$ and $N_2$ are the numbers of sequences in the MSAs $M_1$ and $M_2$, and $C_1$ and $C_2$ are the sequence lengths\. The MSA pairing pipeline aims at designing a matching or an injection $\pi: [N_1] \to [N_2]$ between the MSAs of the two chains to build the MSA of interologs, denoted as $M_\pi \in \mathcal{A}^{N \times (C_1 + C_2)}$, where $N$ is the number of sequences in the joint MSA\. In practice, the MSA of interologs $M_\pi$ is the collection of concatenated sequences $\{\text{concat}(M_1[i], M_2[\pi(i)]) : i \in P\}$, where $P$ is the set of indices of the sequences from $M_1$ that can be paired with a sequence from $M_2$ according to the matching pattern $\pi$\. The MSA of interologs is then taken by predictors as input to predict the structure of the query heterodimer\.
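The construction of the MSA of interologs from a given matching can be sketched as follows. This is an illustration only: the matching is represented as a dictionary whose keys play the role of the index set P, rows are plain strings, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def build_interolog_msa(
    msa1: Sequence[str], msa2: Sequence[str], pi: Dict[int, int]
) -> List[str]:
    """Concatenate row i of msa1 with row pi[i] of msa2 for every paired index i."""
    targets = list(pi.values())
    if len(set(targets)) != len(targets):
        # an injection never maps two chain-1 rows to the same chain-2 row
        raise ValueError("pi must be injective")
    return [msa1[i] + msa2[pi[i]] for i in sorted(pi)]


# Toy MSAs with made-up sequences; row 1 of msa1 is left unpaired.
msa1 = ["ACD", "A-D", "GCD"]
msa2 = ["KL", "KM", "RL"]
paired = build_interolog_msa(msa1, msa2, {0: 0, 2: 1})
```

The number of rows in the result equals the number of paired indices, and each row has length C1 + C2.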

#### 3\.3\.2ESMPair outperforms other MSA pairing methods on heterodimer predictions

###### Overall evaluation\.

For each test target we predict five 3D structures using AlphaFold\-Multimer’s five models and then report the average Top\-$k$ \($k = 1, 5$\) Best DockQ score of the predicted structures and the corresponding success rate \(SR\) in Table [3\.1](https://arxiv.org/html/2605.11189#Ch3.T1)\. Our method outperforms the other methods\. Specifically, it outperforms AF\-Multimer’s default MSA pairing strategy on all three test sets \(0\.259 versus 0\.234 on pConf70, 0\.423 versus 0\.406 on pConf80 and 0\.265 versus 0\.242 on DockQ49, in terms of Top\-5 DockQ score\)\. These results confirm that our proposed column\-wise\-attention\-based MSA pairing method, ESMPair, is better than the sequence similarity\-based method used in AF\-Multimer\.

Among all the MSA pairing methods, block diagonalization performs the worst \($-30\%$ compared with ESMPair in terms of the average of Top\-5 best DockQ\)\. The result indicates that the inter\-chain co\-evolutionary information helps with complex structure prediction\. Among MSA pairing baselines, AF\-Multimer surpasses genetic co\-localization by a large margin \($+12.8\%$ Top\-5 DockQ\)\. All the proposed PLM\-enhanced pairing methods substantially outperform the block diagonalization and the genetic\-based methods\. Even though AF\-Multimer may have overly optimistic performance using the default pairing method since the training MSAs are built using it, ESMPair further exceeds it by a large margin \($+4.2 \sim 10.7\%$ Top\-5 DockQ score over three test sets\)\.
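The relative improvements can be recomputed from the Top-5 DockQ averages quoted above for ESMPair and AF-Multimer. The short calculation below only restates those reported numbers; the helper name and the dictionary layout are illustrative.

```python
# (ESMPair, AF-Multimer) average Top-5 DockQ as quoted in the text
reported = {
    "pConf70": (0.259, 0.234),
    "pConf80": (0.423, 0.406),
    "DockQ49": (0.265, 0.242),
}


def relative_improvement(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


gains = {
    name: round(relative_improvement(esmpair, afm), 1)
    for name, (esmpair, afm) in reported.items()
}
```

The smallest and largest of these gains match the 4.2% to 10.7% range stated above.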

###### ESMPair performs better on low pConf targets\.

As shown in Table [3\.1](https://arxiv.org/html/2605.11189#Ch3.T1), the performance gap between ESMPair and AF\-Multimer is narrower on pConf80 than on pConf70, with relative improvements ranging from 3\.7% to 10\.7%\. For an in\-depth analysis, we quantitatively analyze the correlation between the predicted confidence score \(pConf\) estimated by AF\-Multimer and the gap in average Top\-5 Best DockQ score between ESMPair and AF\-Multimer on DockQ49, as illustrated in [Figure 3\.2](https://arxiv.org/html/2605.11189#Ch3.F2)\(a–b\)\.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/msa_pair/esmpair_pconf.png)Figure 3\.2: Prediction performance across pConf score regions and taxonomic domains\. \(a–b\) Negative correlation between the relative improvement of ESMPair over AF\-Multimer and pConf score\. \(c–f\) DockQ score comparison among ESMPair, AF\-Multimer, and Genome on Eucaryote, Bacteria, and Eucaryote&Bacteria domains\. Eucaryote&Bacteria denotes heterodimers whose two chains belong to different domains\. Heterodimers in our dataset originate from Eucaryotes, Bacteria, Viruses, and Archaea; we group Bacteria, Viruses, and Archaea into the Bacteria domain\. Across all test sets, ESMPair significantly outperforms both baselines on Eucaryote targets\.

The relative improvement is negatively correlated with the predicted confidence score \(Pearson correlation coefficient of $-0.49$\)\. When pConf is less than 0\.2, the relative improvement even reaches 100%, while when pConf is more than 0\.8, ESMPair performs nearly on par with AF\-Multimer\. This is because AF\-Multimer already performs well on relatively easy targets, which leaves little room for further improvement\.

To further compare ESMPair and AF\-Multimer on natural targets across the full range of pConf scores, we follow the same data processing pipeline to generate a dataset without applying any filtering based on pConf scores\. We then randomly select 300 targets and use ESMPair and the default pairing strategy of AF\-Multimer to predict their structures\. Of these 300 targets, 256 have pConf scores greater than or equal to 0\.7, while 44 have pConf scores lower than 0\.7\. For convenience, we use model 1 to generate predictions for each target, and make only one prediction per target\. We report the average of DockQ score, TMscore, Interface Contact Score \(ICS\) and Interface Patch Score \(IPS\) as the evaluation metrics, shown in Table [3\.2](https://arxiv.org/html/2605.11189#Ch3.T2)\. The results suggest that ESMPair outperforms AF\-M on targets with low pConf scores \(pConf $< 0.7$\), whereas it performs comparably with AF\-M on those with high pConf scores \(pConf $\geq 0.7$\)\.

###### ESMPair has a higher prediction accuracy on eukaryote targets\.

We further compare the DockQ distributions of ESMPair, AF\-Multimer and Genome across three taxonomic groups: Eukaryote, Bacteria, and Eukaryote & Bacteria, a special group in which the two constituent chains of the heterodimer belong to different domains\. Specifically, all heterodimers from pConf70, DockQ49 and pConf80 are assigned to Eukaryotes, Bacteria, Viruses, Archaea, or Eukaryotes & Bacteria\. Note that we group the data from Bacteria, Viruses and Archaea into the Bacteria domain\. [Figure 3\.2](https://arxiv.org/html/2605.11189#Ch3.F2)\(c–f\) shows that ESMPair performs better than the other two MSA pairing methods on the Eukaryotes data by a large margin \(0\.420 for ESMPair, 0\.402 for AF\-Multimer and 0\.369 for Genome on the overall data\)\. Because it is notoriously difficult to identify homologous protein sequences for the Eukaryotes data, the ability of ESMPair to build effective interologs for eukaryotes is a desirable property\. On the Bacteria data, in contrast, the three strategies have similar performance \(around 0\.35 on the whole data\)\. Most strikingly, ESMPair substantially outperforms the other two methods on the Eukaryote & Bacteria data \(0\.394 for ESMPair, 0\.314 for AF\-Multimer and 0\.277 for Genome on the overall data\)\. We further check the performance gap for each target from the Eukaryote & Bacteria data\. ESMPair performs better on three of the six targets: 0\.443 \(ESMPair\) versus 0\.013 \(AF\-Multimer\) on 5D6J, 0\.289 versus 0\.201 on 6B03, and 0\.864 versus 0\.854 on 7AYE\. On the other three targets, ESMPair performs on par with AF\-Multimer\. These results suggest that PLM\-based pairing is robust across taxonomic domains\.

###### ESMPair outperforms AF\-Multimer on most of the newly released targets\.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/msa_pair/esmpair_newcases.png)Figure 3\.3: Comparison of ESMPair and AF\-Multimer on newly released targets \(a–f\) and an unresolved case \(g\)\. \(a–f\) Evaluations on 74 targets released after 30 April 2018\. \(a\) Bar chart showing the relative performance gap between ESMPair and AF\-Multimer across three categories: ESMPair outperforms AF\-Multimer, AF\-Multimer outperforms ESMPair, and equal performance\. \(b\) Interface and ligand RMSD distributions of structures predicted by ESMPair \(purple\) and AF\-Multimer \(yellow\)\. \(c–f\) Four representative cases: AF\-Multimer predicts incorrect ligand orientations for 7VSI and 7AQU, and incorrect binding sites for 7SL9 and 6FYH\. \(g\) The intermediate filament NFM–INA heterodimer predicted by ESMPair forms a four\-helix bundle\. Gray boxes indicate the interacting motifs of coil 1A, coil 1B, and coil 2 of the two proteins\.

We further select from the test dataset the 74 targets that AF\-Multimer was not trained on \[[8](https://arxiv.org/html/2605.11189#bibb.bib68)\], i\.e\., the targets released after 30 April 2018\. We then compare the structures predicted by ESMPair and AF\-Multimer for these targets in [Figure 3\.3](https://arxiv.org/html/2605.11189#Ch3.F3)\. ESMPair outperforms AF\-Multimer on most of the targets \(57%\) with a relatively larger performance gap, while AF\-Multimer outperforms ESMPair on fewer targets \(35%\) with a relatively smaller gap\. We further plot the interface RMSD against the ligand RMSD of the structures predicted by ESMPair and AF\-Multimer in [Figure 3\.3](https://arxiv.org/html/2605.11189#Ch3.F3)\(b\)\. Overall, the distribution of ESMPair predictions lies closer to the origin than that of AF\-Multimer predictions, which indicates that ESMPair is more accurate than AF\-Multimer on newly released targets\.

Furthermore, we examine why ESMPair performs better than AF\-Multimer by analyzing four PDB targets, 7VSI, 7AQU, 6FYH and 7SL9, in [Figure 3\.3](https://arxiv.org/html/2605.11189#Ch3.F3)\(c–f\)\. Among these, 7VSI and 6FYH show a larger iRMSD and lRMSD variance in the AF\-Multimer predictions, because AF\-Multimer predicts the wrong binding sites\. For 7SL9 and 7AQU, which show a smaller iRMSD and lRMSD variance, AF\-Multimer predicts the right binding sites but the wrong ligand orientations\. In contrast, ESMPair correctly predicts the binding sites on the receptor and also places the ligand in approximately the correct relative orientation\.

To better illustrate the use of ESMPair for predicting protein complexes without experimentally resolved 3D structures, we inspected the intermediate filament heterodimer formed between the neurofilament medium polypeptide \(NFM, UniProt ID P08553\) and α\-internexin \(UniProt ID P46660\), which is known to form an anti\-parallel four\-helix bundle\[[13](https://arxiv.org/html/2605.11189#bibb.bib77),[11](https://arxiv.org/html/2605.11189#bibb.bib78)\]\. As shown in [Figure 3\.3](https://arxiv.org/html/2605.11189#Ch3.F3)\(g\), both ESMPair and AF\-Multimer correctly predict the three binding interfaces between NFM and α\-internexin\. However, ESMPair predicts that the two coiled coils pack as a four\-helix bundle, which is consistent with the experimental evidence, while AF\-Multimer predicts that the two coiled coils are separated\. This case demonstrates the potential of ESMPair for modeling unresolved protein complexes\.

###### ESMPair is more robust than AF\-Multimer on different sequence lengths\.

We split the targets from the pConf70, pConf80 and DockQ49 datasets by sequence length into two groups: targets with $\geq 100$ residues and targets with $<100$ residues\. The length of a target is defined as the length of the shorter of its two chains\. Table [3\.3](https://arxiv.org/html/2605.11189#Ch3.T3) compares the average Top\-1 Best DockQ of the two groups\. The results show that ESMPair performs consistently better than AF\-Multimer in both length groups, i\.e\. ESMPair is robust to variation in sequence length\.

###### ESMPair outperforms AF\-Multimer with or without the full features\.

We use the pConf70 set for comparing the performance between ESMPair and AF\-Multimer on the full feature settings, i\.e\. adding the template information and AMBER force\-field\. For each target, we run each of the five models once, as shown in Table[3\.4](https://arxiv.org/html/2605.11189#Ch3.T4)\. We can conclude that \(1\) the full feature setting indeed significantly improves the performance of ESMPair; \(2\) ESMPair rivals AF\-Multimer in all settings\.

#### 3\.3\.3 Ensemble improves the prediction accuracy

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/msa_pair/fig4.png)

Figure 3\.4: Comparison of ESMPair with four alternative MSA pairing approaches \(a–d\) and ensemble strategies \(e\) on pConf70 targets\. \(a–d\) Each point shows the DockQ score of a target for ESMPair \(x\-axis\) versus the compared method \(y\-axis\)\. Points below the diagonal indicate ESMPair outperforms the alternative\. Highlighted regions denote incorrect \(white\), acceptable \(gray\), medium \(yellow\), and high\-quality \(purple\) predictions by DockQ score\. \(e\) Gray bars show single\-strategy performance, where G\. = Genome, A\. = AF\-Multimer, and E\. = ESMPair\. ESMPair achieves the best single\-strategy result \(0\.259 DockQ, 42\.4% success rate\)\. Yellow bars show pairwise ensembles, with ESMPair \+ Genome performing best \(0\.277 DockQ, 44\.6% success rate\)\. The purple bar shows the three\-strategy ensemble achieving the highest overall performance \(0\.285 DockQ, 46\.8% success rate\)\.

From [Figure 3\.4](https://arxiv.org/html/2605.11189#Ch3.F4)\(a–d\), we find that different MSA pairing methods have their own advantages; even block diagonalization performs slightly better than ESMPair on about 30% of targets, which suggests that the methods can complement each other\. To verify this, we combine 10 models predicted by any two of the MSA pairing methods and report the average Top\-5 Best DockQ score, as shown in [Figure 3\.4](https://arxiv.org/html/2605.11189#Ch3.F4)\(e\)\. The ensemble strategies \(yellow and purple bars\) outperform the corresponding single strategies \(gray bars\)\. A pairwise ensemble that includes ESMPair always performs better than one without it; for example, the success rate \(SR\) of ESMPair \+ Genome is 44\.6% versus 40\.4% for AF\-Multimer \+ Genome\. Finally, the ensemble of all three strategies \(purple bar\) reaches the best performance, with a DockQ score of 0\.285 and a success rate of 46\.8%\. This suggests that, instead of relying on a single strategy to build interologs, an ensemble of MSA pairing strategies may identify more effective interologs\.
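The ensemble evaluation above can be sketched in a few lines\. This is a minimal illustration, not the thesis code: it assumes each model is summarized by a hypothetical `(confidence, DockQ)` pair, that models are ranked by predicted confidence before taking the top 5, and that success means DockQ of at least 0\.23 \(the usual "acceptable" cutoff\)\. None of these choices is restated in this section\.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record: (predicted confidence, DockQ against the native complex).
Model = Tuple[float, float]


def pool_models(per_strategy: Dict[str, List[Model]],
                strategies: Sequence[str]) -> List[Model]:
    """Concatenate the models produced by the selected pairing strategies."""
    return [m for name in strategies for m in per_strategy[name]]


def top_k_best_dockq(models: Sequence[Model], k: int = 5) -> float:
    """Rank by confidence (assumed criterion), keep k, return the best DockQ."""
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    return max(dockq for _, dockq in ranked[:k])


def success_rate(dockq_scores: Sequence[float], threshold: float = 0.23) -> float:
    """Fraction of targets whose score reaches the (assumed) success cutoff."""
    return sum(score >= threshold for score in dockq_scores) / len(dockq_scores)


# Toy data for one target; the numbers are made up for illustration.
per_strategy = {
    "esmpair": [(0.90, 0.10), (0.60, 0.55)],
    "genome": [(0.80, 0.30), (0.20, 0.70)],
}
pooled = pool_models(per_strategy, ["esmpair", "genome"])
best_top3 = top_k_best_dockq(pooled, k=3)
```

Pooling before ranking means a strategy only contributes when its models are ranked highly, so the benefit of an ensemble depends on how well the ranking score tracks the true DockQ\.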

#### 3\.3\.4 Factors influencing prediction accuracy

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/msa_pair/fig5.png)

Figure 3\.5: Factors affecting structure prediction performance\. Correlation between average Top\-5 Best DockQ score and \(a\) column attention score \(log\-scale\) predicted by ESM\-MSA\-1b, \(b\) number of effective sequences \(Meff\), \(c\) number of species, and \(d\) depth of paired MSA \(log\-scale\)\. \(e\) Distribution of column attention score versus the number of effective interologs\. The red curve shows the fitted linear regression model \(Pearson $r\approx-0.70$\), indicating that higher column attention scores correspond to fewer effective interologs\.

We investigate the connections between the performance of ESMPair and several key properties of the paired MSA of interologs: the column\-wise attention score \(ColAttn\_score\), the number of effective sequences in the MSA as measured by Meff \(\#Meff\), the number of species \(\#Species\), and the depth of the MSA \(Msa:Depth\)\. Specifically, we predict 1689 heterodimers sampled from PDB without filtering and divide them into bins according to the value of each factor\. For ColAttn\_score, we average the scores of the individual chains in each interolog, rescale the result logarithmically, and then average over all interologs in the paired MSA to obtain the final score of the target\. For \#Meff, \#Species and Msa:Depth, we compute the corresponding statistics directly from the interologs\.
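The definition of Meff is not given in this section\. The sketch below uses one common convention as an assumption: each sequence is weighted by the inverse of the number of sequences \(itself included\) that share at least 80% identity with it, and Meff is the sum of the weights\. The function name and the threshold are illustrative, and each row of a paired MSA is assumed to be one concatenated interolog of equal aligned length\.

```python
from typing import Sequence

import numpy as np


def meff(msa: Sequence[str], identity_threshold: float = 0.8) -> float:
    """Number of effective sequences under an assumed identity-reweighting scheme."""
    arr = np.array([list(seq) for seq in msa])  # (n_seqs, aligned_length)
    # Fraction of identical columns for every pair of rows.
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)
    # Each row counts itself, so every count is at least 1.
    n_similar = (identity >= identity_threshold).sum(axis=1)
    return float((1.0 / n_similar).sum())


# Two identical rows share one unit of weight; the third row is distinct.
toy_meff = meff(["AAAA", "AAAA", "CCCC"])
```

Under this convention a deep MSA made of near\-duplicates has a small Meff, which is why depth and Meff are reported as separate factors\.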

The correlations between DockQ score and each of the above factors are illustrated in [Figure 3\.5](https://arxiv.org/html/2605.11189#Ch3.F5)\. \#Meff, \#Species and Msa:Depth show a similar trend: predicted structure accuracy improves as these factors increase\. This implies that a more diverse MSA carries more co\-evolutionary information, which benefits structure prediction by AF\-Multimer and is consistent with previous findings\[[18](https://arxiv.org/html/2605.11189#bibb.bib41)\]\. In contrast, increasing ColAttn\_score is associated with decreasing structure prediction accuracy\. Given a query sequence, the self\-attention mechanism in the PLM tends to identify sequences with high homology, i\.e\. sequences with a high similarity score\[[18](https://arxiv.org/html/2605.11189#bibb.bib41)\]\. We therefore hypothesize that a large ColAttn\_score indicates an MSA with a low \#Meff, which potentially results in inaccurate structure prediction\. To test this hypothesis, we characterize the dependency between ColAttn\_score and \#Meff in [Figure 3\.5](https://arxiv.org/html/2605.11189#Ch3.F5)\(e\)\. ColAttn\_score is negatively correlated with \#Meff, with a Pearson correlation coefficient of $-0.70$, which indicates that a higher ColAttn\_score reflects an MSA with lower sequence diversity\.

### 3\.4 Conclusion

This chapter explores a series of simple yet effective MSA pairing algorithms based on pre\-trained PLMs for constructing effective interologs\. To the best of our knowledge, this is the first time that PLMs have been used to construct joint MSAs\. Experimental results show that the proposed ESMPair significantly outperforms the state\-of\-the\-art phylogeny\-based protocol adopted by AlphaFold\-Multimer\. Moreover, ESMPair performs significantly better on targets from eukaryotes, which are hard for AF\-Multimer to predict accurately\. We further confirm that, instead of using a conventional single strategy to build interologs, an ensemble MSA pairing strategy can substantially improve structure prediction accuracy\. More generally, ESMPair could benefit biological applications that depend on high\-quality MSAs\. In the future, we will continue to explore ways to leverage PLMs in building and selecting MSAs\. We also look forward to applying the proposed methods to improve current MSA\-based applications\.

## References

- \[1\] M\. Baek et al\. \(2021\) Accurate prediction of protein structures and interactions using a three\-track neural network\. Science 373, pp\. 871–876\. Cited by: [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.p1.1)\.
- \[2\] S\. Basu and B\. Wallner \(2016\) DockQ: a quality measure for protein\-protein docking models\. PLoS One 11, pp\. e0161879\. Cited by: [§3\.2\.2](https://arxiv.org/html/2605.11189#Ch3.S2.SS2.SSS0.Px1.p1.2)\.
- \[3\] A\.\-F\. Bitbol et al\. \(2016\) Inferring interaction partners from protein sequences\. Proc\. Natl\. Acad\. Sci\. USA 113, pp\. 12180–12185\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1)\.
- \[4\] B\. Chen, Z\. Xie, J\. Qiu, Z\. Ye, J\. Xu, and J\. Tang \(2023\) Improved the heterodimer protein complex prediction with protein language models\. Briefings in Bioinformatics 24\(4\), pp\. bbad221\. Cited by: [Chapter 3](https://arxiv.org/html/2605.11189#Ch3.p1.1)\.
- \[5\] S\.R\. Comeau et al\. \(2004\) ClusPro: an automated docking and discrimination method for the prediction of protein complexes\. Bioinformatics 20, pp\. 45–50\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2)\.
- \[6\] The UniProt Consortium \(2021\) UniProt: the universal protein knowledgebase in 2021\. Nucleic Acids Res\. 49, pp\. D480–D489\. Cited by: [§3\.2\.2](https://arxiv.org/html/2605.11189#Ch3.S2.SS2.SSS0.Px2.p2.1)\.
- \[7\] P\. Dam et al\. \(2007\) Prediction of operons in microbial genomes\. Nucleic Acids Res\. 35, pp\. 3642–3652\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1)\.
- \[8\] DeepMind \(2021\) AlphaFold\-multimer\. GitHub\. Note: [https://github\.com/deepmind/alphafold](https://github.com/deepmind/alphafold) Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2), [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1), [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.SSS0.Px1.p1.1), [§3\.3\.2](https://arxiv.org/html/2605.11189#Ch3.S3.SS2.SSS0.Px4.p1.1)\.
- \[9\] A\. Elnaggar et al\. \(2020\) ProtTrans: towards cracking the language of life’s code through self\-supervised learning\. bioRxiv\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p4.1), [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1)\.
- \[10\] R\. Evans, M\. O’Neill, A\. Pritzel, N\. Antropova, A\. Senior, T\. Green, A\. Žídek, R\. Bates, S\. Blackwell, J\. Yim, et al\. \(2022\) Protein complex prediction with AlphaFold\-Multimer\. bioRxiv\. External Links: [Document](https://dx.doi.org/10.1101/2021.10.04.463034) Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2), [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.SSS0.Px3.p1.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.p1.1)\.
- \[11\] H\. Herrmann and U\. Aebi \(2004\) Molecular architecture of intermediate filaments\. Annu\. Rev\. Biochem\. 73, pp\. 749–789\. Cited by: [§3\.3\.2](https://arxiv.org/html/2605.11189#Ch3.S3.SS2.SSS0.Px4.p3.2)\.
- \[12\] J\. Jumper, R\. Evans, A\. Pritzel, T\. Green, M\. Figurnov, O\. Ronneberger, K\. Tunyasuvunakool, R\. Bates, A\. Žídek, A\. Potapenko, et al\. \(2021\) Highly accurate protein structure prediction with AlphaFold\. Nature 596\(7873\), pp\. 583–589\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2), [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.SSS0.Px1.p1.1)\.
- \[13\] L\. Kreplak et al\. \(2006\) Structure of the neurofilament head domain\. J\. Mol\. Biol\. 357, pp\. 1165–1174\. Cited by: [§3\.3\.2](https://arxiv.org/html/2605.11189#Ch3.S3.SS2.SSS0.Px4.p3.2)\.
- \[14\] M\. Mirdita et al\. \(2017\) Uniclust databases of clustered and annotated protein sequences and alignments\. Nucleic Acids Res\. 45, pp\. D170–D176\. Cited by: [§3\.2\.2](https://arxiv.org/html/2605.11189#Ch3.S2.SS2.SSS0.Px2.p2.1)\.
- \[15\] F\. Pazos and A\. Valencia \(2001\) Comparing phylogenetic trees to detect protein interactions\. Nucleic Acids Res\. 29, pp\. 523–528\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1)\.
- \[16\] M\. Pellegrini et al\. \(1999\) Profiling co\-evolved genes to identify interacting proteins\. Proc\. Natl\. Acad\. Sci\. USA 96, pp\. 4285–4288\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1)\.
- \[17\] S\.C\. Potter et al\. \(2018\) HMMER web server: 2018 update\. Nucleic Acids Res\. 46, pp\. W200–W204\. Cited by: [§3\.2\.2](https://arxiv.org/html/2605.11189#Ch3.S2.SS2.SSS0.Px2.p2.1)\.
- \[18\] R\. Rao et al\. \(2021\) MSA Transformer\. bioRxiv\. External Links: [Document](https://dx.doi.org/10.1101/2021.02.12.430858) Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p4.1), [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p5.3), [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1), [§3\.3\.4](https://arxiv.org/html/2605.11189#Ch3.S3.SS4.p2.1)\.
- \[19\] R\.M\. Rao et al\. \(2020\) Evolutionary scale modeling of biological systems\. bioRxiv\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p4.1)\.
- \[20\] A\. Rives, J\. Meier, T\. Sercu, S\. Goyal, Z\. Lin, J\. Liu, D\. Guo, M\. Ott, C\. L\. Zitnick, J\. Ma, et al\. \(2021\) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences\. Proceedings of the National Academy of Sciences 118\(15\), pp\. e2016239118\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p4.1), [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1)\.
- \[21\] H\. Salgado et al\. \(2000\) Operons in Escherichia coli: genomic analyses and predictions\. Proc\. Natl\. Acad\. Sci\. USA 97, pp\. 6652–6657\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.SSS0.Px2.p1.1)\.
- \[22\] S\. Seemayer et al\. \(2014\) CCMpred–fast and precise prediction of protein residue–residue contacts from correlated mutations\. Bioinformatics 30, pp\. 3128–3130\. Cited by: [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1)\.
- \[23\] M\. Steinegger et al\. \(2019\) HH\-suite3 for fast remote homology detection and deep protein annotation\. BMC Bioinformatics 20, pp\. 473\. Cited by: [§3\.2\.2](https://arxiv.org/html/2605.11189#Ch3.S2.SS2.SSS0.Px2.p2.1)\.
- \[24\] UniProt Consortium \(2023\) UniProt: the universal protein knowledgebase in 2023\. Nucleic Acids Research 51\(D1\), pp\. D523–D531\. External Links: [Document](https://dx.doi.org/10.1093/nar/gkac1052) Cited by: [Figure 3\.1](https://arxiv.org/html/2605.11189#Ch3.F1), [Figure 3\.1](https://arxiv.org/html/2605.11189#Ch3.F1.3.2)\.
- \[25\] H\. Wang et al\. \(2020\) Axial\-DeepLab: stand\-alone axial\-attention for panoptic segmentation\. ECCV\. Cited by: [§3\.2\.1](https://arxiv.org/html/2605.11189#Ch3.S2.SS1.p1.1)\.
- \[26\] J\. Xu et al\. \(2021\) Improved protein structure prediction by deep learning irrespective of co\-evolution information\. Nature Machine Intelligence 3, pp\. 601–609\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2)\.
- \[27\] R\. Yin et al\. \(2022\) Benchmarking AlphaFold\-Multimer on the prediction of protein complexes\. bioRxiv\. External Links: [Document](https://dx.doi.org/10.1101/2022.05.10.491386) Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p2.2), [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1)\.
- \[28\] H\. Zeng et al\. \(2018\) ComplexContact: a web server for inter\-protein contact prediction using deep learning\. Nucleic Acids Res\. 46, pp\. W432–W437\. Cited by: [§3\.1](https://arxiv.org/html/2605.11189#Ch3.S1.p3.1), [§3\.2\.3](https://arxiv.org/html/2605.11189#Ch3.S2.SS3.SSS0.Px1.p1.1)\.

## Chapter 4 Redesign Selective Protein Binders Using Contrastive Decoding

### 4\.1 Introduction

Designing protein binders that target specific proteins with high affinity and specificity is a fundamental challenge in protein engineering with broad applications in therapeutics and basic research\[[23](https://arxiv.org/html/2605.11189#bibc.bib14),[22](https://arxiv.org/html/2605.11189#bibc.bib130),[7](https://arxiv.org/html/2605.11189#bibc.bib131)\]\. While recent advances in deep learning have improved design capabilities, generating binders that achieve both tight binding and target specificity without extensive experimental optimization remains difficult\[[38](https://arxiv.org/html/2605.11189#bibc.bib132),[28](https://arxiv.org/html/2605.11189#bibc.bib161)\]\.

Current structure\-based binder design, also known as one\-sided interface design\[[23](https://arxiv.org/html/2605.11189#bibc.bib14)\], typically follows a three\-stage workflow\[[38](https://arxiv.org/html/2605.11189#bibc.bib132),[28](https://arxiv.org/html/2605.11189#bibc.bib161),[33](https://arxiv.org/html/2605.11189#bibc.bib6)\]: \(1\) backbone or all\-atom structure generation conditioned on the target protein, \(2\) sequence design to optimize binding properties and generate diverse candidates, and \(3\) filtering and ranking of designs using machine learning and physics\-based scoring functions\. Recent years have seen substantial progress in each component\. However, existing fixed\-backbone design methods do not adequately address the specific requirements of binder design, including affinity optimization, reduction of off\-target binding, and accurate representation of interface side\-chain conformations that determine binding specificity\.

Here we focus on the fixed\-backbone design component in the context of binder design\. Fixed\-backbone binder design aims to generate sequences for a binder that can fold independently and bind a target protein to form a predetermined complex structure\[[23](https://arxiv.org/html/2605.11189#bibc.bib14)\]\. ProteinMPNN has emerged as a widely adopted model for this task due to its efficiency and effectiveness at designing sequences from backbone structures, particularly for monomeric proteins with idealized scaffolds\. However, ProteinMPNN and similar fixed\-backbone design models have three key limitations for binder design\. First, these models operate solely on backbone atom positions and cannot capture side\-chain conformations that are critical for defining binding interfaces and specificity\. Second, successful binder design requires simultaneously optimizing two objectives: the binder must fold stably in its unbound state and form a stable complex with the target\. Current autoregressive sequence design models generate sequences based on per\-residue conditional probabilities and lack explicit mechanisms to jointly optimize the two objectives\. Third, current fixed\-backbone design models lack capabilities to improve binding specificity\.

To address these limitations, we introduce RedNet \(code and data are available at [https://github\.com/zw2x/rednet\_public](https://github.com/zw2x/rednet_public)\), a framework for binder sequence design with two main innovations\. First, we develop a multiscale graph neural network architecture that incorporates both backbone geometry and side\-chain information from the target, enabling more accurate modeling of binding interfaces\. Second, we introduce a contrastive decoding algorithm that leverages the trained model to improve binding while simultaneously reducing off\-target interactions\. To the best of our knowledge, this is the first principled algorithm that enables deep learning\-based fixed\-backbone binder design to explicitly improve both affinity and specificity\.

We evaluate RedNet on multiple benchmarks relevant to fixed\-backbone binder design\. On native sequence recovery, RedNet achieves 43% on heterodimers, a 30% relative improvement over ESM\-IF \(33%\) and 16% over ProteinMPNN \(37%\)\. On heterodimer self\-consistency evaluated by AlphaFold3, RedNet with contrastive decoding \(RedNet\-CD\) matches native sequence success rates \(68%\) on high\-quality targets, outperforming ProteinMPNN \(59%\) and ESM\-IF \(61%\)\. Rosetta energetic analysis confirms that RedNet\-CD and RedNet\-Ens \(an ensemble of RedNet and RedNet\-CD\) produce designs with native\-like or superior binding energetics, hydrogen bonding, and surface hydrophobicity\. Additionally, we curate a new benchmark from the PDB to assess binding selectivity against structurally similar off\-targets\. On this benchmark, RedNet\-CD achieves 64\.81% energetic selectivity at the base threshold, nearly doubling baseline RedNet \(33\.33%\) and outperforming all other methods, demonstrating that contrastive decoding specifically enhances the ability to discriminate between on\-target and off\-target interactions\.

#### 4\.1\.1 Related Works

###### Fixed backbone design\.

Physics\-based fixed backbone design relies on two components: an energy function to model sequence\-structure compatibility and a search algorithm to explore sequence space\. Energy functions modeling van der Waals interactions, hydrogen bonding, electrostatics, and solvation are parameterized to reproduce features of natural proteins\[[1](https://arxiv.org/html/2605.11189#bibc.bib134)\]\. They are optimized to improve accuracy using small molecule and macromolecular data\[[29](https://arxiv.org/html/2605.11189#bibc.bib133)\]\. Search algorithms fall into two categories: stochastic approaches such as Rosetta use simulated annealing to identify low\-energy sequences\[[1](https://arxiv.org/html/2605.11189#bibc.bib134)\], while deterministic approaches such as OSPREY employ dead\-end elimination to provably identify the minimum energy conformations\[[17](https://arxiv.org/html/2605.11189#bibc.bib120)\]\. Although both approaches have demonstrated experimental successes, they become computationally demanding for large sequence spaces and typically require extensive experimental screening to identify functional designs\.

###### Deep learning for fixed backbone design\.

Deep learning methods have substantially improved both computational efficiency and experimental success rates for fixed backbone design\. These approaches can be categorized as autoregressive \(AR\) or non\-autoregressive \(NAR\)\. AR models, including Structured Transformer\[[21](https://arxiv.org/html/2605.11189#bibc.bib157)\] and ProteinMPNN\[[8](https://arxiv.org/html/2605.11189#bibc.bib158)\], generate sequences iteratively using graph\-based encoders that capture local structural context\. NAR models such as PiFold\[[14](https://arxiv.org/html/2605.11189#bibc.bib160)\] generate entire sequences in a single forward pass, achieving significant speedups while maintaining competitive accuracy\. ProteinMPNN has demonstrated substantially higher experimental success rates than Rosetta across diverse design challenges, in particular for monomeric proteins\[[8](https://arxiv.org/html/2605.11189#bibc.bib158)\]\. These methods are now widely adopted in de novo protein design workflows\.

###### One\-sided interface design\.

One\-sided interface design, where a protein binder is designed against a fixed target, has broad applications from modulating signaling receptors to neutralizing pathogens\. Physics\-based approaches have achieved limited success, with functional designs typically restricted to targets with favorable features such as hydrophobic patches or concave surfaces, and to small, easily stabilized scaffolds\[[6](https://arxiv.org/html/2605.11189#bibc.bib189)\]\.

Deep learning has enabled substantial progress in this area\. Two\-stage approaches such as RFdiffusion\[[33](https://arxiv.org/html/2605.11189#bibc.bib6)\] first generate binder backbones via structure diffusion, then design sequences using ProteinMPNN\. End\-to\-end methods such as BindCraft\[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\] directly optimize sequences through backpropagation of AlphaFold2 confidence metrics\. Both approaches have produced experimentally validated binders\. However, success rates vary considerably across targets \(0% to $>90\%$\)\[[38](https://arxiv.org/html/2605.11189#bibc.bib132)\], and many designs require further optimization to achieve high binding affinities\.

###### Multistate design for binding specificity\.

Many applications require designing proteins with defined binding specificities \(selectively binding one target while avoiding others\)\. Physics\-based multistate design algorithms, such as Rosetta MSD\[[20](https://arxiv.org/html/2605.11189#bibc.bib135)\], address this by simultaneously optimizing sequences across multiple structural states to maximize energy gaps between desired and undesired interactions\. Recent work has incorporated deep learning fixed backbone design into multistate workflows\[[18](https://arxiv.org/html/2605.11189#bibc.bib136)\]or extended to modeling structural ensembles, demonstrating success in applications such as conformational switches\[[16](https://arxiv.org/html/2605.11189#bibc.bib137)\]\. However, whether these approaches generalize to tuning binding specificities remains unclear\. Alternative strategies employ experimental screening heuristics, such as differential yeast display, to identify selective binders\[[39](https://arxiv.org/html/2605.11189#bibc.bib186)\], but such approaches do not leverage computational design capabilities to systematically optimize both affinity and specificity\.

### 4\.2 Data and Methods

#### 4\.2\.1 Protein Graph Representation

We utilize multi\-scale graph representations to model the backbone and side\-chain geometry of protein complex structures\.

###### Backbone representation for complex structures\.

We represent the protein complex as a residue\-level graph $G=(V,E)$, where each node $v_{i}\in V$ corresponds to a residue and edges $e_{ij}\in E$ connect spatially proximal residues within a distance cutoff\. Local frames are derived from backbone atom coordinates \(N, C$\alpha$, C, O\) and edges are constructed using $k$\-nearest neighbors based on C$\alpha$ distances as in GLINTER\[[35](https://arxiv.org/html/2605.11189#bibc.bib11)\]\.
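As a rough illustration of the edge construction, the following sketch builds $k$\-nearest\-neighbor indices from C$\alpha$ coordinates\. The function name `knn_edges` is hypothetical, the default `k=48` is taken from Table 4\.1, and excluding a residue from its own neighbor list is an assumption rather than something stated in the text\.

```python
import numpy as np


def knn_edges(ca: np.ndarray, k: int = 48) -> np.ndarray:
    """Return (N, k_eff) neighbor indices sorted by increasing C-alpha distance.

    ca: (N, 3) array of C-alpha coordinates in Angstroms.
    Self-edges are excluded (an assumption of this sketch).
    """
    n = ca.shape[0]
    k_eff = min(k, n - 1)  # small complexes have fewer than k other residues
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)  # (N, N)
    np.fill_diagonal(dist, np.inf)  # push self to the end of every ranking
    return np.argsort(dist, axis=1)[:, :k_eff]


# Four residues on a line at x = 0, 1, 2, 10.
ca_toy = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
nbrs = knn_edges(ca_toy, k=2)
```

The dense $N \times N$ distance matrix is the simplest option and is adequate for single complexes; a spatial index would be preferable for very large assemblies\.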

###### Sidechain representation for target chains\.

For target chains where sequence information is available, we construct an all\-atom graph $G_{\text{atom}}=(V_{\text{atom}},E_{\text{atom}})$ to capture detailed sidechain interactions\. Each node $v_{a}\in V_{\text{atom}}$ represents a heavy atom, with features encoding its local chemical environment, as described in [Section 4\.2\.2](https://arxiv.org/html/2605.11189#Ch4.S2.SS2.SSSx2)\. Edges connect atoms within a distance cutoff and encode pairwise distances and other features\.
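Table 4\.1 describes the atom graph as a radius graph that connects each C$\alpha$ to nearby atoms \(radius 15 Å, at most 96 neighbors\)\. A minimal sketch of that construction is given below\. The function name `radius_edges` and the tie\-breaking rule \(keep the nearest atoms when more than `max_k` fall inside the radius\) are assumptions of this illustration\.

```python
import numpy as np


def radius_edges(ca: np.ndarray, atoms: np.ndarray,
                 r: float = 15.0, max_k: int = 96):
    """Connect each C-alpha to atoms within radius r, capped at max_k nearest.

    ca: (N, 3) C-alpha coordinates; atoms: (M, 3) heavy-atom coordinates.
    Returns three 1-D arrays: residue index, atom index, distance.
    """
    dist = np.linalg.norm(ca[:, None, :] - atoms[None, :, :], axis=-1)  # (N, M)
    src, dst, length = [], [], []
    for i in range(dist.shape[0]):
        within = np.flatnonzero(dist[i] <= r)
        # Keep the closest atoms first when the cap is exceeded (assumed rule).
        nearest = within[np.argsort(dist[i, within])][:max_k]
        src.extend([i] * len(nearest))
        dst.extend(nearest.tolist())
        length.extend(dist[i, nearest].tolist())
    return np.array(src, dtype=int), np.array(dst, dtype=int), np.array(length)


# One residue at the origin and three atoms at distances 1, 5 and 20 Angstroms.
ca_one = np.zeros((1, 3))
atoms_toy = np.array([[1.0, 0, 0], [0, 5.0, 0], [0, 0, 20.0]])
src, dst, length = radius_edges(ca_one, atoms_toy)
```

The returned distances can be fed directly into the radial basis encoding listed among the atom graph edge features\.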

#### 4\.2\.2 Network Architectures

##### Overview

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/rednet/rednet.png)

Figure 4\.1: Overview of the RedNet architecture\. Graph neural networks encode protein structure into node and edge representations, which are then decoded by a causal transformer to autoregressively predict amino acid sequences\.

The overall architecture is shown in [Figure 4\.1](https://arxiv.org/html/2605.11189#Ch4.F1)\. We employ graph neural networks to encode the protein graphs and a causal transformer to decode amino acids at each timestep\. We propose several improvements over existing encoder\-decoder graph neural network architectures for protein design\.

###### Graph attention network handles heterogeneous neighborhoods\.

Algorithm 3: GAT Layer

Inputs: $s$: node features; $p$: edge features; $\mathcal{E}$: neighbor indices; $M$: edge mask\.

$$
\begin{aligned}
&\textbf{def}\ \mathrm{GATLayer}(s, p, \mathcal{E}, M):\\
&\quad \text{\# Build edge messages from nodes and edges}\\
&\quad m_{ij} \leftarrow \mathrm{Linear}(p_{ij}) + \mathrm{Gather}(W^{\text{src}} s, \mathcal{E}) + W^{\text{tgt}} s_{i}\\
&\quad m_{ij} \leftarrow \mathrm{MLP}(m_{ij})\\
&\quad \text{\# Global pooling with gating}\\
&\quad o_{i}^{\text{global}} \leftarrow \frac{1}{|\delta(i)|} \sum_{j \in \delta(i)} m_{ij}\\
&\quad \Delta s \leftarrow \mathrm{Linear}(\mathrm{Sigmoid}(W^{g} s_{i}) \odot o_{i}^{\text{global}})\\
&\quad \text{\# Graph attention with gating}\\
&\quad \alpha_{ij} \leftarrow W^{A} \cdot \mathrm{LeakyReLU}(\mathrm{Linear}(m_{ij}))\\
&\quad \alpha_{ij} \leftarrow \mathrm{Softmax}_{j \in \delta(i)}(\alpha_{ij})\\
&\quad v_{ij} \leftarrow \mathrm{Linear}(m_{ij})\\
&\quad o_{i}^{\text{gat}} \leftarrow \sum_{j \in \delta(i)} \alpha_{ij} v_{ij}\\
&\quad \Delta s \leftarrow \Delta s + \mathrm{Linear}(\mathrm{Sigmoid}(W^{g'} s_{i}) \odot o_{i}^{\text{gat}})\\
&\quad \text{\# Residual updates}\\
&\quad s \leftarrow s + \mathrm{Dropout}(\Delta s)\\
&\quad s \leftarrow s + \mathrm{Dropout}(\mathrm{MLP}(s))\\
&\quad \textbf{return}\ s,\ p
\end{aligned}
$$

The pseudocode for our variant of GAT\[[32](https://arxiv.org/html/2605.11189#bibc.bib141),[4](https://arxiv.org/html/2605.11189#bibc.bib142)\] is shown in [Algorithm 3](https://arxiv.org/html/2605.11189#alg3)\. Edge messages are first constructed by combining projected edge features with source node features \(via gather\) and target node features\. These messages are refined through an MLP\. For autoregressive decoding, causal masks are precomputed based on the sampled decoding order and edge indices; the edge mask $M$ ensures that each residue can only attend to neighbors that have already been seen\. The layer then applies two complementary aggregation strategies: \(1\) a global pooling branch that computes the mean of neighboring messages, modulated by a learned sigmoid gate, and \(2\) a graph attention branch where attention weights are computed via LeakyReLU activation, allowing the model to selectively attend to informative neighbors while down\-weighting noisy or less relevant nodes\. Both branches use gating mechanisms initialized near zero\. The final node representations are updated through residual connections with dropout regularization\.
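For readers who prefer executable code, the following NumPy sketch mirrors the steps of Algorithm 3 at inference time\. It is an illustration under stated assumptions, not the RedNet implementation: weights are random, all hidden widths equal the node width, dropout is omitted, the near\-zero gate initialization is not modeled, and the edge mask is applied by excluding masked neighbors from both the mean and the softmax \(the pseudocode does not show where $M$ enters\)\. All parameter names are hypothetical\.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2  # one hidden ReLU layer


def init_params(d, d_edge, rng):
    w = lambda *shape: rng.normal(0.0, 0.3, shape)
    names = ["src", "tgt", "msg1", "msg2", "gate_pool", "out_pool",
             "att_hidden", "val", "gate_att", "out_att", "ffn1", "ffn2"]
    params = {name: w(d, d) for name in names}
    params["edge"], params["att_vec"] = w(d_edge, d), w(d)
    return params


def gat_layer(s, p, nbr, mask, P):
    """s: (N, d) nodes; p: (N, K, d_edge) edges; nbr: (N, K) int; mask: (N, K) bool."""
    # Edge messages from projected edges, gathered sources, and the target node.
    m = p @ P["edge"] + (s @ P["src"])[nbr] + (s @ P["tgt"])[:, None, :]
    m = mlp(m, P["msg1"], P["msg2"])                               # (N, K, d)
    # Global pooling branch: masked mean with a sigmoid gate.
    count = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    o_pool = (m * mask[..., None]).sum(axis=1) / count
    delta = (sigmoid(s @ P["gate_pool"]) * o_pool) @ P["out_pool"]
    # Attention branch: masked softmax over each node's neighbors.
    logits = leaky_relu(m @ P["att_hidden"]) @ P["att_vec"]        # (N, K)
    logits = np.where(mask, logits, -1e9)
    logits = logits - logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits) * mask
    alpha = alpha / np.maximum(alpha.sum(axis=1, keepdims=True), 1e-9)
    o_att = (alpha[..., None] * (m @ P["val"])).sum(axis=1)
    delta = delta + (sigmoid(s @ P["gate_att"]) * o_att) @ P["out_att"]
    # Residual updates (dropout omitted at inference).
    s = s + delta
    s = s + mlp(s, P["ffn1"], P["ffn2"])
    return s, p


rng = np.random.default_rng(0)
N, K, d, d_edge = 4, 2, 8, 5
s0, p0 = rng.normal(size=(N, d)), rng.normal(size=(N, K, d_edge))
nbr0 = np.array([[1, 2], [0, 2], [0, 1], [0, 1]])
mask0 = np.ones((N, K), dtype=bool)
mask0[3, 1] = False  # node 3 may not look at its second neighbor (node 1)
P0 = init_params(d, d_edge, rng)
out0, _ = gat_layer(s0, p0, nbr0, mask0, P0)
```

With this masking convention, perturbing the features of a masked neighbor leaves the corresponding node's output unchanged, which is the property autoregressive decoding relies on\.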

###### Causal graph transformer with pairwise positional biases\.

$$
\begin{array}{l}
\text{\# } s \text{: node features; } p \text{: pairwise features; } M \text{: attention mask} \\
\textbf{def}\ \text{GraphAttention}(s, p, M): \\
\quad \text{\# Project to queries, keys, values} \\
\quad Q \leftarrow \text{Linear}(s) \\
\quad K, V \leftarrow \text{split}(\text{Linear}(s)) \\
\quad B \leftarrow \text{Linear}(p) \quad \text{\# pairwise bias} \\
\quad \text{\# Compute attention with pairwise bias} \\
\quad A_{ij} \leftarrow \frac{Q_i \cdot K_j}{\sqrt{d}} + B_{ij} \\
\quad A_{ij} \leftarrow A_{ij}.\text{masked\_fill}(\neg M_{ij}, -\infty) \\
\quad \alpha_{ij} \leftarrow \text{Softmax}_j(A_{ij}) \\
\quad \text{\# Aggregate and project with gating} \\
\quad o_i \leftarrow \sum_j \alpha_{ij} V_j \\
\quad o_i \leftarrow \text{Sigmoid}(\text{Linear}(s_i)) \odot o_i \\
\quad \textbf{return}\ \text{Linear}(o)
\end{array}
$$

Algorithm 4: Attention with Pairwise Biases

MPNNs suffer from limited expressivity, over-smoothing, and over-squashing \[[36](https://arxiv.org/html/2605.11189#bibc.bib143),[24](https://arxiv.org/html/2605.11189#bibc.bib144),[2](https://arxiv.org/html/2605.11189#bibc.bib145)\]. Adding proper normalization \[[40](https://arxiv.org/html/2605.11189#bibc.bib147),[5](https://arxiv.org/html/2605.11189#bibc.bib146)\], positional biases, and global attention \[[12](https://arxiv.org/html/2605.11189#bibc.bib148),[37](https://arxiv.org/html/2605.11189#bibc.bib149),[30](https://arxiv.org/html/2605.11189#bibc.bib151)\] may address some of these issues; however, long-range interactions remain difficult to capture. *De novo* binder design methods \[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\] tend to exploit shape complementarity and short-range hydrophobicity to generate successful designs, yet long-range features of natural proteins, such as allosteric pathways, that are essential for certain biological functions \[[34](https://arxiv.org/html/2605.11189#bibc.bib155)\] may not be adequately captured. We hypothesize that adding global attention may improve sequence modeling of long-range interactions. The pseudocode is shown in [Algorithm 4](https://arxiv.org/html/2605.11189#alg4).
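The core of Algorithm 4 (scaled dot-product scores, an additive pairwise bias, masking, and softmax aggregation) can be sketched in plain Python. This is a single-head illustration under stated assumptions: the learned linear projections and the sigmoid output gate are omitted, `Q`, `K`, `V`, and `B` are passed in directly, and the names `softmax` and `biased_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax; a fully masked row yields all zeros."""
    m = max(xs)
    if m == -math.inf:
        return [0.0] * len(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(Q, K, V, B, M):
    """o_i = sum_j softmax_j(Q_i . K_j / sqrt(d) + B_ij) V_j, restricted to M_ij."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        logits = []
        for j, k in enumerate(K):
            a = sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) + B[i][j]
            # Masked positions receive -inf so they get zero attention weight.
            logits.append(a if M[i][j] else -math.inf)
        alpha = softmax(logits)
        out.append([sum(alpha[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out


Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 0.0], [0.0, 0.0]]
M = [[True, False], [True, True]]  # causal: position 0 sees only itself
out = biased_attention(Q, K, V, B, M)
```

With this causal mask, the first output row copies `V[0]`, while the second row is a convex combination of both value vectors.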

##### Multiscale\-Graph Featurizers

Table 4.1: Summary of input features. Core atoms: N, Cα, C, O, pseudo-Cβ ($C = 5$). The residue graph is a $K$-NN graph ($K = 48$) over Cα distances. The atom graph connects each Cα to nearby atoms via a radius graph ($r = 15$ Å, max $k = 96$). RBF: $\phi(d) = \exp(-(d - \mu_i)^2)$, $\mu_i$ linearly spaced in $[2, 22]$, $D = 16$ bins. Atom type vocabulary $A = 37$; residue type vocabulary $R = 33$. $N$: residues; $M$: atoms; $E$: atom graph edges.

| Feature | Shape | Description |
| --- | --- | --- |
| Residue graph: edge features | | |
| Core atom pairwise RBF | $(N, K, C^2 D)$ | RBF over pairwise distances between core atoms |
| Core atom inverse distance | $(N, K, C^2)$ | $(1 + d_{ij})^{-1}$ for each core atom pair |
| Relative residue index | $(N, K)$ | Sequence offset encoding |
| Same-chain indicator | $(N, K)$ | 1 if same chain |
| Frame-relative positions | $(N, K, 3C)$ | Core atom coordinates in local N–Cα–C frames |
| Cβ–sidechain RBF | $(N, K, 32D)$ | RBF from pseudo-Cβ to sidechain atoms; design chains masked |
| Atom graph: node features | | |
| Atom type | $(M, A)$ | One-hot over atom types |
| Residue type | $(M, R)$ | One-hot over parent residue type |
| Atom exists | $(M, 1)$ | 1 if atom resolved in structure |
| Atom graph: edge features | | |
| RBF distance | $(E, D)$ | RBF over Cα-to-atom distance |
| Euclidean distance | $(E, 1)$ | Cα-to-atom distance |
| Residue index offset | $(E, 65)$ | Clamped relative residue index one-hot ($\pm 32$) |
| Same-chain indicator | $(E, 1)$ | 1 if same chain |
| Atom graph: 3D coordinates (equivariant attention) | | |
| Centroid positions | $(N, 3)$ | Cα coordinates |
| Atom positions | $(M, 3)$ | All-atom coordinates |

Node and edge features are derived from backbone atom coordinates (N, Cα, C, O) and include local geometric descriptors; they are summarized in [Table 4.1](https://arxiv.org/html/2605.11189#Ch4.T1).
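Three of the scalar featurizations in Table 4.1 can be sketched directly from their definitions. This is an illustrative sketch, not the thesis code: the function names are hypothetical, the RBF centers are assumed to include both endpoints of $[2, 22]$, and the offset direction `j - i` is an assumption.

```python
import math


def rbf(d, d_min=2.0, d_max=22.0, num_bins=16):
    """phi_i(d) = exp(-(d - mu_i)^2), with mu_i linearly spaced in [d_min, d_max]."""
    step = (d_max - d_min) / (num_bins - 1)
    return [math.exp(-((d - (d_min + i * step)) ** 2)) for i in range(num_bins)]


def inverse_distance(d):
    """(1 + d)^-1, bounded in (0, 1] for non-negative distances."""
    return 1.0 / (1.0 + d)


def relative_index_one_hot(i, j, clamp=32):
    """One-hot over the offset j - i clamped to [-clamp, clamp] (65 classes for clamp=32)."""
    offset = max(-clamp, min(clamp, j - i))
    vec = [0] * (2 * clamp + 1)
    vec[offset + clamp] = 1
    return vec
```

For example, `rbf(2.0)` peaks in its first bin, and any pair more than 32 residues apart falls into one of the two boundary classes of the one-hot encoding.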

###### All\-atom equivariant GAT\.

$$
\begin{array}{l}
\text{\# } q, k \text{: node features; } x, y \text{: coordinates; } \mathcal{E} \text{: edges; } e \text{: edge features} \\
\textbf{def}\ \text{EGATLayer}(q, k, x, y, \mathcal{E}, e): \\
\quad \text{\# Compute relative positions and distances} \\
\quad (i, j) \leftarrow \mathcal{E} \\
\quad z_{ij} \leftarrow y_j - x_i \\
\quad d_{ij} \leftarrow \|z_{ij}\|^2 \\
\quad \text{\# Build edge messages} \\
\quad m_{ij} \leftarrow \text{Linear}([q_i \,\|\, k_j \,\|\, d_{ij} \,\|\, e_{ij}]) \\
\quad \text{\# Compute multi-head attention} \\
\quad \alpha_{ij} \leftarrow W^{Q} q_i + W^{K} k_j + W^{E} m_{ij} \\
\quad \alpha_{ij} \leftarrow W^{A} \cdot \text{LeakyReLU}(\alpha_{ij}) \\
\quad \alpha_{ij} \leftarrow \text{Softmax}_{j \in \delta(i)}(\alpha_{ij}) \quad \text{\# over neighbors } j \\
\quad \text{\# Aggregate values} \\
\quad v_{ij} \leftarrow \text{Linear}([q_i \,\|\, k_j \,\|\, d_{ij} \,\|\, e_{ij}]) \\
\quad o_i \leftarrow \sum_{j \in \delta(i)} \alpha_{ij} v_{ij} \\
\quad h^{\text{out}} \leftarrow \text{Linear}([q_i \,\|\, o_i]) \\
\quad \textbf{return}\ h^{\text{out}}
\end{array}
$$

Algorithm 5: Equivariant Graph Attention Layer

The pseudocode is shown in [Algorithm 5](https://arxiv.org/html/2605.11189#alg5). The layer takes as input query and key node features ($q$, $k$), their corresponding coordinates ($x$, $y$), edge indices $\mathcal{E}$, and edge attributes $e$. Edge messages are constructed by concatenating source and target node features with squared pairwise distances and edge attributes, ensuring that the representation is invariant to global rotations and translations. Multi-head attention scores are computed using a GAT-style mechanism with LeakyReLU activation \[[4](https://arxiv.org/html/2605.11189#bibc.bib142)\], allowing the model to learn which neighboring atoms are most relevant for each query node. Output node features are obtained by aggregating attention-weighted value vectors and projecting with a residual connection.
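The invariance claim rests on the fact that the only geometric input to the edge messages is the squared distance $d_{ij}$, which is unchanged when the same rigid motion is applied to both coordinates. A small numerical check, with hypothetical helper names and an arbitrary rotation about the z-axis chosen for illustration:

```python
import math


def squared_distance(x, y):
    """d = ||y - x||^2, the only geometric quantity entering the edge message."""
    return sum((yc - xc) ** 2 for xc, yc in zip(x, y))


def rigid_transform(p, angle, shift):
    """Rotate p about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    rotated = (c * x - s * y, s * x + c * y, z)
    return tuple(r + t for r, t in zip(rotated, shift))


x_i, y_j = (1.0, 2.0, 3.0), (4.0, 0.0, -1.0)
shift = (5.0, -2.0, 1.0)
before = squared_distance(x_i, y_j)
after = squared_distance(rigid_transform(x_i, 0.7, shift),
                         rigid_transform(y_j, 0.7, shift))
```

Both values agree up to floating-point error, so any function of $d_{ij}$ and the non-geometric features inherits the same invariance.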

##### Losses

###### Causal language model objectives\.

We train the model using a cross-entropy loss over amino acid tokens \[[21](https://arxiv.org/html/2605.11189#bibc.bib157)\]. Given predicted logits $\hat{y}_i \in \mathbb{R}^{|\mathcal{V}|}$ and ground truth token $y_i$ at position $i$, the nodewise loss is defined as:

$$\mathcal{L}_{\text{node}} = \sum_{m \in \mathcal{M}} w_m \cdot \frac{1}{|M_m|} \sum_{i \in M_m} \text{CE}(\hat{y}_i, y_i)$$

where $\mathcal{M}$ denotes a set of binary masks (e.g., design site mask, prediction mask) with corresponding weights $w_m$, and $\text{CE}(\cdot, \cdot)$ is the cross-entropy loss.

###### Edgewise regularization\.

To encourage the model to learn informative pairwise representations, we add an auxiliary edgewise cross-entropy loss. During training, we find that this regularization prevents overfitting. For each node $i$ and its $k$-nearest neighbors, the model predicts a joint token distribution over residue pairs. Given edgewise logits $\hat{z}_{ij} \in \mathbb{R}^{|\mathcal{V}|^2}$ and ground truth pair token $z_{ij} = y_i \cdot |\mathcal{V}| + y_j$, the edgewise loss is:

$$\mathcal{L}_{\text{edge}} = \frac{1}{|E|} \sum_{(i,j) \in E} \text{CE}(\hat{z}_{ij}, z_{ij})$$

where $E$ is the set of valid edges determined by the prediction mask. The total loss is $\mathcal{L} = \mathcal{L}_{\text{node}} + \lambda_{\text{edge}} \mathcal{L}_{\text{edge}}$, where $\lambda_{\text{edge}}$ controls the regularization strength. In our experiments, $\lambda_{\text{edge}}$ is set to 1.
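The two loss terms can be sketched in plain Python as follows. This is a minimal illustration of the formulas above, not the training code: the function names, the list-based data layout, and the choice to skip empty masks are assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of `target`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def node_loss(logits, targets, masks, weights):
    """Sum over masks m of w_m times the mean CE over positions selected by m."""
    total = 0.0
    for mask, w in zip(masks, weights):
        idx = [i for i, keep in enumerate(mask) if keep]
        if idx:  # assumption: an empty mask contributes nothing
            total += w * sum(cross_entropy(logits[i], targets[i]) for i in idx) / len(idx)
    return total


def pair_token(y_i, y_j, vocab_size):
    """Joint label z_ij = y_i * |V| + y_j."""
    return y_i * vocab_size + y_j


def edge_loss(edge_logits, targets, edges, vocab_size):
    """Mean CE of the |V|^2-way pair prediction over edges (i, j)."""
    losses = [cross_entropy(edge_logits[e], pair_token(targets[i], targets[j], vocab_size))
              for e, (i, j) in enumerate(edges)]
    return sum(losses) / len(losses)


targets = [0, 1]
l_node = node_loss([[0.0, 0.0], [0.0, 0.0]], targets, [[True, True]], [1.0])
l_edge = edge_loss([[0.0, 0.0, 0.0, 0.0]], targets, [(0, 1)], vocab_size=2)
total = l_node + 1.0 * l_edge  # lambda_edge = 1, as in the experiments
```

With uniform logits over a two-token vocabulary, the node term equals $\log 2$ and the edge term equals $\log 4$, which is a quick sanity check of the pair-token indexing.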

#### 4\.2\.3Contrastive Decoding and Scoring

Contrastive decoding \[[25](https://arxiv.org/html/2605.11189#bibc.bib139),[26](https://arxiv.org/html/2605.11189#bibc.bib140)\] modifies the predicted logits at each timestep to amplify features that distinguish the on-target bound structure from the off-target bound structure. Here, the bound structure consists of the backbone coordinates of the complex and the side-chain coordinates of the target chain. Specifically, at timestep $t$, the modified logits are computed as:

$$\ell(s_t) = (1 + \alpha) \log p(s_t \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) - \alpha \log p(s_t \mid s_{<t}, r_{\texttt{off}}, x_{\texttt{off}})$$

where $s_t$ is the binder residue at position $t$, $s_{<t}$ denotes the previously sampled binder residues, $r_{\texttt{on}}$ and $r_{\texttt{off}}$ are the target sequences for the on-target and off-target complexes, $x_{\texttt{on}}$ and $x_{\texttt{off}}$ are the corresponding bound structures, and $\alpha \geq 0$ controls the strength of the contrastive penalty. When $\alpha = 0$, this reduces to standard decoding from the on-target distribution. As $\alpha$ increases, the model increasingly penalizes residue choices that are also favorable under the off-target context, thereby promoting specificity.

To prevent the contrastive term from selecting low-probability tokens, we constrain the candidate set at timestep $t$ to tokens with sufficiently high probability under the on-target distribution:

$$\mathcal{S}_t = \{ s_t : p(s_t \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) \geq \beta \max_{s \in \mathcal{A}} p(s \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) \}$$

where $\mathcal{A}$ is the amino acid alphabet and $\beta \in [0, 1]$ is a truncation threshold. This adaptive truncation ensures that only plausible residues are considered, while allowing the contrastive term to discriminate among them. We then sample from the modified categorical distribution restricted to the candidate set:

$$p_t(s) = \text{softmax}_{\mathcal{S}_t}(\ell(s))$$

The pseudocode is shown in [Algorithm 6](https://arxiv.org/html/2605.11189#alg6).

$$
\begin{array}{l}
\textbf{def}\ \text{ContrastiveDecode}(r_{\texttt{on}}, x_{\texttt{on}}, r_{\texttt{off}}, x_{\texttt{off}}, \alpha, \beta, \tau, L, \mathcal{A}): \\
\quad s \leftarrow [\texttt{mask}]^{L} \quad \text{\# Initialize binder sequence with all mask tokens} \\
\quad \textbf{for}\ t = 1, \ldots, L\ \textbf{do} \\
\qquad \ell(a) \leftarrow (1 + \alpha) \log p(a \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) - \alpha \log p(a \mid s_{<t}, r_{\texttt{off}}, x_{\texttt{off}}) \\
\qquad p_{\max} \leftarrow \max_{a \in \mathcal{A}} p(a \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) \\
\qquad \mathcal{S}_t \leftarrow \{ a \in \mathcal{A} : p(a \mid s_{<t}, r_{\texttt{on}}, x_{\texttt{on}}) \geq \beta \cdot p_{\max} \} \\
\qquad s_t \sim \mathrm{softmax}_{\mathcal{S}_t}(\ell(a) / \tau) \quad \text{\# Sample with temperature } \tau \\
\qquad \text{Set } s[t] \text{ to } s_t \\
\quad \textbf{end for} \\
\quad \textbf{return}\ s
\end{array}
$$

Algorithm 6: Contrastive Decoding for Binder Sequence Design

This decoding framework also enables direct sampling of sequences that optimize the binding free energy $\Delta G$, which relates to the folding free energies as follows \[[15](https://arxiv.org/html/2605.11189#bibc.bib3)\]:

$$\Delta G = \Delta G_{\texttt{complex}} - \Delta G_{\texttt{binder}} - \Delta G_{\texttt{target}}$$

where $\Delta G$ is the binding free energy, $\Delta G_{\texttt{complex}}$ is the folding free energy of the bound complex, and $\Delta G_{\texttt{binder}}$ and $\Delta G_{\texttt{target}}$ are the folding free energies of the unbound binder and target chains, respectively. Multiple studies \[[10](https://arxiv.org/html/2605.11189#bibc.bib178)\] have demonstrated that folding free energy correlates with the log-likelihood from fixed-backbone sequence design models. Therefore, the right-hand side can be approximated as:

$$\Delta G \approx \log p(s, r \mid x_{\texttt{bound}}) - \log p(s \mid x_{\texttt{binder}}) - \log p(r \mid x_{\texttt{target}})$$

where $s$ and $r$ are the binder and target chain sequences, $x_{\texttt{bound}}$ is the structure of the bound complex, and $x_{\texttt{binder}}$ and $x_{\texttt{target}}$ are the structures of the unbound binder and target chains, respectively. Since the target sequence $r$ is fixed during binder design, we can omit terms that do not depend on $s$. Writing $\log p(s, r \mid x_{\texttt{bound}}) = \log p(r \mid x_{\texttt{bound}}) + \log p(s \mid r, x_{\texttt{bound}})$, the only $s$-dependent terms are $\log p(s \mid r, x_{\texttt{bound}})$ and $\log p(s \mid x_{\texttt{binder}})$. Contrasting these two terms at each position, with the same $(1 + \alpha)$ and $\alpha$ weights as in the specificity formulation, yields the following contrastive decoding formulation:

$$\ell(s_t) = (1 + \alpha) \log p(s_t \mid s_{<t}, r, x_{\texttt{bound}}) - \alpha \log p(s_t \mid s_{<t}, x_{\texttt{binder}})$$

This formulation encourages the model to select residues that are favorable in the bound state but unfavorable in the unbound state, thereby approximating the thermodynamic preference for complex formation. The same candidate set truncation and sampling procedure described above are applied to this formulation.

The two contrastive decoding formulations share the same mathematical form: in both cases, the model amplifies residue choices that are favorable under the on\-target context and unfavorable under an alternative context\. In the specificity formulation, the alternative context is the off\-target bound structure; in the affinity formulation, the alternative context is the unbound binder structure\.
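Because both formulations differ only in the alternative context, a single decoding routine can serve both. The sketch below assumes two hypothetical callables, `model_on` and `model_alt`, that map the current binder prefix to finite per-token log-probabilities; in practice these would wrap the design model conditioned on the on-target complex and on either the off-target complex or the unbound binder. The function names and the toy distributions are illustrative only.

```python
import math
import random


def contrastive_step(logp_on, logp_alt, alpha, beta, tau, rng):
    """Sample one token index from the truncated contrastive distribution."""
    # Candidate set: p_on(a) >= beta * max p_on, compared in log space.
    threshold = math.log(beta) + max(logp_on) if beta > 0 else -math.inf
    cand = [a for a, lp in enumerate(logp_on) if lp >= threshold]
    # Contrastive logits with temperature, restricted to the candidates.
    scores = [((1 + alpha) * logp_on[a] - alpha * logp_alt[a]) / tau for a in cand]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return rng.choices(cand, weights=weights, k=1)[0]


def contrastive_decode(model_on, model_alt, length, alpha, beta, tau, seed=0):
    """Decode `length` tokens left to right; each model sees the sampled prefix."""
    rng = random.Random(seed)
    seq = []
    for _ in range(length):
        seq.append(contrastive_step(model_on(seq), model_alt(seq), alpha, beta, tau, rng))
    return seq


# Toy, position-independent distributions over a three-token alphabet.
on = lambda prefix: [math.log(p) for p in (0.5, 0.45, 0.05)]
alt = lambda prefix: [math.log(p) for p in (0.6, 0.1, 0.3)]
designed = contrastive_decode(on, alt, length=5, alpha=1.0, beta=0.5, tau=0.001)
greedy = contrastive_decode(on, alt, length=5, alpha=0.0, beta=0.5, tau=0.001)
```

In this toy setting, token 0 is the most likely on-target choice but is even more likely under the alternative context, so contrastive decoding with $\alpha = 1$ selects token 1 instead, while $\alpha = 0$ recovers the on-target mode. Token 2 is excluded by the truncation step in both cases.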

##### Scoring

We define a set of scoring metrics to evaluate designed sequences for binding. Straightforward evaluations of the autoregressive model, `ll` and `ll_global`, measure respectively the average log-likelihood of the designed binder sequence and the full complex sequence given the bound complex structure, assessing sequence-structure compatibility. `ll_mt` restricts the average to mutated positions only. `ll_ref` further refines this by computing the log-likelihood difference between the designed and wild-type residues at mutated positions, directly measuring whether the mutations are predicted to be more favorable than the original sequence.

The contrastive scores, `ll_cd` and `ll_cd_ref`, approximate binding free energy by subtracting the unbound binder log-likelihood from the bound complex log-likelihood, mirroring the thermodynamic cycle formulation. `ll_cd` captures the overall binding preference across the binder chain, while `ll_cd_ref` focuses on mutated positions and uses the wild-type as a reference, providing a mutation-specific estimate of the change in binding affinity relative to the original sequence. Concurrent works \[[11](https://arxiv.org/html/2605.11189#bibc.bib190),[13](https://arxiv.org/html/2605.11189#bibc.bib17),[9](https://arxiv.org/html/2605.11189#bibc.bib192)\] have also explored similar scoring metrics and demonstrated improved correlation with experimental measurements of stability.

$$
\begin{aligned}
\texttt{ll} &= \frac{1}{N_{\texttt{binder}}} \sum_{i=1}^{N_{\texttt{binder}}} \ell_i(a_i) \\
\texttt{ll\_global} &= \frac{1}{N_{\texttt{complex}}} \sum_{i=1}^{N_{\texttt{complex}}} \ell_i(a_i) \\
\texttt{ll\_mt} &= \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \ell_i(a_i) \\
\texttt{ll\_ref} &= \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \left( \ell_i(a_i) - \ell_i(a_i^{\text{wt}}) \right) \\
\texttt{ll\_cd} &= \frac{1}{N_{\texttt{binder}}} \sum_{i=1}^{N_{\texttt{binder}}} \ell_i(a_i) - \frac{1}{N_{\texttt{binder}}} \sum_{i=1}^{N_{\texttt{binder}}} \ell_i^{u}(a_i) \\
\texttt{ll\_cd\_ref} &= \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \left( \ell_i(a_i) - \ell_i(a_i^{\text{wt}}) \right) - \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \left( \ell_i^{u}(a_i) - \ell_i^{u}(a_i^{\text{wt}}) \right)
\end{aligned}
$$

where $\ell_i = \log p_i$ is the predicted log-probability at position $i$ given the bound complex structure and the target sequence, $\ell_i^{u}$ is the predicted log-probability at position $i$ given the unbound binder structure, $a_i$ and $a_i^{\text{wt}}$ are the designed and wild-type residue types at position $i$, respectively, $N_{\texttt{binder}}$ is the sequence length of the binder chain, $N_{\texttt{complex}}$ is the total sequence length of the complex, and $\mathcal{M} = \{ i : a_i \neq a_i^{\text{wt}} \}$ is the set of mutated positions in the binder chain.
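The binder-level scores can be computed from two tables of per-position log-probabilities, as in the sketch below. The function name `binder_scores` and the list-of-dicts layout are assumptions for illustration; `ll_global` is omitted because it additionally requires log-probabilities for the target chain.

```python
def mean(xs):
    return sum(xs) / len(xs)


def binder_scores(lp_bound, lp_unbound, designed, wild_type):
    """lp_bound[i][a] and lp_unbound[i][a]: log-probability of residue a at binder position i."""
    n = len(designed)
    mutated = [i for i in range(n) if designed[i] != wild_type[i]]
    if not mutated:
        raise ValueError("ll_mt, ll_ref and ll_cd_ref need at least one mutated position")
    ll = mean([lp_bound[i][designed[i]] for i in range(n)])
    ll_unbound = mean([lp_unbound[i][designed[i]] for i in range(n)])
    ref_bound = mean([lp_bound[i][designed[i]] - lp_bound[i][wild_type[i]] for i in mutated])
    ref_unbound = mean([lp_unbound[i][designed[i]] - lp_unbound[i][wild_type[i]] for i in mutated])
    return {
        "ll": ll,
        "ll_mt": mean([lp_bound[i][designed[i]] for i in mutated]),
        "ll_ref": ref_bound,
        "ll_cd": ll - ll_unbound,
        "ll_cd_ref": ref_bound - ref_unbound,
    }


lp_bound = [{"A": -0.1, "G": -2.4}, {"A": -1.0, "G": -0.5}]
lp_unbound = [{"A": -0.3, "G": -1.5}, {"A": -0.7, "G": -0.7}]
scores = binder_scores(lp_bound, lp_unbound, designed="AG", wild_type="AA")
```

In this toy example, the single mutation at the second position is favored in the bound state (`ll_ref` is positive) but neutral in the unbound state, so `ll_cd_ref` is positive as well.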

#### 4\.2\.4Datasets

##### Training Set

We use structures released before 2023-01-01 from the Protein Data Bank (PDB) \[[3](https://arxiv.org/html/2605.11189#bibc.bib84)\] as the training and validation set. We filter out structures with more than 20 polymer chains, resolutions worse than 5 Å, and experimental methods other than X-ray diffraction and electron microscopy. We retain only polypeptide(L) chains with 10–5000 residues and fewer than 10% unknown residues.

###### Validation Set\.

We use the subset of structures that are released between 2022\-05\-01 and 2022\-12\-31 as the validation set\. We remove similar chains using the same procedure as the PDB test sets described below\.

##### Test Datasets

###### Low\-Homology PDB Test Set\.

We select structures released between 2023-01-01 and 2023-12-31. To prevent potential sequence leakage, we search test sequences against training sequences using MMseqs2 \[[31](https://arxiv.org/html/2605.11189#bibc.bib138)\] and retain only (design) chains with e-value $> 1$, which is typically stricter than the commonly used 30% sequence identity threshold. To reduce overrepresentation of certain protein families, we cluster the remaining test sequences at 40% sequence identity using MMseqs2. We categorize each structure by interface type based on the similarity of interacting chains: monomer (no interface), homodimer (two similar chains), or heterodimer (two dissimilar chains). For each interface type, we select one representative from each cluster and randomly sample 300 monomers, 150 homodimers, and 150 heterodimers to create a balanced test set of 600 structures.

To test heterodimer self-consistency, we select heterodimers whose total number of residues is at most 500, due to computational constraints. This filtering retains 107 samples.

###### Selective Binder Test Set\.

Heterodimeric protein complexes are curated from all protein chains in the PDB (released prior to 2025-04-14). Two chains form an interacting heterodimer if their minimum Cα–Cα distance is $\leq 10$ Å within a bioassembly and they share less than 90% sequence identity as determined by MMseqs2 clustering.

We filter chains to retain those between 20 and 500 residues in length, with fewer than 10% unknown amino acids and no single amino acid type exceeding 50% of the sequence. For each chain cluster, we identify all its interacting heterodimers, retaining only clusters with interacting partners from at least two PDB entries. To prevent promiscuous clusters that interact non-specifically with many targets from dominating the test set, we exclude clusters with interacting partners from more than 30 PDB entries. This step retains 991 unique clusters and 3,246 interacting heterodimers.
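The per-chain filter can be written as a short predicate. This sketch assumes one-letter sequences with `X` marking unknown residues, treats the length bounds as inclusive, and applies the 50% composition limit to known residue types only; the function name `keep_chain` is hypothetical.

```python
from collections import Counter


def keep_chain(sequence, min_len=20, max_len=500,
               max_unknown=0.10, max_single=0.50, unknown="X"):
    """Apply the length, unknown-residue, and low-complexity filters to one chain."""
    n = len(sequence)
    if not (min_len <= n <= max_len):
        return False
    counts = Counter(sequence)
    # "Fewer than 10% unknown": reject when the fraction reaches the limit.
    if counts[unknown] / n >= max_unknown:
        return False
    # "No single amino acid type exceeding 50%": reject only above the limit.
    top = max((c for aa, c in counts.items() if aa != unknown), default=0)
    return top / n <= max_single
```

For example, a 20-residue chain containing each standard amino acid once passes, while a poly-alanine chain or a chain with 2 unknown residues out of 20 is rejected.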

Within each chain cluster (designated as the binder cluster, i.e., the chain to be redesigned), we randomly select one interacting heterodimer as the on-target pair. Each remaining heterodimer in the cluster serves as a candidate off-target pair. We structurally align the binder chains of the on-target and off-target pairs, which belong to the same cluster, using TM-align, and separately align their respective target chains. We discard an off-target candidate if any of the following conditions hold: (1) the target chains share 100% sequence identity (i.e., the on-target and off-target receptors are identical), (2) the binder chain alignment has coverage below 90%, (3) the binder chain alignment has sequence identity below 90%, or (4) the binder chain alignment RMSD exceeds 2.5 Å. After filtering, 691 unique binder clusters remain.

To assess off-target interaction difficulty, we define a difficulty score based on the Jaccard similarity of residue contact pairs (Cα–Cα $\leq 10$ Å) between the on-target and off-target complexes:

$$\text{Difficulty} = \text{JaccardSimilarity}(\mathcal{C}_{\text{on-target}}, \mathcal{C}_{\text{off-target}})$$

where $\mathcal{C}_{\text{on-target}}$ and $\mathcal{C}_{\text{off-target}}$ denote the sets of inter-chain residue contact pairs in the on-target and off-target complexes, respectively. Higher scores indicate more shared interface contacts, making selective design more challenging.
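A minimal sketch of the difficulty score is given below. It assumes that residue indices in the two complexes have already been mapped to a shared numbering (for example via the structural alignments described above), which the code does not perform; the function names and the convention of returning 0 for two empty contact sets are also assumptions.

```python
import math


def contact_pairs(ca_binder, ca_target, cutoff=10.0):
    """Set of (binder_index, target_index) pairs with C-alpha distance <= cutoff (in Å)."""
    return {(i, j)
            for i, p in enumerate(ca_binder)
            for j, q in enumerate(ca_target)
            if math.dist(p, q) <= cutoff}


def jaccard_similarity(a, b):
    """|a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


on_target = {(0, 0), (0, 1), (1, 1)}
off_target = {(0, 1), (1, 1), (2, 2)}
difficulty = jaccard_similarity(on_target, off_target)
```

Here two of the four distinct contact pairs are shared, giving a difficulty of 0.5.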

We retain off-target interactions with Difficulty $< 0.9$, yielding 656 unique binder clusters. Within each cluster, we select the off-target with the lowest Jaccard similarity (i.e., the most dissimilar interface), producing 656 non-redundant on-/off-target pairs. Evaluating each pair requires two AlphaFold3 cofolding passes (one for on-target, one for off-target), each with 10 recycles and 5 diffusion samples, followed by Rosetta relaxation (3 repeats) for both predicted complexes. Given the computational cost, we uniformly sample 180 on-/off-target pairs for our benchmark.

##### Benchmarking Methods

We compare our method against widely adopted fixed\-backbone sequence design methods: ESM\-IF\[[19](https://arxiv.org/html/2605.11189#bibc.bib159)\], ProteinMPNN\[[8](https://arxiv.org/html/2605.11189#bibc.bib158)\], and PiFold\[[14](https://arxiv.org/html/2605.11189#bibc.bib160)\]\.

### 4\.3Results

#### 4\.3\.1All\-atom graph transformer improves sequence recovery of monomeric and dimeric structures

We benchmark RedNet against widely adopted fixed-backbone design models, including ESM-IF, ProteinMPNN, and PiFold. We evaluate native sequence recovery (NSR), wild-type log-likelihood (LL), and perplexity (PPL) across monomers, homodimers, and heterodimers ([Table 4.2](https://arxiv.org/html/2605.11189#Ch4.T2)). ESM-IF and PiFold are trained without noise augmentation, while ProteinMPNN and RedNet are compared across noise levels $\sigma \in \{0, 0.02, 0.1, 0.2, 0.3\}$ to assess robustness to backbone coordinate perturbations.

Table 4.2: Performance comparison on monomers, homodimers, and heterodimers. $\sigma$: backbone coordinate noise level. NSR: Native Sequence Recovery. LL: Log-Likelihood. PPL: Perplexity. For RedNet and ProteinMPNN, we test performance at different noise levels $\sigma \in \{0.02, 0.1, 0.2, 0.3\}$. ESM-IF and PiFold are tested at $\sigma = 0$.

###### Monomers\.

On monomeric structures without noise augmentation, RedNet achieves the highest sequence recovery (NSR = 0.43) compared to ESM-IF (0.38), PiFold (0.40), and ProteinMPNN at $\sigma = 0.02$ (0.36). RedNet achieves the most favorable wild-type log-likelihood (LL = −1.74 vs. −1.85 for ESM-IF and −1.92 for PiFold), indicating better calibration of sequence probabilities on native sequences. At matched noise levels, RedNet matches or outperforms ProteinMPNN: at $\sigma = 0.02$, RedNet achieves NSR = 0.37 versus ProteinMPNN's 0.36; at $\sigma = 0.1$, both models converge to NSR = 0.33, though RedNet maintains a marginally better log-likelihood (−2.01 vs. −2.02). Performance degradation with increasing noise is comparable between models, with NSR decreasing from 0.43 to 0.30 for RedNet across the noise range tested.

###### Homodimers\.

RedNet demonstrates strong performance on homodimers, achieving NSR = 0.49 at $\sigma = 0$, compared to ESM-IF (0.43), PiFold (0.45), and ProteinMPNN at $\sigma = 0.02$ (0.42). RedNet also yields the lowest perplexity (PPL = 3.86 vs. 4.29 for ESM-IF and 6.10 for PiFold), suggesting higher confidence in native sequence predictions at symmetric protein–protein interfaces. The log-likelihood follows the same ordering: RedNet achieves LL = −1.28 compared to −1.35 for ESM-IF and −1.73 for PiFold. Across noise levels, RedNet maintains advantages over ProteinMPNN in perplexity and log-likelihood: at $\sigma = 0.1$, RedNet achieves NSR = 0.40 and PPL = 5.07 versus ProteinMPNN's NSR = 0.40 and PPL = 5.58; at $\sigma = 0.3$, RedNet achieves LL = −1.63 versus −1.76 for ProteinMPNN.

###### Heterodimers\.

The performance gap is largest on heterodimeric interfaces, which are most relevant to one-sided interface design. At $\sigma = 0$, RedNet achieves NSR = 0.43, outperforming ESM-IF (0.33), PiFold (0.35), and ProteinMPNN at $\sigma = 0.02$ (0.37). This represents a 10 percentage point improvement over ESM-IF (30% relative) and a 6 percentage point improvement over ProteinMPNN at $\sigma = 0.02$ (16% relative). The perplexity gap is large: RedNet achieves PPL = 6.58 compared to ESM-IF (PPL = 13.39), PiFold (PPL = 9.16), and ProteinMPNN (PPL = 8.32). Log-likelihood differences follow similar patterns.
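As a worked check, the relative improvements quoted above follow directly from the reported NSR values, taking each baseline's NSR as the denominator:

$$
\frac{0.43 - 0.33}{0.33} \approx 0.30 \quad (\text{ESM-IF}), \qquad \frac{0.43 - 0.37}{0.37} \approx 0.16 \quad (\text{ProteinMPNN},\ \sigma = 0.02)
$$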

###### Robustness to coordinate noise\.

We compare RedNet and ProteinMPNN across increasing noise levels to assess robustness to coordinate perturbations. Both models show the expected degradation with increasing noise: RedNet's heterodimer NSR decreases from 0.43 ($\sigma = 0$) to 0.32 ($\sigma = 0.3$), while ProteinMPNN's decreases from 0.37 ($\sigma = 0.02$) to 0.31 ($\sigma = 0.3$). RedNet maintains consistent advantages on dimer interfaces across all noise levels. At $\sigma = 0.2$, RedNet achieves heterodimer PPL = 8.84 versus ProteinMPNN's PPL = 9.46. At $\sigma = 0.3$, RedNet achieves PPL = 9.27 compared to ProteinMPNN's PPL = 10.12. However, the sequence recovery gap between RedNet and ProteinMPNN narrows as the noise level increases. This suggests that RedNet, which captures side-chain information, is more sensitive to coordinate perturbation than backbone-only models like ProteinMPNN.

#### 4\.3\.2Contrastive scoring improves zero\-shot binding affinity prediction

Previous works have demonstrated the effectiveness of contrastive scoring for zero\-shot monomer stability prediction using ProteinMPNN and ESM\-IF, but have not empirically validated improvements in zero\-shot binding affinity prediction in the context of binder design\. We evaluate fixed\-backbone design models on predicting binding affinity changes upon mutation in a zero\-shot manner using the SKEMPI v2\.0 dataset, benchmarking RedNet against ProteinMPNN, ESM\-IF, and PiFold \(LABEL:tab:skempi\_affinity\)\. Since PiFold and ESM\-IF do not have released models trained with high noise levels \(over 0\.1 Å\), we benchmark all models at lower noise levels \(0 and 0\.02 Å\) for fairness\.

###### Overall performance\.

RedNet ($\sigma = 0.02$) with the contrastive scoring methods `cd_ll` and `cd_ll_ref` (defined as `ll_cd` and `ll_cd_ref` in Section 4.2.3) consistently outperforms all other combinations of model and scoring method across Spearman's $\rho$, Kendall's $\tau$, and NDCG. RedNet achieves the highest Spearman correlation of 0.28 (`cd_ll_ref`), the highest Kendall's $\tau$ of 0.20 (`cd_ll_ref`), and the highest NDCG of 0.81 (`mt`, `ref`). This indicates that RedNet's likelihood estimates using different scoring methods are better aligned with binding affinity than those of competing methods.

###### RedNet consistently outperforms other models\.

RedNet ($\sigma = 0.02$) provides the highest Spearman correlations in `ll` (0.21 vs. 0.17 for ProteinMPNN and ESM-IF) and `global` (0.26 vs. 0.24 for ESM-IF and 0.17 for ProteinMPNN), with more pronounced gains in Kendall's $\tau$, where it leads in five of six scoring methods. The exceptions are `mt` and `ref`, where ProteinMPNN achieves higher Spearman correlations (0.23 and 0.26, respectively, versus 0.22 and 0.24 for RedNet), though RedNet matches or exceeds ProteinMPNN on these metrics in Kendall's $\tau$ and NDCG. These results suggest that RedNet's all-atom featurization provides a consistent advantage over backbone-only models for zero-shot binding affinity prediction.

###### Contrastive scoring improves RedNet but not all models\.

The contrastive methods `cd_ll` and `cd_ll_ref` consistently improve RedNet at both noise levels across all three metrics. In contrast, contrastive scoring degrades ProteinMPNN (Spearman drops from 0.26 with `ref` to 0.24 with `cd_ll_ref`, and from 0.23 with `mt` to 0.10 with `cd_ll`) and ESM-IF (Spearman drops from 0.24 with `global` to 0.15 with `cd_ll` and `cd_ll_ref`). While contrastive scoring does improve PiFold, its baseline correlations are near zero (Spearman `mt` = −0.03, `ref` = −0.12), suggesting that PiFold fails at zero-shot binding affinity prediction altogether and is not a meaningful basis for comparing scoring methods. We hypothesize that RedNet benefits from contrastive scoring because it is trained on both bound complexes and monomers and leverages detailed all-atom structure at interfaces, making it more sensitive to differences between bound and unbound states.

###### Effect of noise augmentation\.

Comparing RedNet at $\sigma = 0.02$ versus $\sigma = 0$, we observe consistent improvements. For `ll`, Spearman's $\rho$ increases from 0.18 to 0.21; for `global`, from 0.23 to 0.26; for `cd_ll`, from 0.23 to 0.26; and for `cd_ll_ref`, from 0.26 to 0.28. Similar gains appear in Kendall's $\tau$ and NDCG. This suggests that training with backbone coordinate noise improves the model's sensitivity to binding affinities, despite yielding lower native sequence recovery.

Overall, contrastive scoring with RedNet trained on all\-atom structures with noise augmentation yields the best zero\-shot binding affinity predictions, motivating the development of decoding methods that can improve contrastive scores and thus the affinity of designed binders\.

#### 4\.3\.3Contrastive decoding improves structural self\-consistency of binders

Table 4.3: Heterodimer self-consistency results on all 107 targets. $\sigma$: backbone coordinate noise level (Å). SR: Success Rate (0–100%), defined as pTM $> 0.55$, ipTM $> 0.5$, and Dsn pLDDT $> 80$, following BindCraft. Dsn pLDDT: AlphaFold3 predicted LDDT for the design chain (0–100). ipTM: interface predicted Template Modeling score (0–1). pTM: AlphaFold3 predicted TM-score of the complex (0–1). Tgt pLDDT: AlphaFold3 predicted LDDT for the target chain (0–100). RedNet-CD uses contrastive decoding with $\alpha = 1$, $\beta = 0.9$. All models are sampled at temperature 0.001. Higher is better for all metrics. Bold: best; underline: second best.

In protein binder design applications \[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\], confidence metrics including pLDDT, ipTM, and pTM predicted by AlphaFold3 are used to filter designs and are shown to effectively guide wet-lab experiments. We withhold MSAs from AlphaFold3 to prevent artificial confidence inflation from evolutionary information, relying instead on target templates. To simulate realistic *de novo* design scenarios, we adopt the default success criteria from BindCraft.

We benchmark RedNet against ESM\-IF, ProteinMPNN, and PiFold on heterodimer design tasks across two datasets: all 107 heterodimeric targets \([Table˜4\.3](https://arxiv.org/html/2605.11189#Ch4.T3)\) and a high\-confidence subset of 44 targets where the AlphaFold3\-predicted pLDDT of the native target chain exceeds 70 \(LABEL:tab:self\_consistency\_hq\)\.

###### All targets\.

On the full 107 heterodimers, all methods face significant challenges: native sequences achieve only a 29% success rate with a mean ipTM of 0.39. ProteinMPNN achieves the highest design chain confidence (Dsn pLDDT = 55.12), followed by ESM-IF (53.77), RedNet (53.46), and RedNet-CD (53.12), all above native sequences (51.31); only PiFold (50.29) underperforms. However, improved monomer confidence does not translate to superior heterodimer self-consistency: ProteinMPNN and RedNet-CD both achieve a 30% success rate.

RedNet-CD (SR = 30%) improves over RedNet with standard ancestral sampling (SR = 25%) by 5 percentage points, both at temperature 0.001. Success rate is a composite measure of design chain stability (Dsn pLDDT), interface quality (ipTM), and overall complex confidence (pTM), reflecting the dual objective of binder redesign: jointly optimizing interface affinity and design chain stability. Contrastive decoding achieves this dual goal more effectively than standard sampling, despite slightly lower individual confidence scores.
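Because the success rate is a conjunction of three thresholds, two methods with similar mean confidence scores can still differ in success rate. The sketch below illustrates this with the thresholds stated in the Table 4.3 caption; the `Prediction` class, the function names, and all numbers in the two toy sets are hypothetical and not taken from the thesis results.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    ptm: float           # complex pTM, 0-1
    iptm: float          # interface pTM, 0-1
    design_plddt: float  # pLDDT of the designed chain, 0-100


def is_success(p, ptm_min=0.55, iptm_min=0.5, plddt_min=80.0):
    """A design succeeds only if it clears all three thresholds."""
    return p.ptm > ptm_min and p.iptm > iptm_min and p.design_plddt > plddt_min


def success_rate(preds):
    """Percentage (0-100) of designs that clear all thresholds."""
    return 100.0 * sum(is_success(p) for p in preds) / len(preds)


def mean_iptm(preds):
    return sum(p.iptm for p in preds) / len(preds)


# Hypothetical sets with identical mean scores but different spread.
method_a = [
    Prediction(0.60, 0.55, 82.0),
    Prediction(0.60, 0.55, 82.0),
    Prediction(0.30, 0.20, 50.0),
    Prediction(0.30, 0.20, 50.0),
]
method_b = [Prediction(0.45, 0.375, 66.0) for _ in range(4)]

sr_a = success_rate(method_a)  # half of the designs pass every threshold
sr_b = success_rate(method_b)  # no design passes, despite equal means
```

In this toy case both sets have a mean ipTM of 0.375, yet only the first set contains designs that pass, which is the kind of effect that a comparison of mean scores alone would hide.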

###### High\-quality targets\.

We note that on the full 107-target set, AlphaFold3 produces low-confidence predictions for many targets (average target-chain pLDDT of 62.61), indicating that the evaluation may be bottlenecked by structure prediction quality rather than design quality alone. To disentangle design capability from structure prediction error, we analyze a subset of 44 complexes where native sequences yield high-confidence predictions (pLDDT > 70). RedNet-CD achieves a 68% success rate, matching native sequences and outperforming all other methods: ProteinMPNN (59%), ESM-IF (61%), and PiFold (64%), with relative improvements of 15%, 11%, and 6%, respectively.

Contrastive decoding significantly improves RedNet’s heterodimer self\-consistency, increasing success rate from 57% \(standard sampling\) to 68%, an absolute improvement of 11 percentage points \(19% relative\)\.
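The absolute and relative improvements quoted for the high-quality subset follow directly from the rounded success rates. As a small check (the helper names are hypothetical, and the inputs are the rounded percentages from the text, so the results are approximate):

```python
def absolute_gain(new, old):
    """Difference in percentage points."""
    return new - old


def relative_gain(new, old):
    """Improvement relative to the baseline, in percent."""
    return 100.0 * (new - old) / old


# Rounded success rates (%) reported for the 44 high-quality targets.
rednet_cd = 68
baselines = {"RedNet": 57, "ProteinMPNN": 59, "ESM-IF": 61, "PiFold": 64}

gains = {
    name: (absolute_gain(rednet_cd, sr), round(relative_gain(rednet_cd, sr)))
    for name, sr in baselines.items()
}
```

Reporting both figures is useful here because a relative gain looks larger when the baseline success rate is low.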

###### Comparison with ProteinMPNN on high\-quality targets\.

ProteinMPNN produces competitive monomer confidence (Dsn pLDDT = 67.48 vs. 67.71 for RedNet-CD) for designed binder chains, yet RedNet-CD achieves a substantially higher success rate (68% vs. 59%, a 15% relative gain) for the complete complexes. This suggests that RedNet’s designs are more consistently above the multi-objective success thresholds (pLDDT > 80, pTM > 0.55, ipTM > 0.5), even when mean scores are comparable. The side-chain context and contrastive decoding in RedNet appear to provide critical information for optimizing interface interactions while maintaining binder chain stability, achieving native-level heterodimer self-consistency where backbone-only models and standard ancestral sampling fall short.

Table 4.4: Energetics and geometric properties of designed binders. Binding Score (REU): Rosetta binding score. Int SC (0–1): interface shape complementarity. Int Packstat (0–1): interface packing statistic. Int dG (REU): interface free energy change. Int dSASA (Å²): interface buried solvent-accessible surface area. REU: Rosetta Energy Units. Due to Rosetta relaxation failures, we analyze 91 of 107 heterodimers that are successfully relaxed for all methods. RedNet-Ens combines RedNet and RedNet-CD by selecting the design with the best binding score. Bold: best; underline: second best.

Table 4.5: Hydrophobicity and hydrogen-bond properties of designed interfaces. Surf Hydro (0–1): surface hydrophobicity. Int Nres: number of interface residues. Int HBonds: number of interface hydrogen bonds. Int HBond %: percentage of interface residues involved in hydrogen bonds. Int dUnsat HB: number of unsatisfied interface hydrogen bonds. Int dUnsat HB %: percentage of unsatisfied interface hydrogen bonds. Due to Rosetta relaxation failures, we analyze 91 of 107 heterodimers that are successfully relaxed for all methods. RedNet-Ens combines RedNet and RedNet-CD by selecting the design with the best binding score. Bold: best; underline: second best.

###### Energetics and biochemical properties\.

AlphaFold3’s confidence metrics do not correlate well with stability or affinity and are not sensitive to mutational changes; higher self-consistency does not necessarily imply better binding. Other biochemical properties, such as surface hydrophobicity and aggregation propensity, are also important for practical binder design. We therefore compute energetics and biochemical properties using Rosetta following the BindCraft protocol \[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\]. Due to stochasticity in Rosetta relaxation, we run three repeats per design and select the structure with the lowest energy. We additionally construct RedNet-Ens, which combines RedNet and RedNet-CD by selecting the design with the best binding score for each target ([Tables 4.4](https://arxiv.org/html/2605.11189#Ch4.T4) and [4.5](https://arxiv.org/html/2605.11189#Ch4.T5)).
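The two selection steps described above (best of three relaxation repeats, then the per-target ensemble) can be sketched as follows. This is an illustration under stated assumptions rather than the thesis pipeline: the dictionary keys `total_energy` and `binding_score` are hypothetical, "lowest energy" is assumed to mean the lowest total Rosetta score, "best binding score" is assumed to mean the most negative one, and all numbers are made up.

```python
def best_of_repeats(repeats):
    """Keep the relaxed structure with the lowest total energy (assumed criterion)."""
    return min(repeats, key=lambda r: r["total_energy"])


def ensemble_select(candidates):
    """For one target, pick the method whose design has the lowest binding score."""
    method = min(candidates, key=lambda m: candidates[m]["binding_score"])
    return method, candidates[method]


# Hypothetical Rosetta results (REU) for one target, three repeats per method.
rednet_repeats = [
    {"total_energy": -410.2, "binding_score": -176.0},
    {"total_energy": -415.9, "binding_score": -179.5},
    {"total_energy": -412.4, "binding_score": -181.0},
]
rednet_cd_repeats = [
    {"total_energy": -420.1, "binding_score": -183.2},
    {"total_energy": -418.0, "binding_score": -185.0},
    {"total_energy": -419.3, "binding_score": -180.7},
]

candidates = {
    "RedNet": best_of_repeats(rednet_repeats),
    "RedNet-CD": best_of_repeats(rednet_cd_repeats),
}
chosen_method, chosen_design = ensemble_select(candidates)
```

Note that the repeat with the lowest total energy need not be the one with the best binding score, so the order of the two selection steps matters for the reported numbers.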

###### Binding energetics\.

RedNet-Ens achieves the most favorable binding score (−188.04 REU), outperforming all individual models including ProteinMPNN (−184.89), RedNet-CD (−182.26), ESM-IF (−181.27), and native sequences (−172.42). Contrastive decoding improves RedNet’s binding score from −179.88 to −182.26, indicating that the contrastive objective steers sequence selection toward more energetically favorable complexes.

Interface shape complementarity is comparable across methods (0.64–0.67), with PiFold and RedNet-Ens achieving the highest values (0.67). ProteinMPNN and RedNet-Ens have the best packing (Int Packstat = 0.55). For interface energy, RedNet-Ens (−56.66 REU) and PiFold (−55.19) produce the most favorable values, both surpassing native interfaces (−53.35). Notably, ProteinMPNN yields the least favorable Int dG (−46.98) despite achieving the best packing, suggesting its designs may over-optimize local packing at the expense of global interface energetics.

RedNet-Ens produces the largest buried surface area (Int dSASA = 1965.96 Å²), exceeding native interfaces (1918.33 Å²), while ProteinMPNN yields the smallest (1682.12 Å²), indicating reduced interface coverage.

Contrastive decoding consistently improves RedNet across interface metrics: binding score improves from −179.88 to −182.26, interface free energy from −52.49 to −54.47, and buried surface area from 1868.29 to 1894.69 Å², while maintaining comparable shape complementarity (0.66) and packing (0.54).

###### Matching native surface hydrophobicity\.

High surface hydrophobicity may cause aggregation, reduced solubility, and unwanted off\-target binding; thus excessive hydrophobicity of binders is undesirable in practice\. Native interfaces exhibit the lowest surface hydrophobicity \(0\.43\), suggesting a balanced mix of polar and nonpolar residues\. ProteinMPNN \(0\.43\), RedNet\-CD, and RedNet\-Ens \(0\.44\) closely match native surface hydrophobicity, while PiFold is the most hydrophobic \(0\.47\)\.

###### RedNet\-CD and RedNet\-Ens improve hydrogen bond formation\.

Hydrogen bonds at protein–protein interfaces contribute to binding affinity: buried polar atoms that lack hydrogen bond partners incur desolvation penalties that destabilize the complex \[[27](https://arxiv.org/html/2605.11189#bibc.bib119)\]. BindCraft accordingly filters for sufficient interface hydrogen bonds (> 2) and penalizes unsatisfied ones (< 3) \[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\].

RedNet\-Ens forms the most interface hydrogen bonds \(7\.31, 48\.99% of interface residues\), exceeding native interfaces \(6\.97, 46\.15%\)\. RedNet\-CD also surpasses native \(7\.01\), while RedNet \(6\.21\) and ProteinMPNN \(5\.44\) produce fewer, suggesting that contrastive decoding promotes hydrogen bond formation as a strategy to improve binding, compared to other approaches that rely more on hydrophobic packing\.

RedNet\-Ens also achieves the lowest percentage of unsatisfied hydrogen bonds \(14\.27%\), outperforming all methods and native structures \(17\.90%\)\. RedNet achieves the fewest absolute unsatisfied hydrogen bonds \(2\.44\), followed by ProteinMPNN \(2\.52\)\.

Overall, RedNet\-CD and RedNet\-Ens produce designs with native\-like or superior energetic and biochemical properties, validating that contrastive decoding and ensembling different sampling strategies can yield high\-quality interfaces without increasing surface hydrophobicity\.

#### 4.3.4 Contrastive decoding improves binding specificities of binders

To our knowledge, no existing benchmarks or standardized metrics have been developed for computationally evaluating the binding specificity of designed protein binders. To address this gap, we assemble a selective binder test set from heterodimers in the PDB (detailed in [Section 4.2.4](https://arxiv.org/html/2605.11189#Ch4.S2.SS4.SSSx2.Px2)). Each test case consists of an on-target and an off-target receptor that share structural similarity, requiring the designed binder to discriminate between them based on subtle differences in the interface.

We evaluate specificity using two complementary approaches\. First, we employ the Rosetta energy function to compute binding scores and assess energetic selectivity, as AlphaFold3’s confidence metrics lack the sensitivity to detect mutational changes of binding affinities\. Second, we use AlphaFold3\-based co\-folding tests to measure structural selectivity via ipTM thresholds\.

Table 4.6: Selectivity measured by Rosetta binding score difference (on-target − off-target). A negative value indicates the binder prefers the on-target. SR (Diff < X): percentage of cases where the on-target preference exceeds the energy gap threshold X. $\sigma$: backbone coordinate noise level (Å). Higher is better for all metrics. Bold: best; underline: second best.

###### Energetic analysis.

Following BindCraft \[[28](https://arxiv.org/html/2605.11189#bibc.bib161)\], we define a composite Binder Score as the sum of the binding free energy ($\Delta G_{\text{binding}}$) and the folding free energy of the designed binder chain ($\Delta G_{\text{binder}}$). Equivalently, this is the energy of the complex minus the energy of the receptor ($\Delta G_{\text{complex}} - \Delta G_{\text{receptor}}$). This metric captures the total energetic contribution of adding the binder to the system, encompassing both the strength of the interface interactions and the intrinsic stability of the binder in its bound conformation. For a binder to be selective, it must exhibit a lower (more favorable) Binder Score when interacting with the on-target receptor compared to the off-target receptor. The score difference ($\text{Score}_{\text{on}} - \text{Score}_{\text{off}}$) thus provides a direct measure of energetic selectivity: a negative value indicates a preference for the on-target ([Table 4.6](https://arxiv.org/html/2605.11189#Ch4.T6)).
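The equivalence between the two forms of the Binder Score follows if the binding free energy is taken as the energy of the complex minus the energies of the separated partners. That convention is standard but is an assumption here, since this section does not write it out:

$$
\Delta G_{\text{binding}} = \Delta G_{\text{complex}} - \Delta G_{\text{receptor}} - \Delta G_{\text{binder}}
$$

Adding the binder folding term then cancels $\Delta G_{\text{binder}}$:

$$
\text{Binder Score} = \Delta G_{\text{binding}} + \Delta G_{\text{binder}} = \Delta G_{\text{complex}} - \Delta G_{\text{receptor}}
$$

The selectivity measure is the difference of two such scores, with $\Delta_{\text{sel}}$ used as shorthand introduced only for this note:

$$
\Delta_{\text{sel}} = \text{Score}_{\text{on}} - \text{Score}_{\text{off}}, \qquad \Delta_{\text{sel}} < 0 \;\Rightarrow\; \text{on-target preferred}
$$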

We evaluate selectivity at three stringency thresholds on 54 on\-/off\-target pairs filtered to a Jaccard interface similarity of 0\.5\. In the native control group, 50% of native binders exhibit a lower on\-target Binder Score, which is close to the random baseline and validates that our benchmark is well\-calibrated: native sequences, not having been optimized for selectivity between structurally similar receptors, serve as a sensible control for this energetic discrimination task\.
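The thresholded success rates can be computed from per-pair Binder Scores as sketched below. The function name and the toy score pairs are hypothetical; only the thresholds (0, −5, and −10) and the pair count of 54 come from the text.

```python
def selectivity_success_rate(score_pairs, threshold=0.0):
    """Percentage of pairs with Score_on - Score_off strictly below threshold."""
    diffs = [on - off for on, off in score_pairs]
    return 100.0 * sum(d < threshold for d in diffs) / len(diffs)


# Hypothetical (on-target, off-target) Binder Scores in REU.
pairs = [
    (-180.0, -170.0),  # diff = -10
    (-175.0, -178.0),  # diff = +3, prefers the off-target
    (-190.0, -182.0),  # diff = -8
    (-160.0, -160.0),  # diff = 0, no preference
]

sr_base = selectivity_success_rate(pairs, threshold=0.0)
sr_moderate = selectivity_success_rate(pairs, threshold=-5.0)
sr_strict = selectivity_success_rate(pairs, threshold=-10.0)

# With 54 pairs, each additional success moves the rate by about 1.85 points.
example_rate = round(100.0 * 35 / 54, 2)
```

The reported percentages are consistent with integer counts out of 54 pairs; for example, 35 successes out of 54 gives 64.81%, and 27 out of 54 gives the 50% observed for native binders.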

At the base threshold (Diff < 0), RedNet-CD achieves the highest success rate at 64.81%, a 94% relative improvement over RedNet without contrastive decoding (33.33%), and outperforming PiFold (55.56%, +17% relative), ProteinMPNN (53.70%, +21% relative), ESM-IF (53.70%, +21% relative), and native sequences (50.00%, +30% relative).

At the moderate threshold (Diff < −5), RedNet-CD maintains its lead at 51.85%, an 87% relative improvement over RedNet (27.78%). It also outperforms ProteinMPNN (44.44%, +17% relative), ESM-IF (42.59%, +22% relative), PiFold (40.74%, +27% relative), and native sequences (33.33%, +56% relative).

At the strictest threshold (Diff < −10), ESM-IF achieves the highest success rate at 35.19%, followed by RedNet-CD at 33.33%, which is still a 50% relative improvement over RedNet (22.22%) and comparable to ProteinMPNN (31.48%).

Overall, RedNet-CD leads at the base and moderate thresholds (64.81% at Diff < 0 (vs. 55.56% for PiFold) and 51.85% at Diff < −5 (vs. 44.44% for ProteinMPNN)), while nearly doubling baseline RedNet (33.33% and 27.78%). Only at the strictest threshold (Diff < −10) does ESM-IF (35.19%) narrowly lead RedNet-CD (33.33%). This demonstrates that contrastive decoding specifically enhances the model’s ability to discriminate between structurally similar on-target and off-target interactions.

Table 4.7: Selectivity success measured by AlphaFold3 cofolding. $\sigma$: backbone coordinate noise level (Å). Selectivity: proportion where on-target ipTM > 0.55 and off-target ipTM < 0.55. On-Target: proportion where on-target ipTM > 0.55. Off-Target: proportion where off-target ipTM > 0.55. Bold: best; underline: second best.

###### Co\-folding analysis\.

We also evaluate selectivity via the co-folding test ([Table 4.7](https://arxiv.org/html/2605.11189#Ch4.T7)). It is worth noting that AlphaFold3’s predictions and confidences are not sensitive to mutational changes and do not discriminate folding or binding free energies accurately. RedNet-CD achieves the highest selectivity rate at 9.26%. However, both RedNet and RedNet-CD ($\sigma=0.02$) have lower on-target success rates than ProteinMPNN and ESM-IF, only exceeding PiFold and matching the native control group. This is expected for RedNet-CD, since it optimizes for selectivity instead of on-target binding.
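The three co-folding columns defined in the Table 4.7 caption can be computed from paired ipTM values as in the sketch below. The function name, the dictionary keys, and the toy ipTM pairs are hypothetical; only the 0.55 cutoff comes from the caption.

```python
def cofolding_rates(iptm_pairs, cutoff=0.55):
    """Return selectivity, on-target, and off-target proportions (0-1)."""
    n = len(iptm_pairs)
    on_hits = sum(on_iptm > cutoff for on_iptm, _ in iptm_pairs)
    off_hits = sum(off_iptm > cutoff for _, off_iptm in iptm_pairs)
    selective = sum(
        on_iptm > cutoff and off_iptm < cutoff for on_iptm, off_iptm in iptm_pairs
    )
    return {
        "selectivity": selective / n,
        "on_target": on_hits / n,
        "off_target": off_hits / n,
    }


# Hypothetical (on-target ipTM, off-target ipTM) pairs.
iptm_pairs = [
    (0.70, 0.30),  # binds on-target only: selective
    (0.80, 0.60),  # binds both: not selective
    (0.40, 0.20),  # binds neither: not selective
    (0.60, 0.58),  # binds both: not selective
]
rates = cofolding_rates(iptm_pairs)
```

Selectivity is bounded above by the on-target rate, so a method with a low on-target rate can only achieve a low selectivity under this definition, which helps explain why the selectivity rates in this test are small.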

We note several limitations of this benchmark\. First, the 180 evaluated pairs are uniformly sampled from 656 candidates, which may not fully represent the diversity of selective design challenges in the PDB\. Second, the Rosetta energy function used for energetic selectivity evaluation has known biases, which may favor certain types of interfaces over others\. Third, AlphaFold3’s confidence metrics are not sensitive to mutational changes in binding affinity, limiting the informativeness of the co\-folding evaluation\. Despite these limitations, the benchmark provides a standardized framework for comparing selectivity across design methods, and we expect it to be refined as more accurate scoring functions become available\.

#### 4.3.5 Structural analysis of redesigned selective binder

We investigate how contrastive decoding enables redesigning specific binders through two case studies ([Figures 4.2](https://arxiv.org/html/2605.11189#Ch4.F2) and [4.3](https://arxiv.org/html/2605.11189#Ch4.F3)).

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/rednet/6FOE-5WHJ.png)

Figure 4.2: Structural analysis of the 6FOE–5WHJ selective binder pair. (A) Interactions of redesigned binders (red for the design chain of the on-target complex and white for the design chain of the off-target complex) to their respective on-target (cyan) and off-target (grey) partners. (B) Interactions of native binders to their respective on-target and off-target partners.

###### Case 1: 6FOE–5WHJ (Fab).

The first pair is 6FOE (on-target) and 5WHJ (off-target), both of which are Fab complexes. Compared to the native binder, the redesigned binder mutates 4 contiguous residues at the interface from SQLY to GYRN. In [Figure 4.2](https://arxiv.org/html/2605.11189#Ch4.F2)(A), we observe that the redesigned binder tends to form more favorable interactions by mutating L to R to exploit amino acids like F and W on the target chain of 6FOE, while the multipoint mutations are not favorable for the off-target partner. Despite the backbone structures of the target chains being similar (RMSD = 1.89 Å), RedNet-CD is capable of exploiting side-chain differences to enhance on-target interactions.

![Refer to caption](https://arxiv.org/html/2605.11189v1/assests/figs/rednet/5FFN-1LW6.png)

Figure 4.3: Structural analysis of the 5FFN–1LW6 selective binder pair. (A) Interactions of redesigned binders (red for the design chain of the on-target complex and white for the design chain of the off-target complex) to their respective on-target (cyan) and off-target (grey) partners. (B) Interactions of native binders to their respective on-target and off-target partners.

###### Case 2: 5FFN–1LW6 \(Subtilisin\)\.

The second pair is 5FFN (on-target) and 1LW6 (off-target). The target chains of 5FFN and 1LW6 are both Subtilisin, and the native binders in 5FFN and 1LW6 are chymotrypsin inhibitors CI2A and CI2, respectively. The interfaces of the two complexes share significant similarities (Jaccard similarity = 0.38), and the two target chains are structurally similar (RMSD = 2.08 Å). A segment on the interface of the redesigned binder is mutated from QV to ET. In [Figure 4.3](https://arxiv.org/html/2605.11189#Ch4.F3)(A), we observe that the redesigned binder is able to retain strong interactions with the polar interfaces of the on-target partner (SSA), while it has poor interactions with the off-target partner (ET). This demonstrates that RedNet-CD can improve specificity by destabilizing the off-target interfaces.
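For readers unfamiliar with the Jaccard similarity quoted for the two interfaces, it is the size of the intersection of two sets divided by the size of their union. The sketch below assumes that each interface is represented as a set of aligned residue positions; that representation is an assumption for illustration (the benchmark construction is described in Section 4.2.4), and the residue indices are made up.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical aligned interface residue positions for two complexes.
interface_on = {1, 2, 3, 4}
interface_off = {3, 4, 5, 6}
similarity = jaccard(interface_on, interface_off)  # 2 shared of 6 total
```

A value of 1 would mean the two interfaces involve exactly the same positions, while 0 would mean they share none.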

### 4.4 Conclusion

We have presented RedNet, a framework for fixed\-backbone binder sequence design that incorporates an all\-atom graph transformer architecture and a contrastive decoding algorithm\. The all\-atom graph transformer captures side\-chain information from the target, enabling improved sequence recovery over existing methods, particularly on heterodimeric interfaces that are most relevant to one\-sided binder design\. Contrastive scoring approaches improve zero\-shot binding affinity prediction, and contrastive decoding further enhances heterodimer self\-consistency and energetics, producing designs with native\-like or superior physicochemical properties\. On a newly curated selective binder benchmark, we demonstrate that contrastive decoding can discriminate between structurally similar on\-target and off\-target receptors by exploiting subtle side\-chain differences at the interface\. The flexibility of the contrastive decoding framework could enable a broader range of multistate design tasks without requiring model retraining\.

## References

- \[1\] R. F. Alford, A. Leaver-Fay, J. R. Jeliazkov, M. J. O’Meara, F. P. DiMaio, H. Park, M. V. Shapovalov, P. D. Renfrew, V. K. Mulligan, K. Kappel, et al. (2017) The Rosetta all-atom energy function for macromolecular modeling and design. Journal of Chemical Theory and Computation 13(6), pp. 3031–3048. External Links: [Document](https://dx.doi.org/10.1021/acs.jctc.7b00125).
- \[2\] U. Alon and E. Yahav (2021) On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations (ICLR). External Links: [Link](https://openreview.net/forum?id=i80OPhOC9v).
- \[3\] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne (2000) The Protein Data Bank. Nucleic Acids Research 28(1), pp. 235–242.
- \[4\] S. Brody, U. Alon, and E. Yahav (2022) How attentive are graph attention networks? In International Conference on Learning Representations (ICLR). External Links: [Link](https://openreview.net/forum?id=F72ximsx7C1), [Document](https://dx.doi.org/10.48550/arXiv.2105.14491).
- \[5\] T. Cai, S. Luo, K. Xu, D. He, T. Liu, and L. Wang (2021) GraphNorm: a principled approach to accelerating graph neural network training. In International Conference on Machine Learning (ICML). External Links: [Link](https://proceedings.mlr.press/v139/cai21e.html).
- \[6\] A. Chevalier, D. Silva, G. J. Rocklin, D. R. Hicks, R. Gebabla, et al. (2017) Massively parallel de novo protein design for targeted therapeutics. Nature 550(7674), pp. 74–79. External Links: [Document](https://dx.doi.org/10.1038/nature23912).
- \[7\] A. E. Chu, T. Lu, and P. Huang (2024) Sparks of function by de novo protein design. Nature Biotechnology 42(2), pp. 203–215.
- \[8\] J. Dauparas, I. Anishchenko, N. Bennett, et al. (2022) Robust deep learning-based protein sequence design using ProteinMPNN. Science 378(6615), pp. 49–56.
- \[9\] A. Deng, K. D. Householder, F. Wu, K. C. Garcia, and B. L. Trippe (2025) Predicting mutational effects on protein binding from folding energy. In Proceedings of the 42nd International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, Vol. 267, pp. 13129–13151. External Links: [Link](https://proceedings.mlr.press/v267/deng25d.html).
- \[10\] H. Dieckhaus, M. Brocidiacono, N. Z. Randolph, and B. Kuhlman (2024) Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proceedings of the National Academy of Sciences 121(6), pp. e2314853121.
- \[11\] O. Dutton, S. Bottaro, M. Invernizzi, I. Redl, A. Chung, F. Hoffmann, L. Henderson, S. Ruschetta, F. Airoldi, B. M. J. Owens, P. Foerch, C. Fisicaro, and K. Tamiola (2024) Improving inverse folding models at protein stability prediction without additional training or data. In NeurIPS 2024 Workshop on Machine Learning in Structural Biology (MLSB). External Links: [Link](https://www.mlsb.io/papers_2024/Improving_Inverse_Folding_models_at_Protein_Stability_Prediction_without_additional_Training_or_Data.pdf).
- \[12\] V. P. Dwivedi and X. Bresson (2020) A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699.
- \[13\] J. Frellsen, M. M. Kassem, T. Bengtsen, et al. (2025) Zero-shot protein stability prediction by inverse folding models: a free energy interpretation. arXiv preprint arXiv:2506.05596.
- \[14\] Z. Gao, C. Tan, and S. Z. Li (2023) PiFold: toward effective and efficient protein inverse folding. In International Conference on Learning Representations (ICLR). External Links: [Link](https://openreview.net/forum?id=oMsN9TYwJ0j).
- \[15\] M. K. Gilson, J. A. Given, B. L. Bush, and J. A. McCammon (1997) The statistical-thermodynamic basis for predicting binding affinities: a physical framework. Biophysical Journal 72(3), pp. 1047–1069. External Links: [Document](https://dx.doi.org/10.1016/S0006-3495%2897%2978756-2).
- \[16\] A. B. Guo, L. A. Kidd, A. J. Borst, I. Redl, G. Ueda, X. Li, S. Chang, J. A. Fallas, T. Kortemme, and D. Baker (2025) Deep learning–guided design of dynamic proteins. Science 388(6749), pp. eadr7094. External Links: [Document](https://dx.doi.org/10.1126/science.adr7094).
- \[17\] M. A. Hallen, J. W. Martin, A. Ojewole, J. D. Jou, A. U. Lowegard, M. S. Frenkel, P. Gainza, H. M. Nisonoff, A. Mukund, S. Wang, et al. (2018) OSPREY 3.0: open-source protein redesign for you, with powerful new features. Journal of Computational Chemistry 39(30), pp. 2494–2507. External Links: [Document](https://dx.doi.org/10.1002/jcc.25522).
- \[18\] L. Hong and T. Kortemme (2024) An integrative approach to protein sequence design through multiobjective optimization. PLoS Computational Biology 20(7), pp. e1011953. External Links: [Document](https://dx.doi.org/10.1371/journal.pcbi.1011953).
- \[19\] C. Hsu, R. Verkuil, J. Liu, Z. Lin, B. L. Hie, T. Sercu, A. Lerer, and A. Rives (2022) Learning inverse folding from millions of predicted structures. bioRxiv. External Links: [Link](https://api.semanticscholar.org/CorpusID:248151599).
- \[20\] E. L. Humphris and D. J. Mandell (2005) A Rosetta-based algorithm for multi-state design of proteins. Structure 13(2), pp. 313–323. External Links: [Document](https://dx.doi.org/10.1016/j.str.2004.12.003).
- \[21\] J. Ingraham, V. Garg, R. Barzilay, and T. Jaakkola (2019) Generative models for graph-based protein design. Advances in Neural Information Processing Systems 32.
- \[22\] T. Kortemme (2024) De novo protein design—from new structures to programmable functions. Cell 187(18), pp. 4934–4953.
- \[23\] B. Kuhlman and P. Bradley (2019) Advances in protein structure prediction and design. Nature Reviews Molecular Cell Biology 20(11), pp. 681–697.
- \[24\] Q. Li, Z. Han, and X. Wu (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
- \[25\] X. L. Li, A. Holtzman, D. Fried, P. Liang, J. Eisner, T. Hashimoto, L. Zettlemoyer, and M. Lewis (2023) Contrastive decoding: open-ended text generation as optimization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, pp. 12286–12312. External Links: [Link](https://aclanthology.org/2023.acl-long.687).
- \[26\] S. O’Brien and M. Lewis (2023) Contrastive decoding improves reasoning in large language models. arXiv preprint arXiv:2309.09117. External Links: [Link](https://arxiv.org/abs/2309.09117).
- \[27\] C. N. Pace, H. Fu, K. Lee Fryar, J. Landua, S. R. Trevino, D. Schell, R. L. Thurlkill, S. Imura, J. M. Scholtz, K. Gajiwala, et al. (2014) Contribution of hydrogen bonds to protein stability. Protein Science 23(5), pp. 652–661. External Links: [Document](https://dx.doi.org/10.1002/pro.2449).
- \[28\] M. Pacesa, L. Nickel, C. Schellhaas, J. Schmidt, E. Pyatova, L. Kissling, P. Barendse, J. Choudhury, S. Kapoor, A. Alcaraz-Serna, et al. (2025) One-shot design of functional protein binders with BindCraft. Nature 646(8084), pp. 483–492.
- \[29\] H. Park, P. Bradley, P. Greisen Jr., Y. Liu, D. Baker, and F. DiMaio (2016) Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. Journal of Chemical Theory and Computation 12(12), pp. 6201–6212. External Links: [Document](https://dx.doi.org/10.1021/acs.jctc.6b00819).
- \[30\] L. Rampášek, M. Galkin, V. P. Dwivedi, A. T. Luu, G. Wolf, and D. Beaini (2022) Recipe for a general, powerful, scalable graph transformer. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 35, pp. 14501–14515. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2022/hash/5df679720f4c330f8a96c9053075c742-Abstract-Conference.html).
- \[31\] M. Steinegger and J. Söding (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35(11), pp. 1026–1028. External Links: [Document](https://dx.doi.org/10.1038/nbt.3988).
- \[32\] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In International Conference on Learning Representations (ICLR). External Links: [Link](https://openreview.net/forum?id=rJz0AsC5M).
- \[33\] J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, et al. (2023) De novo design of protein structure and function with RFdiffusion. Nature 620(7976), pp. 1089–1100.
- \[34\] S. J. Wodak, E. Paci, M. Karplus, C. P. Mullinax, D. Perahia, L. C. Remer, B. Roux, J. C. Smith, W. Thiel, and R. Elber (2019) Allostery in its many disguises: from theory to applications. Structure 27(4), pp. 566–578. External Links: [Document](https://dx.doi.org/10.1016/j.str.2019.01.003).
- \[35\] Z. Xie and J. Xu (2022) Deep graph learning of inter-protein contacts. Bioinformatics 38(4), pp. 947–953.
- \[36\] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks? In International Conference on Learning Representations (ICLR). External Links: [Link](https://openreview.net/forum?id=ryGs6iA5Km).
- \[37\] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and T. Liu (2021) Do transformers really perform bad for graph representation? In Advances in Neural Information Processing Systems (NeurIPS), Vol. 34, pp. 28877–28888.
- \[38\] V. Zambaldi, D. La, A. E. Chu, H. Patani, A. E. Danson, T. O. C. Kwan, T. Frerix, R. G. Schneider, D. Saxton, A. Thillaisundaram, Z. Wu, I. Moraes, O. Lange, E. Papa, G. Stanton, V. Martin, S. Singh, L. H. Wong, R. Bates, S. A. Kohl, J. Abramson, A. W. Senior, Y. Alguel, M. Y. Wu, I. M. Aspalter, K. Bentley, D. L. V. Bauer, P. Cherepanov, D. Hassabis, P. Kohli, R. Fergus, and J. Wang (2024) De novo design of high-affinity protein binders with AlphaProteo. arXiv preprint arXiv:2409.08022. External Links: [Link](https://arxiv.org/abs/2409.08022).
- \[39\] J. Zhou, C. Q. Le, Y. Zhang, and J. A. Wells (2024) A general approach for selection of epitope-directed binders to proteins. Proceedings of the National Academy of Sciences 121(19), pp. e2317307121.
- \[40\] K. Zhou, X. Huang, Y. Li, D. Zha, R. Chen, and X. Hu (2020) Towards deeper graph neural networks with differentiable group normalization. In Advances in Neural Information Processing Systems (NeurIPS). External Links: [Link](https://proceedings.neurips.cc/paper/2020/hash/33dd6dba1d56e826aac1cbf23cdcca87-Abstract.html).

## Chapter 5 Conclusions and Future Directions

This thesis investigates two fundamental aspects of modeling the sequence\-structure relationship of protein complexes using deep learning: domain\-specific architectures and search algorithms\. These two aspects are closely related to the longstanding problems of parameterization and sampling in modeling protein dynamics and protein design\.

Protein structures are hierarchical, spanning atoms and residues, single domains and chains, to multi\-chain assemblies\. This hierarchy enables the design of domain\-specific deep learning architectures that extract the most from biomolecular data such as protein structures in the PDB for protein complex prediction and design\. We explore how graph neural networks and transformers with domain\-specific inductive biases can accurately and efficiently model different aspects of protein structure\.

Proteins are not only physical objects but also evolutionary and contextual ones\. Different aspects of proteins are measured by different experimental techniques, so deep learning methods need to integrate multiple modalities of experimental data in order to make biologically relevant inferences about protein conformational changes and functional effects, accurately and efficiently across contexts\. We develop deep learning\-based algorithms that integrate protein structures, evolutionary information such as multiple sequence alignments of monomers and inferred paralogs, and assay data, demonstrating the effectiveness of deep learning in extracting complementary information from diverse data sources\.

Protein complex structure prediction and structure\-based binder design are closely related problems: the former aims to model essential aspects of the distribution of protein complex conformations given sequences, and the latter solves the corresponding inverse problem\. We successfully apply the multiscale architectures developed for modeling protein complexes to the inverse design problem, further demonstrating the generality of these deep learning approaches for protein complex modeling and design\.

Another core aspect of modeling protein complexes is the search problem\. Both the sequence and structure spaces of proteins are vast, and efficiently sampling these spaces using deep learning models is an active area of research\. In GLINTER \(chapter 2\), we use predicted interfacial contacts to constrain the search over docking poses; recent works have also explored using predicted protein contacts to guide the search for protein complex conformations and protein\-protein interactions with pretrained models\. In ESMPair \(chapter 3\), we demonstrate how to use deep learning models as scoring functions and transferable heuristics to guide the search for plausible interacting paralog pairs, boosting heterodimer prediction accuracy\. In RedNet \(chapter 4\), we develop sampling algorithms based on thermodynamic principles and autoregressive architectures to improve binding affinity and specificity, two core engineering parameters in protein binder design applications\.

### 5\.1 Future improvements

###### Efficient deep learning architectures for all\-atom biomolecular structures\.

AlphaFold2 established domain\-specific end\-to\-end architectures for protein structure modeling, and AlphaFold3 effectively extended AlphaFold2 to additional chemical modalities, including nucleic acids, small molecules, glycans, and other common post\-translational modifications such as phosphorylation\. However, many challenges remain in current AlphaFold architectures and their variants\. Storing pairwise activations is memory\-intensive, pairwise attention and multiplicative updates are costly at inference time, and even naive full attention becomes expensive for large complexes and all\-atom conformations\. Graph neural networks, as explored in our work, and other sparse architectures such as linear attention and local attention are attractive, and even necessary, alternatives, particularly for modeling and simulating all\-atom structures and large complexes\.
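To make the scaling argument concrete, the following back\-of\-the\-envelope sketch compares the memory of one dense pair tensor with that of edge features on a k\-nearest\-neighbor graph\. The channel width \(128\), neighbor count \(32\), and 2\-byte precision are illustrative assumptions, not values taken from AlphaFold, GLINTER, or RedNet; the function names are hypothetical\.

```python
def dense_pair_bytes(n_tokens, channels, bytes_per_value=2):
    """Bytes for one dense pair tensor of shape (n_tokens, n_tokens, channels)."""
    return n_tokens * n_tokens * channels * bytes_per_value


def sparse_edge_bytes(n_tokens, k_neighbors, channels, bytes_per_value=2):
    """Bytes for edge features on a k-NN graph: shape (n_tokens * k_neighbors, channels)."""
    return n_tokens * k_neighbors * channels * bytes_per_value


if __name__ == "__main__":
    # Assumed sizes for illustration only.
    for n in (512, 2048, 8192):
        dense = dense_pair_bytes(n, channels=128)
        sparse = sparse_edge_bytes(n, k_neighbors=32, channels=128)
        print(
            f"N={n:5d}  dense={dense / 2**30:8.3f} GiB  "
            f"sparse={sparse / 2**30:8.3f} GiB  ratio={dense / sparse:.0f}x"
        )
```

Under these assumptions the dense tensor grows quadratically with the number of tokens while the sparse one grows linearly, so the ratio between them is simply the token count divided by the neighbor count\. This counts a single activation tensor only and ignores the intermediates of attention and multiplicative updates\.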

Developing effective graph neural networks and sparse architectures, distilling existing models\[[23](https://arxiv.org/html/2605.11189#bibd.bib188)\], and optimizing kernels\[[10](https://arxiv.org/html/2605.11189#bibd.bib187)\]to greatly improve efficiency without sacrificing accuracy could enable protein complex modeling in many applications, such as more thorough exploration of conformational landscapes, higher\-throughput screening of designs and mutation scanning, and modeling pathways involving many complexes\.

###### Improved integration of multiple data and chemical modalities\.

Improving architectures and training models that can effectively integrate multiple sequence / structure alignments, structured and unstructured functional annotations, molecular dynamics trajectories\[[19](https://arxiv.org/html/2605.11189#bibd.bib169),[18](https://arxiv.org/html/2605.11189#bibd.bib170)\], and assay data\[[30](https://arxiv.org/html/2605.11189#bibd.bib168)\]— and that can generalize to related problems — is another ongoing challenge\.

Many pretrained models have incorporated different data modalities\[[29](https://arxiv.org/html/2605.11189#bibd.bib171),[20](https://arxiv.org/html/2605.11189#bibd.bib172),[15](https://arxiv.org/html/2605.11189#bibd.bib173),[18](https://arxiv.org/html/2605.11189#bibd.bib170),[6](https://arxiv.org/html/2605.11189#bibd.bib176)\]; however, few have convincingly demonstrated that a pretrained model can zero\- or few\-shot generalize to related tasks reliably\[[21](https://arxiv.org/html/2605.11189#bibd.bib179),[9](https://arxiv.org/html/2605.11189#bibd.bib180),[4](https://arxiv.org/html/2605.11189#bibd.bib175),[1](https://arxiv.org/html/2605.11189#bibd.bib174)\]\. RedNet attempts to generalize to binding affinity prediction in a zero\-shot setting; despite outperforming existing methods, there remains substantial room for improvement\. Given that the contrastive decoding algorithm for binder design follows thermodynamic principles, it is natural to fine\-tune current models on folding free energy using supervised\[[12](https://arxiv.org/html/2605.11189#bibd.bib178)\]or reinforcement learning\[[33](https://arxiv.org/html/2605.11189#bibd.bib177)\]approaches, such as with the Megascale dataset\[[30](https://arxiv.org/html/2605.11189#bibd.bib168)\], and generalize to binding affinity prediction\.
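As a generic illustration of why a contrastive score lends itself to a free\-energy reading, the sketch below computes a log\-likelihood ratio between two conditional sequence models and rescales it by kT\. This is not the RedNet algorithm, whose formulation is given in chapter 4; the function names, the per\-position probabilities, and the interpretation of the ratio as a pseudo free energy are all assumptions made for this example\.

```python
import math


def sequence_log_likelihood(per_position_probs):
    """Sum of log-probabilities assigned to the observed residue at each position."""
    return sum(math.log(p) for p in per_position_probs)


def contrastive_score(probs_bound, probs_unbound):
    """Log-likelihood ratio: log p(seq | bound context) - log p(seq | unbound context)."""
    return sequence_log_likelihood(probs_bound) - sequence_log_likelihood(probs_unbound)


def pseudo_free_energy(score, kt=0.592):
    """Rescale a log-ratio to kcal/mol using kT at about 298 K (assumed interpretation)."""
    return -kt * score


if __name__ == "__main__":
    # Hypothetical per-residue probabilities from two hypothetical models.
    bound = [0.5, 0.4, 0.8]
    unbound = [0.25, 0.4, 0.2]
    score = contrastive_score(bound, unbound)
    print(f"log-ratio={score:.3f}  pseudo dG={pseudo_free_energy(score):.3f} kcal/mol")
```

A positive log\-ratio means the sequence is more probable in the bound context than in the unbound one, which maps to a negative pseudo free energy under the sign convention assumed here\. Supervised fine\-tuning on measured stabilities would amount to calibrating such a score against experimental values\.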

AlphaFold3\-like architectures are capable of handling different chemical modalities, which greatly extends their generality\. This opens many new possibilities in modeling cross\-modality molecules such as metalloenzymes and protein–small\-molecule conjugates, where conventional molecular dynamics approaches are lacking\. Currently, GLINTER and RedNet are trained only on protein structures but can be trivially extended to model all\-atom structures due to their heavy\-atom representations; a natural next step would be to extend them to other modalities in the PDB\.

###### Improved deep learning algorithms for modeling protein physics\.

Existing complex prediction and design deep learning models are typically trained to recover the geometry of native structures, such as distograms and heavy\-atom coordinates, and tend to make blurry predictions\. Several types of interatomic interactions are not well captured, including hydrogen bonding, solvent effects, and electrostatics\[musil2021physics\]\. Improved treatment of these interactions could yield more accurate atomic details, better decoy ranking, and more reliable modeling of conformational changes, all of which remain challenging for current protein complex prediction and design models\. Such improvements would also enable new applications, such as designing pH\-sensitive binders\[[28](https://arxiv.org/html/2605.11189#bibd.bib163),[5](https://arxiv.org/html/2605.11189#bibd.bib164)\]or highly functional enzymes\[[27](https://arxiv.org/html/2605.11189#bibd.bib165)\]\.

Rotation and translation equivariance are inherent properties of protein dynamics\. In practice, however, state\-of\-the\-art structure prediction models do not benefit greatly from equivariant architectures: AlphaFold2 uses invariant point attention, yet its ablation studies showed that equivariance contributes little to final performance, and AlphaFold3 removes equivariance altogether\. This is not to say equivariance is unimportant for modeling protein conformations and dynamics\. For tasks requiring structural validity\[fu2022forces\], equivariance has proven beneficial and enables better extrapolation at test time\. Equivariant architectures also offer an elegant way to capture many\-body effects, which may be essential for certain protein dynamics applications beyond recapitulating experimental structures\. GLINTER is among the first methods to use equivariant neural networks that exploit the regularity of amino acid backbone geometry to predict inter\-protein contacts\. RedNet combines the same backbone geometry with equivariant graph attention networks for all\-atom structures in protein design, demonstrating promising improvements over pairwise distance\-based features\. It remains an open question whether and how to incorporate equivariance for modeling protein dynamics, in terms of both training dynamics and hardware\-aligned architecture design, and for which application\-specific metrics equivariance is essential\.
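The distinction between invariant features and equivariant updates can be checked numerically on a toy example\. The sketch below applies a rigid motion to a handful of made\-up coordinates and verifies that pairwise distances are unchanged \(invariance\) and that a simple coordinate update commutes with the motion \(equivariance\)\. The `centroid_pull` update is a deliberately trivial stand\-in chosen for this illustration; it is not a layer from GLINTER, RedNet, or AlphaFold\.

```python
import math


def rotation_matrix(yaw, pitch):
    """Rotation about z by `yaw`, followed by rotation about x by `pitch`."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cx, sx = math.cos(pitch), math.sin(pitch)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    return [[sum(rx[i][k] * rz[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply_rigid(coords, rot, shift):
    """Map each point p to rot @ p + shift."""
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3)) for p in coords]


def pairwise_distances(coords):
    """Invariant feature: the matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def centroid_pull(coords, step=0.1):
    """Toy equivariant update: move every point a fraction `step` toward the centroid."""
    n = len(coords)
    center = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return [tuple(p[i] + step * (center[i] - p[i]) for i in range(3)) for p in coords]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two nested sequences."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def rigid_motion_errors():
    """Return (invariance error, equivariance error) for a made-up point set."""
    coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, -1.0)]
    rot, shift = rotation_matrix(0.7, -1.1), (3.0, -2.0, 5.0)
    moved = apply_rigid(coords, rot, shift)
    invariance = max_abs_diff(pairwise_distances(coords), pairwise_distances(moved))
    equivariance = max_abs_diff(
        centroid_pull(moved), apply_rigid(centroid_pull(coords), rot, shift)
    )
    return invariance, equivariance


if __name__ == "__main__":
    inv_err, eq_err = rigid_motion_errors()
    print(f"invariance error={inv_err:.2e}  equivariance error={eq_err:.2e}")
```

Both errors should sit at floating\-point round\-off\. A model built only on distance features inherits invariance for free, whereas a model that outputs coordinates needs updates with the commuting property shown here, or must learn it from data augmentation\.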

Diffusion models, closely related to energy\-based models, are promising alternatives for modeling protein biophysics\. Diffusion modeling and molecular dynamics have deep theoretical connections and have seen fruitful cross\-pollination in techniques for accelerated sampling and fine\-tuning\. Recent works from other groups as well as our ongoing work based on RedNet architectures have shown promising results in modeling protein physics with diffusion models\.

###### End\-to\-end protein design architectures\.

Current de novo protein design pipelines typically rely on a three\-stage setup\[[8](https://arxiv.org/html/2605.11189#bibd.bib131)\]: backbone\[[31](https://arxiv.org/html/2605.11189#bibd.bib6),[17](https://arxiv.org/html/2605.11189#bibd.bib7)\]or all\-atom structure generation\[[7](https://arxiv.org/html/2605.11189#bibd.bib125),[24](https://arxiv.org/html/2605.11189#bibd.bib126)\], structure\-based sequence redesign\[[11](https://arxiv.org/html/2605.11189#bibd.bib158)\], and ranking using deep learning structure prediction models\[[26](https://arxiv.org/html/2605.11189#bibd.bib156)\]or physics\-based force fields\[[2](https://arxiv.org/html/2605.11189#bibd.bib162),[22](https://arxiv.org/html/2605.11189#bibd.bib161)\]\. These stages are deeply connected from a Bayesian perspective, and a capable end\-to\-end model could improve the accuracy of all three while increasing efficiency\. RedNet can readily scan mutations and is straightforward to extend to simultaneous side\-chain conformation prediction\. Alternative architectures and training algorithms, such as energy\-based diffusion models\[[25](https://arxiv.org/html/2605.11189#bibd.bib167)\], are also promising and, owing to their connection with protein dynamics, may prove more general and more flexible for end\-to\-end de novo design\.

###### Accelerated search of conformation and sequence spaces\.

Current deep learning complex prediction and design models use off\-the\-shelf search algorithms at inference time\. Despite this, significant accuracy gains have been achieved by increasing the number of seeds and recycling iterations\[johansson2022improving,gao2022af2complex\]and tuning hyperparameters such as penalty weights and temperatures\[frank2024scalable,[22](https://arxiv.org/html/2605.11189#bibd.bib161)\]\. ESMPair is a proof\-of\-concept deep learning algorithm that searches for interacting paralogs to improve structure predictions\. It can be extended, like AFCluster\[[32](https://arxiv.org/html/2605.11189#bibd.bib199)\], to predict multiple conformations of protein complexes and, when coupled with improved confidence prediction and decoy ranking, could further improve complex prediction accuracy\.

###### Improved treatment of modality\- and application\-specific constraints for protein design\.

Compared to structure prediction, protein design is more open\-ended: different applications impose different engineering requirements, and effective design typically demands domain knowledge about both the modality \(e\.g\., whether to use a nanobody or monoclonal antibody, and which regions to maintain versus mutate to preserve functional residues\) and the application \(e\.g\., whether the engineering parameters are geometrical, mechanical, or thermodynamic\)\. For practical applications, it is important to make deep learning models more controllable to accommodate these diverse constraints and to optimize different objectives\[[34](https://arxiv.org/html/2605.11189#bibd.bib186)\]\. RedNet demonstrates that by considering the requirements of designing specific binders, one can develop deep learning algorithms that are more effective at designing and distinguishing binders with improved specificity\. RedNet can be easily fine\-tuned to model important therapeutic modalities, including monoclonal antibodies\[[14](https://arxiv.org/html/2605.11189#bibd.bib181),[13](https://arxiv.org/html/2605.11189#bibd.bib182),[16](https://arxiv.org/html/2605.11189#bibd.bib183),[3](https://arxiv.org/html/2605.11189#bibd.bib184)\]and, more broadly, the immunoglobulin superfamily, to improve design performance\.

## References

- \[1\]Y\. Akiyama, Z\. Zhang, M\. Mirdita, M\. Steinegger, and S\. Ovchinnikov\(2025\)Scaling down protein language modeling with msa pairformer\.bioRxiv,pp\. 2025–08\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[2\]N\. R\. Bennett, B\. Coventry, I\. Goreshnik, B\. Huang, A\. Allen, D\. Vafeados, Y\. P\. Peng, J\. Dauparas, M\. Baek, L\. Stewart,et al\.\(2023\)Improving de novo protein binder design with deep learning\.Nature Communications14\(1\),pp\. 2625\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[3\]N\. R\. Bennett, J\. L\. Watson, R\. J\. Ragotte, A\. J\. Borst, D\. L\. See, C\. Weidle, R\. Biswas, Y\. Yu, E\. L\. Shrock, R\. Ault,et al\.\(2026\)Atomically accurate de novo design of antibodies with rfdiffusion\.Nature649\(8095\),pp\. 183–193\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px6.p1.1)\.
- \[4\]A\. Bhatnagar, S\. Jain, J\. Beazer, S\. C\. Curran, A\. M\. Hoffnagle, K\. S\. Ching, M\. Martyn, S\. Nayfach, J\. A\. Ruffolo, and A\. Madani\(2025\)Scaling unlocks broader generation and deeper functional understanding of proteins\.bioRxiv,pp\. 2025–04\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[5\]S\. E\. Boyken, M\. A\. Benhaim, F\. Busch, M\. Jia, M\. J\. Bick, H\. Choi, J\. C\. Klima, Z\. Chen, C\. Walkey, A\. Mileant,et al\.\(2019\)De novo design of tunable, ph\-driven conformational changes\.Science364\(6441\),pp\. 658–664\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px3.p1.1)\.
- \[6\]B\. Chen, X\. Cheng, P\. Li, Y\. Geng, J\. Gong, S\. Li, Z\. Bei, X\. Tan, B\. Wang, X\. Zeng,et al\.\(2025\)XTrimoPGLM: unified 100\-billion\-parameter pretrained transformer for deciphering the language of proteins\.Nature Methods22\(5\),pp\. 1028–1039\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[7\]A\. E\. Chu, J\. Kim, L\. Cheng,et al\.\(2024\)An all\-atom protein generative model\.Proceedings of the National Academy of Sciences121\(27\),pp\. e2311500121\.External Links:[Document](https://dx.doi.org/10.1073/pnas.2311500121)Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[8\]A\. E\. Chu, T\. Lu, and P\. Huang\(2024\)Sparks of function by de novo protein design\.Nature biotechnology42\(2\),pp\. 203–215\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[9\]M\. Chungyoun, J\. Ruffolo, and J\. Gray\(2024\)FLAb: benchmarking deep learning methods for antibody fitness prediction\.BioRxiv,pp\. 2024–01\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[10\]T\. Dao, D\. Fu, S\. Ermon, A\. Rudra, and C\. Ré\(2022\)Flashattention: fast and memory\-efficient exact attention with io\-awareness\.Advances in neural information processing systems35,pp\. 16344–16359\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px1.p2.1)\.
- \[11\]J\. Dauparas, I\. Anishchenko, N\. Bennett,et al\.\(2022\)Robust deep learning\-based protein sequence design using proteinmpnn\.Science378\(6615\),pp\. 49–56\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[12\]H\. Dieckhaus, M\. Brocidiacono, N\. Z\. Randolph, and B\. Kuhlman\(2024\)Transfer learning to leverage larger datasets for improved prediction of protein stability changes\.Proceedings of the national academy of sciences121\(6\),pp\. e2314853121\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[13\]F\. A\. Dreyer, D\. Cutting, C\. Schneider, H\. Kenlay, and C\. M\. Deane\(2023\)Inverse folding for antibody sequence design using deep learning\.arXiv preprint arXiv:2310\.19513\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px6.p1.1)\.
- \[14\]R\. R\. Eguchi, C\. A\. Choe, and P\. Huang\(2022\)Ig\-vae: generative modeling of protein structure by direct 3d coordinate generation\.PLoS computational biology18\(6\),pp\. e1010271\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px6.p1.1)\.
- \[15\]T\. Hayes, R\. Rao, H\. Akin, N\. J\. Sofroniew, D\. Oktay, Z\. Lin, R\. Verkuil, V\. Q\. Tran, J\. Deaton, M\. Wiggert,et al\.\(2025\)Simulating 500 million years of evolution with a language model\.Science387\(6736\),pp\. 850–858\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[16\]M\. H\. Høie, A\. M\. Hummer, T\. H\. Olsen, B\. Aguilar\-Sanjuan, M\. Nielsen, and C\. M\. Deane\(2025\)AntiFold: improved structure\-based antibody design using inverse folding\.Bioinformatics Advances5\(1\),pp\. vbae202\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px6.p1.1)\.
- \[17\]J\. B\. Ingraham, M\. Baranov, Z\. Costello, K\. W\. Barber, W\. Wang, A\. Ismail, V\. Frappier, D\. M\. Lord, C\. Ng\-Thow\-Hing, E\. R\. Van Vlack,et al\.\(2023\)Illuminating protein space with a programmable generative model\.Nature623\(7989\),pp\. 1070–1078\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[18\]S\. Lewis, T\. Hempel, J\. Jiménez\-Luna, M\. Gastegger, Y\. Xie, A\. Y\. Foong, V\. G\. Satorras, O\. Abdin, B\. S\. Veeling, I\. Zaporozhets,et al\.\(2025\)Scalable emulation of protein equilibrium ensembles with generative deep learning\.Science389\(6761\),pp\. eadv9817\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p1.1),[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[19\]A\. Mirarchi, T\. Giorgino, and G\. De Fabritiis\(2024\)Mdcath: a large\-scale md dataset for data\-driven computational biophysics\.Scientific Data11\(1\),pp\. 1299\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p1.1)\.
- \[20\]E\. Nijkamp, J\. A\. Ruffolo, E\. N\. Weinstein, N\. Naik, and A\. Madani\(2023\)Progen2: exploring the boundaries of protein language models\.Cell systems14\(11\),pp\. 968–978\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[21\]P\. Notin, A\. Kollasch, D\. Ritter, L\. Van Niekerk, S\. Paul, H\. Spinner, N\. Rollins, A\. Shaw, R\. Orenbuch, R\. Weitzman,et al\.\(2023\)Proteingym: large\-scale benchmarks for protein fitness prediction and design\.Advances in neural information processing systems36,pp\. 64331–64379\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[22\]M\. Pacesa, L\. Nickel, C\. Schellhaas, J\. Schmidt, E\. Pyatova, L\. Kissling, P\. Barendse, J\. Choudhury, S\. Kapoor, A\. Alcaraz\-Serna,et al\.\(2025\)One\-shot design of functional protein binders with bindcraft\.Nature646\(8084\),pp\. 483–492\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1),[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px5.p1.1)\.
- \[23\]A\. Polino, R\. Pascanu, and D\. Alistarh\(2018\)Model compression via distillation and quantization\.arXiv preprint arXiv:1802\.05668\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px1.p2.1)\.
- \[24\]W\. Qu, J\. Guan, R\. Ma, and K\. Zhai\(2024\)P\(all\-atom\) is unlocking new path for protein design\.bioRxiv\.External Links:[Document](https://dx.doi.org/10.1101/2024.08.16.608235)Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[25\]J\. P\. Roney, C\. Ou, and S\. Ovchinnikov\(2025\)Protein diffusion models as statistical potentials\.bioRxiv,pp\. 2025–12\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[26\]J\. P\. Roney and S\. Ovchinnikov\(2022\)State\-of\-the\-art estimation of protein model accuracy using alphafold\.Physical review letters129\(23\),pp\. 238101\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[27\]D\. Röthlisberger, O\. Khersonsky, A\. M\. Wollacott, L\. Jiang, J\. DeChancie, J\. Betker, J\. L\. Gallaher, E\. A\. Althoff, A\. Zanghellini, O\. Dym,et al\.\(2008\)Kemp elimination catalysts by computational enzyme design\.Nature453\(7192\),pp\. 190–195\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px3.p1.1)\.
- \[28\]E\. Strauch, S\. J\. Fleishman, and D\. Baker\(2014\)Computational design of a ph\-sensitive igg binding protein\.Proceedings of the National Academy of Sciences111\(2\),pp\. 675–680\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px3.p1.1)\.
- \[29\]J\. Su, C\. Han, Y\. Zhou, J\. Shan, X\. Zhou, and F\. Yuan\(2023\)SaProt: protein language modeling with structure\-aware vocabulary\.bioRxiv\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[30\]K\. Tsuboyama, J\. Dauparas, J\. Chen, E\. Laine, Y\. Mohseni Behbahani, J\. J\. Weinstein, N\. M\. Mangan, S\. Ovchinnikov, and G\. J\. Rocklin\(2023\)Mega\-scale experimental analysis of protein folding stability in biology and design\.Nature620\(7973\),pp\. 434–444\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p1.1),[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[31\]J\. L\. Watson, D\. Juergens, N\. R\. Bennett, B\. L\. Trippe, J\. Yim, H\. E\. Eisenach, W\. Ahern, A\. J\. Borst, R\. J\. Ragotte, L\. F\. Milles,et al\.\(2023\)De novo design of protein structure and function with rfdiffusion\.Nature620\(7976\),pp\. 1089–1100\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px4.p1.1)\.
- \[32\]H\. K\. Wayment\-Steele, A\. Ojoawo, R\. Otten, J\. M\. Apitz, W\. Pitsawong, M\. Hömberger, S\. Ovchinnikov, L\. Colwell, and D\. Kern\(2024\)Predicting multiple conformations via sequence clustering and AlphaFold2\.Nature625\(7996\),pp\. 832–839\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px5.p1.1)\.
- \[33\]T\. Widatalla, R\. Rafailov, and B\. Hie\(2024\)Aligning protein generative models with experimental fitness via direct preference optimization\.bioRxiv,pp\. 2024–05\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px2.p2.1)\.
- \[34\]J\. Zhou, C\. Q\. Le, Y\. Zhang, and J\. A\. Wells\(2024\)A general approach for selection of epitope\-directed binders to proteins\.Proceedings of the National Academy of Sciences121\(19\),pp\. e2317307121\.Cited by:[§5\.1](https://arxiv.org/html/2605.11189#Ch5.S1.SS0.SSS0.Px6.p1.1)\.
