Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction

Summary

SurfBind, a surface-centric learning framework for epitope prediction, uses a Transformer-based architecture with patch-level surface modeling and binder-aware cross-attention to achieve state-of-the-art performance on epitope identification benchmarks.

arXiv:2606.23830v1

# Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction
Source: [https://arxiv.org/html/2606.23830](https://arxiv.org/html/2606.23830)
…, Weihao Xuan (The University of Tokyo, Tokyo, Japan), Jure Leskovec (Stanford University, Palo Alto, USA), Yejin Choi (Stanford University, Palo Alto, USA), and Li Erran Li (Amazon AWS, Palo Alto, USA)

###### Abstract\.

Molecular surfaces encode the geometric and physicochemical patterns that determine antibody\-antigen recognition, central to epitope prediction\. However, existing methods rely on sequences or backbone structures and struggle to capture discontinuous, surface\-driven epitopes\. This study presents SurfBind, a surface\-centric learning framework for epitope prediction that operates directly on molecular surface representations\. SurfBind integrates geometric and physicochemical cues through a Transformer\-based architecture with patch\-level surface modeling, binder\-aware cross\-attention, and a hierarchical coarse\-to\-fine prediction paradigm\. Experiments on challenging epitope identification benchmarks, including SAbDab and DB5\.5, demonstrate that SurfBind achieves state\-of\-the\-art performance and strong generalization across unseen antibodies and conformational states, highlighting the value of interaction\-aware surface modeling for understanding the crucial mechanisms of protein\-protein interactions\.

3D Surface Modeling, Protein\-protein Interaction

Published in: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’26), August 09–13, 2026, Jeju Island, Republic of Korea. Copyright: CC. DOI: 10.1145/3770855.3818825. ISBN: 979-8-4007-2259-2/2026/08. CCS concepts: Applied computing → Molecular structural biology.

## 1\.Introduction

![Refer to caption](https://arxiv.org/html/2606.23830v1/x1.png)

Figure 1. Schematic overview of our antigen-binding site prediction model. First, the antigen surface is sampled into a point cloud via a fast sampling mechanism (a), and fine-grained features are extracted by a point cloud network (b,c). The point cloud is then downsampled into ordered patches (d,e). In parallel, the antibody is represented using either protein language models (PLMs) or structure encoders (f). The antigen and antibody representations are then fed into SurfFormer to exchange mutual information and achieve binder-awareness (g). Finally, antigen features are propagated from the subsampled patches back to the original points via interpolation (i), enabling multi-resolution epitope prediction at both the point and patch levels (h,j,k).

Proteins are fundamental components of biological systems, and their most critical functions, particularly in immune recognition and signaling, are often mediated through specific protein–protein interactions (PPIs) (Deng et al., [2025](https://arxiv.org/html/2606.23830#bib.bib136); Wu and Li, [2026](https://arxiv.org/html/2606.23830#bib.bib135)). In antibody-antigen binding, these interactions are governed by epitopes: localized regions on antigen surfaces whose geometric shape and physicochemical composition determine binding specificity and affinity. Accurate epitope prediction is therefore central to antibody engineering, immunotherapy, and vaccine design (Esmaielbeiki et al., [2016](https://arxiv.org/html/2606.23830#bib.bib18); Peters et al., [2020](https://arxiv.org/html/2606.23830#bib.bib12)).
However, this task remains challenging due to the complex and heterogeneous nature of epitopes, which are often discontinuous in sequence, sparsely distributed on protein surfaces, and highly sensitive to local surface geometry and chemistry (Zeng et al., [2023](https://arxiv.org/html/2606.23830#bib.bib40); Wu, [2024](https://arxiv.org/html/2606.23830#bib.bib9); wusurfdesign; Wu et al., [2023a](https://arxiv.org/html/2606.23830#bib.bib131), [2026b](https://arxiv.org/html/2606.23830#bib.bib142), [2022b](https://arxiv.org/html/2606.23830#bib.bib141); Li et al., [2026](https://arxiv.org/html/2606.23830#bib.bib140)).

Existing computational methods for epitope prediction primarily rely on sequences or backbone-centric structural features (Rives et al., [2021](https://arxiv.org/html/2606.23830#bib.bib94); Zhang et al., [2022](https://arxiv.org/html/2606.23830#bib.bib148); Clifford et al., [2022](https://arxiv.org/html/2606.23830#bib.bib17); Wu, [2025](https://arxiv.org/html/2606.23830#bib.bib7); Wu et al., [2021](https://arxiv.org/html/2606.23830#bib.bib6), [2026a](https://arxiv.org/html/2606.23830#bib.bib139), [2025](https://arxiv.org/html/2606.23830#bib.bib137); Jiang et al., [2025](https://arxiv.org/html/2606.23830#bib.bib138)). While effective for capturing global protein properties, these representations often struggle to resolve fine-grained surface patterns that directly mediate antibody–antigen recognition. In contrast, molecular surfaces encode the spatial arrangement and physicochemical complementarity that underlie binding interactions (Mylonas et al., [2021](https://arxiv.org/html/2606.23830#bib.bib45); Riahi et al., [2023](https://arxiv.org/html/2606.23830#bib.bib118)). Yet surface information is often treated as auxiliary rather than as a first-class modeling target, thereby limiting the ability of existing methods to accurately localize binding sites and generalize to unseen epitopes.

Beyond that, epitope prediction presents additional challenges. First, epitope formation is inherently interaction-dependent: the same antigen surface may expose different binding regions depending on the antibody context, making partner-agnostic predictions unreliable (Potocnakova et al., [2016](https://arxiv.org/html/2606.23830#bib.bib15); Soria-Guerra et al., [2015](https://arxiv.org/html/2606.23830#bib.bib13)). Second, meaningful epitope signals are often subtle and localized, requiring models to reason across multiple spatial scales, from coarse surface regions to fine-grained atomic neighborhoods. Finally, models must generalize across diverse antibodies and antigen families, where binding interfaces can vary significantly in size, shape, and chemical composition (Desai and Kulkarni-Kale, [2014](https://arxiv.org/html/2606.23830#bib.bib14); Sanchez-Trincado et al., [2017](https://arxiv.org/html/2606.23830#bib.bib16)).

This work introduces SurfBind, which explicitly models binding\-relevant surface patterns and cross\-molecular dependencies\. It bridges the gap between surface pretraining and downstream PPI tasks by integrating geometric surface encoding with binder\-aware context modeling and hierarchical coarse\-to\-fine prediction\. Specifically, SurfBind partitions molecular surfaces into irregular local patches that respect the sparsity and redundancy of surface point clouds and organizes them using Morton ordering to enable efficient global reasoning\. SurfFormer\+\+ is then employed to model long\-range dependencies among surface patches and to incorporate geometric priors\. Crucially, SurfBind extends beyond single\-surface encoding by introducing binder\-aware cross\-attention, enabling the explicit exchange of information between interacting molecular partners\. To encourage interaction\-aligned representations, SurfBind leverages discrete latent modeling and multi\-level reconstruction objectives that target not only point statistics but also surface geometry and physicochemical properties\.

Evaluation on standard epitope prediction benchmarks demonstrates that explicitly modeling surface\-binder interactions yields improved accuracy, stronger generalization to unseen epitopes, and greater robustness across diverse antibody contexts\. These results highlight the importance of interaction\-driven surface modeling for epitope discovery and advance the state of the art in computational antibody\-antigen interface recognition\.

## 2\.Method

### 2\.1\.Preliminaries and Mathematical Notations

#### Task Description

Epitopes, also known as antigenic determinants (ADs), are specific regions on antigen surfaces that activate the human immune system against pathogens or abnormal cells (Zeng et al., [2023](https://arxiv.org/html/2606.23830#bib.bib40); Wu, [2026](https://arxiv.org/html/2606.23830#bib.bib8)). Their characterization and identification are important for designing therapeutic or diagnostic antibodies, developing immunodiagnostic tests, and advancing epitope-based peptide vaccines to combat infectious diseases (Bukhari et al., [2022](https://arxiv.org/html/2606.23830#bib.bib39)). Additionally, ADs may influence the efficacy of RNA vaccines, a role that is often overlooked: their properties determine whether an RNA vaccine can elicit an immune response and which responses will ensue.

Epitopes are classified into two categories: B-cell and T-cell epitopes. B-cell epitopes (BCEs) are solvent-exposed antigen fragments recognized by B cells and can be classified as linear or conformational. Linear (continuous) BCEs consist of consecutive residues, while conformational (discontinuous) BCEs comprise patches of solvent-exposed atoms from non-sequential residues. Experimental techniques, such as peptide microarrays and phage display libraries, help identify linear BCEs (Qi et al., [2021](https://arxiv.org/html/2606.23830#bib.bib34)). However, approximately 90% of native BCEs are discontinuous, and mapping conformational BCEs without a complex structure is more difficult because their component residues may be far apart in sequence but spatially co-located within the protein structure. Hydrogen/deuterium exchange experiments can infer this kind of BCE but are confounded by allosteric structural perturbation when the binding effect extends beyond the binding site (Deng et al., [2017](https://arxiv.org/html/2606.23830#bib.bib36)). Alternatively, computational methods such as homology modeling, docking simulations, and molecular dynamics simulations are employed. Despite their success, most conventional approaches are time-consuming and require expertise in protein structure and function.

#### Protein surface representation\.

We follow established surface construction and preprocessing pipelines (Sverrisson et al., [2021](https://arxiv.org/html/2606.23830#bib.bib68); Mylonas et al., [2021](https://arxiv.org/html/2606.23830#bib.bib45); Stebliankin et al., [2023](https://arxiv.org/html/2606.23830#bib.bib65); Li and Liu, [2023](https://arxiv.org/html/2606.23830#bib.bib46); Wu and Li, [2024](https://arxiv.org/html/2606.23830#bib.bib177)) to achieve effective protein surface learning. A protein with $N$ atoms is represented as $\mathcal{V}^{a}=\{(\mathbf{x}^{a}_{i},\mathbf{t}^{a}_{i})\}_{i=1}^{N}$, where $\mathbf{x}^{a}_{i}\in\mathbb{R}^{3}$ denotes atom coordinates and $\mathbf{t}^{a}_{i}\in\mathbb{R}^{6}$ encodes their one-hot chemical types in the list $[\mathrm{C},\mathrm{H},\mathrm{O},\mathrm{N},\mathrm{S},\mathrm{Se}]$. Protein surfaces are modeled as level sets of a smooth signed distance function (SDF) defined over atom centers.

Each surface point $\mathbf{x}^{s}_{i}\in\mathbb{R}^{3}$ is initialized by stochastic sampling around atom coordinates and projected onto a target SDF level set via gradient-based optimization. Surface normals $\mathbf{n}^{s}_{i}$ are computed as normalized SDF gradients at $\mathbf{x}^{s}_{i}$. After removing interior points, the resulting protein surface is represented as an oriented point cloud $S=\{(\mathbf{x}^{s}_{i},\mathbf{n}^{s}_{i})\}_{i=1}^{M}$. Every surface point is augmented with a chemical feature vector $\mathbf{h}^{s}_{i}\in\mathbb{R}^{\phi_{h}}$. To compute $\mathbf{h}^{s}_{i}$, residue-level information is aggregated from the $K_{\mathrm{res}}$ nearest residues $\{(\mathbf{x}^{R}_{j},\mathbf{t}^{R}_{j})\}_{j=1}^{K_{\mathrm{res}}}$ based on $C_{\alpha}$ distances, using a lightweight geometric aggregation network. This residue-centric representation offers an efficient approximation of local chemical environments while maintaining strong empirical performance.
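
The projection step can be illustrated with a minimal NumPy sketch. It is not the pipeline used in the paper: the soft-min SDF, the shared atom `radius`, the sharpness `beta`, the finite-difference gradient, and the Newton-style update are all assumptions chosen to keep the example short.

```python
import numpy as np

def smooth_sdf(x, atoms, radius=1.5, beta=4.0):
    """Soft minimum over atoms of (distance to atom centre - radius).

    Assumed form: -1/beta * log(sum_k exp(-beta * d_k)), written in a
    numerically stable way. Negative inside the molecule, positive outside.
    """
    d = np.linalg.norm(x[None, :] - atoms, axis=1) - radius
    m = d.min()
    return m - np.log(np.exp(-beta * (d - m)).sum()) / beta

def sdf_gradient(x, atoms, radius=1.5, beta=4.0, eps=1e-4):
    """Central finite-difference gradient of the SDF at x."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3)
        e[k] = eps
        g[k] = (smooth_sdf(x + e, atoms, radius, beta)
                - smooth_sdf(x - e, atoms, radius, beta)) / (2 * eps)
    return g

def project_to_level_set(x, atoms, level=0.0, steps=50, radius=1.5, beta=4.0):
    """Move x along the SDF gradient until SDF(x) is close to `level`.

    Returns the projected point and its unit normal (normalized gradient).
    """
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        g = sdf_gradient(x, atoms, radius, beta)
        residual = smooth_sdf(x, atoms, radius, beta) - level
        x -= residual * g / (g @ g + 1e-12)  # Newton-style step along the gradient
    n = sdf_gradient(x, atoms, radius, beta)
    return x, n / np.linalg.norm(n)
```

For a single atom the level set is a sphere of the chosen radius, which makes the behaviour easy to check by hand.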

#### Surface patch partition and ordering\.

To enable scalable modeling, the surface point cloud $\mathbf{X}^{s}=\{\mathbf{x}^{s}_{i}\}_{i=1}^{M}$ is partitioned into local patches. Specifically, a subset of patch centers $\mathbf{X}^{\mathrm{c}}\in\mathbb{R}^{\rho M\times 3}$ is selected using farthest point sampling (FPS) with downsampling ratio $\rho$. For each center point, a local patch is formed by selecting its $K_{\mathrm{p}}$ nearest neighbors from $\mathbf{X}^{s}$, yielding $\mathbf{X}^{\mathrm{p}}\in\mathbb{R}^{\rho M\times K_{\mathrm{p}}\times 3}$.
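
The partition described above can be sketched in NumPy as follows. The function names, the default values of `rho` and `k_p`, and the choice of the first point as the FPS seed are illustrative assumptions, and the dense distance matrix is only practical for small clouds.

```python
import numpy as np

def farthest_point_sampling(points, n_centers, start=0):
    """Greedy FPS: repeatedly pick the point farthest from all chosen centers."""
    n = points.shape[0]
    chosen = np.empty(n_centers, dtype=int)
    chosen[0] = start
    min_d2 = np.full(n, np.inf)  # squared distance to the nearest chosen center
    for k in range(1, n_centers):
        d2 = ((points - points[chosen[k - 1]]) ** 2).sum(axis=1)
        min_d2 = np.minimum(min_d2, d2)
        chosen[k] = int(np.argmax(min_d2))
    return chosen

def build_patches(points, rho=0.25, k_p=4):
    """Return center indices (rho*M,), neighbor indices and patch coordinates.

    Patch coordinates have shape (rho*M, k_p, 3), matching X^p in the text.
    """
    n_centers = max(1, int(round(rho * len(points))))
    centers = farthest_point_sampling(points, n_centers)
    d2 = ((points[centers][:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k_p]  # k_p nearest points per center
    return centers, idx, points[idx]
```

Because each center is its own nearest neighbor, neighboring patches generally overlap, which is why the pretraining stage later masks patches independently.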

As point clouds lack a canonical ordering, we impose a geometric sequence structure over patches. Patch centers $\mathbf{X}^{\mathrm{c}}$ are mapped to a 1D ordering using a Morton (Z-order) space-filling curve (Morton, [1966](https://arxiv.org/html/2606.23830#bib.bib56)), producing an index sequence $\mathcal{O}\in\mathbb{N}^{\rho M\times 1}$. Patches $\mathbf{X}^{\mathrm{p}}$ are arranged according to $\mathcal{O}$, which preserves local spatial coherence while enabling sequence-based processing in downstream models (Chen et al., [2023](https://arxiv.org/html/2606.23830#bib.bib58)).
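
A Morton ordering interleaves the bits of quantized coordinates so that sorting by the resulting code walks a Z-shaped curve through space. The sketch below is one straightforward way to compute it; the grid resolution `bits` and the min-max quantization are assumptions, not values taken from the paper.

```python
import numpy as np

def _spread_bits(v, bits):
    """Place bit b of v at position 3*b, leaving room for two other axes."""
    out = 0
    for b in range(bits):
        out |= ((v >> b) & 1) << (3 * b)
    return out

def morton_code(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative integer grid coordinates."""
    return (_spread_bits(ix, bits)
            | (_spread_bits(iy, bits) << 1)
            | (_spread_bits(iz, bits) << 2))

def morton_order(centers, bits=10):
    """Return the permutation that sorts patch centers along the Z-order curve."""
    lo = centers.min(axis=0)
    span = np.maximum(centers.max(axis=0) - lo, 1e-12)
    grid = np.floor((centers - lo) / span * (2 ** bits - 1)).astype(int)
    codes = [morton_code(int(x), int(y), int(z), bits) for x, y, z in grid]
    return np.argsort(codes, kind="stable")
```

Applying the returned permutation to the patch tensor gives the sequence that a Transformer can consume, with spatially close patches tending to sit near each other in the sequence.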

### 2\.2\.Backbone Architecture

To capture the hierarchical granularity inherent in protein surfaces \(*i\.e\.*, point\-level and patch\-level features\), we hierarchically extract surface details at multiple scales\. We then propose SurfFormer\+\+, which incorporates a cross\-attention module to enable sufficient information exchange between ligand and receptor patches\.

#### Point cloud network\.

We employ a standard surface point cloud network to extract local, point-wise surface representations from the oriented surface point cloud $S=\{(\mathbf{x}^{s}_{i},\mathbf{n}^{s}_{i})\}_{i=1}^{M}$. The network follows the quasi-geodesic convolution paradigm (Sverrisson et al., [2021](https://arxiv.org/html/2606.23830#bib.bib68)), where each surface point $\mathbf{x}^{s}_{i}$ is equipped with a local orthonormal frame $(\mathbf{n}^{s}_{i},\mathbf{u}^{s}_{i},\mathbf{o}^{s}_{i})$ (Duff et al., [2017](https://arxiv.org/html/2606.23830#bib.bib53)) and aggregates features from a geodesic neighborhood $\mathcal{N}(i)$, which is determined by the filter window size $\sigma_{\mathrm{d}}$. Neighbor interactions, parameterized using relative geometric descriptors $\mathbf{p}_{ij}=\left(\mathbf{x}^{s}_{i}-\mathbf{x}^{s}_{j}\right)^{\top}\cdot\left[\mathbf{n}^{s}_{i}\oplus\mathbf{u}^{s}_{i}\oplus\mathbf{o}^{s}_{i}\right]$, are defined in the local coordinate system and weighted by a Gaussian function $w(\mathrm{d}_{ij})$ of an approximate geodesic distance $\mathrm{d}_{ij}$. Stacking $L_{1}$ such layers yields point-level surface features $\{\mathbf{h}^{s}_{i}\}_{i=1}^{M}$, subsequently used for patch-level modeling.
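
The geometric quantities in this paragraph can be sketched as below. The frame construction follows the branchless orthonormal-basis formula of Duff et al.; the quasi-geodesic distance $\|\mathbf{x}_i-\mathbf{x}_j\|(2-\langle\mathbf{n}_i,\mathbf{n}_j\rangle)$, the Gaussian form, and the default `sigma_d` are assumptions in the style of Sverrisson et al. rather than values confirmed by this paper.

```python
import numpy as np

def local_frame(n):
    """Build two tangent vectors (u, o) orthonormal to the unit normal n."""
    sign = np.copysign(1.0, n[2])
    a = -1.0 / (sign + n[2])
    b = n[0] * n[1] * a
    u = np.array([1.0 + sign * n[0] ** 2 * a, sign * b, -sign * n[0]])
    o = np.array([b, sign + n[1] ** 2 * a, -n[1]])
    return u, o

def relative_descriptor(x_i, x_j, n_i):
    """p_ij: the offset x_i - x_j expressed in the local frame (n, u, o) of point i."""
    u_i, o_i = local_frame(n_i)
    frame = np.stack([n_i, u_i, o_i], axis=1)  # columns are n, u, o
    return (x_i - x_j) @ frame

def gaussian_weight(x_i, x_j, n_i, n_j, sigma_d=9.0):
    """Gaussian window over an approximate geodesic distance (assumed form)."""
    d_ij = np.linalg.norm(x_i - x_j) * (2.0 - n_i @ n_j)
    return np.exp(-d_ij ** 2 / (2.0 * sigma_d ** 2))
```

Expressing offsets in the local frame is what makes the descriptor independent of the global pose of the protein.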

#### SurfFormer\+\+\.

Patch-level representations are constructed by aggregating point-wise features within each surface patch. Specifically, for patch $\mathbf{X}^{\mathrm{p}}_{i}$, point features $\{{\mathbf{h}^{s}_{j}}^{(L_{1})}\}$ are pooled and mapped to an initial patch embedding $\mathbf{h}^{\mathrm{p}}_{i}\in\mathbb{R}^{\phi_{\mathrm{p}}}$. Global interactions between patches are modeled using a modified Transformer (Vaswani et al., [2017](https://arxiv.org/html/2606.23830#bib.bib100)), referred to as SurfFormer, comprising $L_{2}$ layers.

Each SurfFormer layer applies multi-head self-attention over patch features $\{\mathbf{h}^{\mathrm{p}}_{i}\}$, augmented with geometric structural embeddings derived from approximate geodesic distances $\mathrm{d}_{ij}$ via radial basis functions (Schutt et al., [2018](https://arxiv.org/html/2606.23830#bib.bib51)). These embeddings serve as invariant relative positional encodings, enabling geometry-aware attention while preserving rotational and translational invariance. In addition, the geometric ordering $\mathcal{O}$ is used to assign absolute sinusoidal positional embeddings to patches, which are then added at each layer to facilitate modeling of global context.
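
Both kinds of positional information can be sketched in a few lines. The number of basis functions, the distance range `d_max`, the width `gamma`, and the use of the Morton rank as the sinusoidal position are assumptions made for illustration.

```python
import numpy as np

def rbf_embed(d, n_rbf=16, d_max=20.0, gamma=None):
    """Expand distances d (any shape) into Gaussian radial basis features.

    Because the input is a distance, the output is unchanged by rigid motions.
    """
    centers = np.linspace(0.0, d_max, n_rbf)
    if gamma is None:
        spacing = d_max / (n_rbf - 1)
        gamma = 1.0 / spacing ** 2  # width tied to the center spacing (assumed)
    return np.exp(-gamma * (np.asarray(d)[..., None] - centers) ** 2)

def sinusoidal_embedding(order, dim):
    """Standard sine/cosine embedding of integer positions; dim must be even."""
    pos = np.asarray(order, dtype=float)[:, None]
    freq = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe = np.zeros((len(order), dim))
    pe[:, 0::2] = np.sin(pos * freq)
    pe[:, 1::2] = np.cos(pos * freq)
    return pe
```

The RBF features would typically be projected to one bias per attention head, while the sinusoidal embedding is added to the patch features.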

#### Binder\-aware Cross\-attention Block

Incorporating binder information is essential in real-world applications. For instance, a more comprehensive epitope prediction task involves identifying the antigen’s interacting residues given a particular antibody (Chen et al., [2024](https://arxiv.org/html/2606.23830#bib.bib43)). To this end, cross-attention is a widely used technique that facilitates the mutual exchange of features between two components.

Notably, binders may only possess 1D sequences without crystal structures. In such cases, protein language models (PLMs) like ESM-2 (Lin et al., [2022](https://arxiv.org/html/2606.23830#bib.bib102)) enable the acquisition of sequence-level representations denoted as $\mathbf{h}^{\mathrm{PLM}}_{\mathrm{lig}}\in\mathbb{R}^{\phi_{\mathrm{PLM}}}$. Then, an MLP is appended to increase the channel dimension, and the output representation is reshaped to a suitable number of patch vectors. In the ideal scenario where both ligand and receptor structures are accessible, their patch features are written as $\left\{\mathbf{h}^{\mathrm{p}}_{j,\mathrm{lig}}\right\}_{j=1}^{\rho M^{\prime}}$ and $\left\{\mathbf{h}^{\mathrm{p}}_{i,\mathrm{rec}}\right\}_{i=1}^{\rho M}$, respectively, where $M^{\prime}$ denotes the number of points in the ligand surface cloud $\mathbf{X}^{s}_{\mathrm{lig}}$. The attention score is then computed as

$$e_{ij}^{(l)}=\frac{\left({\mathbf{h}^{\mathrm{p}}_{i,\mathrm{rec}}}^{(l)}\mathbf{W}_{Q}\right)\left({\mathbf{h}^{\mathrm{p}}_{j,\mathrm{lig}}}^{(l)}\mathbf{W}_{K}\right)^{\top}}{\sqrt{\phi_{\mathrm{p}}}}. \tag{1}$$

Here, we omit the geometric structural embedding term $\mathbf{r}_{ij}$ since the relative distances between ligands and receptors are typically unknown. This module quantifies the influence of a ligand patch $\mathbf{h}^{\mathrm{p}}_{j,\mathrm{lig}}$ on a receptor patch $\mathbf{h}^{\mathrm{p}}_{i,\mathrm{rec}}$ by generating attention scores $e_{ij}$ that help identify relevant interactive patch pairs within a complex. Further details on SurfFormer++ are provided in App. [A.3](https://arxiv.org/html/2606.23830#A1.SS3).
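
A single-head version of the score in Eq. (1) can be sketched as follows. Only the scaled dot-product score comes from the text; the softmax over ligand patches and the value projection `W_v` are the usual Transformer completion and are assumed here, since the full block is deferred to the appendix.

```python
import numpy as np

def cross_attention(h_rec, h_lig, W_q, W_k, W_v):
    """Receptor patches (queries) attend to ligand patches (keys and values).

    h_rec: (rho*M, phi_p), h_lig: (rho*M', phi_p). No geometric bias is added,
    because receptor-ligand distances are unknown before binding.
    """
    q = h_rec @ W_q
    k = h_lig @ W_k
    v = h_lig @ W_v
    e = q @ k.T / np.sqrt(h_rec.shape[-1])     # Eq. (1): scores e_ij
    e = e - e.max(axis=1, keepdims=True)       # stabilize the softmax
    attn = np.exp(e)
    attn /= attn.sum(axis=1, keepdims=True)    # normalize over ligand patches
    return attn @ v, attn
```

Row `i` of `attn` shows which ligand patches most influence receptor patch `i`, which is the sense in which the scores highlight interacting patch pairs.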

### 2\.3\.Pretraining on Molecular Surfaces

By relying less on annotation, self-supervised learning has significantly advanced domains such as language, vision, and life sciences. Recently, Surface-VQMAE (Wu and Li, [2024](https://arxiv.org/html/2606.23830#bib.bib177)) applied masked autoencoding (MAE) (He et al., [2022](https://arxiv.org/html/2606.23830#bib.bib49)) to molecular surfaces, randomly masking a portion of surface patches and using an autoencoder to reconstruct surface features. We extend this framework by introducing carefully designed recovery targets.

#### Masking and tokenization\.

Surface patches are masked independently to account for patch overlap, with a masking ratio $\delta$. Masked and visible patch coordinate sets are denoted as $\mathbf{X}^{\mathrm{p,m}}$ and $\mathbf{X}^{\mathrm{p,vis}}$, respectively. In line with empirical practice, relatively high masking ratios ($\delta\geq 50\%$) are used without degrading performance. Instead of a shared mask embedding, masked patch tokens are replaced by latent code embeddings using a vector-quantized (VQ) formulation. We use the discrete VAE paradigm (Van Den Oord et al., [2017](https://arxiv.org/html/2606.23830#bib.bib200); Ramesh et al., [2021](https://arxiv.org/html/2606.23830#bib.bib50)) to establish a codebook $\mathcal{Q}=\{\mathbf{e}(i)\}_{i=1}^{N_{B}}$, which contains a group of embeddings $\mathbf{e}(i)\in\mathbb{R}^{\phi_{\mathrm{p}}}$, and sample latent patch representations $\mathbf{z}^{\mathrm{p,m}}_{i}$ via a Gumbel-Softmax relaxation (Jang et al., [2016](https://arxiv.org/html/2606.23830#bib.bib170)). This relaxed posterior allows uncertainty to be expressed over masked tokens while remaining fully differentiable. When $N_{B}=1$, the formulation reduces to the standard MAE setup.
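
The relaxed sampling of latent codes can be sketched as below. How the logits over the codebook are produced is not specified in this section, so `logits` is treated as a given input, and the temperature `tau` is an illustrative default.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                      # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)
    p = np.exp(y)
    return p / p.sum(axis=-1, keepdims=True)

def sample_latents(logits, codebook, tau=1.0, rng=None):
    """Mix codebook embeddings with Gumbel-Softmax weights.

    logits: (n_masked, N_B), codebook: (N_B, phi_p) -> z: (n_masked, phi_p).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = gumbel_softmax(logits, tau, rng)
    return w @ codebook, w
```

With a single codebook entry the weights are identically one, so every masked patch receives the same embedding, which is the shared mask token of a standard MAE.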

#### Geometric Decoding Targets\.

Visible patch embeddings and sampled latent codes are merged as ${\mathbf{H}^{\mathrm{p}}}^{(0)}=\mathbf{H}^{\mathrm{p,vis}}\oplus\mathbf{Z}^{\mathrm{p,m}}$ and processed by SurfFormer and lightweight decoders, producing final patch representations ${\mathbf{H}^{\mathrm{p}}}^{(L_{2})}$. Prediction targets are defined over masked patches and include point-level statistics and surface geometry. In particular, masked patch coordinates are reconstructed using a simple MLP head (Li et al., [2023](https://arxiv.org/html/2606.23830#bib.bib197); Chen et al., [2023](https://arxiv.org/html/2606.23830#bib.bib58)), and surface curvature descriptors are predicted based on local covariance analysis (Mitra and Nguyen, [2003](https://arxiv.org/html/2606.23830#bib.bib196); Tian et al., [2023](https://arxiv.org/html/2606.23830#bib.bib63)). These geometric targets are invariant to rigid transformations.
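
One common curvature proxy from local covariance analysis is the surface variation, the smallest covariance eigenvalue divided by the sum of all three. The paper does not state which descriptor it uses, so the sketch below is an assumed stand-in that illustrates why such targets are invariant to rigid motions.

```python
import numpy as np

def surface_variation(patch):
    """Smallest eigenvalue share of the patch covariance; patch has shape (K_p, 3).

    Close to 0 for a flat patch, and at most 1/3 for an isotropic point set.
    """
    centered = patch - patch.mean(axis=0)      # removes the effect of translation
    cov = centered.T @ centered / len(patch)
    eig = np.linalg.eigvalsh(cov)              # ascending; unchanged by rotation
    return eig[0] / max(eig.sum(), 1e-12)
```

Centering cancels translations and eigenvalues are unaffected by rotations, so the same target value is obtained for any pose of the protein.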

#### Physicochemical Decoding Targets

In addition to geometry, we propose another group of chemical features (Leem et al., [2022](https://arxiv.org/html/2606.23830#bib.bib160)) as the pretraining targets, denoted as $\boldsymbol{\varrho}$, including *hydrogen bond acceptor potential and proton donors* and *hydropathy*. Specifically, the locations of free electrons and potential hydrogen-bond donors on the molecular surface were computed using a hydrogen-bond potential as a reference. Vertices on the molecular surface whose closest atom is a polar hydrogen, nitrogen, or oxygen were considered potential donors or acceptors in hydrogen bonds. Then, a value from a Gaussian distribution was assigned to each vertex depending on the orientation between the heavy atoms. These values range from $-1$, which represents the optimal position for a hydrogen bond acceptor, to $+1$, which represents the optimal position for a hydrogen bond donor. At the same time, each vertex was assigned a hydropathy scalar value according to the Kyte and Doolittle scale of the amino acid identity of the atom closest to the vertex. These values, in the original scale, ranged from $-4.5$ (most hydrophilic) to $+4.5$ (most hydrophobic) and were then normalized to $[-1,1]$. Notably, we employ three lightweight MLPs to decode these physicochemical targets, thereby forcing the encoder to embed more semantic information in the surface point clouds.
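
The hydropathy target is simple enough to write down directly. The table holds the published Kyte and Doolittle values; dividing by 4.5 is the obvious way to map the range $[-4.5, 4.5]$ onto $[-1, 1]$ and is assumed to match the normalization described above. The function name and the three-letter residue keys are illustrative.

```python
# Kyte-Doolittle hydropathy index per amino acid (three-letter codes).
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}

def normalized_hydropathy(residue_name):
    """Map a residue to a hydropathy value in [-1, 1] (1 = most hydrophobic)."""
    return KYTE_DOOLITTLE[residue_name] / 4.5
```

Each surface vertex would then inherit the value of the residue that owns its closest atom.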

#### Training Losses\.

The overall training objective consists of four parts: the typical losses to recover the coordinates, curvatures, and chemical features for each surface patch, as well as the Kullback-Leibler (KL) divergence to approximate the desired latent distribution $p(\cdot)$. Formally, the total loss $\mathcal{L}$ is

$$\mathcal{L}=\nu_{1}\mathcal{L}_{\mathrm{rec}}\left(\mathbf{X}^{\mathrm{p,m}},\hat{\mathbf{X}}\right)+\nu_{2}\mathcal{L}_{\mathrm{cur}}\left(\boldsymbol{\psi},\hat{\boldsymbol{\psi}}\right)+\nu_{3}\mathcal{L}_{\mathrm{chem}}\left(\boldsymbol{\varrho},\hat{\boldsymbol{\varrho}}\right)+\nu_{4}\mathcal{L}_{\mathrm{KL}}\left(q\left(\mathbf{Z}^{\mathrm{p,m}}\mid\mathbf{H}^{\mathrm{p,m}}\right),p\left(\mathbf{Z}^{\mathrm{p,m}}\right)\right),$$

where $\{\nu_{i}\}_{i=1}^{4}$ are pre-defined hyperparameters to balance the weights of different loss terms. $p(\cdot)$ is the prior on the latent space and is usually initialized to a uniform distribution over all codebook vectors. $\mathcal{L}_{\mathrm{cur}}(\cdot)$ and $\mathcal{L}_{\mathrm{chem}}(\cdot)$ are both supervised via a root mean squared error (RMSE).
Meanwhile, the reconstruction loss $\mathcal{L}_{\mathrm{rec}}(\cdot)$ is formulated using the $l_{2}$-norm Chamfer distance (Fan et al., [2017](https://arxiv.org/html/2606.23830#bib.bib48)) as

$$\mathcal{L}_{\mathrm{rec}}=\frac{1}{\delta\rho MK_{\mathrm{p}}}\sum_{i=1}^{\delta\rho M}\left(\sum_{a\in\hat{\mathbf{X}}_{i}}\min_{b\in\mathbf{X}^{\mathrm{p,m}}_{i}}\|a-b\|^{2}_{2}+\sum_{b\in\mathbf{X}^{\mathrm{p,m}}_{i}}\min_{a\in\hat{\mathbf{X}}_{i}}\|a-b\|^{2}_{2}\right).$$
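
The Chamfer reconstruction term can be sketched in NumPy as follows. The function names are illustrative, and the dense pairwise distance matrix is a simplification that suits the small patch size $K_{\mathrm{p}}$.

```python
import numpy as np

def chamfer_l2(pred, target):
    """Symmetric squared-L2 Chamfer distance between two point sets (K, 3)."""
    d2 = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    # Each predicted point to its nearest target, plus each target to its nearest prediction.
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()

def reconstruction_loss(pred_patches, target_patches):
    """Average Chamfer distance over masked patches.

    Both inputs have shape (n_masked, K_p, 3); the normalizer n_masked * K_p
    corresponds to delta * rho * M * K_p in the formula.
    """
    n_masked, k_p = target_patches.shape[:2]
    total = sum(chamfer_l2(p, t) for p, t in zip(pred_patches, target_patches))
    return total / (n_masked * k_p)
```

Because the distance is computed between unordered sets, the decoder is free to emit the points of a patch in any order.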

### 2\.4\.Coarse\-to\-fine Interface Prediction

#### Point Feature Propagation\.

In the patch partition module, the original point set is subsampled. However, epitope discovery requires a prediction for every original surface point. Therefore, we upsample surface patches to gradually restore the fine-grained representations of the complete surface point cloud.

To this end, we employ a technique inspired by PointNet++ (Qi et al., [2017](https://arxiv.org/html/2606.23830#bib.bib54)), wherein features are propagated from subsampled patches to the initial points. It is realized by interpolating feature values of the $\rho M$ surface patches at the coordinates of the $M$ surface points. Here, we adopt the inverse geodesic distance weighted average based on $K_{\mathrm{in}}=3$ nearest neighbors for this interpolation and attain:

$$\mathbf{h}_{i}^{s,\mathrm{in}}=\frac{\sum_{j=1}^{K_{\mathrm{in}}}\mathrm{d}_{ij}^{-2}\,{\mathbf{h}_{j}^{\mathrm{p}}}^{(L_{2})}}{\sum_{j=1}^{K_{\mathrm{in}}}\mathrm{d}_{ij}^{-2}},\quad i=1,\dots,M. \tag{2}$$

The interpolated features on the $M$ points are then concatenated with skip-linked features from the point cloud network as $\mathbf{h}_{i}^{s,\mathrm{in}}\oplus{\mathbf{h}^{s}_{i}}^{(L_{1})}\in\mathbb{R}^{2\phi_{\mathrm{p}}}$. A few shared fully-connected and ReLU layers are applied to update each surface point’s feature vector, resulting in ${\mathbf{h}_{i}^{s}}^{\prime}$. The total loss $\mathcal{L}_{\mathrm{ep}}$ is a weighted sum of the coarse-scale (*i.e.*, patch-level) $\mathcal{L}_{\mathrm{p}}$ and the fine-scale (*i.e.*, point-level) $\mathcal{L}_{\mathrm{s}}$ with a balance term $\zeta$ as:

$$\begin{split}\mathcal{L}_{\mathrm{ep}}&=\mathcal{L}_{\mathrm{s}}+\zeta\mathcal{L}_{\mathrm{p}}\\&=\mathrm{BCELoss}\left(\mathbf{Y}^{\mathrm{s}},\mathrm{MLP}\left({\mathbf{H}^{s}}^{\prime}\right)\right)+\zeta\,\mathrm{BCELoss}\left(\mathbf{Y}^{\mathrm{p}},\mathrm{MLP}\left({\mathbf{H}^{\mathrm{p}}}^{(L_{2})}\right)\right),\end{split}\tag{3}$$

where $\mathbf{Y}^{\mathrm{p}}\in\mathbb{R}^{\rho M}$ and $\mathbf{Y}^{\mathrm{s}}\in\mathbb{R}^{M}$ are the coarse- and fine-grained ground truth epitope labels, respectively. A binary cross-entropy loss function (BCELoss) is utilized for supervision. Notably, we adopt soft labels for $\mathbf{Y}^{\mathrm{p}}$ to indicate the degree or likelihood of a surface patch being part of the epitope, computed as the ratio of epitope points in each surface patch: $\mathbf{y}^{\mathrm{p}}_{i}=\frac{1}{K_{\mathrm{p}}}\sum_{\mathbf{x}_{j}^{s}\in\mathbf{x}^{\mathrm{p}}_{i}}\mathbf{y}^{s}_{j}$. Due to the class imbalance issue (*i.e.*, non-epitope points far outnumber epitope points), we also investigated the focal loss (Lin et al., [2017](https://arxiv.org/html/2606.23830#bib.bib42)) to focus learning on hard misclassified examples, but observed no significant improvements.
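
The interpolation of Eq. (2), the soft patch labels, and the two-level loss of Eq. (3) can be sketched together. The sketch takes a precomputed point-to-patch distance matrix `d` as input (the paper uses approximate geodesic distances; any distance can be substituted here), and `eps`, `zeta`, and all function names are illustrative assumptions.

```python
import numpy as np

def propagate_features(d, h_patch, k_in=3, eps=1e-8):
    """Eq. (2): inverse squared-distance average over the k_in nearest patches.

    d: (M, n_patches) distances, h_patch: (n_patches, phi_p) -> (M, phi_p).
    """
    idx = np.argsort(d, axis=1)[:, :k_in]
    d_nn = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (d_nn ** 2 + eps)
    w /= w.sum(axis=1, keepdims=True)
    return (w[..., None] * h_patch[idx]).sum(axis=1)

def soft_patch_labels(point_labels, patch_idx):
    """Fraction of epitope points in each patch; patch_idx has shape (n_patches, K_p)."""
    return point_labels[patch_idx].mean(axis=1)

def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy between targets y and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())

def epitope_loss(y_s, p_s, y_p, p_p, zeta=1.0):
    """Eq. (3): point-level loss plus zeta times the patch-level loss."""
    return bce(y_s, p_s) + zeta * bce(y_p, p_p)
```

Binary cross-entropy accepts fractional targets, so the soft patch labels can be used without thresholding.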

#### Extension to Multi\-scale Surface Representations

Our approach can be extended to $S$ scales ($S>2$) for hierarchical learning by a successive surface patch partition strategy. Specifically, we regard $\mathbf{X}^{s}$ as the 0-scale. Then, for the $i$-th scale ($1\leq i\leq S$), the center points $\mathbf{X}^{\mathrm{c}_{i}}\in\mathbb{R}^{\prod_{j=1}^{i}\rho_{j}M\times 3}$ can be produced by repeatedly applying FPS with a downsampling ratio of $\rho_{i}$ (typically $\rho_{i}\leq\rho_{i+1}$), and the corresponding patches $\mathbf{X}^{\mathrm{p}_{i}}\in\mathbb{R}^{\prod_{j=1}^{i}\rho_{j}M\times K_{\mathrm{p}}\times 3}$ are obtained by aggregating the neighboring $K_{\mathrm{p}}$ points. Consequently, by recursively back-projecting, the masked and visible patches of all scales are acquired, denoted as $\{\mathbf{X}^{\mathrm{p}_{i},\mathrm{m}},\mathbf{X}^{\mathrm{p}_{i},\mathrm{vis}}\}_{i=1}^{S}$. The coarse-scale loss then sums over all scales as $\sum_{i=1}^{S}\mathrm{BCELoss}\left(\mathbf{Y}^{\mathrm{p}_{i}},\mathrm{MLP}\left({\mathbf{H}^{\mathrm{p}_{i}}}^{(L_{2})}\right)\right)$.
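
The number of patch centers at each scale follows directly from the product of downsampling ratios. The helper below is a small illustration; its name and the rounding rule are assumptions.

```python
def scale_sizes(n_points, ratios):
    """Number of patch centers at scales 1..S when FPS is applied successively.

    Scale i keeps prod_{j<=i} rho_j * M centers (rounded, at least one).
    """
    sizes, n = [], n_points
    for rho in ratios:
        n = max(1, int(round(n * rho)))
        sizes.append(n)
    return sizes
```

For example, a surface with 1024 points and ratios 0.25, 0.5, 0.5 keeps 256, 128, and 64 centers at the three scales.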

## 3\.Results

### 3\.1\.Binding Site Prediction

#### Data Preprocessing

The pretraining data for SurfBind were derived from PDB\-REDO \(Joosten et al\., [2014](https://arxiv.org/html/2606.23830#bib.bib119)\)\. The SAbDab database \(Dunbar et al\., [2014](https://arxiv.org/html/2606.23830#bib.bib86)\), as of 23 September 2023, was used to evaluate BCE predictions\. We retained X\-ray crystal structures of Ab\-ag complexes with protein antigens and a resolution of 3\.0Å or better\. We deleted samples that lacked antigens or had incomplete antibodies, including those missing a heavy or light chain\. We also removed invalid entries with antigen chain lengths \<10, yielding 5,531 complex structures\. The remaining structures were clustered into 658 groups at 30% antigen sequence identity using MMseqs2 \(Steinegger and Soding, [2017](https://arxiv.org/html/2606.23830#bib.bib64)\)\. These clusters were then split into training, validation, and test sets \(approximately 80%, 10%, and 10%, respectively\), taking sequence identity, antigen species, and binder count into account\. To be specific, most antigens in SAbDab have a single unique antibody, whereas antigens such as the HIV gp120 glycoprotein and the SARS\-CoV\-2 spike protein have many known binders whose sequences are highly dissimilar\. To assess the specificity of Ab\-ag binding sites, clusters of single\-binder antigens were randomly selected as validation and test samples based on the super\-clusters of antigen sequences, with the remaining samples used for training\. Meanwhile, the super\-clusters of multi\-binder antigens were randomly split into training, validation, and test sets in proportions of 40%, 30%, and 30%, respectively\. This split yielded 4,572, 548, and 411 samples for training, validation, and testing, respectively\. For label computation, an antigen amino acid is labeled as BCE if at least one of its heavy atoms is within 4Å of a heavy atom of the antibody within the biological assembly \(Tubiana et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib183)\)\.
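
The labeling rule at the end of this paragraph can be written down directly\. The sketch below assumes that heavy\-atom coordinates have already been parsed into plain Python containers \(the residue identifiers and coordinates are made up\) and uses a brute\-force distance check rather than a spatial index\.

```python
def label_epitope_residues(antigen, antibody_atoms, cutoff=4.0):
    """Label antigen residues by heavy-atom distance to the antibody.

    antigen: {residue_id: [(x, y, z), ...]} heavy atoms per residue (hypothetical layout).
    antibody_atoms: list of (x, y, z) heavy-atom coordinates.
    Returns {residue_id: 1 if any atom pair is within `cutoff` angstroms, else 0}.
    """
    cutoff_sq = cutoff * cutoff
    labels = {}
    for res_id, atoms in antigen.items():
        close = any(
            sum((a - b) ** 2 for a, b in zip(atom, ab_atom)) <= cutoff_sq
            for atom in atoms
            for ab_atom in antibody_atoms
        )
        labels[res_id] = int(close)
    return labels

# Toy complex: residue A:10 has an atom 2.5 angstroms from the antibody, A:11 does not.
antigen = {
    "A:10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:11": [(10.0, 0.0, 0.0)],
}
antibody_atoms = [(4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
labels = label_epitope_residues(antigen, antibody_atoms)
```

For real structures, a k\-d tree or grid lookup would replace the double loop; the rule itself stays the same\.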

#### Baselines and Evaluation Metrics

We compared our method with prior approaches in three categories: sequence\-based, structure\-based, and surface\-based\. Notably, major BCE prediction methods, including CBTOPE \(Ansari and Raghava, [2010](https://arxiv.org/html/2606.23830#bib.bib27)\), BepiPred\-2\.0 \(Jespersen et al\., [2017](https://arxiv.org/html/2606.23830#bib.bib19)\), BepiPred\-3\.0 \(Clifford et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib17)\), Seppa\-3\.0 \(Zhou et al\., [2019](https://arxiv.org/html/2606.23830#bib.bib29)\), Epitope3D \(da Silva et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib38)\), ScanNet \(Tubiana et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib183)\), PeSTo \(Krapp et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib26)\), SEMA\-2\.0 \(Shashkova et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib31)\), DiscoTope\-3\.0 \(Hoie et al\., [2024](https://arxiv.org/html/2606.23830#bib.bib28)\), MaSIF \(Gainza et al\., [2020](https://arxiv.org/html/2606.23830#bib.bib66)\), and dMaSIF \(Sverrisson et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib68)\), were designed and trained in a partner\-agnostic way and predict unified binding sites, namely, all potential epitopes\. In contrast, approaches such as AF\-Multimer \(Evans et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib30)\), Pair\-EGRET \(Alam et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib181)\), WALLE \(Liu et al\., [2024](https://arxiv.org/html/2606.23830#bib.bib3)\), and SEPPA\-mAb \(Qiu et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib25)\) allow antibody\-specific BCE discovery\. Surface\-based algorithms such as SurfBind output predictions on meshes or point clouds, whereas the others produce residue\-level predictions\. To ensure a fair comparison across representation scales, we mapped all surface\-based predictions to the residue level using a simple nearest\-neighbor rule\. Specifically, the prediction score for any residue was set equal to the score of its closest surface point, patch, or mesh face\. Additionally, following da Silva et al\. \([2022](https://arxiv.org/html/2606.23830#bib.bib38)\); Cia et al\. \([2023](https://arxiv.org/html/2606.23830#bib.bib22)\), we disregarded buried residues and only considered residues close to the surface, using a relative solvent accessible surface area \(RSA\) threshold of 15%\. This filtering focuses the BCE task on the residues most relevant to antibody binding\. More details are in App\.[A](https://arxiv.org/html/2606.23830#A1)\.
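
A minimal sketch of the nearest\-neighbor mapping and the RSA filter follows\. It assumes one representative coordinate per residue and precomputed RSA values; both are hypothetical inputs, since the residue anchor atom and the RSA tool are not specified in this section\. Keeping residues with RSA at or above the threshold is likewise an assumption about how the 15% cut is applied\.

```python
def residue_scores_from_surface(residue_coords, surface_points, surface_scores):
    """Assign each residue the score of its nearest surface point."""
    out = {}
    for res_id, rc in residue_coords.items():
        nearest = min(
            range(len(surface_points)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rc, surface_points[i])),
        )
        out[res_id] = surface_scores[nearest]
    return out

def keep_exposed(scores, rsa, threshold=0.15):
    """Drop buried residues, i.e. those with relative solvent accessibility below threshold."""
    return {r: s for r, s in scores.items() if rsa[r] >= threshold}

# Toy inputs (made-up coordinates, scores, and RSA values).
residue_coords = {"A:1": (0.0, 0.0, 0.0), "A:2": (5.0, 0.0, 0.0), "A:3": (9.0, 0.0, 0.0)}
surface_points = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (8.0, 0.0, 0.0)]
surface_scores = [0.9, 0.2, 0.6]
rsa = {"A:1": 0.40, "A:2": 0.05, "A:3": 0.15}

scores = residue_scores_from_surface(residue_coords, surface_points, surface_scores)
exposed = keep_exposed(scores, rsa)  # the buried residue A:2 is removed
```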

Table 1\. Performance of various algorithms for BCE discovery, where AB\-S is the abbreviation of antibody\-specific\.

![Refer to caption](https://arxiv.org/html/2606.23830v1/x2.png)

Figure 2\. \(a, b\) BCE vs\. non\-BCE ratio distributions based on unsupervised SurfBind representations using different codebook sizes \(10, 1,000, and 10,000\), ordered by descending epitope ratio\. Bar plots display the number of patches assigned to each cluster\. Panel \(a\) used only point coordinate reconstruction as the pretext task, whereas panel \(b\) added the surface geometry and chemical property prediction tasks for pretraining\. \(c\) Average residue type preferences of representative patch clusters over forty multi\-binder antigens\. Background color ranges from blue \(hydrophobic residues\) to red \(hydrophilic residues\) based on Kyte\-Doolittle hydropathy scales\. \(d\) Examples of the point coordinate reconstruction pretext task\. The grey and blue point clouds are the ground\-truth and predicted surfaces, respectively\. \(e\) Examples of the hydrogen bond donor/acceptor prediction pretext task, where colors indicate values from \-1 to 1\. \(f\) Examples of the hydrophobicity prediction pretext task, where colors show values from \-4\.5 \(hydrophilic\) to 4\.5 \(hydrophobic\)\.

#### Quantitative Comparison for BCE Discovery

Tab\.[1](https://arxiv.org/html/2606.23830#S3.T1) and Fig\.[3](https://arxiv.org/html/2606.23830#S3.F3)a document the main results\. SurfBind achieves the best overall performance on the primary metrics, with an AUC\-PR of 0\.305 and an F1 score of 0\.429; we treat these as primary because they balance precision and recall\. This represents improvements of 75\.38%, 66\.86%, and 60\.55% over the antibody\-specific methods AF\-Multimer, Pair\-EGRET, and SEPPA\-mAb, respectively\. Notably, the AUC\-PRs of the classic surface\-based models MaSIF and dMaSIF were only 0\.120 and 0\.136, respectively, far below SurfBind\. Additionally, AF\-Multimer, while highly effective for structure prediction, is not optimized for epitope localization and performs poorly under this evaluation protocol, with an AUC\-ROC of 0\.698\. This aligns with a recent study \(Polonsky et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib41)\), which concluded that AF\-Multimer was limited in its ability to predict Ab\-ag complexes and to map epitopes\. Potential reasons for this failure include the lack of paired MSAs, low sensitivity to antibody sequences, and unimodal rather than multimodal prediction \(Tubiana et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib183)\)\. In general, sequence\-based algorithms \(average AUC\-ROC = 0\.586\) were inferior to structure\-based models \(average AUC\-ROC = 0\.677\), highlighting the importance of complementary 3D information for BCE prediction\. It is also noteworthy that some of the latest structure\-based methods, such as DiscoTope\-3\.0, have incorporated PLM embeddings \(Evans et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib30)\) and have observed significant improvements\. Integrating knowledge from PLMs is therefore expected to further enhance SurfBind \(Wu et al\., [2023b](https://arxiv.org/html/2606.23830#bib.bib143)\)\.

![Refer to caption](https://arxiv.org/html/2606.23830v1/x3.png)

Figure 3\. \(a\) Performance comparison for antibody\-specific BCE prediction on SAbDab\. \(b\) Case study visualization of four protein complexes\. Different antibodies can bind to entirely different regions of the same antigen \(grey structures\)\. Predicted epitopes \(red\) and non\-epitopes \(purple\) are shown on the antigen point clouds\. \(c\) Explainability of SurfBind attention\. Attention score maps were highly negatively correlated with distance maps between antigen residues \(rows\) and antibody residues \(columns\), where antigen residues were sorted by their minimum distance to the antibody\. \(d\) Hyperparameter search\. We reported fine\-tuned AUC\-ROCs across various masking ratios, downsampling ratios, and codebook sizes, and AUC\-PRs across different binding\-site distance thresholds\. \(e\) Ablation studies\. We documented the performance gains achieved by different pretraining techniques and auxiliary construction targets\. \(f\) Influence of conformational changes on SurfBind\. Bar plots show results on bound, unbound, and predicted antigen structures from DB5\.5\.

#### Binder\-awareness of SurfBind

In addition to the overall quantitative metrics, we investigated SurfFormer’s ability to distinguish different antibody binding sites on the same antigen\. We selected four representative antigens from the SAbDab\-test set, each with multiple partner antibodies that bind to significantly different regions \(see Fig\.[3](https://arxiv.org/html/2606.23830#S3.F3)b\)\. The gametocyte surface protein \(GSP\) is important for male/female gamete fusion and exflagellation, and it interacts with host erythrocytes\. Two antibodies targeting GSP, in PDB 7ubs and 7ua2, had different lengths of complementarity\-determining regions \(CDRs\) in the variable fragment \(Fv\), with 55\.8% sequence identity\. They bound to opposite sides of GSP, and SurfBind successfully located their distinct binding sites\. Tumor necrosis factor receptors \(TNFRs\) are a family of structurally similar membrane proteins that mediate signaling, activating cell death pathways or inducing the expression of genes involved in cellular differentiation and survival\. The sword\-shaped TNFR bound two different antibodies, in PDB 6mhr and 6mi2, with 60\.3% Fv sequence identity\. SurfBind accurately captured the differences in their CDRs and predicted the associated BCEs\. A more challenging case was ADP\-ribosyl cyclase, a bifunctional enzyme catalyzing an essential chemical reaction\. Three SAbDab antibodies, in PDB 7duo, 3l5w, and 8byu, bound this enzyme, and SurfBind perfectly distinguished their specific binding interfaces\. Another human macromolecule, Interleukin 13 \(IL\-13\), is a protein encoded by the IL\-13 gene that affects immune cells in a manner similar to IL\-4\. Three antibodies with different binding modes to IL\-13 were tested\. SurfBind correctly recognized the BCE for the antibody in PDB 7rew\. However, because this antibody had over 70% Fv sequence similarity to the others, SurfBind yielded a more unified binding area for the antibodies in PDB 8blq and 3l5w\.

#### Explainability and Ablation Studies

The explainability of attention \(Vaswani et al\., [2017](https://arxiv.org/html/2606.23830#bib.bib100)\) provides a means to interpret the interaction patterns learned by SurfBind by comparing attention scores with geometric distributions in ground\-truth Ab\-ag complex structures\. Fig\.[3](https://arxiv.org/html/2606.23830#S3.F3)c visualizes four randomly selected Ab\-ag pairs, for which we drew the cross\-attention score maps and the distance maps between ligand and receptor residues\. Higher attention scores usually coincide with smaller minimum distances to the antibody, with a mean Spearman correlation of \-0\.427 between scores and distances\. This indicates that SurfBind captured the spatial relationship between the antibody and antigen, which informs its binding decision\.
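
The rank correlation used in this analysis can be reproduced with a few lines of standard\-library Python\. The sketch below is illustrative only: the attention and distance values are invented, and it correlates two flat lists, whereas how the authors aggregate the two\-dimensional maps is not specified here\.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-residue attention scores and minimum distances (angstroms).
attention = [0.50, 0.30, 0.15, 0.05]
min_distance = [3.2, 6.0, 5.1, 14.8]
rho = spearman(attention, min_distance)  # negative: high attention, short distance
```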

We conducted comprehensive ablation studies to assess the contributions of SurfBind components and the effects of key hyperparameters \(see Fig\.[3](https://arxiv.org/html/2606.23830#S3.F3)d and e\)\. First, each technique incrementally improved the AUC\-ROC on the SAbDab dataset\. The SurfFormer architecture alone achieved a competitive AUC\-ROC of 0\.768, outperforming the DiscoTope\-3\.0 baseline \(AUC\-ROC = 0\.751\)\. Vanilla MAE\-based generative pretraining improved the AUC\-ROC to 0\.802, and incorporating the VQ technique yielded the best AUC\-ROC of 0\.816\. Moreover, the chemical property pretext task increased the AUC\-ROC by 0\.033, underscoring the necessity of physicochemical knowledge to interpret PPIs\. Second, the optimal masking ratio was relatively high \(50%\-70%\), as higher patch removal rates largely eliminated redundancy, making the pretext task challenging and hard to solve by extrapolation from visible neighboring surface patches\. We also observed that SurfBind was robust across downsampling ratios from 0\.01 to 0\.1 and benefited from a wider range of discrete latent vectors\. Third, as the distance threshold defining BCEs was expanded, AUC\-PR rose consistently, *e\.g\.*, from 0\.305 to 0\.578 at 10Å, which we attribute to AUC\-PR’s higher sensitivity to class imbalance than AUC\-ROC\. This trend reflects task relaxation rather than improved localization precision\.

#### Robustness to Binders’ Conformational Changes

Proteins undergo conformational changes coupled with ligand binding\. These transitions occur across various length and time scales associated with functionally relevant phenomena\. Here, we examined how these conformational changes affect our model’s generalization on the Docking Benchmark \(DB\) 5\.5 dataset \(Vreven et al\., [2015](https://arxiv.org/html/2606.23830#bib.bib179)\), which comprises high\-quality protein\-protein complex structures along with their unbound component forms\. The dataset was categorized by interface root\-mean\-squared deviation \(I\-RMSD\) between unbound and bound forms into rigid\-body \(162 cases\), medium difficulty \(60 cases\), and difficult \(35 cases\) subsets\. We designated DB5\.5 as the test set and used SAbDab as the training split, excluding any duplicates at a 50% antigen sequence identity threshold to avoid overlap\. We then retrained SurfBind and examined its efficacy, making several notable findings in Fig\.[3](https://arxiv.org/html/2606.23830#S3.F3)f\. First, without pretraining, SurfBind experienced a 6\.7% overall performance decline when transferring from bound to unbound structures\. In particular, the loss was 15\.6% for highly flexible antigens \(average I\-RMSD = 3\.48Å\)\. Second, sites were easier to recognize for antigens that experienced smaller conformational changes\. SurfBind attained an AUC\-ROC of 0\.886 for rigid\-body cases, compared with 0\.793 for highly flexible ones\. Third, pretraining significantly improved SurfBind’s robustness to unbound structures, increasing the AUC\-ROC from 0\.759 to 0\.794\. We hypothesize that, because the pretraining dataset contained numerous monomers, SurfBind learned the surface distributions of unbound forms, which reduced the transfer\-learning gap\.

#### Docking with Predicted Binding Sites

To assess whether improved binding site prediction translates into practical gains in downstream modeling, we evaluated the effect of SurfBind\-predicted epitopes on antibody–antigen docking accuracy\. Following the DockGPT protocol \(McPartlon and Xu, [2023](https://arxiv.org/html/2606.23830#bib.bib10)\), we incorporated predicted binding sites as spatial constraints during docking and compared against blind docking and ground\-truth epitope guidance\. As detailed in App\.[C](https://arxiv.org/html/2606.23830#A3), SurfBind\-guided docking improves the DockQ success rate from 26\.1% to 38\.0% and reduces both interface and ligand RMSD relative to blind docking, recovering a substantial fraction of the performance achieved with native epitopes\. These results demonstrate that SurfBind predictions are accurate enough to meaningfully constrain docking and improve structural modeling outcomes\.

### 3\.2\.SurfBind is a Good Unsupervised Learner

Due to the scarcity of experimentally determined structures, PLMs have been pretrained on large unlabeled protein sequence collections such as UniProt \(Lin et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib102)\)\. Despite efforts to pretrain on unlabeled 3D structures \(Wu et al\., [2022a](https://arxiv.org/html/2606.23830#bib.bib69)\), there has been a notable lack of initiatives targeting protein molecular surfaces, which directly affect biomolecular interactions and function\. SurfBind bridges this gap by introducing a generalized MAE variant tailored for molecular surfaces\. Unlike conventional MAE, which uses a single embedding for masked tokens, SurfBind uses discrete variables to parameterize the representations of masked surface patches\. Each masked patch’s representation is replaced by the closest codebook vector in Euclidean space before being fed into the decoder\. This VQ offers two key advantages: \(1\) Following signal processing principles, VQ achieves substantial data compression with minimal loss of geometric and physicochemical surface information\. \(2\) Replacing individual data points with representative codebook vectors reduces noise, resulting in smoother and more robust patch\-level representations\. In addition to VQ, SurfBind introduces two higher\-order pretext tasks beyond coordinate reconstruction: restoring critical surface geometric properties and chemical properties, both computed efficiently from the compact molecular surface boundary atoms \(Fig\.[2](https://arxiv.org/html/2606.23830#S3.F2)d, e, and f\)\.
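
The nearest\-codebook replacement described above can be sketched as follows\. This is a toy illustration in standard\-library Python, not the SurfBind implementation: the two\-dimensional embeddings, the three\-entry codebook, and the function names are assumptions, and codebook training is not shown\.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook vector nearest in Euclidean distance."""
    idx = min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])),
    )
    return idx, codebook[idx]

def quantize_masked(patch_embeddings, masked, codebook):
    """Replace only masked patch embeddings by their nearest code; keep visible ones."""
    out, assignments = [], {}
    for i, emb in enumerate(patch_embeddings):
        if i in masked:
            k, code = quantize(emb, codebook)
            out.append(list(code))
            assignments[i] = k
        else:
            out.append(list(emb))
    return out, assignments

# Hypothetical codebook with 3 entries and 3 patch embeddings; patches 0 and 2 are masked.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
embeddings = [[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]]
decoder_input, assignments = quantize_masked(embeddings, masked={0, 2}, codebook=codebook)
```

With a single codebook entry every masked patch would receive the same vector, which corresponds to the shared mask token of a conventional MAE\.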

#### Unsupervised Representations for BCE Discovery

In SurfBind, input surfaces were partitioned into patches, each mapped to a discrete codebook cluster, enabling analysis of shared patch characteristics within clusters before supervised fine\-tuning\. We extracted the representations of all surface patches in the SAbDab\-test set and mapped them to the corresponding codebook vectors and cluster assignments\. We then summarized the proportions of epitopes and non\-epitopes across clusters and visualized their distributions in Fig\.[2](https://arxiv.org/html/2606.23830#S3.F2)\. As shown in Fig\.[2](https://arxiv.org/html/2606.23830#S3.F2)a, nearly 6,000 out of 10,000 codebook clusters had a zero epitope ratio, indicating that patches in those clusters were unlikely to be BCEs\. Conversely, 72 clusters had an epitope ratio exceeding 50%, suggesting that patches in those clusters were more likely to be BCEs\. Incorporating the chemical property pretext task during pretraining \(refer to Fig\.[2](https://arxiv.org/html/2606.23830#S3.F2)b\) further improved the discrimination between BCEs and non\-BCEs based purely on clusters\. Moreover, as the number of codebook clusters increased from 10 to 10,000, BCEs became concentrated in a small fraction of the total number of clusters\. Notably, when raising the number of cluster categories to 10,000, only about 90 patch clusters were observed for the SAbDab\-test set\. Under these circumstances, approximately 50 of 90 clusters lacked epitopes, and 10 clusters were enriched for BCEs\. By using each cluster’s epitope ratio as the predicted score for all patches in that cluster, the unsupervised SurfBind achieved an AUC\-ROC of 0\.695 on SAbDab\-test, competitive with the leading sequence\-based model AF\-Multimer \(AUC\-ROC = 0\.698\) and outperforming the surface\-based MaSIF and dMaSIF baselines\. As discussed, VQMAE generalizes the vanilla MAE by introducing more flexible discrete token vectors in the latent space\. VQMAE reduces to MAE in the extreme case of a single codebook cluster\. Increasing the number of codebook clusters thus provides another perspective on VQ’s effectiveness\.
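
The cluster\-ratio scoring rule can be made concrete with a small example\. The sketch below uses invented cluster assignments and labels, and a pairwise AUC\-ROC written in standard\-library Python; how the authors estimate the ratios and compute the metric in practice is an assumption here\.

```python
from collections import defaultdict

def cluster_epitope_ratios(cluster_ids, labels):
    """Fraction of epitope patches in each cluster."""
    pos, tot = defaultdict(int), defaultdict(int)
    for c, y in zip(cluster_ids, labels):
        pos[c] += y
        tot[c] += 1
    return {c: pos[c] / tot[c] for c in tot}

def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 8 patches assigned to 3 codebook clusters, with binary epitope labels.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels = [1, 1, 0, 0, 0, 1, 0, 0]

ratios = cluster_epitope_ratios(cluster_ids, labels)
scores = [ratios[c] for c in cluster_ids]  # every patch inherits its cluster's ratio
auc = auc_roc(labels, scores)
```

Since all patches in a cluster share one score, the ranking quality depends entirely on how cleanly clusters separate epitope from non\-epitope patches\.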

#### Residue Distributions across Patch Clusters

We analyzed the residue\-type distributions across various patch clusters\. For each surface patch, we associated its cluster assignment with the nearest residue\. We then tallied the occurrences of each residue type within the same patch cluster\. To mitigate bias arising from variations in residue\-type frequencies, residue counts were normalized by the total count for each residue type across all clusters\. This allowed us to derive an “averaged preference score” reflecting the residue distribution across more than 40 multi\-binder antigens \(Fig\.[2](https://arxiv.org/html/2606.23830#S3.F2)c\)\. We found that four patch clusters exhibited distinct preferences across hydropathy levels\. Some clusters were enriched for hydrophobic residues, while others favored hydrophilic residues\. This suggests that during pretraining, SurfBind effectively captured the physicochemical characteristics of molecular surfaces, as reflected in the residue compositions of the learned patch clusters\.
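
The normalization step can be sketched as below\. The example assumes each patch has already been paired with its cluster and nearest residue type; the pairs are invented, and averaging over antigens is left out\.

```python
from collections import Counter, defaultdict

def preference_scores(assignments):
    """Normalize per-cluster residue counts by each residue type's total count.

    assignments: list of (cluster_id, residue_type) pairs, one per surface patch.
    Returns {cluster_id: {residue_type: share of that type's patches in the cluster}}.
    """
    pair_counts = Counter(assignments)
    type_totals = Counter(res for _, res in assignments)
    scores = defaultdict(dict)
    for (cluster, res), n in pair_counts.items():
        scores[cluster][res] = n / type_totals[res]
    return dict(scores)

# Toy data: leucine patches fall mostly in cluster 0, lysine patches mostly in cluster 1.
assignments = [
    (0, "LEU"), (0, "LEU"), (0, "LEU"), (1, "LEU"),
    (0, "LYS"), (1, "LYS"), (1, "LYS"), (1, "LYS"),
]
prefs = preference_scores(assignments)
```

Dividing by the per\-type total means that a frequent residue type does not dominate a cluster merely because it is common on protein surfaces\.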

## 4\.Conclusion

Protein interactions with other biomolecules are fundamental to their function in most biological processes\. This study presents SurfBind, a novel surface\-based structural pretraining method that extracts valuable information from large\-scale collections of unlabeled molecular surfaces\. We demonstrate its effectiveness across several critical and challenging downstream tasks\.

## 5\.Limitations and Ethical Considerations

Our approach depends on the quality of available 3D structures and may underperform for highly flexible proteins or inaccurate structural models\. Predictions are probabilistic and require experimental validation\. This work is intended for responsible scientific research and is not a substitute for clinical or experimental decision\-making\.

## References

- R\. Alam, S\. Mahbub, and M\. S\. Bayzid \(2023\)Pair\-egret: enhancing the prediction of protein\-protein interaction sites through graph attention networks and protein language models\.bioRxiv,pp\. 2023–12\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- H\. R\. Ansari and G\. P\. Raghava \(2010\)Identification of conformational b\-cell epitopes in an antigen from its primary sequence\.Immunome research6\(1\),pp\. 1–9\.Cited by:[§B\.1](https://arxiv.org/html/2606.23830#A2.SS1.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- S\. Basu and B\. Wallner \(2016\)DockQ: a quality measure for protein\-protein docking models\.PloS one11\(8\),pp\. e0161879\.Cited by:[Appendix C](https://arxiv.org/html/2606.23830#A3.p2.1)\.
- S\. N\. H\. Bukhari, M\. A\. Dar, and M\. Shafi \(2021\)Using random forest to predict t\-cell epitopes of dengue virus\.Cited by:[Appendix B](https://arxiv.org/html/2606.23830#A2.p1.1)\.
- S\. N\. H\. Bukhari, A\. Jain, E\. Haq, A\. Mehbodniya, and J\. Webber \(2022\)Machine learning techniques for the prediction of b\-cell and t\-cell epitopes as potential vaccine targets with a specific focus on sars\-cov\-2 pathogen: a review\.Pathogens11\(2\),pp\. 146\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px1.p1.1)\.
- G\. Chen, M\. Wang, Y\. Yang, K\. Yu, L\. Yuan, and Y\. Yue \(2023\)PointGPT: auto\-regressively generative pre\-training from point clouds\.arXiv preprint arXiv:2305\.11487\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px3.p2.4),[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px2.p1.2)\.
- J\. Chen, B\. Zhao, S\. Lin, H\. Sun, X\. Mao, M\. Wang, Y\. Chu, L\. Hong, D\. Wei, M\. Li,et al\.\(2024\)TEPCAM: prediction of t\-cell receptor–epitope binding specificity via interpretable deep learning\.Protein Science33\(1\),pp\. e4841\.Cited by:[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px3.p1.1)\.
- G\. Cia, F\. Pucci, and M\. Rooman \(2023\)Critical review of conformational b\-cell epitope prediction methods\.Briefings in bioinformatics24\(1\),pp\. bbac567\.Cited by:[§A\.2](https://arxiv.org/html/2606.23830#A1.SS2.SSS0.Px2.p2.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- J\. N\. Clifford, M\. H\. Hoie, S\. Deleuran, B\. Peters, M\. Nielsen, and P\. Marcatili \(2022\)BepiPred\-3\.0: improved b\-cell epitope prediction using protein language models\.Protein Science31\(12\),pp\. e4497\.Cited by:[§B\.1](https://arxiv.org/html/2606.23830#A2.SS1.p1.1),[§1](https://arxiv.org/html/2606.23830#S1.p2.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- B\. M\. da Silva, Y\. Myung, D\. B\. Ascher, and D\. E\. Pires \(2022\)Epitope3D: a machine learning method for conformational b\-cell epitope prediction\.Briefings in Bioinformatics23\(1\),pp\. bbab423\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[Appendix B](https://arxiv.org/html/2606.23830#A2.p2.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- A\. Del Vecchio, A\. Deac, P\. Liò, and P\. Veličković \(2021\)Neural message passing for joint paratope\-epitope prediction\.arXiv preprint arXiv:2106\.00757\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1)\.
- A\. Deng, K\. Householder, F\. Wu, S\. Thrun, K\. C\. Garcia, and B\. Trippe \(2025\)Predicting mutational effects on protein binding from folding energy\.arXiv preprint arXiv:2507\.05502\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- B\. Deng, S\. Zhu, A\. M\. Macklin, J\. Xu, C\. Lento, A\. Sljoka, and D\. J\. Wilson \(2017\)Suppressing allostery in epitope mapping experiments using millisecond hydrogen/deuterium exchange mass spectrometry\.InMAbs,Vol\.9,pp\. 1327–1336\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px1.p2.1)\.
- D\. V\. Desai and U\. Kulkarni\-Kale \(2014\)T\-cell epitope prediction methods: an overview\.Immunoinformatics,pp\. 333–364\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p3.1)\.
- T\. Duff, J\. Burgess, P\. Christensen, C\. Hery, A\. Kensler, M\. Liani, and R\. Villemin \(2017\)Building an orthonormal basis, revisited\.JCGT6\(1\)\.Cited by:[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px1.p1.10)\.
- J\. Dunbar, K\. Krawczyk, J\. Leem, T\. Baker, A\. Fuchs, G\. Georges, J\. Shi, and C\. M\. Deane \(2014\)SAbDab: the structural antibody database\.Nucleic acids research42\(D1\),pp\. D1140–D1146\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px1.p1.1)\.
- Y\. EL\-Manzalawy, D\. Dobbs, and V\. Honavar \(2008\)Predicting linear b\-cell epitopes using string kernels\.Journal of Molecular Recognition: An Interdisciplinary Journal21\(4\),pp\. 243–255\.Cited by:[Appendix B](https://arxiv.org/html/2606.23830#A2.p1.1)\.
- R\. Esmaielbeiki, K\. Krawczyk, B\. Knapp, J\. Nebel, and C\. M\. Deane \(2016\)Progress and challenges in predicting protein interfaces\.Briefings in bioinformatics17\(1\),pp\. 117–131\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- R\. Evans, M\. O’Neill, A\. Pritzel, N\. Antropova, A\. Senior, T\. Green, A\. Zidek, R\. Bates, S\. Blackwell, J\. Yim,et al\.\(2021\)Protein complex prediction with alphafold\-multimer\.biorxiv,pp\. 2021–10\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px3.p1.1)\.
- H\. Fan, H\. Su, and L\. J\. Guibas \(2017\)A point set generation network for 3d object reconstruction from a single image\.InProceedings of the IEEE conference on computer vision and pattern recognition,pp\. 605–613\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px4.p1.10)\.
- P\. Gainza, F\. Sverrisson, F\. Monti, E\. Rodola, D\. Boscaini, M\. Bronstein, and B\. Correia \(2020\)Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning\.Nature Methods17\(2\),pp\. 184–192\.Cited by:[§B\.3](https://arxiv.org/html/2606.23830#A2.SS3.p1.1),[Appendix B](https://arxiv.org/html/2606.23830#A2.p2.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- K\. He, X\. Chen, S\. Xie, Y\. Li, P\. Dollar, and R\. Girshick \(2022\)Masked autoencoders are scalable vision learners\.InProceedings of the IEEE/CVF conference on computer vision and pattern recognition,pp\. 16000–16009\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.p1.1)\.
- M\. H\. Hoie, F\. S\. Gade, J\. M\. Johansen, C\. Wurtzen, O\. Winther, M\. Nielsen, and P\. Marcatili \(2024\)DiscoTope\-3\.0: improved b\-cell epitope prediction using inverse folding latent representations\.Frontiers in Immunology15,pp\. 1322712\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- C\. Hsu, R\. Verkuil, J\. Liu, Z\. Lin, B\. Hie, T\. Sercu, A\. Lerer, and A\. Rives \(2022\)Learning inverse folding from millions of predicted structures\.bioRxiv\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1)\.
- E\. Jang, S\. Gu, and B\. Poole \(2016\)Categorical reparameterization with gumbel\-softmax\.arXiv preprint arXiv:1611\.01144\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px1.p1.8)\.
- M\. C\. Jespersen, B\. Peters, M\. Nielsen, and P\. Marcatili \(2017\)BepiPred\-2\.0: improving sequence\-based b\-cell epitope prediction using conformational epitopes\.Nucleic acids research45\(W1\),pp\. W24–W29\.Cited by:[§B\.1](https://arxiv.org/html/2606.23830#A2.SS1.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- Y\. Jiang, X\. Li, Y\. Zhang, J\. Han, Y\. Xu, A\. Pandit, Z\. Zhang, M\. Wang, M\. Wang, M\. Shen,et al\.\(2025\)PoseX: ai defeats physics approaches on protein\-ligand cross docking\.arXiv preprint arXiv:2505\.01700\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- W\. Jin, R\. Barzilay, and T\. Jaakkola \(2022\)Antibody\-antigen docking and design via hierarchical equivariant refinement\.arXiv preprint arXiv:2207\.06616\.Cited by:[§B\.4](https://arxiv.org/html/2606.23830#A2.SS4.p1.1)\.
- R\. P\. Joosten, F\. Long, G\. N\. Murshudov, and A\. Perrakis \(2014\)The pdb\_redo server for macromolecular structure model optimization\.IUCrJ1\(4\),pp\. 213–220\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px1.p1.1)\.
- D\. P\. Kingma and J\. Ba \(2014\)Adam: a method for stochastic optimization\.arXiv preprint arXiv:1412\.6980\.Cited by:[§A\.1](https://arxiv.org/html/2606.23830#A1.SS1.SSS0.Px1.p1.5)\.
- L\. F\. Krapp, L\. A\. Abriata, F\. Cortes Rodriguez, and M\. Dal Peraro \(2023\)PeSTo: parameter\-free geometric deep learning for accurate prediction of protein binding interfaces\.Nature Communications14\(1\),pp\. 2175\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- J\. Leem, L\. S\. Mitchell, J\. H\. Farmery, J\. Barton, and J\. D\. Galson \(2022\)Deciphering the language of antibodies using self\-supervised learning\.Patterns,pp\. 100513\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px3.p1.6)\.
- G\. Li, X\. Zhao, F\. Wu, and S\. Laue \(2026\)Joint design of protein surface and backbone using a diffusion bridge model\.Advances in Neural Information Processing Systems38,pp\. 169682–169708\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- P\. Li and Z\. Liu \(2023\)GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning\.Nucleic Acids Research51\(10\),pp\. e60–e60\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px2.p1.5)\.
- S\. Li, L\. Zhang, Z\. Wang, D\. Wu, L\. Wu, Z\. Liu, J\. Xia, C\. Tan, Y\. Liu, B\. Sun,et al\.\(2023\)Masked modeling for self\-supervised representation learning on vision and beyond\.arXiv preprint arXiv:2401\.00897\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px2.p1.2)\.
- T\. Lin, P\. Goyal, R\. Girshick, K\. He, and P\. Dollar \(2017\)Focal loss for dense object detection\.InProceedings of the IEEE international conference on computer vision,pp\. 2980–2988\.Cited by:[§2\.4](https://arxiv.org/html/2606.23830#S2.SS4.SSS0.Px1.p2.14)\.
- Z\. Lin,et al\.\(2022\)Language models of protein sequences at the scale of evolution enable accurate structure prediction\.bioRxiv\.Cited by:[§B\.1](https://arxiv.org/html/2606.23830#A2.SS1.p1.1),[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px3.p2.5),[§3\.2](https://arxiv.org/html/2606.23830#S3.SS2.p1.1)\.
- C\. Liu, L\. Denzler, Y\. Chen, A\. Martin, and B\. Paige \(2024\)AsEP: benchmarking deep learning methods for antibody\-specific epitope prediction\.Advances in Neural Information Processing Systems37,pp\. 11700–11734\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- Z\. Liu, Y\. Li, L\. Han, J\. Li, J\. Liu, Z\. Zhao, W\. Nie, Y\. Liu, and R\. Wang \(2015\)PDB\-wide collection of binding data: current status of the pdbbind database\.Bioinformatics31\(3\),pp\. 405–412\.Cited by:[§A\.1](https://arxiv.org/html/2606.23830#A1.SS1.SSS0.Px2.p1.1)\.
- S\. Luo, Y\. Su, Z\. Wu, C\. Su, J\. Peng, and J\. Ma \(2023\)Rotamer density estimator is an unsupervised learner of the effect of mutations on protein\-protein interaction\.bioRxiv,pp\. 2023–02\.Cited by:[§A\.1](https://arxiv.org/html/2606.23830#A1.SS1.SSS0.Px2.p1.1)\.
- M\. McPartlon and J\. Xu \(2023\)Deep learning for flexible and site\-specific protein docking and design\.BioRxiv,pp\. 2023–04\.Cited by:[Appendix C](https://arxiv.org/html/2606.23830#A3.p2.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px7.p1.1)\.
- N\. J\. Mitra and A\. Nguyen \(2003\)Estimating surface normals in noisy point cloud data\.InProceedings of the nineteenth annual symposium on Computational geometry,pp\. 322–328\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px2.p1.2)\.
- G\. M\. Morton \(1966\)A computer oriented geodetic data base and a new technique in file sequencing\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px3.p2.4)\.
- S\. K\. Mylonas, A\. Axenopoulos, and P\. Daras \(2021\)DeepSurf: a surface\-based deep learning approach for the prediction of ligand binding sites on proteins\.Bioinformatics37\(12\),pp\. 1681–1690\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px2.p1.5)\.
- B\. Peters, M\. Nielsen, and A\. Sette \(2020\)T cell epitope predictions\.Annual Review of Immunology38\(1\),pp\. 123–145\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- K\. Polonsky, T\. Pupko, and N\. T\. Freund \(2023\)Evaluation of the ability of alphafold to predict the three\-dimensional structures of antibodies and epitopes\.The Journal of Immunology211\(10\),pp\. 1578–1588\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px3.p1.1)\.
- L\. Potocnakova, M\. Bhide, and L\. B\. Pulzova \(2016\)An introduction to b\-cell epitope mapping and in silico epitope prediction\.Journal of immunology research2016\(1\),pp\. 6760830\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p3.1)\.
- C\. R\. Qi, L\. Yi, H\. Su, and L\. J\. Guibas \(2017\)Pointnet\+\+: deep hierarchical feature learning on point sets in a metric space\.Advances in neural information processing systems30\.Cited by:[§2\.4](https://arxiv.org/html/2606.23830#S2.SS4.SSS0.Px1.p2.3)\.
- H\. Qi, M\. Ma, C\. Hu, Z\. Xu, F\. Wu, N\. Wang, D\. Lai, Y\. Li, H\. Zhang, H\. Jiang,et al\.\(2021\)Antibody binding epitope mapping \(abmap\) of hundred antibodies in a single run\.Molecular & Cellular Proteomics20\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px1.p2.1)\.
- T\. Qiu, L\. Zhang, Z\. Chen, Y\. Wang, T\. Mao, C\. Wang, Y\. Cun, G\. Zheng, D\. Yan, M\. Zhou,et al\.\(2023\)SEPPA\-mab: spatial epitope prediction of protein antigens for mabs\.Nucleic Acids Research,pp\. gkad427\.Cited by:[§B\.3](https://arxiv.org/html/2606.23830#A2.SS3.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- A\. Ramesh, M\. Pavlov, G\. Goh, S\. Gray, C\. Voss, A\. Radford, M\. Chen, and I\. Sutskever \(2021\)Zero\-shot text\-to\-image generation\.InInternational Conference on Machine Learning,pp\. 8821–8831\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px1.p1.8)\.
- S\. Riahi, J\. H\. Lee, T\. Sorenson, S\. Wei, S\. Jager, R\. Olfati\-Saber, Y\. Zhou, A\. Park, M\. Wendt, H\. Minoux,et al\.\(2023\)Surface id: a geometry\-aware system for protein molecular surface comparison\.Bioinformatics39\(4\),pp\. btad196\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- A\. Riveset al\.\(2021\)Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences\.Proceedings of the National Academy of Sciences118\(15\),pp\. e2016239118\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- J\. L\. Sanchez\-Trincado, M\. Gomez\-Perosanz, and P\. A\. Reche \(2017\)Fundamentals and methods for t\-and b\-cell epitope prediction\.Journal of immunology research2017\(1\),pp\. 2680160\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p3.1)\.
- K\. T\. Schutt, H\. E\. Sauceda, P\. Kindermans, A\. Tkatchenko, and K\. Muller \(2018\)Schnet–a deep learning architecture for molecules and materials\.The Journal of Chemical Physics148\(24\)\.Cited by:[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px2.p2.3)\.
- T\. I\. Shashkovaet al\.\(2022\)SEMA: antigen b\-cell conformational epitope prediction using deep transfer learning\.Frontiers in immunology,pp\. 5272\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[Appendix B](https://arxiv.org/html/2606.23830#A2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- R\. E\. Soria\-Guerra, R\. Nieto\-Gomez, D\. O\. Govea\-Alonso, and S\. Rosales\-Mendoza \(2015\)An overview of bioinformatics tools for epitope prediction: implications on vaccine development\.Journal of biomedical informatics53,pp\. 405–414\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p3.1)\.
- V\. Stebliankin, A\. Shirali, P\. Baral, J\. Shi, P\. Chapagain, K\. Mathee, and G\. Narasimhan \(2023\)Evaluating protein binding interfaces with transformer networks\.Nature Machine Intelligence5\(9\),pp\. 1042–1053\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px2.p1.5)\.
- M\. Steinegger and J\. Soding \(2017\)MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets\.Nature biotechnology35\(11\),pp\. 1026–1028\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px1.p1.1)\.
- F\. Sverrisson, J\. Feydy, B\. E\. Correia, and M\. M\. Bronstein \(2021\)Fast end\-to\-end learning on protein surfaces\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 15272–15281\.Cited by:[§B\.3](https://arxiv.org/html/2606.23830#A2.SS3.p1.1),[Appendix B](https://arxiv.org/html/2606.23830#A2.p2.1),[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px2.p1.5),[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px1.p1.10),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.
- X\. Tian, H\. Ran, Y\. Wang, and H\. Zhao \(2023\)GeoMAE: masked geometric target prediction for self\-supervised point cloud pre\-training\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 13570–13580\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px2.p1.2)\.
- J\. Tubiana, D\. Schneidman\-Duhovny, and H\. J\. Wolfson \(2022\)ScanNet: an interpretable geometric deep learning model for structure\-based protein binding site prediction\.Nature Methods19\(6\),pp\. 730–739\.Cited by:[§B\.2](https://arxiv.org/html/2606.23830#A2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px1.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px3.p1.1)\.
- A\. Van Den Oord, O\. Vinyals,et al\.\(2017\)Neural discrete representation learning\.Advances in neural information processing systems30\.Cited by:[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.SSS0.Px1.p1.8)\.
- M\. Varadi, S\. Anyango, M\. Deshpande, S\. Nair, C\. Natassia, G\. Yordanova, D\. Yuan, O\. Stroe, G\. Wood, A\. Laydon,et al\.\(2022\)AlphaFold protein structure database: massively expanding the structural coverage of protein\-sequence space with high\-accuracy models\.Nucleic acids research50\(D1\),pp\. D439–D444\.Cited by:[§B\.4](https://arxiv.org/html/2606.23830#A2.SS4.p1.1)\.
- M\. Varadi, D\. Bertoni, P\. Magana, U\. Paramval, I\. Pidruchna, M\. Radhakrishnan, M\. Tsenkov, S\. Nair, M\. Mirdita, J\. Yeo,et al\.\(2024\)AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences\.Nucleic Acids Research52\(D1\),pp\. D368–D375\.Cited by:[§B\.4](https://arxiv.org/html/2606.23830#A2.SS4.p1.1)\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, L\. Kaiser, and I\. Polosukhin \(2017\)Attention is all you need\.Advances in neural information processing systems30\.Cited by:[§2\.2](https://arxiv.org/html/2606.23830#S2.SS2.SSS0.Px2.p1.4),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px5.p1.1)\.
- T\. Vrevenet al\.\(2015\)Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2\.Journal of molecular biology427\(19\),pp\. 3031–3041\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px6.p1.1)\.
- X\. Wang, G\. Terashi, C\. W\. Christoffer, M\. Zhu, and D\. Kihara \(2020\)Protein docking model evaluation by 3d deep convolutional neural networks\.Bioinformatics36\(7\),pp\. 2113–2118\.Cited by:[Appendix B](https://arxiv.org/html/2606.23830#A2.p2.1)\.
- F\. Wu, S\. Jin, Y\. Jiang, X\. Jin, B\. Tang, Z\. Niu, X\. Liu, Q\. Zhang, X\. Zeng, and S\. Z\. Li \(2022a\)Pre\-training of equivariant graph matching networks with conformation flexibility for drug binding\.Advanced Science9\(33\),pp\. 2203796\.Cited by:[§3\.2](https://arxiv.org/html/2606.23830#S3.SS2.p1.1)\.
- F\. Wu, S\. Jin, X\. Tang, J\. Xu, M\. Gerstein, L\. E\. Li, and J\. Zou \(2026a\)D\-flow: multi\-modality flow matching for d\-peptide design\.IEEE Journal of Biomedical and Health Informatics\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- F\. Wu, S\. Li, L\. Wu, S\. Z\. Li, D\. Radev, and Q\. Zhang \(2022b\)Discovering the representation bottleneck of graph neural networks from multi\-order interactions\.arXiv preprint arXiv:2205\.07266\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- F\. Wu and S\. Z\. Li \(2024\)Surface\-vqmae: vector\-quantized masked auto\-encoders on molecular surfaces\.InInternational Conference on Machine Learning,pp\. 53619–53634\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px2.p1.5),[§2\.3](https://arxiv.org/html/2606.23830#S2.SS3.p1.1)\.
- F\. Wu and S\. Z\. Li \(2026\)Dynamics\-inspired structure hallucination for protein\-protein interaction modeling\.arXiv preprint arXiv:2601\.06214\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- F\. Wu, D\. Radev, and S\. Z\. Li \(2023a\)Molformer: motif\-based transformer on 3d heterogeneous molecular graphs\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.37,pp\. 5312–5320\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- F\. Wu, L\. Wu, D\. Radev, J\. Xu, and S\. Z\. Li \(2023b\)Integration of pre\-trained protein language models into geometric deep learning networks\.Communications Biology6\(1\),pp\. 876\.Cited by:[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px3.p1.1)\.
- F\. Wu, W\. Xuan, H\. Qi, H\. Cao, H\. Chang, Z\. Zhou, H\. Zhao, M\. Jian, C\. Ma, Y\. Cheng,et al\.\(2026b\)Proteo\-r1: reasoning foundation models for de novo protein design\.arXiv preprint arXiv:2605\.02937\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- F\. Wu, Q\. Zhang, D\. Radev, J\. Cui, W\. Zhang, H\. Xing, N\. Zhang, and H\. Chen \(2021\)3d\-transformer: molecular representation with transformer in 3d space\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- F\. Wu, Z\. Zhou, S\. Jin, X\. Zeng, J\. Leskovec, and J\. Xu \(2025\)Surface\-based molecular design with multi\-modal flow matching\.InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 2,pp\. 3192–3203\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- F\. Wu \(2024\)A semi\-supervised molecular learning framework for activity cliff estimation\.InProceedings of the Thirty\-Third International Joint Conference on Artificial Intelligence,pp\. 6080–6088\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1)\.
- F\. Wu \(2025\)DiffAntiSeq: a controllable diffusion model for efficient antibody library design\.InLLM for Scientific Discovery: Reasoning, Assistance, and Collaboration,Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- F\. Wu \(2026\)A semi\-supervised molecular learning framework for activity cliff estimation\.arXiv preprint arXiv:2601\.04507\.Cited by:[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px1.p1.1)\.
- X\. Zeng, G\. Bai, C\. Sun, and B\. Ma \(2023\)Recent progress in antibody epitope prediction\.Antibodies12\(3\),pp\. 52\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.23830#S2.SS1.SSS0.Px1.p1.1)\.
- Z\. Zhang, M\. Xu, A\. Jamasb, V\. Chenthamarakshan, A\. Lozano, P\. Das, and J\. Tang \(2022\)Protein representation learning by geometric structure pretraining\.arXiv preprint arXiv:2203\.06125\.Cited by:[§1](https://arxiv.org/html/2606.23830#S1.p2.1)\.
- C\. Zhou, Z\. Chen, L\. Zhang, D\. Yan, T\. Mao, K\. Tang, T\. Qiu, and Z\. Cao \(2019\)SEPPA 3\.0—enhanced spatial epitope prediction enabling glycoprotein antigens\.Nucleic acids research47\(W1\),pp\. W388–W394\.Cited by:[§B\.1](https://arxiv.org/html/2606.23830#A2.SS1.p1.1),[§B\.3](https://arxiv.org/html/2606.23830#A2.SS3.p1.1),[§3\.1](https://arxiv.org/html/2606.23830#S3.SS1.SSS0.Px2.p1.1)\.

## Appendix AExperimental Details

### A\.1\.Training Process

#### Implementation Details\.

We ran all experiments on 4 H100 GPUs, each with 80 GB of memory\. During pretraining, SurfBind was trained using the Adam optimizer \(Kingma and Ba, [2014](https://arxiv.org/html/2606.23830#bib.bib202)\) with a weight decay of $5\times10^{-3}$, $\beta_{1}=0.9$, and $\beta_{2}=0.999$\. A ReduceLROnPlateau scheduler automatically adjusted the learning rate with a patience of 5 epochs and a minimum learning rate of $10^{-7}$\. The batch size was set to 32, and the initial learning rate was $10^{-4}$\. The maximum number of iterations was 200K, with a 10K warm\-up, and the validation frequency was 1K iterations\. The random seed was fixed as 2023\. Moreover, we empirically calculated the overlap ratio across all patches and observed a low score of 5\.34%\.

#### Dataset Details\.

For the pretraining dataset, we leveraged PDB\-REDO, which contains refined X\-ray structures in the Protein Data Bank \(PDB\) \(Liu et al\., [2015](https://arxiv.org/html/2606.23830#bib.bib180)\)\. Here, we followed the scheme of Luo et al\. \([2023](https://arxiv.org/html/2606.23830#bib.bib194)\) and clustered the protein chains at 50% sequence identity, yielding 38,413 chain clusters\. These clusters were randomly split into training, validation, and test sets at ratios of 95%, 0\.5%, and 4\.5%, respectively\. During the generative pretraining stage, the data loader first randomly selected a cluster, then randomly selected a chain from that cluster to ensure balanced sampling\. Next, a portion \(50% to 70%\) of patches in the molecular surface of the chosen chain was randomly masked\. Finally, the feature extractor was required to restore both the low\-order \(*e\.g\.*, point coordinates\) and high\-order \(*e\.g\.*, surface geometry and physicochemical characteristics\) properties of the masked surface patches\.
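The cluster\-balanced sampling and random patch masking described above can be sketched in a few lines of Python\. This is a minimal illustration under stated assumptions, not the authors' data loader: the cluster and chain identifiers are made up, `sample_chain` and `sample_patch_mask` are hypothetical helper names, and the masking ratio is assumed to be drawn uniformly from the 50% to 70% range\.

```python
import random


def sample_chain(clusters, rng):
    """Pick a cluster uniformly at random, then a chain uniformly within it.

    Sampling clusters first keeps large clusters from dominating training.
    """
    cluster_id = rng.choice(sorted(clusters))
    chain_id = rng.choice(clusters[cluster_id])
    return cluster_id, chain_id


def sample_patch_mask(num_patches, rng, low=0.5, high=0.7):
    """Return a boolean list that marks a random 50%-70% of patches as masked."""
    ratio = rng.uniform(low, high)  # assumption: ratio drawn uniformly
    num_masked = max(1, round(ratio * num_patches))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


# Toy clusters with hypothetical identifiers: one large cluster, two singletons.
clusters = {
    "c0": ["1abc_A", "1abd_A", "1abe_A", "1abf_A"],
    "c1": ["2xyz_B"],
    "c2": ["3pqr_C"],
}
rng = random.Random(2023)
cluster_id, chain_id = sample_chain(clusters, rng)
mask = sample_patch_mask(num_patches=64, rng=rng)
print(cluster_id, chain_id, sum(mask), "of", len(mask), "patches masked")
```

With this two\-stage draw, each of the three toy clusters is chosen with probability 1/3, regardless of how many chains it contains\.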

For antigens with multiple antibody binders, the same antigen sequence may appear across different splits only with distinct antibodies, while no antibody–antigen complex is duplicated across splits\. In contrast, single\-binder antigens are confined to a single split, preventing antigen\-level leakage while enabling evaluation of antibody\-specific epitope generalization\.

### A\.2\.Hyperparameters

#### Hyperparameter Search Space\.

We first used random search to find the best combination of hyperparameters for the backbone architecture SurfFormer in three different downstream tasks with only supervised learning\. We then fixed these hyperparameter subsets to build three backbone architectures and further explored the hyperparameters for VQMAE\-style pretraining\.

Table 2\. Hyperparameter setup for SurfBind\.

#### Evaluation Metrics and Protocols\.

To estimate the prediction performance of the benchmarked predictors, we used a variety of well\-established performance metrics, including the balanced accuracy \(BAcc\), the F\-score, the area under the receiver operating characteristic curve \(AUC\-ROC\), and the area under the precision\-recall curve \(AUC\-PR\)\.
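For reference, the four metrics named above can be computed as in the following minimal sketch\. Several choices here are assumptions rather than details from the paper: the F\-score is taken to be F1, hard labels are obtained with a 0\.5 threshold, AUC\-ROC is computed by pairwise ranking, and AUC\-PR is approximated by average precision\. The labels and scores are toy values\.

```python
def confusion(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (assumed F1 variant)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_pr(y_true, scores):
    """Average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy per-residue epitope labels and predicted scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("BAcc   ", round(balanced_accuracy(y_true, y_pred), 3))
print("F1     ", round(f1_score(y_true, y_pred), 3))
print("AUC-ROC", round(auc_roc(y_true, scores), 3))
print("AUC-PR ", round(auc_pr(y_true, scores), 3))
```

BAcc and the F\-score depend on the decision threshold, whereas AUC\-ROC and AUC\-PR summarize ranking quality across all thresholds, which is why the two groups are usually reported together for imbalanced epitope labels\.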

Consistent with the findings of Cia et al\. \([2023](https://arxiv.org/html/2606.23830#bib.bib22)\), our investigation revealed that as the threshold for identifying surface residues increased, all methods not based on surface assessment yielded consistently lower scores\. This suggests that, as we narrow our evaluation to surface residues only, these methods perform worse\. Conversely, including buried residues in the surface category simplifies predictions by artificially amplifying the disparity between epitopes and non\-epitopes\. This enrichment primarily arises from a greater abundance of hydrophobic residues among non\-epitopes, relative to epitopes, as delineated by an additional RSA threshold\.

### A\.3\.Extended Introduction of SurfFormer\+\+

![Refer to caption](https://arxiv.org/html/2606.23830v1/x4.png) Figure 4\. a\. Different approaches to acquiring antibody representations when antibody structures may be inaccessible\. b\. A simple and general scheme to enable antibody\-specificity\. c\. The cross\-attention method incorporates the antibody information into antigens\.

We highlight that SurfFormer\+\+ employs cross\-attention to connect the antibody and the antigen and achieve binder\-awareness \(see Fig\. [4](https://arxiv.org/html/2606.23830#A1.F4)c\)\. Moreover, our architecture applies to a wide range of scenarios, even when antibody structures are unavailable\. In that situation, we can rely on PLMs to extract the protein\-level representation of the antibody and forward it to the cross\-attention calculation as the key and query \(see Fig\. [4](https://arxiv.org/html/2606.23830#A1.F4)a\)\. Additionally, we propose a simple technique to make traditional BCE\-based prediction models partner\-specific\. Specifically, we use a dot product and a max\-pooling operation to exchange information between the ligand and the receptor \(see Fig\. [4](https://arxiv.org/html/2606.23830#A1.F4)b\)\.
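One plausible reading of the dot\-product and max\-pooling scheme is sketched below\. The exact way the two operations are combined is not specified in the text, so this is an assumption: each receptor \(antigen\) residue is scored by its largest dot product with any ligand \(antibody\) residue embedding\. The function name `partner_aware_scores` and the two\-dimensional toy embeddings are hypothetical\.

```python
def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def partner_aware_scores(receptor_feats, ligand_feats):
    """Score each receptor residue by its best match among ligand residues.

    The dot product measures pairwise compatibility; max-pooling over the
    ligand axis keeps, for every receptor residue, its strongest partner.
    """
    return [max(dot(r, l) for l in ligand_feats) for r in receptor_feats]


# Toy residue embeddings (hypothetical values).
antigen = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
antibody = [[2.0, 0.0], [0.0, -1.0]]

scores = partner_aware_scores(antigen, antibody)
print(scores)
```

Because the pooled score changes whenever the antibody embeddings change, the same antigen can receive different per\-residue scores for different binders, which is the partner\-specific behavior the scheme is meant to provide\.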

## Appendix BBenchmarking Baselines

Recent years have seen exponential growth in BCE data, prompting rapid advances in machine learning \(ML\) methods for predicting ADs\. These methods use the physicochemical properties of amino acids as descriptors to rapidly and efficiently identify potential epitopes as vaccine candidates, thereby reducing the burden of the BCE mapping process by narrowing the list of candidate epitopes for experimental trials\. Preliminary endeavors use empirically computed energy terms and contact\-frequency\-based features as direct inputs to ML models such as support vector machines \(SVMs\) \(EL\-Manzalawy et al\., [2008](https://arxiv.org/html/2606.23830#bib.bib33)\) and random forests \(RFs\) \(Bukhari et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib32)\), but achieve limited accuracy due to the lack of 3D complementary information\. Meanwhile, some methods rely on sequence conservation or residue coevolution but often perform poorly for shallow sequence alignments \(Shashkova et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib31)\)\. Approaches such as AF\-Multimer, which are centered on de novo complex folding, concurrently reveal interfaces and subunit conformations\. However, they are limited to PPIs, are slower than structure\-based interface prediction, and can fail if the folding protocol falters\.

To address this constraint, geometric DL, an umbrella term that generalizes networks to Euclidean or non\-Euclidean domains, has emerged as a promising avenue for modeling macromolecular structures\. Adapting it to protein structures necessitates defining an appropriate protein representation\. Early studies \(Wang et al\., [2020](https://arxiv.org/html/2606.23830#bib.bib37)\) integrate detailed atomic spatial information by isolating a 3D voxel grid around the interface region and employing convolutional neural networks \(CNNs\)\. Subsequent works \(da Silva et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib38)\) describe antigens as 3D graphs, treating residues as vertices and the distance angles between them as edges\. This condenses complex 3D information into compact signatures while preserving binding\-related spatial features, and the signatures are processed by graph neural networks \(GNNs\) with E\(3\) or SE\(3\) equivariance and symmetry\. Embracing the premise that every surface residue may be immunogenic, a prevailing line of research focuses on molecular surfaces\. MaSIF \(Gainza et al\., [2020](https://arxiv.org/html/2606.23830#bib.bib66)\) pioneers the use of meshes, defining the patch as a region on a solvent\-excluded protein surface with a fixed geodesic radius around a potential contact point to predict interactions\. dMaSIF \(Sverrisson et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib68)\) streamlines the process by sampling atomic point clouds, alleviating the need for pre\-calculations\.

### B\.1\.Sequence\-based Algorithms

AF\-Multimer predicts protein complexes from MSAs with impressive performance in general protein–protein docking tasks\. In our experiments, we used the AF\-Multimer implementation provided by ColabFold, which performs an MSA search using MMseqs2\. Following the default ColabFold protocol, we enabled MSAs during inference\. Since paired MSAs are generally unavailable for antibody–antigen systems, we used *unpaired* MSAs for the two chains, consistent with common benchmarking practice\. We used the default parameters: 10 recycles and 5 predicted models\. For downstream evaluation, we extracted the putative interfaces of the AF\-Multimer predictions \(*i\.e\.*, residue–residue contacts within 4 Å\) and averaged the results across the 5 models\. The code used in this paper was obtained from [https://github\.com/sokrypton/ColabFold](https://github.com/sokrypton/ColabFold)\.

CBTOPE \(Ansari and Raghava, [2010](https://arxiv.org/html/2606.23830#bib.bib27)\) is the first attempt to predict conformational BCEs from an amino acid sequence\. It trained support vector machine \(SVM\) models, and we used the web server at [http://www\.imtech\.res\.in/raghava/cbtope/](http://www.imtech.res.in/raghava/cbtope/)\.

Seppa\-3\.0 \(Zhou et al\., [2019](https://arxiv.org/html/2606.23830#bib.bib29)\) used a logistic regression model to produce a raw antigenicity score for each surface residue based on micro\-environment features, such as glycosylation triangles and glycosylation\-related amino acid indexes\. This score was then calibrated by the overall tendency of neighboring residues\. We accessed its program at [http://www\.badd\-cao\.net/seppa3/](http://www.badd-cao.net/seppa3/)\.

BepiPred\-2\.0 \(Jespersen et al\., [2017](https://arxiv.org/html/2606.23830#bib.bib19)\) relied on a random forest algorithm to forecast BCEs from antigen sequences\. It analyzed the residues using hydrophobicity and polarity measurements, along with their volume, RSA, and predicted secondary structure\. We used its web server at [https://services\.healthtech\.dtu\.dk/services/BepiPred\-2\.0/](https://services.healthtech.dtu.dk/services/BepiPred-2.0/) for validation\.

BepiPred\-3\.0 \(Clifford et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib17)\) is a sequence\-based tool that uses numerical representations from the protein language model ESM\-2 \(Lin et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib102)\) to significantly improve the accuracy of both linear and conformational BCE prediction\. We used its freely available web server and standalone package at [https://services\.healthtech\.dtu\.dk/services/BepiPred\-3\.0/](https://services.healthtech.dtu.dk/services/BepiPred-3.0/) to obtain the results\.
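The interface extraction used for AF\-Multimer predictions \(contacts within 4 Å, averaged over the predicted models\) can be illustrated with the sketch below\. It rests on assumptions not stated in the text: a contact is counted when any antigen atom lies within the cutoff of any antibody atom, and "averaging" is interpreted as the per\-residue fraction of models that call the residue an interface residue\. The helper names, residue identifiers, and coordinates are hypothetical, and the toy example uses two models instead of five\.

```python
import math


def interface_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return antigen residue ids with any atom within `cutoff` Å of the antibody.

    Each atom is given as (residue_id, (x, y, z)).
    """
    hits = set()
    for res_id, xyz in antigen_atoms:
        if res_id in hits:
            continue
        if any(math.dist(xyz, other) <= cutoff for _, other in antibody_atoms):
            hits.add(res_id)
    return hits


def contact_frequency(per_model_hits, residue_ids):
    """Fraction of predicted models in which each residue is in the interface."""
    n_models = len(per_model_hits)
    return {r: sum(r in hits for hits in per_model_hits) / n_models
            for r in residue_ids}


# Toy example: the antigen is kept fixed and the antibody pose differs per model.
antigen = [("A10", (0.0, 0.0, 0.0)), ("A11", (10.0, 0.0, 0.0))]
antibody_poses = [
    [("H50", (3.0, 0.0, 0.0))],  # 3.0 Å from A10: contact
    [("H50", (3.0, 3.0, 0.0))],  # about 4.24 Å from A10: no contact
]

hits_per_model = [interface_residues(antigen, pose) for pose in antibody_poses]
freq = contact_frequency(hits_per_model, ["A10", "A11"])
print(freq)
```

The resulting per\-residue frequencies can be used directly as scores for threshold\-free metrics, or thresholded to obtain hard epitope labels\.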

### B\.2\.Structure\-based Algorithms

Epi\-EPMP \(Del Vecchio et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib5)\) leverages a message\-passing network to perform joint paratope\-epitope prediction\. However, its code is not publicly available\.

ScanNet \(Tubiana et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib183)\) is an end\-to\-end, interpretable geometric DL model that learns features directly from 3D structures\. It builds representations of atoms and amino acids based on the spatio\-chemical arrangement of their neighbors\. ScanNet is a versatile, powerful, and interpretable model suitable for functional site prediction tasks, and we used its publicly available web server at [http://bioinfo3d\.cs\.tau\.ac\.il/ScanNet/index\_real\.html](http://bioinfo3d.cs.tau.ac.il/ScanNet/index_real.html)\.

Epitope3D \(da Silva et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib38)\) used the concept of graph\-based signatures to model epitope and non\-epitope regions as graphs and extracted distance patterns that were used as evidence to train and test predictive models\. We submitted jobs via an API at [https://biosig\.lab\.uq\.edu\.au/epitope3d/api](https://biosig.lab.uq.edu.au/epitope3d/api) for validation\.

PeSTo \(Krapp et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib26)\) is a geometric Transformer that acts directly on atomic coordinates labeled only with element names\. Its low computational cost enabled processing of large volumes of structural data, and we used its publicly available code at [https://github\.com/LBM\-EPFL/PeSTo](https://github.com/LBM-EPFL/PeSTo)\.

SEMA\-2\.0 \(Shashkova et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib31)\) fine\-tuned ESM\-1v \(Rives et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib94)\) and ESM\-IF1 \(Hsu et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib152)\) models to predict residues comprising BCEs by providing an interpretable score corresponding to the expected number of contacts of an amino acid residue with the target antibody\. Two models were independently fine\-tuned, yielding SEMA\-1D and SEMA\-3D\. The authors reported better performance of SEMA\-3D, which we validated on the web server at [https://sema\.airi\.net/prediction\_analysis](https://sema.airi.net/prediction_analysis)\.

DiscoTope\-3\.0 \(Hoie et al\., [2024](https://arxiv.org/html/2606.23830#bib.bib28)\) is a structure\-based BCE prediction tool exploiting inverse folding representations generated from either AlphaFold\-predicted or solved structures\. It adopted the XGBoost architecture, which was trained on both predicted and solved antigen structures using a positive\-unlabelled learning ensemble approach\. This enabled large\-scale prediction of epitopes even when solved structures were unavailable\. We used its web server at [https://services\.healthtech\.dtu\.dk/services/DiscoTope\-3\.0/](https://services.healthtech.dtu.dk/services/DiscoTope-3.0/)\.

Pair\-EGRET \(Alam et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib181)\) is an edge\-aggregated graph attention network \(GAT\) that leveraged the features extracted from pretrained Transformer\-like models to accurately predict pairwise PPI sites\. It used a k\-nearest\-neighbor graph to represent the three\-dimensional structure of a protein and employed cross\-attention on top of a Siamese network to accurately identify interface residues between protein pairs\. We re\-ran its code at [https://github\.com/1705004/Pair\-EGRET](https://github.com/1705004/Pair-EGRET)\.

WALLE \(Liu et al\., [2024](https://arxiv.org/html/2606.23830#bib.bib3)\) models each antibody–antigen complex as two residue\-level graphs \(antibody and antigen\), with nodes initialized by concatenating ESM embeddings and structural features\. Each graph is encoded independently using stacked GNN layers, after which antibody–antigen residue pairs are combined in a cross\-graph decoder to predict inter\-residue contacts\. We reproduced their method following the repository at [https://github\.com/biochunan/AsEP\-dataset](https://github.com/biochunan/AsEP-dataset)\.

### B\.3\.Surface\-based Algorithms

MaSIF\-Site \(Gainza et al\., [2020](https://arxiv.org/html/2606.23830#bib.bib66)\) is a conceptual framework that uses a geometric DL method to capture fingerprints important for specific biomolecular interactions\. It took as input mesh\-based representations of a protein surface and relied on hand\-crafted chemical and geometric features, which must also be pre\-computed and stored on the hard drive\. We used the recommended parameters for data processing: circular patches of 12 Å radius were computed from the surfaces of interacting proteins using the MaSIF data preparation module\. The patch data structure was a grid of 80 bins, with 5 angular and 16 radial coordinates\. We leveraged the code at [https://github\.com/LPDI\-EPFL/masif](https://github.com/LPDI-EPFL/masif)\.

dMaSIF\-Search \(Sverrisson et al\., [2021](https://arxiv.org/html/2606.23830#bib.bib68)\) extended MaSIF by bypassing the pre\-computation of physicochemical features and instead calculating molecular surfaces directly from the atomic point cloud in real time\. The model’s input had a data structure similar to that of a 12 Å patch, in which each surface point was represented as a one\-hot encoding of surface chemicals, together with Gaussian and mean curvatures\. We obtained the code from [https://github\.com/FreyrS/dMaSIF\.git](https://github.com/FreyrS/dMaSIF.git) and trained it from scratch for evaluation\.

SEPPA\-mAb \(Qiu et al\., [2023](https://arxiv.org/html/2606.23830#bib.bib25)\) appended a fingerprint\-based patch model to Seppa\-3\.0 \(Zhou et al\., [2019](https://arxiv.org/html/2606.23830#bib.bib29)\), considering the structural and physicochemical complementarity between a possible epitope patch and the CDRs of monoclonal antibodies\. We leveraged its web server at [http://www\.badd\-cao\.net/seppa\-mab/](http://www.badd-cao.net/seppa-mab/)\.

### B\.4\.Limitations and Future Work

To the best of our knowledge, SurfBind is the first attempt to conduct generative pretraining purely on molecular surfaces\. It outperforms state\-of\-the\-art empirical and ML\-based protein\-scoring functions in identifying antibody\-specific viable BCEs\. Despite this promising progress, there is still room for future exploration\. First, more abundant databases can be exploited in our framework, for instance the structures produced by powerful structure prediction methods \(Varadi et al\., [2022](https://arxiv.org/html/2606.23830#bib.bib47)\) and collected in the AlphaFold Database \(Varadi et al\., [2024](https://arxiv.org/html/2606.23830#bib.bib20)\)\. Second, PLMs have shown effectiveness in many protein\-related tasks, and it might be beneficial to tune both the PLM and the geometric encoder\. Third, our protein graphs are built in a residue\-level manner\. However, Jin et al\. \([2022](https://arxiv.org/html/2606.23830#bib.bib73)\) have already demonstrated that atom\-level modeling significantly surpasses residue\-level modeling\. Therefore, we expect the performance of our HTP to improve further if an atom\-level protein graph is considered\.

![Refer to caption](https://arxiv.org/html/2606.23830v1/x5.png) Figure 5\. The overall pipeline of our unsupervised SurfBind method\. The input surface point cloud is first preprocessed into ordered patches using the farthest point sampling \(FPS\) algorithm and Morton codes\. Patch\-level representations are then extracted by a point cloud network\. A portion of patches is randomly masked, and their features are replaced with vectors from a relaxed codebook\. Both visible patch embeddings and sampled codebook vectors are forwarded to SurfFormer to gain a global surface understanding\. Finally, three pretext tasks are proposed as pretraining objectives: reconstructing the coordinates of masked center points, predicting local surface geometry, and forecasting critical surface chemical properties\.
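The patch\-ordering step in the caption above combines two standard techniques, which the following sketch illustrates\. It is a simplified illustration under stated assumptions, not the SurfBind preprocessing code: greedy farthest point sampling selects patch centers, and the centers are then sorted by the Morton \(Z\-order\) code of their quantized coordinates\. The grid cell size, the 10\-bit resolution, the starting index, and all function names are hypothetical choices, and grouping neighbors around each center into patches is omitted\.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from all chosen points."""
    chosen = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def morton_code(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative integer grid coordinates."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def order_centers(points, centers, cell=1.0):
    """Sort patch centers along a Z-order curve so nearby centers stay close."""
    mins = [min(p[d] for p in points) for d in range(3)]

    def key(idx):
        grid = [int((points[idx][d] - mins[d]) / cell) for d in range(3)]
        return morton_code(*grid)

    return sorted(centers, key=key)


# Toy surface point cloud (hypothetical coordinates in Å).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (4, 4, 4),
          (5, 4, 4), (0, 0, 5), (0, 0.5, 0)]
centers = farthest_point_sampling(points, k=3)
ordered = order_centers(points, centers)
print("centers:", centers, "ordered:", ordered)
```

Sorting by Morton code turns an unordered set of patch centers into a sequence in which spatial neighbors tend to be adjacent, which is convenient when the patches are fed to a Transformer as tokens\.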

## Appendix C Docking Performance with SurfBind\-Guided Binding Sites

To evaluate whether improved binding site prediction translates into gains in downstream structural modeling, we assess the impact of SurfBind\-predicted epitopes on protein–protein docking accuracy\. Docking provides a practical and stringent test of binding site quality, as accurate site localization can substantially reduce the conformational search space and improve pose selection\.

Following the experimental protocol of DockGPT \(McPartlon and Xu,[2023](https://arxiv.org/html/2606.23830#bib.bib10)\), we incorporate SurfBind\-predicted epitopes as spatial constraints within the docking pipeline\. We compare three settings: \(i\) blind docking without binding site information, \(ii\) docking guided by SurfBind\-predicted epitopes, and \(iii\) docking guided by ground\-truth epitopes\. Docking quality is evaluated using DockQ \(Basu and Wallner,[2016](https://arxiv.org/html/2606.23830#bib.bib11)\), along with interface RMSD \(I\-RMSD\) and ligand RMSD \(L\-RMSD\), with results reported per target by taking the best prediction among the top\-ranked poses\.
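For readers unfamiliar with the metric, the sketch below shows how a DockQ score combines the fraction of native contacts (Fnat) with scaled I-RMSD and L-RMSD terms, using the scaling constants 1.5 Å and 8.5 Å from the DockQ definition. The helper names, the `top_n` parameter, and the 0.23 success threshold (the conventional "acceptable quality" cutoff) are assumptions for illustration; the text above does not state how many top-ranked poses are considered or which threshold defines success.

```python
def dockq(fnat, irmsd, lrmsd, d_i=1.5, d_l=8.5):
    """DockQ: mean of Fnat and the scaled interface and ligand RMSD terms."""
    def scaled(rmsd, d):
        # Maps an RMSD in angstroms to (0, 1]; equals 1 at rmsd = 0.
        return 1.0 / (1.0 + (rmsd / d) ** 2)

    return (fnat + scaled(irmsd, d_i) + scaled(lrmsd, d_l)) / 3.0


def best_dockq(pose_scores, top_n):
    """Best DockQ among the top_n poses; pose_scores is in rank order."""
    return max(pose_scores[:top_n])


def success_rate(per_target_scores, top_n, threshold=0.23):
    """Fraction of targets whose best top_n pose reaches the threshold."""
    hits = sum(
        best_dockq(scores, top_n) >= threshold
        for scores in per_target_scores.values()
    )
    return hits / len(per_target_scores)


# Hypothetical scores for two targets, listed in the docking method's rank order.
example = {"target_a": [0.10, 0.50], "target_b": [0.05, 0.20]}
print(success_rate(example, top_n=2))
```

Taking the maximum over the top-ranked poses means a target counts as a success if any of its considered poses is good enough, so the reported rate depends on how many poses are kept.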

Specifically, we assessed docking performance on two standard benchmarks: the Antibody Benchmark \(Ab\-Bench\), comprising 46 antibody–antigen complexes with unbound structures, and a held\-out subset of DB5\.5, containing 42 non\-redundant protein–protein complexes that are sequence\-disjoint from the training data\. To account for stochasticity when binding\-site constraints were provided, each target was evaluated over multiple independent runs with different random interface samplings, and metrics were averaged per complex\.

Results are summarized in Table [3](https://arxiv.org/html/2606.23830#A3.T3)\. Incorporating SurfBind\-predicted epitopes yields consistent improvements over blind docking across all metrics\. In particular, the docking success rate increases from 26\.1% under blind docking to 38\.0% when guided by SurfBind predictions\. Correspondingly, both I\-RMSD and L\-RMSD percentiles decrease, indicating more accurate interface reconstruction and ligand placement\. While docking guided by ground\-truth epitopes achieves higher performance \(54\.3% success rate\), SurfBind recovers a substantial fraction of this gain without access to native binding site annotations\.
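The "substantial fraction" can be made concrete from the three success rates quoted above. The short calculation below is only arithmetic on those reported numbers; the function name is a hypothetical label, and treating ground-truth guidance as the upper bound is an interpretive assumption.

```python
def recovered_fraction(blind, guided, oracle):
    """Share of the blind-to-oracle gap closed by the guided setting."""
    return (guided - blind) / (oracle - blind)


# Success rates (%) reported in the text: blind, SurfBind-guided, ground-truth-guided.
fraction = recovered_fraction(26.1, 38.0, 54.3)
print(f"{fraction:.1%}")  # prints 42.2%
```

In other words, SurfBind guidance adds 11.9 percentage points out of the 28.2 points available between blind docking and ground-truth guidance.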

These results indicate that SurfBind predictions are sufficiently accurate to serve as effective docking constraints, leading to measurable improvements in docking accuracy\. Beyond standard epitope prediction benchmarks such as SAbDab and DB5\.5, this experiment demonstrates that SurfBind provides practical benefits when integrated into downstream antibody–antigen docking pipelines, supporting its utility for structure\-based interaction modeling\. While we do not perform formal hypothesis testing, the performance gains are consistent across complexes and are not driven by a small subset of outliers\.

Table 3\. Comparison of antibody–antigen docking performance following the *DockGPT* protocol\. Results are reported in terms of DockQ success rate \(SR, higher is better\) \[B\] and I\-RMSD/L\-RMSD percentiles \(lower is better\)\. The top block reports blind docking \(no epitope information provided\)\. The bottom block reports epitope\-guided docking, where DockGPT is supplied either with SurfBind\-predicted epitopes or ground\-truth epitopes\.
