ConTact: Contact-First Antibody CDR Design via Explicit Interface Reasoning

arXiv cs.LG 05/22/26, 04:00 AM Papers
antibody-design cdr contact-prediction graph-neural-network equivariant protein-design deep-learning
Summary
ConTact introduces a contact-then-act architecture for antibody CDR design that explicitly decomposes the task into interface reasoning, contact prediction, and contact-gated sequence generation, achieving state-of-the-art structural quality and epitope awareness on the Chimera-Bench benchmark.
arXiv:2605.21600v1 Announce Type: new Abstract: Computational antibody CDR design methods condition on antigen structure to generate binding loops, yet existing architectures conflate two fundamentally distinct sub-problems: identifying which CDR positions will contact the antigen, and selecting amino acids at those positions. This conflation forces models to learn contact reasoning implicitly through uniform message passing, diluting antigen signal across all positions equally. We introduce ConTact, a contact-then-act architecture that explicitly decomposes CDR design into three cascaded stages: learning surface complementarity fingerprints, predicting CDR-antigen contacts, and injecting contact-gated antigen features into the sequence head. A distance-biased cross-attention module encodes geometric priors favoring spatial neighbors, while a contact-weighted cross-entropy loss concentrates gradient signal on binding-critical positions. On the Chimera-Bench dataset, ConTact achieves the best structural quality (7% RMSD improvement over the next-best baseline), the best epitope awareness (10% F1 improvement over GNN baselines), and competitive sequence recovery (AAR 0.38) among several CDR-H3 design baselines.
# ConTact: Contact-First Antibody CDR Design via Explicit Interface Reasoning
Source: [https://arxiv.org/html/2605.21600](https://arxiv.org/html/2605.21600)
###### Abstract

Computational antibody CDR design methods condition on antigen structure to generate binding loops, yet existing architectures conflate two fundamentally distinct sub-problems: identifying which CDR positions will contact the antigen, and selecting amino acids at those positions. This conflation forces models to learn contact reasoning implicitly through uniform message passing, diluting antigen signal across all positions equally. We introduce ConTact, a contact-then-act architecture that explicitly decomposes CDR design into three cascaded stages: learning surface complementarity fingerprints, predicting CDR-antigen contacts, and injecting contact-gated antigen features into the sequence head. A distance-biased cross-attention module encodes geometric priors favoring spatial neighbors, while a contact-weighted cross-entropy loss concentrates gradient signal on binding-critical positions. On Chimera-Bench, ConTact achieves the best structural quality (7% RMSD improvement over the next-best baseline), best epitope awareness (10% F1 score over GNN baselines), and competitive sequence recovery (AAR 0.38) among several CDR-H3 design baselines.

antibody design, CDR, contact prediction, graph neural network, equivariant

## 1 Introduction

Antibodies bind antigens through their complementarity-determining regions (CDRs), six hypervariable loops whose sequence and structure determine binding specificity (Chothia and Lesk, [1987](https://arxiv.org/html/2605.21600#bib.bib278)). Computational CDR design methods condition on antigen structure to generate sequences and backbone conformations for these loops (Luo et al., [2022](https://arxiv.org/html/2605.21600#bib.bib252); Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256), [b](https://arxiv.org/html/2605.21600#bib.bib251); Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)). Yet a growing body of evidence shows that existing methods largely fail to leverage antigen information. Predictions remain nearly unchanged when the antigen is removed (Li et al., [2025](https://arxiv.org/html/2605.21600#bib.bib335)), and BLOSUM substitution matrices explain model outputs as well as learned likelihoods (Uçar and Sormanni, [2025](https://arxiv.org/html/2605.21600#bib.bib284); Chinery et al., [2024](https://arxiv.org/html/2605.21600#bib.bib337)).

We argue that a fundamental cause is architectural: current methods conflate two distinct sub-problems into a single prediction head. The first sub-problem is *where* the CDR will contact the antigen, i.e., which CDR positions form binding interactions. The second is *what* amino acids to place at those positions, given the local chemistry of the binding partner. Equivariant GNNs such as MEAN (Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256)) and RAAD (Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)) propagate antigen information through uniform message passing that treats all antigen residues equivalently. Diffusion-based methods like DiffAb (Luo et al., [2022](https://arxiv.org/html/2605.21600#bib.bib252)) concatenate antibody and antigen residues into a flat graph with only a fragment-type embedding to distinguish them. Even dyMEAN (Kong et al., [2023b](https://arxiv.org/html/2605.21600#bib.bib251)), which uses a shadow paratope mechanism and edge distance prediction for contact-aware graph construction, does not use predicted contacts to modulate sequence prediction. In all cases, the model must simultaneously discover which positions are binding-relevant and what residues belong there, with a uniform cross-entropy loss that allocates equal learning capacity to every position.

The CDR\-antigen interface is inherently sparse\. A CDR\-H3 of length 10–25 typically forms only 5–15 contacts with the antigen, and the amino acid identity at contact positions is directly constrained by the chemistry of the binding partner: hydrophobic pockets select for complementary hydrophobic CDR residues, while charged patches favor oppositely charged side chains\. Non\-contact positions are primarily constrained by backbone geometry and loop stability\. Treating these two classes of positions equally wastes learning capacity on the less informative non\-contact positions\.

We propose ConTact, a contact-first architecture that decomposes CDR design into three explicit stages, addressing the *where* before the *what*. First, the model learns surface complementarity fingerprints that characterize the local binding environment at each CDR position, inspired by molecular surface fingerprints (Gainza et al., [2020](https://arxiv.org/html/2605.21600#bib.bib238), [2023](https://arxiv.org/html/2605.21600#bib.bib239)). Second, it predicts which CDR positions will contact the antigen using a supervised contact predictor. Third, it selectively injects local antigen features into the CDR representation, gated by the predicted contact confidence, so that antigen information flows preferentially to binding-critical positions. A distance-biased cross-attention module provides geometric inductive bias by favoring spatial neighbors, and a contact-weighted cross-entropy loss concentrates gradient signal on positions the model identifies as contacts.

Our contributions are:

1. We identify the conflation of contact identification and sequence prediction as a structural limitation of existing CDR design architectures, and propose the *contact-first* design paradigm that decomposes these sub-problems into an explicit three-stage cascade.
2. We introduce a contact-gated injection mechanism with double gating (learned gate × contact confidence) that selectively routes antigen information to binding-relevant CDR positions, preventing noise from distant antigen residues at non-contact positions.
3. We demonstrate on Chimera-Bench that ConTact achieves the best RMSD (1.63 Å, 7% over next-best), epitope F1 (0.79, 10% over GNN baselines), fnat (0.67), and AAR (0.38) among eleven baselines.

## 2 Related Work

##### Equivariant GNN methods\.

MEAN (Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256)) introduced multi-channel equivariant attention with alternating intra-segment and inter-segment layers for CDR design. dyMEAN (Kong et al., [2023b](https://arxiv.org/html/2605.21600#bib.bib251)) extended this with a shadow paratope mechanism that predicts inter-chain edge distances for dynamic graph construction, making it the closest existing work to contact-aware CDR design. RAAD (Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)) defines eight relation-aware edge types with Bernoulli edge sampling and a contrastive specificity loss applied at test-time optimization. ConTact differs from all three in that it uses predicted contacts to directly modulate the sequence prediction head through gated injection and position-specific loss weighting, rather than using contact-related information solely for graph topology (dyMEAN) or test-time optimization (RAAD).

##### Diffusion and flow methods\.

DiffAb (Luo et al., [2022](https://arxiv.org/html/2605.21600#bib.bib252)) models CDR generation as a joint diffusion process over coordinates, orientations, and amino acid types. AbFlowNet (Abir et al., [2025](https://arxiv.org/html/2605.21600#bib.bib259)) extends this with flow matching and trajectory balance loss. AbMEGD (Chen et al., [2025](https://arxiv.org/html/2605.21600#bib.bib261)) and RADAb (Wang et al., [2024](https://arxiv.org/html/2605.21600#bib.bib123)) add retrieval-augmented and multi-expert components. dyAb (Tan et al., [2025](https://arxiv.org/html/2605.21600#bib.bib254)) applies flow matching with structure relaxation. FlowDesign (Wu et al., [2025a](https://arxiv.org/html/2605.21600#bib.bib303)) follows a diagnose-then-fix approach, identifying that standard Gaussian priors are poorly suited for CDR generation and replacing them with data-driven prior distributions. All these methods treat antigen conditioning as a flat concatenation of antibody and antigen residues with fragment-type embeddings, applying uniform attention without distinguishing contact from non-contact positions. ConTact addresses a complementary limitation: not the prior distribution, but the conditioning mechanism itself.

##### Antigen conditioning failures\.

Multiple studies have documented that existing CDR design methods fail to effectively use antigen information. Li et al. ([2025](https://arxiv.org/html/2605.21600#bib.bib335)) showed that predictions remain nearly unchanged when the antigen is removed. Uçar and Sormanni ([2025](https://arxiv.org/html/2605.21600#bib.bib284)) demonstrated that BLOSUM substitution matrices explain model outputs as well as learned likelihoods. Chinery et al. ([2024](https://arxiv.org/html/2605.21600#bib.bib337)) found that simple computational methods can outperform deep learning in generating diverse, binder-enriched antibody libraries. RefineGNN (Jin et al., [2022b](https://arxiv.org/html/2605.21600#bib.bib131)), which receives no antigen input, achieves the second-best binding metrics on Chimera-Bench, further corroborating this failure. The contact-first decomposition in ConTact directly targets this problem by providing an explicit, supervised pathway for antigen information to reach the sequence head.

##### Predict\-then\-design paradigms\.

The idea of predicting binding-relevant features before designing sequences has precedent in broader protein design. MaSIF-seed (Gainza et al., [2023](https://arxiv.org/html/2605.21600#bib.bib239)) predicts favorable interaction sites on molecular surfaces using learned surface fingerprints, then designs binders targeting those sites. RFdiffusion (Watson et al., [2023](https://arxiv.org/html/2605.21600#bib.bib293)) generates protein backbones first, then designs sequences with ProteinMPNN. ConTact applies a similar predict-then-design strategy at the residue-contact level: predict which CDR positions will contact the antigen, then condition sequence design on those predictions. Unlike MaSIF-seed, which operates on molecular surfaces in a separate pipeline, ConTact performs contact prediction and sequence design end-to-end within a single differentiable architecture.

## 3 Preliminaries

### 3.1 Task Definition

We adopt the formulation from Chimera-Bench (Ahmed et al., [2026](https://arxiv.org/html/2605.21600#bib.bib1)). Given an antigen structure $A=\{(s_j,\mathbf{x}_j)\mid j\in V_A\}$, an epitope specification $E\subseteq V_A$, and an antibody framework $F=\{(s_i,\mathbf{x}_i)\mid i\in V_{\text{FR}}\}$, the task is to design CDR residues $R=\{(s_k,\mathbf{x}_k)\mid k\in V_{\text{CDR}}\}$ that maximize the conditional likelihood subject to epitope contact constraints:

$$R^{*}=\operatorname*{arg\,max}_{R}\;p_{\theta}\bigl(R\mid A,E,F\bigr),\quad\text{s.t.}\;\;\mathcal{C}(R,A)\neq\emptyset \tag{1}$$

where each residue has amino acid type $s_k\in\{1,\ldots,20\}$ and Cα coordinate $\mathbf{x}_k\in\mathbb{R}^{3}$. We denote by $\mathcal{C}(R,A)=\{j\in V_A\mid\exists\,k\in V_{\text{CDR}}:\|\mathbf{x}_k-\mathbf{x}_j\|<d_c\}$ the set of antigen residues contacted within cutoff $d_c$. We focus on CDR-H3, the most variable loop and primary determinant of antigen specificity (Chothia and Lesk, [1987](https://arxiv.org/html/2605.21600#bib.bib278)).

### 3.2 Graph Construction

We represent the antibody-antigen complex as a heterogeneous graph $\mathcal{G}=(V,\mathcal{E})$. The node set $V=V_{\text{HC}}\cup V_{\text{LC}}\cup V_A\cup V_{\text{glob}}\cup V_{\text{vn}}$ contains residue nodes from the heavy chain ($V_{\text{HC}}$), light chain ($V_{\text{LC}}$), and antigen ($V_A$), three global delimiter tokens ($V_{\text{glob}}=\{\text{BOH},\text{BOL},\text{BOA}\}$), and $N_{\text{vn}}$ virtual nodes (Sestak et al., [2026](https://arxiv.org/html/2605.21600#bib.bib220)). Each residue node $i$ carries amino acid type $s_i\in\{1,\ldots,20\}$ and four backbone atom coordinates $\mathbf{X}_i=[\mathbf{x}_i^{\text{N}},\mathbf{x}_i^{\text{C}\alpha},\mathbf{x}_i^{\text{C}},\mathbf{x}_i^{\text{O}}]\in\mathbb{R}^{4\times 3}$.

The edge set $\mathcal{E}$ is partitioned into 10 typed subsets that capture different structural relationships. Within each chain, we construct *radial edges* connecting all pairs within a Cα distance cutoff, *sequential edges* linking residues separated by one or two positions in primary sequence, and *KNN edges* connecting each residue to its nearest spatial neighbors. Across chains, we add *inter-chain radial edges* and *inter-chain KNN edges* that enable direct communication between antibody and antigen residues. Three *global-to-chain edges* connect the delimiter tokens to their respective chains. Two *virtual node edge types* connect each virtual node bidirectionally to all epitope and all CDR residues. This creates a two-hop shortcut between epitope and CDR, directly addressing the over-squashing problem (Alon and Yahav, [2021](https://arxiv.org/html/2605.21600#bib.bib329)) where information from distant epitope residues dilutes through many layers of sequential message passing.
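The two-hop shortcut can be illustrated on a toy graph. The sketch below is not the paper's graph builder: it assumes a single 12-residue chain with only sequential edges, hypothetical epitope and CDR index sets, and one virtual node named `vn0`, and it simply counts hops with breadth-first search before and after the virtual node is added.

```python
from collections import deque

def hops(adj, src, dst):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def add_edge(adj, u, v):
    """Insert an undirected (bidirectional) edge."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Toy chain: sequential edges link residues one or two positions apart.
n = 12
adj = {}
for i in range(n):
    for offset in (1, 2):
        if i + offset < n:
            add_edge(adj, i, i + offset)

epitope, cdr = [0, 1], [10, 11]      # hypothetical index sets
before = hops(adj, 0, 11)            # path length using sequential edges only

# One virtual node connected to every epitope and every CDR residue.
for residue in epitope + cdr:
    add_edge(adj, "vn0", residue)
after = hops(adj, 0, 11)

print(before, after)
```

In this toy setting the shortest epitope-to-CDR path drops from six hops to two once the virtual node is present, which is the effect the paragraph above describes.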

Each edge $(i,j)$ carries a feature vector $\mathbf{e}_{ij}$ encoding edge type (one-hot), relative position in local coordinate frames, pairwise distance RBFs between backbone atom pairs, a quaternion encoding of relative backbone orientation, and local frame direction features. Virtual node edges use learnable feature vectors rather than geometric features.

### 3.3 Motivation: Contact-First Decomposition

Existing CDR design methods process antigen information through spatial message passing or cross-attention, but none separates the problem of *identifying contacts* from the problem of *designing residues at contacts*. MEAN (Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256)) alternates intra-segment and inter-segment equivariant attention layers, attending uniformly to all antigen residues. RAAD (Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)) defines eight relation-aware edge types and uses Bernoulli edge sampling over antigen connections, but its contrastive specificity loss operates only at test-time optimization, not during training. dyMEAN (Kong et al., [2023b](https://arxiv.org/html/2605.21600#bib.bib251)) introduces a shadow paratope that predicts inter-chain edge distances for graph construction, making it the closest to contact-aware, but these distances inform graph topology rather than the sequence prediction head. DiffAb (Luo et al., [2022](https://arxiv.org/html/2605.21600#bib.bib252)) and AbFlowNet (Abir et al., [2025](https://arxiv.org/html/2605.21600#bib.bib259)) concatenate all residues into a flat graph with a fragment-type embedding, applying uniform geometric attention. RefineGNN (Jin et al., [2022b](https://arxiv.org/html/2605.21600#bib.bib131)) generates CDRs autoregressively without any antigen input, yet achieves surprisingly strong binding metrics, further highlighting the failure of existing conditioning approaches.

All these methods apply a uniform cross\-entropy loss that treats every CDR position equally, whether it contacts the antigen or not\. This is suboptimal because amino acid identity at contact positions is directly constrained by the chemistry of the binding partner, while non\-contact positions are primarily constrained by backbone geometry and loop stability\.

We formalize the alternative as a design principle: *contact prediction should precede sequence prediction*. If the model first identifies which CDR positions will contact the antigen, it can selectively route antigen information to those positions and concentrate learning capacity there. This decomposition draws on the broader paradigm of predict-then-design in structural biology (Gainza et al., [2023](https://arxiv.org/html/2605.21600#bib.bib239); Watson et al., [2023](https://arxiv.org/html/2605.21600#bib.bib293)), but applies it at the residue-contact level within a single end-to-end architecture rather than as a separate pipeline.

### 3.4 Contact Definition

We define a CDR residue $k$ as contacting the antigen if its Cα atom lies within 8 Å of any antigen Cα atom:

$$c_k=\mathbb{1}\!\left[\min_{j\in V_A}\|\mathbf{x}_k-\mathbf{x}_j\|<8\,\text{Å}\right] \tag{2}$$

This threshold matches the symmetric contact definition used in the Chimera-Bench evaluation metrics (fnat, iRMSD, DockQ). The binary labels $c_k\in\{0,1\}$ serve as supervision for the contact prediction stage and as weights in the contact-weighted sequence loss.
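As a minimal sketch of Equation (2), the labels can be computed directly from Cα coordinates. The function name `contact_labels` and the toy coordinates are illustrative assumptions; only the 8 Å cutoff and the strict inequality come from the definition above.

```python
import math

def contact_labels(cdr_ca, ag_ca, cutoff=8.0):
    """Binary label per CDR residue: 1 if any antigen C-alpha is closer than cutoff (in angstroms)."""
    labels = []
    for x_k in cdr_ca:
        d_min = min(math.dist(x_k, x_j) for x_j in ag_ca)
        labels.append(1 if d_min < cutoff else 0)
    return labels

# Toy coordinates in angstroms (made up for illustration).
cdr_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
ag_ca = [(0.0, 6.0, 0.0), (0.0, 30.0, 0.0)]

labels = contact_labels(cdr_ca, ag_ca)
print(labels)  # [1, 1, 0]
```

The first two CDR residues sit 6.0 Å and about 7.1 Å from the nearest antigen residue and are labeled contacts; the third is more than 20 Å away and is not.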

## 4 Method

ConTact consists of three components: (i) a VirtualNode-EGNN encoder that performs E(3)-equivariant message passing over the heterogeneous graph, (ii) a distance-biased cross-attention module that combines CDR and antigen representations with spatial priors, and (iii) a three-stage contact-first decoder that cascades complementarity fingerprinting, contact prediction, and contact-guided sequence generation. [Figure 1](https://arxiv.org/html/2605.21600#S4.F1) illustrates the full pipeline.

[Figure 1 diagram labels: Ab-Ag complex $\{(s_i,\mathbf{X}_i)\}$; VirtualNode-EGNN encoder; CDR embeddings $\mathbf{h}^{\text{cdr}}$ and antigen embeddings $\mathbf{h}^{\text{ag}}$; distance-biased cross-attention; three-stage decoder (Stage 1: fingerprint, Stage 2: contact predictor, Stage 3: local complementarity injector); sequence head over 20 amino acids; losses $\mathcal{L}_{\text{fp}}$, $\mathcal{L}_{\text{contact}}$, $\mathcal{L}_{\text{seq}}$, $\mathcal{L}_{\text{coord}}$.]

Figure 1: ConTact architecture. The encoder maps residue features through a VirtualNode-EGNN to produce per-residue embeddings and updated coordinates. CDR and antigen embeddings are combined via distance-biased cross-attention. The three-stage decoder cascades complementarity fingerprinting, contact prediction, and contact-guided local complementarity injection. Each stage conditions on the previous stage's output. The contact-weighted sequence loss $\mathcal{L}_{\text{seq}}$ up-weights positions identified as contacts by Stage 2. Dashed arrows indicate antigen features flowing directly to Stage 3 for local aggregation.

### 4.1 Feature Encoding

Each residue $i$ in the antibody-antigen complex is represented by a feature vector $\mathbf{f}_i$ composed of five groups. Amino acid identity: a one-hot encoding of the residue type, masked to the zero vector for all CDR positions during training to prevent trivial teacher-forcing. Backbone distance RBFs: intraresidue bond lengths (N–Cα, Cα–C, C–O) each expanded into Gaussian basis functions:

$$\phi_{\text{rbf}}(d)_m=\exp\!\left(-\frac{(d-\mu_m)^{2}}{2\varsigma^{2}}\right),\quad m=1,\ldots,M \tag{3}$$

where $\mu_m$ are uniformly spaced centers and $\varsigma$ is the basis width. Backbone angles: bond angles and dihedral angles ($\phi$, $\psi$, $\omega$), each encoded as sine-cosine pairs. Local frame directions: unit vectors along the three local coordinate axes defined by the N-Cα-C backbone triangle. Sinusoidal position embedding: encoding of the residue index within its chain at multiple frequency scales.
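A short sketch of the Gaussian basis expansion in Equation (3) and the sine-cosine angle encoding follows. The range, the number of centers, and the choice of setting the width equal to the center spacing are assumptions made for illustration; the text does not state these values.

```python
import math

def rbf_expand(d, d_min=0.0, d_max=20.0, num_basis=16, width=None):
    """Expand a scalar distance into Gaussian basis functions with uniformly spaced centers."""
    step = (d_max - d_min) / (num_basis - 1)
    centers = [d_min + m * step for m in range(num_basis)]
    width = step if width is None else width   # assumed default: width equals center spacing
    return [math.exp(-((d - mu) ** 2) / (2.0 * width ** 2)) for mu in centers]

def sincos(angle):
    """Encode an angle in radians as a (sin, cos) pair, which avoids the wrap-around at +/- pi."""
    return (math.sin(angle), math.cos(angle))

# A distance of 1.46 (roughly an N-C-alpha bond length in angstroms) on a coarse
# four-center grid with centers 0, 1, 2, 3.
feats = rbf_expand(1.46, d_min=0.0, d_max=3.0, num_basis=4)
print([round(v, 3) for v in feats])
print(sincos(math.pi))
```

Each distance becomes a smooth, bounded vector whose largest entry sits at the nearest center, which is generally easier for an MLP to consume than a raw scalar.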

A segment type indicator distinguishes heavy chain, light chain, and antigen residues. A dual-path MLP processes geometric and chemical features through separate pathways with SiLU activations, fuses the outputs, and projects to embedding dimension $d$:

$$\mathbf{h}_i^{(0)}=\text{MLP}_{\text{fuse}}\!\left([\text{MLP}_{\text{geom}}(\mathbf{f}_i^{\text{geom}}),\;\text{MLP}_{\text{chem}}(\mathbf{f}_i^{\text{chem}})]\right) \tag{4}$$

Epitope residues (those in $E$) receive an additional learnable embedding $\mathbf{e}_{\text{epi}}$ added to their representation, providing an explicit signal that these residues are part of the designated binding site.

### 4.2 VirtualNode-EGNN Encoder

The encoder applies multiple relation-aware E(3)-equivariant GNN layers (Satorras et al., [2021](https://arxiv.org/html/2605.21600#bib.bib179)) on graph $\mathcal{G}$. Virtual nodes with learnable feature vectors and learnable coordinates participate in message passing through the VN-to-epitope and VN-to-CDR edge types. Because each virtual node connects to both all epitope residues and all CDR residues, information flows from any epitope residue to any CDR residue in exactly two message-passing steps. Without virtual nodes, this information must traverse the graph via sequential edges, suffering from over-squashing (Alon and Yahav, [2021](https://arxiv.org/html/2605.21600#bib.bib329)) at bottleneck residues.

Each layer $l$ updates node features and coordinates simultaneously. For edge $(i,j)$ of type $t$, the message function takes the concatenation of sender and receiver embeddings, an outer product geometry term, and the edge features:

$$\mathbf{m}_{ij}^{(l)}=\text{MLP}_{\text{msg}}^{(l)}\!\left([\mathbf{h}_i^{(l)},\;\mathbf{h}_j^{(l)},\;\text{vec}\!\left(\Delta\mathbf{x}_{ij}(\Delta\mathbf{x}_{ij})^{\top}\right),\;\mathbf{e}_{ij}]\right) \tag{5}$$

where $\Delta\mathbf{x}_{ij}=\mathbf{x}_i^{(l)}-\mathbf{x}_j^{(l)}$ and $\text{vec}(\cdot)$ flattens the $3\times 3$ outer product matrix into a 9-dimensional vector. The entries of $\Delta\mathbf{x}_{ij}(\Delta\mathbf{x}_{ij})^{\top}$ are dot products of displacement components, which are invariant to rotations, translations, and reflections.

The model aggregates messages from all edge types with type-specific linear projections and updates node features via a residual connection:

$$\mathbf{h}_i^{(l+1)}=\mathbf{h}_i^{(l)}+\text{MLP}_{\text{node}}^{(l)}\!\left(\left[\mathbf{h}_i^{(l)},\;\sum_{t}\mathbf{W}_t^{(l)}\sum_{j\in\mathcal{N}_t(i)}\mathbf{m}_{ij}^{(l)}\right]\right) \tag{6}$$

where $\mathbf{W}_t^{(l)}$ is a type-specific projection matrix and $\mathcal{N}_t(i)$ denotes the neighbors of node $i$ under edge type $t$. Coordinates are updated equivariantly by adding a weighted sum of displacement vectors:

$$\mathbf{x}_i^{(l+1)}=\mathbf{x}_i^{(l)}+\sum_{t}\frac{1}{|\mathcal{N}_t(i)|}\sum_{j\in\mathcal{N}_t(i)}\Delta\mathbf{x}_{ij}\cdot\text{MLP}_t^{\text{coord},(l)}(\mathbf{m}_{ij}^{(l)}) \tag{7}$$

where $\text{MLP}_t^{\text{coord},(l)}$ produces a scalar weight. The product of a displacement vector and a scalar computed from invariant inputs is equivariant by construction. After all layers, the encoder produces residue embeddings $\mathbf{h}\in\mathbb{R}^{N\times D}$ and updated backbone coordinates $\hat{\mathbf{X}}$.
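The equivariance argument for Equation (7) can be checked numerically. The sketch below makes several simplifying assumptions: a single edge type, one coordinate per node, and a hypothetical scalar weight `weight_fn` that depends only on the pairwise distance, standing in for the coordinate MLP applied to an invariant message. It verifies that applying a rigid motion before or after the update gives the same result.

```python
import math

def coord_update(coords, neighbors, weight_fn):
    """x_i <- x_i + mean over neighbors j of (x_i - x_j) * w_ij, with w_ij a scalar."""
    updated = []
    for i, x_i in enumerate(coords):
        delta = [0.0, 0.0, 0.0]
        for j in neighbors[i]:
            dx = [a - b for a, b in zip(x_i, coords[j])]
            dist = math.sqrt(sum(c * c for c in dx))
            w = weight_fn(dist)                      # invariant scalar weight
            delta = [acc + w * c for acc, c in zip(delta, dx)]
        k = max(len(neighbors[i]), 1)
        updated.append([a + b / k for a, b in zip(x_i, delta)])
    return updated

def rigid(coords, angle, shift):
    """Rotate about the z-axis by angle, then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]]
            for x, y, z in coords]

coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.0, 1.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def weight_fn(dist):
    # Hypothetical stand-in for the coordinate MLP; any function of distance works here.
    return 0.1 * math.tanh(dist - 4.0)

shift = (1.0, -2.0, 3.0)
update_then_move = rigid(coord_update(coords, neighbors, weight_fn), 0.7, shift)
move_then_update = coord_update(rigid(coords, 0.7, shift), neighbors, weight_fn)

max_err = max(abs(p - q)
              for u, v in zip(update_then_move, move_then_update)
              for p, q in zip(u, v))
print(max_err)
```

The two orders of operation agree up to floating-point error because the displacement vectors rotate with the input while the scalar weights do not change.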

### 4.3 Distance-Biased Cross-Attention

After encoding, we extract CDR node embeddings $\mathbf{H}_{\text{cdr}}\in\mathbb{R}^{L\times D}$ and antigen node embeddings $\mathbf{H}_{\text{ag}}\in\mathbb{R}^{M\times D}$. Standard cross-attention computes alignment scores purely from learned feature similarity, making no distinction between an antigen residue 5 Å from the CDR and one 30 Å away. ConTact adds a Gaussian spatial bias that encodes a geometric inductive bias, since binding contacts are necessarily spatial neighbors.

We project queries $\mathbf{Q}=\mathbf{H}_{\text{cdr}}\mathbf{W}_Q$ and keys $\mathbf{K}=\mathbf{H}_{\text{ag}}\mathbf{W}_K$ into $H$ attention heads. The attention score between CDR position $i$ and antigen position $j$ is:

$$\alpha_{ij}^{(h)}=\operatorname{softmax}_j\!\left(\frac{(\mathbf{q}_i^{(h)})^{\top}\mathbf{k}_j^{(h)}}{\sqrt{D_h}}+\beta_{ij}\right) \tag{8}$$

where the distance bias $\beta_{ij}$ decays as a Gaussian function of the Cα–Cα distance between the updated coordinates from the encoder:

$$\beta_{ij}=\exp\!\left(-\frac{d_{ij}^{2}}{2\sigma^{2}}\right),\quad d_{ij}=\|\hat{\mathbf{x}}_i^{\text{C}\alpha}-\hat{\mathbf{x}}_j^{\text{C}\alpha}\|_2 \tag{9}$$

with bandwidth $\sigma$. The Gaussian decay is near 1.0 for residues within van der Waals contact distance, drops to $e^{-2}\approx 0.14$ at $2\sigma$ (approximately the contact threshold), and becomes negligible beyond $3\sigma$. The bias is shared across heads to provide a consistent spatial prior, while the learned query-key projections specialize to different aspects of the binding interaction.
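A single-head sketch of Equations (8) and (9) shows how the bias separates two antigen residues with identical features. The bandwidth `sigma = 4.0` is an assumption chosen so that $2\sigma$ equals the 8 Å contact threshold; the query, keys, and distances are toy values.

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def distance_bias(d, sigma):
    """Gaussian bias of Eq. (9) for a C-alpha distance d."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def biased_attention(q, keys, dists, sigma):
    """Attention weights of one CDR query over antigen keys, Eq. (8), single head."""
    scale = math.sqrt(len(q))
    logits = [sum(a * b for a, b in zip(q, k)) / scale + distance_bias(d, sigma)
              for k, d in zip(keys, dists)]
    return softmax(logits)

sigma = 4.0                                       # assumed bandwidth, so 2 * sigma = 8 angstroms
q = [1.0, 0.0]
keys = [[0.5, 0.5], [0.5, 0.5]]                   # identical features, different distances
w = biased_attention(q, keys, dists=[4.0, 30.0], sigma=sigma)
print([round(x, 3) for x in w])
```

With equal feature similarity, the residue at 4 Å receives more weight than the one at 30 Å. Because $\beta_{ij}$ as written lies in $(0, 1]$, the additive bias alone can change the ratio of two attention weights by at most a factor of $e$; the learned query-key term accounts for the rest.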

The output for each CDR position concatenates the multi-head weighted sum with the original CDR embedding:

$$\mathbf{o}_i=\Big[\mathbf{h}_i^{\text{cdr}},\;\Big\|_{h=1}^{H}\sum_{j=1}^{M}\alpha_{ij}^{(h)}\mathbf{v}_j^{(h)}\Big] \tag{10}$$

where $\|$ denotes concatenation and $\mathbf{v}_j^{(h)}=\mathbf{h}_j^{\text{ag}}\mathbf{W}_V^{(h)}$ are value projections. The skip connection preserves the CDR embedding for downstream stages that primarily need structural context.

### 4.4 Decoder

#### Stage 1: Complementarity Fingerprinting

The first decoder stage compresses the cross-attention output into a compact representation of the local surface complementarity at each CDR position. Binding interactions follow chemical complementarity patterns (hydrophobic-hydrophobic, charge-charge, donor-acceptor) that can be captured in a low-dimensional fingerprint, analogous to molecular fingerprints in cheminformatics. Given the cross-attention output $\mathbf{o}_i$ for CDR position $i$, an MLP produces a fingerprint vector:

$$\mathbf{f}_i=\text{MLP}_{\text{fp}}(\mathbf{o}_i) \tag{11}$$

We train the fingerprint with a contrastive loss that enforces structural consistency. CDR positions that face similar local antigen environments should have similar fingerprints, while positions facing dissimilar environments should have distinct fingerprints. We define ground-truth similarity between two CDR positions $i$ and $j$ (potentially from different complexes in the batch) based on the cosine similarity of their true local environment descriptors, computed from the 3D arrangement and amino acid composition of the nearest antigen residues around each position. Positive pairs $\mathcal{P}$ are those exceeding a similarity threshold. The loss follows an InfoNCE formulation:

$$\mathcal{L}_{\text{fp}}=-\frac{1}{|\mathcal{P}|}\sum_{(i,j)\in\mathcal{P}}\log\frac{\exp(\mathbf{f}_i^{\top}\mathbf{f}_j/\tau_{\text{fp}})}{\sum_{k\in\mathcal{B}}\exp(\mathbf{f}_i^{\top}\mathbf{f}_k/\tau_{\text{fp}})} \tag{12}$$

where $\mathcal{B}$ is the set of all CDR positions in the batch and $\tau_{\text{fp}}$ is a temperature parameter. The contrastive objective ensures that the fingerprint captures *what kind of binding environment* a CDR position faces, conditioning the subsequent contact prediction stage.
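The InfoNCE objective in Equation (12) can be sketched in a few lines. This is an illustration under assumptions: the temperature `tau = 0.1` and the toy fingerprints are made up, and the denominator follows the equation literally by summing over every position in the batch, including the anchor itself.

```python
import math

def info_nce(fingerprints, positives, tau=0.1):
    """Mean over positive pairs (i, j) of -log softmax_k(f_i . f_k / tau) evaluated at k = j."""
    total = 0.0
    for i, j in positives:
        logits = [sum(a * b for a, b in zip(fingerprints[i], f_k)) / tau
                  for f_k in fingerprints]
        m = max(logits)                            # log-sum-exp with max subtraction
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += logits[j] - log_z
    return -total / len(positives)

# Three toy unit-norm fingerprints: 0 and 1 nearly parallel, 2 orthogonal to 0.
fps = [[1.0, 0.0], [0.98, 0.199], [0.0, 1.0]]

aligned = info_nce(fps, positives=[(0, 1)])        # positive pair has similar fingerprints
misaligned = info_nce(fps, positives=[(0, 2)])     # positive pair has dissimilar fingerprints
print(round(aligned, 3), round(misaligned, 3))
```

The loss is small when positions declared similar already have similar fingerprints and large when they do not, so minimizing it pulls positive pairs together relative to the rest of the batch.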

#### Stage 2: Contact Prediction

The second stage predicts which CDR positions will form contacts with the antigen. This is the central component of the contact-first decomposition. Existing methods leave contact identification as an implicit byproduct of message passing (Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256); Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)) or graph construction (Kong et al., [2023b](https://arxiv.org/html/2605.21600#bib.bib251)). ConTact supervises contact prediction explicitly and uses the predictions to gate downstream information flow.

For each CDR position $i$, we aggregate features from its $K$ nearest antigen neighbors (by Cα distance) using distance-weighted pooling:

$$\mathbf{a}_i = \frac{1}{K}\sum_{j\in\text{KNN}_K(i)} \phi_{\text{rbf}}(d_{ij}) \odot \mathbf{h}_j^{\text{ag}} \tag{13}$$

where $\phi_{\text{rbf}}(d_{ij})$ is a learned linear projection of the RBF encoding broadcast to match the antigen embedding dimension, and $\odot$ denotes elementwise multiplication. The contact predictor takes a concatenation of four inputs: the CDR embedding, the KNN-aggregated antigen features, an RBF encoding of the minimum distance to any antigen residue, and the complementarity fingerprint from Stage 1:

$$\hat{c}_i = \sigma\left(\text{MLP}_{\text{ct}}\left([\mathbf{h}_i^{\text{cdr}},\; \mathbf{a}_i,\; \phi_{\text{rbf}}(d_i^{\text{min}}),\; \mathbf{f}_i]\right)\right) \tag{14}$$

where $d_i^{\text{min}} = \min_{j\in V_A}\|\hat{\mathbf{x}}_i - \hat{\mathbf{x}}_j\|$ is the minimum Cα distance to any antigen residue and $\sigma$ denotes the sigmoid function. Including the fingerprint $\mathbf{f}_i$ from Stage 1 creates a cascaded dependency, so that the quality of contact prediction depends on the learned complementarity representation.
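
To make the data flow of Eqs. 13 and 14 concrete, the following NumPy sketch runs one forward pass of a contact predictor with randomly initialized weights. It is an illustration under stated assumptions rather than the paper's code: the Gaussian RBF form, the RBF centers and width, the single hidden layer, all dimensions, and the names `rbf`, `predict_contacts`, and `params` are hypothetical.

```python
import numpy as np

def rbf(d, centers, width):
    """Gaussian radial basis encoding: distances (...,) -> (..., n_centers)."""
    return np.exp(-((d[..., None] - centers) ** 2) / (2.0 * width ** 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_contacts(x_cdr, x_ag, h_cdr, h_ag, f, params, K=4):
    d = np.linalg.norm(x_cdr[:, None, :] - x_ag[None, :, :], axis=-1)   # (L, N) distances
    knn = np.argsort(d, axis=1)[:, :K]                                   # K nearest antigen residues
    d_knn = np.take_along_axis(d, knn, axis=1)                           # (L, K)
    centers, width = params["centers"], params["width"]
    w = rbf(d_knn, centers, width) @ params["W_rbf"]                     # phi_rbf(d_ij): (L, K, D)
    a = (w * h_ag[knn]).mean(axis=1)                                     # Eq. 13: (L, D)
    d_min = d.min(axis=1)                                                # minimum distance per position
    feats = np.concatenate([h_cdr, a, rbf(d_min, centers, width), f], axis=1)
    hidden = np.maximum(0.0, feats @ params["W1"] + params["b1"])       # one hidden layer for brevity
    logit = hidden @ params["W2"] + params["b2"]
    return sigmoid(logit[:, 0])                                          # Eq. 14: soft contacts in [0, 1]

rng = np.random.default_rng(0)
L, N, D, C, F, H = 5, 12, 8, 6, 4, 16            # toy sizes, not the paper's
params = {
    "centers": np.linspace(0.0, 20.0, C), "width": 4.0,
    "W_rbf": rng.normal(size=(C, D)),
    "W1": rng.normal(size=(D + D + C + F, H)) * 0.1, "b1": np.zeros(H),
    "W2": rng.normal(size=(H, 1)) * 0.1, "b2": np.zeros(1),
}
c_hat = predict_contacts(rng.normal(size=(L, 3)) * 5, rng.normal(size=(N, 3)) * 5,
                         rng.normal(size=(L, D)), rng.normal(size=(N, D)),
                         rng.normal(size=(L, F)), params)
```

The output `c_hat` holds one soft contact probability per CDR position, which is the quantity consumed by the later stages.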

The contact predictor outputs a soft probability $\hat{c}_i \in [0,1]$ rather than a hard binary decision. The sigmoid output already provides smooth gradients, and hard thresholding introduced training instability in preliminary experiments. We supervise the contact predictor with a focal binary cross-entropy loss (Lin et al., [2017](https://arxiv.org/html/2605.21600#bib.bib339)) that addresses the inherent class imbalance, where non-contact positions typically outnumber contacts by 3–5×:

$$\mathcal{L}_{\text{contact}} = -\frac{1}{L}\sum_{i=1}^{L}(1-\hat{p}_i)^{\gamma}\left[c_i\log\hat{c}_i + (1-c_i)\log(1-\hat{c}_i)\right] \tag{15}$$

where $c_i \in \{0,1\}$ is the ground-truth contact label ([Equation 2](https://arxiv.org/html/2605.21600#S3.E2)), $\hat{p}_i$ denotes the predicted probability of the correct class (that is, $\hat{p}_i = \hat{c}_i$ when $c_i = 1$ and $\hat{p}_i = 1-\hat{c}_i$ otherwise), and $\gamma$ is the focusing parameter. The factor $(1-\hat{p}_i)^{\gamma}$ down-weights well-classified examples, concentrating the learning signal on hard, ambiguous positions near the contact boundary.
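
A small NumPy sketch of the focal binary cross-entropy in Eq. 15 is given below. The function name `focal_bce` and the clipping constant `eps` are assumptions added for numerical safety; $\gamma = 2$ matches the value reported in the implementation details. The sketch uses the identity that the bracketed term in Eq. 15 equals $\log\hat{p}_i$.

```python
import numpy as np

def focal_bce(c_hat, c, gamma=2.0, eps=1e-7):
    """Focal BCE (Eq. 15) for predicted contacts c_hat and binary labels c."""
    c_hat = np.clip(c_hat, eps, 1.0 - eps)              # avoid log(0); eps is an assumption
    p_correct = np.where(c == 1, c_hat, 1.0 - c_hat)    # probability assigned to the true class
    # c*log(c_hat) + (1-c)*log(1-c_hat) is exactly log(p_correct)
    return float(np.mean(-((1.0 - p_correct) ** gamma) * np.log(p_correct)))

easy = focal_bce(np.array([0.9]), np.array([1]))        # confident and correct
hard = focal_bce(np.array([0.6]), np.array([1]))        # near the decision boundary
```

Setting `gamma=0` recovers ordinary binary cross-entropy, so the effect of the focusing term can be checked by comparing the two settings on the same inputs.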

#### Stage 3: Contact\-Guided Local Complementarity Injection

The third stage uses the predicted contact confidence $\hat{c}_i$ from Stage 2 to selectively inject local antigen information into the CDR embeddings. Antigen features should influence CDR representations primarily at positions the model predicts will form contacts, while non-contact positions should rely mainly on their backbone geometry context.

For each CDR position $i$, we aggregate features from its $K$ nearest antigen neighbors:

$$\mathbf{h}_i^{\text{local}} = \frac{1}{K}\sum_{j\in\text{KNN}_K(i)} \mathbf{h}_j^{\text{ag}} \tag{16}$$

A learned gate modulates the injection magnitude based on the CDR embedding and the contact prediction:

$$g_i = \sigma\left(\mathbf{w}_g^{\top}[\mathbf{h}_i^{\text{cdr}},\; \hat{c}_i] + b_g\right) \tag{17}$$

The enriched CDR embedding combines the original representation with the gated antigen information:

$$\mathbf{h}_i^{\text{enriched}} = \mathbf{h}_i^{\text{cdr}} + g_i \cdot \hat{c}_i \cdot \text{MLP}_{\text{proj}}(\mathbf{h}_i^{\text{local}}) \tag{18}$$

The product $g_i \cdot \hat{c}_i$ creates a double gating mechanism. The contact confidence $\hat{c}_i$ from Stage 2 provides a data-driven prior: at non-contact positions ($\hat{c}_i \approx 0$), the antigen contribution is effectively zeroed out regardless of the learned gate, preventing noise from distant antigen residues. The learned gate $g_i$ provides fine-grained control, allowing the model to modulate injection magnitude even at contact positions based on local structural context.

The final representation for the sequence head concatenates the enriched embedding with the contact-masked cross-attention output:

$$\mathbf{z}_i = [\mathbf{h}_i^{\text{enriched}},\; \hat{c}_i \cdot \mathbf{o}_i^{\text{attn}}] \tag{19}$$

where $\mathbf{o}_i^{\text{attn}}$ is the attention output from [Section 4.3](https://arxiv.org/html/2605.21600#S4.SS3). Multiplying by $\hat{c}_i$ further suppresses the attention-derived antigen information at non-contact positions.
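
The double gating of Eqs. 16 to 19 can be illustrated with a short NumPy sketch. Everything beyond the equations themselves is assumed for illustration: the two-layer ReLU stand-in for $\text{MLP}_{\text{proj}}$, the shared embedding width `D`, the random neighbor indices, and the names `inject` and `params`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inject(h_cdr, h_ag, knn, c_hat, o_attn, params):
    h_local = h_ag[knn].mean(axis=1)                               # Eq. 16: mean over K neighbors
    gate_in = np.concatenate([h_cdr, c_hat[:, None]], axis=1)      # [h_cdr, c_hat]
    g = sigmoid(gate_in @ params["w_g"] + params["b_g"])           # Eq. 17: learned gate, shape (L,)
    proj = np.maximum(0.0, h_local @ params["W1"]) @ params["W2"]  # stand-in for MLP_proj
    h_enriched = h_cdr + (g * c_hat)[:, None] * proj               # Eq. 18: double gating
    z = np.concatenate([h_enriched, c_hat[:, None] * o_attn], axis=1)  # Eq. 19
    return h_enriched, z

rng = np.random.default_rng(0)
L, N, D, K = 4, 10, 8, 3                                           # toy sizes
params = {"w_g": rng.normal(size=D + 1), "b_g": 0.0,
          "W1": rng.normal(size=(D, D)), "W2": rng.normal(size=(D, D))}
h_cdr, h_ag = rng.normal(size=(L, D)), rng.normal(size=(N, D))
knn = rng.integers(0, N, size=(L, K))                              # placeholder neighbor indices
o_attn = rng.normal(size=(L, D))
c_hat = np.array([0.0, 0.1, 0.9, 1.0])
h_enriched, z = inject(h_cdr, h_ag, knn, c_hat, o_attn, params)
```

In this toy run the first position has $\hat{c}_i = 0$, so its enriched embedding equals the original CDR embedding and its attention slice in `z` is zero, whatever value the learned gate takes.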

#### Contact\-Weighted Sequence Head

The sequence head maps the final representation $\mathbf{z}_i$ to amino acid logits $\boldsymbol{\ell}_i \in \mathbb{R}^{20}$ via an MLP. Rather than standard per-position cross-entropy, we apply a contact-weighted variant that allocates more learning capacity to positions predicted to form binding contacts:

$$\mathcal{L}_{\text{seq}} = -\frac{1}{L}\sum_{i=1}^{L} w_i \log\frac{\exp(\ell_i^{y_i})}{\sum_{a=1}^{20}\exp(\ell_i^{a})} \tag{20}$$

where $y_i$ is the ground-truth amino acid at position $i$ and the position-specific weight is:

$$w_i = 1 + \alpha \cdot \hat{c}_i \tag{21}$$

The hyperparameter $\alpha$ controls the relative up-weighting of contact positions. This reweighting follows from the observation that standard cross-entropy distributes learning capacity uniformly, treating a non-contact glycine at the loop apex the same as a contact-forming tryptophan buried in an antigen pocket. By up-weighting contacts, the model receives stronger gradient signal at precisely the positions where amino acid identity is most constrained by the antigen.
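
As a rough illustration of Eqs. 20 and 21, the sketch below computes the contact-weighted cross-entropy in NumPy. The function name `contact_weighted_ce` is hypothetical, and $\alpha = 4.47$ is the value reported in the implementation details. Whether $\hat{c}_i$ is detached from the gradient when used as a weight is not stated in this section, so the sketch only covers the forward computation.

```python
import numpy as np

def contact_weighted_ce(logits, y, c_hat, alpha=4.47):
    """Eqs. 20-21: logits (L, 20), integer labels y (L,), predicted contacts c_hat (L,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)            # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y]                          # per-position negative log-likelihood
    w = 1.0 + alpha * c_hat                                         # Eq. 21
    return float(np.mean(w * nll))                                  # Eq. 20

logits = np.zeros((2, 20))                   # uniform predictions over 20 amino acids
y = np.array([3, 7])
seq_loss = contact_weighted_ce(logits, y, c_hat=np.array([0.0, 1.0]))
```

With uniform logits each position contributes $\log 20$, so the example makes the weighting visible: the predicted contact counts $1 + \alpha$ times as much as the non-contact position.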

At inference, the predicted amino acid at each position is $\hat{s}_i = \arg\max_a \ell_i^{a}$. The contact predictions $\hat{c}_i$ can also be inspected to verify which positions the model believes form contacts.

### 4.5 Training Objective

The full training objective combines seven loss terms:

$$\begin{split}\mathcal{L} = \mathcal{L}_{\text{seq}} &+ \lambda_{\text{coord}}\mathcal{L}_{\text{coord}} + \lambda_{\text{contact}}\mathcal{L}_{\text{contact}} + \lambda_{\text{fp}}\mathcal{L}_{\text{fp}} \\ &+ \lambda_{\text{pair}}\mathcal{L}_{\text{pair}} + \lambda_{\text{dock}}\mathcal{L}_{\text{dock}} + \lambda_{\text{aux}}\mathcal{L}_{\text{aux}}\end{split} \tag{22}$$

The coordinate loss $\mathcal{L}_{\text{coord}}$ is a smooth-$\ell_1$ (Huber) loss on predicted versus true Cα coordinates for CDR positions:

$$\mathcal{L}_{\text{coord}} = \frac{1}{L}\sum_{k\in V_{\text{CDR}}} \text{smooth}_{\ell_1}\left(\hat{\mathbf{x}}_k^{\text{C}\alpha} - \mathbf{x}_k^{\text{C}\alpha,\text{true}}\right) \tag{23}$$

The pairing loss $\mathcal{L}_{\text{pair}}$ is an InfoNCE contrastive loss that matches mean-pooled CDR and antigen embeddings within the batch, treating cognate pairs as positives:

$$\mathcal{L}_{\text{pair}} = -\frac{1}{B}\sum_{i=1}^{B}\log\frac{\exp(\bar{\mathbf{h}}_i^{\text{cdr}}\cdot\bar{\mathbf{h}}_i^{\text{ag}}/\tau_p)}{\sum_{k=1}^{B}\exp(\bar{\mathbf{h}}_i^{\text{cdr}}\cdot\bar{\mathbf{h}}_k^{\text{ag}}/\tau_p)} \tag{24}$$

The docking loss $\mathcal{L}_{\text{dock}}$ penalizes the minimum predicted Cα distance from each CDR residue to epitope atoms when it exceeds a cutoff, encouraging the predicted backbone to dock near the epitope. The auxiliary loss $\mathcal{L}_{\text{aux}}$ is a CDR feature reconstruction regularizer that prevents representation collapse. All loss weights $\lambda$ are determined by hyperparameter sweeps using Weights & Biases (W&B).
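
The three geometric and contrastive terms described above can be sketched in NumPy as follows. This is a hedged illustration only: the smooth-$\ell_1$ transition point `beta`, the reduction over the three coordinates, the temperature `tau`, the hinge form and 8 Å cutoff of the docking penalty, and all function names are assumptions, since this section does not specify them.

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Elementwise smooth-L1 (Huber-style); beta=1.0 is an assumed transition point."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def coord_loss(x_pred, x_true, beta=1.0):
    # Eq. 23; summing over x, y, z before averaging over residues is an assumption
    return float(smooth_l1(x_pred - x_true, beta).sum(axis=1).mean())

def pair_infonce(h_cdr_mean, h_ag_mean, tau=0.1):
    # Eq. 24: row i of each (B, D) input is a cognate CDR/antigen pair
    logits = h_cdr_mean @ h_ag_mean.T / tau
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def dock_loss(x_cdr_pred, x_epi, cutoff=8.0):
    # Hinge on the minimum CDR-to-epitope distance; form and cutoff are assumptions
    d = np.linalg.norm(x_cdr_pred[:, None, :] - x_epi[None, :, :], axis=-1)
    return float(np.maximum(0.0, d.min(axis=1) - cutoff).mean())

x_true = np.array([[0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
l_coord = coord_loss(np.zeros((2, 3)), x_true)
l_pair = pair_infonce(np.ones((3, 4)), np.ones((3, 4)), tau=1.0)
l_dock = dock_loss(np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 0.0]]))
```

The docking term is zero for any residue already within the cutoff, so under this assumed form it only pulls on residues that have drifted away from the epitope.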

## 5 Experiments

### 5.1 Setup

##### Dataset and metrics\.

We evaluate on Chimera-Bench (Ahmed et al., [2026](https://arxiv.org/html/2605.21600#bib.bib1)), comprising 2,922 antibody-antigen complexes with the epitope-group split (2,338/292/292 train/val/test). We report CDR-H3 results across eight metrics. Sequence quality is measured by amino acid recovery (AAR), contact AAR (CAAR, restricted to positions within 8 Å of the antigen), and perplexity (PPL). Structural quality is measured by Cα RMSD. Binding quality is measured by fraction of native contacts (fnat), interface RMSD (iRMSD), DockQ (Basu and Wallner, [2016](https://arxiv.org/html/2605.21600#bib.bib272)), and epitope F1. All interface metrics use symmetric Cα–Cα contacts at 8 Å restricted to CDR residues.
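
The two recovery metrics can be illustrated with a short sketch. It assumes AAR is the fraction of positions whose designed residue matches the native one, and that CAAR applies the same ratio to positions whose minimum Cα distance to the antigen is at most 8 Å. The function names, the inclusive threshold, and the toy sequences and coordinates are hypothetical.

```python
import numpy as np

def aar(pred, true):
    """Fraction of positions where the designed residue matches the native residue."""
    matches = [p == t for p, t in zip(pred, true)]
    return sum(matches) / len(matches)

def contact_aar(pred, true, x_cdr, x_ag, cutoff=8.0):
    """AAR restricted to CDR positions within `cutoff` angstroms of any antigen residue."""
    d = np.linalg.norm(x_cdr[:, None, :] - x_ag[None, :, :], axis=-1)
    idx = np.flatnonzero(d.min(axis=1) <= cutoff)      # inclusive threshold is an assumption
    if len(idx) == 0:
        return float("nan")                            # no contact positions to score
    return sum(pred[i] == true[i] for i in idx) / len(idx)

pred, true = "ARDYW", "ARGYF"                          # toy designed vs native CDR
x_cdr = np.array([[0.0, 0, 0], [5, 0, 0], [10, 0, 0], [15, 0, 0], [20, 0, 0]])
x_ag = np.array([[20.0, 0, 0]])                        # single antigen residue
overall = aar(pred, true)
at_contacts = contact_aar(pred, true, x_cdr, x_ag)
```

In the toy case only the last two positions lie within 8 Å of the antigen, so CAAR is computed on a much smaller set than AAR and can differ from it sharply.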

##### Baselines\.

We compare against eleven baselines spanning four architectural families: equivariant GNNs (RAAD (Wu et al., [2025b](https://arxiv.org/html/2605.21600#bib.bib199)), MEAN (Kong et al., [2023a](https://arxiv.org/html/2605.21600#bib.bib256)), dyMEAN (Kong et al., [2023b](https://arxiv.org/html/2605.21600#bib.bib251))), diffusion and flow models (DiffAb (Luo et al., [2022](https://arxiv.org/html/2605.21600#bib.bib252)), AbFlowNet (Abir et al., [2025](https://arxiv.org/html/2605.21600#bib.bib259)), AbMEGD (Chen et al., [2025](https://arxiv.org/html/2605.21600#bib.bib261)), RADAb (Wang et al., [2024](https://arxiv.org/html/2605.21600#bib.bib123)), dyAb (Tan et al., [2025](https://arxiv.org/html/2605.21600#bib.bib254))), ODE (AbODE (Verma et al., [2023](https://arxiv.org/html/2605.21600#bib.bib129))), and autoregressive (RefineGNN (Jin et al., [2022b](https://arxiv.org/html/2605.21600#bib.bib131)), AbDockGen (Jin et al., [2022a](https://arxiv.org/html/2605.21600#bib.bib227))). All models are retrained on Chimera-Bench with their original hyperparameters.

##### Implementation details\.

ConTact has 9.68M trainable parameters. The feature encoder projects 108D input features to 32D embeddings. The VN-EGNN encoder uses 4 layers with 256D hidden features and 3 virtual nodes. The cross-attention module uses 2 heads with bandwidth $\sigma = 4$ Å. The complementarity fingerprint is 32-dimensional. The contact predictor MLP has two hidden layers of 128 units. The contact weight is $\alpha = 4.47$ and the focal loss focusing parameter is $\gamma = 2$. Loss weights are $\lambda_{\text{coord}} = 0.598$, $\lambda_{\text{contact}} = 1.763$, $\lambda_{\text{fp}} = 0.020$, $\lambda_{\text{pair}} = 0.103$, $\lambda_{\text{dock}} = 0.233$, and $\lambda_{\text{aux}} = 0.200$, determined by a Weights & Biases sweep. The notably large value of $\lambda_{\text{contact}}$ reflects the importance of accurate contact prediction for downstream sequence quality. We train with Adam (learning rate $6.31\times 10^{-4}$, exponential decay $\gamma_{\text{lr}} = 0.944$ per epoch), gradient clipping at 0.5, batch size 8, dropout 0.1, and early stopping with patience 10 on validation loss. Training takes approximately 1.6 hours on a single NVIDIA H100 80GB GPU.
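
The reported loss weights and learning-rate schedule can be collected into a small configuration sketch. The numeric values come from the paragraph above; the names `LOSS_WEIGHTS`, `lr_at_epoch`, and `total_loss`, and the convention that decay is applied once per epoch starting from epoch 0, are assumptions for illustration.

```python
# Values as reported in the implementation details; names are hypothetical.
LOSS_WEIGHTS = {
    "coord": 0.598, "contact": 1.763, "fp": 0.020,
    "pair": 0.103, "dock": 0.233, "aux": 0.200,
}
BASE_LR = 6.31e-4
LR_DECAY = 0.944          # multiplicative decay per epoch

def lr_at_epoch(epoch):
    """Learning rate after `epoch` decay steps (zero-based epochs assumed)."""
    return BASE_LR * LR_DECAY ** epoch

def total_loss(terms):
    """Eq. 22: the sequence loss has unit weight, the other six terms are scaled."""
    return terms["seq"] + sum(w * terms[name] for name, w in LOSS_WEIGHTS.items())

unit_terms = {name: 1.0 for name in ["seq", *LOSS_WEIGHTS]}
combined = total_loss(unit_terms)
lr_epoch_10 = lr_at_epoch(10)
```

With every term set to 1.0, `combined` shows the relative scale of the objective: the contact term alone carries more weight than the sequence loss.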

### 5.2 Main Results

Table 1: CDR-H3 design on Chimera-Bench. Best in bold, second-best underlined.

Figure 2: Comparison of ConTact against representative baselines (RAAD, MEAN, DiffAb, RefineGNN) on four key metrics: AAR (↑), RMSD (↓), fnat (↑), and epiF1 (↑). ConTact achieves the best AAR, lowest RMSD, highest fnat, and highest epiF1. For RMSD (↓), shorter bars are better.

### 5.3 Results and Discussion

[Table 1](https://arxiv.org/html/2605.21600#S5.T1) presents the full comparison. ConTact achieves the best performance across structure, binding, and epitope metrics simultaneously. [Figure 2](https://arxiv.org/html/2605.21600#S5.F2) highlights four key metrics against representative baselines.

##### Structural quality\.

ConTact achieves the lowest RMSD (1.63 Å) among all methods, improving over RAAD by 7% (1.63 vs 1.75 Å) and over MEAN by 11% (1.63 vs 1.84 Å). The contact-guided injection produces CDR backbones that are both geometrically accurate and properly positioned relative to the epitope. The docking loss contributes to this, but the contact prediction stage is the key differentiator: by identifying which positions will interact with the antigen, the model generates backbone conformations that better accommodate the binding geometry.
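
The quoted percentages are consistent with a relative reduction measured against the baseline value. The short check below assumes that convention; the helper name `relative_reduction` is hypothetical, and the inputs are the RMSD values quoted above.

```python
def relative_reduction(ours, baseline):
    """Relative improvement for a lower-is-better metric, measured against the baseline."""
    return (baseline - ours) / baseline

rmsd_vs_raad = relative_reduction(1.63, 1.75)   # ConTact vs RAAD
rmsd_vs_mean = relative_reduction(1.63, 1.84)   # ConTact vs MEAN
print(f"{100 * rmsd_vs_raad:.1f}% vs RAAD, {100 * rmsd_vs_mean:.1f}% vs MEAN")
```

Rounded to whole percentages, these reproduce the 7% and 11% figures stated in the text.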

##### Binding quality\.

ConTact achieves the highest fnat (0.67), lowest iRMSD (1.35 Å), and highest DockQ (0.73). RefineGNN achieves strong binding metrics (fnat 0.65, DockQ 0.73) despite receiving no antigen input, confirming that backbone geometry alone carries substantial information about interface contacts (Li et al., [2025](https://arxiv.org/html/2605.21600#bib.bib335)). ConTact matches or surpasses RefineGNN on all binding metrics while additionally conditioning on the antigen, as reflected in its substantially higher epitope F1.

##### Epitope awareness\.

ConTact achieves the best epitope F1 at 0.79, surpassing RefineGNN (0.76) by 4% and RAAD/MEAN (both 0.72) by 10%. This is the most direct validation of the contact-first approach. The explicit contact prediction stage forces the model to identify which antigen residues are binding-relevant, and the contact-gated injection transfers this awareness to the sequence head. Methods without explicit contact reasoning (RAAD, MEAN, dyMEAN) achieve substantially lower epitope F1, confirming that uniform message passing fails to capture epitope specificity.

##### Sequence recovery\.

ConTact achieves an AAR of 0.38, the highest among all compared methods. The three GNN baselines (RAAD, MEAN, dyMEAN) reach 0.37. Contact AAR (CAAR 0.20) remains comparable to most baselines (RAAD 0.21, MEAN 0.24), indicating that predicting the exact amino acid at contact positions remains fundamentally challenging across all current methods. The contact-weighted cross-entropy concentrates gradient signal on contact positions, but the underlying difficulty of predicting antigen-specific amino acid identity persists. This gap (AAR 0.38 vs CAAR 0.20) suggests the bottleneck is not learning capacity allocation but the information content of Cα-level antigen representations, which may not fully capture the chemical constraints imposed by antigen binding pockets.

##### The contact\-first hypothesis\.

The results collectively validate the contact-first decomposition. ConTact dominates on structural and binding metrics because the explicit contact prediction stage provides a supervised bridge between the antigen representation and the sequence head. Without this bridge, methods route antigen information through uniform message passing (MEAN, RAAD), fragment-type embeddings (DiffAb, AbFlowNet), or graph topology (dyMEAN), all of which provide coordinate-level guidance but fail to translate into position-specific sequence preferences. The RefineGNN comparison is particularly instructive: it achieves strong binding metrics from backbone geometry alone, demonstrating the baseline signal available without any antigen conditioning. ConTact leverages this geometric signal through the VN-EGNN encoder while adding antigen conditioning through the three-stage decoder, achieving both the structural quality of geometry-only methods and the epitope awareness that requires explicit antigen reasoning.

## 6 Conclusion

In this paper, we propose ConTact, which decomposes antibody CDR design into explicit contact prediction followed by contact-guided sequence generation. The three-stage cascade (complementarity fingerprinting, contact prediction, contact-gated injection) provides a supervised pathway for antigen information to reach the sequence head at binding-relevant positions. Experiments on Chimera-Bench demonstrate that this contact-first decomposition achieves the best structural quality, epitope awareness, and composite binding scores among eleven baselines.

Contact AAR remains comparable to baselines, indicating that predicting antigen\-specific amino acid identity at binding positions is a fundamental bottleneck not resolved by contact\-aware conditioning alone\. Future work should explore richer antigen representations \(side\-chain geometry, surface electrostatics\) and multi\-modal sequence heads that capture the combinatorial nature of contact residue selection\.

## Impact Statement

This paper presents work whose goal is to advance computational antibody design\. Designed sequences require extensive experimental validation before any therapeutic application\. We see no specific negative societal consequences that must be highlighted\.

## References

- A. R. Abir, H. S. Shahgir, M. R. Z. Ratul, M. T. Tahmid, G. V. Steeg, and Y. Dong (2025) AbFlowNet: optimizing antibody-antigen binding energy via diffusion-gflownet fusion. arXiv preprint arXiv:2505.12358.
- M. Ahmed, N. Taj, I. U. Khan, H. Venkateswara, and M. Patterson (2026) CHIMERA-bench: a benchmark dataset for epitope-specific antibody design. In ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design. External Links: [Link](https://openreview.net/forum?id=PyZvVIJbSy).
- U. Alon and E. Yahav (2021) On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations. External Links: [Link](https://openreview.net/forum?id=i80OPhOCVH2).
- S. Basu and B. Wallner (2016) DockQ: a quality measure for protein-protein docking models. PloS one 11(8), pp. e0161879.
- J. Chen, X. Cai, J. Wu, and W. Hu (2025) Antibody design and optimization with multi-scale equivariant graph diffusion models for accurate complex antigen binding. arXiv preprint arXiv:2506.20957.
- L. Chinery, A. M. Hummer, B. B. Mehta, R. Akbar, P. Rawat, A. Slabodkin, K. Le Quy, F. Lund-Johansen, V. Greiff, J. R. Jeliazkov, and C. M. Deane (2024) Simple computational methods can outperform deep learning in designing diverse, binder-enriched antibody libraries. bioRxiv. External Links: [Document](https://dx.doi.org/10.1101/2024.03.26.586756).
- C. Chothia and A. M. Lesk (1987) Canonical structures for the hypervariable regions of immunoglobulins. Journal of molecular biology 196(4), pp. 901–917.
- P. Gainza, F. Sverrisson, F. Monti, E. Rodola, D. Boscaini, M. Bronstein, and B. Correia (2020) Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods 17(2), pp. 184–192.
- P. Gainza, S. Wehrle, A. Van Hall-Beauvais, A. Marchand, A. Scheck, Z. Harteveld, S. Buckley, D. Ni, S. Tan, F. Sverrisson, et al. (2023) De novo design of protein interactions with learned surface fingerprints. Nature 617(7959), pp. 176–184.
- W. Jin, R. Barzilay, and T. Jaakkola (2022a) Antibody-antigen docking and design via hierarchical structure refinement. In International Conference on Machine Learning, pp. 10217–10227.
- W. Jin, J. Wohlwend, R. Barzilay, and T. Jaakkola (2022b) Iterative refinement graph neural network for antibody sequence-structure co-design. In International Conference on Learning Representations.
- X. Kong, W. Huang, and Y. Liu (2023a) Conditional antibody design as 3D equivariant graph translation. In International Conference on Learning Representations.
- X. Kong, W. Huang, and Y. Liu (2023b) End-to-end full-atom antibody design. In International Conference on Machine Learning, pp. 17409–17429.
- Y. Li, Y. Lang, C. Xu, Y. Zhou, Z. Pang, and P. J. Greisen (2025) Benchmarking inverse folding models for antibody CDR sequence design. PLOS ONE 20(6), pp. e0324566. External Links: [Document](https://dx.doi.org/10.1371/journal.pone.0324566).
- T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In IEEE International Conference on Computer Vision, pp. 2980–2988.
- S. Luo, Y. Su, X. Peng, S. Wang, J. Peng, and J. Ma (2022) Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Advances in Neural Information Processing Systems 35, pp. 9754–9767.
- V. G. Satorras, E. Hoogeboom, and M. Welling (2021) E (n) equivariant graph neural networks. In International conference on machine learning, pp. 9323–9332.
- F. Sestak, L. Schneckenreiter, J. Brandstetter, S. Hochreiter, A. Mayr, and G. Klambauer (2026) VN-EGNN: E(3)-equivariant graph neural networks with virtual nodes enhance protein binding site identification. Journal of Cheminformatics 18, pp. 11. External Links: [Document](https://dx.doi.org/10.1186/s13321-025-01127-9).
- C. Tan, Y. Zhang, Z. Gao, Y. Huang, H. Lin, L. Wu, F. Wu, M. Blanchette, and S. Z. Li (2025) DyAb: flow matching for flexible antibody design with alphafold-driven pre-binding antigen. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 782–790.
- T. Uçar and P. Sormanni (2025) BLOSUM is all you learn—generative antibody models reflect evolutionary priors. bioRxiv, pp. 2025–10.
- Y. Verma, M. Heinonen, and V. Garg (2023) Abode: ab initio antibody design using conjoined odes. In International Conference on Machine Learning, pp. 35037–35050.
- Z. Wang, Y. Ji, J. Tian, and S. Zheng (2024) Retrieval augmented diffusion model for structure-informed antibody design and optimization. arXiv preprint arXiv:2410.15040.
- J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borber, R. J. Ragotte, et al. (2023) De novo design of protein structure and function with RFdiffusion. Nature 620(7976), pp. 1089–1100.
- J. Wu, X. Kong, N. Sun, J. Wei, S. Shan, F. Feng, F. Wu, J. Peng, L. Zhang, Y. Liu, and J. Ma (2025a) FlowDesign: improved design of antibody cdrs through flow matching and better prior distributions. Cell Systems. External Links: [Document](https://dx.doi.org/10.1016/j.cels.2025.101270).
- L. Wu, H. Lin, Y. Huang, Z. Gao, C. Tan, Y. Liu, T. Wu, and S. Z. Li (2025b) Relation-aware equivariant graph networks for epitope-unknown antibody design and specificity optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 895–904.