Table 3

Models for antibody sequence design. These models do not share a single common task, so they cannot be compared head-to-head on one benchmark; Figure 6 instead compares them by number of parameters, embedding dimension, and number of layers. The table below compares training-set size, strengths, limitations, and applications; the 'Training Dataset' column gives the number of sequences used for training. Minimal usage sketches for three representative applications follow the table.

| Name | Class | Model | Training dataset (sequences) | Strengths | Limitations | Applications | Ref |
|---|---|---|---|---|---|---|---|
| AbLang | Antibody | RoBERTa | OAS (14M) | Multiple representations (e.g. residue codings, sequence codings). | Lower performance when restoring longer regions and N-terminal residues. | Restoring missing residues. | [88] |
| AntiBERTa | Antibody | RoBERTa | OAS (58M) | Captures key aspects of BCRs, such as mutation count, V-gene origin, and B-cell type. | Limited ability to predict paratopes in an antigen-specific manner due to data constraints. | Paratope prediction; BCR repertoire analysis. | [86] |
| AntiBERTy | Antibody | BERT | OAS (558M) | Reveals repertoire trajectories, efficiently detects redundant sequences, and highlights critical binding residues. | Cannot differentiate between species. | Affinity maturation; identifying key binding residues. | [85] |
| ESM-1b / ESM-2 | Protein | Transformer | UniRef50 (250M) | Learning from large-scale data yields strong generalization; ESM-2 improves performance and scalability over ESM-1b. | Large models are challenging to work with without sufficient compute resources. | ESM-1b: prediction of mutational effects and secondary structure; ESM-2: filling in missing amino acids and structure prediction. | [82, 84] |
| IgLM | Antibody | Transformer | OAS (558M) | Infills residue spans at designated positions within the antibody sequence. | Less effective at generating light-chain sequences for most species. | Designing sequences with improved developability and reduced immunogenic risk across species and chain types. | [92] |
| nanoBERT | Nanobody | AntiBERTa | INDI (10M) | Better performance on nanobodies than human-antibody-based methods. | Poor at distinguishing human sequences from non-human ones. | Predicting the amino acid at a given position in a sequence. | [87] |
| Progen2-OAS | Antibody | Transformer | OAS (554M) | Model size comparable with general protein language models. | Performs poorly compared with models pre-trained on protein databases. | Antibody sequence generation. | [89] |
| ProtBERT | Protein | BERT | UniRef100 (216M) | Embeddings capture constraints relevant to protein structure and function. | Antibody-specific models such as AntiBERTa tend to perform better on antibody tasks. | Multiple protein tasks with per-residue or per-sequence predictions. | [80] |
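To make the 'filling in missing amino acids' application listed for ESM-2 concrete, here is a minimal sketch of masked-residue prediction using the `transformers` library and the public `facebook/esm2_t6_8M_UR50D` checkpoint (the smallest ESM-2 variant on the Hugging Face Hub); the example sequence is an arbitrary heavy-chain fragment chosen for illustration.

```python
# Minimal sketch: filling a masked residue with a small ESM-2 checkpoint.
# Assumes the `transformers` library and the public
# facebook/esm2_t6_8M_UR50D model on the Hugging Face Hub.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="facebook/esm2_t6_8M_UR50D")

# A heavy-chain fragment with one unknown residue marked by the
# tokenizer's mask token.
sequence = "EVQLVESGGGLVQPG<mask>SLRLSCAASGFTFS"

# Each prediction is a candidate amino acid with its probability.
for prediction in unmasker(sequence, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))
```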
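AbLang's listed application, restoring missing residues, can be sketched with the authors' `ablang` pip package; the `pretrained`/`freeze`/`restore` calls follow the usage shown in the project repository, so treat the exact names as assumptions to check against the installed version.

```python
# Minimal sketch: restoring missing residues with AbLang.
# Assumes the `ablang` pip package; the interface follows the
# project README rather than a guaranteed stable API.
import ablang

heavy_ablang = ablang.pretrained("heavy")  # use "light" for light chains
heavy_ablang.freeze()

# '*' marks the missing residues to be restored.
seqs = [
    "EV*LVESGPGLVQPGKSLRLSCVASGFTFSGYGMHWVRQAPGKGLEWIALIIYDESNK"
    "YYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTAVFYCAKVKFYDPTAPNDYWGQGTLVTVSS",
]

restored = heavy_ablang(seqs, mode="restore")
print(restored)
```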
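Finally, IgLM's span infilling at designated positions can be sketched with the authors' `iglm` package; the conditioning tokens (`[HEAVY]`, `[HUMAN]`) and the `infill` signature follow the project README, and the parent sequence and infill range here are illustrative assumptions.

```python
# Minimal sketch: regenerating a span of an antibody sequence with IgLM.
# Assumes the authors' `iglm` package; argument names and conditioning
# tokens follow the project README and should be checked against the
# installed version.
from iglm import IgLM

iglm = IgLM()

parent_sequence = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYY"
    "ADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRLSITIRPRYYGLDVWGQGTTVTVSS"
)

# Conditioning tags select chain type and species; infill_range
# (0-indexed) delimits the span to regenerate.
generated = iglm.infill(
    parent_sequence,
    chain_token="[HEAVY]",
    species_token="[HUMAN]",
    infill_range=(98, 106),
    num_seqs=10,
)
print(generated[:3])
```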