Models for antibody sequence design. Because these models do not share a single benchmark task, they cannot be compared on one specific metric; Figure 6 therefore compares them by number of parameters, embedding dimension, and number of layers. The table below compares their training dataset sizes, strengths, limitations, and applications. The 'Training Dataset' column lists the source database and the number of sequences used for training.
| Name | Class | Model | Training Dataset | Description | Ref |
|---|---|---|---|---|---|
| AbLang | Antibody | RoBERTa | OAS 14M | Strengths: Multiple representations (e.g. residue codings, sequence codings). Limitations: Lower performance when restoring longer regions and N-terminal residues. Applications: Restoring missing residues. | [88] |
| AntiBERTa | Antibody | RoBERTa | OAS 58M | Strengths: Captures critical aspects of BCRs, such as mutation count, V gene origin, and B cell type, for specialized understanding. Limitations: Limited ability to predict paratopes in an antigen-specific manner due to data constraints. Applications: Paratope prediction and BCR repertoire analysis tasks. | [86] |
| AntiBERTy | Antibody | BERT | OAS 558M | Strengths: Reveals repertoire trajectories, efficiently detects redundant sequences, and highlights critical binding residues. Limitations: Cannot differentiate between species. Applications: Affinity maturation; identifying key binding residues. | [85] |
| ESM-1b, ESM-2 | Protein | Transformer | UniRef50 250M | Strengths: Learning from large-scale data yields strong generalization, adaptability, and improved performance and scalability. Limitations: Working with such large models is challenging without sufficient computational resources. Applications: ESM-1b: prediction of mutational effects and secondary structure; ESM-2: filling in missing amino acids and structure prediction. | [82, 84] |
| IgLM | Antibody | Transformer | OAS 558M | Strengths: Produces infilled residue spans at designated positions within the antibody sequence. Limitations: Less effective at generating light chain sequences for most species. Applications: Designing sequences with improved developability and reduced immunogenic risk for various species or chain types. | [92] |
| nanoBERT | Nanobody | AntiBERTa | INDI 10M | Strengths: Better performance on nanobodies than human-antibody-based methods. Limitations: Poor performance in distinguishing human sequences from non-human ones. Applications: Predicting the amino acid at a given position in a sequence. | [87] |
| Progen2-OAS | Antibody | Transformer | OAS 554M | Strengths: Model size comparable with general protein language models. Limitations: Performs poorly compared with models pre-trained on protein databases. Applications: Antibody sequence generation. | [89] |
| ProtBERT | Protein | BERT | UniRef100 216M | Strengths: Embeddings capture constraints relevant to protein structure and function. Limitations: Antibody-specific BERT-based models such as AntiBERTa appear to perform better on antibodies. Applications: Multiple protein tasks with per-residue or per-sequence predictions. | [80] |
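Several of the applications above (AbLang's residue restoration, nanoBERT's position prediction, ESM-2's amino acid infilling) reduce to the same masked-language-modeling operation: mask a position, let the model score the amino acid vocabulary there. A minimal sketch of this pattern with ESM-2 via the HuggingFace `transformers` library is shown below; the checkpoint name and the toy sequence are illustrative choices, not drawn from the benchmarks in the cited papers.

```python
# Sketch: restoring a masked residue with ESM-2 (masked language modeling).
# Assumes `transformers` and `torch` are installed; the sequence is a toy example.
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"  # smallest public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EsmForMaskedLM.from_pretrained(model_name)
model.eval()

# Mask one residue and ask the model to predict it from context.
sequence = "MKTAYIAKQR<mask>DLGL"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and report the highest-scoring amino acid.
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

Antibody-specific models such as AbLang expose the same idea through their own packages (e.g. a "restore" mode over sequences with missing residues), but the scoring step is conceptually identical to the masked prediction shown here.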