Models for antibody sequence design. Because these models do not share a single benchmark task, they cannot be compared on one specific metric; Figure 6 therefore compares them by number of parameters, embedding dimension, and number of layers. The table below compares their training dataset sizes, strengths, limitations, and applications. The 'Training Dataset' column lists the source database and the number of sequences used for training.
| Name | Class | Model | Training Dataset | Description | Ref |
|---|---|---|---|---|---|
| AbLang | Antibody | RoBERTa | OAS 14M | Strengths: Multiple representations (e.g. residue codings, sequence codings). Limitations: Lower performance when restoring longer regions and N-terminal residues. Applications: Restoring missing residues. | [88] |
| AntiBERTa | Antibody | RoBERTa | OAS 58M | Strengths: Captures critical aspects of BCRs, such as mutation count, V gene origin, and B cell type, for specialized understanding. Limitations: Limited ability to predict paratopes in an antigen-specific manner due to data constraints. Applications: Paratope prediction and BCR repertoire analysis tasks. | [86] |
| AntiBERTy | Antibody | BERT | OAS 558M | Strengths: Reveals repertoire trajectories, efficiently detects redundant sequences, and highlights critical binding residues. Limitations: Cannot differentiate between species. Applications: Affinity maturation; identifying key binding residues. | [85] |
| ESM-1b, ESM-2 | Protein | Transformer | UniRef50 250M | Strengths: Learning from large-scale data yields strong generalization, adaptability, and improved performance and scalability. Limitations: Working with such large models is challenging without sufficient computational resources. Applications: ESM-1b: prediction of mutational effects and secondary structure; ESM-2: filling in missing amino acids and structure prediction. | [82, 84] |
| IgLM | Antibody | Transformer | OAS 558M | Strengths: Produces infilled residue spans at designated positions within the antibody sequence. Limitations: Less effective at generating light chain sequences for most species. Applications: Designing sequences with improved developability and reduced immunogenic risk for various species or chain types. | [92] |
| nanoBERT | Nanobody | AntiBERTa | INDI 10M | Strengths: Better performance on nanobodies than human-antibody-based methods. Limitations: Poor performance in distinguishing human sequences from non-human ones. Applications: Predicting the amino acid at a given position in a sequence. | [87] |
| Progen2-OAS | Antibody | Transformer | OAS 554M | Strengths: Model size comparable with general protein language models. Limitations: Performs poorly compared with models pre-trained on protein databases. Applications: Antibody sequence generation. | [89] |
| ProtBERT | Protein | BERT | UniRef100 216M | Strengths: Embeddings capture constraints relevant to protein structure and function. Limitations: Antibody-specific BERT-based models such as AntiBERTa appear to perform better on antibodies. Applications: Multiple protein tasks with per-residue or per-sequence predictions. | [80] |
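Several of the applications above (AbLang's residue restoration, nanoBERT's position prediction, ESM-2's amino acid infilling) reduce to the same masked-language-modeling operation: mask a position, let the model score the amino acid vocabulary there. A minimal sketch of this pattern with ESM-2 via the HuggingFace `transformers` library is shown below; the checkpoint name and the toy sequence are illustrative choices, not drawn from the benchmarks in the cited papers.

```python
# Sketch: restoring a masked residue with ESM-2 (masked language modeling).
# Assumes `transformers` and `torch` are installed; the sequence is a toy example.
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"  # smallest public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EsmForMaskedLM.from_pretrained(model_name)
model.eval()

# Mask one residue and ask the model to predict it from context.
sequence = "MKTAYIAKQR<mask>DLGL"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and report the highest-scoring amino acid.
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

Antibody-specific models such as AbLang expose the same idea through their own packages (e.g. a "restore" mode over sequences with missing residues), but the scoring step is conceptually identical to the masked prediction shown here.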