Table 2.

Inverse folding comparison*

| Metric | Native/ESMFold | ProteinMPNN | ProstT5 | ProstT5(RoundTrip70) |
|---|---|---|---|---|
| lDDT ↑ | 0.78 ± 0.01 | 0.77 ± 0.01 | 0.72 ± 0.01 | 0.73 ± 0.01 |
| RMSD ↓ | 2.55 ± 0.01 | 2.61 ± 0.01 | 2.90 ± 0.01 | 2.81 ± 0.01 |
| TM-score ↑ | 0.62 ± 0.02 | 0.61 ± 0.02 | 0.58 ± 0.02 | 0.60 ± 0.02 |
| PIDE (%) | 100 ± 0 | 29.6 ± 1 | 21.9 ± 0.9 | 22.4 ± 0.9 |
| Entropy ↓ | 0.13 ± 0.01 | 0.39 ± 0.03 | 0.20 ± 0.01 | 0.19 ± 0.01 |

*Performance: structural similarity of ESMFold (8) and AlphaFold2 (32) predictions for native (Native/ESMFold) and generated sequences in our test set. Sequences were generated using ProteinMPNN, ProstT5, and a filtered variant, ProstT5(RoundTrip70), which uses the model's intrinsic back-translation (3Di→AA→3Di) to filter generated AA sequences by the sequence similarity between the native 3Di sequence and the 3Di sequence predicted back from each generated AA sequence. AA sequences were generated until convergence, defined as a percentage pairwise sequence identity (PIDE) of ≥70% between these 3Di sequences, or for at most ten attempts (to conserve resources). Single-sequence ESMFold predictions for the generated sequences were compared against the native ground truth predicted by AlphaFold2 using lDDT (60), RMSD, TM-score (61), PIDE, and entropy (the KL divergence between the AA distribution in UniProt and that of the generated sequences). The ± values are 95% confidence intervals estimated from 1000 bootstrap samples. Arrows next to metrics indicate whether higher (↑) or lower (↓) values are better; PIDE has no arrow because, for inverse folding, it is not clear that higher values are necessarily better.
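The round-trip filter described in the caption can be pictured as a generate-and-check loop. This is a minimal sketch under stated assumptions, not ProstT5's actual code: `translate_3di_to_aa` and `translate_aa_to_3di` are hypothetical callables standing in for the model's two translation directions; PIDE is computed position by position, assuming the native and back-predicted 3Di strings have equal length; and if no candidate reaches the 70% threshold within ten attempts, the sketch keeps the best-scoring one (the caption does not say which candidate is kept).

```python
from __future__ import annotations

from typing import Callable


def pide(a: str, b: str) -> float:
    """Percentage identity of two equal-length strings, compared position by position.

    Assumption: native and back-predicted 3Di strings are aligned residue-to-residue,
    so no alignment step is performed.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def round_trip_generate(
    native_3di: str,
    translate_3di_to_aa: Callable[[str], str],  # hypothetical: samples one AA sequence
    translate_aa_to_3di: Callable[[str], str],  # hypothetical: back-translates AA to 3Di
    threshold: float = 70.0,
    max_attempts: int = 10,
) -> tuple[str, float, int]:
    """Return (aa_sequence, pide_score, attempts_used)."""
    best_seq, best_score = "", -1.0
    for attempt in range(1, max_attempts + 1):
        aa = translate_3di_to_aa(native_3di)               # 3Di -> AA
        score = pide(native_3di, translate_aa_to_3di(aa))  # AA -> 3Di, compare to native
        if score > best_score:
            best_seq, best_score = aa, score
        if score >= threshold:                              # converged
            return aa, score, attempt
    # Assumption: keep the best candidate seen when no attempt converges.
    return best_seq, best_score, max_attempts
```

Because both translation steps are passed in as plain callables, the loop can be exercised with deterministic stubs before wiring in a real model.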
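The caption reports 95% confidence intervals from 1000 bootstrap samples but does not name the interval method. A minimal sketch, assuming a percentile bootstrap of the mean over per-protein scores (the function name `bootstrap_ci` and its parameters are illustrative, not from the source):

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Resamples `values` with replacement `n_boot` times, takes the mean of each
    resample, and returns (mean, lower, upper) read from the sorted resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    k = round(n_boot * alpha / 2)       # 25 for the defaults
    lower = boot_means[k]
    upper = boot_means[n_boot - 1 - k]  # index 974 for the defaults
    return statistics.fmean(values), lower, upper
```

Applied to, say, a list of per-protein lDDT scores, a symmetric "±" entry could then be reported as `(upper - lower) / 2`; whether the table's ± values were derived this way is an assumption.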
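The "entropy" metric is described as a KL divergence between amino-acid distributions. A minimal sketch, assuming D_KL(generated ‖ UniProt) in natural-log units over the 20 standard amino acids; the caption fixes neither the direction nor the log base, and the UniProt background frequencies must be supplied by the caller (they are not reproduced here):

```python
import math
from collections import Counter

# The 20 standard amino acids; other letters (X, B, Z, U, O) are ignored here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequences):
    """Relative frequency of each standard amino acid across all `sequences`."""
    counts = Counter(c for seq in sequences for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in nats; letters with p[a] == 0 contribute nothing.

    Assumes q[a] > 0 for every amino acid, as for a background distribution.
    """
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS if p[a] > 0)
```

In this reading, `p` would be `aa_composition(generated_sequences)` and `q` the UniProt background composition; a value near zero means the generated sequences use amino acids in roughly natural proportions.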
