Protein Structure Prediction

Motivation:

Millions of protein sequences are known, but only about 170,000 protein crystal structures are available, because crystallization is a complex process. A protein's structure is an important determinant of its function. Knowing the structure gives insight into that function and enables design, for example through mutagenesis or by identifying binding pockets. Because AlphaFold predicts structure directly from sequence, it is a powerful tool in the protein design pipeline.

Models:

AlphaFold / ColabFold:

https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb

OmegaFold:

https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb

ESMFold:

https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/ESMFold.ipynb

Protein Sequence Recovery

Motivation:

A typical protein design pipeline involves finding sequences that fold into a given input structure with the lowest possible energy. Tools like ProteinMPNN generate a library of sequences that are predicted to fold into the protein structure of interest. These sequences can serve as starting points in your design pipeline.
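
Sequence recovery is commonly reported as the fraction of positions at which a designed sequence matches the native sequence of the input structure. The sketch below shows one way to compute it and to rank a small library of designs. It assumes the designed and native sequences have the same length and are already aligned position by position. The function names and the example sequences are hypothetical and are not taken from ProteinMPNN or any other tool.

```python
from typing import List, Tuple


def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where `designed` matches `native`.

    Assumes both sequences are the same length and already aligned
    (no gaps), as is the case for fixed-backbone design.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def rank_by_recovery(native: str, designs: List[str]) -> List[Tuple[float, str]]:
    """Return (recovery, sequence) pairs sorted from highest to lowest recovery."""
    scored = [(sequence_recovery(native, d), d) for d in designs]
    return sorted(scored, reverse=True)


# Hypothetical sequences, for illustration only.
native = "MKTAYIAKQR"
designs = ["MKSAYLAKQR", "AKTGYIVKQR", "MKTAYIAKQR"]

for recovery, seq in rank_by_recovery(native, designs):
    print(f"{seq}  recovery={recovery:.2f}")
```

High recovery alone does not show that a design will fold, so a ranking like this is usually one filter among several.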

Models:

ProteinMPNN:

UPDATED 02/28 7:30PM