In-person students: please answer at least 2-3 questions in Part A and 1 question in Part C. Questions marked with * are for extra credit.
Part A: Using the PDB
<aside> ⚠️ Mandatory for MIT/Harvard Students and Committed Listeners. Due at the start of class on March 5.
</aside>
Answer any of the following questions by Shuguang Zhang:
Where did amino acids come from before the enzymes that make them existed, and before life started?
If you make an alpha-helix using D-amino acids, what handedness (right or left) would you expect?
Can you discover additional helices in proteins?
Why are most molecular helices right-handed?
Why do beta-sheets tend to aggregate?
Why do many amyloid diseases form beta-sheets?
Design a beta-sheet motif that forms a well-ordered structure.
Part B: Protein Analysis & Visualization
<aside> ⚠️ Mandatory for MIT/Harvard Students and Committed Listeners. Due at the start of class on March 5.
</aside>
In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.
Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions.
Briefly describe the protein you selected and why you selected it.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid? You can use this notebook to count the most frequent amino acid: https://colab.research.google.com/drive/1vlAU_Y84lb04e4Nnaf1axU8nQA6_QBP1?usp=sharing
How many protein sequence homologs are there for your protein?
Hint: Use the BLASTp (protein BLAST) tool to search for homologs and Clustal Omega to align and visualize them. Tutorial here.
Does your protein belong to any protein family?
Identify the structure page of your protein in RCSB
Open the structure of your protein in any 3D molecule visualization software:
Examine and analyze your protein visually:
Visualize the protein as "cartoon", "ribbon" and "ball and stick".
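For the sequence length and most-frequent-amino-acid question above, a minimal local sketch is shown below as an alternative to the linked notebook. It assumes the sequence is given in one-letter code; the function name `most_frequent_residues` and the toy sequence are made up for illustration and are not taken from the course notebook.

```python
from collections import Counter


def most_frequent_residues(sequence: str, top_n: int = 3):
    """Return (length, [(residue, count), ...]) for a one-letter protein sequence."""
    # Keep letters only, so FASTA line breaks and spaces do not affect the counts.
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha())
    counts = Counter(cleaned)
    return len(cleaned), counts.most_common(top_n)


# Made-up toy sequence for illustration only; paste your own protein sequence here.
example_sequence = "MKKLLLAG\nLLK"
length, top = most_frequent_residues(example_sequence)
print(f"length = {length}")
for residue, count in top:
    print(f"{residue}: {count} ({100 * count / length:.1f}%)")
```

If you paste a FASTA record, remove the header line (the one starting with `>`) first, since this sketch would otherwise count its letters as residues.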
Part C: Using ML-Based Protein Tools
<aside> ⚠️ Mandatory for MIT/Harvard Students. Due at the start of class on March 5.
</aside>
a) Pick a protein in the PDB and fold its sequence using any protein structure prediction model. Does the protein fold into the same shape?
b) Do you notice any differences between the predicted structure and the PDB structure? UPDATED 03/03 - You can see the differences between the predicted structure and the actual structure using the tutorial here.
c) Are there any low-confidence regions in your protein (by confidence we mean the pLDDT score)? If so, why is the structure prediction model's confidence low in those regions? UPDATED 03/03 - Check out the tutorial if you need help viewing this in PyMOL. If you can't see both the predicted structure and another file that you added, click reset in the top-right box in PyMOL.
d) If there are low-confidence regions, do you think they would affect your ability to engineer the protein for a specific function? What can you do in your design pipeline to account for them? [Extra Credit]
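For question (c), the low-confidence regions can also be listed programmatically. The sketch below assumes your predictor writes per-residue pLDDT into the B-factor column of the output PDB file on a 0-100 scale (AlphaFold-style outputs do this; check your tool, since some report 0-1). The 70 cutoff is a commonly used convention, not a course requirement, and the helper names and toy lines are made up for illustration.

```python
def low_confidence_regions(pdb_lines, threshold=70.0):
    """Return contiguous (chain, start, end) residue ranges with CA pLDDT below threshold."""
    regions = []
    current = None  # [chain, start, end] of the region currently being extended
    for line in pdb_lines:
        # Use one atom per residue: the alpha carbon of standard ATOM records.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        res_seq = int(line[22:26])
        plddt = float(line[60:66])  # B-factor column, assumed to hold pLDDT
        if plddt < threshold:
            if current and current[0] == chain and res_seq == current[2] + 1:
                current[2] = res_seq  # extend the open region
            else:
                if current:
                    regions.append(tuple(current))
                current = [chain, res_seq, res_seq]
        elif current:
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions


def _toy_ca_line(serial, res_name, chain, res_seq, plddt):
    """Build a fixed-width PDB ATOM line for a CA atom (toy data, zero coordinates)."""
    return (f"ATOM  {serial:5d}  CA  {res_name:>3} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Made-up pLDDT values for seven residues, for illustration only.
toy_plddt = [92.0, 88.0, 65.0, 55.0, 60.0, 91.0, 45.0]
toy_lines = [_toy_ca_line(i, "ALA", "A", i, p) for i, p in enumerate(toy_plddt, start=1)]
print(low_confidence_regions(toy_lines))  # [('A', 3, 5), ('A', 7, 7)]
```

To run it on a real prediction, pass the open file instead of the toy lines, for example `low_confidence_regions(open("prediction.pdb"))`, where `prediction.pdb` is a placeholder for your own file name.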
a) Generate sequence proposals for PDB ID: 1BCF chain A.
b) Fold 1-2 of these sequences using a protein structure prediction model from Question 1.
c) Is there a way to enable the newly designed sequences to preserve their binding to di-iron? Refer to: https://www.bakerlab.org/wp-content/uploads/2022/11/Diffusion_preprint_12012022.pdf?ref=assemblyai.com UPDATED 03/03 - To answer this question, all you have to do is keep some parts of the protein sequence of 1BCF constant. You can find which regions in the document linked above; just search for 1BCF.
Using a Generative Model
a) You are a scientist trying to design a new drug binder for COVID-19. Generate a protein backbone that can bind to the SARS-CoV-2 spike protein. Use PDB ID: 6M0J or any other target to identify a binding pocket.
b) Generate sequences for your newly sampled backbone and fold 1 or 2 of them. Visualize them using your favorite protein visualization tool from Part B.
c) How can you rank and select the new protein sequences to test in the lab?
d) How can you experimentally verify whether your newly designed binder binds to the target? [e.g., yeast surface display, degradation assays, etc.]
e) If you design a binder that strongly binds to SARS-CoV-2, what's the next step in your design pipeline? What are some possible issues in its application as a drug in humans? [*Extra Credit*]
f) Here, using RFdiffusion, we designed a mini-protein binder. However, many therapeutic protein binders are antibodies. What are some advantages of antibody binders? [*Extra Credit*]
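As a starting point for question (c), one simple heuristic is to filter and sort designs by the mean pLDDT of their predicted folds. This is only a sketch under that assumption, not the expected answer: the `Candidate` class, the cutoff of 80, and all names, sequences, and scores below are made up for illustration, and a real pipeline would likely add interface-specific metrics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence: str
    mean_plddt: float  # mean pLDDT of the predicted fold, assumed 0-100 scale


def rank_candidates(candidates, min_plddt=80.0, top_k=2):
    """Drop designs below min_plddt, then return the top_k by mean pLDDT."""
    kept = [c for c in candidates if c.mean_plddt >= min_plddt]
    return sorted(kept, key=lambda c: c.mean_plddt, reverse=True)[:top_k]


# Made-up designs and scores for illustration only.
designs = [
    Candidate("design_1", "MKKLLLAG", 85.2),
    Candidate("design_2", "MAGKLLKA", 72.4),
    Candidate("design_3", "MLLKKAGL", 91.0),
]
selected = rank_candidates(designs)
print([c.name for c in selected])  # ['design_3', 'design_1']
```

Keeping the filter and the sort separate makes it easy to swap in a different score later without changing how the shortlist is built.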
Engineering thermo-stability of enzymes
a) Pick an enzyme you are interested in [e.g., PETase].
b) Summarize the function of this protein.
c) Can you engineer a version of your protein that functions at high temperatures?
d) How can you utilize machine learning tools for designing this protein? [Extra Credit]
e) How would you test the thermo-stability of your newly designed enzyme? [Extra Credit]