Homework

In-person students: please answer at least 2-3 questions in Part A and 1 question in Part C. Questions marked with * are for extra credit.

Part A: Using the PDB

<aside> ⚠️ Mandatory for MIT/Harvard Students and Committed Listeners. Due at the start of class on March 5.

</aside>

Part B: Protein Analysis & Visualization

<aside> ⚠️ Mandatory for MIT/Harvard Students and Committed Listeners. Due at the start of class on March 5.

</aside>

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.

  1. Pick any protein (from any organism) that interests you and has a 3D structure, and answer the following questions.


Part C: Using ML-based protein tools

    <aside> ⚠️ Mandatory for MIT/Harvard Students. Due at the start of class on March 5.

    </aside>

    1. Using a protein structure prediction model of your choice

    a) Pick a protein in the PDB and fold its sequence using any protein structure prediction model. Does the predicted protein fold into the same shape as the PDB structure?

    b) Do you notice any differences between the predicted structure and the PDB structure? UPDATED 03/03 - You can see the differences between the predicted structure and the actual structure using the tutorial here.

    c) Are there any low-confidence regions in your protein (by confidence we mean the pLDDT score)? If so, why is the structure prediction model's confidence low in those regions? UPDATED 03/03 - Check out the tutorial if you need help viewing this in PyMOL. If you can't see both the predicted structure and another file that you add, click reset in the top-right box in PyMOL.

    d) If there are low-confidence regions, do you think they would affect your ability to engineer the protein for a specific function? What can you do in your design pipeline to account for them? [Extra Credit]
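For parts b) and c), the following is a minimal sketch (not part of the course tutorial) of how the same checks could be done outside PyMOL. It assumes the predictor wrote per-residue pLDDT into the B-factor column of its PDB output on a 0-100 scale, as AlphaFold does; some tools use a 0-1 scale, so check your file first. The function names and the cutoff of 70 are illustrative choices, and the comparison uses an alignment-free distance-matrix RMSD over CA atoms, which only makes sense when both structures contain the same residues in the same order.

```python
import math


def parse_ca_atoms(pdb_text):
    """Return (residue_number, (x, y, z), b_factor) for each CA atom.

    Uses the fixed-column layout of PDB ATOM records. Assumes a single chain.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            b_factor = float(line[60:66])  # pLDDT in predicted models (assumption)
            atoms.append((resnum, xyz, b_factor))
    return atoms


def low_confidence_regions(atoms, cutoff=70.0):
    """Group consecutive residues with pLDDT below `cutoff` into (start, end) runs."""
    regions = []
    start = prev = None
    for resnum, _xyz, plddt in atoms:
        low = plddt < cutoff
        if low and start is not None and resnum == prev + 1:
            prev = resnum  # extend the current run
        else:
            if start is not None:
                regions.append((start, prev))  # close the previous run
            start = prev = (resnum if low else None)
    if start is not None:
        regions.append((start, prev))
    return regions


def distance_rmsd(coords_a, coords_b):
    """Alignment-free RMSD between the CA-CA distance matrices of two structures."""
    n = len(coords_a)
    if n != len(coords_b) or n < 2:
        raise ValueError("need two structures with the same number (>= 2) of CA atoms")
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            total += (d_a - d_b) ** 2
            pairs += 1
    return math.sqrt(total / pairs)
```

Experimental PDB entries often have missing residues, so you may need to trim both structures to their shared residue numbers before calling `distance_rmsd`.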

    2. Using a sequence recovery model (MPNN)

    a) Generate sequence proposals for PDB ID 1BCF, chain A.

    b) Fold 1-2 of the proposed sequences using the protein structure prediction model from Question 1.

    c) Is there a way to enable the newly designed sequences to preserve their binding to di-iron? Refer to: https://www.bakerlab.org/wp-content/uploads/2022/11/Diffusion_preprint_12012022.pdf?ref=assemblyai.com UPDATED 03/03 - To answer this question, all you have to do is keep some parts of the 1BCF protein sequence constant. You can find which regions in the document linked above; just search for 1BCF.

    3. Using a generative model

    a) You are a scientist trying to design a new drug binder for COVID-19. Generate a protein backbone that can bind to the SARS-CoV-2 spike protein. Use PDB ID 6M0J, or any other target, to identify a binding pocket.

    b) Generate sequences for your newly sampled backbone and fold 1 or 2 of them. Visualize them using your favorite protein visualization tool from Part B.

    c) How can you rank and select the new protein sequences to test in the lab?

    d) How can you experimentally verify whether your newly designed binder binds to the target? [e.g., yeast surface display, degradation assays, etc.]

    e) If you design a binder that strongly binds to SARS-CoV-2, what's the next step in your design pipeline? What are some possible issues in its application as a drug in humans? [Extra Credit]

    f) Here, using RFdiffusion, we designed a mini-protein binder. However, many therapeutic protein binders are antibodies. What are some advantages of antibody binders? [Extra Credit]
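As a starting point for part c), here is one possible way to shortlist designs in code. It is a sketch under stated assumptions, not a recommended protocol: the `Design` record, its two metrics (mean pLDDT of the refolded sequence and its deviation from the generated backbone), and the default thresholds are all hypothetical and should be replaced with whatever your own pipeline reports.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical record for one designed sequence after refolding."""
    name: str
    mean_plddt: float        # 0-100, reported by the structure predictor
    rmsd_to_backbone: float  # in angstroms, refolded model vs. generated backbone


def rank_designs(designs, min_plddt=80.0, max_rmsd=2.0):
    """Drop designs that fail either threshold, then sort the rest.

    Sort order: highest mean pLDDT first, ties broken by lowest RMSD.
    The thresholds are illustrative defaults (assumptions), not published cutoffs.
    """
    kept = [
        d for d in designs
        if d.mean_plddt >= min_plddt and d.rmsd_to_backbone <= max_rmsd
    ]
    return sorted(kept, key=lambda d: (-d.mean_plddt, d.rmsd_to_backbone))
```

Computational filters like these only narrow the list; whether a design actually binds is settled by the experiments asked about in part d).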
    
    4. Engineering thermo-stability of enzymes

    a) Pick an enzyme you are interested in [e.g., PETase].

    b) Summarize the function of this protein.

    c) Can you engineer a version of your protein that functions at high temperatures?

    d) How can you use machine learning tools to design this protein? [Extra Credit]

    e) How would you test the thermo-stability of your newly designed enzyme? [Extra Credit]
