
Picture Source: Bordin, Nicola et al. (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, Volume 48, Issue 4, 345-359.
Lecture
<aside>
<img src="/icons/slide_green.svg" alt="/icons/slide_green.svg" width="40px" /> Lecture slides: Zhang - Protein Design - 2.25.25
</aside>
<aside>
<img src="/icons/video-camera_yellow.svg" alt="/icons/video-camera_yellow.svg" width="40px" /> Lecture Recording
</aside>
Recitation
<aside>
<img src="/icons/slide_green.svg" alt="/icons/slide_green.svg" width="40px" /> Recitation Slides (added Feb 28):
ProteinDesign.pdf
Phage therapy.pdf
</aside>
<aside>
<img src="/icons/video-camera_yellow.svg" alt="/icons/video-camera_yellow.svg" width="40px" /> Recitation Recording
</aside>
Homework
<aside>
⚠️ Please read through the full assignment before you get started. This assignment is mandatory for MIT/Harvard students and Committed Listeners.
</aside>
<aside>
<img src="/icons/push-pin_green.svg" alt="/icons/push-pin_green.svg" width="40px" /> Key Links:
http://docs.google.com/spreadsheets/d/1AsYRLlrRLd6I8abxNHfuz1OtFTSqYZ87_kefBMsxhMo/edit?gid=0#gid=0
https://docs.google.com/spreadsheets/d/19_u8Sd8TdseHP6yAVrDSjIKf0vDU9RNH3SEACx8L22Y/edit?gid=0#gid=0
</aside>
Objective:
- Learn basic concepts: amino acid structure, 3D protein visualization, and the variety of ML-based design tools.
- Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).
Part A. Conceptual Questions
- Answer any of the following questions by Shuguang Zhang:
- How many amino acid molecules do you consume in a 500-gram piece of meat? (On average, an amino acid is ~100 daltons.)
- Why do humans eat beef but not become cows, or eat fish but not become fish?
- Why are there only 20 natural amino acids?
- Can you make other non-natural amino acids? Design some new amino acids.
- Where did amino acids come from before the enzymes that make them existed, and before life started?
- If you make an alpha-helix using D-amino acids, what handedness (right or left) would you expect?
- Can you discover additional helices in proteins?
- Why are most molecular helices right-handed?
- Why do β-sheets tend to aggregate?
- What is the driving force for β-sheet aggregation?
- Why do many amyloid diseases involve β-sheet formation?
- Can you use amyloid β-sheets as materials?
- Design a β-sheet motif that forms a well-ordered structure.
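As a starting point for the first question, the arithmetic can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a definitive answer: it uses the ~100 dalton average from the question, and the `protein_fraction` parameter is a hypothetical knob for the share of the meat's mass that is actually protein. The value 0.2 used below is only an assumed illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def amino_acid_count(mass_g, avg_mass_da=100.0, protein_fraction=1.0):
    """Estimate the number of amino acid molecules in a sample.

    One dalton is numerically equal to one gram per mole, so
    mass (g) / average mass (Da) gives moles of amino acids.
    """
    moles = mass_g * protein_fraction / avg_mass_da
    return moles * AVOGADRO


# Upper bound: treat all 500 g as amino acids.
upper_bound = amino_acid_count(500)
print(f"{upper_bound:.2e}")  # 3.01e+24

# Assumed protein fraction of 0.2 (illustrative only; the rest is mostly water and fat).
with_fraction = amino_acid_count(500, protein_fraction=0.2)
print(f"{with_fraction:.2e}")
```

The upper bound is simply 5 moles times Avogadro's number. Any realistic answer scales it down by whatever protein fraction you decide to justify.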
Part B. Protein Analysis and Visualization
In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.
- Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions.
- Briefly describe the protein you selected and why you selected it.
- Identify the amino acid sequence of your protein.
- Identify the structure page of your protein in RCSB.
- When was the structure solved? Is it a good-quality structure? A good-quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).
- Are there any other molecules in the solved structure apart from protein?
- Does your protein belong to any structure classification family?
- Open the structure of your protein in any 3D molecule visualization software:
- PyMOL Tutorial Here (hint: ChatGPT is good at PyMOL commands)
- Visualize the protein as "cartoon", "ribbon" and "ball and stick".
- Color the protein by secondary structure. Does it have more helices or sheets?
- Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
- Visualize the surface of the protein. Does it have any "holes" (aka binding pockets)?
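The visualization steps above can be sketched as a small Python helper that writes out PyMOL commands to paste into the PyMOL command line. This is a minimal sketch, not the course's prescribed workflow. The function names are hypothetical, the PDB ID `1ubq` (ubiquitin) is an arbitrary example, and the hydrophobic residue set is one assumed classification among several in use.

```python
# Assumed hydrophobic set (three-letter codes); published schemes differ,
# for example on glycine, proline, cysteine and tyrosine.
HYDROPHOBIC = ("ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP")


def secondary_structure_commands(pdb_id):
    """PyMOL commands that load a structure and color it by secondary structure."""
    return [
        f"fetch {pdb_id}",
        "hide everything",
        "show cartoon",
        "color red, ss h",       # helices
        "color yellow, ss s",    # strands (sheets)
        "color green, ss l+''",  # loops and unassigned residues
    ]


def hydrophobicity_commands(hydrophobic=HYDROPHOBIC):
    """PyMOL commands that highlight hydrophobic residues and show the surface."""
    selection = "resn " + "+".join(hydrophobic)
    return [
        "color white, all",
        f"color orange, {selection}",
        "show surface",
    ]


# Build one script: secondary-structure view first, then the residue-type view.
script = secondary_structure_commands("1ubq") + hydrophobicity_commands()
print("\n".join(script))
```

Run the first six commands to compare helices and sheets, then the last three to see whether the orange (hydrophobic) residues are buried or exposed on the surface. Swapping `show cartoon` for `show ribbon` or `show sticks` gives the other representations.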
Part C. Using ML-Based Protein Design Tools
<aside>
<img src="/icons/snippet_lightgray.svg" alt="/icons/snippet_lightgray.svg" width="40px" /> Resources: https://colab.research.google.com/drive/1hXStRY9VCyw52n17uWdWQBj__IcR2ztK?usp=sharing#scrollTo=38gFJBazNdzJ
</aside>
- Fold your protein with AlphaFold, ESMFold, or Boltz and compare the prediction to the experimentally solved structure.
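One simple way to put a number on that comparison is sketched below. It assumes you have already extracted matching Cα coordinates (same residues, same order) from the predicted and experimental structures; that extraction step is not shown. The sketch uses distance-matrix RMSD (dRMSD), chosen here because it needs no superposition and runs in plain Python. It is a stand-in for the superposition-based RMSD or TM-score that structure tools more commonly report, and the coordinates are made-up toy values.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equally long lists of (x, y, z) points.

    Compares every pairwise distance within A to the same pair within B,
    so the result does not change if either structure is rotated or moved.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    if len(coords_a) < 2:
        raise ValueError("need at least two atoms")
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
        pairs += 1
    return math.sqrt(total / pairs)


# Toy "experimental" coordinates in angstroms (made up for illustration).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]

# Same shape, rotated 90 degrees about z and shifted: dRMSD should be ~0.
rotated = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in reference]

# A "prediction" with the last atom displaced: dRMSD should be > 0.
distorted = reference[:3] + [(0.0, 3.8, 7.6)]

print(f"rotated copy: {drmsd(reference, rotated):.3f} A")
print(f"distorted:    {drmsd(reference, distorted):.3f} A")
```

A value near zero means the two structures have the same internal geometry, and larger values mean larger deviations. For a visual check, aligning the two structures in a 3D viewer is the natural complement.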