Skip to content

Background

Protein Organization

  • Primary Structure: amino acid sequence
  • Secondary Structure: amino acid sequences linked by hydrogen bonds
  • Tertiary Structure: organization of secondary structures
  • Quaternary Structure: organization of multiple amino acid chains

The Importance of Protein Structure

  • Can help determine what a protein does
  • Often more conserved than the amino acid sequences that form them

Protein Structure Problem

As of now there are a few different ways to predict protein structure in the lab:

  • X-ray Crystallography
  • Nuclear Magnetic Resonance (NMR) Spectroscopy
  • 3D Electron Microscopy

However there are 100,000,000 known distinct proteins, each with a unique structure that determines function. Determining protein structure is time consuming and as such only a small fraction of exact 3D structures are known.

Levinthal's Paradox

  • Finding the native folded state of a protein by random searching of all possible configurations would take an enormous amount of time
  • However, proteins can often fold within seconds
  • Meaning some process must be guiding this folding

Using Sequence To Predict Structure

  • Instead of laboratory experimentation, there have been massive efforts to use a protein’s sequence to determine structure
  • In 1994, the Critical Assessment of Structure Protein (CASP) was established as a biennial assessment of methods to predict structure from sequence

Enter AlphaFold2

  • Google’s DeepMind team Entered AlphaFold 2 in CASP14
  • Achieved a median Global Distance Test Score of 92.4
    • This score essentially says: How close is my predicited structure to the known structure?
  • AlphaFold 2 works by:
    • starts with a user's query protein sequence
    • finding similar sequences to that query
    • aligns these sequences to create a mutliple sequence alignment
    • uses available structure data based on query sequence to create initial distances between residues
    • uses a neural network to iteratively update the distances between residues by using information from the sequence alignment
    • passes this to another neural network to determine how these residues are oriented in 3D space

Our Data

Today we will be comparing AlphaFold2's structure prediction of Proliferating Cell Nuclear Antigen (PCNA) and DNA ligase 1 (LIG1)!