Proteomics

Proteomics is the study of proteins that have been expressed by the genome, and are present in a particular tissue at a point in time.

In the cells of an organism, the DNA is always the same, but different proteins will be expressed, e.g. caterpillar vs. butterfly, heart tissue vs. liver tissue.

The proteome is dynamic, unlike the genome (DNA), which stays essentially fixed.

Reasons

  1. mRNA (transcriptome) does not predict the proteome.
  2. PTMs can affect the proteome. Almost 50% of proteins have PTMs.
  3. Protein expression is dynamic and shows the current state of the biological system.

Challenges

Biological challenges

  1. Dynamic range. Protein abundance spans roughly 4 orders of magnitude in cells and 10 orders of magnitude in plasma.
  2. Alternative splicing (different combinations of exons kept in the mature mRNA) increases the number of mRNA isoforms and therefore proteins.
  3. Post-translational modifications (PTMs) modify the protein backbone or side chains, e.g. phosphorylation.

Technical challenges

  1. Acquiring sample
  2. No PCR (amplification / cloning)
  3. Sample preparation (growing, tagging)
  4. Experimental time (gels)
  5. Data analysis
  6. Not all fragments will be detected, so it is unknown whether a missing protein is absent or only present in small quantities.

Approaches

  1. Profiling: take a snapshot of proteins that are present
  2. Differential: compare snapshots and detect changes in state
  3. Details: detect all PTMs including their positions on the protein.

Proteomics Workflow

Protein separation

Use a blender, a centrifuge or filtration on the sample to extract proteins and create a protein mixture.

Sodium dodecyl sulfate (SDS) is used to linearise proteins. This adds a negative charge to the proteins in proportion to their mass.

Dithiothreitol (DTT) can be used to break the disulfide bonds.

A dye may also be added so that the progress of the sample through the gel can be followed.

Polyacrylamide gel electrophoresis (PAGE) will now separate the proteins.

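The protein mixture is loaded onto the gel and a voltage is applied across it, with the positive and negative electrodes at opposite ends. The proteins move depending on their charge and molecular size. In SDS-PAGE the SDS gives all proteins a similar charge-to-mass ratio, so separation is mainly by size; in 2D gels an extra dimension separates proteins by isoelectric point (the pH at which the protein has no net charge). After a certain time the experiment is stopped.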

Protein Excision and Digestion

Proteins are removed (excised) from the gel. Gloves must be used to avoid contamination.

Proteins need to be digested before being fed into the mass spectrometer.

DTT breaks down the disulfide bonds.

Iodoacetic acid is used to cap (alkylate) the cysteines (SH -> S-CH2-COOH) to avoid re-formation of the disulfide bonds.

Trypsin is used to cleave the polypeptide on the C-terminal side of arginines (R) and lysines (K). It is highly specific, efficient and well understood. The cleavage rules are used later in data analysis to predict which peptides a protein should produce.
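As a rough illustration of the cleavage rule above, the following sketch performs an in silico tryptic digest. The function name `tryptic_digest` and the example sequence are made up for illustration. It assumes the simplest possible rule (cut after every K or R); real trypsin usually does not cut when the next residue is proline, and missed cleavages occur, neither of which is modelled here.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every lysine (K) or arginine (R)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            # The K/R residue stays at the C-terminal end of the peptide.
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever is left after the last K/R is the C-terminal peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence, not a real protein.
example = "AGCKLMNRSTVKWY"
print(tryptic_digest(example))  # ['AGCK', 'LMNR', 'STVK', 'WY']
```

Search software applies the same kind of rule to every protein in a database to get the list of peptides it expects to see.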

Other proteases may be used depending on what needs to be investigated.

Arginine and lysine are basic, so the resulting peptides readily pick up a positive charge.

Digested (smaller) fragments are easier for MS to analyse and give high resolution data.

But the peptide fragments must somehow be assembled back into proteins as part of data analysis.

Mass Spectrometry

Measures m/z (mass / charge) ratio of ionized molecules.

Molecules have a monoisotopic mass, which is calculated using the most abundant isotope of each element.
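To make the link between monoisotopic mass and m/z concrete, here is a small sketch that sums standard monoisotopic residue masses for a peptide and converts the result to m/z for a given charge. The function names are illustrative, the masses are rounded to five decimal places, and the ions are assumed to be protonated, i.e. [M + zH]z+.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added once per peptide for the free termini
PROTON = 1.00728   # mass of the charge carrier


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """m/z of the peptide carrying `charge` extra protons."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


print(round(monoisotopic_mass("GA"), 5))
print(round(mz("GA", 1), 5))
print(round(mz("GA", 2), 5))
```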

A mass spectrum is produced. It is a 2D graph with x-axis of m/z (mass/charge) and y-axis of relative ion abundance.

A mass spectrometer detects ions. These have a mass and charge.

Ions travel through the mass spec via electric and magnetic fields.

Ions are typically positively charged.

The detector is the opposite charge to the ions so it will attract them.

The items that are fed into the mass spec are referred to as analytes.

Ionisation techniques - MALDI

The protein fragments are mixed with matrix (X) on a metal plate. For Soft Laser Desorption (SLD), the matrix is an ultra-fine metal powder and glycerol. MALDI uses a different laser and a different matrix.

The matrix absorbs most of the laser's energy, and prevents the analytes (M) from being damaged and fragmented. Proton transfer from the matrix to the analytes helps create ions.

XH+ + M -> MH+ + X

Electrospray Ionisation (ESI)

Protein solutions are passed through a high voltage (3000-5000V) needle. This produces droplets of ions.

Biological solutions are often liquid, so this technique is favoured over MALDI.

Drying gas helps remove solvent from the droplets (desolvation) as they travel to the mass spectrometer.

This technique may produce multiply charged ions for the same fragment.

Ionisation is done before the ions reach the mass analyser.
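Because ESI can give the same fragment several charge states, two neighbouring peaks of the same analyte can be used to work out the charge and the neutral mass. The sketch below assumes both peaks come from the same molecule, that the charges differ by exactly one, and that each charge is carried by a proton; the function name is illustrative.

```python
PROTON = 1.00728  # mass of one proton in daltons


def infer_charge_and_mass(mz_lower_charge, mz_higher_charge):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_lower_charge:  m/z of the peak with charge z (the larger m/z value)
    mz_higher_charge: m/z of the peak with charge z + 1 (the smaller m/z value)
    """
    # From m1 = (M + z*p)/z and m2 = (M + (z+1)*p)/(z+1):
    #   z = (m2 - p) / (m1 - m2)
    z = round((mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge))
    neutral_mass = z * (mz_lower_charge - PROTON)
    return z, neutral_mass


# Illustrative peaks for a 1500 Da analyte at charge 2 and charge 3.
z, mass = infer_charge_and_mass(751.00728, 501.00728)
print(z, round(mass, 3))
```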

Mass Spectrometry - Performance

Accuracy

The difference between the measured and true mass. Expressed as a percentage error or, for high-accuracy instruments, in parts per million (ppm).
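The accuracy calculation is just a relative error. The sketch below gives it both as a percentage and in parts per million (ppm), a unit commonly used for mass accuracy; the function names and the example masses are made up.

```python
def mass_error_percent(measured, true):
    """Relative mass error as a percentage of the true mass."""
    return (measured - true) / true * 100


def mass_error_ppm(measured, true):
    """Relative mass error in parts per million of the true mass."""
    return (measured - true) / true * 1e6


# Illustrative values: a 1000 Da ion measured 0.005 Da too heavy.
print(round(mass_error_percent(1000.005, 1000.0), 6))
print(round(mass_error_ppm(1000.005, 1000.0), 3))
```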

Resolution

Whether you can only see the average mass or can distinguish the peak of each isotope. Peaks are more clearly separated at higher resolution.

Sensitivity

Dynamic range

Time-of-Flight (TOF)

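Ions are accelerated by a known electric field and travel down a tube. In a reflectron instrument they are reflected at the far end and travel back up.

Lighter ions (lower m/z) leave the acceleration region at a higher velocity, so they reach the reflector and the detector earlier and their time of flight is shorter.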
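The relationship between m/z and flight time can be sketched numerically. This assumes an idealised linear drift tube with no reflector, where all of the ion's kinetic energy comes from the accelerating voltage (z·e·V = ½·m·v²). The function name, the 20 kV voltage and the 1 m tube length are illustrative, not taken from any particular instrument.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge, voltage, tube_length):
    """Time in seconds for an ion to drift `tube_length` metres.

    Assumes the ion gains kinetic energy charge * e * voltage and then
    drifts at constant velocity (idealised linear TOF, no reflector).
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return tube_length / velocity


light = flight_time(mass_da=1000, charge=1, voltage=20000, tube_length=1.0)
heavy = flight_time(mass_da=4000, charge=1, voltage=20000, tube_length=1.0)
print(light < heavy)            # the lighter ion arrives first
print(round(heavy / light, 3))  # flight time scales with sqrt(m/z)
```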

Quadrupole Analyser

Four parallel cylindrical rods. Ions travel down the centre between all four rods. Opposite rods are electrically paired, and a radio frequency (RF) voltage is applied to them on top of a DC voltage. This creates an oscillating electric field for the ions to travel through. Depending on how the RF and DC voltages are set up, some ions will crash into the rods, while others will make it to the end. This lets the quadrupole act as a mass filter.

One MS could have three quadrupoles.

Peptide Mass Fingerprinting

Detects the masses of the peptide fragments produced by digestion. Compares the theoretical spectrum of each candidate protein with the observed experimental spectrum. Scoring is needed to determine the original protein.

FRAGFIT was simply based on the number of matches.

MOWSE was used to account for the non-uniform distribution of fragments after digestion.

MASCOT is a search engine, available as a website, that uses MOWSE-based scoring against a protein database.
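The simplest form of scoring, counting matches, can be sketched as follows. This is only an illustration of match counting in the spirit described above, not the actual FRAGFIT, MOWSE or MASCOT algorithm. The protein names, peptide masses and the 0.2 Da tolerance are all hypothetical.

```python
def count_matches(observed, theoretical, tolerance=0.2):
    """Count observed peptide masses that match any theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed
    )


def rank_proteins(observed, database, tolerance=0.2):
    """Rank database proteins by number of matching peptide masses."""
    scores = {
        name: count_matches(observed, masses, tolerance)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical digests (peptide masses in Da) for two proteins.
database = {
    "protein_A": [500.3, 842.5, 1045.6, 1479.8],
    "protein_B": [500.3, 927.5, 1300.7],
}
observed = [842.4, 1045.7, 1479.8, 500.2]

print(rank_proteins(observed, database))
# [('protein_A', 4), ('protein_B', 1)]
```

Plain counting favours large proteins, which produce many peptides and therefore more chance matches. That is the kind of bias that more careful scoring schemes try to correct.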

Tandem MS (MS/MS) (MS2)

Mass specs in sequence and in different setups can measure different things.

MS1 -> Collision cell -> MS2

MS1 can be configured to allow only ions in a narrow window through, e.g. 834-838 m/z. Otherwise it allows all ions through.

Collision cell can fragment the ions. If you do not put any gas in there it will allow ions through.

MS2 will measure m/z

Collision Induced Dissociation (CID)

CID happens in the collision cell, which is part of the mass spectrometer. It has a higher pressure than the mass analyser regions of the mass spectrometer.

An ion comes into a gas filled chamber. Ions will collide with the gas molecules, and will break apart into fragment ions.

The fragments will typically break along the backbone. A backbone has the repeated pattern of N-Cα-CO

Each cleavage produces two fragments, which are named a, b, c or x, y, z, followed by a number.

a, b, c fragments contain the N terminus; x, y, z fragments contain the C terminus.

The two fragments from one cleavage form a pair: a/x, b/y, or c/z.

a/x fragments come from cleavage between Cα and CO. b/y fragments come from cleavage of the next bond, between CO and N (the peptide bond). c/z fragments come from cleavage between N and Cα.

The number counts the residues in the fragment, starting from the terminus it contains.
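As a sketch of the naming scheme, the following computes singly charged b and y ion m/z values for a short peptide. It assumes unmodified residues and a charge of +1, and only includes a subset of residue masses to keep the example short. The function names and the peptide are illustrative.

```python
# Monoisotopic residue masses in daltons (subset, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """m/z of singly charged b ions (N-terminal fragments), b1..b(n-1)."""
    masses, running = [], 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        masses.append(running + PROTON)
    return masses


def y_ions(peptide):
    """m/z of singly charged y ions (C-terminal fragments), y1..y(n-1)."""
    masses, running = [], 0.0
    for aa in reversed(peptide[1:]):
        running += RESIDUE_MASS[aa]
        # y ions keep the C-terminal OH and gain an H on the new N terminus.
        masses.append(running + WATER + PROTON)
    return masses


peptide = "GAS"
print([round(m, 5) for m in b_ions(peptide)])
print([round(m, 5) for m in y_ions(peptide)])
```

A useful consistency check is that a complementary pair such as b2 and y1 of a three-residue peptide adds up to the neutral peptide mass plus two protons.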

Shotgun proteomics

Skip protein separation, instead digest all proteins into peptides.

Peptides are separated using high performance liquid chromatography.

Peptide separation must be done before MS analysis otherwise the mixture is too complex.

The peptides can be identified using tandem mass spectrometry.

Multidimensional Protein Identification Technology (MudPIT)

Peptides are obtained after protein digestion.

Then separated using:

  1. strong cation exchange chromatography (SCX)
  2. reversed phase chromatography

The separation can be repeated

The next step is MS/MS. Call the two stages MS1 and MS2.

Analytes are sent into MS1. These are known as precursor or parent ions.

An initial measurement of the peptides is recorded.

As the analytes are streaming in, MS1 is tuned to keep only certain peptides. These will make it through MS1, go into the collision cell, fragment, then go into MS2. The fragmented ions are known as product or daughter ions. Measurements from MS2 are saved by the mass analyzer / ion detection system.

MS1 is then tuned again to allow the next most abundant peptide through, and MS2 analyses its fragments. Eventually, MS1 is switched back to measuring all incoming peptides and the process is repeated for the next portion of peptides.
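The selection step can be sketched as picking the most intense peaks from the initial MS1 measurement. This is a deliberately simplified picture: the function name, the peak list and the choice of two precursors per cycle are hypothetical, and real instruments apply further rules that are not modelled here.

```python
def select_precursors(survey_scan, top_n=2):
    """Return the m/z values of the `top_n` most intense peaks.

    survey_scan: list of (mz, intensity) tuples from the MS1 measurement.
    """
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:top_n]]


# Hypothetical MS1 measurement: (m/z, intensity).
survey_scan = [(400.2, 1e5), (512.3, 5e3), (836.4, 8e5), (650.7, 2e4)]

selected = select_precursors(survey_scan, top_n=2)
skipped = [mz for mz, _ in survey_scan if mz not in selected]
print(selected)  # [836.4, 400.2]
print(skipped)   # [512.3, 650.7]
```

The skipped list shows why low-abundance peptides can go unidentified when selection is based on abundance.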

Limitations

Serial process

Selection is based on the most abundant peptides, so some low-abundance peptides will be missed.

reference: MudPIT at research.stowers.org

Differential experiments

Difference gel electrophoresis (DIGE)

Tag your reference sample and other samples with different coloured fluorescent dyes (e.g. Cy2, Cy3, Cy5). The dyes are matched in size and charge.

Mix the protein samples together.

Add all proteins to the SDS PAGE and separate via 2 dimensions.

Scan the gel at the wavelength corresponding to each dye to see the difference in protein quantity.

Relative differences in size or charge can be detected, so PTMs or other modifications can be detected.

If many samples are compared, the reference may be made up of a pooled mixture of all samples.

Advantages

This avoids the issue with each gel experiment being different (the same protein may travel differently in the next experiment).

Disadvantages

limited sample capacity

low dynamic range

some proteins are hard to measure in gel (extreme hydrophobic / polar / mass)

Labour intensive

time consuming

hard to separate sample from gel

Isotope-coded affinity tags (ICAT)

Isotopic labelling of peptides, so they can be differentiated.

Consists of a reactive group, linker (heavy or light), biotin

e.g. in one sample, peptides may be labelled with deuterium instead of hydrogen.

Enables peptide mixtures from different samples to be combined and analysed together.

Originally the tags were linked to cysteines, but permethyl esterification is now also used to attach labels onto peptide carboxylic acid groups (COOH).

Biotin-tagged peptides can be captured from the mixture using avidin and then eluted.

Stable isotope labelling by amino acids in cell culture (SILAC)

Substitutes stable isotopes into the amino acid. e.g. 13C instead of 12C in lysine

One sample is made up of the light amino acid, the other sample of the heavy amino acid. The cells are grown in a dish and will eventually incorporate the heavy amino acids. This doesn't affect the metabolic process of the cells.

Both samples are combined and digested (e.g. with trypsin)

Mixture is analysed by LC-MS-MS
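In the spectrum, each peptide then appears as a light/heavy pair separated by a predictable m/z gap, and the ratio of the two peak intensities gives the relative abundance. The sketch below assumes all six carbons of lysine are replaced by 13C (an assumption about the reagent, not stated above). The function names and intensities are illustrative.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C in daltons


def silac_mz_shift(n_lysines, charge, labelled_carbons=6):
    """m/z gap between the heavy and light forms of a peptide.

    labelled_carbons=6 assumes fully 13C-labelled lysine (assumption).
    """
    return n_lysines * labelled_carbons * C13_MINUS_C12 / charge


def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Relative abundance of the heavy-labelled sample vs the light one."""
    return heavy_intensity / light_intensity


# A tryptic peptide with one lysine, observed at charge 2.
print(round(silac_mz_shift(n_lysines=1, charge=2), 4))
# Hypothetical peak intensities for the pair.
print(heavy_to_light_ratio(heavy_intensity=3.0e5, light_intensity=1.5e5))
```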

ref: silac faq.

Amine reactive isobaric tagging reagents (Isobaric labelling)

Four (or eight) different types of reagents may be used. Each consists of

  1. reporter group (based on N-methylpiperazine)
  2. balance group (carbonyl)
  3. an amine specific peptide reactive group

The isobaric tag is the reporter group and balance group together.

Reporter and balance groups have a total mass of 145.1 Da. The different isobaric tags have different reporter group masses, compensated by the balance group so that the total stays the same.

Commercialised as iTRAQ

The reagent's reactive group will bind to the N terminus or to a lysine side chain.

Quantification of proteins is done based on analysis of reporter group, which is fragmented during MS. Relative quantities of each protein can be measured this way.
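The sketch below illustrates both ideas for a four-reagent set: the tags are isobaric because reporter and balance masses add up to the same total, and quantification compares reporter intensities within one MS/MS spectrum. The nominal (integer) masses are those commonly quoted for the 4-plex reagents. The intensities, the function name and the choice of the 114 channel as reference are made up for illustration.

```python
# Nominal masses in Da: reporter group -> matching balance group.
TAGS = {114: 31, 115: 30, 116: 29, 117: 28}

# Every tag has the same nominal total mass, so labelled peptides from
# different samples are indistinguishable until the tag fragments.
for reporter, balance in TAGS.items():
    print(reporter, "+", balance, "=", reporter + balance)


def relative_abundance(reporter_intensities, reference=114):
    """Express each reporter intensity relative to the reference channel."""
    ref = reporter_intensities[reference]
    return {
        reporter: intensity / ref
        for reporter, intensity in reporter_intensities.items()
    }


# Hypothetical reporter ion intensities from one MS/MS spectrum.
intensities = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
print(relative_abundance(intensities))
# {114: 1.0, 115: 2.0, 116: 0.5, 117: 1.0}
```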

iTRAQ

Advantages

Labelling can happen after cell or tissue lysis (breakdown)

Multiplexing

Uninterpreted MS/MS search

First step is like PMF

Identify possible peptides from database

Compute possible product ions (a,b,c,x,y,z)

Match the mass spec experimentally observed ions against these fragments.

Protein databases

Results from the mass spec are matched against proteins in a database.

Genomic databases include GenBank, DDBJ and EMBL-Bank.

Protein database could be translated from above genomic databases.

PIR (Protein Information Resource)

UniProt. Annotated.

Integr8. Links genomes and proteomes

Add keratin into your database search in case of contaminants.

Uniprot

Limitations

Finding a match is easy, but the match is not always correct. This could be due to low quality MS/MS data, or to poorly chosen database search parameters returning the wrong protein.

If only one peptide matches, that is not good enough to say the protein has been expressed. Publishing guidelines recommend more than one matching peptide.

References

Lecture 2 and 3 of systems biology 2015 at birkbeck

BroadE: Fundamentals of peptide and protein mass spectrometry at youtube.com