# Protein Structure Comparison

## Reasons

Identify conformational differences in same protein being bound to different ligands.

Identify functionally significant conformational changes

Understand similarities and differences in homologous proteins.

## RMSD superposition

RMSD compares two structures.

RMSD requires the same number of atoms so corresponding atoms can be compared.

For each equivalent atom pair, square the distance, sum all, average, then square root.

$$RMSD = min\left(\sqrt{\frac{1}{n}\sum_{pairs} d_{ij}^2} \right)$$

Find the best overlap by moving the centre of geometry, then rotating.

RMSD units is Angstrom.

1 Angstrom is a good match. Random values can give 6, so any higher should be ignored.

The types of atoms could be a subset. Cα, backbone, heavy atoms etc, and should be specified.

## Homologous structures

Helps transfer functional knowledge between proteins

Understand functional differences

Understand evolution

Structure is more conserved than sequence. Structure is a better comparison.

### RMSD of homologous structures

The main problem is equivalent atoms, especially when then have a different number of residues due to insertions or deletions. Either pick a set of pairs to minimise RMSD, or find the max number of atoms that be superimposed to give less than 3 Angstroms (for example).

As residues are different, use Cα or backbone atoms.

A comparison between structures A and B, with equivalent atoms can be calculated by:

$$RMSD_D = \frac{1}{n} \sqrt{\sum_{i<k} (d_{ik}^A-d_{ik}^B)^2}$$

where i<k

Sequence alignment must be done until lowest RMSDD value is found.

### Normalised Structural Alignment Score (SAS)

$$SAS = \frac {RMSD * 100} {N}$$

where N is the number of superposed residues.

### Cα - Cα distance map

x-axis 1..n, y-axis 1..n where n is the number of residues or Cα.

Also denote secondary structure along the x and y-axis.

For each (x,y), if the distance between Cα is less than 4.2 Angstroms, then place a dot on the chart.

The chart is of 1 protein structure.

Proteins with similar structure will have a similar distance map.

### TOPS abstraction

Topology diagram is lossy simplification of a protein's structure. It converts it into a graph of secondary structure elements (SSE) with their vector (edge) and connections to other SSEs (nodes).

This allows for subgraph matching to find the largest common graph.

### Sequential structure alignment program (SSAP)

Calculates distance vectors between beta carbon to other residues. This takes into account the rotameric state.

A matrix is created for each corresponding residue, and dynamic programming is used to find the highest score.

Another matrix is created from these scores, and dynamic programming is used again.

### Class, Architecture, Topology, Homologous superfamily (CATH)

CATH is a hierarchical database of protein structures.

An example of a homologous superfamily are cytochrome p450s.

#### Gene3D

CATH's homologous superfamily has been subdivided further, into functional families.

Gene3D takes the HMMs derived from CATH functional families alignment, and scans them against various sequence databases. This allows for annotation of new sequences.

Folds are not evenly distributed, some are more common that others, like the Rossman fold.