Protein Structure Quality

We can assess and describe the quality of structures found in databases such as the PDB.

PDB data

A PDB data file contains a 3D model of a protein:

  • atomic coordinates x,y,z
  • secondary structure assignments (α-helix, β-sheet or coil)
  • remarks
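As a minimal sketch of how the atomic coordinates can be read, the snippet below slices one fixed-width ATOM record. The column positions follow the published PDB file format; the `Atom` class, the function name and the example values are hypothetical and only for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
    )


# Made-up record, written field by field so the column widths are visible.
example = (
    "ATOM  " "    1" " " " N  " " " "MET" " " "A" "   1" "    "
    "  27.340" "  24.430" "   2.614" "  1.00" "  9.67"
)
atom = parse_atom_line(example)
print(atom.res_name, atom.x, atom.y, atom.z, atom.b_factor)
```

In practice a library parser would normally be used instead; the point here is only that each quantity sits in a fixed column range.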

Measures of Quality

Resolution

In PDB files, resolution is given in Angstroms. 1 Angstrom is very good (high resolution); 3 Angstroms is poor. An electron density map is calculated from the diffraction patterns. At high resolution you can see every atom; at lower resolution some inference is needed. Example: myoglobin at different resolutions.

Reasons for poor resolution include imperfect crystals and local movement of the atoms.

R-Free

See R-factors.

Average positional error

Luzzati Plot

x-axis: 1/d, from 0.2 to 0.4, where d is the distance between lattice planes.

y-axis: R-factor

Calculate the R-factor using only those reflections whose spacing lies around a given 1/d, then vary d and plot the result.
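The binning step can be sketched as follows. This assumes the conventional R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, and a hypothetical input layout of (d, Fobs, Fcalc) tuples; the bin count and function names are illustrative choices, and the plotting itself is left out.

```python
from collections import defaultdict


def r_factor(f_obs, f_calc):
    """Conventional R-factor over paired structure-factor amplitudes."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den


def r_factor_by_shell(reflections, lo=0.2, hi=0.4, n_bins=4):
    """Group reflections by 1/d and return (bin centre, R) pairs.

    reflections: iterable of (d_spacing, f_obs, f_calc) tuples
    (a hypothetical layout chosen for this sketch).
    """
    width = (hi - lo) / n_bins
    shells = defaultdict(lambda: ([], []))
    for d, f_obs, f_calc in reflections:
        inv_d = 1.0 / d
        if not (lo <= inv_d < hi):
            continue  # outside the plotted range
        idx = min(int((inv_d - lo) / width), n_bins - 1)
        shells[idx][0].append(f_obs)
        shells[idx][1].append(f_calc)
    return [
        (lo + (idx + 0.5) * width, r_factor(obs, calc))
        for idx, (obs, calc) in sorted(shells.items())
    ]


# Made-up reflections: two near 1/d = 0.22, one near 0.33, one out of range.
data = [(4.5, 100.0, 90.0), (4.5, 50.0, 55.0), (3.0, 80.0, 60.0), (10.0, 1.0, 1.0)]
points = r_factor_by_shell(data)
```

Each returned pair is one point of the plot: the bin centre on the x-axis and the R-factor for that shell on the y-axis.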

σA Plot

x-axis: (sinθ/λ)^2

y-axis: ln σA

Local structure variability

B-factor indicates local mobility (atoms moving around). Large B-factor means uncertainty.

Atom errors are shown as ellipsoids (the displacement is directional, not random).

A PDB file might contain distinct conformations, with coordinates and fractional occupancy for the affected atoms.
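To get a feel for what a B-factor means as a distance, it can be converted to a root-mean-square displacement. The sketch below assumes the standard isotropic relation B = 8π²⟨u²⟩, which these notes do not state explicitly; the function name is hypothetical.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """Return sqrt(<u^2>) in Angstroms for an isotropic B-factor in Angstroms^2."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


for b in (10.0, 20.0, 60.0):
    print(f"B = {b:5.1f}  ->  rms displacement = {rms_displacement(b):.2f} Angstroms")
```

Under this assumption a B-factor of 20 corresponds to a displacement of roughly half an Angstrom.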

Stereochemical Quality or local covalent geometry

Bond Lengths

The model has bonds. During refinement, measure the difference between the ideal and model bond lengths and try to minimise it.

The ideal (or expected) lengths are theoretical values taken from small organic molecules, and the model lengths come from the current model under refinement.

$$ \sum_{bonds} w_{bond}(length_{ideal}-length_{model})^2$$

w is a weighting factor for each bond.
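The sum above is simple to evaluate directly. In this sketch each bond is a hypothetical (ideal, model, weight) tuple and the lengths are made-up values in Angstroms, not Engh & Huber parameters.

```python
def bond_restraint_term(bonds):
    """Sum of w * (ideal - model)^2 over (ideal, model, weight) tuples."""
    return sum(w * (ideal - model) ** 2 for ideal, model, w in bonds)


# Made-up example: one bond stretched by 0.1 Angstroms, one at its ideal length.
bonds = [
    (1.5, 1.6, 1.0),
    (1.3, 1.3, 2.0),
]
penalty = bond_restraint_term(bonds)
print(f"restraint term = {penalty:.4f}")
```

A refinement program would minimise this term together with the fit to the experimental data; only the bond-length part is shown here.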

Engh & Huber (1991) derived the ideal bond lengths from small-molecule structures (very accurate data) in the Cambridge Structural Database (CSD).

Stereochemical quality is how well the model agrees with prior chemical knowledge about amino acids' bond lengths and angles.

Backbone torsion angles

You can measure backbone torsion angles. This takes into account φ, ψ and ω angles. Looking at the Ramachandran plot, you can measure what % of the torsion angles lie within favourable regions.
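A torsion angle is computed from four consecutive atoms. The sketch below uses the usual atan2 formulation and the IUPAC sign convention; it assumes φ is taken over C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1), which is the standard definition but is not spelled out in these notes. The helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric checks with made-up points around a bond along z.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying `dihedral` along the backbone gives the (φ, ψ) pairs that are then placed on the Ramachandran plot.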

Sidechain torsion angles

Sidechains also have torsion angles. There are only a few probable rotameric states.

χ1 is formed by N - Cα - Cβ - Cγ

χ2 is formed by Cα - Cβ - Cγ - Cδ, and so on along the sidechain.

You can plot these two angles and compare them against typical values for that residue.

There are three typical conformations.

Trans = 180 degrees, gauche minus = -60 degrees, gauche plus = 60 degrees
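Assigning a measured χ angle to one of these three states amounts to finding the nearest ideal value on a circle. A minimal sketch, using the ideal angles listed above and hypothetical function names:

```python
ROTAMERS = {
    "trans": 180.0,
    "gauche minus": -60.0,
    "gauche plus": 60.0,
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_chi(chi_deg: float) -> str:
    """Return the name of the rotameric state closest to chi_deg."""
    return min(ROTAMERS, key=lambda name: angular_difference(chi_deg, ROTAMERS[name]))


for chi in (-175.0, -65.0, 70.0):
    print(chi, classify_chi(chi))
```

The wrap-around in `angular_difference` matters because -175 degrees is only 5 degrees away from trans, even though the raw numbers differ by 355.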

The Procheck program uses G-factors to measure the stereochemical properties of a given protein structure. A low G-factor corresponds to an unlikely conformation and indicates a problem with the protein structure model. The score is given as log odds, i.e. log10 P.

NMR Spectroscopy

Nuclear Magnetic Resonance (NMR) produces multiple similar structures due to uncertainty and ambiguity in the data.

Multiple nuclear Overhauser effects (NOEs) are produced by short inter-proton distances (less than 6 Angstroms). The completeness is the ratio of observed to expected NOEs.

Disadvantages

NMR does not work well on large proteins, but it is good for small and medium-sized proteins.

Advantages

Proteins can be measured while in solution (instead of being locked in a crystal).

ref: Methods for Determining Atomic Structures.

Alpha Helix

A helix could be identified by the following (poor) methods: assume the annotated PDB data is correct, analyse φ and ψ angles, or measure Cα to Cα distances.

The best identification is by hydrogen bonding pattern. The H bond is made between the CO of residue i and the NH of residue i+4.

Hydrogen bonds are weaker than covalent bonds, but stronger than van der Waals interactions.

Hydrogen bond definition. A hydrogen bond exists between a donor functional group (D-H) and an atom, or group of atoms (A) able to accept the bond, when there is evidence of association between the groups and that this is due to, or enhanced by, the presence of the hydrogen atom already covalently linked to the Donor. (Protein-Ligand Interactions, Bohm, page 138)

In this case the donor is N-H and the acceptor is C=O.

Geometry

For the hydrogen bond there are 3 atoms, D-H...A (donor, hydrogen and acceptor), which form a triangle. We can measure:

  • θ, the angle formed by D, H and A
  • the H...A distance

The most probable angle is near 180 degrees, while the H...A distance is around 2 Angstroms. This is the McDonald and Thornton (1994) definition.

There are also some found towards 120 degrees and 2.6 Angstroms.
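The two measurements can be computed from the three atom positions as below. The cut-offs `max_ha` and `min_angle` are illustrative defaults chosen for this sketch, not values quoted in these notes, and the function names are hypothetical. `math.dist` needs Python 3.8 or later.

```python
import math


def angle_deg(a, b, c):
    """Angle at b (in degrees) formed by the points a-b-c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_t = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_t))


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (theta, H...A distance) for a D-H...A triple."""
    theta = angle_deg(donor, hydrogen, acceptor)
    ha = math.dist(hydrogen, acceptor)
    return theta, ha


def looks_like_hbond(donor, hydrogen, acceptor, max_ha=2.5, min_angle=90.0):
    """Apply simple distance and angle cut-offs (assumed values)."""
    theta, ha = hbond_geometry(donor, hydrogen, acceptor)
    return ha < max_ha and theta > min_angle


# Made-up linear arrangement: D at the origin, H 1 Angstrom away, A 2 Angstroms from H.
theta, ha = hbond_geometry((0, 0, 0), (1, 0, 0), (3, 0, 0))
```

For the linear example θ is 180 degrees and the H...A distance is 2 Angstroms, which matches the most probable geometry described above.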

PDB statistics for high-resolution structures show only 5% of C=O and 10% of N-H groups without hydrogen bonds.

H bonds are not limited to N-H and C=O. Some residues (e.g. lysine, arginine) have sidechain N-H groups which behave like donors.

The carboxyl O atoms exposed by glutamic and aspartic acid usually act as acceptors (and as donors only when protonated, COOH), while the amide group (CONH2) in glutamine and asparagine can be both acceptor and donor.

The Mills and Dean definition is used by Chimera. It uses statistical probability based on models from the CSD.

Beta Sheet

β sheets are defined by non-sequential patterns of H-bonds (between strands)

DSSP

DSSP is a program that, given the 3D atomic coordinates of a structure, identifies H-bonds by calculating an interaction energy between backbone C=O and N-H groups. It then looks for sequential occurrences to find α helices, and non-sequential patterns to find β sheets.
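A rough sketch of the two steps follows. The energy expression and constants (partial charges 0.42e and 0.20e, factor 332, cut-off of -0.5 kcal/mol) and the rule that two consecutive i to i+4 bonds start a helix are taken from the published Kabsch and Sander description of DSSP as an assumption; they are not given in these notes. The function names and the (i, j) pair layout are hypothetical, and this is not the DSSP program itself.

```python
import math

Q1Q2 = 0.42 * 0.20   # product of partial charges (assumed from the DSSP paper)
F = 332.0            # dimensional factor giving kcal/mol with distances in Angstroms
CUTOFF = -0.5        # kcal/mol; lower energies count as an H-bond


def dssp_hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def helix_residues(hbonds, n=4):
    """Find helical residues from sequential i -> i+n H-bonds.

    hbonds: set of (i, j) pairs meaning CO of residue i bonds NH of residue j.
    Two consecutive turns (starting at i-1 and i) mark residues i..i+n-1.
    """
    turns = {i for (i, j) in hbonds if j - i == n}
    helix = set()
    for i in turns:
        if i - 1 in turns:
            helix.update(range(i, i + n))
    return sorted(helix)


# Made-up linear C=O...H-N geometry with an O...H distance of 1.9 Angstroms.
energy = dssp_hbond_energy((0, 0, 0), (1.23, 0, 0), (4.13, 0, 0), (3.13, 0, 0))
is_hbond = energy < CUTOFF

# Made-up H-bond list: three consecutive i -> i+4 bonds plus one long-range bond.
helix = helix_residues({(1, 5), (2, 6), (3, 7), (10, 20)})
```

The long-range pair (10, 20) is ignored by `helix_residues`; pairs like that are the non-sequential pattern that would be examined when looking for β sheets.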

References

DSSP.

Turns

In a turn the protein chain reverses direction. Turns are needed to form a globular protein.

Statistically, some amino acids are more likely to be found in turns than in an α helix or β sheet. Gly is flexible because it has no sidechain. Pro lacks a backbone N-H for H-bonding and sterically repulses other atoms. Asn, Asp and Ser have short polar sidechains that can form H bonds.

γ turns

A tight turn of 3 residues. There is an H bond between residues i and i+2. There are two conformations, inverse and classic; inverse is more common. See lecture notes for plot.

β turns

Four consecutive residues with Cαi and Cαi+3 having a distance less than 7 Angstroms, and not helical.
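The distance criterion can be checked with a sliding window over the Cα coordinates. In this sketch the "not helical" test is simplified to skipping any window that contains a residue index flagged as helical, which is an assumption; the function name and the input layout are hypothetical.

```python
import math


def beta_turn_candidates(ca_coords, helical=None, max_dist=7.0):
    """Return start indices i where Calpha(i) to Calpha(i+3) is under max_dist.

    ca_coords: list of (x, y, z) Calpha positions in chain order.
    helical:   optional set of residue indices already assigned as helix.
    """
    helical = helical or set()
    hits = []
    for i in range(len(ca_coords) - 3):
        if any(k in helical for k in range(i, i + 4)):
            continue  # simplified "not helical" rule (assumption)
        if math.dist(ca_coords[i], ca_coords[i + 3]) < max_dist:
            hits.append(i)
    return hits


# Made-up coordinates: the chain doubles back on itself over residues 0 to 3.
chain = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (-3.8, 3.8, 0)]
turns = beta_turn_candidates(chain)
```

Classifying each candidate into a type would then need the φ and ψ angles of residues i+1 and i+2.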

There are many types, including I, I', II and II'.

φ and ψ from residues i+1 and i+2 can be plotted on a Ramachandran plot. See notes for example. You can draw an arrow from i+1 to i+2 on the plot, and the turn type can be identified by where on the plot the arrow lies.

Promotif

Promotif is DSSP with changes added to detect turns.