Protein-Protein Docking

Reasons and motivations for computational docking

Interaction defines function

Better systems understanding of the cell

Experimental structure and complex determination is difficult

Thermodynamics of docking

Basics of docking

Create a scoring function, based on energy.

Search the possible conformations using the scoring function, and optimise the search.


Thermodynamic functions are imperfect

Proteins are flexible (their conformation can change upon complex formation)

Large search space for conformations

Protein interaction differences.


The conformation with the lowest binding free energy (ΔG) is the most favourable.

Binding Constant

The binding (association) constant for receptor (R) and ligand (L) is the ratio of their equilibrium concentrations, and it is related to the binding free energy by

$$ \Delta G = -RT \ln \frac {[RL]}{[R][L]} $$

where R in the prefactor RT is the ideal gas constant (not the receptor)

and T is the temperature in kelvins
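
As a worked example, the relation can be evaluated numerically. The sketch below assumes the standard convention ΔG = -RT ln K_a, with K_a = [RL]/([R][L]) expressed relative to a 1 M standard state. The function name and the example constant are illustrative and not taken from any particular docking package.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def binding_free_energy(k_assoc, temperature=298.15):
    """Standard binding free energy in kJ/mol from an association constant (1/M)."""
    if k_assoc <= 0:
        raise ValueError("association constant must be positive")
    return -GAS_CONSTANT * temperature * math.log(k_assoc) / 1000.0


# Hypothetical complex with a 1 nM dissociation constant, i.e. K_a = 1e9 per molar.
dg = binding_free_energy(1e9)
print(f"{dg:.1f} kJ/mol")  # -51.4 kJ/mol
```

A larger association constant (tighter binding) gives a more negative ΔG, which is why the lowest ΔG marks the preferred complex.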


Binding site scoring

  • Free energy (electrostatics, stereochemistry, solvation)
  • geometric scores
  • statistical scores
  • phylogenetic

or a function that combines several of the above.

Binding site features

Shape is complementary between the two proteins

Electrostatic residues are complementary

Area is around 1600 Å^2 and can reach 5000 Å^2. Large interfaces tend to involve conformational changes upon binding.

Sequence is conserved at the binding site. Homodimers are conserved. Obligate proteins are less conserved, and transient complexes are even less conserved

Aromatic residues are found in the binding site

Hydrophobic residues are expected to be in the core, then interface, then surface

Patches of hydrophobic and polar regions

Water may be present in some interfaces


Interface is conserved during evolution

Residues are conserved within the interface

Residue propensity in the binding site interface

Some residues are more likely than others to be present in the binding site.

Binding sites that are non-obligate, obligate and crystal packing have amino acids with different propensities.

Shape complement

Proteins will have complementary shapes, like two 3D jigsaw pieces

The Lesk and Sternberg algorithm cycles through all the overlaps in 3D grid space and gives each conformation a score based on overlapping and adjacent grid units.
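
A grid score of this kind can be sketched as follows. This is a simplified illustration and not the published algorithm: each protein is assumed to be a 3D boolean occupancy grid, overlapping cells are penalised as clashes, and ligand cells that sit in the empty shell next to the receptor are rewarded as contacts. The function names and the clash weight are assumptions.

```python
import numpy as np


def neighbour_shell(grid):
    """Empty cells that are face-adjacent to at least one occupied cell."""
    padded = np.pad(grid, 1)  # pad with False so shifts do not wrap around
    near = np.zeros_like(grid, dtype=bool)
    for axis in range(3):
        for shift in (-1, 1):
            near |= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return near & ~grid


def grid_score(receptor, ligand, clash_penalty=10.0):
    """Reward adjacent cells, penalise overlapping cells."""
    overlap = np.count_nonzero(receptor & ligand)
    contact = np.count_nonzero(neighbour_shell(receptor) & ligand)
    return contact - clash_penalty * overlap


# Toy receptor: a slab filling x = 0 and x = 1 of a 5 x 5 x 5 grid.
receptor = np.zeros((5, 5, 5), dtype=bool)
receptor[:2] = True

touching = np.zeros_like(receptor)
touching[2] = True   # ligand plane resting on the slab
clashing = np.zeros_like(receptor)
clashing[1] = True   # ligand plane buried inside the slab

print(grid_score(receptor, touching))  # 25.0
print(grid_score(receptor, clashing))  # -250.0
```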

Binding Free Energy Function

$$ \Delta G = \Delta G_{\text{van der Waals}} + \Delta G_{\text{desolvation}} + \Delta E_{\text{electrostatic}} $$

It can be made up of:

  1. van der Waals energy, based on complementary shape
  2. desolvation energy, based on hydrophobicity of the residues that are no longer exposed to water.
  3. electrostatic energy
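
The three terms can be combined in a toy scoring function such as the one below. Everything here is an assumption made for illustration: a 12-6 Lennard-Jones form for the van der Waals term, a linear surface-area model for desolvation, a Coulomb term with a uniform dielectric, and placeholder parameter values. Real docking programs use their own functional forms and parameter sets.

```python
import numpy as np


def pairwise_distances(xyz_a, xyz_b):
    """Distance matrix between two (N, 3) and (M, 3) coordinate arrays."""
    return np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)


def van_der_waals(dist, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy summed over all pairs (toy parameters)."""
    ratio6 = (sigma / dist) ** 6
    return float(np.sum(4.0 * epsilon * (ratio6 ** 2 - ratio6)))


def desolvation(buried_hydrophobic_area, gamma=-0.025):
    """Linear model: each buried hydrophobic square angstrom contributes gamma."""
    return gamma * buried_hydrophobic_area


def electrostatic(dist, q_a, q_b, dielectric=80.0):
    """Coulomb energy; 332.06 converts e^2 / angstrom to kcal/mol."""
    return float(np.sum(332.06 * np.outer(q_a, q_b) / (dielectric * dist)))


def binding_score(xyz_a, q_a, xyz_b, q_b, buried_area):
    """Sum of the three terms for one rigid placement of the two proteins."""
    dist = pairwise_distances(xyz_a, xyz_b)
    return (van_der_waals(dist)
            + desolvation(buried_area)
            + electrostatic(dist, q_a, q_b))
```

Because each term is computed separately, the weights between them can be tuned or individual terms swapped out without touching the others.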

Conformation Search

Rigid-body docking has six degrees of freedom: 3 axes of rotation and 3 directions of translation.
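
The six rigid-body parameters can be applied to a set of ligand coordinates as in the sketch below. The Euler-angle convention (rotate about x, then y, then z) and the choice to rotate about the ligand centroid are assumptions; docking programs differ on both.

```python
import numpy as np


def rotation_matrix(alpha, beta, gamma):
    """Rotation about x (alpha), then y (beta), then z (gamma), in radians."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rz @ ry @ rx


def place_ligand(coords, angles, translation):
    """Rotate (N, 3) coordinates about their centroid, then translate."""
    centre = coords.mean(axis=0)
    rot = rotation_matrix(*angles)
    return (coords - centre) @ rot.T + centre + np.asarray(translation)


# Three rotation angles plus three translation components describe one pose.
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
pose = place_ligand(ligand, angles=(0.0, 0.0, np.pi / 2), translation=(10.0, 0.0, 0.0))
```

A search over conformations is then a search over these six numbers, with the scoring function evaluated at each pose.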

Conformation change may happen upon binding. Options are

  • Ignore and keep the proteins rigid. Faster search, but may not discriminate between the correct and alternative conformations.
  • Model the conformational change upon binding. Much slower search, but the correct solution can be identified from the energy function.

Shape complement search algorithms

Fast Fourier transform (FFT): looks for correlation between the surfaces of the two proteins.
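
The FFT trick can be sketched with NumPy. The assumption here is that both proteins have already been discretised onto equally sized 3D grids of real-valued weights; the correlation theorem then yields the score for every cyclic translation of the ligand in one pass. The function names are illustrative, and a full search would repeat the scan for each sampled ligand rotation.

```python
import numpy as np


def fft_translation_scan(receptor, ligand):
    """Score every cyclic translation t of the ligand grid at once.

    scores[t] equals np.sum(receptor * np.roll(ligand, t, axis=(0, 1, 2))).
    """
    f_rec = np.fft.fftn(receptor)
    f_lig = np.fft.fftn(ligand)
    return np.fft.ifftn(f_rec * np.conj(f_lig)).real


def best_translation(receptor, ligand):
    """Return the highest-scoring shift and its correlation score."""
    scores = fft_translation_scan(receptor, ligand)
    shift = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(s) for s in shift), float(scores[shift])


# Toy grids: one favourable receptor cell and one ligand cell.
receptor = np.zeros((8, 8, 8))
receptor[5, 5, 5] = 1.0
ligand = np.zeros((8, 8, 8))
ligand[1, 1, 1] = 1.0

shift, score = best_translation(receptor, ligand)
print(shift)  # (4, 4, 4): moving the ligand cell from (1, 1, 1) onto (5, 5, 5)
```

For a grid of N cells, the scan costs on the order of N log N operations rather than the N^2 of testing each translation directly.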

Geometric Hashing

Pepper the surface of both proteins with calculated hash points.

Search through all translations and rotations to maximize the number of matching surface points.

Finds the best steric / geometrical fit, then refines the docking using an energy function.


High-resolution docking

Explicitly model conformational changes.

Needs parameters such as an energy function, where the native state gives the global minimum energy conformation (GMEC).

Sample smartly

RosettaDock uses filters to help with the random search.

A low-resolution search is executed, followed by perturbation (random walk), side-chain optimisation and rigid-body optimisation. Repeat.
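
The perturb-and-accept cycle can be illustrated with a generic Metropolis Monte Carlo loop. This is not RosettaDock's protocol: the energy function is a placeholder supplied by the caller, the pose is reduced to a plain list of rigid-body parameters, and side-chain optimisation is omitted. All names and default values are assumptions.

```python
import math
import random


def metropolis_search(energy, start, step=1.0, temperature=1.0,
                      n_steps=2000, seed=0):
    """Random-walk search that always accepts downhill moves and
    accepts uphill moves with probability exp(-delta / temperature)."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        # Perturb every degree of freedom with Gaussian noise.
        trial = [x + rng.gauss(0.0, step) for x in current]
        e_trial = energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_energy(pose):
    """Placeholder energy: a single minimum at the origin of the 6D pose space."""
    return sum(x * x for x in pose)


best_pose, best_energy = metropolis_search(toy_energy, start=[5.0] * 6)
```

Accepting some uphill moves lets the walk escape local minima, which matters because real docking energy landscapes are rugged.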

Cluster the results based on RMSD.
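
A minimal clustering sketch is shown below. It assumes that all poses share the same receptor frame, so no superposition is done before computing the RMSD, and it uses a simple greedy scheme in which the lowest-energy unassigned pose becomes a new cluster centre. The cutoff value and function names are illustrative.

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets already in the same frame."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def greedy_cluster(poses, scores, cutoff=2.5):
    """Visit poses from best to worst score; join the first centre
    within the cutoff, otherwise start a new cluster."""
    centres, members = [], {}
    for idx in np.argsort(scores):
        idx = int(idx)
        for centre in centres:
            if rmsd(poses[idx], poses[centre]) <= cutoff:
                members[centre].append(idx)
                break
        else:
            centres.append(idx)
            members[idx] = [idx]
    return members


# Three toy poses: two nearly identical, one far away.
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
poses = [base, base + 0.1, base + 10.0]
scores = [-3.0, -2.0, -1.0]

print(greedy_cluster(poses, scores))  # {0: [0, 1], 2: [2]}
```

Large clusters of low-energy poses are usually taken as stronger candidates than isolated ones.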

Data-driven docking

Restrict the search using prior information, e.g. HADDOCK (High Ambiguity Driven protein-protein DOCKing). HADDOCK uses experimental information about the interface to restrain the docking.

Another option is to computationally mark residues as participating or excluded (solvent accessible) in the interface, then let MODELLER work.