Mapping Reads

Challenges for spliced aligners

An example of a spliced aligner is TopHat2. Common mapping errors include:

  • Reads incorrectly mapped into an intron instead of the exon.
  • Reads incorrectly mapped to a pseudogene instead of the functional gene it resembles.
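The intron error can be made concrete by checking where the pieces of an alignment fall. The sketch below assumes an alignment is described by a start coordinate and a SAM-style CIGAR string, in which a spliced alignment records a skipped intron with the `N` operation. For simplicity, coordinates are treated as 0-based half-open intervals (SAM's own POS field is 1-based). The exon coordinates and the helper names `aligned_blocks` and `blocks_outside_exons` are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

# Illustrative sketch: 0-based, half-open [start, end) reference intervals (assumption)
Interval = Tuple[int, int]

# CIGAR operations that consume the reference sequence: M, D, N, =, X
REF_CONSUMING = set("MDN=X")


def aligned_blocks(pos: int, cigar: str) -> List[Interval]:
    """Split an alignment into reference blocks, breaking at each N (skipped intron)."""
    blocks: List[Interval] = []
    block_start = cur = pos
    for n_str, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n_str)
        if op == "N":
            # Close the current block, then jump over the intron
            if cur > block_start:
                blocks.append((block_start, cur))
            cur += n
            block_start = cur
        elif op in REF_CONSUMING:
            cur += n
        # I, S, H and P do not move along the reference
    if cur > block_start:
        blocks.append((block_start, cur))
    return blocks


def blocks_outside_exons(blocks: List[Interval], exons: List[Interval]) -> List[Interval]:
    """Return the blocks that are not fully contained in a single annotated exon."""
    return [
        (s, e) for s, e in blocks
        if not any(es <= s and e <= ee for es, ee in exons)
    ]


# Hypothetical gene: two exons separated by an intron at [200, 500)
exons = [(100, 200), (500, 600)]

# Spliced alignment: 50 bases at the end of exon 1, skip 300 bases, 50 bases of exon 2
spliced = aligned_blocks(150, "50M300N50M")
# Unspliced alignment from the same start: runs straight into the intron
unspliced = aligned_blocks(150, "100M")

print(spliced, blocks_outside_exons(spliced, exons))      # [(150, 200), (500, 550)] []
print(unspliced, blocks_outside_exons(unspliced, exons))  # [(150, 250)] [(150, 250)]
```

The spliced alignment jumps over the intron, so every block lies inside an exon; the unspliced alignment starting at the same position overruns into the intron and is flagged. Requiring containment in a single exon is a deliberately strict check: any block that crosses an exon boundary is reported.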

Count normalisation

Normalisation is needed before read counts can be compared in differential gene expression analysis.

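Raw counts carry biases that are unrelated to expression, in particular the length of the RNA (longer transcripts yield more fragments and therefore more reads) and the sequencing depth of the sample (a more deeply sequenced sample yields more reads for every gene).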
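Both biases can be illustrated with a deliberately simplified model in which a gene's share of the reads is proportional to its expression level multiplied by its length, and the total number of reads equals the sequencing depth. This proportional model, the gene names and the numbers are assumptions for illustration, not measured data.

```python
from typing import Dict


def expected_counts(expression: Dict[str, float],
                    lengths_bp: Dict[str, int],
                    depth: int) -> Dict[str, float]:
    """Expected reads per gene under a simple proportional model (assumption):
    each gene's share of the reads is proportional to expression * length."""
    weights = {g: expression[g] * lengths_bp[g] for g in expression}
    total = sum(weights.values())
    return {g: depth * w / total for g, w in weights.items()}


# Hypothetical genes with equal expression but a fourfold difference in length
expression = {"geneA": 1.0, "geneB": 1.0}
lengths_bp = {"geneA": 1_000, "geneB": 4_000}

shallow = expected_counts(expression, lengths_bp, depth=10_000_000)
deep = expected_counts(expression, lengths_bp, depth=20_000_000)

print(shallow)  # {'geneA': 2000000.0, 'geneB': 8000000.0}: length bias
print(deep)     # {'geneA': 4000000.0, 'geneB': 16000000.0}: depth bias
```

Under this model, geneB receives four times as many reads as geneA purely because it is longer, and doubling the depth doubles every count, so raw counts cannot be compared directly across genes or samples.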


Reads per kilobase of exon per million mapped reads (RPKM)

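$$ \mathrm{RPKM} = \frac{N_{gene} \times 10^9}{L_{gene} \times N_{total}} $$

Where $N_{gene}$ is the number of reads mapped to the gene, $L_{gene}$ is the total length of the gene's exons in base pairs, and $N_{total}$ is the total number of mapped reads in the sample.

The factor $10^9 = 10^3 \times 10^6$ converts the length from base pairs to kilobases and the total from reads to millions of reads.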
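A minimal Python sketch of the RPKM calculation follows, using the standard definition in which the gene's read count is divided by both its exon length and the total number of mapped reads in the sample. It assumes counts have already been summarised per gene; the function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of exon per million mapped reads.

    gene_reads     -- reads mapped to the gene (N_gene)
    exon_length_bp -- summed length of the gene's exons in base pairs (L_gene)
    total_mapped   -- total mapped reads in the sample (N_total)
    """
    if exon_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 10^9 = 10^3 (base pairs -> kilobases) * 10^6 (reads -> millions of reads)
    return gene_reads * 1e9 / (exon_length_bp * total_mapped)


# Hypothetical: 500 reads on a 2,000 bp gene in a sample of 25 million mapped reads
print(rpkm(500, 2_000, 25_000_000))  # 10.0

# Hypothetical: raw counts differ fourfold only because one gene is four times longer
print(rpkm(2_000_000, 1_000, 10_000_000))  # 200000.0
print(rpkm(8_000_000, 4_000, 10_000_000))  # 200000.0
```

The last two calls show the intended effect of the length term: two genes whose raw counts differ only because of their exon lengths receive the same RPKM. Dividing by the total mapped reads plays the same role across samples of different sequencing depth.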