Normalisation

Why do I need to normalise data?

This is such a common question that we have dedicated a whole page to it. Data are normalised to correct for differences between samples that arise not from the experimental conditions but from other sources of variability. Two common examples are:

quantification of gene expression

quantification of bacteria in animal or plant tissue

For gene expression data, many steps can introduce variability, but a common source of error is the reverse transcription (RT) reaction. If we take one RNA sample and perform three reverse transcription reactions, the efficiency of reverse transcription is likely to differ between the three reactions, even if the pipetting error is minuscule. The difference can be several-fold. If we were to quantify our gene of interest without normalisation, the same RNA sample could therefore give us three very different results. So we also need to measure a number of reference genes in the same RT reactions and use the average expression level of those that are stably expressed to correct for the differences in RT efficiency. This normalisation will also correct for other sources of variation, such as pipetting error and differences in the amount of starting template.
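The correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Cq values are invented, every assay is assumed to amplify with 100% efficiency (a factor of 2 per cycle), the geometric mean is used as the "average" of the reference genes (one common choice), and the function names are hypothetical.

```python
from math import isclose
from statistics import geometric_mean


def relative_quantity(cq, efficiency=2.0):
    """Convert a Cq value to a relative quantity (assumes a fixed efficiency)."""
    return efficiency ** (-cq)


def normalised_expression(goi_cq, reference_cqs, efficiency=2.0):
    """Divide the gene-of-interest quantity by the geometric mean of the
    reference gene quantities measured in the same RT reaction."""
    reference_quantities = [relative_quantity(cq, efficiency) for cq in reference_cqs]
    normalisation_factor = geometric_mean(reference_quantities)
    return relative_quantity(goi_cq, efficiency) / normalisation_factor


# Invented data: one RNA sample, three RT reactions of differing efficiency.
# Each reaction yields less cDNA than the last, so every Cq shifts by one cycle.
rt_reactions = [
    {"goi": 24.0, "refs": [18.0, 20.0]},
    {"goi": 25.0, "refs": [19.0, 21.0]},
    {"goi": 26.0, "refs": [20.0, 22.0]},
]

raw = [relative_quantity(r["goi"]) for r in rt_reactions]
normalised = [normalised_expression(r["goi"], r["refs"]) for r in rt_reactions]

# Without normalisation, the first and third reactions differ four-fold.
print("raw fold difference:", raw[0] / raw[2])
# After normalisation, the three reactions agree.
print("normalised values agree:", all(isclose(n, normalised[0]) for n in normalised))
```

In this toy data set the reference genes shift by exactly the same number of cycles as the gene of interest, so the correction is perfect. Real data will not behave this cleanly, which is why the reference genes need to be stably expressed.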

In the second example, quantification of bacteria in tissue, we do not need to correct for reverse transcription because the starting template is DNA. Let’s suppose that we use 10 ng of genomic DNA template extracted from human tissue that we believe is infected with bacteria. We don’t know how much of the DNA is human, how much is bacterial and how much is from other sources (fungal, for example). We can measure the number of bacteria in 10 ng of template, but we don’t know how many cells’ worth of human DNA is in the sample. So the best thing to do is to also measure how much human DNA we have, by quantifying a single-copy gene or other region of the human genome, so that we can work out how many bacteria there are ‘per x human cells’.
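As a rough sketch of that calculation, the Python below converts two measured copy numbers into 'bacteria per x human cells'. The copy numbers are invented, the function name is hypothetical, and two assumptions are built in: the human target is a single-copy gene present twice per diploid cell, and the number of copies of the bacterial target per bacterial genome is known (it is set to 1 here).

```python
def bacteria_per_human_cells(
    bacterial_target_copies,
    human_gene_copies,
    target_copies_per_bacterium=1,  # assumption: depends on the bacterial target chosen
    human_copies_per_cell=2,        # assumption: single-copy gene in a diploid cell
    per_cells=1000,
):
    """Express the bacterial load relative to the number of human cells sampled."""
    bacteria = bacterial_target_copies / target_copies_per_bacterium
    human_cells = human_gene_copies / human_copies_per_cell
    return bacteria / human_cells * per_cells


# Invented example: 6000 copies of the bacterial target and 3000 copies of the
# human single-copy gene measured in the same 10 ng of template.
load = bacteria_per_human_cells(6000, 3000)
print(load)  # 4000.0 bacteria per 1000 human cells
```

Because both targets are measured in the same template, the result does not depend on what fraction of the 10 ng was human, bacterial or from other sources.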

How many reference genes should I run to normalise my expression data?

MIQE guidelines suggest more than one reference gene should be evaluated. This is because it is not possible to establish how stable the expression of any gene is unless a number of genes are compared with each other.

We recommend running a panel of genes and using software such as geNorm, NormFinder or BestKeeper to identify the most stable genes, which may then be used to normalise the data. Once the most stable reference genes are established for a given set of experimental conditions, only those reference genes need be run in subsequent experiments.
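To give a feel for what such software measures, here is a simplified Python sketch of a pairwise stability score in the style of geNorm's M value: for each candidate gene, the average standard deviation of its Cq difference to every other candidate across samples. It is not a replacement for the tools named above. It assumes 100% amplification efficiency, omits the stepwise exclusion of the least stable gene, and uses invented gene names and Cq values.

```python
from statistics import mean, stdev


def stability_scores(cq_by_gene):
    """Return a score per gene; lower means more stable relative to the panel.

    cq_by_gene maps a gene name to its Cq values, one per sample, with the
    samples in the same order for every gene.
    """
    scores = {}
    for gene, cqs in cq_by_gene.items():
        pairwise_variation = []
        for other, other_cqs in cq_by_gene.items():
            if other == gene:
                continue
            # A Cq difference is a log2 expression ratio at 100% efficiency.
            differences = [a - b for a, b in zip(cqs, other_cqs)]
            pairwise_variation.append(stdev(differences))
        scores[gene] = mean(pairwise_variation)
    return scores


# Invented panel: REF_A and REF_B track each other, REF_C does not.
panel = {
    "REF_A": [18.0, 19.0, 18.5, 19.5],
    "REF_B": [20.0, 21.0, 20.5, 21.5],
    "REF_C": [22.0, 22.0, 24.0, 23.0],
}

scores = stability_scores(panel)
for gene in sorted(scores, key=scores.get):
    print(gene, round(scores[gene], 3))
```

In this toy panel REF_C receives the highest score and would be the first candidate to drop. Note that the score is only meaningful relative to the other genes in the panel, which is why a single gene cannot be assessed on its own.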

If you have access to microarray or next-generation sequencing expression data for the same experimental conditions that you will be using for qPCR, you can use these data to identify stably expressed reference genes.