
Planning your experiment

Careful experimental planning is crucial to obtain data which can yield the information you need for your experiment. For most experiments the most pressing question relates to the amount and type of sequencing required per sample, but there are other factors which may impact your experimental design.

If in any doubt, please do discuss your project with us. Protocols change frequently and it is possible that improved methods are available which may help your project.

Genomic sequencing

The amount of sequencing required depends on the size of your genome, its ploidy and whether you intend to compare against a reference genome or perform de-novo assembly. It can also be affected by biases in sample preparation (e.g. GC bias), so the figures below are only a general guide.

Reference-based:

For haploid genomes with a single copy of each major chromosome (e.g. bacteria), aim for a median coverage of 10x (mean coverage of 15-20x).

For diploids, these figures should be doubled, i.e. you should aim for a median coverage of your genome of 20-30x. In practice this means a mean depth of coverage of 40-50x, which has been shown to enable relatively accurate variant calling. Illumina publishes a technical note that can be used to calculate the amount of sequence data required for an experiment.
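The calculation behind such estimates is usually the Lander/Waterman relationship C = L × N / G, where C is the mean coverage, L the read length, N the number of reads and G the haploid genome length. A minimal Python sketch, assuming that each read of a pair counts separately and ignoring duplicates, unmapped reads and quality filtering (so the result is a lower bound). The function names and genome sizes below are hypothetical illustrations.

```python
import math

def reads_required(genome_size_bp: int, target_mean_coverage: float, read_length_bp: int) -> int:
    """Reads needed to reach a target mean coverage, from C = L * N / G.

    genome_size_bp: haploid genome length (G).
    read_length_bp: length of one read (L); each read of a pair counts separately.
    Duplicates, unmapped reads and filtering are ignored, so this is a lower bound.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0 or target_mean_coverage <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(target_mean_coverage * genome_size_bp / read_length_bp)

def gigabases_required(genome_size_bp: int, target_mean_coverage: float) -> float:
    """Total sequence yield in Gb needed for the target mean coverage (C * G)."""
    return target_mean_coverage * genome_size_bp / 1e9

# Hypothetical examples: a 5 Mb bacterial genome at 20x mean coverage and a
# 100 Mb diploid genome at 50x mean coverage, both with 150 bp reads.
print(reads_required(5_000_000, 20, 150))     # 666667 reads
print(reads_required(100_000_000, 50, 150))   # 33333334 reads
print(gigabases_required(100_000_000, 50))    # 5.0 Gb
```

Because the formula gives mean coverage, use the mean targets above (e.g. 40-50x for diploids) rather than the median figures when plugging in values.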

These figures apply to both Illumina short-read sequencing and PacBio long-read sequencing.

De-novo sequencing:

If you do not have a reference genome (de-novo sequencing), aim for 100-200x coverage per sample with Illumina sequencing.

If using PacBio data, the question is whether an assembly can be achieved with PacBio-only data (preferred) or whether the PacBio data will be used to help scaffold high-quality Illumina contigs. For a PacBio-only assembly, a mean coverage of 50x is recommended when using modern PacBio-aware assemblers (e.g. Celera). For scaffolding, a coverage of 4x should be sufficient to scaffold the majority of the genome.

There are many ways to prepare material for sequencing. These range from PCR-free protocols, which require large amounts of starting material (>1-2 µg), to tagmentation protocols such as Nextera, which can work with low inputs (50-100 ng).

If you are working with an organism with a biased GC content (i.e. <40% or >60%), we would recommend a PCR-free protocol if possible, as PCR amplification can lead to over- or under-representation of certain features. PCR-free libraries require a larger DNA input. Please see the sample preparation pages for more details of individual protocols.
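If you already have a sequence from your organism or a close relative (for example a draft assembly in FASTA format), its GC fraction gives a quick indication of whether it falls outside the 40-60% range mentioned above. A minimal Python sketch; the function names and the file name in the usage note are hypothetical, and the thresholds are the ones given in this section. Ambiguous bases such as N are left out of the calculation.

```python
def read_fasta_sequence(path: str) -> str:
    """Concatenate all sequence lines in a FASTA file; header lines ('>') are skipped."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))

def gc_content(sequence: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return gc / (gc + at)

def is_gc_biased(sequence: str, low: float = 0.40, high: float = 0.60) -> bool:
    """True if GC content is below `low` or above `high` (40% / 60% by default)."""
    gc = gc_content(sequence)
    return gc < low or gc > high

print(gc_content("GGCCATGC"))    # 0.75
print(is_gc_biased("GGCCATGC"))  # True: above 60%, so a PCR-free prep is worth considering
```

For a whole genome you might call `is_gc_biased(read_fasta_sequence("related_genome.fa"))`, where the file name is only a placeholder. Local GC extremes can still cause PCR bias even when the genome-wide figure looks balanced.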

De-novo sequencing:

In addition, if you are planning a de-novo sequencing experiment, it is highly recommended that several libraries are prepared with different insert sizes. We would recommend one library with an insert size of 200-500 bp along with a long-read sequencing library (10-20 kb fragments) sequenced on the Pacific Biosciences Sequel or Oxford Nanopore Technologies platform.

RNA-seq

The required sequencing depth depends on the biological variability between replicates as well as on the organism. Typically, for prokaryote RNA-seq we would suggest aiming for 2-5 million reads per biological replicate, and for eukaryotes anywhere from 10-40 million reads per replicate. Aim for the higher end of these ranges if you wish to quantify splice variants and/or transcription start sites, or if your genes of interest are likely to be expressed at low levels.

The RNA Seq Power calculator spreadsheet may assist you in planning your experiment, but it is only a guide.

We generally recommend a minimum of 4 biological replicates per condition, assuming a well-controlled environment. This allows for the possibility that some replicates may be more variable than you anticipate. If you are dealing with environmental samples, where biological variability is much higher, you may need double this number.
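The per-replicate read ranges and replicate numbers above combine into a rough total for the whole experiment. A minimal Python sketch, assuming every library is sequenced to the same depth; the function names are hypothetical, and the reads-per-lane figure in the example is a placeholder, since output varies by instrument and run type.

```python
import math

# Per-replicate read ranges (millions of reads) suggested in the text above.
READS_PER_REPLICATE_M = {"prokaryote": (2, 5), "eukaryote": (10, 40)}

def total_reads_range_m(organism: str, conditions: int, replicates_per_condition: int = 4) -> tuple:
    """(low, high) total reads in millions for all libraries in the experiment."""
    low, high = READS_PER_REPLICATE_M[organism]
    n_libraries = conditions * replicates_per_condition
    return n_libraries * low, n_libraries * high

def lanes_needed(total_reads_m: float, reads_per_lane_m: float) -> int:
    """Lanes required, rounding up; reads_per_lane_m is a hypothetical input."""
    return math.ceil(total_reads_m / reads_per_lane_m)

# Two conditions, 4 replicates each (well-controlled) vs 8 each (environmental).
print(total_reads_range_m("eukaryote", 2))      # (80, 320)
print(total_reads_range_m("eukaryote", 2, 8))   # (160, 640)
print(lanes_needed(640, 400))                   # 2, assuming 400 M reads per lane
```

Keeping replicates as an explicit parameter makes it easy to see how doubling them for variable samples doubles the sequencing budget.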

Note that the PacBio Iso-Seq method can only be used to identify full-length transcripts; it does not provide quantitative information.

The first decision is whether to isolate polyadenylated (polyA) RNA or to perform rRNA depletion. One of these steps is necessary to remove the highly abundant rRNA from your sample.

PolyA isolation has the advantage of being cheaper and potentially introducing less bias at the gene level. However, it can lead to enrichment of reads towards the 3′ end of transcripts, which can bias isoform estimation, and it loses any non-polyadenylated RNA (e.g. histone mRNA).

rRNA depletion is typically more expensive and requires a depletion method that is suitable for removing the rRNA of your organism.

For all RNA-seq library preps we strongly recommend using a directional (strand-specific) library, such as Illumina TruSeq directional. Please see the sample preparation pages for more details of individual protocols.

Note that if you are looking at RNA species shorter than 50-100 nt, the standard RNA library preparation methods are not appropriate, as they tend to lose these smaller fragments during purification. Instead, a specialised small RNA prep needs to be performed.

Amplicon sequencing

This can take many forms (e.g. 16S/18S/COI), and we recommend that you contact us in advance of your experiment to discuss the best methodology. Typically we will recommend that sequencing adaptors and multiplex barcodes are added by the user in a two-step PCR.
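In a two-step PCR, the first-round primers combine a locus-specific sequence with a 5′ overhang, and the second round anneals to that overhang to add the barcodes and flow-cell adaptors. The sketch below builds first-round primers using, as an example only, the overhang sequences and 16S V3-V4 primers published in Illumina's 16S Metagenomic Sequencing Library Preparation guide; the overhangs and primers we recommend for your project may differ, and the function name is hypothetical.

```python
# Example overhang adapter sequences (5'->3') from Illumina's 16S Metagenomic
# Sequencing Library Preparation guide; illustrative only.
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Example locus-specific primers for the 16S V3-V4 region (IUPAC codes allowed).
LOCUS_FORWARD = "CCTACGGGNGGCWGCAG"
LOCUS_REVERSE = "GACTACHVGGGTATCTAATCC"

IUPAC = set("ACGTRYSWKMBDHVN")

def first_round_primer(overhang: str, locus_primer: str) -> str:
    """Step-1 primer: adapter overhang placed 5' of the locus-specific sequence."""
    for seq in (overhang, locus_primer):
        bad = set(seq.upper()) - IUPAC
        if bad:
            raise ValueError(f"non-IUPAC characters: {sorted(bad)}")
    return (overhang + locus_primer).upper()

fwd = first_round_primer(FORWARD_OVERHANG, LOCUS_FORWARD)
rev = first_round_primer(REVERSE_OVERHANG, LOCUS_REVERSE)
print(fwd)  # 50 nt
print(rev)  # 55 nt
```

The barcodes themselves are added in the second PCR, so the same first-round primers can be used for every sample.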

Note that cluster densities achieved for amplicon sequencing on Illumina platforms are typically 30-60% lower than for more complex libraries.
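To see what a 30-60% reduction means in practice, a minimal Python sketch that estimates how many amplicon libraries might share a run. It assumes read output scales with cluster density; the nominal run output and reads-per-sample target are hypothetical parameters, so please use the figures for the specific run type you discuss with us. Integer arithmetic avoids floating-point rounding in the sample count.

```python
def max_samples(nominal_reads: int, density_reduction_pct: int, reads_per_sample: int) -> int:
    """Samples per run after reducing nominal output by a percentage cluster-density loss."""
    if not 0 <= density_reduction_pct < 100:
        raise ValueError("density_reduction_pct must be in [0, 100)")
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    effective_reads = nominal_reads * (100 - density_reduction_pct) // 100
    return effective_reads // reads_per_sample

# Hypothetical run yielding 20 M reads for complex libraries, 100,000 reads per sample.
print(max_samples(20_000_000, 30, 100_000))  # 140 samples at a 30% reduction
print(max_samples(20_000_000, 60, 100_000))  # 80 samples at a 60% reduction
```

Planning against the 60% end of the range gives some headroom if the run underperforms.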

ChIP-Seq, Meth-Seq and RAD-Seq

For these methods, please contact us directly to discuss your experiment before starting any extraction.