Pacific Biosciences Sequel
The Sequel from Pacific Biosciences (PacBio) is our primary long-read platform. It is based on PacBio's standard chemistry, as described in the long-read sequencing overview. DNA is prepared for sequencing by ligating dumbbell-shaped adaptors to it, often followed by size selection to remove shorter fragments (Figure 1). Thanks to modified reagents, the Sequel has less size-dependent loading bias than the earlier RSII instrument.
Each SMRTcell contains 1,000,000 zero-mode waveguides (ZMWs). These are nanoscale holes whose diameter is smaller than the wavelength of the incident laser light used to illuminate the enzyme and excite fluorescence in the labelled dNTPs. See Figure 2 below for an illustration of Sequel SMRT cells.
Figure 2: RSII SMRT cells (left) compared to Sequel SMRT cells (right)
Figure 3: Overview of PacBio sequencing (based on RSII sequencer)
An overview of PacBio-based sequencing, including an expanded cut-away view of the SMRTcell, is shown in Figure 3. The laser light enters through a transparent window at the bottom of the SMRTcell. The small diameter of the ZMW means that light intensity falls off rapidly as it passes into the ZMW (i.e. the light behaves as an evanescent wave rather than a travelling wave). This confines illumination to the bottom of the ZMW, where the polymerase sits, whilst reducing the fluorescent signal from free dNTPs that diffuse in and out of the ZMW. In other words, the dNTPs being incorporated by the polymerase contribute most of the fluorescent signal.
During the sequencing reaction, each nucleotide incorporation event generates an extended pulse of fluorescent light (Figure 4), because the polymerase has been engineered to favour an extended incorporation time. From the timing and intensity of these pulses, the identity of each base, and in some cases base modifications, can be inferred.
Under optimal conditions, a typical Sequel SMRTcell yields 6-12 Gb of data, depending on fragment length. As a guide, and depending on the quality of the input material and the type of library preparation, this may be sufficient to completely assemble up to 16 bacterial genomes, although some bacteria may require additional data.
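As a rough planning aid, the yield figures above can be turned into an expected fold-coverage per genome. The sketch below is only back-of-the-envelope arithmetic: the 5 Mb genome size is an assumed, illustrative value (not a figure from this page), the function name is hypothetical, and equal sharing of the SMRTcell between genomes is also an assumption.

```python
def coverage_per_genome(yield_gb: float, genome_size_mb: float, n_genomes: int) -> float:
    """Mean fold-coverage per genome if one SMRTcell is shared equally."""
    total_bases = yield_gb * 1e9                 # Gb -> bases
    bases_per_genome = total_bases / n_genomes   # assumes equal sharing
    return bases_per_genome / (genome_size_mb * 1e6)


# Yield range quoted above, split across 16 genomes.
# The 5 Mb genome size is an assumption chosen for illustration only.
for yield_gb in (6, 12):
    cov = coverage_per_genome(yield_gb, genome_size_mb=5.0, n_genomes=16)
    print(f"{yield_gb} Gb across 16 genomes: about {cov:.0f}x per genome")
```

The result can be compared with whatever coverage the chosen assembly method requires; larger genomes or uneven pooling would lower the per-genome figure.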
Figure 4: Bound polymerase incorporating fluorescently labelled dNTPs
Because PacBio libraries are circular, there is a trade-off between template length, read length and read quality. Figure 5 illustrates this.
Figure 5: PacBio data quality
For example, a polymerase that is able to read for 10 kb could read a single 10 kb template once, or it could read a 2 kb template 5 times. The 2 kb fragment, having been read 5 times, would be of much higher quality than the 10 kb fragment: read errors are random, so multiple reads of the same molecule iron them out. Important PacBio terminology is highlighted in Figure 5.
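The effect of repeated passes can be illustrated with a small simulation. This is a deliberately simplified sketch, not how PacBio consensus calling works: it assumes substitution errors only (real reads also contain insertions and deletions), an illustrative per-base error rate of 15%, a simple per-position majority vote, and it ignores adaptor length when counting passes. All function names are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def number_of_passes(polymerase_read_len: int, template_len: int) -> int:
    """Whole passes of the template that fit in one polymerase read."""
    return polymerase_read_len // template_len


def noisy_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy the template, substituting a wrong base at random positions."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def majority_consensus(passes: list[str]) -> str:
    """Take the most common base at each position across all passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def error_rate(seq: str, truth: str) -> float:
    """Fraction of positions where seq differs from the true template."""
    return sum(a != b for a, b in zip(seq, truth)) / len(truth)


rng = random.Random(0)  # fixed seed so the run is repeatable
template = "".join(rng.choice(BASES) for _ in range(2_000))

n_passes = number_of_passes(10_000, 2_000)  # the 10 kb / 2 kb example above
passes = [noisy_copy(template, 0.15, rng) for _ in range(n_passes)]

single_pass_error = error_rate(passes[0], template)
consensus_error = error_rate(majority_consensus(passes), template)
print(f"passes: {n_passes}")
print(f"single-pass error rate: {single_pass_error:.3f}")
print(f"consensus error rate:   {consensus_error:.3f}")
```

Because the simulated errors fall at independent random positions in each pass, the majority vote recovers the correct base at most positions, so the consensus error rate comes out well below that of any single pass.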
A polymerase read refers to the complete set of basecalls produced as the polymerase sequences the forward strand, adaptor and reverse strand. Internally, we remove the adaptor sequences and provide the subreads (just the forward and reverse sequences of the template). These can be further analysed to produce circular consensus sequence (CCS) reads if the enzyme has made more than one pass of the molecule.
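The relationship between a polymerase read and its subreads can be sketched as follows. This is a toy model under strong assumptions: the read is error-free, both hairpin adaptors are represented by the same made-up sequence (`HYPOTHETICAL_ADAPTOR` is not a real PacBio adaptor), and adaptors are found by exact string matching, whereas production software has to locate them in noisy data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
HYPOTHETICAL_ADAPTOR = "GGGTTTAAACCC"  # made-up placeholder, not a real adaptor


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_subreads(polymerase_read: str, adaptor: str) -> list[str]:
    """Remove adaptor sequences and return the template pieces between them."""
    return [piece for piece in polymerase_read.split(adaptor) if piece]


def orient_subreads(subreads: list[str]) -> list[str]:
    """Flip every second subread so all share the forward-strand orientation."""
    return [
        s if i % 2 == 0 else reverse_complement(s)
        for i, s in enumerate(subreads)
    ]


# Toy polymerase read: forward strand, adaptor, reverse strand, adaptor, forward.
template = "ACGTTGCAAGGCTA"
polymerase_read = HYPOTHETICAL_ADAPTOR.join(
    [template, reverse_complement(template), template]
)

subreads = split_into_subreads(polymerase_read, HYPOTHETICAL_ADAPTOR)
oriented = orient_subreads(subreads)
print(f"{len(subreads)} subreads recovered")
print("all passes match the template:", all(s == template for s in oriented))
```

Once the subreads share an orientation they can be compared position by position, which is the starting point for building a consensus from multiple passes.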
Current performance:
Using Sequel v3 chemistry with a 10-hour movie time (see Figure 6):
- 6-12 Gb of sequence data
- >300,000 reads per SMRTcell
- Median polymerase read length of 12-18 kb
Figure 6: Example Base Yield Density Plots from v3 chemistry (bases read vs polymerase read length)
How to generate good data
- Good data is critically dependent on high-quality, contaminant-free DNA. It is crucial to optimise DNA extraction protocols. Additional information on this can be found on our sample preparation page.
- To obtain the longest reads, 5 µg of starting material is an absolute minimum.
- Of the 1,000,000 ZMWs, only 30%-70% are expected to yield useful data. This is because some ZMWs will contain more than one polymerase (making it impossible to basecall), whilst others may contain no polymerases.
- The loading of SMRTcells needs to be optimised for each library. This can mean that for any project, the first one or two SMRTcells produce less or lower quality data than the remaining SMRTcells.
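The loading figures in the list above translate into read counts with simple arithmetic, sketched below. The 15 kb mean read length is a hypothetical input chosen for illustration (this page quotes a median range, not a mean), and the function names are likewise assumptions.

```python
TOTAL_ZMWS = 1_000_000  # ZMWs per Sequel SMRTcell, as stated above


def productive_zmws(total_zmws: int, percent_productive: int) -> int:
    """Number of ZMWs expected to yield useful data."""
    return total_zmws * percent_productive // 100


def approx_yield_gb(n_reads: int, mean_read_length_bp: int) -> float:
    """Rough yield in Gb: number of reads times mean read length."""
    return n_reads * mean_read_length_bp / 1e9


# Assumed value for illustration only; measure it from your own run reports.
HYPOTHETICAL_MEAN_READ_LENGTH_BP = 15_000

for percent in (30, 70):
    n_reads = productive_zmws(TOTAL_ZMWS, percent)
    gb = approx_yield_gb(n_reads, HYPOTHETICAL_MEAN_READ_LENGTH_BP)
    print(f"{percent}% productive: {n_reads:,} reads, roughly {gb:.1f} Gb")
```

Comparing this kind of estimate with the actual output of the first SMRTcell gives a quick indication of whether loading for a library still needs adjusting.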