Second Generation Sequencing Technology


Sequencing Technologies at BCM-HGSC

Never has DNA sequencing technology been in greater flux than it is today. Fluorescence-based Sanger sequencing, long the steadfast standard, appears to have reached the limit of its technological improvement. It is being replaced by emerging technologies that promise sequence information faster, more cheaply, and in far greater volume than ever before. These next-generation methods enable research that would be impractical or prohibitively expensive under the Sanger paradigm. The transition opens new possibilities in large-scale genomic science, and it brings new challenges in data storage and analysis.

Below are brief technical descriptions of the high-throughput platforms currently used for large-scale sequence production at the BCM-HGSC. For every platform, the library group first shears DNA with nebulizers and size-selects the fragments with a gel cut. After end repair, adaptors are ligated to the resulting fragments before the platform-specific steps below:
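The shared library steps can be pictured with a toy model. The Python sketch below is illustrative only: it assumes nebulization breaks DNA at uniformly random positions, treats end repair as already done, and uses two made-up adaptor strings (`ADAPTOR_A`, `ADAPTOR_B`) and a hypothetical 400-600 bp gel window. The function names are likewise hypothetical.

```python
import random

ADAPTOR_A = "ACGTACGTAC"  # hypothetical placeholder, not a real platform adaptor
ADAPTOR_B = "TGCATGCATG"  # hypothetical placeholder, not a real platform adaptor

def shear(seq, mean_fragment, rng):
    """Cut seq at uniformly random positions (a crude stand-in for nebulization)."""
    n_breaks = max(1, len(seq) // mean_fragment - 1)
    cuts = sorted(rng.sample(range(1, len(seq)), n_breaks))
    edges = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]

def size_select(fragments, low, high):
    """Keep only fragments whose length falls inside the gel-cut window [low, high]."""
    return [f for f in fragments if low <= len(f) <= high]

def ligate_adaptors(fragments):
    """Flank each fragment (assumed already end-repaired) with the two adaptors."""
    return [ADAPTOR_A + f + ADAPTOR_B for f in fragments]

rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
fragments = shear(genome, 500, rng)
library = ligate_adaptors(size_select(fragments, 400, 600))
print(f"{len(fragments)} fragments sheared, {len(library)} kept after size selection")
```

In this crude model, random breakage leaves a broad spread of fragment lengths, so the gel cut discards most of the material in exchange for a library with a narrow insert-size range.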

Pyrosequencing: Roche

After selection and denaturation, fragments are hybridized to capture beads and then amplified by emulsion PCR (emPCR). After the emulsion is broken, approximately 1 to 2 million beads, together with polymerase and cofactors, are deposited into the individual wells of a PicoTiterPlate (PTP), along with enzyme beads carrying sulphurylase and luciferase, and packing beads. Once the plate is loaded into the GS-FLX instrument, individual dNTPs are dispensed in a predetermined order and flowed sequentially across the wells. When a complementary dNTP is incorporated, the released pyrophosphate (PPi) is converted into ATP by sulphurylase; luciferase then uses this ATP to oxidize luciferin to oxyluciferin, producing light. Approximately 400,000 reads of 200-300 bases are recorded as flowgrams over a run of about eight hours. For homopolymer runs of up to about six nucleotides, the light signal is directly proportional to the number of dNTPs incorporated. The newer Titanium series of reagents and PTPs yields approximately one million reads averaging 350-400 bases in a ten-hour run. Insertions are the most common error type, followed by deletions.
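The flowgram idea can be modeled in a few lines of Python. This is a minimal sketch, assuming an idealized cyclic flow order of `TACG` and a signal exactly equal to the number of bases incorporated per flow; `to_flowgram` and `call_bases` are hypothetical helpers, not part of the instrument software.

```python
FLOW_ORDER = "TACG"  # assumed cyclic dispensation order; illustrative only

def to_flowgram(read, n_flows, flow_order=FLOW_ORDER):
    """Ideal flowgram: how many bases are incorporated at each nucleotide flow.

    `read` is the synthesized strand, so its bases are the dNTPs consumed.
    """
    signals, i = [], 0
    for f in range(n_flows):
        base = flow_order[f % len(flow_order)]
        run = 0
        while i < len(read) and read[i] == base:  # a whole homopolymer extends in one flow
            run += 1
            i += 1
        signals.append(run)
    return signals

def call_bases(signals, flow_order=FLOW_ORDER):
    """Decode a flowgram by rounding each (light-proportional) signal to a run length."""
    return "".join(
        flow_order[f % len(flow_order)] * max(0, round(s))
        for f, s in enumerate(signals)
    )

ideal = to_flowgram("TTAGGGCA", 12)
print(ideal)              # [2, 1, 0, 3, 0, 0, 1, 0, 0, 1, 0, 0]
print(call_bases(ideal))  # TTAGGGCA

# Noisy signal: the GGG run gives 3.6 units of light and rounds to 4.
noisy = [2.1, 0.9, 0.2, 3.6, 0.1, 0.0, 1.05, 0.1, 0.0, 0.95, 0.0, 0.1]
print(call_bases(noisy))  # TTAGGGGCA: an extra G, i.e. an insertion error
```

Because a homopolymer's length is read from signal intensity rather than from separate events, a slightly inflated or deflated signal adds or drops a base, which is consistent with indels dominating this platform's error profile.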

Sequencing by ligation: Applied Biosystems

Approximately 100 million emPCR-prepared template beads are covalently attached to a glass slide. After a universal primer is annealed, a library of di-base probes is added. Under appropriate conditions, probes selectively hybridize and are ligated at complementary positions. The first and second positions of each di-base probe serve as the interrogation bases, so that all 16 possible dinucleotides are encoded by four dyes. Following four-color imaging, the ligated di-base probes are chemically cleaved to generate a 5' phosphate group ready for the next cycle of hybridization, ligation, imaging, and cleavage; the number of cycles determines the eventual read length. The extended primer is then stripped from the templates, and a second ligation round is performed with an n-1 primer, shifting the interrogation bases one position to the left. The remaining ligation cycles follow, and the primer is then reset three more times, each time one further position to the left. Because of these offsets, each base is interrogated twice, which improves the accuracy of the color calls. The resulting string of color calls, encoded in color space, is then aligned to a reference genome to decode the DNA sequence. Substitutions are the most common error type.
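Two ideas in this paragraph, two-base color encoding and the primer resets, can be sketched in Python. The sketch assumes the commonly described color scheme in which bases are coded A=0, C=1, G=2, T=3 and a dinucleotide's color is the bitwise XOR of the two codes, and it assumes that each ligation cycle advances five bases with five primer rounds (n through n-4). Treat both as illustrative assumptions; the function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit base codes
CODE_BASE = "ACGT"

def encode(seq):
    """Two-base encoding: one color (0-3) per overlapping dinucleotide."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Rebuild bases from colors; each base depends on the one before it."""
    bases = [first_base]
    for c in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ c])
    return "".join(bases)

def interrogated_positions(read_len, n_primers=5, step=5):
    """Record which primer rounds read each template position 1..read_len.

    Round r uses primer n-r, so its first probe covers positions (1-r, 2-r);
    each later cycle is assumed to move `step` bases downstream.
    """
    hits = {p: [] for p in range(1, read_len + 1)}
    for r in range(n_primers):
        for start in range(1 - r, read_len + 1, step):
            for p in (start, start + 1):  # the two interrogation bases
                if p in hits:
                    hits[p].append(r)
    return hits

ref = "TACGGA"
colors = encode(ref)
print(colors)                         # [3, 1, 3, 0, 2]
print(decode("T", colors))            # TACGGA

bad = colors[:2] + [0] + colors[3:]   # one miscalled color
print(decode("T", bad))               # TACCCT: every later base is wrong

snp = encode("TACTGA")                # a real G->T change
print([i for i, (a, b) in enumerate(zip(colors, snp)) if a != b])  # [2, 3]

print(interrogated_positions(10)[2])  # [0, 4]: read by primers n and n-4
```

Decoding needs a known first base, typically the final base of the adaptor. A single miscalled color corrupts every downstream base, whereas a genuine SNP changes two adjacent colors; comparing reads with the reference in color space, rather than decoding them first, lets the two cases be told apart.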

Reversible terminators: Illumina

DNA fragments ending in adaptors complementary to the forward and reverse PCR primers are randomly distributed across the eight channels of a glass slide (the flow cell), to which high-density forward and reverse primers are covalently attached. In the Cluster Station, unlabeled nucleotides and polymerase are added to the individual single-stranded templates, driving solid-phase bridge amplification that produces about 80 million clusters, each containing approximately 1,000 copies of the same template. After the flow cell is inserted into the Genome Analyzer, sequencing is initiated by annealing a primer to the free ends of the templates in each cluster. The polymerase extends each primer by a single base drawn from a set of four reversible terminators (RTs), each labeled with a different fluorescent dye, and synthesis then halts. Unincorporated RTs are washed away, the incorporated base is identified by four-color imaging, and the blocking and dye groups are removed by chemical cleavage to permit the next cycle. The successive color images of a given cluster yield reads of up to about 75 bases. Substitutions are the most common error type.
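Because each cycle adds exactly one reversible terminator, base calling can in principle be as simple as picking the brightest of the four dye channels per cycle. The Python sketch below assumes that idealized picture: the intensity values, the A/C/G/T channel order, the confidence measure, and the `call_read` function are all hypothetical, and production base-calling software applies considerably more signal correction.

```python
BASES = "ACGT"  # channel order assumed for this sketch

def call_read(cycles):
    """Call one base per cycle from the brightest of four dye channels.

    cycles: list of (A, C, G, T) intensity tuples, one per imaging cycle
    (hypothetical, already background-corrected values).
    Returns the read and a crude per-base confidence: the brightest
    channel's share of the total signal in that cycle.
    """
    read, confidence = [], []
    for intensities in cycles:
        best = max(range(4), key=lambda ch: intensities[ch])
        total = sum(intensities)
        read.append(BASES[best])
        confidence.append(intensities[best] / total if total > 0 else 0.0)
    return "".join(read), confidence

cycles = [
    (0.90, 0.05, 0.03, 0.02),  # clear A
    (0.10, 0.80, 0.05, 0.05),  # clear C
    (0.05, 0.10, 0.10, 0.75),  # clear T
    (0.30, 0.10, 0.55, 0.05),  # G, but with a competing A signal: a likely substitution site
]
read, confidence = call_read(cycles)
print(read)  # ACTG
print([round(c, 2) for c in confidence])
```

Since exactly one base is called per cycle, an ambiguous signal yields a wrong base rather than an extra or missing one, consistent with substitutions being the dominant error type on this platform.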

Selected PubMed Citations

Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008 Apr 17;452(7189):872-6.