Next Generation Sequencing Buying Guide
If you are looking to invest in Next Generation Sequencing Technology, this guide aims to provide the essential information you need to make informed decisions.
Learn about the key platform technologies, considerations for sample preparation, NGS software, key application areas and the future for NGS.
2. Next Generation Sequencing Technology
2.1 Cyclic Reversible Termination
2.2 Pyrosequencing
2.3 Sequencing by Ligation
2.4 Real-Time Sequencing
2.5 General Considerations
3. Considerations for Library Preparation
3.1 Consider Sample Preparation
3.2 Consider Automated Sample Preparation
3.3 Consider Target Enrichment
4. Considerations for NGS Software
5. Applications of NGS
5.1 Genome Sequencing/Resequencing
5.2 DNA-Protein Interactions and Epigenome Sequencing
5.3 Transcriptome Sequencing
5.4 RNA-Protein Interactions
5.5 Small RNA Sequencing
6. The Future of NGS
In 2013 we will celebrate the 10th anniversary of the completion of the Human Genome Project, which was launched in 1990 and thus took 13 years to complete. Ten years on, a human genome can be sequenced in less than 10 days and soon will be sequenced routinely within a day. This remarkable progress over the past decade is due to significant advances in DNA sequencing from Sanger sequencing, which has been dominant for almost 30 years, to next generation sequencing (NGS, also termed massively parallel sequencing).
Sanger sequencing, often considered first generation sequencing technology, relies on chain termination by fluorescently labeled dideoxynucleotides. In its automated form, capillary electrophoresis separates the resulting DNA fragments by size, and the sequence is read by detecting the fluorescent label on the final base of each fragment. This technology, which has been widely adopted in laboratories across the world and is still extremely important today, has always been hampered by inherent limitations in throughput, scalability, speed and resolution.
The limitations associated with Sanger sequencing have catalyzed the development of NGS technologies, which can produce large volumes of sequence data quickly and inexpensively. NGS enables rapid sequencing of large stretches of DNA spanning entire genomes, with some instruments capable of producing hundreds of gigabases of data in a single sequencing run. The read length (the number of continuous bases sequenced from a single fragment) is much shorter in NGS than in Sanger sequencing: at present, NGS reads are only 50-500 base pairs long. Short reads are currently the major limitation of NGS.
NGS technology is evolving at an unprecedented speed. Scientists can now routinely examine a single genome a large number of times, observe individual changes, study population variations, study metagenomics, differentiate cancer genomes from healthy genomes, study the epigenome and investigate the possibility of personalized medicine among other applications.
Next Generation Sequencing Technology
Although next generation sequencers are usually discussed together as a single group, the methods they employ are quite different. All NGS methods rely on generating representative, unbiased sources of nucleic acid templates from the complex genomes being interrogated, but they differ in their chemistries. A common feature, however, is that they allow up to billions of individual DNA templates to be sequenced in a single reaction.
Current platforms on the market utilize one or more of the following technologies:
Cyclic Reversible Termination (CRT)
The CRT method takes advantage of the natural DNA replication process to extract genetic information from genomic DNA strands. CRT involves parallel sequencing of multiple fragments using reversible terminators that permit the detection of each single base that is incorporated into a growing DNA strand. As each complementary dNTP is added, DNA synthesis is terminated and the fluorescently-labeled terminator is imaged. The terminating group and fluorescent dye on the dNTP are then cleaved, permitting the incorporation of the next modified nucleotide. The incorporation, detection and cleavage steps constitute a single cycle, which is repeated multiple times to achieve the desired read length. Some sequencers utilize a four-color CRT cycle, while others use a one-color CRT cycle.
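The cycle described above can be made concrete with a toy simulation. This is a minimal Python sketch under idealized assumptions, not a model of any real instrument: each cycle incorporates exactly one terminator, four hypothetical dye channels are imaged, the brightest channel is called, and cleavage always succeeds. The function names and noise model are illustrative only.

```python
import random

# Watson-Crick pairing: which terminator the polymerase incorporates opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Four-color chemistry: one hypothetical dye channel per base.
CHANNELS = "ACGT"


def image_cycle(incorporated: str, rng: random.Random) -> dict[str, float]:
    """Simulated intensities: full signal in the incorporated base's channel,
    plus low uniform background noise in every channel (assumed model)."""
    return {
        ch: (1.0 if ch == incorporated else 0.0) + rng.uniform(0.0, 0.2)
        for ch in CHANNELS
    }


def sequence_by_crt(template: str, cycles: int, seed: int = 0) -> str:
    """Return the bases read over `cycles` CRT cycles.

    The read is the newly synthesized strand, i.e. the complement of `template`.
    """
    rng = random.Random(seed)
    read = []
    for i in range(min(cycles, len(template))):
        # Incorporation: one reversibly terminated dNTP is added, then synthesis stops.
        incorporated = COMPLEMENT[template[i]]
        # Detection: image all four channels and call the brightest one.
        intensities = image_cycle(incorporated, rng)
        read.append(max(intensities, key=intensities.get))
        # Cleavage: dye and terminating group are removed, freeing the strand
        # for the next cycle (implicit in this model).
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_crt("TACGGATCCA", cycles=10))  # ATGCCTAGGT
```

In this idealized model the signal always exceeds the background, so every base is called correctly; real instruments need considerably more sophisticated base calling.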
CRT is often the method of choice for applications such as de novo sequencing, chromatin immunoprecipitation sequencing (ChIP-Seq), whole-genome sequencing (WGS), targeted resequencing, microRNA discovery, expression tags and splice variants.
Pyrosequencing (Single Nucleotide Addition)
Pyrosequencing is a non-electrophoretic bioluminescence method that measures the release of inorganic pyrophosphate (PPi) by proportionally converting it into visible light using a series of enzymatic reactions. Unlike other sequencing approaches that use modified nucleotides to terminate DNA synthesis, the pyrosequencing method manipulates DNA polymerase by the addition of a dNTP in limiting amounts. Upon incorporation of the complementary dNTP, DNA polymerase extends the primer and pauses. DNA synthesis is reinitiated following the addition of the next complementary dNTP in the dispensing cycle. The user controls at which point each A, T, C or G dNTP is sequentially dispensed, allowing easy tracking of the desired sequence loci. The order and intensity of the light peaks are recorded as flowgrams, which reveal the underlying DNA sequence.
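To show how a flowgram maps back to sequence, here is a minimal decoder. It assumes idealized signals in which light output is proportional to the number of identical bases incorporated during a flow (so a run of three identical bases gives roughly three units); the dispensation order "TACG" and the intensity values are hypothetical.

```python
def decode_flowgram(flow_order: str, intensities: list[float]) -> str:
    """Convert flowgram intensities into the synthesized base sequence.

    Nucleotides are dispensed in `flow_order`, repeated cyclically. Light output
    is assumed proportional to the number of bases incorporated in a flow, so
    each intensity is rounded to a whole homopolymer length (0 = no incorporation).
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        homopolymer_length = max(0, round(signal))
        bases.append(nucleotide * homopolymer_length)
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical flowgram for the dispensation order T, A, C, G, T, A, C, G.
    peaks = [1.02, 0.05, 2.10, 0.97, 0.10, 2.90, 0.03, 1.05]
    print(decode_flowgram("TACG", peaks))  # TCCGAAAG
```

Flows with no incorporation (values near zero) simply contribute no bases, which is why the order of dispensation and the peak heights together reveal the sequence.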
Currently, the pyrosequencing method is used for a broad range of applications such as SNP genotyping, identification of bacteria, DNA methylation analysis and whole genome sequencing. Pyrosequencing technology has many advantages over other DNA sequencing methods: for example, nucleotide dispensation can be easily programmed, and alterations in the pyrogram pattern can reveal mutations, deletions and insertions.
Sequencing by Ligation (SBL)
SBL is another cyclic method that differs from CRT in its use of DNA ligase and either one-base-encoded probes or two-base-encoded probes. Unlike most currently popular DNA sequencing methods, this method does not use a DNA polymerase to create a second strand. Instead, the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the target DNA molecule. In its simplest form, a fluorescently labeled probe hybridizes to its complementary sequence adjacent to the primed template. The dye-labeled probe is then joined to the primer following the addition of DNA ligase. Non-ligated probes are washed away, and the ligated probe is identified using fluorescent imaging. The cycle can be repeated by either using cleavable probes to remove the fluorescent probe and regenerate a 5’-PO4 group for subsequent ligation cycles, or by removing and hybridizing a new primer to the template.
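Two-base encoding can be illustrated with the color-space scheme used on the SOLiD platform, where each color reports a pair of adjacent bases rather than a single base. The sketch assumes the common 2-bit convention A=0, C=1, G=2, T=3, under which the color of a dinucleotide is the XOR of its two base codes, and it assumes the first base is known (in practice it comes from the adaptor). It illustrates the encoding only and is not platform software.

```python
# Assumed 2-bit base codes; under this convention color = code(b1) XOR code(b2).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def encode(seq: str) -> list[int]:
    """Encode a base sequence as colors (0-3), one per adjacent base pair."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Recover bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse: next base = previous base XOR color.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    colors = encode("TACGGA")
    print(colors)               # [3, 1, 3, 0, 2]
    print(decode("T", colors))  # TACGGA
```

Decoding is sequential, so a single miscalled color changes every base after it: `decode("T", [3, 2, 3, 0, 2])` returns `"TAGCCT"` instead of `"TACGGA"`. This is one reason two-base-encoded data can be harder to interpret.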
SBL has the advantage of being easy to implement and widely accessible, because it can be performed with off-the-shelf reagents. However, SBL is limited by very short read lengths, and its data can be harder to interpret. Sequencing by ligation supports a wide range of applications such as whole transcriptome analysis, small RNA expression and serial analysis of gene expression (SAGE).
Real-Time Sequencing (Phospholinked Fluorescent Nucleotides)
Real-time sequencing involves imaging the continuous incorporation of dye-labeled nucleotides during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide (ZMW) detectors, which obtain sequence information as phospholinked nucleotides are incorporated into the growing primer strand. This method is still under development and requires further improvement because of its high error rate, made up of insertions, deletions and mismatches.
The chemistry used by an NGS platform is only one of the ways in which platforms differ, and there are several other factors to consider. Each of the NGS instruments currently on the market has its pros and cons, so you will need to consider both your current and future needs when choosing a technology. Budget is likely to have a huge impact on your decision, but be prepared for the fact that the technology with the lowest cost per base may not be the one you need.
Principally, you will be concerned with performance metrics such as read length, accuracy and total sequence output, but you will also need to consider the basics. For example, you will need to think about your throughput requirements: a core facility is likely to have very different needs from a small independent laboratory. It is important to consider the extent of technical support and training available from the manufacturer at the point of purchase, and it is also recommended that you look into the company’s after-sales service policy. Usability and reliability of the equipment are extremely important, so research instruments thoroughly, read user reviews and request a product demonstration before purchase.
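When translating throughput requirements into instrument specifications, a quick back-of-the-envelope calculation can help. The sketch below assumes the simple Lander-Waterman approximation (mean coverage = number of reads × read length / genome size) and ignores duplicate, unmappable and filtered reads. The run figures are hypothetical and are not specifications of any instrument.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def samples_per_run(run_output_bases: float, genome_size: int, target_coverage: float) -> int:
    """Number of whole samples that fit in one run at the target mean coverage."""
    return int(run_output_bases // (genome_size * target_coverage))


if __name__ == "__main__":
    human_genome = 3_100_000_000  # approximate haploid human genome size (bases)
    # Hypothetical run: 400 million reads of 100 bases.
    print(f"{mean_coverage(400_000_000, 100, human_genome):.1f}x")  # 12.9x
    # Hypothetical run output of 600 gigabases, at 30x per genome.
    print(samples_per_run(600e9, human_genome, 30))  # 6
```

Because real runs lose some reads to quality filtering and duplicates, treat these numbers as upper bounds when comparing instruments.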
Considerations for Library Preparation
In addition to considering your application and the capabilities of your NGS platform, there are several other important considerations that should not be overlooked.
Consider Sample Preparation
Success on any next-generation sequencing platform begins with optimal sample preparation - from sample isolation and purification to library construction and enrichment. As with any scientific methodology, it is well understood that the quality of sequencing data depends highly upon the quality of the sequenced material. Reagent kits that simplify and standardize the process of converting a DNA sample into a sequencing library, and, if desired, prepare it for multiplexing, can be purchased from both sequencer manufacturers and third-party vendors.
The best NGS data begins with an optimal sample. There are many commercially available purification kits that can be used to extract DNA and RNA from a diverse range of sample types, and it is important to choose one that will give you high yields of pure DNA or RNA for your NGS workflow. Throughout NGS library preparation you will need established methods for determining DNA quantity and quality. Typically, DNA is quantified and its purity assessed using a UV/VIS spectrophotometer, and its integrity is checked by visualization on an agarose gel.
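Spectrophotometric quantification rests on two widely used conventions: an absorbance at 260 nm of 1.0 (1 cm path length) corresponds to roughly 50 µg/mL of double-stranded DNA (about 40 µg/mL for RNA and 33 µg/mL for single-stranded DNA), and an A260/A280 ratio of around 1.8 is generally taken to indicate protein-free DNA (around 2.0 for RNA). A minimal sketch using those conventional factors; the readings and dilution factor are hypothetical.

```python
# Conventional conversion factors: µg/mL per A260 unit at a 1 cm path length.
CONVERSION_FACTORS = {"dsDNA": 50.0, "RNA": 40.0, "ssDNA": 33.0}


def concentration_ug_per_ml(a260: float, dilution: float = 1.0, kind: str = "dsDNA") -> float:
    """Nucleic acid concentration in the undiluted sample, from A260."""
    return a260 * CONVERSION_FACTORS[kind] * dilution


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; ~1.8 is typical of pure DNA, lower values suggest protein."""
    return a260 / a280


if __name__ == "__main__":
    a260, a280 = 0.25, 0.135  # hypothetical readings of a 1:10 dilution
    print(concentration_ug_per_ml(a260, dilution=10))  # 125.0
    print(round(purity_ratio(a260, a280), 2))          # 1.85
```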
Current methods for NGS library preparation generally include a distinct DNA fragmentation step. DNA can be fragmented by several methods, including acoustic shearing, nebulization, sonication and enzyme-based treatments. Each method has its benefits and limitations, so you will need to consider your application and your desired end product. Following DNA fragmentation, the fragments are prepared for ligation in a series of steps: end repair to produce blunt-ended, 5’-phosphorylated DNA fragments, A-tailing to add an adenine to the 3’-ends for ligation, and adaptor ligation to link T-tailed adaptor molecules containing functional sequences to the fragmented DNA.
At this ligation stage you will need to decide whether the adaptors should contain barcodes (also called index sequences). Barcoding adds a short sequence tag specific to each library, which is important for multiplex NGS: libraries carrying different barcodes can be pooled, sequenced together and separated afterwards by their tags.
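The idea behind separating a pooled, barcoded run (demultiplexing) can be sketched as follows: each read's index sequence is compared against the known barcodes and assigned to the sample whose barcode it matches, allowing a small number of mismatches. The sample names, 6-base barcodes and one-mismatch tolerance are hypothetical; real pipelines apply their own barcode designs and rules.

```python
from collections import defaultdict

# Hypothetical sample sheet: sample name -> 6-base barcode (each pair differs at all 6 positions).
SAMPLE_BARCODES = {"sample_1": "ACGTAC", "sample_2": "TGCATG", "sample_3": "GATCGA"}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, max_mismatches: int = 1) -> str:
    """Return the single sample whose barcode matches, else 'undetermined'."""
    hits = [name for name, barcode in SAMPLE_BARCODES.items()
            if hamming(index_read, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (index_read, insert_read) pairs by their assigned sample."""
    bins: dict[str, list[str]] = defaultdict(list)
    for index_read, insert_read in reads:
        bins[assign_sample(index_read)].append(insert_read)
    return dict(bins)


if __name__ == "__main__":
    pooled = [("ACGTAC", "READ1"), ("ACGTAA", "READ2"),  # exact and 1-mismatch match
              ("TGCATG", "READ3"), ("NNNNNN", "READ4")]  # last index is unreadable
    print(demultiplex(pooled))
```

Choosing barcodes that differ at many positions is what makes a mismatch tolerance safe: a read with one sequencing error in its index still lands unambiguously in the right sample.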
Following reaction cleanup, you will need to select your library fragment size and separate free adaptors from the desired product. Size selection, which depends on your instrument's requirements and your application, has traditionally been performed using gel extraction protocols, but gel-free methods are now available to simplify this process.
Most imaging systems have not been designed to detect single fluorescent events, so the adaptor-ligated library is typically amplified to produce the final product ready for cluster formation and sequencing. The two most common methods for amplification are emulsion PCR (emPCR) and solid-phase amplification. A common theme among NGS technologies is that the template is attached or immobilized to a solid surface or support. Immobilizing spatially separated template sites allows thousands to billions of sequencing reactions to be performed simultaneously. Library quality control monitoring and sample tracking are essential to ensure that only high quality libraries make it to the sequencer.
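A routine library QC calculation is converting a measured mass concentration into molarity, so that libraries can be diluted and pooled at equal concentrations. A sketch of that arithmetic, assuming double-stranded DNA with an average mass of about 660 g/mol per base pair and an average fragment size read from a sizing trace; the concentrations and volumes are hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/µL) into nanomolar.

    Assumes ~660 g/mol per base pair: nM = ng/µL * 1e6 / (660 * bp).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * avg_fragment_bp)


def dilution_volumes(stock_nM: float, target_nM: float, final_ul: float) -> tuple[float, float]:
    """Volumes (µL) of library stock and diluent for a C1*V1 = C2*V2 dilution."""
    if target_nM > stock_nM:
        raise ValueError("library is less concentrated than the target")
    stock_ul = target_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical library: 2 ng/µL with an average fragment size of 400 bp.
    print(f"{library_molarity_nM(2.0, 400):.2f} nM")  # 7.58 nM
    # Hypothetical dilution of a 10 nM stock to 2 nM in 50 µL.
    print(dilution_volumes(10.0, 2.0, 50.0))  # (10.0, 40.0)
```

The fragment-size term is why accurate size selection and sizing matter: the same mass of a shorter library contains more molecules.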
Preparing a sample for an NGS experiment takes planning to create the optimal library to be sequenced. As there are various steps involved in the sample preparation process, it is important to know which tools will be needed at each stage. However, users should not be intimidated by NGS library preparation: there are many different kits available that streamline one or more stages of the sample preparation workflow, and the protocols that accompany the kits have been well tested. Because NGS platforms employ different methods, many of the commercially available kits are designed for a specific platform or application. There are a number of standard library preparation kits that offer protocols for sequencing whole genomes, mRNA, targeted regions such as whole exomes, custom-selected regions, protein-binding regions, and more. It is therefore important to choose your kits based on both your instrument and your application.
Consider Automated Sample Preparation
In recent years, NGS technologies have rapidly evolved to provide faster, better, cheaper and more reliable mapping of DNA and RNA sequences thus enabling a diverse set of genomic discoveries. The rapid increase in sequencer speed and capacity has created an increasing demand for high-throughput NGS sample preparation. Even with the development of kits, NGS sample preparation requires a considerable amount of time and effort. Manual sample preparation is highly labor intensive and prone to costly errors.
Utilizing a flexible and robust automated liquid handling platform can alleviate sample preparation bottlenecks and help you get the most out of your sequencing investment. Automating sample preparation procedures is likely to increase throughput, minimize variability, improve reproducibility, reduce errors and lower overall cost. When purchasing a library preparation platform, as with any large piece of laboratory equipment, you will need to consider its flexibility, compatibility, capacity, speed and price.
Consider Target Enrichment
Target enrichment methods enable the selective capture of genomic regions of interest from a DNA sample prior to sequencing. Targeted enrichment can be useful in a number of situations where particular portions of a whole genome need to be analyzed, for example in exome sequencing. Current techniques for targeted enrichment include:
Hybrid Capture - Adaptor-modified genomic DNA libraries are hybridized to target-specific probes either on a microarray surface or in solution. Background DNA fragments are washed away and the target DNA is then amplified by PCR and sequenced.
Molecular Inversion Probes (MIPs) - Probes consisting of a universal spacer region flanked by target-specific sequences are designed for each amplicon. These probes anneal on either side of the target region, and the gap is filled by a DNA polymerase and ligated to form a circular DNA molecule. The remaining linear DNA is digested using exonucleases, and the circularized target DNA is PCR-amplified and sequenced.
PCR - PCR is directed toward the targeted regions of interest by conducting multiple long-range PCRs in parallel, a limited number of standard multiplex PCRs or highly multiplexed PCR methods that amplify very large numbers of short fragments.
Given their different operational characteristics, these targeted enrichment methods naturally vary in their suitability for different applications. For example, where many megabases need to be analyzed (e.g. the exome), hybrid capture approaches are attractive because they can handle large target regions. In contrast, when small target regions need to be examined, especially across many samples, PCR-based approaches may be preferred because they provide deep and even coverage over the region of interest.
Target enrichment can be a highly effective way of reducing sequencing costs and saving sequencing time, and has the power to bring the field of genomics into smaller laboratories, as well as being an invaluable tool for the detection of disease-causing variants. Target enrichment does however increase sample preparation cost and time. Assuming that the throughput of NGS and our ability to analyze large numbers of whole-genome datasets both continue to increase and the cost per base of sequence continues to decrease, it is likely that there will come a point at which it is no longer economical to perform target enrichment of single samples, compared to whole-genome sequencing.
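The cost trade-off described above largely comes down to how many bases must be sequenced per sample. A rough comparison, assuming a human genome of about 3.1 Gb sequenced to 30x and a hypothetical 50 Mb exome target sequenced to 100x, with a hypothetical 60% of reads landing on target. Real on-target rates and coverage goals vary by kit and application, and the extra enrichment cost per sample is not modeled here.

```python
def bases_required(target_bp: float, coverage: float, on_target_fraction: float = 1.0) -> float:
    """Raw bases to sequence so the target reaches the desired mean coverage.

    Off-target reads (e.g. from imperfect capture) are sequenced but do not add
    target coverage, so the total is inflated by 1 / on_target_fraction.
    """
    return target_bp * coverage / on_target_fraction


if __name__ == "__main__":
    genome = bases_required(3.1e9, 30)       # whole genome, no enrichment
    exome = bases_required(50e6, 100, 0.6)   # hypothetical capture efficiency
    print(f"genome: {genome / 1e9:.1f} Gb, exome: {exome / 1e9:.2f} Gb")  # genome: 93.0 Gb, exome: 8.33 Gb
    print(f"ratio: {genome / exome:.0f}x")   # ratio: 11x
```

As the cost per sequenced base falls, this sequencing saving shrinks relative to the fixed cost of enrichment, which is the break-even point the paragraph above anticipates.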
Considerations for NGS Software
The current bottleneck in whole-genome and whole-exome sequencing projects is not the sequencing of the DNA itself but the structured management of the data and the sophisticated computational analysis it requires. Biologists are rarely trained in the computational and statistical techniques needed to make sense of the large data sets generated by NGS.
The complete NGS data analysis process is complex, includes multiple analysis steps, is dependent on a multitude of programs and databases and involves handling large amounts of heterogeneous data. After NGS reads have been generated they are aligned to a known reference sequence or assembled de novo. The decision to use either strategy is based on the intended biological application as well as cost, effort and time considerations. For example, identifying and cataloging genetic variation in multiple strains of highly related genomes can be accomplished by aligning NGS reads to their reference genomes. This approach is substantially cheaper and faster than Sanger sequencing and single-nucleotide variations (SNVs) can be readily identified.
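A toy version of the reference-based approach helps show what alignment and SNV identification involve. In this minimal sketch, reads are placed on the reference with an exact-match k-mer seed, aligned bases are piled up per position, and a site is reported as an SNV when most reads disagree with the reference. It assumes ungapped, forward-strand reads whose seed does not overlap a variant, and uses a hypothetical reference; production aligners and variant callers handle sequencing errors, indels, both strands and statistical genotyping.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer in the reference to every position where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align(read, reference, index, k, max_mismatches=2):
    """Seed with the read's first k-mer, extend without gaps and keep the
    placement with the fewest mismatches (ties keep the first position,
    which is exactly where repetitive references become ambiguous)."""
    best_start, best_mismatches = None, max_mismatches + 1
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start


def call_snvs(reads, reference, k=4, min_depth=2, min_fraction=0.8):
    """Return (position, ref_base, alt_base) where the majority base differs from the reference."""
    index = build_kmer_index(reference, k)
    pileup = defaultdict(Counter)
    for read in reads:
        start = align(read, reference, index, k)
        if start is None:
            continue  # unplaced read, e.g. from a region missing from the reference
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    snvs = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        alt, count = pileup[pos].most_common(1)[0]
        if alt != reference[pos] and depth >= min_depth and count / depth >= min_fraction:
            snvs.append((pos, reference[pos], alt))
    return snvs


if __name__ == "__main__":
    reference = "GATTACAGGCTTACCGATCGTA"  # hypothetical reference
    reads = ["CAGGCATA", "AGGCATAC", "ACAGGCAT", "ACCGATCG"]
    print(call_snvs(reads, reference))  # [(10, 'T', 'A')]
```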
The alignment approach has limitations, such as placing reads that fall within repetitive regions of the reference genome, or reads from regions that may not exist in the reference genome at all. De novo assemblies have been reported for bacterial genomes and some mammalian bacterial artificial chromosomes, but substantial challenges remain in applying de novo assembly to human genomes.
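De novo assembly of short reads is commonly performed with de Bruijn graphs, a technique not named in the text above and used here purely for illustration. In this sketch, each read is broken into k-mers, each distinct k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and the sequence is recovered as a path that uses every edge once (Hierholzer's algorithm). It assumes error-free reads, full k-mer coverage and no repeats of length k-1 or longer; repeats are exactly what makes the graph admit several paths, mirroring the challenges described above. The reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """One edge per distinct k-mer, from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm: walk every edge exactly once."""
    in_degree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    # Start where out-degree exceeds in-degree by one (the contig's first node);
    # fall back to any node if the graph is balanced (e.g. a circular sequence).
    start = next(iter(graph))
    for node, successors in graph.items():
        if len(successors) - in_degree[node] == 1:
            start = node
            break
    unused = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if unused.get(node):
            stack.append(unused[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell a contig: the first node, then the last base of each following node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Overlapping error-free reads from the hypothetical sequence ATGGCGTGCA.
    print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], k=4))  # ATGGCGTGCA
```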
A range of data analysis algorithms are available that perform specific tasks related to a given application. Some applications, such as de novo sequencing, require specialized assembly of sequencing reads. Other applications, such as RNA-Seq, require algorithms that quantify read counts to provide information about gene expression levels. While some data analysis algorithms are commercially available from software vendors, many are freely available open-source algorithms from academic institutions.
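As an example of turning read counts into expression levels, the sketch computes transcripts per million (TPM), one widely used measure: counts are divided by transcript length in kilobases, then scaled so that the values in a sample sum to one million. Gene names, lengths and counts are hypothetical, and assigning reads to genes is assumed to have been done already.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from raw read counts and transcript lengths (bp)."""
    # Step 1: reads per kilobase, since longer transcripts yield more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    # Step 2: scale so that all values in this sample sum to one million.
    per_million = sum(rpk.values()) / 1_000_000
    return {gene: value / per_million for gene, value in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 100, "geneC": 300}        # hypothetical
    lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 3_000}  # hypothetical
    for gene, value in tpm(counts, lengths).items():
        print(gene, round(value))
```

Here geneA and geneC end up with the same TPM despite a threefold difference in raw counts, because geneC is three times longer; that length correction is the point of the first step.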
Typically, commercially available solutions for NGS aim to simplify analysis by providing easy-to-use graphical user interfaces (GUI). Such software tools may be a suitable entry point for small-scale laboratories, especially for analysis of simple datasets, but are generally limited in their flexibility and scalability and often do not adequately resolve issues around data handling and management. It is also important to remember that many challenges around NGS analysis are still being resolved and that commercial software packages are not exempt from common issues experienced with analyzing NGS data and may not be as advanced as the open-source tools being developed by large genome centers.
In addition to data analysis, it is also important to consider data management and storage. Reads are typically 100 to 400 base pairs long, but can be much longer, and all of them need to be assembled. In the end, this translates into large volumes of data, ranging from 120 to 600 gigabytes, that must be managed and stored. The initial investment in the NGS platform is often accompanied by an almost equal investment in upgrading the institution's informatics infrastructure, hiring staff to analyze the data produced by the instrument, and storing the data for future use.
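Storage requirements can be estimated before purchase from the expected run output. A rough sketch, assuming reads are stored as uncompressed FASTQ, a common text format that keeps one character per base for the sequence and one per base for its quality score, plus a short header per read. The 40-byte header, run size and 4:1 compression ratio are hypothetical placeholders, and aligned files and intermediate results add further space.

```python
def fastq_bytes(num_reads: int, read_length: int, header_bytes: int = 40) -> int:
    """Approximate size of an uncompressed FASTQ file in bytes.

    Each record has four newline-terminated lines: '@' + read name
    (header_bytes, including the '@'), the bases, a '+' separator and one
    quality character per base. header_bytes=40 is an assumed average.
    """
    per_record = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return num_reads * per_record


if __name__ == "__main__":
    # Hypothetical run: one billion 100-base reads.
    raw = fastq_bytes(1_000_000_000, 100)
    print(f"{raw / 1e9:.0f} GB uncompressed")    # 245 GB uncompressed
    # Hypothetical 4:1 compression; actual ratios depend on the tool and data.
    print(f"{raw / 4 / 1e9:.0f} GB compressed")  # 61 GB compressed
```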
It is advantageous to have a centralized Bioinformatics Core to put in place platforms that acquire, store and analyze the very large datasets created by NGS instruments. A Bioinformatics Core that is already familiar with data of this type and complexity, dedicated to investigators and working jointly with IT personnel can readily bridge the biological and computational domains. If this is not possible, you may wish to consider cloud computing, in which a user runs virtual machines hosted in the “cloud” to process data on a computer cluster for highly parallel tasks.
Applications of NGS
High-throughput NGS technologies have revolutionized genomics, epigenomics, transcriptomics and metagenomics studies by allowing massively parallel sequencing at relatively low cost. Current applications include:
Genome Sequencing/Resequencing - Whole genome sequencing/resequencing and targeted genome resequencing have been used extensively for sequence polymorphism discovery and mutation mapping. These applications are rapidly advancing our understanding of human health and disease and are also facilitating the de novo assembly of uncharacterized genomes.
DNA-Protein Interactions and Epigenome Sequencing - Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) is a powerful technique for genome-wide profiling of DNA-protein interactions and epigenetic marks. In this technique, antibodies select a protein of interest and thereby enrich the DNA fragments bound to it, which are then sequenced. It has facilitated a wide range of biological studies, including transcription factor binding, RNA polymerase occupancy, nucleosome positioning and histone modifications. Complementary methods used to study chromatin structure and composition include Methyl-Seq and DNase-Seq, which profile DNA methylation and DNase hypersensitive sites, respectively.
Transcriptome Sequencing - The transcriptome is the complete set of transcripts in a cell. Analysis of the transcriptome gives information on the functional elements of the genome and is important for understanding development and disease. RNA-Seq is a powerful approach for profiling the transcriptome, in which RNA is analyzed by next generation sequencing of cDNA. This strategy has been applied to profile mRNA and noncoding RNA expression, alternative splicing, trans-splicing and alternative polyadenylation, and to map transcription initiation, termination and RNA editing sites.
RNA-Protein Interactions - CLIP-Seq, also known as HITS-CLIP, is a method employing in vivo crosslinking of RNA to protein followed by immunoprecipitation and high-throughput RNA sequencing. It is used to generate transcriptome-wide RNA-protein interaction maps.
Small RNA Sequencing - Similar to RNA-Seq, sequencing of size-selected short RNA provides insight into small RNA populations in different organisms, tissues, cell types, developmental stages and disease states. It has greatly contributed to our understanding of the functions and regulatory mechanisms of different classes of small RNAs, such as microRNAs (miRNAs) and Piwi-interacting RNAs (piRNAs).
Ribonomics - RNA-binding proteins are involved in RNA processing and affect the regulation of gene expression. Recent technological developments make it possible to study both individual RNA-binding proteins and large complexes such as ribosomes.
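For ChIP-Seq, the enrichment of protein-bound DNA shows up as a local excess of reads. A highly simplified illustration: read start positions from the immunoprecipitated sample and from an input control (a common ChIP-Seq design, assumed here) are counted in fixed windows, and windows with a high depth-normalized ratio are flagged. The window size, thresholds and read positions are hypothetical; dedicated peak callers also model fragment size, local background and statistical significance, none of which this sketch attempts.

```python
from collections import Counter


def window_counts(positions: list[int], window: int) -> Counter:
    """Count reads per fixed-size window (window index = position // window)."""
    return Counter(pos // window for pos in positions)


def enriched_windows(chip: list[int], control: list[int], window: int = 200,
                     min_reads: int = 5, min_fold: float = 4.0) -> list[tuple[int, float]]:
    """Return (window_start, fold_enrichment) for windows enriched in the ChIP sample."""
    chip_counts = window_counts(chip, window)
    control_counts = window_counts(control, window)
    # Scale control counts to the ChIP library's sequencing depth.
    depth_ratio = len(chip) / len(control)
    hits = []
    for w in sorted(chip_counts):
        observed = chip_counts[w]
        # Pseudocount of 1 avoids dividing by zero where the control has no reads.
        expected = (control_counts[w] + 1) * depth_ratio
        fold = observed / expected
        if observed >= min_reads and fold >= min_fold:
            hits.append((w * window, fold))
    return hits


if __name__ == "__main__":
    # Hypothetical read start positions on one chromosome.
    chip = [1010, 1020, 1050, 1100, 1150, 1180, 1190, 50, 3000, 5000]
    control = [100, 2000, 3050, 5020, 7000, 9000, 300, 4000, 6000, 8000]
    print(enriched_windows(chip, control))  # [(1000, 7.0)]
```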
The Future of NGS
NGS technologies have gained the capacity to sequence gigabases of DNA in a high-throughput and highly efficient manner that has not been possible using traditional Sanger sequencing. Compared with Sanger sequencing, however, the read lengths of current NGS approaches are relatively short, owing to the small sequencing colonies and rapid signal deterioration. This is compensated for by their highly parallel operation. Technical and chemical refinements are gradually increasing NGS read lengths, but only novel technologies will be able to provide substantially longer reads. Because single-molecule sequencing technology can read through DNA templates in real time without amplification, it avoids amplification bias and can potentially produce long reads.
Consequently, third generation platforms focused on read length are currently under development. These new instruments are anticipated to be significantly faster than current technologies, enabling genomes to be sequenced at a lower cost. In addition, new kits and reagents will continue to emerge that will enable NGS to be used for a wider range of applications. Library preparation protocols are likely to become simpler, and automation will continue to streamline workflows.
Nanopore sequencing is an exciting new method that is likely to be incorporated into some third generation sequencers. In nanopore sequencing, a DNA strand is passed through a synthetic or protein nanopore, and the resulting changes in electric current allow identification of the base passing through the pore. This could theoretically allow a complete chromosome to be sequenced in one step, without the need to synthesize a new DNA strand.
Despite still being in its infancy, NGS has already tremendously changed the landscape of biological research and has begun to enter clinical practice. In the next few decades, genomic medicine driven by NGS is anticipated to profoundly change the diagnosis, prognosis and therapy of human diseases, and using NGS for personalized medicine is the ultimate goal for many. There are, however, a number of challenges that must be adequately addressed before NGS can be transformed from a research tool into routine clinical practice. Rapid interpretation of the masses of data produced currently requires highly specialized software, and represents one of the biggest obstacles to bringing whole genome sequencing routinely to the clinic.
Whether you are a first time buyer of NGS equipment, or an experienced user, there are a number of factors to consider if you are choosing a new platform. It is essential that you thoroughly examine your current and your future application needs.
Although it is necessary to consider the technology in terms of speed, read-length and accuracy, it is also imperative that you consider the basics, such as footprint, ease-of-use and technical support. It is important to note there are several accompanying considerations, such as library preparation and data management. These can have a significant impact on the cost of implementing NGS technology.
NGS technology is evolving at an unprecedented speed. New technologies will continue to emerge and new applications will be developed. Visit the SelectScience product directory for an overview of the latest NGS technology from leading manufacturers and read user reviews. Keep up to date with the latest NGS methods by visiting the SelectScience application note and video libraries. Watch the recent SelectScience NGS webinar to discover how automating library preparation and clean up can improve your sequencing data.
"The Ion PGM Sequencer is an easy to use all-in-a-box instrument, including analysis system. Also, it is a diagnosis capable system.”
Marc Ferre, University Hospital of Angers
“Hiseq is a very robust system for whole genome sequencing. A whole genome can be sequenced in a single day.”
Rahul Sharma, Spinco Biotech
"The SRIworks is easy to use. I totally love it and I'm glad we have it in our lab."
Amiksha Shah, University of Texas Health Science