If you are looking to invest in next generation sequencing (NGS) technology, this guide will provide you with the essential information you need to assist your decisions.
DNA sequencing has advanced significantly since the launch of the Human Genome Project in 1990. A human genome can now be sequenced in under 10 days for less than $1,000, and soon will be sequenced routinely within a day. This remarkable progress is due to significant advances in DNA sequencing technology, from Sanger sequencing – which has been dominant for almost 30 years – to next generation sequencing (NGS, also termed massively parallel sequencing).
Sanger sequencing, which is often considered first generation sequencing technology, utilizes capillary electrophoresis to separate fragments of DNA by size and then sequences them by detecting the final fluorescent base on each fragment. This widely adopted technology is still extremely important today, but has always been hampered by inherent limitations in throughput, scalability, speed and resolution.
The limitations associated with Sanger sequencing have catalyzed the development of NGS technologies, which can inexpensively and quickly produce large volumes of sequence data. NGS enables rapid sequencing of large stretches of DNA base pairs spanning entire genomes, with some instruments capable of producing hundreds of gigabases of data in a single sequencing run. The read length – the actual number of continuous sequenced bases – is much shorter in NGS than that attained by Sanger sequencing: at present, NGS provides reads of only 50-500 continuous base pairs. Short reads represent the major limitation currently associated with NGS.
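To make the trade-off between read length and data volume concrete, the short Python sketch below estimates how many reads are needed to cover a genome to a given average depth, using the standard relationship coverage = (number of reads × read length) / genome size. The genome size of roughly 3.2 gigabases and the 30-fold target depth are illustrative assumptions rather than figures from this guide, and the function names are hypothetical.

```python
import math

# Approximate size of a haploid human genome in base pairs (illustrative assumption).
HUMAN_GENOME_BP = 3_200_000_000


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest number of reads giving at least the requested mean coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean coverage = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Read lengths spanning the 50-500 bp range quoted above.
    for read_length in (50, 150, 500):
        n = reads_needed(HUMAN_GENOME_BP, read_length, coverage=30)
        print(f"{read_length:>3} bp reads: {n:,} reads for 30x coverage")
```

Under these assumptions, 150 bp reads require about 640 million reads for 30-fold coverage, which illustrates why short-read platforms depend on massive parallelism.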
NGS technology is evolving at an unprecedented speed. Scientists can now routinely examine a single genome a large number of times, observe individual changes, study population variations, study the microbiome and metagenomics, differentiate cancer genomes from healthy genomes, study the epigenome, and investigate the possibility of personalized medicine, among other applications.
There are several main suppliers of next generation sequencing instruments, and they all share the same fundamental sequencing process, but with varying technologies. Regardless of the method they use, next generation sequencers rely on the generation of representative, unbiased sources of nucleic acid templates from the complex genomes being interrogated. Clonally amplified DNA templates, or single DNA molecules, are sequenced in a massively parallel fashion in a flow cell. The sequencing is conducted in either a stepwise iterative process or in a continuous real-time manner. In this way, the instruments allow for the sequencing of up to billions of individual DNA templates in a single reaction.
Ion Torrent™ Technology
Thermo Fisher Scientific’s (formerly Life Technologies) Ion Torrent™ Technology directly translates chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip, similar to the one you might find in your digital camera. The Ion Personal Genome Machine™ (PGM™) sequencer essentially acts as the world’s smallest solid-state pH meter to determine DNA sequences. The DNA is fragmented, attached to beads and deposited in millions of wells across the surface of the chip. The wells are then sequentially flooded with one nucleotide after another. If a nucleotide is incorporated into the strand of bead-bound DNA, a hydrogen ion is given off, a chemical change is measured by an ion sensor beneath the well, and a base is called. This process takes place in millions of wells simultaneously, enabling sequencing in only a few hours.
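The flow-based principle described above can be illustrated with a toy Python simulation. It is a simplified sketch, not the instrument's actual base-calling algorithm. The flow order "TACG" and the function names are assumptions made for illustration, and the signal is modeled as an ideal integer count of incorporated bases, so a run of identical bases yields a proportionally larger signal in a single flow.

```python
def simulate_flows(read: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Simulate flooding a well with one nucleotide after another.

    `read` is the strand being synthesized. Each flow yields a signal equal to
    the number of bases incorporated (0 if the flowed nucleotide does not match).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    flow = 0
    while pos < len(read):
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # Incorporate every consecutive matching base during this flow.
        while pos < len(read) and read[pos] == nucleotide:
            incorporated += 1
            pos += 1
        signals.append((nucleotide, incorporated))
        flow += 1
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Turn per-flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    flows = simulate_flows("TTACGGA")
    print(flows)
    print(call_bases(flows))
```

In a real instrument the signal is an analog measurement with noise, so distinguishing the lengths of long runs of the same base is harder than this idealized model suggests.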
In September 2015, Thermo Fisher Scientific launched the Thermo Scientific™ Ion S5™ and S5™ XL systems. These systems utilize the same technology as the PGM™ and Proton™ platforms; however, they have been specifically designed for labs that are just moving into NGS testing and may be marketed as IVD instruments in the future.
Figure 1: Thermo Scientific™ Ion S5™ system
Sequencing by Synthesis (SBS) Technology
The MiSeq and HiSeq Platforms, and the other available Illumina systems, use SBS technology. Sequencing templates are immobilized on a proprietary flow cell surface that is designed to present the DNA in a manner that facilitates access to enzymes, while ensuring high stability of surface bound template and low non-specific binding of fluorescently labeled nucleotides. Solid-phase amplification creates up to 1,000 identical copies of each single template molecule in close proximity, generating a cluster, and because this process does not involve positioning of beads into wells or mechanical spotting, much higher densities are achieved.
SBS technology uses four fluorescently-labeled nucleotides to sequence the tens of millions of clusters on the flow cell surface in parallel, using a proprietary reversible terminator-based method. This enables detection of single bases as they are incorporated into growing DNA strands. Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias. The result is base-by-base sequencing that enables highly accurate data for a broad range of applications.
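As a rough illustration of the cycle-by-cycle nature of this approach, the Python sketch below models each cycle as adding exactly one complementary base to every cluster in parallel. It is a deliberately idealized toy model: the function name is hypothetical, template strings are written in the order the polymerase reads them, and imaging, terminator cleavage and sequencing errors are not modeled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates: list[str], cycles: int) -> list[str]:
    """Extend every cluster by one base per cycle and return the reads.

    Each template is given in the order the polymerase reads it. Because the
    terminator is reversible, exactly one base is added per cluster per cycle.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        # All clusters are extended (and would be imaged) in the same cycle.
        for i, template in enumerate(templates):
            if cycle < len(template):
                reads[i] += COMPLEMENT[template[cycle]]
    return reads


if __name__ == "__main__":
    print(sequence_by_synthesis(["ACGT", "GGCA"], cycles=3))
```

In this model the read length is set by the number of cycles rather than by the template, which is why longer reads on such platforms mean longer runs.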
Figure 2: Illumina HiSeq 2500 Sequencing System
Single Molecule, Real-Time (SMRT®) Technology
The PacBio RS II is a Single Molecule, Real-Time (SMRT®) DNA Sequencing System by Pacific Biosciences. In SMRT technology, a DNA polymerase attaches itself to a strand of DNA to be replicated, examines the individual base at the point it is attached, and then determines which of four building blocks, or nucleotides, is required to replicate that individual base. After determining which nucleotide is required, the polymerase incorporates that nucleotide into the growing strand that is being produced. After incorporation, the enzyme advances to the next base to be replicated and the process is then repeated.
Figure 3: Pacific Biosciences PacBio RS II
In October 2013, Roche announced that it would be shutting down its 454 Life Sciences sequencing operations. The GS Junior System and the GS FLX+ System are no longer available to purchase from Roche and the sequencing business has now closed.
Nanopore technology is an exciting new method currently being developed. The technology records characteristic changes in electric current as nucleic acids pass through a synthetic or protein nanopore and will theoretically allow sequencing of a complete chromosome in one step, without the need to generate a new DNA strand. Read more about nanopore technology in the section on the future of NGS.
One of the bottlenecks for NGS is the amount of time and resources required for library preparation; this is true whichever sequencing instrument you choose. While every sequencer uses a slightly different technology, the methods for template construction and library preparation are largely the same, except for minor modifications made before the run.
Manual Library Preparation
Success on any NGS platform begins with optimal sample preparation – from sample isolation and purification to library construction and enrichment. As with any scientific methodology, it is well understood that the quality of sequencing data is highly dependent upon the quality of the sequenced material. Reagent kits that simplify and standardize the process of converting a DNA sample into a sequencing library and, if desired, prepare it for multiplexing, can be purchased from both sequencer manufacturers and third-party vendors.
There are many commercially available purification kits, which can be used to extract DNA and RNA from a diverse range of sample types. It is important to choose a kit that will enable you to obtain high yields of pure DNA or RNA for your NGS workflow. Throughout NGS library preparation, you will need to ensure you have established methods to determine the DNA quantity and quality. Typically, DNA is quantified using a UV/VIS spectrophotometer and its purity assessed by visualization on an agarose gel. KAPA Library Quantification Kits provide qPCR-based quantification of NGS libraries prior to pooling for capture or flow cell amplification.
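For readers setting up quantification, the Python sketch below shows the arithmetic commonly applied to UV absorbance readings. It relies on two widely used conventions that are not taken from this guide: an A260 of 1.0 corresponds to roughly 50 ng/µL of double-stranded DNA at a 1 cm path length, and an A260/A280 ratio of about 1.8 is generally regarded as indicating pure DNA. The function names and the acceptance window are hypothetical and should be adapted to your own protocol.

```python
# Conventional conversion factor for double-stranded DNA at a 1 cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration in ng/uL from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, a common indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def looks_pure(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Check the ratio against an illustrative acceptance window (assumption)."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Example: a sample diluted 1:10 before measurement.
    print(dsdna_concentration(0.5, dilution_factor=10), "ng/uL")
    print(round(purity_ratio(0.9, 0.5), 2))
```

Absorbance measures all nucleic acid in the sample, so a qPCR-based method remains the better choice when only adapter-ligated, amplifiable library molecules should be counted.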
Current methods for NGS library preparation generally consist of a distinct DNA fragmentation step, followed by a fragment ‘clean-up’ step. Following the reaction clean-up, it is necessary to choose library fragment size and separate free adaptors from the desired product. Size selection, which is dependent on your instrument’s requirements and your application, has traditionally been performed using gel extraction protocols. However, gel-free methods are now available that simplify this process. View the range of NGS library preparation kits available from Swift Biosciences.
Most imaging systems have not been designed to detect single fluorescent events, so the adapter-ligation reaction is typically amplified to produce the final product ready for cluster formation and sequencing. The two most common methods for amplification are emulsion PCR (emPCR) and solid-phase amplification.
Because NGS platforms employ different methods, many of the commercially available NGS kits are designed for a specific platform or application. A large number of standard library preparation kits offer protocols for sequencing whole genomes, mRNA, targeted regions such as whole exomes, custom-selected regions, protein-binding regions, and more. Some protocols are designed to require low sample input for library generation, and can be used for single cell analysis.
Automated Library Preparation
It is also possible to automate the whole library preparation workflow. Automation of sample and library preparation increases sample throughput and reproducibility, while eliminating labor-intensive manual steps and costly user errors. A range of automated and semi-automated products, including microfluidics-based instruments and liquid handling robots, is available to assist your library construction. Read this application note to learn how Labcyte has developed a method to streamline and miniaturize Nextera™ library preparation for Illumina NGS using the Echo® 525 Series Liquid Handlers, to increase cost savings.
It is important to choose your system and kits based on both your instrument and your application. Consider batch size, the type of applications that will be performed, walk-away time, quantity of sample material, reproducibility, turn-around time, as well as budget and running costs.
Figure 7: PerkinElmer’s Sciclone NGSx Workstation is an automated solution for next generation sequencing sample preparation
Another major bottleneck of whole-genome and whole-exome sequencing projects lies not in the sequencing of the DNA itself, but in the structured management and sophisticated computational analysis of the experimental data. Biologists are rarely trained in the computational and statistical techniques necessary to make sense of the large data sets generated by NGS.
The complete NGS data analysis process is complex; it includes multiple analysis steps, is dependent on a multitude of programs and databases, and involves handling large amounts of heterogeneous data. Data is produced at a rate faster than most computers can handle, and this has forced researchers to not just rethink software solutions, but also to consider data storage, processing power and data output.
Commercially available NGS software solutions might be delivered via desktop software or by the use of web-based interfaces.
Typically, commercially available solutions for NGS aim to simplify analysis by providing easy-to-use graphical user interfaces (GUI). Such software tools may be a suitable entry point for small-scale laboratories, especially for analysis of simple datasets, but are generally limited in their flexibility and scalability, and often do not adequately resolve issues around data handling and management. It is also important to remember that many challenges around NGS analysis are still being resolved; commercial software packages are not exempt from common issues experienced with analyzing NGS data and may not be as advanced as the open-source tools being developed by large genome centers.
When looking at software options, it is important to consider data management and data storage. Volumes ranging from 120 to 600 gigabytes will need to be managed and stored. The initial investment in the NGS platform is often accompanied by an almost equal investment in upgrading the informatics infrastructure of the institution, hiring staff to analyze the data produced by the instrument, and storing the data for future use. This cost is often not anticipated by the researcher.
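A quick back-of-the-envelope calculation can help when budgeting for storage. The Python sketch below assumes that the 120 to 600 gigabyte range applies to each sequencing run, which is an interpretation rather than something this guide states explicitly. The run counts, retention period and number of backup copies are illustrative assumptions, and the function name is hypothetical.

```python
def storage_range_tb(
    runs_per_year: int,
    years: int,
    gb_min: float = 120,
    gb_max: float = 600,
    copies: int = 1,
) -> tuple[float, float]:
    """Return (low, high) storage estimates in terabytes (1 TB = 1000 GB).

    `copies` counts how many copies of each run are kept, e.g. 2 for a
    primary copy plus one backup.
    """
    total_runs = runs_per_year * years
    low = total_runs * gb_min * copies / 1000
    high = total_runs * gb_max * copies / 1000
    return low, high


if __name__ == "__main__":
    # Example: 50 runs a year, kept for 3 years, with one backup copy.
    low, high = storage_range_tb(runs_per_year=50, years=3, copies=2)
    print(f"Estimated storage: {low:.0f} to {high:.0f} TB")
```

Under these assumptions the requirement falls between 36 and 180 terabytes, which shows how quickly informatics costs can approach the cost of the instrument itself.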
It is advantageous to have a centralized Bioinformatics Core to put in place platforms that acquire, store and analyze the very large datasets created by NGS instruments. A Bioinformatics Core, already familiar with data of this type and complexity, dedicated to investigators, and jointly working with IT personnel, can span multiple domains rather effortlessly. If this is not a possible solution, you may wish to consider cloud computing. In the ‘cloud’, a user can utilize a virtual operating system to process data on a computer cluster for highly parallel tasks.
Web-Based Interfaces and Cloud Computing
Several commercial players, such as GenomeQuest and DNAnexus, offer web-based browsers that manage all of the data coming from an NGS machine. This enables the researcher to work without the need for local computer infrastructure. The browser facilitates the management, analysis and delivery of genomic data through a secure cloud platform, which supports unlimited storage and computational resources. Cloud computing removes the need for local hardware for complex projects, allowing a faster set-up time and the ability to run multiple large projects in parallel. Large-scale data generated by NGS technology can be analyzed and stored alongside completed projects in the cloud. Cloud services can be selected to meet the user’s computational and storage requirements.
NGS technology is moving at an extremely fast pace, so much so that some researchers are unwilling to invest heavily in technology that might soon be outdated. Others may not have the time to complete sequencing projects. For these researchers, the use of a service provider might be an attractive option.
The expertise offered by service providers can enable a rapid turnaround time for efficient completion of projects, with sequencing performed under accredited conditions to ensure reliable results. Service providers offer a range of sequencing applications, including whole genome sequencing, de novo genome sequencing, exome sequencing, targeted resequencing, de novo transcriptome sequencing, RNA-seq, small RNA sequencing, microbiome sequencing, metagenomic sequencing, and metatranscriptome sequencing, and may offer multiplatform sequencing strategies. Experience in these sequencing applications has also helped service providers to offer specific and efficient protocols, such as those requiring low DNA input.
Using a service provider, researchers can submit their samples, which will be analyzed by the provider, who will then return the data. Researchers then only require suitable software to enable them to analyze and store the results. Some service providers now offer pre-sequencing and bioinformatics services in addition to the above. For example, GATC’s INVIEW™ portfolio, such as its INVIEW™ Human Exome, offers services including DNA isolation, library preparation, amplification of target region, sequencing and data analysis through to delivery in common files or providing access to web-based analysis software like QIAGEN’s Ingenuity® Variant Analysis™. Service providers are also starting to offer secure cloud computing of customer data.
GATC also offers a modular NGS service, NGSELECT (figure 9), where users can select one or more sequencing workflow modules, which include sequencing, library preparation, pre-sequencing or BioIT analysis. A choice of nine library preparation protocols, and multiple sequencing modes on in-house Illumina and PacBio sequencing platforms, plus BioIT analysis, can be adapted to meet the specific needs of the customer.
Figure 9: NGSELECT is a modular NGS service that allows customers to rely on GATC’s extensive experience isolating DNA and RNA from nearly any starting material imaginable
Other examples of service providers are Thermo Fisher Scientific and Beckman Genomics.
NGS technology is evolving at an unprecedented speed. Scientists can now routinely examine a single genome, multiple times, observe individual changes, study population variations and metagenomics, differentiate cancer genomes from healthy genomes, and study the epigenome. NGS technology has the potential to revolutionize the field of companion diagnostics and personalized medicine, specifically in the area of oncology and cancer diagnostics, where customized treatment and therapy decisions are based on individual genomic data, targeting molecular changes in cancer cells. NGS also has applications in inherited disease testing, virology and microbiology.
There is currently only one FDA-approved NGS analyzer, the MiSeqDx. Read the press release of this FDA announcement here. In October 2014, the Ion PGM Dx NGS System was CE-marked for IVD use in European countries. The other commercially available NGS instruments are being used in the life science arena for clinical research, and in clinical diagnostic laboratories where regulations allow. The manufacturers of these instruments are engaging with the regulatory bodies to determine the best way in which these analyzers can be utilized for diagnostic use.
There are a number of challenges to implementing NGS in the clinical laboratory. These include, among others, achieving a sufficiently simple and reproducible workflow, standardizing data formats, associating mutations with clinical relevance, and navigating an unclear pathway to regulatory approval. Most significantly, laboratories need to learn how to interpret and analyze the enormous amounts of data being collated.
Currently, the easiest way for clinical laboratories to access NGS technology is via use of an approved service provider, as discussed in the previous chapter. At the start of 2016, GATC Biotech announced that it had launched a new NGS service family called GATCLIQUID. The service consists of three assays that offer exclusive access to liquid biopsy-based tools, which can be used flexibly in clinical research and translational medicine.
Figure 10: GATCLIQUID offers exclusive access to liquid biopsy-based tools
Read this whitepaper to learn more about the GATCLIQUID liquid biopsy-based service family.
Microbiome studies analyze microbial communities found in the human body and are important for elucidating the role of microbes in health and disease, as explored by projects such as the American Gut and the Human Microbiome Project. The use of NGS provides a cost- and time-saving method for studying complex microbial samples, requiring just a single sample to analyze the entire microbial community and eliminating the need for microbial cultivation. The diversity (phylogeny and taxonomy) of complex microbial samples can be analyzed by sequencing of conserved genomic regions, such as the 16S rRNA gene or internal transcribed spacer (ITS) in bacterial and fungal samples respectively.
NGS technologies have gained the capacity to sequence gigabases of DNA in a high-throughput and highly efficient manner that has not been possible using traditional Sanger sequencing. Compared to traditional sequencing, the read lengths of current NGS approaches are relatively short, owing to the small sequencing colonies and rapid signal deterioration. This is compensated for by the highly parallel nature of NGS. Technical and chemical refinements are gradually increasing read lengths in NGS, but only novel technologies will be able to provide substantially longer reads. Since single DNA molecule sequencing technology can read through DNA templates in real time, without amplification, it provides accurate sequencing data with potentially long reads.
Consequently, novel third generation platforms, with read length as a focus, are currently under development. These new instruments are anticipated to be significantly faster than current technologies, enabling genomes to be sequenced at a lower cost. In addition, new kits and reagents will continue to emerge that will enable NGS to be used for a wider range of applications. The protocols required for library preparation are likely to become simpler, and automation will continue to facilitate more streamlined workflows.
Nanopore sequencing is an exciting new method that is likely to be incorporated into some third generation sequencers. In nanopore sequencing, a DNA strand is processed through a synthetic or protein nanopore and the subsequent changes in the electric current allow identification of the base passing the pore. This will theoretically allow sequencing of a complete chromosome in one step, without the need to generate a new DNA strand.
Oxford Nanopore Technologies is developing a range of protein nanopore-based electronic systems to analyze single molecules such as DNA, RNA and proteins. The technology, which is incorporated in the MinION™ portable device and PromethION™ desktop system, sequences DNA by measuring characteristic ionic current disruptions as each of the DNA bases on an intact single-stranded DNA polymer passes through a protein nanopore. A high-throughput array chip design enables multiple simultaneous measurements. Read lengths of many tens of kilobases can be achieved, and bioinformatic analyses can be completed in real time, for DNA sequencing applications including resequencing, de novo sequencing and epigenetics. The MinION™ platform has been commercially available to purchase since May 2015.
Despite still being in its infancy, NGS has already tremendously altered the landscape of biological research and has begun to enter clinical practice. In the next few decades, it is anticipated that genomic medicine, driven by NGS, will profoundly change the diagnosis, prognosis, and therapy of human diseases.
Using NGS for personalized medicine is the ultimate goal for many. There are, however, a number of challenges that must be adequately addressed before NGS can be transformed from a research tool into routine clinical practice. Rapid interpretation of the masses of data produced currently requires highly specialized software, and represents one of the biggest obstacles to bringing whole genome sequencing routinely to the clinic.