If you are looking to invest in Next Generation Sequencing (NGS) technology, this guide provides the essential information you need to inform your decision.
Learn about the key platform technologies, considerations for sample preparation, NGS software, key application areas and the future for NGS.
In 2013, the world celebrated the 10th anniversary of the completion of the Human Genome Project, which was launched in 1990 and thus took 13 years to complete. Ten years on, a human genome can be sequenced in less than 10 days and soon will be sequenced routinely within a day. This remarkable progress over the past decade is due to significant advances in DNA sequencing, from Sanger sequencing, which has been dominant for almost 30 years, to next generation sequencing (NGS, also termed massively parallel sequencing).
Sanger sequencing, which is often considered first generation sequencing technology, relies on a technique known as capillary electrophoresis, which separates fragments of DNA by size and then sequences them by detecting the final fluorescent base on each fragment. This technology, which has become widely adopted in laboratories across the world and is still extremely important today, has always been hampered by inherent limitations in throughput, scalability, speed and resolution.
The limitations associated with Sanger sequencing have catalyzed the development of NGS technologies, which are able to inexpensively and quickly produce large volumes of sequence data. NGS enables rapid sequencing of large stretches of DNA base pairs spanning entire genomes, with some instruments capable of producing hundreds of gigabases of data in a single sequencing run. The read length - the number of contiguous bases sequenced - is much shorter in NGS than in Sanger sequencing; at present, NGS reads span only 50-500 contiguous base pairs. Short reads represent the major limitation currently associated with NGS.
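To put the throughput and read-length figures above in context, the following minimal sketch estimates average sequencing depth (coverage) as total bases sequenced divided by genome size, and the number of reads needed to reach a target depth. The function names and the approximate 3.1 gigabase human genome size are assumptions made for illustration; they are not taken from any vendor's software.

```python
import math

# Assumption: approximate haploid human genome size in base pairs.
HUMAN_GENOME_BP = 3_100_000_000


def mean_coverage(total_bases, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size_bp


def reads_needed(target_coverage, genome_size_bp, read_length_bp):
    """Number of reads of a given length needed to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # A run yielding 310 gigabases against a human-sized genome.
    print(mean_coverage(310e9, HUMAN_GENOME_BP))
    # Reads of 100 bp required for 30-fold mean coverage.
    print(reads_needed(30, HUMAN_GENOME_BP, 100))
```

This is an idealized average: real coverage is uneven across the genome, so projects typically plan for more reads than this simple calculation suggests.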
NGS technology is evolving at an unprecedented speed. Scientists can now routinely examine a single genome a large number of times, observe individual changes, study population variations, study metagenomics, differentiate cancer genomes from healthy genomes, study the epigenome, and investigate the possibility of personalized medicine, among other applications.
There are several main suppliers of next generation sequencing instruments and they all share the same fundamental sequencing process. However, they utilize different technologies to achieve this goal. Whatever the approach, next generation sequencers rely on the generation of representative, unbiased sources of nucleic acid templates from the complex genomes being interrogated. Clonally amplified DNA templates, or single DNA molecules, are sequenced in a massively parallel fashion in a flow cell. The sequencing is conducted in either a stepwise iterative process or in a continuous real-time manner. In this way, the instruments allow for the sequencing of up to billions of individual DNA templates in a single reaction.
Life Technologies
Ion Torrent™ Technology directly translates chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip, similar to the one you might find in your digital camera. The Ion Personal Genome Machine™ (PGM™) sequencer and Ion Proton™ System essentially act as the world’s smallest solid-state pH meter to determine DNA sequences. The DNA is fragmented, attached to beads and deposited in millions of wells across the surface of the chip. The wells are then sequentially flooded with one nucleotide after another. If a nucleotide is incorporated into the strand of bead-bound DNA, a hydrogen ion is given off, a chemical change is measured and a base is called.
Figure 1: Life Technologies; PGM™ Sequencer
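The flow-based chemistry described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the instrument's actual base-calling algorithm: the flow order `TACG`, the function names, and the idea that the signal is an exact integer count of incorporated bases are all illustrative choices. Strand orientation is also ignored, so the read is reported simply as the complement of the template.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template, flow_order="TACG", max_flows=400):
    """Flood the well with one nucleotide at a time and record the signal.

    The signal for a flow is the number of bases incorporated, which is
    zero when the flowed nucleotide does not match the next template base.
    """
    position = 0
    flows = []
    for i in range(max_flows):
        if position >= len(template):
            break  # the whole template has been copied
        nucleotide = flow_order[i % len(flow_order)]
        signal = 0
        # A run of identical template bases incorporates several
        # nucleotides in one flow, giving a larger signal.
        while position < len(template) and COMPLEMENT[template[position]] == nucleotide:
            signal += 1
            position += 1
        flows.append((nucleotide, signal))
    return flows


def call_bases(flows):
    """Convert per-flow signals back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


if __name__ == "__main__":
    flows = simulate_flows("ATTGC")
    print(flows)
    print(call_bases(flows))
```

In this idealized model a run of identical bases produces a proportionally larger signal in a single flow; on real instruments that signal is noisy, which is one reason accurate calling of long single-base runs is harder for flow-based chemistries.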
Life Technologies also manufactures SOLiD NGS systems, which use a sequencing by ligation technology. SOLiD stands for Sequencing by Oligonucleotide Ligation and Detection. DNA ligase is used to determine the underlying sequence of the target DNA molecule. A fluorescently-labeled probe hybridizes to its complementary sequence adjacent to the primed template. The dye-labeled probe is then joined to the primer following the addition of DNA ligase. Non-ligated probes are washed away, and the ligated probe is identified using fluorescent imaging. The 5500 W Series Genetic Analysis Systems are NGS platforms that support a wide range of research applications, such as exome sequencing and RNA-Seq, on a pay-per-lane basis.
Figure 2: Life Technologies; 5500 W Series utilizing SOLiD technology
Illumina
The MiSeq and HiSeq Platforms, and the other available Illumina systems, use sequencing by synthesis (SBS) technology. Sequencing templates are immobilized on a proprietary flow cell surface that is designed to present the DNA in a manner that facilitates access to enzymes while ensuring high stability of surface-bound template and low non-specific binding of fluorescently labeled nucleotides. Solid-phase amplification creates up to 1,000 identical copies of each single template molecule in close proximity, and because this process does not involve positioning of beads into wells or mechanical spotting, much higher densities are achieved.
SBS technology uses four fluorescently-labeled nucleotides to sequence the tens of millions of clusters on the flow cell surface in parallel, using a proprietary reversible terminator-based method. This enables detection of single bases as they are incorporated into growing DNA strands. Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias. The result is base-by-base sequencing that enables highly accurate data for a broad range of applications.
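The cycle-by-cycle nature of reversible terminator sequencing can be sketched as a toy model. The code below is an illustration only, with hypothetical function names; it assumes error-free chemistry in which every cluster extends by exactly one base per cycle, and it ignores strand orientation by reporting each read as the complement of its template.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates, cycles):
    """Extend every cluster by one base per cycle, in parallel.

    In each cycle all four terminator-bound nucleotides are available;
    only the one complementary to the next template base is incorporated,
    and the terminator blocks further extension until the next cycle.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for i, template in enumerate(templates):
            if cycle < len(template):
                # Exactly one base is added and imaged per cluster per cycle.
                reads[i] += COMPLEMENT[template[cycle]]
    return reads


if __name__ == "__main__":
    clusters = ["ACGT", "GGCA"]
    print(sequence_by_synthesis(clusters, cycles=3))
```

Because exactly one base is read per cycle, the read length in this model is set by the number of cycles run, not by the template content.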
Figure 3: Illumina; HiSeq 2500 Sequencing System
Pacific Biosciences
The PacBio RS II is a Single Molecule, Real-Time (SMRT®) DNA Sequencing System. In SMRT technology, a DNA polymerase attaches itself to a strand of DNA to be replicated, examines the individual base at the point where it is attached, and then determines which of four building blocks, or nucleotides, is required to replicate that individual base. After determining which nucleotide is required, the polymerase incorporates that nucleotide into the growing strand that is being produced. After incorporation, the enzyme advances to the next base to be replicated and the process is repeated.
Figure 4: Pacific Biosciences; PacBio RS II
Roche
In October 2013, Roche announced that it would be shutting down its 454 Life Sciences sequencing operations. The GS Junior System and the GS FLX+ System utilize 454 Sequencing technologies. The 454 Sequencing™ process uses a sequencing by synthesis (SBS) approach to generate sequence data.
The 454 sequencers are scheduled to be phased out in mid-2016, with approximately 100 staff lay-offs over the next three years and the closure of the 454 facility in Branford, Connecticut, USA. Until the business is fully shut down, Roche will continue to provide service and support for 454 instruments, parts, reagents and consumables.
One of the bottlenecks for NGS is the amount of time and resources required for library preparation; this is true whichever sequencing instrument you choose. While every sequencer uses a slightly different technology, the methods for template construction and library preparation are largely the same, except for minor modifications made before the run.
Manual Library Preparation
Success on any NGS platform begins with optimal sample preparation - from sample isolation and purification to library construction and enrichment. As with any scientific methodology, it is well understood that the quality of sequencing data is highly dependent upon the quality of the sequenced material. Reagent kits that simplify and standardize the process of converting a DNA sample into a sequencing library and, if desired, prepare it for multiplexing, can be purchased from both sequencer manufacturers and third-party vendors.
There are many commercially available purification kits, which can be used to extract DNA and RNA from a diverse range of sample types. It is important to choose a kit that will enable you to obtain high yields of pure DNA or RNA for your NGS workflow. Throughout NGS library preparation, you will need to ensure you have established methods to determine the DNA quantity and quality. Typically, DNA is quantified using a UV/VIS spectrophotometer and its purity assessed by visualization on an agarose gel.
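As a worked example of spectrophotometric quantification, the sketch below converts an absorbance reading at 260 nm into a double-stranded DNA concentration using the widely used approximation that an A260 of 1.0 over a 1 cm path corresponds to about 50 ng/µL of dsDNA. The A260/A280 ratio check is an additional, commonly used purity indicator that is not described in this guide; the function names and the 1.7-2.0 acceptance window are assumptions for illustration, and your kit or instrument documentation may specify different thresholds.

```python
# Assumption: standard approximation for double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 / path_length_cm * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are commonly taken to indicate pure DNA."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return a260 / a280


def looks_pure(a260, a280, low=1.7, high=2.0):
    """Hypothetical pass/fail check using an assumed acceptance window."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Sample diluted 1:10 before measurement.
    print(dsdna_concentration_ng_per_ul(0.5, dilution_factor=10))
    print(looks_pure(0.9, 0.5))
```

Absorbance-based estimates can be inflated by RNA or other contaminants that also absorb at 260 nm, which is one reason a gel check remains useful alongside them.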
Current methods for NGS library preparation generally consist of a distinct DNA fragmentation step, followed by a fragment ‘cleanup’ step. Following the reaction cleanup, it is necessary to choose library fragment size and separate free adaptors from the desired product. Size selection, which is dependent on your instrument's requirements and your application, has traditionally been performed using gel extraction protocols. However, there are now gel-free methods available to simplify this process.
Most imaging systems have not been designed to detect single fluorescent events, so the adapter-ligation reaction is typically amplified to produce the final product ready for cluster formation and sequencing. The two most common methods for amplification are emulsion PCR (emPCR) and solid-phase amplification.
Automated Library Preparation
Because NGS platforms employ different methods, many of the commercially available NGS kits are designed for a specific platform or application. There are a large number of standard library preparation kits that offer protocols for sequencing whole genomes, mRNA, targeted regions such as whole exomes, custom-selected regions, protein-binding regions, and more.
It is also possible to automate the whole library preparation workflow. Automation of sample and library preparation eliminates labor-intensive manual steps and costly user errors. It is important to choose your system and kits based on both your instrument and your application.
Figure 6: Beckman Coulter; SPRIworks Fragment Library System I for Illumina Genome Analyzer
Figure 7: Life Technologies; AB Library Builder™ System
Another major bottleneck of whole-genome and whole-exome sequencing projects is not the sequencing of the DNA itself, but the structured management and sophisticated computational analysis of the experimental data. Biologists are rarely trained in the computational and statistical techniques necessary to make sense of the large data sets generated by NGS.
The complete NGS data analysis process is complex, includes multiple analysis steps, is dependent on a multitude of programs and databases, and involves handling large amounts of heterogeneous data. Data is produced at a rate faster than most computers can handle and this has forced researchers to not just rethink software solutions, but also to consider data storage, processing power, and data output.
Commercially available NGS software solutions might be delivered via desktop software or by the use of web-based interfaces.
Desktop Software
Typically, commercially available solutions for NGS aim to simplify analysis by providing easy-to-use graphical user interfaces (GUI). Such software tools may be a suitable entry point for small-scale laboratories, especially for analysis of simple datasets, but are generally limited in their flexibility and scalability, and often do not adequately resolve issues around data handling and management. It is also important to remember that many challenges around NGS analysis are still being resolved; commercial software packages are not exempt from common issues experienced with analyzing NGS data and may not be as advanced as the open-source tools being developed by large genome centers.
Figure 8: CLC Bio; CLC Genomics Workbench
When looking at software options, it is important to consider data management and data storage. Volumes ranging from 120 to 600 gigabytes will need to be managed and stored. The initial investment in the NGS platform is often accompanied by an almost equal investment in upgrading the informatics infrastructure of the institution, hiring staff to analyze the data produced by the instrument, and storing the data for future use. This cost is often not anticipated by the researcher.
It is advantageous to have a centralized Bioinformatics Core to put in place platforms that acquire, store and analyze the very large datasets created by NGS instruments. A Bioinformatics Core, already familiar with data of this type and complexity, dedicated to investigators, and jointly working with IT personnel, can span multiple domains rather effortlessly. If this is not a possible solution, you may wish to consider cloud computing. In cloud computing, a user can utilize a virtual operating system (or ‘cloud’) to process data on a computer cluster for highly parallel tasks.
Web-Based Interfaces
Several commercial players, such as GenomeQuest and DNAnexus, offer web-based browsers that manage all of the data coming from an NGS machine. This enables the researcher to work without the need for local computer infrastructure. The browser facilitates the management, analysis and delivery of genomic data through a secure cloud platform, which supports unlimited storage and computational resources.
Figure 9: DNAnexus; DNAnexus Platform
NGS technology is moving at an extremely fast pace, so much so that some researchers are unwilling to invest heavily in technology that might soon be outdated. For these researchers, the use of a service provider might be an attractive option. Using a service provider, researchers can submit DNA samples, which will be analyzed by the provider, who will then return the data. Researchers then only require suitable software to enable them to analyze and store the results.
Beyond the research applications outlined in the introduction, NGS technology has the potential to revolutionize the field of oncology and cancer diagnostics, as well as having applications in companion diagnostics, inherited disease testing, virology and microbiology.
There is currently only one FDA-approved NGS analyzer, the MiSeqDx. The other commercially available NGS instruments are being used in the life science arena for clinical research, and in clinical diagnostic laboratories where regulations allow. The manufacturers of these instruments are engaging with the regulatory bodies to determine the best way in which these analyzers can be utilized for diagnostic use.
There are a number of challenges to implementing NGS in the clinical laboratory. These include, among others, achieving a sufficiently simple and reproducible workflow, standardizing data formats, associating mutations with clinical relevance, navigating an unclear pathway to regulatory approval and, most significantly, learning how to interpret and analyze the enormous amounts of data being collated.
Find out how private company SynapDx is using next generation sequencing to develop a test for the earlier diagnosis of autism spectrum disorders by listening to our podcast.
Discover how Illumina technology is being utilized in the clinical arena by reading our interview with LifeCodexx, a laboratory using NGS to perform non-invasive pre-natal screening for autosomal trisomies.
NGS technologies have gained the capacity to sequence gigabases of DNA in a high-throughput and highly efficient manner that has not been possible using traditional Sanger sequencing. Compared to traditional sequencing, the read lengths of current NGS approaches are relatively short, owing to the small sequencing colonies and rapid signal deterioration. The massively parallel nature of NGS compensates for this. Technical and chemical refinements are gradually increasing read lengths in NGS, but only novel technologies will be able to provide substantially longer reads. Since single DNA molecule sequencing technology can read through DNA templates in real time, without amplification, it provides accurate sequencing data with potentially long reads.
Consequently, novel third generation platforms, with read-lengths as a focus, are currently under development. These new instruments are anticipated to be significantly faster than current technologies, enabling genomes to be sequenced at a lower cost. In addition, new kits and reagents will continue to emerge that will enable NGS to be used for a wider range of applications. The protocols required for library preparation are likely to become more simplified and automation will continue to facilitate more streamlined workflows.
Nanopore sequencing is an exciting new method that is likely to be incorporated into some third generation sequencers. In nanopore sequencing, a DNA strand is processed through a synthetic or protein nanopore and the subsequent changes in the electric current allow identification of the base passing the pore. This will theoretically allow sequencing of a complete chromosome in one step, without the need to generate a new DNA strand.
Despite still being in its infancy, NGS has already tremendously changed the landscape of biological research and has begun to enter clinical practice. In the next few decades, it is anticipated that genomic medicine, driven by NGS, will profoundly change the diagnosis, prognosis, and therapy of human diseases.
Using NGS for personalized medicine is the ultimate goal for many. There are, however, a number of challenges that must be adequately addressed before NGS can be transformed from a research tool to a routine clinical practice. Rapid interpretation of the masses of data produced currently requires highly specialized software, and represents one of the biggest obstacles in bringing whole genome sequencing routinely to the clinic.
"It has high frequency and throughput and you are not dependent on a Sanger reaction."
Rabab Omran, Babylon University
"My personal best benchtop sequencer in terms of accuracy and efficiency. Best suitable for Bacterial Genomics as multiplexing is easier and cost effective."
Prashant Patil, CSIR-Institute Of Microbial Technology
"This is the best Next Generation Sequencing platform on the market"
Nidhan Kumar Biswas, Indian Statistical Institute