High-quality reference genomes are complete, contiguous, accurate, and representative of a given species. They are highly valued in scientific research because they serve as the foundation for studying gene function, gene expression, evolution, genetic variation, disease-causing mutations, epigenomics, and for comparative genomics across species. They also provide a framework upon which sequences from similar organisms can be mapped and assembled.
In polyploid organisms, genomic features such as GC-rich regions, tandem repeats, and transposons are difficult to represent accurately using short-read sequencing technology. Long-read sequencing overcomes these limitations by producing reads thousands of bases long that can span repetitive regions and whole genes. Assembling these reads facilitates accurate gene placement. The resulting assemblies are often polished with short reads in hybrid strategies to create highly accurate genome sequences. This article provides examples of how Novogene has used third-generation, long-read technology to overcome the limitations of earlier technology and produce accurate genome sequences for very large and polyploid genomes.
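The point about read length can be made concrete with a small sketch. It assumes a simplified rule: a read places a repeat unambiguously only if it covers the whole repeat plus some unique flanking sequence on each side. The element lengths, read lengths, and the 100-base anchor below are illustrative assumptions, not values from the studies discussed in this article.

```python
# Simplified model: a read "resolves" a repeat if it is long enough to cover
# the repeat plus `anchor` unique bases on both sides.
# All lengths here are hypothetical, chosen only for illustration.

def spans_repeat(read_len: int, repeat_len: int, anchor: int = 100) -> bool:
    """Return True if a read of read_len can cover the repeat and both anchors."""
    return read_len >= repeat_len + 2 * anchor


def resolvable(read_len: int, repeats: dict, anchor: int = 100) -> list:
    """Names of repeats (sorted) that a read of read_len could fully span."""
    return sorted(
        name for name, length in repeats.items()
        if spans_repeat(read_len, length, anchor)
    )


# Hypothetical repeat classes and lengths in bases.
repeats = {
    "tandem_repeat": 400,
    "dna_transposon": 2_500,
    "ltr_retrotransposon": 9_000,
}

SHORT_READ = 150     # assumed short-read length
LONG_READ = 15_000   # assumed long-read length

short_ok = resolvable(SHORT_READ, repeats)
long_ok = resolvable(LONG_READ, repeats)

print("short reads span:", short_ok)
print("long reads span:", long_ok)
```

Under these assumptions the short read spans none of the elements, while the long read spans all three, which is why long reads can place repeats that short reads leave ambiguous.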
Chrysanthemum extracts and teas have traditionally been used as anti-hypertensives, antimicrobials, and as a source of iron and magnesium. Chrysanthemums have been cultivated in China for millennia and are thought to have existed in the wild for millions of years. Their large, polyploid genome had thus far prevented the creation of a high-quality reference genome that could answer important questions, such as the origin of their polyploidy or whether the presence of duplicate genes on separate chromosomes influences flowering characteristics.
With the help of Novogene’s expertise, Song et al. (2021) employed a hybrid strategy combining PacBio long reads and Illumina short reads to resolve the complex structure of the plant’s genome. The long reads were subsequently assembled using FALCON, Novogene’s automated intelligent delivery platform. Retrotransposons and DNA transposons were identified as the primary cause of genome expansion in cultivated chrysanthemum. These findings allowed the authors to conclude that chrysanthemum is allopolyploid, meaning that its chromosome sets are derived from different species.
A similar hybrid approach was taken in another study investigating the evolutionary origin of sciadonic acid (SCA) genes in Torreya grandis, an evergreen coniferous tree species found in China. SCA consumption has been associated with a variety of health benefits, including lowering triglycerides, reducing inflammation, and regulating lipid metabolism. The genome of T. grandis is estimated to be approximately 20 Gb, which makes it challenging to sequence with traditional NGS methods. Using a combination of PacBio long-read sequencing, Illumina short reads, and Novogene’s extensive experience in both methods, Lou et al. (2023) sequenced genomic DNA to decipher the full genomic sequence of the species and used RNA sequencing to study relative gene expression.
This approach yielded an assembly of about 19 Gb grouped into 11 chromosomes, with a base accuracy of nearly 100%. The use of third-generation technology revealed ancient whole-genome duplication events, and Lou et al. (2023) found that T. grandis separated from Torreya wallichiana approximately 68 million years ago. It also revealed the importance of gene duplication events in the biological functions of T. grandis related to lipid transfer, stress responses, and secondary metabolism.
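As a rough back-of-the-envelope check, the figures quoted above (an estimated genome size of approximately 20 Gb, about 19 Gb assembled, 11 chromosomes) can be combined as in the sketch below. Both sizes are approximations taken from this article, so the results are indicative only, and the variable and function names are purely illustrative.

```python
# Rounded figures quoted in this article, not exact values from the paper.
ESTIMATED_GENOME_GB = 20.0   # approximate estimated genome size of T. grandis
ASSEMBLED_GB = 19.0          # approximate size of the chromosome-level assembly
N_CHROMOSOMES = 11


def assembled_fraction(assembled_gb: float, estimated_gb: float) -> float:
    """Share of the estimated genome represented in the assembly."""
    if estimated_gb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembled_gb / estimated_gb


frac = assembled_fraction(ASSEMBLED_GB, ESTIMATED_GENOME_GB)
mean_chrom_gb = ASSEMBLED_GB / N_CHROMOSOMES

print(f"assembled fraction of estimated genome: {frac:.0%}")
print(f"mean assembled length per chromosome: {mean_chrom_gb:.2f} Gb")
```

With these rounded inputs, the assembly covers roughly 95% of the estimated genome, and each chromosome averages about 1.7 Gb, which gives a sense of the scale that long reads had to bridge.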
Advances in third-generation sequencing and analysis software have put high-quality genome sequences from previously difficult-to-sequence organisms within the reach of all scientists. This has resulted in a growing number of reference-quality genome sequences. Coupled with other techniques, these sequences will provide a greater understanding of taxonomy and cell biology, with broad applications in basic and applied research.