How whole-genome sequencing is transforming large-scale cancer analysis

Inocras CEO Jehee Suh and co-founder Young Seok Ju discuss how a collaboration with the Broad Institute is unlocking thousands of cancer whole genomes to advance precision oncology

22 May 2026
Matilde Marques, Life Sciences Assistant Editor

Jehee Suh and Young Seok Ju from Inocras at AACR 2026

Whole-genome sequencing (WGS) has long promised to transform cancer research, but translating that promise into practice has been limited by one major hurdle: scale. Cancer genomes are vast, complex, and data-rich, typically generating 100–200 GB per sample. While analyzing a handful of genomes is challenging, processing thousands in a harmonized, clinically meaningful way requires both advanced computational infrastructure and deep genomic expertise. This is the challenge Inocras is addressing.

Founded just seven years ago, the company is built around an ambitious goal: bringing population-scale whole-genome analysis into both research and clinical settings. Inocras describes itself as a “whole-genome native” organization, using WGS as the foundation for bioinformatics, clinical diagnostics, and precision oncology.

“Cancer is a disease of the genome,” explains Jehee Suh, CEO of Inocras. “All the information about what happened to the cell, why it became cancer, is inside that genome. Our role is to decode it for both research and clinical use.”

Suh is joined by Young Seok Ju, co-founder and Director of Genome Insight Institute, who leads the company’s technological development, particularly its bioinformatics capabilities.

“For cancer genome analysis, we need cutting-edge technologies,” says Ju. “I’m responsible for bioinformatics and computational analysis of genomic data.”

Why TCGA whole genomes? And why now?

A major focus for Inocras is its collaboration with the Broad Institute, aimed at systematically analyzing whole-genome sequencing data from The Cancer Genome Atlas (TCGA), one of the most influential cancer genomics initiatives to date. Although TCGA has significantly advanced the field, much of its earlier impact was driven by whole-exome sequencing (WES). In contrast, WGS data from the same samples has remained largely underutilized at scale.

“There has always been whole-genome data sitting there,” says Suh. “But no one has really opened it up systematically across the entire dataset.”

The collaboration, encompassing more than 8,500 cancer cases, seeks to change that by creating a harmonized, large-scale WGS resource for the research community. For Inocras, the effort is both scientific and strategic: accelerating a shift away from fragmented panels toward comprehensive genome-wide analysis.

“We want researchers and clinicians to adopt whole genome sequencing,” Suh explains. “If that happens, it will advance science and naturally support our mission as a company.”

From 8,500 genomes to 100,000 and beyond

Scaling from tens or hundreds of genomes to thousands introduces entirely new challenges. According to Ju, population-scale analysis is not simply a matter of doing more of the same; it requires fundamentally different infrastructure.

“Analyzing 8,000 genomes is completely different from analyzing 10 or 100,” he says. “The pipeline must be extremely robust.”

Since the collaboration began in November 2024, the team has focused heavily on optimizing its pipeline for large-scale WGS. This includes handling data transfer, pipeline execution, error detection, and harmonization across thousands of samples. Suh highlights the practical complexity: “Uploading and downloading data, running pipelines, finding and fixing errors. When you scale to 8,000 or 9,000 samples, that becomes a major challenge. You have to industrialize the process.”
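
For a rough sense of the data volume behind that statement, the sketch below multiplies the per-sample size quoted earlier (100–200 GB) by the cohort size (8,500 cases). It assumes one sample per case and decimal units, which are simplifications for illustration only; real storage needs depend on file formats, tumor and normal pairing, and intermediate files. The function name is hypothetical.

```python
# Back-of-envelope storage estimate.
# Assumptions (not from the article): one sample per case,
# decimal units (1 PB = 1,000,000 GB), no intermediate files.
GB_PER_SAMPLE_RANGE = (100, 200)  # per-sample size quoted in the article
N_CASES = 8_500                   # cohort size quoted in the article


def cohort_storage_pb(gb_per_sample: float, n_samples: int) -> float:
    """Return total storage in petabytes for n_samples of the given size."""
    return gb_per_sample * n_samples / 1_000_000


low, high = (cohort_storage_pb(gb, N_CASES) for gb in GB_PER_SAMPLE_RANGE)
print(f"Estimated raw data volume: {low:.2f} to {high:.2f} PB")
```

Even under these simplified assumptions the cohort lands at roughly a petabyte, which helps explain why transfer and error handling dominate the work at this scale.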

Beyond technology, collaboration at this scale requires trust. With no pre-existing benchmark for comparison, both teams are effectively defining the standard together.

“We’re doing something nobody has done before,” says Suh. “There’s no right answer to compare to. We’re finding the right answer, and we believe this is the best answer, and this is going to be the gold-standard dataset because of that great collaboration.”

What WGS reveals that exomes can miss

Early findings from the collaboration reinforce the value of whole-genome sequencing. Across the cohort, Inocras identified more than 200 million genomic variants, forming one of the largest cancer variant datasets generated to date. While WGS was expected to detect more variants than WES, the extent and clinical relevance of the additional findings were notable.

“Many people assume exome sequencing captures all coding mutations,” says Ju. “But in reality, it can miss regions that are difficult to capture.”

The analysis revealed 10–20% more mutations in protein-coding regions, including variants in well-characterized cancer genes such as TP53 and VHL.

“Whole-genome sequencing is superior for identifying driver mutations, even in known protein-coding genes,” Ju adds.

In addition, WGS enabled improved detection of copy number variations, structural variants, genomic rearrangements, and germline variants, all of which can contribute to cancer development and progression.

From genomic insight to clinical action

A key question for any large-scale genomics effort is its impact on patient care. Inocras has assessed clinical actionability across the 8,500-case dataset and reports that approximately 83% of cases show some level of actionability using a WGS-based approach.

“There are different levels,” explains Suh. “About 25% have FDA-approved therapies available. Others may match clinical trials or emerging treatments. But if you add all that up, it’s impressive because 8 out of 10 cancer patients can do something if we really look into it.”

However, this potential varies across cancer types. While some show relatively high actionability, others remain significantly lower, highlighting important gaps in current knowledge and treatment options.

“Some cancers are at 60–70%, others are still at 10–20%,” says Suh. “That tells us where more research is urgently needed.”

Expanding to 100,000 genomes and beyond

With the initial cohort complete, Inocras is now focused on three next steps: deeper biological analysis, pipeline standardization, and scaling to larger datasets.

“Why stop at 8,500?” Suh asks. “We need 100,000, even a million.”

The rationale is clear: with 31 cancer types represented, current sample sizes per cancer remain limited, making it difficult to detect low-frequency mutations below 1%.

“If you’re looking at mutations below 1%, you’ll miss a lot,” Suh explains. “So getting a tenfold difference is the next goal.”
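
The sample-size argument can be illustrated with a simple binomial calculation. The sketch below assumes, purely for illustration, that the 8,500 cases are spread evenly across the 31 cancer types and that a mutation must appear in at least three cases to stand out. Neither assumption comes from Inocras, and the function and constant names are hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more carriers."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


N_CASES, N_TYPES = 8_500, 31      # figures quoted in the article
per_type = N_CASES // N_TYPES     # cases per type IF evenly split (assumption)
MIN_CARRIERS = 3                  # hypothetical threshold for noticing a mutation

for freq in (0.01, 0.005):
    for n in (per_type, per_type * 10):  # current size vs. a tenfold larger cohort
        expected = n * freq
        chance = prob_at_least(MIN_CARRIERS, n, freq)
        print(
            f"frequency={freq:.1%}, cases={n}: "
            f"expected carriers={expected:.1f}, "
            f"P(at least {MIN_CARRIERS})={chance:.2f}"
        )
```

With an even split, each cancer type has about 274 cases, so a mutation present in 1% of cases would be expected in fewer than three of them. A tenfold larger cohort raises the expected count accordingly, which is the intuition behind Suh's target.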

Ju also highlights the importance of improving population diversity in genomic datasets, which remain skewed toward certain ethnic groups.

“Many populations are still underrepresented,” he says. “We need to expand globally.”

The need for data infrastructure in cancer genomics

Looking ahead, both leaders stress that scaling cancer genomics requires more than sequencing capacity; it demands robust data infrastructure.

Suh argues that cancer must be a central focus of population-scale genomics efforts: “Cancer is one of the leading causes of death, and it’s fundamentally a genomic disease. It should be a top priority.”

However, Ju emphasizes that genomic data alone is not enough. To truly enable drug development and clinical decision-making, it must be paired with clinical data.

“Genome data is powerful, but incomplete,” he says. “We need secure systems to share data and integrate clinical context while protecting patient privacy.”

Whole-genome sequencing gains momentum at AACR

The growing importance of WGS was evident at AACR 2026, where both Suh and Ju observed a shift in the conversation. Suh points to increased momentum around AI-driven analysis, while Ju highlights the changing perception of WGS itself.

“A few years ago, people questioned why whole-genome sequencing was needed,” Ju says. “Now it’s clear: it’s a technology for today.”

As cancer research becomes increasingly data-driven and AI-enabled, whole-genome sequencing is emerging as a key foundation for next-generation precision oncology.

“We can now look at 100% of the genome,” says Suh. “The cost is manageable, the technology is here. So why not?”

Frequently asked questions

How is Inocras using whole-genome sequencing (WGS) and The Cancer Genome Atlas (TCGA) data to advance large-scale cancer genomics research?

Inocras is collaborating with the Broad Institute to systematically analyze whole-genome sequencing (WGS) data from more than 8,500 cancer cases in The Cancer Genome Atlas (TCGA). While TCGA’s earlier impact was driven largely by whole-exome sequencing (WES), the WGS data from the same samples has remained underutilized at scale.

Inocras is creating a harmonized, large-scale WGS resource by optimizing robust pipelines for data transfer, execution, error detection, and harmonization across thousands of samples. This effort is designed to accelerate a shift away from fragmented gene panels toward comprehensive, genome-wide analysis in both research and clinical settings, positioning the resulting dataset as a potential gold-standard resource for the cancer genomics community.

What advantages does whole-genome sequencing offer over whole-exome sequencing in identifying cancer-driving genomic variants such as TP53 and VHL?

Whole-genome sequencing (WGS) provides broader and deeper coverage than whole-exome sequencing (WES), revealing genomic alterations that exome-based approaches can miss. In the Inocras–Broad Institute analysis of TCGA cancer genomes, WGS identified more than 200 million genomic variants and detected 10–20% more mutations in protein-coding regions than WES, including additional variants in well-characterized cancer genes such as TP53 and VHL.

WGS also improved detection of copy number variations, structural variants, genomic rearrangements, and germline variants, all of which can contribute to cancer development and progression. These findings support the view that WGS is superior for identifying driver mutations, even within known protein-coding genes.

How does Inocras translate large-scale whole-genome cancer data into clinically actionable insights, and what are its goals for scaling to 100,000 genomes and beyond?

Inocras assesses clinical actionability across its large WGS cohorts by linking genomic findings to potential therapeutic options. In the 8,500-case TCGA-based dataset, approximately 83% of cancer cases show some level of clinical actionability using a WGS-based approach. Around 25% of cases have FDA-approved therapies available, while others may be matched to clinical trials or emerging treatments.

The degree of actionability varies by cancer type, ranging from about 60–70% in some cancers to 10–20% in others, highlighting areas where more research is urgently needed.

Looking ahead, Inocras aims to expand from 8,500 genomes to 100,000 and eventually to a million genomes, increase representation across 31 or more cancer types, improve detection of low-frequency mutations below 1%, and enhance population diversity. Achieving these goals requires not only sequencing capacity but also secure, scalable data infrastructure that integrates genomic and clinical data while protecting patient privacy.
